Parameter Setting in Quantum Approximate Optimization of Weighted Problems

The Quantum Approximate Optimization Algorithm (QAOA) is a leading candidate algorithm for solving combinatorial optimization problems on quantum computers. However, in many cases QAOA requires computationally intensive parameter optimization. The challenge of parameter optimization is particularly acute in the case of weighted problems, for which the eigenvalues of the phase operator are non-integer and the QAOA energy landscape is not periodic. In this work, we develop parameter setting heuristics for QAOA applied to a general class of weighted problems. First, we derive optimal parameters for QAOA with depth $p=1$ applied to the weighted MaxCut problem under different assumptions on the weights. In particular, we rigorously prove the conventional wisdom that in the average case the first local optimum near zero gives globally-optimal QAOA parameters. Second, for $p\geq 1$ we prove that the QAOA energy landscape for weighted MaxCut approaches that for the unweighted case under a simple rescaling of parameters. Therefore, parameters previously obtained for unweighted MaxCut can be reused for weighted problems. We further prove that for $p=1$ the QAOA objective sharply concentrates around its expectation, which means that our parameter setting rules hold with high probability for a random weighted instance. We numerically validate this approach on general weighted graphs and show that on average the QAOA energy with the proposed fixed parameters is only $1.1$ percentage points away from that with optimized parameters. Third, we propose a general heuristic rescaling scheme inspired by the analytical results for weighted MaxCut and demonstrate its effectiveness using QAOA with the XY Hamming-weight-preserving mixer applied to the portfolio optimization problem. Our heuristic improves the convergence of local optimizers, reducing the number of iterations by a factor of 7.4 on average.


Introduction
Quantum computers are widely believed to be able to provide computational speedups for various problems of relevance to science and industry [1,2]. Combinatorial optimization is a domain that is very likely to benefit from quantum computing due to the ubiquity of hard optimization problems. The Quantum Approximate Optimization Algorithm (QAOA) [3,4,5] cases, and the resulting parameter setting rules hold with high probability for a random weighted instance.
As MaxCut is deeply connected to the SK model [14], we briefly discuss a "weighted" modification of the SK model obtained by drawing couplings in the SK model from N(µ, σ²) instead of N(0, 1). Here µ may depend on the problem size N. We call this modification "biased SK" and show that it behaves trivially in the infinite-size limit, unless µ = µ(N) decays to zero with increasing N.
We evaluate the parameter setting rule implied by Theorem 3 numerically outside of its theoretical assumptions by applying QAOA with p ∈ {1, 2, 3} to MaxCut on a dataset of 34,701 weighted regular and non-regular graphs. We observe that our scheme outperforms the previously proposed approach of Ref. [9]. On average, across all graphs, values of p, and edge-weight distributions, QAOA with parameters obtained using our scheme achieves solutions that are only 1.1 percentage points (p.p.) away from optimal, improving upon the 3.5 p.p. obtained using the technique presented in the prior work [9]. Moreover, the disparity from the solutions obtained using optimized parameters is reduced by a factor of three (from 3.6 p.p. to 1.0 p.p.) when the edge weights are drawn from the exponential distribution, and by a factor of ≈ 6 (from 20.7 p.p. to 3.3 p.p.) with the Cauchy distribution.
We then propose a heuristic parameter rescaling rule for QAOA on arbitrary weighted problems. The heuristic rule is inspired by the theoretical results for the weighted MaxCut problem. As an example highlighting the generality of our observations, we consider QAOA applied to a portfolio optimization problem with a budget constraint, where the constraint is enforced throughout the QAOA evolution by the XY Hamming-weight-preserving mixer. We observe that our simple rescaling procedure makes the landscape easier to optimize, reducing the number of iterations required for convergence to a fixed local optimum by a factor of 7.4 on a dataset of 280 portfolios with between 7 and 20 assets (qubits).

Background
We begin by briefly reviewing the Quantum Approximate Optimization Algorithm, the parameter setting schemes for it, and the weighted MaxCut problem.

Quantum Approximate Optimization Algorithm
Consider the problem of optimizing an objective function C(x) defined on the n-dimensional Boolean cube and encoded on n qubits by a diagonal Hamiltonian C = diag(C(x)). The Quantum Approximate Optimization Algorithm (QAOA) [3,4] is a hybrid quantum-classical algorithm that approximately solves optimization problems by preparing a parameterized circuit such that, upon measuring it, an approximate solution to the optimization problem is obtained. The QAOA circuit consists of layers of alternating unitaries, e^{−iγC} and e^{−iβB}, where C is the Hamiltonian corresponding to the optimization problem and B is the mixer Hamiltonian. Common choices of the mixer Hamiltonian B include the transverse field (B = Σ_j x_j) for unconstrained problems and the XY mixer (B = (1/2) Σ_{j,k} (x_j x_k + y_j y_k)) for problems with an equality constraint on the Hamming weight. The QAOA state with p layers is given by |γ, β⟩ = e^{−iβ_p B} e^{−iγ_p C} · · · e^{−iβ_1 B} e^{−iγ_1 C} |s⟩, where |s⟩ is the initial state and γ, β are free parameters chosen by a classical routine. We discuss the strategies for setting the parameters γ, β in Sec. 2.2 below.
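As a concrete illustration of the alternating-unitary structure above, the following is a minimal statevector sketch of QAOA for MaxCut with the transverse-field mixer. It is written for illustration only and is not the simulator used in this work; the helper names (`maxcut_diagonal`, `apply_rx`, `qaoa_energy`) are our own.

```python
import numpy as np

def maxcut_diagonal(n, weighted_edges):
    """Diagonal of C = (1/2) * sum_{u,v} w_uv (1 - z_u z_v)."""
    diag = np.zeros(2 ** n)
    for (u, v), w in weighted_edges.items():
        for x in range(2 ** n):
            zu = 1 - 2 * ((x >> u) & 1)  # bit -> spin in {+1, -1}
            zv = 1 - 2 * ((x >> v) & 1)
            diag[x] += 0.5 * w * (1 - zu * zv)
    return diag

def apply_rx(psi, n, q, theta):
    """Apply the single-qubit rotation RX(theta) to qubit q of an n-qubit state."""
    c, s = np.cos(theta / 2), -1j * np.sin(theta / 2)
    t = psi.reshape([2] * n)
    ax = n - 1 - q  # bit q of the integer index is axis n-1-q under C-order reshape
    t0, t1 = np.take(t, 0, axis=ax), np.take(t, 1, axis=ax)
    return np.stack([c * t0 + s * t1, s * t0 + c * t1], axis=ax).reshape(-1)

def qaoa_energy(n, weighted_edges, gammas, betas):
    """<gamma, beta| C |gamma, beta> for e^{-i b_p B} e^{-i g_p C} ... |s>."""
    diag = maxcut_diagonal(n, weighted_edges)
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)  # |s> = |+>^n
    for gamma, beta in zip(gammas, betas):
        psi = np.exp(-1j * gamma * diag) * psi       # phase layer e^{-i gamma C}
        for q in range(n):                           # mixer layer e^{-i beta B}
            psi = apply_rx(psi, n, q, 2 * beta)      # e^{-i beta x_q} = RX(2 beta)
    return float(np.real(np.sum(np.abs(psi) ** 2 * diag)))
```

For a sanity check, a single edge is cut exactly by a p = 1 circuit at γ = π/2, β = π/8 (energy 1), while γ = 0 leaves the uniform superposition (energy 1/2).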
2.2 Parameter setting strategies for QAOA
Multiple techniques have been proposed for obtaining high-quality parameters for QAOA.
While the parameters can be obtained by direct optimization of the objective (2) using a preferred optimization method [15,16,17,18,19,20,21], this procedure is typically computationally expensive [22,23,24,25]. The cost of finding parameters can be significantly reduced by leveraging the apparent problem-instance independence of the optimal QAOA parameters [26,27]. More straightforwardly, optimized parameters from one instance can be used directly as high-quality parameters for another instance from the same problem class [18,28,29,30,9]. A machine learning model can be trained to leverage this concentration and accurately predict the parameters [31,32,33,34,35,36]. Optimal parameters can also be derived exactly in certain analytically tractable cases, such as triangle-free regular graphs at p = 1 [37].
In certain cases, e.g. in the infinite-size limit of a given problem, a closed-form iteration can be derived for the QAOA objective, Equation (2), at constant p. The parameters can then be optimized in the infinite-size limit and used for finite-size instances. This has been demonstrated for the Sherrington-Kirkpatrick model [13] and for MaxCut on random graphs [7,12]. The goal of this work is to extend these results to weighted problems.

MaxCut problem
For an undirected graph G = (V, E) with weights w_uv = w_{u,v} assigned to edges {u, v} ∈ E, the goal of MaxCut is to partition the set of nodes V into two disjoint subsets such that the total sum of weights of the edges spanning the two partitions is maximized. We refer to this problem as weighted MaxCut in the general case and as unweighted MaxCut when w_uv = 1 for all {u, v} ∈ E.
For the weighted MaxCut problem the objective function is given by C(z) = (1/2) Σ_{{u,v}∈E} w_uv (1 − z_u z_v), where z_u ∈ {−1, 1} are the variables to be optimized and w_uv are sampled from the desired probability distribution. The MaxCut objective is encoded on qubits by the Hamiltonian C = (1/2) Σ_{{u,v}∈E} w_uv (1 − z_u z_v), where z_u and z_v are Pauli-Z operators applied to the uth and vth qubits, respectively. For unweighted graphs, the cut fraction is defined as the ratio between the number of edges in a cut and the total number of edges in the graph. For a random unweighted (D + 1)-regular graph, the optimal cut fraction is, with high probability, given by 1/2 + Π*/√(D + 1) + o(1/√D), where Π* ≈ 0.7632 is the Parisi value [14].
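The definitions above can be made concrete with a brute-force evaluation of the weighted cut value. This toy snippet (our own, for illustration only) recovers the optimal cut fraction 4/5 for the unweighted 5-cycle, on which at most 4 of the 5 edges can be cut.

```python
from itertools import product

def maxcut_value(z, weighted_edges):
    """Weighted cut value C(z) = (1/2) sum_{u,v in E} w_uv (1 - z_u z_v)."""
    return 0.5 * sum(w * (1 - z[u] * z[v]) for (u, v), w in weighted_edges.items())

def brute_force_maxcut(n, weighted_edges):
    """Exhaustive optimum over all 2^n spin assignments (small n only)."""
    return max(maxcut_value(z, weighted_edges)
               for z in product((-1, 1), repeat=n))

# Unweighted 5-cycle: an odd cycle, so at most 4 of its 5 edges can be cut.
c5 = {(i, (i + 1) % 5): 1.0 for i in range(5)}
best = brute_force_maxcut(5, c5)
print(best / len(c5))  # cut fraction, 0.8 for the 5-cycle
```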
3 Parameter setting scheme for QAOA on weighted problems
Our parameter setting scheme is motivated by the observation, formalized in Sec. 4, that in many cases the QAOA energy landscape for weighted MaxCut can be rescaled to match that of unweighted MaxCut for arbitrary p. In the case of weighted MaxCut, this gives an explicit parameter setting rule. In the case of a general objective, we use the same observation to propose a rescaling rule that makes the QAOA energy landscape easier to optimize. We validate our scheme numerically for both cases in Section 6.

Weighted MaxCut
The proposed procedure is as follows. First, rescale the edge weights in the graph so that they have unit root-mean-square, i.e. w_uv → w_uv / sqrt((1/|E|) Σ_{{u,v}∈E} w_uv²). Second, use the parameter setting rule for the corresponding unweighted graph.
As an example of a parameter setting rule for unweighted graphs to be used in the second step, one can use the parameters β^inf, γ^inf optimized for large-girth regular graphs in the infinite-size limit [7, Tables 4 and 5] and follow the rescaling procedure therein, which we include here for completeness: γ = γ^inf/√D, β = β^inf. Here D is the average degree of the graph. Alternatively, the procedure from Ref. [38] can be used. For small D and p, higher-quality results may be obtained by taking inspiration from the explicit formula of Ref. [37] and setting γ = γ^inf arctan(1/√D). As an optional third step, the quality of the parameters can be improved further by running a local optimizer with a small initial step from the parameters obtained in the second step.
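The two-step procedure can be sketched as follows, under two explicitly labeled assumptions: that the rescaling step normalizes the edge weights to unit root-mean-square (consistent with the second-moment scaling in Theorem 5), and with placeholder values standing in for the tabulated γ^inf, β^inf of [7].

```python
import math

def rescale_weights(weighted_edges):
    """Step 1 (assumed form): normalize edge weights to unit RMS,
    w_uv -> w_uv / sqrt((1/|E|) * sum w_uv^2)."""
    mean_sq = sum(w * w for w in weighted_edges.values()) / len(weighted_edges)
    scale = math.sqrt(mean_sq)
    return {e: w / scale for e, w in weighted_edges.items()}, scale

def transfer_parameters(gamma_inf, beta_inf, avg_degree):
    """Step 2: map infinite-size-limit parameters to a finite graph,
    gamma = gamma_inf / sqrt(D), beta = beta_inf (D = average degree)."""
    return [g / math.sqrt(avg_degree) for g in gamma_inf], list(beta_inf)

# Placeholder p = 1 values for illustration only; the optimized values
# are tabulated in [7, Tables 4 and 5].
gamma_inf, beta_inf = [0.5], [0.3]
edges = {(0, 1): 2.0, (1, 2): 0.5, (0, 2): 1.0}
rescaled, scale = rescale_weights(edges)
gammas, betas = transfer_parameters(gamma_inf, beta_inf, avg_degree=2)
```

The optional third step would simply hand `gammas, betas` to a local optimizer as the initial point.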

General objective
For a general objective function and QAOA with an arbitrary mixer (e.g., constraint-preserving), analytical results are not available. At the same time, we can use the intuition from MaxCut to rescale the QAOA objective and make the geometry of the landscape more amenable to optimization. Specifically, if the objective f is given by a degree-k polynomial over spins z ∈ {−1, 1}^n, our first step is to divide the objective by a normalization constant analogous to the MaxCut rescaling factor, where E_i denotes the set of i-way hyperedges, so that |E_i| is the number of terms of order i. In the second step, parameter optimization is performed as usual. This scaling is inspired by the observation that our results on weighted MaxCut generalize to problems with higher-order (higher than quadratic) terms; see Remark 3. In Section 6 we demonstrate the power of this simple procedure using the example of mean-variance portfolio optimization with a budget constraint enforced by the XY mixer.
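A minimal sketch of this normalization step, assuming the scale factor is the root-mean-square magnitude of the polynomial coefficients (an illustrative stand-in for the omitted normalization constant, chosen to be consistent with the MaxCut rule above); the hyperedge counts |E_i| are computed alongside.

```python
import math
from collections import defaultdict

def normalize_objective(terms):
    """Divide a degree-k spin polynomial, given as {hyperedge: coefficient},
    by an assumed scale factor: the RMS coefficient magnitude (illustrative
    stand-in for the paper's normalization constant)."""
    mean_sq = sum(c * c for c in terms.values()) / len(terms)
    scale = math.sqrt(mean_sq)
    return {he: c / scale for he, c in terms.items()}, scale

def terms_by_order(terms):
    """|E_i|: number of terms of each order i."""
    counts = defaultdict(int)
    for he in terms:
        counts[len(he)] += 1
    return dict(counts)

# Toy cubic objective f(z) = 3 z0 z1 - 0.5 z1 z2 z3 + 2 z2
poly = {(0, 1): 3.0, (1, 2, 3): -0.5, (2,): 2.0}
normalized, scale = normalize_objective(poly)
```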

Analytical results for QAOA on weighted MaxCut
We now present the analytical results for QAOA applied to weighted MaxCut on large-girth regular graphs with i.i.d. edge weights. We begin by analyzing p = 1 in Section 4.1. The QAOA energy for p = 1 is given by a simple trigonometric formula derived in [39, Theorem 7]. We use this formula to derive globally-optimal QAOA parameters. The parameters we derive are optimal in expectation, with the expectation taken over the distribution of the edge weights. We first consider weights sampled from the exponential distribution and obtain optimal parameters for any graph size (Theorem 1). We analyze the exponential distribution separately as it allows us to derive globally-optimal parameters for finite-sized graphs. Then we consider the infinite-size limit, which enables us to relax the assumption on the distribution and obtain optimal parameters for graphs with weights sampled from an arbitrary distribution (Theorem 2). We then consider p ≥ 1 in Section 4.3. We extend the techniques of [7] to relate the QAOA objective landscape for weighted MaxCut to that for unweighted MaxCut (Theorem 5) and the SK model (Corollary 6.1).
4.1 Globally-optimal parameters for QAOA with p = 1
According to [39, Theorem 7], the expected QAOA performance for MaxCut on triangle-free graphs can be expressed in closed form (Equation (9)), where nbhd(u) is the neighborhood function that gives the set of vertices adjacent to u. The above is always maximized at β = π/8. Thus, with a slight abuse of notation, we define ⟨C(γ)⟩ = ⟨C(γ, π/8)⟩. We consider the expected QAOA energy over the edge weights, i.e. E_w[⟨C(γ)⟩].
In the sections that follow, E_w[•] denotes the expectation over the graph weights w_uv, which are all drawn independently from the distribution w. Thus, for (D + 1)-regular graphs with i.i.d. edge weights, this expectation simplifies, where we drop the subscript on w since the edge weights are i.i.d. We now consider edge weights distributed identically and independently according to the exponential distribution with parameter λ > 0, which has probability density function f(x) = λe^{−λx} for x > 0 and f(x) = 0 otherwise. The mean and standard deviation are µ = σ = 1/λ.
Proof (of Theorem 1). To obtain the optimal parameters, we start with Equation (10) and use the standard trigonometric expectation identities for the exponential distribution. Taking the derivative with respect to γ, we obtain an expression of the form c · g(γ), where c is a positive, γ-independent constant. Setting the derivative to zero gives the critical points, and from Equation (14) we can see that the first of these is the global maximum. Note that unlike the following theorems, this proof does not rely on any assumptions on D.
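The expectations of cos(γw) and sin(γw) for exponential weights follow from the characteristic function E[e^{iγw}] = λ/(λ − iγ), giving E[cos(γw)] = λ²/(λ² + γ²) and E[sin(γw)] = λγ/(λ² + γ²); these standard facts are plausibly the identities used in the proof above. A quick Monte Carlo sanity check:

```python
import numpy as np

def exp_trig_expectations(lam, g):
    """Closed-form E[cos(g w)] and E[sin(g w)] for w ~ Exp(lam),
    from the characteristic function E[exp(i g w)] = lam / (lam - i g)."""
    ecos = lam ** 2 / (lam ** 2 + g ** 2)
    esin = lam * g / (lam ** 2 + g ** 2)
    return ecos, esin

rng = np.random.default_rng(0)
lam, g = 0.7, 1.3
w = rng.exponential(scale=1 / lam, size=2_000_000)  # scale = 1/rate
ecos, esin = exp_trig_expectations(lam, g)
print(abs(np.mean(np.cos(g * w)) - ecos) < 1e-2)  # True
print(abs(np.mean(np.sin(g * w)) - esin) < 1e-2)  # True
```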
We now consider a graph with edge weights drawn from an arbitrary distribution with mean value µ and standard deviation σ. To study the infinite-size limit, we define a quantity that tends to a constant as D → ∞. Specifically, we consider the following quantity, which reduces to the cut fraction if the graph is unweighted, i.e. w_uv = 1 for all {u, v} ∈ E.
Using Equation (11) followed by Equation (10), we can rewrite this quantity, where we introduce ϑ_1(D, γ) to match Π* in Equation (5). We will now show that ϑ_1(D, γ) tends to a D-independent quantity as D → ∞ when γ = Θ(D^{−1/2}), and use the resulting limit to derive the optimal value γ* in the limit of infinite-sized graphs.
The assumption γ = Θ(D^{−1/2}) is inspired by the numerical observation that the optimal γ for unweighted MaxCut is Θ(D^{−1/2}) (see, e.g., [12, Figure 1b]). Furthermore, we prove that ϑ_1(D, γ) has a local maximum at a value γ = Θ(D^{−1/2}) for sufficiently large D. In the limit D → ∞, we prove that this local maximum is also the global maximum. This motivates the definition of the limiting quantity ϑ_1(γ).
Theorem 2 (p = 1, infinite size). Consider weighted MaxCut on a given triangle-free (D + 1)-regular graph with edge weights, w, drawn i.i.d. from a distribution w with finite second moment. Then for sufficiently large D, the function ϑ_1(D, γ) associated with QAOA for p = 1 has a local maximum at a γ that is Θ(D^{−1/2}). Moreover, the limiting quantity ϑ_1(γ) attains its global maximum at this point.
Proof. The assumption of finite second moment along with Jensen's inequality implies that E_w[|w|] is also finite. Thus, the derivatives of the functions inside the expectations taken in Equation (19) exist. Substituting γ = α/√D, for α independent of D, and using the Taylor series expansions of the trigonometric functions, we obtain the limiting form, where the implicit exchange of infinite series and expectation over w is justified by the finiteness of the second moment and Fubini's theorem [41, Theorem 8.8]. Here we use the observation that the remainder terms vanish for x that is bounded by a constant independent of D.‡ For sufficiently large D, the derivative is positive near zero and negative at α*/√D for some sufficiently large constant α* independent of D. Thus, by Darboux's theorem [42, Theorem 5.12], ϑ_1(D, γ) has a local maximum in the interval (0, α*/√D) for each triangle-free (D + 1)-regular graph.
We now consider the limiting value of ϑ_1 in the regime of small γ.
where we use Equation (24), and the implicit exchange of infinite series and expectation is justified by Fubini's theorem. Taking the limit in D, we obtain the limiting quantity; now consider its derivative. It can be easily seen that the function ϑ_1(γ') is always decreasing to the right of the local maximum, and the function is negative to the left of zero. Thus this local maximum is in fact a global optimum.
Remark 1.To see the correspondence between Theorem 2 and Theorem 1, i.e. that the latter is a special case of the former, rescale γ → γ/ √ D and note that the constant in the denominator in Theorem 1 has no effect on the limiting value as D → ∞.

Concentration of QAOA objective at p = 1
We show that the QAOA objective for weighted MaxCut instances concentrates sharply around its expectation as D → ∞ for triangle-free (D + 1)-regular graphs when p = 1. This indicates that our scaling rules, which are derived by investigating the expectation of the objective, can also be expected to hold with high probability for any weighted instance of a fixed graph. The QAOA objective in this setting is given in closed form in Equation (9), where we set β = π/8. We will use the following bound on the partial derivatives of the objective as a function of the individual edge weights.
Lemma 2.1. Let ⟨C(γ)⟩ be the p = 1 QAOA objective for weighted MaxCut on triangle-free (D + 1)-regular graphs as given in Equation (9), with β = π/8. The partial derivatives of ⟨C(γ)⟩ with respect to the weights w_ij satisfy the following bound.
Proof. For convenience, define T_uv to be the contribution of the edge {u, v} to the objective. To determine the last term, we have four cases:
1. u = i, v = j: In T_uv, the term inside the parentheses is upper bounded by 2 and independent of w_ij.
2. u = i, j ∈ nbhd(u)\{v}: In T_uv, only the first term inside the parentheses depends on w_ij.
3. v = j, i ∈ nbhd(v)\{u}: In T_uv, only the second term inside the parentheses depends on w_ij.
4. Otherwise, T_uv does not depend on w_ij, so the corresponding partial derivative vanishes.
Combining the above four cases, the result follows.
We shall also use the following straightforward observation.

Lemma 2.2. For any (D + 1)-regular triangle-free graph, it must be that the number of vertices N ≥ D(D + 1).
Proof. Consider any node u. By definition u has D + 1 unique neighbors. Each neighbor of u has D neighbors in addition to u. Additionally, none of these new neighbors can also be neighbors of u (or they would form a triangle). The observation follows.
We are now ready to establish the concentration of ⟨C(γ)⟩ for some common edge weight distributions, at the value of γ that optimizes the expected objective.

Bounded Edge Weights
In this setting the edge weights are independently and identically sampled from a distribution that is supported on an interval [a, b], where |a|, |b| are independent of D. We can show concentration by using the well-known McDiarmid's inequality [43], which we state in the necessary form below.
Lemma 2.3 (McDiarmid's inequality [43]). Let X_1, . . ., X_n be independent random variables, each with range X. Let f : X^n → R be any function with the bounded differences property, i.e., for any coordinate i ∈ [n] and any points x, x' ∈ X^n differing only in the ith coordinate, |f(x) − f(x')| ≤ c_i. Then for all t > 0,
Pr[ |f(X_1, . . ., X_n) − E[f]| ≥ t ] ≤ 2 exp(−2t² / Σ_{i=1}^n c_i²).
We now show a concentration inequality on the relative error from the mean of the QAOA objective under this setting.
Theorem 3 (p = 1, concentration, bounded weights). Let the edge weights be as above. Then, for all γ = Θ(D^{−α}) for some α ∈ (0, 1), the QAOA objective ⟨C(γ)⟩ with p = 1 satisfies the following concentration inequality, where K is independent of D.
Proof.
The right-hand side of the above inequality can be written as K_1 + γDK_2, where K_1, K_2 are independent of D. Now, we view ⟨C(γ)⟩ as a function of N(D + 1)/2 independent identically distributed random variables. By the mean value theorem, Equation (37), the assumption γ = Θ(D^{−α}), and the boundedness of w_ij, ⟨C(γ)⟩ satisfies the bounded differences property (Equation (34)) with respect to each weight w_ij. Finally, we bound the deviation probability, where K_4 is a constant independent of D. Here, the first inequality follows from Lemma 2.3, and the final inequality follows from Lemma 2.2.
Combining the above with a bound proportional to |µ|², the result of the theorem follows.
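McDiarmid's inequality can be sanity-checked empirically in the simplest setting, the sample mean of n bounded variables, where each bounded difference is c_i = 1/n and the bound reads Pr[|f − E f| ≥ t] ≤ 2 exp(−2nt²):

```python
import numpy as np

# Empirical check of McDiarmid's inequality for f(X) = mean(X), X_i in [0, 1]:
# bounded differences c_i = 1/n, so the tail bound is 2 * exp(-2 n t^2).
rng = np.random.default_rng(1)
n, trials, t = 100, 20_000, 0.1
means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) >= t)   # observed tail frequency
bound = 2 * np.exp(-2 * n * t ** 2)             # McDiarmid bound, ~0.27 here
print(empirical <= bound)  # True
```

The observed tail is far below the bound, as expected: McDiarmid is a worst-case inequality, while the mean of uniforms concentrates much faster.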
Theorem 3 applies immediately when the parameter γ is set to the value shown in Theorem 2 to maximize the expected objective in the limit D → ∞.

Gaussian Edge Weights
In order to show concentration when the weight distribution of each edge is Gaussian, we need an additional assumption that the number of vertices N satisfies log(N) = o(D log(D)). We use the following result on the sub-Gaussian concentration of Lipschitz continuous functions of standard normal variables.
Lemma 3.1 (Adapted from [45, Theorem 5.2.2]). Consider an n-dimensional random vector X drawn from the n-dimensional standard normal distribution, i.e. X ∼ N(0, I_n), and a differentiable function f : R^n → R that is Lipschitz continuous with constant L. Then for every t ≥ 0,
Pr[ |f(X) − E[f(X)]| ≥ t ] ≤ 2 exp(−t² / (K_ψ L²)),
where K_ψ is a universal constant.
The Lipschitz condition of Lemma 3.1 does not directly apply to the QAOA objective because the weights may be unbounded. Our argument will therefore address the concentration in two cases, using Lemma 3.1 to show concentration inside a suitably chosen interval, and Gaussian tail bounds to show that the probability of lying outside this interval is sufficiently small. We formalize this in the following concentration inequality on the relative error from the mean of the QAOA objective.
Theorem 4 (p = 1, concentration, Gaussian weights). Let the edge weights of a (D + 1)-regular triangle-free graph (V, E) with |V| = N vertices, such that log(N) = o(D log(D)), be chosen from the normal distribution N(µ, σ²), where µ, σ are independent of D. Then, for all γ = Θ(D^{−1/2}), the QAOA objective ⟨C(γ)⟩ with p = 1 satisfies the following concentration inequality, where K is independent of D.
Proof. For any ϵ > 0, define the interval I_η = [µ − t_η σ, µ + t_η σ], with t_η (and hence η) to be chosen later. Let B_η denote the event that w_ij ∈ I_η for all {i, j} ∈ E. From the definition of conditional probability, we obtain the decomposition in Equation (44). We bound Pr[¬B_η] by the following consideration, where the first inequality follows by the union bound and the final equality by the definition of t_η.
To bound the second term on the RHS of (44), note that in the event B_η, the weights are all bounded in the interval [µ − t_η σ, µ + t_η σ]. The absolute value of any weight in this interval is bounded above by b_η = |µ + sgn(µ)t_η σ|. We now consider the QAOA objective ⟨C(γ)⟩ as a function of a random N(D + 1)/2-dimensional vector Z = (z_ij)_{{i,j}∈E} whose entries are each drawn from the standard normal distribution N(0, 1), by defining z_ij = (w_ij − µ)/σ. From Lemma 2.1 and an application of the chain rule, we can bound the Lipschitz constant L of ⟨C(γ)⟩, which leads to the following sequence of deductions, where K_1 is a constant independent of D. The second inequality follows because µ/(σt_η) = O(D^{−1/2}), the third because log(D + 1), log(N) = o(D log(D)) by assumption, and the fourth because γ = Θ(D^{−1/2}) by assumption.
Finally, we can return to Equation (10) and apply Lemma 3.1 to obtain a bound with a constant K_2 independent of D. Setting η = 1, the result follows from Equations (44), (45) and (50).

Summary of concentration results
In view of our concentration results for graphs with weight distributions that have bounded support (Theorem 3) or normally distributed weights (Theorem 4), we see that, in the limit D → ∞ and when γ = Θ(D^{−1/2}), the typical relative deviations from the mean of the QAOA objective ⟨C(γ)⟩ are of the order Õ(D^{−1}), where Õ ignores factors that are O(polylog(D)). In particular, the probability of a relative deviation greater than ϵ is ≈ exp(−ϵ²D²) for bounded weights and ≈ exp(−ϵD/log D) for Gaussian weights. Crucially, this shows that the quantity ϑ_1(D, γ) has relative deviations of the order Õ(D^{−1/2}). This is notable as ϑ_1(D, γ) tends to a constant as D → ∞ when γ = Θ(D^{−1/2}) (Theorem 2), and is the primary quantity of interest when investigating the cut fraction (tending to the Parisi value in the unweighted case [14]).

Correspondence between QAOA on weighted and unweighted graphs with p ≥ 1
To derive a parameter scaling rule for arbitrary p, we extend the techniques developed in Ref. [7] for MaxCut on large-girth, regular unweighted graphs to large-girth, regular graphs with i.i.d. edge weights. Without a subscript, |γ, β⟩ will refer to the p-layer QAOA state for a random weighted instance of MaxCut on a given (D + 1)-regular graph with weights, w, drawn from a distribution w. We start by proving the following result, valid for any p, about the quantity E_w[w_LR ⟨γ, β|z_L z_R|γ, β⟩].
Lemma. The quantity E_w[w_LR ⟨γ, β|z_L z_R|γ, β⟩] is the same for any edge {L, R}.
Here (V_LR, E_LR) denotes the vertex and edge sets of the tree subgraph seen from the edge {L, R}. Note that Ref. [7] has an extra factor of 1/√D in the exponential, since the cost function in Equation (51) is therein defined with that extra factor. In addition, Γ is a (2p + 1)-component vector with entries Γ_r = γ_r, Γ_{−r} = −γ_r, and Γ_0 = 0, for 1 ≤ r ≤ p. Also, for a node u, z_u denotes the vector of its spin configurations across layers, and ⊙ denotes an element-wise product. The authors of [7] noted that Equation (53) can be computed recursively by traversing, from leaves to roots, the left and right branches simultaneously. This effectively "factors" the right-hand side of Equation (53). For simplicity we will only do this for p = 2; the generalization is straightforward. Define H^{(0)}_D = 1. We start by summing over the configuration of an arbitrary leaf in either branch of Figure 1, where p(u) is the parent of u in the tree subgraph. Since the left and right branches are D-ary trees, the expression for any node v in the second level of either branch involves Σ_{z_v} g(z_v) H^{(1)}_D(z_v), where the second equality follows from the even parity of g · H^{(1)}_D. This can again be done for the next level, i.e. the roots L or R, producing the quantity H^{(2)}_D. Lastly, we combine the results from the two branches by summing over the configurations of the left and right roots, where the last equality follows again from the even parity of g · H^{(r)}_D. The general iteration for the random quantity w_LR ⟨γ, β|z_L z_R|γ, β⟩ follows by induction.
Note that in the previous recursion, each edge was only counted once. Since all of the edge weights are i.i.d., the expectation operation commutes with all products that appear on the right-hand side of Equation (58). More specifically, we have the following for general p, where subscripts have been dropped from the weights due to the i.i.d. assumption, for 1 ≤ r ≤ p. Since all randomness has been removed by the expectation operation, this quantity depends only on the graph structure and not on the sampled weights. The non-random quantities in the iteration for w_LR ⟨γ, β|z_L z_R|γ, β⟩ only depend on the local graph structure that QAOA sees from a given edge. As argued earlier, this graph structure is always two D-ary trees joined at their roots. Thus, the quantity E_w[w ⟨γ, β|z_L z_R|γ, β⟩] is independent of the chosen edge {L, R}, which is analogous to the unweighted case.
The result of the lemma follows from Equation (51) and the linearity of expectation.
By the previous lemma, we can, analogously to the p = 1 case, define ϑ_p(D, γ, β), where the subscripts on w and z have been dropped since the edge can be arbitrary by the previous lemma. The quantity considered in [7, Equation A.19] for the unweighted case is defined analogously, where the subscript "u" indicates that the parameterized state is prepared by a p-layer QAOA for the corresponding unweighted problem on the same graph. Our main result below shows that γ can be scaled to make these two quantities equal up to a global scaling factor.
Theorem 5 (p ≥ 1, infinite size). If the girth is > 2p + 1 and the edge-weight distribution w has finite second moment, then for all parameters γ, β, after rescaling γ the weighted quantity approaches the unweighted one up to a global scaling factor.
Proof. We implicitly assume that γ → γ/√D and thus Γ → Γ/√D in Equation (59). By the product rule for limits, we can evaluate the limits of the terms H^{(p)}_D(z_L), H^{(p)}_D(z_R), and the term involving the sine separately, since we will show they individually exist. Note that for any sum inside the product of Equation (60), the implicit exchange of the expectation operator and the infinite series expansion of the trigonometric functions is justified by Fubini's theorem and the assumption that the weight distribution has finite second moment, as in Section 4.1. In addition, we have used the following generalization of [7, Equation (A.23)], valid for any r: after taking expectations, it follows that Σ_{z_u} g(z_u) H^{(r−1)}_D(z_u) = 1. By the i.i.d. assumption, Equation (65) is the same for every u ∈ c(v), and thus, using Equation (24) along with continuity, the limit exists. The limit can then be propagated down to the lowest level of the recursion, and similarly for the term involving the sine. Putting this all together, for an arbitrary edge {L, R} ∈ E, we obtain the claimed equality.
This result implies a relationship in the infinite-size limit between QAOA's objective value for weighted MaxCut and the SK model. Consider the SK objective with couplings J_uv ∼ N(0, 1), where the subscript J of the state signifies that the state was prepared by a p-layer QAOA with the SK objective as the phase operator.
Theorem 6 (Restated from [7]). For all p and all parameters (γ, β) the following holds. Trivially, this in combination with Theorem 5 leads to the following corollary.
Corollary 6.1. If the edge-weight distribution has finite second moment, then for all p and all parameters (γ, β) the following holds. Thus the performance of QAOA on the SK model, MaxCut on large-girth regular graphs, and weighted MaxCut on large-girth regular graphs are equivalent in the infinite-size limit.
Remark 4. By [46, Theorem 3] and Remark 3, one can trivially extend Corollary 6.1 to connect QAOA's performance on weighted MaxCut on regular k-uniform hypergraphs to its performance on pure k-spin models, generalizing SK.

Observations about biased SK model
As presented in Corollary 6.1, there is a deep connection between arbitrarily-weighted MaxCut and the SK model. The SK model is given in Equation (70) and has couplings J_uv ∼ N(0, 1). A natural generalization to consider is a model with couplings J_uv ∼ N(µ, σ²) with arbitrary µ and σ. More generally, we can allow the bias to be a function of the number of spins, i.e. µ(N). When µ(N) ≠ 0, we call this the biased SK model, and when µ(N) = 0, we call it the standard SK model. Unfortunately, this natural generalization does not lead to interesting behavior. Specifically, we show that unless µ(N) → 0, the biased SK problem is trivial in the thermodynamic limit.
The performance of QAOA for arbitrary p on standard SK, specifically an iteration for the quantity V_p, was originally established in [13] using different techniques than those of Section 4.3. However, it is not clear how those techniques can be generalized to non-symmetric distributions. In this section, we use a different set of elementary techniques to determine the limiting optimal value of different versions of the biased SK model. Our goal is to find an analog of the Parisi value [47,48] for the biased model. The following is based on Ref. [49]. The optimization problem is max_z G(z). Note that unlike the standard SK model, which is symmetric around zero, here we must keep track of the signs of the couplings.
When µ(N) = 0, the expected maximum is known. In the standard SK model, the couplings are scaled by N^{−1/2} to ensure that the expected maximum grows linearly with N; this is the reasoning for the scaling of the standard SK objective presented in Equation (70). For even N, let h(N) be defined by Equation (75) when µ(N) ≠ 0. More specifically, when µ(N) > 0, the problem max_z E_J[G(z)] reduces to MaxCut on a complete graph with all edge weights equal to µ(N), and thus the optimal cut value is µ(N)N²/4, i.e. set half of the z_i = 1. When µ(N) < 0, the optimal value is obtained when all z_i = 1, and results in an objective function value of µ(N)·N(N − 1)/2. Note that when N is odd, the factor in the denominator is the same for all cases, and thus we can restrict to even N without loss of generality.
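The complete-graph reduction for µ > 0 can be checked by brute force on a small instance (our own toy check, taking the expected objective in its cut-value form):

```python
from itertools import product

def expected_cut(z, mu):
    """Expected cut objective on the complete graph with all weights mu:
    (1/2) * mu * sum_{i<j} (1 - z_i z_j)."""
    n = len(z)
    return 0.5 * mu * sum(1 - z[i] * z[j]
                          for i in range(n) for j in range(i + 1, n))

def brute_max(n, mu):
    """Exhaustive optimum over all 2^n spin assignments (small n only)."""
    return max(expected_cut(z, mu) for z in product((-1, 1), repeat=n))

# For even N and mu > 0 the optimum is mu * N^2 / 4, attained by a
# balanced partition (half of the z_i set to 1).
N, mu = 6, 0.5
print(brute_max(N, mu) == mu * N ** 2 / 4)  # True
```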
We have the following by the convexity of the max, and minimizing over α > 0 gives the stated bound. Note that the right-hand side of the last inequality involves only σ instead of σ², and thus is invariant under any scaling applied before or after the expectation. When µ(N) = 0, we have h(N) = 0, and we recover the SK scaling. Also, for standard SK with J_ij ∼ N(0, 1), which is a special case of µ(N) = 0, it is known that the limiting value is the Parisi value Π* [49]. Thus for J_ij ∼ N(0, σ²), it is easily seen that the limit rescales by σ. To summarize, the limiting behavior for SK with a distribution with any σ ≠ 1 can be obtained by simple rescaling. To connect this to the quantities discussed at the end of Section 4.3, for N(0, σ²), the quantity V_p in Equation (71) scales proportionally to σ. Therefore, for the remainder of this section we focus on the case where µ(N) ≠ 0.
If $\mu(N) \to \mu$ as $N \to \infty$ for some nonzero constant $\mu$, the term involving $\mu(N)$ dominates, and the expected maximum is $\Theta(N^2)$. More specifically, using the definition of $h(N)$ in (75) for nonzero $\mu$, we get that $\mu(N)h(N) = \Theta(N^2)$ as $N \to \infty$, and both sides of Equation (79) are of the same order in $N$. Thus the squeeze theorem implies that the limiting quantity is exactly the deterministic optimum, where we define $\mathrm{sign}(x)$ to be 1 when $x \geq 0$ and 0 when $x < 0$; this accounts for the constant-factor differences in $h(N)$ for $\mu \neq 0$. The solution to the above problem is trivial in the infinite-size limit: set all $z_i = 1$ when $\mu < 0$, and set half of the $z_i$ to 1 when $\mu > 0$.
However, to compensate for the maximum growing $\Theta(\sqrt{N})$ faster than in standard SK when $\mu(N)$ is not identically zero, we could consider defining the biased SK model with a mean that decays with $N$. The growth of the maximum is then again $O(N^{3/2})$, as in the original SK model. Recall that the original SK objective converges, with the limiting value being the Parisi value. In this regime, the "biased SK" model appears nontrivial, and the limiting value and solution are not obvious. We were unable to find any mention of such a model in the existing literature, and we leave the study of its properties to future work.
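One natural choice for the decaying mean (our reading of the elided definition, not a claim about the original) is

```latex
\mu(N) \;=\; \frac{c}{\sqrt{N}}
\quad\Longrightarrow\quad
\mu(N)\cdot\Theta(N^2) \;=\; \Theta\bigl(N^{3/2}\bigr),
```

so that the deterministic contribution matches the $\Theta(\sigma N^{3/2})$ fluctuation term and neither part dominates in the limit.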
6 Numerical results

Weighted MaxCut
Numerical investigation of the proposed parameter setting rules was performed on a dataset of weighted graphs from Ref. [9], available through QAOAKit [50]. The dataset consists of 34,701 weighted graphs with up to 20 nodes and contains both regular and non-regular graphs. The graphs have edge weights drawn i.i.d. from four distributions: Uniform over [0, 1] ("Uniform+"), Uniform over [−1, 1] ("Uniform±"), Exponential (with λ = 0.2), and Cauchy.
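For concreteness, the four weight distributions can be sampled as follows. This is our parameterization for illustration (e.g., rate λ = 0.2 corresponds to mean 5 for the exponential); the exact shapes and seeds used to generate the original dataset may differ.

```python
import math
import random

random.seed(0)

def sample_weight(dist):
    """One edge weight from each distribution family used in the
    dataset of Ref. [9] (illustrative parameterization)."""
    if dist == "uniform+":
        return random.uniform(0.0, 1.0)
    if dist == "uniform+-":
        return random.uniform(-1.0, 1.0)
    if dist == "exponential":      # rate lambda = 0.2, i.e. mean 1/0.2 = 5
        return random.expovariate(0.2)
    if dist == "cauchy":           # standard Cauchy via inverse-CDF sampling
        return math.tan(math.pi * (random.random() - 0.5))
    raise ValueError(dist)

weights = [sample_weight("uniform+") for _ in range(100)]
```

The Cauchy family is the stress test here: it has no finite mean or variance, which is why it produces the largest optimality gaps for distribution-agnostic parameter rules.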
For the numerical study, we investigate two proposed parameter setting rules, which are variants of Equation (6): method (i), given by Equation (86), and method (ii), given by Equation (87). In both, the parameters β<sub>inf</sub>, γ<sub>inf</sub> are the optimized parameters for large-girth regular graphs in the infinite-size limit from [51, Table 4], and D is the average degree. Our baseline is the parameter scheme of Ref. [9], in which γ<sub>median</sub> is a median taken over optimized parameters for all 261,080 nonisomorphic connected 9-node graphs. The key difference between our scaling and that of Ref. [9] is the choice of the denominator; since γ<sub>median</sub> is close in value to γ<sub>inf</sub>, the numerator is similar in both schemes. The first method is inspired directly by the analytical results described in Section 4. We observe that for small p, better results are obtained when the formula for p = 1 from [37] is used, which motivates the second rule. We note that while we have derived the exact formula for graphs with weights sampled from the exponential distribution, we do not use it in the numerical experiments. Our goal for the numerics is to simulate the practical setting, in which one does not know the distribution from which the weights are sampled.
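The shape of such a rescaling rule can be sketched as follows. This is a hypothetical illustration only: it divides γ<sub>inf</sub> by the mean absolute edge weight and a √D degree factor, in the spirit of the scaling discussed in Section 4; the exact constants are those of Equations (86) and (87), which are not reproduced here.

```python
import math

def rescaled_gamma(gamma_inf, weights, avg_degree):
    """Hypothetical sketch of a weighted-graph rescaling rule: divide
    the unweighted-case gamma_inf by the mean absolute edge weight and
    a sqrt(average degree) factor.  Illustrates the shape of the rule
    only; the exact constants are defined in the paper's Eq. (86)."""
    mean_abs_w = sum(abs(w) for w in weights) / len(weights)
    return gamma_inf / (mean_abs_w * math.sqrt(avg_degree))

g = rescaled_gamma(0.6, [0.5, 1.5, 1.0], avg_degree=3)
```

The point of the sketch is that both proposed rules keep a fixed numerator from the unweighted infinite-size limit and absorb all instance dependence into a denominator computed from the weights and degree.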
We analyze the performance of the proposed parameter setting rules across multiple weight distributions, values of p, and values of N. Herein, we denote the median approximation ratio with directly optimized parameters by r<sub>opt</sub>, with the parameter setting scheme from Ref. [9] by r<sub>[9]</sub>, and with the two proposed methods by r<sub>D</sub> and r<sub>arctan</sub>, respectively. We refer to the difference between the approximation ratio of a given parameter setting scheme and r<sub>opt</sub> as the optimality gap. The results are presented in Figure 2.
Our techniques lead to lower optimality gaps compared to Ref. [9] in all cases except p = 1 with weights sampled uniformly from [0, 1]. We note that the gap between methods (i) and (ii) shrinks as p increases. For example, for N = 8 it drops from 0.0111 on average for p = 1 to 0.0062 for p = 2, and eventually to 0.0005 for p = 3.
The median difference in approximation ratios over all considered p and weight distributions is 1.8 p.p. for method (i) and 1.45 p.p. for method (ii). Specifically, for the exponential and Cauchy distributions, the median differences in approximation ratios for method (i) are 1.3 p.p. and 3.8 p.p., respectively, and those for method (ii) are only 1.0 p.p. and 3.3 p.p., respectively. For comparison, the previous proposal [9] obtains median differences of 3.6 p.p. and 20.7 p.p. for weights drawn from the exponential and Cauchy distributions, respectively. As can be seen in Figure 3, the improvement over Ref. [9] is largest when the edge weights are drawn from a Cauchy distribution, with an 8× reduction in optimality gap at p = 3. Table 2 reports the median solution quality at the median weight variance s² across the different distributions and multiple p for n = 14. As the variance s² grows, the optimality gaps achieved by our techniques are almost an order of magnitude better than those of the method in Ref. [9]. The optimality gaps obtained with our two parameter setting rules are comparable for each p, whereas the performance of the method of Ref. [9] deteriorates as p increases.
From the values shown in Table 2, it can be observed that when the variance is small, the performance obtained using the method of the prior work [9] is comparable to that of our methods. However, as the variance increases, the solution quality achieved by the methods introduced in this work exceeds that of the previous work [9].

Portfolio optimization
While the analytical results of Section 4 only apply to MaxCut and QAOA with the transverse-field mixer $B = \sum_j x_j$, the intuition applies more broadly. To illustrate this point, in this section we consider a portfolio optimization problem with a budget constraint. This problem is commonly considered as a target for QAOA [52,11,53,54,55,56]. We use random instances generated by RandomDataProvider in qiskit finance [57] and set q = 0.5 and k = ⌊n/2⌋. We consider 20 instances for each number of qubits between 7 and 20, for a total of 280 instances. For each instance, we reformulate the problem in terms of spin variables and rescale the objective following Equation (8), with the scaling coefficient computed separately for each instance. We use QAOA with p = 1 and the xy mixer given by $B = \frac{1}{2}\sum_{j=1}^{N}(x_j x_{j+1} + y_j y_{j+1})$, and set the initial state |s⟩ to be the uniform superposition over all Hamming-weight-k states (the Dicke state). We denote this variant xy-QAOA.
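As a reference for what the constrained objective looks like, the following sketch brute-forces the budget-constrained mean-variance objective. We assume the standard formulation (minimize q xᵀΣx − μᵀx over bitstrings of Hamming weight k, with Σ the covariance matrix and μ the expected returns); the instance data here are made up for illustration, not drawn from RandomDataProvider.

```python
from itertools import combinations
import math

def best_portfolio(mu, sigma, q, k):
    """Brute-force optimum of the budget-constrained mean-variance
    objective q * x^T Sigma x - mu^T x over all bitstrings x of
    Hamming weight k.  Reference only; QAOA searches the same
    feasible set via the Dicke initial state and the xy mixer."""
    n = len(mu)
    best_val, best_x = math.inf, None
    for chosen in combinations(range(n), k):
        val = (q * sum(sigma[i][j] for i in chosen for j in chosen)
               - sum(mu[i] for i in chosen))
        if val < best_val:
            best_val, best_x = val, chosen
    return best_val, best_x

# Toy 4-asset instance (illustrative data).
mu = [0.1, 0.2, 0.05, 0.15]
sigma = [[0.20, 0.01, 0.00, 0.02],
         [0.01, 0.30, 0.03, 0.00],
         [0.00, 0.03, 0.25, 0.01],
         [0.02, 0.00, 0.01, 0.40]]
val, x = best_portfolio(mu, sigma, q=0.5, k=2)
```

The Hamming-weight constraint is enforced here by enumeration; in xy-QAOA it is preserved automatically, since the xy mixer commutes with the total Hamming weight.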
We first observe that the rescaling improves the optimization landscape. The QAOA energy at p = 1 is presented in Figure 4. The landscape with the original objective function (Figure 4a,b) is flat, making optimization difficult. After rescaling (Figure 4c), a clear local minimum is visible. Moreover, the geometry of the landscape is similar with respect to both parameters. This suggests that optimization of the QAOA parameters with the rescaled objective should be easier.
To quantify this improvement, we optimize the parameters using the NLopt [58] implementation of the BOBYQA [59] gradient-free optimizer. We use BOBYQA as it has been shown to be an effective optimizer for QAOA parameters [18]. For each instance, we run one local optimization starting from the optimal QAOA parameters for the Sherrington-Kirkpatrick model obtained from [51, Table 4], given by γ<sub>init</sub> = 1, β<sub>init</sub> = π/4.‡ We set the stopping criteria to xtol = ftol = 10⁻⁸. We present the performance profiles in Figure 5. The optimizer finds the same optimum for the original and rescaled objectives in 91% of the cases; we consider that an optimizer "solves" an instance when it recovers this optimum. When the objective is rescaled, the optimizer takes on average 7.4 times fewer iterations to find the same optimum.
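The performance-profile curves of Figure 5 can be computed as follows. This is a generic sketch of the metric (fraction of instances solved within a given iteration budget), with made-up iteration counts; it is not the paper's data.

```python
def performance_profile(iteration_counts, budgets):
    """Fraction of instances solved within each iteration budget,
    i.e. one curve of a performance profile.  `iteration_counts`
    holds the iterations a method needed per instance, with None
    marking instances where the target optimum was not recovered."""
    solved = [c for c in iteration_counts if c is not None]
    n = len(iteration_counts)
    return [sum(c <= b for c in solved) / n for b in budgets]

# Toy data: the rescaled objective converges in far fewer iterations.
rescaled_iters = [8, 12, 15, 20, 25]
profile = performance_profile(rescaled_iters, budgets=[10, 20, 30])
```

A method whose curve rises earlier and saturates higher both converges faster and fails on fewer instances, which is exactly the comparison made between the original, rescaled, and "ignore µ" variants.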
To contrast portfolio optimization with the weighted MaxCut problem and to highlight the importance of including the weights of the linear terms in Eq. (8), we additionally plot the performance profile for a rescaling rule in which the coefficients of the linear terms µ<sub>j</sub> are ignored and only the Σ<sub>ij</sub> are used. This profile is marked "ignore µ" in Figure 5. As can be seen from the plot, ignoring the weights of the linear terms leads to incorrect rescaling, with the optimizer failing to recover the optimum in the majority of cases. We note that the weights µ<sub>j</sub> act analogously to the bias in the "biased SK" model of Section 5 and can dominate the objective for some instances.

Discussion
In this work, we propose heuristic parameter setting rules for QAOA, inspired by a formal connection between weighted and unweighted MaxCut on regular graphs. For p = 1, we derive explicit expressions for the parameter γ that maximizes the cost function in the weighted case. Our analysis of MaxCut at p = 1 rigorously proves that the globally-optimal γ are small, providing additional justification for this commonly used assumption [7,10,6,12]. For p ≥ 1, we show explicitly how the energy landscape and, consequently, the optimal parameters scale between the weighted and unweighted cases. As we prove concentration of the QAOA objective, our results apply with high probability to a random weighted MaxCut instance. An important limitation of our analysis is the high-girth assumption, which we inherit from Ref. [51]. Surprisingly, the simple rules we derive for MaxCut on high-girth regular graphs apply broadly, which we demonstrate by extensive numerical experiments on MaxCut on a general class of random graphs and on random instances of a constrained portfolio optimization problem.
Additionally, we consider the biased SK problem and rigorously show that it has a trivial solution in the infinite-size limit, unless the mean of the weight distribution falls sufficiently fast with the number of vertices. This investigation was inspired by the connection between SK and MaxCut on regular graphs, and by the observation that the closed-form iterations that we use for QAOA do not apply to complete graphs. However, it appears that, unlike for standard SK, the analysis of QAOA performance is unlikely to lead to significant insights when the weights are biased.

‡ The parameters in [7, Table 4] differ from these by a factor of 2. The difference is due to the constant factors in the QAOA simulator implementation that we use.
Our observation that the QAOA parameters γ have to decrease with problem size is an instantiation of a broader principle, namely that parameterized quantum circuits are not scale-independent. Similar results have been observed for quantum kernel methods [60,61] and quantum neural network initialization [62]. A unification of these observations into a general theory of parameterized quantum circuits is a tempting prospect, though it would require the development of novel mathematical techniques.

Theorem 3 (p = 1, concentration, bounded weights). Let the edge weights of a triangle-free (D + 1)-regular graph (V, E) with |V| = N vertices be chosen from a distribution w with mean µ, supported on the interval [a, b], where |a|, |b|, |µ| are independent of D.

Lemma 4.1. If a (D + 1)-regular graph has girth > 2p + 1 and i.i.d. edge weights w drawn from w, then the QAOA objective for weighted MaxCut on this graph satisfies

Figure 1: Unweighted tree subgraph seen by QAOA from the edge {L, R} with p = 2 on a four-regular graph. The operation p produces the parent of a node, and the operation c produces the set of a node's immediate children.

D, implied by [7, Claims A.14 and A.15], where c(v) denotes the set of D immediate children of v.

Figure 2: a) The approximation ratios obtained with directly optimized parameters, the parameter setting method of Ref. [9], and the parameter setting methods presented in this work. b) The gap between the approximation ratios with optimized parameters and with the parameter setting methods of Ref. [9], (i), and (ii). The proposed parameter setting methods perform better than the prior work, as indicated by the reduced gap from the objective obtained with the optimized parameters.

Figure 3: Approximation ratio for graphs with edge weights drawn from a Cauchy distribution for N = 14. The proximity to the optimized-parameter scenario, especially for large p, indicates the power of the suggested parameter setting strategies and shows a clear improvement over the earlier work. Our methods reduce the optimality gap by a factor of 8 for p = 3 as compared to Ref. [9].

Figure 4: Energy landscape of xy-QAOA applied to a random portfolio optimization problem with (a,b) the original and (c) the rescaled objective. Rescaling improves the geometry of the landscape. To highlight the flatness of the unrescaled landscape, (b) plots the same landscape as (a), but using the color scheme of (c).

Figure 5: Performance profiles for BOBYQA run from a fixed initial point on 280 instances with between 7 and 20 qubits. Each line plots the ratio of instances solved within a given number of iterations. Rescaling the objective reduces the number of iterations until convergence to the target local optimum by 7.4x on average. Ignoring the weights of the linear terms, i.e. µ, leads to incorrect rescaling and to the optimizer failing to recover the optimum.