Quantum annealing initialization of the quantum approximate optimization algorithm

The quantum approximate optimization algorithm (QAOA) is a prospective near-term quantum algorithm due to its modest circuit depth and promising benchmarks. However, an external parameter optimization required in QAOA could become a performance bottleneck. This motivates studies of the optimization landscape and a search for heuristic ways of parameter initialization. In this work we visualize the optimization landscape of the QAOA applied to the MaxCut problem on random graphs, demonstrating that random initialization of the QAOA is prone to converging to local minima with sub-optimal performance. We introduce an initialization of QAOA parameters based on the Trotterized quantum annealing (TQA) protocol, parameterized by the Trotter time step. We find that the TQA initialization circumvents the issue of false minima for a broad range of time steps, yielding the same performance as the best result out of an exponentially scaling number of random initializations. Moreover, we demonstrate that the optimal value of the time step coincides with the point of proliferation of Trotter errors in quantum annealing. Our results suggest practical ways of initializing QAOA protocols on near-term quantum devices and reveal new connections between QAOA and quantum annealing.


Stefan H. Sack: stefan.sack@ist.ac.at
Maksym Serbyn: maksym.serbyn@ist.ac.at

1 Introduction
Recent technological advances have led to a large number of implementations [1-4] of so-called Noisy Intermediate-Scale Quantum (NISQ) devices [5]. These machines, which allow the manipulation of a small number of imperfect qubits with limited coherence time, inspired the search for practical quantum algorithms. The quantum approximate optimization algorithm (QAOA) [6] has emerged as a promising candidate for such NISQ devices [7][8][9].
The QAOA is a variational hybrid quantum algorithm in which a classical computer operates a NISQ device. The computer is responsible for the optimization of the cost function over a set of variational parameters. The cost function is calculated using a NISQ device that prepares a quantum state corresponding to the chosen parameters and performs quantum measurements. In QAOA of depth $p$ the wave function is prepared by a unitary circuit parametrized by $2p$ parameters, see Fig. 1(a). Each of the $p$ layers consists of two unitaries: the first is generated by a classical Hamiltonian $H_C$ that encodes the cost function of a combinatorial optimization problem, and the second is generated by the mixing quantum Hamiltonian $H_B$. While the $p = 1$ limit of QAOA allows for analytic considerations and derivation of performance guarantees [6], subsequent work suggested that higher depth $p$ may be required in order to achieve a quantum advantage [8,10]. However, increasing $p$ leads to a progressively more complex optimization landscape that is characterized by a large number of sub-optimal local minima [7,9,11,12], see Fig. 1(c). The convergence of classical optimization algorithms to such sub-optimal solutions was demonstrated to be a potential bottleneck of QAOA performance, as finding a nearly optimal minimum usually requires a number of initializations of the classical optimization algorithm that is exponential in $p$ [6,7]. Note that the problem of sub-optimal local minima is different from that of barren plateaus [13,14], i.e. large regions in parameter space with vanishing gradients, since barren plateaus are associated with circuit depths $p$ polynomial in system size $N$ [15], beyond what is typically considered in the QAOA.
The complexity of the energy landscape of large-$p$ QAOA has motivated the search for heuristic ways of improving the convergence to (nearly) optimal values of the variational parameters. Recent work has demonstrated a concentration of the QAOA landscape for typical problem instances [16], which implies the existence of a typical landscape and hints that the same choice of variational parameters may work across different problem instances or sizes. A particular example of such a heuristic was proposed in Ref. [7], which constructs a good initialization for the QAOA at level $p + 1$ using the solution at level $p$, thus requiring a number of optimization runs that is polynomial in $p$. Other approaches, such as reusing parameters from similar graphs [12], using an initial state that encodes the solution of a relaxed problem [17], or utilizing machine learning techniques to predict QAOA parameters [18,19], were also proposed.
In this work we propose a different approach to the QAOA initialization, based on the relation between QAOA and the quantum annealing algorithm. Quantum annealing uses adiabatic time evolution to find the lowest energy state of $H_C$, but often requires an infeasibly long evolution time $T$ [20]. We explore the observation that Trotterization of unitary evolution in quantum annealing provides a particular choice of parameters for the QAOA [6]. This leads us to introduce a one-parameter family of Trotterized quantum annealing (TQA) initializations for QAOA, controlled by the time step or, equivalently, the total time used in adiabatic evolution.
The central result of our work is the demonstration that TQA initialization for QAOA gives comparable performance to the search over an exponentially scaling number of random initializations. To this end, we establish that TQA initialization leads to convergence of the QAOA to a nearly optimal minimum for a certain range of time steps, see Fig. 1(c) for visualization. Furthermore, we identify the optimal time step of the TQA initialization and suggest a purely experimental way of fixing this parameter.
Our work reveals a connection between intermediate-$p$ QAOA and short-time quantum annealing. Previous studies [6][7][8] established a correspondence between quantum annealing with long annealing times and the QAOA protocol with large $p$ (potentially increasing exponentially with the problem size). More recent work proposed quantum annealing inspired initialization strategies for the so-called 'bang-bang' modification of the QAOA [21], which however also corresponds to large circuit depths. Our work differs from this context, since we establish that the best performance is achieved for a very coarse discretization of quantum annealing, resulting in a realistic circuit depth. We show the existence of an optimal step for TQA discretization that does not depend on problem size and QAOA depth. This suggests an intimate relation between QAOA and TQA, since the optimal value of the time step is in close correspondence to the point where proliferation of Trotter error occurs in TQA [22].
The remainder of the paper is organized as follows: in Section 2 we introduce the QAOA, visualize its optimization landscape, and show that most random initializations concentrate around sub-optimal local minima. Next, in Section 3 we discuss TQA and the corresponding initialization and show that it avoids converging to sub-optimal local minima. Finally, in Section 4 we summarize the results and discuss their implications and potential future work.

Figure 1: (a) The circuit that prepares a quantum state in the QAOA is parametrized by a set of $2p$ angles $\gamma_i, \beta_i$. For the MaxCut problem, which is considered in the main text, the unitaries can be expressed using single- and two-qubit gates that are readily available on current NISQ devices. (b) The optimization of $\langle H_C \rangle$ is launched from a certain guess of parameters; state preparation and measurements are iterated until the algorithm converges to a set of optimized angles $\gamma^*_i, \beta^*_i$. (c) A cartoon of the cost function landscape as a function of the variational parameters shows that random initializations are prone to converge to sub-optimal local minima. In contrast, the family of TQA initializations proposed in this work converges to the (nearly) optimal minimum.
2 Optimization landscape of QAOA

2.1 QAOA for MaxCut problems
As discussed in the Introduction, the QAOA is often applied to hard combinatorial optimization problems. In what follows we concentrate on the problem of finding a maximal cut (MaxCut) in a given graph, which has become one of the standard tasks used to benchmark the QAOA [7,9]. Finding the maximum cut is an NP-hard combinatorial optimization problem, though efficient classical algorithms exist that yield good approximate solutions. Notably, the Goemans-Williamson algorithm yields, in polynomial time, a cut whose expected size is at least about 87.8% of the size of the maximum cut [23].
Given a graph $G = (V, E)$ with vertices $V = \{1, 2, \ldots, N\}$ and edges $E = \{\langle i, j \rangle\}$, the maximal cut is defined as the partition that splits the vertices into two groups, maximizing the number of edges that connect vertices from different groups. Mathematically, such a partition amounts to finding the global minimum of the cost function $C(\vec{z}) = \sum_{\langle i,j \rangle \in E} z_i z_j$, where the binary variables $z_i$ correspond to the vertices of the graph, and their value $z_i = \pm 1$ encodes which group the given vertex $i$ belongs to. The cost function $C(\vec{z})$ can be mapped onto a classical spin Hamiltonian by promoting the binary variable $z_i$ to the quantum spin-1/2 operator $\sigma^z_i$. The resulting Hamiltonian,

$$H_C = \sum_{\langle i,j \rangle \in E} \sigma^z_i \sigma^z_j, \tag{1}$$

operates on $N$ spins that reside on the vertices $V$ of the corresponding graph and interact with each other when connected by an edge.

The QAOA uses a NISQ device to prepare the following quantum state [see Fig. 1(a)]:

$$|\vec{\gamma}, \vec{\beta}\rangle = \prod_{i=1}^{p} e^{-i \beta_i H_B} e^{-i \gamma_i H_C} |+\rangle^{\otimes N}, \tag{2}$$

where $H_C$ is the classical Hamiltonian introduced above, and $H_B = -\sum_{i=1}^{N} \sigma^x_i$ is the mixing Hamiltonian, as proposed by Farhi et al. [6]. Both operators act on the Hilbert space corresponding to $N$ spins or, equivalently, qubits, and the initial state $|0_B\rangle = |+\rangle^{\otimes N}$ corresponds to all qubits pointing along the $x$-direction, thus yielding the ground state of $H_B$. The variational parameters are obtained by minimizing the expectation value $\langle H_C \rangle_{\vec{\gamma}, \vec{\beta}} = \langle \vec{\gamma}, \vec{\beta} | H_C | \vec{\gamma}, \vec{\beta} \rangle$ as:

$$(\vec{\gamma}^*, \vec{\beta}^*) = \arg\min_{\vec{\gamma}, \vec{\beta}} \langle H_C \rangle_{\vec{\gamma}, \vec{\beta}}, \tag{3}$$

which is typically carried out with numerical optimization routines. To benchmark the QAOA it is useful to define the approximation ratio,

$$r_{\vec{\gamma}, \vec{\beta}} = \frac{\langle H_C \rangle_{\vec{\gamma}, \vec{\beta}}}{C_{\min}}, \tag{4}$$

which quantifies how close the expectation value of the classical Hamiltonian over the QAOA wave function is to the ground state energy of $H_C$, denoted as $C_{\min}$. For QAOA at depth $p = 1$ on 3-regular graphs the algorithm is guaranteed to find a cut that is at least 69% of the size of the optimal cut [6], while for $p > 1$ analytic results are limited [24]. The performance of the QAOA is typically investigated over an ensemble of graphs rather than an individual realization.
Below we focus on the ensemble of random 3-regular graphs, where each vertex is connected to three other vertices chosen at random. However, in the Appendix we also consider weighted 3-regular graphs and Erdős-Rényi graph ensembles in order to illustrate the general applicability of our results.
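The state preparation and approximation ratio described in this subsection can be illustrated with a small noiseless statevector simulation. The following is a minimal sketch, not the authors' code: it assumes $H_C = \sum_{\langle i,j\rangle} \sigma^z_i \sigma^z_j$, $H_B = -\sum_i \sigma^x_i$, layers of the form $e^{-i\beta H_B} e^{-i\gamma H_C}$ acting on $|+\rangle^{\otimes N}$, and the ratio $\langle H_C\rangle / C_{\min}$. The function names `maxcut_diagonal`, `apply_mixer`, `qaoa_state`, and `approximation_ratio` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def maxcut_diagonal(n, edges):
    """Diagonal of H_C = sum_<i,j> Z_i Z_j in the computational basis."""
    idx = np.arange(2 ** n)
    # z[k, q] = +1 if bit q of basis state k is 0, and -1 otherwise
    z = 1 - 2 * ((idx[:, None] >> np.arange(n)) & 1)
    diag = np.zeros(2 ** n)
    for i, j in edges:
        diag += z[:, i] * z[:, j]
    return diag

def apply_mixer(psi, beta, n):
    """Apply exp(-i beta H_B) with H_B = -sum_i X_i, i.e. exp(+i beta X) on every qubit."""
    c, s = np.cos(beta), 1j * np.sin(beta)
    psi = psi.reshape([2] * n)
    for q in range(n):
        psi = np.moveaxis(psi, q, 0)
        psi = np.stack([c * psi[0] + s * psi[1], s * psi[0] + c * psi[1]])
        psi = np.moveaxis(psi, 0, q)
    return psi.reshape(-1)

def qaoa_state(gammas, betas, diag, n):
    """Depth-p QAOA state: each layer applies exp(-i gamma H_C), then exp(-i beta H_B)."""
    psi = np.full(2 ** n, 2.0 ** (-n / 2), dtype=complex)  # |+>^n
    for g, b in zip(gammas, betas):
        psi = np.exp(-1j * g * diag) * psi
        psi = apply_mixer(psi, b, n)
    return psi

def approximation_ratio(gammas, betas, diag, n):
    """r = <H_C> / C_min, where C_min is the smallest eigenvalue of H_C."""
    psi = qaoa_state(gammas, betas, diag, n)
    return float(np.real(np.vdot(psi, diag * psi))) / diag.min()
```

For a single edge between two vertices, calling `approximation_ratio([np.pi / 4], [np.pi / 8], maxcut_diagonal(2, [(0, 1)]), 2)` evaluates the $p = 1$ ansatz at one particular pair of angles. The diagonal representation of $H_C$ keeps memory at $2^N$ numbers, which is adequate only for the small system sizes discussed here.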

2.2 Visualizing optimization landscape
The performance of the classical optimization in Eq. (3) strongly depends on the properties of the optimization landscape. While this landscape can be readily visualized for $p = 1$, the dependence of the approximation ratio $r_{\vec{\gamma}, \vec{\beta}}$ on the $2p$ angles parametrizing the QAOA was suggested to become progressively more complex for larger values of $p$. In order to visualize the properties of this high-dimensional landscape, we focus below on points where $1 - r_{\vec{\gamma}, \vec{\beta}}$ achieves (local) minima.
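We quantify properties of minima using two different characteristics. First, we measure the difference between the approximation ratio of the given minimum, characterized by angles $\vec{\gamma}, \vec{\beta}$, and that of the global minimum, characterized by angles $\vec{\gamma}^*, \vec{\beta}^*$: $\Delta r_{\vec{\gamma}, \vec{\beta}} = r_{\vec{\gamma}^*, \vec{\beta}^*} - r_{\vec{\gamma}, \vec{\beta}}$. This definition implies that the smallest possible value of $\Delta r_{\vec{\gamma}, \vec{\beta}}$ is 0, and larger values of $\Delta r_{\vec{\gamma}, \vec{\beta}}$ correspond to local minima with poor performance (i.e. a much larger value of the cost function) compared to the global minimum. The second characteristic measures the distance between minima in parameter space,

$$d_{\vec{\gamma}, \vec{\beta}} = \sum_{i=1}^{p} \left( |\gamma_i - \gamma^*_i|_{\pi} + |\beta_i - \beta^*_i|_{\pi/2} \right), \tag{5}$$

where $| \ldots |_{\alpha}$ denotes the absolute value modulo $\alpha$, which takes into account symmetries, see Appendix A.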
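The distance between two sets of angles can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes that the distance sums, over all layers, the absolute differences of the angles, with $\gamma_i$ compared modulo $\pi$ and $\beta_i$ compared modulo $\pi/2$ (the periods given in Appendix A for unweighted 3-regular graphs). The names `abs_mod` and `parameter_distance` are hypothetical.

```python
import numpy as np

def abs_mod(x, alpha):
    """|x|_alpha: distance from x to the nearest integer multiple of alpha."""
    r = np.mod(x, alpha)
    return np.minimum(r, alpha - r)

def parameter_distance(gammas, betas, gammas_ref, betas_ref):
    """Sum of angle differences, with gamma taken modulo pi and beta modulo pi/2."""
    dg = abs_mod(np.asarray(gammas) - np.asarray(gammas_ref), np.pi)
    db = abs_mod(np.asarray(betas) - np.asarray(betas_ref), np.pi / 2)
    return float(np.sum(dg) + np.sum(db))
```

With this convention, two parameter sets that differ only by a shift of some $\gamma_i$ by $\pi$ or some $\beta_i$ by $\pi/2$ are at distance zero, so symmetry-equivalent minima are not counted as distinct.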
We calculate values of $\Delta r_{\vec{\gamma}, \vec{\beta}}$ and $d_{\vec{\gamma}, \vec{\beta}}$ numerically. For a given graph realization we use $2^p$ different random initializations of the variational parameters $\vec{\gamma}, \vec{\beta}$ and optimize them using the iterative BFGS algorithm [25][26][27][28]. The algorithm is accessed via the scipy.optimize Python module with default parameters [29]. Convergence is achieved when the norm of the gradient is less than $10^{-5}$; the maximum number of iterations is set to $400p$, where $p$ is the QAOA depth. In our simulations the routine typically converged before using up the maximum number of allowed iterations. We use the converged angles with the lowest value of $1 - r_{\vec{\gamma}, \vec{\beta}}$ as an estimate for the global minimum $\gamma^*_i, \beta^*_i$. Figure 2 visualizes the structure of local minima via the joint probability distribution of $\Delta r_{\vec{\gamma}, \vec{\beta}}$ and $d_{\vec{\gamma}, \vec{\beta}}$ for 50 different graphs using Kernel Density Estimation [30,31]. We observe that for QAOA with $p = 5$ the most typical local minima reached from random initialization are far away from the best minimum (corresponding to $\Delta r_{\vec{\gamma}^*, \vec{\beta}^*} = 0$ and $d_{\vec{\gamma}^*, \vec{\beta}^*} = 0$) both in terms of the approximation ratio and the parameter values. While this figure illustrates a particular choice of system size and QAOA depth, a similar trend is observed for different $N$, $p$, and other graph ensembles, see Appendix A.
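The random-restart procedure described above can be sketched with `scipy.optimize.minimize`. This is a minimal illustration under several assumptions that are not taken from the source: a fixed 6-vertex 3-regular graph (the triangular prism) stands in for a random instance, gradients are left to SciPy's default finite differences, and the initial angles are drawn uniformly from $\gamma_i \in [-\pi/2, \pi/2)$, $\beta_i \in [-\pi/4, \pi/4)$. The helpers `maxcut_diagonal`, `energy`, and `random_restarts` are hypothetical names.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed example instance: triangular prism, a 3-regular graph on 6 vertices
EDGES = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (0, 3), (1, 4), (2, 5)]
N = 6

def maxcut_diagonal(n, edges):
    """Diagonal of H_C = sum_<i,j> Z_i Z_j in the computational basis."""
    idx = np.arange(2 ** n)
    z = 1 - 2 * ((idx[:, None] >> np.arange(n)) & 1)
    diag = np.zeros(2 ** n)
    for i, j in edges:
        diag += z[:, i] * z[:, j]
    return diag

def energy(params, diag, n):
    """<H_C> for QAOA angles params = (gamma_1..gamma_p, beta_1..beta_p)."""
    p = len(params) // 2
    psi = np.full(2 ** n, 2.0 ** (-n / 2), dtype=complex)  # |+>^n
    for g, b in zip(params[:p], params[p:]):
        psi = (np.exp(-1j * g * diag) * psi).reshape([2] * n)
        c, s = np.cos(b), 1j * np.sin(b)  # exp(-i b H_B) with H_B = -sum X
        for q in range(n):
            psi = np.moveaxis(psi, q, 0)
            psi = np.stack([c * psi[0] + s * psi[1], s * psi[0] + c * psi[1]])
            psi = np.moveaxis(psi, 0, q)
        psi = psi.reshape(-1)
    return float(np.real(np.vdot(psi, diag * psi)))

def random_restarts(p, diag, n, rng):
    """Run BFGS from 2^p random starting points; return (r_initial, r_final, angles)."""
    results = []
    for _ in range(2 ** p):
        x0 = np.concatenate([rng.uniform(-np.pi / 2, np.pi / 2, p),
                             rng.uniform(-np.pi / 4, np.pi / 4, p)])
        res = minimize(energy, x0, args=(diag, n), method="BFGS",
                       options={"gtol": 1e-5, "maxiter": 400 * p})
        results.append((energy(x0, diag, n) / diag.min(), res.fun / diag.min(), res.x))
    return results
```

The best entry of the returned list (largest final ratio) plays the role of the estimated global minimum; the remaining entries are the local minima whose $\Delta r$ and distance would enter a plot such as Fig. 2.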
The tendency of random initialization to converge to sub-optimal solutions highlights the importance of better initialization methods. In the next section we investigate a family of initializations inspired by quantum annealing and demonstrate that it achieves a good approximation ratio with a suitable choice of parameters.

Figure 2: Joint probability distribution of the distance to the global minimum in parameter space, $d_{\vec{\gamma}, \vec{\beta}}$, and in terms of approximation ratio, $\Delta r_{\vec{\gamma}, \vec{\beta}}$, reveals that the most probable outcome of random initialization is a convergence to sub-optimal local minima (yellow region). The orange dot corresponds to the average values of $d_{\vec{\gamma}, \vec{\beta}}$, $\Delta r_{\vec{\gamma}, \vec{\beta}}$ for random initialization. In contrast, TQA initialization leads to a local minimum with a better approximation ratio that occasionally outperforms the best random initialization (red square, shifted from slightly negative values to $\Delta r_{\vec{\gamma}, \vec{\beta}} = 0$ for improved visibility). The data is averaged over 50 random unweighted 3-regular graphs with $N = 12$ vertices and QAOA at level $p = 5$.
3 Trotterized quantum annealing as initialization

3.1 Optimal time for TQA
Quantum annealing [32,33] was among the first algorithms proposed for quantum computing [34,35], and was demonstrated to be universal for $T \to \infty$ and equivalent to digital quantum computing [36]. The general idea of quantum annealing is to prepare the ground state $|0_C\rangle$ of a classical Hamiltonian $H_C$ starting from the ground state $|0_B\rangle$ of the mixing Hamiltonian $H_B$ using adiabatic time evolution under the Hamiltonian

$$H(t) = \left(1 - \frac{t}{T}\right) H_B + \frac{t}{T} H_C.$$

Practical execution of quantum annealing on NISQ devices requires discretization to represent such unitary evolution via a sequence of gates, resulting in the TQA algorithm. The first-order Suzuki-Trotter decomposition approximates the time evolution over a short time step as

$$e^{-i \Delta t (H_B + H_C)} = e^{-i \Delta t H_B} e^{-i \Delta t H_C} + O(\Delta t^2).$$

Applying such a decomposition to the quantum annealing protocol that is uniformly discretized on a grid of evolution times $t_i = i \Delta t$ with $i = 1, \ldots, p$ and time step $\Delta t = T/p$, we obtain a unitary circuit equivalent to the depth-$p$ QAOA ansatz (2) with angles

$$\gamma_i = \frac{i}{p} \Delta t, \qquad \beta_i = \left(1 - \frac{i}{p}\right) \Delta t. \tag{6}$$

In what follows we refer to such a choice of angles as TQA initialization, controlled by the time step $\Delta t$ at a fixed depth $p$.

Figure 3: Top inset illustrates that optimal performance of TQA at time $T^*$ is followed by a rapid decrease in approximation ratio at longer times $T > T^*$. Data is shown for $N = 12$. Bottom inset shows finite size scaling of the time step $\delta t$, determined by the slope of the $T^*$ vs $p$ dependence, which assumes an approximately constant value with the graph size. All averaging is performed over 50 random instances of unweighted 3-regular graphs.
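The TQA angles for a given depth and time step can be generated in a few lines. The sketch below assumes a linear annealing schedule discretized at times $t_i = i \Delta t$ with $T = p \Delta t$, which gives $\gamma_i = (i/p) \Delta t$ and $\beta_i = (1 - i/p) \Delta t$; the function name `tqa_initialization` is a hypothetical helper, not an API from the source.

```python
import numpy as np

def tqa_initialization(p, dt):
    """Angles of a linear annealing ramp, Trotterized into p steps of size dt.

    Returns (gammas, betas) with gamma_i = (i / p) * dt and
    beta_i = (1 - i / p) * dt for i = 1, ..., p.
    """
    i = np.arange(1, p + 1)
    gammas = (i / p) * dt
    betas = (1 - i / p) * dt
    return gammas, betas
```

For example, `tqa_initialization(5, 0.75)` produces $\gamma$ angles that ramp up linearly to $0.75$ and $\beta$ angles that ramp down linearly to zero, so the last mixing layer acts trivially in this convention.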
The mapping between TQA and QAOA, along with the universality of quantum annealing for $T \to \infty$, was previously used as an argument for the existence of good QAOA protocols at depths $p \to \infty$ [6]. Typically the required evolution time of quantum annealing is inversely proportional to the square of the minimal energy gap, $T \propto \Delta^{-2}$, encountered in the Hamiltonian $H(t)$ over the time evolution. Numerous studies established that the time required for a good performance often blows up exponentially due to the encounter of gaps that are exponentially small in $N$ [20].
In contrast to previous studies, we investigate TQA performance in a different setting that is motivated by its subsequent usage as a QAOA initialization. The QAOA is characterized by a fixed circuit depth $p$. Therefore, we fix $p$ and study the performance of TQA as a function of the total time $T$ or, equivalently, the time step $\Delta t$, related as $T = p \Delta t$. Generally the performance of quantum annealing tends to increase with the total annealing time. However, at fixed $p$ a longer annealing time corresponds to a coarser discretization, which leads to larger Trotter errors that scale as $O(\Delta t^2)$ at small values of $\Delta t$. It is the interplay between increased efficiency and Trotter errors that leads to the existence of an optimal annealing time in the present setting. This is illustrated in Fig. 3 (top inset), where the approximation ratio for the TQA protocol increases with $T$ for small times, reaching a maximum at time $T^*$ followed by a sharp downturn. The sharp decrease of QA performance after $T^*$ was reported by Heyl et al. [22], who attributed it to a phase transition caused by a proliferation of Trotter errors.
The main panel of Fig. 3 reveals a linear scaling of the optimal time $T^*$ with the number of time steps $p$. This is equivalent to the existence of an optimal time step $\delta t$ that determines $T^*$ as

$$T^* = \delta t \, p. \tag{7}$$

The bottom inset in Fig. 3 shows that the time step $\delta t$, defined as the slope of a linear fit of $T^*$ with $p$, converges with the problem size $N$. This gives strong evidence that $\delta t$ is a well-defined quantity in the thermodynamic limit $N \to \infty$. For the family of 3-regular graphs considered here we observe that the optimal time step tends to the value $\delta t \approx 0.75$. The existence of an optimal time step that is of order one holds for three other graph ensembles, considered in Appendix B, although the numerical value of this time step depends on the specific graph ensemble. We use the TQA initialization in Eq. (6) with time step $\Delta t = 0.75$ for the QAOA and observe in Fig. 2 that it avoids the local minima and helps the QAOA to converge to a minimum that is very close to the global minimum in terms of approximation ratio. This result motivates the systematic analysis of the performance of the TQA initialization.

3.2 TQA initialization of QAOA
We continue with a detailed study of the TQA initialization defined in Eq. (6) as a function of time $T$ at fixed $p$. The green line in Fig. 4(a) reveals that the approximation ratio remains constant for a range of times, denoted as $[T^*_{\min}, T^*_{\max}]$. This figure shows results for $p = 5$ QAOA applied to graphs with $N = 12$ vertices, but a similar trend holds for other values of depth, problem sizes, and graph ensembles. The constant approximation ratio in a range of $T$ is naturally explained by the convergence of the parameter optimization routine to the same minimum for $T \in [T^*_{\min}, T^*_{\max}]$, see the cartoon in Fig. 1(c). In order to discriminate between different times in the above range, we study the distance between the initialization parameters and the optimized values of $\vec{\gamma}, \vec{\beta}$. The red line in Fig. 4(a) shows that this distance has a well-pronounced minimum at a time denoted as $T^*_d$ that is contained within the same interval. The TQA initialization at $T^*_d$ is thus closest to the local minimum achieved from it, in the sense of the distance defined in Eq. (5).
All three times $T^*_{\min}$, $T^*_{\max}$, and $T^*_d$ were defined above using the QAOA with fixed depth $p$. Figure 4(b) reveals that all three times scale approximately linearly with $p$. This allows us to define a range of time steps for the TQA initialization that yield the same performance of the optimized QAOA, $\Delta t \in [0.16, 0.92]$ for the present graph ensemble. Moreover, the time $T^*_d$ nearly coincides with the optimal TQA time $T^*_{\rm TQA} = \delta t \, p$ obtained in the previous section, implying that $\Delta t = \delta t = 0.75$ is the optimal value of the time step. This result also holds for the MaxCut problem on other graph families, see Appendix.
The similarity between the optimal time of the TQA protocol and the time at which the angular distance $d_{\vec{\gamma}, \vec{\beta}}$ between the initial and final protocol is minimized suggests that the performance of the QAOA is bounded by the same phase transition that occurs in TQA [22]. However, the QAOA is able to provide a significant improvement over TQA through additional optimization of the variational parameters. Recent work [7] suggested that such performance improvement may be due to the utilization of "diabatic pumps" that return the population from excited states back to the ground state. This could potentially explain the systematic deviation of the QA protocol from TQA initialization as seen in Fig. 8 in Appendix C.

Figure 5: Inset reveals that the comparable performance persists over the entire range of considered system sizes; circuit depth is $p = 10$. Averaging was performed over 50 random graphs.
Finally, we compare the performance of QAOA that used $2^p$ random initializations to the QAOA launched from TQA initialization with the optimal time step $\delta t$. Surprisingly, Fig. 5 shows that TQA initialization yields the same performance as the best result for random initialization even for QAOA protocols with depth comparable to the problem size $N$. Moreover, the inset of Fig. 5 illustrates that the excellent performance of TQA initialization holds true for a broad range of system sizes $N$, while Appendix D presents equally encouraging results for other graph ensembles. Note that the QAOA performance for fixed $p$ decreases with system size $N$, which was attributed to the fact that the QAOA with fixed $p$ cannot "probe" the whole graph. In order for the QAOA to achieve constant performance for increasing problem size $N$, the depth of QAOA should increase at least as $\log N$ [7].

4 Summary and discussion
Our central result is the establishment of a family of TQA initializations for the QAOA parametrized by a time step $\Delta t$. We find that TQA initialization allows the QAOA to find a solution close to the global optimum for a broad range of the parameter $\Delta t$. In this range our initialization scheme achieves results similar to the best outcome of $2^p$ random initializations, with a single optimization run. Moreover, we establish a heuristic way to identify the optimal $\Delta t$ for the TQA initialization from the performance of the TQA protocol.
Our results open the door to more time-efficient practical implementations of the QAOA on NISQ devices. To this end, we propose a two-step practical NISQ algorithm that capitalizes on the success of TQA initialization and uses the heuristic results to establish an optimal value of the time step. The first two steps of Algorithm 1 implement the TQA protocol on a NISQ device, thus obtaining an estimate for the optimal time in the TQA initialization. This can be readily carried out on today's NISQ devices [37]. The second part of the algorithm consists of running the QAOA optimization loop using values of variational parameters according to Eq. (6).
Algorithm 1 QAOA with TQA initialization
1: Implement QAOA ansatz with circuit depth $p$.

Numerical simulations presented above suggest good performance of the above algorithm in the idealized case when the presence of noise, gate errors, and other imperfections is neglected. Moreover, the fact that TQA initialization converges to a good minimum for a range of times (equivalently, time steps) $T \in [T^*_{\min}, T^*_{\max}]$, see Fig. 4, suggests that this algorithm has a high tolerance towards imperfections in determining the value of $\delta t$. Determining the performance of this algorithm on a real NISQ device or incorporating some of the imperfections into the numerical simulation remains an interesting open problem.
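The two-part procedure described above (first estimate the optimal time step from TQA performance, then run a single QAOA optimization from the corresponding angles) can be sketched in a noiseless statevector simulation. This is an illustrative sketch only: it assumes the linear-ramp angles $\gamma_i = (i/p)\Delta t$, $\beta_i = (1 - i/p)\Delta t$, a simple grid scan over the time step, and BFGS with finite-difference gradients. The names `maxcut_diagonal`, `energy`, `tqa_angles`, and `qaoa_with_tqa_init` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def maxcut_diagonal(n, edges):
    """Diagonal of H_C = sum_<i,j> Z_i Z_j in the computational basis."""
    idx = np.arange(2 ** n)
    z = 1 - 2 * ((idx[:, None] >> np.arange(n)) & 1)
    diag = np.zeros(2 ** n)
    for i, j in edges:
        diag += z[:, i] * z[:, j]
    return diag

def energy(params, diag, n):
    """<H_C> for QAOA angles params = (gamma_1..gamma_p, beta_1..beta_p)."""
    p = len(params) // 2
    psi = np.full(2 ** n, 2.0 ** (-n / 2), dtype=complex)  # |+>^n
    for g, b in zip(params[:p], params[p:]):
        psi = (np.exp(-1j * g * diag) * psi).reshape([2] * n)
        c, s = np.cos(b), 1j * np.sin(b)  # exp(-i b H_B) with H_B = -sum X
        for q in range(n):
            psi = np.moveaxis(psi, q, 0)
            psi = np.stack([c * psi[0] + s * psi[1], s * psi[0] + c * psi[1]])
            psi = np.moveaxis(psi, 0, q)
        psi = psi.reshape(-1)
    return float(np.real(np.vdot(psi, diag * psi)))

def tqa_angles(p, dt):
    """Linear-ramp angles, concatenated as (gammas, betas)."""
    i = np.arange(1, p + 1)
    return np.concatenate([(i / p) * dt, (1 - i / p) * dt])

def qaoa_with_tqa_init(p, diag, n, dt_grid):
    # Part 1: scan the time step and keep the one with the lowest TQA energy
    tqa_energies = [energy(tqa_angles(p, dt), diag, n) for dt in dt_grid]
    best_dt = float(dt_grid[int(np.argmin(tqa_energies))])
    # Part 2: a single optimization run launched from the TQA angles
    res = minimize(energy, tqa_angles(p, best_dt), args=(diag, n), method="BFGS",
                   options={"gtol": 1e-5, "maxiter": 400 * p})
    c_min = diag.min()
    return best_dt, min(tqa_energies) / c_min, res.fun / c_min, res.x
```

On hardware, the energy evaluations in both parts would be replaced by measured estimates of $\langle H_C \rangle$; the grid scan in the first part requires no classical optimization loop, only repeated state preparation at different time steps.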
In our studies we restricted our attention to the MaxCut problem and demonstrated the success of our approach for three different random graph ensembles. We expect that these results also hold for other graph ensembles, provided that the concentration of the QAOA landscape holds [16]. It is also interesting to check if our findings hold true beyond the MaxCut problem. Furthermore, it will be interesting to study the finite size scaling for problem sizes $N > 12$, beyond those considered here, using matrix product states [38] or neural-network quantum states [39,40].
In addition to practical NISQ algorithms, our findings suggest a previously unknown connection between the QAOA at relatively small circuit depth and quantum annealing. The fact that quantum annealing inspired initializations belong to a basin of attraction of a high-quality minimum in the QAOA landscape, see Fig. 1(c), invites a more comprehensive study of the QAOA landscape from this perspective. How many good quality minima typically exist in such a landscape? How different are they from each other and what are their basins of attraction? Can one use other information measures such as entanglement or Fisher information [41] to characterize the QAOA landscape? Finding answers to such questions may lead to other prospective families of QAOA initializations.
While TQA provides a good initialization, the subsequent QAOA optimization is able to significantly improve the performance. Understanding the underlying mechanisms of such performance improvement is an outstanding challenge. In particular, there remains an intriguing possibility that the QAOA optimization routine implements some of the techniques developed to improve the annealing fidelity, such as diabatic pumps [7], shortcuts to adiabaticity [42], and counterdiabatic driving [43,44]. The fact that the optimal time step coincides with the point of proliferation of Trotter errors [22], thus effectively taking the maximal possible value, suggests interesting parallels to Pontryagin's minimum principle considered in the context of variational quantum algorithms [45].
To conclude, we hope that TQA initialization of the QAOA established in this work will help to achieve practical quantum advantage by executing the QAOA on available devices and inspire future research that could lead to better understanding of what happens under the hood of QAOA optimization.

Data and code availability
Data is available upon reasonable request; a brief tutorial for the TQA initialization can be found in Ref. [46].

Acknowledgments
We would like to thank D. Abanin and R. Medina for fruitful discussions and A. Smith

A Optimization landscape for different graph ensembles
We start by reviewing all graph ensembles used in the main text and Appendices. In particular, we focus on symmetries that allow one to reduce the space of QAOA parameters.
3-regular unweighted graphs represent the graph ensemble considered in the main text. Each vertex is connected to exactly three other vertices chosen at random. In order to sample graphs from this ensemble we use the networkx Python package [47]. For 3-regular unweighted graphs the space of variational parameters can be restricted using the fact that the classical Hamiltonian has integer eigenvalues (thus $\gamma_i$ are defined modulo $\pi$) and that shifting any of the angles $\beta_i$ by $\pi/2$ is equivalent to a global spin flip, under which $H_C$ is invariant [7]. This allows us to restrict $\beta_i \in [-\frac{\pi}{4}, \frac{\pi}{4})$ and $\gamma_i \in [-\frac{\pi}{2}, \frac{\pi}{2})$, and is reflected in the definition of distance in Eq. (5) in the main text.
3-regular weighted graphs are characterized by the presence of random weights $w_{ij}$ assigned to each edge $\langle i, j \rangle$. These weights are chosen to be $w_{ij} \in [0, 1)$. The presence of random weights does not allow us to restrict the domain of the $\gamma_i$ angles as before, though the restriction $\beta_i \in [-\frac{\pi}{4}, \frac{\pi}{4})$ still works. Therefore the analogue of Eq. (5) for this and other weighted ensembles reads

$$d_{\vec{\gamma}, \vec{\beta}} = \sum_{i=1}^{p} \left( |\gamma_i - \gamma^*_i| + |\beta_i - \beta^*_i|_{\pi/2} \right).$$

Erdős-Rényi graphs represent a random graph ensemble where any two vertices are connected at random with a fixed probability, chosen to be $q = 0.5$. In contrast to the above examples, the fixed value of $q$ implies that the edge connectivity increases with the number of vertices as $qN$. Erdős-Rényi graphs exhibit the same symmetries as 3-regular unweighted graphs.
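The three ensembles above can be sampled with networkx. The sketch below is an assumption-laden illustration rather than the authors' sampling code: it uses `networkx.random_regular_graph` and `networkx.erdos_renyi_graph`, draws weights uniformly from $[0, 1)$ with NumPy, and stores them in an edge attribute named `weight`. The function name `sample_graph` and the ensemble labels are hypothetical.

```python
import networkx as nx
import numpy as np

def sample_graph(ensemble, n, seed):
    """Sample one graph from the named ensemble and attach edge weights."""
    rng = np.random.default_rng(seed)
    if ensemble == "3-regular":
        g = nx.random_regular_graph(3, n, seed=seed)
        weights = {e: 1.0 for e in g.edges()}
    elif ensemble == "3-regular-weighted":
        g = nx.random_regular_graph(3, n, seed=seed)
        # uniform weights in [0, 1), one independent draw per edge
        weights = {e: float(rng.uniform(0.0, 1.0)) for e in g.edges()}
    elif ensemble == "erdos-renyi":
        g = nx.erdos_renyi_graph(n, 0.5, seed=seed)
        weights = {e: 1.0 for e in g.edges()}
    else:
        raise ValueError(f"unknown ensemble: {ensemble}")
    nx.set_edge_attributes(g, weights, "weight")
    return g
```

A 3-regular graph requires an even number of vertices, since the number of edges is $3N/2$; for Erdős-Rényi graphs with $q = 0.5$ the number of edges fluctuates from sample to sample.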
The presence of an unbounded region of parameters $\gamma_i$ in the weighted graph ensemble represents an additional challenge in visualizing the QAOA optimization landscape and in the choice of initialization parameters. In order to explore the importance of large values of $|\gamma_i|$, we consider the sequence of enlarged intervals $\gamma_i \in [-k\frac{\pi}{2}, k\frac{\pi}{2})$ with $k = 1, 2$. Figure 6 shows the joint probability distributions similar to Fig. 2. We see that for 3-regular weighted graphs the enlarged initialization interval $k = 2$ leads to a concentration of local optima further away from the global solution compared to the $k = 1$ interval. When we repeat the same analysis for Erdős-Rényi graphs, we observe that $\Delta r_{\vec{\gamma}, \vec{\beta}}$ is unaffected by the enlarged $k = 2$ interval. This numerically confirms the symmetry considerations from above and allows us to restrict $\gamma$ to the $k = 1$ interval in all further analysis. For unweighted graphs such a restriction relies on symmetry, and for weighted graphs it is motivated by the fact that an extended region of $\gamma_i$ worsens the performance of random initialization in the QAOA.

B Optimal time for TQA
Figure 6: Comparing the joint probability distribution of the distance to the global minimum in parameter space, $d_{\vec{\gamma}, \vec{\beta}}$, and in terms of approximation ratio, $\Delta r_{\vec{\gamma}, \vec{\beta}}$, for weighted 3-regular (top) and Erdős-Rényi graphs with edge probability 0.5 (bottom) reveals that the distribution depends on the initialization interval for weighted 3-regular graphs. We initialize the parameters for $k = 1$ (left) and $k = 2$ (right) and observe that for weighted 3-regular graphs the enlarged interval leads to an increased spread of the local optima in $\Delta r_{\vec{\gamma}, \vec{\beta}}$ (yellow region). The spread in $\Delta r_{\vec{\gamma}, \vec{\beta}}$ for Erdős-Rényi graphs remains largely unaffected, as expected from the symmetry considerations. Similarly to Fig. 2, red squares correspond to the QAOA minimum achieved from TQA initialization (shifted from small negative values of $\Delta r_{\vec{\gamma}, \vec{\beta}}$ to zero for improved visibility), and orange dots correspond to the average performance of random initialization. Data is for 50 random graphs with $N = 10$ and $p = 5$.

Below we discuss the dependence of the optimal time step $\delta t$ of the TQA algorithm on the graph ensemble.
An analytical upper bound on the number of Trotter steps $p$ needed to approximate the time evolution with precision $\epsilon$ in terms of operator trace distance was obtained in Ref. [48]. Translating this bound into the scaling of $\delta t$ we obtain $\delta t \propto 1/(\|H_C\|_F N)$, where $\|H_C\|_F$ is the Frobenius norm of the classical Hamiltonian. This norm diverges exponentially with $N$, suggesting very small values of $\delta t$ at large system sizes. This is not surprising, since the bound of Ref. [48] operates on the distance between two many-body unitary operators. In contrast, the performance of the TQA algorithm is studied using the approximation ratio, which quantifies how close the expectation value of the local observable $H_C$ is to the ground state energy.
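The exponential growth of the Frobenius norm can be made explicit for the unweighted MaxCut Hamiltonian. The short derivation below is a sketch that assumes $H_C = \sum_{\langle i,j \rangle \in E} \sigma^z_i \sigma^z_j$ and the unnormalized convention $\|A\|_F = \sqrt{\mathrm{Tr}(A^\dagger A)}$; other normalizations of the norm would change the prefactor.

```latex
\begin{align}
\|H_C\|_F^2 &= \mathrm{Tr}\, H_C^2
  = \sum_{\langle i,j \rangle \in E} \; \sum_{\langle k,l \rangle \in E}
    \mathrm{Tr}\!\left( \sigma^z_i \sigma^z_j \sigma^z_k \sigma^z_l \right) \\
 &= \sum_{\langle i,j \rangle \in E} \mathrm{Tr}\, \mathbb{1}
  = 2^N |E| ,
\end{align}
% Terms with two distinct edges are non-trivial Pauli strings and are traceless;
% only the |E| terms with identical edges survive, each contributing Tr(1) = 2^N.
% For a 3-regular graph |E| = 3N/2, hence
\begin{equation}
\|H_C\|_F = 2^{N/2} \sqrt{3N/2} .
\end{equation}
```

Under these assumptions the norm grows as $2^{N/2}$ up to a factor polynomial in $N$, which is the origin of the vanishing time step implied by the operator-distance bound.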
The effect of Trotterization on local observables was considered in Ref. [22]. This work conjectured the existence of a finite value of the time step of order one, at which the discretization of time evolution fails to approximate the local observables. This value of the time step may be related to the convergence radius of the Baker-Campbell-Hausdorff series expansion, which is governed by the norm of the classical Hamiltonian and its commutator with $H_B$. Phenomenologically, the Frobenius norm divided by the square root of Hilbert space dimension and problem size $N$ is expected to be $N$-independent in the thermodynamic limit. Figure 7 compares the dependence of $\delta t$ on the system size with the phenomenological scaling $s_N$ defined in Eq. (8). We observe that the expression $s_N$ qualitatively matches the numerical scaling that we observe for $\delta t$ between different graph ensembles. In particular, the value of the time step is largest for weighted 3-regular graphs, which are expected to have the smallest norm of the classical Hamiltonian. However, $s_N$ fails to capture $\delta t$ quantitatively, highlighting the need to develop a better analytical understanding of the point that governs the phase transition from localization to quantum chaos for local observables according to Ref. [22].

C Patterns in optimized parameters
The QAOA is inspired by TQA and is thus universal for p → ∞. However, for finite p the converged QAOA parameters also display stark similarity to a QA protocol which was noticed in some earlier works [7,8]. In Fig. 8 we compare the TQA initialization and final QAOA parameters. The QAOA parameters show only slight alterations at the beginning of the protocol and remain close to their original values throughout the rest of the protocol. This holds true for the three graph types that we considered in our analysis. In addition, the small variation between optimal parameters for different graph instances is in line with the concentration of the QAOA landscape demonstrated analytically at low p in Ref. [16].

D Random vs TQA initialization for other graph ensembles
In addition to the unweighted 3-regular graphs discussed in the main text, we also test TQA initialization on weighted 3-regular graphs and Erdős-Rényi graphs. We find that TQA initialization yields the same performance as the best of the random initializations for weighted 3-regular graphs, see Fig. 9. For Erdős-Rényi graphs, TQA initialization even outperforms the best of $2^p$ random initializations.

Figure 9: TQA initialization leads to the same QAOA performance as the best of $2^p$ random initializations for both weighted 3-regular graphs (top) and Erdős-Rényi graphs (bottom). We average the results over 50 graph realizations; the main plot was obtained for system size $N = 10$, and the inset is for circuit depth $p = 10$.