De-Signing Hamiltonians for Quantum Adiabatic Optimization

Quantum fluctuations driven by non-stoquastic Hamiltonians have been conjectured to be an important and perhaps essential missing ingredient for achieving a quantum advantage with adiabatic optimization. We introduce a transformation that maps every non-stoquastic adiabatic path ending in a classical Hamiltonian to a corresponding stoquastic adiabatic path by appropriately adjusting the phase of each matrix entry in the computational basis. We compare the spectral gaps of these adiabatic paths and find both theoretically and numerically that the paths based on non-stoquastic Hamiltonians have generically smaller spectral gaps between the ground and first excited states, suggesting they are less useful than stoquastic Hamiltonians for quantum adiabatic optimization. These results apply to any adiabatic algorithm which interpolates to a final Hamiltonian that is diagonal in the computational basis.


I. INTRODUCTION
Quantum adiabatic optimization (QAO) [1][2][3][4][5] is a quantum algorithm that uses adiabatic evolution in Hamiltonian ground states to solve classical combinatorial optimization problems. The computational cost of the algorithm is given by the scaling with system size of the adiabatic condition [6][7][8][9][10], which in turn depends on the minimum spectral gap encountered along the adiabatic path. While QAO is able to reproduce the known oracular speedup for quantum search [11], the method has not yet theoretically or empirically demonstrated an advantage over state-of-the-art classical algorithms in optimization [12][13][14][15].
The standard proposal for QAO uses an adiabatic path defined by a one-parameter family of Hamiltonians $\{H(s)\}_{s \in [0,1]}$ that are all stoquastic, meaning that $H(s)$ has real and non-positive off-diagonal matrix elements in the computational basis for each $s \in [0, 1]$ [16]. Empirically it has been observed that this stoquastic QAO can in many cases be efficiently classically simulated by quantum Monte Carlo (QMC) methods [17], meaning that QMC simulations are able to effectively follow the instantaneous ground state of the interpolating Hamiltonian with the same computational cost scaling as QAO.
(QMC simulations can approximately sample equilibrium states in the computational basis and estimate the expectation values of local observables.) There are now several rigorous results on polynomial-time QMC simulations for various classes of equilibrium states of stoquastic Hamiltonians [16,18-22]; however, there are also counterexamples where QMC methods fail to converge efficiently [23][24][25].
In terms of computational complexity, estimating the ground state energy of a stoquastic Hamiltonian can be done in the complexity class AM, which has a classical prover and classical verifier [26], whereas estimating the ground state energy of general local Hamiltonians is QMA-complete [27]. Similarly, stoquastic adiabatic computation cannot be universal for quantum computing unless the Polynomial Hierarchy collapses [28]. Therefore, a significant open question is whether 'non-stoquastic' Hamiltonians (which have positive or non-real off-diagonal elements in every choice of local basis [16,29-31]) could improve the performance of QAO [32][33][34][35][36][37]. Adiabatic computation based on non-stoquastic Hamiltonians can be universal for quantum computing [38,39] and is not efficiently simulable by QMC due to the sign problem [29,40-42].
In this work we define a locality-preserving mapping which takes every non-stoquastic QAO protocol to a corresponding stoquastic QAO protocol. Considering various ensembles (dense matrices, signed graphs, and local Hamiltonians) we find that the non-stoquastic adiabatic paths have smaller spectral gaps than the corresponding stoquastic adiabatic paths, with high probability. Using random matrix theory, spectral graph theory, and other analytical techniques we develop an explanation of this phenomenon based on the low-energy spectrum of stoquastic and non-stoquastic Hamiltonians.

II. BACKGROUND AND OVERVIEW
In QAO, we assume that the global optima of discrete classical optimization problems are encoded in the ground states of a problem Hamiltonian $H_p$ that is diagonal in the computational basis [4,12-15,43]. To reach a minimizing configuration of $H_p$, the system is initially prepared in the ground state of a non-diagonal 'driver' Hamiltonian $H_d$, with $[H_p, H_d] \neq 0$. The total Hamiltonian $H(s)$ of the system interpolates between $H_d$ and $H_p$, with $s$ being the interpolation parameter. (We suppress the dependence on $s$ unless otherwise noted.) If this process is carried out sufficiently slowly, the adiabatic theorem of quantum mechanics [8-10,44] ensures that the system will stay close to the ground state of the instantaneous Hamiltonian throughout the evolution, such that one obtains a state close to the ground state of $H_p$ by the end of the interpolation. For the adiabatic approximation to hold it suffices for the running time of the algorithm to be inversely proportional to a low power of the minimum gap (the difference between the second lowest and lowest eigenvalues of the Hamiltonian) throughout the evolution [6][7][8][9][10].
To compare the performance of QAO driven by stoquastic and non-stoquastic fluctuations, we associate every non-stoquastic interpolating $n$-qubit Hamiltonian $H = \sum_{i,j \in \{0,1\}^n} H_{ij} |i\rangle\langle j|$ to a 'stoquasticized' version of it, i.e., a stoquastic Hamiltonian whose locality structure is identical to that of the non-stoquastic one. We will consider two types of stoquastization. In the first, which we refer to as 'de-signed' stoquastization, the de-signed stoquastic Hamiltonian is
$$H^{(\text{de-signed})} = \sum_{i} H_{ii} |i\rangle\langle i| - \sum_{i \neq j} |H_{ij}| \, |i\rangle\langle j|. \qquad (1)$$
The de-signing operation leaves the diagonal entries (corresponding to the classical cost function in QAO) unchanged, while adjusting the phases of the off-diagonal elements so that $H^{(\text{de-signed})}$ is stoquastic in the computational basis. Note that the transformation does not change the locality structure of the Hamiltonian, so $H^{(\text{de-signed})}$ is as easy to construct in terms of products of Pauli operators as the original Hamiltonian $H$. A second form of stoquastization that we will consider is based on an additive shift, which we refer to as 'shifting.' We primarily consider shifting in the context of dense random matrices with i.i.d. entries, where it takes the form
$$H^{(\text{shifted})} = \frac{1}{2}\left(H - \max_{i,j}|H_{ij}| \, |s\rangle\langle s|\right), \qquad (3)$$
where $|s\rangle = \sum_{z \in \{0,1\}^n} |z\rangle$ is the unnormalized uniform superposition. Similar to $H$, both $H^{(\text{de-signed})}$ and $H^{(\text{shifted})}$ define a path through the space of Hamiltonians that ends at the same point $H_p$. We ask the natural question: which of the Hamiltonian paths has a larger minimum spectral gap, the non-stoquastic or the stoquasticized Hamiltonian?
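The two stoquastizations can be illustrated numerically. The following is a minimal sketch, not the authors' code: the function names `de_sign`, `shift`, and `is_stoquastic` are hypothetical, and the form of the shift, $(H - \max_{i,j}|H_{ij}|\,|s\rangle\langle s|)/2$, is an assumption inferred from the rank-one shift and the rule $\alpha_{ij} \to \frac{1}{2}(\alpha_{ij} - 1)$ used later in the text.

```python
import numpy as np

def de_sign(H):
    """Keep the diagonal; replace each off-diagonal entry H_ij by -|H_ij|."""
    H = np.asarray(H)
    diag = np.diag(np.diag(H).real)
    off = H - np.diag(np.diag(H))
    return diag - np.abs(off)

def shift(H):
    """Assumed additive shift for real H: (H - max|H_ij| * |s><s|) / 2,
    where |s><s| is the all-ones matrix (unnormalized uniform superposition)."""
    H = np.asarray(H, dtype=float)
    N = H.shape[0]
    return 0.5 * (H - np.max(np.abs(H)) * np.ones((N, N)))

def is_stoquastic(H, tol=1e-12):
    """True if all off-diagonal entries are real and non-positive."""
    H = np.asarray(H)
    off = H - np.diag(np.diag(H))
    return bool(np.all(np.abs(off.imag) <= tol) and np.all(off.real <= tol))

# A random real symmetric matrix with entries in [-1, 1] (illustrative only).
rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(8, 8))
H = (A + A.T) / 2

H_ds = de_sign(H)   # diagonal unchanged, off-diagonals made non-positive
H_sh = shift(H)     # every entry moved into [-max|H_ij|, 0]
```

Note that de-signing leaves the diagonal (the classical cost function) untouched, whereas the shift as assumed here also changes the diagonal entries.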
To answer this question, we provide a series of results that cover different cases, all of which favor a larger spectral gap for the stoquasticized Hamiltonians. (i) For a random $N \times N$ Hermitian matrix $H$ with i.i.d. entries we prove that $\Delta_{H^{(\text{de-signed})}}$ is larger than $\Delta_H$ by a factor of $N^{3/2}$, with probability $1 - \exp(-\mathrm{poly}(n))$, where $\Delta$ denotes the energy gap between the ground state and first excited state. (ii) For local Hamiltonians we first prove that if the total Hamiltonian $H$ is diagonal in the X basis, then the stoquastization in the Z basis, $H^{(\text{de-signed})}$, always obeys $\Delta_{H^{(\text{de-signed})}} \ge \Delta_H$. (iii) To show that these properties extend throughout the interpolation path, we present numerical simulations of QAO up to $n = 20$ qubits which demonstrate that the fraction of QAO Hamiltonian paths with a larger minimum spectral gap than $H^{(\text{de-signed})}$ or $H^{(\text{shifted})}$ rapidly goes to zero with $n$. (iv) One may also consider quantum annealing algorithms which do not necessarily satisfy the adiabatic approximation, and for these we show by simulations of Schrödinger time-evolution up to $n = 20$ qubits that the time-to-solution metric [45] also favors stoquastic Hamiltonians. (v) Finally, we apply techniques from spectral graph theory called signed graph Cheeger inequalities [46] to derive new theorems relating the low-energy spectrum of signed (non-stoquastic) and unsigned (stoquasticized) versions of a graph. Although these theorems do not fully capture the quantitative results that we find in the numerical simulations, we believe they present a valuable intuitive explanation for why spectral gaps of non-stoquastic Hamiltonians tend to be smaller than those of their stoquastic counterparts. We begin with a couple of preliminary observations.

A. Preliminary Observations
Our first observation is that de-signing never increases the ground state energy.
Observation 1 For any $H$, $H^{(\text{de-signed})}$ with ground state energies $E_0$ and $E_0^{(\text{de-signed})}$ respectively, the ground state energies satisfy
$$E_0^{(\text{de-signed})} \le E_0, \qquad (4)$$
with equality holding if and only if there exists a unitary transformation $U$ such that $U H U^\dagger = H^{(\text{de-signed})}$, where $U$ is diagonal in the basis of the stoquastization.
This bound [Eq. (4)] is proven by using the variational method with a ground state ansatz for $H^{(\text{de-signed})}$ that is formed from the absolute value of the components of the ground state of $H$. (Details of the proof are given in Appendix A.) In fact, this relationship between the ground state energies has previously been studied in a different but related context. In Ref. [47] the authors consider many-body systems consisting of particles hopping on a graph, and interacting by a potential (i.e., Hubbard models on a general graph). In our terminology, they prove that if the hopping part of the Hamiltonian is stoquastic then bosons always have a strictly lower energy than fermions. However, if the hopping Hamiltonian is non-stoquastic then fermions can have a lower ground energy (note that the spectral gaps of these systems are not considered in [47]). While the de-signed Hamiltonian $H^{(\text{de-signed})}$ never has a higher ground state energy, it is easy to generate examples in which the first excited energies satisfy $E_1^{(\text{de-signed})} < E_1$. Therefore, showing that $\Delta^{(\text{de-signed})} \ge \Delta$ with high probability requires a more sophisticated understanding of how much the first excited energy can be decreased by de-signing.
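The variational step behind Eq. (4) can be sketched as follows. Here $|\psi\rangle = \sum_i \psi_i |i\rangle$ denotes the normalized ground state of $H$ and $|\phi\rangle = \sum_i |\psi_i|\,|i\rangle$ is the trial state; the symbols $\psi_i$ and $\phi$ are introduced only for this illustration, and the full proof is the one in Appendix A.

```latex
\begin{align}
E_0^{(\text{de-signed})}
 &\le \langle \phi | H^{(\text{de-signed})} | \phi \rangle
  = \sum_i H_{ii} |\psi_i|^2 - \sum_{i \neq j} |H_{ij}|\,|\psi_i|\,|\psi_j| \\
 &\le \sum_i H_{ii} |\psi_i|^2 + \sum_{i \neq j} H_{ij}\, \psi_i^{*} \psi_j
  = \langle \psi | H | \psi \rangle = E_0 .
\end{align}
```

The first inequality is the variational principle applied to $H^{(\text{de-signed})}$; the second uses that the off-diagonal sum for $H$ is real and bounded below by minus the sum of the magnitudes of its terms.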
Note that we can always shift our non-stoquastic Hamiltonian to obtain a positive semidefinite matrix $G = \|H\| I - H$, with the ground state of $H$ becoming the principal eigenvector of $G$ (here $\|H\|$ denotes the operator norm of $H$). Therefore we can reformulate our problem in terms of the positive semidefinite matrix $G$ and its counterpart $G^+$, the non-negative matrix formed by taking the entrywise absolute values of $G$. We now compare the spectral gap between the largest and next largest eigenvalues of $G$ and $G^+$, which leads to the following observation:
Observation 2 Let $G$, $G^+$ be as above. Then $\|G^+\| \ge \|G\|$ by Eq. (4), but $\mathrm{Tr}(G) = \mathrm{Tr}(G^+)$, so at least some of the eigenvalue gaps in the spectrum of $G$ must be smaller than the corresponding gaps in $G^+$.
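Observation 2 can be checked numerically on a small example. The sketch below assumes a real symmetric $H$ with entries in $[-1, 1]$ (an illustrative choice) and uses hypothetical variable names; it only verifies the trace identity and the ordering of the largest eigenvalues, not where in the spectrum the compression occurs.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
A = rng.uniform(-1, 1, size=(N, N))
H = (A + A.T) / 2                      # random real symmetric "Hamiltonian"

op_norm = np.linalg.norm(H, 2)         # operator (spectral) norm of H
G = op_norm * np.eye(N) - H            # positive semidefinite; ground state of H is its top eigenvector
G_plus = np.abs(G)                     # entrywise absolute value of G

# Eigenvalues sorted from largest to smallest.
ev_G = np.sort(np.linalg.eigvalsh(G))[::-1]
ev_Gp = np.sort(np.linalg.eigvalsh(G_plus))[::-1]

top_gap_G = ev_G[0] - ev_G[1]          # gap between the two largest eigenvalues of G
top_gap_Gp = ev_Gp[0] - ev_Gp[1]       # same gap for G_plus
print(top_gap_G, top_gap_Gp)
```

Because the diagonal of $G$ is already non-negative, taking entrywise absolute values leaves the trace unchanged, which is the source of the "compression" argument.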
In the remainder of this work we will provide theoretical and numerical evidence that this "compression" of the spectrum is concentrated in the low-energy sector.
Finally, our third preliminary observation pertains to the first-order perturbative correction that arises when a Hamiltonian $H_0$ that is stoquastic (in, say, the computational basis) is perturbed by a Hamiltonian $V$ with non-negative entries in the computational basis (therefore $V$ perturbs $H_0$ towards becoming non-stoquastic). If $|\psi_0^0\rangle$ is the ground state of $H_0$ with energy $E_0^0$, then $|\psi_0^0\rangle$ has all non-negative amplitudes in the computational basis. The perturbed Hamiltonian $H_s = H_0 + sV$ has ground energy $E_s^0 = E_0^0 + s \langle\psi_0^0|V|\psi_0^0\rangle$ to first order in $s$. Similarly, if $|\psi_0^1\rangle$ is the first excited state of $H_0$ with energy $E_0^1$, then to first order in $s$ the first excited energy of $H_s$ is $E_s^1 = E_0^1 + s \langle\psi_0^1|V|\psi_0^1\rangle$. Therefore, up to first order in $s$ we have
$$E_s^1 - E_s^0 = \left(E_0^1 - E_0^0\right) + s\left(\langle\psi_0^1|V|\psi_0^1\rangle - \langle\psi_0^0|V|\psi_0^0\rangle\right).$$
The fact that $V$ has non-negative entries, and $|\psi_0^0\rangle$ has non-negative amplitudes, means that $\langle\psi_0^0|V|\psi_0^0\rangle$ is non-negative (it is a sum of non-negative numbers). Meanwhile, $\langle\psi_0^1|V|\psi_0^1\rangle$ is a sum of numbers with mixed signs. This suggests that a perturbation in the direction of becoming non-stoquastic increases the ground energy, and may either increase or decrease the first excited energy, but likely by a lesser amount (due to the cancellation) unless the unperturbed first excited state aligns with the perturbation in a finely tuned way. Such an effect was demonstrated in Ref. [36], where an example was provided in which a non-stoquastic intermediate Hamiltonian appears to mitigate (the study was limited to $n \le 24$) a perturbative crossing [48] that the de-signed Hamiltonian could not.
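The first-order argument can be illustrated with a small numerical experiment. This is a sketch under stated assumptions: $H_0$ is taken to be a dense matrix with all entries in $[-1, 0]$ and $V$ a dense matrix with all entries in $[0, 1]$, both chosen only for illustration, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 32

B = rng.uniform(0, 1, size=(N, N))
H0 = -(B + B.T) / 2                    # stoquastic: every entry is non-positive
C = rng.uniform(0, 1, size=(N, N))
V = (C + C.T) / 2                      # perturbation with non-negative entries

E, U = np.linalg.eigh(H0)              # eigenvalues in ascending order
psi0, psi1 = U[:, 0], U[:, 1]          # ground and first excited states of H0

shift0 = psi0 @ V @ psi0               # first-order shift of the ground energy
shift1 = psi1 @ V @ psi1               # first-order shift of the first excited energy

s = 1e-6
gap_unperturbed = E[1] - E[0]
gap_first_order = gap_unperturbed + s * (shift1 - shift0)
Es = np.linalg.eigvalsh(H0 + s * V)
gap_exact = Es[1] - Es[0]
print(shift0, shift1, gap_exact - gap_unperturbed)
```

In this dense example the ground state is close to the uniform superposition, so its first-order shift is much larger than that of the first excited state and the gap shrinks as $s$ is turned on.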

III. RIGOROUS RESULTS
We rigorously compare general Hamiltonians to their de-signed and shifted counterparts in three complementary regimes: random dense matrices with i.i.d. entries (compared to their shifted counterparts), Hamiltonians which are diagonal in the X basis compared to their counterparts which have been de-signed in the Z basis, and non-stoquastic Hamiltonians corresponding to signed graph Laplacians, which are compared to their de-signed counterparts (i.e., the underlying unsigned graphs).

A. Random matrices
In this section we apply results from random matrix theory to compare the spectral gap (between the largest two eigenvalues) of N × N random real symmetric matrices to the corresponding spectral gap of their shifted counterparts (which form an ensemble of symmetric matrices with non-negative entries). The random matrices we consider have i.i.d. entries, and therefore these matrices are dense with non-zero entries, almost surely. For the real symmetric matrices we will consider a distribution W of non-Gaussian Wigner matrices (which are defined in general to have entries distributed with mean 0 and finite variance). In our case the matrices in W are defined to have entries that are uniformly distributed in [−1, 1], in order to use max i,j |H i,j | = 1 in Eq. (3).
We will compare these matrices to a distribution $W_+$ of random matrices with non-negative entries drawn uniformly from $[0, 1]$; these would be the 'shifted' matrices. The key observation that enables the analysis is that these two matrix distributions are related by a rank-1 shift matrix $A$ with $A_{ij} = 1$ for all $i, j = 1, \ldots, N$. Specifically, for every random matrix $\hat{W}_+$ drawn from $W_+$ there exists a $\hat{W}$ drawn from $W$ such that
$$\hat{W}_+ = \tfrac{1}{2}\left(\hat{W} + A\right). \qquad (5)$$
We note that similar arguments involving finite rank perturbations have previously appeared in the random matrix literature [49]; however, our application to comparing spectral gaps of stoquastic and non-stoquastic Hamiltonians is novel. We will apply a large deviation bound that can be found as Corollary 2.3.5 in Ref. [50], which we reproduce here for completeness.
Lemma. Suppose that the entries $\hat{w}_{ij}$ of $\hat{W}$ are independent, have mean zero, and are uniformly bounded in magnitude by 1. Then there exist absolute constants $C, c > 0$ such that
$$\Pr\left[\|\hat{W}\| > t\sqrt{N}\right] \le C \exp(-c\,t\,N) \qquad (6)$$
for all $t \ge C$.
This result states that the largest eigenvalue of $\hat{W}$ is $O(\sqrt{N})$ with probability exponentially close to 1. Even tighter bounds on $\|\hat{W}\|$ are known, but Eq. (6) has the advantage of a simple tail bound and suffices for our purposes.
Using Eqs. (5)(6) together with Weyl's inequality [51,52] allows us to show that the spectral gap of W + is Ω(N ) with high probability. Let the eigenvalues of and also
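The Weyl-inequality step can be sketched as follows. This is an illustrative reconstruction rather than the authors' derivation: it assumes the rank-one relation $\hat{W}_+ = \frac{1}{2}(\hat{W} + A)$ implied by the two entry distributions, orders eigenvalues as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$ (a convention chosen here), and uses that the all-ones matrix $A$ has eigenvalues $\lambda_1(A) = N$ and $\lambda_2(A) = \cdots = \lambda_N(A) = 0$.

```latex
\begin{align}
\lambda_1(\hat{W}_+) &\ge \tfrac{1}{2}\left[\lambda_1(A) + \lambda_N(\hat{W})\right]
  \ge \tfrac{1}{2}\left(N - \|\hat{W}\|\right), \\
\lambda_2(\hat{W}_+) &\le \tfrac{1}{2}\left[\lambda_2(A) + \lambda_1(\hat{W})\right]
  \le \tfrac{1}{2}\,\|\hat{W}\|, \\
\lambda_1(\hat{W}_+) - \lambda_2(\hat{W}_+) &\ge \tfrac{N}{2} - \|\hat{W}\|
  = \tfrac{N}{2} - O(\sqrt{N}) .
\end{align}
```

The last equality holds with probability exponentially close to 1 by the tail bound on $\|\hat{W}\|$ in the Lemma, which gives a gap of order $N$ for the non-negative ensemble.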

B. Gaps of X-stoquastic matrices
We next show that if $H$ is diagonal in the Pauli-X basis (and is therefore obviously stoquastic) then the spectral gap is not reduced by de-signing in the Z basis. To show this, let us denote the two lowest eigenvalues of $H$ by $\lambda_0$ and $\lambda_1$ and the two lowest eigenvalues of $H^{(\text{de-signed})}$ by $\lambda_0^{(\text{de-signed})}$ and $\lambda_1^{(\text{de-signed})}$. In our proof, we use a representation of $H$ [54] as a sum of products of Pauli-X operators with real coefficients $\alpha$. We further decouple the magnitudes of the various $\alpha$ parameters from their signs, and since both $H$ and $H^{(\text{de-signed})}$ are diagonal in the X basis, we can replace the Pauli operators $X_i$ with their bit representations. For convenience and without loss of generality we fix the constant term $\alpha$, so that the eigenvalues of $H$ and $H^{(\text{de-signed})}$ can be written as functions of bit strings, with $\oplus$ denoting bitwise XOR. We denote the ground and first excited state configurations of $H$ by $x^G_i$ and $x^E_i$, respectively, and similarly for $H^{(\text{de-signed})}$, and write the gap $\Delta$ of $H$ in terms of these configurations. We can upper-bound the gap $\Delta$ by replacing the first excited state configuration by a (possibly) different configuration (which may have a higher cost), giving a quantity $\tilde{\Delta} \ge \Delta$. Comparing $\tilde{\Delta}$ with the expression obtained for the de-signed eigenvalues shows that $\Delta \le \Delta^{(\text{de-signed})}$. Thus, the gaps of X-diagonal matrices are never decreased by de-signing in the Z basis.
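The statement of this section can be checked by brute force on small systems. The sketch below is an illustration, not the authors' proof: it assumes that an X-diagonal Hamiltonian is parametrized as $H = \sum_S \alpha_S \prod_{i \in S} X_i$ with real coefficients indexed by bit masks $S$, and the helper names are hypothetical.

```python
import numpy as np

def x_diagonal_hamiltonian(alpha):
    """Z-basis matrix of H = sum_S alpha[S] * prod_{i in S} X_i.

    alpha is a real vector of length 2^n indexed by the bit mask S;
    alpha[0] multiplies the identity.
    """
    N = len(alpha)
    H = np.zeros((N, N))
    z = np.arange(N)
    for S in range(N):
        H[z, z ^ S] += alpha[S]        # the X-string S flips the bits of z selected by S
    return H

def de_sign(H):
    """Keep the diagonal; replace each off-diagonal entry H_ij by -|H_ij|."""
    diag = np.diag(np.diag(H))
    return diag - np.abs(H - diag)

def gap(H):
    """Difference between the two lowest eigenvalues."""
    ev = np.linalg.eigvalsh(H)
    return ev[1] - ev[0]

rng = np.random.default_rng(2)
n = 4
results = []
for _ in range(200):
    alpha = rng.uniform(-1, 1, size=2 ** n)
    H = x_diagonal_hamiltonian(alpha)
    results.append((gap(H), gap(de_sign(H))))

# Largest amount by which the original gap exceeds the de-signed gap
# (expected to be zero up to rounding).
worst = max(g - g_ds for g, g_ds in results)
print(worst)
```

In this parametrization de-signing in the Z basis simply maps every non-identity coefficient $\alpha_S$ to $-|\alpha_S|$, which is why the comparison reduces to a statement about bit strings.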

C. Spectral graph theory
In this section, we consider signed graph Laplacians, which serve as an interesting class of non-stoquastic Hamiltonians (including some sparse and local Hamiltonians), and enable the use of signed graph Cheeger inequalities [46] to derive new theorems that relate the low-energy spectrum of a signed (non-stoquastic) graph Laplacian to that of its unsigned (stoquastic) counterpart. This work belongs to a line of progress in adapting discrete Cheeger inequalities [55] and related bounds to the setting of quantum Hamiltonians [56][57][58], including the recent adaptation of signed graph Cheeger inequalities [46,59,60] to non-stoquastic Hamiltonians [61,62]. The bounds we obtain are not probabilistic, but the drawback is that they also depend on some geometric properties of the associated ground states. Roughly speaking, the cases where the non-stoquastic spectral gap can be larger than the stoquastic spectral gap correspond to highly localized non-stoquastic ground states that are only nonzero in a small sector of the local basis. Despite this limitation, we present these results in part because they lead to an appealing intuitive explanation for the general observations we make about stoquastic vs. nonstoquastic spectra.
The Laplacian of an unsigned graph G = (V, E), where V = {v 1 , . .., v N } is a set of vertices and E ⊆ V × V is a set of edges between those vertices, is an N × N symmetric matrix L with the degree of each vertex along the diagonal, and off-diagonal matrix entries that are zero except for L ij = −1 corresponding to edges (v i , v j ) ∈ E (see appendix B for full details including the generalization to graphs with weighted edges). Therefore these unsigned graph Laplacians always correspond to stoquastic Hamiltonians. In a signed graph each edge (v i , v j ) is associated with a signature σ ij ∈ {+1, −1}, and the signed Laplacian matrix L σ has off-diagonal entries L σ ij = −σ ij , while the diagonal entries are again given by the (unsigned) degrees of the vertices. Since L σ can now contain positive off-diagonal elements, such signed graph Laplacians can be non-stoquastic Hamiltonians. Our goal is to compare the eigenvalues of a graph with signature σ, enumerated in nondecreasing order, to the eigenvalues of its unsigned counterpart. Since signed graph Laplacians can contain a sign problem, one may wonder about the conditions under which this problem might be cured. The natural class of transformations to consider in this context are of the form S † L σ S, where S is a signature matrix (a diagonal unitary with real entries). For this class of transformations it is possible to completely characterize the cases in which the non-stoquasticity can be cured, and these signed graphs are called "balanced." A signed graph is balanced if and only if the product of σ ij around every cycle in the graph is positive. Another equivalent characterization is that every subset S ⊆ V of the graph can be partitioned into two sets S = S 1 ∪ S 2 , with no negative edges (σ ij = −1) within the sets, and no positive edges between them. 
If we denote the number of positive edges between $S_1$ and $S_2$ by $|E^+(S_1, S_2)|$, and the number of negative edges within the sets by $|E^-(S_1)|$ and $|E^-(S_2)|$ respectively, then a balanced graph satisfies $|E^+(S_1, S_2)| + |E^-(S_1)| + |E^-(S_2)| = 0$ for a suitable bipartition of every subset $S$. It turns out that this idea of bipartitioning a subset of vertices $S \subseteq V$ into subsets $S_1$, $S_2$ leads to a useful quantity $F(S)$ called the (intensive) frustration index (note that we modify the notation of Ref. [46] to better suit a comparison of signed and unsigned graphs), which is defined in terms of these edge counts and normalized by $\mathrm{vol}(S)$, the sum of the vertex degrees in $S$.
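The notions of signed Laplacian and balance can be made concrete on a triangle. The following sketch uses hypothetical helper names; it relies on the standard fact from signed graph theory that the signed Laplacian of a connected graph has a zero eigenvalue exactly when the graph is balanced, and it tests balance by attempting the two-set partition described above (positive edges inside the sets, negative edges between them).

```python
import numpy as np

def signed_laplacian(n, signed_edges):
    """L^sigma: unsigned degrees on the diagonal, -sigma_ij off the diagonal."""
    L = np.zeros((n, n))
    for i, j, sigma in signed_edges:
        L[i, j] -= sigma
        L[j, i] -= sigma
        L[i, i] += 1
        L[j, j] += 1
    return L

def is_balanced(n, signed_edges):
    """Try to label vertices +1/-1 so that positive edges join equal labels
    and negative edges join opposite labels (all cycles then have positive product)."""
    adj = {v: [] for v in range(n)}
    for i, j, sigma in signed_edges:
        adj[i].append((j, sigma))
        adj[j].append((i, sigma))
    label = {}
    for start in range(n):
        if start in label:
            continue
        label[start] = 1
        stack = [start]
        while stack:
            u = stack.pop()
            for v, sigma in adj[u]:
                if v not in label:
                    label[v] = label[u] * sigma
                    stack.append(v)
                elif label[v] != label[u] * sigma:
                    return False       # a cycle with negative sign product was found
    return True

triangle_balanced = [(0, 1, -1), (1, 2, -1), (0, 2, +1)]    # cycle product +1
triangle_frustrated = [(0, 1, +1), (1, 2, +1), (0, 2, -1)]  # cycle product -1

lam_bal = np.linalg.eigvalsh(signed_laplacian(3, triangle_balanced))[0]
lam_fru = np.linalg.eigvalsh(signed_laplacian(3, triangle_frustrated))[0]
print(lam_bal, lam_fru)
```

The balanced triangle can be mapped to the unsigned triangle by a signature matrix, so its lowest eigenvalue vanishes, whereas the frustrated triangle has a strictly positive ground energy.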
Through the signed Cheeger inequalities we will see that the frustration index has an essential role in characterizing the low-energy spectrum of the graph. For unsigned graphs, Cheeger's inequality and its "multi-way" generalizations characterize the low-energy spectrum in terms of a quantity called the expansion, which detects bottlenecks in the graph. For any nonempty subset $S \subseteq V$ the expansion is defined by
$$\Phi(S) = \frac{|E(S, \bar{S})|}{\mathrm{vol}(S)},$$
where $|E(S, \bar{S})|$ denotes the number of (unsigned) edges leaving $S$. The expansion $\Phi(S)$ is also sometimes called the "bottleneck ratio", since it measures the size of the boundary of the set relative to the size of its interior. For any $k \ge 1$ the $k$-th order Cheeger constant is
$$h_k^+ = \min_{S_1, \ldots, S_k} \max_{1 \le i \le k} \Phi(S_i),$$
where the minimum runs over collections of $k$ disjoint nonempty subsets of $V$. In words, $S_1, \ldots, S_k$ is the sub-partition that minimizes the expansion of all the subsets, and $h_k^+$ is the largest expansion amongst those subsets in the sub-partition. Cheeger's inequality [Eq. (20)] relates $h_k^+$ to the $k$-th eigenvalue, up to a constant $C$ and the maximum (unsigned) degree $D_{\max}$ of any vertex. The conclusion from Eq. (20) is that the low-energy spectrum of an unsigned graph (and in fact any stoquastic Hamiltonian) is characterized by sub-partitioning the ground space into subsets of minimal expansion: one has $k$ small eigenvalues if and only if there are at least $k$ disjoint subsets with bottlenecks.
To generalize Cheeger's inequality to signed graphs, one defines a signed Cheeger constant that accounts for both the expansion and the frustration index of each subset, and the resulting generalization of Eq. (20) is Eq. (22). Therefore a signed graph has at least $k$ small eigenvalues above the ground state if and only if it can be sub-partitioned into at least $k$ disjoint subsets that all have low expansion and a small frustration index. The Cheeger inequalities already provide an argument why we might generically expect non-stoquastic Hamiltonians to have smaller spectral gaps. Consider a stoquastic spectrum and the associated Cheeger subsets (two subsets for $\lambda_2^+$, three subsets for $\lambda_3^+$, etc.). These subsets all try to minimize their expansion. What upper bounds on the non-stoquastic eigenvalues do we get by considering these subsets? The upper bound we obtain on $\lambda_1^\sigma$, $\lambda_2^\sigma$, $\lambda_3^\sigma$ is now increased by the frustration index of these subsets. But note that the subsets associated with $\lambda_2^+$ will usually be bigger than those associated with $\lambda_3^+$, and so in the typical case they will have a larger frustration index. This rough argument suggests that the ground state energy is pushed up most by phase frustration, the first excited energy is pushed up slightly less, etc. We illustrate this intuition in Fig. 1. This then provides an explanation for why non-stoquastic Hamiltonians tend to have smaller spectral gaps.
Next we apply these Cheeger inequalities to prove a new result that relates the spectral gap of a signed graph with its de-signed counterpart. We have already seen in Eq. (4) that $\lambda_1^+ \le \lambda_1^\sigma$ for all $\sigma$ (an unbalanced signature $\sigma$ cannot decrease the ground energy), and so we now seek to obtain an upper bound on $\lambda_2^\sigma$. To do this we can construct a variational excited state for $L^\sigma$ as follows. From $\lambda_1^\sigma$ and Eq. (22) we can upper-bound the expansion and the frustration index of the subset $\Omega$ corresponding to the support of the non-stoquastic ground state. From $\lambda_2^+$ and the unsigned Cheeger inequality Eq. (20) we also know there are subsets $S_1$ and $S_2$ with minimal expansion. Therefore we can upper-bound the expansion of the intersections $S_1' = S_1 \cap \Omega$ and $S_2' = S_2 \cap \Omega$ using Eq. (20), and upper-bound the frustration with respect to a bipartition $S_1' = V_1 \cup V_2$, $S_2' = V_3 \cup V_4$ inherited from the non-stoquastic ground state (see Fig. 2). These subsets are used to obtain the upper bound on $\lambda_2^\sigma$ in the following theorem.
Theorem 1 Let $L^\sigma$ be a signed graph Laplacian with ground state $\phi_1^\sigma$ with energy $\lambda_1^\sigma$. Define $\Omega := \{v \in V : \phi_1^\sigma(v) \neq 0\}$ and consider the unsigned Laplacian $L^+$ on the same graph. By Cheeger's inequality there must be a subset $S \subseteq V$ with $\Phi(S) \le \sqrt{2 D_{\max} \lambda_2^+}$, and for any subset $S$ with expansion $\Phi(S)$ satisfying this relation we have an upper bound on $\lambda_2^\sigma$ in terms of $\lambda_1^\sigma$, $\lambda_2^+$, and volume factors involving $S$ and $\Omega$.
The proof is presented in Appendix B, with the proof strategy illustrated in Fig. 2. However, using simpler techniques, we can also obtain the following theorem upper bounding $\lambda_2^+$.
Theorem 2 (converse bounds): For any graph $G$ and any signature $\sigma$, there is a corresponding upper bound on $\lambda_2^+$.
In the context of $n$-qubit local Hamiltonians with bounded interaction degree one always has $D_{\max} = O(n)$.
The fact that we have $\lambda_1^\sigma$ and $\lambda_2^+$ under square roots on the RHS weakens the bound, but in Appendix B we describe how improved Cheeger inequalities and standard assumptions about the low-energy spectra of QAO Hamiltonians (in particular, that there exists a gap somewhere in the low-energy spectrum) would eliminate these square roots and strengthen the bound. Finally, recall that $\Omega$ is the basis support set of the non-stoquastic ground state, and $S$ is the set that minimizes the bottleneck ratio in the ground state of the stoquastic Hamiltonian. Typically, $S$ is large and encompasses a constant fraction of the total number of vertices (for example, in a path graph with $N$ vertices $\mathrm{vol}(S) = N/2$, and similarly for a hypercube with $N$ vertices). Therefore in many cases this ratio of volume factors is $O(1)$, and together with the improved Cheeger inequality the bound becomes as tight as we expect. In contrast, there will also be some fraction of cases in which $\Omega$ or the intersection $\Omega \cap S$ is very small, and in these cases the bound predicts that the non-stoquastic first excited energy can be much larger than its stoquastic counterpart. These correspond to cases in which the stoquastic and non-stoquastic ground states are very different from one another.
In conclusion, if larger subsets tend to have (proportionally) larger phase frustration, then the stoquastic and non-stoquastic spectra are similar with generally larger gaps between the stoquastic eigenvalues. On the other hand, if the phase frustration is nonuniformly distributed then the non-stoquastic eigenstates can look very different from the stoquastic ones. Specifically, the subsets with small expansion will have extra large frustration, so that the sets that simultaneously minimize expansion and frustration look very different from the ones that minimize the expansion by itself. For these cases the stoquastic and non-stoquastic spectra can look almost arbitrarily different from one another.

D. Important caveat
Our results so far have concerned the spectrum of the Hamiltonian, but adiabatic evolutions are sometimes restricted to specific subspaces of the entire Hilbert space due to symmetries of the Hamiltonian. We will refer to such a subspace as the 'evolution subspace.' In this case, the relevant minimum gap for the adiabatic condition is not the energy gap between the ground state and first excited state of the total Hamiltonian but the energy gap between the lowest two energy states in the evolution subspace, which need not coincide with the ground state and first excited state of the Hamiltonian. It may then very well be that the non-stoquastic Hamiltonian gap is smaller than its stoquastic counterpart in the full spectrum, with the opposite being the case in the evolution subspace.

IV. NUMERICAL SIMULATIONS
To complete our discussion, in this section we provide numerical evidence in support of the analysis presented in the preceding sections. We demonstrate that the stoquasticized variants of various classes of random matrices become increasingly favorable relative to their non-stoquastic versions with increasing system size. We study the gaps of dense and local matrices as well as quantum annealing runtimes for a few classes of problems. In each case we consider both the de-signed and shifted stoquastizations.

A. Dense matrices
We start by considering random dense matrices. Here, we generate random (i) real-valued and (ii) complex-valued Hermitian matrices of different dimensions and compute their gaps. In the real case, matrix entries are drawn independently and uniformly from the range $[-1, 1]$, whereas in the complex case the sampling is carried out for the real and imaginary parts separately. To enforce Hermiticity, we average the generated matrices with their conjugate transposes. As a next step, we numerically calculate the gaps of the stoquasticized counterparts of the generated matrices. The figure of merit we focus on is the 'fraction of non-stoquastic wins', the fraction of occurrences for which the energy gap of the (generally) non-stoquastic matrix is strictly larger than that of its stoquasticized counterpart. We examine the behavior of this fraction as a function of matrix dimension. The results are summarized in Fig. 3. As is evident, the fraction decays with the matrix dimension, and already at $N = 10$ the fraction is very small, which is consistent with our analytical derivation in Sec. III A.
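The dense-matrix experiment described above can be sketched in a few lines. This is a minimal illustration, not the code used for Fig. 3: the sample sizes, matrix dimensions, helper names, and the small tolerance used to decide a "win" are all assumptions, and only the de-signed stoquastization is shown.

```python
import numpy as np

def de_sign(H):
    """Keep the diagonal; replace each off-diagonal entry H_ij by -|H_ij|."""
    diag = np.diag(np.diag(H).real)
    return diag - np.abs(H - np.diag(np.diag(H)))

def gap(H):
    """Difference between the two lowest eigenvalues of a Hermitian matrix."""
    ev = np.linalg.eigvalsh(H)
    return ev[1] - ev[0]

def random_hermitian(N, rng, complex_entries=False):
    """Entries uniform in [-1, 1] (real and imaginary parts drawn separately),
    then averaged with the conjugate transpose to enforce Hermiticity."""
    A = rng.uniform(-1, 1, size=(N, N))
    if complex_entries:
        A = A + 1j * rng.uniform(-1, 1, size=(N, N))
    return (A + A.conj().T) / 2

def fraction_of_wins(N, samples, rng, complex_entries=False, tol=1e-12):
    """Fraction of samples whose gap strictly exceeds the de-signed gap."""
    wins = 0
    for _ in range(samples):
        H = random_hermitian(N, rng, complex_entries)
        if gap(H) > gap(de_sign(H)) + tol:
            wins += 1
    return wins / samples

rng = np.random.default_rng(4)
fractions = {N: fraction_of_wins(N, 500, rng) for N in (4, 8, 16, 32)}
print(fractions)
```

Passing `complex_entries=True` to `fraction_of_wins` gives the complex Hermitian variant of the same experiment.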

B. Minimum gaps for Max-Cut
When written in terms of Pauli operators, dense matrices are highly non-local in nature and as such are not expected to be easily realizable in a physical implementation of QAO. We therefore also consider the more physical setting where the Hamiltonian has bounded locality. Furthermore, the efficiency of QAO is determined by the minimum gap along the entire interpolation from the initial to the final Hamiltonian. To address these two issues, we consider a time-dependent Hamiltonian of the form [35]:
$$H(s) = (1 - s) H_D + s(1 - s) H_C + s H_I, \qquad (23)$$
where $s$ is the dimensionless annealing parameter, $H_D$ is the initial 'driver' Hamiltonian, $H_I$ is the final Hamiltonian that encodes the computational problem, and $H_C$ is a catalyst Hamiltonian. We take $H_D$ to be the standard transverse-field Hamiltonian, $H_D = -\sum_i X_i$. We take $H_I$ to be an Ising Hamiltonian representing Max-Cut problem instances defined on random 3-regular graphs, $H_I = \sum_{\langle ij \rangle} Z_i Z_j$ [15,63]. This class of problem instances is known to be NP-hard (see, e.g., Refs. [15,64]). The Ising Hamiltonian has each spin coupled antiferromagnetically (with strength $J_{ij} = 1$) with exactly three other spins picked at random. To incorporate non-stoquasticity [32][33][34]65] we choose the catalyst Hamiltonian $H_C$ to be
$$H_C = \sum_{\langle ij \rangle} \alpha_{ij} X_i X_j,$$
with the same connectivities $\langle ij \rangle$ as $H_I$. The coefficients $\alpha_{ij}$ are chosen: (i) uniformly from $[-1, 1]$ and (ii) randomly with equal probability from the set $\{-1, 1\}$. For the catalyst above, de-signed stoquastization means $\alpha_{ij} \to -|\alpha_{ij}|$ whereas shifted stoquastization implies $\alpha_{ij} \to \frac{1}{2}(\alpha_{ij} - 1)$. Before proceeding, we note that as defined, the Hamiltonian Eq. (23) is invariant under the transformation $P = \prod_i X_i$, which we refer to as the global bit-flip symmetry of the Hamiltonian. Since the ground state of $H(0)$ is the uniform superposition state and has eigenvalue $+1$ under $P$, the evolution under $H(s)$ with this initial state is restricted to the $P = +1$ subspace [66,67]. This subspace is our evolution subspace (see Sec. III D), and we denote its energy eigenvalues by $\varepsilon_i(s)$ for $i = 0, 1, \ldots, 2^{n-1} - 1$. Therefore, the relevant gap for the QAO algorithm is the minimum gap in this subspace. This is relevant because the lowest energy state in the evolution subspace may not correspond to the global ground state, whose eigenvalue we denote by $E_0(s)$. We will see that this may have important consequences.
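The construction of the interpolating Hamiltonian and of the evolution subspace can be sketched for a single small instance. This is an illustration under stated assumptions, not the authors' simulation code: the graph is the complete bipartite graph $K_{3,3}$ (a fixed 3-regular graph chosen here because its Max-Cut solution is unique up to the global bit flip), the interpolation is assumed to be $(1-s)H_D + s(1-s)H_C + sH_I$ with an $X_iX_j$ catalyst, and all helper names are hypothetical.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def op(single, sites, n):
    """Tensor product acting with `single` on `sites` and identity elsewhere."""
    return reduce(np.kron, [single if q in sites else I2 for q in range(n)])

# Hypothetical instance: K_{3,3}, a 3-regular bipartite graph on 6 vertices.
n = 6
edges = [(i, j) for i in range(3) for j in range(3, 6)]

rng = np.random.default_rng(5)
alpha = rng.uniform(-1, 1, size=len(edges))     # non-stoquastic catalyst couplings

H_D = -sum(op(X, {i}, n) for i in range(n))     # transverse-field driver
H_I = sum(op(Z, {i, j}, n) for i, j in edges)   # antiferromagnetic Ising (Max-Cut)
XX = [op(X, {i, j}, n) for i, j in edges]
P = op(X, set(range(n)), n)                     # global bit-flip symmetry

def H_of_s(s, coeffs):
    # Assumed interpolation: (1 - s) H_D + s (1 - s) H_C + s H_I
    H_C = sum(c * M for c, M in zip(coeffs, XX))
    return (1 - s) * H_D + s * (1 - s) * H_C + s * H_I

# Isometry onto the P = +1 sector, spanned by (|z> + |z_flipped>) / sqrt(2).
dim = 2 ** n
Q = np.zeros((dim, dim // 2))
col = 0
for z in range(dim):
    z_flipped = z ^ (dim - 1)
    if z < z_flipped:
        Q[z, col] = Q[z_flipped, col] = 1 / np.sqrt(2)
        col += 1

def min_gap_in_sector(coeffs, grid=np.linspace(0, 1, 101)):
    """Minimum over the grid of the gap between the two lowest P = +1 levels."""
    gaps = []
    for s in grid:
        ev = np.linalg.eigvalsh(Q.T @ H_of_s(s, coeffs) @ Q)
        gaps.append(ev[1] - ev[0])
    return min(gaps)

gap_nonstoq = min_gap_in_sector(alpha)
gap_designed = min_gap_in_sector(-np.abs(alpha))       # alpha -> -|alpha|
gap_shifted = min_gap_in_sector(0.5 * (alpha - 1))     # alpha -> (alpha - 1) / 2
print(gap_nonstoq, gap_designed, gap_shifted)
```

A single instance like this says nothing about the statistics reported in the text; it only shows how the three paths and the restriction to the $P = +1$ sector can be set up.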
We study the minimum gaps of randomly generated $n$-spin problems with sizes ranging from $n = 6$ to $n = 20$. For each size we generate 100 random instances with a unique satisfying assignment (up to the global bit-flip symmetry) and further consider up to 100 non-stoquastic $\{\alpha_{ij}\}$ realizations for each Max-Cut instance, yielding a maximum of $10^4$ Hamiltonian realizations per problem size.
We first show in Fig. 4 that at the point s * in the interpolation where the minimum gap occurs, a nonnegligible fraction of the non-stoquastic instances have E 0 (s * ) < ε 0 (s * ), meaning the lowest energy state in the evolution subspace is not the global ground state. We therefore choose to distinguish between these two cases, when E 0 (s * ) = ε 0 (s * ) and when E 0 (s * ) < ε 0 (s * ). While the latter fraction appears to be more or less constant over the range of n studied, we cannot rule out the possibility that it will decay to zero at larger n values.
For every Hamiltonian instance, we inspect the minimum gap of $H(s)$ throughout the evolution and compare it against the minimum gap of its stoquasticized variants. The results are summarized in Fig. 5, which shows the 'fraction of non-stoquastic wins' as a function of system size. As both panels of the figure indicate, the fraction is typically very low, more so for the shifted case, where the $2\sigma$ error bars prevent us from concluding anything statistically significant. We therefore focus on the de-signed case. We first observe that the $\pm 1$ non-stoquastic instances exhibit measurably fewer wins despite having a larger fraction of cases with $E_0(s^*) < \varepsilon_0(s^*)$ (Fig. 4). For the uniform case, we observe that while the instances with $E_0(s^*) = \varepsilon_0(s^*)$ appear to have a constant fraction of wins as a function of system size, the fraction of wins for instances with $E_0(s^*) < \varepsilon_0(s^*)$ grows with system size $n$. While this result may indicate a positive trend, it is important to emphasize that when the non-stoquastic gap is compared against both the de-signed and shifted variants, the results are more in line with the shifted case, meaning that the shifted case almost always beats the non-stoquastic case.
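The 'fraction of non-stoquastic wins' can be tallied as in the following minimal sketch. The function name `win_fraction` is hypothetical, and the $2\sigma$ error bar uses a plain binomial standard error as an assumption; the text does not state how the error bars in Fig. 5 were obtained.

```python
import math

def win_fraction(gaps_nonstoq, gaps_stoq):
    """Fraction of instances whose non-stoquastic minimum gap exceeds the
    minimum gap of the stoquasticized variant, with a 2-sigma error bar.

    ASSUMPTION: the error bar is a binomial standard error, which may differ
    from the estimator used for the figures in the text.
    """
    total = len(gaps_nonstoq)
    wins = sum(1 for g_ns, g_st in zip(gaps_nonstoq, gaps_stoq) if g_ns > g_st)
    frac = wins / total
    two_sigma = 2.0 * math.sqrt(frac * (1.0 - frac) / total)
    return frac, two_sigma
```

The two input lists are paired per Hamiltonian realization, so the same Max-Cut instance and the same $\{\alpha_{ij}\}$ draw appear at the same index in both.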

C. Time evolution simulations
We note that the fraction of non-stoquastic wins as measured by the gaps (Fig. 5) does not necessarily translate to the same fraction of wins as measured by an optimized computational cost $t_f / p_{GS}$, often referred to as the 'average time to solution'. For each instance, we find the annealing time $t_f$ that minimizes this quantity and calculate the median computational cost as a function of system size for the same group of instances. We show our results in Fig. 6, and we see no increase in the fraction of non-stoquastic wins. This discrepancy between the gaps and the computational cost can be attributed to the smallness of the problem instances: these small instances still have relatively large gaps (we do not yet see the exponential decay with system size that might be expected for this class of instances), so the gap alone does not quantitatively predict the computational cost.
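The optimization of $t_f / p_{GS}$ described above amounts to a one-dimensional search per instance followed by a median over instances. The sketch below assumes the ground-state probabilities have already been obtained (for example from a Schrödinger-equation simulation) on a grid of annealing times; `optimal_cost` and `median_cost` are hypothetical names.

```python
from statistics import median

def optimal_cost(anneal_times, ground_state_probs):
    """Return (t_f, t_f / p_GS) at the annealing time that minimizes t_f / p_GS.

    Grid points with zero success probability are skipped.
    """
    costs = [(t_f / p, t_f)
             for t_f, p in zip(anneal_times, ground_state_probs) if p > 0]
    best_cost, best_time = min(costs)
    return best_time, best_cost

def median_cost(instances):
    """Median optimized cost over (anneal_times, ground_state_probs) pairs."""
    return median(optimal_cost(times, probs)[1] for times, probs in instances)
```

A grid search is used here only for simplicity; any scalar minimizer over $t_f$ would serve the same purpose.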

V. CONCLUSIONS AND DISCUSSION
In this study, we provided analytical as well as numerical evidence in favor of the assertion that non-stoquastic driver Hamiltonians are inferior to their stoquastic variants for quantum annealing optimization tasks. We analyzed the gaps of several types of random non-stoquastic Hamiltonians, comparing them against their 'stoquasticized' counterparts. We find that generically the nonpositivity of the latter Hamiltonians renders their gap larger than that of the non-stoquastic ones, making non-stoquastic Hamiltonians less favorable for quantum annealing optimization. Our results imply that stoquastic Hamiltonians should generally be preferable to non-stoquastic ones, at least as far as runtimes of quantum adiabatic algorithms are concerned.
It should be noted that examples to the contrary nonetheless exist. One well understood example where a non-stoquastic Hamiltonian exhibits an exponentially larger minimum gap than its de-signed counterpart is the case of the ferromagnetic fully-connected p-spin models with a non-stoquastic intermediate (or catalyst) Hamiltonian [34,36,37,68-70], with the caveat that while it is non-stoquastic in the computational Z basis, it is stoquastic in the X basis.
Another example exhibiting a similar non-stoquastic advantage over its de-signed counterpart is the introduction of a non-stoquastic intermediate Hamiltonian in the strong-weak cluster problem [71] between the two clusters [36,72]. This example illustrates an important point: while one choice of intermediate Hamiltonian exhibits an exponentially larger minimum gap than its de-signed counterpart, other choices of the intermediate Hamiltonian can exhibit the opposite effect, with the de-signed Hamiltonian exhibiting an exponentially larger minimum gap than the non-stoquastic Hamiltonian [72]. More generally, it shows that even if one adiabatic path exhibits an advantageous minimum gap scaling for the non-stoquastic Hamiltonian, there may be other adiabatic paths with stoquastic Hamiltonians that exhibit similar minimum gap scalings.
While the above examples provide flashes of optimism about the prospects of non-stoquastic drivers in QAO, our results call into question the promise attributed to non-stoquastic drivers to serve as generic catalysts of quantum speedups.
We note that we have not studied here the possibility of specifically tailoring driver Hamiltonians (stoquastic as well as non-stoquastic) to instances of optimization problems with the goal of raising the minimum gap along the evolution. However, since such tailoring usually requires detailed knowledge of the spectrum of the problem to be solved, in most cases obtaining that knowledge may turn out to be as difficult as solving the original problem.

ACKNOWLEDGMENTS
Computation for the work described in this paper was supported by the University of Southern California's Center for High-Performance Computing (hpc.usc.edu) and by ARO grant number W911NF1810227. The research is based upon work (partially) supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the U.S. Army Research Office contract W911NF-17-C-0050. This material is based on research sponsored by the Air Force Research Laboratory under agreement number FA8750-18-1-0044. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government.
which are the diagonal degree matrix, the signed adjacency matrix, and the signed graph Laplacian. Denote the eigenvectors of $L^\sigma$ by $\{\phi_i^\sigma\}_{i=1}^n$, with the corresponding eigenvalues ordered as $\lambda_1^\sigma \le \lambda_2^\sigma \le \dots \le \lambda_n^\sigma$. In the special case of the all-positive signature, i.e., $\sigma_{uv} = +1$ for all $(u, v) \in E$, the corresponding Laplacian $L^+$ is the standard graph Laplacian of $G$ and is an explicitly stoquastic Hamiltonian. In general $L^\sigma$ can be non-stoquastic, and in these cases its stoquastization corresponds to $L^+$. We have already seen the bound $\lambda_1^+ \le \lambda_1^\sigma$ (and in fact $\lambda_1^+ = 0$ for all unsigned graph Laplacians). It therefore remains to bound the non-stoquastic first excited energy $\lambda_2^\sigma$ in terms of the spectrum of $L^+$; our original results of this nature are derived in Section B 2.
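As a small numerical illustration of the objects just defined, the sketch below builds $L^\sigma = D - A^\sigma$ from a list of signed edges and compares its spectrum with that of the unsigned Laplacian $L^+$. The function names and the `(u, v, sign)` edge encoding are assumptions made for this example.

```python
import numpy as np

def signed_laplacian(n, signed_edges):
    """L^sigma = D - A^sigma for edges given as (u, v, sign), sign = +1 or -1."""
    lap = np.zeros((n, n))
    for u, v, sign in signed_edges:
        lap[u, u] += 1.0      # degree matrix D
        lap[v, v] += 1.0
        lap[u, v] -= sign     # minus the signed adjacency matrix A^sigma
        lap[v, u] -= sign
    return lap

def unsign(signed_edges):
    """All-positive signature on the same graph (the stoquastized version)."""
    return [(u, v, +1) for u, v, _ in signed_edges]

# Triangle with a single negative edge: a frustrated (unbalanced) cycle.
triangle = [(0, 1, +1), (1, 2, +1), (0, 2, -1)]
spectrum_signed = np.linalg.eigvalsh(signed_laplacian(3, triangle))
spectrum_unsigned = np.linalg.eigvalsh(signed_laplacian(3, unsign(triangle)))
```

For this triangle the signed ground energy is strictly positive while the unsigned one vanishes, consistent with $\lambda_1^+ = 0 \le \lambda_1^\sigma$.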

Background
First we describe the important class of balanced graphs, for which the sign problem in the signed graph Laplacian is curable. Then we define the frustration index, which is a quantitative measure of how far a graph is from being balanced. Then we review Cheeger inequalities for unsigned graphs, in order to appreciate how they become modified by the frustration index to obtain the signed Cheeger inequalities.
a. Balanced graphs. Let $\theta$ be a diagonal matrix with $\pm 1$ along the diagonal. Taking $L^\sigma$ to $\theta^{-1} L^\sigma \theta$ is a unitary transformation, and so in quantum mechanics we view it as a change of basis. Graph theorists call $\theta$ a switching function, and the new signed graph defined by $A^{\sigma'} = \theta^{-1} A^\sigma \theta$ is called switching-equivalent to the original. Needless to say, switching preserves the spectrum. Equivalence under switching defines a partition of the set of signatures into equivalence classes $[\sigma]$. A signed graph in the same switching class as the all-positive signature is called balanced. It has the property that the product of the edge signatures around every cycle is positive. This is the equivalence class of Hamiltonians that have a sign problem that is curable in the vertex label basis.
It turns out we will need another characterization of balance throughout the following sections. A graph is balanced if and only if there exists a bipartition $V = V_1 \cup V_2$, with $V_1 \cap V_2 = \emptyset$, such that each positive edge connects elements of the same subset, and each negative edge connects elements of opposite subsets. We call this a balanced bipartition. For the graph with all-positive signature this is achieved by taking $V_1 = V$ and $V_2 = \emptyset$. In general the bipartition may be nontrivial.
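The bipartition characterization suggests a simple way to test balance: propagate $\pm 1$ labels across edges and look for a contradiction. The following is a minimal sketch using breadth-first search; the function name and the `(u, v, sign)` edge encoding are assumptions made for this example.

```python
from collections import deque

def balanced_bipartition(n, signed_edges):
    """Return a switching function theta (list of +1/-1) if the signed graph
    is balanced, or None if some cycle has a negative product of signs.

    Vertices with theta = +1 form V_1 and those with theta = -1 form V_2.
    """
    adjacency = [[] for _ in range(n)]
    for u, v, sign in signed_edges:
        adjacency[u].append((v, sign))
        adjacency[v].append((u, sign))
    theta = [0] * n                      # 0 marks an unvisited vertex
    for root in range(n):
        if theta[root] != 0:
            continue
        theta[root] = 1
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, sign in adjacency[u]:
                expected = theta[u] * sign   # positive edge: same side
                if theta[v] == 0:
                    theta[v] = expected
                    queue.append(v)
                elif theta[v] != expected:
                    return None              # frustrated cycle found
    return theta
```

When a labeling is returned, every edge satisfies $\sigma_{uv}\theta_u\theta_v = +1$, so conjugating by $\theta$ maps the signature to the all-positive one.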
b. Frustration Index. In the previous section we saw that a graph is balanced iff it admits a balanced bipartition. In this section we define the frustration index of an arbitrary subset of vertices, which captures the residual frustration in the most-balanced possible bipartition of the subset. To define the most-balanced bipartition we need some additional notation.
Let $V_1, V_2 \subseteq V$ and define functions that count the number of edges between $V_1$ and $V_2$ of positive, negative, or either type. Note that each edge is counted twice in these sums. Now for any subset $S \subseteq V$ define the frustration index $F(S)$, which captures the minimum residual energy due to non-stoquastic frustration in that subset; the minimum is taken over pairs of disjoint subsets $S_1, S_2 \subseteq S$ with $S_1 \cap S_2 = \emptyset$. It turns out that $F(S)$ is also related to the minimum number of edges that need to be removed from the subgraph induced on $S$ in order to put it in the balanced signature equivalence class.
The expansion $\Phi(S)$ of a subset is also sometimes called the bottleneck ratio, because a small value of $\Phi(S)$ indicates a relatively large number of vertices and a small number of edges, corresponding to a bottleneck in the graph.
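The edge-removal interpretation of the frustration index can be made concrete by brute force on small graphs. The sketch below returns only the unnormalized count of frustrated edges in the most-balanced bipartition of $S$; it assumes the bipartition covers all of $S$ and does not reproduce the normalization of $F(S)$ used in the text. The function name is hypothetical.

```python
from itertools import product

def min_frustrated_edges(subset, signed_edges):
    """Minimum number of frustrated edges over all bipartitions of `subset`.

    An edge is frustrated if it is positive and joins opposite sides, or
    negative and joins the same side. Only edges of the induced subgraph
    are considered. ASSUMPTION: this is an unnormalized count.
    """
    vertices = list(subset)
    index = {v: k for k, v in enumerate(vertices)}
    inner = [(u, v, s) for u, v, s in signed_edges if u in index and v in index]
    best = len(inner)
    for theta in product((1, -1), repeat=len(vertices)):
        frustrated = sum(
            1 for u, v, s in inner if s * theta[index[u]] * theta[index[v]] < 0
        )
        best = min(best, frustrated)
    return best
```

The search is exponential in $|S|$, so it serves only as a check on small examples; a result of zero means the induced subgraph is balanced.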
For an unsigned graph the following Cheeger constant captures the minimum possible expansion over all disjoint pairs of subsets,

$$h_2^+ := \min_{\substack{\emptyset \neq S_1, S_2 \subseteq V \\ S_1 \cap S_2 = \emptyset}} \max\{\Phi(S_1), \Phi(S_2)\}. \qquad (B4)$$

Taking the maximum expansion of two disjoint subsets ensures that at least one of them has $\mathrm{vol}(S_i) \le \mathrm{vol}(V)/2$. Cheeger's inequality relates Eq. (B4) to $\lambda_2^+$ (which is the spectral gap because $\lambda_1^+ = 0$),

$$\frac{\lambda_2^+}{2} \le h_2^+ \le \sqrt{2 D_{\max} \lambda_2^+}.$$

The proof of the bound $\lambda_2^+ \le 2 h_2^+$ is that for any $S$ we can take a first excited state ansatz that is positive on $S$ and negative on $\bar{S} := V - S$. The upper bound $h_2^+ \le \sqrt{2 D_{\max} \lambda_2^+}$ says that if the gap is small, then there is a subset with small expansion. The proof is based on a spectral clustering algorithm. The idea is that the first excited state can be divided into vertices with positive amplitude and vertices with negative amplitude. Then these sets are further refined a bit to obtain the subsets $S_1, S_2$.
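Both sides of Cheeger's inequality can be checked by brute force on a small unsigned graph. The sketch below is illustrative only: it assumes the expansion is $\Phi(S) = |\partial S| / |S|$, with vertices counted rather than weighted by degree, which may differ from the normalization used in the text, and the function names are hypothetical.

```python
import itertools
import numpy as np

def laplacian(n, edges):
    """Unsigned graph Laplacian L^+ = D - A."""
    lap = np.zeros((n, n))
    for u, v in edges:
        lap[u, u] += 1.0
        lap[v, v] += 1.0
        lap[u, v] -= 1.0
        lap[v, u] -= 1.0
    return lap

def expansion(subset, edges):
    """ASSUMED normalization: boundary edges divided by number of vertices."""
    boundary = sum(1 for u, v in edges if (u in subset) != (v in subset))
    return boundary / len(subset)

def cheeger_two(n, edges):
    """min over disjoint nonempty S1, S2 of max(Phi(S1), Phi(S2))."""
    best = float("inf")
    # label 0: in neither subset, 1: in S1, 2: in S2
    for labels in itertools.product((0, 1, 2), repeat=n):
        s1 = {v for v in range(n) if labels[v] == 1}
        s2 = {v for v in range(n) if labels[v] == 2}
        if s1 and s2:
            best = min(best, max(expansion(s1, edges), expansion(s2, edges)))
    return best

# Example: the cycle on six vertices, where every vertex has degree 2.
cycle = [(k, (k + 1) % 6) for k in range(6)]
lam2 = np.linalg.eigvalsh(laplacian(6, cycle))[1]
h2 = cheeger_two(6, cycle)
d_max = 2
```

For the six-cycle the optimal pair consists of two opposite arcs of three vertices each, and the values of `lam2` and `h2` can be compared against $\lambda_2^+/2 \le h_2^+ \le \sqrt{2 D_{\max}\lambda_2^+}$.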
The Cheeger constant and its relation to the first excited energy can also be generalized to the rest of the spectrum by what are called higher-order Cheeger inequalities. The $k$-th order Cheeger constant $h_k^+$ is defined analogously, by minimizing the maximum expansion over $k$ disjoint subsets, and it is related to the $k$-th eigenvalue $\lambda_k^+$ by an inequality involving a constant $C$. Figure 7 illustrates a sub-partition with 4 subsets which can be used to upper bound $h_4^+$ and hence upper bound $\lambda_4^+$.
d. Signed Cheeger inequalities. The lowest order signed Cheeger constant $h_1^\sigma$ seeks the subset of vertices in the graph which is simultaneously close to balanced and has small expansion. Such a subset can be used to construct a good ansatz for the ground state, which is nonzero in $S$ and vanishes outside of $S$. The most-balanced bipartition $S = V_1 \cup V_2$ is then used to assign the ansatz positive values in $V_1$ and negative values in $V_2$. This leads to the lowest order signed Cheeger inequality. It is also possible to define a series of higher order signed Cheeger constants $h_k^\sigma$, Eq. (B10), which for any signature $\sigma$ bound the eigenvalues above the ground state of $L^\sigma$. In particular we will use the case $k = 2$ (for which the definition (B10) is illustrated in Fig. 8) to upper bound $\lambda_2^\sigma$ and hence upper bound the non-stoquastic spectral gap.

New Results
Let $\Omega \subseteq V$ be the support of the ground state of the signed Laplacian, $\Omega := \{v \in V : \phi_1^\sigma(v) \neq 0\}$. The first bound we present is given in terms of the first excited energy of the unsigned graph on $\Omega$, considered as a subgraph of $G$. The proof strategy is illustrated in Fig. 9. This unsigned subgraph Laplacian is $L^+_\Omega$, and the eigenvalue and eigenvector are $\lambda^+_{2,\Omega}$, $\phi^+_{2,\Omega}$. By Cheeger's inequality we know there exist sets $S_1, S_2$ with $\max\{\Phi(S_1), \Phi(S_2)\} \le \sqrt{2 D_{\max} \lambda^+_{2,\Omega}}$. Now we use $S_1, S_2$ to upper bound the second eigenvalue of the signed Laplacian, $\lambda_2^\sigma \le 2 \max\{F(S_1) + \Phi(S_1), F(S_2) + \Phi(S_2)\}$. In order to upper bound $F(S_1), F(S_2)$ we need the fact that for any subsets $S' \subseteq S$ the numerator of Eq. (B3) for $S'$ is no larger than that for $S$; that is, the most-balanced bipartition does not get better as more vertices and signed edges are added. This leads to the following theorem.
Theorem 3 Let $L^\sigma = D - A^\sigma$ be a signed graph Laplacian with ground state $\phi_1^\sigma$ with energy $\lambda_1^\sigma$.
Using the same proof idea we can also obtain an upper bound directly in terms of $\lambda_2^+$ without restricting to the subgraph $\Omega$, at the cost of introducing more volume factors. This time the subsets $S_1, S_2$ come from applying Cheeger's inequality to the full unsigned Laplacian $L^+$, and then forming the intersections $R_1 = S_1 \cap \Omega$ and $R_2 = S_2 \cap \Omega$ to take the variational upper bound $\lambda_2^\sigma \le 2 \max\{F(R_1) + \Phi(R_1), F(R_2) + \Phi(R_2)\}$.
As before we can upper bound the frustration $F(R)$, where $R$ is either $R_1$ or $R_2$. Next observe that every vertex on the boundary of $R = S \cap \Omega$ is either on the boundary of $S$ or on the boundary of $\Omega$, so the boundary of $R$ is controlled by the boundaries of $S$ and $\Omega$. Therefore we have now shown the following variant of the theorem.
Theorem 4 Let $L^\sigma = D - A^\sigma$ be a signed graph Laplacian with ground state $\phi_1^\sigma$ with energy $\lambda_1^\sigma$.
Define $\Omega := \{v \in V : \phi_1^\sigma(v) \neq 0\}$ and consider the unsigned Laplacian $L^+$ on the same graph. By Cheeger's inequality there must be a subset $S \subseteq V$ with $\Phi(S) \le \sqrt{2 D_{\max} \lambda_2^+}$, and the upper bound on $\lambda_2^\sigma$ holds for any subset $S$ with expansion $\Phi(S)$ satisfying this relation.
Finally we present a cleaner theorem that unfortunately produces a statement that goes in the opposite direction from what we want.
a. Theorem 3 (converse bounds): For any graph $G$ and any signature $\sigma$, $\lambda_2^+$ is bounded from above in terms of $\lambda_2^\sigma$.
c. Comments. This shows that the first excited energy of the unsigned graph cannot be too much larger than the first excited energy of the signed graph. However, since $\lambda_1^+ \le \lambda_1^\sigma$, the unsigned graph could still have a larger spectral gap.

3. How tight are these bounds?
In the case of Theorem 3, the only volume factor is $\mathrm{vol}(\Omega)/\mathrm{vol}(S)$. Here $S$ is a Cheeger subset for a graph defined on $\Omega$. Usually the Cheeger subset occupies about half of the vertices; for example, in a path graph $\{1, \dots, k\}$ the subset $S = \{1, \dots, \lfloor k/2 \rfloor\}$ achieves the minimum expansion, and in an $n$-dimensional hypercube we would take a Hamming ball of radius $\lfloor n/2 \rfloor$. So in many cases we expect $\mathrm{vol}(\Omega)/\mathrm{vol}(S) = O(1)$, in which case the volume factor contributes only a constant to the bound. For Theorem 4, the RHS can blow up when $S \cap \Omega = \emptyset$, and this does in fact sometimes happen. What this means is that the first excited state in the stoquastic case is very different from the non-stoquastic ground state. In other words, the subset that minimizes the expansion does a very poor job at minimizing the phase frustration. The best case for Theorem 4 occurs when $\Omega = V$, where Theorems 3 and 4 coincide. The only advantage Theorem 4 ever has is that it refers to the eigenvalue $\lambda_2^+$ instead of $\lambda^+_{2,\Omega}$.
a. Improved Cheeger inequalities and the square root. The topic of improved Cheeger inequalities is based on using higher eigenvalues to obtain tighter bounds. For any signed graph and any $k \in \{1, \dots, N\}$ we have [46]

$$h_1^\sigma \le 16 \sqrt{2D}\, k\, \frac{\lambda_1^\sigma}{\sqrt{\lambda_k^\sigma}}.$$

The key point is the absence of the square root on $\lambda_1^\sigma$ on the RHS. The meaning of this inequality is that we can get a quadratically improved connection between expansion and gap in the cases when one of the higher eigenvalues (for $k \ge 2$, but not too large because of the $k$ on the RHS above) is much larger than the bare spectral gap.
For example, consider a transverse Ising model in the ferromagnetic regime. It has two exponentially close eigenvalues, then a constant gap to the rest of the spectrum. The improved Cheeger inequality tells us that the expansion and the gap scale with the same order in this ground state.
Interestingly, this is similar to the condition under which diabatic QA succeeds: the low energy spectrum (which may have polynomially many states, say) is separated from the rest of the spectrum by a gap. There is also a higher-order improved signed Cheeger inequality, which says that there exists a constant $C$ such that for any signed graph and any $1 \le k \le l \le N$,

$$h_k^\sigma \le C \sqrt{D}\, l\, k^6\, \frac{\lambda_k^\sigma}{\sqrt{\lambda_l^\sigma}}.$$

Therefore, if QA succeeds then there should be a gap after $k = \mathrm{poly}(n)$ eigenvalues, and so we expect that in many cases the bound tightens to $\lambda_2^\sigma = O\!\left(\mathrm{poly}(n)\left(\lambda_1^\sigma + \lambda^+_{2,\Omega}\right)\right)$.