Duality in Quantum Quenches and Classical Approximation Algorithms: Pretty Good or Very Bad

We consider classical and quantum algorithms which have a duality property: roughly, either the algorithm provides some nontrivial improvement over random or there exist many solutions which are significantly worse than random. This enables one to give guarantees that the algorithm will find such a nontrivial improvement: if few solutions exist which are much worse than random, then a nontrivial improvement is guaranteed. The quantum algorithm is based on a sudden quench of a Hamiltonian; while the algorithm is general, we analyze it in the specific context of MAX-$K$-LIN-$2$, for both even and odd $K$. The classical algorithm is a "dequantization" of this algorithm, obtaining the same guarantee (indeed, some results which are only conjectured in the quantum case can be proven here); however, the quantum point of view helps in analyzing the performance of the classical algorithm, and the quantum algorithm might in some cases perform better.


I. INTRODUCTION
For many combinatorial optimization problems, we expect that it is not possible to obtain an exact solution in polynomial time. Instead, the best that we can hope for is to obtain an approximate solution. The main result of this paper is a duality for certain approximations, that one can call "pretty good or very bad", in which the algorithm either finds a nontrivial improvement over random ("pretty good") or there exist many solutions which are significantly worse ("very bad"). This can lead to a method of proving guarantees of performance for the algorithm, if one knows that such very bad solutions do not exist.
We were led to this duality by analyzing a quantum approximation algorithm based on the idea of a quench: a sudden change in the Hamiltonian; more specifically, we prepare the system in the ground state of a given Hamiltonian, and then evolve it under a different Hamiltonian. We present this algorithm in this paper and give some evidence for the duality there. We then consider a classical approximation algorithm and prove the duality there. Finally, we discuss the performance of the quantum algorithm on certain instances.
We will consider the optimization problem MAX-K-LIN-2, with the assumption of a degree bound explained below. Roughly speaking, this problem MAX-K-LIN-2 considers an objective function which is a sum of terms of order K in binary variables; we give a more precise definition later (we use the term "order" rather than "degree" to denote the exponent of a polynomial to avoid confusion with the use of "degree" for the degree bound). We will call these binary variables "bits", though we emphasize that they take values in {−1, +1} rather than {0, 1}.
We consider an instance with degree $D$, so that each bit participates in $D$ terms in the problem. Previous work has shown that for odd $K$ it is possible to obtain a nontrivial approximation of order $1/\sqrt{D}$ for MAX-K-LIN-2 using a classical algorithm [2] (initially a quantum algorithm was found [10] providing weaker approximation guarantees, but later the classical algorithm was discovered). Further, for arbitrary $K$ the classical algorithm finds a solution which is either better than random by an amount of order $1/\sqrt{D}$ or worse by an amount of order $1/\sqrt{D}$. This result implies the order $1/\sqrt{D}$ improvement for odd $K$, since if the algorithm finds a result which is worse than random by order $1/\sqrt{D}$, one can change the sign of all bits to obtain an improvement by order $1/\sqrt{D}$. We consider a different but closely related classical approximation algorithm and find (for arbitrary $K$, though the result is most interesting for even $K$) the duality mentioned above, which generalizes this: rather than being better or worse by $1/\sqrt{D}$, one can instead choose it to be slightly better or much worse. There is a constant $\epsilon$ one may choose and roughly (precise results are in theorem 2 below) the algorithm either finds a solution which improves on random by an amount $\epsilon/\sqrt{D}$ or there is a solution which is worse than random by an amount $\epsilon^{-1}/\sqrt{D}$. For example, if one chooses $\epsilon$ slightly larger than $1/\sqrt{D}$, the algorithm either improves on random by an amount more than $1/D$ (a "pretty good" solution) or there exists a solution which is worse than random by almost 1 (a "very bad" solution). The improvement by more than $1/D$ is important because it is always possible to find an assignment which improves by a factor $1/D$ in polynomial expected time [12], i.e., such an improvement can be found in polynomial expected time regardless of the value of the optimal assignment.
We also analyze a quantum algorithm based on quenches. Rather than slowly changing a Hamiltonian as in the adiabatic algorithm [9] (which in general is expected to have trouble with small energy gaps [1]), we suddenly change the Hamiltonian, but then spend some time evolving under the new Hamiltonian. We propose this algorithm as a general method for approximate optimization, but we analyze it in the context of MAX-K-LIN-2. Here we find a similar duality.
arXiv:1904.13339v2 [quant-ph] 4 Nov 2019
The quantum algorithm gives a point of view that is useful in analyzing the classical algorithm: both algorithms find improvements unless there is a quantum state with large polarization in the X direction (i.e., the expectation value of the sum of Pauli X operators on all qubits is large as defined below) and which has an expectation value for the objective function which is significantly worse than random. Some of the results in the quantum case are only conjectured, while they can be proven in the classical case. However, the quantum algorithm may be useful for some other instances.

A. Problem Definition and Examples
We consider the problem MAX-K-LIN-2. There are N variables, called bits, each of which may take values in {−1, +1}. The objective function, which we denote H Z , is taken to be a weighted sum of monomials of order K in these bits, i.e., each monomial is a product of K distinct bits (sometimes this problem is called MAX-EK-LIN-2 to distinguish it from a more general case where monomials may have order up to K). We will require that the weight of each monomial be chosen from {−1, +1}, and that all monomials be distinct from each other.
We consider an optimization problem where the goal is to maximize this objective function. We emphasize this because we will later consider a Hamiltonian which includes a term proportional to H Z , and so we will be considering states near the highest energy state of that Hamiltonian, rather than the lowest energy state as more commonly done in physics.
We write the bits as $Z_i$ where $i \in \{1, \ldots, N\}$, so that there are $N$ bits. For MAX-2-LIN-2 we have
$$H_Z = \sum_{i<j} J_{ij} Z_i Z_j,$$
where $J_{ij}$ is a matrix with entries chosen from $\{-1, 0, +1\}$. We will assume a degree bound $D$, so that each bit $Z_i$ appears in at most $D$ distinct monomials in $H_Z$. Indeed, for simplicity we will only consider the case where each $Z_i$ appears in exactly $D$ monomials in $H_Z$. We define $N_T$ to equal the number of terms in $H_Z$, so that if every bit has degree exactly $D$ and every term is order exactly $K$ then we have
$$N_T = \frac{ND}{K}.$$
A random assignment has expectation value of $H_Z$ equal to 0. Typically in computer science, one regards each of these monomials as a constraint: the constraint is satisfied if the monomial is equal to $+1$ and it is violated otherwise, so that the number of satisfied constraints is equal to the value of $H_Z/2$ plus $N_T/2$. Hence, a random assignment satisfies half the constraints on average. Then, the approximation ratio achieved by some assignment to the bits is defined to be the fraction of constraints satisfied by that assignment divided by the fraction of constraints satisfied by the optimal assignment.
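The definitions above can be made concrete with a small sketch. The representation of an instance as a list of `(weight, bit_indices)` pairs and the toy ring instance are assumptions made here for illustration; they are not taken from the paper.

```python
from itertools import product

# Hypothetical representation: each term is (weight, tuple_of_bit_indices),
# with weight in {-1, +1} and K distinct indices per term.
def h_z(terms, z):
    """Value of the objective H_Z for an assignment z with entries in {-1, +1}."""
    total = 0
    for weight, bits in terms:
        prod = weight
        for i in bits:
            prod *= z[i]
        total += prod
    return total

def degrees(terms, n):
    """Number of terms in which each bit appears."""
    deg = [0] * n
    for _, bits in terms:
        for i in bits:
            deg[i] += 1
    return deg

def satisfied(terms, z):
    """Number of terms (constraints) that evaluate to +1 under z."""
    return sum(1 for w, bits in terms if h_z([(w, bits)], z) == 1)

# Toy MAX-2-LIN-2 instance on a ring of N = 4 bits: K = 2, D = 2.
N, K, D = 4, 2, 2
TERMS = [(+1, (0, 1)), (-1, (1, 2)), (+1, (2, 3)), (+1, (3, 0))]

print(degrees(TERMS, N), len(TERMS), N * D // K)
```

For this instance every bit has degree exactly D, the number of terms equals ND/K, and for every assignment the number of satisfied constraints equals $H_Z/2 + N_T/2$.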
We will define the approximation ratio differently: we will define it to be the value of H Z for a given assignment divided by the value of H Z in the optimal assignment. That is, we will not add this term N T /2.
We will also say that an assignment improves by a factor $f$ over random if it has $H_Z \geq f N_T$. We say that an assignment is worse than random by a factor $f$ if it has $H_Z \leq -f N_T$.
For odd $K$, it is possible to improve over a random assignment by $\exp(-O(K))/\sqrt{D}$ in polynomial expected time [2]. One cannot expect to have such an improvement for even $K$, simply because there exist families of instances in which no assignment has $H_Z$ larger than $N_T \cdot O(1/D)$. For $K = 2$, a simple such example is to choose
$$H_Z = -\sum_{i<j} Z_i Z_j. \tag{3}$$
Here we have taken $D = N - 1$ so that every bit is in some monomial with every other bit. It is possible to obtain a very large negative value of $H_Z$ (i.e., $-N(N-1)/2$) by choosing all $Z_i$ to have the same sign, but for even $N$, the maximum positive value of $H_Z$ is obtained by choosing $N/2$ of the $Z_i$ to equal $+1$ and the remainder to equal $-1$, giving the value $N/2$, which is proportional to $N_T/N$. This provides an early example of the duality: the maximum improvement over random is quite small (a factor $O(1/D)$) but one can find an assignment which is a factor $\Omega(1)$ worse than random. For $K = 2m$, one can generalize example (3) to give an instance for MAX-K-LIN-2 as follows: let $N = mD$. Divide the set of $mD$ bits into $D$ disjoint sets, each containing $m$ bits. Label the sets by integers in $1, \ldots, D$. Let $\tilde{Z}_i$ be the product of the bits in the $i$-th set. Let $H_Z = -\sum_{i<j} \tilde{Z}_i \tilde{Z}_j$.
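The extreme values quoted for the $K = 2$ example can be checked by brute force for small $N$. The following is a minimal sketch, assuming the example is the all-to-all instance $H_Z = -\sum_{i<j} Z_i Z_j$; the function names are illustrative.

```python
from itertools import product

def h_z_example(z):
    """H_Z = -sum_{i<j} Z_i Z_j, the K = 2 example with D = N - 1."""
    n = len(z)
    return -sum(z[i] * z[j] for i in range(n) for j in range(i + 1, n))

def extremes(n):
    """Brute-force minimum and maximum of H_Z over all 2^n assignments."""
    values = [h_z_example(z) for z in product((-1, 1), repeat=n)]
    return min(values), max(values)

N = 6
worst, best = extremes(N)
print(worst, best)  # compare with -N(N-1)/2 and N/2
```

The minimum is attained when all bits agree and the maximum when the bits are balanced, since $H_Z = -\big((\sum_i Z_i)^2 - N\big)/2$.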

B. Outline
In section II we define the quench algorithm, both in the specific form that we analyze later as well as some variants that may be useful. Subsection II C shows how the duality arises in the quantum algorithm; here we need to make some conjectures to show that the duality holds. In section III we collect some results that will be useful in analyzing the classical algorithm that we give later as well as in analyzing the quantum algorithm. In section IV we define the classical algorithm and analyze it; in contrast to the quantum case, we will be able to prove all the conjectured results about the classical algorithm. In section V we consider some applications of the analysis of these algorithms. In section VI we give a further analysis of the quantum algorithm in an attempt to support the conjectures of subsection II C.

II. QUENCH ALGORITHM
To define the algorithm, we promote the bits to qubits, and we let $Z_i$ be the Pauli $Z$ operator on the $i$-th qubit. Let $X_i$ be the Pauli $X$ operator on the $i$-th qubit and let
$$X = \sum_i X_i.$$
We use the following algorithm. Let
$$H = X + \frac{\alpha}{D} H_Z,$$
where $\alpha$ is a scalar chosen later. We prepare the system in the state $\psi_+$, maximally polarized in the $+$ direction so that the expectation value of $X_i$ is equal to $+1$ for all $i$. We then evolve the system under Hamiltonian $H$ for a time $T$ that we choose later. This time will in all cases be at most poly($N$); indeed, our analysis will be for $T = O(1)$. Hence, this evolution can be performed on a quantum computer in time polynomial in $t_{\max}$ and polynomial in the inverse error using any of a number of algorithms [3-5, 15, 16] (indeed, the simulation can be performed in time polylogarithmic in the inverse error for some of these, but we will not need this). In any simulation algorithm on a quantum computer, we discretize the variable $t$; for example, one may choose it to equal an integer multiple of some time $t_{\min}$ for some $t_{\min}$ which is polynomially small; this causes only a polynomially small error. Finally, we measure the state of the system in the computational basis, giving an assignment of bits $Z_i$. This algorithm can be regarded as an example of a quantum walk on a hypercube [8,20]. While the present paper was in preparation, another paper used these quantum walk ideas to give a closely related algorithm which was then analyzed numerically in the context of the Sherrington-Kirkpatrick and random energy models [7].
In the analysis of the algorithm, we will ignore all the errors associated with the time evolution and the discretization of time, since a polynomially small error is negligible as may be verified.
When we apply this algorithm, one may repeat the algorithm several times with T chosen from an appropriate distribution. In this regard, it is interesting to think about the state arising from averaging T over an interval of times; by choosing the time from a random distribution (or alternatively by performing phase estimation of the Hamiltonian H) we can decohere the system in an eigenbasis. The fixed evolution has a similar effect but is easier to analyze using the techniques here. We can also use a similar idea to that in Ref. [18] and simulate a function of the Hamiltonian which should have a similar effect but may be faster to simulate.

A. Motivation
Let us heuristically explain the algorithm. The time evolution has two purposes. The first is to decohere different eigenstates of the Hamiltonian as mentioned; for fixed time, the evolution for time t produces a pure state, but produces some change in phase for different energies which has a similar effect to a random evolution. The second purpose is to do it in a way that conserves energy. One hopes that the decoherence between different eigenstates will lead to a reduction in the expectation value of X, since one hopes that individual eigenstates will not have large X. This reduction will lead to a positive expectation value of H Z due to the energy conservation as we now explain: this energy conservation is the second reason for the time evolution.
For arbitrary operators $O$, $H$ and scalar $t$, define $\tau^H_t(O) = \exp(itH)\, O \exp(-itH)$. Define
$$\langle O \rangle \equiv \langle \psi_+ | O | \psi_+ \rangle.$$
Define
$$\langle O \rangle_T \equiv \langle \tau^H_T(O) \rangle,$$
so that $\langle O \rangle_T$ is the expectation value at time $T$. We have
$$\langle H \rangle_T = \langle H \rangle = N,$$
independent of $T$ by energy conservation. Hence, we have
$$\langle H_Z \rangle_T = \frac{D}{\alpha}\left(N - \langle X \rangle_T\right). \tag{9}$$
That is, if the state at time $T$ has an expectation value of $X$ that is smaller than the maximal value (i.e., smaller than $N$), it necessarily has an expectation value of $H_Z$ that is positive, i.e., it has obtained some solution that is better than random. This is the key idea behind the quench algorithm. Note that if the algorithm obtains a state with a large expectation value of $H_Z$ (much larger than $N_T/D$), then since the expected value is within a factor $1/\mathrm{poly}(D)$ of the optimal value (which is at most $O(N_T)$), by repeating the algorithm poly($D$) times we can, with probability at least one-half, obtain a solution which is at least a constant factor times the expected value. Here the constant factor can be any number strictly less than 1, for example 1/2. This is an application of Markov's inequality: consider the maximum value of $H_Z$ minus the measured value of $H_Z$ as a non-negative random variable, whose expectation value is the maximum value of $H_Z$ minus $\langle H_Z \rangle_T$.
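The energy-conservation argument can be checked numerically on a very small instance. The sketch below is an illustration only, not the paper's procedure: it assumes $H = X + (\alpha/D) H_Z$, stores the full state vector, and applies $\exp(-iTH)$ by a truncated Taylor series, which is adequate only for a handful of qubits. The instance, parameter values, and function names are all hypothetical.

```python
def hz_diagonal(terms, n):
    """Diagonal of H_Z in the computational basis; bit value 0 means Z = +1."""
    diag = []
    for b in range(2 ** n):
        total = 0
        for weight, bits in terms:
            sign = weight
            for i in bits:
                if (b >> i) & 1:
                    sign = -sign
            total += sign
        diag.append(total)
    return diag

def apply_x(psi, n):
    """Apply X = sum_i X_i; each X_i flips bit i of the basis label."""
    return [sum(psi[b ^ (1 << i)] for i in range(n)) for b in range(len(psi))]

def evolve(psi, t, n, diag, alpha, d, order=80):
    """Apply exp(-i t H), H = X + (alpha/d) H_Z, via a truncated Taylor series."""
    result = list(psi)
    term = list(psi)
    for k in range(1, order + 1):
        xt = apply_x(term, n)
        term = [(-1j * t / k) * (xt[b] + (alpha / d) * diag[b] * term[b])
                for b in range(len(psi))]
        result = [r + s for r, s in zip(result, term)]
    return result

def expectations(psi, n, diag):
    """Return (<X>, <H_Z>) in the state psi."""
    xpsi = apply_x(psi, n)
    ex = sum((a.conjugate() * b).real for a, b in zip(psi, xpsi))
    ehz = sum(abs(a) ** 2 * h for a, h in zip(psi, diag))
    return ex, ehz

N, D, ALPHA, T = 4, 2, 1.0, 0.5
TERMS = [(+1, (0, 1)), (-1, (1, 2)), (+1, (2, 3)), (+1, (3, 0))]
DIAG = hz_diagonal(TERMS, N)
psi_plus = [complex(2 ** (-N / 2))] * (2 ** N)   # <X_i> = +1 for every i
psi_t = evolve(psi_plus, T, N, DIAG, ALPHA, D)
x_t, hz_t = expectations(psi_t, N, DIAG)
# Energy conservation: <H_Z>_T should equal (D/alpha) * (N - <X>_T).
print(x_t, hz_t, (D / ALPHA) * (N - x_t))
```

Any drop of $\langle X \rangle_T$ below $N$ shows up as a positive $\langle H_Z \rangle_T$, which is the mechanism the algorithm relies on.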

B. Heuristic Choices of α
We now discuss how to choose α. We give a calculation that introduces some of the notation used later. We consider perturbation theory to only second order, and we then give a purely heuristic treatment of higher orders to motivate the choice of α. Later we will give a different treatment.
Consider the series for $\tau^H_T(X_i)$ for some given $i$. For any operator $O$, we have the series
$$\tau^H_T(O) = O + iT[H, O] + \frac{(iT)^2}{2}[H, [H, O]] + \ldots$$
So, we have
$$\langle X_i \rangle_T = 1 - \frac{2\alpha^2}{D} T^2 + \ldots,$$
where the dots denote terms of order $T^3$ or higher. Hence,
$$\langle X \rangle_T = N\left(1 - \frac{2\alpha^2}{D} T^2 + \ldots\right), \tag{12}$$
and so by Eq. (9),
$$\langle H_Z \rangle_T = 2\alpha T^2 N + \ldots$$
Of course, the higher order corrections to this perturbation theory must become important for large enough $T, \alpha$. For one thing, once $T \gtrsim 1$, the effects of higher order terms in $TX$ in the exponential become important, i.e., we must consider higher order commutators such as $[[[[X, H_Z], X], X], H_Z]$. However, we might hope that for some $T$ of order unity (for example, $T = 1/2$) the higher orders in $T$ will not be too important; maybe they will not be negligible but we might hope that they will only slightly reduce the result.
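The second-order coefficient can be worked out explicitly. The following is a sketch of one way to do it, under the assumption that $H = X + (\alpha/D) H_Z$ and that the initial state satisfies $X_i \psi_+ = \psi_+$ for all $i$; the split $H_Z = A + B$, with $B$ the sum of the $D$ monomials containing $Z_i$, is notation introduced here and not in the original text.

```latex
% Assumed notation: H_Z = A + B, where B collects the D monomials containing Z_i,
% so that H_Z X_i = X_i (A - B). Expectation values are taken in \psi_+.
\begin{align*}
\langle [H,[H,X_i]] \rangle
  &= \left(\frac{\alpha}{D}\right)^{2} \langle [H_Z,[H_Z,X_i]] \rangle
     && \text{since } [X, X_i] = 0 \text{ and } X\psi_+ = N\psi_+ \\
  &= \left(\frac{\alpha}{D}\right)^{2}
     \left( 2\langle H_Z^2 \rangle - 2\langle H_Z X_i H_Z \rangle \right)
     && \text{since } X_i\psi_+ = \psi_+ \\
  &= \left(\frac{\alpha}{D}\right)^{2}
     \left( 2\langle (A+B)^2 \rangle - 2\langle (A-B)(A+B) \rangle \right) \\
  &= \left(\frac{\alpha}{D}\right)^{2}
     \left( 2 N_T - 2\,(N_T - 2D) \right)
   = \frac{4\alpha^{2}}{D}.
\end{align*}
% Here <A^2> = N_T - D and <B^2> = D, because distinct monomials are orthogonal in \psi_+.
% The order-T^2 term of the series is therefore -(T^2/2)(4 alpha^2 / D) = -2 alpha^2 T^2 / D.
```

This agrees with the value $-4\alpha^2/D$ of the second derivative at $T = 0$ that is used in the discussion of the duality.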
However, even for such a fixed T = 1/2, we certainly cannot ignore higher order terms in (α/D)H Z for large enough α. For example, if α is sufficiently larger than √ D, we would find that Eq. (12) gives a result for X which is smaller than −N , which is impossible. So, the most optimistic thing we can hope for is that second order perturbation theory is roughly accurate up to some T of order unity such as T = 1/2 and up to α proportional to √ D. If so, we would find that the best choice of α would be to take α proportional to √ D, in which case we would have H Z proportional to N √ D, which is proportional to N T · Ω(1/ √ D). Thus it would give an Ω(1/ √ D) factor approximation. However, clearly this heuristic analysis is too optimistic. Such solutions do exist for MAX-K-LIN-2 for odd K (though we certainly have not shown that the algorithm finds them), but they do not exist in general, such as the example of Eq. (3).

C. Duality
The previous subsection considered a perturbative approach; the second order term corresponded to considering the second derivative with respect to T of X T at T = 0. We now consider this derivative at arbitrary T .
We introduce some notation that will be useful both here and later, including for the classical algorithm. Let us define $F_i$ (the symbol "F" is for "force", i.e., a derivative of energy with respect to some coordinate) to equal $Z_i$ times the sum of terms in $H_Z$ that include $Z_i$. For example, for $K = 4$ and $H_Z = Z_1 Z_2 Z_3 Z_4 - Z_1 Z_5 Z_6 Z_7 + \ldots$, where the dots denote terms that do not include $Z_1$, we have $F_1 = Z_2 Z_3 Z_4 - Z_5 Z_6 Z_7$. Note that the multiplication by $Z_i$ reduces the order of the terms in $F_i$ to $K - 1$ since $Z_i^2 = 1$. The "force" depends upon the choice of the $Z_i$, so we will sometimes write $F_i(\vec{Z})$ to indicate its dependence on $\vec{Z}$.
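The definition of the force can be illustrated with a short sketch. The instance below is hypothetical (it is not the paper's example), and instances are again represented as `(weight, bit_indices)` pairs, which is an assumption made for illustration.

```python
from itertools import product

def h_z(terms, z):
    """Value of H_Z for an assignment z with entries in {-1, +1}."""
    total = 0
    for weight, bits in terms:
        value = weight
        for j in bits:
            value *= z[j]
        total += value
    return total

def forces(terms, z):
    """F_i(z): Z_i times the sum of the terms of H_Z that contain Z_i."""
    f = [0] * len(z)
    for weight, bits in terms:
        value = weight
        for j in bits:
            value *= z[j]
        for i in bits:
            # value * z[i] removes Z_i from the monomial because z[i]**2 == 1.
            f[i] += value * z[i]
    return f

# Hypothetical K = 4 instance on N = 7 bits (not taken from the paper).
K = 4
TERMS = [(+1, (0, 1, 2, 3)), (-1, (0, 4, 5, 6)), (+1, (1, 2, 4, 5))]

z = (1, -1, 1, 1, -1, -1, 1)
print(forces(TERMS, z), h_z(TERMS, z))
```

Two properties are easy to confirm by enumeration: $F_i$ does not depend on $Z_i$ itself, and $\sum_i Z_i F_i = K H_Z$ because each monomial is counted once for each of its $K$ bits.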
Considering this second derivative at arbitrary T we have where for any operator O, For T = 0 the first term is equal to −4 α 2 D . Assuming (we consider this in more detail later) that the first term Heuristically speaking, and ignoring the correlation between X i and F 2 i , one way that the first term could become small is for the expectation value of X i to become small. This would of course mean a state with large expectation value of H Z . Another way is for F 2 i to become small. Thus, under the assumption about the first term, we have at least one of two situations. Either, after time T , we have for some time s ≤ T (or both possibilities may hold). Further, at that time s, if we do not have , if the algorithm does not find a state (by sampling over times s ≤ T ) with expectation value of H Z equal to Ω K (1)αT 2 N , then there exists some state with expectation value of X at least Here the notation Ω K (. . .) denotes that the constant may depend upon K but not on α, T, N, D. Choosing α 2 T 2 ∼ D, we see that either the algorithm finds a solution with expectation value of H Z equal to at least or both hold. Choosing α ∼ √ D, these two quantities, Ω K ( D α )N and Ω K (αN ), are comparable to each other. Choosing √ D α D, the first quantity improves by a factor which is 1/D compared to the random solution, even if it is not as large as 1/ √ D; we call this a "pretty good solution", while the second quantity gives an expectation value of which is very large. We will see how, in the section which follows, to convert this large expectation value to a large expectation value (which may be positive or negative) for H Z ; if this expectation value for H Z is positive then this is also a good solution, while if it is negative then it gives a solution which is worse than random by a factor 1/ √ D; we call this a very bad solution. Note that i Z i F i = KH Z . Above, we have considered the expectation value X T , but we can also consider higher moments of X. 
We will explain the reason for considering this later. The time evolution conserves the quantity H = X + α D H Z , but it also conserves all moments of this quantity. Note that in the state ψ + we have (H − N ) 2 Hence, Hence, we have related fluctuations in X − N to fluctuations in H Z . Suppose it is the case that with probability at most (αT 2 /D) 2 that τ H T (H Z ) is measured to be greater than αT 2 N (if this does not hold, then we can repeat the algorithm polynomially many times to have a large probability of obtaining a state in the computational basis with expectation value of H Z greater than αT 2 N ). Under this assumption, then since H Z ≤ DN , it follows that In the limit of large N , the quantity √ N T is asymptotically only √ N and so is negligible compared to the leading term, i.e., the rms (root-mean-square) fluctuations in X − N are comparable to or smaller than the magnitude of X − N .

III. COMBINING SOLUTIONS
Here we give some general results on how, given a solution to an optimization problem for a polynomial in several vectorial variables, one can construct a solution to the same problem where all variables are chosen to be the same; we call this "combining". Theorem 1 is the main result. We will use this result in both the classical and quantum algorithms; the vectors w a are the solution to the problem using several vectorial variables, while the u is the solution with all variables the same. This plays a key role in the classical algorithm, while for the quantum algorithm one can use a large expectation value for a quantity like Y iḞi , which is a polynomial in variables Y i , Z i to find a solution with large expectation in a single variable.
These results involve polynomials in real variables. However, the objective function $H_Z$ is an order-$K$ polynomial in variables $Z_i$, each chosen from $\{-1, +1\}$. Let $\vec{Z}$ be a vector of choices of the variables $Z_i$. We write $H_Z(\vec{Z})$ to denote the value of $H_Z$ for that given set of choices.
To apply the results to H Z , we randomly round choices of Z i from the interval [−1, +1] to choices of Z i from the discrete set {−1, +1} while preserving expectation value. Formally, consider a vectorial variable v with each entry chosen from the interval [−1, +1]. Then, independently choosing each Z i at random from {−1, +1}, picking the probability for each Now, let us define a polynomial H Z ( v 1 , v 2 , . . . , v K ) which depends upon K different vectorial variables as follows. This polynomial will be homogeneous of order 1 in each variable. For each term in where c is a scalar and i 1 , i 2 , . . . , i K are a sequence of distinct choices of i, we have a corresponding term in where the sum is over permutations π on K elements and ( v a ) b denotes the b-th entry of vector v a . For example, for Here in an abuse of notation we use the same symbol H Z (·) for two different functions, one depending on K vectorial arguments and one depending on a single vectorial argument.
Note that $H_Z(\vec{v}, \vec{v}, \ldots, \vec{v}) = H_Z(\vec{v})$. We will show that, given a choice of $\vec{v}_1, \ldots, \vec{v}_K$ such that $H_Z(\vec{v}_1, \ldots, \vec{v}_K)$ has a certain magnitude, we can find a choice of $\vec{v}$ such that $H_Z(\vec{v})$ obeys certain conditions on its magnitude. This will then be used in the classical setting in the following simple way: we will pick some vector $\vec{w}_2$ at random and then choose $\vec{w}_1$ greedily to optimize $H_Z(\vec{w}_1, \vec{w}_2, \vec{w}_2, \ldots, \vec{w}_2)$. Here the variable $\vec{w}_1$ appears 1 time while the variable $\vec{w}_2$ appears $K - 1$ times. This will give us the choice of $K$ different vectorial variables (though one variable is repeated $K - 1$ times) from which we will construct a solution with a single variable.
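The two constructions described above, rounding that preserves the expectation value and the polynomial in $K$ vectorial variables, can be sketched as follows. Two details are assumptions made here because the corresponding formulas are not legible in the text: the rounding takes $Z_i = +1$ with probability $(1 + v_i)/2$, and the sum over permutations is normalized by $1/K!$. Function names and the instance are hypothetical.

```python
from itertools import permutations, product
from math import factorial

def h_z_poly(terms, v):
    """Single-argument H_Z(v): sum of weight * prod_k v[i_k], for real entries v."""
    total = 0.0
    for weight, bits in terms:
        p = weight
        for i in bits:
            p *= v[i]
        total += p
    return total

def h_z_multi(terms, vs):
    """Symmetrized multilinear H_Z(v_1, ..., v_K), normalized by 1/K! (assumed)."""
    k = len(vs)
    total = 0.0
    for weight, bits in terms:
        for perm in permutations(range(k)):
            p = weight
            for a, i in zip(perm, bits):
                p *= vs[a][i]
            total += p
    return total / factorial(k)

def rounded_expectation(terms, v):
    """Exact E[H_Z(Z)] when Z_i = +1 with probability (1 + v_i)/2, independently."""
    total = 0.0
    for z in product((-1, 1), repeat=len(v)):
        prob = 1.0
        for zi, vi in zip(z, v):
            prob *= (1 + zi * vi) / 2
        total += prob * h_z_poly(terms, z)
    return total

# Hypothetical K = 3 instance on 4 bits and a fractional point v in [-1, 1]^4.
TERMS = [(+1, (0, 1, 2)), (-1, (1, 2, 3))]
V = [0.5, -0.25, 1.0, 0.0]
print(h_z_poly(TERMS, V), h_z_multi(TERMS, [V, V, V]), rounded_expectation(TERMS, V))
```

With these conventions all three printed numbers coincide: setting all arguments equal recovers the single-argument polynomial, and independent rounding preserves the expectation because every monomial is a product of distinct bits.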
Item 1 in the theorem will be the case that we need most. Item 2 almost follows from item 1 with $\epsilon = 1$, except item 2 has slightly tighter bounds. Item 3 is given for completeness, as it shows that some similar results hold when many variables are present, and item 3 is also used in the proof of item 1. Thus, the reader may consider only item 1.
where ( v a ) i denotes the i-th entry of vector v a . Assume that all vectors v a have the same number of entries, and assume that P is symmetric under permuting its arguments, i.e., that a i1,...,i K is symmetric under permuting its arguments.
Then the following holds: 1. Suppose that there exist vectors w 1 , w 2 such that P ( w 1 , w 2 , w 2 , w 2 , . . . , w 2 ) = C/K. Then for any > 0, at least one of the following two possibilities holds: Remark: item A is a statement about P while item B is a statement about the absolute value of P .
2. Suppose that there exist vectors w 1 , w 2 such that P ( w 1 , w 2 , w 2 , w 2 , . . . , w 2 ) = C/K and such that |( w a ) i | ≤ 1 for all a, i. (That is, the variable w 1 appears 1 time while the variable w 2 appears K − 1 times. Then, there exists some vector u with 3. Suppose that there exist some vectors w 1 , . . . , w K such that P ( w 1 , . . . , w K ) = C and such that |( w a ) i | ≤ 1 for all a, i. Then, there exists some vector u with for all i such that Further, in all cases, we can find u up to any desired nonzero error in a time linear in N , exponential in K, and at most polynomial in inverse error multiplied by the magnitude of the terms in the polynomial.
Note that item 3 above allows all of the $\vec{w}_a$ to be distinct. Items 1 and 2 consider the case of just two different $\vec{w}_a$, with $\vec{w}_2$ repeated $K - 1$ times in the argument of $P(\cdot)$. We can summarize item 2 as saying that one can obtain a solution whose absolute value is close to $C$, while item 1 can be summarized for small $\epsilon$ as saying that, compared to $P(\vec{w}_2, \vec{w}_2, \ldots, \vec{w}_2)$, either we can improve by a small amount (this is the "pretty good") or there is a solution which is much worse (this is the "very bad"). Note also that the bound on $|(\vec{u})_i|$ is different in item 2 compared to items 1 and 3.
We now prove the theorem. Define a function u(·), from R 2 to vectors, by where x a w a denotes the vector with i-th entry equal to x a ( w a ) i . First, we prove item 1. We need Remark: the factor 1/6 in the above equation is not optimal. It can be tightened easily. Indeed, for a 1 a max , the factor 1/6 approaches 1/2. , 1), . . . , u(x, 1)). Apply lemma 1 to Q(x) with a 1 = C. If case A of item 1 of theorem 1 does not hold for some given , then (1/6)C 2 /a max < C so a max > (1/6)(C/ ). So for some i ≥ 1, |a i | > (1/6)(C/ ). So, where w 1 appears i times in the argument of P (·) and w 2 appears K − i times. So, by item 3 of theorem 1, which we prove below, there is some choice of u with |( u) i | ≤ 1 for all i such that Lemma 2. Let p(x) be a polynomial of order K with p(x) = 0≤i≤d a i x i . Then, for K odd and for K even Proof. The proof is similar to the proof that the Chebyshev polynomials minimize the maximum absolute value on the interval [−1, 1] among all polynomials with given leading coefficients, i.e., with given value of a K . In this case, we instead fix the value of a 1 , but the proof is almost the same. First, without loss of generality we may assume that p(x) = −p(−x), as (p(x) − p(−x))/2 is also a polynomial of order K with coefficient of the linear term also equal to a 1 and |(p(x) − p(−x))/2| ≤ max(|p(x)|, |p(−x)|). So, we can assume that K is odd and the result for even K will follow immediately from the result for odd K.
Also, without loss of generality we may assume that a 1 = 1. Indeed, if a 1 = 0, then the result is trivially true, while for any nonzero a 1 we can instead consider p(x)/a 1 .
Assume that the lemma is false, i.e., assume that p(x) has maximum absolute value on the interval [−1, 1] which is strictly smaller than 1/K. Let T n (x) be the Chebyshev polynomials of first kind. For odd K, we have that −(−1) K ·T K (x)/K is an polynomial of order K which has coefficient of the linear term equal to 1. Further, −(−1) K · T K (x)/K has a maximum absolute value on the interval [−1, 1] equal to 1/K and it attains this maximum K + 1 times on this interval at points x = cos(kπ/K) for 0 ≤ k ≤ K. Let q(x) = p(x) + (−1) K · T K (x)/K. So, q(x) has coefficient of the linear term equal to zero, i.e., since it is an odd function of x, we have q(x) = i=3,5,...,K b i x i for some coefficients b i . Further by the assumption that p(x) has absolute value strictly smaller than 1/K on the interval, we have that at points x = cos(kπ/K) the sign of q(x) is the same as the sign of (−1) K · T K (x)/K. So, since the sign of T K (x) alternates at these points, i.e., the sign for even k is opposite to that for odd k, we have that q(x) changes sign at least K times so q(x) must have at least K − 1 distinct zeros. However, q(x) has order K and the root at x = 0 is triply degenerate so in fact q(x) can only have at most K − 2 distinct zeros, giving a contradiction. , 1), . . . , u(x, 1)). Applying lemma 2 to p(x) = Q(x) with a 1 = C, the result follows.
For both item 1 and 2, we can find an x which maximizes or minimizes |Q(x)| up to any given error by exhaustively trying a discrete set of points on the interval [−1, 1] with the spacing between points dependent on the error.
We finally prove item 3 of theorem 1. We need: Lemma 3. Let $p(x_1, \ldots, x_K)$ be a polynomial (not necessarily homogeneous) of order at most $K$ in real variables $x_1, \ldots, x_K$. Suppose that the coefficient of the term $\prod_i x_i$ in $p(\cdot)$ is equal to $C$. Then, for some choice of $(x_1, \ldots, x_K) \in \{-1, +1\}^K$ we have that $|p(x_1, \ldots, x_K)| \geq C$.
Proof. We claim that
$$\frac{1}{2^K} \sum_{x_1, \ldots, x_K \in \{-1, +1\}} \Big(\prod_i x_i\Big)\, p(x_1, \ldots, x_K) = C.$$
This holds because any term in $p(\cdot)$ proportional to $\prod_i x_i^{d_i}$ for some sequence of integers $d_i$ will vanish in the weighted sum above unless all $d_i$ are odd. However, since $p(\cdot)$ has order at most $K$, the only such nonvanishing term is that with all $d_i = 1$.
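The averaging identity behind lemma 3 is easy to test on a small case. In this sketch the weighted sum is normalized by $2^{-K}$, which is an assumption about the lost formula, and the polynomial `p_example` is hypothetical.

```python
from itertools import product

def multilinear_coefficient(p, k):
    """Average of (prod_i x_i) * p(x) over x in {-1, +1}^k."""
    total = 0.0
    for x in product((-1, 1), repeat=k):
        sign = 1
        for xi in x:
            sign *= xi
        total += sign * p(*x)
    return total / 2 ** k

# Hypothetical order-3 polynomial; the coefficient of x1*x2*x3 is 2.5.
def p_example(x1, x2, x3):
    return 2.5 * x1 * x2 * x3 + x1 ** 2 * x3 - 4.0 * x2 ** 3 + x1 * x2 + 7.0

C = multilinear_coefficient(p_example, 3)
best = max(abs(p_example(*x)) for x in product((-1, 1), repeat=3))
print(C, best)
```

Since the weighted average equals $C$, at least one of the $2^K$ sign choices must give $|p| \geq |C|$, which is what `best` confirms for this example.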

IV. CLASSICAL ALGORITHM
We now describe the classical optimization algorithm. Recall that we define F i to equal Z i times the sum of terms in H Z that include Z i .

Algorithm 1 Classical algorithm
1. Choose a set S of bits, by including each bit in S independently with probability 1/2.

2. Define vectorial variables $\vec{w}_1, \vec{w}_2$ as follows; the index of the vectorial variable will range over $\{1, \ldots, N\}$ so that it labels a bit. Let $\vec{w}_2$ be a vector with $(\vec{w}_2)_i = 0$ for $i \in S$, while for $i \notin S$ we choose $(\vec{w}_2)_i$ to be $+1$ or $-1$ independently and uniformly at random. We choose vector $\vec{w}_1$ so that $(\vec{w}_1)_i = 0$ for $i \notin S$, while for $i \in S$ we choose $(\vec{w}_1)_i$ "greedily". That is, we pick $(\vec{w}_1)_i = +1$ if $F_i(\vec{w}_2) > 0$ and $(\vec{w}_1)_i = -1$ otherwise.
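Steps 1 and 2 can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it reads the steps as placing $\vec{w}_2$ on the bits outside $S$ and $\vec{w}_1$ on the bits inside $S$ (the reading consistent with the later analysis), and it takes $C = \sum_{i \in S} |F_i(\vec{w}_2)|$, a definition inferred here because the text does not show it. Function names and the `(weight, bit_indices)` representation are hypothetical.

```python
import random

def force(terms, w, i):
    """F_i(w): sum over terms containing bit i of weight * product of the other entries."""
    total = 0
    for weight, bits in terms:
        if i in bits:
            p = weight
            for j in bits:
                if j != i:
                    p *= w[j]
            total += p
    return total

def steps_1_and_2(terms, n, rng):
    """Sample S and w2, then pick w1 greedily on S. Returns (S, w1, w2, C)."""
    s = {i for i in range(n) if rng.random() < 0.5}                  # step 1
    w2 = [0 if i in s else rng.choice((-1, 1)) for i in range(n)]    # step 2, random part
    w1 = [0] * n
    c = 0
    for i in s:                                                      # step 2, greedy part
        f = force(terms, w2, i)
        w1[i] = 1 if f > 0 else -1
        c += abs(f)          # assumed definition: C = sum_{i in S} |F_i(w2)|
    return s, w1, w2, c

# Toy K = 2 ring instance on 6 bits.
N = 6
TERMS = [(+1, (i, (i + 1) % N)) for i in range(N)]
S, W1, W2, C = steps_1_and_2(TERMS, N, random.Random(0))
print(sorted(S), W1, W2, C)
```

By construction $\vec{w}_1$ and $\vec{w}_2$ have disjoint support, and the greedy choice makes $\sum_i (\vec{w}_1)_i F_i(\vec{w}_2)$ equal to $C$, the largest value any sign choice on $S$ could give.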

A. Some Probability Bounds
We collect here some probability bounds that we will need to analyze this algorithm, as well as to analyze the quantum algorithm. The use of these bounds is similar to that in Ref. [2].
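By theorem 9.23 of Ref. [17], for any function $f$ of order at most $K$ from $\{-1, 1\}^N \to \mathbb{R}$ we have for any $t \geq (2e)^{K/2}$ that
$$\Pr\left[|f| \geq t\, \|f\|_2\right] \leq \exp\left(-\frac{K}{2e}\, t^{2/K}\right), \tag{31}$$
where $\|f\|_2 = \sqrt{E[f^2]}$. By theorem 9.24 of Ref. [17], for any nonconstant function $f$ of order at most $K$ from $\{-1, 1\}^N \to \mathbb{R}$,
$$\Pr\left[f > E[f]\right] \geq \frac{1}{4} \exp(-2K). \tag{32}$$
Hence, for any nonconstant function $f$ of order at most $K$ from $\{-1, 1\}^N \to \mathbb{R}$, by applying Eq. (32) to $f^2$ we have
$$\Pr\left[f^2 \geq E[f^2]\right] \geq \frac{1}{4} \exp(-4K). \tag{33}$$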

B. Analysis of Classical Algorithm
We will use E[. . .] to denote expectation values over choices of w 2 . We claim that E[|F i |] ≥ √ D exp(−O(K)) and that E[C] ≥ N √ D exp(−O(K)). To see this, note that each site is in S with probability at least 1/2. For any site (including a site in S in particular), we have that E[F i ( w 2 ) 2 ] is equal to 2 −(K−1) D. The factor of 2 −(K−1) arises because each monomial in F i is of order K − 1 and has probability 2 −(K−1) that all bits in that monomial are not in ) follows from Eq. (33). Note that the maximal value for C is N D, so with probability at least 1/poly(D) we find a choice of w 2 such that C is at least a constant factor times the expected value. Here the constant factor can be any number strictly less than 1, for example 1/2. This is an application of Markov's inequality. Consider N D − C as a non-negative random variable with expectation value N D − E[C]. The probability that C is smaller than E[C]/2, for example, is bounded by For such choices of w 2 , the algorithm must choose either case 1A or case 1B at least half the time (or any other number Ω(1) rather than one half). Hence, at least one of the following holds: with probability P at least poly(1/D) the algorithm chooses case 1A and C is within a constant factor of the expected value so that H Z ( u) is at least H Z ( w 2 ) + N √ D exp(−O(K)), or, with probability P at least poly(1/D) the algorithm chooses case 1B and C is within a constant factor of the expected value so that |H Z ( u)| ≥ N √ D exp(−O(K))/ . Now consider H Z ( w 2 ). This has expectation value 0 and the expectation value of H Z ( w 2 ) 2 is O(N T ). So by Eq. (31) the probability that |H Z ( w 2 )| is larger than O(log(N ) K/2 √ N T ) is equal to N −K/2e . This probability N −K/2e is asymptotically (in N ) negligible compared to P for any P = Ω(poly(1/D)). 
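The Markov step for $C$ can be written out explicitly. This is a worked sketch using only facts stated in the paragraph above, namely $0 \leq C \leq ND$ and $E[C] \geq N\sqrt{D}\exp(-O(K))$.

```latex
\begin{align*}
\Pr\left[C < \tfrac{1}{2}E[C]\right]
  &= \Pr\left[ND - C > ND - \tfrac{1}{2}E[C]\right] \\
  &\leq \frac{ND - E[C]}{ND - \tfrac{1}{2}E[C]}
   = 1 - \frac{\tfrac{1}{2}E[C]}{ND - \tfrac{1}{2}E[C]}
   \leq 1 - \frac{E[C]}{2ND}.
\end{align*}
% With E[C] >= N sqrt(D) exp(-O(K)), the probability that C >= E[C]/2 is therefore
% at least exp(-O(K)) / (2 sqrt(D)), which is 1/poly(D) for fixed K.
```

The same computation works with any constant strictly less than 1 in place of 1/2, at the cost of a different constant in the final bound.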
Hence, by a union bound, with probability $P = \Omega(\mathrm{poly}(1/D))$ one of the cases in the above paragraph holds (i.e., the algorithm chooses either case 1A or 1B and the given bounds on $H_Z(\vec u)$ hold) and also $|H_Z(\vec w_2)|$ is $o(N)$. So, …
Note that for odd $K$, we can guarantee that it achieves expected $H_Z(\vec u) \ge N\sqrt{D}\exp(-O(K))$ as in Ref. [2], since we can pick $\epsilon = 1$ and, if case 1B occurs, we can change the sign of all bits.
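The Markov step used above can be spelled out as follows. This is a sketch that assumes only $0 \le C \le ND$ and a lower bound of the form $E[C] \ge c_K N\sqrt{D}$, where $c_K$ is an illustrative name (not from the paper) for the $\exp(-O(K))$ factor.

```latex
\begin{align*}
\Pr\big[C < \tfrac12 E[C]\big]
  &= \Pr\big[ND - C > ND - \tfrac12 E[C]\big]
   \le \frac{ND - E[C]}{ND - \tfrac12 E[C]}
   = 1 - \frac{\tfrac12 E[C]}{ND - \tfrac12 E[C]}
   \le 1 - \frac{E[C]}{2ND}, \\
\Pr\big[C \ge \tfrac12 E[C]\big]
  &\ge \frac{E[C]}{2ND}
   \ge \frac{c_K}{2\sqrt{D}}
   \qquad \text{whenever } E[C] \ge c_K N \sqrt{D}.
\end{align*}
```

For fixed $K$ the right-hand side is $1/\mathrm{poly}(D)$, which is the probability quoted in the text.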

C. Modification With Generalized Duality and Comparison to Quantum Duality
The classical algorithm above achieves a duality very similar to the conjectured duality of conjecture 1 with … Then, … Thus, while we conjectured that the quantum algorithm either gave an expectation value of $H_Z$ equal to $\Omega_K(1)\alpha T^2 N$ or that there was some state with $\sum_i (2Z_iF_i - Y_i\dot F_i)$ at least $\Omega_K(1)\alpha N$, we find for the classical algorithm that we can prove either that at least $\mathrm{poly}(1/D)$ of the time it chose 1A and has an expectation value of $H_Z$ equal to $\Omega_K(1)\alpha T^2 N$, or that there was some state with expectation value of $H_Z$ at least $\Omega_K(1)\alpha N$ in absolute value. The main difference then between the conjectured result for the quantum algorithm and the proven result for the classical algorithm (in the particular case that …) is the replacement of $\sum_i (2Z_iF_i - Y_i\dot F_i)$ with $H_Z$ in the expectation value in the classical case. These two operators, $\sum_i Y_i\dot F_i$ and $H_Z$, are closely related to each other, with the first operator $\sum_i Y_i\dot F_i$ being obtained by taking each monomial in the sum defining $H_Z$ and replacing two of the Pauli $Z$ operators in that monomial with Pauli $Y$ operators; each term in $H_Z$ then gets replaced with $\approx K^2/2$ different terms. We describe this in more detail in subsubsection IV C 1.
One may then ask if one can achieve a similar duality in the classical algorithm that would be analogous to the case of $\alpha^2T^2/D \ll 1$. Indeed, this can be done, as we describe in subsubsection IV C 2. We emphasize that that subsubsection considers a statement about the classical algorithm: it proves that either the classical algorithm attains a certain performance or a quantum state (not necessarily related at all to the quantum state considered in the quantum algorithm) has a certain expectation value of $H_Z$ and also a certain expectation value of $X$. The point of that subsubsection is to show that an apparent extra feature of the duality in the quantum case (when one considers $\alpha^2T^2/D \ll 1$ so that the expectation value of $X$ is large) does not actually directly give any more powerful results.

Combining Solutions for Quantum Algorithm
Suppose we have a quantum state with large (in absolute value) expectation value for the operator $\sum_i (2Z_iF_i - Y_i\dot F_i)$. We will describe how to construct a classical state which has large expectation value (again, in absolute value) for $H_Z$. Indeed, the classical state will be constructed simply by measuring the quantum state in a product basis and (possibly) combining solutions.
Let $V$ denote the absolute value of the expectation value of $\sum_i (2Z_iF_i - Y_i\dot F_i)$ in the quantum state. Then, at least one of the following holds: the expectation value of $2\sum_i Z_iF_i$ is at least $V/2$ in absolute value, or the expectation value of $-\sum_i Y_i\dot F_i$ is at least $V/2$ in absolute value. In the first case, using the identity $\sum_i Z_iF_i = KH_Z$, the expectation value of $H_Z$ in the state is at least $V/(4K)$ in absolute value and one can simply measure the quantum state in the $Z$ basis to obtain a classical state with expectation of $H_Z$ which is at least $V/(4K)$ in absolute value.
In the second case we have that the expectation value of $\sum_i Y_i\dot F_i$ is at least $V/2$ in absolute value. This operator $\sum_i Y_i\dot F_i$ is related to $H_Z$ as mentioned. We will explore this relation in more depth. We randomly divide the qubits into two subsets; each qubit will be placed in subset $S_1$ with probability $1/K$ and in subset $S_2$ with probability $1 - 1/K$, choosing independently for each qubit. Then, we measure each qubit in $S_1$ in the $Y$ basis and measure each qubit in $S_2$ in the $Z$ basis.
We will define $\chi$ to be the state after measurement, i.e., $\chi$ is a product state where each qubit in $S_1$ is either $\pm 1$ in the $Y$ basis and each qubit in $S_2$ is either $\pm 1$ in the $Z$ basis. Consider the expectation value of $\sum_i Y_i\dot F_i$ in this product state $\chi$. For each term in $\sum_i Y_i\dot F_i$, the expectation value is zero unless both occurrences of Pauli $Y$ operators are for qubits in $S_1$ and all occurrences of Pauli $Z$ operators are for qubits in $S_2$; this occurs with probability $(1/K)^2(1 - 1/K)^{K-2} = \Omega(1/K^2)$. Let us use $E_{\mathrm{meas},S_1}[\ldots]$ to denote an expectation value over measurement outcomes and over choices of $S_1$. So $\big|E_{\mathrm{meas},S_1}[\langle\chi|\sum_i Y_i\dot F_i|\chi\rangle]\big| \ge \Omega(1/K^2)V$. So, using a similar Markov inequality as before, with probability that is at least inverse polynomial, a given choice of $\chi$ and $S_1$ will have $|\langle\chi|\sum_i Y_i\dot F_i|\chi\rangle|$ at least $\Omega(1/K^2)V$.
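The $\Omega(1/K^2)$ claim for the splitting probability can be checked numerically. The following is a small sketch (the function name is illustrative) that evaluates $(1/K)^2(1-1/K)^{K-2}$ and compares $K^2$ times it against $1/e$, which is a valid lower bound because $(1-1/K)^{K-2} \ge (1-1/K)^{K-1} \ge 1/e$ for $K \ge 2$.

```python
import math


def both_y_in_s1_probability(k: int) -> float:
    """Probability that the two Y sites land in S1 and the other k-2 sites land in S2."""
    return (1.0 / k) ** 2 * (1.0 - 1.0 / k) ** (k - 2)


# Print K, the probability, and K^2 times the probability for a few values of K.
for k in range(2, 9):
    prob = both_y_in_s1_probability(k)
    print(k, prob, prob * k * k)
```

The rescaled value stays between $1/e$ and 1, so the hidden constant in $\Omega(1/K^2)$ can be taken to be $1/e$.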
Consider such a choice of $\chi$ and $S_1$. … where $\vec v_2$ is repeated $K-2$ times. Consider a given term $cZ_{i_1}Z_{i_2}\cdots Z_{i_K}$ in the Hamiltonian for some scalar $c$. … This vanishes unless exactly two elements of the sequence $i_1,\ldots,i_K$ are in $S_1$ and the remaining elements are in $S_2$. Let us permute the order of the sequence so that $i_1, i_2 \in S_1$. Then, the term is equal to (after summing over permutations) … This is equal to a sum of $\binom{K}{2}$ different terms, corresponding to the different ways of replacing two Pauli $Z$ operators with Pauli $Y$ operators. The only one of these terms which is nonvanishing in the expectation value $\langle\chi|\sum_i Y_i\dot F_i|\chi\rangle$ is when we choose to replace $i_1, i_2$ by Pauli $Y$ operators. In this case, the contribution to the expectation value is 2 … Hence, summing over all terms in $H_Z$, we find that … At this point, if desired, one could apply the combining solution techniques to $\vec v_1, \vec v_2$ to obtain a single vector with large expectation value for $H_Z$ in absolute value.

Generalized Duality for Classical Algorithm
In this subsubsection, we describe a modification of the classical algorithm which provably achieves a duality similar to that in the quantum case, so that the performance of the classical algorithm is guaranteed unless there exists a quantum state with certain properties, including a large expectation value of $X$. The modification is simple: we change step 2 as follows. The bounds in step 3 change as a consequence and are given later.

Algorithm 2 Modified classical algorithm
1. Fix some real number 0 < p < 1. Choose a set S of bits, by including each bit in S independently with probability 1/2.

2. Define vectorial variables $\vec w_1, \vec w_2$ as follows; the index of the vectorial variable will range over $\{1,\ldots,N\}$ so that it labels a bit. Let $\vec w_2$ be a vector with $(\vec w_2)_i = 0$ for $i \in S$ while for $i \notin S$ we choose $(\vec w_2)_i$ to be $+1$ or $-1$ independently and uniformly at random. We choose vector $\vec w_1$ so that $(\vec w_1)_i = 0$ for $i \notin S$ while for $i \in S$ we choose $(\vec w_1)_i$ as follows. Pick … where the constant $p > 0$ is chosen below. If this choice of $(\vec w_1)_i$ gives $|(\vec w_1)_i| > 1$, then replace $(\vec w_1)_i$ with $(\vec w_1)_i/|(\vec w_1)_i|$.
We will always pick $p \le (2ec)^{-K/2}$, where the constant $c$ is chosen below. For any choice of $S$, the expectation value of $F_i^2$ is bounded by $D$. So, the probability that $|pF_i/\sqrt{D}| > 1$ is bounded by the probability that $|F_i| > (2ec)^{K/2}\sqrt{D}$ and so by Eq. (31), this probability is bounded by $\exp(-cK)$.
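To make step 2 concrete, here is a minimal sketch of the sampling. It assumes, as suggested by the clipping condition $|pF_i/\sqrt{D}| > 1$, that $(\vec w_1)_i = pF_i(\vec w_2)/\sqrt{D}$ on $S$; this form, the representation of $H_Z$ as a list of `(coefficient, index tuple)` pairs, and the function names are all assumptions made for illustration rather than the paper's definitions.

```python
import numpy as np


def forces(terms, z, n):
    """F_i = sum over terms containing bit i of coeff times the product of the other bits of z."""
    f = np.zeros(n)
    for coeff, idx in terms:
        for pos, i in enumerate(idx):
            others = idx[:pos] + idx[pos + 1:]
            f[i] += coeff * np.prod([z[j] for j in others])
    return f


def sample_w1_w2(terms, n, degree, p, rng):
    """Sample the set S and the vectors w1, w2 of the modified step 2 (assumed form)."""
    in_s = rng.random(n) < 0.5                          # each bit joins S with probability 1/2
    w2 = np.where(in_s, 0.0, rng.choice([-1.0, 1.0], size=n))
    f = forces(terms, w2, n)                            # monomials touching S vanish since w2 = 0 there
    w1 = np.where(in_s, p * f / np.sqrt(degree), 0.0)   # ASSUMED: (w1)_i = p F_i / sqrt(D) on S
    w1 = np.clip(w1, -1.0, 1.0)                         # equals w/|w| whenever |w| > 1
    return in_s, w1, w2
```

With this convention $\vec w_1$ is supported on $S$ and $\vec w_2$ on its complement, so any combination $x\vec w_1 + \vec w_2$ with $|x| \le 1$ has entries in $[-1,1]$.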
First let us estimate $E[\vec F \cdot \vec w_1]$, where $\vec F$ is a vector with components $F_i$. This is at least equal to the sum, over sites $i \in S$ with $|(\vec w_1)_i| < 1$, of the expected value $E[F_i(\vec w_1)_i]$, which in turn is equal to the sum over sites $i \in S$ of … where $\mu_i(f)$ is the probability distribution of force $f$ on site $i$. Using Eq. (31), we have that $\int_{|f| > \sqrt{D}/p} d\mu_i(f) \le \exp(-cK)$, as explained above. Indeed, further application of Eq. (31) shows that … To show this one may, for example, divide the integral over $|f| > \sqrt{D}/p$ into integrals over $|f|$ in intervals $[k\sqrt{D}/p, (k+1)\sqrt{D}/p]$ for integer $k$, and separately bound each integral by $(k+1)^2 (D/p) \int_{|f| > k\sqrt{D}/p} d\mu_i(f)$. For large enough $c$ we then have … Hence, … Hence, the quantity $C$ has $E[C] \ge pN\sqrt{D}\exp(-O(K))$. As before, the maximal value for $C$ is $ND$, so with probability at least $1/\mathrm{poly}(D)$ we find a choice of $\vec w_2$ such that $C$ is within a constant factor of the expected value.
For such choices of $\vec w_2$, the algorithm must choose either case 1A or case 1B at least half the time (or any other number $\Omega(1)$ rather than one half). Hence, at least one of the following holds: with probability $P$ at least $\mathrm{poly}(1/D)$ the algorithm chooses case 1A and $C$ is within a constant factor of the expected value so that $H_Z(\vec u)$ is at least $o(N) + Np\sqrt{D}\exp(-O(K))$, or, with probability $P$ at least $\mathrm{poly}(1/D)$ the algorithm chooses case 1B and $C$ is within a constant factor of the expected value so that $|H_Z(\vec u)| \ge o(N) + Np\sqrt{D}\exp(-O(K))/\epsilon$.
Thus far, it seems that all we have accomplished is worsening the previous result (by a factor of $p$). However, we now show that if the second case holds (the algorithm chooses case 1B), then we can construct a quantum state with large expectation value of $X$ and with an expectation value (in that quantum state) of $|H_Z(\vec u)|$ which is at least $O(\log(N)^{K/2}\sqrt{N_T}) + Np\sqrt{D}\exp(-O(K))/\epsilon$.
Before defining $\psi$, let us note the following. When the algorithm chooses case 1B, the vector $\vec u$ is equal to a linear combination $xp\vec w_1 + \vec w_2$ for some $x \in [-1,1]$. Discretizing the interval $[-1,1]$ into $\mathrm{poly}(D)$ bins, each of width $\mathrm{poly}(1/D)$, we find that if the algorithm chooses case 1B with probability at least $\mathrm{poly}(1/D)$, then there is some bin such that with probability at least $\mathrm{poly}(1/D)$ the algorithm chooses case 1B and $x$ falls in that bin. Choosing the width of the bins small enough, we can assume then that there is some fixed value of $x = x_0 \in [-1,1]$ such that with probability $\mathrm{poly}(1/D)$ the vector $\vec u = x_0p\vec w_1 + \vec w_2$ has $|H_Z(\vec u)| \ge o(N) + Np\sqrt{D}\exp(-O(K))/\epsilon - N/\mathrm{poly}(D)$.
Now apply Eq. (31) to this case. Let $E[H_Z]_{x_0}$ denote the expectation value of $H_Z(\vec u)$ for vector $\vec u = x_0p\vec w_1 + \vec w_2$, taking the expectation value over choices of $\vec w_2$. This expectation value is the expectation value of a polynomial of order at most $O(K^2)$, as each monomial in $H_Z$ is of order $K$ and each entry of $\vec w_1$ in turn is a polynomial of order at most $K-1$; it is true that we cut off entries of $\vec w_1$ at 1, but this occurs with negligible probability. The expectation value … $(N\,\mathrm{poly}(D))$. Hence, using Eq. (31), we can bound fluctuations of $H_Z(x_0p\vec w_1 + \vec w_2)$ about its average. So, since with probability at least $\mathrm{poly}(1/D)$ we have … We now define this state $\psi$ by … where $\theta = px_0/\sqrt{D}$ and where we define $F_{S,i}$ to denote the sum of all terms in $F_i$ which are supported on the complement of $S$.
We first compute $\langle\psi|H_Z|\psi\rangle$. Consider any term in $H_Z$. Such a term is proportional to some monomial $M \equiv Z_{i_1}Z_{i_2}\cdots Z_{i_K}$, for some sequence of distinct qubits $i_1,\ldots,i_K$. Suppose that $i_1,\ldots,i_j$ are in $S$ and $i_{j+1},\ldots,i_K$ are in the complement of $S$. We have that $\langle\psi|M|\psi\rangle$ is equal to … We can expand the above expectation value as a sum of $2^j$ different expectation values, by choosing for each $l = 1,\ldots,j$ to take either $\sin(\theta F_{S,i_l})X_{i_l}$ or $\cos(\theta F_{S,i_l})Z_{i_l}$. However, every expectation value for which we choose $\cos(\theta F_{S,i_l})Z_{i_l}$ for at least one choice of $l$ is vanishing in the state $\psi_+$, as then $Z_{i_l}$ appears exactly once in the product (the terms $F_{S,i_m}$ do not contain $Z_{i_l}$). Hence, … We now estimate the error in approximating $\sin(\theta F_{S,i_l})$ by $\theta F_{S,i_l}$. This error, for any $M$, is $O(\theta^3F_{S,i_l}^3)$. We show below that this is negligible for sufficiently small $\theta$.
Before bounding this error, note that if we include only the linear term $\theta F_{S,i_l}$ in the approximation to the sine, then for $\theta = px_0/\sqrt{D}$ we find that the expectation value of $H_Z$ in state $\psi$ is equal to … To get oriented, suppose that $F_{S,i_l}$ were bounded by … the two results match. Note, however, that we can only show this result for sufficiently small $p$ obeying Eq. (37) for the classical algorithm. Of course, we can always achieve the performance of theorem 2; the restriction on $p$ here applies only if we also wish to show the existence of a quantum state obeying … with large expectation value of $X$. One can consider higher moments too. Note that $\langle\psi|X_iX_j|\psi\rangle - \langle\psi|X_i|\psi\rangle \cdot \langle\psi|X_j|\psi\rangle$ is vanishing unless $i, j$ both appear in some term in $H_Z$ or unless there is some $k$ such that $i, k$ are both in some term in $H_Z$ and $k, j$ are both in some term in $H_Z$. Hence, $\langle\psi|X^2|\psi\rangle - \langle\psi|X|\psi\rangle^2 = o(N)$. Similar bounds can be made for higher moments.

V. LARGE X EXPECTATION VALUE IN DUALITY
We now consider some applications of these dualities.

A. Random Models
First consider a random model. Consider any $K$ and any $D$. We consider a fixed set of terms in $H_Z$, but with the signs of each term chosen randomly. We choose the signs independently, setting them equal to $+1$ with probability $1/2$ and $-1$ with probability $1/2$. Then, for any choice of $\vec v \in \{-1,+1\}^N$, the expectation value of $H_Z(\vec v)$ is equal to 0. The probability that $|H_Z(\vec v)|$ is greater than $\Delta$ is bounded by $\exp(-\Omega(\Delta^2/N_T)) = \exp(-\Omega(\Delta^2/(ND)))$. There are $2^N$ possible choices of $\vec v$, so by a union bound, with high probability there is no choice of $\vec v$ such that $|H_Z(\vec v)|$ is greater than $O(N\sqrt{D})$. Hence, by theorem 2 with $\epsilon = \exp(-O(K))$, we have that with high probability, a random instance has the property that the classical algorithm succeeds in finding a solution with $H_Z \ge N\sqrt{D}\exp(-O(K))$ a fraction at least $\mathrm{poly}(1/D)$ of the time.
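The union bound can be illustrated by brute force on a small instance. The sketch below assumes unit-magnitude coefficients, so that for fixed $\vec v$ the value $H_Z(\vec v)$ is a sum of $N_T$ independent random signs and Hoeffding's inequality gives $\Pr[|H_Z(\vec v)| \ge t] \le 2\exp(-t^2/(2N_T))$; the function names and the failure probability `delta` are illustrative choices, not quantities from the paper.

```python
import itertools
import math

import numpy as np


def max_abs_energy(n, k, n_terms, rng):
    """Largest |H_Z(v)| over all v in {-1,+1}^n for a random-sign instance with n_terms monomials."""
    monomials = list(itertools.combinations(range(n), k))
    picks = rng.choice(len(monomials), size=n_terms, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_terms)       # independent random sign per term
    assignments = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    energies = np.zeros(len(assignments))
    for sign, pick in zip(signs, picks):
        energies += sign * np.prod(assignments[:, list(monomials[pick])], axis=1)
    return float(np.max(np.abs(energies)))


def union_bound_threshold(n, n_terms, delta):
    """Value t with 2^n * 2 * exp(-t^2 / (2 n_terms)) = delta (Hoeffding plus union bound)."""
    return math.sqrt(2.0 * n_terms * ((n + 1) * math.log(2.0) + math.log(1.0 / delta)))


rng = np.random.default_rng(0)
print(max_abs_energy(12, 3, 200, rng), union_bound_threshold(12, 200, 1e-9))
```

With $N_T \sim ND$ the threshold scales as $\sqrt{N_T N} \sim N\sqrt{D}$, matching the $O(N\sqrt{D})$ in the text.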

B. Mean-Field Treatment
Now we consider some heuristic motivation why it may be worth considering the dualities that involve a large expectation value of X.
For motivation, to explain why this large expectation value of $X$ may be useful, we give an approximate mean-field treatment: consider some Hamiltonian of order $K$ that we will call $H_0$ that is diagonal in the $Z$-basis. Suppose we wish to minimize the expectation value of $H_Z$ over states with given expectation value of $X$, i.e., we seek a state with large negative expectation value of $H_Z$. If no constraint were placed on the expectation value of $X$, then we minimize $H_Z$ by choosing some state in the computational basis. For each qubit $i$, this state has some expectation value $\langle Z_i\rangle = z_i$ with $z_i \in \{-1,+1\}$. If we wish to obtain a nonzero expectation value of $X$, then a simple way is to take a product state, where each qubit has $\langle X_i\rangle = \cos(\theta)$ and $\langle Z_i\rangle = z_i\sin(\theta)$, for some angle $\theta$. For $\theta = \pi/2$, we recover the classical state. At small $\theta$, the expectation value of $H_Z$ is proportional to $\theta^K$, while the expectation value of $1 - X_i$ is proportional to $\theta^2$. Thus, for $K > 2$, the expectation value of $H_0$ drops more rapidly as a function of $\theta$ than does the expectation value of $1 - X_i$.
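The scaling in this mean-field ansatz can be verified directly on a few qubits. The sketch below builds the product state with $\langle X_i\rangle = \cos\theta$ and $\langle Z_i\rangle = z_i\sin\theta$ for $0 \le \theta \le \pi/2$ and evaluates a single order-3 monomial; the helper names are illustrative, and the choice of four qubits is only to keep the matrices small.

```python
from functools import reduce

import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def qubit_state(z, theta):
    """Real single-qubit state with <X> = cos(theta) and <Z> = z*sin(theta), for z in {-1,+1}."""
    polar = np.arccos(z * np.sin(theta))     # Bloch polar angle measured from the +Z axis
    return np.array([np.cos(polar / 2), np.sin(polar / 2)])


def product_state(zs, theta):
    """Tensor product of the single-qubit states, one per entry of zs."""
    return reduce(np.kron, [qubit_state(z, theta) for z in zs])


def operator_on(site_ops, n):
    """Tensor product with the given 2x2 operators on the given sites and identity elsewhere."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])


def expectation(state, op):
    return float(state @ op @ state)


zs, theta = [1, -1, -1, 1], 0.3
psi = product_state(zs, theta)
print(expectation(psi, operator_on({0: X}, 4)), np.cos(theta))
print(expectation(psi, operator_on({0: Z, 1: Z, 2: Z}, 4)), np.sin(theta) ** 3)
```

Because the state is a product, every order-$K$ monomial picks up exactly $\sin^K\theta$ times its classical value, which is the $\theta^K$ scaling at small $\theta$.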
A similar mean-field treatment might be applied to a Hamiltonian that includes both $Y$ and $Z$ operators, such as $\sum_i Y_i\dot F_i$ relevant to the quantum algorithm: given any product solution of $H_0$ with $\langle Z_i\rangle = z_i$ and $\langle Y_i\rangle = y_i$ with $z_i^2 + y_i^2 = 1$, we can define a product state with $\langle X_i\rangle = \cos(\theta)$ and $\langle Z_i\rangle = z_i\sin(\theta)$ and $\langle Y_i\rangle = y_i\sin(\theta)$. If this mean-field procedure were the best possible then we would have very strong bounds on the existence of such a state: we would have (for small $\theta$) the scaling $\theta^2 \sim p^2$ and … while the expectation value of $H_Z$ would be at most $\theta^K$ in absolute value times the minimal value of $H_Z$. Call this optimal value $H_Z^{\min}$. So, we would have … Here we are ignoring terms which are $o(N)$. Ignoring $K$-dependent constants such as $\exp(-O(K))$, and taking, in the most optimistic situation, $p \sim \theta \sim 1/\sqrt{D}$ (since for smaller $\theta$ the expectation value of $X_i$ is within $1/D$ of 1 and certainly the mean-field is not accurate here), we would find that such a state has $|H_Z^{\min}| \sim D^{K/2}N/\epsilon$. That is, either the algorithm finds a solution with expectation value of $H_Z$ at least $N$ or $|H_Z^{\min}| \gtrsim D^{K/2}N/\epsilon$. For the case $K = 2$, this is the same guarantee as before, if we rescale $\epsilon \to \epsilon\sqrt{D}$, but for $K = 4$ or larger, this is a much stronger guarantee. Of course, this mean-field procedure is only an approximation and other states may exist with more negative expectation value of $H_Z$ at the same expectation value of $X$.
Still, one use of the large expectation value of $X$ is that any such quantum state must necessarily have a large entropy in the computational basis [14]. Thus, not only must there exist computational basis states with large $|H_Z|$, there must exist many such states.

C. Dense Case
A final interesting case to consider is a dense case, $N_T \sim N^K$. The dense case was studied previously [13] where it was shown that one can in general improve upon a random assignment by an amount proportional to $\sqrt{N_T}$. This means that one can achieve $H_Z \sim N^{K/2}$ in the worst case. This is interesting as the problem has degree $D \sim N^{K-1}$ and so the improvement over random even in the worst case is by much more than $N_T/D$ for $K > 2$.
In fact, the algorithm of Ref. [13] is very simple, consisting simply of randomly sampling solutions until one achieves a solution with the given improvement. Indeed, the fluctuations in the expectation value of $H_Z$ that we have written as $o(N)$ above simply reflect this variance in the solution.
However, it is interesting to analyze what happens with the quantum algorithm. Consider the Hamiltonian … which amounts to choosing $\alpha = D/\sqrt{N_T}$. We analyze this Hamiltonian using a Krylov subspace: define the three states … where the scalar $c = (\langle 0|H^2 \ldots)$, … it follows that no approximate eigenstate can have almost all of its probability on state $|0\rangle$. Precisely, let $\psi$ be a state such that $|H\psi - E\psi| \le \exp(-O(K))$. Then, $\psi$ cannot have more than $1 - \exp(-O(K))$ of its probability on state $|0\rangle$. Otherwise, $\langle 1|(H - E)|\psi\rangle$ would be too large as the term $H_{01} = 1$.

VI. ANALYSIS OF QUANTUM ALGORITHM
We now analyze the quantum quench algorithm in more detail. From Eq. (9), … Consider any given site $i$. We will estimate $\langle X_i\rangle_T$. Summing over $i$ will give $\langle X\rangle_T$.
The basic physical idea is that if we can ignore the time-dependence of the force $F_i$, then we can approximate $\langle X_i\rangle_T$ by the expectation value of $X_i$ assuming that the qubit $i$ evolves for a time $T$ under a time-independent Hamiltonian. This time-independent Hamiltonian has a transverse field of strength 1 (i.e., the term $X_i$ in the Hamiltonian) and a parallel field $(\alpha/D)F_i$, where $F_i$ is the force assuming that all other qubits $Z_j$ for $j \ne i$ are drawn from a uniformly random distribution (because at time $T = 0$, the state of the system is $\psi_+$ which has equal amplitude on all states). In this case, similar to the analysis of the classical algorithm before, the force $F_i$ is likely to be at least of order $\sqrt{D}$, in which case we will have $1 - \langle X_i\rangle_T \sim (\alpha/D)^2F_i^2T^2 \sim \alpha^2T^2/D$.
However, we cannot always neglect the time-dependence of the force. To estimate whether or not the time-dependence of the force is important, we should compare the time-derivative of the force to $\sqrt{D}/T$. If the time-derivative of the force is small enough compared to $\sqrt{D}/T$, then the approximation of the above paragraph will be valid.
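The time-independent approximation amounts to a single-qubit problem that can be solved exactly. The sketch below evolves $|+\rangle$ under $H = X + hZ$, where $h$ stands in for the frozen parallel field $(\alpha/D)F_i$ (the symbol $h$ and the function names are illustrative), and compares the result with the closed form $1 - \langle X\rangle_T = 2h^2\sin^2(\omega T)/(1+h^2)$ with $\omega = \sqrt{1+h^2}$, obtained from precession of the Bloch vector about the field direction.

```python
import numpy as np


def one_minus_x(h, t):
    """1 - <X> at time t for a qubit starting in |+> and evolving under H = X + h Z."""
    x_op = np.array([[0.0, 1.0], [1.0, 0.0]])
    z_op = np.array([[1.0, 0.0], [0.0, -1.0]])
    energies, vectors = np.linalg.eigh(x_op + h * z_op)
    plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
    amplitudes = vectors.conj().T @ plus                  # expand |+> in the eigenbasis
    psi_t = vectors @ (np.exp(-1j * energies * t) * amplitudes)
    return 1.0 - float(np.real(psi_t.conj() @ x_op @ psi_t))


def closed_form(h, t):
    """Bloch-vector precession result for the same quantity."""
    omega = np.sqrt(1.0 + h * h)
    return 2.0 * h * h * np.sin(omega * t) ** 2 / (1.0 + h * h)


for h, t in [(0.05, 0.2), (0.3, 1.0), (1.0, 2.5)]:
    print(h, t, one_minus_x(h, t), closed_form(h, t))
```

For small $h$ and small $\omega T$ this reduces to $2h^2T^2$, so with $h \sim \alpha/\sqrt{D}$ one recovers the $\alpha^2T^2/D$ scaling up to a constant.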
In subsection VI A we analyze the time-independent case. Subsection VI B describes a toy example where we can see the effects of time-dependence. In subsection VI C we consider the time-dependence in more detail.
to the force at time 0 (i.e., to √ D) divided by the time T , in order for the force to be small at time T . We will show this more precisely using Cauchy-Schwarz inequalities.
One might then guess (we do not show this) that given $\dot F_i$ of order $\sqrt{D}/T$, and given that $1 - \langle X_i\rangle_T$ is of order $\alpha^2T^2/D$, then $\langle Y_i\rangle$ would be of order $\alpha T/\sqrt{D}$ and so $\langle Y_i\dot F_i\rangle$ would be of order $\alpha$. Thus, the results in this section may be interpreted as evidence in favor of conjecture 1. (Note that if $1 - \langle X_i\rangle_T$ is much larger than this, then Eq. (9) guarantees a large expectation value for $H_Z$ while if $1 - \langle X_i\rangle_T$ is much smaller than this, the constraints on the state at time $T$ become more stringent due to the larger expectation value for $X$. Further, the magnitude of $\dot F_i$ would need to be larger to have a smaller expectation value $1 - \langle X_i\rangle_T$.)
Define … Define … This state $\phi(T)$ has the following property, as can be seen by going to the interaction representation. Define the operator $R$ by … so that $R$ includes all terms in $H$ which are not supported on site $i$. Then $\phi(T) = \exp(-iRT)\exp\!\big(-i(\tfrac{\alpha}{D}Z_iF_i + X_i)T\big)\psi_+$. Then, … so that the expectation value $\langle\phi(T)|X_i|\phi(T)\rangle$ is given by the time-independent approximation above. … where the exponential is an $s$-ordered exponential (i.e., it is time-ordered with respect to $s$, as are later exponentials of integrals below) and where we define … So, $\phi(T) = \psi_+(T) + \xi$.
Let $\Pi_i^- = (1 - X_i)/2$, so that it projects onto the $|-\rangle$ state on qubit $i$. So, …
We emphasize that this quench algorithm is not described by a fixed depth quantum circuit, independent of $D$. The Lieb-Robinson velocity $v_{LR}$ of this Hamiltonian is proportional to $\sqrt{\alpha}$, as can be shown by using Lieb-Robinson bounds adapted to Hamiltonians which are a sum of two types of terms (in this case, $X_j$ for different qubits $j$ is one type and terms in $H_Z$ is another type) such that terms within a type commute [19]; more generally, we can use bounds adapted to the case of a bounded commutator [11]. Here, to define the Lieb-Robinson velocity, we define a distance between qubits by using a graph metric for a graph with vertices corresponding to qubits and an edge between vertices if the corresponding qubits are both in some term in $H_Z$.
The estimates using the Lieb-Robinson velocity give some upper bound on how far a perturbation can propagate in a given time; the effect of a perturbation beyond a distance proportional to $v_{LR}t$ is negligible. These estimates may not be tight, but we expect that indeed the velocity of perturbations will be proportional to $\sqrt{\alpha}$ in many systems. If this is true, then if $\alpha t^2$ diverges with $D$ to obtain a nontrivial approximation, the necessary circuit depth also diverges.
Also, it may be useful to consider a generalization of the algorithm in which one does some slow (but not necessarily adiabatic) evolution of the Hamiltonian from an initial Hamiltonian $X$ to $H = X + (\alpha/D)H_Z$, followed by an additional time evolving under $H = X + (\alpha/D)H_Z$. This is similar to the quantum adiabatic algorithm except one proceeds at some nonzero speed, allowing level crossings. The point of the analysis here is that even if the evolution from initial to final Hamiltonian is very nonadiabatic, the evolution for a nonzero time in some fixed Hamiltonian can achieve a useful result, as decohering in the eigenbasis can increase the expectation value of $H_Z$ while reducing that of $X$. This decoherence is a possible principle that can be used to show a nontrivial approximation.