Classical and Quantum Algorithms for Tensor Principal Component Analysis

We present classical and quantum algorithms based on spectral methods for a problem in tensor principal component analysis. The quantum algorithm achieves a quartic speedup while using exponentially smaller space than the fastest classical spectral algorithm, and a super-polynomial speedup over classical algorithms that use only polynomial space. The classical algorithms that we present are related to, but slightly different from, those presented recently in Ref. [1]. In particular, we have an improved threshold for recovery, and the algorithms we present work for both even and odd order tensors. These results suggest that large-scale inference problems are a promising future application for quantum computers.


Introduction
Principal component analysis is a fundamental technique that finds applications in reducing the dimensionality of data and in denoising. While an optimal choice of principal components for a matrix can be computed efficiently using linear algebra, the corresponding problem for tensors is much less well understood. Ref. [2] introduced a simple statistical model for tensor principal component analysis, termed the "spiked tensor" problem, and this paper has led to a large amount of follow-up research. The model consists of (see below for more precise definitions) randomly choosing some unknown "signal vector" v_sig ∈ R^N; then, the p-th order tensor T_0 = λ v_sig^{⊗p} + G is formed, where G ∈ (R^N)^{⊗p} is noise chosen from some random distribution and where λ is some scalar representing a signal-to-noise ratio. One task, called recovery, is to infer v_sig (to some accuracy) given T_0. A simpler task, called detection, is to distinguish the case λ = 0 from λ = λ̄ for some λ̄ > 0, again just given T_0.
Ref. [2] presented a variety of algorithms for this problem. Following the normalization of Ref. [1], the entries of G are chosen independently from a Gaussian distribution of zero mean and unit variance, with |v_sig| = √N. Then, information-theoretically, it is possible to recover for λ much larger than N^{(1−p)/2} [2,3]. However, no polynomial-time algorithm is known that achieves this performance. Rather, the two best known approaches are spectral and sum-of-squares. Spectral algorithms were first suggested in Ref. [2]. There, a matrix is formed from T_0 (if p is even, the matrix is N^{p/2}-by-N^{p/2}, with its entries given by entries of T_0) and the leading eigenvector of the matrix is used to determine v_sig. For even p, this method works for λ much larger than N^{−p/4}, and a variant of it is conjectured to perform similarly for odd p. Methods based on the sum of squares also perform similarly to the spectral method. The sum-of-squares method [4,5] for this problem gives rise to a sequence of algorithms [6,7], in which one can recover at λ smaller than N^{−p/4} at the cost of runtime and space increasing exponentially in polylog(N) · N^{−p/4}/λ. In Ref. [1], a sequence of spectral algorithms with similar performance was shown.
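To make the basic spectral method concrete, the following sketch (the choices of N, λ, and seed are ours, purely illustrative) generates a spiked p = 4 tensor, unfolds it into an N²-by-N² matrix, and reads off an estimate of v_sig from the leading eigenvector:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 8, 4
v = rng.standard_normal(N)
v *= np.sqrt(N) / np.linalg.norm(v)        # normalize so |v_sig| = sqrt(N)
lam = 1.0                                  # well above the N^{-p/4} scale, for a clear demo
G = rng.standard_normal((N,) * p)          # Gaussian noise tensor
T0 = lam * np.einsum('i,j,k,l->ijkl', v, v, v, v) + G

# Unfold into an N^2-by-N^2 matrix, symmetrize, and take the leading eigenvector.
M = T0.reshape(N * N, N * N)
M = (M + M.T) / 2
_, U = np.linalg.eigh(M)
top = U[:, -1].reshape(N, N)

# The leading singular vector of the reshaped top eigenvector estimates v_sig.
u_est = np.linalg.svd(top)[0][:, 0]
corr = abs(u_est @ v) / (np.linalg.norm(u_est) * np.linalg.norm(v))
```

Detection in this picture amounts to checking whether the top eigenvalue of M stands out above the bulk of the random-matrix spectrum.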
In this paper, we present another spectral algorithm for the problem. Our spectral algorithm for even p is closely related to that of Ref. [1], which we became aware of while this paper was in preparation, and we use the normalization of that paper. However, we make several changes. Our main technical results are the following. First, we prove an improved threshold for recovery for even p using our algorithm; the improvement is by a constant factor and relies on a randomized method of recovering. Second, we provide a different algorithm for odd p with provable guarantees, while no guarantees were given in Ref. [1] for odd p. For both even and odd p, we have provable bounds on recovery for λ of order N^{−p/4} (without any polylogarithmic factors), and we have a sequence of algorithms similar to that above for λ small compared to N^{−p/4}. Third, we give a quantum algorithm for our spectral method, achieving a quartic speedup and an exponential reduction in space. This quantum algorithm involves two main ideas. The first uses phase estimation and amplitude amplification to obtain a quadratic speedup in computing the largest eigenvector. The second uses a chosen input state to obtain a further quadratic speedup, giving the overall quartic speedup.
We emphasize that the quantum speedup is quartic compared to the classical spectral algorithms presented here and in previous work. We are not able to make an accurate comparison of the runtime to sum-of-squares methods. In part, given that the runtime of all of these algorithms increases exponentially in λ^{−1}, a change in prefactors in some estimates of the threshold can give rise to polynomial changes in runtime. We expect that many of these estimates of thresholds are not tight (indeed, we expect that they are off by a polylogarithmic factor), and so either improved analytic methods or numerical simulations are needed to give an accurate comparison.
At a more heuristic level, we present a rather different motivation for our spectral algorithm compared to Ref. [1]. Rather than being motivated by the so-called Kikuchi free energy, we are instead motivated by mean-field approximations to quantum many-body systems. We consider a system of some number n_bos of qudits, each of dimension N, and use the tensor T_0 to construct a quantum Hamiltonian on these qudits. Increasing n_bos gives rise to a similar sequence of algorithms as above, with increased runtime but improved performance: the required n_bos increases polynomially in λ^{−1} as polylog(N)(N^{−p/4}/λ)^{4/(p−2)}, but the runtime increases exponentially in n_bos.
Restricting to the symmetric subspace, these n_bos qudits can be thought of as a system of bosons. In the case p = 4, for example, our Hamiltonian has pairwise interaction terms for all pairs of qudits. It is natural from the viewpoint of mean-field theory in physics, then, to expect that the leading eigenvector of the problem, for large n_bos, can be approximated by a product state. While the bounds for arbitrary pairwise Hamiltonians would require rather large n_bos for given N in order for such a mean-field approximation to be accurate [8,9,10], we will be able to prove that for the statistical model above the mean-field approximation becomes accurate with high probability at much smaller n_bos, depending upon the value of λ. In this mean-field regime, the product state is an n_bos-fold tensor product of a single-particle state, and this single-particle state is given by v_sig in an obvious way, regarding the vector v_sig as a vector in the single-particle Hilbert space. While we will not prove that this state is a good approximation to the leading eigenvector, it will be a good approximation to some state in an eigenspace with large eigenvalue. Then, the single-particle density matrix allows one to infer v_sig (a similar matrix was used in Ref. [1], where it was termed a voting matrix).
Classically, implementing this spectral algorithm requires high-dimensional linear algebra, in particular finding the leading eigenvector of a matrix of dimension ≈ N^{n_bos}. This makes it a natural candidate for a quantum algorithm. Since the Hamiltonian here is fairly simple, it can be simulated efficiently using standard techniques in the literature, reviewed later. This allows us to give a simple algorithm based on preparing a random initial state and then phase estimating in an attempt to project onto the leading eigenvector. The probability of success in this projection is inversely proportional to the dimension of the matrix, so this simple algorithm leads to no speedup over the classical method. However, we show that it is possible to apply amplitude amplification to give a quadratic speedup over the classical method. More surprisingly, we show that one can use the tensor T_0 to prepare an input state to the algorithm with improved overlap with the leading eigenvector, giving the quantum algorithm a quartic speedup over classical. Here, when comparing to classical, we are considering classical algorithms based on the power method or similar algorithms such as Lanczos; these algorithms require exponential space, while the quantum algorithm uses only polynomial space. We also consider classical algorithms based on ideas in Ref. [11], which use polynomial space; the quantum algorithm is super-polynomially faster than these. We also present some minor improvements to the quantum algorithm which may be useful in practice.

Definitions, Random Ensembles, and Notation
Let us make some formal definitions. A tensor T of order p and dimension N is a multi-dimensional array. The entries of the tensor are written T_{μ_1,μ_2,…,μ_p}, where p ≥ 1 is an integer and each μ_a ranges over 1, …, N. Generalizing previous work on this problem, we consider two possible cases, one in which the entries of a tensor are chosen to be real numbers, and one in which they may be complex numbers, so that either T ∈ (R^N)^{⊗p} or T ∈ (C^N)^{⊗p}; we explain later the reason for this generalization. A tensor with all entries real will be called a real tensor. A symmetric tensor is one that is invariant under permutation of its indices. The symmetrization of a tensor is equal to 1/p! times the sum of tensors given by permuting indices.
The spiked tensor model for given N, p is defined as follows. Let v_sig be a vector in R^N, normalized by |v_sig| = √N, chosen from some probability distribution; this is the "signal vector". Let G be a real tensor of order p with entries chosen from a Gaussian distribution with vanishing mean. We let T_0 = λ v_sig^{⊗p} + G as above, where v_sig^{⊗p} is defined to be the tensor with entries (v_sig^{⊗p})_{μ_1,…,μ_p} = (v_sig)_{μ_1} ⋯ (v_sig)_{μ_p}. Here we use the notation that a subscript on a vector denotes an entry of that vector; we use a similar notation for matrices later. Remark: some of the best sum-of-squares results are for a different distribution in which the entries of T_0 are chosen from a biased distribution on {−1, +1}, rather than for the Gaussian distribution. We expect that using that distribution would not affect the results here too much, but we avoid treating that case, for simplicity.
Since the tensor v_sig^{⊗p} is symmetric, it is of course natural to replace T_0 by its symmetrization. Indeed, no information can be lost by this replacement, since given a tensor T_0 one can symmetrize the tensor and then add back in Gaussian noise chosen to vanish under symmetrization, obtaining a tensor drawn from the same distribution as T_0 was. That is, the cases in which G is symmetrized or not can be reduced to each other.
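The symmetrization above can be sketched directly (tensor size and seed are illustrative): average over all p! index permutations and check the result is permutation-invariant and that symmetrizing is idempotent.

```python
import itertools
import math

import numpy as np

def symmetrize(T):
    """Average an order-p tensor over all p! permutations of its indices."""
    p = T.ndim
    return sum(np.transpose(T, perm)
               for perm in itertools.permutations(range(p))) / math.factorial(p)

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4, 4))       # an order-3 noise tensor
S = symmetrize(G)
```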
A generalization of this problem is the case in which G is chosen to have complex entries, with each entry having real and imaginary parts chosen from a Gaussian distribution with vanishing mean and variance 1/2. We refer to this as the complex ensemble, while we refer to the case where G has real entries as the real ensemble; the choice of reducing the variance to 1/2 is a convenient normalization for later. It is clear that since v_sig is real, the case of complex G can be reduced to the real case (up to an overall rescaling for the different variance) simply by taking the real part of T_0, and similarly the real case can be reduced to the complex case (again up to an overall rescaling) by adding Gaussian-distributed imaginary terms to the entries of T_0. We will see that for odd p, at least for purposes of analyzing the algorithms, it is convenient not to symmetrize T_0 and to take complex G, while for even p this is not necessary. It may be possible to avoid doing this for odd p (which may improve the detection and recovery thresholds of the algorithm by constant factors), and we comment on this later.
We treat p as fixed in the asymptotic notation, but consider the dependence on N and n_bos. So, throughout this paper, when we refer to a polynomial in N, we mean a polynomial independent of the parameter n_bos. The polynomial may, however, depend on p, such as N^p.
We make additionally the following assumptions: Assumption 1. We assume that n_bos = O(N^θ) for some p-dependent constant θ > 0 chosen sufficiently small. We will also assume that λ is Ω(N^{−θ′}) for some p-dependent constant θ′ > p/4.
Finally, we assume that λ = O(N^{−p/4}). Remark: there is of course no reason to consider λ larger than this, since simple spectral methods succeed if λ is ω(N^{−p/4}), but we state this assumption explicitly as it simplifies some of the big-O notation.
We will explicitly state this Assumption 1 in all theorems where it is needed; the assumption will be implicit in the statements of the lemmas and will not be explicitly stated there, to avoid cluttering the statement of the results.
The first of these assumptions, that n_bos = O(N^θ), is useful to simplify some of the statements of the results, avoiding having to specify the allowed range of n_bos in each case. For example, we will say that a quantity such as n_bos^p/N is o(1), meaning that we must take θ < 1/p. We do not specify the exact value of θ, but it can be deduced from the proofs if desired.
The second of these assumptions, that λ is Ω(N^{−θ′}), also helps simplify some of the statements of the results. Since we have assumed that n_bos = O(N^θ) and we will see that the required n_bos increases polynomially with λ^{−1}, this assumed lower bound on λ is not a further restriction on our results.
We write E[…] to denote expectation values and Pr[…] to denote probabilities. Usually these are expectation values or probabilities over choices of G, though in some cases we consider expectation values over other random variables. We use ‖·‖ to denote the operator norm of an operator, i.e., the largest singular value of that operator. We use |·| to denote either the ℓ_2 norm of a tensor or the ℓ_2 norm of a quantum state. All logarithms are natural logarithms.
We use ⟨·|·⟩ to denote inner products and use bra-ket notation both for vectors and for quantum mechanical states.
We will need to compute expectation values over random choices of G and also compute expectation values of certain operators in quantum states, such as ⟨ψ|O|ψ⟩ for some state ψ and operator O. We refer to the latter as a quantum expectation value of O in state ψ, to distinguish it from an expectation value over random variables.

Outline
In section 2, we review some results on recovery and boosting from Ref. [1] and present a randomized recovery procedure that will help in improving the recovery threshold. In section 3, we give spectral algorithms for the spiked tensor problem for both even and odd p. In that section, we present algorithms in terms of eigenvalues and eigenvectors of a matrix that we call a Hamiltonian (more precisely, in terms of vectors in some eigenspace and quantum expectation values of operators in those vectors). We leave the method to compute these eigenvalues and expectation values for section 5, where we give classical and quantum algorithms for this and give time bounds for those algorithms. In section 4, we give some results on the spectra of random tensors needed for the analysis of these algorithms.
A key idea here is reducing the case of a p-th order tensor for odd p to the case of a q-th order tensor for even q = 2(p − 1). One interesting corollary of this technique, see corollary 2, is that for odd p and for the minimal value of n_bos we are able to remove a logarithmic factor in some of the bounds (a similar logarithmic factor was also removed in Ref. [4] using what they termed an "unfolding" algorithm). Appendix A gives an introduction to some techniques used to evaluate expectation values of tensor networks whose tensors are chosen from a Gaussian distribution; these techniques are used earlier in the paper. In section 6, we further discuss tensor networks and use this to consider limitations of certain algorithms and also to explain further some of the motivation for this algorithm. In section 7, we discuss some extensions of the results.
The proof of detection is in theorem 2 for the even p case and theorem 4 for the odd p case. The proof of recovery is in theorem 3 for the even p case and theorem 5 for the odd p case. The runtime bound for the fastest quantum algorithm is in theorem 6. This theorem gives a quartic improvement in the runtime compared to the fastest classical spectral algorithm; more precisely, the log of the runtime of the quantum algorithm divided by the log of the runtime of the classical algorithm approaches 1/4 as N → ∞ at fixed N^{−p/4}/λ.

Recovery
In this section we discuss recovery and define randomized procedures for recovery that will be useful in boosting the threshold. Following the notation of Ref. [1], define the correlation between two vectors x, y by the normalized overlap corr(x, y) = |⟨x, y⟩| / (|x| |y|). The goal of an algorithm is to produce a vector x with large corr(x, v_sig). Note that we take the absolute value in the correlation, ignoring the sign. For even p, the sign of v_sig is irrelevant, while for odd p it is easy, given a guess of v_sig up to sign, to try both choices of sign and see which is more likely.
Strong recovery means that corr(x, v_sig) = 1 − o(1). Proposition 2.6 of Ref. [1], which is noted in that reference as being implicit in Ref. [2], shows how to "boost" a weaker correlation to strong recovery. It is shown that given a vector u one can apply a single iteration of the tensor power algorithm to obtain a new vector x such that, with high probability, corr(x, v_sig) ≥ 1 − c/(λ N^{(p−1)/2} corr(u, v_sig)^{p−1}), where c is a constant depending on p. So, for any λ = ω(N^{(1−p)/2}), if corr(u, v_sig) = Ω(1), we have corr(x, v_sig) = 1 − o(1). Thus, it suffices to construct an algorithm which outputs some vector u which has, with high probability, corr(u, v_sig) = Ω(1). This is termed weak recovery; indeed, one can be satisfied with even weaker assumptions depending on the value of λ: for λ close to N^{−p/4}, one may have polynomially small corr(u, v_sig).
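The boosting step is a single tensor power iteration: contract all but one index of T_0 with the current guess u. A sketch (parameters and seed are ours, with λ far above threshold so that one step visibly boosts a weakly correlated start):

```python
import numpy as np

def power_iterate(T0, u):
    """One tensor power step: x_i = sum_{j,k,...} (T0)_{i,j,k,...} u_j u_k ..."""
    x = T0
    for _ in range(T0.ndim - 1):
        x = x @ u                      # each @ contracts the last index with u
    return x / np.linalg.norm(x)

rng = np.random.default_rng(2)
N, p, lam = 30, 3, 2.0                 # lam far above N^{(1-p)/2}, for a clear demo
v = rng.standard_normal(N)
v *= np.sqrt(N) / np.linalg.norm(v)
T0 = lam * np.einsum('i,j,k->ijk', v, v, v) + rng.standard_normal((N, N, N))

g = rng.standard_normal(N)
u = v / np.linalg.norm(v) + g / np.linalg.norm(g)   # partially correlated start
u /= np.linalg.norm(u)
corr_before = abs(u @ v) / np.linalg.norm(v)
x = power_iterate(T0, u)
corr_after = abs(x @ v) / np.linalg.norm(v)
```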
The spectral algorithms that we construct later will output a matrix that we will write ρ_1. This matrix will be a positive semi-definite Hermitian matrix with trace 1. In physics terms, such a matrix is termed a density matrix. For sufficiently large λ, the leading eigenvector of this matrix will have a large overlap with v_sig. However, for smaller λ, we will still have some lower bound that, with high probability, ⟨v_sig|ρ_1|v_sig⟩ = Ω(1)|v_sig|². The randomized algorithm 1 below shows how to use this to obtain weak recovery. This randomized algorithm allows us to improve the threshold over the recovery method of Ref. [1], which simply uses the leading eigenvector of ρ_1, since it works even in some cases where the leading eigenvector has small correlation with v_sig.
Putting these results together, we find: Corollary 1. Given an algorithm that, with high probability, outputs a density matrix ρ with ⟨v_sig|ρ|v_sig⟩ = Ω(1)|v_sig|², then in polynomial time, with high probability, strong recovery is possible. We will simply say that such an algorithm achieves recovery.
We present the algorithm for the case that the matrix ρ_1 may be complex; if the matrix is real, one can instead sample from a real Gaussian distribution, and the proof of lemma 1 goes through with slightly different constants.
Algorithm 1 Input: density matrix ρ. Output: some vector u obeying the bound in lemma 1 below. 1. Randomly sample a vector u with entries chosen from a correlated complex Gaussian distribution with zero mean and covariance E[u_μ ū_ν] = ρ_{μν}. We have Lemma 1. For algorithm 1, with probability at least 1/2, |⟨u|v_sig⟩|² / |u|² ≥ c ⟨v_sig|ρ|v_sig⟩ for some scalar c > 0.
Proof. We have E[|u|²] = tr(ρ) = 1, so by Markov's inequality, with probability at least 3/4 we have |u|² ≤ 4. We have E[|⟨u|v_sig⟩|²] = ⟨v_sig|ρ|v_sig⟩, and since ⟨u|v_sig⟩ is a Gaussian random variable with mean 0, with probability at least 3/4 its absolute value is at least some positive constant c′ (the exact constant can be deduced from the error function) times its standard deviation. By a union bound, both events hold with probability at least 1/2, and the lemma follows with c = c′²/4.
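The sampling step of algorithm 1 can be sketched as follows (the toy density matrix, dimension, and sample count are ours): draw u = ρ^{1/2} g with g a standard complex Gaussian, so that E[u u†] = ρ and E[|u|²] = tr(ρ) = 1, which the sketch checks empirically.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
rho = A @ A.conj().T
rho /= np.trace(rho).real              # toy density matrix: PSD, Hermitian, trace 1

# Matrix square root of rho via its eigendecomposition.
w, V = np.linalg.eigh(rho)
sqrt_rho = (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def sample_u():
    # Standard complex Gaussian: real and imaginary parts of variance 1/2 each.
    g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    return sqrt_rho @ g

samples = np.array([sample_u() for _ in range(20000)])
emp_cov = samples.T @ samples.conj() / len(samples)    # estimates E[u u^dagger] = rho
mean_norm_sq = np.mean(np.sum(np.abs(samples) ** 2, axis=1))  # estimates tr(rho) = 1
```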

Spectral Algorithm
We now give the spectral algorithms for the spiked tensor problem for both even and odd p. In subsection 3.1, we define a Hamiltonian H(T) given an arbitrary tensor T. Then, in subsection 3.2, we present the spectral algorithm in terms of H(T_0). For even p, the Hamiltonian that we present is very similar to the matrix Y given in Ref. [1], but it has some minor differences. In our language (explained below), this matrix Y is obtained by projecting our Hamiltonian of Eq. (5) onto the subspace of the symmetric subspace spanned by |μ_1⟩ ⊗ |μ_2⟩ ⊗ ⋯ ⊗ |μ_{n_bos}⟩ with μ_1, …, μ_{n_bos} all distinct. The relative reduction in the size of the matrix is only O(1/N) in the limit of large N.
Also, in our method, we have an O(N) rotational symmetry of the basis which is very useful in the analysis, for example showing that the eigenvalues of H(λ v_sig^{⊗p}) are independent of the choice of v_sig. For the matrix Y of Ref. [1], this is not obvious to us, and we do not fully understand the claimed behavior of the largest eigenvalue in that case. We will use a different notation, using creation and annihilation operators, which will help make this rotational symmetry more explicit.
For odd p, the Hamiltonian that we use is unrelated to that of Ref. [1].

Hamiltonian Definition
For even p, given a tensor T we define a linear operator H(T), that we call a Hamiltonian, as follows. This Hamiltonian is a linear operator on a vector space (R^N)^{⊗n_bos} or (C^N)^{⊗n_bos}, for integer n_bos ≥ 1 chosen below. We write basis elements of this space as |μ_1⟩ ⊗ |μ_2⟩ ⊗ ⋯ ⊗ |μ_{n_bos}⟩, and we call this space the full Hilbert space. We define

H(T) = (1/2) [ Σ_{μ_1,…,μ_{p/2}} Σ_{ν_1,…,ν_{p/2}} T_{μ_1,…,μ_{p/2},ν_1,…,ν_{p/2}} a†_{μ_1} ⋯ a†_{μ_{p/2}} a_{ν_1} ⋯ a_{ν_{p/2}} + h.c. ].   (5)

This replacement by a second-quantized Hamiltonian is simply a convenient notation. The operators a†_μ, a_μ are bosonic creation and annihilation operators, obeying the canonical commutation relations [a_μ, a†_ν] = δ_{μ,ν}. See appendix B for a brief review of this formalism. We restrict to the subspace with a total of n_bos bosons, i.e., we define the number operator n = Σ_μ a†_μ a_μ and restrict to n = n_bos. An orthonormal basis of states for this symmetric subspace is given by all states equal to some normalization constant multiplying a†_{μ_1} a†_{μ_2} ⋯ a†_{μ_{n_bos}} |0⟩, where |0⟩ is the vacuum state (i.e., the state annihilated by a_μ for all μ), and where μ_1 ≤ μ_2 ≤ ⋯ ≤ μ_{n_bos} is some sequence.
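To make the construction concrete, here is a small first-quantized sketch for p = 4 and n_bos = 2: the Hermitian part of the N²-by-N² unfolding of T is applied to each ordered pair of qudits. The overall normalization is our own convention and may differ from Eq. (5) by a constant factor; the sketch checks that the product state built from v_sig is an eigenvector of H(v_sig^{⊗p}).

```python
import numpy as np

N, p, n_bos = 5, 4, 2

def ham_even_p4(T):
    """First-quantized sketch of H(T) for p = 4, n_bos = 2 (our normalization):
    apply the Hermitian part of the N^2 x N^2 unfolding of T to both ordered
    pairs of qudits."""
    M = T.reshape(N * N, N * N)
    M = (M + M.conj().T) / 2
    dim = N ** n_bos
    S = np.zeros((dim, dim))           # SWAP of the two qudits
    for i in range(N):
        for j in range(N):
            S[j * N + i, i * N + j] = 1.0
    return M + S @ M @ S

rng = np.random.default_rng(4)
v = rng.standard_normal(N)
v *= np.sqrt(N) / np.linalg.norm(v)    # |v_sig| = sqrt(N)
H_sig = ham_even_p4(np.einsum('i,j,k,l->ijkl', v, v, v, v))
psi_sig = np.kron(v, v) / N            # normalized, since |v (x) v| = |v|^2 = N
```

In this toy convention, the eigenvalue of H(v_sig^{⊗4}) on psi_sig is 2N², i.e., N^{p/2} times the number of ordered qudit pairs.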
The second quantized Hamiltonian for the symmetric subspace is unchanged under arbitrary permutation of the first p/2 indices of T and arbitrary (not necessarily the same) permutation of the last p/2 indices of T .
For odd p, we define the Hamiltonian H(T) as follows. Given a tensor T of odd order p, define a new tensor T̃ of even order q = 2(p − 1) with components

T̃_{μ_1,…,μ_{p−1},ν_1,…,ν_{p−1}} = Σ_σ T_{μ_1,…,μ_{p−1},σ} T_{ν_1,…,ν_{p−1},σ}.   (8)

Then define H(T) = H(T̃), using the definition (5) for H(T̃). Note the order of indices on the left-hand side of Eq. (8). Using the second-quantized notation, this gives for odd p:

H(T) = (1/2) [ Σ_σ Σ_{μ_1,…,μ_{p−1}} Σ_{ν_1,…,ν_{p−1}} T_{μ_1,…,μ_{p−1},σ} T_{ν_1,…,ν_{p−1},σ} Π_{i=1}^{p−1} a†_{μ_i} Π_{i=1}^{p−1} a_{ν_i} + h.c. ].

Now we require that n_bos ≥ p − 1, as otherwise H(T) is trivially zero. For this Hamiltonian, it is convenient to take G from the complex ensemble because, as we explain more below, it makes E[H(G)] equal to zero, as well as canceling out certain terms in higher-order moments, making the proof of the spectral properties of H(G) simpler. We discuss later to what extent we can avoid using the complex ensemble.
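A sketch of the odd-to-even reduction (this is our reading of Eq. (8), with the contracted index placed last): from an order-3 tensor T we form the order-4 tensor T̃ by contracting one index between two copies of T without conjugation. The lack of conjugation is why the complex ensemble gives E[T̃] = 0 entrywise, since E[z²] = 0 for a standard complex Gaussian z; the sketch checks both the signal part and this vanishing mean.

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 3, 3                        # odd p; the reduced order is q = 2(p-1) = 4

def reduce_odd(T):
    # Ttilde_{a,b,c,d} = sum_sigma T_{a,b,sigma} T_{c,d,sigma}   (no conjugate)
    return np.einsum('abs,cds->abcd', T, T)

# Signal check: for T = v (x) v (x) v, Ttilde = |v|^2 * v (x) v (x) v (x) v.
v = rng.standard_normal(N)
Tt_sig = reduce_odd(np.einsum('i,j,k->ijk', v, v, v))
target = (v @ v) * np.einsum('i,j,k,l->ijkl', v, v, v, v)

# Noise check: with complex-ensemble noise, E[Ttilde] vanishes entrywise.
trials = 4000
acc = np.zeros((N,) * 4, dtype=complex)
for _ in range(trials):
    G = (rng.standard_normal((N,) * p)
         + 1j * rng.standard_normal((N,) * p)) / np.sqrt(2)
    acc += reduce_odd(G)
mean_Tt = acc / trials
```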

Spectral Algorithms
The spectral algorithm for detection and recovery is algorithm 2. In this subsection we prove correctness of this algorithm, using statistical properties of H(G) proven later. The algorithm uses quantities E_0 and E_max defined later; roughly, E_max is an upper bound on the eigenvalues of H(G) and E_0 is the largest eigenvalue of H(λ̄ v_sig^{⊗p}). The algorithm can then achieve detection by verifying that the largest eigenvalue is significantly larger than E_max, which occurs when E_0 is large enough. Indeed, we will see that it suffices to have E_0 ≥ (1 + c)E_max for any fixed c > 0 (some results can be extended to the case that c decays slowly with N, but we omit this for brevity). This fixes the scaling of n_bos with λ: we need, up to polylogarithmic factors, n_bos ≈ (N^{−p/4}/λ)^{4/(p−2)}.
One interesting feature of the algorithm is that in step 3, we compute the density matrix of the leading eigenvector, or of any vector in the eigenspace of eigenvalue ≥ E_cut, for E_cut = (E_0 + E_max)/2 defined in the algorithm. This might seem surprising, given that the leading eigenvector is computed in step 1; one might wonder why some other vector should be taken. We describe the algorithm in this way since, in the later classical and quantum algorithms that we give to compute the spectral properties of the matrix, we might not extract the leading eigenvector but instead extract only some vector in this eigenspace, due to the use of the power method in the classical algorithm or to approximations in phase estimation in the quantum algorithm. Thus, much of our analysis is devoted to showing that some eigenvalue larger than E_cut exists, by lower bounding the leading eigenvalue λ_1; given that some such eigenvalue exists, we do not worry too much about exactly what mixture of eigenvectors in the given eigenspace we compute.
Algorithm 2 Spectral algorithm. This algorithm takes a tensor T_0 as input, and also a scalar λ̄ and an integer n_bos. The output is a decision about whether λ = λ̄ or λ = 0, and, if the algorithm reports that λ = λ̄, it also returns an approximation of v_sig (up to an overall sign). The quantity n_bos is chosen depending upon the value of λ̄; smaller values of λ̄ require larger values of n_bos in order for E_0 to be sufficiently larger than E_max for the algorithm to be accurate. See theorems 2, 3, 4, and 5. For E_0 ≥ (1 + c)E_max for any fixed c > 0, the algorithm achieves recovery.

1. Compute the largest eigenvalue λ_1 of H(T_0).

2. (Detection) If λ_1 ≥ E_cut, where E_cut = (E_0 + E_max)/2, with E_0 the largest eigenvalue of H(λ̄ v_sig^{⊗p}) (computed in the analysis below for even and odd p) and E_max defined in theorem 1, then report that λ = λ̄. Otherwise report λ = 0.
3. (Recovery) Compute the single-particle density matrix (defined below) of the leading eigenvector, or of any vector in the eigenspace of eigenvalue ≥ E_cut. Apply algorithm 1 to recover an approximation to v_sig.
In section 4, we consider the largest eigenvalue λ_1 of H(G) and show the following theorem, which summarizes the results of lemmas 5 and 7. Roughly, up to prefactors, the result for odd p is given by considering the result for a q-th order tensor for even q = 2(p − 1) and then multiplying the largest eigenvalue by a factor of √N.
Theorem 1. Let λ_1 be the largest eigenvalue of H(G). For even p, let E_max be the quantity defined in lemma 5, and for odd p let E_max be the quantity defined in lemma 7 (which, up to prefactors, is √N times the even-p quantity at order 2(p − 1)). Both are proportional to a scalar J that depends implicitly on p, n_bos, N and tends to some function depending only on p for large n_bos, N. More precisely, for even p, J is equal to (p/2)! (n_bos choose p/2) / n_bos^{p/2} + o(1) for the real ensemble and is twice that for the complex ensemble, and for odd p, J is equal to that for the even case at order 2(p − 1).
Then, assuming Assumption 1, the probability that λ_1 exceeds E_max + x decays rapidly in x; the precise tail bounds, for even and odd p, are given in section 4. So, with high probability, λ_1 ≤ (1 + o(1))E_max. Now consider the eigenvectors and eigenvalues of H(T_0). For any symmetric tensor T of order n_bos, let |T⟩ be the vector on n_bos qudits (each of dimension N) with amplitudes given by the entries of the tensor in the obvious way: the amplitude of |μ_1⟩ ⊗ ⋯ ⊗ |μ_{n_bos}⟩ is T_{μ_1,…,μ_{n_bos}}. This vector is only normalized if |T| = 1. So, Ψ_sig ≡ N^{−n_bos/2} |v_sig^{⊗n_bos}⟩ is a normalized vector. We have the following simple property: Lemma 2. The largest eigenvalue of H(T_0) is at least E ≡ ⟨Ψ_sig|H(T_0)|Ψ_sig⟩. Proof. Immediate from the variational principle.
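Lemma 2 is simply the variational principle: for any normalized state, the quantum expectation value of a Hermitian operator lower bounds its top eigenvalue. A generic numerical sanity check (the matrix and state here are arbitrary illustrations, not the Hamiltonian itself):

```python
import numpy as np

rng = np.random.default_rng(8)
d = 40
A = rng.standard_normal((d, d))
H = (A + A.T) / 2                      # an arbitrary real symmetric matrix
psi = rng.standard_normal(d)
psi /= np.linalg.norm(psi)             # an arbitrary normalized state

rayleigh = psi @ H @ psi               # quantum expectation value <psi|H|psi>
top_eig = np.linalg.eigvalsh(H)[-1]    # largest eigenvalue of H
```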

Even p Case
We now show correctness of the algorithm. All results in this subsubsection refer to even p, even if we do not state it explicitly. First we estimate the quantity E of lemma 2 to show detection. Then, we show recovery.
We have E = ⟨Ψ_sig|H(λ̄ v_sig^{⊗p})|Ψ_sig⟩ + ⟨Ψ_sig|H(G)|Ψ_sig⟩, where the first term on the right-hand side equals E_0. To evaluate ⟨Ψ_sig|H(G)|Ψ_sig⟩ it is convenient to exploit a rotational invariance of this problem. We can apply a rotation using any matrix O in the orthogonal group O(N), rotating the creation and annihilation operators by making the replacement a†_μ → Σ_ν O_{μν} a†_ν, a_μ → Σ_ν O_{μν} a_ν. This rotation preserves the canonical commutation relations and is equivalent to rotating the basis states on each qudit by O. To preserve the Hamiltonian H(T_0), we correspondingly rotate each leg of the tensor T_0. This rotation preserves the Gaussian measure on G but changes v_sig. So, we can rotate so that v_sig is some fixed vector, say (√N, 0, 0, …); then ⟨Ψ_sig|H(G)|Ψ_sig⟩ is proportional to a single entry of G, i.e., the entry with all indices equal to 1, which is some quantity chosen from a Gaussian distribution with zero mean and unit variance. So, with high probability, E ≥ E_0 − o(E_0), and we have Theorem 2. Let Assumption 1 hold and suppose that E_0 ≥ (1 + c)E_max for some fixed c > 0. Then with high probability algorithm 2 correctly determines whether λ = 0 or λ = λ̄.
Proof. This follows from the estimate of E, which lower bounds the largest eigenvalue when λ = λ̄, and from theorem 1, which upper bounds ‖H(G)‖.
Given any state, define the single-particle density matrix ρ_1, in the basis for the full Hilbert space used in Eq. (5), to be the reduced density matrix on any one of the qudits. Equivalently, this single-particle density matrix can be expressed in terms of creation and annihilation operators by (ρ_1)_{μν} = ⟨a†_ν a_μ⟩ / n_bos. Note that ρ_1 is positive semi-definite, Hermitian, and has trace 1.
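Computing ρ_1 from a state on the full Hilbert space can be sketched as follows (N and n_bos are illustrative): reshape the amplitude vector and trace out all but one qudit. For a state in the symmetric subspace any qudit gives the same ρ_1; for Ψ_sig one recovers |v_sig⟩⟨v_sig| / N, which the sketch checks.

```python
import numpy as np

def single_particle_dm(psi, N, n_bos):
    """Reduced density matrix on the first qudit of an n_bos-qudit state."""
    A = psi.reshape(N, N ** (n_bos - 1))
    return A @ A.conj().T              # traces out the remaining n_bos - 1 qudits

rng = np.random.default_rng(6)
N, n_bos = 4, 3
v = rng.standard_normal(N)
v *= np.sqrt(N) / np.linalg.norm(v)    # |v_sig| = sqrt(N)

psi_sig = np.kron(np.kron(v, v), v) / N ** (n_bos / 2)   # N^{-n_bos/2} v^{(x) n_bos}
rho1 = single_particle_dm(psi_sig, N, n_bos)
```

Note that ⟨v_sig|ρ_1|v_sig⟩ = |v_sig|² here, the largest value possible, as expected for the pure signal state.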
We have Theorem 3. Let Assumption 1 hold. Given any vector Ψ such that ⟨Ψ|H(T_0)|Ψ⟩ ≥ (1 + c′)E_max for some scalar c′ > 0, then with high probability the corresponding single-particle density matrix ρ_1 obeys ⟨v_sig|ρ_1|v_sig⟩ = Ω(1)|v_sig|². In particular, by corollary 1, algorithm 2 achieves recovery.
Proof. By theorem 1, with high probability ⟨Ψ|H(G)|Ψ⟩ ≤ (1 + o(1))E_max, so ⟨Ψ|H(λ̄ v_sig^{⊗p})|Ψ⟩ ≥ (c′ − o(1))E_max. Rotate so that v_sig = (√N, 0, 0, …). Then, the Hamiltonian H(λ̄ v_sig^{⊗p}) is diagonal in the computational basis for the full Hilbert space. Let P(m) be the probability, for state Ψ, that exactly m out of n_bos qudits are in state |1⟩. Then, ⟨Ψ|H(λ̄ v_sig^{⊗p})|Ψ⟩ is a weighted sum over P(m) of terms increasing with m, and since ⟨v_sig|ρ_1|v_sig⟩ = N Σ_m P(m) m/n_bos, the lower bound on the quantum expectation value implies the claimed lower bound. Remark: more generally, the same result holds for any mixed state: given any mixed state σ such that tr(σ H(T_0)) ≥ (1 + c)E_max for any scalar c > 0, then with high probability the corresponding single-particle density matrix ρ_1 obeys Eq. (17).

Odd p Case
We now show correctness of the algorithm for odd p. All results in this subsubsection refer to odd p, even if we do not state it explicitly.
Let us first estimate E to show detection. For odd p, H(T_0) is a quadratic function of T_0, so there are some cross-terms. We have E = ⟨Ψ_sig|H(λ̄ v_sig^{⊗p})|Ψ_sig⟩ + (cross-term) + (term quadratic in G), where the first term equals E_0. The cross-term is the quantum expectation value in Ψ_sig of the terms in H(T_0) containing one factor of λ̄ v_sig^{⊗p} and one factor of G.
Exploiting the same rotational invariance as in the even p case, this is equal to 2λ̄ (p − 1)! (n_bos choose p − 1) N^{p/2} multiplied by the real part of a single entry of G, i.e., the entry with all indices equal to 1. So, with high probability, this cross-term is bounded by λ̄ (n_bos choose p − 1) O(N^{p/2+η}) for any η > 0.
The term quadratic in G is the quantum expectation value ⟨Ψ_sig|H(G)|Ψ_sig⟩.
Exploiting the same rotational invariance as before, fixing v_sig = (√N, 0, 0, …), we must have all μ_i, ν_i = 1. So, this is a sum of squares of N different entries of G, corresponding to the N possible choices of σ; since we use the complex ensemble, this sum has vanishing mean. So, with high probability this term is (n_bos choose p − 1) O(N^{1/2+η}) for any η > 0. Combining the three terms, with high probability E ≥ E_0 − o(E_0). Hence, we have Theorem 4. Let Assumption 1 hold and suppose that E_0 ≥ (1 + c)E_max for some fixed c > 0. Then with high probability algorithm 2 correctly determines whether λ = 0 or λ = λ̄.
Proof. This follows from the estimate of E in Eq. (20), which lower bounds the largest eigenvalue when λ = λ̄, and from theorem 1, which upper bounds ‖H(G)‖. Note that for λ̄ = O(N^{−p/4}), the error terms on the right-hand side of that estimate are negligible, as they are o(E_0). We now consider recovery. We will first give a more general bound on the cross-terms that will be useful. We have H(T_0) = H(λ̄ v_sig^{⊗p}) + H(G) + cross-terms. Let us first bound the cross-terms.

Lemma 3. With high probability, the operator norm of the cross-terms is bounded by λ̄ (n_bos choose p − 1) O(N^{(3p−1)/4}).
Proof. The cross-terms are equal to 2H(T_cross), where T_cross is a tensor of even order q = 2(p − 1) which has components

(T_cross)_{μ_1,…,μ_{p−1},ν_1,…,ν_{p−1}} = λ̄ Σ_σ (v_sig^{⊗(p−1)})_{μ_1,…,μ_{p−1}} (v_sig)_σ G_{ν_1,…,ν_{p−1},σ}.   (21)

Rotating so that v_sig = (√N, 0, 0, …), we have (T_cross)_{μ_1,…,μ_{p−1},ν_1,…,ν_{p−1}} = λ̄ √N (v_sig^{⊗(p−1)})_{μ_1,…,μ_{p−1}} G_{ν_1,…,ν_{p−1},1}. Clearly, ‖H(T_cross)‖ ≤ (n_bos choose p − 1) ‖M_cross‖, where M_cross is an N^{p−1}-by-N^{p−1} matrix whose entries are determined by the entries of T_cross in the obvious way, so that the first p − 1 indices of T_cross determine a column index of M_cross and the last p − 1 indices determine a row index. Regard T_{μ_1,…,μ_{p−1},1} as being the entries of some tensor of order p − 1, and let M be the N^{(p−1)/2}-by-N^{(p−1)/2} matrix whose entries are determined by the entries of this tensor, again in the obvious way, so that the first (p − 1)/2 indices of the tensor determine a column index and the last (p − 1)/2 indices determine a row index. Then, ‖M_cross‖ ≤ 2λ̄ √N N^{(p−1)/2} ‖M‖. However, since the entries of M are independent Gaussian random variables, it follows from standard random matrix theory results [12] that with high probability ‖M‖ = O(N^{(p−1)/4}). Remark: the dependence on n_bos is not tight here.
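The random matrix bound invoked here is the standard fact that an n-by-n matrix of independent standard Gaussians has operator norm concentrated near 2√n. A quick numerical check (size and tolerances are ours):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
M = rng.standard_normal((n, n))
top_sv = np.linalg.norm(M, 2)          # largest singular value, ~ 2 sqrt(n)
```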
So, as in the even case, we have:

Theorem 5. Let Assumption 1 hold. Given any vector Ψ such that ⟨Ψ|H(T_0)|Ψ⟩ ≥ (1 + c′)E_max for any scalar c′ > 0, then with high probability the corresponding single-particle density matrix ρ_1 obeys a bound analogous to that of the even case. In particular, algorithm 2 achieves recovery.
Proof. We use lemma 3 to bound ‖H(T_cross)‖. This bound is asymptotically negligible compared to E_max. So, we have ⟨Ψ|H(λ v_sig^{⊗p}) + H(G)|Ψ⟩ ≥ (1 + c′ − o(1))E_max, where the o(1) denotes the o(1) contribution from the cross-terms.
Then, the rest of the proof is the same as in the even case, except for a replacement of p/2 by p − 1. In detail: by theorem 1, with high probability ⟨Ψ|H(G)|Ψ⟩ ≤ (1 + o(1))E_max. Hence, ⟨Ψ|H(λ v_sig^{⊗p})|Ψ⟩ ≥ (c′ − o(1))E_max. Rotate so that v_sig = (√N, 0, 0, ...). Then, the Hamiltonian H(λ v_sig^{⊗p}) is diagonal in the computational basis for the full Hilbert space. Let P(m) be the probability, for state Ψ, that exactly m out of n_bos qudits are in state |1⟩. Then, ⟨Ψ|H(λ v_sig^{⊗p})|Ψ⟩ is determined by the probabilities P(m), and the argument proceeds as in the even case.

Spectrum of Random Hamiltonian
In this section, we will estimate the eigenvalues of H(G). We consider first the case of even p. Here our proof is very similar to that of Ref. [1], though the method here also suggests some heuristics that may lead to a tighter bound in the future. Then, we consider the case of odd p by reducing it to the case of even p.

Even p
We first consider the case of even p. Let Z(τ, p, N, n_bos) denote E[tr(exp{τH(G)})] for a tensor G of rank p, with entries of G chosen from the Gaussian distribution, with given N, n_bos, for some real scalar τ. In this subsection, G may be symmetrized or not, and may be chosen from either the real or complex ensemble.
The main result in this section is the following lemma:

Lemma 4. For each ensemble, real or complex, symmetrized or not, we have the bound below, where J is a scalar that depends implicitly on p, n_bos, N and tends to some function depending only on p for large n_bos, N. More precisely, J is equal to (p/2)! (n_bos choose p/2) / n_bos^{p/2} + o(1) for the real ensemble and is twice that for the complex ensemble.
Proof. We first give a brief derivation of Eq. (28) below, which is a standard result using quantum field theory techniques. Note that τ′H(G) = H(τ′G), and τ′G is a tensor with entries chosen from a Gaussian distribution with zero mean and variance (τ′)^2. Hence, for any τ′ > τ, we have Z(τ′, p, N, n_bos) = E_{G,η}[tr(exp{τH(G) + H(η)})], where E_{G,η}[...] denotes the expectation value over G and η, with the tensor η having Gaussian entries with zero mean and variance (τ′)^2 − τ^2. Taking the expectation value over η, for (τ′)^2 − τ^2 small, we need keep only the zeroth and second order terms on the right-hand side. Using cyclic properties of the trace, this can be simplified. We now use a general result. Consider any Hermitian operator H′ (we will use H′ = τH(G)) and any operator O (we will use O = H(η)); the relevant correlation function of O at imaginary time separation s_1 can be bounded by a cosh function of s_1. The cosh function is maximized on the interval [0, 1] at s_1 = 0, 1, where the right-hand side becomes equal to the left-hand side. For the real ensemble without symmetrization, to leading order in N we may approximate Σ_ν a_ν a†_ν = N on the given Hilbert space. In general, for the complex or real ensemble, symmetrized or not, we find that E[H(η)^2] is a scalar equal to J(p/2)! n_bos^{p/2} N^{p/2}, where J obeys the claims of the lemma. To verify that E[H(η)^2] is a scalar and to compute the scalar to all orders in N, commute the annihilation operators a_ν to the right. The result is some linear combination of operators with all annihilation operators to the right of creation operators; only the fully contracted, scalar part survives.

Remark: in the random matrix case, the correlation function above decays rapidly. This means that the integral of this correlation function over s_1 is dominated by its value for small s_1, of order 1/τ, so that for τ large compared to the inverse width of the semi-circle (though of course τ not too large) the integral becomes of order 1/τ. This is stronger than the upper bounds here, where we have bounded the integral by something independent of τ. We may guess that a tighter analysis will show that a similar effect happens in the case n_bos ≫ p/2; however, an important difference occurs. If we take an eigenstate of H(G) with some eigenvalue λ_0 and apply H(η) for random η, this only changes p/2 out of the n_bos qudits in the state. So, one might guess that the resulting state will have expectation value of H(G) equal to λ_0(1 − p/(2n_bos)), rather than (as in the random matrix case) an expectation value of H(G) which is zero. So, we may guess that the correlation function will be non-negligible for s_1 ≲ (n_bos/p)τ^{−1}. A heuristic estimate in this fashion suggests that lemma 5 below for the largest eigenvalue is tight up to logarithmic factors.
From lemma 4, the following lemma is an immediate corollary:

Lemma 5. Let λ_1 be the largest eigenvalue of H(G). Then, for any x, the tail bound below holds. So, for any E′ which is ω(ξ), with high probability λ_1 ≤ E_max + E′.
Proof. We have tr(exp{τH(G)}) ≥ exp(τλ_1). Hence, for any x, Pr[λ_1 ≥ x] ≤ exp(−τx) E[tr(exp{τH(G)})]. Since D(N, n_bos) ≤ N^{n_bos}, minimizing over τ, we obtain the stated bound. For x = E_max, the right-hand side is equal to 1, and for x > E_max the right-hand side decays exponentially.
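The first step of this proof, tr(exp{τH(G)}) ≥ exp(τλ_1), holds for any Hermitian matrix, since the trace of a positive matrix dominates its largest eigenvalue term; Markov's inequality then turns the expected trace into a tail bound. A small numeric check, with a random symmetric matrix standing in for H(G) and illustrative sizes:

```python
import numpy as np

# Illustrative check: tr(exp(tau*H)) >= exp(tau*lambda_1) for Hermitian H,
# the inequality that starts the Chernoff-style argument above.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))
H = (A + A.T) / 2                      # random real symmetric stand-in
tau = 0.7
eigs = np.linalg.eigvalsh(H)           # ascending eigenvalues
lam1 = eigs[-1]                        # largest eigenvalue
tr_exp = np.sum(np.exp(tau * eigs))    # tr(exp(tau*H)) via eigenvalues
print(tr_exp >= np.exp(tau * lam1))
```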

Odd p
We now consider the case of odd p. Let Z(τ, p, N, n_bos) denote E[tr(exp(τH(G)))] for a tensor G of rank p, with entries of G chosen from the Gaussian distribution, with given N, n_bos. In this subsection, G is complex and not symmetrized. The Hamiltonian H(G) is given by Eq. (6) for even p and by Eq. (9) for odd p.
We will reduce the calculation for odd p to the case of even p, up to a bounded error, showing the following:

Lemma 6. For odd p, under the stated conditions on τ, the inequalities of Eq. (33) hold. All occurrences of Z(•) in that equation refer to the complex ensemble without symmetrizing. Remark: the assumptions require that τ be o(1); however, that is still a large enough τ to be useful later in bounding the spectrum, since the largest eigenvalues of H(G) are typically large compared to 1.
Proof. For arbitrary H(G), the exponential exp{τH(G)} can be expanded as a power series. We bring the expectation value (over G) inside the trace and compute the expectation value of this series. This expectation value can be computed using Wick's theorem for the moments of a Gaussian distribution. For odd p, each term in H is a sum of two terms, one depending on the product of two tensors G and one depending on the product of two tensors Ḡ, where the overline denotes complex conjugation. Hence, the m-th order term (τ^m/m!) H(G)^m is a sum of 2^m terms, corresponding to m different choices of G or Ḡ in each factor of H(G); we call each such choice a history. Each term is a product of creation and annihilation operators depending on a product of 2m tensors, some of which are G and some of which are Ḡ. The expectation value of such a term is non-vanishing only if there are m tensors G and m tensors Ḡ. In that case, the expectation value is given by summing over ways of pairing each tensor G with a distinct tensor Ḡ. There are m! such pairings. Then, for each such pairing, one computes an operator by taking the operator (τ^m/m!) H(G)^m and replacing, for every pair, the two tensors G_{µ1,...,µp} Ḡ_{ν1,...,νp} in that pair with a product of δ-functions ∏_a δ_{µa,νa}. Summing this operator over pairings gives the expectation value.
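The pairing count can be sanity-checked in the simplest case m = 2: for a single standard complex Gaussian g (so E[g] = 0, E[g^2] = 0, E[|g|^2] = 1, as in the complex ensemble), Wick's theorem gives E[g g ḡ ḡ] = E[|g|^4] = 2! = 2, one term per way of pairing each g with a distinct ḡ. A Monte Carlo sketch with illustrative sample size:

```python
import numpy as np

# Monte Carlo check of the Wick pairing count for a standard complex
# Gaussian: the fourth absolute moment should equal the number of
# pairings, 2! = 2, and the second absolute moment the single pairing, 1.
rng = np.random.default_rng(2)
n = 400_000
g = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
m2 = np.mean(np.abs(g) ** 2)   # one pairing: close to 1
m4 = np.mean(np.abs(g) ** 4)   # two pairings: close to 2
print(round(m2, 1), round(m4, 1))
```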
Note also that here we have not symmetrized G. If we were to symmetrize G, then E[G_{µ1,...,µp} Ḡ_{ν1,...,νp}] would be given by 1/p! times a sum of p! different products of δ-functions, i.e., Σ_π ∏_a δ_{µa,ν_{π(a)}}, where π is a permutation. This would lead to additional terms that we would need to compute and would make the analysis more difficult (though in practice it may lead to better performance).
For given m, given history, and given pairing, let us define a cluster as follows: define a graph with 2m vertices, each vertex corresponding to a single tensor, either G or Ḡ. Let there be an edge between any two tensors which both appear in the same term in the Hamiltonian. Let there also be an edge between any two tensors which are in a pair. Hence, this is a graph of degree 2 (we allow the possibility of multiple edges connecting two vertices if we pair two tensors which both appear in the same term in the Hamiltonian). A cluster is a connected component of this graph.
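This cluster decomposition is easy to mechanize. The sketch below uses a hypothetical encoding (vertices as pairs (i, w) with i zero-indexed, pairing as a symmetric dictionary) to build the degree-2 graph and read off clusters as connected components:

```python
# Sketch (hypothetical encoding): vertices (i, w) for i in 0..m-1, w in {1, 2};
# edges join the two tensors inside each Hamiltonian term and the two
# members of each Wick pair; clusters are connected components.
def clusters(m, pairing):
    adj = {(i, w): [] for i in range(m) for w in (1, 2)}
    for i in range(m):                      # same-term edges
        adj[(i, 1)].append((i, 2))
        adj[(i, 2)].append((i, 1))
    for v, u in pairing.items():            # pairing edges (both directions)
        adj[v].append(u)
    comps, seen = [], set()
    for v in adj:                           # depth-first search per component
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x])
        seen |= comp
        comps.append(comp)
    return comps

# Two Hamiltonian terms paired "straight": (0,1)-(1,1) and (0,2)-(1,2)
# form a single 4-vertex component, i.e., one minimal cluster.
p = {(0, 1): (1, 1), (1, 1): (0, 1), (0, 2): (1, 2), (1, 2): (0, 2)}
cs = clusters(2, p)
print([len(c) for c in cs])
```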
We refer to a cluster containing four vertices as a minimal cluster. A cluster with six or more vertices is called a non-minimal cluster. Note that there are no clusters containing only two vertices, because each term in H(G) depends on a product of two tensors G or two tensors Ḡ; if we had not taken the complex ensemble and had instead taken G to be real, we would have such two-vertex clusters. We discuss the case of the real ensemble further after the proof of the lemma.
The minimal clusters will turn out to give the dominant contribution in an expansion in N; the non-minimal clusters will be subleading.
We have expressed (τ^m/m!) H(G)^m as a sum over histories and pairings, and so E[exp(τH(G))] is a sum over m, histories, and pairings. Each term in this sum is an operator. Hence, Z(τ, p, N, n_bos) is also given by a sum over m, histories, and pairings, as one may take the trace of each term in E[exp(τH(G))]. Note that each m, history, and pairing gives a non-negative contribution to Z(τ, p, N, n_bos), i.e., it has a non-negative trace.
Lower bound. Now we prove the first inequality in Eq. (33), namely that Z(√(2N)τ, 2(p − 1), N, n_bos) ≤ Z(τ, p, N, n_bos). To do this, we first define a similar summation over m, histories, and pairings to compute Z(τ, q, N, n_bos) for even q = 2(p − 1), and then we compare the two summations.
For even q, each term in H is a sum of two terms, one depending linearly on the tensor G and one depending linearly on Ḡ. As before, the m-th order term (τ^m/m!) H(G)^m is a sum of 2^m histories, with each history corresponding to m different choices of G or Ḡ in each factor of H(G). Each term is a product of creation and annihilation operators depending on a product of m tensors, some of which are G and some of which are Ḡ; note the difference here from the odd case, as now there are only m tensors rather than 2m. The expectation value of such a term is non-vanishing only if there are m/2 tensors G and m/2 tensors Ḡ. In that case, the expectation value is given as before by summing over pairings and replacing the two tensors in each pair with δ-functions.
We claim that if we consider Z(τ, p, N, n_bos) for odd p and consider only the sum over pairings for which all clusters are minimal, this gives precisely the sum of terms for Z(√(2N)τ, 2(p − 1), N, n_bos). Since all terms contribute non-negatively to the trace, this will prove the first inequality.
To show this claim, let L_{2(p−1)} label a given choice of m, history, and pairing for Z(√(2N)τ, 2(p − 1), N, n_bos), and let L_p label a given choice of m, history, and clusters for Z(τ, p, N, n_bos) for which all clusters are minimal. There are 2^{m/2} different pairings for the given choice of clusters labelled by L_p.
We will construct a one-to-one correspondence between L_p and L_{2(p−1)} and show that the sum of the terms labelled by L_p is equal to the term labelled by L_{2(p−1)}, i.e., that they give the same operator and hence have the same trace. Consider a given L_p. Then, L_{2(p−1)} is as follows. The value of m is the same. The history is also the same: for a given sequence of choices of G or Ḡ for L_p, we use the same sequence for L_{2(p−1)}. Note that in L_p, each of the m choices of G or Ḡ denotes a choice that both tensors in H(G) are equal to G or both are equal to Ḡ, while in L_{2(p−1)} each of the m choices is only a choice about a single tensor in H(G).
The pairing is as follows. We introduce notation to label the tensors in H(G)^m. For even p, we label the tensors by an integer i ∈ {1, 2, ..., m} depending on which of the m factors of H(G) it is in; we label the different factors in the product 1, 2, ..., m in sequence. For odd p, we label the tensors by a pair (i, w) for i ∈ {1, 2, ..., m} and w ∈ {1, 2}. The integer i labels which of the m factors of H(G) it is in, and w labels whether it is the first or second of the two tensors in H(G). For a pairing for Z(τ, p, N, n_bos) with all clusters minimal, each cluster is of the form that, for some i, j, we pair (i, 1) with (j, 1) and (i, 2) with (j, 2), or of the form that we pair (i, 1) with (j, 2) and (i, 2) with (j, 1). For a given choice of clusters, the corresponding pairing for L_{2(p−1)} pairs i with j.
We sketch the claim about the terms. For a term in L_p, for each cluster we have two tensors G and two tensors Ḡ. We replace these tensors with the expectation value, which is equal to some product of δ-functions. This expectation value then multiplies the creation and annihilation operators inserted at the appropriate places into the product H(G)^m. The δ-functions constrain σ = σ′; summing over this index gives a factor of N for each cluster, while there are two pairings for each cluster, giving another factor of 2 for each cluster. The number of clusters is equal to m/2, giving an overall factor (2N)^{m/2}. This proves the first inequality.
Upper bound. Now we prove the second inequality in Eq. (33). To do this, we define the quantity Z′ below for q a multiple of 4 (note that q = 2(p − 1) is a multiple of 4 if p is odd), where the expectation value is over tensors G of order q chosen from the complex ensemble and tensors G′ of order q/2 also chosen from the complex ensemble (note that q/2 is even). Note that we square H(G′) in the exponent. We will prove Eq. (35) below for τ′ = N^{1/3}τ. From this, the second inequality in Eq. (33) follows. To see this, note that for any G′, ‖H(G′)^2‖ ≤ O(n_bos^{p−1}) ‖G′‖^2, where ‖H(G′)^2‖ denotes the largest eigenvalue in absolute value and ‖G′‖ denotes the largest singular value of G′, regarding G′ as a matrix of size N^{q/4}-by-N^{q/4}. So, by the Golden-Thompson inequality, the two expectations can be separated, where the subscript G or G′ denotes the expectation over G or G′. The matrix G′ has complex entries; we can bound its operator norm by the sum of the operator norms of its Hermitian and anti-Hermitian parts. The needed tail bound on ‖G′‖ can be shown using Hermite polynomials [12] to compute the probability distribution of the eigenvalues of a random matrix and using the decay of a Hermite polynomial multiplying a Gaussian (the study of the largest eigenvalue of a random matrix is a rich field if one considers the probability distribution near the edge of the Wigner semi-circle, but here we are satisfied to consider the probability distribution for eigenvalues which are some constant factor > 1 times the edge of the semi-circle).
To show Eq. (35), let L_p label the set of terms for Z(τ, p, N, n_bos) with a given choice of m, history, clusters (the clusters need not be minimal), and choice of pairing for all tensors in non-minimal clusters (we do not specify the pairing of the tensors in the minimal clusters; if there are n_min minimal clusters, then there are 2^{n_min} terms in the set). Let L_{2(p−1)} label a term for Z′(√(2N)τ, τ′, 2(p − 1), N, n_bos) with given m, history, clusters, and pairing. Here, for Z′, a history corresponds to m choices of either H(G) or H(G′)^2, and also to choices of G or Ḡ, or G′ or Ḡ′, for each H(G) or H(G′). We define a map from each choice of L_p to a choice of L_{2(p−1)} so that the sum of terms labelled by L_p is equal to the term labelled by the corresponding L_{2(p−1)}. This map will be one-to-one, but it will not be onto. However, since all terms are non-negative, this will establish the needed inequality.
The map is as follows. The m will be the same. Using the notation above, if tensor (i, 1) is in a minimal cluster (and hence so is (i, 2)) in the set labelled by L_p, then in the history for L_{2(p−1)} for the i-th term we choose τH(G), while if (i, 1) is not in a minimal cluster, then we choose τ′H(G′)^2. Given a history labelled by L_p, corresponding to a choice of tensor or its complex conjugate (i.e., either G or Ḡ) in each of the m different H(G) in a term in the series for Z(τ, p, N, n_bos), we make the same choice of tensor or its complex conjugate in each of the m different H(G) or H(G′)^2 in the history labelled by L_{2(p−1)}. That is, if we choose G in the i-th H(G) in some term in the series for Z(τ, p, N, n_bos), then we choose G in H(G) or G′ in both factors in H(G′)^2, and similarly if we choose Ḡ, we choose Ḡ in H(G) and Ḡ′ in both factors in H(G′)^2.
We label the tensors in the expansion for Z′(√(2N)τ, τ′, 2(p − 1), N, n_bos) either by an integer i, if the tensor appears in H(G), or by a pair (i, w), if it appears in H(G′)^2, in which case the index w ∈ {1, 2} labels which of the two factors of H(G′) it is in. Finally, we define the pairing labelled by L_{2(p−1)}. For a minimal cluster labelled by L_p, pairing (i, 1) with (j, 1) and (i, 2) with (j, 2), or (i, 1) with (j, 2) and (i, 2) with (j, 1), in the pairing labelled by L_{2(p−1)} we pair i with j. If a cluster is non-minimal, then we simply use the same pairing for the corresponding tensors in L_{2(p−1)}. That is, suppose a cluster pairs (i_1, w_1) with (i_2, w_2), and pairs (i_2, w̄_2) with (i_3, w_3), and so on, where w̄_a = 1 if w_a = 2 and w̄_a = 2 if w_a = 1. Then, we also pair (i_1, w_1) with (i_2, w_2), and (i_2, w̄_2) with (i_3, w_3), and so on.
The smallest non-minimal cluster has six vertices. In every cluster, minimal or not, there is a sum over some index (for example, the sum over the index σ = σ′ in the lower bound calculation above) which gives a factor of N. Thus, taking τ′ = τN^{1/3} accounts for this factor. No factor of 2 occurs for the non-minimal clusters.
Remark: since we have chosen the entries of G from the complex ensemble, we have that E[H(G)] = 0 for odd p. If instead we had chosen the entries of G from the real ensemble, we would have (considering the specific case p = 3 for simplicity, and not symmetrizing G, again for simplicity) a non-vanishing expectation value. Such a term is sometimes called a pairing term or a "cooperon" in the study of disordered systems in physics. In the case n_bos = 2 (the smallest possible for p = 3), this term has operator norm N^2. This is much larger than N^{1/2} times the expected operator norm of H(G) for a tensor of order 2(p − 1) = 4; that expected operator norm is proportional to N by random matrix theory. There may be other ways to deal with this non-vanishing expectation value other than using the complex ensemble. One way is to use the real ensemble but to consider the Hamiltonian H(T_0) − M, where we define M = N Σ_{µ1,µ2} a†_{µ2} a†_{µ2} a_{µ1} a_{µ1} for p = 3. In this case, the added term −M cancels the expectation value of H(G) term-by-term in the perturbation expansion. However, if we do this, we still have some additional terms when we consider clusters of four or more vertices. We expect that the clusters of six or more vertices are still negligible, but the structure of the clusters of four vertices becomes more complicated. We leave the analysis of this case for the future, but we expect that it would work and may be practically useful.
It is worth noting that lemma 6 has the following corollary:

Corollary 2. For odd p, for n_bos = p − 1, with high probability λ_1 = O(N^{p/2}).

Quantum and Classical Algorithms
We now discuss the complexity of classical and quantum algorithms to implement the needed linear algebra. In particular, we need to determine whether the largest eigenvalue is larger than E_cut, and we need to find some vector in the eigenspace of eigenvalue ≥ E_cut. We emphasize that it is not necessary to find the leading eigenvector itself.
We will use ψ_target to denote this leading eigenvector. Note that if we lower bound the squared overlap of some vector with ψ_target, this will lower bound the probability of success of phase estimation.
We have defined algorithm 2 so that in the detection step it uses a hard cutoff on the eigenvalue: if the leading eigenvalue is ≥ E_cut, it reports that λ = λ, while otherwise it reports that λ = 0. However, no algorithm, classical or quantum, will be able to compute the leading eigenvalue exactly; there will always be some limit on the precision. Fortunately, the proofs of theorems 2 and 4 show that if E_0 ≥ (1 + c)E_max for any c > 0, then if λ = λ, with high probability λ_1 ≥ (1 − η)E_0 + ηE_max for any 0 < η < 1, while if λ = 0, with high probability λ_1 ≤ (1 − η′)E_0 + η′E_max for any 0 < η′ < 1. For example, we might take η = 1/8 and η′ = 1/2. So, it suffices instead to implement some "soft" estimate of the leading eigenvalue which is very likely to give one result (i.e., reporting that λ = λ) if the leading eigenvalue is larger than (7/8)E_0 + (1/8)E_max, but very unlikely to give that result if the leading eigenvalue is ≤ E_cut.
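With hypothetical energies E_0 = 2.0 and E_max = 1.0 (so E_0 ≥ (1 + c)E_max with c = 1; these values are illustrative, not the paper's), the two thresholds above are separated by a constant-width window, which is what makes a soft estimate sufficient:

```python
# Illustrative arithmetic for the "soft" cutoff (hypothetical energies):
# the accept threshold (eta = 1/8) sits strictly above the reject
# threshold (eta' = 1/2), leaving a window of width (3/8)(E0 - Emax)
# in which a soft estimator may answer either way without harming
# correctness.
E0, Emax = 2.0, 1.0
accept = (7 / 8) * E0 + (1 / 8) * Emax   # threshold for lambda != 0
reject = (1 / 2) * E0 + (1 / 2) * Emax   # threshold for lambda = 0
window = accept - reject                 # (3/8) * (E0 - Emax)
print(accept, reject, window)
```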
In the quantum algorithms, to obtain some vector in the eigenspace of eigenvalue ≥ E_cut and to implement this soft estimate, we will use phase estimation to implement an approximate projector onto the eigenspace of eigenvalue ≥ (E_0 + E_cut)/2 = (3/4)E_0 + (1/4)E_max. By doing this, the phase estimation error becomes negligible when considering the projection of the resulting vector onto the eigenspace with eigenvalue < E_cut. Similarly, in the classical power method, we will take the number of iterations sufficiently large that the vector has negligible projection onto the eigenspace with eigenvalue < E_cut, and further so that it has expectation value for H(T_0) greater than E_cut.
We begin with a description of some classical algorithms.The time and space requirements for the classical algorithms are of course not intended to represent a lower bound; rather, they represent times that can be achieved using standard algorithms in the literature.We then give quantum algorithms.Finally, we give a further improvement to the quantum algorithm that may be useful in practice.
When we refer to "space" in a classical algorithm, if we store a D(N, n_bos)-dimensional vector, the space requirement is equal to D(N, n_bos) multiplied by the number of bits needed to store a single entry of the vector. In the classical algorithms, we will not discuss issues with finite precision arithmetic in detail. Since we will be applying operators of the form H(T_0)^m to vectors in the "path integral" methods, we might need to approximate each entry of the vector to accuracy D(N, n_bos)^{−1} ‖H(T_0)‖^{−m} = O(poly(N^{n_bos m}))^{−1}. However, the required number of bits is then only O(m n_bos log(N)), and m will be logarithmic in N, so the required number of bits will be only polylogarithmic in N.

Classical Algorithms
Classically, the most obvious algorithm is to perform an eigendecomposition of H(T_0). This requires storing matrices of size D(N, n_bos)-by-D(N, n_bos), so that the space required is Õ(D(N, n_bos)^2) and the time is D(N, n_bos)^ω, where ω is the matrix multiplication exponent [13], though of course in practice the time is closer to D(N, n_bos)^3.
However, there is no need to perform a full eigendecomposition. One can instead initialize a random vector and then apply the power method to extract some eigenvector of H(T_0) in the eigenspace with eigenvalue ≥ E_cut. The space required is then only Õ(D(N, n_bos)). The time required for a single iteration of the power method is Õ(D(N, n_bos)). If λ = λ, then in O(log(D(N, n_bos))/log(E_0/E_max)) iterations, the resulting vector will have a 1 − o(1) projection onto the eigenspace with eigenvalue ≥ (E_0 + E_cut)/2 and a negligible projection onto the eigenspace with eigenvalue < E_cut. So, after this many iterations, one can compute the expectation value of H(T_0) on that vector to perform detection, i.e., if λ = λ the expectation will be larger than E_cut, but if λ = 0 the expectation will be close to E_max, and one can compute the single-particle density matrix of the vector after those iterations; the total time is Õ(D(N, n_bos)) multiplied by the number of iterations.

This power method still requires exponential (in n_bos) space. In fact, one can use only polynomial space. One obvious choice is to perform a "path integral". That is, given a random initial state Ψ_random, the power method computes the state H(T_0)^m |Ψ_random⟩, for some exponent m which gives the number of iterations of the power method. So, we wish to compute the expectation value of a†_µ a_ν in this state, divided by the squared norm of the state, where the denominator is to correctly normalize the state. Let us choose the initial state to be a (normalized) random state from the symmetric basis given before. We make this choice to make the "path integral" simpler, and, since this is a complete orthonormal basis of states, a random state from this basis has expected squared overlap with the leading eigenvector of H(T_0) at least 1/N^{n_bos}. Then, both the numerator and denominator above can be expressed by a summation over intermediate states from this basis, requiring only space Õ(log(D(N, n_bos)) m); this summation is a "path integral". The time required, however, is now Õ(D(N, n_bos)^m), and so becomes significantly worse if the number of iterations is much more than 1.
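A minimal sketch of this power-method detection/recovery loop, with a dense random symmetric matrix standing in for H(T_0) and a planted rank-one spike standing in for the λ ≠ 0 signal (all sizes and constants here are illustrative, not the paper's parameters):

```python
import numpy as np

# Power-method sketch (illustrative stand-ins): after enough iterations the
# iterate concentrates on the top eigenspace; its Rayleigh quotient is the
# detection statistic, and its overlap with the planted direction measures
# recovery quality.
rng = np.random.default_rng(3)
D = 200
W = rng.standard_normal((D, D))
W = (W + W.T) / 2                          # noise part, norm ~ sqrt(2*D)
v_sig = np.zeros(D); v_sig[0] = 1.0        # planted direction
H = W + 3.0 * np.sqrt(D) * np.outer(v_sig, v_sig)   # spiked "H(T_0)"
psi = rng.standard_normal(D)               # random initial vector
for _ in range(200):                       # power iterations
    psi = H @ psi
    psi /= np.linalg.norm(psi)
rayleigh = psi @ H @ psi                   # detection statistic
overlap = abs(psi @ v_sig)                 # recovery quality
print(round(overlap, 2))
```

Without the planted spike, the Rayleigh quotient would stay near the noise edge (about √(2D) here), so thresholding it separates the two cases.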
An improvement to this time can be obtained by using the algorithm of theorem 4.1 of Ref. [11]. This algorithm is expressed in terms of qubits, but we can translate it to get an algorithm for a single qudit of dimension D(N, n_bos) or, using the tradeoffs discussed there, n_bos qudits each of dimension N.
Let us explain this in detail, following the discussion there. We wish to compute ⟨y|C|x⟩ for some fixed basis states x, y from the basis above (in our application x = y), where C is a "circuit" of some depth d (for example, d = 2m + 1 for the numerator above and d = m for the denominator above). While in Ref. [11] a circuit was assumed to be unitary, there is in fact no need to assume that; for non-unitary circuits, the issues with finite precision arithmetic do become more important, but as remarked above this can be handled with only polylogarithmic overhead. For us, a circuit is not necessarily unitary; rather, it is built out of a sequence of operations, each of which is either H(T_0) or a†_µ a_ν; more generally, a circuit might include any operations built out of some tensor of order O(1) multiplying some number of creation and annihilation operators.
For d = 1, ⟨y|C|x⟩ can be computed in time poly(n_bos). For d > 1, one splits C into two circuits of depth at most ⌈d/2⌉ and writes ⟨y|C|x⟩ as a sum over intermediate basis states of products of the two halves, recursing on each half; each level of the recursion multiplies the cost by 2D(N, n_bos). So, the runtime is Õ((2D(N, n_bos))^{log(2m+1)}). This is much faster than the path integral method, but potentially much slower than the power method, depending on the required m; i.e., for E_0/E_max = Θ(1), we need m proportional to log(D(N, n_bos)), and so the time required is superpolynomially worse than the time required if one stores the full vector.
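The divide-and-conquer evaluation can be sketched as follows; the `amplitude` function and the small matrices below are hypothetical stand-ins for the circuit of H(T_0) and a†_µ a_ν operations, with illustrative sizes:

```python
import numpy as np

# Sketch of the recursive evaluation of <y|C|x> for a circuit
# C = ops[d-1] ... ops[0] (applied right-to-left): split the circuit in
# half and sum over one intermediate basis state per level,
#   <y|C|x> = sum_z <y|second half|z> <z|first half|x>,
# storing only O(d) indices at any time.
def amplitude(ops, y, x):
    # ops: list of D-by-D matrices; y, x: basis-state indices
    if len(ops) == 1:
        return ops[0][y, x]
    mid = len(ops) // 2
    D = ops[0].shape[0]
    return sum(amplitude(ops[mid:], y, z) * amplitude(ops[:mid], z, x)
               for z in range(D))

rng = np.random.default_rng(4)
D, d = 5, 4
ops = [rng.standard_normal((D, D)) for _ in range(d)]
direct = np.linalg.multi_dot(ops[::-1])[2, 0]   # full product, for comparison
recursive = amplitude(ops, 2, 0)
print(np.isclose(recursive, direct))
```

The recursion trades time for space: the full product costs Õ(dD^2) time and Õ(D) space, while the recursion uses only logarithmic-in-D space at the price of a (2D)^{log d}-type runtime.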

Quantum Algorithms
We now discuss quantum algorithms for the same problem.In contrast to the classical algorithms above, all these algorithms take only polynomial space.First let us describe a simple algorithm, given as algorithm 3. We then describe a sequence of improvements.
For algorithm 3 and all subsequent algorithms, we analyze assuming that λ = λ, i.e., for purposes of analysis we consider the problem of recovery rather than detection. All these algorithms report success or failure and, if they fail, they are rerun until they succeed. We give bounds on the expected runtime under the assumption that λ = λ. If we consider the problem of detection and λ = 0, then the algorithm will not report success in the given runtime (since it will be unable to succeed in a certain phase estimation procedure), and so all these algorithms can also be used for detection by running them for some multiple of the given runtime and reporting that λ = 0 if success is not reported in that time.

Maximally Entangled (or Maximally Mixed) Input State
Algorithms 3 and 4 in this subsubsection work on a Hilbert space which is the tensor product of two Hilbert spaces, each of dimension D(N, n_bos). (In some versions of the Hamiltonian simulation used in the algorithms, it is convenient to embed the symmetric subspace of dimension D(N, n_bos) within the full Hilbert space.) We use a tensor product notation A ⊗ B to denote an operator that is a tensor product of two operators A, B on the two different tensor factors.
Algorithm 3 Quantum Algorithm (simplest, unamplified version). This and all other quantum algorithms have the same inputs, outputs, and parameter choices as algorithm 2.
1. Prepare a maximally entangled state between the two qudits.

2. Phase estimate H(T_0) on the first qudit; report success if the measured energy is ≥ (E_0 + E_cut)/2, and failure otherwise.

3. If success is reported, measure the expectation value of a†_µ a_ν on the first qudit.
Steps 1 and 2 are designed to prepare a state whose density matrix on the first qudit has a large projection onto the eigenspace of eigenvalue ≥ E_cut. For purposes of analysis, we trace out the second qudit, so that the input state on the first qudit is a maximally mixed state. If success is reported, then (ignoring phase estimation error) we have indeed projected onto this eigenspace. Ignoring phase estimation error, the probability of success is ≥ 1/D(N, n_bos).
Remark: we prepare a maximally entangled state between the two qudits so that the density matrix on the first qudit is maximally mixed; we could equally well modify the algorithm to use only a single qudit (reducing the space by a factor of 2) and prepare a random state on the first qudit. This modification, however, requires some care when we discuss a version of the algorithm that uses amplitude amplification below. In the unamplified version, we can choose a new random state (for example, choosing it uniformly from any orthogonal basis) each time we perform phase estimation; however, in the amplified version, we must choose a fixed random initial state on the first qudit and amplify the algorithm with that choice of initial state. We claim (we omit the proof) that if one picks a tensor product of random single-qudit states |v_1⟩ ⊗ |v_2⟩ ⊗ ... ⊗ |v_{n_bos}⟩, where the v_a are independently chosen from a Haar uniform distribution on the sphere |v_a| = 1, then with high probability the state has squared overlap with the leading eigenvector close to N^{−n_bos}; more precisely, "close" means that with high probability the logarithm of the squared overlap is at least −n_bos log(N)·(1 + o(1)). Note that we might not have this property of the overlap if we had chosen the initial state from the computational basis, for example. Note also that we can efficiently prepare states from this distribution. Sketch of proof of claim: consider the logarithm of the probability that the sequence of measurements |v_i⟩⟨v_i| succeeds for i = 1, ..., n_bos in sequence. The probability that the i-th measurement succeeds, conditioned on the previous measurements succeeding, can be computed from the trace of |v_i⟩⟨v_i| with the reduced density matrix on the i-th qudit of some state (i.e., the leading eigenvector projected by the previous measurements), and for any such reduced density matrix, with high probability the logarithm of the trace is at least −log(N)·(1 + o(1)).

Step 3 of the algorithm measures some property of a state in the eigenspace with eigenvalue ≥ E_cut. It is possible that each time the algorithm is run, one obtains a different energy measurement and hence a different state, so that measuring this property of the state gives some expectation value of a†_µ a_ν in a mixed state. This does not matter, since theorems 3 and 5 also hold for mixed states.
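The claim that a random input state has squared overlap about 1/D with any fixed target underlies the ≥ 1/D(N, n_bos) success probability of the maximally mixed input. A quick Monte Carlo check in an illustrative dimension:

```python
import numpy as np

# Monte Carlo check (illustrative dimension): a Haar-random unit vector in
# dimension D has expected squared overlap 1/D with any fixed unit vector.
rng = np.random.default_rng(5)
D, trials = 16, 20_000
target = np.zeros(D); target[0] = 1.0     # stand-in for psi_target
v = rng.standard_normal((trials, D)) + 1j * rng.standard_normal((trials, D))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # normalized random states
mean_sq_overlap = np.mean(np.abs(v @ target) ** 2)
print(round(mean_sq_overlap * D, 2))
```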
We explain the measurement in step 3 in more detail below. The simplest possibility is simply to measure one matrix element of a†_µ a_ν. Since there are N^2 matrix elements, we then need to repeat the algorithm poly(N, log(1/ε)) times to measure each matrix element to accuracy ε. We explain improvements to this in subsection 5.3.
We explain the phase estimation in more detail below.First, let us analyze the algorithm in a rough outline.Let the phase estimation be carried out to a precision sufficiently smaller than E 0 − E max .To define this, we work in an eigenbasis of H(T 0 ).Let ˜ be a bound on the error probability of the phase estimation.More precisely, we will say that phase estimation implements an operator E P E which is diagonal in the eigenbasis such that on a normalized eigenvector v i with eigenvalue λ i we have The reader should appreciate that there are a few energy scales here, chosen rather arbitrarily.The exact value of the energies are not too important.We have picked E cut to be partway between E 0 and E max so that it is very unlikely that the largest eigenvalue of H(G) is above E cut and also very unlikely that the λ 1 < E cut .We have picked the energy cutoff in step 2 to be (E 0 + E cut )/2 simply to pick some energy partway between E 0 and E cut so that it is very unlikely that phase estimation reports success on an eigenstate with energy < E cut ; see first line of Eq. ( 41).In the last line of Eq. (41) we wrote (7/8)E 0 + (1/8)E max simply to pick some energy scale slightly below E 0 above which it is very likely that phase estimation reports success; for us later, it would suffices to have any instead a bound for λ i ≥ (1 − η)E 0 + ηE max for any η > 0. Of course, the two lines in Eq. 
(41) are not completely symmetric about E_cut. Then, choose ε̃ = ε/D(N, n_bos), so that the algorithm reports success with probability at least (1 − ε̃)/D(N, n_bos) and, given that the algorithm reports success, the resulting state has projection onto the eigenspace with eigenvalue ≥ E_cut which is greater than or equal to

(1 − ε̃)/D(N, n_bos) divided by [(1 − ε̃)/D(N, n_bos) + (ε̃/D(N, n_bos)) · (D(N, n_bos) − 1)],

where the first term in the denominator is the probability that it reports success on ψ_target as input and the second term is the probability of reporting success on a state in the eigenspace with eigenvalue < E_cut, multiplied by D(N, n_bos) − 1, i.e., multiplied by an upper bound on the dimensionality of that eigenspace. Taking ε ≪ 1, we can obtain a large projection onto the eigenspace with eigenvalue ≥ E_cut, so that the cost of phase estimation increases logarithmically with D(N, n_bos)/ε.
The success probability for ε ≪ 1 is greater than or equal to (1/D(N, n_bos))(1 − ε/D(N, n_bos)), so for small ε it is very close to 1/D(N, n_bos). Hence, repeating the algorithm until it succeeds, the expected runtime to obtain a single measurement of one matrix element of a†_µ a_ν is bounded by the time for phase estimation multiplied by O(D(N, n_bos)).
To perform phase estimation, we use controlled simulation of the Hamiltonian H(T_0). There are a large number of quantum simulation algorithms which would work here, such as Refs. [14,15,16,17,18], to name just a few. There are two broad possibilities. The first is to work in the symmetric subspace of dimension D(N, n_bos); in this case, H(T_0) is a sparse Hamiltonian, and sparse simulation algorithms apply. The second is to use the Hamiltonian of Eq. (5) and embed the symmetric subspace into the full Hilbert space; in this case, H(T_0) is a local Hamiltonian, in that each term acts on a small number of qudits, each of dimension N, and local simulation algorithms apply. The cost for these algorithms to simulate for a time t to error ε̃ is poly(t‖H(T_0)‖, n_bos, N, log(1/ε̃)).
Using the simplest phase estimation algorithm of Ref. [19], the number of bits that we need to phase estimate is s = O(log(‖H(T_0)‖/(E_0 − E_max))). The most expensive bit to obtain is the least significant one, since obtaining the j-th least significant bit requires simulating for a time proportional to 2^{s−j} (E_0 − E_max)^{−1}. So, we can obtain the least significant bit to error ε̃/2, then the next least significant bit to error ε̃/4, and so on, making the total error ε̃. Of course, a large number of variations of the Kitaev phase estimation algorithm exist in the literature, and any could be used here.
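To illustrate the bit-by-bit structure, the following is our own sketch (not Ref. [19]'s exact procedure) of Kitaev-style iterative phase estimation, run on an exact eigenvector of a toy diagonal unitary so that no measurement statistics are needed; the eigenphases are assumed example values.

```python
import numpy as np

# Hedged sketch of iterative phase estimation. Bits of the eigenphase
# phi = 0.b_1 b_2 ... b_s are extracted least significant first; bit b_j
# uses the evolution U^(2^(j-1)), so the least significant bit (j = s)
# requires the longest simulation, matching the cost discussion above.
def iterative_phase_estimation(U, v, s):
    bits = []      # collected as b_s, b_{s-1}, ..., b_1
    tail = 0.0     # previously determined fraction 0.b_{j+1} ... b_s
    for j in range(s, 0, -1):
        Uj = np.linalg.matrix_power(U, 2 ** (j - 1))
        theta = (np.angle(v.conj() @ (Uj @ v)) / (2 * np.pi)) % 1.0
        bits.append(int(round(2 * theta - tail)) % 2)   # theta = 0.b_j tail
        tail = theta
    phi = 0.0
    for b in bits:                    # Horner evaluation from the bottom bit
        phi = (phi + b) / 2.0
    return phi

# toy diagonal unitary with known eigenphases (assumed example)
phases = np.array([5 / 16, 3 / 16, 11 / 16])
U = np.diag(np.exp(2j * np.pi * phases))
v = np.eye(3)[0]
```

Running `iterative_phase_estimation(U, v, 4)` recovers the 4-bit phase of the chosen eigenvector.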
We can speed this algorithm up quadratically by applying amplitude amplification [20]. Modify the phase estimation step 2 of algorithm 3 so that the algorithm phase estimates the eigenvalue, determines whether the eigenvalue is larger than E_cut, then uncomputes the eigenvalue, returning just a single bit of success or failure; see algorithm 4. Then, applying amplitude amplification, with high probability the algorithm succeeds in expected time D(N, n_bos)^{1/2} poly(N, n_bos, 1/(E_0 − E_max), log(D(N, n_bos)/ε)). Multiplying by poly(N, 1/ε) to measure a†_µ a_ν to accuracy ε, the expected time is still D(N, n_bos)^{1/2} poly(N, n_bos, 1/(E_0 − E_max), 1/ε), giving a quadratic time improvement, up to poly(N) factors, and an exponential space improvement, over the fastest classical algorithm described above.
Algorithm 4 Quantum Algorithm (amplified version) 1. Apply amplitude amplification to steps 1 − 2 of algorithm 3, modifying step 2 to uncompute the eigenvalue and return only success or failure.
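The effect of amplitude amplification on the success probability can be seen in a small state-vector simulation (ours; the dimension D and the single marked "success" state are toy stand-ins for the 1/D(N, n_bos) overlap with ψ_target):

```python
import numpy as np

# Hedged toy model of amplitude amplification: the start state has overlap
# 1/sqrt(D) with a single "good" basis state, so the unamplified success
# probability is 1/D, requiring O(D) expected repetitions. Roughly
# (pi/4)*sqrt(D) Grover iterations boost the success probability to near 1.
D = 1024
start = np.full(D, 1.0 / np.sqrt(D))     # uniform start state
good = np.zeros(D); good[0] = 1.0        # marked "success" state

def amplify(steps):
    s = start.copy()
    for _ in range(steps):
        s = s - 2.0 * good * (good @ s)          # reflect about good state
        s = 2.0 * start * (start @ s) - s        # reflect about start state
    return s

k = int(np.pi / 4 * np.sqrt(D))          # ~25 iterations for D = 1024
p_amplified = (good @ amplify(k)) ** 2   # close to 1, vs 1/D unamplified
```

The quadratic saving — O(√D) reflections instead of O(D) repetitions — is the source of the speedup quoted above.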

Chosen Input State: Simple Version
We can obtain a further quadratic speedup by modifying the initial state that we phase estimate. In this subsubsection, let us first explain an algorithm that gives the basic idea of the initial state preparation; we will only be able to prove some slightly weaker results for this algorithm (in particular, we will prove a lower bound on the average inverse runtime, rather than an upper bound on the average runtime). We expect that this is primarily a technical issue and that some concentration-of-measure argument should allow us to prove an upper bound on the average runtime. We will then describe in the next subsubsection a modification to the algorithm which avoids this technical difficulty and for which we can prove a quadratic improvement without further assumption. Instead of a maximally entangled state, we can use the tensor T_0 to prepare a state with a larger projection onto ψ_target. In this subsubsection, we will work in the N^{n_bos}-dimensional Hilbert space of Eq. (5), so we will use n_bos qudits, each of dimension N. The unamplified version is algorithm 5 and a version with amplitude amplification is algorithm 6.
The most important new step that must be explained is the initial state preparation (we discuss some other details at the end of this subsubsection). We use the fact that, given a classical list of amplitudes for some M-dimensional vector of unit norm, we can prepare a quantum state on an M-dimensional qudit with the given amplitudes (up to an ill-defined overall phase, of course) using a quantum circuit of depth O(M) and O(M) classical computation. For example, labelling the basis states |0⟩, |1⟩, . . ., |M − 1⟩, one can start with the initial state |0⟩ and apply a sequence of M − 1 rotations in the two-dimensional subspaces spanned by |i⟩, |i + 1⟩ for i = 0, . . ., M − 2.
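A minimal classical sketch of this rotation sequence (our illustration; amplitudes are assumed real, and a final sign flip counts as a rotation):

```python
import numpy as np

# Hedged sketch of the O(M) state-preparation scheme described above: start
# from |0> and apply M-1 rotations in the subspaces spanned by |i>, |i+1>.
# Each rotation leaves amplitude c[i] on |i> and pushes the remaining weight
# onto |i+1>; here we track the resulting amplitudes classically.
def prepare_state(c):
    c = np.asarray(c, dtype=float)           # real unit vector of amplitudes
    M = len(c)
    state = np.zeros(M); state[0] = 1.0
    for i in range(M - 1):
        r = state[i]                          # remaining weight sits on |i>
        rest = np.sqrt(max(r * r - c[i] * c[i], 0.0))
        if i == M - 2 and c[i + 1] < 0:
            rest = -rest                      # absorb the final sign
        state[i], state[i + 1] = c[i], rest   # rotation by arccos(c[i]/r)
    return state

target = np.array([0.5, -0.5, 0.5, -0.5])    # example unit vector
out = prepare_state(target)
```

Each step is a rotation in one two-dimensional subspace, so the classical preprocessing computes M − 1 angles in O(M) time, as stated.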
Algorithm 5 Quantum Algorithm (improved input state, unamplified version) 1. Use T 0 to prepare the initial state Ψ input of Eq. (42).
2. If the initial state is not in the symmetric subspace, report "failure". If the state is in the symmetric subspace, apply phase estimation using H(T_0). Let ψ be the resulting state. If the resulting eigenvalue is larger than (E_0 + E_cut)/2, report "success". Otherwise, report "failure".
In this subsubsection, for simplicity in analysis, we assume that the error in phase estimation is small enough to be negligible.

Algorithm 6 Quantum Algorithm (amplified version)
1. Apply amplitude amplification to steps 1 − 2 of algorithm 5, modifying step 2 to uncompute the eigenvalue and uncompute the determination of whether the state is in the symmetric subspace and to return only success or failure.
We use the same method to produce the input state for both even and odd p. First consider the case that n_bos is an integer multiple of p. For any tensor T of order p, let |T⟩ denote the vector on p qudits (each of dimension N) with amplitudes given by the entries of the tensor. This vector is correctly normalized if |T| = 1. We prepare the input state Ψ_input of Eq. (42), built from n_bos/p copies of |T_0⟩/|T_0|. So (by lemma 8 below), Ψ_sig has some non-negligible projection onto the desired eigenspace. This does not, however, yet give us a lower bound on ⟨Ψ_input|E_PE|Ψ_input⟩: we can expand Ψ_input as a linear combination of Ψ_sig and some orthogonal state, but we have not bounded the cross-terms in the expectation value.
However, we now give heuristic evidence (not a proof) for a lower bound on E[⟨Ψ_input|E_PE|Ψ_input⟩]. The main assumption we will make is that λ_1 = E_0 · (1 + o(1/log(N))); we expect that this assumption can be proven to hold with high probability. We consider just the case of even p (we expect that odd p can be handled similarly).
The main reason that we do not give a full proof is that a lower bound on E[⟨Ψ_input|E_PE|Ψ_input⟩] will only imply a lower bound on the expected inverse squared runtime, rather than an upper bound on the expected runtime, which is what we really want. Instead, in the next subsubsection we give a modified algorithm with an upper bound on the expected runtime. Let us note that we conjecture that algorithm 6 does have a quartic speedup for the expected runtime with high probability. One might guess that this could be proven using the bound on the expectation value of the inverse squared runtime and some concentration-of-measure argument; however, we have not been able to make this precise.
Let us clarify some terminology to distinguish two meanings of the word "expected", corresponding to averages over G or averages over outcomes of a quantum algorithm. From here on, when we refer to the "runtime" of a phase estimation algorithm, this is a short way of saying the expected runtime for a given choice of G. When we refer to the "expectation value of the runtime", we mean the expectation value over G of this expected runtime. Applying amplitude amplification, the runtime is bounded by the time for the state preparation and phase estimation multiplied by the inverse square root of ⟨Ψ_input|E_PE|Ψ_input⟩. So, lower bounding E_G[⟨Ψ_input|E_PE|Ψ_input⟩] will upper bound the expectation value over G of the inverse squared runtime. For fixed N^{−p/4}/λ, this gives a further quadratic improvement, in terms of the scaling of the runtime with N, over algorithm 4.
Given the existence of the modified algorithm of subsubsection 5.2.3, we will just sketch an outline of a possible proof of the lower bound on E[⟨Ψ_input|E_PE|Ψ_input⟩], leaving several details out. The basic idea is to lower bound this expectation value by a tensor network, working in an approximation which amounts to ignoring fluctuations in λ_1, then average the tensor network over G by a sum of pairings, and use this sum to lower bound the expectation value. Roughly, the physical idea is that the terms in Ψ_input proportional to G will tend, on average, to increase the overlap ⟨Ψ_input|ψ_target⟩ rather than decrease it.
Consider the operator λ_1^{−m} H(T_0)^m for large m. If we take m sufficiently large compared to n_bos log(N)/log(E_0/E_max), this operator will lower bound E_PE up to some negligible error. That is, we take m large enough that λ_1^{−m} H(T_0)^m is negligibly small acting on any eigenvector v_i with eigenvalue λ_i ≤ (7/8)E_0 + (1/8)E_max, while for λ_i ≥ (7/8)E_0 + (1/8)E_max, Eq. (41) gives a lower bound on E_PE that is equal to 1 up to some negligible phase estimation error ε̃, whereas clearly λ_1^{−m} H(T_0)^m ≤ 1. For fixed E_0/E_max, it suffices to take m = O(n_bos log(N)).
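A small numerical illustration (ours) of this spectral filtering, with a matrix containing a planted spike standing in for H(T_0):

```python
import numpy as np

# Hedged sketch: (H/lambda_1)^m suppresses all eigenvectors whose eigenvalue
# is a constant factor below lambda_1, with m growing only logarithmically
# in the required suppression -- the role of lambda_1^{-m} H(T_0)^m above.
rng = np.random.default_rng(2)
D = 200
A = rng.standard_normal((D, D))
G = (A + A.T) / np.sqrt(2)                     # GOE-like noise, edge ~ 2*sqrt(D)
vsig = rng.standard_normal(D); vsig /= np.linalg.norm(vsig)
H = G + 4 * np.sqrt(D) * np.outer(vsig, vsig)  # planted spike above the edge
evals, evecs = np.linalg.eigh(H)
lam1, top = evals[-1], evecs[:, -1]

w = rng.standard_normal(D); w /= np.linalg.norm(w)
for _ in range(60):                            # m = 60 filtering steps
    w = H @ w / lam1
w /= np.linalg.norm(w)
overlap = abs(top @ w)                         # close to 1 after filtering
```

Since all other eigenvalues are at most a constant fraction of λ_1 in magnitude here, m = 60 already suppresses them below machine precision, consistent with m = O(n_bos log N) in the argument above.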
For the range of m that we consider here, by assumption we can ignore the fluctuations in λ_1, i.e., we lower bound E_PE by λ_1^{−m} H(T_0)^m with λ_1 replaced by its typical value; we further approximate that we can ignore fluctuations in the norm |T_0^{⊗ n_bos/p}|, treating it as a constant. The resulting quantity can be evaluated by a sum of tensor networks, using the full Hilbert space (i.e., for each choice of i_1, . . ., i_{p/2} in each of the m factors of H(T_0) we have a tensor network). We can then write each tensor network as a sum of tensor networks, inserting either λ v_sig^{⊗p} or G for each tensor, and we can average these tensor networks over G using the methods of appendix A by summing over pairings. Since every term in this sum over networks and pairings is positive, if we restrict to some subset of terms, we get a lower bound. Let us restrict to the terms in which, for Ψ_input, we choose λ v_sig^{⊗p} for every tensor; for this set of terms, the tensor network can be evaluated in closed form.

We make some implementation remarks on the algorithm. The algorithm as described requires measuring whether we are in the symmetric subspace; note that the input state Ψ_input need not be in the symmetric subspace. Such a projection can be done, for example, by phase estimating a Hamiltonian which is a sum of permutation operators. One can also omit this projection onto the symmetric subspace, since our upper bounds on ‖H(G)‖ hold both in the full Hilbert space and in the symmetric subspace.
We have considered the case that n_bos is an integer multiple of p. If n_bos = kp + l for some integers k, l with 0 < l < p, then one can use l ancilla qudits and prepare an input state which is equal to the state above on kp qudits, tensored with a maximally entangled state between the remaining l qudits and the l ancillas. The idea is that we get the additional quadratic improvement in overlap on kp of the qudits, and the remaining l ancillas only cost 1/poly(N) overlap since l = O(1).

Chosen Input State: Modified Version
We now modify algorithm 5 (and its amplified version) to obtain an algorithm for which we can prove the quadratic improvement over algorithm 4 without any assumption. Suppose we are given T_0. Let ∆ be a p-th order tensor, chosen from the same distribution as G, and consider a tensor interpolating between T_0 and ∆, controlled by a parameter x. Remark: the reader may wonder why we picked x = 1/log(N) rather than a more slowly decaying function. Indeed, any choice of x which is o(1) would suffice, and this would replace the factor log(N)^{4 n_bos} with some other slowly growing function.

Further Improvements
In this section, we sketch one improvement that might be practically useful even if it is not theoretically interesting; for that reason, we do not give any formal proofs in this section. As we have explained the algorithm above, once it succeeds in projecting onto the leading eigenvector, it measures some element of the single-particle density matrix. It is necessary to repeat this operation, including both preparing the leading eigenvector and measuring, a total of poly(N, 1/ε) times to measure the single-particle density matrix to error ε, and, as explained, the algorithm needs to prepare the leading eigenvector afresh each time. The multiplicative overhead poly(N, 1/ε) can be turned into an additive overhead using the non-destructive measurement technique introduced in Ref. [22]. To implement this, we work in the full Hilbert space (the leading eigenvector preparation can be implemented in the symmetric subspace, but after preparing the leading eigenvector we work in the full Hilbert space). Then, we implement a two-outcome projective measurement on a single qudit, such as |µ⟩_1⟨µ| for some µ (this measurement takes us outside the symmetric subspace, though it does leave us in a subspace which is symmetric under permutation of a subset of n_bos − 1 of the qudits). Using the non-destructive measurement techniques of Ref. [22], we can alternate this two-outcome projective measurement with a projector onto the leading eigenvector (implemented approximately using phase estimation as above) to recover the leading eigenvector. See Ref. [22] for more details.

Tensor Networks
In this paper, we have used the language of tensor networks to perform some of the estimates. In this section, we explore the relation to tensor networks further. Let us consider the simplest task of detection, so that the spectral algorithms need only to estimate the largest eigenvalue of H(T_0).

Discussion
The main new results of this paper are a sequence of spectral algorithms for odd p, and a quantum algorithm to compute the single-particle density matrix of some vector in an eigenspace with large eigenvalue. The quantum algorithm could be applied to matrices other than the H(T_0) considered here, such as Y from Ref. [1]; it is not certain, though, whether the speedup using an input state chosen based on T_0 would apply to such other matrices.
In this paper, we have related the spectrum for odd p to that for even q = 2(p − 1), up to an overall scale factor √N, at least in the case of Gaussian distributed entries of G. This enables us to prove correctness of the algorithm for odd p. Also, for n_bos = p − 1, the spectrum for q = 2(p − 1) is given by random matrix theory, removing a logarithmic factor present for odd p in some previous work; see corollary 2. We have not considered the case of entries of T_0 being chosen from a discrete distribution; however, we believe that a similar reduction from odd p to even q = 2(p − 1) could be proven in that case.
We have used entries of G chosen from a complex Gaussian distribution in the case of odd p to simplify the proof of the bounds on the spectrum for odd p. We conjecture, however, that a modification discussed above to cancel certain "pairing terms" would work for real entries of G.
i.e., it increases n_µ by one, leaving the other n_ν for ν ≠ µ unchanged, and it multiplies by a scalar √(n_µ + 1). The annihilation operator a_µ is defined to be the Hermitian conjugate of the creation operator, and one may verify that a†_µ a_µ |n_1, . . ., n_N⟩ = n_µ |n_1, . . ., n_N⟩.
The number operator n ≡ Σ_µ a†_µ a_µ obeys n |n_1, . . ., n_N⟩ = (Σ_µ n_µ) |n_1, . . ., n_N⟩. In this paper we restrict to the subspace with n = n_bos. This gives us a finite basis of states; indeed, restricting to any finite value of n will give a finite basis.
Finally, the single-particle density matrix is defined by the matrix of expectation values of a†_µ a_ν, normalized so that it has unit trace. The remarkable property of the second-quantized notation is how it describes behavior in the symmetric subspace of the full Hilbert space. Any Hamiltonian, such as Eq. (5), which is symmetric under permutation (that is, under permuting basis vectors by replacing |µ_1⟩ ⊗ . . . ⊗ |µ_{n_bos}⟩ with |µ_{π(1)}⟩ ⊗ . . . ⊗ |µ_{π(n_bos)}⟩, where π is an arbitrary permutation on n_bos elements) will preserve the symmetric subspace, i.e., the subspace which is invariant under permutation. An orthonormal basis for this symmetric subspace can be given by symmetrizing the basis vectors |µ_1⟩ ⊗ . . . ⊗ |µ_{n_bos}⟩ for µ_1 ≤ µ_2 ≤ . . . ≤ µ_{n_bos}; we impose this (arbitrary) ordering on the µ to avoid redundancies. The symmetrization of such a basis vector is then equal to (using the notation of this appendix) a basis vector of the form |n_1, n_2, . . ., n_N⟩, where for each µ ∈ 1, . . ., N, the number n_µ is equal to the number of a ∈ 1, . . ., n_bos such that µ_a = µ. This basis is also equal (up to normalization constants) to the basis a†_{µ_1} . . . a†_{µ_{n_bos}} |0⟩ used in the text. Then, the single-particle density matrix that we have defined above using creation and annihilation operators is equal (in the formalism of the full Hilbert space) to the reduced density matrix (i.e., the marginal) on the first tensor factor in the full Hilbert space (indeed, by symmetry, it is also equal to the marginal on any of the n_bos tensor factors). Similarly, a symmetric Hamiltonian such as Eq. (5) can be expressed compactly using creation and annihilation operators as in Eq. (6).
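A small numerical check (ours) of the claim that the marginal is the same on every tensor factor for a permutation-symmetric state:

```python
import numpy as np
from itertools import permutations

# Hedged check: for a permutation-symmetric state on n_bos qudits, the
# reduced density matrix (marginal) on the first tensor factor equals the
# marginal on any other factor, which is the full-Hilbert-space description
# of the single-particle density matrix defined above.
N, n = 3, 3
rng = np.random.default_rng(3)
psi = rng.standard_normal((N,) * n)
sym = sum(np.transpose(psi, perm) for perm in permutations(range(n)))
sym = sym / np.linalg.norm(sym)                # symmetrized, normalized state

first = sym.reshape(N, N ** (n - 1))
rho_first = first @ first.T                    # marginal on the first qudit
last = sym.reshape(N ** (n - 1), N)
rho_last = last.T @ last                       # marginal on the last qudit
```

The two marginals agree, and each has unit trace, as the definition requires.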

We approximate the norm |T_0^{⊗ n_bos/p}| by a constant and treat it as such (the proof is a standard large-deviation argument on the norm of the tensor), and consider the quantity ⟨T_0^{⊗ n_bos/p}| H(T_0)^m |T_0^{⊗ n_bos/p}⟩.

In the definition of H(T), the notation "h.c." means adding the Hermitian conjugate of the given terms, so that H(T) is Hermitian, and |µ⟩_i⟨ν| denotes the operator |µ⟩⟨ν| on qudit i. We require that n_bos ≥ p/2, or else H(T) is trivially zero. The symmetrization permutes the first p/2 indices of T and applies the same permutation to the last p/2 indices of T. We may restrict to the symmetric subspace of this Hilbert space. We write D(N, n_bos) to indicate the dimension of this subspace. For N ≫ n_bos, we can approximate D(N, n_bos) ≈ N^{n_bos}/n_bos!.
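The stated approximation for D(N, n_bos) can be checked directly; the exact count is the standard stars-and-bars formula (this check is ours):

```python
from math import comb, factorial

# Hedged check: the symmetric subspace of n_bos qudits of dimension N has
# dimension D(N, n_bos) = C(N + n_bos - 1, n_bos) (stars and bars), which
# approaches N^n_bos / n_bos! when N >> n_bos.
def D(N, n_bos):
    return comb(N + n_bos - 1, n_bos)

ratio = D(1000, 4) / (1000 ** 4 / factorial(4))   # close to 1 for N >> n_bos
```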
Remark: this result is clearly not tight in the regime where random matrix theory is accurate (n_bos = p/2). It is interesting to see what happens there. The correlation function E_{G,η}[tr(exp((1 − s_1)τH(G)) H(η) exp(s_1 τH(G)) H(η))] is not independent of s_1, but rather decays as a function of s_1 for s_1 ≤ 1/2 (of course, it increases again as s_1 becomes larger than 1/2). Considering the regime s_1 ≪ 1/2, using the square-root singularity at the edge of the Wigner semicircle, we can estimate that it decays as a power of s_1.

Preparing the input state takes circuit depth O(N^p), since we can prepare the n_bos/p copies of the state |T_0⟩/|T_0| in parallel. We want to know the expectation value ⟨Ψ_input|E_PE|Ψ_input⟩, but to get oriented, let us estimate the overlap ⟨Ψ_input|Ψ_sig⟩. We have E[|G|²] = O(N^p), where the precise constant in the big-O notation depends on whether we symmetrize G or not and whether we use complex entries or not. Further, |G|² is a sum of squares of independent random variables (the entries of G), so, by the central limit theorem, with high probability |G|² is bounded by O(N^p). So, with high probability, |T_0|^{2 n_bos/p} = O(N^{n_bos}). The overlap obeys

⟨v_sig^{⊗p}, T_0⟩ = λN^p + ⟨v_sig^{⊗p}|G⟩, (43)

and the probability distribution of ⟨v_sig^{⊗p}|G⟩ is a Gaussian with zero mean and unit variance, so with high probability ⟨v_sig^{⊗p}|G⟩ is o(λN^p) (indeed, it is bounded by any increasing function of N which diverges as N → ∞). Hence ⟨v_sig^{⊗p}, T_0⟩^{n_bos/p} = (1 − o(1)) λ^{n_bos/p} N^{n_bos}, and the squared overlap |⟨Ψ_input|Ψ_sig⟩|² is (1 − o(1)) λ^{2 n_bos/p} N^{n_bos}/|T_0|^{2 n_bos/p}; for λ = CN^{−p/4}, this is (1 − o(1)) C^{2 n_bos/p} N^{−n_bos/2}.

If ψ_target were equal to Ψ_sig = N^{−n_bos/2} |v_sig^{⊗ n_bos}⟩, then for fixed N^{−p/4}/λ, this would give a lower bound on the squared overlap of the initial state with ψ_target which would be quadratically better (in terms of its scaling with N) than the squared overlap for the maximally entangled input state. So, after applying amplitude amplification, this would give a quartic improvement over the fastest classical algorithm. However, ψ_target is not equal to Ψ_sig, and so this does not yet give a lower bound on ⟨ψ_target|Ψ_input⟩. However, we have:

Lemma 8. Suppose that E_0 ≥ (1 + c)E_max. Then the projection of Ψ_sig onto the eigenspace with eigenvalue ≥ (7/8)E_0 + (1/8)E_max is greater than or equal to Ω(1) · c/(1 + c).

Proof. The expectation value ⟨Ψ_sig|H(T_0)|Ψ_sig⟩ was estimated in subsections 3.2.1 and 3.2.2; with high probability, this expectation value is ≥ (1 − o(1))E_0. With high probability, the largest eigenvalue of H(T_0) in absolute value is bounded by (1 + o(1))(E_0 + E_max); for the even case this is just the triangle inequality, while for the odd case this uses lemma 3. Hence, by Markov's inequality applied to λ_1 − H(T_0), the projection of Ψ_sig onto the eigenspace with eigenvalue ≥ (7/8)E_0 + (1/8)E_max is bounded below as claimed.

So the tensor network is lower bounded by E_0^m |λ^{n_bos/p} v_sig^{⊗ n_bos}|². So, with these approximations, we have lower bounded E[⟨Ψ_input|E_PE|Ψ_input⟩].