Simultaneous estimation of multiple eigenvalues with short-depth quantum circuit on early fault-tolerant quantum computers

We introduce a multi-modal, multi-level quantum complex exponential least squares (MM-QCELS) method to simultaneously estimate multiple eigenvalues of a quantum Hamiltonian on early fault-tolerant quantum computers. Our theoretical analysis demonstrates that the algorithm exhibits Heisenberg-limited scaling in terms of circuit depth and total cost. Notably, the proposed quantum circuit utilizes just one ancilla qubit, and with appropriate initial state conditions, it achieves significantly shorter circuit depths compared to circuits based on quantum phase estimation (QPE). Numerical results suggest that compared to QPE, the circuit depth can be reduced by around two orders of magnitude under several settings for estimating ground-state and excited-state energies of certain quantum systems.


Introduction
The estimation of ground-state energies and excited-state energies of a Hamiltonian is a fundamental problem in quantum physics with numerous practical applications, such as the design of new materials, drug discovery, and optimization problems. While the ground state is often the state of interest for many quantum systems, excited-state energies also provide crucial information for understanding the electronic and optical properties of materials. Classical computers are limited in their ability to accurately estimate these energies for large-scale systems, and quantum computers have the potential to provide a significant speedup in solving such problems. Therefore, developing efficient and accurate methods for estimating ground-state and excited-state energies on quantum computers has become a major area of research in quantum information science.
When estimating multiple eigenvalues on quantum computers, there are two different strategies to consider. The first method involves preparing a variety of initial states that approximate different target eigenstates, and then estimating each eigenvalue one by one. The second method is to prepare a single initial state that has nontrivial overlap with all eigenstates of interest and estimate the eigenvalues simultaneously. The effectiveness of each approach depends on the assumptions and qualities of the initial state. This paper concerns the second approach. Given a quantum Hamiltonian H ∈ ℂ^{M×M}, we assume that we can prepare an initial quantum state |ψ⟩ with K dominant modes. Specifically, let {(λ_m, |ψ_m⟩)}_{m=1}^{M} denote the eigenvalue and eigenvector pairs of H, and define p_m = |⟨ψ_m|ψ⟩|² as the overlap between the initial state and the m-th eigenvector. We assume that we can choose a set D ⊂ {1, 2, …, M} satisfying |D| = K and p_m = Ω(R^{(K)}) for any m ∈ D. Here R^{(K)} = Σ_{m′∈D^c} p_{m′} is called the residual overlap, and D^c = {1, 2, …, M}\D. The eigenvalues {λ_m}_{m∈D} are called the dominant eigenvalues of H with respect to the initial state |ψ⟩ (or dominant eigenvalues for short), and the associated eigenvectors are called the dominant eigenvectors (or dominant modes). We assume {λ_m}_{m∈D} ⊂ [−π, π] for simplicity. Using oracle access to the Hamiltonian evolution operator e^{−itH} for any t ∈ ℝ, we introduce an efficient quantum algorithm to estimate these dominant eigenvalues. We quantify the efficiency of a quantum algorithm by means of the maximal runtime, denoted by T_max, and the total runtime T_total. Assuming an algorithm needs to run a set of Hamiltonian evolution operators {exp(−it_n H)}_{n=1}^{N}, the maximal runtime is T_max = max_{1≤n≤N} |t_n| and the total runtime is T_total = Σ_{n=1}^{N} |t_n|. Here, T_max and T_total approximately measure the circuit depth and the total cost of the algorithm, respectively, in a way that is oblivious to the details of implementing the Hamiltonian evolution operator e^{−itH}.
Our algorithm satisfies the following properties:
(1) Allow a nonzero residual overlap: R^{(K)} = Σ_{m′∈D^c} p_{m′} > 0.
(4) Reduce the circuit depth: the maximal runtime can be (much) lower than π/ϵ, especially when the ratio of the residual overlap to the minimum overlap in the dominant set, R^{(K)}/(min_{m∈D} p_m), approaches zero.
(5) Use the information of the spectral gap to further reduce the maximal runtime: in the presence of a spectral gap ∆, when ϵ ≪ ∆, the maximal runtime can be independent of ϵ as O(∆^{−1}), and in this case the total runtime is O(ϵ^{−2}) (see details in Remark 2).
Our algorithm is designed to perform only classical postprocessing on the quantum data that is generated by the Hadamard test circuit, which uses one ancilla qubit. It is particularly well-suited for early fault-tolerant quantum devices that may be limited in the number of ancilla qubits and maximal coherence times.
The structure of this paper is organized as follows: In Section 2, we present the main idea of our method and provide a brief summary of related works. Next, in Section 3, we describe the main algorithm and provide the complexity results. The proof is included in the Appendix. Finally, we showcase the efficiency of our algorithm through numerical examples in Section 4.

Main idea and related work
In order to illustrate the main idea of the algorithm, we assume that we can use a quantum circuit (see Section 3.1) to accurately estimate

⟨ψ| exp(−itH) |ψ⟩ = Σ_{m=1}^{M} p_m exp(−iλ_m t)   (1)

for any t ∈ ℝ, where t is drawn from a probability distribution with a symmetric density a(t), i.e., a(t) ≥ 0, a(t) = a(−t), and ∥a∥_{L¹} = ∫_{−∞}^{∞} a(t) dt = 1. In our implementation, to estimate (1), we first use a classical computer to randomly sample t from the distribution a(t). Then, we execute the Hadamard test circuit (Fig. 2) on a quantum computer multiple times and average the measurement outputs to estimate (1). We emphasize that the random selection of t plays an important role in improving the efficiency of our algorithm (see Section 2.1).
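As a small classical illustration of this estimation step, the sketch below simulates averaged Hadamard-test measurements for a synthetic three-mode spectrum (the overlaps `p`, eigenvalues `lam`, time `t`, and shot count are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical spectrum (illustrative values, not from the paper):
p = np.array([0.45, 0.35, 0.20])      # overlaps p_m, summing to 1
lam = np.array([-0.8, 0.3, 1.9])      # eigenvalues lambda_m

def signal(t):
    """Exact <psi| exp(-itH) |psi> = sum_m p_m exp(-i lambda_m t)."""
    return np.sum(p * np.exp(-1j * lam * t))

def hadamard_test_estimate(t, shots):
    """Average +/-1 outcomes whose means are Re/Im of the signal."""
    s = signal(t)
    x = 2 * rng.binomial(shots, (1 + s.real) / 2) / shots - 1  # W = I
    y = 2 * rng.binomial(shots, (1 + s.imag) / 2) / shots - 1  # W = S^dagger
    return x + 1j * y

t = 0.7
est = hadamard_test_estimate(t, shots=100000)
print(abs(est - signal(t)))  # statistical error, O(1/sqrt(shots))
```

The deviation from the exact signal shrinks like the inverse square root of the number of shots, which is the statistical noise the optimization-based postprocessing must tolerate.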
Define the Fourier transform of the probability density,

F(x) = ∫_{−∞}^{∞} a(t) exp(ixt) dt,

which is a real-valued even function. As detailed in the following explanation, our approach requires carefully selecting an appropriate distribution a(t) to ensure that the resulting filter function F(x) reaches its maximum at x = 0 and decays rapidly as |x| increases. There are several choices of a(t) such that F(x) concentrates around x = 0. In this paper, we choose a(t) as the density of a truncated Gaussian,

a(t) ∝ exp(−t²/(2T²)) for |t| ≤ γT, and a(t) = 0 otherwise,

where the normalization constant is chosen so that ∫ a(t) dt = 1. The selection of a(t) is guided by the observation that the Fourier transform of a Gaussian function remains a Gaussian function. More specifically, when γ = ∞, we have F(x) = exp(−T²x²/2), which attains its maximal value at x = 0 and decays exponentially with respect to T|x|. In addition, the use of the truncated Gaussian ensures that the maximal runtime never exceeds γT. We may also use the notation a_T(t) = a(t), F_T(x) = F(x) to emphasize the dependence on T.
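The concentration of the filter function can be checked numerically. The sketch below discretizes a truncated Gaussian density (with illustrative parameters T = 10, γ = 2) and evaluates its Fourier transform, confirming that F peaks at x = 0 and decays quickly:

```python
import numpy as np

T, gamma = 10.0, 2.0

# Discretize the truncated Gaussian density a(t) on [-gamma*T, gamma*T].
t = np.linspace(-gamma * T, gamma * T, 40001)
dt = t[1] - t[0]
a = np.exp(-t**2 / (2 * T**2))
a /= a.sum() * dt                 # normalize so that int a(t) dt = 1

def F(x):
    """F(x) = int a(t) exp(ixt) dt; real and even since a is symmetric."""
    return np.sum(a * np.cos(x * t)) * dt

xs = np.linspace(-1.0, 1.0, 201)
Fx = np.array([F(x) for x in xs])
print(F(0.0), F(0.1), F(1.0))     # peak at x = 0, rapid decay in T|x|
```

Since a is normalized, F(0) = 1 exactly; away from the origin F tracks the Gaussian profile exp(−T²x²/2) up to small truncation ringing.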
From this we can define an ideal loss function

L(r, θ) = 𝔼_{t∼a} [ |Σ_{k=1}^{K} r_k exp(−iθ_k t) − ⟨ψ| exp(−itH) |ψ⟩|² ],   (4)

and we estimate the dominant eigenvalues {λ_m}_{m∈D} by solving the optimization problem

(r*, θ*) = argmin_{r∈ℂ^K, θ∈ℝ^K} L(r, θ).   (5)

For example, consider the case K = 1. We can obtain a closed-form expression for the r_1 variable, and the optimization problem in Eq. (5) can be rewritten as

θ*_1 = argmax_θ Σ_{m=1}^{M} p_m F(θ − λ_m),

with the corresponding r*_1 = Σ_{m=1}^{M} p_m F(θ*_1 − λ_m). Since F(x) is a function that concentrates around x = 0, we expect that θ*_1 ≈ λ_{m*}, where m* = argmax_m p_m.
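The K = 1 reduction above is easy to verify numerically: eliminating r_1 leaves a one-dimensional search whose maximizer sits at the mode with the largest overlap. The sketch below uses the untruncated Gaussian filter and a synthetic spectrum (all numerical values are illustrative assumptions):

```python
import numpy as np

T = 20.0
p = np.array([0.6, 0.25, 0.15])    # hypothetical overlaps; p[0] is dominant
lam = np.array([0.4, -1.1, 2.0])   # hypothetical eigenvalues

def F(x):
    # Fourier transform of the (untruncated) Gaussian density with width T.
    return np.exp(-T**2 * x**2 / 2)

# K = 1: with r_1 eliminated, theta* maximizes sum_m p_m F(theta - lambda_m).
thetas = np.linspace(-np.pi, np.pi, 200001)
objective = sum(pm * F(thetas - lm) for pm, lm in zip(p, lam))
theta_star = thetas[np.argmax(objective)]
print(theta_star)   # close to lam[0] = 0.4, the mode with the largest overlap
```

Because F is sharply concentrated for large T, the contributions from the non-dominant modes at θ ≈ λ_1 are exponentially small, and the maximizer lands on the dominant eigenvalue.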
In practice, we can only generate an approximation of ⟨ψ| exp(−itH) |ψ⟩ up to statistical errors for a finite number of time steps t. As a result, the loss function we use in our optimization problem is only an approximation of the ideal loss function presented in Eq. (4). The main objective of this work is to generalize this idea to the estimation of multiple eigenvalues, to implement it rigorously, and to control the complexity.
Using the concentration property of F(x), we can demonstrate that the solution of the optimization problem is within δ/T of the dominant eigenvalues λ_m. Therefore, by choosing T_max = δ/ϵ we achieve ϵ-accuracy; the parameter δ directly determines the circuit depth and can be selected to be arbitrarily small if R^{(K)} is small enough. Additionally, the solution of the optimization problem presented in Eq. (5) is robust to measurement noise, which means that we do not need to generate a large number of data points and samples to construct the approximated optimization problem. This ensures that the algorithm can achieve the Heisenberg limit, i.e., T_total = O(1/ϵ) (see Section 3.3).

Related work
If the initial quantum state |ψ⟩ is a precise eigenvector of H with a single dominant eigenvalue, say λ_1, the Hadamard test algorithm is the simplest algorithm for estimating this eigenvalue with ϵ-accuracy. Given any real number T ≠ 0, we can run the Hadamard test circuits (Fig. 2) several times to generate an estimate of ⟨ψ| exp(−iTH) |ψ⟩ = exp(−iλ_1 T), from which the estimate of λ_1 can be made arbitrarily close to λ_1 by increasing the number of measurements. This observation implies that the maximal runtime T_max = 1 suffices to achieve arbitrary precision when employing the Hadamard test algorithm. Although the Hadamard test algorithm only utilizes a short circuit depth, it has several limitations: (1) the initial state |ψ⟩ must be an exact eigenstate, (2) the total runtime T_total does not satisfy the Heisenberg-limited scaling and is Ω(ϵ^{−2}), and (3) it cannot estimate multiple eigenvalues. The first two limitations can be addressed by using quantum phase estimation (QPE) and its variants [17,19,26,31,1,13,12]. The textbook version of QPE [28] uses multiple ancilla qubits to store the phase and relies on the inverse quantum Fourier transform (QFT). However, it can be difficult to employ the textbook version of QPE for multiple eigenvalue phase estimation due to the difficulty in handling spurious eigenvalues. In addition, although QPE can start from an imperfect eigenstate and saturates the Heisenberg-limited scaling, it requires a maximal runtime T_max of at least π/ϵ to achieve ϵ-accuracy [28, Section 5.2.1].
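The exact-eigenstate case can be mimicked classically. In the sketch below (with a hypothetical eigenvalue and shot count), the Hadamard test at t = 1 estimates the phase of exp(−iλ_1), so the error is set purely by the shot budget while T_max stays at 1:

```python
import numpy as np

rng = np.random.default_rng(0)
lam1 = 1.234        # hypothetical eigenvalue in (-pi, pi)
shots = 400000

# |psi> is an exact eigenstate, so <psi| e^{-iH} |psi> = e^{-i lam1} at t = 1.
s = np.exp(-1j * lam1)
x = 2 * rng.binomial(shots, (1 + s.real) / 2) / shots - 1
y = 2 * rng.binomial(shots, (1 + s.imag) / 2) / shots - 1
lam_est = -np.angle(x + 1j * y)    # recover lam1 from the phase
print(abs(lam_est - lam1))         # shrinks like shots**-0.5; T_max = 1 throughout
```

The shots^{−1/2} error decay is exactly the Ω(ϵ^{−2}) total-runtime scaling named in limitation (2).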
Ref. [9] recently proposed a multiple eigenvalue phase estimation method that satisfies the Heisenberg-limited scaling without any spectral gap assumptions. However, the theoretical analysis assumes that all non-dominant modes vanish [9, Definition 3.1 and Theorem 4.5], i.e., p_{m′} = 0 for any m′ ∈ D^c. Additionally, it has not yet been demonstrated that the method satisfies the short-depth property (4) described at the beginning of the paper.
An alternative way to understand the phase estimation problem is to view it as a sparse Fourier transform problem in classical signal processing. In recent years, the sparse Fourier transform problem has been extensively studied in both the discrete [38,3,40,8,2,16] and continuous [32,4,15] settings. In particular, Ref. [32] concerns the recovery of the Fourier frequencies of a continuous signal in the low-noise setting. Our Theorem 1 shares similarities with [32, Theorem 1.1]. However, in our approach, each data point is generated by running the Hadamard test once, resulting in a significantly higher noise level than that in [32]. Our optimization-based method may also be easier to implement in practice. Very recently, Ref. [21] introduced a robust multiple-phase estimation (RMPE) algorithm that can be considered an extension of the multi-level robust phase estimation (RPE) method proposed in an earlier work [27]. By leveraging a multi-level signal processing technique [20], the RMPE algorithm can estimate multiple eigenvalues with Heisenberg-limited scaling without spectral gap assumptions. Under an additional spectral gap assumption, and using a different technique called ESPRIT [22], which was originally developed for super-resolution of signals and can efficiently estimate multiple eigenvalues in polynomial time [36], the resulting algorithm can satisfy the short-circuit-depth property when the ratio R^{(K)}/(min_{m∈D} p_m) is small.
Quantum subspace diagonalization (QSD) methods, quantum Krylov methods, and matrix pencil methods [5,14,18,24,25,30,33,35,29] estimate the eigenvalues by solving certain projected eigenvalue or singular value problems, and have been used to estimate ground-state and excited-state energies in a number of scenarios. Classical perturbation theories suggest that such projected problems can be highly ill-conditioned and that the solutions are sensitive to noise. However, it has been observed that these methods can perform better than the pessimistic theoretical predictions. Recently, Ref. [10] provided a theoretical analysis explaining this phenomenon in the context of quantum subspace diagonalization methods used for computing the smallest eigenvalue. Nevertheless, the error of these methods generally increases with the number of data points, which is related to the dimension of the projected matrix problem. In contrast, the error of our algorithm decreases with the number of data points, making it much more robust to measurement noise. It is worth noting that a sharp complexity analysis of these methods (especially with respect to the dimension of the projected matrix problem) remains open.
The loss function (4) utilized in this work is inspired by a recently developed subroutine called quantum complex exponential least squares (QCELS) [6]. QCELS employs the same quantum circuit as depicted in Fig. 2 to generate data and uses an optimization-based approach for estimating a single dominant eigenvalue. Although our method is also based on solving an optimization problem, it is important to emphasize that directly extending QCELS to estimate multiple eigenvalues does not yield satisfactory results. We detail the differences in the following:
• QCELS samples the time steps on a uniform grid, t_n = nT/N for 0 ≤ n ≤ N − 1. In order to satisfy the Heisenberg-limited scaling, Ref. [6] proposed a multi-level QCELS algorithm that gradually increases the step size τ = T/N as one refines the estimate of the eigenvalue. However, such an approach can result in aliasing problems for multiple eigenvalue estimation, since it may not be possible to distinguish exp(−iλ_{m_1} t_n) and exp(−iλ_{m_2} t_n) in the loss function if (λ_{m_1} − λ_{m_2})T/N = 2πq for some m_1 ≠ m_2 and q ∈ ℤ. To overcome this difficulty, we combine the random sampling strategy [23,37] with QCELS and sample the t_n's randomly from a probability density a(t).
• For estimating a single dominant eigenvalue, our method also achieves improved theoretical results. In Ref. [6], QCELS requires p_1 > 0.71, and the circuit depth satisfies T_max = δ/ϵ, where δ = Θ(√(1 − p_1)). By applying our method to the single eigenvalue estimation problem, our analysis demonstrates that the method works for any p_1 > 0.5, and the constant δ can be improved to δ = Θ(1 − p_1). This improvement tightens the bound for the circuit depth. For clarity, the disparities in the constant dependence can be observed in Fig. 1.
• As previously noted, QCELS [6] is formulated as a multi-level algorithm to attain Heisenberg-limited scaling and mitigate the aliasing problem. In our work, we also develop MM-QCELS as a multi-level method to help us control the random noise throughout the entire optimization domain. However, unlike QCELS [6], our approach can be simplified to solve the optimization problem with suitable parameters only once to approximate the dominant eigenvalues, resulting in a one-level algorithm. We put a detailed discussion in the first comment of Remark 2.
• In the original QCELS [6], it was demonstrated that the maximal runtime can be further decreased to T_max = Θ(1/(λ_2 − λ_1)) by initially employing a ground-state filter [23] to enhance p_1. Our approach extends this characteristic to the estimation of multiple eigenvalues without requiring the use of a ground-state filter. This significantly simplifies the post-processing procedure. See the second comment of Remark 2.
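The aliasing problem discussed above is easy to reproduce numerically: two frequencies separated by an integer multiple of 2π/τ produce identical data on a uniform grid of spacing τ, while non-uniform times separate them. The frequencies and times below are illustrative (the "random" times were drawn once from a Gaussian density and hard-coded for reproducibility):

```python
import numpy as np

T, N = 10.0, 10
tau = T / N                                  # uniform grid spacing
lam1, lam2 = 0.3, 0.3 + 2 * np.pi / tau      # frequencies differing by 2*pi/tau

t_grid = tau * np.arange(N)
alias_gap = np.max(np.abs(np.exp(-1j * lam1 * t_grid) - np.exp(-1j * lam2 * t_grid)))
print(alias_gap)    # numerically zero: the grid cannot tell the frequencies apart

# Non-uniform times lift the degeneracy.
t_rand = np.array([0.13, -2.71, 5.32, 9.47, -7.88])
rand_gap = np.max(np.abs(np.exp(-1j * lam1 * t_rand) - np.exp(-1j * lam2 * t_rand)))
print(rand_gap)     # order one
```

On the grid the two exponentials agree to machine precision, so no loss function built from such data can distinguish the eigenvalues; random sampling breaks this degeneracy with probability one.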
We include a summary of the comparison among various methods for estimating multiple eigenvalues in Table 1.

Main method and informal complexity result
This section begins by presenting a quantum circuit for data generation. Subsequently, we use these data points to formulate an approximated optimization problem and propose the main algorithm. Finally, we provide an intuitive complexity analysis for our algorithm, followed by its rigorous statement.
Table 1: Comparison of various methods for estimating multiple eigenvalues.

Generating data from quantum circuit
The quantum circuit used in this paper is the same as that used in the Hadamard test. In Fig. 2, we may (1) set W = I, measure the ancilla qubit, and define a random variable X_n taking values in {1, −1} such that 𝔼[X_n] = Re⟨ψ| exp(−it_n H) |ψ⟩; (2) set W = S† (S is the phase gate), measure the ancilla qubit, and define a random variable Y_n taking values in {1, −1} such that 𝔼[Y_n] = Im⟨ψ| exp(−it_n H) |ψ⟩. We assume oracle access to the Hamiltonian evolution operator e^{−itH} for any t ∈ ℝ. This assumption is not overly restrictive. For instance, if we can approximate e^{−iτH} for some desired τ > 0 using the Trotter method of a certain order (the choice of τ and the order of the Trotter splitting is problem dependent and should balance efficiency and accuracy), we may express e^{−itH} = (e^{−iτH})^p e^{−iτ′H}, where t = pτ + τ′, p ∈ ℤ, and |τ′| < τ.
The cost of implementing e^{−iτ′H} should be no larger than that of implementing e^{−iτH}. The Trotter error can be systematically controlled, and this is an orthogonal issue to the type of error considered in this paper (see, e.g., [23, Appendix D] for how to factor such errors into the analysis). Therefore, throughout the paper, we assume that the implementation of e^{−itH} for any t ∈ ℝ is exact. Given a set of time points {t_n}_{n=1}^{N} drawn from the probability density a(t), we use the quantum circuit in Fig. 2 to prepare the data set {(t_n, Z_n)}_{n=1}^{N}.   (10)
By running the quantum circuit (Fig. 2) for each t_n once, we define the output Z_n = X_n + iY_n. Here X_n, Y_n are independently generated by the quantum circuit (Fig. 2) with different W and satisfy (8), (9) respectively. Hence, we have 𝔼[Z_n] = 𝔼[X_n] + i𝔼[Y_n], and therefore Z_n is an unbiased estimate of ⟨ψ| exp(−it_n H) |ψ⟩ = Σ_{m=1}^{M} p_m exp(−iλ_m t_n). We also note that if we use the above method to prepare the data set in Eq. (10), the maximal simulation time is T_max = max_{1≤n≤N} |t_n| and the total simulation time is Σ_{n=1}^{N} |t_n|. Given t_n, we can sample the circuit N_s times and average the outcomes to obtain Z_n. Most algorithms, including the QCELS algorithm in Ref. [6], require this repetition step. In this aspect, our algorithm is innovative in that it achieves convergence with just a single sample per t_n, i.e., N_s = 1. For simplicity, we assume N_s = 1 throughout the paper. Increasing N_s reduces the statistical noise in Z_n and can further decrease the error in the eigenvalue estimate. This also increases T_total by a factor of N_s without affecting T_max.

4: Generate a random variable t_n with the probability density a(t).
5: Run the quantum circuit (Figure 2) with t = t_n and W = I to obtain X_n.
6: Run the quantum circuit (Figure 2) with t = t_n and W = S† to obtain Y_n.
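The single-shot data generation above can be mimicked classically. The sketch below draws outcomes X_n, Y_n ∈ {−1, +1} with the correct means and checks that Z_n = X_n + iY_n is an unbiased estimator of the signal (the spectrum and shot counts are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.5, 0.3, 0.2])      # hypothetical overlaps
lam = np.array([-0.5, 0.9, 2.1])   # hypothetical eigenvalues

def shot_batch(t, n):
    """n single-shot pairs: X, Y in {-1, +1} with E[X] = Re s, E[Y] = Im s."""
    s = np.sum(p * np.exp(-1j * lam * t))
    X = 2 * rng.binomial(1, (1 + s.real) / 2, size=n) - 1   # step 5, W = I
    Y = 2 * rng.binomial(1, (1 + s.imag) / 2, size=n) - 1   # step 6, W = S^dagger
    return X + 1j * Y

t = 1.3
s_exact = np.sum(p * np.exp(-1j * lam * t))
Z_mean = shot_batch(t, 200000).mean()
print(abs(Z_mean - s_exact))       # -> 0 as the shot count grows: Z_n is unbiased
```

Each individual Z_n is far from the signal (|E_n| can be of order one), but its expectation matches exactly, which is the property the loss function exploits.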

Main algorithm
After generating the data set, we define the numerical loss function (referred to as the loss function for short)

L_K(r, θ) = (1/N) Σ_{n=1}^{N} |Σ_{k=1}^{K} r_k exp(−iθ_k t_n) − Z_n|²,   (13)

and the optimization problem

(r*, θ*) = argmin_{r∈ℂ^K, θ∈ℝ^K} L_K(r, θ).   (14)

Compared to the ideal loss function described in Eq. (4), the current loss function is considerably noisier. Define the expectation error E_n = Z_n − ⟨ψ| exp(−it_n H) |ψ⟩. We note that |E_n| is bounded by 3 but not small, since Z_n is generated by running the quantum circuit only once. On the other hand, the expectation of E_n is zero. Define the vectors r = (r_1, r_2, …, r_K) and θ = (θ_1, θ_2, …, θ_K), and plug them into (13); omitting terms that are independent of (r, θ), the numerical loss decomposes into the ideal loss plus cross terms involving the errors E_n.   (15)
While the loss function L_K may not converge to the ideal loss function as N approaches infinity due to the statistical noise in Z_n, we find that the optimization problem in Eq. (14) can still yield a solution that approaches that in Eq. (5) with the ideal loss.
After formulating the loss function (13) and the optimization problem (14), we are ready to introduce our main algorithm. Inspired by the multi-level QCELS [6], the algorithm contains two steps:
• Step 1: Choose a relatively small T_0. Set T = T_0, a(t) = a_{T_0}(t), and solve the optimization problem (14) to obtain a rough estimate of each dominant eigenvalue {λ_m}_{m∈D}.
• Step 2: Gradually increase T so that:
1. The solution with T_j gives a good initial guess for the optimization problem with T_{j+1}.
2. The total running time still satisfies the Heisenberg limit.
The detailed algorithm is summarized in Algorithm 2, which is referred to as the multi-modal, multi-level quantum complex exponential least squares (MM-QCELS) algorithm. We note that the optimization problems (16) in Algorithm 2 have the technical constraint ∥r∥_1 ≤ 1. This constraint is natural since each r_k should approximate some p_{m_k}, and the sum of the absolute values thus should not exceed 1. The constraint is added to simplify the complexity analysis via a uniform upper bound on the expectation error (see Lemma 8 in Appendix E for details). In practice, we find that the performance of the algorithm is independent of this constraint (see the numerical tests in Section 4).
Algorithm 2 Multi-modal, multi-level quantum complex exponential least squares (MM-QCELS)
1: Preparation: number of data pairs {N_j}_{j=0}^{l}; number of iterations l; sequence of time steps {T_j}_{j=0}^{l}; sequence of probability distributions {a_{T_j}(t)}_{j=0}^{l}; number of dominant eigenvalues K.
2: Running: at each level j, solve the optimization problem (16) over the current estimation interval.
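The two-step structure can be sketched classically. The snippet below is a schematic K = 1 version, not a faithful reproduction of Algorithm 2: the spectrum, overlaps, level count, per-level sample sizes, and the ±π/T_j search window are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
p = np.array([0.7, 0.3])       # hypothetical overlaps; one dominant mode (K = 1)
lam = np.array([0.8, -1.9])

def data(T, N):
    """Times from a Gaussian density of width T and single-shot estimates Z_n."""
    t = rng.normal(0.0, T, size=N)
    s = np.exp(-1j * np.outer(t, lam)) @ p
    X = 2 * rng.binomial(1, (1 + s.real) / 2) - 1
    Y = 2 * rng.binomial(1, (1 + s.imag) / 2) - 1
    return t, X + 1j * Y

def refine(T, N, lo, hi):
    """Minimize the K = 1 loss over theta in [lo, hi]; the optimal r for fixed
    theta is mean(Z * exp(i theta t)), so it suffices to maximize |that mean|."""
    t, Z = data(T, N)
    thetas = np.linspace(lo, hi, 400)
    corr = np.abs(np.exp(1j * np.outer(thetas, t)) @ Z) / len(t)
    return thetas[np.argmax(corr)]

# Step 1: coarse estimate on the whole interval with a short T_0.
T0 = 2.0
est = refine(T0, 1000, -np.pi, np.pi)
# Step 2: double T_j and shrink the search window around the current estimate.
for j in range(1, 6):
    Tj = T0 * 2**j
    est = refine(Tj, 1000, est - np.pi / Tj, est + np.pi / Tj)
print(abs(est - lam[0]))   # error decreases as T_j grows
```

Each level's estimate is accurate enough to seed the next level's narrower window, which is exactly the property the multi-level analysis needs.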

Complexity analysis of Algorithm 2
In this section, we study the complexity of Algorithm 2. First, the cost of the algorithm depends on a number of parameters used throughout the analysis:
1. The minimal dominant overlap: p^{(K)}_min = min_{m∈D} p_m.
2. The minimal dominant spectral gap: ∆^{(K)}_λ = min_{m≠m′∈D} |λ_m − λ_{m′}|.
3. The residual overlap: R^{(K)} = Σ_{m′∈D^c} p_{m′}.
Now, we are ready to introduce the complexity result of Algorithm 2, which is summarized in the following theorem.

Theorem 1. Given the failure probability 0 < η < 1/2, error ϵ > 0, and any ζ > 0, assume p^{(K)}_min > 3R^{(K)} and define q = Θ(R^{(K)}/p^{(K)}_min). In Algorithm 2, we choose the following parameters:
• For Step 1, set T_0 = Θ((∆^{(K)}_λ)^{−1} log(q^{−1})).
• For Step 2, set l = max{⌈log_2(q/(ϵT_0))⌉, 1} and T_j = 2^j T_0.
With suitable choices of the numbers of data pairs {N_j}, this gives T_max = δ/ϵ, where δ = Θ(q log(q^{−1})), and the total runtime T_total satisfies the Heisenberg-limited scaling. In the above equations, the constants in Θ depend at most polynomially on the logarithms of the parameters; ζ can be chosen arbitrarily close to 0, and the constant in Θ depends on ζ.

We put the proof of the above theorem in the Appendix. To provide a clear exposition of the core concept, we first present the intuitive idea of the proof in Appendix B. The key step in our proof is to demonstrate that solving the ideal optimization problem (5) yields a precise approximation to the dominant eigenvalues with a short maximal runtime. The rigorous proof of the theorem is then given in Appendix C. In particular, we rigorously bound the approximation error in (15), so that optimizing the numerical loss function (13) gives an accurate approximation of the dominant eigenvalues at low cost.

Remark 2. The results of Theorem 1 can be extended as follows:
1. In the above theorem, we could relax the condition that T_0 = Θ((∆^{(K)}_λ)^{−1} log(q^{−1})). To be more precise, if we are given a lower bound ∆_low for ∆^{(K)}_λ, we could set T_0 = Θ((∆_low)^{−1} log(q^{−1})), and the result of the theorem still holds.

2. In Theorem 1 and Algorithm 2, we utilize a sequence of T_j to approximate the dominant eigenvalues. This is due to a technical consideration in our theoretical analysis: in order to ensure the feasibility of the optimization problem (16), our proof requires controlling the random discrepancy between (16) and its ideal counterpart at every level. It is theoretically possible to show that by minimizing (16) once with T_max = Θ(q/ϵ) and N = Θ(q^{−2−o(1)}), we can achieve ϵ-accuracy for all dominant eigenvalues. This approach also offers benefits such as short circuit depth and Heisenberg-limited scaling.

3. When a spectral gap ∆_λ exists between the dominant eigenvalues and the non-dominant ones, it may be feasible to further reduce the maximal runtime to O(∆_λ^{−1}), independent of ϵ, at the price of a total runtime of O(ϵ^{−2}). The reason is given in the first point of Remark 3.

Numerical results
In this section, we numerically demonstrate the efficiency of our method using two different models. In Section 4.1, we compare the performance of Algorithm 2 with QPE (textbook version [28]) for a transverse-field Ising model. In Section 4.2, we compare the performance of Algorithm 2 with QPE for a Hubbard model. In both cases, we assume there are two dominant eigenvalues (λ_1, λ_2), meaning K = 2. We share the code on GitHub (https://github.com/zhiyanding/MM-QCELS).
In our numerical experiments, we normalize the spectrum of the original Hamiltonian H so that the eigenvalues belong to [−π/4, π/4]; given a Hamiltonian H, we use the normalized Hamiltonian of Eq. (24) in the experiments. Recall that the textbook version of QPE [28] relies on a quantum circuit that involves the inverse quantum Fourier transform (QFT). This circuit serves to encode the information of approximate eigenvalues in the ancilla register. By measuring the ancilla qubits, we can acquire an output θ that closely approximates λ_k, one of the eigenvalues of the Hamiltonian H. To find an approximation to the smallest eigenvalue, we repeat the quantum circuit N times and define the approximation θ*_1 = min_{1≤n≤N} θ_n, where θ_n is the output of the n-th repetition. The analysis of this method can be found in, e.g., [23, Section I.A]. However, QPE can also produce spurious eigenvalues which lead to failures, and it is not straightforward to generalize the procedure above for estimating multiple eigenvalues.
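Eq. (24) itself is not reproduced in this excerpt; one common choice that maps the spectrum into [−π/4, π/4] is a linear shift-and-scale, sketched below as a hypothetical stand-in:

```python
import numpy as np

def normalize_hamiltonian(H):
    """Shift and scale H so its spectrum lies in [-pi/4, pi/4].
    A hypothetical stand-in for the paper's Eq. (24); in practice one would
    use cheap spectral bounds rather than a full diagonalization."""
    evals = np.linalg.eigvalsh(H)
    c = (evals.max() + evals.min()) / 2          # spectral center
    w = (evals.max() - evals.min()) / 2          # spectral half-width
    return (np.pi / 4) * (H - c * np.eye(len(H))) / w

H = np.diag([-3.0, 1.0, 7.0])                    # toy Hamiltonian
Hn = normalize_hamiltonian(H)
print(np.linalg.eigvalsh(Hn))                    # endpoints at -pi/4 and pi/4
```

Any linear rescaling of this kind is invertible, so eigenvalue estimates for the normalized Hamiltonian translate directly back to the original spectrum.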
Consequently, in our experiment, we first utilize QPE to estimate the smallest eigenvalue λ_1 and measure the error accordingly. We then use Algorithm 2 to estimate the two dominant eigenvalues and measure the error by the maximum error over the two eigenvalues (Eq. (25), assuming λ_1 < λ_2). For simplicity, in this section, Algorithm 2 is implemented without the constraint ∥r∥_1 ≤ 1 in (16). While QPE's error is gauged based on a single eigenvalue, the error of Algorithm 2 is evaluated using the maximum error across two eigenvalues. This testing design intrinsically gives QPE a head start. Even with this bias, we demonstrate that Algorithm 2 can outperform QPE.

Ising model
Consider the one-dimensional transverse-field Ising model (TFIM) defined on L sites with periodic boundary conditions,

H = − Σ_{i=1}^{L} Z_i Z_{i+1} − g Σ_{i=1}^{L} X_i,

where g is the coupling coefficient, Z_i, X_i are Pauli operators for the i-th site, and the dimension of H is 2^L. We choose L = 8, g = 4. In the test, we set p_1 = p_2 = 0.4. In Algorithm 2, the parameters are set as described below, and we examine the landscape of the loss function (13). As T increases, the landscape of the loss function becomes increasingly complex. However, the value of the loss function around the true eigenvalues decreases significantly, which also leads to a reduction in run-to-run variation of the optimizer. As a result, the optimizer concentrates more tightly around the true eigenvalues. We apply Algorithm 2 and QPE to estimate the dominant eigenvalues (λ_1, λ_2) of the normalized Hamiltonian H according to Eq. (24). We then run Algorithm 2 (with K = 2, T_0 = 2(λ_2 − λ_1)^{−1}, N_0 = 3 × 10³, N_{j≥1} = 2 × 10³, and γ = 1) to compute the errors of both λ_1 and λ_2. We also run QPE 10 times only to estimate λ_1. The comparison of the results is shown in Fig. 4. We find that the errors of both QPE and Algorithm 2 are proportional to the inverse of the maximal evolution time T_max, but the constant factor δ = Tϵ of Algorithm 2 is much smaller than that of QPE. Fig. 4 shows that Algorithm 2 reduces the maximal evolution time by two orders of magnitude. The error of QPE is observed to scale as 6π/T. Moreover, the total evolution time T_total of Algorithm 2 is also slightly smaller than that of QPE.
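Assuming the standard TFIM form above, a dense-matrix construction for a smaller chain (L = 6 rather than the paper's L = 8, to keep the example light) might look like:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron_at(op, i, L):
    """Place the single-site operator op at site i of an L-site chain."""
    return reduce(np.kron, [op if j == i else I2 for j in range(L)])

def tfim(L, g):
    """H = -sum_i Z_i Z_{i+1} - g sum_i X_i with periodic boundary conditions."""
    H = np.zeros((2**L, 2**L))
    for i in range(L):
        H -= kron_at(Z, i, L) @ kron_at(Z, (i + 1) % L, L)
        H -= g * kron_at(X, i, L)
    return H

H = tfim(6, 4.0)            # L = 6 keeps the dense matrix small (64 x 64)
evals = np.linalg.eigvalsh(H)
print(evals[0], evals[1])   # ground-state and first-excited energies
```

Exact diagonalization of such small instances supplies the reference eigenvalues (λ_1, λ_2) against which the estimation errors are measured.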
According to Theorem 1, accurate estimation of the dominant eigenvalues with a short circuit depth using MM-QCELS depends on two critical factors: the appropriate selection of the parameter K and the fulfillment of the condition R^{(K)}/p^{(K)}_min ≪ 1. We would like to emphasize that these criteria are necessary for addressing worst-case scenarios. However, in practical implementations, even if a slightly larger value of K is chosen and the ratio R^{(K)}/p^{(K)}_min approaches 1, Algorithm 2 may still produce a precise approximation of the dominant eigenvalues with a short circuit depth. In Appendix A, we test these two cases and demonstrate the robustness of MM-QCELS to the choice of parameters.

Hubbard model
Consider the one-dimensional Hubbard model defined on L spinful sites with open boundary conditions,

H = −t Σ_{⟨i,j⟩,σ} (c†_{i,σ} c_{j,σ} + h.c.) + U Σ_{j=1}^{L} n_{j,↑} n_{j,↓}.

Here c_{j,σ} (c†_{j,σ}) denotes the fermionic annihilation (creation) operator on site j with spin σ, ⟨•, •⟩ denotes sites that are adjacent to each other, and n_{j,σ} = c†_{j,σ} c_{j,σ} is the number operator.
We choose L = 4, 8, t = 1, U = 10. To implement Algorithm 2 and QPE, we again normalize H according to Eq. (24) and choose overlaps p_1 = 0.4, p_2 = 0.4. We run Algorithm 2 (with K = 2, T_0 = 10(λ_2 − λ_1)^{−1}, N_0 = 4 × 10⁴, N_{j≥1} = 2 × 10³, γ = 1) and QPE 10 times to compare the errors (using Eq. (25)). We run the experiment ten times and plot the positions of the ten minimizers θ* using star (θ*_1) and triangle (θ*_2) markers. The landscape of the loss function is calculated using the data from the last experiment. The blue solid line stands for L_K(r*, θ_1, θ*_2) (the variable is θ_1), and the blue square is the true eigenvalue λ_1. The orange dash-dotted line stands for L_K(r*, θ*_1, θ_2) (the variable is θ_2), and the orange filled circle is the true eigenvalue λ_2. When T is large, the minimizer of the loss function concentrates around the dominant eigenvalues λ_1 and λ_2. In both cases, it can be seen that the maximal evolution time of Algorithm 2 is almost two orders of magnitude smaller than that of QPE. The total costs of the two methods are comparable.

Discussion
In this paper, we have developed a new algorithm to simultaneously estimate multiple eigenvalues using a simple circuit. This optimization-based approach, called multi-modal, multi-level quantum complex exponential least squares (MM-QCELS), saturates the Heisenberg-limited scaling and achieves a relatively short maximal runtime when the residual overlap (denoted by R^{(K)}) is small. With a proper initial state, this algorithm can be used to efficiently estimate ground-state and excited-state energies of a quantum many-body Hamiltonian on early fault-tolerant quantum computers.
There are a number of directions to extend this work and to strengthen the analysis.
1. If the initial state has significant high-energy contributions, it can be implicitly filtered using the circuit in Fig. 2 [23], or explicitly filtered using quantum eigenvalue transformation of unitary matrices (QETU) [7], to satisfy the condition of small residual overlap. Similar to the discussion in Ref. [6], this does not necessarily require a large spectral gap between the dominant eigenvalues and the non-dominant ones, and a relative overlap condition (which is a property of the initial state rather than the Hamiltonian) could suffice.

2. Our complexity analysis depends on the minimal dominant spectral gap ∆^{(K)}_λ. This is a necessary condition, since the simulation time should be long enough to separate the eigenvalues. If two or more eigenvalues are nearly degenerate (i.e., their distance is less than ϵ), we may combine them and view these nearly degenerate eigenvalues as a single eigenvalue. The MM-QCELS method can still be applied without changes. However, compared to the result in Theorem 1, there may not be a one-to-one correspondence between the estimated eigenvalues θ_k and the dominant eigenvalues λ_{m_k}.
3. Due to the complex energy landscape of the loss function, if we use a grid search to find the global minimum, the cost of solving the optimization problem in Eq. (14) on a classical computer may scale exponentially in K in the worst case. On the other hand, the signal processing methods in [32,21] can scale polynomially in K, but their implementation can be much more complicated than ours. Therefore it is desirable to improve our algorithm (e.g., using a robust initialization strategy) to reduce the cost of the classical post-processing step for large K.
4. While Algorithm 2 is formulated as a multi-level optimization problem, as discussed in Remark 2, solving (14) once with sufficiently large values of T and N may be enough to approximate the dominant eigenvalues. This approach differs from the signal-processing-based method discussed in [21], which requires multi-level signal processing.
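A back-of-the-envelope count makes the worst-case grid-search cost in point 3 concrete (the grid size G is an illustrative assumption):

```python
from math import comb

# A naive grid search over ordered K-tuples theta_1 < ... < theta_K drawn
# from a G-point grid needs C(G, K) ~ G^K / K! loss evaluations --
# exponential in K in the worst case, which motivates better initialization
# strategies for large K.
G = 100
for K in range(1, 6):
    print(f"K = {K}: {comb(G, K)} loss evaluations")
```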
Σ_k p_k = 1). Therefore, according to Theorem 1, we should choose K = 1 or K = 2. We apply Algorithm 2 (with K = 2, 3, 4, T_0 = 10(λ_2 − λ_1)^{−1}, N_0 = 3 × 10^3, N_{j≥1} = 2 × 10^3, and γ = 1) to estimate (λ_1, λ_2) and compute the maximal error (25). We also run QPE 10 times to estimate λ_1 only. The comparison of the results is shown in Fig. 7. Surprisingly, although R^{(K)}/p_min^{(K)} ≫ 1 when we choose K = 3, 4, meaning that the condition of Theorem 1 is not satisfied, Algorithm 2 still estimates (λ_1, λ_2) accurately with a short circuit depth (T_max). Moreover, the total evolution time (T_total) of Algorithm 2 is similar to that of QPE. One intuitive explanation for this phenomenon can be derived from the proof of Theorem 1 in Appendix B. When K > 2, aiming to reach the global minimum leads us to anticipate the existence of a unique fitted pair for each dominant eigenvalue; then, analogous to the derivation of (37), the corresponding error bounds can be shown. These bounds imply that, even if a wrong choice of K is made, as long as R^{(K)}/p_1 ≪ 1 and R^{(K)}/p_2 ≪ 1 hold, an accurate estimate of the dominant eigenvalues can still be obtained with a relatively short circuit depth.
The second test in this section focuses on studying the effect of a small p_min^{(K)} (so that R^{(K)}/p_min^{(K)} ≈ 1). We construct the initial state with p_1 = 0.21, p_2 = 0.6 and set K = 2. In this setting, we have R^{(K)}/p_min^{(K)} = 0.19/0.21 ≈ 1. We apply Algorithm 2 (with K = 2, T_0 = 10(λ_2 − λ_1)^{−1}, N_0 = 3 × 10^3, N_{j≥1} = 2 × 10^3, and γ = 1) to estimate (λ_1, λ_2) and compute the maximal error (25). We also run QPE 10 times to estimate λ_1 only. The results are summarized in Fig. 8. Interestingly, although Theorem 1 requires R^{(K)}/p_min^{(K)} ≪ 1 to achieve a short circuit depth, MM-QCELS still demonstrates superior performance in circuit depth compared to QPE in this case. Intuitively, this phenomenon finds its explanation through reasoning analogous to the first point raised in Remark 3. To be more specific, for any given δ > 0 and a small enough ϵ, when T ≥ δ/ϵ ≫ ∆_λ^{−1}, where ∆_λ is the spectral gap between the dominant eigenvalues and the non-dominant ones, we can establish the corresponding error bound, where θ*_k is the solution to the ideal loss function (4). Furthermore, we can select the number of data points as N = Θ(δ^{−2−o(1)}) to effectively control the random measurement error. This approach ultimately enables us to attain Heisenberg-limited scaling and a short circuit depth.

B An intuitive proof of Theorem 1
In this section, we first give an informal derivation to show that, by solving the optimization problem, it is possible to find an accurate approximation to the dominant eigenvalues with a short maximal running time. For simplicity, in the intuitive proof, we only consider the case in which:
1. The minimal dominant spectral gap ∆_λ^{(K)} is much larger than the precision ϵ;
2. The modes in D are dominant: p_min^{(K)} > C R^{(K)} for some constant C > 0;
3. The maximal runtime is sufficient for resolving the dominant eigenvalues, for some constant C′ > 0.
Here, p is defined in (18), and R^{(K)} is defined in (19). For now we only focus on the ideal loss function, which can be rewritten in a quadratic form with coefficients U, V, and W ∈ R. For simplicity, in this intuitive analysis we also neglect the difference between the truncated Gaussian distribution and the Gaussian distribution. We first claim (without giving the proof here) that, when the maximal runtime is large enough, each θ*_k approximates a unique dominant eigenvalue up to a constant proportional to the minimal dominant spectral gap. The next step is to refine the eigenvalue estimate to the target precision ϵ.
In this regime, the matrix U is approximately the identity matrix. Hence, conceptually, we can solve for each pair (r_k, θ_k) independently. Consider the minimization problem on the right-hand side with fixed k. Noticing that this new loss function is quadratic in r, we can solve for the optimal r in closed form. Plugging θ = λ_{m_k} into the right-hand side, we obtain an upper bound on the minimal loss. Using the Gaussian approximation F(x) = exp(−T²x²/2) and viewing T|x| as z in the inequality, we conclude that F(x) is an O(T)-Lipschitz function.
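The O(T)-Lipschitz claim for the Gaussian filter F(x) = exp(−T²x²/2) can be checked directly by bounding the derivative:

```latex
\[
|F'(x)| \;=\; T^2 |x|\, e^{-T^2 x^2/2}
        \;=\; T \,\bigl(T|x|\bigr)\, e^{-(T|x|)^2/2}
        \;\le\; T \sup_{z \ge 0} z\, e^{-z^2/2}
        \;=\; T e^{-1/2},
\]
```

since z e^{−z²/2} attains its maximum at z = 1; hence |F(x) − F(y)| ≤ T e^{−1/2}|x − y| for all x, y.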
Combining this with (33), we obtain the claimed estimate, where we use 0 ≤ F ≤ 1 and the fact that F is an O(T)-Lipschitz function in the last inequality. From (34), we first obtain an exponentially small bound, which implies (35). When R^{(K)} is sufficiently small, combining Eqs. (34) and (35), we further obtain (36), and hence (37). In summary, to obtain ϵ-accuracy, the depth constant of the maximal running time is small when R^{(K)}/p_min^{(K)} is close to 0. We remark that when K = 1, there is only one dominant mode m_k, and p_{m_k} = 1 − R^{(K)} by definition. In this case, the result in Eq. (35) is comparable to the estimate in Ref. [6] for the QCELS method. The analysis in this work provides a tighter bound on the maximal runtime (or the circuit depth). Specifically, Eq. (37) provides a quadratic improvement with respect to the prefactor R^{(K)} for estimating a single dominant eigenvalue, and the same conclusion holds for estimating multiple eigenvalues.

Remark 3.

1. When a spectral gap exists between the dominant eigenvalues and the non-dominant ones (denoted by ∆_λ), we can further reduce the maximal runtime. This reduction can be derived in a similar manner to the previous intuitive analysis. Specifically, referring to equation (34), when T is sufficiently large to ensure that each θ*_k lies close to its dominant eigenvalue, and following a similar derivation to (35)-(37), we obtain that a maximal runtime depending only logarithmically on ϵ is sufficient to ensure ϵ-accuracy. However, in this scenario, while T_max depends logarithmically on the desired precision ϵ, the number of data points needs to increase to N = Ω(ϵ^{−2}) to adequately control the random noise. This is similar in flavor to the result in [39], as well as in QCELS [6], for estimating a single dominant eigenvalue.
2. The rigorous proof of Theorem 1 follows a slightly different path from the previous intuitive derivation. The numerical loss function (13) admits a similar quadratic expansion as in Eq. (27), with noisy coefficients U(θ), V(θ), and W. Due to the presence of noise and off-diagonal entries in U, the perfect separation assumed in (31) no longer holds, and the analysis of the independent optimization problems cannot be directly applied to show that θ*_k is close to λ_{m_k}. To overcome this difficulty, we adopt the idea of separation and decompose the numerical loss function after bounding the noise, and then compare the resulting loss function with L_K({p_m}_{m∈D}, {λ_m}_{m∈D}). This implies that, to obtain ϵ-accuracy, the maximal running time is T_max = γT = δ/ϵ, where δ = Θ(q log(q^{−1})).

C Rigorous proof of Theorem 1
To prove Theorem 1, we first rewrite the optimization problem (14) in a form that separates the ideal loss from the expectation error. Roughly speaking, when N ≫ 1, the expectation error is small. The proof contains two steps: 1. Find initial estimation intervals for the dominant eigenvalues (using T_0, N_0); 2. Within the correct estimation intervals, find more accurate approximations of the dominant eigenvalues (from T_j to T_{j+1}). Now, we introduce a lemma and a proposition to control the complexity of these two steps, respectively.
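The two-step structure can be sketched numerically. The following is an illustrative toy (names, parameters, and the single-mode noiseless signal are assumptions, not the paper's exact Algorithm 2): a coarse level produces a rough interval, and each subsequent level doubles the runtime T while searching only inside the interval produced by the previous level.

```python
import numpy as np

lam = 0.3717                      # hypothetical dominant eigenvalue
rng = np.random.default_rng(1)

def level_estimate(T, center, width, N=300, G=200):
    """One level: sample t_n ~ N(0, T^2), fit r e^{-i theta t} to Z(t_n)
    over a theta-grid restricted to [center - width, center + width]."""
    ts = rng.normal(0.0, T, size=N)
    Z = np.exp(-1j * lam * ts)            # ideal noiseless signal
    best_theta, best_loss = center, np.inf
    for theta in np.linspace(center - width, center + width, G):
        A = np.exp(-1j * theta * ts)
        r = np.vdot(A, Z) / N             # closed-form least-squares amplitude
        loss = np.mean(np.abs(r * A - Z) ** 2)
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta

# Step 1: rough interval from (T_0, N_0); Step 2: refine from T_j to T_{j+1}.
T, center, width = 2.0, 0.0, np.pi
for _ in range(8):
    center = level_estimate(T, center, width)
    width /= 2                            # shrink the estimation interval
    T *= 2                                # longer evolution sharpens the loss
print(abs(center - lam))                  # error decreases with total runtime
```

Doubling T while halving the interval is what keeps the total runtime Σ_j N_j T_j dominated by the last level, the mechanism behind the Heisenberg-limited total cost.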
Let E(r, θ) = E_{p,r}(r, θ) + E_{r,r}(r, θ) + E_{r,Z}(r, θ). We use the following lemma to control the complexity of the first step:

Lemma 4. Given 0 < q < 1 such that q = Ω(R^{(K)}/p_min^{(K)}), where p_min^{(K)} and R^{(K)} are defined in (17) and (19), and under the stated assumptions on the loss function L_K defined in (13), for each m ∈ D there must exist a unique 1 ≤ k_m ≤ K such that the bound (42) holds.

Lemma 4 constitutes a key element in the remaining part of the proof. According to Lemma 4, when the expectation error is small enough, the error of the refined dominant eigenvalue estimate is bounded by q/T, where q is a fixed (possibly small) constant and T is the maximal runtime. Combining this lemma with Lemma 8 by setting q = 1, we can demonstrate that when T_0 is large enough and N_0 = Ω(T_0^2), solving the optimization problem gives us a reasonable approximation to the dominant eigenvalues, meaning: for each m ∈ D, there must exist a unique 1 ≤ k_m ≤ K whose estimate lies in the correct interval.

Proof of Lemma 4. First, we define the truncated filter F and the associated truncation errors. Using the tail bound of a Gaussian and the choice of γ in (41), these truncation errors are small. Here we write Ê = E({p_m}_{m∈D}, {λ_m}_{m∈D}) and Ê_F = E_F({p_m}_{m∈D}, {λ_m}_{m∈D}). We separate the proof into two steps. In the first step, we show that for each m ∈ D, there must exist a unique 1 ≤ k_m ≤ K such that (45) holds. In the second step, we further improve the bound in (45).
Step 1: Show that each θ*_k is close to some λ_m with m ∈ D. Suppose there exists m* ∈ D such that the required closeness fails for every 1 ≤ k ≤ K. Then, let E* = E(r*, θ*). In the second inequality we use p_{m*} > 3R^{(K)}, (41), and (43); in the first inequality, we use the decaying property of F(x) together with the separation of the dominant eigenvalues. This forces the loss at the minimizer to exceed its value at the true parameters, contradicting the assumption that ({r*_k}_{k=1}^K, {θ*_k}_{k=1}^K) is the minimizer. Thus, (45) is true for all 1 ≤ k ≤ K.
Step 2: Improve the upper bound. Define F* as below. Noticing that (r*, θ*) is the minimum point and comparing the resulting inequality with the second inequality of (44), we obtain a bound on the loss difference. Using (41) and the conditions of Theorem 1, we can bound Ê, and hence obtain a refined inequality. Expanding F*, and utilizing this bound together with (47), we can infer from (48) the estimate (49). Because p_m ≥ p_min^{(K)}, the error term is O(q²), which implies the first inequality of (42). For the second inequality of (42), (49) also implies the corresponding bound on |p_m − r*_{k_m}|. This concludes the proof of the second inequality of (42).
The following proposition controls the complexity of the second step of the algorithm:

Proposition 5. Given a failure probability 0 < η < 1/2, any small constant ζ > 0, 0 < q < 1 such that q = Θ(R^{(K)}/p_min^{(K)}), and given a sequence of rough intervals {I_k}_{k=1}^K, where L_K is defined in (13),
then with probability 1 − η the stated bounds hold. We give the proof of Proposition 5 in Appendix D. Here, we emphasize that Proposition 5 cannot be directly proved by combining Lemma 4 and Lemma 8. According to Lemma 4, to obtain accuracy q/T, we need the expectation error |E| = O(q²). According to Lemma 8, to guarantee |E| = O(q²), we need to choose N = Ω(q^{−4}), which is worse than the scaling of N with respect to q^{−1} in the proposition. This means that, to achieve the correct scaling of N with respect to q^{−1}, we need to bound the expectation error in a different way to obtain a sharp estimate. Now, we are ready to use Lemma 4 and Proposition 5 to prove Theorem 1.

D Proof of Proposition 5
In this section, we prove Proposition 5. The proof idea is similar to [6, Appendix B.2]. We first give a rough complexity result that has an O(q^{−4}) scaling in N. Then, we consider the difference of the expectation errors more carefully and improve this scaling to O(q^{−2−o(1)}) using an iteration argument. Applying Eq. (59) from Lemma 8 in Appendix E to bound the expectation error E(r, θ), we obtain the following lemma:

Lemma 6. Given a failure probability 0 < η < 1/2, any small constant ζ > 0, 0 < q < 1 such that q = Ω(R^{(K)}/p_min^{(K)}), and given a sequence of rough intervals {I_k}, where L_K is defined in (13).

Next, in order to improve the O(q^{−4}) scaling in (54), we propose a different approach to bound the error terms. Instead of bounding E(r*, θ*) and E({p_m}, {λ_m}) separately, we aim to bound the difference between these two error terms. Intuitively, when (r*_{k_m}, θ*_{k_m}) and (p_m, λ_m) are close to each other, the two error terms are likely to cancel each other out when we compare the difference between L(r*, θ*) and L({p_m}, {λ_m}). This intuition is supported by Eq. (60) of Lemma 8. Assuming that we already have |θ*_{k_m} − λ_m| < q/T and |r_{k_m} − p_m| ≤ q p_m, it is then sufficient to choose N = Ω(q² ξ^{−2}) to ensure that |E(r*, θ*) − E({p_m}, {λ_m})| ≤ ξ with high probability. This requirement, compared with the second inequality of (54), reduces the blow-up rate to O(q^{−2}) as q → 0, which matches the condition in Proposition 5. However, the above calculation assumes |θ*_{k_m} − λ_m| < q/T and |r_{k_m} − p_m| ≤ q p_m, which is unknown to us in advance. To overcome this difficulty, we use an iteration argument to obtain the desired order. This is summarized in the following lemma:

Lemma 7.
Given a failure probability 0 < η < 1/2, an integer S > 1, any small constant ζ > 0, a sequence of rough intervals {I_k}_{k=1}^K ⊂ R, and a decreasing sequence {q_s}_{s=0}^S with 0 < q_0 ≤ 1 and q_S = Ω(R^{(K)}/p_min^{(K)}), if at each level the number of samples is of order q_{s+1}^{−4} q_s^2, up to polylogarithmic factors in K, Q, and p_min^{(K)}, then with probability 1 − η the estimates satisfy the stated bounds at every level.

Proof of Lemma 7. By utilizing Lemma 6 in conjunction with (54), we can demonstrate that, with probability 1 − η/(2Q), |λ_m − θ*_{k_m}| ≤ q_0/T and |p_m − r*_{k_m}| ≤ q_0 p_m.
Finally, we are ready to prove Proposition 5:

Proof of Proposition 5. Using the given q from the conditions of the proposition, we construct a decreasing positive sequence {q_s}_{s=0}^S with q_s = q^{(2−(1/2)^s)/(2−(1/2)^S)}. With this choice, we have q_{s+1}^{−4} q_s^2 = q^{−4/(2−(1/2)^S)}, independently of s, which approaches q^{−2} as S → ∞.

Proof of Lemma 8. We start by proving (59). We will only prove the inequalities for E_{p,r}, but note that the other two can be shown similarly. After defining the appropriate quantity, the first inequality of (59) follows using |r_k| ≤ 1.
We proceed to prove (60). As before, we only show the proof of the first inequality in (60). Since Σ_k p_{m_k} ≤ 1, we obtain the desired bound; combining this inequality with (61), we prove the first inequality of (60).
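The exponent cancellation behind the sequence {q_s} in the proof of Proposition 5 above can be checked numerically: with q_s = q^{(2−(1/2)^s)/(2−(1/2)^S)}, the per-level factor q_{s+1}^{−4} q_s^2 is independent of s and equals q^{−4/(2−(1/2)^S)}, which tends to the q^{−2} rate as S grows (the sample values q = 0.1, S = 10 below are illustrative).

```python
# Verify that q_s = q ** ((2 - 0.5**s) / (2 - 0.5**S)) makes the per-level
# factor q_{s+1}^{-4} * q_s^2 independent of s:
#   exponent = (-4(2 - 0.5**(s+1)) + 2(2 - 0.5**s)) / (2 - 0.5**S)
#            = -4 / (2 - 0.5**S)  ->  -2  as S -> infinity.
q, S = 0.1, 10
qs = [q ** ((2 - 0.5 ** s) / (2 - 0.5 ** S)) for s in range(S + 1)]
vals = [qs[s + 1] ** -4 * qs[s] ** 2 for s in range(S)]
expected = q ** (-4 / (2 - 0.5 ** S))
print(vals[0], expected)   # every entry of vals equals `expected`
```

This is precisely the O(q^{−2−o(1)}) sample-count scaling claimed in the proposition.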

Figure 2: Choosing W = I or W = S† (S is the phase gate), the Hadamard test circuit allows us to estimate the real or the imaginary part of ⟨ψ| exp(−itH)|ψ⟩ on quantum computers. H is the Hadamard gate. The classical computer provides the evolution time t according to a probability density a(t), and performs post-processing on the quantum data for eigenvalue estimation.
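As a concrete check of this estimator, the following numpy statevector sketch (the random 8-dimensional Hermitian matrix and all names are illustrative assumptions, not tied to a specific system) reproduces the identity Pr[ancilla = 0] = (1 + Re⟨ψ|e^{−itH}|ψ⟩)/2 that the classical post-processing relies on:

```python
import numpy as np

# Statevector simulation of the Hadamard-test circuit in Fig. 2. With W = I
# the ancilla statistics give the real part of <psi| exp(-i t H) |psi>;
# with W = S^dagger they give the imaginary part.
rng = np.random.default_rng(0)
d = 8
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2                   # random Hermitian "Hamiltonian"
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)

def expmH(t):
    """exp(-i t H) via eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * t * evals)) @ evecs.conj().T

def hadamard_test(t, imag=False):
    # |0>|psi> -> Hadamard -> controlled-U -> W on ancilla -> Hadamard.
    # The ancilla-|0> branch carries the amplitude vector (psi + w U psi)/2,
    # where w = 1 for W = I and w = -i for W = S^dagger.
    w = -1j if imag else 1.0
    top = (psi + w * (expmH(t) @ psi)) / 2
    p0 = np.linalg.norm(top) ** 2          # Pr[measure ancilla in |0>]
    return 2 * p0 - 1                      # = Re (or Im) of <psi|e^{-itH}|psi>

t = 0.7
exact = psi.conj() @ expmH(t) @ psi
print(hadamard_test(t), exact.real)        # the two values coincide
```

On hardware, p0 is of course estimated from repeated shots, so each sample of Re or Im carries O(1) statistical noise; the least-squares fit over many times t_n is what averages this noise down.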

As an illustrative example, Fig. 3 demonstrates the landscape of the loss function in Eq. ( ), where we omit the term (1/N) Σ_{n=1}^N |E_n|² in the first equality and the p_m p_{m′} exp(i(λ_m − λ_{m′})t) cross terms in the second equality, because they do not affect the solution of the optimization problem.

with probability 1 − η, |λ_m − θ*_{k_m}| ≤ q/T and |p_m − r*_{k_m}| ≤ q p_m. (55)

Proof of Lemma 6. According to the second equality of (54), we have a bound on |I_k|, which implies that for all m ∈ D, |λ_m − θ*_{k_m}| is controlled with probability 1 − η, together with a bound on |E|. Finally, since we have obtained the rough bound (56), we can use the same argument as in the second step of the proof of Lemma 4 to prove (55).
Figure 1: A comparison of the theoretical upper bound of δ = T_max ϵ for QCELS, a QPE-type algorithm, and MM-QCELS for ground-state energy estimation. The Hadamard test is only applicable when p_1 = 1, and the theoretical result in [6] requires p_1 > 0.71 for QCELS.
In practical calculations, Algorithm 2 may be simplified into a one-level algorithm by directly choosing sufficiently large values for T_max and N. Moreover, if we optimize the loss function only over a finite number of grid intervals such that θ_k ∈ [λ_{min,k}, λ_{max,k}], the deviation from the ideal optimization problem (5) remains uniformly small (see Lemma 4 in Appendix C for details). Achieving this requires a uniform bound for infinitely many continuous random variables, which necessitates constraining λ_{max,k} − λ_{min,k} to be sufficiently small, thus guaranteeing the manageability of the random noise with a reasonable number of samples. Detailed information can be found in Appendix E.