On low-depth algorithms for quantum phase estimation

Quantum phase estimation is one of the critical building blocks of quantum computing. For early fault-tolerant quantum devices, it is desirable for a quantum phase estimation algorithm to (1) use a minimal number of ancilla qubits, (2) allow for inexact initial states with a significant mismatch, (3) achieve the Heisenberg limit for the total resource used, and (4) have a diminishing prefactor for the maximum circuit length when the overlap between the initial state and the target state approaches one. In this paper, we prove that an existing algorithm from quantum metrology can achieve the first three requirements. As a second contribution, we propose a modified version of the algorithm that also meets the fourth requirement, which makes it particularly attractive for early fault-tolerant quantum devices.


Introduction
Quantum phase estimation (QPE) is one of the most important building blocks of quantum computing.Consider a unitary matrix U ∈ C M ×M .Let {|ψ m ⟩} M −1 m=0 be the orthogonal eigenstates of U , and {e iλm } M −1 m=0 be the corresponding eigenvalues.In the QPE problem, given U and an eigenstate, say |ψ 0 ⟩, the goal is to estimate λ 0 up to a certain accuracy.In a more general and practical setting, the initial quantum state |ψ⟩ provided for phase estimation is a linear combination of eigenstates, i.e., |ψ⟩ = M −1 m=0 c m |ψ m ⟩, where the coefficient c 0 for |ψ 0 ⟩ dominates.
Due to the significance of QPE, many algorithms have been devised to address this problem.The well-known Hadamard test provides probably the simplest circuit (see Figure 1(a)) for this purpose, but it requires O ϵ −2 executions to reach a precision ϵ.Improving on the Hadamard test, Kitaev's algorithm [13,14] identifies the phase bit-by-bit by using quantum circuits with dyadic powers U 2 j (see Figure 1(b)).It reduces the total complexity but is only applicable to exact eigenstates.The QPE algorithm based on quantum Fourier transform (QFT) [4] requires only a single run but a relatively large number of ancilla qubits.Many alternatives for QPE have been proposed in the literature [3,5,7,9,15,17,18,21,22,26,27,28].Among them, [27,28] focus on the case of ground state energy estimation and a spectral gap is assumed.For a more comprehensive overview of the QPE algorithms, we refer to the detailed discussions in [5,18,20].
Hongkang Ni: hongkang@stanford.eduHaoya Li: lihaoya@stanford.eduLexing Ying: lexing@stanford.edu,We thank Lin  There are several key complexity metrics for evaluating the performance of the QPE algorithms.The first is the number of ancilla qubits required; the smaller, the better.The second one is the maximum runtime T max , measured by the maximum depth of any circuit used by the algorithm.The third one is the total runtime T total , equal to the sum of the circuit depths over all executions.It has been demonstrated (see [8,29,30] for example) that T total must follow the Heisenberg limit T total = Ω(ϵ −1 ).The fourth one is the minimal overlap p 0 = |c 0 | 2 required between the initial state |ψ⟩ and the target state |ψ 0 ⟩.A lower bound of this overlap is usually assumed because the problem of finding λ 0 is otherwise shown to be difficult ( [1,11,14]).Among these metrics, a small T max is particularly important for early fault-tolerant quantum devices since these devices typically have a small number of qubits and a relatively short coherence time.For the phase estimation problem, one can use similar ideas to the proof of lower bound in [10] to show that T total = Ω(ϵ −2 T max ), which means T max = Ω(ϵ −1 ) when T total = O ϵ −1 .A detailed proof is given in [25].In conclusion, a near-optimal phase estimation algorithm should meet the following requirements: 1. Use a small number of (even a single) ancilla qubits.
2. Allow the initial state |ψ⟩ to be inexact.
3. Achieve the Heisenberg-limited scaling T total = Õ ϵ −1 .4. T max = O ϵ −1 , and the prefactor can be arbitrarily small when |ψ⟩ approaches to the exact eigenstate |ψ 0 ⟩.As we mentioned, this is particularly important for early fault-tolerant quantum devices.
Most of the proposed QPE algorithms fail to meet all four requirements.For example, the first requirement is violated by QPE algorithms using QFT.The Hadamard test and the original Kitaev algorithm violate the second requirement.The Hadamard test also violates the third requirement.It has been pointed out in [5] that the only algorithm proven to meet all four conditions is the recent optimization-based algorithm proposed in [5].
There have also been rapid developments of phase estimation in the related field of quantum metrology, such as [2,12,19,23,24].For example, the robust phase estimation (RPE) algorithm in [2,12,24] halves the confidence interval of the phase iteratively to achieve the target accuracy.In this paper, • we show that this RPE algorithm satisfies the first three requirements listed above as long as the initial overlap is above 4 − 2 √ 3 ≈ 0.536.This is an improvement over the threshold 0.71 obtained in [5], and • we propose a modified algorithm with a much shorter circuit length when the overlap approaches 1.The prefactor of T max = Õ ϵ −1 can be as small as Θ(1 − p 0 ), which is better than the bound Θ( √ 1 − p 0 ) provided in [5].
The rest of the paper is organized as follows.Section 2 proves the correctness of RPE when the initial overlap is above 4 − 2 √ 3. Section 3 presents the new algorithm that allows for shorter circuit length when the initial overlap approaches one.

Analysis of the existing algorithm
The angles and their computations are understood as modulo 2π.The absolute value of an angle θ, denoted by |θ| 2π , is defined to be the minimum distance to 0 modulo 2π, i.e., the overlap between the given state and the eigenstate |ψ m ⟩.
In the Hadamard test, the circuit in Figure 1(a) is used to estimate the real and imaginary parts of ⟨ψ|U |ψ⟩.In Kitaev's algorithm, the circuits in Figure 1(b) are used to estimate the real and imaginary parts of ⟨ψ|U 2 j |ψ⟩ for different values of j.Algorithm 1 is a reformulation of the RPE outlined in [2], which uses the same primitives of the Kitaev's algorithm and produces an approximation of λ 0 with high probability.In the j-th step, the Hadamard test is executed N s times to obtain an estimate Z j of 2 j λ 0 .Then arg Z j is used to get a candidate set S j of new approximations of λ 0 .The one closest to the previous approximation θ j−1 is chosen as the new approximation θ j .

Algorithm 1 An adapted version of RPE in [2]
Input: ϵ: target accuracy, η: upper bound of the failure probability, δ: upper bound for the noise in the initial state |ψ⟩.Let J = ⌈log 2 (ϵ −1 )⌉ and calculate N s with the values of ϵ, η and δ according to (5).θ −1 = 0. for j = 0, 1, . . ., J do Run the circuit in Figure 1(b) for the real part and imaginary part of ⟨ψ|U 2 j |ψ⟩ for Ns 2 times each to generate Z j as an estimation of ⟨ψ|U . end for Output: θ J as an approximation to λ 0 .
The following lemma is key to the analysis of Algorithm 1.Here, the constant δ serves as an upper bound for the noise in the initial state, i.e., δ > p If the quantum state |ψ⟩ satisfies p 0 > 1 − δ and where arg Z j is the principal argument of Z j .
where α(δ) is defined in (1).Then the output of Algorithm 1 satisfies In addition, the maximal runtime and the total cost of Algorithm 1 are, respectively, Proof.First, we claim that under the circumstances of by induction.The case j = 0 is a direct consequence of Lemma 1.Now if (9) holds for j−1, then Lemma 1 states that λ 0 is in one of the intervals for k = 0, . . ., 2 j − 1. Checking the length of the gaps between these intervals shows that only one such interval 3).Recall that this is exactly the criteria for choosing θ j .Hence λ 0 ∈ I k * = θ j − π 3•2 j , θ j + π 3•2 j .It remains now to prove that (8) holds with a probability greater than 1 − η.By Hoeffding's inequality, using N s samples ensures that for every j.Therefore, we conclude that Figure 3: Illustration of Algorithm 1 and Theorem 2. At each iteration j, the blue interval is the neighborhood of the chosen θ j with length π 3•2 j−1 .The yellow ones are discarded.

Low-depth circuit for large initial overlap
This section shows that the maximal runtime can be further reduced when the initial overlap p 0 is close to one (i.e., the upper bound δ of the noise in the initial state ≪ 1).This is achieved via Algorithm 2.
In order to analyze the complexity of Algorithm 2, we need a generalization of Lemma 1 as provided below.
Here, we generalize the result presented in Theorem 2 and show that Algorithm 2 can give a further reduced T max .Notice that even though the assumption on δ still reads δ < 2 √ 3 − 3, Theorem 5 only provides substantial reduction on T max when δ is sufficiently small.

Theorem 5. Assume that the quantum state |ψ⟩ satisfies p
and that the parameter N s in Algorithm 2 is given by: Then the output of Algorithm 2 satisfies In addition, the maximal runtime and the total cost of Algorithm 2 are, respectively, Proof.Similar to Theorem 2, we first prove that under the circumstance of by induction.The case j = 0 is a direct corollary of Lemma 3. Now assume that (20) holds for j − 1.It is guaranteed by Lemma 3 that λ 0 is in one of the intervals for k = 0, . . ., 2 j − 1.Since ξ < 1, the same argument as in Theorem 2 shows that at most one such interval can have a non-empty intersection with (see Figure 4).Since this is the criteria for choosing θ j in the algorithm, we have (19) holds with a probability greater than 1 − η.In fact, by Hoeffding's inequality, with N s samples from the Hadamard test, one can ensure that for every j.Thus, with the union bound, one arrives at Remark 6.It is worth noticing that when reducing T max to O ξϵ −1 , T total will increase to Õ ξ −1 ϵ −1 .Therefore, a trade-off similar to [5] exists between T total and T max .In particular, by using a small prefactor ξ = Θ(δ) in the large overlap regime, one reduces T max to O δϵ −1 (while increasing the total runtime to Õ δ −1 ϵ −1 ).According to Remark 4, Algorithm 2 reduces the maximal circuit length by a factor of Θ(1/δ), while the result presented in [5] only gives a factor of Θ(1/ √ δ).After the initial submission of our manuscript, an extended version of QCELS is shown in [6] to achieve the factor Θ(1/δ) with a more sophisticated analysis.

Remark 7.
The result in Algorithm 2 can be extended to the case with large relative overlap (Cf.[5,18]) without increasing the depth of the quantum circuit.The filtering process and proof in [5] can be directly applied here.Briefly speaking, if one knows a priori that λ 0 ∈ I ⊂ I ′ ⊂ [−π, π], then one only needs The idea is to replace Z j , which is an estimator of ⟨ψ|U 2 j |ψ⟩ = M −1 m=0 p m 2 j λ m , with an estimator of ⟨ψ|U Here H is a matrix such that U = e iH and F q is an approximation of the indicator 1 I ′ up to precision q, which is obtained by a truncation of the Fourier series of 1 I ′ .The estimator of ⟨ψ|U 2 j F q (H)|ψ⟩ is then constructed with the results of Hadamard tests and the coefficients of the Fourier series.

Numerical simulation
In this section, we present the results of numerical simulations of our algorithms and compare our method to other phase estimation algorithms.The unitary operator we employ is defined as U = e i π 4 H/∥H∥ 2 , where H is a Hamiltonian and ∥ • ∥ 2 represents the operator 2-norm.The scaling factor π 4∥H∥ 2 in the exponential is used to ensure that the eigenvalues are all in [− π 4 , π 4 ], thereby eliminating any ambiguity arising from modulo 2π.Specifically, we employ the one-dimensional transverse field Ising model with L sites and periodic boundary conditions as the Hamiltonian, which is given by with parameters L = 8 and g = 4. Here, X i and Z i represent the Pauli matrices acting on the i-th site.
In Figure 5, we present the performance of our RPE algorithm (Algorithm 1) and compare it to two other algorithms, namely the QCELS algorithm (the optimization-based method in [5]) and the textbook version QPE [4], with two different initial overlap.The parameters for QCELS are set identically to those in [5].All data for these three methods are obtained by conducting ten random experiments and calculating the average error.As demonstrated in Figure 5(a), the error given by RPE and QCELS decreases when increasing T max with a linear trend for both p 0 = 0.6 and p 0 = 0.8, and both RPE and QCELS provide a much smaller prefactor than textbook QPE.On the other hand, it is clear from Figure 5(b) that Algorithm 1 achieves a lower total cost T total compared to QCELS.In the second numerical experiment, we consider the same unitary operator U but assume that the initial overlap is sufficiently large, and we proceed to verify that Algorithm 2 can further reduce the prefactor in T max by lowering the value of ξ in this regime.In particular, we assume that the initial overlap is p 0 = 0.99.Then ξ can be as small as 3  π arcsin(0.01/0.99)≈ 0.01 according to Theorem 5. Figure 6 displays the results for Algorithm 2 with different values of ξ, where the error is also averaged from 10 random experiments.For the three different values ξ = 1, 0.3, and 0.1, the error shows a similar linear decreasing trend while the maximal runtime and the total runtime increase, which indicates that the difference only lies in the prefactor.Moreover, the prefactor of T max decreases as ξ decreases, at the expense of an increase in T total , which verifies the conclusions of Theorem 5.

Conclusion and discussion
In this paper, we have demonstrated through theoretical analyses and numerical experiments that a simple RPE-type algorithm and its variant can be particularly suitable for the implementation of phase estimation on early fault-tolerant quantum computers since they satisfy the requirements (1), (2), (3) and (4).Compared with the previous work [5], our method is structurally simpler since no optimization procedure is needed.Our method also provides a more significant prefactor reduction while at the same time posing a looser requirement for the initial overlap.
For the theoretical results presented in this paper, a large probability statement is adopted.One can also easily extend the results to the other metrics, such as the mean squared error (MSE), where a different number of measurements can be used in each iteration to minimize the MSE.
The unitary U is assumed to be a black-box unitary in the setting of this paper.When other powers of U apart from U 2 j are accessible, it is possible to relax further the requirement δ < 2 √ 3 − 3 (see the follow-up work [16] for details).The fact that δ only needs to be smaller than an O (1) threshold makes the algorithm presented here specifically suitable for the case where the unitary comes from a simulation of a Hamiltonian of interest.In that case, the result shown here only requires an approximate simulation with O (1) precision for each time duration 2 j , thus making the algorithm particularly advantageous for the combination with Hamiltonian simulation algorithms.

Figure 1 :
Figure 1: (a) The circuit for Hadamard test.H is the Hadamard gate.Concerning the I/S + gate, we use I (the identity) for the real part of ⟨ψ|U |ψ⟩, and S + (the adjoint of the phase gate S) for the imaginary part.(b) The circuit used in the Kitaev algorithm estimates the real and imaginary parts of ⟨ψ|U 2 j |ψ⟩ for multiple integer values of j.

Figure 2 :
Figure 2: Illustration of the proof of Lemma 1.

Figure 4 :
Figure 4: Illustration of Algorithm 2 and Theorem 5. Compared to Algorithm 1, a short interval is maintained at each level, and thus, the maximum value J is smaller even for the same precision ϵ, leading to a smaller T max .

Figure 5 :
Figure 5: Numerical simulations for the transverse field Ising model using different phase estimation methods.RPE, QCELS, and QPE denote our method, the optimization-based method in [5], and textbook version QPE, respectively.The initial overlap p 0 is chosen to be 0.6 and 0.8.(a) Comparison of the maximal runtime T max .(b) Comparison of the total runtime T total .

Figure 6 :
Figure 6: Numerical simulation for transverse field Ising model with different ξ's.The initial overlap p 0 is chosen to be 0.99.The logarithmic scale is used for both the vertical and the horizontal axes.(a) Comparison of maximal runtime.(b) Comparison of total runtime.