Amplitude Estimation from Quantum Signal Processing

Amplitude estimation algorithms are based on Grover's algorithm: alternating reflections about the input state and the desired outcome. But what if we are given the ability to perform arbitrary rotations, instead of just reflections? In this situation, we find that quantum signal processing lets us estimate the amplitude in a more flexible way. We leverage this technique to give improved and simplified algorithms for many amplitude estimation tasks: we perform non-destructive estimation without any assumptions on the amplitude, develop an algorithm with improved performance in practice, present a new method for unbiased amplitude estimation, and finally give a simpler method for trading quantum circuit depth for more repetitions of short circuits.

matches the performance of [Suzuki&19]. [VO20] gave a relative-error estimation algorithm that is also non-adaptive. Finally, [GKLPZ20] present a family of estimation algorithms with trade-offs between the circuit depth and the total number of samples. Together, these techniques make amplitude estimation simpler, faster, and more versatile.
A separate line of work studies quantum algorithms for approximating partition functions that leverage amplitude estimation as a subroutine. In an effort to improve an algorithm from [Montanaro15], [HW19] modify the [BHMT00] method in order to be 'non-destructive'. In this situation, |ψ is much harder to prepare than to reflect about. Thus, we only want to use one copy of |ψ for the estimation algorithm, and return it undamaged when done. The non-destructive method by [HW19] was later used again in [AHNTW20].
Recent literature has presented many different quantum algorithms, all of which improve over [BHMT00] in some particular way. In this manuscript we build a framework that unifies the underlying quantum subroutine of some of these methods. This allows us to inherit several of these improvements simultaneously, whenever sensible. This 'unification' of amplitude estimation algorithms is achieved through Quantum Signal Processing (QSP).
QSP is a powerful technique for building quantum algorithms, capable of unifying many other existing results [GSLW18,MRTC21]. While existing techniques only consider reflections Z Π and Z ψ around Π and |ψ , we consider arbitrary rotations e iφZ Π and e iφZ ψ . We find that in situations where Grover's algorithm is implemented, it is generally possible to implement these arbitrary rotations at no additional cost. The dynamics of such quantum circuits are well studied [YLC14,LYC16,GSLW18,MRTC21]: we find that careful choice of phases φ lets us implement a family of polynomials P (a) in the amplitude. This yields our first result: Theorem. (Section 1, Theorem 6) Sampling from polynomials. Say we are given access to one copy of |ψ as well as the oracles e iφZ ψ and e iφZ Π . Then it is possible to repeatedly toss coins with Pr[heads] = |P (a)| 2 where P (a) can be freely chosen from a certain family of polynomials. The number of oracle queries per coin toss is O(deg(P )).
We find that all the algorithms mentioned above except for those presented in [BHMT00,HW19] can be formulated as repeated invocations of special cases of this theorem. This already brings many previous results into a single framework. By considering the full range of polynomials, we find this tool for amplitude estimation can improve or simplify many of the existing algorithms.
In Section 2 we consider the setting where the state |ψ is expensive to prepare, but the reflection Z ψ is relatively cheap. This appears in algorithms for estimating observables on ground states such as [LT20], where the manner by which |ψ is prepared in the first place is by constructing a reflection about the ground space Z ψ and then employing amplitude amplification. It also appears in the partition function estimation algorithms in [Montanaro15, HW19,AHNTW20], where a sequence of qsamples of thermal states is constructed through amplitude amplification. Here, we can reflect about any of these states using Szegedy walk unitaries, but re-preparing the states corresponding to low temperatures is expensive. To aid in settings like these, [HW19] gave a method for non-destructive amplitude estimation that restores a copy of |ψ after the algorithm completes. That way, each qsample state can be re-used.
However, we find that the non-destructive estimation algorithm presented in [HW19] makes a subtle implicit assumption: that some κ is known such that κ < a < √ 1 − κ 2 . But since the whole point of amplitude estimation is to determine the value of a, such a bound may not be known beforehand. We resolve this issue by developing a procedure that requires no such bound. Furthermore, our procedure can be combined with any algorithm in the polynomial sampling framework above. However, in the process we incur a polynomial slowdown in the failure probability δ, which only appears polylogarithmically in the overall complexity when a bound κ is known.
Theorem. (Section 2, Theorem 15) State repair. Say we just sampled from several polynomials, and the sum of the degrees of these polynomials is D. Then we can restore the quantum state to a copy of |ψ with success probability ≥ 1 − δ by sampling from one additional polynomial with degree O(δ −1/2 D).
We remark that there was also a non-destructive amplitude estimation algorithm presented in [Rall21], but this algorithm estimates the probability a 2 , not a, which is weaker 1 . The algorithms in [HW19, AHNTW20] rely on methods for mean estimation from [Montanaro15] which in turn rely on estimates of arcsin(a) ∼ a, and it is not clear how to generalize them to rely on a 2 instead.
Parallel work. [CH22] simultaneously presented a different method for achieving nondestructive amplitude estimation algorithm that also improves over that of [HW19].
Next, in Section 3 we ask if the polynomial sampling framework can reduce the constant factors in the number of queries required. Since we want amplitude estimation to be useful in practice, it is essential to study which method can minimize the number of queries, even though the asymptotic performance is already determined to be Θ(ε −1 ) [NW98]. The prior state of the art for amplitude estimation with high empirical performance is Iterative Quantum Amplitude Estimation (IQAE), the algorithm presented by [GGZW19], which, according to our experiments, requires about ≈ 9.93/ε queries to the Z Π oracle to achieve a success probability of ≥ 95%.
IQAE relies on the oracles Z Π and Z ψ , which can be viewed as special cases of the e iφZ Π and e iφZ ψ oracles when φ = π/2. In this case, the polynomial P (a) is a Chebyshev polynomial. While in this sense IQAE is restricted to odd Chebyshev polynomials, the polynomial sampling framework admits Chebyshev polynomials of both even and odd degree. But except for this modest increase in flexibility, we find that it is not particularly useful to deviate from φ = π/2 in order to improve constant-factor performance. This makes intuitive sense: Chebyshev polynomials exhibit the maximum rate of change among polynomials bounded by |P (a)| ≤ 1 for a ∈ [0, 1], and we even rely on this fact in Section 2. Thus, as a rule of thumb, deviating from Chebyshev polynomials should waste resources.
However, we still find that it is possible to improve over the performance of IQAE. IQAE features a 'no-overshooting' condition that prevents the algorithm from taking too many samples from the most expensive polynomials near the end of execution. We find that this no-overshooting condition can be tuned in order to significantly improve performance. We combine this optimization with the ability to estimate the amplitude a directly, rather than the Grover angle θ := arcsin(a), and the ability to sample from both odd and even Chebyshev polynomials, into a new algorithm we call 'ChebAE'.

Empirical Claim. (Section 3, Empirical Claim 18) Chebyshev amplitude estimation 'ChebAE'.
There exists an amplitude estimation algorithm that samples from Chebyshev polynomials and returns an estimate of the amplitude with error ε and success probability ≥ 95% in about ≈ 4.66/ε queries to Z Π .
Depending on the details of the analysis, ChebAE only requires about 45% to 65% of the queries of IQAE depending on the analysis methodology. To support this claim, we performed a comprehensive numerical study where we benchmark the query complexity of the relevant algorithms in the literature.
In Section 4, we study a problem inspired by recent work [LdW21,vACGN22]: unbiased amplitude amplitude estimation. An amplitude estimation algorithm effectively samples from a random variableâ that satisfies |â − a| ≤ ε with high probability. For unbiased estimation, we additionally demand that E[â] ≈ a. This is in contrast to the unbiased phase estimation problem considered by [LdW21,vACGN22], which, given an oracle for e iθ , samples from an unbiased estimatorθ of θ. Their solution to the problem is based on applying a random shift to the angle θ before estimation, and accounting for it afterward. It is not possible to use [BHMT00] to convert unbiased phase estimation to unbiased amplitude estimation since the relationship a = sin θ is not linear. Instead, we invoke Jackson's theorem [Rivlin69] to build a polynomial that guarantees Theorem. (Section 4, Theorem 23) Unbiased amplitude estimation. For any ε, η, δ, there exists an amplitude estimation algorithm that samples from a random variableâ that satisfies Pr[|â − a| ≥ ε] ≤ δ and |E[â] − a| ≤ εη + δ. It has a query complexity of O(ε −1 (η −1 + log(δ −1 ))).
Parallel work. While we were unaware of any applications of this subroutine in other quantum algorithms as we were writing this manuscript, [CH22] were able to improve partition function estimation using an unbiased amplitude estimation algorithm of their own.
Finally, in Section 5 we revisit a problem studied by [GKLPZ20]: trading classical and quantum resources in amplitude estimation. Early quantum computers may be able to perform the oracles e iφZ Π and e iφZ ψ , but may not be able to perform them O(ε −1 ) many times within a single circuit as required by most amplitude estimation algorithms. [GKLPZ20] develop two algorithms that, for any β ∈ [0, 1), have query complexity O(ε −(1+β) ) and query depth O(ε −(1−β) ). When β = 0, this corresponds to the traditional scaling of amplitude estimation, but as β → 1 we approach the complexity of classical probability estimation.
The algorithms presented in [GKLPZ20] are rather complicated: one algorithm relies on maximum likelihood estimation, which functions very well in practice but rigorously proving its accuracy and convergence rate is challenging. The other algorithm pieces together several estimates of multiples of a via the Chinese remainder theorem. We find that there exists a simpler procedure based on careful construction of polynomials. The polynomials approximate a line with slope k near the current best estimate of a, and have degree O(k). By selecting k ∈ O(ε −(1−β) ) and we can achieve depth ε −(1−β) as desired.
The main technical challenge in constructing our polynomial-based method is rigorously proving bounds on the quality of the polynomial approximation. We find that Jackson's theorem is not accurate enough, so we must instead rely on a construction based on the Jacobi-Anger expansion (see [LC17]). However, we find that we can analyze our method with greater rigor than that of [GKLPZ20], since we do not need to implicitly assume that a, β ∈ Θ(1) as they do.
Theorem (Section 5, Theorem 31). Hybrid quantum-classical amplitude estimation. For any ε, δ > 0 and 0 ≤ β < 1, there exists an amplitude estimation algorithm that estimates a to accuracy ε with probability ≥ 1 − δ. It makes at mostÕ((a −1 + ε −(1+β) ) log(δ −1 )) queries, and the circuits have depth at most Our analysis reveals an additional O(a −1 ) dependence, which we suspect is also present for the algorithms presented in [GKLPZ20]. Of course, if we treat a and β as constants then we recover the same complexity.

Sampling from Polynomials
We begin by establishing our general framework for amplitude estimation algorithms. All future sections will be based on the main result of this section, which, informally, is as follows: Say P (a) is a 'Pellian' or a 'semi-Pellian' polynomial in a. Then, there exists a quantum algorithm that samples from a random variable over {0, 1} with bias |P (a)| 2 . This algorithm makes O(deg(P )) oracle queries to |ψ and Π.
The definition of 'Pellian' and 'semi-Pellian', the exact nature of the oracles, as well as the input and output states of this algorithm will be specified formally in this section. In particular, the outline for the rest of the section is as follows. First, we establish some general notation and discuss the implementation of our oracles. Second, we define the restricted family of 'Pellian' polynomials, and give the simpler algorithm for sampling from them. Third, we define the more general family of 'semi-Pellian' polynomials and give a more complicated algorithm for sampling from them. Finally, we put all these results together with some standard notation and show how some existing results fit into this framework.
In the traditional presentation of Grover's algorithm we are given access to reflection oracles 2 |ψ ψ| − I and 2Π − I. Projected down into the two-dimensional subspace, we can write these as: In this work we consider what is possible when we are instead given access to a larger family of oracles e iφZ ψ and e iφZ Π for arbitrary phases φ. These can always be implemented using two queries to controlled-Z ψ or controlled-Z Π , by using a Hadamard test. For example, to implement e iφZ ψ : This lets us implement e iφZ ψ using two queries to controlled-Z ψ . However, we argue that in essentially all practical situations the oracles e iφZ ψ , e iφZ Π do not have twice the complexity of Z ψ , Z Π . Here are some examples: • Say we have a function f : {0, 1} n → {0, 1} and some classical program that computes it. In the usual situation for Grover's search, we apply reflections Z Π where Π = x f (x) |x x| by rendering the classical program into a quantum circuit via Toffoli gates. That circuit produces an output bit |f (x) plus garbage. To build Z Π , we apply Z to the output bit, and then uncompute. But we could also just apply e iφZ to the output bit instead of Z, and obtain e iφZ Π with the same number of gates.
• Say we have a preparation unitary U ψ for an n-qubit state |ψ , satisfying U |0 n = |ψ .
Then, can implement Z 0 n := 2 |0 n 0 n | − I via a some Toffoli gates and a Z gate, and Z ψ via U ψ Z 0 n U † ψ . But notice that implementing e iφZ 0 n requires the same number of Toffoli gates as implement Z 0 n . Therefore the circuit for e iφZ ψ = U e iφZ 0 n U † has the same complexity.
• Say |ψ is the ground state of a Hamiltonian H, and assume the ground space is nondegenerate. Modern algorithms for estimating ground state energies or other observables on the ground state (such as [LT20]) synthesize approximate reflections via singular value transformation: a polynomial p(x) is approximately 1 in the ground space, and −1 elsewhere, so that p(H) ≈ Z ψ . But we can also design polynomials that are e iφ in the ground space and e −iφ elsewhere, and then p(H) ≈ e iφZ ψ . These polynomials have the same degree, so they have the same circuit complexities.
• Quantum algorithms based on Szegedy walk unitaries (such as [HW19, AHNTW20]) prepare a unitary U that has an eigenvalue |ψ with eigenphase 0, and the other eigenphases are at least ∆. To implement Z ψ , they apply phase estimation to U to extract an estimate of the eigenphase with precision ∆ -enough to distinguish 0 from values larger than ∆. Then they apply a phase flip depending on if the estimate is 0, and uncompute. Similar to the Toffoli circuit, we can also simply apply e iφ if the estimate is 0 and e −iφ otherwise, and we have implemented e iφZ ψ with the same circuit complexity.
It seems that in the black-box setting two queries to Z Π are necessary to implement e iφZ Π . But once we 'open the black-box' and actually implement it with a quantum circuit, we find that for every example in the literature that we know of the circuits implementing Z Π and e iφZ Π are essentially the same size. As we shall see, access to these oracles significantly expands our capabilities.
In a fault tolerant setting, we may find that the complexity of executing a circuit may depend less on its overall size and more on the number of non-Clifford gates. In this case, e iφZ may be significantly more expensive than Z, since Z is a Clifford gate and e iφZ is not. The severity of this effect may vary with the level of precision with which phases φ are implemented [CO16]. We leave an investigation of these concerns to future work.

Sampling from Pellian Polynomials
In this section we present an algorithm based on alternating applications of e iφZ ψ and e iφZ Π . The behavior of such alternating rotations is extensively studied, initially by [YLC14,LYC16]. We find that this lets us toss a coin that comes up heads with probability |P (a)| 2 where P belongs to a certain restricted class of polynomials.
Indeed, [MRTC21] presents a derivation of quantum signal processing in terms of amplitude amplification. Our framework can be seen as a synthesis of this observation with the idea of using quantum signal processing to perform Kitaev's phase estimation algorithm on the Grover operator.
[ GSLW18] gives a precise characterization of what these polynomials are that is completely independent of the actual phase rotations φ. This is exceedingly convenient for quantum algorithms, since we no longer need to worry about circuit synthesis and can simply construct polynomial approximations to the functions we want to apply. We just need to make sure the polynomial we construct meets some simple constraints. Actually computing the phase rotations that implement the polynomial is studied by several recent works [Haah18,Chao&20,WDL21].
However, one thing that is lacking from the literature is simple nomenclature for the different families of polynomials. [Chao&20] introduces the 'Low-algebra' and 'Haah-algebra' for different families of implementable matrices, and [MRTC21] introduced the idea of 'Quantum Signal Processing (QSP) Conventions' such as (W x , S z , 0| · |0 )-QSP. These ideas are useful for understanding the inner workings of polynomial synthesis. But if all we want to do is refer to synthesizable polynomials without worrying about how to find the phases, then these terms are still too technical.
Instead, we are motivated by the fact that the Chebyshev polynomials are a family of polynomials that are implementable via quantum signal processing. [Demeyer07] furthermore observes that the Chebyshev polynomials of the first kind T n and of the second kind U n are solutions of the 'Pell equation': Indeed, the Pell equation is satisfied by exactly the polynomials that are directly implementable by quantum signal processing. We thus propose the following nomenclature: form a Pell pair if they have fixed and opposite parity 2 and, for all x ∈ [−1, 1], they together satisfy: The condition is not symmetrical with respect to P (x) and Q(x), and therefore the different classes of polynomials have different names. While the above definition is independent of the existence of the phases, it still does not make it easy to check if a polynomial P (x) is Pellian. This is facilitated by Theorem 3 of [GSLW18], which states that a polynomial P (x) ∈ C[x] is Pellian if and only if it has fixed parity, and: Here, P * refers to complex conjugation of the coefficients of P only.
We have defined the family of polynomials in terms of simple conditions independent of the implementation. Now we move on to the actual algorithm that samples from these polynomials.
The core idea of quantum signal processing is to apply alternating rotations in different bases. In our case, this will correspond to alternatingly applying e iφZ ψ and e iφZ Π . Keeping track of the dynamics when both bases are rotating is very complicated, so in our analysis we find it useful to standardize the bases. We let {|0 , |0 ⊥ } be some other basis for the two-dimensional Grover subspace, and introduce a new unitary V that is used entirely for analysis purposes and never appears in the implementation.
Now, a pair of rotations in different bases becomes: Next, observe what happens when we write out V in the {|0 , |0 ⊥ } and {|Π , |Π ⊥ } bases via some abuse of notation: Although V is not actually a reflection, when viewed in these particular bases the matrix elements look just like those of a reflection matrix. Our goal is to track the complicated motion through the multiple bases of the Grover subspace via just matrix multiplication over C 2 . In that case, the |Π 0| , |Π 0 ⊥ | , ... labels disappear and we are actually just left with a reflection, call it R. Now we can think of V e iφ 1 Z 0 V † e iφ 2 Z Π just as Re iφ 1 Z Re iφ 2 Z . Given access to alternating phase rotations and reflections, we invoke quantum signal processing to implement the desired polynomial. Throughout the paper, products run from left to right: 3 j=1 c j = c 1 c 2 c 3 .

Lemma 2. Implementing Pellian polynomials via phase rotations and reflections.
Say P, Q ∈ C[a] form a Pell pair, and say d := deg(P ). Then there exist phases φ 1 , ..., φ d−1 , and another phase ϕ such that: Proof. This follows from Theorem 3 and Corollary 8 of [GSLW18] via the substitution e i arccos(a)X = ie −i π 4 Z Re −i π 4 Z .
The |ψ and |Π outcomes occur with probability |P (a)| 2 , and the |ψ ⊥ and |Π ⊥ outcomes occur with probability 1 − |P (a)| 2 . The final measurement basis and the number of queries to e iφZ ψ , e iφZ Π are determined by the degree of P and the input state. The details are given by the transition diagrams in Figure 1.
Proof. The transitions in Figure 1 are performed by applying one of the following unitaries, and then measuring in either the {|ψ , |ψ ⊥ } basis or the {|Π , |Π ⊥ } basis by using a Hadamard test on Z ψ or Z Π . Say P, Q form a Pell pair. Then, there exist phases Figure 1: Transition diagrams implemented by the algorithms in Proposition 3. Given a Pellian polynomial P and one of the states |ψ , |ψ ⊥ , |Π , |Π ⊥ , we can implement one of these transitions using some number of queries Q ψ , Q Π to e iφZ ψ , e iφZ Π respectively. φ 1 , ..., φ deg(P )−1 such that: We will later show that these equalities hold -suppose for now that they do. Then, the query complexities Q ψ and Q Π can just be read off from the products on the left hand sides plus the extra query for measurement, and the transition probabilities can be obtained by taking the magnitude squared of the matrix elements on the right hand sides. Recall that |āQ(a)| 2 = 1 − |P (a)| 2 since P, Q form a Pell pair. So all that remains to show is that these equalities are indeed true, that U (a) , U (b) correspond to odd P , and that U (c) , U (d) correspond to even P .
The calculation for all of these unitaries are extremely similar, so we will just prove the equality for U (a) . Recall the analysis sketch above, where we considered another basis |0 , |0 ⊥ for the Grover subspace, and let V := |ψ 0| + |ψ ⊥ 0 ⊥ |. We then wrote Z 0 := |0 0| − |0 ⊥ 0 ⊥ | and e iφZ ψ = V e iφZ 0 V † . Now we can rewrite the U (a) as: Now we expand each of the components V , e iφZ 0 and e iφZ Π in the {|0 , |0 ⊥ } and {|Π , |Π ⊥ } bases: We observe that the basis expansions of the components line up perfectly when plugged into (21). That means that the unitary implemented by the algorithm can be obtained by matrix multiplication over C 2 : Since R = R † this multiplication over C 2 matches the expression in Lemma 2. Recall d := deg(P ).
So we have established: The calculations for U (b) , U (c) , and U (d) proceed similarly: plug in e iφZ ψ = V e iφZ 0 V † , and write out V, e iφZ 0 , and e iφZ Π explicitly in the {|0 , |0 † } and {|Π , |Π ⊥ } bases, while inserting V V † as needed. All the bases will line up, so we can treat this as just regular multiplication of 2x2 matrices. Then invoke Lemma 2.
Observe also that the number of phases that appear in Lemma 2 is deg(P ) − 1. Since each phase corresponds to one call to either e iφZ ψ or e iφZ Π , we see that U (a) , U (b) correspond to odd polynomials and U (c) , U (d) to even polynomials.

Sampling from Semi-Pellian Polynomials
Some of the later sections in this paper require sampling access to a very broad family of polynomials. The polynomials are obtained from techniques that approximate arbitrary functions, and are then post-processed through shifting and scaling. To this end, Pellian polynomials are still rather restricted. The condition that P (ix)P * (ix) > 1 for even P is very difficult to ensure, essentially forcing us to use odd P only. Furthermore, Pellian polynomials must also satisfy |P (±1)| = 1, which is also not always desirable. Thus, we would like access to a more flexible family of polynomials.
Fortunately, there exists an implementable family of real polynomials that is not so restricted.

Definition 4. A polynomial P ∈ R[x] is semi-Pellian if it is the real part of a Pellian polynomial.
Indeed, [GSLW18] not only characterize this family, but also show how to obtain the corresponding Pellian polynomials: Corollary 10 of [GSLW18] states that a polynomial P (x) ∈ R[x] is semi-Pellian if and only if it has fixed parity, and ∀x ∈ [−1, 1] we have |P (x)| ≤ 1.
To sample from semi-Pellian polynomials, we follow a similar technique to Corollary 18 of [GSLW18]. Say P is semi-Pellian, and sayP is a Pellian polynomial with Re(P ) = P . We conditionally apply phase rotations e iφZ or e −iφZ , which implements eitherP orP * . Then we use a linear combination of unitaries to average these together to obtain P .
Linear combinations of unitaries require an additional flag qubit, and depending on the value of the flag we either obtain Re(P ) or iIm(P ). Thus, the bit b that we sample from depends on the joint outcome of the flag qubit and the output register. Independently of the flag qubit, the state found in the output register depends only on |P | just like Proposition 3. The proof of Proposition 5 shows this calculation in detail.
To efficiently implement phase rotations with a conditional sign flip, we require oracles of the form: and similarly for Π. Such an oracle can be implemented with one controlled and one regular query to e iφZ ψ . But, again, we claim that in most practical situations e iφZ ψ ⊕ e −iφZ ψ can be implemented with about the same query complexity as Z ψ . As articulated above, most quantum algorithms take the following form: some computation flips a qubit depending on if a phase should be applied, applies Z to that qubit, and then performs an uncomputation. The cost of the circuit is dominated by the computation and uncomputation. So, if the Z is replaced with e iφZ or even e iφZ ⊕ e −iφZ , the additional cost is negligible compared to the rest of the circuit.
The output basis, |target , as well as the query complexities to e iφZ ψ ⊕ e −iφZ ψ and e iφZ Π ⊕ e −iφZ Π are shown in the following table: Proof. Just as in Proposition 3 there are actually only four algorithms since the algorithm only depends on the input basis and the parity of the degree. We only work though the case when deg(P ) = 2k + 1 and the input is |ψ or |ψ ⊥ , since the other cases follow the same pattern. Let φ 1 , ..., φ 2k and ϕ be the phases from Lemma 2 corresponding toP and its Pell-complementary polynomialQ. Select φ 0 = φ 2k+1 = −ϕ/2 to achieve: Using the same techniques from Proposition 3 we show: Now we apply the trick from Corollary 18 of [GSLW18]: we use a linear combinations of unitaries circuit to perform the average of two circuits A, B, so the top left matrix element becomes P . The circuit involves preparing an ancilla qubit in the |+ state, applying either A or B depending on the ancilla, and then applying a Hadamard gate to the ancilla: This extra register on the left is |flag . We find: If we begin in the state |ψ , then the |0 |Π component of the output state has amplitude Re(P ) = P , so we obtain this case with probability |P | 2 . The |1 |Π component has amplitude iIm(P ), so the total probability of seeing |out = |Π regardless of |flag is |Re(P )| 2 + |iIm(P )| 2 = |P | 2 . A similar pattern holds for the input state |ψ ⊥ . All that remains to show is how to actually implement the unitary A ⊕ B and to count the total query complexity. Using the identity CD ⊕ EF = (C ⊕ E)(D ⊕ F ) we see from (34, 35) that A⊕B factors into a product of a (i 2k+1 ⊕(−i) 2k+1 ) gate, and k +1 many (e iφZ ψ ⊕e −iφZ ψ ) and (e iφZ Π ⊕ e −iφZ Π ) gates each. Finally, we require one more query to (e iφZ Π ⊕ e −iφZ Π ) to measure in the {|Π , |Π ⊥ } basis.

Previous Results in this Framework
Having established Propositions 3 and 5, we can state the full version of the theorem. Theorem 6. Sampling from a polynomial. Say P (a) is a Pellian or a semi-Pellian polynomial. Then, there exists a quantum algorithm that samples from a random variable b P ∈ {0, 1} such that: This algorithm takes a copy of |ψ , |ψ ⊥ , |Π or |Π ⊥ as input, and also outputs one of these four states at random. This algorithm makes We call invoking this theorem on a particular polynomial P to sample from the polynomial P . The algorithm is specifically designed to be used repeatedly. It accepts any of the four states |ψ , |ψ ⊥ , |Π , |Π ⊥ as input, and also only ever outputs one of these. So, regardless of the actual measurement outcomes and regardless of the sequence of polynomials sampled from, we can always chain invocations of Theorem 6 into a very long circuit. This is only useful in situations where an input state |ψ is expensive to prepare. In that case we might want to reuse that state as much as possible, and indeed Theorem 6 can in principle be applied arbitrarily many times given only one copy of |ψ . But we are by no means forced to do this: at any point, we can discard the output of the algorithm and re-prepare |ψ . Furthermore, the schedule of polynomials is completely independent of this choice.
Recall that Chebyshev polynomials of the first kind T n (a) = cos(n arccos(a)) are Pellian, and can be sampled from by setting all the φ j to π/2, in which case e iφ j Z ψ = Z ψ and e iφ j Z Π = Z Π . In this case, U (a) from Proposition 3 simply becomes Grover's algorithm: alternating applications of Z Π and Z ψ , followed by a {|Π , |Π ⊥ }-basis measurement. So we see that several previous results in amplitude estimation can be viewed as invoking a special case of Theorem 6. We will list these momentarily.
To aid in comparing to previous results and also describe our new results in later sections, we set up some standard nomenclature about amplitude estimation algorithms. In particular, we define the random variablesâ, Q Π , Q ψ , d and D. The query complexities Q Π , Q ψ , and degrees d, D may be random since the algorithms are adaptive sometimes and can vary their runtime based on the results of measurement outcomes. Definition 7. An amplitude estimation algorithm is an algorithm that, starting with |ψ , repeatedly samples from polynomials until it finally outputs an estimate of a as well as one of |ψ , |ψ ⊥ , |Π , |Π ⊥ . The estimate is denoted byâ and is a random variable over R. The total number of queries to e iφZ Π , e iφZ ψ or (e iφZ ψ ⊕ e −iφZ ψ ), (e iφZ Π ⊕ e −iφZ Π ) are denoted by the random variables Q Π , Q ψ ∈ Z + respectively. Finally, the highest degree of a polynomial ever sampled from is d ∈ Z + and the sum of the degrees of all the sampled polynomials is D ∈ Z + . D can be regarded as the overall query complexity of the algorithm. If only one state |ψ is used, then D is also the circuit depth. But if the state |ψ is reset every time, then maximum circuit depth is only d, even though the total query complexity is still D.

Proposition 8. Say an amplitude estimation algorithm only ever samples from Pellian polynomials. Then
Now we list some previous results on amplitude estimation using the notation above.

Relative-error estimation. ([AR19], Theorem 3) For any ε, δ ∈ [0, 1/2] there exists an
amplitude estimation algorithm such that: Fast additive-error estimation. ([GGZW19], Theorem 1) For any ε, δ ∈ [0, 1/2] there exists an amplitude estimation algorithm such that: Non-adaptive relative-error approximate counting. ([VO20], Algorithm 1) Suppose a ≥ 1/N for some known N ∈ Z + . For any ε, δ ∈ [0, 1/2] there exists an amplitude estimation algorithm such that all polynomials depend exclusively on ε, δ, N and are thus all known beforehand. It satisfies: Hybrid quantum-classical estimation. ([GKLPZ20], QoPrime Algorithm.) For any ε, δ ∈ [0, 1/2], and for any 0 < β ≤ 1 there exists an amplitude estimation algorithm such that: We focused here on results that proved the correctness of their algorithms, which leaves out [Suzuki&19]. All of the above algorithms, [Suzuki&19] included, repeatedly invoke the special case of Theorem 6 where P (a) = T n (a). The research papers presenting these algorithms describe discarding and re-preparing the input state |ψ every single time. But since Theorem 6 does not require this, these algorithms could all also work with just a single copy of |ψ .
This means our framework has already extended the capabilities of previous works. Although, for the result of [GKLPZ20] in particular, the whole point of separating d of D was to reduce the maximum circuit depth. If a single copy of |ψ were re-used, the maximum depth would be D, not d. So this extension is not always sensible.
We also did not list [BHMT00] and [HW19] because these do not fit into the framework of Theorem 6. Both of these algorithms involve phase estimation of the Grover operator, rather than simply repeated application. However, this complicated and expensive technique has largely been superseded by the faster, simpler, and more versatile algorithms listed above. There is only one capability that has not yet been 'modernized': the non-destructive estimation algorithm from [HW19]. This task is the subject of the next section.

State Repair
In the traditional setting for Grover's algorithm where we are searching for a marked item, the state |ψ is just a uniform superposition over all items. In this case, |ψ is trivial to prepare and to reflect about, at least compared to querying which items are marked.
However, there also exist settings where |ψ is extremely expensive to prepare despite being relatively easy to reflect about. In this situation, one might hope for an amplitude estimation algorithm that could make do with a single copy of |ψ , and ideally also give that copy of |ψ back once an estimate of |Π |ψ | has been extracted. |ψ could then be used for another part of the algorithm. We call this task non-destructive amplitude estimation.
We know of two examples of such situations in the literature. One of them stems from the study of physical systems: say we have a Hamiltonian H with a ground state |ψ 0 . Our goal is to estimate the expectations of several observables ψ 0 | O 1 |ψ 0 , ψ 0 | O 2 |ψ 0 , etc. Suppose for the moment that the ground space is non-degenerate. In that case, following [LT20], we can design a polynomial p(x) that is ≈ 1 in the ground space and ≈ −1 elsewhere, so that p(H) ≈ Z ψ 0 . Assuming the spectral gap of H is not too small, we can now efficiently reflect about |ψ 0 . In order to prepare |ψ 0 however, we have to guess some trial state |φ that hopefully has large overlap | φ|ψ 0 | with |ψ 0 , and use amplitude amplification to transform |φ → |ψ 0 . Since ground state finding is QMA-complete, this step may take exponential time in general. So if it is really worth investing the time to prepare |ψ 0 , we better only need to do so once. The previous section already describes an algorithm that requires only one copy of |ψ 0 to estimate a quantity, but a non-destructive algorithm could be used to extract several expectations from a single state 3 .
Another example is the kind of quantum algorithm studied by [HW19] and [AHNTW20]. These works consider a classical Hamiltonian H(x), and are interested in preparing a qsample of its thermal distribution |ψ β , where x|ψ β ∝ e −βH(x)/2 . We are given reflection operators 2 |ψ β ψ β | − I not just for the target temperature β * but also all intermediate temperatures.
The idea is to find a sequence of temperatures β 0 , β 1 , ...β * called a 'Chebyshev cooling schedule' that guarantees ψ β i |ψ β i+1 is never too small. With β 0 = 0, |ψ β 0 is just the uniform superposition which is easy to prepare. Then we can use amplitude amplification to transform |ψ β 0 → |ψ β 1 → ... → |ψ β * . The remaining challenge is to identify the temperatures β 1 , β 2 , ... such that ψ β i |ψ β i+1 is never too small. If we have a copy of |ψ β i , we can identify β i+1 via binary search, invoking amplitude estimation at each candidate temperature to estimate the overlap ψ β i |ψ β i+1 . This crucially requires amplitude estimation to be non-destructive, since the state |ψ β i was expensive to prepare and needs to be re-used for several estimations and finally transformed into |ψ β i+1 .
To this end, [HW19] developed a non-destructive amplitude estimation algorithm based on the methods of [BHMT00]. After running [BHMT00] to obtain an estimate, we obtain an eigenstate of the Grover operator: Measuring this state in the {|ψ , |ψ ⊥ } basis, we have 'repaired' a copy |ψ with probability 1/2. If we obtain |ψ ⊥ , we can just repeat the procedure until we succeed.
However, [HW19], featured a hidden assumption. We made this assumption explicit in the above: that some bounds κ < a < √ 1 − κ 2 are known. κ informs the precision of phase estimation on the Grover operator: if precision is not high enough, then we fail to distinguish the eigenvectors |ψ ± , so measuring the output fails to collapse the superposition over them. That means we do not have a copy of |ψ ± after running [BHMT00], so the probability of obtaining |ψ after measuring might be very small 4 . In their analysis, [HW19] implicitly focus on the case where ε < a < 1 − ε, in which case the procedure always works. But if no bounds on a are known, then it is not clear how to use their result to achieve non-destructive estimation.
In this section we present an amplitude estimation algorithm that overcomes this limitation. In Theorem 15 we show that any amplitude estimation algorithm with total degree D can be modified to also output a copy of the input state |ψ at the end with probability ≥ 1 − δ for any δ > 0. This new algorithm has total degree O(δ −1/2 D).
Just like [HW19], the main idea is to add a 'repair step' to the end of another amplitude estimation algorithm that transforms whatever state we have back into |ψ . However, the only piece of information that the repair step needs is the length of the estimation algorithm that was just executed. We emphasize the flexibility of this result: our repair procedure can be appended to any amplitude estimation algorithm in the framework of Theorem 6, which includes several algorithms from previous works as well as the other new algorithms we present in later sections of this work.
We now outline the proof. Since a is unknown, the main idea is to split the analysis into two cases. Let κ be some threshold depending on δ and D, and letκ := √ 1 − κ 2 as usual.
• Say a ≤ κ orκ ≤ a. Then the estimation algorithm is unlikely to have damaged |ψ at all.
In other words, we have used the runtime D of the previous procedure to essentially obtain bounds on a. This is possible because we need D to be ∼ a −1 in order to perturb |ψ . We establish this fact first, and then we discuss amplitude amplification. Amplitude estimation algorithms in the framework of Theorem 6 may appear difficult to analyze in such generality because they could sample from so many polynomials P . But we can simplify things via two observations. First, consider sampling from a semi-Pellian polynomial P that is the real part of a Pellian polynomialP . Looking at Proposition 5, the actual transition probabilities between states are determined byP , not P . So we can assume without loss of generality that we only ever sample from Pellian polynomials. Second, it turns out that, among Pellian polynomials, Chebyshev polynomials essentially maximize the probability of damaging |ψ near when a ≤ κ orκ ≤ a. That means that we can assume without loss of generality that the previous algorithm only ever sampled from Chebyshev polynomials.
Our first goal is to use properties of Chebyshev polynomials to determine good choices for bounds on a, that is, to determine κ. To do so, we employ a geometrical argument involving small rotations on the Bloch sphere. Chebyshev polynomials are commonly interpreted in terms of rotation: For |a| ≤ 1, it is a standard fact that T d (a) = cos(d arccos(a)). However, for small a, arccos(a) is a large angle, not a small angle. We would prefer to consider rotations about θ := arcsin(a), which is small whenever a is small. Fortunately we also have the following: In the previous section, we expressed Pellian polynomials in terms of alternating reflections and rotations. However, it is also possible to express them in terms of alternating rotations alone. Rephrasing part of Theorem 3 of [GSLW18], we find that if P, Q ∈ C[a] is a Pell pair, then there exist phases φ 1 , ..., φ d such that: Notice that W = e iX arccos(a) , but again we would like to write things in terms of θ := arcsin(a). This is achieved by adding an extra factor of iX: Now we can express both general Pellian polynomials and Chebyshev polynomials in terms of small rotations θ. This allows us to show that Chebyshev polynomials always bound Pellian polynomials near a ≈ 0 or a ≈ 1.
Lemma 11. Chebyshev polynomials extremize Pellian polynomials. Say P is a Pellian polynomial of degree d, and say a ≤ sin π Similarly, say cos π 2d ≤ a. Then |P (a)| ≥ |T d (a)|, regardless of if d is even or odd. Proof. We begin by plugging W = iXe iθX into (50). Observe that e iφZ X = Xe −iφZ , so we can find new φ j such that: Next, we repeatedly insert e iφZ e −iφZ for various values of φ, and absorb these rotations into the φ j to make φ j such that: Now we make a geometric argument involving the Bloch sphere to bound |P | 2 . We begin with the a ≤ sin π Consider the dynamics of (53) on the Bloch sphere, as depicted in Figure 2. We begin in the |0 state, and repeatedly make rotations e iθX φ around axes X φ := e iφZ Xe −iφZ that are confined to the XY-plane. Finally, |P | 2 is the probability of measuring |0 in the even case, and |1 in the odd case.
Say we are in the odd case, where we want to give an upper bound on |P | 2 . This is the probability of measuring |1 , which corresponds to how close to the south pole of the Bloch sphere we are. Each rotation e iθX φ moves us an arclength of θ, satisfying θ ≤ π 2d . While these could cancel each other out depending on the φ j , they achieve the greatest distance when they all rotate in the same direction, in which case they achieve a total arclength of dθ. Since dθ ≤ π 2 , we can never overshoot the south pole |1 if we do this. So the final state's amplitude on |1 is at most sin(dθ), so |P | 2 ≤ sin 2 (dθ) = |T d (a)| 2 .
In the even case we want to give a lower bound on |P | 2 . This is the probability of measuring |0 , which corresponds to how close to the north pole we are. But a lower bound on |P | 2 can be obtained from an upper bound on 1 − |P | 2 , which measures how close to the south pole we are. But this is exactly the quantity we bounded in the previous paragraph: we have 1 − |P | 2 ≤ sin 2 (dθ). With some trigonometric identities this yields |P | 2 ≥ |T d (a)| 2 .
The cos π 2d ≤ a case is very similar. We first notice thatā ≤ sin π 2d , so we definē θ := arcsin(ā) = arccos(a), and observe that W = e iθX . We repeat the calculation without needing to propagate forward an iX, and arrive at the existence of someφ j such that: So, regardless of the parity of d, we arrive in a similar situation: we start at the north pole of the Bloch sphere with |0 , apply d small rotations of angleθ ≤ π 2d around various axes Xφ k . Afterwards, |P | 2 is the probability that we measure |0 , or equivalently 1 − |P | 2 is the probability that we measure |1 . As we argued before we have 1 − |P | 2 ≤ sin 2 (dθ), so we have Figure 2: Visualization of the proof of Lemma 11. In particular, consider (53). The quantum state starts at |0 , and is then acted on by several rotations e iθX φ around axes X φ := e iφZ Xe −iφZ . The arclength is always θ, and the axes are confined to the XY-plane. We see that the furthest angle we could achieve after d many such rotations is dθ.
We have demonstrated that for sufficiently extreme a (that is, a ≤ κ orκ ≤ a), we can consider Chebyshev polynomials without loss of generality. We want to argue that for extreme a, the probability that we damage |ψ is very small. However, recalling Figure 1, sampling from an odd polynomial essentially forces us to obtain a copy of |Π or |Π ⊥ , therefore unavoidably damaging |ψ . We argue that this is fine, since for extreme a, one of these is always very close to |ψ and the other is very far from |ψ . In particular, if a ≤ κ, then as we sample from odd polynomials we will just bounce back and forth between |ψ and |Π ⊥ with high probability. Figure 3 summarizes the two cases.
Proposition 12. For extreme a, we are unlikely to damage |ψ . Say we just ran an amplitude estimation algorithm such that the sum of the degrees of all the sampled polynomials is D. Then, the following holds for any δ > 0: • If a ≤ sin( √ δ/D), then the algorithm returns a copy of |ψ or |Π ⊥ with probability ≥ 1 − δ.
Proof. For the purposes of transitions between states {|ψ , |ψ ⊥ , |Π , |Π ⊥ } we can without loss of generality assume that the algorithm sampled from Pellian polynomials only. This is because sampling from a semi-Pellian polynomial P entails finding a Pellian polynomialP  such that Re(P ) = P , and the transitions between states are determined byP in exactly the same way (see Proposition 5).
While in general D is a random variable, the randomness does not really matter for this proof so we drop the boldface and just write D. We begin with the a ≤ sin( √ δ/D) case. Looking at Figure 1, we will show that with probability ≥ 1 − δ, we only see the transitions |ψ → |Π ⊥ , |Π ⊥ → |ψ , |ψ → |ψ or |Π ⊥ → |Π ⊥ . That way, we can only finish with |ψ or |Π ⊥ .
Say we sampled from m polynomials total, and the j'th polynomial had degree d j . That way m j=1 d j = D. If the j'th polynomial P j is odd, we must show that the |ψ → |Π and |Π ⊥ → |ψ ⊥ transitions are unlikely. These each occur with probability |P j | 2 . Since arcsin(a) ≤ √ δ/D, we can invoke Lemma 11 to obtain |P j (a)| ≤ |T d j (a)|: If the j'th polynomial P j has even degree d j , then we must show that the |ψ → |ψ ⊥ and |Π ⊥ → |Π transitions are unlikely. These occur with probability 1 − |P j | 2 . Lemma 11 gives us |P j (a)| ≥ |T d j (a)|.
Finally, we observe that j d 2 j ≤ j,k d j d k = D 2 to complete the proof with a union bound: Now we consider a ≥ cos( √ δ/D), which implies arccos(a) ≤ √ δ/D. In this case we want to demonstrate that we only ever see the transitions |ψ → |Π , |Π → |ψ , |ψ → |ψ or |Π → |Π , so we finish only with |ψ or with |Π . Since a ≥ cos( √ δ/D), Lemma 11 gives us |P j (a)| ≥ |T d j (a)| regardless of d j .
If d j is odd, then we want to show that the |ψ → |Π ⊥ and |Π → |ψ ⊥ transitions are unlikely. These each occur with probability 1 − |P j | 2 , so: Similarly, if d j is even then we want the |ψ → |ψ ⊥ and |Π → |Π ⊥ transitions to be unlikely. These occur with probability 1 − |P j | 2 , so: The same union bound argument shows that we never create |ψ ⊥ , |Π with probability ≥ 1 − δ.
This completes the first part of the argument: if a is sufficiently close to 0 or 1 then it is probably not necessary to repair the state at all, since we either already have |ψ or something very close to it. Next, we simply turn this argument on its head: say we do not have |ψ or something close to it. Then a is probably far from 0 or 1, meaning it is easy to repair using amplitude amplification.
We just ran an amplitude estimation algorithm, so we probably have a decent estimate of a. This means we could just use that estimate to inform a very standard amplitude estimation protocol based on Grover rotations only. However, part of the strength of this section's state repair algorithm is that it can be appended to any amplitude estimation algorithm, including those that give pretty weak guarantees on the quality of the final estimation. So, we will stick to our polynomial sampling framework and construct an algorithm for fixed-point amplitude estimation within it. We can construct Pellian polynomials that, when sampled from, will essentially force the desired transitions in Figure 1b, reprinted for convenience in Figure 3c. It turns out that this method is extremely similar to [YLC14].
We move on to the construction of the Pellian polynomial K(x) satisfying |K(x)| 2 ≥ 1 − η for |x| ≥ κ, using J(x) as a starting point. Indeed, the desired properties follow from: We obtain K from the Pell-complementarity characterization of Theorem 3 from [GSLW18]: we show that there is an even polynomial E(x) such that J(x) =xE(x), and that E(x) satisfies the conditions in the lemma. J(x) is an odd polynomial inx, so J(x)/x is an even polynomial inx. Even polynomials inx are really polynomials inx 2 = 1 − x 2 , so E(x) := J(x)/x is an even polynomial in x as well. By construction J(x) =xE(x).
The first propertyx|E(x)| ≤ 1 for |x| ≤ 1 is guaranteed by the fact that |J(x)| ≤ 1 for |x| ≤ 1. Since E(x) is even, we also need to show the second property that (1 + x 2 )E(ix)E * (ix) ≥ 1 for all real x. Since the coefficients of J(x) are real, the coefficients of E(x) are also real, so Since √ 1 + x 2 ≥ 1 for all real x, and J(x) ≥ 1 for all positive x, we have shown the second condition for E. Thus, by Theorem 3 of [GSLW18] E is Pell-complementary, so there exists an odd polynomial K(x) such that K and E form a Pell pair. We have |K( These polynomials immediately yield an algorithm for amplitude amplification. At this point we finally identify κ := (4/5)· √ δ/D in order to ensure κ ≤ sin( √ δ/D) and cos( √ δ/D) ≤ κ.
Similarly, suppose sin( √ δ/D) < a, and say we have a copy of |Π . Then, we can prepare |ψ with success probability ≥ 1−η by sampling from another polynomial with the same degree.
Proof. Figure 3c shows the transition probabilities when sampling from an odd polynomial starting with |Π or |Π ⊥ .
Finally, we combine Propositions 12 and 14 to prove the main result of this section. The main challenge of this algorithm is that we do not actually know the value of a, and moreover never obtain conclusive evidence that sin( √ δ/D) < a or a < cos( √ δ/D). All we have is the output state of the previous amplitude estimation algorithm and its total degree D. Given this information we essentially guess that one of the bounds on a hold.
Proof. The new algorithm is just the old algorithm plus an extra repair step. We first run the old algorithm, which gives our final estimateâ in some total degree D, as well as one of |ψ , |ψ ⊥ , |Π , |Π ⊥ . Then: • If we have |ψ , then there is nothing to do, and we successfully return |ψ .
• If we have |ψ ⊥ , then we measure in the |Π , |Π ⊥ basis (i.e., sample from P (a) = a) and proceed according to the next two cases.
• If we have |Π ⊥ , then we guess that a ≤ sin( √ δ/D) and sample from the polynomial J(x) from Proposition 14. If we obtain |ψ , then we have succeeded and we are done. If we obtain |ψ ⊥ , then we have failed and we give up.
• Finally, if we have |Π , then we guess that a ≥ cos( √ δ/D) and sample from the polynomial K(x) from Proposition 14. Similarly, we return |ψ if we obtain it, and otherwise we give up.
We want to show that the probability that we fail and give up is at most µ. We split the analysis into three cases depending on a, although we do not need know know which case we are in in order to run the procedure above.
Say a ≤ sin( √ δ/D). Then, by Proposition 12, we have a copy of |ψ or |Π ⊥ with probability at least 1 − δ. If we have |ψ we are done, and if we have |Π ⊥ then we sample from J(a) which returns a copy of |ψ with probability ≥ 1 − η. So overall we fail with probability at most δ + η = µ.
In practice, it may be advantageous to actually use the information on a obtained from the previous amplitude estimation algorithm in order to reduce the overall complexity. In theory, however, we have demonstrated that this is not really necessary and that state repair is always possible regardless of the actual value of a.
We conclude the section with some additional remarks about this result.

Remark 16. Always returning |ψ , even if it takes forever. The modified algorithm from Theorem 15 is a Monte Carlo algorithm: it only adds at most a fixed number of additional queries, but does not always succeed at preparing |ψ . But it can easily be transformed into a Las
Vegas algorithm: First, observe that when a = 0 or a = 1, then sampling from a polynomial can never produce anything other than |ψ . Otherwise, if 0 < a < 1 then we consider the following protocol. If we have |ψ ⊥ then measure in the |Π , |Π ⊥ basis. If we have |Π or |Π ⊥ , measure in the |ψ , |ψ ⊥ basis. Since all measurement outcomes have nonzero probability, this protocol must reach |ψ eventually. Now we have Las Vegas algorithm that will eventually produce |ψ with certainty, but can only inherit the performance guarantees of the original algorithm with probability ≥ 1 − µ.
Remark 17. State repair via sampling given bounds on a. Say we are in the same situation as [HW19], where we are given a bound κ < a <κ. Then we can also achieve state repair with only O(κ −1 log(δ −1 )) queries by sampling from another polynomial. To do so, we follow Theorem 15, but instead of using the bounds sin( √ δ/D) < a < cos( √ δ/D) to determine κ, we just use the κ we were given.

Improved Performance
The asymptotic performance of amplitude estimation has been known for some time. However, for those interested in forecasting the resource requirements of quantum algorithms built on amplitude estimation, constant-factor improvements in query complexity could make a large difference in the time-scale on which certain quantum applications may be practically realizable. To this end, many competing implementations of amplitude estimation have emerged in the last few years with differing empirical query complexity.
The first recent attempt at improving the constant-factor performance of amplitude estimation was [Suzuki&19], which describes an algorithm called Maximum Likelihood Amplitude Estimation (MLAE). There is sound empirical evidence that this algorithm significantly improves over the performance of [BHMT00] and numerics indicate that it achieves O(1/ε) scaling. However, there is no rigorous proof that this algorithm always delivers an accurate answer and the desired performance.
Later, [GGZW19] gave an algorithm called Iterative Quantum Amplitude Estimation (IQAE) that seems to have the best of both worlds. Its empirical performance also outperforms [BHMT00] and seems comparable to that of MLAE, but it also has a rigorous proof of correctness. The numerics supporting the speedup have been independently reproduced [YLRJ20]. The proofs in [GGZW19] even offer bounds on the constant-factor performance, but these bounds are not strong enough to rigorously prove outperforming [BHMT00]. As the desired precision grows, the likelihood-landscape of which MLAE needs to find the maximum becomes more and more complicated, requiring more and more classical resources to analyze. IQAE also avoids this problem with a simpler classical component to the algorithm.
Are there any amplitude estimation algorithms that improve over the constant-factor performance of IQAE while simultaneously featuring a rigorous proof? [Nakaji20] claims to present a modified version of IQAE with better query complexity, but the experiments in the manuscript do not support this claim 5 . Further, [Zhao&22] gave an algorithm where the classical processing is significantly faster then IQAE, but the query complexity is about the same.
In this section we present a modified version of IQAE called 'ChebAE' that we empirically demonstrate to require only about 50% to 60% of the queries of IQAE. Since it is a fairly simple modification, it inherits the proof of correctness presented in [GGZW19]. ChebAE was discovered in an attempt to improve the performance of IQAE using the polynomial sampling framework in Theorem 6. Recall that IQAE is a special case of such algorithms where only odd-degree Chebyshev polynomials are considered. Does access to a larger family of polynomials allow us to improve the query complexity in terms of constant factors? We were not able to devise an algorithm that answers this question in the affirmative because it seems that the Chebyshev polynomials deliver the best performance. However, in our efforts to perform hyperparameter optimization, we discovered a faster algorithm that is based on Chebyshev polynomials only. The algorithm also has the property that it requires a smaller query complexity when a ≈ 1.
We now give a brief description of IQAE in terms of the polynomial sampling framework in order to point out what optimization was made. IQAE is inspired by [AR19] in that it gradually improves a confidence interval [a min , a max ] that contains a with high probability. The method for improving this confidence interval is sketched in Figure 5. First, we find the largest odd integer d such that |T d (a)| 2 is invertible on [a min , a max ]. Then, we sample from T d in order to obtain a confidence interval [p min , p max ] containing |T d (a)| 2 with high probability. For this purpose we can rely on the Clopper-Pearson method [CP34] which is based on tight bounds on the binomial distribution. Then, since |T d (a)| 2 is invertible, we can compute a new interval [a * min , a * max ] where (if |T d (a)| 2 is increasing on the interval): |T d (a * min )| 2 = p min and |T d (a * max )| 2 = p max (67) If |T d (a)| 2 is decreasing on the interval, then we just swap p min and p max in the equations above. We repeat this process until the confidence interval is as small as desired, or until a better Chebyshev polynomial can be found. In order to guarantee a bound δ on the failure probability, we anticipate the total number of confidence intervals [p min , p max ] required and divide the failure probability evenly among them. If we demand that the degree d increases by at least a factor of r at every iteration, then the confidence interval must also shrink by a factor of r. Thus, at most log r (1/ε) iterations are required. With this method, the final asymptotic complexity is O(ε −1 log(δ −1 log(ε −1 ))).
[AR19] gave a method to remove this extra log(log(ε −1 )) factor, but this method blows up the constant factor on the query complexity.
How many samples from T d should we take at every step? We need to take enough samples in order to shrink the confidence interval by a factor of r. But since the shape of |T d (a)| 2 on [a min , a max ] is non-linear and the width of the Clopper-Pearson confidence interval depends on the value of |T d (a)| 2 itself, this exact number is hard to anticipate. The best method would be to take samples from T d one at a time until the confidence interval is small enough so that a larger d can be found. But [GGZW19] found that this is prohibitively slow in terms of classical performance. Observing that the query complexity is dominated by the final iterations of the algorithm, they introduced a parameter N shots = 100 -the number of samples for the 'early' iterations. Once we get to the 'later' iterations we take fewer samples at a time.
vation. This is where our optimization comes in: how do we decide what iterations are 'early' and what iterations are 'late'? IQAE leverages a heuristic based on the maximum error of any Clopper-Pearson confidence interval as well as the assumption that |T d (a)| 2 is linear on the interval. We found that the cutoff from this heuristic is too late, causing IQAE to take more samples than needed. Instead, we introduce a hyperparameter ν that determines when we switch from sampling N shots to 1. When we set ν = 8, then ChebAE needs only about 50% to 60% of the queries of IQAE.
Having established context, we briefly digress to give intuition as to why Chebyshev polynomials appear to deliver better performance over the broader class of Pellian polynomials. Optimizing Pellian polynomials introduces several computational overheads relative to using only Chebyshev polynomials. Classical evaluation of a Pellian polynomial takes O(d) time where Chebyshev polynomials can be evaluated in constant time. This can have a large impact on the efficiency of choosing a new polynomial once the confidence interval has been updated. Additionally, checking whether a Chebyshev polynomial is invertible can be done exactly, but for Pellian polynomials one must check for extrema within the target interval via binary search. Failure to detect an extrema within the target interval results in an increased failure probability. Finally, the optimization of QSP angles is a highly non-convex problem which must be tuned carefully. Given that nearly optimal Chebyshev polynomials can be found inexpensively, the potential benefit of using Pellian polynomials is overshadowed by the associated overhead and reduced algorithmic stability.
ChebAE does add a minor additional flexibility over IQAE: rather than considering only odd-degree Chebyshev polynomials, ChebAE considers Chebyshev polynomials of any degree. But we do not find that this improves performance much. Nonetheless, we find that phrasing the algorithm in terms of Chebyshev polynomials significantly declutters the algorithm compared to IQAE. We also find that our implementation of ChebAE is faster to study classically than the implementation of IQAE that was kindly provided by the authors of [GGZW19]. Our numerical experiments of IQAE were based on the same implementation as the one used for [GGZW19]. Although the modification of IQAE underlying ChebAE is rather slight, we believe that the ChebAE's simpler presentation as well as significant performance improvement merit a full presentation of the algorithm here.

Empirical Claim 18.
There is an amplitude estimation algorithm 'ChebAE' that for any ε, δ > 0 samples from a random variableâ satisfying Pr[|â − a| ≥ ε] ≤ δ. For a = 0.5, δ = 0.05, and ε ∈ [10 −3 , 10 −6 ], the observed average query complexity Q Π satisfies 6 : Proof. The ChebAE algorithm relies on a subroutine find next cheb which we present after the description of the main algorithm. It also has three hyperparameters r, N shots , ν: r is the factor by which we grow the degree of the Chebyshev polynomial at each iteration, N shots is the number of samples from T d we take at each iteration in the 'early' phase of the algorithm, and ν determines when we switch to the 'late' phase of the algorithm. We find the best values of these parameters are: (b) If the following condition holds then we are 'late', otherwise we are 'early'. The purpose of the find next cheb(a min , a max ) subroutine is to find the highest d such that |T d (a)| 2 is invertible on the interval [a min , a max ]. The method is based on the identity T d (a) = cos(d arccos(a)).

Return d.
The proof of correctness is identical to that of IQAE from [GGZW19]: since we grow the degree by a factor of r with every confidence interval, we will need at most T many confidence intervals. Since each interval contains T d (a) with probability ≥ 1 − δ/T , the union bound implies that the algorithm fails with probability at most δ overall. By the same analysis, the asymptotic query complexity is O(ε −1 log(δ −1 log(ε −1 ))).
The data supporting the empirical claim is presented in Figure 6. We took 1000 many runs of the above procedure for each of 9 logarithmically spaced ε in the interval ε ∈ [10 −2 , 10 −6 ] with δ = 0.05 and a = 0.5. We found that for each ε the fraction of runs where |a −â| > ε was indeed always < δ.
We took the mean of the sampled query complexities Q Π for each value of ε, call it Q Π , and fitted the function f A,B (ε) := Aε −1 ln(B ln(ε −1 )). Using brute force, we found the parameters (A, B) = (1.71, 2.08) ensured that the model f A,B (ε) was always within a 3.15% relative error of Q Π . The same parameters ensure that f A,B (ε) are within 71.77% of all values of Q Π . We also fitted a simpler model f C (ε) := C/ε to Q Π , and found that the best parameter choice C = 4.66 approximates both Q Π to within 19.65% and Q Π to within 74.28%.
The software that performed this analysis is available at: https://github.com/qiskit-community/ChebAE/blob/main/chebae.ipynb A key question in this data analysis is if we should consider in our fits the extra log(log(ε −1 )) factor, which is present in the asymptotic performance. This is complicated by the fact that Q Π is a random variable, and visibly exhibits significant variation about the mean. The f A,B fit function captures the empirical mean Q Π of our data rather accurately, suggesting that the log(log(ε −1 )) does affect the empirical query complexity. But the most extreme values of Q Π are only within ∼ 70% of the f A,B model. However, the simpler f C model does not capture Q Π or Q Π any better. This suggests that the large error of ∼ 70% does not stem from poor fit quality but from actual variation within the random variable Q Π .
Having presented the algorithm, we seek to demonstrate three things. First, we want to argue a significant improvement in query complexity relative to prior art. Second, we want to demonstrate that it was indeed the 'late' vs 'early' heuristic that resulted in the improvement over IQAE. Third, we show that ChebAE's requires fewer queries when a ≈ 1.
To compare to the prior art, we characterize the constant-factor query complexity performance of some of the previous algorithms. We consider [BHMT00] for reference, and also look at MLAE and IQAE since these two seem to have similar performance according to prior literature [GGZW19,YLRJ20]. Fortunately, the amplitude estimation method from [BHMT00] has a closed-form expression for its query complexity. Proposition 19. [BHMT00] For any ε, δ > 0, there is an amplitude estimation algorithm (not in the form of Definition 7) that outputs an estimateâ satisfying Pr[|â − a| ≤ ε] > 1 − δ with deterministic query complexity Q Π as follows: At δ = 0.05 this is about Q Π ≈ 1% 50/ε for ε < 10 −2 .
For MLAE and IQAE, we perform similar numerical experiments as in Empirical Claim 18.
Empirical Claim 20. For a = 0.5, δ = 0.05 and ε ∈ [10 −3 , 10 −6 ], the IQAE algorithm from [GGZW19] satisfies: Proof. For IQAE we took 1000 runs with N shots = 100, r = 2 for the same 9 logarithmically spaced ε in the interval ε ∈ [10 −3 , 10 −6 ] with δ = 0.05 and a = 0.5. The f A,B model with (A, B) = (2.62, 6.61) captured the mean with Q Π to within 7.27%, and captured all values with Q Π to within 57.62%. The f C (ε) model fitted to Q Π with C = 9.93 achieved Q Π ≈ 16.41% f C and Q Π ≈ 72.31% f C . MLAE does not take ε as an input parameter. Instead, following [GGZW19], we take 100 samples from T 2k+1 with k = 0, 2 0 , 2 1 , ..., 2 k and compute a confidence interval based on likelihood ratio method. The reported ε is the half-width of this interval. In some sense, this gives MLAE an unfair advantage: IQAE and ChebAE also produce a confidence interval, and that confidence interval's half-width may be smaller than the requested ε. IQAE and ChebAE do not report that smaller potentially smaller half-width, while MLAE does.
Either way, we perform 100 runs with k = {1, . . . , 11} with δ = 0.05 and a = 0.5. Interestingly, we observed that Q Π hardly deviated from its mean The results of these experiments are also shown in Figure 6, and the fraction of incorrect estimates was < δ for all ε (IQAE) and for all k (MLAE).
Again we are in a situation where the performance can be measured differently depending on what model is chosen. Just like ChebAE, for IQAE the f A,B model captures Q Π more accurately than f C . This again suggests that the log(log(ε −1 )) dependence is empirically visible. The accuracy of the f A,B and f C models for MLAE are about the same. Now we justify the claim that ChebAE only needs 45% to 65% of the queries of IQAE. Any such claim will be severely reductive: both of the query complexities are random, depend on the amplitude a being estimated, and the fit methodology also affects the claim. Certainly from Figure 6 we can see that ChebAE exhibits a clear speedup over IQAE, with the worst ChebAE Q Π being about the same as the best IQAE Q Π . The speedup is visible despite the enormous variation of Q Π around Q Π . To quantify the reduction in query complexiy we will focus only on the average Q Π and rely on the fits we performed. Both ChebAE and IQAE are better explained by the f A,B model. The fact that we need two parameters to properly fit ChebAE and IQAE makes a comparison more complicated, but ChebAE improves over IQAE in both parameters A, B. We can thus be generous to IQAE and ignore the B parameter, and say the fraction is about 1.71/2.62 ≈ 65%. If we use the f C model then the fraction is 4.66/9.93 ≈ 45%. By this metric, IQAE only needs about 9.93/50 ≈ 20% of the queries of [BHMT00].
It remains to justify that it was the altered 'late' vs 'early' heuristic that was the primary source of the speedup. ChebAE is extremely similar to IQAE, and the differences can be summarized with three alterations: First, rather than obtain an estimate of the Grover angle θ := arcsin(a), ChebAE estimates a directly. Second, ChebAE considers both odd and even Chebyshev polynomials whereas IQAE only considers odd polynomials. Third, the 'late' vs 'early' heuristic in step 4(b) in Empirical Claim 18 has a different form and also has the ν optimization parameter.
We compare these changes individually in Figure 7. Here we observe that (for a = 0.5) modifying IQAE to estimate a instead of θ significantly worsens the query complexity on average, and increases the amount of random variation. We also observe that when ChebAE is restricted to consider only odd polynomials just like IQAE does, then the performance hardly changes. When switching to the new heuristic in step 4(b), the performance is slightly improved even when ν is small. Then, the query complexity decreases with increasing ν until even the worst ChebAE runs are faster than the best IQAE runs. So clearly it was the altered heuristic and the tuning of ν that caused the improvement. The behavior of ν can be explained as follows: as ν increases the heuristic switches from 'early' to 'late' for earlier iterations, thus decreasing the chances of taking more samples from the expensive Chebyshev polynomials near the end. The improvement with ν exhibits diminishing returns, and worsens the classical performance of algorithm analysis. We find little benefit in increasing ν beyond ν = 8, and the classical analysis time is still faster than that of IQAE.
Finally, we remark on the performance as a function of a. In Proposition 19 we claimed that the accuracy [BHMT00] is independent of a. However, Theorem 12 in [BHMT00] actually has an extra dependence on a. Both IQAE and [BHMT00] estimate the Grover angle θ := arcsin(a), and then convert the estimate of θ to an estimate of a. When a ≈ 1, a less accurate estimate of θ suffices for the same accuracy on a, if only some bounds on a were known beforehand. We consider a setting where nothing is known about a before running the algorithm. Furthermore, we request that the algorithms return an estimate of a certain accuracy ε, and then do not reward the algorithms for achieving a higher accuracy than what was requested. In this more realistic setting for estimation, IQAE and [BHMT00] must assume the worst-case value of a to determine their query complexity and are then penalized for overshooting their accuracy. Consequently, their query complexity as a function of a is constant.
The experiments presented in Figure 8 shows that the query complexity of ChebAE delivers smaller query complexities when a is closer to 1. This stems from the fact that ChebAE does not indirectly estimate a via the Grover angle θ, and instead maintains a confidence interval on a directly. By estimating a directly, ChebAE can take advantage of the fact that Chebyshev polynomials have larger slope when a is closer to 1, and thereby exhibit improved performance for these amplitudes even if nothing is known about a beforehand. The algorithm terminates as soon as the estimate of a is good enough, so it does not overshoot and obtain a higher estimate than necessary. Note that for the numerical experiments supporting our claim that ChebAE outperforms IQAE more generally, we deliberately chose a = 0.5 because that way we compare IQAE to the slowest possible version of ChebAE.

Unbiased Estimation
In this section we present a simple new algorithm for amplitude estimation based on sampling from semi-Pellian polynomials. This algorithm has a new capability: it samples from a random variableâ satisfying E[â] ≈ a. We call this task unbiased amplitude estimation.
This is the first algorithm achieving unbiased amplitude estimation in the literature to our knowledge. On the other hand, unbiased phase estimation has appeared in [LdW21,vACGN22,CH22]. The main idea is that if U has eigenphase e iφ , we can prepare U e iθ with eigenphase e i(φ+θ) for a random shift θ. After applying phase estimation to U e iθ and subtracting off the random shift yields an unbiased estimator for φ.
However, one cannot use [BHMT00] to directly translate the technique of [LdW21] to amplitude estimation. [BHMT00] directly estimates θ := arcsin(a) via phase estimation. Say we had an unbiased estimatorθ of θ. Then, sin(θ) is not an unbiased estimator of a. A different approach is needed.
Instead, we rely on the following strategy. Say we have established a confidence interval [a min , a max ] within which a is contained with high probability. Suppose furthermore that a max − a min < ε, so any value in the interval is within ε of a. We construct a polynomial P (a) such that: Observe that |P (a)| 2 approximates a line with |P (a min )| 2 ≈ 0 and |P (a max )| 2 ≈ 1. So, we let our final output be:â recalling that Pr[b = 1] = |P (a)| 2 from Theorem 6. Then E[â] = |P (a)| 2 a max + (1 − |P (a)| 2 )a min ≈ a, so the estimator is approximately unbiased. But since we also havê a ∈ [a min , a max ], we also have a separate guarantee thatâ is within ε of a.
Sampling from the polynomial P (a) immediately solves the problem once we have obtained the confidence interval [a min , a max ]. But it turns out that the same polynomial can also be used to refine such a confidence interval, that is, shrinking it by a factor of 0.9 while keeping a in the interval with high probability. This immediately yields a full algorithm for amplitude estimation that is rather similar to [AR19, GGZW19], but dramatically simpler than either of them.
Thus, the main technical work of this section is to find a polynomial P (a) with the desired properties. To this end, we use a standard result form the theory of function approximation using polynomials.
Theorem 21. Jackson's Theorem. [Rivlin69] For any continuous function g(x) on the interval [−1, 1] there exists a polynomial P (x) of degree at most d so that for all x ∈ [−1, 1]: where ω g (δ) is the modulus of continuity of g(x) given by: Essentially, we can approximate any continuous function g(x) with a polynomial to error ε. The degree will be on the order of the maximum slope of g times ε −1 . One key complication remains: we want |P (a)| 2 to approximate a line, not P (a). This means we need to account for the Born rule in our choice of g(x), which makes the error analysis a bit more involved.
Proof. Let c be some constant that we will pick later in the proof, and let g(x) be defined on the interval [−1, 1] as Observe that g(x) is even, continuous, and that ω g (1/d) ≤ c ∆d . Following Theorem 21, let J(x) be a degree d polynomial satisfying |J(x) − g(x)| ≤ 6ω g (1/d). We will also pick d later.
We would like to ensure that our final polynomial P η (x) is even, even though J(x) might not be. Actually, under the hood, Jackson's theorem relies on a Chebyshev expansion of g(x). Since g(x) is even, all the odd components of the expansion will vanish, so J(x) will be even anyway. But we do not need to rely on this fact to proceed: we simply let P η (x) be the even part of J(x). Then: P η (a) is even, so it has fixed parity. If we can also show that |P η (x)| ≤ 1 then by Corollary 10 of [GSLW18] we have demonstrated that P η (x) is semi-Pellian.
It remains to show that |P η (x)| 2 approximates x−a min ∆ on the interval [a min , a max ], and to select d. Letc := 1 − c and apply the triangle inequality: So, we achieve 1 ∆d ≤ η when d := η −1 ∆ −1 . This completes the proof. Now that we have a construction for P η (a), we can present the algorithm for unbiased estimation. The main idea is to take an interval [a min , a max ] starting at [0, 1], and to refine it until ∆ := a min − a max satisfies ∆ < ε. Then we apply the trick we described at the beginning of the section to sample aâ satisfying both E[â] ≈ a and Pr[|â − a| ≥ ε] ≤ δ.
Before we step into the main proof, we take note of three things: • Since we always shrink ∆ by the same factor, it is possible to anticipate how many iterations we need: ∼ log(ε −1 ) many. If we grow our failure probability at each iteration δ t via a geometric series, we can avoid an extra log(ε −1 ) factor in the query complexity. This analysis is taken directly from [AR19].
• For shrinking the interval, we actually can get away with a constant accuracy η = 0.1 in our polynomial approximation. It is only the final step where we sampleâ that requires high accuracy.
• Since any random variable bounded in the interval [a min , a max ] must also have its expectation in that interval, we automatically get |E[â] − a| ≤ ε. So the main capability of the algorithm is to make expected value even closer to a. This additional accuracy η enters the runtime as η −1 , stemming from the accuracy of Jackson's theorem. Is there a polynomial approximation achieves polylogarithmic scaling in η −1 ?
We first establish that Pr[|â − a| ≥ ε] ≤ δ. This is guaranteed if a ∈ [a min + 0.9∆ t and S (t) ≤ m t /2. We show that both of these cases occur with probability at most δ t .
Say a ≤ a We have: Using the Chernoff-Hoeffding theorem: Similarly say a ≥ a (t) To finish the argument that Pr[|â − a| ≥ ε] ≤ δ, we follow the analysis of [AR19]. Let b := 1/0.9 ≈ 1.11 and observe that T = log b (ε −1 ) . We bound the probability that any of the steps fail with a union bound: Now we show that D only takes a single value with probability 1. We observe that in Lemma 22, while P max may vary randomly in the algorithm, ∆ t is always exactly 0.9 t . Since the total number of polynomials we sample from is also independent of a (t) min , a (t) max , to total degree of all polynomials D is deterministic. This makes asymptotic claims on D well defined.
At the t'th iteration of step 2, we sample from a polynomial of degree ∈ O(∆ −1 t 0.1 −1 ) = O(0.9 −t ) = O(b t ) a total of m t times. Then, in step 4, we sample from a polynomial of degree So the total degree is: The first term is the desired bound, so all that remains to show is that the second term is dominated by the first term. Indeed, a somewhat cumbersome calculation in equations (38-44) of [AR19] demonstrates that the second term is ≤ O(ε −1 ) (albeit with T + 1 instead of T , which is a stronger result). Finally, we prove the fact that makes this construction unique: that |E[â] − a| ≤ εη + δ. To do so, we interpret the dynamics of the algorithm as a binary decision tree: at each node we can either proceed to the left child where we increase a min or the right child where we decrease a max . At the t'th iteration of step 2 we are at the t'th layer of the tree, and we can think of step 4 as the T 'th layer containing all the leaves. See Figure 9.
We can label each node in the t'th layer of the tree with a bit string x ∈ {0, 1} t , where the bits encode the choices made to get to that node. We say a node x is 'good' if the corresponding interval [a (t) min , a (t) max ] contains a, otherwise we say x is 'bad'. Each node is associated with a random variableâ x , denoting the output of the algorithm assuming we were at the x node at iteration t = dim( x). Correspondingly, at the root of the treeâ equalsâ x where x is the 'empty' bit string with dimension 0.
We prove the statement recursively, starting with the leaves and working our way to the root. For a leaf, t = T so the sum T −1 k=T δ k vanishes. We have: By Lemma 22, we have ||P If xy is good for both y (like x = 01 in Figure 9), then by assumption |E[â xy ] − a| ≤ εη + T k=t+1 δ k . For some probability p we have: Now assume without loss of generality that x0 is bad, and x1 is good (like x = 0 in Figure 9). Now the assumption says nothing about |E[â x0 ]−a|, but we at least know that |E[â x0 ]−a| ≤ 1. Furthermore, we know that the probability p with which we proceed down the x0 path is at most δ t . So:

Hybrid Classical-Quantum Estimation
In the future, we may have access to quantum computers that are capable of running Grover's algorithm but are still limited in the overall circuit depth. Amplitude estimation algorithms may become useful earlier if they can still accelerate estimation to accuracy ε without requiring depth O(1/ε). This can be achieved by mixing classical and quantum resources. In particular, [GKLPZ20] consider the following version of this problem: for any 0 ≤ β < 1 they show that there exists an amplitude estimation algorithm that runs in time O(1/ε 1+β ), but requires a maximum depth of only O(1/ε 1−β ). When β = 0 this reproduces the 'fully quantum' performance of other algorithms, and when β → 1 this algorithm has the same asymptotic complexity as classical probability estimation.
We consider the construction of such hybrid algorithms a fascinating theoretical problem. [GKLPZ20] give two methods for achieving this: a 'Power law' algorithm relying on maximum likelihood estimation, and a 'QoPrime' algorithm relying on the Chinese remainder theorem. In this section we give another method for solving this problem based on quantum signal processing. Depending on the reader's background this new method may be more intuitive.
However, we remark that we consider it questionable if any of these methods are actually useful in practice. In particular, constructing the algorithms to achieve the nice theoretical scaling of O(1/ε 1±β ) undercuts the simplicity of the underlying situation. A depth-limited quantum computer will be able to run some fixed number of Grover rotations r. A practical version of hybrid estimation would take as input such an r, and use that to extract as much information about a as possible. In the context of quantum signal processing, this r corresponds to some fixed 'degree budget'. We could then use machine learning to find whatever polynomial of degree ≤ r gives the most information about a. That is not to say that [GKLPZ20] does not consider practicality: they perform a noise analysis, and even implement their algorithm in hardware [Guirgica-Tiron&21] albeit with a trivial oracle. We merely invite the possibility of simpler algorithms with less elegant theoretical properties. There is a reason to be skeptical of the β → 1 limit in particular: we expect early quantum computers to have significantly slower clock speeds than classical computers. Thus, classical oracle evaluation is likely much faster than quantum oracle evaluation. So if one is going to put in all the effort of running the oracle on the quantum computer, one might as well do so as many times as possible. A perhaps more reasonable notation of hybrid quantum-classical is discussed by [Rosmanis22]: we have two different types of queries, quantum and classical, each with different cost. Unfortunately, [Rosmanis22] shows that there is no speedup for Grover's search in this setting. Is the same true for estimation?
Despite these observations, we consider the task of varying the complexity with β an excellent opportunity to display the power of the polynomial sampling framework. We now discuss the new algorithm for achieving depth d ∈ O(1/ε 1−β ) and overall complexity D ∈ O(1/ε 1+β ) based on quantum signal processing. The final result is presented in Theorem 31. While the polynomial constructions require cumbersome error analysis as usual, the core idea is quite simple. In light of the above discussion, we do not attempt a constant-factor performance analysis on this algorithm.
We take the same approach as in Theorem 23: we maintain an interval [a min , we test this by finding a polynomial P (a) that is ≤ 1 2 − γ when a is ≤ a (t) min + 0.1∆ t , and is ≥ 1 2 + γ when a is ≥ a (t) max − 0.1∆ t . This can be achieved by making P (a) approximate a line with slope ∝ γ/∆ t , and the degree is largely determined by the slope. Then, ∝ 1/γ 2 samples from the polynomial suffice to distinguish the two cases.
In the previous section we selected γ to be constant, so the degree is ∝ 1/∆ t . This makes the depth and the total number of queries ∝ 1/∆ t . To achieve hybrid estimation, we simply select γ ∝ ∆ β t . Then the depth is ∝ ∆ β t /∆ t = 1/∆ 1−β t . But, since we need to take ∝ 1/γ 2 ∝ 1/∆ 2β t samples from the polynomial, the total complexity is 1/∆ 1+β t . The runtime is dominated by the final iterations where ∆ t ∝ ε, so the final depth is ∝ 1/ε 1−β and the final complexity is ∝ 1/ε 1+β as desired.
Letting a (t) mid := a (t) min + 0.5∆ t , we require a polynomial that approximates a line passing through (a (t) mid , 1/2) with whatever slope we want. Once such a polynomial has been constructed the remainder of the analysis is extremely simple as shown above. However, such a polynomial is unfortunately not so easy to build because of two complications: • Jackson's theorem (Theorem 21) is not precise enough. The line that we desire to approximate only creates a gap γ ∝ ∆ β ∈ o(1). The approximation error better be be smaller than γ, but Jackson's theorem requires an additional factor of 1/γ in the degree to achieve this. This is too slow, so we need a more accurate polynomial approximation of a line. Fortunately, [LC17] constructed an extremely accurate approximation of erf(kx) ≈ kx via the Jacobi-Anger expansion.
• Semi-Pellian polynomials have fixed parity, so even though erf(kx) is odd, the shifted version erf(k(a − a mid )) has mixed parity. We can fix this by instead approximating the even function erf(k(a − a mid )) + erf(−k(a + a mid )) + 1, which is very similar to erf(k(a − a mid )) for a ≥ 0. But this only works when a mid is not too close to the origin -otherwise the two erf functions interfere. This forces us to adopt special behavior for small a mid , causing O(a −1 ) terms in the final runtime. [GKLPZ20] treat a and β as constants in their analysis, so if we are allowed to do the same then we still reproduce their complexities.
Remark 24. As pointed out by [MML22], it is sometimes possible to construct oracles for a shifted amplitude a+1 2 . For some state |0 say we have a unitary U satisfying U |0 = |Ψ and another unitary V such that Π = V |0 0| V † . Then redefine: Observe that | 0| V † U |0 | = |Π |Ψ | = a, but suppose furthermore that 0| V † U |0 = a. Then, we see that: If we use this oracle in our algorithm instead, then we have a ≥ 1/2 so all the O(a −1 ) terms in Theorem 31 become constant.
We begin by discussing the approximation of erf(kx).
The construction in [LC17] actually obtains an approximation for x ∈ [−1, 1], but this is easily expanded to x ∈ [−2, 2] by dividing the input by two and then doubling k. Now we leverage that erf(kx) can be used to approximate the line kx, and is close to −1 or 1 when far from the origin. These properties are listed in the two facts below and also plotted in Figure 10. Note that the kind of approximation erf(kx) ≈ kx we achieve here is sufficient for interval reduction, but not sufficient for the sampling performed for the unbiased estimation algorithm from the previous section. If this had been the case, we could have exponentially improved the accuracy of the unbiased estimation algorithm. Fact 27. For any τ > 0, let κ(τ ) := 1 2 2 ln(2/(πτ 2 )). Then: x Properties of erf(x) Figure 10: Sketch of the properties of erf(x) stated in Facts 26 and 27. Here we selected τ = 0.1, making κ(τ ) ≈ 1.44. Fact 26 asserts that erf(x) is either strictly above or below the dashed line 0.8x depending on the sign. Fact 27 asserts that erf(x) is within τ of ±1 when x is κ-far from the origin. These bounds correspond to the gray regions.
Now we discuss the trick we explained earlier that lets us approximate the mixed-parity function erf(k(a − a mid )) with an even function on the interval a ∈ [0, 1]. Think of f (a) ≈ erf(k(a − a mid )) + 1, so that f (a) jumps from 0 to 1 on the interval [a min , a max ]. Then f e (a) := f (a) + f (−a) approximates f (a). We track the quality of the approximation via the parameter λ.
We have all the tools in place to construct the family of polynomials for hybrid estimation. A sketch of the construction is shown in Figure 11, which displays the critical property proven by the following lemma: that P (a) is bounded away from 1/2 for values close to a min and a max . That bound will later become γ from the previous discussion. The criterion a mid ≥ κ/k allows us to determine when a mid is too close to the origin for the construction to work.
Bounds on P hybrid,τ,η,k,a mid  Figure 11: Sketch of the construction of P hybrid,τ,η,k,a mid (a) or P (a) for short from Lemma 29. Here we selected τ = η = 0.025, k = 4, which make κ ≈ 1.87, κ/k ≈ 0.47 and deg(P erf,k,η ) = 21. In (a) we have a mid ≥ κ/k, so we see that P (a) is contained in the gray regions asserted in (122, 123). The region at most λ above f (a) is highlighted in dark gray. But in (b) where a mid < κ/k, we see that f (−a) is large on the interval [a min , a max ] causing the shape of P (a) to be distorted and no longer be contained in the gray regions.
This polynomial has the same degree as P erf,k,η (x) from Lemma 25.
We have yet to take into account the Born rule: when we sample from P (a), we actually toss a coin that comes up heads with probability |P (a)| 2 . Fortunately, as the following lemma shows, it suffices to bound x away from 1/2 even when tossing a coin with bias x 2 in order to decide if x is big or small.
Having constructed the polynomial and shown how to use it to distinguish a ≤ a (t) min +0.1∆ t and a ≥ a (t) max − 0.1∆ t , all that remains to show is how to deal with the restriction that we must have a (t) mid ≥ κ/k for the polynomial construction to work properly. Recall that we would ideally like to pick the slope k ∝ 1/∆ 1−β t . Since κ is roughly constant, this essentially translates the condition a (t) mid ≥ κ/k to a (t) mid ≥ 1 2 ∆ 1−β t . If this is not the case, then we must go 'full quantum' (β = 0) and set the slope to be k ∝ 1/∆ t instead. Then the condition is a (t) mid ≥ 1 2 ∆ t which is always true. We find that the consequences of this behavior are to knock out an extra O(a −1 ) term in the total query complexity and an extra O(a −1/(1−β) ) term in the depth. Intuitively, these arise as follows. At iteration t we have a ∈ [a (t) min , a (t) max ], so, implying that a mid is lower bounded by a − ∆ t /2 ≥ a − ∆ 1−β t /2. At a certain time t * , ∆ t becomes small enough (∆ 1−β t ≤ a) so that the bound a (t) mid ≥ 1 2 ∆ 1−β t is assured. So, the algorithm divides into two phases. In the first phase (t ≤ t * ) we have ∆ 1−β t ≥ a and the algorithm is essentially fully quantum. The complexity and depth are then given by O(a −1 ) and O(a −1/(1−β) ) respectively. In the second phase (t > t * ) we are guaranteed to have a (t) mid ≥ κ/k, so the algorithm has the desired complexity and depth of O(1/ε 1+β ) and O(1/ε 1−β ) respectively.
Recall that d is the maximum degree of any particular polynomial, and that D is the sum of the degrees of all the polynomials. These correspond to query depth and query complexity respectively.