Improved Quantum Query Complexity on Easier Inputs

Quantum span program algorithms for function evaluation sometimes have reduced query complexity when promised that the input has a certain structure. We design a modified span program algorithm to show these improvements persist even without a promise ahead of time, and we extend this approach to the more general problem of state conversion. As an application, we prove exponential and superpolynomial quantum advantages in average query complexity for several search problems, generalizing Montanaro's Search with Advice [Montanaro, TQC 2010].


Introduction
Quantum algorithms often perform better when given a promise on the input. For example, if we know that there are $M$ marked items out of $N$, or no marked items at all, then Grover's search can be run in time and query complexity $O(\sqrt{N/M})$, rather than $O(\sqrt{N})$, the worst-case complexity with a single marked item [Gro97, Aha99].
In the case of Grover's algorithm, a series of results [BBHT98, BHMT02, BHT98] removed the promise; if there are $M$ marked items, there is a quantum search algorithm that runs in $O(\sqrt{N/M})$ complexity, even without knowing the number of marked items ahead of time. Most relevant for our work, several of these algorithms involve iteratively running Grover's search with exponentially growing runtimes [BBHT98, BHMT02] until a marked item is found.
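The exponentially growing schedule can be illustrated with a small classical simulation. This is only a sketch under stated assumptions: it models a Grover run of $j$ iterations as succeeding with probability $\sin^2((2j+1)\theta)$, where $\sin\theta = \sqrt{M/N}$, and the function name, the growth factor 6/5, and the cost accounting (one extra query to check the output) are illustrative choices, not details taken from this paper.

```python
import math
import random

def search_unknown_count(n_items, n_marked, rng):
    """Simulate the query count of an exponential-schedule Grover search.

    Assumption: j Grover iterations succeed with probability
    sin^2((2j + 1) * theta), where sin(theta) = sqrt(n_marked / n_items).
    No quantum state is simulated. Requires 1 <= n_marked <= n_items.
    """
    theta = math.asin(math.sqrt(n_marked / n_items))
    bound, growth, queries = 1.0, 6 / 5, 0
    cap = math.sqrt(n_items)
    while True:
        j = rng.randrange(math.ceil(bound))   # random iteration count below the bound
        queries += j + 1                      # j oracle calls plus one check of the output
        if rng.random() < math.sin((2 * j + 1) * theta) ** 2:
            return queries                    # the check acts as the flag of completion
        bound = min(growth * bound, cap)      # grow the schedule geometrically

rng = random.Random(0)
mean_many = sum(search_unknown_count(1024, 64, rng) for _ in range(300)) / 300
mean_one = sum(search_unknown_count(1024, 1, rng) for _ in range(300)) / 300
```

In this mock, the average cost with 64 marked items is noticeably smaller than with a single marked item, even though the routine is never told how many items are marked.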
Grover's algorithm was one of the first quantum query algorithms discovered [Gro97]. Since that time, span programs and the dual of the general adversary bound were developed, providing frameworks for creating optimal query algorithms for function decision problems [Rei09, Rei11] and nearly optimal algorithms for state conversion problems, in which the goal is to generate a quantum state based on an oracle and an input state [LMR+11]. Moreover, these frameworks are also useful in practice [BT20, BR12, Bel12, BR20, CMB18, DKW19].
For some span program algorithms, analogous to multiple marked items in Grover's search, there are features which, if promised to exist, allow for improvement over the worst-case query complexity. For example, a span program algorithm for deciding st-connectivity uses $O(n^{3/2})$ queries on an $n$-vertex graph. However, if promised that the shortest path, if it exists, has length at most $k$, then the problem can be solved with $O(\sqrt{k}\,n)$ queries [BR12].
Our contribution is to remove the requirement of the promise; we improve the query complexity of generic span program and state conversion algorithms in the case that some speed-up inducing property (such as multiple marked items or a short path) is present, even without knowing about the structure in advance. One might expect this is trivial: surely if an algorithm produces a correct result with fewer queries when promised a property is present, then it should also produce a correct result with fewer queries without the promise if the property still holds? While this is true and these algorithms always output a result, even if run with fewer queries, the problem is that they don't produce a flag of completion, and their output cannot always be easily verified. Without a flag of completion or a promise of structure, it is impossible to be confident that the result is correct. Span program and state conversion algorithms differ from Grover's algorithm in their lack of a flag; in Grover's algorithm one can use a single query to test whether the output is a marked item, thus flagging that the output of the algorithm is correct, and that the algorithm has run for a sufficiently long time. We note that when span program algorithms have previously claimed an improvement with structure, they always included a promise, or they gave the disclaimer that running the algorithm will be incorrect with high probability if the promise is not known ahead of time to be satisfied, e.g., Ref. [CMB18, App. C.3].
We use an approach that is similar to the iterative modifications to Grover's algorithm; we run subroutines for exponentially increasing times, and we have novel ways to flag when the computation should halt. On the hardest inputs, our algorithms match the asymptotic performance of existing bounded error algorithms. On easier inputs, our approach on average matches the asymptotic performance, up to log factors, of existing algorithms when those existing algorithms additionally have an optimal promise.
Because our algorithms use fewer queries on easier inputs without needing to know they are easier inputs, they provide the possibility of improved average query complexity over input oracles when there is a distribution of easier and harder inputs. In this direction, we generalize a result by Montanaro that showed a super-exponential quantum advantage in average query complexity for the problem of searching for a single marked item under a certain distribution [Mon10]. In particular, we provide a framework for proving similar advantages using quantum algorithms based on classical decision trees, opening up the potential for a broader range of applications than the approach used by Montanaro. We apply this technique to prove an exponential and superpolynomial quantum advantage in average query complexity for searching for multiple items and searching for the first occurring marked items, respectively.
Where prior work showed improvements for span program algorithms with a promise, our results immediately provide an analogous improvement without the promise:
• For undirected st-connectivity described above, our algorithm determines whether there is a path from $s$ to $t$ in an $n$-vertex graph with $\tilde{O}(\sqrt{k}\,n)$ queries if there is a path of length $k$, and if there is no path, the algorithm uses $\tilde{O}(\sqrt{nc})$ queries, where $c$ is the size of the smallest cut between $s$ and $t$. In either case, $k$ and $c$ need not be known ahead of time.
• For an $n$-vertex undirected graph, we can determine if it is connected in $\tilde{O}(n\sqrt{R})$ queries, where $R$ is the average effective resistance, or not connected in $\tilde{O}(\sqrt{n^3/\kappa})$ queries, where $\kappa$ is the number of components. These query complexities hold without knowing $R$ or $\kappa$ ahead of time. See Ref. [JJKP18] for the promise version of this problem.
• For cycle detection on an $n$-vertex undirected graph, whose promise version was analyzed in Ref. [DKW19], if the circuit rank is $C$, then our algorithm will detect a cycle in $\tilde{O}(\sqrt{n^3/C})$ queries, while if there is no cycle and at most $\mu$ edges, the algorithm will decide there is no cycle in $\tilde{O}(\mu\sqrt{n})$ queries. This holds without knowing $C$ or $\mu$ ahead of time.
To achieve our results for decision problems, we modify the original span program function evaluation algorithm to create two one-sided error subroutines. In the original span program algorithm, the final measurement tells you with high probability whether $f(x) = 1$ or $f(x) = 0$. In one of our subroutines, the final measurement certifies that with high probability $f(x) = 1$, providing our flag of completion, or it signals that more queries are needed to determine whether $f(x) = 1$. The other behaves similarly for $f(x) = 0$. By interleaving these two subroutines with exponentially increasing queries, we achieve our desired performance.
The problem is more challenging for state conversion, as the standard version of that algorithm does not involve any measurements, and so there is nothing to naturally use as a flag of completion. We thus design a novel probing routine that iteratively tests exponentially increasing query complexities until a sufficient level is reached, before then running an algorithm similar to the original state conversion algorithm.
While we analyze query complexity, the algorithms we create have average time complexity on input $x$ that scales like $O(T_U E[Q_x])$, where $E[Q_x]$ is the average query complexity on input $x$, and $T_U$ is the time complexity of implementing an input-independent unitary. Since the existing worst-case span program and state conversion algorithms have time complexities that scale as $O(\max_x T_U E[Q_x])$, our algorithms also improve in average time complexity relative to the original algorithms on easier inputs. For certain problems, like st-connectivity [BR12] and search [CJOP20], it is known that $T_U = \tilde{O}(1)$, meaning that the query complexities of our algorithms for these problems match the time complexity up to log factors.

Directions for Future Work
Ambainis and de Wolf show that while there is no quantum query advantage for the problem of majority in the worst case, on average there is a quadratic quantum advantage [AdW01]. However, their quantum algorithm uses a technique that is specific to the problem of majority, and it is not clear how it might extend to other problems. On the other hand, since our approach is based on span programs, a generic optimal framework, it may provide opportunities for proving similar results for more varied problems.
In the original state conversion algorithm, to achieve an error of $\varepsilon$ in the output state (by some metric), the query complexity scales as $O(\varepsilon^{-2})$ [LMR+11]. In our result, the query complexity scales as $O(\varepsilon^{-5})$. While this does not matter for applications like discrete function evaluation, as considered in Section 4.2, in cases where accuracy must scale with the input size, this error term could overwhelm any advantage from our approach, and so it would be beneficial to improve this error scaling.
Ito and Jeffery [IJ19] give an algorithm to estimate the positive witness size (a measure of how easy an instance is) with fewer queries on easier inputs. While there are similarities between our approaches, neither result seems to directly imply the other. Better understanding the relationship between these strategies could lead to improved algorithms for determining properties of input structure for both span programs and state conversion problems.
Our work can be contrasted with the work of Belovs and Yolcu [BY23], which also has a notion of reduced query complexity on easier inputs. Their work focuses on the "Las Vegas query complexity," which is related to the amount of the state that a controlled version of the oracle acts on over the course of the algorithm, and which is an input-dependent quantity. They show the "Monte Carlo query complexity," what we call the query complexity, is bounded by the Las Vegas query complexity of the worst-case input. We suspect that using techniques similar to those in our work, it would be possible to modify their algorithm to obtain an algorithm with input-dependent average query complexity that scales roughly with the geometric mean of the Las Vegas and Monte Carlo complexities for that input, without knowing anything about the input ahead of time.

Preliminaries
Basic Notation: For $n > 2$, let $[n]$ represent $\{1, 2, \dots, n\}$, while for $n = 2$, $[n] = \{0, 1\}$. We use $\log$ to denote base 2 logarithm. For set builder notation like $\{r_z : z \in Z\}$ we will frequently use $\{r_z\}_{z \in Z} = \{r_z\}$, where we drop the subscript outside the curly brackets if clear from context. We denote the set of linear operators from the space $V$ to the space $U$ as $L(V, U)$. We use $I$ for the identity operator. (It will be clear from context which space $I$ acts on.) Given a projection $\Pi$, its complement is $\overline{\Pi} = I - \Pi$. For a matrix $M$, by $M_{xy}$ or $(M)_{xy}$, we denote the element in the $x$th row and $y$th column of $M$. By $\tilde{O}$, we denote big-O notation that ignores log factors. The $\ell_2$-norm of a vector $|v\rangle$ is denoted by $\||v\rangle\|$. For any unitary $U$, let $P_\Theta(U)$ be the projection onto the eigenvectors of $U$ with phase at most $\Theta$. That is, $P_\Theta(U)$ is the projection onto $\mathrm{span}\{|u\rangle : U|u\rangle = e^{i\theta}|u\rangle \text{ with } |\theta| \leq \Theta\}$.

Quantum Algorithmic Building Blocks
We consider quantum query algorithms, in which one can access a unitary $O_x$, called the oracle, which encodes a string $x \in X$ for $X \subseteq [q]^n$, $q \geq 2$. The oracle acts on the Hilbert space

Given $O_x$ for $x \in X$, we would like to perform a computation that depends on $x$. The query complexity is the minimum number of uses of the oracle required such that for all $x \in X$, the computation is successful with some desired probability of success. We denote by $E[Q_x]$ the average number of queries used by the algorithm on input $x$, where the expectation is over the algorithm's internal randomness. Given a probability distribution $\{p_x\}_{x \in X}$ over the elements of $X$, then $\sum_{x \in X} p_x E[Q_x]$ is the average quantum query complexity of performing the computation with respect to $\{p_x\}$.
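As a small illustration of the last definition, the average query complexity is a weighted sum of the per-input averages. The input labels and numbers below are hypothetical and chosen only to show how a distribution concentrated on easy inputs pulls the average well below the worst case.

```python
def average_query_complexity(p, expected_queries):
    """Return sum_x p[x] * E[Q_x] for a distribution p over inputs x."""
    if abs(sum(p.values()) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    return sum(p[x] * expected_queries[x] for x in p)

# Hypothetical numbers: three inputs, the easy one is the most likely.
p = {"easy": 0.7, "medium": 0.2, "hard": 0.1}
expected_queries = {"easy": 4.0, "medium": 16.0, "hard": 64.0}

avg = average_query_complexity(p, expected_queries)   # weighted average over inputs
worst = max(expected_queries.values())                # worst-case input for comparison
```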
Several of our key algorithmic subroutines use a parallelized version of phase estimation [MNRS11], in which for a unitary $U$, a precision $\Theta > 0$, and an accuracy $\epsilon > 0$, a circuit $D(U)$ implements $O(\log \frac{1}{\epsilon})$ copies of the phase estimation circuit on $U$, each to precision $O(\Theta)$, that all measure the phase of a single state on the same input register. If $U$ acts on a Hilbert space $\mathcal{H}$, then $D(U)$ acts on the space , where we have used $A$ to label the register that stores the input state, and $B$ to label the registers that store the results of the parallel phase estimations.
The circuit $D(U)$ can be used for Phase Checking: applying $D(U)$ to $|\psi\rangle_A |0\rangle_B$ and then measuring register $B$ in the standard basis; the probability of outcome $|0\rangle_B$ provides information on whether $|\psi\rangle$ is close to an eigenvector of $U$ that has eigenphase close to 0 (in particular, with eigenphase within $\Theta$ of 0). To characterize this probability, we define $\Pi_0(U)$ to be the orthogonal projection onto the subspace of

(Since $\Pi_0(U)$ depends on the choice of $\Theta$ and $\epsilon$ used in $D(U)$, those values must be specified, if not clear from context, when discussing $\Pi_0(U)$.) We now summarize prior results for Phase Checking in Lemma 1:

Lemma 1 (Phase Checking [Kit95, CEMM98, MNRS11]). Let $U$ be a unitary on a Hilbert space $\mathcal{H}$, and let $\Theta, \epsilon > 0$. We call $\Theta$ the precision and $\epsilon$ the accuracy. Then there is a circuit $D(U)$ that acts on the space

We also consider implementing $D(U)$ as described above, applying a $-1$ phase to the $A$ register if the $B$ register is not in the state $|0\rangle_B$, and then implementing $D(U)^\dagger$. We call this circuit Phase Reflection and denote it as $R(U)$. Note that $R(U) = \Pi_0(U) - \overline{\Pi}_0(U)$, where $R(U)$ and $\Pi_0(U)$ have the same implicit precision $\Theta$ and accuracy $\epsilon$. The following lemma summarizes prior results on relevant properties of Phase Reflection.

Span Programs
Span programs are a tool for designing quantum query algorithms for decision problems.

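Definition 5 (Span Program). A span program is a tuple $P = (H, V, \tau, A)$ on $[q]^n$ where
1. $H$ is a direct sum of finite-dimensional inner product spaces: $H = H_1 \oplus H_2 \oplus \cdots \oplus H_n \oplus H_{\mathrm{true}} \oplus H_{\mathrm{false}}$, and for $j \in [n]$ and $a \in [q]$, we have $H_{j,a} \subseteq H_j$, such that $\sum_{a \in [q]} H_{j,a} = H_j$.
2. $V$ is a vector space
3. $\tau \in V$ is a target vector, and
4. $A \in L(H, V)$.
Given a string $x \in [q]^n$, we use $H(x)$ to denote the subspace

$$H(x) = H_{1,x_1} \oplus \cdots \oplus H_{n,x_n} \oplus H_{\mathrm{true}},$$

and we denote by $\Pi_{H(x)}$ the orthogonal projection onto the space $H(x)$.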
We use Definition 5 for span programs because it applies to both binary and non-binary inputs ($q \geq 2$). The definitions in Refs. [BR12, CMB18] only apply to binary inputs ($q = 2$).

Definition 6 (Positive and Negative Witness). Given a span program $P = (H, V, \tau, A)$ on $[q]^n$ and $x \in [q]^n$, then $|w\rangle \in H(x)$ is a positive witness for $x$ in $P$ if $A|w\rangle = \tau$. If a positive witness exists for $x$, we define the positive witness size of $x$ in $P$ as

$$w_+(P, x) = w_+(x) := \min\left\{ \||w\rangle\|^2 : |w\rangle \in H(x) \text{ and } A|w\rangle = \tau \right\}. \qquad (1)$$

Then $|w\rangle \in H(x)$ is an optimal positive witness for $x$ if $\||w\rangle\|^2 = w_+(P, x)$ and $A|w\rangle = \tau$. We say $\omega \in L(V, \mathbb{R})$ is a negative witness for $x$ in $P$ if $\omega\tau = 1$ and $\omega A \Pi_{H(x)} = 0$. If a negative witness exists for $x$, we define the negative witness size of $x$ in $P$ as

$$w_-(P, x) = w_-(x) := \min\left\{ \|\omega A\|^2 : \omega \in L(V, \mathbb{R}),\ \omega A \Pi_{H(x)} = 0, \text{ and } \omega\tau = 1 \right\}. \qquad (2)$$

Then $\omega$ is an optimal negative witness for $x$ if $\|\omega A\|^2 = w_-(P, x)$, $\omega A \Pi_{H(x)} = 0$, and $\omega\tau = 1$.

Each $x \in [q]^n$ has a positive or negative witness (but not both). We say that a span program $P$ decides the function $f : X \subseteq [q]^n \to \{0, 1\}$ if each $x \in f^{-1}(1)$ has a positive witness in $P$, and each $x \in f^{-1}(0)$ has a negative witness in $P$. Then we denote the maximum positive and negative witness of $P$ on $f$ as

$$W_+(P, f) = W_+ := \max_{x \in f^{-1}(1)} w_+(P, x), \qquad W_-(P, f) = W_- := \max_{x \in f^{-1}(0)} w_-(P, x).$$

Given a span program that decides a function, one can use it to design an algorithm that evaluates that function with query complexity that depends on $W_+(P, f)$ and $W_-(P, f)$:

Theorem 7 ([Rei09, IJ19]). For $X \subseteq [q]^n$ and $f : X \to \{0, 1\}$, let $P$ be a span program that decides $f$. Then there is a quantum algorithm that for any $x \in X$, evaluates $f(x)$ with bounded error, and uses $O\left(\sqrt{W_+(P, f) W_-(P, f)}\right)$ queries to the oracle $O_x$.
Not only can any span program that decides a function $f$ be used to create a quantum query algorithm that decides $f$, but there is always a span program that creates an algorithm with asymptotically optimal query complexity [Rei09, Rei11]. Thus when designing quantum query algorithms for function decision problems, it is sufficient to consider only span programs.
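To make the witness sizes concrete, the following worked example (an illustration that is not taken from this paper) uses the textbook span program for OR on $n$ bits, with $q = 2$. It assumes that $H(x)$ is spanned by the subspaces $H_{j,x_j}$ selected by the input; $|x|$ denotes the Hamming weight of $x$, and $0^n$ is the all-zeros string.

```latex
% Illustrative span program for OR_n (an assumed example, not from the paper).
% H_{j,1} = span{|j>}, H_{j,0} = {0}, H_true = H_false = {0}, V = R.
\begin{align*}
A &= \sum_{j=1}^{n} \langle j|, \qquad \tau = 1, \\
w_+(x) &= \min\Big\{ \sum_{j : x_j = 1} c_j^2 \;:\; \sum_{j : x_j = 1} c_j = 1 \Big\}
        = \frac{1}{|x|} \qquad (x \neq 0^n), \\
w_-(0^n) &= \|\omega A\|^2 \ \text{with } \omega = 1, \ \text{so } w_-(0^n) = n, \\
W_+ &= \max_{x \neq 0^n} \frac{1}{|x|} = 1, \qquad W_- = n, \qquad \sqrt{W_+ W_-} = \sqrt{n}.
\end{align*}
```

Under these assumptions, Theorem 7 recovers the $O(\sqrt{n})$ worst-case cost of search, while an input with $|x| = M$ marked positions has the smaller witness size $w_+(x) = 1/M$, which is the kind of easier input the rest of the paper exploits.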
Given a function $f : X \to \{0, 1\}$, we denote the negation of $f$ as $f^\neg$, where $\forall x \in X$, $f^\neg(x) = \neg f(x)$. We use a transformation that takes a span program $P$ that decides a function $f : X \to \{0, 1\}$ and creates a span program $P^\dagger$ that decides $f^\neg$, while preserving witness sizes for each input $x$. While such a transformation is known for Boolean span programs [Rei09], in Lemma 8 we show it exists for the span programs of Definition 5. The proof is in Appendix A.

State Conversion
In the state conversion problem, for $X \subseteq [q]^n$, we are given descriptions of sets of pure states $\{|\rho_x\rangle\}_{x \in X}$ and $\{|\sigma_x\rangle\}_{x \in X}$. Then given access to an oracle for $x$, and the quantum state $|\rho_x\rangle$, the goal is to create a state $|\sigma'_x\rangle$ such that $\||\sigma'_x\rangle - |\sigma_x\rangle|0\rangle\| \leq \varepsilon$. We call $\varepsilon$ the error of the state conversion procedure.
Let $\rho$ and $\sigma$ be the Gram matrices of the sets $\{|\rho_x\rangle\}$ and $\{|\sigma_x\rangle\}$, respectively, so $\rho$ and $\sigma$ are matrices whose rows and columns are indexed by the elements of $X$ such that $\rho_{xy} = \langle\rho_x|\rho_y\rangle$, and $\sigma_{xy} = \langle\sigma_x|\sigma_y\rangle$.
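The Gram matrices can be computed directly from the state descriptions. The following sketch uses a hypothetical one-qubit instance with two inputs; the input labels and states are illustrative and do not come from the paper.

```python
def inner(u, v):
    """<u|v> for state vectors given as lists of complex amplitudes."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def gram(states):
    """Gram matrix G[x][y] = <state_x|state_y>, indexed by the keys of `states`."""
    keys = list(states)
    return {x: {y: inner(states[x], states[y]) for y in keys} for x in keys}

s = 2 ** -0.5
# Hypothetical example with X = {"00", "01"}.
rho_states = {"00": [1 + 0j, 0j], "01": [s + 0j, s + 0j]}   # |0> and |+>
sigma_states = {"00": [1 + 0j, 0j], "01": [0j, 1 + 0j]}     # |0> and |1>

rho, sigma = gram(rho_states), gram(sigma_states)
```

Here the off-diagonal entry of `rho` is $1/\sqrt{2}$ while that of `sigma` is 0, so the conversion has to make two overlapping states orthogonal, which is only possible by querying the oracle.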
We now define the analogue of a span program for the problem of state conversion, which we call a converting vector set: (4) We call such a P a converting vector set from ρ to σ.
Then the query complexity of state conversion is characterized as follows:

Theorem 10 ([LMR+11]). Given $X \subseteq [q]^n$ and a converting vector set $P = (\{|v_{xj}\rangle\}, \{|u_{xj}\rangle\})_{x \in X, j \in [n]}$ from $\rho$ to $\sigma$, then there is a quantum algorithm that on every input $x \in X$ converts $|\rho_x\rangle$ to $|\sigma_x\rangle$ with error $\varepsilon$ and has query complexity

Analogous to witness sizes in span programs, we define a notion of witness sizes for converting vector sets:

By scaling the converting vector sets, we obtain the following two results: a rephrasing of Theorem 10 in terms of witness sizes, and a transformation that exchanges positive and negative witness sizes. Both proofs can be found in Appendix A.
Corollary 12. Let $P$ be a converting vector set from $\rho$ to $\sigma$ with maximum positive and negative witness sizes $W_+$ and $W_-$. Then there is a quantum algorithm that on every input $x \in X$ converts $|\rho_x\rangle$ to $|\sigma_x\rangle$ with error $\varepsilon$ and uses O

Lemma 13. If $P$ converts $\{|\rho_x\rangle\}_{x \in X}$ to $\{|\sigma_x\rangle\}_{x \in X}$, then there is a complementary converting vector set $P^\dagger$ that also converts $\rho$ to $\sigma$, such that for all $x \in X$ and for all $j \in [n]$, we have $w_+(P, x) = w_-(P^\dagger, x)$, and $w_-(P, x) = w_+(P^\dagger, x)$; the complement exchanges the values of the positive and negative witness sizes.

Function Decision
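Our main result for function decision (deciding if $f(x) = 0$ or $f(x) = 1$) is the following:

Theorem 14. For $X \subseteq [q]^n$, let $P$ be a span program that decides $f : X \to \{0, 1\}$. Then there is a quantum algorithm such that for any $x \in X$ and $\delta > 0$
1. The algorithm returns $f(x)$ with probability $1 - \delta$.
2. On input $x$, if $f(x) = 1$ the algorithm uses $O\left(\sqrt{w_+(x) W_-} \log\left(\frac{W_+}{w_+(x)\delta}\right)\right)$ queries on average, and if $f(x) = 0$ it uses $O\left(\sqrt{w_-(x) W_+} \log\left(\frac{W_-}{w_-(x)\delta}\right)\right)$ queries on average.
3. The worst-case (not average) query complexity is $O\left(\sqrt{W_+ W_-} \log(1/\delta)\right)$.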
Comparing Theorem 14 to Theorem 7 (which assumes constant error $\delta$), we see that in the worst case, with an input $x$ where $w_+(x) = W_+$ or $w_-(x) = W_-$, the average and worst-case performance of our algorithm is the same as the standard span program algorithm. However, when we have an instance $x$ with a smaller witness size, then our algorithm has improved average query complexity, without having to know about the witness size ahead of time.
We can also compare the query complexity of our algorithm, which does not require a promise, to the original span program algorithm when that algorithm is additionally given a promise. If the original span program algorithm is promised that, if $f(x) = 1$, then $w_+(x) = O(w)$, then the bounded error query complexity of the original algorithm on this input would be $O(\sqrt{wW_-})$ by Theorem 7. On the other hand, without needing to know ahead of time that $w_+(x) = O(w)$, our algorithm would use $\tilde{O}(\sqrt{wW_-})$ queries on this input on average, and in fact would do better than this if $w_+(x) = o(w)$.
A key routine in our algorithm is to apply Phase Checking to a unitary $U(P, x, \alpha)$, which we describe now. We follow notation similar to that in [BR12]. In particular, for a span program $P = (H, V, \tau, A)$ on $[q]^n$, let $\tilde{H} = H \oplus \mathrm{span}\{|\hat{0}\rangle\}$, and $\tilde{H}(x) = H(x) \oplus \mathrm{span}\{|\hat{0}\rangle\}$, where $|\hat{0}\rangle$ is orthogonal to $H$ and $V$. Then we define $\tilde{A}_\alpha \in L(\tilde{H}, V)$ as

Let $\Lambda_\alpha \in L(\tilde{H}, \tilde{H})$ be the orthogonal projection onto the kernel of $\tilde{A}_\alpha$, and let $\Pi_x \in L(\tilde{H}, \tilde{H})$ be the projection onto $\tilde{H}(x)$. Finally, let

$$U(P, x, \alpha) = (2\Pi_x - I)(2\Lambda_\alpha - I).$$

Note that $2\Pi_x - I$ can be implemented with two applications of $O_x$ [IJ19, Lemma 3.1], and $2\Lambda_\alpha - I$ can be implemented without any applications of $O_x$. Queries are only made in our algorithm when we apply $U(P, x, \alpha)$. To analyze the query complexity, we will track the number of applications of $U(P, x, \alpha)$.
The time complexity will also scale with the number of applications of $U(P, x, \alpha)$. We denote the time required to implement $U(P, x, \alpha)$ by $T_U$, which is an input-independent quantity. Since our query complexity analysis counts the number of applications of $U(P, x, \alpha)$, and the runtime scales with the number of applications of $U(P, x, \alpha)$, to bound the average time complexity of our algorithms, it suffices to determine $T_U$ and multiply it by the query complexity.
The following lemma gives us guarantees about the results of Phase Checking of $U(P, x, \alpha)$ applied to the state $|\hat{0}\rangle$:

Our approach differs from these previous algorithms in the addition of a parameter that controls the precision of our phase estimation. This approach has not (to the best of our knowledge) been applied to the non-Boolean span program formulation of Definition 5, so while it is not surprising that it works in this setting, our analysis in Appendix B may be of independent interest for other applications.
We use Alg. 1 to prove Theorem 14.
High Level Idea of Alg. 1: The algorithm makes use of a test that, when successful, tells us that $f(x) = 1$. However, the test is one-sided, in that failing the test does not mean that $f(x) = 0$, but instead is inconclusive. We repeatedly run this test for both functions $f$ and $f^\neg$ while increasing the queries used at each round. If we see an inconclusive result for both $f$ and $f^\neg$ at an intermediate round, we can conclude neither $f(x) = 1$ nor $f(x) = 0$, so we repeat the subroutine with more queries. Once we reach a critical round that depends on $w_+(x)$ (if $f(x) = 1$) or on $w_-(x)$ (if $f(x) = 0$), an inconclusive result becomes unlikely from that critical round onward.
We stop iterating when the test returns a conclusive result, or when we have passed the critical round for all x.While it is unlikely that we get an inconclusive result at the final round, we return 1 if this happens.

Algorithm 1:
Input: Span program $P$ that decides a function $f$, oracle $O_x$, and failure probability $\delta$,
[...]
if Measure $|0\rangle$ in register $B$ at least $N_i/2$ times then return 0
10 return 1 // if exit the for loop without passing a test, the algorithm makes a guess

More specifically, we use Phase Checking to perform our one-sided test; we iteratively run Phase Checking on $U(P, x, \alpha_+)$ in Line 5 to check if $f(x) = 1$, and on $U(P^\dagger, x, \alpha_-)$ in Line 8 to check if $f(x) = 0$, increasing the parameters $\alpha_+$ and $\alpha_-$ by a factor of 2 at each round. At some round, which we label $i^*$, $\alpha_+^2$ becomes at least $3w_+(P, x)$ or $\alpha_-^2$ becomes at least $3w_-(P, x) = 3w_+(P^\dagger, x)$ (by Lemma 8), depending on whether $f(x) = 1$ or $f(x) = 0$, respectively. Using Lemma 15 Item 1, from round $i^*$ onward we have a high probability of measuring the $B$ register to be in the state $|0\rangle_B$ at Line 5 if $f(x) = 1$ or at Line 8 if $f(x) = 0$, causing the algorithm to terminate and output the correct result. Item 2 of Lemma 15 ensures that at all rounds we have a low probability of outputting the incorrect result. We don't need to know $i^*$ ahead of time; the behavior of the algorithm will change on its own, giving us a smaller average query complexity for instances with smaller witness size, the easier instances.
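The control flow described above can be mocked classically. The sketch below is not the paper's algorithm: Phase Checking is replaced by a biased coin whose bias follows the guarantees quoted in the text (at least 2/3 for the matching test from the critical round onward, at most 1/3 for the non-matching test), one repetition in round $i$ is charged $2^i$ queries, and the function name `alg1_sketch` and all parameter names are hypothetical. Before the critical round the text gives no guarantee for the matching test, so the value 1/3 used there is a placeholder.

```python
import math
import random

def alg1_sketch(f_x, w, W_plus, W_minus, delta, rng):
    """Classical mock of the doubling schedule with two one-sided tests.

    w is w_+(x) if f_x == 1 and w_-(x) if f_x == 0 (assumed known only to
    the simulator, never to the control flow itself).
    """
    T = math.ceil(math.log2(math.sqrt(W_plus * W_minus)))
    N = math.ceil(T + math.log2(3 / delta)) + 1
    W_other = W_minus if f_x == 1 else W_plus
    queries = 0
    for i in range(T + 1):
        reps = 18 * (N - i)                      # N_i: more repetitions in the cheap rounds
        critical = 4 ** i >= 3 * w * W_other     # models alpha^2 >= 3 * (witness size)
        for candidate in (1, 0):                 # test f first, then its negation
            p_zero = 2 / 3 if (candidate == f_x and critical) else 1 / 3
            hits = sum(rng.random() < p_zero for _ in range(reps))
            queries += reps * 2 ** i             # assumed cost of `reps` repetitions
            if hits >= reps / 2:
                return candidate, queries        # conclusive test: stop early
    return 1, queries                            # inconclusive in every round: guess 1

rng = random.Random(1)
easy_answer, easy_queries = alg1_sketch(1, w=1, W_plus=256, W_minus=256, delta=0.1, rng=rng)
hard_answer, hard_queries = alg1_sketch(1, w=256, W_plus=256, W_minus=256, delta=0.1, rng=rng)
```

In this mock the easy instance stops at its critical round and is charged far fewer queries than the hard instance, although the loop is never given the witness size as an input to its stopping rule.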
The number of queries used by Phase Checking increases by a factor of 2 at each round of the for loop. If $N_i$, the number of repetitions of Phase Checking at round $i$, were a constant $N$, then using a geometric series, we would find that the query complexity would be asymptotically equal to the queries used by Phase Checking in the round at which the algorithm terminates, times $N$. At round $i^*$, the round at which termination is most likely, the query complexity of Phase Checking is $O(\sqrt{w_+(x) W_-})$ or $O(\sqrt{w_-(x) W_+})$, depending on whether $f(x) = 1$ or $f(x) = 0$. We show the probability of continuing to additional rounds after $i^*$ is exponentially decreasing with each extra round, so we find an average query complexity of $O(N\sqrt{w_+(x) W_-})$ or $O(N\sqrt{w_-(x) W_+})$. Since there can be $T = \lceil \log \sqrt{W_+ W_-} \rceil$ rounds in Alg. 1 in the worst case, this suggests that each round should have a probability of error bounded by $O(T^{-1})$, which we can accomplish through repetition and majority voting, but which requires $N = \Omega(\log T)$, adding an extra log factor to our query complexity.
To mitigate this effect, we modify the number of repetitions (given by $N_i$ in Alg. 1) over the course of the algorithm so that we have a lower probability of error (more repetitions) at earlier rounds, and a higher probability (fewer repetitions) at later rounds. This requires additional queries at the earlier rounds, but since these rounds are cheaper to begin with, we can spend some extra queries to reduce our error. As a result, instead of a log factor that depends only on $T$, we end up with a log factor that also decreases with increasing witness size, so when

Proof of Theorem 14. We analyze Alg. 1.
We first prove that the total success probability is at least $1 - \delta$. Consider the case that $f(x) = 0$. Let $i^* = \lceil \log \sqrt{3 w_-(P, x) W_+(P, f)} \rceil$, which is the round at which we will show our probability of exiting the for loop becomes large. The total number of possible iterations is $T = \lceil \log \sqrt{W_+ W_-} \rceil$. Let $\Pr(\text{cont } i)$ denote the probability of continuing to the next round of the for loop at round $i$, conditioned on reaching round $i$, let $\Pr(\text{err } i)$ be the probability of returning the wrong answer at round $i$, conditioned on reaching round $i$, and let $\Pr(\text{final})$ be the probability of reaching the end of the for loop without terminating. (Since we return 1 if we reach the end of the for loop without terminating, this event produces an error when $f(x) = 0$.) The total probability of error is then

We will use the probability tree diagram in Fig. 1a to help us analyze events and probabilities. Since $f(x) = 0$, $\Pr(\text{err } i)$ is the probability of returning 1, which depends on the probability of measuring $|0\rangle_B$ in Phase Checking of $U(P, x, \alpha_+)$ in Line 5 of Alg. 1. Since $\alpha_+^2 W_- \geq 1$, we can use Item 2 of Lemma 15 to find that there is at most a $\frac{3}{2}\epsilon = 1/3$ probability of measuring $|0\rangle_B$ at each repetition of Phase Checking. Using Hoeffding's inequality [Hoe63], the probability of measuring outcome $|0\rangle$ at least $N_i/2$ times and returning 1 in Line 6 of Alg. 1 is at most $a_i := e^{-N_i/18}$. Therefore,

$$\Pr(\text{err } i) \leq a_i, \qquad (10)$$

which holds for all $i$ but, in particular, gives us a bound on the first left branching of Fig. 1a, corresponding to outputting a 1 when $i \geq i^*$.

Figure 1: Probability tree diagrams for a round of the for loop in Alg. 1 when $i \geq i^*$, and $f(x) = 0$ (Fig. 1a), and $f(x) = 1$ (Fig. 1b). (a) Probability tree diagram for $f(x) = 0$, with bounds on probabilities of relevant event branches, with reference to the corresponding equations from the text where that bound is derived. (b) Probability tree diagram for $f(x) = 1$, with bounds on probabilities of relevant event branches, derived using similar analyses as the equations referenced on the branches. By our choice of parameters, $a_i$ is small (it is always less than 1/4), and decreases exponentially with increasing $i$.
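The Hoeffding step can be checked numerically. The sketch below compares the exact binomial tail with the bound $e^{-N_i/18}$ for the worst-case single-repetition probability of 1/3; the function names and the particular values of $N_i$ are illustrative.

```python
import math

def tail_at_least_half(n, p):
    """Exact P[X >= n/2] for X ~ Binomial(n, p)."""
    k_min = math.ceil(n / 2)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def hoeffding_bound(n):
    """e^{-n/18}: Hoeffding's bound for a deviation of 1/6 above the mean 1/3."""
    return math.exp(-n / 18)

# (N_i, exact tail, Hoeffding bound) for a few repetition counts
checks = [(n, tail_at_least_half(n, 1 / 3), hoeffding_bound(n)) for n in (18, 36, 90, 180)]
```

The exponent comes from Hoeffding's inequality with deviation $t = 1/2 - 1/3 = 1/6$, which gives $e^{-2 N_i t^2} = e^{-N_i/18}$; the exact tail is smaller than the bound in every row.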
When $i < i^*$, we trivially bound the probability of continuing to the next round:

$$\Pr(\text{cont } i) \leq 1. \qquad (11)$$

When $i \geq i^*$, we continue to the next round when we do not return 1 in Line 6 of Alg. 1 and then do not return 0 in Line 9, corresponding to the two right branchings of the diagram in Fig. 1a. We upper bound the probability of the first event (first right branch in Fig. 1a) by 1. To bound the probability of the second event, consider Phase Checking of $U(P^\dagger, x, \alpha_-)$ in Line 8. Since $i \geq i^*$, we have $\alpha_-^2 \geq 3w_-(P, x) = 3w_+(P^\dagger, x)$ by Lemma 8. Also since $W_+(P, f) = W_-(P^\dagger, f)$ by Lemma 8, we have $\alpha_-^2 \geq 1/W_-(P^\dagger, f)$. Thus, as we are performing Phase Checking with precision $\sqrt{\epsilon/(\alpha_-^2 W_+(P, f))} = \sqrt{\epsilon/(\alpha_-^2 W_-(P^\dagger, f))}$, we can use Item 1 of Lemma 15 with $C = 3$ to conclude that the probability of measuring $|0\rangle_B$ at a single repetition of Line 8 is at least 2/3. Using Hoeffding's inequality [Hoe63], the probability of measuring $|0\rangle_B$ more than $N_i/2$ times, and therefore returning 0, is at least $1 - e^{-N_i/18}$. Thus the probability of not returning 0 in Line 9 is at most

$$e^{-N_i/18} = a_i. \qquad (12)$$

Therefore when $i \geq i^*$, using the product rule, the probability of following both right branchings of Fig. 1a and continuing to the next iteration of the for loop is

$$\Pr(\text{cont } i) \leq 1 \cdot a_i = a_i. \qquad (13)$$

Finally, if we ever reach the end of the for loop without terminating, our algorithm returns 1, which is the wrong answer. This happens with probability

using Eq. (11) for $i < i^*$ and Eq. (13) for $i \geq i^*$. Now we calculate the total probability of error. Plugging Eq. (10), Eq. (11), Eq. (13), and Eq. (14) into Eq. (9), and splitting the first term of Eq. (9) into two parts to account for the different behavior of the algorithm before and after round $i^*$, we get:

Since $N_i = 18(N - i)$, we have $a_i = e^{-(N - i)}$, which means the first sum in Eq. (15) is a geometric series, and is bounded by:

where the final inequality arises from our choice of $N = \lceil T + \log(3/\delta) \rceil + 1$. Combining the second and third terms of Eq. (15), and upper bounding their $a_j$'s and $a_i$'s by $a_T$, we get another geometric series that sums to less than $\delta/2$:

Thus, $\Pr(\text{error}) < \delta$, and our success probability is at least $1 - \delta$.

Now we analyze the probability of error for $f(x) = 1$ and set $i^* = \lceil \log \sqrt{3 w_+(P, x) W_-(P, f)} \rceil$. Then nearly identical analyses as in the $f(x) = 0$ case (and using Lemma 8 to relate witness sizes of $P$ and $P^\dagger$) provide the bounds on probabilities of relevant events, corresponding to branchings in Fig. 1b. By following the first right branching and then the next left branching in Fig. 1b, we see the probability of error at round $i$ for $i \geq i^*$ is

$$\Pr(\text{err } i) \leq a_i \cdot a_i \leq a_i, \qquad (18)$$

since by our choice of parameters, $a_i$ is always less than 1. Following the two right branchings in Fig. 1b, the probability of continuing when $i \geq i^*$ is

$$\Pr(\text{cont } i) \leq a_i. \qquad (19)$$

Thus the rest of the analysis is the same, and so we find that for $f(x) = 0$ or $f(x) = 1$, the probability of success is at least $1 - \delta$.
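The effect of the choice $N = \lceil T + \log(3/\delta) \rceil + 1$ on the geometric series of $a_i$ values can be checked numerically. The sketch below sums $a_i = e^{-(N-i)}$ over every round $i = 0, \dots, T$, which upper bounds any partial sum over early rounds, and compares the total with $\delta/2$. Treating $\delta/2$ as the budget for this sum is an assumption made for illustration, and the function name is hypothetical.

```python
import math

def early_error_budget(T, delta):
    """Sum of a_i = e^{-(N - i)} over rounds i = 0..T, with N chosen as in the text."""
    N = math.ceil(T + math.log2(3 / delta)) + 1
    return sum(math.exp(-(N - i)) for i in range(T + 1))

# Total of the a_i for a few round counts T and failure probabilities delta
budgets = {(T, d): early_error_budget(T, d) for T in (1, 5, 20) for d in (0.3, 0.1, 0.01)}
```

Because the terms grow by a factor of $e$ per round, the sum is dominated by the last term $a_T = e^{-(N - T)}$, which the choice of $N$ pushes well below $\delta$.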

Now we calculate the average query complexity on input $x$, $E[Q_x]$, given by

Here, $Q(i)$ is the number of queries used by the algorithm up to and including round $i$, and is the probability that we terminate at round $i$.
The only time we make queries is in the Phase Checking subroutine. By Lemma 1, the number of queries required to run a single repetition of Phase Checking in the $i$th round is

. Taking into account the $N_i$ repetitions of Phase Checking in the $i$th round, we find

Now setting $i^*$ to be $\lceil \log \sqrt{3 w_+(P, x) W_-(P, f)} \rceil$ or $\lceil \log \sqrt{3 w_-(P, x) W_+(P, f)} \rceil$ depending on whether $f(x) = 1$ or 0, respectively, we can use our bounds on event probabilities from our error analysis to bound the relevant probabilities for average query complexity. When $i < i^*$, we use the trivial bound $\Pr(\text{cont } i) \leq 1$. When $i \geq i^*$, we use Eqs. (13) and (19) and our choice of $a_i$ to conclude that $\Pr(\text{cont } i) \leq a_i \leq 1/4$. For all $i$, we use that $(1 - \Pr(\text{cont } i)) \leq 1$. Splitting up the sum in Eq. (20) into 2 terms, for $i \leq i^*$ and $i > i^*$, and using these bounds on $\Pr(\text{cont } i)$ along with Eq. (21), we have

We use the following inequalities to simplify Eq. (22),

and finally find that

By our choice of $i^*$, $T$, and $N$, on input $x$ when $f(x) = 1$, the total average query complexity is

$$O\left(\sqrt{w_+(x) W_-} \log\left(\frac{W_+}{w_+(x)\delta}\right)\right),$$

and when $f(x) = 0$, the total average query complexity is

$$O\left(\sqrt{w_-(x) W_+} \log\left(\frac{W_-}{w_-(x)\delta}\right)\right),$$

and the worst-case query complexity is

$$O\left(\sqrt{W_+ W_-} \log(1/\delta)\right),$$

where we have again used Eq. (23).

Accepted in Quantum 2024-03-21, click title to verify. Published under CC-BY 4.0.
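The claim that the total cost is dominated by the final round can be checked with a short computation. The sketch below assumes a per-repetition cost of $2^i$ queries in round $i$ (the text states only that this cost doubles each round) and uses $N_i = 18(N - i)$; the function names and the explicit constant 2 in the bound are illustrative, not the paper's Eq. (23).

```python
def queries_through_round(m, N):
    """Sum over rounds i = 0..m of N_i * 2**i, with N_i = 18 * (N - i)."""
    return sum(18 * (N - i) * 2 ** i for i in range(m + 1))

def last_round_bound(m, N):
    """2 * 18 * (N - m + 1) * 2**m: a constant times the cost of round m."""
    return 2 * 18 * (N - m + 1) * 2 ** m

# (m, N, total through round m, bound) for a grid of round counts and repetition parameters
grid = [(m, N, queries_through_round(m, N), last_round_bound(m, N))
        for m in range(12) for N in range(m + 1, m + 10)]
```

The bound follows from $\sum_{i \le m} 2^i (N - i) = 2^m \sum_{k \ge 0} 2^{-k} (N - m + k) \le 2^m \left(2(N - m) + 2\right)$, so stopping at round $i^*$ costs only a constant factor more than round $i^*$ alone.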

Application to st-connectivity
As an example application of our algorithm, we analyze the query complexity of st-connectivity on an n-vertex graph. There is a span program P such that for inputs x where there is a path from s to t, $w_+(P, x) = R_{s,t}(x)$, where $R_{s,t}(x)$ is the effective resistance from s to t on the subgraph induced by x, and for inputs x where there is not a path from s to t, $w_-(P, x) = C_{s,t}(x)$, where $C_{s,t}(x)$ is the effective capacitance between s and t [BR12,JJKP18].
In an n-vertex graph, the effective resistance is less than $n$, and the effective capacitance is less than $n^2$, so by Theorem 14, we can determine with bounded error that there is a path on input x with $\tilde{O}\left(\sqrt{R_{s,t}(x)\,n^2}\right)$ average queries, or that there is not a path with $\tilde{O}\left(\sqrt{C_{s,t}(x)\,n}\right)$ average queries. In the worst case, when $R_{s,t}(x) = n$ or $C_{s,t}(x) = n^2$, we recover the worst-case query complexity of $O(n^{3/2})$ of the original span program algorithm. The effective resistance is at most the length of the shortest path between two vertices, and the effective capacitance is at most the size of the smallest cut between two vertices. Thus our algorithm determines whether or not there is a path from s to t with $\tilde{O}(\sqrt{k}\,n)$ queries on average if there is a path of length k, and if there is no path, the algorithm uses $\tilde{O}(\sqrt{cn})$ queries on average, where c is the size of the smallest cut between s and t. Importantly, one does not need to know bounds on k or c ahead of time to achieve this query complexity.
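To make the role of the effective resistance concrete, the following Python sketch computes it exactly for a small unweighted graph by solving the grounded Laplacian system with rational arithmetic. It is only an illustration: the function names are hypothetical, every edge is assumed to be a unit resistor, and `probing_scale` simply evaluates the quantity inside the Õ(·) for the connected case, ignoring logarithmic factors and constants.

```python
from collections import deque
from fractions import Fraction
from math import sqrt


def effective_resistance(n, edges, s, t):
    """Effective resistance between s and t with unit-resistance edges.

    Returns None when s and t are disconnected (infinite resistance).
    """
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Restrict to the connected component of s so the system is nonsingular.
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    if t not in seen:
        return None
    if s == t:
        return Fraction(0)
    nodes = sorted(seen - {t})            # ground t: its potential is 0
    idx = {v: i for i, v in enumerate(nodes)}
    m = len(nodes)
    # Augmented matrix [L_grounded | b], with unit current injected at s.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for u in nodes:
        for v in adj[u]:
            A[idx[u]][idx[u]] += 1
            if v != t:
                A[idx[u]][idx[v]] -= 1
    A[idx[s]][m] = Fraction(1)
    # Gauss-Jordan elimination over the rationals.
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pv = A[col][col]
        A[col] = [a / pv for a in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return A[idx[s]][m]                   # potential at s equals R_{s,t}


def probing_scale(n, resistance):
    """sqrt(R * n^2), the scale of the connected-case bound (no log factors)."""
    return sqrt(resistance) * n
```

For a path of length 3 the resistance is 3, while a 4-cycle with s and t opposite each other has resistance 1, below its shortest-path length of 2; in both cases `probing_scale` stays below the worst-case value n^(3/2).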
The analysis of the other examples listed in Section 1 is similar.

State Conversion Algorithm
Our main result for state conversion is the following:

Theorem 16. Let P be a converting vector set from {|ρ x ⟩} x∈X to {|σ x ⟩} x∈X . Then there is a quantum algorithm such that for any x ∈ X, any failure probability δ ≤ 1/3, and any error ε > 0,
1. With probability 1 − δ, on input x the algorithm converts |ρ x ⟩ to |σ x ⟩ with error ε.
2. On input x, if w + (P, x)W − (P) ≤ w − (P, x)W + (P), the average query complexity is If w + (P, x)W − (P) ≥ w − (P, x)W + (P), the average query complexity is Comparing Theorem 16 with Corollary 12, and considering the case of ε, δ = Ω(1), we see that in the worst case, when we have an input x where w + (P, x) = W + (P) or w − (P, x) = W − (P) the average query complexity of our algorithm is asymptotically the same as the standard state conversion algorithm.However, when we have an instance x with a smaller value of w ± (P, x), then our algorithm has improved query complexity, without knowing anything about the input witness size ahead of time.
Our algorithm has worse scaling in ε than Corollary 12, so our algorithm will be most useful when ε can be constant.One could also do a hybrid approach: initially run our algorithm and then switch to that of Corollary 12.
The problem of state conversion is a more general problem than function decision, and it can be used to solve the function decision problem.However, because of the worse scaling with ε in Theorem 16, we considered function decision separately (see Section 3).
We use Alg. 2 to prove Theorem 16.We now describe a key unitary, U(P, x, α, ε), that appears in the algorithm.In the following, we use most of the notation conventions of Ref.
[LMR + 11].Let {|µ i ⟩} i∈ [q] and {|ν i ⟩} i∈[q] be unit vectors in C q as defined in [LMR + 11, Fact 2.4], such that C m be a converting vector set from ρ to σ.For all x ∈ X, the states |ρ x ⟩ and |σ x ⟩ are in the Hilbert space H. Then for all x ∈ X, define where α is analogous to the parameter α in Eq. ( 7).We will choose ε to achieve a desired accuracy of ε in our state conversion procedure.Set Λ α,ε to equal the projection onto the orthogonal complement of the span of the vectors {|ψ x,α,ε ⟩} x∈X , and set The reflection 2Π x − I can be implemented with two applications of O x [LMR + 11], and the reflection (2Λ α,ε − I) is independent of x and so requires no queries.
As with function decision, the time and query complexity of the algorithm are dominated by the number of applications of U(P, x, α, ε). If T U is the time required to implement U(P, x, α, ε), then the time complexity of our algorithm is simply the query complexity times T U .

High level idea of Alg. 2: when we apply Phase Reflection of U(P ′ , x, α, ε) (for to pick up a −1 phase. (Note that in this case, half of the amplitude of the state is picking up a +1 phase, and half is picking up a −1 phase.) If this were to happen perfectly, we would have the desired state We show that if α is larger than a critical value that depends on the witness size of the input x, then in Line 10, we will mostly pick up the desired phase. However, we don't know ahead of time how large α should be. To determine this, we implement the Probing Stage (Lines 1-9), which uses Amplitude Estimation of a Phase Checking subroutine to test exponentially increasing values of α.
We use the following two Lemmas (Lemma 17 and Lemma 18) to analyze Alg. 2 and prove Theorem 16:

Lemma 17. For a converting vector set P that converts ρ to σ, and Phase Checking of U(P, x, α, ε) done with accuracy ε2 and precision Θ = ε3/2 / αW − (P), then

Algorithm 2:
Input : Converting vector set P from ρ to σ, failure probability δ < 1/3, error ε,

Lemma 17 Item 2 ensures that the |t x− ⟩ part of the state mostly picks up a −1 phase when we apply Phase Reflection regardless of the value of α, and Lemma 17 Item 1 ensures that when α is large enough, the |t x+ ⟩ part of the state mostly picks up a +1 phase. Lemma 17 plays a similar role in state conversion to Lemma 15 in function decision. It shows us that the behavior of the algorithm changes at some point when α is large enough, without our having to know α ahead of time (Item 1), but it is also used to show that we don't terminate early when we shouldn't, which would lead to an incorrect outcome (Item 2).
The following lemma, Lemma 18, tells us that when we break out of the Probing Stage due to a successful Amplitude Estimation in Line 8, we will convert |ρ x ⟩ to |σ x ⟩ with appropriate error in the State Conversion Stage in Line 10, regardless of the value of α (Item 1 in Lemma 18).However, Lemma 18 also tells us that once α ≥ w + (P, x), then if Amplitude Estimation does not fail, we will exit the Probing Stage (Item 2 in Lemma 18).Together Item 1 and Item 2 ensure that once α is large enough, the algorithm will be very likely to terminate and correctly produce the output state, but before α is large enough, if there is some additional structure in the converting vector set that causes our Probing Stage to end early (when α < w + (P, x)), we will still have a successful result.
Notice that once i ≥ i * , we have α ≥ w + (P, x), so by Lemma 18 Item 2 we have Thus when we do Amplitude Estimation in Line 8 of Alg. 2 to additive error ε/4, with probability 1 − δ i we will find the probability of outcome |0⟩ B will be at least 1/2 − 11ε/4, causing us to continue to the State Conversion Stage.Furthermore, by combining Lemma 18 Item 1 and Item 2, our algorithm is guaranteed to output the target state within error 6 √ ε = ε regardless of an error in Amplitude Estimation.Therefore, the algorithm can only return a wrong state before round i * , and only if Amplitude Estimation fails.Thus, we calculate the probability of error as: where P r(cont i) is the probability of continuing to the next round of the for loop at round i, and P r(err i) is the probability of a failure of Amplitude Estimation at round i, both conditioned on reaching round i.We upper bound P r(cont j) by 1 and P r(err i) by 2δ i , as δ i is the probability of Amplitude Estimation failure in Line 8, and we do two rounds (for P and P † ).This then gives us: where we have used our choice of δ i and that i * ≤ T .Thus the probability of error is bounded by δ.Now we analyze the average query complexity.Let Q(i) be the query complexity of the algorithm when it exits the Probing Stage at round i.Then the average query complexity on input x is where the second term in the first line is the query complexity of the State Conversion Stage, so the complexity is dominated by the Probing Stage.
We divide the analysis into two parts: i ≤ i * and i * < i ≤ T .When i ≤ i * , we use the trivial bound i−1 j=0 P r(cont j) (1 − P r(cont i)) ≤ 1.Thus, the contribution to the average query complexity from rounds i with i ≤ i * is at most: where we have used the following inequality twice: For i from i * + 1 to T , as discussed below Eq. ( 57), Amplitude Estimation in Line 8 should produce an estimate that triggers breaking out of the Probing Stage at Line 9. Thus the probability of continuing to the next iteration depends on Amplitude Estimation failing, which happens with probability The contribution to the average query complexity for rounds after i * is therefore where we have used Eq. ( 63) and Eq. ( 24).Combining Eqs. ( 62) and (65) and replacing ε with ε, the average query complexity of the algorithm on input x is When w + (P, x)W − (P) ≥ w − (P, x)W + (P) = w + (P † , x)W − (P † ), using the same analysis but with P † and applying Lemma 8, we find

Function Evaluation with Fast Verification
The state conversion algorithm can be used to evaluate a discrete function f : X → [m] for X ⊆ [q] n on input x by converting from |ρ x ⟩ = |0⟩ to |σ x ⟩ = |f (x)⟩ and then measuring in the standard basis to learn f (x).When the correctness of f (x) can be verified with an additional constant number of queries, we can modify our state conversion algorithm to remove the Probing Stage, and instead use the correctness verification of the output state as a test of whether the algorithm is complete.In this case, we can remove a log factor from the complexity: Theorem 19.For a function f : X → [m], such that f (x) can be verified without error using at most constant additional queries to O x , given a converting vector set P from ρ = {|0⟩} x∈X to σ = {|f (x)⟩} x∈X and δ < 2 −1/2 , then there is a quantum algorithm that correctly evaluates f with probability at least 1 − δ and uses average queries on input x.
While removing a log factor might seem inconsequential, it yields an exponential quantum advantage in the next section for some applications, as opposed to only a superpolynomial advantage.
Proof of Theorem 19.We analyze Alg. 3, which is similar to Alg. 2, but with the Probing Stage replaced by a post-State Conversion verification procedure.

Algorithm 3:
Input : for m, q ∈ N, a converting vector set P from ρ to σ, where |σ x ⟩ = |f (x)⟩, a procedure to verify the correctness of f (x) with constant additional queries to O x .Probability of error δ < 2 −1/2 , initial state We first analyze the case that w + (P, x)W − (P) ≤ w − (P, x)W + (P) = w + (P † , x)W − (P † ).From Lemma 18, we have that when α ≥ w + (P, x), then the output state |ψ⟩ of Line 7 of Alg. 3 satisfies ∥|ψ⟩ − |1⟩|f (x)⟩|0⟩∥ ≤ 6 This gives us Taking the square of both sides and using that 1 , we have Since |1⟩|f (x)⟩|0⟩ is a standard basis state, if we measure |ψ⟩ in the standard basis, this implies that probability that we measure the second register to be |f (x)⟩ is at least 1 − δ.
Once we measure f (x), we can verify it with certainty using constant additional queries.Thus, our success probability is at least 1 − δ at a single round when α ≥ w + (which we only reach if we haven't already correctly evaluated f (x)), and so our overall probability of success must be at least 1 − δ (from Line 1 of Alg. 3).This is because further rounds (if they happen) will only increase our probability of success.
To calculate the average query complexity, we note that the i th round uses queries, where the 1 is from the verification step, which we henceforward absorb into the big-Oh notation.We make a worst case assumption that the probability of measuring outcome |f (x)⟩ in a round when α < w + (P, x), or equivalently, at a round i when i < i * , is 0, so these rounds contribute queries to the average query complexity, where in the last equality, we've replaced ε with δ.
At each additional round i for i ≥ i * , we have at least a 1 − δ probability of successfully returning f (x), conditioned on reaching that round, and at most a δ probability of continuing to the next round. This gives us an average query complexity on input x of By our assumption that $\delta < 2^{-1/2}$, the summation is bounded by a constant. Thus the average query complexity on input x is When w + (P, x)W − (P) > w − (P, x)W + (P) = w + (P † , x)W − (P † ), we get the same expression except with P replaced by P † , which by Lemma 13 gives us the claimed query complexity.
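The condition δ < 2^(−1/2) can be motivated with a short numerical check. The Python sketch below rests on an assumption that is not spelled out in this passage: that α doubles from one round to the next and that the cost of a round scales as √α, so each extra round costs √2 times more than the previous one while being reached with probability at most δ. Under that assumption the tail of the average is a geometric series with ratio δ√2. The function names are hypothetical.

```python
from math import sqrt


def tail_factor(delta: float, rounds: int) -> float:
    """Partial sum of (delta * sqrt(2))**j for j = 0, ..., rounds - 1.

    Assumed model: round i* + j is reached with probability delta**j
    and costs 2**(j / 2) times as much as round i*.
    """
    ratio = delta * sqrt(2)
    return sum(ratio ** j for j in range(rounds))


def tail_limit(delta: float) -> float:
    """Closed form of the infinite series, valid only when delta < 2**-0.5."""
    ratio = delta * sqrt(2)
    if ratio >= 1:
        raise ValueError("series diverges unless delta < 2**-0.5")
    return 1.0 / (1.0 - ratio)
```

For δ = 1/2 the partial sums converge to 1/(1 − √2/2), which is about 3.41, whereas for δ = 0.8 the ratio exceeds 1 and the partial sums grow without bound, which is why a constant bound needs δ below 2^(−1/2) in this model.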

Quantum Advantages for Decision Trees with Advice
Montanaro showed that when searching for a single marked item, if there is a power law distribution on the location of the item, then a quantum algorithm can achieve a (super)exponential speed-up in average query complexity over the best classical algorithm [Mon10].He called this "searching with advice," as in order to achieve the best separations between quantum and classical performance, the algorithm had to know an ordering of the inputs such that the probability of finding the marked item was non-increasing, the "advice." In this section, we generalize Montanaro's result to decision tree algorithms, and use this generalization to prove a superpolynomial and exponential speed-up for several additional search problems.We use a decision tree construction similar to that of Beigi and Taghavi [BT20].
A classical, deterministic query algorithm that evaluates f : X → [m] for X ⊆ [q] n is given access to an oracle O x , for x ∈ X, and uses a single query to learn x i , the i th bit of x.We can describe the sequence of queries this classical algorithm makes as a directed tree T , a decision tree, with vertex set V (T ) and directed edge set E(T ).Each non-leaf vertex v of V (T ) is associated with an index J(v) ∈ [n], which is the index of x that is queried when the algorithm reaches that vertex.The algorithm follows the edge labeled by x J(v) (the query result) from v to another vertex in V (T ).Each leaf is labelled by an element of [m], which is the value that the algorithm outputs if it reaches that leaf.Let path(T , x) be the sequence of edges in E(T ) that are followed on input x when queries are made starting from the root of T .We say that We require that {Q(v, v ′ )} for all edges (v, v ′ ) leaving vertex v form a partition of [q], so that there is always exactly one edge that the algorithm can choose to follow based on the result of the query at vertex v.
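The decision tree description above maps directly onto a small data structure. The following Python sketch is an illustration under assumptions: the class and function names are hypothetical, indices are 0-based rather than 1-based, and the example tree (the OR of two bits) is not taken from the paper. Each non-leaf vertex stores its queried index J(v) and its outgoing edges, each labelled by the set Q(v, v′) of query results that lead along it.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass(eq=False)
class Vertex:
    index: Optional[int] = None     # J(v); None marks a leaf
    output: Optional[int] = None    # leaf label in [m]
    # Outgoing edges as (Q(v, v'), v') pairs; the sets must partition [q].
    edges: List[Tuple[FrozenSet[int], "Vertex"]] = field(default_factory=list)


def path(root: Vertex, x: Sequence[int]):
    """Return (edges followed on input x, output at the leaf reached)."""
    followed = []
    v = root
    while v.index is not None:
        matches = [child for label, child in v.edges if x[v.index] in label]
        if len(matches) != 1:
            raise ValueError("edge labels at a vertex must partition the alphabet")
        followed.append((v, matches[0]))
        v = matches[0]
    return followed, v.output


def or_tree() -> Vertex:
    """Hypothetical example: decide the OR of two bits, querying bit 0 first."""
    second = Vertex(index=1, edges=[
        (frozenset({0}), Vertex(output=0)),
        (frozenset({1}), Vertex(output=1)),
    ])
    return Vertex(index=0, edges=[
        (frozenset({1}), Vertex(output=1)),
        (frozenset({0}), second),
    ])
```

On input (1, 0) the path has a single edge, while on (0, 1) it has two, so the number of classical queries equals the length of path(T, x), as used in the later analysis.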
To create a quantum algorithm from such a decision tree T , we label each edge e ∈ E(T ) with a weight r(e) ∈ R + and a color c(e) ∈ {red, black}, such that all edges coming out of a vertex v with the same color have the same weight.There must be exactly one edge leaving each non-leaf vertex that is black, and the rest must be red.We denote by r(v, black) the weight of the black edge leaving v, and r(v, red) the weight of any red edge(s) leaving v.If there are no red edges leaving v, we set r(v, red) = ∞.(In Ref. [BT20], the red and black weights are the same throughout the entire tree, instead of being allowed to depend on v.We note a similar flexibility in assigning weights is used in Ref. [Tag22].)Using these weights we design a converting vector set to decide f : Lemma 20.Given a decision tree T that decides a function f : X → [m], for X ∈ [q] n , with weights r(e) ∈ R + for each edge e ∈ E(T ), then there is a converting vector set P that on input Proof.In this proof |u xj ⟩ and |v yj ⟩, with double subscripts, refer to converting vector sets, and u, v with single or no subscripts refer to vertices.We use essentially the same construction as in Ref. [BT20], but with a slightly different analysis because of our generalization to weights that can change throughout the tree.We will make use of the unit vectors {|μ i,d ⟩} i∈ [d] and {|ν i,d ⟩} i∈ [d] , defined in [BT20], which are scaled versions of the vectors in Eq. ( 31), which have the properties that ∀i ∈ First note that we can assume that on any input x ∈ X, for any index j, there is at most a single vertex on in path(T , x) at which j is queried.Otherwise the tree would query the same index twice, which would be a non-optimal tree.Then we define a converting vector set on and From Definition 9, Eq. 
( 4), we want that For evaluating a discrete function f , we have |ρ x ⟩ = |0⟩, and , then there must be some vertex in the decision tree at which path(T , x) and path(T , y) diverge.Let's call this vertex v * , and assume that J(v * ), the index of the input queried at vertex v * , is j * .Let (v * , u 1 ) be the edge on path(T , x) and (v * , u 2 ) be the edge on path(T , y).This means that x j * ∈ Q(v * , u 1 ), and In either case, we see from Eqs. (77) and (78) that Now for all other j ∈ [n] with j ̸ = j * , we have ⟨u xj |v yj ⟩ = 0, which we can prove by looking at the following cases: • There is no vertex in T where j is queried for x or y, which results in |v xj ⟩ = 0 or |u xj ⟩ = 0, and so ⟨u xj |v yj ⟩ = 0.
• The index j is queried for both x and y before their paths in T diverge, at a vertex v where the paths for both x and y then travel to a vertex u, in which case, • The index j is queried for both x and y after their paths in T diverge, at a vertex v 1 for x and a vertex v 2 for y, in which case ⟨u xj |v yj ⟩ = 0, since ⟨v 1 |v 2 ⟩ = 0.
Putting this all together for j = j * and j ̸ = j * , we have Now to calculate the positive and negative witness sizes.For the positive witness size, we have where in the second line, we have used Eq. ( 77) and that T will query each index at most once, according to the vertices that are encountered in path(T , x), and in the final line, we have used that ∥|μ u,|V (T )| ⟩∥ 2 , ∥|μ f (x),m ⟩∥ 2 ≤ 2.
For the negative witness size, note that again for input x, T will query each index of x at most once, according to the vertices that are encountered in path(T ).Then if index j of x is queried at vertex v, where (v, u) ∈ path(T , x), and c(v, u) = black then Thus In Theorem 21, we use Lemma 20 to derive average quantum and classical query separations based on classical decision trees.
Theorem 21.If T is a decision tree that decides f : X → [m] for X ⊆ [q] n , with optimal average classical query complexity for the distribution {p x } x∈X , and T has a coloring such that there are at most G red edges on any path from the root to a leaf, then the average quantum query complexity of deciding f (x) with bounded error is If it is possible to verify a potential output ŷ as correctly being f (x) using constant queries, then the average quantum query complexity of deciding f (x) with bounded error is The average classical query complexity of deciding f (x) with bounded error is Proof.The average classical query complexity comes from the fact that on input x, which occurs with probability p x , the algorithm uses |path(T , x)| queries, since each edge on the path of the decision tree corresponds to a single additional query.By assumption, T is optimal for the distribution {p x }, giving the complexity as in Eq. (88).
For the quantum algorithm, we will assign weights to each edge in T , and then use Lemma 20 to create and analyze a state conversion algorithm.Then we will then apply Theorem 16 and Theorem 19 to achieve better complexity on easier inputs.
For each black edge e in T let r(e) = G.For each red edge e, let r(e) = l(e), where l(e) is the number of edges on the path in T from the root to e, including e. Let P be the converting vector set from Lemma 20 that converts We first analyze w + (P, x).By Lemma 20, where in the second-to-last line, we've used that the total number of black or red edges in the path is |path(T , x)|.In the last line, we've used that the number of red edges on any path is at most G.This implies that W + (P) = O(nG).
Now to analyze w − (P, x). From Lemma 20, Now applying Theorem 16 with ε, δ = Θ(1) and $W_+(P) = O(nG)$ gives us a bounded error algorithm with an average query complexity of $O\left(\sqrt{G\,|\mathrm{path}(\mathcal{T}, x)|\log^3(n)}\right)$ on input x. On average over x ∈ X, we obtain an average query complexity of $O\left(\sum_{x\in X} p_x \sqrt{G\,|\mathrm{path}(\mathcal{T}, x)|\log^3(n)}\right)$. When there is a way to verify f (x) using a constant number of queries, we can apply Theorem 19 with δ = Θ(1) to give us a bounded error algorithm with an average query complexity of $O\left(\sqrt{G\,|\mathrm{path}(\mathcal{T}, x)|\log(n)}\right)$ on input x. On average over x ∈ X, we obtain an average query complexity of $O\left(\sum_{x\in X} p_x \sqrt{G\,|\mathrm{path}(\mathcal{T}, x)|\log(n)}\right)$.

We now use Theorem 21 to show an average quantum advantage for two problems related to searching: searching for two marked items in a list and searching for the first marked item in a list with two marked items:

Theorem 22. For the problem of finding 2 bits with value 1 in an n-bit string, there is a distribution for which there is an exponential advantage in average quantum query complexity over average classical query complexity. For the problem of finding the first 1-valued bit in an n-bit string with at most two 1-valued bits, there is a distribution for which there is a superpolynomial advantage in average quantum query complexity over average classical query complexity.
The proof uses a decision tree T that checks the n bits of the string in order until either one or two 1-valued bits are found.The tree for finding two 1-valued bits is shown in Fig. 2. Each time a 1-valued bit is found, the edge that the algorithm traverses is colored red.Then G(T ) = 1 or 2, depending on the problem, so Theorem 21 tells us the average query complexity will be small when the two marked bits occur early in the list, resulting in a short path for that input.We combine this idea with distributions that are modified versions of a power law distribution; this power law distribution is tailored to allow a quantum algorithm, which has at most a quadratic advantage on any particular input, but which only uses constant queries on the easiest inputs, to achieve an exponential/superpolynomial advantage on average [Mon10].
To obtain these results, we explicitly analyze a particular distribution of bit strings with Hamming weight 1 or 2, but we expect that similar techniques could be applied to additional distributions that include strings with larger Hamming weights.The bottleneck is not analyzing the quantum algorithm, but in proving properties of the optimal classical algorithm.The decision tree we use to design the quantum algorithm for finding two bits with value 1.Each vertex is labelled by its name (v i ) for some i, and J(v i ), which is the bit of the input that is queried if the algorithm reaches that vertex of the tree.Each edge (v i , v j ) is labelled by Q(v i , v j ), which is the set in curly brackets alongside each edge.The algorithm follows the edge (v i , v j ) from vertex v i if the value of the query made at vertex v i is contained in Q(v i , v j ).Each edge is also labelled by its weight, r(e), and is also colored red or black (and red edges are additionally rendered with dot-dashes.)Black edges all have weight G(T ), which in this case is 2. Each red edge has a weight that is equal to the number of edges on the path from the root v 1 to that edge, inclusive.The vertex v 1 is the root, and each leaf (denoted as a rectangular vertex) is labelled by the output of the algorithm on that input.
Proof of Theorem 22.For both problems (finding two marked items, or finding the first marked item), we consider the following distribution of n-bit strings.For the purposes of describing the distribution, for each string we label an index among {2, . . ., n} as the dividing index, although the position of this index is not known to us when we actually sample a string.Let E * i be the event that i is the dividing index of the sampled string.Then p(E * i ) ∝ (i − 1) k for a constant k such that −3/2 > k > −2.For a string with dividing index i, all bits with index greater than i have value 0, and exactly one bit with index less than i has value 1, chosen uniformly at random.In the case that we are trying to find two bits with value 1, bit at the dividing index also has value 1.In the case that we are trying to find the first bit with value 1, we set the bit at the dividing index to have value 1 with probability p + .In Appendix C Lemma 23, we prove that given these distributions and problems, querying the bits of the string in order solves these problems with query complexity within 1 query of the optimal classical strategy, and so analyzing the query complexity of an algorithm that queries the bits in order is sufficient for bounding the optimal asymptotic classical query complexity.
We first analyze the case of finding both 1-valued bits.That is, on input x, we want to evaluate the function f (x) = {i 1 , i 2 }, where i 1 and i 2 are the indices of the two bits of x that have value 1.
Consider a decision tree that queries the bits of the string in increasing order.This corresponds to a tree T as shown in Fig. 2, where |path(T , x)| = i, where i is the index of the 2 nd 1-valued bit in the string.We label an edge (v, v ′ ) of this tree as red whenever 1 ∈ Q(v, v ′ ); that is, edges that are traversed when a 1-valued bit is found are colored red.Then G(T ) = 2.
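The in-order tree and its coloring can be traced on a concrete input. The Python sketch below is illustrative: the function name and the tuple layout are hypothetical. It follows the weighting rule stated in the proof of Theorem 21 (black edges have weight G, and a red edge has weight equal to the number of edges from the root to that edge, inclusive) and lists the edges traversed on a bit string until the required number of 1-valued bits has been found.

```python
from typing import List, Sequence, Tuple


def in_order_path(x: Sequence[int], marked: int = 2) -> List[Tuple[int, str, int]]:
    """Edges followed by the in-order decision tree on bit string x.

    Each entry is (queried position, 1-based; edge color; edge weight).
    The walk stops once `marked` bits with value 1 have been seen, so
    `marked` plays the role of G(T) for this tree.
    """
    edges = []
    ones = 0
    for depth, bit in enumerate(x, start=1):
        if bit == 1:
            # Red edge: weight is its distance from the root, inclusive.
            edges.append((depth, "red", depth))
            ones += 1
            if ones == marked:
                break
        else:
            # Black edge: weight G(T).
            edges.append((depth, "black", marked))
    return edges
```

For x = 010010 the path has length 5, the index of the second 1-valued bit, and contains exactly two red edges with weights 2 and 5; every black edge on it has weight 2.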
For this problem, one can verify whether an output of the quantum algorithm is correct using an additional 2 queries; query the two output indices to ensure there is a 1 at each position, in which case, one knows with certainty that the output is correct.Otherwise, the output is incorrect.Thus by Theorem 21, since p(E * i ) is the probability of finding the 2 nd (final) 1-valued bit at index i, the average quantum query complexity is In Appendix C Lemma 24, we show that so the quantum query complexity of finding both 1-valued inputs is Since the optimal classical strategy (to within one query) is to query the bits of the string in order (see Appendix C Lemma 23), and such an algorithm will terminate after i queries if i is the dividing index, the asymptotic classical query complexity is Ω ( n i=r p(E * i )i), and in Lemma 24, we prove Thus we have an exponential quantum improvement, as the average quantum query complexity is O log 1/2 (n) compared to the classical Ω(n k+2 ), for −2 < k < −3/2.Now we consider the problem of finding the first 1-valued bit in the input string for the distribution described at the beginning of this lemma.Consider a decision tree that queries the bits of the string in increasing order.This corresponds to a tree T where |path(T , x)| = i, where i is the index of the first 1-valued bit in the string.We label an edge (v, v ′ ) of this tree as red whenever 1 ∈ Q(v, v ′ ); that is, edges that are traversed when a 1-valued bit is found are colored red.Then G(T ) = 1.However, now when this algorithm returns an index, we can no longer easily verify that it is the first 1-valued bit.From Theorem 21, if E † i is the event that the index of the first 1-valued bit is i, we have that the quantum query complexity is On the other hand, because the optimal classical strategy (to within one query) is to query the bits of the input string in order, the average classical query complexity is Ω i∈n p(E † i )i , since the algorithm will terminate after i 
queries if the first 1-valued bit is at index i. In Lemma 24, we show so the average classical query complexity is $\Omega(n^{k+2})$ and the average quantum query complexity is $O(\mathrm{polylog}(n))$, a superpolynomial improvement.
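The separation can be seen numerically for the power law p(E*_i) ∝ (i − 1)^k. The Python sketch below is an illustration with hypothetical function names: it compares the classical proxy Σ_i p(E*_i)·i with the quantum proxy Σ_i p(E*_i)·√i, deliberately omitting the factor G and the logarithmic factors that appear in Theorem 21, so it shows only the shape of the two sums rather than the stated complexities.

```python
from math import sqrt
from typing import Dict, Tuple


def dividing_index_probs(n: int, k: float) -> Dict[int, float]:
    """p(E*_i) proportional to (i - 1)**k for dividing indices i = 2, ..., n."""
    weights = {i: (i - 1) ** k for i in range(2, n + 1)}
    norm = sum(weights.values())
    return {i: w / norm for i, w in weights.items()}


def average_proxies(n: int, k: float) -> Tuple[float, float]:
    """Return (sum_i p_i * i, sum_i p_i * sqrt(i)).

    The first is the cost of querying bits in order until the dividing
    index; the second is a square-root proxy for the quantum cost that
    ignores G and log factors.
    """
    p = dividing_index_probs(n, k)
    classical = sum(pi * i for i, pi in p.items())
    quantum = sum(pi * sqrt(i) for i, pi in p.items())
    return classical, quantum
```

With k = −1.75, going from n = 100 to n = 10 000 multiplies the classical proxy by more than a factor of 2, in line with growth like n^(k+2), while the quantum proxy barely changes because Σ (i − 1)^k √i converges when k < −3/2.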

A Complements of Span Programs and Converting Vector Sets
We first prove that, given a span program that decides f : X → {0, 1}, we can create a span program that decides f ¬ while preserving witness sizes for each x.
Proof.We first define H ′ , starting from H ′ j,a : where H ⊥ j,a is the orthogonal complement of H j,a .We define H ′ j = a∈[q] H ′ j,a , and H ′ true = H f alse and H ′ f alse = H true .Then Let | 0⟩ be a vector that is orthogonal to H and V , and define V ′ = H ⊕ span{| 0⟩} and τ ′ = | 0⟩.Finally, set where Λ A is the projection onto the kernel of A, Π H is the projection onto H, and Let x ∈ X be an input with f (x) = 1, so x has a positive witness |w⟩ in P. We will show ω ′ = ⟨ 0| + ⟨w| is a negative witness for x in P † .Note ω ′ τ ′ = 1, and also, where in the second line, we have used that ⟨ 0|Λ A = 0 since Λ A projects onto the H subspace, and ⟨w| 0⟩ = 0 because | 0⟩ is orthogonal to H.The final line follows from [IJ19, Definition 2.12], which showed that every positive witness can be written as |w⟩ = |w 0 ⟩ + |w ⊥ ⟩, where |w ⊥ ⟩ is in the kernel of A and |w 0 ⟩ is orthogonal to the kernel of A.
Then ⟨w|Π H ′ (x) = 0, because |w⟩ ∈ H(x), and H ′ (x) is orthogonal to H(x), so ω ′ is a negative witness for x in P † .Also, ∥ω ′ A ′ ∥ 2 = ∥|w⟩∥ 2 , so the witness size of this negative witness in P † is the same as the corresponding positive witness in P.This implies w − (P † , x) ≤ w + (P, x).
If f (x) = 0, there is a negative witness ω for x in P. Consider |w ′ ⟩ = (ωA) † . Then where in the second line, we have used that (ωA) † is orthogonal to the kernel of A. Also, Π H(x) (ωA) † = 0, so |w ′ ⟩ ∈ H ′ (x). This means |w ′ ⟩ is a positive witness for x in P † . Also, ∥|w ′ ⟩∥ 2 = ∥ωA∥ 2 , so the witness size of this positive witness in P † is the same as that of the corresponding negative witness in P. This implies w + (P † , x) ≤ w − (P, x).

Now to show w + (P † , x) ≥ w − (P, x). Let |w ′ ⟩ be a positive witness for x in P † . Then since τ ′ = A ′ |w ′ ⟩, we have Since Λ A | 0⟩ = 0, we must have Λ A |w ′ ⟩ = 0, which implies |w ′ ⟩ = (ωA) † for some ω ∈ V . Plugging this into Eq. (111), and using Eq. (100), we have Thus we have ωτ = 1. Since |w ′ ⟩ is a positive witness for P † , Π H ′ (x) |w ′ ⟩ = |w ′ ⟩, which implies that Π H(x) (ωA) † = 0, or equivalently, ωAΠ H(x) = 0. Thus we see that ω must in fact be a negative witness for x in P. Therefore, w + (P † , x) ≥ w − (P, x).

Finally, if x has a positive witness in P, we have shown it has a negative witness in P † , and if x has a negative witness in P, we have shown it has a positive witness in P † . Thus P † does indeed decide f ¬ .
The next two proofs deal with manipulating converting vector sets.
so applying Theorem 10 gives the result.
Next we prove that given a converting vector set, we can design another converting vector set that converts between the same states, but with positive and negative witness sizes exchanged.
Lemma 13.If P converts {|ρ x ⟩} x∈X to {|σ x ⟩} x∈X , then there is a complementary converting vector set P † that also converts ρ to σ, such that for all x ∈ X and for all j ∈ [n], we have w + (P, x) = w − (P † , x), and w − (P, x) = w + (P † , x); the complement exchanges the values of the positive and negative witness sizes.
where we have used the fact that ρ and σ are Hermitian.

Thus the vectors (|v satisfy the same constraints of Eq. (4), and thus produce the same value in Eq. (5). However, now w + (P, x) = w − (P † , x), and w − (P, x) = w + (P † , x).

We perform Phase Checking on the state | 0⟩, so the probability of measuring the state |0⟩ B in the phase register is at least (by Lemma 1) the overlap of | 0⟩ and (normalized) |u⟩. This is
Then when we perform Phase Checking of the unitary U (P, x, α) to some precision Θ with error ϵ on state | 0⟩, by Lemma 1, we will measure |0⟩ B in the phase register with probability at most Now |v⟩ is orthogonal to the kernel of Ãα .(To see this, note that if |k⟩ is in the kernel of Ãα , then ⟨v|k⟩ = αω Ãα |k⟩ = 0.) Applying Lemma 4, and setting Θ = ϵ α 2 W − , we have To bound ∥|v⟩∥ 2 , we observe that where we have used our assumption that α 2 W − ≥ 1. Plugging Eqs. ( 119) and (120) into Eq.( 118), we find that the probability of measuring |0⟩ B in the phase register is at most 3 2 ϵ, as claimed.

C Analyzing the Distributions for Search Problems
In the lemmas in this section, we analyze the following distribution on n-bit strings with Hamming weight 2 or 1.Each string has a "dividing index" which is a bit position between 2 and n.We denote by E * i the event that i is the dividing index of the string, and we sample strings with dividing index i with probability p(E * i ) ∝ (i−1) k for k a constant −2 < k < −3/2.With probability p + the value of the bit at the dividing index itself is 1, and with probability 1 − p + it is 0. There is one index, chosen uniformly at random from among indices less than the dividing index, such that the bit at that index has value 1.All other bits in the string have value 0.
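The distribution just described is straightforward to sample. The Python sketch below is an illustration: the function name and argument order are hypothetical, and bit positions are 1-based in the description but stored in a 0-based list, so the bit at position i is `x[i - 1]`.

```python
import random
from typing import List, Tuple


def sample_string(n: int, k: float, p_plus: float,
                  rng: random.Random) -> Tuple[int, List[int]]:
    """Sample (dividing index d, n-bit string x) from the distribution above.

    d is drawn from {2, ..., n} with probability proportional to (d - 1)**k.
    Exactly one position below d is set to 1, uniformly at random; the bit
    at d is 1 with probability p_plus; all other bits are 0.
    """
    indices = list(range(2, n + 1))
    weights = [(i - 1) ** k for i in indices]
    d = rng.choices(indices, weights=weights)[0]
    x = [0] * n
    x[rng.randrange(1, d) - 1] = 1      # uniform over positions 1, ..., d - 1
    if rng.random() < p_plus:
        x[d - 1] = 1                    # bit at the dividing index
    return d, x
```

Setting `p_plus = 1` gives the Hamming-weight-2 strings used for finding both 1-valued bits, while smaller values of `p_plus` mix in strings of Hamming weight 1, as in the problem of finding the first 1-valued bit.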
Let E_i be the event that the bit at index i has value 1. Then note that, given the description above,
p(E_i | E*_j) = 1/(j−1) if i < j, p_+ if i = j, and 0 if i > j. (121)
Let A_n be the normalizing factor of p(E*_i), so p(E*_i) = A_n (i−1)^k.
Lemma 23. For the distribution described around Eq. (121), to find all 1-valued bits in the string, or to find the first 1-valued bit in the string, querying the bits of the string in order will result in an algorithm whose average query complexity is within one query of the optimal strategy.
Proof. We first show that as long as no 1-valued bit has yet been found in the string x, the most likely place for a 1-valued bit to be found is at the unqueried bits with the smallest or second smallest indices. Let S be the set of unqueried indices of x; as just stated, we know that x_i = 0 for all i ∉ S. Let s_l be the l-th smallest element of S.
We will show that for i, i′ ∈ S, if s_1 < i < i′ then
where p(E_i|S) is the probability that x_i = 1, given that S is the set of unqueried indices (and by assumption all other indices have been queried and found to have value 0). This implies that the most likely place to find a bit with value 1 among all unqueried indices except s_1 is s_2, regardless of which prior queries have been made. Summing over all possible locations of the dividing index, we have
We first analyze p(S|E*_j). When the dividing index is j, since a priori the first 1-valued bit is uniformly distributed among the j − 1 prior indices, the probability that all bits with index less than j, except those in S, have value 0 is
while the probability that x_j has value 0 is (1 − p_+). Since the value of the bit at the dividing index is independent of the values of the prior bits, we have
where the first inequality comes from replacing all (j − 1)^{k−1} and (i − 1)^{k−1} terms with the smaller term (i′ − 1)^{k−1}. Therefore, the probability of finding a 1-valued bit is always higher at smaller indices (up to the second smallest unqueried index).
Next we show p(E_{s_1}|S) ≥ p(E_{s_2}|S).
Note that p(E*_{s_1}|S) = 0, since all bits with index less than s_1 have been queried and found to have value 0, so s_1 cannot be the location of the dividing index because there must be a 1-valued bit with index smaller than the dividing index. Thus we modify Eq. ( 123
where in the second line, we have used that the second summation is non-negative, and the first summation only contains one term, j = s_2. In the final line, we used that |{l ∈ S : l < s_2}| = 1, since s_1 is the only element of S with value less than s_2. Note that there is only equality in the last line when p_+ = 1. Thus combining Eqs. (130) and (133), we have that for i > s_2,
where again the final inequality can only be tight if p_+ = 1.
Therefore the best strategy for a classical algorithm to find a 1-valued bit is to always query the first or second unqueried index until a 1 is found. If the initial 1 is found at the first unqueried index, then the algorithm has found the first non-zero bit. If the algorithm needs to find an additional 1-valued bit, the probability of finding a 1 at any later index is given by the power law distribution, so the best strategy is to query the remaining bits in order.
If the initial 1 is found at the second as-yet-unqueried index, then with one additional query, the algorithm can query the first as-yet-unqueried index to determine whether the first 1-valued bit is there. If it is, the algorithm has found all 1-valued inputs, including the first 1-valued input. If it is not, then the algorithm has found the first 1-valued input, and the probability of finding the next 1 at any later index is given by the power law distribution, so the best strategy is to query the remaining bits in order.
Thus, even if we find a 1 in the second unqueried position, an asymptotically optimal strategy is then to go back to the first unqueried position and query it, since that adds only one extra query to the complexity. This modified strategy, with the extra query added, has query complexity equal to or worse than that of querying the bits of the string in order. Thus, without loss of generality, we can assume that the optimal classical strategy to find either the first 1-valued bit or both 1-valued bits is to query the bits of the string in order, since no other strategy gains an asymptotic advantage (the difference is at most one query in the worst case).
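As a small illustration of the in-order strategy, the following Python sketch counts the queries needed to reach the first 1-valued bit and computes the exact average of that count over the distribution. The function names and the parameter values are assumptions made for this example. The average uses only the fact that, given dividing index j, the first 1-valued bit is uniform on indices 1 through j − 1.

```python
def queries_to_first_one(x):
    """Number of in-order queries made until the first 1-valued bit is seen."""
    for queries, bit in enumerate(x, start=1):
        if bit == 1:
            return queries
    raise ValueError("string contains no 1-valued bit")


def expected_in_order_queries(n, k):
    """Exact average of queries_to_first_one over the distribution.

    Given dividing index j, the first 1 is uniform on 1..j-1, so the
    in-order strategy uses (1 + (j - 1)) / 2 = j / 2 queries on average.
    """
    weights = {j: (j - 1) ** k for j in range(2, n + 1)}
    total = sum(weights.values())
    return sum((w / total) * (j / 2) for j, w in weights.items())


# Illustrative values only (assumed for this example).
print(queries_to_first_one([0, 0, 1, 0, 1]))
print(expected_in_order_queries(n=1000, k=-1.75))
```

This covers only the task of finding the first 1-valued bit; the cost of finding both 1-valued bits depends on how the algorithm decides to stop, which this sketch does not model.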
Lemma 24. Given the distribution of bit strings as described around Eq. (121), if E†_i is the event that i is the index of the first 1-valued bit, then
p(E*_i) i = Ω(n^{k+2}) (136)
Proof. We first bound A_n, where we recall that A_n is the normalization factor such that p(E*_i) = A_n (i − 1)^k and Σ_{i=1}^{n} p(E*_i) = 1. From [Mon10], we have that
Because −2 < k < −3/2, this tells us that A_n = Θ(1). We first prove Eq. (135). We have
For Eq. (136), we have
To prove Eq. (138), we first lower bound p(
where we have used Eq. (121) in the second line. Thus
To prove Eq. (137), we upper bound p(E†_i). We first do this for i > 1:
For i = 1, we obtain
We combine Eqs. (144) and (145) to get
proving Eq. (137).
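The claim that A_n = Θ(1) can be checked numerically with a short Python sketch. The function names and the choice k = −1.75 are assumptions made for this example. The sketch also tabulates the sum of p(E*_i)·i over all dividing indices, divided by n^{k+2}; we assume, based on the surviving fragment of Eq. (136), that this sum is the quantity being bounded there.

```python
def normalizer(n, k):
    """A_n such that the sum over i = 2..n of A_n * (i - 1)**k equals 1."""
    return 1.0 / sum((i - 1) ** k for i in range(2, n + 1))


def weighted_index_sum(n, k):
    """Sum over i = 2..n of p(E*_i) * i, with p(E*_i) = A_n * (i - 1)**k."""
    a_n = normalizer(n, k)
    return sum(a_n * (i - 1) ** k * i for i in range(2, n + 1))


k = -1.75  # any constant with -2 < k < -3/2 (assumed value for this example)
for n in (10**2, 10**3, 10**4, 10**5):
    ratio = weighted_index_sum(n, k) / n ** (k + 2)
    print(n, round(normalizer(n, k), 4), round(ratio, 4))
```

If A_n is Θ(1) and the weighted sum scales as n^{k+2}, both printed columns should stay within constant bounds as n grows.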

Lemma 15. Let the span program P decide the function f, and let C ≥ 2. Then for Phase Checking with unitary U(P, x, α) on the state | 0⟩_A |0⟩_B with error ϵ and precision Θ = √(ϵ/(α^2 W_−)),
1. If f(x) = 1 and α^2 ≥ C w_+(P, x), then the probability of measuring the B register to be in the state |0⟩_B is at least 1 − 1/C.
2. If f(x) = 0 and α^2 ≥ 1/W_−(P, f), then the probability of measuring the B register in the state |0⟩_B is at most (3/2)ϵ.
Note that if f(x) = 1 and α^2 < C w_+(P, x), Lemma 15 makes no claims about the output. To prove Lemma 15, we use techniques from the Boolean function decision algorithm of Belovs and Reichardt [BR12, Section 5.2] and Cade et al. [CMB18, Section C.2] and the dual adversary algorithm of Reichardt [Rei11, Algorithm 1].

) and accuracy ε2 to (|0⟩|ρ_x⟩)_A |0⟩_B and return the result
7 Measure |ψ⟩ in the standard basis to obtain a guess ŷ for f(x)
If ŷ passes the additional verification, return ŷ
Return "error"

Figure 2: The decision tree we use to design the quantum algorithm for finding two bits with value 1. Each vertex is labelled by its name (v_i) for some i, and by J(v_i), which is the bit of the input that is queried if the algorithm reaches that vertex of the tree. Each edge (v_i, v_j) is labelled by Q(v_i, v_j), which is the set in curly brackets alongside each edge. The algorithm follows the edge (v_i, v_j) from vertex v_i if the value of the query made at vertex v_i is contained in Q(v_i, v_j). Each edge is also labelled by its weight, r(e), and is colored red or black (red edges are additionally rendered with dot-dashes). Black edges all have weight G(T), which in this case is 2. Each red edge has a weight that is equal to the number of edges on the path from the root v_1 to that edge, inclusive. The vertex v_1 is the root, and each leaf (denoted as a rectangular vertex) is labelled by the output of the algorithm on that input.
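The labelling scheme in the caption can be made concrete with a small Python sketch of a weighted decision tree. Everything in it is hypothetical: the `Vertex` class, the helpers `add_edge` and `run`, and in particular the 2-bit tree built at the end, which is a toy example and not the tree of Figure 2. The sketch only mirrors the stated rules that black edges have weight G(T) = 2 and that a red edge has weight equal to the number of edges on the path from the root to that edge, inclusive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Vertex:
    name: str
    query_index: Optional[int] = None   # J(v): 1-indexed bit queried at v
    output: Optional[str] = None        # label of a leaf
    edges: List[Tuple[frozenset, str, "Vertex"]] = field(default_factory=list)


def add_edge(parent, values, color, child):
    """Attach the edge (parent, child) with Q(parent, child) = values."""
    parent.edges.append((frozenset(values), color, child))


def run(root, x, black_weight=2):
    """Follow the tree on input x; return the leaf output and the edge weights."""
    v, depth, weights = root, 0, []
    while v.query_index is not None:
        bit = x[v.query_index - 1]
        for values, color, child in v.edges:
            if bit in values:
                depth += 1
                # Black edges weigh G(T); a red edge weighs its depth from the root.
                weights.append(black_weight if color == "black" else depth)
                v = child
                break
        else:
            raise ValueError(f"no edge from {v.name} for query result {bit}")
    return v.output, weights


# Hypothetical 2-bit tree for illustration; NOT the tree of Figure 2.
leaf_none = Vertex("v5", output="no 1 found")
leaf_two = Vertex("v4", output="1 at index 2")
leaf_one = Vertex("v3", output="1 at index 1")
v2 = Vertex("v2", query_index=2)
v1 = Vertex("v1", query_index=1)
add_edge(v1, {0}, "black", v2)
add_edge(v1, {1}, "red", leaf_one)
add_edge(v2, {0}, "black", leaf_none)
add_edge(v2, {1}, "red", leaf_two)

print(run(v1, [0, 1]))
```

Which edges are red and which are black in the actual figure cannot be recovered from the caption alone, so the coloring in the toy tree is an arbitrary choice.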

Corollary 12. Let P be a converting vector set from ρ to σ with maximum positive and negative witness sizes W_+ and W_−. Then there is a quantum algorithm that on every input x ∈ X converts |ρ_x⟩ to |σ_x⟩ with error ε and uses O
Let P = (|u_xj⟩, |v_xj⟩)_{x∈X, j∈[n]}. We scale the vectors in P to create the converting vector set P′ with |v′_xj⟩ = |v_xj⟩ W − /W + and |u′_xj⟩ = |u_xj⟩ W + /W − . The converting vector set P′ still satisfies the constraints of Eq. (4), but now has maximum witness sizes W
Proof. Let P = (|v_xj⟩, |u_xj⟩)_{x∈X, j∈[n]}. For all x ∈ X and j ∈ [n], define yj ⟩ = (⟨u_yj|v_xj⟩)*. Since P satisfies the constraints of Eq. (4), Σ_{j∈[n]: x_j≠y_j} ⟨u^C_xj|v^C_yj⟩ = Σ_{j∈[n]: x_j≠y_j} (⟨u_yj|v_xj⟩)* = Σ_{j∈[n]: x_j≠y_j} (⟨u_yj|v_xj⟩)
where we have used Eq. (121) in the second line. Note that p(E_i|E*_j, S) takes the same value for all i ∈ S such that i < j, because the first 1-valued bit is uniformly distributed over all indices less than the dividing index. Thus when we analyze p(E_i|S) − p(E_i′|S), we get a cancellation of terms in the summation, giving us, for i > s_1,
p(E_i|S) − p(E_i′|S) = p_+ (p(E*_i|S) − p(E*_i′|S)) + |E * i ,S) and p(E_i′|E*_i′, S) with p_+ by Eq. (121). Using Bayes' Theorem, we can write p(E*_j|S) as p(E_i|S) = ) to get p(E_{s_1}|S) =
Then using a similar analysis as in Eq. (124), we have
p(E_{s_1}|S) − p(E_{s_2}|S) = A_n p(S) −p_+ |{l ∈ S : l < s_2}| (s_2 − 1)^{k−1}