Hamiltonian Simulation by Qubitization

Given a Hermitian operator $\hat{H}=\langle G|\hat{U}|G\rangle$ that is the projection of an oracle $\hat{U}$ by state $|G\rangle$ created with oracle $\hat{G}$, the problem of Hamiltonian simulation is approximating the time evolution operator $e^{-i\hat{H}t}$ at time $t$ with error $\epsilon$. We show that this can be done with query complexity $\mathcal{O}\big(t+\frac{\log{(1/\epsilon)}}{\log\log{(1/\epsilon)}}\big)$ to $\hat{G},\hat{U}$ that is optimal, not just in asymptotic limits, but for all values $t,\epsilon$. Furthermore, only $2$ additional ancilla qubits are required in total, together with $\mathcal{O}(1)$ additional single and two-qubit gates per query. Our approach to Hamiltonian simulation subsumes important prior art considering Hamiltonians which are $d$-sparse or a linear combination of unitaries, leading to significant improvements in space complexity, as well as a quadratic speed-up for precision simulations. It also motivates useful new instances, such as where $\langle G|\hat{U}|G\rangle$ is a density matrix. A key technical result is `qubitization' which uses controlled-$\hat{U}$ and controlled-$\hat{G}$ to embed $\hat{H}$ in an invariant $\text{SU}(2)$ subspace. A large class of operator functions of $\hat{H}$ can then be computed with optimal query complexity, of which $e^{-i\hat{H}t}$ is a special case.


Introduction
Quantum computers were originally envisioned as machines for efficiently simulating quantum Hamiltonian dynamics. As Hamiltonian simulation is BQP-complete, the problem is believed to be intractable by classical computers, and remains a strong primary motivation. The first explicit quantum algorithms were discovered by Lloyd [31] for local interactions, and then generalized by Aharonov and Ta-Shma [3] to sparse Hamiltonians. Celebrated achievements over the years [20,8,32,10,33] have each ignited a flurry of activity in diverse applications from quantum algorithms [26,19,21,15] to quantum chemistry [46,44,40,41,4,29,39]. In this dawning era of the small quantum computer [5,23], the relevance and necessity of space and gate efficient procedures for practical Hamiltonian simulation has intensified.
non-zero elements per row with max-norm Ĥ max , to achieve a quadratic improvement in sparsity

queries. A prominent open question featured in all these works was
whether the additive lower bound Ω t + log(1/ ) log log (1/ ) was achievable for any of these models. Recently [33], we found for the d-sparse model, which is popular in algorithm design by quantum walks [17], a procedure achieving the optimal trade-off between all parameters, with query complexity Θ dt Ĥ max + log (1/ ) log log (1/ ) . The strictly linear-time performance with additive complexity is a quadratic improvement for precision simulations t ∼ log (1/ ) log log (1/ ) , and the constant number of n + m + 3 ancilla qubits significantly improves on prior art which depends on t, like O log (t/ ) log log (t/ ) . The approach, based on Childs' [16,7,30] extension of Szegedy's quantum walk [43], required two quantum oracles: one accepting a j row and k column index to return the value of entryĤ jk to m bits of precision, and another accepting accepting a j row and l sparsity index to return in-place the l th non-zero entry in row j.
Unfortunately, the d-sparse model is less appealing in practical implementations for several reasons. First, it is exponentially slower than BCCKS when theÛ j are of high weight with sparsity O(2 n ). Second, its black-box oracles can be challenging to realize. Avoiding the O(2 n ) blowup by exploiting sparsity requires that positions of non-zero elements are efficiently row-computable, which is not always the case. Third, the Childs quantum walk requires a doubling of the n system qubits, which is not required by BCCKS. It was unclear whether our methodology could be applied to other formulations of Hamiltonian simulation, in contrast to alternatives that seem more flexible [12].
Ideally, the best features of these two algorithms could be combined (Table. 1). For example, given the decompositionĤ = d j=1 α jÛj , one would like the optimal additive complexity of sparse Hamiltonian simulation, but with the BCCKS oracles that are more straightforward to implement. Furthermore, one could wish for a constant ancilla overhead, of say log 2 (d) + 2, superior to either algorithm. These improvements would greatly enhance the potential of early practical applications of quantum computation.
We achieve precisely this optimistic fusion via an extremely general procedure, made possible by what we call 'qubitization', that subsumes both and motivates new formulations of Hamiltonian simulation, as captured by this main theorem: Theorem 1 (Optimal Hamiltonian simulation by Qubitization). Given a unitaryÛ on q qubits and state preparation unitaryĜ|0 = |G such that G|Û |G =Ĉ is a complex matrix on n ≤ q qubits, e −iĤt can be simulated for time t, error in trace distance, and failure probability O( ), with q + 2 qubits in total and Θ(N ) queries to controlled-Ĝ, controlled-Û , and their inverses, and O(N ) additional single and two-qubit primitive quantum gates wherê The optimality of the procedure follows by using the Childs quantum walk forĜ,Û . Furthermore, the transparent nature of Thm. 1 significantly expedites the development of new useful formulations of Hamiltonian simulation. For instance, we easily obtain a new result for the scenario whereĤ is a density matrixρ. Whereasρ can be produced by discarding the ancilla of some output from a quantum circuitĜ, we instead keep this ancilla, leading to an unconditional quadratic improvement in time scaling, and an exponential improvement in error scaling over the sample-based Lloyd, Mohseni, and Rebentrost (LMR) model [32,28], as summarized in Table. 1. This scenario could enhance recent advances, such as quantum semidefinite programming [15].
In fact, Hamiltonian simulation is an application of our main innovation: an approach we call the 'quantum signal processor', where the equation G|Û |G =Ĉ in Thm. 1 is interpreted as a non-unitary signal operator encoded in a subspace of an oracleÛ flagged by |G . Many problems, highlighted in Table. 2, can be of this form. Qubitization is the essential first step, where the oraclesĜ,Û are queried to obtain a Grover-like search parallelized over each eigenvalues λ of , of the search is implementable deterministically with O(1) queries, and resembles Szegedy's [43] and Childs' [16] quantum walk. However, the key difference lies in the extremely general encoding of the signal through anyĜ andÛ , instead of via oracles of the d-sparse formulation.
Unlike the Grover search, we do not seek to prepare some target state. Rather, we exploit Grover-like rotations, which are isomorphic to SU(2), to engineer arbitrary target functions f (λ) of its overlap λ. The quantum signal processor exploits this structure to attack the often-considered problem of designing a quantum circuitQ that queriesĜ,Û such that G|Q|G = f [Ĥ] for some target function f [·]. Whereas prior art based on the linear-combination of unitaries algorithm [30] required a detailed analysis for f on a case-by-case basis and post-selection leading to suboptimal scaling, the quantum signal processor computes f (Ĥ) with an optimal query complexity that exactly matches polynomial lower bounds for a large class of functions. We provide two methods: 'observable transformations' where f (λ), f (λ) are of equal parity, and 'quantum signal processing' 1 where f (λ), f (λ) are of opposite parity. Thus generic improvements to all applications in Table. 2 can be expected in query complexity, ancilla overhead, and scope of possible signal inputs. In particular, Thm. 1 follows directly from the query complexity of the choice f (λ) = e −iλt , which corresponds to applying t sin (·) on eigenphases of the iterate.
The implications of the quantum signal processor stretch beyond the applications of Table. 2. As Hermitian operators are also quantum observables, our methodology also offers a new approach for implementing arbitrary functions of measurement operators on quantum states [1,13,24,38]. We conjecture that this application would be optimal, which is supported by our optimal results for Hamiltonian simulation. In Sec. 2, we introduce the framework for the quantum signal processor, and provide an overview of the major problems it addresses. These are then developed in detail:

Overview of the Quantum Signal Processor
Since coherent quantum computation is restricted to unitary operations, one commonly finds a situation such as in where the signal operatorĤ is in general a complex matrix with spectral norm Ĥ = G|Û |G ≤ 1. Whenever the context is clear, we drop the ancilla and system subscripts. We representÛ such that the top-left block is preciselyĤ and acts on an input state |G ψ ≡ |G |ψ ∈ H G , whereas the undefined parts ofÛ transform |G ψ into some orthogonal state |G ⊥ ψ ∈ H G ⊥ of lesser interest. In the following, we consider the case whereĤ is a normal matrix, until otherwise stated. Thus the action ofÛ on |G ψ in Eq. 2 can be more easily understood by decomposing |ψ as a linear combination of eigenstatesĤ|λ = λe iθ λ |λ , where λ, θ λ ∈ R: Later on, operator functions indicated by [·] are defined by f [Ĥ] = λ f (λe iθ λ )|λ λ| for any scalar function f (·). For each eigenstate |λ , we also find it useful to define the subspace H λ = span{|G λ , |G ⊥ λ }. Note that λ H λ is strictly contained in H G ⊕ H G ⊥ = H a ⊗ H s when H a has dimension greater than two. We summarize these assumptions and resources, as well as other requirements as follows: Quantum Signal Processor: Assumptions: (1) The signal oracleÛ is an m qubit unitary; (2) the state oracleĜ is a dimension d unitary; (3) the signal state |G =Ĝ|0 is prepared byĜ from some standard computational basis; (4) the signal operatorĤ = G|Û |G is an n qubit normal operator. Resources : (1) Signal oracleÛ ,Û † , and their controlled versions; (2) state oracleĜ,Ĝ † , and their controlled versions; (3) comptational basis state |0 ; (4) m + 2 qubits; (4) arbitrary single and two-qubit gates.
In the simplest case, one might wish to applyĤ multiple times to generate higher moments. WhenĤ is proportional to a unitary, this corresponds to phase accumulation, which is essential to precision measurements at the Heisenberg limit. Alternatively, ifĤ is Hermitian, it is a quantum observable, thusĤ 2 would allow a direct estimate of variance, and so on. Unfortunately, the subspace H λ for each eigenstate |λ is not invariant underÛ in general. As a result, repeated applications in this basis do not produce higher moments ofĤ due to leakage out of H λ . The manner they do so depends on the undefined components ofÛ , must be analyzed on a case-by-case basis, and thus is of limited utility.
Order can be restored to this undefined behavior by stemming the leakage. The simplest possibility that preserves the signal operator of Eq. 2 replacesÛ with a unitary, the iterate, that on an input in λ H λ has the form Thus for eachĤ eigenstate,Ŵ is isomorphic to SU(2). More explicitly, In the following,Ŵ will always be applied to states in the subspace λ H λ , thus its action on states outside it need not be defined. The usefulness of this construct is evident; due to its invariant subspace, multiple applications of the iterate result in highly structured behavior. However, implementingŴ requires g[Ĥ], which appears difficult to compute efficiently in general. In prior art [22], this was approximated using phase estimation. We formalize 'qubitization' as the problem of findingŴ : Qubitization: Given an m qubit signal oracleÛ and dimension d signal state |G such that G|Û |G =Ĥ is an n qubit normal signal operator, create a unitaryŴ = G|Ŵ |G =Ĥ, and with an SU(2) invariant subspace containing |G .
Using the oraclesĜ,Û , and arbitrary unitary operations on only the ancilla register, we provide necessary and sufficient conditions, for the case of HermitianĤ in Thm. 2 for whenŴ can be implemented exactly using only one query toÛ . We then prove in Thm. 3 that by using the controlled-Û oracle, there always exists a quantum circuit with the same signal operator and satisfies these conditions. We describe a similar construction for normal operators in Appendix. A.
WhenĤ is Hermitian,Ŵ L produces Chebyshev polynomials A[Ĥ] = T L [Ĥ] [19]. We call any function [·] of the signalĤ target operators when they occur in the top-left block. The fact that Chebyshev polynomials are the best polynomial basis for L ∞ function approximation [37] suggests that the target operator A[Ĥ] + iB[Ĥ] could approximate any desired function with a judicious choice of controls on the ancilla register. As all Hermitian operators are also quantum observables, achieving this would be of great utility to designing measurements on quantum states. We call this the Observable transformations problem.
The two polynomials in our solution of the Observable transformations problem are of the same parity. Polynomials of opposite parity can be obtained by embeddingŴ , or evenŴ φ into yet another SU(2) invariant subspace. We call this the quantum signal processing problem, which we addressed previously in [33].
Quantum signal processing: Given a real function h : R → R and an m qubit oracleV = λ e iθ λ |λ λ|, create a unitaryV ideal = λ e ih(θ λ ) |λ λ| Inputs: (1) Controlled-V and its Hermitian conjugate; (2) an odd periodic function h : (−π, π] → (−π, π]; (3) target error > 0 in trace distance; (4) m + 1 qubits. These schemes for target operator processing would be augmented by any constructive approach to implementing someĜ,Û consistent with a desiredĤ -the linear-combination-of-unitaries algorithm [30], reviewed in Sec. 3.1, is one such possibility. In fact, as any complex operator H = d j α jÛj may be decomposed into a linear combination of unitaries, we conjecture that this combined with our algorithm for Observable transformations is an optimal approach to implementing arbitrary quantum measurements and their operator transformations. This claim is supported by its query complexity matching fundamental lower bounds in polynomial approximation theory, and our main result on Hamiltonian simulation.
We consider a general formulation of Hamiltonian simulation, outlined below. By applying our methods for target operator processing, an algorithm optimal in query complexity for all parameters values, and not just in asymptotic limits, is found.
Hamiltonian simulation: Given an m qubit signal oracleÛ and dimension d signal state |G such that G|Û |G =Ĥ is an n qubit Hamiltonian, create a unitary G|Ŵ ideal |G = e −iĤt for some t ∈ R.
Inputs: (1) Controlled-Û and its Hermitian conjugate; (2) simulation time t > 0; (3) target error > 0 in trace distance; (4) m + 2 qubits. Having motivated our perspective on the problem of qubitization in a quantum signal processor and its significance to designing arbitrary operator transformations, we present the solution.

Qubitization in a Quantum Signal Processor
This section describes, in three steps, qubitization: the process for creating the iterateŴ and an essential component in a systematic procedure for implementing operator transformations ofĤ. In Thm. 2, we provide necessary and sufficient conditions on whenŴ can be constructed from the oraclesĜ,Û . Then in Thm. 3, we show that anyĜ,Û not satisfying these conditions can be efficiently transformed into aĜ ,Û that do, and encode the same signal operatorĤ = G |Û |G . A constructive approach to implementing an encodingÛ for any desiredĤ is then reviewed in Sec. 3.1.
For now, we assume that |G is known. This soon proves to be unnecessary and only oracle access toĜ is required. Thus we must find a unitaryŜ , acting only on the ancilla register such that the iterateŴ =Ŝ Û of Eq. 4 is obtained. For the case of HermitianĤ|λ = λ|λ , we now determine necessary and sufficient conditions on whatŜ must be. AsŜ is otherwise arbitrary, we use without loss of generality the ansatz ofŜ being a product of a reflection about |G and another arbitrary unitaryŜ on the ancilla: Theorem 2 (Conditions on Hermitian qubitization). For all signal oraclesÛ that implement the signal operatorĤ, the unitaryŜ in Eq. 6 creates a unitary iterateŴ with the same signal operator in the same basis, but in an SU(2) invariant subspace containing |G if and only if G|ŜÛ |G =Ĥ and G|ŜÛŜÛ |G =Î.
Proof. In the forward direction, we assume Eq. 4, then compute and compare G λ |Ŵ |G λ = G λ |ŜÛ |G λ = λ. By Gram-Schmidt orthonormalization ofŴ |G λ , we also obtain the state As these must be true for all eigenvectors |λ , the conditions in Eq. 7 are necessary. That these are also sufficient follows from assuming Eq. 7 and a straightforward computation ofŴ |G λ andŴ |G ⊥ λ , which recovers Eq. 4.
In hindsight, these results are manifest. After all, G|ŜÛŜÛ |G =Î implies thatŜÛ is a reflection when controlled by input state |G , and it is well-known that a Grover iterate [25,45] is the product of two reflection about start and target subspaces. Nevertheless, the sufficiency of these conditions highlights that this is the simplest method to extract controllable and predictable behavior out ofÛ .
Unfortunately, depending on the details ofÛ , a solution to Eq. 7 may not exist. Thm. 2, amounts to choosingŜ such thatŜÛŜ is the inverseÛ † whist preserving the signal operator G|ŜÛ |G =Ĥ. Given thatŜ only acts on the ancilla register, it is hard to see how this is possible in general. The solution is to construct a different quantum circuitÛ that containsÛ but still implements the same signal operator, and crucially always has a solutionŜ. We now show how this can be done in all cases using only 1 query to controlled-Û and controlled-Û † .
Theorem 3 (Existence of Hermitian qubitization). For all m qubit signal unitariesÛ that implement the signal operator G|Û |G =Ĥ, there exists an m + 1 qubit quantum circuitÛ that queries controlled-Û and controlled-Û † once to implement the Hermitian component 1 2 (Ĥ +Ĥ † ) as the signal operator, such that the of conditions Eq. 7 can be satisfied.
where we have used the fact thatŜ|G = |G is an eigenstate, and thatŜ swaps the |0 , |1 ancilla states inÛ , thus transforming it into its inverse.
Even if we are givenÛ for which there is no solution to Eq. 7, we can always apply Thm. 3 to construct aÛ that does with minimal overhead. Furthermore our proof uses no information about the detailed structure of |G . Thus without loss of generality, we can assume that anyĜ,Û have already been qubitized.

ImplementingĜ,Û with a Linear Combination of Unitaries
Qubitization can be made fully constructive with an approach for implementing someĜ,Û that encode any desiredĤ. One option is provided by the Linear-Combination-of-Unitaries algorithm (LCU) [20,30], which underlies the BCCKS simulation algorithm. LCU is based on the fact that any complexĤ is a linear combination of some d unitary operators: where the upper bound on the spectral norm is α. Note that this bound depends on the choice of decomposition, but is tight for the some choice. Without loss of generality, all α j ≥ 0 by absorbing complex phases intoÛ j . The algorithm assumes that the α j are provided as a list of d numbers, and eachÛ j is provided as a quantum circuit composed of O(C) primitive gates. With these inputs, can be constructed, where the ancilla state creation operatorĜ|0 = |G is implemented with O(d) primitive gates, and the selectorÛ is implemented with O(dC) primitive gate. By direct expansion ofÛĜ|0 a |ψ s , this leads exactly to Eq. 2. Of course, the optimal decomposition that costs the fewest number of ancilla qubits and primitive gates may be difficult to find, and may not even fit naturally in this model, but LCU shows that implementing an encoding for anyĤ is possible in principle.

Operator Function Design on a Quantum Signal Processor
The purpose of the quantum signal processor is to transform the signalĤ into any desired target operator f [Ĥ] -the observable transformations and quantum signal processing problems. We present a systematic framework that furnishes the optimal complexity and a concrete procedure for almost any f and show how an exact connection is made between query complexity and the theory of best function approximations with polynomials [36,37]. Qubitization in Sec. 3 is the essential first step that makes this endeavor plausible, as evidenced by the highly structured behavior of the iterateŴ in Eq. 4, where for HermitianĤ, multiple applications elegantly generate Chebyshev polynomials T L [Ĥ] [19]. To go further, additional control parameters onŴ are necessary, and in the following, we only consider HermitianĤ. Thus we introduce the phased iterate with the same invariant subspace asŴ : Lemma 4. The phased iterateŴ φ =Ẑ φ−π/2ŴẐ−φ+π/2 , whereẐ φ = ((1 + e −iφ )|G G| −Î) is a partial reflection about |G by angle φ ∈ R, implements a relative phase between the |G λ and |G ⊥ λ subspaces. In block form,Ẑ Proof. By direct computation.
We provide algorithms for two large classes of transformations. 'Observable transformations' in Sec. 4.1 implement target operators where the real and imaginary parts have the same parity with respect toĤ. This is complemented by 'quantum signal processing' in Sec. 4.2 for target operators with opposite parity.

Observable Transformations
A sequence of L of phased iterates is a product of SU(2) rotationŝ whereÎ 2 ,σ x,y,z are the identity and Pauli matrices respectively in the |G λ and |G ⊥ λ basis, and (A, B, C, D) are operator functions ofĤ. As we can only prepare and measure states spanned by the |G λ , the observable transformations problem is concerned with the component G|Ŵ φ |G = A[Ĥ] + iB [Ĥ]. Any choice of phases φ ∈ R L generates sophisticated interference effects between elements of the sequence, leading to (A, B, C, D) with some non-trivial functional dependence on H. Though the dependence of the output on φ seems hard to intuit, they nevertheless specify a program for computing functions ofĤ, similar to how a list of numbers might specify a polynomial.
Remarkably, one can specify the desired output A[Ĥ] + iB[Ĥ], and then efficiently invert this specification to obtain its list of φ. Clearly not all choices are achievable, but the space of possibilities turns out to be large.
As polynomials form a complete basis on bounded real intervals, these results imply the query complexity of approximating any real function A[Ĥ] with error is exactly that of its best polynomial -approximation satisfying the constraints of Thm. 5, and similarly for the complex case.

Quantum Signal Processing
The solution to observable transformations in Thm. 5 implements a function ofĤ where the real and complex parts have the same parity. We now describe how complementary behavior where these parts of the target operator have opposite parity. The main results of this section are drawn from [33] and are included for completeness. Observe that the SU(2) invariant subspace of the iterate times a global phase e iΦŴ φ can be diagonalized to obtain with eigenvectors |G λ± = . Quantum signal processing allows us to approximatê where h is any odd real periodic function. This is implemented by a sequence +| bV ϕ |+ b of controlled- , whereσ x |± = ±|± , that acts on and is then projected on the state |+ b of the new ancilla qubit: ϕ 2 +πV ϕ 1 ,V ϕ = (e −iϕσz/2 ⊗Î as )V 0 (e iϕσz/2 ⊗Î as ). (16) Similar to Thm. 5, a large class of eigenphase functions can be approximated, but with one complication. Up to this point, all our constructions have been deterministic. In contrast, Quantum signal processing requires a probabilistic projection onto an ancilla state. However, this is not problematic as the failure probability can be made arbitrarily small. The query complexity of implementing this approximation is given by Theorem 6 (Quantum Signal Processing [33]). ∀ real odd periodic functions h : (−π, π] → (−π, π] and even N > 0, let (A[θ], C[θ]) be real Fourier series in (cos (kθ), sin (kθ)), k = 0, ..., N/2, that , one can efficiently compute the φ such that +| bV ϕ |+ b in Eq. 16 appliesV φ a number N times to approximateŴ ideal in Eq. 15 with trace distance +| bV ϕ |+ b −Ŵ ideal ≤ and success probability ≥ 1 − 2 .
This result complements Thm. 5 as the Fourier basis is also complete for periodic functions.

Application to Hamiltonian Simulation
With these results for qubitization and operator function design with a quantum signal processor, the application to Hamiltonian simulation follows easily. We complete the proof Thm. 1, and then apply these results to obtain our claims of improvements in Table. 1. Suppose we are given access to a unitaryĜ that prepares state |G =Ĝ|0 from some standard computational basis state |0 , and a unitaryÛ on q qubits such that G|Û |G =Ĉ implements a complex signal operator. By qubitizingĜ,Û with an additional qubit in Thm. 3, we obtain a modifiedÛ with signal operatorĤ = 1 2 (Ĉ +Ĉ † ). ThisÛ has an iterateŴ in Eq. 4 which can be multiplied by a global phase e iΦ to obtain e iΦŴ = e iΦ−iσy⊗cos −1 (Ĥ) in Eq. 14 to implement a nonlinear function ofĤ on its eigenstates. Hamiltonian simulation is accomplished by linearization this phase with a suitable choice of Φ and h with yet another qubit in Thm. 6 such that As done in [10,33], it is easily verified that one suitable choice is Thm. 6 requires an N/2 order Fourier approximation to e i sin (θ)t , which can be obtained by the truncating its Jacobi-Anger expansion e i sin (θ)t = ∞ k=−∞ J k (t)e ikθ [2], with error [10] where J k (t) is the k th Bessel function of the first kind. The rapid converge by truncation arises as e it sin (θ) is an entire analytic function [14]. Solving for N [33] then furnishes the number of queries toŴ required to simulate e −iĤt with error O( ) and failure probability O( ): This achieves the upper bound in Thm. 1. The tradeoff between , t, N in Eq. 19 is plotted in Appendix B, together with example phases ϕ implementing Eq. 18. To prove that it is optimal, we show that sparse Hamiltonian simulation is a special case.

Corollary 7 (Hamiltonian Simulation of a Sparse Hermitian Matrix). Given access to oraclê
O H |j |k |z = |j |k |z ⊕Ĥ j,k queried by j ∈ [2 n ] row and k ∈ [2 n ] column indices to return the valueĤ j,k = j|Ĥ|k , with maximum absolute value Ĥ max = max jk |Ĥ j,k |, and oracleÔ F |j |l = |j |f (j, l) queried by j ∈ [2 n ] row and l ∈ [d] column indices to compute in-place the column index f (j, l) of the l th non-zero entry of the j th row, time evolution byĤ can be simulated with for time t and error with O(dt Ĥ max + log (1/ ) log log (1/ ) ) queries toÔ H ,Ô F .
The query complexity in Eq. 19 for this sparse case exactly matches a lower bound based on simulating a Hamiltonian that solves PARITY with unbound error [8], valid for all parameter values, and not just in asymptotic limits [10,33]. This completes the proof of Thm. 1.
The case whereĤ decomposes into a linear combination of unitaries is an immediate application: Corollary 8 (Hamiltonian Simulation of a Linear Combination of Unitaries). Given access to an oracleĜ that prepares |G = d j=1 α j /α|j a , where α j ≥ 0, α = d j=1 α j and oracleÛ = d j=i |j j| a ⊗Û j , time evolution byĤ = d j=1 α jÛj can be simulated with for time t and error with O(αt + log (1/ ) log log (1/ ) ) queries toĜ,Û .
Proof. The corollary is proven from Thm. 1 if G|Û |G =Ĥ/α. This is true by direct computation.
An example of this algorithm this is considered in Appendix. C for simulating the dissociation of molecular hydrogen at chemical accuracy.
The intuitiveness of Thm. 1 allows us to swiftly devise new models of Hamiltonian simulation.
Corollary 9 (Hamiltonian Simulation of a Purified Density Matrix). Given access to an oracleĜ that prepares a purificationĜ|0 a = |G a = j √ α j |j a 1 |χ j a 2 of density matrixρ = Tr a 1 [|G G|], time evolution byρ can be simulated with for time t and error with O(t + log (1/ ) log log (1/ ) ) queries tô G.
Proof. The corollary is proved if we can findÛ such that Thm. 1 G|Û |G =ρ . We choose aÛ that swaps the system s and ancilla register a 2 . Let {|λ } be a complete basis on the system. By direct computation, |α j ||χ j s χ j |λ λ| s =ρ. (21)

Conclusion
Our general procedure for Hamiltonian simulation in Thm. 1 extends the scope of possible useful formulations of Hamiltonian simulation. As seen in Table. 1, it encompasses any case where the Hamiltonian is embedded in a flagged subspace of the signal unitary. Given this, a simulation algorithm with query complexity optimal in all parameters, and also not just in asymptotic limits, is easily obtained with minimal overhead. While this procedure contains and significantly improves upon important models where the Hamiltonian is d-sparse or a linear combination of unitaries, its greater value lies in illuminating an intuitive and straightforward path to other as-yet undiscovered models of Hamiltonian simulation. In particular, our result for time-evolution by a purified density matrix is a quadratic improvement in time and an exponential improvement in error over the sample-based model -the proof of which consisted of just a few lines. Many other exciting directions extend from this work. One example is how additional structural information aboutĤ [18] may be exploited. This is illustrated by when the spectral norm of Ĥ is smaller than the sum of coefficients α of a particular linear combination of unitaries decomposition. If this decomposition were to be used, simulation would take time O α Ĥ t + log (1/ ) log log (1/ ) -a factor α slowdown. In principle, an α = O(1) decomposition always exists, but this may be difficult to find. Furthermore, it is easy to construct pathologicalĤ with small norms, but nevertheless decompose by naive methods into components with large spectral norms [42]. Our approach offers a possible solution -one finds any projector for the Hamiltonian, rather than some specific decomposition into parts with properties dictated by the formulation. It remains an interesting challenge to identify the cases where, and determine how structural information may be incorporated.
These advances are special cases arising from our vision of the more general quantum signal processor. Through qubitization, structure is imposed onto any unitary process implementing some Hermitian signal operator. This structure allows for efficient processing of the signal, by the techniques of observable transformations and quantum signal processing, into some more desired form. As the query complexity of approximating any arbitrary target function of the signal exactly matches fundamental bounds lower in polynomial and Fourier approximation theory [37], we expect this to have numerous application in metrology [34] and other quantum algorithms [45].

A Qubitization of Normal Operators
The results of Thms. 3, 2 for qubitization can be extended to normal operators. It is well known that any normal matrix has a polar decomposition whereĤ U is unitary,Ĥ H is positive-semidefinite, and [Ĥ U ,Ĥ H ] = 0 commute, with eigenvalues ofĤ|λ = e iθ λ λ|λ , where λ ≥ 0, θ λ ∈ R. This reduces to a Hermitian operator whenĤ U has eigenvalues ±1, and reduces to a unitary operator when allĤ H has eigenvalues 1. The trivial approach to qubitization applies to any complex matrix. We simply use the construction of Thm. 3 to implement the Hermitian signal operator 1 2 (Ĥ +Ĥ † ).
Another possibility uses two phased iterates in an alternating sequence on input state |G a |ψ s , whereÛ + =Û andÛ − =Û † . For each eigenstate |λ , the separate iterates have the block form where the first column corresponds to input states {|G |λ , |G V − |λ , |G V + |λ }. The subspace spanned by these states is not invariant under any repeated application of an iterate of the same sign. However, the productŴ φ i −Ŵφ j + has an invariant subspace containing |G . With the understanding that we only consider alternating sequences, eachŴ φ± has the representation Note that when all eigenvalues λ are identical and φ = π/2, this reduces to Oblivious amplitude amplification [7], and we recover Hermitian qubitization when all θ λ = 0. While this approach uses one less ancilla qubit than the construction of Thm. 3, quantum signal processing can only be performed on controlled block of even lengthŴ φ± , as only they have an invariant subspace. This limitation can be relevant in some cases, such as Hamiltonian simulation where quantum signal processing is applied to a singleŴ φ .

B Practical Details for Implementing Hamiltonian Simulation
This appendix illustrates a specific application of the quantum signal processing approach to a signal unitary that encodes the HamiltonianĤ as a signal operator. In particular, a comparison of performance with the BCCKS approach is made. The details will be useful to readers interested in implementing our procedure on a quantum computer. These include plots for the exact error scaling Eq. 19 of the N/2 term Fourier approximation to e it sin (θ) in Fig. 1(Left), and the number of required queries per unit of simulated time in Fig. 1(Right). Furthermore, a table of select phases ϕ computed using the algorithm in [35] can be found in Table. 3.
In the interests of a fair comparison, Fig. 1(Right) counts the number of queries to the signal operatorÛ in theĤ = G|Û |G encoding.Û is not assumed to be qubitized, thus incurring a factor 2 additional cost from queringÛ andÛ † each over the asymptotic limit of 2 queries per unit of simulation time in Fig. 1(Left). Similarly, the BCCKS algorithm [9] incurs a factor 3 additional cost from queryingÛ twice andÛ † in their use of oblivious amplitude amplification. As BCCKS is known to be optimal in the regime t = O( log (1/ ) log log (1/ ) ), the improvement of our approach is most dramatic outside of it. In particular, the queries per unit time of BCCKS scales like O( log (t/ ) log log (t/ ) ), whereas our approach approaches 4 in the limit t → ∞.
The number of queries to in the BCCKS model is 3Kr, where K is the queries per segment e −iĤ log 2/r , r = t/ log (2) is the number of segments, and 3 is for oblivious amplitude amplification on each segment. K is chosen such that ∞ k=K+1 log k 2 k! ≤ r . An analysis of the procedure [9] shows that the trace distance of its simulated evolution is a factor O(1) 2 larger than . Thus Fig. 1 slightly overestimates BCCKS performance as we take directly to be the trace distance.

C Scalable Quantum Simulation of Molecular Energies
Recently, O'Malley et. al realized a breakthrough experimental quantum simulation of the dissociation of molecular hydrogen [39]. This featured the use of techniques believed essential for scaling quantum chemistry simulations to larger problem sizes. There, interactions between a minimal set of orbitals were mapped, through efficient classical pre-computation, to an effective Hamiltonian with d + 1 termsĤ j that captured the essential quantum dynamics: whereσ (j) k denotes a Pauli matrixσ k on the j th qubit, the α k are real coefficients that are a function of the bond length R, andĤ is in hartree units. One major goal of quantum chemistry is estimating energies at 'chemical accuracy'. This is = 1.6 × 10 −3 hartrees, at which point reaction rates at room temperature can be realistically predicted. In the following, we drop the identity term, as it only contributes a predictable global phase that can be subtracted from energy estimates, and all units are set to 1.
We now provide a comparison of gate counts required to simulate time evolution by Eq. 26 for time t = 1 = 625 with error to achieve chemical accuracy in quantum phase estimation. A rough estimate is provided for the linear-combination-of-unitaries approach in Cor. 8, and this is compared to best-case estimates of the number of exponentials required by first-order Trotterization of e −iĤt applied in [39], and upper bounds from higher-order Trotter-Suzuki formulas. Although the commonly used first-order Trotterization has a gate complexity that scales poorly like O(d 3 t 2 / ), compared to the O(d(αt + log (1/ ) log log (1/ ) )), where α = d j=1 |α j |, of our approach, it has small constant factors in best case scenarios, uses no ancilla qubits, and establishes a baseline.
The first-order Trotterization of e −iĤt is e −iĤ 1 t/n · · · e −iĤ d t/n n = e −iĤt + O The required number of Trotter steps n ≈ (dt Ĥ j ) 2 /(2 ) is estimated by solving = O (dt Ĥ j ) 2 /2n . This can be improved with higher-ordered Trotter-Suzuki product formulas [6] which bound the N t ϕ = (ϕ 1 , ϕ 2 , ..., ϕ N ) for h(θ) = t sin (θ) 2   . We also compute the true error of Eq. 27 T = max |ψ ((e −iĤ 1 t/n · · · e −iĤ d t/n ) n − e −iĤt )|ψ , and find the minimum number of exponentials N T = dn such that T = . This is the best-case estimate, as this computation is in general intractable for larger systems. Applying Cor. 8 requires the signal oracleÛ = d j=1 |j j| ⊗ sign(α j )Ĥ j . The signal state is |G = d j=1 |α j | α |j . It is straightforward to verify that these satisfy the conditions G|Û |G = H/α and G|Û 2 |G =Î of Thm. 2, thusÛ is already qubitized -this is a general feature of the Pauli basis, which is also unitary. We take the gate complexity of implementingÛ and the state oracleĜ to be ≈ d each. Thus the iterateŴ in Eq. 6 requires ≈ 3d gates. The number of iterates for quantum signal processing can be obtained from Fig. 1. In Eq. 26, α = O(1). Thus αt = α625 log (1/ ) is in the long time limit, so ≈ 2 iterates, or 6d gates per unit of simulation time is required, asÛ is aready qubitized. Thus the gate complexity of Hamiltonian simulation with a linear combination of untiaries is N QSP ≈ 6dαt.
These estimated gate costs for achieving chemical accuracy are plotted in Fig. 2 as a function of bond length R. The α j parameters as a function of R are drawn from Table. 1 of [39]. While the estimate N QSP is quite optimistic as it does not account for the constant factor costs of compiling multiply-controlled gates, even a factor of ∼ 20 overhead keeps it competitive with the best-case Trotterization, which requires inefficient pre-computation. In any case, given minimal information about the structure ofĤ, the gate cost of our approach is several orders of magnitude better than ancilla-free Trotter-Suzuki methods at chemical accuracy, and will only improve further in simulations for longer times, at greater precision, and with more terms. Bond Length (R) / Å Gate Cost Estimate Figure 2: Estimated gate counts for achieving chemical accuracy in simulating the hydrogen dissociation Hamiltonian of Eq. 26. A comparison is made between upper bounds N TS by ancilla-free Trotter-Suzuki methods (blue), a best-case first-order Trotterization N T given inefficient classical pre-computation (yellow), and a reasonable estimate of our algorithm in the quantum signal processing formulation N QSP (green).