Optimal Hamiltonian simulation for time-periodic systems

The implementation of time-evolution operators $U(t)$, called Hamiltonian simulation, is one of the most promising usage of quantum computers. For time-independent Hamiltonians, qubitization has recently established efficient realization of time-evolution $U(t)=e^{-iHt}$, with achieving the optimal computational resource both in time $t$ and an allowable error $\varepsilon$. In contrast, those for time-dependent systems require larger cost due to the difficulty of handling time-dependency. In this paper, we establish optimal/nearly-optimal Hamiltonian simulation for generic time-dependent systems with time-periodicity, known as Floquet systems. By using a so-called Floquet-Hilbert space equipped with auxiliary states labeling Fourier indices, we develop a way to certainly obtain the target time-evolved state without relying on either time-ordered product or Dyson-series expansion. Consequently, the query complexity, which measures the cost for implementing the time-evolution, has optimal and nearly-optimal dependency respectively in time $t$ and inverse error $\varepsilon$, and becomes sufficiently close to that of qubitization. Thus, our protocol tells us that, among generic time-dependent systems, time-periodic systems provides a class accessible as efficiently as time-independent systems despite the existence of time-dependency. As we also provide applications to simulation of nonequilibrium phenomena and adiabatic state preparation, our results will shed light on nonequilibrium phenomena in condensed matter physics and quantum chemistry, and quantum tasks yielding time-dependency in quantum computation.

The implementation of time-evolution operators U (t), called Hamiltonian simulation, is one of the most promising usage of quantum computers. For time-independent Hamiltonians, qubitization has recently established efficient realization of time-evolution U (t) = e −iHt , with achieving the optimal computational resource both in time t and an allowable error ε. In contrast, those for time-dependent systems require larger cost due to the difficulty of handling time-dependency. In this paper, we establish optimal/nearlyoptimal Hamiltonian simulation for generic time-dependent systems with timeperiodicity, known as Floquet systems. By using a so-called Floquet-Hilbert space equipped with auxiliary states labeling Fourier indices, we develop a way to certainly obtain the target time-evolved state without relying on either time-ordered product or Dyson-series expansion. Consequently, the query complexity, which measures the cost for implementing the time-evolution, has optimal and nearly-optimal dependency respectively in time t and inverse error ε, and becomes sufficiently close to that of qubitization. Thus, our protocol tells us that, among generic time-dependent systems, time-periodic systems provides a class accessible as efficiently as time-independent systems despite the existence of time-dependency. As we also provide applications to simulation of nonequilibrium phenomena and adiabatic state preparation, our results will shed light on nonequilibrium phenomena in condensed matter physics and quantum chemistry, and quantum tasks yielding time-dependency in quantum computation. chemistry, such as quantum dynamics [2,3,4] and quantum phase estimation [5,6,7,8]. For long time, Trotterization has been a standard way of Hamiltonian simulation for both time-independent and time-dependent systems, which can provide simple realization available in today's quantum computers [2,9,10,11,12]. Instead, it requires a huge number of elementary gates up to poly (1/ε) to achieve an acceptable error ε.
For time-independent systems, various efficient Hamiltonian simulation algorithms have appeared in the past decade, which yield fewer resources than Trotterization to implement the time-evolution U (t) = e −iHt [13,14,15,16,17,18]. Among them, the qubitization technique [18] achieves the best cost in that its query complexity (a measure of cost) has optimal scaling both in time t and inverse error 1/ε in an additive way. [18]. In contrast, for time-dependent systems, while there exist several efficient Hamiltonian simulation for constructing time-evolution U (t) [19,20,21,22,23,24], they largely rely on discretizing the time. Although the truncated Dyson-series algorithm [19,20], which is versatile among them, achieves the query complexity optimal or nearly-optimal both in t and 1/ε, its scaling is multiplicative with requiring much larger cost than qubitization. This originates from the difficulty of efficiently dealing with continuous-time modulation. It is nontrivial whether we can simulate time-dependent Hamiltonians with much fewer cost close to that of the qubitization.
Among time-dependent Hamiltonians, one of the most important targets of quantum simulation is a time-periodic Hamiltonian satisfying H(t + T ) = H(t) with T being a period. In fact, quantum systems under time-periodic Hamiltonians are called Floquet systems, and have been platforms of various nonequilibrium phenomena in condensed matter physics and quantum chemistry; for instance, they can host nonequilibrium phases of matter absent in time-independent systems, such as topological phases [25,26,27] and time crystals [28,29,30,31,32,33]. Time-periodic Hamiltonians also cover optical responses in solids and molecules [34,35,36,37], exemplified by high-harmonic generation and photo-chemical reactions. In addition, adiabatic quantum dynamics such as Thouless pumping [38,39,40] and adiabatic state preparation for quantum computation [41,6,42] can be regarded as a part of time-periodic Hamiltonian dynamics under sufficiently large period T . Despite fundamental significance and potential application of time-periodic Hamiltonians in broad fields, there is no Hamiltonian simulation protocol which can efficiently handle their time-dependency, while their simulation itself is possible by the truncated Dyson-series algorithm.
In this paper, we develop an optimal and/or nearly-optimal quantum algorithm for time-periodic Hamiltonian simulation. The key idea of our formalism is mapping timedependent systems to effective time-independent systems in the infinite-dimensional Floquet-Hilbert space, obtained by the Fourier series expansion in time. Although the infinitedimensionality prohibits its simulation on quantum computers, we resolve this by formulating the Lieb-Robinson bound [43], amplitude amplification, and the qubitization technique. Consequently, we can simulate the target time-evolved state with arbitrary small error and failure probability with a finite-dimensional Hilbert space by preparing ancillary states labeling Fourier indices.
The resulting query complexity for time-periodic Hamiltonian simulation has optimal and nearly-optimal scaling respectively in time t and inverse error 1/ε, and significantly, these contributions appear in an additive way. This implies that the time-periodic Hamiltonian simulation can be executed with the cost sufficiently close to qubitization, which provides the theoretically best scaling, despite the existence of time-dependency. We also note that the auxiliary states for Fourier indices accurately reproduce the exact dynamics with fewer degrees of freedom than those for discretized time, which is employed for generic time-dependent systems. This leads to smaller query complexity and simpler oracles of our algorithm compared to the truncated Dyson-series algorithm. In addition, we provide simulation of the Fermi-Hubbard model under light and adiabatic state preparation as potential applications. Thus, our protocol will shed light on the promising usage of quantum computers for nonequilibrium quantum many-body phenomena in condensed matter physics and quantum chemistry, and the optimal control in quantum computation.
The rest of this paper is organized as follows. In Section 2, we briefly review Floquet theory and the qubitization for this paper to be self-contained. We provide our main results in Sections 3-9, with firstly summarizing them in Section 3. In Section 4, we derive the Lieb-Robinson bound in time-periodic systems, and the way to accurately reproduce the exact dynamics with the Floquet-Hilbert space. Sections 5 and 6 are devoted to deriving the subroutines of the algorithm. They respectively provide the amplification protocol that can realize the time-evolved state with sufficiently high success probability (Section 5) and efficient implementation of the time-evolution operator in the Floquet-Hilbert space (Section 6). Section 7 completes the optimal time-periodic Hamiltonian simulation, and compares its resource with other algorithms for time-independent /dependent Hamiltonian simulation. We provide some promising applications in Section 8, and concludes our paper in Section 9.

Review on Floquet theory
We briefly review Floquet theory to analyze Schrödinger equation under a time-periodic Hamiltonian, Here, the time-evolved state of the system of interest is represented by |ψ(t) , and we assume that it is defined on a finite-dimensional Hilbert space H. The set of states denotes a certain choice of the basis of H. Floquet theorem says that the solution |ψ(t) is always written in the form of similar to Bloch theorem for spatially-periodic systems. Here, ε α ∈ [−π/T, π/T ) and |φ α (t) ∈ H are respectively called quasienergy and Floquet state. Time-periodicity of H(t) and |φ α (t) allows the Fourier series expansion as with the frequency ω = 2π/T . The hermiticity of H(t) implies H −m = H † m . We assume H(t) < ∞ ( · ; operator norm), which results in m∈Z H m 2 < ∞. Introducing an auxiliary degree of freedom {|l } l∈Z (sometimes called photon number) to relate |φ l α ↔ |l |φ l α , every pair of ε α , |φ α (t) can be obtained by the time-independent eigenvalue equation, with the effective Hamiltonian defined by Intuitively, this time-independent Hamiltonian H eff describes a static one-dimensional system, where each l-th site has potential energy H 0 − lω, as Fig. 1. Off-diagonal terms H −m represent hopping by −m sites, which can be recognized as either emission or absorption of |m| photons. We remark that the difficulty of time-dependence does not vanish by this mapping; it is translated into infinite dimensionality of the space spanned by {|l |ψ j | l ∈ Z, j = 1, 2, . . . , dim(H)}, which is called Floquet-Hilbert space or Sambe space [44]. While the effective Hamiltonian H eff is often used for identifying ε α , |φ α (t) , it can be employed also for directly determining the dynamics as follows [45]; We can extract the dynamics under a time-dependent Hamiltonian H(t) from that under the time-independent one H eff by preparing auxiliary systems labeled by |l . However, we note that applying optimal algorithms for time-independent systems to simulating this extended dynamics is not straightforward, since the Floquet-Hilbert space is infinitedimensional.

Review on qubitization technique
Here, we briefly review so-called qubitization technique to efficiently implement e −iHt for time-independent Hamiltonian H on the Hilbert space H [18]. It begins with assuming the existence of block-encoding, that is, we suppose an n a -qubit auxiliary state |G a (called an oracle state) and a unitary gate O on the space C 2 na ⊗ H (called an oracle gate) satisfying The inequality α ≥ H comes from the norm of the unitary gate as O = 1, and the parameter α represents a typical energy scale of the whole system. Here, the oracles O and |G a are supposed to be efficiently implemented; the unitary gates O and G, which realizes the oracle state from a reference state |0 a as G |0 a = |G a , require at most C ∈ poly (N ) elementary gates. The explicit construction of the oracles has been explored for certain classes of static Hamiltonians. For instance, when the Hamiltonian H is given by a linear combination of unitary (LCU) as a possible choice of the oracles is with α = jmax j=1 α j . The number of ancillary qubits n a scales as log j max . Since j max is poly (N ) for typical N -site systems, the number of elementary gates for the oracles amounts to C ∈ poly (N ). LCUs cover various types of Hamiltonians exemplified by quantum spin systems composed of local Pauli operators and fermionic systems in condensed matter physics and quantum chemistry [46]. The block-encoding of other Hamiltonians, such as sparse-access matrices and purified density matrices, has been also revealed [18,47,48].
Once we find out the oracles O and |G a , we can organize a unitary gate W q , implemented by O(1) additional ancillary qubits and O(q)-times usage of O, G, and O(n a ) elementary gates, which satisfies for every state |ψ ∈ H. Here, the error ε q comes from approximating e −iHt by a certain degree-q polynomial of H, and decays as ε q ∈ O((αt/q) q ). The scaling of q required for making the error ε q smaller than an acceptable error ε is given by the following formula; q ∈ Θ αt + log(1/ε) log(e + (αt) −1 log(1/ε)) . (12) In order to determine the explicit form of W q , we resort to quantum signal processing [17]. It dictates that only poly (q)-time classical computation is required for this purpose.
To summarize the qubitization technique, we require the following computational resources to implement the time-evolution operator e −iHt with an acceptable error ε (See also Here, the query complexity is defined by the complexity counted by the number of oracles, corresponding to q in this case, and gives the scaling of the number of elementary gates below. Significantly, the query complexity takes an additive form as The o(log(1/ε)) term scales as log(1/ε)/ log log(1/ε) under 1/ε → ∞ with fixed time t. Its scaling is known to be optimal both in t and 1/ε for simulating generic time-independent Hamiltonians [47]. In our algorithm for time-periodic Hamiltonians, we exploit qubitization as a subroutine to implement time evolution under an effective Hamiltonian in Floquet theory. We can achieve the query complexity for time-periodic systems in the additive form like Eq. (14), while the conventional truncated Dyson-series algorithm for generic time-dependent systems needs the query complexity in a multiplicative form αt × o(log(αt/ε)).

Summary of this paper
In this section, we overview our main results on efficient quantum simulation of timeperiodic Hamiltonians. We will provide their detailed derivation in the following Sections 4-7.

Overview of Algorithm
We first show an overview of the algorithm for time-evolution under a time-periodic Hamiltonian H(t + T ) = H(t). Throughout the main text, we assume that the Fourier components of the Hamiltonian H(t) vanishes at a certain cutoff m max ∈ O(1); (This discussion can be extended to cases where H m exponentially decays in |m|. See Appendix C.). The central attempt in the algorithm is to efficiently simulate Eq. (7), on quantum circuits. The time-dependency is erased by mapping the dynamics to the one on the Floquet-Hilbert space, which results in the following two benefits. First, we do not discretize the time with infinitesimal intervals. We instead rely on Fourier indices l ∈ Z, which are originally discrete, and actually they lead to much better accuracy with the same auxiliary degrees of freedom. Second, we can use various Hamiltonian simulation algorithms for time-independent systems. Since the qubitization technique has achieved the best query complexity in the time t and the inverse error 1/ε, exploiting it for H eff accelerate the simulation of time-periodic systems.
On the other hand, in order to simulate the time-evolved state |ψ(t) via Eq. (16) on quantum circuits, we also have several problems to be tackled. First, the Floquet-Hilbert space is infinite dimensional. We have to introduce truncation at a certain Fourier index l max , and consider a finite set of indices defined by Then, we focus on the approximate dynamics by The finite-dimensional space, spanned by 2l max × dim(H) states {|l |ψ j | l ∈ D lmax , j = 1, . . . , dim(H)}, is called the truncated Floquet-Hilbert space. While l max → ∞ reproduces |ψ lmax (t) → |ψ(t) , it is nontrivial how we should choose l max to achieve the accuracy |ψ(t) − |ψ lmax (t) ≤ ε. The second problem is the small success probability of postselecting the ancillary state. Even if we succeed in approximation with finite l max , Eq. (19) requires a projection to a unnormalized ancillary state l∈D lmax |l for the state in the truncated Floquet-Hilbert space, In the actual computation, we need post-selection to an ancillary state given by Success probability of the projection is given by a lmax |Ψ lmax (t) 2 ψ(t)|ψ(t) /(2l max ) = 1/(2l max ). As l max increases to ensure the accuracy, the expected time to successfully obtain |ψ(t) becomes longer in proportion to it. The final problem is about implementation of e −iH lmax eff t . While the effective Hamiltonian H eff is time-independent, its structure is complicated due to the additional degrees of freedom labeled by {|l }. It is nontrivial whether the optimal Hamiltonian simulation algorithm, i.e. the qubitization technique, provides the optimality for time-periodic systems.
Our algorithm relying on the truncated Floquet-Hilbert space efficiently simulates the time-evolved state |ψ(t) with resolving the above problems. The significant developments are composed of the following steps; (a) Decision of the Fourier index for truncation, l max , to achieve an allowable error ε. (c) Efficient implementation of exp −iH lmax eff t by the qubitization technique. In Step (a), we explicitly derive the upper bound of |ψ(t) − |ψ lmax (t) in a similar way to the Lieb-Robinson bound. We show that the choice of l max , based on is sufficient to make the error smaller than ε. Here, γ gives the scale of the time-dependent terms in H(t). The second step (b) plays a role in amplifying the success probability of the post selection from 1/(2l max ) to 1 − O(ε). The first protocol, which we call amplification by symmetry, exploits the symmetry of H eff which is always present and inherent in timeperiodic systems. It brings the success probability from 1/(2l max ) to 1/4 − O(ε) only with small cost of O(log l max ) elementary gates. Following this, we apply the oblivious amplitude amplification [16], reminiscent of Grover's search algorithm [49]. The success probability is further amplified from 1/4 − O(ε) to 1 − O(ε). Exploiting these two kinds of amplification, we can obtain the target state |ψ(t) only with a little additional resource that does not change the scaling. The final step (c) constructs conversion of the effective Hamiltonian so that exp −iH lmax eff t can be efficiently implemented. The original effective Hamiltonian H lmax eff is not suitable for the qubitization technique, since the additional degrees of freedom |l makes its block-encoding inefficient. To resolve this problem, we derive an alternative effective Hamiltonian such that the desired evolution exp −iH lmax eff t is accurately reproduced. The block-encoding for the alternative effective Hamiltonian requires O(1)-times queries to the oracles for each Fourier component {H m }, which is important for achieving optimal or nearly-optimal dependence in t and 1/ε.

Main results
Here, we summarize the computational resources for computing the dynamics |ψ(t) under a time-periodic Hamiltonian H(t). We construct two different quantum algorithms depending on the time scale of the dynamics. The first case is an adiabatic-like case, where we are interested in the dynamics over O(1) periods as ωt ∈ O(1). We call it "adiabatic" since adiabatic dynamics under sufficiently large period T is a typical target, while T is not required to be large. The other case is generic long-time regime, in which we focus on dynamics over multiple periods as ωt ∈ Ω(1). The algorithms for both cases follow Steps (a)-(c) of Section 3.1; the number of ancillary qubits is determined by the truncation order l max of Step (a), and the number of elementary gates comes mainly from the cost of implementing exp −iH lmax eff t in Step (c). We assume that the Fourier component H m becomes zero for |m| > m max with an O(1) constant m max . The block-encoding for each H m is supposed to be given by an oracle unitary gate O m and an n a -qubit oracle state |G m as Each oracle state |G m is generated by trivial states as |G m = G m |0 . We define the energy scale of the Hamiltonian H(t) by and suppose that {α m } m can be embedded into an O(1)-qubit quantum system by The query complexity is defined by the number of queries to the oracles O m , G m , and G coef , where each of them is supposed to require at-most C elementary gates. We summarize the computational resources for simulating |ψ(t) in the adiabatic-like regime and the generic long-time regime respectively by the following theorems.

Theorem 1. (Resource for adiabatic-like regime)
Suppose that we are interested in the time-evolved state |ψ(t) from an arbitrary initial state |ψ(0) over O(1) periods. The computational resources to obtain it with the acceptable error and the failure probability smaller than O(ε) are summarized as follows; • Number of ancillary qubits; n a + O(log(γt) + log log(1/ε)).
The query complexity has optimal scaling in the time t and nearly-optimal scaling in the inverse error 1/ε.

Theorem 2. (Resource for long-time regime)
Suppose that we are interested in the time-evolved state |ψ(t) from arbitrary initial states |ψ(0) over multiple periods ωt ∈ Ω(1). The computational resources to obtain it with the acceptable error and the failure probability smaller than O(ε) are summarized as follows; • Number of ancillary qubits; n a + O(log(γ/ω) + log log(ωt/ε)).
• Additional gates per query; O(n a + log(γ/ω) + log log(ωt/ε)) The scaling of the query complexity is optimal in time t for practical problems up to poly (N )-time, while it is formally nearly-optimal. It is nearly-optimal in the inverse error 1/ε.
The parameters α and γ, defined by Eqs. (25) and (23), respectively represent energy scales of the overall terms and the time-dependent terms in H(t). They typically scale as α, γ ∈ poly(N ) with the system size N . In contrast, the frequency ω = 2π/T is typically an O N 0 value much smaller than α and γ (The high-frequency cases where ω are comparable to or larger than α, γ are trivial, as discussed in Appendix D

Building an appropriate truncated Floquet-Hilbert space
This section is devoted to deriving the proper truncation order for the Floquet-Hilbert space, following (a) in Section 3.1. In order to achieve desirable accuracy for the exact time-evolved state with the truncated Floquet-Hilbert space, we should obtain an exact upper bound on its error. Here, we prove the Lieb-Robinson bound in the Floquet-Hilbert space in Section 4.1, and determine a proper truncation order based on it in Section 4.2.

Lieb-Robinson bound in Floquet-Hilbert space
The proper choice of the truncation order l max is determined so that the approximate state |ψ lmax (t) [See Eq. (19)] can reproduce the time-evolved state |ψ(t) with satisfying the relation |ψ(t) − |ψ lmax (t) ≤ ε. From the explicit formula Eq. (19), we can see that it is important to observe how the transition amplitude l|e −iH lmax eff t |0 behaves for sufficiently large l. We first show the upper bound on this transition amplitude in Theorem 3. Since it is reminiscent of the Lieb-Robinson bound in single-particle quantum systems [50], we call it the Lieb-Robinson bound in the Floquet-Hilbert space.

Theorem 3. (Bound on transition amplitude)
We assume H m = 0 for |m| > m max . Then, for indices l, l ∈ Z satisfying |l − l | ≥ 2m max γt, the transition amplitude is bounded by where the parameter γ is defined by Eq. (23).
Proof.-We will omit the superscripts l max for some operators introduced here since they are not important. We first employ the interaction picture. With the unitary operation defined by the time evolution operator e −iH lmax eff t is represented as Here, the Hamiltonian in the interaction picture is defined by where the summation in the second line is taken over l, m such that l, l +m ∈ D lmax . Using the Dyson series expansion, we obtain In the last equality, we employ the identity l i ∈D lmax |l i l i | = I for n − 1 times. The summation {l i } is taken over l i ∈ D lmax for i = 1, 2, . . . , n − 1 under the fixed l 0 = l and l n = l. Each product n i=1 l i |H I (t i )|l i−1 represents a complex transition amplitude from |l to |l along the path |l → |l 1 → . . . → |l n−1 → |l under H I (t i ). Since the Hamiltonian H I (t i ) shifts the Fourier index |l i by at-most m max due to Eq. (18), the low order terms labeled by n < |l − l |/m max disappears. This results in Here, 0 . This leads to the upper bound of its operator as follows [51], We also use the following inequality; This relation can be easily confirmed by For the indices l, l satisfying |l − l | ≥ 2m max γt, we can substitute x = γt and n 0 = |l − l |/m max . This results in the bound, The inequality from the Stirling formula, ensures the inequality Eq. (36).
The Lieb-Robinson bound in the transition amplitude provides a guide for choosing a proper truncation order. By setting l = 0, Eqs. (35) and (36) say that the contributions to |ψ lmax (t) from indices satisfying |l| ≥ 2m max γt rapidly decay as l −l . Thus, O(γt) becomes a possible truncation order. However, we note that the change in l max affects |ψ lmax (t) also via the change in the support of the effective Hamiltonian H lmax eff . By taking it into account with using the Lieb-Robinson bound, we obtain the exact upper bound on the error between |ψ(t) and |ψ lmax (t) as follows.

Theorem 4. (Floquet-Hilbert space truncation)
We consider the approximate time-evolved state obtained from the truncated Floquet-Hilbert space, |ψ lmax (t) , given by Eq. (19). Then, its deviation from the exact one |ψ is bounded by if the truncation order l max satisfies l max ≥ 2m max γt.
Proof.-We evaluate the convergence of |ψ lmax (t) . For different orders l max , l max satisfying l max > l max , we compute the difference, The first error comes from truncating the order of the post-selected state |a lmax . Using Theorem 3 directly implies The summation ∞ l=lmax can be divided based on l (mod. m max ), and each summation corresponds to the left hand side of Eq. (45) with n 0 ≥ l max /m max . In other words, for l max ≥ 2m max γt, we obtain the following inequality, As a result, the first error ε 1 is bounded by 4m The second error ε 2 comes from truncating the order of the effective Hamiltonian. In a similar way to the proof of Theorem 3, each term is bounded by  Their differences survive only when the path goes across the domain D l max \D lmax , where these two Hamiltonians have different actions. In other words, low order terms with n < {(l max − 0) + (l max − |l|)/m max should vanish. Considering that the norm of H lmax eff (t) and H l max eff (t) is both bounded by γ, we obtain In the last inequality, we again use the relation Eq. (54). Taking the limit l max → ∞ for ε 1 + ε 2 reproduces the result of Theorem 4.

Truncation order of Floquet-Hilbert space
We hereby determine the truncation order l max so that |ψ lmax (t) can reproduce the exact time-evolved state with a desirable error up to O(ε). By using the inequality from the Stirling formula, Eq. (48) to Theorem 4, we obtain the error bounded by for l max ≥ 2m max γt. The truncation order l max is chosen so that the right hand side can be smaller than ε. To this aim, we should evaluate a function f ( which is known to be dealt with the Lambert W function W (x) satisfying W (x)e W (x) = x [52]. Here, we rely on the resulting proposition [47].
is monotonically decreasing in x ≥ κ/e, and satisfies the following inequality for 0 < η < 1; See Lemma 59 in the full version of Ref. [47] for the proof. Based on the above proposition, we choose l max by so that the error can be bounded from above as We note that this choice does not violate the condition l max ≥ 2m max γt, which is required for Theorem 4. The monotonicity of f (x) ensures |ψ(t) − |ψ lmax (t) ≤ ε under the above choice (The additional term m max is attached for later calculation, especially for Appendix B.1) and B.2).
Let us discuss how l max increases in the time t and the inverse error 1/ε. The form of the error in l max , given by Eq. (49), is the same as that for qubitization in the query complexity q, given by ε q ∈ O((αt/q) q ). Therefore, its scaling can be determined in a similar way, which results in To reproduce the truncated Floquet-Hilbert space, we should prepare an ancillary system labeled by {|l } l∈D lmax . The number of qubits for such ancillary system is at-most

Amplitude amplification of the dynamics
In this section, we show two kinds of amplification to achieve sufficiently high success probability of extracting the time-evolved state |ψ(t) .
As we stated in Section 3.1, the approximate state |ψ lmax (t) cannot be directly realized on quantum circuits. After simulating the dynamics in the truncated Floquet-Hilbert space to get |Ψ lmax (t) defined by Eq. (20), we make a projection to |a lmax . Although the resulting renormalized state is sufficiently close to |ψ lmax (t) and also the time-evolved state |ψ(t) as for l max given by Eq. (59), we should be careful of the low success probability; it becomes We need approximately O(γt)-times trials of the post selection, and every trial is expected to require at-least O(t) complexity for implementing |Ψ lmax (t) = e −iH lmax eff t |0 ⊗ |ψ(0) . Therefore, the naive implementation based on Eq. (19) is not efficient for the time-evolved state |ψ(t) in that the expected computational time reaches O t 2 .
We resolve this problem by the amplification of the success probability up to 1−O(ε) in the next section. The first one, which exploits the symmetry of the effective Hamiltonian H eff , amplifies from O l −1 max to O(1). The latter one following this, which is reminiscent of the Grover's search algorithm, allows the success probability 1 − O(ε). As discussed later, using only either one fails to efficiently compute |ψ(t) .

Amplification by symmetry
We introduce the amplification exploiting the symmetry of the effective Hamiltonian H eff . First of all, we specify the symmetry here; it is about the translation of the photon number from |l to |l + m . When we define the translation operator on the Floquet-Hilbert space by T m = l∈Z |l l + m| ⊗ I, H eff satisfies the translation symmetry, Since the symmetry is present as long as the time-periodicity H(t + T ) = H(t) holds, the amplification discussed here is always available in our algorithm.
Here, we provide two tasks for the amplification by symmetry. First, we slightly extend the truncated Floquet-Hilbert space to C 8lmax ⊗ H, where the Fourier index |l is chosen The number of additional qubits required for this extension from C lmax ⊗ H is three, and hence this does not affect the scaling of the computational resources. The second task is to preprocess the initial state by where the unitary operator U lmax ini is expressed by The unitary operator U lmax ini nontrivially acts only on the ancillary space, and it can be implemented with O(log l max ) gates (e.g. for l max such that log 2 l max ∈ N, we have U lmax ini = Had ⊗ log 2 (2lmax) ⊗ I with the Hadamard gate Had). This process plays a role of making the initial state approximately translation-invariant with the width of Fourier indices, 2l max . The remaining procedures after the above two tasks is the same as those for the original protocol in Section 3.1. Reflecting that the ancillary Hilbert space is extended to C 8lmax , we evolve the above uniform state |a lmax |ψ(0) by H 4lmax eff and l∈D 4lmax lω |l l| ⊗ I, and make a projection to |a 4lmax . The resulting state |ψ lmax (t) is calculated as follows.
The point of this process is that the above state |ψ lmax (t) can reproduce |ψ(t) with O(1) amplitude owing to the translation symmetry of the effective Hamiltonian, Eqs. (64) and (65). We give its brief explanation in this section, while the rigorous derivation is provided in Appendix B.1. Although we do not have the exact translation symmetry represented by Eqs. (65) due to the finite-size effect of l max , it is expected to approximately hold as if the truncation order l max is sufficiently large. As a matter of fact, we can derive the exact upper bound on the difference between the left-and right-hand sides based on the Lieb-Robinson bound (See Lemma 8 in Appendix B.1). We proceed the discussion with assuming the approximate relation Eq. (69) here. The resulting state of the process |ψ lmax (t) can be roughly computed as follows; The second equality comes from Theorem 4, considering that the summation of l − l over l ∈ D 4lmax is sufficient to suppress the error up to O(ε). As a result, we obtain Let us focus on the success probability of the projection onto |a 4lmax . It is provided by ψ lmax (t)|ψ lmax (t) = 1/4 + O(ε), which is much larger than the original one 1/(2l max ).
Therefore, the amplification protocol by symmetry enables us to avoid O(t)-times repetition of the time evolution e −iH eff t in contrast to the original protocol. Intuitively, this drastic improvement can be understood as a result of the interference under the translation symmetry, as Fig. 2. As we can see from Eq. (20), the target state |ψ(t) is extracted from |Ψ lmax (t) via the accompanying ancillary state |a lmax , which is uniform in the Fourier index |l . When we begin with the non-uniform initial state |0 |ψ(0) , the resulting state after the time evolution is also non-uniform. Since it involves (1/ √ 2l max ) l∈D lmax e ikl |l for k ∈ (2π/2l max )Z (different eigenstates of the translation operator) with approximate equal weight, the amplitude of the desirable component |a lmax is comparably small as 1/ √ 2l max . In contrast, when we employ the uniform initial state |a lmax |ψ(0) , only the uniform components that includes |a lmax are amplified while the other components cancels one another. It can be viewed also as the interference of different initial states |l |ψ(0) in |a lmax |ψ(0) as shown in Fig. 2, and this is why the amplification protocol achieves O(1) amplitude of |a lmax |ψ(0) . We summarize the amplification by symmetry with adding the exact results obtained in Appendix B.1. We prepare the truncated Floquet-Hilbert space C 8lmax ⊗ H, and make the initial state uniform as |a lmax |ψ(0) . As a result, this process is described by a unitary gate where the last unitary gate (U 4lmax ini ) † is added to replace the projection to |a 4lmax by the one to |0 . This approximately realizes the time-evolved state |ψ(t) with 1/4 + O(ε) success probability as The probability 1/4 comes from the ratio of the width of the initial state |a lmax to that of the projected ancillary state |a 4lmax . The exact version, derived based on the Lieb-Robinson bound in Appendix B.1, is stated as follows, and ensures the validity of the discussion here.
We finally note that the amplification solely by this process cannot achieve the success probability 1−O(ε) with keeping the efficiency. Analogous to the above formulation, when we set the widths of the supports of the initial ancillary state, the effective Hamiltonian, and the projected ancillary state to pl max , ql max , ql max (p, q ∈ N, q ≥ p + 2) respectively, the success probability becomes p/q. However, the probability larger than 1 − ε demands the relation, This implies p ≥ 2(1 − ε)/ε, and hence the number of ancillary qubits reaches at-least log(pl max ) ∈ O(log(γt/ε)). In addition, the query complexity includes a term proportional to the dimension of the ancillary system, ql max as discussed later in Section 6. This implies that the query complexity increases linearly in 1/ε, destroying the original logarithmic scaling. Thus, relying solely on this amplification protocol is not suitable for efficient simulation of the dynamics. We resolve this problem by the oblivious amplitude amplification below.

Oblivious amplitude amplification
The above amplification based on the translation symmetry of H eff enhances the amplitude of |ψ(t) from 1/ √ 2l max to 1/2−O(ε). Here, we introduce another amplification, called the oblivious amplitude amplification [16], to achieve the amplitude (or the success probability) The starting point of this amplification is the result of the previous section 5.1. Equation (73), or equivalently Theorem 6, indicates that the consequence of the amplification by symmetry is written as with an additional term |Ψ ⊥ ∈ C 8lmax ⊗ H satisfying The state |Ψ ⊥ generally depends on |ψ(0) and t. The oblivious amplitude amplification takes a similar strategy to that of the Grover's search algorithm, in which we compose the following two unitary operators; The first one R reverses the sign of |l for l = 0, and it is implemented with O(log l max ) gates. The second one U lmax amp2 (t) plays a role in enhancing the amplitude of |ψ(t) up to 1 − O(ε). Its action on any initial state |0 |ψ(0) is computed as follows; In the second equality, we use the relation obtained by applying (U lmax amp1 ) † to Eq. (76). Next, we evaluate Theorem 6 indicates that the time evolution operator U (t) is approximated as and hence the relation is also satisfied. This provides the relation, and substituting this into Eq. (80) results in for an arbitrary initial state |ψ(0) . This result indicates that the operation U lmax amp2 (t) generates the time-evolved state |ψ(t) with the amplitude 1 − O(ε). Or equivalently, as an exact bound for the error, we can derive the inequality, The coefficient comes from the fact that an error bounded by ε/3 appears due to Theorem 6 every time we call Eq. (76). The amplification protocol U lmax amp2 (t) employs 3 times queries to U lmax amp1 (t). Reflecting that the operations U lmax ini and R require relatively a little resource (at-most O(log l max ) elementary gates and complexity), the resource for U lmax amp2 (t) has the same scaling as the one for implementing the time evolution operators e −it l lω|l l| and e −iH 4lmax eff t . As well as the amplification by symmetry, relying only on the oblivious amplitude amplification fails to efficiently enhances the success probability to obtain |0 ⊗ |ψ(t) from O l −1 max to 1 − O(ε). When we do not use the first amplification, the protocol of the oblivious amplitude amplification is given bỹ where the operation U lmax orig represents the time evolution without the amplification by symmetry, defined by ApplyingŨ lmax amp2,p to the initial state |0 ⊗ |ψ(t) returns |0 ⊗ |ψ(t) whose amplitude increases from 1/ √ 2l max approximately in proportion to p. The integer p should be O √ l max to achieve the amplitude 1 − O(ε), reminiscent of the Grover's search algorithm. In other words, O √ l max times call of e −it l lω|l l| and e −iH lmax eff t is required when we useŨ lmax amp2,p . This is why we suggest the combination of the two amplification protocols, with which O(1) times call of e −it l lω|l l| and e −iH lmax eff t can amplify the amplitude of |0 ⊗ |ψ(t) from 1/ √ 2l max to 1 − O(ε).

Block-encoding of Effective Floquet Hamiltonian
In the previous section, we show that the combination of the two kinds of amplification protocols, implemented by U lmax amp2 (t) [See Eq. (86)], provides the target time-evolved state |ψ(t) with an arbitrarily small error O(ε). The computational resource for U lmax amp2 (t) is mostly determined by that for e −it l lω|l l| and e −iH 4lmax eff t . Our strategy is to implement these two time evolution operators by the qubitization technique [18]. As discussed in Section 5.2, the computational resources are determined by how to introduce the blockencoding of the two static Hamiltonians, H lmax LP = l∈D lmax lω |l l| ⊗ I (linear potential Hamiltonian) and H lmax eff (effective Hamiltonian). The aim of this section is to obtain an efficient block-encoding of them and to evaluate the costs required to implement e −it l lω|l l| and e −iH 4lmax eff t .

Block-encoding of linear potential Hamiltonian
We compose block-encoding of the linear potential Hamiltonian, By simple calculation, it can be written in the form of an LCU, Using the block-encoding formalism of LCUs by Eq. (10), we immediately obtain the oracle unitary gate and the oracle state by The subscript b for the states |l b and |a 4lmax b represents a new ancillary system introduced for the block-encoding, requiring the number of qubits n b ∈ O(log l max ). Combining Eqs.
To implement this oracle, we use a comparator defined by with a single-qubit ancillary system b [53]. We can immediately confirm the relation, for arbitrary inputs l, l ∈ D 4lmax and |ψ ∈ H, where Z b denotes a Pauli Z operator • Scaling of overall complexity;

Block-encoding of effective Hamiltonian
We hereby provide a way to efficiently implement e −iH 4lmax eff t by the qubitization technique. We find out that naive block-encoding of the effective Hamiltonian H 4lmax eff , given by Eq.  First, we specify the assumption for the time-periodic Hamiltonian H(t). Here, we suppose two kinds of the oracles that can be accessed. The first one is about the blockencoding of each Fourier component H m (|m| ≤ m max ), given by with an n a -qubit ancillary system. Each oracle unitary gate on C 2 na ⊗ H is represented by O m , and each oracle state |G m a ∈ C 2 na is generated by a quantum circuit G m as |G m a = G m |0 ⊗na . The second oracle is a query to coefficients of Fourier components, represented by α m . We define the oracle unitary circuit G coef , acting on C 2mmax+1 [i.e. O(1)-qubit system], by Implementation of the oracle G coef is equivalent to embedding probability distributions in quantum states [54], which also appears as the oracle state for the qubitization technique in the case of a LCU [See Eq. (10)]. Stimulated by its various applications covering linear equation solver [55] and quantum singular value transformation [47,48], there have been various efficient implementations [56,54,46,57,58]. Here, we suppose that the number of elementary gates required for the oracles O m , G m , and G coef is at-most C. As discussed later in Section 8, the form of time-periodic Hamiltonians H(t) designated by Eq. (98) is reasonable in that they involve various important classes such as LCUs.

Construction of a refined effective Hamiltonian
Let us compose a refined effective Hamiltonian, which accurately reproduces the dynamics of e −iH 4lmax eff t . To come to the point, such a Hamiltonian equipped with suitability for efficient block-encoding is given by The integer l ⊕ m ∈ D 4lmax is defined modulo 8l max . The additional terms in H 4lmax eff,pbc connects the boundary of D 4lmax so that the hopping terms, |l l + m|, induced by H −m can be translation-invariant under the shift of |l . The subscript "pbc" comes from the periodic boundary conditions (PBC) for the hopping terms.
In our algorithm, we employ the time evolution e −iH 4lmax eff,pbc t instead of the original one e −iH 4lmax eff t , which is derived by Floquet theory. In other words, we organize the amplification protocols with the refined effective Hamiltonian H 4lmax eff,pbc as and apply them to the initial state |0 |ψ(0) . The refined effective Hamiltonian is valid in a sense that it can also provide the exact time-evolved state as The rigorous upper bound on the error is provided by the following theorem.
We deliver its detailed proof in Appendix B.2, and instead we briefly explain why the refined effective Hamiltonian is valid. The proof relies mainly on the Lieb-Robinson bound, stated by Theorem 3. Focusing on the amplification protocols U lmax amp1 (t) and U lmax amp1,pbc (t), the resulting difference caused by them originates from their actions on the uniform initial state |a lmax |ψ(0) , as Eq. (72). As we can see from Fig. 2 (b), the Lieb-Robinson bound dictates that the dynamics during U lmax amp1 (t) is almost closed within −2l max l 2l max in the Fourier indices. In contrast, the support of the additional terms of the refined effective Hamiltonian in Eq. (101) is located at {|l } l for l ±4l max , which is sufficiently far from the Fourier indices relevant for the dynamics. As a result, the refined protocol U lmax amp1,pbc (t) transforms the state |a lmax |ψ(0) in almost the same way as the original one U lmax amp1 (t), and hence it also outputs the target state |ψ(t) as Eq. (105). Discussion similar to Section 5.2 ensures that the oblivious amplitude amplification under the refined effective Hamiltonian, represented by U lmax amp2,pbc (t), provides |ψ(t) with a sufficiently small error O(ε) as Eq. (106).

Block-encoding of a refined effective Hamiltonian
The benefit of employing the refined effective Hamiltonian where we omit the identity operators. We also provide a state |G 4lmax In the above formula, we implicitly include ancillary qubits to efficiently implement the oracles [e.g. the comparator for O 4lmax LP in Eq. (95)]. The number of such ancillary qubits is smaller than O(log l max ), and hence we neglect it below. We can confirm that they provide the block-encoding of H 4lmax eff as follows; with using the quantum circuit G 4lmax eff and the state |w d defined by The quantum circuit G 4lmax eff can be composed of O(m max C) elementary gates. The cost of the state preparation |w d is negligible compared to others since it requires only a singlequbit rotation. The cost for preparing |a 4lmax b is at-most O(log l max ) complexity and elementary gates. To summarize, the cost for the oracles O 4lmax eff and |G 4lmax eff , which embody the refined effective Hamiltonian H 4lmax eff,pbc via Eq. (111), is at-most O(m max (C + log l max )) elementary gates. The additional gates per query, other than the block-encoding, amounts to O(n a + log(l max )).
As a result, preparing a unitary circuit corresponding to e −iH 4lmax eff t with an allowable error up to O(ε) is summarized as follows; • Number of ancillary qubits; n a (l max ) = n a + O(log l max + log m max ). (115) • Scaling of query complexity q(l max ); • Number of overall gates; Since m max is supposed to be a O(1) constant, we will omit it from the cost in the rest of the paper.

Algorithm and Its computational cost
This section provides the main result of this paper; we compose the efficient quantum algorithm for simulating the time-evolved state |ψ(t) = U (t) |ψ(0) under the Hamiltonian H(t + T ) = H(t). We note that we take two different approaches depending on the time scale of interest. The first case is the adiabatic-like case, in which we are interested in O(1)period dynamics with t/T ∈ O(1). We call it "adiabatic-like" since long-time dynamics during 0 ≤ t ≤ T under the sufficiently large period T , exemplified by Thouless pumping and adiabatic state preparation, is a typical target. The second case is the generic longtime case, where we are interested in multiple-period dynamics at t/T ∈ Ω(1). In that case, the the period T is not so large and we often consider long-time dynamics at t T , exemplified by laser-irradiated materials.

Adiabatic-like cases
We consider the adiabatic-like cases, where long-time dynamics over O(1)-periods is of interest. For simplicity, we first consider the dynamics within one period at t ∈ [0, T ]. The algorithm in this case is composed of the following steps; 1. Determine the truncation order of the Fourier index l max by Eq. (59), for the given time t.

Compose two unitary gates
for arbitrary inputs l ∈ D 4lmax and |ψ ∈ H.
3. Execute the amplification protocols with substituting U 4lmax LP (t) and U 4lmax eff (t) re-spectively for e −iH LP t and e −iH 4lmax eff,pbc t [See Section 5]; 4. Apply the unitary operationŪ lmax amp2 (t) to the initial state |0 ⊗na(4lmax) |0 |ψ(0) ; Step 1. determines the dimension of the truncated Floquet-Hilbert space as C 8lmax ⊗ H so that it can accurately reproduce |ψ(t) with the precision 1 − O(ε), as discussed in Section 4. In Step 2., we employ the qubitization technique to realize the time evolution in the truncated Floquet-Hilbert space, as discussed in Section 6. We note that U 4lmax We finally determine the resource for simulating time-periodic Hamiltonians in adiabaticlike regimes. The ancillary system should involve the degree of freedom for Fourier indices {|l } l∈D 4lmax and n a (4l max ) qubits for qubitization. The total number of ancillary qubits is n a (4l max ) + log 2 (8l max ) ∈ n a + O(log(γt) + log log(1/ε)).

Generic long-time cases
We next consider the other generic cases where we are interested in long-time dynamics over multiple periods as ωt ∈ Ω(1). In this regime, we take a different strategy from that for the adiabatic cases; We split the time t by t = (n+δ)T with n ∈ N and δ ∈ [0, 1). Following this separation, we implement the time-evolution operator U (t) by n-times operation of U (T ) and single operation of U (δT ). The algorithm is composed of the following steps; 1. Split the time by t = (n + δ)T with n ∈ N and δ ∈ [0, 1). Determine the truncation order l T max by substituting T and ε/n into t and ε of Eq. (59); l T max ∈ Θ γT + log(n/ε) log(e + (γT ) −1 log(n/ε)) = Θ γ/ω + log(ωt/ε) log(e + (γ/ω) −1 log(ωt/ε)) . (125)

Compose a unitary gateŪ
by the qubitization and the amplitude amplification protocols (Follow Steps 2.-4. of Section 7.1 wih substituting T , ε/n, and l T max into t, ε, and l max respectively).

Apply the unitary gateŪ
l T max amp2 (T ) to the initial state n times, which results in

Prepare the unitary gateŪ
l T max amp2 (δT ) by substituting δT , ε, and l T max into t, ε, and l max respectively in Steps 2.-4. of Section 7.1. Applying it once to the above state results in for arbitrary initial states |ψ(0) .
We remark several points in each step. Steps 1. and 2. are executed to apply U (T ), giving the time-evolution over one period T . Here, we set the acceptable error to O(ε/n) so that we can obtain the time-evolved state |ψ(nT ) with an error up to O(ε) after the n-times repetition in Step 3. The cost for the unitary gateŪ We use the fact n ∈ O(ωt) in the above relation.
Step 4. realizes the remaining micromotion U (δT ) for the duration δT with an error up to O(ε). We remark that the choice of the truncation order l T max so far is sufficient to achieve the precision 1 − O(ε) for U (δT ) due to l T max > l δT max . This implies that we can reuse the ancillary state |0 ⊗na(4l T max ) for the qubitization approach to the time-evolution U (δT ) in Step 4. The cost for implementinḡ U l T max amp2 (δT ) once is obtained by setting t = δT in adiabatic-like cases. It is smaller than the cost for implementing the time-evolution U (T ), and does not affect the scaling of the computational resource.
Finally, we provide the computational resource for time-periodic Hamiltonian dynamics in generic long-time regimes. The number of ancillary qubits is given by n a (4l max ) + log 2 (8l T max ) , which is bounded by n a + O(log(γ/ω) + log log(ωt/ε)).
The query complexity is dominated by Eq. (130). We summarize them in Theorem 2 and Table 1.

Comparison with other algorithms
Let us compare our algorithm on time-periodic Hamiltonians with other quantum algorithms for Hamiltonian simulation, based on Table 1. We pick up the qubitization technique [18] for time-independent Hamiltonian H and the truncated Dyson-series algorithm [19,20] for generic time-dependent Hamiltonian H(t), whose resources have the best scaling in t and 1/ε as far as we know. We also consider the standard way, Trotterization, which covers generic time-independent, time-periodic, and generic time-dependent Hamiltonians. Let us first compare our algorithm with Trotterization. When the Hamiltonian can be divided into H(t) = Γ r=1 H r (t), where every term in H r (t) commutes with one another at every time, the first-order Trotterization roughly approximates the time-evolution U (t) by with M partitions of the time t as t l = lt/M [2]. Reflecting that the Trotterization error polynomially decreases in the partition number M , we need queries to a layered quantum circuit for O (αt) 2 /ε times to achieve the allowable error ε. While the coefficient α and the powers in t and 1/ε can be improved by higher-order Trotterization [12], its resource increases polynomially in 1/ε. Our algorithm has better scaling of elementary gates than Trotterization in that the resource increases logarithmically in 1/ε. The comparison with the qubitization and the truncated Dyson-series algorithm is instructive for evaluating the efficiency of our algorithm due to the inclusion relation, The qubitization technique achieves the least number of ancillary qubits in a sense that it is independent of t and 1/ε. It also has the best query complexity, in which the optimal scaling both in t and 1/ε appears in an additive way as Eq. (14). The number of ancillary qubits and the query complexity for the qubitization provide the best bound for time-periodic Hamiltonians, while it is nontrivial whether or not it is actually achievable. Comparison with the truncated Dyson-series algorithm tells us how efficiently we deal with the time-dependent Schrödinger equation. The cost reduction compared to them can be interpreted as improvement of efficiency due to the time-periodicity. Before going to the comparison respectively for the adiabatic-like regime and the generic long-time regime, we remark several points in common. First, we replace some parameters by those which have similar scales to simply compare these algorithms. For instance, the parameters α and γ respectively give the energy scales of the whole Hamiltonian H(t) and the time-dependent terms H(t) − H 0 according to Eqs. (25) and (23). We substitute α and ωγ respectively for sup t ( H(t) ) and sup t ( d dt H(t) ), which are characteristic values in the truncated Dyson-series algorithm.
The second point is the complexity of the oracles themselves. In the qubitization technique, the query complexity is measured by the oracles to a static Hamiltonian H as Eq. (8). In contrast, the truncated Dyson-series algorithm for generic cases employs the oracle where l labels the discretized time t l with the partition number M ∈ O(1/ε) [19]. The oracle Ham−O includes multiple implementation of O(t l ), which is an oracle for a static instantaneous Hamiltonian H(t l ) (some specific cases such as LCUs and sparse-access matrices with integrable time-dependency can be simplified [20]). Our algorithm for timeperiodic Hamiltonians uses the oracles {O m , |G m }, which gives block-encoding of each Fourier component H m as Eq. (98), and the oracle for the coefficients, G coeff given by Eq.
(99). The query complexity in our algorithm is roughly measured by the oracle for a static operator H m since the latter one G coeff , a quantum gate on at-most O(1) qubits, is usually negligible. Therefore, note that our algorithm adopts essentially the same measure for the query complexity as the query complexity, while the truncated Dyson-series algorithm counts it by rather complicated oracles involving discretized time.

Adiabatic-like cases
We assess the cost for simulating time-periodic Hamiltonians in the adiabatic-like cases based on Table 1. First, we compare the number of ancillary qubits with those for other algorithms. What should be noted in our algorithm is its scaling in the inverse error 1/ε. Our algorithm requires O(log log(1/ε)) additional qubits, whose number lies just in the middle of that for the qubitization (ε-independent) and that for the truncated Dyson-series algorithm [O(log(1/ε))]. Importantly, the reduction compared to the latter one can be attributed to the faster convergence of our formalism based on the Fourier indices. The error arising from time discretization polynomially decays in the partition number M ; as a result, the truncated Dyson-series algorithm requires at-least O(log(1/ε)) ancillary qubits via the oracle Eq. (135). In contrast, we introduce the truncation order l max for the Floquet-Hilbert space. The error by this cutoff scales as O (γt/l max ) lmax , whose decay is faster than the exponential function e −O(lmax) . This leads to the reduction of ancillary qubits from O(log(1/ε)) to O(log log(1/ε)). Next, we compare the query complexity. When we fix the inverse error 1/ε and increase time t, it scales as O(αt). The linear increase in time t implies the optimality of our algorithm in time. When we consider 1/ε → ∞ for given time, the query complexity scales as log(1/ε) log log log(1/ε) , where we substitute the form of the o(log(1/ε)) term, Eq. (29), into Eq. (28). This scaling is nearly-optimal in 1/ε. Importantly, their contributions affect the query complexity in an additive way as αt + o(log (1/ε)). This means that our algorithm deals with timeperiodic systems with the cost sufficiently close to qubitization [See Eq. (14)], and saves resource compared to the truncated Dyson-series algorithm requiring the multiplicative query complexity αt × o(log(αt/ε)). As long as we suppose polynomial accuracy ε ∈ 1/poly (N ), implying log(1/ε) αt, the query complexity can achieve the best scaling for time-independent systems. While we hereby provide comparison with the truncated Dyson-series algorithm, our algorithm outperforms other time-dependent Hamiltonian simulation algorithms [21,22,23,24] in the presence of time-periodicity. For instance, Ref. [24] can achieve the additive query complexity when the second derivative of H(t) in t vanishes (i.e., linear dependence), but time-periodicity prohibits vanishing derivatives. For time-periodic Hamiltonian simulation, the additive query complexity close to that of the qubitization is unique to our algorithm. Any of the other algorithms using the discretized time requires at-least O(1/ε) degrees of freedom, while we need O(log(1/ε)) for the Fourier indices. Thus, our algorithm always saves the number of ancillary qubits.

Generic long-time cases
Here, we evaluate the computational resources for generic long-time regimes. We suppose that the frequency ω is constant and hence small compared to the whole-system energy scales α and γ, which usually increase polynomially in the system size N . This assumption comes from the fact that ω typically represents the frequency of external drives (e.g. light), which is size-independent. More importantly, when the frequency ω is poly (N ) so that it becomes comparable to α and γ, the dynamics can be efficiently simulated by the methods for time-independent Hamiltonians with the help of high-frequency expansions [59,60,61,62,63], as we prove in Appendix D. Therefore, it is sufficient to consider the computational resources under ω ∈ O N 0 and α, γ ∈ poly (N ).
The number of ancillary qubits is similar to that for the adiabatic cases. We note that its scaling in the time [O(log log t)] overwhelms that of the truncated Dyson-series algorithm [O(log t)] in addition to the inverse error 1/ε. The number of ancillary qubits for time-periodic Hamiltonians lie between those for time-independent and time-dependent Hamiltonians in terms of both t and 1/ε.
We discuss the query complexity given by Theorem 2. Under the fixed allowable error ε, the scaling of the query complexity in sufficiently large time t is given by Due to the second term in the right hand side, the query complexity has at least nearlyoptimal dependence in t, which involves logarithmic correction log(ωt). However, reflecting the assumptions α ∈ poly (N ) and ω ∈ O N 0 , the O(ωt log(ωt)) term can be nonnegligible compared to O(αt) only when the time reaches t ∼ e α/ω /ω ∈ O e poly(N ) T .
Therefore, unless we consider unpractical time scales t ∈ e Θ(N ) , this nearly-optimal scaling is formal, and the query complexity actually increases linearly in time t, showing the optimal scaling. On the other hand, when we focus on the scaling in the inverse error 1/ε, the query complexity scales as ωt log(1/ε) log log log(1/ε) .
When it comes to the combined scaling of the query complexity in t and 1/ε, we note that it has an additive form as where we neglect O(ωt log(ωt)). Although linear increase in t couples with logarithmic increase in 1/ε, the coefficient ωt ∈ O N 0 t is much smaller than the whole energy-time scale αt ∈ poly (N ) t. As a consequence, also in generic long-time cases, our algorithm achieves the query complexity close to that of qubitization [See Eq. (14)] and saves much cost compared to the truncated Dyson-series algorithm, where the query complexity scales as poly (N ) t × o(log(αt/ε)).

Illustrative examples
In this section, we briefly discuss some potential applications of the algorithm. We expect that it can be applied to nonequilibrium quantum many-body phenomena, which are often of interest in condensed matter physics and quantum chemistry. In terms of quantum computation, it will offer an efficient protocol for adiabatic state preparation, which can be applied to quantum phase estimation for instance. We suggest the simplest examples for both applications below.

Nonequilibrium quantum many-body phenomena
The first application is to simulate nonequilibrium dynamics of periodically-driven quantum materials. Optical responses are typical but of great interest both in condensed matter physics and quantum chemistry. We pick up an N -site Fermi-Hubbard model under laser light as the simplest case; Here,n kσ =ĉ † kσĉ kσ andn xσ =ĉ † xσĉ xσ are number operators of electrons in the momentum and real spaces respectively, generated by fermionic annihilation operatorsĉ kσ andĉ xσ .
The time-periodic term H ext (t) represents the coupling with light. When we shine linearly-polarized light with the frequency Ω, it is given by which results in T = 2π/Ω and m max = 1. The Fourier components H m are given by H 0 = H Hub and H ±1 = (±i) x,σ (V x /2)n xσ . To evaluate the cost of simulating |ψ(t) , we compose the oracles for them. We employ a unitary operation, called fermionic fast Fourier transform (FFFT) [64,65,66], which transforms the basis in the momentum space k i to that in the real space x i asn k i σ = FFFT †n x i σ FFFT. We map the fermionic system to a spin system by Jordan-Wigner transformation asn xσ = (1 + Z xσ )/2 ("xσ" denotes an index for qubits). By neglecting constant terms and a conserved particle number xσn xσ , the block-encoding for {H m } can be constructed by the technique for LCUs, as Eq. (10). Assuming k , U, V x ≥ 0 without loss of generality, this leads to with α 0 = 2 x x + U N and α ±1 = x V x . In the above oracles, the unitary gate O 0 requires much cost due to O(1)-times usage of FFFT, which can be implemented with at-most O(N log N )-depth quantum circuits composed of adjacent two-qubit gates [66]. The depth needed for each oracle is at-most O (N log N ).
The number of ancillary qubits for them to express {|0 |x, σ , |1 |x, σ } amounts to n a ∈ O(log N ). The energy scales of the whole Hamiltonian, α, determined by Eq. (25) as Although it is difficult to obtain γ from its definition Eq. (23), we can easily obtain its upper bound, which results in It is adequate for determining the computational resource, since α is rather dominant in the query complexity and γ appears as O(log γ) in the number of ancillary qubits. When we define the characteristic local energy scale by α loc = max( k , U, V x ), they are approximately described by α, γ ∈ O(α loc N ).
Finally, if we are interested in the time-evolved state |ψ(t) over multiple periods, the following resources are required to achieve the precision 1 − O(ε); • Number of ancillary qubits; O(log(α loc N/Ω) + log log(Ωt/ε)). (149) • Overall gate complexity; The above results are based on Eqs. (130) and (131), where we neglect size-independent additional gates for each query to the oracles. According to classical numerical calculations [67,68,69], the above model is expected to host high-harmonic generation, where intense oscillation with the frequency nΩ (n = 2, 3, . . .) arises in response to laser light with the frequency Ω. Our algorithm allows to efficiently identify such a nontrivial nonequilibrium phenomenon with guaranteed accuracy by simulating the Fourier spectrum of some observables ψ(t)|O|ψ(t) (e.g. electric current) [70].
Another interesting example is a discrete time crystal as a phase of matter inherent in nonequilibrium, where time-translation symmetry is spontaneously broken [28,29,30]. When we choose H ext (t) by uniform circularly-polarized ac field represented by the Fermi-Hubbard model becomes a potential platform for a time-crystalline phenomenon protected by Floquet dynamical symmetry [71]. Its signature can be detected by subharmonic oscillations of spatio-temporal correlation functions and local observables. These values are both efficiently computed via the time-evolution operator U (t) by our algorithm, requiring the computational resources similar to Eqs. (149) and (150). As nonequilibrium systems dominated by time-periodic Hamiltonians have been vigorously explored as Floquet systems, our algorithm will cover various phenomena. We also note that our algorithm is extended to time-periodic Hamiltonians with exponentiallydecaying Fourier components H m e −O(|m|) . Since we often face at situations where high-frequency components of H(t) rapidly diminish, our result will be useful for a variety of nonequilibrium phenomena in condensed matter physics and quantum chemistry, other than the above examples (See also Appendix C.5).

Adiabatic state preparation
Adiabatic state preparation is a protocol to obtain a preferable quantum state by adiabatically evolving quantum systems with time-dependent Hamiltonians H(t). While it has been originally developed in the context of adiabatic quantum computation relying on the adiabatic theorems of quantum dynamics [42], it can be exploited also on circuit-based quantum computers by mimicking the adiabatic dynamics under H(t). One of typical aims is to prepare initial states required for quantum simulation [41,6,58]. Here, we provide the simplest application of our algorithm in this field.
We prepare a time-periodic Hamiltonian H(t), which continuously connects two different time-independent HamiltoniansH 0 andH 1 . A certain eigenstate ofH 0 , denoted by |ψ 0 , is supposed to be easily prepared on quantum circuits, and we assume that the eigenstate ofH 1 , which is continuously connected to |ψ 0 , corresponds to the target state |ψ 1 . As the simplest case, we organize such a time-periodic Hamiltonian by with satisfies H(0) =H 0 and H(T /4) =H 1 . Our algorithm for the adiabatic-like cases in Section 7.1 enables us to efficiently execute the adiabatic state preparation. Since the Fourier components of H(t) is simply given by H 0 =H 0 and H ±1 = ±(H 0 −H 1 )/2i with m max = 1, the following oracles are necessary; Here, (Ō 0 , |Ḡ 0 ,ᾱ 0 ) and (Ō 1 , |Ḡ 1 ,ᾱ 1 ) respectively provides the block-encoding of the time-independent HamiltoniansH 0 andH 1 as Eq. (8). The oracles O ±1 and |G ±1 gives the block-encoding of H ±1 as G ±1 |O ±1 |G ±1 = 2H ±1 /(ᾱ 0 +ᾱ 1 ). According to the standard adiabatic theorem [42], the duration required for approximating the target state |ψ 1 by the adiabatically-evolved state U (T /4) |ψ 0 with accuracy 1 − O(ε) roughly amounts to T H 0 −H 1 /ε∆ 2 , where ∆ denotes the minimal gap upon the instantaneous eigenstate of H(t) continuously connecting |ψ 0 and |ψ 1 . Therefore, in the adiabatic state preparation based on our algorithm, the query complexity counted by the oracles of the two static HamiltoniansH 0 andH 1 amounts tō Although the second term is buried by the first one polynomially increasing in 1/ε, the cost has better scaling in 1/ε compared to the cases where similar schedules for H(t) are tackled with the Trotterization [∼ (1/ε) 1+2/p ] or the truncated Dyson-series algorithm In adiabatic quantum computation, the required time in 1/ε can be improved via more sophisticated scheduling, such as local adiabatic interpolation [72], boundary cancellation [73,74] and quasi-adiabatic processes [75]. In particular, the last one has achieved the query complexity poly-logarithmic in 1/ε with the help of the truncated Dyson-series algorithm. If we can find time-periodic Hamiltonians which are consistent with such sophisticated schedules, it will offer a more efficient protocol for the adiabatic state preparation.

Discussion and Conclusion
We conclude our paper with summarizing the results. In this paper, we focus on timedependent systems with time-periodicity, and organize an efficient implementation of their time-evolution operators. Once we prepare the oracles which embeds each Fourier component and each coefficient of the Hamiltonian, a series of unitary operations on the truncated Floquet-Hilbert space extract the time-evolved state |ψ(t) with an allowable error 1/ε, whose query complexity are both optimal or nearly-optimal in t and 1/ε. In addition, it has an additive scaling αt + o(log(1/ε)) [adiabatic-like regime] or αt + ωt × o(log(1/ε)) [generic long-time regime], and hence achieves the cost sufficiently close to that of the best algorithm for time-independent systems [18], despite the existence of time-dependency. As exemplified by nonequilibrium quantum many-body phenomena and adiabatic state preparation, our algorithm will contribute to pioneering applications of quantum computers for various aims, in condensed matter physics, quantum chemistry, and quantum computation.
We finally discuss some potential future directions of our results. The first one is to explore efficient implementation of meaningful functions in time-periodic systems, other than the time-evolution operator U (t) = T exp −i t 0 H(t )dt . In our algorithm, we exploit the qubitization technique to implement the exponential function, e −iH lmax eff t , in the effective Hamiltonian H lmax eff . As the qubitization technique allows to efficiently apply various polynomial functions, we expect that our algorithm can be extended for various aims other than the unitary time evolution discussed here. For instance, our discussion on the Lieb-Robinson bound in Section 4 is valid also when a time-periodic Hamiltonian H(t) is non-hermitian. This suggests that our algorithm may be available also for solving timeperiodic linear differential equations [76], exemplified by dissipative quantum may-body systems [77,78]. While we do not expect exponential speedup due to the non-unitarity in general, it will offer an efficient way both in time and desirable accuracy. It would be important to clarify what kind of function is useful in time-periodic systems, how the Lieb-Robinson bound and the amplitude amplification should be modified, and then how the computational cost is affected. Another significant direction is to seek for useful tasks that can be efficiently tackled with time-periodic Hamiltonians. As we have shown throughout the paper, time-periodic Hamiltonians can be simulated more efficiently generic time-dependent Hamiltonians with computational resources close to those for time-independent Hamiltonians. This means that quantum tasks which inevitably requires time-dependent operations, such as adiabatic state preparation, can be optimized by tuning their schedules in a time-periodic way. It will be important to clarify what kind of tasks can be addressed by time-periodic Hamiltonians and how our algorithm provides better scaling in 1/ε for their costs, which we leave for future work.

B Proof of the theorems for formulation B.1 Amplitude amplification by symmetry
In Section 5.1, we discuss the amplification of the time-evolved state |ψ(t) with exploiting the translation symmetry of generic time-periodic Hamiltonians. With the usage of the protocol, U lmax we see that it generates the time-evolved state |ψ(t) with O(1) amplitude as We rigorously prove this statement, which is summarized as Theorem 6. In order to show Theorem 6, we begin with discussing the approximate translation symmetry. To be precise, we evaluate how much error appears in the approximation, which we refer to as Eq. (69) in the main text. The exact upper bound on this error is given by the following theorem.

Lemma 8. (Approximate translation symmetry)
We choose the truncation order l max ∈ Θ(γt + log(1/ε)/ log log(1/ε)) by Eq. (59). For indices l ∈ D lmax and l ∈ D 4lmax , the inequality In a similar way, we can obtain the bound on the second term of the right hand side of Eq. (168), which results in the above formula under the replacement of n(l , l) by n(0, l l ). Since n(0, l l ) is always larger than n(l , l) from their definitions, we obtain the accuracy of the approximate translation symmetry as When we choose the truncation order l max by Eq. (59), the integer n(l , l) is always larger than 6γt for indices l ∈ D 4lmax and l ∈ D lmax . We use Eq. (45) for the last inequality.
This lemma ensures that the approximate translation symmetry in the truncated Floquet-Hilbert space is extremely accurate; the right hand side of Eq. (162) is approximately O ε 3 under the choice of l max , Eq. (59). As we intuitively discuss in Section 5.1, the approximate translation symmetry ensures the amplification by symmetry as Eq. (73). We next prove the consequence of Theorem 6, which gives the exact description of Eq. (73); Proof of Theorem 6.-We track Eq. (70) in a rigorous way. We begin with the definition, We separate the summation over l ∈ D 4lmax in the above formula by Let us focus on the first summation. For each l ∈ D lmax , the summation can be approximated as In the first inequality, we employ Lemma 8 with taking l − l = l l for l − l ∈ D 3lmax into consideration. The last inequality comes from Theorem 4. The first term in Eq. (177) is further bounded by where we use the relation Eq. (54) and the Stirling's formula Eq. (48). This accomplishes the evaluation of Eq. (177) with the usage of Eq. (60) as We next compute the second summation in Eq. (176), which is taken over l ∈ D 4lmax satisfying l − l / ∈ D 3lmax . The Lieb-Robinson bound, dictated by Theorem 3, immediately concludes its upper bound by Finally, we show that the amplification protocol relying on the symmetry reproduces the time-evolved state |ψ(t) . Combining the results of Eqs. (179) and (180), we arrive at which is smaller than ε/3 for m max ∈ N and ε ∈ [0, 1].

B.2 Validity of refined effective Hamiltonian
In Section 6.2, we raise an efficient block-encoding protocol for implementing e −iH lmax eff t by the qubitization technique. The strategy is to substitute the effective Hamiltonian under the periodic boundary conditions, H 4lmax eff,pbc , for the original one H 4lmax eff ; The complexity of the oracles reduces from O(l max ) to O(log l max ). In order to justify this replacement of H 4lmax eff with H 4lmax eff,pbc , we provide Theorem 7, which says that its error is bounded from above by The amplification protocols under the refined effective Hamiltonian, U lmax amp1,pbc (t) and U lmax amp2,pbc (t), are designated by Eqs. (102) and (103).
justified as This bound is actually smaller than ε/3, and we complete the proof of Theorem 7.

C Extension to exponentially-decaying Fourier components
In the main text, we focus on time-periodic Hamiltonians which have vanishing Fourier components H m = 0 for |m| > m max . Here, we generalize our results to time-periodic Hamiltonians H(t) with exponentially-decaying Fourier components as for |m| > 0. The norm of H 0 is arbitrary as long as it is bounded. The effective Hamiltonian in the truncated Floquet-Hilbert space, H lmax eff , is the same as Eq. (18), but it is dense in the basis {|l } l∈D lmax compared to the cases where H m = 0 is satisfied for |m| > m max .
We formulate the protocol in a similar manner to the main text, and show that the computational resources for the time-evolved state have nearly-optimal dependence both in t and 1/ε. The difference mainly comes from the form of the Lieb-Robinson bound and the infinite series of {H m } needed for designating H(t). The former one affects the truncation order l max for the Floquet-Hilbert space. The latter one yields the change in block-encoding so that almost all the information about H(t) can be embedded with keeping the efficiency.

C.1 Truncation order of Floquet-Hilbert space
We first determine the proper truncation order l max for the Floquet-Hilbert space, as we did in Section 4. To this aim, we begin with deriving the Lieb-Robinson bound on the transition rate, corresponding to Theorem 3.

Theorem 9. (Bound on transition rate)
We assume H m ≤ he −|m|/ζ with certain positive constants h and ζ. Then, for l, l such that |l|, |l | ≤ l max , the transition rate is bounded from above by Here, β and ζ are positive constants defined by where we insert the identity l i ∈D lmax |l i l i | = I for n − 1 times. The summation {l i } is taken over l i ∈ D lmax for i = 1, 2, . . . , n − 1 under fixed l 0 = l and l n = l. We introduce a new variable m i = l i − l i−1 instead of using {l i }, and then the above integrand is bounded by Since l + m 1 + . . . + m n = l implies |m 1 | + . . . + |m n | ≥ |l − l |, it can be further bounded as follows, Here, θ(x) is a step function, defined by θ(x) = 1 for x ≥ 0 and θ(x) = 0 otherwise. The summation S n (M ) is defined by We use this relation recursively until S 1 appears. After the single use of this equality, we obtain where we use e −mn/ζ ≤ 1 for the second term. By repeating this calculation n − 1 times, we arrive at We define a positive constant ζ by 1/ζ = 1/ζ −1/β as Eq. (193). The above inequality enables to evaluate the integrand F n as F n ≤ (2h) n S n (|l − l |) ≤ (2βh) n e −|l−l |/ζ +2/β .
This completes the proof of the theorem. Theorem 9 says that the bound on the transition amplitude from |l to |l exponentially decays in the distance |l − l | for Hamiltonians with H m e −O(|m|) . The decay is relatively slow compared to the cases H m = 0 (|m| > m max ) showing O |l − l | −|l−l | . In fact, this result is reminiscent of the Lieb-Robinson bound for short-ranged interacting systems [79,80,81], which provides exponentially-decaying correlation functions under interactions U ij ∼ e −O(|i−j|) . Based on Theorem 9, we assess how the exact time-evolved state can be approximated by the truncated Floquet-Hilbert space, and determine the proper truncation order l max . We summarize the result by the following theorem, which is a counterpart of Theorem 4.
Proof.-The derivation is similar to the one for Theorem 4. According to Eqs. (50)-(52), we should evaluate the following two values,  We introduce five kinds of ancillary systems labeled by a, b, c, d, and e. The systems a, b, and e are respectively described by 2 na -, 8l max -, and 2-dimensional Hilbert spaces, as we consider in Section 6.2. States of the system c is spanned by {|m } m∈D 4lmax due to the absence of m max . The system d is characterized by {|j } jmax j=1 . The oracle unitary gate for the refined effective Hamiltonian H 4lmax eff,pbc is given by Here by directly substituting the above equations. The cost of implementing the oracle state |G 4lmax eff from the trivial state |0 e |0 d |0 c |0 b |0 a is O j max C + C 4lmax + log l max elementary gates.
Therefore, implementing the time evolution operator exp(−iH eff,pbc t) with the qubitization technique requires the following resources for time-periodic Hamiltonians satisfying H m ≤ he −|m|/ζ ; • Number of ancillary qubits; O(log j max + log l max ) • Number of overall gates; O {(α + l max ω)t + o(log(1/ε))} j max C + C 4lmax + n a + log j max + log l max .