Quantum Algorithms for Simulating the Lattice Schwinger Model

The Schwinger model (quantum electrodynamics in 1+1 dimensions) is a testbed for the study of quantum gauge field theories. We give scalable, explicit digital quantum algorithms to simulate the lattice Schwinger model in both NISQ and fault-tolerant settings. In particular, we perform a tight analysis of low-order Trotter formula simulations of the Schwinger model, using recently derived commutator bounds, and give upper bounds on the resources needed for simulations in both scenarios. In lattice units, we find that a Schwinger model on $N/2$ physical sites with coupling constant $x^{-1/2}$ and electric field cutoff $x^{-1/2}\Lambda$ can be simulated on a quantum computer for time $2xT$ using a number of $T$-gates or CNOTs in $\widetilde{O}( N^{3/2} T^{3/2} \sqrt{x} \Lambda )$ for fixed operator error. This scaling with the truncation $\Lambda$ is better than that expected from algorithms such as qubitization or QDRIFT. Furthermore, we give scalable measurement schemes and algorithms to estimate observables, which we cost in both the NISQ and fault-tolerant settings by assuming a simple target observable, the mean pair density. Finally, we bound the root-mean-square error in estimating this observable via simulation as a function of the diamond distance between the ideal and actual CNOT channels. This work provides a rigorous analysis of simulating the Schwinger model, while also providing benchmarks against which subsequent simulation algorithms can be tested.


Introduction
The 20th century saw tremendous success in discovering and explaining the properties of Nature at the smallest accessible length scales, with just one of the crowning achievements being the formulation of the Standard Model. Though incomplete, the Standard Model explains most familiar and observable phenomena as arising from the interactions of fundamental particles: leptons, quarks, and gauge bosons. Effective field theories similarly explain emergent phenomena that involve composite particles, such as the binding of nucleons via meson exchange. Many impressive long-term accelerator experiments and theoretical programs have supported the notion that these quantum field theories are adequate descriptions of how Nature operates at subatomic scales.
Given an elementary or high-energy quantum field theory, the best demonstration of its efficacy is showing that it correctly accounts for physics observed at longer length scales. Progress on extracting long-wavelength properties has indeed followed: For example, in quantum chromodynamics (QCD), the proton is a complex hadronic bound state that is three orders of magnitude heavier than any of its constituent quarks. However, methods have been found to derive its static properties directly from the theory of quarks and gluons; using lattice QCD, calculations of the static properties of hadrons and nuclei with controlled uncertainties at precisions relevant to current experimental programs have been achieved [1][2][3][4][5][6][7][8][9]. It is even possible with lattice QCD to study reactions of nucleons [10,11]. While scalable solutions for studies like these have been found, they are still massive undertakings that require powerful supercomputers to carry them out.
Generally, however, scalable calculational methods are not known for predicting non-perturbative emergent or dynamical properties of field theories. Particle physics, like other disciplines of physics, generically suffers from a quantum many-body problem. The Hilbert space grows exponentially with the system size to accommodate the entanglement structure of states and the vast amount of information that can be stored in them. This easily overwhelms any realistic classical-computing architecture. Those problems that are being successfully studied with classical computers tend to benefit from a path integral formulation of the problem; this formulation trades exponentially-large Hilbert spaces for exponentially-large configuration spaces, but in these particular situations it suffices to coarsely importance-sample from the high-dimensional space with Monte Carlo methods [12,13]. Analytic continuation to imaginary time (Euclidean space) is key to this formulation. Real-time processes are intrinsically ill-suited for importance-sampling in Euclidean space, and they (along with finite-chemical-potential systems) are usually plagued by exponentially-hard "sign problems" [14][15][16][17][18][19][20][21][22][23].
Quantum computing offers an alternative approach to simulation that should avoid such sign problems by returning to a Hamiltonian point of view and leveraging controllable quantum systems to simulate the degrees of freedom we are interested in [24][25][26][27]. The idea of simulation is to map the time-evolution operator of the target quantum system into a sequence of unitary operations that can be executed on the controllable quantum device, the quantum computer. The quantum computer additionally gives a natural platform for coherently simulating the entanglement of states that is obscured by sampling classical field configurations. We will specifically consider digital quantum computers, i.e., the desired operations are to be decomposed into a set of fundamental quantum gates that can be realized in the machine. Inevitable errors in the simulation can be controlled by increasing the computing time used in the simulation (often by constructing a better approximation of the target dynamics on the quantum computer) or by increasing the memory used by the simulation (e.g. increasing the number of qubits to better approximate the continuous wave function of the field theory with a fixed qubit register size, through latticization and field digitization).
Beyond (and correlated with) the qubit representation of the field, it is important to understand the cost of simulating the unitary dynamics in terms of the number of digital quantum gates required to reproduce the dynamics to specified accuracy on a fault-tolerant quantum computer. Furthermore, since we are currently in the so-called Noisy Intermediate-Scale Quantum (NISQ) era, it is also important to understand the accuracy that non-fault-tolerant hardware can achieve as a function of gate error rates and the available number of qubits. For many locally-interacting quantum systems, the simulation gate count is polynomial in the number of qubits [25,28], but for field theories, even research into the scaling of these costs is still in its infancy.
While strides are being made in developing the theory for quantum simulation of gauge field theories, much remains to be explored in terms of developing explicit gate implementations that are scalable and can ultimately surpass what is possible with classical machines. This means that there is much left to do before we will even know whether or not quantum computers suffice to simulate broad classes of field theories.
In this work, we report explicit algorithms that could be used for time evolution in the (massive) lattice Schwinger model, a gauge theory in one spatial dimension that shares qualitative features with QCD [50], making it a testbed for the gauge theories that are so central to the Standard Model. Due to the need to work with hardware constraints that will change over time, we specifically provide algorithms that could be realized in the NISQ era, followed by efficient algorithms suitable for fault-tolerant computers where the most costly operation has transitioned from entangling gates (CNOT gates) to non-Clifford operations ($T$-gates). We prove upper bounds on the resources necessary for quantum simulation of time evolution under the Schwinger model, finding sufficient conditions on the number and/or accuracy of gates needed to guarantee a given error between the output state and ideal time-evolved state. Additionally, we give measurement schemes with respect to an example observable (mean pair density) and quantify the cost of simulation, including measurement, within a given rms error of the expectation value of the observable. The following is an outline of this analysis.
In Section 2, we describe the Schwinger Model Hamiltonian, determine how to represent our wavefunction in qubits (Section 2.1), then describe how to implement time evolution using product formulas (Section 2.2, Section 2.4) and why we choose this simulation method (Section 2.3). Additionally, we define the computational models we use to cost our algorithms in both the near- and far-term settings (Section 2.5). In Section 3 and Section 4 we give our quantum circuit implementations of time evolution under the Schwinger model for the near- and far-term settings, respectively. Then, in Section 5, we determine a sufficient amount of resources to simulate time evolution as a function of simulation parameters in the far-term setting and discuss how our results would apply in near-term analyses. With our implementations of time evolution understood, we then turn to the problem of measurement in Section 6, where we analyze two methods: straightforward sampling, and a proposed measurement scheme that uses amplitude estimation. We apply our schemes to a specific observable, the mean pair density, and cost out full simulations (time evolution and measurement) in Section 7. We report numerical analysis of the costing in the far-term setting in Appendix B. We then discuss how the geometric locality of the lattice Schwinger model Hamiltonian may be used in simulating time evolution or in estimating expectation values of observables in Section 8 before concluding.

Schwinger Model Hamiltonian
Quantum electrodynamics in one spatial dimension, the Schwinger model [50], is perhaps the simplest gauge theory that shares key features of QCD such as confinement and spontaneous breaking of chiral symmetry. The continuum Lagrangian density of the Schwinger model is given by
$$\mathcal{L} = -\frac{1}{4} F_{\mu\nu} F^{\mu\nu} + \bar{\psi} \left( i \gamma^\mu D_\mu - m \right) \psi .$$
Here, $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is the field strength tensor, $\psi$ is a two-component Dirac field with $\bar{\psi} = \psi^\dagger \gamma^0$ its relativistic adjoint, and $D_\mu = \partial_\mu - i A_\mu$ is the covariant derivative acting on $\psi$.
In their seminal paper [51], Kogut and Susskind introduced a Hamiltonian formulation of Wilson's action-based lattice gauge theory [12] in the context of SU(2) lattice gauge theory. The analogous Hamiltonian for compact U(1) electrodynamics can be written as [52]
$$H = H_E + H_I + H_M ,$$
with
$$H_E = \sum_r E_r^2 , \qquad H_I = x \sum_r \left( \psi_r^\dagger U_r \psi_{r+1} + \mathrm{h.c.} \right) , \qquad H_M = \mu \sum_r (-1)^r \psi_r^\dagger \psi_r ,$$
where $\mu = 2m/(ag^2)$ and $x = 1/(ag)^2$, with $a$ the lattice spacing, $m$ the fermion mass and $g$ the coupling constant, as described in [52]. ($H$ has been non-dimensionalized by rescaling with a factor $2 a^{-1} g^{-2}$.) The electric energy $H_E$ is given in terms of $E_r$, the dimensionless integer electric fields living on the links $r$ of the lattice. For convenience, the most important definitions of symbols used in this paper are tabulated in Section 11. The interaction or "hopping" Hamiltonian $H_I$ corresponds to minimal coupling of the Dirac field to the gauge field. On the lattice, the (temporal-gauge) gauge field $A_\mu = (0, A^1)$ is encoded in the unitary "link operators" $U_r$, which are complex exponentials of the vector potential. These are defined to run "from site $r$ to site $r+1$." The continuum commutation relations of $E$ and $A$ translate to the lattice commutation relations
$$[E_r, U_s] = \delta_{rs} U_s , \qquad [E_r, U_s^\dagger] = -\delta_{rs} U_s^\dagger .$$
The fermionic operators $\psi_r, \psi_r^\dagger$ coupled to the link operators live on lattice vertices $r$ and satisfy the anticommutation relations
$$\{ \psi_r, \psi_s^\dagger \} = \delta_{rs} , \qquad \{ \psi_r, \psi_s \} = 0 .$$
Finally, $H_M$ is the mass energy of the fermions; the alternating sign $(-1)^r$ corresponds to the use of staggered fermions [51].
The full Hamiltonian acts on a Hilbert space characterized by the on-link bosonic Hilbert spaces and the on-site fermionic excitations. Moreover, the physical interpretation of the fermionic Hilbert spaces is

occupied even site $\sim$ presence of a positron,
empty odd site $\sim$ presence of an electron,

so the operator that counts electric charge at site $r$ is given by
$$Q_r = \psi_r^\dagger \psi_r - \frac{1 - (-1)^r}{2} .$$
Gauge invariance leads to a local constraint, Gauss's law, relating the states of the links to the states of the sites:
$$G_r = E_r - E_{r-1} - Q_r , \qquad G_r \, |\mathrm{phys}\rangle = 0 .$$
These $G_r$ operators generate gauge transformations and must commute with the full Hamiltonian. It is well known that in 1d one can gauge-fix all the links (except for one in a periodic volume), essentially removing the gauge degree of freedom. Correspondingly, the constraint can be solved exactly to obtain purely fermionic dynamical variables at the cost of introducing long-range interactions. This is not considered in this work since it is not representative of the situations encountered in higher dimensions.

Qubit Representation of the Hamiltonian
A link Hilbert space is equivalent to the infinite-dimensional Hilbert space of a particle on a circle, and it is convenient to use a discrete momentum-like electric eigenbasis $|\varepsilon\rangle_r$, in which $E_r$ takes the form
$$E_r = \sum_{\varepsilon \in \mathbb{Z}} \varepsilon \, |\varepsilon\rangle\langle\varepsilon|_r .$$
In this basis, the link operators $U_r$ and $U_r^\dagger$ have simple raising and lowering actions,
$$U_r |\varepsilon\rangle_r = |\varepsilon + 1\rangle_r , \qquad U_r^\dagger |\varepsilon\rangle_r = |\varepsilon - 1\rangle_r .$$
The microscopic degrees of freedom are depicted in Figure 1.
In order to represent the system on a quantum computer, we first truncate the Hilbert space. On the link Hilbert spaces, we do this by periodically wrapping the electric field at a cutoff $\Lambda$, restricting to $\varepsilon \in \{-\Lambda, \ldots, \Lambda - 1\}$ and identifying
$$U_r |\Lambda - 1\rangle = |-\Lambda\rangle , \qquad U_r^\dagger |-\Lambda\rangle = |\Lambda - 1\rangle .$$
This modifies the on-link commutation relation (7) at the cutoff.

Figure 1: Labeling of the 1d lattice degrees of freedom on a finite, non-periodic lattice.
This altered theory (which is distinct from simulating a $\mathbb{Z}(2\Lambda)$ theory due to $H_E$ remaining quadratic in the electric field) allows the $U_r$ operator to be easily simulated using a cyclic incrementer quantum circuit. However, the periodic identification requires one to minimize simulating states that approach the cutoff, in order to avoid mapping $|\Lambda - 1\rangle \leftrightarrow |-\Lambda\rangle$ (which we expect not to be an issue in practice). The asymmetry between the positive and negative bounds is introduced to make the link Hilbert space even-dimensional, so the electric field is more naturally encoded with a register of qubits. The number of qubits on the link registers is given by
$$\eta = \log(2\Lambda) ,$$
where we will implicitly assume that $\Lambda$ is a non-negative power of 2 and all logarithms are base 2.
We map the electric field states into a computational basis with $\eta$ qubits, i.e., into a tensor product space of $\eta$ identical qubits. Here, the Hilbert space of any single qubit is spanned by computational basis states $|0\rangle$ and $|1\rangle$, on which the Pauli operators take the form
$$X = |0\rangle\langle 1| + |1\rangle\langle 0| , \qquad Y = -i |0\rangle\langle 1| + i |1\rangle\langle 0| , \qquad Z = |0\rangle\langle 0| - |1\rangle\langle 1| .$$
A non-negative integer $0 \le j < 2^\eta$ is represented on the binary $\eta$-qubit register as
$$|j\rangle = |j_{\eta-1}\rangle \otimes \cdots \otimes |j_1\rangle \otimes |j_0\rangle , \qquad j = \sum_{k=0}^{\eta-1} j_k 2^k .$$
Using this unsigned binary computational basis, the electric eigenbasis $|\varepsilon\rangle$ is encoded via
$$|\varepsilon\rangle \mapsto |\varepsilon + \Lambda\rangle .$$
The last ingredient to make the theory representable by a quantum computer is a volume cutoff; the lattice is taken to have an even number of sites $N$, corresponding to $N/2$ physical sites. In addition to the above truncations, we will take "open" boundary conditions (the system does not source electric flux beyond its boundary sites) and a vanishing background field. Note that the Gauss law constraint limits the maximum electric flux saturation possible in this setup: if $\Lambda > \lceil N/4 \rceil$ then the quantum computer can actually represent the entire (truncated) physical Hilbert space.
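As a concrete illustration of this encoding, the following minimal sketch (our own; the function name `encode` and the choice $\eta = 3$ are illustrative, not from the paper) maps electric eigenvalues $\varepsilon \in \{-\Lambda, \ldots, \Lambda-1\}$ to unsigned $\eta$-bit strings via the shift $|\varepsilon\rangle \mapsto |\varepsilon + \Lambda\rangle$:

```python
eta = 3                  # qubits per link register (illustrative choice)
Lam = 2 ** (eta - 1)     # electric field cutoff Lambda = 4

def encode(eps: int) -> str:
    """Map an electric eigenvalue eps in [-Lambda, Lambda - 1] to an
    eta-bit unsigned binary string via the shift |eps> -> |eps + Lambda>."""
    assert -Lam <= eps < Lam
    return format(eps + Lam, f"0{eta}b")

print(encode(-Lam))      # '000' : most negative field is all zeros
print(encode(0))         # '100' : zero field
print(encode(Lam - 1))   # '111' : most positive field is all ones
```

The lowest field value maps to the all-zeros string and the highest to all ones, which is why an even-dimensional (power-of-two) link Hilbert space fits the register exactly.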
The Jordan-Wigner transformation is a simple way to represent fermionic creation/annihilation operators with qubits. In the transformation, the state $|0\rangle$ encodes the vacuum state for a fermionic mode and $|1\rangle$ represents an occupied state. For the creation operator $\psi_r^\dagger$, we have
$$\psi_r^\dagger = \left( \prod_{s<r} \sigma_s^z \right) \sigma_r^- .$$
(We associate $\psi^\dagger$ with $\sigma^-$ because in the computational Pauli-$Z$ basis $\sigma^- = |1\rangle\langle 0|$.) Therefore the interaction Hamiltonian (4) can be expressed as [52]
$$H_I = x \sum_{r=1}^{N-1} \left( \sigma_r^- U_r \sigma_{r+1}^+ + \sigma_r^+ U_r^\dagger \sigma_{r+1}^- \right) ,$$
with the Hamiltonian now acting on a tensor product space of link Hilbert spaces and on-site qubit Hilbert spaces. Tensor product notation for operators will often be suppressed; at times a superscript $b$ ($f$) may be added to an operator $O_r$ to emphasize that $O_r$ acts on the $r$th bosonic (spin) degree of freedom. In terms of the Pauli matrices, $H_I$ can be reexpressed as
$$H_I = \frac{x}{4} \sum_{r=1}^{N-1} \left[ \sigma_r^x \left( U_r + U_r^\dagger \right) \sigma_{r+1}^x + \sigma_r^y \left( U_r + U_r^\dagger \right) \sigma_{r+1}^y + i \, \sigma_r^x \left( U_r - U_r^\dagger \right) \sigma_{r+1}^y - i \, \sigma_r^y \left( U_r - U_r^\dagger \right) \sigma_{r+1}^x \right] .$$
$H_I$ in this one-dimensional model is seen to involve only nearest-neighbor fermionic interactions after the Jordan-Wigner transformation, but in higher dimensions one will generally have to deal with non-local Pauli-$Z$ strings. Note that in (24) the Hamiltonian manifestly expands to a sum of unitary operators. This is true in spite of truncation, due to the periodic wrapping of $U_r$ that preserves its unitarity. This observation will be crucial to our development of simulation circuits.
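A quick numerical check (our own sketch, not code from the paper; `psi_dag` and the 3-site chain size are illustrative) that the Jordan-Wigner image $\psi_r^\dagger = (\prod_{s<r}\sigma_s^z)\,\sigma_r^-$, with $\sigma^- = |1\rangle\langle 0|$, reproduces the fermionic anticommutation relations:

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
sm = np.array([[0.0, 0.0], [1.0, 0.0]])   # sigma^- = |1><0|

def psi_dag(r: int, n: int) -> np.ndarray:
    """Jordan-Wigner image of psi_r^dagger on an n-site chain."""
    ops = [Z] * r + [sm] + [I2] * (n - r - 1)
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

n = 3
for r in range(n):
    for s in range(n):
        pd_r = psi_dag(r, n)
        p_s = psi_dag(s, n).conj().T
        anti = pd_r @ p_s + p_s @ pd_r        # {psi_r^dag, psi_s}
        expect = np.eye(2 ** n) if r == s else np.zeros((2 ** n, 2 ** n))
        assert np.allclose(anti, expect)
print("Jordan-Wigner anticommutators verified")
```

The $Z$ strings are what make operators on different sites anticommute (since $Z\sigma^- = -\sigma^- Z$), which is exactly the structure that becomes non-local in higher dimensions.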

Trotterized Time Evolution
Perhaps the central task in simulating dynamics on quantum computers is to compile the evolution generated by the Hamiltonian into a sequence of implementable quantum gates. Trotter-Suzuki decompositions were the first methods that were proposed to efficiently simulate quantum dynamics on a quantum computer [25,53,54]. The idea behind these methods is that one first begins by decomposing the Hamiltonian into a sum of the form $H = \sum_j H_j$ where each $H_j$ is a simple Hamiltonian for which one can design a brute-force circuit to simulate its time evolution. Examples of elementary $H_j$ that can be directly implemented include Pauli operators [25], discretized position and momentum operators [53,55], and one-sparse Hamiltonians [28,56,57].
The simplest Trotter-Suzuki formula used for simulating quantum dynamics is
$$e^{-iHt} = e^{-i \sum_{j=1}^{m} H_j t} \approx \prod_{j=1}^{m} e^{-iH_j t} .$$
Here we implicitly take all products to be lexicographically ordered in increasing order of the index $j$ of $H_j$ from right to left in the product. Similarly we use $\prod_{j=m}^{1}$ to represent the opposite ordering of the product. The error in the approximation can be bounded above by [58,59]
$$\left\| e^{-iHt} - \prod_{j=1}^{m} e^{-iH_j t} \right\| \le \frac{t^2}{2} \sum_{j<k} \left\| [H_j, H_k] \right\| ,$$
where $\|\cdot\|$ is the spectral norm. This shows not only that the error in the approximation is second order in $t$, but also that it is zero when the operators in question commute. In general, the error in a sequence of such approximations can be bounded using Box 4.1 of [60], which states that for unitaries $U_j$ and $V_j$,
$$\left\| U_s \cdots U_1 - V_s \cdots V_1 \right\| \le \sum_{j=1}^{s} \left\| U_j - V_j \right\| .$$
This implies that if we break a total simulation interval $T$ into $s$ time steps, then the error can be bounded by
$$\left\| e^{-iHT} - \left( \prod_{j=1}^{m} e^{-iH_j T/s} \right)^{\! s} \right\| \le \frac{T^2}{2s} \sum_{j<k} \left\| [H_j, H_k] \right\| .$$
Therefore if the propagators for each of the terms in the Hamiltonian can be individually simulated, then $e^{-iHT}$ can also be simulated within arbitrarily small error by choosing a value of $s$ that is sufficiently large. The simple product formula in (25) is seldom optimal for dynamical simulation because the value of $s$ needed to perform the simulation within $\epsilon$ spectral-norm error grows as $O(T^2/\epsilon)$. This can be improved by switching to the symmetric Trotter-Suzuki formula, also known as the Strang splitting, which is found by symmetrizing the first-order Trotter-Suzuki formula to cancel out the even-order errors. This formula reads (see [61] and [59])
$$U_2(t) := \prod_{j=1}^{m} e^{-iH_j t/2} \prod_{j=m}^{1} e^{-iH_j t/2} = e^{-iHt} + O(t^3) .$$
This expression can be seen to be tight (up to a single application of the triangle inequality) in the limit $t \ll 1$ from the Baker-Campbell-Hausdorff formula [59,62]. The improved error scaling similarly leads to a number of time steps that scales as $O(T^{3/2}/\sqrt{\epsilon})$. Higher-order variants can be constructed recursively [63] to achieve still better scaling [64], via
$$U_{2k}(t) := \left[ U_{2k-2}(u_k t) \right]^2 U_{2k-2}\!\big( (1 - 4u_k) t \big) \left[ U_{2k-2}(u_k t) \right]^2 ,$$
where
$$u_k := \frac{1}{4 - 4^{1/(2k-1)}} .$$
Until very recently, the scaling of the error in these formulas as a function of the nested commutators of the Hamiltonian terms was not known beyond the fourth-order product formula.
Recent work, however, has shown the following tighter bound for Trotter-Suzuki formulas:
$$\left\| e^{-iHt} - U_{2k}(t) \right\| \in O\!\left( t^{2k+1} \sum_{j_1, \ldots, j_{2k+1} = 1}^{M} \left\| \left[ H_{j_{2k+1}}, \left[ \cdots , \left[ H_{j_2}, H_{j_1} \right] \cdots \right] \right] \right\| \right) .$$
This specifically follows from Eqns. (189)-(191) of [59]. A major challenge that arises in making maximal use of this expression is the fact that there are many layers of nested commutators. This means that, although this expression closely predicts the error in Trotter-Suzuki expansions [59], precise knowledge of the commutators is needed to best leverage this formula. As the number of commutators in the series grows as $M^{2k+1}$, explicit calculation of the commutators is costly for all but the smallest values of $M$ and $k$ [65,66]. This can be ameliorated to some extent by estimating the sum via a Monte Carlo average [67], but millions of norm evaluations are often still needed to reduce the variance of the estimate to tolerable levels. Such sampling further prevents the results from being used as an upper bound without invoking probabilistic arguments and loose tail-probability bounds such as the Markov inequality. Even once these bounds are evaluated, it should be noted that the prefactors in the analysis are unlikely to be tight [59], which means that in order to understand the true performance of high-order Trotter formulas for lattice discretizations of the Schwinger model, it may be necessary to extrapolate the empirical performance from small-scale numerical simulations of the error.
These higher-order Trotter-Suzuki formulas are also seldom preferable for simulating quantum dynamics for modest-sized quantum systems. Indeed, many studies have verified that low-order formulas (such as the second-order Trotter formula) can actually cost fewer computational resources than their high-order brethren [61,67,68] for realistic simulations. Furthermore, the second-order formula can also outperform asymptotically-superior simulation methods such as qubitization [68,69] and linear-combination-of-unitaries (LCU) simulation methods [70,71]. This improved performance is anticipated to be a consequence of the fact that the Trotter-Suzuki errors depend on the commutators, rather than on the magnitude of the Hamiltonian terms' coefficients as per [69][70][71].
For the above reasons (as well as the fact that tight and easily evaluated error bounds exist for the second-order formula), we choose to use the second-order Trotter formula in the following discussion and explicitly evaluate the commutator bound for the error given in (30), which is the tightest known bound for the second-order Trotter formula.
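The error orders quoted above are easy to confirm numerically. The following sketch (our own, using a randomly chosen two-term Hamiltonian rather than the Schwinger model; `rand_herm` and `expmh` are our helper names) checks that halving the step size reduces the first-order Trotter error by roughly $2^2$ and the Strang-splitting error by roughly $2^3$:

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_herm(d):
    """Random d x d Hermitian matrix."""
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

def expmh(Hm, t):
    """e^{-i Hm t} for Hermitian Hm, via eigendecomposition."""
    w, V = np.linalg.eigh(Hm)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

H1, H2 = rand_herm(4), rand_herm(4)
H = H1 + H2

def err1(t):   # first-order Trotter error, O(t^2) per step
    return np.linalg.norm(expmh(H, t) - expmh(H1, t) @ expmh(H2, t), 2)

def err2(t):   # symmetric (Strang) splitting error, O(t^3) per step
    U = expmh(H1, t / 2) @ expmh(H2, t) @ expmh(H1, t / 2)
    return np.linalg.norm(expmh(H, t) - U, 2)

t = 1e-2
r1 = err1(t) / err1(t / 2)   # expected close to 4 (error ~ t^2)
r2 = err2(t) / err2(t / 2)   # expected close to 8 (error ~ t^3)
print(r1, r2)
```

The ratios approach 4 and 8 as $t \to 0$, matching the $O(T^2/\epsilon)$ versus $O(T^{3/2}/\sqrt{\epsilon})$ step counts discussed above.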

Comparison to Qubitization and Linear Combination of Unitaries Methods
One aspect in which the second-order Trotter formulas that we use perform well relative to popular methods, such as qubitization [69], linear combinations of unitaries [69][70][71] or their classically-controlled analogue QDRIFT [72,73], is that the complexity of the Hamiltonian simulation scales better with the size of the electric cutoff, $\Lambda$. The complexities of qubitization, LCU and QDRIFT all scale linearly with the sum of the Hamiltonian terms' coefficients when expressed as a sum of unitary matrices. Since $H_E$ is quadratic in the electric field, this leads to a scaling of the number of operations with $\Lambda$ of $O(\Lambda^2)$. Instead, it is straightforward to note that the norms of the commutators $[E_r^2, [E_r^2, H_I]]$ scale as $O(\Lambda^2)$ and that all other terms scale at most as $O(\Lambda^2)$ (see Lemma 20 for more details). From (30), this leads to a number of Trotter steps needed for the second-order Trotter-Suzuki formula that is in $O(\Lambda)$. Thus, accounting for the logarithmic-sized circuits that we will use to implement these terms, the total cost for the simulation is in $\widetilde{O}(\Lambda)$, which is (up to polylogarithmic factors) quadratically better than LCU or qubitization methods, and without the additional spatial overheads they require [68].
The fact that Trotter-Suzuki methods can be the preferred method for simulating discretizations of unbounded Hamiltonians is also noted in [74], where commutator bounds are used to show that quartic oscillators can be simulated more efficiently with Trotter-series decompositions than would be expected from a standard implementation of qubitization or LCU methods. This is why we focus on Trotter methods in this study. It is left for subsequent work to examine the performance of the post-Trotter methods discussed above.

Trotter-Suzuki Decomposition for the Schwinger Model
In order to form a Trotter-Suzuki formula, we decompose the Schwinger model Hamiltonian into a sum of simulatable terms. Ideally, each term would also be efficiently simulatable. Examples of such terms are Pauli operators or operators diagonal in the computational basis (with efficiently computable diagonal elements) [28,60].
First we define $T_r$ to be the interaction term coupling sites $r$ and $r+1$, and define $D_r$ to be the (diagonalized) mass plus electric energy associated to site $r$:
$$T_r := x \left( \sigma_r^- U_r \sigma_{r+1}^+ + \sigma_r^+ U_r^\dagger \sigma_{r+1}^- \right) , \qquad D_r := E_r^2 + \frac{\mu}{2} (-1)^r \left( I - \sigma_r^z \right) .$$
The $D_r$ are each a sum of two terms that commute, so that
$$e^{-iD_r t} = e^{-iE_r^2 t} \, e^{-i \frac{\mu}{2} (-1)^r (I - \sigma_r^z) t} .$$
The $T_r$ are "hopping" terms, which we further decompose in Section 3.1 into four non-commuting Hermitian terms $T_r^{(i)}$. Using these terms, we determine the corresponding resource scaling of second-order simulation (29). Explicitly, we choose to approximate the time-evolution operator with
$$e^{-iHt} \approx V(t) := \prod_{j=1}^{m} e^{-iH_j t/2} \prod_{j=m}^{1} e^{-iH_j t/2} ,$$
where $\{H_j\}_{j=1}^{m}$ runs over the $D_r$ and $T_r^{(i)}$ terms. The following sections discuss the computational models we will use to analyze the cost of implementing time evolution and measurement, using the above approximation.
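The assembly of the symmetric step from a list of term propagators can be sketched as follows (our own code, using generic random Hermitian terms in place of the actual $D_r$ and $T_r^{(i)}$ operators; `strang_step` and `expmh` are our names):

```python
import numpy as np

def expmh(Hj, t):
    """e^{-i Hj t} for Hermitian Hj, via eigendecomposition."""
    w, V = np.linalg.eigh(Hj)
    return (V * np.exp(-1j * w * t)) @ V.conj().T

def strang_step(terms, t):
    """Symmetric Trotter step: half-steps of every term in a fixed
    order, then the same half-steps in the reversed order."""
    halves = [expmh(Hj, t / 2) for Hj in terms]
    U = np.eye(terms[0].shape[0], dtype=complex)
    for Uj in halves:            # forward ordering, j = 1..m
        U = Uj @ U
    for Uj in reversed(halves):  # reversed ordering, j = m..1
        U = Uj @ U
    return U

rng = np.random.default_rng(0)
d, m = 4, 3
terms = []
for _ in range(m):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    terms.append((M + M.conj().T) / 2)
H = sum(terms)

t = 1e-3
err = np.linalg.norm(expmh(H, t) - strang_step(terms, t), 2)
print(err)   # third order in t: far smaller than t itself
```

Each term only needs its own easily implemented propagator, which is exactly why the decomposition into the simple $D_r$ and $T_r^{(i)}$ pieces is useful.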

Definitions of Noisy Entangling Gate and Fault-Tolerant Models
Throughout the rest of the paper, our analysis will usually operate under one of the two computational models discussed here.

Noisy Entangling Gate Model
With the possible exception of topological qubits, physical realizations of quantum computers will usually be able to perform arbitrary single-qubit rotations accurately and inexpensively. In contrast, two-qubit interactions will often be more time-consuming and/or less accurate. Thus, counting the number of two-qubit gates needed to implement a protocol is a significant metric of the cost of implementing an algorithm on most NISQ platforms. We call this computational model the NEG (Noisy Entangling Gate) model for brevity.
In particular, here we take a more specific model than simply counting two-qubit gates. We assume that the user has single-qubit gates and measurements that are computationally free and also error free. They also have access to noisy CNOT channels $\widetilde{\mathcal{C}}_x$ that act between any two qubits (meaning that the qubits are on the complete graph); if we define $\mathcal{C}_x$ to be the ideal CNOT channel, then
$$\left\| \widetilde{\mathcal{C}}_x - \mathcal{C}_x \right\|_\diamond \le \delta_g .$$
Here the diamond norm is defined to be the maximum trace distance between the outputs of the channels over all input states [75]. This error is assumed to not be directly controllable and thus places a limit on our ability to accurately simulate the quantum dynamics.
Additionally, because fault tolerance is not assumed to hold, we cannot make the simulation error arbitrarily small in the NEG model. Therefore, the analysis needed in this model is qualitatively different than that needed in the case of fault tolerance.

Fault-Tolerant Model
In this model, we choose to minimize the number of non-Clifford operations, specifically the number of $T$-gates, because they are by far the most expensive gates to implement in most approaches to fault tolerance. This is because the Eastin-Knill theorem prohibits a universal set of fault-tolerant gates from being implemented transversally [76]. Magic state distillation is commonly used to sidestep this restriction, but it is still by far the most costly aspect of this approach to fault tolerance [77]. For this reason, we will consider the cost of circuits to be the number of $T$-gates they require.

Trotter Step Implementation for Noisy Entangling Gate Model
Here we lay out the design primitives that we use to simulate the Schwinger model using quantum computers. The focus here will be on those elements that would be appropriate to use in a resource constrained setting, wherein the number of qubits available is minimal and two-qubit gates are not assumed to be perfect. Later we will adapt some of the components to provide better asymptotic scaling, which is important for addressing the intrinsic complexity of simulating Schwinger model dynamics.

Implementing (Off-Diagonal) Interaction Terms T
The hopping terms in the Hamiltonian govern the local creation and annihilation of electron-positron pairs; the associated Hamiltonian contribution $T_r$ acts on each site $1 \le r \le N-1$. To fully translate this into qubit operators, we use the linear mapping (21) of electric fields onto a binary register of $\eta$ qubits: $|-\Lambda\rangle \to |00\cdots0\rangle$, $|-\Lambda+1\rangle \to |00\cdots1\rangle$, and so on up to $|\Lambda-1\rangle \to |11\cdots1\rangle$. As discussed before, if the electric field cutoff is chosen such that the dynamical population of states at $\pm\Lambda$ becomes negligible, then the digitized link operator may be wrapped periodically, simplifying the following discussion. The combinations of link raising and lowering operators appearing in the Hamiltonian then take the form $U \pm U^\dagger$. As written, the operators $U \pm U^\dagger$ couple all possible nearest-neighbor pairs in the ordered binary basis. Matters are vastly simplified by splitting these operators into terms that are block-diagonal, with each block only mixing a two-dimensional subspace; this is possible by decomposing $U \pm U^\dagger$ into block-diagonal operators $A$ and $B$ together with their shifted counterparts $\tilde{A}$ and $\tilde{B}$. Above, $S_E = \sum_{j=0}^{2^\eta - 1} |j + 1 \ (\mathrm{mod}\ 2^\eta)\rangle\langle j|$ is a unit shift operator on the electric basis, numerically identical to the periodically-wrapped $U$, which transforms $\tilde{A}$ ($\tilde{B}$) into $A$ ($B$). The decompositions above therefore show how to map the action of the link operator combinations into incrementers, decrementers, and actions on an individual qubit.
To implement the incrementer of the link basis, $S_E$, one can use a quantum Fourier transform and single-qubit rotations in Fourier space, as shown in Figure 2. An asymptotically advantageous alternative, which uses $\eta - 1$ ancillary qubits, is presented in Section 4.1.
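The Fourier-space implementation works because the cyclic shift is diagonalized by the discrete Fourier transform. A quick numerical confirmation (our own sketch; the dense matrices and $\eta = 3$ are illustrative):

```python
import numpy as np

eta = 3
d = 2 ** eta

# Cyclic shift S_E |j> = |j + 1 mod 2^eta>
S = np.zeros((d, d))
for j in range(d):
    S[(j + 1) % d, j] = 1.0

# Discrete Fourier transform matrix F_{jk} = exp(2 pi i j k / d) / sqrt(d)
F = np.array([[np.exp(2j * np.pi * j * k / d) for k in range(d)]
              for j in range(d)]) / np.sqrt(d)

# In the Fourier basis the shift is a diagonal phase rotation
phases = np.diag(np.exp(-2j * np.pi * np.arange(d) / d))

assert np.allclose(S, F @ phases @ F.conj().T)
print("S_E = F diag(e^{-2 pi i k / d}) F^dagger verified")
```

This is precisely the structure of Figure 2: a QFT, a column of single-qubit phase rotations, and an inverse QFT.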
Turning to the factors from the fermionic fields, the pertinent operators are built from Pauli matrices and $S$, the "phase gate" $|0\rangle\langle 0| + i |1\rangle\langle 1|$. To reduce clutter, these composite operators are denoted by $G$. To simulate a hopping term in the Trotter step $V(t)$, we will employ the approximation (50); a circuit representation of its right-hand side is given in Figure 3.

Figure 3: A circuit to simulate the Schwinger model hopping terms, $\prod_{j=4}^{1} e^{-iT^{(j)} t/2}$, in the order corresponding to (50). The locality of the presented operator will be expanded to include $\eta$-distance CNOTs between qubits representing fermionic degrees of freedom in quantum registers with one-dimensional connectivity. The gates labeled $+1$ and $-1$ are the incrementer and decrementer circuits.

This routine can be understood in a simple way by first noting the similarity of the four $T^{(i)}$ operators: the whole circuit is essentially just four applications of $e^{-itT^{(1)}/2}$ along with appropriately inserted basis transformations and rotation-angle negations. The specific ordering of the $T^{(i)}$ chosen yields cancellations that reduce the number of internal basis transformations that must be individually executed. A few single- and two-qubit gates are also spared by additional cancellations. The remainder of this section addresses the implementation of $e^{\mp itT^{(1)}/2}$. To effect an application of $e^{-itT^{(1)}/2}$, one can first transform to a basis in which $X \otimes G$ is diagonal. (Recall $A$ is just $X_0$, a bit flip on the last qubit of the bosonic register.) $G$ is diagonalized by the Bell states $(|0b\rangle \pm |1\bar{b}\rangle)/\sqrt{2}$, with $\bar{b}$ indicating the binary negation of $b$, while $X$ is diagonalized by $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$. Thus, in the Bell basis, we implement rotations conditioned on $a$ and $b$.
Figure 4: Simplified circuit for simulating $e^{-iE_r^2 t}$ in the qubit-limited setting. The circuit is shown acting on the product state $\otimes_{k=0}^{\eta-1} |j_k\rangle$ to clearly mark which qubit each gate is intended to act upon, although the circuit is valid for arbitrary inputs.
The first two time slices of the circuit serve to change to the $X \otimes G$ eigenbasis. The subsequent parallel $R_z$ rotations flanked by CNOTs implement the controlled rotations in the computational basis; this is equivalent to acting with $e^{-\frac{ixt}{4} Z \otimes Z}$. After undoing the basis transformation, we will have effected $e^{-\frac{ixt}{8} A \otimes G}$. Three similar operations are executed in the remainder of the circuit; an incrementer $S_E$ (denoted by "+1"), the phase gates, and the overall minus sign on the rotations in the latter half of the circuit all stem directly from the relations given in (55)-(57).
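The CNOT-flanked $R_z$ pattern relies on the standard identity $e^{-i\theta Z \otimes Z} = \mathrm{CNOT}\,(I \otimes e^{-i\theta Z})\,\mathrm{CNOT}$: the CNOT writes the parity of the two qubits into the target, a single-qubit $Z$ rotation phases that parity, and the second CNOT uncomputes it. A quick check (our own sketch):

```python
import numpy as np

theta = 0.81
Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

rz = np.diag(np.exp(-1j * theta * np.diag(Z)))     # e^{-i theta Z}
lhs = np.diag(np.exp(-1j * theta * np.diag(ZZ)))   # e^{-i theta Z (x) Z}
rhs = CNOT @ np.kron(np.eye(2), rz) @ CNOT
assert np.allclose(lhs, rhs)
print("CNOT-conjugation identity verified")
```

This is the building block behind the parallel rotations in Figure 3.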
The above discussion is summarized below as a lemma for convenience.
Proof. The time evolution associated with the electric energy can be exactly implemented by utilizing the structure of the operator. As defined in (14), $E = \mathrm{diag}[-\Lambda, -\Lambda+1, \ldots, \Lambda-1]$, where $\Lambda$ is the electric field cutoff. Note that the diagonal elements of $E^2$ are not distributed symmetrically: the first diagonal entry is $\Lambda^2$ while the last entry is $(\Lambda-1)^2$. This lack of symmetry is required to incorporate the gauge configuration with zero electric field. However, symmetry can be leveraged by using the following operator identity:
$$E^2 = \left( E + \tfrac{1}{2} I \right)^2 - \left( E + \tfrac{1}{2} I \right) + \tfrac{1}{4} I .$$
The operator $E + \tfrac{1}{2} I = \tfrac{1}{2} \mathrm{diag}[-2\Lambda+1, \ldots, -1, 1, \ldots, 2\Lambda-1]$ is skew-persymmetric, containing positive-negative pairs along the diagonal. We then have from (68)
$$e^{-iE^2 t} = e^{-i (E + \frac{1}{2} I)^2 t} \, e^{i (E + \frac{1}{2} I) t} \, e^{-i t/4} .$$
Since unitaries are equivalent in quantum mechanics up to a global phase, we can ignore the last phase in the computation (and even if we did not want to ignore it, it could be efficiently computed since $t$ is a known quantity).
Assuming that each lattice link is given an $\eta$-qubit register to represent the electric fields ($\Lambda = 2^{\eta-1}$), the goal is to find a compact representation of $E + \frac{1}{2}I$ as a sum of Pauli operators. This can be done by performing a decomposition of positive submatrices using the binary representation of integers. Alternatively, symmetric averages across matrix sub-blocks with dimensions scaling by powers of two may be used to identify where the $(\eta-1)$th qubit in the register represents the most significant digit in the binary representation. Intuitively, the least significant qubit in the binary representation is seen to have a coefficient of $\frac{1}{2}$, necessary to split the unit integer spacing in the value of the electric field of neighboring quantum states.
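Under the labeling assumed here ($E|j\rangle = (j - \Lambda)|j\rangle$ with qubit $k$ storing bit $2^k$ of $j$, and $Z$ eigenvalue $+1$ on $|0\rangle$), the compact Pauli representation works out to $E + \frac{1}{2}I = -\sum_{k=0}^{\eta-1} 2^{k-1} Z_k$, one $Z$ per qubit. A small numerical check of this decomposition:

```python
import numpy as np
from functools import reduce

# Sketch (assumed conventions): link register stores j = 0..2^eta - 1 with
# E|j> = (j - Lam)|j>, Lam = 2^(eta-1), qubit k holding bit 2^k of j.
# Then E + (1/2) I = -(sum_k 2^(k-1) Z_k): one Pauli-Z per register qubit.
eta = 3
Lam = 2 ** (eta - 1)
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def z_on(k):
    # Z acting on qubit k; qubit eta-1 (most significant bit) is the
    # leftmost Kronecker factor under the standard basis ordering
    ops = [Z if q == k else I2 for q in reversed(range(eta))]
    return reduce(np.kron, ops)

E_plus_half = np.diag([j - Lam + 0.5 for j in range(2 ** eta)])
pauli_sum = -sum(2.0 ** (k - 1) * z_on(k) for k in range(eta))
assert np.allclose(E_plus_half, pauli_sum)
```

The half-integer coefficient $2^{-1}$ on the least significant qubit is exactly the $\frac{1}{2}$ splitting of the unit spacing discussed above.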
The leftmost column of $R_z$ operations in Figure 4 can then be seen to implement the linear exponential in (69). This requires $\eta$ single-qubit rotations and no CNOT gates. The quadratic contribution in (69) is given by a sum of entangling operations involving all qubit pairs in the register. We can manipulate this expression so that its exponential is easier to realize as a quantum circuit: pulling out summands proportional to the identity, exponentiating the constant term results in a global phase that we can drop. We give the corresponding quantum circuit in Figure 5.
Note that the network of $\mathrm{CNOT}_{i,j}$ operations (with control qubit $i$ and target qubit $j$) used in implementing (74) can be simplified. We can reduce the number of CNOTs in Figure 5 by using the identity (76), which can be seen by comparing the action of the left- and right-hand sides on the substring $|x_j, x_{j+1}, \dots, x_{\eta-1}\rangle$ of a computational basis vector $|x\rangle$. The right-hand side of (76) requires fewer CNOTs to implement than the left-hand side. Reducing the circuit of Figure 5 to Figure 4 employs $\eta - 2$ applications of (76) between the columns of $z$-rotations, along with a minor CNOT simplification at the end of the circuit. Counting the CNOTs in Figure 4 and the single-qubit rotations needed for the circuit, the electric time evolution operator $e^{-iE^2 t}$ can be implemented exactly (up to a $t$-dependent global phase) using $\eta(\eta+1)/2$ single-qubit rotations and $(\eta+2)(\eta-1)/2$ CNOTs.

Figure 5: Circuit for simulating $e^{-iE^2 t}$ in the qubit-limited setting. The circuit is shown acting on the product state $\bigotimes_{k=0}^{\eta-1}|j_k\rangle$ to clearly mark which qubit each gate is intended to act upon, although the circuit is valid for arbitrary inputs.
Besides being conducive to implementation on qubit-limited hardware, this strategy is attractive as a decomposition into mutually commuting operators, contributing no additional systematic errors to the Trotterized time-evolution operator. It is interesting to note that this set of all two-qubit $Z$ operators has also been found sufficient to implement time evolution of the scalar field mass term [27,[78][79][80][81]. Similar resource-constrained results can be obtained using singular value transformations or phase arithmetic [69,82,83], but these approaches require at least one additional qubit. For fault-tolerant implementation, a quadratically improved method utilizing arithmetic with ancillary qubits is presented in Section 4.2.

Cost to Implement Approximate Time Step in Noisy Entangling Gate Model
In the following statement, we summarize the cost of implementing a single time step V (t) (38). Note that we may implement these exactly in the NEG computational model, where single-qubit operations are free.

Lemma 3 (Schwinger Time Step Cost in NEG Model). Consider any $t \in \mathbb{R}$. The unitary operation $V(t)$ as defined in (38) may be implemented on a quantum computer in the Noisy Entangling Gate model using a number of CNOTs that is at most the stated bound.

Proof. The proof follows by considering the symmetric Trotter--Suzuki expansion of the time-evolution operator, with $V$ defined as in (38) and restated here for convenience. The proof now proceeds by counting the number of gates needed to implement $V$. The mass terms, given in $D^{(M)}_r$, are single-qubit operations and require no CNOTs to implement; they are therefore free within the cost model considered here.
The hopping terms are implemented according to the discussion of Section 3.1, and their total cost is determined as follows: 18 explicit CNOTs appear in Figure 3, and the rest are embedded in the two shifters. Each shifter can be implemented as in Figure 2, namely using two quantum Fourier transforms and single-qubit rotations.

Figure 6: A circuit for incrementing the computational basis. The wire corners denote logical ANDs with ancilla qubits, which are initialized to $|0\rangle$.
The cost of the electric terms, given in $D^{(E)}_r$, follows from the fact that the procedure given in Lemma 2 can be implemented with no more than $(\eta+2)(\eta-1)/2$ CNOTs. Since we use the second-order Trotter formula, each Trotter step consists of $2(N-1)$ such evolutions, and the total cost is at most $(N-1)(\eta+2)(\eta-1)$ CNOT gates.

Trotter Step Implementation in Fault-Tolerant Model
In this section, we adapt the above implementations for fault-tolerant architectures, recalling that we wish to minimize the number of $T$-gates. Because single-qubit rotations are not free under the computational model of this section, we will in general have to implement them approximately. We denote our circuit construction of a time step $V(t)$ by $\widetilde{V}(t)$, introducing the tilde to emphasize the approximation errors that arise from circuit synthesis.

Implementing (Off-Diagonal) Interaction Terms T
First, we consider the hopping interactions.
In the fault-tolerant setting, the incrementers used in Figure 3 would be replaced by an optimized version that trades $T$-gates for additional ancillas. An example of such an incrementer, for $\eta = 3$, is depicted in Figure 6; the generalization is obtained by cascading the two-Toffoli construction between ancillas. Using the compute/uncompute trick for the logical ANDs [84], the total cost for each incrementer is $\eta - 2$ Toffolis and $\eta - 1$ ancillas.
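The logic the cascaded construction computes can be checked classically on basis states: bit $k$ of the register flips exactly when all lower bits are 1, which is the condition the chained logical ANDs accumulate in the ancillas. A minimal sketch of this basis-state action (assumed bit layout, least significant bit first; this models the truth table, not the quantum circuit):

```python
# Sketch: classical basis-state model of the cascaded incrementer.
# Assumed layout: bits[0] is the least significant bit. Bit k flips iff
# bits 0..k-1 are all 1 -- the AND condition held in the ancilla qubits.
def increment(bits):
    """Map |j> -> |j+1 mod 2^eta> on a list of bits, LSB first."""
    eta = len(bits)
    out = list(bits)
    # Controlled flips from most- to least-significant target, so each
    # check reads the still-unmodified low-order control bits.
    for k in range(eta - 1, 0, -1):
        if all(out[:k]):      # logical AND of bits 0..k-1
            out[k] ^= 1
    out[0] ^= 1               # final uncontrolled NOT on the lowest bit
    return out

eta = 4
for j in range(2 ** eta):
    bits = [(j >> k) & 1 for k in range(eta)]
    got = increment(bits)
    assert sum(b << k for k, b in enumerate(got)) == (j + 1) % 2 ** eta
```

Running the flips from the top down is what lets a single left-to-right AND cascade supply every control, matching the $\eta - 2$ Toffoli count quoted above.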
The $R_z$ rotations may be performed using the Repeat-Until-Success method discussed in [85], in which one repeatedly attempts to implement the rotation, with some fixed success probability, until it succeeds. Using one ancilla and measurement, this method implements a one-qubit rotation such that the approximation $R'$ of the rotation $R$ satisfies $\|R - R'\|_F \le \epsilon$ in the Frobenius norm (scaled by $1/2$), with an expected number of $T$-gates of $1.15\log(1/\epsilon)$. Since $\|R - R'\|_S \le \|R - R'\|_F$, choosing a precision of $\delta = 2\epsilon$, the expected $T$-count is $1.15\log(2/\delta)$ in terms of the spectral-norm error. Combining all of this, we obtain the following result.
Proof. Figure 3 establishes existence; it is proven to implement $\prod_{j=1}^{4} e^{-itT^{(j)}}$ by the discussion of Section 3.1. As for its cost, note that if the construction in Figure 6 is used, then the two incrementers contribute $2(\eta-2)$ Toffolis with $\eta = \log(2\Lambda)$. Toffolis may be implemented using an additional ancilla qubit and four $T$-gates each [86]; the single ancilla may be reused to implement all the Toffolis. Thus the incrementers contribute $8(\eta-2)$ $T$-gates and one ancilla, barring any simplifications.
The only other contributions are the eight z-rotations, each performed to accuracy at least δ/8, which from Box 4.1 of [60] guarantees a cumulative error of at most δ from synthesizing the rotations. Thus their expected T-count is at most 9.2 log(16/δ). Summing the two contributions in terms of T -gates gives the result.

Implementing (Diagonalized) Mass and Electric Energy Terms (D)
Next, we turn to simulating the non-interaction terms in the Hamiltonian.
The easiest part of the second-order Trotterized time step $V(t)$ to implement is the mass-term propagator $e^{-iD^{(M)}t}$, which consists of single-qubit $z$-rotations.
Lemma 5. Let $D^{(M)}$ be defined as in (34). For any $t \in \mathbb{R}$ (evolution time) and $\delta > 0$ (circuit-synthesis tolerance), there exists a unitary operation $\widetilde{V}_M(t)$ which can be implemented on a quantum computer such that $\|\widetilde{V}_M(t) - e^{-iD^{(M)}t}\| \le \delta$, which has an expected $T$-gate count of $1.15\log(2/\delta)$ and requires one ancilla.
Proof. The exponential of D (M ) is simply a z rotation. The cited complexity then follows from the repeat-until-success results given in [85].
The most direct way to implement the electric propagators $e^{-iD^{(E)}t} = e^{-iE^2 t}$ involves using a multiplication circuit to evaluate the value of $E^2$ on a given link and subsequently generating phase changes proportional to $E^2$. Here we present an elementary implementation of a multiplication circuit based on the logarithmic-depth ripple-carry adder of [87]. A host of multiplication algorithms could be considered; for simplicity, however, we use "grammar school" multiplication, whose circuit size is in $O(\eta^2)$ for $\eta$ bits. More advanced methods based on Fourier transforms, such as the Schönhage--Strassen multiplication algorithm or its refinement in Fürer's algorithm, can solve the problem using a number of gates in $O(\eta \log \eta \log\log \eta)$. While these methods are asymptotically more efficient, they often prove less practical for modest-sized problems and, furthermore, have space complexity that is harder to bound than that of the comparatively simple grammar-school algorithm.
We give below an implementation of a squaring algorithm based on this strategy and bound the number of non-Clifford operations as well as the number of ancillary qubits needed to compute the function out of place.
Lemma 6 (Squaring Cost in Fault-Tolerant Model). For any positive integer $\eta$ (link register size) there exist a positive integer $\alpha$ (number of ancillas) and a unitary operation $U \in \mathbb{C}^{2^{\eta+\alpha} \times 2^{\eta+\alpha}}$ such that for all computational basis vectors $|x\rangle \in \mathbb{C}^{2^\eta}$, $U : |x\rangle|0\rangle^{\otimes\alpha} \mapsto |x\rangle|x^2\rangle|0\rangle^{\otimes(\alpha-2\eta)}$, where $\alpha \le 5\eta - \lfloor\log\eta\rfloor - 1$, using a number of $T$-gates that is bounded above as stated.

Proof. We choose to base our multiplier on the logarithmic-depth adder proposed by Draper et al. in [87]. We use an in-place version of the adder that adds two $\eta$-bit integers using at most $10\eta - 3\lfloor\log\eta\rfloor - 13$ Toffoli gates by performing $|a\rangle|b\rangle|0\rangle^{\otimes\beta} \mapsto |a\rangle|a+b\rangle|0\rangle^{\otimes\beta}$, where $|a+b\rangle$ is $\eta+1$ bits and $|0\rangle^{\otimes\beta}$ consists of at most $2\eta - \lfloor\log\eta\rfloor - 2$ ancillary qubits.
To initialize the grammar-school multiplication algorithm, we copy the input $|x\rangle$ to an additional $\eta$-sized register, conditioned on the zeroth bit of $x$. This requires at most $\eta - 1$ Toffoli gates. Additionally, we append a blank ancilla as the most significant bit of the second register. Following initialization, the grammar-school multiplication algorithm proceeds in rounds. On the $j$th round ($j = 1, 2, \dots, \eta-1$), we perform the following sequence of steps: 1. Copy the input $|x\rangle$ to the third register, conditioned on the $j$th bit of $x$. This requires at most $\eta - 1$ Toffoli gates.
2. Apply the in-place addition algorithm to add the value in the third register (as ordered above) to the $\eta$ most significant bits of the second register (i.e., the bits from the $2^j$'s place through the $2^{\eta-1+j}$'s place). This requires at most $10\eta - 3\lfloor\log\eta\rfloor - 13$ Toffoli gates.
Upon completing $\eta - 1$ such rounds, the second register holds $|x^2\rangle$. Counting the cost of the entire algorithm, including initialization, gives the stated number of Toffoli gates.
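The arithmetic the rounds carry out is ordinary shift-and-add multiplication specialized to $x \cdot x$. A classical sketch of that logic, with one conditional copy plus one in-place addition per round as described above (this models the function computed, not the reversible circuit):

```python
# Sketch of the classical logic behind the squaring circuit: grammar-school
# ("shift-and-add") multiplication specialized to x*x. For each set bit j of
# x, accumulate x << j -- one conditional copy and one in-place add per round.
def square_grammar_school(x, eta):
    acc = x if x & 1 else 0               # initialization handles bit 0
    for j in range(1, eta):               # rounds j = 1 .. eta-1
        if (x >> j) & 1:                  # "copy |x> conditioned on bit j"
            acc += x << j                 # in-place addition into accumulator
    return acc

eta = 5
for x in range(2 ** eta):
    assert square_grammar_school(x, eta) == x * x
```

Each round touches $\eta$-bit operands only, which is why the Toffoli count is $(\eta - 1)$ rounds times a per-round cost linear in the adder size.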
As noted previously, Toffolis may be implemented using an additional ancilla qubit and four $T$-gates each [86], implying a factor of 4 when converting this bound to $T$-gates. The single additional ancilla may be reused for every Toffoli.
To conclude, counting the total number of ancillary qubits needed for this algorithm yields the claimed bound on $\alpha$. This multiplier can also be implemented in polylogarithmic depth by noting that each adder can be implemented in $O(\log\eta)$ depth and, by using a fan-out network, the results can be combined in depth $O(\log\eta)$, resulting in a total depth of $O(\log^2\eta)$.
Using the above squaring algorithm, we may implement the complex exponential of the electric energy operator.
Proof. To begin, note that on a basis state $|j\rangle$ the electric phase is $e^{-it(j-\Lambda)^2}$, and $(j-\Lambda)^2 = j^2 - 2\Lambda j + \Lambda^2$ with $2\Lambda = 2^\eta$. Therefore, up to a ($t$-dependent) phase that can be ignored, we need to implement the phases linear and quadratic in $j$. The phase linear in $j$ can be acquired (up to an unimportant global phase) by applying $e^{-i2^{\eta}2^{m}tZ/2}$ to each qubit $m = 0, 1, \dots, \eta-1$ of $|j\rangle$. The phase quadratic in $j$ can be similarly obtained by first computing $j^2$ into an ancilla register as described in Lemma 6, then applying $e^{it2^{m}Z/2}$ to each qubit $m = 0, 1, \dots, 2\eta-1$ of $|j^2\rangle$, and finally uncomputing $j^2$ from the ancilla register.
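The phase-kickback step for the quadratic term can be verified numerically: applying $e^{it2^{m}Z/2}$ to each bit $m$ of $|j^2\rangle$ reproduces $e^{-itj^2}$ up to one $j$-independent global phase. A short check under the conventions assumed above ($Z$ eigenvalue $+1$ on $|0\rangle$):

```python
import cmath

# Sketch (assumed conventions): single-qubit phases on the bits of |j^2>
# reproduce e^{-i t j^2} up to a j-independent global phase.
eta, t = 3, 0.437
phases = []
for j in range(2 ** eta):
    jsq = j * j
    acc = 1.0 + 0j
    for m in range(2 * eta):
        b = (jsq >> m) & 1
        z = 1 - 2 * b                     # Z eigenvalue on |b>
        acc *= cmath.exp(1j * t * (2 ** m) * z / 2)
    # divide out the target phase; the remainder must be j-independent
    phases.append(acc / cmath.exp(-1j * t * jsq))
assert all(abs(p - phases[0]) < 1e-12 for p in phases)
```

The leftover common factor is exactly the global phase the proof discards, and uncomputing $j^2$ afterward returns the ancilla register to $|0\rangle$.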
As there are 2η qubits used to store the value of j 2 and only η bits needed to store j, (89) implies that 3η single-qubit rotations are needed to perform the unitary transformation exactly.
The final step is to consider the cost of performing these rotations using the $\{H, T\}$ gate library. Using Box 4.1 from [60], it suffices to implement each rotation within error $\delta/3\eta$ to ensure that the overall error in the circuit is at most $\delta$. It then follows from [85], and the fact that the numbers of synthesis attempts made by the invocations of repeat-until-success synthesis are independent random variables, that the mean $T$-count of implementing all $3\eta$ rotations is at most $3.45\eta\log(3\eta/\delta)$, since the mean number of gates needed to synthesize a single-qubit rotation within operator error $\epsilon > 0$ is at most $1.15\log(1/\epsilon)$. The total $T$ cost is the sum of this cost and the cost of applying Lemma 6 twice. The ancilla count is given by that of the squaring circuit, since it is the circuit element requiring the most ancillas, and its ancillas can be reused to perform the $z$-rotations.
By this point, we have given asymptotically-advantageous implementations of every constituent of the second-order Trotter approximation V (t) of (38).

Cost to Implement Approximate Time Step in Fault-Tolerant Setting
We combine all the relevant lemmas above to determine the resource scaling for a single approximate time step. For convenience, we state the upper bound on the cost after factoring out a sum of terms that show the possible asymptotic behaviors. We use this format here and in subsequent results to emphasize possible asymptotic scalings.
Theorem 8 (Schwinger Time Step Cost in Fault-Tolerant Setting). For any $t \in \mathbb{R}$ (propagation time) and $\delta_{\mathrm{circ}} > 0$ (synthesis error), there exists a unitary operation $\widetilde{V}(t)$ which can be implemented on a quantum computer such that $\|\widetilde{V}(t) - V(t)\| \le \delta_{\mathrm{circ}}$, where $V(t)$ (the second-order Trotterized time step) is defined in (38). Furthermore, $\widetilde{V}(t)$ requires on average a number of $T$-gates bounded as stated; we give the result after factoring out the leading-order terms. The number of ancillas is given by Corollary 7, as the squaring circuit requires the largest number of ancillas. Adding the number of qubits needed to represent the fermionic and gauge fields, $\eta(N-1) + N$, we arrive at the claimed upper bound on the total number of qubits required for simulation.
We put these results to use in the following section.

Cost of Trotterized Time Evolution ($e^{-iHT}$)

In this section, we use the circuits given in Section 3 and Section 4 to analyze implementation of the time-evolution operator $e^{-iHT}$ in the Fault-Tolerant and NEG settings. We emphasize that because we provide loose upper bounds, the results contained here give sufficient (but not necessary) conditions for simulation. For convenience, we summarize the main points of our analysis with the following asymptotic scaling, an immediate consequence of the upper bounds we establish later on.

Corollary 9 (Trotterized Time Evolution Cost in Fault-Tolerant Model).
Consider any $\delta > 0$ (Trotter error) and $T \in \mathbb{R}$ (total evolution time), and let $V(t)$ be the second-order Trotter--Suzuki decomposition of $e^{-iHt}$ as defined in (38). Additionally, let $\mu$ be a constant and let $x$ be lower bounded by a constant (i.e., $x \in \Omega(1)$).
Under these assumptions, there exists an $s \in \mathbb{N}$ (number of Trotter steps) such that $\|V(T/s)^s - e^{-iHT}\| \le \delta$, where $V(T/s)^s$ consists of $N_{\mathrm{exp}}$ matrix exponentials. Furthermore, there exists an implementation of the Trotter decomposition within spectral-norm error $\delta$ of $e^{-iHT}$ that requires a number of $T$-gates in $\widetilde{O}\!\left(N^{3/2} T^{3/2} \sqrt{x}\,\Lambda/\sqrt{\delta}\right)$, where $\widetilde{O}(\cdot)$ is equivalent to $O(\cdot)$ but with all non-dominant sub-polynomial factors in the scaling suppressed.
Proof of this corollary follows after the next section.

Costing $e^{-iHT}$ Simulation in the Fault-Tolerant Model
With a second-order approximation, we exploit the fact that many of the Hamiltonian summands commute, and we calculate an error bound through (30). We order the summands of $H$ so that, recalling the decomposition of $D_r$ into mass and electric pieces, $[T_i, T_j] = [T_i, D_j] = 0$ whenever $i + 1 < j$. In other words, operations with at least one lattice site between them commute. We calculate the right-hand side of (92) in Appendix A according to these simplifications, giving the bound of Corollary 21. We recall the definitions of the parameters of the simulation: $N$ is the number of lattice sites, $\Lambda$ is the electric cutoff, $t$ is the time-step length, $x = 1/(ag)^2$ (where $a$ is the lattice spacing and $g$ is the coupling constant), and $\mu = 2m/(ag^2)$ with $m$ the mass of the particles on the lattice. Ultimately, we have the following lemma.
Lemma 10 (Trotter Step Count with Second-Order Product Formula). Consider any $\delta > 0$ (total Trotter error) and $T \in \mathbb{R}$ (total evolution time), and let $V(t)$ be the second-order Trotter--Suzuki approximation to $e^{-iHt}$ as defined in (38). Then $\|V(T/s)^s - e^{-iHT}\| \le \delta$ provided that $s$ satisfies (95), with $\rho(\delta)$ as defined in (96).

Proof. From (27) we have $\|V(T/s)^s - e^{-iHT}\| \le s\,\delta_{\mathrm{Trot}}$, where $\delta_{\mathrm{Trot}} = \|V(T/s) - e^{-iHT/s}\|$, so we seek the minimum $s$ such that $s\,\delta_{\mathrm{Trot}} \le \delta$. Corollary 21 gives a bound on $\delta_{\mathrm{Trot}}$, which we may use to choose $s$ so as to guarantee $\|V(T/s)^s - e^{-iHT}\| \le \delta$. Since the number of time steps must be an integer, it suffices to round the resulting value up.

Combining this result with the cost to implement a single Trotter step, we can determine the number of $T$-gates required to construct $e^{-iHT}$ to arbitrary precision $\delta$. Note that below we evenly distribute the errors due to circuit synthesis and due to Trotterization. However, when we later incorporate measurement, we will allow for different allocations of the errors in order to numerically optimize the $T$-gate count with respect to the error distribution.
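The step-count selection above can be sketched numerically. Assuming a per-step bound of the generic second-order form $\delta_{\mathrm{Trot}} \le K t^3$, where $K$ stands in for the commutator and lattice prefactors of Corollary 21 (the value below is a placeholder, not the paper's constant), the minimal integer $s$ follows from $s \cdot K (T/s)^3 \le \delta$:

```python
import math

# Sketch: pick the Trotter step count from a second-order error bound of the
# assumed form ||V(t) - e^{-iHt}|| <= K t^3. K is a placeholder constant.
def trotter_steps(K, T, delta):
    """Smallest integer s with accumulated error s * K * (T/s)^3 <= delta."""
    return math.ceil(math.sqrt(K * T ** 3 / delta))

K, T, delta = 2.0, 10.0, 1e-3
s = trotter_steps(K, T, delta)
assert s * K * (T / s) ** 3 <= delta                       # within budget
assert s == 1 or (s - 1) * K * (T / (s - 1)) ** 3 > delta  # near-minimal
```

This is the origin of the $\delta^{-1/2}\,T^{3/2}$ scaling of the step count: halving the target error multiplies $s$ by $\sqrt{2}$, and rounding up costs at most one extra step.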
We can now easily see Corollary 9, the asymptotic scaling of implementing $e^{-iHT}$ in the fault-tolerant setting.
Proof of Corollary 9. The scaling of the number of operator exponentials follows directly from Lemma 10 and (93). The scaling of the cost of fault-tolerant simulation is given in Corollary 11.

Costing $e^{-iHT}$ Simulation in the NEG Model Setting
Here, we briefly discuss the cost of implementing the unitary $e^{-iHT}$ within the NEG model (see Section 2.5.1), since a more rigorous analysis follows once we introduce measurement in Section 7.2.
The analysis required in this model differs from that in the Fault-Tolerant model because, under the assumptions of the NEG model, we may not approximate $e^{-iHT}$ to arbitrarily small error. However, the total error is bounded by the sum of the accumulated gate error $s\|\widetilde{V}(T/s) - V(T/s)\|$ and the Trotter error governed by $K$ (multiplying the propagation time cubed, along with the parameters of the Schwinger model) defined according to (94). If we can evaluate $\|\widetilde{V}(T/s) - V(T/s)\|$, we can find the step count $s_{\mathrm{err}}$ for which the bound above is minimized. The number of CNOTs required to approximate $e^{-iHT}$ to nearly optimal worst-case error is then $N_g s_{\mathrm{err}}$, with $N_g$ the bound on the number of CNOTs per time step given in Lemma 3. Another possible scenario is that we are given a fixed number $N_x$ of noisy CNOTs as a resource. In this case, we would be limited to $\lfloor N_x/N_g \rfloor$ time steps, and the only remaining free parameters in the simulation are in $K$, since $\|\widetilde{V}(t) - V(t)\|$ is independent of $t$ within the NEG model. The effect of tuning these parameters on the error bound (109) is implied within (94).
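The trade-off just described can be sketched as a one-dimensional minimization. Assuming a placeholder bound of the form $\mathrm{err}(s) = s\,\epsilon_{\mathrm{step}} + K T^3/s^2$ (consistent with the discussion above but not reproducing the constants of (109)), the near-optimal step count sits at the stationary point of the continuous relaxation:

```python
# Sketch: balancing per-step gate noise against Trotter error in the NEG
# setting. Assumed placeholder bound: err(s) = s*eps_step + K*T**3 / s**2,
# with eps_step modeling ||V~(t) - V(t)||, which is t-independent here.
def best_step_count(eps_step, K, T, s_max=10 ** 6):
    err = lambda s: s * eps_step + K * T ** 3 / s ** 2
    # stationary point of the continuous bound: s* = (2 K T^3 / eps)^(1/3);
    # err is convex in s, so the integer optimum neighbors s*
    s_star = (2 * K * T ** 3 / eps_step) ** (1 / 3)
    candidates = {max(1, int(s_star)), int(s_star) + 1, 1, s_max}
    return min(candidates, key=err)

s = best_step_count(eps_step=1e-4, K=1.0, T=2.0)
err = lambda v: v * 1e-4 + 1.0 * 2.0 ** 3 / v ** 2
assert all(err(s) <= err(v) for v in range(max(1, s - 3), s + 4))
```

More steps shrink the Trotter error as $1/s^2$ but accumulate gate noise linearly, so the best achievable worst-case error is fixed once the per-step noise is fixed, as the text observes.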
In Section 7.2 we will introduce measurement and translate the diamond-norm error due to the noisy CNOT resource into error in estimating a time-evolved observable (as opposed to spectral-norm error in the time-evolution operator). There, we give a more complete analysis that may be readily applied to other near-term Schwinger model simulations of interest.

Estimating Mean Pair Density
An important objective of simulation is to compute observables, rather than naively trying to learn the amplitudes of the final-state wavefunction. Therefore, in order to fully analyze the cost of simulation, we must choose some illustrative observable to examine. In the following, we adapt the above discussions to determine a sufficient number of calls to our time-evolution algorithm to estimate the expectation value of a simple example observable. We will estimate the mean pair density by way of the mean positron density $\hat{N}_p/N$, where $\hat{N}_p$ counts the total number of positrons on the lattice, i.e., $\hat{N}_p = \sum_{i=1}^{N/2} \hat{n}_{2(i-1)}$; the mean positron density and mean pair density are synonymous in the physical sector. However, we emphasize that much of this analysis would be similar for other operators of interest, such as fermionic correlation functions of the form [62] $C_j(t) = \langle\psi_0| e^{-iHt} a_j^{\dagger} e^{iHt} a_j |\psi_0\rangle$. The cost of such simulations is asymptotically the same as that of the mean-pair operator, after applying the Jordan--Wigner transformation and a "Hadamard test" circuit to learn the expectation value. However, we focus on the mean pair density because of its relative simplicity compared to other physically meaningful observables.
The following two subsections give distinct estimation schemes. Section 6.1 uses straightforward sampling. Section 6.2 employs amplitude estimation to achieve improved asymptotic scaling in the number of calls to our time-evolution algorithm (by a factor of $1/\epsilon$).

Estimating Mean Pair Density using Sampling
The following lemma determines a sufficient number of calls to our time-evolution algorithm (or "shots"), $N_{\mathrm{shots}}$, to estimate the mean pair density to rms error $\epsilon$ via straightforward sampling.
We give this result in terms of a parameter κ, which will be used later to optimize the resource scaling over the allotment of error to each contribution: the Trotter-Suzuki decomposition, the approximate circuit synthesis and the measurement error. As in previous sections, we factor out the dominant terms for convenience in subsequent discussions.
Lemma 12. Let $0 < \kappa < 1$, and let $|\psi\rangle, |\phi\rangle \in \mathbb{C}^{2^{\eta(N-1)+N}}$ be normalized states of the electric and fermionic fields on the Schwinger model lattice. If $\||\psi\rangle - |\phi\rangle\| \le \epsilon\sqrt{\kappa}$ and the number of samples in a stochastic sampling of the number of positrons in the state $|\phi\rangle$ is given by (110), then the empirically observed sample mean $\overline{N}$ satisfies the stated rms error bound.

Proof. We bound the bias via the triangle inequality, factoring and applying (111) twice, noting that both states are normalized and that $\|\hat{N}_p\| = N/2$. We assume that $\overline{N}$ is an unbiased estimator obtained by stochastic sampling of the mean number of positrons in the approximate state using $N_{\mathrm{shots}}$ samples, so that $\mathbb{E}(\overline{N}) = \langle\phi|\hat{N}_p|\phi\rangle$. The variance of this estimator then yields the rms error of the process.

Figure 7: Circuit for computing the expectation value of a set of $2^m$ unitary operations, where each $U_j$ is a unitary operation. The unitary operator $U_\psi$ is defined to prepare the state $|\psi\rangle$, i.e., $U_\psi : |0\rangle \to |\psi\rangle$. The expectation value is related to the probability of measuring all qubits to be in the zero state, as shown in Lemma 13.
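The variance argument behind the shot count can be illustrated with a toy Monte Carlo (the distribution below is illustrative only, not a Schwinger-model state): with per-shot variance $\sigma^2$, choosing $N_{\mathrm{shots}} \approx \sigma^2/\epsilon^2$ drives the sample-mean rms error to the target $\epsilon$.

```python
import random, statistics

# Sketch: the sampling estimator behind Lemma 12 on a toy distribution.
# With per-shot variance sigma^2, N_shots ~ sigma^2 / eps^2 suffices for
# sample-mean rms error eps. (Toy outcomes, not a Schwinger-model state.)
random.seed(7)
outcomes = [0, 1, 2]              # toy positron counts observed per shot
probs = [0.5, 0.3, 0.2]
mean = sum(p * o for p, o in zip(probs, outcomes))
var = sum(p * (o - mean) ** 2 for p, o in zip(probs, outcomes))

eps = 0.05
n_shots = int(var / eps ** 2) + 1  # sufficient shot count for rms error eps

def sample_mean(n):
    return statistics.fmean(random.choices(outcomes, probs, k=n))

# empirical rms over repeated experiments concentrates near sqrt(var/n_shots)
sq_errs = [(sample_mean(n_shots) - mean) ** 2 for _ in range(200)]
rms = statistics.fmean(sq_errs) ** 0.5
assert rms < 2 * eps
```

The state-preparation error $\epsilon\sqrt{\kappa}$ of the lemma adds a bias on top of this statistical error, which is why the error budget is split between the two contributions.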

Estimating Mean Pair Density using Amplitude Estimation
We assumed in the previous discussion that the mean number of positrons is gleaned by repeatedly measuring the system. A drawback of this approach is that the number of measurements scales as $O(1/\epsilon^4)$, where $\epsilon$ is the target rms error in the mean positron density. In contrast, using ideas related to Grover's search, the mean value can be learned using quadratically fewer queries. We present a simple method for estimating this mean using linear combinations of unitaries. We will refer to $\hat{N}_p$ and the shifted quantity $\hat{N}_p - N/4$ interchangeably as the positron number. Figure 7 provides a way to estimate the mean positron number on a lattice with $2^m$ physical sites from the probability of measuring a control register to be $|0\rangle^{\otimes m}$. Gauss' law equates the number of positrons with the number of electrons at all times in the physical-sector dynamics, though the error budget allows for errors that cause the state to leak into the unphysical space.

Lemma 13 (Generalized Hadamard Test). Let $U$ be the unitary transformation enacted by the circuit in Figure 7. Given access to controlled implementations of $U_0, \dots, U_{2^m-1}$, the circuit can be implemented using $2m$ ancillary qubits within the Clifford+$T$ gate library.
Proof. The circuit implementation of $U$ maps the initial state to a superposition whose component on the all-zeros ancilla branch is proportional to $\sum_j U_j|\psi\rangle$, plus a term $C|\phi\rangle$ with $(|0\rangle\langle 0| \otimes I)|\phi\rangle = 0$, where $C$ is an irrelevant constant. The probability of measuring all the ancilla qubits to be $0$ is therefore proportional to $\|\sum_j U_j|\psi\rangle\|^2$, and the result immediately follows.
The above lemma may be applied to the positronic sites of a Schwinger model lattice of $N = 2^{m+1}$ total fermionic sites. For convenience, we restrict our attention to the reduced lattice of $2^m$ positronic sites when proving the following corollary.
Corollary 14. Let $U_j$ be defined as above, where $Z_j$ is the Pauli-$Z$ operator acting on the $j$th qubit, and let $\Psi_j^\dagger$ be the creation operator acting on positron mode $j \in [0, 2^m-1]$. Then the mean positron number can be obtained from the measurement probability of Lemma 13. Further, $U$ can be implemented using $2^m(m-2)$ Toffoli gates and $2(m+1)$ ancillas.
Proof. The proof follows from the generalized Hadamard test given in Lemma 13 after applying the Jordan--Wigner transformation. In the Jordan--Wigner representation, the fermion number operator acting on site $j$ is $\Psi_j^\dagger\Psi_j = (I - Z_j)/2$; thus the mean positron number follows from the expectation values of the $Z_j$, and the claimed relation follows from the definition of $U_j$ together with Lemma 13. The bound on the number of ancillary qubits comes from noting that if $U_j$ has $m$ controls, then we can implement it using an $(m-1)$-controlled NOT gate and a single controlled-$U_j$ gate, using an additional ancilla to store the output. Using [88], such a gate can be implemented with at most $m-2$ Toffoli gates, each of which requires an ancillary qubit; therefore the number of ancillas needed in this protocol is $m-1$. However, if each Toffoli gate is implemented as in [86], then an additional ancilla is required to store the output, so the number of qubits needed to implement the controlled operations is at most $m$. Since $m+2$ ancillas are required in the generalized Hadamard test protocol on $m+2$ qubits, the total number of ancillas used (including those in the generalized Hadamard test circuit) is at most $2(m+1)$.
Theorem 15 (Estimating Mean Pair Density). Let $U_\psi$ be an oracle such that $U_\psi|0\rangle = |\psi\rangle$, and assume the user only has access to an oracle $\widetilde{U}_\psi$ such that $\|\widetilde{U}_\psi - U_\psi\| \le \epsilon/8$. There exists a quantum circuit that outputs an estimate $\overline{N}$ of the mean positron number per site such that the rms error of the estimate satisfies $\mathbb{E}\big(\big|\overline{N} - \tfrac{1}{2^m}\sum_j \langle\psi|\Psi_j^\dagger\Psi_j|\psi\rangle\big|^2\big) \le \epsilon^2 < 1$, using a number of queries to $\widetilde{U}_\psi$ and a number of auxiliary $T$-gates each bounded above as stated.

Proof. The proof follows directly from prior results, the Amplitude Estimation Lemma (as given in Lemma 5 of [89]), and the Chernoff bound. For brevity, define $\langle\hat{N}_p\rangle = \sum_{j=0}^{2^m-1}\langle\psi|\Psi_j^\dagger\Psi_j|\psi\rangle$. First note that, assuming $\overline{N}$ is an unbiased estimator of $\langle\psi|\hat{N}_p|\psi\rangle$ (which amplitude estimation yields), the mean-square error is at most the sum of the variance of the estimate from amplitude estimation using a faulty state and the maximum square error in the mean due to simulation and synthesis.
We have from Corollary 14 that a deviation $\delta_L$ in the measured probability from its true expectation value shifts the inferred positron number accordingly. Next we need to relate $\delta_L$ to the simulation error. From the triangle inequality, for any operator $O$ of norm $1$, the deviation of its expectation value between $\widetilde{U}_\psi|0\rangle$ and $U_\psi|0\rangle$ is at most $2\|\widetilde{U}_\psi - U_\psi\|$. We then have that if $\|\widetilde{U}_\psi - U_\psi\| = \delta$ and $\hat{N}_p$ is Hermitian, the claimed bound follows from (125) and the fact that $\|\hat{N}_p/2^m\| = 1$. Now let us turn our attention to estimating the mean positron number using amplitude estimation. The Amplitude Estimation Lemma states that amplitude estimation can be performed, yielding an estimate of the probability, using $L$ iterations of the Grover operator defined by the reflections $I - 2|0\rangle\langle 0|$ and $I - 2U|0\rangle\langle 0|U^\dagger$. The error in the estimate yielded by amplitude estimation can be made at most $\epsilon_L$ with probability at least $8/\pi^2$, by Lemma 5 of [89], by choosing $L$ appropriately; solving the resultant quadratic equation for $L > 0$ yields (130). On the other hand, if an error occurs in the amplitude estimation routine, then the error in the inferred positron number is at most $2^m$, which means the error per site is at most $1$. Thus, if the success probability is at least $P$, the mean-square error of the procedure can be bounded accordingly. Choosing $\epsilon_L = \epsilon/2$, where $\epsilon^2$ is the target upper bound on the mean-square error of the estimate, it suffices to aim for a success probability satisfying (133). As the probability of success is only guaranteed to be at least $8/\pi^2$ [89], we cannot directly guarantee this accuracy using a single run.
In fact, the probability of success from amplitude estimation is in practice slightly lower than the $8/\pi^2$ lower bound given in [89], because finite error is introduced in implementing the quantum Fourier transform using $H$, $T$, and CNOT gates. For any projector $P$ and unitary operations $V$ and $\widetilde{V}$, the difference in measurement probabilities is bounded by the error in the unitaries. Thus, if the error in the Fourier transform is at most $\epsilon_{QFT}$, then from the union bound it suffices to inflate the target success probability accordingly. The success probability can, however, be amplified to this level at logarithmic cost using the Chernoff bound. Specifically, if we take the median of $R$ different runs of the algorithm, then the probability that the median constitutes a success approaches $1$ exponentially in $R$. This implies that (133) can be satisfied if we take $\epsilon \le 1$, $\epsilon_{QFT} \le \epsilon/8$, and a suitable choice of $R$. Next let us count the total number of operations. Every application of $U$ involves two queries to $\widetilde{U}_\psi$. Since two applications of $U$ are required in the Grover operator $I - 2U|0\rangle\langle 0|U^\dagger$, four queries to $\widetilde{U}_\psi$ are required per Grover iteration. It then follows from (130) and (137) that the total number of queries obeys the claimed bound. The number of $T$-gates required in addition to the oracle queries is slightly harder to find. First, note from [88] and [86] that an $(m-1)$-controlled Toffoli gate requires $2(m-2)+1$ standard (two-controlled) Toffoli gates to implement, which in turn require at most $4(2m-1)$ $T$-gates if measurement is used. Further, using the result of [84], half of these Toffolis can be negated using measurement and post-selected Clifford operations, as in Section 4.1. Thus the minimal cost of each $m$-controlled Toffoli gate is at most $4(m-1)$ $T$-gates. There are at most $2^{m+1}$ such gates, so the number of auxiliary operations is $2^{m+3}(m-1)$ to implement $U$, and thus $(2^{m+4}+4)(m-1)$ auxiliary $T$-gates suffice to implement $I - 2U|0\rangle\langle 0|U^\dagger$. Similarly, at most $4(m-1)$ $T$-gates are needed to implement the controlled gate in $I - 2|0\rangle\langle 0|$.
Each Grover iteration therefore requires at most $(2^{m+4}+8)(m-1)$ auxiliary $T$-gates. Next we need to bound the complexity of performing the quantum Fourier transform as part of the phase estimation algorithm used above. As discussed previously, $\lceil\log(L)\rceil \in O(\log(1/\epsilon))$ additional qubits are needed to perform the phase estimation, so the number of single-qubit rotations needed is at most $\log(L)(\log(L)+1)$. We need to ensure that the error from these rotations is at most $\epsilon^2/8$; from Box 4.1 of [60], the error per single-qubit rotation can then be as high as $\epsilon^2/(8\log(L)(\log(L)+1))$. The average gate complexity for implementing these rotations using repeat-until-success synthesis [85] then follows from the $1.15\log(1/\epsilon)$ cost per rotation. The claim involving the number of auxiliary $T$-gates then follows from (139) and (141).
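The median-of-$R$ amplification invoked above can be checked numerically: with per-run success probability $p = 8/\pi^2 > 1/2$, the exact binomial tail shows the median's failure probability decaying exponentially in $R$ (a sketch of the mechanism only; the constants of (137) are not reproduced here).

```python
import math

# Sketch: median-of-R amplification for amplitude estimation. Each run lands
# within eps_L of the true value with probability p = 8/pi^2 > 1/2; the
# median of R runs fails only if at least half of the runs fail.
p = 8 / math.pi ** 2

def median_failure_prob(R):
    # P(at most floor(R/2) successes): exact binomial tail
    return sum(math.comb(R, k) * p ** k * (1 - p) ** (R - k)
               for k in range(R // 2 + 1))

# failure probability decays exponentially with the number of runs
assert median_failure_prob(1) > median_failure_prob(11) > median_failure_prob(51)
assert median_failure_prob(51) < 1e-3
```

This is why the amplification costs only a logarithmic multiplicative overhead: driving the failure probability below a target $p_{\mathrm{fail}}$ needs $R \in O(\log(1/p_{\mathrm{fail}}))$ repetitions.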

Simulation including Estimation of Mean Pair Density
While many choices of observable could have been made, we focus on a particular one: the mean positron density, which in turn gives the number of pairs produced (on average) in the quantum dynamics. For simulations optimized for fault-tolerant quantum devices, the number of T-gates needed to compute this quantity within fixed rms error is described in the following corollary, an immediate consequence of the upper bounds we will give for fault-tolerant simulations with measurement via sampling or amplitude estimation. In particular, simulating the evolution of the mean pair density using amplitude estimation requires O(N log(Λ) + log(1/ǫ)) qubits.
The following section will end with the proof of this corollary.

Cost Analysis in Fault-Tolerant Model
We combine the results of prior sections to determine a sufficient number of T-gates to simulate the pair density evolution, for arbitrary simulation parameters, to precision ǫ in the rms error of the mean pair density. First, we consider estimation via sampling, incorporating parameters τ and κ to encode the distribution of error (in order to optimize this distribution with respect to T-gate cost).
Furthermore, the entire process requires on average at most the stated number of T-gates, where the factors are ρ as in Lemma 10, λ as in Theorem 8, and ν as in Lemma 12, and N(η + 1) + 4η − ⌊log η⌋ − 1 qubits are required.
Proof. Assume 0 < κ < 1; we include κ in order to numerically optimize the error distribution.

We get that, by Lemma 12, the required number of shots follows. Theorem 8 gives the cost of a single Trotter step as a function of δ_circ, Λ = 2^{η−1}, and N; denote this cost by C_Trot. The total cost of the entire process is thus C := s C_Trot N_shots, which, for fixed lattice parameters, is a function only of the parameters κ and τ that encode the error distribution.
We numerically minimize the total cost C(κ, τ) over κ and τ for given lattice parameters using Mathematica, reporting the results in Appendix B. As we were unable to produce an analytical minimum that is both transparent and free of approximation, we state the result with κ = τ = 1/2 and note that, over the parameter range in Appendix B, the resulting upper bound on C is at most twice the optimal T-gate bound found numerically.
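The kind of two-parameter optimization performed in Mathematica can be illustrated with a simple grid search. The cost surrogate below is purely hypothetical (a shot count growing as 1/(κǫ)², a Trotter step count growing as the inverse square root of the Trotter budget, and a logarithmic synthesis factor); it mimics the structure of C(κ, τ) but is not the paper's actual expression:

```python
import math

# Hypothetical cost surrogate (NOT the paper's expression): kappa splits the
# rms error budget between sampling and coherent error, and tau splits the
# coherent budget between Trotter error and rotation-synthesis error.
def total_cost(kappa, tau, eps=1e-2):
    shots = 1.0 / (kappa * eps) ** 2                    # stand-in for N_shots
    steps = 1.0 / math.sqrt(tau * (1.0 - kappa) * eps)  # stand-in for s
    synth = 10.0 + math.log(1.0 / ((1.0 - tau) * (1.0 - kappa) * eps))
    return shots * steps * synth                        # C = s * C_Trot * N_shots

def grid_minimize(f, n=99):
    """Brute-force search over an (n-1) x (n-1) grid in (0, 1)^2."""
    best = None
    for i in range(1, n):
        for j in range(1, n):
            k, t = i / n, j / n
            c = f(k, t)
            if best is None or c < best[0]:
                best = (c, k, t)
    return best
```

In this toy model the grid optimum sits away from κ = τ = 1/2, yet the midpoint cost stays within roughly a factor of two of the minimum, loosely mirroring the behaviour reported above for the true bound.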
Next, we consider a simulation incorporating the alternative measurement scheme using amplitude estimation: a mean-square error of at most ǫ² can be achieved using amplitude amplification with a quantum computation that uses a number of T-gates bounded as stated, and at most N(η + 1) + 3η − ⌊log(2η − 1)⌋ + 1 + 2⌈log(N/2)⌉ + ⌈log₂(√(2π)/ǫ)⌉ + 4 qubits. Proof. To prove the bound on T-gates we apply the result of Theorem 15, providing a concrete quantum circuit implementation of the oracle. In particular, we require that the simulation satisfy ‖Ũ_ψ − U_ψ‖ ≤ ǫ/8 in total. This error budget must be distributed over two sources: Trotter-Suzuki error and circuit-synthesis error. For simplicity, we take each source to be at most ǫ/16. We then have from Lemma 10 that the number of Trotter steps s needed to simulate the pair density within an error tolerance of ǫ/16 is bounded accordingly. This process must be repeated N_query times to achieve the target accuracy, where N_query is bounded as in Theorem 15. Next, note that a single query to a controlled Trotter step requires two repetitions of each rotation to implement the controlled version of the operator; this is in contrast with static simulation, wherein doubling the number of rotations is not needed [61]. Therefore the total number of Trotter steps in the amplitude amplification protocol obeys the product bound. We then have from Theorem 8 that the cost of synthesis with s steps is Nη(η + ln(96s/ǫ))λ(ǫ/16s). Thus the contribution to the total T-count from the Trotter-Suzuki operations in the simulation is at most the product of (149) and (150). The auxiliary T-gates needed to perform the Toffoli operations in the amplitude estimation routine are given by (139). The result then follows by summing these two contributions and using the definition of α(ǫ) to simplify.
The total number of ancillas needed for this procedure is greater than that of the sampling-based approach because of the additional qubits needed for the generalized Hadamard test, as well as those needed to perform the phase estimation within amplitude estimation. The number of ancillary qubits required within amplitude estimation is bounded above; however, the circuit for phase estimation requires adding an additional ⌈log(L)⌉ controls to the operations in the generalized Hadamard test. Any qubits used in the synthesis of Toffoli gates can be absorbed into those accounted for in other steps. Thus, from Lemma 13 and Theorem 17, the number of qubits is at most N(η + 1) + 4η − ⌊log η⌋ − 1 + 2m + ⌈log₂(√(2π)/ǫ)⌉ + 4 as claimed.
The asymptotic scalings for simulating the mean pair density, stated in Corollary 16, now follow from the above theorems.

Cost Analysis in NEG Model
In this section we determine sufficient resources to simulate the real-time dynamics of the mean pair density in the NEG setting (defined in Section 2.5.1). We assume that the user has access to a quantum computer equipped with CNOT and single-qubit gates, wherein the single-qubit gates, state preparation, and measurement are error-free, while the CNOT can be implemented within diamond distance δ_g of the ideal CNOT channel. As before, ǫ denotes the rms deviation from the true expectation value of the mean positron number per lattice site.
More explicitly, the CNOT is the only gate with non-negligible error; if C_X is the ideal channel that carries out the CNOT gate, then we assume that the user has access to a channel C̃_X such that ‖C̃_X − C_X‖⋄ ≤ δ_g.
Theorem 19 (Real-Time Evolution of Mean Pair Density in NEG Setting). Assume a quantum computer is provided with a universal set of single-qubit gates and measurements that are error-free. Further, assume that for any pair of qubits i, j the computer is equipped with a quantum channel C_x such that ‖C_x − CNOT_{i,j}‖⋄ ≤ δ_g. Then the smallest achievable root-mean-square error in the positron number per lattice site evaluated at time T, for a Schwinger model simulation in 1+1D using second-order Trotter formulas, is bounded as stated; the optimal target for the error in the Trotter-Suzuki formula is similarly bounded below. Proof. Let C_x^p be the p-fold composition of the noisy channel C_x, and let CNOT_{i,j}^p denote the corresponding composition of the ideal channel. Suppose that for some p ≥ 1 we have ‖C_x^p − CNOT_{i,j}^p‖⋄ ≤ p δ_g. From the triangle inequality and the sub-multiplicativity of the diamond norm under composition, the same bound follows with p replaced by p + 1. Since the result holds by assumption for p = 1, by induction it holds for all p ≥ 1; thus the error from composing the channels is at most additive. It is shown in [90] that for any operator M and channels C and C̃, max_ρ |Tr(M C(ρ)) − Tr(M C̃(ρ))| ≤ ‖M‖ ‖C − C̃‖⋄. In our case the observable M is the mean positron number per lattice site, whose norm is at most 1/2. We therefore have that if our simulation circuit contains N_g CNOT gates with error at most δ_g per gate, then |Tr(M C̃_tot(ρ)) − Tr(M C_tot(ρ))| ≤ N_g δ_g/2, where C_tot and C̃_tot denote the ideal and noisy total channels, respectively. We can bound N_g in terms of the target Trotterization error using prior results. From Lemma 3, a complete Trotter step for the lattice using the second-order formula costs at most (N − 1)(9η² − 7η + 34) CNOTs, so if the total number of Trotter steps used is at most s, the number of CNOT gates is bounded by N_g ≤ s(N − 1)(9η² − 7η + 34). We can then use Lemma 10 to choose s such that our Trotterization error is at most δ_Trot.
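The inductive step used above can be written out explicitly. The following is our reconstruction of the standard telescoping argument, writing 𝒞 for the ideal CNOT channel, C_x for its noisy implementation, and using the triangle inequality together with ‖ℰ‖⋄ ≤ 1 for any channel ℰ:

```latex
\begin{aligned}
\| C_x^{p} - \mathcal{C}^{p} \|_\diamond
  &= \| C_x C_x^{p-1} - C_x \mathcal{C}^{p-1}
      + C_x \mathcal{C}^{p-1} - \mathcal{C}\,\mathcal{C}^{p-1} \|_\diamond \\
  &\le \| C_x^{\,p-1} - \mathcal{C}^{p-1} \|_\diamond
     + \| C_x - \mathcal{C} \|_\diamond \\
  &\le (p-1)\,\delta_g + \delta_g \;=\; p\,\delta_g .
\end{aligned}
```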
This implies a bound on s. It then follows from Lemma 12, (156), and (158) that if ǫ is the rms error in the estimate of the true expectation value of the observable, and if we choose s such that the Trotter-Suzuki error is at most δ_Trot, then we obtain (159). Table 1: Upper bound on the worst-case mean-square error in sampling the mean positron number density (ǫ²) in a six-qubit Schwinger model simulation, along with a sufficient number of CNOTs per shot to ensure that the bias in the estimate is appropriately small. Note that the range of the square of the mean positron density is [0, 1/4], and "−" marks entries for which the error bound falls outside this range. This is tabulated over x = (ag)⁻² (where a is the lattice spacing and g is the coupling constant) and the diamond distance of the CNOT channel, δ_g ∈ {10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶, 10⁻⁷}, with fixed η = N = 2, T = 10/x and µ = 1.
Here, N_shots is the number of samples used in the estimate of the mean; the errors add in quadrature, as demonstrated in the proof of Lemma 12. Next, let ǫ²_min := lim_{N_shots→∞} ǫ². We can find an upper bound on the minimal achievable value ǫ_min through calculus: setting the derivative of ǫ²_min with respect to δ_Trot to zero and solving with computer algebra, we find that if we take the constant part of ρ to be Γ := ρ(δ) − δ^{1/2} N^{1/2} T^{3/2} Λ x^{1/2}, then the optimal Trotter error δ_Trot,opt is given in terms of T δ_g Γ, N^{1/2}, (N − 1), Λ x^{1/2}, and (9η² − 7η + 34). Substituting (162) into (161) then yields the least upper bound on ǫ_min. Table 1 shows that highly accurate CNOT gates (diamond distance at most 10⁻⁵) will suffice to simulate short evolutions in the weak-coupling limit, while even smaller infidelities, given by diamond distances of 10⁻⁷, suffice to demonstrate a small simulation at strong coupling (x ≤ 0.1). Owing to the looseness of bounds of this type observed in simulations of fermionic systems [67], the data in Table 1 suggest that numerical studies may be needed to assess whether existing quantum computers will be able to simulate the Schwinger model in the strong-coupling regime, or whether further algorithmic optimizations will be needed to do so.
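To make the bias bound concrete, the following sketch evaluates ‖M‖ N_g δ_g using the CNOT count per second-order Trotter step quoted from Lemma 3; the Trotter step count s supplied below is a hypothetical input, not a value derived from Lemma 10:

```python
# CNOT count for one second-order Trotter step (from Lemma 3 as quoted
# in the text): (N - 1)(9*eta^2 - 7*eta + 34).
def cnots_per_step(N, eta):
    return (N - 1) * (9 * eta**2 - 7 * eta + 34)

def bias_bound(N, eta, s, delta_g, obs_norm=0.5):
    """Bias bound ||M|| * N_g * delta_g, with ||M|| <= 1/2 for the mean
    positron number per lattice site and N_g = s * CNOTs per step."""
    n_g = s * cnots_per_step(N, eta)
    return obs_norm * n_g * delta_g
```

For the Table 1 parameters N = η = 2, each Trotter step costs at most 56 CNOTs, so a hypothetical 100-step circuit with δ_g = 10⁻⁵ incurs a bias of at most 0.028, already a sizeable fraction of the observable's range [0, 1/4].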

Incorporating Locality Constraints via the Lieb-Robinson Bound
Examining (34)–(35), it is apparent that the interaction between electron and positron sites is geometrically local, with nearest-neighbor interactions on the line. The locality of the interactions can potentially yield benefits for simulation, as noted in [91, 92]. This advantage stems from the use of Lieb-Robinson bounds, which establish an effective light cone for the evolution of local observables. Observables whose light cones do not intersect can be shown to have exponentially small commutators; Lieb-Robinson bounds thus provide a mechanism for dividing the time evolution into quasi-independent pieces that can be evolved separately, with far less error than suggested by bounds that do not exploit the light cones emerging from the locality of the Hamiltonian.
This locality can be made manifest by rewriting the Hamiltonian as a sum of local terms, H = Σ_r h_r. Lemma 6 in [91] can then be applied to H: for any disjoint regions A, B, and C on the lattice there are constants µ̄, v ≥ 0 such that the relevant commutator decays exponentially, where bd(A ∪ B, C) denotes the index set that spans the boundary between A ∪ B and C. The constant v is often called the Lieb-Robinson velocity and, for systems with strictly local interactions, it can be estimated as in Appendix C.6 of [91]. For the Schwinger model the graph degree is d = 2, and K is an upper bound on the commutator norm ‖[h_X, h_Y]‖. We can directly compute K = x(µ + 4Λ), and thus v ∈ O(x^{1/2}(µ + Λ)^{1/2}). Also, using the definition H_bd = H_{A∪B∪C} − H_{A∪B} − H_C, we can compute ‖Σ_{Z:bd(A∪B,C)} h_Z‖ ≤ 2x. If we denote by δ the unitary-operator approximation error, then using (164) we can estimate the partitioning block length l = dist(A, C) ∈ O(log(x/δ) + vT), where we have set µ̄ = 1. For a lattice of size N we will need to apply N/l partitions recursively, with total unitary decomposition error O(δl/N). After each partitioning step we are left with a Schwinger sublattice of size l, to which we apply the second-order symmetric Trotter decomposition discussed in Section 5. From Lemma 10 we can estimate the number of T-gates required to implement, N/l times, a single block of length l on the lattice with N sites. On the other hand, the T-gate count of our Trotter-based algorithm, which does not explicitly use the locality of the interaction, can be readily computed by multiplying the number of Trotter steps s in Lemma 10 by the T-gate count per step in Theorem 8. Comparing the resulting gate count with that of (165), we conclude that the asymptotic scaling of our Trotter-based algorithm and the algorithm of Haah et al. [91] is identical. So far, then, we have established that there is no explicit asymptotic cost improvement from applying Lieb-Robinson bounds to the Schwinger model time-evolution algorithm.
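The scalings quoted above can be turned into rough numbers. In the sketch below every big-O constant is set to one (an assumption on our part), so the outputs indicate scaling only, not rigorous bounds:

```python
import math

# All big-O constants are set to 1 (an assumption), with mu-bar = 1, so these
# are indicative scalings rather than rigorous bounds.
def lr_velocity(x, mu, Lam):
    # v in O(x^{1/2} (mu + Lambda)^{1/2}), following K = x(mu + 4*Lambda)
    return math.sqrt(x * (mu + Lam))

def block_length(x, mu, Lam, T, delta):
    # l = dist(A, C) in O(log(x/delta) + v*T)
    return math.log(x / delta) + lr_velocity(x, mu, Lam) * T

def num_partitions(N, l):
    # number of quasi-independent blocks of length l covering N sites
    return max(1, math.ceil(N / l))
```

For x = µ = 1, Λ = 4, T = 10 and δ = 10⁻³ this gives a block length of roughly 29 sites, so a 100-site lattice would split into only a handful of quasi-independent blocks.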
Recall, however, that ultimately we are interested in estimating the expectation value of local observables such as N̄_p. Can we use the locality of the observable and the Hamiltonian to reduce the computational cost? Intuition suggests that, because correlations propagate at a finite velocity v, the effect of distant lattice sites on the dynamics of local observables can be made sufficiently small beyond a certain characteristic radius. The effective dynamics can then be approximated by the Schwinger Hamiltonian defined on a sublattice around the site of interest. Let us first bound the size of such a sublattice using the Lieb-Robinson bound for exponentially decaying interactions (see Appendix C.4, Eq. (40), in [91]). Consider a local observable Z_k with support on a single lattice site k. If H is the Schwinger model Hamiltonian for the entire N-site lattice Ω and H_{Ω_k} is the Schwinger model Hamiltonian defined on a sublattice Ω_k (|Ω_k| ≤ N) around the site k, then an upper bound of the form (166) holds, where ξ = (8x + µ² + Λ²)e, η = K/(‖h_X‖‖h_Y‖) = (µ + 4Λ)/(µ + 2Λ²), and Ω_k^c is the complement of the set Ω_k (here we are using equations 15 and 40 in [91]). Given a time t and arbitrary δ ≥ 0, it suffices to choose the cardinality of Ω_k to be l = |Ω_k| ≥ |ln √δ − ln(2ξ|t|√η) − ξ|t|√(8η)|. Knowledge of l enables us to estimate the cost of quantum simulation of the mean number of positrons N̄_p using the techniques of Theorem 17.
Let W_k(t) denote the second-order Trotter-Suzuki approximation of e^{−itH_{Ω_k}}, and introduce the corresponding expectation values E_ψ, E_φ, and E_χ, where |ψ_0⟩ is an arbitrary initial state. We can estimate the number of shots N_shots needed to bound the rms error in the expectation value of N̄_p by δ using Lemma 12. The key insight is the triangle inequality over these expectation values, where we used (166) (setting |Ω_k| = |ln √δ − ln(2ξ|t|√η) − ξ|t|√(8η)| for all k) to upper-bound |E_ψ(N̄_p) − E_φ(N̄_p)|, and the second-order Trotter-Suzuki bound for |E_φ(N̄_p) − E_χ(N̄_p)|. Since the variance V(N̄) of the empirically observed sample mean N̄ can still be bounded by N²/(4N_shots), we conclude that the number of samples N_shots needed is given by Lemma 12 with ǫ²κ ∈ O(δ). Therefore, we can use Theorem 17 to estimate the cost of simulating N̄_p by replacing the lattice size N with N_eff = |ln ǫ − ln(2ξT√η) − ξT√(8η)|. We thus have from Corollary 16 the number of T-gates needed to perform the simulation using amplitude amplification. We note that a particularly interesting simulation regime is that of very large lattice sizes N and fixed cutoff Λ. In this regime the Lieb-Robinson argument significantly improves the cost of simulation by removing the dependence on N, at the expense of increasing the dependence on T quadratically. Recent work has further shown that if high-order product formulas are used in place of the second-order formula, then the temporal scaling of high-order Trotter-Suzuki approximations to e^{−iHt} can be improved from O(T³/δ_Trot^{1/2+o(1)}) to T²(T/δ_Trot)^{o(1)} [59], where (T/δ_Trot)^{o(1)} is the sub-polynomial contribution to the scaling that comes from adaptively choosing an increasingly high-order Trotter formula as T increases.
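The effective lattice size N_eff can likewise be evaluated directly from the expressions quoted above. The numerical values below are illustrative only, and eta_lr denotes the Lieb-Robinson parameter η = (µ + 4Λ)/(µ + 2Λ²), not the cutoff exponent η used elsewhere in the paper:

```python
import math

# Effective sublattice size from the Lieb-Robinson argument, transcribing the
# expressions quoted in the text; note N_eff is independent of the full
# lattice size N but grows linearly in T.
def xi(x, mu, Lam):
    return (8 * x + mu**2 + Lam**2) * math.e

def eta_lr(mu, Lam):
    return (mu + 4 * Lam) / (mu + 2 * Lam**2)

def n_eff(x, mu, Lam, T, eps):
    xv, ev = xi(x, mu, Lam), eta_lr(mu, Lam)
    return abs(math.log(eps) - math.log(2 * xv * math.sqrt(ev))
               - xv * T * math.sqrt(8 * ev))
```

The linear growth of N_eff with T is what converts the N-dependence of Corollary 16 into the quadratically worse T-dependence noted above.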
While the nested commutators in such an expansion are harder to evaluate to determine the scaling of the cost with ζ, η, x and Λ, this other work nonetheless shows that in some regimes our approach can be improved by going to higher order Trotter formulas.

Conclusion
The foremost contribution of this article is to detail the first explicit quantum algorithms for simulating the lattice Schwinger model on a quantum computer, and to prove them scalable in the computational models considered. As the Schwinger model is a testbed for the study of the gauge field theories underpinning the Standard Model, the algorithms of this paper are an important stepping stone towards simulating quantum electrodynamics in higher dimensions, as well as more general quantum field theories, both in the NISQ era and beyond.
Further significance of this work is seen in its comparison to state-of-the-art quantum simulation algorithms. As discussed in Section 2.3, our Trotter-Suzuki-based algorithms scale only as O(T^{3/2} Λ x^{1/2}). This is quadratically better scaling with the electric field cutoff Λ than would be expected from other simulation algorithms such as qubitization or QDRIFT.
We also analyze simulation of the Schwinger model in a simplified cost model appropriate for near-term simulations, wherein qubits and entangling gates are regarded as the costly resources. We show a similar O(T^{3/2} Λ x^{1/2}) scaling for the number of two-qubit gates in this setting, while requiring only one ancillary qubit. This scaling by itself, while technically true, neglects the effect of gate error on the simulation; we address this by analyzing the minimal achievable root-mean-square error as a function of the diamond distance between the implemented two-qubit gates and their target, and by providing sufficient conditions on the diamond distance to enable a given simulation. These contributions are significant because they allow subsequent NISQ-era algorithms to be explicitly compared to our approach; subsequent optimizations can therefore rigorously quantify their improvements over the initial simulation strategies set forth herein.
There are several important avenues of investigation beyond what we have considered. While we chose Trotter-Suzuki formulas for our simulation algorithms because of their potential advantages over other state-of-the-art methods, explicit implementations of those other algorithms may yet yield unforeseen benefits, so we leave it to subsequent work to examine their performance and compare the effectiveness of each algorithm in simulating the lattice Schwinger model. Analyses of qubitization, the interaction picture, and multi-product formulas, as well as numerical studies of the Trotter error, will be of particular interest. In actual quantum simulations of the Schwinger model, one might also wish to add a background field or periodic boundary conditions; the algorithms for both are straightforward extensions of what has been introduced in this article.
Importantly, while this work has focused on establishing rigorous upper bounds on the resources needed both in the NISQ era and beyond, these bounds may prove loose compared to the resources needed in practice. Further numerical study would be useful to understand the gaps between our bounds and actual performance in small-scale simulations.
Perhaps the ultimate goal of subsequent work is to generalize the analysis in this paper to the simulation of gauge field theories in higher dimensions and with non-Abelian gauge groups. These innovations will not only allow us to simulate quantum electrodynamics in higher dimensions, but will also provide valuable insight into how to simulate, on quantum computers, quantum field theories that better represent what we know about Nature.

Acknowledgements
The authors would like to thank Natalie Klco for conceptual discussions and contributions to Section 2 and the circuits of Section 3 of this work, and M. Savage.

A Notation
N̄_p — Operator that counts the total number of positrons in the lattice
E (E_r) — Electric field operator (on link r)
U (U_r) — Link operator (on link r)
x — Abbreviation for the quantity (ag)⁻²
T_r — The "hopping" Hamiltonian operator between fermionic modes r and r + 1
D_r — The diagonal Hamiltonian operator yielding the mass and electric energy at site r
v — Lieb-Robinson velocity
µ̄ — Scaling constant between physical distance and graph distance in the lattice
ǫ — Maximum tolerable root-mean-square error in the observable in a quantum simulation
‖·‖ — Spectral norm of an operator

B Numerical Evaluation and Analysis of T -Count Upper Bounds
This appendix reports the upper bound of the "Sampling" result, Theorem 17, and the (unoptimized) upper bound of the "Estimating" result, Theorem 18, evaluated over the indicated lattice parameters. The cost bound of Theorem 17 is numerically minimized with respect to τ and κ using Mathematica.
We express the simulation times in the following scenarios in terms of 1/x, since the dimensionless quantity xT describes the coupling that emerges between subsystems in time-dependent perturbation theory. This allows comparison with the capabilities of classical algorithms, which are not anticipated to extend far into the "weak coupling" and "long time" regimes.
Note that in Figure 9 we see that, under our assumptions, the upper-bound ratio is consistently greater than one over reasonable parameter ranges. This hints that a simulation using amplitude estimation may be beneficial only at more extreme lattice parameters; however, since we are comparing upper bounds on the cost of simulation, this conclusion is not definitive.