Compilation by stochastic Hamiltonian sparsiﬁcation

Simulation of quantum chemistry is expected to be a principal application of quantum computing. In quantum simulation, a complicated Hamiltonian describing the dynamics of a quantum system is decomposed into its constituent terms, where the eﬀect of each term during time-evolution is individually computed. For


Introduction
Chemistry simulation is expected to be a principal application of quantum computing, revealing the properties of chemical bonds and interactions by simulating classically intractable systems, with applications in pharmaceuticals, material science, and industrial chemical manufacture [1,29].For example, efficient simulation of the chemical cluster FeMoCo [5,9,35] may allow more efficient nitrogen fixation, improving the manufacture of fertilizers.The difficulty of directly solving the Schrödinger equation of chemically interesting problems renders even moderately com-Yingkai Ouyang: y.ouyang@sheffield.ac.ukDavid R. White: d.r.white@sheffield.ac.ukEarl T. Campbell: earltcampbell@gmail.com plex systems classically intractable.Using quantum mechanics to simulate quantum systems [25] will provide unprecedented detail in the solution to chemical problems, enabling us to better understand the dynamics of highly entangled systems, predicting their properties and chemical reactions [1,17].For example, simulation can be used in phase estimation to extract the eigenspectrum of a Hamiltonian [22,33,44], and by sufficiently understanding its energy spectra we can accurately predict chemical reaction rates [23].
Given a Hamiltonian H expressed as the sum of multi-qubit Pauli matrices, solution of the Schrödinger equation requires the calculation of its exponentiation.Quantum simulation techniques are distinguished by the way they map the chemical Hamiltonian to an effective Hamiltonian on qubits [3,4,20,31,36], and subsequent mapping of the exponentiation of this effective Hamiltonian to a computation.One well-established method is to apply the Trotter-Suzuki formula [38][39][40] to reduce this larger exponentiation to a product of Pauli exponentials.Each Pauli exponential can then be interpreted as a single quantum gate.To date, many variants of Trotter-Suzuki methods have been studied in the context of quantum simulation of chemical systems [2,21,37].
Trotterisation is an attractive approach to quantum simulation because of its simplicity.However, a significant limitation of Trotterisation is that the number of quantum gates required scales linearly with the number of terms in the Hamiltonian, which may grow very large [42,44].Campbell [12] observed the problematic scaling of the Trotter-Suzuki decomposition and introduces qDRIFT, a stochastic approach that samples terms from the Hamiltonian, trading accuracy for computational cost; importantly, qDRIFT scales independently of the number of terms.We build on Campbell's approach, introducing an algorithm that combines the strengths of standard Trotterisation with the efficiency of qDRIFT.
Our approach approximates the target Hamiltonian via sparsification, yielding a Hamiltonian with fewer terms; the Trotterised computation then requires far fewer gates per Trotter step.Although sparsification introduces a new approximation error, the reduction in gate count means we can apply more Trotter steps within a fixed budget.Our key insight is that sparsification can be performed stochastically instead of deterministically, and that this leads to improved per-formance.By defining our approximate Hamiltonian to be a random variable with an expectation value equal to the actual Hamiltonian, we benefit from a quadratic suppression of errors relative to deterministic methods; this behaviour is also seen in other stochastic compilers [11,12,15,16].We refer to our combination of first order Trotterisation and stochastically sparsified Hamiltonians as SparSto.
In many systems, including electronic structure Hamiltonians, we empirically observe power-law distributions of term strengths, which is promising for sparsification since weaker terms are clear candidates for truncation.In a stochastic compiler, it is natural to relate the probability of a term being truncated from the Hamiltonian with the magnitude of its strength.One of our main technical results is a rigorous upper bound on the error of SparSto for an arbitrary probability distribution where the terms are sampled independently.
To obtain the best possible upper-bound we need to select the best probability distribution.In our analysis, we place the most important terms, with the largest strengths, inside an active set so that they always appear in the sparsified Hamiltonian.Random sparsification is instead applied only to a tail of weaker Hamiltonian terms that we label the inactive set.We rely on optimality conditions in convex optimisation theory in order to derive a probability distribution over the inactive set, which we call the "linear ansatz".We numerically optimise and analyse the performance of our error bounds for some electronic structure Hamiltonians.For low gate budgets SparSto behaves similarly to qDRIFT, and for larger gate budgets it exactly reproduces randomized first order Trotter; as such it interpolates between these approaches.However, for intermediate gate budgets, which coincide with parameter regimes of practical interest, our new sparsification method outperforms both methods by around an order of magnitude.We emphasize numerical optimisation is always performed at the level of upper bounds and not by considering empirical performance of small simulatable systems.Though SparSto uses first order Trotter, sparsification of Hamiltonians could also be naturally combined with higher order Trotter schemes to yield higher order randomized compilers.
When comparing with other Trotter methods we can just count the number of gates of the form e −isHj that are unitaries generated by easily accessible Hamiltonians H j .However, to fairly compare against post-Trotter methods [7,9,13,26,27], we would also need to assess the resource cost of extra ancilla and select and prepare gadgets that are not built using e −isHj .This makes direct comparison with post-Trotter methods an involved task and sensitive to the cost model.However, post-Trotter methods have better asymptotic scaling and for problems of interest they have often been found to have a con-siderable advantage over Trotterisation methods, and it is likely that this advantage persists over SparSto.However, the value of this work is two-fold.Firstly, we improve the best known Trotter methods.Secondly, we have demonstrated the value of randomly sparsifying Hamiltonians as a technique that might later be incorporated into post-Trotter methods.Indeed, Berry's perspective article [6] also highlighted that potential application of randomization to post-Trotter protocols is a promising future research direction.

Trotterisation
Consider a time-independent Hamiltonian that admits a decomposition H = L j=1 h j P j .While P j can often be considered to be multi-qubit Pauli matrices, we make no such assumption, and for us P j are matrices with singular value at most 1.Without loss of generality, the corresponding coefficients h j can then be positive.Solving the Schrödinger equation |ψ(t) = e −iHt |ψ(0) allows us to model the continuous evolution of the state |ψ(t) over time.Over a short time period s, the first order Trotter-Suzuki decomposition approximates the exponential e −iHt with a product of exponentials given by [38] where H j = h j P j .We call this the "vanilla Trotterisation" scheme, as there are many variants to this approach [14,15,39,40], including ones that propose to deterministically coalesce Hamiltonian terms [34,43].We assume that each e −isHj , which we call a gate, can be efficiently implemented on a target quantum computer.To simulate e −itH , we approximate e −isH repeatedly r times, such that t = rs.The number of gates G required by a simulation is thus the principal measure of its computational cost.Since each Trotterisation of e −isH involves L gates, vanilla Trotterisation requires G = rL gates, and can become potentially computationally expensive when L is large.
In particular, it is known that an effective Hamiltonian for the electronic structure problem for a system with N modes typically has L = O(N 4 ) terms [1].
In contrast to Trotterisation, the qDRIFT method introduced by Campbell has a computational cost that is independent of L; its gate count is O(λ2 t 2 / ).qDRIFT simulates an ideal unitary process by a Markovian evolution, sampling a sequence of Pauli gates from a predetermined distribution; each exponentiation in the resulting circuit is given the same weight τ such that the distribution alone determines the outcome of the calculation.The probability p j of choosing a given e iτ Hj as the next gate in a computation is weighted by the corresponding interaction strength: p j = h j /λ, ensuring that the stochastic process of repeated sampling drifts stochastically towards the target unitary.The number of gates is set at a fixed computational budget G representing the number of primitive gates, and gives approximation error 4λ 2 t 2 /G1 .Whilst Trotter-Suzuki decompositions have worse computational complexity than qDRIFT in the number of terms, their computational cost scales better with respect to t and .To exploit this trade-off, we introduce a new approach SparSto, which interpolates between qDRIFT and the higher-order Trotter-Suzuki decompositions whilst also building and improving on the analysis of randomised simulation in Ref. [15].
Like qDRIFT, SparSto approximates a unitary evolution with a probabilistic ensemble of unitary evolutions instead of direct Trotterisation -although higher-order Trotterisation can subsequently be applied.This stochastic scheme is in the spirit of related work [8,14,15] and its merits lie in the ability to use mixtures of unitary operators to approximate a unitary operator [11,16]; intuitively stochastic methods avoid systematic noise.

SparSto Analysis
SparSto uses a random Hamiltonian Ĥ and crucial to our analysis is that the expectation value is equal to the system Hamiltonian E( Ĥ) = H.To reduce gate counts, we would like Ĥ to have far fewer terms than H and to be a good approximation, or at least to do this with high probability.Rather than considering arbitrary probability distributions we consider the term-wise independent sampling where Ĥ contains the term h j P j /p j with probability p j and with probability 1 − p j this term is dropped.This ensures E( Ĥ) = H and that the expected number of terms is µ = L j=1 p j .Next, we review Lindblad's formalism of unitary maps [24], where such maps are generated by exponentiating Liouville operators.We let L j = h j P j so that P j is a Liouville operator that maps ρ to −i(P j ρ − ρP j ).Clearly then, L j (ρ) = −i(H j ρ − ρH j ).For any positive number s, Liouville operators generate unitary evolutions in the sense that e sLj (ρ) = e −iHj s ρe iHj s .The ideal evolution operator can be written in terms of the Liouville operator L = L j=1 L j , because e sL (ρ) = e −iHs ρe iHs .Using a vanilla Trotterisation analogous to that given in (1), a first order approximation of e sL is given by T s,→ = L j=1 e sLj .Given no a priori reason to simulate L j in any particular order, it is natural to consider permutations of this operator sequence.A second order approximation of the Taylor expansion can be made by mixing the above Trotterisation T s,→ with T s,← = 1 j=L e sLj , where the arrows denote the ordering over the term index j.The uniformly mixed operation 1 2 (T s,→ + T s,← ), which is a randomised first order Trotterisation scheme that we denote as R1oTrott, approximates e sL with error that is third order in sL [15,Theorem 1].More generally, Trotterisation can be further improved by completely randomising the order in which the gates are performed [15,42].To approximate e tL for a fixed time t, we approximate e sL for a total of r times, where s = t/r.By taking the number of repeats r to be large, the simulation time s can become small, and (1) holds to a good approximation.
In SparSto, we stochastically sparsify the Hamiltonian using some term-wise independent probability distribution and then apply one step of randomized first order Trotter.For each of the r Trotter steps a fresh stochastic Hamiltonian is sampled.The random Hamiltonian Ĥ induces a random Liouville operator L in the natural way L(ρ) = i( Ĥρ − ρ Ĥ).Similarly, we have random terms Lj such that Lj = L j /p j with probability p j 0 with probability 1 − p j ( Here, Lj approximates the ideal Liouville operator L j in the sense that E( Lj ) = L j .Given a sampled Ĥ or L, we also randomize the order of the gates in each Trotter step and so introduce the randomized operators of forward and reverse Trotter steps Ts,← = A single step of SparSto is then described by which has µ gates on average.To approximate e tL , SparSto simulates Ês independently and sequentially r times.By fixing the expected number of gates G of SparSto to be constant, we require in the first G/µ repeats to have s = µt/G and in the final repeat to have s = t − G/µ .The total number of repeats is then r = G/µ .We do not consider completely randomising the gate orders in Ês because it renders our subsequent analysis overly complicated.We quantify the maximum error of SparSto using the diamond norm [19], which when evaluated on the difference between quantum channels, quantifies their distinguishability.For us, we quantify the distinguishability between the average of r repeats of Ês and the ideal channel e tL with the error where • denotes the diamond norm.Our main result is an analytic upper bound on Ês − e sL that we denote as , which is expressed in terms of the 1-norm • 1 .
Theorem 1.Using SparSto with vector of probabilities p = (p 1 , . . ., p L ), vector of Hamiltonian coefficients h = (h 1 , . . ., h L ), L ≥ 3, and expected number of gates G, the error of simulating e tL with SparSto is at most where Moreover, u, v and w are vectors given by A tighter bound with full details on the higher order terms in is supplied in Theorem 5 of Appendix A.
To bound the diamond distance between E( Êr s ) and e tL , we bound the diamond distance between E( Ês ) and e sL .Using the triangle inequality on a telescoping sum, the unit diamond norm of all channels, and the independence of each random unitary Ês , we get the bound To obtain an upper bound on E( Ês ) − e sL , we perform a series expansion of the operators Ês and e sL with respect to the parameter s to get Ês = j≥0 Âj s j and e sL = j≥0 B j s j .Using this notation, we can see that Â0 and B 0 are both trivially the identity operator 1, and E( Â1 ) and B 1 are both equal to the Liouvillean L. To obtain the O(t 2 µ/G) and O(t 3 µ 2 /G 2 ) terms in , we evaluate upper bounds on E( Â2 ) − B 2 and E( Â3 ) − B 3 respectively.To do this, we rewrite Â2 as sums over products of Lkj j , where each sum comprises of terms of the form L2 j , Lj Lk , where j and k are distinct indices.Having j and k distinct allows us to find that E( L2 j ) = L 2 j /p j and E( Lj Lk ) = L j L k .Using a similar strategy for rewriting Â3 , we can evaluate its expectation explicitly.Writing B 2 and B 3 in a similar form then allows us to compute the leading order terms in .We supply the full details of this argument in Appendix A.
We upper bound the difference between tails of Ês and e sL , which are O(t 4 µ 3 /G 3 ) terms, by essentially using the fundamental theorem of calculus to bound the tail of a power series from its derivatives.From this, we evaluate upper bounds on the diamond norm of j≥4 s j (E( Âj ) − B j ), and call this our tail bound.To apply the fundamental theorem of calculus, we first take the fourth derivatives of Êsθ and e sθL with respect to θ, evaluate upper bounds on the norm of their difference over the unit interval for θ.Second, we integrate this upper bound over an appropriate region, which gives a rescaling factor of 1/4!.Also, by obtaining polynomials in the diamond norms of L j and subsequently using the inequality L j ≤ 2h j , along with the triangle inequality on the diamond norm of the difference between the ideal channel and the approximate channel, we can obtain a closed form expression for the tail bounds which we show explicitly in Theorem 5 of Appendix A.
It is important to point out that the upper bound on the simulation error in Theorem 1 depends very much on the choice of the probabilities p 1 , . . ., p L .Each p j signifies the probability that the Hamiltonian term H j contributes to the Trotterisation at each iteration.The smaller the value of µ = p 1 + • • • + p L , the sparser our Hamiltonian simulation is.Intuitively, different choices on the values of the probabilities p j in Theorem 1 affect the overall simulation error of e tL .When all probabilities are equal to one, SparSto becomes identical to R1oTrott.The simulation error, can thereby be obtained as the following corollary of Theorem 1.

Corollary 2. When p
While the bound that we have in Corollary 2 is tighter than [15, Theorem 1], a careful analysis of the third order terms in [15, Theorem 1] yields the same expression as that given in Corollary 2.
One might also observe that when all the probabilities in Theorem 1 are set to p j = 1, we have u 1 = v 1 = 0 and w 1 is minimized, which implies that /r which is roughly the simulation error per time segment s, is in fact minimized.This leads one to wonder what advantage might be gained by setting the probabilities to be otherwise.The solution to this conundrum lies in the penalty we pay in making such a choice.In this scenario, each Ês comprises of µ = L gates, and the overall error for simulating e tL need not be optimized since rs j ∼ t j (µ/G) j−1 , which appears as coefficients in Theorem 1, is in fact maximized when µ = L.The resultant algorithm simulates e tL , with an expected gate count of G when s = µt/G for all but the last repeat and r = G/µ .One can imagine SparSto to be analogous to another qDRIFT where µ = 1 so that the expected number of gates per time segment s is equal to one.The tradeoff in this scenario is that u 1 and v 1 are potentially very large because the probabilities become very small.The key advantage of using Theorem 1 allows us to understand how interpolates between having all the probabilities to be either 1 or 0. In what follows, we consider one family of probability distributions that we use together with Theorem 1.For this example, we set p j = 1 whenever h j is above a set threshold.Otherwise, p j < 1.We denote the active set A as the set of indices j for which p j = 1, and the inactive set Ā to be the set of indices for which p j < 1.We choose the values of p j according the following ansatz.Definition 3 (Linear ansatz).For every j ∈ A, we set p j = 1.For every j ∈ Ā, we set p j = ch j .We correspondingly have µ = |A| + c j∈ Ā h j .
Clearly, c has to be sufficiently small so that we indeed have p j < 1 for all indices j in the inactive set.By minimizing with respect to all possible values of |A| and µ using our linear ansatz, we can determine which probabilities p j to use.These probabilities can be inputted into SparSto, which we describe in the pseudocode Algorithm 1.
We numerically study the performance of SparSto using models of molecules drawn from the Open-Fermion library [30], including carbon dioxide, ethane, and propane in the STO-3G basis set, and depict these results in Fig. 1.We evaluate the error bound for SparSto given by Theorem 5 in Appendix A. We compare the performance of SparSto with the Trotter bounds from [15] (Theorem 2 in their paper, setting k = 1), and by setting all probabilities p j = 1 we also plot Corollary 2. Only the second Algorithm 1 SparSto (t, G, p1, . . ., pL, h1, . . ., hL) for all rep = 1 to r do 3: dir ← fwd or bwd with probability 1/2 if dir = fwd then 9: for all j = 1 to L do 10: Choose x uniformly at random from [0, 1]. for all j = L to 1 do 14: Choose x uniformly at random from [0, 1].We perform a limited brute force numerical optimisation over all feasible values of µ and |A| for our ansatzes; we examine |A|/L over the interval [0, 1] with a step size of 0.1, and consider the same values for µ = (µ − |A|)/(L − |A|) along with 1 × 10 −5 , 1 × 10 −4 , and 1×10 −3 ; we consider all pairwise combinations of these settings.Intuitively we expect that as the gate budget G increases, we ought to interpolate between the qDRIFT regime [12] and the R1oTrott regime, and the size of the active set |A| ought to go from 0 to L. We observe from our numerical study that this indeed is the case.In general, the optimal active set size increases with G, and the optimal value for µ was usually small, and never more than 0.3.
The linear ansatz outperforms the uniform ansatz.This is expected, as the uniform ansatz is naïve and the linear ansatz can be obtained as the optimal solution of the convex program which minimizes the leading order term in the total error for constant µ (see Appendix B).In each of the molecules, the number of Hamiltonian terms is over a hundred thousand, which is very large.When t = 6000 and for a range of desired error values, there is a considerable advantage in using SparSto over both R1oTrott and qDRIFT.We observe similar results across other values of t and smaller molecules.

Discussion
While vanilla Trotterisation can simulate any Hamiltonian with sufficiently many gates, the number of these gates can become very large.This leads to the need to reduce the gate count of quantum simulation while keeping the size of simulation error fixed.
Here we present a new approach to chemistry simulation on a quantum machine, using the stochastic sparsification of a target Hamiltonian to derive a hybrid approach between canonical Trotterisation and qDRIFT.Our analysis provides an upper error bound for the scheme, and optimisation over the probabilities used in sparsification allows for reductions in the simulation error over parameter regimes of interest.
It would be instructive to consider how the ideas in our hybrid approach might extend to other variants of quantum simulation schemes, such as that of the socalled "quantum signal processing" (QSP) [26] techniques, linear combinations of unitaries [7], the use of quantum walks [9,13], qubitisation [9,27] and postprocessing techniques [18].There has also been recent interest in the quantum simulation of time dependent Hamiltonians [8,28], and applications of quantum simulation in phase estimation [12,22], which may also prove amenable to stochastic sparsification.Given that random techniques can prove advantageous when applied to hybrid quantum-classical algorithms for numerical optimisation [41], our techniques might also offer some speedups in this area.Furthermore, there might exist certain families of Hamiltonians where the advantage of using our techniques over deterministic Trotterisation can be understood analytically, and we leave this as a subject for future work.

A Upper bounds on the simulation error
In this section, we show that Theorem 1 is a corollary of Theorem 5, which we state in Section A.2. Before we can state Theorem 1, we define relevant notation in Section A.1.After that, we evaluate the leading order terms and tail terms of Theorem 5 in Section A.3 and Section A.4 respectively.

A.1 Sum over distinct indices
Given real vectors a = (a 1 , . . ., a n ), b = (b 1 , . . ., b n ) and c = (c 1 , . . ., c n ), we define the sums over distinct indices to be To perform fast computation of the above sums, we can use the following lemma which vectorises summations with distinct indices. where Lemma 4 can be proved iteratively by careful consideration of summation indices.
Proof of Lemma 4. The result ( 9) is straightforward to show.To show (10), note that we can use (9) to write We can specialize this to sum of distinct combinations of a u a v a w to get which yields (11).

A.2 Complete simulation error bound
The complete upper bound that we prove here is given by the following.
Theorem 5. Using SparSto with vector of probabilities p = (p 1 , . . ., p L ), vector of Hamiltonian coefficients h = (h 1 , . . ., h L ), L ≥ 3, and expected number of gates G where G/(p 1 + • • • + p L ) is an integer, the error of simulating e tL is at most where = 1 + 2 + 3,1 + 3,2 and with µ = L j=1 p j .Moreover, u, v, w and q are vectors given by We use S as defined in Section A.1.We will see that by considering explicitly the commutation structure of the matrices P j , we can obtain a tighter bound on 2 in Theorem 5 by substituting S(h, h, h) for D 5 in (45).
Before we proceed to prove Theorem 5, we prove that Theorem 1 is a straightforward consequence of Theorem 5. Note that Proof of Theorem 1.By overcounting (8), it is easy to see that Moreover, since u and v are non-negative vectors, we have S(u) = u 1 and S(v) = v 1 .Furthermore, we have 3,j = O(t 4 µ 3 /G 3 ).This completes the proof.
The proof of Theorem 5 then arises from the evaluation of (1) the leading order terms 1 and 2 , and (2) the higher order terms 3,1 and 3,2 .This will proceed in the next two subsections.We emphasize that in what follows, because of the telescoping argument we mentioned in the main text, it suffices to only analyze Ês − e sL , and the overall simulation error will just be r times of this diamond norm.
A.3 The leading order terms 1 and 2 in Theorem 5 Here, we show that the leading order terms in the simulation error are as given by 1 and 2 .First recall that we have the Taylor series expansions Ês = j≥0 Âj s j and e sL = j≥0 B j s j .
Note that when L ≥ 3, we have Moreover, we know that , which implies that for 0 < p j ≤ 1, we have Since p j ≤ 1, we have Since B 2 = L 2 /2!, we get the upper bound where s = tµ/G.Multiplying this by r = G/µ gives us 1 .
To evaluate 2 , we proceed to write where We now proceed to simplify D j for j = 1, . . ., 5. Note that E( L3 j ) = p j L 3 j p 3 j .This implies that Next, multiplicativity of the expectation for independent random variables implies that Now we can write Clearly, we have Next by swapping the roles of j and k in the summation, we get By pairing the first term with the third term and the second term with the fourth term in (34), this implies that where we swap the roles of j and k in the second sum.From the above, we can see that From this, we can obtain the first two terms in 2 .To see this, note that and Multiplying the right sides of (40) and (41) by r gives the first two terms in 2 .
We proceed to simplify D 5 .Note from (30) that Now by ordering all the indices in the same way we get By pairing the first term with the fifth term, and the third term with the fifth term in (43), we get By pairing the second term with the sixth term, and the fourth term with the sixth term in (43), we get We can thus rewrite (43) as Collecting the terms in the above summation in terms of commutators again, we get A trivial upper bound on the diamond norm of this is where the first factor of 8 arises from going from the diamond norm of the Liovillean L j to the operator norm of H j , and the numerator 8 in the fraction arises from the total number of summations over non-decreasing indices, and 6 arises from the number of ways to permute the indices j, k and l.From the bounds we have on the diamond norms of D 1 , D 2 , D 3 , D 4 and D 5 , we obtain Multiplying this by r gives us the error in that is O(t 3 µ 2 /G 2 ).We have thus completed bounding the leading order errors in Theorem 5.

A.4 Tail bounds in Theorem 1
Here, we explain how the tail bounds 3,1 and 3,2 in Theorem 5 arise.
To evaluate the higher order terms in , we consider a convergent power series in θ given by F θ = k≥0 f k θ k .Here, f k is independent of θ, and F θ and f k belong to a Banach algebra.Define [θ j ]F θ = f j as the jth coefficient in the power series expansion of F θ .A useful technique to bound quantities in a Banach algebra relies on the fundamental theorem of calculus, and has been used for example in Ref [10] and Ref [8].This for example can be used to obtain the well-known integral form of the remainder term of the Taylor series of the power series F s where s > 0. Lemma 6.Let θ 0 = 1, s > 0 and F s = k≥0 f k s k .For every positive integer t, we have Proof.The proof of this is well-known but we provide the complete details for completeness.We first note that where k t = (k) . . .(k − t + 1) denotes the falling factorial.
Applying the fundamental theorem of calculus on monomials in θ t , we have Applying this argument iteratively gives the result.
Now let us denote a single timeslice of the ideal channel and SparSto for time sθ as U θ = e sθL and Êsθ = 1  2 (e sθ L1 . . .e sθ LL + e sθ LL . . .e sθ L1 ) respectively.We proceed to evaluate the fourth derivatives of a single timeslice of U θ and Vθ , which are respectively given by d 4 dθ 4 U θ = (sL) 4   It is clear that (54) also holds when n j = 0. Using (54), we get the upper bound E( Lnj j e sθ Lj ) ≤ p j ( L j /p j ) nj .(55) Using (55) with the multinomial theorem on (53), the diamond norm of the tail of Êsθ is at most Applying the identity L j ≤ 2h j , we find that the diamond norm of the tail of Êsθ is at most By setting θ = 1 and multiplying the results that we obtained from the tail bounds on a single timeslice s by a factor of r, we thus find that the expected contribution to the simulation error from the tail bounds on r repeats of e sL and Ês is at most 3,1 and 3,2 respectively, where from which the result follows.
Thus our ansatz for p j is with the regularity condition that this formula satisfies p j < 1 for j ∈ Ā.

Figure 1 :
Figure 1: Error Bounds: Rigorous upper bounds on the simulation errors of various molecules in the STO-3G basis set with L ≥ 100000 are compared with rigorous bounds for Trotterisation and qDRIFT.Here t = 6000.In an intermediate regime for expected the number of gates, SparSto requires fewer gates than both R1oTrott and qDRIFT for a fixed simulation error.For propane and carbon dioxide, the second order Trotter error bounds (COS 2nd Order [15, Theorem 2]) are too large to be seen on the plots.

15 :
if x ≥ pj then implement exp(s(hj/pj)Pj) 1 order bounds from Childs et al. [15, Theorem 2] are visible, in the upper-right of the second plot.

When n j ≥ 1 ,=
we appeal to the Taylor series expansion for the exponential function, to getE(Lnj j e sθ Lj ) = E p j (L j /p j ) nj e sθLj /pj .(54) Ln11 e sθ L1 ...Ln L L e sθ LL Ln L L e sθ LL ...Ln1 1 e sθ L1 , (49)where in the second equation we used the general Leibniz rule and the 4 n1,...,n L = 4!/(n 1 !...nL !) denotes the multinomial coefficient.The diamond norm of the tail of U θ is therefore at most where the last equality arises because U θ is a quantum channel.Now note that by the linearity of the derivative and expectation operator,n1+•••+n L =4 n1,...,n L ∈N 4 n 1 , ..., n L E( Ln1 1 e sθ L1 ...Ln L L e sθ LL ) Ln L L e sθ LL ...Ln1 1 e sθ L1 ).(51)By the independence of every stochastic Trotter step, the expectation is multiplicative so that Using the triangle inequality and the submultiplicativity of the diamond norm, the diamond norm of the tail of Êsθ is at most 1 e sθ L1 ) . . .E( Ln L L e sθ LL ) .