Approximating Hamiltonian dynamics with the Nyström method

Simulating the time-evolution of quantum mechanical systems is BQP-hard and expected to be one of the foremost applications of quantum computers. We consider classical algorithms for the approximation of Hamiltonian dynamics using subsampling methods from randomized numerical linear algebra. We derive a simulation technique whose runtime scales polynomially in the number of qubits and the Frobenius norm of the Hamiltonian. As an immediate application, we show that sample-based quantum simulation, a type of evolution where the Hamiltonian is a density matrix, can be efficiently classically simulated under specific structural conditions. Our main technical contribution is a randomized algorithm for approximating Hermitian matrix exponentials. The proof leverages a low-rank, symmetric approximation via the Nyström method. Our results suggest that, under strong sampling assumptions, there exist classical poly-logarithmic time simulations of quantum computations.


Introduction
Special purpose quantum simulators permit the efficient implementation of unitary dynamics governed by physically meaningful families of Hamiltonians, while the general task is BQP-hard, since we can implement any quantum computation by a sequence of Hamiltonian evolutions. The properties responsible for efficiency correspond to plausible structural restrictions, such as locality of interaction between subsystems [Llo96], or sparsity, which permits simulation complexities sub-logarithmic in the inverse error [AT03]. Moreover, sparse operators provide examples of a class of quantum circuits with efficient weak classical simulation [SN13], whereas general strong simulation is known to be #P-hard [Van10].
The intuition supporting this widely exploited empirical phenomenon is that sparse matrices have modest storage requirements and an easier combinatorial structure, mirroring the small number of physical interactions.
In this paper, we consider the problem of classical simulation of quantum Hamiltonian dynamics. In particular, we will be interested in the case of time-independent Hamiltonians. In this setting, the problem is mathematically equivalent to the task of approximating the matrix exponential of a Hermitian matrix, i.e. the Hamiltonian of the system.
The task is particularly relevant for quantum many-body physics, where the exponential scaling of the wave function limits the applicability of ordinary differential equation solvers to only a handful of qubits. In order to simulate larger systems it is thus necessary to leverage specific properties of the system. An important class of methods exploits the entanglement structure of the states to obtain efficient representations, known as tensor networks [VMC08; Orú14], that can then be evolved either via Trotterisation [Vid04] or with a time-dependent variational principle [VGC04; Hae+11].
We conclude this introduction by noting that, from a computational perspective, the problem of simulating Hamiltonian dynamics is related to, but distinct from, the problem of simulating quantum circuits. In the latter case the task no longer involves the computation of costly matrix exponentials and reduces to approximating the action of a circuit, described as a sequence of unitary operations, on a state vector. Recent techniques developed for this problem involve the notion of stabilizer rank [Bra+18] or neural network quantum states [JBC18].

Results
3. We require H to be efficiently row-searchable. This condition informally states that we can efficiently sample randomly selected indices of the rows of H with probability proportional to the norm of the row (general case) or to the diagonal element of the row (positive semidefinite case).
It is important to remark that, although the notion of row-searchability is commonly assumed to hold in the randomised numerical linear algebra literature (see for example Section 4 in the seminal paper by Frieze, Kannan, and Vempala [FKV04]), in the context of quantum systems this is a much stronger requirement and cannot be assumed to hold for every matrix. Indeed, whilst for a polynomially sized matrix it is always possible to evaluate all the row norms efficiently in the number of non-zero entries of the matrix, this is generally no longer the case for the exponentially sized matrices found in quantum mechanics.
Under this input model we prove the following theorem, which we state here informally.
Theorem 1 (Informal statement of the main result). If H is a row-computable, row-searchable Hamiltonian on n qubits, and if ψ is an n-qubit quantum state with an efficient classical description, then there exists an algorithm that, with probability 1 − δ, approximates any chosen amplitude of the state e^{iHt}ψ in a time that scales polynomially in t, 1/ε, ‖H‖_F, ‖H‖, s and q, where ε determines the quality of the approximation, ‖·‖_F is the Frobenius norm, ‖·‖ is the spectral norm, s is the maximum number of non-zero elements in the rows of H, and q is the number of non-zero elements in ψ (the precise runtime is given in Theorem 6).
By analyzing the dependency of the runtime on the Frobenius norm we can determine under which conditions we obtain efficient Hamiltonian simulations. Informally, we obtain:

Theorem 2 (Informal). If H is a row-computable, row-searchable Hamiltonian on n qubits whose Frobenius norm is bounded by polylog(N), and if ψ is an n-qubit quantum state with an efficient classical description, then there exists an efficient algorithm that approximates any chosen amplitude of the state e^{iHt}ψ.
The algorithm proceeds by performing a two-step approximation. First, the Hamiltonian H is approximated by a low-rank operator H̃, which is more amenable to computations, obtained by sampling the rows with a probability proportional to the row norm. Second, the time evolution e^{iH̃t}ψ is approximated by a truncated Taylor expansion of the matrix exponential. By combining the structure of H̃ and the spectral properties of the truncated exponential we are able to guarantee that the above procedure can be performed efficiently.
More specifically, we separately consider the case of a generic Hamiltonian H and the restricted case of positive semidefinite Hamiltonians, for which a refined analysis is possible. The proposed algorithm leverages a low-rank approximation of the Hamiltonian H to efficiently approximate the matrix exponential e^{iHt}. Such an approximation is performed by randomly sampling M = O(polylog(N)) rows according to the distribution determined by the row-searchability condition and then collating them in a matrix A ∈ C^{M×N}.
When H is positive semidefinite, we consider the approximation H̃ = AB^+A*, where B^+ is the pseudoinverse of the positive semidefinite matrix B ∈ C^{M×M} obtained by selecting the rows of A whose indices correspond to those originally sampled for the rows of H. The approximation of the time evolution e^{iH̃t}ψ is then performed in terms of a Taylor expansion of the matrix exponential function, truncated at the K-th order. Leveraging the structure of H̃, it is possible to formulate the truncated expansion only in terms of linear operations involving the matrices A*A, B^+ and B and the vector A*ψ. Under the row-computability assumption, all these operations can be performed efficiently.
In the general Hermitian setting, the rows of H that are sampled to form A are first rescaled according to their sampling probability. Unlike in the positive semidefinite case, we resort to a slightly more involved variant of the approximation of the matrix exponential; in particular, we use H̃² := AA* to approximate H². This approach decomposes the matrix exponential into two auxiliary functions, and for each of these two functions we evaluate its truncated Taylor expansion. We show that this approximation can be formulated only in terms of linear operations involving A*A and A*ψ. Again, under the row-computability assumption, these operations can be performed efficiently.
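The rescaled-row construction can be illustrated numerically. The following NumPy sketch (ours, not the paper's implementation; all sizes and the seed are illustrative) samples rows of a random Hermitian H with probability proportional to their squared norms and checks that AA* concentrates around H². Since H is Hermitian, the conjugate of a sampled row is the corresponding column, which is used to build A directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: N-dimensional Hermitian matrix, M sampled rows.
N, M = 64, 4000
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (X + X.conj().T) / 2

# Row-norm sampling distribution p_i = ||H_{i,:}||^2 / ||H||_F^2.
row_norms2 = np.sum(np.abs(H) ** 2, axis=1)
p = row_norms2 / row_norms2.sum()
idx = rng.choice(N, size=M, p=p)

# Column j of A is H_{:,t_j} rescaled by 1/sqrt(M p_{t_j}),
# so that E[A A*] = H^2.
A = H[:, idx] / np.sqrt(M * p[idx])
H2_approx = A @ A.conj().T

rel_err = np.linalg.norm(H2_approx - H @ H) / np.linalg.norm(H @ H)
```

With M of a few thousand samples the relative Frobenius error is already of order 10%, while the approximation stays exactly Hermitian, in line with the symmetry-preservation requirement discussed later.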

Applications and outlook
It remains an open question-and an interesting research direction-to determine if there exist physically relevant families of Hamiltonians that respect the requirements for efficient simulation outlined in this paper.In general terms, our methods appear to function well in cases of "small variations" (for example, if all the eigenvalues of the Hamiltonian are contained in an exponentially small band) and in this sense share some similarities with perturbation theory.
As a specific application of our theorem, we propose the case of sample-based Hamiltonian simulation, that is, the simulation of quantum dynamics where the Hamiltonian is a density matrix [LMR14]. This type of simulation has recently found applications in various quantum algorithms for machine learning tasks, such as linear regression [SSP16] and least squares support vector machines [RML14]. Note that when these algorithms are used to analyze classical data, they assume that the data can be efficiently encoded into a density matrix. Specifically, as a direct consequence of our main theorem, we obtain the following result:

Theorem 3 (Informal). If ρ is a row-computable, row-searchable density matrix and if ψ is an n-qubit quantum state with an efficient classical description, then there exists an efficient algorithm that approximates any chosen amplitude of the state e^{iρt}ψ.
We remark that our results do not require the Hamiltonian to be sparse (that is, to have at most polylog(N) non-zero entries) but only to be row-sparse (that is, to have at most polylog(N) non-zero entries per row). The latter requirement is compatible with matrices that are non-sparse.
Going beyond immediate applications, we believe that the introduction of randomised numerical linear algebra techniques in quantum mechanics may provide a new direction from which to tackle the quantum many-body problem. In this spirit, it is relevant to mention that, shortly after our preprint was posted on arXiv, Tang [Tan19] showed that, using a strong sampling data structure, and with a significantly higher polynomial overhead, it is possible to derive a classical poly-logarithmic time algorithm for recommendation problems that nearly matches the asymptotic scaling of the quantum algorithm proposed by Kerenidis and Prakash [KP16]. The work by Tang assumes the ability to sample from the probability distribution defined by the squared entries of a matrix divided by its squared Frobenius norm, the so-called ℓ2-norm sampling. Our row-searchable condition is fundamentally equivalent to this, and both results provide evidence that a careful assessment of the state-preparation conditions is fundamental in order to determine whether a quantum algorithm provides a true advantage over a classical variant.
More specifically, [Tan19] defines sample access to the input in the following way.

Definition 1 ([Tan19]). We have sample access to a data point x ∈ C^N if we can produce independent random samples i ∈ [N], where i is sampled with probability |x_i|²/‖x‖²₂.

This definition can be easily extended to the columns (or rows) H_{:,i} (H_{i,:}) of a Hamiltonian H. In this case, we sample a column with probability p(i) = ‖H_{:,i}‖²/‖H‖²_F (similarly for a row). The sampling scheme we use for Hermitian matrices is equivalent to this one (when the matrix is also PSD we use a more computationally efficient variant). Additionally, we also provide a method based on binary trees to compute the sampling probabilities when these are not given by an oracle but stored in a standard memory model.
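When x is stored as a dense array, the sample access of Definition 1 is easy to simulate directly; [Tan19] instead assumes a data structure granting such samples in poly-logarithmic time. A minimal sketch (the function name is our own):

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_sample(x, rng, size=1):
    # Sample indices i with probability |x_i|^2 / ||x||_2^2.
    p = np.abs(x) ** 2
    return rng.choice(len(x), size=size, p=p / p.sum())

x = np.array([3.0, 0.0, 4.0])  # ||x||^2 = 25, so p = [9/25, 0, 16/25]
counts = np.bincount(l2_sample(x, rng, size=100_000), minlength=3)
freq = counts / counts.sum()   # empirical frequencies approach p
```

Note that zero entries are never sampled, which is the property exploited when the vector (or matrix row) is sparse.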
Informally, Definition 1 and Definition 2 are fundamentally the same. The main difference is that, while we use a traditional memory structure and provide a fast way to calculate the marginals using a binary tree (see Sec. 3), [Tan19] assumes a memory structure which allows one to sample efficiently according to this distribution.
Finally, it is relevant to mention that the techniques introduced by Tang have been directly applied to the problem of classical Hamiltonian simulation in [Chi+19]. While the time complexity of that algorithm has a 36-th power dependency on the Frobenius norm of the Hamiltonian, our algorithm scales with the 4-th power.

Classical and quantum approximation of matrix exponentials
The problem of matrix exponentiation has been extensively studied in the linear algebra literature [Hig05; Hig09; AH09; AH11]. Presently, for the case of general Hermitian matrices, there is no known algorithm with a runtime logarithmic in the dimension of the input matrix. Such a fast scaling is required for the simulation of quantum dynamics, where the dimensions of the matrix that describes the evolution scale exponentially with the number of qubits in the system.
The results described in this paper are based on randomized numerical linear algebra techniques. These methods, along with results from spectral graph theory, have led to a variety of new classical algorithms to approximate matrix exponentials [DKM06a; Dri+11; Mah+11; Woo+14; RCR15a]. For specific types of matrices, these techniques give efficient runtimes. For example, Orecchia et al. [OSV12] have demonstrated that the spectral sparsifiers of Spielman and Teng [ST04; ST11] can be used to approximate exponentials of strictly diagonally dominant matrices in time almost linear in the number of non-zero entries of H.
Quantum computers, on the other hand, can efficiently approximate some kinds of matrix exponentials. In particular, there exist time-efficient quantum algorithms for simulating the dynamics of row-sparse Hamiltonians that have only a linear dependency on the row-sparsity. For an important class of algorithms the simulation exploits an efficient edge-coloring of the graph associated with the Hamiltonian matrix H [Chi+03; CK11]. Once this edge coloring is found, a sparse Hamiltonian can be decomposed into a sum of simpler sparse Hamiltonians. It is known that these terms can be simulated separately using the Trotter-Suzuki formula [Tro59; Suz76]. Improved methods have been developed which result in algorithms with a runtime of Õ(s poly(log(Ns/ε))), where s is the sparsity, N the dimension, and ε the maximum error in the solution [CW12; BCK15; LC17].

Randomized numerical linear algebra and Nyström methods
Randomized numerical linear algebra (RandNLA) seeks to solve large-scale linear algebra problems exploiting randomisation as a computational resource. Central is the notion of a sketch. A sketch is a smaller or sparser approximation of the original input instance, such as a matrix of data points, that is used to compute quantities of interest. The review of Drineas and Mahoney covers the main ideas and tools of RandNLA [DM18]. As a simple example of the techniques used in RandNLA we present here the case of approximate matrix multiplication via random sampling. In general, given two n×n square matrices A and B, computing AB takes O(n³) time. Drineas, Kannan, and Mahoney showed that, given an efficient sampling method for the rows and columns of the matrices, it is possible to approximate AB in time O(cn²), where c is the number of rows and columns randomly sampled from the matrices [DKM06a]. The algorithm is structurally very simple: draw c random samples of columns of A and rows of B according to a probability distribution p, and group the samples into two, properly scaled, smaller matrices C and R.
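The steps above can be sketched in a few lines of NumPy (our illustration, using the near-optimal probabilities p_i proportional to ‖A_{:,i}‖·‖B_{i,:}‖; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def approx_matmul(A, B, c, rng):
    # Sample c index pairs with probability proportional to
    # ||A[:, i]|| * ||B[i, :]||, and rescale so that E[C @ R] = A @ B.
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=c, p=p)
    scale = np.sqrt(c * p[idx])
    C = A[:, idx] / scale            # n x c, scaled sampled columns of A
    R = B[idx, :] / scale[:, None]   # c x n, scaled sampled rows of B
    return C @ R

n, c = 100, 2000
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
err = np.linalg.norm(approx_matmul(A, B, c, rng) - A @ B)
bound = np.linalg.norm(A) * np.linalg.norm(B) / np.sqrt(c)
```

Here `bound` is the ‖A‖_F‖B‖_F/√c scale of the expected Frobenius error; the observed error matches it closely.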
When the probability distribution p is chosen appropriately and the columns and rows are accordingly re-scaled, it is possible to show that ‖AB − CR‖_F is of order ‖A‖_F‖B‖_F/√c with high probability. Similar ideas can be used to compute low-rank approximations of matrices, which can then be used in a wide range of applications [DKM06b]. In general, these methods produce sketches that do not preserve given symmetries of the matrix, a fundamental requirement for applications in quantum mechanics. A technique where the symmetry of the sketched matrix is preserved is the Nyström method, a RandNLA tool developed for the approximation of kernel matrices in statistical learning theory. Roughly speaking, the method constructs a lower dimensional, symmetric, positive semidefinite approximation of a given matrix from a sampling scheme for its columns. More specifically, let K ∈ R^{n×n} be a symmetric, rank-r, PSD matrix, K_{:,j} the j-th column vector of K, and K_{i,:} the i-th row vector of K. The singular value decomposition of K is K = UΣU^T, where the columns of U are orthogonal and Σ = diag(σ_1, ..., σ_r) is the matrix of the singular values σ_1, ..., σ_r of K. The Moore-Penrose pseudoinverse of K is K^+ = UΣ^+U^T. The Nyström method finds a low-rank approximation of K that preserves the symmetry and the PSD property of the matrix. Let C denote the n×l matrix formed by (uniformly) sampling l ≪ n columns of K, W the l×l matrix consisting of the intersection of these l columns with the corresponding l rows of K, and W_k the best rank-k approximation of W. The Nyström method generates a rank-k approximation K̃ of K, for k < n, defined by K̃ = C W_k^+ C^T ≈ K. The running time of the algorithm is O(nkl) [KMT12].
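A minimal sketch of the plain Nyström approximation with uniform column sampling (our illustration, not the paper's code; for an exactly rank-r PSD matrix and l ≥ r generic sampled columns the reconstruction is exact, which the example exploits):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sizes: n x n PSD matrix of exact rank r, l sampled columns.
n, r, l = 200, 10, 40
G = rng.standard_normal((n, r))
K = G @ G.T                                  # rank-r PSD matrix

cols = rng.choice(n, size=l, replace=False)  # uniform column sampling
C = K[:, cols]                               # n x l sampled columns
W = K[np.ix_(cols, cols)]                    # l x l intersection block
# rcond truncates the numerically-zero singular values of the rank-r W.
K_tilde = C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

rel_err = np.linalg.norm(K - K_tilde) / np.linalg.norm(K)
```

The sketch K̃ is symmetric and PSD by construction, which is exactly the property that generic row/column sketches fail to guarantee.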
The performance of the algorithm can be improved using non-uniform sampling schemes. For a sampling scheme equivalent to the one we use in this paper we have:

Theorem 4 (Theorem 3 in [DM05]). Let K be an n×n symmetric PSD matrix, let k ≤ l be a rank parameter, and let K̃ be the approximation constructed using the Nyström method by sampling l columns with probabilities p_i = K²_{ii}/Σ_j K²_{jj}. Then, with high probability, the error ‖K − K̃‖ exceeds the error of the best rank-k approximation of K by at most an additive term proportional to Σ_i K²_{ii}, with the required number l of sampled columns growing polynomially in k and in the inverse of the target accuracy.

The Nyström method has proved to be a powerful tool in a range of applications where the matrices are approximately low rank. The method in its present form was developed by Williams and Seeger [WS01] as a sampling-based algorithm to solve regression and classification problems involving Gaussian processes. This problem requires the approximation of symmetric, positive semidefinite matrices that admit good low-rank approximations [WS01; Wil+02]. The technique proved to be closely related to a method for solving linear integral equations developed by Nyström [Nys30], hence the name Nyström method.
It is worth mentioning the Nyström extension, a further refinement of the technique that has found numerous applications ranging from large-scale machine learning problems to statistics and signal processing [WS01; Wil+02; ZK10; TKR08; Fow+04; KMT12; BW07; BW08; KMT09; LKL10; MTJ11; ZTK08]. Typical extensions that substantially improve the performance, e.g. lead to lower reconstruction error, introduce non-uniform importance-sampling distributions or random mixing of the input before sampling the columns.

Organization
Section 2 introduces relevant notation and definitions. Section 3 discusses the row-searchability condition and the algorithm to sample efficiently from the rows of the Hamiltonian. Section 4 proves the theorem for the approximate simulation of exponentials of PSD matrices. Section 5 proves the theorem for the approximate simulation of exponentials of Hermitian matrices. Section 6 discusses applications to the simulation of the evolution of density matrices.

Preliminaries
We denote vectors with lower-case letters. For a vector x ∈ C^n, x_i denotes the i-th element of x. A vector is sparse if most of its entries are 0. For an integer k, [k] denotes the set {1, ..., k}.
For a matrix A ∈ C^{m×n}, let a_j := A_{:,j}, j ∈ [n], denote the j-th column vector of A, a_i := A_{i,:}, i ∈ [m], the i-th row vector of A, and a_{ij} = A(i, j) the (i, j)-th element. We denote by A_{i:j} the sub-matrix of A that contains the rows from i to j.
The supremum is denoted sup and the infimum inf. For a measure space (X, Σ, μ) and a measurable function f, a number a is an essential upper bound of f if f(x) ≤ a for μ-almost every x; the essential supremum is then defined as ess sup f := inf{a : a is an essential upper bound of f}. The span of a set S = {v_1, ..., v_k} of vectors is the set of all their linear combinations, span(S) = {Σ_i α_i v_i}. The set is linearly independent if Σ_i α_i v_i = 0 if and only if α_i = 0 for all i. The range of A ∈ C^{m×n} is defined by range(A) = {y ∈ C^m : y = Ax for some x ∈ C^n} = span(A_{:,1}, ..., A_{:,n}); equivalently, the range of A is the set of all linear combinations of the columns of A. The nullspace null(A) (or kernel ker(A)) is the set of vectors such that Av = 0. The rank of a matrix A ∈ C^{m×n}, rank(A), is the dimension of range(A) and is equal to the number of linearly independent columns of A; since this is equal to rank(A^T), it also equals the number of linearly independent rows of A, and satisfies rank(A) ≤ min{m, n}. The trace of a matrix is the sum of its diagonal elements, Tr(A) = Σ_i a_{ii}. The support of a vector, supp(v), is the set of indices i such that v_i ≠ 0, and we call its cardinality the sparsity of the vector. For a matrix, the sparsity is the number of non-zero entries, while row (or column) sparsity refers to the number of non-zero entries per row (or column). A Hermitian matrix A is positive semidefinite (PSD) if all its eigenvalues are non-negative. For a PSD matrix A we write A ⪰ 0. Similarly, A ⪰ B is the partial ordering which is equivalent to A − B ⪰ 0.
We use the following standard norms: the Frobenius norm ‖A‖_F = (Σ_{i,j} |a_{ij}|²)^{1/2} and the spectral norm ‖A‖ = sup_{x≠0} ‖Ax‖₂/‖x‖₂. We denote the pseudo-inverse of a matrix A with singular value decomposition A = UΣV* as A^+ = VΣ^+U*.

From row-searchability to efficient row-sampling

All our algorithms require row-searchable Hamiltonians. In this section we define the row-searchability condition and describe an efficient algorithm to sample from the rows of row-searchable Hamiltonians.
Let n ∈ N. We first introduce a binary tree of subsets spanning {0, 1}^n. In the following, with a slight abuse of notation, we identify binary tuples with the associated binary numbers. Let L be a binary string with |L| ≤ n, where |L| denotes the length of the string. We denote with S(L) the set of all strings in {0, 1}^n whose first |L| bits coincide with L; the sets S(L) thus form a binary tree, with S(∅) = {0, 1}^n at the root. We are now ready to state the row-searchability property for a matrix H.

Definition 3 (Row-searchability). Let H be a Hermitian matrix of dimension 2^n × 2^n. We say that H is row-searchable if, for any binary string L with |L| ≤ n, the weight
w(S(L)) = Σ_{i ∈ S(L)} h(i, H_{:,i}) (2)
can be computed efficiently, where h is the function computing the weight associated to the i-th column H_{:,i}. For positive semidefinite H we use h(i, H_{:,i}) = H_{i,i}, i.e. the i-th diagonal element, while for general Hermitian H we use h(i, H_{:,i}) = ‖H_{:,i}‖².
Row-searchability intuitively works as follows. If we are given a binary tree where the leaves contain the individual probabilities and each parent contains the marginal over its children, then, for a randomly sampled number in [0, 1], we can traverse this tree in log(N) time to find the leaf node that is sampled, i.e. the index of the column of H.

Algorithm 1 MATLAB code for the sampling algorithm
Input: wS(L) corresponds to the function w(S(L)) defined in Eq. 2. Output: L is the sampled row index.
L = []; q = rand()*wS(L);
More specifically, row-searchability requires the evaluation of w(S(L)) as defined in Eq. (2), which computes marginals of the diagonal of H, where the co-elements, i.e. the elements over which we are not summing, are fixed by the tuple L. Hence, for empty L, w(S(L)) = Tr(H). Note that the function h considered here is related to leverage score sampling, which is a common approach in randomized algorithms for linear algebra [Mah+11; Woo+14].
In Alg. 1 we provide an algorithm that, given a row-searchable H, is able to sample an index with probability p(j) = h(j, H_{:,j})/w({0, 1}^n). Let q be a random number uniformly sampled in [0, T], where T = w({0, 1}^n) is the sum of the weights associated to all the rows. The algorithm uses logarithmic search, starting with L empty and iteratively appending 1 or 0, to find the index L such that w({0, ..., L − 1}) ≤ q ≤ w({0, ..., L}). The total time required to compute one index is O(nQ(n)), where Q(n) is the maximum time required to compute w(S(L)) for L ∈ {0, 1}^n. Note that if w(S(L)) can be computed efficiently for any L ∈ {0, 1}^n, then Q(n) is polynomial and the cost of the sampling procedure is polynomial.
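A compact Python rendering of this logarithmic search (our sketch, mirroring the structure of Alg. 1; the toy weights are hypothetical) is:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_index(w, n, rng):
    """Descend the implicit binary tree of marginals, as in Alg. 1.

    `w(prefix)` returns the total weight of all n-bit indices whose
    binary expansion starts with `prefix` (so w('') is the total weight).
    """
    L = ""
    q = rng.uniform(0.0, w(""))
    for _ in range(n):
        left = w(L + "0")          # weight of the left subtree
        if q <= left:
            L += "0"
        else:
            q -= left
            L += "1"
    return int(L, 2)

# Toy weights: e.g. the diagonal of a PSD H on n = 3 qubits.
n = 3
h = np.array([1.0, 2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 8.0])

def w(prefix):
    block = 2 ** (n - len(prefix))
    start = int(prefix, 2) * block if prefix else 0
    return h[start:start + block].sum()

counts = np.bincount([sample_index(w, n, rng) for _ in range(20_000)],
                     minlength=2 ** n)
freq = counts / counts.sum()       # approaches h / h.sum()
```

Each draw costs n evaluations of w, matching the O(nQ(n)) bound stated above.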

Algorithm for PSD row-searchable Hermitian matrices
Given a 2^n × 2^n matrix H ⪰ 0, our goal is to produce an approximation of the state ψ(t) = exp(iHt)ψ. (3) In particular, we will provide an expression of the form exp_K(iH̃t)ψ, where exp_K and H̃ are suitable approximations of the exponential function and of H, respectively. We give here an algorithm for H ⪰ 0, which we then generalize in the following section to arbitrary Hermitian H fulfilling the row-searchability condition, i.e. Definition 3.
Let h be the diagonal of the positive semidefinite H and let t_1, ..., t_M, with M ∈ N, be indices independently sampled with repetition from {1, ..., 2^n} with probabilities p_i = h_i/Tr(H), e.g. via Alg. 1. Then, define the matrix B ∈ C^{M×M} such that B_{i,j} = H_{t_i,t_j}, for 1 ≤ i, j ≤ M. Finally, denote by A ∈ C^{2^n×M} the matrix A_{i,j} = H_{i,t_j}, for 1 ≤ i ≤ 2^n and 1 ≤ j ≤ M. The approximated matrix is defined as H̃ = AB^+A*, where (·)^+ is the pseudoinverse. Let us also define the function g(x) = (e^{itx} − 1)/x; then e^{itx} = 1 + g(x)x. Note that g is an analytic function; in particular, g(x) = Σ_{k≥1} (it)^k x^{k−1}/k!.

Then
e^{iH̃t}ψ = ψ + g(H̃)H̃ψ = ψ + A g(B^+A*A) B^+A*ψ, (5)
where the last step is due to the fact that, given an analytic function q(x) = Σ_{k≥0} α_k x^k, we have q(AB^+A*)AB^+A* = A q(B^+A*A)B^+A*. Writing D = B^+A*A, the algorithm now approximates g with g_K(x), which limits the series defining g to its first K terms, for K ∈ N. Moreover, g_K(D)B^+(A*ψ) can be rewritten in an iterative fashion, so that only matrix-vector products are required. A MATLAB implementation of this procedure is presented in Alg. 2.

Let the row sparsity s of H be of order poly(n). Then the total cost of applying this operator is O(M² poly(n) + KM² + M³) in time, where the terms M³ and M² poly(n) result from the calculation of D. To compute the total cost in space, note that we do not have to store H or A in memory, but only B, D and the vectors v, b_j, for a total cost of O(M²). Indeed, D can be computed in blocks: assuming, without loss of generality, that 2^n/M ∈ N, we accumulate the products A*_{a:b}A_{a:b} over consecutive blocks of rows, where A_{a:b} is the submatrix of A containing the rows from a to b. A similar reasoning holds for the computation of the vector v. In this computation we have assumed that the sampling probabilities are given to us and that we can efficiently sample from the matrix H according to these probabilities; Alg. 1 provides such a sampling procedure. The properties of the algorithm for the PSD case are summarized in the following theorem.
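The PSD algorithm can be sketched end to end in NumPy (our illustration, not Alg. 2 itself; the sizes, the seed and the low-rank test Hamiltonian are hypothetical). It builds A, B and D = B^+A*A, applies the truncated series of g via matrix-vector products as in Eq. (5), and compares against the exact evolution:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical demo sizes: a rank-r PSD Hamiltonian on N dimensions.
N, r, M, K, t = 64, 5, 30, 12, 0.1
G = rng.standard_normal((N, r)) + 1j * rng.standard_normal((N, r))
H = G @ G.conj().T
H /= np.linalg.norm(H, 2)            # normalise so t * ||H|| is small
psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
psi /= np.linalg.norm(psi)

# Step 1: Nystrom approximation H_tilde = A B^+ A* via diagonal sampling.
p = np.real(np.diag(H)) / np.real(np.trace(H))
idx = rng.choice(N, size=M, p=p)     # sampled with repetition
A = H[:, idx]
B = H[np.ix_(idx, idx)]
Bp = np.linalg.pinv(B, rcond=1e-10)

# Step 2: apply the K-truncated series of g(x) = (e^{itx} - 1)/x to
# D = B^+ A* A, so that e^{i H_tilde t} psi ~ psi + A g_K(D) B^+ A* psi.
D = Bp @ (A.conj().T @ A)
term = Bp @ (A.conj().T @ psi)       # w = B^+ A* psi
acc = np.zeros(M, dtype=complex)
for k in range(1, K + 1):
    # after this line, term = (it)^k / k! * D^{k-1} w
    term = (1j * t / k) * (D @ term if k > 1 else term)
    acc += term
psi_approx = psi + A @ acc

# Reference: exact evolution via the eigendecomposition of H.
lam, U = np.linalg.eigh(H)
psi_exact = U @ (np.exp(1j * t * lam) * (U.conj().T @ psi))

err = np.linalg.norm(psi_approx - psi_exact)
```

For an exactly low-rank H whose range is captured by the sampled indices, H̃ = H and the only remaining error is the Taylor truncation, so `err` is at machine-precision level here.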
Theorem 5 (Algorithm for simulating PSD row-searchable Hermitian matrices). Let ε, δ ∈ (0, 1], K, M ∈ N and t > 0. Let H be positive semidefinite, let K be the number of terms in the truncated series expansion of g(H̃), and let M be the number of samples we take for the approximation. Let ψ(t) be the true evolution (Eq. 3) and let ψ_{K,M}(t) be the output of our Alg. 2 (Eq. 6). When K ≥ max(2et‖H‖, log₂(2/ε)) and M is at least of order (t²Tr(H)²/ε²) log²(1/δ), the following holds with probability 1 − δ:
‖ψ(t) − ψ_{K,M}(t)‖ ≤ ε.

Note that, with the result above, ψ_{K,M}(t) in Eq. (6) (Alg. 2) approximates ψ(t) with error at most ε and with probability at least 1 − δ, requiring a computational cost that is O((st²Tr(H)²/ε²) log²(1/δ)) in time and O((t²Tr(H)²/ε²) log²(1/δ)) in memory. In the following we prove the first main result of this work. To prove Theorem 5 we need the following lemmas. Lemma 1 performs a basic decomposition of the error in terms of the distance between H and the approximation H̃, as well as of the approximation of g by g_K. Lemma 2 provides an analytic bound on the distance between H and H̃, expressed in terms of the expectation of eigenvalues of related matrices, which are then concentrated in Lemma 3.

Lemma 1. Let K, M ∈ N and t > 0; then
‖ψ(t) − ψ_{K,M}(t)‖ ≤ t‖H − H̃‖ + (t‖H̃‖)^{K+1}/(K+1)!.

Proof. By definition we have that e^{ixt} = 1 + g(x)x with g(x) = Σ_{k≥1} x^{k−1}(it)^k/k!, and g_K is the truncated version of g. By adding and subtracting e^{iH̃t}, we have
‖ψ(t) − ψ_{K,M}(t)‖ ≤ ‖e^{iHt} − e^{iH̃t}‖ + ‖(g(H̃) − g_K(H̃))H̃‖.
By [Nak03],
‖e^{iHt} − e^{iH̃t}‖ ≤ t‖H − H̃‖, (9)
moreover, by [Mat93], and since H̃ is Hermitian and hence all its eigenvalues are real, we have
‖(g(H̃) − g_K(H̃))H̃‖ ≤ (t‖H̃‖)^{K+1}/(K+1)!. (10)
Finally note that ‖ψ‖ = 1.
To study the norm ‖H − H̃‖, note that, since H is positive semidefinite, there exists an operator S such that H = SS*, so H_{i,j} = s_i* s_j, with s_i, s_j the i-th and j-th rows of S. Denote with C and C̃ the operators C = S*S and C̃ = (1/M) Σ_{j=1}^M (Tr(H)/h_{t_j}) s_{t_j} s_{t_j}*.
We then obtain the following result, which shows that ‖H̃‖ ≤ ‖H‖ and that ‖H − H̃‖ is controlled by the deviation of C̃ from C (Lemma 2).

Proof. Define the selection matrix V ∈ C^{M×2^n}, which is everywhere zero except for one element in each row, V_{j,t_j} = 1 for 1 ≤ j ≤ M. Then we have that A = SS*V* and B = VSS*V*, i.e., A is again given by the rows according to the sampled indices t_1, ..., t_M and B is the submatrix obtained by taking the rows and columns according to the same indices. In particular, denoting by P the operator P = S*V*(VSS*V*)^+VS, and recalling that H = SS* and C = S*S, we have H̃ = AB^+A* = SPS*. By definition P is an orthogonal projection operator: it is Hermitian and, by definition, P² = P. Indeed, P is the projection onto the row space of the matrix R := S*V*, since, writing the singular value decomposition of R, we have P = R(R*R)^+R* = UU*, which spans the same space as R. Finally, since (I − P) = (I − P)², and ‖Z*Z‖ = ‖Z‖², we have
‖H − H̃‖ = ‖S(I − P)S*‖ = ‖(I − P)S*‖². (14)
Note that C̃ can be rewritten as C̃ = S*V*LVS, with L a diagonal matrix with L_{jj} = Tr(H)/(M h_{t_j}). Moreover, t_j is sampled with probability p(q) = h_q/Tr(H), so h_{t_j} > 0 with probability 1; then L has a finite and strictly positive diagonal, so C̃ has the same range as P. Now, with C = S*S, we are able to apply Proposition 3 and Proposition 7 of [RCR15b], and obtain a bound on ‖(I − P)S*‖² in terms of the deviation of C̃ from C. Finally, note that, since P is a projection operator, we have ‖P‖ = 1, so ‖H̃‖ = ‖SPS*‖ ≤ ‖S‖‖P‖‖S*‖ = ‖SS*‖ = ‖H‖, where the last step is due to the fact that H = SS*.
Lemma 3. Let δ ∈ (0, 1] and τ > 0. When M is at least of order (Tr(H)²/τ²) log(4Tr(C)/(τδ)), then with probability 1 − δ it holds that ‖C̃ − C‖ ≤ τ.

Proof. Define the random variables ζ_j = (Tr(H)/h_{t_j}) s_{t_j} s_{t_j}*, for 1 ≤ j ≤ M, so that C̃ = (1/M) Σ_{j=1}^M ζ_j. Note that ‖ζ_j‖ = (Tr(H)/h_{t_j}) ‖s_{t_j}‖² = Tr(H) almost surely, since ‖s_i‖² = H_{ii} = h_i. Moreover, by definition of ζ_j, we have E ζ_j = Σ_q p(q)(Tr(H)/h_q) s_q s_q* = C. Since the ζ_j are independent for 1 ≤ j ≤ M, uniformly bounded, and with expectation equal to C, we can apply Proposition 8 of [RCR15b], which uses a non-commutative Bernstein inequality for linear operators [Tro12], and obtain, with probability at least 1 − δ,
‖C̃ − C‖ ≤ 2Tr(H)α/(3M) + (2Tr(H)‖C‖α/M)^{1/2}, with α = log(4Tr(C)/(τδ)).
The claim then follows from the choice of M, using ‖C‖ ≤ Tr(H); moreover, by Remark 1 of [RCR15b], Tr(C) can be replaced by an essential-supremum bound, where ess sup denotes the essential supremum.
Now we are ready to prove the Theorem for PSD matrices.
Proof of Theorem 5. By Lemma 1, we have
‖ψ(t) − ψ_{K,M}(t)‖ ≤ t‖H − H̃‖ + (t‖H̃‖)^{K+1}/(K+1)!.
Let τ > 0. By Lemma 2, we know that ‖H̃‖ ≤ ‖H‖ and that ‖H − H̃‖ is controlled by ‖C̃ − C‖ with probability 1. By Lemma 3, with probability 1 − δ it holds that ‖C̃ − C‖ ≤ τ. Choosing τ proportional to ε/t, we thus have t‖H − H̃‖ ≤ ε/2 with probability 1 − δ. Now we select K such that (t‖H̃‖)^{K+1}/(K+1)! ≤ ε/2. Since, by the Stirling approximation, (K+1)! ≥ ((K+1)/e)^{K+1}, it suffices to take K ≥ max(2et‖H‖, log₂(2/ε)). Finally, we require M to be large enough for Lemma 3 to hold with this choice of τ, i.e. M of order (t²Tr(H)²/ε²) log²(1/δ). Then we have that ‖ψ(t) − ψ_{K,M}(t)‖ ≤ ε with probability 1 − δ.

We now extend this result to the more general case of arbitrary Hermitian matrices that fulfil the row-searchability condition, i.e. the ability to sample according to some leverage of the rows. This will lead to the second main result of this work.

Algorithm for row-searchable Hermitian matrices
In this section we provide the algorithm for simulating Hermitian (possibly non-PSD) matrices, and we provide guarantees on its efficiency when H is row-searchable. In the following, let s be the maximum number of non-zero elements in the rows of H, ε the error in the approximation of the output state of the algorithm with respect to the ideal ψ(t), and t the evolution time of the simulation. Let further K be the order of the truncated series expansions and M the number of samples we take for the approximation. As before, we first outline the algorithm and then prove its properties.
For arbitrary Hermitian matrices H we use the following algorithm. Sample M ∈ N independent indices t_1, ..., t_M with probabilities p(i) = ‖h_i‖²/‖H‖²_F, where h_i is the i-th row of H (sampled via Alg. 1). Then denote with A the 2^n × M matrix whose j-th column is A_{:,j} = H_{:,t_j}/(M p(t_j))^{1/2}. We will then use H̃² := AA* as the approximation for the square of the Hamiltonian.
Define two functions that we will use to approximate e^{ix},
f(x) = Σ_{j≥0} (−1)^{j+1} x^j/(2j+2)!, g(x) = Σ_{j≥0} (−1)^{j+1} x^j/(2j+3)!,
and denote with f_K and g_K the K-truncated Taylor expansions of f and g.
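These series can be sanity-checked numerically: x²f(x²) = cos(x) − 1 and x²g(x²) = (sin(x) − x)/x, so that e^{ix} = (1 + x²f(x²)) + ix(1 + x²g(x²)). A short check of this identity with the truncated series (our sketch; x and K are arbitrary):

```python
import cmath
from math import factorial

def f_K(y, K):
    # K-truncated series of f: sum_j (-1)^{j+1} y^j / (2j+2)!
    return sum((-1) ** (j + 1) * y**j / factorial(2 * j + 2)
               for j in range(K + 1))

def g_K(y, K):
    # K-truncated series of g: sum_j (-1)^{j+1} y^j / (2j+3)!
    return sum((-1) ** (j + 1) * y**j / factorial(2 * j + 3)
               for j in range(K + 1))

x, K = 0.7, 12
approx = (1 + x**2 * f_K(x**2, K)) + 1j * x * (1 + x**2 * g_K(x**2, K))
err = abs(approx - cmath.exp(1j * x))
```

Because both series only contain powers of x², in the matrix case they can be evaluated on AA* (respectively A*A), which is what makes the decomposition useful.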
In particular, note that
e^{ix} = (1 + x²f(x²)) + ix(1 + x²g(x²)),
since x²f(x²) = cos(x) − 1 and x²g(x²) = (sin(x) − x)/x. Analogously to the previous algorithm, we hence estimate e^{ix} via f_K and g_K. The final algorithm will be
ψ_{K,M}(t) = ψ + t² A f_K(t²A*A) v + it u + it³ A g_K(t²A*A) z, (20)
with u = Hψ, v = A*ψ, z = A*u. Note that the products f_K(A*A)v and A g_K(A*A)z are computed by exploiting the Taylor-series form of the two functions and performing only matrix-vector products, in the same way as in Alg. 2. Denote with s the maximum number of non-zero elements in the rows of H and with q the number of non-zero elements in ψ. The algorithm requires O(sq) in space and time to compute u, O(M min(s, q)) in time and O(M) in space to compute v, and O(Ms) in time and space to compute z. We therefore obtain a total computational complexity of
time: O(sq + M min(s, q) + sMK), space: O(sq + sM).
Note that if s > M it is possible to further reduce the memory requirements at the cost of more computational time, by computing B = A*A, which can be done in blocks and requires O(sM²) in time and O(M²) in memory, and then computing the series in terms of B. In that case the computational cost would be
time: O(sq + M min(s, q) + M²(s + K)), space: O(sq + M²).
The properties of this algorithm are summarized in the following theorem (a formal statement of Theorem 1):

Theorem 6 (Algorithm for simulating row-searchable Hermitian matrices). Let δ, ε ∈ (0, 1], t > 0 and K, M ∈ N, where K is the number of terms in the truncated series expansions and M the number of samples we take for the approximation. Let ψ(t) be the true evolution (Eq. 3) and let ψ_{K,M}(t) be computed as in Eq. 20. When K and M are sufficiently large (K of order t‖H‖ + log(1/ε) and M of order (t²‖H‖²_F/ε) log(1/δ)), then
‖ψ(t) − ψ_{K,M}(t)‖ ≤ ε
with probability at least 1 − δ.
Note that with the result above, $\psi_{K,M}(t)$ in Eq. (20) approximates $\psi(t)$ with error at most $\epsilon$ and with probability at least $1-\delta$, requiring a computational cost of $O\big(sq + M\min(s,q) + M^2(s+K)\big)$ in time and $O\big(sq + M^2\big)$ in memory.
Combining Eqs. 23 and 24 with Eqs. 25 and 26 yields the overall computational complexity of the algorithm described in this section; here the quantity $\log\frac{4\|H\|_F^2}{\delta\|H\|^2}$ in Eq. 25 was bounded using the inequality $\|H\|_F^2 \le 2^n \lambda_{\max}^2 = 2^n \|H\|^2$, i.e.
$$\log\frac{4\|H\|_F^2}{\delta\|H\|^2} \le \log\frac{4\cdot 2^n}{\delta} = O\Big(n + \log\frac{1}{\delta}\Big),$$
where $\lambda_{\max}$ is the largest eigenvalue of $H$ in absolute value.
Observe now that simulating the time evolution of $\alpha I$ only changes the phase of the time evolution, where $I \in \mathbb{C}^{N\times N}$ is the identity matrix and $\alpha$ some real parameter. We can hence perform the time evolution of $H' := H - \alpha I$, since for any efficient classical description of the input state we can apply the time evolution of the diagonal matrix $e^{-i\alpha I t}$. We can then optimize the parameter $\alpha$ such that the Frobenius norm of the operator $H'$ is minimized, i.e.
$$\frac{d}{d\alpha}\,\|H - \alpha I\|_F^2 = \frac{d}{d\alpha}\Big(\|H\|_F^2 - 2\alpha\,\mathrm{Tr}(H) + \alpha^2\, 2^n\Big) = 0,$$
from which we obtain the condition $\alpha = \mathrm{Tr}(H)/2^n$. Our algorithm requires that $\|H'\|_F$ be bounded by $\mathrm{polylog}\,N$. Using the spectral theorem and the fact that the Frobenius norm is unitarily invariant, this in turn gives, after a bit of algebra, the condition
$$\sum_{i=1}^{2^n}\Big(\lambda_i - \frac{1}{2^n}\sum_{j}\lambda_j\Big)^2 \le \mathrm{polylog}(N)^2,$$
where the $\lambda_i$ are the eigenvalues of $H$, under which we can simulate the Hamiltonian $H$ efficiently. We now prove the second main result of this work and establish the correctness of the above results.
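Before the proof, the optimal shift can be checked numerically; the following sketch (function name ours) computes $\alpha = \mathrm{Tr}(H)/2^n$ and is easily verified to minimize $\|H-\alpha I\|_F$:

```python
import numpy as np

def optimal_shift(H):
    """alpha minimizing ||H - alpha I||_F:
    d/dalpha (||H||_F^2 - 2 alpha Tr(H) + alpha^2 N) = 0  =>  alpha = Tr(H)/N."""
    return np.trace(H).real / H.shape[0]
```

For a Hermitian $H$ this is the mean of the eigenvalues, so the condition above asks that the spectrum have small variance around its mean.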
Proof of Theorem 6. Denote by $Z(Ht, At)$ the operator applied in Eq. (20), i.e. $Z(Ht,At) = I + itH + t^2 f(t^2AA^*)AA^* + it^3 g(t^2AA^*)AA^*H$, and by $Z_K(At,Ht)$ the same operator with $f, g$ replaced by $f_K, g_K$. By definition of $\psi_{K,M}(t)$ and the fact that $\|\psi\| = 1$, we have
$$\|\psi_{K,M}(t) - \psi(t)\| \le \|Z_K(At,Ht) - Z(Ht,At)\| + \|Z(Ht,At) - e^{iHt}\|.$$
We first study $\|Z(Ht,At) - e^{iHt}\|$. Define $l(x) = f(x)x$ and $m(x) = g(x)x$. Note that, by the spectral theorem, we have
$$Z(Ht,At) - e^{iHt} = \big(l(t^2AA^*) - l(t^2H^2)\big) + i\,\big(m(t^2AA^*) - m(t^2H^2)\big)\,tH.$$
To bound these norms we use the Lipschitz continuity of $l$ and $m$ on the positive reals.
Moreover, it is easy to see that $l$ and $m$ are Lipschitz on $[0,\infty)$ with constants $1/2$ and $1/6$ respectively, since $|l'(x)| = \frac{|\sin\sqrt{x}|}{2\sqrt{x}} \le \frac{1}{2}$ and $|m'(x)| = \frac{|\sqrt{x}\cos\sqrt{x} - \sin\sqrt{x}|}{2x^{3/2}} \le \frac{1}{6}$, so that
$$\|Z(Ht,At) - e^{iHt}\| \le \Big(\frac{t^2}{2} + \frac{t^3\|H\|}{6}\Big)\,\|AA^* - H^2\|.$$

Now note that, by defining the random variables $\zeta_j = \frac{1}{p(t_j)}\, h_{t_j}^* h_{t_j}$, we have $AA^* = \frac{1}{M}\sum_{j=1}^{M}\zeta_j$ and $\mathbb{E}[\zeta_j] = \sum_i h_i^* h_i = H^2$, so $\|AA^* - H^2\|$ is the deviation of an average of i.i.d. random matrices from its mean.
Let $\tau > 0$. By applying Thm. 1 of [Hsu14] (or Prop. 9 in [RCR15b]) to the random variables $\zeta_j$, we obtain a concentration bound on $\|AA^* - H^2\|$; when $\tau \ge e$, selecting $M$ large enough ensures that the bound above holds with probability at least $1 - \delta$.
Now we study $\|Z_K(At,Ht) - Z(Ht,At)\|$. Denote by $a, b$ the functions $a(x) = l(x^2)$, $b(x) = m(x^2)$, and by $a_K, b_K$ the associated $K$-truncated Taylor expansions. Note that $a(x) = \cos(x) - 1$, while $b(x) = (\sin(x) - x)/x$. By definition of $Z_K$ and $Z$, the difference is controlled by the truncation errors of $a$ and $b$. Since both are remainders of alternating series, for $K$ large enough that the terms are decreasing in magnitude they are bounded by the first omitted terms,
$$|a(x) - a_K(x)| \le \frac{|x|^{2K+4}}{(2K+4)!}, \qquad |b(x) - b_K(x)| \le \frac{|x|^{2K+4}}{(2K+5)!}.$$
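As a numerical sanity check (not part of the proof), the remainder bound for the truncation of $a(x) = \cos(x) - 1$ can be compared against the first omitted term of the series; the helper below is our own:

```python
import math

def a_K(x, K):
    """K-truncated Taylor expansion of a(x) = cos(x) - 1,
    i.e. sum_{j=0}^{K} (-1)^(j+1) x^(2j+2) / (2j+2)!."""
    return sum((-1) ** (j + 1) * x ** (2 * j + 2) / math.factorial(2 * j + 2)
               for j in range(K + 1))
```

For $|x| \le 1$ the terms are strictly decreasing, so the Leibniz remainder bound applies for every $K$.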

Application to density matrix simulation
Sample-based Hamiltonian simulation is a method for simulating Hamiltonians which are density matrices [LMR14]. This method has been used in several recent quantum machine learning algorithms, including quantum support vector machines [RML14], quantum gradient descent and Newton's method [Reb+19], and quantum linear regression [SSP16]. These techniques all make use of a quantum algorithm that implements the time evolution $e^{i\rho t}|\psi\rangle$ of an $n$-qubit state $|\psi\rangle$ governed by a Hamiltonian corresponding to a density matrix $\rho$ of dimension $2^n$. Specifically, given an oracle that returns superpositions of the entries of a density matrix $\rho \in \mathbb{C}^{2^n\times 2^n}$, the quantum algorithm for density matrix exponentiation implements a unitary matrix $U$ such that $\|U|\psi\rangle - e^{i\rho t}|\psi\rangle\| \le \epsilon$, and requires $O(t^2/\epsilon)$ copies of $\rho$. This has been shown to be optimal [Kim+17].
The total runtime of the density matrix simulation algorithm is $O(t^2\, T(U_\rho)/\epsilon)$, where $T(U_\rho)$ is the time required to prepare a quantum superposition of the entries of the matrix $\rho$. If the state preparation can be achieved in $O(\mathrm{poly}(n))$ time, the total runtime of the algorithm reduces to $O(\mathrm{poly}(n)\, t^2/\epsilon)$, which is polylogarithmic in the dimension of $\rho$. A data structure that grants this sort of fast access is common in the quantum machine learning literature [Bia+17; Cil+18]. As discussed in the introduction, it was recently noted by Tang [Tan19] that some common implementations of this data structure, such as [KP16], are equivalent to the ability to perform $\ell_2$-norm sampling on the entries of $\rho$. This requirement is fundamentally equivalent to the row-searchable condition used in this paper.
More formally, the sample-based Hamiltonian simulation theorem states the following.

Theorem 7 (Sample-based Hamiltonian simulation [LMR14]). Given access to multiple copies of an $n$-qubit state $\rho$, there is an efficient quantum circuit implementing a unitary $U$ such that $\|U\psi - e^{i\rho t}\psi\| \le \epsilon$, where $0 \le \epsilon \le 1$, $t > 0$, and $\psi$ is an arbitrary pure state. The algorithm requires $O(t^2/\epsilon)$ copies of $\rho$.
Note that the full running time of the algorithm depends on the memory access model and on the time needed to prepare $\rho$.
Theorem 5, i.e. the algorithm for positive semidefinite matrices, suggests a straightforward classical analogue of sample-based Hamiltonian simulation which outputs a classical description of the evolved quantum state, under the condition that we can efficiently compute marginals of the diagonal entries of $\rho$. Note that this is the case for any density matrix with structured diagonal entries (e.g., in non-decreasing order). More precisely, we obtain an efficient classical algorithm for density matrix simulation with the following properties (the next result is a formal statement of Theorem 3):

Theorem 8 (Classical sample-based Hamiltonian simulation). Let $\epsilon, \delta \in (0,1]$, and let $\rho$ be a row-searchable $n$-qubit density matrix with at most $s$ non-zero entries in each row. Given an efficient classical description of an arbitrary input state $\psi$, then with probability $1-\delta$, Algorithm 2 returns an approximation of any chosen amplitude of a state $\bar\psi$ such that $\|\bar\psi - e^{i\rho t}\psi\| \le \epsilon$. The algorithm runs in time $O\big(\frac{st^2}{\epsilon^2}\log^2\frac{1}{\delta}\big)$ and requires $O\big(\frac{t^2}{\epsilon^2}\log^2\frac{1}{\delta}\big)$ memory.
Proof of Theorem 8. The theorem is an immediate consequence of Theorem 5 and of the fact that density matrices have unit trace. From Theorem 5 we know that it is possible to compute an approximation $\bar\psi(t)$ of $\psi(t)$ with error at most $\epsilon$ and with probability at least $1-\delta$ in $O\big(\frac{st^2\,\mathrm{Tr}(H)^2}{\epsilon^2}\log^2\frac{1}{\delta}\big)$ time and $O\big(\frac{t^2\,\mathrm{Tr}(H)^2}{\epsilon^2}\log^2\frac{1}{\delta}\big)$ memory; setting $H = \rho$ and using $\mathrm{Tr}(\rho) = 1$ yields the stated bounds.
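The diagonal-sampling step underlying the classical algorithm can be sketched as follows. This is our own illustration of a Nyström approximation for a psd $\rho$, sampling indices with probabilities proportional to the diagonal entries; the function name is ours and the actual structure of Algorithm 2 may differ.

```python
import numpy as np

def nystrom_psd(rho, M, rng=None):
    """Nystrom approximation of a psd matrix: sample M distinct indices with
    p(i) = rho_ii / Tr(rho) (the diagonal marginals used in the text), then
    return rho_hat = C W^+ C^* with C = rho[:, S] and W = rho[S, S]."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = np.real(np.diag(rho))
    p = d / d.sum()                               # diagonal marginals
    S = rng.choice(rho.shape[0], size=M, replace=False, p=p)
    C = rho[:, S]                                 # sampled columns
    W = rho[np.ix_(S, S)]                         # sampled principal submatrix
    return C @ np.linalg.pinv(W) @ C.conj().T
```

When $\rho$ has rank $r$ and the $M \ge r$ sampled columns span its range, the Nyström reconstruction is exact; in general the error decays with $M$.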
For a matrix $A \in \mathbb{C}^{n\times n}$ we use the Frobenius norm $\|A\|_F = \big(\sum_{ij}|a_{ij}|^2\big)^{1/2}$ and the spectral norm $\|A\| = \sup_{x\in\mathbb{C}^n,\, x\neq 0} \|Ax\|/\|x\|$. Note that $\|A\|_F^2 = \mathrm{Tr}(A^*A) = \mathrm{Tr}(AA^*)$. Both norms are submultiplicative and unitarily invariant, and they are related by $\|A\| \le \|A\|_F \le \sqrt{n}\,\|A\|$. The singular value decomposition of $A$ is $A = U\Sigma V^*$, where $U, V$ are unitary matrices and $U^*$ denotes the complex conjugate transpose, also called Hermitian conjugate, of $U$.