Differentiable matrix product states for simulating variational quantum computational chemistry

Quantum Computing is believed to be the ultimate solution for quantum chemistry problems. Before the advent of large-scale, fully fault-tolerant quantum computers, the variational quantum eigensolver~(VQE) is a promising heuristic quantum algorithm to solve real world quantum chemistry problems on near-term noisy quantum computers. Here we propose a highly parallelizable classical simulator for VQE based on the matrix product state representation of quantum state, which significantly extend the simulation range of the existing simulators. Our simulator seamlessly integrates the quantum circuit evolution into the classical auto-differentiation framework, thus the gradients could be computed efficiently similar to the classical deep neural network, with a scaling that is independent of the number of variational parameters. As applications, we use our simulator to study commonly used small molecules such as HF, HCl, LiH and H$_2$O, as well as larger molecules CO$_2$, BeH$_2$ and H$_4$ with up to $40$ qubits. The favorable scaling of our simulator against the number of qubits and the number of parameters could make it an ideal testing ground for near-term quantum algorithms and a perfect benchmarking baseline for oncoming large scale VQE experiments on noisy quantum computers.


Introduction
Computational quantum chemistry is an elementary tool for the material and biochemistry science.However exactly solving the electronic Schrödinger equation, namely the full configuration interaction (FCI) method, has a complexity that scales exponentially with the system sizes.As a result the FCI method is strictly limited within 20 orbitals on classi-Chu Guo: These authors contribute equally Yi Fan: These authors contribute equally Honghui Shang: shanghui.ustc@gmail.comcal computers.In the meantime, the fast development of quantum computing technologies already allows to manipulate more than 50 qubits with moderate circuit depth [1][2][3].In combination with the variational quantum eigensolver (VQE) which does not require fault-tolerant qubits in principle [4][5][6][7], it is possible to solve real quantum chemistry problems on nearterm noisy quantum computers [8,9].
As a heuristic quantum algorithm, the effectiveness of VQE could only be tested in real applications, similar to the variational tensor network states algorithm used to solve quantum many-body problems [10,11].Due to the noisy nature of the current and near-term quantum computers, it is crucial to have a perfect reference to benchmark the precision of the computations performed on those quantum computers.Classical simulators serve for this purpose.Moreover, efficient classical simulators allow researchers to explore the advantages or limitations of a given variational quantum circuit ansatz in absence of an actual quantum computer.Currently, the most commonly used classical simulator for VQE is the brute force method, which directly stores an Nqubit quantum state as a state vector of 2 N complex numbers [12,13].Due to the exponential memory requirement, with this method it is extremely difficult to simulate beyond 50 qubits even with the most powerful existing supercomputer.In fact, most classical simulations of VQE reported in literatures are limited within 30 qubits [14][15][16][17][18][19][20][21][22][23][24].
To overcome the memory bottleneck and increase the simulation efficiency, we propose an matrix product states (MPS) based simulator for VQE.MPS is known to be able to efficiently represent a class of low entanglement quantum states satisfying an area law [25].During the past twenty years, it has become a standard ansatz to represent low energy states of one-dimensional quantum many-body systems [10].The MPS based simulator has also been applied for simulating quantum circuits [26], random quantum circuit sampling [27] as well as VQE [28], and it has also been integrated into numerical packages such as qiskit [29] and quimb [30].However, in terms of simulating variational quantum circuits the existing MPS simulators lack an efficient way to compute the gradients beyond the universal parameter shift rule (PSR) [31] (the complexity of PSR grows linearly with the number of variational parameters) or large scale parallelizability.The contribution of this work is two fold: 1) we propose a strategy to integrate our MPS based simulator into the classical auto-differentiation framework, such that the complexity of computing the gradients is independent of the number of variational parameters in the circuit ansatz; 2) we propose and implement a parallelization scheme for our simulator on distributed architectures.As applications, we calculate the ground state energies of commonly used smaller molecules including HF, HCl, LiH and H 2 O, reaching comparable accuracies to existing approaches, we also study larger molecules such as CO 2 , BeH 2 and H 4 with up to 40 qubits, distinguishing our method from the commonly used state-vector approach which is mostly limited within 30 qubits.The favorable scaling of our simulator against both the number of qubits and the number of parameters could make it a promising tool to study near-term quantum algorithms.
The paper is organized as follows.In Sec. 2, the framework of variational quantum eigensolver for quantum computational chemistry is introduced.In Sec. 3 we introduce the implementation of quantum circuit simulator based on MPS and then in Sec. 4, we propose the back propagation algorithm for the MPS based simulator.In Sec. 5 we first validate our simulator by benchmarking it against other approaches in terms of both accuracy and runtime efficiency, and then we demonstrate several applications for commonly used molecules.We conclude in Sec. 6.

VQE for quantum chemistry
The variational quantum eigensolver encodes the ground state of a quantum Hamiltonian into a variational quantum circuit and then minimize the expectation value of the Hamiltonian against the variational parameters using a hybrid quantum-classical optimization procedure.It is a hardware-friendly alternative compared to the well established quantum phase estimation algorithm, mainly due to its flexibility in choosing different quantum circuit ansatz as well as its possible robustness against certain level of noises in the underlying quantum circuit [32].The variational quantum circuit ansatz is, in principle, very similar to the tensor network ansatz used in classical variational tensor network algorithms, and its expressive power ultimately determines the power of VQE for a certain Hamiltonian.Compared to classical variational ansatz, a quantum circuit with even a very moderate circuit depth d (d ∝ N with N the number of qubits) could be extremely difficult to classically simulate, such as the random quantum circuits [1][2][3].
It has also been shown that a shallow quantum circuit could efficiently generate any MPS [33] as well as some specific instances of projected entangled pair states (PEPS) [34].Therefore it is possible that a variational quantum circuit could better represent and thus allow to more efficiently compute the ground states of certain Hamiltonians than commonly used classical ansatz (PEPS is believed to be a powerful ansatz for certain two-dimensional Hamiltonians, but manipulating PEPS on classical computers could be very hard).
In recent years, quantum chemistry calculations for molecular systems using VQE has been demonstrated using numerical simulations on most of the leading quantum computer architectures [28,[35][36][37][38][39][40][41][42].The electronic Hamiltonian of a chemical system can be written in a second-quantized formulation: where p, q, r, s label the orbitals, â † p and âp denote the fermionic creation and annihilation operators for orbital p, h p q and g pq rs are one-and two-electron integrals in molecular orbital basis.In VQE, the creation and annihilation operators are first mapped to weighted Pauli strings using the Jordan-Wigner or Bravyi-Kitaev transformations, and then the energy can be obtained by summarizing the expectation values of all the Pauli strings: where the parametric wave function ansatz |ψ( ⃗ θ)⟩ is prepared using a quantum circuit containing optimizable parameters denoted as ⃗ θ, P p q and P pq rs denote the Pauli strings corresponding to the Hamiltonian terms â † p âq and â † p â † q âr âs respectively.The key ingredient of VQE is the underlying variational quantum circuit ansatz.In this work, we will use the physically motivated unitary coupled-cluster (UCC) ansatz [43] (however our approach is completely general for other ansatz), where the wave function takes the form with |Φ 0 ⟩ a reference state, typically chosen as the Hartree-Fock ground state.T is the cluster operator.When truncated to the single and double excitations (UCCSD), we have with where {i, j, k, . . .}, {a, b, c, . . .}, and {p, q, r, . . .} indicate the occupied, virtual and general orbitals respectively.Using the Trotter-Suzuki decomposition e Â+ B ≈ (e Â/k e B/k ) k [44,45], the operator exponential in the UCC ansatz in Eq.( 3) can be converted into a quantum circuit structured as successive V-shaped blocks.Truncating the Trotterization procedure to a finite order k results in a quantum circuit with depth approximately O(kN 4 ), where N is the number of qubits.

MPS based quantum circuit simulator
The correlated wave function, generated by the FCI method in quantum chemistry by considering all excitations above the Hartree-Fock reference state, can be written as: where |i 1 i 2 i 3 . . .i N ⟩ refers to a specific computation basis, and the coefficients c i1i2i3...i N constitute a rank-N tensor of 2 N complex numbers, which are the parameters to be optimized if the energy is to be minimized in a brute force way (the state-vector approach).This high-rank tensor can also be represented as a matrix product state (MPS), which is the product of a chain of rank-3 tensors written as where i n ∈ {0, 1} refers to "physical" index and α n the "virtual" index related to the partition entanglement entropy.The virtual indices α 0 and α N at the boundaries are trivial indices added for notational convenience.The largest size of the virtual indices, is referred to as the bond dimension of MPS, which characterizes the memory and computational complexity of MPS based algorithms.Eq.( 8) is able to represent any quantum state exactly if D increases exponentially.While a quantum state is said to be efficiently represented as an MPS if D is nearly a constant when N grows.
To simulate VQE a general purpose quantum circuit simulator is required.For tensor network states based simulators three ingredients are usually necessary [46]: 1) preparing an initial separate quantum state as a tensor network state; 2) performing quantum gate operations onto the initial state and updating the underlying tensor network state; 3) computing expectation values for each Hamiltonian term (namely each term in Eq.( 1)).In the following we will present the detailed implementation of these functions based on MPS (we note that some functions have already been shown in our previous work [28], which will still be included here for self-completeness).These functions have also been used to simulate the Hamiltonian evolution of quantum many-body systems [10].
The Hartree-Fock reference state is simply a separable state which can be written as Any quantum state in the form of Eq.( 10) can be efficiently written as an MPS with D = 1, by setting each site tensor of the MPS as When simulating the gate operations on MPS, it is important to keep the underlying MPS in a canonical form for accuracy and stability of the algorithm [10].We use the right canonical form of the MPS, namely each site tensor B in αn−1αn satisfies the right-canonical condition in,αn and we will perform the gate operations in a way that the right canonical form of MPS is preserved during the evolution (the initial state is naturally right canonical since it is separable), we will also save all the bipartition Schmidt numbers of the MPS similar to the time evolving block decimation (TEBD) algorithm [47].A single-qubit gate operation acting on the n-th qubit, denoted as in can be simply applied onto the MPS as The new site tensor Bi ′ n αn−1αn satisfies Eq.( 12) as long as B in αn−1αn satisfies Eq.( 12) in is unitary).For a nearest-neighbour two-qubit gate acting on two qubits n and n + 1 (the n-th bond), denoted as , we use the technique in Ref. [48] to preserve the right canonical form, which is shown as follows.First we contract the two site tensors B in αn−1αn and B in+1 αnαn+1 inin+1 to obtain a two-site tensor Then we contract αn−1αn+1 with the Schmidt numbers stored at the n − 1-th bond (denoted as λ αn−1 ) to obtain a new two-site tensor as Now we perform singular value decomposition (SVD) αn−1αn+1 and get during which we will also truncate the small Schmidt numbers below a certain threshold or simply reserve the D largest Schmidt numbers if the number of Schmidt numbers larger than the threshold is still larger than D. Now the new site tensors Bi ′ n αn−1αn and Bi ′ n+1 αnαn+1 can be obtained as αn−1αn+1 up to truncation errors.In addition, Bi ′ n+1 αnαn+1 satisfies Eq.( 12) by construction, and Bi ′ n αn−1αn also satisfies Eq.( 12) by verifying that The new Schmidt numbers λαn is used to replace the old λ αn at the n-th bond.The application of a single-qubit and a nearest neighbour two-qubit gate are also shown in Fig. 1(a,b).In practice, one can absorb the singlequbit gate operations into two-qubit gate operations using gate fusion, which could further increase the simulation efficiency.A non-nearest-neighbour twoqubit gate could be realized by decomposing it into a series of nearest-neighbour two-qubits gate using SWAP gate.
Last, given a single Pauli string denoted as where each O i ∈ {X, Y, Z, I 2 } is a single-qubit observable (X, Y , Z are the Pauli matrices and I 2 is the 2 × 2 identity matrix), the expectation value of P on an MPS can be computed by where we have used x j:i = {x i , x i+1 , . . ., x j } as an abbreviation for a list of indices.Eq.( 20) amounts to contracting a one-dimensional tensor network with a computational complexity O(N D 3 ), which is also shown in Fig. 1(c).

Back propagation algorithm for MPS
VQE is a minimization problem with loss function in Eq. (2).When the loss function contains a large number of tunable parameters, one generally needs to use a gradient-based optimization algorithm to accelerate convergence.The standard way to compute the gradient of Eq.( 2) exactly (on quantum computers) is to use the parameter shift rule, namely where ⃗ θ ± j means to shift the j-th parameter of ⃗ θ by ± π 2 .Clearly, to compute all the gradients using Eq.( 21), one would have to evaluate the loss function in Eq.( 2) for 2M times if ⃗ θ contains a total of M parameters.
Although Eq.( 21) seems to be the most efficient way to compute the exact gradients on quantum computers, there are shortcuts to compute gradients of Eq.( 2) on a classical computer by using automatic differentiation.In fact, a central reason for the great success of deep learning is the ability to efficiently compute the gradient of a deep neural network using the reversemode automatic differentiation.For a deep neural network, the complexity of evaluating the gradients of a loss function against all the parameters using (the reverse-mode) automatic differentiation (back propagation), is roughly equal to that of evaluating the loss function itself (forward propagation) [49].Briefly speaking, automatic differentiation works by treating the loss function as the composition of many "elementary" functions, then by defining an adjoint function for each of the elementary function and memorizing all the intermediate outputs generated by those elementary functions, one could systematically compute the gradients of the loss function by evaluating the adjoint functions in a reverse order [50].In the context of simulating VQE, if we treat each gate operation as an elementary function and memorize all the intermediate outputs during evaluating Eq.( 2), then the number of copies of the quantum state would scale linearly with the number of gate operations, which is certainly not impractical.Fortunately, by making use of the fact that the quantum circuit is a product of many small unitary gate operations which are reversible, one could use a memory efficient automatic differentiation scheme where only two copies of the quantum state are required at least.
In the next we will denote the variational quantum circuit ansatz |ψ( ⃗ θ)⟩ as Without loss of generality, we denoted the parametric quantum circuit U ( ⃗ θ) as where each parametric quantum gate U (θ j ) contains a single parameter θ j .Then the derivative of Eq.( 2) against one of the parameter θ j can be computed as [31] ∂E( ⃗ θ) where we have used dθj .Now we further define where we note that |ψ j ⟩ is not a proper quantum state in general since Ĥ may not be unitary.With Eqs.(25,26), Eq.( 24) can be rewritten as To this end we note the the gradients of the loss function in Eq.( 2) can be directly evaluated on a classical computer using Eq.( 27).Here we show how to embed the evaluation of Eq.( 27) into the automatic differentiation framework such that one could easily generalize the loss function in Eq.( 2), for example, to allow classical controls between quantum gate operations, or to allow classical functions which accept the expectation values of each Pauli strings as inputs (the latter would be very helpful for us as will be shown later).To enable automatic differentiation, we treat the loss function in Eq.( 2) as a composed function of two elementary functions: 1) the quantum circuit evolution in Eq.( 22) (we treat the whole circuit evolution instead of each gate operation as an elementary function) and 2) computing the expectation value on the final state |ψ( ⃗ θ)⟩, namely The adjoint function of Eq.( 28), denoted as Ẽ, can be defined using the rules in Ref. [50] for a function with complex numbers as input and real output: where z is a scalar passed from the possible outer-level adjoint function.The adjoint function of Eq.( 22) can be defined as where ⟨ϕ| is the gradient passed from Eq.( 29), and the gradient against |Ψ 0 ⟩ is neglected since it is a constant (The back propagation rules for complex functions, which are used to derive Eqs.(29,30), can be found in Appendix.A).Substituting ⟨ϕ| = Ẽ(z) into Eq.(30), we can see that each element of f (⟨ϕ|) is exactly Eq.( 27) up to a constant scalar Re(z) (z will be chosen to be 1 if there are no outer-level functions on top of )). Eq.(30) (or equivalently Eq.( 27)) can be evaluated in a memory efficient way with no additional copies of the quantum state on top of |ϕ⟩ and Algorithm 1 Memory-efficient way to evaluate Eq.( 30), with two input states |ϕ⟩ and |Ψ M ⟩. |Ψ M ⟩, therefore ideally the complexity of this algorithm is approximately two times the complexity of the forward propagation if we neglect the complexity of computing the "expectation value" ⟨ϕ| U (θ j )|Ψ⟩ (which is reasonable if M is much less than the total number of gate operations).While this is true for the state-vector based simulators (the brute-force approach, for which the memory-efficient back propagation has been implemented in packages such as Yao.jl [51]), for our MPS based simulator it is more complicated due to the "state" |ψ j ⟩ in Eq.( 26).Concretely, taking j = M , we get |ψ M ⟩ = Ĥ|ψ( ⃗ θ)⟩, which requires to apply the Hamiltonian Ĥ onto the final quantum state |ψ( ⃗ θ)⟩.Since Ĥ is generally the summation of a large number of Pauli strings, this operation will involve the addition of many states resulting from the application of each Pauli string.While for state-vector based simulator this addition can be implemented with no additional memory overhead (and very little computational overhead if the number of terms in the Hamiltonian, denoted as N h , is much smaller than the number of gate operations, denoted as N g ), for MPS based simulator the addition of different MPSs will generally result in an MPS with much larger bond dimension (If one add k MPSs with bond dimension D each, the resulting MPS would have a bond dimension kD in the worst case if the addition is done very accurately [10]).Therefore in MPS based simulator the gate operations on |ψ j ⟩ will either have a larger computational complexity than the gate operations on |Ψ j ⟩, or have a lower accuracy.To balance between the computational complexity and the accuracy, we split Ĥ into m different groups labeled by Ĥj with 1 ≤ j ≤ m, denoted as Ĥ = m j=1 Ĥj , where each group contains N h /m Pauli strings.Then we compute the gradients for each Ĥj using the memory efficient back propagation algorithm with a fixed bond dimension D, denoted as g j , and then the final gradients g is the summation of all g j , namely We stress that the above equation always holds for loss function in the form of Eq.( 28), independent of how we split Ĥ, since ⟩ holds in general, even if Ĥj does not commute with each other.However, taking into consideration that we perform MPS truncation during the back propagation, different ways of splitting Ĥ, such as the choice of the total number of groups m and the choice of the Pauli strings in each group, may affect the accuracy of the final gradient.In the following we will refer to the MPS simulator powered by the memory efficient back propagation algorithm as the differentiable MPS simulator.
Now we analyze the computational complexity of different approaches to compute the gradients based on MPS in detail.We denote the classical complexity of simulating each gate operation as C g (C g = O(D 3 ) for a nearest neighbour two-qubit gate), the complexity of evaluating the expectation value of a single Pauli string as C h (C h = O(N D 3 ) for a general Pauli string).
Then the complexity of the forward propagation, denoted as C, is C = N g C g + N h C h .The parameter shift rule for computing the gradients has a complexity Assuming that we divide Ĥ into m groups, then the complexity of the memory efficient back propagation algorithm for each group is 2N g C g + N h m C h , and the whole complexity scales as where the coefficient of the first term on the right hand side of Eq.( 33) is m + 1 instead of 2m since the states |Ψ j ⟩ in Eq.( 25) are the same for each group.From Eqs. (32,33) we can clearly see that our back propagation algorithm is more efficient than PSR if m ≪ M .The overhead occurred in the back propagation algorithm by dividing Ĥ into groups could be overcome by parallelization, since the gradients g j for each Ĥj can be computed almost perfectly in parallel (if one does not share the states |Ψ j ⟩ among different processes).The parallelization strategy to compute the gradients for our differentiable MPS simulator is demonstrated in Fig. 2.
To this end, we note that in our approach we have treated the MPS as a "global" state similar to the brute force approach.However, when computing the gradient one could also treat MPS "locally", which means that one compute the gradient of Eq.( 2) with respect to each site tensor of the MPS instead of the global state [52].The possible advantage of this approach is that one could in principle find the exact gradient with respect to those site tensors and then optimize each of them within a fixed bond dimension D, while in our approach the gradient would no longer be accurate if the intermediate states during the back propagation could no longer be written as an MPS with bond dimension D. However, in the local approach one has to explicitly perform the back propagation of all the functions involved during the gate operations and computing the expectation values, which contains functions such as SVD whose back propagation could be very unstable.In addition, one could no longer use the memory efficient algorithm and all the intermediate calculations need to be saved in principle, which makes the local approach impractical.In the following we first validate our differentiable MPS simulator by benchmarking it against PSR.Concretely we consider three molecules LiH, H 2 O and O 2 , and study the accuracy and the runtime efficiency of our differentiable MPS compared to the PSR.The results are shown in Fig. 3.In Fig. 3(a) we study the mean relative error R between the gradients computed using differentiable MPS, denoted as g BP , and the gradients computed using Eq.( 21), denoted as g PSR , which is defined as

Results
with ∥x∥ the Euclidean norm of the vector x.We can see that the gradients calculated by these two methods converge to each other as D grows, as expected (we note that both g PSR and g BP are not exact if D is not large enough, however the g PSR is expected to be more precise than g BP , since in computing g BP one needs in addition to apply the summation of N h /m Pauli strings onto the quantum state and we will generally use m < N h ).In Fig. 3(b), we show the speed up of our differentiable MPS compared to PSR, evaluated by t PSR /t BP , where t PSR/BP are the runtimes for computing the gradients of a single optimization step, and t PSR is estimated by the runtime of the forward propagation times 2M .We can see that the speed up scales approximately linearly as M increases.
Here we note that in the variational quantum circuit ansatz used in VQE, there could often be several parametric quantum gates (more concretely the R z gate) which share the same parameter, and thus the actual number of parameters will be less than the number of parametric quantum gates.Nevertheless, in this work when we mention the number of parameters, we always mean the total number of parametric quantum gates, since 1) those parameters can be optimized independently in principle and 2) the complexities of both our differentiable MPS and the PSR are only determined by the latter.Now we study the effect of m on the accuracy of the obtained gradient.As discussed in Sec. 4, for a given Hamiltonian, we expect higher accuracy and computational cost for larger m and vice versa.This effect is shown in Fig. 4. We can clearly see that the mean relative error R goes up when increasing the number of Pauli strings in each group (we note that the gradient calculated by PSR may also not be exact, but is expected to be more accurate than that calculated by our back propagation algorithm if D is not large enough), and stabilizes to around 10 −3 for N h /m ≥ 8 (m ≤ 48). the N2 molecule.The green, red and cyan lines in both panels stand for the BOBYQA, BFGS and gradient descent (GD) algorithms respectively.The y axis show the absolute error between the energies computed using these methods and the FCI energy.In these simulations we have used N h /m = 4.The typical hyperparameters for these optimization algorithms are set as follows (which are used throughout this work).For gradient-free optimizer (BOBYQA) the convergence criterion is that the difference between two successive minimization steps (epochs) is smaller than 10 −6 .For gradient-based optimizers (GD and BFGS), we use previous criterion plus an additional criterion that the norm of the gradient is less than 10 −5 , and the program stops if any of the two criteria is met.The maximum number of iterations is set to be 100 for gradient-based optimizers and 200 for BOBYQA.
In VQE, there could be various reasons if the algorithm ends in a sub-optimal solution, such as 1) the poor expressiveness of the quantum circuit ansatz, 2) the inaccuracy computing the gradients, and 3) the likelihood of being trapped in an excited state (for example the barren plateaus [53]).A usual approach to overcome the last pitfall is to start from a knowledged initial state (such as the Hartree-Fock reference state used here), while a (numerically) exact gradient could help us to eliminate the second one.Based on the exact gradient, it is also possible to use a very high-level optimization solver, which could be very helpful for one to examine the expressiveness of the underlying quantum circuit ansatz.In Fig. 5, we study the convergence of VQE for the H 2 O and N 2 molecules using different optimization solvers, including the gradient-free solver BOBYQA, the gradient descent algorithm and the high-level BFGS algorithm [54,55].We can see that for both molecules the gradient-based solvers converge much faster than BOBYQA, in particular the BFGS algorithm could converge to the lowest energy within less than 50 epochs.Moreover, the gradient-free solver BOBYQA could still have high fluctuations beyond 100 epochs when the energy is quite close to the exact (FCI) energy.In these simulations as well as in the later simulations we have always used a bond dimension D = 128 and an SVD truncation threshold ϵ = 10 −6 .For reference, the detailed information of the molecules and the corresponding variational ansatz used in Fig. 5 as well as in our later simulations are listed in TABLE. 1.
To demonstrate the powerfulness of our differentiable MPS simulator for large molecules, we compute the potential energy curves for the HF, HCl, LiH, H 2 O molecules.Symmetry-reduced UCCSD [24] is used as the variational ansatz, leading to 100∼432 Rz gates as shown in TABLE. 1 for the testing molecules.The results are compared with the FCI energies as shown in Fig. 6.We can see that for all the molecules with bond length ranging from 0.5 Å to 2.0 Å, chemical accuracy (error within 1.6 × 10 −3 Hatree or 1 kcal/mol) is reached.Although all the optimization algorithms give accurate energies, the gradient-based Adam and BFGS algorithms can achieve smaller absolute errors than the gradient-free BOBYQA, especially for HF and HCl.Generally, BFGS outperforms other method and achieves the best accuracy.
To this end we note that till now we have only studied relatively small molecules with no more than 20 qubits (TABLE.1).However, we stress that the major limitations of our method is the entanglement entropy within the quantum state (which determines the minimal bond dimension require to maintain certain precision), instead of the number of qubits, the circuit depth as well as the number of parameters of the underlying quantum circuit ansatz.The largest VQE simulation using the state-vector based simulators till now have computed the ground state energy of the C 2 H 4 molecule with 28 qubits [24].In the following we go a step forward by computing the energies of the CO, O 2 and CO 2 molecules involved in the chemical reaction: and the results are shown in TABLE .2. Previous studies have calculated this reaction using a state vector simulator by incorporating the frozen core approximation and qubit tapering [56].Here we performed the full-size simulation of this 30-qubit system.For demonstration purpose, we selected operators using the point group symmetry [24] and energy-sorting screening [57].The obtained energy barrier is −21.0 kcal/mol.Compared with FCI (−25.5 kcal/mol), there is an error of 4.5 kcal/mol, mainly caused by the incompleteness of the screened UCCSD operator pool (especially for CO 2 ).More accurate results can be obtained if more robust ansatz such as UCC with generalized excitations (UCCGSD) [58] or ADAPT-VQE [59] are used.
To further demonstrate the powerfulness of our method, we study the BeH 2 molecule in the 6-31G* basis set and H 4 molecule in the cc-pVDZ basis set, with 36 and 40 qubits respectively.Geometry optimization is first performed using PySCF [60], and the equilibrium geometries are used to perform VQE calculations.The convergence of these two simulations against the minimization step is shown in Fig. 7.In Table 1: Information of the molecules used in this work.The columns "Qubits", "Parameters", "Pauli strings", "Basis set" list the number of qubits, the number of parametric gates (after being reduced by symmetry), the number of Pauli strings, and the basis set we used for the molecule.Bond length ( Å) Figure 6: The potential energy curve for four molecules HF (a1), HCl (b1), LiH (c1) and H2O (d1) using the STO-3G basis set.In (a1,b1,c1,d1) the green squares represent the results obtained by the gradient-free BOBYQA optimizer, the red circles are the results obtained by the BFGS optimizer, the blue × are the results obtained by the ADAM optimizer with an initial learning rate 0.01, and the black solid lines represent the FCI energies for comparison.(a2,b2,c2,d2) The absolute errors between the results obtained by the three optimizers BOBYQA, BFGS and ADAM compared to the corresponding FCI energies for HF (a2), HCl (b2), LiH (c2) and H2O (d2).In these simulations we have used N h /m = 2.   trapped in local minima.The relatively stable convergence indicates that the gradients calculated with our differentiable MPS should be fairly accurate, while exploring the convergence of large molecules towards chemical accuracy using more involved variational circuit ansatz is beyond the scope of this work.

Conclusions
In this work, we have proposed the differentiable matrix product states for simulating variational quantum algorithms.The differentiable MPS simulator seamlessly integrates the MPS based quantum circuit simulator with the automatic differentiation framework, by using a memory-efficient back propagation algorithm where only two copies of quantum states have to be stored at least.Compared to other approaches for simulating variational quantum circuits, our differentiable MPS is less sensitive to the number of qubits, and its complexity is independent of the number of variable parameters.The accuracy as well as the efficiency of our method are demonstrated in real chemical systems, which shows that our differentiable MPS simulator could be a promising testing ground for variational quantum algorithms on nearterm quantum computers.
In the end, we stress that as a general feature of MPS based algorithms, the bond dimension of the MPS ansatz could significantly affect the accuracy of the simulation, depending on the entanglement generated in the underlying quantum state.For our differentiable MPS simulator which aims to simulate variational quantum circuits, the calculated gradients also subject to this constrain.Therefore a good prac-tice in applying the differentiable MPS simulator is to check the convergence of it against different bond dimensions.In future investigations, other types of tensor networks, such as the projected entangled pair states [61] and the tree tensor network [62], could also be explored to simulate variational quantum circuits.

Acknowledgments
Our code for the differentiable MPS algorithm is purely written in the Julia language [63], which works on any CPU that supports Julia.The parallelization is done using the MPI.jl package [64].The serial version of our code is open sourced [65], which supports the splitting of the Hamiltonian buts runs in serial for different groups.The optimization algorithms used in this work are from the NLopt.jlpackage [66].

A Back propagation rule for loss function with complex numbers
Here we briefly summarize the back propagation rules defined in Ref. [50] for loss functions involving complex numbers (the output is still real).We assume that the loss function can be written as F (z, z * ) with z = x + iy and z * = x − iy.For simplicity we assume x, y to be real numbers, but the results can be easily generalized to arrays of scalars.F (z, z * ) can also be equivalently viewed as a function of real numbers as F (z, z * ) = F (x, y), and we have the relations If we treat the loss function as a function of real numbers, it is well-known that the gradient of F should be (− ∂ where λ is a small positive number.If we treat the loss function as a function of complex numbers, we can verify that if we update z as z − 2λ ∂F ∂z * , then we which means that we can simply take −2 ∂F ∂z * as the gradient of F (z, z * ) against z when minimizing F (z, z * ).Now we assume that F (z, z * ) can be written as a composite function as It is proven in Ref. [50] that if we define the back propagation rule for each elementary function as then the gradient of F (z, z * ) against z can be evaluated as which is exactly in parallel with the case of real loss functions.For nonscalar complex function g, one can simply generalize Eq. (41) as In case the input of g is real, Eq.( 43) reduces to In case the output of g is real, Eq.( 43) reduces to Eq.( 44) and Eq.( 45) are the back propagation rules used to derive Eq.( 30) and Eq.( 29) in the main text respectively.

Figure 1 :
Figure 1: (a) Single-qubit gate operation on an MPS.(b) Nearest-neighbour two-qubit gate operation on an MPS, which preserves the right-canonical form of the MPS.(c) Computing the expectation value of a general Pauli string on an MPS.In all these panels, the circles represent the MPS site tensors, the diamonds represent the bipartition singular vectors, and the squares represent the single-qubit and twoqubit operators.In panel (a) and (b), we have used different colors for the same objects to stress that they have been updated during the gate operations.

Figure 3 :
Figure 3: (a) The mean relative error R, defined in Eq.(34), as a function of the bond dimension D for the LiH molecule (yellow dashed line with circle), the H2O molecule (blue dashed line with square) and the O2 molecule (cyan dashed line with ×), where the number of parameters used for these three molecules are M = 416, 432, 504 respectively.(b) The speed up of the gradient calculation using differentiable MPS against using the parameter shift rule for the LiH, H2O and O2 molecules with a fixed D = 128.In these simulations we have used N h /m = 8.Each point is calculated by averaging over 10 independent evaluations with randomly initialized variational parameters within range [−π, π].

Figure 4 :
Figure 4: The mean relative error R against the number of Pauli strings in each group, namely N h /m.We have used the H2O molecule in the STO-3G basis set (14 qubits) for this calculation, for which N h = 384.

2 Figure 5 :
Figure 5: Convergence of different optimization algorithms towards the ground state for (a) the H2O molecule and (b)the N2 molecule.The green, red and cyan lines in both panels stand for the BOBYQA, BFGS and gradient descent (GD) algorithms respectively.The y axis show the absolute error between the energies computed using these methods and the FCI energy.In these simulations we have used N h /m = 4.The typical hyperparameters for these optimization algorithms are set as follows (which are used throughout this work).For gradient-free optimizer (BOBYQA) the convergence criterion is that the difference between two successive minimization steps (epochs) is smaller than 10 −6 .For gradient-based optimizers (GD and BFGS), we use previous criterion plus an additional criterion that the norm of the gradient is less than 10 −5 , and the program stops if any of the two criteria is met.The maximum number of iterations is set to be 100 for gradient-based optimizers and 200 for BOBYQA.

Figure 7 :
Figure 7: Convergence of the energy during VQE for the BeH2 molecule in the 6-31G* basis set and the H4 molecule in the cc-pVDZ basis set, where we have used the BFGS optimizer.The y-axis shows the absolute error compared to the FCI energy.The detailed information for these simulations are listed in Table.3.
Figure 7: Convergence of the energy during VQE for the BeH2 molecule in the 6-31G* basis set and the H4 molecule in the cc-pVDZ basis set, where we have used the BFGS optimizer.The y-axis shows the absolute error compared to the FCI energy.The detailed information for these simulations are listed in Table.3.
This work was supported by National Natural Science Foundation of China under Grant No. T2222026 and Grant No. 22003073.C. G. is supported by the Open Research Fund from State Key Laboratory of High Performance Computing of China (Grant No. 202201-00).

Table 2 :
(35)nd state energies for the molecules O2, CO, CO2 involved in the chemical reaction in Eq.(35).Units are in Hartree.We have used N h /m = 8 in this simulation.(the runtime per epoch for BeH 2 is the largest since its variational circuit ansatz contains the largest number of gate operations).We can see that in both cases the energy quickly converges to a steady value that is about 10 −2 Hatree higher than the corresponding FCI energy, that is, one order of magnitude larger the chemical accuracy.Similar to the case of CO 2 , the reason is likely due to the limited expressiveness of the wave function ansatz we have used.Since we have repeated each simulation for a large number of times with random initialization of the variational parameters and picked the one with the lowest final energy, we believe that these simulations are less likely

Table 3 :
Information of our simulations for the three molecules CO2 (STO-3G), BeH2 (6-31G*) and H4 (cc-pVDZ) with equal to or more than 30 qubits.We have used N h /m = 4 and 70 processes for all these simulations.Each process uses 8 threads on an Intel x86 CPU with 2.7 GHz frequency per thread.The column "Memory" shows the largest memory usage per process and the column "Time" shows the runtime per epoch averaged over 5 epochs.