An efficient quantum algorithm for the time evolution of parameterized circuits

We introduce a novel hybrid algorithm to simulate the real-time evolution of quantum systems using parameterized quantum circuits. The method, named"projected - Variational Quantum Dynamics"(p-VQD) realizes an iterative, global projection of the exact time evolution onto the parameterized manifold. In the small time-step limit, this is equivalent to the McLachlan's variational principle. Our approach is efficient in the sense that it exhibits an optimal linear scaling with the total number of variational parameters. Furthermore, it is global in the sense that it uses the variational principle to optimize all parameters at once. The global nature of our approach then significantly extends the scope of existing efficient variational methods, that instead typically rely on the iterative optimization of a restricted subset of variational parameters. Through numerical experiments, we also show that our approach is particularly advantageous over existing global optimization algorithms based on the time-dependent variational principle that, due to a demanding quadratic scaling with parameter numbers, are unsuitable for large parameterized quantum circuits.


Introduction
In recent years, our ability to manipulate and measure quantum systems has improved tremendously. Among all experimental platforms, the number of addressable qubits has increased remarkably: both Google [1] and IBM [2] have reported superconducting circuits chips with ≈ 50 qubits and according to their public timelines they expect to build 1000-qubit devices within the next 2 years. Despite this impressive development, universal quantum-computing remains still far in the future. Algorithms such as Shor's [3], Quantum-Fourier Transform [4] or general Quantum Simulators [5,6,7] require a number of operations (gates) at least polynomial in the qubit number. However, in the absence of large-scale quantum error correction, the number of gates that can be applied is at present strongly limited by hardware noise and decoherence.
To circumvent the problem, current generations of quantum algorithms rely on a hybrid classicalquantum approach [8,9,10,11,12,13]. A typical example is given by the classical optimisation of a quantum circuit encoding the solution of a given problem. Quantum circuits can be used as the model of a machine-learning problem [14,15], such as a quantum classifiers [16,17], quantum kernel machines [18,19,20], quantum Boltzmann machines [21] or convolutional networks [22] to name a few. Quantum circuits can also be used to directly approximate the state of an interacting quantum system [23,24,25]. In particular, several of the proposed hybrid algorithms extend the use of the variational method to quantum computing: a trial quantum state (ansatz state) with a tractable number of parameters and mild circuit depth is considered, then these parameters are optimized in order to approximate a target state as accurately as possible.
For quantum simulation, several variational, resource friendly alternatives to Trotterization [26,27,28,29] have been proposed [9,30,31,32,33]. Among those, an interesting approach introduced in 2017 [9] is the The Time-Dependent Variational Algorithm (TDVA). This method, based on a reformulation of the Dirac-Frenkel and McLachlan variational principle [34,35,36], encodes the state into a variational circuit and iteratively updates the parameters by solving a Euler-Lagrange equation of motion. The approach is conceptually close to the study of quantum dynamics using classically parameterized many-body wave functions [37,38,39,40]. At variance with the classical many-body counterparts, the computational cost of the TDVA approach is dominated by the stochastic estimation of the Quantum Geometric Tensor [41,42] and its inversion, resulting in an expensive quadratic cost in the total number of variational parameters. To alleviate this issue, new variational methods based on partial, local optimisations of the variational parameters have been recently proposed [43,44,45].
In this paper we propose a hybrid quantum algorithm for approximating the real time evolution of an interacting quantum system, called projected-Variational Quantum Dynamics (p-VQD). This algorithm is both global -it optimizes all parameters at once -and efficient -it scales linearly with the number of parameters. Moreover, it does not require auxiliary (ancilla) qubits and the depth of the circuit is constant throughout all the simulation. The structure of this paper is as follows: in Section 2 we present the algorithm in order to simulate real time evolution of quantum systems, while in Section 3 we benchmark our algorithm on a Transverse Field Ising model, assessing the accuracy of the method on an ideal quantum simulator performing single and multi shot executions. We conclude, in Section 4, with some considerations on the proposed algorithm.

Method
We address the simulation of a quantum system with HamiltonianĤ. For clarity of exposition we will assume thatĤ is time independent, but it is not a requirement of the algorithm. The time evolution operator for a small time-step δt ∈ R is e −iĤδt . Let |ψ w(t) be the parameterized ansatz state approximating the exact quantum state of the system |Ψ(t) at time t, where w(t) ∈ R p is the vector of its p−parameters. We assume |ψ w(t) is a unit vector and define the evolved state |φ(t + δt) = e −iĤδt |ψ w(t) . From now on, we will indicate w(t) as w to simplify the notation, implying that the parameters we are referring to are the one assigned to the ansatz at time t, except when explicitly indicated. Also, we will shorten |φ(t + δt) to |φ(δt) .
In order to variationally approximate the time evolution of the system, we aim to maximize the overlap between the evolved parameterized state |φ(δt) and the state |ψ w+dw , where dw ∈ R p is a vector of parameter variations. We want to find dw such that arg max This intermediate overlap optimization is routinely used in classical simulations of the dynamics of quantum systems, and its application to quantum circuits has been recently suggested in the context of quantum tensor networks [46,45] and symmetry-preserving variational wavefunction [47]. Similarly to [48], we derive the condition indicated in Eq. (1) defining the projected real-time evolution, as can be seen in Appendix A. The optimal update dw therefore minimizes the step-infidelity where the 1 δt 2 factor has been added to make L independent from the time step size in the limit of δt → 0. More details about the introduction of this factor can be found in Appendix B. Given that the wave-function is encoded as the circuit C(w), substituting into the loss function in Eq. (2) the definition |ψ w = C(w)|0 we obtain where the second term on the right-hand side is an expectation value which can be sampled on a quantum computer. Another method that can be considered to measure the overlap between |ψ w+dw and |φ(δt) is the SWAP test [49,50], although it requires ancillary qubits and long range gates. For these reasons, we will not consider this method in the following discussion. The time evolution operator e −iĤδt is encoded in the form of a Trotter-Suzuki decomposition [26,27].
The optimal dw are determined by iteratively descending along the steepest direction given by the gradient ∂ ∂dwi L(dw, δt), starting from an initial guess dw 0 . In general, the gradient of a quantum expectation value involving parameterized quantum circuits can be determined using finite differences [51] or simultaneous perturbation techniques [52]. However, those techniques rely on an approximation of the real gradient. To improve the accuracy, we consider circuits of the general form where the gates V k do not depend on any parameter and the parameterized gates U j (w j ) are of the form with G 2 j = I. We remark that in general G j can be any tensor product of Pauli operators. In this case, the gradient can be computed exactly in a hardwarefriendly way using the parameter shift rule [51,53,54,55,56,57], obtaining where e i is the versor in the i-th direction and s ∈ R. We remark that our p-VQD algorithm does not require an ancilla qubit to perform measurements.
The optimisation of dw is then performed according to a standard gradient descent scheme with learning rate η ∈ R + . A single optimisation step requires O(p) measurements. The optimisation Compute the gradient on a quantum computer Final dw * and next time-step Update dw n+1 = dw ndw L �

Converged? No
Yes C † (w) e iHδt C(w+dw) Figure 1: Sketch of the p-VQD algorithm. We follow the real time evolution of the ansatz state ψw(t) in the Hilbert space by optimizing the parameter variation dw at every time step. The optimisation is performed through the gradient of the step-infidelity function L(dw, δt), computed using a quantum computer.
continues until the loss function goes below the desired threshold ν. Finally, once dw is determined, the parameters at time t + δt are obtained by We highlight that the circuit width and depth are the same from the beginning to the end of the simulation. A sketch of the algorithm is shown in Fig 1, while an accurate study of the total hardware cost can be found in Section 3.

Relationship with other methods
In order to understand the connection of our approach with existing methods proposed in the quantum computing setting, it is conceptually interesting to explicitly take the limit of a vanishing time step. In this limit, the parameter updates obtained with p-VQD is dw =ẇδt andẇ is the solution of the equation where G kj is the Quantum Geometric Tensor (QGT) (10) This expression for dw coincides with that given by the McLachlan's variational principle [36,30] (considering also a time-dependent global phase on the ansatz state). Both McLachlan's and the time-dependent variational principle suffice to simulate real time dynamics of closed systems. Moreover, it is possible to prove that the two approaches are equivalent under certain assumptions [30]. However, on the practical point of view, there are some differences between them due to numerical instabilities that we will analyse further in Sec. 3. Considering the expression for the QGT given in Eq. (10), the TDVA relies on the evaluation of its imaginary part, which is skew-symmetric, making its inversion unstable when the off-diagonal elements are close to 0. The McLachlan's variational principle, on the contrary, requires the real part, as can be seen in Eq. (9). For this reason, it is suggested that McLachlan's is the most consistent variational principle for quantum simulation [30]. A detailed derivation of Eq. (9) can be found in Appendix C.
Another method conceptually similar to pVQD is the Restarted Quantum Dynamics (RQD), independently proposed in a recent preprint [47]. This technique also solves an optimisation problem to obtain the timeevolved state. However, the RQD is based on the minimisation of the infidelity squared by varying the parameters of the ansatz state, which can lead to different optimisation dynamics.

Barren plateaus
It has recently been shown that the optimisation of shallow circuits can also be affected by so-called barren plateaus if the cost function is global [58]. This covers the case of operators in the form of that we use to estimate the cost function in Eq. 2 belongs to this class, meaning that our cost function is global.
Barren plateaus are usually recognized by measuring the variance of the components of the gradient [58,59,60,61,62]. However, it is possible to show that both the fidelity between two states and the variance of its gradient have a non-vanishing lower bound (in the limit of a large number of qubits) if the two states differ by an infinitesimal transformation [60]. Therefore, p-VQD avoids such plateaus entirely because at every time step we optimise the infidelity between the current state ψ w(t) and the infinitesimally evolved state U (δt) ψ w(t) .
A more general approach to avoid barren plateaus consists in replacing the global cost function by one that is local but has the same minimum [58]. The global operator O G (in our case, encoding the infidelity) can be replaced by its local counterpart where Ij is the identity operator on all qubits except for j . Since O L w = 0 ⇔ O G w = 0, in the limit of small δt the generated dynamics will also be equivalent to the McLachlan's variational principle. In Sec. 3 we discuss some numerical results comparing the two approaches.

Results
To demonstrate a practical application of the p-VQD algorithm, we consider the Transverse Field Ising Model on an open chain, The first term accounts for interaction between spins while the latter represents a local and uniform magnetic field along the transverse direction x. For our simulations, we considered J = 1 4 , h = 1 and, except when explicitly indicated, N = 3 spins. We compare the time-evolution obtained through the p-VQD against the more-estabilished TDV-Algorithm. We analyse the ideal case of a simulation in which we have access to the state vector produced by the quantum circuit (state-vector simulation) and the case in which to gain information about the quantum state we have to repeatedly measure the qubits (multi-shot simulation). The two simulations coincide in the limit of infinite samples. However, when the number of samples is finite, statistical fluctuations produce a noise on the results which we will refer to as shot noise. For both state-vector and multi-shot simulations we use IBM's open-source library for quantum computing, Qiskit [63].
We consider a circuit ansatz of the form where R (i) α (w i,l ) = e −iw i,l σ α i is a single qubit rotation around the α = {x, y} axis. Every block c l (w l ) has a layer of single qubits rotations followed by a layer of entangling two-qubits gates. The total number of blocks, or depth, is d. When α = x , a block c l is equivalent to the Trotter-Suzuki approximation of the unitary operator e −iĤδt . A more general parametrization is obtained by alternating rotations around the x and y axis (α = x iff l is odd, α = y otherwise). The representation power and the number of variational parameters can be increased by making the ansatz deeper. For the chosen system, the two ansätze perform similarly when shot noise is neglected. However, alternating blocks of x and y rotations proved to be more stable in presence of shot noise. In the following we will be studying the latter unless otherwise noted.
As a first comparison, we consider the integrated infidelity ∆ F (T ) achieved by both algorithms with respect to the exact simulation of the system over an entire time evolution from t = 0 to t = T . This quantity can be expressed as We have performed several simulations both with p-VQD and the TDVA for different shots per circuit evaluation (total number of samples) and report the mean performance in Fig. 2.
At a fixed number of samples, the integrated infidelity ∆ F for the p-VQD is up to an order of magnitude below the TDVA one for the same number of time step. In Appendix C it is shown that with small time steps and in the limit of infinite samples (ideal measures) the parameter variation found by p-VQD coincide with the solution of the Euler-Lagrange equation generated by the McLachlan's variational principle. The matrices of the Euler-Lagrange equations often show high condition number [64], in particular those generated by the time-dependent variational principle [30]. These ill-conditioned matrices are likely to produce large variations on the results when subject to small variations of the coefficients, as those produced by shot noise. We remark that shot noise is unavoidable for quantum computers, even for faulttolerant devices. Improving iteratively the solution of the Euler-Lagrange equation leads to the estimate of the parameters' variation, but does not require matrix inversion. As a result, we obtain a more stable algorithm against shot noise.
In Fig. 3 we compare the expectation values of the total magnetization along thex andẑ axis for the states evolved according to the two algorithms. The results are reported for a different number of measurements per circuit.
The magnetization along thex axis proved to be the most difficult to capture for both the algorithms, suggesting that the problem could be the ansatz choice. We note that both the algorithms converge to the exact simulation values as the number of shots increases. In general, p-VQD shows comparable results with TDVA using one order of magnitude fewer shots.
To further analyse the performance of our algorithm, we studied the behaviour of the step-infidelity L(dw, δt). In Fig. 4 we show the number of optimisation steps required as a function of time and the consequent decrease of the cost function.
We remark that choosing the previous dw as an initial guess often leads to convergence in only one iteration. More details about the convergence of the step-infidelity in a single time step can be found in Appendix D.
We characterise the hardware cost of p-VQD when the number of variational parameters increases. Since the number of measures required by p-VQD scales linearly (O(p)) with the number of variational parameters, it has a lower asymptotic cost with respect to TDVA, which is quadratic. We can provide an accurate estimation of the hardware cost of p-VQD as an upper-bound to the total number of samples needed per simulation. We indicate with n s and n t the number of shots and time steps, respectively. In the spirit of the iterative methods, we suppose that the procedure will find a solution dw * in at most M steps. In this case, we have that the total number of hardware measurements is Final step-infidelity Accuracy required Initial step-infidelity the total dependence of the computational cost of the algorithm upon the number of parameters is indeed O(p).
All the simulations have been done fixing a number of shots n shots per circuit evaluation. For p-VQD, we made multiple simulations and then reported the mean values of our results. For TDVA the number of circuit evaluation is fixed and can be estimated a priori. This comparison does not consider the fidelities of the results obtained with the exact simulation, we know from what we showed in Fig. 2 that p-VQD has a lower error when the total number of samples is comparable with TDVA. We note that the number of samples required scales approximately linearly with the number of parameters, with fluctuations due to different optimisation steps required. The lower the step-infidelity required, the higher will be the number of shots and possibly of the optimisation steps, resulting in greater fluctuations on the total number of samples. In this case, we remark that more advanced methods like the use of an adaptive learning rate η can be used to improve convergence performance of p-VQD.
Finally, we compare numerically the performance of a global and a local cost function. As discussed in Sec. 2, both O G (Eq. 2) and O L ( Eq. 12) can be used. In Fig. 6 we show the minimisation of both cost functions for different time steps and increasing number of spins N ∈ [3,11], obtained using the exact state-vector simulator.  All the curves are obtained by setting the initial condition dw 0 = 0. We highlight that our numerical findings point to an efficient scaling with the system size. We remark that the first optimisation steps of O G depend very weakly on the number of qubits. However, we also note that in general barren plateaus during optimisation may not show up until even larger systems are considered [58,62]. For this reason, one should benchmark which cost function behaves best for their specific system and size.

Discussion
In this work, we presented a new algorithm for the efficient variational simulation of the real-time evolution of quantum systems. We have shown that it is asymptotically more hardware efficient than the time-dependent variational algorithm (TDVA), while retaining an higher accuracy. Considering the projected time evolution, we avoid numerical instabilities due to the matrix inversion combined with statistical fluctuations due to finite shot measurements. We have numerically investigated the absence of barren plateaus, paving the way towards the simulation of larger quantum systems. One possible application of our approach is to study the dynamical properties of two-dimensional interacting systems, a notoriously difficult problem in classical computational physics. Similarly to all other variational algorithms, the choice of the right parametrization is fundamental for the algorithm to succeed. In this sense, having an efficient quantum algorithm to perform variational time evolution is essential to compare to classical results obtained with variational states either based on tensor networks [65,66,67], or neural networks [68,69]. A possible improvement to further enhance the efficiency of our approach concerns the estimation of the gradient. At present, a drawback of our method is that the circuit constructed on the quantum device is approximatively twice as deep as the ansatz used to represent the system. However, by suitably controlling the number of two-qubits gates in the chosen ansatz, representing the major source of circuit error, we believe that p-VQD can already be used to simulate small quantum systems on available devices.

Data availability
The code used to run the simulations in this article has been written in Python using Qiskit [63]. It is open source and can be found on GitHub [70]. (20) Now we expand to first order in dw, obtaining |ψ w+dw = |ψ w + j dw j |∂ j ψ w . The first order contribution in δt vanishes, but also the first order contribution in dw, then we have The final result reads As δt → 0, also dw j → 0 ∀j and their ratio remains constant. Therefore, the addition of 1 δt 2 factor in Eq. (2) makes it independent of the time step size in the limit of δt → 0.

C Equivalence between the p-VQD and the McLachlan's variational principle
In this Appendix, we show that in the limit of small time step the parameter variation that fulfils the request in Eq. (1) is the same obtained by applying the McLachlan's variational principle [36,30].
We start Taylor expanding the overlap to the second order: using the substitution we obtain where we used |φ(δt) = e −iHδt |ψ w as in the main text.
Then, we expand the time evolution operator to the first order e −iHδt = I−iHδt+o(δt) and partially differentiate both sides of the normalization condition ||ψ w || 2 = 1 with respect to parameters w i and w j to obtain the two important properties Substituting in Eq. (24) we have where we neglected the third order contribution in dw and δt. We notice that the second order term in dw k dw j is the real part of the Quantum Geometric Tensor (QGT) , as expressed in Eq. (10). Finally, in the p-VQD algorithm we aim to find the dw that maximizes the overlap in Eq.(1) , thus we impose the first order optimality condition ∂ ∂dw k | φ(δt)|ψ w+dw | 2 = 0 ∀k (28) that in the limit δt → 0 gives the equation where we indicated with G the QGT, as in the main text. This is the same evolution equation for parameters w that is obtained when the McLachlan's variational principle is used to simulate real time dynamics of closed systems with pure quantum states. For an extensive review of the variational principles for time evolution of pure and mixed states, see [30].

D Optimisation routine for a single time-step
In this Appendix, we further analyse the optimisation routine of our algorithm. This optimisation is performed on the parameter variation dw for every time step. In this case we focus on a state-vector simulation of the Transverse Field Ising Model on an open chain (see main text), considering only a single time step (without loss of generality, the third time step of the simulation). As Fig. 7 shows, at every optimisation steps the step-infidelity is reduced of nearly a order of magnitude. The initial choice dw 0 = dw (t − δt) is shown to reduce the number of steps required.

E Relationship between step infidelity and number of shots
In this Appendix, we investigate the relationship between the step infidelity to reach in a p-VQD step and the number of shots n s required. Given that the fidelity is equivalent to the probability of measuring a string of 0s, with n s shots the minimum non-zero infidelity we can resolve is 1 ns . However, we stress that since we are optimising the infidelity by measuring the gradient, it is possible to converge to a state with a smaller step infidelity than this bound, depending on the variance of the gradient and on the optimisation algorithm employed.
To analyse the effects of the shots on the infidelity optimisation we considered a single time step of the algorithm and optimised the ansatz parameters using a gradient sampled with n s shots, while measuring the infidelity exactly using the state-vector. For each number of shots n s , we optimised the step infidelity and measured its convergence value. Due to the presence of shot noise, we repeated the optimisation multiple times and plotted the mean and the standard deviation of those calculations in Fig. 8. We performed the experiment using two different optimizers, Stochastic Gradient Descent and Adam, to study how this choice affects the infidelity minimisation. We also fit the points to a function f = kn γ s with k and γ free parameters,to estimate the exponent γ. The expected n −1 s dependence is shown as a visual guide for clarity. We can see that, when a small number of shots is considered, the Adam optimizer is able to minimise the infidelity up to two order of magnitude below the minimum resolvable with shots. When the number of shots increases, the performances of the two optimisers become comparable.