Variational quantum solver employing the PDS energy functional

Recently a new class of quantum algorithms that are based on the quantum computation of the connected moment expansion has been reported to find the ground and excited state energies. In particular, the Peeters-Devreese-Soldatov (PDS) formulation is found variational and bearing the potential for further combining with the existing variational quantum infrastructure. Here we find that the PDS formulation can be considered as a new energy functional of which the PDS energy gradient can be employed in a conventional variational quantum solver. In comparison with the usual variational quantum eigensolver (VQE) and the original static PDS approach, this new variational quantum solver offers an effective approach to navigate the dynamics to be free from getting trapped in the local minima that refer to different states, and achieve high accuracy at finding the ground state and its energy through the rotation of the trial wave function of modest quality, thus improves the accuracy and efficiency of the quantum simulation. We demonstrate the performance of the proposed variational quantum solver for toy models, H$_2$ molecule, and strongly correlated planar H$_4$ system in some challenging situations. In all the case studies, the proposed variational quantum approach outperforms the usual VQE and static PDS calculations even at the lowest order. We also discuss the limitations of the proposed approach and its preliminary execution for model Hamiltonian on the NISQ device.


Introduction
Quantum computing (QC) techniques attract much attention in many mathematics, physics, and chemistry areas by providing means to address insurmountable computational barriers for simulating quantum systems on classical computers. [2,3,43,47,53,61] One of the focus areas for quantum computing is quantum chemistry, where Hamiltonians can be effectively mapped into qubit registers. In this area, several quantum computing algorithms, including quantum phase estimator (QPE) [5,12,15,26,38,52,58,69] and variational quantum eigensolver (VQE), [16,24,29,31,32,44,50,55,60] have been extensively tested on benchmark systems corresponding to the description of chemical reactions involving bond-forming and breaking processes, excited states, and strongly correlated molecular systems. In more recent applications, several groups reported quantum algorithms for imaginary time evolution, [42,46] quantum filter diagonalization, [48] quantum inverse iteration algorithms, [36] and quantum power/moments methods. [59,67] The main thrust that drives this field is related to the efficient encoding of the electron correlation effects that are needed to describe molecular systems.
Basic methodological questions related to an efficient way of incorporating large degrees of freedom required to capture a subtle balance between static and dynamical correlations effects still need to be appropriately addressed. A typical way of addressing these challenges in VQE approaches is by incorporating more and more parameters (usually corresponding to excitation amplitudes in a broad class of unitary coupledcluster methods [4,17,27,35,37,64]). Unfortunately, this brute force approach is quickly stumbling into insurmountable problems associated with the resulting quantum circuit com-plexity and problems with numerical optimization procedures performed on classical machines (the so-called barren plateau problem reported in Refs. [10,11,41,45,51,66,68]).
In this paper, we propose a new solution to these problems. Instead of adding more parameters to the trial wave function, we choose to optimize a new class of energy functionals (or quasifunctionals, where the energy is calculated as a simple equation solution) that already encompasses information about high-order static and dynamical correlation effects. An ideal choice for such high-level functional is based on the Peeters, Devreese, and Soldatov (PDS) formalism, [49,62] where variational energy is obtained as a solution of simple equations expressed in terms of the Hamiltonian's moments or expectations values of the powers of the Hamiltonians operator defined for the trial wave function. In Ref. [34] we demonstrated that in such calculations highlevel of accuracy can be achieved even with very simple parametrization of the trial wave functions (capturing only essential correlation effects) and low-rank moments. We believe that merging the PDS formalism with the quantum gradient based variational approach would be considered as a more interesting alternative for by-passing main problems associated with the excessive number of amplitudes that need to be included to reach the so-called chemical accuracy.
In the following sections we will briefly introduce the PDS formalism and describe how the PDS energy functional can be incorporated with the minimization procedures that are based on the quantum gradient approach [25,42,45,57,71] to produce a new class of variational quantum solver (which is called PDS(K)-VQS for short in the rest of the paper) to target the ground state and its energy in a quantum-classical hybrid manner. Furthermore, we will test its performance, in particular the performance of the more affordable lower order PDS(K)-VQS (K = 2, 3, 4) approaches combining with the trial wave function expressed in low-depth quantum circuits, at finding the ground state and its energy for the Hamiltonians describing toy models and H 2 molecular system, as well as the strongly correlated planar H 4 system, in some challenging situations where the barren plateau problem precludes the effective utilization of the standard VQE approach.

PDS formalism
In this section we will give a brief description of the PDS formalism. The detailed discussion of the PDS methodology and highly relevant connected moment expansion (CMX) formalisms have been given in the original work [49,62] as well as our recent work [14,34] and many earlier literatures (see for example Refs. [18,19,33,39,40,54,65]). The many-body techniques used in the derivation of PDS expansions originate in the effort to provide upper bounds for the free energies, and to provide alternative re-derivation of the Bogolubov's [8] and Feynman's [20] inequalities. Since the Gibbs-Bogolubov inequality reduces to the Rayleight-Ritz variational principle in zero temperature limit, these formulations can be directly applied to quantum chemistry. Here we only provide an overview of basic steps involved in the derivations of the PDS formulation.
A starting point of the studies of upper bounds for the exact ground-state energy E 0 is the analysis of function Γ(t) (defined for trial wave function |φ having non-zero overlap with the ground-state wave function) and its Laplace transform f (s) f (s) takes the form where ω(E i ) = | Ψ i |φ | 2 . The PDS formalism is based on introducing parameters into expansion (4) using a simple identity (with a real parameter a) When the above identity is applied for the first time to Eq. (4) (introducing the first parameter a 1 ) one gets the following expression for the f (s) function The transformation (6) can be repeated K times (with each time introducing a new parameter a i , i = 1, . . . , K) to reformulate the f (s) function as where The K-th order PDS formalism (PDS(K) for short henceforth) is then associated with defining the introduced K real parameters (a 1 , . . . , a K ) that minimize the value of R K (s, a 1 , . . . , a K ). In this minimization process the necessary extreme conditions are given by the system of equations which can be alternatively represented by the matrix system of equations for an auxiliary vector Here, the matrix elements of M and vector Y are defined as the expectation values of Hamiltonian powers (i.e. moments), M ij = φ|H 2K−i−j |φ , Y i = φ|H 2K−i |φ (i, j = 1, · · · , K) (for simplicity, we will use the notation H n ≡ φ|H n |φ ).
It can be shown that the optimal parameters in the PDS(K) formalism, (a , are the roots of the polynomial P K (E), and these roots provide upper bounds for the exact ground and excited state energies, e.g., for the ground state energy we have Note that, as shown in Refs. [49,62] the PDS formalism also applies to the Hamiltonian characterized by discrete and continuous spectral resolutions together.

PDS(K)-VQS formalism
In the variational method, we approximate the quantum state using parametrized trial state |Ψ ≈ |φ . Using a quantum circuit, the trial state can be prepared by applying a sequence of parametrized unitary gates on the initial state |0 , is the k-th unitary single-or two-qubit gate that is controlled by parameter θ k . The goal is to approach the ground-state energy of a many-body Hamiltonian, H, by finding the values of these parameters, θ, that minimize the expectation value of the Hamiltonian To do this, the conventional VQE starts by constructing the ansatz |φ( θ) and measuring the corresponding expectation value of the Hamiltonian using a quantum computer, and then relies on a classical optimization routine to obtain new θ. During the parameter optimization (or dynamics), the set of parameters that is updated at the k-th step (k > 1) can be written as where ∇E( θ) = ∂E/∂ θ is the energy gradient vector, and η is the step size (or learning rate). R( θ) is the Riemannian metric matrix at θ that is flexible to characterize the singular point in the parameter space and is essentially related to the indistinguishability of E( θ). [71] It is worth mentioning that Eq. (5) originates from natural gradient learning method in the general nonlinear optimization framework especially targeting machine learning problems. [1] Here, the natural gradient is the optimizer that accounts for the geometric structure of the parameter space. For the curved (or nonorthonormal) parameter manifold that exhibits the Riemannian character (e.g. in large neural networks), natural gradient learning method is often employed to avoid the plateaus in the parameter space. [42,63,71] Note that when the parameter space is a Euclidean space with orthonormal coordinate system the Riemannian metric tensor will reduce to the unity matrix (see Tab. 1). In VQE setting, one can define the Riemannian metric as the quantum Fubini-Study metric, which is the quantum analog of the Fisher information matrix in the classical natural gradient, [1] to measure the distance in the space of pure quantum states. The quantum Fubini-Study metric describes the curvature of the ansatz class rather than the learning landscape, but often performs as well as Hessian based methods (e.g. BFGS optimizer that approximates the Hessian of the cost function using first-order gradient, see Ref. [70] for a recent detailed discussion). There are also some other options for the Riemannian metric including imaginary-time evolution (ITE) or even classical Fisher metric that have been discussed in some recent reports. [42,63,71] In Tab. 1, three commonly used flavors of the Riemannian metric matrix R( θ) are listed and will be used in the following case studies. Remarkably, as pointed out in Refs. [42,73], the difference between natural gradient descent (NGD) and ITE accounts for the global phase, and if introducing a time-dependent phase gate to the trial state, the Riemannian metric employing NGD will be equivalent to the metric employing ITE.
where the coefficient X = (X 1 , · · · , X K ) T is obtained by solving the following linear equation with the element of matrix M and vector Y being defined as the expectation values of Hamiltonian powers (i.e. moments), In the variational method, we approximate the quantum state using parametrized trial state | i ⇡ | i. Using a quantum circuit, the trial state can be prepared by applying a sequence of parametrized unitary gates on the initial state |0i, (✓ = {✓ 1 , · · · , ✓ n }). Here U k (✓ k ) is the k-th unitary single-or two-qubit gate that is controlled by parameter ✓ k . The goal is to approach the ground-state energy of a many-body Hamiltonian, H, by finding the values of these parameters,✓, that minimize the expectation value of the Hamiltonian To do this, the conventional VQE starts by constructing the ansatz | (✓)i and measures the corresponding expectation value of the Hamiltonian using a quantum computer, and then relies on a classical optimization routine to obtain new✓. During the parameter optimization, the set of parameters that is updated at the k-th step (k > 1) can be written as where rE(✓) = @E/@✓ is the energy gradient vector, and ⌘ is the step size (or learning rate). R(✓) is the Riemannian metric matrix at✓ that are flexible to characterize the singular point in the paramter space and are essentially related to the indistinguishability of E(✓). In Tab. I, three commonly used flavors of the Riemannian metric matrix R(✓) are listed and will be used in the following numerical examples.
To get the energy gradient in the PDS framework, take the derivative w.r.t. ✓ i on both sides of Eq. (1), we then have where @X @✓i can be obtained by solving the following linear equation Fig. ?? summarizes the workflow of PDS-VQE framework relies on new PD hH k i and their✓-derivatives being circuits. In the following, from sev ther demonstrate how the correspo constructed and how the PDS-VQE

A. Toy Hamoltonians
We first consider two slightly lar Note that same toy models have b et al. to demonstrate the difference inary time evolution (ITE) and grad ing the ground-state energy of Ham Here,R p,q Y (✓) is a controlled Y qubit p and target qubit q, and R qubit p around the x-axis. The r defined as R j (✓) = e i✓ j /2 with spin matrices. Based on the ansat the Hamiltonian can be expressed a with the element of matrix M and vector Y bei the expectation values of Hamiltonian powers (i In the variational method, we approximate state using parametrized trial state | i ⇡ | i. U tum circuit, the trial state can be prepared by a quence of parametrized unitary gates on the init is the k-th uni two-qubit gate that is controlled by parameter ✓ to approach the ground-state energy of a many tonian, H, by finding the values of these param minimize the expectation value of the Hamilton To do this, the conventional VQE starts by the ansatz | (✓)i and measures the correspondin value of the Hamiltonian using a quantum comp relies on a classical optimization routine to obtai ing the parameter optimization, the set of para updated at the k-th step (k > 1) can be written a where rE(✓) = @E/@✓ is the energy gradien ⌘ is the step size (or learning rate). R(✓) is the metric matrix at✓ that are flexible to characteriz point in the paramter space and are essentially indistinguishability of E(✓). In Tab. I, three co flavors of the Riemannian metric matrix R(✓) will be used in the following numerical example To get the energy gradient in the PDS framew derivative w.r.t. ✓ i on both sides of Eq. (1), we t where @X @✓i can be obtained by solving the fol equation Fig. ?? summarizes the work In our previous work (J. Chem. Phys. 2020, 153, 201102), we reported a new class of quantum algorithms that are based on the quantum computation of the connected moment expansion to find the ground-state energy. In particular, the Peeters-Devreese-Soldatov (PDS) formulation is found variational and bearing the potential for further combining with the existing variational quantum infrastructure. Following the direction, here we propose an variational quantum algorithm based on the energy gradient of this new PDS energy expression. In comparison with the original PDS, the present variational approach helps find the ground state energy at the low order PDS expansion with the trial wave-function of modest quality, thus improves the accuracy and efficiency. The improvement can even be witnessed from some toy examples and simple chemical systems.

I. INTRODUCTION
Quantum computing (QC) techniques attract much attention in many mathematics, physics, and chemistry areas by providing means to address insurmountable computational barriers for simulating quantum systems on classical computers. ? One of the focus areas for quantum computing is quantum chemistry, where Hamiltonians can be effectively mapped into qubit registers. In this area, several quantum computing algorithms, including quantum phase estimator (QPE) ? , and variational quantum eigensolver (VQE) ? , have been extensively tested on benchmark systems corresponding to the description of chemical reactions involving bondforming and breaking processes, excited states, and strongly correlated molecular systems. In more recent applications, several groups reported quantum algorithms' applications to describe the imaginary time evolution of quantum systems. The main thrust that drives this field is related to the efficient encoding electron correlation effects needed to describe molecular systems.
Basic methodological questions related to an efficient way of incorporating large degrees of freedom required to capture a subtle balance between static dynamical correlations effects still need to be appropriately addressed. A typical way of addressing these challenges in VQE approaches is by incorporating more and more parameters (usually corresponding to excitation amplitudes in a broad class of unitary coupledcluster methods). Unfortunately, this brute force approach is quickly stumbling into insurmountable problems associated a) Electronic mail: peng398@pnnl.gov b) Electronic mail: karol.kowalski@pnnl.gov with the resulting quantum circuit complexity and problems with numerical optimization procedures performed on classical machines (the so-called barren plateau problem reported in Refs. ? ).
In this paper, we propose a new solution to these problems. Instead of adding more parameters to the trial wave function, we choose to optimize a new class of energy functionals (or quasi-functionals, where the energy is calculated as a simple equation solution) that already encompasses information about high-order static and dynamical correlation effects. An ideal choice for such high-level functional is based on the Peeters, Devreese, and Soldatov (PDS) formalism, ? where variational energy is obtained as a solution of simple equations expressed in terms of the Hamiltonian's moments or expectations values of the powers of the Hamiltonians operator defined for the trial wave function. In Ref. ? we demonstrated that in such calculations high-level of accuracy can be achieved even with very simple parametrization of the trial wave functions (capturing only essential correlation effects) and low-rank moments. We believe that the proposed algorithm may be an interesting alternative for by-passing main problems associated with the excessive number of amplitudes that need to be included to reach the so-called chemical accuracy.
In this paper we discuss the minimization procedures based on the natural gradients approach for PDS formulation to calculate ground-and excited-state electronic energies. ? Using this combined approach, we demonstrate a rapid convergence of the minimization algorithm in situations where the barren plateau problem precludes the effective utilization of the standard variational quantum eigensolver (VQE) approach. We also demonstrate that combining a new class of energy functional based on a straightforward form of the trial wavefunction expressed in terms of low-depth quantum circuit and low-rank PDS expansions (PDS(n), n=2,3), we are able to achieve a chemical level of accuracy even for strongly correlated molecular problems. We also discuss algorithms for simple encoding trial wave function and its derivatives. As a benchmark systems we employ a set of model Hamiltonians tested in Refs. ? , the H 2 , and H4 systems.

II. METHOD
In their original work, 1,2 Peetters, Devreese, and Soldatov have shown that the K-th order upper bound of the groundstate energy is the root of the polynomial P K (E), To get the energy gradient in the PDS framework, take the derivative w.r.t. θ i on both sides of Eq. (12), and after reorganizing the terms we can express the energy derivative as where ∂X ∂θ i is associated with the θ i -derivative of Eq. (11), In the present work, due to the relatively small system size, we directly exploit the Hadamard test to compute the real part of H n for the Hamiltonians that are represented as a sum of Pauli strings. It is worth mentioning that for typical molecular systems that can be represented by N qubits, the number of H n measurement scales as O(N 4n ), which nevertheless can be reduced once the Pauli strings are multiplied and their expectation values are re-used as the contributions to the higher order moments. [34] For example, as we will show later for the H 4 system that comprises 184 Pauli strings in the Hamiltonian, the effective number of Pauli strings required for arbitrary H n (n = 2, 3, 4) measurements can be dropped from 184 2 , 184 3 , and 184 4 to 1774, 3702, and 4223, respectively, after the Pauli reduction, and the 4223 strings will not be changed for more complex H n 's (n > 4). Similar findings have also been reported in Ref. [67], where by grouping the Pauli strings into tensor-product basis sets the authors examined the operator counts for H 4 of Heisenberg model defined on different lattice geometries for the number of qubits ranging from 2 up to 36, and found that the effective number of Pauli strings to be measured drops by several orders of magnitude with sub-linear scaling in a given number of qubits. For larger systems, the number of measurements can be further reduced by introducing active space and local approximation. Alternatively, one can approximate H n by a linear combination of the time-evolution operators as introduced in some recent reports. [7,59] For the estimation of ∂ H n /∂θ k , in the present work we limit U k (θ k ) exploited in the state preparation to be only one-qubit rotations. Then, following Ref. [57], ∂ H n /∂θ k can be obtained by measuring H n twice using the same circuit but shifting θ k by ± π 2 separately, i.e.
If θ k parametrizes more than one one-qubit rotations in the circuit, then based on the product rule ∂ H n /∂θ k will have contributions from all one-qubit θ k rotations, each of which will be obtained by applying (19) on the corresponding rotation.

Numerical examples
In this section, with several examples, we will demonstrate how the PDS(K)-VQS performs in some challenging situations, and its difference in comparison to the conventional VQE and static PDS(K) expansions.

Toy Hamoltonians
We first test the PDS(K)-VQS on two toy Hamiltonians , that have been exploited by McArdle et al. [42] to demonstrate the performance of different Riemannian metrics in the conventional VQE approach for finding the ground-state energy of the same Hamiltoinans. Here,R p,q Y (θ) is a controlled Y rotation of θ with control qubit p and target qubit q, and R p X (θ) is a rotation of θ on qubit p around the x-axis. The rotation about the j-axis is defined as R σ j (θ) = e −iθσ j /2 with σ j being one of the Pauli spin matrices.
Figs. 2 and 3 show the performances of the proposed PDS(K)-VQS (K = 2, i.e. PDS(2)-VQS) and the conventional VQE approaches for finding the ground state energy of the toy Hamiltonians. As can be seen, the ability of VQE navigation to avoid the local minima on the conventional PES depends on the Riemannian metric exploited. For system A, in comparison to GD, the NGD (or equivalently ITE in this case) is able to avoid the local minimum at (θ 1 , θ 2 ) = (0, 0). This is because the Riemannian metric, , used in the NGD/ITE correctly characterizes any rotation pair with θ 1 = 0 as a singular point (i.e. det |R| = 0) such that R −1 will numerically navigate the dynamics (e.g. via singular value decomposition) to avoid collapsing in this local minimum once the trajectory is getting close. Therefore, if the metric is unable to characterize the local minima as singular points, the VQE would still get trapped. This can be observed from the VQE performance for system B, where both NGD and ITE fail to escape the local minima, (θ 1 , θ 2 ) ∼ (± 3π 8 , 0), in the dynamics due to the fact that the local minima are not the singular points of R in either NGD or ITE.
In contrast, the PDS(2)-VQS robustly converge to the true ground state for both systems regardless of the employed Riemannian metric. The success of PDS(K)-VQS in these toy examples can be essentially attributed to the fact that, in comparison to the original PES where the local minima corresponding to a non-ground state, the entire PDS(K) energy surface, except the singular areas (see the infinitesimal white strips on the left panels of Fig. 2 at θ 1 = 0) where the fidelity of the trial wave function w.r.t the target state is strictly zero, provides an upper bound energy surface for the same (ground) state. This state-specific nature makes the PDS(K)-VQS essentially explore a lower upper bound of the ground state at a given PDS order, and therefore the dynamics will not be trapped at a location that is associated with a different state. It worth mentioning that, a lower bound of the ground state energy can also be obtained from a static, and more costly, higher order PDS(K) standalone calculation as demonstrated in our previous work. [34] From this perspective, the PDS(K)-VQS approach provides an effective way to explore the possibility of pushing the low order PDS(K) results towards high accuracy that would otherwise require higher-order and more expensive PDS(K) calculations. Besides, since generalized variational principle applies in the PDS framework, [49,62] if other roots of Eq. (12) are concerned, the PDS(K)-VQS will also be able to navigate the dynamics to give lower upper bounds for excited states as long as the fidelity of the trial wave function with respect to the target state is non-zero.

H 2 and H 4 systems
We further employ the proposed PDS(K)-VQS approach to find the ground state energy of H 2 and H 4 molecular systems. For H 2 molecule, we exploit an effective Hamiltonian and an ansatz exploited by Yamamoto [71] and Bravyi et al. [9], whereŨ p,q N denotes the CNOT gate with control qubit p and target qubit q. Figure 4: The computed ground state energy (top panels), energy deviation w.r.t. exact energy (middle panels), and fidelity of the trial state (bottom panels) of the H 2 molecule iterate in the conventional VQE and PDS(K)-VQS (K = 2, 3, 4) infrastructures employing gradient descent (left panels) and natural gradient descent/imaginary time evolution (right panels). The initial rotation is given by θ = (7π/32, π/2, 0, 0). The step size η = 0.05 in all the calculations. Fig. 4 compares the VQE and PDS(K)-VQS performances exploiting the above-mentioned ansatz to find the ground state energy of the H 2 Hamiltonian. As can be seen, starting from the given initial rotation, the VQE is unable to converge to the ground state energy within 100 iterations, but rather drops to an excited state energy (−0.2 a.u. in this case). Actually, it has been shown that, [71] starting from the same initial rotation, the VQE needs to go through a "plateau" that resides at this energy value and spreads over ∼400 iterations before hitting the ground state energy (∼−0.8 a.u. in this case) regardless of the employed Riemannian metric.

GD NGD/ITE
To achieve a higher level of accuracy (e.g. chemical accuracy E( θ) − E exact ) < 1.5 × 10 −3 a.u.), low order PDS(K)-VQS typically needs more iterations than high order PDS(K)-VQS. As shown in the middle left panel of Fig. 4, by employing GD in the dynamics, it takes the PDS(4)-VQS <10 iterations to converge to the ground state energy with energy deviation being < 10 −14 a.u. regardless of the employed Riemannian metric, while it takes the PDS(2)/PDS(3)-VQS almost 100 steps to bound the deviation to be < 10 −3 a.u. Remarkably, the performance can be improved when GD is replaced by NGD/ITE in the PDS(2)/PDS(3)-VQS dynamics. In particular, within 80 iterations the PDS(3)-VQS employing NGD/ITE can converge to the accuracy level that is almost same as that of PDS(4)-VQS.
On the other hand, the quality of the trial wave function is more significantly improved in the low order PDS(K)-VQS dynamics than in the high order PDS(K)-VQS dynamics. For example, the fidelity of the trial wave function w.r.t the exact ground state gradually increases from almost zero to ∼0.35 within 100 iterations using PDS(2)-VQS regardless of the employed Riemannian metric, and this change is significantly steeper than the almost flat curves of PDS(3)/PDS(4)-VQS as shown at the bottom of Fig. 4. However, in comparison to GD, employing NGD/ITE in the PDS(3)/PDS(4)-VQS quickly improves the fidelity of the trial wave function from <0.02 to 0.2∼0.3 within 10 iterations. It is worth mentioning that since the fidelity of the trial wave function at the initial rotation is almost zero, both VQE and the static PDS(K) (K = 2, 3, 4) calculations alone cannot help identify the ground state energy in this case, which makes PDS(K)-VQS a necessary and effective approach to target ground state energy and improve the trial wave function.
Remarkably, the improvement of the trial wave function employing PDS(K)-VQS approach might be limited. This can be seen from the flat fidelity curves of the trial state driven in the PDS(3/4)-VQS dynamics after first several iterations as shown at the bottom right of Fig. 4. This is due to the fact that the PDS(K) formalism does not require the ansatz to sufficiently ap-proximate the target state, while is still able to provide systematically improvable upper bounds of the expectation value of the target state by exploring the Krylov subspace. The benefit is the great simplification of the state preparation. The limitation is also obvious in that it sometimes would be challenging to further improve the quality of the trial state within the PDS(K)-VQS framework if the energies were well converged already, which would then compromise the accuracy of the property calculations that usually requires a sufficiently accurate description of the target wave function. We also test the proposed PDS(K)-VQS approach for a slightly larger system, the planar H 4 system, where a 8-qubit circuit with 16 rotation parameters shown at the top of Fig. 5 is employed to prepare the trial wave function for finding the ground state energy. The state preparation circuit is inspired by the similar circuit that has been reported being successfully applied for preparing the trial state for the Hartree Fock state of the linear hydrogen chain systems. [2] For the planar H 4 system whose ground state is a triplet, the circuit with close-to-zero initial rotations would generate a trial state that is almost singlet, which makes the conventional VQE and the static PDS(K) (K = 2, 3, 4) simply fail. On the other hand, as shown at the bottom of Fig.  5, the PDS(K)-VQS (K = 2, 3, 4) are capable of dealing with such a tough situation and again outperform. As can be seen, within 200 iterations, PDS(K)-VQS (K = 2, 3, 4) are able to converge to the ground state energy well below chemical accuracy and improving the fidelity of the trial wave function to be >0.96. It is worth noting that even though the converged rotations obtained from the PDS(K)-VQS calculations generate a high fidelity state, the expectation value of the generated state is still ∼ 0.02 a.u. above the exact energy, and it then becomes challenging to further improve the fidelity employing the same circuit infrastructure through varying the rotations. Therefore, the circuit used here might not be sufficient for preparing true ground state in practice if higher fidelity is desired. We here intentionally employ the circuit to artificially generate an extreme challenging case to show the performance difference between conventional VQE and PDS(K)-VQS approaches.

Discussion
From Section III, it has been seen that the PDS(K)-VQS approach bears the potential of speeding up the iterations in comparison with the conventional VQE approach. However, it is worth noting that the measurement effort of evaluating H n 's (n > 1) and their derivatives are usually more expensive than that of H and its derivative, and the actual cost saving will therefore be compromised.
To have a close look at the measurement of the H n (and its impact on the total cost), we employ the following metric to give an estimate for the number of measurements, M , [23,56,69] where is the desired precision and h i 's and P i 's are the coefficients and Pauli strings representing a moment (i.e. H n = i h i P i ) and having been partitioned into certain groups, G's, in which simultaneous measurement can be performed. cov P i , P j is the covariance between two Pauli strings bounded by cov P i , P j ≤ | var(P i ) · var(P j )| (21) with the variance being computed from var(P i ) = 1 − P i 2 . Here, we assume the covariances between different Pauli strings to be zero for the brevity of the discussion. We can apply the above metric to, for example, estimate the number of measurements of H n (n = 1, 2, 3) required by the PDS(2)-VQS calculation for the complete active space (4 electrons, 4 spin-orbitals) of the planar H 4 system. Given ∼ 0.5mHartree, since H n (n = 1, 2, 3) can be generated from at most ∼ 3700 Pauli strings, the estimated number of measurements needs to be done is ∼ 4.8 × 10 9 , which is one order of magnitude higher than that for H (∼ 1.2 × 10 8 ). Thus, given the same trial state in this H 4 case, if the number of conventional VQE iterations is no more than one order of magnitude larger than that of the PDS(2)-VQS iterations, VQE would outperform PDS(2)-VQS in terms of total number of measurements, and PDS(2)-VQS outperforms otherwise. It is worth mentioning that, during the PDS(K)-VQS process for the ground state and energy, the excited state energies can also be estimated directly from the higher roots of the polynomial (12) without any additional measurement (although accurate excited state energies would require higher order PDS(K)-VQS calculations). In contrast, the conventional VQE would need distinct trial states, and thus different measurements, for targeting different states.
Generally speaking, as long as the relatively large number of measurements of the Pauli strings becomes manageable, the PDS(K)-VQS approach can be potentially applied for targeting the exact solutions for the system sizes that are not classically tractable, in particular for the systems whose true ground and excited states we have little knowledge of, or are challenging to obtain classically. To reduce the measurement demand, typical strategy is to partition the Pauli strings (that contribute to the moments) into commuting subsets that follow a certain rule, e.g. qubit-wise commutativity (QWC) [31,44], general commutativity [22,72], unitary partitioning [30], and/or Fermionic basis rotation grouping [28] to name a few. The applications of these commuting rules to the single Hamiltonian have shown that, at a cost of introducing additional one-/multiqubit unitary transformation before the measurement, the total number of required measurements can be significantly reduced from O(N 4 ) to O(N 2∼3 ), or for simpler cases even O(N ). For higher order moments, as we mentioned in the method section, early study of applying QWC bases to Heisenberg models represented by up to 36 qubits exhibits a sub-linear scaling of the number of measurements in the number of the qubits (Ref. [67]), which then leads us to expect similar scaling behaviors of the number of required measurements for evaluating moments for molecular systems. Beside exploring the commutativity of Pauli strings, other approaches including the linear combinations of unitary operators (LCU) technique [13], direct block-encoding [6,21], and quantum power methods [59] might also be worth studying for reducing the number of measurements at the cost of circuit depth. In the light of that, we plan to perform a comprehensive benchmark as a follow-up work.
Since the PDS(K)-VQS formalism involves solving linear system of equations and polynomial, there is a concern of numerical instability when applying the PDS(K)-VQS approach in optimization. Theoretically, the numerical instability of the PDS(K)-VQS approach might come from two sources, (a) the singularity and illconditioning of the matrix M in Eq. (11) that might consist of high order moments, and (b) the singularity of the Riemannian metric (R) used in the dynamics (16). In particular, the singularity of matrix M can be easily observed if the trial vector becomes very close to the exact wave function (det|M| = 0 if we replace the trial vector with exact vector). Numerically, the singularity problem can be avoided by adding a small positive number (e.g. 10 −6 ) to the eigenvalue of the matrix M or R via singular value decomposition (SVD). However, it is worthing noting that adding small perturbation to M might violate the variationality of the PDS approach, and would not be recommended to use if the strict upperbounds to the true energy are concerned. The illconditioning of matrix M could occur in the high order PDS calculations, where high order moments could make the condition number of matrix M very large. Thus, from the practical point of view, due to the potentially larger number of measurements and ill-conditioning arising from high Hamiltonian powers, lower oder PDS(K)-VQS approaches are usually more feasible. Ultimately, one would be concerned about how the PDS(K)-VQS applies to general models and how it performs on the real quantum hardware subject to the device noise. To address these concerns and explore the potential of the PDS(K)-VQS approach, we have started to launch the PDS(K)-VQS calculations for more general Hamiltonians on both simulator and the real quantum hardware. Figs. 6 and 7 exhibit some preliminary results for a four-site 2D Heisenberg model with external magnetic field, The simple circuit employed for the state preparation in both VQE and PDS(K)-VQS simulations is shown in Fig. 6, where, for the brevity of our discussion, we only treat one rotation in the state preparation as the variational pa- rameter, and fix all other three rotations. As can be seen from the noise-free simulations in Fig. 6, the PDS(2)-VQS results quickly converge within five iterations achieving ∼ 0.99 fidelity, while the performance of VQE exhibits strong dependence on the initial rotation (for θ 0 = −2.0, the conventional VQE is able to converge in 10 iterations with ∆E < 0.05 a.u. and Fidelity ∼ 0.97). When running the PDS(K)-VQS simulations for the same model on the IBM Toronto quantum hardware, as shown in Fig. 7, in comparison to the ideal curves, the PDS(2/3)-VQS optimization curves on the real hardware significantly slows down, and deviate from the exact solutions due to the error from the real machine. However, if we increase the PDS order to perform PDS(4)-VQS calculations, the accuracy of the results systematically improves. For example, in the PDS(4)-VQS approach both the computed ground state energy and the trial state (and thus the magnetization) converge within 10 iterations being very close to the exact solutions.

Conclusion
In summary, we propose a new variational quantum solver that employs the PDS energy gradient. In comparison with the usual VQE, the PDS(K)-VQS helps identify an upper bound energy surface for the ground state, and thus frees the dynamics from being trapped at local minima that refer to non-ground states. In comparison with the static PDS(K) expansions, the PDS(K)-VQS guides the rotation of the trial wave function of modest quality, and is able to achieve high accuracy at the expense of low order PDS(K) expansions. We have demonstrated the capability of the PDS(K)-VQS approach at finding the ground state and its energy for toy models, H 2 molecule, and strongly correlated planar H 4 system in some challenging situations. In all the case studies, the PDS(K)-VQS outperforms the standalone VQE and static PDS(K) calculations in terms of efficiency even at the lowest order. We also discussed the limitations of the PDS(K)-VQS approach at the current stage. In particular, the PDS(K)-VQS approach may suffer from large amount of measurements for large systems, which can nevertheless be reduced at the cost of circuit depth by working together with some measurement reduction methods. Finally, we have started to launch PDS(K)-VQS simulations for more general Hamiltonians on IBM quantum hardware. Preliminary results for Heisenberg model indicate that higher order PDS(K)-VQS approach exhibits better noise-resistance than the lower order ones. The discussed approach can be extended to any variational formulation based on the utilization of H n moments (e.g. Krylov subspace algorithms).
The data points correspond to mean value from the calculations on IBM Quantum processor ibmq_toronto with statistical error bars corresponding to 5 × 8192 shots (per point). The trial state is constructed using the circuit given in Fig. 6 with initial rotation θ 0 = −3.0.

Acknowledgement
B. P. and K. K. were supported by the "Embedding QC into Many-body Frameworks for Strongly Correlated Molecular and Materials Systems" project, which is funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences (BES), the Division of Chemical Sciences, Geosciences, and Biosciences. B. P. and K. K. acknowledge the use of the IBMQ for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBMQ team.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.