Optimizing Variational Quantum Algorithms with qBang: Efficiently Interweaving Metric and Momentum to Navigate Flat Energy Landscapes

Variational quantum algorithms (VQAs) represent a promising approach to utilizing current quantum computing infrastructure. VQAs are based on a parameterized quantum circuit that is optimized in a closed loop via a classical algorithm. This hybrid approach reduces the load on the quantum processing unit but comes at the cost of a classical optimization that can feature a flat energy landscape. Existing optimization techniques, including imaginary-time propagation, natural gradient, and momentum-based approaches, are promising candidates but either place a significant burden on the quantum device or frequently suffer from slow convergence. In this work, we propose the quantum Broyden adaptive natural gradient (qBang) approach, a novel optimizer that aims to distill the best aspects of existing approaches. By employing the Broyden approach to approximate updates in the Fisher information matrix and combining it with a momentum-based algorithm, qBang reduces quantum-resource requirements while performing better than more resource-demanding alternatives. Benchmarks for the barren-plateau circuit, quantum chemistry, and the max-cut problem demonstrate an overall stable performance with a clear improvement over existing techniques in the case of flat (but not exponentially flat) optimization landscapes. qBang introduces a new development strategy for gradient-based VQAs with a plethora of possible improvements.


Introduction
Fostered by its anticipated potential, recent technological progress, and a surge of widespread interest, quantum computing is approaching the next level of popularity. Despite its impressive progress over the past years [1,2,3,4,5], much remains to be accomplished before practical use moves into reach [6,7]. Two of the most severe constraints are the limited number of qubits and short coherence times [8]. In order to combat these challenges, mixed quantum-classical algorithms, labeled variational quantum algorithms (VQAs) [1,9,10,11,12,2,3], have been devised. VQAs split an optimization task into two entwined steps: (i) an energy estimation using the quantum processing unit (QPU) and (ii) a classical optimization of the characterizing parameters. Due to the existing challenges, the aim in developing VQAs is to ensure convergence while limiting the number of function evaluations on the QPU to a minimum.
Classical optimizers have come a long way, from vanilla gradient descent, over natural gradient methods, to the modern, widely used adaptive gradient-based methods such as Adam [13]. Similar gradient-based approaches have been introduced for quantum algorithms [14,15,16]. The nature of quantum mechanics implies that, as the system size grows, the associated Hilbert space grows exponentially. While it is our goal to leverage this complexity, the majority of available eigenstates are closely packed in energy, mimicking de facto thermal behavior for a local operator according to the eigenstate thermalization hypothesis [17]. Consequently, gradients, which result in small local changes in a high-dimensional Hilbert space, decrease exponentially with increasing system size, a feature known as a barren plateau (BP), making parametrized quantum circuits (PQCs) prone to poor convergence. Albeit not directly mitigating BPs [18,19,20,21], higher-order derivative information can aid in maneuvering the optimization landscape by accounting for its local curvature or metric [22,23]. A quantity related to local curvature is the quantum Fisher information matrix (QFIM), which also appears in the context of multi-parameter estimation [24].
Estimating gradients and higher-order derivatives of quantum circuits is, unfortunately, costly and requires many function evaluations. Given its quadratic form, for n_θ parameters the QFIM requires O(n_θ²) function evaluations, which, considering the cost of measurements, renders its use for relevant problems challenging. Stokes et al. [22] introduced the quantum natural gradient (QNG) for pure quantum states. Block-diagonal approximations of the latter require only a linear number of function calls but discard essential information about parameter correlation, which severely limits their performance [25]. Generalizations of QNG to non-unitary circuits [26] as well as alternative approximation strategies have been proposed [27,28]. While the specific cost of estimating the QFIM depends on the problem at hand, the cost of performing O(n_θ²) evaluations is particularly prohibitive in systems that feature vanishing gradients due to a quickly rising number of variables (e.g., the BP circuit [18]). Practical use of VQAs requires optimization strategies that provide reliable predictions with as few evaluations on the QPU as possible.
In this work, we introduce the quantum Broyden adaptive natural gradient (qBang) approach, an optimization strategy that augments the reliable momentum-based optimizer Adam with an efficient update of the local metric based on the QFIM using the Broyden method [29]. After initialization, qBang requires only O(n_θ) evaluations per step and yet shows considerable performance gains over QNG, Adam, and even quantum imaginary time evolution (QITE) [30,31,32,33] on flat optimization landscapes.
The remainder of this article is structured as follows: Section 2.1 recapitulates VQAs, comprising the quantum approximate optimization algorithm (QAOA) and the variational quantum eigensolver (VQE), followed by a brief review of gradient-based optimization paradigms in Sec. 2.2. Sec. 2.3 subsequently introduces the newly developed qBang algorithm, which is extensively benchmarked and discussed in Sec. 3 for BP, max-cut, and quantum-chemical systems. We conclude the discussion in Sec. 4 and provide an outlook toward possible applications, improvements, and future challenges.

Variational quantum algorithms
VQAs are a collection of practically applicable algorithms that harness the computational capabilities of programmable quantum devices [1,9,32]. These algorithms are well suited to the hardware constraints imposed by the current generation of quantum computers, namely short coherence times, noisy operations, and a limited number of qubits [8]. These near-term algorithms have been proposed for a wide range of applications, including quantum chemistry [3], classical optimization [2], and machine learning [1,34].
VQAs are composed of three key elements, which are represented in Fig. 1. The first component is the objective/cost function to be minimized. In our work, the cost function is expressed as the expectation value of the Hamiltonian Ĥ and provides information about the energy of its ground state. Depending on the complexity of the Hamiltonian, different Pauli strings have to be measured to obtain an accurate estimate of the energy. The state |ψ(θ)⟩ is represented by a parametrized quantum circuit, and the optimizable parameters of the circuit are denoted as θ = (θ_1, θ_2, ..., θ_{n_θ})^T. These parameters commonly represent the angles of unitary rotation operators. The Hamiltonian is composed of quantum operators that encode information about a chemical or classical system, such as a molecule or an optimization problem. The second component is the problem-specific circuit ansatz |ψ(θ)⟩. These ansätze are tailored to the specific problem, and numerous works focus on finding optimal PQCs [35,36,37]. A shared aspect is the use of only unitary operations, a limitation that will become relevant in the subsequent sections. The final component is the classical optimizer, which is used to find parameters that minimize the objective function [2,38,26,22].
The task of VQAs is to optimize the cost function, Eq. (1), by adjusting the tunable parameters θ of the circuit ansatz in a closed loop. This is done by iterating between evaluating the cost function on the quantum computer and updating the parameters using a classical optimizer. The objective is to find the set of parameters θ* that minimizes the cost function and provides a solution to the problem at hand. The process of evaluating the cost function and updating the parameters is repeated until the cost function converges to its minimum value or a stopping criterion is met. Current limitations in the available complexity of circuits are thus circumvented by dividing the optimization problem into small sets of quantum evaluations steered via classical parameter optimization. The circuit ansatz, cost function, and classical optimizer are problem specific, and the choice of these components can significantly affect the algorithm's performance.
VQAs offer a versatile framework that can be broadly categorized into several areas of application. While QAOA [36] is often employed for classical optimization problems and VQE [9,35,3] is commonly used for solving quantum eigenvalue problems, these categories are not exhaustive.
QAOA has been proposed to solve various classical optimization problems [36,39,1,40,41] and is a candidate for hybrid quantum-classical computation. Here, optimization problems are encoded into an Ising Hamiltonian [39]. QAOA typically employs a circuit ansatz |ψ(θ)⟩ composed of the consecutive application of two noncommuting operators: one operator encodes the optimization problem and the other serves as a mixing Hamiltonian. The goal is to optimize the parameters θ of the quantum circuit to minimize L(θ) and thereby find the solution to the optimization problem. Once the quantum circuit has been optimized, bitstrings are sampled to obtain approximate solutions to the classical optimization problem.
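To make the Ising encoding concrete, consider max-cut, the problem studied later in this work. The following classical sketch maps spin configurations z_i ∈ {−1, +1} to an Ising energy whose minimum corresponds to the maximum cut; the brute-force search is an illustrative stand-in for the QAOA bitstring sampling, and the helper names are our own.

```python
from itertools import product

def ising_energy(edges, z):
    """Ising energy H = sum over edges (i, j) of z_i * z_j.
    Each cut edge contributes -1, each uncut edge +1, so minimizing H
    maximizes the cut (up to an additive constant)."""
    return sum(z[i] * z[j] for (i, j) in edges)

def cut_size(edges, z):
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for (i, j) in edges if z[i] != z[j])

# 4-node ring graph; the optimal cut separates alternating nodes and cuts
# all four edges, giving the minimal Ising energy of -4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
best = min(product((-1, 1), repeat=4), key=lambda z: ising_energy(edges, z))
```

On hardware, the same energy function becomes the expectation value of the problem Hamiltonian, and the classical spins are replaced by measured bitstrings.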
In contrast, the VQE is the most widely studied quantum algorithm for minimizing a given cost function, usually the energy of a given quantum system, Eq. (1). A prominent example is the solution of Schrödinger's equation for molecular systems. A selected PQC is initialized, and the corresponding energy of the output state is subsequently evaluated on a quantum computer. Information about energy, gradients, and the metric can be inferred from multiple evaluations of the circuit and then used to update the parameters of the circuit with classical optimization methods [42]. This process is repeated until the expectation value converges to the ground-state energy of the system (see Section 2.1). The VQE algorithm has been applied in various fields, including quantum chemistry [43] and materials science [44].

Existing optimization paradigms
Here we review the existing optimization paradigms that inspire the qBang approach.

Gradient-based Optimization
A vital component of every variational algorithm is the classical optimizer. Here, the task of the classical computer is to iterate the parameters from an initial guess θ_0 such that the cost, Eq. (1), is minimized. Generally, this requires several iterations, depending on the quality of the initial guess. Assuming the cost function is differentiable, this procedure can be realized with gradient descent (GD). GD uses the parameter update rule θ_{k+1} = θ_k − η∇L_k, where the step size η ∈ R⁺ controls how much each iteration is allowed to change the parameters and ∇L_k ≡ ∇L(θ_k) is the gradient of the cost function at iteration k. The norm of the gradient ∥∇L_k∥₂ can be used as a criterion to determine when to stop the GD algorithm, as a vanishing gradient norm implies a stationary point. Gradients of quantum circuits can be obtained via finite-difference methods, via a linear combination of unitaries [16], or, without the need for additional hardware, by evaluating the cost function at two shifted parameter positions and using the rescaled difference of the results as an unbiased estimate of the derivative (the parameter-shift rule) [45,38,16].
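The shift-and-rescale gradient estimate and the GD loop can be sketched as follows. The single-qubit "circuit" below is a toy stand-in (its cost ⟨Z⟩ = cos θ is evaluated classically), but the shift rule is exact for gates generated by Pauli operators.

```python
import numpy as np

def parameter_shift_grad(cost, theta, shift=np.pi / 2):
    """Gradient via two shifted cost evaluations per parameter:
    dL/dtheta_i = [L(theta + s e_i) - L(theta - s e_i)] / (2 sin s),
    exact for gates of the form exp(-i theta P / 2) with Pauli P."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = shift
        grad[i] = (cost(theta + e) - cost(theta - e)) / (2.0 * np.sin(shift))
    return grad

def gradient_descent(cost, theta0, eta=0.2, tol=1e-6, max_iter=500):
    """Plain GD with the gradient-norm stopping criterion from the text."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        g = parameter_shift_grad(cost, theta)
        if np.linalg.norm(g) < tol:
            break
        theta = theta - eta * g
    return theta

# Toy cost <Z> = cos(theta) with ground-state energy -1 at theta = pi.
cost = lambda th: np.cos(th[0])
theta_star = gradient_descent(cost, [0.3])
```

On hardware each `cost` call would be a separate circuit execution, so one gradient costs 2 n_θ evaluations per step.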

GD-based methods have apparent limitations.
If the cost function is relatively flat, the gradient will be small, and GD may require unfeasibly many iterations to converge, even on ideal quantum devices. The noisy results on realistic devices put additional strain on the optimizer to escape flat energy landscapes as quickly as possible. As long as the cost-function gradients do not vanish completely, this problem may be mitigated by extending GD to include higher-order derivatives. For a second-order algorithm, this introduces the Hessian H and results in the Newton method, θ_{k+1} = θ_k − ηH_k^{-1}∇L_k. However, these higher-order methods are not always applicable, as the Hessian may not be positive semi-definite [46,47]. Additionally, computing the Hessian is computationally expensive if the parameter space is large. To overcome these challenges, several quasi-Newton methods can efficiently estimate the Hessian, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm or the Gauss-Newton method [48,29,49,50].
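Why curvature information helps on flat landscapes can be seen on a quadratic toy cost, where the Newton step reaches the minimum in a single iteration regardless of how flat one direction is (a minimal illustration, not the paper's setting):

```python
import numpy as np

# Quadratic cost L(theta) = 0.5 theta^T A theta with one nearly flat direction.
A = np.array([[1.0, 0.0],
              [0.0, 1e-3]])        # Hessian of the quadratic
theta = np.array([3.0, 3.0])
grad = A @ theta

# Newton step theta' = theta - H^{-1} grad (eta = 1): the inverse Hessian
# rescales the tiny gradient in the flat direction into a full-size step.
theta_new = theta - np.linalg.solve(A, grad)
```

Plain GD with a stable step size would need thousands of iterations to traverse the 1e-3-curvature direction that Newton's method crosses at once.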
Other methods exist that are tailored to navigating flat energy landscapes. For an intuitive picture, consider a ball rolling down a frictionless bowl. Instead of stopping at the bottom, the accumulated momentum pushes it forward and keeps the ball rolling back and forth. This idea is used in momentum-based optimizers. In a simplified form, each step is a linear combination of the previous update and the current gradient, m_k = β m_{k−1} + ∇L_k and θ_{k+1} = θ_k − η m_k, with m_k being the momentum accumulated during the optimization process, β the decay rate, and η the step size. Compared to GD, these methods are more effective at escaping local minima [51].
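The speedup on a flat stretch is easy to quantify: with a constant gradient g, the momentum saturates at g/(1 − β), i.e., a tenfold step for β = 0.9. A deterministic sketch (toy slope, illustrative values):

```python
def momentum_step(theta, m, grad, eta=1.0, beta=0.9):
    """One momentum update: the new step combines the previous update
    (scaled by the decay rate beta) with the current gradient."""
    m = beta * m + grad
    return theta - eta * m, m

# A flat slope with a tiny constant gradient of 0.01: plain GD crawls,
# while momentum accumulates toward g / (1 - beta) = 0.1 per step.
g = 0.01
x_gd, x_mom, m = 0.0, 0.0, 0.0
for _ in range(50):
    x_gd -= 1.0 * g                      # plain gradient descent
    x_mom, m = momentum_step(x_mom, m, g)
dist_gd, dist_mom = abs(x_gd), abs(x_mom)
```

After 50 steps the momentum walker has traveled roughly eight times as far as plain GD on the same slope.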
The momentum-based Adam optimizer [13] is widely used across scientific disciplines and has proven versatile and consistent in performance.
The optimization of VQAs can suffer when the energy surface becomes flat. To handle this issue, two directions can be taken. One approach is to find a good initial state that can be easily obtained and prepared on the quantum device. A typical example for chemistry applications is the uncorrelated Hartree-Fock state, but it can be expected that more complex systems will require correlated initial states. The second approach utilizes information about the local metric to guide each step toward the minimum, which will be discussed in detail in the following section. Overall, finding practical solutions to this problem is crucial for successfully implementing VQAs.

2.2.2 Metric-informed Optimization: Quantum Imaginary Time Evolution and Quantum Natural Gradient

As stated above, VQAs rely on a parametrization of the wave function in which the parameters represent phases of unitary gates acting on an input state. A small change in a parameter δθ_i not only results in changes in the observable of interest, as utilized by GD, but also in the associated metric ⟨ψ(δθ_j)|ψ(δθ_i)⟩. This additional information can provide a more suitable direction for the optimization trajectory. We will briefly review QITE and QNG, representing the two most widely discussed metric-informed optimization strategies.

QITE [30,31,32,33] is based on the "Wick-rotated" (τ = it) [52] imaginary-time Schrödinger equation and is a quantum algorithm for finding the ground and excited states [53] of a quantum system. It is a variant of the imaginary time evolution (ITE) algorithm [54,55,56,57], a well-established technique in "classical" computational physics for finding the ground state of a system. The iterative application of the exponential operator e^{−Δτ Ĥ} with sufficiently small time steps Δτ [56] exponentially damps higher-energy contributions, resulting in convergence to the ground state |Ψ_0⟩ if the initial state |Ψ(0)⟩ has a nonzero overlap with the ground state [30,31]. However, since e^{−Δτ Ĥ} is not unitary, it is not straightforward to implement ITE directly on quantum hardware. One option, which we pursue in this work, is to cast ITE into a hybrid quantum-classical variational form (VarQITE) [31,32] (Fig. 1), where the target state |Ψ(τ)⟩ is encoded by a PQC |ψ(θ(τ))⟩ = Û(θ(τ)) |ψ_0⟩ and the time evolution is mapped onto the parameters θ(τ) of the variational ansatz. The rule to update the parameters θ_k for the next iteration k+1 at (imaginary) time τ + Δτ is obtained by applying McLachlan's variational principle [58] to Eq. (4), minimizing the difference between the time evolution of the ansatz state |ψ(τ)⟩ ≡ |ψ(θ(τ))⟩ and the exact imaginary time evolution, where ∥|ψ⟩∥² = ⟨ψ|ψ⟩ is the 2-norm of a quantum state |ψ⟩ and E_τ = ⟨ψ(τ)| Ĥ |ψ(τ)⟩ is the expected energy at time τ. Solving Eq. (5) yields the imaginary-time derivative of the parameters, where F is the QFIM and ∇L the cost gradient. Eq. (6) allows updating the parameters for the next iteration, i.e., with a fixed time step Δτ and the Euler method or higher-order methods [59]; Δτ is equivalent to a step size η in the GD update rule above. The elements of the QFIM are given by F_ij = 4 Re[⟨∂_i ψ|∂_j ψ⟩ − ⟨∂_i ψ|ψ⟩⟨ψ|∂_j ψ⟩], with ∂_i ≡ ∂/∂θ_i. There is a close relation between the QFIM and the Fubini-Study metric, the metric of parametrized pure quantum states |ψ⟩; see the Supplemental Information (SI) Section E and Refs. [60,61,62,63,64,65,24,66,67] for details. The QFIM F encodes the nontrivial geometry of the parameter space [68,67] and is the quantum analog of the classical Fisher information matrix, which is the unique Riemannian metric associated with a probability density function [69,70,71].
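The pure-state QFIM, F_ij = 4 Re[⟨∂_i ψ|∂_j ψ⟩ − ⟨∂_i ψ|ψ⟩⟨ψ|∂_j ψ⟩], can be checked on a toy single-qubit state Rz(θ_2) Ry(θ_1)|0⟩, whose QFIM is known analytically to be diag(1, sin²θ_1). Finite differences stand in here for the hardware estimation of the overlaps; the state and helper names are illustrative.

```python
import numpy as np

def state(theta):
    """|psi(theta)> = Rz(theta2) Ry(theta1) |0> on a single qubit."""
    t1, t2 = theta
    return np.array([np.cos(t1 / 2) * np.exp(-1j * t2 / 2),
                     np.sin(t1 / 2) * np.exp(1j * t2 / 2)])

def qfim(theta, eps=1e-6):
    """F_ij = 4 Re[<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>],
    with central finite differences for the parameter derivatives."""
    psi = state(theta)
    n = len(theta)
    d = []
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        d.append((state(theta + e) - state(theta - e)) / (2 * eps))
    F = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            term = np.vdot(d[i], d[j]) - np.vdot(d[i], psi) * np.vdot(psi, d[j])
            F[i, j] = 4 * term.real
    return F

theta = np.array([0.7, 0.3])
F = qfim(theta)
```

A natural-gradient (or Euler VarQITE) step would then solve F δθ = −η ∇L for the update direction, e.g., via `np.linalg.solve` after regularizing F.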
QNG [22] is another metric-informed optimization technique, based on the principles of natural gradient descent by Amari et al. [72,73,74,75,69], initially developed for optimizing neural networks. As in VarQITE, the natural gradient considers the geometry of the function's parameter space and is calculated using the inverse of the QFIM [24,76]. Thus, using the QNG results in steps that are more aligned with the geometry of the parameter space, allows for faster convergence and the crossing of local minima, and helps the algorithm escape regions with vanishing gradients [22,25,23,77,18,78,20]. VarQITE and QNG are equivalent when the energy of the system, E = ⟨Ĥ⟩, is used as the cost function [73,46,22,26], as considered in this work, see Eq. (1).
The major drawback of QITE and QNG is that computing the entire QFIM for an ansatz with n_θ parameters is computationally expensive and requires measuring O(n_θ²) terms every iteration. Existing approximations such as the (block-)diagonal approximation of Stokes et al. [22] reduce the scaling to linear in the number of parameters, but discarding the off-diagonal elements omits essential information about correlation within the system and leads to an overall suboptimal performance [25].
The metric F and the gradient ∇L can be directly evaluated on quantum hardware [16,32,79,80]. It should be noted that the metric is frequently singular due to over-parametrization of the chosen circuit ansatz and requires regularization [22,25] or comparable strategies [81,59].

Quantum Broyden Adaptive Natural Gradient
In this section, we introduce qBang, which combines the Broyden quasi-Newton method with the natural-gradient and adaptive-momentum approaches. We discuss the core components of qBang, as well as its motivation, mechanics, and the resources required on the programmable quantum device. We also introduce a simplified version of our optimization approach, which we refer to as qBroyden.
The algorithms qBang and qBroyden utilize an adaptive approach to approximate the QFIM, drawing inspiration from the works of Amari, Park, and Fukumizu [75,82]. The intuition behind this approach can be understood as follows. We would like to retain the benefits of the natural gradient method without computing the QFIM at each iteration. For this reason, we assume that the QFIM varies slowly as the parameter space is traversed. For time step k, we use a metric denoted by the matrix B_k. Between steps, the metric is updated with a rank-1 perturbation given by the current gradient. In particular, B_{k+1} is realized as a low-pass filter process with learning rate ε_k, allowing the metric to pick up momentum as the parameter space is traversed, given by the relation B_{k+1} = (1 − ε_k) B_k + ε_k ∇L_k ∇L_k^T. (9) Conceptually, this updates the local metric with an approximation of the Hessian. In the classical setting, the Hessian is equivalent to the Fisher information matrix for certain classes of optimization problems, e.g., with Gaussian statistics or if the probability of encountering a given state decreases exponentially with its energy density (see SI Section E). More generally, the connection to curvature is also found in the equivalence between the classical Fisher information matrix and the Hessian of the relative entropy between two parametrically separated distributions [83]. We note that, recently, Dash et al. [84] have related the QFIM to the Hessian in the context of neural quantum states by using the infidelity with respect to the exact ground state as the cost function.
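The low-pass, rank-1 character of this update is easy to sketch in numpy (toy matrices, illustrative values). Because it forms a convex combination of a positive semi-definite matrix with the outer product g gᵀ, the updated metric stays positive semi-definite:

```python
import numpy as np

def metric_update(B, grad, eps):
    """Low-pass rank-1 metric update: B_{k+1} = (1 - eps) B_k + eps g g^T.
    A convex combination of two PSD matrices remains PSD."""
    return (1.0 - eps) * B + eps * np.outer(grad, grad)

B = np.diag([2.0, 0.5])      # e.g., an initial (regularized) QFIM estimate
g = np.array([1.0, 3.0])     # current cost gradient
B_next = metric_update(B, g, eps=0.2)
```

With a decaying ε_k, early gradients reshape the metric strongly while later updates refine it only gently.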
The famous BFGS algorithm uses ideas similar to Eq. (9) but differs in approximating the Hessian using two rank-1 updates.
Instead of updating and then inverting B_{k+1}, we utilize the Sherman-Morrison formula to perform the equivalent update directly on the inverse, B_{k+1}^{-1} = (1 − ε_k)^{-1} [B_k^{-1} − ε_k B_k^{-1} ∇L_k ∇L_k^T B_k^{-1} / ((1 − ε_k) + ε_k ∇L_k^T B_k^{-1} ∇L_k)]. (10) We select the hyperparameter ε_k according to a decaying filter ε_k = ε_0/(k + 1) [73]. Algorithm 1 presents the pseudo-code of the qBang optimizer, which we briefly walk through in the following. The algorithm takes as input the learning rates η = 0.01 and ε_0 = 0.2, the decay rates β_1 = 0.9 and β_2 = 0.999, the convergence criterion γ, and the PQC U(θ) with the initial parameter vector θ_0 ∈ R^{n_θ}. In the initialization step, the algorithm sets the iteration counter k ← 0, the momentum vector m_{−1} ← 0, and the biased variance vector v_{−1} ← 0, whose roles will become apparent in the following. The matrix B_0 is initialized using either the full Fisher information matrix (F) or an approximation as introduced in [22]. Other choices for the matrix B_0 would result in variations of the algorithm. The optimization starts with the estimation of the cost function L(θ_k) and its gradient ∇L(θ_k) through quantum circuits, followed by the update of the momentum and variance vectors, similar to the Adam algorithm [13]. Specifically, the algorithm calculates a weighted average of past gradients, m_k = β_1 m_{k−1} + (1 − β_1) ∇L(θ_k), with the weight given by a parameter β_1, and uses this as a moving direction. It incorporates a moving average of the squared gradient, v_k = β_2 v_{k−1} + (1 − β_2) (∇L(θ_k))², with the weight given by a second parameter β_2. The vector v_k can be interpreted as the variance under the assumption of a vanishing average. Its magnitude provides information about the reliability of a gradient estimate. The moving averages are then adjusted for bias via division by (1 − β_1^{k+1}) and (1 − β_2^{k+1}), delivering m̂_k and v̂_k. The variance vector v̂_k is then used to rescale the effective momenta into a sliding trust region, i.e., increasing the stability of the algorithm by shortening unreliable steps. Unless the convergence criterion is reached, the algorithm updates the parameter vector and the metric based on the update rule Eq. (10). It also rescales ε_k according to the learning-rate schedule, resulting in smaller updates with an increasing number of optimization steps. Otherwise, if the convergence criterion is satisfied, the algorithm stops the iteration and outputs the optimal parameter vector θ*. We suggest reinitializing qBang once the update of the Fisher information matrix becomes minute, which might occur for particularly long optimization trajectories but has not been encountered in this work.
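How these pieces interlock can be sketched in a compact toy loop: Adam-style moments steer the step, while the inverse metric is kept current with the rank-1 Sherman-Morrison update instead of re-measuring the QFIM every iteration. This is a simplified reading of Algorithm 1, not the authors' reference code: gradients are exact, B_0 is the identity in the demo, and the exact placement of the trust-region rescaling is our assumption.

```python
import numpy as np

def qbang_sketch(grad, theta0, B0, eta=0.02, eps0=0.2,
                 beta1=0.9, beta2=0.999, delta=1e-8, steps=400):
    """Illustrative qBang-style loop on a classical toy cost."""
    theta = np.asarray(theta0, dtype=float)
    Binv = np.linalg.inv(B0)            # B0: initial (regularized) metric
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for k in range(steps):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g        # momentum average
        v = beta2 * v + (1.0 - beta2) * g**2     # squared-gradient average
        mhat = m / (1.0 - beta1**(k + 1))        # bias corrections
        vhat = v / (1.0 - beta2**(k + 1))
        # natural-gradient direction, rescaled into a sliding trust region
        theta = theta - eta * (Binv @ mhat) / (np.sqrt(vhat) + delta)
        # decaying filter and rank-1 Sherman-Morrison update of the inverse
        eps = eps0 / (k + 1)
        Ainv = Binv / (1.0 - eps)
        Ag = Ainv @ g
        Binv = Ainv - eps * np.outer(Ag, Ag) / (1.0 + eps * g @ Ag)
    return theta

# Toy cost with one nearly flat direction: L = 0.5 * theta^T A theta.
A = np.diag([1.0, 1e-2])
theta_opt = qbang_sketch(lambda th: A @ th, [2.0, 2.0], B0=np.eye(2))
final_cost = 0.5 * theta_opt @ A @ theta_opt
```

In a VQA setting, `grad` would be the parameter-shift estimate and B_0 the (regularized) QFIM measured once at initialization.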
Algorithm 2 presents a simplified version of our optimization approach, which we refer to as qBroyden. Unlike qBang, qBroyden does not incorporate momentum and variance update rules and instead utilizes only the metric to update the parameter vector at each optimization step. Consequently, qBroyden is more closely related to QNG and VarQITE than qBang is.
This construction has several advantageous properties. Firstly, the Fisher information matrix is guaranteed to be positive semi-definite [24]. With the Gauss-Newton-like update, we maintain the positive semi-definiteness property throughout the optimization, see SI Section G and Martens et al. [46]. In fact, we apply a small regularization to the initial QFIM to ensure that B_0 is positive definite. This is an important feature, since the QFIM can be singular, particularly in overparameterized systems with multiple layers. Additionally, because the QFIM is not recalculated at each time step, this framework significantly reduces the necessary number of circuit evaluations. Lastly, incorporating momentum updates not only results in superior speed but also increases stability with respect to hyperparameter changes (illustrated in Sec. 3.4).
We note that a potential drawback of approximating the QFIM is that the resulting algorithms technically lose the theoretically ensured convergence properties of QITE [30,31]. However, this was not an issue for any of the problems studied in this work; on the contrary, qBang ensured faster and more stable convergence.
Regarding circuit evaluations, our proposed method reduces cost and increases efficiency. Each optimization step requires O(n_θ) circuit evaluations, which is on par with Adam due to the parameter-shift rule [38,45]. QNG without any approximation scales as O(n_θ²) per step due to the estimation of the full Fisher information matrix [76]. Our proposed optimizers, qBang and qBroyden, require as many circuit evaluations in the first step as QNG, but only O(n_θ) circuit evaluations per subsequent optimization step. The following sections demonstrate that the most striking advantage of qBang is its efficiency.
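The bookkeeping behind these scalings can be made explicit with a toy counting model. The per-step costs below, 2 n_θ evaluations for a parameter-shift gradient and n_θ² for a full QFIM estimate, are illustrative assumptions rather than exact measurement counts:

```python
def total_evaluations(n_params, steps):
    """Rough circuit-evaluation budgets under a simple counting model."""
    grad_cost = 2 * n_params        # parameter-shift gradient per step
    qfim_cost = n_params ** 2       # full QFIM estimate
    return {
        "Adam": steps * grad_cost,                      # gradient only
        "QNG (full)": steps * (grad_cost + qfim_cost),  # QFIM every step
        "qBang": qfim_cost + steps * grad_cost,         # QFIM once, then O(n)
    }

budget = total_evaluations(n_params=60, steps=200)
```

For 60 parameters and 200 steps, qBang's one-time QFIM overhead of 3600 evaluations is small next to the 720 000 extra evaluations full QNG spends re-measuring the metric.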

Results
This section presents numerical results from noise-free simulations of the new optimizers applied to several important classes of problems. We focus exclusively on hybrid quantum-classical algorithms, which combine quantum and classical processing. The quantum circuits used for this study are available on GitHub [85], and additional information is provided in the SI.
Considering that quantum-circuit queries are costly, our main goal is to reduce the number of circuit evaluations needed to obtain the parameters encoding the ground state of the PQC. Therefore, the key metric is the number of circuit evaluations; see Section 2.3 for the corresponding scaling of each optimizer. Another important metric to assess the performance of the optimization is the approximation ratio, which describes how close the energy of the optimized quantum circuit is to the ground-state energy. Formally, the approximation ratio is defined as r = (E_opt − E_max)/(E_min − E_max), where E_opt is the energy obtained after optimization, and E_min and E_max are the theoretical minimum and maximum energy values, respectively. We compare the optimizers Adam [13], QNG [22] with the block-diagonal approximation, as well as qBroyden and qBang using either the full or the block-diagonal Fisher information in the first iteration. We largely exclude VarQITE in the following due to its prohibitive cost but show results for individual trajectories in SI Section A. It should be noted that the computational overhead of VarQITE might be reduced relative to gradient estimates when using advanced sampling techniques [86]. However, the cost of simulation with sampling is considerably larger than that of the state propagation employed here. For QNG and VarQITE, in case the QFIM is singular, we employ a Tikhonov regularization [87] and add 10⁻⁷ to its diagonal. Both qBroyden and qBang use an initial filter parameter of ε_0 = 0.2. For QNG and Adam, we use the default parameters provided in [88].
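The approximation ratio as defined above interpolates linearly between the extremal energies, taking the value 1 at the ground state and 0 at the highest-energy state. A one-line sketch:

```python
def approximation_ratio(e_opt, e_min, e_max):
    """r = (E_opt - E_max) / (E_min - E_max): 1 at the ground-state
    energy E_min, 0 at the highest-energy state E_max."""
    return (e_opt - e_max) / (e_min - e_max)

# E.g., an optimized energy of -0.95 on a spectrum spanning [-1, 1]:
r = approximation_ratio(e_opt=-0.95, e_min=-1.0, e_max=1.0)
```
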
We use identical step sizes for all algorithms to ensure a fair comparison but emphasize that the optimal step size will depend on the problem and algorithm at hand.Our investigation is comprehensive, accounting for statistical features in the random initialization, but not exhaustive, given the infinite combinations of hyperparameters and VQAs.

Barren plateau circuit
We start by illustrating the performance of the newly proposed optimizers on the BP circuit introduced in Ref. [18]. This quantum circuit was initially designed to show that highly expressible circuits come with a caveat: the more freedom we give a quantum circuit, the more difficult the optimization becomes due to vanishing gradients in the exponentially growing Hilbert space [20]. The consequence: simple gradient-based optimizers fail.
Our circuit consists of an initial fixed layer of R_y(π/4) gates acting on 9 qubits, followed by l layers of parameterized Pauli rotations with an entangling layer of controlled-Z gates. The objective operator is Ĥ = Ẑ_1 Ẑ_2 with a ground-state energy of −1. The relative quality of the optimization depends on the initial configuration, i.e., drawing a meaningful conclusion about the performance of an optimizer for a given problem requires a statistical analysis. In this manuscript, we obtain the expectation value ⟨ψ(θ)| Ĥ |ψ(θ)⟩ for a parametrization of the wavefunction, which is to be optimized. Our plots show the mean and variance of 25 trajectories with randomly initialized parameters (the same for all algorithms) and a step size of η = 0.01. The PQCs considered have 4, 6, 8, and 10 layers, respectively. Figure 2 illustrates the performance as a function of circuit evaluations using 4 and 6 layers.
The QNG (block-diagonal) optimizer shows a moderate improvement over Adam within the initial 5000 evaluations for a small set of parameters but loses this initial advantage in the long run. qBang, on the other hand, is substantially faster. Approximating the QFIM as block-diagonal reduces the computational cost of the first iteration, which explains the reduced number of evaluations required for the convergence of qBang (block-diag). The early plateau observed in the performance of qBroyden and qBang results from the upfront computational effort needed to estimate the QFIM. More relevant in practice is the number of circuit evaluations required to approximate the ground state accurately. To evaluate this, we determine the number of circuit evaluations necessary to reach an approximation ratio of 0.99 and present the results in Table 1. As shown in the table, qBang (block-diag) substantially outperforms Adam and QNG, requiring merely a third of the circuit evaluations.
While the BP circuit is of no practical use, it illustrates that qBang is a highly competitive optimizer when handling almost flat energy surfaces. We will briefly discuss classical optimization problems before moving on to quantum chemistry, arguably the most promising application for quantum computing to date.

Quantum Approximate Optimization Algorithm
Classical combinatorial optimization can be just as hard as the optimization of quantum systems. QAOA represents a subclass of VQAs that addresses the question of whether quantum computing could assist such classical combinatorial optimization.
We study the max-cut problem, for which the cost (or energy) of the classical problem is mapped to an Ising Hamiltonian [39]. The Hamiltonian for the max-cut problem is encoded using eight qubits on the quantum device. The optimization performance of the different optimizers is displayed in Fig. 3 against the number of circuit evaluations. The results are averaged over five random initializations of parameters with a step size of η = 0.06. We show the optimization trajectories for the 4- and 6-layered circuits in subplots (a) and (b), respectively. In Table 2, we compare the approximation ratios for the quantum state with the lowest expectation value, obtained by averaging over five trials for 4-, 6-, 8-, and 10-layered quantum circuits.
The optimization trajectories shown in Fig. 3 are similar in convergence behavior. One notable difference is the oscillations that qBroyden and qBang exhibit after many circuit evaluations when using the full Fisher information. The oscillations result from incomplete updates of the off-diagonal elements of the Fisher information, which push the optimization away from the optimal direction. We elaborate on this feature in SI Section A.1. Using the block-diagonal approximation ensures a smoother optimization. Alternatively, qBroyden and qBang could be reinitialized whenever instabilities occur.
Table 2 shows the approximation ratio averaged over five trajectories. Our proposed algorithms perform well on the 4- and 6-layered quantum circuits, while Adam outperforms all optimizers for 8 and 10 layers. Overall, we observe only minor differences in convergence behavior, and the significant deviation from the optimal solution demonstrates that QAOAs face a serious challenge. It is important to note that the circuit ansatz used is likely incapable of representing a quantum state near the ground state of the classical optimization problem.

Quantum Chemistry

We employed minimal basis sets for all quantum chemistry problems [89]. To construct the quantum circuits, we used the Jordan-Wigner fermion-to-qubit mapping and employed a hardware-efficient ansatz [35] that utilizes 8, 10, and 12 qubits for H 4 , LiH, and H 2 O, respectively. This ansatz is composed of l layers, each comprising a tunable R y (θ) gate on each qubit register, followed by a closed ring of CNOT gates. We compare the algorithms' performance with random and Hartree-Fock parameter initializations. Details of the molecular geometries and the Hartree-Fock parameter initialization can be found in the SI Section C. We used Pennylane [88] with the built-in PySCF interface [90] to set up our molecular systems and perform the fermion-to-qubit mapping.
Our results provide insight into the feasibility and limitations of hardware-efficient circuit ansätze for preparing the ground state of molecular systems. In addition to assessing the optimization performance, we also analyze the physical soundness of the generated quantum states with the lowest overall energy. To this end, we calculate various observables, including the particle number, N, the total spin projection observable, Ŝz, and the total spin observable, Ŝ², based on the optimized quantum state |ψ(θ)⟩.

Hydrogen square, H 4
We studied four hydrogen atoms, H 4 , arranged in a square geometry with a side length of 2.25 Å. Figure 4 presents the mean energy as a function of the number of circuit evaluations for circuits with two and four layers. qBang requires substantially fewer circuit evaluations, qBroyden is on par with Adam, and the performance of QNG is limited. The latter is likely due to the importance of off-diagonal components of the QFIM for correlated systems.
Upon further analysis of the quantum states generated by the PQCs, we find that, for all optimizers, the particle number ⟨N⟩ and total spin projection ⟨Ŝz⟩ observables are close to, but not in precise agreement with, those of the physical ground state (see Table 3). The deviations are most severe for the total spin ⟨Ŝ²⟩ and illustrate that the total energy is not the only observable of interest for the optimization in VQEs. This issue is a common challenge for hardware-efficient ansätze and stems from the choice of the circuit ansatz rather than from the optimization algorithm itself (see also SI Section B.3). We verified the numerics with an equivalent Qiskit implementation using the same hyperparameters and initial conditions, which produced the same optimization trajectory.

Lithium hydride, LiH
We studied LiH at a bond distance of 1.59 Å with the 1s orbital of Li frozen. Figure 5 shows that the conclusions drawn for H 4 largely transfer to LiH: qBang vastly outperforms its competitors and consistently finds the energy estimate closest to the ground state. Furthermore, once the optimum has been reached, the comparably small variance of the 10 trajectories indicates a reliable optimization process. Consistent with H 4 , ⟨Ŝ²⟩ challenges all optimizers (see Table 3).

Water, H 2 O
We studied H 2 O with an OH distance of 0.7 Å and an ∠(HOH) of 104.48° with the 1s orbital of O frozen. Figure 6 illustrates the mean expectation value as a function of the number of circuit evaluations for quantum circuits consisting of two and four layers, averaged over five trials. As before, qBang outperforms Adam and QNG. Interestingly, qBang with the full Fisher information is the only optimizer that manages to discover the exact ground-state energy of the system in one of the optimization trajectories for two layers. The optimized circuits corresponding to the state with the lowest overall energy are analyzed in Table 4, showing an overall good performance of qBang and Adam.
Overall, qBang delivers accurate results for quantum chemistry applications at a discount. An important question remains: How resilient is this observation against changes in hyperparameters or noise?

Hyperparameter resilience
Hyperparameter resilience is important for ensuring robust and reliable optimization outcomes, especially in quantum chemistry, where the objective is to find a particular quantum state. A hyperparameter-resilient optimizer increases the chances of successfully finding the optimal solution and reduces the additional overhead of hyperparameter tuning.
In Fig. 7, we investigate the effect of varying step size on the approximation ratio over the number of optimization steps in the BP circuit with 9 qubits and 5 layers. We use qBang, qBroyden, QNG with the block-diagonal approximation, and Adam as the optimization algorithms and optimize each circuit for 300 optimization steps. The approximation ratio, equal to one if the energy minimum is reached [see Eq. (11)], is used to evaluate the optimization performance. We show the approximation ratio plotted against the number of optimization steps for step sizes ranging from 0.01 to 0.7. Fig. 7 demonstrates the greatest strength of Adam: its extreme resilience. Even for large step sizes, such as 0.7, Adam remains stable and provides reliable predictions. Approximate or perturbative second-order optimization methods, such as QNG and qBroyden, are prone to instabilities when using large steps. They tend to produce unreliable predictions for the local curvature, which might further amplify a large step, resulting in oscillating or divergent behaviour. Let us emphasize that this is not a failure of second-order-informed optimization but rather of its approximation. Consider, for example, the step-reducing influence of second-order information in Newton's method for a steep harmonic potential.
Importantly, qBang benefits from the momentum update that it inherits from Adam and achieves a resilience located between Adam and QNG/qBroyden. An even stronger resilience of qBang could be realized by unifying the gradient update with the metric update, or by using a more controlled step size that depends on the local gradient and cost function, based, for example, on the Wolfe conditions [91]. Given the excellent performance in the previous section, we conclude that qBang is a promising optimizer that strikes a balance between low cost, high stability, speed, and accuracy.

Noise resilience
Understanding the resilience of quantum algorithms to various types of noise is crucial in the noisy intermediate-scale quantum (NISQ) era. Shot noise is one of the most fundamental contributors and arises due to the statistical nature of quantum measurements. Let us put our previous discussions in this context by first considering a simple BP circuit with 9 qubits and 6 layers, similar to the setup in Sec. 3.1. The step size is fixed at η = 0.01, and the results are averaged over 15 random initializations of parameters with 500 shots for each circuit evaluation.
Figure 8 demonstrates that all optimizers exhibit performance closely resembling that of exact state-vector simulations. Among them, qBang consistently finds the solution most efficiently. We note that, with shot noise, the estimate of the initial QFIM is not guaranteed to be positive semi-definite. If necessary, we ensure invertibility (and thus positive definiteness) of the initial QFIM by shifting the diagonal by the most negative eigenvalue λ_min < 0, as F_PD = F + (γ_reg − λ_min) 1; see SI Section H for details. Here, γ_reg > 0 is a small regularising parameter that ensures F_PD ≻ 0.
Next, we revisit quantum chemistry in the form of the H 4 circuit and add the SPSA [23] optimizer, often used in a noisy circuit setting, to our comparison. All optimizers are run for 700 steps, with the exception of SPSA, which is run for 50000 steps. The step size is fixed at η = 0.01. Figure 9 illustrates how qBang outperforms Adam, while SPSA fails to find the minimum. Surprisingly, the performance of qBang is even better when affected by noise, likely due to a slightly larger effective step when positive definiteness is enforced. Individual trajectories are presented in SI Section A.2. We can thus expect the improved performance of qBang to be of practical relevance for NISQ devices.
SPSA is a representative of a stochastic approach to optimization, closely related to random-walk algorithms, and we refer the reader to Refs. [23,92] for a detailed discussion and possible improvements. The isolated example shown here provides only anecdotal evidence and does not allow us to draw any conclusions about the superiority of stochastic or gradient-based approaches. We are indeed convinced that a synergistic approach could be the most promising.

Conclusion
Quantum computing has developed into a vibrant research domain, promising nothing less than a revolution. Whether this ambitious target can be met depends largely on the availability of fault-tolerant hardware and efficient algorithmic design. VQAs, merging quantum evaluations on short circuits with classical optimization of the parameterized state, are a promising framework for the use of near-term quantum computing resources. However, the associated energy landscapes often feature sizeable flat areas that are challenging to maneuver. Here, we have introduced qBang and qBroyden, curvature-informed gradient-based algorithms that perform better than previous approaches for relevant quantum circuits while requiring comparably few evaluations on the QPU. The reduction in quantum evaluations is achieved by performing rank-1 updates to the Fisher information matrix. Additionally, qBang utilizes a momentum-based update rule, providing an additional boost in performance and resilience to changes in hyperparameters. We provide access to qBang and qBroyden via the freely accessible repository [93].
Our benchmarks, including QNG and Adam, are evaluated on a broad range of VQAs. First, we demonstrated for a set of BP circuits [18] that qBang is able to tackle flat energy landscapes efficiently. Second, we investigated classical optimization on QAOA circuits in the form of the max-cut problem, resulting in an overall underwhelming performance of all optimizers. Third, we moved on to quantum chemistry, arguably the most promising application for quantum computing. The associated VQEs have been investigated for three chemical compounds, namely H 4 , LiH, and H 2 O, where qBang is consistently more efficient than its competitors. Lastly, we illustrated that qBang, i.e., the combination of qBroyden and Adam, does indeed lead to a more noise- and hyperparameter-resilient optimizer than QNG or qBroyden itself. qBang is an efficient and capable optimizer, yet the strongest aspect of our work is that it inspires a new generation of optimizers, with qBang representing a first step in an evolutionary process. Such an evolution will be fostered by understanding the consequences of locality, complexity, and entanglement on the existence of BPs [94,95].
With the increasing number of qubits and their connectivity, the number of quantum ansatz parameters will grow, resulting in increasing pressure on the classical optimizers. With this in mind, we suggest using qBang as a "convergence starter" for optimization problems that involve a sizeable number of ansatz layers. One potential approach is to optimize the first few layers and then keep those optimized layers with their parameters as an initial guess for the next few layers to optimize. This process can be repeated recursively until all layers are optimized and could significantly reduce the number of optimization steps required to find an acceptable ground-state energy. For a last refinement, one could use the VarQITE algorithm or restart the qBang algorithm by wiping the memory. Furthermore, the Fisher information matrix encodes information about the degree of linear dependence, i.e., it can be used to maximize the efficiency of additional layers and improve stability by controlling overparametrization [96]. To this end, it should be noted that an application to relevant problems on real-world devices remains a considerable challenge.
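The layer-wise warm-start strategy outlined above can be illustrated on a toy cost function. The `optimize` routine below (plain gradient descent with a numerical gradient) and the separable quadratic cost are hypothetical stand-ins of our own, not the authors' implementation:

```python
import numpy as np

def optimize(cost, theta0, steps=100, eta=0.1):
    """Stand-in optimizer: gradient descent with a central-difference gradient."""
    theta = theta0.copy()
    for _ in range(steps):
        grad = np.array([(cost(theta + h) - cost(theta - h)) / 2e-6
                         for h in (1e-6 * np.eye(len(theta)))])
        theta -= eta * grad
    return theta

def layerwise(cost_for_depth, params_per_layer, max_layers):
    """Optimize a few layers, keep them as the initial guess, then grow the circuit."""
    theta = np.zeros(0)
    for depth in range(1, max_layers + 1):
        # extend the parameter vector with a fresh, untrained layer
        theta = np.concatenate([theta, np.zeros(params_per_layer)])
        theta = optimize(cost_for_depth(depth), theta)
    return theta

# Toy cost: a separable quadratic per "layer", with its minimum at theta = 1.
cost_for_depth = lambda d: (lambda t: float(np.sum((t - 1.0) ** 2)))
theta = layerwise(cost_for_depth, params_per_layer=2, max_layers=3)
print(theta)
```

In a VQA setting, `cost_for_depth(d)` would be replaced by an expectation-value evaluation of the d-layer circuit, and `optimize` by qBang itself.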

A Single trajectories including QITE
In this section, we compare the performance of the qBang, qBroyden, QNG, and Adam optimizers, including QNG using the full quantum Fisher information matrix (QFIM) at each step. We consider a barren plateau (BP) circuit with 4 layers and 9 qubits, resulting in 36 tunable parameters. We optimize for 700 steps, resulting in varying numbers of circuit evaluations, since the QFIM requires n_θ² circuit evaluations while approximations such as the diagonal or block-diagonal approximation require only n_θ + l circuit evaluations, where n_θ is the number of variational parameters and l is the number of layers in the circuit. QNG using the QFIM is equivalent, up to a constant factor, to VarQITE [31]. QNG, qBang, and qBroyden require the QFIM in the first step, explaining the initial plateau in the number of circuit evaluations compared to Adam or the approximated versions. All optimizers, except for QNG with the block-diagonal approximation, converge to the exact ground-state solution. The results in Fig. 10 show that a single estimate of the QFIM, in combination with an appropriate cost-efficient metric update, is sufficient to speed up convergence to the desired ground state.

Figure 10: Comparison of the optimization performance of Adam, QNG, qBroyden, and qBang in finding the ground state of the BP circuit. ⟨ψ(θ)| Ĥ |ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.01. The PQCs used consist of 4 layers. All optimizers perform 700 steps, which results in a wide range of circuit evaluations due to the expensive estimation of the Fisher information. The initial plateau in the optimization using QNG, qBroyden, and qBang arises from the significant cost of initially measuring the QFIM.
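A quick bookkeeping check of these evaluation counts for the 4-layer, 9-qubit circuit described above:

```python
# Circuit-evaluation counts: the full QFIM needs n_theta^2 evaluations,
# while the (block-)diagonal approximations need only n_theta + l
# (numbers from the 4-layer, 9-qubit BP circuit in the text).
n_qubits, n_layers = 9, 4
n_theta = n_qubits * n_layers      # 36 tunable parameters
full_qfim = n_theta ** 2           # evaluations for one full QFIM estimate
approx = n_theta + n_layers        # evaluations for the block-diagonal variant
print(full_qfim, approx)           # 1296 vs 40
```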

A.1 Why updating the metric is important (ablation study)
In this subsection, we perform an ablation study to investigate the impact of the update-rule formula on optimization performance. We use a BP circuit with 9 qubits and 6 layers and average over 10 random parameter initializations.
We show in Fig. 11 that, for the first iterations, both algorithms perform similarly, but in the long run, without a metric update, oscillations appear in the system, preventing convergence of the optimization. To understand this behavior, let us recall that the Fisher information is a measure of how much a parametrized state changes under a change of a parameter [76]. This information can be understood as an adaptive step size for each parameter to optimize. However, since the energy landscape changes during the optimization, we need to adjust the Fisher information to ensure proper convergence. As shown in Figure 11, if we do not correct the metric, oscillations start after a few optimization steps, when the energy landscape has undergone a sufficient change and is no longer described by the initial QFIM. On the other hand, the quasi-Newton updates to the initial metric ensure that the gradient descent is more consistent and qBroyden finds the ground state quickly.

Figure 11: Effect of the metric update on the optimization performance in a 6-layer, 9-qubit BP circuit. The performance of qBroyden is compared for ε_0 = 0 and ε_0 = 0.2. When ε_0 = 0, the update rule Eq. (10) is not used. For both settings, the algorithms are initialized with the full Fisher information matrix. Results are averaged over 10 random parameter initializations with 300 optimization steps each.
The update rule is thus crucial and provides the necessary correction to adjust the curvature of the Fisher information matrix based on the current point in the energy landscape. This has two significant advantages. First, it reduces the number of circuit queries required, and second, it simplifies the algorithm's execution on the hardware because we only need to estimate the Fisher information once on the quantum device.
In summary, the ablation study in Fig. 11 shows that correcting the metric is essential to avoid oscillations and ensure convergence of the optimization process.

A.2 Analysis of H 4 optimization trajectories under shot noise
Revisiting the H 4 circuit with 2 layers, as discussed in Section 3.5 of the main document, we now shift our focus from averaged results to an examination of individual optimization trajectories. This approach provides a more granular view of the optimizer performance under shot-noise conditions. Each circuit evaluation is performed using 500 shots, and we observe the behavior across 5 random initializations.
Figure 13: The BP circuit ansatz. The ansatz consists of an initial layer of R y (π/4) gates followed by l layers of parameterized Pauli rotations and a controlled-Z entangling layer, initialized in the state |0⟩^n for all n qubit registers.
In the analysis presented in Fig. 12, qBang demonstrates reliable performance in finding the ground state and outperforms the Adam and simultaneous perturbation stochastic approximation (SPSA) optimizers. Notably, SPSA, despite running for 50000 steps, struggles to locate the minimum in several cases.

B Circuit layouts and Hamiltonians
This section collects all the circuit ansätze and Hamiltonian descriptions used for the benchmarks. All of the circuits are built with l layers. The more layers, the greater the expressivity of the circuit, which allows for potentially more accurate solutions but also increases the linear dependence of the parameters. All circuits are optimized in a closed loop with a classical optimization algorithm to minimize ⟨ψ(θ)| Ĥ |ψ(θ)⟩, where ψ(θ) describes the circuit ansatz.

B.1 Barren plateau circuit
BPs are a major obstacle in quantum computing, hindering its potential for solving complex problems [11,20]. The BP circuit is an example of this phenomenon and utilizes the objective operator Ĥ = Ẑ_1 Ẑ_2 with a ground-state energy of −1. The circuit is initialized in the state |0⟩^n and consists of an initial fixed layer of R y (π/4) gates acting on n qubits, followed by l layers of parameterized Pauli rotations with an entangling layer of controlled-Z gates, as shown in Figure 13. This circuit is a critical benchmark for understanding and addressing the BP problem in quantum computing.
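A minimal statevector sketch of such a BP circuit follows. It assumes the gate ordering described above (an R_y(π/4) layer, then l layers of single-qubit Pauli rotations followed by a ring of controlled-Z gates); the 4-qubit size, the random Pauli choices, and the random angles are illustrative assumptions of ours:

```python
import numpy as np

# Single-qubit operators
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def rot(P, angle):
    """Pauli rotation exp(-i * angle/2 * P)."""
    return np.cos(angle / 2) * I2 - 1j * np.sin(angle / 2) * P

def on_qubit(gate, q, n):
    """Embed a single-qubit gate on qubit q of an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, gate if k == q else I2)
    return out

def cz(i, j, n):
    """Controlled-Z between qubits i and j (diagonal gate)."""
    d = np.ones(2 ** n, dtype=complex)
    for b in range(2 ** n):
        if (b >> (n - 1 - i)) & 1 and (b >> (n - 1 - j)) & 1:
            d[b] = -1.0
    return np.diag(d)

def bp_circuit_state(thetas, paulis, n):
    """|0>^n -> Ry(pi/4) layer -> l layers of Pauli rotations + CZ ring."""
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    for q in range(n):
        psi = on_qubit(rot(Y, np.pi / 4), q, n) @ psi
    for layer_thetas, layer_paulis in zip(thetas, paulis):
        for q in range(n):
            psi = on_qubit(rot(layer_paulis[q], layer_thetas[q]), q, n) @ psi
        for q in range(n):  # ring of controlled-Z entanglers
            psi = cz(q, (q + 1) % n, n) @ psi
    return psi

n = 4
rng = np.random.default_rng(0)
paulis = [[[X, Y, Z][rng.integers(3)] for _ in range(n)] for _ in range(2)]
thetas = rng.uniform(0, 2 * np.pi, size=(2, n))
psi = bp_circuit_state(thetas, paulis, n)
H = on_qubit(Z, 0, n) @ on_qubit(Z, 1, n)          # objective Z_1 Z_2
energy = float(np.real(psi.conj() @ H @ psi))
print(energy)                                       # lies in [-1, 1]
```

The dense-matrix construction is only practical for a handful of qubits; it serves to make the circuit structure concrete, not as a simulator.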

B.2 Quantum approximate optimization algorithm circuit ansatz
The Quantum Approximate Optimization Algorithm (QAOA) is a quantum algorithm that can be used to solve combinatorial optimization problems. One such problem is the max-cut problem, which involves partitioning the set of vertices of a graph into two disjoint subsets such that the number of edges between the subsets is maximized [36].
The max-cut problem is mapped onto a quantum optimization problem by constructing a cost Hamiltonian Ĥ_C that encodes the objective function of the max-cut problem. The cost Hamiltonian is defined as

Ĥ_C = ∑_{(i,j)∈E} ½ (1 + Ẑ_i Ẑ_j),

where E is the set of edges in the graph, and Ẑ_i and Ẑ_j are the Pauli Z operators acting on the qubits corresponding to vertices i and j, respectively. The cost Hamiltonian penalizes states in which neighboring vertices are in the same subset, since the corresponding edge contributes 1 to the energy in these states. The quantum circuit uses two non-commuting operators, the cost Hamiltonian and the mixing Hamiltonian, to evolve the system towards states that optimize the cost function. The mixing Hamiltonian is typically a sum of Pauli X operators, acting as a "driver" that moves the system away from the initial state and encourages exploration of different states.

Figure 14: The QAOA circuit ansatz. It is composed of alternating layers of the cost Hamiltonian and the mixing Hamiltonian. The circuit is initialized in the state |0⟩^n, where n is the number of qubits required by the cost Hamiltonian. The parameters of the circuit are optimized to maximize the expected value of the cost function.

Figure 15: The hardware-efficient circuit ansatz. It is composed of l layers of parametrized single-qubit R_y rotations and a ring of CNOT gates to entangle the qubits. The circuit is applied to n qubits, with the parameters optimized to minimize the energy of the molecular system.
Figure 14 shows a QAOA circuit ansatz with one layer, applying the cost and mixing Hamiltonians. The circuit is initialized in the state |0⟩^n, which is transformed into the uniform superposition state |+⟩^n via Hadamard gates. The QAOA provides an approximation to the optimal solution, with the quality of the approximation expected to improve as the number of layers l is increased.
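Because the cost Hamiltonian is diagonal in the computational basis, its spectrum can be checked classically by brute force: following the text, each edge whose endpoints lie in the same subset contributes (1 + z_i z_j)/2 = 1 to the energy. A small sketch for an illustrative 4-cycle graph (our choice, not from the text):

```python
from itertools import product

# Max-cut cost evaluated on computational-basis states; each uncut edge
# contributes (1 + z_i * z_j)/2 = 1 to the energy. Example graph: a 4-cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4

def cost_energy(bits):
    """Energy of a bitstring: bit 0 -> Z eigenvalue +1, bit 1 -> -1."""
    z = [1 - 2 * b for b in bits]
    return sum((1 + z[i] * z[j]) / 2 for i, j in edges)

energies = {bits: cost_energy(bits) for bits in product([0, 1], repeat=n)}
ground = min(energies.values())        # ground energy = |E| - (max cut size)
max_cut = len(edges) - ground
print(ground, max_cut)                 # the 4-cycle is bipartite: cut all 4 edges
```

The alternating partition 0101 cuts every edge of the cycle, so the ground energy is 0 and the maximum cut has size 4; QAOA aims to prepare low-energy states of exactly this diagonal operator.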

B.3 Chemistry applications
We employed minimal basis sets (STO-6G) for all quantum chemistry problems and used a frozen-core approximation for LiH and H 2 O (with the 1s orbital of Li and O, respectively, frozen) [89]. We used a hardware-efficient ansatz (HEA) [35] that utilizes 8, 10, and 12 qubits for H 4 , LiH, and H 2 O, respectively. This ansatz is composed of l layers, each comprising a tunable R y (θ) gate on each qubit register, followed by a closed ring of CNOT gates. A 1-layer motif of the HEA for 4 qubits can be seen in Fig. 15. In the following, we list the geometries of all the studied molecular problems (in the xyz format and atomic units). We provide a Python implementation of the circuits and Hamiltonians used in this work in [85].
HEAs, like the R_y ansatz shown in Fig. 15, are commonly used in quantum computing studies of chemical and physical systems. It is, however, not trivial, and thus an active field of research, how increasing the number of layers affects the "expressivity" (how well |ψ(θ)⟩ can approximate the target |Ψ⟩) of a HEA [37,97,98,99,100]. This effect can be seen in the slow convergence of the total energy of H 4 with the number of ansatz layers, see Fig. 15. Nevertheless, we chose to study HEAs in this work since (a) they are desirable to use as they lead to smaller errors due to hardware noise [35], and (b) it was proven that the gradient vanishes exponentially for deep, randomly initialized HEAs [18,19], making them a demanding benchmark for optimizers.

C Initialization using Hartree-Fock parameters
We present the performance starting from the Hartree-Fock parameter initialization in Fig. 16. We compare the optimization performance of four different optimizers, namely Adam, quantum natural gradient (QNG) with the block-diagonal approximation, qBroyden with the full QFIM, and qBang with the block-diagonal and full QFIM, in finding the ground state of H 4 using a variational quantum circuit. The step size for each optimizer is set to 0.01. We employ parameterized quantum circuits (PQCs) with varying numbers of layers, from 1 to 4, to explore the impact of circuit depth on the optimization performance of each optimizer. To ensure the robustness of our results, we perform 15 independent optimization runs, each with a randomly perturbed Hartree-Fock parameter initialization. Overall, we see stable convergence behavior for the chosen circuit ansatz. All optimizers converge to the same minimum.

D Collection of Algorithms
This section summarizes all the optimization algorithms introduced in this work. qBroyden is a quasi-Newton method that approximates the QFIM using rank-one updates. In each iteration, the inverse QFIM is updated using an update rule that depends on the gradient and parameter differences between the current and previous iterations. Algorithm 2 presents the pseudo-code for qBroyden.
qBang is an extension of qBroyden that incorporates both the approximation of the QFIM and momentum. In each iteration, the gradients are first normalized using the adaptive moment estimation (Adam) method, and then a preconditioned gradient step is taken using the inverse QFIM. Similar to qBroyden, qBang can also incorporate QNG, the QFIM, or the identity matrix as a preconditioner. Algorithm 1 presents the pseudo-code for qBang.
Momentum QNG combines momentum optimization with QNG. In each iteration, we utilize an Adam-inspired [13] update for the momentum and then take a natural gradient step using both the momentum and the QNG approximation of the QFIM. Algorithm 5 presents the pseudo-code for Momentum QNG.

E Relation between Fisher information and Hessian
For certain classes of classical optimization problems, the natural gradient method is equivalent to the Newton method. Here, we describe a class of problems where the Fisher information matrix (FIM) and the Hessian are related. This relationship is well known in the literature; see, e.g., Ref. [101].
Let the random variable X ∈ D_X be distributed according to the probability density function p(X; θ), where the distribution is parametrized by the continuous parameter vector θ. Through the Cramér-Rao lower bound, the FIM describes how well θ can, ideally, be estimated from observations of X. The FIM is defined as

F_ij(θ) = E_X[ ∂_{θ_i} log p(X; θ) ∂_{θ_j} log p(X; θ) ].

A required condition of regularity permits us to exchange the order of integration and differentiation¹, and the FIM can then be expressed through second-order derivatives as

F_ij(θ) = −E_X[ ∂_{θ_i} ∂_{θ_j} log p(X; θ) ].

Let us assume we are dealing with a stochastic optimization problem, where the task is to minimize some loss function L. That is, we want to minimize the expectation over X of some parametrized error function l(X; θ), as L = E_X[l(X; θ)]. Newton-based optimization involves the Hessian of L, which has the elements

[∇²_θ L]_ij = E_X[ ∂_{θ_i} ∂_{θ_j} l(X; θ) ].

If the density is exponential in the error function, p(X; θ) = exp(b(θ) − l(X; θ)), then ∂_{θ_i} ∂_{θ_j} log p = ∂_{θ_i} ∂_{θ_j} b(θ) − ∂_{θ_i} ∂_{θ_j} l(X; θ), and therefore

F_ij(θ) = [∇²_θ L]_ij − ∂_{θ_i} ∂_{θ_j} b(θ),

i.e., up to the curvature of the normalization term b(θ), the Hessian and the Fisher information matrix overlap exactly. In practice, this means that a class of problems where the natural gradient method is equivalent to the Newton method are those where the probability density function is exponential in the error function, i.e., p(X; θ) = exp(b(θ) − l(X; θ)).
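A quick Monte-Carlo illustration of this relation for a toy Gaussian model (our example, not from the paper): with l(X; θ) = (X − θ)²/2, both the FIM and the Hessian of L equal 1.

```python
import numpy as np

# Gaussian model p(x; theta) = exp(b - l(x; theta)), l = (x - theta)^2 / 2,
# so log p = b - (x - theta)^2 / 2 with b = -log(sqrt(2*pi)) constant in theta.
rng = np.random.default_rng(1)
theta = 0.7
x = rng.normal(theta, 1.0, size=200_000)   # samples from p(x; theta)

score = x - theta                          # d/dtheta log p = (x - theta)
fisher = np.mean(score ** 2)               # E[(d log p / dtheta)^2] -> 1
d2_logp = -np.ones_like(x)                 # d^2/dtheta^2 log p = -1 for every x
fisher_alt = -np.mean(d2_logp)             # regularity form: -E[d^2 log p] = 1
hessian = np.mean(np.ones_like(x))         # d^2 l / dtheta^2 = 1 => Hessian of L = 1
print(round(float(fisher), 2))
```

Here b(θ) is constant, so its curvature vanishes and the Hessian and FIM coincide exactly, as the derivation above requires.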

F Properties of the approximate metric
For the optimization algorithms we have introduced, the update rule is applied to iterate on the metric. If the initial matrix B_0 is positive semi-definite (B_0 ⪰ 0), the update rule preserves this property for all B_k. To see this, first assume B_k ⪰ 0. Then it holds that (1 − ε_k)B_k ⪰ 0 for all ε_k ∈ (0, 1). Next, ε_k ∇L_k ∇L_k^⊤ ⪰ 0 for all ε_k > 0, because

x^⊤ ∇L_k ∇L_k^⊤ x = ⟨∇L_k , x⟩² ≥ 0 (22)

for all x. The sum of two positive semi-definite matrices is again positive semi-definite.
Additionally, if we initialise B_0 ≻ 0, we preserve B_k ≻ 0 for all k. Consequently, it follows that B_k^{-1} exists and is positive definite for all k.
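This positive-definiteness argument can be checked numerically. The update rule B_{k+1} = (1 − ε_k)B_k + ε_k ∇L_k ∇L_k^⊤ with ε_k = ε_0/(k + 1) is taken from the text, while the random B_0 and the stand-in gradient vectors are our own:

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 5
A0 = rng.normal(size=(dim, dim))
B = A0 @ A0.T + 0.1 * np.eye(dim)     # random positive-definite B_0
eps0 = 0.3
for k in range(50):
    g = rng.normal(size=dim)           # stand-in for the gradient at step k
    eps_k = eps0 / (k + 1)
    B = (1 - eps_k) * B + eps_k * np.outer(g, g)   # metric update
    # B_k stays positive definite: convex mix of a PD and a PSD matrix
    assert np.min(np.linalg.eigvalsh(B)) > 0
print(float(np.min(np.linalg.eigvalsh(B))))
```

The loop never trips the assertion because B_k ⪰ ∏_j (1 − ε_j) B_0 ≻ 0, mirroring the proof above.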
To provide further intuition for the algorithm, we study the long-term behavior of the metric under the update rule. Each step taken in the parameter space is defined by the vector ∆_k = B_k^{-1} ∇L(θ_k). Inserting the update rule B_{k+1} = (1 − ε_k)B_k + ε_k ∇L_k ∇L_k^⊤ explicitly into ∆_{k+1} and applying the Sherman-Morrison formula, we get

∆_{k+1} = (1 − ε_k)^{-1} [ B_k^{-1} − (ε_k B_k^{-1} ∇L_k ∇L_k^⊤ B_k^{-1}) / (1 − ε_k + ε_k ∇L_k^⊤ B_k^{-1} ∇L_k) ] ∇L(θ_{k+1}). (25)

Since lim_{k→∞} ε_k = 0, for sufficiently large k the effective step is ∆_k ≈ B_{k−1}^{-1} ∇L(θ_k). Let us denote the second term inside the brackets of Eq. (25) by I_k and refer to it as the innovation at each step. Expanding from the initial point and defining ε_{−1} = 0, the generic step can be written

∆_k = [∏_{j=0}^{k−1}(1 − ε_j)]^{-1} (B_0^{-1} − Γ_k) ∇L(θ_k),

where Γ_k is the matrix of corrections to the metric picked up by the innovations from the first k − 1 steps. We have that ∏_{k=0}^{n}(1 − ε_k) ∼ n^{−ε_0} as n → ∞. Since ε_k = ε_0/(k + 1), the innovations are attenuated ∝ k^{−1} and, after some number of steps k′ ≫ 1, the innovations can be considered negligible. In this regime, where k > k′, the step taken is ∆_k ∝ (k − 1)^{ε_0} (B_0^{-1} − Γ_{k′}) ∇L(θ_k), where the approximate metric B_0^{-1} − Γ_{k′} can be considered constant.
This behavior invites a possible modification to the algorithms, where, if convergence has not been achieved after k ′ steps, the metric is reinitialized at the current parameters by computing the full FIM matrix at θ k ′ and the algorithm restarted.
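The attenuation claim, ∏_{k=0}^{n}(1 − ε_k) ∼ n^{−ε_0} for ε_k = ε_0/(k + 1), can be verified numerically; the value ε_0 = 0.2 is an illustrative choice:

```python
import numpy as np

eps0 = 0.2

def prefactor(n):
    """Compute prod_{k=0}^{n} (1 - eps0 / (k + 1))."""
    ks = np.arange(n + 1)
    return np.prod(1.0 - eps0 / (ks + 1))

# If prefactor(n) ~ C * n^(-eps0), the ratio below approaches the constant C.
r1 = prefactor(2_000) / 2_000 ** (-eps0)
r2 = prefactor(20_000) / 20_000 ** (-eps0)
print(round(float(r1), 3), round(float(r2), 3))
```

The two ratios agree to three digits, confirming the power-law decay of the prefactor that drives the freezing of the metric.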

G Connection of VarQITE and QNG
As stated in the main text, there is a close relationship between the QFIM, F, and the Fubini-Study metric, A, which is given by

A_ij = Re( ⟨∂_{θ_i}Φ|∂_{θ_j}Φ⟩ − ⟨∂_{θ_i}Φ|Φ⟩⟨Φ|∂_{θ_j}Φ⟩ ),

where ∂_{θ_i} denotes the derivative with respect to the parameter θ_i. The Fubini-Study metric [60,61,62,63,64] is the metric of parametrized pure quantum states |Φ(θ)⟩. A can be expressed as the real part of a more general quantum geometric tensor (QGT) [102,63,103,104] whose imaginary part corresponds to the Berry geometrical phase [105,106,71,63].
For pure states (as we consider exclusively in this work) the Fubini-Study metric (in matrix form) is, up to a factor of 4, equivalent to the QFIM [65,24,66,67], i.e., F = 4A. The factor of 4 could, however, be absorbed by a change of variables [71] or in the time step δτ = η/4, as we did in the main text. Thus we use the terms Fubini-Study metric/QFIM and the variables A and F interchangeably in the main text.
The matrices A and F describe the geometry of the parameter space rather than the energy landscape. The second term of Eq. (8) resolves a possible arbitrary overall phase mismatch between |Φ(θ(τ))⟩ and the target state |Ψ(τ)⟩ along the imaginary-time propagation [59,32]. Using different variational principles (time-dependent/Dirac-Frenkel) [32,107] yields slightly different equations for the metric and gradient, resulting in possibly complex values of the parameters θ (see Ref. [32] for details). As θ usually refers to real-valued angles of rotational gates in a PQC, solving Eq. (3) from the main text using McLachlan's variational principle is preferred in the VarQITE setting, as it ensures real-valued solutions for ∂θ/∂τ. If |Φ⟩ and ∂_{θ_i}|Φ⟩ are real (not to be confused with real parameters), the second term in Eq. (8) vanishes due to the normalization of |Φ⟩, ⟨Φ|Φ⟩ = 1. Due to the above-mentioned relation between the Fubini-Study metric and the QFIM, F = 4A, Eq. (6) from the main text reveals that QNG is equivalent to VarQITE when the energy of the system is used as the cost function, L = ⟨Ĥ⟩, and η = 4δτ.
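For a single-qubit example of our choosing, |Φ(θ)⟩ = R_y(θ)|0⟩, the Fubini-Study metric evaluates to A = 1/4 and hence F = 4A = 1 for every θ. A finite-difference sketch of the metric (overlap of state derivatives minus the phase-correction term):

```python
import numpy as np

def state(theta):
    """|Phi(theta)> = Ry(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

def fubini_study(theta, h=1e-5):
    """A = Re(<dPhi|dPhi> - <dPhi|Phi><Phi|dPhi>) via central differences."""
    phi = state(theta)
    dphi = (state(theta + h) - state(theta - h)) / (2 * h)
    overlap = np.vdot(dphi, dphi)
    phase = np.vdot(dphi, phi) * np.vdot(phi, dphi)   # phase-correction term
    return float(np.real(overlap - phase))

A = fubini_study(0.3)
F = 4 * A            # QFIM, by the relation F = 4A
print(A, F)
```

The θ-independence reflects that a single rotation moves the state at constant speed on the Bloch sphere; for multi-parameter circuits the off-diagonal entries of A are what the block-diagonal approximation discards.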
Additionally, VarQITE is closely related to the stochastic reconfiguration (SR) method of Sorella [108,109,110], which is a second-order iterative approximation to the "classical" ITE.

H Ensuring Positive Definiteness of the Quantum Fisher Information Matrix
In the presence of noise, especially shot noise, the method used to estimate the QFIM may produce a matrix that is not positive semi-definite. This is problematic as it could adversely affect the optimization process, potentially leading to unstable or divergent behavior. To mitigate this issue, we employ diagonal loading to ensure that the QFIM remains positive definite (PD). The method is straightforward but crucial for the robustness of our optimization algorithms. We first compute the eigenvalues of the QFIM. If the matrix has any negative eigenvalues, we identify the most negative one, say λ_min. We then add (γ_reg − λ_min) times the identity matrix to the QFIM, where γ_reg is a small regularising parameter. Mathematically, this can be expressed as

F_PD = F + (γ_reg − λ_min) 1.

Here, F is the original QFIM and 1 is the identity matrix of the same dimension as F.
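A minimal sketch of this diagonal-loading step (the γ_reg value and the example matrix are illustrative choices of ours):

```python
import numpy as np

def make_positive_definite(F, gamma_reg=1e-3):
    """Return F_PD = F + (gamma_reg - lambda_min) * I when lambda_min < 0."""
    lam_min = np.min(np.linalg.eigvalsh(F))
    if lam_min < 0:
        F = F + (gamma_reg - lam_min) * np.eye(F.shape[0])
    return F

# Example: a symmetric matrix with eigenvalues 3 and -1.
F = np.array([[1.0, 2.0], [2.0, 1.0]])
F_pd = make_positive_definite(F)
print(np.min(np.linalg.eigvalsh(F_pd)))   # smallest eigenvalue shifted to gamma_reg
```

After the shift, the smallest eigenvalue equals γ_reg, so F_PD is invertible and can safely be used as a preconditioner.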

Figure 1: A diagrammatic representation of a VQA consists of three main elements: an objective function that defines the problem to be solved, a PQC Û(θ) in which the parameters θ are adjusted to minimize the objective, and a classical optimizer that performs this minimization. The inputs to a VQA are the circuit ansatz and the initial parameter values θ_0, while the outputs are the optimized parameter values θ* and the minimum value of the objective function, ⟨ψ(θ)| Ô |ψ(θ)⟩.

Figure 2: Comparison of the optimization performance of Adam, QNG, qBroyden, and qBang in finding the ground state of the BP circuit. ⟨ψ(θ)| Ĥ |ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.01, and the results are averaged over 25 random initializations of parameters. The PQCs used consist of 4 and 6 layers, as depicted in subplots (a) and (b), respectively. The initial plateau in the optimization using qBroyden and qBang arises from the significant cost of initially measuring the QFIM.

Figure 3 :
Figure 3: Ground state optimization performance of Adam, QNG with the block-diagonal approximation, qBroyden with the full Fisher information matrix, and qBang with both the full Fisher matrix and the block-diagonal approximation for the QAOA circuit of an eight-qubit max-cut problem instance using a PQC. The expectation value ⟨ψ(θ)| Ĥ|ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.06, and the results are averaged over five random initializations of parameters. The PQC used consists of 4 and 6 layers, as depicted in subplots (a) and (b), respectively.

Figure 4 :
Figure 4: Comparison of the optimization performance of Adam, QNG using the block-diagonal approximation, qBroyden using the full Fisher matrix, and qBang with the full Fisher information and the block-diagonal approximation in finding the ground state of H 4 using a PQC. The expectation value ⟨ψ(θ)| Ĥ|ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.01, and the results are averaged over 15 random initializations of parameters. The PQC consists of 2 and 4 layers, as shown in subplots (a) and (b), respectively. The initial plateau in the optimization using qBroyden and qBang arises from the significant cost of initially measuring the QFIM.

Figure 5 :
Figure 5: Comparison of the optimization performance of four optimizers in finding the ground state of LiH using a PQC. The optimizers evaluated are Adam, QNG using the block-diagonal approximation, qBroyden using the full Fisher information matrix, and qBang with the full Fisher information and the block-diagonal approximation. The expectation value ⟨ψ(θ)| Ĥ|ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.01, and the results are averaged over 5 random initializations of parameters. The PQC used consists of 2 and 4 layers, as shown in subplots (a) and (b), respectively. The initial plateau in the optimization using qBroyden and qBang arises from the significant cost of initially measuring the QFIM.

Figure 6 :
Figure 6: Comparison of the optimization performance of four optimizers in finding the ground state of H 2 O using a PQC. The optimizers evaluated are Adam, QNG using the block-diagonal approximation, qBroyden using the full Fisher matrix, and qBang with the full Fisher information and the block-diagonal approximation. The expectation value ⟨ψ(θ)| Ĥ|ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.01, and the results are averaged over 5 random initializations of parameters. The PQC consists of 2 and 4 layers, as shown in subplots (a) and (b), respectively. The initial plateau in the optimization using qBroyden and qBang arises from the significant cost of initially measuring the QFIM.

Figure 7 :
Figure 7: Dependence of the convergence behavior on the learning rate, comparing the effects of different step sizes on the optimization process. Four optimization algorithms, qBang, qBroyden, QNG with the block-diagonal approximation, and Adam, are evaluated with step sizes ranging from 0.01 to 0.7. The optimization performance is assessed using the approximation ratio, which equals one if the energy minimum is reached (see Eq. (11)). A dotted line at a step size of 0.01 is included to facilitate comparison with the other simulations.

Figure 8 :
Figure 8: Comparison of the optimization performance of Adam, QNG, qBroyden, and qBang in finding the ground state of the BP circuit under the influence of shot noise. ⟨ψ(θ)| Ĥ|ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.01, and the results are averaged over 15 random initializations of parameters. The PQC used consists of 6 layers, and 500 shots are used for each evaluation. The initial plateau in the optimization using qBroyden and qBang arises from the significant cost of initially measuring the QFIM.

Figure 9 :
Figure 9: Comparison of the optimization performance of SPSA, Adam, and qBang with the block-diagonal approximation in finding the ground state of H 4 using a 2-layer PQC. The expectation value ⟨ψ(θ)| Ĥ|ψ(θ)⟩ is shown as a function of the number of circuit evaluations. The step size is fixed at η = 0.01, and 500 shots are used for each evaluation. The results are averaged over 5 random initializations. Individual trajectories are presented in SI Section A.2.

Figure 12 :
Figure 12: Individual optimization trajectories for the H 4 circuit with a 2-layer PQC. The expectation value ⟨ψ(θ)| Ĥ|ψ(θ)⟩ is shown for each circuit evaluation. The step size is fixed at η = 0.01. Each line represents a separate optimization run, illustrating the variability among trajectories.

Table 1 :
Comparison of the number of circuit evaluations required for four optimizers to reach an approximation ratio of r = 0.99 for the BP circuit, with the results averaged over the 25 optimization trajectories. The PQCs used have 4, 6, 8, or 10 layers. "bd" indicates the block-diagonal approximation.

Table 2 :
Ground state energy approximation ratios of Adam, QNG with the block-diagonal approximation, qBroyden, and qBang with the full Fisher information and the block-diagonal approximation for the max-cut Ising Hamiltonian. Results for PQCs with 4, 6, 8, and 10 layers are shown. The values are obtained from the quantum state with the expectation value closest to the ground state, averaged over the five optimization pathways with a maximum length of 1100 optimization steps. "bd" indicates the block-diagonal approximation.

Table 3 :
Converged optimization results for PQCs representing H 4 and LiH. Results for H 4 are averaged over 15 optimization trajectories, while results for LiH are averaged over 10 optimization trajectories. The ground truth for each observable is shown in the column ⟨ Ô⟩ Ψ. Observables are calculated for circuits with 1 to 4 layers, based on the variational quantum state with the minimum expectation value along the optimization trajectory. Bold symbols indicate the optimizer that comes closest to the ground truth. The column labeled qBang shows results obtained by starting with the full Fisher information matrix, and the column to its right, labeled F_{k=0} block-diag, shows results obtained by starting with the block-diagonal approximation.

Table 4 :
Converged optimization results for PQCs representing H 2 O. Results are averaged over five optimization trajectories. The ground truth for each observable is shown in the column ⟨ Ô⟩ Ψ. Observables are calculated for circuits with 1 to 4 layers, based on the variational quantum state with the minimum expectation value along the optimization trajectory. Bold symbols indicate the optimizer that comes closest to the ground truth. The column labeled qBang shows results obtained by starting with the full Fisher information matrix, and the column to its right, labeled F_{k=0} block-diag, shows results obtained by starting with the block-diagonal approximation.
Listing 1: H 4 geometry in xyz-format and atomic units