# Variational quantum algorithms and geometry

This is a Perspective on "Quantum Natural Gradient" by James Stokes, Josh Izaac, Nathan Killoran, and Giuseppe Carleo, published in Quantum 4, 269 (2020).

By John Napp (Center for Theoretical Physics, MIT).

## Variational quantum algorithms

Recent years have seen tremendous progress in the design of programmable quantum devices. Nonetheless, the construction of a scalable and fault-tolerant quantum computer appears to remain a more distant goal. Given that so-called noisy, intermediate-scale quantum (NISQ) devices [9] are coming online far earlier than full-scale quantum computers, a natural and important question is then whether NISQ devices admit a quantum speedup for a useful computational problem. To better understand the computational power of such devices, a number of NISQ-friendly algorithms have been proposed which require only modest quantum resources as compared with those required for scalable, fault-tolerant quantum computation.

One such class of algorithms could be described as hybrid quantum-classical variational algorithms. In this setting, it is typically assumed that the quantum device is capable of preparing states described by some variational ansatz. For example, the quantum device might be capable of preparing some variational set $\{| \psi_\theta \rangle \}_{\theta}$ parameterized by a real vector $\theta \in \mathbb{R}^d$, and of performing simple measurements on the state. Specializing (for example) to the case of the variational quantum eigensolver (VQE) [8], the goal is to minimize an objective function of the form $\mathcal{L}(\theta) \equiv \langle \psi_{\theta} | H | \psi_\theta \rangle$, where $H$ corresponds to some physical Hamiltonian of interest, so that the minimal value attained by $\mathcal{L}$ approximates the true ground state energy (assuming a good choice of variational ansatz is used). While the quantum computer is used to estimate $\mathcal{L}(\theta)$ at some particular point in parameter space $\theta$ via state preparations and measurements, a classical outer loop is used to perform the optimization of the objective function.

## Geometrical questions

A very broad question is then: what is the best strategy for minimizing $\mathcal{L}$? One possibility is to simulate the imaginary-time dynamics of the state under $H$ to drive it toward the ground state, while constraining the state to remain within the variational set. A method for doing this was described by McArdle et al. [6]. Another natural possibility, suggested in a number of works [11,3,10,2,7], is to simply use the quantum device to estimate the gradient $\nabla \mathcal{L}(\theta)$ and perform the optimization via stochastic gradient descent or some other general-purpose gradient-based algorithm.

Lurking in both of these approaches were unresolved geometrical questions. In the case of imaginary time evolution, McArdle et al. found that that the correct update rule for a step of their algorithm involved multiplication by the inverse of a matrix of the form $A_{ij} = \Re(\langle \frac{\partial}{\partial \theta_i} \psi_\theta | \frac{\partial}{\partial \theta_j} \psi_\theta \rangle)$. Clearly this matrix was encoding some form of geometrical information about the variational ansatz, but its interpretation from the perspective of optimization was unclear. In the case of gradient-based optimization algorithms, it was pointed out [4] that Euclidean space might not be the most natural for variational algorithms, and that indeed non-Euclidean generalizations of stochastic gradient descent could potentially achieve faster convergence in some settings.

In the present work of Stokes, Izaac, Killoran, and Carleo, the authors elucidate the geometry of parameterized quantum circuits and propose a variational quantum algorithm which corresponds to iteratively moving in the direction of steepest descent with respect to this geometry. In fact, their proposed algorithm turns out to be equivalent to a variant of the ansatz-contrained imaginary time evolution proposed by McArdle et al., providing a satisfying geometric interpretation of the latter. The algorithm that they propose is a quantum generalization of a classical optimization algorithm known as natural gradient descent [1]. Before discussing their contribution further, we’ll first take a detour to the classical regime to introduce this algorithm which sets the stage for their work.

Suppose that instead of parameterized quantum states $| \psi_\theta \rangle$, we have parameterized probability distributions $p_\theta$ and some objective function $\mathcal{L}(\theta)$ to minimize which depends only on $p_\theta$. Now, suppose we iteratively minimize $\mathcal{L}$ via gradient descent. Mathematically, this corresponds to the update rule
$$\theta_{t+1} = \theta_t – \eta \nabla \mathcal{L}(\theta_t),$$
where the small parameter $\eta > 0$ controls the stepsize. Each iteration corresponds to a step in the direction opposite to the gradient. While such a strategy seems reasonable, upon reflection we see that there is something not quite optimal about it. Namely, the goal in each iteration would ideally be to move in the direction of steepest descent. But in fact, while the negative gradient points in the direction of steepest descent in Euclidean space, this is not true in other geometries.

This distinction is relevant because Euclidean space is certainly not the most natural if the objective function depends only of the distribution, and therefore not on how the distribution is parameterized. For instance, if the geometry of the parameter space is chosen to be Euclidean, relative distances between different distributions will generally be dependent on how those distributions happen to be parameterized, despite the fact that the objective function is parameterization-invariant. It would therefore be ideal to work in a geometry which takes this into account, and replace gradient descent with a variant which moves in the direction of steepest descent with respect to this more natural geometry. In particular, the behavior of the resulting algorithm should be invariant under arbitrary re-parameterizations of the distribution.

The algorithm which does this is known as natural gradient descent, and has an update rule similar to that of gradient descent, except that the gradient is replaced by the so-called natural gradient, obtained by premultiplying the gradient by the inverse of a certain matrix $g(\theta)$ known as the Fisher Information Matrix:
$$\theta_{t+1} = \theta_t – \eta g^{-1}(\theta_t) \nabla \mathcal{L}(\theta_t).$$
One disadvantage of natural gradient descent is that computing the natural gradient in each iteration could be significantly more expensive than computing the gradient.

Just as one would hope that a natural variant of gradient descent in the classical case would be invariant with respect to re-parameterizations of the probability distribution, in the quantum case one could hope for a variant of gradient descent that is invariant under re-parameterizations of the quantum state, as the underlying objective function depends only on the quantum state. In their present work, Stokes et al. formulate such an algorithm by appropriately generalizing natural gradient descent for (pure state) variational quantum algorithms.

Roughly, they find that the correct generalization of the Fisher Information Matrix in this setting is the Fubini-Study metric tensor, which indeed reduces to the Fisher Information Matrix in the case in which the variational states simply encode probability distributions. Correspondingly, to minimize an objective function in a variational quantum algorithm using quantum natural gradient descent, one follows an update rule similar to that of (classical) natural gradient descent, except the Fisher Information Matrix $g$ is replaced by the Fubini-Study metric tensor. This update rule corresponds to that of the ansatz-constrained imaginary time evolution of McArdle et al., showing that the latter algorithm has a geometric interpretation as performing a quantum generalization of natural gradient descent.

The contribution of Stokes et al. is valuable both conceptually and practically. Conceptually, we have discussed how the quantum natural gradient provides a geometrical explanation for ansatz-constrained imaginary time evolution. Furthermore, by illuminating what the natural geometry for parameterized quantum states is, this paper sets the stage for future theoretical work on variational quantum algorithms. Indeed, their proposal was recently generalized to the case of non-unitary circuits [5].

On the practical side, a priori one might be concerned about the algorithm potentially having poor scaling with problem size. In particular, the update for a step of the algorithm requires computing the matrix-vector product $g^{-1} \nabla \mathcal{L}$, as opposed to simply computing $\nabla \mathcal{L}$ as in the case of vanilla gradient descent. Obtaining a good estimate of a single entry of $g$ requires multiple repetitions of quantum state preparations and measurements, which could potentially be very expensive and negate the benefit of working in the natural geometry. To address this concern, the authors propose that one may approximate $g$ via a diagonal or block-diagonal approximation, which may greatly alleviate the resource requirements as compared to fully constructing $g$. To support this proposal, numerical experiments are carried out which indeed show a marked advantage of the (approximated) quantum natural gradient algorithm over vanilla gradient descent and other common optimization algorithms for the toy problems studied, showing that the proposal may be no less practical than previously considered algorithms, and could in fact surpass them.

By elucidating the geometry of parameterized quantum states and describing how to take this geometry into account algorithmically, the work of Stokes et al. both advances our understanding of variational quantum algorithms and provides new ideas for improving them. For these reasons, this work will undoubtedly be influential in theoretical and numerical studies of such algorithms going forward.

### ► References

[1] Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998. doi:10.1162/​089976698300017746.
https:/​/​doi.org/​10.1162/​089976698300017746

[2] Edward Farhi and Hartmut Neven. Classification with Quantum Neural Networks on Near Term Processors. arXiv e-prints, February 2018. arXiv:1802.06002.
arXiv:1802.06002

[3] Gian Giacomo Guerreschi and Mikhail Smelyanskiy. Practical optimization for hybrid quantum-classical algorithms. arXiv e-prints, January 2017. arXiv:1701.01450.
arXiv:1701.01450

[4] Aram Harrow and John Napp. Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms. arXiv e-prints, January 2019. arXiv:1901.05374.
arXiv:1901.05374

[5] Bálint Koczor and Simon C. Benjamin. Quantum natural gradient generalised to non-unitary circuits. arXiv e-prints, December 2019. arXiv:1912.08660.
arXiv:1912.08660

[6] Sam McArdle, Tyson Jones, Suguru Endo, Ying Li, Simon C. Benjamin, and Xiao Yuan. Variational ansatz-based quantum simulation of imaginary time evolution. npj Quantum Information, 5(1):75, 2019. doi:10.1038/​s41534-019-0187-2.
https:/​/​doi.org/​10.1038/​s41534-019-0187-2

[7] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii. Quantum circuit learning. Phys. Rev. A, 98:032309, Sep 2018. doi:10.1103/​PhysRevA.98.032309.
https:/​/​doi.org/​10.1103/​PhysRevA.98.032309

[8] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O'Brien. A variational eigenvalue solver on a photonic quantum processor. Nature Communications, 5(1):4213, 2014. doi:10.1038/​ncomms5213.
https:/​/​doi.org/​10.1038/​ncomms5213

[9] John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, August 2018. doi:10.22331/​q-2018-08-06-79.
https:/​/​doi.org/​10.22331/​q-2018-08-06-79

[10] Jonathan Romero, Ryan Babbush, Jarrod R McClean, Cornelius Hempel, Peter J Love, and Alán Aspuru-Guzik. Strategies for quantum computing molecular energies using the unitary coupled cluster ansatz. Quantum Science and Technology, 4(1):014008, Oct 2018. doi:10.1088/​2058-9565/​aad3e4.