# Gradients just got more flexible

This is a Perspective on "Measuring Analytic Gradients of General Quantum Evolution with the Stochastic Parameter Shift Rule" by Leonardo Banchi and Gavin E. Crooks, published in Quantum 5, 386 (2021).

By Johannes Jakob Meyer (Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, 14195 Berlin, Germany and QMATH, Department of Mathematical Sciences, Københavns Universitet, 2100 København Ø, Denmark).

Massive efforts have transformed quantum computers from a far-fetched dream into a physical reality. Ever more capable devices manage to exploit quantum effects to perform computations. Despite the rapid pace of improvement, however, these machines are not yet capable of sustaining fault-tolerant computations. They suffer from multiple impediments, such as noise, short coherence times, limited qubit numbers, and limited control. We thus find ourselves in the exciting regime of noisy intermediate-scale quantum (NISQ) devices [1]. An important milestone was recently reached when a quantum computer performed a computation that is intractable for classical computers [2], although, sadly, one with no currently known practical applications.

The search for practically relevant applications of NISQ devices, however, has become a considerable industry by now. The leading contenders to realize such applications are so-called variational quantum algorithms (VQAs) [3]. These approaches were first considered by Peruzzo et al. who introduced the variational quantum eigensolver (VQE) in 2014 [4]. The variational quantum eigensolver aims to solve a task of great interest in the field of quantum chemistry, namely to find the ground state of a given Hamiltonian.
The approach is strikingly simple: A gate-based quantum computer is used to prepare a quantum state — which is now a parametrized quantum state as it depends on the gate parameters. Measurements of the quantum state in multiple bases are then used to estimate the expected value of the Hamiltonian, i.e., the average energy of the quantum state. The gate parameters are then adjusted using a classical optimization algorithm until a minimal energy is found. Depending on the chosen parametrization and the optimization method, this can yield a reasonable approximation to the ground state. This is an example of a hybrid quantum-classical method [5], as the quantum computer is used together with a purely classical feedback loop.
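
The loop described above can be sketched in a few lines. The following is a minimal toy, not the setup of Peruzzo et al.: it assumes a single qubit, a hypothetical Hamiltonian $H = Z + 0.5\,X$, a one-parameter ansatz $R_Y(\theta)|0\rangle$, exact expectation values instead of sampled measurements, and a simple derivative-free step-halving search as the classical optimizer. All function names are illustrative.

```python
import math


def ansatz_state(theta):
    """Amplitudes of RY(theta)|0>, a one-parameter ansatz (assumption)."""
    return (math.cos(theta / 2), math.sin(theta / 2))


def energy(theta, cz=1.0, cx=0.5):
    """Exact <H> for the toy Hamiltonian H = cz*Z + cx*X.

    On hardware, <Z> and <X> would be estimated from measurements
    in two different bases; here they are computed exactly.
    """
    a0, a1 = ansatz_state(theta)
    exp_z = a0 * a0 - a1 * a1
    exp_x = 2 * a0 * a1
    return cz * exp_z + cx * exp_x


def minimize_energy(theta=0.3, step=1.0, tol=1e-6):
    """Classical feedback loop: move to a neighbour if it lowers the
    energy, otherwise halve the step size."""
    best = energy(theta)
    while step > tol:
        for candidate in (theta + step, theta - step):
            value = energy(candidate)
            if value < best:
                theta, best = candidate, value
                break
        else:
            step /= 2
    return theta, best


theta_opt, e_min = minimize_energy()
print(theta_opt, e_min)
```

For this toy Hamiltonian the exact ground-state energy is $-\sqrt{1.25}$, which the loop should approach because the ansatz can reach the ground state.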

Variational quantum algorithms take the same general approach but extend its reach beyond the calculation of ground state energies. The basic setup, however, is the same: one creates a parametrized quantum state, which is commonly referred to as the ansatz. But to be able to encode more complicated problems, one has to incorporate more than the expectation value of a single observable. Instead, we seek to minimize a cost function that depends on the underlying quantum state and encodes the problem of interest. Usually, this cost function will depend on multiple observables which need not have any physical meaning but are simply tools to reformulate the problem at hand for a quantum computer. Consider quantum machine learning, where different observables could, for example, be used to encode the probabilities of viewing either a cat or a dog.

The community displays a lot of creativity in finding quantum formulations of relevant problems and it is therefore unsurprising that many variational quantum algorithms have been proposed. Applications include optimization, quantum simulation, and variational factoring, to name a few [3].
The hope underlying these developments is that variational quantum algorithms can exploit their inherent access to quantum effects to find solutions out of reach for classical machines.

We have seen that the classical optimization of a multivariate cost function is an essential part of a variational quantum algorithm. This is a field that is extensively studied in classical computer science and a plethora of methods exist to solve this task. But it is known that the availability of gradients of the cost function provably speeds up the optimization [6].

That brings us to the context where the contribution of Banchi and Crooks [7] plays a significant role. To compute the gradient of the cost function in a variational quantum algorithm, we need to take derivatives of the expectation values that we evaluated on the quantum device. This is not a priori straightforward, as quantum states are very complicated objects. But surprisingly, there exists a simple formula that allows the computation of these derivatives for the most widely used class of gates, the Pauli rotations (for example $U(\theta) = e^{-i \theta X/2}$). It relies on the quantum device itself to calculate them: We assume that a quantum state is prepared by a quantum circuit that contains — among other arbitrary gates — a Pauli gate parametrized by $\theta$. At the end, we measure the expectation value of an arbitrary observable $O$ relative to the prepared quantum state. The derivative of this expectation value as a function of $\theta$ is then given by
$$\frac{\partial\langle O(\theta) \rangle}{\partial \theta} = \frac{1}{2}\left[\left\langle O\left(\theta + \frac{\pi}{2}\right)\right\rangle - \left\langle O\left(\theta - \frac{\pi}{2}\right)\right\rangle\right].$$
This means that we can calculate the derivative of any expectation value by evaluating the same circuit with the parameter in question shifted in both directions. Due to the appearance of these shifts, this formula is known as the parameter-shift rule.
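
To see the rule in action, here is a minimal sketch that simulates a single qubit with plain Python. It assumes a circuit $R_X(\theta) R_Y(\phi)|0\rangle$ with a fixed $\phi = 0.7$ standing in for the "other arbitrary gates", the observable $Z$, and exact expectation values without shot noise. The function names are hypothetical.

```python
import math


def rx(theta):
    """Pauli rotation U(theta) = exp(-i theta X / 2) as a 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def ry(phi):
    """Fixed rotation exp(-i phi Y / 2), a stand-in for other gates."""
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    return [[c, -s], [s, c]]


def apply(gate, state):
    """Multiply a 2x2 gate onto a two-component state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def expectation_z(theta, phi=0.7):
    """Exact <Z> for the state RX(theta) RY(phi) |0>."""
    state = apply(rx(theta), apply(ry(phi), [1.0, 0.0]))
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


def parameter_shift(f, theta):
    """Derivative of f at theta from two shifted evaluations."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))


theta = 0.4
gradient = parameter_shift(expectation_z, theta)
print(gradient)
```

For this particular circuit the expectation value works out to $\cos\theta\cos\phi$, so the printed value can be compared against $-\sin\theta\cos\phi$.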

At this point, an intuitive explanation of the inner workings of the parameter-shift rule is in order. Different Pauli gates correspond to rotations about different axes in Hilbert space. It is therefore not surprising that any expectation value seen as a function of the parameter of the Pauli gate can be written as a simple sine function [8]
$$\langle O(\theta) \rangle = \alpha \sin(\theta + \beta) + \gamma.$$
The amplitude $\alpha$, the phase $\beta$, and the displacement $\gamma$ are functions of the observable and the other gates in the circuit. It is immediately clear that the derivative is given by
$$\frac{\partial \langle O(\theta) \rangle}{\partial \theta} = \alpha \cos(\theta + \beta).$$
We need to express this as a function of the original expectation values. To this end, we can exploit the fact that an additional phase of $\frac{\pi}{2}$ transforms the sine into a cosine:
$$\left\langle O\left(\theta + \frac{\pi}{2}\right)\right\rangle = \alpha \cos(\theta + \beta) + \gamma.$$
With this reparametrization, we are already very close, but we still have to remove the displacement $\gamma$. Luckily, we can also apply a shift in the opposite direction, transforming the original sine into a negative cosine while leaving the displacement untouched:
$$\left\langle O\left(\theta - \frac{\pi}{2}\right)\right\rangle = -\alpha \cos(\theta + \beta) + \gamma.$$
By subtracting the two terms we can remove the displacement:
$$\left\langle O\left(\theta + \frac{\pi}{2}\right)\right\rangle - \left\langle O\left(\theta - \frac{\pi}{2}\right)\right\rangle = 2 \alpha \cos(\theta + \beta) = 2\frac{\partial \langle O(\theta) \rangle}{\partial \theta}.$$
The parameter-shift rule follows by applying a prefactor of $1/2$ that cancels the doubled evaluation of the cosine.
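
As a concrete check of these steps, consider a minimal example that is not taken from the original article: a single qubit prepared in $|0\rangle$, the gate $U(\theta) = e^{-i \theta X/2}$, and the observable $O = (\mathbb{1} + Z)/2$, i.e., the projector onto $|0\rangle$. Under these assumptions the expectation value is
$$\langle O(\theta) \rangle = \cos^2\frac{\theta}{2} = \frac{1}{2}\sin\left(\theta + \frac{\pi}{2}\right) + \frac{1}{2},$$
so that $\alpha = \gamma = 1/2$ and $\beta = \pi/2$. The two shifted evaluations give
$$\left\langle O\left(\theta \pm \frac{\pi}{2}\right)\right\rangle = \frac{1}{2} \mp \frac{1}{2}\sin\theta,$$
and half their difference is $-\frac{1}{2}\sin\theta$, which agrees with differentiating $\frac{1}{2} + \frac{1}{2}\cos\theta$ directly. Note how the displacement $\gamma = 1/2$ drops out in the subtraction.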

To the best of my knowledge, the parameter-shift rule was first introduced by Li et al. [9] in the context of quantum optimal control. It was then adapted to the context of quantum machine learning by Mitarai et al. [10]. An in-depth study by Schuld et al. revealed that the parameter-shift rule actually holds for any gate whose generator has only two distinct eigenvalues and that parameter-shift rules also exist in continuous-variable systems, namely for Gaussian gates [11]. The importance of the parameter-shift rule is further underlined by the amount of follow-up work it has inspired: It was shown that repeated application of the parameter-shift rule allows for the calculation of higher-order derivatives of expectation values [12] and that certain noise channels also allow for a parameter-shift rule [13].
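
The idea of repeated application can be illustrated with a short sketch. It assumes that the expectation value has the sinusoidal form $\alpha \sin(\theta + \beta) + \gamma$ discussed earlier and uses a plain function with arbitrary illustrative coefficients as a stand-in for a circuit evaluation. It is not the specific construction of Ref. [12].

```python
import math


def expectation(theta, alpha=0.8, beta=0.3, gamma=0.1):
    """Stand-in for a circuit evaluation (coefficients are arbitrary)."""
    return alpha * math.sin(theta + beta) + gamma


def first_derivative(f, theta):
    """Two-term parameter-shift rule for a Pauli rotation angle."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))


def second_derivative(f, theta):
    """Apply the shift rule to the shift-rule derivative itself."""
    return first_derivative(lambda t: first_derivative(f, t), theta)


theta = 0.5
print(second_derivative(expectation, theta))
```

Expanding the nested call shows that only evaluations at $\theta$ and $\theta \pm \pi$ enter, namely $\frac{1}{4}\left[f(\theta + \pi) - 2 f(\theta) + f(\theta - \pi)\right]$.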

But a central problem remained: gates whose generators have more than two distinct eigenvalues could not be differentiated with the parameter-shift rule. This was an unsatisfactory state of affairs, as these gates have practical relevance. To realize a universal gate set and to leverage non-classical effects, entangling gates are necessary. But most of the entangling gates that are native to current quantum computing platforms are not differentiable using the parameter-shift rule.
It was thus necessary to compose the native operations to obtain differentiable entangling operations, adding additional depth to the quantum circuits in question. As short coherence times are one of the main obstructions to quantum computer performance to date, additional overheads hurt badly.
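
A small numerical experiment makes the limitation tangible. The sketch below is a toy under stated assumptions, not one of the native entangling gates mentioned above: it uses the two-qubit generator $G = (Z \otimes \mathbb{1} + \mathbb{1} \otimes Z)/2$, which has the three distinct eigenvalues $1, 0, -1$, the input state $|{+}{+}\rangle$, and the observable $X \otimes X + X \otimes \mathbb{1}$. All names are hypothetical.

```python
import cmath
import math


def expectation(theta):
    """Exact <XX + XI> after exp(-i theta G) acts on |++>.

    Basis index = 2*q0 + q1, so G = diag(1, 0, 0, -1).
    """
    eigenvalues = (1.0, 0.0, 0.0, -1.0)
    psi = [0.5 * cmath.exp(-1j * theta * g) for g in eigenvalues]
    # X(x)X flips both qubits (i -> i ^ 3), X(x)1 flips qubit 0 (i -> i ^ 2).
    xx = sum((psi[i].conjugate() * psi[i ^ 3]).real for i in range(4))
    xi = sum((psi[i].conjugate() * psi[i ^ 2]).real for i in range(4))
    return xx + xi


def two_term_shift(f, theta):
    """The two-term rule, applied where it is not guaranteed to hold."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))


def finite_difference(f, theta, h=1e-5):
    """Central difference as an independent reference value."""
    return (f(theta + h) - f(theta - h)) / (2 * h)


theta = 0.4
print(two_term_shift(expectation, theta))     # misses part of the slope
print(finite_difference(expectation, theta))  # reference derivative
```

Here the expectation value is $\cos^2\theta + \cos\theta$, which contains two frequencies. The two-term rule only recovers the $-\sin\theta$ contribution and misses $-\sin 2\theta$ entirely.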

Crooks targeted this problem by decomposing the gate in question into simpler gates [14], but the method remained somewhat unwieldy. The contribution of Banchi and Crooks [7] that motivated this perspective article solved this conundrum elegantly by relaxing the requirements. Instead of producing an exact formula, they combine an integral expression of the gradient with Monte Carlo sampling to give an unbiased estimate of the gradient. It was already outlined by Sweke et al. that, due to the inherent quantum randomness of the measurement outcomes, one cannot expect anything more than an unbiased estimate of the gradient anyway. This relaxation of the requirements is therefore not detrimental and convergence of gradient descent is still guaranteed [15].
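
The flavor of such an estimator can be conveyed with a single-qubit sketch. It assumes a gate of the form $U(\theta) = e^{i(\theta X + Z)}$, the input state $|0\rangle$, and the observable $Z$; the conventions and notation in the paper may differ. It uses the identity $\partial_\theta e^{iK} = \int_0^1 e^{i(1-s)K}\, iX\, e^{isK}\, \mathrm{d}s$ with $K = \theta X + Z$, which turns the derivative into an average over a uniformly sampled $s$ of two circuits with a fixed shift gate $e^{\pm i \pi X/4}$ inserted in the middle. Expectation values are computed exactly here, whereas on hardware each would be estimated from shots.

```python
import math
import random


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def evolve(t, theta):
    """exp(i t (theta X + Z)) via the closed form for Pauli vectors."""
    r = math.hypot(theta, 1.0)
    c, s = math.cos(t * r), math.sin(t * r) / r
    return [[c + 1j * s, 1j * s * theta],
            [1j * s * theta, c - 1j * s]]


def shift_gate(sign):
    """exp(i sign pi/4 X), the fixed shift inserted mid-evolution."""
    a = 1 / math.sqrt(2)
    return [[a, 1j * sign * a], [1j * sign * a, a]]


def expect_z(unitary):
    """<0| U^dagger Z U |0>, read off from the first column of U."""
    return abs(unitary[0][0]) ** 2 - abs(unitary[1][0]) ** 2


def cost(theta):
    """Expectation value for the full gate exp(i (theta X + Z))."""
    return expect_z(evolve(1.0, theta))


def stochastic_shift_gradient(theta, samples=20000, seed=0):
    """Monte Carlo average of r_plus - r_minus over uniform s."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        s = rng.random()
        first, last = evolve(s, theta), evolve(1.0 - s, theta)
        for sign in (+1, -1):
            u = matmul(last, matmul(shift_gate(sign), first))
            total += sign * expect_z(u)
    return total / samples


theta = 0.8
print(stochastic_shift_gradient(theta))
```

The estimate fluctuates with the random samples, so it should only be expected to agree with a finite-difference reference up to the Monte Carlo error.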

Banchi and Crooks have taken another step in the ongoing quest to squeeze the most out of near-term quantum devices, making the parameter-shift rule available for a much wider class of gates. This general-purpose tool, however, does not mark the end of this research line. Exact parameter-shift rules for specific, more complicated gates are still of high interest. Kottmann et al. have developed a parameter-shift rule for gates that model fermionic excitations [16], which will notably simplify gradient calculations for quantum chemistry applications. It remains to be seen if there are other exact parameter-shift rules for interesting gates that have more than two distinct eigenvalues. Another intriguing question would be to identify conditions under which a parameter-shift rule cannot work by providing an explicit no-go theorem.

### References

[1] John Preskill, Quantum computing in the NISQ era and beyond, Quantum 2, 79 (2018), arXiv:1801.00862.
https://doi.org/10.22331/q-2018-08-06-79

[2] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry Lyakh, Salvatore Mandrà, Jarrod R. McClean, Matthew McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Ostby, Andre Petukhov, John C. Platt, Chris Quintana, Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyanskiy, Kevin J. Sung, Matthew D. Trevithick, Amit Vainsencher, Benjamin Villalonga, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, and John M. Martinis, Quantum supremacy using a programmable superconducting processor, Nature 574, 505 (2019).
https://doi.org/10.1038/s41586-019-1666-5

[3] M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R. McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Patrick J. Coles, Variational Quantum Algorithms, (2020), arXiv:2012.09265.

[4] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik, and Jeremy L. O'Brien, A variational eigenvalue solver on a photonic quantum processor, Nat. Commun. 5, 4213 (2014), arXiv:1304.3061.
https://doi.org/10.1038/ncomms5213

[5] Jarrod R. McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik, The theory of variational hybrid quantum-classical algorithms, New J. Phys. 18, 023023 (2016), arXiv:1509.04279.
https://doi.org/10.1088/1367-2630/18/2/023023

[6] Aram Harrow and John Napp, Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms, (2019), arXiv:1901.05374.

[7] Leonardo Banchi and Gavin E. Crooks, Measuring Analytic Gradients of General Quantum Evolution with the Stochastic Parameter Shift Rule, Quantum 5, 386 (2021), arXiv:2005.10299.
https://doi.org/10.22331/q-2021-01-25-386

[8] Javier Gil Vidal and Dirk Oliver Theis, Calculus on parameterized quantum circuits, (2018), arXiv:1812.06323.

[9] Jun Li, Xiaodong Yang, Xinhua Peng, and Chang-Pu Sun, Hybrid Quantum-Classical Approach to Quantum Optimal Control, Phys. Rev. Lett. 118, 150503 (2017), arXiv:1608.00677.
https://doi.org/10.1103/PhysRevLett.118.150503

[10] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Quantum circuit learning, Phys. Rev. A 98, 032309 (2018), arXiv:1803.00745.
https://doi.org/10.1103/PhysRevA.98.032309

[11] Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran, Evaluating analytic gradients on quantum hardware, Phys. Rev. A 99, 032331 (2019), arXiv:1811.11184.
https://doi.org/10.1103/PhysRevA.99.032331

[12] Andrea Mari, Thomas R. Bromley, and Nathan Killoran, Estimating the gradient and higher-order derivatives on quantum hardware, Phys. Rev. A 103, 012405 (2021), arXiv:2008.06517.
https://doi.org/10.1103/PhysRevA.103.012405

[13] Johannes Jakob Meyer, Johannes Borregaard, and Jens Eisert, A variational toolbox for quantum multi-parameter estimation, (2020), arXiv:2006.06303.

[14] Gavin E. Crooks, Gradients of parameterized quantum gates using the parameter-shift rule and gate decomposition, (2019), arXiv:1905.13311.

[15] Ryan Sweke, Frederik Wilde, Johannes Jakob Meyer, Maria Schuld, Paul K. Fährmann, Barthélémy Meynard-Piganeau, and Jens Eisert, Stochastic gradient descent for hybrid quantum-classical optimization, Quantum 4, 314 (2020), arXiv:1910.01155.
https://doi.org/10.22331/q-2020-08-31-314

[16] Jakob S. Kottmann, Abhinav Anand, and Alán Aspuru-Guzik, A feasible approach for automatically differentiable unitary coupled-cluster on quantum computers, (2020), arXiv:2011.05938.
