Stochastic gradient descent for hybrid quantum-classical optimization

Ryan Sweke1, Frederik Wilde1, Johannes Meyer1, Maria Schuld2,3, Paul K. Faehrmann1, Barthélémy Meynard-Piganeau4, and Jens Eisert1,5,6

1Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, 14195 Berlin, Germany
2Xanadu, 777 Bay Street, Toronto, Ontario, Canada
3Quantum Research Group, University of KwaZulu-Natal, 4000 Durban, South Africa
4Department of Physics, École Polytechnique, Palaiseau, France
5Helmholtz Center Berlin, 14109 Berlin, Germany
6Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany



Within the context of hybrid quantum-classical optimization, gradient-descent-based optimizers typically require the evaluation of expectation values with respect to the outcome of parameterized quantum circuits. In this work, we explore the consequences of the prior observation that estimating these quantities on quantum hardware results in a form of $stochastic$ gradient descent. We formalize this notion, which allows us to show that in many relevant cases, including VQE, QAOA and certain quantum classifiers, estimating expectation values with $k$ measurement outcomes results in optimization algorithms whose convergence properties can be rigorously well understood, for any value of $k$. In fact, even using single measurement outcomes for the estimation of expectation values is sufficient. Moreover, in many settings the required gradients can be expressed as linear combinations of expectation values -- originating, e.g., from a sum over local terms of a Hamiltonian, a parameter-shift rule, or a sum over dataset instances -- and we show that in these cases $k$-shot expectation value estimation can be combined with sampling over terms of the linear combination to obtain ``doubly stochastic'' gradient descent optimizers. For all algorithms we prove convergence guarantees, providing a framework for the derivation of rigorous optimization results in the context of near-term quantum devices. Additionally, we numerically explore these methods on benchmark VQE, QAOA and quantum-enhanced machine learning tasks, and show that treating the stochastic settings as hyperparameters allows for state-of-the-art results with significantly fewer circuit executions and measurements.
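The "doubly stochastic" scheme sketched in the abstract can be illustrated with a small classical simulation. Everything below is a toy assumption rather than the paper's actual experiments: the per-term expectation values are modelled as $\cos(\theta - \phi_j)$ (a form for which the parameter-shift rule holds exactly), and the three-term "Hamiltonian", its coefficients, the step-size schedule, and all names are invented for illustration. Each gradient step samples a single term of the linear combination and estimates its shifted expectation values from $k = 1$ measurement outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a parameterized quantum circuit: the exact
# expectation value of Hamiltonian term H_j at angle theta is taken to be
# cos(theta - phi_j), with single-shot measurement outcomes in {-1, +1}.
phis = np.array([0.3, 1.1, 2.0])      # made-up per-term phase offsets
coeffs = np.array([0.5, -1.0, 0.7])   # coefficients c_j of the Hamiltonian terms

def k_shot_estimate(theta, j, k, rng):
    """Estimate <H_j> at theta as the mean of k single-shot (+1/-1) outcomes."""
    p_plus = (1.0 + np.cos(theta - phis[j])) / 2.0
    shots = rng.random(k) < p_plus
    return np.where(shots, 1.0, -1.0).mean()

def doubly_stochastic_grad(theta, k, rng):
    """Doubly stochastic gradient: sample a single term j with probability
    p_j = |c_j| / sum_i |c_i|, then apply the parameter-shift rule using
    k-shot expectation estimates at theta +- pi/2. Reweighting by c_j / p_j
    keeps the estimator unbiased for the gradient of sum_j c_j <H_j>."""
    probs = np.abs(coeffs) / np.abs(coeffs).sum()
    j = rng.choice(len(coeffs), p=probs)
    plus = k_shot_estimate(theta + np.pi / 2, j, k, rng)
    minus = k_shot_estimate(theta - np.pi / 2, j, k, rng)
    return (coeffs[j] / probs[j]) * (plus - minus) / 2.0

def exact_cost(theta):
    """Noiseless cost sum_j c_j cos(theta - phi_j), for monitoring only."""
    return float(np.dot(coeffs, np.cos(theta - phis)))

# Plain SGD with a diminishing step size; even k = 1 suffices here, because
# the gradient estimator is unbiased at every step.
theta, k = 0.0, 1
for step in range(2000):
    theta -= 2.0 / (step + 20) * doubly_stochastic_grad(theta, k, rng)

print(f"theta = {theta:.3f}, cost = {exact_cost(theta):.3f}")
```

Despite each step using only two single-shot measurements of a single randomly chosen term, the iterate drifts toward the minimizer of the full cost; treating $k$ and the term-sampling rule as hyperparameters, as the abstract suggests, trades per-step measurement cost against gradient variance.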

