Quantum algorithms for escaping from saddle points

Chenyi Zhang1, Jiaqi Leng2, and Tongyang Li3,4,5

1Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
2Department of Mathematics and Joint Center for Quantum Information and Computer Science, University of Maryland, College Park, MD, USA
3Center on Frontiers of Computing Studies, Peking University, Beijing, China
4Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
5Department of Computer Science and Joint Center for Quantum Information and Computer Science, University of Maryland, College Park, MD, USA



We initiate the study of quantum algorithms for escaping from saddle points with provable guarantees. Given a function $f\colon\mathbb{R}^{n}\to\mathbb{R}$, our quantum algorithm outputs an $\epsilon$-approximate second-order stationary point using $\tilde{O}(\log^{2} (n)/\epsilon^{1.75})$ queries to the quantum evaluation oracle (i.e., the zeroth-order oracle). Compared to the classical state-of-the-art algorithm by Jin et al. with $\tilde{O}(\log^{6} (n)/\epsilon^{1.75})$ queries to the gradient oracle (i.e., the first-order oracle), our quantum algorithm is polynomially better in terms of $\log n$ and matches its complexity in terms of $1/\epsilon$. Technically, our main contribution is the idea of replacing the classical perturbations in gradient descent methods by simulating quantum wave equations, which yields the polynomial improvement in $\log n$ factors in the quantum query complexity for escaping from saddle points. We also show how to use a quantum gradient computation algorithm due to Jordan to replace the classical gradient queries by quantum evaluation queries at the same complexity. Finally, we perform numerical experiments that support our theoretical findings.
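The physical intuition behind the main technique, that quantum evolution disperses a wave packet rapidly along directions of negative curvature, can be checked numerically. The following is a minimal illustrative sketch, not the paper's algorithm: it evolves a Gaussian wave packet under the one-dimensional Schrödinger equation with an inverted quadratic potential $V(x) = -x^2/2$ (a one-dimensional caricature of a saddle point) using a standard split-step Fourier integrator, and observes that the packet's position variance grows by more than an order of magnitude over a short evolution time. All grid sizes and step sizes are illustrative choices.

```python
import numpy as np

# Spatial grid and the matching angular-frequency grid for the FFT
N, L = 512, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

# Gaussian wave packet initially centered at the saddle (x = 0)
psi = np.exp(-x**2 / 2).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * (L / N))

V = -0.5 * x**2          # inverted quadratic: negative curvature at x = 0
dt, steps = 0.01, 200    # total evolution time T = 2

def variance(psi):
    """Position variance of the probability density |psi|^2 on the grid."""
    p = np.abs(psi)**2
    p /= p.sum()
    mu = np.sum(p * x)
    return np.sum(p * (x - mu)**2)

v0 = variance(psi)
for _ in range(steps):
    psi *= np.exp(-0.5j * V * dt)                                    # half potential step
    psi = np.fft.ifft(np.exp(-0.5j * k**2 * dt) * np.fft.fft(psi))   # full kinetic step
    psi *= np.exp(-0.5j * V * dt)                                    # half potential step
v1 = variance(psi)
# The packet disperses along the negative-curvature direction, so v1 >> v0.
```

For this potential the Heisenberg picture gives $\langle x^2\rangle(t) = \langle x^2\rangle_0\cosh^2 t + \langle p^2\rangle_0\sinh^2 t$, so the variance grows from about $0.5$ to roughly $13.7$ at $t = 2$, which the simulation reproduces.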

We initiate the study of quantum algorithms for escaping from saddle points with provable guarantees. Given an $n$-dimensional objective function $f$, our quantum algorithm finds a local minimum using $\tilde{O}(\log^2(n))$ quantum queries to the function value. This is polynomially better than the classical state-of-the-art algorithm by Jin et al., which requires $\tilde{O}(\log^6(n))$ classical queries to the function gradient. Classical algorithms add random perturbations to gradient descent to kick the iterates away from saddle points; we replace the perturbation with a quantum simulation subroutine. Such a replacement is possible because a quantum wave packet, evolved under a potential given by the objective function, escapes from the saddle points of that potential. In addition, we show how to incorporate Jordan's quantum gradient computation routine into our algorithm, removing the gradient-oracle assumption of the classical algorithm. Finally, we provide numerical evidence that supports our theoretical findings.
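The classical baseline referred to above, perturbed gradient descent in the spirit of Jin et al., can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the objective $f(x, y) = (x^2 - y^2)/2$, the step size, the perturbation radius, and the gradient threshold are all hypothetical choices made for the example. Whenever the gradient is tiny (the iterate sits near a stationary point), a random perturbation from a small sphere kicks it off the saddle; plain gradient descent then amplifies the component along the negative-curvature direction geometrically.

```python
import numpy as np

def grad_f(v):
    # Gradient of the toy objective f(x, y) = (x**2 - y**2) / 2,
    # whose only stationary point (the origin) is a strict saddle.
    return np.array([v[0], -v[1]])

def perturbed_gd(v0, eta=0.1, radius=1e-3, g_thresh=1e-4, steps=200, seed=0):
    """Toy perturbed gradient descent: when the gradient is below the
    threshold, take a random perturbation from a small sphere instead
    of a gradient step."""
    rng = np.random.default_rng(seed)
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        g = grad_f(v)
        if np.linalg.norm(g) < g_thresh:
            d = rng.normal(size=v.shape)
            v = v + radius * d / np.linalg.norm(d)  # perturbation step
        else:
            v = v - eta * g                         # gradient step
    return v

v = perturbed_gd([1e-9, 0.0])
# After one perturbation, the y-component (the negative-curvature
# direction) grows by a factor (1 + eta) per step, so the iterate
# leaves the saddle's neighborhood while the x-component decays.
```

The quantum algorithm of the paper replaces exactly this perturbation step with Schrödinger evolution, which concentrates the escaping mass along negative-curvature directions more efficiently in high dimension.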


References

[1] Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, and Tengyu Ma, Finding approximate local minima faster than gradient descent, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1195–1199, 2017, arXiv:1611.01146. https:/​/​doi.org/​10.1145/​3055399.3055464.

[2] Zeyuan Allen-Zhu, Natasha 2: Faster non-convex optimization than SGD, Advances in Neural Information Processing Systems, pp. 2675–2686, 2018, arXiv:1708.08694.

[3] Zeyuan Allen-Zhu and Yuanzhi Li, Neon2: Finding local minima via first-order oracles, Advances in Neural Information Processing Systems, pp. 3716–3726, 2018, arXiv:1711.06673.

[4] Joran van Apeldoorn and András Gilyén, Improvements in quantum SDP-solving with applications, Proceedings of the 46th International Colloquium on Automata, Languages, and Programming, Leibniz International Proceedings in Informatics (LIPIcs), vol. 132, pp. 99:1–99:15, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019, arXiv:1804.05058. https:/​/​doi.org/​10.4230/​LIPIcs.ICALP.2019.99.

[5] Joran van Apeldoorn, András Gilyén, Sander Gribling, and Ronald de Wolf, Quantum SDP-solvers: Better upper and lower bounds, 58th Annual Symposium on Foundations of Computer Science, IEEE, 2017, arXiv:1705.01843. https:/​/​doi.org/​10.22331/​q-2020-02-14-230.

[6] Joran van Apeldoorn, András Gilyén, Sander Gribling, and Ronald de Wolf, Convex optimization using quantum oracles, Quantum 4 (2020), 220, arXiv:1809.00643. https:/​/​doi.org/​10.22331/​q-2020-01-13-220.

[7] Frank Arute et al., Quantum supremacy using a programmable superconducting processor, Nature 574 (2019), no. 7779, 505–510, arXiv:1910.11333. https:/​/​doi.org/​10.1038/​s41586-019-1666-5.

[8] Ivo Babuška and Manil Suri, The $h$-$p$ version of the finite element method with quasiuniform meshes, ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique 21 (1987), no. 2, 199–238.

[9] Dominic W. Berry, Graeme Ahokas, Richard Cleve, and Barry C. Sanders, Efficient quantum algorithms for simulating sparse Hamiltonians, Communications in Mathematical Physics 270 (2007), no. 2, 359–371, arXiv:quant-ph/​0508139. https:/​/​doi.org/​10.1007/​s00220-006-0150-x.

[10] Dominic W. Berry, Andrew M. Childs, and Robin Kothari, Hamiltonian simulation with nearly optimal dependence on all parameters, Proceedings of the 56th Annual Symposium on Foundations of Computer Science, pp. 792–809, IEEE, 2015, arXiv:1501.01715. https:/​/​doi.org/​10.1109/​FOCS.2015.54.

[11] Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro, Global optimality of local search for low rank matrix recovery, Advances in Neural Information Processing Systems, pp. 3880–3888, 2016, arXiv:1605.07221.

[12] Jean Bourgain, Growth of Sobolev norms in linear Schrödinger equations with quasi-periodic potential, Communications in Mathematical Physics 204 (1999), no. 1, 207–247. https:/​/​doi.org/​10.1007/​s002200050644.

[13] Jean Bourgain, On growth of Sobolev norms in linear Schrödinger equations with smooth time dependent potential, Journal d’Analyse Mathématique 77 (1999), no. 1, 315–348. https:/​/​doi.org/​10.1007/​BF02791265.

[14] Fernando G.S.L. Brandão, Amir Kalev, Tongyang Li, Cedric Yen-Yu Lin, Krysta M. Svore, and Xiaodi Wu, Quantum SDP solvers: Large speed-ups, optimality, and applications to quantum learning, Proceedings of the 46th International Colloquium on Automata, Languages, and Programming, Leibniz International Proceedings in Informatics (LIPIcs), vol. 132, pp. 27:1–27:14, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019, arXiv:1710.02581. https:/​/​doi.org/​10.4230/​LIPIcs.ICALP.2019.27.

[15] Fernando G.S.L. Brandão and Krysta Svore, Quantum speed-ups for semidefinite programming, Proceedings of the 58th Annual Symposium on Foundations of Computer Science, pp. 415–426, 2017, arXiv:1609.05537. https:/​/​doi.org/​10.1109/​FOCS.2017.45.

[16] Alan J. Bray and David S. Dean, Statistics of critical points of Gaussian fields on large-dimensional spaces, Physical Review Letters 98 (2007), no. 15, 150201, arXiv:cond-mat/​0611023. https:/​/​doi.org/​10.1103/​PhysRevLett.98.150201.

[17] David Bulger, Quantum basin hopping with gradient-based local optimisation, 2005, arXiv:quant-ph/​0507193.

[18] Yair Carmon, John C. Duchi, Oliver Hinder, and Aaron Sidford, Accelerated methods for nonconvex optimization, SIAM Journal on Optimization 28 (2018), no. 2, 1751–1772, arXiv:1611.00756. https:/​/​doi.org/​10.1137/​17M1114296.

[19] Shouvanik Chakrabarti, Andrew M. Childs, Tongyang Li, and Xiaodi Wu, Quantum algorithms and lower bounds for convex optimization, Quantum 4 (2020), 221, arXiv:1809.01731. https:/​/​doi.org/​10.22331/​q-2020-01-13-221.

[20] Nai-Hui Chia, András Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, and Chunhao Wang, Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning, Proceedings of the 52nd Annual ACM Symposium on Theory of Computing, pp. 387–400, ACM, 2020, arXiv:1910.06151. https:/​/​doi.org/​10.1145/​3357713.3384314.

[21] Nai-Hui Chia, András Gilyén, Han-Hsuan Lin, Seth Lloyd, Ewin Tang, and Chunhao Wang, Quantum-inspired algorithms for solving low-rank linear equation systems with logarithmic dependence on the dimension, Proceedings of the 31st International Symposium on Algorithms and Computation, vol. 181, p. 47, Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. https:/​/​doi.org/​10.4230/​LIPIcs.ISAAC.2020.47.

[22] Nai-Hui Chia, Tongyang Li, Han-Hsuan Lin, and Chunhao Wang, Quantum-inspired sublinear algorithm for solving low-rank semidefinite programming, 45th International Symposium on Mathematical Foundations of Computer Science, 2020, arXiv:1901.03254. https:/​/​doi.org/​10.4230/​LIPIcs.MFCS.2020.23.

[23] Andrew M. Childs, Lecture notes on quantum algorithms, https:/​/​www.cs.umd.edu/​%7Eamchilds/​qa/​qa.pdf, 2017.

[24] Andrew M. Childs and Robin Kothari, Limitations on the simulation of non-sparse Hamiltonians, Quantum Information & Computation 10 (2010), no. 7, 669–684, arXiv:0908.4398.

[25] Andrew M. Childs, Jin-Peng Liu, and Aaron Ostrander, High-precision quantum algorithms for partial differential equations, 2020, arXiv:2002.07868.

[26] Andrew M. Childs, Yuan Su, Minh C. Tran, Nathan Wiebe, and Shuchen Zhu, Theory of Trotter error with commutator scaling, Physical Review X 11 (2021), no. 1, 011020, arXiv:1912.08854. https:/​/​doi.org/​10.1103/​PhysRevX.11.011020.

[27] Pedro C.S. Costa, Stephen Jordan, and Aaron Ostrander, Quantum algorithm for simulating the wave equation, Physical Review A 99 (2019), no. 1, 012323, arXiv:1711.05394. https:/​/​doi.org/​10.1103/​PhysRevA.99.012323.

[28] Frank E. Curtis, Daniel P. Robinson, and Mohammadreza Samadi, A trust region algorithm with a worst-case iteration complexity of $\mathcal{O}(\epsilon^{-3/​2})$ for nonconvex optimization, Mathematical Programming 162 (2017), no. 1-2, 1–32. https:/​/​doi.org/​10.1007/​s10107-016-1026-2.

[29] Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, Advances in Neural Information Processing Systems, pp. 2933–2941, 2014, arXiv:1406.2572.

[30] Cong Fang, Chris Junchi Li, Zhouchen Lin, and Tong Zhang, Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator, Advances in Neural Information Processing Systems, pp. 689–699, 2018, arXiv:1807.01695.

[31] Cong Fang, Zhouchen Lin, and Tong Zhang, Sharp analysis for nonconvex SGD escaping from saddle points, Conference on Learning Theory, pp. 1192–1234, 2019, arXiv:1902.00247.

[32] Mauger François, Symplectic leap frog scheme, https:/​/​www.mathworks.com/​matlabcentral/​fileexchange/​38652-symplectic-leap-frog-scheme, 2020.

[33] Yan V. Fyodorov and Ian Williams, Replica symmetry breaking condition exposed by random matrix calculation of landscape complexity, Journal of Statistical Physics 129 (2007), no. 5-6, 1081–1116, arXiv:cond-mat/​0702601. https:/​/​doi.org/​10.1007/​s10955-007-9386-x.

[34] Xuefeng Gao, Mert Gürbüzbalaban, and Lingjiong Zhu, Global convergence of stochastic gradient Hamiltonian monte carlo for non-convex stochastic optimization: Non-asymptotic performance bounds and momentum-based acceleration, 2018, arXiv:1809.04618.

[35] Rong Ge, Furong Huang, Chi Jin, and Yang Yuan, Escaping from saddle points – online stochastic gradient for tensor decomposition, Conference on Learning Theory, pp. 797–842, 2015, arXiv:1503.02101.

[36] Rong Ge, Jason D. Lee, and Tengyu Ma, Matrix completion has no spurious local minimum, Advances in Neural Information Processing Systems, pp. 2981–2989, 2016, arXiv:1605.07272.

[37] Rong Ge, Jason D. Lee, and Tengyu Ma, Learning one-hidden-layer neural networks with landscape design, International Conference on Learning Representations, 2018, arXiv:1711.00501.

[38] Rong Ge and Tengyu Ma, On the optimization landscape of tensor decompositions, Advances in Neural Information Processing Systems, pp. 3656–3666, Curran Associates Inc., 2017, arXiv:1706.05598.

[39] András Gilyén, Srinivasan Arunachalam, and Nathan Wiebe, Optimizing quantum optimization algorithms via faster quantum gradient computation, Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1425–1444, Society for Industrial and Applied Mathematics, 2019, arXiv:1711.00465. https:/​/​doi.org/​10.1137/​1.9781611975482.87.

[40] András Gilyén, Zhao Song, and Ewin Tang, An improved quantum-inspired algorithm for linear regression, 2020, arXiv:2009.07268.

[41] Stephen K. Gray and David E. Manolopoulos, Symplectic integrators tailored to the time-dependent Schrödinger equation, The Journal of chemical physics 104 (1996), no. 18, 7099–7112. https:/​/​doi.org/​10.1063/​1.471428.

[42] Lov K. Grover, A fast quantum mechanical algorithm for database search, Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, pp. 212–219, ACM, 1996, arXiv:quant-ph/​9605043. https:/​/​doi.org/​10.1145/​237814.237866.

[43] Moritz Hardt, Tengyu Ma, and Benjamin Recht, Gradient descent learns linear dynamical systems, Journal of Machine Learning Research 19 (2018), no. 29, 1–44, arXiv:1609.05191.

[44] Daniel Hsu, Sham Kakade, and Tong Zhang, A tail inequality for quadratic forms of subgaussian random vectors, Electronic Communications in Probability 17 (2012), 1–6, arXiv:1110.2842. https:/​/​doi.org/​10.1214/​ECP.v17-2079.

[45] Prateek Jain, Chi Jin, Sham Kakade, and Praneeth Netrapalli, Global convergence of non-convex gradient descent for computing matrix squareroot, Artificial Intelligence and Statistics, pp. 479–488, 2017, arXiv:1507.05854.

[46] Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M. Kakade, and Michael I. Jordan, How to escape saddle points efficiently, Conference on Learning Theory, pp. 1724–1732, 2017, arXiv:1703.00887.

[47] Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, and Michael I. Jordan, On nonconvex optimization for machine learning: Gradients, stochasticity, and saddle points, Journal of the ACM 68 (2021), no. 2, 1–29, arXiv:1902.04811. https://doi.org/10.1145/3418526.

[48] Chi Jin, Praneeth Netrapalli, and Michael I. Jordan, Accelerated gradient descent escapes saddle points faster than gradient descent, Conference on Learning Theory, pp. 1042–1085, 2018, arXiv:1711.10456.

[49] Michael I. Jordan, On gradient-based optimization: Accelerated, distributed, asynchronous and stochastic optimization, https:/​/​www.youtube.com/​watch?v=VE2ITg%5FhGnI, 2017.

[50] Stephen P. Jordan, Fast quantum algorithm for numerical gradient estimation, Physical Review Letters 95 (2005), no. 5, 050501, arXiv:quant-ph/​0405146. https:/​/​doi.org/​10.1103/​PhysRevLett.95.050501.

[51] Stephen P. Jordan, Quantum computation beyond the circuit model, Ph.D. thesis, Massachusetts Institute of Technology, 2008, arXiv:0809.2307.

[52] Iordanis Kerenidis and Anupam Prakash, Quantum recommendation systems, Proceedings of the 8th Innovations in Theoretical Computer Science Conference, pp. 49:1–49:21, 2017, arXiv:1603.08675. https:/​/​doi.org/​10.4230/​LIPIcs.ITCS.2017.49.

[53] Iordanis Kerenidis and Anupam Prakash, A quantum interior point method for LPs and SDPs, ACM Transactions on Quantum Computing, pp. 1–32, ACM, 2020, arXiv:1808.09266. https:/​/​doi.org/​10.1145/​3406306.

[54] Alexei Kitaev and William A. Webb, Wavefunction preparation and resampling using a quantum computer, 2008, arXiv:0801.0342.

[55] Kfir Y. Levy, The power of normalization: Faster evasion of saddle points, 2016, arXiv:1611.04831.

[56] Jianping Li, General explicit difference formulas for numerical differentiation, Journal of Computational and Applied Mathematics 183 (2005), no. 1, 29–52. https:/​/​doi.org/​10.1016/​j.cam.2004.12.026.

[57] Seth Lloyd, Universal quantum simulators, Science 273 (1996), no. 5278, 1073. https:/​/​doi.org/​10.1126/​science.273.5278.1073.

[58] Guang Hao Low and Isaac L. Chuang, Optimal Hamiltonian simulation by quantum signal processing, Physical Review Letters 118 (2017), no. 1, 010501, arXiv:1606.02685. https:/​/​doi.org/​10.1103/​PhysRevLett.118.010501.

[59] Guang Hao Low and Isaac L. Chuang, Hamiltonian simulation by qubitization, Quantum 3 (2019), 163, arXiv:1610.06546. https:/​/​doi.org/​10.22331/​q-2019-07-12-163.

[60] Guang Hao Low and Nathan Wiebe, Hamiltonian simulation in the interaction picture, 2018, arXiv:1805.00675.

[61] Yurii Nesterov and Boris T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming 108 (2006), no. 1, 177–205. https:/​/​doi.org/​10.1007/​s10107-006-0706-8.

[62] Yurii E. Nesterov, A method for solving the convex programming problem with convergence rate ${O}(1/​k^{2})$, Soviet Mathematics Doklady, vol. 27, pp. 372–376, 1983.

[63] John Preskill, Quantum computing in the NISQ era and beyond, Quantum 2 (2018), 79, arXiv:1801.00862. https:/​/​doi.org/​10.22331/​q-2018-08-06-79.

[64] Changpeng Shao and Ashley Montanaro, Faster quantum-inspired algorithms for solving linear systems, 2021, arXiv:2103.10309.

[65] Ju Sun, Qing Qu, and John Wright, A geometric analysis of phase retrieval, Foundations of Computational Mathematics 18 (2018), no. 5, 1131–1198, arXiv:1602.06664. https://doi.org/10.1007/s10208-017-9365-9.

[66] Ewin Tang, Quantum-inspired classical algorithms for principal component analysis and supervised clustering, 2018, arXiv:1811.00414. https:/​/​doi.org/​10.1103/​PhysRevLett.127.060503.

[67] Ewin Tang, A quantum-inspired classical algorithm for recommendation systems, Proceedings of the 51st Annual ACM Symposium on Theory of Computing, pp. 217–228, ACM, 2019, arXiv:1807.04271. https:/​/​doi.org/​10.1145/​3313276.3316310.

[68] Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, and Michael I. Jordan, Stochastic cubic regularization for fast nonconvex optimization, Advances in Neural Information Processing Systems, pp. 2899–2908, 2018, arXiv:1711.02838.

[69] Christian Weedbrook, Stefano Pirandola, Raúl García-Patrón, Nicolas J. Cerf, Timothy C. Ralph, Jeffrey H. Shapiro, and Seth Lloyd, Gaussian quantum information, Reviews of Modern Physics 84 (2012), no. 2, 621, arXiv:1110.3234. https:/​/​doi.org/​10.1103/​RevModPhys.84.621.

[70] Stephen Wiesner, Simulations of many-body quantum systems by a quantum computer, 1996, arXiv:quant-ph/​9603028.

[71] Yi Xu, Rong Jin, and Tianbao Yang, NEON+: Accelerated gradient methods for extracting negative curvature for non-convex optimization, 2017, arXiv:1712.01033.

[72] Yi Xu, Rong Jin, and Tianbao Yang, First-order stochastic algorithms for escaping from saddle points in almost linear time, Advances in Neural Information Processing Systems, pp. 5530–5540, 2018, arXiv:1711.01944.

[73] Christof Zalka, Efficient simulation of quantum systems by quantum computers, Fortschritte der Physik: Progress of Physics 46 (1998), no. 6-8, 877–879. https:/​/​doi.org/​10.1002/​(SICI)1521-3978(199811)46:6/​8<877::AID-PROP877>3.0.CO;2-A.

[74] Christof Zalka, Simulating quantum systems on a quantum computer, Proceedings of the Royal Society of London Series A: Mathematical, Physical and Engineering Sciences 454 (1998), no. 1969, 313–322, arXiv:quant-ph/​9603026. https:/​/​doi.org/​10.1098/​rspa.1998.0162.

[75] Kaining Zhang, Min-Hsiu Hsieh, Liu Liu, and Dacheng Tao, Quantum algorithm for finding the negative curvature direction in non-convex optimization, 2019, arXiv:1909.07622.

[76] Yuchen Zhang, Percy Liang, and Moses Charikar, A hitting time analysis of stochastic gradient Langevin dynamics, Conference on Learning Theory, pp. 1980–2022, 2017, arXiv:1702.05575.

[77] Dongruo Zhou and Quanquan Gu, Stochastic recursive variance-reduced cubic regularization methods, International Conference on Artificial Intelligence and Statistics, pp. 3980–3990, 2020, arXiv:1901.11518.
