Deep Reinforcement Learning for Quantum State Preparation with Weak Nonlinear Measurements

Riccardo Porotti1,2, Antoine Essig3, Benjamin Huard3, and Florian Marquardt1,2

1Max Planck Institute for the Science of Light, Erlangen, Germany
2Department of Physics, Friedrich-Alexander Universität Erlangen-Nürnberg, Germany
3Univ Lyon, ENS de Lyon, CNRS, Laboratoire de Physique, F-69342 Lyon, France

Abstract

Quantum control has been of increasing interest in recent years, e.g. for tasks like state initialization and stabilization. Feedback-based strategies are particularly powerful, but also hard to find, due to the exponentially increased search space. Deep reinforcement learning holds great promise in this regard. It may provide new answers to difficult questions, such as whether nonlinear measurements can compensate for linear, constrained control. Here we show that reinforcement learning can successfully discover such feedback strategies, without prior knowledge. We illustrate this for state preparation in a cavity subject to quantum-non-demolition detection of photon number, with a simple linear drive as control. Fock states can be produced and stabilized at very high fidelity. It is even possible to reach superposition states, provided the measurement rates for different Fock states can be controlled as well.

Quantum control has become increasingly relevant in recent years, especially with the spread of quantum computers. Handling feedback in quantum control (i.e. using measurement outcomes to steer the dynamics) is especially difficult, since the number of possible control strategies grows exponentially. The system studied here can be modelled as a cavity that can be weakly measured to obtain partial information about each energy level. To prepare and stabilize quantum states in such a cavity, we use reinforcement learning (RL), a branch of machine learning that deals with control problems. In an RL framework, the algorithm tries to maximize an objective function (in this case the fidelity) by interacting with the system through trial and error. In this work, RL manages to prepare complex superpositions of Fock states in the cavity, with only very limited linear control. The RL agent also learns to stabilize quantum states against different forms of decay.
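To make the objective concrete, here is a minimal sketch of the fidelity an agent would try to maximize, applied to the simplest case of linear control with no measurement. It is an illustration under stated assumptions, not the authors' implementation: the function names (`coherent_state`, `fock_state`, `fidelity`), the truncation at 30 levels, and the scan range are all hypothetical choices. A linear drive acting on the vacuum produces a coherent state of amplitude alpha, whose overlap with the Fock state |1> is exp(-|alpha|^2) |alpha|^2 and therefore never exceeds 1/e (about 0.37). This is why the drive alone cannot prepare a Fock state and the measurement has to supply the nonlinearity.

```python
import math
import numpy as np

def coherent_state(alpha, dim):
    """Fock-basis amplitudes of a coherent state, truncated to `dim` levels.

    Assumes a real, non-negative `alpha` for simplicity (illustrative choice).
    """
    n = np.arange(dim)
    # sqrt(n!) computed via log-gamma to avoid overflow at large n
    sqrt_fact = np.exp(0.5 * np.array([math.lgamma(k + 1) for k in n]))
    return np.exp(-alpha ** 2 / 2) * alpha ** n / sqrt_fact

def fock_state(n, dim):
    """Unit vector for the Fock state |n> in a `dim`-level truncation."""
    v = np.zeros(dim)
    v[n] = 1.0
    return v

def fidelity(psi, target):
    """Pure-state fidelity |<target|psi>|^2, the quantity used as the objective."""
    return abs(np.vdot(target, psi)) ** 2

# Scan the drive amplitude and record the best fidelity with |1>
# reachable by a linear drive alone (no measurement, no feedback).
dim = 30                                  # hypothetical truncation
alphas = np.linspace(0.0, 3.0, 301)       # hypothetical scan range
target = fock_state(1, dim)
fids = np.array([fidelity(coherent_state(a, dim), target) for a in alphas])
best_alpha = alphas[np.argmax(fids)]
best_fidelity = fids.max()
```

In a feedback setting, the same `fidelity` function would be evaluated on the state conditioned on the measurement record, which is what allows the achievable value to rise above this drive-only bound.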


Cited by

[1] Björn Annby-Andersson, Faraj Bakhshinezhad, Debankur Bhattacharyya, Guilherme De Sousa, Christopher Jarzynski, Peter Samuelsson, and Patrick P. Potts, "Quantum Fokker-Planck Master Equation for Continuous Feedback Control", Physical Review Letters 129 5, 050401 (2022).

[2] David A. Herrera-Martí, "Policy Gradient Approach to Compilation of Variational Quantum Circuits", Quantum 6, 797 (2022).

[3] Frédéric Sauvage and Florian Mintert, "Optimal Control of Families of Quantum Gates", Physical Review Letters 129 5, 050507 (2022).

[4] Anna Dawid, Julian Arnold, Borja Requena, Alexander Gresch, Marcin Płodzień, Kaelan Donatella, Kim A. Nicoli, Paolo Stornati, Rouven Koch, Miriam Büttner, Robert Okuła, Gorka Muñoz-Gil, Rodrigo A. Vargas-Hernández, Alba Cervera-Lierta, Juan Carrasquilla, Vedran Dunjko, Marylou Gabrié, Patrick Huembeli, Evert van Nieuwenburg, Filippo Vicentini, Lei Wang, Sebastian J. Wetzel, Giuseppe Carleo, Eliška Greplová, Roman Krems, Florian Marquardt, Michał Tomza, Maciej Lewenstein, and Alexandre Dauphin, "Modern applications of machine learning in quantum sciences", arXiv:2204.04198.

[5] Paolo Andrea Erdman, Alberto Rolandi, Paolo Abiuso, Martí Perarnau-Llobet, and Frank Noé, "Pareto-optimal cycles for power, efficiency and fluctuations of quantum heat engines using reinforcement learning", arXiv:2207.13104.

[6] Paolo Andrea Erdman and Frank Noé, "Driving black-box quantum thermal machines with optimal power/efficiency trade-offs using reinforcement learning", arXiv:2204.04785.

[7] Alessio Fallani, Matteo A. C. Rossi, Dario Tamascelli, and Marco G. Genoni, "Learning Feedback Control Strategies for Quantum Metrology", PRX Quantum 3 2, 020310 (2022).

[8] Riccardo Porotti, Vittorio Peano, and Florian Marquardt, "Gradient Ascent Pulse Engineering with Feedback", arXiv:2203.04271.

[9] Luigi Giannelli, Pierpaolo Sgroi, Jonathon Brown, Gheorghe Sorin Paraoanu, Mauro Paternostro, Elisabetta Paladino, and Giuseppe Falci, "A tutorial on optimal control and reinforcement learning methods for quantum technologies", Physics Letters A 434, 128054 (2022).

The above citations are from Crossref's cited-by service (last updated successfully 2022-12-08 01:44:02) and SAO/NASA ADS (last updated successfully 2022-12-08 01:44:03). The list may be incomplete as not all publishers provide suitable and complete citation data.