# Reinforcement Learning with Neural Networks for Quantum Multiple Hypothesis Testing

Sarah Brandsen$^{1}$, Kevin D. Stubbs$^{2}$, and Henry D. Pfister$^{2,3}$

$^{1}$ Department of Physics, Duke University, Durham, North Carolina 27708, USA
$^{2}$ Department of Mathematics, Duke University, Durham, North Carolina 27708, USA
$^{3}$ Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27708, USA

### Abstract

Reinforcement learning with neural networks (RLNN) has recently demonstrated great promise for many problems, including some problems in quantum information theory. In this work, we apply RLNN to quantum hypothesis testing and determine the optimal measurement strategy for distinguishing between multiple quantum states $\{ \rho_{j} \}$ while minimizing the error probability. In the case where the candidate states correspond to a quantum system with many qubit subsystems, implementing the optimal measurement on the entire system is experimentally infeasible.
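To make the notion of a minimum-error measurement concrete, the sketch below evaluates the simplest special case: two candidate states, where the optimal success probability has the closed form $P_{\mathrm{succ}} = \tfrac{1}{2}\left(1 + \lVert q\rho_0 - (1-q)\rho_1 \rVert_1\right)$ (the Helstrom bound). It is only an illustration and not the method of this work: it assumes real-valued single-qubit states, and the helper names `pure_qubit`, `eigenvalues_sym2`, and `helstrom_success` are hypothetical. For more than two candidate states no such closed form is available in general.

```python
import math

def pure_qubit(theta):
    """Density matrix of the real qubit state cos(theta)|0> + sin(theta)|1>."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def eigenvalues_sym2(m):
    """Eigenvalues of a real symmetric 2x2 matrix, largest first."""
    mean = 0.5 * (m[0][0] + m[1][1])
    radius = math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1])
    return mean + radius, mean - radius

def helstrom_success(rho0, rho1, prior0=0.5):
    """Optimal success probability for two real qubit states (binary case only)."""
    # Assumption: rho0 and rho1 are real symmetric 2x2 density matrices.
    gamma = [[prior0 * rho0[i][j] - (1 - prior0) * rho1[i][j] for j in range(2)]
             for i in range(2)]
    # The trace norm of gamma is the sum of the absolute eigenvalues.
    return 0.5 * (1.0 + sum(abs(v) for v in eigenvalues_sym2(gamma)))

# Two pure states at a relative angle of pi/4, equal priors.
example = helstrom_success(pure_qubit(0.0), pure_qubit(math.pi / 4))
```

Orthogonal states give a success probability of 1, identical states give 1/2 (a blind guess), and the example above lies in between at $\tfrac{1}{2}(1 + \sqrt{1/2}) \approx 0.854$.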

We use RLNN to find locally-adaptive measurement strategies that are experimentally feasible, where only one quantum subsystem is measured in each round. We provide numerical results which demonstrate that RLNN successfully finds the optimal local approach, even for candidate states with up to 20 subsystems. We additionally demonstrate that, in each random trial, the RLNN strategy meets or exceeds the success probability of a modified locally greedy approach.
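The structure of a locally-adaptive protocol (measure one subsystem, update the belief over hypotheses, choose the next measurement) can be sketched with a plain locally greedy baseline. The code below is a minimal sketch under strong assumptions, not the RLNN agent and not the modified greedy approach referred to above: it handles only two hypotheses, real-valued single-qubit subsystems, and product states, and all function names are hypothetical. In each round it measures in the eigenbasis of $q\rho_0 - (1-q)\rho_1$ for the current prior $q$, which is an optimal one-step measurement, then applies Bayes' rule. The success probability is computed exactly by enumerating all outcome branches, which costs $2^n$ evaluations for $n$ subsystems.

```python
import math

def pure_qubit(theta):
    """Density matrix of the real qubit state cos(theta)|0> + sin(theta)|1>."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def expectation(rho, v):
    """<v|rho|v> for a real symmetric 2x2 matrix rho and a real unit vector v."""
    return (rho[0][0] * v[0] * v[0] + 2 * rho[0][1] * v[0] * v[1]
            + rho[1][1] * v[1] * v[1])

def greedy_basis(rho0, rho1, prior0):
    """Eigenbasis of prior0*rho0 - (1-prior0)*rho1 for the current prior."""
    a = prior0 * rho0[0][0] - (1 - prior0) * rho1[0][0]
    b = prior0 * rho0[0][1] - (1 - prior0) * rho1[0][1]
    d = prior0 * rho0[1][1] - (1 - prior0) * rho1[1][1]
    phi = 0.5 * math.atan2(2 * b, a - d)  # angle of the top eigenvector
    return (math.cos(phi), math.sin(phi)), (-math.sin(phi), math.cos(phi))

def greedy_local_success(pairs, prior0):
    """Exact success probability; pairs[k] = (rho0_k, rho1_k) for subsystem k."""
    if not pairs:
        return max(prior0, 1 - prior0)  # final guess: most likely hypothesis
    (rho0, rho1), rest = pairs[0], pairs[1:]
    total = 0.0
    for v in greedy_basis(rho0, rho1, prior0):
        like0, like1 = expectation(rho0, v), expectation(rho1, v)
        p_outcome = prior0 * like0 + (1 - prior0) * like1
        if p_outcome > 1e-15:
            posterior0 = prior0 * like0 / p_outcome  # Bayes update
            total += p_outcome * greedy_local_success(rest, posterior0)
    return total

def collective_pure_success(overlaps, prior0):
    """Helstrom bound for two pure product states with given subsystem overlaps."""
    c = math.prod(overlaps)
    return 0.5 * (1 + math.sqrt(1 - 4 * prior0 * (1 - prior0) * c * c))

# Three qubit subsystems; hypothesis 1 rotates each qubit by a different angle.
angles = [0.4, 0.7, 0.2]
pairs = [(pure_qubit(0.0), pure_qubit(t)) for t in angles]
local = greedy_local_success(pairs, 0.3)
collective = collective_pure_success([math.cos(t) for t in angles], 0.3)
```

In this restricted setting (two pure product states) the greedy local value coincides with the collective bound, so `local` and `collective` agree up to rounding. With more than two hypotheses or with mixed states the two can differ, which is the regime the gap discussed in this work concerns.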

While the use of RLNN is highly successful for designing adaptive local measurement strategies, in general a significant gap can exist between the success probability of the optimal locally-adaptive measurement strategy and the optimal collective measurement. We build on previous work to provide a set of necessary and sufficient conditions for collective protocols to strictly outperform locally adaptive protocols. We also provide a new example which, to our knowledge, is the simplest known state set exhibiting a significant gap between local and collective protocols. This result raises interesting new questions about the gap between theoretically optimal measurement strategies and practically implementable measurement strategies.

Reinforcement learning with neural networks (RLNN) has recently demonstrated great promise for many problems, including some problems in quantum information theory. In this work, we apply RLNN to quantum hypothesis testing, where one is given a set of multiple quantum states $\{\rho_{j} \}$ and needs to maximize the probability of guessing the correct state by finding the optimal quantum measurement.

In general, the quantum states may correspond to a large quantum system composed of multiple smaller subsystems, and the optimal measurement may require simultaneously measuring all of the quantum subsystems. However, simultaneous measurements on a large number of quantum systems are typically not experimentally feasible to implement. The main result of this work is the use of RLNN to develop experimentally practical, locally-adaptive methods for quantum hypothesis testing where only one quantum subsystem is measured in each round. We provide numerical results which demonstrate that RLNN successfully finds the optimal local approach, even for candidate states with up to 20 subsystems. Furthermore, we demonstrate that these optimal locally-adaptive strategies are robust under noise.
