Photonic quantum data locking

Quantum data locking is a quantum phenomenon that allows us to encrypt a long message with a small secret key with information-theoretic security. This is in sharp contrast with classical information theory where, according to Shannon, the secret key needs to be at least as long as the message. Here we explore photonic architectures for quantum data locking, where information is encoded in multi-photon states and processed using multi-mode linear optics and photo-detection, with the goal of extending an initial secret key into a longer one. The secret key consumption depends on the number of modes and photons employed. In the no-collision limit, where the likelihood of photon bunching is suppressed, the key consumption is shown to be logarithmic in the dimensions of the system. Our protocol can be viewed as an application of the physics of Boson Sampling to quantum cryptography. Experimental realisations are challenging but feasible with state-of-the-art technology, as techniques recently used to demonstrate Boson Sampling can be adapted to our scheme (e.g., Phys. Rev. Lett. 123, 250503, 2019).


Introduction
In classical information theory, a celebrated result of Shannon states that a message of N bits can only be encrypted using a secret key of at least N bits [1]. This result, which lays the foundation of the security of the one-time pad, does not necessarily apply when information is encoded into a quantum state of matter or light.
The phenomenon of Quantum Data Locking (QDL), first discovered by DiVincenzo et al. [2], shows that a message of $N$ bits, when encoded into a quantum system, can be encrypted with a secret key of $k \ll N$ bits. QDL guarantees information-theoretic security
2. To send information to Bob, Alice first uses a secret key of $\log K$ bits to choose one particular unitary transformation, i.e., one particular basis in the agreed set of $K$ bases.
3. Alice selects $M$ basis vectors, $\{U_k|j_x\rangle\}_{x=1,\dots,M}$, from the chosen basis and uses them as a code to send $\log M$ bits of classical information through the quantum channel. This encoding of classical information into a quantum system $A$ is described by the classical-quantum state
$$\rho_{XA} = \frac{1}{M}\sum_{x=1}^{M} |x\rangle\langle x| \otimes U_k|j_x\rangle\langle j_x|U_k^\dagger, \qquad (1)$$
where $X$ is the classical variable encoded by Alice, which is represented by a set of $M$ orthogonal vectors $\{|x\rangle\}_{x=1,\dots,M}$ in a dummy quantum system.
In this work we assume that different code words have equal probability. As the goal of the protocol is to extend an initial secret key into a longer one, using equally probable code words is a natural assumption. It makes the analysis of the QDL protocol easier, although it can be relaxed [21,22].
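The code words prepared by Alice are then sent to Bob through a quantum channel described as a completely positive and trace preserving map $\mathcal{N}_{A\to B}$ that transforms Alice's system $A$ into Bob's system $B$. The channel maps the state in Eq. (1) into
$$\rho_{XB} = \frac{1}{M}\sum_{x=1}^{M} |x\rangle\langle x| \otimes \mathcal{N}_{A\to B}\left(U_k|j_x\rangle\langle j_x|U_k^\dagger\right). \qquad (2)$$
We require a QDL protocol to have the properties of correctness and security.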
Correctness. The property of correctness requires that, if Bob knows the secret key used by Alice to choose the code words, then he is able to decode reliably. For example, if the channel is noiseless, then $\mathcal{N}$ is the identity map and Bob receives the state of Eq. (1) unchanged. In this case, Bob can simply apply the inverse unitary, $U_k^{-1}$, followed by a measurement in the computational basis. In this way, Bob can decode with no error for any $M \le d$. If the channel is noisy, Alice and Bob can still communicate reliably at a certain rate of $r < \log d$ bits per channel use. This is possible by using error correction at any rate below the channel capacity, $r_{\max} = I(X;Y|K)$ [23]. Here $I(X;Y|K)$ denotes the mutual information between the input variable $X$ and the output of Bob's measurement $Y$, given the shared secret key $K$. Notice that here we need classical error correction and not quantum error correction, as the goal of Alice and Bob is to exchange classical information and not quantum information. Furthermore, we apply post facto error correction, as is commonly done in quantum key distribution [24], in which error correcting information is sent independently on a classical authenticated public channel.
We emphasize the importance of the assumption that the adversary has no quantum memory for the security of post facto error correction. This assumption guarantees that a potential eavesdropper has already measured their share of the quantum system when the error correction information is exchanged on a public channel. If $b$ bits of error correcting information are communicated on a public channel, then the eavesdropper cannot learn more than $b$ bits of information about the message (see footnote 2). If instead the eavesdropper has a quantum memory with storage time $\tau$, then Alice and Bob need to wait for a time larger than $\tau$ after the quantum signals have been transmitted and before proceeding with post facto error correction. In this work we assume that Alice and Bob know an upper bound on $\tau$.
Security. The property of security requires that, if Bob does not know the secret key, he can obtain no more than a negligibly small amount of information about Alice's input variable $X$. To clarify this, consider that, if Bob does not know the secret key used by Alice, then his description of the classical-quantum state is the average of Eq. (2) over the $K$ possible values of the key,
$$\rho_{XB} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{M}\sum_{x=1}^{M} |x\rangle\langle x| \otimes \mathcal{N}_{A\to B}\left(U_k|j_x\rangle\langle j_x|U_k^\dagger\right).$$
In QDL, the security is quantified using the accessible information [2,6] (or similar quantities [7,21,25]). Recall that the accessible information $I_{acc}(X;B)_\sigma$ is defined as the maximum information that Bob can obtain about $X$ by measuring his share of the state $\sigma$, that is,
$$I_{acc}(X;B)_\sigma = \max_{\mathcal{M}_{B\to Y}} I(X;Y),$$
where the optimization is over the measurement maps $\mathcal{M}_{B\to Y}$ on system $B$, and $I(X;Y)$ is the mutual information between $X$ and the outcome $Y$ of the measurement. The security property can be defined in different ways, depending on how the state $\sigma$ is chosen. Here we consider a strong notion of QDL [3] and take $\sigma$ to be the key-averaged state evaluated for the noiseless channel, i.e., with $\mathcal{N}_{A\to B}$ replaced by the identity map. This is equivalent to saying that the information remains encrypted even if Bob is capable of accessing the quantum resource directly without the mediation of a noisy channel. The data processing inequality [23] then implies that the protocol is secure for noisy channels too. In conclusion, we say that the protocol is secure if $I_{acc}(X;B) = O(\epsilon \log M)$, with $\epsilon$ arbitrarily small. This means that only a negligible fraction of the information can be obtained by measuring the quantum state without having knowledge of the secret key.

Footnote 2: To see that the public channel for error correction does not render the protocol insecure, we note that Eve's additional information about the secret key is bounded by classical information theory as follows. Let $X$ be the message sent by Alice, $Z$ the output of Eve's measurement, and $I(X;Z)$ the mutual information. After error correction, Eve obtains a bit string $C(X)$. Hence, we need to consider the mutual information $I(X;ZC(X))$. It follows from the property of incremental proportionality [2] of the mutual information that $I(X;ZC(X)) \le I(X;Z) + H(C(X))$, where $H(C(X))$ is the entropy of $C(X)$. This implies that, knowing $C(X)$ after she measured the quantum system, Eve cannot learn more than $H(C(X))$ bits about the message $X$.
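The bound used in footnote 2, $I(X;ZC(X)) \le I(X;Z) + H(C(X))$, can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the joint distribution of $(X, Z)$ is random, and the hypothetical syndrome function $C(X) = X \bmod 2$ stands in for the error-correction string, which the text does not specify.

```python
import math
import random

def entropy(p):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def marginal(joint, idx):
    """Marginal distribution over the tuple positions listed in idx."""
    out = {}
    for key, v in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + v
    return out

def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(AB); a and b are tuples of positions."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))

def random_joint(nx, nz, rng):
    """Random p(x, z, c) with the toy syndrome c = x mod 2 (an assumption)."""
    w = {(x, z): rng.random() for x in range(nx) for z in range(nz)}
    total = sum(w.values())
    return {(x, z, x % 2): v / total for (x, z), v in w.items()}

rng = random.Random(0)
joint = random_joint(4, 3, rng)
i_xz = mutual_information(joint, (0,), (1,))     # I(X;Z)
i_xzc = mutual_information(joint, (0,), (1, 2))  # I(X;Z C(X))
h_c = entropy(marginal(joint, (2,)))             # H(C(X))
print(i_xz, i_xzc, h_c, i_xzc <= i_xz + h_c + 1e-12)
```

The inequality holds for any joint distribution, so the final printed flag should be true regardless of the random seed.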
Intuitively, we expect that the larger $K$, the smaller the accessible information. This intuition has been proven true using tools from large deviation theory and coding theory [4,6,7]. The mathematical characterization of a QDL protocol consists in obtaining, for given $\epsilon > 0$, an estimate of the minimum integer $K_\epsilon$ such that there exist choices of $K = K_\epsilon$ bases that guarantee $I_{acc}(X;B) = O(\epsilon \log M)$.
Finally, the net secret key rate that can be established between Alice and Bob, through a noisy communication channel $\mathcal{N}$, is
$$r = \beta\, I(X;Y|K) - \log K,$$
where $\beta \in (0,1)$ is the efficiency of error correction, and we have subtracted the initial amount $\log K$ of secret bits shared between Alice and Bob. We emphasise that the mutual information $I(X;Y|K)$ depends on the particular noisy channel, whilst $\log K$ is universal. The noisier the channel, the smaller $I(X;Y|K)$, which accounts for the error correction overhead. The factor $\beta$ accounts for the fact that practical error correction requires more overhead than expected in theory.

Multiphoton encoding
Let $n$ photons be sent into $m$ optical modes of an interferometer with at most one photon per input mode. The input mode operators $\hat{a}$ evolve into $U\hat{a}U^\dagger$, with $U$ the unitary transformation describing the interferometer. A passive multi-mode interferometer realises a unitary transformation that preserves the total photon number. The set of all possible transformations that can be realised in this way defines the group of linear passive optical (LOP) unitary transformations, which is isomorphic to the $m$-dimensional unitary group $U(m)$ (see e.g. Ref. [20]). By Schur's lemma, the group of LOP unitaries has irreducible representations in the subspaces with definite photon number. For applications to photonic QDL, the representation with 1 photon has been studied in previous works [3,10]. This representation has the unique feature of being the fundamental representation of $U(m)$. However, the representations with higher photon number that we consider here are no longer the fundamental representation.
The output from the interferometer prior to photo-detection can be expanded in the photon-number basis:
$$|\psi_{\rm out}\rangle = \sum_{\mathbf{n}} \lambda_{\mathbf{n}}\, |n_1, n_2, \dots, n_m\rangle,$$
where $\mathbf{n} = (n_1, n_2, \dots, n_m)$ denotes a photon-number configuration with $n_i$ photons in the $i$-th mode and $\lambda_{\mathbf{n}}$ its amplitude. The aim of this paper is to characterize a particular family of QDL protocols, where information is encoded into $m \ge 2$ optical modes using $n > 1$ photons. We define the code words by putting photons on different modes, with no more than one photon per mode.
In this way we obtain a code book $\mathcal{C}^m_n$ that contains $C = \binom{m}{n}$ code words, whereas the overall Hilbert space defined by $n$ photons on $m$ modes has dimension $d = \binom{n+m-1}{n}$ (this includes states with more than one photon in a given mode). For example, with $m = 4$ modes and $n = 1$ photon, we have the $C = 4$ code words $|1000\rangle$, $|0100\rangle$, $|0010\rangle$, $|0001\rangle$. With $n = 2$ photons, we instead obtain the $C = 6$ code words $|1100\rangle$, $|0011\rangle$, $|1001\rangle$, $|0110\rangle$, $|1010\rangle$, $|0101\rangle$.
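The counting above can be reproduced with a few lines of Python. This is a minimal sketch; the function names `code_words` and `sizes` are illustrative choices and are not taken from the paper.

```python
from itertools import combinations
from math import comb

def code_words(m, n):
    """List all occupation patterns of n photons in m modes, at most one per mode."""
    words = []
    for occupied in combinations(range(m), n):
        words.append("".join("1" if i in occupied else "0" for i in range(m)))
    return words

def sizes(m, n):
    """Return (C, d): the code-book size and the n-photon Hilbert-space dimension."""
    return comb(m, n), comb(n + m - 1, n)

print(code_words(4, 1))
print(code_words(4, 2))
print(sizes(4, 2))
```

For $m = 4$ the lists contain the four and six code words quoted above, while `sizes(4, 2)` also returns the full two-photon dimension, which counts the bunched states as well.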
The two users, Alice and Bob, are linked via an optical communication channel that allows Alice to send $m$ optical modes at a time. Initially, we assume the channel is noiseless. Later we will extend the analysis to the case of a noisy channel. The goal of the protocol, which is shown schematically in Fig. 1, is for Alice and Bob to expand an initial secret key of $\log K$ bits into a longer one.
For given $n$ and $m$, Alice defines a code book $\bar{\mathcal{C}}^m_n$ by choosing a subset of $M < C$ code words from $\mathcal{C}^m_n$. The code book is publicly announced. We denote the code words as $|\psi_x\rangle$, with $x = 1, \dots, M$. To encrypt these code words, Alice applies an $m$-mode LOP unitary transformation from a set of $K$ elements $\{U_k\}_{k=1,\dots,K}$. The unitary is determined by the value of her secret key of $\log K$ bits. We recall that any LOP unitary can be realised as a network of beam splitters and phase shifters [26,27].
We can directly verify the correctness property for a noiseless communication channel. In this case, Bob, who knows the secret key, applies $U_k^{-1}$ and measures by photo-detection. He is then able to decrypt $\log M$ bits of information with no error. This implies that Alice and Bob can establish a key of $\log M$ bits for each round of the protocol.
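The encryption and decryption steps can be illustrated with a small simulation. The sketch below rests on several assumptions that are not spelled out in the text: the unitary is sampled by Gram-Schmidt orthonormalisation of complex Gaussian columns (a standard way to obtain a Haar-distributed matrix), transition amplitudes between collision-free patterns are evaluated as permanents of submatrices, and the party without the key simply measures in the photon-number basis, which is only one of many possible attacks. All function names are hypothetical.

```python
import math
import random
from itertools import combinations, permutations

def haar_unitary(m, rng):
    """Haar-random m x m unitary from Gram-Schmidt on complex Gaussian columns."""
    cols = []
    for _ in range(m):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
        for u in cols:  # remove components along the previous columns
            overlap = sum(a.conjugate() * b for a, b in zip(u, v))
            v = [b - overlap * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(abs(b) ** 2 for b in v))
        cols.append([b / norm for b in v])
    return [[cols[j][i] for j in range(m)] for i in range(m)]  # list of rows

def dagger(u):
    """Conjugate transpose of a square matrix."""
    m = len(u)
    return [[u[j][i].conjugate() for j in range(m)] for i in range(m)]

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    m = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def permanent(a):
    """Permanent by direct summation over permutations (small n only)."""
    n = len(a)
    return sum(math.prod(a[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def transition_probability(u, out_modes, in_modes):
    """|<T|U|S>|^2 for collision-free input pattern S and output pattern T."""
    sub = [[u[t][s] for s in in_modes] for t in out_modes]
    return abs(permanent(sub)) ** 2

rng = random.Random(1)
m, n = 5, 2
U_k = haar_unitary(m, rng)   # unitary selected by the secret key
code_word = (0, 1)           # photons in modes 0 and 1

# Bob knows the key: he undoes U_k and detects the original pattern.
p_bob = transition_probability(matmul(dagger(U_k), U_k), code_word, code_word)

# Without the key, photon counting yields a spread-out distribution.
p_eve = {T: transition_probability(U_k, T, code_word)
         for T in combinations(range(m), n)}
print(p_bob, max(p_eve.values()), sum(p_eve.values()))
```

The collision-free probabilities in `p_eve` sum to less than one because the remaining weight sits on outcomes with two photons in the same mode.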
To characterise the secrecy of the QDL protocol, we need to identify the minimum key size $K_\epsilon$. This is the task that we accomplish in the following sections.

Preliminary considerations
Before presenting our main results, we need to introduce some notation and preliminary results. First, consider the following state,
$$\bar\rho_B = \mathbb{E}_U\left[U|\psi\rangle\langle\psi|U^\dagger\right], \qquad (10)$$
which is defined by taking the average over the LOP unitary $U$ acting on a state $\psi$. Here $\mathbb{E}_U$ denotes the expectation value over the invariant measure (i.e., the Haar measure) on the group of LOP unitary transformations acting on $m$ optical modes. The choice of the invariant measure is somewhat arbitrary and other measures can be used, see e.g. Ref. [28]. In Eq. (10), $\psi$ is a vector in the code book $\mathcal{C}^m_n$. By symmetry, $\bar\rho_B$ is independent of $\psi$.
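By symmetry, the state $\bar\rho_B$ is block-diagonal in the subspaces $\mathcal{H}_q$, i.e.,
$$\bar\rho_B = \sum_q c_q P_q,$$
where $P_q$ denotes the projector onto $\mathcal{H}_q$. We are particularly interested in the smallest coefficient in this expansion, $c_{\min} = \min_q c_q$, which can be computed numerically for given $n$ and $m$. Examples are shown in Table 1.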
The results of our numerical estimations suggest that the minimum is always achieved for the pattern $q_{\min} = (1, 1, 1, \dots, 0, 0)$, i.e., when each mode contains at most 1 photon. An analytical expression for $c_{(1,1,1,\dots,0,0)}$ is given in Ref. [29]. Supported by the results of our numerical search, we formulate the following conjecture (Conjecture 1): for all $n$ and $m$, $c_{\min} = c_{(1,1,1,\dots,0,0)}$. We have used this conjecture to produce the plot in Fig. 3. If the number of modes is much larger than the number of photons squared, $m \gg n^2 \gg 1$, the probability that two or more photons occupy a given mode is highly suppressed. In this limit, we have $c_{\min} = n!/m^n$ (see Appendix D).
The other quantity we are interested in is the parameter $\gamma$, which is defined through a maximum over a generic $n$-photon vector $\phi$, with $\psi$ a vector in the code book $\mathcal{C}^m_n$. Again, because of symmetry, $\gamma$ is independent of $\psi$. Note that $\gamma$ quantifies how much the transition probability $|\langle\phi|U|\psi\rangle|^2$ changes when a random unitary is applied. In the regime of $m \gg n^2 \gg 1$, an analytical bound can be computed and we obtain $\gamma \le 2(n+1)$. This is discussed in Appendix D.

Results
Our main result is an estimate of the minimum key size $K_\epsilon$ that guarantees that the accessible information $I_{acc}(X;B)$ is of order $\epsilon$. This estimate is expressed in terms of the parameters $c_{\min}$ and $\gamma$ introduced in Section 4.
Proposition 1. Consider the QDL protocol described in Section 3, which encodes $\log M$ bits of information using $n$ photons over $m$ modes. For any $\epsilon, \xi \in (0,1)$, and for any $K > K_\epsilon$, there exist choices of $K$ linear optics unitaries such that $I_{acc}(X;B) < 2\epsilon \log\frac{1}{c_{\min}}$, where $K_\epsilon$ is given by Eq. (24) and $M = \xi C$. Recall that $d = \binom{n+m-1}{n}$ is the dimension of the Hilbert space with $n$ photons over $m$ modes, and $C = \binom{m}{n}$ is the number of states with no more than one photon per mode.
The parameters $\gamma$ and $c_{\min}$ depend on the particular values of $n$ and $m$. We identify three regimes for $n$ and $m$:
1. For $n = 1$, the group of linear optical passive unitaries spans all unitaries in the subspace of $n = 1$ photon over $m$ modes. The single-photon representation of the group of LOP unitaries is the fundamental representation of $U(m)$. We then obtain $\gamma = 2$ and $c_{\min} = 1/m$ [4,12].
3. For generic values of $n$ and $m$, to the best of our knowledge both $\gamma$ and $c_{\min}$ need to be calculated numerically. The estimation can be simplified if we assume Conjectures 1 and 2 introduced in Sec. 4.
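The single-photon values quoted in the first regime can be checked by Monte Carlo sampling. The sketch below assumes that $\gamma$ is estimated as the ratio of the second moment of $X = |\langle\phi|U|\psi\rangle|^2$ to the square of its first moment; this reading is an assumption, since the defining equation is not reproduced here. For one photon, $U|\psi\rangle$ is a uniformly random unit vector in $m$ dimensions, for which the exact first moment is $1/m$ and the exact ratio is $2m/(m+1)$, which approaches 2 for large $m$.

```python
import random

def single_photon_overlap(m, rng):
    """Sample |<phi|U|psi>|^2 for one photon and a Haar-random U.

    A column of a Haar unitary is a normalised complex Gaussian vector,
    so only one such vector needs to be drawn."""
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
    norm2 = sum(abs(z) ** 2 for z in g)
    return abs(g[0]) ** 2 / norm2

def moment_ratio(m, samples, rng):
    """Return the sample mean of X and the ratio E[X^2] / E[X]^2."""
    xs = [single_photon_overlap(m, rng) for _ in range(samples)]
    mean = sum(xs) / samples
    second = sum(x * x for x in xs) / samples
    return mean, second / mean ** 2

mean, ratio = moment_ratio(8, 20000, random.Random(0))
print(mean, ratio)
```

With $m = 8$ the sample mean should lie close to $1/8$ and the ratio close to $16/9$, consistent with $c_{\min} = 1/m$ and with $\gamma = 2$ as the large-$m$ value.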
We can write Eq. (24) in terms of two functions $f$ and $g$ that scale as $\log(1/\epsilon)$. For illustration, Fig. 2 shows $\log M$ and an estimate of $\log K_\epsilon$ as functions of $n$. To obtain the plot, we have chosen $m = n^3$ and used the limiting values for the parameters $\gamma = 2(n+1)$ and $c_{\min} = n!/m^n$. Note that, as $\epsilon$ is expected to be sufficiently small, this estimate for the secret key size is useful only in the limit of asymptotically large $K_\epsilon$, i.e., when one encodes information using asymptotically many modes and photons. This is certainly not the regime one is willing to test in an experimental demonstration of QDL.
The QDL protocol outperforms the classical one-time pad when $\log M > \log K_\epsilon$, for some reasonably small value of $\epsilon$. Some numerical examples are in Fig. 2, which shows that the gap between $\log M$ and $\log K_\epsilon$ increases with increasing number of modes and photons. For example, for $n = 20$, $m = 8000$, $\xi = 0.01$, and $\epsilon = 10^{-10}$, we obtain $\log M \simeq 192$ and $\log K_\epsilon \simeq 127 < 0.7 \log M$. This shows explicitly that we can achieve information-theoretic security with a private key shorter than the message if $n$ and $m$ are large enough.
Scaling up the communication protocol: in a practical communication scenario, not just one signal but a large number of signals are sent from Alice to Bob through a given quantum communication channel. Consider a train of $\nu \gg 1$ channel uses, where Alice encodes a classical variable $X^{(\nu)}$ into tensor-product code words of the form
$$|\psi_x\rangle = |\psi_{x_1}\rangle \otimes |\psi_{x_2}\rangle \otimes \cdots \otimes |\psi_{x_\nu}\rangle,$$
where each component $\psi_{x_i}$ is a state of $n$ photons over $m$ modes. Over $\nu$ channel uses, the total number of code words is denoted as $M^{(\nu)} = \xi C^\nu$, and the code rate is $\lim_{\nu\to\infty} \frac{1}{\nu}\log M^{(\nu)} = \log C$. Similarly, Alice applies local unitaries to these code words,
$$U_k = U_{k_1} \otimes U_{k_2} \otimes \cdots \otimes U_{k_\nu},$$
for a total number of $K^{(\nu)}$ allowed unitaries acting on $\nu$ channel uses. We denote as $B^\nu$ the outputs of $\nu$ channel uses received by Bob. The security condition on the mutual information then reads $I_{acc}(X^\nu; B^\nu) = O(\epsilon \log M^{(\nu)})$.

Figure 2: $\log M$ (blue curve) and the estimate of $\log K_\epsilon$ (Eq. (24)) as functions of $n$, for $m = n^3$. This is obtained using $\gamma = 2(n+1)$ and $c_{\min} = n!/m^n$, i.e., assuming the values in the no-collision limit. The other parameters are: $\epsilon = 2^{-n^s}$ with $s = 0.5$ (red dashed) and $s = 1$ (purple dotted); $\epsilon = 10^{-10}$ (green dotted dashed). If we choose the security parameter $\epsilon \propto 2^{-n^s}$, then $I_{acc} \to 0$ as $n \to \infty$. When the blue curve is higher than the other curves, the message is longer than the key. In this case, QDL beats the classical one-time pad and allows us to expand the initial secret key of $\log K$ bits into a longer key of $\log M$ bits.
The minimum secret key consumption rate then reads $k = \lim_{\nu\to\infty}\frac{1}{\nu}\log K^{(\nu)}_\epsilon$. Corollary 1 allows us to estimate the net secret key rate as the difference between the code rate and the secret key consumption rate, $r_{QDL} = \log C - k$, where Conjecture 1 implies $k = \log\gamma + \log\frac{d}{C}$. If $r_{QDL} > 0$, then QDL succeeds in beating the classical one-time pad: it generates a secret key at a rate of $\log C$ bits per channel use, which is larger than the key consumption rate of $k$ bits.
We can compare these results with the classical one-time pad encryption as well as previously known QDL protocols. We consider the three parameters that characterise symmetric key encryption: the length $\log K$ of the initial secret key, the length $\log M$ of the message, and the security parameter $\epsilon$. The classical one-time pad requires $\log K = \log M$ for perfect encryption ($\epsilon = 0$). Therefore, the comparison with QDL makes sense in the regime where $\epsilon$ can be made arbitrarily small. In this regime, we can then say that a QDL protocol beats the classical one-time pad if $K \ll M$.
The QDL protocol that has up to now the largest gap between $K$ and $M$ was proposed by Fawzi et al. in Ref. [7]. This protocol requires an initial key of constant size $\log K \sim \log(1/\epsilon)$ for any sufficiently large $M$. This is obtained by using random unitaries in the $M$-dimensional Hilbert space, and therefore requires a universal quantum computer acting on a large Hilbert space.
Proposition 1 shows that there exist QDL protocols with $\log K \sim O(\log 1/\epsilon) + \log(d/M) = O(\log 1/\epsilon) + \log(d/C) + \log(1/\xi)$. Compared with Ref. [7], the length of the secret key has an overhead proportional to $\log(d/C)$. The advantage with respect to Ref. [7] is that the encryption only requires linear optical passive unitaries. For $m$ and $n$ large, the Stirling approximation shows that this overhead becomes negligibly small in the limit of diluted photons, $m \gg n^2 \gg 1$.

Corollary 1 shows the existence of QDL protocols for $\nu$ channel uses where a secret key of $\log K \sim \nu(\log\gamma + \log d/C)$ allows us to encrypt $\log M \sim \nu\log C$ bits, where $\epsilon \to 0$ in the limit that $\nu \to \infty$, and the constant $\gamma$ depends on the particular choice of the parameters $n$ and $m$. Note that in these protocols the secret key length $\log K$ is not constant, but scales as the message length $\log M$. Although they have the same scaling, we can still have $\log M > \log K$ in some regime. Despite being less efficient in terms of key use, the advantage of these protocols is that they only need linear optics passive unitaries acting on a small number of photons and modes, i.e., $n$ and $m$ can be chosen finite and small. For example, for $n = 10$ photons over $m = 30$ modes, we obtain $\log M \simeq 25$ and $\log(d/M) \simeq 4.4 < \frac{1}{5}\log M$. From Table 4 we also obtain the numerical estimate $\log\gamma < \log(111.5) \simeq 6.8 < \frac{1}{3}\log M$. Putting $k = \lim_{\nu\to\infty}\frac{1}{\nu}\log K$, we obtain the following estimate for the asymptotic rate of secret key consumption,
$$k = \log\gamma + \log\frac{d}{C} < \left(\frac{1}{3} + \frac{1}{5}\right)\log M = \frac{8}{15}\log M.$$
This shows explicitly that less than $\log M$ bits of secret key are used to encrypt a message of $\log M$ bits. Therefore, the net key generation rate in this case is $r_{QDL} = \log M - k > \frac{7}{15}\log M$. In Section 8 we consider the effect of photon loss in terms of the net rate per mode, $r_{QDL}/m$.
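The numbers in the $n = 10$, $m = 30$ example can be reproduced directly. The sketch below assumes base-2 logarithms, takes $M = C$ per channel use, and uses the key consumption rate $k = \log\gamma + \log(d/C)$ with the value $\gamma = 111.5$ quoted from Table 4; the function name `qdl_rates` is an illustrative choice.

```python
from math import comb, log2

def qdl_rates(n, m, gamma):
    """Return (code rate, overhead log(d/C), key rate k, net rate) in bits.

    Assumes k = log2(gamma) + log2(d / C) and a noiseless channel."""
    C = comb(m, n)              # collision-free code words
    d = comb(n + m - 1, n)      # full n-photon Hilbert-space dimension
    code_rate = log2(C)
    overhead = log2(d / C)
    key_rate = log2(gamma) + overhead
    return code_rate, overhead, key_rate, code_rate - key_rate

code_rate, overhead, key_rate, net_rate = qdl_rates(10, 30, 111.5)
print(round(code_rate, 1), round(overhead, 1), round(key_rate, 1), round(net_rate, 1))
```

The first two values match $\log M \simeq 25$ and $\log(d/M) \simeq 4.4$ from the text, and the positive net rate reflects the fact that the key consumed is shorter than the message.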

Proof of Proposition 1
We prove the proposition using a random-coding argument. We show that a random choice of the code and of the set of scrambling unitaries leads, with high probability, to a QDL protocol that satisfies the security property. The code book $\bar{\mathcal{C}}^m_n$ of cardinality $M$ is randomly chosen by sampling from the code book $\mathcal{C}^m_n$ of cardinality $C$. We put $M = \xi C$. For $\xi \ll 1$, we expect that the $M$ code words are all distinct up to terms of second order in $\xi$. Therefore the $M$ code words encode $\log M - O(\log(1/\xi))$ bits of information.
The sender Alice first prepares a state $|\psi_x\rangle$, then applies a linear optics unitary $U_k$. The unitary is chosen among a pool of $K$ elements according to a secret key of $\log K$ bits. We choose the pool of unitaries by drawing $K$ unitaries i.i.d. according to the uniform Haar measure on the group $U_{LO}(m)$ of linear optics unitary transformations on $m$ modes. If the receiver does not know the secret key, the state is described by the density operator obtained by averaging over the $K$ unitaries,
$$\frac{1}{K}\sum_{k=1}^{K} U_k|\psi_x\rangle\langle\psi_x|U_k^\dagger.$$
Given the corresponding classical-quantum state, Bob attempts to extract information from it by applying a measurement $\mathcal{M}_{B\to Y}$. Such a measurement is characterised by the POVM elements $\{\alpha_y|\phi_y\rangle\langle\phi_y|\}_y$, where the $\phi_y$ are unit vectors and $\alpha_y > 0$ are such that $\sum_y \alpha_y|\phi_y\rangle\langle\phi_y| = I$, with $I$ the identity operator. Without loss of generality we can consider rank-one POVMs only [2]. The output of this measurement is a random variable $Y$ with probability density $p_Y(y)$ and conditional probability $p_{Y|X}(y|x)$. The accessible information is the maximum mutual information between $X$ and $Y$, $I_{acc}(X;B) = \max_{\mathcal{M}_{B\to Y}} I(X;Y)$. Expressing the mutual information through $p_Y$ and $p_{Y|X}$ yields Eq. (41), in which the accessible information is written as the difference of two entropy-like quantities.

The rationale of the proof is to show that for $K$ large enough, and for random choices of the unitaries and of the code words, both terms in the curly brackets of Eq. (41) are arbitrarily close to the same function of $\langle\phi_y|\bar\rho_B|\phi_y\rangle$ for all vectors $\phi_y$, where $\bar\rho_B$ is as in Eq. (10). This in turn implies that the accessible information can be made arbitrarily small. To show this we exploit the phenomenon of concentration towards the average of the sum of i.i.d. random variables. This concentration is quantified by concentration inequalities.

We now proceed along two parallel directions. First, we apply the matrix Chernoff bound [30] to show that the state $\frac{1}{K}\sum_k\frac{1}{M}\sum_x U_k|\psi_x\rangle\langle\psi_x|U_k^\dagger$ concentrates around $\bar\rho_B$. In particular, the matrix Chernoff bound implies that the corresponding operator inequality holds true up to a failure probability $p_1$. This in turn implies a bound that holds uniformly for all $\phi$. The details are presented in Appendix A. Second, we apply a tail bound from A. Maurer [31] to show that a matching bound holds up to a failure probability $p_2$. The above applies uniformly to all unit vectors $\phi$ and for almost all values of $x$. In conclusion, we obtain the second bound up to a probability smaller than $p_2$. The details are presented in Appendix B.
Putting the above results in Eqs. (45) and (49) into Eq. (41), we finally obtain an upper bound on the accessible information in the form of an average over the distribution $p_Y$. Recall that $p_Y(y) = \alpha_y\langle\phi_y|\rho_B|\phi_y\rangle$ is a probability distribution. Therefore, as the average is always smaller than the maximum, we obtain $I_{acc}(X;B) < 2\epsilon\log\frac{1}{c_{\min}}$, where $c_{\min} := \min_\phi\langle\phi|\bar\rho_B|\phi\rangle$ can be computed as shown in Section 4. The above bound on the accessible information is not deterministic, but the probability $p_1 + p_2$ that it fails can be made arbitrarily small provided $K$ is large enough (see Appendix C for details). This probability is bounded away from 1 if $K$ satisfies the two conditions derived there. The size of $K$ critically depends on the factor $\gamma$, which determines the convergence rate of the Maurer tail bound. How to estimate this coefficient is the subject of Appendix D.

Proof of Corollary 1
Consider a train of $\nu \gg 1$ channel uses. Alice encodes information using $M^{(\nu)}$ code words of the form $|\psi_x\rangle = |\psi_{x_1}\rangle \otimes |\psi_{x_2}\rangle \otimes \cdots \otimes |\psi_{x_\nu}\rangle$, where each component $\psi_{x_i}$ is chosen randomly and independently from the code book $\mathcal{C}^m_n$, which has cardinality $C$. Each $\nu$-fold code word is uniquely identified by the multi-index $x = (x_1, x_2, \dots, x_\nu)$. We put $M^{(\nu)} = \xi C^\nu$, where $\xi \ll 1$ is a small positive constant. First Alice encodes information across $\nu$ signal uses using the code words $\psi_x$, then she applies local unitaries $U_k = U_{k_1} \otimes U_{k_2} \otimes \cdots \otimes U_{k_\nu}$ to scramble them. The set of possible unitaries is made of $K^{(\nu)}$ elements. These unitaries are chosen by sampling identically and independently from the Haar measure on the unitary group $U_{LO}(m)$ of linear optical passive unitary transformations on $m$ modes. Note that, whereas $\nu$ is arbitrarily large, the number of modes $m$ in each signal transmission will be kept constant and relatively small. Also, the number of photons per channel use is fixed and equal to $n$.
In conclusion, we can straightforwardly repeat the proof of Section 6 with these new parameters. This yields that, for any arbitrarily small $\epsilon$, the bound on the accessible information holds with non-zero probability provided that $K^{(\nu)}$ satisfies the condition of Eq. (62) (recall that $M^{(\nu)} = \xi C^\nu$). Finally, in the limit of $\nu \gg 1$, and since $\lim_{\nu\to\infty}$

Noisy channels
A practical communication protocol needs to account for loss and noise in the communication channel. This requires us to introduce error correction in the classical post-processing. We address this issue here and show that the structure of our proof encompasses a large class of error correcting protocols.
In the case of a noisy and lossy channel, Alice and Bob can still use the channel by employing error correction. Error correction comes with an overhead that reduces the maximum communication rate from log M (the maximum amount of information that can be conveyed through a noiseless channel) to I(X; Y |K) ≤ log M , where I(X; Y |K) is the mutual information given that both Alice and Bob know the secret key K.
The amount of loss and noise in the communication channel can be experimentally determined with the standard tools of parameter estimation, a routine commonly employed in quantum key distribution. This in turn allows Alice and Bob to quantify I(X; Y |K).
In principle, error correction allows Alice and Bob to achieve a communication rate arbitrarily close to $I(X;Y|K)$. In practice, however, we can only partially achieve this goal. To model this fact, one usually introduces the error correction efficiency factor $\beta \in (0,1)$. Putting this together with Corollary 1, we obtain our estimate for the net rate of the protocol, $r_{QDL} = \beta I(X;Y|K) - k$, with $k$ the secret key consumption rate, where a positive net rate expresses the fact that the QDL protocol allows us to expand the initial secret key into a longer one.

Figure 3: Net rate per mode of the QDL protocol (Eq. (56)) over the classical one-time pad, in the presence of loss. A positive rate expresses the fact that the QDL protocol allows us to generate more secret bits than it consumes, hence beating the classical one-time pad encryption. The estimates of the parameters $\gamma$ and $c_{\min}$ are obtained by assuming Conjectures 1 and 2. We see that the information density per mode increases as $m$ increases. We have chosen $n$ to maximise the rate. The optimal value of $n$ depends on $\eta$, and $n \approx m/3$ for $\eta \approx 1$. For moderate losses, the optimal $n$ decreases. This suggests that QDL may be observed with high loss by increasing the number of modes. These values for the number of photons and modes are similar to those of a recent experimental demonstration of Boson Sampling [32].
As an example, consider the case where Alice and Bob communicate through a lossy optical channel. The efficiency factor $\eta \in (0,1)$ represents the probability that a photon sent by Alice is detected by Bob, including both channel losses and detector efficiency. The mutual information $I(X;Y|K)$ between Alice and Bob can be computed explicitly (see Appendix E for details); the result is given in Eq. (66). Fig. 3 shows the quantity $r_{QDL}/m$, i.e., the number of bits per mode, for $\beta = 1$, for a pure loss channel with transmissivity $\eta$. The plot is obtained assuming Conjectures 1 and 2. This shows that QDL can be demonstrated experimentally with loss and inefficient detectors. In particular, higher loss can be tolerated by increasing the number of optical modes. Note that the values for the number of photons and modes used to obtain this figure have been achieved experimentally in Ref. [32].

Conclusions
The phenomenon of Quantum Data Locking (QDL) represents one of the most remarkable separations between classical and quantum information theory. In classical information theory, information-theoretic encryption of a string of $N$ bits can only be achieved by exploiting a secret key of at least $N$ bits. This is realised, for example, by using a one-time pad. By contrast, QDL shows that, if information is encoded into a quantum system of matter or light, it is possible to encrypt $N$ bits of information with a secret key of $k \ll N$ bits. QDL is a manifestation of the uncertainty principle in quantum information theory [8,9].
Initial works on QDL have focused on abstract protocols defined in a Hilbert space of asymptotically large dimensions. More recent works have extended QDL to systems of relatively small dimensions that are transmitted through many uses of a communication channel. This approach made it possible to incorporate error correction and led to one of the first experimental demonstrations of QDL in an optical setup [13].
Inspired by Boson Sampling [33,34], in this work we have further extended QDL to a setup where information is encoded using multiple photons scattered across many modes, and processed using linear passive optics. The extension of QDL to multiphoton states is technically challenging due to the role played by higher-order representations of the unitary group.
Our protocols for multiphoton QDL have the potential to data-lock more bits per optical mode, and hence can achieve a higher information density. Experimental realisations of our protocols are challenging but feasible with state-of-the-art technology. This is suggested by recent results in photon generation and advances in integrated linear optics, e.g., Ref. [32] reported interference of 20 photons across 60 modes.
Several works have attempted to apply the physical insights of Boson Sampling in a quantum information framework beyond its defining problem. In this paper, we provide a protocol for quantum cryptography based on the physics of Boson Sampling. We have presented an information-theoretic proof that a linear-optical interferometer, fed with multiple photons, is useful for quantum cryptography. The security of our protocol does not rely on the classical computational complexity of Boson Sampling. Therefore it holds for any number of modes m and photon number n. The security proof is based on QDL and random coding techniques. We have shown that our protocol remains secure when we use classical error correction to protect the channel against photon loss and other errors. It is therefore a scalable and efficient protocol for quantum cryptography.

A Matrix Chernoff bounds
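The matrix Chernoff bound states the following (this formulation can be obtained directly from Theorem 19 of Ref. [30]): Let $\{X_t\}_{t=1,\dots,T}$ be $T$ i.i.d. Hermitian-matrix-valued random variables, with $X_t \sim X$, $0 \le X \le R$, and $c_{\min} \le \mathbb{E}[X] \le c_{\max}$. Then, for $\delta \ge 0$, the probability that the empirical average $\frac{1}{T}\sum_t X_t$ deviates from its expectation by a relative amount $\delta$ is exponentially small in $T$, where $\Pr\{x\}$ denotes the probability that the proposition $x$ is true. The bound takes different forms for $\delta > 1$ and for $\delta < 1$.

First consider the collection of $M$ code words $\psi_x$. We apply the Chernoff bound to the $M$ independent random variables $X_x = |\psi_x\rangle\langle\psi_x|$. Note that these operators are defined in a $C$-dimensional Hilbert space. For $\tau > 1$ we then obtain the corresponding tail bound. Consider now the collection of $K$ random variables $X_k = \frac{1}{M}\sum_x U_k|\psi_x\rangle\langle\psi_x|U_k^\dagger$. We assume that they are bounded by $R = \frac{1+\tau}{C}$. We apply again the Chernoff bound. The total failure probability is the sum of the two contributions. For a suitable choice of $\tau$ in terms of $C$, $K$, $c_{\min}$ and $M$, we conclude that the desired operator inequality holds up to a probability smaller than $p_1$.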

B The Maurer tail bound
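We also need to apply the following concentration inequality due to A. Maurer [31]: Let $\{X_k\}_{k=1,\dots,K}$ be $K$ i.i.d. non-negative real-valued random variables, with $X_k \sim X$ and finite first and second moments, $\mathbb{E}[X], \mathbb{E}[X^2] < \infty$. Then, for any $\tau > 0$ we have that
$$\Pr\left\{\frac{1}{K}\sum_{k=1}^{K} X_k < \mathbb{E}[X] - \tau\right\} \le \exp\left(-\frac{K\tau^2}{2\,\mathbb{E}[X^2]}\right).$$
For any given $x$ and $\phi$, we apply this bound to the random variables $X_k = |\langle\phi|U_k|\psi_x\rangle|^2$. Note that (see Section 4) their first moment is $\mathbb{E}[X_k] = \langle\phi|\bar\rho_B|\phi\rangle$, and their second moment is controlled by the parameter $\gamma$. The application of the Maurer tail bound then yields the probability bound of Eq. (80). Note that, by symmetry, $\gamma$ is independent of $\psi_x$. The calculation of $\gamma$ is presented in Appendix D.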

B.1 Extending to almost all code words
The probability bound in Eq. (80) is about one given value of $x$. Here we first extend it to $\ell$ distinct values $x_1, x_2, \dots, x_\ell$, where we use the fact that for different values of $x$ the variables are statistically independent (recall that the code words are chosen randomly and independently). Second, we extend to all possible choices of $\ell$ code words. This amounts to a total of $\binom{M}{\ell}$ events.
Applying the union bound we obtain

D Estimating the factor γ
The goal of this Appendix is to estimate the factor $\gamma$ that determines the secret key consumption rate. The objective is therefore to evaluate the first and second moments of the random variable $X = |\langle\phi|U|\psi\rangle|^2$ (Eq. (98)), where $\psi$ is restricted to be a vector in the single-occupancy subspace $\mathcal{H}_1$, which is our code space. A generic state $\phi$ can be written as a superposition of components $\phi_q$ in the subspaces $\mathcal{H}_q$. We can apply the Cauchy-Schwarz inequality as shown in Section 4 (see Eq. (21)). By symmetry, the resulting quantities $\gamma_q$ depend on $q$ but not on the particular vector $\phi_q$ in the subspace $\mathcal{H}_q$, nor on the code word $\psi$. Therefore, for each $q$, $\gamma_q$ can be computed numerically, and in turn we obtain an estimate for the upper bound on the speed of convergence,
$$\gamma \le 2\max_q \gamma_q.$$
The calculation involves matrices of the form
$$\begin{pmatrix} \Lambda_{k_1 l_1} & \Lambda_{k_1 l_2} & \Lambda_{k_1 l_3} \\ \Lambda_{k_2 l_1} & \Lambda_{k_2 l_2} & \Lambda_{k_2 l_3} \\ \Lambda_{k_3 l_1} & \Lambda_{k_3 l_2} & \Lambda_{k_3 l_3} \end{pmatrix}. \qquad (103)$$
The object $\Lambda[1^{i_1}, 2^{i_2}, \dots | 1^{j_1}, 2^{j_2}, \dots]$ denotes a matrix whose entries are taken from the matrix $\Lambda$, and whose row index $l$ occurs $i_l$ times, and whose column index $k$ occurs $j_k$ times. An example is given in Eq. (105). Using Eq. (105), we can calculate Eq. (100) for a particular photon occupancy pattern. We numerically compute $\gamma_q$ for different photon patterns for $n$ between 2 and 8; examples are given in Tables 2 and 3. Note that the number of configurations to search over grows exponentially with $n$, and thus the search becomes infeasible for large $n$. The calculations were performed in Python by computing the permanents of $n \times n$ submatrices of the $m \times m$ unitaries generated from the Haar measure. The expectation value is taken by averaging over $\sim 10^6$ runs. We observe that the highest value of $\gamma_q$ is achieved when all the photons populate only one mode. To make the calculation feasible, we conjecture (Conjecture 2) that this is also true for higher $n$; in this case, the computation can be performed much more efficiently because the submatrices have repeated rows. This conjecture has been used to produce the plots in Fig. 3. We repeat the calculation for $n = 9$ to $13$, where the results are shown in Table 4.
We now consider the regime of $m \gg n^2$, in which we can neglect photon bunching. Therefore, we compute the first and second moments of the random variable
$$X = |\langle\psi_{j'}|U|\psi_j\rangle|^2. \qquad (106)$$
This is a little less general than (98) because $\psi_{j'}$ is not a generic vector in $\mathcal{H}^m_n$. In fact $\psi_j$ and $\psi_{j'}$ identify two sets of modes, with labels $(i_1, i_2, \dots, i_n)$ and $(i'_1, i'_2, \dots, i'_n)$, respectively. This corresponds to photon-counting on the modes, which, as we know, maps onto an $n \times n$ sub-matrix $A^{(jj')}$ of the unitary matrix $U$. The random variable $X$ is the modulus square of the permanent of $A^{(jj')}$:
$$X = \left|\mathrm{Per}\, A^{(jj')}\right|^2 = \left|\sum_\pi \prod_{t=1}^{n} A^{(jj')}_{t,\pi(t)}\right|^2,$$
where the sum is over all the permutations $\pi$.
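To further explore the statistical properties of the permanent, it is useful to recall that a given entry of a random $m \times m$ unitary is itself distributed approximately as a complex Gaussian variable with zero mean and variance $1/m$. If instead we consider a submatrix of size $n \times n$, the entries are to a good approximation independent Gaussian variables as long as $n \ll m$ [33]. This means that the entries of $A^{(jj')}$ can be treated as i.i.d. complex Gaussian variables, so that
$$\mathbb{E}[X] = \sum_{\tau,\sigma}\mathbb{E}\left[\prod_i A_{i,\tau(i)} \prod_j A^*_{j,\sigma(j)}\right] = \frac{n!}{m^n},$$
since the non-zero terms are given by $i = j$, $\tau = \sigma$. From Lemma 56 of Ref. [33], the fourth moment of the permanent can be computed as $\mathbb{E}[X^2] = n!\,(n+1)!/m^{2n}$. In conclusion, we have obtained $\mathbb{E}[X^2]/\mathbb{E}[X]^2 = n + 1$, from which it follows that $\gamma \le 2(n+1)$.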
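The moments of the permanent in the Gaussian approximation can be checked by direct sampling. The sketch below draws $n \times n$ matrices of i.i.d. complex Gaussian entries with unit variance (rescaling the variance to $1/m$ multiplies $X$ by $m^{-n}$ and leaves the moment ratio unchanged) and estimates $\mathbb{E}[X]$ and $\mathbb{E}[X^2]/\mathbb{E}[X]^2$, which should approach $n!$ and $n+1$. The brute-force permanent and the sample size are illustrative choices and are not the procedure used for the tables.

```python
import math
import random
from itertools import permutations

def permanent(a):
    """Permanent by direct summation over permutations (small n only)."""
    n = len(a)
    return sum(math.prod(a[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def gaussian_matrix(n, rng):
    """n x n matrix of i.i.d. complex Gaussian entries with E|z|^2 = 1."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]
            for _ in range(n)]

def permanent_moments(n, samples, rng):
    """Monte Carlo estimates of E[X] and E[X^2] for X = |Per A|^2."""
    m1 = m2 = 0.0
    for _ in range(samples):
        x = abs(permanent(gaussian_matrix(n, rng))) ** 2
        m1 += x
        m2 += x * x
    return m1 / samples, m2 / samples

first, second = permanent_moments(2, 20000, random.Random(0))
print(first, second / first ** 2)
```

For $n = 2$ the two printed numbers should be close to $2! = 2$ and $n + 1 = 3$, up to statistical fluctuations of a few percent.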