Device-independent quantum key distribution from generalized CHSH inequalities

Device-independent quantum key distribution aims at providing security guarantees even when using largely uncharacterised devices. In the simplest scenario, these guarantees are derived from the CHSH score, which is a simple linear combination of four correlation functions. We here derive a security proof from a generalisation of the CHSH score, which effectively takes into account the individual values of two correlation functions. We show that this additional information, which is anyway available in practice, allows one to get higher key rates than with the CHSH score. We discuss the potential advantage of this technique for realistic photonic implementations of device-independent quantum key distribution.


Introduction
The aim of quantum key distribution (QKD) is to give two parties - Alice and Bob - the possibility to generate a secret key when they share a quantum channel. For instance, in the implementation proposed by Ekert [1], the channel consists of a source producing entangled particles that are distributed to Alice and Bob. At each round, Alice and Bob each measure one particle by choosing one out of several measurement settings. The claim that Alice's measurement results are secure, i.e. unknown to any third party - Eve - who may control the quantum channel, is guaranteed by inferring (from Alice and Bob's measurement results) that the source emits states close to pure bipartite entangled states. This ensures at the same time that Bob's results are correlated to Alice's if he chooses an appropriate measurement setting, i.e. Alice and Bob's measurement results can form a secret key.
Ekert suggested that the information about the key that may be available to an adversary can be quantified by choosing settings allowing Alice and Bob to violate a Bell inequality. This idea was later progressively formalised and led to what is now called device-independent QKD (DIQKD). In its simplest version, DIQKD is implemented by letting Alice choose randomly between two measurement settings at each round, A_x where x ∈ {0, 1}, while Bob's measurement includes three possible settings, B_y where y ∈ {0, 1, 2}. For settings x, y ∈ {0, 1}, the results - which can possibly take many values - are post-processed locally and turned into binary values A_x, B_y ∈ {−1, +1}. After several iterations, Alice and Bob communicate classically to estimate the CHSH score

S = ⟨A_0 ⊗ B_0⟩ + ⟨A_0 ⊗ B_1⟩ + ⟨A_1 ⊗ B_0⟩ − ⟨A_1 ⊗ B_1⟩,   (1)

where ⟨A_x ⊗ B_y⟩ = p(A_x = B_y|x, y) − p(A_x ≠ B_y|x, y) quantifies the correlation between the outcomes for measurement choices x and y. The remaining measurement setting y = 2 is chosen to generate an outcome B_2 that minimises the uncertainty with respect to A_0. Alice then forms the raw key from the outcomes A_0 of the pairs that Bob measured with the setting y = 2.
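For concreteness, the empirical estimate of these quantities can be sketched in a few lines (illustrative only; function and variable names are ours, not part of the protocol):

```python
import numpy as np

def correlator(a, b):
    """Empirical correlator <A_x (x) B_y> = p(a = b) - p(a != b) for arrays of +/-1 outcomes."""
    return float(np.mean(a * b))

def chsh_score(E):
    """CHSH score S from the 2x2 table E[x][y] of correlators."""
    return E[0][0] + E[0][1] + E[1][0] - E[1][1]

# Sanity check with the ideal quantum correlations E_xy = +/- 1/sqrt(2):
E_ideal = [[2**-0.5, 2**-0.5], [2**-0.5, -2**-0.5]]
S_ideal = chsh_score(E_ideal)   # Tsirelson's bound 2*sqrt(2)
```

In an experiment, each entry of the correlator table is estimated from the rounds on which that pair of settings was chosen.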
We consider n such rounds, over which the source produces a tripartite state |Ψ⟩_ABE shared between Alice, Bob and Eve. Ref. [2] showed that Eve's information is the same as in the case where the devices have no memory and behave identically and independently in each communication round of the protocol, up to corrections vanishing with n. In particular, we can write |Ψ⟩_ABE = |ψ⟩_ABE^⊗n, where |ψ⟩_ABE is the tripartite state of a single round, and consider the case where measurements are done successively on the state |ψ⟩_ABE.
In the asymptotic limit of large n, the number of secret bits per round obtained after one-way error correction and privacy amplification (i.e. the key rate) is then given by [3]

r = H(A_0|E) − H(A_0|B_2),   (2)

where H is the von Neumann entropy. Ref. [4] showed that the conditional entropy H(A_0|E), optimized over all states ψ_ABE and measurements A_x, B_y compatible with the observed CHSH score S, is lower bounded by

H(A_0|E) ≥ 1 − h( 1/2 + 1/2 √(S²/4 − 1) ),   (3)

where h denotes the binary entropy. This provides a lower bound on the key rate, as the conditional entropy H(A_0|B_2) can be estimated directly from Alice and Bob's measurement results associated to the setting choices A_0 and B_2. Interestingly, this bound is obtained device-independently, i.e. without assumptions on the dimension of the quantum states or the calibration of the measurements. This is not the case for standard (non-device-independent) QKD protocols, which are not based on the violation of a Bell inequality and whose security guarantees rely on the assumption that the source and measurements carry out precisely the operations foreseen by the protocol. This assumption is hard to meet in practice and leads to vulnerabilities, as demonstrated by hacking experiments [5,6,7,8]. The robustness of device-independent quantum key distribution against these attacks makes it appealing, and a race between several experimental groups is ongoing to report the first proof-of-principle distribution of a key with fully device-independent security. Measurement-DIQKD, a precursor of DIQKD where device-independence only applies to the measurement devices, but not to those used for state preparation [9,10], already admits a number of experimental implementations [11,12,13,14,15].
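The key-rate bound defined by Eqs. (2) and (3) is simple enough to code up directly (a sketch; the clamping of the binary-entropy argument is ours, to absorb floating-point rounding):

```python
from math import log2, sqrt

def h(x):
    """Binary entropy; the argument is clamped to [0, 1] to absorb rounding."""
    x = min(max(x, 0.0), 1.0)
    if x in (0.0, 1.0):
        return 0.0
    return -x*log2(x) - (1 - x)*log2(1 - x)

def chsh_entropy_bound(S):
    """Lower bound of Eq. (3) on H(A0|E), for 2 <= S <= 2*sqrt(2)."""
    return 1 - h(0.5 + 0.5*sqrt(S**2/4 - 1))

def key_rate(S, H_A0_given_B2):
    """Key rate of Eq. (2) with Eve's entropy replaced by its CHSH lower bound."""
    return chsh_entropy_bound(S) - H_A0_given_B2
```

At the local bound S = 2 the entropy bound vanishes, while at the Tsirelson bound S = 2√2 it reaches 1, i.e. Eve is completely ignorant.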
Let us note that the proof leading to the bound given in Eq. (3) only uses the knowledge of the CHSH score. This score is computed as a linear combination of the correlation functions A x B y , but the additional information provided by considering these correlations individually -which is anyway available in practice -might help to facilitate a realisation of device-independent quantum key distribution. This motivation is at the core of this work.
Concretely, we consider the individual values of two terms appearing in the CHSH score, namely

X = ⟨A_0 ⊗ B_0⟩ + ⟨A_0 ⊗ B_1⟩,   Y = ⟨A_1 ⊗ B_0⟩ − ⟨A_1 ⊗ B_1⟩.   (4)

The use of the values of X and Y proved to be useful for device-independent state certification, by improving the certified fidelity over the one obtained from the CHSH score [16]. It is also expected to be useful in DIQKD, as the knowledge of X and Y allows one to differentiate the contributions of the key generating measurement A_0 from those associated with A_1, from which no key is generally extracted (see [17] for a notable exception). Finally, in implementations of DIQKD with non-unit detection efficiencies where no-detection events are attributed a fixed value ±1, no-detections on Bob's side can only contribute to one of these two correlation functions (X or Y). The goal of the following sections is to derive a tight bound on Eve's entropy in terms of the expected values X and Y, like the bound in Eq. (3) is a function of the CHSH score S. The main result of this work is to confirm the intuition that the use of the individual values of X and Y improves the bounds on Eve's information derived from the CHSH score, and hence the key rate of DIQKD. We also apply the new bound to a concrete setup using a photon pair source based on spontaneous parametric down conversion (SPDC) and photon detections, and show that it leads to a substantial improvement of the key rate, at least for high detection efficiencies.

Formulation of the problem
We consider a family of generalized CHSH scores parametrized by an angle Ω ∈ (0, π/2). Concretely, we bound Eve's information under a constraint of the form B_Ω ≥ β, where β is deduced from the observed quantities X and Y from the following formula

β = (1/2) (C_Ω X + S_Ω Y).   (6)

(Further in the text, we will use a compact notation for sine, S_Ω = sin(Ω), and cosine, C_Ω = cos(Ω).) Obviously, Ω = π/4 reduces back to the CHSH constraint (up to normalization). Just like the CHSH score can be seen as the result of a test of the CHSH inequality, we can associate β to the test of a Bell inequality - a generalisation of the CHSH inequality - that is characterized in Ref. [18]. This characterization is also done below for the sake of completeness.
Since the post-processed outcomes are binary, the observables A_x and B_y can be taken to square to the identity. Jordan's lemma then tells us that such observables can be jointly block diagonalised with blocks of size 2 × 2, i.e.

A_x = ⊕_k A_{x,k},   B_y = ⊕_{k'} B_{y,k'},

where, without loss of generality, we can assume the restriction to each qubit block to be a real Pauli measurement satisfying A_{x,k}² = 1_k and B_{y,k'}² = 1_{k'}. This means that in each block, labelled by k and k' respectively, the measurement is characterized by unit vectors a^k_x, b^{k'}_y such that

A_{x,k} = a^k_x · (σ_z, σ_x),   B_{y,k'} = b^{k'}_y · (σ_z, σ_x),

where σ_z and σ_x are Pauli operators. The state |ψ⟩_ABE can be enforced to take the form

|ψ⟩_ABE = Σ_{k,k'} √(p_{(k,k')}) |ψ^{(k,k')}⟩_ABE,

where p_{(k,k')} is a probability distribution and each |ψ^{(k,k')}⟩_ABE is supported on the qubit blocks k, k' together with Eve's system, see Refs [21,22] for detailed discussions. Given models with such measurements and state, the quantity of interest can be expressed as

H(A_0|E) = Σ_{k,k'} p_{(k,k')} H^{(k,k')}(A_0|E),   (10)

where H^{(k,k')}(A_0|E) is Eve's conditional entropy for four-qubit models (including the two qubits from Eve's purification) involving real Pauli measurements. If the minimization of H^{(k,k')}(A_0|E) over such models satisfying (1/2)(C_Ω X^{(k,k')} + S_Ω Y^{(k,k')}) ≥ β provides a convex function of β, this function can be used directly as a lower bound on the quantity H(A_0|E) through Eq. (10). If it is not convex, it can be convexified so as to apply to all possible mixtures of states and measurements, and thus again apply to Eq. (10). This convexity property allows us to reduce the general problem of finding the minimum of Eve's conditional entropy over all possible models to a minimization over four-qubit models with real Pauli measurements. We will come back to this convexification requirement later.
Noisy preprocessing - We consider a simple post-processing of the raw key, known as noisy preprocessing [23,24,25], which has been shown to be beneficial in reducing the requirement on the detection efficiency in photonic implementations of device-independent quantum key distribution [22]. Once the raw key is obtained, Alice is instructed to generate a new raw key Â_0 by flipping each bit of the initial raw key with a probability p. (This can be described using a POVM that is a mixture of the original measurement and a measurement with the outcome labels flipped.) Note that we will often parametrise the amount of noise that Alice adds with the parameter q = (1 − 2p)².
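The preprocessing step itself is elementary; a minimal simulation (illustrative only; the seed and key length are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def noisy_preprocess(raw_key, p):
    """Flip each bit of the raw key independently with probability p."""
    flips = rng.random(len(raw_key)) < p
    return np.bitwise_xor(raw_key, flips.astype(raw_key.dtype))

p = 0.1
q = (1 - 2*p)**2                       # noise parameter entering the bounds
key = rng.integers(0, 2, size=1000)    # stand-in for Alice's raw key
noisy_key = noisy_preprocess(key, p)
flipped_fraction = float(np.mean(key != noisy_key))   # close to p for long keys
```

The added noise degrades the correlation with Bob's key, but (as shown in the references above) it degrades Eve's information even more in the relevant regime.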
Symmetrization - In order to simplify the analysis, it is convenient to consider a symmetrization step in which both parties, Alice and Bob, flip the outcomes of the key generating measurements depending on a public random bit string. This guarantees that the bits of the raw key are random, i.e. H(A_0) = H(Â_0) = 1. Importantly, one can show the equivalence of protocols with and without symmetrization, meaning that the symmetrization does not need to be implemented in practice; see [22] for a complete description of the symmetrization step in the presence of noisy preprocessing.
Reduction to Bell diagonal states - If the constraints appearing in the minimization problem do not depend on the marginal probabilities p(A_x|x) and p(B_y|y) of Alice and Bob respectively, the symmetrization step previously presented reduces the model of the state to a Bell-diagonal structure

|ψ⟩_ABE = Σ_{i=1}^4 √(L_i) |φ_i⟩_AB |i⟩_E,   (11)

where {|φ_i⟩} is the Bell basis, {|i⟩_E} are orthonormal states on Eve's side, and, without loss of generality, a partial ordering of the eigenvalues L_1 ≥ L_2 and L_3 ≥ L_4 can be imposed [21]. Note that the superscripts k, k' are omitted in the tripartite state appearing in Eq. (11), i.e. |ψ⟩_ABE stands for |ψ^{(k,k')}⟩_ABE.

Until the end of this section and in the next section, which is dedicated to the resolution of the optimization presented in Eq. (19), we omit the indices k, k' to simplify the notation, and ask the reader to keep in mind that we consider the restriction to four-qubit models with real Pauli measurements in these two sections.
Eve's conditional entropy - Eve's conditional entropy can be expressed as

H(Â_0|E) = H(Â_0) + Σ_â p(â) H(ρ_{E|â}) − H(ρ_E),   (12)

where ρ_E is the reduced state of Eve and ρ_{E|â} corresponds to Eve's state conditioned on Alice's noisy key bit Â_0 being equal to â, which occurs with probability p(â). The equivalence of the protocol with the symmetrized one allows us to take H(Â_0) = 1 and p(â) = 1/2. H(ρ_E) is given by the entropy H(L) of the probability vector L = (L_1, …, L_4), while the entropies of the conditional states ρ_{E|â} admit explicit expressions in terms of L and of the angle φ labelling Alice's measurement A_0 = cos(φ)σ_z + sin(φ)σ_x (we use the notation C_φ = cos(φ) and S_φ = sin(φ)). The two states ρ_{E|â=±1} are related by a simple unitary transformation and therefore have the same entropy, see App. A.2 for details. The expressions of these entropic quantities provide an explicit way to compute H(Â_0|E) as a function of the parameters L and φ. Let us now turn our attention to the constraints.
Quantum correlations in the (X,Y) plane -As mentioned earlier, we are considering quantum models with the values of correlators X and Y given by Eq. (4). Without loss of generality, we can assume X, Y ≥ 0, which can always be attained by relabelling the measurement outcomes of A 1 , B 0 and B 1 (i.e. without touching the angle φ).
In this positive quadrant of the plane, the local strategies are delimited by the CHSH inequality X + Y ≤ 2, i.e. the line connecting the deterministic strategies (X, Y) = (2, 0) and (X, Y) = (0, 2). This implies the local bound β_L = max(C_Ω, S_Ω) for the generalized CHSH tests. To identify the upper limit of the quantum set, we consider the expected value of the generalized CHSH operator

B_Ω = (1/2) [ C_Ω (A_0 ⊗ B_0 + A_0 ⊗ B_1) + S_Ω (A_1 ⊗ B_0 − A_1 ⊗ B_1) ].   (15)

To find its maximum value, we use the qubit parametrization of the measurements A_x, B_y and parametrize the measurement directions on Bob's side as b_0 = cos(θ) c + sin(θ) c_⊥ and b_1 = cos(θ) c − sin(θ) c_⊥, with two arbitrary perpendicular unit vectors c and c_⊥, so that cos(2θ) = b_0 · b_1. From the diagonalization of the operator on the right hand side of Eq. (15), one finds the quantum bound β ≤ 1, attained at (X, Y) = (2 cos(Ω), 2 sin(Ω)) by a maximally entangled two-qubit state and measurement settings a_0 · a_1 = 0 and b_0 · b_1 = cos(2Ω). It follows that Eve's information is constrained by the part of the quantum set lying between the line X + Y = 2 and the circle X² + Y² = 4. At this point, we can already conclude that any quantum model with (X, Y) lying on the circle satisfies H(Â_0|E) = 1 (except for the two points with X + Y = 2), since the underlying state of Alice and Bob has to be pure. This is a straightforward improvement over the CHSH bound.
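Assuming the normalization β = (cos(Ω) X + sin(Ω) Y)/2 used above, the local and quantum limits in the (X, Y) plane can be checked numerically as follows (a sketch):

```python
from math import cos, sin, pi

def beta(X, Y, Omega):
    """Generalized CHSH score beta = (cos(Omega) X + sin(Omega) Y)/2."""
    return 0.5*(cos(Omega)*X + sin(Omega)*Y)

def local_bound(Omega):
    """Largest beta reachable locally, from the deterministic points (2,0) and (0,2)."""
    return max(cos(Omega), sin(Omega))

def in_quantum_set(X, Y):
    """Necessary condition in the positive quadrant: X^2 + Y^2 <= 4."""
    return X**2 + Y**2 <= 4 + 1e-12

Omega = pi/3
X, Y = 2*cos(Omega), 2*sin(Omega)   # point saturating the quantum bound beta = 1
```

Each test Ω thus singles out a different point of the circle X² + Y² = 4 where its violation is maximal.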
Formulation of the problem to solve - The reductions introduced so far invite us to first solve the following optimization,

I(β; Ω, q) = max_{L, φ, a_1, b_0, b_1} [ H(L) − H(ρ_{E|â=+1}) ]  subject to  B_Ω(L, φ, a_1, b_0, b_1) ≥ β,   (19)

and then consider directly the solution I(β; Ω, q) if it is concave in β, or construct a concave function Ī(β; Ω, q) ≥ I(β; Ω, q), to bound Eve's uncertainty using

H(Â_0|E) ≥ 1 − min_Ω Ī( (C_Ω X + S_Ω Y)/2 ; Ω, q ).   (20)

Note that from the symmetries of the goal function H(L) − H(ρ_{E|â=+1}) and of the constraint B_Ω, we can assume φ ∈ [0, π/4] for the key generating setting, and L_1 − L_2 ≥ L_3 − L_4 in addition to L_1 ≥ L_2 and L_3 ≥ L_4 for the state, see App. A.1 for the details. Further note that we will often use a parametrisation of the tripartite state given by a 3-component vector T = (T_z, T_x, T_p) built from the eigenvalues L.

Bounding Eve's information with generalized CHSH tests
We are now ready to compute a bound on Eve's information as a function of the generalized CHSH score given in Eq. (6) by solving the optimisation problem given in Eq. (19). Among the parameters of the model in Eq. (19), the measurement settings a_1, b_0 and b_1 only influence the constraint but not the goal function. Furthermore, it is shown in Ref. [22] that H(ρ_{E|â=+1}) is a monotonic function of the key generating setting φ ∈ [0, π/4]. We can thus decompose the maximization problem in two steps. First, for a fixed state L, we find the lowest angle φ allowing to satisfy the constraint,

φ*(L, β, Ω) = min { φ ∈ [0, π/4] : max_{a_1, b_0, b_1} B_Ω(L, φ, a_1, b_0, b_1) ≥ β }.

Second, we fix φ = φ*(L, β, Ω) to the optimal value for Eve, and maximize her information with respect to the state, that is, we solve

I(β; Ω, q) = max_L [ H(L) − H(ρ_{E|â=+1}) ] with φ = φ*(L, β, Ω).   (22)

We solve Eq. (22) in App. A.3. The expression of the optimal angle φ* depends on whether the parameter Ω exceeds π/4. We treat the two cases Ω ≤ π/4 and Ω > π/4 separately.

The simple case with Ω ≤ π/4
For the Bell tests satisfying Ω ≤ π/4, which include the CHSH test, the observed score β does not constrain the key generating setting φ but only the state L, see App. A.3 for details. As a result, there always exists a realization with the optimal angle φ* = 0 as long as the state is such that the Bell score can be attained, i.e. as long as C_Ω² T_z² + S_Ω² T_x² ≥ β². In other words, the optimization Eq. (22) yields φ* = 0, and an analytical maximization of Eve's information becomes possible: the conditional state ρ_{E|â=+1} is then block diagonal and its entropy has a simple closed-form expression. Such a maximization has been done for the CHSH case (Ω = π/4) in Ref. [22]. Ref. [26] pointed out that the analytical bounds on conditional entropies given in this reference assume qubit attacks. The same bounds were derived using a different approach in [26], where it is also proved that these bounds are convex. The convexity results of [26], together with Jordan's lemma, imply that the obtained qubit bounds are in fact valid in any dimension. The same convexity proof applies to the current situation with Ω ≤ π/4. For the sake of completeness, we provide in App. A.5 an alternative proof of convexity, which directly applies to the present case and to [22]. We show in particular (see App. A.4) that

I(β; Ω, q) = h_q( √( (β² − C_Ω²)/S_Ω² ) ),   (25)

where h_q is an explicit function built from the binary entropy h and the noise parameter q, whose form is given in App. A.4; for q = 1 (no added noise) it reduces to h_q(z) = h(1/2 + z/2). The concavity of I(β; Ω, q), and hence the convexity of the resulting bound on Eve's entropy, is established there as well. Finally, it remains to determine the optimal inequality to use for a given point (X, Y). h_q(z) being a monotonically decreasing function of z, we want to maximize its argument with respect to Ω in the range where the argument of the square root is positive (which means that the value of the Bell test exceeds the local bound). Manifestly, the expression has a global maximum at

tan(Ω*) = (4 − X²)/(X Y).   (28)

Now, we have to verify that the optimal test we found satisfies Ω* ≤ π/4. One finds that this is the case for X(X + Y) ≥ 4, providing a bound on Eve's entropy as a direct function of the correlators X and Y:

H(Â_0|E) ≥ 1 − h_q( Y/√(4 − X²) ).

One easily verifies that this bound is indeed better than the CHSH formula of [22].
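For the no-preprocessing case p = 0 (q = 1), where h_q(z) reduces to the binary-entropy expression h(1/2 + z/2), the resulting recipe takes a few lines (a sketch based on the formulas above; the general-q case would substitute the full h_q):

```python
from math import atan2, log2, sqrt, pi

def h(x):
    x = min(max(x, 0.0), 1.0)
    if x in (0.0, 1.0):
        return 0.0
    return -x*log2(x) - (1 - x)*log2(1 - x)

def optimal_omega(X, Y):
    """Optimal generalized CHSH test: tan(Omega*) = (4 - X^2)/(X Y), cf. Eq. (28)."""
    return atan2(4 - X**2, X*Y)

def entropy_bound_XY(X, Y):
    """H(A0|E) >= 1 - h(1/2 + z/2) with z = Y/sqrt(4 - X^2), for X(X+Y) >= 4, q = 1."""
    assert X*(X + Y) >= 4 - 1e-9
    z = Y/sqrt(4 - X**2)
    return 1 - h(0.5 + 0.5*z)

def entropy_bound_chsh(S):
    """CHSH-only bound of Eq. (3), for comparison."""
    return 1 - h(0.5 + 0.5*sqrt(S**2/4 - 1))
```

On the curve X(X + Y) = 4 the two bounds coincide; away from it (but inside the validity region) the (X, Y) bound is strictly better.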

Bounding Eve's information from a numerical optimization
For the remaining Bell tests, with Ω > π/4, the situation is different. Here, the generalized CHSH score β does not only constrain the state L but also the setting of the key generating measurement. We therefore adopt a strategy in two steps. First, we develop a method that can efficiently compute a bound on Eve's information, either heuristically or under a well-defined ansatz. Second, we provide a numerical method able to formally certify the validity of a given bound.
Considering the parameters when Ω > π/4, we find that there are two different regions. First, for the states falling in the region

S = { L : C_Ω² T_z² + S_Ω² T_x² ≤ β² },

the constraint B_Ω ≥ β can only be satisfied with a measurement angle φ ≥ φ*(L, Ω, β), where cos²(φ*) = c*²(L, Ω, β) is given by a closed-form expression (Eq. (33)). The constraint on the angle only becomes trivial, c*² = 1, on the boundary of the region S, where C_Ω² T_z² + S_Ω² T_x² = β². Second, outside of this region, i.e. for states with C_Ω² T_z² + S_Ω² T_x² ≥ β², the Bell score β can also be attained with φ = 0. However, these models provide less information to Eve as compared to those on the boundary C_Ω² T_z² + S_Ω² T_x² = β², see the discussion at the end of App. A.4. So we can safely ignore this region.
To find the best strategy for Eve, it thus remains to solve

I(β; Ω, q) = max_{L ∈ S} [ H(L) − H(ρ_{E|â=+1}) ] with φ = φ*(L, Ω, β).   (34)

This optimization only involves an analytic function of three parameters on a compact domain. It can be easily and time-efficiently solved heuristically by standard numerical methods, e.g. using fmincon in MATLAB, NMaximize in Mathematica, or scipy.optimize in Python. We now give an ansatz for the solution of the optimization given in Eq. (34), which allows one to speed up its numerical resolution even further.
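To give a feel for such a heuristic, the sketch below treats the CHSH case Ω = π/4 without noisy preprocessing, where φ* = 0 and the conditional entropy of the Bell-diagonal model takes the simple closed form 1 + h(L_1 + L_2) − H(L); maximizing Eve's information over the state then reproduces the analytical CHSH bound of Eq. (3). The modelling shortcuts here (closed-form entropy, CHSH-only constraint) are ours and serve only to illustrate the use of scipy.optimize:

```python
import numpy as np
from scipy.optimize import minimize

def H(p):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    p = np.clip(np.asarray(p, dtype=float), 1e-15, 1.0)
    return float(-np.sum(p*np.log2(p)))

def h(x):
    return H([x, 1.0 - x])

def eve_entropy(L):
    """H(A0|E) = 1 + h(L1 + L2) - H(L) for phi = 0, no preprocessing (illustrative)."""
    return 1.0 + h(L[0] + L[1]) - H(L)

def chsh_max(L):
    """Largest CHSH score of the Bell-diagonal state with correlators t_z, t_x."""
    tz = L[0] + L[1] - L[2] - L[3]
    tx = L[0] - L[1] + L[2] - L[3]
    return 2.0*np.sqrt(tz**2 + tx**2)

S_target = 2.4
z = np.sqrt(S_target**2/4 - 1)
analytic = 1.0 - h(0.5 + 0.5*z)                               # CHSH bound of Eq. (3)
best = eve_entropy([0.5 + 0.5*z, 0.5 - 0.5*z, 0.0, 0.0])      # known optimal attack, as seed

cons = [{'type': 'eq',   'fun': lambda L: np.sum(L) - 1.0},
        {'type': 'ineq', 'fun': lambda L: chsh_max(L) - S_target}]
for L0 in ([0.9, 0.08, 0.015, 0.005], [0.7, 0.2, 0.06, 0.04], [0.85, 0.13, 0.01, 0.01]):
    res = minimize(eve_entropy, L0, method='SLSQP', bounds=[(0, 1)]*4, constraints=cons)
    if res.success and chsh_max(res.x) >= S_target - 1e-6:
        best = min(best, float(res.fun))
```

Restarting from several initial points is a cheap guard against the local minima such heuristics can get trapped in.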

Ansatz
First, we observe that the vector L saturating Eve's information only has two non-zero coefficients, L_1 = 1 − L_3 and L_2 = L_4 = 0, or equivalently T_z = 1 and T_x = T_p. With this observation, the previous optimization problem becomes a scalar optimization over the single remaining state parameter; we denote its solution Ĩ_anz(β; Ω, q). Second, we see that the bound Ĩ_anz(β; Ω, q) is not concave for small β. However, we observe that its concave roof can be obtained by drawing a line which passes through the left end point of the curve, (β_L, Ĩ_anz(β_L; Ω, q)), and is tangent to the curve Ĩ_anz(β; Ω, q) at a larger score. The value β* of the generalized CHSH score at the tangent point can be found by solving

∂Ĩ_anz/∂β (β*; Ω, q) = [ Ĩ_anz(β*; Ω, q) − Ĩ_anz(β_L; Ω, q) ] / (β* − β_L).   (37)

Labelling I*(Ω, q) = Ĩ_anz(β*; Ω, q), this leads to the concave roof

Ī_anz(β; Ω, q) = Ĩ_anz(β_L; Ω, q) + (β − β_L) [ I*(Ω, q) − Ĩ_anz(β_L; Ω, q) ] / (β* − β_L) for β ≤ β*, and Ī_anz(β; Ω, q) = Ĩ_anz(β; Ω, q) for β ≥ β*.   (38)

At this stage, we further observe that the optimal value of T_x² for values of β ≥ β* coincides with its maximum possible value (implying c* = 1). We thus define I_anz(β; Ω, q) accordingly. Interestingly, this expression coincides with the solution given in Eq. (25) for the case Ω ≤ π/4, meaning in particular that the optimal value of Ω for I_anz is given by Eq. (28). In Eqs. (37) and (38) we can now replace Ĩ_anz(β; Ω, q) with I_anz(β; Ω, q), which does not involve any nonlinear optimization.
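The tangency condition of Eq. (37) is easy to solve numerically with a root finder. The sketch below does this for a toy non-concave curve standing in for Ĩ_anz (the smoothstep function, with left end point (0, 0)); all specifics of the toy curve are ours:

```python
from scipy.optimize import brentq

# Toy stand-in for a non-concave score-information curve on [0, 1]:
f  = lambda b: 3*b**2 - 2*b**3        # convex near 0, concave near 1
df = lambda b: 6*b - 6*b**2

bL, fL = 0.0, 0.0                     # left end point of the curve

def tangency(b):
    """Eq. (37)-type condition: slope at b equals the chord slope from (bL, fL)."""
    return df(b)*(b - bL) - (f(b) - fL)

b_star = brentq(tangency, 0.1, 0.99)  # tangent point (here b* = 3/4 exactly)

def concave_roof(b):
    """Chord up to b_star, the curve itself afterwards, cf. Eq. (38)."""
    if b <= b_star:
        return fL + (b - bL)*(f(b_star) - fL)/(b_star - bL)
    return f(b)
```

The roof lies on or above the original curve everywhere, which is exactly what the mixing argument of the security analysis requires.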
While we believe this expression to be the true bound, we do not have a formal proof. In any case, this conjectured expression helps to solve the optimization of interest.

Certified numerical solution
As mentioned before, the optimization in Eq. (34) can be easily solved by standard numerical methods. However, to provide a strict security guarantee for an actual implementation of DIQKD, such a numerical optimization would need to be done in a certified manner, with a formal proof that the obtained numbers lower bound Eve's conditional entropy on the whole domain. Below we present an algorithm which allows one to do such a certified optimisation, based on the Lipschitz continuity of the goal function. The algorithm is rather time-costly, but it only has to be run once the optimal experimental parameters are fixed through an ad hoc maximization of Eq. (34), cf. below.
Concretely, we present in this section an algorithm that approximates the set of possible strategies of Eve, delimited by the bound I(β; Ω, q), from the outside. To avoid the concavity issue posed by Eq. (19), we rewrite the problem in a dual form in which we look for the tangent lines to the curve I(β; Ω, q) with different slopes t,

f(t; Ω, q) = max_{L, φ} [ H(L) − H(ρ_{E|â=+1}) − t β_max(L, φ; Ω) ],   (39)

so that I(β; Ω, q) ≤ t β + f(t; Ω, q) for all β. In Eq. (39) we used the fact that it is only the Bell score that depends on the measurement settings a_1, b_0, b_1, so it can be maximized straightforwardly to define β_max(L, φ; Ω), see App. A.6 for its closed-form expression.
Before giving the details on how we solve this dual form, let us briefly discuss how it shall be used. We consider an actual implementation of DIQKD with fixed values of X*, Y* and q*, for which there is an optimal value Ω* which saturates the minimum in Eq. (20) with β* = (C_Ω* X* + S_Ω* Y*)/2. Therefore, an optimal security guarantee for this particular implementation only requires the knowledge of the function I(β; Ω*, q*) at a single point. The same lower bound on Eve's conditional entropy can be obtained from the value of the dual bound f(t; Ω*, q*) at a single point. Indeed, the concavity of I(β; Ω, q) ensures that there exists a value t* for which the tangent-line bound t* β + f(t*; Ω*, q*) touches the curve at β = β*. Hence, for a fixed experimental implementation of the protocol, it is sufficient to certify a single value of the function f(t*; Ω*, q*) in order to provide a strict and optimal security guarantee. Furthermore, the value t* is straightforward to find from the knowledge of Ω*, β*, q* and the function I(β; Ω, q).
We can now comment on the algorithm used to provably upper bound the quantity in Eq. (39). The basic idea is a branch and bound approach relying on the Lipschitz continuity of the goal function. Concretely, we first derive a parametrization of the set (L, φ) for which the goal function in Eq. (39) is Lipschitz continuous, with a constant that we compute. Then, we obtain an upper bound on its value on the whole domain by computing the value of the function on a grid of points. Finally, the algorithm subsequently refines the grid around the points where the value of the function is large in order to approach, step by step, the global maximum. Additional details are given in the next three sections.
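Schematically, the branch and bound idea can be illustrated on a one-dimensional toy problem: if G is K-Lipschitz, then on an interval of half-width w the value G(midpoint) + K·w upper bounds G on that interval, and intervals whose upper bound falls below the best value already found can be discarded. (The actual goal function of Eq. (39) lives on a four-dimensional angle domain; the function, constant and interval below are toy choices of ours.)

```python
import heapq
from math import sin, cos

def certified_max(G, K, a, b, eps=1e-4):
    """Certified maximization of a K-Lipschitz function G on [a, b]: returns
    (lower, upper) with lower <= max G <= upper and upper - lower <= eps."""
    mid, w = 0.5*(a + b), 0.5*(b - a)
    best = G(mid)                               # certified lower bound (a witness value)
    heap = [(-(best + K*w), a, b)]              # max-heap on interval upper bounds
    while True:
        neg_ub, lo, hi = heapq.heappop(heap)
        if -neg_ub - best <= eps:               # largest upper bound nearly attained
            return best, -neg_ub
        m = 0.5*(lo + hi)
        for l, r in ((lo, m), (m, hi)):         # split and re-bound the two halves
            c, hw = 0.5*(l + r), 0.5*(r - l)
            val = G(c)
            best = max(best, val)
            heapq.heappush(heap, (-(val + K*hw), l, r))

# Toy example: G(x) = sin(3x) + cos(x) on [0, 2], with |G'| <= 4.
low, up = certified_max(lambda x: sin(3*x) + cos(x), K=4.0, a=0.0, b=2.0)
```

The returned upper value is a rigorous bound because, at every step, the heap holds a covering of the full domain together with a valid Lipschitz upper bound on each piece.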
We remark that another algorithm for solving this optimisation was recently developed in a separate work [17], which we believe could be adapted to our situation as well. The basic idea in that work is somewhat different - for fixed measurement angles for Alice and Bob, they find a semidefinite programming (SDP) relaxation for the minimization of Eve's entropy with respect to the state. This SDP is then solved on a grid of angles, and continuity of the goal function with respect to the angles is used to certify that the bound is secure. One advantage of the approach we propose here is that it provably converges to a tight bound, whereas the SDP relaxation in [17] is not known to be tight¹.
Lipschitz continuity of the entropy with respect to the angle - A key ingredient in the practical implementation of the desired certified algorithm is that the von Neumann entropy H(ρ) has a bounded rate of change with respect to the angular distance between two states ρ and σ (see App. B.1),

A(ρ, σ) = arccos √F(ρ, σ),

where the fidelity is defined as F(ρ, σ) = ( tr √(√ρ σ √ρ) )². This angle is a metric on the set of states [28]. We show that for n-dimensional quantum states, the entropy satisfies a Lipschitz continuity bound with respect to this metric (Eq. (44)), with a Lipschitz constant that depends on the dimension n and involves the numerical constant r_1 ≈ 0.203 (see Eq. (157) below for details). This contrasts with the trace distance, another metric on the set of quantum states, for which the entropy has an infinite rate of change around non-full-rank states.
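For intuition, the fidelity, the angular distance and the entropies entering this bound are straightforward to compute; a small sketch for diagonal qubit states (illustrative only; the Lipschitz constant itself is derived in App. B.1):

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho, sigma):
    """F(rho, sigma) = (tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = sqrtm(rho)
    return float(np.real(np.trace(sqrtm(s @ sigma @ s)))**2)

def angular_distance(rho, sigma):
    """A(rho, sigma) = arccos sqrt(F(rho, sigma))."""
    return float(np.arccos(np.clip(np.sqrt(fidelity(rho, sigma)), 0.0, 1.0)))

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.clip(np.linalg.eigvalsh(rho), 1e-15, 1.0)
    return float(-np.sum(ev*np.log2(ev)))

rho = np.diag([0.9, 0.1])
sigma = np.diag([0.8, 0.2])
A = angular_distance(rho, sigma)          # angular distance between the two states
dH = abs(entropy(rho) - entropy(sigma))   # to be compared with const * A
```

Note that sigma here has an eigenvalue close to the boundary; in trace distance the entropy would have a diverging rate of change at such non-full-rank limits, while the angular metric keeps it bounded.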
Bounding the gradient of the goal function - To apply the continuity bound above to the goal function in Eq. (39), we use a parametrization of the state in terms of angles, so that the quantum models (L, φ) are described by four angles x = (x_1, x_2, x_3, x_4); the ordering constraints L_1 ≥ L_2, L_3 ≥ L_4 and L_1 − L_2 ≥ L_3 − L_4 can then be imposed as simple restrictions on the domain of these angles (see App. B.2). To obtain a bound on the gradient of the goal function G(x), we bound the gradient of each term independently, as described in App. B.2 and B.3. For the entropic terms, we use the continuity bound of Eq. (44), while the computation of the maximal gradient is quite straightforward for the Bell score. Combining the three terms, we obtain a global bound on the gradient valid on the whole domain of x.

¹ Note that after the publication of this manuscript, some of the authors have shown how to perform an SDP relaxation that converges to a tight bound as well, see [27].
With this global bound on the gradient, a certified maximization of the function G(x) can be obtained with a branch and bound approach. In order to do so, we extended the code developed in Ref. [29] to include global optimization. A detailed description of this code can be found in App. B.4. A Python implementation, as well as an example for our case of interest, can be found on Gitlab².

Improved bound on Eve's conditional entropy - In order to demonstrate the advantage of considering the pair of variables (X, Y) when bounding Eve's conditional entropy, we compute the bound on H(Â_0|E) for different values of X, Y and p, and compare it to the bound obtained from the CHSH score [22]. Because of the two regimes identified earlier, we compute both the optimal bound assuming Ω ≤ π/4 and the one assuming Ω > π/4 for each value of X and Y, and keep the best one. In the case Ω ≤ π/4, the optimal choice of Ω is readily given by Eq. (28). In the other case, we optimize the bound over Ω ∈ (π/4, π/2]. The difference between this optimal bound on H(Â_0|E) given X and Y and the bound of Eq. (3) is shown in Fig. 1. It shows that our bound on H(Â_0|E) is better than the one derived from the CHSH score, except along the line satisfying X(X + Y) = 4. We emphasise that our derivation of the bound is constructive, that is, we find the optimal attack that gives H(Â_0|E) to Eve. The final bound is thus tight, up to the precision of the numerical algorithms for Ω > π/4.
Implication for a practical realization of DIQKD - We now study the potential impact of our bound on practical realizations of DIQKD. In the limit of asymptotically many repetitions, an implementation is uniquely characterized by its key rate r, which is given in Eq. (2). In our case, this key rate is determined by three quantities: X, Y, and Bob's uncertainty about Alice's key generating bit as a function of the noisy preprocessing parameter, given by H(Â_0|B_2). Hence, in order to find the optimal design for an experimental implementation of DIQKD, we express these quantities as functions of the model's parameters. Then, we maximize the key rate, i.e. the difference between our lower bound on H(Â_0|E) evaluated at the model's (X, Y) and H(Â_0|B_2), over these parameters. Solving this maximization gives a bound ř on the key rate, the values X*, Y*, H(Â_0|B_2) expected for a given implementation, as well as the optimal values of the parameters Ω* and q* for this implementation.

Figure 1: Difference between the optimal bound on H(Â_0|E) given X and Y and the CHSH bound of Eq. (3). In the presence of noisy preprocessing (i.e. p > 0), the advantage follows a similar distribution, but is smaller in magnitude. The CHSH bound is only optimal along the curve X(X + Y) = 4. The advantage on the right-hand side of this curve is obtained with Ω < π/4, and on the left-hand side with Ω > π/4. The yellow (red) curve shows the trajectory in the X-Y plane which optimizes the key rate in an optical implementation of DIQKD for low (high) detection efficiencies. At the efficiency η = 0.923, it is better to switch from one curve to the other (at the transition between full and dashed curves). The points with Ω > π/4 were computed both with the heuristic method and with the ansatz described in Sec. 3.2.
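To make the dependence on the model's parameters concrete, here is a simplified stand-in for such a model: a partially entangled two-qubit state cos(θ)|00⟩ + sin(θ)|11⟩, measurements in the z-x plane, and no-detection events (occurring with probability 1 − η per side) recorded as +1. This is a toy version of the SPDC model of Ref. [22]; all modelling choices below are ours:

```python
from math import cos, sin, pi, sqrt

def xy_correlators(theta, alphas, betas, eta):
    """X and Y for the state cos(theta)|00> + sin(theta)|11>, observables
    cos(angle)*sigma_z + sin(angle)*sigma_x, and no-clicks recorded as +1."""
    def E(a, b):
        eq = cos(a)*cos(b) + sin(2*theta)*sin(a)*sin(b)     # lossless correlator
        ma, mb = cos(a)*cos(2*theta), cos(b)*cos(2*theta)   # lossless marginals
        return eta**2*eq + eta*(1 - eta)*(ma + mb) + (1 - eta)**2
    a0, a1 = alphas
    b0, b1 = betas
    X = E(a0, b0) + E(a0, b1)
    Y = E(a1, b0) - E(a1, b1)
    return X, Y

# Ideal CHSH settings for the maximally entangled state theta = pi/4:
X1, Y1 = xy_correlators(pi/4, (0, pi/2), (pi/4, -pi/4), eta=1.0)
X9, Y9 = xy_correlators(pi/4, (0, pi/2), (pi/4, -pi/4), eta=0.9)
```

Scanning θ and the measurement angles, or feeding them to an optimizer, traces out trajectories in the (X, Y) plane like those of Fig. 1.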

SPDC-based implementation of DIQKD - Photonic experiments using a source based on spontaneous parametric down conversion (SPDC) are among the most promising setups for implementing DIQKD, as shown by recent experiments reporting the violation of a Bell inequality without the fair sampling assumption [30,31,32,33,34,35]. We consider such a setup, in which an SPDC source is used to create and distribute polarization entanglement between distant parties who perform measurements as prescribed by the proposed protocol. The main limitation in this setup is the overall detection efficiency, i.e. the possibility of losing a photon at any point between its creation at the source and its final detection. To reflect photon losses and non-unit detection efficiency, the transmission channel between the source and the parties is modeled as a lossy channel with an overall transmission η. It is also important to include the statistics of an SPDC source, which does not produce a two-qubit state, but a state that contains vacuum and multiple-photon components. We invite the reader to consult Ref. [22] for explicit expressions of the exact statistics created by this source, as well as a description of the tunable parameters.
When computing the key rate for an SPDC source with a security determined by the CHSH score, any values of X and Y with the same sum impose the same bound on Eve's conditional entropy H(Â_0|E). In an Eberhard-like scenario, where a significant fraction of the entangled particles can be lost before yielding a measurement result, it is advantageous for Alice to use two measurement settings with different overlaps with her Schmidt basis in order to maximize the CHSH quantity [36]. It is then easier for Bob to guess the outcome of one of the two measurements (the one best aligned with the Schmidt basis). When the key is extracted from this measurement to minimize the cost of error correction (see Ref. [22] for more details), the value of X is then larger than Y, as shown in Fig. 1. But in this case, the (X, Y) values are very close to the line X(X + Y) = 4, for which there is no advantage. Therefore, we only expect a small improvement in the key rate here.
Given the values shown in Fig. 1, a better bound on Eve's information would be obtained if the same CHSH value were obtained with the contributions of X and Y inverted, i.e. with Y > X. However, this requires Alice to define her key-generating measurement as the one less aligned with the Schmidt basis, hence leading to an increase of the error correction cost. In this case, both conditional entropies H(Â_0|E) and H(Â_0|B_2) are larger, and we need to check which one increases the most in order to infer a possible gain in the key rate. As it turns out, the tradeoff between these two increases depends on the detection efficiency.
Namely, there are two regimes, as shown in Fig. 2. When the detection efficiency is larger than ∼ 0.923, relabelling the measurements in order to enter the region with Y > X (where inequalities with Ω > π/4 significantly improve the bound on H(Â_0|E)) is advantageous, because the increase in Bob's uncertainty H(Â_0|B_2) is smaller. The red curve in Fig. 1 shows the corresponding trajectory in the X-Y plane. When η < 0.923, the cost of error correction becomes prohibitive compared to the potential increase in Eve's uncertainty, and it is better to stay in the region X > Y, as represented by the yellow curve in Fig. 1. There, a small increase of the key rate is still found because the correlations do not satisfy X(X + Y) = 4 exactly. However, this condition is only slightly violated, resulting in an increase in key rate smaller than ∼ 10⁻⁴, which is practically negligible. The critical detection efficiency then also remains at ∼ 83%, unchanged compared to a bound based on CHSH alone.

Comparison of qubit vs SPDC bounds -
To illustrate the impact of the photon statistics of SPDC sources on DIQKD, we now consider a simpler model in which the state shared between Alice and Bob is a two-qubit state |ψ(θ)⟩ = cos θ |00⟩ + sin θ |11⟩. We are not aware of physical setups allowing one to produce a state with θ = π/4 while not allowing for a different value of θ. Still, for the sake of the discussion, we distinguish between the cases where the state is either constrained to be maximally entangled, i.e. θ = π/4, or can have an arbitrary parameter θ ∈ [0, π/4].

Table 1: Critical detection efficiencies for various states and protocols. Here, Q is the quantum bit error rate (QBER). These thresholds are compared in Fig. 3.
In Tab. 1, see also Fig. 3, we report the critical detection efficiencies corresponding to the various security analyses applied to these implementations. As for the SPDC model, no advantage in the critical detection efficiency is found when using arbitrary two-qubit systems. A small advantage is, however, present when restricting to measurements on the singlet state. Still, this model is not optimal, and it remains of course better to use partially entangled states.
In fact, even partially entangled states produced by an SPDC source perform better.
In this respect, it is worth noticing that the performance of an SPDC source is essentially comparable to that of an arbitrary two-qubit state once noisy preprocessing is taken into account, i.e. the requirements on the detection efficiency are very similar. Without noisy preprocessing, the states produced by these physical sources do not tolerate losses better than measurements on a maximally entangled state. This suggests that noisy preprocessing is a key ingredient for a first proof-of-principle implementation with an SPDC source [22].

Discussion
In this paper, we introduced a refinement of the usual CHSH-based analysis of DIQKD experiments: instead of projecting the measurement statistics onto a single line giving the CHSH score X + Y, we kept the information about the individual values of X and Y throughout the whole security analysis. We found that this refined analysis gives a more restrictive bound on the information available to the eavesdropper for almost all values of X and Y in the quantum set.
When applying our results to photonic implementations of DIQKD with an SPDC source, we found that the key rate is improved by up to 37% at unit detection efficiency. On the other hand, we could not find any improvement in the critical detection efficiency as compared to the CHSH protocol with noisy preprocessing presented in Ref. [22]. However, we focused on a given photonic implementation, and the question of the most favorable optical setup combining squeezing operations, displacement operations, linear optical elements and photon-counting techniques remains open. Advanced techniques based on the automated design of quantum experiments via reinforcement learning, which have already proved useful for optimizing the CHSH score [38], are inspiring in this respect. Applying them to the proposed protocol in order to reduce the detection efficiency required for implementing DIQKD appears to be a promising direction for future work.
Finally, we would like to remark that the certified numerical techniques we proposed also open up the possibility of bounding Eve's information reliably when more correlators, or even the full measurement statistics, are taken into account.

Note added
While writing this manuscript, we became aware of another manuscript [26] reporting on similar results.

Acknowledgments
We thank Stefano Pironio for pointing out that the convexity argument provided in [22] was not complete; see the discussion in Sec. 3.

A.1 Parametrization of two-qubit models
Following the logic of [21], we assume without loss of generality that, after the qubit reduction, the state shared by Alice, Bob and Eve is the purification of a Bell-diagonal state,

|Ψ_ABE⟩ = √L_1 |Φ+⟩|E_1⟩ + √L_2 |Φ−⟩|E_2⟩ + √L_3 |Ψ+⟩|E_3⟩ + √L_4 |Ψ−⟩|E_4⟩,

with nonnegative coefficients satisfying L_1 ≥ L_2 and L_3 ≥ L_4, and that the measurements A_x, B_y appearing in the constraint are in the x-z plane. The key-generating setting is explicitly parametrized by an angle φ. (As mentioned in the main text, we use the compact notation C_φ = cos(φ) and S_φ = sin(φ) throughout the paper.) One notes that the application of a unitary transformation σ_Z on Alice's system is equivalent to a change of the state and measurements. The two situations are completely equivalent for our purpose, so our parametrization of quantum models is actually redundant. To avoid this redundancy, we can introduce an order relation between the pairs of coefficients (L_1, L_2) and (L_3, L_4) (which are permuted by a basis transformation). In particular, we will impose this ordering below, but will also use L_1 + L_2 ≥ L_3 + L_4 for the certified numerical algorithm. We also introduce a different parametrization of the probability simplex L, with a vector T = (T_z, T_x, T_p).

A.2 Entropies of ρ E|â=±1
Having introduced a parametrization of quantum models, we now express the quantities of interest H(ρ_E), H(ρ_E|Â0) and B_Ω as functions of the distribution L describing the state |Ψ_ABE⟩ and of the measurement settings a_x, b_y. The marginal state of Eve is straightforward to write down. For the conditional states, we have Eq. (60), where the factor of two arises because the probability of observing the outcome Â0 = 1 is simply 1/2. The conditional state for the other outcome, Â0 = −1, is easily obtained by noticing that the two outcomes are interchanged by the mapping Â0 → −Â0, i.e. by the inversion of the corresponding measurement vector. It follows that the entropies of both conditional states are equal. This implies that H(ρ_E|Â0) and H(Â0|E) do not depend on these sign changes, i.e. they are only functions of C²_φ and S²_φ.
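The insensitivity to outcome relabeling used here is an instance of a general fact: the von Neumann entropy is invariant under any unitary relabeling of a state. A minimal numerical sketch (with a random state standing in for Eve's conditional state; the specific flip operator is our illustrative choice, not the one of the proof):

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits, ignoring numerically zero eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

rng = np.random.default_rng(0)
# Random 4x4 density matrix (a stand-in for a conditional state of Eve).
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = M @ M.conj().T
rho /= np.trace(rho).real

# An illustrative unitary implementing a sign relabeling of outcomes.
U = np.diag([1, -1, -1, 1]).astype(complex)
rho_flipped = U @ rho @ U.conj().T

# The entropy is unchanged by the relabeling.
assert abs(von_neumann_entropy(rho) - von_neumann_entropy(rho_flipped)) < 1e-9
```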

A.3 Optimal key generating setting a 0
In this section, we give the minimal angle φ for which the constraint B_Ω ≥ β can be fulfilled for a given L. We first focus on the possible values of φ. The expected Bell score is straightforward to compute. From its expression we notice that any of the transformations of the key-generating setting parametrized by s_1, s_2 ∈ {0, 1} can be compensated by applying the same transformation to the remaining settings a_1, b_y, giving the same Bell score B_Ω. Furthermore, we have seen that such a transformation does not change Eve's conditional entropy H(Â0|E). Therefore we can always restrict a_0 to the positive quadrant of the circle without loss of generality. We now express the Bell score with Bob's settings parametrized as in Eq. (16), and compute the expected value of the operators on our Bell-diagonal state. We introduce an angle γ to parametrize the vectors c and c_⊥; the maximization over Alice's second setting a_1 is then straightforward. Since T_z, T_x ≥ 0 and φ, Ω ∈ [0, π/2], we can also restrict the angles θ and γ to the interval [0, π/2] without loss of generality, and drop the absolute value: |S_θ| = S_θ. The constraint B_Ω ≥ β then takes the form of Eq. (71). Recall that we wish to find the minimal φ for which this inequality can be fulfilled for at least one value of the free parameters θ and γ. We observe that if the right-hand side (RHS) can be made zero or negative by some choice of θ and γ, the constraint becomes trivial; since T_z ≥ T_x, this is possible for β² ≤ S²_Ω T²_z. In the following we assume that this is not the case, i.e. β² > S²_Ω T²_z. The angle θ only appears on the right of the inequality, so our best choice is to minimize the RHS with respect to θ. The expression either has a local minimum, which can be found by setting its derivative to zero, or there is no local minimum and the expression is minimal at the boundary θ = 0, since it diverges for θ → π/2.
Differentiating the expression with respect to θ, we find that a local minimum does exist (recall the assumption above). Plugging this value into Eq. (71) allows us to rewrite the constraint in the form a_0 · v_γ ≥ 1. Now it becomes simple to check whether the constraint can be satisfied at all. The vector v_γ belongs to the positive quadrant of the plane, with v_0 = (1, 0) and v_{π/2} = (0, 1); hence the inequality a_0 · v_γ ≥ 1 can be satisfied if and only if the length of the vector v_γ reaches 1, i.e. |v_γ| ≥ 1 for some γ. Lengthy but straightforward algebra then gives Eq. (90). To find the minimal angle φ*, note that at the tangent point the condition (C_φ*, S_φ*) · v_γ = 0 holds; plugging in the above equations yields φ*.

A.4 Eve's maximum information for Ω ≤ π/4

We here give details on the derivation of formula (25) in the main text, which corresponds to Eve's maximum information in the case Ω ≤ π/4. For these inequalities we have seen that the constraint C²_Ω T²_z + S²_Ω T²_x ≥ β² can be fulfilled with c²*(L, β) = 1. It has been shown in Ref.
[22] that H(ρ_{E|â=+1}) is a monotonic function of the key-generating setting φ ∈ [0, π/4]. The worst case (optimal attack for Eve) thus consists in setting the measurement angle C_φ = 1, which implies a simple form for the state, with a closed-form expression for its eigenvalues. The constraint on the generalized CHSH score leads to a corresponding constraint on the vector L. Our goal is thus to find the components of the vector L maximizing H(L) − H(p) and satisfying this constraint. Inspired by Ref.
[22], we first introduce the following parametrization. The partial ordering of the L coefficients requires P ≥ 1/2. The advantage of this parametrization comes from the fact that our figure of merit can be nicely rewritten in terms of the binary entropy h(z) = −z log(z) − (1 − z) log(1 − z) (with the logarithm in base 2), while the constraint on the expected value of the generalized CHSH operator takes a simple form. For a fixed P, the curve in the (x, y)-plane corresponding to a constant value β satisfies P dx = (1 − P) dy. This remark allows one to maximize H(L) − H(p) along this curve and to find that it is optimal for Eve to set x + y = 1; see Appendix C2 in Ref. [22] for the detailed argument. The symmetry h_q(x) = h_q(1 − x) allows us to write the problem in terms of a single variable. As the goal function h_q(x) does not depend on P, we can set P = 1, because this is the value maximizing the LHS of the constraint inequality and allowing the largest possible interval for the remaining variable x. Finally, as h_q(x) is a monotonically decreasing function of x (see Ref. [22]), it is optimal to set x to the least value compatible with the constraint. Let us now recall that the situation with c²*(L, β) = 1 and C²_Ω T²_z + S²_Ω T²_x ≥ β² also occurs in the case Ω > π/4. The above proof guarantees that it is optimal for Eve to use strategies where the inequality is saturated, C²_Ω T²_z + S²_Ω T²_x = β². Hence in the optimization problem for Ω > π/4 we can ignore all strategies with C²_Ω T²_z + S²_Ω T²_x > β².
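The last steps rely on two elementary properties of the binary entropy h: its symmetry h(x) = h(1 − x) and its monotonic decrease on [1/2, 1]. A quick numerical check of these generic properties (of h itself, not of the full h_q):

```python
import numpy as np

def h(z):
    """Binary entropy in bits; endpoints are clipped so h(0) = h(1) = 0."""
    z = np.clip(z, 1e-12, 1 - 1e-12)
    return -z * np.log2(z) - (1 - z) * np.log2(1 - z)

x = np.linspace(0.0, 1.0, 1001)
# Symmetry: h(x) = h(1 - x)
assert np.allclose(h(x), h(1 - x))
# Monotonically decreasing on [1/2, 1]
y = h(np.linspace(0.5, 1.0, 1001))
assert np.all(np.diff(y) <= 1e-12)
```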

A.5 Concavity of h q • z(β)
Recall that in order to use the bound I(β; Ω, q) derived for two-qubit strategies in the previous section as a universal bound (valid for strategies in any dimension), we have to show that this function is concave. In this case, for any mixture of qubit strategies (enforced by Jordan's lemma) with an average score β̄ = Σ_i p_i β_i, Eve's information satisfies Ī = Σ_i p_i I(β_i; Ω, q) ≤ I(β̄; Ω, q).
The concavity of I(β; Ω, q) = h_q(z(β)) follows from the negativity of its second derivative, which we show below. In this section we use the natural logarithm instead of the logarithm in base 2; changing the base of the logarithm only multiplies h_q(z) by a positive constant, which is irrelevant for its concavity. Note first that z(β) ∈ [1/2, 1]. Then, using the identities for the derivatives of z(β), the inequality (26) that we want to prove can, after multiplication by a positive fraction, be straightforwardly simplified. As h_q''(z) ≤ 0 was proven in [22, 39], we can use this inequality to relax the inequality in Eq. (107). At the point z = 1/2 the left-hand side becomes zero, since h_q(1/2) = 0 and |h_q''(1/2)| < ∞ (see below). From now on, we thus exclude the point z = 1/2 and consider z ∈ (1/2, 1]. We can then divide the whole expression by the strictly positive factor (z − 1/2)². In other words, we want to show that a certain function is nonnegative on the interval z ∈ (1/2, 1]. Computing this function and simplifying the last fraction, it remains to show the inequality of Eq. (114). In terms of n_q(z), for q = 0 we have (1 − q) = 1 and n_q(z) = z, so the two terms are equal and the identity holds trivially. To show that it holds for all q, it is sufficient to demonstrate that the function f(z, q) is increasing with q, i.e.
which we are going to show now. Using the explicit form of n_q, we obtain, for n > 1/2, an expression which is positive iff q = 1 or a further condition holds. In the case q < 1, straightforward algebra yields a simple expression for the left-hand side, allowing the condition to be rewritten accordingly. Changing the variable to x + 1 = n/(1 − n), with x ≥ 0, we express the above inequality in a form for which equality holds at x = 0 while the LHS increases more slowly than the RHS, which concludes the proof.
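For the plain CHSH case (Ω = π/4, q = 0), the bound reduces to the well-known closed form I(β) = h(1/2 + √(β²/4 − 1)/2) for β ∈ [2, 2√2]. Assuming this special case, the concavity proven above can be probed numerically by finite differences:

```python
import numpy as np

def h(z):
    """Binary entropy in bits, with clipped endpoints."""
    z = np.clip(z, 1e-12, 1 - 1e-12)
    return -z * np.log2(z) - (1 - z) * np.log2(1 - z)

def I(beta):
    # Eve's information for plain CHSH (Omega = pi/4, q = 0), assumed closed form
    return h(0.5 + 0.5 * np.sqrt(beta**2 / 4 - 1))

beta = np.linspace(2.001, 2 * np.sqrt(2) - 0.001, 2001)
# Discrete second differences (second derivative up to a positive factor)
d2 = np.diff(I(beta), 2)
assert np.all(d2 <= 1e-9)  # concave on the whole interval
```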
Properties of h_q'(1/2) and h_q''(1/2) - We start with the first derivative and want to show that h_q'(1/2) = 1/2. The binary entropy attains its maximum at z = 1/2, so h'(1/2) = 0. For the second term, we use Eq. (118). Changing the variable to x + 1 = √q, with x ≥ 0, the resulting expression is obviously bounded for x > ε for any ε > 0, and the existence of the limit x → 0 can be seen by a straightforward application of l'Hôpital's rule. We also wish to show that the second derivative h_q''(1/2) is bounded. To do so we compute it directly; from what we have just shown, the desired result follows.
A.6 Maximization of the generalized CHSH score with respect to auxiliary settings

Among the terms that a priori appear in Eq. (39), only the Bell score depends on the auxiliary measurement settings a_1, b_0 and b_1. As t is always positive, we can straightforwardly maximise the score with respect to these settings. We thus define β_max(L, φ) = max_{a_1, b_0, b_1} B_Ω(L, φ, a_1, b_0, b_1), which is the quantity that actually appears in Eq. (39).
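The maximization below ultimately reduces to a principal-eigenvector problem: for a symmetric matrix M, the maximum of c·Mc over unit vectors c is the largest eigenvalue of M, attained when c is the corresponding eigenvector. A toy check of this generic fact (random M, not the specific matrix of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))
M = A + A.T  # random symmetric 2x2 matrix

# Scan unit vectors c = (cos g, sin g) over the circle.
g = np.linspace(0, 2 * np.pi, 100001)
c = np.stack([np.cos(g), np.sin(g)])
scan_max = np.max(np.einsum('ik,ij,jk->k', c, M, c))

# Compare with the largest eigenvalue of M.
lam_max = np.linalg.eigvalsh(M)[-1]
assert abs(scan_max - lam_max) < 1e-6
```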
Let us now compute this expression starting from Eq. (67), which we put in a form that makes the maximization with respect to θ and a_1 straightforward. To find the maximum with respect to γ, or c = (C_γ, S_γ), it is convenient to write the expression inside the square root as a quadratic form in c. It is then clear that the value is maximal when c is aligned with the eigenvector of the corresponding matrix associated with its maximal eigenvalue, which yields β_max(L, φ) in closed form.

B Numerical tool

B.1 Lipschitz continuity of the von Neumann entropy
Consider two states ρ and σ on an n-dimensional Hilbert space that are close in fidelity, F(ρ, σ) ≥ f. Given the monotonicity of arccos on the range [0, 1], this condition can be equivalently written in terms of the angle A(ρ, σ) = arccos(F(ρ, σ)) as A(ρ, σ) ≤ a = arccos(f).
The angle is a metric on the space of density operators [28], in particular it satisfies the triangle inequality.
Next, note that the angle between two states is lower bounded by the angle between the vectors of their ordered eigenvalues,

a = A(ρ, σ) ≥ A(p, q) = arccos( Σ_i √(p_i q_i) ),

with p = Eig↓(ρ) and q = Eig↓(σ) such that p_1 ≥ p_2 ≥ . . . This inequality follows from the so-called von Neumann trace inequality [40]. The bound is useful because the entropies of the states match the entropies of these probability distributions: H(ρ) = H(p) and H(σ) = H(q). Let us now bound the difference of the latter. To do so, note that for any two unit vectors √p and √q on the n-sphere there exists a path γ connecting the two whose integrated length (in angle) equals A(p, q). Let us bound the variation of the entropy along this path. To this end, we associate a probability distribution r to each vector v on the path γ, with r_i = (v^(i))²; note that the vectors along the curve remain in the positive part of the n-sphere, v^(i) ≥ 0. A step dA from v along the path corresponds to a deformation of the vector in a direction v_⊥ with v · v_⊥ = 0. To simplify the following computations we introduce the "natural" entropy, i.e. the von Neumann entropy computed with the natural logarithm (recall that the log in H(ρ) was taken in base 2). We then compute the entropy variation for an infinitesimal increment of the angle, expressed in terms of a vector w, and it remains to bound the resulting expression. To do so, we construct a concave upper bound on the function c(r) = r ln²(r). From its second derivative, c''(r) = 2(ln(r) + 1)/r, we see that the function is concave (c'' ≤ 0) on the interval r ∈ [0, 1/e] and convex (c'' ≥ 0) on the complement. To get a concave upper bound we thus look for a line passing through the point (r, c) = (1, 0) and tangent to c(r). The tangency point r_1 satisfies 2 − 2r_1 + ln(r_1) = 0, with solution r_1 = −W_0(−2e^(−2))/2, where W_0 is the principal branch of the Lambert W-function (the second, trivial solution r_2 = 1 is irrelevant).
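The spectral lower bound on the angle can be probed numerically: for random states, the fidelity F(ρ, σ) = tr|√ρ√σ| never exceeds the classical fidelity Σ_i √(p_i q_i) of the sorted spectra, hence A(ρ, σ) ≥ A(p, q). A sketch with random four-dimensional states (illustrative only):

```python
import numpy as np
from scipy.linalg import sqrtm, svdvals

def random_state(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(2)
for _ in range(50):
    rho, sigma = random_state(4, rng), random_state(4, rng)
    # Quantum fidelity F = tr|sqrt(rho) sqrt(sigma)| = sum of singular values
    F = np.sum(svdvals(sqrtm(rho) @ sqrtm(sigma)))
    # Classical fidelity of the sorted eigenvalue distributions
    p = np.sort(np.linalg.eigvalsh(rho))[::-1]
    q = np.sort(np.linalg.eigvalsh(sigma))[::-1]
    Fc = np.sum(np.sqrt(np.clip(p, 0, None) * np.clip(q, 0, None)))
    assert F <= Fc + 1e-9  # hence arccos(F) >= arccos(Fc)
```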
Knowing r_1 we can construct the concave upper bound. Note that r_1 < 1/e, and that the defining equation 2 − 2r_1 + ln(r_1) = 0 allows us to simplify c(r_1) = r_1 ln²(r_1) = 4 r_1 (1 − r_1)², so that

ĉ(r) = r ln²(r) for r ≤ r_1, and ĉ(r) = 4 r_1 (1 − r_1)(1 − r) for r > r_1. (160)

With the concave bound ĉ it is easy to bound the entropy susceptibility. For n ≤ 4 we have r_1 < 1/n, so Eq. (160) gives one expression; for n ≥ 5 we instead have r_1 > 1/n, so we are in the other branch of Eq. (160) and obtain a simpler expression. For later use, we evaluate the constant for the case n = 4 in particular. To bound the entropy difference for non-infinitesimal distances, we simply integrate along the curve γ; e.g. for n = 4 this yields the final Lipschitz-type bound.
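The tangent construction can be verified directly: with r_1 = −W_0(−2e^(−2))/2, the defining equation 2 − 2r_1 + ln(r_1) = 0 holds and the piecewise function ĉ dominates c(r) = r ln²(r) on (0, 1]. A sketch using SciPy's Lambert W:

```python
import numpy as np
from scipy.special import lambertw

# Tangency point from the principal branch of the Lambert W-function
r1 = float(np.real(-lambertw(-2 * np.exp(-2)) / 2))
assert abs(2 - 2 * r1 + np.log(r1)) < 1e-12   # defining equation of r1
assert r1 < 1 / np.e

def c(r):
    return r * np.log(r) ** 2

def c_hat(r):
    # c(r) itself below r1; tangent line through (1, 0) above r1
    return np.where(r <= r1, c(r), 4 * r1 * (1 - r1) * (1 - r))

r = np.linspace(1e-6, 1.0, 200001)
assert np.all(c_hat(r) >= c(r) - 1e-12)  # concave upper bound holds
```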

B.2 Continuity of the goal function
The entropy term - To apply the continuity bound previously described to our situation, let ρ, ρ′ be the states on Â0E produced by measurements along the angles φ, φ′ on the states |Ψ_ABE⟩ with weights L, L′. To bound the term A(ρ(L′, φ), ρ(L′, φ′)), we note that applying a channel that performs a Z measurement followed by noisy preprocessing to the state e^(iφY_A/2) ⊗ 1_BE |Ψ_ABE⟩ produces exactly the state ρ on Â0E; analogously, applying the same channel to e^(iφ′Y_A/2) ⊗ 1_BE |Ψ_ABE⟩ produces the state ρ′. The data-processing inequality therefore implies that F(ρ, ρ′) is lower bounded. Putting these together, we conclude that (for |φ′ − φ| < π) A(ρ(L′, φ), ρ(L′, φ′)) ≤ |φ′ − φ|/2.
Combining everything, we get the continuity bound for the entropy term. The Bell score - Next we wish to bound the increment |dβ_max(L, a_0, Ω)| for infinitesimal changes of the parameters (dL, dφ). It is straightforward to bound the gradient of the Bell score before the maximization with respect to a_1, b_0, b_1, so we just need to be careful when transferring this bound to β_max.
We are interested in bounding |dg(y)| as a function of dy, where g(y) = max_x f(x, y). Consider two values of the parameter, y_1 and y_2, and define x̄_i = argmax_x f(x, y_i), such that g(y_1) = f(x̄_1, y_1) and g(y_2) = f(x̄_2, y_2). Without loss of generality we assume g(y_1) ≥ g(y_2) and consider the difference g(y_1) − g(y_2) ≤ f(x̄_1, y_1) − f(x̄_1, y_2). Taking the limit y_2 → y_1 we get |dg(y)| ≤ max_x |∇_y f(x, y) · dy|. (180) Using the expression (67) for B_Ω(L, φ, a_1, b_0, b_1) we then obtain the desired bound.
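This envelope argument (the variation of g(y) = max_x f(x, y) is controlled by the y-gradient of f alone) can be illustrated on a toy function of our own choosing:

```python
import numpy as np

# Toy example: f(x, y) = sin(x + y) + 0.5*cos(x - 2*y), maximized over x on a grid
x = np.linspace(0, 2 * np.pi, 20001)

def g(y):
    return np.max(np.sin(x + y) + 0.5 * np.cos(x - 2 * y))

# |df/dy| = |cos(x + y) + sin(x - 2y)| <= 1 + 0.5*2 = 2 for all x, y
L = 2.0
y1, y2 = 0.3, 0.31
# The max-function inherits the Lipschitz bound of f in y
assert abs(g(y2) - g(y1)) <= L * abs(y2 - y1) + 1e-3
```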

B.3 Gradient of the goal function
We will now combine all the elements to upper bound the gradient of the goal function G(L, φ; Ω, q) = H(ρ E ) − H(E|â 0 (q)) + tβ max (L, φ).
We parametrize the vector L with the help of the angles as in Eq. (45), such that the "model" is described by four angles (L, φ) ω = (α, µ, ξ, φ).
We will bound the gradient of the goal function with respect to this parametrization.
We then compare this value to ν. If ν > ξ_0(c), we pass to the next hypercube, since the guessed maximum is higher than the potential maximum value of f in h_0(c). Otherwise, we mesh h_0(c) into a new hypercube grid G_1(s_1) with elements of size s_1 = s_0/2. For each of the newly generated hypercubes h_1(c) ∈ G_1(s_1), we compute the potential maximum ξ_1(c) using Eq. (199). This method is applied recursively until either all new hypercubes of the grid G_m satisfy ν > ξ_m(c), or we reach the minimum step s_m ≤ ε, where s_m = s_0/2^m. An upper bound on f is thus given by the maximum of the ξ_m(c).
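The recursive refinement just described can be sketched as follows, for a one-dimensional toy objective with a known Lipschitz constant; ξ plays the role of the potential maximum of a cell and ν that of the best guessed maximum (illustrative code, not the actual DIQKD goal function):

```python
import numpy as np

def certified_max(f, lipschitz, lo, hi, eps=1e-4):
    """Upper-bound the max of f on [lo, hi] by recursive interval refinement.

    Each cell of size s centred at c gets the potential maximum
    xi = f(c) + lipschitz * s / 2; cells whose xi does not exceed the
    best value found so far (nu) are discarded.
    """
    nu = -np.inf                        # best guessed maximum so far
    cells = [((lo + hi) / 2, hi - lo)]  # list of (centre, size)
    bound = -np.inf
    while cells:
        next_cells = []
        for c, s in cells:
            fc = f(c)
            nu = max(nu, fc)
            xi = fc + lipschitz * s / 2   # potential maximum on the cell
            if xi <= nu:
                continue                  # cell cannot beat the current best
            if s <= eps:
                bound = max(bound, xi)    # minimum step reached, keep xi
            else:                         # otherwise split the cell in two
                next_cells += [(c - s / 4, s / 2), (c + s / 4, s / 2)]
        cells = next_cells
    return max(bound, nu)

f = lambda t: np.sin(3 * t) + 0.5 * np.cos(7 * t)   # Lipschitz constant <= 6.5
ub = certified_max(f, 6.5, 0.0, 2 * np.pi)
true_max = np.max(f(np.linspace(0, 2 * np.pi, 200001)))
assert ub >= true_max - 1e-9            # certified upper bound
assert ub <= true_max + 1e-3            # and tight up to the minimum step
```

The returned value is a certified upper bound: every discarded cell is dominated by ν, and every surviving cell is covered by its own ξ once the minimum step is reached.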