Device-independent quantum key distribution with asymmetric CHSH inequalities

The simplest device-independent quantum key distribution protocol is based on the Clauser-Horne-Shimony-Holt (CHSH) Bell inequality and allows two users, Alice and Bob, to generate a secret key if they observe sufficiently strong correlations. There is, however, a mismatch between the protocol, in which only one of Alice’s measurements is used to generate the key, and the CHSH expression, which is symmetric with respect to Alice’s two measurements. We therefore investigate the impact of using an extended family of Bell expressions where we give different weights to Alice’s measurements. Using this family of asymmetric Bell expressions improves the robustness of the key distribution protocol for certain experimentally-relevant correlations. As an example, the tolerable error rate improves from 7.15% to about 7.42% for the depolarising channel. Adding random noise to Alice’s key before the postprocessing pushes the threshold further to more than 8.34%. The main technical result of our work is a tight bound on the von Neumann entropy of one of Alice’s measurement outcomes conditioned on a quantum eavesdropper for the family of asymmetric CHSH expressions we consider and allowing for an arbitrary amount of noise preprocessing.


Introduction
Device-independent quantum key distribution (DIQKD) allows distant parties to create and share a cryptographic key whose security can be proved based only on the detection of Bell-nonlocal correlations [1][2][3]. Its signature feature is that no assumptions are made in the security analysis about the quantum state or the measurements performed. DIQKD schemes are, correspondingly, naturally robust against imperfections and some forms of malicious tampering with the equipment.
Erik Woodhead: erik.woodhead@ulb.ac.be
The simplest protocol [3,4], inspired by a proposal by Ekert [5], is based around the well-known CHSH Bell inequality [6]. In this scheme, pairs of entangled particles are repeatedly prepared and distributed between two parties, Alice and Bob. On a random subset of these entangled pairs, Alice performs one of two ±1-valued measurements, A_1 or A_2, on the particles she receives, and Bob similarly performs one of three randomly-chosen ±1-valued measurements B_1, B_2, or B_3. The measurement results are used to estimate the value of the CHSH correlator

S = ⟨A_1 B_1⟩ + ⟨A_1 B_2⟩ + ⟨A_2 B_1⟩ − ⟨A_2 B_2⟩, (1)

as well as the value of the correlator ⟨A_1 B_3⟩, where ⟨A_x B_y⟩ = P(A_x = B_y) − P(A_x ≠ B_y), and P(A_x = B_y) and P(A_x ≠ B_y) are the probabilities that the outcomes of the measurements A_x and B_y are equal and different, respectively. On the remaining subset of entangled particles, Alice always performs the measurement A_1 and Bob always performs the measurement B_3. The corresponding outcomes are then used to generate, after classical postprocessing, a shared secret key known only to Alice and Bob. This is possible if the estimates of the correlator ⟨A_1 B_3⟩ and of the CHSH value are both sufficiently large. Indeed, the first condition implies that the raw outcomes of Alice and Bob are correlated enough to be turned into a shared key using classical error correction. A strong CHSH value implies, on the other hand, that their outcomes are only weakly correlated with a potential adversary and thus that the key can be made almost ideally secret using privacy amplification. This tradeoff between the CHSH value and the adversary's knowledge, which forms the basis of the security, can be expressed as the following tight bound on the von Neumann entropy of Alice's outcome conditioned on an eavesdropper's quantum side information:

H(A_1|E) ≥ 1 − φ(√(S²/4 − 1)), (2)

where φ is a function related to the binary entropy h by φ(x) = h(1/2 + x/2).
This bound is device-independent in the sense that it is valid independently of the measurements A_1, A_2, B_1, B_2 performed by Alice and Bob and of the state they share, which could be arbitrarily entangled with the adversary, subject only to the constraint that the expected CHSH value observed by Alice and Bob is S.
The bound (2) is not only of fundamental interest. It has recently been shown through the Entropy Accumulation Theorem (EAT) [7] (see also [8]) that proving unconditional security in the finite-key regime of a DIQKD protocol consisting of n measurement runs can be entirely reduced to bounding the conditional von Neumann entropy as a function of a Bell expression, exactly as (2) does for the CHSH case.
Furthermore, a bound on the conditional von Neumann entropy directly translates into a bound on the rate at which key bits can be generated securely per key generation round in the asymptotic limit of many runs n → ∞. Indeed, the rates derived from the EAT approach in this asymptotic limit (up to terms that are sublinearly decreasing in n) are given by the Devetak-Winter rate [9,10]

r = H(A_1|E) − H(A_1|B_3), (4)

where H(A_1|B_3) is the conditional Shannon entropy associated with the probabilities P(ab|13) that Alice and Bob jointly obtain the outcomes a and b when they measure A_1 and B_3. The Devetak-Winter rate is saturated by a class of attacks, called collective attacks, in which an eavesdropper attacks the protocol in an i.i.d. fashion but can retain quantum side information indefinitely. Inserting the bound (2) in the Devetak-Winter rate (4) gives the tight lower bound

r ≥ 1 − φ(√(S²/4 − 1)) − H(A_1|B_3) (5)

on the asymptotic key rate for the CHSH protocol in terms of the CHSH parameter S and H(A_1|B_3). It is positive for sufficiently high values of S and sufficiently good correlations between the outcomes of the measurements A_1 and B_3. The lower bound (5) on the Devetak-Winter rate for the CHSH-based protocol was first presented in [3] and derived in detail in [4]. The main result of [3,4] was essentially a derivation of the bound (2) on the conditional entropy H(A_1|E), together with an explicit attack saturating it (thus establishing the tightness of the bound). More precisely, Ref. [4] derived the tight bound χ(A_1 : E) ≤ φ(√(S²/4 − 1)) on the Holevo quantity assuming a symmetrisation procedure is applied in the protocol. This was necessary in [4] as the bound on χ(A_1 : E) no longer generally holds if Alice's measurement outcomes are not equiprobable. By contrast, the analogue (2) that we state here for the conditional von Neumann entropy holds generally, and this will also be a feature of the more general bound we derive in this work.
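As an illustration of the rate (5), it can be evaluated for the depolarising channel mentioned in the abstract. The sketch below, our own numerical check rather than anything taken from the paper, assumes the standard depolarising-channel relations S = 2√2(1 − 2Q) and H(A_1|B_3) = h(Q) for a quantum bit error rate Q:

```python
import numpy as np

def h(p):
    """Binary (Shannon) entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def phi(x):
    """phi(x) = h(1/2 + x/2), as defined below Eq. (2)."""
    return h(0.5 + 0.5 * x)

def chsh_key_rate(Q):
    """Lower bound (5) on the asymptotic key rate for a depolarising
    channel with error rate Q, assuming S = 2*sqrt(2)*(1 - 2*Q) and
    H(A1|B3) = h(Q).  (Illustrative model; valid while S > 2.)"""
    S = 2 * np.sqrt(2) * (1 - 2 * Q)
    return 1 - phi(np.sqrt(S**2 / 4 - 1)) - h(Q)
```

The rate crosses zero slightly above Q = 7.1%, consistent with the 7.15% threshold quoted in the abstract.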
The main result presented in this paper is a tight bound on the conditional von Neumann entropy that extends the bound (2) in two ways. First, it generalises it to the family of CHSH-like expressions

S_α = α⟨A_1 B_1⟩ + α⟨A_1 B_2⟩ + ⟨A_2 B_1⟩ − ⟨A_2 B_2⟩, (6)

where α ∈ R is a parameter that can be chosen freely (α = 1 corresponds to the regular CHSH expression). Second, it incorporates an arbitrary level of noise preprocessing [10].
A first motivation for considering these generalisations is purely theoretical. While we now understand how the security of a generic DIQKD protocol can be reduced to computing bounds on the conditional von Neumann entropy (or more precisely the derivation of what the authors of [7] call min-tradeoff functions), obtaining tight or reasonably good bounds beyond the already solved case of the CHSH expression, the simplest Bell expression, is challenging [11][12][13][14]. Our work shows how the von Neumann entropy can be computed for a new class of protocols and our approach, which partly relies on reducing the problem to the well-known BB84 protocol [15], might inspire further, more general, results.
A second motivation is more practical. Demonstrating a working and secure device-independent protocol remains technologically highly challenging [16,17] as it requires entangled particles to be distributed and detected with low noise and a high detection rate over long distances. Our results lead to two refinements to the CHSH-based protocol that ease these demands.
The first refinement, basing the security analysis on the extended family (6) of Bell expressions, is motivated by the form of the attack establishing the tightness of (2). While the entropy bound (2) can be attained with equality, the eavesdropping strategy [3] that achieves it produces asymmetric correlations. For the optimal collective attack, the two-body correlators in the CHSH expression are related to the CHSH expectation value S by

⟨A_1 B_1⟩ = ⟨A_1 B_2⟩ = 2/S, ⟨A_2 B_1⟩ = −⟨A_2 B_2⟩ = S/2 − 2/S.

This reflects an asymmetry in the protocol: Alice uses the A_1 measurement to generate the key while A_2 is only used for parameter estimation. To account for this, instead of using only CHSH we consider the extended family of Bell expressions (6), in which a different weight α ∈ R is given to the correlation terms involving A_1. Bounding the conditional entropy for the family (6) and then choosing whichever value of α gives the highest result amounts to the same as bounding the conditional entropy in terms of the combinations ⟨A_1 B_1⟩ + ⟨A_1 B_2⟩ and ⟨A_2 B_1⟩ − ⟨A_2 B_2⟩ viewed as independent parameters. More generally, it has been observed that using more information about the statistics can improve the performance of a device-independent cryptography protocol [18,19].
The second refinement, noise preprocessing, consists of a classical change to the protocol in which Alice randomly flips each of her key-generation bits with some probability q, known publicly, before the classical postprocessing that distils the secret key is applied. Noise preprocessing is known to improve the robustness of QKD protocols [10]. Intuitively, adding random noise to Alice's outcomes makes things worse for Alice and Bob (it increases H(A_1|B_3)), but it also makes things worse for the eavesdropper (it increases H(A_1|E)), and it turns out that the result can be a net increase in the key rate.
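This net effect can be made quantitative in the depolarising-channel model. The sketch below is our own illustration: it assumes S = 2√2(1 − 2Q), an effective error rate q + Q − 2qQ between Alice's flipped bits and Bob's, and the α = 1 case of the noise-preprocessing entropy bound derived later in the paper:

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def phi(x):
    return h(0.5 + 0.5 * x)

def key_rate(Q, q):
    """Key-rate lower bound for the CHSH case (alpha = 1) with noise
    preprocessing q, on a depolarising channel with error rate Q
    (assumed model: S = 2*sqrt(2)*(1 - 2*Q)).  The first term is the
    entropy bound with preprocessing; the second is h of the error
    rate after Alice's random flips."""
    S = 2 * np.sqrt(2) * (1 - 2 * Q)
    F2 = S**2 / 4 - 1
    HAE = 1 + phi(np.sqrt((1 - 2*q)**2 + 4*q*(1 - q)*F2)) - phi(np.sqrt(F2))
    HAB = h(q + Q - 2*q*Q)
    return HAE - HAB

def threshold(q, lo=0.02, hi=0.14, steps=60):
    """Bisect for the error rate at which the key rate reaches zero."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if key_rate(mid, q) > 0 else (lo, mid)
    return lo
```

With q = 0 the tolerable error rate comes out near 7.15%, and raising q pushes the threshold upwards, in line with the improvements quoted in the abstract.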
Both refinements are easily incorporated into the standard DIQKD protocol of [3] given our generalisation of the conditional entropy bound (2) to the family S_α of Bell expressions with noise preprocessing. As we will see, in our case deriving the entropy bound essentially reduces to deriving the conditional entropy bound for the well-known BB84 QKD protocol [15]. We give a short outline of how this works for the entropy bound (2) for CHSH in section 3 before giving the full derivation of our main result in section 4. We then derive some examples of its effect on the robustness of the DIQKD protocol in section 5.

The entropy bound
Let Alice, Bob, and an adversary, Eve, share an arbitrary tripartite state ρ_ABE, and let A_1 and A_2 be two arbitrary binary-valued measurements on Alice's system and B_1 and B_2 two arbitrary binary-valued measurements on Bob's system. We can think of the state and measurements as chosen by Eve. Without loss of generality we may assume the measurements to be projective (if necessary by increasing the Hilbert space dimensions).
If Alice measures A_1 and flips her outcome with probability q ∈ [0, 1], the correlations between Alice and Eve are described by the classical-quantum state

τ_AE = [0] ⊗ ((1−q)ρ_E^0 + qρ_E^1) + [1] ⊗ ((1−q)ρ_E^1 + qρ_E^0), (9)

where ρ_E^a = Tr_AB[(Π_a ⊗ 1)ρ_ABE], [0] and [1] are shorthand for the classical register states |0⟩⟨0| and |1⟩⟨1|, and Π_{0,1} = (1 ± A_1)/2 are the projectors associated with Alice's A_1 measurement. (In the following, we freely switch between descriptions where Alice's and Bob's measurement results take the values {0, 1} or the values {+1, −1}; this is just a convention and the choice depends on which is more convenient notationally.) The conditional entropy of Alice's final outcome conditioned on Eve's knowledge is then defined as

H(A_1|E) = S(τ_AE) − S(τ_E), (11)
where τ_E = Tr_A[τ_AE] = Σ_a ρ_E^a = ρ_E is Eve's average reduced state, S(ρ) = −Tr[ρ log₂(ρ)] is the von Neumann entropy, and log₂ is the base-two logarithm.
The main result that we derive is a family of lower bounds on the conditional von Neumann entropy in terms of the expectation value ⟨S_α⟩ of the Bell expression (6) computed on the reduced state ρ_AB = Tr_E[ρ_ABE], valid for any values of the parameters α ∈ R and q ∈ [0, 1]. These bounds hold for any state ρ_ABE and measurements A_1, A_2, B_1, B_2 and are hence device-independent.
The function ḡ_{q,α} is defined piecewise; its construction is described below and illustrated for q = 0 and α = 0.9 in figure 1. As a way of explaining its form, we introduce it via a strategy that we considered as a candidate for the optimal collective attack.
The strategy is a minor modification of the optimal attack [3,4] saturating the CHSH bound (2). Eve prepares a pure tripartite state ρ_ABE = |Ψ_ABE⟩⟨Ψ_ABE| of the form

|Ψ_ABE⟩ = (|00⟩_AB |ψ_0⟩_E + |11⟩_AB |ψ_1⟩_E)/√2, (13)

where the strength of the attack is determined by the overlap F = ⟨ψ_0|ψ_1⟩, which we take to be real and nonnegative. Alice and Bob then measure

A_1 = Z, A_2 = X

and

B_1 = cos(ϕ_B) Z + sin(ϕ_B) X, B_2 = cos(ϕ_B) Z − sin(ϕ_B) X,

where Z and X are the eponymous Pauli operators and ϕ_B is an angle that we will optimise momentarily. The classical-quantum state after Alice measures A_1 and flips her outcome with probability q is thus given by (9) with ρ_E^a = ψ_a/2, where ψ_a is a shorthand for |ψ_a⟩⟨ψ_a|. The conditional entropy (11) can then be computed directly in terms of the overlap F to be

H(A_1|E) = 1 + φ(√((1−2q)² + 4q(1−q)F²)) − φ(F). (18)

On the other hand, the marginal state of Alice and Bob is

ρ_AB = (|00⟩⟨00| + |11⟩⟨11| + F|00⟩⟨11| + F|11⟩⟨00|)/2. (19)

For the above measurements and choosing an optimal angle ϕ_B that maximises the expectation value of S_α, we find

⟨S_α⟩ = 2√(α² + F²), (20)

which rearranges for F to

F = √(⟨S_α⟩²/4 − α²). (21)

Substituting (21) into (18), we find that the conditional entropy is related to S_α for the particular strategy we have described by H(A_1|E) = g_{q,α}(⟨S_α⟩), where

g_{q,α}(s) = 1 + φ(√((1−2q)² + 4q(1−q)(s²/4 − α²))) − φ(√(s²/4 − α²)). (23)

A little consideration shows that the above strategy cannot be the optimal one minimising the entropy in all cases. The Bell expression S_α has the classical and quantum bounds [20,21]

C_α = 2 max(1, |α|), Q_α = 2√(1 + α²). (24)

At the quantum maximum S_α = Q_α we find g_{q,α}(Q_α) = 1, i.e., the eavesdropper has no knowledge whatsoever about Alice's outcome, as we would naturally expect for any conceivable strategy. At the classical bound S_α = C_α, we would expect an optimal attack to yield H(A_1|E) = h(q), since Alice and Bob's correlations can then be attained with a deterministic strategy and the only randomness in Alice's outcome comes from the noise preprocessing. The function (23) attains

g_{q,α}(S_α) = h(q) (25)

at S_α = 2|α|. If |α| ≥ 1, this is the same as the classical bound and there is no problem. However, if |α| < 1 then the classical bound is C_α = 2 and the value of g_{q,α}(S_α) at S_α = 2 is too high to describe the optimal strategy.
However, we can improve on it by taking probabilistic mixtures of the above strategy with the classical one achieving H(A_1|E) = h(q) at S_α = 2. Geometrically, we are considering, in the (S_α, H(A_1|E)) plane, the convex hull of the points (S_α, g_{q,α}(S_α)) and (2, h(q)). As illustrated in figure 1, this amounts to extending the curve g_{q,α}(s) linearly from the point where its tangent intersects (2, h(q)). Our main result, which we prove in section 4, is that the explicit attack that we just described, mixed in this way with the classical strategy, is optimal. That is, the construction shown in figure 1 gives the device-independent lower bound on the conditional entropy for all |α| < 1, while the bound is simply given by g_{q,α}(S_α) for |α| ≥ 1.

Main result.
Figure 1: Conditional von Neumann entropy H(A_1|E) as a function of the observed value of S_α given by our explicit attack, illustrated here for q = 0 and α = 0.9, which is representative of values |α| < 1. The dashed line is a plot of (23). It is visibly too high to be the optimal device-independent strategy for all S_α, given that the true curve must be convex and attain h(q) = 0 at the classical bound S_α = 2. To obtain the correct relation, we use the tangent of g_{q,α} for values of S_α less than the point S* where the tangent intersects the point (S_α, H(A_1|E)) = (2, h(q)). For q = 0 and α = 0.9 this happens at S* ≈ 2.4634.

To summarise in more mathematical terms, our main result is that the conditional von Neumann entropy, computed on the post-measurement
classical-quantum state (9) following an amount q of noise preprocessing, is bounded in terms of S_α by

H(A_1|E) ≥ ḡ_{q,α}(S_α), (26)

where ḡ ≡ ḡ_{q,α} is defined in terms of g ≡ g_{q,α}, given by (23), as

ḡ(s) = g(s) for s ≥ s*, ḡ(s) = g(s*) + g′(s*)(s − s*) for s ≤ s* (28)

when |α| < 1, and simply as ḡ(s) = g(s) when |α| ≥ 1. Here g′ ≡ g′_{q,α} is the first derivative of g and, for |α| < 1, s* ≡ s*(q, α) is the unique point where the tangent of g(s) crosses h(q) at s = 2, i.e., such that

g(s*) + g′(s*)(2 − s*) = h(q). (29)

We note that it is sufficient to consider s* in the range

2√(1 + α² − α⁴) ≤ s* ≤ 2√(1 + α²). (30)

The upper bound corresponds to the maximal quantum value; the origin of the lower bound will be explained at the end of section 4. The attack strategy we started with shows that the entropy bound (26) is tight and can be attained for any values of the parameters q and α. For given correlations, ḡ_{q,α}(S_α) can be maximised over α to obtain the best bound on the conditional entropy in terms of ⟨A_1 B_1⟩ + ⟨A_1 B_2⟩ and ⟨A_2 B_1⟩ − ⟨A_2 B_2⟩ seen as separate parameters. The resulting optimised bound for q = 0, for a family of correlations satisfying a fixed constraint, is shown and compared with the CHSH entropy bound (2) in figure 2.
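The construction of ḡ_{q,α} is straightforward to implement numerically. The sketch below is our own check: it assumes the form (23) of g_{q,α}, finds s* by bisection on the tangent condition (29) within the range (30), and, for q = 0 and α = 0.9, reproduces the value s* ≈ 2.4634 quoted in the caption of figure 1:

```python
import numpy as np

def h(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def phi(x):
    return h(0.5 + 0.5 * x)

def g(s, q, alpha):
    """Entropy of the explicit attack as a function of S_alpha, Eq. (23)."""
    F2 = s**2 / 4 - alpha**2
    return 1 + phi(np.sqrt((1 - 2*q)**2 + 4*q*(1 - q)*F2)) - phi(np.sqrt(F2))

def dg(s, q, alpha, ds=1e-6):
    """Numerical first derivative of g."""
    return (g(s + ds, q, alpha) - g(s - ds, q, alpha)) / (2 * ds)

def s_star(q, alpha, steps=100):
    """Solve the tangent condition (29) by bisection in the range (30)."""
    def gap(s):
        return g(s, q, alpha) + dg(s, q, alpha) * (2 - s) - h(q)
    lo = 2 * np.sqrt(1 + alpha**2 - alpha**4)
    hi = 2 * np.sqrt(1 + alpha**2) - 1e-9
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
    return lo

def g_bar(s, q, alpha):
    """Device-independent bound (26): g, linearly extended below s*."""
    if abs(alpha) >= 1:
        return g(s, q, alpha)
    ss = s_star(q, alpha)
    if s >= ss:
        return g(s, q, alpha)
    return g(ss, q, alpha) + dg(ss, q, alpha) * (s - ss)
```

By construction, ḡ reaches 1 at the quantum bound Q_α and h(q) at the classical bound S_α = 2.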

Short derivation for CHSH
In the special case of the CHSH expression (α = 1) with no noise preprocessing (q = 0), the von Neumann entropy bound (26) and main result of this paper simplifies to

H(A_1|E) ≥ 1 − φ(√(S²/4 − 1)). (32)

Before proving the main result (26) we give a short derivation of the special case (32). We do this partly to show that there is a much simpler way to derive (32) than the approach originally followed in [4]; it can also serve as an outline for the full derivation of (26) that we undertake in section 4. The derivation is a simplified version of one done in [22,23]. (In terms of the notation and basis choices we use in this section, [22] essentially did the analogous derivation for a prepare-and-measure version of the CHSH-based protocol.)
The main idea is that we can reduce deriving (32) to bounding the conditional entropy for the well-known BB84 protocol [15]. To do this, we exploit two facts that are by now well established for this problem: first, we can assume without loss of generality that Alice's and Bob's measurements are projective and, second, since both parties perform only two dichotomic measurements to estimate CHSH, we can use the Jordan lemma to reduce the analysis to qubit systems.
Concentrating on qubit systems, then, we know from security analyses of the BB84 protocol (see e.g. [24] or [25,26]) that the conditional entropy of the outcome of a Pauli Z measurement by Alice is lower bounded by

H(Z|E) ≥ 1 − φ(|⟨X ⊗ X⟩|) (33)

in terms of the correlation ⟨X ⊗ X⟩ between the outcomes of Pauli X measurements performed by Alice and Bob on the same initial state. (Ref. [22] concentrated on bounding the min-entropy, due to a complication called "basis dependence" specific to the prepare-and-measure setting that makes it more difficult to tightly bound the conditional von Neumann entropy in that case; some results for the conditional von Neumann entropy under different assumptions are nevertheless presented for that setting in [22].) To apply (33) to the device-independent protocol we need to identify Alice's measurement A_1 with Z. Since we assume the measurements are projective, this is straightforward to justify: the CHSH inequality cannot be violated if any of the measurements are degenerate (i.e., equal to ±1), so the measurements must all be linear combinations of the three Pauli operators. The only basis-independent properties characterising the measurements are then the angles between them on the Bloch sphere. We can therefore choose the local bases in such a way that

A_1 = Z, A_2 = cos(ϕ_A) Z + sin(ϕ_A) X (34)

and

B_1 = cos(ϕ_B) Z + sin(ϕ_B) X, B_2 = cos(ϕ_B) Z − sin(ϕ_B) X, (35)

where ϕ_A and ϕ_B are unknown angles. With this choice of bases, and abbreviating E_zz = ⟨Z ⊗ Z⟩, E_zx = ⟨Z ⊗ X⟩, and so on, the CHSH expectation value can be expressed as

S = 2 cos(ϕ_B) E_zz + 2 sin(ϕ_B)(cos(ϕ_A) E_zx + sin(ϕ_A) E_xx) (36)

and then bounded by

S ≤ 2√(E_zz² + (cos(ϕ_A) E_zx + sin(ϕ_A) E_xx)²)
  ≤ 2√(E_zz² + E_zx² + E_xx²)
  ≤ 2√(1 + E_xx²), (38)

where we used the Cauchy-Schwarz inequality and cos(ϕ_A)² + sin(ϕ_A)² = 1 to get to the third line, and a constraint respected by correlations between Pauli operators,

E_zz² + E_zx² ≤ 1, (40)

to get to the fourth. The inequality (38) rearranges to the lower bound

|E_xx| ≥ √(S²/4 − 1) (41)

for the absolute value |⟨X ⊗ X⟩| of the correlator appearing in the BB84 entropy bound (33). Since we chose the bases in such a way as to identify A_1 with Z, we simply substitute (41) into (33) to obtain (32).
The convexity of the result in S then allows the qubit bound to be extended to arbitrary dimension through Jordan's lemma.
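The chain of inequalities above can be spot-checked numerically. The sketch below, a verification of our own rather than anything in the paper, draws random two-qubit states, computes the relevant Pauli correlations, and confirms the final bound S ≤ 2√(1 + E_xx²) for random angles ϕ_A, ϕ_B:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def random_state(dim=4):
    """Random two-qubit density matrix from a Ginibre matrix."""
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def corr(rho, P, Q):
    """Correlation <P (x) Q> on the state rho."""
    return np.trace(rho @ np.kron(P, Q)).real

def chsh_chain_holds(rho, phiA, phiB):
    """Check the final inequality of the chain, S <= 2*sqrt(1 + Exx^2),
    for the CHSH value in the chosen bases."""
    Ezz, Ezx, Exx = corr(rho, Z, Z), corr(rho, Z, X), corr(rho, X, X)
    S = 2*np.cos(phiB)*Ezz + 2*np.sin(phiB)*(np.cos(phiA)*Ezx + np.sin(phiA)*Exx)
    return S <= 2 * np.sqrt(1 + Exx**2) + 1e-12
```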

Derivation of main result
The short derivation for CHSH above illustrates the general approach and kinds of technical ingredients we will work with to obtain a proof of the main result (26). A summary of the key steps is: • We reduce the problem to one where Alice's and Bob's subsystems are qubits.
• We need a generalisation of the BB84 entropy bound (33) allowing for noise preprocessing (q ≠ 0).
• We derive constraints on correlations between Pauli operators, such as (40), that let us transform the S_α family of Bell expressions into a bound on a correlator |⟨X ⊗ B⟩| that we can use in the BB84 entropy bound.
• Finally, we should determine whether the resulting qubit bound is convex and, if it is not, take its convex hull to obtain the fully device-independent bound.
While we focus here on the family of S α expressions, for which we are able to bound the conditional entropy analytically, we remark that our approach of reducing the problem to qubits and then using the entropy bound for the BB84 protocol applies generically to the two-input/two-output device-independent setting.

Reduction to qubits
Following the approach used in many studies of the CHSH Bell scenario, we start by reducing the problem to one where Alice and Bob perform qubit measurements. We recapitulate how this works here. The reduction is based on the Jordan lemma [27], which tells us that any pair A_1, A_2 of observables whose eigenvalues are all ±1 admits a common block diagonalisation in blocks of dimension no larger than two. That is, there is a choice of bases in which the observables appearing in the S_α Bell expression can be expressed as

A_x = ⊕_j A_{x|j}, B_y = ⊕_k B_{y|k}

for qubit operators A_{x|j} and B_{y|k}. Proofs of this result can be found in [4,28,29].
After Alice measures A_1 and flips her outcome with probability q, we recall that the correlation between Alice and Eve is described by the classical-quantum state (9), where ρ_E^a = Tr_AB[(Π_a ⊗ 1)ρ_ABE] and Π_{0,1} = (1 ± A_1)/2 are the projectors associated with Alice's A_1 measurement. Introducing the block diagonalisation, we can reexpress ρ_E^a as

ρ_E^a = Σ_jk p_jk ρ_E^{a|jk},

where p_jk is the probability that the state is found in the pair (j, k) of Alice's and Bob's Jordan blocks and ρ_E^{a|jk} is Eve's corresponding conditional state. This allows us to reexpress τ_AE as

τ_AE = Σ_jk p_jk τ_jk,

where τ_jk is the classical-quantum state of the form (9) constructed from the conditional states ρ_E^{a|jk}. The expectation value of S_α similarly decomposes according to

⟨S_α⟩ = Σ_jk p_jk S_α|jk,

where S_α|jk is the contribution to ⟨S_α⟩ from the pair (j, k) of Jordan blocks. Importantly, the expectation value S_α|jk and the classical-quantum state τ_jk conditioned on the Jordan blocks are both determined by the same conditional state ρ_ABE|jk, in which Alice's and Bob's subsystems are qubits. This allows us to reduce the entire problem to qubit systems. More precisely, suppose we have derived a lower bound H(A_1|E)_{τ_jk} ≥ ḡ(S_α|jk) on the conditional entropy for qubit systems, with ḡ convex (if we have a bound that is not convex, we take its convex hull). Then concavity of the conditional von Neumann entropy and convexity of ḡ imply, in arbitrary dimension,

H(A_1|E)_τ ≥ Σ_jk p_jk H(A_1|E)_{τ_jk} ≥ Σ_jk p_jk ḡ(S_α|jk) ≥ ḡ(Σ_jk p_jk S_α|jk) = ḡ(⟨S_α⟩).

BB84 entropy bound
We now derive the required BB84 entropy bound including noise preprocessing. The result we derive here is the following. Suppose that Alice, Bob, and Eve share a tripartite state ρ_ABE, that Alice's subsystem is limited to a two-dimensional Hilbert space, and that Alice performs a Pauli Z measurement on her subsystem (in some chosen basis) and flips the outcome with probability q. Then the von Neumann entropy H(Z|E) of Alice's outcome conditioned on Eve's quantum side information is bounded by

H(Z|E) ≥ 1 + φ(√((1 − 2q)² + 4q(1 − q)⟨X ⊗ B⟩²)) − φ(⟨X ⊗ B⟩), (55)

where ⟨X ⊗ B⟩ = Tr[(X ⊗ B)ρ_AB] is the correlation between the Pauli X observable on Alice's side and any ±1-valued observable B on Bob's side, computed on their part ρ_AB of the initial state ρ_ABE. Note that, for q = 0, (55) simplifies to the more familiar BB84 bound

H(Z|E) ≥ 1 − φ(⟨X ⊗ B⟩)
that we used in the outline in section 3.
Before proving (55) we draw attention to a few of its properties that are important for us here: 1. (55) holds for any initial state ρ ABE . In particular, we do not assume that Alice's and Bob's marginal ρ AB must respect any symmetries or that the outcomes of any measurements they could perform on it must be equiprobable.
2. The right side of (55) is a monotonically increasing function of the argument |⟨X ⊗ B⟩|. This means that if we know a (nonnegative) lower bound for |⟨X ⊗ B⟩| then we can safely substitute it into (55) to obtain a lower bound for the conditional entropy.
3. Although we will later only need to apply it to bipartite qubit systems, we remark that (55) is fully device-independent on Bob's side.
A derivation of (55) written for the prepare-and-measure version of the BB84 protocol that is device-independent on Bob's side already exists [30]; we restate it here for the entanglement-based setting we are working in, modified to confirm that the result still holds even if Alice's measurement outcomes are not equiprobable, i.e., that property 1 holds. Property 2 only concerns the end result and was already pointed out in [30]; appendix B of [30] in particular proves that (55) is convex in the argument ⟨X ⊗ B⟩ and attains its global minimum at ⟨X ⊗ B⟩ = 0. This is also implied by lemma 1 in section 4.5 of this article.
We start with the fact that we can assume Alice, Bob, and Eve initially share a state |Ψ_ABE⟩ that is pure; this can be justified, for instance, by the fact that the conditional entropy cannot increase if we purify the initial state and give the purifying extension to Eve. Next, using that Alice's system is a qubit, we express the state as

|Ψ_ABE⟩ = |0⟩_A |ψ_0⟩_BE + |1⟩_A |ψ_1⟩_BE, (58)

where |0⟩ and |1⟩ are the eigenstates of Z and the states |ψ_0⟩ and |ψ_1⟩ are subnormalised so that ⟨ψ_0|ψ_0⟩ + ⟨ψ_1|ψ_1⟩ = 1. We do not assume that |ψ_0⟩ and |ψ_1⟩ are orthogonal to one another. The correlation between Alice and Eve after Alice measures Z and flips the outcome with probability q is described by the classical-quantum state

τ_AE = [0] ⊗ ((1−q)ρ_E^0 + qρ_E^1) + [1] ⊗ ((1−q)ρ_E^1 + qρ_E^0), (59)

where ρ_E^a = Tr_B[|ψ_a⟩⟨ψ_a|]. To simplify the end result, we use that the conditional entropy H(Z|E)_τ of (59) is identical to the conditional entropy H(Z|E)_τ′ of a state τ′ which is identical to (59) except with [0] and [1] swapped. Furthermore, the entropy in both cases is the same as the conditional entropy H(Z|EF)_τ̄ computed on the symmetrised state

τ̄_AEF = (1/2) τ_AE ⊗ [0]_F + (1/2) τ′_AE ⊗ [1]_F. (61)

That is, one can verify that H(Z|E)_τ = H(Z|E)_τ′ = H(Z|EF)_τ̄. Hence, we can bound H(Z|E) by deriving a lower bound for the conditional entropy H(Z|EF)_τ̄ of (61).
Grouping the terms in [0]_A and [1]_A together, we rewrite τ̄ as

τ̄ = (1−q) σ_= + q σ_≠, (63)

where σ_= and σ_≠ collect the terms carrying the weights (1−q) and q respectively and are normalised so that Tr[σ_=] = Tr[σ_≠] = 1. Next, we use that

H(Z|EF)_τ̄ ≥ H(Z|EFF′)_τ̄′

for any extension of (63), i.e., any state τ̄′_ABEFF′ that reduces to τ̄ on the subsystems AEF. Specifically, we use the extension in which σ_= and σ_≠ in (63) are replaced by purifications |σ_=⟩⟨σ_=| and |σ_≠⟩⟨σ_≠|, constructed with the help of a (any) Hermitian operator B on Bob's side satisfying B² = 1_B. Direct computation of the conditional entropy on the resulting state gives the right side of (55) with the quantity 2 Re⟨ψ_0|(B ⊗ 1_E)|ψ_1⟩ in place of ⟨X ⊗ B⟩. Finally, we obtain the result (55) by observing, using the expression (58) for the initial state |Ψ_ABE⟩, that

⟨X ⊗ B⟩ = ⟨ψ_0|(B ⊗ 1_E)|ψ_1⟩ + ⟨ψ_1|(B ⊗ 1_E)|ψ_0⟩ = 2 Re⟨ψ_0|(B ⊗ 1_E)|ψ_1⟩.

Before returning to the device-independent protocol we remark that the BB84 entropy bound (55) is tight and can be attained with, for example, B = X and a family of tripartite states whose Alice-Bob marginal is diagonal in the basis of the Bell states

|Φ±⟩ = (|00⟩ ± |11⟩)/√2, |Ψ±⟩ = (|01⟩ ± |10⟩)/√2,

parametrised by the correlations E_xx = ⟨X ⊗ X⟩ and any value −1 ≤ E_zz ≤ 1 of E_zz = ⟨Z ⊗ Z⟩. This is the entanglement-based version of a family of optimal attacks originally derived in the first security proof of the BB84 protocol against individual attacks [31]. The attack state (13) that we applied to the device-independent protocol in section 2 corresponds to the special case E_zz = 1. In both cases, the attack strategy is independent of the amount of noise preprocessing applied.
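The tightness of (55) is easy to confirm numerically for the attack of section 2, in which Eve holds one of two pure states |ψ_0⟩, |ψ_1⟩ with real overlap F and the correlator ⟨X ⊗ B⟩ equals F. The sketch below, our own check, computes H(Z|E) directly from eigenvalues and compares it to the closed form of (55):

```python
import numpy as np

def h(p):
    if p <= 0 or p >= 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def phi(x):
    return h(0.5 + 0.5 * x)

def vn_entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def attack_entropy(F, q):
    """H(Z|E) computed directly for Eve's pure states |psi_0>, |psi_1>
    with real overlap F, after noise preprocessing with flip rate q."""
    psi0 = np.array([1.0, 0.0])
    psi1 = np.array([F, np.sqrt(1 - F**2)])
    P0, P1 = np.outer(psi0, psi0), np.outer(psi1, psi1)
    rho0 = ((1 - q) * P0 + q * P1) / 2   # Eve's state when Alice outputs 0
    rho1 = ((1 - q) * P1 + q * P0) / 2   # Eve's state when Alice outputs 1
    zero = np.zeros((2, 2))
    tau = np.block([[rho0, zero], [zero, rho1]])  # cq state, block diagonal
    return vn_entropy(tau) - vn_entropy(rho0 + rho1)

def bb84_bound(corr, q):
    """Right side of the entropy bound (55) with <X (x) B> = corr."""
    return 1 + phi(np.sqrt((1 - 2*q)**2 + 4*q*(1 - q)*corr**2)) - phi(corr)
```

For this attack the two quantities agree exactly, confirming that the bound is saturated for every value of q.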

Correlations in the Z-X plane
As we saw in the outline, the BB84 bound effectively reduces the problem of bounding the conditional entropy to applying quantum-mechanical constraints to the correlations that can appear in the subsystem shared by just Alice and Bob. We show here that, for any underlying quantum state, the correlations between the Z and X Pauli operators always respect the bounds

E_zz² + E_zx² ≤ 1, (78)
E_xz² + E_xx² ≤ 1, (79)

and

E_zz² + E_zx² + E_xz² + E_xx² ≤ 1 + (E_zz E_xx − E_zx E_xz)², (80)

where we use the abbreviated notation E_zz = ⟨Z ⊗ Z⟩, E_zx = ⟨Z ⊗ X⟩, and so on for the correlations. Note that one of these constraints, (78), is the constraint (40) that we used earlier in the outline.
To prove these constraints we use the fact that, for normalised Bloch vectors a = (a_z, a_x) and b = (b_z, b_x), the linear combinations a·σ and b·σ have eigenvalues ±1. It follows that, for any state,

⟨(a·σ) ⊗ (b·σ)⟩ ≤ 1.

We can rewrite the left side as a^T E b, where E is the 2 × 2 matrix of coefficients E_ij = ⟨σ_i ⊗ σ_j⟩ for i, j ∈ {z, x}. Since the relation a^T E b ≤ 1 holds for any normalised vectors a = [a_z, a_x]^T and b = [b_z, b_x]^T, it necessarily holds for whichever vectors maximise the left side, for which a^T E b equals the largest singular value of E. All singular values of E are therefore at most one. This is equivalent to the operator inequality EE^T ≤ 1 or, put differently, to the matrix 1 − EE^T being positive semidefinite. According to the Sylvester criterion, this is the case if and only if all of its principal minors have nonnegative determinant, i.e., if and only if

1 − E_zz² − E_zx² ≥ 0, 1 − E_xz² − E_xx² ≥ 0, det(1 − EE^T) ≥ 0.

Expanding the determinant condition, these are exactly the constraints (78), (79), and (80) asserted at the beginning of this subsection.
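The constraints (78)-(80) can be spot-checked against random two-qubit states (a verification sketch of our own, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def random_state():
    """Random two-qubit density matrix from a Ginibre matrix."""
    G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def zx_correlation_matrix(rho):
    """2x2 matrix E with E_ij = <sigma_i (x) sigma_j>, i, j in {z, x}."""
    return np.array([[np.trace(rho @ np.kron(P, Q)).real for Q in (Z, X)]
                     for P in (Z, X)])

def constraints_hold(E, tol=1e-10):
    """Principal minors of 1 - E E^T: the diagonal entries give (78)
    and (79), the determinant gives (80)."""
    M = np.eye(2) - E @ E.T
    return M[0, 0] >= -tol and M[1, 1] >= -tol and np.linalg.det(M) >= -tol
```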

Entropy bound for qubits
We are now ready to derive the bound satisfied by the conditional entropy for qubit systems in terms of the S_α Bell expression. As we did in the outline, we choose the bases of Alice's and Bob's systems such that their measurement operators are of the form

A_1 = Z, A_2 = cos(ϕ_A) Z + sin(ϕ_A) X

and

B_1 = cos(ϕ_B) Z + sin(ϕ_B) X, B_2 = cos(ϕ_B) Z − sin(ϕ_B) X.

In this case the expectation value of S_α satisfies

⟨S_α⟩ = 2α cos(ϕ_B) E_zz + 2 sin(ϕ_B)(cos(ϕ_A) E_zx + sin(ϕ_A) E_xx) ≤ 2√(α² E_zz² + E_zx² + E_xx²). (93)

For |α| ≥ 1, the problem from this point is straightforward. Using the constraint (78) from the previous subsection we obtain

⟨S_α⟩ ≤ 2√(α²(E_zz² + E_zx²) + E_xx²) ≤ 2√(α² + E_xx²),

which, making the choice B = X, rearranges to

|⟨X ⊗ B⟩| = |E_xx| ≥ √(⟨S_α⟩²/4 − α²).

Using this in the BB84 entropy bound gives

H(A_1|E) ≥ g_{q,α}(⟨S_α⟩) (97)

for all |α| ≥ 1 for qubits, and we only need to verify that the right side is convex in S_α to justify extending the result to arbitrary dimension, which we do in the next subsection.
For |α| < 1 we need to do a bit more work. In this case, we choose B to be of the form B = cos(θ) Z + sin(θ) X, such that

⟨X ⊗ B⟩ = cos(θ) E_xz + sin(θ) E_xx.

For the best θ,

|⟨X ⊗ B⟩| = √(E_xz² + E_xx²).

Together with (93), and using the notation and constraints derived in the previous section, the full problem we want to solve is to minimise √(E_xz² + E_xx²) for a given value of ⟨S_α⟩, subject to the constraints (78)-(80), in the variables E_zz, E_zx, E_xz, E_xx. The solution to this optimisation problem is derived in detail in appendix A. The end result, depending on S_α, is a lower bound

|⟨X ⊗ B⟩| ≥ E_α(S_α),

with E_α(S_α) = √(S_α²/4 − α²) (101) for |S_α| ≥ 2√(1 + α² − α⁴), and with a second expression (102), given in appendix A, for |S_α| ≤ 2√(1 + α² − α⁴). Applying this in the BB84 bound (55) gives

H(A_1|E) ≥ 1 + φ(√((1 − 2q)² + 4q(1 − q) E_α(S_α)²)) − φ(E_α(S_α)) (103)

with E_α(S_α) given by (101) or (102) depending on the value of S_α. As a side remark, we note that the lower bounds on |⟨X ⊗ B⟩| that we just derived in terms of S_α can be used to derive the tight bound for the min-entropy in terms of S_α. This is discussed in appendix D.

Device-independent entropy bound
Having bounded the conditional entropy for qubit systems, the remaining step is to establish the convexity with respect to the Bell expectation value S_α, or to construct the convex hull, of the family of bounds we have derived. The qubit bound (103) is illustrated for q = 0 and α = 0.9 in figure 3. It visibly appears to be concave for S_α ≤ 2√(1 + α² − α⁴) and convex for S_α above this value. We show here that this is generally true of (103) for all q and all |α| < 1, while the qubit bound (97) for |α| ≥ 1, which has the same form as (103) for |α| < 1 and S_α ≥ 2√(1 + α² − α⁴), is always convex. We establish the concavity or convexity of the qubit bounds by bounding their second derivatives. To do this, we recall some conditions under which concavity or convexity are preserved under function composition. The second derivative of the composition f ∘ g of two functions is given by

(f ∘ g)″(s) = f″(g(s)) g′(s)² + f′(g(s)) g″(s).

From this we can see that f ∘ g is guaranteed to be convex if both f and g are convex and f is monotonically increasing. Conversely, f ∘ g is guaranteed to be concave if f is concave and monotonically decreasing while g is convex.
Using this approach, we prove that the bound given by (103) with (101), as well as (97), is convex by expressing it as f(g(S_α)) for the function f in lemma 1 below with Q = (1 − 2q)² and with

g(S_α) = S_α²/4 − α²,

which is clearly convex. The following result, which we prove in appendix B.1, confirms that f has the properties needed for us to infer that the composition f ∘ g is convex.

Lemma 1. The function

f(x) = 1 + φ(√(Q + (1 − Q)x)) − φ(√x)

is convex and monotonically increasing in x for 0 ≤ x ≤ 1 and for any 0 ≤ Q ≤ 1.
We similarly prove that the curve described by (103) and (102) is concave by expressing it as f(g(S_α)), this time with the function f in lemma 2 below, again with Q = (1 − 2q)², and with a suitable convex function g derived from (102). Checking that this g is convex amounts to checking that the function s → √(s² − 1) is concave, which does not present any particular problem. The following result, proved in appendix B.2, verifies that f has the properties required to guarantee that f ∘ g is concave.

Lemma 2. The function
is concave and monotonically decreasing in x for 0 ≤ x ≤ 1 and for any 0 ≤ Q ≤ 1.
Finally, one can verify that the qubit entropy bound for |α| < 1 and its first derivative in Sα are continuous. This amounts to checking that (101) and (102) have the same values and first derivatives at the point S* = 2√(1 + α² − α⁴). One can also verify that the bound's gradient becomes infinite as Sα approaches the quantum bound 2√(1 + α²).
The device-independent bound in arbitrary dimension is given by the convex hull of the qubit bound. This implies that the part of the qubit bound described by (103) and (102) for Sα ≤ 2√(1 + α² − α⁴), which is concave, may be ignored, and the device-independent bound is thus given by the construction described at the end of section 2 and illustrated in figure 1. In particular, this is where the lower limit of 2√(1 + α² − α⁴) in the range (30) for the root-finding problem (29) comes from. The fact that the qubit entropy bound and its gradient are continuous everywhere, that it is concave for Sα ≤ 2√(1 + α² − α⁴) and reaches h(q) at the classical bound Sα = 2, that it is convex for Sα ≥ 2√(1 + α² − α⁴), and that its gradient becomes infinite at the quantum bound together imply that there is necessarily a solution to the root-finding problem (29) in the range (30) and that it is unique.

⁷ Note that using the BB84 bound for f and g(Sα) = √(Sα²/4 − α²) for g does not work with this approach, since g would be concave in that case.

Applications to DIQKD key rates
The entropy bound H(A1|E) ≥ ḡq,α(Sα) we have now proved can be applied in QKD security frameworks that reduce proving the security of a protocol to bounding the conditional von Neumann entropy in a single round. Applying it in the Devetak-Winter rate (4) gives a lower bound on the asymptotic key rate that depends only on parameters (the Bell expectation value Sα and the probabilities P(ab|13)) that Alice and Bob, working together, can estimate.
In this section, we apply (111) to obtain explicit estimates of the robustness of the device-independent QKD protocol in two commonly studied imperfection models, both of which were also used as examples in [4]: depolarising noise, where we assume that the optimal Bell state for the protocol is mixed with white noise, and a generic loss model.
All the thresholds we report when using noise preprocessing were computed in the limit q → 1/2 of maximal random noise. This typically gave the best threshold in the cases where we computed the key rate for different amounts of noise preprocessing, although we have not checked that q → 1/2 is optimal in every case. We describe how the Devetak-Winter rate can be computed in this limit in appendix C.

Depolarising noise
In this model we suppose that Alice and Bob share a noisy version of the optimal two-qubit Bell state, parametrised by some visibility v. For the ideal key-generation measurements A1 = Z and B3 = Z, the possible outcomes are obtained with joint probabilities in which the error rate δ is related to the visibility. When Alice additionally applies noise preprocessing, the resulting joint distribution retains the same form but with a worse error rate. The conditional Shannon entropy associated with this distribution depends on the amount q of noise preprocessing applied.
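As an illustration, the key-generation statistics in this model can be sketched in a few lines of code. The relations assumed below (δ = (1 − v)/2 for Z-basis measurements on a depolarised Bell state, and the flipped error rate δq = (1 − q)δ + q(1 − δ) under noise preprocessing) are standard, but the paper's exact equations are not reproduced here:

```python
# Sketch of the key-generation statistics in the depolarising model.
# Assumed (standard) relations, not the paper's equations verbatim:
# measuring A1 = B3 = Z on a Bell state mixed with white noise at
# visibility v gives error rate delta = (1 - v)/2, and flipping
# Alice's bit with probability q worsens it to
# delta_q = (1 - q)*delta + q*(1 - delta).
from math import log2

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def error_rate(v, q=0.0):
    delta = (1 - v) / 2
    return (1 - q) * delta + q * (1 - delta)

def cond_entropy_AB(v, q=0.0):
    """Conditional Shannon entropy H(A1|B3) = h(delta_q)."""
    return h(error_rate(v, q))

# Perfect visibility, no preprocessing: outcomes agree exactly.
assert cond_entropy_AB(1.0) == 0.0
# Preprocessing strictly increases the error-correction cost.
assert cond_entropy_AB(0.9, q=0.1) > cond_entropy_AB(0.9, q=0.0)
```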
In the CHSH-based protocol, the ideal measurements in the Bell test are A1 = Z, A2 = X, and B1,2 = (Z ± X)/√2. With these measurements the two-body correlation terms determine the expectation value of the asymmetric CHSH expression. The lower bound on the Devetak-Winter rate we obtain for the depolarising noise model then follows explicitly. The best possible bound on the key rate is obtained by maximising the right side of (121) over α and q. We illustrate the result as a function of the channel noise rate δ in figure 4. The key rate computed using only the CHSH bound of [4], i.e., q = 0 and α = 1, is also shown for comparison. The combination of applying noise preprocessing and optimising over the Sα family of Bell expressions increases the threshold error rate, up to which the key rate remains positive, from δ ≈ 7.15% found in [4] to 8.33%.

Table 1: Threshold error rates (%) obtained using either CHSH (α = 1) or the optimal asymmetric expression (α = opt), both without (q = 0) and with maximal (q → 1/2) noise preprocessing. The third row (α, By = opt) gives the thresholds when in addition Bob's measurements are optimised such that Sα = 2
In table 1 we list the threshold error rates obtained for the different combinations of using CHSH or the optimal Sα expressions, without or with noise preprocessing. Table 1 in addition gives the thresholds obtained when using, instead of the measurements B1,2 = (Z ± X)/√2 that are optimal for CHSH, the measurements that attain the maximal value of the Sα expression for the depolarised state. This gives marginally better threshold error rates. Since the conditional entropy bounds used in the above security analysis are tight, the threshold error rates that we compute are optimal in terms of the asymmetric CHSH expressions Sα, and the values reported in table 1 optimised over α are optimal in terms of the combinations A1B1 + A1B2 and A2B1 − A2B2 viewed as independent parameters. But they are actually also optimal with respect to an analysis that would take into account the full set of statistics. This is because, according to the measurement and noise model considered above, Alice's and Bob's marginal measurement outcomes are equiprobable and the two-body correlations satisfy fixed relations. These relations, which completely fix the full set of correlators once the independent combinations A1B1 + A1B2 and A2B1 − A2B2 are specified, are also satisfied by the family of optimal attacks presented in section 2 that saturate our entropy bound. Thus, specifying other correlation terms beyond those involved in the definition of Sα would not restrict the attack strategies further than already considered.
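For the simplest entry of table 1 (CHSH, q = 0), the threshold can be reproduced with a short bisection. The sketch below assumes the well-known reduction of the entropy bound to H(A1|E) ≥ 1 − φ(√(S²/4 − 1)) for α = 1 and q = 0, together with S = 2√2(1 − 2δ) for the depolarising model; both are assumptions of this sketch rather than equations copied from the text:

```python
# Reproduction sketch of the delta ~ 7.15% CHSH threshold
# (alpha = 1, q = 0), using the standard CHSH entropy bound
# H(A1|E) >= 1 - phi(sqrt(S^2/4 - 1)) with phi(x) = h(1/2 + x/2)
# and the depolarising-model value S = 2*sqrt(2)*(1 - 2*delta).
from math import log2, sqrt

def h(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def phi(x):
    return h(0.5 + 0.5 * x)

def key_rate(delta):
    """Devetak-Winter rate r = H(A1|E) - H(A1|B3) for CHSH, q = 0."""
    S = 2 * sqrt(2) * (1 - 2 * delta)
    return 1 - phi(sqrt(max(S * S / 4 - 1, 0.0))) - h(delta)

# Bisect for the error rate at which the rate hits zero.
lo, hi = 0.01, 0.12
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if key_rate(mid) > 0 else (lo, mid)
print(f"threshold ~ {100 * lo:.2f}%")  # ~7.15
```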

Losses
In this setting, we suppose that Alice and Bob detect their particles and obtain definite measurement outcomes with some probability η which, for simplicity, we take to be the same on both sides. We model this formally by treating nondetection events as a third measurement outcome, obtained independently by Alice and Bob with probability 1 − η. In this case, as well as the maximally-entangled Bell state, we also consider a possible type of strategy in which Alice and Bob deliberately use partially-entangled states, which have been shown to improve the robustness of Bell experiments based on the CHSH inequality to losses [32].

We consider the maximally-entangled state first. In order to apply our entropy bound we need to reduce the setting to one where the measurements used in the Bell test all have only two outcomes. The typical way to do this, which we apply here, is to map ("bin") nondetection events to one of the outcomes +1 or −1. In terms of the global detection efficiency η and η̄ = 1 − η, the maximum value of the Sα expression over the different possible binning strategies can be expressed both for the case that Bob uses the diagonal measurements B1,2 = (Z ± X)/√2 and for the case that he uses the optimal ones. For the key generation measurements A1 = B3 = Z, Alice and Bob obtain outcomes (including nondetections) with certain joint probabilities; however, since we map nondetection events to (for example) A1 = +1 on Alice's side to use the entropy bound, we must do the same here, that is, we should add the third row of (128) to the first. This gives the joint distribution PAB(ab|13).

Table 2: Threshold detection efficiencies (%) obtained both without (q = 0) and with maximal (q → 1/2) noise preprocessing for the maximally-entangled state. The first (α = 1) and second (α = opt) rows give the thresholds obtained using only CHSH and the optimal asymmetric Bell expression using diagonal measurements (Z ± X)/√2 on Bob's side.
In the third row (α, By = opt) we also use the optimal measurements on Bob's side.
Finally, as before, when noise preprocessing is applied we also need to swap the rows of (129) with probability q, i.e., transform (129) accordingly, before computing the conditional Shannon entropy. The threshold global detection efficiencies we found for the resulting Devetak-Winter rate for the maximally-entangled state are reported in table 2. In this case the thresholds are all a little over 90%, with little variation depending on whether the Sα family or noise preprocessing is used. The threshold η ≈ 90.78% that we obtain using only CHSH and with no noise preprocessing improves on the threshold η ≈ 92.4% found in [4], which was computed from the conditional Shannon entropy of the full probability distribution (129) without binning the nondetection event on Bob's side. It is also slightly better than the threshold of 90.9% found in [33], due to a small advantage in bounding the Devetak-Winter rate via the conditional von Neumann entropy rather than via the Holevo quantity as was originally done in [4].
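The binning step described above can be illustrated with a toy calculation of the key-generation statistics for the maximally-entangled state. The model below (independent detection with probability η on each side, perfectly correlated Z outcomes when both detect, all nondetections binned to +1) is an assumption of this sketch rather than a verbatim transcription of (128)-(129):

```python
# Toy model of the binned key-generation statistics in the loss model
# for the maximally-entangled state.  Assumptions: each side detects
# independently with probability eta, double detections give equal
# Z outcomes, and every nondetection is binned to +1.
from math import log2

def binned_distribution(eta):
    nd = 1 - eta  # nondetection probability
    p = {("+", "+"): eta**2 / 2, ("-", "-"): eta**2 / 2,
         ("+", "-"): 0.0, ("-", "+"): 0.0}
    # Alice detects, Bob does not: Bob's outcome binned to +1.
    p[("+", "+")] += eta * nd / 2
    p[("-", "+")] += eta * nd / 2
    # Bob detects, Alice does not: Alice's outcome binned to +1.
    p[("+", "+")] += nd * eta / 2
    p[("+", "-")] += nd * eta / 2
    # Neither detects: both binned to +1.
    p[("+", "+")] += nd**2
    return p

def cond_entropy(p):
    """H(A|B) for a joint distribution over two binary outcomes."""
    total = 0.0
    for b in ("+", "-"):
        pb = p[("+", b)] + p[("-", b)]
        for a in ("+", "-"):
            if p[(a, b)] > 0:
                total -= p[(a, b)] * log2(p[(a, b)] / pb)
    return total

p = binned_distribution(0.95)
assert abs(sum(p.values()) - 1.0) < 1e-12
print(f"H(A|B) at eta=0.95: {cond_entropy(p):.4f} bits")
```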
We now consider partially-entangled states which, as we mentioned, are known to increase the robustness to losses in the CHSH Bell experiment. In this case, we suppose that Alice and Bob share a state |ψθ⟩ dependent on a parameter θ characterising the degree of entanglement; the associated density operator is given by (131). We then suppose that Alice and Bob measure A1 = Z and B3 = Z to generate their key and use whichever measurements A2, B1, and B2 give the highest expectation value of the Sα expression given that A1 is fixed to Z and the global detection efficiency is fixed to some value η. For this problem, the best thresholds we saw were obtained by mapping all nondetection events to +1. For this binning strategy, the expectation value of Sα can be expressed in terms of η and the expectation values Ax, By, and AxBy that would be obtained from (131) if there were no losses. Optimising over the measurements B1 and B2 on Bob's side then gives the expression (135) for Sα, with coefficients in terms of θ, ϕA, and η. With this strategy, for small θ⁸, the expectation value in the special case of CHSH is approximated by (139) to the smallest nontrivial order in θ, or by (140) if ϕA is also small. This shows that the strategy we have described can violate the CHSH inequality as long as the global detection efficiency is better than η = 2/3, the same as was found in [32], although our choice to fix A1 = Z means that the CHSH violation we can attain is not as high as it could otherwise be. The outcomes, including nondetections, when Alice and Bob measure A1 = Z and B3 = Z on the partially-entangled state occur with joint probabilities involving c2 = cos²(θ/2) and s2 = sin²(θ/2). As before, we should merge the nondetection events on Alice's side with the +1 outcome and swap the rows with probability q, if noise preprocessing is also used, before computing H(A1|B3).

⁸ More precisely, the approximation (139) is valid if |θ| is small compared to |ϕA|. This means that ϕA can be taken arbitrarily close to zero as long as θ is taken even smaller. This condition is also why (139) does not imply that the CHSH inequality can be violated with ϕA = 0.

Table 3: Threshold detection efficiencies (%) obtained using either CHSH (α = 1) or the optimal asymmetric expression (α = opt), both without (q = 0) and with maximal (q → 1/2) noise preprocessing, for the strategy using partially-entangled states.

          q = 0     q → 1/2
α = 1     86.5479   82.5742
α = opt   86.5255   82.5742
Computing the Devetak-Winter rate using the value (135) of Sα and maximising the result over θ and ϕA gives a positive rate up to the global detection efficiencies listed in table 3. The thresholds for q = 0 are attained for partially-entangled states with θ a little under 0.5 radians. The threshold for q → 1/2, by contrast, is attained in the limit θ → 0 of a separable state. The approximation of the key rate for q = (1 − ε)/2 described in appendix C and the approximation (140) of CHSH for small θ and ϕA can be used to derive an approximate lower bound for the key rate when ε and the angles are small. In this vicinity the key rate can be positive, albeit minuscule, as long as the global detection efficiency is better than η = √(10/3) − 1 ≈ 82.5742%.
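A quick numerical check confirms that the closed-form constant √(10/3) − 1 matches the quoted decimal (the surrounding derivation of this constant is not reproduced here):

```python
# Check that sqrt(10/3) - 1 reproduces the quoted threshold of
# approximately 82.5742%.
from math import sqrt

eta_threshold = sqrt(10 / 3) - 1
assert abs(eta_threshold - 0.825742) < 1e-6
print(f"eta_threshold = {100 * eta_threshold:.4f}%")  # 82.5742%
```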
For q → 1/2 we didn't see any improvement to the threshold when using the S α family instead of the CHSH expression.
The results in table 3 should be taken with a pinch of salt, as they were derived assuming that only losses occur in an otherwise perfect experiment, which is not realistic. The threshold detection efficiency using noise preprocessing in particular was derived by taking the limit θ → 0 of a separable state and is accordingly very vulnerable to noise. To model this, we computed the best thresholds (i.e., using both noise preprocessing and the Sα family) when we replace the initial state with an attenuated one. The threshold detection efficiencies both for θ = π/2 (the maximally-entangled state) and for whichever partially-entangled state gave the best result are illustrated as a function of the error rate in figure 5.
The threshold using partially-entangled states visibly increases very rapidly as soon as we add even a small amount of channel noise. We also recomputed the thresholds of table 3 with the visibility set to v = 99%, corresponding to a more realistic error rate of δ = 0.5%. This increases the thresholds, listed in table 4, to above 87%. Finally, note that while the conditional entropy bound we used holds generally, it is only really optimised for the case that Alice and Bob's correlations satisfy Eqs. (123)-(125) and in particular have equiprobable measurement outcomes. Deterministically binning nondetection events and deliberately using a partially-entangled state both spoil this, and the real thresholds could actually be significantly better than the ones we report here.

Table 4: Threshold detection efficiencies (%) obtained using either CHSH (α = 1) or the optimal asymmetric expression (α = opt), both without (q = 0) and with maximal (q → 1/2) noise preprocessing, using partially-entangled states but with a 0.5% channel error rate.

Discussion
In our work we derived a tight lower bound on the conditional von Neumann entropy following an arbitrary amount of noise preprocessing and for the family Sα of asymmetric CHSH Bell expressions, which allows us to make more effective use of the statistics than the standard CHSH expression does. Our proof heavily exploited the similarity of the device-independent protocol to the entanglement-based version of the BB84 protocol. Section 5 showed that these modifications, both individually and together, can improve the robustness of the original CHSH-based protocol, using two commonly-used imperfection models as examples. For a maximally entangled two-qubit state subject to a depolarising noise model, the threshold error rate according to our analysis is just above 8.34%. This is actually the optimal error rate, matching what a security analysis taking into account the full set of statistics would give.
As is typically the case for research based on the CHSH Bell setting, our analysis depends heavily on the fact that the setting can be effectively reduced to the study of bipartite qubit systems. It would be interesting in the future to learn how to derive good bounds on the conditional von Neumann entropy in Bell settings with more inputs and/or outputs, where we cannot rely on such a reduction.
Within the CHSH setting, however, there are still some possible avenues for further work. First, while the entropy bound we have derived is tight in terms of the parameters it depends on, this does not mean it is optimal for every scenario. Our approach in particular is optimised for the case that Alice's and Bob's marginal measurement outcomes are equiprobable. This is fine if the imperfections in a real implementation most closely correspond to the depolarising noise model but not, as we cautioned in section 5, if they more closely resemble the loss model. It is likely that our entropy bound gives suboptimal results in the latter case. This is partly confirmed by a recent result for partially-entangled qubits in [14], where a threshold of about 84.3% is obtained for the global detection efficiency without using noise preprocessing, somewhat better than the thresholds of around 86.5% that we obtained for this case here.
Our proof, however, has a rather modular nature; parts of it could no doubt be changed, generalised, or applied to different problems without affecting other parts. Different preprocessings could be considered and may only require changing the derivation of the BB84 bound in section 4.2; we have not checked, for instance, whether flipping both of Alice's outcomes with the same probability q is always the optimal choice. Optimisation problems of the kind we arrived at in section 4.4 may lend themselves to numerical approaches⁹, although it should be kept in mind that solving the problem analytically made it much more straightforward for us to prove when the result was and was not convex. Lemmas 1 and 2 in section 4.5 may help prove the convexity or nonconvexity of entropy bounds with functional forms similar to those we derived in section 4.

⁹ In particular, Eq. (100) as written is the square root of a polynomial optimisation problem and could in principle be solved numerically using the Lasserre hierarchy [34]. This would still be true, albeit with a larger problem, if we had not optimised out the measurements; the sines and cosines of the angles ϕA and ϕB could still be treated as additional variables satisfying the polynomial constraints c² + s² = 1.
Second, our approach exploits two refinements, using more information about the statistics and noise preprocessing, that were already known to improve the performance of cryptography protocols. A third refinement, which we have not exploited here, would consist of using both of Alice's measurements to generate the key, which forces an eavesdropper to gain information about both bases without knowing in advance which will be used. This kind of modification has previously been shown to improve the average bound on the min-entropy in the device-independent setting [19].
This variant of the CHSH-based protocol has recently been considered in [12]; however, the approach of [12] requires a rather elaborate numerical procedure to bound the key rate, and the threshold error rate of 8.2% reported by the authors for the depolarising channel does not exceed the threshold just above 8.34% that we found for the single-basis version of the protocol using the refinements considered here.
We suspect that the result of [12] is not quite optimal, however, and thus a good candidate for further study. One possible way to bound the average entropy of Alice's measurements for this problem may be to apply the same method we have applied to the single-basis protocol here. According to a quick numerical test we performed, the best bound on the average conditional entropy that could be obtained using only the BB84 entropy bound of section 4.2 and the Pauli correlation bounds of section 4.3 should give a slightly better threshold of around 8.36%, or up to 9.24% if noise preprocessing is also used. Even these thresholds do not appear to be optimal, however. We also performed a brute-force numerical minimisation of the average conditional von Neumann entropy. The results seemed to show that the optimal attack for qubit systems involves Alice and Bob using measurements of the form A1,2 = cos(ϕA/2) Z ± sin(ϕA/2) X and B1,2 = Z, X on an asymmetric version of the optimal BB84 attack state¹⁰, i.e., (74) with different values of Ezz and Exx. In other words, the tight lower bound on the average entropy for qubit systems appeared to us to coincide with the result of minimising (145) for a given expectation value S of the CHSH correlator. Furthermore, similar to the qubit bound we derived in section 4.4, the resulting bound appears to be concave except in a region close to the quantum bound, where the optimal qubit attack appears to consist of mutually unbiased measurements (ϕA = π/2) on a state for which Ezz = Exx = S/√8. Assuming these observations are correct, the average bound for qubit systems would be given by (147) if the CHSH expectation value is close to the quantum maximum, with the device-independent bound obtained by extending one of the tangents of (147) as in the construction of our main result in section 2. This result would imply that the threshold noise rate of the DIQKD protocol using both bases (without noise preprocessing) is around 8.44%.

¹⁰ Section I.H of the supplementary information to Ref. [12] conjectures that the reduced state shared by Alice and Bob in the optimal attack is Bell diagonal with two nonzero eigenvalues, which would correspond to an attack state like (74) with, e.g., Ezz = 1, but this is not consistent with what we found when minimising the average conditional von Neumann entropy directly. The minimum of (145) subject to (146) is generally not attained with either Ezz = ±1 or Exx = ±1.
Eqs. (145) and (146) suggest it may be difficult to rigorously prove the tight bound on the average conditional entropy for the two-basis version of the DIQKD protocol. Nevertheless, the thresholds we have estimated numerically suggest there is some room for improvement in the results of [12], particularly if noise preprocessing is also used.

Note added.
A derivation of the conditional entropy bound for CHSH incorporating noise preprocessing, i.e., the special case α = 1 of the conditional entropy bound we derive here for the full Sα family, has recently been published in [35] independently of us; the authors apply it to an investigation of the performance of an optical model. Their entropy bound is obtained by parametrising and explicitly optimising over all qubit attacks, following the approach of [3,4]. Here, we instead exploited the fact that we already know how to derive the entropy bound including noise preprocessing for the BB84 protocol [23,30]. The qubit analysis of [35] can be promoted to a fully device-independent, dimension-free bound using the convexity proof we give in appendix B (this step is incomplete in [35]).
After the present results were made public, a followup to [35] appeared in [36], providing an analytical derivation of the same entropic bounds for the Sα expressions for α ≥ 1 and proposing a numerical method for |α| < 1.
Finally, the conjecture described around (147) on the average entropy of both of Alice's measurement outcomes was also made independently in [37].

A Qubit optimisation problem
Here we solve the optimisation problem (100) in section 4 of the main text. We first simplify it by introducing polar coordinates,

Ezz = λ cos(z) ,    Exz = µ cos(x) ,    (148)
Ezx = λ sin(z) ,    Exx = µ sin(x) .    (149)

With this change of variables the problem becomes to minimise |µ| subject to constraints in the free variables µ, λ, z, and x. From here, it is an algebra problem to eliminate the unwanted variables λ, z, and x so that only a constraint between |µ| and the constants α and Sα remains.
We begin with the first constraint. Using the trigonometric identities cos(x)² = (1 + cos(2x))/2 and sin(x)² = (1 − cos(2x))/2, it can be rewritten in a more convenient form. Substituting Σ = z + x and ∆ = z − x and using the trigonometric angle sum and difference identities, we obtain the limits (166) and (167). The upper limit (166) can be rewritten in a way that makes it clear that its right side is never less than 1. The lower limit (167), on the other hand, may be less than 1 depending on α and Sα. Specifically, if |Sα| ≤ 2√(1 + α² − α⁴), then the right side of (167) is not more than 1, in which case it is possible to choose λ such that t = 1.
Recalling that |µ| = |⟨X ⊗ B⟩|, we obtain the corresponding bound on |µ| in this case. On the other hand, if |Sα| ≥ 2√(1 + α² − α⁴), then the minimum is attained with the smallest allowed value of the argument, in which case the constraint (163) simplifies to the same expression that we derived in the main text for |α| ≥ 1.

B Concavity/convexity of the qubit bound
Here we prove lemmas 1 and 2 of the main text by bounding the first and second derivatives of the functions they concern. Both depend on the function φ(x) = h(1/2 + x/2) that we defined in the introduction. We give its first and second derivatives, which are used in the proofs, here for convenience:

φ'(x) = (1/2) log2((1 − x)/(1 + x)) ,
φ''(x) = −1/((1 − x²) ln 2) .

B.1 Proof of lemma 1

We express the function f defined in Eq. (106) in a form involving R = √(Q + (1 − Q)x) and r = √x, whose first and second derivatives are straightforward to compute. Let us first verify that f is monotonically increasing, starting from its first derivative. To change the terms with logarithms into something easier to work with, we substitute a suitable bound for both ξ = R and ξ = r. The rest of the proof then amounts to manipulating and simplifying quotients of polynomials, where we use that R² − r² = Q(1 − r²).
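The closed-form derivatives of φ(x) = h(1/2 + x/2), namely φ'(x) = (1/2) log2((1 − x)/(1 + x)) and φ''(x) = −1/((1 − x²) ln 2), can be checked against finite differences:

```python
# Finite-difference check of the closed-form derivatives of
# phi(x) = h(1/2 + x/2):
#   phi'(x)  = (1/2) log2((1 - x)/(1 + x))
#   phi''(x) = -1 / ((1 - x^2) ln 2)
from math import log, log2

def h(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

def phi(x):
    return h(0.5 + 0.5 * x)

def dphi(x):
    return 0.5 * log2((1 - x) / (1 + x))

def d2phi(x):
    return -1.0 / ((1 - x * x) * log(2))

eps = 1e-5
for x in (-0.9, -0.5, 0.0, 0.3, 0.8):
    num1 = (phi(x + eps) - phi(x - eps)) / (2 * eps)        # ~ phi'(x)
    num2 = (phi(x + eps) - 2 * phi(x) + phi(x - eps)) / eps**2  # ~ phi''(x)
    assert abs(num1 - dphi(x)) < 1e-6
    assert abs(num2 - d2phi(x)) < 1e-4
print("phi derivatives check out")
```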
We prove that f is convex in a similar way, starting from its second derivative. The first and second lines on its right side evaluate to expressions of a similar form to the bound derived for the family Iβα = β⟨A1⟩ + Sα of Bell expressions for |α| ≥ 1 in [21].