Improved DIQKD protocols with finite-size analysis

The security of finite-length keys is essential for the implementation of device-independent quantum key distribution (DIQKD). Presently, there are several finite-size DIQKD security proofs, but they are mostly focused on standard DIQKD protocols and do not directly apply to the recent improved DIQKD protocols based on noisy preprocessing, random key measurements, and modified CHSH inequalities. Here, we provide a general finite-size security proof that can simultaneously encompass these approaches, using tighter finite-size bounds than previous analyses. In doing so, we develop a method to compute tight lower bounds on the asymptotic keyrate for any such DIQKD protocol with binary inputs and outputs. With this, we show that positive asymptotic keyrates are achievable up to depolarizing-noise values of $9.33\%$, exceeding all previously known noise thresholds. We also develop a modification to random-key-measurement protocols, using a pre-shared seed followed by a "seed recovery" step, which yields substantially higher net key generation rates by essentially removing the sifting factor. Some of our results may also improve the keyrates of device-independent randomness expansion.


Introduction
Device-independent quantum key distribution (DIQKD) is a cryptographic concept based on the observation that if some quantum devices violate a Bell inequality, then it is possible to distill a secret key from the devices' outputs, even when the devices are not fully characterized [BHK05,PAB+09,Sca13]. This is a stronger form of security than that offered by standard QKD protocols, which assume that the devices are performing measurements within specified tolerances [SBPC+09]. In recent years, experimental and theoretical developments have brought the possibility of a physical DIQKD demonstration closer to fruition. In particular, a series of experiments have achieved Bell inequality violations while closing the fair-sampling loophole, using NV-centre [HBD+15], photonic [GMR+13,CMA+13,SMSC+15,GVW+15,LZL+18,SLT+18], and cold-atom [RBG+17] implementations. On the theoretical front, several protocol modifications have been explored that improve the asymptotic keyrates and noise tolerance of DIQKD. We focus on those studied in [HST+20] (noisy preprocessing, in which a small amount of trusted noise is added to the device outputs), [SGP+21] (random key measurements, in which more than one measurement basis is used to generate the key, preventing an adversary from optimally attacking both bases simultaneously), and [WAP21,SBV+21] (modified CHSH inequalities, which certify more entropy than the standard CHSH inequality).
Furthermore, earlier device-independent security proofs [PAB+09] required the assumption that the device behaviour across multiple rounds is independent and identically distributed (IID), sometimes referred to as the assumption of collective attacks. With the development of a result known as the entropy accumulation theorem (EAT) [DFR20,AFRV19,DF19], this assumption can now be removed for some DIQKD protocols: security proofs based on the EAT are valid against general attacks (sometimes referred to as coherent attacks), as long as the device behaviour can be modelled in a sequential manner. The EAT also provides explicit bounds on the finite-size behaviour, hence concretely addressing the question of what sample size is required for a secure demonstration of DIQKD. (Another recent result that can be applied to obtain such bounds against general attacks is the quantum probability estimation technique [ZKB18], but in this work we focus on the EAT.) However, thus far there has been no comprehensive analysis that simultaneously encompasses the various approaches mentioned above. In this work, we present a finite-size security proof that can be applied to protocols incorporating all of these proposed improvements. In addition, we show that such protocols can achieve substantially higher depolarizing-noise tolerance than all previously proven results. We now highlight the main contributions of our work.

Summary of key results
We consider a protocol based on one-way error correction (Protocol 1) that combines noisy preprocessing and random key measurements, and we perform a finite-size analysis against general attacks (Theorem 1) as well as collective attacks (Theorem 4). Our approach is similar to that used in [AFRV19]; however, we slightly modify the analysis in order to relax the theoretical requirements for the error-correction step, and our bounds are tighter since we use an updated version [DF19,LLR+21] of the EAT. Roughly speaking, the proofs rely on lower bounds on the asymptotic keyrates, and hence in Sec. 5 we also describe an algorithm to compute such bounds, which can be applied independently of our finite-size analysis. This algorithm improves over previous results in [HST+20,SGP+21,WAP21,SBV+21] by having all of the following properties simultaneously: it applies to arbitrary 2-input 2-output protocols, it accounts for noisy preprocessing and random key measurements, and it provably converges to a tight bound (for protocols of this form).
For simplicity, the protocol we present only uses the CHSH inequality rather than the modified CHSH inequalities of [WAP21,SBV+21], because our results suggest (see Sec. 5.6) that within the 2-input 2-output scenario, there may not be much prospect for improvement by using the latter, at least for the depolarizing-noise model. However, the finite-size analysis we perform can (like [BRC20]) be applied to protocols based on arbitrary Bell inequalities; in Sec. 4.4, we briefly explain the relevant adjustments in that case. Similarly, while we only performed explicit computations of the asymptotic keyrates for CHSH-based protocols, our approach as described in Sec. 5 can be applied to all 2-input 2-output Bell inequalities.
With our formulas for the finite-size keyrates, we computed the keyrates for several scenarios. In particular, we studied the keyrates that could be achieved if the honest devices had performance described by the estimated parameters in [MDR+19] for the NV-centre [HBD+15] and cold-atom [RBG+17] loophole-free Bell tests (detailed keyrate plots are shown in Fig. 1 of Sec. 3.4).

(Footnote 1: To be precise, other proof techniques [NSPS14,VV14,JMS20] are available that do not require the IID assumption, but the asymptotic keyrates given by those techniques are lower than that of the EAT.)

(Footnote 2: In fact, similar to [WAP21], we find numerical evidence that the bound in [SGP+21] is not entirely tight, and also that a conjecture proposed there regarding Eve's optimal attack may not be true after all; see Sec. 5.4.)
For photonic experiments, it is currently somewhat unclear whether the protocol here provides improvements over the results in [HST+20,WAP21,SBV+21]; there are some complications, which we explain in Sec. 5.5. We also plot some finite-size keyrates for honest devices subject to a simple depolarizing-noise model (see Fig. 2 of Sec. 3.4).
From those computations, we found that our security proof would require the [HBD+15] and [RBG+17] experiments to run for approximately n ∼ 10^8 and 10^10 rounds respectively in order to certify a positive finite-size keyrate against general attacks. While this is a marked improvement over the basic [PAB+09] protocol (which yields zero asymptotic keyrate for those experiments), these requirements still appear to be quite far outside the reach of those implementations; for reference, the [HBD+15] experiment had sample size n = 245 (over a 220-hour period), while the [RBG+17] experiment collected a data set of size n = 10,000 (over 2 measurement runs in a 10-day period) and also a larger data set of size n = 55,568 (over multiple measurement runs in a 7-month period). Our keyrate plots (Fig. 1) also showed that changing the various security parameters (discussed in detail in Sec. 2.1) by several orders of magnitude only results in fairly small changes to the keyrate, so it appears unlikely that the keyrates could be substantially improved by relaxing these security requirements.
To improve on these results, in Sec. 6 we describe two modifications of Protocol 1. Firstly, Protocol 2 is a modification based on a pre-shared key, which achieves a net key generation rate approximately double that of Protocol 1, by overcoming a crucial disadvantage of random-key-measurement protocols (namely, the sifting factor). Secondly, Protocol 3 is a modification that is optimized for the collective-attacks assumption: by changing the protocol itself for this scenario, rather than just the security proof, we were able to further improve the keyrate. However, we find that even with these modifications and relaxed security requirements, the required number of rounds for a positive finite-size keyrate is still impractically large, at about n ∼ 10^6 to 10^7 for the [RBG+17] parameters (see Fig. 6 in Sec. 6).
As another immediate consequence of our work, we computed a lower bound on the asymptotic noise tolerance of our protocols against depolarizing noise, by using our algorithm for evaluating the asymptotic keyrates. Our results (shown in Fig. 4 of Sec. 5.5) certify that positive asymptotic keyrates are possible for depolarizing-noise values of at least 9.33%. In comparison, the previous best thresholds were 8.34% [WAP21] (using noisy preprocessing and modified CHSH inequalities) and 8.2% [SGP+21] (using random key measurements). Our improvement over these results is hence of similar magnitude to their improvements over the basic [PAB+09] protocol, which has a threshold of 7.15%. Furthermore, our bounds are in fact close to the highest possible bounds allowed by simple convexity arguments: we prove in Sec. 5.6 that no protocol of this general form can achieve a depolarizing-noise threshold beyond 9.57%, so the threshold we obtained is not far from the absolute highest possible value in this setting.
Finally, we remark that several results from this work can also be used for DI randomness expansion (DIRE) [Col06,PAM+10,BRC20,LLR+21]. Specifically, the algorithm we use for computing the asymptotic DIQKD keyrates can also be used to obtain tighter bounds on the asymptotic keyrates for DIRE, after which a finite-size analysis could be performed using the EAT [BRC20,LLR+21]. Moreover, the "key recovery" process in the modified protocol (Protocol 2) can be applied in the context of DIRE, hence allowing one to improve the keyrates by using the random-key-measurement approach of [SGP+21]. (This idea was also independently proposed in a separate work [BRC21] at a similar time.) We discuss this in detail in Sec. 6.2.

Paper structure
In Sec. 2, we state the definitions and notation we use in this work. In Sec. 3, we describe the main protocol we consider, and state the main theorem which bounds the finite-size keyrates (Theorem 1), followed by presenting some plots of the resulting values. We give the security proof for this finite-size bound in Sec. 4. In Sec. 5 we present the algorithm to compute the asymptotic keyrates (i.e. the leading-order terms in Theorem 1), and describe the resulting depolarizing-noise thresholds as well as an upper bound on these thresholds. Finally, in Sec. 6 we discuss several variations, such as a modified random-key-measurement protocol which bypasses the sifting factor, and security proofs against collective attacks instead of general attacks.

Preliminaries
We define some basic notation in Table 1, and state some further definitions below. We take all systems to be finite-dimensional, but we will not impose any bounds on the system dimensions unless otherwise specified.

Table 1: Summary of notation.

H(·) : Base-2 von Neumann entropy
D(·‖·) : Base-2 quantum relative entropy
‖·‖_p : Schatten p-norm
⌊·⌋ (resp. ⌈·⌉) : Floor (resp. ceiling) function
X ≥ Y (resp. X > Y) : X − Y is positive semidefinite (resp. positive definite)
S_=(A) (resp. S_≤(A)) : Set of normalized (resp. subnormalized) states on register A
U_A : Maximally mixed state on register A
[n] : Indices from 1 to n, i.e. {1, 2, . . . , n}

Definition 1 (Frequency distributions). For a string z ∈ Z^n on some alphabet Z, freq_z denotes the following probability distribution on Z:

$$\mathrm{freq}_z(\tilde{z}) := \frac{1}{n} \sum_{j=1}^{n} \delta_{\tilde{z}, z_j}. \qquad (1)$$

Definition 2 (Binomial distribution). Let X ∼ Binom(n, p) denote a random variable X following a binomial distribution with parameters (n, p), i.e. X is the sum of n IID Bernoulli random variables X_j with Pr[X_j = 1] = p. We denote the corresponding cumulative distribution function as

$$B_{n,p}(k) := \Pr_{X \sim \mathrm{Binom}(n,p)}[X \leq k].$$

Definition 3 (Conditioning on classical events). For a classical-quantum state ρ ∈ S_≤(CQ) of the form ρ_CQ = Σ_c |c⟩⟨c| ⊗ ω_c for some ω_c ∈ S_≤(Q), and an event Ω defined on the register C, we define the following "conditional states":

$$\rho_{\wedge\Omega} := \sum_{c \in \Omega} |c\rangle\langle c| \otimes \omega_c, \qquad \rho_{|\Omega} := \frac{\rho_{\wedge\Omega}}{\operatorname{Tr}[\rho_{\wedge\Omega}]}.$$

We informally refer to these states as the subnormalized and normalized conditional states respectively (the latter is perhaps a slight misnomer if Tr[ρ] < 1, but this situation does not arise in our proofs). The process of taking subnormalized conditional states is commutative and "associative", in the sense that for any events Ω, Ω′ we have (ρ_∧Ω)_∧Ω′ = (ρ_∧Ω′)_∧Ω = ρ_∧(Ω∧Ω′); hence for brevity we will denote all of these expressions as ρ_∧Ω∧Ω′. On the other hand, some disambiguating parentheses are needed when combined with taking normalized conditional states.
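As a concrete illustration of Definitions 1 and 2, the frequency distribution and the binomial CDF can be computed with a few lines of standard-library Python (the helper names `freq` and `binom_cdf` are ours, for illustration only):

```python
from collections import Counter
from math import comb

def freq(z):
    """freq_z of Definition 1: freq_z(c) = (1/n) * #{j : z_j = c}.
    Symbols that never appear in z implicitly have frequency 0."""
    n = len(z)
    return {c: k / n for c, k in Counter(z).items()}

def binom_cdf(n, p, k):
    """B_{n,p}(k) = Pr[X <= k] for X ~ Binom(n, p), as in Definition 2."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))
```

For example, freq("0110") assigns frequency 1/2 to each of "0" and "1".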
Definition 4 (2-universal hashing). A 2-universal family of hash functions is a set H of functions from a set X to a set Y, such that if h is drawn uniformly at random from H, then

$$\Pr_{h}\big[h(x) = h(x')\big] \leq \frac{1}{|\mathcal{Y}|} \quad \text{for all } x \neq x'.$$

To ensure consistency of definitions, we now state the definitions of the smoothed entropies relevant for this work, though we will not need to use the explicit expressions. We follow the presentation in [DFR20,DF19], which can be shown to be equivalent to the definitions in [Tom16,TL17].
Definition 6. For ρ ∈ S_≤(AB), the min- and max-entropies of A conditioned on B are

$$H_{\min}(A|B)_\rho := -\log \min_{\sigma \in S_{\leq}(B)} \Big\| (I_A \otimes \sigma_B)^{-\frac{1}{2}}\, \rho_{AB}\, (I_A \otimes \sigma_B)^{-\frac{1}{2}} \Big\|_\infty,$$

$$H_{\max}(A|B)_\rho := \log \max_{\sigma \in S_{\leq}(B)} \Big\| \sqrt{\rho_{AB}}\, \sqrt{I_A \otimes \sigma_B} \Big\|_1^2,$$

where in the first equation the (I_A ⊗ σ_B)^{−1/2} term should be understood in terms of the Moore-Penrose generalized inverse. (Note that the optimum is indeed attained in both equations [Tom16], and it can be attained by a normalized state, so S_≤(B) can be replaced by S_=(B) without loss of generality.)

Definition 7. For ρ ∈ S_≤(AB) and ε ∈ [0, √(Tr[ρ_AB])], the ε-smoothed min- and max-entropies of A conditioned on B are

$$H_{\min}^{\varepsilon}(A|B)_\rho := \max_{\tilde{\rho} \in B_\varepsilon(\rho)} H_{\min}(A|B)_{\tilde{\rho}}, \qquad H_{\max}^{\varepsilon}(A|B)_\rho := \min_{\tilde{\rho} \in B_\varepsilon(\rho)} H_{\max}(A|B)_{\tilde{\rho}},$$

where B_ε(ρ) denotes the set of subnormalized states within purified distance ε of ρ.

Security definitions
The question of formalizing an appropriate security definition for the device-independent setting has not been definitively resolved yet [AFRV19], due to considerations regarding the device-reuse attacks described in [BCK13]. However, we shall proceed by following the security definitions used in [AFRV19], which were based on strong security definitions [PR14] for standard QKD, and should be sufficient under suitable constraints on the nature of the device memories (we shall briefly discuss these in the next section, when listing the assumptions).
Qualitatively, the concepts involved in the security definition we use here are: completeness, meaning that the honest devices will accept with high probability, and soundness, meaning that the devices remain "secure" (possibly by aborting) even in the presence of dishonest behaviour. Note that the completeness concept relies on having some description of the honest devices, which should be understood to mean the device behaviour in the situation where they perform "according to specifications", without manipulation or eavesdropping attempts from Eve (we give examples of such descriptions in Sec. 3.3). The following definition formalizes these concepts:

Definition 8. Consider a DIQKD protocol such that at the end, the honest parties either accept (producing keys K_A and K_B of length ℓ_key for Alice and Bob respectively) or abort (producing an abort symbol ⊥ for all parties). It is said to be ε^com-complete and ε^sou-sound if the following properties hold:

• (Completeness) The honest protocol implementation aborts with probability at most ε^com.
• (Soundness) For any implementation of the protocol, we have

$$\Pr[\text{accept}]\; \frac{1}{2} \Big\| \sigma_{K_A K_B E} - \Big( 2^{-\ell_{\mathrm{key}}} \sum_{k} |k\rangle\langle k| \otimes |k\rangle\langle k| \Big) \otimes \sigma_E \Big\|_1 \leq \varepsilon^{\mathrm{sou}},$$

where σ denotes the normalized state conditioned on the protocol accepting, and E denotes all side-information registers available to the adversary at the end of the protocol.
In the security proof, it is convenient to use the fact that the soundness property is implied by a pair of slightly simpler conditions, as shown in [PR14]. Specifically, to prove a DIQKD protocol is ε^sou-sound, it suffices to find ε_QKD^cor, ε_QKD^sec such that ε^sou ≥ ε_QKD^cor + ε_QKD^sec and the protocol is both ε_QKD^cor-correct and ε_QKD^sec-secret, defined as follows:

Definition 9. A DIQKD protocol as described above is said to be ε_QKD^cor-correct and ε_QKD^sec-secret if the following properties hold:

• (Correctness) For any implementation of the protocol, we have

$$\Pr[K_A \neq K_B \wedge \text{accept}] \leq \varepsilon^{\mathrm{cor}}_{\mathrm{QKD}}.$$

• (Secrecy) For any implementation of the protocol, we have

$$\Pr[\text{accept}]\; \frac{1}{2} \big\| \sigma_{K_A E} - U_{K_A} \otimes \sigma_E \big\|_1 \leq \varepsilon^{\mathrm{sec}}_{\mathrm{QKD}}, \qquad (12)$$

where σ is as described in Definition 8, and U_{K_A} denotes the maximally mixed state (i.e. a uniformly random key for Alice).

Main protocol
The overall structure of our main protocol is stated as Protocol 1 below, with the details of some steps to be specified in the following subsections. We make the following fairly standard assumptions, following the presentation in [AFRV19]:

• Alice and Bob can prevent unwanted information from leaking outside of their respective locations (for instance, the inputs and outputs to the devices remain private for each party until/unless they are revealed in the public communication steps).
• Alice and Bob can generate trusted (local) randomness.
• Alice and Bob have trusted post-processing units to perform classical computations.
• Alice and Bob perform all classical communication using an authenticated public channel.
• All systems can be modelled as finite-dimensional quantum registers (though we do not impose any bounds on the dimensions unless otherwise specified).
Regarding the first point in particular, we remark that while the loophole-free Bell tests in [HBD+15,SMSC+15,GVW+15,RBG+17] used spacelike separation to motivate the assumption that the devices do not reveal their inputs to each other, this may not be strictly necessary for a DIQKD implementation. It could be possible, and perhaps more reasonable, to instead justify this assumption by implementing some "shielding" measures on the devices, to prevent them from leaking unwanted information. In any case, "shielding" measures of this nature are likely necessary to prevent the raw outputs of the devices from leaking to the adversary, so it may be expedient to use these measures to prevent the inputs from leaking as well.
The above assumptions will be sufficient for us to show that the protocol satisfies the completeness and soundness definitions stated above. However, when considering whether these formal definitions yield security in an intuitive (or composable) sense, some additional assumptions are needed to address the memory attack of [BCK13]. One approach would be to impose the condition that the devices do not access any registers storing "private" data from previous protocols, and that this condition continues to hold if the devices are reused in the future. (Note that this condition is always implicit in standard QKD, because the register being measured is inherently specified when describing the trusted measurements the devices perform.) It remains to fully formalize this condition in a suitable framework for composable security, but this would be beyond the scope of this work.

Protocol 1
The protocol is defined in terms of the following parameters (chosen before the protocol begins), which we qualitatively describe:

7. Bob checks whether L_h = hash(Ã), as well as whether the value c on registers C satisfies freq_c(1) ≥ (w_exp − δ_tol)γ and freq_c(0) ≤ (1 − w_exp + δ_tol)γ. If all those conditions hold, Alice and Bob proceed to the next step. Otherwise, the protocol aborts.

8. Privacy amplification: Alice and Bob apply privacy amplification (see Sec. 3.2) on A and Ã respectively to obtain final keys K_A and K_B of length ℓ_key.
The rounds in which Y_j ∈ {2, 3} will be referred to as test rounds, and the rounds in which Y_j ∈ {0, 1} will be referred to as generation rounds (though strictly speaking, the final key in this protocol is obtained from all the rounds, not merely the generation rounds alone). In each round, Eve is allowed to hold some extension of the state distributed to the devices. We will use E to denote the collection of all such quantum side-information she retains over the entire protocol. (We do not denote this using separate registers E_j for individual rounds, because in a general scenario, Eve's side-information may not necessarily "factorize" into a tensor product across the individual rounds.)

We briefly highlight some aspects of this protocol that may differ slightly from more commonly used QKD protocols. Firstly, we do not choose a random subset of fixed size as test rounds; rather, each round is independently chosen to be a test or generation round, following [AFRV19]. This is in order to apply the entropy accumulation theorem, which holds for processes that can be described using a sequence of maps. Furthermore, the parameter-estimation check is performed on both freq_c(1) and freq_c(0). This was required in order to derive a critical inequality in the security proof (following [BRC20]), though in some cases it is possible to omit the freq_c(1) check (see Eq. (70) and the subsequent discussion).
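The per-round test/generation sampling and the two-sided parameter-estimation check of Step 7 can be sketched as a toy simulation (the function and its IID winning-probability parameter `w_true` are our own illustrative assumptions, not part of the protocol specification):

```python
import random

def simulate_pe_check(n, gamma, w_exp, delta_tol, w_true):
    """Each round is independently a test round with probability gamma; in test
    rounds, c_j records a CHSH win (1) or loss (0).  Returns True iff both
    parameter-estimation conditions of Step 7 hold for the resulting frequencies."""
    wins = losses = 0
    for _ in range(n):
        if random.random() < gamma:          # test round
            if random.random() < w_true:     # the devices win the CHSH game
                wins += 1
            else:
                losses += 1
    freq1, freq0 = wins / n, losses / n
    return freq1 >= (w_exp - delta_tol) * gamma and freq0 <= (1 - w_exp + delta_tol) * gamma
```

Honest devices with w_true close to w_exp pass both checks with high probability, while devices winning noticeably less often fail the freq_c(1) check.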
We now describe some of the individual steps in more detail.

Error correction
We first discuss Step 5.2, because it will have an impact on our discussion of Step 5.1. Given any ε_h ∈ (0, 1], if we consider a 2-universal family of hash functions whose output is a bitstring of length ⌈log(1/ε_h)⌉, then the defining property of 2-universal hashing guarantees that

$$\Pr\big[\mathrm{hash}(\tilde{A}) = \mathrm{hash}(A) \wedge \tilde{A} \neq A\big] \leq 2^{-\lceil \log(1/\varepsilon_h) \rceil} \leq \varepsilon_h. \qquad (13)$$

In other words, the probability of getting matching hashes from different strings can be made arbitrarily small by using sufficiently long hashes. Informally speaking, this gives us some laxity in Step 5.1, because regardless of how much the devices deviate from the honest behaviour, the guarantee (13) will still hold, providing a final "check" on how bad the guess Ã could be. Importantly, our later proof of the soundness of the protocol will not rely on any guarantees regarding the procedure in Step 5.1; only the completeness of the protocol (i.e. the probability that the honest devices mistakenly abort) requires such guarantees.
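For concreteness, one textbook example of a 2-universal family that could serve for such a hash check is the Carter-Wegman modular construction (our illustrative choice; the protocol does not prescribe a specific family):

```python
import random

P = 2_147_483_647  # prime modulus larger than the input domain (2^31 - 1)

def sample_hash(m):
    """Draw h uniformly from the Carter-Wegman family {x -> ((a*x + b) mod P) mod m}.
    For distinct x, x' in [0, P), Pr_h[h(x) = h(x')] <= 1/m, i.e. the family is
    2-universal; taking m = 2**ceil(log2(1/eps_h)) matches the hash-length choice above."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m
```

Alice and Bob would agree on the sampled (a, b), and Bob compares the hash of his guess against Alice's L_h; by 2-universality a wrong guess passes with probability at most 1/m.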
We now study Step 5.1. EC_max is defined as follows: it is the length of L_EC required such that, given the honest devices, Bob can use L_EC and B to produce a guess Ã satisfying

$$\Pr[\tilde{A} \neq A]_{\mathrm{hon}} \leq \varepsilon^{\mathrm{com}}_{\mathrm{EC}}. \qquad (14)$$

(In this section, we will use the subscript hon to emphasize quantities computed with respect to an honest behaviour.) We stress that while some preliminary characterization of the devices can be performed beforehand to choose a suitable EC_max, this parameter must not be changed once the protocol begins.
This immediately raises the question of what value should be chosen for EC_max in order to achieve a desired ε_EC^com. Theoretically, there exists a protocol [RR12] with one-way communication that achieves (14) as long as we choose EC_max such that

$$\mathrm{EC}_{\max} \geq H_{\max}^{\bar{\varepsilon}_s}(\mathbf{A}|\mathbf{B}\mathbf{X}\mathbf{Y})_{\mathrm{hon}} + 2\log\frac{1}{\varepsilon^{\mathrm{com}}_{\mathrm{EC}} - \bar{\varepsilon}_s}, \qquad (15)$$

where ε̄_s ∈ [0, ε_EC^com) is a parameter that can be optimized over. (This bound is essentially tight for one-way protocols.) Since the honest behaviour is IID, the max-entropy can be bounded using the asymptotic equipartition property (AEP) in the form stated as Corollary 4.10 of [DFR20], which yields a bound of the form

$$H_{\max}^{\bar{\varepsilon}_s}(\mathbf{A}|\mathbf{B}\mathbf{X}\mathbf{Y})_{\mathrm{hon}} \leq n\, h_{\mathrm{hon}} + \sqrt{n}\; \delta(\bar{\varepsilon}_s) \qquad (16)$$

for an explicit correction term δ(ε̄_s), where (using the decomposition H(Q|Q′C) = Σ_c Pr[c] H(Q|Q′; C = c) for classical C)

$$h_{\mathrm{hon}} := H(A_j|B_j X_j Y_j)_{\mathrm{hon}} = \sum_{z=(x,y)} \Pr[z]\, H(A_j|B_j;\, X_j = x, Y_j = y)_{\mathrm{hon}}, \qquad (17)$$

where the terms in the summation over z should be understood to refer to the A_j B_j values after the noisy-preprocessing step. (Any value of j can be used in the above equation, since the honest behaviour is IID.) However, the protocol achieving the bound (15) may not be easy to implement. In practice, error-correction protocols typically achieve performance described by

$$\mathrm{EC}_{\max} \approx \xi(n, \varepsilon^{\mathrm{com}}_{\mathrm{EC}})\; n\, h_{\mathrm{hon}}, \qquad (18)$$

where ξ(n, ε_EC^com) lies between 1.05 and 1.2 for "typical" values of n and ε_EC^com. (More precise characterizations can be found in [TMMP+17], which gives for instance an estimate in terms of a constant ξ_1 and a specific function ξ̃.) Furthermore, some protocols used in practice do not have a theoretical bound on ε_EC^com (for a given EC_max), only heuristic estimates. Fortunately, as mentioned earlier, the choice of error-correction procedure in Step 5.1 has no effect on our proof of the soundness of Protocol 1 (as long as EC_max is a fixed parameter), only on its completeness. This means that as long as we are willing to accept heuristic values for ε^com, we can use the heuristic values of ε_EC^com provided by some "practical" error-correction procedure in Step 5.1, and the value of ε^sou (i.e. how "secure" the protocol is) will be completely unaffected.
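As a rough numerical sketch of such leakage estimates, of the form EC_max ≈ ξ · n · h_hon (the helper names, the default ξ = 1.1, and the additive log-sized overhead term are our own illustrative assumptions):

```python
from math import log2

def h2(x):
    """Binary entropy (bits)."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def ec_max_estimate(n, qber, xi=1.1, eps_com_ec=1e-3):
    """Heuristic one-way error-correction leakage: roughly xi * n * h2(qber)
    syndrome bits, with xi ~ 1.05-1.2 in practice, plus a small log(1/eps)-sized
    overhead (the additive overhead term here is our own illustrative choice)."""
    return xi * n * h2(qber) + log2(1 / eps_com_ec)

# for n = 10**8 rounds at 2% QBER, the leakage is on the order of 10^7 bits
```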
The critical point to remember is that EC_max is a value to be fixed before the protocol begins, and Alice and Bob must stop Step 5.1 once they have reached that number of communicated bits. With this in mind, we remark that while we mainly focus on protocols using one-way error correction, this is not quite a strict requirement; in principle, one could use a procedure involving two-way communication (such as Cascade), as long as EC_max includes all the communicated bits, not just those sent from Alice to Bob. Another possibility worth considering might be adaptive procedures that adjust to the noise level encountered during execution of the protocol, rather than the expected noise level (again, making sure to halt once EC_max bits are communicated, where EC_max is defined beforehand based on the expected behaviour).
We remark that in our situation, the registers ABXY have some substructure, in the sense that they can be naturally divided into the substrings where X_j = Y_j ∈ {0, 1}, X_j ≠ Y_j ∈ {0, 1}, and Y_j ∈ {2, 3}, so the error-correction procedure should ideally take advantage of this substructure (for instance, no error-correction data needs to be sent regarding the rounds where X_j ≠ Y_j ∈ {0, 1}). Also, if we assume that Pr[A_j = B_j | X_j = x, Y_j = y]_hon is the same for all x ∈ {0, 1}, y ∈ {2, 3} (in which case it must equal w_exp), then for the y ∈ {2, 3} terms in Eq. (17) we have

$$H(A_j|B_j;\, X_j = x, Y_j = y)_{\mathrm{hon}} = h_2(w_{\mathrm{exp}}),$$

which lies in approximately [0.600, 0.811] for w_exp ∈ [3/4, (2 + √2)/4]. If the protocol parameters are such that ξ(n, ε_EC^com) h_2(w_exp) turns out to be fairly close to 1, there is not much loss incurred by simply sending the outputs of the test rounds directly, rather than expending the effort to compute appropriate error-correction data.
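The quoted range [0.600, 0.811] can be verified directly with a quick numerical check (not part of the proof):

```python
from math import log2, sqrt

def h2(x):
    """Binary entropy (bits)."""
    return -x * log2(x) - (1 - x) * log2(1 - x)

# h2 is decreasing above 1/2, so on [3/4, (2 + sqrt(2))/4] the endpoints give the range:
lo = h2((2 + sqrt(2)) / 4)  # at the maximal quantum CHSH winning probability
hi = h2(3 / 4)              # at the classical bound w_exp = 3/4
```

This reproduces lo ≈ 0.600 and hi ≈ 0.811.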

Privacy amplification
Privacy amplification is essentially centred around the Leftover Hashing Lemma, which we state below in the form described in [TL17] (obtained via a small modification of the proof in [Ren05]):

Proposition 1 (Leftover Hashing Lemma). Consider any σ ∈ S_≤(CQ) where C is a classical n-bit register. Let H be a 2-universal family of hash functions from Z_2^n to Z_2^ℓ, and let H be a register of dimension |H|. Define the state

$$\omega_{KCHQ} := \mathcal{E}\big(\sigma_{CQ} \otimes U_H\big), \qquad (21)$$

where the map E represents the (classical) process of applying the hash function specified in the register H to the register C, and recording the output in register K. Then for any ε ∈ [0, √(Tr[σ_CQ])], we have

$$\frac{1}{2}\big\| \omega_{KHQ} - U_K \otimes \omega_{HQ} \big\|_1 \leq 2\varepsilon + \frac{1}{2}\, 2^{-\frac{1}{2}\left(H_{\min}^{\varepsilon}(C|Q)_{\sigma} - \ell\right)}. \qquad (22)$$

Practically speaking, the privacy amplification step simply consists of Alice choosing a random function from the 2-universal family and publicly communicating it to Bob, followed by Alice and Bob applying that function to A and Ã respectively. The Leftover Hashing Lemma then ensures that the output of this process is close to an ideal key, as long as the conditional min-entropy of the original state was sufficiently large. (Notice that the register H is included in the "side-information" term in Eq. (22), so it can be publicly communicated.)
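A standard concrete instantiation of such a 2-universal family is a random binary Toeplitz matrix applied over GF(2); the following sketch is our illustration, not necessarily the construction an actual implementation would use:

```python
import random

def sample_toeplitz(n_in, n_out):
    """A random n_out x n_in binary Toeplitz matrix is specified by n_in + n_out - 1
    random bits (its first column and first row); this family is 2-universal."""
    d = [random.randrange(2) for _ in range(n_in + n_out - 1)]
    # entry (i, j) of the matrix is d[n_out - 1 + j - i] (constant along diagonals)
    return lambda bits: [
        sum(d[n_out - 1 + j - i] & bits[j] for j in range(n_in)) % 2
        for i in range(n_out)
    ]
```

Alice would sample the n_in + n_out − 1 seed bits, send them over the authenticated public channel (this plays the role of the register H), and both parties apply the resulting matrix to their n_in-bit strings to obtain the final ℓ-bit keys.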

Honest behaviour
For this protocol, the honest implementation consists of n IID copies of a device characterized by two parameters, w_exp and h_hon. The first parameter, w_exp, is the probability with which the device wins the CHSH game when supplied with uniformly random inputs X_j ∈ {0, 1}, Y_j ∈ {2, 3}, while the second parameter, h_hon, is defined in Eq. (17). While h_hon does not explicitly appear in the protocol description, it is implicitly used to define the parameter EC_max, as described in Sec. 3.1. (Since h_hon has a dependence on γ, strictly speaking it may be more precise to instead view the honest device behaviour as being parametrized by a tuple specifying all the individual entropies in Eq. (17), but for brevity we shall summarize this as the honest behaviour being parametrized by h_hon. In the more specific models of honest devices described below, these entropies are expressed in terms of some simpler parameters.) When computing the keyrates shown in Figs. 1 and 6 later for honest devices corresponding to the Bell tests in [HBD+15] (resp. [RBG+17]), we used the following model of the honest devices: following the estimates given in [MDR+19], we characterize them via the parameters w_exp = 0.797 (resp. 0.777) and p_err = 0.06 (resp. 0.035), where p_err is a parameter such that the probabilities before noisy preprocessing satisfy

$$\Pr[A_j = a, B_j = b \,|\, X_j = x, Y_j = y]_{\mathrm{hon}} = \frac{1}{2}\big((1 - p_{\mathrm{err}})\,\delta_{a,b} + p_{\mathrm{err}}\,(1 - \delta_{a,b})\big) \quad \text{for } x = y \in \{0, 1\},$$

where δ_{j,k} is the Kronecker delta. Furthermore, we take Pr[A_j = B_j | X_j = x, Y_j = y]_hon to be the same for all x ∈ {0, 1}, y ∈ {2, 3} (in which case it must equal w_exp). Under this model, the expression (17) for h_hon simplifies to a combination of binary-entropy terms, with the conditional entropy of the matched generation rounds reducing to h_2(p + (1 − 2p)p_err), where the p + (1 − 2p)p_err term is obtained by an explicit computation [WAP21].
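The p + (1 − 2p)p_err term is just the composition of two independent bit flips, which is easy to check numerically (the helper name is ours):

```python
def effective_error(p, p_err):
    """Disagreement probability between Bob's raw guess and Alice's bit after she
    flips it with probability p (noisy preprocessing): the raw bit is wrong with
    probability p_err, then independently flipped with probability p, giving
    p_err*(1 - p) + (1 - p_err)*p = p + (1 - 2p)*p_err."""
    return p_err * (1 - p) + (1 - p_err) * p
```

Sanity checks: p = 0 recovers p_err, and p = 1/2 gives 1/2 (the key bit becomes uniformly random).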
When computing the keyrates shown in Figs. 2-3 later for a depolarizing-noise scenario, we modelled the honest devices as being described by a parameter q ∈ [0, 1/2], such that the devices hold the two-qubit Werner state (1 − 2q)|Φ+⟩⟨Φ+| + 2q I/4 (where |Φ+⟩ = (|00⟩ + |11⟩)/√2). The measurements corresponding to Alice and Bob's inputs X_j ∈ {0, 1}, Y_j ∈ {2, 3} are the ideal CHSH measurements (see e.g. [PAB+09]), and the measurements corresponding to Bob's inputs Y_j ∈ {0, 1} are measurements in the same bases as Alice's measurements. In terms of w_exp and h_hon, this yields

$$w_{\mathrm{exp}} = \frac{2 + \sqrt{2}\,(1 - 2q)}{4},$$

together with the corresponding expression for h_hon, which follows by the same computation as above, noting that essentially we have p_err = q in this case.
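For this model one can check the CHSH winning probability directly: with visibility v = 1 − 2q, the ideal CHSH measurements give CHSH value S = 2√2 v and hence w = 1/2 + S/8 (a standard computation; the helper below is our sketch):

```python
from math import sqrt

def werner_chsh_win(q):
    """CHSH winning probability (uniform inputs) for the Werner state
    (1-2q)|Phi+><Phi+| + 2q*I/4 with ideal CHSH measurements: the CHSH value is
    S = 2*sqrt(2)*(1-2q), and w = 1/2 + S/8 = (2 + sqrt(2)*(1-2q))/4."""
    return (2 + sqrt(2) * (1 - 2 * q)) / 4
```

Here q = 0 gives the Tsirelson bound (2 + √2)/4 ≈ 0.854, and q = 1/2 (the maximally mixed state) gives the random-guessing value 1/2.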
Finally, for modelling photonic experiments in Fig. 5 later, we follow [Ebe93] and use a highly simplified model with a single parameter η ∈ [0, 1], which is intended to represent the overall detection efficiency (grouping various effects, such as fibre-optic losses and photodetector efficiency, into this single parameter). In this model, we take the honest devices to be able to implement arbitrary two-qubit states and measurements perfectly, but with probability 1 − η the outcome is replaced with a no-detection symbol φ. In order to apply our security proof, which requires the test-round measurements to have binary outcomes, we impose that for inputs X_j ∈ {0, 1} by Alice and inputs Y_j ∈ {2, 3} by Bob, the no-detection outcome φ is deterministically mapped to the output value 0 (this is a common approach for Bell tests and/or DIQKD using this model [Ebe93,PAB+09]). However, for inputs Y_j ∈ {0, 1} by Bob, we preserve the no-detection outcome, as it slightly improves the keyrates by reducing the error-correction term h_hon [ML12]. Unlike the depolarizing-noise model, in this case we do not stick to a fixed choice of states and measurements for all η; rather, we (heuristically) optimize the states and measurements to maximize the keyrate for each η. As is typical in this model, this makes a significant difference in the threshold value of η required to achieve e.g. Bell violation [Ebe93] or nonzero keyrates [BFF21].
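The no-detection post-processing described above can be sketched as follows (a toy model with our own naming; the symbol φ is represented by the string 'phi'):

```python
import random

def measure_with_loss(ideal_outcome_sampler, eta):
    """With probability 1 - eta the detectors do not click and we record 'phi'."""
    return ideal_outcome_sampler() if random.random() < eta else "phi"

def binarize_test_round(outcome):
    """For Alice's inputs and Bob's test inputs Y in {2, 3}, the no-detection
    symbol is deterministically mapped to the output value 0."""
    return 0 if outcome == "phi" else outcome

def generation_round_output(outcome):
    """For Bob's generation inputs Y in {0, 1}, 'phi' is kept as a third symbol,
    which slightly reduces the error-correction cost h_hon."""
    return outcome
```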

Finite-size keyrates
We now present our main theorem, giving the length of the final key as a function of the number of rounds and the desired security parameters. To do so, we need to introduce some notation and ancillary functions. First, we require a function r_p satisfying some properties we now describe. Consider any tripartite quantum state ρ_ĀB̄Ē, and suppose there are two possible binary-outcome measurements (indexed by x) on register Ā, and similarly two binary-outcome measurements (indexed by y) on register B̄. Let w be the probability of winning the CHSH game (with uniform inputs) for these measurements on the state ρ_ĀB̄. Let Â_x be a register that stores the result if measurement x is performed on register Ā and noisy preprocessing (Step 4) with bias p is then applied to the outcome. We require r_p to be an affine function such that for any choice of ρ_ĀB̄Ē and measurements, we have

$$\frac{1}{2} H(\hat{A}_0 | \bar{E}) + \frac{1}{2} H(\hat{A}_1 | \bar{E}) \geq r_p(w).$$

(The factor of 1/2 arises from the input distributions in the protocol.) We give details on how to obtain such a bound in Sec. 5 (in principle, the p = 0 case could also be obtained from the results of [SGP+21]). Given such a function r_p, we then define the affine function

$$g(w) := (1 - \gamma)\, r_p(w) + \gamma\, r_0(w).$$
Informally, g can be interpreted as a lower bound on the entropy "accumulated" in one round of the protocol.
Finally, for an affine function f defined on all probability distributions on some register C, and any subset S of its domain, we will define

Max(f) := max_q f(q),    Var_S(f) := max_{q∈S} Σ_c q(c) [f(δ_c) − f(q)]²,

where max_q is taken over all distributions on C, and δ_c denotes the distribution with all its weight on the symbol c.
With these definitions, we can state the security guarantees of the protocol. The following theorem involves various parameters in addition to those listed at the start of Protocol 1, which we first qualitatively describe (note that several descriptions are somewhat informal, not meant to be entirely precise on their own):
• ε^com_EC: Bound on the probability for the honest implementation that Bob's guess Ã for A is wrong.
• ε^com_PE: Bound on the probability for the honest implementation that Bob's guess Ã for A is correct but freq_c does not satisfy the parameter-estimation checks.
• ε_EA: Informally, a bound on the probability that a "virtual" parameter-estimation step (in a related virtual protocol) accepts when given devices that produce insufficient min-entropy.
• ε_PA: Informally, a bound on the secrecy parameter (12) of the keys after privacy amplification (in a related virtual protocol) when given devices that produce sufficient min-entropy, up to some smoothing corrections.
These parameters, together with the protocol parameters γ, p, δ_tol described earlier, can be considered to be variational parameters that should be chosen to optimize the keyrate as much as possible. The formal theorem statement is as follows:

Theorem 1. Take any ε^com_EC, ε^com_PE, ε_EA, ε_PA, ε_h, ε_s, ε′_s, ε″_s ∈ (0, 1] such that ε_s > ε′_s + 2ε″_s, and any α ∈ (1, 2), α′ ∈ (1, 1 + 2/V′), β ∈ [g(0), g(1)], γ ∈ (0, 1), p ∈ [0, 1/2], where V′ := 2 log 5. Protocol 1 is (ε^com_EC + ε^com_PE)-complete and (max{ε_EA, ε_PA + 2ε_s} + 2ε_h)-sound when performed with any choice of EC_max such that Eq. (14) holds, and δ_tol, ℓ_key satisfying Eqs. (30) and (31), where B_{n,p}(k) is the cumulative distribution function of a binomial distribution (Definition 2), and with f_min and Q_f being a function and a set (defined explicitly in Sec. 4.3.1) that satisfy Eq. (33).

Qualitatively, the ϑ_ε terms in the above theorem arise from various properties of smoothed entropies, while the V, V′ terms are measures of "variance" of some functions considered in the EAT, and the K_α term is an additional correction that essentially depends on the range of these functions.
To compute the finite-size keyrates presented in this work, for instance in Figs. 1 and 2 later, we optimized the parameter choices by using the inbuilt (heuristic) constrained-optimization functions in Mathematica or MATLAB to maximize ℓ_key while imposing the constraint (30) on the parameters ε^com_PE, γ, δ_tol.^7 We highlight that the exact expression for ϑ_ε is numerically unstable, and hence we replaced it with the upper bound log(2/ε²) (this bound is basically tight at small ε, so it makes little difference). Furthermore, the optimization over β also appears to be somewhat unstable. We observed heuristically that the optimal value of β appears to typically be very close to g(1), and hence for simplicity in some cases we did not optimize over it but instead simply fixed β = g(1) (or slightly below it, to avoid some instabilities at β − g(1) = 0). Finally, we found that direct computation of B_{n,p}(k) (e.g. via the regularized beta function) could sometimes be slow or unstable, and in such cases we followed [LLR+21] and replaced it with the upper bound in the following theorem, where D_e(q‖p) := q ln(q/p) + (1 − q) ln((1 − q)/(1 − p)) is the base-e relative entropy between the distributions (q, 1 − q) and (p, 1 − p). Replacing B_{n,p}(k) with C_{n,p}(k + 1) and computing the latter (which is a Gaussian integral) appeared to be faster and more stable than computing B_{n,p}(k) directly. There is little loss incurred by performing this replacement: the inequalities (35) imply C_{n,p}(k + 1) ≤ B_{n,p}(k + 1), so the effect is no larger than replacing B_{n,p}(k) by B_{n,p}(k + 1), which is basically negligible in the parameter regimes studied in this work.
Note that in general, the optimal parameter values (especially for α and α′) would depend heavily on n. To get an estimate for the asymptotic scaling of ℓ_key, we can choose all the ε parameters to take some constant values satisfying the desired completeness and soundness bounds, then choose α, α′ following [DFR20, DF19], taking n to be large enough such that α ∈ (1, 3/2) and α′ ∈ (1, 1 + 2/V′). Furthermore, introduce a constant K̄ which satisfies K̄ ≥ K_α for α ∈ (1, 3/2). With these choices, Eq. (31) can be satisfied by taking ℓ_key as in Eq. (38). Furthermore, Eq. (30) can be satisfied by choosing γ = 3 w_exp n^{−1} δ_tol^{−2} log(2/ε^com_PE) (see Eq. (44)), in which case by taking δ_tol ∝ n^{−1/3} we have δ_tol, γ → 0 as n → ∞, and Eq. (38) then yields the expected asymptotic result, taking EC_max according to Eqs. (15)–(17). This matches e.g. the Devetak-Winter bound [DW05] (with the prefactor of 1/2 being due to the sifting).

Footnote 7: To enforce that, for instance, the parameters ε^com_EC, ε^com_PE satisfy the condition ε^com_EC + ε^com_PE = ε^com for the desired completeness value ε^com, rather than imposing this condition as a constraint in the optimization, we instead introduced a reparametrization ε^com_EC = ε^com sin²θ, ε^com_PE = ε^com cos²θ and optimized over θ ∈ (0, π/2). This ensures that ε^com_EC + ε^com_PE = ε^com automatically holds without imposing it as an explicit constraint. A similar approach (with more parameters) was taken for conditions such as max{ε_EA, ε_PA + 2ε_s} + 2ε_h = ε^sou and so on.
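The footnote's reparametrization can be sketched as follows, with a toy objective standing in for ℓ_key (the logarithmic penalty terms are purely illustrative, mimicking how smaller ε values worsen the finite-size corrections):

```python
import math

EPS_COM = 1e-2  # desired completeness parameter eps_com

def split_epsilons(theta):
    """Reparametrize so that eps_EC + eps_PE = EPS_COM holds identically."""
    return EPS_COM * math.sin(theta) ** 2, EPS_COM * math.cos(theta) ** 2

def toy_objective(theta):
    # Hypothetical stand-in for the keyrate, penalizing small epsilons the way
    # log(1/eps) finite-size correction terms would (weights are arbitrary).
    eps_ec, eps_pe = split_epsilons(theta)
    return -(2 * math.log(1 / eps_ec) + math.log(1 / eps_pe))

# Unconstrained grid search over theta replaces a constrained optimization.
thetas = [i * (math.pi / 2) / 1000 for i in range(1, 1000)]
best = max(thetas, key=toy_objective)
eps_ec, eps_pe = split_epsilons(best)
assert abs(eps_ec + eps_pe - EPS_COM) < 1e-12  # constraint holds by construction
```

For this toy objective the optimum splits the budget as ε^com_EC = 2ε^com/3 and ε^com_PE = ε^com/3; the actual computations instead optimize θ jointly with all the other variational parameters.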
Given the scaling behaviour shown in Eq. (38), it can be seen that the optimal values of the various ε parameters (given some desired values of ε^com and ε^sou) may be of rather different orders of magnitude. This is because some of them appear in O(1/√n) corrections to the finite-size keyrate while others appear in O(1/n) corrections. Intuitively speaking, the ε parameters in the latter can be chosen to be substantially smaller than the former, since the O(1/n) scaling reduces their contribution to the finite-size effects.
With Theorem 1, we compute the achievable finite-size keyrates for the various types of honest devices described in Sec. 3.3, showing the results in Figs. 1 and 2. As mentioned previously, the former shows the results regarding the Bell tests in [HBD+15] and [RBG+17], while the latter shows the results for a depolarizing-noise model. We remark that for the computations in Fig. 1, noisy preprocessing was not applied because it appears to only slightly improve the keyrates for those experimental parameters; see Fig. 6 later. For reference, in Fig. 1 we also include plots of the finite-size keyrates against collective attacks as derived in Theorem 4 later. Comparing them to the keyrates against general attacks given by Theorem 1, we see that there is indeed some difference, but the threshold value of n to achieve positive keyrates only differs by about one order of magnitude. Furthermore, the various choices we considered for the soundness parameter all yielded fairly similar keyrate curves, indicating that changing the soundness requirements would not significantly change the n threshold.

[Fig. 1 caption: Error correction per Eqs. (15)–(17). The colours correspond to soundness parameters (informally, a measure of how "insecure" the key is) of ε^sou = 10^{−3}, 10^{−6}, and 10^{−9} for black, blue, and red respectively, while the completeness parameter (the probability that the honest devices abort) is ε^com = 10^{−2} in all cases. The horizontal line denotes the asymptotic keyrate. All other parameters in Theorems 1 and 4 were numerically optimized, except β.]

[Fig. 2 caption: The solid, dashed and dotted curves denote q = 5%, 6% and 7% respectively, with the error-correction protocol taken to satisfy Eqs. (15)–(17). The soundness parameter is ε^sou = 10^{−6} and the completeness parameter is ε^com = 10^{−2}. The horizontal lines denote the asymptotic keyrates. All other parameters were numerically optimized.]

Finite-size security proof
We now prove that Protocol 1 indeed satisfies the security properties claimed in Theorem 1. To do so, we first introduce a virtual protocol that is more convenient to analyze. For the purposes of understanding this construction, it may be helpful to think of it as being based on a specific set of states and measurements that could be occurring in a run of Protocol 1 (as opposed to simultaneously considering all possible states and measurements that could be occurring). In particular, this virtual protocol (and the channels M j in Sec. 4.3 later) should be understood as being constructed in terms of this specific set of states and measurements. Since we will not impose any additional assumptions on these states and measurements beyond those specified by the protocol, this will still yield a valid way for us to prove the desired security properties (in particular the soundness property, which has to be proven for all possible states and measurements that could occur in a run of the protocol).
Consider the state at the end of Step 6 in Protocol 1. We now describe a virtual protocol^8 that produces exactly the same state (when it is implemented using the same input state and measurements as those used in a run of Protocol 1), apart from the introduction of two additional registers B′C′.
Protocol 1′ (a virtual protocol)
1. Alice and Bob's devices each receive and store all quantum states that they will subsequently measure.
3. Error correction: Alice and Bob publicly communicate some bits L for error correction as previously described, allowing Bob to construct a guess Ã for A.

Parameter estimation: For all
The key changes as compared to Protocol 1 are as follows: • All the states that the devices will measure are distributed immediately at the start (note that this is possible because in Protocol 1, the measurement choices X, Y are not disclosed until all measurements have been performed, and hence the distributed states cannot behave adaptively with respect to the inputs).
• The sifting and noisy preprocessing steps are now performed immediately after each measurement, instead of after all measurements are performed. This is to allow us to subsequently apply the EAT.
• Two additional registers are introduced: B′, which is equal to B on all the test rounds but is otherwise set to 0, and C′, which is analogous to C but computed using A in place of Ã. These registers are used in a virtual parameter-estimation step.
• All parameter estimation is performed with B′ instead of B (this substitution has no physical effect, since B′_j = B_j in all rounds used for parameter estimation).
Let ρ denote the state on registers A Ã B B′ X Y L C C′ E (as well as the choice of hash function in the error-correction step) at the end of Protocol 1′. As mentioned above, the reduced state after tracing out B′C′ is exactly the same as that at the end of Step 6 in Protocol 1. Since all subsequent steps in Protocol 1 (i.e. simply the accept/abort check and the privacy amplification) only involve this reduced state, to analyze Protocol 1 it suffices to consider the equivalent (apart from B′C′) process where all the steps up to Step 6 are replaced by this virtual protocol, and then the remaining steps in Protocol 1 are performed. With this in mind, let us define the following events on the state ρ: Note that in terms of these events, the accept condition of the protocol is Ω_h ∧ Ω_PE. With the virtual protocol and the above events in mind, we now turn to proving completeness and soundness of Protocol 1.

Completeness
Completeness is defined entirely with respect to the honest behaviour of the devices, hence all discussion in this section refers to the situation where the state ρ described above is the one produced by the honest states and measurements. To prove completeness, we simply need to obtain an upper bound on the probability that this honest behaviour yields an abort, i.e. Pr[Ω^c_h ∨ Ω^c_PE]_hon (recall we use Ω^c to denote the complement of an event). However, we encounter a slight inconvenience here because the event Ω^c_PE involves the register C produced using Bob's guess Ã rather than Alice's actual string A, and there is some small probability that his guess was wrong. To cope with this, we shall break down Pr[Ω^c_h ∨ Ω^c_PE]_hon into simpler terms that can be bounded in terms of probabilities involving only the "virtual" string C′ rather than C, where the former is easier to handle since it is produced from the actual value of A.
We begin by noting that the hashes of A and Ã can only differ if A ≠ Ã, which is to say that the event Ω^c_h implies the event Ω^c_g. With this, we write

Pr[Ω^c_h ∨ Ω^c_PE]_hon ≤ Pr[Ω^c_g ∨ Ω^c_PE]_hon = Pr[Ω^c_g]_hon + Pr[Ω_g ∧ Ω^c_PE]_hon,

where we have partitioned the event Ω^c_g ∨ Ω^c_PE into the disjoint events Ω^c_g and Ω_g ∧ Ω^c_PE. We shall now upper bound the probabilities of each of these events. The Pr[Ω^c_g]_hon term is straightforward to handle, since by construction the error-correction step ensures that this probability is at most ε^com_EC. As for the Pr[Ω_g ∧ Ω^c_PE]_hon term, we now make the critical observation that Ω_g ∧ Ω^c_PE = Ω_g ∧ Ω′^c_PE (because the event Ω_g implies that C = C′). Therefore, we have

Pr[Ω_g ∧ Ω^c_PE]_hon = Pr[Ω_g ∧ Ω′^c_PE]_hon ≤ Pr[Ω′^c_PE]_hon,

which is the desired reduction to a term involving C′ rather than C. To explicitly upper bound Pr[Ω′^c_PE]_hon, observe that under the honest behaviour, the string C′ consists of n IID rounds (where ¬0 represents all symbols other than 0). Hence by the union bound, Pr[Ω′^c_PE]_hon is upper bounded by ε^com_PE as specified in Eq. (30) (i.e. the sum of the expressions (42) and (43)). This yields a final upper bound of ε^com_EC + ε^com_PE on the probability of the honest behaviour aborting, as desired.
In principle, somewhat simpler expressions could be obtained using the (multiplicative) Chernoff bound, where t = δ_tol/w_exp ≤ 1. However, it was observed in [LLR+21] that these bounds are weaker than Prop. 2 by a significant amount. As yet another alternative, Hoeffding's inequality yields a bound of e^{−2nγ²δ²_tol}, but this is worse than the Chernoff bound whenever 6γ ≤ 1/w_exp; furthermore, it does not yield nontrivial bounds if we choose γ ∝ 1/n (for constant ε^com_EC, δ_tol), which ought to be possible in principle according to the scaling analysis in [DF19].
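These comparisons can be checked numerically. The sketch below (at arbitrary illustrative parameter values, not those of the protocol) evaluates an exact lower binomial tail against the relative-entropy (Chernoff-style) bound and Hoeffding's inequality; by Pinsker's inequality the relative-entropy bound is never worse than Hoeffding's here:

```python
import math

def binom_cdf(n, p, k):
    """Exact Pr[X <= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def rel_entropy(q, p):
    """Base-e relative entropy D_e(q||p) between Bernoulli(q) and Bernoulli(p)."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def chernoff_bound(n, p, k):
    """Pr[X <= k] <= exp(-n D_e(k/n || p)) for k/n < p (Chernoff-Hoeffding)."""
    return math.exp(-n * rel_entropy(k / n, p))

def hoeffding_bound(n, p, k):
    """Pr[X <= k] <= exp(-2n (p - k/n)^2)."""
    return math.exp(-2 * n * (p - k / n) ** 2)

n, p, k = 1000, 0.3, 250  # illustrative values only
exact = binom_cdf(n, p, k)
assert exact <= chernoff_bound(n, p, k) <= hoeffding_bound(n, p, k)
```

The exact evaluation via `math.comb` is fine at these sizes; for the large n of actual keyrate computations it becomes slow, which is precisely why the Gaussian-integral upper bound of [LLR+21] is used instead.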

Soundness
Soundness has to be proven against all possible dishonest behaviours (subject to the protocol assumptions as listed in Sec. 3). In this section, we shall consider any particular state ρ (as defined after the Protocol 1 description) that would be produced by a particular choice of such dishonest behaviour, and prove that it satisfies the soundness condition (10) (for a specific value of ε sou ) regardless of which dishonest behaviour was considered. All probabilities are to be understood as being defined with respect to that state ρ.
We first note that it is straightforward to show that Protocol 1 is ε_h-correct: recalling that the accept condition can be written as Ω_h ∧ Ω_PE, we obtain the desired upper bound on the probability that K_A ≠ K_B and the protocol accepts,

Pr[K_A ≠ K_B ∧ Ω_h ∧ Ω_PE] ≤ Pr[Ω^c_g ∧ Ω_h] ≤ ε_h,

where the last inequality holds by the defining property of 2-universal hashing.
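The final inequality is the defining property of a 2-universal family: for any fixed pair of distinct inputs, a uniformly drawn hash collides with probability at most 2^{−ℓ}. As a sanity check with one standard such family (random binary matrices over GF(2), used here purely for illustration), exhaustive enumeration at toy sizes recovers exactly 2^{−ℓ}:

```python
from itertools import product

def hash_bits(matrix, x):
    """Apply a binary matrix over GF(2): each output bit is <row, x> mod 2."""
    return tuple(sum(r * xi for r, xi in zip(row, x)) % 2 for row in matrix)

k, ell = 3, 2             # input length 3 bits, hash length 2 bits
x = (1, 0, 1)             # any fixed pair of distinct inputs
y = (0, 1, 1)

# Enumerate the whole family of 2^(ell*k) binary matrices and count collisions.
collisions = 0
family_size = 0
for bits in product((0, 1), repeat=ell * k):
    matrix = [bits[i * k:(i + 1) * k] for i in range(ell)]
    family_size += 1
    if hash_bits(matrix, x) == hash_bits(matrix, y):
        collisions += 1

# For this linear family the collision probability is exactly 2^(-ell):
assert collisions / family_size == 2 ** (-ell)
```

The same calculation works for any distinct pair x ≠ y, since a collision is the event M(x ⊕ y) = 0 and each output bit of the nonzero vector x ⊕ y vanishes independently with probability 1/2.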
It remains to prove secrecy. Denote the privacy amplification step in Protocol 1 as the map M_PA, so the subnormalized state conditioned on the event of Protocol 1 accepting can be written as M_PA(ρ_∧Ω_h∧Ω_PE). Let H be the register storing the choice of hash function used for privacy amplification, and denote E′ := XYLEH for brevity. We can now rewrite the secrecy condition as the following requirement. Now, somewhat similarly to the completeness analysis, we shall find a way to upper bound the left-hand side of this expression in terms of Ω′_PE rather than Ω_PE, as the former is easier to handle. Specifically, by noting that ρ_∧Ω_h∧Ω_PE = ρ_∧Ω_g∧Ω_h∧Ω_PE + ρ_∧Ω^c_g∧Ω_h∧Ω_PE, we find the following bound using norm subadditivity, where in the last line we have performed the substitution Ω_g ∧ Ω_PE = Ω_g ∧ Ω′_PE in the first term (again, because the event Ω_g implies that C = C′), and bounded the second term using Pr[Ω^c_g ∧ Ω_h] ≤ ε_h as noted in Eq. (45).
We now aim to bound the first term in Eq. (47). To do so, we study three exhaustive possibilities for the state ρ (the first two are not mutually exclusive, but this does not matter): Case 3: Neither of the above holds.
In case 1, that term is bounded by In case 2, that term is bounded by The main challenge is case 3. To study this case, we first focus on the conditional state ρ_{|Ω′_PE}. Importantly, the relevant smoothed min-entropy of this state can be bounded by using the following theorem, which we prove in Sec. 4.3 using entropy accumulation: Theorem 2. For all parameter values as specified in Theorem 1, the state at the end of Protocol 1′ satisfies where V′, ϑ_ε, V, K_α are as defined in Theorem 1.
To relate this to our state of interest, we use Lemma 10 of [TL17], which states that for a state σ ∈ S_≤(ZZ′QQ′) that is classical on registers ZZ′, an event Ω defined on the registers ZZ′, and any ε ∈ [0, √(Tr[σ_∧Ω])), we have a lower bound on the smoothed min-entropy of the conditioned state σ_{|Ω} in terms of that of σ. In our context, we observe that the probability of the event Ω_g ∧ Ω_h on the normalized state ρ_{|Ω′_PE} is Pr[Ω_g ∧ Ω_h | Ω′_PE], which is greater than ε_s² since we are in case 3. Hence the conditions of the lemma are satisfied (identifying A with Z, Ã (and the error-correction hash choice) with Z′, XYLE with Q′, and leaving Q empty), allowing us to obtain the bound in Eq. (53), where in the second line we have applied a chain rule for smoothed min-entropy (see e.g. [WTH+11] Lemma 11 or [Tom16] Lemma 6.8).
Putting together Eq. (53) and Theorem 2, we obtain the stated bound for a key of length ℓ_key satisfying Eq. (31), where the last line holds because Pr[Ω_g ∧ Ω_h | Ω′_PE] > ε_s² in case 3. This finally allows us to bound Eq. (47): since the required min-entropy bound holds in case 3, we can apply the Leftover Hashing Lemma to bound the first term accordingly.^10 Since the three possible cases are exhaustive, we conclude that the secrecy condition is satisfied with the stated parameters. Recalling that we have already shown the protocol is ε_h-correct, we finally conclude that it is (max{ε_EA, ε_PA + 2ε_s} + 2ε_h)-sound.

Entropy accumulation
This section is devoted to the proof of Theorem 2. The key theoretical tool in this proof is the entropy accumulation theorem, which we shall now briefly outline in the form stated in [DF19]. To do so, we shall first introduce EAT channels and tradeoff functions.
Definition 10. A sequence of EAT channels is a sequence {M_j}_{j=1}^n where each M_j is a channel from a register R_{j−1} to registers D_j S_j T_j R_j, which satisfies the following properties: • All D_j are classical registers with a common alphabet D, and all S_j have the same finite dimension.
• For each M_j, the value of D_j is determined from the registers S_j T_j alone. Formally, this means that M_j is of the form T_j ∘ M′_j, where M′_j is a channel from R_{j−1} to S_j T_j R_j, and T_j is of the form

T_j(σ) := Σ_{s∈S, t∈T} (Π_{S_j,s} ⊗ Π_{T_j,t}) σ (Π_{S_j,s} ⊗ Π_{T_j,t}) ⊗ |d(s,t)⟩⟨d(s,t)|_{D_j},

where {Π_{S_j,s}}_{s∈S} and {Π_{T_j,t}}_{t∈T} are families of orthogonal projectors on S_j and T_j respectively, and d : S × T → D is a deterministic function.
Definition 11. Let f_min be a real-valued affine function defined on probability distributions over the alphabet D. It is called a min-tradeoff function for a sequence of EAT channels {M_j} if for all j and all distributions q on D it satisfies

f_min(q) ≤ inf_{σ ∈ Σ_j(q)} H(S_j | T_j R)_σ,

where Σ_j(q) denotes the set of states of the form (M_j ⊗ I_R)(ω_{R_{j−1}R}) such that the reduced state on D_j has distribution q. Analogously, a real-valued affine function f_max defined on probability distributions over D is called a max-tradeoff function if

f_max(q) ≥ sup_{σ ∈ Σ_j(q)} H(S_j | T_j R)_σ.

The infimum and supremum of an empty set are defined as +∞ and −∞ respectively.
Then for any event Ω ⊆ D^n such that f_min(freq_d) ≥ h for all d ∈ Ω, we have^11 a corresponding lower bound on the smoothed min-entropy conditioned on Ω, where ϑ_ε is as defined in Eq. (32), and

V := √(2 + Var_Q(f_min)) + log(2 dim(S_j)² + 1),

with Q being the set of all distributions on D that could be produced by applying some EAT channel to some state.
Informally, the Markov conditions impose the requirement that the register T_j does not "leak any information" about the previous registers. (Without this Markov condition, one could for instance have channels such that T_j simply contains a copy of S_{j−1}, in which case there could be situations where a nontrivial min-tradeoff function holds but the conclusion of the entropy accumulation theorem would be completely false.) There is also an analogous EAT statement regarding the max-entropy [DFR20], using a max-tradeoff function, though we will be using it slightly differently and will elaborate further on it at that point.
We now describe how the EAT can be used to prove Theorem 2. First, note that to prove Theorem 2 it would be sufficient to consider only the registers A B′ X Y C′ E of the state ρ (the conditioning event Ω′_PE is determined by C′ alone). The reduced state on these registers is the same as that at the point when Step 2 of Protocol 1′ has finished looping over j, since the subsequent steps do not change these registers, and thus we can equivalently study that state in place of ρ_{|Ω′_PE}. From this point onwards, all smoothed min- or max-entropies refer to that state conditioned on Ω′_PE (and normalized), hence for brevity we will omit the subscript specifying the state.
Each iteration of Step 2 of Protocol 1 can be treated as a channel M j in a sequence of EAT channels, by considering it to be a channel performing the following operations: 1. Alice generates X j as specified in Step 2. Conditioned on the value of X j , Alice's device performs some measurement on its share of the stored quantum state R j−1 (which includes any memory retained from previous rounds), then performs sifting and noisy preprocessing on the outcome, storing the final result in register A j .
2. Bob's device behaves analogously, producing the registers Y_j and B′_j (we will not need to consider B_j).

3. The value of C_j is then computed from A_j B′_j X_j Y_j as specified in Eq. (57).
We highlight that in the above description of M_j, the only "unknowns" are the measurements it performs on the input state on R_{j−1}; all other operations are taken to be performed in a trusted fashion. (This is reasonable because these measurements and the stored states are the only untrusted aspects in the true protocol.) If we had simply considered completely arbitrary channels M_j producing the respective registers, it would not be possible to make a nontrivial security statement about the output.
Identifying C_j with D_j, A_j B′_j with S_j, and X_j Y_j with T_j in Definition 10, we see that these channels M_j indeed form a valid sequence of EAT channels: C_j is determined from A_j B′_j X_j Y_j in the manner specified by Eq. (57). Additionally, the state they produce always fulfills the Markov conditions, because the values of X_j Y_j in each round are generated independently of all preceding registers.
Intuitively, it seems that we could now use the EAT to bound H^{ε_s}_min(A|XYE). However, there is a technical issue: to apply the EAT, the event Ω′_PE must be defined entirely in terms of the (classical) registers that appear in the smoothed min-entropy term that we are bounding, which is not a condition satisfied by the registers AXY alone. This is where the register B′ comes into play, following the same approach as [AFRV19]: by a chain rule for the min- and max-entropies ([VDT+13] or [Tom16] Eq. (6.57)), for any ε′_s + 2ε″_s < ε_s we can bound H^{ε_s}_min(A|XYE) in terms of H^{ε′_s}_min(AB′|XYE) and H^{ε″_s}_max(B′|XYE). The H^{ε″_s}_max(B′|XYE) term admits a fairly simple bound, as follows: consider a sequence of EAT channels M̄_j that are identical to M_j except that they do not produce the registers A_j C_j. As before, these maps obey the required Markov conditions. In addition, recall that for every round the register B′_j is deterministically set to 0 whenever y ∈ {0, 1} (which happens with probability 1 − γ), hence we always have H(B′_j | X_j Y_j R) ≤ γ. This means we can apply the max-entropy version of the EAT^12 with a constant max-tradeoff function of value γ. Letting V′ := 2 log(1 + 2 dim(B′_j)) = 2 log 5, this yields the bound in Eq. (65) for any α′ ∈ (1, 1 + 2/V′). The bulk of our task is to bound the H^{ε′_s}_min(AB′|XYE) term. To do so, we will need an appropriate min-tradeoff function, which we shall now construct.

Footnote 12: Here we shall use the results from [DFR20], because for a constant tradeoff function, this turns out to yield a slightly better bound as compared to the version of the EAT [DF19] stated here. Strictly speaking, the reasoning used here is not a direct application of the EAT, because once again, the event Ω′_PE is not defined on the registers B′XY alone (attempting to address this by including A in the conditioning registers could result in the Markov conditions not being fulfilled). Fortunately, the bound (64) holds for our maps M̄_j even without a constraint on the output distribution.
Hence the reasoning is as follows, in terms of the equations and lemmas in [DFR20]: first apply Eq. (32) without the event-conditioning term (this is valid since (64) holds without constraints), then condition on Ω′_PE using Lemma B.6, and finally apply Lemma B.10 to obtain Eq. (65). (Alternatively, one could use C′ instead of B′. This was done in [MDR+19] to slightly improve the bound in the block analysis, but it does not make a difference in our analysis. However, using C′ would seem to make it harder to sharpen the slightly crude bound used to obtain Eq. (67).)

Min-tradeoff function
Consider an arbitrary state of the form (M_j ⊗ I_R)(ω_{R_{j−1}R}). In this section, all entropies will be computed with respect to this state, and hence for brevity we will omit the subscript specifying the state.
Let w denote the probability that the state wins the CHSH game, conditioned on the game being played. Then by applying the simple but somewhat crude bound^13 H(A_j B′_j | R; X_j = x, Y_j = y) ≥ H(A_j | R; X_j = x, Y_j = y) to the terms in the second sum in Eq. (66), we get the bound.^14 We can now use the function g to construct a min-tradeoff function f_min, with the domain of f_min being distributions on C_j (recall that this register is set to ⊥ if Y_j ∈ {0, 1}, and otherwise is set to 0 or 1 according to whether the CHSH game is lost or won respectively). First observe that the channel is an infrequent-sampling channel in the sense described in [DF19,LLR+21]. By the argument in Appendix A.7 of [LLR+21], a valid min-tradeoff function f_min for the channel is given by the (unique) affine function specified by the following values (in a minor abuse of notation, here we interpret g as a function of a distribution instead of a winning probability), where β ∈ [g(δ_0), g(δ_1)] is a constant that can be chosen to optimize the keyrate. (Intuitively, this function is constructed simply by noting that the maps M_j can only produce distributions that lie in the slice of the probability simplex specified by the constraint Pr[C_j = ⊥] = 1 − γ, and hence the min-tradeoff function is free to take any value for distributions outside of this slice, recalling that we take the infimum of an empty set to be +∞. For distributions within this slice, we know that g is an affine lower bound on the entropy as a function of the winning probability, and hence we can just set f_min equal to g (up to a domain rescaling) on this slice. Any f_min constructed this way is precisely of the form described in Eq. (68), with β being a constant determining its value on all distributions outside of the Pr[C_j = ⊥] = 1 − γ slice.) As shown in [LLR+21], the min-tradeoff function constructed this way satisfies the required bounds on its maximum, minimum, and variance. For the specific g and range of β that we consider, these expressions simplify to Eq. (33), where we have solved the optimization sup_{q∈Q_g} in the bound on Var_{Q_f}(f_min) by observing that it is an affine function of the distribution q, and the set Q_g we use here is essentially a line segment (in a 1-dimensional probability simplex).
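To make the construction concrete, the sketch below builds the affine function agreeing with g on the Pr[C_j = ⊥] = 1 − γ slice, with the off-slice behaviour parametrized by β. The numerical values of g, γ, β here are arbitrary illustrative choices, and the particular normalization of the vertex values is one valid way to realize the construction (it is not claimed to match Eq. (68) symbol-for-symbol); the final loop verifies the slice condition directly.

```python
# Hypothetical illustrative values: g is the affine entropy bound
# g(w) = g0 + (g1 - g0) * w, gamma is the test probability, and beta is a free
# constant in [g0, g1] fixing the function off the slice.
g0, g1 = 0.1, 0.75
gamma, beta = 0.05, 0.7

def g(w):
    return g0 + (g1 - g0) * w

# Vertex values of the affine min-tradeoff function f_min on distributions over
# {perp, 0, 1}, chosen so that f_min agrees with g (as a function of the
# conditional winning probability w) on the slice Pr[perp] = 1 - gamma, with
# f_min(delta_perp) = beta fixing its behaviour everywhere else.
v_perp = beta
v0 = (g0 - (1 - gamma) * beta) / gamma
v1 = (g1 - (1 - gamma) * beta) / gamma

def f_min(q_perp, q0, q1):
    return q_perp * v_perp + q0 * v0 + q1 * v1

# Check agreement with g on the slice: q = (1-gamma, gamma*(1-w), gamma*w).
for i in range(101):
    w = i / 100
    assert abs(f_min(1 - gamma, gamma * (1 - w), gamma * w) - g(w)) < 1e-12
```

The rescaling by 1/γ in the vertex values v0, v1 is what makes the variance of f_min grow as the test probability shrinks, which is the finite-size cost of infrequent sampling.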

Final min-entropy bound
The event Ω′_PE is defined by the conditions freq_c(1) ≥ (w_exp − δ_tol)γ and freq_c(0) ≤ (1 − w_exp + δ_tol)γ. Hence for all c ∈ Ω′_PE, we have (since f_min is affine) a bound of the form f_min(freq_c) ≥ g(w_exp − δ_tol), where the inequality holds because β ∈ [g(δ_0), g(δ_1)], and in the last line we use the fact that g is affine and revert to interpreting it as a function of winning probability. (We remark that if we fix β = g(1), then in fact this inequality can be derived using only the freq_c(0) condition, following [DF19]. Hence in principle one could sacrifice the option of optimizing β in exchange for reducing the number of checks to perform in the protocol, which improves the completeness parameters.) Therefore, we can choose h = g(w_exp − δ_tol) in the EAT statement (Prop. 3) to conclude that the state conditioned on Ω′_PE satisfies the corresponding min-entropy bound, where ϑ_ε, V, K_α are as defined in Eq. (32). Putting this together with Eqs. (63) and (65), we finally obtain the bound in Theorem 2.

Scope of applicability
In the above analysis, we have focused on a protocol that only uses the CHSH game. However, it would be possible to modify the analysis to account for arbitrary Bell inequalities, as was done in [BRC20]. Essentially, Alice and Bob would simply need to choose a different distribution of their input settings, corresponding to a different game being played. Furthermore, it is not strictly necessary for Alice to choose uniformly random inputs in the generation rounds -as noted in [SGP+21], she could instead choose some biased distribution of inputs. It would even be possible to consider applying different amounts of noisy preprocessing for the different inputs in generation rounds. All of these modifications would essentially correspond to finding an appropriate min-tradeoff function, which we describe how to do in the subsequent section. However, we show in Sec. 5.6 that for the depolarizing-noise model at least, there is little to be gained by considering these modifications.
There is a technical issue that in order to implement the above modifications, Alice and Bob may need to know which rounds are test rounds and which rounds are generation rounds, if they need to choose different input distributions in the two cases. (In Protocol 1, this was not an issue because the CHSH game is played with uniformly random inputs, and we also used uniformly random inputs for the generation rounds, so there is no difference in the input distributions between test and generation rounds.) However, this could in principle be addressed by using a short pre-shared key in order to choose the test rounds -we describe this in more detail in Sec. 6.1.

Single-round bound
We now explain how to derive the function r_p described in Eq. (27). To reduce clutter, we will use slightly different notation in this section, which should not be confused with the earlier notation. In particular, we will omit the bar accents from the registers Ā B̄ Ē, and we will be using Hermitian operators A_x, B_y that should not be confused with the registers A_j, B_j in the earlier notation.
As described previously, consider an arbitrary state ρ_ABE and possible measurements on the A and B subsystems, indexed by x and y respectively. For the purpose of bounding the entropy, all measurements can be assumed projective by considering a suitably chosen simultaneous Stinespring dilation; see e.g. [TSG+21]. Let P_{a|x} for Alice (resp. Q_{b|y} for Bob) denote the projector corresponding to outcome a (resp. b) from measurement x (resp. y). We first consider a somewhat more general setting than that required for the above security proof; namely, we aim to find a lower bound on the conditional entropies, knowing only that the state and measurements produce the values ν_j for some observables Γ_j(P_{a|x}, Q_{b|y}) := Σ_{abxy} c^(j)_{abxy} P_{a|x} ⊗ Q_{b|y} defined by fixed coefficients c^(j)_{abxy} ∈ ℝ. (For instance, these could be the values of some Bell expressions, or even simply the entire set of output probabilities.) Furthermore, we allow for the possibility that the inputs are not uniformly random. Making this precise, we aim to find the function

r̆_p(ν⃗) := inf Σ_x τ_x H(Â_x|E),

where the infimum is over all states and measurements compatible with the observed values ν⃗, and the τ_x ≥ 0 are coefficients that depend on the input distributions [SGP+21]. (For the security proof of Protocol 1, we would only need to consider a single Γ_j(P_{a|x}, Q_{b|y}), which describes the probability of winning the CHSH game. Also, we simply have τ_0 = τ_1 = 1/2, since we want a bound of the form (27).) Without loss of generality, we can restrict the optimization to pure ρ_ABE.
By considering the effect of Eve using mixtures of strategies, it is easily seen that $\breve{r}_p$ must be convex. It can thus be shown that for any $\vec{\nu}$ in the interior of the set of values achievable by quantum theory, this optimization is in fact equal to its Lagrange dual (see e.g. [TSG+21] for a detailed explanation):
$$\breve{r}_p(\vec{\nu}) = \sup_{\vec{\lambda}} \big( \vec{\lambda}\cdot\vec{\nu} + c_{\vec{\lambda}} \big). \tag{73}$$
Since the optimization over $\vec{\lambda}$ is a supremum, it follows that for any value of $\vec{\lambda}$, we have a lower bound on $\breve{r}_p(\vec{\nu})$ of the form
$$\breve{r}_p(\vec{\nu}) \geq \vec{\lambda}\cdot\vec{\nu} + c_{\vec{\lambda}}, \tag{74}$$
where
$$c_{\vec{\lambda}} := \inf_{\rho_{ABE},\, P_{a|x},\, Q_{b|y}} \Big( \sum_x \tau_x H(\hat{A}_x|E) - \sum_j \lambda_j \operatorname{Tr}\!\big[\Gamma_j(P_{a|x}, Q_{b|y})\, \rho_{AB}\big] \Big). \tag{75}$$
Importantly, such a lower bound is automatically affine with respect to $\vec{\nu}$, and hence we can take it as a possible choice of $r_p$ for the security proof. We thus see that in order to find an affine $r_p$ for use in the security proof, it suffices to choose some $\vec{\lambda}$ and compute (or lower-bound) the value of $c_{\vec{\lambda}}$ as defined by Eq. (75). In addition, this approach yields essentially tight bounds for the asymptotic rates [15], in the sense that taking the supremum of the bounds given by all possible $\vec{\lambda}$ returns the value of $\breve{r}_p(\vec{\nu})$, by Eq. (73). We also note that when there are multiple constraints, for any given $\vec{\nu}$ the corresponding optimal $\vec{\lambda}$ yields a single Bell expression that certifies the same bound on the entropy as all the original constraints. When applying this bound in the EAT, the task of choosing $\vec{\lambda}$ here is equivalent to the task in [AFRV19, LLR+21] of choosing a tangent point to the rate curves.
The $H(\hat{A}_x|E)$ term in the optimization can be rewritten based on the approach in [WLC18]. Specifically, we observe that the state produced on $\hat{A}_x E$ by performing Alice's measurement $x$ on $\rho_{ABE}$ and then applying noisy preprocessing could also be obtained by the following process: append an ancilla $T$ in the state $|\phi_p\rangle_T := \sqrt{1-p}\,|0\rangle + \sqrt{p}\,|1\rangle$, apply a pinching channel $\mathcal{P}(\sigma_T) := \sum_t |t\rangle\langle t|_T\, \sigma_T\, |t\rangle\langle t|_T$ to $T$, then perform a measurement on $AT$ described by the projectors $\hat{P}$ of Eq. (77), and store the outcome in $\hat{A}_x$ without further processing. However, the pinching channel on $T$ can in fact be omitted, because the subnormalized conditional states produced on $E$ would still be the same without it (this can be verified by considering an arbitrary state $\sigma_{ABET}$ before applying the pinching channel, leaving some tensor factors of $I$ implicit for brevity). Hence we no longer consider the pinching channel on $T$; i.e. we simply study the situation where we immediately perform the projective measurement (77) on the state $\tilde{\rho}_{ABET} := \rho_{ABE} \otimes |\phi_p\rangle\langle\phi_p|_T$ and store the outcome in register $\hat{A}_x$.

[15] There is a technicality here that was overlooked in earlier versions of this work. Specifically, even though $\breve{r}_p$ is basically affine over a large range of values (see Sec. 5.5), the optimal choice of $r_p$ for the asymptotic rates may still not yield the optimal finite-size keyrates at each $n$ when applying the EAT. This is because choosing a bound $r_p$ that is suboptimal with respect to the asymptotic rates can yield a min-tradeoff function with smaller variance and range, reducing the finite-size corrections. However, given the computationally intensive nature of the algorithm we describe here, we did not attempt to optimize the choice of $r_p$ as a function of $n$, instead sticking to the optimal choice for the asymptotic limit. Still, as mentioned previously in Sec. 4.3, we would expect that any potential improvements from optimizing $r_p$ would at best bring the keyrates somewhat closer to the collective-attacks values shown in Fig. 1.
As mentioned previously, we can take $\rho_{ABE}$ to be pure without loss of generality, in which case $\tilde{\rho}_{ABET}$ is pure as well [16], and must hence obey the following relation (derived in e.g. [Col12]):
$$H(\hat{A}_x|E) = D\big(\tilde{\rho}_{ABT} \,\big\|\, \mathcal{Z}_x(\tilde{\rho}_{ABT})\big), \qquad \text{where} \quad \mathcal{Z}_x(\sigma) := \sum_a \hat{P}_{a|x}\, \sigma\, \hat{P}_{a|x}$$
is the pinching by the projectors of Eq. (77). Importantly, this expression for $H(\hat{A}_x|E)$ is entirely in terms of the reduced state $\rho_{AB}$, and is convex with respect to $\rho_{AB}$. (We give an alternative expression in Appendix A, which may be of use in other situations.) We remark that the above analysis was fairly general, in that it in fact applies to DI scenarios with arbitrary numbers of inputs and outputs, as long as Alice's key-generating measurements still only have 2 outcomes. In particular, the approaches described in [TSG+21, BFF21] are able to yield bounds on these optimizations when there is no noisy preprocessing, and it may be useful to study whether the above map for describing noisy preprocessing allows one to apply those approaches to this scenario. However, we leave this question for future work.
We now specialize to 2-input 2-output scenarios. In such cases, we have the following "qubit reduction" [PAB+09, HST+20]: if one finds a convex function that lower-bounds the right-hand side of Eq. (72) with its optimization restricted to states $\rho_{ABE}$ of dimension $2 \times 2 \times 4$ and Pauli measurements, then this function is also a lower bound on $\breve{r}_p$. We note (by following the same arguments as before, but with the optimization domain restricted) that picking some $\lambda$ and solving the optimization (75) over such states and measurements yields such a lower bound via Eq. (74), which is trivially convex since it is affine. Hence it suffices to consider the optimization (75) restricted to such states and measurements. Furthermore, the resulting bounds are still tight in the same sense as before, in that taking the supremum over choices of $\lambda$ yields the convex envelope of this restricted version of the optimization (72) (except possibly for $\vec{\nu}$ not in the interior of the quantum-achievable values).
Since we have reduced the analysis to Pauli measurements, it is convenient to interpret the measurements as producing values in $\pm 1$ instead of $\mathbb{Z}_2$, and define the corresponding Hermitian observables $A_x := \sum_a a\, P_{a|x}$ and $B_y := \sum_b b\, Q_{b|y}$.
Considering other forms of constraints in 2-input 2-output scenarios is essentially equivalent to making specific choices of Lagrange multipliers $\lambda$ for this 4-constraint formulation. For instance, in Protocol 1 we only impose a constraint based on the CHSH value [18], which is equivalent to restricting to Lagrange-multiplier combinations of the form $(\lambda_{00}, \lambda_{01}, \lambda_{10}, \lambda_{11}) = (\lambda, \lambda, \lambda, -\lambda)$ for some $\lambda \in \mathbb{R}$.
We still have the freedom to choose the basis in which to express the optimization. Following [SGP+21], we can use the measurement axes of Alice's two Pauli measurements to define the X-Z plane on her system, taking $A_0 = Z$ and $A_1 = \cos(\theta_A) Z + \sin(\theta_A) X$ for some $\theta_A \in [0, \pi]$ (values of $\theta_A$ in $[\pi, 2\pi]$ can be brought into this range by rotating our axis choice by $\pi$ around the Z-axis). Analogously, we can choose a basis for Bob such that $B_0 = Z$ and $B_1 = \cos(\theta_B) Z + \sin(\theta_B) X$ with $\theta_B \in [0, \pi]$. (This is a different basis choice from the one in [PAB+09, HST+20] that allows a reduction to Bell-diagonal $\rho_{AB}$. That choice involves more parameters for the measurements, but fewer parameters for the state.)

[17] While more detailed explanations can be found in e.g. [PAB+09, HST+20], the outline is as follows: consider a virtual symmetrization step in which Alice and Bob jointly flip their outputs using a uniformly random public bit, forcing their marginals to be zero while leaving the correlators unchanged. Some calculation shows [SR08, PAB+09, HST+20] that the entropy of their original outputs conditioned on Eve's side-information is equal to the entropy of their symmetrized outputs conditioned on Eve's side-information and the publicly communicated bit (and this still holds true with noisy preprocessing). Absorbing the publicly communicated bit into Eve's side-information, this implies that if the original optimization (72) had constraints on the marginals $\operatorname{Tr}[(A_x \otimes I)\rho_{AB}]$, $\operatorname{Tr}[(I \otimes B_y)\rho_{AB}]$ in addition to the correlators $\operatorname{Tr}[(A_x \otimes B_y)\rho_{AB}]$, we could bound it by instead considering the optimization where the marginal constraints are replaced by the condition that they are zero (with the correlator constraints unchanged). (Note that the virtual symmetrization does not need to be physically performed; it merely serves as an intermediate construction in this analysis.)

[18] Here, by CHSH value we mean the quantity $\nu = \operatorname{Tr}[(A_0 \otimes B_0 + A_0 \otimes B_1 + A_1 \otimes B_0 - A_1 \otimes B_1)\rho_{AB}]$. This is related to the probability $w$ of winning the CHSH game by the simple equation $\nu = 8w - 4$, so an affine function of $\nu$ is easily converted to an affine function of $w$.
With this in mind, we rewrite the optimization (75) as Eq. (83). (The minima are attained because the objective function is continuous and the domain is compact.) Furthermore, the objective function is invariant under the substitutions $\rho_{AB} \to (Y \otimes Y)\rho_{AB}(Y \otimes Y)$ and $\rho_{AB} \to \rho_{AB}^*$, which implies [PAB+09] that we can restrict the optimization to $\rho_{AB}$ that are "almost" Bell-diagonal. Specifically, with respect to the Bell basis $\{|\Phi^+\rangle, |\Psi^-\rangle, |\Phi^-\rangle, |\Psi^+\rangle\}$, where $|\Phi^\pm\rangle = (|00\rangle \pm |11\rangle)/\sqrt{2}$ and $|\Psi^\pm\rangle = (|01\rangle \pm |10\rangle)/\sqrt{2}$, we can take $\rho_{AB}$ of the form in Eq. (84). We now describe how each minimization can be tackled while treating the parameters in the other minimizations as constants, then summarize how all these algorithms can be put together in a consistent manner, and argue that this approach indeed yields arbitrarily tight bounds.

Minimization over Alice's measurement
We tackle this minimization simply by applying a (uniform) continuity bound for $\theta_A$. Specifically, for $\delta \in [0, \pi]$ we describe a monotone increasing function $\varepsilon_{\mathrm{con}}(\delta)$ that bounds the change in the objective function when $\theta_A$ is replaced by $\theta_A + \delta$ (treating $\theta_B$ and $\rho_{AB}$ as constants). Then for any set of intervals of the form $\{[\theta_j - \delta_j, \theta_j + \delta_j]\}_j$ that covers the interval $[0, \pi]$, we would have
$$\min_{\theta_A \in [0,\pi]} f(\theta_A) \;\geq\; \min_j \big[ f(\theta_j) - \varepsilon_{\mathrm{con}}(\delta_j) \big],$$
where $f$ denotes the objective as a function of $\theta_A$ alone. We apply this in practice by starting with a fairly "coarse" choice of intervals, then iteratively applying the process of deleting the interval that currently achieves the minimization over $j$ and replacing it with smaller intervals that cover the deleted interval.
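To illustrate, the interval-refinement procedure can be sketched as follows. This is a minimal toy example of our own (not the paper's code): we assume for illustration a simple Lipschitz-type continuity bound $\varepsilon_{\mathrm{con}}(\delta) = L\delta$ on a toy objective, whereas the actual bound used in this work is fidelity-based.

```python
import heapq
import math

def certified_min(f, eps_con, lo=0.0, hi=math.pi, tol=1e-3, max_iter=200000):
    """Certified lower bound on min f over [lo, hi] via interval refinement.

    Each interval [c - d, c + d] contributes the certified bound
    f(c) - eps_con(d); the interval with the smallest bound is repeatedly
    split, tightening the global lower bound.
    """
    c, d = (lo + hi) / 2, (hi - lo) / 2
    # heap entries: (certified lower bound on the interval, centre, half-width)
    heap = [(f(c) - eps_con(d), c, d)]
    best = f(c)  # any evaluation of f is a valid upper bound on the minimum
    for _ in range(max_iter):
        lb, c, d = heap[0]
        best = min(best, f(c))
        if best - lb < tol:
            break
        heapq.heappop(heap)
        # split the worst interval into two halves that still cover it
        for cc in (c - d / 2, c + d / 2):
            heapq.heappush(heap, (f(cc) - eps_con(d / 2), cc, d / 2))
    return heap[0][0]  # certified lower bound over the whole of [lo, hi]

# toy objective with Lipschitz constant at most L = 3.5 on [0, pi]
L = 3.5
f = lambda t: math.sin(3 * t) + 0.5 * math.cos(t)
lb = certified_min(f, lambda d: L * d)
```

Since the intervals in the heap always cover $[0,\pi]$, the returned value is a valid lower bound at every stage; the loop merely tightens it until it is within `tol` of a feasible value.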
To derive such a continuity bound, we first analyze the entropic term in the objective function, following [SBV+21] (with a minor modification to slightly improve the bound). Take any pure initial state $|\rho\rangle_{ABE}$, and let $\sigma_{\hat{A}_1 BE}$ be the state obtained by performing a Pauli measurement along angle $\theta_A$ in the X-Z plane on the $A$ register of $|\rho\rangle_{ABE}$, applying noisy preprocessing, then storing the result in the classical register $\hat{A}_1$ and tracing out $A$. Let $\sigma'_{\hat{A}_1 BE}$ be the analogous state with $\theta_A$ replaced by $\theta_A + \delta$ for some $\delta \in [0, \pi]$. Our goal is to bound $H(\hat{A}_1|E)_\sigma - H(\hat{A}_1|E)_{\sigma'}$.
Now observe that exactly the same state $\sigma_{\hat{A}_1 BE}$ would have been produced if the initial state had been $e^{i\theta_A Y_A/2} \otimes I_{BE}\, |\rho\rangle_{ABE}$ and the initial Pauli measurement were replaced by a $Z$ measurement. Furthermore, the fact that $\rho_A = I/2$ implies we can write $|\rho\rangle_{ABE} = \sum_a |a\rangle_A |a\rangle_{BE}/\sqrt{2}$, where $\{|a\rangle_A\}_a$ is the Z-eigenbasis of $A$ and $\{|a\rangle_{BE}\}_a$ are two orthonormal states on $BE$. This yields an explicit expression for $\sigma_{\hat{A}_1 BE}$; performing the analogous analysis for $\sigma'$ and comparing the two expressions gives a lower bound on the fidelity between $\sigma$ and $\sigma'$, in which the second inequality holds by concavity of fidelity, and the third inequality is given by explicit calculation (see [SBV+21]).
This lets us apply a fidelity-based continuity bound [SBV+21], where in the second inequality we used the condition $\delta \in [0, \pi]$. (Numerical heuristics suggest that the true bound in Eq. (90) may simply be $\delta$, so there is some potential for improvement here, though the effect would be fairly small. In Appendix B we present an approach based on trace distance instead of fidelity, but it appears to scale poorly at small $\delta$.) As for the $\Gamma$ term, a corresponding continuity bound follows from an explicit eigenvalue calculation.

Minimization over Bob's measurement
The entropic term in the objective function has no dependence on Bob's measurement, so we only need to consider the $\Gamma$ term. In principle, this could be approached using the same argument as above, arriving at an analogous continuity bound. However, some heuristic experiments indicate that the following approach (used in [SGP+21]) is more efficient: we can let $r_Z := \cos(\theta_B)$ and $r_X := \sin(\theta_B)$ and write $B_1 = r_Z Z + r_X X$, in which case the minimization over $\theta_B \in [0, \pi]$ is equivalent to minimizing over $(r_Z, r_X)$ lying on the set $S_{\mathrm{arc}} := \{(r_Z, r_X) \mid r_Z^2 + r_X^2 = 1 \text{ and } r_X \geq 0\}$ (i.e. a semicircular arc). Crucially, the objective function is affine with respect to the vector $(r_Z, r_X)$. Hence if $V$ is any finite set of points such that $S_{\mathrm{arc}}$ is contained in their convex hull $\mathrm{Conv}(V)$, the minimum over $V$ immediately lower-bounds the minimum over $S_{\mathrm{arc}}$, because the minimum of an affine function over the convex hull of a finite set $V$ is always attained at an extremal point (which will be a point in $V$). To apply this result, we start with a simple choice of the set $V$ (for instance, in our code we use $V = \{(1, 0), (1, 1), (-1, 1), (-1, 0)\}$) and find the point in $V$ that yields the minimum value. We then delete this point and replace it with two other points such that $S_{\mathrm{arc}}$ is still contained in the convex hull, and iterate this process until a sufficiently tight bound is obtained (for instance, by checking that there is a feasible point of the optimization that is sufficiently close to the lower bound we have obtained).
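This vertex-refinement idea admits a compact implustration; the following is an illustrative sketch of our own (not the paper's code) for a toy affine objective $c \cdot (r_Z, r_X)$. The outer polygon is maintained via tangent lines to the arc, which guarantees the arc always stays inside the hull; the initial tangent angles $\{0, \pi/2, \pi\}$ reproduce exactly the initial vertex set $V = \{(1,0),(1,1),(-1,1),(-1,0)\}$ mentioned above.

```python
import math

def arc_lower_bound(c, tol=1e-6, max_iter=200):
    """Lower-bound min of the affine function c . (r_Z, r_X) over the arc
    {(cos t, sin t) : t in [0, pi]}, via an outer polygon of tangent lines."""
    def tangent_intersection(t1, t2):
        # intersection point of the tangent lines touching the arc at t1 < t2
        m, h = (t1 + t2) / 2, (t2 - t1) / 2
        return (math.cos(m) / math.cos(h), math.sin(m) / math.cos(h))

    def obj(p):
        return c[0] * p[0] + c[1] * p[1]

    angles = [0.0, math.pi / 2, math.pi]  # initial tangent angles
    for _ in range(max_iter):
        # polygon vertices: arc endpoints plus consecutive tangent intersections
        corners = [(1.0, 0.0), (-1.0, 0.0)]
        edges = [tangent_intersection(a, b) for a, b in zip(angles, angles[1:])]
        lb = min(min(map(obj, corners)), min(map(obj, edges)))
        # upper bound from feasible points on the arc itself (tangency points)
        ub = min(obj((math.cos(t), math.sin(t))) for t in angles)
        if ub - lb < tol:
            return lb
        # refine: bisect the tangent interval whose intersection point is worst
        k = min(range(len(edges)), key=lambda i: obj(edges[i]))
        angles.insert(k + 1, (angles[k] + angles[k + 1]) / 2)
    return lb

lb = arc_lower_bound((0.3, -1.2))  # true arc minimum here is -sqrt(0.3^2 + 1.2^2)
```

At every iteration `lb` is a valid lower bound (the hull contains the arc), and the bisection drives it up to the true arc minimum.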

Minimization over states
The minimization over states can be tackled by expressing it as a convex optimization and applying the Frank-Wolfe algorithm [FW56], as was observed in [WLC18]. For completeness, we now describe the method, with minor modifications and clarifications for our specific scenario. We emphasize that while this procedure is numerical, the bounds it returns are secure, in the sense that it will never over-estimate the true value of the minimization problem.
We observe that (for fixed measurements, parametrized as described above) the minimization over states is the minimization of a convex function $f_{\mathrm{obj}}(\rho)$. This function is differentiable for all $\rho$ such that $G(\rho) > 0$, with an explicit gradient formula given in [WLC18] [19] in terms of the adjoint channel $G^\dagger$ of $G$. In practice, we will not need to explicitly compute $G^\dagger$, because all subsequent arguments rely only on "inner products" $\operatorname{Tr}\big[(\nabla f_{\mathrm{obj}}(\rho))^T \sigma\big]$, which can be rewritten using the adjoint relation. We also note that the optimization domain is a set defined by PSD constraints (namely, $\rho$ of the form (84) with $\rho \geq 0$ and $\operatorname{Tr}[\rho] = 1$), which means that optimizing any affine function over this set is an SDP, which can be efficiently solved and bounded [BV04] via its dual value. Together with the explicit expression (97) for the derivative of the objective function, this makes the optimization problem here a prime candidate for the Frank-Wolfe algorithm [FW56], which yields secure lower bounds on the true minimum value of the optimization.
The Frank-Wolfe algorithm is based on a simple geometric insight: for any point in the domain of a convex function $f_{\mathrm{obj}}$, the tangent hyperplane (or a supporting hyperplane, if $f_{\mathrm{obj}}$ is not differentiable at that point) to the graph of $f_{\mathrm{obj}}$ at that point yields an affine lower bound on $f_{\mathrm{obj}}$. Hence if we can minimize affine functions over the optimization domain, then any such tangent hyperplane lets us obtain a lower bound on the minimum of $f_{\mathrm{obj}}$ on the domain. In addition, it intuitively seems that taking the tangent at points closer to the true optimal solution should yield tighter lower bounds (though there are some technical caveats). Thus in principle, one could simply perform some heuristic computations to get an estimate of where the true minimum lies, then take the tangent at that estimate and obtain the corresponding lower bound. For this optimization, however, we found that the results were fairly sensitive to deviations from the true optimum. To get the bounds to converge, we found it more efficient to use the standard Frank-Wolfe algorithm (as presented in [WLC18]), which (under mild assumptions) can be proven to converge at rate $O(1/k)$ after $k$ iterations. Let the domain of optimization be $D$ and the acceptable gap between the feasible values and the lower bounds be $\varepsilon_{\mathrm{tol}} > 0$.
Geometrically, the SDP in Eq. (99) corresponds to considering the tangent to f obj at ρ k and computing the maximum amount by which it can decrease (as compared to its value at ρ k ) over the domain D. As described earlier, this bounds the maximum amount by which f obj can decrease from f obj (ρ k ) over D.
In our application of the Frank-Wolfe algorithm, there is the technical issue that if $G(\rho_k)$ is singular (or has negative eigenvalues, from numerical imprecision), then the derivative at $\rho_k$ (Eq. (97)) is ill-behaved. To cope with this, we used the heuristic solution of simply replacing $\rho_k$ with $(1 - \delta)\rho_k + \delta U$ in Step 2, where $\delta := \max(\mathrm{eig}(-G(\rho_k)) \cup \{10^{-14}\})$. Note that this does not affect the security of the result, since it merely corresponds to taking a tangent at a slightly different point, which still yields a valid lower bound. On the other hand, it could possibly affect the theoretical convergence rates, but in practice this did not appear to pose a significant problem in our setting. (In [WLC18], this issue is addressed by analyzing a "perturbed" version of the optimization and applying a continuity bound, but we found that for the level of accuracy we desired in this work, the admissible perturbation values are too small to cope with the negative eigenvalues that occur.)
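To make the Frank-Wolfe procedure concrete, here is a minimal self-contained sketch on a toy problem (our own illustration, not the paper's code): we certify a lower bound on the convex function $f(\rho) = \mathrm{Tr}[\rho^2]$ over density matrices. For this simple domain the linearized subproblem has a closed-form solution via an eigenvector computation, standing in for the SDP step used in the actual analysis.

```python
import numpy as np

def fw_certified_min(f, grad, dim, tol=1e-3, max_iter=50000):
    """Frank-Wolfe over the spectrahedron D = {rho : rho >= 0, Tr[rho] = 1}.

    Returns (best feasible value, certified lower bound) once the gap is
    below tol. The linearized subproblem min_{sigma in D} Tr[grad(rho) sigma]
    is attained at the projector onto an eigenvector for the smallest
    eigenvalue of grad(rho) (this plays the role of the SDP step).
    """
    rho = np.diag(np.arange(1.0, dim + 1.0))
    rho /= np.trace(rho)  # generic feasible starting point
    best_val, best_lb = np.inf, -np.inf
    for k in range(max_iter):
        g = grad(rho)
        _, v = np.linalg.eigh(g)  # eigenvalues in ascending order
        sigma = np.outer(v[:, 0], v[:, 0].conj())
        d = sigma - rho
        # tangent hyperplane at rho: f over D cannot lie below f(rho) + Tr[g d]
        best_lb = max(best_lb, f(rho) + np.real(np.trace(g @ d)))
        best_val = min(best_val, f(rho))
        if best_val - best_lb < tol:
            break
        rho = rho + (2.0 / (k + 2.0)) * d  # standard Frank-Wolfe step size

    return best_val, best_lb

# toy convex objective f(rho) = Tr[rho^2]; its minimum over D is 1/dim at I/dim
f = lambda r: float(np.real(np.trace(r @ r)))
grad = lambda r: 2.0 * r
val, lb = fw_certified_min(f, grad, dim=3)
```

The key property mirrored from the text is that `best_lb` is a valid ("secure") lower bound at every iteration, regardless of how far the iterate is from the optimum.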

Overall algorithm
Putting the above results together, we obtain a certified lower bound (Eq. (100)) for any set of intervals $[\theta_j - \delta_j, \theta_j + \delta_j]$ covering $[0, \pi]$ and any vertex set $V$ whose convex hull contains $S_{\mathrm{arc}}$. We can refine the intervals $[\theta_j - \delta_j, \theta_j + \delta_j]$ and the set $V$ by the iterative processes described above, with the innermost minimization over $\rho_{AB}$ being handled by the Frank-Wolfe algorithm. It is clearly possible to swap the order of $\min_j$ and $\min_{(r_Z, r_X) \in V}$ in the last line; however, some heuristic plots of the objective function suggest that performing the optimizations in the order shown here is slightly faster.
To see that this expression can converge to a tight bound, observe that the first inequality in (100) becomes arbitrarily tight as we choose smaller values of $\delta_j$. For the second inequality, note that the described algorithm chooses $V$ in such a way that the minimum over $V$ approaches the minimum over $S_{\mathrm{arc}}$, hence this inequality also becomes arbitrarily tight.
In general, the above approach faces the difficulty that it needs to optimize over the choice of Lagrange multipliers $\lambda$. Given that our approach for solving (100) for a specific choice of Lagrange multipliers is already highly computationally intensive (requiring about 5000 core-hours to achieve the level of accuracy in the bounds (103) below for each value of $p$), it would be impractical to also optimize over the Lagrange multipliers while doing so. It is more feasible to first optimize the Lagrange multipliers while using a simple heuristic algorithm to estimate the minimizations, then certify the final result using our approach for solving (100). (This is essentially the same perspective as presented in [SBV+21].) We remark that this approach can also yield arbitrarily tight bounds for DIRE, where the goal would typically be [AFRV19, BRC20] to find lower bounds on (weighted sums of) "two-party entropies" $H(A_x B_y|E)$, since the same arguments as above yield an analogous dual formulation. (Here we omit the parts corresponding to noisy preprocessing, since it is not applied in randomness expansion.) This expression can be bounded in the same way as we have just described, though the objective function would no longer be affine with respect to $(r_Z, r_X)$, and hence the optimization over Bob's measurements would also have to be approached using a continuity bound. Our approach should yield a substantial improvement over previous results, which are restricted to the CHSH inequality and only bound the entropy of one party's outputs [LLR+21], or which consider the full distribution and bound the entropy of both outputs but use inequalities that are not tight [BRC20, TSG+21, BFF21]. In addition, the fact that it allows for the random-key-measurement approach could yield further improvements, though there are some technicalities that we address at the end of Sec. 6.2.
It would be convenient for future analysis if it were possible to develop closed-form expressions for $\breve{r}_p$ (as was done in [HST+20, WAP21, SBV+21]), rather than the computationally intensive numerical approaches shown above. However, this appears to be rather challenging, as noted in [WAP21]. In particular, we found numerical evidence against the conjecture proposed in [SGP+21] that the minimum in (72) (when restricted to qubit strategies) can always be attained by states such that $\rho_{AB}$ is of rank 2. (A similar observation was reported in [WAP21].) More precisely, in the process of heuristically solving the optimizations in order to estimate suitable choices of $\lambda$, we discovered that if we imposed the additional restriction that $\rho_{AB}$ has rank 2, there was a small but numerically significant difference as compared to the results without this restriction. (The states that heuristically approach the minimum in the latter indeed tend to have two very small eigenvalues, but it appears that these eigenvalues cannot be reduced exactly to zero.) This suggests that the aforementioned conjecture is not true after all, which poses a challenge for closed-form analysis, because the eigenvalues involved in computing the entropy are genuinely roots of a fourth-degree polynomial (in [HST+20, SBV+21], a key element of the analysis was to argue that it suffices to consider rank-2 $\rho_{AB}$, simplifying the expression for the eigenvalues).

[Figure 3: The lines show the certified bounds (Eq. (103)), while the points indicate the results of heuristically solving the optimization (72) over qubit states and measurements (with just the CHSH value as the constraint). As previously discussed, the tight bound in each case would be given by the convex envelope of the curve traced out by the points, assuming that the heuristics have found the true minimum.]
However, we can see that in each case that curve appears to be nonconvex over the interval [2, 2.75] (approximately), and its convex envelope would be affine over that interval -specifically, it would be given by the linear interpolation between the feasible points at the ends of that interval. The certified bound is almost flush with this linear interpolation, indicating that it is basically tight over this interval.

Resulting bounds
For the protocols in this work, the relevant bound to compute is for the case where the only constraint imposed is the CHSH value. More precisely, we consider the described optimization with a single constraint corresponding to the CHSH operator $A_0 \otimes B_0 + A_0 \otimes B_1 + A_1 \otimes B_0 - A_1 \otimes B_1$, with the constraint value $\nu$ being the CHSH value. (As previously mentioned, this can be implemented in the formulation with the 4 constraint operators $\Gamma_{x,y}(\theta_A, \theta_B)$ by simply restricting to Lagrange-multiplier choices of the form $(\lambda_{00}, \lambda_{01}, \lambda_{10}, \lambda_{11}) = (\lambda, \lambda, \lambda, -\lambda)$.) Each choice of the associated Lagrange multiplier $\lambda$ yields an affine lower bound of the form $\lambda\nu + c_\lambda$, as noted in Eq. (74). Importantly, some heuristic computations (also observed in [SGP+21] for the $p = 0$ case) suggest that the true bound $\breve{r}_p(\nu)$ in this situation is in fact affine over a wide range of CHSH values; we show this in Fig. 3, which displays the results of heuristic minimizations compared to our certified bound in some cases. In particular, the range on which the bound is affine covers all currently experimentally reasonable values. This implies that there is a single $\lambda$ that yields an affine lower bound which is equal to $\breve{r}_p(\nu)$ (i.e. it is tight) over this entire range; specifically, it is simply the value of $\lambda$ corresponding to the gradient of $\breve{r}_p(\nu)$ in this range. This greatly simplifies our task, since we only need to solve the optimization for this specific value of $\lambda$.
We focused on several values of noisy preprocessing, ranging from $p = 0$ to $p = 0.45$. In each case, we first solved the minimization (75) heuristically for some selection of values of $\lambda$, in order to estimate the choice of $\lambda$ that yields a tight bound over the range of CHSH values in which $\breve{r}_p(\nu)$ is affine. We then used our algorithm to obtain certified bounds on the corresponding $c_\lambda$ in (75), resulting in the bounds shown in Eq. (103).

[Figure 4: Asymptotic keyrates as a function of depolarizing noise, computed from the certified bounds (Eq. (103)). For comparison, the dashed curves show the corresponding asymptotic keyrates of the protocol in [HST+20], which does not use the random-key-measurement method (the $p = 0$ case is equivalent to the [PAB+09] protocol). The solid curves intersect the horizontal axis at $q = 8.39\%$, $9.26\%$ and $9.33\%$, in order of increasing $p$. The first value is a minor improvement over [SGP+21] (despite being effectively the same protocol), likely because our algorithm provably converges to a tight keyrate bound (for fixed $p$). The last value exceeds all previous bounds [HST+20, SGP+21, WAP21, SBV+21] for depolarizing-noise tolerance, by an amount comparable to their respective improvements over the original [PAB+09] protocol. It is also close to the upper bound of $9.57\%$ that we derive in Sec. 5.6.]
(To express the bounds as functions of winning probability w instead, as we required for our various keyrate computations, simply replace ν in each expression with 8w − 4.) With these bounds, we computed the asymptotic keyrates for these noisy-preprocessing values, and we show the results up to p = 0.3 in Fig. 4. From the p = 0.3 case, we obtain the depolarizing-noise threshold of 9.33% mentioned in the introduction. However, we were unable to obtain better thresholds using the higher values of p, for reasons we shall soon discuss (in Sec. 5.6 below).
For photonic experiments, we consider the [Ebe93] model described in Sec. 3.3 earlier, and the resulting keyrates for noisy-preprocessing values up to $p = 0.3$ are shown in Fig. 5 (as in the depolarizing-noise case, higher values of $p$ gave somewhat worse results, for similar reasons that will be discussed in Sec. 5.6). As mentioned previously, in this scenario, for each value of $\eta$ we have to optimize the honest states and measurements to maximize the keyrates. Since this optimization was performed heuristically and appears to be slightly numerically unstable, we are not completely certain whether these are indeed the best keyrates that could be achieved. However, for the values that we managed to obtain at least, it appears that unfortunately our protocol with $\tau_0 = \tau_1 = 1/2$ has worse detection-efficiency thresholds (for nonzero asymptotic keyrate) as compared to the results in [HST+20] based on noisy preprocessing alone, although at higher values of $\eta$ our protocol can yield slightly better keyrates. [21]

[Figure 5: Asymptotic keyrates for the photonic model, computed from the certified bounds (Eq. (103)). At each value of $\eta$, the states and measurements were optimized to maximize the keyrate. For comparison, the dashed curves show the corresponding optimized asymptotic keyrates of the protocol in [HST+20], which does not use the random-key-measurement method (the $p = 0$ case is equivalent to the [PAB+09] protocol). In contrast to the depolarizing-noise case (Fig. 4), the threshold $\eta$ value required for nonzero keyrates is significantly worse for our protocol than for the [HST+20] protocol (although our protocol does yield slightly better keyrates at higher values of $\eta$).]

The comparatively poor detection-efficiency thresholds for our protocol in this model seem to be caused by the error-correction value $h_{\mathrm{hon}}$: we find, for instance, that the states and measurements that optimize the CHSH value yield a rather high value for at least one of the two terms $H(A_j|B_j; X_j = Y_j = 0)_{\mathrm{hon}}$ and $H(A_j|B_j; X_j = Y_j = 1)_{\mathrm{hon}}$ (see (17)) describing the error-correction "contributions" from the two generation inputs. Therefore, there is a significant tradeoff between maximizing the CHSH value (and hence the entropy against Eve) and minimizing the error-correction value $h_{\mathrm{hon}}$, which results in worse detection-efficiency thresholds as compared to the [HST+20] protocol, which only used one input for key generation. Potential avenues for further keyrate improvements in the photonic model would hence be to optimize the ratio of the generation input choices, or to introduce different amounts of noisy preprocessing for the two inputs; but such optimization is rather involved and we leave it for future work.
In principle, one could consider more detailed models of photonic implementations, for instance those developed in [TWF+18, HST+20] (such an analysis was performed in [SGP+21] for the random-key-measurements technique). However, since informally speaking these models account for a range of other imperfections not considered in the simple model here, they typically result in lower keyrates than the simple model. Given that the results shown here are not entirely promising, there may be limited prospects in pursuing that direction further unless we can find some improvements within the simpler model here.

[21] Regarding the random-key-measurements technique alone, the results in [SGP+21] indicate that it does not significantly affect the detection-efficiency threshold as compared to the basic [PAB+09] protocol with optimized states and measurements, although it does somewhat improve the keyrates. We note that this comparison basically corresponds to the black curves in Fig. 5 here, although the photonic model used in [SGP+21] is a more detailed one [TWF+18] compared to the model here. Focusing on those curves in Fig. 5, we see that our findings here are roughly similar as well (in our case the random-key-measurements technique yields a slightly worse threshold, but this may just be an artifact of having considered a coarser set of data points to evaluate).

Optimality of results
For each of the bounds in Eq. (103), there is a feasible point of the optimization for $c_\lambda$ which yields a value within 0.005 (or less, for higher values of $p$) of the certified results shown in Eq. (103), so these bounds on the entropy are very close to optimal in terms of absolute error. In terms of the depolarizing-noise thresholds that they yield, taking the convex envelope of some of the feasible points shown in Fig. 3 yields the result that the thresholds for $p = 0.2$ and $p = 0.3$ cannot be improved by more than about 0.1 percentage points, so those thresholds are very close to optimal as well. However, larger values of $p$ face the issue that the asymptotic keyrate becomes extremely low, which makes the horizontal intercept (i.e. the depolarizing-noise threshold) very sensitive to changes in $\breve{r}_p$: even a small absolute error in this bound results in a significant change in the threshold value. Therefore, the thresholds we obtained from the certified bounds with $p = 0.4$ and $p = 0.45$ in Eq. (103) were only $9.32\%$ and $9.10\%$ respectively, worse than the results for $p = 0.3$. Heuristic computations suggest that the true thresholds for those cases might be approximately $9.46\%$ and $9.50\%$ respectively, but using our algorithm to certify these values would require it to converge to tolerances that currently appear impractical. Hence a different approach may be needed to find the true thresholds for these values of $p$. From a practical perspective, though, such improvements may be of limited use, because the very low asymptotic rates mean that the finite-size keyrate would likely be zero until extremely large sample sizes.
In any case, we note that for depolarizing noise at least, the threshold value cannot be improved much further by any protocol choices within the framework we have presented in this section, e.g. by using the full distribution as constraints (which also encompasses the use of modified CHSH inequalities [WAP21, SBV+21]), or by adjusting the values of $\tau_x$. This is essentially because our bounds are very close to the linear interpolation between the points $(2, h_2(p))$ and $(2\sqrt{2}, 1)$ (as can be seen from Fig. 3). Intuitively speaking, the bound on the entropy against Eve in the depolarizing-noise scenario cannot exceed this linear interpolation (because Eve can always perform classical mixtures of strategies in order to attain every point on it), which means that our bounds are close to the highest bounds that are even possible in principle.
Making this reasoning quantitative, we can obtain an explicit upper bound on the depolarizing-noise threshold. (We highlight that because we will do so by constructing an extremely generic attack, this upper bound holds for all protocols of this form, regardless of choices of parameters such as the input distributions and noisy preprocessing. To some extent, it is surprising that there even exist protocols that achieve thresholds close to the upper bounds implied by such a generic attack.) We first recall that (as defined in Sec. 3.3) the measurement statistics in the depolarizing-noise model are given by real projective measurements on the Werner state $\rho_q := (1 - 2q)\,|\Phi^+\rangle\langle\Phi^+| + 2q\, I/4$. It is known [AGT06, Kri79] that these measurement statistics can be reproduced by a local-hidden-variable (LHV) model as soon as the state does not violate the CHSH inequality, i.e. for $q \geq q_2 := (2 - \sqrt{2})/4 \approx 14.6\%$. (In fact, this is also true in the more general context where the honest parties have any number of real projective measurements on the Werner state, via the value of the second Grothendieck constant $K_G(2) = \sqrt{2}$ [AGT06, Kri79]. For general (i.e. not necessarily real) projective measurements the analogous value is known to satisfy $q_3 \lesssim 15.9\%$, following from the best known bound on the third Grothendieck constant $K_G(3)$ [HQV+17].) In the device-independent setting, the existence of an LHV model yields an attack for Eve that gives her full knowledge of the measurement outcomes. This means that for any depolarizing-noise value $q$, we can construct the following attack for Eve: first, she generates a classical ancilla bit which is equal to 0 with probability $\mu := (q_2 - q)/q_2$. If the bit is equal to 0, Alice and Bob's devices simply perform the honest measurements on the maximally entangled state $|\Phi^+\rangle$.
Otherwise, the devices implement an LHV model that yields the same statistics as the honest measurements performed on the Werner state $\rho_{q_2}$, but which gives Eve full knowledge of the outcomes. This attack indeed reproduces the statistics corresponding to depolarizing noise $q$, since
$$\mu\,|\Phi^+\rangle\langle\Phi^+| + (1-\mu)\,\rho_{q_2} = \rho_q,$$
and the second strategy produces the same statistics as the honest measurements performed on $\rho_{q_2}$. In the two cases, the entropies of Alice's outputs after noisy preprocessing are $H(\hat{A}_x|\tilde{E})_{\mathrm{triv}} = 1$ (i.e. Eve's side-information is trivial since the Alice-Bob state is pure) and $H(\hat{A}_x|\tilde{E})_{\mathrm{LHV}} = h_2(p)$ (i.e. the uncertainty arises purely from the noisy preprocessing) respectively, where $\tilde{E}$ denotes Eve's side-information excluding the ancilla bit. Incorporating the ancilla bit into Eve's side-information $E$, this attack achieves
$$H(\hat{A}_x|E) = \mu + (1-\mu)\,h_2(p).$$
This expression hence serves as an upper bound on the best possible lower bound we could derive on the conditional entropy against Eve. Also, the conditional entropy against Bob's output (for Bob's optimal key-generation measurements on Werner states) is simply
$$H(\hat{A}_x|B_x) = h_2\big(p(1-q) + (1-p)q\big).$$
Recalling that the asymptotic keyrate is given by the Devetak-Winter bound [DW05] (up to the sifting-related weights $\tau_x$ [SGP+21]), this yields a simple upper bound on the critical value of $q$ that still allows a positive asymptotic keyrate (for protocols of this form, i.e. applying noisy preprocessing, random key measurements, and considering the full output distribution, but only using one-way error correction): the bound $q_{\mathrm{att}}(p)$ is the value of $q$ at which
$$H(\hat{A}_x|E) - H(\hat{A}_x|B_x) = 0,$$
where the entropies refer to those of the attack we have described. We observe numerically that this upper bound is increasing with respect to $p$, so we have $q_{\mathrm{att}}(p) \leq q_{\mathrm{att}}(p \to 1/2)$. In the limit $p \to 1/2$ we can write $p = 1/2 - \delta$ and expand the expression in small $\delta$, which reduces the threshold condition to $q = q_2(1-2q)^2$, yielding $q_{\mathrm{att}}(p \to 1/2) \approx 9.57\%$. Hence for protocols of the form described in this work, it is not possible for the depolarizing-noise threshold to exceed this value.
For the parameter choice $p = 0.3$, we have $q_{\mathrm{att}}(0.3) \approx 9.51\%$, which is close to the threshold of $9.33\%$ for which we could certify a positive keyrate.[22]

[22] These two thresholds cannot match exactly, because the feasible points shown in Fig. 3 also imply that the tight bound $\breve{r}_p(\nu)$ is not exactly equal to the linear interpolation between $(2, h_2(p))$ and $(2\sqrt{2}, 1)$, which essentially corresponds to the attack we describe here.
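The attack bound above is easy to evaluate numerically. The following sketch (helper names are ours, and it assumes the standard error model in which noisy preprocessing with flip probability $p$ on a channel with QBER $q$ gives error rate $p(1-q)+(1-p)q$, as in the derivation above) recovers the two quoted values:

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

Q2 = (2 - math.sqrt(2)) / 4  # LHV threshold q_2 ~ 14.6%

def keyrate_gap(p, q):
    """Devetak-Winter difference H(A_x|E) - H(A_x|B_x) under the mixed attack."""
    mu = (Q2 - q) / Q2                      # weight of the pure-state branch
    h_eve = mu * 1.0 + (1 - mu) * h2(p)     # Eve's conditional entropy
    h_bob = h2(p * (1 - q) + (1 - p) * q)   # preprocessing combined with QBER q
    return h_eve - h_bob

def q_att(p, tol=1e-10):
    """Bisection for the noise value where the attack kills the keyrate."""
    lo, hi = 0.0, Q2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if keyrate_gap(p, mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo

print(100 * q_att(0.3))   # percent depolarizing noise, close to the quoted 9.51%

# Limit p -> 1/2: condition q = q_2 (1 - 2q)^2, i.e. the smaller root of
# 4*c*q^2 - (4c+1)*q + c = 0 with c = q_2:
c = Q2
q_lim = ((4 * c + 1) - math.sqrt((4 * c + 1) ** 2 - 16 * c ** 2)) / (8 * c)
print(100 * q_lim)        # close to the quoted 9.57%
```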
Finally, it is worth noting that while the main focus of this work is depolarizing noise applied to the statistics corresponding to the ideal CHSH measurements on $|\Phi^+\rangle$, the above analysis in fact generalizes to a substantially larger family of scenarios. Specifically, the same analysis applies for depolarizing noise applied to the statistics from any number of real projective measurements on $|\Phi^+\rangle$, since the results of [AGT06,Kri79] yield the required LHV models.[23] (If Bob performs suboptimal generation measurements, then Eq. (105) still holds as a lower bound, so the argument carries through.) In addition, replacing $q_2$ with $q_3$ straightforwardly provides a threshold of $q_{\mathrm{att}} \approx 10.01\%$ for all possible projective measurements on that state, via the respective known bound in [HQV+17]. (Note that in all such cases where there are more than 2 possible measurements, there is no "qubit reduction" to make it easier to derive corresponding lower bounds, and hence the lower bounds have also not been very thoroughly explored.) An interesting further consideration is the case of general POVM measurements. Here, a rather loose bound of $q_{\mathrm{POVM}} \approx 27.3\%$ is also known [HQV+17] for the threshold that allows an LHV description. However, pure POVMs on qubits can involve up to four outcomes, opening up the possibility of more general noisy preprocessing, namely all doubly stochastic maps on probability vectors with four outcomes. We leave this direction for future work.
The above argument relies on the fact that depolarizing-noise statistics can be obtained by simple mixtures of "extremal" strategies. This is not necessarily the case for other noise models, such as limited detection efficiency in photonic experiments. Therefore, the same argument cannot be directly applied to obtain upper bounds on the thresholds for such forms of noise.
6 Possible modifications

Coordinating input choices by public communication
The random-key-measurement protocol has the drawback that the keyrate is effectively halved, since the generation rounds have "mismatched" inputs approximately half the time. It would be helpful to find a way to work around this issue. One possible approach is suggested by [AFRV19], where it was assumed that the following operations can be performed in each round: the devices receive some shares of a quantum state,[24] then Alice and Bob publicly communicate to come to an agreement on both of their input choices, and finally they supply these inputs to their devices. (This was necessary in [AFRV19] because Alice and Bob's actions in that DIQKD protocol require both of them to know whether it is a test or generation round. In fact, our analysis can be viewed as the first EAT-based security proof for a "genuinely sifting-based" DIQKD protocol, in the sense that Alice and Bob do not coordinate which rounds are test rounds, and simply choose their inputs independently.) If we assume that this is also possible in our scenario, then Alice and Bob could coordinate their inputs in the generation rounds instead of choosing them independently, thereby avoiding the sifting factor.
Unfortunately, it is not clear whether such a proposal is practical for near-term experimental implementations, because it relies on the devices being able to store the quantum state for long enough for Alice and Bob to agree on their choice of inputs, which is potentially challenging for current Bell-test implementations. As an alternative, we propose the following modification to the DIQKD protocol in [AFRV19]: instead of agreeing on the test rounds via public communication, Alice and Bob could use a small amount of pre-shared key to choose which rounds are test rounds, in the same way as in DIRE (for details on the amount of pre-shared key required, see the DIRE protocols in [AFRV19,BRC20] or the discussion in Sec. 6.2 below). This approach would essentially be a "key expansion" protocol that requires a small amount of pre-shared key to initialize. We remark that this is not a dramatic change in perspective, because a common method to authenticate channels (namely, message authentication codes) relies on having a small amount of pre-shared key, so the assumed existence of authenticated channels in the DIQKD protocol is likely to require some pre-shared key in any case.
However, this basic notion cannot immediately be generalized to Protocol 1 here, since requiring Alice and Bob to choose uniformly distributed "matching" inputs in the generation rounds would require a large amount of pre-shared key (roughly $(1-\gamma)n$ bits). Fortunately, in the following section we propose a variation which overcomes this difficulty by "recovering" the entropy in the pre-shared key, thereby still achieving net key expansion.

Protocol using pre-shared key
Here we describe a variant protocol that avoids the sifting factor without requiring the brief quantum storage described above, through the use of a fairly long pre-shared key. The limitation of this variant is that the net increase in secret key is just a (constant) fraction of the amount of pre-shared key; however, the rate of net key generation does not suffer the sifting factor of $1/2$. Informally, the idea is to simply use the pre-shared key as Alice's input string $\mathbf{X}$, which allows Bob to choose his generation inputs to match Alice's. Just as importantly, this also allows them to (almost) entirely omit the public announcement of their inputs; hence $\mathbf{X}$ remains private, and with some care it can be incorporated into the final key without losing the entropy it "contains".
We now describe this idea in detail as Protocol 2 below, followed by its security proof. The protocol supposes that Alice and Bob hold a pre-shared (uniform) key of $n$ bits, which we shall simply denote as $\mathbf{X}$, since it will be exactly the string that Alice uses as her inputs. The appropriate value of $\ell_{\mathrm{key}}$ to choose will be described later in Theorem 3.

Protocol 2
This protocol proceeds the same way as Protocol 1, except for the following changes:

• In each round, Alice's input $X_j$ is determined from the pre-shared key $\mathbf{X}$, instead of being generated randomly in that round. Bob's input $Y_j$ is generated as follows: with probability $\gamma$ he chooses a uniformly random $Y_j \in \{2, 3\}$; otherwise he chooses $Y_j = X_j$. In addition, he generates another register $\bar{Y}_j$ which equals $Y_j$ when $Y_j \in \{2, 3\}$ and equals $\perp$ otherwise.
• Alice and Bob do not publicly announce the strings $\mathbf{X}\mathbf{Y}$. Instead, Bob only announces the string $\bar{\mathbf{Y}}$.[25] Additionally, the sifting step is unnecessary, since there will be no rounds such that $Y_j \in \{0, 1\}$ and $X_j \neq Y_j$.
• Privacy amplification is performed on the strings $\mathbf{A}\mathbf{X}$ and $\tilde{\mathbf{A}}\mathbf{X}$ instead of $\mathbf{A}$ and $\tilde{\mathbf{A}}$.
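The per-round input generation in Protocol 2 can be sketched as follows (an illustrative toy implementation with our own helper names; `None` plays the role of the symbol $\perp$):

```python
import random

def protocol2_bob_inputs(X, gamma, rng):
    """Bob's inputs in Protocol 2: test with probability gamma (Y_j in {2,3}),
    otherwise match Alice's pre-shared input X_j. Also returns the announced
    register Ybar_j, which hides the generation inputs."""
    Y, Ybar = [], []
    for x in X:
        if rng.random() < gamma:              # test round
            y = rng.choice([2, 3])
            Y.append(y)
            Ybar.append(y)                    # announced: test inputs only
        else:                                 # generation round
            Y.append(x)                       # matched input, never sifted out
            Ybar.append(None)                 # generation inputs stay private
    return Y, Ybar

rng = random.Random(7)
X = [rng.randrange(2) for _ in range(10_000)]   # pre-shared key = Alice's inputs
Y, Ybar = protocol2_bob_inputs(X, gamma=0.05, rng=rng)

# No round with Y_j in {0,1} has mismatched inputs, so sifting is unnecessary:
assert all(y == x for x, y in zip(X, Y) if y in (0, 1))
```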
To prove the security of this protocol, we can simply follow almost exactly the same security proof as for Protocol 1, with some changes we shall now describe. Firstly, the value of $h_{\mathrm{hon}}$ (to be used when computing $\mathrm{EC}_{\max}$) is replaced by
$$\tilde{h}_{\mathrm{hon}} := \sum_{z\in\{0,1\}} \frac{1-\gamma}{2}\, H(\hat{A}_j|B_j, X_j = Y_j = z)_{\mathrm{hon}}, \qquad (108)$$
since the probabilities of $X_j = Y_j = z$ for $z \in \{0,1\}$ are now $(1-\gamma)/2$. (Note that no error-correction information needs to be sent from Alice to Bob regarding $\mathbf{X}$, since both of them have a copy of that string.) Also, since the strings used in the privacy-amplification step are now $\mathbf{A}\mathbf{X}$ and $\tilde{\mathbf{A}}\mathbf{X}$, we need an equivalent of Eq. (71), with $H^{\varepsilon_s}_{\min}(\mathbf{A}\mathbf{B}\mathbf{X}|\bar{\mathbf{Y}}E)$ in place of $H^{\varepsilon_s}_{\min}(\mathbf{A}\mathbf{B}|\mathbf{X}\mathbf{Y}E)$. To obtain this, we note that we can simply construct a virtual protocol in the analogous way to Protocol 1, then consider the same EAT channels $\mathcal{M}_j$ as before, but instead identify $C_j$ with $D_j$, $A_jB_jX_j$ with $S_j$, and $\bar{Y}_j$ with $T_j$ in Definition 10. The Markov conditions are again fulfilled, since $\bar{Y}_j$ is generated by trusted randomness in each round and independent of all previous data. To find an appropriate min-tradeoff function for these channels, we note that the channel outputs satisfy $H(X_j|\bar{Y}_jR) = 1$, because $X_j$ is produced by trusted randomness independent of $\bar{Y}_jR$. Therefore, we can use the chain rule to write
$$H(A_jB_jX_j|\bar{Y}_jR) = H(X_j|\bar{Y}_jR) + H(A_jB_j|X_j\bar{Y}_jR) \geq 1 + \tilde{g}(w),$$
where the function $\tilde{g}$ (in contrast to $g$) does not have the factor of $1/2$ introduced by sifting, since Alice does not "erase" the outputs of any rounds. We can thus construct a new min-tradeoff function $\tilde{f}_{\min}$ in the same way as in Sec. 4.3.1, but using $1 + \tilde{g}(w)$ in place of $g(w)$.[26] The rest of the proof then proceeds as before, leading to the following security statement:

Theorem 3. Protocol 2 has the same security guarantees as those described in Theorem 1, except with the following changes (with $\tilde{g}$ being defined in (110)):

• $\beta$ is chosen to be in $[1 + \tilde{g}(0), 1 + \tilde{g}(1)]$ instead.
• $h_{\mathrm{hon}}$ is replaced by $\tilde{h}_{\mathrm{hon}}$ as specified in Eq. (108).
[25] It might be possible to consider a slight variant which omits this step. However, knowing $\bar{\mathbf{Y}}$ allows Alice to compute $\mathbf{Y}$, which may be relevant for error correction since it allows Alice to distinguish the test and generation rounds. In any case, it seems unclear whether the entropy of $\bar{\mathbf{Y}}$ could be usefully extracted even if it were kept secret, since Alice does not have access to it in that case.

[26] We remark that if we think of this replacement as happening in two steps, first replacing $g$ by $\tilde{g}$ and then adding a "constant offset" of 1, then the latter has no effect on $\mathrm{Var}_{Q_f}(f_{\min})$ or the difference $\mathrm{Max}(f_{\min}) - \mathrm{Min}_{Q_f}(f_{\min})$, and hence does not change the finite-size correction to the keyrate except indirectly via changing the system dimensions and the range of $\beta$. However, the first step of replacing $g$ by $\tilde{g}$ does slightly increase the finite-size correction (since $\tilde{g}$ has a somewhat larger range).
• In Eq. (31) for $\ell_{\mathrm{key}}$, $g(w_{\exp} - \delta_{\mathrm{tol}})$ is replaced by $1 + \tilde{g}(w_{\exp} - \delta_{\mathrm{tol}})$, and the values of $V$ and $K_\alpha$ are replaced by
$$\tilde{V} := \sqrt{\mathrm{Var}_{Q_f}(\tilde{f}_{\min}) + 2} + \log 129$$
and the analogously modified $\tilde{K}_\alpha$, where $\tilde{f}_{\min}$ is the min-tradeoff function constructed above.

Overall, recalling that Protocol 2 requires $n$ bits of pre-shared key, we see that the net gain of secret key bits in Protocol 2 is larger than that of Protocol 1 by a factor of approximately 2 (ignoring the changes to the finite-size corrections), since it avoids the sifting factor. Informally, by keeping $\mathbf{X}$ secret and incorporating it in the privacy-amplification step, we have "recovered" the entropy that was present in the original pre-shared key.[27] In practice, including the string $\mathbf{X}$ in privacy amplification essentially doubles the input size for the hash function in that step, which raises its computational difficulty substantially (though not insurmountably). One might wonder whether it would be possible to bypass this aspect, for instance by simply performing privacy amplification on $\mathbf{A}$ and $\tilde{\mathbf{A}}$ as before, then appending $\mathbf{X}$ to the output. At first glance, this approach might appear plausible, since $\mathbf{X}$ is not announced in Protocol 2. Unfortunately, it seems unclear how to certify that the publicly communicated error-correction string $\mathbf{L}$ is independent of $\mathbf{X}$ (in fact, it seems unlikely that this is true). Hence the idea of simply appending $\mathbf{X}$ may not be secure. By instead incorporating it in privacy amplification in the specified manner, Protocol 2 ensures that the entropy of $\mathbf{X}$ is securely "extracted" into the final key.
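The factor-of-2 accounting can be made concrete with illustrative numbers (our own toy model: $r$ denotes the asymptotic entropy rate without the sifting factor, and all finite-size corrections are ignored, as in the approximation above):

```python
def net_gain_protocol1(n, r):
    """Protocol 1: sifting halves the rate; no pre-shared key is consumed."""
    return n * r / 2

def net_gain_protocol2(n, r):
    """Protocol 2: no sifting, and privacy amplification over AX recovers
    the n seed bits, which are then subtracted as consumed pre-shared key."""
    key_length = n * r + n   # entropy of outputs plus recovered seed
    return key_length - n    # minus the n bits of pre-shared key used

n, r = 10**6, 0.4
assert net_gain_protocol2(n, r) == 2 * net_gain_protocol1(n, r)
```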
As previously mentioned, the net increase in secret key given by one instance of Protocol 2 is limited to a fraction of the amount of pre-shared key. However, it is possible in principle to run Protocol 2 recursively in order to achieve unbounded key expansion: one can use the key generated by one instance of Protocol 2 to run it again with a longer pre-shared key and larger $n$ (since the security definition is composable, the soundness parameter will only increase additively in this process [PR14]). We stress that in doing so, one must always incorporate the seed into the privacy-amplification step exactly as specified in Protocol 2; in particular, this means that the entire key changes with every iteration, instead of simply having some new bits appended. Some care is necessary regarding device memory across instances of this recursive procedure: while it does not seem to be directly vulnerable to the memory attack of [BCK13],[28] it is still important to ensure that the states measured in each instance of the protocol are independent of the key generated in the preceding instance, since this key is used to choose the device inputs (which must be independent of the state in order for our security arguments to hold). Again, this relies on the notion that the registers measured by the devices do not contain information about the key generated in the preceding instance.

[27] Note that the underlying idea here critically relies on the fact that the bound $r_p$ is for the entropy of the output strings conditioned on $X_j$. This allowed us to use the chain rule to obtain Eq. (63), which eventually led to the result that the seed entropy contained in $\mathbf{X}$ simply "adds on" to the entropy of the output strings in the original Protocol 1. If $r_p$ had been a bound on, for instance, $H(A_j|Y_jR)$ instead of $H(A_j|X_jY_jR)$, this argument would not have worked.
There is another potential variant of this idea where a pre-shared key is instead used to generate both input strings $\mathbf{X}$ and $\mathbf{Y}$, and the input-choice announcement is omitted entirely, with privacy amplification being performed on $\mathbf{A}\mathbf{X}\mathbf{Y}$ and $\tilde{\mathbf{A}}\mathbf{X}\mathbf{Y}$. This can be done by using $n$ bits to set the value of $X_j$ in all rounds as before, then using $\kappa h_2(\gamma) n$ bits to choose Bob's test rounds approximately according to the desired IID distribution, and finally using $\kappa' \gamma n$ bits to set the value of $Y_j$ in the test rounds (while the generation rounds simply have $Y_j$ set to $X_j$), where $\kappa, \kappa' > 1$ are constants that can be chosen such that the approximations to the desired distributions are sufficiently accurate (see the randomness-expansion protocols in [AFRV19,BRC20] for a more complete description of this process based on the interval algorithm).[29] This would hence require $(1 + \kappa h_2(\gamma) + \kappa' \gamma)n$ bits of seed randomness. A similar argument as above could then be performed by noting that (for $X_jY_j$ generated according to the ideal distribution) we have $H(X_jY_j) = 1 + h_2(\gamma) + \gamma$, so most of the seed entropy can be "recovered", up to the losses from the $\kappa, \kappa'$ factors. However, tracking the effects of using the interval algorithm to approximate the ideal distribution is cumbersome (albeit possible; see [BRC20]), and it is unclear whether this variant offers any immediate advantage over Protocol 2 for DIQKD, though it may be useful for protocols that use other Bell inequalities or non-uniform input distributions, as mentioned in Sec. 4.4.
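The seed-entropy identity $H(X_jY_j) = 1 + h_2(\gamma) + \gamma$ used above follows from the chain rule ($H(X_j) = 1$ and $H(Y_j|X_j) = h_2(\gamma) + \gamma$), and can be checked directly by enumerating the ideal joint input distribution:

```python
import math

def shannon(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def h2(g):
    """Binary entropy in bits."""
    return -g * math.log2(g) - (1 - g) * math.log2(1 - g)

def joint_input_dist(gamma):
    """Ideal distribution of (X_j, Y_j): X_j uniform on {0,1};
    Y_j = X_j with probability 1 - gamma, else uniform on {2,3}."""
    dist = {}
    for x in (0, 1):
        dist[(x, x)] = 0.5 * (1 - gamma)      # generation round: Y_j = X_j
        for y in (2, 3):
            dist[(x, y)] = 0.5 * gamma * 0.5  # test round: Y_j uniform on {2,3}
    return dist

gamma = 0.1
lhs = shannon(joint_input_dist(gamma))
rhs = 1 + h2(gamma) + gamma
assert abs(lhs - rhs) < 1e-12
```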
On the other hand, it appears that this variant may have potential for the purposes of DIRE instead. (As mentioned in the introduction, this idea has also been independently proposed in [BRC21].) The main reason why the random-key-measurement approach in [SGP+21] could not be easily generalized to DIRE is that in order for Alice to select a uniformly random input in every round, she requires a (local) source of $n$ random bits, which is a free resource in DIQKD but not in DIRE: if a proposed DIRE protocol consumes more random bits than it produces, then it has failed to achieve randomness expansion.[30] However, the protocol proposed in this section has the property that it "recovers" the entropy contained in the seed, which means that one can afford to use much larger seeds while still obtaining a net increase in secret key. Explicitly, the application of this idea to DIRE would hence be as follows: one begins with $2n$ uniformly random bits, which are then used as the input strings[31] $\mathbf{X}\mathbf{Y}$ to the devices over $n$ rounds to obtain outputs $\mathbf{A}\mathbf{B}$. Modelling this process using EAT channels in a manner similar to above (see e.g. [LLR+21] for details), for each round we would have
$$H(A_jB_jX_jY_j|R) = H(X_jY_j|R) + H(A_jB_j|X_jY_jR),$$
which (given a bound on $H(A_jB_j|X_jY_jR)$) allows one to bound the smoothed min-entropy of $\mathbf{A}\mathbf{B}\mathbf{X}\mathbf{Y}$ conditioned on $E$. By performing privacy amplification on $\mathbf{A}\mathbf{B}\mathbf{X}\mathbf{Y}$, one "recovers" all the entropy in the seed, due to the $H(X_jY_j|R)$ term in the above equation. Overall, this proposed protocol allows one to use the improved entropy rate provided by the random-key-measurement approach [SGP+21], in the context of DIRE instead of DIQKD.

[28] This is because the only public communication in Protocol 2 that can leak any information is the error-correction string (all other public communication is based on trusted randomness). In our security proof, we have bounded the min-entropy leakage at this step simply via the length of this string, without any assumptions about its structure, and hence we can still obtain a secure bound on the min-entropy of the input for privacy amplification in the final protocol instance. Note that this claim is strictly restricted to device reuse following the recursive process specified here: once any key bits have been used for any other purpose, the attack again becomes a potential concern if the devices are reused.

[29] We break up the use of the seed into separate processes because it allows for better efficiency as compared to directly approximating the desired distribution of $\mathbf{X}\mathbf{Y}$: with the approach we describe, the "inefficiency" prefactors $\kappa, \kappa'$ of the interval algorithm only appear on the $h_2(\gamma)n, \gamma n$ terms instead of the full entropy of $\mathbf{X}\mathbf{Y}$.

[30] There is, however, the related but distinct task of device-independent randomness generation [ZSB+20] (where the goal is to produce private randomness from an unbounded supply of public randomness that is independent from the devices), in which this would not be an issue.

[31] For DIRE based on the CHSH inequality, Bob only requires two possible measurements instead of the four required for the DIQKD protocol here.

Collective attacks
As a reference to compare our results against, we could consider whether a longer secure key could be obtained under the collective-attacks assumption, i.e. the assumption that the device behaviour is IID in each round of the protocol (though Eve can still store quantum information for arbitrary periods).[32] To this end, we derive a security proof under this assumption. We defer the proof to Appendix C, and just state the final key length formula here. As compared to Theorem 1, the parameters involved in the formula are somewhat different: some of the previous parameters are no longer involved, though there are some new ones, which we qualitatively describe as follows (they are closely related to the notion of "$\varepsilon$-secure filtering" described in [Ren05]).

$\varepsilon_{\mathrm{IID}}$: Informally, a bound on the probability that the virtual parameter estimation step (in the virtual protocol) accepts when given devices that produce insufficient min-entropy.

$\delta_{\mathrm{IID}}$: "Relaxation" parameter that slightly enlarges the set of states for which we bound the entropy, in order to derive a nontrivial value of $\varepsilon_{\mathrm{IID}}$.

The formal theorem statement is:

Theorem 4. Take any $\varepsilon^{\mathrm{com}}_{\mathrm{EC}}, \varepsilon^{\mathrm{com}}_{\mathrm{PE}}, \varepsilon_{\mathrm{PA}}, \varepsilon_h, \varepsilon_s \in (0,1]$, $\gamma \in (0,1)$, $p \in [0,1/2]$, and $\delta_{\mathrm{IID}} \in [0, w_{\exp} - \delta_{\mathrm{tol}})$. Define $\varepsilon_{\mathrm{IID}}$ as in Eq. (130). Under the collective-attacks assumption, Protocol 1 is $(\varepsilon^{\mathrm{com}}_{\mathrm{EC}} + \varepsilon^{\mathrm{com}}_{\mathrm{PE}})$-complete and $(\max\{\varepsilon_{\mathrm{IID}}, \varepsilon_{\mathrm{PA}} + 2\varepsilon_s\} + 2\varepsilon_h)$-sound when performed with any choice of $\mathrm{EC}_{\max}$ and $\delta_{\mathrm{tol}}$ such that Eq. (14) and Eq. (30) hold, and $\ell_{\mathrm{key}}$ satisfying the corresponding key-length bound.

[32] To be precise, we mean that the part of the state held by Alice and Bob's devices is IID across the rounds, and in each round the devices have the same set of possible measurements. Since all purifications are isometrically equivalent, without loss of generality we can suppose that Eve also holds an IID purification of the Alice-Bob state.
However, the above theorem is simply a statement for Protocol 1 under the assumption of collective attacks, and that protocol does not fully exploit some implications of that assumption. For instance, in Theorem 4 there is implicitly an $O(\gamma)$ subtractive penalty to the keyrates (which was also present in Theorem 1[33]), caused by having to include the test-round data in the $\mathrm{EC}_{\max}$ term. Yet under the collective-attacks assumption, the test rounds are completely independent of the generation rounds, which implies that the effect of $\gamma$ should instead be to reduce the keyrate by a multiplicative factor of $(1-\gamma)$. Importantly, in the latter case it is possible to choose arbitrarily large test probabilities $\gamma$ without necessarily making the keyrates negative, which can dramatically improve the statistical bounds for parameter estimation. To formalize this idea, we consider Protocol 3 below, which attempts to minimize the finite-size correction as much as possible using the most optimistic assumptions that have been discussed thus far.

Protocol 3
This protocol proceeds the same way as Protocol 1, except for the following changes:

• Instead of independently choosing whether each round is a test or generation round, Alice chooses a uniformly random subset of size $m$ as test rounds before the protocol begins, and we define $\gamma$ as the value $m/n$. Alice also prepares the strings $\mathbf{X}\mathbf{Y}$ in advance, by choosing $X_j = Y_j \in \{0,1\}$ uniformly at random in the generation rounds, and choosing $X_j \in \{0,1\}$, $Y_j \in \{2,3\}$ uniformly at random in the test rounds.
• In each round, Alice and Bob briefly store their received quantum states instead of immediately measuring them. Alice then publicly announces $X_jY_j$, which Alice and Bob then use as the inputs to their devices.[34]

• In the error-correction step, Alice does not send error-correction data (and a corresponding hash) for the full string $\mathbf{A}$, but rather only the subset of it consisting of the generation rounds, denoted as $\mathbf{A}_g$. Bob's guess for this string will be denoted as $\tilde{\mathbf{A}}_g$. The values of $\mathbf{A}$ in the test rounds, denoted as $\mathbf{A}_t$, are sent directly to Bob without compression or encryption, and Bob uses this string for parameter estimation.
• Bob's accept condition is instead to check that $\mathrm{hash}(\mathbf{A}_g) = \mathrm{hash}(\tilde{\mathbf{A}}_g)$ and $\mathrm{freq}_{\mathbf{C}_t}(1) \geq w_{\exp} - \delta_{\mathrm{tol}}$ hold, where $\mathbf{C}_t$ denotes the substring of $\mathbf{C}$ corresponding to the test rounds (in particular, this means the frequencies are computed with respect to a string of length $\gamma n$, not $n$).
• Privacy amplification is performed only on the strings $\mathbf{A}_g$ and $\tilde{\mathbf{A}}_g$.
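Protocol 3's advance input preparation can be sketched as follows (an illustrative toy implementation with our own helper names; the key difference from Protocol 1 is that exactly $m$ test rounds are drawn as a uniformly random subset, rather than each round independently being a test round with probability $\gamma$):

```python
import random

def protocol3_inputs(n, m, rng):
    """Choose exactly m test rounds in advance and prepare the inputs XY."""
    test_set = set(rng.sample(range(n), m))   # uniformly random size-m subset
    X, Y = [], []
    for j in range(n):
        if j in test_set:                     # test round
            X.append(rng.randrange(2))        # X_j uniform on {0,1}
            Y.append(rng.choice([2, 3]))      # Y_j uniform on {2,3}
        else:                                 # generation round: matched inputs
            x = rng.randrange(2)
            X.append(x)
            Y.append(x)
    return X, Y, test_set

X, Y, test_set = protocol3_inputs(n=1000, m=300, rng=random.Random(0))
assert len(test_set) == 300                   # test-round count is deterministic
assert all(X[j] == Y[j] for j in range(1000) if j not in test_set)
```

The deterministic test-round count is what removes the extra variance that Protocol 1's per-round Bernoulli sampling introduces into the parameter-estimation statistics.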
[33] In fact, Theorem 1 has another $O(\gamma)$ subtractive penalty from the use of the bound (63), but this was due to the technical limitations of the EAT and the fact that the bounds (64) and (67) are slightly suboptimal.

[34] Here, in our attempt to minimize the finite-size effects, we are following the [AFRV19] assumption mentioned previously: Alice and Bob can briefly store their received quantum states, in a manner such that the public communication cannot affect the stored states.
For this protocol, the value of $\mathrm{EC}_{\max}$ is to be computed based only on the number of generation rounds, since error correction is performed on the string $\mathbf{A}_g$ rather than $\mathbf{A}$. Focusing on the best possible theoretical bounds from Sec. 3.1, this means we take $\mathrm{EC}_{\max}$ to be given by Eq. (15) with
$$H^{\varepsilon_s}_{\max}(\mathbf{A}|\mathbf{B}\mathbf{X}\mathbf{Y})_{\mathrm{hon}} \leq (1-\gamma)n\, h_{\mathrm{hon}} + \sqrt{(1-\gamma)n}\,(2\log 5)\sqrt{\log\frac{2}{\varepsilon_s^2}}, \qquad (117)$$
where $h_{\mathrm{hon}} = \sum_{z\in\{0,1\}} \frac{1}{2} H(\hat{A}_j|B_j, X_j = Y_j = z)_{\mathrm{hon}}$, since the test rounds are excluded. With this value of $\mathrm{EC}_{\max}$ in mind, we can state the security guarantees of this protocol, with the proof given in Appendix D (we highlight that the dependence on several security parameters here is different as compared to the previous theorems; qualitatively, this is because, for instance, we no longer consider a virtual protocol when introducing parameters such as $\varepsilon_{\mathrm{IID}}$, and also the various binomial distributions are different since the test-round subset now has a fixed size):

Theorem 5. Take any $\varepsilon^{\mathrm{com}}_{\mathrm{EC}}, \varepsilon^{\mathrm{com}}_{\mathrm{PE}}, \varepsilon_{\mathrm{PA}}, \varepsilon_h, \varepsilon_s \in (0,1]$, $\gamma \in (0,1)$, $p \in [0,1/2]$, and $\delta_{\mathrm{IID}} \in [0, w_{\exp} - \delta_{\mathrm{tol}})$. Define
$$\varepsilon_{\mathrm{IID}} := B_{\gamma n,\, 1 - w_{\exp} + \delta_{\mathrm{tol}} + \delta_{\mathrm{IID}}}\big(\lfloor (1 - w_{\exp} + \delta_{\mathrm{tol}})\gamma n \rfloor\big). \qquad (119)$$
Under the collective-attacks assumption, Protocol 3 is $(\varepsilon^{\mathrm{com}}_{\mathrm{EC}} + \varepsilon^{\mathrm{com}}_{\mathrm{PE}})$-complete and $(\max\{\varepsilon_{\mathrm{IID}}, \varepsilon_{\mathrm{PA}} + 2\varepsilon_s\} + \varepsilon_h)$-sound when performed with $\mathrm{EC}_{\max}$ defined in terms of $\varepsilon^{\mathrm{com}}_{\mathrm{EC}}$ as described above, and $\delta_{\mathrm{tol}}$, $\ell_{\mathrm{key}}$ satisfying
$$\varepsilon^{\mathrm{com}}_{\mathrm{PE}} \geq B_{\gamma n,\, w_{\exp}}\big(\lfloor (w_{\exp} - \delta_{\mathrm{tol}})\gamma n \rfloor\big), \qquad (120)$$
$$\ell_{\mathrm{key}} \leq (1-\gamma)n\, r_p(w_{\exp} - \delta_{\mathrm{tol}} - \delta_{\mathrm{IID}}) - \sqrt{(1-\gamma)n}\,(2\log 5)\sqrt{\log\frac{2}{\varepsilon_s^2}} - \mathrm{EC}_{\max} - \log\frac{1}{\varepsilon_h} - 2\log\frac{1}{\varepsilon_{\mathrm{PA}}} + 2. \qquad (121)$$
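The bounds in Eqs. (119)-(120) can be evaluated with exact binomial tail sums. The sketch below assumes $B_{N,\lambda}(k)$ denotes the probability that a Binomial($N$, $\lambda$) variable is at most $k$ (our reading of the notation), computed in log space to avoid overflow; the parameter values are purely illustrative, not taken from the text:

```python
from math import lgamma, log, exp

def log_binom_pmf(N, lam, i):
    """log Pr[Binomial(N, lam) = i], via log-gamma to stay numerically stable."""
    return (lgamma(N + 1) - lgamma(i + 1) - lgamma(N - i + 1)
            + i * log(lam) + (N - i) * log(1 - lam))

def binom_cdf(N, lam, k):
    """B_{N,lam}(k): Pr[Binomial(N, lam) <= k]."""
    return sum(exp(log_binom_pmf(N, lam, i)) for i in range(k + 1))

# Illustrative parameters: gamma*n test rounds, expected winning probability
# w_exp, tolerance delta_tol, relaxation delta_IID (all chosen by us).
gn, w_exp, d_tol, d_IID = 2000, 0.84, 0.02, 0.005

eps_PE = binom_cdf(gn, w_exp, int((w_exp - d_tol) * gn))           # cf. Eq. (120)
eps_IID = binom_cdf(gn, 1 - w_exp + d_tol + d_IID,
                    int((1 - w_exp + d_tol) * gn))                 # cf. Eq. (119)
assert 0 < eps_PE < 1 and 0 < eps_IID < 1
```

With the deterministic test-round count of Protocol 3, these tails shrink rapidly as $\gamma n$ grows, which is the quantitative content of the comparison against Protocol 1 discussed next.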
In Fig. 6, we plot the results of Theorem 5, focusing on the [RBG+17] experiment. This protocol has improved finite-size performance as compared to the original Protocol 1 (under the collective-attacks assumption) due to at least two factors. Firstly, we can potentially use larger $\gamma$ values, as previously mentioned (some of the points shown in the figure correspond to values ranging up to $\gamma \approx 0.3$). Secondly, we find that for fixed values of $\gamma, n, w_{\exp}, \delta_{\mathrm{tol}}, \delta_{\mathrm{IID}}$, the binomial-distribution bounds for this protocol (Eqs. (119) and (120)) are typically several orders of magnitude better than their counterparts for Protocol 1 (Eqs. (130) and (42)). Intuitively, this arises because in Protocol 1 the number of test rounds is itself a random variable, which increases the variance in e.g. the number of rounds where $C_j = 1$. Practically speaking, this means Protocol 1 requires noticeably larger values of $\delta_{\mathrm{tol}}$ and $\delta_{\mathrm{IID}}$ in order to achieve given completeness and soundness parameters, hence reducing the keyrate by a nontrivial amount.
However, we see that even with the optimistic assumptions that yield Theorem 5, the keyrate for the estimated experimental parameters we consider only becomes positive at fairly large $n$. This indicates that substantial further work is necessary in order to achieve a demonstration of positive finite-size keyrates.

Figure 6: Finite-size keyrates as a function of the number of rounds in Protocol 3 (see Theorem 5), using honest devices following the estimated parameters in [MDR+19] for the loophole-free Bell test in [HBD+15], for $p = 0$ and $p = 0.03$ (the latter is a rough estimate of the choice of $p$ which yields the highest asymptotic keyrate for these experimental parameters). Note that the latter graph is computed using a heuristic estimate of $r_p$ rather than a certified bound. The error-correction protocol is taken to satisfy Eqs. (15) and (117).
The colours correspond to soundness parameters of $\varepsilon^{\mathrm{sou}} = 10^{-3}, 10^{-6}$, and $10^{-9}$ for black, blue, and red respectively, and the completeness parameter is $\varepsilon^{\mathrm{com}} = 10^{-2}$ in all cases. The horizontal line denotes the asymptotic keyrate. All other parameters in Theorem 5 were numerically optimized. The required number of rounds to achieve a positive keyrate is substantially lower than for Protocol 1 (see Fig. 1).

Conclusion and further work
In this work, we have performed a finite-size analysis for a protocol that combines several of the most promising approaches towards improving keyrates for DIQKD. Furthermore, we have developed an algorithm that computes arbitrarily tight lower bounds on the asymptotic keyrates for protocols of this form (i.e. allowing for noisy preprocessing and random key measurements, but restricted to one-way error correction), which applies to all 2-input 2-output scenarios. This allows us to prove a new threshold of $9.33\%$ for noise tolerance in the depolarizing-noise model, and we show (see Sec. 5.6) that for one-way protocols, any further improvements on this threshold can only be fairly small. Finally, we have proposed a modified protocol, based on a pre-shared key, that overcomes the disadvantage of the sifting factor in random-key-measurement protocols.
We remark that while the finite-size analysis shown here is for a protocol based on the CHSH inequality, our algorithm for the asymptotic keyrates applies more generally to 2-input 2-output scenarios. If some exploration with this algorithm suggests that an improvement can be obtained by using an inequality other than CHSH, then it would not be difficult to modify the finite-size analysis for such an inequality, as was done in e.g. [BRC20] for DIRE (essentially, it would just correspond to having a different bound $r_p$).
However, our results show that for the NV-centre experiment in [HBD+15] and the cold-atom experiment in [RBG+17], an impractically large sample size (for those implementations) would still be needed in order to achieve a positive finite-size keyrate, even if one makes the optimistic assumption of collective attacks. A significant question that remains to be addressed is that of photonic experiments [SMSC+15,GVW+15], which achieve lower CHSH values but much larger sample sizes (for instance, the photonic DIRE demonstration in [LLR+21] implemented one run with $n = 1.3824 \times 10^{11}$ over 19.2 hours, and one run with $n = 3.168 \times 10^{12}$ over 220 hours). Unfortunately, for photonic models our heuristic results (Sec. 5.5) suggest that incorporating random key measurements is less helpful in improving the keyrate, because the experimental parameters that achieve maximal CHSH value also tend to result in a higher error probability (with respect to Bob's outputs) for one of Alice's measurements than the other. Despite this challenge, we note that there is much freedom in parameter optimization for photonic experiments [MSS20], and given the algorithm we developed here for bounding the asymptotic keyrates, it is now possible to analyze variants such as choosing different ratios for the two key-generating measurements, or different amounts of noisy preprocessing for each basis. We aim to continue studying this in future work.
We also note that for photonic implementations, a protocol modification termed random postselection was recently proposed in [XZZ+22], where it was shown to provide significant improvements in the detection-efficiency thresholds required for positive asymptotic keyrates, under the assumption of collective attacks. However, that modified protocol involves public announcements in each round that violate the Markov condition in the EAT, and hence the EAT cannot be directly applied to prove security of that protocol against general attacks. It remains to be seen whether this can be proven using some other approach, although one challenge to overcome would be the issue that for other protocols with a slightly similar public-announcement structure, it has been shown [TTB+16] that the achievable keyrates against general attacks are strictly lower than for collective attacks. Any approach for proving the security of the [XZZ+22] protocol against general attacks (with the same asymptotic keyrates as for collective attacks) would have to rely on some specific difference between that protocol and the ones analyzed in [TTB+16].
As another extension of our work, our algorithm for bounding the asymptotic DIQKD keyrates also applies to DIRE. Since the results it returns are arbitrarily tight and easily incorporated into the EAT, it could be combined with the finite-size analysis of [BRC20,LLR+21] to improve the results of those works: the proof methods used there yield slightly suboptimal keyrates, in that they either bound the min-entropy rather than the von Neumann entropy [BRC20], or they only bound the entropy of one party's outputs and are restricted to the CHSH inequality [LLR+21]. Our algorithm would yield tight bounds on the two-party von Neumann entropy for 2-input 2-output Bell inequalities; furthermore, it allows the possibility of using the random-key-measurement approach (by applying the pre-shared-key proposal) to improve the keyrates.

Computational platform and code
The min-tradeoff function computations were performed using the MATLAB package YALMIP [Löf04] with the solver MOSEK [MOS19], while optimization of the finite-size keyrates was performed in Mathematica and MATLAB. Some of the calculations reported here were performed using the Euler cluster at ETH Zürich. The code used to compute the certified lower bounds is available at the following URL: https://github.com/ernesttyz/qbitent

...where the last line holds because the test rounds are independent of the generation rounds. Hence, as long as we choose $\ell_{\mathrm{key}}$ satisfying Eq. (121), we will have the required bound on the smoothed min-entropy. Again noting that $\Pr[\Omega_h \wedge \Omega_{\mathrm{PE}}] \geq \varepsilon_s^2$ in this case, the Leftover Hashing Lemma then implies the corresponding secrecy bound. Therefore, we finally conclude that under the collective-attacks assumption, the secrecy condition is satisfied by choosing $\varepsilon^{\mathrm{sec}}_{\mathrm{QKD}} = \max\{\varepsilon_s^2, \varepsilon_{\mathrm{IID}}, \varepsilon_{\mathrm{PA}} + 2\varepsilon_s\} = \max\{\varepsilon_{\mathrm{IID}}, \varepsilon_{\mathrm{PA}} + 2\varepsilon_s\}$.
As before, the protocol is $\varepsilon_h$-correct, so we conclude that it is $(\max\{\varepsilon_{\mathrm{IID}}, \varepsilon_{\mathrm{PA}} + 2\varepsilon_s\} + \varepsilon_h)$-sound when performed with $\ell_{\mathrm{key}}$ satisfying Eq. (121).