Self-testing with finite statistics enabling the certification of a quantum network link

1Université Paris-Saclay, CEA, CNRS, Institut de Physique Théorique, 91191, Gif-sur-Yvette, France 2Group of Applied Physics, University of Geneva, 1211 Geneva 4, Switzerland 3Quantum Optics Theory Group, Universität Basel, CH-4056 Basel, Switzerland 4Fakultät für Physik, Ludwig-Maximilians-Universität, 80799 München, Germany 5Max-Planck-Institut für Quantenoptik, Hans-Kopfermann-Strasse 1, 85748 Garching, Germany 16 February 2021

Self-testing is a method to certify devices from the result of a Bell test. Although examples of noise-tolerant self-testing are known, it is not clear how to deal efficiently with a finite number of experimental trials in order to certify the average quality of a device without assuming that it behaves identically at each run. As a result, existing self-testing results with finite statistics have been limited to guaranteeing the proper working of a device in just one of all the experimental trials, thereby limiting their practical applicability. We here derive a method to certify through self-testing that a device produces states that are on average close to a Bell state, without any assumption on the actual state at each run. The method is thus free of the I.I.D. (independent and identically distributed) assumption. Applying this new analysis to the data from a recent loophole-free Bell experiment, we demonstrate the successful distribution of Bell states over 398 meters with an average fidelity of ≥ 55.50% at a confidence level of 99%. Being based on a Bell test free of the detection and locality loopholes, our certification is device-independent, that is, it does not rely on trust in the devices or knowledge of how the devices work. This guarantees that our link can be integrated into a quantum network for performing long-distance quantum communication with security guarantees that are independent of the details of the actual implementation.

Introduction
The distribution of entanglement over long distances is a key challenge in extending the range of quantum communication and building quantum networks [1,2]. The direct transmission of entangled states through optical fibers is a viable solution for short distances but is limited by transmission loss. Quantum repeaters have thus been proposed for entanglement distribution over long distances 1 [4,5]. The basic idea is to divide the global distance into short elementary network links. Entanglement is created in each link and successive entanglement swapping operations are used to combine links and extend the entanglement. The key to an efficient quantum repeater is to use elementary links where i) the successful creation of entanglement is heralded and ii) the entanglement is stored so that it can be created in each link independently. Impressive progress along this line now allows one to envision multipartite quantum networks where entanglement is distributed between arbitrary parties [6][7][8][9][10][11]. At the heart of quantum networks lies the ability not only to distribute but also to certify an entangled state between two distant locations. Although entangled states have been produced between remote locations forming an elementary network link in multiple experiments [12][13][14][15][16][17][18][19][20], their suitability for general purposes including, but not limited to, quantum key distribution remains unproven. Demonstrations based on the qubit assumption for instance, stating that all elements involved are of dimension two, are subject to side-channels which completely corrupt the security guarantees [21].

1 For protocols that do not rely on entanglement creation in independent links, but are based exclusively on error correction codes, see [3].
More generally, the identification of a quantum state provides the most complete description of a system. But the trace left by a state in the measurement outcomes is influenced as much by the state as by the measurement itself. Consequently, it is challenging to obtain an accurate description of a quantum state from observed statistics without presuming a detailed description of the measurement apparatus. Yet, characterizing a quantum resource unambiguously by identifying its quantum state constitutes a crucial step to set quantum technologies on a solid footing.
The possibility of device-independent state characterization, which relies neither on assumptions about the dimension of the Hilbert space nor on the correct calibration or modelling of the measurements [22], was first realized in Refs. [23,24]. There it was noted that the only quantum states able to achieve a maximal violation of the Bell-CHSH inequality [25] are Bell states, i.e. two-qubit maximally entangled states. Interest in self-testing, however, only started growing significantly after Mayers and Yao rediscovered it and showed that it provides security for quantum key distribution [26,27]. Since then, it has been understood that self-testing guarantees the security of many quantum information tasks, including randomness generation [28][29][30] and delegated quantum computing [31]; see [32] for more details. Therefore, self-testing a state guarantees its direct applicability for a wide range of applications. Motivated by this perspective, further theoretical self-testing results have been obtained lately, addressing an increasing range of states, and with improving tolerance to noise [30,[33][34][35][36][37][38][39][40][41]. Moreover, self-testing has also been extended to the characterization of quantum measurements and channels [27,31,[42][43][44][45][46][47].
In the case of Bell states, it is now known that self-testing based on the Bell-CHSH inequality is strongly resistant to noise [34,36,37]. Recently, this led to the experimental estimation of self-testing fidelities from the perspective of hypothesis testing, in which the null hypothesis to be rejected is that the source only produces states with a fidelity below a fixed threshold value [48]. Rejection of this hypothesis then implies that at least one state produced by the source had a fidelity higher than the threshold value.
However, the implications of hypothesis testing for practical protocols are not clear, since no statement on the average fidelity is provided. As an example, data from the experiment involving two individual ions separated by ≈ 1 m [29] were shown to lead to a significant rejection of the null hypothesis, even though the average Bell violation that could be certified with the methods presented in [29] at a 99% confidence level is lower than 2.05, hence precluding a conclusion on the average Bell state fidelity [49]. A higher Bell violation was demonstrated between two ions separated by about 340 µm within a trap [48], but this short-distance setting is not directly applicable to quantum networks. The recent advent [50][51][52][53] of loophole-free Bell tests [54] involving large separations between entangled particles opens a new perspective for the device-independent certification of states distributed in quantum networks.
We here derive a method that provides a confidence interval on the average violation of any binary Bell inequality without assuming that the trials are independent and identically distributed. Applying this method to the CHSH-Bell test allows us to certify that a source has the capability of producing on average states close to a Bell state, without making any assumption on the actual state at each trial. This changes the status of self-testing from a mere theoretical tool to a practical certification technique. We show this by considering the data from the loophole-free Bell inequality violation reported in Ref. [53], where entanglement is distributed in a heralded way and stored in two single atoms trapped at two locations separated by 398 m before a Bell test is performed. We first optimize the heralding conditions using an ab-initio model of the entanglement generation process, hence improving on the entanglement fidelity of the data set in [53]. We then apply our new statistical tool, accounting for finite experimental statistics and imperfections of the random number generators.
From the observed Bell-CHSH value, we certify the successful distribution over 398 m of an entangled state with a Bell state fidelity of at least 55.50% at a confidence level of 99%. This constitutes the first result where a statistically relevant bound on the average fidelity of the distributed state is obtained directly from the Bell-CHSH value, and the first device-independent certification of an elementary link for a quantum network.

Device-independent assumptions
The scenario we consider involves three protagonists, colloquially referred to as Alice, Bob and Charlie, see Fig. 1. Charlie holds a preparation device which indicates when the experiment is ready: it heralds the start of every measurement procedure. The two other parties each hold one measurement device and one random number generator device. Upon heralding, the random number generators are used by Alice and Bob to choose a measurement setting which is applied to their measurement devices. Measurement settings and outcomes are recorded locally for later analysis. The claim of self-testing for the state measured is based on a number of assumptions that we review now.
1. The experiment admits a quantum description. Essentially, the state of the system can be represented in terms of a density operator, and the measurements as operators acting on the same Hilbert space with the appropriate tensor structure.
2. All devices mentioned above are well identified in space and operate sequentially in time. In particular, the separation between the parties Alice and Bob is clear, as well as between the random number generators and the measurement devices of each party. Moreover, results are recorded before going to the next round, hence we know exactly when a round is going on (two rounds don't happen simultaneously), when it is finished, and we can monitor how many rounds happened in a given time.
3. The random number generators are independent from all other devices and sample from a well characterized probability distribution. Hence, the measurements used are chosen freely: the measured particles cannot influence this choice, nor vice versa. The random number devices can be correlated to each other, but not to the rest of the setup.
4. Finally, the classical and quantum communication between Alice and Bob is limited: no communication (whether direct or indirect) is allowed between the measurement boxes once the settings choices are received and until the measurement outcomes are produced. Moreover, the random number generators only provide the choice of measurement setting when required, and only to their respective measurement device. (Note that space-like separation can be used to guarantee the condition of no communication between Alice and Bob.)

Apart from the first assumption, which has not been challenged by any experiment so far, note that the three remaining assumptions concern the relations between the various devices involved in the experiment rather than their internal working. This approach is thus often called "black-box" or "device-independent". These assumptions are requirements on any physical setup permitting self-testing. They are sufficient to test a Bell inequality, and they were used recently in Ref. [53] to perform a loophole-free violation of the Bell-CHSH inequality. We briefly present this experiment in the next section.

Event-ready CHSH-Bell test with neutral atoms
In our experiment, Alice's and Bob's stations each consist of a single 87 Rb atom stored in an optical dipole trap, see Fig. 1. The two setups are operated independently, that is, they are equipped with their own laser and control systems. Two Zeeman states |m_F = ±1⟩ of the ground state manifold 5²S_{1/2} are used as spin-1/2 states. After an initial state preparation, the atoms are optically excited to emit a photon whose polarization is entangled with the atomic spin state, see Fig. 3.

Figure 1: Sketch of self-testing based on a violation of Bell's inequality with entangled atoms separated by a large distance. Each "device" of Alice and Bob is an independent apparatus for trapping and manipulating single atoms. Entanglement between the atoms is generated by entangling the spin of each atom with the polarization of a single photon. The photons are coupled into single-mode fibers and overlapped at a fiber beamsplitter. Coincident detection of two photons in Charlie's device heralds the entanglement. Alice and Bob then use their random number generators (RNGs) to select a measurement setting for fast and efficient read-out of the atomic state based on state-selective ionization, using particle detectors (CEMs) to detect the created ions (i) and electrons (e).

The photons are coupled into single-mode fibers and guided to Alice's location, where a Bell state measurement is implemented with a beamsplitter followed by a polarizing beamsplitter at each output port and four single-photon detectors, see Fig. 1. The atom excitation procedure is synchronized on a timescale that is much shorter than the photon duration. Careful adjustment of the experimental parameters ensures a spectral, temporal and spatial mode overlap of the photons close to unity [19]. This allows us to achieve a high two-photon interference quality, limited mostly by two-photon emission effects of a single atom. The joint measurement performed on these photons distinguishes two out of the four Bell states and ideally projects the atoms into either of the two states |ψ±⟩ = (|↑⟩_x|↓⟩_x ± |↓⟩_x|↑⟩_x)/√2 according to the outcome. Depending on the loading rate of the traps, 1 to 2 successful Bell state measurements are obtained per minute. At each success, a signal is sent to Alice and Bob and triggers the setting choices. The analysis basis is selected by the output of a fast quantum random number generator, which is based on counting photons emitted by an LED with a photo-multiplier tube [53,55]. The measurement outcome is obtained by a spin-state dependent ionization with a fidelity of 97% on a timescale ≤ 1.1 µs. Given that Alice's and Bob's locations are separated by 398 m, this warrants space-like separation of the measurements. Although (strict) space-like separation is not a necessary condition for self-testing, it is a strong guarantee that Alice's and Bob's measurement devices are indeed separated from each other and that information about the setting of one party is not available to the other one upon measurement, i.e. for assumptions 2 and 4 above.

CHSH Bell inequality
Let us label the measurement settings x = 0, 1 and y = 0, 1 for Alice and Bob respectively, with outcomes a = 0, 1 and b = 0, 1 for each spin measurement. For each pair of settings, we define the correlator E_xy = Σ_{a,b} (−1)^{a+b} P(a, b|x, y), where P(a, b|x, y) is the conditional probability of observing outcomes a and b when choosing the settings x and y. This allows us to define the Bell-CHSH value

S = E_00 + E_01 + E_10 − E_11,   (1)

which is upper bounded by 2 for any locally causal theory [25]. A significant violation of this bound can thus rule out this possibility, as conclusively demonstrated earlier [53], see also [50][51][52]. Note that here the values of 0 and 1 for the settings and outcomes were assigned arbitrarily; therefore, any of the 8 relabellings of Eq. (1) equally qualifies as a valid definition of the quantity S [56]. With fixed measurement settings, such equivalent rewritings of the CHSH expression may be necessary to obtain a violation of the local bound with different Bell states.
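As an illustration, Eq. (1) can be evaluated directly from the conditional probabilities. The following minimal sketch (illustrative only, not tied to the experimental data) recovers the maximal quantum value 2√2 for ideal singlet statistics with optimal settings:

```python
import math

def correlator(P, x, y):
    """E_xy = sum over a, b of (-1)^(a+b) * P(a, b | x, y)."""
    return sum((-1) ** (a + b) * P[(a, b, x, y)] for a in (0, 1) for b in (0, 1))

def chsh_value(P):
    """S = E_00 + E_01 + E_10 - E_11, bounded by 2 for locally causal models."""
    return (correlator(P, 0, 0) + correlator(P, 0, 1)
            + correlator(P, 1, 0) - correlator(P, 1, 1))

# Ideal singlet statistics with optimal measurement settings:
# P(a, b | x, y) = (1 + (-1)^(a+b+x*y) / sqrt(2)) / 4.
P = {(a, b, x, y): (1 + (-1) ** (a + b + x * y) / math.sqrt(2)) / 4
     for a in (0, 1) for b in (0, 1) for x in (0, 1) for y in (0, 1)}
print(chsh_value(P))  # ≈ 2.8284, the Tsirelson bound 2*sqrt(2)
```

Any of the 8 relabellings mentioned above simply amounts to permuting the signs in `chsh_value`.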

Self-testing a Bell state
Given assumption 1 above, we can associate to each measurement of Alice and Bob quantum observables A x and B y acting on two Hilbert spaces H A and H B of unknown dimension. Also, we can define the quantum state shared by the two parties as ρ AB ∈ L(H A ⊗ H B ). We emphasize that the internal functioning of the source and measurement boxes do not need to be known. We simply attribute a quantum state and measurement operators to the actual implementation.
Our aim is to identify the actual state ρ_AB from the observed statistics only. More precisely, we wish to estimate its fidelity with respect to a maximally entangled state of two qubits, that is,

F(ρ_AB) = max_{Λ_A, Λ_B} ⟨ψ−| (Λ_A ⊗ Λ_B)(ρ_AB) |ψ−⟩,   (2)

where the maximization is over all local trace-preserving maps Λ_{A/B} : H_{A/B} → C². The role of these maps Λ_{A/B} is to identify the subsystems inside the unknown Hilbert spaces H_{A/B} in which ρ_AB can be compared to the desired state.
Given an observed Bell-CHSH value S, the Bell state self-testing fidelity is defined as the minimum fidelity of the unknown quantum state ρ_AB which is compatible with the violation, i.e.

F(S) = min_{ρ_AB, A_x, B_y} F(ρ_AB)  subject to  E_00 + E_01 + E_10 − E_11 = S,   (3)

where the correlators are now given by E_xy = Tr(ρ_AB A_x ⊗ B_y). This quantity captures the relation between ρ_AB and the singlet state |ψ−⟩, one representative Bell state, that can be inferred from the observed statistics: if the quantum state is separable, then F ≤ 1/2; on the other hand, if F = 1, then we have the guarantee that local maps exist which identify perfectly a Bell state within the state ρ_AB, because this is the case for all admissible quantum realizations.
It has been shown that the self-testing fidelity F can be directly related to the sole knowledge of the Bell-CHSH value S [34]. The tightest known relation is given by [37]

F ≥ f(S) = max( 1/2 , 1/2 + (1/2) (S − S*)/(2√2 − S*) ),  with  S* = (16 + 14√2)/17 ≈ 2.11.   (4)
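Numerically, the bound is easy to evaluate. The sketch below assumes the equivalent linear form of the relation with threshold S* = (16 + 14√2)/17 ≈ 2.11, consistent with the threshold and fidelity values quoted later in the text:

```python
import math

S_STAR = (16 + 14 * math.sqrt(2)) / 17   # threshold CHSH value, ~ 2.11
TSIRELSON = 2 * math.sqrt(2)

def singlet_fidelity_bound(S):
    """Bell-state fidelity lower bound f(S): trivial (1/2) below S_STAR,
    rising linearly to 1 at the maximal quantum violation 2*sqrt(2)."""
    return max(0.5, 0.5 + 0.5 * (S - S_STAR) / (TSIRELSON - S_STAR))

print(singlet_fidelity_bound(2.0))        # 0.5: no useful violation, nothing certified
print(singlet_fidelity_bound(2.2715))     # ≈ 0.6146, cf. the ion-trap value in the text
print(singlet_fidelity_bound(TSIRELSON))  # 1.0: maximal violation certifies a Bell state
```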

Statistical analysis
The previous formula holds in the limit where the CHSH value S is known exactly. In order to analyze a real experiment with finite statistics, we consider that each run i = 1, . . . , n is characterized by an (unknown) CHSH value S_i and fidelity F_i. This fidelity can be different at each round and depend on past events. We are then interested in making a claim on the average fidelity

F̄ = (1/n) Σ_{i=1}^{n} F_i.   (5)

Other works have considered a different figure of merit for the certification of states in a non-I.I.D. setting [32]. In Appendix C, we show that the two approaches are equivalent to each other and that the numerical value provided by the average fidelity (5) has the advantage of being a direct quantifier of the source quality.
Assuming that the measurement settings are chosen independently by both parties, with a maximum bias τ with respect to a uniform distribution, i.e. 1/2 − τ ≤ P(x), P(y) ≤ 1/2 + τ, we show in Appendix B that

F̄ ≥ f(Ŝ)   (6)

is a one-sided confidence interval for F̄ with confidence level 1 − α, with

Ŝ = 8 [ I^{−1}_α(c, n − c + 1) − τ(1 + τ) ] − 4,   c = n (Ŝ_u + 4)/8,   (7)

where Ŝ_u is the average CHSH value observed over the n rounds assuming a uniform sampling of the settings, i.e. following Eq. (40) from Appendix B, and I^{−1} is the inverse regularized incomplete Beta function, i.e. I_y(a, b) = x for y = I^{−1}_x(a, b).

Figure 2: Expected self-testing fidelity F̄ resulting from the pre-selection model (lines) for different confidence levels (CL), as a function of the starting time t_s of the acceptance time window for heralding events (at a fixed t_e). The optimal start times for the different confidence levels according to the model are shown in Tab. 1. For comparison, the self-testing fidelity for the measured data (|ψ−⟩, symbols) is evaluated with the acceptance time window starting at t_s. Note that the pre-selection model is ab-initio and not a fit to the data. For details of the model and a comparison to the measured data, see the Appendix.
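In practice, the inverse regularized incomplete Beta function can be computed with only the Python standard library. The following sketch (illustrative, not the authors' code, with made-up counts standing in for the experimental data) obtains the Clopper–Pearson-type lower bound q̂ by bisecting on the exact binomial tail probability:

```python
import math

def log_binom_tail(n, c, q):
    """log P(Bin(n, q) >= c), computed in log space for numerical stability."""
    if c <= 0:
        return 0.0
    if q <= 0.0:
        return -math.inf
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log1p(-q)
            for k in range(c, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

def clopper_pearson_lower(n, c, alpha):
    """Lower confidence bound q_hat solving P(Bin(n, q_hat) >= c) = alpha,
    i.e. q_hat = I^{-1}_alpha(c, n - c + 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the tail probability is increasing in q
        mid = (lo + hi) / 2
        if log_binom_tail(n, c, mid) < math.log(alpha):
            lo = mid
        else:
            hi = mid
    return lo

# Made-up example: 1000 rounds, 820 of them won.
q_hat = clopper_pearson_lower(1000, 820, alpha=0.01)
print(q_hat)  # a bit below the raw fraction 0.82
```

From q̂, the certified average CHSH value follows via S = 8q − 4 after subtracting the small correction for the settings bias τ discussed in Appendix B, and the fidelity bound is then f(S).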

Preselection
In contrast to the results presented in Ref. [53], where all registered events were taken into account, here we use a pre-selected set of events to compute the Bell-CHSH violation and the subsequent self-testing fidelities of the heralded atomic states. This selection is based on a physical model which takes into account the detrimental two-photon emission effects of a single atom and allows us to define pre-selection criteria, here a time window for the acceptance of photons in the Bell state measurement, to improve the fidelity of the entangled atom-atom state. Details can be found in Appendix A. Importantly, these considerations are not based on the results observed during the experiment; they are based on an ab-initio model of the underlying excitation and emission processes. Therefore, these considerations allow for the determination of a significance level and of the desired amount of data prior to the data acquisition stage, in agreement with the requirements of confidence interval construction. This selection can then be seen as a pre-selection of the data or, equivalently, as a state preparation. In particular, it does not open the detection loophole or introduce expectation bias [57].
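As a minimal illustration of such a time-window pre-selection (hypothetical event times and a simplified record format, not the experimental data):

```python
# Keep only heralding events whose first photon detection falls after t_s
# and whose second detection falls before t_e. Times are in nanoseconds;
# the cut values below follow the window quoted in the text.
T_S, T_E = 748.0, 895.0  # acceptance window [ns]

def preselect(events, t_s=T_S, t_e=T_E):
    """events: iterable of (t_first_click, t_second_click) pairs."""
    return [(t1, t2) for (t1, t2) in events if t1 >= t_s and t2 <= t_e]

events = [(700.0, 850.0), (750.0, 880.0), (760.0, 900.0), (800.0, 890.0)]
print(preselect(events))  # keeps the 2nd and 4th events
```

Because the cut values come from the ab-initio model alone, applying such a filter does not condition on the measurement outcomes.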

Results
For the evaluation we use the data of events heralding the |ψ−⟩ state from the loophole-free Bell test [53], taken between 05.02.2016 and 24.06.2016 (25211 events). Fig. 2 shows the resulting lower bound on the average fidelity F̄ for the ab-initio model and for the data set with the same pre-selection, as a function of the acceptance window starting time t_s, for different confidence levels. The model allows us to determine the acceptance time window start time t_s and end time t_e yielding an optimal expected lower bound on the fidelity F̄ for each confidence level shown in Fig. 2.
The results for the pre-selected data are shown in Tab. 1. For the calculation of the lower bound on the fidelity F̄ we consider a bias of the RNGs bounded by τ = 6.3 × 10−4 (arising from the "paranoid" model for the predictability [53]).
The lower bound on the fidelity exceeding the value of 0.5 (at a confidence level of up to 1 − 1.0 × 10−7 for t_s = 746 ns) represents the first device-independent certification of a distributed entangled state. Moreover, an evaluation of the full data set without any pre-selection yields a Bell state fidelity of 0.5061 at 99% confidence, and a Bell state fidelity larger than 0.5 can be certified even at a confidence level as high as 99.7%.
As a comparison, note that a confidence interval could also be constructed from Hoeffding's inequality [58], yielding

F̄ ≥ f( Ŝ_u − 8 √( ln(1/α) / (2n) ) − 8τ(1 + τ) ).   (8)

The conclusion obtained with this inequality would however be significantly weaker. For instance, the claim that the fidelity F̄ is nontrivial (i.e. ≥ 1/2) for the whole data set would not be statistically significant. Indeed, the corresponding statistical level is α = 6.5%, i.e. about 20 times larger than guaranteed by our bound. The average fidelity guaranteed with pre-selection would also be significantly lower, about twice closer to the trivial value of 1/2 than 0.5550. Additionally, we applied our method to the data of the table-top Bell test performed between two ions separated by 1 meter reported in [29]. Our statistical analysis yields an average Bell violation of 2.2715 or higher at 99% confidence level, clearly above the threshold 2.11. This sets a lower bound on the Bell state fidelity of 61.46% at this confidence level, hence guaranteeing for the first time that the states distributed in this experiment had a significant average Bell state fidelity.
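For intuition, the Hoeffding-type bound can be sketched as follows (with made-up round numbers, not the experimental counts); it shows how the certifiable significance level decays with the gap between the observed winning fraction and the threshold q* = (S* + 4)/8 ≈ 0.763 needed for a nontrivial fidelity:

```python
import math

def hoeffding_lower_bound(n, wins, alpha):
    """One-sided Hoeffding lower bound on the average winning probability."""
    return wins / n - math.sqrt(math.log(1 / alpha) / (2 * n))

def alpha_needed(n, wins, q_target):
    """Significance level at which Hoeffding just certifies q >= q_target."""
    gap = wins / n - q_target
    return math.exp(-2 * n * gap * gap) if gap > 0 else 1.0

# Made-up example: 25000 rounds with an observed winning fraction of 0.78.
print(hoeffding_lower_bound(25000, 19500, alpha=0.01))
print(alpha_needed(25000, 19500, q_target=0.763))
```

The exact binomial (Clopper–Pearson) construction used in the main analysis is uniformly tighter than this bound, which is the source of the roughly twentyfold improvement in α mentioned above.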
Finally, we applied our method to data from [59] obtained at a distance of 1.3 km. Due to the limited number of events (545), the method can only guarantee a fidelity larger than 50% with a confidence of ∼ 94.2%. Still, this demonstrates that the method can be used on different systems without the need of knowing their specific details.

Discussion
We have derived a bound on the average fidelity of a measured state with respect to a Bell state from the sole knowledge of the observed Bell-CHSH value, which is free of the I.I.D. assumption. This bound was obtained by constructing a non-I.I.D. confidence interval for the sum of n independent binary random variables. We used it to quantify, in a device-independent way, the quality of a bipartite state distributed over 398 m in a real-world elementary quantum network link. These results guarantee that this link is suitable for integration into a quantum network, either directly or as part of a quantum repeater.

A Preselection of heralding events
To allow filtering as a pre-selection, we developed an ab-initio physical model, independently of our measurement results, to find an optimal filtering based on the model only. This model describes the photon emission process of a single atom excited by a short laser pulse and takes into account all important processes within its multilevel structure. Thereby we are able to calculate the expected fidelity of the entangled state of two atoms heralded by a two-photon coincidence at a certain time. The full description of the model goes far beyond the focus of the present work and can be found in [60,61]; here we only present a brief sketch of it.
For the generation of a photon whose polarization is entangled with the atomic spin state, the atom is excited by a laser pulse resonant to the transition 5²S_{1/2}, F = 1 → 5²P_{3/2}, F = 0. The temporal shape of the pulse is approximately Gaussian with a FWHM of 22 ns, see Fig. 4. After the successful emission of a photon, ideally, the atom should not interact with the excitation laser due to selection rules, see Fig. 3(a). In practice, however, the atomic state remains weakly sensitive to the excitation laser for two reasons. First, there are unavoidably small polarization misalignments of the excitation laser, i.e. its polarization is not perfectly aligned along the quantization axis (imperfect π-polarization), allowing for a re-excitation of the 5²P_{3/2}, F = 0, m_F = 0 level, see Fig. 3(b). Second, off-resonant scattering via the 5²P_{3/2}, F = 1 level is possible (Fig. 3(c),(d)). Moreover, before the emission of a photon into the desired mode takes place, there is a finite probability that the atom emits a first photon in a π-transition, which is not collected by the optics. These multiple photon emissions are detrimental for the quality of the atomic state announced by the detection of the photons in the Bell state measurement in two different ways. On the one hand, the state of the atom can be changed by the scattering of additional photons. On the other hand, the interference quality of the photons is reduced, since the temporal shape and coherence of the photonic wavepackets are affected.

Importantly, the unwanted multiphoton processes happen predominantly during the excitation. Thus, to filter them, we only accept detection events for which the first detector click is obtained after a time t_s at which the excitation laser pulse is essentially off, see Fig. 4. Additionally, to reduce the dark-count contribution to the heralding events, we define a maximal time t_e for the detection of the second photon. A later start time t_s increases the entanglement swapping fidelity, and thereby the expected S-value for the CHSH inequality, but at the expense of the number of obtained events, see Fig. 5. Since the measured S-values depend not only on the entanglement swapping fidelity but also on other properties, e.g., the atomic state measurement fidelity and the coherence time of the entangled states, we use the experimental parameters as specified in [53] to predict the experiment's S-value. The optimal selection of the time window [t_s = 748 ns, t_e = 895 ns], considering Eq. (7) for a 99% confidence interval, reduces the number of heralding events by approximately a factor of 2, but the atoms are expected to be in an entangled state of higher quality. The contribution of t_e to the fidelity was found to be negligible; it was fixed at t_e = 890 ns.

B Finite statistics analysis

In this section, we detail the construction of the confidence interval on the average singlet fidelity reported in the main text.

B.1 Model

In the experimental situation described in the main text, the settings used after the i-th heralding event can be described by two random variables X_i and Y_i. These variables follow a global probability distribution P(X = x, Y = y), where x, y ∈ {0, 1}^n for a binary choice of settings. Similarly, two random variables A_i and B_i can be used to describe the outcomes observed upon measuring the state in the i-th round.
By assumptions 2-4 of the main text, the settings and outcomes follow a joint probability distribution of the form

P(a, b, x, y) = Π_{i=1}^{n} P_i(a_i, b_i | x_i, y_i, past_i) P_i(x_i, y_i | past_i).   (10)

Here, a, b ∈ {0, 1}^n are the possible outcome strings in the binary case, P_i(a_i, b_i | x_i, y_i, past_i) describes the behavior sampled in the i-th round, and past_i = {a_j, b_j, x_j, y_j}_{j<i} stands for any information available from the past of round i, such as the previous settings and outcomes. The settings distribution can be decomposed into the measurement rounds in a similar fashion,

P(x, y) = Π_{i=1}^{n} P_i(x_i, y_i | past_i).   (13)

We associate to each measurement round i the CHSH value

S_{i|past_i} = Σ_{a,b,x,y} (−1)^{a+b+xy} P_i(a, b | x, y, past_i).   (14)

This quantity S_{i|past_i} is the expectation value of the S parameter given in Eq. (1) for the given round i. We also define the singlet fidelity F_{i|past_i}, which is bounded according to Eq. (4) as

F_{i|past_i} ≥ f(S_{i|past_i}).   (15)

Note that these statistical parameters may be different for all rounds i and may depend on past events.

B.2 Estimation
Before focusing on the fidelity, our figure of merit, let us estimate the Bell contribution corresponding to a given round i. For this, we introduce the statistic

T_{i|past_i} = χ(A_i ⊕ B_i = X_i Y_i) / (4 P_i(X_i, Y_i | past_i)),   (16)

where χ is the indicator function, i.e. χ(condition) = 1 if the condition is true and χ(condition) = 0 for a false condition, ⊕ is the addition modulo 2, and the term in the denominator,

4 P_i(X_i, Y_i | past_i),   (17)

refers to the probability with which the observed settings have been sampled, a notation customary in statistics (c.f. the usual definition of the Fisher information for instance). The expectation value of this estimator is directly related to the CHSH violation on round i given the past:

E[T_{i|past_i}] = 1/2 + S_{i|past_i}/8.   (18)

This expression thus provides a good estimation of the Bell violation contribution of round i. Note that the relation (18) is valid for every distribution of the settings which is independent of the behavior of A and B according to Eq. (10).
In the case where the settings are chosen uniformly, i.e. P_i(x_i, y_i | past_i) = 1/4, the random variable T_{i|past_i} is a Bernoulli variable whose only possible values are 0 and 1. It can then be interpreted as a binary game which is either won (T_{i|past_i} = 1) or lost (T_{i|past_i} = 0). The CHSH contribution of round i can then be reinterpreted in terms of the winning probability q_{i|past_i} = P(T_{i|past_i} = 1) of this game, such that

S_{i|past_i} = 8 q_{i|past_i} − 4.   (19)
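This reinterpretation as a binary game is straightforward to implement. A minimal sketch (hypothetical record format, uniform settings assumed) that estimates q and the corresponding CHSH value from raw rounds:

```python
def chsh_game_won(a, b, x, y):
    """A round is won when a XOR b equals x AND y."""
    return (a ^ b) == (x & y)

def estimate_S(records):
    """records: list of (a, b, x, y) tuples; returns (q, S = 8q - 4)."""
    wins = sum(chsh_game_won(*r) for r in records)
    q = wins / len(records)
    return q, 8 * q - 4

# A local deterministic strategy (always output 0) wins 3 of the 4 settings:
records = [(0, 0, x, y) for x in (0, 1) for y in (0, 1)]
print(estimate_S(records))  # (0.75, 2.0): exactly the local bound
```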

B.3 Settings choice bias
In practice, it may be difficult to guarantee that the choice of settings is exactly uniform. One can then resort to a partial characterization of the settings' distribution. For instance, consider the case where the settings of Alice and Bob are chosen independently,

P_i(x_i, y_i | past_i) = P_i(x_i | past_i) P_i(y_i | past_i),   (20)

with

P_i(x_i = 0 | past_i) = 1/2 + τ_x,   P_i(y_i = 0 | past_i) = 1/2 + τ_y,   (21)

and we only have the guarantee that the local biases are bounded, |τ_x|, |τ_y| ≤ τ, by some maximal value τ ≤ 1/2. In this case the statistic (16), as well as the CHSH value (14), cannot be evaluated directly. We can nevertheless bound its behavior. For this, let us consider the statistic that would correspond to a uniform choice of settings,

T^u_{i|past_i} = χ(A_i ⊕ B_i = X_i Y_i).   (23)

As mentioned before, this statistic is a Bernoulli random variable taking value either 0 or 1. Its winning probability q^u_{i|past_i} = P(T^u_{i|past_i} = 1) can be evaluated without knowledge of the settings distribution. Its expectation value is given by

q^u_{i|past_i} = Σ_{x,y} P_i(x, y | past_i) f_{xy},   (26)

where f_{xy} = P_i(A_i ⊕ B_i = xy | x, y, past_i) denotes the winning probability for the pair of settings (x, y), so that q_{i|past_i} = (1/4) Σ_{x,y} f_{xy}. The difference between the two quantities reads

q^u_{i|past_i} − q_{i|past_i} = (1/2) [ (τ_x + τ_y + 2τ_xτ_y) f_00 + (τ_x − τ_y − 2τ_xτ_y) f_01 + (−τ_x + τ_y − 2τ_xτ_y) f_10 + (−τ_x − τ_y + 2τ_xτ_y) f_11 ].

Without loss of generality we set 0 ≤ τ_y ≤ τ_x ≤ τ; all the other cases follow directly by a permutation of the outcomes or the exchange of x and y. Using f_xy ∈ [0, 1], τ_x + τ_y + 2τ_xτ_y ≥ 0, −τ_x + τ_y − 2τ_xτ_y ≤ 0 and −τ_x − τ_y + 2τ_xτ_y ≤ 0, and keeping only the positive terms, one finds

q^u_{i|past_i} − q_{i|past_i} ≤ (1/2) max( τ_x + τ_y + 2τ_xτ_y , 2τ_x ) ≤ τ(1 + τ),

which holds for all allowed values of τ_x and τ_y. Therefore, we obtain

q_{i|past_i} ≥ q^u_{i|past_i} − τ(1 + τ),   (32)

meaning that a lower bound on the winning probability q^u_{i|past_i} of the uniform statistic T^u_{i|past_i} gives rise to a lower bound on q_{i|past_i}. In order to estimate q_{i|past_i} with a distribution of settings which is not fully known, we can thus safely estimate the CHSH value with the statistic (23), effectively assuming that the settings are chosen uniformly, and then correct the winning probability q^u_{i|past_i} according to the value of τ, as expressed in (32). This provides a lower bound on the actual winning probability q_{i|past_i} of (16).
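The worst-case effect of the settings bias can be checked by brute force. The sketch below assumes a correction of the form q^u − q ≤ τ(1 + τ) and scans all deterministic winning profiles f_xy ∈ {0, 1}; the experiment has τ ≈ 6 × 10−4, but an exaggerated τ = 0.1 is used here for illustration:

```python
import itertools

def max_bias_gap(tau, steps=21):
    """Largest possible q_u - q = sum_xy (P(x,y) - 1/4) f_xy over binary
    winning profiles f_xy and setting biases tau_x, tau_y in [0, tau]."""
    worst = 0.0
    grid = [tau * k / (steps - 1) for k in range(steps)]
    for tx, ty in itertools.product(grid, repeat=2):
        P = {(0, 0): (0.5 + tx) * (0.5 + ty), (0, 1): (0.5 + tx) * (0.5 - ty),
             (1, 0): (0.5 - tx) * (0.5 + ty), (1, 1): (0.5 - tx) * (0.5 - ty)}
        for f in itertools.product((0, 1), repeat=4):
            gap = sum((P[s] - 0.25) * fv for s, fv in zip(sorted(P), f))
            worst = max(worst, gap)
    return worst

tau = 0.1  # exaggerated bias for illustration
print(max_bias_gap(tau) <= tau * (1 + tau) + 1e-12)  # True: the bound holds
```

The maximum is attained at τ_x = τ_y = τ with only f_00 = 1, where the gap equals exactly τ + τ².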
To simplify the notation, we now drop the explicit conditioning on the past, and thus simply write e.g. S i , q i , T i and F i for the quantities introduced above.

B.4 Bounding the fidelity
We construct a statistical parameter for the whole experiment corresponding to the average Bell state fidelity F̄ = (1/n) Σ_{i=1}^n F_i. Thanks to the convex relation between the CHSH violation and the singlet fidelity, Eq. (4), this quantity can be bounded from the average CHSH violation S̄ or, equivalently, using relation (19), from the average winning probability q̄ = (1/n) Σ_i q_i. In particular, a left-sided confidence interval for q̄ gives rise to a left-sided confidence interval for the singlet fidelity. By relation (32), a left-sided confidence interval for q̄^u = (1/n) Σ_i q^u_i gives rise to a left-sided confidence interval for q̄, and thus also for the singlet fidelity. Let us thus now focus on the average winning probability q̄^u.
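To illustrate how a bound on the average winning probability translates into a fidelity bound, here is a minimal Python sketch. It assumes the standard CHSH-game relation S = 8q − 4 and, in place of the unstated Eq. (4), the linear extraction bound of Kaniewski, F ≥ 1/2 + (1/2)(S − β*)/(2√2 − β*) with β* = (16 + 14√2)/17; the constants actually used in the paper may differ.

```python
import math

# Non-trivial threshold of the assumed extraction bound (Kaniewski)
BETA_STAR = (16 + 14 * math.sqrt(2)) / 17   # ~2.1055

def chsh_from_winning_prob(q):
    """CHSH value from the CHSH-game winning probability: S = 8q - 4."""
    return 8 * q - 4

def singlet_fidelity_lower_bound(q):
    """Lower bound on the average singlet fidelity from the average
    winning probability, via the assumed linear extraction bound in S."""
    s = chsh_from_winning_prob(q)
    f = 0.5 + 0.5 * (s - BETA_STAR) / (2 * math.sqrt(2) - BETA_STAR)
    return max(f, 0.5)   # the bound is trivial (1/2) below BETA_STAR

# a perfect singlet wins the CHSH game with probability 1/2 + sqrt(2)/4
q_max = 0.5 + math.sqrt(2) / 4
print(singlet_fidelity_lower_bound(q_max))
```

The function is monotone in q, so any left-sided confidence interval for q̄ maps directly to a left-sided confidence interval for the fidelity.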

B.5 A confidence interval for the average winning probability
The random variables T^u_i in Eq. (23) being estimators for the parameters q^u_i, we use their average T̄^u = (1/n) Σ_i T^u_i to estimate q̄^u. This gives rise, via relation (19), to an effective CHSH value S̄^u = 8T̄^u − 4, which can be evaluated in practice directly from the observed data, without assumption on the distribution of measurement settings. Note that each random variable T^u_i is a Bernoulli variable with parameter q^u_i. Therefore, T̄^u is a so-called (renormalized) Poisson binomial random variable. The distribution of such a random variable in terms of the average parameter q̄^u was characterized by Hoeffding in 1956 [62]. We recall this result here.
Theorem B.1 (Hoeffding, 1956). Let T̄ = (1/n) Σ_{i=1}^n T_i be the average of n independent Bernoulli variables T_i with parameters q_i, and let q̄ = (1/n) Σ_i q_i. If c and d are two integers such that 0 ≤ c ≤ nq̄ ≤ d ≤ n, then

P(c ≤ nT̄ ≤ d) ≥ Σ_{k=c}^{d} C(n, k) q̄^k (1 − q̄)^{n−k},

where C(n, k) denotes the binomial coefficient. This theorem says that within all sets of n choices of Bernoulli variables {T_i}_{i=1}^n with a fixed average parameter q̄ = (1/n) Σ_i q_i, the one producing the largest tail distribution for the average variable T̄ is the set of n identically distributed Bernoulli variables with q_i = q̄ ∀i. The tail probability then follows a binomial distribution. Since q_i = q̄ ∀i is an admissible parameter value, this bound is tight.
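Thm. B.1 can be checked numerically on a small example. The following Python sketch (with illustrative parameters) computes the exact distribution of a sum of non-identical Bernoulli variables by direct convolution and verifies that an interval around the mean carries at least as much probability as in the i.i.d. binomial case with the same average parameter.

```python
from math import comb

def poisson_binomial_pmf(qs):
    """Exact PMF of the sum of independent Bernoulli(q_i) variables,
    built by convolving in one Bernoulli factor at a time."""
    pmf = [1.0]
    for q in qs:
        new = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            new[k] += p * (1 - q)   # this trial fails
            new[k + 1] += p * q     # this trial succeeds
        pmf = new
    return pmf

def interval_prob(pmf, c, d):
    """P(c <= S <= d) for a distribution given by its PMF."""
    return sum(pmf[c:d + 1])

qs = [0.2, 0.4, 0.6, 0.8]                 # heterogeneous parameters
n, qbar = len(qs), sum(qs) / len(qs)      # average parameter 0.5
binom = [comb(n, k) * qbar**k * (1 - qbar)**(n - k) for k in range(n + 1)]

c, d = 1, 3                               # c <= n*qbar = 2 <= d
p_het = interval_prob(poisson_binomial_pmf(qs), c, d)
p_bin = interval_prob(binom, c, d)
print(p_het, p_bin)
assert p_het >= p_bin                     # Hoeffding's comparison holds
```

Equivalently, the binomial case has the largest tails outside [c, d], which is the direction exploited in the confidence interval below.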
We recall that the binomial cumulative distribution can be expressed in terms of the regularized incomplete Beta function I: for X ~ Bin(n, q) and an integer k,

P(X ≥ k) = I_q(k, n − k + 1), (43)

or equivalently P(X ≤ k − 1) = 1 − I_q(k, n − k + 1). We then denote the inverse regularized incomplete Beta function by I^{−1}, i.e. I_y(a, b) = x for y = I^{−1}_x(a, b). Note that even if Thm. B.1 applies to independent (though not necessarily identically distributed) random variables, it is still useful in our context, in which we do not wish to assume rounds to be either independent or identically distributed. The main reason for this is that our figure of merit is the average winning probability q̄^u, defined as the average of the individual winning probabilities q^u_i = q^u_{i|past_i} conditioned on past events. Hence, even if the physical process under study may depend on previous rounds, the random variables to which we would like to apply the theorem are statistically independent from each other. Indeed, the random variable T^u_i of each round i is fully characterized by a single and well-defined parameter q^u_i. We can thus use this result to construct a confidence interval for the average CHSH winning probability q̄^u.

Theorem B.2. Given Bernoulli random variables T_i with parameters q_i for i = 1, . . . , n and 0 ≤ α ≤ 1/2, the interval [q̂, 1] with the random variable

q̂ = I^{−1}_α(nT̄ − 1, n − nT̄ + 2) for nT̄ ≥ 1, and q̂ = 0 for nT̄ = 0,

is a one-sided confidence interval for q̄ = (1/n) Σ_i q_i with confidence level 1 − α.

Proof. We need to show that

P(q̂ > q̄) ≤ α (46)

for all possible sets of Bernoulli variables characterized by parameters 0 ≤ q_i ≤ 1 with (1/n) Σ_i q_i = q̄. Condition (46) states that whatever the unknown distribution of the Bernoulli variables T_i happens to be, the value of q̂ computed from them (the random variable q̂ depends on the observed T̄) can be higher than the actual parameter q̄ only with probability at most α. q̂ then constitutes a reasonable lower bound for the parameter q̄.
The case for which nT̄ = 0 is clear. We can thus assume that nT̄ ≥ 1. But before starting, let us introduce the function

g(z) = I^{−1}_α(z, n − z + 1).

This function describes the trade-off between the parameters q and z in I_q(z, n − z + 1). Let us remember that the incomplete Beta function I_q(α, β) is the cumulative distribution function of the Beta distribution with parameters α and β. Therefore, I_q(z, n − z + 1) is strictly increasing in q ∈ [0, 1], even for non-integer values of n and z. At the same time, 1 − I_q(z, n − z + 1), seen as a function of z ∈ R, is a cumulative distribution for the continuous binomial distribution with parameter q, so it strictly increases with z ∈ [0, n + 1] [63]. In other words, I_q(z, n − z + 1) strictly decreases with z. Since g(z) is defined as the value of q which leaves I_q(z, n − z + 1) invariant (and equal to α) when z changes, it is a strictly increasing function of z ∈ [0, n + 1]: increasing z decreases I_q(z, n − z + 1) unless q increases as well.
Let us now write

P(q̂ ≤ q̄) = Σ_{k=0}^{d} P(nT̄ = k), (48)

where P(nT̄ = k) is the probability distribution of the sum nT̄ of arbitrary Bernoulli random variables and d is the largest integer such that g(d − 1) ≤ q̄, hence g(d) > q̄. The sum contains all terms between 0 and d because g(z) is an increasing function. Another implication of the monotonicity of I_q(z, n − z + 1) with respect to q is that its inverse function I^{−1}_α(z, n − z + 1) increases with α. Therefore g(z) ≤ I^{−1}_{1/2}(z, n − z + 1) for α ≤ 1/2. I^{−1}_{1/2}(z, n − z + 1) is the median of a Beta distribution, which can always be bounded by its mean [64]: I^{−1}_{1/2}(z, n − z + 1) ≤ z/(n + 1). Therefore, we have g(d) ≤ d/(n + 1) ≤ d/n. Since g(d) ≥ q̄ by the definition of d, we obtain q̄ ≤ d/n, so the condition d ≥ nq̄ of Thm. B.1 is satisfied. We can thus use Thm. B.1 to lower bound the probability Eq. (48) by the binomial case:

P(q̂ ≤ q̄) ≥ Σ_{k=0}^{d} C(n, k) q̄^k (1 − q̄)^{n−k}.

Using Eq. (43) we obtain

P(q̂ ≤ q̄) ≥ 1 − I_q̄(d + 1, n − d). (54)

Since g(d) ≥ q̄, or

q̄ ≤ g(d), (55)

and I_q(d, n − d + 1) is an increasing function of q, we can write

I_q̄(d + 1, n − d) ≤ I_q̄(d, n − d + 1) ≤ I_{g(d)}(d, n − d + 1) = α,

where we applied the incomplete Beta function to both sides of Eq. (55) and used the monotonicity of the Beta function again in the last line. Combining with Eq. (54) completes the proof.
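In practice q̂ can be computed by inverting the regularized incomplete Beta function numerically. The sketch below does this by bisection, using the binomial identity I_q(z, n − z + 1) = P(Bin(n, q) ≥ z) for integer z; the definition q̂ = g(nT̄ − 1) follows the construction in the proof of Thm. B.2, and the round numbers are illustrative.

```python
from math import exp, lgamma, log

def binom_tail(q, z, n):
    """I_q(z, n - z + 1) = P(Bin(n, q) >= z) for integer z, computed
    in log space to avoid overflow for large n."""
    if z <= 0:
        return 1.0
    log_terms = [
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(q) + (n - k) * log(1 - q)
        for k in range(z, n + 1)
    ]
    return sum(exp(t) for t in log_terms)

def g(z, n, alpha, tol=1e-10):
    """The function g of the proof: the value of q for which
    I_q(z, n - z + 1) = alpha, found by bisection in q."""
    if z <= 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_tail(mid, z, n) < alpha:
            lo = mid    # tail too small: q must grow
        else:
            hi = mid
    return lo

def q_hat(wins, n, alpha):
    """Lower end of the one-sided confidence interval [q_hat, 1]."""
    return g(wins - 1, n, alpha) if wins >= 1 else 0.0

# illustrative numbers: 5400 wins over 6400 rounds, confidence 99%
print(q_hat(5400, 6400, 0.01))
```

The resulting q̂ lies slightly below the empirical frequency, with the gap shrinking as n grows, as expected from a one-sided Beta-quantile bound.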

C Relation with the global fidelity
In this work, we quantify the quality of a multi-round state by its average Bell state fidelity over all rounds, F̄ = (1/n) Σ_{i=1}^n F_i; see Eq. (5) of the main text. Other multi-round self-testing works have rather used the global fidelity [32] F_g = ⟨φ+|^⊗n ρ |φ+⟩^⊗n, where |φ+⟩^⊗n is the state of n copies of a maximally entangled two-qubit state. The quantifier F_g was used both in the sequential [31] and parallel repetition settings [39,65,66]. In this appendix, we discuss the relation between these two fidelities. As we show below, strict bounds relate the two fidelity definitions, meaning that they are operationally equivalent (up to some rescaling and a finite correction), see also [67]. Although these two fidelities are equivalent to each other, it is worth noting that their actual values scale differently in the presence of a finite level of experimental noise. To see this, consider the case where each round i = 1, . . . , n produces a state with single-round singlet fidelity 1 − ε, as obtained e.g. by repeatedly measuring a Werner state. In this case, the expression given in Eq. (59) decreases with the number of rounds: F_g = (1 − ε)^n ≈ 1 − nε for small ε. Therefore, this quantity does not directly reflect the quality of the setup that was used to create the state: knowledge of the number of rounds n is needed to deduce the value of ε from F_g. In contrast, the value of Eq. (58) is here F̄ = 1 − ε for all n. Hence, the average fidelity does not depend on the number of rounds n performed during the experiment and directly reflects the quality of the source.
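The different scalings can be made concrete with a two-line computation. In this sketch each round independently produces a state with singlet fidelity 1 − ε, so F̄ = 1 − ε while F_g = (1 − ε)^n (the values of ε and n are illustrative):

```python
def fidelities_product_round_states(eps, n):
    """Average vs global fidelity when each of n rounds independently
    produces a state with single-round singlet fidelity 1 - eps."""
    f_avg = 1 - eps          # Eq. (58): independent of n
    f_glob = (1 - eps) ** n  # Eq. (59): decays exponentially in n
    return f_avg, f_glob

for n in (1, 10, 100, 1000):
    print(n, fidelities_product_round_states(0.01, n))
```

Already for ε = 1% and n = 1000 the global fidelity is essentially zero while the average fidelity stays at 99%, which is the point made above.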
For concreteness and simplicity, we now consider the estimation of both fidelities above on an arbitrary global state ρ and with fixed extraction maps. This case is compatible with both the sequential and parallel repetition scenarios [68].
Theorem C.1. When evaluating (58) and (59) on a global state ρ ∈ L(H_A^⊗n ⊗ H_B^⊗n), the following inequalities hold:

1 − n(1 − F̄) ≤ F_g ≤ F̄.

Moreover, these bounds are tight.
Proof. Let us decompose the state ρ across the n rounds. For each round, we further decompose the Hilbert space of Alice and Bob into a first part spanned by φ+ and the rest of the space. Since for both fidelities we are only interested in the overlap with the φ+ state in each round, we can neglect all coherences between these subspaces. Without loss of generality, the full state can then be written in the form

ρ = Σ_{v∈{0,1}^n} c_v σ^1_v ⊗ · · · ⊗ σ^n_v.

Here, c_v ≥ 0 ∀v, Σ_{v∈{0,1}^n} c_v = 1, and the σ^i_v are quantum states such that σ^i_v = |φ+⟩⟨φ+| for v_i = 0 and Tr[σ^i_v |φ+⟩⟨φ+|] = 0 otherwise. Noting that the single-round fidelities in (58) are given by F_i = ⟨φ+|ρ_i|φ+⟩, where ρ_i is the partial trace of ρ over all rounds except round i, the two fidelities can now be written explicitly in terms of the c_v coefficients of ρ: F_g = c_{(0,0,...,0)} and F_i = Σ_{v: v_i=0} c_v.
Since the component c_{(0,0,...,0)} appears in both expressions, it is clear that F_g cannot be larger than F_i for any i. Therefore, it also cannot be larger than their mean: F̄ ≥ F_g, and we have the upper bound of Eq. (61). The choice c_{(1,1,...,1)} = 1 − c_{(0,0,...,0)} saturates this bound.
To show the opposite bound, we first note that the quantities F̄ and F_g are invariant under permutation of the rounds. We can thus symmetrize the state (62) and express it in terms of just n + 1 parameters d_j: after symmetrization, every coefficient c_v with exactly j indices satisfying v_i = 0 takes the same value d_j. The fidelities then take the form

F_g = d_n,   F̄ = Σ_{j=0}^{n} (j/n) C(n, j) d_j,

and we have the normalization condition Σ_{j=0}^{n} C(n, j) d_j = 1 and positivity condition d_j ≥ 0, where C(n, j) denotes the binomial coefficient. We can now write

n(1 − F̄) = Σ_{j=0}^{n} [n C(n, j) − n C(n − 1, j − 1)] d_j ≥ Σ_{j=0}^{n−1} C(n, j) d_j = 1 − F_g,

as desired. Here, we used the relations above, the identity (j/n) C(n, j) = C(n − 1, j − 1), and (n − 1) C(n, j) − n C(n − 1, j − 1) ≥ 0, which is true for j ≤ n − 1. The inequality 1 − F_g ≤ n(1 − F̄) is saturated for the choice d_{n−1} = (1 − d_n)/n, d_j = 0 ∀ j ≤ n − 2.
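The two bounds of Thm. C.1 and the saturation of the lower bound can be checked numerically. The sketch below assumes a symmetrized state parametrized by weights d_j assigned to each configuration with exactly j rounds in the φ+ state (this parametrization is a reconstruction matching the proof; the loop values are illustrative).

```python
import random
from math import comb

def fidelities(d, n):
    """Average and global fidelity of a symmetrized n-round state with
    weight d[j] on each configuration having exactly j 'good' rounds.
    Assumes the normalization sum_j C(n, j) d[j] = 1."""
    f_avg = sum((j / n) * comb(n, j) * d[j] for j in range(n + 1))
    f_glob = d[n]
    return f_avg, f_glob

random.seed(0)
n = 6
for _ in range(1000):
    w = [random.random() for _ in range(n + 1)]
    tot = sum(comb(n, j) * w[j] for j in range(n + 1))
    d = [x / tot for x in w]                        # enforce normalization
    f_avg, f_glob = fidelities(d, n)
    assert f_glob <= f_avg + 1e-12                  # upper bound of Thm C.1
    assert 1 - f_glob <= n * (1 - f_avg) + 1e-12    # lower bound of Thm C.1

# saturation of the lower bound: d_{n-1} = (1 - d_n)/n, rest zero
d = [0.0] * (n + 1)
d[n] = 0.7
d[n - 1] = (1 - d[n]) / n
f_avg, f_glob = fidelities(d, n)
print(1 - f_glob, n * (1 - f_avg))
```

For the saturating choice, the two printed quantities coincide, confirming that the factor n in the lower bound cannot be improved.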