Optimized Entanglement Purification

We investigate novel protocols for entanglement purification of qubit Bell pairs. Employing genetic algorithms for the design of the purification circuit, we obtain shorter circuits achieving higher success rates and better final fidelities than what is currently available in the literature. We provide a software tool for analytical and numerical study of the generated purification circuits, under customizable error models. These new purification protocols pave the way to practical implementations of modular quantum computers and quantum repeaters. Our approach is particularly attentive to the effects of finite resources and imperfect local operations - phenomena neglected in the usual asymptotic approach to the problem. The choice of the building blocks permitted in the construction of the circuits is based on a thorough enumeration of the local Clifford operations that act as permutations on the basis of Bell states.

The eventual construction of a scalable quantum computer is bound to revolutionize both how we solve practical problems like quantum simulation, and how we approach foundational questions ranging from topics in computational complexity to quantum gravity. However, numerous engineering hurdles have to be surmounted along the way, as exemplified by today's race to implement practical quantum error-correcting codes. While a great many high-performing error-correcting codes have been constructed by theorists, only recently did experiments start approaching hardware-level error rates sufficiently close to the threshold at which codes actually start to help [1,2]. A promising approach is the modular architecture [3,4] for quantum computers based on superconducting circuits [5], trapped ions [6,7], NV centers [8], or other systems [9-11]. The central theme is the creation of a network of small independent quantum registers of a few qubits, with connections capable of distributing entangled pairs between nodes [4,12]. Such an architecture avoids the difficulty of creating a single complex structure as described in more monolithic approaches and offers a systematic way to minimize undesired crosstalk and residual interactions while scaling the system. Moreover, the same modules might also be used for the design of quantum repeaters for use in quantum communication [13,14].
Experimentally, there have been significant advances in creating entanglement between modules, with demonstrations in trapped ions [6,15], NV centers [16,17], neutral atoms [18], and superconducting circuits [5]. However, the infidelity of created Bell pairs is on the order of 10%, while noise due to local gates and measurements can be much lower than 1%. Purification of the entanglement resource will be necessary before successfully employing it for fault-tolerant computation or communication. Although various purification protocols have been proposed [7,12-14,19-22], there is still a lack of systematic comparison and optimization of purification circuits, as the number of possible designs increases exponentially with the size of the circuits. In this paper we develop tools to generate and compare purification circuits and we present multiple purification protocols far outperforming the current best contenders [12,14] over a wide range of realistic hardware parameters. We review the notion of an entanglement purification circuit and present our approach to generating and evaluating such circuits. We compare our results to recent proposals for practical high-performance purification circuits, and finally discuss the design principles and key ingredients for efficient purification circuits.
Importantly, we pay particular attention to the limitations imposed by working with finite hardware resources. One can find many excellent purification schemes in the literature that reach perfect fidelities at high yield in the asymptotic regime (e.g. [23]); however, such asymptotic resource theories neglect the imperfections of the purification hardware. Such imperfections in the local gates or measurements, which are the limiting factor in real-world hardware, are addressed by our work.
Purification of Bell Pairs In an entanglement purification protocol, two parties, Alice and Bob, start by sharing a number of imperfect Bell pairs; by performing local gates and measurements and communicating classically, they obtain a single pair of higher fidelity. For conciseness we use A, B, C, and D to denote the Bell basis states.
The imperfect pairs are described in the Bell basis (eq. 1) by ρ₀ = F₀|A⟩⟨A| + ((1−F₀)/3)(|B⟩⟨B| + |C⟩⟨C| + |D⟩⟨D|). If we have a state of multiple pairs (like AA), the first letter will always denote the pair to be purified, while the rest denote the sacrificial pairs.
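For concreteness, this bookkeeping can be sketched in a few lines of Python (an illustration, not the paper's software): a Bell-diagonal state is just a probability vector over {A, B, C, D}, and a multi-pair state is the outer product of such vectors.

```python
# Sketch of the Bell-diagonal bookkeeping used in the text.
# F0 is the initial fidelity of a raw (Werner-form) pair.

def werner(F0):
    """Diagonal of rho_0 in the Bell basis: F0 on A, (1-F0)/3 on each of B, C, D."""
    q = (1 - F0) / 3
    return {"A": F0, "B": q, "C": q, "D": q}

def tensor(p1, p2):
    """Diagonal of rho1 (x) rho2: 16 entries labeled 'AA', 'AB', ..., 'DD'.
    The first letter is the pair to be purified, the second the sacrificial one."""
    return {s1 + s2: v1 * v2 for s1, v1 in p1.items() for s2, v2 in p2.items()}

rho = werner(0.9)
joint = tensor(rho, rho)
assert abs(sum(joint.values()) - 1) < 1e-12  # probabilities remain normalized
```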
Figure 1. A simple purification circuit of width 2 (i.e. 2 local qubits for Alice or Bob). The upper half is run by Alice, while the bottom half is run by Bob. The dashed lines correspond to the initialization of registers with low-quality "raw" Bell pairs. The top and bottom registers correspond to the two qubits of the sacrificial Bell pair. A coincidence measurement in the Z basis marks a successful purification procedure.

To explain the roles of the local gates and coincidence measurements let us consider, in Fig. 1, the simplest purification circuit, in which Alice and Bob share two Bell
pairs and sacrifice one of them to purify the other. One way to explain the circuit is to describe it as an error-detecting circuit: if we start with two perfectly initialized Bell pairs in the state A, then the coincidence measurement will always succeed; however, if an X error (a bit flip) happens on one of the qubits, that error will be propagated by the CNOT operations and will cause the coincidence measurement to fail (the two qubits will point in opposite directions on the Z axis). It is important to note that only X and Y errors can be detected by this circuit, but not Z errors (phase flips), as the coincidence measurement cannot distinguish A from D states. One needs a circuit running on more than two Bell pairs to address X, Y, and Z errors.
For the purpose of designing an optimal purification circuit, it is enlightening to also interpret the local operations in terms of permutations of the basis vectors [24,25]. The initial state of the 4-qubit system is described by the density matrix ρ₀ ⊗ ρ₀, or equivalently by the 16 scalars on its diagonal in the Bell basis {AA, AB, ..., DD}. The "mirrored" CNOT operations that both Alice and Bob perform result in a new diagonal density matrix whose diagonal entries are a permutation of those of the original density matrix. A coincidence measurement on the Z axis follows, which projects out half of the possible states, i.e. deletes 8 of the scalars and adds the other 8 in pairs before renormalizing. The permutation operation and coincidence measurement have to be chosen together such that this projection (when the coincidence measurement is successful) filters out many of the lower-probability B, C, and D states. A detailed run through this example is given in the supplementary materials [26].
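The permutation picture can be made concrete with a short, self-contained Python sketch (an illustration, not the paper's software). As an assumption of the sketch, each Bell state is encoded as a (phase, amplitude) bit pair, A=(0,0), B=(1,1), C=(0,1), D=(1,0), chosen to be consistent with the selection rules in the text (Z coincidence keeps A and D), and the mirrored CNOT is oriented with the preserved pair as control.

```python
# Width-2 circuit of Fig. 1, simulated entirely as a permutation of the
# diagonal of the density matrix in the Bell basis, followed by a projection.
# Bell-state encoding (an assumption of this sketch): (phase, amplitude) bits.
BITS = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
LABEL = {v: k for k, v in BITS.items()}

def purify_once(F0):
    """One round of single selection on two Werner pairs of fidelity F0."""
    q = (1 - F0) / 3
    werner = {"A": F0, "B": q, "C": q, "D": q}
    out = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
    p_success = 0.0
    for s1, v1 in werner.items():          # preserved pair
        for s2, v2 in werner.items():      # sacrificial pair
            (p1, a1), (p2, a2) = BITS[s1], BITS[s2]
            p1, a2 = p1 ^ p2, a1 ^ a2      # mirrored CNOT as a permutation
            if a2 == 0:                    # Z coincidence succeeds (A or D)
                p_success += v1 * v2
                out[LABEL[(p1, a1)]] += v1 * v2
    return {k: v / p_success for k, v in out.items()}, p_success

rho, p_ok = purify_once(0.9)
# Agrees with the well-known single-selection recurrence for Werner input:
F, q = 0.9, 0.1 / 3
assert abs(rho["A"] - (F**2 + q**2) / (F**2 + 2*F*q + 5*q**2)) < 1e-12
assert rho["A"] > F
```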
If we restrict ourselves to finding the best "single sacrifice" circuits, i.e. circuits that sacrifice one Bell pair in an attempt to purify another one, we need to find the best set of permutations and measurements. There are 3 coincidence measurements of interest: coincidence in the Z basis, which selects for A and D; coincidence in the X basis, which selects for A and C; and anti-coincidence in the Y basis, which selects for A and B. All of those measurements can be implemented as a Z measurement preceded by a local Clifford operation.
The group of possible permutations is rather complicated. Firstly, all permutations of the Bell basis are Clifford operations, because each such permutation can be written as a permutation on the stabilizers of each state (moreover, we do not have access to all 16! permutations, as only operations local to Alice and Bob are permitted). This restriction permits us to efficiently enumerate all possible permutations and study their performance. The software for performing this enumeration is provided with this manuscript. The enumeration goes as follows [27,28]. There are 11520 operations in the Clifford group C₂ of two qubits. After exhaustively listing all operations in C₂ ⊗ C₂ we are left with 184320 unique Clifford operations that act as permutations of the Bell basis of two Bell pairs. Accounting for the 16 different operations that change only the global phase of the state (e.g. XX, which maps B to −B), we are left with 11520 unique permutations. Restricting ourselves to permutations that map A to A cuts that number by a factor of 4 for each pair, which leaves us with 720 unique restricted permutations. Out of those, 72 operations do not change the fidelity (72 = 2 × 6 × 6, corresponding to two operations (the identity and SWAP) under the six possible BCD permutations for each pair). The remaining 648 permitted operations perform equally well when purifying against depolarization noise, if they are used with the appropriate coincidence measurement. Half of them can be generated from the mirrored CNOT operation from Fig. 1 together with BCD permutations performed before or after on each of the two pairs. The other half can be generated if we also employ a SWAP gate (such a gate can be of importance for hardware implementations that have "hot" communication qubits and "cold" storage qubits [7]).
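The bookkeeping of this enumeration can be cross-checked with simple arithmetic (this only verifies the quoted counts against each other, not the Clifford enumeration itself):

```python
# Sanity check of the counting quoted in the text.
c2 = 11520                       # |C_2|, the two-qubit Clifford group
bell_permuting = 184320          # local Cliffords acting as Bell-basis permutations
global_phases = 16               # operations changing only the global phase
unique_perms = bell_permuting // global_phases
assert unique_perms == 11520

restricted = unique_perms // (4 * 4)  # demanding A -> A costs a factor 4 per pair
assert restricted == 720

trivial = 2 * 6 * 6              # {identity, SWAP} x six BCD permutations per pair
assert restricted - trivial == 648   # operations that actually change the fidelity
```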
When we break the symmetry of the depolarization noise and use biased noise instead, all of these operations still permit purification, but a small fraction of them significantly outperform the rest. So far, we have only counted purification circuits with 2 local qubits. We may increase the width (i.e., the number of local qubits) to boost the performance of the entanglement purification. However, the number of possible purification circuits grows exponentially with not only the length, but also the width of the circuit. Even for relatively small circuits (e.g., length 10 and width 3), there will be more than 10⁴⁰ different configurations if we use the operations discussed above, which makes an exhaustive comparison impossible. Therefore, we need an efficient procedure to choose the appropriate permutation operation at each round of our purification protocol.
Discrete optimization algorithm The design of circuits, whether quantum or classical, lends itself naturally to the use of evolutionary algorithms, with numerous interesting examples in electronics, robotics, and experiment design [29,30]. An evolutionary algorithm is an optimization algorithm particularly useful for cost functions over discrete parameter spaces. A candidate solution (a point in parameter space) plays the role of an individual in a population subjected to simulated evolution. Depending on the particular implementation, each individual generates a number of children, whether through mutations or through sexual modes of reproduction with other individuals in the population. The population is then culled so only the fittest candidate solutions remain, and the procedure is repeated for multiple generations until the convergence criteria are fulfilled.

Figure 2. Comparing the circuits designed by our genetic algorithm to prior art (considering circuits acting on 3 pairs of qubits). Each colored circle marks a unique circuit. The horizontal axis is the infidelity of the final purified Bell pair. The vertical axis is the probability of success of the protocol. Each upside-down triangle represents a circuit found in the literature ("deutsch" [13,19,20]; nested versions of the same protocol [13]; "double selection" protocols [22]; "EXPEDIENT" and "STRINGENT", some of the best known purification circuits [12]). The red triangle marks one of our circuits, which we use for later comparisons. Evaluations done at p₂ = η = 0.99 and F₀ = 0.9.
In our particular implementation [31] the individuals are quantum entanglement purification circuits. We restrict ourselves to circuits that purify the entanglement between two parties, Alice and Bob, without the involvement of a third party. The circuit can contain any of the previously discussed coincidence measurements (coincidence in Z, coincidence in X, anti-coincidence in Y). The circuit is also permitted to contain the "mirrored" CNOT operation from Fig. 1 together with any permutation of the {B,C,D} states applied before the CNOT operation. Applying the {B,C,D} permutation after the CNOT is unnecessary, as the next operation would already have that degree of freedom. However, the final result will have a biased error, so a single {B,C,D} permutation at the very end might be required.
Operation and measurement errors The design of the purification protocol is sensitive to imperfections in the local operations as well. We parametrize the operational infidelities with the parameters p₂, where 1−p₂ is the chance for a two-qubit gate to cause a depolarization, and η, where 1−η is the chance for a measurement to report the incorrect result.
No memory errors or single-qubit gate errors are considered in our treatment as they are generally much smaller [32,33], but they can be accounted for in the same manner.
After a measurement, the measured qubit pair is reset to a new Bell pair, and Alice and Bob can again use it as a purification resource. The initial fidelity of each Bell pair is a parameter F₀ set at the beginning of the algorithm. Similarly, we set the measurement fidelity η and the two-qubit gate fidelity p₂ at the beginning. More complicated settings with different error models are possible as well; of special interest would be circuits adapted for registers containing a "hot" communication side and a "cold" memory side [7], where one would also add the SWAP gate to the permitted genome.
The "fitness" that we optimize is the fidelity of the final purified Bell pair; however, different weights can be placed on the "infidelity components" along |B⟩, |C⟩, and |D⟩ if needed. In practice, the genetic algorithm is fairly robust to changing parameters like the population size, mutation rates, or number of children circuits. We provide both pregenerated circuits and ready-to-run scripts to generate circuits from scratch.
Importantly, depending on the parameter regime and error model, different circuits would be the top performers. This showcases the importance of rerunning the optimization algorithm for the given hardware. An example of such a difference is provided in the supplementary materials.
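To illustrate the overall optimization loop (this is a deliberately simplified sketch, not the paper's implementation), the following self-contained Python example evolves genomes of width-2 purification rounds. Each gene picks which error type a round detects; the fitness is the exactly computed final fidelity under a crude gate-error model in which, with probability ε = 1 − p₂, the preserved pair is fully depolarized after each round. The Bell-state bit encoding, CNOT orientations, and error model are all assumptions of this sketch.

```python
import random

# Bell states as (phase, amplitude) bit pairs (an assumption of this sketch).
BITS = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}
LABEL = {v: k for k, v in BITS.items()}

def round_(state, gene, F0):
    """One purification round on the preserved pair's Bell-diagonal `state`,
    consuming a fresh Werner pair. Gene 'Z' detects amplitude (bit-flip)
    errors; gene 'X' detects phase errors (CNOT orientation reversed)."""
    q = (1 - F0) / 3
    raw = {"A": F0, "B": q, "C": q, "D": q}
    out = {k: 0.0 for k in BITS}
    for s1, v1 in state.items():
        for s2, v2 in raw.items():
            (p1, a1), (p2, a2) = BITS[s1], BITS[s2]
            if gene == "Z":                       # preserved pair is control
                p1, ok = p1 ^ p2, (a1 ^ a2) == 0
            else:                                 # sacrificial pair is control
                a1, ok = a1 ^ a2, (p1 ^ p2) == 0
            if ok:
                out[LABEL[(p1, a1)]] += v1 * v2
    n = sum(out.values())
    return {k: v / n for k, v in out.items()}

def fitness(genome, F0=0.9, eps=0.01):
    """Final fidelity; each round is followed by a crude depolarization step."""
    q0 = (1 - F0) / 3
    state = {"A": F0, "B": q0, "C": q0, "D": q0}
    for gene in genome:
        state = round_(state, gene, F0)
        state = {k: (1 - eps) * v + eps / 4 for k, v in state.items()}
    return state["A"]

def mutate(genome):
    g = list(genome)
    op = random.choice(["flip", "add", "drop"])
    if op == "flip" and g:
        g[random.randrange(len(g))] = random.choice("ZX")
    elif op == "add" and len(g) < 12:
        g.insert(random.randrange(len(g) + 1), random.choice("ZX"))
    elif op == "drop" and len(g) > 1:
        g.pop(random.randrange(len(g)))
    return g

random.seed(0)
pop = [["Z"], ["X"], ["Z", "X"]] + [
    [random.choice("ZX") for _ in range(random.randint(1, 8))] for _ in range(27)]
for _ in range(40):                               # elitist evolutionary loop
    pop += [mutate(random.choice(pop)) for _ in range(30)]
    pop = sorted(pop, key=fitness, reverse=True)[:30]

best = pop[0]
assert fitness(best) >= fitness(["Z", "X"]) > 0.94   # beats the seeded baseline
```

Even in this toy setting the winning genomes interleave the two detection types, mirroring the observation in the text that one error type accumulates if only the other is ever checked.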
Comparing with Prior Art We generated a few thousand well-performing circuits of lengths up to 40 operations acting on up to six pairs of qubits (the algorithm can easily generate bigger circuits, but one soon hits a performance wall due to the imperfections in the local operations, as discussed below). The zoo of circuits we have created can be explored online (see note 1), but importantly, one can generate circuits specifically for their hardware using our method. Fig. 2 shows how our circuits compare to other known circuits of width 3. We outperform all circuits in the comparison in terms of final fidelity of the distilled pair, while also having higher probabilities of success and employing fewer resources or shorter circuits.
In Fig. 3 we compare one of the best performing circuits we know of (STRINGENT from [12]) to one particular circuit we have designed. We show only Bob's side of the circuit. Alice performs the same operations, and the two parties communicate to perform coincidence measurements. The shading permits us to see which qubit pairs are engaged with other pairs: each qubit Bob possesses starts with a distinct color; the color is "contagious", i.e. two-qubit gates will "infect" the second qubit in the gate with the color of the first qubit; measurements followed by regeneration of the raw Bell pair resource reset the color of the measured qubit. The shading clearly shows that the best protocols we find have all the qubits engaged (entangled) together, a finding consistent with the use of "multiple selection" purification protocols introduced in [22]. In contrast, conventional purification protocols have sub-circuits where only a subset of the qubits are engaged together.

Figure 3. Comparison of (a) the STRINGENT circuit [12] to (b) the L17 circuit obtained through optimization. The L17 circuit outperforms STRINGENT in terms of both final obtained fidelity and success probability over a wide range of error parameters. The color coding shows independent sub-circuits in the STRINGENT circuit and no such sub-circuits in our design. We show only Bob's side of the circuit. The vertical set of letters before each gate marks how the {B,C,D} states are permuted, which can be achieved with single-qubit gates folded into the CNOT gate as described in the supplementary materials. While we use only CNOT two-qubit gates, we intentionally use a modified symbol in order to bring attention to the presence of these permutation operations. The small white circles after each measurement represent the generation of a new Bell pair resource with fidelity F₀. The histograms (c) show the required number of Bell pairs for a completion of the protocol (taking into account any necessary restarts after a failed coincidence measurement) for STRINGENT and the optimized circuit.
A potential caveat of that "completely engaged" approach needs to be addressed: in Fig. 2 we report the probability of all measurements in a given protocol succeeding in a given run, but we do not report the following overhead. If a single measurement fails, the protocol needs to be restarted; the aforementioned conventional purification protocols, which possess sub-circuits, can redo just the failed sub-circuit (for instance, either of the two green blocks in STRINGENT from Fig. 3) instead of restarting the entire circuit. A priori, this might lead to lower resource overhead compared to our protocols (as they generally completely entangle the qubits of each register), even if we still win in terms of final fidelity and probability of success. However, a detailed evaluation of this overhead shows that even when it is taken into account, our protocols outperform known approaches, as they require a lower number of gates, both in best-case scenarios and on average (see right panel of Fig. 3).
Our approach can be employed for circuits with more than two sacrificial pairs. Fig. 4 compares the performance of circuits working on 3 pairs (as above) and circuits working on 4 or more pairs of qubits. With still bigger circuits one quickly reaches a fidelity limit imposed by the finite imperfections in the last operations performed by the circuit (for hardware with perfect local operations that limitation does not exist, and one would rather use larger circuits that perform many more operations per sacrifice [23,34]). Example circuits are given online (see note 1).
Operation versus initialization errors The design of efficient purification circuits needs to balance the initialization errors (imperfect raw Bell pairs) against the operation errors (imperfect local gates and measurements). As detailed in the supplementary materials, for arbitrarily long purification circuits the asymptotic infidelity is ε/2 + O(ε²), where ε = 1 − p₂ (as indicated by the vertical dashed line in Fig. 4), which is limited only by the operation errors. For finite-length purification circuits, however, the initialization errors also play an important role, which determines how fast the purification circuits approach the asymptotic limit with increasing circuit length (Fig. 4). By analyzing the circuits given by the discrete optimization algorithm, we have observed that: (1) for fixed length, depending on the parameter regime and error model, different circuits would be the top performers, which showcases the importance of rerunning the optimization algorithm for the given hardware; (2) to boost the achievable fidelity, it is important to use double selection (where two Bell pairs are simultaneously sacrificed to detect errors on a third surviving Bell pair) [22] instead of repeated single selection (where only one Bell pair is sacrificed at each error-detection step). This stems from the fact that the asymptotic infidelity of single selection is 7ε/8, i.e. nearly twice that of double selection. Moreover, multiple selection (where n > 2 Bell pairs are simultaneously sacrificed) has the same dominant asymptotic infidelity of ε/2 as double selection. Examples of both of these phenomena are given in the supplementary materials.

Figure 4. As in Fig. 2, we compare the performance of circuits acting on 3 or more pairs. For legibility, only some of the best generated circuits of each width are shown (evaluated at F₀ = 0.9 and p₂ = η = 0.99). Our circuits approach the limit of ε/2 = 0.005, derived in the main text and supplementary materials.
In conclusion, we have optimized purification circuits of fixed width using a discrete optimization approach over "building-block" subcircuits proven to be optimal. The optimized circuits outperform many other general-purpose purification protocols in all three aspects: fidelity of the purified Bell pair, success probability, and circuit length (whether measured in terms of the average number of operations performed or the average number of raw Bell pairs used, i.e. yield). For purification circuits of width 2, we analyze the group structure of the Clifford operations that fulfill the locality constraints of purification. For purification circuits of width ≥ 3, we demonstrate the importance of multiple selection (using at least two sacrificial Bell pairs to simultaneously detect errors), and specify the diminishing returns of using much wider circuits. We numerically obtain efficient purification circuits that approach the asymptotic theoretical limits. Our approach of using discrete optimization algorithms is applicable to various error models (e.g., dephasing-dominated gate errors, imperfect Bell states beyond the Werner form, etc.). Moreover, it can be used to optimize the purification circuits in the presence of memory errors, including additional decoherence on all local qubits during the creation of Bell pairs, and to investigate the entanglement purification of encoded Bell pairs and multi-party entanglement [12,35-39].

SUPPLEMENTARY MATERIALS
The software and additional online materials are available at qevo.krastanov.org.

Model for operational errors
We consider each two-qubit gate Û to be performed correctly with probability p₂ and to completely depolarize the two qubits i and j it acts upon with probability 1−p₂. Written in terms of density matrices, applying the gate to an input ρ_in results in

ρ_out = p₂ Û ρ_in Û† + (1−p₂) (Î_{i,j}/4) ⊗ Tr_{i,j}[ρ_in],

where Tr_{i,j} is the partial trace over the affected qubits and Î_{i,j} is the identity operator associated with qubits i and j. Similarly, a measurement on qubit i has probability η to properly project and measure, and probability 1−η to erroneously report the opposite result (flipping the qubit in the measurement basis). For instance, an imperfect projection on |1⟩ reads

P̂₁^(η) = η|1⟩⟨1| + (1−η)|0⟩⟨0|.

Memory errors are not considered, but can be easily added to the optimization if required.
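This gate-error channel can be transcribed directly into a short Python sketch (plain lists, no external libraries), under the simplifying assumption that the register consists only of the two affected qubits, so the partial trace becomes the full trace and the leftover identity factor is the maximally mixed state I/4:

```python
# Two-qubit gate-error channel, for a register of only the two affected qubits.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def dagger(A):
    return [[A[j][i].conjugate() if isinstance(A[j][i], complex) else A[j][i]
             for j in range(4)] for i in range(4)]

def trace(A):
    return sum(A[i][i] for i in range(4))

def noisy_gate(U, rho, p2):
    """rho_out = p2 * U rho U^dag + (1 - p2) * (I/4) * Tr[rho]."""
    coherent = matmul(matmul(U, rho), dagger(U))
    t = trace(rho)
    return [[p2 * coherent[i][j] + (1 - p2) * (t / 4 if i == j else 0)
             for j in range(4)] for i in range(4)]

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
rho_in = [[1 if i == j == 0 else 0 for j in range(4)] for i in range(4)]  # |00><00|

rho_out = noisy_gate(CNOT, rho_in, p2=0.99)
assert abs(trace(rho_out) - 1) < 1e-12                  # channel is trace preserving
assert abs(rho_out[0][0] - (0.99 + 0.01 / 4)) < 1e-12   # |00> survives the CNOT
```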

Purification example when operations are interpreted as permutations of the Bell basis
Consider the simple circuit from Fig. 1. As discussed throughout the main text, a useful way to represent the operations performed in the circuit is as permutations of the Bell basis. For this example we will use perfect operations (i.e. only initialization errors). The density matrix describing the system remains diagonal throughout the execution of the entire protocol, as only permutation operations are performed. The following table describes how the "mirrored" CNOT operation acts on the basis states ("AD" stands for "the sacrificial pair is in state D and the pair to be purified is in state A"). With this mapping we can trace how the state of the system evolves. The following table gives the diagonal of the density matrix describing the system at each step. In the table q = (1−F)/3. The measurement column assumes a successful coincidence measurement has been performed. By normalizing and tallying the states that remain in A (for the purified pair) we are left with fidelity after a successful round F₁ = (F² + q²)/(F² + 2Fq + 5q²).

Another way to interpret the purification protocols is to look at them as error-detection protocols. This way of thinking was used in the main text in the discussion of the limits imposed by the operational errors. Here we will repeat this discussion with more pedagogical visual aids for a particular choice of two-qubit operation and measurement. As in the main text, we will first consider a single selection circuit (where Alice and Bob share two Bell pairs and sacrifice one of them to detect errors on the other one). We are showing only Bob's side of the circuit.
We assume that Alice and Bob start with two perfect Bell pairs in the state A. Each of the two registers (the one Alice uses to store her two qubits and the one Bob uses for his) is subject to complete depolarization with probability ε = 1 − p₂. This is equivalent to saying that for each of the registers there is a probability ε/16 for each of the 16 two-qubit Pauli operators to be applied to the state. We write the possibilities down in a table (columns correspond to the possible errors on the top/preserved qubit, rows correspond to the possible errors on the bottom/sacrificial qubit, and each cell gives the corresponding tensor product). If we were to perform a coincidence measurement immediately, we would be able to detect the errors that have occurred on the sacrificial qubit; however, they are not correlated with errors that have occurred on the preserved qubit, therefore no errors on the preserved Bell pair would be detected. However, if we perform a CNOT gate, errors on either qubit will be propagated to the other one, and we will be able to detect some of the errors that have occurred on the Bell pair to be preserved by measuring the sacrificial Bell pair. Below we describe how the errors propagate. After the CNOT gate we have the following redistribution of errors: out of the 8 undetected possibilities (16 initially), 2 (II and IZ) are harmless to the preserved Bell pair and the remaining 6 are damaging, which leaves us with infidelity, to first order, (6/16)ε × 2 (the factor of 2 comes from the fact that both Alice and Bob are subject to depolarization errors).
We can now augment the circuit with another level of detection that will be able to detect the Z error on the sacrificial Bell pair. To first order, any errors contributed by this extension are negligible and cannot propagate back to the preserved Bell pair. The Z error that might have occurred on the middle line (and was left undetected) will now propagate to the bottom line and be detected by the coincidence X measurement, leaving us with the following table: out of the 4 undetected errors, 3 still harm the preserved Bell pair, so we are left with infidelity (3/16)ε × 2. Those three are undetectable as they do not propagate to the sacrificial qubits (they act as the identity on the sacrificial qubits). As such, using bigger registers (wider circuits) would provide only small higher-order corrections.
Finally, the asymptote reached by our circuits contains one additional source of infidelity. The black vertical lines in Fig. 5 correspond to short circuits with zero initialization error. However, a real purification protocol would need multiple rounds of purification until it lowers the non-zero initialization error to the steady-state floor governed by the operational error. In this steady state an additional round of purification would be able to detect only 2 of the possible 3 Pauli errors that were already present, therefore raising the bound of the achievable infidelity from (3/16)ε × 2 to ((3+1)/16)ε × 2 (to first order in ε). For the parameters of Fig. 5 this corresponds to an asymptote at infidelity 0.005, which is indeed what we observe.
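The error counting of the last few paragraphs can be checked mechanically. The sketch below (an illustration, not the paper's tool) represents each Pauli error by its symplectic (x, z) bits, propagates it through the CNOT (oriented here with the sacrificial qubit as control, the convention that reproduces the counts quoted in the text), and tallies the undetected harmful errors at each selection level, along with the resulting steady-state asymptotes:

```python
from itertools import product

# Pauli on one qubit as (x, z) bits: I=(0,0), X=(1,0), Y=(1,1), Z=(0,1).
# The error hits Bob's register (preserved qubit r, sacrificial qubit s)
# before the CNOT; conjugation by CNOT(control=s, target=r) maps
#   x_r -> x_r ^ x_s   (X copies from control to target)
#   z_s -> z_s ^ z_r   (Z copies from target to control)

harmful_l1 = harmful_l2 = 0
for xr, zr, xs, zs in product((0, 1), repeat=4):
    xr2, zs2 = xr ^ xs, zs ^ zr            # error after the CNOT
    detected_l1 = xs == 1                  # Z coincidence flags a flipped sacrificial
    detected_l2 = detected_l1 or zs2 == 1  # extension also catches residual Z
    harms = (xr2, zr) != (0, 0)            # preserved pair no longer in state A
    harmful_l1 += (not detected_l1) and harms
    harmful_l2 += (not detected_l2) and harms

assert harmful_l1 == 6 and harmful_l2 == 3   # the 6/16 and 3/16 counts of the text

# Steady-state asymptotes (both registers depolarize, and one extra
# pre-existing error per round goes undetected):
eps = 0.01
single = (harmful_l1 + 1) / 16 * eps * 2     # 7*eps/8
double = (harmful_l2 + 1) / 16 * eps * 2     # eps/2 = 0.005
assert abs(single - 7 * eps / 8) < 1e-15 and abs(double - eps / 2) < 1e-15
```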
The vertical lines of Fig. 5 are slightly offset from the values quoted above because we used exact numerics for the plots, as opposed to the first-order calculations of this section.

Figure 5. Same as Fig. 4 but with some additional information. Dashed vertical lines, corresponding to perfectly initialized (F₀ = 1, p₂ = η = 0.99) short purification circuits, are shown as a guide to how well the circuits perform in terms of initialization versus operational errors, as described in the main text. In "single selection" circuits each party uses registers of size two; registers of size three for "double selection"; and size four for "triple selection".

Shortest multi-pair purification circuits
In the main text we introduced circuits to be used as benchmarks of initialization-vs-operation errors. The idea was to show what performance is provided by a circuit applied to perfectly initialized raw Bell pairs, or in other words, how much damage is caused by operation errors if we start with perfect initialization (as done in Fig. 4 and Fig. 7). To do that we found, with brute-force enumeration, the best "short" circuits, i.e. circuits that do not reinitialize any of the consumed Bell pairs. They are named in the manner introduced in [22]. The triple selection circuit is actually a generalization of the circuit from [22]. As described in [22] and in the main text of our manuscript, double selection significantly outperforms single selection, and extending the double selection circuit to a triple selection circuit provides only modest higher-order improvements.

Figure 6. A double selection circuit on the left and a triple selection circuit on the right. We show only Bob's side of the circuit. The circuit from Fig. 1 can be referred to as a single selection circuit. As explained in the main text, there are many circuits with equivalent performance, related to the given circuits by permutation of the Bell basis.
More about initialization errors, operational errors, and the length of the circuit

In Fig. 4 the vertical lines showed the "operational error" limit which one would reach if there were no initialization errors.
To make the comparison between initialization and operational errors clearer we provide Fig. 7, which drops the "success probability" axis of Fig. 4 and instead shows how the performance varies with p₂. In it one can see that the initialization infidelity is not a limiting factor: as long as the operational infidelity can be lowered, we can find longer circuits that iteratively get rid of the initialization error. For a sufficiently long circuit we reach a point of saturation, where, as described above, the operational error in the last operation dominates the infidelity of the final Bell pair. Similarly, if the circuit is wider (i.e. the register is bigger) we can obtain higher final fidelities for a fixed operational error, and the point of saturation occurs at even lower operational error levels.

Figure 7. For each family of generated circuits of various widths (color) and for a given operational infidelity (x axis) we show the best achievable final Bell pair fidelity by a circuit in that family. The top plot limits the permitted circuits to lengths of fewer than 30 operations; the limit is fewer than 40 operations for the bottom plot. Three important observations can be made: (1) as long as we can lower the operational error, we can design a long enough circuit that is not affected by the initialization error; (2) a wider register outperforms smaller registers and reaches a point of diminishing returns at smaller operational errors; (3) as already mentioned, circuits of width 2 are insufficient for arbitrary suppression of the initialization error, as they detect only 2 of the possible 3 single-qubit errors (X, Y, and Z). The grey lines follow the same conventions as in Fig. 4: short perfectly initialized circuits used as a benchmark. The triple selection circuit from Fig. 4 is omitted, as it cannot be distinguished from the double selection circuit on this scale.
The "identity" line corresponds to what would happen if we simply depolarize a single perfect Bell pair with probability 1 − p.  In Fig. 8 we demonstrate the importance of optimizing your purification circuit for the exact hardware at which it would be ran. One can see that each of the three circuits outperforms the other two, only within a small interval around the parameter regime for which it was optimized.
7. Infidelity axes

Figure 10. Each point corresponds to one of the circuits shown in Fig. 2. In the top plot they are colored by the final infidelity of the Bell pair produced by the circuit. The bottom shows the same plot, but with the color corresponding to the length of the circuit. The 3 axes of the ternary plot correspond to the 3 components of the infidelity. As ternary plots require the 3 coordinates to fulfill a constraint, we plot the relative infidelity. For instance, the left axis (corresponding to the height of a point in the plot) shows the probability to be in ψ+ divided by the total probability to be in a state different from φ+ (the state being purified). Being in the center of the triangle means the infidelity in the final result is pure depolarization. Being in one of the corners means that one of the infidelity components dominates the other two. Being near the midpoint of a side means one of the infidelity components is much smaller than the other two. The 6 symmetries of this triangle correspond to the six permutations of {B, C, D}.
Even if the error model for the circuits and initialization is the depolarization model, the error in the final result of the purification need not be depolarizing. The infidelity in the final result has three components - the probabilities to be in states ψ−, ψ+, and φ−, respectively. Different purification circuits affect the three infidelity components differently, and assigning different weights to them in the cost function of the optimization algorithm might be important, depending on the goal (for instance, if the purified Bell pairs are used for the creation of a GHZ state, the particular implementation might be more susceptible to phase errors, in which case that component would be assigned a higher weight). In Fig. 10 we show the distribution of the infidelity components of the purified Bell pair for each of the circuits we have generated. Of note is that longer circuits come closer to pure depolarization error, by virtue of lowering the infidelity to the level of diminishing returns where the depolarization from the final operation dominates. Moreover, the results are biased toward one particular type of error, due to the particular choice of "genes" described in the main text, namely, the {B, C, D} permutations are performed before the CNOT gate and measurements. This bias can be removed if necessary by applying one final {B, C, D} permutation (the six {B, C, D} permutations correspond to the six symmetries of the triangle in Fig. 10).
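The relative-infidelity coordinates used in the ternary plot follow directly from the Bell-diagonal probabilities. A minimal sketch (the function name and input convention here are ours, for illustration, not part of the described software):

```python
def relative_infidelity(probs):
    """Map a Bell-diagonal quadruplet (p_phi+, p_phi-, p_psi+, p_psi-)
    to ternary-plot coordinates: each error component divided by the
    total infidelity 1 - p_phi+ (assumed nonzero)."""
    _, b, c, d = probs
    infid = b + c + d
    return (b / infid, c / infid, d / infid)

# A purely depolarized result lands at the center of the triangle:
print(relative_infidelity((0.97, 0.01, 0.01, 0.01)))
```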

Implementation of the various permutation operations
Here we give explicitly the single-qubit Clifford operations necessary to perform a permutation of the Bell basis. H stands for the Hadamard gate and P stands for the phase gate (in parentheses we mark whether the permutation is a rotation or a reflection of the triangle). Even though the decompositions of these operations in terms of H and P have different lengths, in practice these operations are equally easy to implement on real hardware.
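The permutation induced by a given single-qubit operation applied to both halves of the pair can be checked numerically. The sketch below is our own illustrative code (not the paper's software); it verifies, for example, that a bilateral Hadamard swaps φ− and ψ+ while fixing φ+ and ψ− up to global phases:

```python
import numpy as np

# Bell states as vectors in the computational basis |00>, |01>, |10>, |11>
bell = {
    "phi+": np.array([1, 0, 0, 1]) / np.sqrt(2),
    "phi-": np.array([1, 0, 0, -1]) / np.sqrt(2),
    "psi+": np.array([0, 1, 1, 0]) / np.sqrt(2),
    "psi-": np.array([0, 1, -1, 0]) / np.sqrt(2),
}

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
P = np.array([[1, 0], [0, 1j]])               # phase gate

def bell_permutation(u):
    """Permutation of the Bell basis induced by applying the
    single-qubit operation u to both qubits (up to global phases)."""
    uu = np.kron(u, u)
    perm = {}
    for name, state in bell.items():
        out = uu @ state
        for other, cand in bell.items():
            if np.isclose(abs(np.vdot(cand, out)), 1):
                perm[name] = other
    return perm

print(bell_permutation(H))  # bilateral H swaps phi- and psi+
print(bell_permutation(P))  # bilateral P swaps phi+ and phi-
```

As the second print shows, a bilateral phase gate does not fix the state being purified, which is why the {B, C, D} permutations in general require compositions of such elementary operations.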

Canonicalization of generated circuits
Many redundancies can appear in the population of circuits subjected to simulated evolution. To simplify the analysis of the results, we filter the generated circuits by first ensuring that each circuit:
• does not start with a measurement;
• does not contain two immediately consecutive measurements on the same qubit;
• does not contain unused qubits;
• does not contain measurement and reset of the topmost qubit pair (the one containing the Bell pair to be purified);
• does not end with a non-measurement operation;
and then for the non-discarded circuits we perform the following canonicalization:
• reorder the qubits of the register so that the qubit closest to the topmost qubit is the one to be measured last, the second closest is measured second to last, etc.;
• if a two-qubit gate and a measurement commute (i.e. they affect different qubits of the register), reorder them so that the gate is always before the measurement (in an implementation of that circuit, those two operations will be executed in parallel);
• if two two-qubit gates affect different qubits, put the one that affects topmost qubits before the one that affects lower qubits (in an implementation of that circuit, those two operations will be executed in parallel).
The canonicalization rules are arbitrary, and any other consistent set can be used at the discretion of the software writer. However, they ensure that two physically equivalent circuits are not presented multiple times in the final result, substantially lowering the number of circuits that need to be evaluated. The set described above is not exhaustive, as there are other, more complex equivalences that we have not considered.
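The commuting-operation reordering rules above can be implemented as repeated adjacent swaps until a fixed point is reached. A minimal sketch, assuming a simple list-of-tuples circuit representation of our own choosing (not the paper's actual data structures):

```python
# Circuits are lists of ops: ("gate", a, b) for a two-qubit gate on
# qubits a and b, or ("meas", a) for a measurement of qubit a.

def qubits(op):
    return set(op[1:])

def should_swap(first, second):
    if qubits(first) & qubits(second):
        return False  # overlapping qubits: order is physically meaningful
    # Rule: gates go before measurements they commute with.
    if first[0] == "meas" and second[0] == "gate":
        return True
    # Rule: among commuting gates, the one on topmost qubits goes first.
    if first[0] == "gate" and second[0] == "gate":
        return min(qubits(second)) < min(qubits(first))
    return False

def canonicalize(circuit):
    ops = list(circuit)
    changed = True
    while changed:  # bubble adjacent swaps until a fixed point
        changed = False
        for i in range(len(ops) - 1):
            if should_swap(ops[i], ops[i + 1]):
                ops[i], ops[i + 1] = ops[i + 1], ops[i]
                changed = True
    return ops

# A commuting gate is moved before the measurement it was written after:
print(canonicalize([("meas", 2), ("gate", 0, 1)]))
```

Each swap removes one inversion with respect to the implied ordering, so the loop terminates; ops on overlapping qubits are never reordered.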

Analytical expressions for the final fidelity
Our software also produces a symbolic analytical expression for the fidelities obtained by each circuit. The quality of the raw Bell pairs is expressed as the quadruplet of probabilities to be in each of the Bell basis states, (F0, q, q, q) where q = (1 − F0)/3 (more general non-depolarizing error models are available as well). The purification circuit acts as a map that takes (F0, q, q, q) to the quadruplet (FA, b, c, d) representing the probabilities that the final purified Bell pair is in each of the Bell basis states.
The permutations of the Bell basis (i.e. all the local Clifford operations we are permitting) are polynomial maps, i.e. the output quadruplet contains polynomials of the variables in the input quadruplet. Depolarization is a polynomial map as well. Measurement without normalization is a polynomial map, but it becomes a rational function if normalization is required.
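As an illustration of such maps, the sketch below uses the textbook single-parameter BBPSSW recurrence (a stand-in, not one of our generated circuits): the unnormalized success probability and kept-pair fidelity stay polynomial, normalization turns the map into a rational function, and the result can be series-expanded in the small parameter 1 − F0.

```python
import sympy as sp

F0 = sp.symbols("F0", positive=True)
q = (1 - F0) / 3  # depolarized input: quadruplet (F0, q, q, q)

# One BBPSSW round (bilateral CNOT + coincidence measurement),
# kept unnormalized: both quantities are polynomials in F0.
p_success = sp.expand(F0**2 + 2 * F0 * q + 5 * q**2)  # coincidence probability
unnorm_fidelity = sp.expand(F0**2 + q**2)             # unnormalized kept-pair fidelity

# Normalization is postponed to the very end, turning the map
# into a rational function of F0:
F_out = sp.simplify(unnorm_fidelity / p_success)

# Series expansion in the small parameter eps = 1 - F0;
# the leading-order output infidelity is (2/3)(1 - F0):
eps = sp.symbols("epsilon", positive=True)
print(sp.series(F_out.subs(F0, 1 - eps), eps, 0, 2))
```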
By postponing normalization until the very last step, we can use efficient symbolic polynomial libraries like SymPy (using generic symbolic expressions is much slower than using polynomials). The final result is given as a series expansion of the normalized expression (in terms of the small parameters 1 − F0, 1 − p2, and 1 − η).

Figure 11. Monte Carlo evaluation of the overhead due to restarts after failed measurements. The evaluation is for the circuit from Fig. 3 at F0 = 0.9 and p2 = η = 0.99. On the left we have the histogram of completed runs in terms of how many operations a run takes to successfully complete. In each histogram, the mean value of the distribution is shown as well. On the right we have the probability for the protocol (in which reinitializations are permitted) to successfully complete in terms of how many Bell pairs were used (i.e. it is the cumulative version of the top-left plot).

Monte Carlo simulations of restart overhead
As mentioned in the main text, one needs to consider how a protocol proceeds when a measurement fails. If there is a subcircuit that can be restarted, one need not redo the entire protocol, but only reinitialize at the point where the subcircuit starts. However, if the topmost qubit pair, the one holding the Bell pair, is entangled with the qubit pair that undergoes a failed coincidence measurement, then the entire protocol has to be restarted. For most of our circuits such subcircuits do not exist, but they are common among manually designed circuits. Our software automatically finds subcircuits and runs a Monte Carlo simulation of the sequence of measurements and reinitializations in order to evaluate the average resource usage, as shown in Fig. 11.
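A stripped-down version of such a simulation, for a toy protocol with no restartable subcircuits (the stage probabilities and per-stage pair counts below are illustrative, not taken from the paper):

```python
import random

def restart_overhead(success_probs, pairs_per_stage, trials=100_000, seed=0):
    """Monte Carlo estimate of the mean number of raw Bell pairs consumed
    by a protocol whose stages must all succeed in sequence; a failed
    coincidence measurement restarts the protocol from the beginning
    (the worst case, with no restartable subcircuits)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        used = 0
        stage = 0
        while stage < len(success_probs):
            used += pairs_per_stage[stage]
            if rng.random() < success_probs[stage]:
                stage += 1
            else:
                stage = 0  # full restart
        total += used
    return total / trials

# Two stages, one sacrificial pair each:
print(restart_overhead([0.9, 0.8], [1, 1]))  # close to the analytic mean 1.9/0.72 ≈ 2.64
```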
The overhead estimated this way also proves to be closely related to the success probability of the given protocol, which is to be expected given that a greater overhead implies more imperfect operations, which in turn implies a higher chance of a fault (Fig. 12).

Figure 12. The relationship between overhead and success probability for the designs generated by our algorithm. Longer circuits have both lower success probability and higher overhead (expended Bell pairs). However, as shown in the rest of the manuscript, longer circuits asymptotically approach the upper bound of performance.