Post-selection-free preparation of high-quality physical qubits

Rapidly improving gate fidelities for coherent operations mean that errors in state preparation and measurement (SPAM) may become a dominant source of error for fault-tolerant operation of quantum computers. This is particularly acute in superconducting systems, where tradeoffs in measurement fidelity and qubit lifetimes have limited overall performance. Fortunately, the essentially classical nature of preparation and measurement enables a wide variety of techniques for improving quality using auxiliary qubits combined with classical control and post-selection. In practice, however, post-selection greatly complicates the scheduling of processes such as syndrome extraction. Here we present a family of quantum circuits that prepare high-quality |0⟩ states without post-selection, instead using CNOT and Toffoli gates to non-linearly permute the computational basis. We find meaningful performance enhancements when two-qubit gate error rates go below 0.2%, and even better performance when native Toffoli gates are available.

Key physical implementations of quantum computers now reliably achieve two-qubit gate fidelities approaching 99.8% [23], with multiple systems reporting fidelities ranging from 99.2% to 99.6% [1,10,19,26]. This leaves SPAM (state preparation and measurement) errors as a dominant source of error in many, particularly superconducting, machines. Google's Sycamore processor experienced measurement error rates greater than 3% [1], and IBM report similar levels of measurement error [16] in their devices. Recent improvements to standard superconducting measurement error rates have not yet reduced them below 1% [20], although more is possible using multilevel qubit encodings [8]. In contrast, for ions, neutral atoms, and spins in silicon, measurement error rates can be much lower (cf. the 99.3% reported by IonQ [27] and the 99.75% reported by HRL [2]), though achieving low rates in dense arrays remains a substantial challenge [28].
There are three broad approaches to reducing the impact of SPAM errors. The first is to improve the physical processes of preparation and measurement. In most architectures there are tradeoffs between SPAM errors and speed, as longer integration or preparation times can improve performance but take longer; see for example [8] for the benefits of measuring slowly up to $T_1$ limits. The second is error mitigation, which calibrates and compensates for errors by running the circuit many times and reconstructing the expected outputs post-facto. This is helpful for NISQ applications [13,25] but does little to improve the entropy extraction critical for quantum error correction, and the extension of error mitigation into the early fault-tolerant regime [24] requires SPAM error rates comparable to two-qubit gate error rates. Here we examine the third approach: algorithmically improving the quality of preparation and measurement using quantum logic gates.

Figure 1: Two approaches to |0⟩ state production. (a) Post-selection. (b) A (3, 1, 1) post-selection-free purification circuit.
Algorithmic improvement for measurement is straightforward. The simplest example is used in, e.g., the double species ion clock (see [15,17], and [5] for a recent example), where one readout qubit is used multiple times to estimate the clock qubit, a simple type of repetition code. More generally, CNOT gates can encode a set of measurement outcomes in a classical error correcting code on auxiliary measurement qubits, allowing recovery from sufficiently few measurement errors [14]. As these codes have very good performance and efficient decoders, we recommend their use but do not elaborate on them further here.
A conventional approach to improving preparation using post-selection is the circuit in Figure 1a. We prepare two qubits in state |0⟩, apply a CNOT from the first to the second, then measure the second in the computational basis. If the measurement result is 0, we output the first qubit as a high-quality |0⟩. If not, we reject and begin the process again.
Suppose for now that gates are perfect, but that each preparation incorrectly produces |1⟩ with probability $p_0$, and that the measurement outcome is also incorrect with probability $p_0$, with all errors independent. Then the measurement result is 0 precisely when there is an even number of errors across the two preparations and one measurement. We will correctly output |0⟩ if there are no errors, or two errors which occur on the second preparation and the measurement. Otherwise we will either output |1⟩ or reject. Conditional on acceptance, this circuit produces post-selected |0⟩ states with an error rate $2p_0^2 + O(p_0^3)$. Post-selection is an acceptable method for production of complex logical states such as T-magic states, where the scheduling difficulties it presents are the price we pay for implementing a logical non-Clifford gate fault-tolerantly. However, it is much less satisfactory for the large number of simple physical |0⟩ states required for every round of syndrome extraction. Specific challenges induced by post-selection include
• the need to schedule multiple attempts in case the first attempt fails;
• the need to wait for measurement results and classical processing, which in some systems can take appreciable time;
• the risk of measurement inducing cross-talk errors on adjacent qubits;
• limited advantage when measurement errors are substantially worse than preparation errors.
Table 1: Results of the post-selection and purification circuits of Figure 1, by total number of errors.

We propose an alternative approach avoiding post-selection, with the simplest example shown in Figure 1b. This circuit fixes |000⟩, |010⟩ and |001⟩, and takes |100⟩ to |011⟩, so it outputs |0⟩ on the first wire precisely when there is at most one error on the input. Thus it produces |0⟩ states with a slightly higher error rate $3p_0^2 + O(p_0^3)$, but does so without the use of post-selection.
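The case analysis for both circuits of Figure 1 can be checked by enumerating all eight error patterns. The following is a minimal Python sketch assuming perfect gates; the function names are our own, and for Figure 1a the third "error" is the measurement error rather than a third preparation error.

```python
from itertools import product


def post_selection(a, b, m):
    """Figure 1a: preparation errors a (q0) and b (q1), measurement error m.

    The CNOT q0 -> q1 leaves a ^ b on q1 and the reported outcome is also
    flipped by m. Returns "reject" or the bit left on q0.
    """
    outcome = a ^ b ^ m
    return "reject" if outcome else a


def purification(a, b, c):
    """Figure 1b: CNOT q0 -> q1, CNOT q0 -> q2, then Toffoli (q1, q2) -> q0."""
    q0, q1, q2 = a, b, c
    q1 ^= q0
    q2 ^= q0
    q0 ^= q1 & q2
    return q0


def tally(circuit):
    """Count (number of errors, result) over all 2**3 error patterns."""
    counts = {}
    for pattern in product((0, 1), repeat=3):
        key = (sum(pattern), circuit(*pattern))
        counts[key] = counts.get(key, 0) + 1
    return counts


for name, circuit in (("post-selection", post_selection), ("purification", purification)):
    print(name, sorted(tally(circuit).items(), key=str))
```

The weight-2 patterns that end in an accepted |1⟩ give the leading coefficients: two such patterns for post-selection (conditional on acceptance) and three for purification, matching $2p_0^2$ and $3p_0^2$.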
The outcomes for each circuit are summarised in Table 1. Our post-selection-free approach has a higher yield of |0⟩ states, correctly processing all cases with 1 error rather than one of the cases with 2 errors. It greatly simplifies scheduling of |0⟩ production, and avoids delays on architectures where measurement is significantly slower than applying gates. On the other hand it has higher resource requirements, requiring one additional qubit and more gates, including a physical Toffoli.¹ We call the circuit in Figure 1b a (3, 1, 1) purification circuit because it uses three qubits to prepare one high-quality |0⟩ and can tolerate one error; we define the notation formally at the beginning of Section 1. For the (3, 1, 1) circuit and a selection of other circuits described in more detail later, Figure 2 shows the output error rate as a function of the input error rate, assuming that gates depolarise qubits with probability 0.003 and idle qubits depolarise with probability 0.001 in each round. The (3, 1, 1) circuit improves a 2% preparation error rate to a 0.5% error rate and a 1% preparation error rate to a 0.4% error rate. Figure 3 shows the thresholds a gate set must meet in order for the circuit of Figure 1b to improve preparation quality at fixed idle depolarisation rate 0.001. Contours correspond to preparation error rates. If your CNOT and Toffoli depolarisation rates place you to the left of a contour then you obtain an improvement from the (3, 1, 1) circuit.
In the rest of this paper we propose and assess the performance of a range of circuits analogous to that of Figure 1b, and quantify the trade-off in which more complicated circuits provide greater protection against preparation errors but are more vulnerable to gate errors. In Section 1 we define and describe the general properties of purification circuits, explain how to find small examples computationally, and consider ways to combine these small examples to form larger circuits. In Section 2 we describe a particular construction based on graphs inspired by examples discovered in Section 1. The simplest versions of these circuits have favourable performance and resource requirements and have a natural planar layout. In Section 3 we present data on the performance of these circuits.

¹ A variety of constructions could instead be used to approximate a Toffoli gate up to relative phases. For example, a short circuit on q0, q1, q2 that includes two Hadamard gates q1-conditionally conjugates the q2-conditional Z gate to an X, resulting in an operation differing from the Toffoli with target q0 by a factor of −1 precisely on the basis state |101⟩. In our model (see Section 1.1) the state is a mixture of computational basis states, so these relative phases are unobservable.
Various schemes making repeated applications of the idea behind Figure 1a have been studied under the name 'algorithmic cooling' [3,4,9,11,18,22]. Their analysis is typically in terms of entropy, tracking which qubits are 'hot' or 'cold'. Our approach is combinatorial, instead tracking what happens to individual error patterns as the circuit permutes the computational basis.

Purification circuits
We define an (n, k, e) purification circuit (or (n, k, e) circuit) to be a CNOT and Toffoli circuit mapping the n-long bitstrings of weight at most e into the space of n-long bitstrings that are zero in k nominated positions. If each value in the input is intended to be 0, but experiences an independent error probability $p_0$ of being flipped to a 1, then each of the k outputs of an (n, k, e) purification circuit experiences a reduced error rate $O(p_0^{e+1})$. The parameter e is defined in terms of protection against adversarial errors and, as with the distance of a (quantum or classical) error correcting code, does not necessarily give a complete picture of the degree of protection against random errors. The worst-case output error rate of an (n, k, e) purification circuit is $\binom{n}{e+1} p_0^{e+1} + O(p_0^{e+2})$. In many cases only a small fraction of the sets of e + 1 preparation errors lead to an error on the output, so the coefficient of $p_0^{e+1}$ will be much smaller. This applies, for example, to the family of circuits that we consider in Section 2.

Error model
An (n, k, e) purification circuit can be viewed as a purely classical object operating on classical bitstrings. To interpret it as a quantum circuit running on noisy hardware we adopt the following error model for preparation, idle and gate errors.
The basic error events are:
• a qubit is incorrectly prepared as |1⟩ rather than |0⟩ with probability $p_0$;
• a qubit not involved in the current round of gates depolarises with probability $p_I$;
• a CNOT gate depolarises the set of qubits it acts on with probability $p_C$;
• a Toffoli gate depolarises the set of qubits it acts on with probability $p_T$.
All of these events are independent. By 'a set Q of qubits depolarises' we mean any of the following equivalent things.
• the qubits in Q are replaced by a uniform mixture of the computational basis;
• each qubit in Q experiences a Pauli I, X, Y or Z error chosen uniformly and independently at random;
• each qubit in Q experiences a Pauli X error with probability 1/2 and a Pauli Z error with probability 1/2, with all choices made independently.
Note that the parameterisation for preparation errors differs from that for idle and gate errors; preparing |1⟩ rather than |0⟩ with probability $p_0 < 1/2$ corresponds to correctly preparing |0⟩ then depolarising with probability $2p_0$.
Since failures in either preparation or application of a gate in this model replace qubits by mixtures of computational basis states, the state of the system at any point is fully described by a probability distribution over the computational basis. Each probability is a polynomial in $p_0$, $p_I$, $p_C$, $p_T$, which can be computed precisely for circuits of moderate size. See Appendix E for more details.
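Because every error event in this model maps the state to a mixture of computational basis states, a simulator only needs to track a probability vector of length $2^n$. The sketch below (Python, with function names of our own) evaluates the model numerically rather than as exact polynomials. It assumes a schedule for the (3, 1, 1) circuit with one gate per round, which may differ from the schedule used for the figures; gates within a round are assumed to act on disjoint qubits.

```python
def initial_distribution(n, p0):
    """Each of n qubits is independently prepared as |1> with probability p0."""
    return [p0 ** bin(s).count("1") * (1 - p0) ** (n - bin(s).count("1"))
            for s in range(2 ** n)]


def apply_gate(dist, gate):
    """CNOT (control, target) or Toffoli (c1, c2, target) as a basis permutation."""
    *controls, target = gate
    out = [0.0] * len(dist)
    for s, prob in enumerate(dist):
        image = s ^ (1 << target) if all(s >> c & 1 for c in controls) else s
        out[image] += prob
    return out


def depolarise(dist, qubits, p):
    """With probability p, replace the given qubits by a uniform basis mixture."""
    if p == 0:
        return dist
    mask = 0
    for q in qubits:
        mask |= 1 << q
    share = p / 2 ** len(qubits)
    out = [(1 - p) * x for x in dist]
    for s, prob in enumerate(dist):
        base = s & ~mask
        sub = mask
        while True:  # enumerate every assignment of the depolarised bits
            out[base | sub] += prob * share
            if sub == 0:
                break
            sub = (sub - 1) & mask
    return out


def run(n, rounds, p0, pI, pC, pT):
    dist = initial_distribution(n, p0)
    for gates in rounds:
        busy = set()
        for gate in gates:
            dist = apply_gate(dist, gate)
            dist = depolarise(dist, gate, pC if len(gate) == 2 else pT)
            busy.update(gate)
        for q in range(n):  # qubits not touched this round suffer idle noise
            if q not in busy:
                dist = depolarise(dist, (q,), pI)
    return dist


def error_probability(dist, qubit):
    return sum(prob for s, prob in enumerate(dist) if s >> qubit & 1)


# Assumed schedule for Figure 1b: one gate per round.
SCHEDULE_311 = [[(0, 1)], [(0, 2)], [(1, 2, 0)]]


def p_out_311(p0, pI=0.001, pC=0.003, pT=0.003):
    return error_probability(run(3, SCHEDULE_311, p0, pI, pC, pT), 0)


for p0 in (0.01, 0.02):
    print(p0, round(p_out_311(p0), 5))
```

Under these assumed parameters the sketch gives roughly 0.4% and 0.5% for input error rates of 1% and 2%, in line with the figures quoted for Figure 2; most of the output error comes from gate noise rather than the $3p_0^2$ preparation term.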
This simple model has the significant advantage of being easy to compute with. The disadvantage is that it does not capture all possible error processes within a quantum computer. To give just one example, suppose that your CNOT gate is composed of CZ and Hadamard gates, and that a more accurate error model is that each component gate has an independent probability of depolarising the qubits it acts on. This is not the same as the combined CNOT gate having some probability of depolarising the qubits it acts on; it doesn't even have the property that a system with this noise model can be described as a mixture of computational basis states. Fully realistic noise models are of course more complicated again.

Existence
We can view an (n, k, e) purification circuit as a permutation of $\mathbb{F}_2^n$ which maps the set of vectors of weight at most e into the set of vectors that vanish in k nominated positions. This places a size constraint on n, k, e: the first set has $\sum_{i=0}^{e}\binom{n}{i}$ elements and the second has $2^{n-k}$. The necessary condition is also sufficient.

Proposition 1. If
$$\sum_{i=0}^{e} \binom{n}{i} \le 2^{n-k} \qquad (1)$$
then there is an (n, k, e) purification circuit consisting of CNOT and Toffoli gates.
Note that this is false if we restrict to circuits containing only CNOTs (which act linearly on $\mathbb{F}_2^n$) or to circuits containing only Toffolis (which fix the set of states of weight at most 1). Any purification circuit for which (1) holds with equality is optimal in the sense that, for $p_0 < 1/2$, it maps the most likely $2^{n-k}$ basis states to the most useful $2^{n-k}$ states.
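Condition (1) is cheap to check. The following minimal Python sketch (helper names are hypothetical) evaluates it, flags the tight cases, and recovers the smallest n allowed for given k and e, such as the bounds n ≥ 3 and n ≥ 5 used for (n, 1, 1) and (n, 1, 2) circuits in Section 1.3.

```python
from math import comb


def ball_size(n, e):
    """Number of n-bit strings of weight at most e."""
    return sum(comb(n, i) for i in range(e + 1))


def satisfies_bound(n, k, e):
    """Condition (1): the weight-<=e strings fit into the 2**(n-k) allowed outputs."""
    return ball_size(n, e) <= 2 ** (n - k)


def is_tight(n, k, e):
    """Equality in (1), the optimal case discussed above."""
    return ball_size(n, e) == 2 ** (n - k)


def smallest_n(k, e):
    """Smallest n permitted by (1) for the given k and e."""
    n = k
    while not satisfies_bound(n, k, e):
        n += 1
    return n


print([smallest_n(1, e) for e in range(1, 5)])
```

The same check shows that the (3, 1, 1), (5, 1, 2) and (7, 4, 1) parameter sets are tight, while (9, 1, 3) leaves room for e = 4.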
We prove Proposition 1 in Appendix A. The argument is group-theoretic, and does not provide an efficient procedure to construct purification circuits with given parameters. For the special case of $(2^{m+1} - 1, 1, 2^m - 1)$ circuits we describe an explicit if impractical construction in Appendix B.

Finding small circuits
An (n, 1, 1) purification circuit must have n ≥ 3. It is straightforward to check that no one or two gate circuit is a (3, 1, 1) circuit, so the (3, 1, 1) circuit in Figure 1b is the smallest example that can defend against a single preparation error. (There is one other family of examples, obtained by replacing the second CNOT gate by a Toffoli.) Similarly, an (n, 1, 2) circuit must have n ≥ 5. An exhaustive search reveals that the smallest (5, 1, 2) circuits have 9 gates; two examples are shown in Figure 4. Equivalent circuits can be obtained by permuting the bottom four wires, or by interchanging consecutive commuting gates.

Figure 4: Examples of the two classes of (5, 1, 2) purification circuits of length 9.

There are $2^5 = 32$ basis states of 5 qubits, so a set of states (such as the low weight states, or the states that are |0⟩ on the first qubit) can be represented by a string of 32 bits. By working forward from the initial set of states, backward from the target set of states and meeting in the middle, we can find (5, 1, 2) circuits very quickly, in around $\sqrt{2^{32}} = 2^{16}$ time and space; see Appendix D for more details.
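The meet-in-the-middle idea can be illustrated at n = 3, where a set of basis states fits in an 8-bit mask. This Python sketch is our own illustration rather than the search code of Appendix D: it runs a breadth-first search forwards from the low-weight states and backwards from the target set, relying on CNOT and Toffoli being self-inverse and on the set sizes being equal (equality in (1)).

```python
from itertools import permutations

N = 3  # basis state s encodes qubit q_i in bit i


def gates(n):
    """All CNOTs (control, target) and Toffolis (c1, c2, target) on n qubits."""
    cnots = list(permutations(range(n), 2))
    toffolis = [(a, b, t) for a, b, t in permutations(range(n), 3) if a < b]
    return cnots + toffolis


def apply_to_set(mask, gate, n):
    """Image of a set of basis states (a 2**n-bit mask) under one gate."""
    *controls, target = gate
    image = 0
    for s in range(2 ** n):
        if mask >> s & 1:
            t = s ^ (1 << target) if all(s >> c & 1 for c in controls) else s
            image |= 1 << t
    return image


def bfs(start, n, depth):
    """Shortest gate sequence reaching each set mask, using at most `depth` gates."""
    seen = {start: []}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for mask in frontier:
            for g in gates(n):
                new = apply_to_set(mask, g, n)
                if new not in seen:
                    seen[new] = seen[mask] + [g]
                    next_frontier.append(new)
        frontier = next_frontier
    return seen


def meet_in_the_middle(n, source, target, half_depth):
    """Shortest circuit taking `source` onto `target`, up to 2 * half_depth gates."""
    forward = bfs(source, n, half_depth)
    backward = bfs(target, n, half_depth)  # valid because every gate is self-inverse
    best = None
    for mask, head in forward.items():
        tail = backward.get(mask)
        if tail is not None and (best is None or len(head) + len(tail) < len(best)):
            best = head + tail[::-1]
    return best


low_weight = sum(1 << s for s in range(2 ** N) if bin(s).count("1") <= 1)
q0_zero = sum(1 << s for s in range(2 ** N) if not s & 1)
circuit = meet_in_the_middle(N, low_weight, q0_zero, 2)
print(circuit)
```

With half-depth 1 the search finds nothing, consistent with the claim that no one or two gate circuit works; with half-depth 2 it returns a three-gate (3, 1, 1) circuit.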
To find a (7, 1, 3) circuit in this way the comparable figure is $\sqrt{2^{2^7}} = 2^{64}$ time and space, so we are already at the limit of what can be achieved without special insight into the problem. In the next section we present techniques for combining purification circuits. The results do not have optimal parameters (n, k, e), but do have relatively simple structures and avoid the requirement to do large amounts of work upfront to discover them.

New purification circuits from old
In this section we describe two techniques for constructing new purification circuits from existing ones.

Composition
The simplest and most general technique is to compose circuits, of similar or different types, to obtain larger ones. Figure 5 shows three (3, 1, 1) circuits feeding their outputs into a fourth (3, 1, 1) circuit. An error on the output requires errors on at least two of the inputs of the fourth circuit, which requires at least two errors on at least two of the first three circuits, so the composed circuit has parameters (9, 1, 3). By Proposition 1, a (9, 1, e) purification circuit could in principle have e = 4, so we have given up some of the achievable protection in exchange for a concrete circuit with a simple structure.
By the same argument we obtain the following.
There is no requirement that the circuits in the first stage are identical. For example, we might replace some of the first round (3, 1, 1) circuits in the (9, 1, 3) circuit by naive preparations. Changing only some of the inputs in this way, as shown in Figure 6, allows us to interpolate between the performance and resource requirements of the (3, 1, 1) circuit and the (9, 1, 3) circuit.
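The counting argument for the composed (9, 1, 3) circuit can be confirmed by brute force. In this Python sketch the wire labelling (blocks {0, 1, 2}, {3, 4, 5}, {6, 7, 8} with data wires 0, 3 and 6) is our own assumption, and Figure 5 may lay the circuit out differently; gates are taken to be perfect.

```python
from itertools import combinations


def apply_311(bits, data, aux1, aux2):
    """The (3, 1, 1) circuit of Figure 1b on the given wires, in place."""
    bits[aux1] ^= bits[data]
    bits[aux2] ^= bits[data]
    bits[data] ^= bits[aux1] & bits[aux2]


def composed_913(errors):
    """Output bit of three (3, 1, 1) circuits feeding a fourth."""
    bits = [0] * 9
    for q in errors:
        bits[q] = 1
    for block in (0, 3, 6):      # first stage
        apply_311(bits, block, block + 1, block + 2)
    apply_311(bits, 0, 3, 6)     # second stage acts on the first-stage outputs
    return bits[0]


def failures_by_weight(max_weight=4):
    """Number of input error patterns of each weight that corrupt the output."""
    return {w: sum(composed_913(errs) for errs in combinations(range(9), w))
            for w in range(max_weight + 1)}


print(failures_by_weight())
```

No pattern of at most three errors reaches the output, while 27 of the 126 weight-4 patterns do: exactly those placing two errors in each of two blocks.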

Juxtaposition and overlapping
So far all of our circuits have produced a single output. The simplest way to get multiple outputs is to use multiple circuits; m copies of an (n, k, e) circuit can be viewed as a single (mn, mk, e) circuit. This is very far from optimal, as we are over-protected against sets of e errors split between the m circuits. In favourable situations we can exploit this fact by overlapping circuits to re-use qubits. For example, the circuit in Figure 7 comprises two copies of the (5, 1, 2) circuit from Figure 4a sharing qubits $q_3$ and $q_4$. We can check that this circuit retains the full protection of both original copies, for an overall parameter set (8, 2, 2). This circuit has been drawn to highlight its symmetry, but its depth can be reduced by 1 to 13 by performing the first four gates over two rounds.

Relation to classical codes
A (2e + 1, 1, e) purification circuit can be viewed as a decoder for the (2e + 1)-bit classical repetition code. The output bit contains the majority vote of the input bits, and the other bits contain the syndrome information required to reconstruct the input. Any CNOT and Toffoli circuit for decoding a classical [n, k, 2e + 1] code can be viewed as an (n, k, e) purification circuit in the same way. This is generally more than we need; we only require that strings close to the zero codeword are correctly decoded. Take for example the [7, 4, 3] Hamming code. By Proposition 1, there is a CNOT and Toffoli circuit bijecting the ball of radius 1 about 0 with the space of vectors that vanish in 4 nominated positions. Since there are only $\binom{128}{8} \approx 2^{40}$ sets of 8 basis states on 7 qubits, searching for these circuits is feasible, unlike the situation for (7, 1, 3) circuits described in Section 1.3. There are 508 022 784 such circuits. By inspecting a small number of these and rearranging gates by hand we arrive at the circuit presented in Figure 8a. We emphasise that this is not a full decoding circuit for the Hamming code.

Figure 8: (a) A legible ordering of the gates. (b) A logically equivalent circuit that can be scheduled over 8 rounds.
We can interpret this circuit as four consecutive (3, 1, 1) circuits on sets of wires {0, 3, 6}, {1, 4, 6}, {2, 5, 6}, {3, 4, 5}. The first three circuits share the auxiliary qubit $q_6$. The final circuit re-uses the remaining auxiliary qubits from the first three circuits. Figure 8b shows a logically equivalent circuit obtained by commuting gates past each other greedily that can be scheduled over 8 rounds. This is the form of the circuit that we use in simulations.
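The four-triple description can be turned into a small bit-level model of the (7, 4, 1) circuit. This Python sketch rests on two assumptions not stated explicitly in the text: that the first wire of each listed triple is the data qubit, and that the outputs are q0 to q3. The gate order inside Figure 8a may differ; the function name is hypothetical.

```python
def circuit_741(errors):
    """Four (3, 1, 1) circuits on the wire triples listed above (perfect gates).

    Assumption: in each triple the first wire is the data qubit, the circuits
    run in the order listed, and the outputs are q0..q3.
    """
    q = [0] * 7
    for i in errors:
        q[i] = 1
    for data, aux1, aux2 in ((0, 3, 6), (1, 4, 6), (2, 5, 6), (3, 4, 5)):
        q[aux1] ^= q[data]
        q[aux2] ^= q[data]
        q[data] ^= q[aux1] & q[aux2]
    return q[:4]


for i in range(7):
    print("error on q%d ->" % i, circuit_741([i]))
print("all ones ->", circuit_741(range(7)))
```

Every single preparation error is absorbed, and an all-ones input leaves one |0⟩ and three |1⟩ outputs, which agrees with the point (1, 0.75) noted for the (7, 4, 1) circuit in Section 3 and lends some support to the assumed roles of the wires.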
In the next section we present a very general construction inspired by circuits like that in Figure 8a.

Purification circuits from graphs
With perfect gates, an (n, k, e) purification circuit improves a preparation error rate $p_0$ to $O(p_0^{e+1})$. In practice purification circuits will themselves be subject to error, meaning that the dominant term in the output error probability is likely to be due to gate failures, for example of the final Toffoli. There is therefore limited advantage to increasing e. In this section we focus on increasing the rate k/n, presenting a general construction, based on graphs, which allows us to tune the rate against other parameters like circuit depth.

Figure 9: In the graph construction, each edge becomes a (3, 1, 1) circuit. In the first stage, CNOTs copy an error on any edge to its endpoints. In the second stage, Toffolis correct the error on any edge which marked its endpoints in this way. The gates within each stage commute and can be applied in any order.

Reinterpreting the (3, 1, 1) circuit
We can interpret the operation of the (3, 1, 1) circuit as follows. Think of the first wire as a data qubit, and the other two wires as auxiliary qubits. When the single allowed error is on the data, the two CNOTs mark the auxiliary qubits. This double mark is detected by the Toffoli, which clears the data error. When the error is on an auxiliary qubit, it can't spread back through the CNOTs, and it isn't able to activate both controls of the Toffoli, so the data remains unchanged. If instead we have $a$ auxiliary qubits then we have $\binom{a}{2}$ pairs available for marking in this way, which can be used to protect up to $\binom{a}{2}$ data qubits against a single preparation error. This construction is naturally expressed in the language of graphs.

The general construction
Given a graph G with r vertices and s edges, we can obtain an (r + s, s, 1) purification circuit as follows. Associate one data qubit $q_{uv}$ to each edge uv, and one auxiliary qubit $q_v$ to each vertex v.
(detect stage) For each edge uv, apply CNOTs controlled on $q_{uv}$ and targeting $q_u$, $q_v$.
(correct stage) For each edge uv, apply Toffolis controlled on $q_u$, $q_v$ and targeting $q_{uv}$.
The gates within each stage commute, so can be performed in any order. The circuit has 2s CNOTs and s Toffolis, and can be scheduled over at most $3\chi_e(G)$ rounds, where $\chi_e(G)$ is the edge-chromatic number of G. See Figure 9 for the result of applying this process to a path of length 5.
There is a superficial resemblance between these circuits and those implementing a surface code (or repetition code, in the case where G is a path or cycle). Data qubits are associated with each edge, and 'syndrome extraction' consists of CNOTs implementing a boundary operator, revealing those vertices incident to an odd number of edges with data errors. The two processes then diverge, as a true surface code makes corrections based on considering a global syndrome; we instead make local corrections to edges both of whose endpoints are marked in the syndrome. This works well for isolated errors, but not for more complicated error configurations.
Figure 12: Small and large light cones over a long even cycle; panel (b) shows the output on q5. When viewed as parts of the same circuit over a path or cycle, there is a further idle step following the short light cone.

Figure 10 shows the correct operation of the (3, 1, 1) circuit when there is a single error on the data edge, and the two ways it can fail when there are two errors spread across the data edge and auxiliary vertices. In larger graphs there are other modes of failure. Errors on two incident edges cancel out part of their boundary, so that neither error is cleared and an error is induced on any edge spanning their other two endpoints (Figure 11a). Errors on two disjoint edges are successfully cleared by the circuit, but cause errors on any other edges induced by these vertices (Figure 11b).
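The detect and correct stages translate directly into a gate list, which makes the failure modes above easy to reproduce. In this Python sketch the qubit numbering (vertex v is qubit v, edge i is qubit r + i) and all function names are our own choices; gates are perfect and applied stage by stage.

```python
from itertools import combinations


def graph_circuit(r, edges):
    """(r + s, s, 1) construction: qubit v is the auxiliary of vertex v,
    qubit r + i is the data qubit of edges[i]."""
    detect, correct = [], []
    for i, (u, v) in enumerate(edges):
        data = r + i
        detect += [(data, u), (data, v)]  # CNOTs copy a data error to both endpoints
        correct.append((u, v, data))      # Toffoli clears it if both endpoints are marked
    return detect + correct


def simulate(n, gates, errors):
    bits = [0] * n
    for q in errors:
        bits[q] = 1
    for *controls, target in gates:
        if all(bits[c] for c in controls):
            bits[target] ^= 1
    return bits


def data_errors(r, edges, errors):
    """Indices of edges whose data qubit ends up as 1."""
    bits = simulate(r + len(edges), graph_circuit(r, edges), errors)
    return [i for i in range(len(edges)) if bits[r + i]]


cycle6 = [(v, (v + 1) % 6) for v in range(6)]
k4 = list(combinations(range(4), 2))

# Every single preparation error is absorbed, on both C_6 and K_4.
for r, edges in ((6, cycle6), (4, k4)):
    assert all(data_errors(r, edges, [q]) == [] for q in range(r + len(edges)))

print(data_errors(6, cycle6, [6, 7]))  # incident edges (0,1), (1,2): [0, 1]
print(data_errors(6, cycle6, [6, 8]))  # disjoint edges (0,1), (2,3): [1]
print(data_errors(4, k4, [4, 5]))      # incident edges in K_4 also hit edge (1,2): [0, 1, 3]
```

On $C_6$ the incident pair survives unchanged, while the disjoint pair is cleared but leaves an error on the edge between them; on $K_4$ the incident pair additionally induces an error on the spanning edge (1, 2).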
Taking G to be the complete graph $K_r$ produces an $(r + \binom{r}{2}, \binom{r}{2}, 1)$ purification circuit. If we write $k = \binom{r}{2}$, then the parameters become $(k + O(\sqrt{k}), k, 1)$; that is, we can protect a set of k qubits against a single preparation error with only $O(\sqrt{k})$ overhead. The optimal overhead is at least $\log_2 k$ (as we must be able to map every single qubit error to an error pattern supported on the auxiliary qubits), so this is not too far from best possible.
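For concreteness, both overhead estimates follow from short calculations: the first solves $k = \binom{r}{2}$ for the number r of auxiliary qubits, and the second applies (1) with e = 1, using n ≥ k.

```latex
% Overhead of the K_r construction, with k = \binom{r}{2} protected qubits
n - k = r = \frac{1 + \sqrt{1 + 8k}}{2} = \sqrt{2k} + O(1)

% Lower bound for any (n, k, 1) purification circuit, from (1) with e = 1
1 + n \le 2^{n-k}
\;\Longrightarrow\;
n - k \ge \log_2(n + 1) > \log_2 k
```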

Paths and cycles
In addition to failing for many sets of two preparation errors (Figure 11), the circuit for $K_r$ has the disadvantage that it requires $\Omega(r)$ rounds of gates. These disadvantages can be addressed by choosing a sparser graph, such as a long cycle. If G is a cycle $C_k$ on k vertices, then the circuit can be scheduled over 4 (if k is even) or 5 (if k is odd) rounds, and has a natural planar layout (cf. Figure 9). It also performs significantly better than its parameters alone would suggest.
A (2k, k, 1) circuit defined over a cycle protects completely against any single error. Two or more errors on the input might lead to an error on the output, but only if those errors are sufficiently close. By the light cone of an output qubit we mean the subset of the gates and input qubits that can causally affect it. The state of the output qubit depends only on the preparation and gate errors on this part of the circuit. Over a long even cycle there are only two types of output qubit up to isomorphism (those in the first round of Toffolis and those in the second round) and so two possible light cones, shown in Figure 12. This allows us to analyse the output quality of the circuit over any sufficiently long even cycle by examining only two light cones. Since the larger light cone only reaches 10 qubits, 'sufficiently long' means k ≥ 10.
With preparation error rate $p_0$ and perfect gates, each output qubit experiences errors with probability $O(p_0^2)$. Errors on output qubits are independent if they are sufficiently separated that their light cones are disjoint, but errors on nearby output qubits are correlated. This might lead to the following situation. Suppose that some set of a errors on the input leads to b > a errors on the output. Then an event which should have probability $O(p_0^b)$ in fact occurs with probability $\Omega(p_0^a)$. Depending on the intended use of the output qubits, this might be problematic.
We call a purification circuit combinatorially fault-tolerant up to b preparation errors if, for any a ≤ b, any set of a errors on the input leads to at most a errors on the output. For example, one can check that the circuits corresponding to long cycles are combinatorially fault-tolerant up to 3 preparation errors and that there is exactly one pattern of 4 errors on the input leading to more than 4 errors on the output (Figure 13). This pattern works as follows. Input errors on two adjacent edges are frozen in place because their shared auxiliary qubit is first set to |1⟩ then reset to |0⟩, so it does not activate the Toffolis. Two of these frozen patterns placed one edge apart preserve the original errors but also cause a new error on the separating edge.
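The "one can check" claim can be tested by exhaustive enumeration on a concrete cycle. This Python sketch uses $C_{12}$ as an assumed stand-in for a long cycle, with our own qubit numbering (vertex v is qubit v, edge (v, v + 1) is qubit k + v) and perfect gates.

```python
from itertools import combinations


def cycle_circuit(k):
    """Basic construction on C_k: qubit v is vertex v, qubit k + v is edge (v, v+1)."""
    cnots = [(k + v, u) for v in range(k) for u in (v, (v + 1) % k)]
    toffolis = [(v, (v + 1) % k, k + v) for v in range(k)]
    return cnots + toffolis


def count_output_errors(k, gates, errors):
    bits = [0] * (2 * k)
    for q in errors:
        bits[q] = 1
    for *controls, target in gates:
        if all(bits[c] for c in controls):
            bits[target] ^= 1
    return sum(bits[k:])  # errors remaining on the data (edge) qubits


def amplifying_patterns(k, a):
    """Patterns of exactly a preparation errors that give more than a output errors."""
    gates = cycle_circuit(k)
    return [errs for errs in combinations(range(2 * k), a)
            if count_output_errors(k, gates, errs) > a]


k = 12
print([len(amplifying_patterns(k, a)) for a in range(4)])
frozen_pairs = [k + 0, k + 1, k + 3, k + 4]  # edges (0,1),(1,2) and (3,4),(4,5)
print(count_output_errors(k, cycle_circuit(k), frozen_pairs))  # 5
```

No pattern of up to three errors is amplified, whereas the two frozen pairs placed one edge apart turn four input errors into five output errors.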
One way to stop this behaviour is to prevent errors on adjacent pairs of edges from being frozen in place. We can do this by adding an extra round of Toffolis to the detect stage.
(detect stage) For each edge uv, apply CNOTs controlled on $q_{uv}$ and targeting $q_u$, $q_v$.
(detect stage) For each pair of consecutive edges uv, vw, apply Toffolis controlled on $q_{uv}$, $q_{vw}$ and targeting $q_v$.
(correct stage) For each edge uv, apply Toffolis controlled on $q_u$, $q_v$ and targeting $q_{uv}$.
The gates in both detect stages all commute with each other, so can be performed in any order. One choice of circuit is shown in Figure 14, with corresponding light cones in Figure 15. The idea behind this extended circuit is that, if there are no input errors on the vertices, then the detect stage marks all of the endpoints of each edge with an input error, with no cancellation arising from errors on consecutive edges. Then an input error on an edge is successfully cleared unless there is also an input error on one of its endpoints; and an edge with no input error experiences an error on the output only if, for both of its endpoints, there is an input error on either that vertex or the next edge.

Theorem 3. The extended purification circuit defined over a path or cycle is combinatorially fault-tolerant for any number of preparation errors.
We give the elementary proof in Appendix C.
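Theorem 3 can also be spot-checked numerically on a small cycle. This Python sketch (our own numbering and names, perfect gates, cycle $C_6$ chosen as an assumed small test case) enumerates all $2^{12}$ preparation error patterns for the extended circuit and searches for a pattern with more output than input errors; it is a sanity check, not a substitute for the proof in Appendix C.

```python
from itertools import product


def extended_cycle_circuit(k):
    """Extended construction on C_k: qubit v is vertex v, qubit k + v is edge (v, v+1)."""
    cnots = [(k + v, u) for v in range(k) for u in (v, (v + 1) % k)]
    # Extra detect step: edges (v-1, v) and (v, v+1) jointly flip their shared vertex v.
    pair_toffolis = [(k + (v - 1) % k, k + v, v) for v in range(k)]
    correct = [(v, (v + 1) % k, k + v) for v in range(k)]
    return cnots + pair_toffolis + correct


def count_output_errors(k, gates, errors):
    bits = [0] * (2 * k)
    for q in errors:
        bits[q] = 1
    for *controls, target in gates:
        if all(bits[c] for c in controls):
            bits[target] ^= 1
    return sum(bits[k:])


def exhaustive_check(k):
    """Return a pattern with more output than input errors, or None if there is none."""
    gates = extended_cycle_circuit(k)
    for pattern in product((0, 1), repeat=2 * k):
        errors = [q for q, bit in enumerate(pattern) if bit]
        if count_output_errors(k, gates, errors) > len(errors):
            return errors
    return None


print(exhaustive_check(6))
print(count_output_errors(12, extended_cycle_circuit(12), [12, 13, 15, 16]))  # 1
```

The second line revisits the two frozen pairs that defeated the basic construction: with the extra detect step they are cleared, leaving only the separating edge in error.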
We have now described all of our constructions. In the next section we present data on their performance.

Performance of purification circuits
For each circuit or light cone discussed in Sections 1 and 2 we computed the distribution of the output qubits. The $2^k$ probabilities in this output distribution are polynomials in $p_0$, $p_I$, $p_C$, $p_T$. Rather than perform Monte Carlo experiments we calculated enough terms of these polynomials to obtain the desired accuracy across the parameter range of interest; we give more details in Appendix E.
The circuits we simulate are listed in Table 2 along with some basic parameters. To simulate a circuit with idle errors we need to decide how to schedule the gates into rounds. For the cycle-based constructions the gates were scheduled manually as described in Figures 12 and 15. For the other constructions the gates were scheduled greedily, commuting gates past each other when necessary. The circuit diagrams in this paper were programmatically generated from the scheduled circuits as simulated, and typeset using yquant [6]. The typeset circuits respect the scheduled gate order, but to avoid visual clutter we have not forced them to respect the scheduling of gates into rounds. The exact sequence of idle, CNOT and Toffoli gates simulated can be generated from the published source code [21]. For a circuit or light cone we write $p_{\mathrm{out}}(p_0, p_I, p_C, p_T)$ for the expected number of errors on the output, divided by the number of outputs. When there is a single output this is the probability of an error on that qubit; when there are multiple outputs this is the average probability of an error on each qubit.
Let $g_1$, $g_2$, $g_3$ be the number of idle, CNOT and Toffoli gates in a circuit on n qubits. The polynomial $p_{\mathrm{out}}(p_0, p_I, p_C, p_T)$ is a sum of terms of the form
$$c\, p_0^{f_0}(1-p_0)^{n-f_0}\, p_I^{f_1}(1-p_I)^{g_1-f_1}\, p_C^{f_2}(1-p_C)^{g_2-f_2}\, p_T^{f_3}(1-p_T)^{g_3-f_3}, \qquad c \ge 0,$$
each corresponding to a run of the circuit in which there were $f_0$ preparation errors, $f_1$ idle errors, $f_2$ CNOT errors and $f_3$ Toffoli errors. In particular,
$$p_{\mathrm{out}}(p_0, 0, 0, 0) = \sum_{i=0}^{n} a_i\, p_0^{i}(1-p_0)^{n-i}$$
for some $a_i \ge 0$, where the i = 0 term is 0 as there are no errors on the output if there are no errors on the input. We plot these polynomials for each circuit in Figure 16. Note that the small and large light cones of each cycle construction, and the two (5, 1, 2) circuits, are equivalent in the absence of idle and gate errors, so only one representative of each type is plotted in Figure 16. As expected, near 0 the performance of the circuits is determined by the leading order term of their error probability. We include the plot for the full range of preparation errors to point out some general features.
• All curves pass through (0, 0), as the zero state is fixed by CNOT and Toffoli gates.
• All curves pass through (0.5, 0.5), as a uniform mixture of the basis states after preparation is preserved by every gate. For the majority vote-based (2e + 1, 1, e) circuits, the curves are symmetric about this point.
• At $p_0 = 1$ the output is deterministic. For most circuits, such as the majority vote (2e + 1, 1, e) circuits, the output is |1⟩, an error. For the cycle construction, an all-|1⟩ input happens to produce an all-|0⟩ output. (For $p_0 < 0.5$ the error rate at $1 - p_0$ is greater than the error rate at $p_0$, so this isn't an argument in favour of preparing |1⟩ states deliberately.) The (7, 4, 1) circuit passes through (1, 0.75), meaning that an all-|1⟩ input produces one |0⟩ and three |1⟩ states on the output.
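The coefficients $a_i$ can be computed by enumerating input error patterns of each weight. This minimal Python sketch (function names are our own; perfect gates) does so for the (3, 1, 1) circuit and evaluates the resulting polynomial, which exhibits the general features listed above.

```python
from itertools import combinations


def purify_311(errors):
    """Output error (0 or 1) of the Figure 1b circuit for the given input error positions."""
    q = [0, 0, 0]
    for i in errors:
        q[i] = 1
    q[1] ^= q[0]
    q[2] ^= q[0]
    q[0] ^= q[1] & q[2]
    return q[0]


def coefficients(n, output_errors, k):
    """a_i: total output errors over all weight-i inputs, divided by the number of outputs k."""
    return [sum(output_errors(errs) for errs in combinations(range(n), i)) / k
            for i in range(n + 1)]


def p_out(a, p0):
    """Noiseless p_out as sum_i a_i p0^i (1 - p0)^(n - i)."""
    n = len(a) - 1
    return sum(a_i * p0 ** i * (1 - p0) ** (n - i) for i, a_i in enumerate(a))


a = coefficients(3, purify_311, 1)
print(a)  # [0.0, 0.0, 3.0, 1.0]
print([p_out(a, p) for p in (0.0, 0.5, 1.0)])
```

So $p_{\mathrm{out}} = 3p_0^2(1 - p_0) + p_0^3$ for this circuit, which passes through (0, 0), (0.5, 0.5) and (1, 1) and satisfies $p_{\mathrm{out}}(p_0) + p_{\mathrm{out}}(1 - p_0) = 1$, the symmetry expected of a majority-vote circuit.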

Noisy gates
Throughout this section we fix an idle error rate $p_I = 0.001$. Figure 17 shows the performance of the various circuits with representative gate noise parameters $p_C = p_T = 0.003$. The main difference from Figure 16 is that there can now be errors on the output even without errors on the input. This limits the performance of circuits, moving the y-intercepts in the plot away from 0. For $p_0$ close to 0 we have $p_{\mathrm{out}} > p_0$, as high-quality qubits are damaged by noisy gates. Then for each circuit there is an interval of $p_0$ for which the circuit reduces the output error rate below $p_0$. This interval might extend as far as $p_0 = 0.5$ (at which no purification circuit can lead to an improvement) or, as in the case of the (7, 4, 1) circuit in Figure 17, end much earlier.
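To see how gate noise lifts the curves off the origin, the sketch below evolves the exact probability distribution over basis states through the (3, 1, 1) circuit. The noise model is an assumption for illustration and not necessarily the one used in [21]: after each gate, a depolarising channel of strength $p_C$ (CNOT) or $p_T$ (Toffoli) acts on the gate's qubits, and we schedule one gate per round with every other qubit suffering single-qubit depolarising noise of strength $p_I$. Because every gate permutes basis states, the state stays diagonal and a Pauli error acts only through its bit-flip (X or Y) components. All function names are hypothetical.

```python
import numpy as np

def prepare(n, p0):
    """Product distribution: each qubit is independently |1> with probability p0."""
    ones = np.array([bin(s).count("1") for s in range(2**n)])
    return p0**ones * (1 - p0) ** (n - ones)

def apply_gate(dist, n, controls, target):
    """Permute the distribution by a CNOT/Toffoli (bit i of a state is qubit q_i)."""
    new = np.empty_like(dist)
    for s in range(2**n):
        fires = all((s >> c) & 1 for c in controls)
        new[s ^ (1 << target) if fires else s] = dist[s]
    return new

def depolarise(dist, qubits, p):
    """Depolarising noise of strength p on `qubits`, as seen by a diagonal state.

    Of the 4**k - 1 non-identity Paulis on k qubits, each non-zero bit-flip
    pattern (X or Y on the flipped qubits) comes from 2**k of them and the
    zero pattern (I or Z only) from 2**k - 1 of them.
    """
    k = len(qubits)
    n_paulis = 4**k - 1
    out = (1 - p * (4**k - 2**k) / n_paulis) * dist
    states = np.arange(len(dist))
    for pattern in range(1, 2**k):
        mask = sum(1 << q for j, q in enumerate(qubits) if (pattern >> j) & 1)
        out = out + (p * 2**k / n_paulis) * dist[states ^ mask]
    return out

def p_out(gates, n, outputs, p0, pI, pC, pT):
    """Exact output error rate under the assumed noise model (one gate per round)."""
    dist = prepare(n, p0)
    for controls, target in gates:
        qubits = (*controls, target)
        dist = apply_gate(dist, n, controls, target)
        dist = depolarise(dist, qubits, pC if len(controls) == 1 else pT)
        for q in range(n):
            if q not in qubits:  # every other qubit idles during this round
                dist = depolarise(dist, (q,), pI)
    states = np.arange(2**n)
    errors = sum((states >> q) & 1 for q in outputs)
    return float(dist @ errors) / len(outputs)

MAJORITY_3 = [((0,), 1), ((0,), 2), ((1, 2), 0)]  # assumed gate order
for p0 in (0.0, 0.001, 0.01, 0.05):
    print(p0, p_out(MAJORITY_3, 3, [0], p0, pI=0.001, pC=0.003, pT=0.003))
```

With all noise parameters set to 0 this reduces to $3p_0^2 - 2p_0^3$, and at $p_0 = 0.5$ it returns 0.5 because both permutations and depolarising noise preserve the uniform distribution.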
These intervals are sections through the region of parameter space on which using each circuit leads to an improvement in average qubit quality. Figure 18 illustrates this region for the (7, 4, 1) circuit. Figure 19 illustrates this region for the (3, 1, 1) circuit. Note that these regions are genuinely different in form. For example, immediately below the plane p 0 = 0.5 there is no improvement from running the (7, 4, 1) circuit, but there is an improvement from running the (3, 1, 1) circuit.
Let $\theta(p_I, p_C, p_T) = \inf\{p_0 : p_{\mathrm{out}}(p_0, p_I, p_C, p_T) = p_0\}$ be the lower threshold preparation error rate for obtaining an improvement. In the setting of Figures 18a and 19, θ corresponds to the lowest intersection point of each vertical line with the boundary surface. In particular, θ is not necessarily continuous. Figure 3 shows the threshold $\theta(0.001, p_C, p_T)$ for the (3, 1, 1) circuit. The x-axis shows $p_C$, the y-axis shows the ratio $p_T/p_C$ of Toffoli to CNOT error rate and the z-axis, represented by colour, shows θ. We include contour lines for $p_0 \in \{0.003, 0.01, 0.03, 0.1\}$. If your CNOT and Toffoli error rates place you on a contour, running the (3, 1, 1) circuit produces output of the same quality as the input at that value of $p_0$. If your CNOT and Toffoli error rates place you below and to the left of a contour, you can expect an improvement from running the circuit at that value of $p_0$, provided that the output error rate increases with the gate error rates. Examination of plots like Figures 18a and 19 for each circuit shows that this holds in practice. It would be interesting to find general conditions under which the output error rate is an increasing function of the gate error rates; we return to this question in Section 3.2. Figure 3 shows that the (3, 1, 1) circuit can improve output qubit quality even for high gate failure rates; for example, a CNOT failure rate of 1% and a Toffoli failure rate of 3% suffice to improve a preparation error rate of 3% (although in practice you would want lower gate error rates to obtain a meaningful improvement). The (5, 1, 1), (7, 1, 2) and (9, 1, 3) circuits have the same thresholds as the (3, 1, 1) circuit: if applying the (3, 1, 1) circuit reduces error rates, then applying it multiple times also reduces error rates.

Figure 18 (axes: CNOT depolarisation rate; Toffoli rate / CNOT rate; preparation error rate): (a) The region of parameter space ($0 \le p_0 \le 0.5$, $p_I = 0.001$, $0 \le p_C \le 0.05$, $1 \le p_T/p_C \le 3$) in which using the (7, 4, 1) circuit leads to an improvement in average qubit quality. Each data point represents a set of parameters for which $p_{\mathrm{out}} = p_0$. The plane at $p_0 = 0.5$ corresponds to the maximum entropy state in which every computational basis state is equally likely. The curved surface bounds the region to its left in which running the circuit provides an advantage over naive preparation.

Figure 19 (axes: CNOT depolarisation rate; Toffoli rate / CNOT rate; preparation error rate): Parameter sets between these two surfaces lead to improvements when running the (3, 1, 1) circuit.

Thresholds for the remaining eight circuits are plotted in Figure 20. Note that many of the thresholds exhibit the discontinuity (a sharp step up to 0.5) that we should expect from Figure 18a. Figures 17 and 20 suggest that the basic path/cycle construction of Figure 9 provides very good performance at a modest overhead of a factor of two in the number of qubits.
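The definition of θ as an infimum of fixed points, and the step up to 0.5 when no fixed point exists below 0.5, can be illustrated with a small root finder. This is a sketch, not the method used to produce the figures. It works for any function of $p_0$ alone (the other error rates held fixed), and it is demonstrated on a hypothetical toy model (a perfect three-input majority vote whose output bit then flips with probability eps) rather than on one of the simulated circuits.

```python
def lower_threshold(p_out, steps=10_000, tol=1e-12):
    """Approximate theta = inf{p0 : p_out(p0) = p0} on [0, 0.5].

    Every curve passes through (0.5, 0.5), so theta <= 0.5. We scan a grid for
    the first sign change of p_out(p0) - p0 and refine it by bisection; a root
    where the curve only touches the diagonal could be missed by the scan.
    """
    def gap(p):
        return p_out(p) - p

    prev_p, prev_gap = 0.0, gap(0.0)
    if prev_gap == 0.0:
        return 0.0
    for j in range(1, steps):
        p = 0.5 * j / steps
        cur = gap(p)
        if cur == 0.0:
            return p
        if (cur > 0) != (prev_gap > 0):
            lo, hi = prev_p, p  # gap(lo) and gap(hi) have opposite signs
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if (gap(mid) > 0) == (prev_gap > 0):
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        prev_p, prev_gap = p, cur
    return 0.5  # no crossing below 0.5: the threshold jumps to 0.5

def toy_majority_with_output_flip(eps):
    """Hypothetical toy model, not one of the simulated circuits: a perfect
    (3, 1, 1) majority vote whose output bit then flips with probability eps."""
    def p_out(p0):
        majority = 3 * p0**2 - 2 * p0**3
        return eps + (1 - 2 * eps) * majority
    return p_out

print(lower_threshold(toy_majority_with_output_flip(0.01)))  # between 0.0103 and 0.0104
print(lower_threshold(toy_majority_with_output_flip(0.2)))   # 0.5
```

Raising eps from 0.01 to 0.2 removes every crossing below 0.5, so θ jumps discontinuously to 0.5, mirroring the steps seen in the threshold plots.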

Monotonicity
We might expect that running a purification circuit with lower gate and preparation error rates always produces a higher-quality output, but Figure 16 shows that this is false for preparation errors and the circuit defined over a cycle: since an all-$|1\rangle$ input produces an all-$|0\rangle$ output, its output error rate falls back to 0 at $p_0 = 1$.
It is, however, true near 0. It follows from (2) that $p_{\mathrm{out}}(p_0, 0, 0, 0)$ is increasing for $p_0 < 1/n$. A similar argument shows that $p_{\mathrm{out}}(0, p_I, 0, 0)$, $p_{\mathrm{out}}(0, 0, p_C, 0)$ and $p_{\mathrm{out}}(0, 0, 0, p_T)$ are increasing when each failure probability is less than one over the corresponding number of gates. Since $p_{\mathrm{out}}$ is a polynomial, continuity of the partial derivatives shows that it is increasing in some neighbourhood of 0.
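The step from (2) to monotonicity below $1/n$ is a one-line derivative computation. Writing (2) as $p_{\mathrm{out}}(p_0,0,0,0) = \sum_{i} a_i p_0^{i}(1-p_0)^{n-i}$ with $a_i \ge 0$ and $a_0 = 0$, each term satisfies, for $0 < p_0 < 1$,

$$\frac{d}{dp_0}\Big[p_0^{i}(1-p_0)^{n-i}\Big] = i\,p_0^{i-1}(1-p_0)^{n-i} - (n-i)\,p_0^{i}(1-p_0)^{n-i-1} = p_0^{i-1}(1-p_0)^{n-i-1}\,(i - n p_0).$$

Since only terms with $i \ge 1$ appear, for $0 < p_0 < 1/n$ we have $i - n p_0 > i - 1 \ge 0$, so every term with $a_i > 0$ is strictly increasing on $(0, 1/n)$, and hence so is $p_{\mathrm{out}}(p_0, 0, 0, 0)$ unless it vanishes identically.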
As remarked earlier, when $\binom{n}{0} + \cdots + \binom{n}{e} = 2^{n-k}$ (so that inequality (1) is tight) and $p_0 < 0.5$, every acceptable input state (a pattern of preparation errors that leads to a clean output) has higher probability than every unacceptable input state: in this case the acceptable input states are exactly the vectors of weight at most e, each of which is more likely than any vector of larger weight. This means that any averaging process, for example an arbitrary sequence of depolarisations, is bad for the output error rate. Again, this is not enough to show that $p_{\mathrm{out}}$ is increasing in each error rate, as successive depolarisations do not necessarily make things progressively worse.
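The tight case can be checked by brute force for the (3, 1, 1) circuit, where $1 + 3 = 2^{3-1}$. The sketch below uses the majority function directly as the circuit's output (1 meaning an output error) instead of simulating gates; the function names are hypothetical.

```python
from itertools import product

def majority3(bits):
    """Output of the (3, 1, 1) majority-vote circuit: 1 means an output error."""
    return int(sum(bits) >= 2)

def acceptable_inputs():
    """Preparation-error patterns that lead to a clean output."""
    return {v for v in product((0, 1), repeat=3) if majority3(v) == 0}

def pattern_probability(v, p0):
    """Probability of preparation-error pattern v when each bit errs with rate p0."""
    return p0 ** sum(v) * (1 - p0) ** (len(v) - sum(v))

def acceptable_more_likely(p0):
    """True if every acceptable pattern is more likely than every unacceptable one."""
    good = acceptable_inputs()
    bad = set(product((0, 1), repeat=3)) - good
    return (min(pattern_probability(v, p0) for v in good)
            > max(pattern_probability(v, p0) for v in bad))

# Tight case: the acceptable set is exactly the Hamming ball B(0, 1).
assert acceptable_inputs() == {v for v in product((0, 1), repeat=3) if sum(v) <= 1}
print([acceptable_more_likely(p) for p in (0.01, 0.2, 0.49, 0.5)])  # [True, True, True, False]
```

At $p_0 = 0.5$ every pattern is equally likely, so the strict ordering fails there, as expected.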
We conjecture that it should be possible to prove monotonicity under some reasonable set of conditions. For instance, suppose that $\binom{n}{0} + \cdots + \binom{n}{e} = 2^{n-k}$ and $p_0 < 0.5$. Is $p_{\mathrm{out}}(p_0, p_I, p_C, p_T)$ an increasing function of $p_0$, $p_I$, $p_C$ and $p_T$? We leave this as an open question.

Figure 20 (axes: CNOT depolarisation rate; Toffoli rate / CNOT rate; lower threshold preparation error θ): … (3, 1, 1) circuits. Each subplot shows, for each value of $p_C$ and $p_T/p_C$, the least $p_0$ beyond which running the corresponding circuit can produce output of higher quality than the input. If your gate errors place you down and to the left of a contour, then you expect an improvement from running the corresponding circuit. See Section 3.1 for a complete description.

Outlook
We have described a variety of relatively convenient low-depth circuits compatible with 1D arrays of qubits which substantially improve $|0\rangle$ preparation when two-qubit errors are well below preparation errors. We believe non-Clifford circuits are also worth exploring for pre- and post-processing of the measurement subsystems that will be used in syndrome extraction and elsewhere in quantum computers. The practical use of these circuits remains an open question, as tradeoffs in qubit cost and connectivity, as well as the cost in time and energy of physical resets of unused qubits, depend substantially on the physical architecture. We would very much like to see data on the practical effect of running our circuits on real quantum computing hardware.

Figure 21: A (7, 1, 3) purification circuit presented using multiply controlled Toffoli gates.

A Existence of purification circuits
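In this section we prove Proposition 1 on the existence of purification circuits with given parameters. We use the following algebraic fact.
Lemma 4. For n ≥ 1, the permutations of $\mathbb{F}_2^n \setminus \{0\}$ generated by CNOT and Toffoli gates on n qubits form the full symmetric group if n ≤ 3 and the alternating group if n ≥ 4.
Proof of Proposition 1. By (1), $1 + n \le 2^{n-k}$, so we may assume that n > 1. Then $3 \le 2^{n-k}$, and so $k \le n - 2$.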
Let σ be any permutation of $\mathbb{F}_2^n \setminus \{0\}$ which maps the vectors of weight at most e into the vectors that vanish in the first k positions. Let τ be the transposition that swaps the standard basis vectors $e_n$ and $e_{n-1}$. Since $k \le n - 2$, both of these vectors vanish in the first k positions, so τσ also maps the vectors of weight at most e into the vectors that vanish in the first k positions. Since σ and τσ differ by a transposition, one of them is even, so by Lemma 4 it can be expressed in terms of CNOTs and Toffolis.
Proof of Lemma 4. For n = 1, all the groups in question are trivial.
For n ≥ 2, let $C_n$ be the group generated by CNOTs on n qubits. $C_n$ is isomorphic to GL(n, 2) acting naturally on $\mathbb{F}_2^n \setminus \{0\}$. For n = 2, this achieves the action of the full symmetric group on $\mathbb{F}_2^2 \setminus \{0\}$. For n ≥ 3, let $T_n$ be the group generated by Toffolis on n qubits. $T_n$ fixes every vector of weight at most 1 and acts on the $2^n - n - 1$ vectors of weight at least 2. For n = 3, it acts as the symmetric group Sym(4); for n ≥ 4 it acts as the alternating group $\mathrm{Alt}(2^n - n - 1)$ [12, Theorem 1.1.4]. Now consider the group $G_n = \langle C_n, T_n \rangle$.
As $G_n$ acts primitively on $\mathbb{F}_2^n \setminus \{0\}$ and contains a 3-cycle (2-cycle) if n ≥ 4 (n = 3), it contains the alternating group (symmetric group) on $\mathbb{F}_2^n \setminus \{0\}$ [7, Theorem 3.3A]. A CNOT on n qubits swaps $2^{n-2}$ pairs of basis states and a Toffoli swaps $2^{n-3}$ pairs, so on at least 4 qubits both are even permutations of the basis states, and this containment is equality.
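The parity claim is easy to confirm for small n by computing the cycle decomposition of each gate as a permutation of the $2^n$ basis states. The zero state is fixed by every gate, so the parity on $\mathbb{F}_2^n \setminus \{0\}$ is the same. This is an illustrative sketch with hypothetical function names.

```python
def gate_permutation(n, controls, target):
    """Permutation of the 2**n basis states (bit i = qubit q_i) by a CNOT/Toffoli."""
    return [s ^ (1 << target) if all((s >> c) & 1 for c in controls) else s
            for s in range(2**n)]

def is_even(perm):
    """Parity from the cycle decomposition: a cycle of length L is L - 1 transpositions."""
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        length, s = 0, start
        while not seen[s]:
            seen[s] = True
            s = perm[s]
            length += 1
        transpositions += max(length - 1, 0)
    return transpositions % 2 == 0

for n in range(2, 6):
    cnot = is_even(gate_permutation(n, (0,), 1))
    toffoli = is_even(gate_permutation(n, (0, 1), 2)) if n >= 3 else None
    print(n, cnot, toffoli)
# 2 False None
# 3 True False
# 4 True True
# 5 True True
```

The output shows why the bound is 4 qubits: a CNOT is already even on 3 qubits, but a Toffoli on 3 qubits is a single transposition.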

B An explicit construction
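In this section we give an explicit construction of purification circuits with parameters $(2^{m+1} - 1, 1, 2^m - 1)$ for m ≥ 1. Write $n = 2^{m+1} - 1$ and $e = 2^m - 1$, so that $n = 2e + 1$.
• In the first stage, perform n − 1 CNOTs controlled on $q_0$ and targeting each other $q_i$.
• In the second stage, for each set of e + 1 qubits among $q_1, \ldots, q_{n-1}$, apply a multiply controlled Toffoli gate controlled on those qubits and targeting $q_0$.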
For m = 1 this is the (3, 1, 1) circuit in Figure 1b. For m = 2 it is the (7, 1, 3) circuit shown in Figure 21; recall from Section 1.3 that searching for a shortest (7, 1, 3) circuit by brute force was out of reach.
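As presented, this circuit already has a large number of gates: n − 1 CNOTs and $\binom{2e}{e+1}$ multiply controlled Toffolis, each with e + 1 controls (for m = 2, this is 6 CNOTs and 15 Toffolis with 4 controls each). If the multiply controlled Toffolis are expanded into conventional Toffolis, this number increases further; the exact increase depends on how the expansion is performed and on the original order of the multiply controlled Toffolis.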
Proof. Let 0 ≤ r ≤ e and let v be a vector of weight r. We must show that the circuit maps v to a vector which is 0 in the first position.
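If $v_0 = 0$, then none of the CNOTs activate, and since at most e of the bits $1, \ldots, n-1$ are set, none of the Toffolis (each with e + 1 controls) activate either. So every gate of the circuit fixes v, and the first bit remains 0.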
If $v_0 = 1$, then all of the CNOTs activate, mapping v to a vector w with $w_0 = 1$ and $(n - 1) - (r - 1) = 2e + 1 - r \ge e + 1$ of bits $1, \ldots, n-1$ set. This will activate

$$\binom{2e+1-r}{e+1} = \binom{2e+1-r}{e-r} = \binom{2^m+s}{s}$$

of the Toffolis, where $0 \le s = e - r \le e < 2^m$. We claim that this number is odd, from which the result follows. We use induction on s. For s = 0 the number is 1, so odd. For s > 0, we have

$$\binom{2^m+s}{s} = \frac{2^m+s}{s}\binom{2^m+s-1}{s-1},$$

where by induction the binomial coefficient on the right-hand side is odd. Since $s < 2^m$, the same power of 2 divides the numerator and the denominator of the fraction, so it has the form a/b where both a and b are odd. Hence the left-hand side is odd, as required.
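The construction is small enough to verify exhaustively for m = 1 and m = 2. The sketch below builds the gate list with the second stage as read from the proof (one multiply controlled Toffoli per (e + 1)-subset of $q_1, \ldots, q_{n-1}$, targeting $q_0$) and checks that every input of weight at most e leaves $q_0 = 0$. Function names are hypothetical, and larger m quickly becomes slow in pure Python.

```python
from itertools import combinations, product

def appendix_b_circuit(m):
    """Gates of the (2**(m+1) - 1, 1, 2**m - 1) construction as (controls, target).

    Stage 1: a CNOT from q0 to every other qubit. Stage 2 (as read from the
    proof): one multiply controlled Toffoli per (e + 1)-subset of q1..q_{n-1},
    each targeting q0.
    """
    e = 2**m - 1
    n = 2 * e + 1
    gates = [((0,), i) for i in range(1, n)]
    gates += [(subset, 0) for subset in combinations(range(1, n), e + 1)]
    return n, e, gates

def run(gates, bits):
    """Apply the gates to a classical bit string and return the result."""
    bits = list(bits)
    for controls, target in gates:
        if all(bits[c] for c in controls):
            bits[target] ^= 1
    return bits

def is_purification_circuit(m):
    """Check that every input of weight at most e leaves q0 in state 0."""
    n, e, gates = appendix_b_circuit(m)
    return all(run(gates, v)[0] == 0
               for v in product((0, 1), repeat=n) if sum(v) <= e)

print(is_purification_circuit(1), is_purification_circuit(2))  # True True
```

For m = 1 the gate list is exactly two CNOTs followed by one Toffoli, matching the (3, 1, 1) circuit.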

C Fault-tolerance of the enhanced cycle construction
In this section we prove Theorem 3, that the enhanced cycle construction is fully fault-tolerant.
Proof of Theorem 3. A path on two vertices only has one output, so is automatically fault-tolerant. Otherwise a path behaves like a cycle on the same number of vertices where one nominated edge never has an error on the input, so we prove the result for cycles. Let V be the set of vertices with an error on the input, E the set of edges with an error on the input and F the set of edges with an error on the output. We must show that $|F| \le |V| + |E|$, or equivalently that

$$|F \setminus E| \le |V| + |E \setminus F|. \qquad (3)$$
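Fix an orientation of the cycle. We will show that, starting from any element of $F \setminus E$ and walking round the cycle clockwise, we encounter an element of $V \cup (E \setminus F)$ before another element of $F \setminus E$. Assigning to each element of $F \setminus E$ the first such element it encounters then gives an injection from $F \setminus E$ into $V \cup (E \setminus F)$, which proves (3).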
Let xyz be a sequence of three consecutive vertices moving clockwise around the cycle, and suppose that $xy \in F \setminus E$. If $y \in V$ we are done, so suppose that $y \notin V$. For xy to have an error on the output but not the input, y must be in state $|1\rangle$ after the detect stage. Since y and xy have no errors on the input, yz must have an error on the input. If $yz \in E \setminus F$ then we are done, so assume that $yz \in E \cap F$. But now, by construction, an input error on an edge cannot persist in the output if there are no input errors on its endpoints. There is no input error on y, so we must have $z \in V$.

D Searching for circuits
As described in Section 1.2 and Appendix A, CNOT and Toffoli gates on a set of n qubits act as permutations on the computational basis, which we identify with $\mathbb{F}_2^n$. Let $\Pi_n$ be the set of these CNOT and Toffoli permutations. An (n, k, e) purification circuit is a composition of elements of $\Pi_n$ which maps the Hamming ball $B(0, e)$ of radius e about 0 into the codimension-k subspace $V_{n-k}$ of $\mathbb{F}_2^n$ comprising the vectors which vanish in the first k positions. By Proposition 1, such a composition exists provided $|B(0, e)| \le |V_{n-k}|$.
We can find such compositions as follows. Let $\mathcal{P}(\mathbb{F}_2^n)$ be the power set of $\mathbb{F}_2^n$ and, for a set S, let $\binom{S}{m}$ be the set of subsets of S of size m. For sets of states $S \in \mathcal{P}(\mathbb{F}_2^n)$, sets of sets of states $\mathcal{S} \subseteq \mathcal{P}(\mathbb{F}_2^n)$ and permutations $\pi \in \Pi_n$, let
• $\pi(S) = \{\pi(s) : s \in S\}$
• $\Pi_n(\mathcal{S}) = \{\pi(S) : \pi \in \Pi_n, S \in \mathcal{S}\}$.
That is, there is an (n, k, e) purification circuit of length $t_1 + t_2$. We expect to find such an intersection point once |B(0,e)| . This contrasts with a naive search over circuits of length $t_1 + t_2$, which requires at least an expected $\binom{2^n}{|B(0,e)|}$ time but constant memory. The first method has an additional advantage over a naive search. There are typically multiple circuits taking one set of states to another set of states. This is in part due to the existence of multiple circuits implementing the same permutation (obtained, for example, by commuting gates past each other), and in part due to the fact that distinct permutations of states can have the same action on sets of states (for example, the two inequivalent circuits in Figure 4 which both act as (5, 1, 2) purification circuits). By tracking reachable sets of states rather than circuits, the first method experiences only a limited overhead from this phenomenon. A brute force search, by contrast, might be able to avoid the most obvious cases of repetition (for example, by not applying a pair of commuting gates in both orders), but greater care would be required to avoid duplicated effort.
In [21], Search.hs implements this method by applying generic pathfinding functions from Pathfinding.hs to the gate set described in Circuits.hs.
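The idea of tracking reachable sets of states rather than circuits can be illustrated with a plain one-directional breadth-first search. This is a simplified sketch, not the bidirectional method described above and not a transcription of the Haskell code in [21]; the function names are hypothetical. Basis states are encoded as integers with bit i holding qubit $q_i$, and the search stops at the first set lying in $V_{n-k}$.

```python
from collections import deque
from itertools import combinations, permutations

def gate_set(n):
    """Pi_n as (controls, target) pairs: every CNOT and every Toffoli on n qubits."""
    cnots = [((c,), t) for c, t in permutations(range(n), 2)]
    toffolis = [(cs, t) for cs in combinations(range(n), 2)
                for t in range(n) if t not in cs]
    return cnots + toffolis

def apply_gate(gate, state):
    """Apply one gate to a basis state encoded as an int (bit i = qubit q_i)."""
    controls, target = gate
    if all((state >> c) & 1 for c in controls):
        return state ^ (1 << target)
    return state

def search(n, k, e, max_len=8):
    """Breadth-first search over reachable *sets* of states for an (n, k, e) circuit.

    Starts from the Hamming ball B(0, e) and stops at the first set lying in
    V_{n-k} (bits 0..k-1 all zero). Returns a shortest gate list, or None.
    """
    gates = gate_set(n)
    low_bits = (1 << k) - 1
    start = frozenset(s for s in range(2**n) if bin(s).count("1") <= e)
    parent = {start: None}  # set of states -> (previous set, gate applied)
    queue = deque([(start, 0)])
    while queue:
        states, depth = queue.popleft()
        if all((s & low_bits) == 0 for s in states):
            path = []
            while parent[states] is not None:
                states, gate = parent[states]
                path.append(gate)
            return path[::-1]
        if depth < max_len:
            for gate in gates:
                image = frozenset(apply_gate(gate, s) for s in states)
                if image not in parent:  # sets, not circuits, are deduplicated
                    parent[image] = (states, gate)
                    queue.append((image, depth + 1))
    return None

print(search(3, 1, 1))
```

Because distinct circuits with the same action on the starting set collapse to a single dictionary entry, the duplication discussed above is avoided automatically. The number of reachable sets still grows very quickly with n, so this forward search is only meant for tiny instances such as (3, 1, 1).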