Magic state parity-checker with pre-distilled components

Magic states are eigenstates of non-Pauli operators. One way of suppressing errors present in magic states is to perform parity measurements in their non-Pauli eigenbasis and postselect on even parity. Here we develop new protocols based on non-Pauli parity checking, where the measurements are implemented with the aid of pre-distilled multiqubit resource states. This leads to a two-step process: pre-distillation of multiqubit resource states, followed by implementation of the parity check. These protocols can prepare single-qubit magic states that enable direct injection of single-qubit axial rotations without subsequent gate-synthesis and its associated overhead. We show our protocols are more efficient than all previous comparable protocols with quadratic error reduction, including the protocols of Bravyi and Haah.

error reduction in a single round [5,6,21,22], but there are important practical considerations for why one might favour a concatenated approach (see Sec. 6.2 for a discussion).
We begin by covering some basic notation. Sec. 2 gives an overview of our approach. Note that the first step is our previously proposed synthillation method [19,20], so the details will not be repeated here. Rather we focus on how non-Pauli parity checking is possible given these pre-distilled resources, giving a detailed explanation in Sec. 3. The protocol's performance is analysed in Sec. 4. In Sec. 5, we present a small bonus result that proves equivalent performance is unlikely to be possible using codes with conventional transversal gate constructions.

Notation
We denote axial rotations about the Pauli-Z axis as
$$R(\theta) = \exp(i\theta Z) = \cos(\theta)\,\mathbb{1} + i\sin(\theta)\,Z. \tag{1}$$
If the angle is $\theta = \pi/2^{l}$ for integer $l$, then $R(\theta)$ belongs to the $l$-th level of the Clifford hierarchy [23]. Therefore, $R(\pi/8)$ is the π/8-phase gate, also known as the T gate. Unitaries inside the Clifford hierarchy are special because they can be realised using state-injection and a bounded number of appropriate magic states. However, all the analysis in this paper holds for any θ, even values corresponding to unitaries and magic states not connected to the Clifford hierarchy. We use $W(\theta)$ for the Hermitian operator $W(\theta) := R(\theta) X R(\theta)^{\dagger} = R(2\theta)X$. Note that $W(\pi/8)$ is a Clifford and plays a similar role to the Hadamard; it interchanges X and Y whereas the Hadamard interchanges X and Z. The relevant magic states are eigenstates of $W(\theta)$ and sit on the equator of the Bloch sphere,
$$|R(\theta)\rangle = W(\theta)|R(\theta)\rangle = R(\theta)|+\rangle. \tag{2}$$
More generally, when U is a diagonal gate (acting on n qubits) we use $|U\rangle := U|+\rangle^{\otimes n}$.
In this notation, the familiar T state is $|R(\pi/8)\rangle$. We use CZ for control-Z and CCZ for control-control-Z.
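The identities in this notation section can be checked numerically. The following is a minimal sketch, assuming numpy is available; the helper names (`R`, `W`, `magic_state`, `check_notation`) are illustrative and not taken from the paper. It verifies that W(θ) = R(2θ)X, that W(θ) is Hermitian and squares to the identity, and that R(θ)|+⟩ is a +1 eigenstate of W(θ).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PLUS = np.array([1, 1], dtype=complex) / np.sqrt(2)


def R(theta):
    """Axial rotation R(theta) = exp(i theta Z) = cos(theta) 1 + i sin(theta) Z."""
    return np.cos(theta) * I2 + 1j * np.sin(theta) * Z


def W(theta):
    """Hermitian operator W(theta) = R(theta) X R(theta)^dagger."""
    return R(theta) @ X @ R(theta).conj().T


def magic_state(theta):
    """Equatorial magic state |R(theta)> = R(theta)|+>."""
    return R(theta) @ PLUS


def check_notation(theta):
    """Return True if all identities of the notation section hold at this angle."""
    w = W(theta)
    psi = magic_state(theta)
    return bool(
        np.allclose(w, R(2 * theta) @ X)   # W(theta) = R(2 theta) X
        and np.allclose(w, w.conj().T)     # Hermitian
        and np.allclose(w @ w, I2)         # squares to the identity
        and np.allclose(w @ psi, psi)      # |R(theta)> is a +1 eigenstate
    )
```

Calling `check_notation(np.pi / 8)` exercises the T-state case; the same check passes for angles outside the Clifford hierarchy, in line with the remark that the analysis holds for any θ.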

Overview of new protocols

Protocols for π/8 phase gates
The protocols presented here can be used both for distillation of T-states that can implement π/8 phase gates and, more generally, for distillation of $|R(\theta)\rangle$ magic states for smaller angle rotations. We begin by sketching the simple case of T-state distillation. We say a protocol is an n → k protocol if it takes n inputs and outputs k magic states with some success probability (typically this probability approaches unity in the low noise limit). Our two-step protocols for T-state distillation are 3k + 4 → k protocols for even k with quadratic reduction of noise. The resource overhead of the protocol is roughly captured by n/k = 3 + 4/k, which approaches 3 for large k. For k = 2 we have a 10 → 2 protocol, and so the protocol is very similar to the MEK protocol proposed by Meier, Eastin and Knill [3]. Therefore, compared to MEK, we reduce the n/k overhead from 5 towards 3 by going to larger block sizes (larger k). Another class of protocols was proposed by Bravyi and Haah [4], which are 3k + 8 → k protocols for even k with quadratic reduction of noise. The Bravyi-Haah protocols also have n/k → 3 in the large protocol limit. Both our protocols and the Bravyi-Haah protocols have the same asymptotic limit, but our protocols approach this limit faster. For example, to achieve n/k = 4 we can use modest size 16 → 4 protocols, whereas the comparable Bravyi-Haah protocol is 32 → 8 with double the block size. This effect becomes more pronounced as the protocol is concatenated. Furthermore, our work presents a different approach to distillation as we break the process up into two steps, making use of a multi-qubit magic state resource.
Figure 1: An illustration of how the two steps of our protocol are chained together for the special case N = 1. The combined process is described in Eq. (9) for general N.
Let us describe how this two-step process works, first considering the 10 → 2 case. In the first step we prepare a "pre-distilled" magic state that can inject a CCZ (or Toffoli) gate. It has been known for several years that a single CCZ magic state can be prepared from 8 T-states with quadratic error reduction [17,18]. It is no longer appropriate to describe this process in simple n → k notation and so we introduce the more detailed description of these protocols as being
$$\begin{pmatrix} 8 \\ |R(\pi/8)\rangle \\ \epsilon_{\pi/8} \end{pmatrix} \rightarrow \begin{pmatrix} 1 \\ |CCZ\rangle \\ \epsilon_{CCZ} = O(\epsilon_{\pi/8}^{2}) \end{pmatrix}. \tag{3}$$
In this notation, the left-hand side gives the input resources and the right-hand side gives the output resources. The top line gives the quantity of resources, the middle line describes the species of magic state and the bottom line gives the infidelity. Our second step, which was not previously known, is to notice that a single $|CCZ\rangle$ resource can be used to check the parity in a pair of $|R(\pi/8)\rangle$ states, which implements the transform
$$\begin{pmatrix} 1 & 2 \\ |CCZ\rangle & |R(\pi/8)\rangle \\ \epsilon_{CCZ} & \epsilon_{\pi/8} \end{pmatrix} \rightarrow \begin{pmatrix} 2 \\ |R(\pi/8)\rangle \\ O(\epsilon_{\pi/8}^{2}) + O(\epsilon_{CCZ}) \end{pmatrix}.$$
Chaining these steps together, so that $\epsilon_{CCZ} = O(\epsilon_{\pi/8}^{2})$, we obtain
$$\begin{pmatrix} 10 \\ |R(\pi/8)\rangle \\ \epsilon_{\pi/8} \end{pmatrix} \rightarrow \begin{pmatrix} 2 \\ |R(\pi/8)\rangle \\ O(\epsilon_{\pi/8}^{2}) \end{pmatrix},$$
or simply 10 → 2 for short. Next we outline the two-step process for larger block protocols. Instead of using $|CCZ\rangle$ resources, larger block protocols will use a resource that we denote as $|CCZ_{\#N}\rangle$. This resource can inject a gate $CCZ_{\#N}$, which is N copies of the CCZ gate all sharing one control qubit in common, and the relevant magic state is simply
$$|CCZ_{\#N}\rangle = CCZ_{\#N}\,|+\rangle^{\otimes (2N+1)}.$$
In the first step, we borrow results on synthillation [19,20] that provide protocols implementing
$$\begin{pmatrix} 4N+4 \\ |R(\pi/8)\rangle \\ \epsilon_{\pi/8} \end{pmatrix} \rightarrow \begin{pmatrix} 1 \\ |CCZ_{\#N}\rangle \\ \epsilon_{\#} = O(\epsilon_{\pi/8}^{2}) \end{pmatrix}.$$
This protocol is described in the section "U N # family" of Ref. [20].
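To avoid repetition, here we simply treat this synthillation routine as a black box with known properties. Rather, we focus on the second step, which is the main technical contribution of this work, by showing that a pre-distilled $|CCZ_{\#N}\rangle$ can be used to parity check on 2N magic states,
$$\begin{pmatrix} 1 & 2N \\ |CCZ_{\#N}\rangle & |R(\pi/8)\rangle \\ \epsilon_{\#} & \epsilon_{\pi/8} \end{pmatrix} \rightarrow \begin{pmatrix} 2N \\ |R(\pi/8)\rangle \\ O(\epsilon_{\pi/8}^{2}) + O(\epsilon_{\#}) \end{pmatrix}.$$
Chaining steps one and two, so that $\epsilon_{\#} = O(\epsilon_{\pi/8}^{2})$, we obtain a family of protocols for all integer N,
$$\begin{pmatrix} 6N+4 \\ |R(\pi/8)\rangle \\ \epsilon_{\pi/8} \end{pmatrix} \rightarrow \begin{pmatrix} 2N \\ |R(\pi/8)\rangle \\ O(\epsilon_{\pi/8}^{2}) \end{pmatrix}, \tag{9}$$
or simply 6N + 4 → 2N for short. Alternatively, using k = 2N we have a family of 3k + 4 → k protocols with even k. So we have returned to the choice of symbols used by Bravyi and Haah, who found 3k + 8 → k protocols with even k. Fig. 1 illustrates this for the N = 1 (k = 2) case. This completes the outline of our protocols for T-state distillation.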
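The overhead comparison quoted in this overview is simple arithmetic and can be tabulated directly. The sketch below uses only the input counts stated in the text (3k + 4 for the two-step protocols and 3k + 8 for Bravyi-Haah, both for even k); the function names and the `max_k` search cutoff are illustrative assumptions.

```python
from fractions import Fraction


def two_step_inputs(k):
    """Input T-states for the two-step 3k + 4 -> k protocol (even k)."""
    return 3 * k + 4


def bravyi_haah_inputs(k):
    """Input T-states for the Bravyi-Haah 3k + 8 -> k protocol (even k)."""
    return 3 * k + 8


def overhead(n, k):
    """Exact ratio n/k of input to output magic states."""
    return Fraction(n, k)


def smallest_protocol(inputs_fn, target_overhead, max_k=10_000):
    """Smallest even-k protocol (n, k) with n/k <= target_overhead.

    Returns None if no even k up to max_k reaches the target, which is
    always the case for targets of 3 or below.
    """
    for k in range(2, max_k + 1, 2):
        n = inputs_fn(k)
        if overhead(n, k) <= target_overhead:
            return n, k
    return None


if __name__ == "__main__":
    for k in (2, 4, 8, 16):
        print(k, overhead(two_step_inputs(k), k), overhead(bravyi_haah_inputs(k), k))
```

With a target overhead of 4, the search returns the 16 → 4 two-step protocol and the 32 → 8 Bravyi-Haah protocol, matching the block sizes compared in the text.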

Protocols for general phase gates
Next, we describe a family of protocols for other equatorial magic states $|R(\theta)\rangle$.
Step one will remain the same, again making use of synthillation of $|CCZ_{\#N}\rangle$ resource states.
Step two generalises to
$$\begin{pmatrix} 1 & 2N & N \\ |CCZ_{\#N}\rangle & |R(\theta)\rangle & R(2\theta) \\ \epsilon_{\#} & \epsilon_{\theta} & \eta \end{pmatrix} \rightarrow \begin{pmatrix} 2N \\ |R(\theta)\rangle \\ O(\epsilon_{\theta}^{2}) + O(\epsilon_{\#}) + O(\eta) \end{pmatrix}.$$
When R(2θ) appears uncluttered by a ket, as it does above, it refers to inputting a phase gate R(2θ) rather than a magic state. Most interesting is the case when R(2θ) sits in the Clifford hierarchy and so can be injected using appropriate magic states. Proving the validity of the above mapping is at the core of this paper. Chaining this with synthillation yields an overall protocol
$$\begin{pmatrix} 4N+4 & 2N & N \\ |R(\pi/8)\rangle & |R(\theta)\rangle & R(2\theta) \\ \epsilon_{\pi/8} & \epsilon_{\theta} & \eta \end{pmatrix} \rightarrow \begin{pmatrix} 2N \\ |R(\theta)\rangle \\ O(\epsilon_{\theta}^{2}) + O(\epsilon_{\pi/8}^{2}) + O(\eta) \end{pmatrix}.$$
It is important to notice that there is no noise reduction in η. Therefore, it is crucially important that the R(2θ) rotations are pre-distilled (e.g. $\eta \sim \epsilon^{2}$), and so we refer to the rotations R(2θ) as pivotal. Despite the need for high fidelity pivotal rotations, other protocols have used pivotal rotations and found significant reductions of resource costs compared against using gate-synthesis. For instance, the N = 1 two-step protocol has an identical resource cost to the protocols introduced by Campbell and O'Gorman [15]. Pivotal rotations (though not under this name) played a similar role in the protocols proposed by Duclos-Cianci and Poulin [14], which can likewise be described in our notation. Note that for the majority of their paper, Duclos-Cianci and Poulin only discuss the N = 1 case, but they do sketch the higher N case later in the paper. One could say our protocols are compressed, as they essentially give a slight compression in T-cost of the Duclos-Cianci and Poulin protocols [14]. Our protocols also have very different inner workings. This provides a new perspective on magic state distillation, and the two-step feature also has a potentially significant practical advantage. The exotic resources are of higher value, since the R(2θ) pivotal rotation and $|R(\theta)\rangle$ resources are more difficult to prepare than standard $|R(\pi/8)\rangle$ magic states. However, in the two-step protocols, one does not risk using the exotic resources until the first step has succeeded.
This contrasts with both the Duclos-Cianci-Poulin protocols and the Campbell-O'Gorman protocols, for which all the resources are committed at the same time, with a single error anywhere leading to loss of all resources.

Implementing step two
Here we show how to use CCZ circuits, implemented using synthillation, to perform a parity check. We will construct circuits that measure the parity of 2N qubits in the basis of $W(\theta)$. Using an ancilla and a control-$W(\theta)^{\otimes 2N}$ gate would achieve this, but a smaller resource overhead is needed if we instead use the parity check gadgets in Fig. 2. We use a combination of a control-$W(\theta)$ that triggers when the control bit is in the $|1\rangle$ state with an unconventional control-$W(\theta)$ (shown with open circles) that triggers when the control bit is in the $|0\rangle$ state. This parity check gadget measures in the $W(\theta)$ basis, but with an additional $W(\theta)$ rotation on half the qubits. The resulting Kraus operator K for the desired even parity outcome is given in Eq. (13). We assume the noisy $|R(\theta)\rangle$ magic states are diagonal in the $W(\theta)$ basis, and so the additional $W(\theta)$ rotations have no effect. The assumption of diagonal noise is mild and can be removed at the expense of hideously complicating the noise analysis (see e.g. App. D of Ref. [15]).
Figure 2: Gadgets for measuring parity in the W(θ) basis. In (a), a two-qubit parity checker, we illustrate how control-W(θ) gates are used to measure parity. In (b) we show how the control-W(θ) gates can be decomposed into CNOT and control-phase gates. In the text we describe this phase-gate circuit by V (see Eq. (14)), which can be broken up into a product of $U_j$ gates (see Eq. (17)).
Using $W(\theta) = R(2\theta)X$ it follows that this parity measurement circuit can be split into a sequence of control-X gates followed by a phase gate circuit (see Fig. 2b). Algebraically, this phase gate circuit is the unitary V of Eq. (14), in which subscripts denote which qubits the rotations act on, with qubit labels running from 0 to 2N. Using the shorthand $U_j$ for each pair of control-phase gates, we have $V = \prod_{j=1}^{N} U_j$. To recap, given $|R(\theta)\rangle$ of error rate $\epsilon$ and the ability to implement V, we can parity check in the $W(\theta)$ basis, outputting $|R(\theta)\rangle$ of error rate $O(\epsilon^{2})$ with some probability $p = 1 - O(\epsilon)$. Next, we show how to implement V with some pre-distilled resources. First, we use the decomposition $R(2\theta) = \cos(2\theta)\,\mathbb{1} + i\sin(2\theta)\,Z$ to expand out $U_j$ (Eq. (17)). Collecting the cos and sin terms, we have
$$U_j = \cos(2\theta)\,\mathbb{1} + i\sin(2\theta)\,M_j,$$
where we have introduced the further shorthand $M_j$ for an operator that is unitary, Hermitian and Clifford.
Figure 3: In (a) we show how $U_j$ (a pair of control phase gates) can be implemented using an ancilla, a control-$M_j$ gate and a pivotal rotation. An algebraic proof of this equivalence starts at Eq. (17) and ends after Eq. (23). In (b) we show how a pair of CCZ gates, which share a pair of controls, is Clifford equivalent to only a single CCZ gate. In (c) we use this identity twice to simplify a more complex circuit down to a $CCZ_{\#2}$. In general, one finds that a sequence of N control-$M_j$ can be simplified to a $CCZ_{\#N}$ circuit (see Eq. (24) and Eq. (26)).
Next we show that each $U_j$ can be implemented with access to a single $|+\rangle$ ancilla, a control-$M_j$ unitary and an R(2θ) rotation (see Fig. 3a). We prepare the ancilla in the $|+\rangle$ state and use it as the control qubit for the control-$M_j$ unitary, which gives the (unnormalised) state $|0\rangle|\psi\rangle + |1\rangle(M_j|\psi\rangle)$. Next we rotate the ancilla by $HR(2\theta)H$ and measure in the computational basis. This is equivalent to measuring with projections
$$\langle 0|HR(2\theta)H = \cos(2\theta)\langle 0| + i\sin(2\theta)\langle 1|, \tag{20}$$
$$\langle 1|HR(2\theta)H = \cos(2\theta)\langle 1| + i\sin(2\theta)\langle 0|. \tag{21}$$
In the eventuality of a "+1" outcome, we find
$$\left[\cos(2\theta)\,\mathbb{1} + i\sin(2\theta)\,M_j\right]|\psi\rangle = U_j|\psi\rangle$$
as desired. However, when a "−1" outcome is measured we have
$$\left[\cos(2\theta)\,M_j + i\sin(2\theta)\,\mathbb{1}\right]|\psi\rangle = M_j U_j|\psi\rangle.$$
We see that an $M_j$ gate will correct for the different measurement outcomes, and since $M_j$ is Clifford this does not contribute to the resource cost. This completes the proof of the identity in Fig. 3a. Similarly, we chain together N such circuits, assuming we have access to N copies of R(2θ) and the circuit
$$\tilde{V} = \prod_{j=1,\dots,N} \left( |0\rangle\langle 0|_{-j} \otimes \mathbb{1} + |1\rangle\langle 1|_{-j} \otimes M_j \right).$$
This is composed of a sequence of control-$M_j$ gates, where each gate is controlled from a different ancillary qubit. We have labelled these new ancillas with negative integers from −1 to −N. Each control-$M_j$ consists of a CZ gate and two CCZ gates. On first inspection this seems to imply 2N CCZ gates are needed. However, using the identity in Fig. 3b we see a pair of CCZ gates can sometimes be realised using a single CCZ. Note this identity only works because the pair of CCZ gates share two control bits in common. Applying this identity repeatedly (see e.g. Fig. 3c) can reduce the circuit to one using only N CCZ gates. Algebraically, the identity conjugates a single CCZ gate by control-X gates, where $CX_{c,t}$ is a control-X gate with control qubit c and target qubit t. The resource-intensive part is the non-Clifford component of N CCZ gates all sharing one single control qubit in common (qubit 0). Here we denote such a circuit as $CCZ_{\#N}$, which has been elsewhere called $\mathrm{Tof}_{\#N}$. For these gates, the problem of optimal synthesis into CNOT + T gates has been solved and the circuit requires 4N + 3 T gates (see Example IV.2 of Ref. [20]). Recall that we are requiring that $CCZ_{\#N}$ is pre-distilled to a higher fidelity. The most efficient known method to achieve this is to use the synthillation protocol, which can prepare $CCZ_{\#N}$ using only 4N + 4 T-states of error rate $\epsilon_{\pi/8}$, and so this is the first step of our two-step protocol.
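The ancilla gadget of Fig. 3a can be sanity-checked with small matrices. The sketch below is not the paper's code: it assumes numpy, and it assumes one concrete labelling in which the closed control-phase acts on the first data qubit and the open control-phase on the second, so that M = |0⟩⟨0| ⊗ 1 ⊗ Z + |1⟩⟨1| ⊗ Z ⊗ 1 on qubits ordered (control, first target, second target). All function names are hypothetical. It checks that U = cos(2θ)1 + i sin(2θ)M, that the "+1" branch applies U, and that the "−1" branch applies U after an M correction.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
P1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|


def kron_all(*ops):
    """Tensor product of the given operators, leftmost factor first."""
    return reduce(np.kron, ops)


def R(theta):
    return np.cos(theta) * I2 + 1j * np.sin(theta) * Z


def M_example():
    """ASSUMED labelling: Z on the second target if control is 0, on the first if 1."""
    return kron_all(P0, I2, Z) + kron_all(P1, Z, I2)


def U_example(theta):
    """Closed control-R(2 theta) on the first target times open control-R(2 theta) on the second."""
    closed = kron_all(P0, I2, I2) + kron_all(P1, R(2 * theta), I2)
    opened = kron_all(P0, I2, R(2 * theta)) + kron_all(P1, I2, I2)
    return closed @ opened


def gadget_branch(M, theta, outcome):
    """Unnormalised operator applied to the data for ancilla outcome 0 ('+1') or 1 ('-1')."""
    eye = np.eye(M.shape[0], dtype=complex)
    ket0 = np.array([[1], [0]], dtype=complex)
    ket1 = np.array([[0], [1]], dtype=complex)
    # Control-M acting on an ancilla prepared in |+>, written as an isometry.
    entangle = (np.kron(ket0, eye) + np.kron(ket1, M)) / np.sqrt(2)
    rotate = np.kron(H @ R(2 * theta) @ H, eye)
    bra = (ket0 if outcome == 0 else ket1).T
    return np.kron(bra, eye) @ rotate @ entangle


def gadget_implements_U(theta):
    M, U = M_example(), U_example(theta)
    eye = np.eye(8, dtype=complex)
    plus = np.sqrt(2) * gadget_branch(M, theta, 0)
    minus = np.sqrt(2) * gadget_branch(M, theta, 1)
    return bool(
        np.allclose(U, np.cos(2 * theta) * eye + 1j * np.sin(2 * theta) * M)
        and np.allclose(plus, U)        # '+1' outcome applies U directly
        and np.allclose(M @ minus, U)   # '-1' outcome needs an M correction
    )
```

Both branches occur with equal weight (each carries the same factor 1/√2), so the gadget is deterministic up to the Clifford correction, as the text argues.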
We have demonstrated how step two works using a series of circuit identities (for any integer N ). For completeness, we show in Fig. 4 how these circuit identities plug together for N = 1 and N = 2. The circuits could be further expanded by replacing the non-Clifford gate CCZ #N with the magic state |CCZ #N and the appropriate Clifford injection circuit.
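The CCZ-pair simplification used in step two (Fig. 3b) can also be confirmed by brute force. The following sketch assumes numpy and uses generic qubit labels a, b, c, d rather than the paper's labelling, which is not recoverable here; the helper names are illustrative. It checks that two CCZ gates sharing the controls a and b equal a single CCZ conjugated by a control-X between the two remaining qubits.

```python
import itertools
import numpy as np


def ccz(n, a, b, c):
    """Diagonal CCZ on qubits a, b, c of an n-qubit register (qubit 0 is most significant)."""
    diag = [(-1) ** (bits[a] & bits[b] & bits[c])
            for bits in itertools.product([0, 1], repeat=n)]
    return np.diag(diag).astype(float)


def cx(n, control, target):
    """Control-X permutation matrix CX_{control,target} on an n-qubit register."""
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for idx, bits in enumerate(itertools.product([0, 1], repeat=n)):
        out = list(bits)
        out[target] ^= out[control]
        j = int("".join(str(bit) for bit in out), 2)
        U[j, idx] = 1.0
    return U


def ccz_pair_identity_holds(n=4, a=0, b=1, c=2, d=3):
    """Check CCZ_{a,b,c} CCZ_{a,b,d} = CX_{c,d} CCZ_{a,b,d} CX_{c,d}."""
    lhs = ccz(n, a, b, c) @ ccz(n, a, b, d)
    rhs = cx(n, c, d) @ ccz(n, a, b, d) @ cx(n, c, d)
    return bool(np.allclose(lhs, rhs))
```

The identity follows because the product on the left applies the phase (−1)^{ab(c⊕d)}, and the control-X maps qubit d to c⊕d before the single CCZ acts. It fails if the two CCZ gates share only one control, which is why the text stresses that two shared controls are required.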

Noise analysis
This section presents a performance analysis for one round of our protocols. Subsection 4.1 reports some results of numerical simulations for smaller size protocols with N = 1 and N = 2. Subsection 4.2 focuses on providing a simple yet rigorous derivation of analytic upper bounds on output noise. The analytic results hold for all N, but are loose, and actual performance will be much better than analytically bounded.

Numerical analysis
We performed full state vector numerical simulations using IBM's QISKit (code available as ancillary file). We simulated the effect of leading order errors for circuits with N = 1 and N = 2. The output error probabilities are for a single qubit with other output qubits traced out. Numerical results were independent of which output qubit is chosen and independent of θ (up to numerical accuracy of 2 significant figures).
For N = 1, we found the output error given in Eq. (27). The leading order coefficients for output error are identical to those for the MEK k protocols proposed by Campbell and O'Gorman (themselves a modified form of MEK) and so it seems that our new protocols (with N = 1) perform identically in this regard. We have a slight performance advantage in terms of success probability due to the two-step nature. We found the success probabilities $p_{synth}$ and $p_{parity}$, where $p_{synth}$ is the probability of step one succeeding and $p_{parity}$ is the probability of step two succeeding. To leading order, the previous MEK k protocols had a success probability equal to $p_{mek} = p_{synth}\, p_{parity}$. Here, we do not commit to the second step until the first step is successful, which will lead to superior rates of generating magic states. For the setting θ = π/8, the protocol simplifies to a 10 → 2 protocol with $\epsilon = 9\epsilon_{\pi/8}^{2} + O(\epsilon_{\pi/8}^{3})$ and overhead n/k = 10/2 = 5, very similar to the original MEK protocol.
Figure 4: In the absence of noise, the circuit measures the parity in the W(θ) basis. Therefore, when we see a SUCCESS event, the ρ(θ) states are output with quadratically reduced noise. In the event of a FAILURE, we discard the qubits and attempt again.
For N = 2, we found the output error given in Eq. (29). By going to N = 2, we incur only mildly worse constant prefactors, but gain a significant efficiency improvement in terms of magic states output per input. For the setting θ = π/8, the protocol simplifies to a 16 → 4 protocol with $\epsilon = 19\epsilon_{\pi/8}^{2} + O(\epsilon_{\pi/8}^{3})$ and overhead n/k = 16/4 = 4. To obtain the same n/k overhead using Bravyi-Haah protocols (which are limited to θ = π/8), we need to go to a larger size 32 → 8 protocol with $\epsilon = 25\epsilon_{\pi/8}^{2} + O(\epsilon_{\pi/8}^{3})$. This confirms that our protocols can obtain similar resource overheads with a smaller scale quantum computer, and without any sacrifice in terms of error suppression or success probability.

Algebraic analysis
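Here we take an analytic approach. We do not know of an analytic method of determining the exact expressions for the output error $\epsilon$, but can prove a rigorous upper bound using standard norm inequalities. Actual performance will be much better than proven here. We begin by considering the effect of noise on the input states
$$\rho(\theta) = (1-\epsilon_{\theta})\,|R(\theta)\rangle\langle R(\theta)| + \epsilon_{\theta}\, Z|R(\theta)\rangle\langle R(\theta)|Z.$$
We extend later to account for $CCZ_{\#N}$ noise and pivotal rotation noise, but when these components work perfectly the circuit implements
$$E(\rho(\theta)^{\otimes 2N}) = K\,\rho(\theta)^{\otimes 2N} K^{\dagger},$$
where K is the parity projecting Kraus operator introduced in Eq. (13). Rather than the whole multi-qubit output, we are interested in the fidelity of a single output qubit, and so introduce the channel
$$E_i(\cdot) = \mathrm{tr}_i[E(\cdot)],$$
where $\mathrm{tr}_i[\cdots]$ is the partial trace over all but the i-th qubit. The output of this channel is the unnormalised state
$$E_i(\rho(\theta)^{\otimes 2N}) = p_{good}\,|R(\theta)\rangle\langle R(\theta)| + p_{bad}\, Z|R(\theta)\rangle\langle R(\theta)|Z.$$
The renormalisation constant $p_{good} + p_{bad}$ is the probability of the parity check yielding a "+1" outcome. When the parity check process is error-free, this occurs whenever the input states contain no errors or an even number of errors, and so
$$p_{good} + p_{bad} = \sum_{m\ \mathrm{even}} \binom{2N}{m} \epsilon_{\theta}^{m} (1-\epsilon_{\theta})^{2N-m}.$$
The term $p_{bad}$ is the probability of an error on the i-th qubit and an odd number of errors on the remaining 2N − 1 qubits,
$$p_{bad} = \epsilon_{\theta} \sum_{m\ \mathrm{odd}} \binom{2N-1}{m} \epsilon_{\theta}^{m} (1-\epsilon_{\theta})^{2N-1-m} = \frac{\epsilon_{\theta}}{2}\left[1 - (1-2\epsilon_{\theta})^{2N-1}\right] \leq (2N-1)\,\epsilon_{\theta}^{2},$$
where in the first line we have a binomial coefficient. The inequality follows from Bernoulli's inequality. This shows quadratic noise suppression in $\epsilon_{\theta}$. Next, we account for $\epsilon_{\#}$ noise in the $CCZ_{\#N}$ gate. We can write the corresponding magic state as
$$\rho_{\#N} = (1-\epsilon_{\#})\,\Psi_{\#N} + \epsilon_{\#}\,\sigma_{\#N},$$
where $\Psi_{\#N} = |CCZ_{\#N}\rangle\langle CCZ_{\#N}|$ and $\sigma_{\#N}$ carries some Z noise. We define F as the channel describing the action of the whole circuit (including the implicit injection gadget for $CCZ_{\#N}$), assuming ideal pivotal rotations, acting on $\rho_{\#N}$ and $\rho(\theta)^{\otimes 2N}$. We will use that $\Psi_{\#N}$ leads to an ideal parity check, and by linearity of F we deduce
$$F(\rho_{\#N} \otimes \rho(\theta)^{\otimes 2N}) = (1-\epsilon_{\#})\,E(\rho(\theta)^{\otimes 2N}) + \epsilon_{\#}\,F(\sigma_{\#N} \otimes \rho(\theta)^{\otimes 2N}).$$
Again, we are interested in only the single output qubit, and so introduce $F_i = \mathrm{tr}_i E_i$, which straightforwardly yields
$$F_i(\rho_{\#N} \otimes \rho(\theta)^{\otimes 2N}) = (1-\epsilon_{\#})\,E_i(\rho(\theta)^{\otimes 2N}) + \epsilon_{\#}\,F_i(\sigma_{\#N} \otimes \rho(\theta)^{\otimes 2N}).$$
This yields a single qubit state of the form in Eq. (35) with new parameters $p'_{good}$ and $p'_{bad}$, which are tricky to calculate exactly but can again be bounded.
The joint probability $p'_{good} + p'_{bad}$ can be lower bounded by assuming $F_i(\sigma_{\#N} \otimes \rho(\theta)^{\otimes 2N}) = 0$, and so
$$p'_{good} + p'_{bad} \geq (1-\epsilon_{\#})(p_{good} + p_{bad}).$$
The $p'_{bad}$ term can be upper bounded by considering the worst-case scenario that $F_i(\sigma_{\#N} \otimes \rho(\theta)^{\otimes 2N})$ leads to a logical error with unit probability, and so
$$p'_{bad} \leq (1-\epsilon_{\#})\,p_{bad} + \epsilon_{\#} \leq (1-\epsilon_{\#})(2N-1)\,\epsilon_{\theta}^{2} + \epsilon_{\#},$$
where the second inequality follows from Eq. (37). These bounds are very loose and overestimate $p'_{bad}$ by quite a lot. Nevertheless, they are simple to obtain and rigorous.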
Next, we further consider phase noise on the pivotal rotations, each failing with probability η. In other words, all pivotal rotations act perfectly with probability $(1-\eta)^{N}$. Therefore, the channel implemented is not $F_i$ but something of the form
$$(1-\eta)^{N} F_i + \left[1 - (1-\eta)^{N}\right] F'_i,$$
where $F'_i$ is the noisy part of the channel with diamond norm not exceeding unity. The worst case scenario is that $F'_i$ always generates an error, adding a $(1 - (1-\eta)^{N})$ contribution to the error term. After renormalising, this gives an upper bound on the output error probability. The result scales as $O(\epsilon_{\#})$, but this error rate is itself the output of performing the synthillation protocol using noisy $|R(\pi/8)\rangle$ states of error rate $\epsilon_{\pi/8}$. In particular, Eq. (128) of Ref. [20] provides an explicit bound on $\epsilon_{\#}$. This suffices to bound the output error explicitly, with $O(\epsilon^{3})$ collecting all higher order terms. Comparing the resulting bounds for N = 1 and N = 2 with the numerical expressions Eq. (27) and Eq. (29), we see the analytic upper bound is very loose and grossly overestimates the prefactors.
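The ideal-circuit part of this argument can be checked by direct enumeration. The sketch below is an illustration under stated assumptions, not the authors' simulation code: it assumes each of the 2N inputs independently carries a Z error with probability ε, that the parity check itself is perfect, and that even-parity patterns are accepted. The function names are hypothetical. It compares the enumerated p_bad with the closed form ε[1 − (1 − 2ε)^{2N−1}]/2 and with the Bernoulli-type bound (2N − 1)ε².

```python
from itertools import product


def parity_check_stats(N, eps, i=0):
    """Enumerate error patterns on 2N inputs and return (p_good, p_bad) for output qubit i.

    Only even-weight patterns pass the (assumed perfect) parity check.
    p_bad collects accepted patterns with an error on qubit i.
    """
    n = 2 * N
    p_good = 0.0
    p_bad = 0.0
    for pattern in product([0, 1], repeat=n):
        weight = sum(pattern)
        if weight % 2:
            continue  # odd parity: the check fails and the round is discarded
        prob = eps ** weight * (1 - eps) ** (n - weight)
        if pattern[i]:
            p_bad += prob
        else:
            p_good += prob
    return p_good, p_bad


def p_bad_closed_form(N, eps):
    """Error on qubit i times the probability of odd weight on the other 2N - 1 qubits."""
    return eps * (1 - (1 - 2 * eps) ** (2 * N - 1)) / 2


def bernoulli_bound(N, eps):
    """Upper bound obtained from (1 - 2 eps)^(2N - 1) >= 1 - 2 (2N - 1) eps."""
    return (2 * N - 1) * eps ** 2


def output_error(N, eps, i=0):
    """Normalised single-qubit output error p_bad / (p_good + p_bad)."""
    p_good, p_bad = parity_check_stats(N, eps, i)
    return p_bad / (p_good + p_bad)
```

For N = 1 the bound is saturated (p_bad = ε² exactly), while for larger N it overestimates slightly, which is consistent with the remark that the analytic bounds are loose.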

No small triorthogonal codes
Many other distillation protocols are based upon projections into codespaces with a transversal non-Clifford, with transversality proofs typically using some notion of triorthogonal matrices [4]. While the protocols proposed here do not manifestly have this form, it is natural to ask whether there is some codespace projection with equivalent performance. Indeed, Jones' first-level distiller protocol [5] is effectively equivalent to projecting onto the codespace of Bravyi-Haah triorthogonal codes [4], and Haah has recently introduced level-lifting as a general methodology for finding such equivalences [16]. Furthermore, it has long been known that for any distillation protocol there exists a codespace projection that achieves the same error suppression [24], though it may not achieve the same success probability or admit a transversal non-Clifford gate.
In this section, we show that there exist no triorthogonal codes with fewer than 14 qubits. This bound is tight since the smallest Bravyi-Haah code is a 14 qubit triorthogonal construction. It follows that the 10 → 2 MEK protocol is not equivalent to a projection onto a triorthogonal code. What distinguishes MEK is that it is a highly compressed circuit that is obtained from taking a larger circuit and cancelling some T -gates. This suggests that something happens during the compression process of eliminating extraneous T -gates that breaks the equivalence to triorthogonal codes. Since our protocols can be understood as a generalisation of MEK protocols, it seems unlikely similar performance parameters will be achievable using projections onto codes with exotic transversality properties.
We present the definition of triorthogonality.
Definition 1 (Def. 1 of Ref. [4]). A binary matrix G of size m × n is called triorthogonal iff the supports of any pair and any triple of its rows have even overlap, that is,
$$\sum_{j=1}^{n} G_{a,j}\, G_{b,j} = 0 \pmod 2$$
for all pairs of rows 1 ≤ a < b ≤ m and
$$\sum_{j=1}^{n} G_{a,j}\, G_{b,j}\, G_{c,j} = 0 \pmod 2$$
for all triples of rows 1 ≤ a < b < c ≤ m.
The definition of triorthogonality allows a matrix to have either odd or even rows, and it is standard to use a horizontal line to demarcate the split so that $G_1$ contains odd weight rows and $G_0$ contains even weight rows. Assuming G is row-wise linearly independent, it describes an [[n, k, d]] quantum code where: n is the number of columns in G; k is the number of rows in $G_1$; and d ≥ 2 if and only if $G_0$ is non-trivially supported on every column. We also use the notion of a biorthogonal matrix, which obeys the constraint for pairs of rows but not for triples of rows. Let G be a triorthogonal matrix with block matrix form
$$G = \begin{pmatrix} G_1 \\ \hline G_0 \end{pmatrix} = \left(\begin{array}{cc} A & B \\ \hline C & D \\ \mathbf{1} & \mathbf{0} \end{array}\right),$$
where $\mathbf{1}$ and $\mathbf{0}$ are the all-1 and all-0 row vectors of appropriate width. Using column permutation the matrix can always be brought into this form. Without loss of generality, we assume that the last row has weight w and is the highest weight row in the span of $G_0$. Let B and D be of width u, so that the total matrix width is n = w + u. From triorthogonality of G, it follows that the submatrix $\begin{pmatrix} A \\ C \end{pmatrix}$ is biorthogonal with all rows being even weight and that $\begin{pmatrix} B \\ D \end{pmatrix}$ is biorthogonal with B containing odd weight rows. Since B contains odd weight rows, the matrix D cannot contain the all-1 vector as this would violate biorthogonality. However, since the code is distance 2, the matrix D must be supported on every column. Therefore, there must exist at least 2 non-trivial rows in D. The smallest possible width for D is then achieved, up to column permutation, by
$$D = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix},$$
which has width u = 6 and contains weight 4 vectors, and so we can infer that w ≥ 4.
Other D are possible, but one cannot obtain smaller parameters: there are at least two rows, and any pair of rows must overlap on at least 2 columns and also have support on at least 2 other non-overlapping columns. From this we see that n = w + u ≥ 4 + 6 = 10. So this is already enough to prove there are no [[n, k, 2]] triorthogonal codes with k ≥ 1 and n less than 10. However, the bound w ≥ 4 was obtained based on the rows of D only, whereas w is the maximum weight across all rows in the span of $G_0$. If C contains a row of weight $w_c$, then we know that $w \geq w_c + 4$. For any row of C we can add the last row of $G_0$, which generates a row of weight $w'_c = w - w_c$, entailing that $w \geq w'_c + 4 = w - w_c + 4$ and so $w_c \geq 4$. Putting this together yields w ≥ 8 and u ≥ 6, so that n ≥ 14. There are no triorthogonal codes with fewer than 14 qubits.
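The pair and triple overlap conditions of Definition 1 are easy to test mechanically. The following sketch assumes numpy; the function names are illustrative. As a positive example it uses the standard 15-column matrix built from the punctured Reed-Muller construction (one all-ones row plus four rows given by the bits of the column index 1..15), which is a well-known triorthogonal matrix and is not an example taken from this section.

```python
from itertools import combinations
import numpy as np


def is_biorthogonal(G):
    """True if every pair of distinct rows of the binary matrix G has even overlap."""
    G = np.asarray(G, dtype=int) % 2
    return all((G[a] & G[b]).sum() % 2 == 0
               for a, b in combinations(range(G.shape[0]), 2))


def is_triorthogonal(G):
    """True if every pair and every triple of distinct rows of G has even overlap."""
    G = np.asarray(G, dtype=int) % 2
    if not is_biorthogonal(G):
        return False
    return all((G[a] & G[b] & G[c]).sum() % 2 == 0
               for a, b, c in combinations(range(G.shape[0]), 3))


def reed_muller_15():
    """15-column example: all-ones row (odd weight) plus the four bit-rows of 1..15."""
    columns = range(1, 16)
    rows = [[1] * 15]
    rows += [[(c >> bit) & 1 for c in columns] for bit in range(4)]
    return np.array(rows, dtype=int)


def row_weights(G):
    """Hamming weight of each row, useful for splitting G into odd and even rows."""
    return (np.asarray(G, dtype=int) % 2).sum(axis=1).tolist()
```

Here `row_weights(reed_muller_15())` gives one odd-weight row and four even-weight rows, matching the split into G_1 and G_0 described above, with n = 15 columns sitting just above the n ≥ 14 lower bound derived in this section.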

Variation of the two-step protocol
This subsection discusses one possible variant of the two-step protocol. Consider a quantum algorithm that needs many magic states of the form $|R(\theta)\rangle$, but with different values of θ. As presented, our main protocol cannot be used to full effect as the very large N limit assumes that we need many magic states with the same θ. However, it is straightforward to check that one can distill pairs of states with the same $\theta_j$. That is, we may input states of the form $\bigotimes_{j=1}^{N} \rho(\theta_j)^{\otimes 2}$ and use N pivotal rotations with corresponding angles $2\theta_j$.

Quadratic vs higher order error suppression
In our introduction, we remarked on the existence of distillation routines that offer much larger reductions in errors without using concatenation [5,6,21,22]. The appeal of these protocols is better asymptotic performance in the limit of large quantum computers and large error reduction. The analysis underpinning these results assumes that it is appropriate to quantify resource costs by the ratio of input to output states. But a more realistic picture is given by an involved full space-time analysis, also accounting for the cost of Clifford gates and quantum error correction. In such an analysis, it is possible to scale the size of error correction codes between rounds of magic state distillation [25][26][27]. This scaling trick is extremely effective, and is arguably the most important tool in the arsenal of magic state distillation techniques. Although the idea has been known for some time, it has gone without a name. In an effort to popularise this trick, O'Gorman and Campbell recently proposed the phrase "balanced investment" [27].
Balanced investment relies on distinct rounds of magic state distillation with successive error reduction. Therefore, balanced investment is more compatible with protocols giving quadratic error reduction, such as those presented here, than with the protocols of Refs. [5,6]. This argument is qualitative, and we need detailed resource analyses to make concrete quantitative statements. However, such full resource investigations are difficult and time-consuming and have only been undertaken for the Reed-Muller and Bravyi-Haah protocols. Naturally, such a numerical investigation also falls outside the scope of this paper.

Conclusions
We presented a new two-step method of magic state distillation that is very competitive at preparing single-qubit magic states, offering a way to circumvent the need for costly gate-synthesis of single-qubit rotations. An important aspect of these new protocols is the preparation of multi-qubit magic states using synthillation. Pressing open questions include how these competing approaches fare when all resource costs are considered, though such an analysis will depend heavily on the architecture considered.
We would also like to explore whether the synthillation driven techniques proposed here could be extended to protocols with larger than quadratic error reduction [5,6]. After completing this work, Hastings and Haah proposed some new approaches to synthillation that may provide a starting point for attacking this problem [28].