Codes and Protocols for Distilling $T$, controlled-$S$, and Toffoli Gates

We present several different codes and protocols to distill $T$, controlled-$S$, and Toffoli (or $CCZ$) gates. One construction is based on codes that generalize the triorthogonal codes, allowing any of these gates to be induced at the logical level by transversal $T$. We present a randomized construction of generalized triorthogonal codes obtaining an asymptotic distillation efficiency $\gamma\rightarrow 1$. We also present a Reed-Muller based construction of these codes which obtains a worse $\gamma$ but performs well at small sizes. Additionally, we present protocols based on checking the stabilizers of $CCZ$ magic states at the logical level by transversal gates applied to codes; these protocols generalize the protocols of 1703.07847. Several examples, including a Reed-Muller code for $T$-to-Toffoli distillation, punctured Reed-Muller codes for $T$-gate distillation, and some of the check based protocols, require a lower ratio of input gates to output gates than other known protocols at the given order of error correction for the given code size. In particular, we find a $512$ T-gate to $10$ Toffoli gate code with distance $8$ as well as triorthogonal codes with parameters $[[887,137,5]],[[912,112,6]],[[937,87,7]]$ with very low prefactors in front of the leading order error terms in those codes.


Introduction
Magic state distillation [3][4][5] is a standard proposed approach to implementing a universal quantum computer. This approach begins by implementing the Clifford group to high accuracy using either stabilizer codes [6,7] or using Majorana fermions [8]. Then, to obtain universality, some non-Clifford operation is necessary, such as the π/4-rotation (T-gate) or the Toffoli gate (or CCZ which is equivalent to Toffoli up to conjugation by Cliffords). These non-Clifford operations are implemented using a resource, called a magic state, which is injected into a circuit that uses only Clifford operations.
Since these magic states can produce non-Clifford operations, they cannot themselves be produced by Clifford operations. Instead, in distillation, the Clifford operations are used to distill a small number of high accuracy magic states from a larger number of low quality magic states. There are many proposed protocols to distill magic states: for T gates from T gates [1-3, 5, 9, 10], for Toffoli gates from T gates [2, 11-15], for Fourier states from Toffoli gates [16], and for CCZ (Toffoli) states from CCZ gates [17].
In such distillation architectures, the resources (space, number of Clifford operations, and number of noisy non-Clifford operations) required to distill magic states far exceed the resources required to implement most quantum algorithms using these magic states. Hence, improvements in distillation efficiency can greatly impact the total resource cost.
This paper presents a variety of loosely related ideas in distillation. One common theme is exploring various protocols to distill magic states for Toffoli, controlled-S, as well as T gates. We present several approaches to this. We use a generalization of triorthogonal codes [1] to allow this distillation. In section 3, we give a randomized construction of such codes which achieves distillation efficiency [1] γ → 1; this approach is of some theoretical interest because not only is the distance of the code fairly large (of order square-root of the number of qubits) but also the least weight stabilizer has comparable weight. In section 4, we give another approach based on Reed-Muller codes. In addition to theoretical asymptotic results here, we also find a particularly striking code which distills 512 T gates into 10 CCZ magic states while obtaining eighth-order reduction in error. We also present approaches to distilling Toffoli states which are not based on a single triorthogonal (or generalized triorthogonal) code but rather on implementing a protocol using a sequence of checks, similar to Ref. [2]. As in Ref. [2], we use inner codes to measure various stabilizers of the magic state. We present two different methods of doing this, one based on hyperbolic inner codes in section 5 and one based on normal inner codes in section 6 (hyperbolic and normal codes were called even and odd inner codes, respectively, in an early version of Ref. [2]).
In addition to these results for distilling Toffoli states, we present other results useful specifically for distilling T -gates. In particular, in 4.5 we study punctured Reed-Muller codes and find some protocols with a better ratio of input T -gates to output T -gates than any other known protocol for certain orders of error reduction. Another result in 2.4 is a method of reducing the space required for any protocol based on triorthogonal codes at the cost of increased depth.
We use matrices $S = \mathrm{diag}(1, i)$ and $T = \mathrm{diag}(1, e^{i\pi/4})$. Any subscript $T$ denotes a connection to the magic state for the $T$ gate.

Definitions
We consider classical codes with $n$ bits, so that code words are vectors in $\mathbb{F}_2^n$. Given a vector $u$, let $|u|$ denote the Hamming weight, i.e., the number of nonzero entries of $u$, and let $u_i$ denote the $i$-th entry of $u$. Given two vectors $u, v$, let $u \wedge v$ denote the entrywise product of $u$ and $v$, i.e., $(u \wedge v)_i = u_i v_i$. Let $u \cdot v$ denote the inner product, so that $u \cdot v = \sum_i u_i v_i$, where the sum is taken modulo 2.
For us, a classical code $C$ will always refer to a linear subspace of $\mathbb{F}_2^n$. Given two classical codes $C, D$, let $C \wedge D$ denote the subspace spanned by vectors $u \wedge v$ for $u \in C$ and $v \in D$. We will write $C^{\wedge 2}$ to mean $C \wedge C$. Note that $C^{\wedge 2}$ can be a proper superset of $C$. Given a code $C$, let $C^\perp$ denote the dual code, i.e., for any vector $v$, we have $v \in C^\perp$ if and only if $v \cdot u = 0$ for all $u \in C$. Given two codes $C, D$, let $\mathrm{span}(C, D)$ denote the span of $C$ and $D$.
Following Bravyi and Haah [1], a binary matrix $G$ of size $m$-by-$n$ is called triorthogonal if
$$\sum_{j=1}^{n} G_{a,j} G_{b,j} = 0 \bmod 2 \qquad (2.1)$$
for all pairs $1 \le a < b \le m$, and
$$\sum_{j=1}^{n} G_{a,j} G_{b,j} G_{c,j} = 0 \bmod 2 \qquad (2.2)$$
for all triples of rows $1 \le a < b < c \le m$. Further, we will always assume that the first $k_T$ rows of $G$ have odd weight, i.e., $\sum_{j=1}^{n} G_{a,j} = 1 \bmod 2$ for $1 \le a \le k_T$, and that the remaining rows have even weight, i.e., $\sum_{j=1}^{n} G_{a,j} = 0 \bmod 2$ for $k_T + 1 \le a \le m$. (The notation $k_1$ instead of $k_T$ was used in Ref. [1].) Let $\mathcal{G}_0$ denote the span of the even weight rows of $G$. Let $\mathcal{G}_T$ denote the span of the odd weight rows of $G$. Let $\mathcal{G}$ denote the span of all the rows of $G$. In association with a triorthogonal matrix we define a triorthogonal code, a quantum CSS code, by letting $\mathcal{G}_0$ correspond to $X$-stabilizers and $\mathcal{G}^\perp$ to $Z$-stabilizers. The distance of a triorthogonal matrix $G$ is defined to be the minimum weight of any nontrivial $Z$-logical operator of the corresponding triorthogonal code, i.e., the minimum weight of a vector $u$ such that $u \in \mathcal{G}_0^\perp$ but $u \notin \mathcal{G}^\perp$. The distance of any subspace $C$ is defined to be the minimum weight of any nonzero vector in that subspace. Clearly, the distance of a triorthogonal matrix $G$ is at least the distance of the subspace $\mathcal{G}_0^\perp$.
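As a small illustration of conditions (2.1) and (2.2), the following Python sketch checks them by brute force. The example matrix is our own choice, not one taken from the text: it evaluates the polynomials $1, x_1, x_2, x_3, x_4$ on the 15 nonzero points of $\mathbb{F}_2^4$, and the function name `is_triorthogonal` is hypothetical.

```python
import itertools
import numpy as np

def is_triorthogonal(G):
    """Brute-force check of Eqs. (2.1) and (2.2) for a binary matrix G."""
    G = np.asarray(G) % 2
    m = G.shape[0]
    # Eq. (2.1): every pair of distinct rows overlaps on an even number of columns.
    pairs_ok = all((G[a] & G[b]).sum() % 2 == 0
                   for a, b in itertools.combinations(range(m), 2))
    # Eq. (2.2): every triple of distinct rows overlaps on an even number of columns.
    triples_ok = all((G[a] & G[b] & G[c]).sum() % 2 == 0
                     for a, b, c in itertools.combinations(range(m), 3))
    return pairs_ok and triples_ok

# Illustrative matrix (an assumption of this sketch): rows 1, x1, x2, x3, x4
# evaluated on the 15 nonzero points of F_2^4.
points = [p for p in itertools.product([0, 1], repeat=4) if any(p)]
G = np.array([[1] * 15] + [[p[i] for p in points] for i in range(4)])

row_weights = G.sum(axis=1)
k_T = int((row_weights % 2 == 1).sum())   # number of odd-weight rows
print(is_triorthogonal(G), row_weights.tolist(), k_T)
```

The odd-weight row comes first, matching the convention that the first $k_T$ rows have odd weight.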

Triorthogonal Spaces and Punctured Triorthogonal Matrices
Let us define a "triorthogonal subspace" to be a subspace $C$ such that for any $u, v, w \in C$, we have $|u \wedge v \wedge w| = 0 \bmod 2$. Given a triorthogonal matrix $G$, the vector space $\mathcal{G}_0$ is a triorthogonal space. Thus, any $k_0$-by-$n$ matrix whose rows span $\mathcal{G}_0$ is a triorthogonal matrix. However, if $k_T \neq 0$, then the span of the rows of $G$ is not a triorthogonal space, since $|u \wedge u \wedge u| = |u|$ is odd for an odd weight row $u$. In this regard, we note the following. Let $G$ be an arbitrary triorthogonal matrix of the form
$$G = \begin{pmatrix} G_T \\ G_0 \end{pmatrix},$$
where $G_T$ is $k_T$-by-$n$ (and contains the odd weight rows of $G$) and $G_0$ is $k_0$-by-$n$ (and contains the even weight rows of $G$). Consider the matrix
$$\tilde G = \begin{pmatrix} I & G_T \\ 0 & G_0 \end{pmatrix},$$
where $I$ denotes a $k_T$-by-$k_T$ identity matrix and $0$ denotes the zero matrix of size $k_0$-by-$k_T$. This matrix $\tilde G$ is a triorthogonal matrix with all rows having even weight, and its row span defines a triorthogonal space $\tilde{\mathcal{G}}$. Thus, from a triorthogonal matrix, we can construct a triorthogonal space by adding $k_T$ additional coordinates to the vector and padding the matrix by $I$.

We now show a converse direction, based on the idea of puncturing a code. Given any subspace $\tilde{\mathcal{C}}$ of dimension $m$, there exists a matrix $\tilde G$ whose rows form a basis of $\tilde{\mathcal{C}}$ (after possibly permuting the coordinates of the space) such that
$$\tilde G = \begin{pmatrix} I_m & P \end{pmatrix}$$
for some matrix $P$, where $I_m$ is an $m$-by-$m$ identity matrix. Such a matrix in the reduced row echelon form is unique once an ordering of coordinates is fixed, and can be computed by Gauss elimination from any spanning set for $\tilde{\mathcal{C}}$. Choose any $k_T$ such that $0 \le k_T \le m$. Let $P_T$ be the first $k_T$ rows of $P$ and let $P_0$ be the remaining rows of $P$. If $\tilde{\mathcal{C}}$ is a triorthogonal subspace, then deleting the first $k_T$ coordinates gives
$$G = \begin{pmatrix} G_T \\ G_0 \end{pmatrix} = \begin{pmatrix} 0 & P_T \\ I_{m-k_T} & P_0 \end{pmatrix},$$
which is a triorthogonal matrix. We say that this matrix is obtained by "puncturing" the previous code on the given coordinates. By the uniqueness of the reduced row echelon form, the matrices $G_T$ and $G_0$ are determined by $\tilde{\mathcal{C}}$, $k_T$, and the ordering of the coordinates.

This idea of padding is related to the following protocol for distillation [18]. We consider $k_T = 1$ for the moment, but a generalization to a larger $k_T$ is straightforward. Observe that on a Bell pair $|\phi\rangle = |00\rangle + |11\rangle$ (we ignore global normalization factors), the action of $T$ on the first qubit is the same as $T$ on the second: $T_1 |\phi\rangle = T_2 |\phi\rangle$. Once we have $T_2 |\phi\rangle$, suppose we measure out the second qubit onto $|+\rangle$. The state on the first qubit is then the magic state $T_1 |+\rangle = \langle +_2 | T_2 |\phi\rangle$. If we instead measure the second qubit in the $|-\rangle$ state, we can apply a Pauli correction to bring the first qubit to the desired magic state. If the second qubit of this Bell pair is a logical qubit of a code in which the logical $T$ can be fault-tolerantly implemented, then the above observation enables fault-tolerant creation of the magic state.
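The Bell-pair observation above can be checked numerically on two bare (unencoded) qubits. The sketch below is only a sanity check of the identities $T_1|\phi\rangle = T_2|\phi\rangle$ and $\langle +_2|T_2|\phi\rangle = T|+\rangle$; the helper `project_second` and all variable names are our own.

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
Z = np.diag([1, -1])
I2 = np.eye(2)

phi = np.array([1, 0, 0, 1], dtype=complex)   # |00> + |11>, unnormalized
plus = np.array([1, 1], dtype=complex)        # |+>, unnormalized
minus = np.array([1, -1], dtype=complex)      # |->, unnormalized

T1_phi = np.kron(T, I2) @ phi                 # T on the first qubit
T2_phi = np.kron(I2, T) @ phi                 # T on the second qubit

def project_second(state, bra):
    """Contract the second qubit of a two-qubit state with <bra|."""
    return state.reshape(2, 2) @ bra.conj()

out_plus = project_second(T2_phi, plus)       # outcome |+> on the second qubit
out_minus = project_second(T2_phi, minus)     # outcome |-> on the second qubit

print(np.allclose(T1_phi, T2_phi))            # same state either way
print(np.allclose(out_plus, T @ plus))        # first qubit holds T|+>
print(np.allclose(Z @ out_minus, T @ plus))   # Pauli Z correction after outcome |->
```

In this two-qubit toy case the Pauli correction after the $|-\rangle$ outcome is $Z$ on the first qubit.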
The protocol is thus as follows. Consider a triorthogonal code defined by some matrix $G$; for brevity, we also refer to this code as $G$ below. Let $\tilde{\mathcal{G}}$ be the space obtained by padding as above. (i) Create a Bell pair $|00\rangle + |11\rangle$ where the second qubit is embedded in the code $G$. The Bell pair is the eigenstate of $XX$ and $ZZ$, which is simply the state stabilized by $X(v)$ for any $v$ in the triorthogonal space $\tilde{\mathcal{G}}$, and by $Z(v')$ for any $v'$ in $\tilde{\mathcal{G}}^\perp$. Thus, this step can be implemented by a circuit consisting of control-NOTs. This circuit can be thought of as the preparation circuit of the superposition of all classical code words of $\tilde{\mathcal{G}}$. (ii) Apply the transversal $T$ gate on the qubits of $G$, followed by possible Clifford corrections; these Clifford corrections are either phase gates or control-Z gates [1]. (iii) Project the logical qubit of the code onto a $|+\rangle$ or $|-\rangle$ state. This step can be done simply by measuring individual qubits of the code $G$ in the $X$ basis without inverse-encoding, followed by classical post-processing. The reason is that the $X$ operator on individual qubits commutes with the logical $X$ of the code, and hence after the $X$ measurement, the state of the qubits that comprised the code is some eigenstate of the logical $X$ operator. The eigenvalue of this logical $X$ can be inferred by taking the parity of the measurement outcomes, and if necessary we apply a Pauli correction to the magic state on the other side of the initial Bell pair. The eigenvalues of the $X$ stabilizers of the code can also be checked similarly, and we post-select on these being in the $+$ state.
This protocol is particularly simple to describe in the case that the matrix $G$ is obtained by puncturing some triorthogonal subspace $\tilde{\mathcal{G}}$ on some set of coordinates. Then the protocol is: prepare the superposition of all classical code words of $\tilde{\mathcal{G}}$, then apply a transversal $T$ gate on all unpunctured coordinates (followed possibly by a Clifford correction), then measure all unpunctured coordinates in the $X$ basis, and finally, if classical post-processing shows that all stabilizers are in the $+$ state, the punctured coordinates are in the desired magic state (up to a Pauli correction which is determined by the classical post-processing).
This protocol is different from preparing encoded $|+\rangle$, applying the logical $T$, and inverse-encoding, in that the Clifford depth is smaller. The only Clifford cost is in the initial preparation of the pre-puncture stabilizer state, and the Clifford correction after $T$. The Clifford correction after $T$ is absent if the pre-puncture code $\tilde{\mathcal{G}}$ is triply even.
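To make the padding construction and the triply even condition concrete, here is a minimal Python sketch. It assumes an illustrative triorthogonal matrix of our own choosing (the polynomials $1, x_1, \ldots, x_4$ on the 15 nonzero points of $\mathbb{F}_2^4$, with $k_T = 1$), pads it with an identity block, and lists the weights of all code words of the padded space. The helper names `pad` and `codeword_weights` are hypothetical.

```python
import itertools
import numpy as np

# Illustrative input (an assumption of this sketch, not a matrix from the text).
points = [p for p in itertools.product([0, 1], repeat=4) if any(p)]
G_T = np.array([[1] * 15])                                   # one odd-weight row
G_0 = np.array([[p[i] for p in points] for i in range(4)])   # even-weight rows

def pad(G_T, G_0):
    """Build the padded matrix [[I, G_T], [0, G_0]]."""
    k_T, k_0 = G_T.shape[0], G_0.shape[0]
    top = np.hstack([np.eye(k_T, dtype=int), G_T])
    bottom = np.hstack([np.zeros((k_0, k_T), dtype=int), G_0])
    return np.vstack([top, bottom])

def codeword_weights(M):
    """Set of Hamming weights of all vectors in the row span of M."""
    m = M.shape[0]
    return sorted({int((np.array(c) @ M % 2).sum())
                   for c in itertools.product([0, 1], repeat=m)})

G_tilde = pad(G_T, G_0)
weights = codeword_weights(G_tilde)
triply_even = all(w % 8 == 0 for w in weights)

# Puncturing the padded coordinates recovers the original rows.
punctured = G_tilde[:, G_T.shape[0]:]
print(weights, triply_even, np.array_equal(punctured, np.vstack([G_T, G_0])))
```

For this example every code word weight of the padded space is a multiple of 8, so under the statement above no Clifford correction would be needed after the transversal $T$.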

Generalized Triorthogonal Matrices: T -to-CCZ Distillation
Let us now generalize the definition of triorthogonal matrices. This generalization has appeared in [15,App. D], upon which the "synthillation" protocols are built. Our definition is a special case in that we consider only codes that distill T -gates, controlled-S gates, and CCZ gates, rather than arbitrary diagonal matrices at the third level of the Clifford hierarchy. On the other hand, we will present codes of arbitrary distance, rather than just distance 2 of [15].

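We say that a binary matrix $G$ with $n$ columns is generalized triorthogonal if it can be written up to permutations of rows as
$$G = \begin{pmatrix} G_T \\ G_{CS} \\ G_{CCZ} \\ G_0 \end{pmatrix} \qquad (2.6)$$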
where $G_T$ has $k_T$ rows, $G_{CS}$ has $k_{CS}$ pairs of rows, and $G_{CCZ}$ has $k_{CCZ}$ triples of rows such that, for all rows $a, b, c$ (not necessarily distinct),
$$\sum_{j=1}^{n} G_{a,j} G_{b,j} G_{c,j} \bmod 2 = \begin{cases} 1 & \text{if } a = b = c \text{ is a row of } G_T, \\ 1 & \text{if } \{a, b, c\} \text{ is a pair of rows of } G_{CS}, \\ 1 & \text{if } \{a, b, c\} \text{ is a triple of rows of } G_{CCZ}, \\ 0 & \text{otherwise.} \end{cases}$$
Such a generalized triorthogonal matrix can be used to distill $n$ $T$-gates into $k_T$ $T$-gates, $k_{CS}$ controlled-$S$ gates, and $k_{CCZ}$ $CCZ$ gates, where the $CCZ$ gate is a controlled-controlled-$Z$ gate which is conjugate to the Toffoli gate by Clifford operations. Define a quantum code on $n$ qubits. Take $X$-type stabilizers of the quantum code which correspond to rows of $G_0$ (i.e., for each row of $G_0$, there is a generator of the stabilizer group which is a product of Pauli $X$ on all qubits for which there is a 1 entry in that row of $G_0$). For each row of $G_T$, $G_{CS}$ and $G_{CCZ}$ there is one logical qubit, with logical $X$-type operators corresponding to the row. The corresponding $Z$-type logical operators can be determined by the requirement that they commute with the $X$-type stabilizers and by the commutation relations for logical $X$ and $Z$ operators. Finally, the $Z$-type stabilizers of the code are the maximal set of operators that commutes with all logical operators and $X$-type stabilizers. It is easy to show, by generalizing the arguments of Ref. [1], that applying a $T$-gate to every qubit will apply $T$-gates to the logical qubits corresponding to rows of $G_T$, will apply controlled-$S$ gates to each pair of logical qubits corresponding to a pair of rows of $G_{CS}$, and will apply $CCZ$ gates to each triple of logical qubits corresponding to a triple of rows of $G_{CCZ}$, up to an overall Clifford operation on the logical qubits. Input errors are detected up to an order given by the distance of the code, where the distance of a generalized triorthogonal matrix $G$ is defined to be the minimum weight of a vector $u$ such that $u \in \mathcal{G}_0^\perp$ and such that $u \notin \mathrm{span}(\mathcal{G}_T, \mathcal{G}_{CS}, \mathcal{G}_{CCZ})^\perp$, with $\mathcal{G}_{CS}, \mathcal{G}_{CCZ}$ being the row spans of $G_{CS}, G_{CCZ}$ respectively.
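The following Python sketch shows one way to test generalized triorthogonality by brute force. It encodes our reading of the definition, namely that the triple overlap of rows $a, b, c$ (repetitions allowed) is odd exactly when the set $\{a, b, c\}$ is a single $T$ row, a controlled-$S$ pair, or a $CCZ$ triple. That reading, the function name, and the 8-column example matrix (the polynomials $x_1, x_2, x_3$ and $1$ on $\mathbb{F}_2^3$) are assumptions of this sketch.

```python
import itertools
import numpy as np

def is_generalized_triorthogonal(G, t_rows=(), cs_pairs=(), ccz_triples=()):
    """Check triple overlaps of all rows (with repetition) against the
    pattern prescribed by the T rows, CS pairs and CCZ triples."""
    G = np.asarray(G) % 2
    special = {frozenset([t]) for t in t_rows}
    special |= {frozenset(p) for p in cs_pairs}
    special |= {frozenset(t) for t in ccz_triples}
    m = G.shape[0]
    for a, b, c in itertools.combinations_with_replacement(range(m), 3):
        parity = int((G[a] & G[b] & G[c]).sum() % 2)
        if parity != int(frozenset([a, b, c]) in special):
            return False
    return True

# Illustrative example: rows 0..2 are x1, x2, x3 on F_2^3, row 3 is the all-ones row.
points = list(itertools.product([0, 1], repeat=3))
G8 = np.array([[p[i] for p in points] for i in range(3)] + [[1] * 8])

print(is_generalized_triorthogonal(G8, ccz_triples=[(0, 1, 2)]))  # one CCZ triple
print(is_generalized_triorthogonal(G8))                           # no special rows declared
```

Because repeated indices are included, the same loop also enforces the row-weight parities and the pairwise overlaps.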
The puncturing and padding of the previous subsection do not immediately carry over to generalized triorthogonal matrices. However, the connection is retained if we consider the puncturing or padding in the following way. Let $G$ be a generalized triorthogonal matrix in the form of (2.6) with $k_T$ rows in $G_T$, $k_{CS}$ pairs of rows in $G_{CS}$ and $k_{CCZ}$ triples of rows in $G_{CCZ}$, and let $F$ be another generalized triorthogonal matrix in the form of (2.6) with the same corresponding number of rows in the upper three blocks. Combine the submatrices of $G$ and $F$ as
$$\tilde G_F = \begin{pmatrix} F_T & G_T \\ F_{CS} & G_{CS} \\ F_{CCZ} & G_{CCZ} \\ F_0 & 0 \\ 0 & G_0 \end{pmatrix}.$$
For example, if $G$ is triorthogonal with $k_{CS} = k_{CCZ} = 0$, then $F$ can be such that $F_T = I_{k_T}$ and $F_{CS} = F_{CCZ} = F_0 = 0$ (this example is precisely the padding in the previous subsection). Generally, the rows of $\tilde G_F$ span a genuine triorthogonal subspace, and whenever $F$ is canonical (due to e.g. linear independence) we can recover $G$ from $\tilde G_F$, a procedure which amounts to the puncturing. The distillation protocol of the previous subsection carries over to this generalized punctured code; the only change is that one generally has to inverse-encode the logical qubits of the triorthogonal code of $F$.

Space-Time Tradeoff For Triorthogonal Codes
We now briefly discuss a way of reducing the space required in any protocol based on a triorthogonal code, at the cost of increasing circuit depth. Consider a code with a total of $k$ logical qubits ($k = k_T + 2k_{CS} + 3k_{CCZ}$), a total of $n_X$ $X$-type stabilizer generators, and $n_Z$ $Z$-type stabilizer generators. The number $n_X$ is equal to the number of rows of $G_0$. The usual protocol to prepare magic states is to first initialize the logical qubits in the $|+\rangle$ state, encode, then apply transversal $T$, measure stabilizers, and, if no error is found, finally decode, yielding the desired magic states. It is possible to implement this protocol using only $k + n_X$ total qubits as follows.
The idea is to always work on the unencoded state, but we instead spread potential errors so that we can detect them. Recall that encoding is done by preparing a total of $n_X$ ancilla qubits in the $|+\rangle$ state (call these the $X$ ancilla qubits), a total of $n_Z$ ancilla qubits in the $|0\rangle$ state (call these the $Z$ ancilla qubits), and applying a Clifford. Call this Clifford $U$. Then, an equivalent protocol is: prepare a total of $n_X$ ancilla qubits in the $|+\rangle$ state, a total of $n_Z$ ancilla qubits in the $|0\rangle$ state, and apply $\prod_{j=1}^{n} U^\dagger \exp(i\pi Z_j/8) U$, then measure whether all the $X$ ancilla qubits are still in the $|+\rangle$ state. (There is no need to check the $Z$ ancilla qubits since our error model has only $Z$ errors after twirling.) The operator $U^\dagger \exp(i\pi Z_j/8) U$ is equal to $\exp(i\pi P_j/8)$ where $P_j = U^\dagger Z_j U$, which is a product of Pauli $Z$ operators. Let $P_j = \tilde P_j Q_j$ where $\tilde P_j$ is a product of Pauli $Z$ operators on some set of logical qubits (which are not embedded in a code space!) and $X$ ancilla qubits, and $Q_j$ is a product of Pauli $Z$ on some set of $Z$ ancilla qubits. Since the $Z$ ancilla qubits remain in the $|0\rangle$ state throughout the protocol, an equivalent protocol involving only $k + n_X$ total qubits is: prepare a total of $n_X$ ancilla qubits in the $|+\rangle$ state, and apply $\prod_{j=1}^{n} \exp(i\pi \tilde P_j/8)$, then measure whether all the $X$ ancilla qubits are still in the $|+\rangle$ state. Note that although the product over $j$ ranges from 1 to $n$, there are only $k + n_X \le n$ physical qubits.
The operator $\exp(i\pi \tilde P_j/8)$ can be applied by a sequence consisting of a Clifford, a $T$ gate, and the inverse of the Clifford. If a subset of $\{\tilde P_j\}_{j=1}^{n}$ consists of $n'$ (multiplicatively) independent operators, where $n' \le k + n_X$, then we can apply these $n'$ operators simultaneously by finding a Clifford that conjugates each of the $n'$ operators to distinct Pauli $Z$ operators. In the best situation, we can obtain a protocol using $k + n_X$ total qubits that requires $\lceil n/(k + n_X) \rceil$ rounds of Cliffords and $T$-gates. While the $T$-depth of the circuit is larger than in the original protocol, the total circuit depth may or may not increase: if the Cliffords are implemented by elementary CNOT gates, then the circuit depth depends upon the depth required to implement the various encoding and decoding operations. Other tradeoffs are possible by varying the number of $Z$ ancillas that are kept: keeping all $Z$ ancillas is the original protocol with minimal depth and maximal space, while reducing the number will reduce space at the cost of increased depth.
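The resource count in the best case described above is simple arithmetic; the sketch below only evaluates it. The function name and the two sample parameter sets are illustrative assumptions: a 15-qubit code with one logical qubit and four $X$-type generators, and an 8-qubit code with three logical qubits and one $X$-type generator.

```python
import math

def space_time_tradeoff(n, k, n_X):
    """Best-case qubit count and number of Clifford+T rounds when all
    Z ancillas are dropped, following the counting in the text."""
    qubits = k + n_X
    rounds = math.ceil(n / qubits)
    return qubits, rounds

# Illustrative parameter sets (assumed, see the lead-in).
print(space_time_tradeoff(n=15, k=1, n_X=4))
print(space_time_tradeoff(n=8, k=3, n_X=1))
```

Whether the best case is reached depends on finding enough independent operators in each round, which this count does not check.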
A $Z$ error on a $T$ gate will propagate due to the Cliffords. Specifically, a Clifford $U^{(j)}$ that maps $\exp(i\pi Z_j/8)$ to $\exp(i\pi \tilde P_j/8)$ will map an error $Z_j$ to $\tilde P_j$, but the error $\tilde P_j$ will not further be affected by the other $\exp(i\pi \tilde P_{j'}/8)$ since they commute. The accumulated error will flip some $X$ ancilla qubits as well as the logical qubits that would be flipped in the usual protocol. The association from the errors in $T$ gates to the logical and $X$ ancilla qubits is identical to the usual protocol. Hence, in the present space-time tradeoff, the output error probability and the success probability are identical to the usual protocol, whenever the error model is such that only $T$ gates suffer from $Z$ errors.

Randomized Construction of Triorthogonal and Generalized Triorthogonal Matrices
We now give a randomized algorithm that either returns a triorthogonal or generalized triorthogonal matrix with the desired $n, k_T, k_{CS}, k_{CCZ}, k_0$, or returns failure. For notational simplicity, we begin with the case of $k_{CS} = k_{CCZ} = 0$, i.e., a triorthogonal matrix. We then explain at the end how to construct generalized triorthogonal matrices by a straightforward generalization of this algorithm.

Randomized Construction of Triorthogonal Matrices
The matrix is constructed as follows. We construct the rows of the matrix iteratively, choosing each row uniformly at random subject to constraints given by previous rows. More precisely, when choosing the $j$-th row of the matrix, we choose the row uniformly at random subject to (i) the constraint (2.1) for $b = j$ and for all $a < j$, (ii) the constraint (2.2) for $c = j$ and for all $a < b < j$, and (iii) the constraint that the row has either odd or even weight depending on whether it is one of the first $k_T$ rows of $G$ or not. If it is not possible to satisfy all these constraints, then we terminate the construction and declare failure. Otherwise, we continue the algorithm. If we are able to satisfy the constraints for all rows of $G$, we return the resulting matrix; in this case, we say that the algorithm "succeeds." Note that all of these constraints that enter into choosing the $j$-th row are linear constraints on the entries of the row. Eq. (2.1) imposes $j - 1$ constraints and Eq. (2.2) imposes $\frac{1}{2}(j-1)(j-2)$ constraints; together with the parity constraint this gives $\frac{1}{2} j(j-1) + 1$ constraints (the constraints need not be independent). We can express these constraints as follows: let $g_a$ denote the $a$-th row vector of $G$. Let $M_j$ be the matrix whose rows are the vectors $g_a$ for $a < j$, the vectors $g_a \wedge g_b$ for $a < b < j$, and, as the last row, the all-ones vector $\vec{1}$. Thus, the matrix $M_j$ is determined by $g_1, \ldots, g_{j-1}$. The constraints on $g_j$ can then be written as
$$M_j g_j = (0, \ldots, 0, 1)^T \quad \text{for } 1 \le j \le k_T, \qquad (3.1)$$
$$M_j g_j = 0 \quad \text{for } k_T < j \le m. \qquad (3.2)$$
For $j \le k_T$, if $\vec{1}$ is in the span of the first $\frac{1}{2} j(j-1)$ rows of $M_j$ (i.e., all rows but the last of $M_j$), then the constraints have no solution; otherwise, the constraints have a solution. Let $\mathcal{M}_j$ denote the row span of $M_j$; then, for $k_T < j$, the constraint (3.2) is equivalent to requiring that $g_j \in \mathcal{M}_j^\perp$.

We now analyze the probability that the algorithm succeeds, returning a matrix $G$. We also analyze the distance of $\mathcal{G}_0^\perp$. Our goal is to show a lower bound on the probability that the distance is at least $d$, for some $d$. The analysis of the distance is based on the first moment method: we estimate the probability that a given vector $u$ is in $\mathcal{G}_0^\perp$. We then sum this probability over all choices of $u$ such that $0 < |u| < d$ and bound the result.
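The row-by-row procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the seed handling, and the Gauss-elimination sampler are our own, and for convenience the all-ones row is placed first in the constraint matrix rather than last, which does not change the solution set.

```python
import itertools
import numpy as np

def solve_affine_gf2(M, b, rng):
    """Return a uniformly random x with M x = b over F_2, or None if unsolvable."""
    M = np.array(M, dtype=np.uint8) % 2
    b = np.array(b, dtype=np.uint8) % 2
    rows, n = M.shape
    A = np.hstack([M, b.reshape(-1, 1)])          # augmented matrix
    pivots, r = [], 0
    for col in range(n):
        hits = np.nonzero(A[r:, col])[0]
        if hits.size == 0:
            continue
        A[[r, r + hits[0]]] = A[[r + hits[0], r]]  # move a pivot row into place
        for i in range(rows):
            if i != r and A[i, col]:
                A[i] ^= A[r]                       # clear the column elsewhere
        pivots.append(col)
        r += 1
        if r == rows:
            break
    if A[r:, n].any():                             # a row reading 0 = 1
        return None
    x = rng.integers(0, 2, size=n, dtype=np.uint8) # random free variables
    for i, col in enumerate(pivots):
        others = A[i, :n].copy()
        others[col] = 0
        x[col] = (A[i, n] + (others & x).sum()) % 2
    return x

def random_triorthogonal(n, k_T, k_0, seed=0):
    """Choose rows one at a time, uniformly subject to the linear constraints."""
    rng = np.random.default_rng(seed)
    rows = []
    for j in range(k_T + k_0):
        constraints = [np.ones(n, dtype=np.uint8)]            # weight parity
        constraints += rows                                    # Eq. (2.1)
        constraints += [rows[a] & rows[b]                      # Eq. (2.2)
                        for a in range(j) for b in range(a + 1, j)]
        rhs = np.zeros(len(constraints), dtype=np.uint8)
        rhs[0] = 1 if j < k_T else 0                           # odd rows first
        g = solve_affine_gf2(np.array(constraints), rhs, rng)
        if g is None:
            return None                                        # declare failure
        rows.append(g)
    return np.array(rows)

def check_triorthogonal(G, k_T):
    m = len(G)
    parity_ok = all(int(G[a].sum()) % 2 == (1 if a < k_T else 0) for a in range(m))
    pairs_ok = all(int((G[a] & G[b]).sum()) % 2 == 0
                   for a, b in itertools.combinations(range(m), 2))
    triples_ok = all(int((G[a] & G[b] & G[c]).sum()) % 2 == 0
                     for a, b, c in itertools.combinations(range(m), 3))
    return parity_ok and pairs_ok and triples_ok

G = random_triorthogonal(n=40, k_T=2, k_0=3, seed=1)
print(G is not None and check_triorthogonal(G, k_T=2))
```

The sketch makes no attempt to estimate the distance of the resulting code; it only reproduces the sampling step and the failure condition.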
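Let $u$ be a given vector with $u \neq 0$ and $u \neq \vec{1}$. Let us first compute the probability that $u \in \mathcal{G}_0^\perp$ and $u \notin \mathcal{M}_m$ conditioned on the algorithm succeeding. When $u \notin \mathcal{M}_m$, it holds that $u \notin \mathcal{M}_j$ for all $j \le m$. Hence, for $k_T < j \le m$,
$$\Pr[\, u \cdot g_j = 0 \mid g_1, \ldots, g_{j-1} \,] = \frac{1}{2}, \qquad (3.3)$$
since $u \notin \mathcal{M}_j$ implies that the condition $u \cdot g_j = 0$ is independent of the constraints in (3.1) or (3.2). Note that success of the algorithm depends only on the choices of the odd weight rows, and the even weight rows are chosen after the odd weight rows, so that the choice of $g_j$ for $j > k_T$ does not affect success. So,
$$\Pr[\, u \in \mathcal{G}_0^\perp \text{ and } u \notin \mathcal{M}_m \mid \text{success} \,] \le 2^{-(m - k_T)}. \qquad (3.4)$$
Now consider the probability that the algorithm succeeds and $u \in \mathcal{M}_m$. As a warm-up, we consider the probability that the algorithm succeeds and that some vector with small Hamming weight is in $\mathcal{G}$ (the span of all rows of $G$). We will use big-O notation from here on, considering the asymptotics of large $n$. Let $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ be the binary entropy function.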

Lemma 1.
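Consider any fixed $u \neq 0$. Then, the probability that the algorithm succeeds and that $u$ is in $\mathcal{G}$ is bounded as:
$$\Pr[\, \text{success and } u \in \mathcal{G} \,] \le \sum_{k=1}^{m} 2^{k-1} \, 2^{-n + \frac{1}{2} k(k-1) + 1}. \qquad (3.5)$$
Proof. We consider each of the $2^m - 1$ possible nonzero choices of the vector $b \in \mathbb{F}_2^m$ and bound the probability that, for the given choice of $b$, $u = \sum_{i=1}^{m} b_i g_i$ for $g$ chosen by the algorithm. For a given choice of nonzero $b$, let $k$ be the largest $i$ such that $b_i \neq 0$. The vector $g_k$ is chosen randomly subject to $\frac{1}{2} k(k-1) + 1$ constraints. Hence, for given $g_1, \ldots, g_{k-1}$ and given $b, u$, the probability that $u = \sum_i b_i g_i$ is at most $2^{-n + \frac{1}{2} k(k-1) + 1}$. There are $2^{k-1}$ choices of $b$ with largest nonzero index $k$. Summing over these choices and summing over $k$, Eq. (3.5) follows.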
By a first moment bound, the probability that there is a nonzero vector of weight at most w in G is bounded by Similarly, the probability that there is a vector with weight at least n − w in G is bounded by < 0 and the first moment bound gives a result which is o(1).

Lemma 2.
Let θ and c be chosen as in Lemma 1, and suppose m ≤ θ √ n. Let 0 < ρ < 1 2 be a constant. Then, the probability that the algorithm succeeds and that the (classical) minimum distance of the subspace M m is smaller than ρn is at most For sufficiently small θ > 0, there are ρ, c > 0 such that this expression tends to zero for large n.
Proof. We say that G has good distance if all nonzero vectors where o(n) term is from Lemma 1. By Eq. (3.6), the probability that the algorithm succeeds and that G does not have good distance is o (1).
Let $u \neq 0, \vec{1}$. We now bound the probability that the algorithm succeeds and that $\mathcal{G}$ has good distance and that $u \in \mathcal{M}_m$.
If $u \in \mathcal{M}_m$, then for some $m$-by-$m$ binary upper triangular matrix $A_{ij}$ and for some $a \in \mathbb{F}_2$, we have
$$u = \sum_{i \le j} A_{ij} \, g_i \wedge g_j + a \vec{1}. \qquad (3.7)$$
We consider each of the $2^{m(m-1)/2} - 1$ possible nonzero choices of the matrix $A$ and each of the two choices of $a$, and bound the probability that Eq. (3.7) holds for the given choice. Suppose $a = 0$ (the case $a = 1$ follows from this case by considering the vector $u + \vec{1}$). For a given choice of $A$, let $k$ be the largest index such that $A_{ik} \neq 0$ for some $i \le k$. Let $g_1, \ldots, g_{k-1}$ be given; we compute the probability that $g_k$ is such that Eq. (3.7) holds. Using $g_k \wedge g_k = g_k$, Eq. (3.7) imposes an inhomogeneous linear constraint on $g_k$ as
$$w \wedge g_k = u + \sum_{i \le j < k} A_{ij} \, g_i \wedge g_j, \qquad w = \sum_{i < k} A_{ik} g_i + A_{kk} \vec{1}. \qquad (3.8)$$
Assuming $\mathcal{G}$ has good distance, we have $|w| \ge cn - o(n)$. Then, the linear constraint Eq. (3.8) has rank at least $cn - o(n)$; in fact, it fixes at least $cn - o(n)$ components of $g_k$. The vector $g_k$ is chosen randomly subject to $\frac{1}{2} k(k-1) + 1$ linear constraints. Hence, the probability that Eq. (3.8) holds is at most $2^{-cn + o(n) + \frac{1}{2} k(k-1) + 1}$. Summing over all choices of $A_{ij}$, the probability that the algorithm succeeds and that $\mathcal{G}$ has good distance and that $u \in \mathcal{M}_m$ is bounded by $2^{-cn + o(n) + m(m-1)}$.

The number of vectors
Hence, by a first moment argument, the probability that the algorithm succeeds and that G has good distance and that M m has distance smaller than ρn for ρ ≤ 1/2 is We know c → 1 2 as θ → 0. For small enough θ we have −cn + m(m − 1) = −Ω(n). Hence this probability is o(1) for sufficiently small ρ. Finally, Then, for sufficiently small θ > 0, the algorithm succeeds with probability 1 − o (1).
Proof. Suppose the algorithm fails on step k ≤ k T . (The algorithm never declares failure after k T steps as the constraints become a homogeneous linear equation.) Then, the first k − 1 steps of the algorithm succeed and the vector 1 must be in the span of { g i ∧ g j : i ≤ j < k}. The probability that this happens is o(1), as we can see using the same proof as in Lemma 2. There is one minor modification to the proof: Eq. (3.7) should be replaced by Also, there is no need to sum over vectors u as instead we are considering the probability that a fixed vector is in the span of { g i ∧ g j : i ≤ j < k}. Otherwise, the proof is the same. Hence, and with high probability the algorithm succeeds and the triorthogonal matrix G has distance Proof. By Lemma 3, the algorithm succeeds with high probability for sufficiently small θ > 0. By Lemma 2, for sufficiently small θ > 0, for m = θ √ n , the distance of M m is Θ(n) with high probability. Now we condition on the event that the algorithm succeeds and M m has linear distance.
The distance of the triorthogonal matrix $G$ can be bounded by a first moment bound. Since $\mathcal{M}_m$ has linear distance, the event that $u \in \mathcal{M}_m$ for any nonzero $u$ of weight $o(n)$ does not happen. Then, we can apply Eq. (3.4) using the fact that for any constant $C$, the number of vectors with weight at most $C\sqrt{n}/\log n$ is $2^{C\sqrt{n}/2 + o(1)}$. So, for sufficiently small $C$ the first moment bound implies that the probability that there is $u \in \mathcal{G}_0^\perp$ of weight $\le C\sqrt{n}/\log n$ is $o(1)$. Note that in this regime, the distillation efficiency [1], defined as $\gamma = \log(n/k_T)/\log(d)$, converges to 1 as $n \to \infty$.
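The efficiency $\gamma = \log(n/k_T)/\log(d)$ is easy to evaluate numerically. The sketch below does so for the code parameters quoted in the abstract and, as a rough illustration of the asymptotic regime, for $k_T = \sqrt{n}$ and $d = \sqrt{n}/\log n$. Setting the constants $\theta$ and $C$ to 1 and taking $k_T$ of order $\sqrt{n}$ are simplifying assumptions of this sketch, not choices made in the text.

```python
import math

def distillation_efficiency(n, k_T, d):
    """gamma = log(n / k_T) / log(d)."""
    return math.log(n / k_T) / math.log(d)

# Parameters [[n, k_T, d]] quoted in the abstract.
for n, k_T, d in [(887, 137, 5), (912, 112, 6), (937, 87, 7)]:
    print(n, k_T, d, round(distillation_efficiency(n, k_T, d), 3))

def asymptotic_gamma(n):
    """gamma for k_T = sqrt(n), d = sqrt(n)/log(n) (constants set to 1)."""
    return distillation_efficiency(n, math.sqrt(n), math.sqrt(n) / math.log(n))

# The ratio approaches 1 only slowly because of the log(n) factor in d.
for n in [1e6, 1e12, 1e24]:
    print(n, round(asymptotic_gamma(n), 3))
```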

Randomized Construction of Generalized Triorthogonal Matrices
The randomized construction of triorthogonal matrices above immediately generalizes to a randomized construction of generalized triorthogonal matrices. In the previous randomized construction, each vector $g_j$ was chosen at random subject to certain linear constraints. Note that Eqs. (3.1) and (3.2) have the same left-hand side but different right-hand sides. These constraints were homogeneous for row vectors in $G_0$ (i.e., Eq. (3.2) has the zero vector on the right-hand side) and inhomogeneous for row vectors in $G_T$ (i.e., Eq. (3.1) has one nonzero entry on the right-hand side). For a generalized triorthogonal matrix, we follow the same randomized algorithm as before except that we modify the constraints on the vectors $g_j$. The vectors will still be subject to linear constraints that $M_j g_j$ is equal to some fixed vector, with the same $M_j$ as before. However, the fixed vector is changed in the generalized algorithm to obey the definition of a generalized triorthogonal matrix. This modifies the success probability of the algorithm, but one may verify that the algorithm continues to succeed with high probability in the regime considered before.
where the right-hand side is the list of function values. In this bijection, the ordering of elements of $\mathbb{F}_2^m$ is implicit, but a different ordering is nothing but a different ordering of bits, and hence as a block-code it is immaterial. For example, the degree zero polynomial $f(x_1, \ldots, x_m) = 1$ is a constant function, which corresponds to the all-1 vector of length $2^m$, and a degree 1 polynomial $f(x_1, \ldots, x_m) = x_1$ is a function that corresponds to a vector of length $2^m$ and weight $2^{m-1}$. Since the variables $x_i$ are binary, we have $x_i^2 = x_i$, and every polynomial function is a unique sum of monomials where each variable has exponent 0 or 1.
For an integer $r \ge 0$ the Reed-Muller code $RM(r, m) \subseteq \mathbb{F}_2^{2^m}$ is defined to be the set of all polynomials (modulo the ideal $(x_1^2 - x_1, x_2^2 - x_2, \ldots)$) of degree at most $r$, expressed as the lists of function values. A property we make routine use of is that whenever a polynomial does not contain $x_1 \cdots x_m$ (the product of all variables), the corresponding vector of length $2^m$ has even weight. This allows us to see that the dual of $RM(r, m)$ is again a Reed-Muller code, and direct dimension counting shows that
$$RM(r, m)^\perp = RM(m - r - 1, m).$$
For Reed-Muller codes it is easy to consider the wedge product of two codes, which appears naturally in the triorthogonality. Namely, given two binary subspaces $V$ and $W$, we define the wedge product as
$$V \wedge W = \mathrm{span}\{ v \wedge w : v \in V, w \in W \}.$$
Since the product of two polynomials of degree at most $r$ and $r'$ has degree at most $r + r'$, we have $RM(r, m) \wedge RM(r', m) = RM(r + r', m)$ whenever $r + r' \le m$. It follows that $RM(r, m)$ is a triorthogonal subspace if $3r < m$. (In fact, it is triply even.) Since a basis of Reed-Muller codes consists of monomials where each variable has exponent 0 or 1, it is often convenient to think of a monomial as a binary $m$-tuple that specifies which variables are factors of the monomial. For example, if $m = 3$, the constant function $f = 1$ can be represented as $(0, 0, 0)$, the function $f = x_1$ can be represented as $(1, 0, 0)$, and the function $f = x_2 x_3$ can be represented as $(0, 1, 1)$. This $m$-tuple is called an indicator vector.
(In contrast to what the name suggests, the "sum" of indicator vectors is not defined.) An indicator vector $a$ that defines a monomial corresponds to a code word $I_a \in \mathbb{F}_2^{2^m}$. Under the wedge product of two code words, the corresponding two monomials are multiplied. In terms of indicator vectors, this amounts to taking the bit-wise OR operation, which we denote by $\vee$:
$$I_a \wedge I_b = I_{a \vee b}. \qquad (4.7)$$
For example, if $m = 3$, $I_{(1,0,0)} \wedge I_{(0,1,1)} = I_{(1,1,1)}$, corresponding to $x_1 \cdot x_2 x_3 = x_1 x_2 x_3$.
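The correspondence between monomials, indicator vectors and code words can be made concrete with a short Python sketch. The helper names `monomial_codeword` and `rm_generator` and the lexicographic ordering of points are our own choices; as noted above, the ordering is immaterial.

```python
import itertools
import numpy as np

def monomial_codeword(a):
    """Code word I_a: values of prod_{i: a_i = 1} x_i on all points of F_2^m."""
    m = len(a)
    return np.array([int(all(x[i] for i in range(m) if a[i]))
                     for x in itertools.product([0, 1], repeat=m)])

def rm_generator(r, m):
    """Generator matrix of RM(r, m): one row I_a per indicator a of weight <= r."""
    inds = [a for a in itertools.product([0, 1], repeat=m) if sum(a) <= r]
    return np.array([monomial_codeword(a) for a in inds])

# Wedge product of code words = bit-wise OR of indicator vectors (m = 3).
lhs = monomial_codeword((1, 0, 0)) & monomial_codeword((0, 1, 1))
print(np.array_equal(lhs, monomial_codeword((1, 1, 1))))

# Duality RM(1, 4)^perp = RM(2, 4): orthogonality plus dimension counting.
A, B = rm_generator(1, 4), rm_generator(2, 4)
print(not (A @ B.T % 2).any(), A.shape[0] + B.shape[0] == 2 ** 4)

# RM(1, 4) has 3r < m: all triple overlaps of basis rows (with repetition) are even.
# Since the triple overlap is trilinear over F_2, checking a basis suffices.
triples_even = all((A[i] & A[j] & A[k]).sum() % 2 == 0
                   for i, j, k in itertools.combinations_with_replacement(range(len(A)), 3))
print(triples_even)
```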

Triorthogonal codes for CCZ
In [11,12], a construction was presented to distill a single Toffoli gate from 8 $T$ gates, so that any single error in the $T$ gates is detected. More quantitatively, if the input $T$ gates have error probability $\epsilon_{in}$, the output Toffoli has error probability $\epsilon_{out} = 28 \epsilon_{in}^2 + O(\epsilon_{in}^3)$. A protocol of similar performance based on a generalized triorthogonal matrix was presented in [15]. In this subsection, we present alternatives to these constructions that build upon Reed-Muller codes, yielding higher order error suppression. The protocol of [15] will be the same as our smallest instance.
Let $m$ be a multiple of 3. We consider $RM(r = m/3 - 1, m)$ to build a generalized triorthogonal code on $2^m$ qubits, with $k_T = k_{CS} = 0$ but $k_{CCZ} > 0$. Since $3r = m - 3 < m$, the generating matrix of $RM(m/3 - 1, m)$ qualifies to be $G_0$. The $Z$-distance of the triorthogonal code is at least the distance of $RM(m/3 - 1, m)^\perp = RM(2m/3, m)$, which is $2^{m/3}$. (In fact, it is exactly this for the following constructions.) We choose triples of $G_{CCZ}$ specified by triples of indicator vectors $a^{(i)}, b^{(i)}, c^{(i)}$. The triorthogonality conditions can be summarized as follows.
(A similar set of conditions for $G_{CS}$ should be straightforward.) We choose $a^{(i)}, b^{(i)}, c^{(i)}$ to have weight exactly $m/3$, so that the first six conditions above are automatically satisfied. We will give three constructions of triples obeying these requirements. One construction will be analytic, one will be numerical, and one will be a randomized construction using the Lovasz local lemma. It may be useful for the reader to think of a vector $a_i \in \mathbb{F}_2^m$ as corresponding to a subset $A_i$ of some set $S$ with $|S| = m$. Then, a triple consists of three disjoint subsets $A_i, B_i, C_i$ of cardinality $m/3$ each.
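The analytic construction is as follows:
$$a^{(u)} = (u, \bar u, 0), \qquad b^{(u)} = (0, u, \bar u), \qquad c^{(u)} = (\bar u, 0, u),$$
where we labeled a triple by $u \in \mathbb{F}_2^{m/3}$, each indicator vector is split into three blocks of length $m/3$, and $\bar u$ denotes the bit-wise complement of $u$. The cases $u = \vec 0$ and $u = \vec 1$ are excluded, for the triple to satisfy the other generalized triorthogonality conditions. Suppose that $x, y, z$ are rows of $G_{CCZ}$ and are not all from the same triple. We need to check that $|x \vee y \vee z| < m$. This condition is violated only if $x = (u_x, \bar u_x, 0)$ and $y = (0, u_y, \bar u_y)$ and $z = (\bar u_z, 0, u_z)$ for some $u_x, u_y, u_z$, because there is no way to have $\vec 1 = 0 \vee 0 \vee u \in \mathbb{F}_2^{m/3}$ unless $u = \vec 1$, which case we have excluded. But then, we must have that $u_x = u_y = u_z$ to have $|x \vee y \vee z| = m$, which means that $x, y, z$ belong to the same triple.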
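The argument above can be checked by exhaustive enumeration for small $m$. The sketch below assumes the block form $(u, \bar u, 0), (0, u, \bar u), (\bar u, 0, u)$ read off from the proof, stores each indicator vector as an $m$-bit integer, and counts violations among rows that are not all from the same triple. The function names are hypothetical.

```python
from itertools import combinations

def analytic_triples(m):
    """Triples ((u, ~u, 0), (0, u, ~u), (~u, 0, u)) as m-bit masks, for u not 0 or all-ones."""
    w = m // 3
    block_ones = (1 << w) - 1
    triples = []
    for u in range(1, block_ones):          # skips u = 0 and u = all-ones
        ubar = block_ones ^ u
        a = u | (ubar << w)                 # blocks (u, ~u, 0)
        b = (u << w) | (ubar << 2 * w)      # blocks (0, u, ~u)
        c = ubar | (u << 2 * w)             # blocks (~u, 0, u)
        triples.append((a, b, c))
    return triples

def violations(triples, m):
    """Row triples not all from one triple whose bit-wise OR is the all-ones vector."""
    full = (1 << m) - 1
    rows = [(i, v) for i, t in enumerate(triples) for v in t]
    bad = []
    # Distinct rows suffice: every row has weight m/3, so a repeated row cannot reach weight m.
    for (i, x), (j, y), (k, z) in combinations(rows, 3):
        if not (i == j == k) and (x | y | z) == full:
            bad.append((x, y, z))
    return bad

summary = {}
for m in (3, 6, 9):
    triples = analytic_triples(m)
    summary[m] = (len(triples), len(violations(triples, m)))
print(summary)
```

Under these assumptions the number of triples is $2^{m/3} - 2$, which agrees with the values of $k_{CCZ}$ quoted for $m = 3, 6, 9$.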
In the particular case $m = 3$, this construction gives $k_{CCZ} = 0$. However, we can instead have $k_{CCZ} = 1$ with the triple of indicator vectors $(1,0,0), (0,1,0), (0,0,1)$, corresponding to polynomials $x_1, x_2, x_3$. The full generalized triorthogonal matrix is
$$\left[\begin{array}{cccccccc} 0&1&0&1&0&1&0&1\\ 0&0&1&1&0&0&1&1\\ 0&0&0&0&1&1&1&1\\ \hline 1&1&1&1&1&1&1&1 \end{array}\right]$$
where the part above the line is $G_{CCZ}$ (the rows list the values of $x_1, x_2, x_3$ on the eight points of $\mathbb{F}_2^3$) and that below the line is $G_0$. This triorthogonal matrix is maximal in the sense that $(G^{\wedge 2})^\perp = G_0$. The resulting distillation routine has error probability $28p^2 + O(p^3)$ if input $T$-states have error probability $p$. This protocol was given in [15], and is similar to those of [11,12].

For $m = 6$, we find $n = 64$, $k_{CCZ} = 2$ and distance 4, of which the triples in terms of polynomials are $\{x_1 x_2, x_3 x_4, x_5 x_6\}$ and $\{x_2 x_3, x_4 x_5, x_6 x_1\}$. We examined the $m = 6$ instance further to see if there could be more logical qubits extending the two triples, but found that there does not exist any extra solution to the generalized triorthogonality equations. Instead, we were able to extend $G_0$. The resulting generalized triorthogonal matrix, denoting each row by a polynomial, is given in (4.11). This triorthogonal matrix is also maximal in the sense that $(G^{\wedge 2})^\perp = G_0$. The leading term in the output error probability is $2944p^4$. The coefficient was obtained by brute-force weight enumeration and the MacWilliams identity. This protocol is similar to that of [13], but not identical; the $64T$-to-$2CCZ$ protocol here has a smaller coefficient in the output error probability than that of [13]. If the efficiency measure of a distillation protocol is the ratio of the number of input $T$ gates to the number of output $CCZ$ gates at a given order of error reduction, then composing a quadratic $T$-to-$T$ protocol such as those of [1] and the $8T$-to-$1CCZ$ protocol above is better than the $64T$-to-$2CCZ$ protocol here.

For $m = 9$ we find $n = 512$, $k_{CCZ} = 6$ and distance 8.
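For the smallest instance, $m = 3$, the defining property can be verified directly. The sketch below takes the rows $x_1, x_2, x_3$ as $G_{CCZ}$ and the all-ones row as $G_0$, and assumes the reading of generalized triorthogonality in which the only triple product of rows with odd weight is the one formed by the $CCZ$ triple. It also counts the weight-2 $Z$ error patterns that commute with the single $X$-stabilizer; that count can be compared with the coefficient 28 quoted above.

```python
from itertools import combinations, combinations_with_replacement, product

points = list(product((0, 1), repeat=3))
rows = {
    "x1": [p[0] for p in points],
    "x2": [p[1] for p in points],
    "x3": [p[2] for p in points],
    "1": [1] * 8,                     # the single row of G_0
}

def triple_parity(names):
    """Parity of the weight of the entry-wise product of three (possibly repeated) rows."""
    a, b, c = (rows[name] for name in names)
    return sum(x & y & z for x, y, z in zip(a, b, c)) % 2

# All multisets of three rows whose product has odd weight.
odd_products = [names for names in combinations_with_replacement(rows, 3)
                if triple_parity(names)]
print(odd_products)

# Weight-2 Z error patterns that have even overlap with every row of G_0.
undetected = [pair for pair in combinations(range(8), 2)
              if sum(rows["1"][i] for i in pair) % 2 == 0]
print(len(undetected))
```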
We then did a numerical search to see if it would be possible to have a larger $k_{CCZ}$, restricting to the case that the triples of $G_{CCZ}$ are associated with triples of indicator vectors of weight $m/3$. We were able to find $k_{CCZ} = 10$, and further extend $G_0$ to make the resulting triorthogonal matrix maximal in the sense that $(G^{\wedge 2})^\perp = G_0$. The resulting [[512, 30, 8]] code is the following.
Here, each line in $G_{CCZ}$ contains a triple of polynomials (actually monomials). The algorithm we used was as follows. We used a version of the algorithm in the constructive proof of the Lovasz local lemma of Ref. [19]. We define a subroutine to initialize a triple, which, for given $i$, sets $a^{(i)}, b^{(i)}, c^{(i)}$ to be random indicator vectors of weight $m/3$ each, subject to the constraint that $a^{(i)} \vee b^{(i)} \vee c^{(i)} = \vec 1$. That is, "initializing a triple" is to choose $a^{(i)}$ at random of weight $m/3$, and then choose $b^{(i)}$ at random of weight $m/3$ with its 1 entries only in the 0 entries of $a^{(i)}$, after which $c^{(i)}$ is determined by the constraint $a^{(i)} \vee b^{(i)} \vee c^{(i)} = \vec 1$. Then we do the following:

1. Pick $k_{CCZ}$ and initialize $k_{CCZ}$ different triples.

2. Look for a violation of the triorthogonality conditions. We check rows $x, y, z$ of the matrix in lexicographic order. A violation is when $x, y, z$ are not in the same triple but $x \vee y \vee z = \vec 1$.

3. If a violation of the conditions exists for vectors $x, y, z$, then we find the triples containing $x, y, z$, and initialize those (at most three) triples, and go to 2. If no violation exists, exit the algorithm, reporting success.
We run this algorithm until it reports success or until we give up and terminate the algorithm. We also tried a slight modification of the algorithm, in which we did some random permutation of the triples at various steps (this has an effect similar to randomizing the order in which we check the conditions).
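A minimal Python sketch of the resampling loop described above is given here for orientation; it is not the code used for the reported search. It assumes that only the conditions among rows of $G_{CCZ}$ need to be checked (the indicator vectors have weight exactly $m/3$), stores indicator vectors as $m$-bit integers, and draws each triple as a uniformly random ordered partition of the variables, which gives the same distribution as the initialization subroutine. The function names, the `max_rounds` cutoff, and the fixed seed are hypothetical choices.

```python
import itertools
import random

def init_triple(m, rng):
    """Random partition of the m variables into three indicator vectors of weight m/3."""
    w = m // 3
    order = list(range(m))
    rng.shuffle(order)
    return tuple(sum(1 << p for p in order[j * w:(j + 1) * w]) for j in range(3))

def find_violation(triples, m):
    """Indices of the triples involved in the first violation found, or None."""
    full = (1 << m) - 1
    rows = [(i, v) for i, triple in enumerate(triples) for v in triple]
    # Distinct rows suffice: each row has weight m/3, so a repeated row cannot reach weight m.
    for (i, x), (j, y), (k, z) in itertools.combinations(rows, 3):
        if not (i == j == k) and (x | y | z) == full:
            return {i, j, k}
    return None

def search(m, k_ccz, max_rounds=100_000, seed=0):
    rng = random.Random(seed)
    triples = [init_triple(m, rng) for _ in range(k_ccz)]   # step 1
    for _ in range(max_rounds):
        bad = find_violation(triples, m)                     # step 2
        if bad is None:
            return triples                                   # success
        for i in bad:                                        # step 3: re-initialize
            triples[i] = init_triple(m, rng)
    return None                                              # gave up

result = search(m=6, k_ccz=2)
print(result)
```

The small instance $m = 6$, $k_{CCZ} = 2$ is used so that the loop finishes almost immediately; larger targets such as $m = 9$, $k_{CCZ} = 10$ may need many more rounds or the random permutations mentioned above.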

Lovasz Local Lemma
The randomized numerics above used an algorithm in the constructive proof of the Lovasz local lemma. Here, we show what the local lemma implies about the possible scaling of $k_{CCZ}$ for large $m$. Note that, as we will see shortly, in the regime where we ran the algorithm above ($m = 9$) the local lemma does not guarantee a solution.
Suppose that there are $n_{triple} = k_{CCZ}$ triples. Imagine choosing each triple at random, following the initialization routine of the above algorithm. Label the triples by integers ranging over $1, \ldots, n_{triple}$. Define a bad event $E_{i,j,k}$ to be the event that for three triples, labelled $i, j, k$, with $1 \le i < j < k \le n_{triple}$, there is a violation of the triorthogonality conditions involving one indicator vector from each triple. We call such events $E_{i,j,k}$ "three-triple events". Define a bad event $E_{i,j}$ to be the event that for two triples, labelled $i, j$, with $1 \le i < j \le n_{triple}$, there is a violation of the triorthogonality conditions involving one indicator vector from one triple and two indicator vectors from the other triple. We call such events $E_{i,j}$ "two-triple events".
The probability of $E_{i,j,k}$ can be estimated as follows: There are $3^3 = 27$ different choices of indicator vectors if we choose one indicator vector from each triple. The vector from the first triple is random. The probability that the vector from the second triple has no overlap with the vector from the first triple is $\binom{2m/3}{m/3} \big/ \binom{m}{m/3}$.

We use the following statement of the Lovasz local lemma [20]. Define a dependency graph on a set of events such that two events are adjacent if and only if they are dependent.

Error Probabilities and Quantitative Values
The generalized triorthogonal matrix has distance $d = 2^{m/3}$. The number of error patterns of weight $d$ which do not violate any stabilizer of the code is equal to the number of code words of $RM(2m/3, m)$ with weight $d$. This is known [21] to equal
$$A_d = 2^r \prod_{i=0}^{\mu-1} \frac{2^{m-i}-1}{2^{\mu-i}-1}, \qquad (4.18)$$
where $\mu = m - r$ with in this case $r = 2m/3$ so $\mu = m/3$. For $m = 3$, $A_d = 28$. For $m = 6$, $A_d = 10416$. For $m = 9$, $A_d = 50434240 \approx 5 \times 10^7$. The leading coefficient in the output error rate is of course at most these numbers, since there could be $Z$-stabilizers of weight $d$. Further, in the $m = 6$ and $m = 9$ cases above, we extended $G_0$ so the number of error patterns of weight $d$ is strictly smaller than $A_d$. Indeed, for our maximal $m = 6$ code, a direct enumeration shows that there are 3248 error patterns that do not violate $X$-stabilizers, out of which 304 are $Z$-stabilizers.
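The quoted values of $A_d$ can be reproduced from the product formula for the number of minimum-weight code words of $RM(r,m)$. The following sketch evaluates that formula with exact rational arithmetic for $r = 2m/3$; the function name is hypothetical.

```python
from fractions import Fraction

def min_weight_count(r, m):
    """Number of minimum-weight code words of RM(r, m), via the product formula with mu = m - r."""
    mu = m - r
    count = Fraction(2) ** r
    for i in range(mu):
        count *= Fraction(2 ** (m - i) - 1, 2 ** (mu - i) - 1)
    assert count.denominator == 1     # the product is always an integer
    return int(count)

counts = {m: min_weight_count(2 * m // 3, m) for m in (3, 6, 9)}
print(counts)
```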
It is also known [22] that all weights of $RM(2m/3, m)$ between $d$ and $2d$ are of the form $2d - 2^i$ for some $i$, so that the next weight after $d$ is equal to $3d/2$.
To give some numbers when using these codes in a distillation protocol, consider the $m = 9$ case with $k_{CCZ} = 10$. Suppose we have an input error probability $\epsilon_{in} = 10^{-3}$. Then, the probability that the protocol succeeds (i.e., that no stabilizer errors are detected) is lower bounded by $(1 - \epsilon_{in})^{512} \approx 0.599$. The average number of output $CCZ$ magic states is then $n_{CCZ} \approx 5.99$. We expect that for $m = 9$ the contribution of errors with weight $3d/2 = 12$ will be negligible compared to the leading contribution. Thus, we approximate the output error probability by $\epsilon_{out} \approx A_d\, \epsilon_{in}^{8} (1 - \epsilon_{in})^{504}$, where the factor $(1 - \epsilon_{in})^{504}$ represents the requirement that none of the other input $T$ gates have an error. We expect that this is an overestimate because, as mentioned above, not all error patterns of weight $d$ that do not violate a stabilizer will lead to a logical error and also we have added additional stabilizers to $G_0$. Thus, the ratio $\epsilon_{out}/n_{CCZ} \approx 5.1 \times 10^{-18}$. We use $512/n_{CCZ} \approx 85.5$ $T$-gates per output $CCZ$ magic state.
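The arithmetic behind these estimates is short enough to spell out. The sketch below assumes independent input errors and keeps only the leading-order term, exactly as in the paragraph above; the function and variable names are hypothetical.

```python
def leading_order_estimate(n, k_out, d, A_d, eps_in):
    """Acceptance probability, mean number of outputs, error per output, inputs per output."""
    p_accept = (1 - eps_in) ** n                           # no input error at all
    n_out = k_out * p_accept                               # average number of output states
    eps_out = A_d * eps_in ** d * (1 - eps_in) ** (n - d)  # leading-order logical error
    return p_accept, n_out, eps_out / n_out, n / n_out

p_accept, n_ccz, err_per_output, t_per_output = leading_order_estimate(
    n=512, k_out=10, d=8, A_d=50434240, eps_in=1e-3)
print(f"{p_accept:.3f} {n_ccz:.2f} {err_per_output:.2e} {t_per_output:.1f}")
```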
It requires [12,23] 4 high-quality T -gates to produce a single high-quality CCZ state, so this protocol's efficiency is comparable, if the goal is to produce CCZ states, to a protocol that uses only 85.5/4 ≈ 21.4 input T -gates per output T -gate. Since one uses 4 T -gates to make a CCZ state, the quality of those output T -gates must be four times better than the needed CCZ quality.
If one is able to improve the input error rate then the protocol becomes more efficient as the success probability becomes higher, asymptoting at 51.2 $T$-gates per output $CCZ$ magic state, comparable to a protocol using 12.8 input $T$-gates to produce an output $T$-gate. Alternatively, one can also make the protocol more efficient by applying error correction as follows. Choose some integer $m \ge 0$. Then, modify the protocol; as usual, one encodes logical qubits in the $|+\rangle$ state into the error correcting code, applies a transversal $T$-gate, and then measures the stabilizers. However, while usually one would declare failure if any stabilizer errors occur, one can instead apply error correction: if the error syndrome can be caused by at most $m$ errors, then one corrects those errors by applying Pauli $Z$ operators to the appropriate physical qubits. For example, at $\epsilon_{in} = 10^{-3}$, the probability that there are 0 or 1 input errors is equal to $(1 - \epsilon_{in})^{512} + 512\,\epsilon_{in}(1 - \epsilon_{in})^{511} \approx 0.906$, giving the acceptance probability for $m = 1$. Applying this error correction does reduce the quality of the output states: with $m = 1$, now seven input errors can cause a logical error. The number of such weight seven input error patterns that cause a logical error is at most $8A_d$, so that the output error per output logical qubit is approximately $8A_d\,\epsilon_{in}^7/10 \approx 5 \times 10^{-14}$.
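The acceptance probabilities with and without correction are binomial tail sums. The sketch below assumes independent input errors and, as in the text, approximates acceptance by the event that at most the chosen number of input errors occurred; `max_corrected` plays the role of the integer chosen above, and all names are hypothetical.

```python
from math import comb

def acceptance_probability(n, eps_in, max_corrected):
    """Probability that at most max_corrected of the n input T gates are faulty."""
    return sum(comb(n, j) * eps_in ** j * (1 - eps_in) ** (n - j)
               for j in range(max_corrected + 1))

p_detect_only = acceptance_probability(512, 1e-3, 0)
p_correct_one = acceptance_probability(512, 1e-3, 1)
t_per_ccz_limit = 512 / 10            # limit of perfect acceptance, 10 outputs
print(round(p_detect_only, 3), round(p_correct_one, 3),
      t_per_ccz_limit, t_per_ccz_limit / 4)
```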

Punctured Reed-Muller Codes
Motivated by the puncturing ideas of Section 2.2, we have considered puncturing a Reed-Muller code. Instead of using $RM(m/3 - 1, m)$ as before, we now consider $RM(r, 3r + 1)$. This code is triorthogonal as before, and is maximal in the sense that $(G^{\wedge 2})^\perp = G_0$. We then randomly puncture this code. The codes we found numerically are listed in Tables 1 and 2. Observe that the coefficients $A_d$ in the output error probabilities are fairly small given the code lengths.
We found that there is a unique $d = 5$ code that can be obtained by puncturing $RM(2, 7)$; it is [[125, 3, 5]]. This was simple to check: Any three-puncture in $RM(r, m > 1)$ is equivalent (see footnote 1), and we numerically verified that any four-puncture to $RM(2, 7)$ gave $d = 4$.
Let us now explain our numerical techniques. The number $k$ of logical qubits in each case in the tables was calculated after the puncture; $k$ is equal to the number of punctures only if the submatrix of the generating matrix of $RM$ on the punctured coordinates is full rank. The $Z$-distance, which is relevant to the distillation purposes, is computed either by the MacWilliams identity applied to $X$-stabilizer weight enumerators that are computed by brute force enumeration, or by enumerating all $Z$-logical operators of a given weight. The computed $Z$-distance is in fact the true code distance since the $Z$-stabilizer group contains a subgroup associated with the bit strings of the $X$-stabilizer group. The MacWilliams identity was an effective method especially when the base code was $RM(2, 7)$ where there are only 29 $X$-stabilizers prior to puncture. For this base code, we simply did a random search, trying many different random punctures of the code, and selected good examples that we found.
When the base code was $RM(3, 10)$, there are 176 $X$-stabilizers to begin with, so the brute force enumeration of the $X$-stabilizer weight enumerator became prohibitive unless many coordinates were punctured. Also, at larger distances ($\ge 5$), a guided search became more efficient than a random search among codes. To solve both these problems, we used an "unpuncturing" strategy based on the following observation. Let $G_0$ be a matrix whose rows represent $X$-stabilizers, and suppose $G'$ is a matrix whose rows represent $X$-logical operators such that any $Z$-logical operator of minimal weight $d$ anticommutes with at least one $X$-logical operator of $G'$. Then, we consider a new $X$-stabilizer matrix $\begin{pmatrix} I & G' \\ 0 & G_0 \end{pmatrix}$. We claim that this new code does not have any $Z$-logical operator of weight $\le d$. The proof is simple: If the bit string $v$ of a $Z$-logical operator of weight $\le d$ has a nonzero substring on the columns of $G_0$, then, by construction, that substring must have weight at least $d$, but such a substring has odd overlap with some row of $G'$, which must be cancelled by the substring on the columns of $I$. This forces the weight to be larger than $d$. The construction of a new code by adding more stabilizers and qubits is precisely the inverse of the puncturing procedure (up to permutations of qubits), hence the name "unpuncturing."

For small distances, e.g., $d = 3$, it is easy to enumerate all $Z$-logical operators of weight $d$. We then select $X$-logical operators to "catch" those minimal weight $Z$-logical operators, and identify the punctured coordinates that gave rise to the chosen $X$-logical operators. One $X$-logical operator $\bar X$ was chosen each time so that the number of the minimal weight $Z$-logical operators that $\bar X$ anticommutes with is maximized. The codes in Table 2 were found by this unpuncturing. We started with a random puncturing giving a $d = 3$ code and then successively unpunctured to obtain distance 4, 5 codes.
The $d = 6$ and $d = 7$ codes in Table 2

Footnote 1: A punctured code from a Reed-Muller code is determined by the isomorphism class under affine transformations of the set of points corresponding to the punctured coordinates in the $m$-dimensional unit hypercube, since an affine transformation is an automorphism of $\mathbb{F}_2[x_1, \ldots, x_m]/(x_1^2 - x_1, \ldots, x_m^2 - x_m)$. Any three-point set in the unit hypercube is affinely independent, and hence is affinely equivalent to any other three-point set.

Table caption: The decimal integers are short-hand notation for the binary coordinate that indexes bits in the Reed-Muller code; e.g., "3" in the first example means that one has to puncture the bit labelled by $0000011 \in \mathbb{F}_2^7$. The number of $Z$-logical operators of weight $d$ is obtained by the MacWilliams identity applied to the $X$-stabilizer weight enumerators. Since the $Z$ stabilizer group in any case corresponds to a subspace of the dual of the pre-puncture Reed-Muller code, the minimal weight of any $Z$ stabilizer is at least 8. Every $X$-stabilizer has weight a multiple of 8, and there is a basis of $X$-logical operators such that each basis element has weight 7 mod 8. Hence, the transversal $T$ becomes $T^\dagger$ on every logical qubit. As a distillation protocol, the output error probability is $A_d p^d$ at the leading order where $p$ is the independent error probability of the input $T$ states.
To give some numbers when these codes are used in a distillation protocol, consider an $\epsilon_{in} = 10^{-3}$ input error rate using the [[912, 112, 6]] code. In this case, the probability that the protocol succeeds is at least $p_{acc} = (1 - \epsilon_{in})^{912} = 0.401$. The average number of output $T$ magic states is then $n_T \approx 44.97$. We expect that the dominant contribution to the errors is from the leading order so we approximate the output error probability per output state by $\epsilon_{out} \approx A_6\, \epsilon_{in}^6 (1 - \epsilon_{in})^{906}/n_T \approx 1.07 \times 10^{-17}$. We use $912/n_T \approx 20.28$ $T$-gates per output $T$ magic state. One can also use error correction to increase the success probability at the cost of an increase in output error rate. If one corrects a single error, the acceptance probability becomes approximately $p_{acc} = (1 - \epsilon_{in})^{912} + 912\,\epsilon_{in}(1 - \epsilon_{in})^{911} \approx 0.768$. Applying this error correction does reduce the quality of the output states since now five input errors can cause a logical error. The number of such weight five input error patterns that cause a logical error is at most $6A_6$, so that the output error per output logical qubit is approximately  know with the given input and output error rates; further, the performance of the punctured codes will improve at lower input errors where the success probability becomes closer to 1. We expect that for the [[937, 87, 7]] code one can find even lower output error rates.
We have explained a distillation protocol that is particularly well-suited for any punctured (and hence for all) triorthogonal code in Section 2.2 [18]. Since $RM(r, 3r + 1)$ is triply even, the Clifford correction after applying transversal $T$ is absent and so the only Clifford cost for the present punctured Reed-Muller codes is in the preparation of the stabilizer state $\sum_{v \in RM(r, 3r+1)} |v\rangle$.
The Clifford circuit to prepare this stabilizer state is a coherent version of a classical encoding circuit, and there exists an encoding circuit of depth $m$ using $(m/2)2^m$ CNOTs (if one can implement CNOT across any pair of qubits), using the recursive construction of Reed-Muller codes. This circuit can be described as follows: using $2^m$ qubits labelled by bit strings of length $m$, prepare all qubits labelled by bit strings with Hamming weight $\le r$ in the $|+\rangle$ state and prepare all other qubits in the $|0\rangle$ state. Then, for $m$ rounds, labelled by integers $1, \ldots, m$, do the following: on the $j$-th round, for each of the $2^{m-1}$ qubits labelled by a bit string with a 0 in the $j$-th position of that bit string, apply a CNOT with that qubit as source and with the target being the qubit labelled by the bit string which agrees everywhere with the source bit string, except that it is 1 in the $j$-th position. This circuit is the same as the encoding circuit used for polar codes, up to different choices of the input state [24,25].
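Because CNOT acts on computational basis states as an XOR, the circuit can be checked classically. The sketch below applies the $m$ rounds to each basis string supported on a single label of weight at most $r$ and compares the result with the values of the corresponding monomial; the $|+\rangle$ inputs then produce the uniform superposition over the span of these images. The function names and the small test parameters are hypothetical.

```python
from itertools import product

def run_encoder(bits, m):
    """Apply the m CNOT rounds to a classical assignment {label: bit}; return (bits, #CNOTs)."""
    bits = dict(bits)
    n_cnot = 0
    for j in range(m):                              # round j acts on position j
        for src in product((0, 1), repeat=m):
            if src[j] == 0:
                tgt = src[:j] + (1,) + src[j + 1:]  # same label with a 1 in position j
                bits[tgt] ^= bits[src]              # CNOT: target ^= source
                n_cnot += 1
    return bits, n_cnot

def monomial_values(a):
    """Values of the monomial with indicator vector a, keyed by the point of F_2^m."""
    m = len(a)
    return {x: int(all(x[i] for i in range(m) if a[i]))
            for x in product((0, 1), repeat=m)}

m, r = 4, 1
labels = list(product((0, 1), repeat=m))
all_match = True
for a in labels:
    if sum(a) <= r:                                 # qubits that start in |+>
        basis_state = {x: int(x == a) for x in labels}
        encoded, n_cnot = run_encoder(basis_state, m)
        all_match = all_match and encoded == monomial_values(a)
print(all_match, n_cnot)
```

Within one round every source has a 0 and every target a 1 in the active position, so the CNOTs of a round act on disjoint pairs and can be applied in parallel, consistent with depth $m$.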
5 T-to-CCZ protocols using hyperbolic weakly self-dual CSS codes

In Ref. [2], we have classified weakly self-dual CSS codes on $n_{inner}$ qubits into two types. If $S$ is the self-orthogonal subspace of $\mathbb{F}_2^{n_{inner}}$ corresponding to the stabilizers of the code, the distinction criterion is whether $S$ contains the all-1 vector $\vec 1$. If $\vec 1 \in S$, the space $S^\perp/S$ representing logical operators is hyperbolic, and the parameters $n_{inner}$, $k_{inner}$, and the code distance must be even numbers. For hyperbolic codes, the binary vector space corresponding to the logical operators is isomorphic to a direct sum of hyperbolic planes. Here, we only consider hyperbolic codes. Choose a basis $\{\ell^{(1)}, \ell^{(2)}, \ldots, \ell^{(k_{inner})}\}$ of $S^\perp/S$ such that the dot products between the basis vectors satisfy
$$\ell^{(2a-1)} \cdot \ell^{(2b)} = \delta_{ab}, \qquad \ell^{(2a-1)} \cdot \ell^{(2b-1)} = \ell^{(2a)} \cdot \ell^{(2b)} = 0 \pmod 2. \qquad (5.1)$$
We call such a basis hyperbolic, and the Gram-Schmidt procedure can be used to find a hyperbolic basis. We define logical operators as

Note that this is different from the magic basis of Ref. [2] where a pair of logical qubits are swapped under the transversal Hadamard. We now investigate the action of the transversal $S$ gate. Since $SXS^\dagger = Y = -iZX$, unless $n_{inner}$ is a multiple of 4, the transversal $S$ is not logical. However, there is a simple way to get around this. Instead of applying $S$ on every qubit, we assign exponents $t_i = \pm 1$ to each qubit $i$, which depend on the code, and apply $\bigotimes_i S^{t_i}$. We choose $t_i$ such that

where it is implicit that the elements of $\mathbb{F}_2$ are promoted to usual integers by the rule that $\mathbb{F}_2 \ni 0 \mapsto 0 \in \mathbb{Z}$ and $\mathbb{F}_2 \ni 1 \mapsto 1 \in \mathbb{Z}$, and $\{b^{(j)}\}$ is a basis of the $\mathbb{F}_2$-vector space $S$. A solution $t_i$ to these conditions always exists, because the Gauss elimination for the system of equations over $\mathbb{Z}/4\mathbb{Z}$ never encounters division by an even number when applied to a full $\mathbb{F}_2$-rank matrix.
Once we have a valid $t_i$, then it follows that $\sum_i v_i t_i = 0 \bmod 4$ for any vector $v \in S$. Since any vector is a sum of basis vectors, which are orthogonal with one another, this follows from the following identity. For any integer vector $y$ [1,26]

Likewise, for any vector $\ell \in S^\perp$ and any $s \in S$, we have $\sum_i \ell_i t_i = \sum_i (\ell + s \bmod 2)_i t_i \bmod 4$. We now show that the action of $\bigotimes_i S^{t_i}$ on the logical state $|\tilde x_1, \ldots, \tilde x_{k_{inner}}\rangle$ is control-$Z$ on hyperbolic pairs of logical qubits:

where in the third line we used (5.4) and in the last line we used (5.1). Therefore, if we implement the control-$S$ gate over a hyperbolic code, then we implement a measurement routine for a product of $CZ$ operators. The control-$S$ can be implemented using an identity

where ${}^C U = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U$; in particular, ${}^C(e^{i\pi/4}) = (|0\rangle\langle 0| + e^{i\pi/4} |1\rangle\langle 1|) \otimes I$. Since a hyperbolic CSS code contains $\vec 1$ in the stabilizer group, we know $\sum_i t_i \vec 1_i = 0 \bmod 4$, and the control-phase factor will either cancel out or become $Z$ on the control. If $T$ gates in this measurement routine are noisy with independent $Z$ errors of probability $p$, then upon no violation of stabilizers of the hyperbolic code, the measurement routine puts $O(p^2)$ error into the measurement ancilla, and $O(p^d)$ error into the state under the measurement where $d$ is the code distance of the hyperbolic code.

Quadratic error reduction
The control-$Z$ action on the logical level can be used to implement control-control-$Z$, whenever the hyperbolic code is encoding one pair of logical qubits. The smallest hyperbolic code that encodes one pair of logical qubits is the 4-qubit code of code distance 2, with stabilizers $XXXX$ and $ZZZZ$. The choice of logical operators that conforms with our hyperbolic conditions is

The exponents $t_i$ for $S$ are thus

Using this choice of $t_i$, the phase factor in (5.7) cancels out. Every non-Clifford gate enters the circuit by (5.7), and hence any single error will be detected. Since the ancilla that controls $S$ inside the hyperbolic code can be contaminated by a pair of $T$ gates acting on the same qubit, there is little reason to consider a hyperbolic code of code distance higher than 2. When applied to $|+\rangle^{\otimes 3}$, the routine described here outputs one $CCZ$ state using 8 $T$-gates, with output error probability $28p^2 + O(p^3)$ where $p$ is the independent error rate of $T$ gates.
The overall circuit is very similar to the quadratic protocol of Jones [13], in which the same choice of logical operators is used, but control-$(TXT^\dagger)^{\otimes 4}$ is applied on the code, followed by syndrome measurement and then a $\pi/2$ rotation along the $x$-axis on the Bloch sphere. In contrast, we apply control-(T XT † X) 1 4 , and then syndrome measurement, without any further Clifford correction.

Quartic error reduction
For a higher order error suppression of $CCZ$ states, we use the hyperbolic codes to check the eigenvalue of the stabilizers of the $CCZ$ state $|CCZ\rangle = CCZ\,|+\rangle^{\otimes 3}$. The stabilizers are $(CZ)_{12} X_3$, $(CZ)_{13} X_2$, and $(CZ)_{23} X_1$. (These are obtained by conjugating $X_{1,2,3}$, the stabilizers of $|+\rangle^{\otimes 3}$, by the $CCZ$ gate.) As there are three stabilizers, we need three rounds of checks. By symmetry, it suffices to explain how to measure $(CZ)_{12}$ [27][28][29].) Take $k$ independent output $CCZ$ states from the quadratic protocol in the previous subsection, and separate a single qubit from each of the $CCZ$ states. On these separated qubits we act by ${}^C X$ with a common control. The remaining $2k$ qubits are then embedded into the hyperbolic code, with which ${}^C(CZ)$ will be applied on the logical qubits, using $2n$ $T$ gates with independent error probability $p$. It is important that the control qubit is common for all controlled gates. This way, the product of $k$ stabilizers on the $k$ $CCZ$ states is measured.
One has to run this check routine three times for each of the three stabilizers of $CCZ$ states. In total, the number of input $T$ gates is $8k + 6n_{inner}$ where $8k$ is from the protocol in the previous subsection, and $3 \cdot 2n_{inner}$ is inside the distance-4 hyperbolic inner code.
Upon no stabilizer violations of the inner code and outer code measurements, the protocol outputs $k$ $CCZ$-states. If the inner hyperbolic code does not have any error on $T$ gates while implementing ${}^C(CZ)$, then the output $CCZ$ states' error rate is quadratic in the input $CCZ$ states' error rate. This being quadratic is due to the fact that we have an outer code of code distance 2. (An outer code is one that specifies which input states to check. See [2] for detail.) Thus, the output error from this contribution is $\binom{k}{2}(28p^2)^2$ at the leading order. There could be a pair of errors in the $T$ gates inside the inner code that flips the eigenvalue measurement of $(CZ)X$. In order for this type of error to be an output error there must be an odd number of errors in the input $CCZ$ states. Hence, the contribution to the output error probability is $k \cdot 28p^2 \cdot 3np^2$ at leading order.
Finally, the inner code may have 4 errors leading to logical errors since the code distance is 4. An upper bound on this contribution to the output error probability is $3 \cdot 2^3 A_4 p^4$, where $A_4$ is the number of $Z$ logical operators of the inner code of weight 4. The factor of $2^3$ is because one $Z$ error on a qubit of the inner code can occur in one of two places, and half of all such configurations lead to an accepted output. This is likely an overestimate because a logical error from a check out of three checks can be detected by a later check. In case of the Reed-Muller codes, we see

Using [[16, 6, 4]], the output error probability has leading term at most $9744p^4$ or $\epsilon_{out} = 3.2 \times 10^3 p^4$ per output, and the input $T$ count is $n_T = 40$ per output $CCZ$. This particular protocol is worse in terms of input $T$ count than the protocol by a generalized triorthogonal code above, the protocol of [13], or a composition of quadratic protocols of $T$-to-$T$ [1] and $8T$-to-$1CCZ$ [15], but better in terms of space footprint ($< 25$ qubits). Using [[32, 20, 4]] we see $\epsilon_{out} \approx$ (7.7 ×

We have ignored the acceptance probability. Since the input $CCZ$ states can be prepared independently using only 8 $T$ gates, we may assume that the preparation is always successful. Termination of the protocol is due to nontrivial syndrome on the distance 4 code. Since there are $6n$ $T$ gates, the overall acceptance probability is at least $(1 - p)^{6n}$.
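The three leading-order contributions can be added up for the [[16, 6, 4]] inner code. The sketch below assumes $k = 3$ input $CCZ$ states (six logical qubits), $n = 16$, and $A_4 = 140$; the last value is an assumption of this sketch, taken as the number of weight-4 code words of $RM(2,4)$, and the function name is hypothetical.

```python
from math import comb

def quartic_protocol_costs(k, n_inner, A4):
    """Leading p^4 coefficient, coefficient per output, and T count per output CCZ."""
    two_input_errors = comb(k, 2) * 28 ** 2    # two faulty input CCZ states
    flipped_check = k * 28 * 3 * n_inner       # one faulty input and a flipped check
    inner_logical = 3 * 2 ** 3 * A4            # weight-4 logical error in the inner code
    total = two_input_errors + flipped_check + inner_logical
    t_count = 8 * k + 6 * n_inner
    return total, total / k, t_count / k

total_coeff, coeff_per_output, t_per_output = quartic_protocol_costs(k=3, n_inner=16, A4=140)
print(total_coeff, coeff_per_output, t_per_output)
```

With these inputs the sum reproduces the leading coefficient and the $T$ count per output quoted above.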
In the next section, we present another family that has even lower asymptotic input T count.
6 Clifford stabilizer measurements using normal weakly self-dual CSS codes

As an extension of Ref. [2], we can turn $m$ copies of any normal weakly self-dual CSS code (normal code) into a measurement routine of magic states of the form $U |+\rangle^{\otimes m}$ where $U$ belongs to the third level of the Clifford hierarchy. This is based on the observations that such states have Clifford stabilizers of the form $V_i = U X_i U^\dagger$, which can be measured by controlling the middle $X_i$, and that any normal code admits a transversal implementation of logical $V_i$. For clarity of presentation, we will explain protocols for distilling $CCZ$ states, and leave general cases to the readers. If $V_i$ involves $S$-gates, one has to choose appropriate exponents $t_i = \pm 1$ such that $\bigotimes_i S_i^{t_i}$ on physical qubits becomes a logical $S$-gate; see the previous section.
Recall that a normal code is a weakly self-dual CSS code, defined by a self-orthogonal binary vector space $S$ such that $\vec 1 \notin S$. In such a code the binary vector space $S^\perp/S$ corresponding to the logical operators has a basis such that any two distinct basis vectors have even overlap (orthogonal) but each of the basis vectors has odd weight. Associating each basis vector to a pair of $X$- and $Z$-logical operators, we obtain a code where the transversal Hadamard induces the product of all logical Hadamards.
Observe that in a normal code the transversal $X$ anti-commutes with every $Z$ logical operator, and hence is equal to, up to a phase factor, the product of all $X$ logical operators. In the standard sign choice of logical operators where every logical $X$ is the tensor product of Pauli $X$, the transversal $X$ is indeed equal to the product of all $X$ logical operators. Likewise, the transversal $Z$ is equal to the product of all $Z$ logical operators. Then, it follows that control-$Z$ across a pair of identical normal code blocks is equal to the product of control-$Z$ operators over the pairs of logical qubits. Therefore, given three copies, labeled $A, B, C$, of a normal code $[[n_{inner}, k_{inner}, d]]$, if we apply $\bigotimes_{i=1}^{n_{inner}} CZ_{A_i,B_i} X_{C_i}$, then the action on the code space is equal to $\prod_{j=1}^{k_{inner}} \widetilde{CZ}_{A_j,B_j} \tilde X_{C_j}$. Having a transversal operator that induces the action of the stabilizer $(CZ)X$ of the $CCZ$-state on the logical qubits, we will make a controlled version of this. We use the following identity:
$$(CCZ)_{123}\, ({}^{C_a}X_1)({}^{C_b}X_2)({}^{C_c}X_3)\, (CCZ)_{123} = {}^{C_a}(CZ_{23} X_1)\; {}^{C_b}(CZ_{13} X_2)\; {}^{C_c}(CZ_{12} X_3) \qquad (6.1)$$
which is the product of three stabilizers of the $CCZ$-state controlled by three independent ancillas. The transversality of the logical operator $(CZ)X$ implies that if we apply (6.1) transversally across a triple of normal codes, then the three ancillas will know the eigenvalue of the three stabilizers of $CCZ$, respectively. The non-Clifford gate $CCZ$ in (6.1) can be injected using 4 $T$-gates [12,23].
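Identity (6.1) can be checked on computational basis states, since every gate in it maps a basis state to a basis state up to a sign. The sketch below tracks only that sign and the three target bits, because the control bits are unchanged by both sides. The function names and the bit ordering (control $a$ paired with target 1, and so on) are hypothetical conventions of this sketch.

```python
from itertools import product

def lhs(ctrl, q):
    """CCZ, then the three controlled-X gates, then CCZ, on a basis state."""
    sign = -1 if all(q) else 1
    q = tuple(bit ^ c for bit, c in zip(q, ctrl))   # controlled-X on each target
    if all(q):
        sign = -sign
    return sign, q

def rhs(ctrl, q):
    """Product of the three controlled-(CZ X) stabilizers on a basis state."""
    sign, q = 1, list(q)
    for t in (2, 1, 0):                             # rightmost factor of (6.1) acts first
        if ctrl[t]:
            u, v = (i for i in range(3) if i != t)
            q[t] ^= 1                               # X on target t
            if q[u] and q[v]:                       # CZ on the other two targets
                sign = -sign
    return sign, tuple(q)

identity_holds = all(lhs(ctrl, q) == rhs(ctrl, q)
                     for ctrl in product((0, 1), repeat=3)
                     for q in product((0, 1), repeat=3))
print(identity_holds)
```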
This method of measuring stabilizers of the $CCZ$ state, compared to that in the previous section using the hyperbolic codes, has the advantage that one does not have to repeat three times for each of the three stabilizers, but has the disadvantage that one needs roughly a factor of three space overhead. (The space overhead comparison is not completely fair, because a code cannot be simultaneously normal and hyperbolic. However, in the large code length limit this factor of 3 in the space overhead is appropriate.) In the large code length limit, this method also has an advantage in terms of $T$-count. Using the hyperbolic codes, even if the encoding rate is near one, we need 12 $T$ gates per $CCZ$-state under the test. On the other hand, using (6.1) on a normal code of encoding rate near one, we need 8 $T$ gates per $CCZ$-state under the test.