Measurement sequences for magic state distillation

We describe specific sequences of multiqubit Pauli measurements for distilling T and CCZ magic states, by exploiting the error correcting capability of normal weakly self-dual CSS codes and triorthogonal codes.

Magic states are ancilla states that enable universal quantum computation using only Clifford gates. Important examples are the one-qubit state $|T\rangle = |0\rangle + e^{i\pi/4}|1\rangle$ for general SU(2) rotations and the three-qubit state $|CCZ\rangle = \sum_{a,b,c=0,1} (-1)^{abc}|a,b,c\rangle$ for quantum coherent arithmetic. Here we present specific measurement sequences to distill higher fidelity T and CCZ states from noisy T states.
We present three protocols, each of which is a tailored implementation of a known abstract distillation scheme. The two T distillation protocols below are based on the idea of measuring the Clifford (not Pauli) stabilizer of T states [1][2][3][4]. The CCZ distillation protocol below uses a generalization [5] of triorthogonal codes [6], and happens to be closely related to protocols in [7,8].
Earlier analyses assumed perfect Clifford operations and a Clifford-twirled noise model on non-Clifford operations, and used a certain CSS code only to detect Z errors. It was known that these assumptions can be relaxed [3,9,10], but only recently [11][12][13] have there been serious efforts to use the full or partial error-correcting potential of the outermost¹ code, which is a normal weakly self-dual CSS code [4] or a triorthogonal code [6].
Chamberland et al. [12,13] use the two smallest instances of color codes for both usual error correction and Clifford stabilizer measurements. To accommodate "flag" ancillas for fault-tolerant syndrome measurements, these implementations require a degree of connectivity that may be quite nontrivial to realize on a two-dimensional grid of qubits. In addition, their implementation gives a distilled magic state encoded in a color code patch. We prefer the surface code to the color code because its stabilizers have smaller weight.
Litinski [11] focuses on a surface code architecture and shrinks the surface code patch size wherever possible to reduce the overhead. This scheme uses the outer code to detect Z errors only. Nonrectangular, nonconvex patches are used extensively to connect rectangular (nonsquare) patches, but the error rate estimates for those nonrectangular patches are heuristic extrapolations of Monte Carlo simulation results on square patches. If we use the outermost code of distillation as a full quantum error correcting code, it is very important to understand the error channels of the constituent qubits; with outermost codes of distance 3 or less, as in [11], any correlated errors would invalidate an error analysis based on independent noise models. In particular, the nature of the logical error channel on the irregularly shaped surface code patches used to connect rectangular logical patches deserves further study with regard to the protocols in [11]. Thus, we are motivated to design protocols that require only limited connectivity, a limited set of low-level operations, and minimal assumptions about the error model. Our design continues the theme of using the outermost code of a distillation protocol as a full quantum error correcting code. In contrast to [11], we use the ability of the outer code to correct both X and Z errors, paying attention to correlated errors; in contrast to [12,13], we use a concatenation of an inner and an outer code instead of a single color code, potentially enabling us to exploit the improved error correcting properties of the surface code or another inner code, while respecting limited connectivity. Our protocols produce a distilled magic state on a standalone "output" qubit (or three for the CCZ state), which may be a surface code patch. We adhere to a scenario with a strict limitation on the elementary operations: horizontal ZZ and vertical XX measurements across nearest-neighbor qubits on a square grid, along with single-qubit X and Z measurements. An exception is made for the output qubits, which may be larger surface code patches. No Hadamard gate is used. We imagine that poor quality magic states are handed over from a previous round of distillation. This would be a minimal requirement in a surface code architecture, and is certainly possible using lattice surgery [14]. Our protocols apply straightforwardly to an architecture with Majorana wires [15].
¹ By an outer code we mean a code whose constituent qubits are logical qubits of some inner code. We often consider the inner code to be a surface code.

I. SETTING
As usual,

A. Elementary operations
Every operation in the protocol that we will describe is a measurement of a multiqubit Pauli operator $X^{\otimes m}$ for some $m$, a single-qubit measurement in the X or Z basis, or a single-qubit unitary X, Z, or T. The exception is the treatment of the "output" qubits, for which we need an embedding (a logical identity operation) of a qubit into a better quality qubit. If each qubit in our protocol is a surface code patch, this embedding amounts to growing a patch. This happens for only one qubit in the entire protocol. The single-qubit unitaries X and Z should always be implemented by a Pauli frame update.
We can decompose our operations into single-qubit X and Z measurements together with horizontal ZZ and vertical XX measurements across nearest neighbors. The T gate is injected via ZZ and X measurements (Fig. 1). Since ZZ measurements are available only on horizontal pairs, T states need to be transported to the left of each data qubit; this is done by teleportation.
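As a sanity check of the injection gadget, here is a minimal NumPy state-vector sketch under our own assumptions: operations are noiseless, the data qubit and a $|T\rangle$ ancilla are measured first with $Z_{\rm data}Z_{\rm anc}$ and then with $X$ on the ancilla, and the $S$ correction is applied directly as a unitary (the protocol instead consumes an S state, Fig. 1(c)). The helper names `projector` and `inject_T` are hypothetical. A $-1$ outcome of the $X$ measurement is absorbed by a Pauli-frame $Z$; a $-1$ outcome of the $ZZ$ measurement leaves $T^\dagger|\psi\rangle$, which $S$ repairs since $ST^\dagger = T$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])

def projector(pauli, sign):
    """Projector onto the sign (+1 or -1) eigenspace of a Pauli operator."""
    return (np.eye(pauli.shape[0]) + sign * pauli) / 2

def inject_T(psi, m, s):
    """Noiseless T injection into data state psi using a |T> ancilla.

    Qubit order is (data, ancilla). m is the Z_data Z_anc outcome and
    s the X_anc outcome, both +1 or -1. Returns the corrected data state.
    """
    t_state = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)
    state = np.kron(psi, t_state)
    state = projector(np.kron(Z, Z), m) @ state      # ZZ measurement
    state = projector(np.kron(I2, X), s) @ state     # X measurement on the ancilla
    anc = np.array([1, s]) / np.sqrt(2)              # ancilla left in |+> or |->
    data = state.reshape(2, 2) @ anc.conj()          # contract out the ancilla
    if s == -1:
        data = Z @ data                              # Pauli frame correction
    if m == -1:
        data = S @ data                              # turns T^dagger into T
    return data / np.linalg.norm(data)

# Usage: every outcome branch reproduces T|psi> up to a global phase.
rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
for m in (+1, -1):
    for s in (+1, -1):
        print(m, s, abs(np.vdot(T @ psi, inject_T(psi, m, s))))
```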
Multiqubit X measurements are performed using cat states. Ref. [16] gives a procedure that uses $2n-1$ qubits, single-qubit X measurements, and nearest-neighbor ZZ measurements to produce an n-qubit cat state. Using a cat state $|0\rangle^{\otimes m} + |1\rangle^{\otimes m}$, we can measure $X^{\otimes m}$ by taking the parity of the vertical XX measurements between each cat state qubit and its data qubit. These measurements should be followed by single-qubit Z measurements on the cat state qubits and subsequent Pauli corrections on the data qubits, which return the data qubits to the correct postmeasurement state.
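It may be instructive to spell out the evolution of states during the measurement of $X^{\otimes m}$. For binary vectors $a, b$ of appropriate dimension, let $X^a$ and $Z^b$ be the tensor products of X and Z, respectively, that act nontrivially only on the supports of $a$ and $b$. Let $x$ be the binary vector of outcomes of the vertical XX measurements (the $i$-th outcome is $(-1)^{x_i}$) and $z$ the binary vector of outcomes of the Z measurements on the cat state qubits. Writing $\bar a = a + (1,\dots,1)$ and $|x| = \sum_i x_i$, we have, up to normalization,
$$
\begin{aligned}
(|0\rangle^{\otimes m} + |1\rangle^{\otimes m}) \otimes |\psi\rangle
&\xrightarrow{XX} \sum_{a} (-1)^{x\cdot a}\,(|a\rangle + |\bar a\rangle) \otimes X^{a}|\psi\rangle \\
&\xrightarrow{Z} (-1)^{x\cdot z}\, X^{z}\bigl(1 + (-1)^{|x|} X^{\otimes m}\bigr)|\psi\rangle \\
&\xrightarrow{X^{z}} (-1)^{x\cdot z}\bigl(1 + (-1)^{|x|} X^{\otimes m}\bigr)|\psi\rangle,
\end{aligned}
\tag{1}
$$
where in the first line the left tensor factor is the cat state, and the last step is the Pauli correction $X^z$, which leaves only a global phase $(-1)^{x\cdot z}$. The parity $(-1)^{|x|}$ is the outcome of the $X^{\otimes m}$ measurement.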
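The evolution of the data state during the cat-state measurement of $X^{\otimes m}$ can be checked by brute force. The NumPy sketch below (the helper names `op_on` and `measure_X_string_via_cat` are ours; operations are noiseless and projectors unnormalized) places the cat qubits first and the data qubits second, applies the $m$ vertical $XX$ projectors with outcomes $(-1)^{x_i}$, projects the cat qubits onto $|z\rangle$, and applies the correction $X^z$. For every $x$ and $z$ the result should equal $(-1)^{x\cdot z}(1 + (-1)^{|x|}X^{\otimes m})|\psi\rangle$, i.e., the data are projected onto the $X^{\otimes m}$ eigenspace whose eigenvalue is the parity of the $XX$ outcomes.

```python
import itertools
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def op_on(ops, n):
    """Tensor product on n qubits; ops maps qubit index -> 2x2 matrix (identity elsewhere)."""
    out = np.array([[1]], dtype=complex)
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out

def measure_X_string_via_cat(psi, x, z):
    """Cat qubits are 0..m-1, data qubits m..2m-1 (qubit 0 is the most significant bit).

    x[i]: the XX measurement on (cat i, data i) gave (-1)**x[i]
    z[i]: the Z measurement on cat qubit i gave (-1)**z[i]
    Returns the unnormalized data state after the Pauli correction X^z.
    """
    m = len(x)
    n = 2 * m
    cat = np.zeros(2**m, dtype=complex)
    cat[0] = cat[-1] = 1.0                      # |0...0> + |1...1>, unnormalized
    state = np.kron(cat, psi)
    for i in range(m):                           # vertical XX measurements
        xx = op_on({i: X, m + i: X}, n)
        state = (np.eye(2**n) + (-1) ** x[i] * xx) @ state
    z_index = int("".join(map(str, z)), 2)       # project the cat qubits onto |z>
    data = state.reshape(2**m, 2**m)[z_index]
    correction = op_on({i: X for i in range(m) if z[i]}, m)
    return correction @ data

# Usage: compare one outcome pattern with the closed form.
rng = np.random.default_rng(1)
m = 3
psi = rng.normal(size=2**m) + 1j * rng.normal(size=2**m)
x, z = (1, 0, 1), (0, 1, 1)
x_all = op_on({i: X for i in range(m)}, m)
expected = (-1) ** int(np.dot(x, z)) * (psi + (-1) ** sum(x) * (x_all @ psi))
print(np.allclose(measure_X_string_via_cat(psi, x, z), expected))  # True
```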

B. Layout
The overall layout of our protocol is depicted in Fig. 2. For an [[n, k, d]] block code, there are 8n qubits that form a 4-by-2n rectangle. The first and second rows are occupied by T and S states that are to be consumed. Every other qubit of the third row is a data qubit of the block code. The fourth row is reserved for cat state preparation. If T rotations are available in place, then of course we do not need the upper two rows. Near the end of the array of qubits there will be k + 2 qubits of better quality, of which k are "output qubits" and two are ancillas. If we use surface code patches for qubits, this means that the output qubits have larger size. The precise size of these patches should be determined case by case. The output qubits interact with the encoded qubits of the block code only after all S and T states are consumed, and will host the distilled magic states. Therefore, we may use the space occupied by the first and second rows of the rectangle for the output qubits. We can also use the space of the leftmost column of the 4-by-2n rectangle for the better quality qubits.
Fig. 2 (caption): The fourth row is used to prepare cat states. On the left of these rows, there are k + 2 qubits of better quality, which need to be larger if a surface code is used to encode every qubit in this figure. Since these better quality (output) qubits interact with the data qubits only after all T states are consumed, we can use the space that was occupied by S and T states for the output qubits. The case k = 3 is displayed. The big patch with an arrow illustrates that the size of that patch changes dynamically.
For quadratic error reduction, the surface code patches of the better quality qubits would be roughly twice as large as those of the other qubits. Precise sizes should be determined by the desired quality of the output qubits.
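As a rough illustration of where a factor of two can come from, assume (our assumption, not part of the protocol) that a distance-$d$ patch has a logical error rate per logical operation of the common form $p_L(d) \approx A\,(p/p_{\rm th})^{(d+1)/2}$ for some constants $A$ and $p_{\rm th} > p$. If quadratic distillation leaves the output with an error of order $p_L(d)^2$, an output patch of distance $d'$ should satisfy
$$
A\left(\frac{p}{p_{\rm th}}\right)^{(d'+1)/2} \lesssim A^2\left(\frac{p}{p_{\rm th}}\right)^{d+1}
\quad\Longleftrightarrow\quad
d' \gtrsim 2d + 1 - \frac{2\ln A}{\ln(p_{\rm th}/p)},
$$
so under this model $d' \approx 2d$ up to an additive constant of order one.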

C. Error model
Every measurement outcome is flipped with probability $p \in [0, 1)$. Every qubit suffers from independent noise after any operation, including the identity. The single-qubit error is modeled by the quantum channel
$$E(\rho) = (1-p)\,\rho + p\,D(\rho), \tag{2}$$
where $D$ is another quantum channel. Note that we use the same $p$ for both the measurement outcome error and the single-qubit noise. Any T states and any T or $T^\dagger$ gates are immediately followed by
$$E_t(\rho) = (1-p_t)\,\rho + p_t\,D(\rho) \tag{3}$$
but not by $E$. Note that we use the same $D$ in Eqs. (2) and (3) for simplicity. We treat output qubits as if they were noise free; using a surface code family, it suffices to choose a code distance that matches the quality of the distilled magic states.
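To make the error model concrete, the sketch below applies a channel of the mixing form $\rho \mapsto (1-q)\rho + q\,D(\rho)$, with $q = p$ for ordinary operations and $q = p_t$ for T states and T gates, to an ideal $|T\rangle$. The protocol leaves $D$ general; choosing full depolarization $D(\rho) = \mathrm{tr}(\rho)\,I/2$ here is purely our assumption for illustration, and the function names are hypothetical. For that choice the fidelity of the noisy T state is $1 - p_t/2$.

```python
import numpy as np

def noisy(rho, q, D):
    """Mixing-form error channel: rho -> (1 - q) rho + q D(rho)."""
    return (1 - q) * rho + q * D(rho)

def depolarize(rho):
    """Hypothetical choice of D for illustration: full depolarization to tr(rho) I/2."""
    return np.trace(rho) * np.eye(2) / 2

t = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)   # normalized |T>
rho_T = np.outer(t, t.conj())

p_t = 1e-4
rho_noisy = noisy(rho_T, p_t, depolarize)                 # a noisy input T state
fidelity = np.real(t.conj() @ rho_noisy @ t)
print(fidelity)   # equals 1 - p_t/2 for this choice of D, up to rounding
```

Any other completely positive trace-preserving map can be passed as `D`; the mixing form keeps the channel trace preserving.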
One may wonder how the independent noise assumption can be fulfilled after measurements of $X^{\otimes m}$, a multiqubit operator. A naive preparation of the cat states introduces correlated errors, and using such a cat state for our measurement would invalidate the independent noise assumption. Hence, we need a more careful protocol that prepares cat states differing from ideal ones only by independent noise on the constituent qubits. The protocol in [16] achieves this goal.

II. PROTOCOLS
In the protocol specification below, we put tildes on states and operators to emphasize that they are noisy and are not as good as the output qubits.
A. State teleportation to a large surface code patch
In our protocol, it is necessary to teleport a magic state that is encoded in a block code to an output qubit. This is performed by preparing the output qubit in the state with $Z_{\rm out} = +1$ and measuring $X_{\rm out}\bar X$ and $\bar Z$, where $\bar X, \bar Z$ are logical operators of the block code. Using surface code patches, it is straightforward to initialize the output qubit and to measure $\bar Z$ of the code, but the measurement of $X_{\rm out}\bar X$ is unusual since the data qubits of the block code are smaller patches. This problem is solved by a cat state preparation as follows. Let a and b denote the two big patches in the bottom row of Fig. 2; patch b is the one with an arrow.
1. Prepare $|{+}{+}\rangle_{ab}$ and measure $Z_aZ_b$, with a Pauli correction, to obtain $|00\rangle_{ab} + |11\rangle_{ab}$.
2. Shrink b to match the size of the patches in the fourth row of the 4-by-(2n − 1) rectangle.
3. Following the prescription of [16], measure ZZ between nearest neighbors in the fourth row a sufficient number of times, and discard every other qubit.
According to [16], fault-tolerant cat state preparation on a one-dimensional array of qubits needs only one ZZ measurement for the leftmost pair. Our procedure exploits this construction, and since the leftmost pair measurement is performed on big patches, the output qubits will be protected as desired. Once we have a cat state, we follow the procedure in Eq. (1) to complete the $X_{\rm out}\bar X$ measurement.
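At the logical level, the teleportation onto the output qubit is ordinary one-bit teleportation. In the sketch below (NumPy, noiseless; the helper name `teleport_to_output` is ours) the block-code logical qubit is replaced by a single physical qubit, so $\bar X$ and $\bar Z$ become $X$ and $Z$. The output qubit starts in $|0\rangle$; we measure $X_{\rm out}\bar X$ with outcome $s$, then $\bar Z$ with outcome $t$, and apply $X$ if $t=-1$ followed by $Z$ if $s=-1$. These correction rules are our own derivation for this stand-in model.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

def teleport_to_output(psi, s, t):
    """One-bit teleportation of psi (logical qubit L) onto output qubit o.

    Qubit order is (L, o); o starts in |0> (Z_out = +1). s is the outcome of
    X_L X_o and t the outcome of Z_L, both +1 or -1.
    """
    state = np.kron(psi, np.array([1, 0], dtype=complex))
    state = (np.eye(4) + s * np.kron(X, X)) @ state / 2   # measure X_L X_o
    out = state.reshape(2, 2)[0 if t == +1 else 1]         # measure Z_L, keep o
    if t == -1:
        out = X @ out
    if s == -1:
        out = Z @ out
    return out / np.linalg.norm(out)

rng = np.random.default_rng(3)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
for s in (+1, -1):
    for t in (+1, -1):
        print(s, t, abs(np.vdot(psi, teleport_to_output(psi, s, t))))
```

The order matters: $\bar Z$ anticommutes with $X_{\rm out}\bar X$, so it must be measured after the joint measurement.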
In all protocols below, only one output qubit interacts with the block code at a time. So, when we have multiple output qubits, we shift (teleport) the output qubit to the right after interacting.

1. Initialize six data qubits in $|0\rangle^{\otimes 6}$. Bring $|T\rangle_{t1}|T\rangle_{t2}$ to the left of the array of data qubits, where the output qubits are yet to be set up.
2. Measure XXXXII and IIXXXX on the six data qubits. Apply Zs such that the resulting state is the logical state with $\bar Z_1 \approx \bar Z_2 \approx +1$.
3. Teleport the states in qubits t1, t2 into the logical qubits by measuring $X_{t1}\bar X_1$ and $X_{t2}\bar X_2$ and then measuring $Z_{t1}$ and $Z_{t2}$, followed by appropriate Pauli Z corrections on the logical qubits.
8. Measure XXXXII and IIXXXX on the six data qubits. Postselect on all +1 outcomes.
Step 7 implements half of a teleportation protocol.
Step 8 checks the X-stabilizers of the code. Finally, Step 9 measures the Z-stabilizer as well as the Z-logical operator to complete the teleportation of two logical qubits.
The measurement depth is counted as follows. There are $m_1 = 3$ rounds of single-qubit measurements on the data qubits in Steps 1, 3, 9. There are $m_2 = 2+2+2+2 = 8$ rounds of multiqubit X-measurements on the data qubits in Steps 2, 3, 5, 8. There are 2 rounds of T-gates on the data qubits in Steps 4, 6, which involve input T and S states. An injection of a T gate that requires an S correction involves 6 one- and two-qubit measurements (Fig. 1(c)); T states can be transported right next to the data qubits, and S states right above the T states, while other steps are being executed. So there are $m_t = 12$ rounds of one- or two-qubit measurements. There are $m_{\rm out} = 4$ rounds of joint measurements in Step 3 between the output qubits and data qubits. We may neglect the initialization of the output qubits since it can be done in parallel with earlier measurements on the data qubits. Overall, we have $m_1 + m_t = 15$ rounds of one- and two-qubit measurements, $m_2 = 8$ rounds of multiqubit X-measurements on the data qubits, and $m_{\rm out} = 4$ rounds of X-measurements that involve the output qubits.
Let us be more specific about the measurement count using surface code patches, treating one syndrome measurement of a surface code patch as the unit of time. Assume that $d$ rounds of syndrome measurements are needed for the surface code patches of the data qubits and $d'$ rounds for the output qubits per logical operation; $d$ or $d'$ is the code distance of a patch. To prepare a cat state that is "2-fault-tolerant" [16], we need 8 rounds of logical operations across nearest-neighbor patches. This time is long enough that we can ignore all explicit one- and two-qubit measurements in Steps 1, 4, 6, as they can run in parallel with the cat state preparation for the next step.
Step 9 is also negligible. The last round of the cat state preparation of [16] consists of single-qubit X measurements on qubits that do not partake in the final cat state, which can run in parallel with the XX measurements between the cat state qubits and data qubits. Including the single-qubit Z measurements in Eq. (1), we conclude that one multiqubit X-measurement takes time $9d$, or $7d + 2d'$ if it involves the output qubits. So the total time is $m_2 \cdot 9d + m_{\rm out}(7d + 2d')$.
The number of physical qubits used is $4 \cdot 12 \cdot d^2 + 4d'^2$, neglecting ancillas for syndrome measurements of the surface code.

C. Third order T distillation by [[7, 1, 3]]
The following is our measurement sequence based on the principle of measuring $TXT^\dagger$, which has eigenvalue +1 on $|T\rangle$ [1,2,4]. An implementation circuit of this idea was also presented in [3], but without consideration of a separate output qubit or limited connectivity. The stabilizers of the Steane code [[7,1,3]] are XIXIXIX, IXXIIXX, IIIXXXX, ZIZIZIZ, IZZIIZZ, and IIIZZZZ. The logical operators are $\bar X = X^{\otimes 7}$ and $\bar Z = Z^{\otimes 7}$.
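The code-level facts used in this protocol are easy to verify mechanically. The checks below are our own and use only the supports listed above: the Steane X- and Z-stabilizers commute, $\bar X = X^{\otimes 7}$ has weight $7 \equiv -1 \bmod 8$, the three weight-3 operators IXIXIXI, XIIXXII, XXXIIII used in Step 9 below are $\bar X$ multiplied by each X-stabilizer, and $TXT^\dagger$ fixes $|T\rangle$.

```python
import numpy as np

# Supports of the Steane stabilizers listed above; X- and Z-type share the same pattern.
STABS = ["1010101", "0110011", "0001111"]        # XIXIXIX, IXXIIXX, IIIXXXX
LOGICAL = "1111111"                              # X-bar = X^{⊗7} (and Z-bar = Z^{⊗7})
STEP9_REPS = ["0101010", "1001100", "1110000"]   # IXIXIXI, XIIXXII, XXXIIII

def overlap(a, b):
    """Number of qubits on which both supports are nontrivial."""
    return sum(int(u) & int(v) for u, v in zip(a, b))

def times(a, b):
    """Support of the product of two X-type operators (bitwise XOR)."""
    return "".join(str(int(u) ^ int(v)) for u, v in zip(a, b))

# X-type and Z-type operators commute iff their supports overlap on an even number of qubits.
assert all(overlap(a, b) % 2 == 0 for a in STABS for b in STABS)
assert all(overlap(s, LOGICAL) % 2 == 0 for s in STABS)
# The logical operator has weight 7, i.e. -1 mod 8.
assert LOGICAL.count("1") % 8 == 7
# Each operator measured in Step 9 is X-bar times one X-stabilizer, hence an equivalent logical X.
assert [times(LOGICAL, s) for s in STABS] == STEP9_REPS

# T X T^dagger has |T> = (|0> + e^{i pi/4}|1>)/sqrt(2) as a +1 eigenvector.
T = np.diag([1, np.exp(1j * np.pi / 4)])
X = np.array([[0, 1], [1, 0]])
t = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)
assert np.allclose(T @ X @ T.conj().T @ t, t)
print("all checks passed")
```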

1. Initialize seven data qubits in $|0\rangle^{\otimes 7}$. Bring $|T\rangle_t$ to the left of the data qubit row, where an output qubit is yet to be set up.
2. Measure the three X-stabilizers XIXIXIX, IXXIIXX, IIIXXXX on the seven data qubits. Apply Zs such that the resulting state is the logical state with $\bar Z \approx +1$.
3. Teleport $|T\rangle_t$ into the code by measuring $X_t\bar X$ and then $Z_t$.
8. Initialize an output qubit in the state $|0\rangle_o$.
9. Measure each of the three equivalent X-logical operators of the code, multiplied by the X operator on the output qubit: $X_o(\mathrm{IXIXIXI})$, $X_o(\mathrm{XIIXXII})$, $X_o(\mathrm{XXXIIII})$. Let $x_{1,2,3} = \pm 1$ be the outcomes. Postselect on consistent results ($x_1 = x_2 = x_3$).
10. Measure the three X-stabilizers. Postselect on all +1 outcomes.
12. Accept the output qubit if all the postselections have succeeded. The output qubit then holds a distilled T state.
Steps 1, 2, 3 prepare an encoded T state with error rate O(p).
Step 4 checks the X-stabilizers. Steps 5, 6, 7 measure the Clifford stabilizer $TXT^\dagger$ of the encoded T state. As in the previous protocol, this Clifford stabilizer is induced by $(T^\dagger X T)^{\otimes 7}$, since the logical operator $\bar X$ has weight $-1 \bmod 8$. Note that Step 4 has no analog in the previous protocol with quadratic error reduction.
Step 4 is needed here because, without it, a two-error process, in which one Z error occurs in Steps 1, 2, 3 and another in Step 7, could cancel out and let an incorrect T state pass the Clifford stabilizer check. Steps 8, 9, 10, 11 combine the teleportation of the encoded T state with Pauli stabilizer checks. Notably, Step 9 uses three representatives of the logical operator in order to detect second-order processes that could result in an incorrect teleportation.
The measurement depth is counted as follows. As in the time analysis of the previous protocol, we count only the multiqubit measurements. We need "3-fault-tolerant" cat states, whose preparation requires 10 rounds of one- and two-qubit measurements. There are 3 + 3 + 2 + 3 = 11 measurements in Steps 2, 4, 6, 10 that do not involve the output qubits, and 1 + 3 = 4 measurements in Steps 3, 9 that do. Using surface code patches of distance $d$ (data) and $d'$ (output), the protocol takes $11 \cdot 11d + 4 \cdot (9d + 2d')$ rounds of syndrome measurements.
The number of qubits used is $56d^2 + 3d'^2$, neglecting ancillas of the surface code.
2. Measure the X-stabilizer XXXXXXXX as well as the three X-logical operators $\bar X_{1,2,3}$. Upon −1 outcomes, apply Pauli corrections by ZIIIIIII and the Z-logical operators $\bar Z_{1,2,3}$ such that the resulting state is the logical state $|{+}{+}{+}\rangle$.
3. Measure IXXIXIIX, the product of the three X-logical operators. Postselect on the +1 outcome.
6. Measure the X-stabilizer X ⊗8 on the data qubits. Postselect on the +1 outcome.
7. Destructively measure all data qubits in the Z basis, with outcomes $z_1, \ldots, z_8$. Postselect on all four conditions $z_1z_2z_3z_4 = +1$, $z_2z_4z_6z_8 = +1$, $z_3z_4z_7z_8 = +1$, and $z_5z_6z_7z_8 = +1$. Apply $X_{o1}$ if
8. Accept the output qubits if all the postselections have succeeded. The output qubits are then in a distilled state $|CCZ\rangle$.
In Step 4, we choose a particular product of T and $T^\dagger$. This choice ensures that the underlying (generalized) triorthogonal code satisfies "level-3" orthogonality [18], which removes the need for the Clifford corrections of [5].
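The Pauli supports appearing in Steps 2, 3, 6, and 7 of the CCZ protocol are mutually consistent, which the short check below confirms (plain Python; the sets are copied from the text and the function name `commute` is ours). The Z-type checks are read off from the four parity conditions of Step 7. The individual X-logical operators $\bar X_{1,2,3}$ are not spelled out in this section, so only their product IXXIXIIX is tested.

```python
# Supports (qubits numbered 1..8) copied from Steps 2, 3, 6 and 7 of the CCZ protocol.
X_STAB = set(range(1, 9))                                            # XXXXXXXX
X_LOGICAL_PRODUCT = {2, 3, 5, 8}                                     # IXXIXIIX
Z_CHECKS = [{1, 2, 3, 4}, {2, 4, 6, 8}, {3, 4, 7, 8}, {5, 6, 7, 8}]  # Step 7 parities
Z_CORRECTION = {1}                                                   # ZIIIIIII from Step 2

def commute(x_support, z_support):
    """An X-type and a Z-type Pauli operator commute iff their supports overlap evenly."""
    return len(x_support & z_support) % 2 == 0

# The Z-type checks of Step 7 commute with both X-type operators measured earlier.
assert all(commute(X_STAB, zc) and commute(X_LOGICAL_PRODUCT, zc) for zc in Z_CHECKS)
# ZIIIIIII flips the X^{⊗8} outcome but leaves the Step 3 check IXXIXIIX untouched.
assert not commute(X_STAB, Z_CORRECTION)
assert commute(X_LOGICAL_PRODUCT, Z_CORRECTION)
print("supports are consistent")
```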
The measurement depth is counted as in the previous protocols; we count only multiqubit measurements. We need "2-fault-tolerant" cat states. There are 4 + 1 + 1 = 6 measurements in Steps 2, 3, 6 that do not involve the output qubits, and 6 measurements in Step 5 that do. Using surface code patches of distance $d$ (data) and $d'$ (output), the duration of the protocol is $6 \cdot 9d + 6 \cdot (7d + 2d')$.
The number of qubits used is $4 \cdot 16d^2 + 5d'^2$, neglecting ancillas of the surface code.
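The time and space counts of the three protocols follow one pattern, which the sketch below collects (Python; the class and field names are ours). Reading off the per-measurement costs stated in the text, a cat state needing $r$ rounds of one- and two-qubit measurements ($r = 8$ for 2-fault-tolerant and $r = 10$ for 3-fault-tolerant cat states) gives $(r+1)d$ per multiqubit measurement on the data qubits and $(r-1)d + 2d'$ per measurement touching the output qubits. This parametrization by $r$ is our interpolation of the stated values $9d$, $7d + 2d'$, $11d$, and $9d + 2d'$, not a formula given with the protocols. Qubit counts are $4 \cdot 2n\,d^2 + (k+2)d'^2$, ignoring syndrome ancillas.

```python
from dataclasses import dataclass

@dataclass
class Protocol:
    name: str
    n: int           # data qubits of the block code
    k: int           # output qubits
    m_data: int      # multiqubit X-measurements not touching output qubits
    m_out: int       # multiqubit X-measurements touching output qubits
    cat_rounds: int  # rounds of 1-/2-qubit measurements per cat state

    def time(self, d, d_out):
        """Syndrome rounds: (r+1)d per data measurement, (r-1)d + 2d' per output one."""
        r = self.cat_rounds
        return self.m_data * (r + 1) * d + self.m_out * ((r - 1) * d + 2 * d_out)

    def qubits(self, d, d_out):
        """A 4-by-2n array of distance-d patches plus k+2 distance-d' patches."""
        return 4 * 2 * self.n * d**2 + (self.k + 2) * d_out**2

PROTOCOLS = [
    Protocol("T-to-T, [[6,2,2]]", n=6, k=2, m_data=8, m_out=4, cat_rounds=8),
    Protocol("T-to-T, [[7,1,3]]", n=7, k=1, m_data=11, m_out=4, cat_rounds=10),
    Protocol("T-to-CCZ", n=8, k=3, m_data=6, m_out=6, cat_rounds=8),
]

for proto in PROTOCOLS:
    print(proto.name, proto.time(d=10, d_out=20), proto.qubits(d=10, d_out=20))
# T-to-T, [[6,2,2]] 1160 6400
# T-to-T, [[7,1,3]] 1730 6800
# T-to-CCZ 1200 8400
```

Symbolically, the durations are $100d + 8d'$, $157d + 8d'$, and $96d + 12d'$ respectively.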

III. FAULT-TOLERANCE ANALYSIS
We simulated the complete protocols using density matrices, which is feasible since they involve at most 11 qubits. We numerically examined $p_{\rm out}$ as a function of $p, p_t \in (10^{-6}, 10^{-4})$ for a given D, and fitted it to a polynomial formula $p_{\rm out} = a p_t^2 + b p_t p + c p^2$ or $p_{\rm out} = a p_t^3 + b p_t^2 p + c p_t p^2 + d p^3$, where a, b, c, d are fitting parameters. Table I shows these coefficients rounded to integers.
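The coefficient fit described above is an ordinary linear least-squares problem in $a, b, c$ (or $a, b, c, d$). The sketch below (NumPy; `fit_quadratic` and the synthetic coefficients are hypothetical and are not the values in Table I) fits the quadratic form, rescaling the rates by $10^{-4}$ so that the design matrix is well conditioned. The cubic form works the same way with four monomial columns.

```python
import numpy as np

def fit_quadratic(p_t, p, p_out, scale=1e-4):
    """Least-squares fit of p_out = a p_t^2 + b p_t p + c p^2; returns array([a, b, c]).

    Rates are divided by `scale` first so the design matrix entries are O(1);
    the coefficients are unchanged by this rescaling.
    """
    u, v = p_t / scale, p / scale
    design = np.column_stack([u**2, u * v, v**2])
    coeffs, *_ = np.linalg.lstsq(design, p_out / scale**2, rcond=None)
    return coeffs

# Synthetic, noise-free data from made-up coefficients (NOT the values of Table I).
rng = np.random.default_rng(2)
p_t = 10 ** rng.uniform(-6, -4, size=50)
p = 10 ** rng.uniform(-6, -4, size=50)
true_coeffs = np.array([3.0, 20.0, 150.0])
p_out = true_coeffs[0] * p_t**2 + true_coeffs[1] * p_t * p + true_coeffs[2] * p**2
print(np.rint(fit_quadratic(p_t, p, p_out)))   # rounded to integers, as in Table I
```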
The CCZ state distillation protocol is designed to achieve quadratic error suppression, and thus all of its cat states need only be 2-fault-tolerant [16]. Assuming that the cat states are indeed only 2-fault-tolerant, it is not very meaningful to analyze our protocol in which Step 7 is supposed to achieve quartic error suppression.
Typically, the surface code in a patch is assumed to operate in an "error correcting" mode, meaning that one attempts to correct any errors that occur. At the same time, one typically assumes that the outermost code is used in an "error detecting" mode, meaning that one keeps the distilled magic state only if no errors are detected. A final possibility is "partial error correction"; for example, in [20] it was suggested that for certain large outer codes one might correct if a small number of errors would explain the observed syndrome, and discard otherwise (more generally, one may choose some set of observed syndromes on which to correct, and discard on the others).
We now consider error detection and partial error correction as applied to surface codes inside a magic state factory (where discarding the state has no effect on the rest of the computation). It is an interesting question whether partial error correction might be useful for the qubits outside the magic state factory, i.e., the qubits actually used for computation. In that case, discarding the state on some observed syndrome would typically require discarding all of the computation up to that point and restarting from scratch, since that qubit will typically be entangled with the rest of the computation. Thus, for this to be useful, the probability of discarding the state would have to be small compared to the inverse of the number of gates in the computation.
In an error detecting mode, a surface code of distance d can suppress errors up to d-th order: by performing d rounds of syndrome measurements after each logical measurement, we can suppress logical errors if fewer than d physical errors occur in any round of logical measurements. Unfortunately, this simple error detection mode may not be very useful. The average number of errors is p times the number of error locations; there are $d^2 - 1$ syndromes, and so we perform $d(d^2 - 1)$ syndrome measurements. Each syndrome measurement is broken into some number of physical operations, with the exact number depending in detail on the physical implementation. So one needs $p\,d(d^2 - 1) \ll 1$ to attain a large probability that the state will not be discarded on a given round, with the exact value depending on the implementation of syndrome measurements.
For a [[7,1,3]] code, we need ∼60 patches. Given a total number of rounds ∼100, we need $6000 \cdot d(d^2 - 1) \ll p^{-1}$ to obtain significant throughput, where for $d = 3$ the left-hand side is ∼$1.6 \times 10^5$ (there are some additional numerical factors of order 1 due to the use of ancillas to implement measurements). So for a physical error rate of $10^{-5}$, error detection is unlikely to succeed. On the other hand, for a slightly smaller physical error rate (say $10^{-6}$), there is no need to use surface code patches.
Partial error correction, in which one corrects, for example, up to one error in each patch in each round, may be more likely to succeed. Now one needs roughly $\sqrt{6000}\,d^3 p \ll 1$, since a state is discarded only if some patch suffers two errors in the same round, which happens with probability of order $(p d^3)^2$ per patch per round; this is much more attainable. However, analyzing the performance of partial error correction will require an enumeration of error patterns that we leave for future work. At even higher physical noise rates, it may be useful to implement even more relaxed forms of partial error correction in which one corrects larger numbers of errors. We also leave this for future work.
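The throughput estimates above can be made slightly more quantitative under a simple assumption of our own: errors occur as a Poisson process with mean $\lambda = p\,d(d^2-1)$ per patch per round, over the ∼6000 patch-rounds quoted for the [[7,1,3]] protocol, ignoring the order-one ancilla factors. Pure error detection then keeps the state with probability $e^{-6000\lambda}$, while correcting up to one error per patch-round keeps it with probability $[e^{-\lambda}(1+\lambda)]^{6000}$. The function names are hypothetical.

```python
import math

def accept_detect(p, d, patch_rounds=6000):
    """Keep probability when any error causes a discard (pure error detection)."""
    lam = p * d * (d**2 - 1)             # assumed mean errors per patch per round
    return math.exp(-patch_rounds * lam)

def accept_partial(p, d, patch_rounds=6000):
    """Keep probability when each patch-round tolerates (corrects) at most one error."""
    lam = p * d * (d**2 - 1)
    # Poisson P(at most one error) = e^{-lam} (1 + lam), raised to the number of patch-rounds.
    return math.exp(patch_rounds * (-lam + math.log1p(lam)))

for p in (1e-5, 1e-6):
    print(p, accept_detect(p, 3), accept_partial(p, 3))
```

For $d = 3$ and $p = 10^{-5}$ this gives about 0.24 for detection versus about 0.9998 for partial correction, in line with the qualitative conclusion above. Whether partial correction suppresses logical errors enough is the separate question the text leaves open.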