Symmetry Protected Quantum Computation

We consider a model of quantum computation using qubits where it is possible to measure whether a given pair are in a singlet (total spin $0$) or triplet (total spin $1$) state. The physical motivation is that we can do these measurements in a way that is protected against revealing other information so long as all terms in the Hamiltonian are $SU(2)$-invariant. We conjecture that this model is equivalent to BQP. Towards this goal, we show: (1) this model is capable of universal quantum computation with polylogarithmic overhead if it is supplemented by single qubit $X$ and $Z$ gates. (2) Without any additional gates, it is at least as powerful as the weak model of "permutational quantum computation" of Jordan [14, 18]. (3) With postselection, the model is equivalent to PostBQP.

Imperfect physical gates are a major challenge for building a scalable quantum computer. One possible way to overcome this challenge is to use error correcting codes to build high fidelity logical gates from the lower fidelity physical gates [10]. Another approach is to use a topologically ordered state to store and manipulate quantum information, directly obtaining good logical gates [17]. Here, we propose a third approach, to protect the operations by symmetries of the physical Hamiltonian.
In particular, we consider qubits encoded in quantum spins, and we assume that the Hamiltonian and any noise terms respect an $SU(2)$ symmetry acting on all the qubits simultaneously. We need a quick introduction to the representation theory of $SU(2)$. The irreducible representations of $SU(2)$ are indexed by a quantity $S \in \{0, 1/2, 1, 3/2, \ldots\}$, called the spin. The dimensionality of a representation of spin $S$ is $2S + 1$. The spin-1/2 representation has dimension 2, so can be regarded as a qubit. The tensor product of two spin-1/2 representations is the direct sum of a spin 0 and a spin 1 representation. The singlet state of a pair of qubits is, up to an arbitrary phase, $\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$, while states $\{|00\rangle, |11\rangle, \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)\}$ in the subspace orthogonal to the singlet state are called triplet. The singlet state is anti-symmetric under exchange of qubits, while the triplet states are symmetric.
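The statements above about the singlet and triplet subspaces can be checked numerically. The following is a minimal sketch, assuming NumPy and the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the helper name `random_su2` is illustrative and not part of the paper.

```python
import numpy as np

# Computational basis ordering for two qubits: |00>, |01>, |10>, |11>.
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
P_s = np.outer(singlet, singlet.conj())   # projector onto spin 0
P_t = np.eye(4) - P_s                     # projector onto spin 1 (triplet)

SWAP = np.eye(4)[[0, 2, 1, 3]]            # exchanges the two qubits

def random_su2(rng):
    """Random SU(2) matrix built from a normalized complex 2-vector (a, b)."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    a, b = v / np.linalg.norm(v)
    return np.array([[a, -b.conjugate()], [b, a.conjugate()]])

rng = np.random.default_rng(0)
U = random_su2(rng)
UU = np.kron(U, U)                        # the same rotation on both qubits

singlet_is_invariant = np.allclose(UU @ singlet, singlet)
projectors_commute = np.allclose(UU @ P_t, P_t @ UU)
singlet_is_antisymmetric = np.allclose(SWAP @ singlet, -singlet)
triplet_is_symmetric = np.allclose(SWAP @ P_t, P_t)
```

All four flags should come out true: a simultaneous rotation cannot distinguish states within the singlet or triplet subspace, which is the sense in which the s/t measurement is symmetry protected.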
In the idealized case, a single qubit is not subject to noise of any kind, since there are no terms that we can write down which are invariant under $SU(2)$ on that qubit. If we bring a pair of qubits together, then the total spin (either 0 or 1) is the only $SU(2)$-invariant term on both qubits. Thus, dephasing may happen between spin 0 (singlet) and spin 1 (triplet) states; however, it also becomes possible to measure this total spin. Our focus in this paper is on the power of these singlet/triplet measurements in an idealized model.
To state our physical model: we assume that there are $n$ qubits for some $n$, initially in some state which is a tensor product of singlets. We then allow arbitrary pairs of qubits to be selected and the total spin of those two qubits to be measured. We use the notation s/t to indicate this operation, which projectively measures a pair of qubits to be in either the singlet (s) or triplet (t) subspace. From this physical model, we define a computational complexity class that we call STP, where a sequence of polynomially many such s/t measurements is allowed. More precisely,
Definition 1. STP is the class of problems that can be correctly answered, with constant probability larger than $1/2$, using polynomially many s/t measurements, where the sequence of measurements to make is determined by a polynomial time classical algorithm based on the input string and on the previous measurement outcomes. Similarly, the decision of whether the input string is accepted or rejected is also determined by a polynomial time classical algorithm based on the string and the measurement outcomes.
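For small $n$ the physical model can be simulated directly on a state vector. The sketch below is one way to do so, assuming NumPy; the function names `swap_qubits`, `measure_st` and `singlet_pairs` are hypothetical and chosen for illustration. It uses the fact that the triplet and singlet projectors on a pair are $(1 \pm \mathrm{SWAP})/2$.

```python
import numpy as np

def swap_qubits(state, i, j, n):
    """Return the state with qubits i and j exchanged (qubit 0 is the leftmost)."""
    psi = state.reshape([2] * n)
    return np.swapaxes(psi, i, j).reshape(-1)

def measure_st(state, i, j, n, rng):
    """Projectively measure singlet ('s') or triplet ('t') on qubits i, j.

    Uses P_t = (1 + SWAP_ij)/2 and P_s = (1 - SWAP_ij)/2.
    Returns the outcome and the normalized post-measurement state.
    """
    swapped = swap_qubits(state, i, j, n)
    triplet_part = (state + swapped) / 2
    p_t = np.vdot(triplet_part, triplet_part).real
    if rng.random() < p_t:
        return "t", triplet_part / np.sqrt(p_t)
    singlet_part = (state - swapped) / 2
    return "s", singlet_part / np.linalg.norm(singlet_part)

def singlet_pairs(num_pairs):
    """Initial state of the model: a tensor product of singlets on 2*num_pairs qubits."""
    singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
    state = np.array([1], dtype=complex)
    for _ in range(num_pairs):
        state = np.kron(state, singlet)
    return state

rng = np.random.default_rng(1)
n = 4
state = singlet_pairs(2)                              # qubits (0,1) and (2,3) are singlets
outcome_01, state = measure_st(state, 0, 1, n, rng)   # a singlet pair always gives 's'
outcome_12, state = measure_st(state, 1, 2, n, rng)   # random: triplet with probability 3/4
```

An STP algorithm in the sense of Definition 1 would wrap calls like these in a classical polynomial-time control loop that chooses the next pair from the outcomes seen so far.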
As perhaps the "smallest" physical example where this can be realized, imagine a single electron and a single proton far separated from each other. Each has spin 1/2, and (assuming no magnetic fields are present and assuming that spin-orbit coupling can be ignored which depends on the electric fields and on the orbital degrees of freedom) then these spins are not subject to any decoherence. If the electron and proton are brought near each other, then there is a hyperfine coupling between these spins, and the ground state splits into singlet and triplet levels with slightly different energies which can in principle be measured. One might then try to compute by having a large number of electrons and protons and occasionally bringing a single electron near a single proton and measuring the total spin. This simple example is slightly different from the idealized model above, as some of the qubits are associated with electrons and some with protons, and pairs must have one electron and one proton.
More realistic examples include trapped ions where interaction between ions may be dependent on total spin, or spin blockade in quantum dots [13].
This paper considers the question: what is the computational power of STP? We believe the study of such a simple model, in addition to its symmetry protection, is of interest in itself, and we conjecture that STP is equivalent to BQP. Even if our conjecture is wrong and STP falls somewhere between P and BQP, the study of STP is still well-motivated as it is so distinct from other quantum computational models. While we do not show STP = BQP, we present several partial results (footnote 1).
We emphasize that this kind of physical protection by symmetry is distinct from the approach using measurement based quantum computation in a symmetry-protected topological phase [19].
Footnote 1. After this paper appeared, it was shown that STP=BQP [22], building on [21]. Even given that STP=BQP, the constructions here are of interest as they may give a lower overhead way to implement BQP in practice using additional approximate gates.
First, in Section 1, we show how to implement universality using s/t measurements as well as certain single qubit Cliffords and Pauli measurements. Our discussion will be in reverse order of the number of single qubit operations we allow: we will start with the full single qubit Clifford group and all Pauli measurements, and gradually reduce the number of single qubit operations required, until what is required is X, Z Clifford operations in addition to s/t measurements. A related previous result is that s/t operations combined with a source of many copies of three different states whose Bloch vectors span $\mathbb{R}^3$ is also universal [21]; s/t operations are used to purify mixed states and build any pure state, which are then used to build cluster states. This purification result, impossible to do using Clifford operations, further motivates the study of STP. We refer to the introduction in [21] for further motivation behind the study of STP.
In Section 2, we show that STP is at least as powerful as the model of "weak permutational computing" [14] (it has been shown [12] that there is an efficient classical algorithm to compute amplitudes in this permutational computing model, but no classical algorithm is known to solve the sampling problem in this model).
In Section 3, in line with our investigation of STP, we consider generalizations of STP, showing that allowing higher spin qudits does not increase the power of the model.
Finally, in Section 4, we define a postselected version of STP, called PostSTP, and show that it is equivalent to PostBQP. This implies that efficient exact sampling from an STP protocol is impossible assuming the polynomial hierarchy is infinite [11, see "Sidebar: The Polynomial Hierarchy and Post-Selection"]. An approximate efficient sampling, related to quantum supremacy experiments, is discussed in Appendix C (Problem 1). Showing PostSTP = PostBQP does not imply STP = BQP, as there are other models, such as boson sampling and IQP, which are not believed to equal BQP and yet whose postselected versions equal PostBQP (see [11] for more on PostBQP and quantum supremacy experiments).
There are several interesting open questions, such as the obvious question of the hardness of sampling measurement outcomes for some sequence of s/t measurements, with the sequence randomly chosen or chosen in some other way.
A final result that may be of independent interest is given in Appendix A, namely a relation between the "irrationality measure" of an irrational angle and how many times one may need to rotate by that angle to approximate some desired target angle to some desired accuracy. Although we use this result in Section 1.1.2, it is only to show an efficient approximation that can also be done with more elementary facts. Likely the result in Appendix A is well-known, but since we did not find it elsewhere we record it here.

X, Z and s/t is Universal for Quantum Computation
We show that:
Theorem 1. Using s/t measurements, and single qubit X, Z unitaries, we can approximate a gate set consisting of one and two qubit Clifford operations and Pauli measurements, as well as T gates, with an expected overhead that is only polylogarithmic in the error in this approximation.
Remark 1. The polylogarithmic overhead arises for a familiar reason: we will exactly implement Clifford gates and then implement T gates by magic state distillation using as input approximate T gates that we can form from the given operations. The overhead scales polylogarithmically in the accuracy of the distilled T gates. The implementation of the circuit is probabilistic, as certain operations are repeated until they succeed, which is why the theorem refers to an expected overhead.

s/t and Single Qubit Clifford and Pauli Measurement is Universal for Quantum Computing
We first show that s/t on arbitrary pairs of qubits, combined with arbitrary Cliffords on a single qubit (i.e., X, Z, H, S) and single qubit Pauli measurements, is universal for quantum computing. In particular, we can approximate a gate set consisting of one and two qubit Clifford operations and Pauli measurements, as well as T gates, with an overhead that is only polylogarithmic in the error.
In subsequent sections, we reduce the number of single qubit operations required at the cost of making the construction more complicated.
We begin by showing that we can implement the full Clifford group, and then show universality.

Implementing the Full Clifford Group
First, we prove:
Lemma 1 (Bell basis measurement). We can implement a four outcome projective measurement on two qubits A, B in a Bell basis, whose four states are, in order, $\frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$, $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$, $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and $\frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$.
Proof. To do this, first apply an s/t measurement. If the outcome is s, then the qubits are in the first Bell state. If the outcome is t, apply $Z_A$ and again measure s/t. If the outcome is s, then the qubits were in the second Bell state. If the outcome is t, apply $X_A$ and again measure s/t; if the outcome is s, then the qubits were in the third Bell state. If the outcome is t, then the qubits were in the fourth Bell state.
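The three-round protocol in the proof can be checked by tracking unnormalized branch states. The sketch below assumes NumPy and assumes the ordering of the four Bell states implied by the protocol (singlet first, then $\frac{1}{\sqrt{2}}(|01\rangle+|10\rangle)$, $\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)$, $\frac{1}{\sqrt{2}}(|00\rangle-|11\rangle)$); the dictionary keys are illustrative labels.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
X_A = np.kron(X, I2)
Z_A = np.kron(Z, I2)

s2 = np.sqrt(2)
bell = {                                   # basis order |00>, |01>, |10>, |11>
    "first":  np.array([0, 1, -1, 0], dtype=complex) / s2,   # singlet
    "second": np.array([0, 1, 1, 0], dtype=complex) / s2,
    "third":  np.array([1, 0, 0, 1], dtype=complex) / s2,
    "fourth": np.array([1, 0, 0, -1], dtype=complex) / s2,
}
P_s = np.outer(bell["first"], bell["first"].conj())
P_t = np.eye(4) - P_s

def outcome_probabilities(psi):
    """Probabilities of the four branches of the three-round protocol."""
    probs = {}
    probs["first"] = np.vdot(psi, P_s @ psi).real
    psi = Z_A @ (P_t @ psi)                # outcome t, then apply Z_A
    probs["second"] = np.vdot(psi, P_s @ psi).real
    psi = X_A @ (P_t @ psi)                # outcome t again, then apply X_A
    probs["third"] = np.vdot(psi, P_s @ psi).real
    psi = P_t @ psi                        # third outcome t
    probs["fourth"] = np.vdot(psi, psi).real
    return probs

table = {name: outcome_probabilities(vec) for name, vec in bell.items()}
```

Each Bell state should land in its own branch with probability 1, so by linearity the protocol acts as a projective measurement in this basis.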
Remark 2. Since we can do Bell basis measurements, we can teleport: given a pair of qubits labeled B, C in a Bell state, we can bring in an additional qubit labeled A and measure A, B in the Bell basis. This teleports the state of A to C, up to some correction on C which is a single qubit Clifford.
Further, we can teleport more than one qubit, and use this to perform Clifford operations if we can prepare appropriately entangled states. To show this for the case of CNOT, consider a state $\psi_{\mathrm{CNOT}}$ on four qubits C, D, E, F obtained by taking C, E in a singlet and D, F in a singlet, and then applying a CNOT from E to F. We then bring in two extra qubits A, B, and measure A, C in the Bell basis and measure B, D in the Bell basis. This teleports the state of A, B to E, F and then applies a CNOT, again up to some single qubit Cliffords on E, F.
So, if we can prepare this state $\psi_{\mathrm{CNOT}}$, then we can perform CNOT operations. Combined with the ability to perform single qubit Cliffords, this gives the full Clifford group. It seems at this point that we are trying to get a "free lunch": the most obvious way to prepare $\psi_{\mathrm{CNOT}}$ is to perform a CNOT operation, which is precisely the operation we are trying to produce.
However, it is possible to prepare $\psi_{\mathrm{CNOT}}$ using just s/t and single qubit Cliffords, as we now show. First we construct a double spin operation $O_{ZZ}$.
Lemma 2 ($O_{ZZ}$). We can produce an operation $O_{ZZ}$ which acts on a pair of qubits A, B and projectively measures $Z_A Z_B$ and, if $Z_A Z_B = -1$, also measures $X_A X_B$.
Proof. To do this, simply perform only the first two measurements of the protocol above to measure in the Bell basis: if either of the first two measurements is s, then $Z_A Z_B = -1$ and we have also measured $X_A X_B$. If both measurements are t, then $Z_A Z_B = +1$ but no other information is revealed; we should then apply $Z_A$ to undo the first application of $Z_A$.
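The claim that two t outcomes reveal only $Z_A Z_B = +1$ can be verified by multiplying out the branch operators. This is a minimal sketch assuming NumPy; the names `K_s`, `K_ts`, `K_tt` are illustrative labels for the three branches.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Z_A = np.kron(Z, I2)
ZZ = np.kron(Z, Z)
XX = np.kron(X, X)

singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
P_s = np.outer(singlet, singlet.conj())
P_t = np.eye(4) - P_s

# Branch operators: outcome s; outcome t then s; outcome t then t (then undo Z_A).
K_s = P_s
K_ts = P_s @ Z_A @ P_t
K_tt = Z_A @ P_t @ Z_A @ P_t

# Two t outcomes act exactly as the projector onto Z_A Z_B = +1.
proj_zz_plus = (np.eye(4) + ZZ) / 2
tt_is_zz_plus_projector = np.allclose(K_tt, proj_zz_plus)

# In the other two branches the output has Z_A Z_B = -1 and a definite X_A X_B.
zz_minus_in_s = np.allclose(ZZ @ K_s, -K_s)
zz_minus_in_ts = np.allclose(ZZ @ K_ts, -K_ts)
xx_fixed_in_s = np.allclose(XX @ K_s, -K_s)
```

Because `K_tt` equals the $Z_A Z_B = +1$ projector, any superposition of $|00\rangle$ and $|11\rangle$ passes through that branch undisturbed.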

Corollary 1 ($\hat{O}_{ZZ}$). We can produce an operation $\hat{O}_{ZZ}$ which measures $Z_A Z_B$ and, with probability 1/2, also measures $X_A X_B$, otherwise measuring nothing else.
Similarly, we can produce an operation $O_{XX}$ which measures $X_A X_B$ and, if $X_A X_B = -1$, also measures $Z_A Z_B$, and we can produce an operation $\hat{O}_{XX}$ which measures $X_A X_B$ and, with probability 1/2, measures $Z_A Z_B$, otherwise measuring nothing else. To construct $O_{XX}$, we use the same construction as $O_{ZZ}$ except we apply $X_A$ after the first measurement if the result is t, rather than applying $Z_A$. We construct $\hat{O}_{XX}$ similarly. Also (we will use this in Section 1. …
Lemma 3 (CNOT). We can produce a CNOT gate using an ancilla, given the ability to perform single qubit Cliffords and Pauli measurements as well as to apply $O_{ZZ}$, $O_{XX}$ on an arbitrary pair of qubits.
Proof. The circuit [15,23] is: prepare the ancilla in the $|+\rangle$ state, measure ZZ on the source and ancilla, measure XX on the ancilla and target, and finally measure Z on the ancilla, and apply single qubit Clifford corrections if needed. The original reference [23] writes the circuit with additional Hadamard gates so that all measurements are Z or ZZ, but it conjugates to the circuit we give here.
Now, to produce the CNOT gate action, recall that we only need to prepare the state $\psi_{\mathrm{CNOT}}$. We do so by a probabilistic protocol: prepare two entangled pairs of qubits. Then, attempt to produce a CNOT gate by using operations $O_{ZZ}$ and $O_{XX}$ in place of the ZZ and XX measurements in the aforementioned protocol for a CNOT. If both $O_{ZZ}$ and $O_{XX}$ succeed (that is, neither reveals any additional information), we have produced the desired $\psi_{\mathrm{CNOT}}$. If one does not succeed, we may simply try again.
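The measurement circuit quoted at the start of the proof can be checked for every outcome pattern. The sketch below assumes NumPy and ideal ZZ and XX projective measurements (not the probabilistic $O_{ZZ}$, $O_{XX}$); the qubit order source, ancilla, target and the helper names are choices made for this illustration.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = 1j * X @ Z
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}

def kron(*ops):
    out = np.array([[1]], dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
plus = np.array([[1], [1]], dtype=complex) / np.sqrt(2)       # ancilla input (column)
basis = {+1: np.array([[1, 0]], dtype=complex),               # <0| for Z = +1
         -1: np.array([[0, 1]], dtype=complex)}               # <1| for Z = -1

def branch_operator(m_zz, m_xx, m_z):
    """Map on (source, target) for the given outcomes; order is source, ancilla, target."""
    P_zz = (np.eye(8) + m_zz * kron(Z, Z, I2)) / 2            # ZZ on source and ancilla
    P_xx = (np.eye(8) + m_xx * kron(I2, X, X)) / 2            # XX on ancilla and target
    insert_plus = kron(I2, plus, I2)                          # 8 x 4: adjoin ancilla in |+>
    project_out = kron(I2, basis[m_z], I2)                    # 4 x 8: Z measurement on ancilla
    return project_out @ P_xx @ P_zz @ insert_plus

def pauli_correction(K):
    """Find a two-qubit Pauli P with K proportional to P @ CNOT, or None."""
    for (a, Pa), (b, Pb) in itertools.product(PAULIS.items(), repeat=2):
        candidate = np.kron(Pa, Pb) @ CNOT
        overlap = np.trace(candidate.conj().T @ K)
        # Cauchy-Schwarz is saturated exactly when K is proportional to candidate.
        if np.isclose(abs(overlap) ** 2, 4 * np.trace(K.conj().T @ K).real):
            return a + b
    return None

corrections = {
    outcomes: pauli_correction(branch_operator(*outcomes))
    for outcomes in itertools.product([+1, -1], repeat=3)
}
```

Every outcome pattern should map to a CNOT followed by a Pauli on the source and target, with no correction needed when all three outcomes are $+1$.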
Remark 4. Note that in this probabilistic protocol, indeed $O_{ZZ}$ and $O_{XX}$ will each succeed with probability 1/2, without any need to use $\hat{O}_{ZZ}$ and $\hat{O}_{XX}$.
Thus, this protocol to perform a CNOT gate can be understood in two parts: first, we have a protocol that sometimes succeeds in performing the desired CNOT, and, second, we use it to prepare entangled states "offline". Offline operations use expendable qubits where, without any loss in efficiency, we can start the operations/measurements again on new qubits if the desired result was not obtained. This is in contrast to online operations applied on the data (computational+ancilla) qubits. By teleporting through the entangled states, we can use this to perform CNOTs on data qubits by "only using the CNOT when it will succeed".

Universality
Now we show universality. The construction here is not intended to be optimal in terms of minimizing overhead in any way. It is simply intended to be a simple way to show universality by leveraging standard results.

Lemma 4.
We can prepare (offline) a nonstabilizer pure state on a single qubit.
Proof. Prepare two qubits in states $|0\rangle$ and $|+\rangle$ using Z, X measurements and applying X, Z if the wrong outcome occurs. Project into the triplet subspace (if instead they are in a singlet, reprepare and try again). The result is, up to normalization, $|00\rangle + \frac{1}{2}(|01\rangle + |10\rangle)$. Measure the second qubit in the X basis; we assume without loss of generality that the result is $|+\rangle$. The resulting state on the first qubit is, up to normalization, $3|0\rangle + |1\rangle$. With normalization this is $\cos(\theta)|0\rangle + \sin(\theta)|1\rangle$, where $\theta = \arctan(1/3)$ is an irrational angle [2].
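The arithmetic in this proof is easy to reproduce. The following sketch, assuming NumPy, projects $|0\rangle|+\rangle$ onto the triplet subspace, projects the second qubit onto $|+\rangle$, and reads off the angle; variable names are illustrative.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
P_t = np.eye(4) - np.outer(singlet, singlet.conj())

psi = P_t @ np.kron(ket0, plus)            # project |0>|+> onto the triplet subspace
p_triplet = np.vdot(psi, psi).real         # success probability of this step

# Project the second qubit onto |+>; rows index qubit 1, columns index qubit 2.
first_qubit = psi.reshape(2, 2) @ plus.conj()
first_qubit = first_qubit / np.linalg.norm(first_qubit)

theta = np.arctan2(first_qubit[1].real, first_qubit[0].real)
```

The resulting `first_qubit` is proportional to $(3, 1)$, so `theta` agrees with $\arctan(1/3)$, and the triplet projection succeeds with probability $3/4$.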
Any such state $\cos(\theta)|0\rangle + \sin(\theta)|1\rangle$ is equal to $\exp(i\theta Y)|0\rangle$. Using this as a resource for state injection produces a rotation by $\exp(\pm i\theta Y)$, where the sign is chosen uniformly at random (our basis for state injection is a little different from the standard one since we rotate by Y instead of by Z). This state injection can be done, e.g., as in [4, Section III] by modifying the circuit of Eq. (1.1) appropriately using a Hadamard and phase gate. In that circuit, $M_{Z_2}$ is a Z-measurement on the second qubit, $\Lambda(e^{\pm 2i\theta})$ is the controlled $e^{\pm 2i\theta}$ rotation, and $H$, $S$ are the Hadamard and phase gates. We shall address the issue of the sign $\pm$ in $\Lambda(e^{\pm 2i\theta})$ shortly.
In the simplest applications of state injection, one imagines a situation where rotation by twice the angle is available as a primitive (for example, using state injection to produce T gates and assume that one has S gates available). In that case, if the sign is not what one wants, one can recover with a rotation by twice the angle. We do not have that option here.
We use a different approach. Pick any desired target angle $\phi_{\mathrm{target}}$, and any error $\epsilon > 0$. Then, repeatedly apply state injection (as in Eq. (1.1)) to a qubit $|\psi\rangle$ initialized in the $|0\rangle$ state, until the result is $\cos(\phi)|0\rangle + \sin(\phi)|1\rangle$ where $\phi \equiv m\theta \pmod{2\pi}$ for some $m \in \mathbb{N}$ and $|\phi - \phi_{\mathrm{target}}| \le \epsilon$. Here, when we apply state injection, we do not care whether the plus sign or minus sign is chosen. The result is some random walk in angles and, since $\theta$ is irrational, such a $\phi$ is reached in finite expected time. Choosing $\phi_{\mathrm{target}} = \pi/8$, we can then produce states arbitrarily close to a magic state for a rotation by $\pi/8$, i.e., a magic state for a "T" gate, where the quotes are because this is a rotation by Y rather than Z. Choosing $\epsilon$ sufficiently small, one can then use this gate as input into any standard T gate distillation protocol (as in Eq. (1.1) but for $\theta \to \phi$) [4, Section III] to obtain universality with only polylogarithmic overhead in the target error.
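The random walk in angles can be simulated classically to get a feel for how many injections are needed. This is only a sketch using the Python standard library: the function name `walk_to_target` and the cutoff `max_steps` are assumptions of this example, and the counter $m$ is allowed to be any integer since the sign of each step is random.

```python
import math
import random

def walk_to_target(theta, phi_target, eps, rng, max_steps=1_000_000):
    """Random walk m -> m +/- 1 until m*theta is within eps of phi_target mod 2*pi.

    Returns (m, phi, steps), or None if max_steps is exhausted.
    """
    two_pi = 2 * math.pi
    m = 0
    for steps in range(max_steps + 1):
        phi = (m * theta) % two_pi
        gap = abs(phi - phi_target) % two_pi
        if min(gap, two_pi - gap) <= eps:          # distance measured on the circle
            return m, phi, steps
        m += rng.choice((-1, 1))                   # sign of the injected rotation is random
    return None

theta = math.atan(1 / 3)
result = walk_to_target(theta, math.pi / 8, eps=0.01, rng=random.Random(0))
```

The number of steps depends on the seed and grows as `eps` shrinks; only a constant accuracy is needed before handing the states to a distillation protocol.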

Reducing the Single Qubit Operations Needed
We now reduce the single qubit operations required.

Avoiding Use of Hadamard and S gates
First, we note that it is possible to avoid using both Hadamard and S gates. Without these gates, our construction of Section 1.1.1 gives the subgroup of the Clifford group generated by single qubit X and Z and two qubit CNOT. Call this subgroup $C$. We now show:
Lemma 5. We can generate the full Clifford group using just s/t, X, Z and single qubit Pauli measurements.
Proof. Since we can measure single qubit Paulis, prepare qubits in state $Y = +1$. Using these $Y = +1$ qubits as a target for state injection for rotations by Z, we can produce the $S = \exp(i\frac{\pi}{4}Z)$ gate. This state injection protocol requires only X, Z, and CNOT gates and single qubit Pauli measurements, so it requires only the subgroup $C$ above. Note that since we have the single qubit $Z = S^2$ available, we can recover state injection if we instead produce $S^\dagger$ (Eq. (1.2)).
We can also use the same $Y = +1$ state in state injection to produce a rotation by $\exp(i\frac{\pi}{4}X)$ (Eq. (1.3)). To do this we use the usual state injection protocol, except we interchange the control and target on the CNOT gate in state injection, and we replace the final Z measurement in state injection with an X measurement. The effect of this is to perform state injection in a Hadamard basis (even though no Hadamards are used!), since $(H \otimes H)\,\mathrm{CNOT}\,(H \otimes H)$ is equal to a CNOT gate with control and target interchanged.
Combining rotations $\exp(i\frac{\pi}{4}Z)$ and $\exp(i\frac{\pi}{4}X)$, we can produce the full single qubit Clifford group, including $H$.
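One explicit way to see that the two rotations generate $H$ is the product $\exp(i\frac{\pi}{4}Z)\exp(i\frac{\pi}{4}X)\exp(i\frac{\pi}{4}Z) = iH$. The sketch below, assuming NumPy, checks this and the relation to $Z$ and to the phase gate up to global phase; the particular product is an illustration chosen here, not necessarily the decomposition the authors have in mind.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
S_diag = np.diag([1, 1j])                  # phase gate in the convention diag(1, i)

def rot(pauli):
    """exp(i*pi/4*P) = (1 + i*P)/sqrt(2) for any Pauli P, since P @ P = 1."""
    return (I2 + 1j * pauli) / np.sqrt(2)

R_Z, R_X = rot(Z), rot(X)

def equal_up_to_phase(A, B):
    """True if A = exp(i*alpha) * B for some real alpha (A, B unitary 2x2)."""
    return np.isclose(abs(np.trace(A.conj().T @ B)), 2.0)

hadamard_from_rotations = equal_up_to_phase(R_Z @ R_X @ R_Z, H)
z_from_square = equal_up_to_phase(R_Z @ R_Z, Z)
# In the diag(1, i) convention, exp(i*pi/4*Z) is the inverse phase gate up to a phase.
rz_is_s_dagger = equal_up_to_phase(R_Z, S_diag.conj().T)
```

Since the text takes $S = \exp(i\frac{\pi}{4}Z)$ as its definition, the last check only records how that convention relates to the common $\mathrm{diag}(1, i)$ one.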

X, Z and s/t is Universal
Let us now finish the proof of Theorem 1 by showing:
Lemma 6. Single qubit Pauli measurements can be derived using X, Z and s/t.
Proof. Recall that using just X, Z and s/t, we have a protocol $\hat{O}_{ZZ}$ which measures ZZ on any two qubits A, B and which measures nothing else with probability 1/2. So, imagine that we have a set $S$ of qubits such that the product $ZZ = +1$ for all pairs of qubits in $S$. Consider another qubit $q$ in any state. Then, pick any qubit $r$ from $S$ and measure $Z_q Z_r$. With probability 1/2 nothing else is measured and if $Z_q Z_r = +1$, we can add $q$ to the set; if $Z_q Z_r = -1$, we can apply X to $q$ and then add $q$ to the set. On the other hand, with probability 1/2, additional information is measured; in this case, we remove $r$ from $S$.
So, with probability 1/2, $|S| \to |S| + 1$ and with probability 1/2, $|S| \to |S| - 1$. This allows us to build up large sets $S$. Since the size of $S$ does an unbiased random walk, it takes $\sim n^2$ operations to produce a set $S$ with $|S| = n$. We can then use such a set as a "standard": throw out any one qubit in $S$. The remaining qubits are then in a state which is an incoherent mixture of $|0\rangle^{\otimes(n-1)}$ and $|1\rangle^{\otimes(n-1)}$.
So, let us analyze these two cases, where the remaining qubits are either in $|0\rangle^{\otimes(n-1)}$ or $|1\rangle^{\otimes(n-1)}$, separately. First suppose that the state is $|0\rangle^{\otimes(n-1)}$. Call each of these $n - 1$ qubits "Z standards". Then, any time we want to measure Z on a single qubit, we get a Z standard and use $\hat{O}_{ZZ}$ to measure ZZ on the given qubit and the standard. If we measure no information other than ZZ, we can now in fact use both qubits as standards; if we also measure XX, we discard both qubits. So, at most one Z standard is consumed per measurement. Note that even if $\hat{O}_{ZZ}$ does not succeed, so that we also measure XX, we still have learned the value of Z on the given qubit: the effect is to measure the value of Z on the given qubit and then bring that qubit and the standard into a Bell pair.
We assumed that the standards were in the state $|0\rangle^{\otimes(n-1)}$. If instead they are in the state $|1\rangle^{\otimes(n-1)}$, nothing changes: our labels $|0\rangle$ and $|1\rangle$ are arbitrary and can be interchanged. This is a consequence of a symmetry of our operations X, Z, s/t, which are invariant (up to an unobservable phase) under conjugation by X. Indeed, if our labels were not arbitrary, then some sequence would reveal the difference between these two states, in which case we could, assuming we had $|1\rangle^{\otimes(n-1)}$, apply X to every qubit to obtain $|0\rangle^{\otimes(n-1)}$. So, we can measure single qubit Z with a quadratic overhead: the number of operations is proportional to the square of the number of measurements. Similarly, we can measure single qubit X and single qubit Y by preparing X standards (respectively, Y standards), which are sets of qubits where $XX = +1$ (respectively, $YY = +1$) for every pair in the set. This finishes the proof of the lemma and, as a result, of Theorem 1.
Remark 5. In fact, the quadratic overhead can easily be reduced to linear. Suppose we have some such set $S$. In $O(1)$ operations we can prepare another set $T$ with $|T| \ge 2$ such that all pairs of qubits in $T$ also have $ZZ = +1$. Then, pick one qubit from $S$ and one from $T$ and apply $\hat{O}_{ZZ}$. If this succeeds with $ZZ = +1$, add all qubits from $T$ to $S$, and if it succeeds with $ZZ = -1$, apply X to all qubits in $T$ and then add all qubits in $T$ to $S$. If it fails, remove the given qubit from $S$ and discard $T$. Thus, with probability 1/2 we have $|S| \to |S| + |T| \ge |S| + 2$, while with probability 1/2 we have $|S| \to |S| - 1$.
This gives a biased random walk in $|S|$ and so $|S|$ increases linearly in the number of operations with high probability. We leave it to the reader to consider optimizing the linear increase.
For example: what is the best $|T|$ to use? Should one in fact apply this construction recursively by constructing $T$ using a similar process? One can in fact avoid discarding all of $T$ and only discard the measured qubit if $\hat{O}_{ZZ}$ does not succeed, and so on.
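The difference between the unbiased walk of Lemma 6 and the biased walk of Remark 5 shows up clearly in a classical simulation of the set size alone. The sketch below uses only the Python standard library; the function `operations_to_reach`, the floor of one qubit, and the choice to count each attempt as one operation (ignoring the $O(1)$ cost of preparing $T$) are all assumptions of this illustration.

```python
import random

def operations_to_reach(n, block, rng, max_ops=10_000_000):
    """Count pairwise operations until |S| >= n.

    Each operation succeeds with probability 1/2 and adds `block` qubits to S;
    otherwise one qubit is removed. The floor of one qubit is an assumption
    of this sketch (a single fresh qubit can always restart the set).
    """
    size, ops = 1, 0
    while size < n and ops < max_ops:
        ops += 1
        if rng.random() < 0.5:
            size += block
        else:
            size = max(1, size - 1)
    return ops

rng = random.Random(0)
targets = [20, 40, 80]
trials = 100
# block = 1 models the unbiased walk; block = 2 models Remark 5 with |T| = 2.
unbiased = [sum(operations_to_reach(n, 1, rng) for _ in range(trials)) / trials for n in targets]
biased = [sum(operations_to_reach(n, 2, rng) for _ in range(trials)) / trials for n in targets]
```

Doubling the target size should roughly quadruple the entries of `unbiased` and roughly double those of `biased`, in line with the quadratic and linear overheads described above.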

Permutational Quantum Computation
The model of permutational quantum computing [14] is as follows.
For any binary tree $T$ with $n$ leaves, we define a set of commuting operators on a system of $n$ qubits. Each qubit corresponds to one leaf. For every vertex $v$ (including the root), there is an operator $S_v^2 = (\sum_l \vec{S}_l)^2$, where the sum runs over the leaves $l$ which are descendants of $v$, with eigenvalues corresponding to the total spin of the qubits corresponding to those leaves. Additionally, for the root, there is another operator with eigenvalues corresponding to the total spin in the Z-direction of the qubits, denoted $S_Z = \frac{1}{2}\sum_{l\ \mathrm{leaf}} Z_l$. The eigenvalues of these operators (all together forming a tuple of half-integers) define a complete eigenbasis. We say that a labelled tree is a binary tree with labels at each vertex (Fig. 2.1), with the labels at each vertex being chosen from the set of eigenvalues of the operator(s) corresponding to that vertex. We use $T, T', \ldots$ to denote unlabelled binary trees (Fig. 2.2). We use $\lambda, \lambda', \ldots$ to denote labelled binary trees, and use $|\lambda\rangle, \ldots$ to denote the corresponding states.
There are two models of permutational quantum computing. In the weak model, one can prepare any state $|\lambda\rangle$ corresponding to any labelled tree. One can then pick any other tree $T'$ and projectively measure the corresponding operators for that tree. If $\lambda'$ is a labelling of $T'$, then one can measure $|\langle\lambda'|\lambda\rangle|^2$ to inverse polynomial accuracy by repeating this projective measurement polynomially many times. In the strong model, we assume that one can also measure $\langle\lambda'|\lambda\rangle$ (without taking the absolute value) to inverse polynomial accuracy. While this problem of computing $\langle\lambda'|\lambda\rangle$ can be done classically [12], there is no known efficient classical algorithm for the problem of sampling from $\lambda'$ with probability $|\langle\lambda'|\lambda\rangle|^2$. We show that: Using just s/t, one can simulate the weak model in polynomial time.
Figure 2.1: Eight labelled trees, corresponding to an orthonormal basis of 3 spins, i.e. $(\mathbb{C}^2)^{\otimes 3}$ (Figure from [14]). Each label corresponds to the spins of the descendant leaves. However, the root has two coordinates; the first is the total spin of all leaves (total spin), and the second is the total azimuthal angular momentum, an eigenvalue of $S_Z$. Every labelled tree can be expressed in the standard basis of $(\mathbb{C}^2)^{\otimes 3}$. For example, the left tree from the second row is $|\lambda\rangle = \ldots$
Figure 2.2: Unlabelled trees (Figure from [14]). Measuring, say, the leftmost unlabelled tree, is sampling a labelling $\lambda'$ of that tree with probability $|\langle\lambda'|\lambda\rangle|^2$ for $|\lambda\rangle$ a labelled tree with 4 spins.

Reducing to case that root has spin 0
First note that it suffices to consider the case that the root has spin 0. The reader may prefer to skip this section on first reading. Indeed, consider any $\lambda, \lambda'$ such that $\lambda$ has spin $S_{\mathrm{root}} \neq 0$. Introduce notation: let $|\lambda, S_Z\rangle$ be a state defined by some labelled tree $\lambda$ with the Z-spin at the root replaced by $S_Z$. Then, $\langle\lambda', S'_Z|\lambda, S_Z\rangle = 0$ unless $S'_{\mathrm{root}} = S_{\mathrm{root}}$ and $S'_Z = S_Z$. Fix some unlabelled tree $T'$. To sample $\lambda'$ with probability $|\langle\lambda'|\lambda\rangle|^2$, we do the following. Adjoin an additional $2S_{\mathrm{root}}$ ancilla qubits, defining some tree $\lambda''$ which constrains those ancillas to be in the spin $S_{\mathrm{root}}$ state (there are many possible such $\lambda''$ as all we care about is the root spin; choose an arbitrary such $\lambda''$). Then join the root of $\lambda$ to the root of $\lambda''$ to produce (footnote 2) some $\hat{\lambda}$ with spin $\hat{S}_{\mathrm{root}} = 0$. Let $T''$ be an unlabelled tree, obtained by removing the labels from $\lambda''$. Join the root of $T'$ to $T''$ to produce some $\hat{T}$. Then, prepare $|\hat{\lambda}\rangle$ and projectively measure the eigenvalues corresponding to tree $\hat{T}$. Note that we will always measure $\hat{S}_{\mathrm{root}} = 0$. This projective measurement gives a labelling of $\hat{T}$, which induces a labelling of $T'$ since $T'$ is contained in $\hat{T}$ (we can assume that the root of $T'$ has $S'_Z = S_Z$ since otherwise the amplitude is zero).
We claim that this correctly samples the amplitudes. That is, if $\hat{\lambda}'$ is a labelling of $\hat{T}$, which induces some labelling $\lambda'$ of $T'$, then $|\langle\hat{\lambda}'|\hat{\lambda}\rangle|^2$ is equal to $|\langle\lambda'|\lambda\rangle|^2$. To see this, expand $|\hat{\lambda}\rangle$ as a sum over $S_Z$ of products of $|\lambda, S_Z\rangle$ with a state of the ancillas, with coefficients $A(S_Z)$ which are some Clebsch-Gordan coefficients. Do the same expansion for $|\hat{\lambda}'\rangle$; the inner product $\langle\hat{\lambda}'|\hat{\lambda}\rangle$ then becomes a sum over $S_Z$ of terms proportional to $\langle\lambda', S_Z|\lambda, S_Z\rangle$. Since this inner product is independent (footnote 3) of $S_Z$, the result follows.
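The expansion referred to above can be written out as follows. This is a sketch under stated assumptions rather than the authors' own equations: $\lambda''$ denotes the ancilla tree, $|\lambda'', -S_Z\rangle$ denotes the ancilla state with Z-spin $-S_Z$, the measured labelling agrees with $\lambda''$ on the ancilla subtree, and the coefficients are normalized so that $\sum_{S_Z} |A(S_Z)|^2 = 1$.

```latex
\begin{align}
|\hat{\lambda}\rangle &= \sum_{S_Z} A(S_Z)\, |\lambda, S_Z\rangle \otimes |\lambda'', -S_Z\rangle, \\
|\hat{\lambda}'\rangle &= \sum_{S_Z} A(S_Z)\, |\lambda', S_Z\rangle \otimes |\lambda'', -S_Z\rangle, \\
\langle\hat{\lambda}'|\hat{\lambda}\rangle
  &= \sum_{S_Z} |A(S_Z)|^2\, \langle\lambda', S_Z|\lambda, S_Z\rangle
   = \langle\lambda'|\lambda\rangle \sum_{S_Z} |A(S_Z)|^2
   = \langle\lambda'|\lambda\rangle .
\end{align}
```

The second equality in the last line is where the independence of $\langle\lambda', S_Z|\lambda, S_Z\rangle$ from $S_Z$ is used. For two spins $S_{\mathrm{root}}$ coupled to total spin 0, the standard Clebsch-Gordan coefficients have $|A(S_Z)|^2 = 1/(2S_{\mathrm{root}} + 1)$ for each of the $2S_{\mathrm{root}} + 1$ values of $S_Z$, which is consistent with the assumed normalization.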

Measuring operators for some tree
Next we describe how to projectively measure operators corresponding to any tree $T$ with root having total spin 0. The key is that given any set $U$ of qubits with $|U| = M$ for some $M$, we can approximately measure the total spin of those qubits as follows. Simply repeat the following operation many times: choose a random pair of qubits, and measure s/t. We show below that the number of measurements needed is at most polynomial in the inverse of the accuracy.
Footnote 2. The root of $\lambda$ has two labels, since it also has a label by some $S_Z$. This root becomes an internal vertex of $\hat{\lambda}$ which should have only one label. We drop the label $S_Z$ when we build $\hat{\lambda}$.
Footnote 3. We show this here. Let (in a slight overuse of notation) $X$ denote the sum of Pauli X operators on all qubits. The state $\exp(i\theta X)|\lambda, S_Z = S_{\mathrm{root}}\rangle$ is a superposition of states $|\lambda, S_Z\rangle$ with different $S_Z$. The inner product $\langle\lambda', S_{\mathrm{root}}|\exp(-i\theta X)\exp(i\theta X)|\lambda, S_{\mathrm{root}}\rangle$ is independent of $\theta$, and by expanding in $\theta$ it follows that $\langle\lambda', S_Z|\lambda, S_Z\rangle$ is independent of the particular $S_Z$.
Then, letting $S^2$ denote the squared total spin of the qubits in $U$ (taking the value $S(S+1)$ on a state of total spin $S$), and $\vec{S}_i$ denote a vector of spin operators (one-half the Pauli operators) on qubit $i$,
$$S^2 = \Big(\sum_{i \in U} \vec{S}_i\Big)^2 = \frac{3}{4}M + \sum_{i \neq j} \vec{S}_i \cdot \vec{S}_j.$$
Hence, averaging over randomly selected $i, j$ with $i \neq j$,
$$\mathbb{E}_{i,j}\big[\vec{S}_i \cdot \vec{S}_j\big] = \frac{S^2 - \frac{3}{4}M}{M(M-1)}.$$
In a triplet state, $\vec{S}_i \cdot \vec{S}_j = 1/4$ while in a singlet it equals $-3/4$. So, the probability of triplet is
$$P_t = \frac{3}{4} + \frac{S^2 - \frac{3}{4}M}{M(M-1)}.$$
Note that while this measurement of singlet or triplet may change the state of the qubits in $U$, it does not change the total spin, and since we randomly choose $i, j$ each time, the probability of triplet depends only on the total spin. Hence, since the measurements are independent, we may estimate the spin in a time which is polynomial in the inverse of the accuracy. Indeed, since the spin is quantized, the convergence is actually faster than polynomial. Once the number of measurements is polynomially large compared to $S^2$, the convergence in accuracy becomes exponential (footnote 4).
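The dependence of the triplet probability on the total spin alone can be confirmed by exact diagonalization for a few qubits. The sketch below assumes NumPy and uses the closed form $P_t = \frac{3}{4} + \frac{S(S+1) - \frac{3}{4}M}{M(M-1)}$ together with the identity $\vec{S}_i \cdot \vec{S}_j = \mathrm{SWAP}_{ij}/2 - 1/4$; the function names are illustrative.

```python
import itertools
import numpy as np

def triplet_probability(M, S):
    """Probability of outcome t for a uniformly random pair among M qubits of total spin S."""
    return 0.75 + (S * (S + 1) - 0.75 * M) / (M * (M - 1))

def swap_operator(M, i, j):
    """Permutation matrix exchanging qubits i and j of an M-qubit register."""
    dim = 2 ** M
    op = np.zeros((dim, dim))
    for bits in itertools.product([0, 1], repeat=M):
        swapped = list(bits)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        src = int("".join(map(str, bits)), 2)
        dst = int("".join(map(str, swapped)), 2)
        op[dst, src] = 1
    return op

M = 4
identity = np.eye(2 ** M)
pairs = list(itertools.combinations(range(M), 2))
swaps = [swap_operator(M, i, j) for i, j in pairs]

# S_i . S_j = SWAP_ij / 2 - 1/4, so S^2 = 3M/4 + 2 * sum_{i<j} S_i . S_j.
S_squared = 0.75 * M * identity + sum(sw - 0.5 * identity for sw in swaps)
avg_triplet_projector = sum((identity + sw) / 2 for sw in swaps) / len(pairs)

eigenvalues, eigenvectors = np.linalg.eigh(S_squared)
checks = []
for value, vec in zip(eigenvalues, eigenvectors.T):
    S = (-1 + np.sqrt(1 + 4 * value)) / 2          # solve S(S+1) = value
    measured = vec @ avg_triplet_projector @ vec
    checks.append(np.isclose(measured, triplet_probability(M, S)))
```

For $M = 4$ the total spin takes the values 0, 1 and 2, giving triplet probabilities $1/2$, $2/3$ and $1$; these are well separated, which is why a modest number of random pair measurements suffices to tell them apart.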
Using this ability to measure total spin of a set of particles as a primitive, we can measure the operators corresponding to any tree T . The key is to start at the leaves and work towards the root. Measure operators corresponding first to vertices closest to the leaves. Proceed toward the root, only measuring an operator on a given vertex once all operators below that vertex have been measured.
The point is, that while the operators corresponding to different vertices commute with each other, our measurement process reveals extra information. Our measurement process need not commute for different vertices. However, our measurement process for a given vertex does not affect the total spin on vertices closer to the root.
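The leaves-to-root ordering described above is a post-order traversal of the tree. The following sketch shows one way to generate it in plain Python; the nested-tuple tree encoding and the function name `measurement_order` are assumptions of this illustration, and each returned entry is the set of qubits whose total spin would be estimated at that step.

```python
def measurement_order(tree):
    """Return the leaf sets of internal vertices, children before parents (post-order).

    A tree is either an int (a leaf, i.e. a qubit index) or a pair (left, right).
    """
    order = []

    def visit(node):
        if isinstance(node, int):
            return [node]
        left, right = node
        leaves = visit(left) + visit(right)
        order.append(tuple(leaves))        # measure this vertex after both children
        return leaves

    visit(tree)
    return order

# Hypothetical four-leaf tree ((0, 1), (2, 3)).
order = measurement_order(((0, 1), (2, 3)))
# order == [(0, 1), (2, 3), (0, 1, 2, 3)]
```

Measuring in this order means that by the time a vertex is processed, all vertices below it already have definite spin, and the later measurements on larger sets do not disturb those values.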

Preparing states
We now explain how to prepare a state $|\lambda\rangle$ in polynomial expected time. In this case, we work in the reverse direction: we start at the root and work towards the leaves. In particular, we will prepare a sequence of states $|\lambda_0\rangle, |\lambda_1\rangle, \ldots$. This corresponds to a sequence of partially labelled trees $\lambda_0, \lambda_1, \ldots$. In a partially labelled tree, some set of vertices will have labels, and if a vertex $v$ is labelled, then all vertices which are ancestors of $v$ will also be labelled. The partially labelled tree $\lambda_0$ will have the root labelled with $S_{\mathrm{root}} = 0$ and no other labels. Each tree $\lambda_{i+1}$ will have two more vertices labelled than the previous tree $\lambda_i$ in the sequence; this will be done by labelling a pair of vertices which are children of some vertex $v_i$ which is labelled in $\lambda_i$, and the last tree in the sequence will have all vertices labelled and will be the same as $\lambda$.
Each state $|\lambda_i\rangle$ corresponding to some partially labelled tree $\lambda_i$ will have the property that for every operator corresponding to a labelled vertex of $\lambda_i$, the state $|\lambda_i\rangle$ will have the corresponding expectation value for that operator. No other properties of the state are assumed, so a partially labelled tree does not uniquely specify a state.
Preparing the first state $|\lambda_0\rangle$ is easy: one may simply create any choice of $n/2$ singlets. We will construct a primitive operation that we call splitting which has the following properties.
1: The splitting primitive takes as input a set of $n$ qubits with total spin $S$ for some $n$, $S$; there are no other assumptions on the input state.
2: One fixes some $m < n$ and some $S', S''$ with $|S' - S''| \le S \le S' + S''$.
3: The splitting primitive applies s/t measurements to qubits from that set, taking polynomial expected time.
4: The resulting state has (up to exponentially small error) the first $m$ qubits with total spin $S'$ and the remaining $n - m$ qubits with total spin $S''$.
Given this splitting primitive, we can then produce each state $|\lambda_{i+1}\rangle$ from the state $|\lambda_i\rangle$, by applying splitting to the set of qubits corresponding to descendants of $v_i$, with $S', S''$ depending on the labels in $\lambda_{i+1}$ for the children of $v_i$.
We now construct the splitting primitive above. First, let us say that a set of $n$ qubits with total spin $S$ is in canonical form if $n - 2S$ of the qubits form $(n - 2S)/2$ singlet pairs (in some fixed configuration, rather than a superposition) and the remaining $2S$ qubits are in a totally symmetric state. For example, a state on 8 qubits with qubits 1, 3 in a singlet and 4, 7 in a singlet and 2, 5, 6, 8 in a totally symmetric state is in canonical form.
We divide the construction of splitting into four steps.

First Step. First, we take the $n$ qubits and bring them to canonical form. The construction is recursive. If $n = 2S$, then they are already in canonical form. If not, pick a pair of qubits at random and measure s/t. If the result is triplet, pick another pair at random, and try again, continuing until eventually some pair is in a singlet. That gives one singlet; we then bring the remaining $n - 2$ qubits to canonical form using the same algorithm recursively. This takes polynomial expected time.

Second Step. Recall that we wish to divide the set of $n$ qubits into two sets, with $m$ and $n - m$ qubits respectively, and with total spin $S'$ and $S''$, with $S' \geq S''$. Let $\Delta = S' - S''$. Let $S'_{\min}$ and $S''_{\min}$ be the two values for total spin such that $S'_{\min} - S''_{\min} = \Delta$ and $S'_{\min} + S''_{\min} = S$.
Our second step will be to take the state after the first step, which is already in canonical form, and divide it into two sets of qubits, of sizes $m, n - m$ respectively, with total spins $S'_{\min}$ and $S''_{\min}$ respectively, with each set in canonical form. Call these two sets $Q_1$ and $Q_2$. This division can be done easily: take the $2S$ qubits in a totally symmetric state, and place $2S'_{\min}$ qubits in $Q_1$ and the remaining $2S''_{\min}$ in $Q_2$. Then, take the singlets from the state in the first step, and place each singlet into one of the two sets (either $Q_1$ or $Q_2$), so that the total sizes of the two sets are correct.

Third Step. Our third step acts only on certain subsets of the sets $Q_1, Q_2$. We call these subsets $R_1, R_2$ and both have size $2\Delta$. These subsets $R_1, R_2$ are given by choosing $\Delta$ singlets from $Q_1$ and letting those be in $R_1$ and choosing $\Delta$ singlets from $Q_2$ and letting those be in $R_2$. Qubits will remain in the subset they are in after the second step, but their state will change due to this step. What the step will do is bring them to a state where those qubits are now in a totally symmetric state in each set individually (i.e., the $2\Delta$ qubits in $R_1$ are totally symmetric, as are the $2\Delta$ qubits in $R_2$), while the total spin of those $4\Delta$ qubits is still 0.
To do this, we use an iterative algorithm. First, A: bring both subsets $R_1, R_2$ into canonical form. If there are no singlets in either one, terminate. Otherwise, B: if either subset has a singlet, then so must the other (see footnote 5). Pick a pair of qubits, one from a singlet in each subset, and measure s/t. Call these qubits 1, 2 and call the singlets that they are in respectively $s_1, s_2$. Then, regardless of the measurement outcome, measure s/t on both qubits in $s_1$. If the result is s, this pair of measurements has had no effect on the spins; in this case repeat the pair of measurements, continuing until the second measurement gives t. Once the second measurement gives t, go back to step A.
We claim that this takes polynomial expected time. Consider the effect of B. Suppose that before measuring B the total spin in each set was $S$. After B, the total spin in each set must be one of the possibilities $S - 1, S, S + 1$. The total spin squared before B is $S(S + 1)$ and the total spin squared after B has expectation value $S(S + 1) + 2$ (see footnote 6). The process of bringing into canonical form measures the total spin in each set. We claim that the probability that the total spin is $S + 1$ is greater than the probability that it is $S - 1$. Indeed, this must follow in order for the total spin squared to have expectation value $S(S + 1) + 2$ after B, because
$$\frac{(S - 1)S + (S + 1)(S + 2)}{2} = S^2 + S + 1 < S(S + 1) + 2 .$$
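The bias claim can be spelled out in one more step. The following is a sketch only, writing $p_-, p_0, p_+$ (names introduced here, not in the original) for the probabilities that the total spin after B is $S - 1$, $S$, $S + 1$, and using the expectation value of the spin squared quoted above.

```latex
\begin{align*}
p_-(S-1)S + p_0\,S(S+1) + p_+(S+1)(S+2) &= S(S+1) + 2,
  \qquad p_- + p_0 + p_+ = 1,\\
\Rightarrow\quad -2S\,p_- + 2(S+1)\,p_+ &= 2,\\
\Rightarrow\quad p_+ - p_- &= \frac{1 - p_-}{S+1} > 0 .
\end{align*}
```

The last inequality is strict because $p_- = 1$ would force $p_+ = 1$ as well, which is impossible.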
Hence, the total spin does a biased random walk with the bias toward increasing spin, and so the spin must become maximal in at most polynomial time.

Fourth Step. After the third step, we have the following. Each of the two sets $Q_1, Q_2$ has three subsets. Let 1A, 1B, 1C denote the three subsets of the first set and 2A, 2B, 2C denote the three subsets of the second set, with 1A, 1B comprising $R_1$ and 2A, 2B comprising $R_2$. The sets 1A, 2A each contain qubits in some product of singlets. The sets 1B and 2B each contain qubits in a totally symmetric state, with the union of 1B and 2B having total spin 0. The sets 1C and 2C also each contain qubits in a totally symmetric state, but now the union of 1C and 2C is also in a totally symmetric state. In the fourth step, we act on sets 1B and 1C to try to bring them to a totally symmetric state (we also do the same procedure to 2B and 2C with the same goal, but we just describe it for 1B, 1C). To do this, we apply a large number of s/t measurements on qubits randomly chosen from the union of 1B and 1C. If all measurement outcomes are t, then the application of these measurements converges to projecting onto the state where the qubits in the union of 1B and 1C are in a totally symmetric state of total spin $S'$, and we succeed. The convergence to this projector is exponential, once one has more than polynomially many measurements.

Footnote 5: This is because the total spin of the qubits in $R_1, R_2$ is 0. If $R_1$ has at least one singlet, then $R_1$ has total spin less than the maximal value of $\Delta$, and so $R_2$ must also have total spin less than $\Delta$ in order for the total spin of $R_1, R_2$ to be 0.
Footnote 6: We remind the reader that the spin squared is $S(S + 1)$, not $S^2$.
The probability that all measurements are t in this step is at least inverse polynomial. This may be seen by computing the projection of the initial state onto the space where 1B, 1C are totally symmetric and 2B, 2C are totally symmetric. If we fail, so that some measurement is s, we repeat all steps of this process.

Generalizing the Model
The model STP can be generalized in several ways. One natural generalization is to consider symmetries other than $SU(2)$, such as $SU(m)$ for $m > 2$. Another natural generalization is to consider higher spin representations of $SU(2)$.
In such a higher spin representation model, we have qudits, for $d = 2S + 1$ with $S$ integer or half-integer. We may consider having several different kinds of qudits simultaneously, for example having both qubits ($S = 1/2$) and qutrits ($S = 1$). We allow any two qudits (perhaps of different dimensions) to be brought together, and the total spin to be measured.
As a toy physical example, one might imagine deuterium. The deuterium nucleus has total spin 1, while the electron has spin 1/2. The deuterium atom then has total spin 1/2 or 3/2 and there is a hyperfine splitting between these states.
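The possible outcomes of such a total spin measurement follow from the usual addition rule for angular momenta, $|S - S'| \leq S_{\rm tot} \leq S + S'$ in integer steps. A minimal sketch, with the helper name `allowed_total_spins` chosen here purely for illustration:

```python
from fractions import Fraction


def allowed_total_spins(s1, s2):
    """Total spins obtainable by combining spin s1 with spin s2:
    |s1 - s2|, |s1 - s2| + 1, ..., s1 + s2."""
    s1, s2 = Fraction(s1), Fraction(s2)
    spins = []
    s = abs(s1 - s2)
    while s <= s1 + s2:
        spins.append(s)
        s += 1
    return spins


# Deuterium toy example: nuclear spin 1 combined with electron spin 1/2.
deuterium = allowed_total_spins(1, Fraction(1, 2))
```

For the deuterium example this returns the two values 1/2 and 3/2 mentioned above; for two qubits it returns 0 and 1, i.e. the s/t measurement itself.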
Interestingly, this higher spin model can be simulated using just s/t on qubits. To simulate a qudit with spin $S$, use $2S$ qubits in a totally symmetric state. When two qudits with spin $S, S'$ are brought together, we can measure the total spin of the $2S + 2S'$ qubits by repeatedly selecting pairs of qubits uniformly at random and measuring s/t. Exactly as in Section 2, the convergence in accuracy is exponential once we have a number of measurements which is polynomial in the total spin.
After measuring the total spin, we can then bring the $2S + 2S'$ qubits into two sets of $2S, 2S'$ qubits respectively, both in a totally symmetric state, again as in Section 2. Indeed, this is done by the splitting primitive.

PostSTP Equals PostBQP
Let us formally define STP with postselection.

Definition 2.
Let PostSTP be the class of languages $L \subset \{0, 1\}^*$ such that for all inputs $x$ the following holds. A quantum state is initialized as a product of polynomially many singlets. Then some classical algorithm (determined by $L$) takes $x$ as input and outputs a sequence of polynomially many s/t measurements (as well as outcomes to postselect upon for all but the last measurement), taking polynomial time to output this sequence. The sequence of measurements is applied to the input state. Postselection is applied on all but the last measurement, with the promise that the outcomes postselected on have nonzero probability. Finally, if $x \in L$, then the last measurement is s with probability at least $p$ for some $p > 0$, and if $x \notin L$, then the last measurement is s with probability at most $p'$ for some $p'$ strictly less than $p$. The quantities $p, p'$ are independent of input size.

Remark 6. To define postselection in symbols, let $\psi_0$ be the product of singlets which is the initial state. Let $\Pi_j$ be a projector which projects onto the desired outcome (either s or t) of the $j$-th measurement. Let $A_j = \Pi_j \Pi_{j-1} \cdots \Pi_1$ be the product of these projectors up to the $j$-th one. Let there be $K = {\rm poly}(N)$ measurement outcomes that we postselect on, and let $\Pi_{K+1}$ denote the projector onto the singlet outcome for the last (the $(K + 1)$-st) measurement. Then, the probability that the last outcome is s is
$$\frac{\langle \psi_0 | A_K^\dagger \Pi_{K+1} A_K | \psi_0 \rangle}{\langle \psi_0 | A_K^\dagger A_K | \psi_0 \rangle},$$
where we are promised that the denominator is nonzero.
In outline, we prove this result by first showing in Section 4.2 that we can use postselection to simulate imaginary time evolution with Heisenberg interactions. Ref. [5] showed that approximating the ground state energy of a Hamiltonian with Heisenberg interactions is QMA-hard (see also [6]), so this already shows that PostSTP is at least as powerful as QMA: simply evolve under the desired Heisenberg Hamiltonian for a polynomially long imaginary time. To show that we get PostBQP, we use the ability to produce evolution under "time-dependent" Heisenberg interactions in imaginary time, i.e., to vary the Heisenberg Hamiltonian that we evolve under. We use the same encoding as in [5] to show in Section 4.3 that this gives us PostBQP. Appropriate choices of time-dependent Heisenberg Hamiltonians will give us both circuits and measurement. We will have to pay some attention to making errors exponentially small when we do this.
Before doing this, we show in Section 4.1 that the probabilities that we postselect on can only ever become exponentially small in poly(N ).

A Remark On How Small Amplitudes Can Become
Let ψ 0 be the product of singlets which is the initial quantum state for PostSTP.
We show

Lemma 7. The nonzero probabilities that can occur in PostSTP are all $\Omega(\exp(-{\rm poly}(N)))$.

Proof. The probability that the $j$-th measurement outcome has the desired value is
$$\frac{\langle \psi_0 | A_j^\dagger A_j | \psi_0 \rangle}{\langle \psi_0 | A_{j-1}^\dagger A_{j-1} | \psi_0 \rangle} .$$
Clearly, the denominator is at most 1. So, we lower bound the numerator. Note that every t postselection can be replaced by $(1/2)(1 + {\rm SWAP})$, where SWAP is the gate that swaps a pair of qubits. Similarly, every s postselection can be replaced by $(1/2)(1 - {\rm SWAP})$. Hence, the numerator is the sum of $4^j$ different terms, corresponding to replacing each individual projector by either the identity or SWAP. The contribution of any such term to the expectation value is of the form $\pm 4^{-j} \langle \psi_0 | {\rm Permute} | \psi_0 \rangle$, where Permute is the composition of some SWAPs, hence some permutation of the qubits. Recall that $\psi_0$ is the product of singlets, and therefore it is easy to see that for any permutation Permute of the qubits, the expectation value $\langle \psi_0 | {\rm Permute} | \psi_0 \rangle$ is $\pm 2^{-J}$ for some $0 \leq J \leq N - 1$. So, the expectation value is a sum of $4^j$ different terms, each of which equals $\pm 4^{-j} 2^{-J_{\rm Permute}}$, for some $J_{\rm Permute}$ depending on the permutation Permute. Every term is therefore an integer multiple of $4^{-j} 2^{-(N-1)}$, and so is their sum. Hence, if nonzero, the expectation value is at least $4^{-j} 2^{-N-1}$, where $j \leq {\rm poly}(N)$.
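The claim that permutation expectation values in a product of singlets are signed powers of $1/2$ can be checked by brute force on a small instance. The sketch below is illustrative only: it assumes four qubits with singlets on the pairs (0, 1) and (2, 3), and all function names are ours.

```python
from fractions import Fraction
from itertools import permutations, product


def singlet_product_amplitude(bits):
    """Unnormalized amplitude of (|01> - |10>)(|01> - |10>)...
    with singlets on consecutive pairs of qubits."""
    amp = 1
    for k in range(0, len(bits), 2):
        a, b = bits[k], bits[k + 1]
        if a == b:
            return 0
        amp *= 1 if (a, b) == (0, 1) else -1
    return amp


def permutation_overlap(perm):
    """<psi0| Permute |psi0> for the product of singlets psi0."""
    n = len(perm)
    total = 0
    for bits in product((0, 1), repeat=n):
        permuted = tuple(bits[perm[k]] for k in range(n))
        total += singlet_product_amplitude(bits) * singlet_product_amplitude(permuted)
    # Each unnormalized singlet has squared norm 2.
    return Fraction(total, 2 ** (n // 2))


# All 24 permutations of four qubits.
overlaps = {permutation_overlap(p) for p in permutations(range(4))}
```

On this instance the only values that occur are $\pm 1$ and $\pm 1/2$, in line with the $\pm 2^{-J}$ form used in the proof.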

Simulating Imaginary Time Evolution
Let $\psi_{\rm resource}(\epsilon)$ be the four qubit state, on four qubits called $C, D, E, F$, with total spin 0 given by first preparing $C, E$ in a singlet and $D, F$ in a singlet and then acting with $1 + \epsilon S_E \cdot S_F$, and finally appropriately normalizing the state to unit norm.
Now consider the effect of bringing in two extra qubits $A, B$ in some arbitrary state $\psi_{AB}$, and postselecting on $A, C$ being s and on $B, D$ being s. One may commute the postselection on $A, C$ and $B, D$ being in s with the action of $1 + \epsilon S_E \cdot S_F$ used to define $\psi_{\rm resource}(\epsilon)$. So, the effect of this operation is to teleport the state $(1 + \epsilon S_A \cdot S_B)\psi_{AB}$ to qubits $E, F$, while leaving $A, C$ and $B, D$ in singlets.
So, if we can produce $\psi_{\rm resource}(\epsilon)$, we can apply the operation $1 + \epsilon S_A \cdot S_B$. Of course, the ability to apply $1 + \epsilon S_A \cdot S_B$ correspondingly implies the ability to create $\psi_{\rm resource}(\epsilon)$. Thus, the two things (the state and the operation) are equivalent as resources.
Note that the projection onto t is given (up to normalization) by the operation $(1 + \epsilon_0 S_A \cdot S_B)$ with $\epsilon_0 = 4/3$, so we can create $\psi_{\rm resource}(4/3)$.
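The special values of the coupling can be read off from the two eigenvalues of $S_A \cdot S_B$. A minimal sketch, assuming only the eigenvalues $1/4$ (triplet) and $-3/4$ (singlet) stated earlier in the text; the name `operation_eigenvalues` is hypothetical.

```python
from fractions import Fraction

# Eigenvalues of S_A . S_B on the two sectors, as quoted in the text.
DOT = {"triplet": Fraction(1, 4), "singlet": Fraction(-3, 4)}


def operation_eigenvalues(eps):
    """Eigenvalues of 1 + eps * S_A . S_B on the triplet and singlet sectors."""
    eps = Fraction(eps)
    return {sector: 1 + eps * value for sector, value in DOT.items()}


# eps = 4/3 annihilates the singlet, so the operation is proportional
# to the triplet projector; eps = -4 annihilates the triplet instead.
triplet_case = operation_eigenvalues(Fraction(4, 3))
singlet_case = operation_eigenvalues(-4)
```

For any coupling strictly between these two values both eigenvalues are positive, so the operation is invertible rather than a projector.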
We will give a protocol to reduce $\epsilon$, consuming a pair of states $\psi_{\rm resource}(\epsilon)$ with given $\epsilon$ and applying the operation $(1 + \epsilon' S_A \cdot S_B)$ to an arbitrary pair of qubits $A, B$, with $\epsilon'$ given in terms of $\epsilon$ by Eq. (4.1).
If $A, B$ were in a singlet initially, this creates $\psi_{\rm resource}(\epsilon')$. By applying this protocol repeatedly, we can produce a sequence of states $\psi_{\rm resource}(\epsilon_i)$ with $\epsilon_0 = 4/3$ and each subsequent $\epsilon_{i+1}$ obtained from $\epsilon_i$ by Eq. (4.1).

Remark 7. Of course we could also start with $\epsilon_0 = -4$, which projects onto s up to normalization, but this does not lead to anything interesting.

The protocol is as follows. Create a pair of qubits $C, D$ in a singlet, and let $A, B$ be arbitrary. Then, apply $(1 + \epsilon S_A \cdot S_C)(1 + \epsilon S_B \cdot S_D)$, consuming the two copies of $\psi_{\rm resource}(\epsilon)$ to do this, and again project onto $C, D$ in a singlet. A little algebra shows that the resulting state (up to normalization) is a singlet on $C, D$ and the operation $(1 + \epsilon' S_A \cdot S_B)$ is applied. We emphasize that Eq. (4.1) is not a perturbative result for small $\epsilon$, but rather holds for all $\epsilon$.
The cost of applying the $i$-th operation, $(1 + \epsilon_i S_A \cdot S_B)$, is exponential in $i$. At the same time, the magnitude of $\epsilon_i$ decreases doubly exponentially in $i$, roughly squaring at every step. So, for any $x \in (0, 1]$, we can construct the operation $1 + \epsilon S_A \cdot S_B$ for some $\epsilon$ in the interval $[x^2, (4/3)x]$ using a number of operations at most logarithmic in $\epsilon^{-1}$: simply search for the first term in the sequence which lies in this interval.
To obtain operations with negative $\epsilon$, consider a slight modification of the above sequence of operations. Create a pair of qubits $C, D$ in a singlet, and let $A, B$ be arbitrary. Apply

Proof. Construct $(1 + x S_A \cdot S_B)$ for some $x$ which has the same sign as $\tilde\epsilon$, with $x$ sufficiently small compared to $\delta$, but $x$ not much smaller than $\delta^2$, i.e. $x = \Omega(\delta^2)$. Then, take powers of this operation to obtain a suitable $1 + \epsilon S_A \cdot S_B$. Since $x$ is not much smaller than $\delta^2$, it takes at most $O(\delta^{-2})$ operations to do this.
Remark 8. When constructing powers of the operation, it is more convenient to work with an exponentiated form of the operation: $1 + x S_A \cdot S_B = \exp(y S_A \cdot S_B)$ up to normalization, where $y$ depends on $x$.
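The dependence of $y$ on $x$ in Remark 8 can be made explicit by matching eigenvalues on the triplet and singlet sectors. This is a sketch under the assumption that the identity is meant up to an overall scalar, here called $c$ (a symbol introduced for illustration), and that both eigenvalues are positive.

```latex
\begin{align*}
1 + x\, S_A \cdot S_B &= c\, \exp(y\, S_A \cdot S_B),\\
\text{triplet: } 1 + \tfrac{x}{4} = c\, e^{y/4},
  &\qquad \text{singlet: } 1 - \tfrac{3x}{4} = c\, e^{-3y/4},\\
\Rightarrow\quad y &= \ln \frac{1 + x/4}{1 - 3x/4},
  \qquad -4 < x < \tfrac{4}{3}.
\end{align*}
```

In this form the $k$-th power of the operation corresponds to $ky$, and $y = x + O(x^2)$ for small $x$.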
Remark 9. Likely the dependence of the number of operations on $\delta$ can be greatly reduced. We have given an inefficient construction that involves first constructing an operation with a very small $x$ and then taking powers of that operation. One might instead apply operations with several different $\epsilon_i$ with different magnitudes to reduce the total. We do not worry about that here.

Lemma 9. In PostSTP, we can approximate imaginary time evolution under a time-dependent Heisenberg Hamiltonian, up to inverse polynomial error.
The proof of this lemma is immediate, once we define what we mean. A "Heisenberg Hamiltonian" means a Hamiltonian of the form $H = \sum_{i,j} J_{ij} S_i \cdot S_j$, where $J_{ij}$ is some polynomially bounded matrix. By imaginary time evolution under such a Hamiltonian, we mean evolving an initial state under the equation $\partial_t \psi = H\psi$, up to normalization. By "time-dependent", we mean that $J_{ij}$ is allowed to be some polynomially bounded matrix which depends on $t$, so that $\partial_t \psi(t) = H(t)\psi(t)$. Finally, the inverse polynomial error is an error on the right-hand side of the evolution equation: the right-hand side becomes $H(t)\psi(t)$ plus an error term, which is a state whose norm may be made polynomially smaller than that of $\psi(t)$.
Then, to prove the lemma, we simply Trotterize (see footnote 7) the imaginary time evolution equation, and simulate the Trotter steps using postselection by picking $\epsilon$ to be polynomially small in the state $\psi_{\rm resource}(\epsilon)$, and applying Lemma 8.
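The Trotterization step can be illustrated on the smallest non-commuting example. The sketch below is not the paper's construction: it assumes a single qubit with the two terms $X$ and $Z$ (chosen here because both exponentials and the exact answer have closed forms) and compares the Lie-Trotter product $(e^{tX/n} e^{tZ/n})^n$ with the exact imaginary time evolution $e^{t(X+Z)}$.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_x(t):
    """exp(t X) = cosh(t) I + sinh(t) X for the Pauli X matrix."""
    c, s = math.cosh(t), math.sinh(t)
    return [[c, s], [s, c]]


def exp_z(t):
    """exp(t Z) for the Pauli Z matrix."""
    return [[math.exp(t), 0.0], [0.0, math.exp(-t)]]


def exp_x_plus_z(t):
    """Exact exp(t (X + Z)), using (X + Z)^2 = 2 I."""
    r = math.sqrt(2.0)
    c, s = math.cosh(r * t), math.sinh(r * t) / r
    return [[c + s, s], [s, c - s]]


def trotter(t, n):
    """Lie-Trotter approximation (exp(tX/n) exp(tZ/n))^n."""
    step = matmul(exp_x(t / n), exp_z(t / n))
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul(result, step)
    return result


def max_abs_error(t, n):
    """Largest entrywise deviation of the Trotter product from the exact result."""
    exact, approx = exp_x_plus_z(t), trotter(t, n)
    return max(abs(exact[i][j] - approx[i][j])
               for i in range(2) for j in range(2))
```

Comparing `max_abs_error(1.0, 10)` with `max_abs_error(1.0, 1000)` shows the error shrinking as the number of Trotter steps grows, which is the behaviour the proof relies on.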
This already immediately implies that

Corollary 2. PostSTP contains QMA.
Proof. By [5], approximating the ground state energy of a Heisenberg Hamiltonian to inverse polynomial error is QMA-hard. For large $t$, the operator $\exp(-Ht)$ approaches a ground state projector up to normalization, and hence by approximating this imaginary time evolution up to inverse polynomial error, we can estimate the ground state energy of this Hamiltonian.

Simulating PostBQP
However, our goal is not just to prove that PostSTP contains QMA, but rather that PostSTP is equivalent to PostBQP.
To do this, we again use results from [5]. We use the result (see section 5.1 of that paper) that using Heisenberg interactions we can implement logical qubits (using three physical qubits for each logical) and obtain terms $X, Z$ on any given logical qubit as well as terms $XX, ZZ$ on pairs (see also [7] for an encoding using the Heisenberg interaction). So, it suffices then to consider a model of computation in which we have qubits (which for the rest of this subsection refer to the logical qubits of [5]), and the ability to implement imaginary time evolution under time-dependent $X, Z, XX, ZZ$, up to inverse polynomial error. Throughout this subsection, when we refer to time, we mean imaginary time.

Footnote 7: The evolution $e^{t(\sum_i H_i)}$ can be efficiently approximated by the Lie-Trotter product formula $(\prod_i e^{tH_i/n})^n$ for $n$ large enough.
By turning on an $XX$ term on a pair of qubits for a long time and then turning it off (while leaving other terms off), we can approximately project onto the $XX = +1$ or $XX = -1$ eigenstates, and similarly using $ZZ$ terms we can approximately project onto $ZZ = +1$ or $ZZ = -1$ eigenstates. Further, by turning on a sum of $X$ and $Z$ terms on a single qubit (while leaving other terms off), we can prepare an ancilla qubit in a state which is approximately any desired pure state $\cos(\theta)|0\rangle + \sin(\theta)|1\rangle$.
These abilities suffice. First, we can approximately prepare a qubit in a $|+\rangle$ state, and then using it as an ancilla, we can approximately apply a CNOT from a source to a target by using the ability to approximately postselect on $ZZ = +1$ and $XX = +1$ eigenstates. This is the same construction that we used in Section 1.1.1, from [15,23]. Note importantly that here we are using imaginary time evolution to approximately apply a unitary gate. This should not be surprising; after all, the well-known idea of measurement based quantum computation [20] uses a sequence of measurements to apply unitary gates.
Also, by preparing ancillas which are approximately in states $\cos(\theta)|0\rangle + \sin(\theta)|1\rangle$ for other choices of $\theta$, and using the CNOT gates and state injection, we can approximately implement rotations $\exp(i\theta Y)$, where $Y$ is the single qubit Pauli $Y$.
The ability to do CNOT and rotations $\exp(i\theta Y)$ allows universal quantum computation. So, if we ignore issues of error (i.e., the fact that all our constructions only approximately gave these gates), we can implement arbitrary quantum circuits and further we can postselect on measurement outcomes since we can implement imaginary time evolution under $Z$, giving PostBQP.
The issue of error can be resolved by implementing a fault tolerant construction using these approximate gates. Note that the error in the gates can be made arbitrarily small. Indeed, we can even make the error in individual gates $1/{\rm poly}(N)$ for any polynomial at the cost of a polynomial overhead, so we can make the error in gates polynomially smaller than the inverse of the total number of gates. Thus, if our goal were simply to simulate BQP, where we would be satisfied with an inverse polynomial error in the output probabilities of our quantum circuit, there would be no need for any fault tolerance. However, since we want to simulate PostBQP, we need exponentially small errors. So, some fault tolerant construction is needed.
Of course, the ability to make the error in individual gates polynomially small compared to the inverse of the total number of gates certainly simplifies the fault tolerant construction. However, we claim that in fact the usual threshold theorems can be applied to our setting and so long as the error in individual gates is sufficiently small, we can make the error in logical operations exponentially small.
The usual threshold theorems involve replacing idealized unitary gates by CPTP maps that approximate (in diamond norm) the desired unitary operations. Instead, we are replacing idealized unitary gates by linear operators that act on pure states (rather than mixed states) that are close in operator norm to the desired unitary. However, we now show that the usual threshold theorems can be adapted to this case.
Consider any given gate in the circuit, which ideally would be implemented by some unitary we will call $U$. Suppose instead we implement some map $A$ (which is a linear operator on pure states) with $\|A - U\| \leq \epsilon$, where $\|\cdot\|$ is the operator norm. Then, $(1 - \epsilon)A$ has singular values bounded by 1, since $\|(1 - \epsilon)A\| \leq (1 - \epsilon)(1 + \epsilon) \leq 1$, and if we define $B$ to be any operator such that $B^\dagger B = 1 - (1 - \epsilon)^2 A^\dagger A$, then $(1 - \epsilon)A$ and $B$ are the Kraus operators of a CPTP map. Making such a replacement for all gates in the circuit, we get a "noisy circuit" where every gate has two Kraus operators, with the first Kraus operator being $O(\epsilon)$ close to the ideal unitary. The usual threshold theorems apply to this noisy circuit for sufficiently small $\epsilon$, showing that the logical error is exponentially small. Then, the situation relevant for us is one in which every time a CPTP map is applied for some gate, we pick the first Kraus operator (that is, $(1 - \epsilon)A$ in this case), rather than the second. This can be physically thought of as selecting a particular noise pattern.
So, we ask: if we select the first Kraus operator for every CPTP map in the noisy circuit, is the error still exponentially small? Intuitively, there is no problem: the first Kraus operator is closer to what we want than the second Kraus operator, so the situation in which we select that Kraus operator every time should be even better than a random choice.
To prove this, of course one could reprove the threshold theorem, using our gates (which are not noisy in that they map pure states to pure states but which still only approximate the desired gates). Indeed, this could be done simply by following through some existing construction and showing that the error is exponentially reduced if the first Kraus operator is selected every time. However, we would like to minimize our effort, and show that the error is exponentially small using the standard threshold theorem as a "black box", without delving into the proof. To do this, we use a simple trick. Use some hierarchical construction to prove the threshold theorem; see for example [10] for a review of various constructions. Using such a hierarchical construction of the threshold theorem, one considers codes of some $O(1)$ size, and proves that each step of the hierarchy reduces the error rate. If the error rate is $\epsilon$ at some level, then it is $O(\epsilon^k)$ for some $k > 1$ at a higher level of the hierarchy, and so for sufficiently small $\epsilon$ the error rate reduces. We regard this error rate as a sum over different trajectories, corresponding to different choices of Kraus operators in each CPTP map. The contribution of the trajectory where we select the first Kraus operator in every map is $1 - O(\epsilon)$, with the constant hidden in the big-O notation depending on the size of the code. So, for sufficiently small $\epsilon$, the contribution of that trajectory is the dominant one, and so the error rate for that trajectory at the next level of the hierarchy is also $O(\epsilon^k)$, i.e., the hierarchical construction reduces the error rate for sufficiently small $\epsilon$ for this case as well.
Remark 10. Ref. [6] shows how to simulate 2D topological phases from imaginary time Heisenberg evolution. Anyons may be built into these simulations as boundary conditions, and time dependence of the Hamiltonians allows these anyons to be braided adiabatically, and then, later, collective (topological) charge states to be measured by simulating interferometry. In other words, once one can build imaginary time Heisenberg evolution (as in Lemma 9), one can simulate the complete operation of a topological quantum computer. This provides an alternative path from Lemma 9 to showing that PostSTP equals PostBQP, the post-selection in the conclusion stemming from our ability to post-select measurement outcomes of the simulated topological quantum computer, and the topological protection of the state allowing us to make errors exponentially small. While this is an extremely concise argument in outline, some details regarding efficiency should be filled in; this is why we gave the more explicit argument first.

A Approximating rotations by powers of a fixed rotation
Let θ be an irrational multiple of 2π. Then, the integer multiples of θ are dense in the unit circle. This has the implication that, given the ability to implement rotation exp(iZθ) on a single qubit, one can approximate the rotation exp(iZφ) for any φ to any desired accuracy by repeating the operation exp(iZθ) sufficiently many times. However, it becomes interesting to ask: how many times must one repeat the operation exp(iZθ) to attain some given accuracy. This is equivalent to the question: given some φ and some δ > 0, what is the smallest n such that nθ is within distance δ of φ, modulo 2π.
The answer to the question depends on the irrationality measure $\mu$ of $\theta/(2\pi)$. The irrationality measure $\mu$ of a number $x$ is defined to be the smallest number such that for any $\epsilon > 0$, we have
$$\left| x - \frac{p}{q} \right| > \frac{1}{q^{\mu + \epsilon}}$$
for all sufficiently large integers $q$, for all $p$.
Some numbers have infinite irrationality measure; these are called Liouville numbers. As an example, consider a number such as $\sum_{j>0} 10^{-f(j)}$, where $f(j)$ is some fast growing function such as $j!$. Almost all (in the sense of measure theory) real numbers have irrationality measure 2. The arctangent of any algebraic number has finite irrationality measure [1].

Lemma 10.
Assume $\theta/(2\pi)$ is irrational with irrationality measure $\mu$. Let us define the distance between angles in the obvious way, as the shortest distance between them on the circle; we write this distance using an absolute value sign.
Then, for any $\epsilon > 0$, for any $\phi$ and any $\delta > 0$, there is some $m$ with $|m\theta - \phi| \leq \delta$ and with $m \leq O(1/\delta^{\mu + \epsilon})$, where the constant hidden in the big-O notation depends on $\epsilon$.
Proof. Consider the set $\{0, \theta, 2\theta, \ldots, k\theta\}$, where $k = \lceil 2\pi/\delta \rceil$. All elements of this set are distinct. So, there must be two different elements, call them $j_1\theta, j_2\theta$, with $|j_1\theta - j_2\theta| \leq (2\pi)/k$. Let $q = j_2 - j_1$, so $q\theta$ is within distance $(2\pi)/k \leq \delta$ of 0. By the definition of irrationality measure, multiplying both sides by $2\pi q$, we get $|q\theta - 2\pi p| > 2\pi/q^{\mu + \epsilon - 1}$ for sufficiently large $q$. So, for sufficiently large $q$, we have $q\theta$ at least distance $2\pi/q^{\mu + \epsilon - 1}$ from 0. For smaller $q$, it is possible that $q\theta$ is closer than that, but there are only finitely many such $q$, so $q\theta$ is at least $\theta_{\min}$ for some $\theta_{\min} > 0$ depending on $\epsilon$. Now consider the sequence of angles $0, q\theta, 2q\theta, 3q\theta, \ldots$. These angles change by at most $\delta$ from one to the next, so some angle in the sequence is within $\delta$ of $\phi$.
At the same time, they change by at least $\Delta \equiv \min(\theta_{\min}, 2\pi/q^{\mu + \epsilon - 1})$, so we only need to consider terms up to $2\pi\Delta^{-1}$ in the sequence to get an angle within $\delta$ of $\phi$. Note that $q$ is bounded by $2k$, so it suffices to consider terms up to $O(k^{\mu + \epsilon - 1})$ in the sequence. Since $q$ is $O(k)$, this means we consider $m$ up to $O(k^{\mu + \epsilon})$.
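For a concrete angle, the smallest number of repetitions can simply be found by search. The sketch below is a brute-force illustration, not the constructive argument of the proof; the function names and the search cap `max_m` are assumptions introduced here, and floating-point arithmetic limits how small $\delta$ can meaningfully be.

```python
import math


def circle_distance(a, b):
    """Shortest distance between two angles on the circle."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def smallest_multiple(theta, phi, delta, max_m=10**6):
    """Smallest m >= 0 with |m*theta - phi| <= delta on the circle,
    or None if no such m is found up to max_m."""
    for m in range(max_m + 1):
        if circle_distance(m * theta, phi) <= delta:
            return m
    return None


# Example with theta = 1 radian, for which theta / (2*pi) is irrational.
m = smallest_multiple(1.0, 0.5, 0.05)
```

Plotting the result against $1/\delta$ for a fixed $\theta$ gives an empirical feel for the polynomial scaling in the lemma.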

B No-leakage and equiangular sequences
Our paper has antecedents in a previous work [8], where the possibility of chemically protected quantum computation [9, section 8] was introduced; the proposal has been to leverage the symmetry of small molecules for quantum computation, by exploiting a coupling between orbital angular momentum and nuclear spin. Such a coupling may prove useful in implementing STP, but in this paper we have considered the abstracted problem where this simplest possible measurement, s/t, of a collective spin state is taken as the primitive.
In contrast to the approach presented in the main text, a more geometrical approach for analyzing the computational power of the s/t model is to consider no-leakage or equiangular sequences. This approach also originates from [9], where it is shown that the notion of equiangularity is key to many measurement based models that imitate unitary evolution. Indeed, the main result there involves a model where the measurement based online part of the computation is done through equiangular sequences, as these are sequences in which any undesired outcome of a measurement can be corrected without leakage or starting the computation all over again.
In [9], equiangularity of a pair of projections is defined. Here, we need to define equiangularity for an ordered pair and generalize some of the results in [9]. From now on, a projection $P$ will refer both to the operator and to the corresponding image (hyper-)plane.

Definition 3.
We call the ordered pair of projections $(Q, P)$ equiangular if $PQP = \alpha^2 P$ for some $\alpha > 0$.
Remark 11. Equiangular is a suitable name for the above algebraic equation. In fact, if $PQP = \alpha^2 P$ for some $\alpha > 0$, then $\alpha$ is equal to $\cos(\theta)$ with all dihedral angles from $P$ to $Q$ being equal to $\theta$. Also note that $(Q, P)$ being equiangular implies the same for $(1 - Q, P)$. The proof of all these involves elementary linear algebra; see [9].
Remark 12. An intuitive consequence of the above is that the application of Q to a computational space defined in the plane P is a unitary embedding up to some positive scale, meaning restricted to the image of P , the map is a unitary up to some positive scale (see [9] for the detailed proof).
The above remark can also be paraphrased as Q defines a no-leakage map from the plane P to the plane Q.
Suppose we want to apply the projective measurement $\{Q, 1 - Q\}$ following $P$. Then, we can always force the outcome to be $Q$. Indeed, assume we get $(1 - Q)P$. We try $\{P, 1 - P\}$ and get either $P(1 - Q)P \propto P$ (so back at $P$), or $(1 - P)(1 - Q)P$. In the latter case, we try $\{Q, 1 - Q\}$ again, and get either $Q(1 - P)(1 - Q)P \propto QP$, which is the desired outcome, or $(1 - Q)(1 - P)(1 - Q)P \propto (1 - Q)P$, which means we are back at our first unsuccessful measurement. Notice the last case happens with a fixed probability depending on $\alpha$. Thus, repetition always exponentially suppresses the probability of failure.
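The recovery argument rests on two identities that follow from $PQP = \alpha^2 P$ by direct algebra: $Q(1-P)(1-Q)P = -(1-\alpha^2)\,QP$ and $(1-Q)(1-P)(1-Q)P = \alpha^2 (1-Q)P$. The sketch below checks them numerically on the simplest equiangular pair, two rank-one projectors in the plane at angle $\theta$; this example and all helper names are ours, chosen only for illustration.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def chain(*matrices):
    """Product of several matrices, left to right."""
    out = matrices[0]
    for m in matrices[1:]:
        out = matmul(out, m)
    return out


def scaled(c, a):
    return [[c * x for x in row] for row in a]


def complement(p):
    """The projector 1 - p."""
    n = len(p)
    return [[(1.0 if i == j else 0.0) - p[i][j] for j in range(n)]
            for i in range(n)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def line_projector(angle):
    """Projector onto the line spanned by (cos(angle), sin(angle))."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * c, c * s], [c * s, s * s]]


theta = 0.7  # arbitrary illustrative angle
P, Q = line_projector(0.0), line_projector(theta)
alpha_sq = math.cos(theta) ** 2

# Equiangularity: P Q P = alpha^2 P.
equiangular = close(chain(P, Q, P), scaled(alpha_sq, P))
# Recovery succeeds: Q (1-P) (1-Q) P is proportional to Q P.
success_branch = close(chain(Q, complement(P), complement(Q), P),
                       scaled(-(1 - alpha_sq), matmul(Q, P)))
# Recovery fails: (1-Q) (1-P) (1-Q) P is proportional to (1-Q) P.
failure_branch = close(chain(complement(Q), complement(P), complement(Q), P),
                       scaled(alpha_sq, matmul(complement(Q), P)))
```

All three flags evaluate to true for any choice of `theta`, since the identities only use $P^2 = P$, $Q^2 = Q$ and the equiangularity condition.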
The above process is the key reason behind the use of equiangular ordered pairs in a measurement model, as it ensures recovery in case of failure. Let us now define an equiangular sequence of projections, where the same recovery protocol works.

Definition 4.
An equiangular sequence $(P_k, \ldots, P_1)$ satisfies:

Remark 13. In an equiangular sequence, applying $P_{i+1}$ onto the image of $P_i \cdots P_1$ is a unitary embedding up to some positive scale. The argument is simply a more involved version of the ordered pair case. Recovery is similarly ensured, i.e. we can always force $P_{i+1}$ on $P_i \cdots P_1$.
Next, we define a no-leakage sequence. Intuitively, this is a sequence where the measurements do not "leak" quantum information to the environment. Precisely, we mean:

Definition 5.
The sequence $(P_k, \ldots, P_1)$ is no-leakage if the projection $P_{i+1}$ is a unitary embedding up to some scale from the image of $P_i \cdots P_1$ for all $1 \leq i \leq k - 1$ to a subspace of $P_{i+1}$.
Is a no-leakage sequence necessarily equiangular? The answer is yes when $k = 2$. In general, linear algebra tells us that the orthogonal eigenspaces $E_i$ of $P_i P_{i+1} P_i$ that are subspaces of $P_i$ decompose $P_i = \bigoplus E_i$ into subspaces that have all dihedral angles with $P_{i+1}$ equal. It is not hard to show that in an equiangular sequence, the image of $P_i \cdots P_1$ falls inside one of those eigenspaces and $\alpha_{i,i+1}$ is the cosine of that dihedral angle. However, in a no-leakage sequence, the only requirement is that the image of $P_i \cdots P_1$ is a plane whose dihedral angles with $P_{i+1}$ are all equal, and that plane could be formed by some complicated combination of vectors in different eigenspaces.
In contrast to the equiangular case, there is generally no guarantee of recovery in a no-leakage sequence. Still, such sequences are of interest in the case of PostSTP, since by definition, we are allowed to force the outcome.
Now, we investigate the computational power of each type of sequence, starting with equiangular sequences of s/t. For a computation to be done on a spin-0 subspace of $2N$ spins, called $\mathrm{spin}_0(2N)$, initialized at a singlet dimerization, we use an even number $2M$ of ancillas, similarly initialized at a singlet dimerization. The sequence starts and ends with the projection onto a particular singlet state for the ancilla pairs, i.e. ancilla pairs $s$, and in between, s/t measurements can be made on the $N + M$ pairs. This ensures that we start and end at the same computational space $\mathrm{spin}_0(2N)$. Hence we define:
Equiangularity forbids a dimension decrease of the computational space, as the map at each stage must be unitary. Notice that a dimension decrease is even 'worse' than leakage, as leakage can occur even if the map is invertible but not unitary. Thus at each stage of the computation, we must observe a matching of $M$ pairs of spins on which a projection has been made, meaning that each projection $P_i$ is applied on a pair that shares exactly one spin with the previous $M$ pairs; sharing no spins would decrease dimension, and sharing both spins would either decrease dimension or be a redundant projection. This is shown in more detail in the proof of the next lemma.
For example, let us denote all ancillas by $a_1, \ldots, a_{2M}$ and the computational spins by $c_1, \ldots, c_{2N}$. At first, $a_{2i-1}, a_{2i}$ are paired in a singlet state. Then, it is not hard to show that $P_1$ can be any $s$ or $t$ on any pair of the form $a_i c_j$. If $M = 1$ (only two ancillas), then each step involves applying $s$ or $t$ on a pair sharing one spin with the previous measurement. Of course, to satisfy equiangularity, there may not be complete freedom in choosing the other spin. If $M > 1$, then assuming $P_1$ applies on $a_1 c_1$, $P_2$ must share a spin with $\{a_1, c_1, a_3, a_4\}$; notice that $a_2$ has been discarded, as it was paired with $a_1$.
Remark 14. We may allow consecutive projections onto pairs that are disjoint, if the goal is to apply an equiangular sequence on a particular subspace of $\mathrm{spin}_0(2N)$, i.e. the computational space is a subspace of $\mathrm{spin}_0(2N)$. We explored this approach in our simulations to some extent, but without success.
We now show that no-leakage sequences (and therefore equiangular sequences) with one pair of ancillas only perform signed permutations. This limitation is a reason behind the use of alternative/more analytical methods in Theorem 1.

Lemma 11.
With M = 1 pair of ancillas, every s/t no-leakage sequence is some signed permutation on the 2N computational spins.
Proof. Let us assume $N = 2$. The ancillas are $a_1, a_2$ and the computational spins are $c_1, \ldots, c_4$. At the start, up to permutation, the only possible measurement is on $a_1, c_1$. Indeed, if we measure $a_1, a_2$, then it leaves the state unchanged, and if we measure any $p, q$ with $p, q \in \{c_1, \ldots, c_4\}$, then it decreases the dimension of the computational space. So we must measure one ancilla and one non-ancilla; without loss of generality, let it be $a_1$ and $c_1$.
If the outcome of the $a_1, c_1$ measurement is singlet, then the $a_1, a_2$ ancilla pair has simply been replaced by an $a_1, c_1$ ancilla pair, with $c_1$ being teleported to $a_2$. This is a simple permutation. So, we may assume the outcome is triplet.
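The teleportation step can be verified directly on three spins. In the sketch below, a single computational spin $c_1$ carries an arbitrary state `sigma` (a simplifying assumption: in the text $c_1$ is part of a larger spin-0 state), and the array names are illustrative.

```python
import numpy as np

# Singlet amplitudes S[x, y] for two spins: (|01> - |10>)/sqrt(2).
S = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(2.0)

# Arbitrary normalized state of the computational spin c1 (illustrative).
rng = np.random.default_rng(0)
sigma = rng.normal(size=2) + 1j * rng.normal(size=2)
sigma /= np.linalg.norm(sigma)

# State tensor with axes ordered (a1, a2, c1): ancillas in a singlet, c1 in sigma.
before = np.einsum("ab,c->abc", S, sigma)

# Singlet projection on (a1, c1) is (1 - Swap(a1, c1)) / 2; the swap exchanges axes 0 and 2.
after = (before - before.transpose(2, 1, 0)) / 2.0

# Expected result: (a1, c1) in a singlet, with sigma now sitting on a2.
expected = np.einsum("ac,b->abc", S, sigma)

print(np.allclose(after, 0.5 * expected))
```

The projected state equals one half of the expected state, so the singlet outcome occurs with probability $1/4$ and moves the state of $c_1$ onto $a_2$ unchanged.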
What is the next possible measurement? It cannot be a pair $p, q \in \{c_2, c_3, c_4\}$, as that would again decrease dimension. One may readily see this as spins $c_2, c_3, c_4$ have not been touched yet, so for some initial states it is possible that such a pair is definitely in a singlet, but not for all.
Notice the state after the first measurement is symmetric under the interchange of $a_1$ and $c_1$, as they are in a triplet. So suppose the pair $a_1, a_2$ or $c_1, a_2$ is measured. If the outcome is a singlet, the whole sequence has gained nothing: we return to $a_1, a_2$ being an ancilla pair in a singlet, and the initial spin $c_1$ is returned to itself after this sequence. So, suppose we measure $a_1, a_2$ and the outcome is a triplet.
To analyze this situation, let us introduce some notation for the state after the first measurement. After the first measurement, the state is $(1 + \mathrm{Swap}(c_1, a_1))\psi$, where $\psi$ is the initial state. The state $\psi$ is annihilated by the projection onto the $a_1, a_2$ triplet. So, we can assume the state after the first measurement is $\mathrm{Swap}(c_1, a_1)\psi$. However, this is equivalent to the initial state up to permutation of spins (i.e. we can replace the first measurement by a permutation), so nothing is gained.
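This step can be written out using the standard expression of the pair projections through the swap operator for two spin-1/2 particles. Normalization factors are dropped as in the text, and the subscript notation $t_{p,q}$, $s_{p,q}$ for the triplet and singlet projections on a pair is used here for convenience.

```latex
\begin{align}
t_{p,q} &= \tfrac{1}{2}\bigl(1 + \mathrm{Swap}(p,q)\bigr), \qquad
s_{p,q} = \tfrac{1}{2}\bigl(1 - \mathrm{Swap}(p,q)\bigr), \\
t_{a_1,a_2}\,\bigl(1 + \mathrm{Swap}(c_1,a_1)\bigr)\psi
 &= \underbrace{t_{a_1,a_2}\,\psi}_{=\,0} + t_{a_1,a_2}\,\mathrm{Swap}(c_1,a_1)\,\psi
 = t_{a_1,a_2}\,\mathrm{Swap}(c_1,a_1)\,\psi .
\end{align}
```

The first term vanishes because $a_1, a_2$ are in a singlet in $\psi$, which is why only the permuted state contributes once the triplet outcome on $a_1, a_2$ is imposed.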
The next possibility for the second measurement is to measure $p, q$ with $p = a_1$ and $q \in \{c_2, c_3, c_4\}$. Equivalently, we could pick $p = c_1$, as $a_1, c_1$ are symmetric. This case also leaks information, though a more detailed check is needed. If spin $c_1$ started out as some $\sigma$ at the start of the sequence, then spin $a_1$ and spin $c_1$ are both "correlated" with that initial $\sigma$ after the first measurement. Therefore, a measurement on such a $p, q$ pair leaks information (this can also be checked on a computer).
Finally, we might pick $p = a_2$ and $q \in \{c_2, c_3, c_4\}$, but this also leaks information. Say $q = c_2$ is picked. If at the start of the sequence we had spins $c_1, c_2$ in a singlet, then a measurement of $a_2, c_2$ will always yield triplet, therefore leakage occurs in this case as well.
Notice the proof does not rely on N = 2 and works for all N .
Remark 15. For equiangular sequences, a computer search over random s/t sequences, with hundreds of thousands of iterations for each of ten different random seeds, shows that increasing the number of ancillas does not make any difference to the statement of the lemma above.
As previously mentioned, there could be interesting operators for PostSTP from no-leakage sequences. In fact, by increasing the number of ancillas, the previous lemma no longer holds:
Proof. The proof is by computer search. The following sequence is no-leakage:
$(s_{a_1,a_2} s_{a_3,a_4} s_{a_5,a_6},\ t_{a_6,c_1},\ s_{a_2,a_6},\ t_{a_6,c_2},\ s_{a_4,a_6},\ t_{a_6,c_4},\ s_{a_1,a_2} s_{a_3,a_4} s_{a_5,a_6})$,
where $s_{-}, t_{-}$ is the singlet or triplet projection on the pair $-$. The sequence gives a unitary with infinite order on $\mathrm{spin}_0(4)$, as its eigenvalues are $\frac{3\sqrt{3}}{2\sqrt{7}} \pm \frac{1}{2\sqrt{7}} i$, which correspond to an irrational angle [2]. The following sequence of projections is no-leakage and gives a unitary with order 12 on the space $\mathrm{spin}_0(4)$, hence cannot be a signed permutation on 4 spins:
$(s_{a_1,a_2} s_{a_3,a_4},\ t_{a_3,c_3},\ t_{a_2,c_2},\ s_{a_2,c_3},\ t_{a_3,c_2},\ s_{a_3,c_4},\ s_{a_1,a_2} s_{a_3,a_4})$.
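As a quick consistency check on the quoted eigenvalues, one can confirm that they lie on the unit circle, as required for a unitary, and read off the rotation angle. The symbols $\lambda_\pm$ and $\theta$ are introduced here only for this check.

```latex
\begin{align}
\lambda_\pm &= \frac{3\sqrt{3}}{2\sqrt{7}} \pm \frac{1}{2\sqrt{7}}\, i, \\
|\lambda_\pm|^2 &= \frac{27}{28} + \frac{1}{28} = 1, \\
\lambda_\pm &= e^{\pm i\theta}, \qquad \cos\theta = \sqrt{\tfrac{27}{28}}, \qquad \tan\theta = \frac{1}{3\sqrt{3}} .
\end{align}
```

The unitary has infinite order exactly when $\theta/\pi$ is irrational, which is the property the text attributes to [2].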
Unlike the case of $N = 2$, computer search surprisingly was unable to find any interesting operator for $N > 2$, even when using a higher number of ancillas and different variations on the operators that can be used (for example, simultaneous projection of triplet and/or singlet on different pairs). Thus, we are unable to use no-leakage sequences to prove PostSTP = PostBQP.
We can relax the requirement even further, by simply demanding that the result of the whole sequence be no-leakage, i.e. give a unitary operator, without requiring each step to be no-leakage. One should imagine an elliptical distortion created by one projection being canceled by a subsequent one. This is still allowed, as in PostSTP we can force the outcome. This makes it possible to obtain infinite order unitaries with a sequence of length 3 on $\mathrm{spin}_0(4)$ with only 2 ancillas, such as $(s_{a_1,a_2},\ t_{a_2,c_2},\ t_{a_2,c_1},\ t_{a_2,c_4},\ s_{a_1,a_2})$.
This corresponds to a unitary with eigenvalues $\frac{3\sqrt{3}}{2\sqrt{7}} \pm \frac{1}{2\sqrt{7}} i$, even though $t_{a_2,c_1}$ leaks information (gives an invertible but nonunitary map). However, computer search does not deliver any interesting unitary (i.e. not a signed permutation) given higher numbers of spins and ancillas.
C Statistics of amplitudes from random s/t sequences
In this appendix, we provide informal arguments independent from the main text, supplemented with numerical results which support the inference that the s/t model is not classically simulable. Indeed, we believe that it is not possible to efficiently classically compute the probability of a single measurement being s or t with inverse polynomial error. We present a series of numerical experiments, and then finally use these to propose an average case sampling problem in Problem 1.
To this end, we examine the statistics of amplitudes from random s/t sequences. Suppose we start with a singlet dimerization on $2N$ spins and pick random pairs and measure whether they are singlet or triplet without postselection. After many rounds of measurement, we pick one final random pair and consider the probability that the pair is a singlet. If this probability is exponentially close to $\frac{1}{4}$, this is not very useful, as it can be classically simulated trivially by simply guessing that the value is $\frac{1}{4}$. If instead it deviates from $\frac{1}{4}$ by $1/\mathrm{poly}$, then it may be hard to simulate classically, and allow one to devise an experiment similar to the Google "quantum supremacy" experiment [3].
However, the naive experiment discussed above fails to deliver any interesting result. To understand this, consider the random sampling performed in [3] using random circuits and individual spin measurements instead of our s/t measurements. There, the standard approximation is to guess that the random circuit produces a random state (Haar random, as an approximation). This random state is well described by choosing the amplitude of every single computational basis state from a Gaussian distribution. So, if one measures a single spin on the final state, the distribution is exponentially close to flat ($(\frac{1}{2}, \frac{1}{2})$ probability). The reason is that, for $2N$ spins, there are $2^{2N-1}$ basis states where the selected spin is up and $2^{2N-1}$ where it is down, and adding all those random numbers gives very small fluctuations that cancel each other out to give a probability exponentially close to $(\frac{1}{2}, \frac{1}{2})$. The same holds if one measures $2, 3, 4, \ldots, O(1)$ spins; their joint probability distribution is still very close to flat. So, we expect similar behavior here: the simple protocol of the above paragraph will not be too interesting, and we give a more interesting protocol later.
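The flatness of single-spin marginals for random states is easy to observe numerically. The sketch below draws Gaussian amplitudes, as in the approximation just described, and reports how far the probability of "up" on one spin is from $\frac{1}{2}$. The function name, the seed, and the system sizes are illustrative choices.

```python
import numpy as np

def single_spin_up_probability(num_spins, spin, rng):
    """Probability that `spin` is up in a random state with Gaussian amplitudes."""
    dim = 2 ** num_spins
    amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    probs = np.abs(amps) ** 2
    probs /= probs.sum()                      # normalize the state
    labels = np.arange(dim)
    is_up = ((labels >> spin) & 1) == 0       # bit 0 of the basis label encodes "up"
    return float(probs[is_up].sum())

rng = np.random.default_rng(0)
for num_spins in (4, 8, 12, 16):
    p_up = single_spin_up_probability(num_spins, 0, rng)
    print(num_spins, abs(p_up - 0.5))
```

The printed deviations shrink roughly like the inverse square root of the Hilbert space dimension, i.e. exponentially in the number of spins.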
We can see this flatness in practice using a simulation of a sequence of 1000 measurements on 18 spins (Fig. C.1) in this protocol, followed by computing the probability of triplet on each pair, of which there are 153. At first glance, there seem to be interesting fluctuations with high magnitude, but what we actually see, as explained later, is a rather boring deviation from exponential flatness. As expected, most of the graph is hovering around $\frac{3}{4}$; but notice that there are some triplets and singlets (values 1 and 0 on the graph respectively).
We present an informal argument regarding these (extreme) fluctuations, why they exist even for very long runs like 1000 or more, and why they are not an indication of quantum supremacy. Consider a gas of spins and draw a straight line between two spins when they form a singlet, a wiggly line for a triplet, and otherwise no line. Suppose the gas reaches a state where there are no lines. Then, as we select a pair and measure s/t, some line (straight or wiggly) forms. By selecting another random pair, likely disjoint from the first, another line forms. By the time about $(1 - 1/\sqrt{2})\,2N$ of the spins are paired off by lines of one kind or the other, it becomes equally probable that the next randomly selected pair will touch/not touch a paired spin and that the measurement will break the preexisting line and replace it with another. We should note that breaking a line is the worst case, corresponding to the singlet measurement outcome, and we are assuming such an outcome for every measurement, giving us the ratio $(1 - 1/\sqrt{2})$ (if triplet is the outcome, the previous line may remain, and we get more lines as a result). On this "fresh" subpopulation, the process, even if it is at step 1001, looks like it is starting from scratch.
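A small-scale version of the experiment behind Fig. C.1 can be sketched as follows: start from a singlet dimerization, measure random pairs, and then tabulate the triplet probability of every pair. This is not the authors' code; the system size (8 spins rather than 18), the number of measurements, the seed, and all function names are illustrative assumptions.

```python
import itertools
import numpy as np

def dimerized_state(num_pairs):
    """Product of singlets on spins (0,1), (2,3), ..."""
    singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2.0)
    psi = np.array([1.0])
    for _ in range(num_pairs):
        psi = np.kron(psi, singlet)
    return psi

def swap_spins(psi, n, i, j):
    """Apply Swap(i, j) to an n-spin state vector."""
    return psi.reshape([2] * n).swapaxes(i, j).reshape(-1)

def measure_st(psi, n, i, j, rng):
    """Sample an s/t measurement on spins (i, j); return (outcome, new state)."""
    swapped = swap_spins(psi, n, i, j)
    singlet_part = (psi - swapped) / 2.0        # s = (1 - Swap)/2
    p_singlet = float(np.vdot(singlet_part, singlet_part).real)
    if rng.random() < p_singlet:
        return "s", singlet_part / np.sqrt(p_singlet)
    triplet_part = (psi + swapped) / 2.0        # t = (1 + Swap)/2
    return "t", triplet_part / np.linalg.norm(triplet_part)

def triplet_probabilities(psi, n):
    """Triplet probability for every pair of spins."""
    probs = {}
    for i, j in itertools.combinations(range(n), 2):
        overlap = np.vdot(psi, swap_spins(psi, n, i, j)).real
        probs[(i, j)] = (1.0 + overlap) / 2.0
    return probs

def random_st_run(n, num_measurements, seed):
    rng = np.random.default_rng(seed)
    psi = dimerized_state(n // 2)
    for _ in range(num_measurements):
        i, j = (int(x) for x in rng.choice(n, size=2, replace=False))
        _, psi = measure_st(psi, n, i, j, rng)
    return psi

psi = random_st_run(n=8, num_measurements=200, seed=0)
for pair, prob in triplet_probabilities(psi, 8).items():
    print(pair, round(prob, 3))
```

Because the s/t projections commute with total spin, the state stays in the spin-0 sector, which fixes the sum of all pair triplet probabilities (18 for 8 spins) and provides a convenient sanity check on the simulation.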
More generally, there is also an $O(1)$ fraction subpopulation whose parents or grandparents were all fresh, i.e. part of a singlet sub-dimerization, which could provide an explanation for the rest of the fluctuations in Fig. C.1. This constant "refreshment" prevents exponential flatness, but should not be mistaken for evidence of quantum supremacy.
We noted how sampling an $O(1)$ number of spins would not break the exponential flatness (other than the deviations described in the above paragraph). However, once we sample almost all the spins, the amplitudes can get interesting. For example, after sampling $2N - k$ spins, for some $k$, the deviation of the next spin from flatness is only exponentially small in $k$. So, the last few measurements are not close to flat.
To implement such a sampling, imagine a protocol that "detangles" a state: take a pair of spins at random, and measure. If the answer is a singlet, then remove that pair from the collection of spins, and continue the algorithm on the remaining spins. If it is not a singlet, then take another random pair (potentially not disjoint from the previous pair found to be a triplet). Eventually, this fully detangles the state, and at some point only a few spins are left entangled, and finally none at all.
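The detangling protocol can be sketched in a few lines on top of a state-vector simulation. The code below is an illustration under stated assumptions, not the authors' implementation: the helper names, the scrambling step, the seed, and the `max_steps` safety bound are all choices made here.

```python
import numpy as np

def dimerized_state(num_pairs):
    """Product of singlets on spins (0,1), (2,3), ..."""
    singlet = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2.0)
    psi = np.array([1.0])
    for _ in range(num_pairs):
        psi = np.kron(psi, singlet)
    return psi

def swap_spins(psi, n, i, j):
    return psi.reshape([2] * n).swapaxes(i, j).reshape(-1)

def measure_st(psi, n, i, j, rng):
    """Sample an s/t measurement on spins (i, j); return (outcome, new state)."""
    swapped = swap_spins(psi, n, i, j)
    singlet_part = (psi - swapped) / 2.0
    p_singlet = float(np.vdot(singlet_part, singlet_part).real)
    if rng.random() < p_singlet:
        return "s", singlet_part / np.sqrt(p_singlet)
    triplet_part = (psi + swapped) / 2.0
    return "t", triplet_part / np.linalg.norm(triplet_part)

def remove_pair(psi, n, i, j):
    """Factor out spins (i, j), assumed to be in a singlet, leaving n - 2 spins."""
    tensor = np.moveaxis(psi.reshape([2] * n), (i, j), (0, 1))
    rest = (tensor[0, 1] - tensor[1, 0]) / np.sqrt(2.0)   # contract with the singlet
    rest = rest.reshape(-1)
    return rest / np.linalg.norm(rest)

def detangle(psi, n, rng, stop_at=4, max_steps=100000):
    """Measure random pairs, removing every pair found in a singlet."""
    outcomes = []
    for _ in range(max_steps):
        if n <= stop_at:
            break
        i, j = (int(x) for x in rng.choice(n, size=2, replace=False))
        outcome, psi = measure_st(psi, n, i, j, rng)
        outcomes.append(outcome)
        if outcome == "s":
            psi = remove_pair(psi, n, i, j)
            n -= 2
    return psi, n, outcomes

rng = np.random.default_rng(0)
n = 8
psi = dimerized_state(n // 2)
for _ in range(100):                      # scramble with random s/t measurements
    i, j = (int(x) for x in rng.choice(n, size=2, replace=False))
    _, psi = measure_st(psi, n, i, j, rng)

final_state, spins_left, outcomes = detangle(psi, n, rng)
print(spins_left, "".join(outcomes))
```

The recorded string of outcomes is the kind of bit string that a sampling experiment based on this protocol would produce; the remaining 4-spin state stays in the spin-0 sector.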
As one detangles, the probabilities start exponentially close to flat but become less and less flat. When there are only 4 spins left, the last measurements are far from flat. Notice if we stop at 2 spins, the result is already determined: a singlet, as the total spin of our original state had to be zero.
Let us call those spins 1, 2, 3, 4. Then, we might guess that after many measurements, we have a "random" spin 0 state on 1, 2, 3, 4. So, we want to compute: given a random singlet on 4 spins, what is the distribution of the probability that a given pair, say 1, 2, will be in a singlet?
The space $\mathrm{spin}_0(4)$ is two-dimensional, and is spanned by an orthogonal basis which amounts to symmetrizing or anti-symmetrizing on the pair 1, 2, implying there is one state $|S\rangle$ which is a singlet on 1, 2, and one state $|T\rangle$ which is a triplet on 1, 2. We ask: for a randomly chosen state, what is the distribution of the probability of the $|S\rangle$ outcome of the measurement?
A simple guess would be a random (real) state of the form $\cos(\theta)|S\rangle + \sin(\theta)|T\rangle$, with $\theta$ picked uniformly on the circle. Since $d\theta = -\frac{d\cos(\theta)}{\sin(\theta)}$, a uniform measure on $\theta$ is a measure proportional to $1/|\sin(\theta)|$ on the amplitude $\cos(\theta)$ (Fig. C.2), and as we would measure probabilities, we should consider $\cos(\theta)^2$. However, due to the subpopulation refreshment phenomenon described earlier, we would expect to see a certain number of spikes in the distribution, at probabilities $1, \frac{3}{4}, \frac{1}{4}, 0$ and possibly other values. To sum up, we expect a continuous measure, which hopefully delivers arbitrary looking numbers, plus some inevitable discrete terms.
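For reference, the density of the singlet probability implied by this simple guess follows from a change of variables. The symbols $p$ and $f$ are introduced here for the probability $\cos(\theta)^2$ and its density; each value of $p$ in $(0,1)$ has four preimages $\theta$ on the circle.

```latex
\begin{align}
p &= \cos^2\theta, \qquad \theta \sim \mathrm{Uniform}[0, 2\pi), \\
\left|\frac{dp}{d\theta}\right| &= 2\,|\cos\theta \sin\theta| = 2\sqrt{p(1-p)}, \\
f(p) &= 4 \cdot \frac{1}{2\pi} \cdot \frac{1}{2\sqrt{p(1-p)}} = \frac{1}{\pi\sqrt{p(1-p)}}, \qquad 0 < p < 1 .
\end{align}
```

This is the arcsine distribution, which is symmetric about $p = \frac{1}{2}$ and diverges integrably at $p = 0$ and $p = 1$; it serves as the baseline against which the continuous part of the numerical distribution can be compared.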
We illustrate the probability of $|S\rangle$ for a simulation of 10 runs of the algorithm on 1000 random sequences of 1000 measurements on 14, 16, and 18 spins (Fig. C.3). We see some spikes at $1, \frac{3}{4}, 0$ for $|S\rangle$ and $1, \frac{1}{4}, 0$ for $|T\rangle$. Overall, we see a continuous measure plus some discrete terms, and the continuous measure delivers the arbitrary looking numbers, apparently quite distinct from $\cos(\theta)^2$. This may indicate a difficulty for classical simulation, and a route to quantum supremacy.
Thus, we propose the following random sampling problem, STSample, which can be viewed as an average case sampling problem:
Problem 1.
[STSample] Consider preparing $2N$ spins in a singlet. Perform $\Theta(\mathrm{poly}(N))$ rounds of s/t measurements on random pairs. Then, apply the detangling protocol until 4 spins are left, and finally perform one last s/t measurement. Given a choice of random pairs, this defines a probability distribution over bit strings (where each bit being 0 or 1 denotes whether a measurement
We present another alternative approach, this time involving representation theory, to proving PostSTP = PostBQP. The "theory" is easy, but numerical difficulties have prevented us from making this approach rigorous. Given $2N = 8$ spins in a spin-0 subspace and $2M$ spins as ancillas initialized in a singlet dimerization, taking out dilation, we aim to prove density in the group $SL(14, \mathbb{R})$, where $14 = \dim \mathrm{spin}_0(8)$. Then we define our logical qubit as the two-dimensional $\mathrm{spin}_0(4)$. We may think of $\mathrm{spin}_0(8)$ as containing two $\mathrm{spin}_0(4)$ qubits, $\mathrm{spin}_0(1, 2, 3, 4)$ and $\mathrm{spin}_0(5, 6, 7, 8)$, and 10 additional (unwanted) directions. This density result would give the entangling gates for the logical qubits, as the $SO(4)$ gates are included in the larger $SL(14, \mathbb{R})$ group.
We know that by simply using singlet projections, we can permute the spins. Thus, we have an action of the symmetric group $S_8$ on $\mathrm{spin}_0(8)$. This irreducible representation corresponds to the Young tableau with partition (4,4), as also shown in [14]. An action is induced (by conjugation) on the endomorphism algebra $\mathrm{End}(\mathrm{spin}_0(8)) \cong M(14, \mathbb{R})$, and similarly on its Lie algebra $A$, which is isomorphic to $M(14, \mathbb{R})$. Note that we mention the Lie algebra as we will take logs of operators later on.
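The dimensions quoted here can be cross-checked in two standard ways: the spin-0 subspace of $2N$ spin-1/2 particles has dimension equal to the $N$-th Catalan number, and the dimension of the $S_n$ irrep labelled by a partition is given by the hook length formula. The function names below are illustrative.

```python
from math import comb, factorial

def catalan(n):
    """n-th Catalan number: dimension of the spin-0 subspace of 2n spin-1/2 particles."""
    return comb(2 * n, n) // (n + 1)

def hook_length_dimension(partition):
    """Dimension of the symmetric group irrep for a partition, via hook lengths."""
    n = sum(partition)
    hook_product = 1
    for row, row_len in enumerate(partition):
        for col in range(row_len):
            arm = row_len - col - 1                                   # boxes to the right
            leg = sum(1 for other in partition[row + 1:] if other > col)  # boxes below
            hook_product *= arm + leg + 1
    return factorial(n) // hook_product

print(catalan(2), hook_length_dimension((2, 2)))   # spin_0(4)
print(catalan(4), hook_length_dimension((4, 4)))   # spin_0(8)
```

Both methods give 2 for $\mathrm{spin}_0(4)$ and 14 for $\mathrm{spin}_0(8)$, matching the logical qubit and the $SL(14, \mathbb{R})$ setting used in the text.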
We thus consider only these normalized operators with determinant 1 in $SL(14, \mathbb{R})$ given as sequences of s/t. To get a dense set, ideally the set of gates should be closed under inverse, allowing a variant of the Solovay-Kitaev algorithm [16] to be implemented. But how can we approximate the inverse of a sequence of s/t? If the operator is not orthogonal, we cannot simply take the conjugate sequence, which is the sequence applied in reverse order.
Take the (smallest) log of one of these operators (assuming it exists), say $\log(O)$. Then consider the orbit of $\log(O)$, which consists of the logs of the orbit of $O$. Adding them up produces an element $X = \sum_{\rho \in S_8} \rho(\log(O))$ in $A_0$ which is invariant under $S_8$. Since the trivial irrep (8) has been projected out, 0 is the only invariant element, so $X = 0$ and $-\log(O) = \sum_{\rho \neq \mathrm{id}} \rho(\log(O))$. Exponentiation, together with the Baker-Campbell-Hausdorff formula for the error analysis, now allows us to approximate the inverse of $O$.
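The inverse approximation can be spelled out as a short sketch, assuming that the sum of $\log(O)$ over its $S_8$ orbit vanishes and that $\log(O)$ is small enough for the Baker-Campbell-Hausdorff series to converge. The remainder $R$ is a symbol introduced here for the commutator corrections.

```latex
\begin{align}
\sum_{\rho \in S_8} \rho(\log(O)) = 0
 \quad &\Longrightarrow \quad
 -\log(O) = \sum_{\rho \neq \mathrm{id}} \rho(\log(O)), \\
\prod_{\rho \neq \mathrm{id}} \rho(O)
 = \prod_{\rho \neq \mathrm{id}} \exp\bigl(\rho(\log(O))\bigr)
 &= \exp\bigl(-\log(O) + R\bigr) \approx O^{-1},
\end{align}
```

where $R$ collects the nested commutators from the Baker-Campbell-Hausdorff formula, each at least quadratic in $\log(O)$. The product involves $8! - 1$ factors, each obtained from $O$ by a permutation of the spins, so the approximation improves as $O$ approaches the identity.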
In the numerical studies, we have normalized the operators by the determinant, thus eliminating the trivial irrep (8), leaving six irreps spanning the Lie algebra $A_0 \cong \mathfrak{sl}(14, \mathbb{R})$. To achieve density in $SL(14, \mathbb{R})$, we need to search for:
Lemma 13.
Density in $SL(14, \mathbb{R})$ follows given a set of operators "arbitrarily" close to identity (e.g. in the operator norm) with logs having nonzero component in each of the six irreps.
Proof. Given such an operator $O$, define $V_O := \mathrm{span}\{\rho(\log(O))\}_{\rho \in S_8}$. Clearly $V_O$ is invariant, and since it has nonzero projection onto each of the six irreps, it must be the whole Lie algebra $\mathfrak{sl}(14, \mathbb{R})$.
However, density must be achieved by positive integral linear combinations of $\{\rho(\log(O))\}_{\rho \in S_8}$, as we only have access to positive integer multiples of $\rho(\log(O))$ by composing their corresponding s/t sequences. But we previously showed how to approximate inverses (negative multiples) with positive linear combinations, thus giving access to integral linear combinations. Having operators arbitrarily close to the identity, meaning (the smallest) $\log(O)$ very close to zero (in, say, the operator norm), serves the purpose of establishing density with just positive integral linear combinations.
Moreover, it is not necessary to have operators arbitrarily close to identity, as the threshold theorem states; see also the last argument in Section 4.3 regarding the issue of error. Nevertheless, the threshold error required in this case is very small, especially as we build the inverse of an operator by taking $8! - 1$ other operators, which introduces even more errors. A very rough estimate would be $10^{-10}$, but it is likely much less than that, as the usual error correction threshold is around $10^{-3}$ and $8! - 1 \sim 10^4$ terms are used in a composition to compute the inverse of any element. Thus, a numerically assisted proof by searching sequences of s/t is out of reach. However, a search can provide evidence of the existence of such operators, by showing one can find operators closer and closer to identity as the length of the s/t sequence increases. We have found such examples in the smaller case of 4 computational spins.
We find two operators [. . .] (M times) to identity. The x-axis is the length $M$ of the commutator, and the y-axis has been log-scaled to show the exponential decay of the distance to identity.