Robust sparse IQP sampling in constant depth

Between NISQ (noisy intermediate scale quantum) approaches without any proof of robust quantum advantage and fully fault-tolerant quantum computation, we propose a scheme to achieve a provable superpolynomial quantum advantage (under some widely accepted complexity conjectures) that is robust to noise with minimal error correction requirements. We choose a class of sampling problems with commuting gates known as sparse IQP (Instantaneous Quantum Polynomial-time) circuits and we ensure its fault-tolerant implementation by introducing the tetrahelix code. This new code is obtained by merging several tetrahedral codes (3D color codes) and has the following properties: each sparse IQP gate admits a transversal implementation, and the depth of the logical circuit can be traded for its width. Combining those, we obtain a depth-1 implementation of any sparse IQP circuit up to the preparation of encoded states. This comes at the cost of a space overhead which is only polylogarithmic in the width of the original circuit. We furthermore show that the state preparation can also be performed in constant depth with a single step of feed-forward from classical computation. Our construction thus exhibits a robust superpolynomial quantum advantage for a sampling problem implemented on a constant depth circuit with a single round of measurement and feed-forward.


Introduction
Recent progress on quantum hardware suggests that quantum processors will soon be able to outperform classical devices for some specific tasks. In the absence of fault-tolerant quantum computers, sampling problems [1,2] appear to be a promising avenue to demonstrate such a quantum advantage since they can be solved with reasonably small circuits. In sampling problems, given some family C of quantum circuits on N quantum registers, the goal is to sample from the output distribution p_C for any circuit C ∈ C. Well-known examples of circuit families include linear optical circuits in the case of BosonSampling [3], random quantum circuits [4] and Instantaneous Quantum Polynomial-time (IQP) circuits [5]. The original idea behind these proposals was that quantum processors can in principle sample from the corresponding distributions, while it is widely believed that classical computers cannot complete the same task efficiently. The caveat, however, is that current quantum processors are not equipped with fault-tolerance, and will instead output noisy samples, thus only solving a noisy version of the initial sampling problem. Unfortunately, the evidence for the classical hardness of this noisy problem is thinner, and recent works have cast serious doubts on the possibility of demonstrating a quantum advantage with this approach [6,7,8,9,10].
A potential strategy to address the issue of noise is to focus on problems for which it is possible to add some level of fault-tolerance, in an intermediate manner between the Noisy Intermediate-Scale Quantum (NISQ) processors available in the near term [11] and universal fault-tolerant quantum computation. We list potential approaches to such a robust quantum advantage in Table 1. The fact that IQP circuits are a nonuniversal class of circuits makes them a good candidate in this respect since they are easier to make fault-tolerant. In particular, they can bypass the limitations of the Eastin-Knill theorem, which states that a universal gate set cannot be implemented with transversal gates [12]. In this work, we show how to perform a fault-tolerant version of (sparse) IQP sampling with a constant-depth quantum circuit and with a space overhead that is only polylogarithmic in the width of the original circuit. We note that [13] addressed a similar question for a different sampling problem, but the constant depth was obtained at the price of a polynomial overhead in terms of qubits, because of differences in the initial computational problem and of the magic state distillation protocol necessary to its fault-tolerant implementation. In addition, it neglects some polynomial-time classical computation necessary for the error correction, during which errors can accumulate, while we bring down the complexity of error correction to polylogarithmic time, making it less of an issue for future implementations. We note that similar computation times, for correcting a surface code of logarithmic size for instance, are often neglected in the literature.

Sparse IQP
An IQP circuit on N qubits takes a very simple form (see Figure 1): one applies an N-qubit gate D, diagonal in the computational basis, to an initial state |+⟩^⊗N and measures the resulting state in the {|+⟩, |−⟩} basis [16,17,5]. Here, we will focus on the sparse variant of IQP circuits introduced in [18]. In this variant, the circuit D is generated randomly from logarithmic-depth circuits with gate set {T, CS}. More precisely, such a circuit on N qubits is generated in the following way:
• a single-qubit gate T^k is applied to every qubit, with k ∈ {0, . . . , 7} chosen uniformly and independently for every qubit,
• for every pair of qubits, a gate CS^k, with k ∈ {0, . . . , 3} chosen uniformly at random, is applied with probability γ log N/N, for some fixed parameter γ > 0.
Let us denote by D_N the family of IQP circuits generated from the gate set {T, CS}. We associate to each circuit of D_N its probability of being generated by the previous random process to define a distribution over D_N. We call an IQP circuit picked from this distribution sparse, and in the following, whenever we discuss a fraction of sparse IQP circuits, we mean a fraction of circuits in the sense of the probability distribution defined above. We note that all the considered gates commute, and can therefore be applied in any order. Given that each qubit will typically be involved in a logarithmic number of 2-qubit gates, we see that sparse IQP circuits can be implemented by circuits of average depth Θ(log N) [18]. For each sparse IQP circuit D, we denote by p_D the probability distribution on {0, 1}^N corresponding to the output distribution of the circuit. In particular, it holds that

p_D(0^N) = | 2^{−N} Σ_{x∈{0,1}^N} e^{(iπ/4)(Σ_{i<j} w_{i,j} x_i x_j + Σ_k v_k x_k)} |^2

for some integer weights w_{i,j}, v_k. This quantity corresponds to an Ising model partition function, which is proven to be hard to compute in the worst case [19,20] and conjectured to be hard to compute on average. We formally recall the conjecture from [18]:

Conjecture 1 (Average Case Hardness of Ising model [18]). Consider the partition function of the general Ising model, where the exponential sum is over the complete graph on N vertices, with integer weights w_{i,j}, v_k drawn according to the random process described above. Then it is #P-hard to approximate this partition function up to a constant multiplicative error for a constant fraction of the instances.

The sparse IQP problem is as follows: pick a random D ∈ D_N according to the random process described before, and output an N-bit string s according to a distribution q_D such that ‖p_D − q_D‖ ≤ δ, where the total variation distance between two distributions p and q is defined as

‖p − q‖ = (1/2) Σ_s |p(s) − q(s)|.

Assuming Conjecture 1 and the non-collapse of the Polynomial Hierarchy, a generalisation of the P ≠ NP conjecture widely considered to be true, Bremner et al. proved that there is no efficient classical algorithm for the sparse IQP problem. More precisely,

Theorem 1 (Classical hardness of sparse IQP sampling [18]). Assuming Conjecture 1, there exists δ > 0 independent of N such that
a constant fraction of sparse IQP circuits cannot be simulated by a polynomial-time classical algorithm up to precision δ in total variation distance unless the polynomial hierarchy collapses to its third level.
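For concreteness, the random generation process and the output distribution described above can be sketched in Python for very small N (a brute-force illustration of ours, assuming T = diag(1, e^{iπ/4}) and CS = diag(1, 1, 1, i); this is not the implementation used in the paper):

```python
import itertools
import math
import random

import numpy as np

def sample_sparse_iqp(N, gamma, rng):
    """Draw a sparse IQP circuit: T^k on every qubit, CS^k on random pairs."""
    t = [rng.randrange(8) for _ in range(N)]          # T exponents, k in {0,...,7}
    cs = {}
    for i, j in itertools.combinations(range(N), 2):
        if rng.random() < gamma * math.log(N) / N:    # pair kept w.p. gamma*log(N)/N
            cs[(i, j)] = rng.randrange(4)             # CS exponents, k in {0,...,3}
    return t, cs

def output_distribution(N, t, cs):
    """Brute-force p_D: apply the diagonal D to |+>^N, then measure in the X basis."""
    xs = np.array(list(itertools.product([0, 1], repeat=N)))
    theta = (np.pi / 4) * xs @ np.array(t, dtype=float)    # T^k phase: pi*k/4 per bit
    for (i, j), k in cs.items():
        theta += (np.pi / 2) * k * xs[:, i] * xs[:, j]     # CS^k phase: pi*k/2 per pair
    state = np.exp(1j * theta) / 2 ** (N / 2)              # D |+>^N in the Z basis
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    W = H
    for _ in range(N - 1):                                 # H^{tensor N}: rotate to X basis
        W = np.kron(W, H)
    return np.abs(W @ state) ** 2

rng = random.Random(0)
N = 4
t, cs = sample_sparse_iqp(N, gamma=1.0, rng=rng)
p = output_distribution(N, t, cs)
assert abs(p.sum() - 1.0) < 1e-9                           # a valid distribution over {0,1}^N
```

The exponential cost of this brute force is of course exactly what Theorem 1 conjectures to be unavoidable for classical samplers.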
This theorem states that on average over the choice of D from the probability distribution over D_N defined previously, it is hard to sample classically from a distribution close to p_D. While a fault-tolerant quantum computer can sample efficiently from such a distribution, we do not expect that this is the case for near-term quantum processors. In fact, the initial proposal [18] partially addressed this issue by considering a simple noise model where the quantum circuit is assumed to be ideal, except for some independent and identically distributed noise added to the classical value of the final outcomes. Unfortunately, this model is too naive and a more realistic noise model should assume that every gate suffers from some constant level of noise. In that case, because the circuit contains on the order of N log N gates, it is immediate that noise will accumulate through the circuit and that the level of noise per qubit cannot be assumed to be constant, independent of N. Here we choose to consider a more general error model (the local stochastic noise model [21]) that includes well-known error models such as the independent depolarizing noise channel but also allows for local correlated errors. In this model, described in Section 4, errors are applied at each gate operation and the probability that the faulty locations contain a specific set A is upper bounded by p^{|A|}. In this work we propose a physical implementation of sparse IQP circuits that is robust to this kind of noise, without requiring the full machinery of fault-tolerant quantum computation.
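For intuition, i.i.d. noise is the simplest instance of this model: if each of n locations fails independently with probability p, then Pr[A ⊆ faults] equals p^{|A|} exactly, saturating the defining bound. A small exhaustive check (our own illustration, not part of the paper's analysis):

```python
import itertools

n, p = 4, 0.1          # 4 fault locations, each failing independently w.p. 0.1
A = {0, 2}             # a fixed set of locations

# Enumerate all 2^n fault patterns and accumulate the probability
# that the realized fault set contains A.
prob = 0.0
for faults in itertools.product([0, 1], repeat=n):
    if all(faults[i] for i in A):
        w = sum(faults)
        prob += p**w * (1 - p) ** (n - w)

assert abs(prob - p ** len(A)) < 1e-12   # Pr[A subset of faults] = p^{|A|}
```

Local stochastic noise only requires the inequality Pr[A ⊆ faults] ≤ p^{|A|}, which is why it also covers locally correlated errors.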
In order to avoid multiple rounds of costly error correction, our main strategy is to make the encoded circuit of constant depth rather than logarithmic. This is challenging since the target logical circuit has logarithmic depth, and we want in addition to make it fault-tolerant. To this end, we design a family of quantum error-correcting codes on which sparse IQP circuits can be implemented in depth 1, meaning that they are fully parallelized. This is possible thanks to the commuting nature of sparse IQP gates [22]. In addition, we prove that the initial state can be encoded in constant quantum depth by performing stabilizer measurements. The only part of the process which is not implemented in constant depth is the final step of the state preparation: it consists of a single interaction with a classical computer that must compute a correction to apply, which depends on the stabilizer measurement results. In our scheme, this classical computation requires a polylogarithmic time because one needs to compute a correction for quantum patches of logarithmic size. We remark that similar time complexities are often neglected in the literature on quantum fault-tolerance [21,23,24], and it may in fact not be a very problematic issue in practice.
Given a circuit D ∈ D_N and precision δ > 0, we construct the circuit C_D(δ) that samples from a distribution that is δ-close to p_D in total variation distance after classical post-processing. While the final classical post-processing is not performed in constant time, this is not an issue since all the qubits have already been measured. We discuss this point in Section 4. The circuit C_D(δ) is illustrated in Figure 2 and we detail its construction in Section 2. For circuits D of depth Θ(log N), the space overhead is polylogarithmic in the precision δ and in the number N of logical qubits. Given that the average depth of sparse IQP circuits is Θ(log N), a simple Markov inequality further implies that the fraction of circuits admitting a depth larger than α log N decreases as O(1/α). Thus an arbitrarily large fraction of such circuits benefits from the above overhead scaling. A practical difficulty that we do not address here is that our scheme requires long-range interactions. We state our main result:

Theorem 2 (Constant depth quantum advantage). There exists a universal ε_th > 0 such that, for all N ∈ N, D ∈ D_N and δ > 0, running a noisy version of the quantum circuit C_D(δ), obtained by inserting local stochastic noise of strength ε < ε_th after each step, yields samples from p_D up to precision δ (in total variation distance) after classical post-processing.
Combining this with Theorem 1, our scheme demonstrates a super-polynomial quantum advantage for the task of sparse IQP sampling, assuming Conjecture 1 and that the Polynomial Hierarchy does not collapse.
To summarize our contribution, we reduce the fault-tolerance space overhead required to demonstrate a superpolynomial quantum advantage with a constant-depth quantum circuit, from a large-degree polynomial in [13] to a polylogarithmic overhead. A similar reduction is achieved for the classical computation complexity during the quantum computation. This comes at the cost of losing the local connectivity of the scheme.

Figure 2: The first layers E (stabilizer measurements) and C_X (adaptive error correction) prepare a logical state that is fed to a parallel version D^∥ of the sparse IQP circuit D, followed by single-qubit measurements. The overall circuit has constant (quantum) depth and acts on N × polylog(N) qubits. A single interaction with a classical computer is necessary to compute the correction C_X for the initial preparation. A final classical post-processing (not depicted) then computes a sample from the target distribution p_D.

The Tetrahelix code for fault-tolerant parallel computation
We recall that we aim to address two issues in order to get a final circuit of constant depth: we need to reduce the depth of the logical circuit for sparse IQP from logarithmic to constant, and we need to find a fault-tolerant version that remains of constant depth.We achieve this by combining two ideas.
First we rely on 3D color codes [25,26], which admit transversal diagonal gates. More specifically, we will focus on the tetrahedral code subfamily that admits a transversal T-gate. Moreover, because these codes are CSS codes [27,28], they also admit a transversal CNOT gate. Combining both, we see that tetrahedral codes also have transversal CS-gates, as shown in Figure 3. The second idea is that it is possible to fully parallelize an IQP circuit by using a GHZ encoding of each of the input qubits, in order to trade depth for width of the circuit. This means encoding |+⟩ as the k-qubit GHZ state (|0⟩^⊗k + |1⟩^⊗k)/√2, with each of the commuting gates acting on a different qubit within the GHZ state. This is described in Figure 4.

Figure 3: Implementation of the controlled-phase from controlled-not, T and T† gates, all of which have a transversal implementation on a tetrahedral color code. Switching T and T† gives CS†.

This new circuit has two shortcomings. First, despite being of constant depth, the logical phase-flip rate increases linearly with the size of the state since the measurement results of the k qubits within a GHZ state need to be aggregated. Second, the preparation of bare GHZ states cannot be done fault-tolerantly in constant depth.
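The key property behind the GHZ encoding is that a diagonal gate applied to any single leg of the GHZ state acts on the shared logical bit, so commuting gates can be distributed over distinct legs and applied simultaneously. A minimal statevector check of this property (our own illustration):

```python
import numpy as np

k = 3                                      # GHZ encoding of one logical |+>
ghz = np.zeros(2**k, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)          # (|00...0> + |11...1>)/sqrt(2)

def apply_T(state, qubit, k):
    """Apply T = diag(1, e^{i pi/4}) to one qubit of a k-qubit state."""
    idx = np.arange(2**k)
    bit = (idx >> (k - 1 - qubit)) & 1     # value of `qubit` in each basis state
    return state * np.exp(1j * np.pi / 4 * bit)

# Applying T to any single leg produces the same encoded state:
states = [apply_T(ghz, j, k) for j in range(k)]
for s in states[1:]:
    assert np.allclose(s, states[0])
```

Measuring all k legs in the X basis and taking the parity of the outcomes then recovers the logical X-basis outcome, which is exactly why the phase-flip rate grows with k.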
To solve these issues, we define a new stabilizer code, the tetrahelix code, combining the two ideas (3D color codes for transversal gates and GHZ states for parallel implementation). We will detail the construction in Section 2, and only briefly explain its main properties here. The encoding is parameterized by two integers, k and L, accounting respectively for the parallelization capacity and the distance of the code. A k-tetrahelix code of distance L is defined by merging (in lattice surgery terms [29,30,31]) k tetrahedral codes of distance L along a 1-dimensional chain. Remarkably, the resulting [[Θ(kL^3), 1, Θ(L)]] tetrahelix code admits a depth-1 implementation of a logical sparse IQP circuit of depth k. This corresponds to a linear trade-off between the depth of the initial logical circuit and the number of physical qubits:

Lemma 1. Any sparse IQP circuit of depth k on N qubits can be implemented in depth 1 on N logical qubits encoded in k-tetrahelix codes.
The constant depth of the circuit, together with the arbitrarily large distance L, ensures the fault-tolerance of the circuit up to a final classical post-processing step to decode the results of the qubit measurements. We discuss in Section 4 how to achieve this by exploiting efficient decoders of color codes. The complexity of this step remains negligible compared to the super-polynomial quantum advantage of the overall circuit. The remaining challenge concerns the initial preparation of the encoded states of the tetrahelix code. One needs to ensure that such a preparation can also be done in constant depth and in a fault-tolerant manner.

Single-shot state preparation
A logical state of a quantum stabilizer code can always be prepared starting from a simple product state by measuring stabilizers and applying the appropriate correction to set the state in the code space. Such a scheme is however sensitive to measurement errors, and fault-tolerance is usually achieved by repeating measurements. We circumvent this shortcoming by establishing the single-shot preparation of logical |+⟩ states for the tetrahelix code. In general, error correction based on erroneous measurements can induce large-weight physical errors whose accumulation could later translate into logical errors. In order to ensure fault-tolerance, one can prove that, building on the particular structure of syndromes, the induced residual errors can be kept local with high probability. Such errors are then dealt with by the final classical decoding step. This property corresponds to single-shot decoding, introduced by Bombín in [32]. Throughout this paper, |x̄⟩ denotes the logical encoded state of |x⟩ for x ∈ {0, 1, +, −}.
Lemma 2. The tetrahelix code admits a single-shot preparation of logical |+⟩ / |−⟩ states, up to X stabilizers of the tetrahedral code.
Note that, as argued in subsection 3.2, the X stabilizers need not be applied since they commute with sparse IQP encoded gates and hence can be propagated to the end of the circuit where they leave the final measurement unchanged.
The proof of the single-shot property of the k-tetrahelix code is detailed in Section 3 and relies on (i) the single-shot preparation of Hadamard-basis states for 3D gauge color codes [32,33], and (ii) the fact that the measurement errors occurring during code merging are detectable from the global stabilizer measurement outcomes.
We furthermore argue in Section 3 that the associated decoding can be performed on a classical computer in time polylogarithmic in N. We treat it as instantaneous in the derivation of Theorem 2.

Sketch of proof of Theorem 2
The rest of the paper is devoted to establishing Theorem 2. In Section 2, we first briefly review tetrahedral codes, from the 3D color code family. Next, we define the tetrahelix code family obtained by merging tetrahedral codes. We prove that the k-tetrahelix code reduces the depth k of a sparse IQP circuit to depth 1 (Lemma 1). In Section 3, we prove that the merging failure probability between two tetrahedral codes of distance L is exponentially suppressed in L, and hence that k-tetrahelix encoded states in the Hadamard basis can be faithfully prepared in constant quantum depth (Lemma 2). In Section 4, we prove the fault-tolerance of the scheme. More precisely, we prove the existence of a non-zero error threshold, independent of k, below which we arbitrarily suppress logical errors by increasing L, for any encoded sparse IQP circuit.

Tetrahelix code

Overview of tetrahedral codes
Color codes are a family of topological codes introduced by Bombín and Martin-Delgado [34,25,26]. Their main feature is that they admit a transversal implementation of single-qubit phase gates, including the T-gate when the codes are 3-dimensional. In the following we focus on the subfamily of tetrahedral codes that encode a single qubit. Tetrahedral codes are defined on 3-dimensional color complexes, which we will call 3-colexes as in [35], of a tetrahedral shape as described in Figure 5, with the vertices corresponding to the data qubits. 3-Colexes are 3D lattices with the properties that (i) each cell is assigned one of four colors such that no two adjacent cells are of the same color; (ii) three colors appear on each external facet of the complex, and such a facet is associated with the missing color (in Figure 5 these facets correspond to the four triangular external boundaries of the tetrahedron); (iii) each vertex is incident to a cell or facet of every possible color.
In the following we denote by L ∈ N the number of vertices on the edges of the lattice, which will correspond to the code distance, as explained below. The construction of a tetrahedral 3-colex is not unique for a given L, but if one relies on tessellations of uniform density, then the resulting codes each encode a single logical qubit in m = Θ(L^3) physical qubits, and all display the properties that we will require.
Let us recall the formal definition of a tetrahedral code on m qubits, which will serve as a building block for the tetrahelix code. We start from a tetrahedral 3-colex with set of vertices V, faces F and cells C. Physical qubits are associated with vertices, so m = |V|. The tetrahedral code associated to this colex is a CSS code with stabilizers given by

S_X = ⟨X(c), c ∈ C⟩ and S_Z = ⟨Z(f), f ∈ F⟩.

Here, each cell c or face f is identified with a binary vector of length m with ones at the locations corresponding to the associated vertices, and we define X(a) := ⊗_{i=1}^m X_i^{a_i} for a ∈ {0, 1}^m (and similarly for Z(a)). In words, the X stabilizer associated with a 3-cell c is the product of Pauli X operators on all the vertices in the boundary of c. In particular, X stabilizers are associated with the 3-cells of the colex and Z stabilizers are associated with the faces.
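The CSS condition behind these definitions is simply that every cell and every face share an even number of vertices. A toy check of this commutation rule (the vectors below are illustrative only, not an actual colex):

```python
import numpy as np

def commute(c, f):
    """X(c) and Z(f) commute iff their supports overlap on an even number of qubits."""
    return int(c @ f) % 2 == 0

c  = np.array([1, 1, 1, 1, 0, 0])   # a weight-4 "cell"
f1 = np.array([1, 1, 0, 0, 0, 0])   # overlap of size 2 -> commute
f2 = np.array([1, 0, 0, 0, 1, 0])   # overlap of size 1 -> anticommute
assert commute(c, f1)
assert not commute(c, f2)
```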
The fundamental property of tetrahedral color codes is that for each code of the family there exists a partition of vertices V = V^+ ∪ V^− such that applying the gate T on V^+ and T† on V^− implements an encoded logical T-gate [33]:

T(V^+) T†(V^−) = T̄.

Similarly to encoded states |x̄⟩, we denote by Ū the encoded logical unitary U. Together with the existence of transversal controlled-not gates, this implies the transversal implementation of the CS-gate (see Figure 3):

CS(V^+) CS†(V^−) = C̄S̄,

where CS(V^+) denotes the transversal application of CS between the analogous sets V^+ of two code blocks. X̄ and Z̄ logical operators are respectively surface-like and string-like, and the X distance and Z distance scale as Θ(L^2) and Θ(L), respectively.
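The weight structure underlying such transversality can be checked concretely on the smallest tetrahedral color code, which coincides with the [[15,1,3]] quantum Reed-Muller code: every element of its X-stabilizer group has weight ≡ 0 (mod 8), while every element of the logical coset has weight ≡ 7 (mod 8), so T^⊗15 acts as a logical phase gate (for this particular code the partition is trivial, with V^− empty, and T^⊗15 implements the logical T† up to convention). A sketch of this check under the standard punctured Reed-Muller construction:

```python
import itertools

import numpy as np

# Evaluate the four linear functions x1..x4 on F_2^4 and delete the origin:
pts = list(itertools.product([0, 1], repeat=4))               # origin is pts[0]
gens = [np.array([p[i] for p in pts[1:]]) for i in range(4)]  # 15-bit generators

# Span of the generators = X-stabilizer group of the [[15,1,3]] code
span = []
for coeffs in itertools.product([0, 1], repeat=4):
    v = np.zeros(15, dtype=int)
    for a, g in zip(coeffs, gens):
        v = (v + a * g) % 2
    span.append(v)

ones = np.ones(15, dtype=int)                                 # logical X representative
assert all(v.sum() % 8 == 0 for v in span)                    # stabilizer weights: 0 mod 8
assert all(((v + ones) % 2).sum() % 8 == 7 for v in span)     # logical coset: 7 mod 8
```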

Construction of the tetrahelix code
The transversality of the sparse IQP gate set paves the way towards the fault-tolerant implementation of such circuits. This would however require repeated error correction cycles at each circuit step, that is, a logarithmic number of times. Concatenating a tetrahedral code with a repetition code gives a family of codes that presents the desired parallelization property. Unfortunately, it does not meet the criterion of constant-depth preparation for the initial encoded states. We now define a new code, the tetrahelix code, that displays both properties: (i) depth-1 implementation of a sparse IQP circuit, (ii) constant-depth encoded state preparation in the Hadamard basis.
As briefly mentioned in subsection 1.2.1, a tetrahelix code is obtained by merging tetrahedral codes in lattice surgery terms [29,30,36]. Here we detail the construction, starting by merging two such codes of distance L, with respective sets of vertices V_1 and V_2, as described in Figure 6(a). We consider two codes which are exact mirror images of one another and we denote by φ : V_1 → V_2 the bijection between the two sets of vertices. An external triangular facet, B_1 ⊂ V_1, together with its mirror image, B_2 = φ(B_1), are chosen, and every vertex is paired with the corresponding one on the other code. This pairing defines a set of pairs

P_{1,2} := { (v, φ(v)) : v ∈ B_1 }.

The merge operation consists in fusing the X stabilizers on the boundaries and adding new Z stabilizers of weight 2 associated to the paired qubits in P_{1,2}. More precisely, the Z stabilizers are defined as ⟨Z(f), f ∈ F^2⟩, where F^2 is the union of the faces of the two initial colexes and of the new weight-2 pairs in P_{1,2}. Similarly, we define C^2 as the union of merged stabilizers and unmerged ones: the X stabilizers are then defined as ⟨X(c), c ∈ C^2⟩, corresponding to the non-adjacent cells of each code and to fused adjacent ones (paired according to their color as in Figure 6(a)). This construction ensures that X stabilizers commute with every Z stabilizer, including the newly defined Z stabilizers with support on P_{1,2}. We denote by X̄_i and Z̄_i the logical operators of the initial tetrahedral codes, from which we define the logical operators of the new code. The 2-tetrahelix code encodes a single logical qubit for which the logical operators can be taken of the form Z̄ = Z̄_1 (or Z̄_2) and X̄ = X̄_1 X̄_2, with X̄_2 chosen so that its support intersects the same subset of P_{1,2} as X̄_1 (to commute with the associated Z stabilizers). Merging additional tetrahedra does not fundamentally change the analysis. Tetrahedra can be aligned in the shape of Figure 6(c) to form a chain of length k ∈ N so that each extremal vertex is shared between at most four tetrahedra. This ensures that at most four X stabilizers are fused together. This linear
packing of regular tetrahedra is known as a Boerdijk-Coxeter helix or tetrahelix [37,38], which motivates the name of the code.
Denoting by V_i the set of vertices of the i-th tetrahedral color code, we get a partition of the set of all vertices V = ∪_{i=1}^k V_i. With P_{i,i+1} denoting the new weight-2 Z stabilizers between adjacent tetrahedra i and i+1, the face set F^k and cell set C^k are defined analogously to F^2 and C^2 to ensure stabilizer commutation, with the merged cells defined recursively so that stabilizers can be fused across several tetrahedra (up to three on edges and up to four on summits). We define the k-tetrahelix code, which encodes a single logical qubit in Θ(kL^3) physical qubits, from its set of stabilizers ⟨X(c), Z(f) : c ∈ C^k, f ∈ F^k⟩. The logical Z operator can be chosen as any of the logical Z̄_i operators of the composing tetrahedral codes, while the X logical operator is a product of X̄_i operators recursively chosen so that X̄_i and X̄_{i+1} intersect the same subset of P_{i,i+1}.

Code distance
The X and Z distances of a code correspond to the minimal weights of X and Z logical operators. We denote by d_X^k and d_Z^k the X and Z distances of the k-tetrahelix code and prove that

d_Z^k = Θ(L) and d_X^k = Θ(kL^2).   (17)

We prove the result for the 2-tetrahelix code by relating logical operators of the tetrahelix code to those of the initial tetrahedral codes; the result generalizes to an arbitrary k-tetrahelix code by recursion. In this subsection we denote by d_X^1 = Θ(L^2) and d_Z^1 = Θ(L) the X and Z distances of a tetrahedral code of edge length L and number of qubits m = Θ(L^3).
Let us consider a logical operator Z̄ of the 2-tetrahelix code. We index the vertices of the two composing tetrahedra in a symmetric manner with respect to the paired facets. The logical operator Z̄ is of the form of a tensor product of Pauli Z operators on each tetrahedron, Z̄ = Z(µ) ⊗ Z(ν), with µ, ν ∈ {0, 1}^m. We will show that, up to multiplication by Z stabilizers, we can transfer Z(µ) ⊗ Z(ν) to Z(µ + ν) ⊗ 1. This means that we transfer the physical Pauli Z operators from the second tetrahedron to the symmetric ones in the first one. We can then conclude by noticing that Z(µ + ν) is a logical operator of the first tetrahedron and hence of weight larger than or equal to d_Z^1. We thus have

|µ| + |ν| ≥ |µ + ν| ≥ d_Z^1,

which concludes the argument. Indeed, since an arbitrary logical Z̄_1 operator of the first tetrahedron is also a logical operator of the tetrahelix code, there exists a Z stabilizer R_Z such that Z̄ = Z̄_1 × R_Z. Such a stabilizer is necessarily of the form

R_Z = R_Z^1 R_Z^2 R_Z^{1,2},

where R_Z^i is a product of Z stabilizers supported on the i-th tetrahedron and R_Z^{1,2} is a product of the Z stabilizers defined at the boundary (paired qubits in P_{1,2}). It is clear that multiplying Z̄ by R_Z^2 maps the support from the bulk of the second tetrahedron to the paired facet of this tetrahedron. Next, we completely transfer this support to the facet of the first tetrahedron by multiplying by R_Z^{1,2}. At this point, the support of the logical operator is entirely contained in the first tetrahedron. Now we apply the symmetric version of R_Z^2 defined on the first tetrahedron. This maps the original logical operator to Z(µ + ν) ⊗ 1.
The case of the X distance is straightforward, as the product of X̄_1 and X̄_2 logical operators whose supports intersect with the same pairs of P_{1,2} yields a logical operator of the 2-tetrahelix code, and this product form is stable upon multiplication by X stabilizers. This stability is a direct consequence of the fact that the restriction of a tetrahelix X stabilizer to a single tetrahedron is a stabilizer of the tetrahedral code. Merging an additional tetrahedral code hence increases the X distance by d_X^1. The same discussion between a (k − 1)-tetrahelix code and a tetrahedral code generalises the proof by recursion to the k-tetrahelix code for arbitrary k. Recalling that d_Z^1 = Θ(L) and d_X^1 = Θ(L^2), we obtain the claimed bounds d_Z^k = Θ(L) and d_X^k = Θ(kL^2).

Figure 6: (a) Adjacent tetrahedra (here for L = 3) are merged by measuring pairs of qubits from P_{1,2} that become Z stabilizers of the new code. The colors of the second tetrahedron are chosen by convention so that merged X stabilizers are of the same color. (b) Every new Z stabilizer generator must be set to +1 to project the state in the code space. The values of the new pair stabilizers are not independent, since they are all related to Z stabilizers of the two tetrahedral codes on the face on which the merge is performed. In particular, the product of the four pairs overlapping the same Z stabilizer (with support in F_1) is equal to +1. (c) Merging additional tetrahedra with each other makes it possible to form a chain of tetrahedra of length k. The optimal chain has the shape of a helix and corresponds to minimizing the number of merged X stabilizers (with support in C^k), which is equal to four in this packing.
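The definition used above (distance = minimal weight of a logical operator) can be made concrete by brute force on a small CSS code; for instance on the [[7,1,3]] Steane code (unrelated to the tetrahelix construction, purely an illustration of the definition):

```python
import itertools

import numpy as np

# Hamming(7,4) parity-check matrix; the Steane code uses it for both X and Z checks
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

# X-stabilizer group: the span of the rows of H
stabs = set()
for coeffs in itertools.product([0, 1], repeat=3):
    v = np.zeros(7, dtype=int)
    for a, row in zip(coeffs, H):
        v = (v + a * row) % 2
    stabs.add(tuple(v))

# X logical operators commute with every Z stabilizer (H v = 0 mod 2)
# but are not themselves X stabilizers; the X distance is their minimal weight.
d = min(sum(v) for v in itertools.product([0, 1], repeat=7)
        if not (H @ np.array(v) % 2).any() and v not in stabs and any(v))
assert d == 3
```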

Parallel computation
We turn to the properties of the code concerning parallel computation. We establish Lemma 1 by showing that the encoded T-gate can be implemented in depth 1 on a single tetrahedron of the chain. A tetrahedral code on m physical qubits is a CSS code and its logical computational states can therefore be written in the form

|x̄_1⟩ = (1/√|S^1|) Σ_{s_1 ∈ S^1} |x_1 L_1 + s_1⟩,

with the addition taken modulo 2, x_1 ∈ {0, 1}, and S^1 ⊂ {0, 1}^m such that for s_1 ∈ S^1 we have X(s_1) ∈ S_X^1. Similarly, L_1 ∈ {0, 1}^m represents an arbitrary X̄_1 logical operator. The transversal implementation of the T-gate, T(V^+)T†(V^−) = T̄, on the tetrahedral code implies that each codeword gains the same phase from the application of T(V^+)T†(V^−):

T(V^+)T†(V^−) |x_1 L_1 + s_1⟩ = e^{iπ x_1/4} |x_1 L_1 + s_1⟩.

The logical computational states of the k-tetrahelix code are given by

|x̄⟩ = (1/√|S^k|) Σ_{s ∈ S^k} |x L + s⟩, for x ∈ {0, 1}.

Here S^k ⊂ {0, 1}^{k×m} is such that for s ∈ S^k we have X(s) ∈ S_X^k, and L is the vector associated with an arbitrary logical X̄ operator. For each X stabilizer, s ∈ S^k is a concatenation of k vectors s_i ∈ S^1, i ∈ {1, . . ., k}: s = [s_1, ..., s_k]. Similarly, for the logical operator, the binary vector L is a concatenation of vectors L_i, each representing a logical X̄_i operator of the i-th tetrahedral code. Focusing on tetrahedron i_0, and up to qubit re-ordering, we can thus write

|x̄⟩ ∝ Σ_{s_{i_0} ∈ S^1} |x L_{i_0} + s_{i_0}⟩ ⊗ |ψ(s_{i_0}, x)⟩.

The terms |ψ(s_{i_0}, x)⟩ depend on s_{i_0} because of correlations between codewords restricted to different tetrahedra induced by overlapping X stabilizers, but this does not impact our argument. Taking V_{i_0}^+ and V_{i_0}^− as in (7), we have

T(V_{i_0}^+)T†(V_{i_0}^−) |x L_{i_0} + s_{i_0}⟩ = e^{iπ x/4} |x L_{i_0} + s_{i_0}⟩.

Combining the two previous equations directly implies:

T(V_{i_0}^+)T†(V_{i_0}^−) |x̄⟩ = e^{iπ x/4} |x̄⟩ = T̄ |x̄⟩.

To extend the arguments to the CS-gate, we would need to use the gadget of Figure 3.
Notice that while the T and T†-gates are applied on a single tetrahedron, the CNOT gates would need to be applied on all physical qubits (CSS property). However, as these CNOT gates come in pairs, they cancel each other outside the tetrahedron where the T-gates are applied. Thus, the CS-gate can also be applied between two arbitrary tetrahedral blocks of two tetrahelix codes. This implies that the k-tetrahelix code can implement in a single step an encoded T or CS-gate on each tetrahedral block of the code. Therefore, we obtain a depth-1 parallel implementation of a depth-k sparse IQP circuit, up to state preparation. This finishes the proof of Lemma 1. In the next section we prove that encoded states in the Hadamard basis can be prepared fault-tolerantly in constant quantum depth.
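The gadget of Figure 3 is a standard circuit identity that can be verified directly on 4×4 matrices (our own check, with the gates read left to right in the circuit):

```python
import numpy as np

T    = np.diag([1, np.exp(1j * np.pi / 4)])
Tdg  = T.conj()
I2   = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
CS   = np.diag([1, 1, 1, 1j])

# Circuit: (T x T), CNOT, (1 x T†), CNOT -- applied left to right,
# so the matrix product is written right to left.
U = CNOT @ np.kron(I2, Tdg) @ CNOT @ np.kron(T, T)
assert np.allclose(U, CS)
```

The identity holds because a basis state |a, b⟩ acquires the phase (π/4)(a + b − (a ⊕ b)) = (π/2) ab, which is exactly the CS phase.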

Constant-depth preparation of encoded states
The encoded states of the k-tetrahelix code in the Hadamard basis can be prepared by merging k associated encoded states of the composing tetrahedral blocks. In this section, we show that both steps, preparing the encoded tetrahedral states and merging them, can be done in constant quantum depth.

Single-shot decoding of tetrahedral code
Since the state $|+\rangle^{\otimes m}$ is stabilized by all $X$ stabilizers and by the logical $\overline{X}$ operator of a CSS quantum code, the projection onto the logical $|\overline{+}\rangle$ can ideally be done by measuring the $Z$ stabilizers followed by a single step of Pauli $X$ corrections. Measurement errors, however, usually prevent such reliable encoded state preparation in constant depth. Indeed, measurement errors induce residual data errors after Pauli corrections, which usually calls for many repeated measurement rounds before such a correction is applied. Repeating measurements gives an extra dimension to the error syndrome, in which ancilla and data errors can be separated by the decoding algorithm, which infers an error pattern close to the most likely one and hence an appropriate correction. An alternative approach is to build on the structure of the error syndrome of some codes to ensure that a single round of local measurements suffices to guarantee the locality of the residual errors with high probability. This strategy was first proposed by Bombín in [32] for 3D gauge color codes and is known as single-shot decoding; it is a property of a quantum error-correcting code in conjunction with its decoder. In the case of 3D color codes, $Z$ stabilizers on faces correspond to $Z$ gauge operators of 3D gauge color codes. This ensures the single-shot decoding property up to a classical computation of polynomial complexity in the code size. Furthermore, the topological nature of the code implies that the measurements can be parallelized to a constant quantum depth. In conclusion, we have a constant-depth preparation of encoded states in the Hadamard basis for the tetrahedral code, up to local residual errors.

Single-shot merging of tetrahedral states
Merging two tetrahedral codes of distance L into a 2-tetrahelix code is described in Figure 6(a-b). Pairs of qubits from $P_{1,2}$ are measured over the faces on which the tetrahedral codes are merged, and a correction is applied depending on the measurement outcomes. Since facets of a tetrahedral code have the structure of a triangular code (2D color code) of size L, in the absence of errors the measurements yield a binary codeword $w$ of the corresponding classical code. This codeword can be written as the sum of an $X$ stabilizer and an $X$ logical operator of the 2D code, with the same formalism used in Section 2.4 for the 3D code.
The appropriate correction can then be seen to be a 3D code codeword whose restriction to the triangular code vertices gives $w$. This can be obtained by first determining the decomposition of $w$ into facets of the triangular code ($X$ stabilizer generators) and $x$ copies, for $x \in \{0,1\}$, of the logical operator $\overline{X}$ over the entire triangle (so that it is a logical operator of both the 2D and the 3D codes), before mapping the facets to cells to get a 3D code codeword. Importantly, an $X$ stabilizer of the tetrahedral code commutes with encoded T and CS-gates on the tetrahelix code since it does not change the structure of the codewords described in subsection 2.4. Since the circuit ends with $X$ measurements, this means that it is sufficient for our purpose to compute $x$ and only apply the logical part of the correction. In other words, we only need to prepare tetrahelix encoded states up to tetrahedral code $X$ stabilizers.
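As a toy illustration of this classical step, the sketch below extracts the logical bit $x$ from a noiseless measured codeword $w$ by Gaussian elimination over GF(2). The 4-bit code used here is a stand-in for the triangular code, and all names are ours, not the paper's:

```python
def reduce_gf2(v, basis):
    """Reduce a bitmask v against a GF(2) basis whose elements have
    distinct leading bits and are sorted in decreasing order."""
    for b in basis:
        if v ^ b < v:  # b's leading bit is set in v
            v ^= b
    return v

def make_basis(gens):
    """Build a reduced GF(2) basis from arbitrary generator bitmasks."""
    basis = []
    for g in gens:
        g = reduce_gf2(g, basis)
        if g:
            basis.append(g)
            basis.sort(reverse=True)
    return basis

def logical_bit(w, stab_gens, logical):
    """Decompose w = (sum of X stabilizers) + x * logical and return x,
    or None if w is not a codeword (i.e. errors occurred)."""
    basis = make_basis(stab_gens)
    if reduce_gf2(w, basis) == 0:
        return 0
    if reduce_gf2(w ^ logical, basis) == 0:
        return 1
    return None

# Toy example: stabilizer generators 1100 and 0110, logical 1111.
print(logical_bit(0b1010, [0b1100, 0b0110], 0b1111))  # -> 0 (pure stabilizer)
print(logical_bit(0b0011, [0b1100, 0b0110], 0b1111))  # -> 1 (stabilizer + logical)
print(logical_bit(0b1000, [0b1100, 0b0110], 0b1111))  # -> None (not a codeword)
```

In the actual scheme, membership is decided by a 2D color code decoder rather than exhaustive elimination, but the output is the same: the bit $x$ deciding whether the logical $\overline{X}$ correction is applied.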
If the two tetrahedral codes are not perfectly in their code space, or in the case of measurement errors, the measurement results deviate from $w$:
$$w' = w + e_r + e_m.$$
Here, $e_r$ accounts for the residual errors of the tetrahedral state preparation, and $e_m$ stands for measurement errors. Because the preparation of encoded states in the Hadamard basis for the tetrahedral code is single-shot, the resulting errors follow a local stochastic noise model. This is also the case for measurement errors, and hence decoding the triangular code yields the correct value of $\overline{Z}_1 \overline{Z}_2$ with probability exponentially close to 1 in $L$. $\overline{Z}_1 \overline{Z}_2$ is then set to $+1$ by applying $\overline{X}_1$ or not, depending on the decoded value.
A k-tetrahelix code encoded state can then be prepared in a similar manner, simply by repeating the merging operation with additional tetrahedral codes, while always applying the logical correction on a fixed side of each merge, for example on the left tetrahedron. This scheme is similar to preparing a GHZ state of size k from parity measurements and logical corrections, with the difference that here measurement errors are exponentially suppressed, thus giving Lemma 2.
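The classical part of this merging cascade is the same prefix-parity rule used to fix up a GHZ state prepared from pairwise parity measurements. A minimal sketch (our own illustration, with random bits standing in for the $\overline{Z}_i\overline{Z}_{i+1}$ merge outcomes):

```python
import random

# k logical blocks are merged pairwise; outcome m_i is the result of the
# parity measurement between blocks i and i+1 (0 <-> +1, 1 <-> -1).
k = 8
outcomes = [random.randint(0, 1) for _ in range(k - 1)]

# Flip block j iff the prefix parity m_1 ^ ... ^ m_{j-1} is odd: each
# correction is applied on one fixed side of its merge, as in the text.
flips = [0]
for m in outcomes:
    flips.append(flips[-1] ^ m)

# After the X corrections, every pairwise parity is +1.
corrected = [m ^ flips[i] ^ flips[i + 1] for i, m in enumerate(outcomes)]
print(corrected)  # -> [0, 0, 0, 0, 0, 0, 0]
```

The point of Lemma 2 is that, unlike in the bare GHZ scenario, each outcome $m_i$ here is itself the result of decoding a triangular code, so a wrong parity (and hence a wrong correction) occurs only with probability exponentially small in $L$.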
Efficient decoding algorithms exist for 2D color codes [39,40] and 3D color codes [32,41], and they are single-shot in the 3D case. These algorithms have a complexity polynomial in the code size. The different tetrahedral encoded states can be prepared in parallel. Parallel merge measurements, followed by iterative computation of the associated corrections, then give a preparation of tetrahelix encoded states with classical computation polynomial in L and proportional to k. We prove in Section 4 that a polylogarithmic number of qubits per code block suffices, which hence gives a polylogarithmic-time classical computation.

Application to sparse IQP circuits
In this section, we apply the results of the two previous sections to demonstrate the main result of this paper, stated in Theorem 2. We start by presenting the error model. Next, we show that the encoding of the circuit of Figure 2 is fault-tolerant by proving the existence of an error threshold. Finally, we provide an estimation of the space overhead of the scheme.

Error model
The coupling of the quantum system with the environment generates noise that can later induce errors in the computation. We use the local stochastic quantum noise model from [21], where the set of faulty locations is a random variable over a discrete space-time and local correlations are allowed. No assumption is made on a particular type of error operator. This makes the model general enough to cover a wide class of applications. In particular, it captures commonly studied noise channels such as depolarizing and dephasing noise, or amplitude damping.
A noise model of parameter $\varepsilon$ that satisfies the following two properties is said to be local stochastic: (i) the faults are confined to a random set $F$ of space-time locations of $V$, drawn with probability $p(F)$, and (ii) the probability that the set of faulty locations contains any specific set $A \subset V$ is upper bounded by $\varepsilon^{|A|}$, i.e. $\Pr[A \subseteq F] \le \varepsilon^{|A|}$.
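As a toy check of how correlated faults fit this definition, consider six locations grouped into three pairs, each pair failing jointly with probability $q$; the exact containment probabilities then satisfy the bound with parameter $\varepsilon = \sqrt{q}$. This small example is our own, not taken from [21]:

```python
from itertools import combinations

# Correlated toy model: locations {0,...,5} in pairs (0,1),(2,3),(4,5);
# each pair is entirely faulty with independent probability q.
q, n = 0.01, 6
eps = q ** 0.5  # candidate local stochastic parameter

# Pr[A subset of faulty set] = q^{#pairs touched by A}, and A touches at
# least |A|/2 pairs, so the bound eps^{|A|} holds for every A.
for size in range(1, n + 1):
    for A in combinations(range(n), size):
        pairs_touched = len({loc // 2 for loc in A})
        assert q ** pairs_touched <= eps ** size + 1e-12
print("local stochastic with eps =", eps)
```

The bound is tight for a set $A$ equal to one full pair ($q \le \varepsilon^2 = q$), which is why correlations force $\varepsilon$ to be larger than the single-location fault probability.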
Final measurements are performed in the Hadamard basis, and hence at the end of the circuit only Z-type errors induce errors on the classical output. Z errors can either be environmentally induced or generated by the propagation of X-type errors through the circuit. X errors can arise from the coupling with the environment, but also from incorrect preparation of encoded states (recall that the preparation only includes an X correction). Local stochastic errors propagate as such through the constant-depth circuit, but residual errors after encoded state preparation are not necessarily local. We showed in Section 3 that their non-local representatives have exponentially low probabilities, which implies that correcting local stochastic errors with the final decoding is sufficient to exponentially suppress the logical error rate.
More formally, residual errors after preparation of tetrahedral encoded states are characterised in [32] as follows: (i) correctable physical errors follow a local stochastic noise model $\mathcal{N}^{\varepsilon}_{T,\mathrm{loc}}$; (ii) non-correctable physical errors (that is to say, errors whose attempted correction induces a logical error) are exponentially suppressed, and we denote by $\mathcal{N}^{\varepsilon_1}_{T,\mathrm{nc}}$ the corresponding error channel. Using a similar notation, we call $\mathcal{N}^{\varepsilon_2}_{M,\mathrm{nc}}$ the channel associated with logical errors due to unsuccessful merging, whose probability is exponentially suppressed in the code distance.
The encoded state preparation error channel then writes
$$\mathcal{N}^{\varepsilon}_{T,\mathrm{loc}} \circ \mathcal{N}^{\varepsilon_1}_{T,\mathrm{nc}} \circ \mathcal{N}^{\varepsilon_2}_{M,\mathrm{nc}},$$
with $\varepsilon_1$ and $\varepsilon_2$ exponentially suppressed in the code distance. The two non-correctable terms contribute to the final logical error rate but are exponentially rare. In the following subsections we prove that low enough local stochastic noise is corrected by the tetrahelix code.
Post-processing of the final single-qubit measurements, in the form of tetrahelix code decoding, then yields the value of the logical measurement. In the following, we analyze error configurations and describe an efficient decoder built from 2D and 3D color code decoders.

Existence of a good decoder
In the 3D color code, the logical Z operator corresponds to strings of Pauli Z connecting the four boundaries of different colors. An extremal vertex of the tetrahedron belongs to three boundaries, and a string connecting this vertex to the opposite face of the remaining color hence yields an example of a logical Z operator. Logical errors arise when more than half of the respective phases of any such path are flipped. Errors on the tetrahelix code have a similar origin, except that error strings can jump between tetrahedra to connect boundaries of different colors, as described in Figure 7. This means that we cannot decode tetrahedra individually, and that we first need to split (in lattice surgery language) the chain to retrieve tetrahedral codes. This can be performed in software after the final single-qubit X measurements by reconstructing the values of the X stabilizers of the tetrahedral codes. Considering the example of the 2-tetrahelix code for simplicity, X stabilizers at the interface between the two tetrahedra were merged and hence, taken individually, do not stabilize the tetrahelix code. This means that, even without errors, they will initially not necessarily be in their +1 eigenspace. This can be fixed by applying Z stabilizers from $P_{1,2}$ to set them to +1 while acting trivially on the code space of the tetrahelix code. This can be seen as preparing two triangular code (2D color code) logical states on the two facets by applying a Pauli operator of the form
$$\prod_{v \in \sigma} Z_v Z_{\varphi(v)},$$
with $\sigma \subset B_1$ a set of vertices from the triangular facet on which the merge was performed, and $\varphi$ the bijection between the vertex sets of the two tetrahedra defined in Section 2. Note here that any potential logical error applied to one tetrahedron would also be applied to the second one.
In reality, errors arising on the support of the tetrahedral code X stabilizers prevent all such stabilizers from being set back to +1 by applying Z stabilizers from $P_{1,2}$. Since in this scheme we aim at correcting errors at the next step, during the decoding of individual tetrahedral codes, we only need here to approach the tetrahedral code spaces. This can be done by minimizing the number of tetrahedral code X stabilizers with value -1 in the chain. For the k-tetrahelix code, we start with the X stabilizers merged between more than two tetrahedra, that is to say those on tetrahedra vertices and edges, followed by those in the bulk of the facets on which the tetrahedral codes are merged.
Once each tetrahedron is back in the tetrahedral code space (up to physical errors), it suffices to decode each code individually and multiply the logical values of the $\overline{X}_i$'s to recover the desired logical information (pairs of tetrahedral code logical errors possibly introduced at the splitting step hence cancel each other). For a low enough error rate, we thus expect the logical error rate $\varepsilon_L$ after such decoding to be proportional to the number of tetrahedra in the chain and to the logical error rate $\varepsilon_T$ of a single tetrahedral code:
$$\varepsilon_L = O(k\,\varepsilon_T).$$

Figure 7: Representation of Z logical error configurations. Error strings are no longer restricted to a single tetrahedron (blue) but can also connect neighbouring tetrahedra (red). In the tetrahelix stacking, error strings can jump up to two tetrahedra at once.
Here we have only used 2D and 3D color code decoders, and therefore the existence of efficient 2D and 3D color code decoders [39,42,40,41] implies the existence of an efficient decoder for the tetrahelix code. The formal definition and analysis of such a decoder under the general noise model considered here is beyond the scope of this paper, and in the following we will prove Theorem 2 by relying on existing results on quantum LDPC codes. To do so, we show that the code admits a non-zero threshold independent of k, so that for low enough noise the logical error rate can be made arbitrarily low by increasing L.

Existence of a threshold for minimum weight decoder and local stochastic noise
The tetrahelix code is a quantum LDPC code since its generators have bounded weight and each qubit is involved in a bounded number of generators. It is known that a family of $[[\tilde{n}, k, d]]$ quantum LDPC codes, with $\tilde{n}$ and $d$ scaling to infinity, experiencing local stochastic noise of parameter $\varepsilon$, admits a non-zero error threshold $\varepsilon_{\mathrm{th}}$ [21]. More precisely, below this threshold the logical error rate is exponentially suppressed as
$$\varepsilon_L \le \mathrm{poly}(\tilde{n}) \left( \varepsilon / \varepsilon_{\mathrm{th}} \right)^{\Theta(d)} \tag{34}$$
using the minimum weight decoder.
In the case of the tetrahelix code, k is in general a parameter independent from the distance L of the code. Directly applying the results of [21] would lead to a threshold dependent on the value of k. We take care of this issue by imposing k = O(L). Thus, the associated family of k-tetrahelix codes has a number of physical qubits $\tilde{n} = O(L^4)$, and such a $[[\tilde{n} = \Theta(kL^3), k = 1, d = \Theta(L)]]$ code admits a non-zero threshold $\varepsilon_{\mathrm{th}}$ as L scales to infinity. Note that, while the minimum weight decoder is not efficient in general, we expect the efficient decoder of the previous subsection to exhibit a similar error suppression property.
In the following subsection, we show that imposing k = O(L) is compatible with the desired parallelization and fault-tolerance properties.

Proof of Theorem 2
When implementing sparse IQP circuits on N qubits with k-tetrahelix codes, the parameters k, L and N are related through three relations. First, as discussed in the previous subsection, to ensure the existence of a threshold independent of k, we need to have
$$k = O(L). \tag{36}$$
Second, an arbitrarily large fraction of sparse IQP circuits on N qubits are of depth $\Theta(\log N)$ and can hence be implemented on k-tetrahelix codes with
$$k = \Theta(\log N).$$
Third, another relation between L and N results from the code size required to reach the target precision $\delta$ of the sparse IQP problem. A logical sparse IQP circuit D is implemented with a k-tetrahelix code by the circuit $C_D$. After the final decoding, one obtains samples from a distribution $p_{D,\varepsilon}$. For a constant logical error rate $\varepsilon_L$ per logical qubit, the union bound gives an upper bound on the distance between the noisy and the ideal probability distributions with respect to N:
$$\| p_{D,\varepsilon} - p_D \| \le N \varepsilon_L. \tag{37}$$
Keeping the noisy probability distribution $\delta$-close to the ideal distribution thus imposes a logical error rate $\varepsilon_L$ of at most $O(\delta/N)$. Logical errors can arise both from local stochastic errors and from remaining non-local residual errors induced by merging errors or tetrahedral encoded state preparation errors, all of which are exponentially suppressed in L below some threshold:
$$\varepsilon_L = O\!\left( \left( \varepsilon/\varepsilon_{\mathrm{th}} \right)^{\Theta(L)} \right), \tag{38}$$
where the polynomial dependency on L in equation (34) is absorbed by the exponential. From (37) and (38), we derive the third equation relating L and N:
$$\left( \varepsilon/\varepsilon_{\mathrm{th}} \right)^{\Theta(L)} = O(\delta/N). \tag{39}$$
For a given N, it is enough to take
$$L = \Theta\!\left( \frac{\log(N/\delta)}{\log(\varepsilon_{\mathrm{th}}/\varepsilon)} \right) \quad \text{and} \quad k = \Theta(\log N), \tag{40}$$
which automatically also satisfies (36). The total number of qubits n of each code block for a sparse IQP circuit of width N then reduces to
$$n = \Theta(kL^3) = \Theta\!\left( \frac{\mathrm{polylog}(N/\delta)}{\mathrm{polylog}(\varepsilon_{\mathrm{th}}/\varepsilon)} \right). \tag{41}$$
This completes the proof of Theorem 2. We note that for the (arbitrarily small) fraction of sparse IQP circuits of super-logarithmic depth, the overhead is at most polynomial, since the depth of sparse IQP circuits is at most linear.
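To get a feel for this scaling, the following numerical sketch evaluates the distance L, chain length k and per-block qubit count for a few widths N, with all prefactors set to 1 and a hypothetical noise ratio $\varepsilon/\varepsilon_{\mathrm{th}} = 0.1$; the absolute numbers are illustrative only, not predictions of the paper:

```python
import math

def block_overhead(N, delta=0.03, ratio=0.1):
    """Illustrative L, k and qubits-per-block n = k * L^3, following the
    L = log(N/delta)/log(1/ratio), k = log2(N) scaling with unit
    prefactors (hypothetical constants, not the paper's)."""
    L = math.ceil(math.log(N / delta) / math.log(1 / ratio))
    k = math.ceil(math.log2(N))
    return L, k, k * L ** 3

for N in (10 ** 3, 10 ** 6, 10 ** 9):
    print(N, block_overhead(N))
```

Increasing the circuit width by six orders of magnitude only multiplies the per-block qubit count by a modest factor, which is the polylogarithmic overhead claimed in the theorem.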

Discussion
We have proposed a fault-tolerant implementation of sparse IQP circuits, paving the way for demonstrations of super-polynomial quantum advantage in near- or mid-term experiments. It consists of a constant-depth quantum circuit and involves a single step of feed-forward from classical computation. To do this, we have introduced the tetrahelix code, which admits single-shot preparation of logical $|\overline{+}\rangle$ states and a transversal implementation of IQP circuits. The qubit overhead and classical computation time of our scheme are only polylogarithmic in the width of the original sparse IQP circuit. The requirements of our protocol are almost met by current NISQ experiments. We hope it can bring within reach a demonstration of the super-polynomial advantage of quantum over classical computation. Depending on the physical platform, the main difficulty of the protocol may come from the required connectivity. A single tetrahelix code requires 3D connectivity. On top of this, each physical qubit has to interact with a single other qubit from another tetrahelix code during the implementation of the IQP circuit. These additional interactions are potentially long range. In the same spirit as in [30], the interaction range can be reduced using longer and branching tetrahelix codes while staying 3D. Logical CZ-gates can be realized facet to facet [43], but CS-gates will still require transversal connectivity between tetrahedra. Finding other codes with similar properties but simpler connectivity requirements could ease the implementation even more.
The tetrahelix code that we propose in our implementation has interesting properties in itself. The ability to implement many different non-Clifford unitaries in a transversal manner could potentially be leveraged in other settings. One can take inspiration from this construction to design other codes with large sets of transversal non-Clifford unitaries, in order to locally trade depth for width in larger-scale algorithms. The key ingredient of our approach is the commuting nature of sparse IQP gates, which enables their parallelization, in the spirit of [44] for MBQC, but in a fault-tolerant manner. Note that a generalization of the construction to any set of commuting gates would have powerful applications [22].
Concerning the encoding of sparse IQP circuits on tetrahelix codes, it is not clear whether or not the trade-off between depth and width is optimal, but the noticeable asymmetry between the X and Z distances of the tetrahelix code suggests that the overhead could be reduced by balancing them out. Another direction would be to improve the error threshold of the scheme, possibly with post-selection in the spirit of [45]. This would help bring the scheme further within reach of current experiments [46].
During the preparation of this work, we became aware of a similar work on reducing error correction requirements for fault-tolerant quantum advantage [47].

Figure 1: IQP circuits on N qubits are defined by a unitary D diagonal in the computational basis, with state preparation and measurements performed in the Hadamard basis. For sparse IQP circuits, D is a logarithmic-depth circuit consisting of T and CS-gates.

Figure 2: Our fault-tolerant implementation of a logical sparse IQP circuit D. The first layers E (stabilizer measurements) and $C_X$ (adaptive error correction) prepare a logical state that is fed to a parallel version $D^{\parallel}$ of the sparse IQP circuit D, followed by single-qubit measurements. The overall circuit has constant (quantum) depth and acts on $N \times \mathrm{polylog}(N)$ qubits. A single interaction with a classical computer is necessary to compute the correction $C_X$ for the initial preparation. A final classical post-processing step (not depicted) then computes a sample from the target distribution $p_D$.

Figure 4: Each step $i \in \{1, \ldots, k\}$ of a circuit of depth k is simultaneously applied on the i-th physical qubit of all the GHZ states. The logical circuit (a) of depth 4 can be compiled in depth 1, up to classical decoding and state preparation, by starting from GHZ states of size 4 and implementing circuit (b). The × blocks correspond to the classical decoding circuits.

Figure 5: (a) The [[15,1,3]] Reed-Muller code is the smallest example of a tetrahedral code. Here qubits sit on vertices, and the X and Z logical operators can be chosen respectively on a face and an edge of the tetrahedron. (b) X stabilizers are supported on cells (elements of C) and Z stabilizers on faces (elements of F).

Table 1:
Potential candidates for the demonstration of a robust quantum advantage. The advantage is relative between the quantum depth and its minimal classical counterpart. Factoring displays a superpolynomial advantage provided that factoring is classically hard, but requires the full machinery of fault-tolerance. Graph state sampling and sparse IQP sampling also give a large advantage, under stronger assumptions (that the Polynomial Hierarchy does not collapse, together with an Average Case Hardness conjecture), and can be implemented with an adaptive circuit of constant depth. Finally, the magic square problem leads to an unconditional advantage with a non-adaptive circuit of constant depth, but only offers a logarithmic advantage compared to classical computing.
Here $w_{ij} \in \mathbb{R}$ and $v_k \in \mathbb{R}$ are weights for edge $ij$ and vertex $k$, and $\omega \in \mathbb{C}$. If the weights are chosen uniformly at random from the set $\{0, \ldots, 7\}$, then it is #P-hard to approximate $|Z(e^{i\pi/8})|^2$ up to multiplicative error $1/4 + o(1)$ for a $1/24$ fraction of instances, over the random choice of weights.