On maximum-likelihood decoding with circuit-level errors

The error probability distribution associated with a given Clifford measurement circuit is described exactly in terms of the circuit subsystem code previously introduced by Bacon, Flammia, Harrow, and Shi. In particular, this gives a prescription for maximum-likelihood decoding with a given measurement circuit.


I. INTRODUCTION
Quantum computation offers exponential algorithmic speed-up for some classically hard problems. This promise is conditional in a fundamental way upon the use of quantum error correction (QEC). However, despite the enormous progress achieved both in theory and experiment during the quarter century since the invention of QEC [1], universal repetitive QEC protecting against both phase and amplitude errors has not yet been demonstrated in any of the qubit systems constructed to date. This illustrates the enormous difficulty of building quantum computers (QCs) with sufficiently small error rates.
Given also the engineering difficulties with scaling up the number of qubits [2], it is important that on the algorithmic side one tries to achieve every optimization possible. Among other things, we would like to maximize the probability of successful syndrome-based decoding with QEC. Given the available hardware, this requires choosing the optimal code, the optimal measurement circuit, and the optimal decoder. In particular, a decoder should be designed for the specific syndrome measurement circuit, as it must be aware of the associated correlations between the errors [3-5]; at sufficiently high error probabilities such correlations are present even in fault-tolerant (FT) circuits designed to prevent a single fault from affecting multiple data qubits.
In contrast, the standard approach is to use a decoder optimized for the underlying code, regardless of the actual circuit used for syndrome measurement. In some cases, e.g., for surface codes with the minimum-weight perfect-matching (MWPM) decoder, some leading-order correlations can be included in the edge weights of the graph used for decoding [4,6-8]. However, such a scheme is limited to codes and measurement circuits where MWPM can be done efficiently, e.g., surface codes with single-ancilla measurement circuits [9], and certain other classes of topological codes [10,11]. Further, there is necessarily a decoding accuracy loss when the measurement circuit is not simply repeated, e.g., with code deformation or lattice surgery [12].
A feasible approach to building a decoder optimized for the specific measurement circuit is to train a neural network (NN) using extensive simulation data [13-20]. However, as is commonly the case with NNs, there is always a question whether the training has been sufficient to achieve optimal decoding performance.
A parameterization of the probability distribution of correlated quantum errors in terms of a spin model has been recently considered by Chubb and Flammia [5]. In particular, they describe how such a formalism can be used for maximum-likelihood (ML) decoding in the presence of circuit-level noise. However, Chubb and Flammia focused on larger circuits composed of measurement blocks. Errors are assumed uncorrelated between the blocks, while a model of error correlations at the output of each block has to be constructed offline, e.g., using error propagation for a Clifford measurement circuit. In particular, Chubb and Flammia stopped short of analyzing circuit-level errors, and only considered numerically a toy model of correlated errors.
The goal of this work is to give an explicit, numerically efficient algorithm for analyzing error correlations resulting from a given qubit-based Clifford measurement circuit, and for constructing decoders optimized for such a circuit. The main result is that such correlations can be accounted for by using a phenomenological error model (no error correlations) with the circuit-associated subsystem code constructed by Bacon, Flammia, Harrow, and Shi [21,22]. Thus, any generic decoder capable of dealing with uncorrelated data and syndrome measurement errors in highly-degenerate sparse-generator subsystem codes can be adapted to account for circuit-level error correlations. The error correlations needed for implementing the scheme of Chubb and Flammia are recovered by calculating the marginals of the constructed distribution, which can be done in practice using Ising-model star-polygon transformations [23]. The construction naturally includes additional correlations between errors in different fault locations on the circuit.
An immediate application is for designing decoders approaching true ML decoding for Clifford circuits, to be used in quantum memory. In particular, more accurate decoding and optimization, with the ability to account for error-rate variations on individual qubits, could help improve QEC to the level sufficient to pass the break-even point and show coherent lifetimes longer than that of an unprotected qubit in present-day or near-future devices. Related techniques could also be more widely applicable for the design and analysis of FT protocols and control schemes. Examples include error correction with FT gadgets like flag error correction [24-26], schemes for protected evolution, e.g., using code deformations where conventional approaches may result in a reduced distance [12], and optimized single-shot error-correction protocols [27-29].
In addition, the analysis of marginal error distributions and associated families of asymmetric codes of varying degeneracy could become an important tool in the theory of QECCs. In particular, it could help construct better decoders for highly-degenerate quantum LDPC codes. Also, a thorough analytical understanding of error correlations could be useful for fundamental analysis of thresholds, e.g., in order to extend and/or improve the bounds in Refs. 30 and 31.
Paper outline: After this Introduction, an overview of relevant notations and results in quantum codes and multi-variable Bernoulli distributions is given in Sec. II. Pauli error channels with correlated errors, including circuit-level correlations, are discussed in Sec. III. Corresponding marginal distributions are constructed in Sec. IV, followed by a discussion of possible implications for exact and approximate ML decoding with circuit-level errors in Sec. V. Sec. VI gives the concluding remarks.

A. Quantum codes
Generally, an n-qubit quantum code Q is a subspace of the n-qubit Hilbert space H_2^{⊗n}. A quantum [[n, k, d]] stabilizer code is a 2^k-dimensional subspace specified as a common +1 eigenspace of all operators in an Abelian stabilizer group S ⊂ P_n, −1 ∉ S, where the n-qubit Pauli group P_n is generated by tensor products of single-qubit Pauli operators. The stabilizer is typically specified in terms of its generators, S = ⟨S_1, …, S_{n−k}⟩. The weight of a Pauli operator is the number of qubits that it affects. The distance d of a quantum code is the minimum weight of a Pauli operator E ∈ P_n which commutes with all operators from the stabilizer S but is not itself a part of the stabilizer, E ∉ S. Such operators act non-trivially on the code and are called logical operators.
Subsystem codes [32,33] are a generalization of stabilizer codes where only some of the logical qubits are used. More precisely, given a stabilizer group S, the stabilized subspace Q_S is further decomposed into a tensor product of two subspaces, Q_S = Q_L ⊗ Q_G, where Q_L is a 2^k-dimensional "logical" subspace used to store the quantum information, while we do not care about the state of the "gauge" qubits in the subspace Q_G after the recovery. Logical operators of the original code which act non-trivially only in Q_G, together with the operators from the stabilizer group S, generate the non-Abelian gauge group G which fully characterizes the subsystem code. In particular, the center Z(G) is formed by the elements of the original stabilizer S, up to a phase i^m, m ∈ {0, 1, 2, 3}.
A Pauli error E that anticommutes with an element of the stabilizer, ES = −SE, S ∈ S, is called detectable. Such an error results in a non-zero syndrome s = {s_1, …, s_r} whose bits s_i ∈ {0, 1} are obtained by measuring the eigenvalues (−1)^{s_i} of the chosen set of stabilizer generators S_i, i ∈ {1, …, r}, r ≥ n − k. Unlike in the case of classical codes, there may be many equivalent (mutually degenerate) errors resulting in the same syndrome. For a subsystem code, errors E′ degenerate with E, denoted E′ ≃ E, have the form E′ = EG, where G ∈ G is an element of the gauge group; such errors cannot and need not be distinguished. Non-trivial logical operators of the subsystem code are logical operators of the original stabilizer code which act non-trivially in Q_L. A bare logical operator U acts trivially in Q_G and commutes with any element of G. Such restrictions do not apply to dressed logical operators, which are only required to commute with elements of the stabilizer S. In any case, multiplication by a logical operator U gives an error EU with the same syndrome but from a different equivalence class, EU ≄ E.
Analysis of error correction is conveniently done using the quaternary, or an equivalent binary, representation of the Pauli group [34,35]. A Pauli operator U_e ∝ X^{u_1}Z^{v_1} ⊗ … ⊗ X^{u_n}Z^{v_n} is mapped, up to a phase, to a length-2n binary vector e = (u|v). Two Pauli operators U_1, U_2 commute iff the symplectic inner product of the corresponding binary vectors is zero, e_1 Σ e_2^T = 0, where

Σ = ( 0 I_n ; I_n 0 ).   (1)

Here I_n is the n × n identity matrix. Thus, if H = (H_X|H_Z) is an m × 2n binary matrix whose rows represent stabilizer generators, and H̃ ≡ HΣ = (H_Z|H_X), then the syndrome s of an error U_e with binary representation e = (u|v) is given by s^T = H̃e^T. If we similarly denote by G = (G_X|G_Z) an r × 2n matrix formed by the gauge group generators (for a stabilizer code, G = H), the errors with binary representations e and e′ = e + αG are equivalent, e′ ≃ e, for any length-r binary string α ∈ F_2^r. Generally, GH̃^T = 0, and

rank G = n − k + κ,   (2)

where κ is the number of gauge qubits. Matrices G and H can be viewed as generator matrices of an auxiliary length-2n CSS code which encodes 2k qubits. These correspond to a basis set of 2k independent logical operators of the original subsystem code. It will be convenient to introduce a logical generating matrix L such that LH̃^T = 0, rank L = 2k. Rows of L map to basis logical operators; a non-zero linear combination of the rows of L must be linearly independent from the rows of G.

Gauge or stabilizer generators can be measured with a Clifford circuit which consists of ancillary qubit initialization and measurement in the preferred (e.g., Z) basis, and a set of unitary Clifford gates, e.g., single-qubit Hadamard H and phase P gates and the two-qubit CNOT. Generally, a Clifford unitary U maps Pauli operators to Pauli operators, E′ = U†EU, E, E′ ∈ P_n. Ignoring the overall phase (for a complete description, see Refs. 36 and 37), this corresponds to a linear map of the corresponding binary vectors, (e′)^T = Ce^T, where C ≡ C_U is a symplectic matrix with the property C^T Σ C = Σ.
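The binary symplectic bookkeeping above is easy to exercise numerically. The following minimal sketch (the three-qubit repetition code and all helper names are chosen purely for illustration) computes commutation relations and syndromes:

```python
import numpy as np

def symplectic_product(e1, e2):
    """Symplectic inner product of binary vectors e = (u|v); zero iff the
    corresponding Pauli operators commute."""
    n = len(e1) // 2
    return int(e1[:n] @ e2[n:] + e1[n:] @ e2[:n]) % 2

def syndrome(H, e):
    """Syndrome bits: symplectic product of each stabilizer row of H with e."""
    return np.array([symplectic_product(h, e) for h in H])

# Three-qubit repetition code: stabilizers Z1 Z2 and Z2 Z3, written as
# rows (u|v) with u the X part and v the Z part.
H = np.array([[0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 1]])
X1 = np.array([1, 0, 0, 0, 0, 0])   # X error on the first qubit
Z1 = np.array([0, 0, 0, 1, 0, 0])   # Z error on the first qubit
print(syndrome(H, X1))              # anticommutes with Z1 Z2 only
print(syndrome(H, Z1))              # commutes with both stabilizers
```

Note that the weight-one Z1 error is undetected here: it is a logical operator of this (distance-one, as a quantum code) toy example.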

B. Bernoulli distribution
A multivariate Bernoulli distribution describes a joint probability distribution of m single-bit variables x_i ∈ F_2, e.g., components of a binary vector x ∈ F_2^m. Most generally, such a distribution can be specified in terms of 2^m probabilities p_x ≥ 0, with the normalization Σ_{x∈F_2^m} p_x = 1. Assuming the probabilities positive, p_x > 0, the distribution can also be written in the product form

P(x) = Π_{y∈F_2^m} p_y^{Π_{i=1}^m δ_{x_i, y_i}}.   (3)

For any given x ∈ F_2^m only one exponent is non-zero, so that the result is p_x. Taking the logarithm and expanding, one obtains the corresponding "energy" E ≡ −ln P(x) as a polynomial of m binary variables. The corresponding coefficients can be viewed as binary cumulants [38]; the presence of high-degree terms indicates a complex probability distribution with highly non-trivial correlations.
For applications it is more convenient to work with spin variables s_i = (−1)^{x_i} ≡ 1 − 2x_i ∈ {±1}, and rewrite the energy function using the general Ising representation first introduced by Wegner [39],

E(s) = −Σ_b K_b Π_{i=1}^m s_i^{θ_{ib}},   (4)

parameterized by the binary spin-bond incidence matrix θ with m rows and the bond coefficients K_b. While the most general m-variate Bernoulli distribution requires 2^m − 1 bonds, in the absence of high-order correlations significantly fewer terms may be needed. Of particular interest are distributions with sparse matrices θ, e.g., with bounded row and column weights, which also limits the number of columns. In the simplest case of independent identically-distributed (i.i.d.) bits with equal set probabilities p_1 = p, p_0 = 1 − p, we can take θ = I_m, the identity matrix, and all coefficients equal, K = ln[(1−p)/p]/2. The logarithm ln[(1−p)/p] is commonly called a log-likelihood ratio (LLR); the coefficient K here and K_b in Eq. (4) are thus called half-LLR coefficients.
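The Wegner parameterization can be checked directly. The sketch below (illustrative, assuming i.i.d. bit-flips with probability p) builds the distribution from θ = I_m and the half-LLR coefficients and recovers the product form p^{|x|}(1−p)^{m−|x|}:

```python
import numpy as np
from itertools import product

def bernoulli_from_ising(theta, K):
    """All x in F_2^m and P(x) ∝ exp(sum_b K_b R_b), with R_b = (-1)^{[x theta]_b}."""
    m = theta.shape[0]
    xs = np.array(list(product([0, 1], repeat=m)))
    R = (-1.0) ** (xs @ theta % 2)         # bond spin products for every x
    w = np.exp(R @ K)                      # unnormalized Boltzmann weights
    return xs, w / w.sum()

p, m = 0.1, 3
theta = np.eye(m, dtype=int)               # independent bits: one bond per bit
K = np.full(m, 0.5 * np.log((1 - p) / p))  # half-LLR coefficients
xs, P = bernoulli_from_ising(theta, K)
expected = np.array([p**x.sum() * (1 - p)**(m - x.sum()) for x in xs])
print(np.allclose(P, expected))            # True
```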

III. ERROR CORRELATIONS IN A CLIFFORD MEASUREMENT CIRCUIT
A. Pauli error channel with correlations.
Consider the most general Pauli error channel where P(e) is the probability of an error E_e with binary representation e, the irrelevant phase disregarded. Technically, P(e) describes a 2n-variate Bernoulli distribution. The probability P(e) can be parameterized in terms of a 2n × m binary coupling matrix θ with m < 2^{2n} columns and a set of coefficients K_b, b ∈ {1, …, m},

P(e) = Z^{−1} exp( Σ_{b=1}^m K_b (−1)^{[eθ]_b} ),   (5)

where [eθ]_b in the exponent is the corresponding component of the row vector eθ. The normalization constant

Z = Σ_{e∈F_2^{2n}} exp( Σ_{b=1}^m K_b (−1)^{[eθ]_b} )   (6)

is the partition function of the Ising model in Wegner's form, cf. Eq. (4). In the simplest case of independent X and Z errors, θ = I_{2n}, the identity matrix, while

e^{2K_b} = (1 − p_X)/p_X,   b ≤ n,   (7)

and the corresponding expression with p_X → p_Z for n < b ≤ 2n. In the case of the depolarizing channel with error probability p, we have, instead,

θ = ( I_{2n} | θ′ ),   θ′_{ib} = δ_{i,b} + δ_{i−n,b},   (8)

where the additional column block θ′ is to account for correlations between X and Z errors on the same qubit. Additional correlations between the errors can be introduced by adding columns to the matrix θ and the corresponding coefficients K_b.

Given a probability distribution in the form (5), it is easy to construct an expression for the probability of an error equivalent to e in a subsystem code with gauge generator matrix G, extending the approach of Refs. 6, 40, and 41, and reproducing some of the results from Ref. 5. A substitution e → e + αG and a summation over α give

P_G(e) = 2^{rank G − r} Z_ε / Z,   Z_ε ≡ Σ_{α∈F_2^r} exp( Σ_{b=1}^m (−1)^{ε_b} K_b (−1)^{[αGθ]_b} ),   ε ≡ eθ.   (9)

Here G is an r × 2n binary matrix, cf. Eq. (2), and the prefactor accounts for a possible redundancy in the summation. Notice that the partition function Z_ε in the numerator has the same number of bonds m as that in the denominator, but with the signs of the coefficients K_b corresponding to non-zero elements of the binary vector ε ≡ eθ flipped. When both the gauge generator matrix G and the error-correlation matrix θ are sparse, the incidence matrix Θ ≡ Gθ in the numerator of Eq. (9) must also be sparse.
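As a sanity check of the class-probability construction, the following brute-force sketch (the three-qubit repetition code and the independent X/Z error model are illustrative choices, not from the text) sums P(e + αG) over the group generated by the rows of G:

```python
import numpy as np
from itertools import product

# Stabilizer generators Z1 Z2, Z2 Z3 in binary (u|v) form; for a stabilizer
# code the gauge generator matrix G coincides with the stabilizer matrix.
G = np.array([[0, 0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1, 1]])

def prob(e, p=0.05):
    """Independent X and Z errors, each with probability p per qubit."""
    w = int(e.sum())
    return p**w * (1 - p)**(e.size - w)

def class_prob(e):
    """Sum of P(e + alpha G) over the full group generated by the rows of G."""
    return sum(prob((e + np.array(a) @ G) % 2)
               for a in product([0, 1], repeat=G.shape[0]))

e0 = np.zeros(6, dtype=int)
eX1 = np.array([1, 0, 0, 0, 0, 0])
print(class_prob(e0) > class_prob(eX1))   # trivial class is most likely: True
```

Here the rows of G are linearly independent, so the prefactor discussed above equals one.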
B. Clifford measurement circuit and associated input/output codes.
Let us now consider error correlations resulting from a Clifford measurement circuit. Specifically, following Refs. 21 and 22, consider an n-qubit circuit, n = n_0 + n_a, with n_0 data qubits and n_a ancillary qubits. First, the ancillary qubits are initialized to |0⟩; second, a collection of Clifford gates forms a unitary U; and finally the ancillary qubits are measured in the Z basis, see Fig. 1. In the absence of errors and in the event of all measurements returning +1 (zero syndrome), the corresponding post-selected evolution is described by the matrix

V = ( I^{⊗n_0} ⊗ ⟨0|^{⊗n_a} ) U ( I^{⊗n_0} ⊗ |0⟩^{⊗n_a} ),   (10)

where I is a single-qubit identity operator. The circuit is assumed to be a good error-detecting circuit (good EDC), namely, V†V is proportional to the projector Π_0 onto a subspace Q_0 ⊆ H_2^{⊗n_0}, the input code.

FIG. 1. Generic circuit with n_0 data and n_a ancillary qubits initialized to |0⟩ and measured in the Z basis. In practice, an ancillary qubit can be measured during the evolution and subsequently reused after initialization. However, there is no mechanism for adapting the gates to the measurement results.
A good EDC also defines an output code Q′_0 ⊆ H_2^{⊗n_0} which encodes the same number of qubits k. Indeed, since V†V = cΠ_0, the matrix V has only one distinct non-zero singular value, √c; this immediately gives VV† = cΠ′_0, with Π′_0 the projector onto a subspace Q′_0 ⊆ H_2^{⊗n_0} of the same dimension 2^k, the output code. Moreover, for any input state |ψ⟩ ∈ Q_0, the output |ψ′⟩ ≡ V|ψ⟩ ∈ Q′_0 is in the output code, and the corresponding transformation is a (scaled) Clifford unitary.
Even though the map between Q_0 and Q′_0 is unitary, the distance d′_0 of the output code does not necessarily equal the distance d_0 of the input code. In particular, adding a unitary decoding circuit on the output data qubits may be used to render d′_0 = 1.

C. Errors in a Clifford circuit.
Using standard circuit identities, any circuit error E can be propagated forward to the output of the circuit, thus giving an equivalent data error E_0(E) ∈ P_{n_0} and the (gauge) syndrome σ(E) ∈ F_2^{n_a} corresponding to the measurement results. Clearly, there is a large redundancy even if phases are ignored, as many circuit errors can result in the same or equivalent E_0(E) and σ(E). The goal is to find the conditional probability distribution for the equivalence class of the output error E_0(E) given the measured value of σ(E).
In a given Clifford circuit, consider N possible error locations, portions of horizontal wires starting and ending on a gate or an input/output end of the wire. For example, the circuit in Fig. 2 has N = 15 error locations. A Pauli error may occur on any of these locations. A circuit error E is a set of N single-qubit Pauli operators without the phase, P_i ∈ {I, X, Y, Z}, i ∈ {1, …, N}. When two circuit errors are applied sequentially, the result is a circuit error whose elements are pointwise products of Pauli operators with the phases dropped. The algebra defined by such a product is isomorphic to the N-qubit Pauli group without the phase. This is an Abelian group which also admits a representation in terms of length-2N binary vectors e ≡ (u|v) ∈ F_2^{2N}. Multiplication of two circuit errors amounts to the addition of the corresponding binary vectors e. With these definitions, error propagation through a Clifford circuit can be described as products of the original circuit error E with trivial circuit errors which have no effect on the outcome. The collection of all such errors forms the error equivalence group (EEG) of the circuit. The generators of this group involved in error propagation are trivial errors, each formed as a union of a single-qubit Pauli P_i, i ∈ {1, …, N} (but not in the output for data qubits, or right before the measurements for ancillary qubits), with the result of error propagation of P_i across the subsequent gate. For example, for the circuit in Fig. 2, some of the EEG generators, with identity operators dropped, are {Z_1, Z_6}, {X_4, X_9}, {X_1, X_6, X_9}, and {Z_4, Z_6, Z_9} (propagation across the leftmost CNOT), and {X_3, X_8}, {Z_3, Z_8} (propagation across the leftmost identity gate labeled I).
In addition, for a circuit which includes qubits' initialization and measurement, the list of EEG generators includes errors Z_j on ancillary qubits right after initialization to |0⟩ and right before the Z-basis measurement. For the circuit in Fig. 2, these are {Z_4}, {Z_5} (ancillary qubits right after initialization), and {Z_14}, {Z_15} (ancillary qubits just before measurement). It is important that the stabilizer group of the input code, after its elements are promoted to circuit errors, forms a subgroup of the thus constructed full circuit EEG [22]. The same applies to the stabilizer group of the output code.
In the binary representation, with each location contributing a pair (u_i v_i) and Z_i ↔ (10), X_i ↔ (01), a single-qubit Hadamard gate connecting circuit positions i and i′ corresponds to a pair of generators with non-zero elements (10|01) and (01|10) (Z_i propagates into X_{i′} and X_i into Z_{i′}); a phase gate similarly corresponds to (10|10) and (01|11) (Z_i propagates into Z_{i′} and X_i into Y_{i′}); and a CNOT gate to (10 00|10 00) (input Z_i on the control and an output Z_{i′} on the same wire), (01 00|01 01) (input X_i on the control and X_{i′}, X_{j′} on the outputs), (00 01|00 01) (target X pass-through), and, finally, (00 10|10 10) (input Z_j on the target propagating into Z_{i′} and Z_{j′}). The generators for the single-qubit trivial errors are even simpler, since a Z_j maps to the weight-one pair (u_j, v_j) = (10).
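The binary action of a Clifford gate as a symplectic matrix, (e′)^T = Ce^T, can be verified directly. In the sketch below (illustrative; note it uses the e = (u|v) convention with u the X part, rather than the per-location (uv) pairs above) the CNOT matrix is checked to be symplectic and to propagate X_1 → X_1X_2 and Z_2 → Z_1Z_2:

```python
import numpy as np

n = 2
I, O = np.eye(n, dtype=int), np.zeros((n, n), dtype=int)
Sigma = np.block([[O, I], [I, O]])      # symplectic form, cf. Eq. (1)

# CNOT (control qubit 1, target qubit 2) acting on column vectors
# e = (u1, u2 | v1, v2), u = X exponents, v = Z exponents:
# X1 -> X1 X2 and Z2 -> Z1 Z2, while X2 and Z1 pass through.
C = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])

assert ((C.T @ Sigma @ C) % 2 == Sigma).all()   # C is symplectic mod 2

X1 = np.array([1, 0, 0, 0])
Z2 = np.array([0, 0, 0, 1])
print((C @ X1) % 2)   # [1 1 0 0], i.e., X1 X2
print((C @ Z2) % 2)   # [0 0 1 1], i.e., Z1 Z2
```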
For a given circuit error E with binary form e ∈ F_2^{2N}, any equivalent error can be obtained by adding a linear combination of the thus constructed circuit EEG generators g_j. It is convenient to combine the corresponding generators into a generator matrix G with 2N columns. As discussed in Appendix A, for a good EDC with the rank of the input-code stabilizer group r_0 = n_0 − k and the constant c = 1/2^κ, see Eq. (10) and below, the generator matrix has the rank

rank G = 2N − n_0 − k − n_a + f,   f ≥ 0.

Generally, a non-zero value f > 0 indicates a measurement redundancy.
With the circuit EEG generator matrix in place, it is easy to construct a formal expression for the probability of an error in a given equivalence class. Namely, if the circuit error probability distribution is given by an analog of Eq. (6), the probability of an error equivalent to a given e ∈ F_2^{2N} is proportional to the Ising partition function, cf. Eq. (9). It is also easy to find the conditional probability distribution of output errors with a given syndrome. Namely, given the binary form of an output data error e_0 ∈ F_2^{2n_0} and a syndrome vector σ ∈ F_2^{n_a}, we need to form the corresponding vector e ≡ e(e_0, σ) ∈ F_2^{2N}, filling only the components corresponding to the output data qubits and to X errors on the ancillary qubits just before the measurements which give the non-zero syndrome bits in σ.
For ML decoding, we compare thus computed probabilities for all inequivalent errors consistent with the measured syndrome σ . In particular, these include errors that differ by a logical operator of the input (or, equivalently, the output) code, since non-trivial logical operators are outside the circuit EEG. In other words, length-2N binary vectors corresponding to logical operators are linearly independent from the rows of the circuit EEG generator matrix G. It is convenient to define a binary logical generator matrix L of dimension 2k × 2N , whose rows correspond to mutually inequivalent logical operators of the input code.
For the ease of decoding, it is also convenient to introduce the parity-check matrix H, also with 2N columns, whose rows are orthogonal to the rows of both matrices G and L, and whose rank satisfies

rank H = 2N − rank G − 2k.   (14)

Clearly, H is dual to a matrix combining the rows of G and L. As in the case of a subsystem code, see Eq. (2), these matrices can be seen as forming a half of a CSS code, with stabilizer generator matrices G_X = G, G_Z = H, and X-logical operator generator matrix L_X = L.
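In practice, a matrix H orthogonal to both G and L can be obtained as a GF(2) nullspace basis of their stacked rows. The following is a generic Gaussian-elimination sketch (the helper and the toy repetition-code matrices are illustrative, not the paper's construction):

```python
import numpy as np

def gf2_nullspace(A):
    """Basis (as rows) of {x : A x^T = 0 mod 2} via Gaussian elimination."""
    A = A.copy() % 2
    rows, cols = A.shape
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if A[i, c]), None)
        if piv is None:
            continue
        A[[r, piv]] = A[[piv, r]]          # move pivot row up
        for i in range(rows):              # eliminate column c elsewhere
            if i != r and A[i, c]:
                A[i] = (A[i] + A[r]) % 2
        pivots.append(c)
        r += 1
    basis = []
    for f in (c for c in range(cols) if c not in pivots):
        x = np.zeros(cols, dtype=int)
        x[f] = 1
        for i, c in enumerate(pivots):
            x[c] = A[i, f]                 # back-substitute pivot variables
        basis.append(x)
    return np.array(basis, dtype=int).reshape(len(basis), cols)

# Toy example: G and L for the three-qubit repetition code in (u|v) form.
G = np.array([[0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1]])
L = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]])
H = gf2_nullspace(np.vstack([G, L]))
print((H @ np.vstack([G, L]).T % 2 == 0).all())   # rows of H orthogonal to G and L
```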
The orthogonality requirement for the rows of H can also be interpreted in terms of the N-qubit Pauli group associated with the circuit. Namely, the Pauli operators corresponding to the rows of the symplectic dual matrix H̃ = HΣ, see Eq. (1) and below, must commute with the generators of the circuit EEG. This guarantees that each of these operators be a spackle, i.e., a circuit error where the single-qubit Pauli operators in any time layer can be obtained by error propagation from those in the previous time layer, see Ref. 22. Respectively, row weights of H scale linearly with the circuit depth.

D. Circuit subsystem code
The discussion in Ref. 22 focused on the special case of good EDCs where each qubit line is required to have an odd number of locations. In this special case, the circuit EEG can be seen as the gauge group of a quantum subsystem code of length N which encodes the same number of qubits k as the input/output codes, and has a distance not exceeding the corresponding distances, d ≤ min(d_0, d′_0). Respectively, rows of the matrix H which correspond to generators of the stabilizer group of the subsystem code are necessarily given by linear combinations of the rows of G. In addition, circuit errors corresponding to the rows of the matrix L can be seen as dressed logical operators, while the bare logical operators which commute with the elements of the circuit EEG can be constructed as spackles.
In practice, any circuit can be easily modified to satisfy this additional requirement by inserting a null (identity) gate into each qubit line with an even number of locations, see Fig. 2 for an example. While it is not strictly necessary to work with circuits that satisfy this requirement, it is convenient, as the additional structure of the subsystem code can be used to verify the validity of the constructed matrices.
However, once the matrices G, H, and L are constructed, there is no need to refer to the subsystem code. In fact, the generator matrix row-reduction transformation described in the following Section IV preserves the orthogonality and the relation Eq. (14) between the ranks inherent in the CSS code map, but not the structure of the circuit subsystem code.

E. Circuit code distance
How good can a measurement protocol be? What are the bounds on the distance d of the subsystem code associated with the circuit?
Generally, if d_0 and d′_0 are the distances of the input and the output codes, the distance of the corresponding circuit code satisfies d ≤ min(d_0, d′_0). This follows from the fact that a logical operator of the input code is naturally mapped to a (dressed) logical operator of the circuit code. An important result of Ref. [22] is that one can always design a fault-tolerant circuit so that the distance d of the corresponding subsystem code be as good as that of the input code, d = d_0.
Unfortunately, circuit-code distance d does not have a direct relation to the probability distribution of the output errors; even single-qubit output errors may remain undetected. This is a well known "feature" of quantum error-correcting codes operating in a fault-tolerant regime, even for codes with single-shot properties [27][28][29]. Indeed, regardless of the circuit structure, errors on the data qubits in the locations just before the circuit output will not be detected.
In comparison, with a formally defined circuit code, such an error can be propagated back toward the input layer, and (when it is detectable, e.g., if its weight is smaller than the distance d′_0 of the output code) it would necessarily anticommute with one or more combinations Z_g of the ancillary-qubit Z operators. The original error would thus be detectable in the circuit code. Causality does not permit such an operation with the actual circuit evolution; formally, this functionality is removed due to the assumed ancillary-qubit initialization to |0⟩.
Of course, even if a small-weight error goes undetected, it may get corrected after one or more additional measurement rounds. In practice, when an error-correcting code is analyzed in a fault-tolerant setting, the standard numerical procedure is to add a layer of perfect stabilizer measurements (no measurement errors). This guarantees that all small-weight errors at the end of the simulation be detected, and thus recovers the distance d > 1 of the circuit code, without the need to violate causality.

IV. MARGINAL DISTRIBUTIONS FOR CORRELATED ERRORS
The circuit EEG fully describes the correlations between circuit errors. However, it also contains a lot of redundant information: for the purpose of error correction, we are only interested in the distribution of the output errors and the syndrome, which are all supported on the rightmost locations of the circuit. In addition, the large size and sparsity of the circuit generator matrix G make decoding difficult, except with the simplest circuits.
The present goal is to reduce the number of independent variables by constructing the marginal distribution for a given subset of the variables. This amounts to a summation over the variables one is not interested in, e.g.,

P(e_{s+1}, …, e_m) = Σ_{e_1} Σ_{e_2} … Σ_{e_s} P(e_1, e_2, …, e_m).   (15)

In the case of binary variables e_i ∈ {0, 1}, both the original and the resulting marginal distributions are multivariate Bernoulli distributions, and each can be described in terms of the Ising energy function (4).
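Numerically, Eq. (15) is a plain partial summation, e.g., with the joint distribution stored as an array indexed by the bits (the random three-bit distribution below is made up for illustration):

```python
import numpy as np

# Joint distribution of three bits stored as an array P[e1, e2, e3].
rng = np.random.default_rng(1)
P = rng.random((2, 2, 2))
P /= P.sum()                  # normalize: a generic correlated distribution

# Marginal over e1, cf. Eq. (15) with s = 1: sum over the first axis.
P_marg = P.sum(axis=0)
print(P_marg.shape, np.isclose(P_marg.sum(), 1.0))
```

The marginal is again a normalized multivariate Bernoulli distribution, now over two bits.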
A. Row-reduction transformation

Generator matrix and the coupling coefficients
Given a multivariate Bernoulli distribution described by the coupling matrix Θ, e.g., Θ = Gθ in Eq. (9), and a set of half-LLR coefficients K_b, 1 ≤ b ≤ m, what are the corresponding parameters of the marginal distribution (15)? In the equivalent Ising-model representation (4), the goal is to describe the couplings between the remaining spins after a partial summation. Such a star-polygon transformation for a general Ising model has been constructed in Ref. 23. The transformation includes two special cases long known in the Ising-model literature: Onsager's star-triangle transformation [42] and the (inverse) decoration transformation [43,44].
It is convenient to derive the result directly, focusing on the marginal distribution after a summation over just one spin variable s_i ∈ {±1} corresponding to the i-th row of Θ. Without limiting generality, assume that the chosen row has w non-zero elements in positions 1, 2, …, w, decompose the corresponding bond variables as R_b = s_i T_b, T_b ∈ {±1}, 1 ≤ b ≤ w, and perform the summation explicitly (with an additional one-half factor for convenience),

(1/2) Σ_{s_i=±1} exp( Σ_{b=1}^w K_b s_i T_b ) = cosh( K_1 T_1 + … + K_w T_w ) ≡ B_τ,   (16)

where τ ∈ F_2^w is a composite index with elements τ_b such that T_b = (−1)^{τ_b}. To exponentiate this expression, rewrite B_τ by analogy with Eq. (3),

B_τ = Π_{τ′∈F_2^w} B_{τ′}^{Π_b δ_{τ_b, τ′_b}},   (17)

where the coefficients B_{τ′} in the base of the exponents are the hyperbolic cosines (16) of the sums of coefficients ±K_b with the signs fixed, matching exactly the signs in the exponents. As in Eq. (3), after a substitution of any given τ ∈ F_2^w, only one term with the correct index τ remains in the product. The modified bonds and the corresponding coefficients K′_b are obtained by expanding the polynomial in the exponent of Eq. (17). Because of the symmetry of the hyperbolic cosine, only even-weight products of the original bonds result from this expansion. Thus, in general, for an original row of weight w, the corresponding w columns are combined to produce w′ = 2^{w−1} − 1 even-weight column combinations, a change of ∆w = 2^{w−1} − w − 1 columns.
Specifically, for a row of weight w = 1, the transformation amounts to simply dropping the row and the corresponding column of Θ. The values of K b remain the same, except for the one value that is dropped.
For a row of weight w = 2, only the sum of the corresponding columns is retained in Θ, with the coefficient

K_{1,2} = (1/2) ln[ cosh(K_1 + K_2) / cosh(K_1 − K_2) ],

cf. Eq. (16). Equivalently, tanh K_{1,2} = tanh K_1 tanh K_2. For a row of weight w = 3, the three columns of the original matrix Θ are replaced by their pairwise sums, with the coefficient

K_{1,2} = (1/4) ln[ cosh(K_1 + K_2 + K_3) cosh(K_1 + K_2 − K_3) / ( cosh(K_1 − K_2 + K_3) cosh(K_1 − K_2 − K_3) ) ]

for the combination of the first two columns. The remaining coefficients K_{2,3} and K_{3,1} can be obtained with cyclic permutations of the indices. For a row of weight w = 4, the four columns of the original matrix Θ are replaced with six pairwise sums and the seventh column combining all four original columns. In general, the coefficient K_J in front of the product of T_b with indices b in an (even) subset J ⊆ {1, 2, …, w} is given by the sum of the logarithms of the hyperbolic cosines B_τ with τ_1 = 0 (this accounts for the symmetry of the hyperbolic cosine), with the coefficients ±1/2^{w−1}, where the sign is determined by the parity of the weight of the restriction τ[J] of τ to the subset J,

K_J = 2^{1−w} Σ_{τ∈F_2^w, τ_1=0} (−1)^{|τ[J]|} ln B_τ.

It is easy to check that the numbers of positive and negative coefficients always match. Respectively, the coefficients for high-weight products are typically small in magnitude.
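The w = 2 (decoration) case is easy to verify numerically: summing out the shared spin leaves a single effective bond whose coefficient satisfies tanh K_{1,2} = tanh K_1 tanh K_2. A self-contained check (not code from the paper):

```python
import numpy as np

K1, K2 = 0.7, 1.3
# Effective coupling after summing out the middle spin, cf. Eq. (16):
K12 = 0.5 * np.log(np.cosh(K1 + K2) / np.cosh(K1 - K2))
B0 = np.sqrt(np.cosh(K1 + K2) * np.cosh(K1 - K2))   # constant prefactor

for T1 in (+1, -1):
    for T2 in (+1, -1):
        lhs = 0.5 * sum(np.exp(K1 * s * T1 + K2 * s * T2) for s in (+1, -1))
        assert np.isclose(lhs, B0 * np.exp(K12 * T1 * T2))

print(np.isclose(np.tanh(K12), np.tanh(K1) * np.tanh(K2)))  # True
```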

Transformation for a parity check matrix
The row-reduction transformation can also be written in terms of the parity-check matrix H, also with m columns, and dual to Θ, such that

HΘ^T = 0,   rank H = m − rank Θ.   (18)
To this end, consider the row-reduction of Θ as a combination of the following elementary column steps: (i) the 1st column of Θ is added to columns 2, 3, …, w of Θ; as a result, the i-th row of Θ has a non-zero element only in the first position. (ii) The 1st column of the thus modified Θ is dropped, which leaves the i-th row zero; it may be dropped as well. (iii) If w > 2, the 2^{w−1} − w combinations of two, three, …, w − 1 columns with indices b ≤ w − 1 are added to the matrix Θ. These can be sorted by weight so that each added column be a combination of exactly two existing columns in the modified Θ. The corresponding steps for H are: (i′) columns b ∈ {2, 3, …, w} of H are added to its 1st column, which becomes identically zero as a result. (ii′) Drop the all-zero 1st column from the thus modified H. (iii′) For each column, e.g., b′, added to Θ as a linear combination of two existing columns b_1 and b_2, H acquires a new row with support on {b_1, b_2, b′} to express this linear relation. It is easy to check that the row orthogonality, HΘ^T = 0, is preserved. Also, the rank of Θ is reduced by one, while the increase of the rank of H matches the number of columns added in step (iii′), so that the exact duality (18) is preserved.

B. Marginal distribution for error equivalence classes
This analysis easily carries over to the problem of syndrome-based ML decoding for an [[n, k]] subsystem code under a Pauli channel characterized by a 2n × m matrix θ and a set of m half-LLR coefficients {K_b}, see Sec. III A. Given a gauge generator matrix G, the probability of an error equivalent to e is given by Eq. (9). Generally, for ML decoding we need to choose the largest of the 2^{2k} partition functions (19). Typically, this needs to be done for a large number of error vectors e. Can the calculation be simplified en masse by doing a partial summation over the spins corresponding to all rows of Gθ, as described in the previous section?
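For small k and few gauge generators, the comparison of the 2^{2k} partition functions can be done by brute force. A minimal sketch, with an illustrative binary example of our own making (the matrices G, L, θ and coefficients K below are not from the text):

```python
import numpy as np
from itertools import product

def class_partition_function(e, G, theta, K):
    """Z(e): sum of exp(sum_b (-1)^{[(e+g) theta]_b} K_b) over the group
    generated by the rows of G, i.e., over errors equivalent to e.
    Brute force, for illustration only."""
    Z = 0.0
    for a in product([0, 1], repeat=G.shape[0]):
        bits = ((e + np.array(a) @ G) @ theta) % 2
        Z += np.exp(np.sum((-1.0) ** bits * K))
    return Z

def ml_decode(e, G, L, theta, K):
    """Pick the logical sector (one of 2^{2k}, selected by rows of L)
    with the largest partition function."""
    sectors = list(product([0, 1], repeat=L.shape[0]))
    Zs = [class_partition_function((e + np.array(s) @ L) % 2, G, theta, K)
          for s in sectors]
    return sectors[int(np.argmax(Zs))], max(Zs)
```

The cost is exponential in both the number of gauge generators and 2k, which is exactly the motivation for the partial-summation simplifications discussed next.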
The structure of the logical operators can be accounted for by extending the generator matrix, which now has two row blocks, Gθ and Lθ [cf. Eq. (20)], and the half-LLR coefficients K_b, b ∈ {1, 2, . . . , m}. A matching parity-check matrix H is dual to Θ, see Eq. (18).

Independent X and Z errors
Let us first consider the simpler case of independent X and Z errors, where the matrix θ = I_{2n}. In this case H = H_S Σ, where H_S is a generator matrix of the code's stabilizer group, see Eq. (2). The marginal distribution being independent of the choice of the generator matrix, we can use row transformations and column permutations to render Θ and H in matching block forms in which the blocks B and C have ℓ ≡ 2n − r − 2k columns, and r = rank G. The matrices Θ and H are mutually dual, as can be immediately verified. Row-reduction operations applied to each row in the upper block of Θ correspond to: (i′) column operations to set both blocks A and B to zero, and conjugate column operations on H to set its left-most column block to zero.
where double vertical lines are used to separate the newly added columns. When the described transformation is applied to any of the original partition functions (19), the result is just the exponential of a sum of the final half-LLR coefficients. Can identical columns of the final generator matrix (23) be similarly combined to simplify the structure of the final marginal distribution for error equivalence classes? The answer is yes, as long as we account for the effect of the error e on the values of the coefficients K_b^{(fin)}. In fact, it is easy to check that the row-reduction transformations in Sec. IV A are such that the additional signs in Eq. (19) only affect the signs of the coefficients K_b^{(fin)}, see Eq. (25), where the vector ε_1 selects the equivalence class, and ε_2 = s + ε_1 C, with s ≡ eH^T the original syndrome. Clearly, the right-most blocks in Eqs. (23) and (25) are obtained from ε ≡ (ε_1|ε_2), with 2k + ℓ = 2n − r components, as a right product with the combined matrix M formed of M_1 and M_2. All ℓ_0 ≡ 2n − r components of ε being independent, there are 2^{ℓ_0} − 1 non-trivial combinations. Combining identical columns in the transformation from ε to e^{(fin)}, we can ensure that the final matrices contain no more than m_max ≡ 2^{ℓ_0} − 1 columns, cf. Eq. (26). With Eq. (2), r = n − k + κ, so that ℓ_0 = n + k − κ, where κ is the number of gauge qubits in the subsystem code. For a stabilizer code, κ = 0, thus ℓ_0 = n + k. Clearly, the latter is just the number of logical generators, 2k, plus the number of independent syndrome bits, n − k.
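Combining identical columns amounts to adding the corresponding half-LLR coefficients, since exp(K_{b_1}T) exp(K_{b_2}T) = exp[(K_{b_1} + K_{b_2})T] for the same Ising product T. A minimal sketch (the example matrix is ours; the error-dependent signs of the coefficients are ignored here):

```python
import numpy as np

def merge_identical_columns(Theta, K):
    """Combine identical columns of the generator matrix Theta, summing
    their half-LLR coefficients.  The number of distinct non-zero columns
    is bounded by 2**(number of rows) - 1, cf. m_max in the text.
    Sketch only; output column order is arbitrary (sorted keys)."""
    merged = {}
    for b in range(Theta.shape[1]):
        key = tuple(Theta[:, b] % 2)
        merged[key] = merged.get(key, 0.0) + K[b]
    cols = sorted(merged)
    return np.array(cols, dtype=int).T, np.array([merged[c] for c in cols])
```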

A more general Pauli channel
Instead of considering the most general situation, consider the important case of a Pauli channel where any single-qubit error has a non-zero probability. Then the incidence matrix can be chosen to have an identity-matrix block, θ = (I_{2n}|T), where T is a 2n × (m − 2n) binary matrix. As a result, the matrix Θ and the half-LLR coefficients in Eq. (20) both acquire additional blocks of linearly-dependent components, while the parity-check matrix dual to Θ can be chosen in the block form (27). Since the relevant error in Eq. (20) is eθ = (e|eT), the lower row-block of H gives a zero syndrome, just like the lower row-block in Eq. (24). Basically, after degeneracy is taken into account, the number of independent components of eθ is 2n − r, the same as for e; the final matrices Θ′ and H′ can always be constructed to have m′ ≤ m_max columns, with m_max given by Eq. (26). Of course, the actual resulting matrices, as well as the final half-LLR coefficients, do depend on the assumed error model. How much freedom is there to choose the matrices Θ′ and H′? For the purpose of ML decoding, we need to go over the entire linear space C_{Θ′} generated by the rows of Θ′; the choice of basis is irrelevant. The same is true for the parity-check matrix H′. These are the same symmetries as for the generator and parity-check matrices of a binary code.
In essence, the original quantum subsystem code with gauge (G) and logical (L) generator matrices has been transformed into a classical binary code, with the transformation dependent in a non-trivial fashion on the error probability distribution. This binary code has length m′ ≤ m_max, and it encodes 2k bits.
Further, any non-trivial linear combination of rows of Lθ with rows of Gθ has weight lower-bounded by the distance d of the quantum code, which gives the lower bound d′ ≥ d on the distance of this binary code. In fact, given the structure of the row-reduction transformation, the distance may be quite a bit larger, possibly scaling linearly with m′. Notice, however, that with a highly non-uniform error probability distribution, the more relevant parameter is not the distance but the corresponding quantity weighted with the half-LLR coefficients K_b^{(fin)}, related to the probability of the most-likely logical error. By construction, this quantity is exactly the same as for the original quantum code.
Let us consider the important case of a sparse original parity-check matrix H, e.g., with bounded row and column weights. This requires a low-density parity-check (LDPC) code with a sparse stabilizer generator matrix, so that H = H_S Σ is sparse, and an error channel with the generator matrix θ = (I_{2n}|T) also sparse. When acting on the parity-check matrix, each row-reduction transformation drops a column of H, and may also add one or more rows of weight 3, see Sec. IV A 2. Thus, the parity-check matrix of the classical binary code describing the marginal probability distribution of error equivalence classes must also be sparse, with row weights not exceeding w′ ≤ max(w, 3), where w is the maximum row weight of H in Eq. (27).

C. Marginal distribution for output errors in a good measurement circuit
This discussion also applies to the code associated with the error equivalence group of a good EDC. In this case the matrices G and H have 2N columns each, where N is the number of circuit locations, and their ranks are given by Eqs. (12) and (14). Just as any circuit error can be pushed all the way to the right, row-reduction can also be done starting with the bits at the beginning of the circuit and pushing toward its output. This way, a circuit error equivalence class can be characterized by ℓ_1 = 2N − rank G = 2n_0 + f = (n_0 + k) + (n_a − κ) (28) bits, where n_0 + k is the number of linearly-independent error equivalence classes in an [[n_0, k]] stabilizer code and n_a − κ is the number of syndrome bits measured in the circuit. Alternatively, as in a subsystem code with κ gauge qubits, ℓ_0 = n_0 + k − κ is the exponent in Eq. (26), and n_a is the number of additional error positions right before the measurements. As in the previous section, this gives an upper bound (29) on the maximum number M′ of columns in the matrices Θ′ and H′ that may be necessary. After row-reduction for all generators of the circuit EEG, we get a classical [M′, 2k, d′] binary code, where k is the number of encoded qubits, and d′ ≥ d.
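The counting in Eq. (28), together with the column bound that follows by analogy with Eq. (26) (our reading: M_max = 2^{ℓ_1} − 1), can be sketched as:

```python
def circuit_class_bits(n0, k, kappa, na):
    """Number of bits labeling a circuit error equivalence class, Eq. (28):
    l1 = (n0 + k) + (na - kappa), where l0 = n0 + k - kappa is the exponent
    of Eq. (26) and na the number of error positions before measurements.
    Returns (l1, M_max), with M_max = 2**l1 - 1 our reading of Eq. (29)."""
    l0 = n0 + k - kappa
    l1 = l0 + na                 # equals (n0 + k) + (na - kappa)
    return l1, 2 ** l1 - 1
```

Even for modest circuits the bound is astronomically large, which is why the truncation schemes of Sec. V B matter; the input values below are illustrative only.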
Is this an LDPC code? This question was not fully answered in the previous section: the parity-check matrix H of the circuit code is not necessarily sparse, as its row weights scale linearly with the circuit depth. To analyze the sparsity of the output-error parity-check matrix H′, write it in a block form (30) similar to Eq. (27), where the upper row-block H′_0 contains only the original columns of the circuit EEG parity-check matrix H remaining after row-reduction steps (i′) and (ii′), while any row with exactly three non-zero entries added in a step (iii′) goes to the lower row-block. Further, if we assume the circuit EEG parity-check matrix H in the form (27), with the lower row-block of bounded weight (e.g., w = 3 for depolarizing errors), any potentially large-weight row in H′_0 must correspond to (i) a stabilizer generator of the output code, or (ii) a generator of the group H, see Ref. 22. All of these have bounded weights for any bounded-weight stabilizer LDPC code with a measurement circuit where stabilizer generators are measured independently and with a bounded number of ancillary qubits. On the other hand, these conditions do not generally hold for a family of concatenated codes based on a given code, or for subsystem codes where a stabilizer generator may have an unbounded weight or cannot be expressed as a product of gauge generators with a bounded number of terms. Nevertheless, even in these cases, given a potentially very large number of columns, cf. Eq. (29), it is reasonable to expect the final H′ to be a sparse matrix, with only a small fraction of non-zero elements.

A. Decoding based on circuit EEG
The present approach is to analyze error propagation in a Clifford measurement circuit in terms of its circuit EEG characterized by a logical generator matrix L and either a generator matrix G or a parity check matrix H. With a minor constraint on the circuit, these correspond to the circuit subsystem code constructed in Refs. 21 and 22. More generally, these matrices form (a half of) a CSS code, with generator matrices G X = G and G Z = H, while rows of L define X-type logical operators. Simply put, any decoder capable of dealing with a CSS code with uncorrelated X-type errors can now be used to account for error correlations in a given measurement circuit.
Additional correlations can also be accounted for. Assuming a Pauli error channel (with or without correlations between different circuit locations) characterized by a coupling matrix θ and a set of half-LLR coefficients K b (one per column), probability of a circuit error equivalent to a given one is given by the Ising partition function (13), with the spin-bond coupling matrix Gθ.
As already discussed, for ML decoding we need to find the largest of the Ising partition functions (19). Such a calculation can be expensive. Indeed, given a code encoding k qubits, we need to compute and choose the largest of all 2 2k partition functions corresponding to all non-trivial sectors; this can only be done in reasonable time for a code with k small. Previously, a computationally efficient ML decoder using this approach and 2D tensor network contraction for computing the partition functions has been constructed for surface codes [45] in the channel model (perfect syndrome measurement).
Feasible approaches for evaluating the partition functions (19) include tensor network contraction (see, e.g., Ref. 46 for a 3D network contraction with complexity scaling as ∝ nχ^9, where χ is the bond dimension) and Monte Carlo (MC) methods constructed specifically for efficient calculation of free energy differences, e.g., the non-equilibrium dynamics method [47] or the classical Bennett acceptance ratio [48]. In application to surface codes in the FT regime, MC calculations are in essence simulations of a bond-disordered 3D Ising model; such calculations can be done using GPU [49], FPGA [50], or TPU [51] hardware acceleration.
Notice also that the circuit code is extremely degenerate: with Hadamard, Phase, and CNOT gates, the rows of the generator matrix G have (quaternary) weights not exceeding three. On the other hand, the row weights of the parity-check matrix H scale linearly with the circuit depth. For these reasons, iterative decoders like belief propagation (BP) are expected to fare even worse than with the usual (not so degenerate) quantum LDPC codes [52][53][54].

B. Generator-based decoding via marginal distributions
The calculation of the Ising partition functions needed for ML decoding can be simplified via partial summation over a subset of the spins. Technically, this corresponds to using a marginal distribution for the subset of variables needed, as discussed in Sec. IV.
Denote by V the set of rows of the original generator matrix G [also, rows in the upper block of Θ in Eq. (20)]. Then, row-reduction in an increasing sequence of subsets I_1 ⊂ I_2 ⊂ . . . ⊂ I_s = V (31) defines a sequence of mutually-dual pairs of matrices {Θ^{(j)}, H^{(j)}} with m_j columns. The ranks of the logical generator matrices remain the same, rank L^{(j)} = 2k, while the sequence r_j ≡ rank G^{(j)} decreases with increasing j, ending at r_s = 0. In essence, this defines a sequence of asymmetric (half) CSS codes [[m_j, 2k]] (32), with generator matrices G^{(j)}, where the rows of L^{(j)} define X-type logical operators. The sequence ends with a CSS code with an empty X-generator matrix, i.e., a classical binary code with the parity-check matrix H^{(s)}.
For each of the codes in the sequence (32), there is also a set of half-LLR coefficients {K_b^{(j)}}. Given an error vector e ∈ F_2^{m_j} matching the syndrome, ML decoding can be done by choosing the largest of the 2^{2k} Ising partition functions, cf. Eq. (19). By construction, the result should be the same for every j ≤ s. However, the complexities of computing the partition functions may differ dramatically. One possibility for exact ML decoding is thus to choose a subset I ⊂ V to optimize the partition function calculation. In particular, one option is to choose I_1 such that all rows of weights one, two, and three are eliminated from the corresponding matrix G^{(1)}. This minimizes the number of columns in the exact generator matrix of the marginal distribution. Even though all rows in the original circuit EEG generator matrix G have weights not exceeding three, row-reduction on rows of weight three tends to create higher-weight rows. Thus, in general, r_1 > 0: the resulting partition function is not expected to be trivial.
To quote some numbers, Tables I and II give the dimensions of the original and row-reduced generator matrices for two code families: the toy codes [[n_0, 1]] with d_X = n_0, d_Z = 1 and stabilizer generators Z_i Z_{i+1 mod n_0}, 0 ≤ i < n_0, and the rotated toric codes (Ex. 11 in Ref. 55) [[n_0, 1, 2t + 1]] with n_0 = t^2 + (t + 1)^2 and stabilizer generators Z_i X_{i+t mod n_0} X_{i+t+1 mod n_0} Z_{i+2t+1 mod n_0}, where 0 ≤ i < n_0 and t = 1, 2, . . ., with up to three measurement cycles. The simplest examples of the measurement circuits used for these two code families are shown in Figs. 3 and 4, respectively. In particular, while the original generator matrix for the circuit EEG of the [[13, 1, 5]] code with measurements repeated n_cyc = 3 times has dimensions 1066 × 1092, row-reduction of rows of weight up to three reduces the dimension to 182 × 247. A second possibility for exact ML decoding is to perform the summation over all spins, as described in Sec. IV C. This gives a probability distribution directly for the error equivalence classes; the logarithm of the probability of an error equivalent to e is given, up to an additive constant, by a weighted sum of the half-LLR coefficients K_b^{(fin)}. Unfortunately, such an exact expression is expected to have an exponentially long list of coefficients, see Eqs. (28) and (29) for an upper bound. The corresponding column numbers are large even for the relatively simple circuits in Tables I and II. An approximation can be constructed by retaining only the coefficients with |K_b^{(fin)}| > ε, for a given ε > 0. For the incompletely reduced matrices in Tables I and II, with ε = 0.1, the number of columns is reduced, roughly, by a factor of two. On general grounds, the reduction factor is expected to be much larger for fully-reduced generator matrices.
Alternatively, only a fixed number χ of the largest-in-magnitude coefficients may be preserved. This latter approach is similar in spirit to approximate tensor network contraction using singular value decomposition and a fixed maximum bond dimension. Notice that if only the coefficients corresponding to the columns of the matrix H′_0 in Eq. (30) are preserved, we get an approximation similar to the conventional phenomenological noise model.
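Both truncation rules, the ε-cutoff and the fixed-χ variant, amount to selecting a column subset of the generator matrix. A minimal sketch, with an illustrative matrix and coefficients of our own:

```python
import numpy as np

def truncate_coefficients(Theta, K, chi=None, eps=None):
    """Approximate the marginal distribution by keeping only the dominant
    half-LLR coefficients: either the chi largest in magnitude, or all
    those with |K_b| > eps.  Dropped columns of Theta are punctured away.
    Sketch only; pass exactly one of chi or eps."""
    K = np.asarray(K, dtype=float)
    if chi is not None:
        keep = np.argsort(-np.abs(K))[:chi]
        keep.sort()                       # preserve original column order
    else:
        keep = np.flatnonzero(np.abs(K) > eps)
    return Theta[:, keep], K[keep]
```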
The "history code" decoding algorithm suggested by Chubb and Flammia [5] can be seen as a special case of generator-based decoding. Here the measurement circuit is assumed to have a block structure, and summation is done over all circuit locations with the exception of those at the output of each block. With the circuits in Tables I and II, a block corresponds to a single measurement cycle. The full probability distribution is then recovered using Bayes' rule, assuming no correlations between measurements in different blocks. Evidently, even in this case, the complexity of the probability distribution accounting for full error correlations could be prohibitive for exact ML decoding.

C. Parity-based decoding via marginal distributions
Summations over the spins in the subsets I_j with increasing j, from a sequence like (31), give marginal distributions which account exactly for an increasing number of alternative spin configurations. Respectively, the degeneracy of the corresponding row-reduced half CSS codes decreases with increasing j, down to a classical code with no degeneracy at the end of the sequence, j = s. The general expectation is that the accuracy of minimum-energy (ME) decoding increases with j; ME decoding becomes strictly equivalent to ML decoding for the end-of-sequence classical code.
Formally, given a parity-check matrix H and a set of LLR weights K_b, ME decoding aims to find an error vector e which gives the correct syndrome, eH^T = s, and maximizes the log-likelihood Σ_b (−1)^{e_b} K_b or, equivalently, minimizes the error energy E = Σ_b e_b K_b. Compared to generator-based decoding, an obvious advantage is that there is no need to go over all 2^{2k} logical operators. Unfortunately, for generic codes, even the relatively simple problem of ME decoding has exponential complexity [56]. Given that the intermediate codes in the sequence (31) tend to be long, this makes it impractical to use generic ME decoding algorithms with exponential complexity, e.g., the information set [57,58] or the classical Viterbi [59] algorithm.
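The ME problem as stated can be written down directly; the brute-force sketch below (with an example matrix of our own) is exponential in the code length and serves only to fix the definition:

```python
import numpy as np
from itertools import product

def me_decode(H, K, s):
    """Minimum-energy decoding by exhaustive search: find the error e with
    syndrome e H^T = s (mod 2) minimizing E = sum_b e_b K_b.  Exponential
    in the number of columns of H; for illustration only."""
    best_e, best_E = None, np.inf
    for bits in product([0, 1], repeat=H.shape[1]):
        e = np.array(bits)
        if ((H @ e) % 2 == np.asarray(s)).all():
            E = float(e @ np.asarray(K))
            if E < best_E:
                best_e, best_E = e, E
    return best_e, best_E
```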
Notice, however, that the sparsity of the parity-check matrices H^{(j)} increases with j. The final matrix H^{(s)} is expected to be sparse whenever the output code is an LDPC code. Ideally, this classical code could be decoded with linear complexity using a variant of the belief propagation (BP) algorithm [60][61][62][63]. Assuming a sufficiently small fraction of failed-convergence cases, the result would be equivalent to ML decoding of the correlated errors.
Unfortunately, this does not look so promising, given that this final code is expected to have an exponential length, see Eq. (29). Additionally, as confirmed with limited numerical simulations [64], having a large number of small-in-magnitude LLR coefficients tends to reduce the convergence rate. Using approximate decoding schemes with a reduced number of LLR coefficients, as discussed in Sec. V B, is expected to help with both issues. Notice, however, that for such a reduction certain columns in the generator matrix Θ′ are merely dropped (puncturing). The corresponding transformation for the parity-check matrix H′ is shortening [65], which may reduce the sparsity of the resulting matrix and, in turn, negatively affect the convergence of BP decoding.
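The puncturing/shortening duality can be sketched on binary matrices (the example pair below is ours). Note how shortening mixes rows of the parity-check matrix, which is the mechanism behind the sparsity loss just mentioned:

```python
import numpy as np

def puncture_generator(Theta, drop):
    """Puncturing: simply delete the given columns of the generator matrix."""
    keep = [b for b in range(Theta.shape[1]) if b not in set(drop)]
    return Theta[:, keep]

def shorten_check(H, drop):
    """Shortening of the dual code: for each dropped bit, keep only the
    check combinations not supported on it (add one row touching the bit
    to all others that touch it, then discard that row and the column).
    The row additions are what can degrade sparsity."""
    H = H.copy() % 2
    active = np.ones(H.shape[0], dtype=bool)
    for b in sorted(drop, reverse=True):   # descending, so indices stay valid
        sup = [i for i in np.flatnonzero(H[:, b]) if active[i]]
        if sup:
            pivot = sup[0]
            for i in sup[1:]:
                H[i] ^= H[pivot]
            active[pivot] = False
        H = np.delete(H, b, axis=1)
    return H[active]
```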

VI. DISCUSSION
Improving the accuracy of syndrome-based decoding in the presence of circuit-level error correlations would both increase the threshold for scalable quantum computation and improve the finite-size performance of quantum error correction. The present results make two steps in this direction. First is the observation that ML decoding under these conditions amounts to decoding the code [21, 22] associated with the circuit EEG, in the absence of correlations. Second, the structure of this latter code can be significantly simplified using row-reduction transformations, while leaving the probability of successful ML decoding unchanged. A variety of approximate decoding schemes naturally follow, interpolating between exact ML decoding and decoding within a relatively simple phenomenological error model, with an additional handle on the degeneracy of the code actually used for decoding. Designing practical decoding algorithms, e.g., for surface codes with well-developed near-optimal syndrome measurement circuits, will require substantial additional effort. However, this does not look like an unsolvable problem.
Decoding quantum LDPC codes is a major problem in the theory of QECCs, especially in the fault-tolerant regime with syndrome measurement errors present. While substantial progress has been made in recent years [66][67][68][69][70], the problem remains open for generic highly-degenerate codes. The transformations discussed in Sec. IV change the degeneracy of a quantum code, and can even map a quantum code to a classical one. This opens new avenues to explore in decoding, in particular, new ways to apply existing iterative decoding algorithms to highly degenerate codes.
Circuit optimization: The traditional approach to quantum error correction is to start with a code, come up with an FT measurement circuit, compile it to the set of gates available on a specific quantum computer, and only then design a decoder. Instead, one could start with the list of permitted two-qubit gates on a particular device and enumerate all good error-detecting circuits, increasing the circuit depth and the number of gates. Given a circuit, it is easy to find the parameters of the input/output codes, as well as to construct the associated circuit code. While at the end we would still need to evaluate and compare the performance of the codes thus constructed, such a procedure could offer a shortcut to circuit optimization for specific hardware.