Fundamental thresholds of realistic quantum error correction circuits from classical spin models

Mapping quantum error correcting codes to classical disordered statistical mechanics models and studying the phase diagram of the latter has proven a powerful tool to study the fundamental error robustness and associated critical error thresholds of leading quantum error correcting codes under phenomenological noise models. In this work, we extend this mapping to admit realistic, multi-parameter faulty quantum circuits in the description of quantum error correcting codes. Based on the underlying microscopic circuit noise model, we first systematically derive the associated strongly correlated classical spin models. We illustrate this approach in detail for the example of a quantum repetition code in which faulty stabilizer readout circuits are periodically applied. Finally, we use Monte-Carlo simulations to study the resulting phase diagram of the associated interacting spin model and benchmark our results against a minimum-weight perfect matching decoder. The presented method provides an avenue to assess the fundamental thresholds of QEC codes and associated readout circuitry, independent of specific decoding strategies, and can thereby help guiding the development of near-term QEC hardware.

In this work, we extend the QEC-statistical mechanics model mapping and apply it to realistic quantum circuit-noise scenarios. For Clifford measurement circuits, correlations arising from circuit noise can be efficiently related to effective phenomenological models as direct input for (possibly sub-optimal) decoders [23,42]. Here, we use a similar approach but perform a systematic error propagation analysis and identify emerging effective noise processes in realistic quantum circuits. Instead of direct decoding, we derive the associated disordered classical spin model with correlated quenched disorder, determine its phase diagram and thereby obtain maximum threshold values agnostic to specific decoding strategies. This method allows one to assess the maximum potential  Figure 1: (a1) One cycle of the phase-flip repetition code with circuit-level noise, shown for three data qubits Di and two ancilla qubits ai. White boxes RY (R −1 Y ) represent single-qubit Y -rotations of π/2 (−π/2). Gray boxes (idle, init, 1Q, 2Q and read) are depolarizing channels describing faults during idling, state preparation, singlequbit, CNOT gates and measurements, respectively. (a2) (Top row) Examples of error propagation through CNOT and RY gates: Z errors on target qubits propagate to the control qubits; Z errors do not propagate from control to target qubits; Z errors are transformed into X errors through RY . (Center row) A Z error between two CNOTs on D1 propagates also as a measurement error to the ancilla a0. (Bottom row) A Z error between two RY on the ancilla results in a measurement error. (b1) Four measurement cycles showing error propagation and the affected ancilla qubits. (i) A data-qubit error (green asterisk) propagates along the green path and affects two ancilla qubits at every subsequent measurement. (ii) A measurement error (blue asterisk) affecting a single ancilla. (iii) An error between the CNOT gates of a data qubit (red asterisk) propagates along the red path and affects one ancilla at step t and two ancilla qubits in subsequent measurements. (iv) A space-time equivalence given by two qubit errors and two measurement errors, which does not trigger any ancilla. (b2) Error graph generated by the physical errors in panel (b1). Positions where qubit, measurement and correlated data phase-flip and measurement errors can happen are represented by vertical, horizontal and L-shaped edges, respectively. Colored semicircles represent ancilla qubits triggered by an error event. From this graph we derive the fundamental error events determining the couplings of the statistical mechanics Hamiltonian (panels (c1)-(c2)) and the syndrome volume. (c1) Fundamental effective errors e1, e2, e3 generated by the error processes (i)-(iii) and equivalences σ . of realistic state-of-the-art QEC architectures. This is particularly pressing for QEC codes when (near-)optimal decoding is computationally hard [43], when optimal decoders are unknown or strongly dependent on the chosen QEC circuitry and noise model details [44], e.g. in measurement circuits for high-weight stabilizers in color codes [45][46][47] or recently developed flag-qubit based QEC [48][49][50][51].
We illustrate the mapping of circuit-level QEC codes to statistical mechanics models for a 1D quantum repetition code with faulty circuitry described by a multi-parameter microscopic noise model accounting for a number of single-and two-qubit gate noise processes. Whereas this code does not enable correction of arbitrary errors, current experimental efforts focus on achieving repetitive QEC with this code in the regime of beneficial error suppression for increasing code sizes [52][53][54][55]. We show that in contrast to oversimplified phenomenological noise models, new effective correlated noise processes arise, which we systematically derive and quantitatively relate to error rates of the underlying microscopic noise model.
We anticipate a robust performance gap between efficient, though sub-optimal minimum-weight perfect matching (MWPM) decoding and the fundamental upper limit as established via the phase diagram of the statistical physics model. This demonstrates the viability of this complete mapping approach to identify maximal performance of QEC circuitry.

Noisy quantum error correcting circuits
Here we consider the phase-flip n-qubit repetition code [56,57] from the class of stabilizer QEC codes [58]. The ±1 eigenvalues of the n − 1 stabilizers S i = X i ⊗ X i+1 (with X i the Pauli X matrix on qubit i) form the error syndrome, with error states having at least one non-trivial syndrome. In order to diagnose potential errors, the stabilizers S i are measured by coupling them to n − 1 ancilla qubits. Phase-flip errors Z i are detected, unless they happen on all qubits, which would be indistinguishable from a logical phase-flip Z L = ⊗ n−1 i=0 Z i . Stabilizer measurements on real hardware are noisy as well and thus need to be repeated periodically. Analysing the discrete difference in time between successive measurement rounds results in another repetition code in the time domain, which yields a (1+1)-dimensional space-time syndrome volume. The code-capacity critical error threshold, below which successful QEC is feasible, is p c = 0.5, when the only error source are data qubit errors. The phenomenological noise threshold, where errors are injected with a rate p per round on both ancilla and data qubits is p c ≈ 0.11 [59]. This is identical to the threshold of the toric code with perfect measurements that exactly maps onto the repetition code with phenomenological noise, essentially trading one spatial dimension for the time dimension introduced by repeating stabilizer measurements in time.

Microscopic circuit noise
In contrast to the phenomenological noise models, which ignore the physical reality of the circuit the QEC code is running on, in this work we analyze a multi-parameter circuit level noise model that treats every component of the circuit as faulty, as illustrated in Fig. 1(a1). Here, every operation is followed by a depolarizing noise channel (grey boxes in Fig. 1(a1)). For single-qubit operations, we inject P i (P i ∈ X, Y, Z) with probability λ/3, where λ can represent state preparation (p sp ), idling (p id ), single-qubit gates (p 1 ) and measurements (p m ) [44,60]. Likewise, after a CNOT we inject a non-trivial one-and two-qubit Pauli operator P i ⊗ P j with probability p 2 /15. Errors propagate according to the rules described in Fig. 1(a2). For the special case p sp = p id = p 1 = p m = p 2 = λ the threshold was found to lie at λ ≈ 0.033 for maximum likelihood decoding [61].

Effective noise processes
Any microscopic circuit-level error configuration of the multi-parameter noise model will lead to one of three effective error processes or combinations thereof, called error chains E, with associated error probabilities: (i) a single data qubit phase flip error with probability p, (ii) a measurement error with probability q, and (iii) a correlated error flipping a data-qubit and simultaneously one of the adjacent measurement outcomes with probability r (examples shown in Fig. 1(b1)). Through propagation of Pauli error generators and factorizing the effective error channel(s) we can express an arbitrary Pauli circuit noise model in terms of the three effective error rates p, q and r (see Appendix A for details): Let us emphasize that this correspondence is exact: given an arbitrary set of circuit noise parameters, the above relationship expresses this set of noise parameters in terms of p, q and r exactly to all orders. We gather the error events from the microscopic processes ( Fig. 1(b1)) in the error graph of Fig. 1(b2): here vertical, horizontal and L-shaped lines indicate positions where a dataqubit (i), a measurement (ii) or a correlated error (iii) event has occurred, while colored semicircles indicate the stabilizers triggered by these, respectively. The error graph is used to construct the types of interactions appearing in the classical statistical mechanics Hamiltonian ( Fig. 1(c1)) and to construct the syndrome volume as input for the MWPM decoder.
From the error graph, we can visualize the error processes by introducing the lattice in Fig. 1(c1) where vertical links represent data qubits and horizontal links are measurement steps. The three error processes (i)-(iii) define the fundamental errors events e 1 , e 2 , e 3 shown in Fig. 1(c1): 1. the error e 1 is generated by a data qubit error (i) or a combination of a measurement error (ii) and a correlated error (iii) with probability 2. the error e 2 is generated by a measurement error (ii) or a combination of a data qubit error (i) and a correlated error (iii) with probability 3. the error e 3 is generated by a correlated error (iii) or a combination of a data qubit error (i) and a measurement error (ii) with probability

Statistical mechanics model
Mapping an error model into a statistical mechanics Hamiltonian allows us to compute the probability Pr(Ē) of the classĒ formed of all the errors that differ from a reference (candidate) error chain E by a so-called space-time equivalence [16,23]: A space-time equivalenceσ (simply called equivalence in the following) is a trivial sequence of errors not detected by the stabilizers and thus producing the same space-time syndrome. For the phase flip code, equivalences are given by the action of a first phase flip on a qubit D i at a measurement step t, followed by two measurement errors on the ancilla qubits of the stabilizers S i adjacent to D i and a second phase flip error on D i before the measurement step t + 2 (see (iv) in Fig. 1(b1)). We associate a classical Ising spin σ to each equivalence (e.g. σ 0 in Fig. 1(c1)) such that a spin configuration with σ = −1 effectively describes the error E = Eσ that differs from E by the equivalenceσ. Thus, once we fix E, sampling over all possible spin configurations is equivalent to sampling over all possible errors E in the classĒ, and the probability of the classĒ is Pr(Ē) = {σ} Pr(Eσ). The limiting behavior of the complementary probabilities Pr(Ē) and Pr(EZ L ) with increasing system size allows us to distinguish two regimes: below threshold, either of the probabilities converges to unity and hence applying a correction from the same error class will undo the error with probability approaching unity, above threshold all probabilities remain asymptotically finite, i.e. there is no decoding choice that removes the error with high probability [21].
Given E which is described by h ∈ {±1} and v ∈ {±1} and the set of equivalences {σ}, the probability of the error E can be written as Pr is an Ising Hamiltonian with correlated quenched disorder. The couplings J 1 , J 2 , J 3 and the inverse temperature β are fixed from the error model describing E by the so-called Nishimori conditions (see Eq. (14) in Appendix B). Ferromagnetic or antiferromagnetic interactions are given by the signs of v and h fixed by the error E. In the first sum of the r.h.s. of Eq. (5), σ and σ are the spins horizontally adjacent to the link (e.g. spins σ 1 and σ 2 in Fig. 1(c1)); in the second sum, σ and σ are the spins vertically adjacent to the link (e.g. spins σ 3 and σ 4 in Fig. 1(c1)); in the third sum, σ and σ are the two spins adjacent to the two arms of the L-shaped error e 3 (e.g. spins σ 5 and σ 6 in Fig. 1(c1)). A similar Hamiltonian has been analyzed for the surface code with spatially correlated phenomenological noise [23], whereas here Hamiltonian (5) arises from multi-parameter circuit noise. 3 Numerical results

Monte Carlo simulations
Mapping the QEC code to Eq. (5) allows us to access the critical thresholds of the code from thermodynamic properties of H E (σ) [16]. The error threshold separating the correctable from non-correctable parameter regime of the code corresponds to the phase transition between ordered and disordered phases of H E (σ). We determine this transition from Monte Carlo (MC) simulations and a scaling analysis of the two-body spin correlation length for various disorder strengths (see Appendix C for details). To this end, we transform the lattice in Fig. 1(c1) into the triangular one of Fig. 1(c2) where classical spins (white circles) reside on vertices and couplings J 1 , J 2 , J 3 are represented by vertical (blue), horizontal (green) and diagonal (red) links, respectively. Figure 2 shows the resulting phase diagram for (I) p = q = 2r, (II) p = q = r, (III) p = q = r/2, and for the uncorrelated case (IV) p = q and r = 0 (this reduces to the random-bond Ising model on a square lattice). We find that as the correlated error strength r increases, the thresholds decrease to p c (I) = 0.0925 (25), p c (II) = 0.0725 (25), p c (III) = 0.0475(25) from the uncorrelated threshold p c (IV) = 0.110 (5). Near the Nishimori lines, longer MC runs are required to locate p c and T c accurately. The error in p c is determined within the present MC statistics (see Appendix C).

Minimum weight matching decoder
The thresholds extracted via MWPM provide lower bounds to the thresholds obtained through the statistical mechanics mapping since they correspond to a decoding decision that is not necessarily optimal: MWPM looks for the likeliest errors given a syndrome, which in the statistical mechanics mapping corresponds to minimizing the energy as opposed to the free energy [16]. MWPM therefore disregards the degeneracy of errors, because the likeliest error need not necessarily be in the most likely class of equivalent errors. A discrepancy between MWPM and optimal (maximumlikelihood) decoding is known to exist already for phenomenological noise [62]. In order to make a fair comparison, we optimize the weight metric going into the MWPM algorithm, which uses the information available about the syndrome graph. A detailed derivation of the error weight metric used for decoding and information on the general MWPM strategy are provided in Appendix D.  Figure 3 shows the thresholds from MC simulations (blue points, solid line) and from MWPM decoding (red points, dotted line). The region where MWPM successfully decodes lies within error bars inside the fundamental region of correctability determined with the statistical mechanics mapping. This demonstrates a finite interval between MWPM and a higher fundamental threshold that could be approached or achieved by improving the decoding strategy. To connect the phase diagram to experimental situations, let us discuss the relative strength of r and its meaning. First, p and q both are typically dominated by similar error processes that for many qubit platforms are of comparable error rates, hence we set p = q for simplicity, and additionally, we dial up the strength of r. The known case r = 0 reduces to the phenomenological case without correlated errors, which would correspond to perfect two-qubit gates (p 2 = 0). Increasing r to r = p/2 = q/2 corresponds to p 2 = 5p 1 + O(p 2 1 ), which is roughly compatible with the two-qubit gate being an order of magnitude worse than single-qubit operations (error rate or infidelity) as observed in many experimental realizations [36,40,55,63]. Considering r = p = q corresponds to p 2 being dominant and all other error sources negligible (p sp = p id = p 1 = p m = 0). The case r = 2p = 2q goes beyond the depolarizing noise model that underlies Eq. (1): not only is the two-qubit gate the only error source, it furthermore specifically produces errors leading to the r type instead of p and q, which would be described by an asymmetric depolarizing noise channel biased towards r. Finally, pure r errors (p = q = 0) correspond to a situation where syndrome information is perfectly trustable if interpreted correctly, akin to a repetition code with perfect measurements (with a threshold approaching 0.5 asymptotically). We thus see that the most experimentally relevant region r = p ≈ q is where we also find a clear separation between MWPM and the fundamental threshold. The separation seems to become narrower for larger contributions of r, which is compatible with the expectation in the extreme case of only r errors with an asymptotic threshold of 0.5. From a practical operational viewpoint, the estimated thresholds for ideal decoding and MWPM for p = q = 2r correspond to single-qubit errors p 1 ≈ 0.02 (ideal) vs. 0.017 (MWPM) and the twoqubit error rate p 2 ≈ 0.087 (ideal) vs. 0.074 (MWPM). For p = q = r the threshold is p 2 ≈ 0.14 (ideal) vs. 0.12 (MWPM).

Discussion
Interestingly, in a recent experimental realization of the phase-flip code [55] circuit errors are reported to be well described by Pauli errors. Casting the experimentally obtained error rates into Eq. (1), we find that they correspond to effective error rates p = 0.032, q = 0.0285 and r = 0.0035, which is in excellent agreement with the observation that this experiment is thoroughly in the error-suppression regime.

Outlook
The techniques presented in this work can be extended to more complex complete QEC codes, including surface and color codes, and concatenated codes. We anticipate that as the code circuitry becomes more complex, it will be necessary to include more types of effective noise processes, which in turn will give rise to new statistical mechanics models with even richer interaction and disorder properties. It will also be interesting to extend the mapping technique to non-Clifford dynamics, and to temporally or spatially correlated circuit-level noise, to study the QEC potential of an even broader class of realistic quantum processors. A Effective error model from circuit level noise . . . Figure 4: Cutout of one ancilla and one data qubit of the QEC circuit. All noise contributions of the circuit-level noise model (see Fig. 1(a) of the main text) can be merged into the three channels P, Q and R. The effective error rates are given in Eq. (9).
Circuit-level noise in general, by definition, introduces noise channels at all locations inside the circuit. However for the given QEC code, it is possible to rewrite the noise model by exploiting the fact that many errors injected at different locations effectively lead to the same error and are thus equivalent, which allows us to arrive at a much simpler noise model (while keeping the mapping exact). Let us establish these equivalences as follows: on a data qubit, observe that X-errors do not affect the syndrome, hence for the purposes of decoding X ∼ 1 and thus also Y ∼ Z. On ancilla qubits, the situation is similar, except that the intermediate rotations R Y (±π/2) flip the relevant error: at initialization and read-out, only X-errors are relevant, Z does neither alter the initialization state nor the readout in the computational basis. Between the rotation gates R Y (±π/2), the roles are reversed and only Z is relevant. This immediately lets us move all singlequbit channels to one side, since all noise channels commute (Pauli channels commute) and the Zerrors commute past the control of the CNOT. Similarly, the data qubit single noise channels can be collected into four consecutive single qubit noise channels. Turning to the two-qubit error channels, let us observe that by the equivalence of errors there are four possible distinct errors arising from the two-qubit depolarizing channel: 11,1Z, Z1 and ZZ (omitting the tensor product symbol). There is a single new phenomenon encoded here: looking at the earlier CNOT noise channel, observe that ZZ is equivalent to a simple data qubit error that would have preceded the CNOT and 1Z (data-ancilla) is equivalent to a simple measurement error, Z1 generates a new syndrome phenomenon, which we call an r-type error. The effect of the second two-qubit noise channel is the same, just the roles of the terms being permuted. Assuming a depolarizing probability p 2 of the CNOT gate, defining the operators P = 1 ⊗ Z, Q = X ⊗ 1 and R = X ⊗ Z = P · Q and using hats for superoperator notationM (ρ) := M ρM † we thus have This channel is a mixture of three error events, which we would like to factorize into independent channels. Remarkably, we can make the ansatz of a product of three channels and solve for their error rates.
Note that the square means applying the channel twice, which comes from the fact that we have two two-qubit noise channels acting subsequently, both of the form (6). By symmetry of Eq. (6), the three rates must be equal, they turn out to be λ r = λ p = λ q = 8p 2 /15. For the r-type error we are done, but for the effective single-qubit channels P and Q, it remains to incorporate the four contributions from the single-qubit error channels. This is straightforward in the Paulitransfer-matrix (PTM) representation [64], where we characterize a channel E by the matrix R ij = tr(P i E(P j ))/d with i ∈ {0, 1, 2, 3}. The strength of this representation is that here the action of a sequence of channels is simply given by matrix multiplication of their PTMs. The PTM of a bit-flip channel isÊ and furthermore it is evident that composing bit-flip channels results in a new bit-flip channel, such that we can read off the error rate from computing one of the non-trivial entries on the diagonal. This results in These are the three channels depicted in Fig. 4. Let us comment on the structure of these expressions: we recover the standard circuit-level noise by setting p 2 = p id = p m = p 1 = p sp = λ. In that case, in leading order in λ we recover r/p = 1/6, which can be understood also by simply counting the number of locations weighted by their error probability. While distinguishing between different error rates of different components in the circuit is straightforward, the factorization of the two-qubit noise channel into independent channels crucially relied on the symmetry between the different P i ⊗ P j terms. This suggests that biasing a particular two-qubit noise term would make the prefactor in Eq. (6) non-symmetric and thus hinder the factorization of the channel. As a side note, we remark on a slightly non-intuitive feature in the conventional definition of the depolarizing channel: if we define it (as we do) as a random application of Paulis, complete depolarization (i.e. deterministically receiving the completely mixed state) corresponds to λ = 3/4 (single qubit depolarizing channel) or λ = 15/16 (two-qubit depolarizing channel), which shows that no term in the product inside Eq. (9) becomes negative. Additionally, some works in the literature choose to not use a depolarizing noise for every location but rather put a bit-flip channel at the initialization and measurement locations. The two differ by a rescaling-factor of 2/3, since only two of the three Pauli errors lead to errors at initialization and readout (e.g. in the computational basis a Z acts trivially etc.).

B Derivation of the correlated disordered interacting classical spin model
The technique of the statistical mechanical mapping is used to construct a classical statistical Hamiltonian with quenched disorder for describing the error model of a quantum code. The mapping relies on the identification of the partition function of the Hamiltonian with the probability Pr(Ē) of the classĒ = {E } of errors E that are equivalent up to the action of space-time equivalences to a reference error E. Several ways have been introduced for deriving this mapping [16,23]. Here we will follow mainly the approach of Ref. [23].
Any error E of the repetition code is composed of the fundamental errors e 1 , e 2 , e 3 introduced in the main text that in turn are generated by the three error processes (i) single-qubit phase flip error on data qubits with probability p, (ii) measurement error with probability q, and (iii) single-qubit phase flip errors happening on the target qubit between two adjacent CNOT gates that trigger a correlated occurrence of a measurement error on one of the neighboring ancilla qubits with probability r.
The probabilities of the errors e 1 , e 2 , e 3 are easily calculated and are given by where, for completeness, we added the probability of the trivial error e 0 that corresponds to the absence or the simultaneous presence of all of the three error processes. In a section of the lattice in Fig. 1(c1) of the main text made only of a horizontal and a vertical link (shown also in Fig. 5), the probability of a generic event E composed by the errors e j with j = 0, 1, 2, 3 can be written as where v, h ∈ {±1} are binary variables that take the negative value −1 if the horizontal (vertical) link belongs to the error E. The Boolean functions f j ∈ {0, 1} signal if e j belongs to the error E and they satisfy , otherwise they are zero. From this, they can be written as By substituting the Eqs. (12) in Eq. (11), the probability Pr(E) can be written as where [23] and the couplings J 0 , J 1 , J 2 , J 3 satisfy βJ 0 = 1 4 log (π 0 π 1 π 2 π 3 ) , βJ 1 = 1 4 log π 0 π 1 π 2 π 3 , The syndrome associated with the error E can also be generated by other error chains E = Eσ that differ from E by the action of the space-time equivalencesσ that are trivial sequences of data-qubit errors and measurement errors that are not detected by the stabilizers. Figure 6 shows the lattice with two highlighted equivalences:σ 0 andσ 1 belonging to the bulk and the boundary of the lattice, respectively.   We can associate classical Ising spins {σ} to each of the equivalences such that if a spin σ in this configuration is −1, we are effectively considering the error E = Eσ that differs from the reference error E by the equivalenceσ (for an example of an equivalence applied to an error E see Fig. 7). These Ising spins {σ} can be associated to the plaquettes of the lattice in Fig. 1(c1) of the main text (see also Fig. 7). The data-qubit errors (green vertical links in Fig. 7) entering the equivalenceσ are given by the vertical links of the lattice for which the product σ σ = −1 where the equivalences σ , σ belong to the left and the right plaquettes w.r.t (Fig. 8(a)). Similarly, the measurement errors (blue horizontal links in Fig. 7) entering the equivalenceσ are associated to the horizontal links of the lattice for which the product σ σ = −1 where the equivalences σ , σ belong to the top and the bottom plaquettes w.r.t. (Fig. 8(b)).
The probability Pr(E ) of the error E can also be written as the binary parameters h and v defining the error E are fixed by the parameters h and v of E and by the equivalence σ as follows: • for a vertical link , v = v σ σ where the equivalences σ , σ are on the left and on the right of ( Fig. 8(a)); • for a horizontal link , h is given by h = h σ σ where the equivalences σ and σ are on top and at the bottom of ( Fig. 8(b)).
Substituting the variables h and v in the Hamiltonian H E allows us to compute the probability Pr(Ē) of the classĒ = {E } composed by errors E that are equivalent up to the action of equivalences to a reference error E as where is the Ising Hamiltonian with correlated disorder of the main text and we have neglected the unimportant additive constant J 0 . The couplings J 1 , J 2 , J 3 and the inverse temperature β are fixed from the error model describing E by the Nishimori conditions previously obtained (see Eq. (14)). The type of interactions (ferromagnetic or antiferromagnetic) is given by the signs of v and h that are fixed by the reference error E. In the first sum of the r.h.s. of Eq. (17), σ and σ are the spins horizontally adjacent to the link (e.g. spins σ 1 and σ 2 in Fig. 1(c1) of the main text); in the second sum, σ and σ are the spins vertically adjacent to the link (e.g. spins σ 3 and σ 4 in Fig. 1(c1) of the main text); in the third sum, σ and σ are the two spins adjacent to the two arms of the L-shaped error e 3 (e.g. spins σ 5 and σ 6 in Fig. 1(c1) of the main text). Equation (16) shows that the probability Pr(Ē) of the classĒ can be written as the partition function The limiting behavior of the complementary probabilities Pr(Ē) and Pr(EZ L ) with increasing system size allows us to distinguish two regimes: below threshold, either of the probabilities converges to unity and applying a correction from the same error class will undo the error with probability approaching unity, above threshold all probabilities remain asymptotically finite [21].

C Monte-Carlo simulation study of the phase diagram
In this section we present the details on the numerical simulations of the Hamiltonian H E (σ).
In particular, we describe how we choose the couplings J 1 , J 2 , J 3 of H E (σ) and the methods for obtaining the phase transition points.

C.1 Couplings of the random bond-Ising model
The Hamiltonian of Eq. (17) that we analyse corresponds to a random-bond Ising model on a triangular lattice with couplings J 1 , J 2 and J 3 (see Fig. 1(c2)

Errors
Couplings  of these couplings are fixed by the Nishimori conditions (see Eq. (14)) while the type of interactions (ferromagnetic or antiferromagnetic) is assigned by drawing three random binary variables z p , z q , z r ∈ {±1} with probability p, q, r for each link of the triangular lattice. These variables take the negative sign if one of fundamental processes (data-qubit error, measurement error, correlated error) is present in the reference error E. Therefore, they fix the signs of the variables v and h and thus of the couplings entering the Hamiltonian of Eq. (17) according to Table 1.

C.2 Finite-size scaling and transition points
Determining the code threshold requires in general the computation of the scaling with the system size of the free energy cost of a domain wall as reported in Ref. [16]. However since in [23] the threshold of the quantum code has been proven to correspond to the phase transition point of the statistical mechanics model, in this work we locate the code threshold by looking at the critical point of the random Ising Hamiltonian. In the case of zero disorder the system is completely magnetized and a convenient order parameter is given by the total magnetization: M = x σ x where x denotes a site in the 2D lattice of linear size L (here we consider lattices with L = 16, 24 and 32). Instead of looking at the behavior of M for different system sizes, we define the Fourier transform of the spin correlation function σ 0 σ x aŝ and we extract the correlation length ξ L in the unit of lattice spacing from ξ L = 1 2 sin(q min /2) where q min = (2π/L, 0). The brackets · · · av denote averages over the quenched disorder distribution (in addition to the thermal average). Near the critical temperature T c , the correlation length ξ L is expected to obey a scaling relation [65] of the form ξ L /L ∼ f (L 1/ν (T − T c )), where f is an unknown scaling function and ν is the critical exponent related to the correlation length. Therefore, at the critical temperature T c , the quantity ξ L /L becomes independent of the temperature and T c can be found by locating the temperature at which the lines representing ξ L /L for different L intersect. Some examples of the comparisons of ξ L /L are given in Fig. 9 for the case p = q = 2r. Metropolis steps/spin-swap combination are repeated 10000 times. The spin correlation function is measured during the Metropolis steps. Averages are taken over the Metropolis steps and then these averages serve as one of the jack-knife bins used to determine the statistical error. We use 250 different quenched disorder samples to compute the average of the correlation length over the disorder distribution. Table 2 lists the details of the numerics: for every case shown in Fig. 2 of the main text, when the probability p is fixed to the values reported in the second column of the table, we found the critical temperatures reported in the third column when the simulations are carried with a number of metropolis step N met .
For assessing whether the random bond Ising model has thermalized we perform additional checks near the quenched probabilities at which the thermal transition crosses the Nishimori lines (Fig. 2 of the main text). In Fig. 10, we compare the effects of increasing the Metropolis steps between parallel tempering spin-swap steps. Comparing the two top panels of Fig. 10, we observe that there is no noticeable difference in the crossing of the lines ξ L /L when the number of Metropolis steps is increased from 800 (panel (a)) to 8000 (panel (b)) for the probability p = q = r = 0.070. On the other hand, the comparison of the two bottom panels in Fig. 10 (p = q = r = 0.075) shows that the crossing of ξ L /L for the three different lattice sizes, which seems to suggest a phase transition in panel (c) for 800 Metropolis steps, almost disappears for 8000 Metropolis steps (panel (d)) and the lines of ξ L /L overlap for all the temperatures T 1.35. Therefore we conclude that there is no transition at the quenched disorder probability p = q = r = 0.075 and the threshold probability should be larger than 0.070 but smaller than 0.075.
In Table 3, we list the largest p's at which MC simulations (run for N met = 8000) show a phase transition and the smallest p's at which MC simulations do not show a phase transition for the cases (I)-(IV). Figure 3 in the main text is produced with these data. The central value of these two p's is taken as the estimated threshold probability and a half of the difference is given as the error estimate. A more precise determination of the threshold probability will require even longer Metropolis updates between parallel tempering steps.

D Minimum-weight-perfect-matching decoding
To motivate this decoding strategy, observe that the probability of some error pattern E consisting of n i errors of the i−th type, where i = p, q or r) is proportional to    Table 3: Lower and upper bounds of the probability p between which the system exhibits a critical temperature for the cases (I)-(IV). The threshold probabilities and the associated errors shown in Fig. 3 of the main text for the statistical mechanics model are estimated by the mean and the deviation of these bounds for p.  In this case a transition is identified at a temperature Tc = 1.67. When instead p = q = r = 0.075 after the Nishimori line is crossed, a transition at Tc = 1.35 seems to appear when ξL/L is computed with 800 Metropolis steps (panel (c)). However, when we compute ξL/L with 8000 Metropolis steps (d) the lines do not cross any more and overlap for the temperature less than 1.35. This implies that a transition does not occur when p = q = r = 0.075.   Figure 12: Simplification in computing the distance of two defects (boxes labeled with 1s) in the triangular lattice. Depending on the values of the weights wp, wq, wr the shortest path between any two vertices can always be found by considering error patterns consisting of two types of errors only. If wr > wp + wq (pq case) the shortest path is the regular Manhattan distance of the square lattice, counting the number of p and q edges between the two defects (red path). If wr + wp < wq (pr case) the shortest path is composed by only p and r edges (blue path). Similarly, if wr + wp > wq (qr case) the shortest path is composed by only q and r (green path).
Given some observed collection of stabilizer measurement outcomes (the syndrome), repairing that syndrome amounts to finding a configuration of errors where every violated syndrome (a defect) is the endpoint of one error string, an assignment known as a (perfect) matching in graph theory (Fig. 11). Now Eq. (20) implies that finding the most likely among all these is equivalent to finding an error configuration with the lowest weight w tot = − log(p(E)) = n p w p + n q w q + n r w r , hence minimum weight perfect matching. By creating a graph with all defects as vertices and the weight between every pair of nodes given by the number of error locations times error type weight (w p = − log p 1−p etc.) between the respective stabilizers, this is what is solved by the above mentioned Blossom algorithm (implemented e.g. in [69]). As a side remark, we note that the QEC code at hand does have a boundary, i.e. error strings extending to a boundary qubit only create a single defect, which is not immediately amenable to the above but can be incorporated by creating a copy of the matching graph with weights set to zero and then putting an edge between every defect and its virtual partner with the corresponding weight of matching to the code boundary [70]. Due to the addition of r-type edges the calculation of weights going into the matching problem has to be adapted as well. While in general we would have to compute the shortest path on a triangular lattice with the three types of edges being weighted respectively, which can be done by a Dijkstratype algorithm, we can make a vast simplification by observing that the shortest path between any two vertices in our setting can always be found by considering error patterns consisting of two types of errors only. To see this, observe that any single edge in a certain path can be replaced by the other two edges in the triangle, which are hence of the other two types, e.g. a particular r-edge can be circumvented by moving across the neighboring p-and q-edge. Furthermore all vertices are connected to at least one edge of each type and the weights are globally the same (up to the edge type). Let us assume that w r < w p + w q (if this was not the case the metric would not change compared to the case without r-edges). Now the two possible cases are that either w r + w p < w q or w r + w p > w q . In the former case, we replace all q-moves by the same amount of r-and p-moves, in the latter case we do the opposite and replace all p-moves by q-and r-moves, which shows that given a path, we can always find a path of lower weight by eliminating one of the edge types. In the borderline case w r + w p = w q both replacements are admissible, such that the statement holds that we can eliminate one edge type (see Fig. 12). This shows that a minimum weight path between any two vertices necessarily lies on one of the three sublattices and the path weight on a sublattice is given by the Manhattan distance on the respective sublattice: Here the two superscript labels indicate the sublattice of the triangular lattice, that consists of the two types of edges. In the pq case this is just the regular Manhattan distance, counting the number of p and q edges between the two vertices at hand. In case of pr and qr this is the Manhattan distance on a skewed square lattice, but by simply rotating the basis vectors we can still use the Euclidean position vectors to compute the Manhattan distance on the respective sublattices. The weight function assigning the weight between any two defects then simply takes the minimum over the three weighted Manhattan distances. Having assigned the edge weights, we can then solve the minimum weight matching problem, which amounts to finding the most likely collection of errors explaining the observed syndrome in a trial run, from which we can therefore deduce the recovery operation we should apply to the code (or in which way we should update the "Pauli frame"). The figure of merit we are after is the logical error rate, the probability that the decoding strategy fails. In order to cleanly decide in our simulation whether a logical error happened or did not happen, we initialize a perfect codeword, we then simulate d subsequent QEC-cycles, each consisting of injecting phase-flip errors with probability p on each data qubit followed by a noisy syndrome extraction (the syndrome bit being flipped with probability q). We furthermore inject the third error type needed, namely a correlated flip of the data-qubit accompanied by a flip of one of the syndrome measurement outcomes adjacent to it (depending on the order of the two-qubit gates this deterministically happens on the syndrome bit sitting either to the left or to the right of the data qubit). This type of error is injected with probability r. In order to make sure that we can declare the binary outcome of the trial run logical error/no logical error, we let the final measurement be perfect, i.e. in the final round the syndrome bits never flip (with q or r). The final measurement being perfect ensures that matching up all defects will with certainty put the data qubits back into a code state. The simulation results are shown in Fig. 13. h) Figure 13: The logical error rate obtained with the minimum weight perfect matching decoder on the triangular syndrome lattice. Left column is a broad sweep, right column is a closeup around the transition point. We plot the average logical error rate as a function of the input error rates p, q, and r. Shown are the three cases p = q = r in a)+b), p = q = 2r in c)+d), p = q = r/2 in e)+f) and p = q, r = 0 in g)+h) . On the left for the broader scan, we use 10 4 samples, data points in the right colum are averaged over 10 5 samples. In each case, we observe a transition from error suppression to error enhancement with increasing error rate. This transition is signified by the behavior of the logical error rate when increasing the code size (the distance d): for small p, increasing the distance leads to a suppression of the logical error rate, whereas for error rates beyond an inflection point increasing the code size instead leads to an increased logical error rate. We estimate the threshold from the region where we can not distinguish the logical error rates within errorbars.