Generalized Belief Propagation Algorithms for Decoding of Surface Codes

Belief propagation (BP) is well known as a low-complexity decoding algorithm with strong performance for important classes of quantum error correcting codes, notably for the quantum low-density parity-check (LDPC) code class of random expander codes. However, it is equally well known that the performance of BP breaks down on topological codes such as the surface code, where naive BP fails entirely to reach a below-threshold regime, i.e. the regime where error correction becomes useful. Previous works have shown that this can be remedied by resorting to post-processing decoders outside the framework of BP. In this work, we present a generalized belief propagation method with an outer re-initialization loop that successfully decodes surface codes, i.e., as opposed to naive BP, it recovers the sub-threshold regime known from decoders tailored to the surface code and from statistical-mechanical mappings. We report a threshold of 17% under independent bit- and phase-flip data noise (to be compared to the ideal threshold of 20.6%) and a threshold of 14% under depolarizing data noise (compared to the ideal threshold of 18.9%), which are on par with thresholds achieved by non-BP post-processing methods.


Introduction
Josias Old: j.old@fz-juelich.de
Manuel Rispler: rispler@physik.rwth-aachen.de

Quantum computing devices suffer from operational errors and decoherence. Methods to keep errors in check and advance towards fault-tolerant quantum computing involve quantum error correcting codes. A code is formally defined as a k-dimensional subspace of some n-dimensional Hilbert space. In the class of quantum error correcting codes called (quantum) stabilizer codes, the error correction procedure consists of three main elements: the (non-destructive) measurement of stabilizer operators, the decoding of this measurement to obtain a suitable recovery operation, and the application of the latter to correct the errors on the codestate. In order to achieve fault-tolerant quantum computation, all three stages have to be done in a fast and efficient way to avoid the accumulation of errors.
A class of codes which has recently received a lot of attention are quantum low-density parity-check codes (qLDPC codes) [1]. Given an asymptotically constant rate r = k/n, they are proven to achieve fault tolerance with only constant overhead. That is, the ratio of the total number of qubits used in the fault-tolerant protocol to the number of qubits in a non-fault-tolerant circuit is asymptotically constant for increasing code size [2]. An important quantity for error correction is the minimum distance: the minimum weight (i.e. the number of qubits involved) of a logical operation on the codespace. Naturally, a large distance implies better protection against errors. Quantum error correcting codes are called good if their rate is constant and the minimum distance scales linearly with the code size. The existence of such codes, albeit non-qLDPC ones, has long been established [3,4]. In a recent seminal work, Panteleev and Kalachev showed that it is actually possible to construct asymptotically good qLDPC codes [5]. Shortly after that breakthrough, several other constructions achieved a similar scaling behavior [6,7].
A key ingredient of a fault-tolerant protocol, which will be the focus of this work, is an efficient decoding algorithm. Proofs of fault tolerance often treat the classical processing of the error correction cycle as free. In practice, this is not the case, and decoders should be fast enough to prevent additional errors from occurring. This typically results in a trade-off between decoder accuracy, i.e. how well the decoding algorithm finds a good, ideally the most likely, correction, and decoder computational complexity, i.e. how fast the algorithm can be executed as a function of the input size. There are a variety of decoders that are highly adapted to the quantum code in use. In this paper, we consider a generalization of a classical decoding algorithm called belief propagation (BP) or the sum-product algorithm. For classical LDPC codes, it is among the best practical decoding algorithms: it achieves error correction close to the theoretical upper bound on the information transfer, the Shannon capacity [8], while remarkably maintaining low algorithmic complexity. In principle, any classical decoding algorithm can be used to decode quantum codes, which stems from the fact that any quantum code can be formulated as a so-called CSS (Calderbank-Shor-Steane) code, which decouples into two classical codes that can (in principle) be decoded independently. However, this has many pitfalls. The most relevant one for us here is the observation that belief propagation on the surface code can completely fail to converge, or converge to decoding decisions that do not remove the error. This is thought to be the underlying reason for the observation that the surface code exhibits no sub-threshold regime (the regime of error rate where the logical error rate can be arbitrarily suppressed by increasing the code size) when decoded with BP [9].
This is in stark contrast to tailored quantum decoders such as minimum weight matching, under which the surface code is typically celebrated for its high threshold value [10]. Recently introduced post-processing methods for BP showed that BP can then in principle decode various types of qLDPC codes [11,12]. We will focus on a single decoder that achieves the same at lower complexity. The Generalized Belief Propagation (GBP) algorithm was previously considered in the context of decoding quantum bicycle codes [13].
This paper is structured as follows. First, we review the basics of stabilizer error correction and quantum LDPC codes. In Section 2 we show the relation of GBP to standard BP and adapt it for the decoding of quantum codes. Section 3 applies GBP to surface codes and shows numerical evidence of the emergence of a threshold.

Stabilizer Codes
Stabilizer codes are defined by an Abelian subgroup of the Pauli group, S ⊆ P_n with −1 ∉ S. The (commuting) elements of the group are the stabilizers or parity checks S ∈ S. The codespace Q is then defined as the subspace of the Hilbert space H_n that is stabilized by S. For any codestate |ψ⟩ it therefore holds that S|ψ⟩ = |ψ⟩ for all S ∈ S. If S has n − k independent generators, the codespace is k-dimensional and encodes k qubits.
Consider a Pauli error E ∈ P_n occurring on a codestate, |ψ⟩ → E|ψ⟩. Such an error can be detected by measuring all stabilizer generators, if any of those anticommute with the error, SE|ψ⟩ = −ES|ψ⟩ for some S ∈ S. The binary outcomes of all stabilizer measurements form the syndrome or syndrome vector s ∈ GF(2)^(n−k) with

s_c = 0 if S_c and E commute, and s_c = 1 if they anticommute.

Here, s_c denotes the measurement outcome of stabilizer generator S_c for c = 1, . . . , n_c = n − k. This gives rise to three different scenarios. We say there occurred a
1. detectable error if s ≠ 0,
2. trivial error if s = 0 and E ∈ S,
3. logical error if s = 0 and E ∉ S.
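As a minimal illustration, the syndrome computation and the three scenarios can be sketched for the 3-qubit bit-flip code, a hypothetical toy example not used in the paper; only X-type errors are considered, so a plain binary matrix-vector product suffices:

```python
import numpy as np

# Hypothetical 3-qubit bit-flip code: stabilizers Z1Z2 and Z2Z3.
# Rows of H mark which qubits each stabilizer acts on.
H = np.array([[1, 1, 0],
              [0, 1, 1]])

def syndrome(H, e):
    """Binary syndrome s_c = sum_q H[c,q] * e[q] mod 2."""
    return H @ e % 2

def classify(H, e):
    """The three scenarios from the text (X-type errors only)."""
    s = syndrome(H, e)
    if s.any():
        return "detectable"
    # s == 0: either the identity (trivial here, since this toy code
    # has no non-identity X-type stabilizer) or a logical operator.
    return "trivial" if not e.any() else "logical"

assert classify(H, np.array([1, 0, 0])) == "detectable"
assert classify(H, np.array([0, 0, 0])) == "trivial"
assert classify(H, np.array([1, 1, 1])) == "logical"   # X1X2X3 flips the logical qubit
```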
The last case represents the operators that map non-trivially between codestates, the logical operators. They can formally be defined using the centralizer of S in P_n, C_P(S) = {L : LS = SL ∀S ∈ S}, such that the logical operators are L = C_P(S) \ S. The (minimum) distance of the stabilizer quantum code then corresponds to the minimal weight of a logical operator, d = min_{L∈L} wt(L).

Decoding of Stabilizer Codes
The decoding problem refers to the inference of a suitable correction from the measured syndrome. Because trivial errors have zero syndrome, they define an equivalence class of corrections for every detectable error. This feature, called degeneracy, implies that corrections only need to be found up to a trivial error. Due to the linearity of the codes, errors up to weight t = ⌊(d − 1)/2⌋ can be uniquely matched to a codestate and hence be corrected. A simple decoder involves a lookup table which stores a suitable correction, for example the lowest-weight error matching each measured syndrome. Making assumptions on the error probabilities can improve this approach.
To that end, we consider Pauli error channels,

E(ρ) = Σ_{P∈{I,X,Y,Z}} p_P P ρ P with Σ_P p_P = 1.

Furthermore, we assume that the qubits are memoryless and suffer from errors independently,

p(E) = ∏_q p(E_q).

We can write the probability of errors conditioned on the observation of a syndrome using Bayes' rule and the fixed "evidence" p(s) = 1 as

p(E|s) ∝ p(E) ∏_c δ(s_c = s_c(E)),

where s_c(E) denotes the syndrome bit that error E would produce on check c. By δ(i = j) we denote the Kronecker delta δ_ij. It assigns zero probability to all error configurations E that have a commutation relation inconsistent with the measurements. In quantum error correction, the ideal decoder returns an error guess in the most likely error class, specified by all errors that are equivalent up to an element of the stabilizer group. Given a syndrome, this maximum-likelihood decoding identifies

[E]⋆ = arg max_{[E]} Σ_{S∈S} p(ES|s).

Note, however, that directly calculating the probabilities for an error class (or even just storing the lookup table) quickly becomes intractable, since there are 2^(n−k) different syndromes. For example, storing all syndromes of a code defined by 42 independent stabilizer generators requires approximately 550 GB of memory.
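The memory estimate can be checked with a one-line computation, assuming (as the quoted 550 GB figure suggests) one bit of storage per syndrome entry:

```python
# Back-of-the-envelope check of the lookup-table memory estimate:
# a code with 42 independent stabilizer generators has 2**42 distinct
# syndromes; even a single stored bit per syndrome already needs
n_generators = 42
bits = 2 ** n_generators          # one bit per syndrome entry (assumption)
gigabytes = bits / 8 / 1e9
print(round(gigabytes))           # ~550 GB, matching the text
```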
Possibly more efficient decoding strategies rely on relaxed constraints, such as finding the
• most likely error, i.e. identifying E⋆ = arg max_{E∈P_n} p(E|s), or the
• qubit-wise most likely error, i.e. identifying E⋆_q = arg max_{E_q} p_q(E_q|s), where p_q(E_q|s) = Σ_{E_{q′}: q′≠q} p(E|s) are the single-qubit marginal probabilities.
Note that these equations are agnostic of the quantum nature of the underlying problem and have been studied extensively in various settings, including classical decoding. Finding the most likely error is already less involved than finding the most likely error class, but is in general still NP-complete [14]. Calculating p(E|s) directly, and even inferring the marginal probabilities p_q(E_q|s), still involves an exponential number of terms. However, there exist algorithms that can, under certain conditions, calculate the marginals in linear complexity O(n). This comes at the cost that the qubit-wise most likely error might globally be inconsistent with the observed syndrome. Before introducing such an algorithm, the belief propagation algorithm, we fix the notation and graphical representations used throughout this paper.
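To make the exponential cost concrete, a brute-force computation of the single-qubit marginals p_q(E_q|s) for a toy bit-flip-only code might look as follows; the matrix H and error rate are illustrative choices, not from the paper:

```python
import itertools
import numpy as np

# Toy illustration (bit-flip errors only): exact qubit-wise marginals
# p_q(e_q | s) by brute force over all 2**n error configurations.
# This is exponential in n, which is exactly what BP avoids.
H = np.array([[1, 1, 0],
              [0, 1, 1]])            # 3-qubit repetition-code checks
p_flip = 0.1                          # i.i.d. physical error rate (assumed)
s_obs = np.array([1, 0])              # observed syndrome

n = H.shape[1]
marg = np.zeros((n, 2))
for bits in itertools.product([0, 1], repeat=n):
    e = np.array(bits)
    if not np.array_equal(H @ e % 2, s_obs):
        continue                      # delta term: inconsistent with s
    prior = np.prod(np.where(e == 1, p_flip, 1 - p_flip))
    for q in range(n):
        marg[q, e[q]] += prior
marg /= marg.sum(axis=1, keepdims=True)

# Qubit-wise most likely error: argmax of each single-qubit marginal
e_hat = marg.argmax(axis=1)
assert list(e_hat) == [1, 0, 0]
```

Here the qubit-wise guess happens to be globally consistent with the syndrome; as noted above, that is not guaranteed in general.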

Representation of Stabilizer Codes
Algebraic representation The stabilizer group and most operations used in stabilizer error correction can be mapped from the Pauli group to vector spaces over finite fields (or Galois fields), denoted by GF(q) with q ∈ {2, 4} [15].
In both cases, the Pauli word E is mapped to a vector e ∈ GF(q)^((3 − q/2)n), i.e. of length 2n in the binary and length n in the quaternary case. The group operation is mapped to element-wise addition over the finite field, EE′ → e + e′. Commutation of two Paulis E, E′ can be checked using the symplectic product, denoted by ⋆: in the binary case, with e = (e_X | e_Z), it reads e ⋆ e′ = e_X · e′_Z + e_Z · e′_X mod 2, and E, E′ commute iff e ⋆ e′ = 0.

Figure 1: Tanner graphs of the Steane code. Purple circles correspond to qubits, green squares to parity checks. In the quaternary representation, blue edges correspond to X-type checks and red edges to Z-type checks. Note that since the Steane code is a CSS code, the binary Tanner graph splits into two disjoint graphs T = T_X ⊔ T_Z.
The stabilizer generators are collected in a parity check matrix H ∈ GF(2)^((n−k)×2n) or H ∈ GF(4)^((n−k)×n), such that the measurement of all stabilizers can be represented by the symplectic matrix-vector product s = H ⋆ e. The actual implementation in the binary and quaternary framework can be found in App. B.
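A possible GF(2) realization of the symplectic product and of the syndrome as a symplectic matrix-vector product, a sketch in the spirit of App. B using the common (x | z) ordering convention (which may differ from the paper's):

```python
import numpy as np

# A Pauli on n qubits as a length-2n binary vector e = (x | z);
# two Paulis commute iff their symplectic product is 0.
def symplectic_product(a, b):
    n = len(a) // 2
    ax, az = a[:n], a[n:]
    bx, bz = b[:n], b[n:]
    return int(ax @ bz + az @ bx) % 2

# Single-qubit sanity checks: X = (1|0), Z = (0|1), Y = (1|1)
X = np.array([1, 0]); Z = np.array([0, 1]); Y = np.array([1, 1])
assert symplectic_product(X, Z) == 1   # X and Z anticommute
assert symplectic_product(X, X) == 0   # X commutes with itself
assert symplectic_product(Y, Z) == 1   # Y and Z anticommute

# Syndrome as a symplectic matrix-vector product: row-wise products
def syndrome(H, e):
    return np.array([symplectic_product(h, e) for h in H])
```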

Tanner graph representation
Stabilizer codes can be graphically represented as Tanner graphs, similar to classical codes [16]. These are bipartite graphs with two vertex sets Q and C, representing the qubits and the stabilizer measurements/parity checks respectively. The edge set E consists of edges e = (q, c) drawn between vertices q ∈ Q and c ∈ C if qubit q is involved in the parity measurement of stabilizer c.
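A Tanner graph can be built directly from a parity-check matrix; the following minimal sketch (a hypothetical helper, not from the paper) records the bipartite adjacency:

```python
# Minimal Tanner-graph construction from a parity-check matrix:
# an edge (q, c) exists whenever H[c][q] != 0.
def tanner_graph(H):
    edges = [(q, c) for c, row in enumerate(H)
                    for q, entry in enumerate(row) if entry]
    qubit_nbrs, check_nbrs = {}, {}
    for q, c in edges:
        qubit_nbrs.setdefault(q, []).append(c)
        check_nbrs.setdefault(c, []).append(q)
    return edges, qubit_nbrs, check_nbrs

H = [[1, 1, 0],
     [0, 1, 1]]
edges, qubit_nbrs, check_nbrs = tanner_graph(H)
assert check_nbrs[0] == [0, 1]      # check 0 touches qubits 0 and 1
assert qubit_nbrs[1] == [0, 1]      # qubit 1 sits in both checks
```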
In the algebraic representations, the Tanner graph is encoded by the parity check matrices introduced above: an edge (q, c) is present iff the corresponding entry of H is non-zero.

Low-Density Parity-Check Codes
Low-Density Parity-Check (LDPC) codes are families of classical codes with a sparse parity-check matrix. The most successful classical LDPC codes rely on random or pseudorandom constructions of the parity-check matrix, most notably Sipser and Spielman's expander codes [18]. Their properties include a constant rate, a linear distance and efficient decoders; such codes are referred to as good codes [19]. This has motivated attempts to construct quantum versions of LDPC codes (qLDPC codes) in the hope of obtaining good quantum codes with a constant rate and a distance linear in the number of qubits. In addition to having good decoding properties, it was famously shown by Gottesman that such codes, if they exist, enable fault-tolerant quantum computation with only constant overhead [2].
In general, qLDPC codes can be defined similarly to classical LDPC codes as codes with a sparse parity check matrix. This corresponds to quantum stabilizer codes with stabilizer generators of low weight that is upper bounded by a constant. In particular, a (d_c, d_q)-qLDPC code ensemble has parity checks measuring at most d_c qubits, and every qubit is involved in at most d_q syndrome measurements. This broad definition includes a range of well known codes, such as the surface codes. They are defined on a lattice, therefore exhibit a high degree of symmetry, and have only nearest-neighbor interactions [20]. They have a minimum distance d ∝ √n but suffer from a vanishing rate r → 0 as the number of qubits n → ∞. A more general construction, the hypergraph product (HGP) codes, can achieve a constant rate, determined by the ratio d_q/d_c, when based on good (e.g. random) classical codes [21].
Very recently, a construction by Panteleev and Kalachev that builds on a G-lifted product of expander codes over non-abelian groups G was proven to achieve constant rate and linear distance [5]. Similar constructions, like the quantum Tanner codes or codes from balanced products of lossless expanders, achieve the same [6,7].
The advantageous properties of good qLDPC codes manifest at large qubit numbers. For near-future applications, a moderate number of qubits on the order of a few hundred is realistic. It is therefore reasonable to focus on the less intricate hypergraph product construction, which we briefly recap in the following.

Hypergraph Product Codes
The hypergraph product construction uses graph-based arguments to derive quantum codes from classical codes. For details refer to [21]; for the following considerations the construction rule for the parity check matrices shall be sufficient. Let H_1 ∈ GF(2)^(m_1×n_1) and H_2 ∈ GF(2)^(m_2×n_2) be two classical parity check matrices. Their hypergraph product is the CSS code on n_1 n_2 + m_1 m_2 qubits which has its parity check matrices constructed from the classical parity check matrices as

H_X = (H_1 ⊗ I_{n_2} | I_{m_1} ⊗ H_2^T),
H_Z = (I_{n_1} ⊗ H_2 | H_1^T ⊗ I_{m_2}).

If we choose the base codes to have minimum distance linear in their length, the HGP construction gives quantum codes with minimum distance Ω(√n). The construction trivially preserves sparsity and therefore translates the classical LDPC property into a quantum LDPC property. Some choices of base codes give well known quantum codes.
• Taking the classical (cyclic) repetition codes as base gives the topological (toric) surface codes [20]. They have vanishing rate and a distance d ∝ √n. The graph-based construction is shown in Fig. 2.
• Taking the product of two (good) classical expander codes yields the quantum expander codes [22]. These codes have constant rate and a minimum distance d ∝ √n. A slightly simplified version uses random classical codes that, with some known probability, have specific expansion properties.

For such codes, a well known and widely used method for decoding is belief propagation.
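The construction rule can be sketched with Kronecker products; the block layout below is one common convention for the HGP check matrices and may differ from [21] in ordering. Taking the cyclic repetition code as base reproduces the toric-code qubit count:

```python
import numpy as np

# Sketch of the hypergraph product: given classical parity-check
# matrices H1 (m1 x n1) and H2 (m2 x n2), form the CSS pair
#   HX = ( H1 (x) I_n2 | I_m1 (x) H2^T )
#   HZ = ( I_n1 (x) H2 | H1^T (x) I_m2 )
def hypergraph_product(H1, H2):
    m1, n1 = H1.shape
    m2, n2 = H2.shape
    HX = np.hstack([np.kron(H1, np.eye(n2, dtype=int)),
                    np.kron(np.eye(m1, dtype=int), H2.T)])
    HZ = np.hstack([np.kron(np.eye(n1, dtype=int), H2),
                    np.kron(H1.T, np.eye(m2, dtype=int))])
    return HX, HZ

# Cyclic repetition code as base -> toric code
n = 3
I = np.eye(n, dtype=int)
H_rep = (I + np.roll(I, 1, axis=1)) % 2     # circulant check matrix
HX, HZ = hypergraph_product(H_rep, H_rep)
assert HX.shape[1] == 2 * n * n             # 2n^2 physical qubits
assert not (HX @ HZ.T % 2).any()            # X and Z checks commute (CSS)
```

The final assertion verifies the CSS commutation condition H_X H_Z^T = 0 mod 2, which holds for this block layout by construction.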

Generalized Belief Propagation
We now introduce Generalized Belief Propagation due to Yedidia, Freeman and Weiss (YFW) [23]. We then show how a decoder for quantum codes can be constructed from it.
The Tanner graph introduced in Sec. 1.3 can also be thought of as an instance of a factor graph representing a joint probability distribution over factors f [24],

p(x) = (1/Z) ∏_f f(x_f).

In a physical system in thermal equilibrium, Boltzmann's law gives the probability of a state,

p(x) = e^(−βE(x))/Z with Z = Σ_{x∈P} e^(−βE(x)),

where P is the space of all possible states x and β the inverse temperature, which we set to 1 in the following. A connection between the two can be drawn by identifying the probability distributions and therefore defining an energy E(x) of a state x of the factor graph to be

E(x) = −Σ_f ln f(x_f).

The ultimate goal in quantum decoding is to maximize the probability distribution of the errors, which can be seen to correspond to finding the minimum free energy configuration. While this will generally be computationally infeasible, one can construct a tractable variational ansatz. In the following, we present GBP appealing to a general intuition for variational methods and refer the reader to Appendix A for background on variational methods and a detailed derivation of GBP.

Derivation of the GBP Algorithm
Generalized Belief Propagation relies on region-based approximations to the free energy. These are a class of approximations to F(b) in which the approximate free energy is a function of beliefs over sets of variables, called regions. From now on, we will call the factor graph the Tanner graph, the factors check nodes and the variables qubit nodes, to facilitate the transition to the decoder based on GBP.
We can define a region r of a Tanner graph as a set of qubit nodes Q_r and a set of check nodes C_r such that if a check node c is in C_r, then all its neighboring qubit nodes Γ(c) are in Q_r. The general idea is to define regions of the factor graph and then approximate the overall free energy with the sum of the free energies of all the regions, subject to the validity conditions shown in the following.
The thermodynamic quantities of a region are defined as
• region average energy U_r(b_r) = Σ_{x_r} b_r(x_r) E_r(x_r) and region entropy H_r(b_r) = −Σ_{x_r} b_r(x_r) ln b_r(x_r),
• region free energy F_r(b_r) = U_r(b_r) − H_r(b_r).
When constructing regions, every check and qubit node should be contained in some region. Since check and qubit nodes might appear in multiple regions, it is necessary to introduce counting numbers c_r in order to ensure that every check and qubit is only counted once when summing over regions. They need to be chosen such that for a set of regions R of a Tanner graph

Σ_{r∈R: q∈Q_r} c_r = 1 for every qubit q and Σ_{r∈R: c∈C_r} c_r = 1 for every check c.

The overall, region-based approximate thermodynamic quantities are then given by
• region-based average energy U_R = Σ_{r∈R} c_r U_r and region-based entropy H_R = Σ_{r∈R} c_r H_r,
• region-based free energy F_R = U_R − H_R.
Note that not every choice of regions admits a valid set of counting numbers. To understand this, consider a valid choice of regions R = {r_i} with corresponding counting numbers {c_i}, such that the above counting condition holds for qubit q. Now adding qubit q to a region r′ with counting number c_{r′} and q ∉ r′ results in

Σ_{r: q∈Q_r} c_r = 1 + c_{r′},

which equals 1 iff c_{r′} = 0. Also note that different choices of regions can yield approximations of different quality [25].
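The counting-number condition can be checked numerically; the snippet below uses a small hypothetical Bethe-style region set (two large regions sharing one qubit), not a region set from the paper:

```python
# Numerical check of the counting-number condition: the counting
# numbers must sum to one over all regions containing each node.
# Regions are given as (set of node labels, counting number).
regions = [
    ({"q0", "q1", "c0"}, 1),     # large region: check c0 plus its qubits
    ({"q1", "q2", "c1"}, 1),     # large region: check c1 plus its qubits
    ({"q1"}, 1 - 2),             # small region: qubit q1, c = 1 - |P(q1)|
]

def counting_sum(node, regions):
    return sum(c for members, c in regions if node in members)

for node in ["q0", "q1", "q2", "c0", "c1"]:
    assert counting_sum(node, regions) == 1
```

Adding q1 to yet another region with non-zero counting number would break the assertion, mirroring the argument in the text.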

Region graph
Similarly to the Tanner graph, the relations between different regions can be formalized in a graph-theoretical framework. To that end, YFW introduce the region graph (RG). It is a labeled, directed graph G = (R, E, L) in which each vertex corresponds to a region r ∈ R. A directed edge e ∈ E may exist from region r_p to r_c if (Q(r_c) ∪ C(r_c)) ⊂ (Q(r_p) ∪ C(r_p)), i.e. if the set of constituents of the child is a subset of the parent's constituents. An example of a valid region graph for the Steane code is shown in Fig. 3.

Figure 3: Example of a region graph for one part of the Steane code (corresponding to the classical Hamming code). Green and violet numbers represent checks and qubits respectively; counting numbers of the regions are at the top left corner. The "shadow" of the red-bordered region containing qubits 0, 1 is shaded in blue, the "blanket" in orange. See Sec. 2.1.2 for the definitions of shadow and blanket.

Notation
We adopt the notation from [26] (which differs slightly from YFW [23]) and adapt it to the error correction setting:
• R: set of all vertices (= regions) of the region graph,
• E: set of all (directed) edges of the region graph,
• P(r), C(r), A(r), D(r): set of all parents, children, ancestors, descendants of region r,
• Q_r, C_r: qubits and checks in region r,
• S(r) = D(r) ∪ {r}: "shadow" of the region r,
• B(r) = P(S(r)) \ S(r): "blanket" of the region r.
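The shadow and blanket definitions translate directly into set operations; the region graph below is a small hypothetical example, not one from the paper:

```python
# Tiny hypothetical region graph, stored as region -> set of parents.
parents = {
    "r_top1": set(),
    "r_top2": set(),
    "r_mid": {"r_top1", "r_top2"},
    "r_bot": {"r_mid"},
}

def children(r):
    return {c for c, ps in parents.items() if r in ps}

def descendants(r):
    out, stack = set(), [r]
    while stack:
        for c in children(stack.pop()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def shadow(r):                      # S(r) = D(r) ∪ {r}
    return descendants(r) | {r}

def blanket(r):                     # B(r) = P(S(r)) \ S(r)
    s = shadow(r)
    return set().union(*(parents[x] for x in s)) - s

assert shadow("r_mid") == {"r_mid", "r_bot"}
assert blanket("r_mid") == {"r_top1", "r_top2"}
```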

Algorithm
We can recover the beliefs as approximations of the true marginal probabilities by minimizing the region-based free energy F_R({b_r}). To that end, a Lagrangian is constructed with the constraint that the beliefs shall be consistent between every parent and child region, i.e.

Σ_{x_p \ x_r} b_p(x_p) = b_r(x_r) for every edge (p → r),

and a normalization constraint

Σ_{x_r} b_r(x_r) = 1.

Setting the derivatives of the Lagrangian with respect to the beliefs equal to zero gives implicit equations for the Lagrange multipliers and the beliefs, as shown in [27]. They can be solved iteratively, and for that reason the Lagrange multipliers (or functions thereof) are often called messages and the corresponding algorithms are referred to as message-passing algorithms [19]. In the following we present a more intuitive approach. For that, assume that a belief of a region first contains all local factors in that region. Messages from parent regions p into child regions r will be of the form m_{p→r}(x_r). In order to catch all possible dependencies, we consider all messages that include some variables contained in that region. However, over-counting of factors should be prevented. This can be achieved by first considering all messages from parents into r. Secondly, we take into account all messages from regions which are not descendants of r but point into its descendants D(r). This corresponds to the shadow and blanket of a region, such that an ansatz for the belief of a region r can be written as

b_r(x_r) ∝ ∏_{c∈C_r} f_c(x_c) ∏_{p∈P(r)} m_{p→r}(x_r) ∏_{(p′→r′): p′∈B(r), r′∈S(r)} m_{p′→r′}(x_{r′}).

By x_c we denote the variables in the support of check c. The message update rules follow from demanding consistency between parent and child regions, giving the parent-to-child algorithm. In the following, the upper index (i) denotes the iteration step in order to formalize the iterative procedure. Note that with a uniform initialization of the messages, the beliefs of parent and child regions p, r are incompatible at step i = 0. We therefore update the message from p to r by that mismatch,

m^(i+1)_{p→r}(x_r) = m^(i)_{p→r}(x_r) · (Σ_{x_p \ x_r} b^(i)_p(x_p)) / b^(i)_r(x_r).

The overlap of messages in the numerator and denominator can then be canceled out to reduce the number of calculations. In that form, it becomes clear that consistent beliefs correspond to converged messages.
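The consistency constraint and the mismatch ratio driving the message update can be illustrated on toy beliefs over two binary variables:

```python
import numpy as np

# Parent belief b_p(x0, x1) over two binary variables, normalized;
# the child region holds only x1.
b_parent = np.array([[0.1, 0.3],
                     [0.2, 0.4]])
b_child = b_parent.sum(axis=0)           # marginalize out x0

assert np.isclose(b_parent.sum(), 1.0)   # normalization constraint
assert np.allclose(b_child, [0.3, 0.7])  # consistency: marginal = child belief

# The GBP update multiplies m_{p->r} by exactly this mismatch ratio,
# so a ratio of one everywhere means the messages have converged.
mismatch = b_parent.sum(axis=0) / b_child
assert np.allclose(mismatch, 1.0)
```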

GBP as a Decoder for Quantum Codes
In order to draw the connection to quantum decoding, we identify the variables with the qubits. The factors, i.e. the functional relations between qubits, correspond to Kronecker delta tensors with entries according to the measured syndrome outcomes,

f_c(x) = δ(s_c, H_c ⋆ x),

where H_c denotes the row of the parity check matrix belonging to check c. The state of the system x coincides with the Pauli error E, such that in the binary representation x = (x_1, x_2, . . . , x_{2n}) ∈ GF(2)^{2n} with x_q ∈ {0, 1}. In the following, we denote the errors by x in order to avoid confusion with the energy. Note that the algorithm introduced above includes both an implementation separating the X- and Z-part of a CSS quantum code (GF(2)) and a combined GF(4) implementation. The differences then lie in the length of the vectors, the set of messages used to calculate region beliefs, and in the syndrome function. The latter uses implementations of the symplectic product in the GF(2)- and GF(4)-framework respectively (Eq. 53 and Eq. 57).
At each iteration, the parent-to-child algorithm gives an estimate of the marginal probability distribution of each region r. These beliefs can be used to infer a guess of the error inflicted on the qubits.

How to make a hard decision
The argument of the maximum (arg max) of the belief of a region gives the most probable error on the region's qubits. This is often referred to as making a hard decision. In the end, a hard decision has to be taken for every qubit individually, whereas every qubit might be part of multiple regions. Consistency of hard decisions from different regions is only guaranteed if all messages have converged. This is in general not the case while iterating, and is also not guaranteed to happen at all. We therefore have to find a strategy to combine the different contributions from the region beliefs into the overall hard decision. A straightforward strategy is to focus on the highest-level regions R_0 = {r ∈ R : c_r = 1} and make a hard decision as

x̂_q = arg max_{x_q} b_r(x_q) for some r ∈ R_0 with q ∈ Q_r.

Naturally, if the region beliefs are not compatible, the overall error guess will not be consistent with the observed syndrome. In order to improve upon that, we use a more promising strategy: whenever the error guesses from different regions on a single qubit are incompatible, we compare the beliefs of the respective regions and settle for the one with the largest belief. This corresponds to decoding as

x̂_q = arg max_{x_q} b_{r⋆}(x_q) with r⋆ = arg max_{r: q∈Q_r} max_{x_r} b_r(x_r).

In the following, we first show how standard belief propagation can be recovered from this more general approach and then show two different strategies to cope with further obstacles.
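The second, belief-comparing strategy might be sketched as follows; the data layout and the region beliefs are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Combine hard decisions from overlapping regions: when regions disagree
# on a qubit, keep the guess from the region whose belief in its own
# argmax configuration is largest. `beliefs` is a list of
# (qubit indices, belief table over their binary error values).
def hard_decision(beliefs, n_qubits):
    guess = np.zeros(n_qubits, dtype=int)
    best = np.zeros(n_qubits)               # largest belief seen per qubit
    for qubits, table in beliefs:
        flat = table.argmax()
        conf = table.flat[flat]
        local = np.unravel_index(flat, table.shape)
        for q, e in zip(qubits, local):
            if conf > best[q]:
                best[q], guess[q] = conf, e
    return guess

# Two overlapping regions disagreeing on qubit 1; region B is more confident.
region_a = ([0, 1], np.array([[0.4, 0.1], [0.3, 0.2]]))   # argmax: x0=0, x1=0
region_b = ([1, 2], np.array([[0.1, 0.2], [0.6, 0.1]]))   # argmax: x1=1, x2=0
guess = hard_decision([region_a, region_b], 3)
assert list(guess) == [0, 1, 0]     # qubit 1 follows the more confident region
```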

Bethe approximation
One choice of regions, called the Bethe approximation, gives an approximation equivalent to the standard BP algorithm. It was originally introduced as sum-product decoding for (classical) LDPC codes by Gallager [28]. Later it was independently rediscovered by Pearl as a method to efficiently calculate single-variable marginals on factor trees [29]. Poulin and Chung first applied BP to the decoding of quantum codes [30]. For an introduction to the BP algorithm, see for example [31,12]. Here, we show how to obtain the standard BP equations from the generalized ansatz.
For this purpose, construct two types of regions, large and small. The large regions each contain a single check node and its neighboring qubit nodes. The small regions comprise single qubit nodes, such that all qubit nodes that have more than one parent have their own small region. The counting numbers are c_{r,large} = 1 and c_{r,small} = 1 − |P(q)| for the single qubit q in the small region. The beliefs of small regions {l_i} and large regions {L_i} are given by

b_l(x_q) ∝ p(x_q) ∏_{c∈Γ(q)} m_{c→q}(x_q),
b_L(x_{Q_c}) ∝ f_c(x_c) ∏_{q∈Q_c} p(x_q) ∏_{c′∈Γ(q)\c} m_{c′→q}(x_q).

Using the consistency constraint between parent and child beliefs, we find for the message updates

m_{c→q}(x_q) = Σ_{x_{Q_c\q}} f_c(x_c) ∏_{q′∈Q_c\q} m_{q′→c}(x_{q′}),
m_{q→c}(x_q) = p(x_q) ∏_{c′∈Γ(q)\c} m_{c′→q}(x_q).

The definition of the second type of messages (qubit to check) in the last line recovers the standard BP equations when identifying Q_c ≡ Γ(c) and P(q) ≡ Γ(q). Note that, instead of following one of the region-based hard-decision rules above, standard BP thresholds the beliefs of the small regions.
Intuitively, the BP algorithm operates on the Tanner graph of the code by sending messages along its edges. Starting on the qubit side, it assigns initial error probabilities (e.g. depending on the error channel) to the qubit nodes. This information is sent to the parity check nodes along the edges. The parity check nodes collect all incoming messages and send back a message to all adjacent qubits. In its components, this message contains the sum over all configurations compatible with the observed syndrome, excluding the target qubit. Subsequently, the marginal probability of each qubit is calculated according to the incoming messages. If these are not yet compatible or converged, they are sent back, excluding the receiver's own information. This procedure is shown graphically in Fig. 4.
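The message flow described above corresponds to the following minimal binary sum-product decoder (flooding schedule, log-likelihood-ratio form); this is a sketch for a single X- or Z-part, not the paper's implementation:

```python
import numpy as np

# Minimal binary sum-product BP for syndrome decoding.
def bp_decode(H, s, p, n_iter=30):
    m, n = H.shape
    llr0 = np.log((1 - p) / p)        # prior log-likelihood ratio
    q2c = {(q, c): llr0 for c in range(m) for q in np.flatnonzero(H[c])}
    for _ in range(n_iter):
        # check-to-qubit: tanh rule, sign flipped by the syndrome bit
        c2q = {}
        for c in range(m):
            nbrs = np.flatnonzero(H[c])
            for q in nbrs:
                prod = np.prod([np.tanh(q2c[(q2, c)] / 2)
                                for q2 in nbrs if q2 != q])
                c2q[(q, c)] = (-1) ** s[c] * 2 * np.arctanh(
                    np.clip(prod, -0.999999, 0.999999))
        # qubit-to-check: prior plus all other incoming messages
        for (q, c) in q2c:
            q2c[(q, c)] = llr0 + sum(c2q[(q, c2)]
                                     for c2 in np.flatnonzero(H[:, q])
                                     if c2 != c)
    # marginals and hard decision (threshold the small-region beliefs)
    llr = np.array([llr0 + sum(c2q[(q, c)] for c in np.flatnonzero(H[:, q]))
                    for q in range(n)])
    return (llr < 0).astype(int)

H = np.array([[1, 1, 0], [0, 1, 1]])      # repetition-code checks (a tree)
e = np.array([1, 0, 0])
assert np.array_equal(bp_decode(H, H @ e % 2, 0.05), e)
```

On this tree-shaped Tanner graph BP is exact; the split-belief failures discussed below arise precisely when the graph and error pattern are symmetric.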

Numerical results
Using this choice of regions, we obtain the decoding performance for hypergraph product codes based on random matrices and for topological surface codes shown in Fig. 5. The random codes are (7, 4)-qLDPC and the surface codes are (4, 4)-qLDPC. We see that the random codes perform well in the sense that increasing the distance of the code increasingly suppresses failures. The surface codes, however, show the opposite behavior. In both cases, the primary cause of a decoding failure is not a logical error, but a failure to return an error compatible with the syndrome or a failure to converge.

Success and obstacles using BP
The disadvantages and problems involved in classical BP decoding are extensively covered in the literature. These mainly concern harmful patterns in the Tanner graphs of the code due to cycles or trapping sets [32]. Their existence in classical codes translates to quantum codes when using constructions based on the classical codes, like the hypergraph product construction. This also means that classical methods like message scheduling can be used to alleviate such problems [9]. The nature of quantum codes themselves introduces new obstacles to BP decoding. Certain syndromes allow for different error configurations that can be translated into each other by symmetry transformations of the corresponding Tanner graph.

Figure 6: Split belief on a patch of the surface code, decoded with BP. Tanner graph representation with qubits as purple circles and parity checks as green squares. X-Paulis are represented by blue and Z-Paulis by red edges. The initial error is indicated by filled qubits, the violated checks by filled squares, and the error guess of the decoder by the heptagons. The Z-error Z_12 Z_31 violates parity checks {6, 9}. The (degenerate) error pattern Z_7 Z_30 is symmetric to the original one. The standard BP decoder returns the union of all those qubits, Z_7 Z_30 Z_12 Z_31, which has empty syndrome, leading to a decoding failure.
In a qubit-wise decoding fashion, the BP decoder assigns the same probability to each qubit involved in such configurations. With this split belief, the decoder thresholds all those qubits to the same error guess and therefore fails to converge. Some scheduling methods can break these symmetries but there is no general method to avoid them. Using irregular base codes and graphs of odd degree distribution also reduces the amount of symmetry, helping the decoder. An exemplary split belief is shown and explained in Fig. 6.
Because surface codes are highly symmetric, many such split beliefs occur during decoding, which explains their bad performance. HGP codes with random base codes, however, show good decoding performance because their local topology is inherited from the random classical codes, which do not exhibit split beliefs.
There are two main strategies for improving the performance of BP. The first makes changes to the algorithm itself, Memory Belief Propagation for example shows good decoding performance at the cost of a slightly higher complexity [33]. A version of GBP was used to improve the decoding performance in quantum bicycle codes [13].
Other methods take the soft output of the BP algorithm (i.e. the marginal probabilities) and use it as input for post-processing. Notably, Ordered Statistics Decoding allows applying the combined BP+OSD decoder across a wide range of qLDPC codes, again at the cost of higher complexity [11,12].

GBP for the Surface Code
We use the same region graph as used for the Bethe approximation in Sec. 2.2.2, i.e. with large and small regions. The reason for that choice is that for surface codes, this construction method is equivalent to using the cluster variational method, a standard method for constructing valid region graphs [27]. This choice of regions ensures that the smallest split beliefs, i.e. those related to two syndrome excitations, are resolved. The transition from the Tanner graph to the region graph is shown for a binary implementation in Fig. 7.
We implement the hard decision based on the first strategy above, i.e. thresholding the beliefs of the highest-level regions. We show in Fig. 9 how this helps with the prototypical split belief from Fig. 6. Using this hard decision procedure, we might still get a split belief within a single large region. However, in our simulations we observe that this is not the case, i.e. the arg max of the region belief is unique towards the end of the iterations.

Split and Repeat
We observe that while decoding based on the region beliefs (using the second, belief-comparing strategy) improves upon standard BP, there still exist error guesses incompatible with the overall syndrome. However, applying the proposed correction usually reduces the syndrome weight. This is similar to one decoding iteration of the small-set-flip (SSF) algorithm proposed by Leverrier, Tillich and Zémor [22]. In the SSF algorithm, a configuration of qubits in the support of a stabilizer is flipped if it decreases the syndrome weight. This works well for quantum codes with a sufficiently expanding Tanner graph. In surface codes, however, there are constant-weight error patterns that lead to failure of the SSF algorithm. The fact that our hard-decision heuristic after the GBP inference reduces the residual syndrome weight suggests that this issue can be overcome.
We use this insight to formulate a split-and-repeat procedure: after a run of GBP, we save the current error guess and re-initialize the decoding procedure with the lower-weight residual syndrome and a rescaled error probability. We repeat this until the overall syndrome is empty. The overall error guess is then the sum of all intermediate errors. The procedure is given in Alg. 1. An exemplary run for a particularly harmful error pattern on a distance-9 surface code is shown in Fig. 8; correspondingly, the free energy decreases over the course of decoding.
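The outer loop just described can be sketched as follows. This is an illustrative skeleton, not the paper's implementation: the inner GBP inference is passed in as a callable stand-in, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using BitVec = std::vector<int>;  // binary vectors, entries mod 2

// XOR two binary vectors (mod-2 addition).
static BitVec add_mod2(const BitVec& a, const BitVec& b) {
    BitVec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = (a[i] + b[i]) % 2;
    return r;
}

// Syndrome sigma(e) = H e mod 2.
static BitVec syndrome_of(const std::vector<BitVec>& H, const BitVec& e) {
    BitVec s(H.size(), 0);
    for (std::size_t c = 0; c < H.size(); ++c)
        for (std::size_t q = 0; q < e.size(); ++q)
            s[c] = (s[c] + H[c][q] * e[q]) % 2;
    return s;
}

static int weight(const BitVec& v) { int w = 0; for (int x : v) w += x; return w; }

// Split-and-repeat outer loop in the spirit of Alg. 1: run the inner decoder
// on the residual syndrome, accumulate the guess, rescale the a-priori
// probability by the weight of the current total guess, and repeat until the
// overall syndrome is cleared or the repetition budget n_mr is spent.
BitVec split_and_repeat(const std::vector<BitVec>& H, const BitVec& s,
                        double p_init, int n_mr,
                        const std::function<BitVec(const BitVec&, double)>& gbp) {
    std::size_t n_qubits = H.front().size();
    BitVec e_total(n_qubits, 0);
    BitVec s_i = s;
    for (int i = 0; i < n_mr; ++i) {
        double p = std::fabs(p_init - double(weight(e_total)) / double(n_qubits));
        e_total = add_mod2(e_total, gbp(s_i, p));
        s_i = add_mod2(s, syndrome_of(H, e_total));  // residual syndrome
        if (weight(s_i) == 0) break;                 // overall syndrome cleared
    }
    return e_total;
}
```

With a toy inner decoder that returns the correct pattern for a given syndrome, the loop terminates after one repetition; in the real decoder, several repetitions on successively lighter residual syndromes are the typical behavior.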

Dependence on Initial Probabilities
Algorithm 1: GBP split-and-repeat decoding for surface codes.
  Input: parity-check matrix H, syndrome s, a-priori probability p_init, maximum number of iterations and repetitions n_mi, n_mr
  Output: error guess ê
  RG = region graph constructed from H with the Bethe approximation
  i = 0; ê_total = 0; s_i = s
  while i < n_mr do
    p̃_init = |p_init − wt(ê_total)/n_qubits|
    ê_i = GBP(s_i, RG, p̃_init, n_mi)
    ê_total = ê_total + ê_i
    if σ(ê_total) =: s_GBP = s then
      return ê = ê_total
    else
      s_i = s + s_GBP   (residual syndrome, lower weight)
      i = i + 1

An additional parameter of the decoding procedure is the initial probability. As noted by Hagiwara et al., a linear decoder with a fixed initialization can correct a certain error set independent of the error probability [34]. Kuo and Lai remark that choosing a fixed initialization can prevent fluctuations and increase decoding stability [33]. In our simulations, however, we observe that a fixed initialization error probability can lead to decoding failure. We therefore consider two further adaptations.
On the one hand, when re-initializing after a split procedure, we rescale the channel error probability by the weight of the current error guess, see Alg. 1.
On the other hand, when decoding still fails, we re-initialize the whole decoder with a different initial probability. Assuming good decoding performance when p_init is close to the channel error probability, we sample the new initial probability from a Gaussian of width 0.1 centered at the channel error probability.
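The Gaussian re-sampling step can be sketched as follows; the clamping of the sample to the open interval (0, 1) is our own guard (a probability outside that range is meaningless as a channel prior), and the function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// On a full re-initialization after decoding failure, draw a fresh a-priori
// probability from a Gaussian of width 0.1 centered at the channel error
// probability, as described in the text. Clamping to (0, 1) keeps the draw a
// valid probability; this guard is an assumption of this sketch.
double resample_p_init(double p_channel, std::mt19937& rng) {
    std::normal_distribution<double> gauss(p_channel, 0.1);
    return std::clamp(gauss(rng), 1e-6, 1.0 - 1e-6);
}
```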

Numerical Results
[Figure caption: vertical lines represent the re-initialization after convergence or after the maximum number of iterations is reached; re-initializing allows the decoder to escape the oscillatory behavior.]

Remarkably, in our simulations we never observe a decoding failure: the decoder always returns an error guess that puts the corrupted word back into the codespace. The results for independent XZ-noise and depolarizing noise are shown in Figs. 10 and 11 for the binary and quaternary implementation, respectively. They generally show a decreasing logical error probability with increasing distance and a crossing indicating a threshold. The threshold results are summarized in Tab. 1.
Figure 9: Split belief on a patch of the surface code, decoded with GBP. Tanner-graph representation with qubits as purple circles and parity checks as green squares. X-Paulis are represented by blue and Z-Paulis by red edges. The initial error is indicated by filled qubits, the violated checks by filled squares, and the error guess of the decoder by heptagons. The Z-error Z_12 Z_31 violates the parity checks {6, 9}. The (degenerate) error pattern Z_7 Z_30 is symmetric to the original one in the sense that both have equal weight. After i = 4 iterations, the GBP decoder returns the correct error, successfully decoding the split belief.

Complexity. In a naive implementation directly adapting Eqs. 34 and 36, the GBP algorithm requires
• q^{d_c} multiplications per check region c, i.e. at most n_c q^{d_c} in total,
• q^{d_q} multiplications per qubit region q, i.e. at most n_q q^{d_q} in total.
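The naive multiplication count above can be tallied in a few lines; this is pure bookkeeping for the bound n_c q^{d_c} + n_q q^{d_q}, with region counts and degrees depending on the code at hand.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on the naive per-iteration multiplication count of the GBP
// updates: each of the n_c check regions costs q^{d_c} multiplications and
// each of the n_q qubit regions costs q^{d_q}, with q the field size and
// d_c, d_q the check and qubit degrees. Illustrative bookkeeping only.
long long gbp_mult_bound(long long n_c, int d_c, long long n_q, int d_q, int q) {
    long long per_check = std::llround(std::pow(double(q), d_c));
    long long per_qubit = std::llround(std::pow(double(q), d_q));
    return n_c * per_check + n_q * per_qubit;
}
```

For instance, with binary messages (q = 2) and bulk surface-code degrees d_c = 4 and d_q = 2, each check region costs 16 multiplications and each qubit region 4.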

Summary and Outlook
We developed a decoder based on Generalized Belief Propagation using a specific hard-decision method and an outer re-initialization loop. With these adaptations, the decoder is able to decode the surface code. The logical error probability decreases with growing distance below a certain qubit error probability, indicating the emergence of a threshold of about 14% for depolarizing noise and 17% for independent bit- and phase-flip noise. As is typical for practical decoders, these values fall short of the theoretical upper bounds but come at a lower decoding complexity. Compared to known decoding algorithms, our decoder shows threshold values similar to BP-OSD. The main ingredient of the OSD post-processing is matrix inversion, which scales with the third power of the number of rows, i.e. O(n_c^3) [12]. Minimum-weight perfect matching achieves a higher threshold but scales in general as O(n_q^3) [37]. The almost-linear-time union-find decoder also achieves a slightly higher threshold [38].
Future work can include a lower-complexity implementation based on log-likelihood ratios, which are frequently used in standard BP algorithms to reduce complexity. This would allow the decoder to be tested on more general quantum LDPC codes, where finding a fast and general decoder is ongoing research.

Table 1: Thresholds from error sampling on the XZ-channel and the depolarizing channel for the binary (GF(2)) and quaternary (GF(4)) implementation. The thresholds fall short of the optimal thresholds obtained by statistical-mechanics methods but are similar to those from BP-OSD decoding. The optimal threshold for the XZ-channel is obtained from the single-Pauli threshold p^X_th ≈ 10.9% as p_th = 2 p^X_th − (p^X_th)^2 ≈ 20.6%. Columns: channel (Ch.), q, p_th, BP-OSD, optimal; rows for GF(2) and GF(4). [Numerical table entries not recoverable from the extraction.]
Additionally, all simulations were performed with code-capacity noise, i.e. the data qubits experience noise through the quantum channel while the syndrome readout is assumed to be perfect. A next step on the road towards fault tolerance is to extend the decoding scheme to more realistic noise models, for example including faulty syndrome measurements. Recent results suggest that belief propagation is also suitable for syndrome noise when using repeated measurements, and even in a single-shot decoding scheme [39,40].

Simulation Methods
The decoder is implemented in a GF(q) formalism with both q = 2 and q = 4 in C++, making use of the libraries libDAI [41], the LEMON graph library [42], xtensor [43], NTL [44] and the nlohmann JSON headers [45]. The code can be found on GitHub [46].

Acknowledgments
This work received support under grant agreement number 820495 and by the BMBF project MUNIQC-ATOMS. This research is also part of the Munich Quantum Valley (K-8), which is supported by the Bavarian state government with funds from the Hightech Agenda Bayern Plus. MR was supported by ERC grant EQEC No. 682726 during the initial part of this work. Simulations were performed with computing resources granted by RWTH Aachen University under project thes1045.

A Variational Methods in Statistical Mechanics
The paradigmatic example of a variational method in quantum mechanics is the Ritz method (colloquially often known as Rayleigh-Ritz). Here, the setting is that we are given a Hamiltonian and the task is to find its ground state. Since this task is in general computationally hard already for the simplest non-trivial, practically relevant Hamiltonians, it is fruitful and instructive to develop systematic methods to construct trial wavefunctions that approximate the ground-state wavefunction while retaining computational feasibility. The fundamental insight is that the energy expectation value of the problem Hamiltonian evaluated on any state |ψ⟩ is lower bounded by the expectation value in the true ground state; this relies only on the fact that we can expand |ψ⟩ in the Hamiltonian eigenbasis, upon which the statement follows immediately.

This paradigm of systematically constructing trial states can be extended to mixed states and finite temperature. The role of the energy is taken over by the (Helmholtz) free energy F = E − TS = −(1/β) log Z, where S is the entropy and Z = tr e^{−βĤ} the partition function. With respect to the free energy, any trial state fulfils the Bogoliubov inequality

F[ρ_trial] ≥ F[ρ_c].  (46)

Note that this contains the Rayleigh-Ritz inequality in the zero-temperature limit. It can be proved by exploiting the Gibbs inequality tr A log A ≥ tr A log B, which holds for any A, B ≥ 0 provided tr A = tr B. Plugging in the canonical ensemble (or Gibbs state) ρ_c = e^{−βĤ}/Z for B leads to −S(ρ_trial) + β tr Ĥρ_trial + log Z ≥ 0, which can be rearranged into the Bogoliubov inequality, stating that the free energy of any trial density matrix ρ_trial is lower bounded by the free energy of the canonical ensemble. This permits the extension of the variational principle to mixed states, where we now try to approximate the Gibbs state by a trial density matrix. In the context of the present work, we are dealing "only" with probability distributions, so let us point out the rather trivial fact that the above inequality in particular also holds when Ĥ is diagonal and the density operators are simply multivariate probability distributions.
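For completeness, the chain of inequalities just described can be displayed explicitly (same symbols as above, with T = 1/β):

```latex
% Gibbs inequality, valid for A, B \geq 0 with \operatorname{tr} A = \operatorname{tr} B:
\operatorname{tr} A \log A \;\geq\; \operatorname{tr} A \log B .
% Insert A = \rho_{\mathrm{trial}} and B = \rho_c = e^{-\beta \hat H}/Z:
\operatorname{tr} \rho_{\mathrm{trial}} \log \rho_{\mathrm{trial}}
  \;\geq\; -\beta \operatorname{tr} \hat H \rho_{\mathrm{trial}} - \log Z ,
% i.e., with S(\rho) = -\operatorname{tr} \rho \log \rho,
- S(\rho_{\mathrm{trial}}) + \beta \operatorname{tr} \hat H \rho_{\mathrm{trial}} + \log Z \;\geq\; 0 ,
% which rearranges (divide by \beta) to the Bogoliubov inequality:
F[\rho_{\mathrm{trial}}]
  = \operatorname{tr} \hat H \rho_{\mathrm{trial}} - T\, S(\rho_{\mathrm{trial}})
  \;\geq\; -\tfrac{1}{\beta} \log Z = F[\rho_c] .
```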

A.1 Minimization of the Free Energy
For a variational ansatz, we introduce a trial probability distribution, the belief b(x). Equivalently to Eq. 48, the free energy of such a trial probability distribution is lower bounded by the free energy of the true distribution, i.e. F[b] ≥ F. Due to the intractability of a brute-force approach, the trial functions are generally restricted or approximated by some factorized form.
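Spelled out in the classical setting (a sketch consistent with the notation above, for configurations x with energy E(x)), the variational free energy of a belief reads:

```latex
% Variational (Gibbs) free energy of the belief b:
F[b] \;=\; U[b] - \tfrac{1}{\beta} S[b]
      \;=\; \sum_x b(x)\, E(x) \;+\; \tfrac{1}{\beta} \sum_x b(x) \ln b(x) ,
% lower bounded by the true free energy,
F[b] \;\geq\; F \;=\; -\tfrac{1}{\beta} \ln Z ,
\qquad Z = \sum_x e^{-\beta E(x)} ,
% with equality iff b is the Boltzmann distribution:
b(x) = e^{-\beta E(x)}/Z .
```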

B Representation of Stabilizer Codes
Binary representation (q = 2). The binary representation maps Pauli words of length n to binary vectors of length 2n. We can represent any Pauli word (up to a phase) as E = X^{e_x} Z^{e_z}, where e_x, e_z ∈ GF(2)^n. The binary representations are then the concatenations H = (H_X, H_Z) and e = (e_x, e_z), and the syndrome reads s = H_X e_z + H_Z e_x (mod 2).

Quaternary representation (q = 4). The group operation is naturally represented by addition if the Paulis are mapped via I ↦ 0, X ↦ 1, Z ↦ ω, Y ↦ ω² (the standard convention; here ω generates GF(4)). In order to define the symplectic product, two more definitions are needed, the conjugation and trace in GF(4):
• Conjugation: GF(4) → GF(4) : α ↦ ᾱ = α × α,
• Trace: GF(4) → GF(2) : α ↦ tr(α) = α + ᾱ.
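The commutation check underlying the binary representation can be sketched as follows (an illustrative helper, not the paper's implementation): two Paulis E = X^{e_x} Z^{e_z} and F = X^{f_x} Z^{f_z} commute iff their symplectic product e_x · f_z + e_z · f_x vanishes mod 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Symplectic product in the binary representation: a Pauli word
// E = X^{e_x} Z^{e_z} is stored as the pair (e_x, e_z) of binary vectors.
// Two Paulis commute iff e_x.f_z + e_z.f_x = 0 (mod 2); an X- and a
// Z-factor on the same qubit contribute one anticommutation each.
int symplectic_product(const std::vector<int>& ex, const std::vector<int>& ez,
                       const std::vector<int>& fx, const std::vector<int>& fz) {
    int p = 0;
    for (std::size_t i = 0; i < ex.size(); ++i)
        p += ex[i] * fz[i] + ez[i] * fx[i];
    return p % 2;  // 0: commute, 1: anticommute
}
```

This also illustrates the "twofold anticommutation gives commutation" argument used for the product construction below: X_1 X_2 and Z_1 Z_2 overlap on two qubits and hence commute, while single-qubit X and Z anticommute.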
In words: the vertex set of the resulting graph is the Cartesian product of the vertex sets of the graph factors. There is an edge between two vertices in the resulting graph if any of their partial vertices share an edge in their graph factor. The graph constructed from the Cartesian product of two bipartite graphs is again bipartite. In order to derive a code, the vertex set of the new graph is partitioned as N_{1×2} = Q ∪ (C_X ∪ C_Z) with
• qubits: Q := {n_1 n_2 | (n_1 ∈ V_1 ∧ n_2 ∈ V_2) ∨ (n_1 ∈ C_1 ∧ n_2 ∈ C_2)},
• X-type stabilizers: C_X := {n_1 n_2 | n_1 ∈ C_1 ∧ n_2 ∈ V_2},
• Z-type stabilizers: C_Z := {n_1 n_2 | n_1 ∈ V_1 ∧ n_2 ∈ C_2}.
Chosen like that, the graph corresponds to a Tanner graph of a quantum CSS code. The commutation condition is fulfilled since whenever n_i ∈ V_i is adjacent to n′_i ∈ C_i in T_{C_i}, there are exactly two vertices (qubits) in Q which are adjacent to both the constructed X-type stabilizer n′_i n_j and the Z-type stabilizer n_i n′_j, namely n′_i n′_j and n_i n_j. Twofold anticommutation then gives commutation.

D Belief Propagation Equations
We denote by Γ(•) the neighbors of node • and by σ(•) the parity (syndrome) of configuration •. For a detailed description of the steps, see the main text.