Multi-path Summation for Decoding 2D Topological Codes

Fault tolerance is a prerequisite for scalable quantum computing. Architectures based on 2D topological codes are effective for near-term implementations of fault tolerance. To obtain high performance with these architectures, we require a decoder which can adapt to the wide variety of error models present in experiments. The typical approach to the problem of decoding the surface code is to reduce it to minimum-weight perfect matching in a way that provides a suboptimal threshold error rate, and is specialized to correct a specific error model. Recently, optimal threshold error rates for a variety of error models have been obtained by methods which do not use minimum-weight perfect matching, showing that such thresholds can be achieved in polynomial time. It is an open question whether these results can also be achieved by minimum-weight perfect matching. In this work, we use belief propagation and a novel algorithm for producing edge weights to increase the utility of minimum-weight perfect matching for decoding surface codes. This allows us to correct depolarizing errors using the rotated surface code, obtaining a threshold of $17.76 \pm 0.02 \%$. This is larger than the threshold achieved by previous matching-based decoders ($14.88 \pm 0.02 \%$), though still below the known upper bound of $\sim 18.9 \%$.


Introduction
Quantum information processing (QIP) provides a means of implementing certain algorithms with far lower memory and processing requirements than is possible in classical computing [1,2]. The analog nature of quantum operations suggests that devices which perform QIP will have to function correctly in the presence of small deviations between ideal operations and those that are actually implemented, a property called fault tolerance [3,4]. In addition, many external sources of error exist in current QIP devices, making fault tolerance a practical necessity as well as a theoretical one.
In order to design fault-tolerant QIP protocols, we first require a set of quantum states for which the effect of small deviations can be reversed, called a quantum error-correcting code [5], analogous to classical error-correcting codes [6]. One important difference between classical and quantum error correction is that quantum states are inherently continuous, whereas classical states are discrete. This suggests that noise processes on quantum computers will also be continuous, and therefore difficult to correct. However, the action of a projective measurement can effectively discretise the noise [7,Chapter 2], bringing the study of quantum error correction closer to classical coding theory. Unfortunately, the action of a projective measurement also ensures that the qubits used in a quantum code cannot be measured directly in order to determine where errors have occurred, since such a measurement would destroy any superposition of code states, discretising the logical state as well as the noise. However, a large set of quantum error-correcting codes, called stabiliser codes [3], allow the measurement of multi-qubit operators which yield results depending only on which error has occurred, and not on the code state itself. These operators are the stabilisers of the code, and the result from their measurement is called the syndrome. This syndrome indicates which stabilisers anticommute with the error.
Typically, the stabilisers are taken to be Pauli operators on $n$ qubits, tensor products of the operators $\mathbb{1}$, $X$, $Y$, and $Z$, where
$$X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
Conveniently, noise discretisation implies that if a code can be used to correct the random application of these operators on a subset of the qubits, it can also correct arbitrary operators on the same subset, since the Pauli operators form a basis for the space of operators. When measuring the stabilisers, it is possible for the measurement apparatus to return an incorrect eigenvalue, or for the measurement procedure to produce errors on the qubits. In order to obtain fault tolerance in such a scenario, it is necessary to repeat the stabiliser measurements many times. In this work, we restrict ourselves to a simpler model, in which Pauli errors occur at random on data qubits, and the measured eigenvalues are correct and obtained without consequence.
In order to correct these Pauli errors, it is necessary to derive an error which would produce the given syndrome, and which is unlikely to alter the intended logical state if applied to the system. This process, called decoding, is non-trivial, since a code with $n$ physical qubits will typically contain $O(n)$ stabilisers, so the number of possible syndromes scales as $2^n$, prohibiting the use of a simple lookup table. Also, many quantum codes are degenerate, having multiple errors corresponding to a given syndrome, further complicating the process of decoding.
The purpose of this work is to improve the accuracy of a frequently-used method for decoding 2D homological codes [8]. The prototype for these codes is the 2D surface code, which we review in the following section, along with the reduction of the decoding problem to minimum-weight perfect matching. In Section 3, we review prior enhancements to the decoding algorithm of interest, as well as alternative algorithms. In Sections 4 and 5, we introduce the methods we propose to enhance the accuracy of surface code decoding and obtain increased threshold error probabilities. We discuss the effect of changing boundary conditions, as well as the additional complexity of decoding using these methods, and conclude in Section 6.

In any of these codes, pairs of stabilisers anticommute with chains of errors that connect the two tiles supporting the stabilisers, producing a set of locations where the syndrome is 1 (which we will call vertices for the remainder of this work). This mathematical structure reduces the decoding problem to that of finding a likely set of connecting errors which join the vertices in pairs (we will call these connecting errors paths for the remainder of this work, and refer to abstract connections between vertices as edges). To maximise the likelihood, we follow the derivation of [12, Section IV D]. We begin with the assumption, for simplicity, that only one type of error (X or Z) can occur. The likelihood of an error $E$ is then a function of its weight $w(E)$ (the size of its support), the total number of qubits $n$, and the underlying probability of error $p$:
$$p(E) = p^{w(E)} (1-p)^{n-w(E)} = (1-p)^n \left(\frac{p}{1-p}\right)^{w(E)}. \tag{3}$$
We assume that, if two vertices are connected by a path, the length of that path is minimal. This implies that the weight of the corresponding edge is given by the rotated Manhattan distance between the vertices it connects, see Figure 2.

Figure 2: A pair of vertices separated by an edge whose minimum length is 5. A minimum-length path is composed of steps between neighbouring tiles, and any path which is consistent with the indicated direction will be minimum-length.

Note that, due to the open boundary conditions, a single vertex can be connected to the boundary instead of another vertex. This is typically remedied by introducing virtual vertices which are connected to each other with weight-0 edges, and to a designated real vertex with a weight equal to the minimum path length between that vertex and the nearest boundary [13].
The total weight of an error is then the sum of the weights of the edges, so the likelihood can be expressed as a product over the edge set:
$$p(E) = (1-p)^n \prod_{e} \left(\frac{p}{1-p}\right)^{w(e)}.$$
To maximise the likelihood, it suffices to maximise any function proportional to the likelihood (or its logarithm), so we can use a simplified cost function:
$$c(E) = -\sum_{e} w(e) \ln\left(\frac{p}{1-p}\right), \tag{6}$$
with the minus sign present since the odds of a physical error satisfy $\frac{p}{1-p} < 1$. This implies that, in order to maximise the likelihood of a prospective error, we should find a set of edges that connects the vertices in pairs (a perfect matching in graph theory) with minimal weight. Finding minimum-weight perfect matchings is a well-known combinatorial problem, solved using the Blossom algorithm [14, 15].
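The reduction above can be sketched end-to-end on a toy example. This is a minimal illustration, not the decoder studied here: the syndrome coordinates are hypothetical, we use the ordinary (unrotated) Manhattan distance for simplicity, and an exponential brute-force matcher stands in for the polynomial-time Blossom algorithm.

```python
def manhattan(u, v):
    # Edge weight: minimum path length between two syndrome vertices.
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def min_weight_perfect_matching(vertices, weight):
    # Brute-force MWPM by recursive pairing; exponential time,
    # for illustration only (Blossom solves this in polynomial time).
    if not vertices:
        return 0, []
    first, rest = vertices[0], vertices[1:]
    best_weight, best_pairs = float('inf'), None
    for i, partner in enumerate(rest):
        sub_w, sub_pairs = min_weight_perfect_matching(rest[:i] + rest[i+1:], weight)
        total_w = weight(first, partner) + sub_w
        if total_w < best_weight:
            best_weight, best_pairs = total_w, [(first, partner)] + sub_pairs
    return best_weight, best_pairs

# Four hypothetical syndrome vertices on a square lattice:
syndromes = [(0, 0), (0, 3), (4, 0), (4, 1)]
total, pairs = min_weight_perfect_matching(syndromes, manhattan)
# Pairs (0,0)-(0,3) and (4,0)-(4,1), total weight 3 + 1 = 4.
```

A real decoder would also add the virtual boundary vertices described above, connected to each other with weight-0 edges, before matching.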
Decoding with this algorithm results in a threshold error rate of ∼ 10.3% [16], compared with a threshold rate of ∼ 10.9% estimated using a correspondence between the toric code decoding threshold and the phase transition in a spin model [17,18]. From this, we surmise that minimum-weight perfect matching (MWPM) decoding performs well, though not optimally, in the event that errors of a single type are distributed with identical probability on each qubit. In addition, there remain many scenarios in which the independent identically-distributed (IID) error model does not apply. In the following section, we summarise the current efforts to enhance MWPM decoding with both IID and more realistic models in mind.

Prior Work
The derivation of the objective function in Equation (6) assumed that X and Z errors occur independently, with identical probability on each qubit. This implies that, if the probability of an X or Z error is $O(p)$, the probability of a Y error is $O(p^2)$, since $Y \sim XZ$. If the probabilities of X, Y, and Z errors differ from this, the performance of the decoder will be degraded. Consider the frequently used example of depolarizing noise, in which each type of error is applied with equal probability. A decoder which minimises the weight of X and Z errors separately may fail to minimise the relevant weight, resulting in a logical error, see Figure 3.
To minimise the appropriate weight function, we need to avoid 'double-counting' the weight of Y errors. To accomplish this, Delfosse [19] and Fowler [20] reduce the weight of X edges that cross Z edges, since individual Y errors result in crossed X and Z edges, see Figure 1, upper left. In order to determine where these crossings occur, the authors first calculate one of the matchings, X or Z, then use the resulting edge set to determine edge costs for the other matching. In this way, Delfosse demonstrates that the threshold error rate against Z errors can be raised for a homological code which has a naturally larger tolerance to X errors.

Figure 3: From left to right: a weight-2 Y error acting on a distance-3 surface code, and the independent minimum-weight X and Z corrections. The appropriate correction, XX · ZZ, is not minimum-weight in the independent error model.
One obstacle for this approach is that X and Z paths can intersect more than once, see Figure 4. It is not clear during decoding which error corresponding to a given edge should be selected in order to maximise the crossing number, thus minimising the total weight. Furthermore, there can be more than one minimum-weight perfect matching corresponding to a vertex set, further hampering any effort to obtain an accurate correction to the depolarizing channel using MWPM decoding. It appears that accounting for the degeneracy caused by edge and matching multiplicity will provide insight in this direction.

Figure 4: For a continuous chain of Y errors, different minimum-weight X and Z paths can be found with different crossing numbers for the same syndrome.
There have been multiple efforts to address these types of degeneracy. The first is due to Barrett and Stace [21], and uses the IID error model. They derive a modified objective function, accounting for the fact that there are multiple error configurations with the same weight corresponding to a given matching, see Figure 5. If the number of minimum-weight configurations corresponding to the error $E$ is labeled $\Omega(E)$, then the derivation beginning with Equation (3) instead begins with the likelihood function
$$p(E) = \Omega(E)\,(1-p)^n \left(\frac{p}{1-p}\right)^{w(E)}. \tag{7}$$

Figure 5: Syndrome set with a single minimum-weight matching (weight 5) and 16 matchings with a single unit of excess weight. If the odds of an error $\frac{p}{1-p} > \frac{1}{16}$, then the higher-weight matching is more likely, see Equation (7).

In order to use an MWPM-based decoder, it is necessary to express the likelihood function as a product over the set of edges. This requires us to express $\Omega(E)$ as a product over the set of edges. This is possible, since the number of paths joining a pair of vertices depends only on the positions of those vertices, and not on any property of the graph unrelated to the edge under consideration. See, for example, Figure 5, in which the total number of weight-6 errors (16) is a product of the number of paths joining the paired vertices (2, 2, and 4). The probability of a minimum-weight error equivalent to $E$ can then be factored as before:
$$p(E) = (1-p)^n \prod_{e} \Omega(e) \left(\frac{p}{1-p}\right)^{w(e)}.$$
Taking the logarithm results in the final form of the objective function:
$$c(E) = -\sum_{e} \left[\ln \Omega(e) + w(e) \ln\left(\frac{p}{1-p}\right)\right].$$
In order to calculate $\Omega(e)$ for an edge between two real vertices, Barrett and Stace note that there are $\binom{\Delta_y + \Delta_x}{\Delta_x}$ paths between points on a rotated square tiling of the torus separated by $(\Delta_x, \Delta_y)$ [22].
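For an edge between two bulk vertices, this path count is a single binomial coefficient, which can be checked directly with Python's standard library (a quick sketch; the function name is ours):

```python
from math import comb

def omega(dx, dy):
    # Number of minimum-length staircase paths between vertices
    # separated by (dx, dy) on a square tiling: choose which dx of
    # the dx + dy steps move in the x direction.
    return comb(dx + dy, dx)

# Small cases: two paths around a single plaquette for a (1, 1)
# separation, four paths for a (1, 3) separation.
```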
Using this modification to the edge weights, MWPM decoding results in a threshold near 10.3% when correcting IID X/Z noise. This is similar to the increase in the threshold that can be obtained from solving an associated statistical physics model using matchings obtained from Metropolis sampling [23], which suggests a connection between the two methods. In addition, efforts have been made to improve the threshold of MWPM decoders by constraining the input graph to support errors from a specific coset, and by modifying the obtained solutions through Markov Chain Monte Carlo [24]. However, the thresholds from modified MWPM decoding are still suboptimal, suggesting that further corrections are required.
In a recent departure from MWPM-based decoding, Bravyi et al. [25] have introduced polynomial-time maximum-likelihood decoders for surface codes which are based on efficient sub-theories of quantum mechanics. For IID X/Z noise, they begin by matching each vertex to an arbitrary boundary, producing a correction which eliminates the syndrome, but results in a logical error with high likelihood (also known as a pure error [26]). They then provide a reduction from the problem of determining the probability that the pure error is an accurate correction to the calculation of the output of a matchgate circuit [27, 28]. For arbitrary stochastic Pauli noise, likelihood estimation reduces instead to the problem of contracting a tensor network state [29], which can also be solved in polynomial time, reminiscent of earlier decoders for topological codes which use the Renormalisation Group [30, 31]. Using these techniques, Bravyi et al. are able to obtain optimal thresholds against IID X/Z and depolarizing noise, opening a gap between MWPM and maximum-likelihood decoding.
This gap can be narrowed by introducing a generalised edge weight calculation, similar to the method of Barrett and Stace, adapted to the case in which there are different probabilities of error on each qubit. We do this in the following section.

Odds Calculation
In order to more accurately approximate the odds of a path existing between two vertices, we first generalise the calculation of Barrett and Stace to cases where a vertex is being matched to the boundary, or where some paths cannot be realised, see Figure 6.

Figure 6: Two cases in which the number of paths is not $\binom{\Delta_y + \Delta_x}{\Delta_x}$, due to open boundary conditions. Left: one otherwise-possible path between two syndromes cannot be realised, since two of the qubits on the path are outside the lattice. Right (also see Figure 5): paths from a syndrome to the boundary can exit the lattice at multiple points, further preventing a simple calculation of $\Omega(e)$.

To count the number of paths on directed acyclic graphs such as these, we can use the following function, which visits the vertices in topological order and assigns to each vertex $w$ the count
$$\Omega(w) = \sum_{v\,:\,(v,w)\,\in\,E} \Omega(v),$$
with $\Omega = 1$ at the initial vertex. If this function is applied to a closed bounding box (see Figure 2), the result will be $\binom{\Delta_y + \Delta_x}{\Delta_x}$, which simplifies the calculation for periodic boundary conditions, for which all bounding boxes are closed. For a generic directed acyclic graph, this function requires one operation for every edge of the input graph, and can be parallelized using message passing [32, Chapter 16], reducing the runtime to be proportional to the path length using one constant-size processor per vertex of the input graph. To be implemented serially, the input graph $g$ should first be topologically sorted [33, Section 22.4], also requiring a number of operations at most linear in the graph's size.
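The path-counting function described above can be sketched as follows. This is a minimal sketch with our own names, using a rectangular bounding box as the input DAG; `predecessors[v]` plays the role of the incoming-edge lists of a topologically sorted graph.

```python
def count_paths(topo_order, predecessors, source):
    # Number of directed paths from `source` to every vertex of a DAG.
    # `topo_order` must list vertices so that every edge points forward;
    # each vertex then accumulates the counts of its predecessors.
    counts = {v: 0 for v in topo_order}
    counts[source] = 1
    for v in topo_order:
        for u in predecessors.get(v, []):
            counts[v] += counts[u]
    return counts

# Closed bounding box between vertices separated by (dx, dy) = (2, 3);
# steps move one unit in +x or +y, so the count is binomial(5, 2) = 10.
dx, dy = 2, 3
order = [(x, y) for x in range(dx + 1) for y in range(dy + 1)]
preds = {(x, y): [p for p in [(x - 1, y), (x, y - 1)]
                  if p[0] >= 0 and p[1] >= 0]
         for (x, y) in order}
counts = count_paths(order, preds, (0, 0))
```

Open boundary conditions are handled by simply omitting the vertices and edges that fall outside the lattice.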
The key to the derivation of the path-counting algorithm is that the number of paths reaching a vertex w is the sum of the number of paths reaching the vertices with edges incoming to w. Using this function to calculate Ω e , we can replicate the result of Barrett and Stace using the rotated surface code, see Figure 7.
To generalise to the case where Y errors can occur with arbitrary probability, we first consider the role of detected Z errors in determining where X errors have occurred (see [19] for an in-depth discussion). We assume that the physical error model applies X, Y, or Z on each qubit with fixed, qubit-independent probabilities $p_X$, $p_Y$, and $p_Z$. To see the role that detected Z errors play, we imagine that we could determine with certainty the positions of errors that anticommute with the Z-detecting stabilisers. These errors must be either Z or Y, and the qubits which have not been affected by one of these errors must have been affected by an X error, if at all. To use this information in determining the positions of X errors, we use the conditional probabilities of an X error occurring, depending on the result of the Z decoding:
$$p(X \,|\, \text{Z-type error}) = \frac{p_Y}{p_Y + p_Z}, \qquad p(X \,|\, \text{no Z-type error}) = \frac{p_X}{1 - p_Y - p_Z}.$$
Thus, we see that attempting to account for X/Z correlations in the error model will require us to consider error models in which there are different probabilities of error on each qubit, even when the underlying error model is qubit-independent. The method we use to approximate conditional probabilities of error in the absence of perfect detection is explained in the following section. For now, we generalise the derivation of the likelihood of an error beginning in Equation (3), with a few approximations and definitions:

• We only consider minimum-length paths, which are restricted to the set of qubits within bounding boxes of the type seen in Figures 2 and 6.
• We consider two possibilities, either the vertices being considered are joined by a single path, or none of the qubits in the bounding box are in error.
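The conditional probabilities above can be computed directly. This is a minimal sketch, assuming perfect detection of Z-type components; the function and variable names are ours.

```python
def conditional_x_probs(px, py, pz):
    # Probability of an X component on a qubit, conditioned on whether
    # a Z-type component (Z or Y) was detected on that qubit.
    p_x_given_z = py / (py + pz)           # Z-type present: X iff the error was Y
    p_x_given_no_z = px / (1.0 - py - pz)  # no Z-type: error is X or identity
    return p_x_given_z, p_x_given_no_z

# Depolarizing noise at p = 0.1 (px = py = pz = p / 3): a detected
# Z-type component makes an X component far more likely.
px_given_z, px_given_no_z = conditional_x_probs(0.1 / 3, 0.1 / 3, 0.1 / 3)
```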
The probability $p(E)$ (now referring to the probability of any error equivalent to $E$ under a change of minimum-length path) can then be expressed in terms of the odds of error on a given qubit, $o_q \equiv \frac{p_q}{1-p_q}$:
$$p(E) = \prod_{q} (1 - p_q) \prod_{e} \sum_{P \,\in\, \mathrm{paths}(e)} \prod_{q \,\in\, P} o_q.$$
We can divide the set $\mathrm{paths}(e)$ into disjoint subsets that pass through each of the vertices adjacent to the final vertex, assuming that there is only one final vertex (otherwise, we can divide the set $\mathrm{paths}(e)$ into disjoint subsets associated with individual final vertices, then proceed). The path from each of these predecessor vertices to the final vertex only traverses a single qubit $q_p$, so:
$$\sum_{P \,\in\, \mathrm{paths}(e)} \prod_{q \,\in\, P} o_q = \sum_{q_p} o_{q_p} \sum_{P \,\in\, \mathrm{paths}(q_p)} \prod_{q \,\in\, P} o_q,$$
where $\mathrm{paths}(q_p)$ is the set of paths leading from the initial vertex to the predecessor vertex $q_p$. This reduction is similar to the sum reduction used in Path-Counting, though now the sum is weighted by $o_{q_p}$. The path-counting function can be easily modified to evaluate this quantity, replacing the path count at each vertex $w$ with the weighted sum
$$\Omega_o(w) = \sum_{v\,:\,(v,w)\,\in\,E} o_{q_{vw}}\, \Omega_o(v),$$
where $q_{vw}$ is the qubit traversed by the edge $(v, w)$. This produces an approximate cost function, since we have neglected the possibility that there is an error inside the bounding box that does not cause a syndrome (a randomly applied stabiliser), and we have failed to account for the large number of less likely matchings that are equivalent to the likeliest matchings up to a stabiliser. Nevertheless, we will see that surface code decoding can be improved, once we have an accurate set of estimates for the marginal probability of each type of error on each qubit. In the following section, we review belief propagation, a method which can provide these estimates.

Figure 7: Threshold of the rotated surface code correcting IID X/Z errors. Above: edge weights for MWPM are derived using the Manhattan distance; the resulting threshold is 9.97 ± 0.01%, lower than the value of 10.3% obtained using alternate boundary conditions in [34]. Below: edge weights for MWPM are derived using Path-Counting, resulting in a threshold of 10.34 ± 0.01%, eliminating the disadvantage imposed by the rotated boundary conditions. Error bars represent a 99% confidence interval. Solid lines are maximum-likelihood fits to data near the threshold.
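The odds-weighted variant of path counting can be sketched by attaching an odds value to each edge of the bounding-box graph. This is a minimal sketch with our own names; with uniform odds it reduces to the plain path count times $o^{\text{path length}}$, which gives a quick sanity check.

```python
def sum_path_odds(topo_order, predecessors, source):
    # Sum, over all source -> v paths, of the product of edge odds.
    # `predecessors[v]` lists (u, odds) pairs: an edge u -> v whose
    # traversed qubit has odds o_q = p_q / (1 - p_q).
    total = {v: 0.0 for v in topo_order}
    total[source] = 1.0
    for v in topo_order:
        for u, odds in predecessors.get(v, []):
            total[v] += odds * total[u]
    return total

# Uniform odds o = 0.25 on a (2, 3) bounding box: every minimum-length
# path has 5 steps, so the result is 10 * 0.25**5.
o, dx, dy = 0.25, 2, 3
order = [(x, y) for x in range(dx + 1) for y in range(dy + 1)]
preds = {(x, y): [(p, o) for p in [(x - 1, y), (x, y - 1)]
                  if p[0] >= 0 and p[1] >= 0]
         for (x, y) in order}
odds_sum = sum_path_odds(order, preds, (0, 0))
```

With non-uniform odds on each edge, the same recursion evaluates the multi-path sum used to set the MWPM edge weights.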

Belief Propagation
Belief propagation is a message-passing algorithm for calculating marginal probabilities, which has been applied to decoding quantum codes correcting Pauli noise [35]. To use BP for decoding, we first define the Tanner graph corresponding to a surface code, which contains a qubit vertex for every qubit in the code, and a check vertex for every stabiliser check. An edge exists between each check vertex and each qubit vertex in its support, see Figure 8. The algorithm begins with a probability distribution $p(E_q)$ (also called a belief), assigned to each qubit $q$, of the form $[\,1 - (p_Z + p_X + p_Y),\ p_Z,\ p_X,\ p_Y\,]$, equal to the prior distribution of error probabilities on each qubit (called 'the physical distribution' earlier). These beliefs are passed as messages to each of the neighbouring check vertices. Each check vertex $c$ calculates messages to be passed back to each of its neighbouring qubit vertices, one for each possible error $E_q$ on the qubit:
$$m_{c \to q}(E_q) \propto \sum_{\{E_{q'}\}} \prod_{q' \in n(c) \setminus q} m_{q' \to c}(E_{q'}),$$
where $n(c)$ is the set of qubit vertices neighbouring $c$, and the sum runs over assignments of errors to the other qubits in the support of $c$ such that the full assignment, including $E_q$, reproduces the measured syndrome of $c$. Here, proportionality indicates that the messages are to be normalized after they are calculated. Each qubit vertex then calculates a set of messages to be passed back:
$$m_{q \to c}(E_q) \propto p(E_q) \prod_{c' \in n(q) \setminus c} m_{c' \to q}(E_q),$$
where proportionality again indicates that the messages are to be normalized.
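In the simplified case where only X errors are tracked (so that messages are two-component and each check is a parity constraint), the check-to-qubit update reduces to a standard parity computation. The sketch below shows that special case with our own names; it is not the full four-component update.

```python
def check_to_qubit(syndrome, other_probs):
    # Message from a check vertex to one neighbouring qubit, X errors only.
    # If this qubit carries an error, the remaining qubits in the check's
    # support must supply parity (syndrome ^ 1); otherwise parity syndrome.
    prod = 1.0
    for p in other_probs:
        prod *= 1.0 - 2.0 * p
    p_even = (1.0 + prod) / 2.0  # prob. other qubits hold an even number of errors
    p_odd = (1.0 - prod) / 2.0
    m_no_error = p_odd if syndrome == 1 else p_even
    m_error = p_even if syndrome == 1 else p_odd
    norm = m_no_error + m_error
    return m_no_error / norm, m_error / norm

# A triggered check (syndrome 1) whose other qubits are unlikely to be
# in error pushes belief toward an error on this qubit.
m0, m1 = check_to_qubit(1, [0.1, 0.1])
```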
Once the algorithm has converged, we calculate the updated beliefs (the conditional probabilities of each error, given the syndrome):
$$b_q(E_q) \propto p(E_q) \prod_{c \in n(q)} m_{c \to q}(E_q).$$
These beliefs can, in some instances, be used to provide an approximate maximum-likelihood error, but belief propagation is only guaranteed to provide an accurate estimate of likelihood on trees (acyclic graphs). In decoding, where Tanner graphs have many short cycles, the product of the likeliest errors on each qubit may not conform with the syndrome. For an example of this, consider the scenario in Figure 9 (also [35]). In this scenario, an X error has occurred on the upper-left corner, causing a single vertex to appear in the neighbouring square. With equal likelihood, this vertex could have been caused by an X error on the neighbouring qubit. In this case, belief propagation outputs a marginal probability $p_X + p_Y$ slightly below $1/2$ on each of these qubits. We refer to this scenario as a split belief. Taking the likeliest error on each qubit results in an error which does not conform with the syndrome, preventing the use of BP as a stand-alone decoder. However, using a split belief in the approximate cost calculation in Section 4 still provides a final odds near 2, resulting in a small negative edge weight, and a good chance that the vertex in question will be matched to the boundary, as it should be.
Using $d$ rounds of belief propagation and the cost calculation in Section 4, we obtain a threshold of 17.76 ± 0.02% when correcting depolarizing noise, see Figure 10. This is a significant improvement over standard MWPM decoding, though not optimal. In the following section, we discuss the complexity of the additional steps in the decoder, as well as potential enhancements and other applications.

Discussion/Conclusion
To our knowledge, little discussion exists in the literature regarding the effect of boundary conditions on the performance of surface code decoders. In addition, it is important to note the complexity of the additional steps used in the decoder studied in this work. In this section, we address these topics, and conclude by pointing out opportunities for future research.

Boundary Conditions
Intuition drawn from the statistical physics of spin models related to surface codes motivates the belief that the decoding threshold of a surface code is independent of 'finite-size' considerations such as boundary conditions and aspect ratio (if non-square lattices are used). Indeed, we find this belief to be approximately correct in the current work, though there are small deviations from decoding thresholds calculated elsewhere. To test whether these deviations are related to boundary conditions, we re-evaluate the decoders we consider using smooth/rough boundary conditions in Figures 11 and 12, obtaining thresholds for comparison to Figures 7 and 10.
The fact that these thresholds exceed those for the rotated boundary conditions would be puzzling if the decoding threshold corresponded exactly with a thermodynamic property. However, this correspondence is only exact when using an optimal decoder [36]. The suboptimality of the matching-based decoders studied here thus allows boundary conditions to influence decoder performance. This has also been observed in predictions of logical error rates [37] and when using highly-correlated error models [38]. It is interesting to note that, for the two error models considered here, the difference in threshold between code families with different boundary conditions is smaller when the decoder is closer to optimality, though this is by no means conclusive. Another confounding factor, which prevents us from attributing these deviations solely to boundary conditions, is the behaviour of the Blossom V MWPM implementation on graphs with non-unique minimum-weight perfect matchings, also noted by [21].

Complexity
The decoder studied in this work does not require any modification to the Blossom algorithm, but adds pre-calculation steps which determine the edge weights. These steps require a number of operations at most linear in the number of edges of the bounding-box graphs, assuming that they are performed in the appropriate (topological) order. A naïve parallel implementation of this algorithm would require one processor per pair of stabilisers ($O(d^4)$ processors) and $O(d)$ time, since the longest minimum-length paths are of $O(d)$ length. It is likely possible to reduce the number of processors, since each processor in the naïve parallelization only performs a calculation during one of the $O(d)$ timesteps of the algorithm. It is also likely possible to reduce the overall complexity of decoding by combining the proposed edge weight evaluation techniques with methods for pruning the syndrome graph [39], which is input to Blossom.

Figure 10: Threshold for the rotated surface code correcting depolarizing errors. Above: using uniform edge weights, the threshold is 14.88 ± 0.02%, less than the value of 15.5 ± 0.5% obtained using smooth/rough boundary conditions in [13]. Below: when using multi-path odds summation as discussed in Section 4, the threshold improves to 17.76 ± 0.02%. Error bars are 99% confidence intervals. Solid lines are maximum-likelihood fits to data points near the threshold.

Figure 11: Replication of previous results using smooth/rough boundary conditions. The threshold error rate of 10.30 ± 0.01% is in accordance with earlier results. This threshold increases to 10.59 ± 0.01% using path-counting, comparable to the ∼ 10.6% observed by Barrett and Stace (with periodic boundary conditions). Error bars are 99% confidence intervals. Solid lines are maximum-likelihood fits near the threshold.

Future Work
To address the sub-optimality of the decoder, it is necessary to determine which of the approximations made in its definition are limiting accuracy. The first of these approximations is the set of marginal probabilities from belief propagation. The split belief in Figure 9 produces an odds near 2, where it should in principle be much higher (in the absence of other nearby syndromes, or a strong prior). Efforts to decrease this inaccuracy by altering the Tanner graph or the prior used in BP have so far been unsuccessful. Immediate future work will likely focus on finding a modification to, or substitute for, the BP algorithm which remedies this inaccuracy.

Figure 12: Application of belief propagation/multi-path summation to the decoding of surface codes with smooth/rough boundaries which are subjected to depolarising noise. A similar increase in threshold error rates is observed using these boundary conditions, with a threshold of 15.42 ± 0.01% obtained using a standard MWPM decoder, compatible with the observation of 15.5 ± 0.5% by [13]. This increases to 17.84 ± 0.01% when belief propagation and multi-path summation are incorporated. Error bars are 99% confidence intervals. Solid lines are maximum-likelihood fits near the threshold.
In any event, it will be interesting to apply multi-path odds summation to the (2+1)-dimensional problem of decoding the surface code when incorporating multiple rounds of measurement to combat errors in the syndrome measurement circuit, as well as to the problem of decoding 2D colour codes, which can be reduced to multiple correlated instances of surface code decoding [40].