A Logarithmic Bayesian Approach to Quantum Error Detection

We consider the problem of continuous quantum error correction from a Bayesian perspective, proposing a pair of digital filters using logarithmic probabilities that are able to achieve near-optimal performance on a three-qubit bit-flip code, while still being reasonable to implement on low-latency hardware. These practical filters are approximations of an optimal filter that we derive explicitly for finite time steps, in contrast with previous work that has relied on stochastic differential equations such as the Wonham filter. By utilizing logarithmic probabilities, we are able to eliminate the need for explicit normalization and can reduce the Gaussian noise distribution to a simple quadratic expression. The state transitions induced by the bit-flip errors are modeled using a Markov chain, which for log-probabilities must be evaluated using a LogSumExp function. We develop the two versions of our filter by constraining this LogSumExp to have either one or two inputs, which favors either simplicity or accuracy, respectively. Using simulated data, we demonstrate that the single-term and two-term filters are able to significantly outperform both a double threshold scheme and a linearized version of the Wonham filter in tests of error detection under a wide variety of error rates and time steps.


Introduction
One of the major obstacles to large-scale quantum computation is the high frequency of qubit errors. Small interactions with the environment or imperfections in gate implementation can perturb the underlying quantum state throughout a computation and ultimately render the output useless. These issues only magnify as the size and complexity of the quantum computer increases, jeopardizing any attempt to demonstrate quantum advantage in key tasks such as the factorization of large integers [1]. The obvious challenge posed by quantum errors has given rise to the field of quantum error correction [2] and spurred the development of numerous techniques to identify and correct errors when they occur. These schemes, referred to as error correction codes, typically operate by encoding the quantum state into a larger Hilbert space and then monitoring the location of the state within this space using a specific set of observables.
The present work is focused on continuous or "always-on" error correction, where the state of the system is continuously monitored through a noisy signal channel in order to diagnose errors and to make corrections if necessary. Since the signal is noisy, we cannot know with certainty whether an error has occurred, and must instead make subjective judgements about the condition of the system based on past measurements and any other information we might have. In order to quantify the uncertainty in this task it is natural to adopt a Bayesian framework, which allows for new information to be easily combined with prior knowledge of the system using Bayes' theorem.
Assuming that the errors experienced by a quantum computer are uncorrelated, we can model the system as a Markov chain whose state is imperfectly observed by a measurement apparatus. Optimal treatments of this problem originate with work done by Wonham and his derivation of what is today known as the Wonham filter, which describes the evolution of the Markov state probabilities conditioned on a particular set of noisy observations [3]. Despite its optimality, this filter has not found widespread use for error correction, in part because it assumes that the value of the signal is characterized as a continuous function of time, while in practice the signal is typically only observed as a discrete set of measurement samples. As a non-linear stochastic differential equation, the Wonham filter can also be challenging to evaluate numerically, even if the signal function is known. Due to these limitations, most continuous error correction schemes avoid using Bayesian theory and instead rely on various thresholding procedures which are easier to implement but known to provide suboptimal performance [4][5][6].
The purpose of this work is to introduce a discrete-time filter for continuous quantum error correction that significantly outperforms common thresholding schemes, while still being practical to implement on real hardware. We construct our filter using logarithmic probabilities, which are numerically stable, easy to normalize, and allow for straightforward evaluation of Gaussian noise distributions using only arithmetic operations. The filter has two different forms based on how it updates the posterior, with one method emphasizing accuracy while the other emphasizes computational simplicity. Both versions of the filter are able to achieve near optimal performance under a wide variety of error rates and time steps.
The remainder of the paper has the following structure. Sec. 2 reviews the three-qubit code that we will use as a simple error correction primitive, and describes the principles of continuous measurement. Sec. 3 derives an optimal Bayesian filter for identifying bit-flip errors, which cannot be practically implemented in its exact form. Sec. 4 discusses prior work in the field of continuous quantum error correction and outlines several obstacles to implementing a Bayesian filter. Sec. 5 proposes an approximate version of the optimal filter which overcomes those obstacles, and provides two different implementations which are both experimentally feasible using today's technology. Sec. 6 tests our filter on simulated data and compares the performances of the two implementations against other schemes found in the literature. Sec. 7 summarizes our findings and identifies areas for future work. An Appendix is also provided, which contains technical details that supplement the body of the paper.

Three-qubit coding scheme
To insulate a quantum system from errors, we must add some level of redundancy to its Hilbert space. This is achieved by assembling a set of physical qubits which is larger than necessary to perform the desired task, with the idea that these extra degrees of freedom will be used to implement an error correction scheme. The computation is then understood in terms of its effect on the coding subspace of the system, which describes the portion of the expanded Hilbert space where the intended quantum evolution will take place.
For simplicity we focus on a three-qubit repetition code, which is designed to protect the state of a single qubit from the effects of bit-flip errors. This encoded qubit is the true object of interest, so the computational task is conceptualized in terms of single-qubit operations which then need to be mapped to the full three-qubit system. We can perform this mapping by considering the so-called logical qubit, defined as
$$|0_L\rangle \equiv |000\rangle, \qquad |1_L\rangle \equiv |111\rangle, \tag{1}$$
which occupies the two-dimensional subspace spanned by $|000\rangle$ and $|111\rangle$ that we designate as the coding subspace. The logical states in Eq. (1) are used to encode the desired behavior of the system in the absence of errors, while the other parts of the Hilbert space serve to catch any bit-flips that occur. For example, a bit-flip error on the first qubit will transform $|000\rangle$ into $|100\rangle$ and $|111\rangle$ into $|011\rangle$, so the system will be shifted into a different two-dimensional subspace. As long as we can detect this change in the subspace, it will be possible for us to track and then correct the errors that have occurred on the system.
To perform this detection, we require a set of observables that uniquely identify the coding subspace and each of the three subspaces generated from bit-flips on the first, second, and third qubits. This is achieved using the parity operators $Z_1Z_2$ and $Z_2Z_3$, whose eigenstates and eigenvalues are specified by
$$\begin{aligned} a|000\rangle + b|111\rangle &: \;(+1, +1), \\ a|100\rangle + b|011\rangle &: \;(-1, +1), \\ a|010\rangle + b|101\rangle &: \;(-1, -1), \\ a|001\rangle + b|110\rangle &: \;(+1, -1), \end{aligned} \tag{2}$$
for any complex numbers $a$ and $b$, where each pair lists the eigenvalues of $(Z_1Z_2, Z_2Z_3)$. Eq. (2) shows that the combined action of the parity operators splits the three-qubit Hilbert space into four distinct 2-D subspaces, each characterized by a different pair of eigenvalues in $\{-1, 1\}$ which are referred to as syndromes. The $(1, 1)$ syndrome identifies the coding subspace, while the other three combinations correspond to states which result from a bit-flip on one of the qubits. We refer to the latter as the error subspaces, and note that the syndrome value identifies on which qubit the error has occurred.
The observables $Z_1Z_2$ and $Z_2Z_3$ offer us a simple method for bit-flip error detection, assuming that they can in fact be measured. After initializing the system into the coding subspace, we simply perform parity measurements of our system as needed and monitor the response of the system. If our results indicate that the system has moved out of the coding subspace, then we can record this deviation and apply a bit-flip correction based on which error subspace the system has been moved to. For example, if one error occurs on the first qubit then we have
$$X_1\bigl(a|100\rangle + b|011\rangle\bigr) = a|000\rangle + b|111\rangle, \tag{3}$$
and the system has been returned to its proper state. Each time a bit-flip is diagnosed to have occurred on a given qubit, we can simply apply another bit-flip to that same qubit and the error will be undone.
It is important to note that if two bit-flip errors occur simultaneously (or rapidly enough to appear indistinguishable) then the above detection scheme will mistakenly assume that only one error has occurred and perform the wrong correction. For example, if errors occur on the second and third qubits then the attempted correction yields
$$X_1\bigl(a|011\rangle + b|100\rangle\bigr) = a|111\rangle + b|000\rangle, \tag{4}$$
and the system will experience a logical error since the $a$ and $b$ coefficients have been exchanged. This type of misdiagnosis occurs because the errors in Eq. (3) and Eq. (4) both move the system into the same parity subspace and thus return the same syndrome values, so we must respond with the same correction in both cases. Given that single errors are more common than double errors under realistic conditions, the most reasonable response is to always assume that one bit-flip has occurred whenever the system enters an error subspace. This compromise is an inescapable consequence of using a short code, though we can reduce its impact by detecting errors more quickly and thus better resolving the difference between one bit-flip and two sequential bit-flips.

Throughout this paper we will focus our attention on state tracking, where the goal is to correctly identify the net number of bit-flips that have acted on the system at a given point in time. For simplicity, we will assume that the system is initialized into $|000\rangle$, and only evolves due to random bit-flip errors. That said, our results are not restricted to this quantum memory regime, as Eq. (2) shows that the parity operators act uniformly on states within a given coding or error subspace, meaning that the measurement signal will be identical regardless of whether the system is in a basis state or an arbitrary superposition. Even if a Hamiltonian is applied to the system, as would occur during a process like quantum annealing [7], the measurement signal will not be affected so long as the Hamiltonian does not mix together the four parity subspaces. As a result, any filter designed to handle quantum memory can be seamlessly applied to more complicated processes by just changing the metric used to evaluate its performance.

Error correction with continuous measurements
The three-qubit code described in Sec. 2.1 is agnostic to the actual method used to probe $Z_1Z_2$ and $Z_2Z_3$, since the separation of the Hilbert space into coding and error subspaces is a property of the observables themselves rather than of any particular measurement scheme. Most work on quantum error detection has focused on discrete error correction, which utilizes projective measurements to resolve the exact syndrome of the state at periodic intervals [2]. However, recent advances in superconducting qubit architectures have led to increased interest in continuous error correction, where the parity operators are monitored at all times and a noisy readout of the underlying syndrome values is generated [8][9]. Under these noisy conditions, a filter will be needed in order to extract relevant information from the signal.
In an idealized setting, we can formulate a simple mathematical description of the continuous measurement process using Gaussian POVMs. At time $t$, the probability of observing signal readout $\alpha(t)$ from a parity operator with syndrome $S(t)$ is
$$P\bigl(\alpha(t)\,|\,S(t)\bigr) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{\bigl(\alpha(t) - S(t)\bigr)^2}{2\sigma^2}\right), \tag{5}$$
which implies that the signal values are distributed as $N(S(t), \sigma^2)$. The syndrome value $S(t) \in \{-1, +1\}$ is determined by the state of the system at time $t$, and will therefore change as errors occur. We assume that the variance $\sigma^2$ of the noisy signal is identical across all states and is thus independent of time. The weak measurement described by Eq. (5) must occur over some finite time $\Delta t$, whose duration will determine the variance of the signal through the relation $\sigma^2 = k/\Delta t$ for some positive $k$ (the strength of the measurement is thus proportional to $1/k$). A longer measurement $\tilde{M}$ of time $T$ can be constructed by averaging over a set of these weaker measurements
$$\tilde{M}(t, T) = \frac{\Delta t}{T}\sum_{n=0}^{T/\Delta t - 1}\bigl[S(t + n\Delta t) + \epsilon_n\bigr], \qquad \epsilon_n \sim N(0, k/\Delta t), \tag{6}$$
where the signal at time increment $t + n\Delta t$ is the sum of the syndrome $S(t + n\Delta t)$ and a Gaussian noise term $\epsilon_n \sim N(0, k/\Delta t)$. The value of $t$ represents the time at which the averaging process starts, while the integer $n$ indexes the set of sequential weak measurements being averaged over, each of duration $\Delta t$. Since the average is taken over a time interval of length $T$, we can perform $T/\Delta t$ weak measurements within the interval. Given that the Gaussian noise in Eq. (6) is additive, its average can be calculated separately from the syndrome average as
$$\xi(T) \equiv \frac{\Delta t}{T}\,\tilde{\epsilon}, \qquad \tilde{\epsilon} \equiv \sum_{n=0}^{T/\Delta t - 1}\epsilon_n, \tag{7}$$
where $\tilde{\epsilon} \sim N\bigl(0, kT/(\Delta t)^2\bigr)$ due to the additive property of the variance, and thus $\xi(T) \sim N(0, k/T)$ after scaling by $\Delta t/T$. The value of $\tilde{M}$ is therefore equal to the averaged syndrome plus a new noise term $\xi(T)$, whose variance is inversely proportional to $T$:
$$\tilde{M}(t, T) = \frac{\Delta t}{T}\sum_{n=0}^{T/\Delta t - 1}S(t + n\Delta t) + \xi(T). \tag{8}$$
From Eq. (8), we define the continuous measurement $M$ as the limit of $\tilde{M}$ for $\Delta t \to 0$, which turns the discrete sum into an integral,
$$M(t, T) = \frac{1}{T}\int_{t}^{t+T}S(t')\,dt' + \xi(T), \tag{9}$$
that is distributed as a Gaussian with variance $k/T$ and a mean value given by the integrated average of the syndrome over the interval.
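To make the measurement model concrete, the sketch below simulates the integrated readout of Eq. (9) for a single parity operator; the trajectory `S`, the internal time step `dt`, and the parameter values are illustrative choices (though $k = 0.4$ µs matches the simulations of Sec. 6), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def integrated_measurement(syndrome, t, T, k, dt=1e-3):
    """Eq. (9): average the syndrome over [t, t + T] and add xi(T) ~ N(0, k/T)."""
    n = max(int(round(T / dt)), 1)
    times = t + dt * np.arange(n)
    mean_syndrome = np.mean([syndrome(tp) for tp in times])
    xi = rng.normal(0.0, np.sqrt(k / T))
    return mean_syndrome + xi

# Illustrative trajectory: a bit-flip at 5 us sends the syndrome from +1 to -1.
S = lambda t: +1.0 if t < 5.0 else -1.0
k, T = 0.4, 1.0  # us
readouts = [integrated_measurement(S, i * T, T, k) for i in range(30)]
```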
In Figure 1 we provide examples of measurement sequences that could be observed as a three-qubit system evolves under the influence of random bit-flip errors for 30 µs. For a system starting in $|000\rangle$, the measurements $M(t, T)$ for each syndrome will be centered at +1 and oscillate randomly due to the additive noise term. Comparing Figures 1A and 1B, it is clear that increasing the integration time $T$ from 0.1 µs (panel B) to 1 µs (panel A) dramatically reduces the fluctuations of the signal, making it much easier to see visually the effects of the bit-flip errors on the measurements. Perhaps counterintuitively, this reduction in the variance does not actually make the task of identifying a syndrome value from its noisy measurements any easier, since the standard error (SE) of mean value estimation scales with both the noise strength $k/T$ and the number of samples to be averaged. Given that the number of measurements $n$ in a time interval of length $\beta$ is inversely proportional to $T$, i.e., $n = \beta/T$, we have
$$\mathrm{SE} = \sqrt{\frac{k/T}{n}} = \sqrt{\frac{k/T}{\beta/T}} = \sqrt{\frac{k}{\beta}}, \tag{10}$$
and thus the uncertainty in our estimate of the syndrome will not depend on $T$. This is not to say, however, that the length of the integration period has no effect on the state tracking problem. Indeed, Eq. (10) simply states that there is a proportionate trade-off between the informativeness of a measurement ($\sigma^2 = k/T$) and the number of such measurements ($n = \beta/T$) when the syndrome value is fixed. When we introduce bit-flip errors into the system and thus consider situations where the syndrome changes with time, the analysis is significantly more complicated. Figure 1C shows that when an error occurs within an integration period, a portion of the syndrome average will consist of +1 contributions and the other portion will consist of -1 contributions. The resulting syndrome mean can therefore take on any value in the interval $[-1, +1]$, which suggests that the measurement will be inherently less informative than a measurement that occurs when the state of the system is constant. As the value of $T$ grows larger, more measurements will fall into this category. The incorporation of these intermediate syndrome means into an optimal state tracking model is explored in Sec. 3.

Bayesian treatment of measurement
Determining the state of a quantum system from noisy signals is not a new problem, and the field of quantum mechanics has long grappled with how to properly describe the mechanism and effects of measurement [10]. One of the key challenges is knowing how to properly update the density matrix $\rho_s$ of a system to reflect the fact that we have performed a measurement on it. A common procedure involves coupling the system to a detector and then tracing out the detector's degrees of freedom,
$$\rho_s = \operatorname{Tr}_d\bigl(\rho_{sd}\bigr), \tag{11}$$
where $\rho_{sd}$ is the joint density matrix of the system and detector. By its very nature this approach cannot offer any information about the state of the system after a specific measurement, since we are averaging over all of the possible detector configurations without reference to the outcome that really occurred. Instead, what Eq. (11) describes is the behavior of an ensemble of systems after they are all measured, since the frequency of each measurement outcome can be specified without knowing the result for any given system. This approach is therefore insufficient for an error correction scheme, as the very purpose of error correction is to monitor a specific system and respond to the errors that actually occur, rather than to the average over all possible errors. The issue of specific outcomes versus ensemble behavior is reminiscent of the disagreement between Bayesian and frequentist statistics [11], and it is therefore unsurprising that a Bayesian formalism for quantum measurement was developed. This formalism was pioneered by Korotkov, who proposed an update rule for the diagonal elements of the density matrix after a measurement $M$ using Bayes' theorem that has the form
$$\hat{\rho}_{ii} = \frac{\rho_{ii}\,P(M|i)}{\sum_j \rho_{jj}\,P(M|j)}, \tag{12}$$
where $\rho_{ii}$ is the original diagonal element acting as the prior and $\hat{\rho}_{ii}$ is the updated (posterior) element [12]. The term $P(M|i)$ describes the probability of observing measurement result $M$ given that the system is in state $i$.

An Optimal Bayesian Filter
For our purposes, the significance of Korotkov's measurement formalism from Eq. (12) lies in its treatment of the density matrix elements as classical Bayesian probability distributions, which allows us to leverage the existing toolkit of Bayesian statistics [13][14] to develop an error correction algorithm. Fundamentally, the goal of this algorithm is to determine, using all available information, the most probable state of the system at each time step in order to detect errors. As in Eq. (12), we seek to derive a posterior probability distribution for the state of our system after each measurement, using knowledge of the underlying bit-flip dynamics and of the Gaussian noise corrupting the measurement signal.

Recursive form of the posterior probability
For notational simplicity, we introduce the vector-valued quantity
$$\vec{M}_i \equiv \bigl(M^{(1)}(iT, T),\, M^{(2)}(iT, T)\bigr), \tag{13}$$
where $M^{(1)}$ and $M^{(2)}$ are measurements of the parity operators $Z_1Z_2$ and $Z_2Z_3$ respectively and $i$ is a non-negative integer. The time argument $iT$ appears because the sets of (non-overlapping) sequential measurements will be spaced out in increments of $T$. To denote the state of the system after the $i$th measurement, i.e., at time $t = (i+1)T$, we employ the compound index $\ell_i$ with $0 \le \ell_i \le 7$, whose three-bit binary representation specifies the basis state of the three qubits (e.g., $\ell_i = 4$ corresponds to $|100\rangle$). Using this notation, we write the posterior probability of the $i$th state as
$$\hat{P}(\ell_i) \equiv P\bigl(\ell_i\,|\,\vec{M}_i \ldots \vec{M}_0\bigr), \tag{14}$$
where $P(C|AB)$ is generically the probability distribution of variable $C$ given knowledge of the state of variables $A$ and $B$. Using the chain rule of probability, Eq. (14) can be put into a recursive form as
$$\hat{P}(\ell_i) = \frac{1}{P(\vec{M}_i | \vec{M}_0 \ldots \vec{M}_{i-1})}\sum_{\ell_{i-1}} P\bigl(\vec{M}_i | \ell_i \ell_{i-1}\bigr)\,P(\ell_i | \ell_{i-1})\,\hat{P}(\ell_{i-1}), \tag{15}$$
where in the last line we explicitly assume that $\vec{M}_i$ and $\ell_i$ are conditionally independent of $\{\vec{M}_0, \ldots, \vec{M}_{i-1}\}$ given knowledge of $\ell_{i-1}$. This is equivalent to assuming that the state transitions depend only on the prior state (Markovian assumption), and that the additive measurement noise is uncorrelated across time. Since $P(\vec{M}_i | \vec{M}_0 \ldots \vec{M}_{i-1})$ has no explicit $\ell_i$ dependence and is not otherwise coupled to any of the terms in the sum, it will be ignored in our subsequent analysis and treated as simply a normalization factor. If we assume that the system begins in a known state, which is reasonable at the start of a quantum experiment, then Eq. (15) provides a recipe for iteratively computing the probabilities of future states in light of new measurement results. The two quantities that must be solved for as part of this procedure are
1. $P(\ell_i | \ell_{i-1})$, which is the probability of jumping from state $\ell_{i-1}$ to state $\ell_i$ in time $T$ due to bit-flip errors, and
2. $P(\vec{M}_i | \ell_i \ell_{i-1})$, which is the probability of measuring values $\vec{M}_i$ from the parity operators when we have knowledge of the system states at the beginning and end of the integration period.
The remainder of this section will be dedicated to deriving explicit functional forms for these two probability terms.

Analysis of the transition probability
Our treatment of bit-flip errors will assume that they act on the system independently of one another and occur at a fixed rate $\mu$ that is known to us. In saying that the errors are independent, we mean that an error occurring in one time interval gives us no information about whether an error will occur in any other non-overlapping interval. The number $e_k$ of such errors acting on the $k$th qubit in a time interval of length $T$ will obey the following Poisson distribution
$$P(e_k) = \frac{(\mu T)^{e_k}\,e^{-\mu T}}{e_k!}, \tag{16}$$
where we emphasize that the error rate is identical across all three qubits. In order to use Eq. (16) to derive a functional form for $P(\ell_i | \ell_{i-1})$, we must specify the number of errors needed to connect state $\ell_{i-1}$ to state $\ell_i$. Taking for example a system that starts as $|000\rangle$ at step $i-1$ and ends up as $|100\rangle$ at step $i$, corresponding to $\ell_{i-1} = 0$ and $\ell_i = 4$, it is clear that the first qubit must have been flipped at some point in the interval between the two steps. However, this does not mean that only a single error occurred on the first qubit, or that none of the other qubits experienced any errors. Rather, it means that the first qubit experienced an odd number of errors while the rest of the qubits experienced an even number of errors (including no error at all), such that the net number of bit-flips works out to be one for the first qubit and zero for the others. We can therefore express $P(\ell_i | \ell_{i-1})$ as a sum over the Poisson probabilities of every possible error combination consistent with the net number of flips between $\ell_i$ and $\ell_{i-1}$.
Rather than evaluate this sum directly, it is easier to look at each qubit separately and calculate the probability that it will experience either an even or odd number of errors. Since the errors on each qubit are independent, we can then take appropriate products of these probabilities to calculate $P(\ell_i | \ell_{i-1})$. The probability that the $k$th qubit will experience an even number of errors is
$$P(e_k \text{ even}) = \sum_{e_k\ \mathrm{even}} \frac{(\mu T)^{e_k}\,e^{-\mu T}}{e_k!} = e^{-\mu T}\cosh(\mu T), \tag{17}$$
and therefore the probability of an odd number of errors is
$$P(e_k \text{ odd}) = e^{-\mu T}\sinh(\mu T). \tag{18}$$
Using Eq. (17) and Eq. (18), the transition probability $P(\ell_i | \ell_{i-1})$ is given by
$$P(\ell_i | \ell_{i-1}) = e^{-3\mu T}\,\sinh(\mu T)^{d(\ell_i, \ell_{i-1})}\,\cosh(\mu T)^{3 - d(\ell_i, \ell_{i-1})}, \tag{19}$$
where $d(\ell_i, \ell_{i-1})$ is the Hamming distance between the bit representation of $\ell_i$ and the bit representation of $\ell_{i-1}$. In words, the probability consists of an exponential multiplied by a sinh term for every qubit that experiences a net flip and a cosh term for every qubit whose state is ultimately left unchanged.
Since the error rate $\mu$ is independent of time, the value of the index $i$ is irrelevant to the value of $P(\ell_i | \ell_{i-1})$. This allows us to define a single $8 \times 8$ time-invariant transition matrix $J$. This matrix can be understood as the parameterization of a discrete Markov chain [15] which describes how bit-flip errors can alter the state of our system over time intervals of length $T$. For the remainder of this paper we will use the elements $J_{\ell_{i-1}\ell_i}$ of $J$ to denote the transition probabilities $P(\ell_i | \ell_{i-1})$.
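Because Eq. (19) depends only on $\mu T$ and the Hamming distance, $J$ can be tabulated once before a run begins. A minimal sketch of this precomputation (variable names are illustrative):

```python
import numpy as np

def transition_matrix(mu_T):
    """Tabulate Eq. (19): J[l_prev, l_next] = P(l_next | l_prev) over a step T."""
    J = np.empty((8, 8))
    for l_prev in range(8):
        for l_next in range(8):
            d = bin(l_prev ^ l_next).count("1")  # Hamming distance d(l_next, l_prev)
            J[l_prev, l_next] = (np.exp(-3.0 * mu_T)
                                 * np.sinh(mu_T) ** d
                                 * np.cosh(mu_T) ** (3 - d))
    return J

J = transition_matrix(1e-2)
assert np.allclose(J.sum(axis=1), 1.0)  # each row is a normalized distribution
```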

Analysis of the measurement density
From our discussion of continuous measurement in Sec. 2.2, we know that the value of the $i$th measurement will depend on the average value of the syndrome from time $iT$ to $(i+1)T$, plus an additive Gaussian noise term. Since it will be necessary to discuss measurements of the two parity operators separately, we unpack the vector quantity $\vec{M}_i$ from Eq. (13) into its components $M^{(1)}_i$ and $M^{(2)}_i$, which describe the results of measuring $Z_1Z_2$ and $Z_2Z_3$ respectively, to make this distinction explicit. For convenience we denote the averaged syndrome values as
$$\bar{S}^{(j)}_i \equiv \frac{1}{T}\int_{iT}^{(i+1)T} S^{(j)}(t)\,dt, \tag{20}$$
where $S^{(j)}(t)$ is the value of the $j$th syndrome at time $t$.
Using this labeling, we have from Eq. (9) that $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$ will broadly resemble a bivariate Gaussian distribution. However, since the means $\bar{S}^{(j)}_i$ are themselves random variables due to errors occurring within the integration period, the measurement distribution will consist of a Gaussian function integrated over mean values in the continuous interval $[-1, +1]$. This is given by
$$P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr) = \int_{-1}^{1}\!\int_{-1}^{1} N\bigl(M^{(1)}_i; \bar{S}^{(1)}_i, \tfrac{k}{T}\bigr)\,N\bigl(M^{(2)}_i; \bar{S}^{(2)}_i, \tfrac{k}{T}\bigr)\,P\bigl(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_i \ell_{i-1}\bigr)\,d\bar{S}^{(1)}_i\,d\bar{S}^{(2)}_i, \tag{21}$$
where $N(x; m, \sigma^2)$ denotes a Gaussian density and we assume that the measurement noise of $Z_1Z_2$ and $Z_2Z_3$ is uncorrelated and of equal variance. Parsing Eq. (21), the Gaussian factors describe the noise added to the syndrome means, while the final factor is the probability $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_i \ell_{i-1})$ of observing those mean values, assuming that $\ell_i$ and $\ell_{i-1}$ are known. Figure 2 illustrates the Gaussian-like behavior of $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$, which is especially clear under strong noise. We exploit this similarity in our Bayesian algorithm to efficiently approximate the measurement log-likelihood as described in Sec. 5.1.
The fact that Eq. (21) is not simply a bivariate Gaussian distribution with a fixed mean reflects the uncertainty that is inherent in our problem. Since $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_i \ell_{i-1})$ is conditioned only on the state of the system at $iT$ and $(i+1)T$, we must predict the values of the syndrome averages using this information alone. Figure 1C demonstrates the difficulty of such an inference, as the value of $\bar{S}^{(j)}_i$ can take on a continuum of values for any given $(\ell_{i-1}, \ell_i)$ pair, so we must marginalize over these degrees of freedom in order to derive the measurement distribution.
To better understand where the probability of $\bar{S}^{(j)}_i$ is concentrated, we can condition $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_i \ell_{i-1})$ on the number of errors occurring on each qubit, which gives
$$P\bigl(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_i \ell_{i-1}\bigr) = \sum_{e_1 e_2 e_3} P\bigl(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_{i-1}; e_1 e_2 e_3\bigr)\,P(e_1 | \ell_i \ell_{i-1})\,P(e_2 | \ell_i \ell_{i-1})\,P(e_3 | \ell_i \ell_{i-1}), \tag{22}$$
where $e_k$ is the number of errors occurring on the $k$th qubit during the $i$th interval (this time index is suppressed for notational convenience). In writing Eq. (22) we exploit the fact that errors on different qubits are independent, and that $\ell_i$ is a deterministic function of $e_1, e_2, e_3, \ell_{i-1}$ and therefore does not need to be explicitly conditioned on.

Figure 2: Plot A was generated using a moderate amount of noise ($k/T = 1$), and possesses a Gaussian-like shape. Plot B contains far less noise ($k/T = 0.05$), so the square shape of the underlying uniform distribution starts to show through.
From our discussion in Sec. 3.2, we know that $\ell_i$ and $\ell_{i-1}$ only specify whether an even or odd number of errors occurred on each qubit, so $P(e_k | \ell_i \ell_{i-1})$ can be rewritten as $P(e_k | \text{odd})$ or $P(e_k | \text{even})$ depending on whether $\ell_i$ and $\ell_{i-1}$ differ in the $k$th qubit. These distributions are identical to the Poisson distribution from Eq. (16) except that the probability of either even or odd $e_k$ is set to zero. After re-normalizing, we get
$$P(e_k | \text{even}) = \frac{(\mu T)^{e_k}}{e_k!\,\cosh(\mu T)} \;\;(e_k \text{ even}), \qquad P(e_k | \text{odd}) = \frac{(\mu T)^{e_k}}{e_k!\,\sinh(\mu T)} \;\;(e_k \text{ odd}). \tag{23}$$
As an example of how these distributions are used, if $\ell_i = 4$ and $\ell_{i-1} = 0$ then the error distribution factorizes as $P(e_1 e_2 e_3 | 4, 0) = P(e_1|\text{odd})\,P(e_2|\text{even})\,P(e_3|\text{even})$.
Referring back to Eq. (22), the final term left to characterize is $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_{i-1}; e_1 e_2 e_3)$, which describes how the syndrome averages are distributed given a starting state and the number of errors that occurred. This term is challenging to evaluate, since errors on the second qubit will flip both syndromes simultaneously and thus prevent the joint distribution from factorizing. If we consider errors on the second qubit separately from errors on the first and third qubits, i.e., focus on cases where $e_1, e_3 = 0$ and where $e_2 = 0$, then we can derive (see Appendix A.1) the expressions in Eqs. (24-27), where $a_k$ and $b_k$ are the integer parts of $e_k/2$ and $(e_k - 1)/2$ respectively, and $\delta(x)$ is the Dirac delta function. The "$\pm_j$" term indicates whether state $\ell_{i-1}$ has even ("+") or odd ("-") parity with respect to the $j$th measurement operator, and therefore whether the syndrome should be added or subtracted. We note that the syndrome density is uniform (on its support) when only a single error occurs (i.e., $a_k = b_k = 0$). This simplification will be utilized in Sec. 5.1 as we construct our logarithmic filter.
Unfortunately, when errors occur on the second qubit and on at least one of the other qubits, the joint distributions are non-smooth and appear to lack convenient analytic forms (see Appendix A.1.3). As an example, if the system starts in $|000\rangle$ and experiences one error on each qubit, then the syndrome distribution is given by Eq. (28), a piecewise function with a discontinuous first derivative. Figure 3 shows a plot of $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | 0; 011)$, which is even less smooth. Due to the poor behavior of the syndrome densities, we are unable to make further progress toward an exact, analytic expression for the measurement probability $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$. Most of this difficulty stems from the fact that we must, in general, account for an infinite number of possible errors that could occur across all three qubits within a given integration interval. If, however, we restrict ourselves to situations where only a single error occurs, which is reasonable when $\mu$ and $T$ are small, then the expressions in Eqs. (25-27) can be used directly and we will be able to evaluate the integrals in Eq. (21). This is the approach we take in Sec. 5.1, although it is no longer an exact treatment of the problem.

Figure 3: Plot A shows $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | 0; 011)$, which describes the syndrome means when the system starts in $|000\rangle$ and then experiences an error on the second and third qubits. Plot B shows $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | 0; 111)$, whose form is given in Eq. (28). As a general rule, the more errors that occur in the interval the more "smooth" the syndrome density will appear, with plot A being outright discontinuous while plot B has a discontinuous first derivative. These plots were generated from $10^8$ samples using histograms with 50 equally spaced bins along each axis. The color gradient indicates the relative magnitude of each bin, with different scales for the two plots.

Constructing the Bayesian Filter
We now consolidate the various results from this section into an algebraic description of how the optimal model would operate. Referring back to Eq. (15), the posterior state probabilities $\hat{P}(\ell_i)$ are given recursively in terms of the measurement probabilities $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$ and the transition probabilities $J_{\ell_{i-1}\ell_i}$. Noting the dependence of both terms on $\ell_i$ and $\ell_{i-1}$, the update rule is given by
$$\hat{P}(\ell_i) \propto \sum_{\ell_{i-1}} \tilde{J}_{\ell_{i-1}\ell_i}\,\hat{P}(\ell_{i-1}), \qquad \tilde{J}_{\ell_{i-1}\ell_i} \equiv P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr)\,J_{\ell_{i-1}\ell_i}, \tag{29}$$
where $\tilde{J}$ can be viewed as a weighted transition matrix which combines the Markov dynamics of the error process with the likelihood of the observed measurement outcomes.
Due to the recursive nature of Eq. (29), the state tracking model operates naturally as a digital filter, taking in syndrome measurements at each time step and outputting a set of posterior probabilities that incorporate all of the information available to us. Setting aside the challenges of computing $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$, this filter is optimal in the sense that it was constructed from the formal probability manipulations in Eq. (15), which assumed only that the measurements and errors were Markovian. In Sec. 3.3 we made further assumptions about the structure of the measurement signal, and insofar as these assumptions are valid there is nowhere for the algorithm to be improved. Of course, should the physical system not obey the idealized model outlined in Sec. 2.2, then the filter will need to be modified in order to remain optimal.
From a computational standpoint, Eq. (29) represents the filter as a simple matrix-vector product, with the difficulty centered on calculating $\tilde{J}$. More specifically, Sec. 3.3 made it clear that computing $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$ requires us to evaluate the integral in Eq. (21), which appears to be analytically intractable when the error combinations are not restricted. This limitation, together with other issues that will be discussed in Sec. 4, rules out using the optimal filter for real-time error tracking. That said, accurate numerical evaluation of the integral is still possible via sampling techniques, which are discussed further in Appendix A.2, so we will use this filter as a benchmark for other state tracking methods.
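Schematically, one update of Eq. (29) is just a weighted matrix-vector product. In the sketch below, `meas_density` is a placeholder for whatever routine evaluates (or, as in Appendix A.2, numerically estimates) the density of Eq. (21); it is an assumed interface, not code from the paper.

```python
import numpy as np

def optimal_step(post, J, M1, M2, meas_density):
    """One update of Eq. (29). `post` holds the eight prior probabilities and
    `meas_density(M1, M2, l, l_prev)` returns P(M1, M2 | l, l_prev)."""
    new_post = np.empty(8)
    for l in range(8):
        weights = [J[l_prev, l] * meas_density(M1, M2, l, l_prev)
                   for l_prev in range(8)]
        new_post[l] = np.dot(weights, post)  # sum over l_prev
    return new_post / new_post.sum()  # normalization factor from Eq. (15)
```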

Obstacles and Prior Work
In the context of real-time quantum error tracking, where latency is measured in nanoseconds and hardware resources can be tight, the Bayesian filter outlined in Sec. 3 faces four significant challenges:
1. The Gaussian integral in Eq. (21) cannot be evaluated analytically, and sampling methods are too resource-intensive for real-time filtering.
2. Computing even a single Gaussian requires the implementation of an exponential function, which can be challenging on low-latency, high-throughput devices such as FPGAs [16].
3. The outputs of the Gaussians will likely need to be represented using floating-point numbers to capture the necessary range and precision, which adds a further computational burden.
4. The probabilities generated by Eq. (29) will need to be periodically normalized in order to prevent overflow or underflow, which requires a dedicated division routine that will take up resources and be slow to run.
Overcoming all of these obstacles while preserving the underlying Bayesian framework is not trivial. The first derivation of an optimal continuous-time filter for the three-qubit bit-flip code was published by van Handel and Mabuchi, who recovered the well-known Wonham filter after solving for the least-squares estimator of the density matrix [17]. This classical filter was designed to output probabilities for the states of a Markov chain observed continuously by a signal with additive Gaussian noise. Using a modified version of our notation, the canonical Wonham filter has the form
$$d\hat{P}_\ell(t) = \sum_{\ell'} Q_{\ell'\ell}\,\hat{P}_{\ell'}(t)\,dt + \frac{1}{k}\,\hat{P}_\ell(t)\sum_{j}\bigl(S^{(j)}_\ell - \bar{S}^{(j)}(t)\bigr)\bigl(dy^{(j)}(t) - \bar{S}^{(j)}(t)\,dt\bigr), \tag{30}$$
where $\hat{P}_\ell(t)$ is the probability of the $\ell$th state at time $t$, $S^{(j)}_\ell \in \{-1, +1\}$ is the syndrome of the $\ell$th state from the $j$th parity operator, $\bar{S}^{(j)}(t) \equiv \sum_\ell \hat{P}_\ell(t)\,S^{(j)}_\ell$ is the average of the $j$th syndrome at time $t$, $dy^{(j)}(t)$ is the increment of the continuously observed signal, and $Q = \frac{1}{T}\ln J$ is the rate matrix of the continuous Markov chain which describes the state transitions.
It is important to emphasize that Eq. (29) and Eq. (30) are each optimal with respect to different measurement regimes. The Wonham filter is optimal under the assumption that we have access to instantaneous signal readouts at arbitrary t, while our filter in Eq. (29) is optimal when the measurements are restricted to being integrated averages of the signal over some finite period T as in Eq. (9). As T → 0 these two forms of measurement converge, and thus Eq. (29) converges to Eq. (30) after normalization (see Appendix A.3).
The general problem of filtering Markov jump processes using discrete observations has recently been explored by Borisov [18], who derives results that are in agreement with our work in Sec. 3.
As a non-linear stochastic differential equation, the Wonham filter is not practical to use in its exact form, though it is still possible to create discretized approximations of Eq. (30) which are effective. For example, Gange George et al. applied the Euler-Maruyama method to a logarithmic transformation of the Wonham filter in order to derive a first-order update rule that was numerically stable [19]. In the specific context of continuous error correction, Mohseninia et al. [5] proposed a linearized version of Eq. (30) equivalent to
$$\hat{P}(\ell_i) = \sum_{\ell_{i-1}}\Bigl[\delta_{\ell_i \ell_{i-1}}\Bigl(1 + \frac{T}{k}\sum_j S^{(j)}_{\ell_i} M^{(j)}_i\Bigr) + T\,Q_{\ell_{i-1}\ell_i}\Bigr]\hat{P}(\ell_{i-1}), \tag{31}$$
where $S^{(j)}_\ell$ and $Q$ are defined as in Eq. (30). The filter described by Eq. (31) does not involve any Gaussian functions and therefore avoids the first three obstacles in our list, but still requires periodic normalization. Additionally, as $T$ grows larger the accuracy of this first-order approximation will decline.
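A sketch of one such first-order update, following the form of Eq. (31) as reconstructed above; the array shapes and the syndrome table `S` are illustrative assumptions of this sketch.

```python
import numpy as np

def linear_wonham_step(post, Q, S, M, T, k):
    """Unnormalized first-order update in the spirit of Eq. (31).

    post : unnormalized state probabilities, shape (8,)
    Q    : rate matrix (1/T) ln J, indexed Q[l_prev, l_next]
    S    : syndrome table S[l, j] in {-1, +1}, shape (8, 2)
    M    : pair of integrated measurements (M1, M2) for this step
    """
    drift = T * (Q.T @ post)                      # Markov transition term
    gain = (T / k) * (S @ np.asarray(M)) * post   # linearized measurement term
    return post + drift + gain
```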
The filters discussed so far have all been motivated at some level by Bayesian probability analysis, but there exists another class of algorithms, referred to as threshold or boxcar filters, which eschews connections to probability theory in favor of computational simplicity [20]. At a basic level, these models take a pair of integrated measurement values $(M^{(1)}_i, M^{(2)}_i)$ and compare them to a set of predefined thresholds, after which an appropriate action is taken. Atalaya et al. developed a double-threshold algorithm for the Bacon-Shor code using non-commuting observables [4], while Mohseninia et al. proposed a non-Markovian boxcar filter and a double threshold scheme for the three-qubit repetition code which were easy to implement and attained good performance [5]. Atalaya and Zhang et al. applied a flexible double thresholding scheme to a system undergoing quantum annealing, demonstrating that these algorithms were effective at error correction even in the presence of Hamiltonian evolution [6].

A Practical Bayesian Filter
To address the obstacles identified in Sec. 4, we propose an algorithm that avoids any exponentiation or division operations, while also approximating the analytically intractable integral in Eq. (21). In Sec. 5.1 we simplify Eq. (21) by assuming that only a single error occurs during each integration period, which yields simple expressions for $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_i \ell_{i-1})$. In Sec. 5.2 we eliminate the need for exponentiation and division by shifting our analysis into log-probability space, where the multiplication of probabilities becomes addition, normalization becomes subtraction, and Gaussian distributions transform into simple quadratic functions. Finally, Sec. 5.4 introduces what we call "single-term" and "two-term" strategies for approximating the LogSumExp functions needed for Markov chain evolution using log-probabilities. The performance of our algorithm is tested numerically in Sec. 6.

Single-error approximation
The largest obstacle to using Eq. (29) as a filter is that $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$ cannot be easily evaluated. As explored in Sec. 3.3, this difficulty arises because the syndrome density terms $P(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_{i-1}; e_1 e_2 e_3)$ do not in general have convenient analytic forms, especially when the number of errors is large. This in turn prevents us from solving the integral in Eq. (21). However, if we were to assume that only a single error can occur within a given measurement interval, then Eqs. (25-27) would reduce to
$$P\bigl(\bar{S}^{(1)}_i | \ell_{i-1}; e_1 = 1\bigr) = \tfrac{1}{2}, \qquad \bar{S}^{(1)}_i \in [-1, +1], \tag{32}$$
$$P\bigl(\bar{S}^{(j)}_i | \ell_{i-1}; \text{no flips affecting } j\bigr) = \delta\bigl(\bar{S}^{(j)}_i \mp_j 1\bigr), \tag{33}$$
$$P\bigl(\bar{S}^{(1)}_i \bar{S}^{(2)}_i | \ell_{i-1}; e_2 = 1\bigr) = \tfrac{1}{2}\,\delta\bigl(\bar{S}^{(2)}_i \mp_c \bar{S}^{(1)}_i\bigr), \qquad \bar{S}^{(1)}_i \in [-1, +1], \tag{34}$$
(with the analogous uniform density for $\bar{S}^{(2)}_i$ when $e_3 = 1$), which all lead to tractable integrals when substituted into Eq. (21). This single-error approximation, which is reasonable when the average number of errors per interval, $\mu T$, is small, leads to Gaussian-like measurement distributions which can be easily incorporated into our log-probability filter. For simplicity, we first consider an error that occurs on either the first ($e_1 = 1$) or third ($e_3 = 1$) qubit, since it will affect only one of the syndromes and thus allow the joint distribution to separate as in Eq. (24). By substituting Eq. (32) and Eq. (33) into Eq. (21) and requiring that $\ell_i$ and $\ell_{i-1}$ differ only in the first qubit, we get
$$P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr) = \frac{1}{4}\Bigl[\operatorname{erf}\Bigl(\frac{M^{(1)}_i + 1}{\sqrt{2k/T}}\Bigr) - \operatorname{erf}\Bigl(\frac{M^{(1)}_i - 1}{\sqrt{2k/T}}\Bigr)\Bigr]\,N\bigl(M^{(2)}_i; \pm_2 1, \tfrac{k}{T}\bigr). \tag{35}$$
The $M^{(2)}_i$ term is simply a Gaussian centered at $\pm_2 1$, since the second syndrome is not sensitive to an error on the first qubit. The $M^{(1)}_i$ term, by contrast, arises because the mean of the first syndrome is distributed uniformly over the range $[-1, +1]$, which yields a pair of error functions. Notably, this term does not depend on $\pm_1$, so it is independent of the state of the first two qubits.
While the first term in Eq. (35) is an exact result of our single-error integration, it is not very convenient to evaluate numerically. As described in Appendix A.4, the measurement distribution can be approximated as
$$P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr) \approx N\bigl(M^{(1)}_i; 0, \tfrac{k}{T} + \tfrac{1}{3}\bigr)\,N\bigl(M^{(2)}_i; \pm_2 1, \tfrac{k}{T}\bigr), \tag{36}$$
which will be easy to evaluate numerically after we shift to log-probabilities in Sec. 5.2. The analogous expression for an error on the third qubit is obtained by exchanging the roles of $M^{(1)}_i$ and $M^{(2)}_i$ and replacing $\mp_2$ with $\mp_1$. Carrying out a similar procedure on the second qubit gives
$$P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr) \approx N\bigl(M^{(1)}_i \mp_c M^{(2)}_i; 0, \tfrac{2k}{T}\bigr)\,N\Bigl(\frac{M^{(1)}_i \pm_c M^{(2)}_i}{2}; 0, \tfrac{k}{2T} + \tfrac{1}{3}\Bigr), \tag{37}$$
where $\pm_c$ is positive when $\ell_{i-1}$ has the same parity with respect to $Z_1Z_2$ and $Z_2Z_3$ but negative when the parities are opposite (roughly, "$\pm_c \equiv \pm_1 \cdot \pm_2$").
The final case to consider is when no errors occur in an interval at all. With reasonable values of $\mu$ and $T$ this is the most likely outcome for any given interval, and its measurement distribution can be solved for easily by substituting the delta functions of Eq. (33) into Eq. (21), giving
$$P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr) = N\bigl(M^{(1)}_i; \pm_1 1, \tfrac{k}{T}\bigr)\,N\bigl(M^{(2)}_i; \pm_2 1, \tfrac{k}{T}\bigr), \tag{38}$$
which is simply the product of Gaussian distributions with means corresponding to the parities of $\ell_{i-1}$. Taken together, Eqs. (36, 37, 38) constitute the approximate description of $P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$ that we will use to construct our log-probability filter. The Gaussian form of each equation is especially important on a practical level, as the log-densities will all reduce to sums of simple quadratic equations that can be easily computed.
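A quick numerical comparison of the exact uniform-mean factor in Eq. (35) against its Gaussian stand-in illustrates the quality of this simplification; the moment-matched variance $k/T + 1/3$ is the form assumed in Eq. (36) above, and the function names are illustrative.

```python
import numpy as np
from scipy.special import erf

def exact_uniform_mean_term(M, var):
    """Exact M(1) factor of Eq. (35): Gaussian noise with a mean uniform on [-1, 1]."""
    s = np.sqrt(2.0 * var)
    return (erf((M + 1) / s) - erf((M - 1) / s)) / 4.0

def gaussian_approx_term(M, var):
    """Moment-matched stand-in (assumed form of Eq. (36)): N(M; 0, var + 1/3)."""
    v = var + 1.0 / 3.0
    return np.exp(-M**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

M = np.linspace(-3.0, 3.0, 7)
print(np.c_[exact_uniform_mean_term(M, 1.0), gaussian_approx_term(M, 1.0)])
```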

Moving to logarithmic probability
To begin the transformation from probabilities to log-probabilities, we take the logarithm of the optimal Bayesian filter from Eq. (29)
$$\log \hat{P}(\ell_i) = \log \sum_{\ell_{i-1}} \tilde{J}_{\ell_{i-1}\ell_i}\,\hat{P}(\ell_{i-1}) = \log \sum_{\ell_{i-1}} \exp\bigl(\log \tilde{J}_{\ell_{i-1}\ell_i} + \log \hat{P}(\ell_{i-1})\bigr), \tag{39}$$
where in the last line we exponentiate the logarithm of each term in the sum in order to preserve the recursive structure of the filter. Since $\log \tilde{J}_{\ell_{i-1}\ell_i} \equiv \log J_{\ell_{i-1}\ell_i} + \log P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$, we must supply both the transition and measurement log-probabilities. The transition log-probability is given by the logarithm of Eq. (19), which yields
$$\log J_{\ell_{i-1}\ell_i} = -3\mu T + d(\ell_i, \ell_{i-1})\log\sinh(\mu T) + \bigl(3 - d(\ell_i, \ell_{i-1})\bigr)\log\cosh(\mu T), \tag{40}$$
where $d(\ell_i, \ell_{i-1})$ is the Hamming distance between $\ell_i$ and $\ell_{i-1}$. While the form of Eq. (40) is not especially illuminating, the value of $\mu T$ will be known to us and assumed to be constant across an experiment. We are therefore able to compute the values of $\log J_{\ell_{i-1}\ell_i}$ in advance and use them in the filter without needing to evaluate the log sinh or log cosh functions in real time.
The measurement log-probabilities, by contrast, must be calculated anew each time a measurement is recorded. Such rapid computation is feasible because Eqs. (36, 37, 38) are all in Gaussian form, and therefore have the following quadratic log-probabilities
$$\log P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr) \simeq -\frac{(M^{(1)}_i \mp_1 1)^2}{2k/T} - \frac{(M^{(2)}_i \mp_2 1)^2}{2k/T} \quad (\text{no error}), \tag{41}$$
$$\simeq -\frac{(M^{(1)}_i)^2}{2(k/T + 1/3)} - \frac{(M^{(2)}_i \mp_2 1)^2}{2k/T} \quad (e_1 = 1), \tag{42}$$
$$\simeq -\frac{(M^{(1)}_i \mp_c M^{(2)}_i)^2}{4k/T} - \frac{(M^{(1)}_i \pm_c M^{(2)}_i)^2}{8(k/2T + 1/3)} \quad (e_2 = 1), \tag{43}$$
$$\simeq -\frac{(M^{(1)}_i \mp_1 1)^2}{2k/T} - \frac{(M^{(2)}_i)^2}{2(k/T + 1/3)} \quad (e_3 = 1), \tag{44}$$
where the normalization constants have been left unspecified for convenience. The different variances, all functions of $k$ and $T$, are constant during an experiment and therefore do not need to be computed at each step. Using Eqs. (40-44), we can calculate $\log \tilde{J}$ and therefore evaluate the exponent of each term in the sum of Eq. (39). The last line of Eq. (39) is known as a LogSumExp [21], and its evaluation is the final obstacle to implementing our filter. For convenience we adopt the more compact notation $L^{\ell_i}_{\ell_{i-1}} \equiv \log \tilde{J}_{\ell_{i-1}\ell_i} + \log \hat{P}(\ell_{i-1})$ for the inputs of the LogSumExp function,
$$\log \hat{P}(\ell_i) = \log \sum_{\ell_{i-1}} \exp\bigl(L^{\ell_i}_{\ell_{i-1}}\bigr). \tag{45}$$
Using this form, we exploit a well-known trick for evaluating LogSumExp functions by pulling the largest exponential outside of the sum
$$\log \hat{P}(\ell_i) = L^{\ell_i}_{*} + \log\Bigl(1 + \sum_{\ell_{i-1} \neq *} \exp\bigl(L^{\ell_i}_{\ell_{i-1}} - L^{\ell_i}_{*}\bigr)\Bigr), \tag{46}$$
where $L^{\ell_i}_{*}$ is the largest input to the LogSumExp function. The second term in Eq. (46) is now the logarithm of one plus a set of terms whose exponents are all non-positive. Figure 4 shows that as the difference between $L^{\ell_i}_{*}$ and the other inputs grows, the output of Eq. (46) will converge to $L^{\ell_i}_{*}$. If the difference is large for some subset of the inputs, then the expression can be simplified by removing that subset from the sum.
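In code, the max-trick of Eq. (46) amounts to a couple of lines; a minimal sketch:

```python
import numpy as np

def logsumexp(L):
    """Eq. (46): pull the largest input out of the sum so that every remaining
    exponent is non-positive."""
    L = np.asarray(L, dtype=float)
    L_star = L.max()
    return L_star + np.log1p(np.exp(L - L_star).sum() - 1.0)  # max term is exactly 1
```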

Evaluating the LogSumExp
To take advantage of Eq. (46), we must know the relative magnitudes of the various $L^{\ell_i}_{\ell_{i-1}}$ terms in a typical run. For convenience, we introduce the non-positive quantity
$$\Delta L^{\ell_i}_{\ell_{i-1}} \equiv L^{\ell_i}_{\ell_{i-1}} - L^{\ell_i}_{*}, \tag{47}$$
which is simplified to just $\Delta L$ when the specific indices are not relevant. Using Figure 4 as a guide, it is clear that when $\Delta L \lesssim -4$ the corresponding exponential contributes negligibly to Eq. (46), so the associated input can be dropped from the sum. To determine which terms are significant in practice, we examine the optimal filter in Eq. (29), which we simulate numerically using the method described in Appendix A.2. The plots of Figure 5 demonstrate clearly that only one $L^{\ell_i}_{\ell_{i-1}}$ term is significant when $\ell_i$ is equal to the true system state and two of the terms are significant when $\ell_i$ is separated from the true state by one bit-flip. For example, the gap $\Delta L$ between $L^0_0$ and the second-largest $L^0_{\ell_{i-1}}$ in plot 5A is roughly -15 before the bit-flip, which is far smaller than the cutoff threshold of -4 associated with Eq. (47). Plots of the other $\ell_i$ indicate that states separated from the true state by two bit-flips have four significant terms and the state separated by three bit-flips has eight significant terms.
To understand the behavior shown in Figure 5, it is necessary to determine the typical magnitudes of Eq. (40) and Eqs. (41-44) under realistic conditions. Starting first with the measurement log-densities, we are interested in their expected values when the system is in some state $\ell$, so we take the average of Eqs. (41-44) with respect to measurements distributed as $P(M^{(1)}_i M^{(2)}_i | \ell)$. For convenience we label this quantity as
$$G^{\ell_i}_{\ell_{i-1}} \equiv \mathbb{E}_{\vec{M}_i \sim P(\vec{M}_i | \ell)}\Bigl[\log P\bigl(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1}\bigr)\Bigr], \tag{48}$$
and after some algebra the average log-densities are shown to scale as in Eq. (49), whose magnitude is set by the precision $T/k$ of the noise (the reciprocal of its variance) and by the bitwise exclusive-or $\ell \oplus \ell_i$ of the state indices. In words, Eq. (49) states that the $G^{\ell_i}_{\ell_{i-1}}$ penalize states whose syndromes disagree with the observed measurements, with a strength proportional to $T/k$.

Figure 5: Contributions $L^{\ell_i}_{\ell_{i-1}}$ for $|000\rangle$ (plot A) and $|001\rangle$ (plot B) taken from the optimal filter during a 15 µs run, with $k/T = 4$ and a state that was initialized to $|000\rangle$. The dashed black lines mark the occurrence of a bit-flip error at 7.5 µs on the third qubit, which takes the system to $|001\rangle$. The plotted data represent averages over the measurement noise taken from $10^5$ different runs. When the system is $|000\rangle$, plot A shows clearly that only $L^0_0$ (blue line) is significant, while plot B shows that both $L^1_0$ (blue line) and $L^1_1$ (orange line) are significant. After the bit-flip occurs the state is $|001\rangle$ and the situation is reversed, with $L^1_1$, $L^0_0$, and $L^0_1$ contributing significantly.
Returning to what is shown in Figure 5, Eq. (49) indicates that unless the noise magnitude is very small, the $L^{\ell_i}_{\ell_{i-1}}$ will not differ greatly from one another based on their $G^{\ell_i}_{\ell_{i-1}}$ values. Turning our attention to the transition probabilities, we know from the numerical tests in Sec. 6.2 that when $\mu T$ is on the order of $10^{-2}$ it becomes impossible to accurately track the state of the system at moderate run times, even for the optimal filter. We can therefore expand Eq. (40) for small $\mu T$ as
$$\log J_{\ell_{i-1}\ell_i} \approx d(\ell_i, \ell_{i-1})\log(\mu T) - 3\mu T + O\bigl((\mu T)^2\bigr) \tag{50}$$
and keep only the leading $\mu T$ term. For fixed $\mu T$, the values of $\log J_{\ell_{i-1}\ell_i}$ differ only in how many factors of $\log(\mu T)$ are included based on the Hamming distance $d(\ell_i, \ell_{i-1})$. Since $\log(10^{-2}) \approx -4.6$, the magnitude of a given $L^{\ell_i}_{\ell_{i-1}}$ decreases rapidly with every bit-flip that separates $\ell_i$ from $\ell_{i-1}$. Indeed, $\log(10^{-2})$ lies below the threshold identified in Eq. (47), so if two $L^{\ell_i}_{\ell_{i-1}}$ terms differ by this amount then only the larger of the two will be significant. Using Eq. (50) and the fact that the $G^{\ell_i}_{\ell_{i-1}}$ values do not differ greatly, it is possible to explain the broad patterns observed in Figure 5. Beginning with the contributions to $|000\rangle$, the magnitude of $L^0_0$ (blue line in plot 5A) will dominate, since the system begins in $|000\rangle$ (large prior) and has a high probability of remaining in $|000\rangle$ (large transition element). All other $L^0_{\ell_{i-1}}$ will involve at least one bit-flip and therefore will not be significant relative to $L^0_0$. By contrast, among the contributions to $|001\rangle$ (plot 5B), both $L^1_0$ and $L^1_1$ (blue and orange lines respectively) are significant, as each term has one large component (prior for $L^1_0$ and transition element for $L^1_1$) and one small component (transition element for $L^1_0$ and prior for $L^1_1$). After the bit-flip occurs the measurement likelihoods begin to favor $|001\rangle$ over $|000\rangle$, so the prior terms all shift toward $|001\rangle$ as well.

Single-term and two-term filters
The combined results of Sec. 5.2 and Sec. 5.3 offer a straightforward recipe for implementing the log-probability filter of Eq. (39). Each new measurement pair is used to compute the log-densities $\log P(M^{(1)}_i M^{(2)}_i | \ell_i \ell_{i-1})$, which are then added to the transition log-probabilities $\log J_{\ell_{i-1}\ell_i}$ and log-priors $\log \hat{P}(\ell_{i-1})$ to generate the $L^{\ell_i}_{\ell_{i-1}}$ terms. These terms must then be fed into Eq. (46), but we are free to choose how many of them to include based on their relative magnitudes. It is here that the algorithm splits into two different paths depending on whether we favor accuracy or simplicity.
If we desire a filter that is highly accurate, then we can take the two largest values of $L^{\ell_i}_{\ell_{i-1}}$ for each $\ell_i$ and substitute them into Eq. (46). In this scenario the LogSumExp reduces to
$$\log \hat{P}(\ell_i) \approx L^{\ell_i}_{*} + \log\bigl(1 + e^{\Delta L}\bigr), \tag{51}$$
where $\Delta L$ is the (negative) difference between the largest and second-largest $L^{\ell_i}_{\ell_{i-1}}$ terms. While Eq. (51) can be inconvenient to evaluate exactly, methods have been developed in the context of logarithmic number systems that utilize lookup tables and various interpolation schemes to yield efficient and accurate estimates [22]. By keeping the two largest values of $L^{\ell_i}_{\ell_{i-1}}$ we guarantee that the log-probabilities of the true state and its three nearest states are all correctly propagated into the next time step. The log-probabilities of the other four states will not be computed as accurately, although this is acceptable since these states are not involved in the single-flip transitions which the three-qubit code is designed to detect.
Although the logarithm term in Eq. (51) can be evaluated in a reasonable manner, it still represents an extra computational step that increases the overall complexity of the algorithm. If simplicity is valued over accuracy, then we can alternatively choose to keep only the largest of the $L^{\ell_i}_{\ell_{i-1}}$ after each time step. In this extreme case the LogSumExp collapses to
$$\log \hat{P}(\ell_i) \approx L^{\ell_i}_{*}, \tag{52}$$
and $L^{\ell_i}_{*}$ simply becomes the new posterior. Figure 5 shows that when $\ell_i$ equals the true state only one $L^{\ell_i}_{\ell_{i-1}}$ will be significant, so nothing important is being ignored there. For states adjacent to the true state, however, we will necessarily be ignoring one of the two significant contributions. While this single-term simplification appears quite restrictive, we show in Sec. 6 that Eq. (52) achieves high accuracy, though as expected it performs somewhat worse than the two-term approach.

Figure 6: Posterior log-probabilities from the optimal filter (plot A, Eq. (29)), two-term filter (plot B, Eq. (51)), and single-term filter (plot C, Eq. (52)), generated from the same parameters and measurement set used in Figure 5. The solid gray line indicates the time at which the filter correctly identifies that an error has occurred on the third qubit (dashed black line marks the time of the error). The optimal and two-term filters show almost identical behavior, while the single-term filter takes longer to detect the error.
In Figure 6 we plot the average evolution of the posterior log-probabilities in the presence of an error using outputs from the optimal filter versus those of the single-term and two-term filters. The plots were generated from the same data used to create Figure 5, and they clearly show that all three filters are able to detect the error on the third qubit. The single-term filter performs worst as expected, taking longer to identify the error. Impressively, the two-term filter behaves almost identically to the optimal filter, with small discrepancies emerging for $\ell_i \in \{3, 5, 6, 7\}$ since these states all have more than two significant LogSumExp terms.
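A sketch of one filter update combining the two-term rule of Eq. (51) and the single-term rule of Eq. (52); here `log_meas[l_prev, l]` stands for the quadratic log-densities of Eqs. (41-44) evaluated at the current measurement pair, and the array layout is an assumption of this sketch.

```python
import numpy as np

def log_filter_step(log_post, log_J, log_meas, two_term=True):
    """One update of Eq. (39) using Eq. (51) (two-term) or Eq. (52) (single-term).

    log_post : log-priors log P(l_prev), shape (8,)
    log_J    : transition log-probabilities from Eq. (40), indexed [l_prev, l]
    log_meas : measurement log-densities, indexed [l_prev, l]
    """
    new_log_post = np.empty(8)
    for l in range(8):
        L = log_J[:, l] + log_meas[:, l] + log_post   # the eight L terms
        second, largest = np.partition(L, -2)[-2:]    # two largest inputs
        if two_term:
            new_log_post[l] = largest + np.log1p(np.exp(second - largest))  # Eq. (51)
        else:
            new_log_post[l] = largest                 # Eq. (52)
    return new_log_post
```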

Normalization
As discussed in Sec. 4, one of the practical challenges of using Bayesian methods is the need for periodic normalization of the probability outputs, without which the values will grow or shrink rapidly. This behavior is shown in Figure 7, where the outputs of the linearized Wonham filter from Eq. (31) grow exponentially with time. Since exponential changes become linear on a logarithmic scale, the outputs of our logarithmic filters instead scale linearly with time. In situations where error tracking needs to be done for only a short duration, the linear growth of the outputs is likely tolerable and can simply be ignored. This represents a significant improvement over the linearized Wonham filter, which must incorporate a costly normalization routine to be viable over virtually any time scale. For longer runs, or in cases where the range of output values is limited due to hardware restrictions, we may wish to slow the growth of the output magnitudes even further. This can be achieved by calculating the average rate of change per time step and simply subtracting this quantity at the end of each step. The average change $\Delta$ is given by
$$\Delta = \log J_{\ell\ell} + G^{\ell}_{\ell}, \tag{53}$$
which is the sum of $\log J$ and $G$ when the true state of the system is $\ell$ (since this is the only significant contribution). The corrected outputs are represented by the orange line in Figure 7. Applying this correction greatly slows the growth of the unnormalized log-probabilities, which allows the filters to be run within a very narrow numerical range.
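Using `log_filter_step` from the sketch above, the drift correction is a single subtraction per step; `delta` is assumed to be precomputed from Eq. (53) for the experiment's parameters.

```python
# delta = log J[l, l] + G[l][l], precomputed once via Eq. (53)
def corrected_step(log_post, log_J, log_meas, delta, two_term=True):
    """Filter update followed by subtraction of the average per-step drift,
    keeping the unnormalized outputs inside a narrow numerical range."""
    return log_filter_step(log_post, log_J, log_meas, two_term) - delta
```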

Performance
To evaluate the performance of our single-term and two-term filters in an error-correction setting, we simulate a large number of trajectories (see Appendix A.5) and then record how accurately the filters are able to identify the final state. For our definition of "accuracy" we adopt a binary measure which is 1 when the filter predicts either the true state or a state that differs from the true state by a single bit-flip, and is 0 otherwise. At any given time there will be four states considered to be correct and another four states considered to be incorrect, which gives an expected accuracy of 50% when guessing randomly. We choose this measure of accuracy because errors on a single qubit can be corrected via simple majority vote, while errors on two or more qubits will signify a logical error which the filter was supposed to have prevented. In the following subsections, we present and discuss the performance of the filters under different run durations (Sec. 6.1), error rates $\mu$ (Sec. 6.2), and time steps $T$ (Sec. 6.3). In all simulations we set $k = 0.4$ µs, and the state was initialized to $|000\rangle$. For reference, the performances of our single-term and two-term filters are plotted alongside those of the optimal filter from Eq. (29) to assess the practical impact of the simplifying assumptions made in Sec. 5.4. We also include plots for the linearized Wonham filter from Eq. (31) and the double-threshold scheme from Atalaya and Zhang et al. [6] to see how well our filters perform relative to existing algorithms.
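The accuracy measure described above reduces to a Hamming-weight test on the compound state indices; a minimal sketch:

```python
def accuracy(predicted, true):
    """1 if `predicted` equals the true state or differs by one bit-flip (and is
    therefore correctable by majority vote), 0 otherwise."""
    return int(bin(predicted ^ true).count("1") <= 1)

assert accuracy(0b000, 0b000) == 1  # exact match
assert accuracy(0b100, 0b000) == 1  # one flip: still correctable
assert accuracy(0b110, 0b000) == 0  # two flips: logical error
```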

Experiment duration
Figure 8 shows the performance of our single-term and two-term filters alongside the reference algorithms as a function of run duration, with the longest runs lasting a full millisecond. The accuracy is a decreasing function of time, since the probability of a logical error will increase as the run grows longer. Mohseninia et al. [5] showed that the accuracy will decrease linearly with run duration at small error rates, a pattern that is clearly visible in our data across all filters.

As expected, the optimal filter achieves the highest accuracy, though it is almost exactly matched by the two-term filter. Despite its greater simplicity, the single-term filter performs within a percentage point of the optimal model. By contrast, the linear Wonham filter and double threshold algorithm both perform significantly worse, though for different reasons. The linearized Wonham filter is constructed explicitly from a first-order approximation, which is exact as $T \to 0$ but suboptimal for any finite step size. The double threshold scheme, by contrast, is simply suboptimal by design, as it does not directly incorporate the error probabilities or the Gaussian distribution of the noise into its filter.

Figure 8: The accuracy of the two-term filter is effectively equal to that of the optimal filter, such that the two curves completely overlap, while the single-term version performs slightly worse. The linear Wonham filter and double threshold performed comparably, though both did significantly worse than the logarithmic filters.

Error rate

Figure 9 shows the accuracy of the five filters as a function of the per-qubit bit-flip error rate $\mu$. On a log-log plot the dependence between the accuracy and error rate is linear for small error rates, but levels off at 0.5 as the error rate approaches 0.1 (µs)$^{-1}$. In this high-error regime the system experiences enough logical errors that all memory of the initial state is lost, and the filter is therefore unable to distinguish the true state from its complement. As before, the optimal and two-term filters perform almost identically, while the single-term algorithm is slightly less accurate. The negative impact of non-zero $T$ on the linear Wonham filter is especially obvious at small error rates, where the accuracies of the other filters all converge together while the Wonham filter performs significantly worse.

Time step
Lastly, Figure 10 shows the accuracy of the filters as a function of the measurement integration time $T$, which reveals several different trends. The time steps range from 1 ns to 1 µs, mapping to noise variances on the order of 100 to 0.1 respectively, which allows us to explore the behavior of these filters under both high-noise and low-noise conditions.

Figure 10: The single-term filter shows a significant improvement as the time step increases, while the accuracy of the linear Wonham filter decreases rapidly. The double threshold appears largely unaffected by the step size.
Focusing first on the double-threshold scheme, it is clear that the step size does not have a significant effect on its performance. This is unsurprising, as the thresholding procedure of [6] already includes an exponential moving average which smooths over the Gaussian noise for all step sizes. Indeed, since smaller values of T result in more measurements per time interval, the stronger noise is balanced out by the greater sample size of the average. This is the same trade-off described in Eq. (10) in the context of mean value estimation.
Looking next at the linear Wonham filter, Figure 10 provides a very clear demonstration of its dependence on T . At a time step of 1 ns the filter is effectively optimal, which is expected given that the differential Wonham filter of Eq. (30) is known to be exact. However, under more realistic conditions in which hardware latency and other factors generate longer time steps, the first-order nature of the filter starts to negatively impact its performance.
For the single-term and two-term filters, several different trends emerge. The single-term filter (green line) has a much poorer performance at small time steps than the two-term filter (red line), while at larger time steps their performances converge. This occurs because the transition probabilities are suppressed as T → 0, which means that almost all of the contributions to the LogSumExp are small. For the two-term filter this is not a significant problem, since taking both the largest and second-largest terms allows for the log-probability to accumulate over time. The outputs of the single-term filter cannot build up in this manner, so the log-probabilities are reduced far below their true values. As T increases this effect is diminished, so the performances of the two filters begin to converge. Figure 10 also shows that the two-term filter diverges from the optimal filter at large T, since the single-error Gaussian approximation made in Sec. 5.1 starts to break down when µT is large and kT is small.
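The effect of truncating the LogSumExp can be seen directly in a short sketch (our own illustration of the truncation alone, not of the full filter recursion):

    import numpy as np

    def logsumexp_full(terms):
        """Exact LogSumExp, stabilized by factoring out the maximum."""
        m = np.max(terms)
        return m + np.log(np.sum(np.exp(terms - m)))

    def logsumexp_two_term(terms):
        """Two-term version: keep the largest and second-largest terms."""
        second, largest = np.sort(terms)[-2:]
        return largest + np.log1p(np.exp(second - largest))

    def logsumexp_single_term(terms):
        """Single-term version: keep only the largest term."""
        return np.max(terms)

    # When several inputs are comparable, as happens for small T, the
    # truncated versions fall below the exact value, with the largest
    # shortfall for the single-term filter.
    close = np.array([-1.0, -1.1, -1.2])
    print(logsumexp_full(close))         # ~ +0.002
    print(logsumexp_two_term(close))     # ~ -0.356
    print(logsumexp_single_term(close))  # -1.0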

Discussion
The core objective of our work here was to devise a quantum error detection filter that is grounded in formal Bayesian analysis, yet also practical to implement on real-world quantum hardware. We have assumed that the continuous measurements are represented as a set of discrete signal averages taken over a time T, which then serve as inputs to the filter. This finite time step T cannot be avoided when working with actual hardware, since even if weak measurements could be made on arbitrarily short time scales there would still be latency associated with transferring and processing the data. As discussed in Sec. 2.2, the fact that T must be non-zero has important effects on the theoretical distribution of the measurement values, which become more significant as both T and the error rate µ increase. The filter derived from our Bayesian analysis in Sec. 3 can be understood as a discretized version of the Wonham filter for finite T, though our derivation did not originate from its stochastic differential equation. Much of the difficulty in that analysis stemmed from the fact that errors can occur within the integration period of the measurement, which causes the underlying syndrome means to be distributed across the entire [−1, 1] interval instead of being constrained to only ±1. Even with this additional degree of freedom, a closed-form solution for the marginal distribution of either syndrome can be derived by simply summing Eq. (32) over $0 \le e_k < \infty$. The truly challenging part of the analysis comes when the two syndrome distributions are coupled together by errors on the second qubit, which induces complex inter-dependencies between $\bar S^{(1)}_i$ and $\bar S^{(2)}_i$. One avenue for future work could lie in analyzing the form of these joint distributions, with the goal of determining whether a convenient or illuminating analytic form exists. In the absence of such a form, some type of simplification will always be necessary in order to utilize the filter.
Indeed, the single-error assumption that we made when deriving our log-probability filters in Sec. 5 was designed explicitly to avoid the mathematical challenges associated with the joint syndrome distributions. This assumption was the most restrictive that we made when developing the filters, and it becomes increasingly inaccurate as the average number of errors per step (µT) increases. While it is true that the error tracking problem as a whole becomes quite challenging when µT is large, the question of how a filter could best be designed for such a situation warrants further exploration. However, in the preferred regime where µT is small, the numerical results in Sec. 6 show that our single-term and two-term log-probability filters are highly effective. In every test that we conducted, these two filters outperformed both the first-order Wonham filter and double threshold algorithm by a significant margin, with the two-term filter being virtually optimal over a wide range of µ and T values. The single-term filter performs slightly worse, but is surprisingly effective given its simplicity. Furthermore, Figure 10 shows that the performance of the single-term filter increases as T grows larger, up to a value of about 10⁻¹ µs. This, combined with the fact that the optimal filter experiences only a negligible improvement in performance when moving to smaller T, suggests that effort spent on latency reduction is likely to provide diminishing returns for filter accuracy.
Given the rapidly growing size of modern quantum hardware, it is safe to assume that error correction will remain an integral component of quantum computation for the foreseeable future. The three-qubit toy model analyzed here is insufficient to protect against arbitrary errors, so an obvious extension of our work would focus on developing continuous error correction filters for larger codes, e.g., the Shor, Steane, or subsystem codes, which provide more comprehensive protection [23][24]. Our results suggest that Bayesian filters similar to those presented here would find great success on these more complex systems.
where we have now further factorized the syndrome density into the product of two new distributions.
Starting first with $P(\{x_j^{(k_j)}\} \mid e_1 e_2 e_3)$, we know that the location of an error and the qubit that it is assigned to are independent, so we have

$$P(\{x_j^{(k_j)}\} \mid e_1 e_2 e_3) = P(\{x_j\})\, P(\{k_j\} \mid e_1 e_2 e_3). \qquad (55)$$

Given that each qubit is equally likely to experience an error, $P(\{k_j\} \mid e_1 e_2 e_3)$ must be uniform over all valid assignments of $\{k_j\}$, where an assignment is valid if exactly $e_k$ errors are assigned to the $k$th qubit. Since the distribution must be normalized, each configuration will have a probability equal to one over the total number of valid configurations:

$$P(\{k_j\} \mid e_1 e_2 e_3) = \frac{e_1!\, e_2!\, e_3!}{N!}, \qquad N = e_1 + e_2 + e_3. \qquad (56)$$

Similarly, the errors themselves will be uniformly distributed within the interval, so the joint distribution of the ordered error locations $\{x_j\}$ is one over the integral across all positions, which depends only on the total number of errors:

$$P(\{x_j\}) = N!, \qquad 0 \le x_1 \le \cdots \le x_N \le 1. \qquad (57)$$

Turning now to $P(\bar S^{(1)}_i \bar S^{(2)}_i \mid \{x_j^{(k_j)}\})$, the location of the errors and their qubit assignments will completely determine the value of the syndromes, so we will have a product of Dirac delta distributions. Specifically, the average value over an interval depends on the gaps in time between successive errors, as well as the gap between the start of the interval and the first error and the gap between the end of the interval and the last error. As shown in Figure 1C, the syndrome value is given by the sum of these gaps, with the sign of each contribution alternating due to the parity flips caused by the errors. Therefore, the syndrome distribution has the form

$$P(\bar S^{(1)}_i \bar S^{(2)}_i \mid \{x_j^{(k_j)}\}) = \prod_{m=1}^{2} \delta\!\left(\bar S^{(m)}_i - s^{(m)}\Big[(-1)^{N^{(m)}} + 2\sum_{j=1}^{N^{(m)}} (-1)^{j+1} x_j^{(m)}\Big]\right), \qquad (58)$$

where $\{x_j^{(m)}\}$ is the set of errors which affect the $m$th parity operator, $N^{(m)}$ is the number of errors in this set, and $s^{(m)} = \pm 1$ is the syndrome value at the start of the interval. For the second interval in Figure 1C, the syndrome distributions would then be given by Eq. (59), where $x_1 = 0.7$ is the normalized position of the error on the second qubit. Using Eqs. (55-58), we can rewrite Eq. (54) as

$$P(\bar S^{(1)}_i \bar S^{(2)}_i \mid e_1 e_2 e_3) = e_1!\, e_2!\, e_3! \sum_{\{k_j\}} \int_0^1\! dx_1 \int_{x_1}^1\! dx_2 \cdots \int_{x_{N-1}}^1\! dx_N\; P(\bar S^{(1)}_i \bar S^{(2)}_i \mid \{x_j^{(k_j)}\}), \qquad (60)$$

which can be understood as a sum of volume integrals over the values of $\{x_j^{(k_j)}\}$ which satisfy the Dirac distributions from Eq. (58). In principle, all that remains is to solve for the appropriate integral boundaries and then evaluate the integrals, though this is difficult to do for an arbitrary distribution of errors. In Sec. A.1.2 we describe how Eq. (60) can be evaluated when errors occur on only a single qubit, while in Sec. A.1.3 we discuss the challenges of generalizing this to errors on multiple qubits, specifically when the second qubit is involved.
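The alternating-gap construction behind Eq. (58) can be checked with a short sketch (our own illustration, assuming an initial syndrome value s0 at the start of the interval):

    def syndrome_average(xs, s0=+1.0):
        """Time-averaged parity over a unit interval, given the ordered,
        normalized error locations xs and the initial syndrome s0."""
        edges = [0.0] + list(xs) + [1.0]
        total, sign = 0.0, s0
        for left, right in zip(edges[:-1], edges[1:]):
            total += sign * (right - left)   # gap weighted by current parity
            sign = -sign                     # each error flips the parity
        return total

    # Single error at x1 = 0.7, as in the Figure 1C example: the average
    # is s0 * (2 * 0.7 - 1) = +0.4 for s0 = +1.
    print(syndrome_average([0.7]))           # 0.4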

A.1.2 Distributions for single-qubit errors
If errors are constrained to occur on only a single qubit, then one of the syndrome means will behave in a trivial manner. For errors on the first qubit we will have $\bar S^{(2)}_i = \pm 1$, and for errors on the third qubit we will have $\bar S^{(1)}_i = \pm 1$. With errors on the second qubit both syndromes will change in an identical manner, so we can choose to model explicitly the behavior of $\bar S^{(1)}_i$ and then just set $\bar S^{(2)}_i = \bar S^{(1)}_i$. We begin by considering a scenario where the first qubit experiences $N$ errors, while $e_2 = e_3 = 0$. Under these conditions the set $\{x_j^{(2)}\}$ is empty, and thus there are no errors on the second or third qubits to flip the initial syndrome of the second parity operator. Indeed, the syndrome distribution of Eq. (60) factorizes completely, with the marginal distribution of $\bar S^{(1)}_i$ given by

$$P(\bar S^{(1)}_i \mid e_1 = N) = N! \int_0^1\! dx_1 \int_{x_1}^1\! dx_2 \cdots \int_{x_{N-1}}^1\! dx_N\; \delta\!\left(\bar S^{(1)}_i - s^{(1)}\Big[(-1)^{N} + 2\sum_{j=1}^{N} (-1)^{j+1} x_j\Big]\right). \qquad (61)$$

Note that the sum of Eq. (60) has collapsed into a single term, since there is only one way to assign $N$ errors to a single qubit. Given that $k_j = 1$ for all $j$, we have removed the superscripts from the error locations.
As mentioned previously, we can view Eq. (61) as a volume integral over the regions of the variable space which give the desired value of $\bar S^{(1)}_i$. To derive the integration bounds, we consider first the constraints placed on $x_1$. The first error will contribute $s(x_1 - 0) = s\,x_1$ to the value of the syndrome, where $s = \pm 1$ is the syndrome at the start of the interval, and this contribution cannot be altered by any subsequent errors. Therefore, the range of $\bar S^{(1)}_i$ after the first error is

$$s\,x_1 - (1 - x_1) \;\le\; \bar S^{(1)}_i \;\le\; s\,x_1 + (1 - x_1), \qquad (62)$$

since the rest of the interval can be entirely negative or entirely positive at the two extremes.
Writing the limits with respect to $x_1$ gives

$$x_1 \;\le\; B_1 = \frac{1 + s\,\bar S^{(1)}_i}{2}, \qquad (63)$$

which becomes the upper integration bound of the first integral. We can repeat this same process for each $x_j$, taking the sum of the contributions from errors up to $x_j$ and either adding or subtracting the remaining interval length to get the upper and lower bounds respectively on $\bar S^{(1)}_i$. These bounds are then rearranged to derive an upper bound on $x_j$, which has the generic form given in Eq. (64), such that the integration limits are $x_{j-1} \le x_j \le B_j$. These bounds obey the recursion relation

$$B_j = 1 + x_{j-1} - B_{j-1}, \qquad (65)$$

which is key to evaluating the integrals. Using Eq. (64) and Eq. (65), the volume integral in Eq. (61) can be easily solved. The integral over $x_N$ will collapse to $\frac{1}{2}$ due to the Dirac delta function, and contribute nothing further to the volume. Considering the next two integrals, we have Eq. (66), where we used the recursive expression $B_j - x_{j-1} = 1 - B_{j-1}$. The result of the first integral does not have any dependence on $x_{N-2}$ and therefore is unaffected by the second integral. Indeed, the generic action of the integrals is given by Eq. (67), which maps $B_j \to B_{j-2}$. By repeated application of Eq. (67), the probability density in Eq. (61) reduces to the closed form of Eq. (68), where $x /\!/ y$ denotes the floor division of $x$ by $y$.
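The single-qubit result can also be checked by direct sampling. The sketch below (our own; it assumes an initial syndrome of +1 and uses the closed form of the alternating-gap sum) reproduces the flat density of 1/2 for N = 1 that follows from the collapse of the x_N integral:

    import numpy as np

    rng = np.random.default_rng(1)

    def sample_syndrome_average(n_errors, n_samples=100_000):
        """Monte Carlo samples of the time-averaged syndrome for exactly
        n_errors errors placed uniformly in the interval."""
        xs = np.sort(rng.uniform(0.0, 1.0, size=(n_samples, n_errors)), axis=1)
        signs = (-1.0) ** np.arange(n_errors)          # +1, -1, +1, ...
        # Closed form of the gap construction:
        # S = (-1)^N + 2 * sum_j (-1)^(j+1) x_j  (initial syndrome +1).
        return (-1.0) ** n_errors + 2.0 * (xs * signs).sum(axis=1)

    s = sample_syndrome_average(1)
    hist, _ = np.histogram(s, bins=10, range=(-1, 1), density=True)
    print(np.round(hist, 2))                            # ~0.5 in every bin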

A.1.3 Challenges of multi-qubit errors
When errors occur on the second qubit and either the first or third qubits (or both), the evaluation of Eq. (54) is far more challenging. First, the sum over $\{k_j\}$ will not collapse to a single term, so we must evaluate multiple volume integrals. More importantly, the error locations assigned to the second qubit in each of those volume integrals will have to simultaneously satisfy constraints imposed by both $\bar S^{(1)}_i$ and $\bar S^{(2)}_i$, which means that the bounds $B_j$ on those errors will be given by whichever of the two single-syndrome bounds is more restrictive.

A.2 Implementing the optimal filter

From Eq. (29), the optimal Bayesian filter is a function of the measurement density $P(M^{(1)}_i M^{(2)}_i \mid \sigma_i \sigma_{i-1})$ and the transition elements $J_{\sigma_{i-1} \sigma_i}$, where $\sigma_i$ labels the error state of the system at step $i$. The matrix $J$ is easy to compute, but in order to calculate $P(M^{(1)}_i M^{(2)}_i \mid \sigma_i \sigma_{i-1})$ we must evaluate the integral in Eq. (21). Although the analytic solution is not available to us, we can approximate the value of the integral using the Riemann sum of Eq. (71), in which the domain $[-1, 1] \times [-1, 1]$ has been discretized into $4n^2$ evenly-spaced cells. The syndrome density is estimated by using Eq. (22) to factorize it into $P(\bar S^{(1)}_i \bar S^{(2)}_i \mid e_1 e_2 e_3)$ and the error probabilities, with the former approximated by histograms of $4n^2$ bins for values of $e_1$, $e_2$, and $e_3$ such that $e_1 + e_2 + e_3 \le N$. The value of $N$ is chosen so that the probability of experiencing more than $N$ errors in a given interval is below some threshold value. Since µ and T only impact $P(e_1 e_2 e_3 \mid \sigma_i \sigma_{i-1})$, these histograms can be reused for multiple error rates and integration times. To reconstruct $P(\bar S^{(1)}_i \bar S^{(2)}_i \mid \sigma_i \sigma_{i-1})$, we perform the sum in Eq. (22) using the histograms and the analytic error probabilities to generate a single combined histogram.
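A minimal sketch of this histogram construction is given below for a single syndrome and a single error-prone qubit (our own simplification for brevity; the actual implementation bins the joint (S̄(1), S̄(2)) domain, the histograms here are filled by Monte Carlo sampling, and the bin count and error cutoff are assumptions):

    import math
    import numpy as np

    rng = np.random.default_rng(2)
    bins, N_max = 50, 5          # histogram resolution and error cutoff

    def sample_S(e, n=100_000):
        """Syndrome averages for exactly e errors (initial syndrome +1)."""
        xs = np.sort(rng.uniform(0.0, 1.0, size=(n, e)), axis=1)
        signs = (-1.0) ** np.arange(e)
        return (-1.0) ** e + 2.0 * (xs * signs).sum(axis=1)

    # One histogram per error count, built once and reused for any mu, T.
    tables = [np.histogram(sample_S(e), bins=bins, range=(-1, 1),
                           density=True)[0] for e in range(N_max + 1)]

    def combined_density(mu, T):
        """P(S) = sum_e P(S | e) P(e), with analytic Poisson weights."""
        w = [math.exp(-mu * T) * (mu * T) ** e / math.factorial(e)
             for e in range(N_max + 1)]
        return sum(wi * hi for wi, hi in zip(w, tables))

    print(combined_density(mu=0.01, T=1.0)[-1])   # mass concentrated at S = +1

Because µ and T enter only through the Poisson weights, the sampled tables are computed once and reused across error rates and integration times, exactly as described above.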
The sum in Eq. (71) must be evaluated at every time step, which becomes a significant computational bottleneck for large n. To speed up the filter, we constructed a lookup table for the Gaussian distribution at discrete values across the measurement domain.
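Such a lookup table can be as simple as a precomputed grid of log-Gaussian values indexed by rounding the measurement (a sketch of our own; the grid range, spacing, and noise scale are assumptions):

    import numpy as np

    sigma = 1.0                            # assumed noise standard deviation
    grid = np.linspace(-5.0, 5.0, 2001)    # measurement grid
    step = grid[1] - grid[0]
    # The normalization constant is dropped, since it is shared by all
    # states and cancels out of the filter comparison.
    table = -0.5 * (grid / sigma) ** 2

    def log_gauss(x):
        """Nearest-neighbor table lookup in place of exp/log evaluations."""
        idx = np.rint((x - grid[0]) / step).astype(int)
        return table[np.clip(idx, 0, len(grid) - 1)]

    print(log_gauss(np.array([0.0, 1.0, -2.5])))   # [0.0, -0.5, -3.125]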

A.3 Convergence to the Wonham filter
The (unnormalized) Bayesian filter given in Eq. (29) can be shown to converge to the linear Wonham filter of Mohseninia et al. [5] as T → 0, and thus to converge to the Wonham filter after normalization. Up to the smallest order in T, the transition matrix elements are given by

$$J_{\sigma_{i-1} \sigma_i} = (\mu T)^{\,d(\sigma_{i-1},\, \sigma_i)} + O\!\left(T^{\,d+1}\right), \qquad (72)$$

where $d(\sigma_{i-1}, \sigma_i)$ is the Hamming distance between the three-bit representations of $\sigma_i$ and $\sigma_{i-1}$. Ignoring the normalization factor, the measurement density can be expanded up to its smallest order in T as Eq. (73), where $\tilde P$ denotes the syndrome density $P(\bar S^{(1)}_i \bar S^{(2)}_i \mid e_1 e_2 e_3)$ with $e_k = 1$ if there is a net flip on the $k$th qubit when moving from $\sigma_{i-1}$ to $\sigma_i$ and $e_k = 0$ otherwise. This is the syndrome density with the fewest errors that is still consistent with the state transition.
Substituting Eq. (72) and Eq. (73) into Eq. (29) gives the expression in Eq. (74), where $I$ is a matrix with elements given by

$$I_{\sigma_i \sigma_{i-1}} = \int\!\!\int P\big(M^{(1)}_i M^{(2)}_i \mid \bar S^{(1)}_i \bar S^{(2)}_i\big)\, \tilde P\big(\bar S^{(1)}_i \bar S^{(2)}_i \mid \sigma_i \sigma_{i-1}\big)\, d\bar S^{(1)}_i\, d\bar S^{(2)}_i. \qquad (75)$$

The sum in Eq. (74) can be simplified by keeping only those terms which are at most first-order in T. With $\mathcal{I} \equiv \{1, 2, 4\}$ containing the set of indices corresponding to bit-flips on the first, second, and third qubits respectively, we arrive at Eq. (76), which is equivalent to Eq. (31) after introducing Q and evaluating $I_{\sigma_i \sigma_{i-1}}$.
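The leading-order structure of the transition matrix can be sanity-checked numerically. The sketch below is our own (the diagonal element is completed to 1 − 3µT so that each row remains normalized to first order, an assumption consistent with the per-qubit Poisson error model):

    import numpy as np

    def transition_matrix(mu, T):
        """Leading-order elements (mu*T)^d between 3-bit states, where d
        is the Hamming distance; d = 0 carries the depletion 1 - 3*mu*T."""
        J = np.empty((8, 8))
        for a in range(8):
            for b in range(8):
                d = bin(a ^ b).count("1")
                J[a, b] = 1.0 - 3.0 * mu * T if d == 0 else (mu * T) ** d
        return J

    J = transition_matrix(mu=0.01, T=0.1)
    print(J.sum(axis=1))   # each row sums to 1 + O((mu*T)^2)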

A.4 Gaussian fit of single-error distribution
In Eq. (79)

A.5 Simulating trajectories
To generate synthetic data of duration nT, we must correctly evolve the state of the system across each of the n integration intervals (of length T), and then generate an appropriate measurement record. To perform this evolution we utilize the so-called "jump, no-jump" approach, where individual bit-flips are sampled at each time step and then applied to the system such that the state remains pure across the run. For particular values of µ and T the probability of experiencing $e_k$ bit-flip errors on the $k$th qubit is given by the Poisson distribution in Eq. (16), which can be easily sampled using standard mathematical libraries. After $e_1$, $e_2$, and $e_3$ have been determined for each of the n time steps, we apply the errors to our initial state |000⟩ in the proper order to evolve the system.
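A sketch of this sampling loop is given below (our own illustration; the measurement model is simplified to an ideal end-of-step syndrome plus Gaussian noise of variance c/T, with c an assumption consistent with Sec. 6.3, and intra-step syndrome averaging is omitted for brevity):

    import numpy as np

    rng = np.random.default_rng(3)

    def simulate_trajectory(mu, T, n, c=0.1):
        """Evolve |000> over n steps of length T and emit noisy syndromes."""
        state = 0                                  # one bit per qubit
        states, record = [], []
        for _ in range(n):
            e = rng.poisson(mu * T, size=3)        # error counts per qubit
            for k in range(3):
                if e[k] % 2 == 1:                  # odd count = net bit-flip
                    state ^= 1 << k
            states.append(state)
            # Ideal Z1Z2 and Z2Z3 syndromes of the end-of-step state.
            s1 = 1.0 if ((state ^ (state >> 1)) & 1) == 0 else -1.0
            s2 = 1.0 if (((state >> 1) ^ (state >> 2)) & 1) == 0 else -1.0
            noise = rng.normal(0.0, np.sqrt(c / T), size=2)
            record.append((s1 + noise[0], s2 + noise[1]))
        return states, record

    states, record = simulate_trajectory(mu=0.01, T=0.1, n=100)
    print(states[-1], record[-1])

Only the parity of each error count matters for the end-of-step state, since bit-flips on the same qubit commute; the full simulator additionally distributes the errors within each interval to compute the time-averaged syndromes, as described above.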