Error mitigation on a near-term quantum photonic device

We present two schemes to mitigate the effects of photon loss for a Gaussian Boson Sampling device, in particular to improve the estimation of the sampling probabilities. Instead of using error correction codes, which are expensive in terms of hardware resource overhead, our schemes require only minor hardware modifications or even none at all. Our loss-suppression techniques rely either on collecting additional measurement data or on classical post-processing once the measurement data are obtained. We show that with a moderate cost of classical post-processing, the effects of photon loss can be significantly suppressed for a certain amount of loss. The proposed schemes are thus a key enabler for applications of near-term photonic quantum devices.


I. INTRODUCTION
Errors are the main hindrance to large-scale quantum computation. Quantum error correction codes were introduced to correct errors and allow for fault-tolerant quantum computation. However, the conditions for fault-tolerant quantum computation are extremely stringent, requiring very low-error gates and a large number of physical qubits; e.g., about a thousand or more physical qubits are needed to construct a single fault-tolerant logical qubit [1]. Currently, state-of-the-art technologies allow for building quantum devices consisting of about fifty noisy physical qubits, which is very far from the number required for fault-tolerant computation. It is expected that in the near term, quantum devices with hundreds of noisy physical qubits will become available [2]. Efforts have been made to discover algorithms that are compatible with these noisy quantum devices and are capable of demonstrating the advantages of quantum computing. Examples include Random Circuit Sampling [3][4][5], IQP [6][7][8], Boson Sampling [9], Gaussian Boson Sampling (GBS) [10][11][12], the Variational Quantum Eigensolver [13], and the Quantum Approximate Optimization Algorithm [14,15].
To improve the performance of these near-term quantum devices, several error mitigation techniques have been developed to suppress the noise. One promising error mitigation technique exploits extrapolation to approximately estimate the expectation values of observables of a noise-free circuit from the measured expectation values of noisy quantum circuits with various error rates [16,17]. A proof-of-principle experiment has been performed on a superconducting device, significantly improving the accuracy of a variational eigensolver [18]. Another error mitigation technique capable of improving the estimation of expectation values is the quasi-probability decomposition technique, in which a noise-free circuit is simulated by a collection of randomly selected noisy circuits following a particular quasi-probability distribution [16,19]. This technique has also been tested on various experimental platforms [20,21]. Instead of estimating expectation values, there are error mitigation techniques that can partially recover the error-free quantum states, though more resources are required. One example is to measure the symmetries of the quantum system and project it back into the error-free subspace [22][23][24][25]. Another method is to employ variational quantum algorithms to mitigate errors in state preparation circuits [26]. Moreover, the inherent noise resilience of quantum algorithms can lead to error mitigation [27][28][29]. Other error mitigation techniques have been developed to correct measurement errors [30][31][32], to mitigate decoherence [32], and for analog quantum simulation [33]. Furthermore, learning-based algorithms have been designed to mitigate errors [34][35][36].
All previously mentioned error mitigation techniques are specifically tailored for quantum systems with a finite Hilbert space dimension. It is an open question whether these methods are applicable to computing models based on infinite-dimensional systems. Here we focus on the GBS device as a promising near-term photonic infinite-dimensional platform. It is believed that GBS can be used to demonstrate the computational advantage of quantum computers over classical computers. Various algorithms based on the GBS device have also been proposed, for example, sampling dense subgraphs [37], distinguishing non-isomorphic graphs [38], and quantifying the similarity of graphs [39,40]. These applications are sensitive to the imperfections of the GBS device, among which photon loss is the main source of error in a photonic implementation. Schemes to mitigate the effect of photon loss are thus paramount for the applications of GBS devices. However, no error mitigation scheme for photonic systems has been developed to date.
In this work, we propose two schemes to mitigate the effect of photon loss in a GBS device. The first scheme exploits the Richardson extrapolation technique and is tailored to extrapolate the probability of a photon number pattern, or a collection of patterns, of a loss-free GBS circuit. To perform the extrapolation, one has to vary the loss of the circuit and measure the probabilities of click patterns for every loss value. This requires only a small modification of the circuit, but one needs to perform several experiments, depending on the number of chosen loss values. The second scheme estimates the probability of a click pattern of a loss-free circuit by linearly combining the probabilities of click patterns with higher total photon numbers in the presence of loss. This requires no modification of the circuit, since one does not need to vary the loss value as in the first scheme. Classical post-processing is required to compute the linear combination coefficients.
This paper is organized as follows. In Sec. II we briefly introduce the GBS device and some of its applications. In Sec. III we discuss the error mitigation scheme based on the Richardson extrapolation technique. We detail a standard version and an improved version of the extrapolation technique, with the latter having better performance for large photon loss. We discuss the second scheme, the loss cancellation method, in Sec. IV. In Sec. V, we test our error mitigation schemes in an eight-mode circuit and compare their performances. Finally, we summarize our discussion in Sec. VI, followed by the Appendix.

II. OVERVIEW OF GBS DEVICE
A GBS device consists of three parts: the input state, a linear passive interferometer, and photon detectors. The input is usually chosen as a product of pure single-mode Gaussian states, each of which is fully characterized by its displacement and squeezing. The linear passive interferometer applies a unitary transformation to the input state and produces a multimode Gaussian state at the output. In each output mode a photon detector counts the number of photons.
Consider a GBS device with M input/output modes. The output multimode Gaussian state is characterized by a 2M-component displacement vector d and a 2M × 2M covariance matrix σ [41]. Assume that n = [n_1, n_2, . . . , n_M] represents a certain measurement pattern of photons (a click pattern), where n_j is the detected photon number in the j-th mode. In the case of no displacement, namely d = 0, the probability of detecting a click pattern n is given by [10]

P(n) = Haf(A_n) / (n! √det σ_Q),   (1)

where n! = n_1! n_2! · · · n_M!, σ_Q = σ + I_{2M}/2, Haf(·) denotes the hafnian, A_n is the submatrix of A obtained by repeating its rows and columns according to the pattern n, and the matrix A is given by

A = X_{2M} (I_{2M} − σ_Q^{−1}),  X_{2M} = [[0, I_M], [I_M, 0]].   (2)

(For nonzero displacement, the displacement enters through the vector α_v = (α_1, · · · , α_M, α_1^*, · · · , α_M^*); see Appendix A.) The expression for the probability in Eq. (1) is valid for both pure and mixed Gaussian states. It is believed that a GBS device with a sufficiently large number of modes and input photons can generate a photon number probability distribution that is hard to sample from using a classical computer [10]. The GBS device can also be used to solve graph-related problems by mapping the adjacency matrix of a graph to the covariance matrix of the GBS output state [42]. The properties of the graph are thus encoded in the photon number probability distribution. Such applications include sampling dense subgraphs [37], distinguishing non-isomorphic graphs [38,40], quantifying the similarity of two graphs [39], and other potential graph-related applications yet to be discovered.
A realistic GBS device is noisy, suffering from experimental imperfections such as photon loss, which limits its computational power [43]. Given that fault-tolerant error correction will not be available in the near future, it is paramount to develop error mitigation techniques that require a minimal amount of hardware modification to suppress the effects of photon loss and improve the performance of the GBS device. In the following sections, we focus on algorithms that involve estimating the sampling probabilities of a click pattern or a collection of click patterns, and propose two schemes to improve the estimation of these probabilities for a loss-free GBS device.

III. SUPPRESSING PHOTON LOSS VIA EXTRAPOLATION
The first method of suppressing errors in GBS is inspired by the extrapolation procedure introduced in Ref. [16]. Similar to Ref. [16], our method relies on performing multiple measurements and extrapolating to obtain the desired outcome. However, while Ref. [16] focuses on decreasing an overall multiplicative factor in the system Hamiltonian, thus leading to an effectively higher error rate, we exploit the possibility of directly changing the error rate in photonic systems. Another novelty of our work is an improved method that provides enhanced performance for GBS probabilities by removing the poles of the extrapolation function before extrapolating, as described in Sec. III D.

A. General formalism
Consider a quantum circuit with an ideal input state and gates; the density operator of the output state before detection is ρ̂_0. The expectation value of an observable Ô is O_0 = tr(Ô ρ̂_0), where "tr" denotes the trace. If the quantum circuit is imperfect, for example, if a gate has an error rate characterized by a small parameter ε, then the output density operator becomes ρ̂_ε and the expectation value becomes O(ε) = tr(Ô ρ̂_ε). Both ρ̂_ε and O(ε) deviate from ρ̂_0 and O_0, respectively, but approach them in the limit ε → 0. We can thus perform a series expansion of ρ̂_ε as

ρ̂_ε = ρ̂_0 + Σ_{k=1}^∞ ε^k ρ̂_k,

and similarly O(ε) = O_0 + Σ_{k=1}^∞ ε^k O_k, where the ρ̂_k are operators and the O_k are real numbers. Suppose one can vary the error rate in a controllable way, in particular, increase the error rate on purpose. Denote the error rates by ε_j = c_j ε, with j = 0, 1, 2, · · · , m, where we choose c_0 = 1 and c_j > 1 for j ≠ 0. For each error rate ε_j, one performs the experiment and measures the expectation value O(ε_j). By linearly combining the (m + 1) measured expectation values, one arrives at an estimate of the expectation value O_0,

Õ = Σ_{j=0}^m γ_j O(ε_j).

The coefficients γ_j are chosen such that the O_k for k = 1, 2, · · · , m are cancelled, giving a better estimate of the error-free expectation value when ε is small. Taking this requirement into account, we find that the γ_j satisfy a linear system of equations,

Σ_{j=0}^m γ_j = 1,  Σ_{j=0}^m γ_j c_j^k = 0,  k = 1, 2, · · · , m,   (7)

whose solution is

γ_j = Π_{k≠j} c_k / (c_k − c_j).   (8)

This extrapolation technique has been applied to qubit systems.
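The closed-form coefficients can be checked numerically. A minimal Python sketch (the function name `richardson_coefficients` is ours) solves the linear system via the product formula:

```python
from math import prod

def richardson_coefficients(c):
    """Coefficients gamma_j satisfying sum_j g_j = 1 and
    sum_j g_j c_j^k = 0 for k = 1..m; the closed-form solution of this
    Vandermonde-type system is g_j = prod_{k != j} c_k / (c_k - c_j)."""
    return [
        prod(c[k] / (c[k] - c[j]) for k in range(len(c)) if k != j)
        for j in range(len(c))
    ]

# Example: five error-scaling factors (m = 4)
c = [1.0, 1.2, 1.4, 1.6, 1.8]
gamma = richardson_coefficients(c)

# Verify the defining linear system numerically
assert abs(sum(gamma) - 1) < 1e-9
for k in range(1, len(c)):
    assert abs(sum(g * cj**k for g, cj in zip(gamma, c))) < 1e-6
```

For this choice of c the coefficients come out as γ = (126, −420, 540, −315, 70), the values used in the TMSV example below.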

B. Extrapolate sampling probability
We now apply the above technique to suppress the photon loss of a GBS device and obtain a better estimate of the sampling probability. To illustrate the method, we first consider the uniform-loss case, namely, the photon loss in each mode is the same and is characterized by a single parameter ε. This is a good approximation to a realistic linear interferometer if it is implemented using the Clements decomposition [44].
Assume that the covariance matrix of the output state without photon loss is σ_0 and that with photon loss is σ_ε. They are related via the action of a lossy bosonic channel,

σ_ε = (1 − ε) σ_0 + (ε/2) I_{2M},

where I_{2M} is the identity matrix. We thus have

σ_Q(ε) = σ_ε + I_{2M}/2 = (1 − ε) σ_Q(0) + ε I_{2M},   (10)

and

σ_Q(ε)^{−1} = Σ_{k=0}^∞ ε^k (X_{2M} A_0)^k (I_{2M} − X_{2M} A_0).   (11)

By substituting this into Eq. (2), we obtain a series expansion of the A matrix,

A(ε) = A_0 + Σ_{k=1}^∞ ε^k A_k,

where A_0 is the matrix corresponding to the loss-free state and the coefficients A_k are given by

A_k = A_0 (X_{2M} A_0)^{k−1} (X_{2M} A_0 − I_{2M}).

The probability, P(n; ε), of a click pattern n with photon loss can be obtained by replacing the matrices A and σ_Q in Eq. (1) by A(ε) and σ_Q(ε), respectively. Since both A(ε) and σ_Q(ε) have series expansions in ε, we can find a series expansion for P(n; ε),

P(n; ε) = P_0(n) + Σ_{k=1}^∞ ε^k P_k(n),

where P_0(n) is the sampling probability without photon loss and is the quantity we want to estimate; the P_k(n) are complicated expressions whose explicit forms are not relevant here. Now we apply the general formalism developed in Sec. III A to derive a better estimate of P_0(n). Assume that one can choose different loss values ε_j and estimate the corresponding probabilities P(n; ε_j) in the experiment. As before, let ε_j = c_j ε, with j = 0, 1, 2, · · · , m, where c_0 = 1 and c_j > 1 for j ≠ 0. Then we obtain a better estimate of the probability P_0(n),

P̃(n; ε) = Σ_{j=0}^m γ_j P(n; ε_j) ≈ P_0(n),   (14)

for γ_j as in Eq. (8) and for small values of ε. Different values of loss are indeed possible in the experiment: specifically, loss can be programmably increased using M additional beam splitters, each with one output port discarded. A potential challenge is that the detection probability of the needed click patterns decreases, and thus a large amount of data must be collected.

C. A two-mode squeezed vacuum state example
To showcase the extrapolation technique, we consider a simple example: mitigating photon loss in a two-mode squeezed vacuum (TMSV) state. A TMSV state is defined as

|TMSV⟩ = √(1 − χ²) Σ_{n=0}^∞ χ^n |n, n⟩,   (16)

where χ = tanh r and r is the squeezing parameter [45]. When the TMSV state is detected by two photon-number-resolving (PNR) detectors, the only possible detected photon number patterns are n = [n, n], with probability

P_0(n, n) = (1 − χ²) χ^{2n} = χ^{2n} / cosh² r.

The measurement probability P_0(n, n) for r = 1.0 is plotted in Fig. 1 (red circles). Now a pure lossy channel with transmissivity η = 1 − ε is added to each output mode of the TMSV state, resulting in a mixed two-mode Gaussian state. The photon number distribution is modified accordingly, which we denote as P(n, m; ε). We plot P(n, n; ε) for r = 1.0 and ε = 0.1 in Fig. 1 (blue squares); it deviates substantially from the loss-free case. Now we apply the extrapolation technique to mitigate the photon loss. We choose m = 4 and c = (1.0, 1.2, 1.4, 1.6, 1.8). From Eq. (8), we find γ = (126, −420, 540, −315, 70). The new approximation to the ideal case can be evaluated directly from Eq. (14). We plot P̃(n, n; ε) for r = 1.0 and ε = 0.1 in Fig. 1 (black rhombi). We see that P̃(n, n; ε) is much closer to P_0(n, n) than P(n, n; ε), showing that the extrapolation technique significantly mitigates the effect of photon loss. After the error mitigation, the effective photon loss is approximately ε ≈ 0.01.
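This example can be reproduced numerically. The sketch below uses a closed-form expression for the lossy probability of the [1, 1] pattern, which follows from attenuating each photon of the TMSV ladder independently (our derivation; the function names are ours):

```python
from math import prod, tanh, cosh

def p11_lossy(r, eps):
    """P([1,1]; eps) for a TMSV state with equal loss eps in both arms.

    Closed form obtained by summing the binomial loss model over the
    TMSV photon-number ladder (our derivation)."""
    chi2 = tanh(r) ** 2
    x = chi2 * eps**2
    return (1 - eps) ** 2 * chi2 * (1 + x) / (cosh(r) ** 2 * (1 - x) ** 3)

def richardson_coefficients(c):
    return [prod(c[k] / (c[k] - c[j]) for k in range(len(c)) if k != j)
            for j in range(len(c))]

r, eps = 1.0, 0.1
c = [1.0, 1.2, 1.4, 1.6, 1.8]
gamma = richardson_coefficients(c)

p_ideal = tanh(r) ** 2 / cosh(r) ** 2          # P0([1,1]) = chi^2 / cosh^2 r
p_lossy = p11_lossy(r, eps)                    # the raw, lossy value
p_extra = sum(g * p11_lossy(r, cj * eps) for g, cj in zip(gamma, c))

# The extrapolated value removes most of the loss-induced bias
assert abs(p_extra - p_ideal) < abs(p_lossy - p_ideal) / 10
```

For these parameters the raw value is off by about 0.04 while the extrapolated value agrees with P_0([1, 1]) ≈ 0.2436 to a few parts in 10^−4, consistent with an effective loss of order 0.01.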

D. Improved extrapolation of sampling probability
We have shown that the extrapolation technique works very well for low photon loss. However, when the amount of photon loss and input squeezing increase, the extrapolated result becomes less and less accurate. We find that by slightly modifying the previous extrapolation procedure, one can obtain a better estimation of the probability for large squeezing and relatively high photon loss.
By using the expression for σ_Q(ε) in Eq. (10), one can rewrite the A matrix with photon loss as

A(ε) = X_{2M} [I_{2M} − σ_Q(ε)^{−1}].

There exists a unitary matrix U that diagonalizes the covariance matrix σ_0,

σ_0 = (1/2) U diag(e^{2r_1}, · · · , e^{2r_M}, e^{−2r_1}, · · · , e^{−2r_M}) U†,

where we assume σ_0 is a pure-state covariance matrix and the r_k are the squeezing parameters of the input squeezed vacuum states. Here U represents the transformation of the linear interferometer and is independent of the input squeezing and the photon loss. Then A(ε) can be rewritten as

A(ε) = X_{2M} R(ε, r) / P(ε, r̃),

where r = (r_1, r_2, · · · , r_M) and r̃ = (r̃_1, r̃_2, · · · , r̃_{N_λ}), with {r̃_j}_{j∈{1,...,N_λ}} denoting the set of distinct nonzero squeezing parameters among {r_k}_{k∈{1,...,M}}, and we defined

P(ε, r̃) = Π_{j=1}^{N_λ} (1 − ε² tanh² r̃_j).

It is evident that P(ε, r̃) is a polynomial in ε of order 2N_λ. Since X_{2M} is a constant matrix and S (representing the linear interferometer) is independent of the photon loss and the input squeezing r_j, every entry of the matrix R(ε, r) is a polynomial in ε of order at most 2N_λ (when at least one of the r_j is zero) or 2N_λ − 1 (when all r_j are nonzero). From Eq. (11) we find

√(det σ_Q(ε)) = Q(ε, r) Π_{k=1}^M cosh r_k,  Q(ε, r) ≡ Π_{k=1}^M √(1 − ε² tanh² r_k).

By substituting A(ε) and σ_Q(ε) into Eq. (1), we find that the probability of measuring a click pattern n in the presence of photon loss can be written as

P(n; ε) = (1 − ε)^N P(ε) / [Q(ε, r) P^N(ε, r̃)],   (22)

where N = Σ_{j=1}^M n_j is the total detected photon number and P(ε) is a polynomial in ε of order at most 2N N_λ (when at least one of the r_j is zero) or N(2N_λ − 1) (when all r_j are nonzero).
Notice that Q(ε, r) → 0 and P(ε, r̃) → 0 when ε → 1 and r_k → ∞; namely, both Q(ε, r) and P(ε, r̃) approach zero in the large-squeezing, high-loss regime. Therefore, Q(ε, r) and P(ε, r̃) contribute the "singular" part of the probability expression P(n; ε), and the corresponding poles are given by ε = coth r_k > 1. The presence of these poles limits the accuracy of the first extrapolation procedure. To obtain better extrapolation accuracy, we can remove these poles by multiplying P(n; ε) with Q(ε, r) P^N(ε, r̃), and then perform the linear combination. That is to say, the new estimate of the probability without photon loss is

P̃(n; ε) = Σ_{j=0}^m γ_j Q(ε_j, r) P^N(ε_j, r̃) P(n; ε_j).   (23)

Notice that Q(ε, r) → 1 and P(ε, r̃) → 1 when ε → 0, so the estimate P̃(n; ε) approaches P_0(n) in the limit ε → 0. To showcase the performance of the improved extrapolation technique, we consider mitigating the photon loss of a TMSV state as in Sec. III C. Table I compares the results of the extrapolation and the improved extrapolation for low photon loss, ε = 0.2, and relatively high photon loss, ε = 0.5. We see that for low photon loss, both give very good approximations to the exact sampling probabilities, while for high photon loss, the improved extrapolation gives better results for click patterns with low total photon number. In particular, the probabilities for the click patterns [0, 0] and [1, 1] can always be extrapolated exactly by the improved procedure. This can be understood as follows. After removing the poles, the right-hand side of Eq. (22) is simply a polynomial, so in principle the probability P_0(n) can be extrapolated exactly if sufficiently many loss values are chosen, namely, if m is at least the order of the polynomial. For the TMSV example, the two input squeezing parameters are the same, so N_λ = 1. Taking into account the factor (1 − ε)^N, the right-hand side of Eq. (22) is a polynomial of order 2N after removing the poles.
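For the TMSV example the pole removal can be made fully explicit. For the [1, 1] pattern (N = 2, N_λ = 1) the pole-removing factor works out to (1 − ε² tanh² r)³ (our evaluation), after which the probability is a degree-4 polynomial in ε, so m = 4 loss values extrapolate it exactly up to floating-point round-off. A sketch, reusing the closed-form lossy probability from the previous example:

```python
from math import prod, tanh, cosh

def p11_lossy(r, eps):
    """P([1,1]; eps) for a TMSV with equal loss eps in both arms (closed form)."""
    chi2 = tanh(r) ** 2
    x = chi2 * eps**2
    return (1 - eps) ** 2 * chi2 * (1 + x) / (cosh(r) ** 2 * (1 - x) ** 3)

def richardson_coefficients(c):
    return [prod(c[k] / (c[k] - c[j]) for k in range(len(c)) if k != j)
            for j in range(len(c))]

r, eps = 1.0, 0.5            # relatively high photon loss
c = [1.0, 1.2, 1.4, 1.6, 1.8]
gamma = richardson_coefficients(c)
chi2 = tanh(r) ** 2

p_ideal = chi2 / cosh(r) ** 2

# Standard extrapolation: limited accuracy at eps = 0.5
p_std = sum(g * p11_lossy(r, cj * eps) for g, cj in zip(gamma, c))

# Improved extrapolation: remove the poles at eps = coth(r) first.
# For [1,1] the singular factor is (1 - eps^2 tanh^2 r)^3.
p_imp = sum(g * (1 - (cj * eps) ** 2 * chi2) ** 3 * p11_lossy(r, cj * eps)
            for g, cj in zip(gamma, c))

assert abs(p_imp - p_ideal) < 1e-10          # exact up to round-off
assert abs(p_imp - p_ideal) < abs(p_std - p_ideal)
```

After pole removal the combined quantity is a polynomial of order 2N = 4, which is why five loss values (m = 4) suffice for an exact result.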
Therefore, by taking m = 4, one can exactly extrapolate probabilities of click patterns with total photon number less than three.
From the experimental perspective, removing the poles is only possible when one can fully control the photon loss and knows the input squeezing parameters accurately. This requires a good calibration of the GBS device beforehand.

E. Extrapolation precision analysis
We have shown that the extrapolation technique works quite well with only a few loss values, e.g., m = 4. Better results can be obtained by increasing the number of loss values, namely, by increasing the number of experiments. With the improved extrapolation technique, one can in principle extrapolate the exact value of the sampling probability by increasing the number of loss values. However, we show that this is challenging in practice. Specifically, the required measurement accuracy increases exponentially with the number of loss values, resulting in an exponential increase of the running time of the experiment.
When estimating quantities such as the sampling probability, there is always an uncertainty due to the finite number of samples or to experimental imperfections. We now consider how statistical uncertainty and experimental imperfections affect the extrapolation results. From Eq. (14) we can see that the uncertainty in P(n; c_j ε) results in an uncertainty in P̃(n; ε). Assume that P̂(n; c_j ε) is an estimator of the sampling probability of click pattern n with photon loss c_j ε, and that P(n; c_j ε) is its mean value; then

P̂(n; c_j ε) = P(n; c_j ε)(1 + X_j),   (24)

where X_j is a random variable with zero mean whose variance characterizes the relative uncertainty of the measured probability. Therefore, the relative fluctuation of the extrapolated probability P̃(n; ε) is

Y(n; ε) ≡ [P̂̃(n; ε) − P̃(n; ε)] / P̃(n; ε) = Σ_{j=0}^m γ_j P(n; c_j ε) X_j / Σ_{j=0}^m γ_j P(n; c_j ε),

where P̂̃(n; ε) is obtained by replacing P(n; c_j ε) in Eq. (14) by P̂(n; c_j ε). It is straightforward to show that the variance of Y(n; ε) is

Var(Y) = Σ_{j=0}^m γ_j² P²(n; c_j ε) V_j / [Σ_{j=0}^m γ_j P(n; c_j ε)]²,

where V_j is the variance of X_j and we have assumed that the X_j are independent random variables. Denote the minimum of {V_j}_{j=0}^m as V_min and the minimum nonzero P(n; c_j ε) as P_min, and the corresponding maxima as V_max and P_max. Then we can derive a lower bound and an upper bound for Var(Y),

C_1 Γ² V_min ≤ Var(Y) ≤ C_2 Γ² V_max,   (27)

where Γ² = Σ_{j=0}^m γ_j², and C_1 and C_2 are approximately constant because Σ_{j=0}^m γ_j P(n; c_j ε) approaches P_0(n) and P_min (P_max) becomes constant for large values of m. Moreover, Γ² increases exponentially as the number of loss values m increases, which follows from Eq. (8). Therefore, Var(Y) increases exponentially if V_min is fixed, which implies that it becomes exponentially hard to estimate the loss-free sampling probability.
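The exponential growth of Γ² = Σ_j γ_j² can be checked directly from the closed-form coefficients. A sketch (the equal spacing c_j = 1 + 0.2 j is our choice of scaling factors):

```python
from math import prod

def richardson_coefficients(c):
    return [prod(c[k] / (c[k] - c[j]) for k in range(len(c)) if k != j)
            for j in range(len(c))]

gamma2 = []
for m in range(1, 9):
    c = [1 + 0.2 * j for j in range(m + 1)]
    g = richardson_coefficients(c)
    gamma2.append(sum(x * x for x in g))

# Gamma^2 grows roughly exponentially with m, so the number of samples
# needed to keep the extrapolated variance fixed grows accordingly.
assert all(b > 3 * a for a, b in zip(gamma2, gamma2[1:]))
```

For this spacing Γ² already exceeds 10^5 at m = 4, so each additional loss value multiplies the required sample size by an order of magnitude.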
In practice, the strategy is to choose a number of loss values such that a sufficiently good approximation to P_0(n) is obtained while it remains tractable to measure the probabilities for the various photon losses. As an example, we discuss the effect of the uncertainty in estimating the probability of a certain click pattern. Estimating a probability can be achieved by sampling a process many times and counting the number of success events. Let N_succ and N be the number of successes and the total number of trials, respectively. The probability estimator p̂ = N_succ/N is a random variable with variance

Var(p̂) = p(1 − p)/N,

where p is the mean value of p̂. The variance of the relative error of p̂ is

Var(p̂/p) = (1 − p)/(pN).

If this variance is treated as V_max, then the required total number of trials is

N = (1 − p)/(p V_max).

This gives an estimate of the total number of trials required to achieve a certain accuracy of the extrapolated probability, and also of the running time required to collect the data if the sampling rate is known.
For the TMSV example, we consider the effect of the probability measurement uncertainty and of fluctuations of the loss values; the results are shown in Figs. 2 and 3, respectively. The probability we want to extrapolate is P_0([1, 1]) = 0.2436. We choose the photon loss ε = 0.3 and c = (1.0, 1.3, 1.6, 1.9, 2.2), and obtain the extrapolated result P̃([1, 1]) = 0.2367 assuming no fluctuations. In the first case, the relative probability measurement uncertainty, as defined in Eq. (24), comes from the finite number of trials. Here we collect 500 samples of size N = 10^5. We can see from Fig. 2 that the extrapolated probability deviates from P̃([1, 1]) and the variance is so large that one sometimes obtains negative extrapolated probabilities. However, the peak of the distribution is still around 0.2; the mean value is 0.2373, close to P̃([1, 1]), and the standard deviation is 0.2531. The effect of the loss fluctuation is similar; see Fig. 3. We assume the loss fluctuation follows a normal distribution with standard deviation 0.01. The extrapolated probability again follows a Gaussian-like distribution with a large variance, and one sometimes obtains negative values. However, the peak of the distribution is still around 0.2; the mean value is 0.2112, close to P̃([1, 1]), and the standard deviation is 0.2873. This example shows that the extrapolation amplifies the variance, as indicated by Eq. (27). Nevertheless, it is still practical to obtain good results by choosing an appropriate number of loss values.
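The variance amplification can be reproduced numerically. The sketch below mimics the finite-sample uncertainty with a Gaussian approximation to the binomial counting noise (our simplification; the experiment samples actual counts), using the same closed-form lossy TMSV probability as in the earlier examples:

```python
import random
from math import prod, sqrt, tanh, cosh

def p11_lossy(r, eps):
    chi2 = tanh(r) ** 2
    x = chi2 * eps**2
    return (1 - eps) ** 2 * chi2 * (1 + x) / (cosh(r) ** 2 * (1 - x) ** 3)

def richardson_coefficients(c):
    return [prod(c[k] / (c[k] - c[j]) for k in range(len(c)) if k != j)
            for j in range(len(c))]

rng = random.Random(1)
r, eps, n_shots = 1.0, 0.3, 10**5
c = [1.0, 1.3, 1.6, 1.9, 2.2]
gamma = richardson_coefficients(c)
p_true = [p11_lossy(r, cj * eps) for cj in c]

samples = []
for _ in range(500):
    # Gaussian approximation to binomial noise: sd = sqrt(p(1-p)/N)
    p_hat = [rng.gauss(p, sqrt(p * (1 - p) / n_shots)) for p in p_true]
    samples.append(sum(g * p for g, p in zip(gamma, p_hat)))

mean = sum(samples) / len(samples)
sd = sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))

# The noiseless extrapolated value is about 0.237; the gamma coefficients
# amplify the per-probability noise to a spread of roughly 0.25.
assert abs(mean - 0.237) < 0.05
assert 0.15 < sd < 0.35
```

The mean stays close to the noiseless extrapolated value while the standard deviation is of order 0.25, in line with the figures quoted above.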

F. Nonzero displacement and nonuniform loss
We have shown that the extrapolation technique works for a GBS device with squeezed vacuum states as inputs. More general input states include displacements, for example in the algorithm that simulates molecular vibronic spectra on a GBS device [46]. Here we show that in the presence of displacements the probability can still be expanded as a series in the loss ε, so the extrapolation technique still works. We further show that the "singular" part (the poles) of the probability expression is independent of the displacements, so the improved extrapolation technique also applies. For details see Appendix A.
We have studied the simplified case where the photon loss in each mode is the same, which is a good approximation if the linear interferometer is implemented using the Clements decomposition [44]. However, in a realistic implementation the photon loss is not uniform, and the situation becomes even worse if the interferometer is implemented using the Reck decomposition [47]. We now propose an extrapolation technique for a GBS device with nonuniform loss.
A universal M-mode linear interferometer consists of N = M(M − 1)/2 beam splitters. Assume that each beam splitter is lossy and is characterized by two lossy channels. We order the N beam splitters, label them by integers k, and denote the loss parameters of the corresponding two lossy channels by ε_{ka} and ε_{kb}. The probability of detecting a click pattern n is a function of all loss parameters and is denoted P(n; ε), with ε = (ε_a, ε_b), ε_a = (ε_{1a}, ε_{2a}, · · · , ε_{Na}), and ε_b = (ε_{1b}, ε_{2b}, · · · , ε_{Nb}). Define P^{(ia)}(n) (P^{(ib)}(n)) as the probability obtained by changing only one loss parameter from ε_{ia} (ε_{ib}) to c_{ia} ε_{ia} (c_{ib} ε_{ib}), with c_{ia} and c_{ib} greater than one. We then obtain an approximation of P_0(n) that is accurate to second order in the photon loss,

P̃(n; ε) = P(n; ε) − Σ_{μ∈{a,b}} Σ_{k=1}^N [P^{(kμ)}(n) − P(n; ε)] / (c_{kμ} − 1).   (31)
In the experiment, one first measures P(n; ε) with losses ε_a and ε_b, and then changes one loss parameter at a time, keeping the others unchanged, to measure P^{(ia)}(n) and P^{(ib)}(n); this requires M(M − 1) repetitions of the experiment. By combining all these measurement results we obtain a better approximation of the probability P_0(n) via Eq. (31). This method is advantageous for a large circuit when the photon loss of each beam splitter is small. Precisely controlling the loss of each beam splitter could be challenging for an interferometer integrated on a chip, but is realizable for architectures based on delay loops.

FIG. 4: Procedure to cancel photon loss. In the first step, the probabilities of a lossy device are calculated from the probabilities of a loss-free device. This corresponds to a physical process and is accomplished by applying the operator T_ε. In the second step, the probabilities of a loss-free device are inferred from the probabilities of a lossy device. This is accomplished by applying the operator T_ε̃ but does not correspond to a physical process.

IV. LOSS CANCELLATION
We now discuss another scheme to mitigate the effect of photon loss in a GBS device. It is specifically tailored to photon number detection and is thus particularly suitable for platforms where photon-number-resolving detectors are available.
One of the advantages of this scheme is that it requires no hardware modifications of the GBS device. The only computational cost is to calculate a set of coefficients, which we discuss in detail below.

A. The general procedure
Consider a noisy M-mode device with the same photon loss in each mode, characterized by a single parameter ε. Equivalently, this noisy GBS device can be modelled by placing M beam splitters, all with the same transmissivity η = 1 − ε, after a lossless GBS device. The beam splitters take a Fock state n (corresponding to an M-tuple of nonnegative integers n_j) to n' ≤ n (n'_j ≤ n_j for all j) with conditional probabilities

P_ε(n'|n) = Π_{j=1}^M C(n_j, n'_j) η^{n'_j} ε^{n_j − n'_j},   (32)

where C(n_j, n'_j) is a binomial coefficient. Then, given the probabilities of the states in the lossless GBS device, we obtain the probabilities of the states in the lossy GBS device. This can be expressed in terms of a linear operator T_ε acting on measures on N^M. Define P_0(n) as the probabilities of the Fock states n in the lossless GBS device; then the probabilities in the lossy GBS device are

P(n'; ε) = T_ε(P_0)(n') = Σ_{n≥n'} P_ε(n'|n) P_0(n).   (33)

This involves an infinite series, but convergence is clear (for 0 < ε < 1) since Σ_n P_0(n) = 1 and 0 ≤ P_ε(n'|n) ≤ 1.
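A direct implementation of the transfer operator T_ε on a finite-support distribution is straightforward. A sketch (function and variable names are ours):

```python
from math import comb

def patterns_leq(n):
    """Yield all patterns n' with n'_j <= n_j componentwise."""
    if not n:
        yield ()
        return
    for k in range(n[0] + 1):
        for rest in patterns_leq(n[1:]):
            yield (k,) + rest

def apply_loss(dist, eps):
    """Apply T_eps: push a distribution over M-mode Fock patterns through
    uniform loss eps using the product-of-binomials conditional
    probabilities of Eq. (32)."""
    eta = 1 - eps
    out = {}
    for n, p in dist.items():
        for np_ in patterns_leq(tuple(n)):
            w = p
            for nj, npj in zip(n, np_):
                w *= comb(nj, npj) * eta**npj * eps ** (nj - npj)
            out[np_] = out.get(np_, 0.0) + w
    return out

# A single two-mode pattern |1,0> under 25% loss
lossy = apply_loss({(1, 0): 1.0}, 0.25)
assert abs(lossy[(1, 0)] - 0.75) < 1e-12
assert abs(lossy[(0, 0)] - 0.25) < 1e-12
assert abs(sum(lossy.values()) - 1.0) < 1e-12   # probability is conserved
```

Because each input pattern only feeds the finitely many patterns below it, the map preserves finite support and total probability.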
To compute T_ε(P_0)(n') for a given final Fock state n', we do not need the conditional probabilities for all n, but only for those that could lead to n', i.e., with n ≥ n'. If P_0 has finite support, we only need to include click patterns up to the largest |n| (the total photon number of click pattern n) for which P_0(n) ≠ 0. Even if the support is infinite, in a practical computation we may impose a finite cutoff. What we have just described, evaluating the probabilities of a lossy GBS device from the probabilities of a loss-free GBS device, is straightforward. However, what we want is the inverse: using the probabilities of a lossy GBS device to infer the probabilities of a loss-free GBS device. This can be accomplished by performing an analytic continuation of the transformation (33).
We first show that the transformations T_ε form a semigroup. Suppose we apply two sets of beam splitters to a loss-free device, with photon losses ε and µ, respectively. The two sets of beam splitters can be combined into one set with transmissivity (1 − ε)(1 − µ), namely, with photon loss ε + µ − εµ. This implies that the operators T_ε form a semigroup, with T_ε T_µ = T_{ε⊕µ}, where ε ⊕ µ = ε + µ − εµ. Now the operators T_ε can be defined formally for arbitrary complex numbers ε, not just those in [0, 1]: at least if P is a complex measure of finite support, T_ε(P) will be a complex measure with finite support and Σ_n T_ε(P)(n) = 1, and by analytic continuation the formula T_ε T_µ = T_{ε⊕µ} remains true. The formulas are all the same, although the interpretation of Eq. (32) as a conditional probability is lost.
We can use this to go backwards, inferring P_0 from P = T_ε(P_0). Setting ε ⊕ µ = 0, we find µ = ε/(ε − 1), which we denote by ε̃. Thus T_ε̃ T_ε is formally equal to the identity map, so applying T_ε̃ formally brings the probabilities P = T_ε(P_0) back to the loss-free probabilities P_0. The whole procedure is shown schematically in Fig. 4. In the calculation, one only needs to substitute ε̃ = ε/(ε − 1) for ε:

P_0(n) = T_ε̃(P)(n) = Σ_{n'≥n} P_ε̃(n|n') P(n'; ε).   (34)

We call this procedure loss cancellation. Our formal result, Eq. (34), involves an infinite series whose convergence is not guaranteed in the whole parameter regime. In practice we can only consider finitely many terms, so we may impose a cutoff; but unless the series converges, the result of this computation might not be a good approximation to the loss-free probabilities. See Appendix B for a proof that, under appropriate conditions, for sufficiently low photon loss the series converges and gives a good approximation to the loss-free probabilities.
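Because T_ε maps finite support to finite support, the semigroup identity T_ε̃ T_ε = T_0 can be verified exactly on any finite-support distribution, even though ε̃ = ε/(ε − 1) is negative and the weights are then quasi-probabilities rather than probabilities. A self-contained sketch (names are ours):

```python
from math import comb
from itertools import product

def apply_T(dist, eps):
    """Apply T_eps with the binomial weights of Eq. (32); eps may be
    negative, in which case the weights are quasi-probabilities."""
    eta = 1 - eps
    out = {}
    for n, p in dist.items():
        for np_ in product(*(range(nj + 1) for nj in n)):
            w = p
            for nj, npj in zip(n, np_):
                w *= comb(nj, npj) * eta**npj * eps ** (nj - npj)
            out[np_] = out.get(np_, 0.0) + w
    return out

# A toy two-mode distribution with finite support
p0 = {(0, 0): 0.5, (1, 1): 0.3, (2, 1): 0.2}
eps = 0.3
lossy = apply_T(p0, eps)                     # forward: add loss
recovered = apply_T(lossy, eps / (eps - 1))  # backward: cancel loss

for n, p in p0.items():
    assert abs(recovered.get(n, 0.0) - p) < 1e-12
```

For finite support no truncation is needed, so the round trip is exact up to floating-point error; the convergence issues discussed above arise only for distributions with infinite support.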

B. Test for a two-mode squeezed vacuum
The two-mode squeezed vacuum state is given by Eq. (16), from which it is clear that the probabilities for click patterns [i, i] are P0([i, i]) = (1 − χ²)χ^{2i}, and are zero for all other patterns. With uniform loss ε the lossy probabilities Pε([i, j]) are given by a series in ε, and by symmetry we have Pε([j, i]) = Pε([i, j]). Since 0 < ε < 1 and 0 < χ < 1, the series converges; the sum can be written in terms of a hypergeometric function. We now apply the loss cancellation procedure to estimate the probability P0([1, 1]). Two cases are considered: in one case we choose the cutoff photon number as nmax = 7, which means only click patterns [n1, n2] with n1 ≥ 1, n2 ≥ 1 and n1 + n2 ≤ 7 are considered, and the resulting approximation is denoted as P̂7([1, 1]); in the other case we choose the cutoff photon number as nmax = 10 and the resulting approximation is denoted as P̂10([1, 1]). We find that the Maclaurin series in ε of both P̂7([1, 1]) and P̂10([1, 1]) begins with the constant term χ²(1 − χ²), which is the value of the actual P0([1, 1]). Because the next nonzero coefficients appear at order ε⁶ (for nmax = 7) and ε¹⁰ (for nmax = 10), respectively, this should give very good results when ε is small.

Table II shows the estimated result for P0([1, 1]) using loss cancellation for different input squeezing and cutoff photon numbers. For r = 0.5 and nmax = 7, the loss cancellation gives very good approximations for photon loss up to ε = 0.5, and the estimation fails for high photon loss such as ε > 0.6. Increasing the cutoff photon number to nmax = 10 slightly improves the result and pushes the boundary to about ε = 0.6, but the estimation still fails for photon loss ε > 0.7. For r = 1.0 and nmax = 7, a good approximation is achieved for photon loss ε ≤ 0.3 and the result becomes meaningless for ε ≥ 0.5. We discuss this result in detail in Appendix B. For r = 0.5, our series should converge for all ε < 1, so for any ε we should be able to attain good results by taking a sufficiently large cutoff. However, for r = 1.0 our convergence result only works for ε < 1/(2χ) ≈ 0.6565, and for ε greater than that an increased cutoff would be useless (see Appendix B for more details).

TABLE II. Estimated probability P0([1, 1]) using the loss cancellation procedure for cutoff photon numbers nmax = 7 and nmax = 10. The first column lists the loss values ε and the first row gives the lossless probability P0([1, 1]). The second column gives the estimated probability P̂7([1, 1]) for nmax = 7, with two subcolumns corresponding to input squeezing r = 1/2 and r = 1, respectively. The third column gives the estimated probability P̂10([1, 1]) for nmax = 10.
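As an illustration of the truncated cancellation described here (our own sketch; the photon-number truncation N = 40 and the error tolerance are our choices), one can build the exact lossy TMSV distribution and recover the loss-free value P0([1, 1]) = χ²(1 − χ²) ≈ 0.167948 for moderate loss:

```python
from math import comb, tanh

def bloss(n, k, eps):
    """Probability that exactly k of n photons survive loss eps."""
    return comb(n, k) * (1 - eps) ** k * eps ** (n - k)

chi = tanh(0.5)
eps = 0.3
N = 40                                # photon-number truncation (our choice)

# Exact lossy TMSV distribution P_eps([k1, k2]), loss acting per mode
P = {}
for n in range(N + 1):
    w = (1 - chi**2) * chi**(2 * n)   # P_0([n, n]) for the TMSV state
    for k1 in range(n + 1):
        for k2 in range(n + 1):
            P[(k1, k2)] = P.get((k1, k2), 0.0) + w * bloss(n, k1, eps) * bloss(n, k2, eps)

# Truncated loss cancellation for P_0([1, 1]) with eps' = eps/(eps - 1)
ep = eps / (eps - 1)
n_max = 10
est = sum(bloss(n1, 1, ep) * bloss(n2, 1, ep) * P[(n1, n2)]
          for n1 in range(1, n_max) for n2 in range(1, n_max)
          if n1 + n2 <= n_max)
exact = chi**2 * (1 - chi**2)         # loss-free value, about 0.167948
```

For ε = 0.3 and r = 1/2 the truncated estimate agrees with the exact loss-free value to well under a percent, consistent with the cutoff analysis above.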

C. Using empirical data
In a typical application, we will use an empirical distribution P̂ from measured data as an approximation to Pε, and compute our approximation to P0(n) as Tµ(P̂)(n). Suppose the empirical data is obtained from a sample of size N. Then the estimator P̂(n) has mean Pε(n), variance Pε(n)[1 − Pε(n)]/N, and covariances

Cov(P̂(n), P̂(n′)) = −Pε(n)Pε(n′)/N (38)

for n ≠ n′. If we write Tµ(P̂)(n′) = Σn Pµ(n′|n) P̂(n), then this estimator has mean Tµ(Pε)(n′) = P0(n′) and variance

(1/N) [ Σn Pµ(n′|n)² Pε(n) − ( Σn Pµ(n′|n) Pε(n) )² ].

It should be noted that even when the infinite series for Tµ(Pε)(n) converges, the variance might not. Practically speaking, this means that "outliers" could have a large influence on the variance. Rather than use all the click patterns that appear in our sample, it may be better to impose a fixed cutoff. This means our estimator will no longer be unbiased, but it may be more stable.
For several different values of ε with χ = tanh(1/2), we took 100 samples of size 10⁵ from the distribution P0, added loss by letting each photon survive or disappear with probabilities 1 − ε and ε, respectively, and then computed the estimator Tε′(P̂)([1, 1]). Recall that the correct value is 0.167948. The results are shown in Table III.
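A scaled-down version of this experiment can be run as follows (our own sketch: fewer and smaller samples than in the text, a fixed seed, and cutoff nmax = 7 are our choices). Photon pairs are drawn from the geometric TMSV distribution, thinned binomially per mode, and the cutoff estimator is applied to the empirical counts:

```python
import math
import random

def bloss(n, k, eps):
    """Probability that exactly k of n photons survive loss eps."""
    return math.comb(n, k) * (1 - eps) ** k * eps ** (n - k)

rng = random.Random(7)
chi = math.tanh(0.5)
q = chi ** 2
eps = 0.3
ep = eps / (eps - 1)     # formal inverse loss
n_max = 7                # fixed cutoff on total detected photons

def one_estimate(nsamp):
    counts = {}
    for _ in range(nsamp):
        u = 1.0 - rng.random()                # u in (0, 1]
        n = int(math.log(u) / math.log(q))    # P_0([n, n]) = (1 - q) q^n
        k1 = sum(rng.random() < 1 - eps for _ in range(n))  # lossy mode 1
        k2 = sum(rng.random() < 1 - eps for _ in range(n))  # lossy mode 2
        counts[(k1, k2)] = counts.get((k1, k2), 0) + 1
    est = 0.0
    for (n1, n2), c in counts.items():
        if n1 >= 1 and n2 >= 1 and n1 + n2 <= n_max:
            est += bloss(n1, 1, ep) * bloss(n2, 1, ep) * c / nsamp
    return est

estimates = [one_estimate(20000) for _ in range(20)]
mean = sum(estimates) / len(estimates)        # close to 0.167948
```

The spread of `estimates` across repetitions illustrates the variance behavior discussed above.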
As a function of µ, Tµ(Pε)(n) is an analytic function which we can expand in a power series around µ = 0, with coefficients involving probabilities of various click patterns. The terms involving higher powers of µ contain click patterns with more extra photons, which have a low probability of being observed but may carry large coefficients. In a simulation, a few of these "outlier" click patterns will often be observed, and this can have a bad effect on the accuracy of our estimate. It can be better to use only a limited number of coefficients: the estimator will no longer be unbiased, but the variance may decrease significantly. To illustrate this we consider the example of a TMSV state as before, truncating the series for Tµ(Pε)([1, 1]) at fourth order in µ, as in Eq. (40). By replacing µ and Pε in Eq. (40) by ε′ and P̂, respectively, we obtain an estimator Tε′(P̂)([1, 1]) for P0([1, 1]). With r = 1/2 and 100 samples of size 10⁵, we obtain results with a significant improvement; see Table III.
Further improvements are possible if we take advantage of knowledge of the form of Tν(P0). In the two-mode example we know (see Sec. III D) that Tν(P0)([1, 1]) has a specific functional form in which A(ν) is a polynomial, and in fact we know A(ν) has degree ≤ 4. Note that ν = ε ⊕ µ = ε + µ − εµ, so µ = (ν − ε)/(1 − ε), and Tν(P0) = Tµ(Pε), where Pε = Tε(P0). If we expand the right-hand side in a power series in ν − ε, then, since the left-hand side is a polynomial of degree ≤ 4, the terms in higher powers on the right should be 0. We can take that series to order 4, evaluate it at ν = 0 using the empirical distribution P̂ instead of Pε, and the result should be a good approximation of A(0) = P0([1, 1]). We tried this for r = 1/2 with 100 samples of size 10⁵, with quite good results; see Table III. Even for ε = 0.9, the mean and standard deviation are 0.159427 and 0.368437, respectively, which are not wildly off the mark.

V. EIGHT-MODE EXAMPLE
We have discussed two schemes to mitigate the effect of photon loss and showcased their performance for a two-mode squeezed state. An important question is whether these error mitigation techniques can be applied to a large GBS device, which is more relevant to practical applications. When the circuit size increases, an immediate issue arises: the size of the Hilbert space for a fixed total photon number increases, so the probability of detecting any single click pattern decreases. It is not practical to estimate a tiny probability using a GBS device. It is thus necessary to consider the probability of a collection of click patterns, i.e., a coarse-grained probability. One useful coarse-grained probability is the orbit probability [38], which is critical in solving the graph isomorphism and graph similarity problems. An orbit On is defined as the collection of click patterns comprising all permutations of the click pattern n. The orbit probability POn is the sum of all click-pattern probabilities inside the orbit On. Although we introduced the mitigation schemes by estimating the probability of a single click pattern, we show here that they can also be used to estimate orbit probabilities, as well as other coarse-grained probabilities.
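The orbit construction itself is straightforward to express in code. The following sketch (with a made-up three-mode distribution, purely for illustration) sums the probabilities over all distinct permutations of a click pattern:

```python
from itertools import permutations

def orbit_probability(P, pattern):
    """Sum the probabilities of all distinct permutations of `pattern`.

    P: dict mapping click-pattern tuples to probabilities.
    """
    orbit = set(permutations(pattern))   # distinct permutations only
    return sum(P.get(p, 0.0) for p in orbit)

# Toy 3-mode distribution (hypothetical numbers, for illustration only)
P = {(1, 1, 0): 0.10, (1, 0, 1): 0.12, (0, 1, 1): 0.08, (2, 0, 0): 0.05}
p_orbit = orbit_probability(P, (1, 1, 0))   # 0.10 + 0.12 + 0.08
```

Because the loss maps act mode by mode and the orbit is permutation invariant, the same mitigation formulas apply to orbit probabilities by summing over patterns in the orbit.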
We consider a book graph of size eight, with adjacency matrix A. The adjacency matrix A has to be doubled and rescaled to c(A ⊕ A) so that it can be encoded into an 8-mode pure Gaussian state σ0 [38], which is generated by injecting eight pure single-mode squeezed vacuum states into a loss-free interferometer. Here we choose c = 0.25, so that the maximum input squeezing is about 7.25 dB, which is accessible in current quantum optics experiments. When the circuit is lossy, the generated state differs from σ0. We now use the proposed error mitigation schemes to estimate the orbit probabilities for the loss-free circuit.
Consider orbits with at most one photon in each mode and up to 8 total photons: vacuum, [1], [11], and so on up to [11111111], where we have omitted the "0" entries in the click patterns to simplify the notation and used a single click pattern to represent each orbit. The orbit probabilities in the absence of photon loss are (0.453, 0., 0.283, 0., 0.058, 0., 0.0044, 0., 0.00011).
In the presence of photon loss, the photons tend to populate lower photon-number orbits. We now apply the extrapolation technique to estimate the loss-free orbit probabilities. To perform the extrapolation, we choose five loss values, with the vector c chosen as c = (1.0, 1.1, 1.2, 1.3, 1.4). The results are shown in Fig. 5. We can see that for low photon loss, both the extrapolation and the improved extrapolation work very well, giving good approximations for most orbit probabilities. When the photon loss increases, some orbit probabilities estimated from the plain extrapolation lose accuracy, e.g., the estimated probability of the orbit with one photon becomes negative. However, with the improved extrapolation with poles removed we still obtain a good approximation, showing its advantage for high photon loss. We also use the loss cancellation method to estimate the orbit probability P0([00001111]). With cutoff photon number nmax = 7, we find that a good approximation can be obtained for photon loss ε ≤ 0.3; see Table IV. This demonstrates that the proposed error mitigation techniques still work for a bigger circuit for a reasonable amount of photon loss.
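While the eight-mode computation is beyond a short snippet, the extrapolation step itself can be sketched on the two-mode example from earlier sections (our own illustration; the base loss ε = 0.2, the series truncation, and a plain polynomial extrapolation to zero loss are our choices, and the improved pole-removing variant is not implemented here). We evaluate the lossy probability at the loss values ε·c and extrapolate the degree-four interpolant to zero loss:

```python
from math import tanh

chi = tanh(0.5)
q = chi ** 2
exact = chi**2 * (1 - chi**2)   # loss-free P_0([1, 1])

def p11(eps, N=60):
    """Lossy TMSV probability P_eps([1, 1]), truncated at N photon pairs."""
    return sum((1 - q) * q**n * (n * (1 - eps) * eps ** (n - 1)) ** 2
               for n in range(1, N))

eps0 = 0.2
cs = [1.0, 1.1, 1.2, 1.3, 1.4]       # artificially increased loss factors
xs = [eps0 * c for c in cs]
ys = [p11(x) for x in xs]

# Lagrange extrapolation of the degree-4 interpolant to zero loss
est = 0.0
for j, (xj, yj) in enumerate(zip(xs, ys)):
    w = 1.0
    for m, xm in enumerate(xs):
        if m != j:
            w *= (0.0 - xm) / (xj - xm)
    est += yj * w
```

In a real experiment the values ys would come from measurements at the increased loss settings rather than from an exact formula.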

VI. CONCLUSION AND OPEN PROBLEMS
We have proposed two schemes to mitigate the effect of photon loss in a GBS device. The first scheme is based on the extrapolation technique and requires a small modification of the GBS circuit, namely increasing the photon loss of the circuit. The second scheme requires no modification of the circuit and is thus hardware efficient: one only needs to measure the probabilities of a lossy circuit and then combine them linearly in an appropriate way. The computational cost is that of calculating the linear-combination coefficients. We tested these error mitigation techniques on a two-mode and an eight-mode GBS circuit, and showed that they work extremely well for low photon loss and also give fairly good approximations for relatively high photon loss.
In realistic experiments, the accuracy of the measured probabilities is limited by the finite number of samples and by experimental imperfections. We showed that the extrapolated probability is sensitive to the measured probabilities, and sometimes one gets apparently meaningless results such as negative probabilities. This can be overcome by performing multiple experiments and taking the mean value as the estimate for the loss-free probability. The requirement of multiple experiments should be considered part of the cost of the scheme, alongside the classical computation.
While the procedure we have described so far is focused on GBS devices, similar ideas can also be applied to other near-term photonic architectures such as Boson Sampling. Boson Sampling differs from GBS in that its click-pattern probabilities are related to permanents of transformation matrices rather than to Hafnians, as is the case in GBS. Devising a concrete procedure that accounts for this difference and mitigates Boson Sampling probabilities is an open problem. Another important research direction is to devise methods for error mitigation in sampling problems, which may require recovering the quantum states instead of recovering the expectation values of observables.
where Pd(ε) is a polynomial in ε. Therefore, the probability of measuring a click pattern n in the presence of photon loss and displacements can be written as the product of the polynomial Pd(ε) and the exponential factor exp(−(1/2) d†[σQ(ε)]⁻¹ d), where din = U† d is the input displacement vector. The exponential term is known if the input squeezing parameters and displacements are known, and does not need to be extrapolated. The only unknown term is the polynomial Pd(ε), which requires extrapolation. The improved extrapolation technique can be applied here, since the poles are determined by Q(ε, r) and P(ε, r), and they can be removed by simply moving them to the left-hand side of Eq. (A4).

Now we apply the lemma to loss cancellation. Suppose the lossless distribution P0 has decay radius t. If 0 < ε < 1/t, the distribution with loss Pε = Tε(P0) has decay radius at most (1 − ε)t/(1 − εt). Then, with ε′ = ε/(ε − 1), the series for P0 = Tε′(Pε) converges if

(ε/(1 − ε)) · (1 − ε)t/(1 − εt) < 1,

and this is equivalent to ε < 1/(2t). The two-mode example is exceptional in that the decay radius can easily be seen to be χ = tanh(r), so we want ε < 1/(2χ) to ensure convergence. For squeezing r = 1/2, 1/(2χ) > 1.08, so the series converges for all ε ∈ (0, 1). But for squeezing r = 1, 1/(2χ) ≈ 0.6565.
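The two thresholds quoted here are easy to verify numerically (a minimal check of the quoted numbers only):

```python
from math import tanh

def cancellation_threshold(r):
    """Loss threshold 1/(2*chi) below which loss cancellation converges,
    for a TMSV whose decay radius is t = chi = tanh(r)."""
    return 1.0 / (2.0 * tanh(r))

t_half = cancellation_threshold(0.5)   # > 1, so any loss in (0, 1) converges
t_one = cancellation_threshold(1.0)    # about 0.6565
```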
In general it may be difficult to predict in advance the decay radius for P0, but we may conjecture that it is typically finite and nonzero. The loss-cancellation procedure can then be expected to work very well if ε is sufficiently small, but very poorly when ε is too large.