Distinguishing noisy boson sampling from classical simulations

Giving a convincing experimental evidence of the quantum supremacy over classical simulations is a challenging goal. Noise is considered to be the main problem in such a demonstration, hence it is urgent to understand the effect of noise. Recently found classical algorithms can efficiently approximate, to any small error, the output of boson sampling with finite-amplitude noise. In this work it is shown analytically and confirmed by numerical simulations that one can efficiently distinguish the output distribution of such a noisy boson sampling from the approximations accounting for low-order quantum multiboson interferences, what includes the mentioned classical algorithms. The number of samples required to tell apart the quantum and classical output distributions is strongly affected by the previously unexplored parameter: density of bosons, i.e., the ratio of total number of interfering bosons to number of input ports of interferometer. Such critical dependence is strikingly reminiscent of the quantum-to-classical transition in systems of identical particles, which sets in when the system size scales up while density of particles vanishes.


Introduction
Quantum mechanics promises computational advantage over digital computers [1,2]. Current technology is on the brink of building quantum devices with the promised advantage in some specific computational tasks, called the quantum supremacy [3], for which goal several quantum systems are considered [4,5,6,7,8] and a dramatic breakthrough was recently reported Valery Shchesnovich : valery@ufabc.edu.br [9]. Can noise, always present in an experimental setup, compromise the quantum supremacy by allowing for an efficient classical simulation [10]? In this work we consider how a noisy boson sampling system can be distinguished from efficient classical approximations.
In boson sampling proposal of Aaronson & Arkhipov [4] the specific classically hard computational task is sampling from many-body quantum interference of N indistinguishable bosons on a unitary linear M -dimensional interferometer. At least in the so-called no-collision regime, when the output ports receive at most a singe boson (i.e., for M N 2 [11]), complexity-theoretic arguments have been found in Ref. [4] for the quantum advantage over classical simulations. In general, the output probabilities depend on the full many-body quantum interference of N bosons given by a sum of N ! quantum amplitudes, i.e., by the matrix permanents [12,13], which are hard to compute [14,15,16,17].
Unavoidable noise has been accounted for in the boson sampling proposal [4] by allowing for an approximation error. Taking into account that the complexity-theoretic arguments for the quantum supremacy are asymptotic, whereas in an experiment one has a finite quantum device, one may wonder how small experimental error should be? In the spirit of Ref. [1], the experimental error problem can be formulated as follows: Can a classical algorithm efficiently sample from the output distribution of a quantum device in such a way that it would be impossible to tell from the sampling data whether we have the classical simulation or the quantum device? At the other extreme, the two output distributions can be efficiently distinguished. Since the promised computational advantage is due to the opening gap between exponential and polynomial computations in the size of a quantum system [3], it is reasonable to consider a method of distinguishing a quantum device from a classical simulation as efficient if it requires only a polynomial in the quantum system size number of samples.
As a partial answer to the above problem, in the present work it is shown that one can efficiently distinguish the output distribution of noisy boson sampling from output distributions of a wide range of classical algorithms, such as simulation with classical particles and the recent algorithms of Refs. [48,52,53]. We give an analytical expression for the lower bound on the total variation distance between the output distribution of noisy boson sampling and that of such a classical simulation and point that the probability of no particle counts in a subset of output ports can be used for distinguishing the quantum and classical distributions. The number of samples necessary for distinguishing the quantum and classical distributions critically depends on the density of bosons, defined as the ratio of the total number of interfering bosons to the number of input ports of interferometer, and not on the number of bosons or the number of ports of interferometer themselves. Our analytical results are valid asymptotically with the number of in-terfering bosons, convergence to the asymptotic result is studied by using numerical simulations.
Previously the output distribution of boson sampling was shown [54] to be far in the total variation distance from the uniform distribution (argued to be an efficient approximation [55]), where a set of open problems was given. The present work partially resolves open problems (2) and (4)-(6) of Ref. [54] by considering a wide class of possible classical approximations to a noisy realization of boson sampling with arbitrary scaling of the interferometer size in the total number of interfering bosons, beyond the no-collision regime. In Ref. [56] it was shown how one can efficiently distinguish the output distribution of boson sampling from the simulation by classical particles. Statistical or pattern recognition techniques were also applied to assessment of boson sampling and distinguishing it from classical simulations [57,58,59,60]. However, no analysis of the impact of realistic experimental noise was attempted previously, which is the main goal of the present work.
The text is organized as follows. In section 2 the boson sampling realization with linear optics and the classical approximations by low-order multi-boson interference are described and our results on distinguishing their respective output probability distributions are presented. In section 3 our principal findings are summarized and discussed. For better readability, the derivations and mathematical details are relegated to Appendices A-F.

Noisy boson sampling vs classical approximations
The boson sampling proposal of Ref. [4] considers the quantum interference of N identical single bosons on a unitary M -port in the no-collision regime M N 2 , here we consider it for arbitrary M ≥ N . It turns out that distinguishing noisy boson sampling, as N scales up, crucially depends on density of bosons ρ = N/M , more precisely, on the scaling ρ = ρ(N ) (or, equivalently, M = M (N )). This fact allows us to consider all models with a given scaling ρ = ρ(N ) as a class of boson sampling, where N serves as the size parameter in the class. For example, for at least ρ ∼ 1/N we have the so-called no-collision regime, the main focus of Ref. [4].
Let us now mention known estimates on the number of classical computations required to simulate the boson sampling. For the ideal (noiseless) boson sampling, the fastest to date algorithms of Refs. [38,39] can sample from its output distribution in O(N 2 N ) computations. The number of computations depends on the density of bosons: for non-vanishing density of bosons the output probabilities can be given by the matrix permanents of rank-deficient matrices with reduced computational complexity [62,63]. The following simple rule can be stated [61]: to sample from the boson sampling in arbitrary regime of density of bosons ρ by the algorithm of [39] requires at least as many classical simulations as for the no-collision boson sampling with N ρ = N/(1 + ρ) bosons. The size parameter N ρ gives also the expected number of output ports occupied by bosons as a function of the boson density.
Less is known about the scaling of classical computations for exact sampling from noisy boson sampling. There are several strong sources of noise: noisy interferometer [41,43], partial distinguishability of bosons [42], and boson losses [44,45]. It was shown that a finite number of lost bosons (or dark counts of detectors, or both) does not compromise the computational hardness of boson sampling in the no-collision regime. There are also results for the approximate sampling from noisy boson sampling, where the number of classical computations depends additionally on the scaling of the amplitude of noise in the system size [40,48,52,53]. Efficient classical approximation algorithm of boson sampling in the no-collision regime with partially distinguishable bosons was found in Ref. [48], then extended to losses [52] and noise in interferometer [53]. For intermediate-size boson sampling devices the number of computations of the classical algorithm can be further optimized [64]. There is equivalence of different imperfections in their effect on classical hardness of boson sampling, e.g., noise in experimental platforms [22,35] has a similar effect to that of the partial distinguishability of bosons [53] (below we will use the term noise for all imperfections).
The efficient classical algorithm of Refs. [48,52,53] can approximate the output distribution of N -boson sampling with a finite-amplitude noise by a smaller one, with K interfering bosons and N − K classical particles, it becomes effi-cient (i.e., polynomial in N, M ) for bounded K, since the classical computations scale exponentially only in K [48]. Below we investigate if a noisy boson sampling with finite-amplitude noise can be efficiently distinguished from such a classical approximation, to answer the experimentally relevant problem posed in the previous section.
Probability distribution at output of noisy boson sampling Let us now briefly describe an experimental implementation of boson sampling. We will use the popular example of linear optical setup with single photons, see fig. 1, and focus on two sources of noise: losses of photons and their partial distinguishability. Later on, we will include also the false random (a.k.a. dark) counts of detectors, i.e., the counts which are not triggered by photons. There is an equivalence of the losses compensated by dark counts with noise in the interferometer [53], thus our model takes into account the strongest sources of noise in linear optical experimental setup. The multi-photon component at input is neglected here. It can be a strong source of noise with the photon sources based on the parametric down conversion, e.g., when about N 2 sources are used to produce N single photons in the boson sampling from a Gaussian state [23,24,31]. With additional optical multiplexing of the sources one can significantly reduce the multiphoton component noise [65].  Unitary linear optical interferometer U with M input and output ports connects the optical mode |k (in) of input port k to the optical modes |1 (out) , . . . , |M (out) of the output ports: (out) . Such an interferometer would take a Fock state of photons in the input ports into a superposition of the Fock states of the photons in the output ports according to the corresponding input-output relations between the boson creation operators in the input and output ports: is the boson creation operator in input (output) port k (l). The (photon numberresolving) photon detection projects the output state of photons onto one of the Fock states in the output ports. A realistic optical setup, however, suffers from photon losses and not all the photons are detected. Ways to compensate for losses, such as by the post selection on a given number of photons [30], are currently under investigation. For boson sampling one can allow only a constant number of bosons to be lost [44]. On the other hand, if we assume that photons are lost independently of each other, there will be, on average, a proportional to N number of lost photons. The classical algorithm of Ref. [52] can efficiently approximate the output distribution with the proportional losses. The post selection strategy therefore can work only for a small-size boson sampling, since the probability to lose a constant number of photons vanishes exponentially with N . Thus photon losses are unavoidable source of noise in an optical realization of boson sampling. The input-output relation in the case of arbitrary lossy interferometer U, can be cast in the following form [66] whereĉ † j is the boson creation operator in loss mode j (e.g., photon absorption due to nonunitarity of an interferometer), and matrix V is such that the total unitarity is observed UU † + V V † = I (see more details in appendix B).
Another strong source of noise is partial distinguishability of photons, affecting the output state of photons on a linear interferometer [67]. Partial distinguishability can be associated with different internal states of photons (such as temporal profile of a photon in case of Ref. [67]), which are unaffected by the input-output relation in Eq.
(2) and not resolved in an experiment. In this case, the boson operators in the input-output relations of Eq. (2) refer to the same internal state of boson on the input as well as on the output of the interferometer. Partial distinguishability of N bosons can be accounted for by introducing a function on permutations σ ∈ S N , acting on the internal states of bosons [42,68]. This function is defined as follows. For pure internal states of bosons,ρ k = |ψ k ψ k |, k = 1, . . . , N , it reads For mixed internal states J(σ) is a convex sum of the products as in Eq.
[67] (more on the partial distinguishability can be found in Ref. [68]). Explicit analytical results below are obtained for the uniform partial distinguishability, controlled by a single distinguishability parameter 0 ≤ ξ ≤ 1. This model applies when boson k with probability ξ is in some pure internal state |φ 0 and with probability 1 − ξ in an unique internal state |φ k , orthogonal to all other internal states. Either pureρ For bosons in internal statesρ . If photon detectors do not resolve the internal states of photons, the same distinguishability function corresponds to pure-state or mixed-state model in Eq. (4). In an experiment, there is always noise in photon parameters, such as random fluctuations of photon arrival times, thus the internal states of photons are always mixed. Whatever is the model of photon detection, the distinguishability of photons due to mixed states will persist, since even by completely resolving the internal states of photons one is not able to affect the distinguishability coming from the fluctuations of photon parameters [71]. Thus partial distinguishability of photons is a source of unavoidable noise in boson sampling experiments with photons.
We can now give the output distribution of our noisy boson sampling model with boson losses and partially distinguishable bosons. When N bosons, with the distinguishability function J(σ), are sent to the input ports k = 1, . . . , N of a lossy interferometer U, the probability to detect exactly n of them, 0 ≤ n ≤ N , at the output ports in a configuration m = (m 1 , . . . , m M ), m 1 + . . . + m M = n, i.e., when m l bosons are detected at output port l, reads (see appendix B for details) where the sum over k = (k 1 , . . . , k n ) stands for summation over all n-dimensional subsets of {1, . . . , N }, the multi-set l 1 , . . . , l n contains the output ports with bosons (one for each detected boson) and m! = m 1 ! . . . m M !. When only n < N bosons are detected at the output, there is still some interference between the N − n lost bosons and the n detected ones, reflected in exchange of bosons between the output and the loss modes by permutation σ in Eq. (5). Only for a diagonal loss matrix I − UU † no such boson exchange occurs (in this case σ(k α ) = k α for all n + 1 ≤ α ≤ N ) and the residual interference disappears. In the latter case, boson losses can be considered to occur, e.g., at the input of the interferometer [50] and there is such Parameter η k in this case is the probability that boson sent to input port k actually passes the interferometer (i.e., 1 − η k is the probability that boson k is lost). Below, some of our explicit results are given for the uniform loss model with U = √ ηU , where η is the uniform transmission of such an interferometer.

Approximations by K interfering bosons and N − K classical particles
In the no-collision regime, the output distribution of noisy N -boson sampling model can be approximated, to any small error, by a similar noisy model with bosonic particles, where there are only K interfering bosons and the rest N − K particles are distinguishable bosons (i.e., classical particles) [48,52,53]. The parameter K is chosen according to the required approximation error. Below we adopt such a classical approximation as the classical adversary, see fig. 1.
The simplest way to introduce the above described classical approximation is via the respective distinguishability function J (K) (σ), which is equal to J(σ) for permutations σ with at least N − K fixed points and to zero otherwise [53]: With such a distinguishability function, for K > 1, N − K randomly chosen bosons contribute to the output probability in the same way as classical particles, whereas the rest K bosons are allowed to interfere. By setting K = 1 or K = 0 we get N classical particles (since a permutation with N − 1 fixed points is the identity permutation).
Our goal is to estimate the total variation distance between the output distributions of noisy boson sampling (p) and a classical approximation (p (K) ), where both distributions are given by Eq. (5) with the corresponding distinguishability functions J(σ) and J (K) (σ). The total variation distance is defined as follows where the sum runs over all possible configurations m = (m 1 , . . . , m M ) of bosons in the output ports. The key observation is that for any subset Ω of the configurations of bosons in the output ports we have (8) Observe that the equality is necessarily achieved for a certain subset Ω * depending on U and other parameters of the setup.
Analytical analysis below is carried out for the difference in probability, ∆P L = P Ω L − P (K) Ω L , of all the output bosons to be detected in a subset of M − L output ports, or, equivalently, no particle counts in the complementary subset of the output ports Ω L ≡ {l 1 , . . . , l L }. Let us assume that such a subset is chosen once for a given setup (below we will return to this point in more detail). Since we consider arbitrary (or randomly chosen multiports) we will use Ω L = {1, . . . , L} in our analytical and numerical considerations below.
At this stage, the dark counts of detectors can be easily accounted for. Dark counts of a detector follow Poisson distribution p d (n) = ν n n! e −ν , where we assume a uniform rate ν for all detectors. Hence, the dark counts contribute the factor e −Lν to the the probability of no counts in L output ports. Denoting ∆J = J − J (K) we get from Eqs. (5)-(6) the following expression for the difference in probability of no counts in L output ports (see details in appendices A and B) In the limit of large number of interfering bosons N 1, the lower bound on the total variation distance between a noisy boson sampling and the classical approximation by K interfering bosons, Eq.
where the summation over k = (k 1 , . . . , k n ) runs over all n-dimensional subsets of 1, . . . , N (the derivation can be found in appendix C). In Eq. (10) the first two sums are due to expansion of the loss matrix A kj = δ kj − ηU k1 U * j1 of Eq. (9) in powers of η, while the third sum gives the partial distinguishability of the n-boson interference: there are n m d m amplitudes of m-boson interferences [71], weighted by ξ m , where d m is the total number of m-dimensional derangements (permutations with no fixed points) [69] and n m is the number of m dimensional subsets of n bosons. Such interferences up to the order m ≤ K are accounted for by our classical model, hence the summations in Eq. (10) start from K + 1.
Eq. (10) and the known average [72] over the Haar-random interferometer U , where M (n) = M (M + 1) . . . (M + n − 1), allow us to easily find the average value (11) Similarly, one can derive an expression for the variance (∆P 1 ) 2 − ∆P 1 2 (see appendix C). For K √ N the difference in probability in Eq. (11) becomes a function of K, the amplitudes of noise, ξ, η, ν and the density of bosons ρ = N/M . To extract the dependence on the setup parameters from Eq. (11), we use that d K+i = (K+i)! e −1 + R K+i with |R m | < 1/(m+1)! (see appendix F) and, assuming that K is sufficiently large, drop the remainder R K+i . By expanding the resulting expression in powers of K 2 /N and retaining only the leading order terms we obtain (see appendix C): Furthermore, for the class of interferometers having one balanced output port |U k,l | = 1 √ M one can show that the difference in probability of no counts in the balanced output port satisfies where the minus sign indicates a negative correction (see details in appendix C). Such interferometers contain a wide class: By considering more than one output port for the same setup one can maximize | ∆P L | as a function of L (i.e., the lower bound on the total variation distance can be optimized). Indeed, for noiseless boson sampling, ξ = η = 1 and ν = 0, for classical particles (K = 1) and L M we have [73], where the average is over the Haar-random unitary interferometer, maximized when L = int(1/ρ) = int(M/N ).
The analytical results of Eqs. (10)-(13) were verified by numerical simulations of Eq. (9) for the difference in probability of no counts in L ≥ 1 output ports (the numerical approach is  fig. 2 the "statistical average". Our method allows such averaging only for small number of bosons N ≤ 24, due to necessity of computing a large number of N -dimensional matrix permanents. Therefore, for N ≤ 24 and L = 1 we check against the statistical average the analytical expression of Eq. (11), called in fig. 2 the "analytical average", which is then used for larger numbers of bosons.
The numerical results confirm the scaleinvariance predicted by Eq. Finally, our results agree with the upper bound on the total variation distance in the no-collision regime. A noisy boson sampling setup of arbitrary large size N 1 can be approximated in the no-collision regime by the distribution p (K) [48,52,53] to the total variation distance D(p, p (K) ) = O (ξ √ η) K+1 . Whereas the upper bound is independent of N in this regime, our asymptotic lower bound W 1 of Eq. (12) vanishes due to dependence on the density of bosons W 1 ∼ ρ K+1 , since in the no-collision regime ρ 1/N . No upper bound is known beyond the no-collision regime.

The set of efficient distinguishers
The above results not only show that one can efficiently distinguish the output distribution of a noisy boson sampling device from that of a classical approximation, such as those used in Refs. [48,52,53], but provides a large set of such distinguishers. The probability P L of detecting zero counts in L output ports, used in Eq. (9), is such a distinguisher. Note that there are M L of ways to select a subset Ω L with L ports from the total M of them. There is enough subsets Ω L (with different L and Ω L ) with independent probabilities P L = P L (Ω L ) 1 for inversion of the relation P L = Ω L p(m), where p(m) is the output distribution of boson sampling, thus there is a strong correlation between the set of probabilities P L (Ω L ) and the output distribution.
For uniform distinguishability J(σ) = ξ N −c 1 (σ) , uniform detector dark counts rate ν, and arbitrary lossy interferometer U we get P L as a single matrix permanent of N -dimensional positive-semidefinite Hermitian matrix where A is from Eq. (9). Such permanents can be efficiently approximated by a quantum-optics inspired algorithm [74,75].
There is also an analytical expression for the average probability of no counts P 1 in a single output port of our noisy boson sampling with uniform distinguishability, uniform losses and dark counts (details in appendix D), In an experiment, there is no need to actually compute P L : one has only to choose L output ports and estimate P L from the number of output data with zero counts in them. Let us analyze an example. According to the standard estimation theory (e.g., Ref. [76]), the probability P 1 can be estimated, with α-confidence level, as P where T is the total number of samples, T S is the number of successes (samples with zero counts in the selected output port) and z α is the quantile of the standard normal distribution (e.g., for α = 0.05, 95%confidence, z 2 α ≈ 4). For samples the relative confidence interval for P 1 be- = . Setting = α, with the same confidence level as that of the estimate on P 1 , by experimentally estimating P 1 one can tell the output of the quantum device from that of the classical simulation.

Approximations accounting for all K-boson interferences
One important observation on the choice of our classical model, Eq. (6), is in order. The highest order of quantum multiboson interference accounted by the classical approximation in Eq. (6) is obviously K. However, by allowing only K bosons to interfere, such a model does not account for the multi-boson interference where groups of up to K bosons interfere between themselves. Such an interference corresponds to a subset of permutations in the permutation group, which decompose into the disjoint cycles of length up to K. Thus, if one wants to take into account all the multiboson interferences up to the Kth order, one has to use a different model [70], say J (K) + , obtained by setting to zero the distinguishability function J(σ) on all permutations σ having cycles of length n ≥ K + 1 in the disjoint cycle decomposition: where c n (σ) is the number of cycles of length n in the cycle decomposition.
However, switching to the model of Eq. (17), brings insignificant changes to the results, e.g., changes only the K-independent factor in W 1 of Eq. (12) (see details in appendix F). Our model with J (K) of Eq. (6) simplifies the calculations and allows for a simple numerical algorithm (see appendix E), whereas producing essentially the same lower bound on the total variation distance as the model in Eq. (17). This fact leads to two important conclusions: (i) our approach can distinguish the output distribution of a noisy boson sampling device from that of an approximation even if the latter accounts for all multiboson interferences up to order K √ N and (ii) the probability P L (14) essentially depends on multiboson interferences of the orders above K. These conclusions are very important in discussion of the assessments methods of boson sampling by verifying only some low-order correlations at the output distribution, considered below.

Tests based on low-order correlators
The second-order correlations in output distribution were previously used for assessment of boson sampling [57,58]. Current experiments on boson sampling have reached a stage when the output probability space is so large that direct verification by comparison with the output probabilities p(m) Eq. (5) is out reach (both computationally and due to excessively large number of samples for such an assessment). Only the low-order correlations are therefore checked, such as in the recent benchmark demonstration [32], where only up to 4th-order correlations have been verified. Such and other similar assessments are, however, insufficient, since some approximations using a smaller number of interfering bosons, such those of Refs. [48,52,53], can pass them.
Let us consider a whole class of tests based on rth-order correlations, e.g., Prob(max(m l ) ≤ s) Fixing δ 1, one can therefore state that in a randomly selected unitary interferometer, with high probability Prob ≈ 1 − δ, the maximal boson bunching count at the output reads s = ln N ρδ / ln 1+ρ ρ . With the same probability, the number of bosons K detected in r output ports of the ideal boson sampling scales at most as K = O(rs) = O(r ln N ). This means that with 1 − δ chances any test based on the r-order correlations will not succeed to tell the output distribution of even the ideal boson sampling from that of our classical model with K = O(r ln N ). Indeed, the probability to detect up to K bosons in r output ports depends on multiboson interferences only up to the order K [70]. Due to this fact and the above estimate on the number of bosons detected in r output ports, our model of Eq. (17) with K = O(r ln N ) would faithfully account for all the correlations of order r, e.g., such as in Eq. (18), passing any test based on them. In contrast, the probability of no counts in a subset of output ports allows to distinguish the output distribution produced by such a classical simulation from that of the boson sampling, with or without noise, since the applicability condition of our results, such as in Eq. (12), is satisfied:

Discussion of the results
In the spirit of Ref. [1], we have asked if a quantum system realizing imperfect/noisy boson sampling [4] can be efficiently and faithfully simulated classically as the system size scales up. To this goal, we have investigated whether it is possible to efficiently distinguish the output distribution of a noisy realization of boson sampling from that of the classical approximations that take into account only low-order multiboson interferences, where the term "low-order" means an order K √ N , where N is the total number of interfering bosons. Our choice of the classical approximations was dictated by the recent Refs. [48,52,53], which can approximate the output distribution of boson sampling with any finite noise amplitudes and to any given error in the total variation distance by adapting the order K of the accounted multiboson interference.
Our main result is that one can (and we point exactly how) efficiently distinguish noisy boson sampling with finite amplitudes of noise from the classical approximations accounting for multiboson interferences up to an order K = O(1). The latter class of approximation coincides with the efficient classical approximations of Refs. [48,52,53].
It is also found that the required number of runs of a noisy boson sampling for the purpose of distinguishing its output distribution from the considered class of approximations shows critical dependence on the scaling with N of the density of bosons ρ = N/M , performing a transition from a polynomial in N number of runs for the vanishing density of bosons, e.g., in the no-collision regime [4] (when ρ ∼ 1/N at the least), to an Nindependent number of runs for a finite density of bosons.
Our results pose some questions. In the nocollision regime, the lower bound on the total variation distance between the output distribution of noisy boson sampling and that of the approximation by low-order multiboson interferences vanishes when the number of bosons scales up. On the other hand, in Refs. [48,52,53] an upper bound on the total variation distance was found, which does not vanish with the total number of bosons. Are there better efficient classical approximations to the noisy boson sampling, which account for the multi-boson interferences to the same low order and narrow the gap between the upper and lower bounds? Is it possible to find a better way to distinguish the two output distributions in the no-collision regime, i.e., requiring much smaller number of samples? These and related questions are left for the future work.
We conclude by noting that the critical dependence on the density of particles, and not on the number of particles themselves, is strikingly reminiscent of the quantum-to-classical transition in systems of identical particles, occurring when the total number of particles scales up whereas the density of particles vanishes. Such a transition results in the mean-field approximation to a large system of identical bosons, the most prominent example being the Bose-Einstein condensation of weakly interacting bosons [77], experimentally achieved with dilute gases [78,79] and approximated by the mean-field order parameter obeying the Gross-Pitaevskii equation [80,81].

Acknowledgements
The author acknowledges correspondence with Scott Aaronson and discussions with Jelmer Renema and Raúl García-Patrón. This work was supported by the National Council for Scientific and Technological Development (CNPq) of Brazil, Grant 307813/2019-3, and by the São Paulo Research Foundation (FAPESP), Grant 2018/24664-9.

A Lower bound for uniformly lossy interferometer
Here we derive the lower bound on the total variation distance between the output distribution of N -boson sampling, affected by boson losses, partial distinguishability of bosons and dark counts of detectors, and that of the classical approximation by K interfering bosons and N − K classical particles (distinguishable bosons), below called the K-reduced model. Here we consider the case of uniformly lossy interferometer U = √ ηU , U U † = I, with the transmission 0 < η ≤ 1, whereas bosons can be in an arbitrary state of partial distinguishability described by a function J(σ) on permutations σ ∈ S N , defined in the main text. Let us fix the notations. We will use U[k 1 , . . . , k n |l 1 , . . . , l n ] for the submatrix of U on the rows k 1 , . . . , k n (input ports with a boson) and a multi-set of columns l 1 , . . . , l n (the output ports with bosons) corresponding to occupations m = (m 1 , . . . , m M ), |m| ≡ m 1 + . . . + m M = n (referred below as the output configuration).
Let us first consider the case of no dark counts of detectors and give the probability of detecting n bosons at the output of a lossy boson sampling device. For uniform losses, we can assume boson losses to occur at the input [50], hence, with the probability η n (1 − η) N −n only n out of N bosons are sent through interferometer U . Then the probability p 0 (m) to detect n bosons in an output configuration m reads [42,68] where k = (k 1 , . . . , k n ), a subset of the input ports 1, . . . , N , from which the detected bosons originate, p(l|k) is the probability of n partially distinguishable bosons from input ports k, whose state of distinguishability is described by the distinguishability function J k (σ), the output ports l = (l 1 , . . . , l n ) correspond to the configuration m, and the summation runs over the permutations σ 1,2 ∈ S n of n bosons in the input ports k. Now let us describe the effect of the dark counts of detectors. Assume that there are N total detector clicks in an output configuration s = (s 1 , . . . , s M ), thus additionally to n detected bosons there are N − n dark counts (generally, multiple) corresponding to an output configuration r = (r 1 , . . . , r M ), |r| = N − n and r ⊂ s (meaning that r l ≤ s l for all l = 1, . . . , M ). The corresponding probability p(s) becomes where the output ports l = (l 1 , . . . , l n ) correspond to the output configuration m = s − r, |m| = n. The expression of Eq. (21) is complicated, however, as we need to sum the probabilities in Eq. (21) corresponding to no counts in a subset of output ports, the effect of dark counts will be given by a simple factor.
We consider how close is the output probability distribution of Eq. (21) to that of the K-reduced model obtained by replacing N − K partially distinguishable bosons (from randomly selected inputs) by completely distinguishable ones (i.e., classical particles). As explained in the main text, this model corresponds to the probability distribution p (K) (m) similar to that of Eq. (21) but with the following distinguishability function Consider a lower bound on the total variation distance D = D(p, p (K) ) between our noisy boson sampling and its K-reduced model. We have D = n D n , where D n corresponds to exactly n detected bosons at the output. The lower bound is given by the absolute value of the difference in probability to detect all the output bosons in a fixed subset of output ports, below set to be L + 1 ≤ l ≤ M , i.e., no counts in output ports Ω L = {1, . . . , L}. In this case, the effect of dark counts can be accounted in a very simple manner.
Being independent of the input state of bosons, they contribute the factor e −Lν to the probability of zero counts in a subset of L output ports. Therefore, we have where H and A are N -dimensional positive semidefinite Hermitian matrices, defined as follows: To arrive at the result in Eqs. (24) and (25) we have performed the following steps. We have used an identity for the sum of output probabil-ities [56,73] |m|=n where σ ≡ σ 1 σ −1 2 , the following identity replaced the sum over all output configurations m by n independent sums over output ports l 1 , . . . , l n , using that for any symmetric function f (l 1 , . . . , l n ), and observed that for any permutation τ ∈ S N N k=1 where σ ∈ S n , acting on the subindex in k α , is defined by the relation k σ(α) = τ (k α ) (we can introduce such σ, since τ permutes elements in k between themselves). To derive Eq. (24) we have used here that the interferometer has uniform transmission η. However, as proven below, the resulting expression applies for an arbitrary (non-uniformly) lossy interferometer U.

B Lower bound for arbitrary lossy interferometer
Recall that an arbitrary lossy linear interferometer with M input ports can be imbedded into a unitary one with 2M input ports [66] with introduction of M auxiliary boson modes describing losses. One such 2M -dimensional unitary interferometer reads [56] where U = √ BF , with B = UU † and a unitary matrix F , V is such that V V † = I − U U † , and D = diag(η 1 , . . . , η M ) with η l being eigenvalue of B (a singular value of U). Bosons at the output ports M + 1 ≤ l ≤ 2M correspond to losses. Below we will only need to consider the M × 2M -dimensional block (U, V ). For zero dark counts, the probability that n out of input N partially distinguishable bosons are detected at the output of a lossy interferometer U in an output configuration m = (m 1 , . . . , m M ) can be obtained by summation of the general formula for output probability p(m) in the unitary inter-ferometerÛ [68], similar as in Eq. (20), over all possible output configurations r = (r 1 , . . . , r M ), |r| = N − n, of bosons in the output ports M + 1 ≤ l ≤ 2M . Since the probability is symmetric in the input/output ports, we designate output ports l 1 , . . . , l n to be the detected bosons, i.e., 1 ≤ l k ≤ M , 1 ≤ k ≤ n. Using a summation identity similar to that of Eq. (27), and recalling that V V † = I − UU † we obtain where we have introduced subsets k = (k 1 , . . . , k n ) of the input ports of 1, . . . , N , where k α ≡ σ 2 (α), used an identity similar to that of Eq. (26) to recast the summation over σ 2 ∈ S N as that over the subsets k and permutations τ ∈ S n in each such subset, and introduced the relative permutation σ R = σ 1 σ −1 2 . Note that the r.h.s. of Eq. (31) does not depend on the el-ements of the auxiliary unitary matrixÛ other than U.
Consider now the probability P L (n) of no counts in L output ports l = 1, . . . , L with exactly n bosons detected at the output of U. For zero dark counts, by using the summation identity of Eq. (27) we get where we have introduced a positive semi-definite Hermitian matrix (33) and used an identity for permutation τ , similar as in Eq. (26).
Thanks to the summation in Eq. (32) over all products of matrix elements on exactly n input ports k = (k 1 , . . . , k n ) selected from the total N of them, the result of Eq. (32) can be rewritten as the nth order derivative in an auxiliary variable x at x = 0 of the N th order polynomial: .
(34) Eq. (34) is very convenient for the derivation of the expression for the total probability of no counts in L output ports P L = N n=0 P L (n). Indeed, since the polynomial in Eq. (34) is of order N , the latter sum is the Taylor series about x = 0 with ∆x = 1, with the result being the value of the polynomial at x = 1. Taking into account also the random counts factor e −νL , we get where ∆J = J − J (K) .

C Estimating ∆P 1
Here we derive an analytical estimate on the difference in probability in Eq. (36) for L = 1 in the simplest case of a uniform partial distinguishability ξ and a uniform transmission η, U = √ ηU .
where we have used an identity for permutations in S n having n − m fixed points [69] (see also Eq. (40) Using Eq. (39) into Eq. (37) and observing that by Eq. (38) nonzero terms have n ≥ K + 1 we obtain C.
where we have used the binomial identity in the sum over n − m. We have from Eqs. (41) and where we have introduced Using the Pochhammer symbol for falling factorial, (n) m = n(n − 1) . . . (n − m + 1), the sum on the r.h.s. of Eq. (43) can be rewritten as follows Let us consider the remaining sum in Eq. (45), denoted by I N −K−1 (K). Using the estimate on d m in Eqs. (80) and (81) obtained in Section F, we get where we have used that nX < 1, more precisely The exponentially small term in Eq. (47) can be dropped in Eq. (46) as compared to the term What is left is to estimate the front factors in Eq. (43) and (45). We have Using the definition Eq. (44) and the following identity (derived by using Euler's summation formula, similarly as in Ref. [73]) we obtain Combining Eqs. (43) -(49) we get the following result Below will show that the leading order of W 1 from Eq. (50) is also the leading order of the lower bound in a Haar-random interferometer under a similar condition K 2 N , whereas the variance scales as O(K 2 /N ).

C.2 The average ∆P 1 in the Haar-random interferometer
We will use the following known average in a Haar-random unitary interferometer U [72] n α=1 where m (n) ≡ m(m+1) . . . (m+n−1) (Pochhammer symbol for the rising factorial). Consider the average lower bound in a Haarrandom interferometer obtained by averaging Eq. (41). Observing that |x| ≥ | x |, we have D ≥ | ∆P 1 |. We obtain thus they decrease exponentially fast with n. Due to the expansion (derived by using Euler's summation, similar as in Ref. [73]) we can neglect the difference between (η/M ) n and η n /M (n) for n 2 M . These two observations allows us to conclude that, without much of error, one can substitute the Pochhammer symbol M (n) by M n on the r.h.s. of Eq. (52), thus obtaining the same estimate on the average lower bound as in Eq. (50), if for the few first terms in the sum n 2 M = N/ρ, which necessitates that K 2 N/ρ. Below we give an alternative derivation of the leading order of the average lower bound in Eq. (52).
Using Pochhammer symbol for the falling fac- where we have introduced new indices l and s by n = l + K + 1 and m = s + K + 1. Due to the above discussion, we can approximate the falling factorial in the reduced sum by the corresponding power, using that (obtained similar as Eq. (54)) Assuming that K is not small for the approximation d m /m! ≈ e −1 for m ≥ K + 1 (see Eq. (40)) and further details in Eq. (80) of appendix F) and approximating the exponential series by the exponential functions for N − K 1 we have In the same way, we can also approximate the ratio of two falling factorials in the first factor in Eq. (55) Therefore, we obtain from Eqs. (52), (55) and where now the lower bound of Eq. (50) is an approximation to the average for sufficiently large values of K and N K 2 . Below, using the same approximations as those used for derivation of Eq. (58), we show that the relative variance of the difference in the probability in Eq. (41) is of the order O(K 2 /N ), which means that except for a vanishing fraction of interferometers as N scales up, the average result in Eq. (58) is the leading order of an absolute lower bound.

C.3 Variance of ∆P 1 in the Haar-random interferometers
Consider the variance of ∆P 1 in Eq. (41). Due to Eq. (51) the variance R = (∆P 1 ) 2 − ∆P 1 with where we have used that for a common element k in k (1,2) (denoting this subset by k (1) ∩ k (2) ) there is a factor 2 in the r.h.s. of Eq. (51). Let us simplify the sum in Eq. (60). By assuming that n 2 ≥ n 1 and using a dummy variable a ∈ {1, 2} we get for the two terms in the square brackets in Eq. (60) k (1) k (2) a |k (1) ∩k (2) Since the expression in Eq. (61) is symmetric in n 1 and n 2 , it applies for any n 1 and n 2 . Now, using the Pochhammer notations for the falling factorials, for θ n 1 ,n 2 ≡ n 1 !n 2 !Θ n 1 ,n 2 we get θ n 1 ,n 2 = min(n 1 ,n 2 ) s=0 (n 1 ) s (n 2 ) s (N ) n 1 +n 2 −s s!M (n 1 +n 2 ) By setting n i = l i + K + 1 and m i = s i + K + 1 in Eq. (59) we obtain using Eq. (62) As in the estimate of the average value ∆P 1 , only the lowest-order terms in the sums over l 1,2 contribute significantly to the result. Thus we can approximate the Pochhammer symbols by using the expansions in Eqs. (54) and (56). Moreover, for K √ N , it turns out to be sufficient for the leading-order approximation for θ n 1 ,n 2 of Eq. (62) to keep only the terms with s = 0 and s = 1 in the expansion. In this case one can simplify the variance R in Eq. (63). Using the following approximations for the ratios of Pochhammer symbols (setting n = max(n 1 , n 2 )) M (n 1 +n 2 ) M (n 1 ) M (n 2 ) = 1 + we obtain for the first two terms (s = 0, 1) in Eq. s=0,1 The rest of the expansion in Eq. (62) is relatively much smaller for n 1,2 ≤ K + 1 √ N , since Thus we have (recall that n = max(n 1 , n 2 )) θ n 1 ,n 2 = (1 − ρ) N n 1 n 2 ρ n 1 +n 2 1 + O n 2 N .
(64) For K sufficiently large, using the approximation d m /m! ≈ e −1 for m ≥ K + 1, we obtain from Eqs. (63) and (64) the leading order term of the variance as follows where we have taken into account that the four sums factorize into two double sums for (l i , s i ), i = 1, 2, with each factor being proportional to that evaluated in Eq. (57). Eq. (65) shows that, as N → ∞ and K 2 /N → 0, the average value ∆P 1 , given by Eq. (58), is the asymptotic difference in probability for almost all interferometers.
Obviously, there are interferometers which cannot satisfy Eq. (58). This is the class of al- where U = PV is a product of a permutation interferometer P, exchanging the labels of the input ports, and an almost diagonal unitary interferometer V . More precisely, if max(|V k,l | 2 ) for k = l is much smaller than the average |U k1 | 2 = 1/M . This class of interferometers corresponds to strongly concentrated output probability distribution, as N scales up, on the permutation of input ports by P.
D Estimating the probability P 1 We consider the case of uniform losses and distinguishability: U = √ ηU , U † U = I, and J(σ) = ξ N −c 1 (σ) . In the simplest case of L = 1 we can estimate the average value of P 1 of Eq. (35) for such a noisy boson sampling model. The easiest way to get the necessary expression for P 1 is by replacing K + 1 → 0 in Eq. (52). We obtain where we have used the summation identity (which can be obtained by the method of generating functions, see for instance, Ref. Now we can perform an approximation assuming that ρη 1, such that only the powers n √ N contribute significantly to the sum over n in Eq.
Let us also estimate the variance V ≡ P 2 1 − P 1 2 . We have from Eqs. ( where P 1 is given in Eq. (68).

E Numerical simulations of the lower bound
Here the method used to numerically simulate Eq. (9) of the main text is described. We assume, as in appendix C, a uniform overlap ξ over the internal states of bosons, thus J(σ) = ξ N −c 1 (σ) , where c 1 (σ) is the total number of fixed points of permutation σ. Denoting n = c 1 (σ) we get from Eqs. (22) and (24) ∆P L = e −νL where k = (k 1 , . . . , k n ), the sum with fix(σ) = k denotes the summation over all permutations with fixed points k (i.e., over all derangements of the complementary subset (k n+1 , . . . , k N )), and A Then the sum in the expression on the r.h.s. of Eq. (70) can be rewritten as follows where D m ∈ S N is the set of all permutations having exactly N − m fixed points, i.e., the set of all derangements of m elements in the symmetric group S N . Unfortunately, derangements are very difficult to handle numerically. Therefore, we need to rewrite the sum in Eq. (72) in terms of the matrix permanents (i.e., using all the permutations in a symmetric group). We will use the following summation identity (representing the generalised inclusion-exclusion principle) valid for any additive function f on the symmetric group S N [82] f (D N −n ) = i.e., we need to compute permanents of the positive semi-definite Hermitian matrices where we have used the summation identity (easily proven by induction in m) Therefore, we have derived the following result where f s is given by. Eq. (74). This result allows one to simulate numerically ∆P L by computing the matrix permanents.
F Comparing the models with distinguishability functions J (K) and J (K) + Let us consider how switching from the approximation model with J (K) (σ) of Eq. (6) to that with J (K) + (σ) of Eq. (17) would affect the results in Eqs. (9)- (12). The difference in probability ∆P L in Eq. (9) depends on the difference of distinguishability functions, moreover, the expression for ∆P 1 depends only on the sum σ∈Sn ∆J(σ) for K + 1 ≤ n ≤ N (see Eqs.
where we have introduced the number of derangements m ≡ n − c 1 (σ), have taken into account that derangements of m bosons are weighted by ξ m , that the model with J (K) accounts only for the derangement with m ≤ K, whereas that with J (K) + accounts for those (d