Simulating boson sampling in lossy architectures

Photon losses are among the strongest imperfections affecting multi-photon interference. Despite their importance, little is known about their effect on boson sampling experiments. In this work we show that, using classical computers, one can efficiently simulate multi-photon interference in all architectures that suffer from an exponential decay of the transmission with the depth of the circuit, such as integrated photonic circuits or optical fibers. We prove that either the depth of the circuit is large enough that it can be simulated by thermal noise with an algorithm running in polynomial time, or it is shallow enough that a tensor network simulation runs in quasi-polynomial time. This result suggests that, in order to implement a quantum advantage experiment with single photons and linear optics, new experimental platforms may be needed.


Introduction
In 2003 Knill, Laflamme and Milburn showed that single-photon sources and linear optics are sufficient to achieve universal quantum computation [1]. A single-photon and linear-optics version of measurement-based quantum computation has also been thoroughly studied [2,3]. In both proposals, a key component to reach universality is the capability for a measurement outcome to change the gates implemented later in time, i.e., using active feed-forward, a challenging experimental requirement [4]. In 2010, Aaronson and Arkhipov demonstrated that removing the feed-forward condition provides a framework, called boson sampling, that is not sufficient for universal quantum computation but that is hard to simulate on a classical computer, unless the polynomial hierarchy collapses to its third level [5]. The key idea behind their proof is the connection between the output statistics of non-interacting indistinguishable photons and the permanent [6], a quantity known to be #P-hard to compute [7,8]. As shown in Figure 1, a boson sampling device implements the interference of N single photons over an M-mode randomly-selected arbitrary linear-optics interferometer and measures each output mode with a photon-counting detector. An M-mode linear-optics interferometer is built out of two-mode couplers (beamsplitters) acting on neighboring modes and single-mode phase gates. In order to generate an arbitrary linear-optics circuit, a depth proportional to M is needed [9,10].

Raúl García-Patrón: raulgarciapatron@gmail.com
Since 2013 a variety of quantum optics experimental groups have implemented proof-of-principle demonstrations based on different architectures, such as reconfigurable integrated photonic circuits [11], fiber loops [12], 3D waveguides [13] or multimode fibers [14], that have the potential of being scalable and are therefore candidates for a quantum advantage demonstration. The motivation to further simplify the experimental scheme led to the proposal of scattershot boson sampling, a sampling problem as hard to simulate as the initial boson sampling proposal. Scattershot boson sampling solves the problem of obtaining N single photons from state-of-the-art probabilistic single-photon sources by using M heralded two-mode squeezed vacuum sources, one per input mode of the boson sampling circuit [15]. It was discussed in [16] that, since scattershot boson sampling is a subclass of Gaussian boson sampling, the problem of sampling a Gaussian state in the photon number basis is of the same complexity as the original boson sampling proposal. The counting statistics of Gaussian boson sampling are connected to the Hafnian [17], a quantity related to the number of perfect matchings in a graph, in the same way as the permanent is related to the number of perfect matchings in a bipartite graph.
The lack of fault-tolerant error correction in quantum advantage architectures, such as boson sampling, implies that increasing the size and depth of the circuit would ultimately lead to a system that is equivalent to sampling random noise. Therefore, the existence of an opportunity window where noise has not yet destroyed the quantum advantage but a classical algorithm, such as [18,19], can no longer simulate the system is fundamental for a conclusive quantum advantage experiment. It is thus of paramount importance to have a good understanding of when noise makes a quantum advantage architecture classically simulatable. Rahimi-Keshari et al. provided in [20] a first rigorous bound, which combined losses and dark counts of the detectors. Unfortunately, this bound is independent of the size of the system and cannot provide an answer in terms of losses independently of the detectors' dark-count rate.
Together with the indistinguishability of photons, for which an algorithm to simulate partially distinguishable photons was demonstrated recently in [21], losses are the most damaging imperfection challenging boson sampling. Despite their importance, little is known about the effect of losses, beyond the fact that boson sampling remains hard in the regime of a constant number of photons lost [22], a rather limited assumption. Most boson sampling architectures, and all for which interference has been shown for more than 2 photons, are based on a planar geometry of depth proportional to the number of input systems and where the loss per coupler in the circuit is constant, leading to an exponential decay of the transmission with the depth of the circuit. In subsection 2.3 of this work we show that for those platforms, and for platforms with a similar exponential decay, boson sampling can be efficiently simulated classically. Therefore, for single photons and linear optics to remain competitive in the race for a quantum advantage experiment, a profound change of paradigm is needed. More precisely, we show that for those platforms either the depth of the circuit (D) is large enough (D ≥ O(log M)) that it can be simulated by thermal noise with an algorithm running in polynomial time, or the depth of the circuit is short enough (D ≤ O(log M)) that a tensor network simulation, similar in spirit to [23], runs in quasi-polynomial time. Not all optical architectures suffer from an exponential decay of the transmission; for example, optical free space has a quadratic decay of transmission. In paragraph 2.3.4 we extend the validity of the thermal noise simulation to this family of architectures. In addition, our result can easily be adapted to scattershot boson sampling and to boson sampling architectures where the photon-counting detectors are replaced by Gaussian measurements [24,25], as shown in subsection 2.4.

Main result

The ideal boson sampling model
The initial boson sampling proposal concerns the interference of a multi-photon input state |n⟩ = |n_1⟩ ⊗ |n_2⟩ ⊗ . . . ⊗ |n_N⟩, where n corresponds to a string of bits such that |n| = Σ_i n_i = N, over an M-mode linear-optics interferometer modeled as a linear transformation â_i → Σ_j U_{ij} â_j of the annihilation operators, where the unitarity of U guarantees the preservation of the total photon number. Remark that U is an M × M matrix acting on the creation operators, to which corresponds a homomorphism ϕ(U) of dimension (N+M−1 choose N) acting on the M-mode Fock space [5]. At the output we implement a measurement in the photon number basis on each mode, where the probability of obtaining an outcome z reads |⟨z|ϕ(U)|n⟩|². When z corresponds to a string of bits, the probability outcome is connected to the permanent of a submatrix of U, a crucial tool in the hardness proof of boson sampling. Interestingly, all arrangements of two-mode couplers and single-mode phase gates leading to the same global transformation U are completely equivalent, and will be experimentally indistinguishable, with respect to their input/output statistics.
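The permanent connection above can be made concrete with a short numerical sketch (an illustration, not part of the original text): for collision-free input and output strings the probability |⟨z|ϕ(U)|n⟩|² equals |Perm(U_S)|² for the corresponding N × N submatrix U_S, and Ryser's inclusion-exclusion formula evaluates the permanent in O(2^n n) time, in line with its #P-hardness.

```python
import numpy as np
from itertools import combinations

def permanent(A):
    """Permanent of a square matrix via Ryser's inclusion-exclusion formula."""
    n = A.shape[0]
    total = 0.0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            # product over rows of the sum of the selected columns
            total += (-1) ** k * np.prod(A[:, cols].sum(axis=1))
    return (-1) ** n * total
```

For a Haar-random unitary U, a collision-free outcome with photons in output modes z, given photons in input modes n, would then have probability abs(permanent(U[np.ix_(z, n)]))**2.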
A necessary condition in the proof of the hardness of boson sampling is the fact that U needs to be a Haar random unitary. The proposal by Reck et al. [9] showed that any general linear-optics transformation can be achieved with a planar circuit composed of two-mode gates, using M(M − 1)/2 gates distributed over 2M − 3 layers and M parallel modes. A recent improvement in [10] remarkably brought this result to depth M + 1, which is very close to the lower bound M obtained from a simple counting argument based on the degrees of freedom of a unitary matrix. Therefore, an architecture implementing boson sampling needs to have a depth at least equal to the number of modes, as it could in principle be any M × M unitary U. Even if boson sampling over a planar architecture needs a depth scaling linearly with the number of modes, other non-universal circuits of shorter depth can be of interest for specific applications. Therefore, to make our result as general as possible, we will consider the depth of the circuit D as an additional free parameter.
In the initial boson sampling proposal, the proof necessitates a polynomial relation between the number of photons and modes (M = O(N^6)). In this work we consider the generalized relation N = kM^γ, where 0 < k < 1 and 0 < γ ≤ 1. It is easy to see that γ = 1/2 corresponds to the bosonic birthday paradox ratio [26]. This ratio ensures that the input state |n⟩ is composed of single photons and that the probability of two bosons bunching at the output is negligible (on average over the set of Haar random unitaries). The case γ = 1/6 is the condition in the proof of Aaronson and Arkhipov. Finally, γ = 1 corresponds to the regime where the density of photons (then given by k) remains constant while the size of the system increases, as opposed to the original boson sampling proposal, where it decreases with the size of the system with a scaling M^{−5/6}.

Figure 2: a) N single photons are sent over a lossy M-mode linear-optics circuit composed of D layers of two-mode couplers. A uniform loss of τ per coupler leads to an exponential decay of the transmission µ = τ^D; b) the real scheme in a) is indistinguishable from a circuit composed of a lossless linear-optics transformation V, followed by M parallel pure-loss channels of transmission µ_i each, and a final lossless linear-optics transformation W; c) in the case where all the µ_i are equal, the virtual representation can be simplified to a layer of M identical pure-loss channels of transmission µ followed by a virtual lossless linear-optics transformation U = VW; d) the action of a pure-loss channel of transmission µ on a single-photon state |1⟩ is equivalent to an erasure channel of probability µ, which outputs the mixed state σ = (1−µ)|0⟩⟨0| + µ|1⟩⟨1|.

Modeling losses
A lossy linear-optics circuit, as in Figure 2 a), can be mathematically modeled by a complex matrix A satisfying AA† ≤ I and transforming the annihilation operators â of the M input modes and ê of the M environment modes. A has a singular value decomposition A = V µ̂ W, where V and W are unitary matrices and µ̂ = diag(√µ_1, . . . , √µ_M). The singular value decomposition has a very natural interpretation, see Figure 2 b): the real interferometer with losses characterized by A has an equivalent circuit composed of a lossless linear-optics transformation V, followed by M parallel pure-loss channels of transmission µ_i each, and a final lossless linear-optics transformation W. A pure-loss channel is equivalent to a coupling interaction of transmission µ_i between the physical mode i and an environmental mode, see [27] for details. The matrix A can be efficiently inferred using a simple tomographic technique that only needs two-mode interferences of classical laser light [28].
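The virtual decomposition can be checked numerically. The sketch below is illustrative: the matrix A is a randomly generated stand-in for a measured lossy circuit, built with known transmission amplitudes √µ_i, which the singular value decomposition then recovers together with the unitaries V and W.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4

def haar_unitary(M, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M)))
    return Q * (np.diag(R) / np.abs(np.diag(R)))  # fix column phases

mu = rng.uniform(0.25, 1.0, size=M)                 # per-mode transmissions mu_i
A = haar_unitary(M, rng) @ np.diag(np.sqrt(mu)) @ haar_unitary(M, rng)

V, s, W = np.linalg.svd(A)                          # A = V diag(s) W
# the singular values are exactly the transmission amplitudes sqrt(mu_i)
print(np.allclose(np.sort(s), np.sort(np.sqrt(mu))))   # True
print(np.allclose(V @ np.diag(s) @ W, A))              # True
```

The condition AA† ≤ I translates into all singular values being at most 1, i.e., no mode is amplified.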
In practice, a linear-optics circuit is composed of a network of two-mode couplers and single-mode phase gates, where each layer of gates of an M-mode linear-optics circuit is given by a direct product of local 2 × 2 linear transformations and complex scalars, resulting in an M × M complex banded matrix A_i of width 2. The total linear-optics circuit transformation results from the multiplication of the D matrices A_i, i.e., A = A_D · · · A_2 A_1. Similarly, as discussed for the ideal linear-optics circuit, any arrangement of two-mode couplers and single-mode phase gates leading to the same global transformation A will be experimentally indistinguishable with respect to their input/output statistics.
All currently existing architecture proposals to implement a boson sampling experiment (integrated photonic circuits, fiber-optical links and 3D waveguides) suffer from an exponential decay of the transmission with the length of the circuit. An intuitive explanation is that every photon has a constant probability of being lost per unit of length of the circuit, or per layer of coupling gates. For a planar circuit composed of D layers of gates, where every gate has a transmission coefficient τ, all the µ_i are equal and the transmission follows the exponential decay rule µ = τ^D. Because µ̂ = √(τ^D) I commutes with any matrix, we can simplify the virtual representation of A to a layer of M identical pure-loss channels of transmission µ followed by a virtual lossless linear-optics transformation U = VW (see Figure 2 c)).
To make our proof more accessible, we first restrict to the scenario of uniform loss and later generalize the result to arbitrary circuits in Section 6, where we show that any result that holds for uniform µ can be generalized to µ max = max i µ i .

Simulating boson sampling over circuits with losses
The action of a pure-loss channel of transmission µ on a single-photon state |1⟩ is equivalent to an erasure channel of probability µ, resulting in the mixed state σ = (1−µ)|0⟩⟨0| + µ|1⟩⟨1|, see Figure 2 d). Therefore, boson sampling over a realistic interferometer with uniform losses is equivalent to an ideal boson sampler over its virtual circuit U = VW where we replace each of the N single photons of the input state, located in the first N modes, by the state σ, leading to the global input state ρ_in = σ^{⊗N} ⊗ |0⟩⟨0|^{⊗(M−N)}. This state can be equivalently written

ρ_in = Σ_{n=0}^{N} µ^n (1−µ)^{N−n} Σ_{n ∈ Φ^{(1)}_{n|N}} |n⟩⟨n|,

where Φ^{(1)}_{n|N} denotes the set of distributions of n photons without collisions, i.e., n being a string of bits over the first N input modes, while the remaining M − N input modes are in a vacuum state. This last expression shares some similarities with the state ρ_T = σ_th^{⊗N} ⊗ |0⟩⟨0|^{⊗(M−N)}, composed of N thermal states in the first N modes and the remaining M − N input modes in a vacuum state. A thermal state is given by the Bose-Einstein distribution

σ_th = (1−λ) Σ_{n=0}^{∞} λ^n |n⟩⟨n|,

where λ = ⟨n⟩/(1+⟨n⟩), with ⟨n⟩ being the average number of photons. The state ρ_T can be equivalently written

ρ_T = (1−λ)^N Σ_{n=0}^{∞} λ^n Σ_{n ∈ Φ_{n|N}} |n⟩⟨n|,

where Φ_{n|N} is the set of all possible distributions of n photons over the first N input ports, where collisions are now allowed. This similarity can be made formal, as we show in the following section.

Simulation of lossy boson sampling with thermal noise
In section 3 we will prove the following lemma.

Lemma 1. For a boson sampling circuit of uniform losses µ satisfying the condition µ ≤ (ε/2N)^{1/2}, the trace distance between the states ρ_in and ρ_T, when setting λ = µ/(1−µ), is bounded as D(ρ_in, ρ_T) ≤ ε.

An intuitive interpretation of this result is the following. Suppose that you implement a boson sampling experiment where you inject N single photons into the circuit. After the lossy circuit you detect only k photons, each at a different detector, i.e., satisfying the bosonic birthday paradox condition. This provides an estimate of the losses of the order µ ≈ k/N. We do not know from which input each of the k surviving photons was coming, but the device being a boson sampler, we know each one comes from a different input mode. On the other hand, for k detected photons resulting from a thermal-state input ρ_T, all the possible distributions of the k photons among the N input modes are equiprobable, which allows all types of collisions. Imposing a bosonic birthday paradox condition k ∼ √N, the probability of collisions becomes negligible and both scenarios provide approximately the same answer, which justifies the scaling µ ∼ 1/√N.

Because quantum operations and measurements can only decrease the trace distance, any outcome statistics p_in and p_T, resulting from any quantum operation followed by any measurement, will also satisfy D(p_in, p_T) ≤ ε. Therefore, any classical algorithm efficiently simulating a boson sampling experiment with input state ρ_T will be ε-close to the one with input ρ_in. The following lemma on the efficient classical simulation of boson sampling with thermal-state inputs, which was implicit in [29], is proven in section 4.

Lemma 2. There exists a polynomial-time classical algorithm that simulates the evolution of a thermal state ρ_T over an ideal or lossy interferometer followed by measurement in the photon number basis, where the output distribution is ε-close to the ideal one and the computational cost scales as O(Mµ²N log²N).
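The birthday-paradox intuition above can be illustrated with a quick Monte Carlo estimate. The sketch below uses the simplifying assumption (for illustration only) that each of the k detected photons lands in an independently and uniformly chosen one of the N input modes; the collision probability then scales roughly as k²/(2N), so it is small for k well below √N.

```python
import numpy as np

rng = np.random.default_rng(1)

def collision_prob(N, k, trials=20000):
    """Empirical probability that k photons assigned to uniformly random
    modes out of N produce at least one collision (two photons in one mode)."""
    hits = 0
    for _ in range(trials):
        modes = rng.integers(0, N, size=k)
        hits += len(np.unique(modes)) < k
    return hits / trials
```

For instance, collision_prob(10000, 10) stays below a percent, while collision_prob(100, 50) is essentially one.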
As detailed in section 4, the algorithm combines the three following well-known facts in quantum optics. Firstly, any thermal state ρ_T has a Glauber-Sudarshan P-representation as a mixture of tensor products of coherent states |α⟩ = ⊗_{i=1}^{M} |α_i⟩ according to a Gaussian distribution P(α), which can be sampled efficiently. Secondly, a linear-optics circuit characterized by a unitary matrix U transforms a tensor product of coherent states |α⟩ into another tensor product of coherent states |β⟩ with β = Uα. Thirdly, coherent states follow a Poisson photon number distribution, which can be sampled efficiently.
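The three facts above already suggest a sketch of the sampler. The following Python illustration assumes ideal Gaussian and Poisson oracles, i.e., the idealized version before the discretization introduced in section 4:

```python
import numpy as np

def thermal_boson_sample(U, lam, N, rng):
    """Draw one sample of photon counts for thermal states of parameter lam
    in the first N of M modes, interfering through the unitary U."""
    M = U.shape[0]
    nbar = lam / (1 - lam)                      # mean photon number per mode
    alpha = np.zeros(M, dtype=complex)
    alpha[:N] = (rng.normal(size=N) + 1j * rng.normal(size=N)) \
                * np.sqrt(nbar / 2)             # sample the P-function P(alpha)
    beta = U @ alpha                            # coherent states stay product
    return rng.poisson(np.abs(beta) ** 2)       # Poisson photon counting
```

Averaged over many samples, the total photon number concentrates around N λ/(1−λ), and the cost per sample is dominated by the M × M matrix-vector product.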
The combination of both lemmas will allow us to simulate a lossy boson sampling architecture composed of D layers of gates with exponentially decaying transmission µ = τ D , as stated in the following theorem.
Proof. Any classical algorithm that generates a distribution p̃_T approximating the sampling from ρ_T satisfies

D(p_in, p̃_T) ≤ D(p_in, p_T) + D(p_T, p̃_T) ≤ D(ρ_in, ρ_T) + D(p_T, p̃_T),

where we use the triangle inequality in the first inequality and the fact that a measurement over a quantum state can only decrease its trace norm in the second. We can now apply Lemma 1 to set the bound D(ρ_in, ρ_T) < ε/2 and Lemma 2 to bound D(p_T, p̃_T) < ε/2. It is then easy to see that the classical simulability condition D ≥ D* can be trivially derived starting from the condition µ ≤ (ε/2N)^{1/2} adapted from Lemma 1, replacing the losses by µ = τ^D, taking the logarithm and replacing N by eq. (2). Therefore, when the condition D ≥ D* is satisfied, by properly selecting a thermal state ρ_T satisfying λ = µ/(1 − µ) and running the algorithm defined in section 4 and leading to Lemma 2, p̃_T provides an ε-approximation to our initial sampling distribution p_in in polynomial time.
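Under the assumed simplified form of the Lemma 1 condition, µ ≤ (ε/2N)^{1/2}, the critical depth can be computed explicitly. This is a hedged sketch: the constants in the actual bound D* may differ, but the logarithmic scaling is the point.

```python
import numpy as np

def critical_depth(tau, N, eps):
    """Smallest integer depth D such that mu = tau**D <= sqrt(eps / (2 N)),
    assuming the simplified form of the Lemma 1 condition."""
    return int(np.ceil(0.5 * np.log(2 * N / eps) / np.log(1 / tau)))
```

With N = kM^γ this gives D* = O(log M), matching the statement of the theorem; e.g., for τ = 0.99, N = 100 and ε = 0.01 one finds D* = 493 layers.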

Simulation of shallow boson sampling circuits
In section 5 we show how one can simulate an ideal boson sampling circuit using tensor network techniques, which can be summarized in the following lemma.

Lemma 3. An ideal boson sampling circuit with N interfering photons over an M-mode linear interferometer of depth D can be simulated exactly using tensor networks, with a running time polynomial in M but exponential in the depth D.
Tensor networks are a way of encoding quantum states and operating with them that has proven very successful in many-body physics [30,31,32,33,34]. Our tensor network proof is a quantum optics version, adapted from [23], of the well-known result that logarithmic-depth planar circuits of M qudits can be simulated in polynomial time [35]. It is easy to see that when the depth of the circuit scales logarithmically with the number of modes M and N satisfies equation (2), our algorithm runs in quasi-polynomial time. This is due to the unbounded nature of the Hilbert space of optical modes: in order to have an exact simulation we need to fix the local dimension of each mode to be as large as the total number of photons in the circuit, which results in a quasi-polynomial scaling.

Simulation of exponentially decaying transmission architectures
It is now easy to see that by combining theorem 1 and lemma 3 we can classically simulate any boson sampling architecture with an exponentially decaying transmission, for any depth of the circuit, as stated by the following theorem.
Proof. The classical simulability under the condition D ≥ D* is a direct corollary of theorem 1. For a uniform-loss boson sampling circuit satisfying D ≤ D*, we simulate the circuit with an equivalent virtual circuit composed of an ideal boson sampler U of the same depth D preceded by M lossy channels of transmission µ = τ^D, as explained in subsection 2.1. This is equivalent to an ideal boson sampling circuit U on the input state ρ_in = σ^{⊗N} ⊗ |0⟩⟨0|^{⊗(M−N)}. We model the initial state ρ_in by starting with N single photons and transforming each one into a vacuum state with probability 1 − µ. We then proceed with the tensor network simulation of U using lemma 3, where we just need to change the input tensor according to the random sequence of (surviving) input single photons. This leads to a quasi-polynomial time algorithm.

Algebraic decay of transmission
Not all optical architectures suffer from an exponential decay of the transmission; for example, optical free space suffers from a decay of transmission scaling as 1/D². Suppose that a given architecture follows the algebraic decay of losses

µ(D) = (1 + D/d)^{−β},    (11)

where d is a length scale that, together with the parameter β, models the algebraic decrease of the transmission. Then theorem 1 can be adapted to the following weaker form.

Corollary 1. The statistics of N photons interfering over an M-mode linear-optics planar circuit of depth D, with algebraic losses given by eq. (11), and a relation between photons and modes given by eq. (2), can be approximated with trace-distance error ε in polynomial time when D ≥ D*.

It is not difficult to check that when the condition γ/β < 2 is satisfied, there always exists an M* such that the condition D* ≤ M − 1 is satisfied for all M ≥ M*. This shows that any boson sampling experiment, which needs a depth D = M, on an algebraically decaying architecture satisfying γ/β < 2 will be classically simulatable by an approximation by thermal noise sampling for all M ≥ M*, a condition met by any free-space boson sampling architecture with its depth proportional to the number of modes (D = M).
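The condition γ/β < 2 is easy to probe numerically. The sketch below assumes the algebraic law µ(D) = (1 + D/d)^{−β} and the simplified simulability condition µ ≤ (ε/2N)^{1/2} with N = kM^γ; both exact forms are assumptions for illustration.

```python
def critical_depth_algebraic(d, beta, gamma, k, eps, M):
    """Depth D* at which (1 + D/d)**(-beta) drops below sqrt(eps / (2 N)),
    for N = k * M**gamma (assumed forms of eqs. (11) and (2))."""
    N = k * M ** gamma
    return d * ((2 * N / eps) ** (1 / (2 * beta)) - 1)
```

Since D* grows only as M^{γ/(2β)}, for γ/β < 2 it eventually falls below the depth D = M − 1 that a full boson sampling circuit requires, e.g., for free space (β = 2) with γ = 1.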

Generalization to alternative boson sampling proposals
Scattershot boson sampling was presented in [15] to circumvent the main problem of current state-of-the-art non-deterministic single-photon sources, namely that the probability of N sources firing simultaneously decays exponentially with N. The protocol starts by generating M two-mode squeezed vacuum states

|χ⟩ = (1−λ)^{1/2} Σ_{n=0}^{∞} λ^{n/2} |n⟩|n⟩,    (13)

where λ is the same parameter as in the definition of a thermal state, as a two-mode squeezed vacuum state is its purification. Then we send half of each two-mode squeezed vacuum state through a boson sampling circuit, while the remaining modes are used to herald the presence of photons. By properly tuning the squeezing parameter one can guarantee that most of the heralded sequences are collision-free, i.e., satisfy the birthday-paradox condition. The price to pay is that the modes where the photons enter the circuit are completely randomized, which is not a problem for boson sampling as the circuit is anyway Haar random. Because right after the heralding process the setup is strictly equivalent to a traditional boson sampling device, up to the randomization of the modes where the single photons enter the interferometer, both of our simulation algorithms (thermal state sampling and tensor network simulation) can be trivially adapted. We only need to randomly generate valid heralding sequences following the distribution given by eq. (13) and, depending on the obtained heralded value, run a boson sampling simulation as detailed in subsection 2.3. The only difference is that now the input photons enter the interferometer on a random selection of N input modes. More recently, a variant of boson sampling where photon detectors are replaced by a Gaussian measurement has been proposed [24,25]. Because quantum operations and measurements can only decrease the trace distance, the outcome statistics p_in and p_T will remain ε-close after any quantum measurement.
The evolved thermal state being Gaussian, we can extend our result to this scenario by using well-known techniques for simulating Gaussian measurements on Gaussian states [36].
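The scattershot heralding step described above can be generated directly: the reduced state of each two-mode squeezed vacuum is thermal, so each heralded photon number follows the geometric law (1−λ)λⁿ. The sketch below (illustrative; parameter values are arbitrary) estimates the fraction of collision-free heralding patterns as λ is tuned.

```python
import numpy as np

rng = np.random.default_rng(2)

def collision_free_fraction(M, lam, trials=5000):
    """Fraction of heralding patterns with at most one photon per mode,
    each mode's count drawn from the geometric law (1 - lam) * lam**n."""
    ok = 0
    for _ in range(trials):
        # numpy's geometric starts at 1, so shift to support {0, 1, 2, ...}
        n = rng.geometric(1 - lam, size=M) - 1
        ok += n.max() <= 1
    return ok / trials
```

A small λ keeps almost all heralding patterns collision-free, at the price of few heralded photons per shot, which is the trade-off discussed above.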

Proof of Lemma 1
The input state ρ_in = σ^{⊗N} ⊗ |0⟩⟨0|^{⊗(M−N)}, resulting from sending the N initial single photons through N pure-loss channels of transmission µ, reads

ρ_in = Σ_{n=0}^{N} µ^n (1−µ)^{N−n} Σ_{n ∈ Φ^{(1)}_{n|N}} |n⟩⟨n|,    (14)

where Φ^{(1)}_{n|N} denotes the set of non-collisional distributions of n photons over the first N input modes. As was mentioned in subsection 2.3, this last expression is strikingly similar to

ρ_T = (1−λ)^N Σ_{n=0}^{∞} λ^n Σ_{n ∈ Φ_{n|N}} |n⟩⟨n|,    (15)

where λ = ⟨n⟩/(1+⟨n⟩), with ⟨n⟩ being the average number of photons in each of the N thermal states, and Φ_{n|N} is the set of all possible distributions of n photons over the first N input ports, where collisions are now allowed. When µ < 1/2, which always holds when equation (7) is satisfied, we set λ = µ/(1−µ), which allows Eq. (14) to be rewritten as

ρ_in = (1−µ)^N Σ_{n=0}^{N} λ^n Σ_{n ∈ Φ^{(1)}_{n|N}} |n⟩⟨n|.    (16)

Let us estimate the trace distance between the states in Eqs. (16) and (15). We decompose the thermal state into two parts according to the summation terms (i.e., with ρ^{(n≤N)}_T being the part obtained from Eq. (15) by dropping the terms with n > N from the sum) and use that the cross terms vanish, as they correspond to products of density operators acting on orthogonal subspaces. We also define the normalization factor q(λ, N), which provides simplifications of the traces of the two parts, where the first corresponds to the collision-free terms and the second to the collision ones, for which only ρ_T contributes. One can upper-bound the two terms on the r.h.s. of Eq. (19) using q(λ, N) ≤ 1 in the first inequality and the easy-to-check relation λ = µ/(1−µ) ≤ 2µ, valid when µ ≤ 1/2, in the third. Remark that µ ≤ 1/2 is a corollary of equation (7). Therefore, we obtain the condition on µ stated in lemma 1.

Proof of Lemma 2
An ideal algorithm for boson sampling of thermal states was implicit in [29] and uses three properties. Firstly, any thermal state ρ_T has a Glauber-Sudarshan P-representation as a mixture of N-mode coherent states |α⟩ ≡ |α_1, . . . , α_N⟩ = ⊗_{i=1}^{N} |α_i⟩ according to the Gaussian distribution

p(α) = Π_{i=1}^{N} (1/(π⟨n⟩)) e^{−|α_i|²/⟨n⟩},

where ⟨n⟩ = λ/(1−λ) is the average photon number of each thermal state. Secondly, a linear-optics circuit characterized by a unitary matrix U transforms a tensor product of coherent states |α⟩ into the tensor product |β⟩ with β = Uα. In other words, coherent states remain in a tensor-product form while evolving through a linear-optics circuit. Thirdly, coherent states follow the Poisson photon number distribution

P(m, β) = Π_{k=1}^{M} e^{−|β_k|²} |β_k|^{2m_k} / m_k!.

Therefore, a concatenation of three stochastic processes simulates the boson sampling of thermal states. The first process generates a complex vector α following the probability distribution p(α). The second one applies the map U to the vector α, generating the output β = Uα. The third process P generates an M-dimensional vector m from β by sampling from the M-dimensional Poisson distribution P(m, β). This algorithm is an ideal one, as it assumes access to oracles that sample from Gaussian and Poisson distributions.
In order to build a realistic algorithm we define a new three-step process, where the sampling oracles are replaced by efficient approximation algorithms. The first step consists of sampling from a discrete-variable distribution p(α^(s)), similarly to [37]. It can be understood as a map N from the continuous α to its discretized version α^(s). The second step Ũ implements an approximation of the matrix multiplication, mapping α^(s) to β^(s). Finally, the third step P̃ generates m^(s), an approximate sample from an M-dimensional Poisson distribution of parameter β^(s), using a scalable number of Bernoulli trials [38]. The three subroutines are described in more detail below.

Error analysis
We want to show that the trace distance between the ideal and approximate algorithms described above satisfies D(p(m), p(m^(s))) ≤ ε, with the algorithm remaining polynomial in time.
In what follows we show how one can build an efficient algorithm for each step such that each of the last three terms on the r.h.s. of (25) is bounded by ε/3, leading to D(p(m), p(m^(s))) ≤ ε. It is important to remark that it is of crucial importance for the efficiency of the algorithm that Ũ and P̃ have finite support.

Discretization of the Gaussian distribution
To simplify the algorithm presented below, we convert the Glauber P-function into the standard complex normal distribution by setting z = α/√⟨n⟩ = √((1−λ)/λ) α, which transforms p(α) into

p(z) = π^{−N} e^{−∥z∥²}.    (26)

We cut off the Gaussian distribution in Eq. (26) to the product Ω_R of finite circular regions |z_k| ≤ R, for k = 1, . . . , N. Each circular region is discretized by imposing a circular grid such that there is a central circle of radius 0 ≤ |z| ≤ r_1 (which is not subdivided) and each annulus r_l ≤ |z| ≤ r_{l+1}, for l = 1, . . . , L (with some L), is subdivided into phase sectors of polar-angle size δφ. We choose equal δφ_l for each value of r_l by setting δφ_l = (δr_l)/r_l ≡ (r_{l+1} − r_l)/r_l, which allows us to set δr_l, l = 1, . . . , L, to be the same, so as to have equal areas δA of the grid elements (which also sets r_1² = δA/π). With these specifications, setting the number of elementary elements of the grid to be N_g, the other parameters follow from Eqs. (27). We will also need the largest distance Δ_z = max(|Δz_l|) between two points in an element of the grid, which is bounded as in Eq. (28), where we have used that 1 − cos(x) ≤ x²/2. Note the relation δA = Δ_z²/3. Let us estimate the trace distance between the probability distribution p(α) and the R-cut and discretized Gaussian described above, p^(s)(α).
To do so, we denote by p^(R)(α) the intermediate R-cut continuous Gaussian distribution before the discretization. Introducing the indicator function I_s(z) = Π_{k=1}^{N} I_{s_k}(z_k) of s = (s_1, . . . , s_N), where s_k is the corresponding element of the grid in dimension k, and denoting an inner point by z^(s), we can write the trace distance between the two probability distributions as in Eq. (29), where Ω̄_R is the region complementary to Ω_R. The first integral on the r.h.s. of Eq. (29) can easily be shown to satisfy the upper bound of Eq. (30). The last integral sum on the r.h.s. of Eq. (29) can be estimated using the multidimensional mean-value theorem in the variable Z = (z_1, . . . , z_N), for some ẑ^(s) on the straight line between z and z^(s), together with an upper bound for the 2-norm of the (complex) row vector Z = (z_1, . . . , z_N). With these observations, by using a Cauchy-Schwarz type inequality in the summation, the last integral sum in Eq. (29) is upper-bounded as in Eq. (32), where the last two integral sums on the r.h.s. of Eq. (32) are bounded from above by the respective N-dimensional Gaussian integrals on the N-dimensional circular region composed of r_1 ≤ |z_k| ≤ R + δr for each z_k, k = 1, . . . , N. The latter integrals are less than N and 1, respectively.

The algorithm
Generate N random points z_k according to the standard one-dimensional complex Gaussian distribution G(z) = (1/π) e^{−|z|²}, truncated to |z| ≤ R (with the necessary renormalization) and discretized by the circular grid of Eqs. (27) and (28), with R and Δ_z as defined above. From Eqs. (28) we find that this step needs at most N_g grid elements. One generates z_k (i.e., samples from the grid elements) by employing an adaptation of the inverse transform sampling method to the R-cut discretized Gaussian distribution in the z-plane. By setting y = (1 − e^{−r²})/(1 − e^{−R²}) we can write the probability to have r_l ≤ |z_k| ≤ r_{l+1} as y_{l+1} − y_l = (e^{−r_l²} − e^{−r_{l+1}²})/(1 − e^{−R²}). One selects an interval [y_l, y_{l+1}] (i.e., circular region l in the z-plane) by sampling from the uniform distribution on 0 ≤ y ≤ 1. Since the circular region r_l ≤ |z| ≤ r_{l+1} for l ≥ 2 is subdivided into equally probable phase sectors of size δφ, the next step is to perform a uniformly random selection of one of the phase sectors. By these two steps one selects, for k = 1, . . . , N, one grid element according to the R-cut discretized Gaussian distribution. The scheme necessitates O(N) operations, and the number of bits N_b required for the necessary accuracy of the uniform distribution in y remains scalable.
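The two selection steps can be sketched as follows. This is a simplified illustration: for brevity the annuli are chosen with equal probability rather than equal area, and each sample is snapped to a representative point of its grid cell.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_rcut_gaussian(R, L, rng):
    """Inverse-transform sample of the standard complex Gaussian truncated to
    |z| <= R, discretized over L annuli; y = (1 - exp(-r^2))/(1 - exp(-R^2))
    is the radial CDF used in the text."""
    norm = 1.0 - np.exp(-R ** 2)
    # annulus edges r_l such that each annulus carries probability 1/L
    edges = np.sqrt(-np.log(1.0 - norm * np.arange(L + 1) / L))
    l = rng.integers(0, L)                  # uniform y picks the annulus
    r = 0.5 * (edges[l] + edges[l + 1])     # representative radius of the cell
    phi = rng.uniform(0.0, 2 * np.pi)       # uniformly random phase sector
    return r * np.exp(1j * phi)
```

Over many samples the low moments match the truncated Gaussian; for instance, the mean of |z|² approaches 1 for large R and fine grids.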

Approximate Poisson sampling with Bernoulli trials
Starting from an M-dimensional vector β^(s) we output m^(s), where for each component β_k^(s) we use the classical bound

D(p_{S_n}, P(·, x)) ≤ x²/n    (37)

between the probability distribution of the sum of n independent Bernoulli trials, S_n = ξ_1 + . . . + ξ_n, with p_B(ξ = 1) = x/n and p_B(ξ = 0) = 1 − x/n, and the Poisson distribution P(m, x). This implies that the error of our M-dimensional output m^(s) will be bounded as

D(P_{β^(s)}, P̃_{β^(s)}) ≤ M max|β|⁴/n.    (38)

Therefore, the number of Bernoulli trials n necessary for the simulation of a Poisson distribution to an error ε/3 reads

n = 3M max|β|⁴/ε,    (39)

where we use that β = Uα = √(λ/(1−λ)) Uz, |U| ≤ 1, and that for µ ≤ 1/4 we have λ/(1−λ) = µ/(1−2µ) ≤ 2µ.

Proof of Lemma 3

In full generality, an M-mode state can be expanded in the Fock basis as |ψ⟩ = Σ_{n_1,...,n_M} C_{n_1,n_2,...,n_M} |n_1, n_2, . . . , n_M⟩, and simulating its evolution will be hard, as the Hilbert space, of size (N+M−1 choose N), grows exponentially if both N and M increase proportionally to each other. The idea behind tensor networks is to represent the element C_{n_1,n_2,...,n_M} as a tensor network with M tensors with virtual degrees of freedom that contract with each other, leaving M free parameters corresponding to the physical indexes n_i. This representation provides a very intuitive picture of quantum states and allows for a very efficient encoding and manipulation when the states have a high degree of locality.

Matrix product states
In our case we are interested in the evolution of a particular example of tensor network called a matrix product state, in which the coefficient C_{n_1,n_2,...,n_M} factorizes into a product B[1]_{n_1} B[2]_{n_2} · · · B[M]_{n_M}, where B[1]_{n_1} is a transposed (row) vector of dimension χ_1, B[M]_{n_M} is a vector of dimension χ_{M−1}, and the remaining B[i]_{n_i} are matrices of dimension χ_{i−1} × χ_i. The physical indexes n_i take values {0, 1, ..., d}. As shown in Figure 3 (c), one can associate to each matrix product state a one-dimensional graph where each vertex corresponds to a three-index tensor B[i]_{n_i,α,β} (Figure 3 (a)) and the edges determine the contraction rule of the tensor indexes (Figure 3 (b)).
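A minimal numpy sketch of this factorization (function names are ours): each mode carries a tensor of shape (χ_left, d+1, χ_right), a product Fock state needs only bond dimension χ = 1, and a coefficient C_{n_1,...,n_M} is recovered by multiplying the per-mode matrices.

```python
import numpy as np

d = 1  # photon-number cutoff per mode (at most one photon here)

def product_mps(occupations):
    """Trivial MPS (all bond dimensions 1) encoding a product Fock state."""
    tensors = []
    for n in occupations:
        B = np.zeros((1, d + 1, 1), dtype=complex)
        B[0, n, 0] = 1.0
        tensors.append(B)
    return tensors

def amplitude(mps, ns):
    """Contract the MPS to obtain the coefficient C_{n1,...,nM}."""
    v = np.eye(1, dtype=complex)
    for B, n in zip(mps, ns):
        v = v @ B[:, n, :]  # multiply the chi_left x chi_right matrix of outcome n
    return v[0, 0]

mps = product_mps([1, 0, 1])  # the Fock state |1, 0, 1>
```

Here `amplitude(mps, (1, 0, 1))` is 1 and any other outcome string has amplitude 0, as expected for a product state.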

Canonical form
It is well known that any bipartite quantum state can be rewritten as

|ψ⟩ = Σ_α λ_α |α⟩_A |α⟩_B, (42)

where the Schmidt coefficients λ_α result from the singular value decomposition c_{ij} = Σ_α U_{i,α} λ_α V_{j,α}. Every matrix product state can also be transformed into a canonical form

|ψ⟩ = Σ_{n_1,...,n_M} Γ[1]_{n_1} λ[1] Γ[2]_{n_2} λ[2] · · · λ[M−1] Γ[M]_{n_M} |n_1, n_2, . . . , n_M⟩ (43)

by iteratively applying the singular value decomposition [43,44], with its graph representation shown in Figure 3.
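The Schmidt decomposition (42) is exactly a singular value decomposition of the coefficient matrix, which numpy computes directly. A two-qubit example (the state (|01⟩ + |10⟩)/√2, chosen by us for illustration):

```python
import numpy as np

# Coefficient matrix c_ij of the bipartite state (|01> + |10>) / sqrt(2)
c = np.array([[0.0, 1.0], [1.0, 0.0]]) / np.sqrt(2)

# SVD: c_ij = sum_a U[i,a] * s[a] * Vh[a,j]; s are the Schmidt coefficients
U, s, Vh = np.linalg.svd(c)

# Both Schmidt coefficients equal 1/sqrt(2) (maximally entangled state),
# and their squares sum to 1 for a normalized state.
assert np.allclose(s, [1 / np.sqrt(2), 1 / np.sqrt(2)])
assert np.allclose(c, U @ np.diag(s) @ Vh)
```

Iterating this SVD across every bond of the chain is what brings a matrix product state into the canonical form (43).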

Simulating ideal linear-optics circuits
A linear optics circuit is composed of one-mode phase gates and two-mode couplers implementing an interaction between two adjacent modes. In what follows we first explain how to update a matrix product state under the evolution of linear-optics gates and later discuss how to sample from the final output state.

One mode gates (phase rotation)
As shown in Figure 4, a single-mode gate acting on mode i is modeled by a matrix G[i]. A phase rotation θ has a diagonal matrix G[i] with coefficients exp(iθn_i). Therefore, the update of a single local gate scales as O((d + 1)χ²), as G[i] is diagonal. Notice that applying a single-mode gate won't change the Schmidt coefficients of the matrix product state, as it acts only on the physical indexes of one vertex of the graph.

Two-mode couplers

A two-mode coupler B[k,k+1] acting on modes k and k + 1 is modeled by a four-leg tensor, i.e., a matrix product operator, with physical indexes n_k and n_{k+1} for the input and n′_k and n′_{k+1} for the output,

B[k,k+1] = Σ_{n_k,n_{k+1},n′_k,n′_{k+1}} C^{n′_k,n′_{k+1}}_{n_k,n_{k+1}} |n′_k, n′_{k+1}⟩⟨n_k, n_{k+1}|, (45)

where the coefficients C^{n′_k,n′_{k+1}}_{n_k,n_{k+1}} are the well-known input-output amplitudes of a beamsplitter [45,46] (see also equation (3.9) in [5]).

In [23] an algorithm was constructed based on directly applying the unitary B[k,k+1] to the matrix product state followed by a singular-value decomposition to rebuild the canonical form of the output state, reaching a scaling O(χ³d³). In what follows we present an alternative algorithm that provides a better scaling when the bond dimension χ is larger than the physical dimension d, which is generally the case in most realistic simulations.
As shown in Figure 5 a), in order to model the evolution of modes k and k + 1 under a two-mode coupler operation, we first implement a singular-value decomposition of the matrix product operator B[k,k+1] with respect to the bipartition between the (n_k, n′_k) and (n_{k+1}, n′_{k+1}) indexes, i.e.,

C^{n′_k,n′_{k+1}}_{n_k,n_{k+1}} = Σ_{γ=1}^{χ_BS} X[k]^{n′_k}_{n_k,γ} X[k+1]^{n′_{k+1}}_{n_{k+1},γ}, (46)

where χ_BS is the Schmidt rank of the matrix product operator. Since the Schmidt rank of a singular-value decomposition of a matrix is upper-bounded by the larger of the two local dimensions, we obtain the bound

χ_BS ≤ (d + 1)², (47)

and the running time of the matrix product operator decomposition scales as O((d + 1)⁶). As shown in Figure 5 b), the next step is to contract the tensors Γ[k] and X[k] of mode k and Γ[k+1] and X[k+1] of mode k + 1 in order to generate the tensors Γ̃[k] and Γ̃[k+1] of the state resulting after the beamsplitter transformation.
The running time of the contraction leading to the tensor Γ̃[k] scales as χ_{k−1} χ_k χ_BS (d + 1)², while for Γ̃[k+1] it scales as χ_{k+1} χ_k χ_BS (d + 1)², which leads to a contraction running time of

O(χ² χ_BS (d + 1)²), (48)

where χ = max_k χ_k. Remark that, as shown in Figure 5, the tensors Γ̃[k] and Γ̃[k+1] are connected by two pairs of singular values, χ_k from the initial state and χ_BS from the beamsplitter matrix product operator, which can be merged into a single bond χ̃ satisfying χ̃_k = χ_k χ_BS, which combined with equation (47) provides the bound

χ̃_k ≤ χ_k (d + 1)², (49)

which is the equivalent of Lemma 4 (i) in [35].
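The SVD split of the coupler and the bound χ_BS ≤ (d + 1)² can be checked numerically. The sketch below uses a random four-leg tensor as a stand-in for the beamsplitter amplitudes (the true amplitudes of Eq. (45) are not reproduced here); the index grouping and reconstruction are the same either way.

```python
import numpy as np

d = 2  # photon-number cutoff per mode
rng = np.random.default_rng(0)

# A generic two-mode coupler as a 4-leg tensor C[nk_out, nk1_out, nk_in, nk1_in]
# (random stand-in for the beamsplitter amplitudes of Eq. (45)).
shape = (d + 1,) * 4
C = rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Group the (output, input) legs of mode k on one side and those of mode k+1
# on the other, then split with an SVD, as in Eq. (46).
M = C.transpose(0, 2, 1, 3).reshape((d + 1)**2, (d + 1)**2)
U, s, Vh = np.linalg.svd(M)
chi_bs = int(np.sum(s > 1e-12))

# The Schmidt rank of the coupler is bounded by the local dimension (d+1)^2:
assert chi_bs <= (d + 1)**2

# X[k] and X[k+1] are the resulting 3-leg tensors (legs: out, in, gamma);
# recontracting over gamma recovers the original coupler tensor.
Xk = (U * s).reshape(d + 1, d + 1, (d + 1)**2)
Xk1 = Vh.reshape((d + 1)**2, d + 1, d + 1).transpose(1, 2, 0)
C_rec = np.einsum('aci,bdi->abcd', Xk, Xk1)
assert np.allclose(C_rec, C)
```

For the actual photon-number-conserving beamsplitter the singular spectrum is typically much sparser than this generic worst case, but the (d + 1)² bound is what the complexity analysis uses.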

Circuit simulation
The boson sampling input state corresponds to a trivial matrix product state of bond dimension χ = 1, composed of product tensors Γ[i] encoding a single photon in each of the first N modes and the vacuum in the remaining M − N modes.

Sampling from a matrix product state
Once the matrix product state resulting from D layers of gates has been calculated we can sample from it following a sequential procedure explained in [47] and reproduced here for completeness. We generate one random outcome per mode at a time and exploit the chain rule

p(n_1, . . . , n_M) = p(n_M | n_{M−1}, . . . , n_1) · · · p(n_2 | n_1) p(n_1). (50)

First, calculate for each of the d + 1 outcomes n_1 the probability Tr[|ψ⟩⟨ψ| (|n_1⟩⟨n_1| ⊗ I_{2...M})], where |n_1⟩⟨n_1| is the projector on the photon number state n_1 of mode 1 and I_{2...M} is the identity operator on modes 2 to M. This is done by contracting the matrix product state with itself, interleaved with a matrix product operator representing the measurement projector |n_1⟩⟨n_1|. Then we randomly select one of the d + 1 potential outcomes ñ_1 and update our state by generating |ψ_{ñ_1}⟩ := ⟨ñ_1|ψ⟩, where the bra ⟨ñ_1| acts only on mode 1. The result of this contraction is a new, unnormalized matrix product state |ψ_{ñ_1}⟩ of size M − 1. Note that this new matrix product state satisfies the condition ⟨ψ_{ñ_1}|ψ_{ñ_1}⟩ = p(ñ_1). The second step now uses the state |ψ_{ñ_1}⟩. Firstly, we calculate the d + 1 outcome probabilities p(n_2, ñ_1) := ⟨ψ_{ñ_1}| (|n_2⟩⟨n_2| ⊗ I_{3...M}) |ψ_{ñ_1}⟩ and randomly select an ñ_2 from the probability distribution p(n_2 | ñ_1) := p(n_2, ñ_1)/p(ñ_1). Secondly, we generate a new, unnormalized matrix product state |ψ_{ñ_1,ñ_2}⟩ := ⟨ñ_2|ψ_{ñ_1}⟩ of size M − 2. Continuing this procedure for the remaining M − 2 output modes, we end up with one sample drawn according to the probability distribution p(n_1, n_2, . . . , n_M). The highest computational cost corresponds to the contractions computing the outcome probabilities. A trivial contraction algorithm provides a running time of O(Mχ⁴(d + 1)²), which for a matrix product state resulting from D layers of couplers gives O(M²(d + 1)^{8D+2}).
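The chain-rule sampling procedure above can be sketched for an MPS given as a list of (χ_left, d+1, χ_right) tensors. This is our own minimal implementation, not the optimized contraction of [47]: it precomputes right environments once, then sweeps left to right, drawing each outcome from the conditional distribution and absorbing it into a left environment.

```python
import numpy as np

def sample_mps(tensors, rng):
    """Draw one sample (n1,...,nM) from |psi|^2 for an MPS given as a list of
    tensors of shape (chi_left, d+1, chi_right), via the chain rule (50)."""
    M = len(tensors)
    # Right environments: E[i] contracts sites i..M-1 of the state with itself.
    E = [None] * (M + 1)
    E[M] = np.eye(1, dtype=complex)
    for i in range(M - 1, -1, -1):
        A = tensors[i]
        E[i] = np.einsum('anb,bc,dnc->ad', A, E[i + 1], A.conj())
    sample = []
    L = np.eye(1, dtype=complex)  # left environment of the outcomes chosen so far
    for i in range(M):
        A = tensors[i]
        # Unnormalized p(n | previous outcomes); normalize and draw.
        p = np.einsum('ab,anc,bnd,cd->n', L, A, A.conj(), E[i + 1]).real
        p = np.clip(p, 0.0, None)
        p = p / p.sum()
        n = int(rng.choice(len(p), p=p))
        sample.append(n)
        # Absorb the chosen outcome into the left environment.
        L = np.einsum('ab,ac,bd->cd', L, A[:, n, :], A[:, n, :].conj())
    return sample

# Usage: the trivial product MPS for the Fock state |1, 0, 1> (d = 1).
mps = []
for n in (1, 0, 1):
    B = np.zeros((1, 2, 1), dtype=complex)
    B[0, n, 0] = 1.0
    mps.append(B)
rng = np.random.default_rng(0)
```

On this product state the sampler is deterministic and always returns [1, 0, 1]; for an entangled MPS the same sweep draws from the correct joint distribution.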

Simulating an ideal logarithmic depth circuit
It is important to notice that the bond dimension scales exponentially with the depth of the circuit. If d were a constant, as in spin-system simulations, a shallow circuit satisfying a logarithmic depth constraint as in eq. (10) would lead to a polynomial-time algorithm. In a tensor network simulation of quantum optics, the potential bunching of photons, which can all accumulate in a given mode, makes the simulation harder. In order to obtain an exact simulation of the evolution and the sampling of N input single photons over a circuit of depth D we fix the physical dimension over the whole evolution to d = N, the total number of photons. Because N scales with the number of modes M, see eq. (2), the computational cost of contraction, storage and sampling becomes quasi-polynomial in the size of the system.

Generalization to non-uniform losses
In subsection 2.1 we have shown that a lossy linear-optics interferometer is mathematically modeled by a complex linear transformation A with singular value decomposition A = V µW. The singular value decomposition has a very natural interpretation, see Figure 6 a): the real interferometer with losses characterized by A has an equivalent circuit composed of a lossless linear-optics transformation V, followed by M parallel pure-loss channels of transmissions µ_i, and a final lossless linear-optics transformation W. Each pure-loss channel of transmission µ_i can be decomposed into a concatenation of two pure-loss channels of transmissions µ = max_i µ_i and µ̄_i = µ_i/µ. Because M parallel pure-loss channels of transmission µ commute with the unitary V, we obtain the final scheme of Figure 6 b), where M parallel pure-loss channels of transmission µ are followed by the ideal circuit V, then M parallel pure-loss channels of transmissions µ̄_i, and a final ideal circuit W. We can then define the state ρ_in as resulting from applying the M parallel pure-loss channels of transmission µ, and choose the thermal state ρ_T such that λ = µ/(1 + µ), in the same way as in subsection 2.1. The only difference is that the set of operations after the pure-loss channels of transmission µ is no longer an ideal interferometer U but an M-mode quantum channel L resulting from concatenating an ideal interferometer V, M parallel pure-loss channels of transmissions µ̄_i, and a final ideal interferometer W.
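The decomposition A = V µW and the peeling-off of the largest uniform loss µ can be checked with a toy numerical example (all matrices below are our own illustrative choices, not taken from any experiment):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(n, rng):
    """A random n x n unitary from the QR decomposition of a Gaussian matrix."""
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(X)
    return Q

# Build a lossy 3-mode transfer matrix A = V mu W with known non-uniform
# transmissions on the diagonal of mu.
V0, W0 = random_unitary(3, rng), random_unitary(3, rng)
mu = np.diag([0.9, 0.5, 0.3])
A = V0 @ mu @ W0

# The SVD recovers the per-mode transmissions as singular values.
V, s, Wh = np.linalg.svd(A)
assert np.allclose(s, [0.9, 0.5, 0.3])

# Peeling off the largest loss: mu_i = mu_max * (mu_i / mu_max), as in Fig. 6 b),
# with every residual transmission mu_i / mu_max at most 1.
mu_max = s.max()
assert np.all(s / mu_max <= 1.0 + 1e-12)
```

The residual channels of transmission µ̄_i = µ_i/µ are absorbed into the channel L, while the uniform factor µ is what the thermal-noise argument acts on.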

Generalizing lemma 1
Because applying a quantum channel L can only decrease the trace distance, just as for a measurement, it is straightforward to generalize lemma 1 by replacing the uniform loss µ with µ = max_i µ_i.

Generalizing lemma 2
Additionally, the thermal-state sampling algorithm of lemma 2 can be easily adapted. It is a well-known fact in quantum optics that the action of a pure-loss channel of transmission µ_i on a coherent state |α⟩ outputs a weaker coherent state |µ_i α⟩. Therefore, the evolution of an input multimode coherent state |α⟩ can be easily computed by implementing the matrix multiplication β = Aα. Once the output multimode coherent state has been determined, the sampling from the Poisson distribution proceeds as before.
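The adapted sampling step is just a matrix multiplication followed by independent Poisson draws, since photon counts on a coherent state |β_i⟩ are Poisson with mean |β_i|². A minimal sketch (function name and toy transfer matrix are our own):

```python
import numpy as np

def lossy_coherent_sample(A, alpha, rng):
    """Evolve a multimode coherent state |alpha> through the lossy linear
    transformation A (beta = A alpha) and sample photon counts per mode,
    which for a coherent state |beta_i> are Poisson with mean |beta_i|^2."""
    beta = A @ alpha
    return rng.poisson(np.abs(beta)**2)

rng = np.random.default_rng(0)
A = 0.5 * np.eye(2)                 # toy 2-mode lossy transfer matrix
alpha = np.array([2.0 + 0j, 0.0])   # coherent input on mode 1, vacuum on mode 2
counts = lossy_coherent_sample(A, alpha, rng)
```

Mode 2 receives vacuum and therefore always yields zero counts, while mode 1 is Poisson-distributed with mean |0.5 · 2|² = 1.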

Generalizing lemma 3
The adaptation of the tensor network simulation is slightly more involved. Let us use the notation A_{i,j} for the coupler acting on input modes i and i + 1 at layer j. Every A_{i,j} has a decomposition into a unitary coupler followed by two independent pure-loss channels and a final unitary coupler. Because every pure-loss channel can be seen as a coupling interaction with an environmental mode, a circuit with losses can be transformed into an ideal lossless circuit by doubling the number of couplers and adding two ancillary modes per lossy coupler. We can then place all the ancillary modes interacting with mode i between input modes i and i + 1, i.e., D of them below each input mode for a circuit of depth D. For a circuit of depth D there are at most 3D gates acting on each mode, with a range of at most D. As detailed in [35], one can transform a gate of range D into 2D nearest-neighbor gates. Therefore, our initial circuit with losses of depth D = M becomes a circuit with M² modes and 6M² nearest-neighbor gates. This leads to a less favorable scaling of the computational cost of contraction, storage and sampling, but one which remains quasi-polynomial in M. This algorithm is certainly not optimal, and we are convinced that more elaborate constructions can improve the simulation of linear-optical circuits with non-uniform losses.

Conclusion
The vast majority of currently existing architecture proposals to implement a boson sampling experiment suffer from an exponential decay of the transmission with the length of the circuit. We have shown that for those platforms, boson sampling and most of its variants can be efficiently simulated classically. More precisely, we have shown that either the depth of the circuit (denoted D) is large enough (D ≥ O(log M)) that it can be simulated by thermal noise with an algorithm running in polynomial time, or the depth of the circuit is shallow enough (D ≤ O(log M)) that a tensor network simulation runs in quasi-polynomial time.
This result suggests that in order to implement a quantum advantage experiment with single photons and linear optics we need a profound change of paradigm. One possibility would be to shift to platforms where the transmission decays only algebraically rather than exponentially, for which our result does not hold. Another option would be to prove the hardness of novel boson sampling architectures beyond the planar circuit architecture. A promising route would be to reduce the depth of the circuit to the shallow regime while maintaining the complexity by moving to a lattice structure. A potential candidate would be an adaptation to quantum optics of the recent result on the complexity of a finite-time quench over a 2D spin lattice [50].
We discussed that the potential bunching of photons makes the tensor network simulation of quantum optical systems more involved than its finite-dimensional spin counterparts. One could potentially restore the polynomial scaling of logarithmic-depth circuits observed for finite systems by designing an ε-approximate algorithm that truncates every mode to a finite size. To our knowledge, this is a non-trivial problem that is certainly worth pursuing in future research.
This work is only a first step in the study of boson sampling under losses, and further results may restrict its quantum advantage regime even further. Finally, an interesting open question is whether our proof can be adapted to other technological platforms that are candidates for a quantum advantage test.
After the completion of this article we learned about Ref. [51] that obtains a similar result using very different techniques.