The Complexity of Bipartite Gaussian Boson Sampling

Gaussian boson sampling is a model of photonic quantum computing that has attracted attention as a platform for building quantum devices capable of performing tasks that are out of reach for classical devices. There is therefore significant interest, from the perspective of computational complexity theory, in solidifying the mathematical foundation for the hardness of simulating these devices. We show that, under the standard Anti-Concentration and Permanent-of-Gaussians conjectures, there is no efficient classical algorithm to sample from ideal Gaussian boson sampling distributions (even approximately) unless the polynomial hierarchy collapses. The hardness proof holds in the regime where the number of modes scales quadratically with the number of photons, a setting in which hardness was widely believed to hold but that nevertheless had no definitive proof. Crucial to the proof is a new method for programming a Gaussian boson sampling device so that the output probabilities are proportional to the permanents of submatrices of an arbitrary matrix. This technique is a generalization of Scattershot BosonSampling that we call BipartiteGBS. We also make progress towards the goal of proving hardness in the regime where there are fewer than quadratically more modes than photons (i.e., the high-collision regime) by showing that the ability to approximate permanents of matrices with repeated rows/columns confers the ability to approximate permanents of matrices with no repetitions. The reduction suffices to prove that GBS is hard in the constant-collision regime.


Introduction
The quest for quantum computational advantage has given rise to a surprisingly fruitful relationship between computer science and physics: theorems provide a foundation for experiments, while practical considerations set challenges for new mathematics. Consider for instance the role of BosonSampling [1]. It gave strong complexity-theoretic evidence that even a weak photonic device could perform a task that is classically intractable. This work motivated future experimental demonstrations [2,3,4,5] that inspired new theoretical models [6,7], which in turn resulted in further experiments [8,9,10].
Gaussian boson sampling (GBS) is a paradigm where a Gaussian state of light is prepared, then measured in the photon-number basis [7,11,12]. This approach offers several benefits. First, Gaussian states can be prepared by a combination of squeezing, displacement, and linear interferometers, which can in principle be applied deterministically and implemented with nanophotonic integrated circuits [13]. This means they can potentially be mass-produced and scaled rapidly [14,15]. Moreover, the inclusion of squeezing and displacements allows more versatility in programming GBS devices, a property that is leveraged in several GBS-based algorithms [16,17,18,19,20,21,22]. GBS has already been used to claim quantum computational advantage [23,24,25], and there are several more proposals for hard-to-simulate GBS experiments [6,7,26]. That said, we believe these experiments reveal that significant progress is still required to bridge the gap between our theoretical hardness arguments and what is currently achievable in the lab. For example, state-of-art GBS experiments consisted of 216 modes with up to 125 photons [23] and 144 modes and at most 113 clicks [25], but all hardness arguments currently assume that the number of modes is at least quadratic in the number of photons, and sometimes worse. 1 Furthermore, the underlying physics of GBS is such that the probability of a given output state is described by the hafnian, a matrix function similar, but not identical to, the permanent. Because of this, new conjectures tailored to this paradigm are sometimes required-see, for example, the Hafnianof-Gaussians conjecture in [7] which parallels the Permanent-of-Gaussians conjecture in [1]. While the goal of this paper is not to compare the merits of the individual conjectures, we do feel that having fewer standard conjectures is generally preferable.
This paper introduces Bipartite Gaussian Boson Sampling 2 (BipartiteGBS) as a new method for programming a GBS device which will begin to address some of these challenges. The key property of BipartiteGBS experiments is that the output probabilities are proportional to the permanents of submatrices of arbitrary matrices. Contrasted with traditional BosonSampling, where the output probabilities are dictated by permanents of submatrices of a unitary matrix, BipartiteGBS can be seen as a powerful new tool on which to build hardness-of-simulation arguments. In particular, this paper will focus on the hardness of approximately sampling from BipartiteGBS distributions with a classical device.
As it turns out, our construction is a strict generalization of Scattershot BosonSampling [6], a different GBS setup for which the output probabilities are given by permanents of unitary matrices. Because of this, the computational hardness of Scattershot BosonSampling can be rooted in the same conjectures on which the hardness of BosonSampling is based-namely, that Gaussian permanent estimation is #P-hard and that Gaussian permanents anti-concentrate. However, this also means that Scattershot BosonSampling inherits the same technical caveats of BosonSampling. In particular, to guarantee that the n × n submatrices of an m × m unitary appear Gaussian, Aaronson and Arkhipov require that m = ω(n 5 ). Therefore, all their hardness proofs are technically within this regime. To be clear, it is widely assumed that m = ω(n 2 ) is sufficient, but to the authors' knowledge, there is currently no definitive proof. 3 Even more general Gaussian boson sampling protocols suffer from this problem, and recent approximate average-case hardness proofs for GBS must conjecture directly that quadratically-many modes suffice [26].
The main technical contribution of this paper is to show that BipartiteGBS can be used to close this loophole. That is, the hardness of GBS can be based on the exact same set of conjectures as BosonSampling, while also working in the regime where the number of modes m is quadratic in the expected number of photons n . 4 Formally, we prove the following theorem: In other words, assuming the BosonSampling conjectures, there is no efficient classical algorithm for BipartiteGBS in this regime unless there is a collapse in the polynomial hierarchy that complexity theorists consider to be extremely unlikely. Unsurprisingly, the proof of this theorem will leverage the fact that the output probabilities of BipartiteGBS experiments are based on permanents of arbitrary matrices. Since the hardness of BosonSampling is based on Gaussian permanents anyway, an obviousin-retrospect idea is to simply start with those Gaussian matrices. Note that, as in all photonic experiments, the probabilities are actually governed by submatrices of the transition matrix. Clearly, however, the submatrices of a Gaussian matrix-i.e., a matrix for which each entry is an i.i.d. complex Gaussian number-are also Gaussian. This trivializes the "Hiding Lemma" often required in other hardness arguments. 5 That said, the proof of Theorem 1 is not itself trivial. In particular, we must prove that the Bi-partiteGBS experiments that we ask the classical oracle to simulate have sufficient probability mass on those outcomes which correspond to #P-hard permanents. To do so, we bound important normalization factors in the output distribution using results from random matrix theory on ensembles of random Gaussian matrices. While our proof unsurprisingly borrows many ideas from the original BosonSampling hardness argument [1], it can be entirely understood without direct reference to it, and we suspect that many will find our rigorous hardness proofs of GBS to be preferable to some in the existing literature.
To complement our formal analysis of the normalization factors (which are sufficient to obtain Theorem 1), we present a more heuristic analysis of other aspects of our experiment that might be relevant to experimentalists hoping to claim quantum advantage (most likely with a more speculative set of conjectures). In particular, we give formulas for the expectation and variance of the number of "clicks" in the output distribution-that is, the number of modes that contain at least one photon when measured. Intuitively, this is an important quantity for hardness arguments since the probability of any particular outcome is proportional to a permanent of a matrix whose rank is equal to the number of clicks in that outcome. Since classical intractability is tied to the complexity of these permanents, and we know that permanents of low-rank matrices have efficient classical algorithms [34], we would like to avoid distributions with low click rates. Thankfully, we prove that this is generally not the case by showing that the expected number of clicks in a BipartiteGBS experiment with Gaussian transition matrices is the harmonic mean of the number of modes and expected number of photons: (1) So, for example, even when we expect that there are only linearly-many more modes than photons, we have that the expected number of clicks is linear. This formula is obtained by assuming that the singular values of a Gaussian matrix are drawn independently and exactly from the quarter-circle law. 6 a random variable that is based on the squeezing parameters of the system. Therefore, we cannot guarantee that the number of modes m is actually quadratic in the number of photons n since there may be some small probability for which there are many more/fewer photons. Instead, what we can say is that the number of modes is quadratic in the expected number of photons n . Coupled with bounds on the variance of n, we can conclude that the output distribution is dominated by output states which obey the m = Θ(n 2 ) relationship.
We present numerical evidence (see Fig. 3) showing that these formulas accurately predict the click distributions of random BipartiteGBS experiments. Finally, we make preliminary progress towards a GBS hardness result in the regime where there are fewer than quadratically many more modes than photons. 7 In this regime, we can no longer guarantee that there is at most one photon per mode. These photon collisions imply that the output probabilities are no longer described by permanents of simple submatrices, but rather by submatrices which have some rows and columns repeated. To this end, we define a new "Permanent-of-Repeated-Gaussians" problem for which the goal is to approximate such permanents. We provide numerics suggesting that a classical simulation of GBS in the high-collision regime can be leveraged to solve the Permanent-of-Repeated-Gaussians problem based on a Stockmeyer counting argument similar to that in Theorem 1. In other words, we could show approximate average-case hardness for GBS in the m = o( n 2 ) regime if we made the following assumptions: the #P-hardness of the Permanent-of-Repeated-Gaussians problem; a plausible conjecture in random matrix theory. We caution that these assumptions remain relatively unexplored.
As a first step towards understanding the Permanent-of-Repeated-Gaussians, we show that there is some sense in which we can reduce arbitrary matrix permanents to permanents of matrices with repeated rows and columns: Theorem 2. Given an oracle O that can approximate Per(B) for any matrix B ∈ C n×n that has k row/column repetitions to additive error , there is an FBPP O algorithm that can approximate Per(A) for arbitrary matrices A ∈ C (n−k)×(n−k) to additive accuracy O( /1.498 k ).
The reduction in this theorem has the additional nice property that if the matrix A is Gaussian, then the oracle is only queried on matrices B that are also Gaussian. This makes it an ideal candidate for use in a hardness reduction because we can only require a classical simulator to sample from distributions that we can sample from quantumly. Unfortunately, while we are getting an exponential improvement in the accuracy to Per(A), we show that it is still insufficient to conclude the #P-hardness of Permanent-of-Repeated-Gaussians from the usual Permanent-of-Gaussians conjecture. That said, if we assume that there are only constantly-many collisions we can show such a reduction. Furthermore, in this regime we can work directly with the magnitude of the permanent, avoiding the need for an additional anti-concentration conjecture.

Theorem 3.
Given an oracle O that can approximate | Per(B)| 2 for any matrix B ∈ C n×n that has k row/column repetitions to additive error , there is an FBPP O algorithm that can approximate | Per(A)| 2 for arbitrary matrices A ∈ C (n−k)×(n−k) to additive accuracy which is exponential in k, but polynomial in n and .
This theorem follows almost immediately by combining Theorem 2 with the polynomial interpolation techniques used to prove the classical hardness of BosonSampling with constantly-many lost photons [36].
The rest of this paper is organized as follows: Section 2 provides a brief introduction to GBS as well the BipartiteGBS protocol for programming GBS devices according to arbitrary transition matrices. It then states important properties of BipartiteGBS with Gaussian transition matrices: two lemmas concerning photon-number statistics and normalization constants (proofs in Appendix A), and analytical formulas for the distribution of click statistics backed up by numerics (proofs in Appendix C). Section 3 contains the proof that classical simulation of BipartiteGBS in the no-collision regime is hard (Theorem 1). Section 4 deals with BipartiteGBS in the collision regime and contains proofs of theorems 2 and 3 (though proofs of some important lemmas in Appendix B). Numerical evidence for extending our arguments beyond the dilute limit are given in Appendix D. 7 It is worth noting that for the task of exact sampling, it was already known that BosonSampling is hard in the m = o(n 2 ) regime, and in fact, even if m = n. Specifically, Grier and Schaeffer give a BosonSampling sampling experiment in the m = n regime for which a particular output probability is #P-hard to approximate [35]. Combined with the Stockmeyer counting arguments of [1], this shows classical intractability of the exact sampling task predicated on the non-collapse of the polynomial hierarchy. For a matrix C with singular value decomposition C = U diag(σi)V T , two-mode squeezed states with squeezing parameter ri = tanh −1 (σi) are generated. These are sent to independent interferometers applying the unitaries U and V , respectively.

Gaussian boson sampling introduction
Gaussian boson sampling is a model of photonic quantum computation where a multi-mode Gaussian state is prepared and then measured in the photon-number basis [7]. Gaussian states receive this name because their Wigner function-a quasi-probability representation of quantum states of light-is a Gaussian distribution [37]. We consider pure Gaussian states without displacements, which can be prepared from the vacuum by a sequence of single-mode squeezing gates followed by linear interferometry.
In contrast to BosonSampling, which uses single photons, GBS employs squeezed states as the input to the linear interferometer. In terms of the creation and annihilation operators a i and a † i on mode i, a squeezing gate is given by S( where r i is a squeezing parameter. A squeezed state can be prepared by applying a squeezing gate to the vacuum. A linear interferometer transforms the operators as where U is a unitary matrix. Let S = (s 1 , s 2 , . . . , s 2m ) encode a measurement outcome on 2m modes, where s i is the number of photons in mode i. The probability of observing sample S when measuring a pure Gaussian state in the photon-number basis is given by [7]: Here where r i is the input squeezing parameter in the ith mode, U is the unitary describing the interferometer, and Haf(·) is the hafnian. 8 The matrix A S is constructed from A by repeating the ith row 8 For a 2n × 2n symmetric matrix A, Haf(A) = 1 . See references [38,39,40] for more detailed discussion of the hafnian and its complexity. and column of A s i -many times (e.g., if s i = 0, both the corresponding row and column are removed entirely). Notice that Eq. (5) implies that A is symmetric.
The total mean photon number in the distribution is given by [19] 2m For future convenience we introduce n as the random variable describing the total number of photon pairs in a given sample. Its quantum-mechanical expectation is simply given by Notice that it is possible to choose a parameter λ > 0 and perform a rescaling A λ := λA so that the singular values of A λ become λσ i . One can check that the mean number of photon pairs n λ is continuous for λ ∈ [0, 1/σ max ) and grows arbitrary large, and so it is possible to set λ such that n λ is any desired non-negative number.

BipartiteGBS
We now introduce BipartiteGBS, a specific strategy for programming a GBS device such that the resulting distribution depends on an arbitrary transition matrix, not just a symmetric one. This scheme is illustrated in Fig. 1. First, we construct a device with 2m modes and generate photons by applying two-mode squeezing gates to modes i and i + m for i = 1, 2, . . . , m. The two-mode squeezing gate is defined as . It can be decomposed as two single-mode squeezing gates with identical parameters (i.e., r i = r i+m ) followed by a 50:50 beamsplitter. The subsequent interferometer is configured by applying a unitary U to the first m modes and a separate unitary V to the second half of the modes. In this sense, this construction is a generalization of Scattershot BosonSampling [6] and Twofold Scattershot BosonSampling [41]. In the former the second interferometer is fixed to V = 1 and r i = r for all the two-mode squeezers, while in the latter only the squeezing parameters are fixed.
In this setup, the GBS distribution is also given by Eq. (3), but in this case the A matrix satisfies The expression for C is equivalent to the singular value decomposition C = U ΣW † of an arbitrary complex matrix with singular values σ i = tanh(r i ) ∈ [0, 1), where we simply set V T = W † . Thus, it is possible to choose C to be an arbitrary complex matrix, up to a rescaling C → λC for some appropriate λ > 0 such that the singular values satisfy σ i ∈ [0, 1).
The resulting output distribution can be more elegantly expressed directly in terms of the matrix C, which we refer to as the transition matrix. We introduce the notation (S; T ) = (s 1 , . . . , s m ; t 1 , . . . , t m ) to denote a sample. Here s i is the number of photons in mode i and t i is the number of photons in mode i + m, for i = 1, 2, . . . , m. From Eq. (9), the outcomes S determine the rows of C and the columns of C T that are kept or repeated when defining the submatrix A S . Similarly, the outcomes T determine the columns of C and the rows of C T . With this in mind, we employ the identity to express the GBS distribution as: S(r 1 ) Figure 2: Constructing submatrices CS,T of the GBS distribution in Eq. (11). In this example, a device with 2m = 6 modes is configured according to an m-dimensional matrix C = U diag(tanh ri)V T . The output is the photon pattern (S; T ) = (2, 0, 1; 1, 2, 0), where S determines the rows of the submatrix and T determines the columns. In step (i), since s2 = 0 and t3 = 0, we remove the second row and the third column. In step (ii), because s1 = 2, we repeat the first row twice and finally in step (iii) since t2 = 2, we repeat the second column twice to arrive at the submatrix CS,T .
The notation C S,T corresponds to a submatrix obtained as follows: if Fig. 2 for an example.
Since the permanent is only defined for square matrices, the number of photons detected in the first half of the modes should be equal to the total number of photons detected in the second set of modes, i.e., Physically, this corresponds to the action of the two-mode squeezing gate, which generates pairs of photons such that every photon in the first m modes has a twin photon in the remaining m modes. This observation, or the fact that r i = r i+m , allows us to write the expectation of the number of pairs as n = m i=1 where the sum only extends to m. Eq. (11) is the starting point for the computational problem we consider. In this formulation, our GBS construction is almost identical to a standard BosonSampling setup [1]. In both cases, probabilities are given in terms of the permanents of submatrices constructed in the same manner. The main difference is that, in our case, we employ 2m modes to encode an arbitrary m × m complex matrix, whereas the corresponding matrix in BosonSampling must be unitary, namely equal to the matrix that describes the interferometer. Another crucial difference is the normalization factor Z. It is necessary to account for the fact that the space of outcomes includes events with different total photon numbers, and it will influence the behaviour of errors in our final result in Section 3.

BipartiteGBS with Gaussian matrices
As discussed above, it is possible to encode an arbitrary matrix in the GBS output distribution. In this section, we specialize this to the case of Gaussian random matrices. Let N (µ, Σ 2 ) m×m C be the distribution over m × m matrices whose entries are independent complex Gaussians variables with mean µ and variance Σ 2 . We choose C ∼ N (0, Σ 2 ) m×m C , for Σ to be specified shortly. By choosing C to be Gaussian, this mirrors the case of BosonSampling [1], where sufficiently small submatrices of uniformly Haar random unitaries are approximately also Gaussian. Therefore, this will allow us to support our hardness-of-simulation result on the same set of conjectures. However, in our construction, any n×n submatrix of C is also Gaussian, for any scaling between m and n. Contrast this with BosonSampling, where requiring submatrices to be approximately Gaussian formally constrains the number of modes to be much larger than the number of photons-rigorously, m = ω(n 5 ). We now prove a few important facts about GBS with Gaussian matrices.
First, several important quantities, such as the squeezing parameters, the normalization constant Z, and the mean photon number n , depend on the list of singular values of C, which we denote by {σ i } i=1...m . For m-dimensional random complex matrices with mean 0 and variance 1/m, in the asymptotic limit m → ∞, the distribution p(σ) of singular values converges to This is the quarter-circle law for random matrices [42]. Importantly, this result states that singular values are constrained within a finite interval, in this case σ ∈ [0, 2], with high probability. The probability that the largest singular value is greater than 2 + decays exponentially as m exp(−m 2 /8) [43]. See also Lemma 15 in Appendix A.2. For now, we simply assume that the singular values are in fact within this range, though we return to this issue in the next section. The above equation holds as the limit for the empirical distribution over singular values of C, and therefore it also corresponds to the limit marginal distribution satisfied by any single σ i . However, we cannot assume in general that the σ i are drawn independently from it. Recall from Eq. (7) that for C to be amenable to encoding in the GBS device, its largest eigenvalue must lie in the range [0, 1). Furthermore, it will be useful to tune the relation between m and n . To address both of these issues, we rescale the matrix by a further factor of 1/α, for some α > 2. Alternatively, we can choose C from G := N (0, 1/(α 2 m)) m×m C . By doing this, the limiting distribution for the singular values in the interval [0, 2/α] is We now consider the typical behaviour of two quantities that will be important for our main result. The first is the number of photon pairs n. The size of the matrix permanent associated with an output probability-and hence its complexity-is directly determined by the number of photons observed in a given experimental run. However, in GBS the total photon number is not fixed, and the fluctuations in photon number depend on the matrix C via its singular values. Therefore, it will be important to prove that fluctuations around the mean photon number are small enough so that our main argument is stable. The other important quantity is the normalization constant Z, which appears as a multiplicative factor between the matrix permanent and the output probabilities. For this reason, it will directly affect the error bounds of our main results, and we need to prove that it is typically not too large.
In the remainder of this section we give an intuitive analysis of these quantities, together with suitably formal bounds that are proven in Appendix A.

Fluctuations of the total photon number
From Eq. (13), we can compute the expected mean number of photon pairs as we vary 9 C over the Gaussian ensemble: Throughout our main argument, we usually consider a regime where m scales faster than Ω(n), i.e., as Θ(n 2 ). This corresponds to a regime where α is large and we can write α 2 (1 − 1 − 4/α 2 ) − 2 ≈ α 2 (1 − 1 + 2/α 2 + 2/α 4 ) − 2 = 2/α 2 . Therefore in this regime we have For instance, if we want the GBS device to operate in a regime where m = cE[ n ] 2 for some c > 0, it suffices to choose α = (cm) 1/4 = Θ(m 1/4 ). We will refer to this regime from now on as the dilute limit.
Even assuming the singular values follow the quarter-circle distribution exactly, computing the expectation of n is not enough. The complexity implied in Eq. (11) depends on the observed number of photons, not on n . Therefore we must prove that, with high probability, n is not so far from its expectation as to invalidate our conclusions. We show the following formal bound, which follows from Theorem 19 and Theorem 21 in Appendix A: whenever α ≥ 6, and m ≥ ln(1/δ). The first probability is over the choice of Gaussian matrix C, whereas the second probability is over both the choice of C and over the photon number distribution.
Notice that in the dilute limit this statement implies that with high probability over the choices of Gaussian matrix and over the photon number distribution, the observed photon number is linear in m/α 2 to leading order.

The normalization factor
Let us now consider the typical behaviour of the normalization factor Z. Recall that we can write Assuming σ i ∈ [0, 2/α], it holds that Z ∈ [1, 1/(1 − 4/α 2 ) m ]. Asymptotically, the upper bound can be written as We want Z to be as small as possible. As will become clear in the next sections, the bound in Eq. (17) is not sufficiently tight for our purposes. On the other hand, if each singular value σ i is drawn independently from the quarter-circle distribution (Eq. (13)), then the expectation of Z would scale more favourably as e m/α 2 . Although the singular values are not independent, we prove that this heuristic argument in fact provides the right scaling (see Appendix A.2). More specifically, we give the following bound: whenever α ≥ 6, and m ≥ ln(1/δ).
Recall that α = Θ(m 1/4 ) in the dilute limit, so Lemma 5 implies that Z is bounded asymptotically by e m/α 2 .

Collision statistics beyond the dilute limit
A BipartiteGBS sample is specified by M = 2m non-negative integers giving the number of photons measured in each of the modes: As mentioned before, the total number of photons detected in the first m modes should be equal to the total number of detected in the second m modes-that is, Another useful variable to consider is the number of clicks. A click sample is obtained from a photon number sample by "thresholding" the events, mapping any event with more than zero photons into outcome 1 while mapping vacuum events to 0. We write these thresholded samples as where d i , e j = 1 if s i , t j ≥ 1 and 0 otherwise. It is also useful to write the total number of clicks in either half where we also state the obvious fact that the total number of clicks is always smaller or equal to the total number of photons detected. Note that, unlike the photon number, the number of clicks in both halves of the modes need not be the same, thus in general, d = e. Whenever the number of clicks is less than the number of photons, there must be collisions (at least one mode with more than one photon).
To understand why the number of clicks is an important random variable consider the expression for the probabilities (recall Eq. (11)) depending on A priori, while the n×n matrix C S,T may be large, its rank (min(d, e)) may be small. Because matrices of small rank have efficient algorithms [34], it is useful to understand the statistics of the clicks in each of the two halves of the modes. In the dilute limit no-collision events dominate and thus d = e = n.
Beyond the dilute limit, we can find simple expressions for the first and second moments of the total number of clicks in either half of the 2m modes. The detailed derivation of these results is provided in Appendix C. These expressions are written in terms of the photon number density For the first order moments (means) we simply invoke the fact that the reduced states of two-mode squeezed states are thermal states, and scrambling from the random interferometers leads also to locally thermal states to find, By rewriting the density µ in terms of the mean number of pairs and the number of modes one easily derives Eq. (1) for E( d + e ). For the second order moments we need to invoke the quarter circle law and use a Taylor expansion to obtain (25b) ) which tells us that d and e are very strongly correlated, as one would expect since in this limit clicks reduce to photon numbers, which should be equal in the two halves of the modes. Even beyond the dilute limit, the equations above predict strong correlations between the number of clicks in either half of the modes. We can for example consider the linear correlation ratio which for a non-negligible density of µ = 0.3 gives Corr(e, d) ≈ 0.83. We find excellent agreement between the results in the equations above against exact numerical calculations for varying photonnumber densities as the number of modes increased in Fig. 3. We conclude this section with a discussion of the bosonic birthday paradox, a bound on how likely we are to observe collision events in a BosonSampling experiment, which will also turn out to govern BipartiteGBS [45]. Specifically, for a BosonSampling experiment with m modes and n ∼ c m (j−1)/j photons, the bosonic birthday paradox says that the number of output modes where we expect to observe exactly j photons converges to a Poisson random variable with mean c j . Collision outcomes with more than j photons are suppressed.
The key idea in the proof of the bosonic birthday paradox is that, upon applying a Haar-random matrix to an n-photon Fock state, the output state is the maximally mixed state over the n-photon, m-mode Hilbert space. While BipartiteGBS experiments do not in general have Fock input states (in fact, the total photon number is a random variable), one can show that the bosonic birthday paradox holds when we restrict to a fixed subspace of n photons since that restricted input state is a Fock state. Moreover, in the singular value decomposition of a Gaussian matrix, the two unitary matrices U and V corresponding to the interferometers in BipartiteGBS are Haar-random [46]. Therefore, it will still be the case that the output state, when averaged over the Gaussian ensemble, can be seen as a copy of the maximally mixed state on each set of modes, as per Fig. 1.
For example, because Lemma 4 shows that n is highly concentrated around m/α 2 in the dilute limit, we can apply the bosonic birthday paradox in this regime (at j = 2). In particular, this implies that some constant fraction of the output distribution of a BipartiteGBS experiment has no collisions, a fact which is somewhat implicit 10 in the proof of Theorem 9.

Approximate average-case hardness for GBS in the dilute limit
In this section we prove our main result, namely that GBS is hard to simulate efficiently on a classical computer in the dilute limit and up to the same complexity conjectures as in BosonSampling [1]. To that end, we start with the following computational problem: Our goal is to prove that, if there exists an efficient classical algorithm to simulate the output of a GBS experiment to high precision in total variation distance, then |GPE| 2 ± can be solved in the complexity class FBPP NP . It is conjectured 11 that |GPE| 2 ± is #P-hard, and so we obtain that the polynomial hierarchy (PH) collapses to its third level by the usual chain of inclusions: Since the polynomial hierarchy is widely conjectured to be infinite, such an efficient classical algorithm is unlikely to exist [1,47]. We will break the proof of our main result in two parts. In Section 3.1, we apply a Stockmeyer counting argument to leverage an efficient classical algorithm that samples from the GBS output distribution into a FBPP NP algorithm that produces an estimate of any probability within this distribution. This follows the corresponding argument in [1], though we emphasize some aspects of the proof that are particular to our scenario. In Section 3.2, we then use Eq. (11) to connect the output probability of an event to the permanent of a Gaussian matrix, and discuss how this affects the corresponding error bounds.

From distributions to probabilities
Consider the probability distribution D C at the output of a GBS experiment which implements an arbitrary 12 transition matrix C, as described in Section 2.1. Let O be a deterministic classical algorithm that takes as input a description of C, together with an error bound β and a uniformly-random number r, and outputs a sample drawn according to distribution D C . We write this as as r ∈ {0, 1} poly(m) is sampled from the uniform distribution. Suppose also that probability and rearranging, we get and S is chosen uniformly at random from H. We show in Theorem 9 (Eq. (44) and Eq. (45)) that Zα 2n m n /|H|n! = poly(m) with overwhelming probability. Plugging in this bound, we get | Per(C S )| 2 = o(n!/poly(m)). However, this contradicts the Gaussian Permanent Anti-Concentration Conjecture which says that | Per(C S )| 2 = Ω(n!/poly(m)) with high probability.
Our goal is to use Stockmeyer's theorem [48] to show that a BPP NP machine, with access to O, can produce a good approximation to the probability of some particular outcome.
At this point, for generality, we do not yet impose the restrictions that we are in the dilute limit, nor that C is a Gaussian matrix. We only use the fact that the Gaussian ensemble is invariant under permutations of its rows and columns. At a high level, this allows us to randomly permute the rows and columns of C in order to hide the outcome S that we care about within D C . The classical algorithm O is allowed to incur some total error β in its probabilities, but it is unlikely to concentrate too much of that error in the specific hidden outcome. Now let S be a particular GBS outcome, 13 and let H S ⊆ N 2m be the subset of GBS outcomes that contains (without multiplicity) all outcomes S π,σ = (s π(1) , . . . , s π(m) ; t σ(1) , . . . , t σ(m) ) for all permutations π, σ ∈ S m . This set is, by definition, invariant under permutations within the first m modes and within the last m modes, which correspond to permutations of rows or columns of C, respectively. To reiterate, to work in the dilute limit, S will be a 0/1-vector, but such a restriction is not yet needed. Indeed, we could choose S to be any possible GBS outcome, even those with many collisions. That said, in order to obtain an approximation to the underlying permanent question, the subspace H S must correspond to outcomes with a sufficiently large probability mass.
We then have the following result, which closely follows Theorem 3 of [1]: There exists an FBPP NP O algorithm which, given an outcome S, produces an estimate of the corresponding probability D C (S) to additive error ε/ |H S | with probability at least 1 − δ (over choices of C) in time poly(m, 1/ε, 1/δ).
Proof. At a high level, we will construct an algorithm which counts all settings of the random bits that cause O to output S. However, O is allowed to err with some constant probability β. In order to prevent O from concentrating more of that error on S, we will first permute the rows and columns of C uniformly at random. Equivalently, we can say that O cannot tell which outcome in H S we desire to approximate.
Set β := ε δ/16 and feed the input C, 0 1/β , r to O. Recall that O returns a sample from D C such that D C − D C ≤ β if r is sampled uniformly. Furthermore, if we let p X := D C (X) and q X := D C (X) for any outcome X ∈ N 2m , we have We use Stockmeyer's approximate counting method [48] on this probability. If we can also show that q S is close to p S , this will imply that we can use Stockmeyer's method to estimate the desired quantity p S as well.
By Markov's inequality, we get Setting k := 4/δ we have 2βk = ε/2, and so Our goal is to bound ∆ S , the error in probability for the specific outcome we are trying to compute.
Recall that rows and columns of C were randomized before feeding it to O. Since the distribution over C is permutation invariant, we have that a random outcome of C has the same probability as a fixed outcome which has been permuted randomly, that is, where π, σ are the permutations used to randomize the rows and columns of C. 14 Let us now formally describe the approximation we get for q S using Stockmeyer counting [48]. For any θ > 0, we can obtain an estimate q S such that with an FBPP NP O machine running in time polynomial in m, 1/θ, and 1/δ. Since q X is a probability, For similar reasons as before, it follows that Finally, we set θ := ε δ/8 and k := 4/δ. Combing all the above with the union bound, we get where the probability is over choices of C, (π, σ) and the randomness of the approximate counting procedure.

From probabilities to permanents
We just showed that, from the assumption that there exists a classical algorithm O that simulates the output distribution of a GBS experiment to within total variation distance β, it follows that there exists an FBPP NP O algorithm that produces an estimate to any outcome S to within error ε/|H S |. We now show how to leverage this result to obtain an estimate for the permanent of a Gaussian matrix, i.e., to solve |GPE| 2 ± . We begin by embedding the matrix we care about, X, within the GBS transition matrix C. Recall that X ∼ N (0, 1) n×n C , and so there is an efficient algorithm which takes X as input and produces a matrix C ∼ N (0, 1) m×m C which contains X as its submatrix, occurring in a random position. Now recall that C cannot be directly implemented in the GBS setup, since we require its singular values to be between [0, 1). Furthermore, we wish to work in the dilute limit. Therefore, we rescale the matrix C by a factor of 1/(α √ m), as discussed in Section 2.3, resulting in the transition matrix C ∼ G.
Suppose now, without loss of generality, that X/(α √ m) appears as the top left submatrix of C. Consider now the 2n-photon no-collision outcome, S, which contains a single photon in each of the modes {1, . . . , n} and of the modes {m + 1, . . . m + n}. By the discussion in Section 2.1, the probability of this outcome is From this correspondence between the probability of outcome S and |Per X| 2 , together with Theorem 7, we obtain the following result: with probability at least 1 − δ over choice of X in time poly(m, 1/ε, 1/δ).
Proof. We begin by following the procedure described previously in order to embed X as a submatrix of the GBS transition matrix C. The entire procedure will fail if C has singular values greater than 1 since such a transition matrix does not correspond to a valid GBS experiment. However, by Lemma 15, the maximum singular value of C is greater than 2/α + with probability at most me −mα 4 /8 . Let us now set to 1−2/α. If me −mα 4 /8 is less than, say δ/2, we can assume the singular values are less than 1 by the union bound. However, one can check that the experiment might fail with probability greater than δ/2 if 1/δ = Ω(exp(mα 4 )). In this case, however, our algorithm can run in time exponential in m since every polynomial in exp(m) is bounded by another polynomial in 1/δ (and recall we allow that our algorithm runs in time poly(1/δ)). Since |Per X| 2 can be computed exactly in time exponential in m by Ryser's formula, the theorem still holds even in this extreme regime of δ.
We now apply Theorem 7, which provides an estimate of Pr(S) to within additive error ε/|H S |. By Eq. (39), this immediately yields an additive estimate for |Per X| 2 to within the desired error.
Notice that we have yet to use the fact that we are working in the dilute limit-indeed, the previous theorems have been stated for a general scaling parameter α. We will now need to make this restriction. That is, we will show that any classical oracle O which approximately samples from the output distribution of our GBS experiment in the dilute limit can be leveraged to compute Gaussian permanents.
The main outstanding step in this proof is to find a sufficiently large subspace H S in which to hide the outcome S, such that the error bound obtained in Corollary 8 matches the required error to solve |GPE| 2 ± . For that purpose, we restrict ourselves to the no-collision subspace with n photon pairs, where n is the size of the matrix permanent we wish to estimate in |GPE| 2 ± . More specifically, we set the scale parameter α such that the expected number of photon pairs is exactly n, and restrict our attention to states S = (s 1 , . . . s m ; t 1 , . . . t m ) where each s i or t j is only equal to 0 or 1, and where s i = t j = n. To clarify, the size of the matrix is fixed and not a random number. Nevertheless, we call this parameter n suggestively since the total photon number of the experiment is unlikely to be too far from E[ n ] by Lemma 4. Remarkably, however, our proof never explicitly invokes this fact.
Proof. Recall that we are given a matrix X ∼ N n×n C , and we wish to construct an FBPP NP O algorithm which estimates | Per(X)| 2 to additive error n! with probability at least 1 − δ in time poly(n, 1/ , 1/δ).
We employ Corollary 8, which gives an FBPP NP O algorithm that estimates of |Per X| 2 to within additive error with probability at least 1 − ∆ in time poly(m, 1/ε, 1/∆). We will set ∆ = δ/2 and set ε such that Eq. (41) is bounded by n!. To this end, let us define the ratio Our goal now is to bound how large I can be. If it is at most polynomially large, then our proof is concluded since we can set ε = /poly(n, 1/δ) and thus achieve the precision required by |GPE| 2 ± . First recall that we are working in the dilute limit, so let us set m = cn 2 and α 2 = √ cm where c > 1 is some constant to be determined later. Furthermore, in each half of the output we have n photons in m modes without collisions, and so where we have used 15 that m = cn 2 and implicitly defined a new constant γ c > 0. From this, we have where the first inequality uses Eq. (43), and the second uses a Stirling approximation bound. By Lemma 5, we have that as long as m ≥ ln(2/δ). Once again, if m < ln(2/δ), we get 1/δ = Ω(e cn 2 ) and so we could have computed the permanent of X explicitly using Ryser's formula. Therefore, let us assume Eq. (45) and combine it with Eq. (44) by the union bound to conclude that I = O( √ n/δ) with probability at least 1 − δ over the Gaussian ensemble. This completes the proof.

Dealing with collision outcomes
In this section, we establish connections between permanents of matrices with repeated rows and columns (corresponding to collision outcomes) and permanents of matrices without repetitions. The end goal is to establish a reduction between the hardness of computing permanents of Gaussian random matrices with and without repetitions. These results apply generally and may also be useful for BosonSampling.
To start, notice that we need to define new variants of the Gaussian Permanent Estimation problem where the goal is to estimate permanents of matrices with repeated rows and columns. Much like how the error tolerance for GPE is based on the expectation of the permanent, the error tolerance for these new problems is based on the expectation of permanents with repetitions for which we need the following result (proof in Appendix B): for any repetition vectors S, We can now define the following problems that generalize |GPE| 2 ± and GPE ± : t j , and accuracy parameters ε, δ > 0, we define the following problems for which we are required to output a complex numberP such that 15 Whenever c, n ≥ 1, we have with probability at least 1 − δ over the randomness of A in time poly(n, c, 1/ε, 1/δ). 16 Given that we do not have a conclusive proof of hardness in the high-collision regime, some may wonder whether we have defined |RGPE| 2 ± correctly. Do we need to make the error scale with the expectation as is the case with |GPE| 2 ± ? In Appendix D, we give numerical evidence that this is the right choice-i.e., Stockmeyer counting on the BipartiteGBS distribution in the high-collision regime still solves the |RGPE| 2 ± problem. Moreover, if the error scaling were much smaller, then the estimate it produced would be insufficient.
In [1], the authors show that |GPE| 2 ± and GPE ± are, in fact, equivalent, up to the Permanent Anti-Concentration Conjecture (PACC), which states that the permanents of random Gaussian matrices are not too concentrated around 0. 17 With some amount of work, it is possible to prove a similar equivalence between problems |RGPE| 2 ± and RGPE ± assuming a version of the PACC which includes permanents of Gaussian matrices with repetitions. Unfortunately, even this generalization has its limits. It can be shown that, for a matrix composed of c copies of a single row of Gaussian elements, the permanent does not anti-concentrate. Therefore, even if the PACC is true, we expect its generalization to fail above a certain number of row/column repetitions. The hope is that the generalized PACC might nonetheless hold for those repetition patterns which occur naturally in the GBS distribution. We do not pursue this here as this equivalence is not required for the arguments that follow.
This rest of this section will show how solutions to RGPE can be used to solve GPE. In Section 4.1, we show how the permanent of a matrix with repetitions can be written as a polynomial that depends on the permanent of the matrix without repetitions. In Section 4.2, we use this result to prove an efficient reduction between RGPE ± and GPE ± and an inefficient reduction between |RGPE|

Permanents of matrices with repeated rows and columns
Given a matrix A and repetition patterns S, T , our goal for this section is to construct some matrix B such that Per(B S,T ) can be expressed as a polynomial whose constant coefficient is proportional to Per(A). Therefore, given an oracle to compute the permanents of matrices with this collision pattern, we can infer (through polynomial interpolation) the permanents of matrices that have no repeated rows or columns. The degree of the polynomial will depend on the total number of collisions k := k S + k T , i.e., the number of collisions in repetition patterns S and T , respectively. Note that here we are allowing k S = k T . If this is the case, the matrix B must be rectangular so that the matrix B S,T is square (for which the permanent is well-defined). Formally, we show the following: The coefficients γ 1 , . . . , γ k are functions of the entries of A, S, T , x and y. Furthermore, whenever While the majority of this proof is given Appendix B, let us briefly describe the form of the construction, which results from two separate (but essentially identical) steps. We first consider the case where only the rows of A are repeated. That is, we construct a rectangular matrix B such that when only the row repetition pattern S is applied (yielding a square matrix B S ) the analogous version of Eq. (47) holds (i.e., the constant coefficient of the permanent polynomial is proportional to the permanent of A). In the second step, we simply treat B S as if it were the original matrix A and do the same construction except for the repeated columns specified by the repetition pattern T .
Let us now describe this first step, the construction of B . For each such that s > 1, append to For example, the 2 × 2 matrix A = a 1,1 a 1,2 a 2,1 a 2,2 with repetition vector S = (3, 2) has , which leads to the matrix To give some intuition for the proof of correctness, one can show that each monomial in the expansion of the permanent for B S cannot have two a i,j terms from the same row or column of A unless there is also z term. Therefore, the only monomials which do not have a factor of z are those that also appear in the expansion for the permanent of A (albeit with an extra factor of a product of y variables which is always the same). To complete the proof, it suffices to count the multiplicity of these monomials. This result can now be used to show that the permanent of a matrix without repetitions can be estimated from estimates of larger matrices with repetitions. The first straightforward way of doing this is just setting z = 0 and x i i,j = y i i,l = 1 in our construction. This can be viewed as a "worstcase reduction" between these two problems, in the sense that we are assuming we have full control over the choice of the larger matrix B. However, this is not sufficient to give a reduction between RGPE and GPE since the matrices there are required to have independent (up to the corresponding repetitions) Gaussian matrix elements, and the matrix B[0; x, y] is very far from Gaussian. The next section describes two (related) techniques for dealing with Eq. (47) when the matrices are random.

Polynomial interpolation techniques to estimate Gaussian permanents
The first polynomial interpolation technique to deal with Eq. (47) is to query the polynomial at the (k + 1)th roots of unity. This leads to the error scaling described in Theorem 2 in the Introduction, which we rephrase below (proof in Appendix B): Theorem 13. Given access to an oracle for RGPE ± with error ε (c + k)! s i !t j !, it is possible to use k + 1 calls to the oracle to solve GPE ± with error √ c! where := ε( 1 |ξ| ((c + k)!/c!) s i !t j !) with ξ as in Eq. (48).
Notice that if we disregard the need to convert between the allowable ( (c + k)! s i !t j !)-error fluctuations for RGPE ± and the ( √ c!)-error fluctuations for GPE ± , then this theorem implies an error of for the repeated matrix leads to an error of /|ξ| for the unrepeated matrix. Moreover, we have that /|ξ| = O( /1.498 k ) with high probability (also proved in Appendix B), and so we actually obtain an exponential improvement in the error accuracy. Unfortunately, once the target error bounds for RGPE ± and GPE ± are incorporated we are in opposite situation- is exponentially large. Therefore, Theorem 13 cannot be used to reduce GPE ± to RGPE ± without an exponential error blowup.
That said, we believe that Theorem 13 provides weak evidence of a formal connection between RGPE ± and GPE ± (and recall the latter is believed to be #P-hard). Furthermore, it is worth noting that Theorem 13 leads to a somewhat surprising error scaling, as the error in the polynomial interpolation does not depend explicitly on the polynomial degree k. We hope the relatively benign error scaling of Theorem 13 might make it a useful building block in a hardness proof for GBS or BosonSampling in the regime where high-collision outcomes dominate, and we leave filling the above gaps for future research.
However, a crucial aspect of this reduction in Theorem 13 is that it is between amplitudes (whose phases are needed to arrange the cancellation of unwanted terms), whereas hardness arguments must be stated in terms of probabilities. We now describe an alternative polynomial interpolation technique which does provide an efficient reduction from |GPE| 2 ± to |RGPE| 2 ± , but only in the regime where k = O(1). While the scaling in Theorem 13 will be much better and would also suffice in the constantcollision regime, we note that working directly with permanent magnitudes allows us to avoid creating a new anti-concentration conjecture for permanents with repeated rows and columns-i.e., to reduce from RGPE ± to |RGPE| Whenever k = O(1), we have = poly(c, ε, δ).
The main reason why the error above scales so poorly in k is that we take the absolute squared value of Eq. (47), and then apply a polynomial interpolation method. Contrary to Theorem 13, we cannot compute the polynomial for values of z in the unit circle, since many of its monomials depend on |z| 2 and so we cannot orchestrate the proper cancellations. As a result, we use a least squares estimator to approximate the value of the polynomial at z = 0 by sampling O(k) values of it in a small range around z = 1. This causes a blowup that is exponential in the degree of the polynomial, leading to the scaling above.
In Appendix B we outline the proof of this result, though we omit some of the important technical lemmas that can be taken directly from [36] with regards to the hardness of lossy BosonSampling.
Theorem 14 can be plugged, in a relatively direct manner, into our main argument or that of [1] to prove that the complexity of both BosonSampling and GBS remains unchanged when we move from the no-collision subspace to a subspace with a (fixed) constant number of collisions. While this does not provide evidence of hardness in a regime where m = O(n d ) for d < 2, we believe Theorem 14 is well-motivated in two senses: first, as a preliminary step towards a stronger result that does take into account more that constantly-many collisions, which might re-purpose some of the intermediate results; second, as a level of "experimental robustness", where allowing for a few collision outcomes to be included in the output distribution might improve the count-rate for finite-sized experiments (recall that even in the no-collision regime, some constant-fraction of the output distribution may still have collisions).

Discussion
We introduced BipartiteGBS as a method for programming a Gaussian boson sampling device so that the output probabilities are proportional to the permanents of arbitrary matrices. This allowed us to rigorously prove the hardness of approximately sampling from GBS distributions in the m = Θ( n 2 ) regime under the same set of conjectures as those used for BosonSampling [1]. To recap the advantage of our approach, recall the two reasons why BosonSampling is required to operate in the m = Ω(n 2 ) (dilute) regime. The first is that the argument is predicated on the hardness of permanents with unrepeated rows and columns, and so we need that many more modes than photons for no-collision outcomes to dominate the output distribution. Perhaps more importantly, the second reason is that it is required that the n × n submatrices of m × m Haar-random matrices appear approximately Gaussian. In fact, while it is widely believed that such a statement holds whenever m = ω(n 2 ), it was only formally proved for m = ω(n 5 ). An important aspect of our work is that it has removed this obstacle in the case of GBS. Since we can implement arbitrary transition matrices, we can just choose them directly from the Gaussian ensemble, and thus their submatrices already look Gaussian even for m = Θ(n). Thus, there is the tantalizing prospect of improving on our work to show hardness for GBS in this regime, which would be much more experimentally friendly. We provide a blueprint for how this argument might go in Appendix D.
Let's also contrast our work with another recent paper of Deshpande et al. on the hardness of Gaussian boson sampling [26]. There, the authors also prove a worst-to-average case hardness result for approximate GBS in the dilute limit (c.f. Theorem 9). To do this, they must conjecture two plausible and yet new conjectures in complexity and random matrix theory. Furthermore, Deshpande et al. actually inherit the same problem that appears in BosonSampling-there are only rigorous proofs that the n × n submatrices of m × m are approximately Gaussian when m = ω(n 5 ). To circumvent this problem, the authors must conjecture directly that m = Ω(n 2 ) suffices (technically, they require that submatrices of the unitary product U U T approximate random XX T matrices for Gaussian X). Therefore, they will require fundamentally new ideas to operate in regimes beyond the dilute limit.
We reiterate that the main open problem left by our work is exactly that-prove hardness of GBS in a regime where the number of modes is subquadratic in the number of photons. We outline two possible approaches. First, one could attempt to improve the reduction in Section 4 so that an efficient algorithm for RGPE ± implies an efficient algorithm for GPE ± . This would allow for the possibility that the hardness of GBS in the high-collisions could primarily be based on the same hardness conjectures as those used in BosonSampling. That said, several issues still need to be addressed. For example, consider our bound on Z in Lemma 5. There, we have a multiplicative exp(Ω(m/α 4 )) term which is constant in the dilute limit, but grows exponentially for smaller α. Some change in scaling is inevitable, but this bound will not lead to a tight enough estimation of the permanent to solve RGPE ± via our Stockmeyer counting argument. While we give strong evidence in Appendix D that such a bound exists, proving it rigorously may be challenging. Moreover, in order to use a reduction akin to that given in Theorem 13, one would need to prove an equivalence between the multiplicative and additive versions of RGPE. This would required a generalization of the Permanent Anti-Concentration Conjecture for matrices with repeated rows and columns which is not known to be equivalent to the original conjecture, and in fact is provably false in extreme settings. A second approach would be to try to strengthen the evidence for the #P-hardness of the Permanent-of-Repeated Gaussians problem without reference to the BosonSampling conjectures. We note that even in this case much of the same work outlined above would be required.
Another major open direction left in our work-and indeed in many of the proposed hardness arguments based on linear optics-is incorporating realistic experimental noise. To be clear, we show the classical hardness of approximately sampling (in total variation distance) from ideal GBS distributions. However, even in the high-collision regime, current experiments also do not sample approximately from their true distributions due to a variety of sources of noise (e.g., photon loss). Therefore, incorporating this noise is a critical to closing the gap between theory and experiment.
That said, it is likely that the flexibility of BipartiteGBS can help mitigate the effects of noise, particularly in near-term experiments. For instance, assume one wants to perform Scattershot Boson-Sampling with an unitary transition matrix W , which corresponds to BipartiteGBS where we set all squeezing parameters to be equal, U = W , and V = I (cf. Fig. 1). If W is Haar random, then it typically requires depth equal to m [29,30]. But now note that we can shift some of the beam splitters from U to V , in the sense that any combination of these two matrices for which W = U V T leads to the same problem instance. Thus, we can program an m-depth decomposition of W , such as that of [29,30], in an m/2-depth BipartiteGBS instance simply by mapping half of the beam splitter layers to U and half to V . Since losses scale exponentially with depth, this reduction of a factor of two for the depth corresponds to having square root of the losses.
Another less straightforward example is as follows. Suppose we measure the complexity of simulating the device by the computational cost of producing a single 2n-photon sample (from the exact distribution) using state-of-the-art algorithms. A 2m-mode implementation of the standard GBS model would typically require an arbitrary 2m-mode interferometer, and samples of its output probabilities can be generated in time O(mn 3 2 n ) [40,49,50]. A BipartiteGBS instance with comparable computational cost (up to polynomial factors) would also have 2m total modes, and its 2n-photon probabilities would be given by permanents of n × n matrices, with cost-per-sample of O(n2 n ) time [51,52,53,54]. But an arbitrary BipartiteGBS interferometer only requires depth m, and thus half the depth (and square root of the losses) as an arbitrary implementation of standard GBS. Combining these two observations, it is possible to reduce the depth by a factor of 4 if one moves from arbitrary GBS to an implementation of BipartiteGBS based on an unitary transition matrix.
Finally, we ask what other uses BipartiteGBS might have in proving stronger forms of classical intractability-either from an experimental or theoretical viewpoint. For instance, because we are no longer constrained to have unitary transition matrices, one might envision a hardness result predicated on an entirely different distribution of matrices (e.g. Bernoulli instead of Gaussian). The flexibility of BipartiteGBS allows the possibility of more contrived distributions that nevertheless have a stronger complexity-theoretic foundation, and we hope that BipartiteGBS motivates others to explore these possibilities.
[15] Z. Vernon, N. Quesada A Bounds on n and Z from random matrix theory In this appendix, we prove Lemma 4 and Lemma 5, corresponding to bounds on n and Z, respectively. As in the main text, let G = N (0, 1 α 2 m ) m×m C be the distribution over m × m matrices whose entries are independent complex Gaussians with mean 0 and variance 1/α 2 m.
The quantities we will bound depend crucially on the set {λ i } i=1,...,m , where λ i = σ 2 i and σ i is the ith singular value of Gaussian matrix C ∼ G. Alternatively, λ i is the ith eigenvalue of the matrix A := CC † . The matrix A is called a complex Wishart matrix. The complex Wishart distribution is usually denoted by W(µ, Σ) when the columns of C are sampled from the multivariate complex normal distribution with mean vector µ and covariance matrix Σ. In our case we can write A ∼ W(0, 1 α 2 m I m ), and the latter ensemble will be represented by W, for short.
Much is known about the spectrum of complex Wishart matrices from random matrix theory, and we will invoke several known results regarding W throughout this appendix. For example, it is known that the maximum eigenvalue of A ∼ W (denoted by λ max (A)) can be bounded as follows: Lemma 15 (Haagerup and Thorbjørnsen [43]).
Let us recall the notation we will use for the source of randomness in these calculations. We write · for the average of a quantity (e.g., the total photon number) over the distribution of photon numbers that arises in a GBS setup of Section 2.1, for a particular transition matrix C. We write E[·] as the expectation of a particular quantity over the randomness of C ∼ G.

A.1 Bounds on observed photon number
We wish to upper bound the probability that the observed number of photon pairs, n, is far from its expectation. Let us begin by reviewing the computation of the expectation value itself. We can write where λ i is the ith eigenvalue of matrix A ∼ W. We can compute the exact expected value of n over random A using the following lemma: where (a) k := a(a + 1) · · · (a + k − 1) is the rising Pochhammer symbol.
This makes the expression for E[ n ] challenging to write down and manipulate. Notice, however, that when α is large, the higher order terms are negligible. Formally, we state the following proposition: We can bound all the higher-order terms (k > 1) as whenever λ max (A) ≤ 1/2. What remains (the k = 1 term) is simply the trace of A.
Importantly, we know that the maximum eigenvalue of A ∼ W is small (for large α) with high probability, due to Lemma 15. We also know the expectation of the trace: whenever β ≥ 4, α 2 ≥ 8β, and m ≥ β −2 ln(1/δ).
Proof. By Lemma 15, for any β ≥ 4, we have Combining our two bounds with the union bound, we get which (using E[Tr(A)] = m/α 2 ) gives the theorem.
Therefore, assuming n is roughly equal to its expected value (over both the choice of A and the inherent randomness of the photon number), we can approximate n as m/α 2 whenever α = Ω(m 1/4 ) from Proposition 17. What we would like to know is how much n deviates from this value, for which we will need to compute the variance Let us now write n = m i=1 n i where n i is the number of photons generated at source in mode i. Expanding out the n 2 term of the variance, we get where the first equality uses n i n j = n i n j (i.e., the photons are generated independently at each mode), and the second equality uses the following identity for one half of a two-mode squeezed state: 18 Therefore, we can write the variance as Once again, we make the observation that when the maximum eigenvalue of A is small, the higher order terms in the Taylor expansion converge.
We bound the terms of the sum as whenever λ max (A) ≤ 1/2.
We are finally ready to state and prove the main result for this section: a bound on how much n deviates from m/α 2 (roughly, its expected value) over both the randomness of A ∼ W and the randomness of the photon number.
Proof. Once again, we have λ max (A) ≤ 4β/α 2 ≤ 1/2 with probability δ/4 by Lemma 15, and so we have where the first inequality comes from Proposition 20 and the second comes from Theorem 19 (which we assume holds with probability δ/2). By Chebyshev's inequality, we get where the probability is over the randomness in the photon number. And so, combining the previous inequalities with the union bound, we get with probability δ, which completes the proof.

A.2 Bounds on normalization factor
We are now interested in the quantity where λ i is the ith eigenvalue of a complex Wishart matrix A := CC † ∼ W. We begin this section with the following fact about the moment generating function of A ∼ W: Lemma 22 (Goodman [58]).
where the last equality comes from the Taylor expansion of each term ln(1 − λ i ). First, we will show that we can bound the higher powers of the eigenvalues. By Lemma 15, for any β ≥ 4, we have whenever m ≥ 1/β 2 ln(1/δ). Therefore, let's assume that λ max (A) ≤ 4β/α 2 . We obtain where we have assumed that 8β ≤ α 2 for the last inequality. Combining (72) and (74), we get By Lemma 22, we have where we've used a Taylor expansion and α ≥ 2 for the last inequality. By Markov's inequality we get with probability at least 1 − δ/2. The theorem follows from (75), (77), and the union bound.

A.3 Proof of thermal state identity
The purpose of this section is to give a self-contained proof of the identity which is used in Appendix A.1 to show that n is close to m/α 2 with high probability. To start, recall that in the main text we introduced the two-mode squeezing operator between modes i and i + m with squeezing parameter r as: is an anti-Hermitian generator. We would like to investigate expectation values of operators O on the two-mode squeezed vacuum state To make progress, we investigate the following quantity In the last equation we used the fact that G † = −G and introduced the commutator Having derived these transformations we can now calculate In the last equation we used the canonical commutation relations and the fact that the vacuum is annihilated by the annihilation operator. If we set k = 1, we easily find the mean photon number to be n i = a † i a i = sinh 2 r. For the second moment, we write The first term in the RHS of the last equality can be obtained by setting k = 2 and we finally obtain n 2 i = 2 n i 2 + n i , as claimed.

B Technical lemmas for proving hardness of permanents with repetitions
In this appendix, we give proofs for the technical lemmas in Section 4. To start, let us show the proof of the expectation of a permanent of a matrix with repeated rows and columns which is used to define the allowable error fluctuations in the RGPE problem (see Definition 11).
for any repetition vectors S, T ∈ N c such that n = Proof. For simplicity, let us define the n×n matrix X := A S,T so that we have Per(X) = σ∈Sn n i=1 X i,σ(i) . Because X has repeated rows and columns, notice that many terms n i=1 X i,σ(i) of the permanent are the same for different permutations σ. Let us rewrite the permanent so that such terms are grouped together. Notice that A S,T can be thought of as consisting of c 2 blocks, where the (i, j)th block is the constant matrix A si×tj i,j . We block the permutations similarly. Concretely, given a permutation matrix over n elements, let f : S n → N c×c be the function that sums all elements within the same block. For example, suppose we have S = T = (2, 2, 1). Then Let M S,T := {f (σ) : σ ∈ S n } be the set of matrices that arise from applying f to all permutations. We now ask, given a particular matrix M ∈ M S,T , how many times is M the image of some permutation σ? Notice that if we start from a permutation matrix and permute say the first s 1 rows or t 1 columns, then its image under f is invariant. Generalizing, we get where the denominator compensates for overcounting. Therefore, we can write Lemma 24. Let A ∈ C c×c and S ∈ N c such that s i ≥ 1 is given. Defining n := c + k S , let B be a c × n matrix obtained by adding k S columns to the matrix A with entries that depend on complex variables z and y = {y ( ) i,j }. Let B S be the n × n matrix which has s i copies of row i of B . We construct B such that where the coefficients γ 1 , . . . , γ k S are functions of the entries of A, S, and y ( ) i,j otherwise. We refer the reader to Section 4.1 for an explicit example of this construction.
To prove the correctness of this procedure, recall that the permanent of any matrix can be written as the sum of monomials n i=1 c i,σ(i) over all permutations σ. Let us consider the monomial terms in the expression for Per(B S ) that are constant in z. We claim that for each such monomial, there must be at least one element from each row of the original matrix A. Suppose, by contradiction (and without loss of generality), that there is some monomial with no contribution from the elements in the first row of A. By construction, it follows that the monomial must have s 1 -many contributions from the first s 1 rows of the added columns (i.e., from V (1) , . . . , V (c) ). Notice that only V (1) has entries in those first s 1 rows that are not multiplied by z. However, V (1) only has (s 1 − 1)many columns, and hence the monomial in question must have some contribution from the the other added columns, which are all multiplied by the variable z.
Therefore, one element from each of the c rows of A appears in any monomial that is a constant in z. Observe that these c terms form a monomial of the permanent of A. The remaining k S terms must come from the matrices V (i) , for which the only entries that do not depend on z are of the form y Let us now describe in detail how Lemma 24 and its column version are combined to prove Theorem 12. Given A ∈ C c×c , we will construct a (c + k T ) × (c + k S ) matrix B := B[z; x, y] to which we apply the repetition patterns S = (s 1 , . . . , s c , 1, . . . , 1) and T = (t 1 , t 2 , . . . , t c , 1, . . . , 1). There are k T trailing 1's in S, and k S trailing 1's in T . The matrix B is obtained from A by appending the rows/columns which arise in the following two invocations of the Lemma 24: where p(z) is a polynomial of degree k − 1, and we have implicitly defined

B.2 Polynomial interpolation techniques
Starting from Eq. (47), which we restate, we give the proofs for the polynomial interpolation theorems given in Section 4.2 to extract the constant coefficient. We begin with a general technique for approximating the constant term in a complex polynomial: Lemma 25. Let P (z) = γ 0 + k n=1 γ n z n be a complex polynomial. Assume that there is a procedure to obtain an estimateP (z) = P (z) + e(z) for any z, where e(z) is the error in the estimate, satisfying |e(z)| ≤ for all z. Thenγ is an estimate of γ 0 such that |γ 0 − γ 0 | ≤ .
Proof. The proof is by construction. For each j = 0, 1, . . . , k, define z j = e 2πij/(k+1) . Then it holds that where we used the fact that k j=0 z n j = 0 for all n ≤ k. It follows that We then have Combining lemmas 24 and 25, we have shown that given an oracle to estimate Per(B S,T ), it is possible to also estimate Per(A) by calling the oracle k + 1 times for values of z equally distributed along the unit circle. Indeed, definingP =γ 0 /ξ, with ξ as in Eq. (48), we have |P − Per(A)| is at most the error of the oracle times 1/|ξ|. We now use this to provide a reduction from GPE ± to RGPE ± , as per Definition 11. Theorem 13. Given access to an oracle for RGPE ± with error ε (c + k)! s i !t j !, it is possible to use k + 1 calls to the oracle to solve GPE ± with error √ c! where := ε( 1 |ξ| ((c + k)!/c!) s i !t j !) with ξ as in Eq. (48).
Proof. Let A ∈ C c×c be the input matrix to GPE ± . Following Theorem 12, we build a matrix B such that Per(B S,T ) = ξ Per(A) + zp(z). Recall that if we chose |z| = 1 and the variables x i,j to be from the same Gaussian distribution as the entries of A, this ensures that B S,T is a Gaussian random matrix (with repetitions) that can be fed into the oracle for RGPE ± . Now repeat this construction for z j = e 2πij/(k+1) to obtain the matrices B[z j ; x, y] S,T for each j = 0, 1, . . . , k +1. For each such matrix, use the oracle to compute an estimateP (z j ) of its permanent. Then we output an estimateP of ξ Per(A) given bỹ Using Lemma 25, we have |P − ξ Per(A)| ≤ ε (c + k)! s i !t j !, and so the error in the estimateP /ξ for Per(A) is To understand in Theorem 13, we must understand the size of |ξ|. To this end, in Appendix B.3 we will prove that |ξ| = Ω(1.498 k ) with constant probability for sufficiently large k. To succeed with probability 1 − δ for any δ > 0, we note that the random Gaussian variables appearing in the definition of |ξ| come from the additional Gaussian variables (the x ( ) i,j and y ( ) i,j terms) employed in the reduction. Therefore, if those variables are not sufficiently large, we can simply resample them O(log(1/δ))-many times until they are.
Let us now turn to our second polynomial interpolation technique, which works directly with the magnitude of the permanent: Theorem 14. Given access to an oracle for |RGPE| 2 ± with error ε(c + k)! s i !t j !, it is possible to use O(k) calls to the oracle to solve |GPE| 2 ± with additive error c!, for Proof. We begin by taking the absolute squared value of Eq. (47). This leads to an equation of the form where q(z) is polynomial in z andz with q(0) = 0. Contrary to Theorem 13, we cannot compute the left-hand side for z in the unit circle and leverage the corresponding cancellations, since q(z) has monomials that depend on |z| 2 . However, notice that when z is real, Eq. (91) is of the same form as Eq. (8) in [36]. Unsurprisingly, the remainder of this proof closely follows that in [36], and so, we will just walk through their proof outline and emphasize the differences that arise in our case but omit repetition of technical details.
The first step is to choose x

B.3 Lower bound for |ξ|
The purpose of this section is to give a lower bound on the |ξ| term coming from Eq. (48) in Theorem 13: To prove this theorem, let us first focus on bounding the product of the k Gaussian magnitudes that appear in Eq. (98). To simplify the notation, let us simply rewrite this product as X = k i=1 X i where each X i is the random variable for the magnitude of a standard complex Gaussian. One can check that To bound X, we will need the following standard central limit theorem: where C is some universal constant.
Proof. Letting Z i = log(X i ) and Z = k i=1 Z i , we begin by writing One can check that Pr[Z i = z] = 2e 2z−e 2z leading to the following moments:

C Moments of the click distribution
In this appendix we derive the results provided in Section 2.3.3 on click moments. In Appendix C.1, we present a brief review of the Gaussian formalism in terms of covariance matrices an apply it to derive the Husimi covariance matrix of a Gaussian state in which a transition matrix has been encoded. In Appendix C.2, we describe how to reduce the calculation of the mean number of clicks and its variance for an arbitrary multimode state to the calculation of one-and two-mode vacuum probabilities. In Appendix C.3, we show how to fix the scaling factor α in terms of the photon number density using the quarter circle law. Finally, in Appendix C.4, we put together all the results derived previously and derive Eqs. (24) and (25) giving the first and second order moments of the click distribution for the two halves of the modes respectively.

C.1 Gaussian states and covariance matrices
In this appendix we provide a self-contained review of the Gaussian formalism as relevant to the discussion in the main text and the subsequent appendices. We consider a Gaussian state ρ with M modes and zero mean. The latter condition means that a i = Tr(a i ρ) = 0 for all i ∈ {1, . . . , M }. The state ρ is uniquely characterized by its (complex-)Husimi covariance matrix Q with entries where For Q to represent a valid quantum mechanical state it needs to satisfy the uncertainty relation [60] where Z = I M 0 0 −I M , with I M the M × M identity matrix. The marginal state of a subset of the modes of a Gaussian state is another Gaussian state. In particular the covariance matrix of a subset of the modes i 1 , . . . , i K is obtained by keeping rows and columns i 1 , . . . , i K and i 1 + M, . . . , i K + M from the original covariance matrix Q. Finally, note that the diagonal elements of the Husimi covariance matrix are related to the mean photon number in each of the modes as For a given zero-mean Gaussian state the probability of obtaining vacuum in all its modes when measuring its photon number statistics is given by A simple expression exists linking a pure Gaussian state parametrized by an adjacency matrix A = A T ∈ C M ×M and its Husimi covariance matrix [7,19] We now consider the case where M = 2m and we use A to encode a transition matrix C ∈ C m×m as follows We write the following decomposition C = U σV T with U and V m × m unitary matrices and σ a diagonal matrix with 0 ≤ σ i,i < 1. We can then write where is a unitary matrix of size 4m × 4m. With this diagonalization it is direct to write with This form of the covariance matrix will be useful below.

C.2 Detector click statistics
In this section we derive simple expressions for the mean and variance of the number of clicks when a quantum state is probed using threshold detectors. We will obtain expressions for c , c 2 and ∆ 2 c that hold for an arbitrary M -mode quantum state. In the next subsections we will specialize these expressions to zero-mean Gaussian states encoding a random complex-Gaussian transition matrix.
Let ν c be a subset of {1, 2, . . . , M } where ν c contains exactly c elements. Furthermore, let p(ν c ) denote the probability of observing a click in all detectors with labels in ν c . This probability is given by [61] p(ν c ) = Tr where P (j) 0 = |0 j 0 j | is a projector onto the vacuum in mode j and P (j) 1 0 . The probability of observing c clicks is then given by the sum p(c) = νc p(ν c ). Now consider its characteristic function: where in the last line we used the binomial expansion. The expected number of detectors that click c can then be computed from the characteristic function [18] Tr P where we have used the product rule for derivatives. Using the fact that P where we have defined P We now write By setting x = 0 in the last equation we obtain Equivalently, we can use P 0 , which gives where we have defined P Since reduced states of Gaussian states can be computed efficiently [62], it is possible to use these formulas to calculate the average and variance of the number of clicks for these states. Under the assumption of zero-mean Gaussian states, the two types of terms in the equations above can be written as where Q i,j and Q j are the reduced Husimi covariance matrices of modes i, j and j respectively. We note that these quantities can also be estimated efficiently for Gaussian states using Monte Carlo methods as recently shown in Ref. [63].
In the following, we use these results to estimate the average number of clicks for GBS devices that are configured to encode Gaussian random matrices.
C.3 Fixing the scaling factor in terms of the photon number density In Eq. (14) we used the quarter circle law to find a relation between the scaling factor α, the number of modes m and the total mean photon number. From this equation we can solve For future convenience we can also calculate the following matrix norms of the blocks of the Husimi covariance matrix (recall Eq. (115)) when the transition matrix has singular values satisfying the quarter circle law In the equation above we used the fact that the Frobenius norm of a matrix, ||A|| 2 = m i,j=1 |A i,j | 2 can also be expressed as the sum of the squares of the singular values of the same matrix.

C.4 Means and variances of the click distribution
With the preliminary calculations in the last subsections we are ready to tackle the calculation of the means and covariances of the click distribution between the two sets of modes when averaged over the set of Gaussian random matrices.
For this we start with the means, for which we find the covariance of mode i in the first half and similarly if i = j + m is in the second half From the last two equations we find d = m − m i=1 1 Xj,j and similarly e = m − m i=1 1 Yj,j . Now we can let X j,j → 1 + µ and Y j,j → 1 + µ and rearrange to find precisely Eq. (24). Now we consider the covariances between the clicks in the two halves. For this we need the twobody covariance matrices of the modes. Consider first the case where i and j refer to modes in the first half of the set, i.e., i, j ≤ m. In that case their reduced covariance matrix is given by Note that the Taylor expansion is guaranteed to converge since the uncertainty relation for the Q covariance matrix (cf. Appendix C.1) guarantees that X i,i X j,j > |X i,j | 2 . If the two modes i, j are in the second half one can obtain a similar expression to the one above by letting X → Y Finally, if i and j are in different halves one has to be polynomially bounded. Unfortunately, we can no longer use Lemma 5 to bound this quantity. Because α = o(m 1/4 ), the subdominant term of that expression (e 272m/α 4 ) would still contribute exponentially to Eq. (142). Since that expression was obtained through several approximations, it is no longer tight enough. Nevertheless, we claim that the empirical scaling of Eq. (142) still appears to be polynomial. Let us now describe how those numerics were obtained. Given that we do not have an explicit bound on Z, we must choose to estimate it somehow. One approach would be to simply draw a random Gaussian matrix and compute Z from the eigenvalues of that matrix. Unfortunately, this approach is quite slow in practice and we would only be able to calculate Z for relatively modest numbers of modes. Instead, we will choose to estimate Z by an explicit formula. First, consider the quantity for complex Wishart matrix A ∼ W (see Appendix A for explanation of this ensemble). In other words, the inverse of Z is the characteristic polynomial of A. The expectation of the characteristic polynomial for a Wishart matrix is well-understood: Lemma 29 (Edelman [64]). In Fig. 4, we show that the quantity 1/E[Z −1 ] serves as a good estimate for Z by comparing it against its true value (computed directly with the eigenvalues). In Fig. 5, we compute I from Eq. (142) using the estimate 1/E[Z −1 ] for Z and find a relatively benign scaling. We conclude by reiterating that if we could prove the accuracy of our estimator and that this scaling is only polynomial, then we arrive at our desired conclusion: |RGPE| 2 ± ∈ FBPP NP O . Therefore, if one were to also assume that |RGPE| 2 ± is #P-hard, we obtain approximate average-case hardness for GBS in the high-collision m = Θ(E[ n ]) regime.

D.1 Justification for collision subspace
The purpose of this section is to justify that the subspace H only contains collision patterns that suffice for the reduction given by Lemma 24. While it is known that Lemma 24 is insufficient to prove an efficient reduction between GPE ± and RGPE ± , it is still weak evidence that the permanents that appear in the probabilities are hard to estimate. To recap that procedure, we embed the permanent of matrix A into the permanent of a matrix B S,T with repeated rows/columns provided the repetition patterns S and T have a least k T and k S modes (respectively) with a single photon. 20 Notice that H contains repetition patterns with k T = k S = n − c. So, to match the reduction, n − c of the c modes that click must have exactly one photon. We will show that under a relatively modest condition on α, all repetition patterns in H will satisfy this property.
While this may seem like a rather strong condition, the fact that the number of modes that click is always a significant fraction of the total number of photons makes it possible. First notice that there are at most n − c modes (of our total budget of c clicks) that can have more than two photons. For the reduction, we then need a different set of n − c modes to click. That is, we want 2(n − c) ≤ c. (144) Using Eq. (14) and Eq. (24), this will hold whenever α ≥ 3/ √ 2. Finally, we claim that A will be a (2c − n) × (2c − n) matrix because the total number of clicks must be the dimension of A plus the n − c extra clicks required for the reduction (i.e., c = (2c − n) + (n − c)). In order to obtain a hardness proof, we would like A to be relatively large so that computing its permanent is #P-hard. We have that the size of A is at least half the total number of clicks in the distribution whenever 2c − n ≥ c/2. Notice that this is the same condition as that in Eq. (144), and so we only require α ≥ 3/ √ 2. This is more than sufficient to guarantee hardness for all regimes from the dilute limit all the way to when the number of photons is linear in the number of modes. We stress again, however, that our reduction is only efficient given an oracle for Per(B S,T ), and the numerics of the previous section only suggest an oracle for | Per(B S,T )| 2 .