Quantum machine learning with adaptive linear optics

We study supervised learning algorithms in which a quantum device is used to perform a computational subroutine, either for prediction via probability estimation, or to compute a kernel via estimation of the overlap of quantum states. We design implementations of these quantum subroutines using Boson Sampling architectures in linear optics, supplemented by adaptive measurements. We then challenge these quantum algorithms by deriving classical simulation algorithms for the tasks of output probability estimation and overlap estimation. We obtain different classical simulability regimes for these two computational tasks in terms of the number of adaptive measurements and input photons. In both cases, our results set explicit limits on the range of parameters for which a quantum advantage can be envisaged with adaptive linear optics compared to classical machine learning algorithms: we show that the number of input photons and the number of adaptive measurements cannot be simultaneously small compared to the number of modes. Interestingly, our analysis leaves open the possibility of a near-term quantum advantage with a single adaptive measurement.


Introduction
Quantum computers promise dramatic advantages over their classical counterparts [1,2], but a fault-tolerant universal quantum computer is still far from being available [3]. The quest for near-term quantum speedup has thus led to the introduction of various subuniversal models (models that are believed to have an intermediate computational power between classical and universal quantum computing), such as Boson Sampling [4] or IQP circuits [5], recently culminating with the experimental demonstrations of random circuit sampling [6] and Gaussian Boson Sampling [7].
Finding practical applications for these subuniversal models, other than the demonstration of quantum speedup, is a timely issue, as it may enable interesting quantum advantages in the era of Noisy Intermediate-Scale Quantum devices [3].
In particular, recent proposals have been driven by subuniversal models such as Gaussian Boson Sampling [25,26] or IQP circuits [5,19]. In the latter, the authors considered supervised learning algorithms in which some computational subroutines are executed in a quantum way, namely the estimation of the output probabilities of quantum circuits, or the estimation of the overlap of the output states of quantum circuits. They showed that IQP circuits alone could not provide a quantum advantage for these subroutines and therefore considered minimal extensions of these circuits, in terms of circuit depth.
Hereafter, we study the use of Boson Sampling linear optical interferometers [4], with input photons, for similar quantum machine learning tasks. Instead of extending the depth, which was shown to improve machine learning model performance [19] while making the training phase challenging [27-29], we allow for adaptive measurements (intermediate measurements that drive the rest of the computation), which provide a natural analogue of circuit depth in the linear optics picture [30]: by encoding qubits into single photons and using a sufficient number of adaptive measurements, one can perform universal quantum computing [31].
We give a detailed prescription for performing quantum machine learning with classical data, using adaptive linear optics for computational subroutines such as probability estimation and overlap estimation.
We also examine the classical simulability of these quantum subroutines. More precisely, we give classical simulation algorithms whose runtimes depend explicitly on: (i) the number of modes m, (ii) the number of adaptive measurements k, (iii) the number of input photons n, and (iv) the number of photons r detected during the adaptive measurements. This effectively sets a limit on the range of parameters for which adaptive linear optics may provide an advantage for machine learning over classical computers using our methods, thus identifying the regimes where a quantum advantage can be envisaged. Boson Sampling instances [4] correspond to the case with no adaptive measurement, while the Knill-Laflamme-Milburn scheme for universal quantum computing [31] corresponds to the case where the number of adaptive measurements scales linearly with the size of the computation.
For probability estimation, we show that the classical simulation is efficient whenever the number of adaptive measurements or the number of input photons is constant. Moreover, the number of input photons and the number of adaptive measurements cannot be simultaneously small compared to the number of modes (see Table 2).
For overlap estimation, we show a similar behaviour, although in this case our results do not rule out the possibility of a quantum advantage with a single adaptive measurement (see Tables 3 and 4). Our main technical contribution is an expression for the inner product of the output states of two adaptive unitary interferometers which is essentially independent of the number of adaptive measurements.
The rest of the paper is organised as follows. In section 2, we provide background on quantum machine learning with classical data and classical simulation of quantum computations. In section 3, we introduce the model of adaptive linear optics which we consider. We give a prescription for performing probability estimation and overlap estimation with instances of this model, and detail how to use these as subroutines for machine learning problems. In section 4, we derive two classical simulation algorithms, one for each of these two tasks, and we analyse their running times. We conclude in section 5.

Background
2.1 Kernel methods for quantum machine learning

Encoding classical data with quantum states
We consider a typical machine learning problem, such as a classification problem, where a classical dataset D = {x_1, …, x_|D|} from an input set X is given. One way to use quantum computers to solve such problems is to encode the classical data into quantum states, such that there exists a so-called feature map x_l ↦ |φ(x_l)⟩ which can be processed by a quantum computer.
Definition 1 (Feature map [20]). Let F be a Hilbert space, called feature Hilbert space, X a non-empty set, called input set, and x ∈ X. A feature map is a map φ : X → F from inputs to vectors in the Hilbert space.
Many machine learning algorithms, such as the support vector machines used hereafter, perform well in linear cases. However, many real-world problems require non-linearity to make successful predictions. Kernel methods allow one to introduce non-linearity while using estimation methods that are linear in terms of kernel evaluations.
The kernel corresponds to a dot product in a feature space (here, a high-dimensional Hilbert space). In [32], it is shown that the notions of feature map in Hilbert space and kernel can be connected. A straightforward way is to define a (quantum) kernel K from a feature map φ as follows:

K(x_l, x_l') = ⟨φ(x_l)|φ(x_l')⟩_F,   (1)

where x_l, x_l' ∈ X and ⟨·|·⟩_F is the inner product over the Hilbert space F.
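As an illustration, a kernel of the form of Eq. (1) can be evaluated classically for a toy feature map. The map below is a hypothetical stand-in, not one of the linear optical feature maps considered in this paper:

```python
import numpy as np

def feature_map(x):
    # Hypothetical toy feature map (not one of the paper's linear optical
    # feature maps): encode a real input x as a normalised 2d state vector.
    return np.array([np.cos(x), np.sin(x)])

def kernel(x1, x2):
    # Kernel of Eq. (1): inner product of the feature vectors.
    return np.vdot(feature_map(x1), feature_map(x2))

# K(x, x) = 1 follows from normalisation of the feature states.
print(kernel(0.3, 0.3))        # 1.0 up to rounding
print(kernel(0.0, np.pi / 2))  # orthogonal feature states: ~0.0
```

Any choice of feature map yields a valid kernel in this way, which is what makes the quantum feature maps of section 3 directly usable in kernel-based classifiers.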

Using Feature Hilbert Spaces for Machine Learning
There are two main ways to use feature Hilbert spaces:

• The quantum variational classification [19] or explicit approach [20]: the entire model computation is performed on a quantum device, trained by a hybrid variational quantum-classical or classical algorithm [14,15,33]. In this case, the probability distribution over the possible outcomes is used for the classification. Hence, while the feature map is used to encode the data, the kernel is not known directly from the quantum device.
In [19], the probability of obtaining a certain binary output b_i for the classical data input x is given by:

Pr(b_i|x) = |⟨b_i| W(θ) U_φ(x) |0⟩|²,   (2)

where U_φ(x) encodes the feature map and W(θ) corresponds to the quantum support vector machine.
• The quantum kernel estimation [19] or implicit approach [20]: a quantum-assisted method where the quantum device is only used to evaluate the kernel, and the rest of the machine learning algorithm is carried out by a classical algorithm. In [19], the kernel between two classical data vectors x_l and x_l' is given by:

κ(x_l, x_l') = |⟨0| U_φ(x_l)† U_φ(x_l') |0⟩|².   (3)

In order to challenge and justify the use of a quantum-assisted method, it is important to consider how hard it would be for a classical machine to compute the same quantities. Indeed, if the output probability or kernel could be estimated directly classically, there would be no need for a quantum computer for this task. Hence, when it comes to classical hardness, the first method requires that estimating the output probability in Eq. (2) is hard, while the second method requires that estimating the overlap in Eq. (3) is hard (for classical computers). We give more formal definitions for these classical simulation tasks in the following section.

Classical simulation of quantum computations
Depending on the approach used to classically simulate the functioning of quantum devices, several notions of simulability are commonly used. One could ask the classical simulation algorithm to mimic the output of the quantum computation [34,35]: informally, a quantum computation is weakly simulable if there exists a classical algorithm which outputs samples from its output probability distribution in time polynomial in the size of the quantum computation. Various relaxations of this definition are possible, allowing the classical sampling to be approximate rather than exact, or to abort with a small probability. The existence of such an efficient classical simulation has been ruled out for various subuniversal models of quantum computing, such as IQP circuits [5] or Boson Sampling [4], under complexity-theoretic conjectures, both in the exact case and in the approximate case, the latter up to additional conjectures. While weak simulation of quantum computations is arguably the most commonly studied notion, other notions of classical simulation may be useful: if the output samples of a quantum computation are used to compute a quantity which may be computed efficiently classically by other means, it is no longer necessary to simulate the whole quantum device. We consider two concrete examples which are prominent for variational quantum algorithms in quantum machine learning: probability estimation and overlap estimation [19,20].

Definition 3 (Probability estimation).
Let P be a probability distribution over M outcomes and let ε, δ > 0. Given any outcome x in the sample space of P, probability estimation refers to the computational task of outputting an estimate P̃[x] such that

|P̃[x] − P[x]| ≤ ε,

with probability greater than 1 − δ, in time O(poly(M, 1/ε) log(1/δ)).

Efficient probability estimation thus amounts to outputting an estimate of the probability of a fixed outcome with polynomially small additive error, with exponentially small probability of failure, in polynomial time in the size of the computation. One may use the samples from a quantum computation in order to perform probability estimation efficiently for any given outcome: given a quantum device of size M which outputs samples from some probability distribution and a fixed outcome x in the sample space, one may run the device poly(M) times, recording the value 1 whenever the outcome x is obtained and the value 0 otherwise. Then, the frequency of the outcome x over the poly(M) uses of the quantum device is a polynomially precise additive estimate of the probability of the outcome x, with exponentially small probability of failure, by virtue of Hoeffding's inequality [36].
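The sampling-based estimator described above can be sketched classically; the 4-outcome distribution below is a hypothetical placeholder standing in for the quantum device:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_probability(sample, outcome, n_runs):
    # Run the device n_runs times; record 1 when `outcome` is observed.
    hits = sum(1 for _ in range(n_runs) if sample(rng) == outcome)
    return hits / n_runs

# Hypothetical stand-in for a quantum device of size M: a fixed classical
# distribution over 4 outcomes.
p = np.array([0.5, 0.25, 0.15, 0.1])
sample = lambda rng: rng.choice(4, p=p)

# By Hoeffding's inequality, O(log(1/delta)/eps^2) runs give an estimate
# within eps of the true probability, except with probability delta.
est = estimate_probability(sample, 0, 20000)
print(abs(est - 0.5) < 0.05)  # True with overwhelming probability
```

The same frequency estimator applies verbatim when the samples come from an actual quantum device rather than a simulated distribution.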
Weak simulation is at least as hard as probability estimation, since by the previous reasoning one may obtain polynomially precise additive estimates of probabilities from samples of the probability distribution. Moreover, there are some quantum computations for which weak simulation is hard for classical computers (assuming widely believed conjectures from complexity theory), but probability estimation can be done efficiently classically. This is the case for IQP circuits [5,19], Boson Sampling interferometers [4] and even the circuit implementing the period-finding subroutine of Shor's factoring algorithm [2]. We detail the latter case in Appendix A, for the sake of clarifying the relations between these different notions of classical simulation. Note also that computing exactly the output probabilities of quantum circuits or interferometers is a #P-hard problem, therefore expected to be hard both classically and quantumly [37].
A more general computational task than probability estimation in the context of quantum computing is the following:

Definition 4 (Overlap estimation). Let |φ⟩ and |ψ⟩ be output states of two efficiently describable quantum computations of size M and let ε, δ > 0. Overlap estimation refers to the computational task of outputting an estimate Õ such that

|Õ − |⟨φ|ψ⟩|²| ≤ ε,

with probability greater than 1 − δ, in time O(poly(M, 1/ε) log(1/δ)).

The overlap between two quantum states is a measure of their distinguishability [38], and overlap estimation is thus related to quantum state discrimination. Several techniques exist to perform the overlap estimation of two states |φ⟩ and |ψ⟩ quantumly [39]. One of them is to perform the swap test [40] with various copies of both states.
Overlap estimation can also be done efficiently classically for IQP circuits. Nonetheless, this family of circuits has been identified as a promising venue for implementing quantum machine learning algorithms [19], when enlarged to contain similar circuits with greater depth. Motivated by this approach, we consider hereafter the case of another subuniversal model: Boson Sampling [4] with input photons, supplemented with adaptive measurements.

Quantum machine learning with adaptive linear optics
In what follows, we study Boson Sampling architectures [4] supplemented with a given number of adaptive measurements (that is, some of the modes are measured throughout the computation and the rest of the computation can depend on their outcomes), which we refer to as adaptive linear optics. In this section, we derive quantum algorithms for performing probability and overlap estimation with adaptive linear optics, together with a prescription for utilising these algorithms as subroutines in supervised learning algorithms.

Adaptive linear optics
Hereafter, we detail the computational model of adaptive linear optics which we consider. We first introduce a few notations.
In the linear optics picture, the input states we consider are multimode photon number Fock states over m modes (we use bold math for multi-index notations, see Table 1):

|s⟩ = |s_1 … s_m⟩ = (â_1†)^{s_1} ⋯ (â_m†)^{s_m} / √(s_1! ⋯ s_m!) |0⟩,   (6)

where s_i and â_i† are respectively the number of photons and the creation operator for the i-th mode [41]. We identify these states with m-tuples of integers s = (s_1, …, s_m) ∈ N^m. Following [4], let us define, for all n ∈ N,

Φ_{m,n} = { s ∈ N^m : s_1 + ⋯ + s_m = n },

the set of photon number patterns with total photon number n. We consider a unitary interferometer of size m, described by an m × m unitary matrix U = (u_ij)_{1≤i,j≤m}. Unlike in the circuit picture, the matrix U does not act on the computational basis, which is in this case the infinite multimode Fock basis, but rather describes the linear evolution of the creation operator of each mode. More precisely,

Û â_j† Û† = Σ_{i=1}^m u_ij â_i†,  for j = 1, …, m,   (8)

where we write Û for the unitary action of the interferometer on the multimode Fock basis. Because the interferometer conserves the total number of photons, for all p, q ∈ N with p ≠ q, all s ∈ Φ_{m,p} and all t ∈ Φ_{m,q},

⟨s| Û |t⟩ = 0.   (9)

When p = q = n, combining Eq. (6) and Eq. (8) we obtain [4]

⟨s| Û |t⟩ = Per(U_{s,t}) / √(s! t!),   (10)

with s! := s_1! ⋯ s_m!, where U_{s,t} is the n × n matrix obtained from U by repeating s_i times its i-th row and t_j times its j-th column for i, j = 1, …, m, and where the permanent of an r × r matrix A = (a_ij)_{1≤i,j≤r} is defined as

Per(A) = Σ_{σ∈S_r} Π_{i=1}^r a_{iσ(i)},   (11)

where S_r is the symmetric group over {1, …, r}.
We write Pr_{m,n}[·|t] for the probability distribution of the outcomes over Φ_{m,n} of the unitary interferometer U acting on an input |t⟩. With the previous notations we obtain, for all p, q ∈ N, all s ∈ Φ_{m,p} and all t ∈ Φ_{m,q},

Pr_{m,n}[s|t] = δ_pq |Per(U_{s,t})|² / (s! t!),   (12)

where δ_pq is the Kronecker symbol.
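For small systems, these output probabilities can be evaluated directly. The sketch below (a minimal illustration, not the paper's simulation algorithm) computes the permanent with Ryser's formula and reproduces the Hong-Ou-Mandel effect on a 50:50 beam splitter:

```python
import itertools
from math import factorial

import numpy as np

def permanent(A):
    # Ryser's formula: Per(A) in time O(n 2^n).
    n = A.shape[0]
    total = 0.0
    for size in range(1, n + 1):
        for S in itertools.combinations(range(n), size):
            total += (-1) ** size * np.prod(A[:, list(S)].sum(axis=1))
    return (-1) ** n * total

def submatrix(U, s, t):
    # U_{s,t}: repeat row i of U s_i times and column j t_j times.
    rows = [i for i, si in enumerate(s) for _ in range(si)]
    cols = [j for j, tj in enumerate(t) for _ in range(tj)]
    return U[np.ix_(rows, cols)]

def transition_probability(U, s, t):
    # Probability formula for |s| = |t|: |Per(U_{s,t})|^2 / (s! t!).
    norm = np.prod([factorial(x) for x in s + t])
    return abs(permanent(submatrix(U, s, t))) ** 2 / norm

# 50:50 beam splitter with one photon in each input mode:
U = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
print(transition_probability(U, (1, 1), (1, 1)))  # 0.0 (Hong-Ou-Mandel)
print(transition_probability(U, (2, 0), (1, 1)))  # 0.5 (photon bunching)
```

The two photons never exit in separate modes, and bunch with probability 1/2 in each output mode, as expected.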
In what follows, we fix the input state |t⟩ = |1^n 0^{m−n}⟩, with single-photon states in the first n modes and vacuum states in all other modes, where the superscript indicates the size of the string (0, …, 0) or (1, …, 1) when there is a possible ambiguity. We consider the case of linear optical quantum computing with adaptive photon-number measurements, which we refer to as adaptive linear optics (see Fig. 1). We denote by k ∈ {0, …, m} the number of single-mode adaptive measurements. Without loss of generality, we assume that the first k modes are measured adaptively throughout the computation and we write p = (p_1, …, p_k) for the adaptive measurement outcomes. For r ∈ N and p ∈ Φ_{k,r}, let us define

U_p = (1_k ⊕ U_k) (1_{k−1} ⊕ U_{k−1}) ⋯ (1_1 ⊕ U_1) U_0,   (13)

where 1_j is the identity matrix of size j and where the unitary matrices U_j depend on the measurement outcomes p_1, …, p_j for all j ∈ {1, …, k}. An adaptive interferometer U over m modes with n input photons and k adaptive measurements is then represented as a family

U = { U_p | p ∈ Φ_{k,r}, 0 ≤ r ≤ n }   (14)

of nonadaptive unitary interferometers, one for each possible adaptive measurement outcome p. The matrix U_p describes the interferometer in Fig. 1 when the adaptive measurement outcome p = (p_1, …, p_k) ∈ Φ_{k,r} has been obtained, where r is the number of photons detected during the adaptive measurements. In this case, the output state is a pure state which reads:

|φ_p⟩ = (⟨p| ⊗ 1_{m−k}) Û_p |t⟩ / √(Pr^adap_{m,n}[p]),   (15)

where Pr^adap_{m,n}[p] is the probability of obtaining the adaptive measurement outcome p.

Figure 1: Linear optical computing model with k adaptive measurements and input state |1 … 1 0 … 0⟩, with n photons over m modes. The output modes are measured using photon number detection. For all j ∈ {1, …, k}, the unitary interferometer U_j, acting on m − j modes, depends on the measurement outcomes p_1, …, p_j. The adaptive measurement outcomes p_1, …, p_k are used to drive the computation, whose final outcome is s_1, …, s_{m−k}.

Quantum probability and overlap estimation
We now detail how to perform probability estimation or overlap estimation with an adaptive linear optical interferometer as described in the previous section.
For estimating an output probability with a quantum circuit, one may run the circuit O(poly m) times, obtaining classical outcomes whose frequency gives a polynomially precise additive estimate of the probability, which can be computed efficiently. In the case of a circuit with adaptive measurements, one only looks at the final measurement outcomes, and the same holds for adaptive linear optical computations.
For estimating the overlap of the output states of two unitary quantum circuits, one may run both circuits in parallel and compare their quantum output states, for example with the swap test [40]. Doing so a polynomial number of times provides a polynomially precise estimate of the overlap. Alternatively, writing C_1 and C_2 for the unitary circuits, one may build the circuit C_1 C_2† and project its output quantum state onto the input state.
In the case of circuits with adaptive measurements, there is a different output state for each adaptive measurement outcome. In particular, if the number of possible adaptive measurement outcomes is exponential in the size of the computation, then the probability distribution over these outcomes has to be concentrated on a polynomial number of events for the quantum overlap estimation to be efficient. This is because, in order to compute a polynomially precise estimate of the overlap, say |⟨φ|ψ⟩|², the states |φ⟩ and |ψ⟩, both corresponding to specific adaptive measurement results, have to be obtained a polynomial number of times.
For adaptive linear optics over m modes with n input photons and k adaptive measurements, the number of possible adaptive measurement outcomes is given by

Σ_{r=0}^{n} |Φ_{k,r}| = Σ_{r=0}^{n} (k+r−1 choose r) = (n+k choose n),

where the sum is over the total number of photons r detected at the stage of the adaptive measurements. Hence, for overlap estimation to be efficient, either the probability distribution over the adaptive measurement outcomes is concentrated on a polynomial number of outcomes, or (n+k choose n) = O(poly m). In what follows, we do not assume concentration of the adaptive measurement outcome probability distribution and consider general interferometers with adaptive measurements. In this setting, the quantum efficient regime for overlap estimation thus corresponds to (n+k choose n) = O(poly m).

Let U = {U_p | p ∈ Φ_{k,r}, 0 ≤ r ≤ n} be an adaptive linear interferometer with n input photons and k adaptive measurements and let |φ⟩ and |ψ⟩ be two output states. Let p and q denote the outcomes of the adaptive measurements for |φ⟩ and |ψ⟩, respectively, so that U_p is the interferometer for |φ⟩ and U_q is the interferometer for |ψ⟩, with input Fock state |t⟩. With Eq. (15) we obtain

⟨φ|ψ⟩ = ⟨t| Û_p† (|p⟩⟨q| ⊗ 1_{m−k}) Û_q |t⟩ / √(Pr^adap_{m,n}[p] Pr^adap_{m,n}[q]).   (17)

Because of the conservation of the total number of photons, the overlap between the states |φ⟩ and |ψ⟩ is zero if |p| ≠ |q|. Otherwise, it can be estimated using a polynomial number of copies of the state |ψ⟩ as follows: send the input |p⟩ ⊗ |ψ⟩ into the interferometer with unitary matrix U_p† and measure the photon number in each output mode; record the value 1 if the measurement pattern matches the Fock state |t⟩ and the value 0 otherwise. Then, the mean of the obtained values yields a polynomially precise estimate of the overlap |⟨φ|ψ⟩|² by Eq. (17) and Hoeffding's inequality. A similar procedure can be followed when |φ⟩ and |ψ⟩ are output states of two different adaptive interferometers.
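The outcome count above can be checked directly; a small sketch under the stated conventions:

```python
from math import comb

def n_adaptive_outcomes(n, k):
    # Sum over the number r of detected photons of |Phi_{k,r}| = C(k+r-1, r).
    return sum(comb(k + r - 1, r) for r in range(n + 1))

# The sum collapses to a single binomial coefficient (hockey-stick identity):
for n in range(6):
    for k in range(1, 6):
        assert n_adaptive_outcomes(n, k) == comb(n + k, n)
print(n_adaptive_outcomes(10, 2))  # 66, i.e. C(12, 10)
```

For constant k this count grows only polynomially in n, which is the source of the quantum efficient regime discussed in the text.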
Note that this overlap estimation requires the preparation of the Fock state |p⟩ but no adaptive measurements. By symmetry, one could alternatively estimate the overlap using a polynomial number of copies of the state |φ⟩ and preparing the Fock state |q⟩, or by mixing copies of |φ⟩ and |ψ⟩. In practice, one may run the adaptive interferometer U and apply the above procedure for |φ⟩ or |ψ⟩ on the fly, depending on whether the adaptive measurement outcome obtained is equal to q or p (see Algorithm 1 for overlap estimation including this state preparation step).
These tasks of probability or overlap estimation may be performed in particular as subroutines for machine learning algorithms, as we detail in the next section.

Support vector machine with adaptive linear optics
We consider a training dataset T = {(x_1, y_1), …, (x_{|T|}, y_{|T|})}, where the x_l are classical data points and the y_l ∈ {−1, +1} are the corresponding labels. We also consider a feature map that is given by a subset of the family introduced in Eq. (14): U_φ = {U_{p_l}^l}_l, in which for each classical data point x_l there is a unitary operator U_{p_l}^l.

Support vector machine with quantum kernel methods
The goal of a support vector machine [42-44] is to find the maximum-margin hyperplane that divides the set of points by their y values. Such a hyperplane is defined by a vector w ∈ R^d and b ∈ R. With the hard-margin condition we have, for all l,

y_l (⟨w, x_l⟩ + b) ≥ 1.

In this case, a decision function over a new point x can be constructed directly from the hyperplane as follows:

f(x) = sign(⟨w, x⟩ + b).

When we have access to a feature map φ, the vector w can be decomposed as

w = Σ_l α_l φ(x_l)

for some α_l ∈ C, and the decision function becomes

f(x) = sign( Σ_l α_l κ(x_l, x) + b ),   (20)

where the kernel κ(x_l, x) has an interpretation in terms of the feature map φ as given in Eq. (1).
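The kernelised decision function of Eq. (20) only requires kernel evaluations, which is what enables the implicit approach. A minimal classical sketch, with a Gaussian kernel as a hypothetical stand-in for the quantum kernel and hand-picked (untrained) coefficients:

```python
import numpy as np

def decision(x, support, alphas, b, kernel):
    # Decision function of Eq. (20): f(x) = sign(sum_l alpha_l * kappa(x_l, x) + b).
    # In the implicit approach, `kernel` would be evaluated on the quantum
    # device; here it is any callable.
    return np.sign(sum(a * kernel(xl, x) for a, xl in zip(alphas, support)) + b)

# Toy 1D example with a Gaussian kernel as a hypothetical stand-in for the
# quantum kernel of Eq. (1); coefficients are chosen by hand, not trained.
rbf = lambda x1, x2: np.exp(-(x1 - x2) ** 2)
support, alphas, b = [-1.0, 1.0], [-1.0, 1.0], 0.0
print(decision(-0.5, support, alphas, b, rbf))  # -1.0
print(decision(0.8, support, alphas, b, rbf))   # 1.0
```

Swapping the kernel callable for a quantum estimate of Eq. (3) leaves the rest of the classifier unchanged.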
In order to find the best separating hyperplane, one may use a quadratic programming solver. In section 3.3.2, we consider the case where the programming solver is composed of a hybrid quantum-classical algorithm, whereas in section 3.3.3 such a solver is external to the quantum device. The optimisation program that needs to be solved is, in standard soft-margin dual form,

max_α  Σ_l α_l − (1/2) Σ_{l,l'} y_l y_{l'} α_l α_{l'} κ(x_l, x_{l'})  subject to  Σ_l y_l α_l = 0,  0 ≤ α_l ≤ λ,   (22)

where λ is a parameter that is introduced for the soft-margin condition (the labels y_l can be absorbed into the coefficients α_l of Eq. (20)). By solving this problem, one obtains the optimal coefficients {α*_l} and b* of the hyperplane (see Eq. (20)). This formulation is for the quantum case, but similar methods can be applied in the classical case [43].

Explicit method: probability estimation
The explicit method corresponds to the case where the prediction (assigning a classification label to the data) is obtained from the probability distribution of the outcomes of a quantum device.
Hereafter, we consider using adaptive linear optics for the feature map and use its output state as the input state of a second quantum device for the support vector machine [19]. The choice of the feature map ultimately depends on the architecture at hand: one may use, e.g., reconfigurable interferometers [45] in the linear optics picture. We show that Boson Sampling can be used to realise a support vector machine algorithm by using a hybrid variational quantum-classical algorithm for the training phase [14,15,33]. We write BS(θ) for the Boson Sampling interferometer operation, where θ are the parameters of the beam splitters and phase shifters of the interferometer, since any interferometer can be implemented efficiently with only phase shifters and beam splitters [46]. In order to make a prediction, we need to bin the outcomes. For a given function g : Φ_{m,n} → {−1, +1}, we can write the following observable for the binning:

N = Σ_{s∈Φ_{m,n}} g(s) |s⟩⟨s|.

The observable N can also be decomposed as N = Π_{+1} − Π_{−1}, where Π_y is the projector onto the outcomes s with g(s) = y. For a given data point x with the associated operation Û_{p_x}^x, the probability of obtaining the outcome y is:

Pr(y|p_x) = ⟨φ_{p_x}(x)| BS(θ)† Π_y BS(θ) |φ_{p_x}(x)⟩,   (24)

where |φ_{p_x}(x)⟩ is the output state of Eq. (15) for the feature map interferometer Û_{p_x}^x. In the quantum circuit model, it has been explicitly shown in [19] that the equivalent of Eq. (24) in the circuit picture is related to the decision function of a support vector machine. This proof relies on decomposing the variational quantum circuit in the Pauli basis. Moreover, in [47], the authors provide a quantum circuit that simulates Boson Sampling interferometers with input photons, using the quantum Schur transform. Hence, the unitary BS(θ) can be decomposed in the Pauli basis (for qudits), thus providing the same relation between Eq. (24) and the decision function of a support vector machine.
Experimentally, the probability Pr(y|p_x) can be estimated using the following approximation:

Pr(y|p_x) ≈ T_{y|x} / T_x,

where T_x is the number of runs in which the value p_x has been recorded and T_{y|x} is the number of runs in which the value y has been recorded after the value p_x has also been recorded.
In order to train the Boson Sampling device, it is necessary to use a cost function J(θ) and an optimisation routine. For the optimisation routine, standard variational optimisation methods can be used, such as hybrid variational quantum-classical or classical algorithms [14,15,33]. Different cost functions can be used, such as the empirical risk function [19] or the square-loss function [20]. However, other types of cost functions may not be suitable for this kind of problem, such as the ones usually used in gate synthesis or quantum state preparation [48,49].
The algorithm for the training phase is described in Algorithm 2 and the prediction phase is described in Algorithm 3.

[Kernel training subroutine (cf. Algorithm 4): for each pair of training points (x_l, x_{l'}), estimate κ(x_l, x_{l'}) and store the output at kernel matrix entry K_{l,l'}; then solve the optimisation problem in Eq. (22) with T and K, and return {α*_l} and b*.]

Implicit method: overlap estimation
The implicit method corresponds to the case where the kernel is computed by a quantum device, while the rest of the machine learning algorithm is performed on a classical machine. In adaptive linear optics, the kernel can be estimated using Algorithm 1. It is possible to perform both the training and the prediction phase using this hybrid quantum-classical method. The algorithm for the training phase is described in Algorithm 4 and the prediction phase is described in Algorithm 5.

Classical simulation of adaptive linear optics
The complexity of probability estimation and overlap estimation of quantum computations has been well studied in the circuit model [50,51]. In this section, we challenge the quantum algorithms introduced in the previous section by deriving classical algorithms for probability estimation and overlap estimation for adaptive linear optics over m modes. We identify various complexity regimes for different numbers of input photons n ≤ m, numbers of adaptive measurements k ≤ m, and numbers of photons r ≤ n detected during the adaptive measurements. To do so, we derive generic expressions for the output probabilities and for the overlap of output states of linear optical interferometers with adaptive measurements.

Classical probability estimation
Firstly, we obtain a classical algorithm for probability estimation of adaptive linear optics over m modes with n input photons and k adaptive measurements.
We first consider the case k = 0, i.e., Boson Sampling. The probability of an outcome s ∈ Φ_{m,n} for a unitary interferometer U given the input t = (1^n, 0^{m−n}) ∈ Φ_{m,n} is given by Eq. (12):

Pr_{m,n}[s|t] = |Per(U_{s,t})|² / s!,

where U_{s,t} is the n × n matrix obtained from U by repeating s_i times its i-th row for i ∈ {1, …, m} and removing its j-th column for j ∈ {n+1, …, m}. When |s| ≠ n, however, the probability is 0, since the input t has n photons and the linear interferometer does not change the total number of photons. The permanent of a square matrix of size n can be computed exactly in time O(n 2^n), thanks to Ryser's formula [52]. However, polynomially precise additive estimates of the permanent of a square unitary matrix can be obtained in polynomial time in the size of the matrix using an algorithm due to Gurvits [53], later generalised to matrices with repeated rows or columns [54], so probability estimation can be done classically efficiently, as was already noted in [4].
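The randomised estimator behind Gurvits' algorithm admits a compact sketch; the matrix below is an illustrative example with a known permanent, and the sample count is chosen loosely rather than from the formal O(1/ε²) bound:

```python
import numpy as np

rng = np.random.default_rng(1)

def gurvits_estimate(A, n_samples):
    # Randomised estimator underlying Gurvits' algorithm:
    # Per(A) = E_x[ prod_j x_j * prod_i (A x)_i ] for x uniform in {-1, 1}^n,
    # giving an additive estimate from O(1/eps^2) samples for bounded-norm A.
    n = A.shape[0]
    total = 0.0
    for _ in range(n_samples):
        x = rng.choice([-1.0, 1.0], size=n)
        total += np.prod(x) * np.prod(A @ x)
    return total / n_samples

# Illustrative matrix with known permanent: Per([[.5,.5],[.5,.5]]) = 0.5.
A = np.full((2, 2), 0.5)
est = gurvits_estimate(A, 50000)
print(abs(est - 0.5) < 0.05)  # True: the estimate concentrates around 0.5
```

Each sample costs only a matrix-vector product, which is what makes the estimator polynomial-time despite the #P-hardness of exact permanent computation.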
We now turn to the case k > 0, which to our knowledge has not been treated elsewhere, using the notations of section 3.1. This case is a direct extension of the case k = 0. With Eq. (12), for r ∈ N, p ∈ Φ_{k,r} and s ∈ Φ_{m−k,n−r}, the probability of a total outcome (p, s) ∈ Φ_{m,n} (adaptive measurement and final outcome) is given by

Pr[(p, s)] = |Per((U_p)_{(p,s),t})|² / (p! s!).

Let r ∈ {0, …, n} and let s ∈ Φ_{m−k,n−r}. Then, the probability of obtaining the final outcome s after the adaptive measurements reads

Pr[s] = Σ_{p∈Φ_{k,r}} |Per((U_p)_{(p,s),t})|² / (p! s!).

The sum is taken over the elements of Φ_{k,r}, which has (k+r−1 choose r) elements, where r ≤ n is the total number of photons detected during the adaptive measurements. Each permanent in the sum can be estimated additively with polynomial precision in time O(poly m), using the generalised algorithm from [54] (see Appendix B for details). Hence, probability estimation can be done classically efficiently whenever the sum has a polynomial number of terms. In particular, as long as both k and r are O(log m), the output probability can be estimated efficiently. The simulability regimes are summarised in Table 2.
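The sum above ranges over Φ_{k,r}, which can be enumerated recursively; a small sketch of the bookkeeping (the permanent estimation itself is omitted):

```python
from math import comb

def Phi(k, r):
    # Enumerate Phi_{k,r}: tuples p in N^k with p_1 + ... + p_k = r (k >= 1).
    if k == 1:
        return [(r,)]
    return [(i,) + rest for i in range(r + 1) for rest in Phi(k - 1, r - i)]

# The classical algorithm estimates one permanent per element of Phi_{k,r},
# so its cost is governed by |Phi_{k,r}| = C(k+r-1, r).
print(Phi(2, 2))                     # [(0, 2), (1, 1), (2, 0)]
assert len(Phi(3, 4)) == comb(6, 4)  # 15 terms
```

When both k and r are O(log m), this enumeration has polynomially many elements, matching the efficient regime stated in the text.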
The universal quantum computing regime corresponds to n = O(m) and k = O(m) [31].

Classical overlap estimation
In this section, we obtain a classical algorithm for overlap estimation of output states of adaptive linear optical interferometers over m modes with n input photons and k adaptive measurements.
Once again, we start with k = 0. With Eq. (10), the output state of an m-mode interferometer U with input state t ∈ Φ_{m,n} reads

Û|t⟩ = Σ_{s∈Φ_{m,n}} ( Per(U_{s,t}) / √(s! t!) ) |s⟩,

where U_{s,t} is the n × n matrix obtained from U by repeating s_i times its i-th row for i ∈ {1, …, m} and repeating t_j times its j-th column for j ∈ {1, …, m}. The composition of two interferometers is another interferometer whose unitary representation is the product of the unitary representations of the composed interferometers. Hence, the inner product of the output states |φ⟩ and |ψ⟩ of two m-mode interferometers U and V with the same input state t ∈ Φ_{m,n} is equal to the matrix element (t, t) of Û†V̂:

⟨φ|ψ⟩ = ⟨t| Û†V̂ |t⟩ = Per((U†V)_{t,t}) / t!,

where we used in the last equality t ∈ Φ_{m,n} and the fact that Û†V̂ conserves the space Φ_{m,n}. With the input t = (1^n, 0^{m−n}) with n photons in m modes, this reduces to

⟨φ|ψ⟩ = Per((U†V)_n),

where (U†V)_n is the n × n top left submatrix of U†V. Hence, the inner product and the overlap may be approximated to polynomial precision efficiently, since this is the case for the permanent [53,54], as detailed in the previous section.

We now consider the case k > 0. Let r ∈ N and let p ∈ Φ_{k,r}. With Eq. (10) and Eq. (15), writing Pr^adap_{m,n}[p] for the probability of obtaining the adaptive measurement outcome p, the (not normalised) output state of the interferometer U_p with k adaptive measurements and input t = (1^n, 0^{m−n}) in Fig. 1, when the adaptive measurement outcome p is obtained, reads

|ψ_p⟩ = (⟨p| ⊗ 1_{m−k}) Û_p |t⟩ = Σ_{s∈Φ_{m−k,n−r}} ( Per((U_p)_{(p,s),t}) / √(p! s!) ) |s⟩,

where Pr^adap_{m,n}[p] = ⟨ψ_p|ψ_p⟩. The inner product of two (not normalised) output states |ψ_p⟩ and |ψ_q⟩ of m-mode interferometers U_p and V_q with k adaptive measurements is zero if |p| ≠ |q|; when |p| = |q| = r, it is given by

⟨ψ_p|ψ_q⟩ = Σ_{s∈Φ_{m−k,n−r}} Per((U_p)_{(p,s),t})* Per((V_q)_{(q,s),t}) / (√(p! q!) s!).   (34)

This expression is a sum of |Φ_{m−k,n−r}| terms, which is exponential in m whenever n is not constant. Using properties of the permanent [55], we show that the expression in Eq. (34) may actually be rewritten as a sum over fewer terms, which constitutes our main technical result:

Lemma 1. Let r ∈ N.
The inner product of two (not normalised) output states |ψ_p⟩ and |ψ_q⟩ of m-mode interferometers U^p and V^q with adaptive measurement outcomes p, q ∈ Φ_{k,r} is given by a sum, over all i, j ∈ {0, 1}^n such that |i| = |j| = r, of products of the permanents of three matrices: an r×r matrix which can be obtained efficiently from U^p, an r×r matrix which can be obtained efficiently from V^q, and an (n−r)×(n−r) matrix which can be obtained efficiently from U^p and V^q.
We give a detailed proof in Appendix C. By Lemma 1, the inner product is expressed as a sum of $\binom{n}{r}^2$ products of three permanents, of square matrices of sizes |p| = r, |q| = r and n − r, respectively. In the worst case, when r = n/2, the sum has at most O(4^n) terms, up to a polynomial factor in n. In particular, when n = O(log m) or r = O(1), the inner product reduces to a sum of a polynomial number of terms, which can all be computed in time O(poly m) with Ryser's formula [52]. An interesting fact is that the cost of computing the inner product does not depend explicitly on the number k > 0 of adaptive measurements. However, it does depend explicitly on r, the total number of photons detected during the adaptive measurements, and a larger number of adaptive measurements k favours the detection of a larger number of photons r. The overlap of the normalised output states is given by

$$\frac{|\langle\psi_p|\psi_q\rangle|^2}{\langle\psi_p|\psi_p\rangle\,\langle\psi_q|\psi_q\rangle},$$

which may also be computed efficiently when n = O(log m). In this case, the classical algorithm for overlap estimation simply computes the above expression, using Lemma 1 for each of the inner products. The running time of this classical algorithm thus is $O\big(\binom{n}{r}^2\,\mathrm{poly}\,m\big)$ and its efficiency is summarised as a function of n and k in Table 3 and as a function of n and r in Table 4.
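The permanents appearing here can be evaluated with Ryser's formula. As an illustration only (the function names are ours, not from the paper), the k = 0 inner product ⟨φ|ψ⟩ = Per((U†V)_n) could be computed directly as follows:

```python
import itertools

import numpy as np


def permanent(A):
    """Permanent of a square matrix via Ryser's formula, O(2^n n^2) time."""
    n = A.shape[0]
    total = 0.0
    # Sum over non-empty column subsets S with sign (-1)^(n - |S|)
    for subset in itertools.product([0, 1], repeat=n):
        cols = [j for j in range(n) if subset[j]]
        if not cols:
            continue
        row_sums = A[:, cols].sum(axis=1)
        total += (-1) ** (n - len(cols)) * np.prod(row_sums)
    return total


def inner_product_k0(U, V, n):
    """<phi|psi> = Per((U^dag V)_n) for the input t = (1^n, 0^(m-n))."""
    W = U.conj().T @ V
    return permanent(W[:n, :n])
```

For U = V the top left submatrix is the identity, so the inner product is Per(I_n) = 1, recovering normalisation.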
Since the quantum efficient regime corresponds to $\binom{k+n}{n} = O(\mathrm{poly}\,m)$, there is a possibility of quantum advantage for overlap estimation when k = O(1) and n = O(m). However, as for probability estimation, the fraction r/n of input photons detected during the adaptive measurements has to be sufficiently large to prevent efficient classical simulation. In this case, when the number of adaptive measurements satisfies k = O(1), the interferometers used must concentrate many photons on the adaptive measurements.
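These binomial regimes are easy to probe numerically. A quick sanity check (ours) that $\binom{k+n}{n}$ stays polynomial in m for k = O(1) and n = O(m), while the classical cost factor $\binom{n}{r}^2$ from Lemma 1 blows up at r = n/2:

```python
from math import comb

# Quantum-efficient regime: k = 1, n = m gives comb(k + n, n) = m + 1, linear in m.
m = 1000
n, k = m, 1
assert comb(k + n, n) == m + 1

# Classical simulation cost factor comb(n, n/2)^2 is already astronomical at n = 60,
# consistent with the O(4^n) worst case stated above.
assert comb(60, 30) ** 2 > 10 ** 33
```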

Conclusion
In this work, we have given a roadmap for performing quantum variational classification and quantum kernel estimation using adaptive linear optical interferometers. We have investigated the transition of classical simulability between Boson Sampling [4] and the Knill–Laflamme–Milburn scheme for universal quantum computing [31], in terms of the number of adaptive measurements performed. In particular, we have derived classical algorithms for simulating the quantum computational subroutines involved: output probability estimation and output state overlap estimation.
In the case of probability estimation, the possible regimes for quantum advantage are incompatible with near-term implementations: both the number of adaptive measurements k and the number of input photons n must be greater than log m, where m is the number of modes. On the other hand, for overlap estimation, there is a possibility of near-term computational advantage using adaptive linear optics with a single adaptive measurement, requiring the preparation of photon number states. Note that the interferometer must concentrate many photons r at the stage of the adaptive measurements in order to obtain overlaps that are possibly hard to estimate. Using more adaptive measurements does not increase the complexity (apart from polynomial factors in m), but may ease the detection of a larger number of photons during the adaptive measurements.
Our results suggest regimes where quantum advantage for machine learning with adaptive linear optics is possible, namely the parameter regimes where our classical simulation algorithms fail to be efficient: it is an interesting open question whether better classical simulation algorithms for the computational subroutines involved can be found, or even classical algorithms solving the machine learning problems directly and efficiently. In any case, our results restrict the parameter regimes for which such a quantum advantage may be possible: a support vector machine with adaptive linear optics has to operate either in a bunching regime, concentrating many photons in the adaptive measurements, or using a number of adaptive measurements scaling with the size of the problem. In both cases, this imposes strong experimental requirements.
Our results have identified a sweet spot for quantum kernel estimation with adaptive linear optics using a single adaptive measurement, which would be interesting to demonstrate experimentally.In the longer term, variational classification with adaptive linear optics could also be interesting since it may enable quantum advantage in a regime where the quantum algorithm for overlap estimation is no longer efficient.
As previously mentioned, it is a pressing question whether more efficient classical algorithms may be derived. In practical settings, taking into account photon losses could help provide more efficient classical simulation algorithms [56]. We leave these considerations for future work.
This algorithm has been refined in the case of matrices with repeated rows or repeated columns:

Lemma 3 ([54]). Let B be an m × n matrix, let q = (q_1, …, q_m) ∈ Φ_{m,n} and let A = B_{q,1_n} be the n × n matrix obtained from B by repeating q_i times its i-th row. Then Per A may be estimated classically with additive precision

$$\pm\,\epsilon \cdot \frac{q_1!\cdots q_m!}{\sqrt{q_1^{q_1}\cdots q_m^{q_m}}}\,\|B\|^n$$

with high probability, in time O(mn²/ε²), where ‖B‖ is the largest singular value of B.

Moreover, this lemma also allows us to approximate |Per A|² efficiently. Indeed, let 0 < ε < 1 and let z be an estimate of Per A with additive error $\pm\,\epsilon \cdot \frac{q_1!\cdots q_m!}{\sqrt{q_1^{q_1}\cdots q_m^{q_m}}}$; then |z|² estimates |Per A|² with an additive error of the same form up to a constant factor, where we used Eq. (45) in the third line.
In particular, when B is an m×n submatrix of a unitary matrix (implying ‖B‖ ≤ 1), we may compute an estimate of |Per A|², where A = B_{q,1_n} is the n × n matrix obtained from B by repeating q_i times its i-th row, with additive precision

$$\pm\,\epsilon \cdot \frac{q_1!^2\cdots q_m!^2}{q_1^{q_1}\cdots q_m^{q_m}}$$

with high probability, in time O(mn²/ε²). The matrices in Eq. (44) are submatrices of unitary matrices, with repeated rows. Hence, estimating independently all the terms in the sum in Eq. (44) yields an additive estimate of the probability, where we used that n! ≤ n^n for all n ∈ N. Note that a tighter bound may be obtained, but the above one is sufficient for our needs. In particular, when |Φ_{k,r}| = O(poly m), the above procedure provides a polynomially precise additive estimate of the probability Pr^{final}_{m,n}[s] in time O(poly m), with high probability.
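The estimation algorithm behind Lemma 3 builds on Gurvits's randomized permanent estimator. Below is a bare-bones sketch of the unrefined version, without repeated rows (our own illustration, not the refined algorithm of [54]): each sample has expectation exactly Per A and magnitude at most ‖A‖^n by the AM–GM inequality, so averaging O(1/ε²) samples gives an additive error ±ε‖A‖^n.

```python
import itertools

import numpy as np


def gurvits_sample(A, x):
    """One Glynn/Gurvits sample: E_x[sample] = Per(A) for uniform x in {-1, 1}^n."""
    return np.prod(A @ x) * np.prod(x)


def estimate_permanent(A, num_samples=20000, seed=0):
    """Monte Carlo estimate of Per(A), additive error O(||A||^n / sqrt(num_samples))."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    samples = [gurvits_sample(A, rng.choice([-1.0, 1.0], size=n))
               for _ in range(num_samples)]
    return np.mean(samples)
```

Averaging over all 2^n sign vectors recovers Per(A) exactly, which gives an easy correctness check for small n.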
We have $|\Phi_{k,r}| = \binom{k+r-1}{r}$. This quantity is polynomial in k (resp. in r) when r = O(1) (resp. k = O(1)). Moreover, the partial trace in the corresponding expression is over the first k modes, where |p⟩ denotes the k-mode Fock state |p_1 … p_k⟩. At the end of the computation, all the remaining m − k modes are measured with photon-number detection, yielding the final outcome s = (s_1, …, s_{m−k}) ∈ Φ_{m−k,n−r}.

$\binom{k+n}{n} = O(\mathrm{poly}\,m)$, which is the case for example when n = O(1) and k = O(m), or n = O(log m) and k = O(log m), or n = O(m) and k = O(1).

Algorithm 2: Explicit method: training phase

Input: Training dataset T, feature maps U_φ, optimisation routine with cost function J(θ).
Parameters: Input |t⟩, number of shots T, initial parameters θ_0.

Set θ = θ_0.
while J(θ) not converged do
    for l = 1 to |T| do
        Set c_{p_l} = 0. Set c_y = 0 for all y ∈ C.
        while c_{p_l} < T do
            Run BS(θ) U_l^{p_l} on |t⟩, obtaining adaptive measurement outcome r and outcome label y.
            if r = p_l then
                c_{p_l} → c_{p_l} + 1
                c_y → c_y + 1
            end if
        end while
        Compute Pr(y|p_l) = c_y / T.
    end for
    Compute J(θ) and update θ.
end while
return Value J(θ*) and final θ*.
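The post-selected counting loop at the heart of Algorithm 2 can be mimicked classically by substituting a stub for the quantum device. The sketch below (all names are ours) only illustrates the control flow of estimating Pr(y | p_l) from T accepted shots:

```python
import random


def estimate_label_probs(sample_fn, p_l, shots, labels=(-1, 1)):
    """Keep sampling until `shots` runs yield the adaptive outcome p_l,
    counting the observed labels along the way (post-selection)."""
    counts = {y: 0 for y in labels}
    accepted = 0
    while accepted < shots:
        r, y = sample_fn()  # stands in for running BS(theta) U_l^{p_l} on |t>
        if r == p_l:
            accepted += 1
            counts[y] += 1
    return {y: c / shots for y, c in counts.items()}


# Toy stub: adaptive outcome uniform in {0, 1}, label +1 with probability 0.8.
rng = random.Random(0)
stub = lambda: (rng.randrange(2), 1 if rng.random() < 0.8 else -1)
probs = estimate_label_probs(stub, p_l=0, shots=500)
```

Note that runs with the wrong adaptive outcome are discarded, so the expected number of device runs is T divided by the post-selection probability.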

Algorithm 3: Explicit method: prediction phase

Input: Unlabeled data x, feature map U_{φ(x)}, optimal parameters θ*.
Parameters: Input |t⟩, number of shots T_x.

Set c_{p_x} = 0. Set c_y = 0 for all y ∈ C.
while c_{p_x} < T_x do
    Run BS(θ*) U_{φ(x)} on |t⟩, obtaining adaptive measurement outcome r and outcome label y.
    if r = p_x then
        c_{p_x} → c_{p_x} + 1
        c_y → c_y + 1
    end if
end while
Compute Pr(y|p_x) = c_y / T_x.
return argmax_y {Pr(y|p_x)}.

…in terms of projectors as N = Π_{+1} − Π_{−1}, where

Algorithm 4: Implicit method: training phase

Input: Dataset T, feature maps U, quadratic programming solver.
Parameters: Input |t⟩ and number of shots T.

for l = 1 to |T| do
    for l' = 1 to |T| do
        Run (U_l^{p_l})^{⊗T} on input |t⟩^{⊗T}. Run Algo. 1 with input U_l^{p_l}, U^{p_l}

Algorithm 5: Implicit method: prediction phase

Input: Dataset T, unlabeled data x, optimal parameters {α*_l} and b.
Parameters: Input |t⟩ and number of shots T.

Set the variable f = b.
for l = 1 to |T| do
    Run (U^{p_x})^{⊗T_x} on input |t⟩^{⊗T}.
    Run Algo. 1 with parameters (U^{p_x})^{⊗T_x} |t⟩^{⊗T}, U_l^{p_l}.
    f → f + α_l y_l κ(x_l, x)
end for
return sign(f).
The time complexity of the classical simulation is $O\big(\binom{k+r-1}{r}\,\mathrm{poly}\,m\big)$ and there is a possibility of subuniversal quantum advantage for probability estimation for n = O(log m) and k = O(m), or n = O(m) and k = O(log m). Moreover, the fraction r/n of input photons detected during the adaptive measurements has to be sufficiently large to prevent efficient classical simulation.
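The combinatorial factor $\binom{k+r-1}{r} = |\Phi_{k,r}|$ in this cost counts the ways of placing r photons in k modes, and its two polynomial regimes can be checked directly (a quick sanity check of ours):

```python
from math import comb


def classical_sim_cost_factor(k, r):
    """|Phi_{k,r}| = comb(k + r - 1, r): ways to place r photons in k modes."""
    return comb(k + r - 1, r)


# Polynomial in k when r is constant, and polynomial in r when k is constant.
assert classical_sim_cost_factor(100, 2) == 5050  # ~ k^2 / 2 for r = 2
assert classical_sim_cost_factor(2, 100) == 101   # r + 1 for k = 2
```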
T, with |T| points of the form {(x_1, y_1), …, (x_{|T|}, y_{|T|})}, where x_l ∈ R^d and y_l ∈ C for all l ∈ {1, …, |T|}, and where C = {−1, 1} in the case of binary classification.

Algorithm 1: Overlap estimation

Input: Adaptive interferometers U = {U^r | r ∈ Φ_{k,r}, 0 ≤ r ≤ n}, adaptive measurement outcomes p, q.
Parameters: Input |t⟩ and number of shots T.

Set the state preparation counter c_sp = 0. Set the overlap counter c_over = 0.
while c_sp < T do
    Run U on input |t⟩, obtaining adaptive measurement outcome r and output state |χ⟩.
    if r = p then
        c_sp → c_sp + 1.
        Run U^{q†} on input |q⟩ ⊗ |χ⟩. Measure in the photon number basis. Record measurement outcome s.
        if s = t then c_over → c_over + 1.
    else if r = q then
        c_sp → c_sp + 1.
        Run U^{p†} on input |p⟩ ⊗ |χ⟩. Measure in the photon number basis. Record measurement outcome s.
        if s = t then c_over → c_over + 1.
    end if
end while
return c_over / T.
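Algorithm 1 outputs the empirical frequency c_over/T, whose additive error scales as 1/√T. A classical toy model of this shot noise (the numbers below are arbitrary illustrations, not from the paper):

```python
import random


def simulated_overlap_estimate(true_overlap, shots, seed=0):
    """Emulate Algorithm 1's counter: each of the T accepted runs records a
    success with probability equal to the true overlap."""
    rng = random.Random(seed)
    c_over = sum(rng.random() < true_overlap for _ in range(shots))
    return c_over / shots
```

For instance, with T = 10^4 shots and a true overlap of 0.3, the statistical error is around √(0.3 · 0.7 / 10^4) ≈ 0.005.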

Table 2 :
Simulability regimes for probability estimation as a function of the parameters r (the total number of photons detected during the adaptive measurements) and k (the number of single-mode adaptive measurements).In green is the parameter regime for which the classical probability estimation is efficient, i.e., takes polynomial time in m, while in red is the regime where it is no longer efficient.

Table 3 :
Simulability regimes for classical overlap estimation as a function of the parameters n and k. In green is the parameter regime for which the classical overlap estimation may be efficient (i.e., may take polynomial time in m) depending on the value of r, and in red is the regime where the classical algorithm is no longer efficient. The cells containing the symbol "⋯" correspond to parameter regimes for which the quantum algorithm for estimating the overlap is no longer efficient.