Ancilla-free continuous-variable SWAP test

We propose a continuous-variable (CV) SWAP test that requires no ancilla register, thereby generalizing the ancilla-free SWAP test for qubits. In this ancilla-free CV SWAP test, the computational basis measurement is replaced by photon number-resolving measurement, and we calculate an upper bound on the error of the overlap estimate obtained from a finite Fock cutoff in the detector. As an example, we show that estimation of the overlap of pure, centered, single-mode Gaussian states of energy $E$ and squeezed in opposite quadratures can be obtained to error $\epsilon$ using photon statistics below a Fock basis cutoff $O(E \ln \epsilon^{-1})$. This cutoff is greatly reduced to $E + O(\sqrt{E} \ln \epsilon^{-1})$ when the states have rapidly decaying Fock tails, such as coherent states. We show how the ancilla-free CV SWAP test can be extended to many modes and applied to quantum algorithms such as variational compiling and entanglement spectroscopy in the CV setting. For the latter we also provide a new algorithm which does not have an analog in qubit systems. The ancilla-free CV SWAP test is implemented on Xanadu's 8-mode photonic processor in order to estimate the vacuum probability of a two-mode squeezed state.


Introduction
Proposals for implementing quantum algorithms using photonic or continuous-variable (CV) quantum systems are motivated by the fact that CV systems are candidates for universal, fault-tolerant quantum computation [1][2][3][4][5]. Quantum algorithms such as Deutsch-Jozsa [6,7] and Grover search [8] have counterparts in CV systems that make use of a linear optical implementation of the quantum Fourier transform (which is just a π/2 phase shift), and finite-precision homodyne measurements.
The SWAP test is a basic quantum algorithm for estimation of the pure state overlap $|\langle\psi|\phi\rangle|^2$ ($\mathrm{tr}(\rho\sigma)$ for mixed states $\sigma$ and $\rho$), originally introduced in the context of quantum fingerprinting [9]. In its original formulation, the algorithm requires 1. a tripartite register $ABC$, 2. a Hadamard gate on $C$, followed by a controlled-SWAP gate acting on $AB$ and controlled on $C$ that maps $|\psi\rangle_A|\phi\rangle_B \to |\phi\rangle_A|\psi\rangle_B$ conditional on $|1\rangle_C$, followed by a final Hadamard gate on $C$, 3. computational basis measurement of the "ancilla" register $C$. The quantum circuit is shown in Fig. 1a. The main drawback of this algorithm is the use of the controlled-SWAP gate, which is costly in terms of the number of gates and precludes parallelization. This is the motivation for the ancilla-free algorithms discussed in this paper (which are also, therefore, free of controlled unitaries).
When applied to quantum states of single photons, the success probability of the SWAP test is seen to be equal to the coincidence fraction in the Hong-Ou-Mandel effect [10]. This identification motivated the discovery of an ancilla-free, destructive SWAP test for qubits. It was later shown that this version of the SWAP test can be identified by an algorithm that optimizes quantum circuits with respect to depth, in this case depth two [11]. Applications of the SWAP test and its multi-register generalizations include variational quantum algorithms for estimation of rank, quantum entropies, and quantum Fisher information [12], entanglement spectroscopy and estimation of polynomial functions of quantum states [13][14][15][16], virtual cooling and error mitigation [17,18], and implementing nonlinear transformations of quantum states [19].
In this work, we present a destructive, ancilla-free CV SWAP test that makes use of linear optical operations and photon number-resolving detection. The algorithm is not a generalization of the destructive, ancilla-free SWAP test for qubits, but rather utilizes the fact that eigenvectors of the CV SWAP gate are images of tensor products of Fock states under a 50:50 beamsplitter. We show how, given a desired precision $\epsilon$ of the overlap estimate and the energy $E$ of the input states, one can determine a Fock basis cutoff $M(\epsilon, E)$ for the photon number-resolving measurement that is sufficient to achieve this precision. These methods are illustrated for pairs of equal-energy CV Gaussian states such as squeezed and anti-squeezed vacuum and antipodal coherent states. After providing example applications of the ancilla-free CV SWAP test, our last main result introduces the ancilla-free CV PERM test, which allows one to calculate $\mathrm{tr}\rho^L$, $L \ge 2$, for any CV state $\rho$ by measurement of the cyclic permutation operator.
The structure of the paper is as follows. We conclude the introduction with a primer on multimode CV systems, where we introduce the main mathematical concepts and notation needed for the presentation of the results of this work. In Section 2 we review the ancilla-free destructive SWAP test of Refs. [10,11] for qubits and generalize this algorithm to qudits. In Section 3 we discuss the main result of this paper: an ancilla-free CV SWAP test algorithm. In Section 3.1 we provide a proof-of-principle experimental implementation of this algorithm on Xanadu's X8 device by estimating the vacuum probability of two-mode squeezed states. Section 4 is dedicated to applications. In Section 4.1, we first show that cost functions utilized in variational compiling of CV quantum circuits can be computed by a generalization of the two-mode, ancilla-free CV SWAP test. As a second application, in Section 4.2, we give a CV generalization of the so-called "two-copy test" [14] which is then used to estimate $(\mathrm{tr}\rho^n)^2$ using $2n$ copies of a purification of a generally mixed state $\rho$. In Section 4.3 we give a new algorithm for calculating $\mathrm{tr}(\rho^n)$, which directly uses $n$ copies of $\rho$. Both the two-copy test of Section 4.2 and the algorithm of Section 4.3 have advantages over the conventional Hadamard test for entanglement spectroscopy. In Section 4.4 we combine the discrete-variable (DV), ancilla-free SWAP test of Section 2 and the CV, ancilla-free SWAP test of Section 3 to give an ancilla-free SWAP test valid for hybrid DV-CV states.

Primer on multimode continuous-variable systems
Let $L^2(\mathbb{R})$ (the set of square integrable measurable functions on $\mathbb{R}$) have orthonormal basis given by the eigenfunctions of a quantum harmonic oscillator of unit frequency and unit mass. We define an $M$-mode continuous-variable system as $L^2(\mathbb{R}^M) \cong \ell^2(\mathbb{C})^{\otimes M}$, where $\ell^2(\mathbb{C})$ is the set of square summable sequences of complex numbers. This is a representation of the canonical commutation relation algebra corresponds to the Fock state $|n_1\rangle_1 \otimes \cdots \otimes |n_M\rangle_M$.
A distinguished set of CV states are the Gaussian states, characterized by Wigner functions that are Gaussians on phase space $\mathbb{R}^{2M}$ [21]. In this work we make use of three unitary operations that map the set of Gaussian states to itself: with $\alpha, z \in \mathbb{C}$ and $\theta \in [0, \pi]$, $\phi \in [0, 2\pi)$. We will also utilize phase space rotations, which take the form $\prod_{j=1}^{M} e^{-i\phi_j a_j^\dagger a_j}$. In Section 3, we utilize two-mode squeezed states, defined by and in Section 3 we utilize CV coherent states, i.e., states of the form $|\alpha\rangle := D(\alpha)|0\rangle$.
In this work, we consider CV systems to be read out by photon number-resolving measurement. This is a projective measurement with outcome indexed by $\mathbb{Z}_{\ge 0}^{\times M}$, corresponding to an $M$-mode Fock state.

Ancilla-free, destructive discrete variable (DV) SWAP test
It is well-known that the expected value of SWAP (considered as an observable) in a product state on two registers is given by the trace of the product of the density operators on both registers [22], i.e., $\mathrm{tr}(\mathrm{SWAP}\,\rho\otimes\sigma) = \mathrm{tr}(\rho\sigma)$ (3).
It suffices to know how to implement SWAP between two registers since SWAP between multiple registers is defined as the tensor product of the former. We first discuss the destructive, ancilla-free SWAP test algorithm for qubits, for which we know an efficient quantum circuit in terms of elementary gates. For two qubits, the eigenvectors of SWAP are given by the Bell basis $|\Phi_{i,j}\rangle$ with respective eigenvalues $\lambda_{i,j} = (-1)^{ij}$. A measurement in the Bell basis can be implemented by a computational basis measurement preceded by a quantum circuit $C^\dagger$, where $C$ satisfies $C|i\rangle_A|j\rangle_B = |\Phi_{i,j}\rangle$, such as $C = \mathrm{CNOT}_{BA} H_B$ (see Fig. 1b).
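The qubit statement above can be checked directly on 4x4 matrices. The following is a minimal sketch, assuming the basis ordering $|i\rangle_A|j\rangle_B \mapsto$ index $2i + j$ and reading $\mathrm{CNOT}_{BA}$ as "control $B$, target $A$"; the helper names are illustrative and not taken from the paper.

```python
import math

def apply(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[c] * vector[c] for c in range(len(vector))) for row in matrix]

def permutation_matrix(image):
    """Matrix sending basis vector number col to basis vector number image[col]."""
    size = len(image)
    mat = [[0.0] * size for _ in range(size)]
    for col, row in enumerate(image):
        mat[row][col] = 1.0
    return mat

s = 1 / math.sqrt(2)
# Identity on A, Hadamard on B, in the ordering |i>_A |j>_B -> 2*i + j (assumed).
HADAMARD_B = [[s, s, 0, 0], [s, -s, 0, 0], [0, 0, s, s], [0, 0, s, -s]]
# CNOT with control B and target A: |i, j> -> |i XOR j, j>.
CNOT_BA = permutation_matrix([0, 3, 2, 1])
# SWAP: |i, j> -> |j, i>.
SWAP = permutation_matrix([0, 2, 1, 3])

def bell_circuit_column(i, j):
    """Return C|i>_A|j>_B for C = CNOT_BA H_B (Hadamard applied first)."""
    basis = [0.0] * 4
    basis[2 * i + j] = 1.0
    return apply(CNOT_BA, apply(HADAMARD_B, basis))

def swap_eigenvalue(i, j):
    """Eigenvalue of SWAP on C|i>|j>; raises if it is not an eigenvector."""
    v = bell_circuit_column(i, j)
    w = apply(SWAP, v)
    for lam in (1, -1):
        if all(abs(w[k] - lam * v[k]) < 1e-12 for k in range(4)):
            return lam
    raise ValueError("not a SWAP eigenvector")
```

Under these conventions only the outcome $(i, j) = (1, 1)$, the singlet, carries eigenvalue $-1$, in agreement with $\lambda_{i,j} = (-1)^{ij}$.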
For $\rho$ and $\sigma$ generally mixed states of $K$ qubits, the expectation value in (3) can be estimated by preparing the two states, measuring pairs of qubits in each system in the Bell basis, and repeating this $S$ times. The following is then an unbiased estimator of the overlap $\mathrm{tr}\rho\sigma$: $\frac{1}{S}\sum_{s=1}^{S}\prod_{k=1}^{K}(-1)^{i_k(s) j_k(s)}$ (5), where $i_k(s)$, $j_k(s)$ are the outcomes of the Bell measurements of the $k$-th pair of qubits in the $s$-th run of the experiment. The Berry-Esseen theorem implies that the difference between the distribution of the estimator and the normal distribution N (trρσ, [23].
The appropriate generalization of the Bell basis measurement to the case of qudit registers $AB$ is the projective measurement defined by the eigenvectors of SWAP. These can be constructed using the fact that for a bipartite system $AB$ with $\dim A = \dim B = d$, $\mathrm{SWAP} = P_+ - P_-$, where $P_+$ ($P_-$) is the projector to the symmetric (antisymmetric) subspace of the Hilbert space $AB$. Since the spectrum of SWAP is highly degenerate, there are many unitaries that diagonalize it. One such unitary is given by (6), which generates the SWAP eigenvectors having respective eigenvalues $\lambda_{i,j} = (-1)^{\mathbb{1}(i>j)}$, where $\mathbb{1}(i > j)$ is the indicator function of the $i > j$ subset of $\mathbb{Z}_d^{\times 2}$. Therefore, the circuit in Fig. 1c allows one to estimate (3) for a system of qudits.
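As an illustration of the qubit estimator described above, the sketch below averages the product of eigenvalues $(-1)^{i_k j_k}$ over runs, and also evaluates the exact singlet probability for two single-qubit pure states. The function names and the data layout (a list of runs, each a list of $K$ outcome pairs) are assumptions made for this example.

```python
def dv_swap_estimate(shots):
    """Average over runs of prod_k (-1)^(i_k * j_k).

    shots: list of runs; each run is a list of K Bell outcomes (i_k, j_k).
    """
    total = 0
    for run in shots:
        sign = 1
        for i, j in run:
            if i == 1 and j == 1:  # only the singlet outcome has eigenvalue -1
                sign = -sign
        total += sign
    return total / len(shots)

def singlet_probability(psi, phi):
    """Probability of the singlet outcome for |psi>|phi>, single-qubit pure states.

    psi and phi are pairs of amplitudes (c0, c1), assumed normalized.
    """
    amplitude = (psi[0] * phi[1] - psi[1] * phi[0]) / 2 ** 0.5
    return abs(amplitude) ** 2

def exact_overlap_from_bell_statistics(psi, phi):
    """Expected value of the K = 1 estimator: 1 - 2 * P(singlet)."""
    return 1 - 2 * singlet_probability(psi, phi)
```

For example, with $|\psi\rangle = |0\rangle$ and $|\phi\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ the singlet probability is 1/4, so the expected estimator value is 1/2, which is the overlap.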
Unitary $V$ of Eq. (6) is one example that is easy to specify on paper but might not be the easiest to implement in terms of a given gate set. For example, for $d = 2$, $V$ does not correspond to the unitary of Fig. 1b and its quantum circuit has two CNOTs as opposed to one. To the best of our knowledge, short-depth gate decompositions of any $V^\dagger$ using a standard gate set are not known for $d > 2$, and it is an open question whether constant depth circuits for this task exist analogous to the qubit and CV cases. We note that, unlike qubits, for $d > 2$, any orthonormal basis consisting of eigenvectors of SWAP does not coincide exactly with the set of qudit Bell states $|\Phi_{z,x}\rangle$ defined by where $(z, x) \in \mathbb{Z}_d^{\times 2}$. In Appendix A we show how SWAP eigenvectors can be formed as superpositions of pairs of qudit Bell states. However, this fact by itself does not lead to an efficient gate decomposition.

Ancilla-free, destructive continuous variable (CV) SWAP test
In principle, the DV operator $V$ can be extended to a countably infinite dimensional Hilbert space $\mathcal{H} \otimes \mathcal{H}$ of two quantum harmonic oscillators (i.e., two CV registers). However, the computational basis states $|j\rangle$ correspond to Fock states, so the action of $V^\dagger$ on, e.g., a tensor product of CV coherent states, would produce optically non-classical, non-Gaussian states. In particular, $V^\dagger$ would not be implementable on a near-term photonic processor such as Xanadu's X8, which produces non-classical, Gaussian states [24]. In this section, we seek an ancilla-free CV SWAP test that requires only linear optical operations and photon number-resolving measurement.
The SWAP gate for continuous variables can be written as in (8). As in the case of SWAP on a tensor product of isomorphic finite dimensional Hilbert spaces, (8) is both unitary and self-adjoint. A generalized version of SWAP has been experimentally implemented for two-mode CV systems in a circuit quantum electrodynamics framework [25]. A proposal for implementing the ancilla-full CV SWAP test using a fault-tolerant ancilla prepared in a Kerr cat logical qubit appears in Ref. [26].
To prove (8), one verifies that the unitary has the correct action on the canonical operators, $\mathrm{SWAP}\, a^\dagger_{1(2)}\, \mathrm{SWAP}^\dagger = a^\dagger_{2(1)}$. One can see that the SWAP gate is a linear optical unitary (i.e., its generator is quadratic in the creation and annihilation operators and commutes with the total photon number $\sum_{j=1,2} a_j^\dagger a_j$), and can therefore be written as a composition of 50:50 beamsplitters and local phase shifters according to the rectangular decomposition [27]. For example, defining the beamsplitter as in (1), one gets (9). Note that $U_{BS}(\frac{\pi}{4}, 0)|n\rangle_1|m\rangle_2$ is an eigenvector of SWAP with eigenvalue $(-1)^n$, and that these form a complete set of eigenvectors. Thus we can estimate the expectation value of SWAP using the circuit in Fig. 1d, in which only the photon count $n$ in the first register contributes to the overlap estimator. Note that this is inherently different from the particular DV SWAP test algorithm discussed in the previous section, where the eigenvalue of SWAP is determined by the measurement outcomes on both registers. One might try to reverse engineer an ancilla-free DV SWAP test algorithm for qudits based on the present CV algorithm. Alas, this is not possible because the beamsplitter unitary maps some states in the qudit subspace outside this subspace. This demonstrates that calculating some functions of qudit states might be much easier when embedded in a CV system, due to the fact that the state is occasionally allowed to leave the subspace associated with the qudit.
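For coherent states the parity rule above can be checked in closed form. The sketch below assumes a beamsplitter phase convention in which the first output register carries the amplitude $(\alpha - \beta)/\sqrt{2}$ when $|\alpha\rangle|\beta\rangle$ is the input (the paper's convention in (1) may differ by a relabeling of outputs), so that the count $n$ is Poisson distributed with mean $|\alpha - \beta|^2/2$. The truncation in $n$ alone is used only to illustrate convergence; it is not the paper's total-photon-number cutoff.

```python
import math

def coherent_overlap(alpha, beta):
    """|<alpha|beta>|^2 for two coherent states."""
    return math.exp(-abs(alpha - beta) ** 2)

def parity_mean_truncated(alpha, beta, n_max):
    """Sum over n <= n_max of (-1)^n * Poisson(n; |alpha - beta|^2 / 2).

    Assumes the first output register of the 50:50 beamsplitter holds a
    coherent state of amplitude (alpha - beta) / sqrt(2).
    """
    mu = abs(alpha - beta) ** 2 / 2
    total = 0.0
    pmf = math.exp(-mu)  # Poisson probability of n = 0
    for n in range(n_max + 1):
        total += pmf if n % 2 == 0 else -pmf
        pmf *= mu / (n + 1)  # recurrence to the probability of n + 1
    return total
```

The full sum equals $e^{-\mu} e^{-\mu} = e^{-|\alpha - \beta|^2}$ with $\mu = |\alpha - \beta|^2/2$, which is the overlap; for antipodal states $\alpha = -\beta$ with $|\alpha|^2 = E$ this is $e^{-4E}$.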
Further, SWAP in (9) commutes with the projection $Q_{2M}$ onto the subspace spanned by $\{|n\rangle_A|m\rangle_B : n + m \le 2M\}$ and, as a consequence, also commutes with its complement. Because the beamsplitter preserves the total photon number, the set $\{U_{BS}(\frac{\pi}{4}, 0)|n\rangle_A|m\rangle_B : n + m \le 2M\}$ is therefore an alternative orthonormal basis for the subspace $Q_{2M}\mathcal{H} \otimes \mathcal{H}$ associated with $Q_{2M}$. We define two operators that implement SWAP on the $Q_{2M}\mathcal{H} \otimes \mathcal{H}$ subspace and its complement via (10), which satisfy $\mathrm{SWAP} = \mathrm{SWAP}_{2M} + \mathrm{SWAP}^C_{2M}$. Consider states $\rho$ and $\sigma$ such that $\mathrm{supp}\, \rho \otimes \sigma$ is contained in $Q_{2M}\mathcal{H} \otimes \mathcal{H}$. Then $\mathrm{tr}(\mathrm{SWAP}_{2M}\, \rho \otimes \sigma) = \mathrm{tr}(\rho\sigma)$ (11). Therefore, photon number-resolving detection with a finite experimental threshold on the total number of photons can be used to estimate the expectation of SWAP in $\rho \otimes \sigma$, granted all photon numbers above the threshold have zero probability. In an experiment, if the input states have local Fock cutoff $M'$, the local photon number-resolving detectors should have a threshold of at least $2M'$ photons, i.e., (11) holds for $M \ge M'$. Thus $2M'$ is the minimal photon detector threshold that allows one to apply the ancilla-free CV SWAP test for input states with local Fock cutoff $M'$. This fact provides an opportunity to concretely illustrate how the ancilla-free CV SWAP test takes into account amplitudes on all $2M'$ Fock basis states to estimate the overlap. Consider $M' = 1$ and the task of estimating the overlap of single photon states $|\psi\rangle = \alpha_1|0\rangle + \beta_1|1\rangle$ and $|\phi\rangle = \alpha_2|0\rangle + \beta_2|1\rangle$ with our algorithm. The overlap is a homogeneous polynomial of order 4 in $\alpha_1, \bar{\alpha}_1, \alpha_2, \bar{\alpha}_2, \beta_1, \bar{\beta}_1, \beta_2, \bar{\beta}_2$, and this polynomial contains the term $|\beta_1|^2|\beta_2|^2$ whenever both $\beta_1$ and $\beta_2$ are nonzero. But writing out $U_{BS}(\pi/4, 0)^\dagger|\psi\rangle|\phi\rangle$, one sees that there is no way to get this term without registering both the $|0\rangle|2\rangle$ and $|2\rangle|0\rangle$ photon count outcomes with $+1$ weight. Physically, these outcomes can be considered as "Hong-Ou-Mandel" contributions to the overlap.
More generally, when $\rho$ and $\sigma$ have support on Fock states up to $|M'\rangle$, one needs to take $M \ge M'$ in the expression (11) so that all photon interferences arising from $U_{BS}(\frac{\pi}{4}, 0)^\dagger$ contribute to the overlap estimate. To rigorously extend the above argument to CV states $\rho \otimes \sigma$ with arbitrary support requires some analysis in separable Hilbert space. For instance, the photon number operator needs to have its domain defined, its spectral theorem should be stated, etc. We will forego the formalities, simply noting that $\{|n\rangle\}_{n=0}^{\infty}$ is a countable orthonormal basis for a CV mode $\ell^2(\mathbb{C})$ as discussed in Section 1.1. Consider pure CV states $|\psi\rangle$ and $|\phi\rangle$. Their respective amplitudes in the above orthonormal basis are square summable by the Parseval identity, so where $q_{2M} := \langle\psi|\langle\phi|Q_{2M}|\psi\rangle|\phi\rangle$ allows one to define the respective normalization factors. Then where the second line uses the fact that Using (11), the result (12) is extended by bilinearity (in the tensor factors $A$ and $B$) to normal states [28] $\rho_A \otimes \sigma_B$ of two CV modes. The error due to Fock cutoff can be bounded as: where $q^\rho_M = \mathrm{tr}(\rho \sum_{n=0}^{M} |n\rangle\langle n|)$ and similarly for $q^\sigma_M$. Again from the Parseval identity, it follows that the left hand side of (13) can be made arbitrarily small by taking a large enough value of $M$. Thus the accuracy of the estimate obtained using a photon number-resolving measurement depends on $M$, and the precision depends on the number of shots $S$.
Consider the task of estimating $\mathrm{tr}(\rho\sigma)$ with some error $\epsilon$ using an experimental setup whose photon detectors on each register have a threshold given by $2M$. We perform the procedure shown in Fig. 1d and obtain one photon count on each register: $n$ and $m$. We assign the value 0 to all realizations for which $n + m > 2M$. Note that for all values $n + m \le 2M$ the photon detectors are not saturated. To these instances we assign the value $(-1)^n$ and compute the following estimator: $\frac{1}{S}\sum_{s=1}^{S}(-1)^{n(s)}\,\Theta(2M - n(s) - m(s))$ (15), where $\Theta$ is the Heaviside step function with $\Theta(0) = 1$. Note that the Fock basis plays the role of the computational basis in the ancilla-free DV SWAP test of Section 2, although not all photon number counts are utilized in the estimator. Estimator (15) is the generalization of (5) with $K = 1$ to the CV setting with the device cutoff taken into account. This is an unbiased estimator of the expectation value of the $\mathrm{SWAP}_{2M}$ operator. Note that the expectation value of a quantum observable can be estimated with standard deviation $\delta$ by perfectly measuring the observable $O(1/\delta^2)$ times. The total error of the estimator, $|\mathrm{tr}\rho\sigma - \widehat{\mathrm{tr}(\mathrm{SWAP}_{2M}\, \rho \otimes \sigma)}|$, can be upper bounded by the sum of the systematic error $|\mathrm{tr}\rho\sigma - \mathrm{tr}\,\mathrm{SWAP}_{2M}(\rho \otimes \sigma)|$ and the statistical error $|\widehat{\mathrm{tr}(\mathrm{SWAP}_{2M}\, \rho \otimes \sigma)} - \mathrm{tr}\,\mathrm{SWAP}_{2M}(\rho \otimes \sigma)|$.
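The post-processing rule just described (value $(-1)^n$ when $n + m \le 2M$, value 0 otherwise, averaged over $S$ shots) can be sketched as follows. The coherent-state shot simulator is an illustrative assumption: it takes the two outputs of the 50:50 beamsplitter to be independent Poisson variables with means $|\alpha - \beta|^2/2$ and $|\alpha + \beta|^2/2$, which depends on a beamsplitter convention not fixed here, and all function names are hypothetical.

```python
import math
import random

def cv_swap_estimate(shots, M):
    """Average of (-1)^n over shots, with shots having n + m > 2M counted as 0.

    shots: list of photon-count pairs (n, m) from the two output registers.
    """
    total = 0
    for n, m in shots:
        if n + m <= 2 * M:  # Heaviside factor with Theta(0) = 1
            total += -1 if n % 2 else 1
    return total / len(shots)

def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_coherent_shots(alpha, beta, num_shots, rng):
    """Sample (n, m) for coherent inputs |alpha>|beta> (assumed convention)."""
    mu_minus = abs(alpha - beta) ** 2 / 2
    mu_plus = abs(alpha + beta) ** 2 / 2
    return [(sample_poisson(mu_minus, rng), sample_poisson(mu_plus, rng))
            for _ in range(num_shots)]

if __name__ == "__main__":
    rng = random.Random(1)
    shots = simulate_coherent_shots(1.0, 0.5, 20000, rng)
    print(cv_swap_estimate(shots, M=10), math.exp(-abs(1.0 - 0.5) ** 2))
```

With a threshold far above the mean total photon number, the systematic error is negligible and the remaining deviation from $e^{-|\alpha - \beta|^2}$ is statistical, shrinking as $1/\sqrt{S}$.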
In order to bound the total error by $\epsilon$ it suffices to choose $M$ such that $1 - q_{2M} < \epsilon$ and run the algorithm with $S = O(1/\delta^2)$ shots, where $0 < \delta \le \epsilon - (1 - q_{2M})$. The first condition puts limits on the quality of acceptable photon detectors, whereas the second condition dictates the sampling complexity of the algorithm.
It is useful to carry out a detailed error analysis for some example state pairs. First consider the unitary squeezing operator $S(r) = e^{\frac{r}{2}(a^2 - a^{\dagger 2})}$ [29]. With $|\psi\rangle_A \equiv S(r)|0\rangle_A$, $|\phi\rangle_B = S(-r)|0\rangle_B$, i.e., squeezed and anti-squeezed vacuum, one gets $|\langle\psi|\phi\rangle|^2 = \frac{1}{\cosh 2r}$. Applying the circuit in Fig. 1d produces (16), where $U_{TM}(r) = e^{r(ab - a^\dagger b^\dagger)}$ is the two-mode squeezing operator. The finite Fock cutoff approximation to $|\langle\psi|\phi\rangle|^2$ is given by (17), with error upper bounded by (13). Convergence of (17) to $1/\cosh 2r$ is shown in Fig. 2.
Another example consists of computing $|\langle\alpha|\beta\rangle|^2$, where $|\alpha\rangle$ and $|\beta\rangle$ are any two isoenergetic coherent states of the quantum harmonic oscillator with energy $E$, i.e., $|\alpha|^2 = |\beta|^2 = E$. The total photon number random variable $n + m$ is Poisson distributed with mean $2E$. Therefore, an upper bound on $1 - q_{2M}$ is obtained from an upper bound on the tail probability of the Poisson distribution. From a weak tail bound such as $1 - q_{2M} \le 1 - e^{-E/M}$ ($M > E$), one can only infer that $M = O(E/\epsilon)$ is sufficient to obtain an error $\epsilon$ in the estimate of $|\langle\alpha|\beta\rangle|^2$. This scaling is worse than the $\log \epsilon^{-1}$ that was obtained in the squeezed state example. However, the Chernoff bound gives that for $M > E$, One verifies numerically that for all $E$ and for $\epsilon \in (0, 1)$, taking $M = \frac{13}{10}E + \log \epsilon^{-1}$ results in (19) being less than $\epsilon$. It follows that for a fixed energy $E$, the smallest acceptable Fock cutoff for the CV SWAP test for antipodal coherent states has a milder dependence on the additive error $\epsilon$ compared to the CV SWAP test for squeezed and anti-squeezed states.
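Both examples can be explored numerically. The sketch below rests on two assumptions that are not spelled out in the surviving text: (i) after the circuit of Fig. 1d the squeezed pair becomes a two-mode squeezed vacuum with Fock probabilities $\tanh^{2n} r/\cosh^2 r$ on $|n\rangle|n\rangle$, so the cutoff $n + m \le 2M$ keeps the terms $n \le M$; and (ii) for coherent states the quantity $1 - q_{2M}$ is the exact Poisson tail $P(N > 2M)$ with mean $2E$, which is evaluated here in place of the Chernoff bound. Function names are illustrative.

```python
import math

def squeezed_overlap_truncated(r, M):
    """Parity sum over n <= M for a two-mode squeezed vacuum (assumption (i))."""
    t2 = math.tanh(r) ** 2
    partial = sum((-t2) ** n for n in range(M + 1))
    return partial / math.cosh(r) ** 2

def smallest_squeezed_cutoff(r, eps):
    """Smallest M whose truncated sum is within eps of 1 / cosh(2r)."""
    exact = 1.0 / math.cosh(2 * r)
    M = 0
    while abs(squeezed_overlap_truncated(r, M) - exact) >= eps:
        M += 1
    return M

def poisson_tail(mean, k):
    """P(N > k) for N ~ Poisson(mean), summed directly in the upper tail."""
    total = 0.0
    j = k + 1
    while True:
        term = math.exp(-mean + j * math.log(mean) - math.lgamma(j + 1))
        total += term
        # Past the mean the terms decrease; stop once they no longer matter.
        if j > mean and term < 1e-18 * max(total, 1e-300):
            return total
        j += 1

def coherent_cutoff(E, eps):
    """Cutoff M = (13/10) E + log(1/eps) quoted in the text."""
    return 1.3 * E + math.log(1.0 / eps)

def coherent_tail_beyond_cutoff(E, eps):
    """Exact 1 - q_{2M} for isoenergetic coherent states at the quoted cutoff."""
    threshold = math.floor(2 * coherent_cutoff(E, eps))
    return poisson_tail(2 * E, threshold)
```

Since the exact tail is never larger than its Chernoff bound, a check of `coherent_tail_beyond_cutoff(E, eps) < eps` on a grid of values is a weaker statement than the one in the text, but it is a quick sanity test of the quoted cutoff.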
To obtain the bound in (19) and derive an expression for $M$ that implies that the bound is less than $\epsilon$ requires knowledge of specific results on Poisson tail probabilities. For sufficiently large $E$, a general approach can be implemented that only utilizes the tail of the normal distribution. This approach is based on the fact that the looser bound (14) on the systematic error allows one to obtain an upper bound on the error from the local cumulative Fock distributions of $\rho$ and $\sigma$, respectively. For the case of antipodal coherent states, one first assumes $E$ large enough to justify the normal approximation to the Poisson distribution, viz., (14). This upper bound is less than or equal to $\epsilon$ for $M$ given by (20) for $\epsilon \to 0$, where $c < 1$. In (20), $\Phi^{-1}$ is the quantile function for the standard normal distribution (i.e., the probit function) and $\mathrm{logit}(x) \equiv \ln\frac{x}{1-x}$ on $(0, 1)$.

Experimental demonstration
To provide a proof-of-principle demonstration of the ancilla-free CV SWAP test, it would be ideal to implement an experiment showing convergence of the estimator (15) with different states and energy ranges. However, such an experiment is not presently possible using cloud-based CV processors such as Xanadu's X8 device [24], which restricts the input state, decomposition of linear optical circuit, and photon number-resolving measurement cutoff. In particular, one cannot probe different energy ranges. Further, on the optical chip, a linear optical unitary is always precompiled to the rectangular decomposition, and there is no option to change the decomposition (the optical chip is not reconfigurable). Specifications of the optical chip and photon number-resolving measurement for the X8 can be found in Ref. [24]; for the present purposes we note that the photodetectors have a quantum efficiency above 95% and a local photon number cutoff between 5 and 7.
Putting aside the limitations of the X8 device for doing a full experimental analysis, it is possible to implement the CV SWAP circuit in order to verify that it produces an approximation to the analytical result (recall that the multimode ancilla-free CV SWAP test simply requires applying the beamsplitter $U_{BS}(\frac{\pi}{4}, 0)^\dagger$ to appropriate registers, followed by photon number-resolving measurement). The beamsplitter is implemented on the X8 in a noise-resistant way by specifying the rectangular decomposition (up to a complex multiple of modulus 1) [27,30]. Note that we do not have access to the exact noise channels that affect the output of the device (i.e., the photon number-resolving measurement statistics). The simplest nontrivial fidelity that can be estimated on the X8 is $|\langle 0|\langle 0|\mathrm{TMSS}_r\rangle|^2 = \frac{1}{\cosh^2 r}$, where $|\mathrm{TMSS}_r\rangle = \frac{1}{\cosh r}\sum_{n=0}^{\infty}(-1)^n \tanh^n r\, |n\rangle|n\rangle$ is a two-mode squeezed state. According to the two-mode generalization of (15), an estimate of this overlap is obtained as in (23). Compared to (15), we are considering applying the ancilla-free CV SWAP circuit on two mode pairs, and we further omit the Heaviside step function because we use $r = 1$, so the photon counts never exceed the photon detection threshold. The circuit for computing (23) is shown in Fig. 3a. On the actual X8, the squeezing factor $r$ is constrained to $r \approx 1$. A numerical simulation of this circuit (10 runs, 2000 shots per run, local Fock cutoff 11) gave a mean 0.423 and standard deviation estimate 0.01, close to the analytical value $\frac{1}{\cosh^2 r} \approx 0.420$. Running the circuit of Fig. 3a on the X8 produces a mean 0.4811 with standard deviation estimate 0.065 over 5 runs of $5 \times 10^4$ shots per run.
It is possible to utilize all 8 modes of the X8 processor to estimate in parallel the fidelity (25). The circuit for computing (25) is shown in Fig. 3b. A numerical simulation of this circuit (10 runs, 500 shots per run, local Fock cutoff 10) gave a mean 0.176 and standard deviation estimate 0.03, close to the analytical value $1/\cosh^4(1) \approx 0.1764$. Running the circuit of Fig. 3b on the X8 gives a large systematic difference from the analytical result, producing a mean 0.292 with standard deviation estimate 0.007 over 10 runs of $5 \times 10^4$ shots per run. This systematic error could be due to the noise photons that occur even in modes initialized to vacuum, or from the cumulative effect of photon losses when many of the modes of the X8 are not initialized to vacuum. The following section discusses applications of the ancilla-free CV SWAP test in the context of multimode CV quantum computing.
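The ideal, noise-free statistics of both circuits can be imitated with a small Monte Carlo sketch. It is not the simulation used by the authors; it assumes that each half of a two-mode squeezed state meets a vacuum mode on a 50:50 beamsplitter, that a Fock component $|k\rangle|k\rangle$ occurs with probability $\tanh^{2k} r/\cosh^2 r$, and that a $k$-photon Fock state splits binomially at the beamsplitter. The estimate is the average product of parities $(-1)^n$ of the measured "first" outputs; function names are hypothetical.

```python
import math
import random

def sample_parity_product(r, pairs, rng):
    """One shot: product of (-1)^n over the 2 * pairs parity-carrying outputs."""
    t2 = math.tanh(r) ** 2
    sign = 1
    for _ in range(pairs):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        k = int(math.log(u) / math.log(t2))  # geometric: P(k >= j) = t2 ** j
        for _ in range(2):  # each of the two squeezed modes meets a vacuum mode
            n = sum(rng.random() < 0.5 for _ in range(k))  # binomial split
            if n % 2:
                sign = -sign
    return sign

def estimate_vacuum_probability(r, pairs, num_shots, rng):
    """Average parity product; ideal mean is cosh(r) ** (-2 * pairs)."""
    total = sum(sample_parity_product(r, pairs, rng) for _ in range(num_shots))
    return total / num_shots

if __name__ == "__main__":
    rng = random.Random(0)
    for pairs in (1, 2):
        print(pairs, estimate_vacuum_probability(1.0, pairs, 20000, rng),
              math.cosh(1.0) ** (-2 * pairs))
```

Here `pairs=1` mimics the four-mode circuit of Fig. 3a (ideal value $1/\cosh^2 1 \approx 0.420$) and `pairs=2` the eight-mode circuit of Fig. 3b (ideal value $1/\cosh^4 1 \approx 0.1764$); loss and noise photons, which the text identifies as likely sources of the hardware discrepancy, are not modeled.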

Variational quantum compiling
The CV SWAP test provides an alternative method for computing cost functions that appear in the task of variational quantum compiling of CV circuits [31,32]. Such cost functions have the general form (26), where the training set $T \equiv \{|\psi_j\rangle\}_{j=1}^{K}$ is a set of states of a two-mode CV system $\mathcal{H}_A \otimes \mathcal{H}_R$, and $\theta$ is shorthand for a set of continuous and discrete parameters that is optimized in order to minimize $C_T(\theta)$. Often, the states $|\psi_j\rangle$ are related by an energy-preserving symmetry operation. Parallel computation of the sum in (26) can be obtained by first preparing the Fig. 1d is then applied to all mode pairs $A_j A'_j$ and $R_j R'_j$ (the CV circuit architecture may require that beamsplitters be applied to adjacent modes, in which case the appropriate SWAP gates should be applied). This procedure works because the states are eigenvectors of $\mathrm{SWAP}_{AA'}\mathrm{SWAP}_{RR'}$ with eigenvalue $(-1)^{n+n'}$. Assume for simplicity that for all $j$, |ψ where $Q_{2M_j}$ projects to the subspace with orthonormal basis $\{|n\rangle_{A_j}|m\rangle_{A'_j}|n'\rangle_{R_j}|m'\rangle_{R'_j} : n + m + n' + m' \le 2M_j\}$. Under this assumption, the $j$-th term of the sum in (26) is given by An estimator of this quantity can be constructed analogous to (15) and, for general |ψ (j) in without a finite Fock cutoff. An error analysis similar to the previous section yields lower bounds on the sufficient photon count thresholds $M_j$ needed in order to get an additive error $\epsilon$.

Two-copy test
The CV SWAP test immediately gives a CV generalization of the two-copy test [14]. The latter is based on the observation that for pure states $|\Psi\rangle$, the squared expectation value $|\langle\Psi|U|\Psi\rangle|^2$ for some unitary operator $U$ is equal to the overlap between states $|\Psi\rangle$ and $U|\Psi\rangle$. Note that both real and imaginary parts of $\langle\Psi|U|\Psi\rangle$ can be estimated using the Hadamard test, which also works for mixed states. However, the two-copy test has some advantages. Unlike the Hadamard test, which requires controlled-$U$, the two-copy test only uses $U$. In general, controlled versions of unitaries require more complicated (larger depth) quantum circuits, which for noisy devices can be detrimental. Moreover, a controlled unitary cannot be enacted if one only has black-box access to the unitary [33]. The two-copy test is useful whenever one does not know how to measure in the eigenbasis of $U$, but $U$ is easy to implement. We now describe the two-copy test in the CV setting. For a state $|\Psi\rangle$ on an $n$-mode CV register $C$ and a CV unitary $U$ acting on $C'$, one computes $|\langle\Psi|U|\Psi\rangle|^2$ by preparing $|\Psi\rangle_C \otimes U|\Psi\rangle_{C'}$ with $C' \cong C$, and applying the ancilla-free, 2-mode CV SWAP test (the circuit is Fig. 1d on each mode pair $C_j C'_j$, $j = 1, \ldots, n$). The full circuit has depth 2.
As an example, the CV two-copy test with $U$ a cyclic permutation of many CV modes can be employed for computing $(\mathrm{tr}\rho^n)^2$, $n \ge 2$. This was also a primary application of the DV two-copy test [14]. Specifically, one takes $C$ to be a CV register $C = AB = A_1 B_1 \cdots A_n B_n$ and $U$ to be the CV cyclic permutation $\mathrm{PERM}_A \equiv \prod_{j=1}^{n} \mathrm{SWAP}_{A_j A_{j+1}}$ (where $n + 1$ is taken modulo $n$) on the $n$-mode register $A$ consisting of isomorphic modes $A_1, \ldots, A_n$ ($\rho$ can be considered as a state on $A_j$ for any $j$). We let $|\Psi\rangle_C \equiv \bigotimes_{j=1}^{n} |\psi\rangle_{A_j B_j}$, where $|\psi\rangle$ is a purification of $\rho$. We prepare a second copy $|\Psi\rangle_{C'}$, and the state $|\Psi\rangle_C \otimes |\Psi\rangle_{C'}$ is then subjected to $\mathrm{PERM}_{A'}$, i.e., a cyclic permutation of the subsystem $A$ in the second copy. Finally, the ancilla-free CV SWAP test of Section 3 is carried out on the state $|\Psi\rangle_C \otimes \mathrm{PERM}_{A'}|\Psi\rangle_{C'}$. This algorithm works because analogous to (3) One can see that this CV two-copy test produces an estimate of $(\mathrm{tr}\rho^n)^2$. For $n = 2$, the full circuit for computing (30) is shown in Fig. 4. More generally, for a pure CV state $|\psi\rangle_{C_1 \cdots C_n}$ on $n$ CV modes, the CV two-copy test allows one to compute the quantities $(\mathrm{tr}\rho_\Sigma^n)^2$, where $\Sigma \subset \{C_1, \ldots, C_n\}$. It should be noted that $\mathrm{PERM}_A$ does not need to be implemented as a gate, as it amounts to a reindexing of the registers. This is the advantage of the two-copy test over the Hadamard test, where $\mathrm{PERM}_A$ is controlled on an ancilla and has to be implemented on the device. Nevertheless, one may still have to implement some SWAP gates for the two-copy test in devices with limited connectivity. We note that a constant-depth version of the Hadamard test has been proposed that makes use of constant-depth preparation of ancilla Greenberger-Horne-Zeilinger (GHZ) states and constant-depth implementation of a multiply controlled PERM [34].

Extension of SWAP test and entanglement spectroscopy
We now discuss a novel CV algorithm for calculation of $\mathrm{tr}(\rho^L)$ ($L \ge 2$) that, to the knowledge of the authors, does not have a strict analogue for DV systems. It makes use of the fact that a short depth implementation of a measurement in the orthonormal basis of eigenvectors of PERM is possible in CV systems. To see this, note that similar to the construction of eigenvectors of the CV SWAP gate in Section 3, one can construct a linear optical unitary $U$ such that $U|n_0\rangle_0 \otimes \cdots \otimes |n_{L-1}\rangle_{L-1}$ are eigenvectors of PERM (here we are using an $L$-mode CV system). One just uses the fact that PERM acts on the vector $(a_0^\dagger, \ldots, a_{L-1}^\dagger)$ via a circulant matrix, so defining $U$ such that one finds that the states are eigenvectors of PERM with eigenvalues $\prod_{j=0}^{L-1} e^{\frac{2\pi i j n_j}{L}}$. The relation between the $b_j^\dagger$ and the $a_j^\dagger$ is a discrete Fourier transform on $\{0, \ldots, L-1\}$. The unitary $U$ can be compiled on a linear optical circuit with local phase shifters and nearest-neighbor 50:50 beamsplitters by using the rectangular form [27]. In particular, there is a depth $L$ circuit of beamsplitters and phase shifters that compiles $U$. Such a linear optical operation has also been proposed for implementing an order-$M$ SWAP test without ancillas for single-photon states, which consumes a state of the form $|\phi\rangle \otimes |\psi\rangle^{\otimes M-1}$ and allows one to estimate $|\langle\phi|\psi\rangle|^2$ by postprocessing photon number-resolving detection patterns [35].
Figure 4: Circuit for estimation of $(\mathrm{tr}\rho_A^2)^2$ via two-copy test on a photonic processor that only allows beamsplitting between nearest-neighbor modes. The $\mathrm{SWAP}_{A'_1 A'_2}$ gate is $\mathrm{PERM}_{A'}$, which implements the cyclic permutation. The remaining SWAP gates allow one to implement the beamsplitters according to the assumed nearest-neighbor architecture.
Applying $\mathrm{Ad}_{U^\dagger}$ to the input state $\rho = \rho^{(0)} \otimes \cdots \otimes \rho^{(L-1)}$ allows one to obtain an estimate of $\mathrm{tr}\,\mathrm{PERM}\rho = \mathrm{tr}\prod_{\ell=0}^{L-1}\rho^{(\ell)}$ from the estimator $\frac{1}{S}\sum_{s=1}^{S}\prod_{j=0}^{L-1} e^{\frac{2\pi i j n_j(s)}{L}}$, where $n_j(s)$ is the photon count on the $j$-th register at the $s$-th run of the experiment. We refer to this extension of the ancilla-free SWAP test as the PERM test. The case of $\rho^{(i)} = \rho$ for $i = 0, \ldots, L-1$ corresponds to estimating $\mathrm{tr}\rho^L$, which can be used in determining the entanglement spectrum when $\rho$ is a reduced state of a larger quantum system [14].
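For a product of coherent states the PERM test can be checked in closed form, since linear optics maps coherent states to coherent states. The sketch below assumes the estimator averages the PERM eigenvalue $\prod_j e^{2\pi i j n_j/L}$ over shots and fixes one sign convention for the discrete Fourier transform of the amplitudes; the opposite convention gives the complex conjugate, which corresponds to the reversed cyclic order. All names are illustrative.

```python
import cmath
import math

def coherent_inner(a, b):
    """Inner product <a|b> of two coherent states."""
    a, b = complex(a), complex(b)
    return cmath.exp(-abs(a) ** 2 / 2 - abs(b) ** 2 / 2 + a.conjugate() * b)

def perm_estimate(shots, L):
    """Average over shots of exp(2*pi*i * sum_j j * n_j / L)."""
    total = 0j
    for counts in shots:
        phase = sum(j * n for j, n in enumerate(counts))
        total += cmath.exp(2j * math.pi * phase / L)
    return total / len(shots)

def perm_expectation_coherent(alphas):
    """Exact mean of the PERM eigenvalue for inputs |alpha_0> ... |alpha_{L-1}>.

    Output amplitudes are taken as beta_j = L^(-1/2) sum_k alpha_k omega^(-j k)
    (assumed convention); each output count n_j is then Poisson, and
    E[exp(i * t * n)] = exp(|beta|^2 * (exp(i * t) - 1)).
    """
    L = len(alphas)
    omega = cmath.exp(2j * math.pi / L)
    exponent = 0j
    for j in range(L):
        beta = sum(a * omega ** (-j * k) for k, a in enumerate(alphas)) / math.sqrt(L)
        exponent += abs(beta) ** 2 * (omega ** j - 1)
    return cmath.exp(exponent)

def cyclic_overlap_product(alphas):
    """prod_l <alpha_l|alpha_{l+1}> with indices taken modulo L."""
    L = len(alphas)
    result = 1 + 0j
    for l in range(L):
        result *= coherent_inner(alphas[l], alphas[(l + 1) % L])
    return result
```

For $L = 2$ the eigenvalue reduces to $(-1)^{n_1}$, i.e., the parity rule of the two-mode SWAP test up to the labeling of the output modes.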
To the best of our knowledge, a short-depth (linear in $L$) quantum circuit for diagonalizing the cyclic permutation operator PERM is not known for qubit systems. As a result, the two-copy test has been used for estimating $(\mathrm{tr}\rho_A^L)^2$ from $2L$ copies of a purification $|\psi\rangle_{AB}$ of $\rho_A$ [14,15]. That algorithm can also be used in the CV case using results of Section 4.2. It has the advantage of having constant depth (independent of $L$) but it is inferior to the PERM test in other ways. First, the PERM test does not require purification of the quantum states. Second, the PERM test requires half the number of modes the two-copy test does. And third, the sampling complexity of the two-copy test is larger than that of the PERM test because the former estimates the square of the desired quantity, which is less than one [14]. Finally, we also note that the functions $\mathrm{tr}\rho_A^L$ are used to upper bound the entanglement entropy of a subregion $\Sigma$ of a lattice spin system, and can be numerically estimated in these systems by using quantum Monte Carlo to sample the $\mathrm{SWAP}_\Sigma$ operator [22]. In fact, path integral Monte Carlo methods for sampling $\mathrm{SWAP}_\Sigma$ have been applied to estimate $\mathrm{tr}\rho_\Sigma^2$, $\Sigma \subset \mathbb{R}^3$, for massive particles in the continuum [36].

DV-CV hybrid SWAP test
Combining the DV SWAP test in Section 2 and the CV SWAP test in Section 3, one obtains an ancilla-free, destructive SWAP test that can be applied in hybrid quantum systems, such as cavity QED. In more detail, and specializing the DV system to be a qubit system, let $A$ be a qubit register and $B$ a CV register, and consider the task of estimating $|\langle\phi|\psi\rangle|^2$, where $|\psi\rangle_{AB}$ and $|\phi\rangle_{AB}$ are pure states of the hybrid system. The ancilla-free, destructive hybrid SWAP test proceeds by: 1. preparation of the state $|\psi\rangle_{AB}|\phi\rangle_{A'B'}$, 2. application of the circuit in Fig. 1b to the modes $AA'$ and the circuit in Fig. 1d to the modes $BB'$, 3. computational basis measurement of $AA'$ and photon number-resolving measurement of $BB'$. Combining (5) and (15), the estimator of $\mathrm{tr}\rho\sigma$ is given by $\frac{1}{S}\sum_{s=1}^{S}(-1)^{i_A(s) j_A(s)}(-1)^{n(s)}\,\Theta(2M - n(s) - m(s))$ (34), where $(i_A, j_A)$ is a two-bit word associated with an outcome of a Bell measurement. Eq. (34) has arbitrarily small error for sufficiently large $M$. The proof of this statement follows from noting that if |ψ |φ has support in ( holds exactly, where the SWAP gates are the qubit and CV versions, respectively. One can then follow the analysis of (12) and (13)
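The hybrid post-processing can be sketched by multiplying the qubit sign from the Bell outcome with the CV parity and the cutoff indicator. This is an illustration of how (5) and (15) combine under the conventions of the earlier sections, with a hypothetical shot format `(i_A, j_A, n, m)`; it is not a transcription of the paper's Eq. (34).

```python
def hybrid_swap_estimate(shots, M):
    """Average of (-1)^(i_A * j_A) * (-1)^n, with shots having n + m > 2M set to 0.

    shots: list of tuples (i_A, j_A, n, m), where (i_A, j_A) is the two-bit
    Bell-measurement outcome on the qubit pair and (n, m) are the photon
    counts on the CV pair.
    """
    total = 0
    for i_a, j_a, n, m in shots:
        if n + m > 2 * M:
            continue  # detector threshold exceeded: contributes 0
        qubit_sign = -1 if (i_a == 1 and j_a == 1) else 1
        cv_sign = -1 if n % 2 else 1
        total += qubit_sign * cv_sign
    return total / len(shots)
```

A shot therefore contributes $-1$ when exactly one of the two subsystems is found in the antisymmetric sector, and $+1$ when both or neither are.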

Discussion
The destructive, ancilla-free CV SWAP test presented in this work is envisioned as an equality testing subalgorithm [37] for CV states that can be embedded in larger CV quantum algorithms. Our discussions of applications such as variational quantum compiling and implementation in hybrid DV-CV systems indicate potential implementations in near-term quantum hardware. Indeed, we were able to carry out a demonstration on the X8 device, thereby showing the feasibility of the proposed CV SWAP test for few CV modes on a noisy photonic chip. We note that a SWAP test for high-dimensional motional states of trapped ions has recently been implemented [38], further suggesting the importance of the SWAP test beyond the qubit context. Advantages of the present proposal include: no ancilla degrees of freedom or controlled gates, fully linear optical depth 1 circuit, and photon number-resolving measurement that can be achieved using transition-edge sensors. These requirements are presently available on photonic chips and free-space optical setups which manipulate low-energy continuous variable states. Technologies that extend these systems to a wider range of frequency modes and energies will allow for practical implementation of the present algorithm in more quantum optical settings.