Classical simulation of non-Gaussian fermionic circuits

We propose efficient algorithms for classically simulating fermionic linear optics operations applied to non-Gaussian initial states. By gadget constructions, this provides algorithms for fermionic linear optics with non-Gaussian operations. We argue that this problem is analogous to that of simulating Clifford circuits with non-stabilizer initial states: Algorithms for the latter problem immediately translate to the fermionic setting. Our construction is based on an extension of the covariance matrix formalism which permits efficient tracking of relative phases in superpositions of Gaussian states. It yields simulation algorithms with polynomial complexity in the number of fermions, the desired accuracy, and certain quantities capturing the degree of non-Gaussianity of the initial state. We study one such quantity, the fermionic Gaussian extent, and show that it is multiplicative on tensor products whenever the so-called fermionic Gaussian fidelity is. We establish this property for the tensor product of two arbitrary pure states of four fermions with positive parity.


Introduction
While universal polynomial-time quantum computation is believed to exceed the capabilities of efficient classical algorithms, restricted classes of quantum computations are amenable to efficient classical simulation. Identifying such models and corresponding simulation algorithms is a central goal in the study of quantum computing. On the one hand, a good characterization of the boundary between the computational power of classical and quantum computational models provides insight into potential quantum advantages. On the other hand, efficient classical simulation methods can be used to assess the merits and scalability of quantum information-processing proposals. For example, the resilience of certain quantum codes against restricted noise models has successfully been studied by means of classical simulation methods giving threshold estimates for large-scale systems; see e.g., [1][2][3][4][5][6] for an incomplete list of relevant references.

Efficiently simulable quantum computations
Most known examples of efficiently simulable quantum computations can be summarized by the following ingredients: (i) A set D of states with the property that each element Ψ ∈ D has a succinct classical description d Ψ . In the following, we will refer to D as a dictionary.
(ii) A set E of operations (unitary or non-unitary evolutions), again with the property that each element E ∈ E has a succinct classical description d E . Following resource-theoretic conventions, we call E the set of free operations.
(iii) A set M of measurements (quantum instruments) with an efficient classical description d_M for each M ∈ M, and the property that every post-measurement state (associated with different measurement outcomes) obtained by applying M ∈ M to a state Ψ ∈ D belongs to D.
A triple (D, E, M) gives rise to a (generally restricted) quantum computational model by composing these ingredients. A typical (non-adaptive) computation proceeds by preparing an initial state Ψ ∈ D, applying a sequence {E_t}_{t=1}^T ⊂ E of operations, and performing measurements {M_k}_{k=1}^L ⊂ M in succession. Assuming for simplicity that E consists of a set of unitaries, and that for each k ∈ [L] the measurement M_k is projective with outcome set M_k, the circuit produces an outcome m = (m_1, …, m_L) ∈ M_1 × · · · × M_L with probability

p(m) = ‖Π^{(L)}_{m_L} · · · Π^{(1)}_{m_1} E_T · · · E_1 Ψ‖² ,   (1)

where Π^{(k)}_{m_k} denotes the projection associated with outcome m_k of measurement M_k. More generally, one may consider circuits where operations are chosen adaptively depending on intermediate measurement results, assuming that the dependence is given by an efficiently computable function. The task of classically simulating the computational model associated with (D, E, M) comes in two flavors. The input in both cases is the collection (d_Ψ, {d_{E_t}}_{t=1}^T, {d_{M_k}}_{k=1}^L) of descriptions of the initial state, the sequence of operations applied, and the measurements. The problem of weak simulation then consists in producing a sample m ∈ M_1 × · · · × M_L drawn according to the (ideal) output distribution p(m) of the circuit given by Eq. (1). In contrast, the problem of strong simulation consists of computing the output probability p(m) for a given (potential) measurement outcome m.
Relaxing the requirements of weak and strong simulation, one may allow for an approximation error. For weak simulation, this is typically formalized by demanding that the (probabilistic) classical algorithm outputs a sample m drawn from a distribution p̃ which is δ-close in L1-distance (for a chosen error parameter δ > 0) to the ideal output distribution p. Similarly, for strong simulation, the output p̃(m) is required to be close to the value p(m) with a controlled (additive or multiplicative) error.
In cases where a computational model specified by (D, E, M) is amenable to efficient classical simulation, associated classical simulation algorithms are typically constructed by considering evolution and measurement separately. The basic problem then consists in constructing efficient classical algorithms with the following functionalities: (a) An algorithm evolve which, given classical descriptions d_Ψ of a state Ψ ∈ D and d_E of an evolution operation E ∈ E, computes a classical description d_{E(Ψ)} of the evolved state E(Ψ). (b) An algorithm measureprob which, given descriptions d_Ψ and d_M of a measurement M ∈ M, computes the probability p(m) of a given outcome m of the measurement M applied to Ψ. (c) An algorithm postmeasure which, given d_Ψ, d_M and an outcome m, computes a classical description of the associated post-measurement state. Composing these routines immediately yields an algorithm for strong simulation with runtime T · time(evolve) + L · (time(measureprob) + time(postmeasure)), which is linear in the number T of operations applied, and linear in the number L of measurements. Assuming that for any measurement M ∈ M, the set of measurement outcomes M_M associated with M is of constant cardinality, the triple (evolve, measureprob, postmeasure) also gives rise to a randomized algorithm for weak simulation: Such an algorithm is obtained by using measureprob to compute the entire distribution {p(m)}_{m ∈ M_M} of measurement outcomes (when applying a measurement M), and then drawing m ∈ M_M randomly according to this distribution. The runtime of this probabilistic algorithm is T · time(evolve) + L · (time(measureprob) · w + time(postmeasure)), where w = max_{M ∈ M} |M_M| bounds the maximal cardinality of the set of measurement outcomes.
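The composition of these building blocks into a weak-simulation driver can be sketched as follows. This is our own illustration, not an algorithm from the paper: evolve, measureprob and postmeasure are stand-ins for model-specific routines (e.g., stabilizer-tableau or covariance-matrix updates) acting on opaque state descriptions.

```python
import random

def weak_simulate(d_psi, operations, measurements,
                  evolve, measureprob, postmeasure):
    """Sample one outcome tuple m = (m_1, ..., m_L) from the circuit's
    output distribution, given the model-specific building blocks."""
    state = d_psi
    for d_E in operations:                  # apply E_1, ..., E_T
        state = evolve(state, d_E)
    outcomes = []
    for d_M in measurements:                # apply M_1, ..., M_L in succession
        probs = measureprob(state, d_M)     # dict: outcome -> probability
        ks, ps = zip(*probs.items())
        m = random.choices(ks, weights=ps)[0]
        state = postmeasure(state, d_M, m)  # move to post-measurement state
        outcomes.append(m)
    return tuple(outcomes)
```

The runtime of this driver is exactly the T · time(evolve) + L · (time(measureprob) · w + time(postmeasure)) bound stated above, with w the number of outcomes enumerated per measurement.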

Clifford circuits / Stabilizer computations
Perhaps the most well-known example of a computational model (D, E, M) where efficient algorithms (evolve, measureprob, postmeasure) can be provided is given by the Gottesman–Knill theorem for stabilizer computations on n qubits. Here D is the set STAB_n of n-qubit stabilizer states (whose elements can be specified by their stabilizer generators, i.e., corresponding stabilizer tableaux), E is the set of Clifford unitaries (described by symplectic matrices), and M consists of measurements of single-qubit Pauli-Z operators (described by an index j ∈ [n]). In this case, there are efficient algorithms with runtimes given in Table 1. Table 1: Runtimes of building blocks evolve, measureprob, postmeasure for classical simulation of n-qubit stabilizer circuits as given in [7]. Evolution corresponds to application of an n-qubit Clifford unitary, and each measurement is that of a Pauli observable Z_j with j ∈ [n].

Fermionic linear optics / Fermionic Gaussian computations
A different class of efficiently simulable computations (the one we are interested in here) is that of fermionic linear optics on n fermions. We focus on pure-state computations: Here the dictionary D consists of the set G_n of pure fermionic Gaussian states. An element Ψ ∈ G_n of the dictionary can be described by its covariance matrix Γ_Ψ, an antisymmetric 2n × 2n matrix with real entries. The set E = E_Gauss can be taken to be the set of Gaussian unitary operations. Each such unitary U = U_R is fully determined by an element R ∈ O(2n) of the orthogonal group on R^{2n}, where R → U_R defines a (projective) unitary representation of O(2n) on the space H_n of n fermions. The set M = M_number consists of all occupation number measurements. As in the case of stabilizer states, there are polynomial-time algorithms (evolve, measureprob, postmeasure) for classical simulation with runtimes summarized in Table 2. In particular, the covariance matrix Γ_{U_R Ψ} of a Gaussian state Ψ evolved under a Gaussian unitary U_R can be computed in time O(n³) from Γ_Ψ and R. The outcome probability of observing 0 (respectively 1) when performing an occupation number measurement can be computed in time O(1), and the covariance matrix of the post-measurement state can be computed in time O(n²) [8][9][10] (see also [3]). Table 2: Runtimes of building blocks evolve, measureprob, postmeasure for classical simulation of n-fermion linear optics circuits as proposed in [8][9][10], see also [3]. Evolution amounts to application of a fermionic Gaussian unitary. Measurement corresponds to measuring an observable a†_j a_j (occupation number) for j ∈ [n].
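As a toy illustration of the evolve step (our own sketch, not the authors' implementation, and using one common sign convention for the covariance matrix), a Gaussian unitary U_R acts on the covariance matrix by orthogonal conjugation Γ → R Γ Rᵀ, which costs O(n³) arithmetic operations:

```python
import math

def rotate_covariance(gamma, r):
    """Return R * Gamma * R^T for plain nested-list square matrices."""
    n = len(gamma)
    # rg = R * Gamma
    rg = [[sum(r[i][k] * gamma[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # (R * Gamma) * R^T; note r[j][k] indexes the transpose of R
    return [[sum(rg[i][k] * r[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Example: a single fermionic mode (two Majorana operators). Take the
# antisymmetric covariance matrix [[0, 1], [-1, 0]] and any R in SO(2);
# the conjugated matrix remains antisymmetric.
theta = 0.3
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
Gamma = [[0.0, 1.0], [-1.0, 0.0]]
Gamma2 = rotate_covariance(Gamma, R)
```

For this 2 × 2 example the conjugation leaves Γ invariant, since planar rotations commute with the antisymmetric form; for larger R ∈ O(2n) the entries change but antisymmetry is always preserved.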

Classical simulation algorithms and measures of magic
A natural way of extending the power of a quantum computational model specified by (D, E, M) consists in providing resources/capabilities that do not belong to the specified sets. "Magic states" are a prime example: Here a state Ψ ∉ D not belonging to the dictionary is provided as an (initial) state in the quantum computation, thereby providing additional capabilities to the computational model. For example, non-Clifford unitaries can be realized by certain stabilizer computations (sometimes referred to as "gadgets") applied to so-called magic states [11]. Similarly, non-Gaussian initial states can be combined with fermionic linear optics operations to realize non-Gaussian operations [12,13]. While certain magic states can even promote the computational model to universal quantum computation, this is generally not the case for all states Ψ. It is thus a natural question to quantify the degree of "magicness" provided by a state Ψ ∈ H. For the set STAB_n of n-qubit stabilizer states, corresponding magic monotones considered in the literature include the robustness of magic [14,15], the exact and approximate stabilizer rank [16][17][18], the stabilizer extent [18,19], the stabilizer nullity [20] and the generalized robustness [21].
The maximum overlap of a given state Ψ with an element of the dictionary D, i.e., the quantity

F_D(Ψ) = sup_{φ ∈ D} |⟨φ, Ψ⟩|² ,   (2)

is arguably one of the most direct ways of quantifying how far Ψ is from a "free" state, i.e., a state belonging to D. Motivated by the analogously defined notion of stabilizer fidelity in Ref. [18], we call F_D(Ψ) the D-fidelity of Ψ in the following. This quantity plays an important role in our arguments when considering multiplicativity properties. However, the D-fidelity F_D(Ψ) is not a good quantifier of the hardness of classical simulation because simply replacing Ψ by an element of D typically leads to a significant approximation error. From the point of view of classical simulation, a relevant magicness measure for a state Ψ ∈ H relates to the (added) complexity when trying to simulate a quantum computation with initial state Ψ, built from a triple (D, E, M) allowing for efficient classical simulation. One such measure, introduced in Ref. [16] for the case of stabilizer computations, is the D-rank χ_D(Ψ) of Ψ. (For D = STAB_n, this is called the stabilizer rank of Ψ.) It is defined as the minimum number of terms when decomposing Ψ as a linear combination of elements of D, i.e.,

χ_D(Ψ) = min { χ : Ψ = Σ_{j=1}^χ c_j φ_j with c_j ∈ C and φ_j ∈ D } .   (3)

In the context of signal processing, the corresponding optimization problem is referred to as a sparse approximation problem. The D-rank χ_D(Ψ) appears naturally when constructing and analyzing simulation algorithms, but it suffers from a number of shortcomings: On the one hand, the set of states Ψ ∈ H whose D-rank is less than the dimension of the Hilbert space H is a set of zero Lebesgue measure [22, Proposition 4.1]. On the other hand, the quantity χ_D(Ψ) relates to the classical simulation cost of exactly simulating dynamics involving the state Ψ. In practice, some approximation error is typically acceptable, and corresponding simulations can be achieved at lower cost. In other words, the quantity χ_D(Ψ) does not accurately reflect the cost of approximate simulation.
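To make the D-fidelity concrete, here is a small numerical illustration of our own for the single-qubit dictionary D = STAB_1 (the six single-qubit stabilizer states), evaluated on the magic state |T⟩ discussed later in the text:

```python
import cmath
import math

s = 1 / math.sqrt(2)
# The six single-qubit stabilizer states as amplitude pairs (a, b) for a|0> + b|1>.
STAB1 = [
    (1, 0), (0, 1),             # |0>, |1>
    (s, s), (s, -s),            # |+>, |->
    (s, 1j * s), (s, -1j * s),  # |+i>, |-i>
]

def d_fidelity(psi, dictionary):
    """F_D(psi) = max over phi in D of |<phi|psi>|^2 for a finite dictionary."""
    return max(
        abs(phi[0].conjugate() * psi[0] + phi[1].conjugate() * psi[1]) ** 2
        for phi in dictionary
    )

# |T> = (|0> + e^{i pi/4}|1>)/sqrt(2); its stabilizer fidelity is
# cos^2(pi/8) ~ 0.8536, attained e.g. by |+>.
T = (s, s * cmath.exp(1j * math.pi / 4))
```

A short computation confirms F_STAB1(|T⟩) = cos²(π/8): the best overlaps come from |+⟩ and |+i⟩, while |0⟩ and |1⟩ only achieve 1/2.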
A more operationally relevant quantity is the δ-approximate D-rank χ^δ_D(Ψ) of Ψ introduced in Ref. [17], again for stabilizer computations. For a fixed approximation error δ > 0, this is given by the minimum D-rank of any state Ψ̃ that is δ-close to Ψ, i.e.,

χ^δ_D(Ψ) = min { χ_D(Ψ̃) : ‖Ψ − Ψ̃‖ ≤ δ } .   (4)

An exact classical simulation algorithm whose complexity scales with the exact D-rank χ_D(Ψ) provides an approximate simulation at a cost with an identical scaling in the approximate (instead of exact) D-rank χ^δ_D(Ψ) of Ψ. Here approximate weak simulation means that instead of sampling from the ideal output distribution P of a circuit, the simulation samples from a distribution P̃ whose L1-distance from P is bounded by O(δ). Similarly, in approximate (strong) simulation, output probabilities are approximately computed with a controlled approximation error.
A different quantity of interest is obtained by replacing the ill-behaved rank function (i.e., the size of the support) in the definition of the D-rank χ_D(Ψ) by the L1-norm of the coefficients when representing Ψ as a linear combination. In the context of stabilizer states, the corresponding quantity was introduced by Bravyi et al. [18] under the term stabilizer extent: For a state Ψ ∈ (C²)^⊗n it is defined as

ξ_STABn(Ψ) = min { ‖γ‖₁² : Ψ = Σ_{ϕ ∈ STABn} γ(ϕ) ϕ } ,

where ‖γ‖₁ = Σ_{ϕ ∈ STABn} |γ(ϕ)| denotes the 1-norm of γ. The corresponding convex optimization problem is known as the basis pursuit problem [23] (when STAB_n is replaced by e.g., a finite dictionary D). Sufficient conditions for when the basis pursuit problem yields a solution of the sparse approximation problem were investigated in a series of works culminating in Fuchs' condition [24] (see also [25]). More importantly for (approximate) simulation, feasible solutions of the basis pursuit problem provide upper bounds on the sparse approximation problem. For the stabilizer rank, a sparsification result (see [18, Theorem 1]) gives an upper bound on the δ-approximate stabilizer rank χ^δ_STABn(Ψ) in terms of the stabilizer extent ξ_STABn(Ψ), for any δ > 0 (see Section 4.2). Building on earlier results [17], it was shown in Ref. [18] that a stabilizer circuit on n qubits with L Clifford gates initialized in a state Ψ can be weakly simulated with error δ in a time scaling as O(ξ_STABn(Ψ)/δ² · poly(n, L)). The error δ expresses the L1-norm distance of the distribution of simulated measurement outcomes from the output distribution of the actual quantum computation. Here we are not accounting for the time required to perform classical computations when adaptive quantum circuits are considered. In addition, Ref. [26] provided a classical algorithm for strong simulation of a circuit U with L Clifford gates and t T-gates initialized in a stabilizer state Ψ with an additive error δ.
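The sparsification idea behind the bound of [18, Theorem 1] can be sketched in a few lines (our own simplified illustration, acting only on the coefficient vector of a decomposition Ψ = Σ_j c_j φ_j): draw k dictionary indices with probability proportional to |c_j| and reweight each draw by ‖c‖₁/k times the phase of c_j, so that the sparsified coefficient vector has support size at most k and 1-norm at most ‖c‖₁. Choosing k on the order of ‖c‖₁²/δ² makes the resulting state δ-close to Ψ with high probability.

```python
import random

def sparsify(coeffs, k):
    """Return a k-sparse coefficient dict (index -> coefficient) obtained by
    importance-sampling the decomposition described by coeffs."""
    l1 = sum(abs(c) for c in coeffs.values())
    idx = list(coeffs)
    weights = [abs(coeffs[j]) / l1 for j in idx]
    sparse = {}
    for _ in range(k):
        # each draw contributes (l1/k) * phase(c_j) to coefficient j
        j = random.choices(idx, weights=weights)[0]
        sparse[j] = sparse.get(j, 0) + (l1 / k) * coeffs[j] / abs(coeffs[j])
    return sparse
```

By the triangle inequality the output's 1-norm never exceeds ‖c‖₁, which is why the simulation cost ends up controlled by the extent rather than the exact rank.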
Their algorithm outputs an estimate of the probability |⟨x|UΨ⟩|² of obtaining measurement outcome x ∈ {0, 1}^n up to an additive error δ, with success probability greater than 1 − p_f. It has runtime O(ξ_{STAB_t}(|T⟩^{⊗t}) log(1/p_f) · poly(n, L, δ^{-1})), scaling linearly with the stabilizer extent ξ_{STAB_t}(|T⟩^{⊗t}) of t copies of the single-qubit magic state |T⟩ associated with a T-gate [11].

The fermionic Gaussian extent
In the following, we generalize the notion of the extent beyond stabilizer computations to any dictionary D. We refer to the corresponding quantity as the D-extent ξ_D(Ψ) of Ψ. We assume throughout that we are interested in pure-state quantum computations on a Hilbert space H, and that the dictionary D is a subset of pure states on H. Then the D-extent ξ_D(Ψ) of Ψ ∈ H is defined as

ξ_D(Ψ) = inf { ‖γ‖₁² : Ψ = Σ_{j=1}^N γ_j φ_j with γ_j ∈ C, φ_j ∈ D and N ∈ N } .   (5)

Here ‖γ‖₁ = Σ_{j=1}^N |γ_j| is the L1-norm of the vector γ. That is, the D-extent ξ_D(Ψ) is the squared L1-norm of the coefficients minimized over all decompositions of Ψ into a finite linear combination of elements of the dictionary D. As mentioned above, quantities of the form (5) are well-studied in the context of signal processing.
When the dictionary D is a finite subset of a Hilbert space H ≅ C^d, the D-extent ξ_D(Ψ) of a state Ψ ∈ H can be expressed as a second-order cone program [27] (see also e.g., [28]), as in Appendix A of Ref. [19]. Second-order cone programs can be solved in time polynomial in max(d, |D|). We are typically interested in cases where D contains a basis of H (such that every state can indeed be represented as a linear combination of dictionary elements): Here this runtime is at least polynomial in d. For example, in the case D = STAB_n of stabilizer states on n qubits, this leads to a scaling exponential in n². Beyond algorithmic considerations related to the evaluation of the extent, the fact that ξ_D(Ψ) is given by a second-order cone program provides useful analytical insight by convex programming duality. Indeed, this fact has previously been exploited both for showing multiplicativity of the stabilizer extent for states of small dimension [18], as well as for showing non-multiplicativity in high dimensions [19]. In Section 6, we also exploit this connection to relate the D-fidelity F_D(Ψ) to the D-extent ξ_D(Ψ).
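Concretely, for a finite dictionary D = {φ_1, …, φ_N}, the minimization of the 1-norm of the coefficient vector over all valid decompositions (whose optimum determines ξ_D(Ψ)) can be cast as a second-order cone program in real variables t_j and complex variables γ_j; the following is a standard reformulation written in our own notation:

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{j=1}^{N} t_j\\
\text{subject to}\quad & \sum_{j=1}^{N} \gamma_j\,\varphi_j = \Psi,\\
& |\gamma_j| \le t_j, \qquad j = 1,\dots,N.
\end{aligned}
```

Each constraint |γ_j| ≤ t_j is a second-order cone constraint on the real vector (Re γ_j, Im γ_j, t_j), while the decomposition constraint is linear in (Re γ, Im γ); strong duality for programs of this form is what makes the convex-duality arguments mentioned above possible.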
In contrast, the D-extent ξ_D(Ψ) for an infinite, i.e., continuously parameterized, dictionary D poses additional mathematical challenges as an optimization problem. This is the case of interest here, as we consider the dictionary D = G_n consisting of all n-fermion Gaussian states in the following. We call the associated quantity ξ_Gn(Ψ) the (fermionic) Gaussian extent of an n-fermion state Ψ. Our focus here is on discussing the role of the quantity ξ_Gn(Ψ) in the context of classically simulating fermionic linear optics, and its behavior on tensor products. A detailed discussion of the algorithmic problem of computing ξ_Gn(Ψ) for an arbitrary state Ψ, and of finding a corresponding optimal decomposition of Ψ into a linear combination of Gaussian states, is beyond the scope of this work. We refer to e.g., [29], where semidefinite relaxations are given for the related atomic norm minimization problem in cases where the atomic set (corresponding to the dictionary) has algebraic structure. Similar techniques may be applicable to the fermionic Gaussian extent.

On the (sub)multiplicativity of the extent
Consider a situation where an operation E ∉ E not belonging to the set E of efficiently simulable operations is implemented by using a "magic" resource state Ψ ∉ D. For example, if D = STAB_n is the set of stabilizer states, E the set of Clifford unitaries and M the set of single-qubit Pauli-Z measurements, then a non-Clifford gate (such as the T-gate) can be realized by an (adaptive) Clifford circuit at the cost of consuming a non-Clifford state (such as the state |T⟩) [11]. Similar "gadget constructions" exist for fermionic linear optics, where non-Gaussian unitaries are realized by Gaussian unitaries and non-Gaussian states [12,13]. A natural question arising in this situation is to characterize the cost of simulating the application of two independent magic gates E_1, E_2 ∉ E, each realized by efficiently simulable operations (belonging to E) using magic states Ψ_1, Ψ_2. For any reasonable simulation algorithm, we expect the required simulation effort to increase at most multiplicatively. Indeed, this feature is reflected in the submultiplicativity property

ξ_{D_3}(Ψ_1 ⊗ Ψ_2) ≤ ξ_{D_1}(Ψ_1) · ξ_{D_2}(Ψ_2)   (6)

of the D-extent. In Eq. (6), we are considering Hilbert spaces H_1, H_2 and their tensor product H_3 = H_1 ⊗ H_2, as well as dictionaries D_j ⊂ H_j for j ∈ [3]. The submultiplicativity property (6) follows immediately from the definition of the extent if the three dictionaries satisfy the inclusion property

{φ_1 ⊗ φ_2 : φ_1 ∈ D_1, φ_2 ∈ D_2} ⊆ D_3 .   (7)

In particular, this is satisfied e.g., when the dictionary D_j = STAB_{n_j} ⊂ (C²)^{⊗n_j} is the set of n_j-qubit stabilizer states for j ∈ [3], with n_3 = n_1 + n_2, or when considering the set of (even) Gaussian states (see below). While the submultiplicativity property (6) is a trivial consequence of Eq. (7), the question of whether or not the stronger multiplicativity property

ξ_{D_3}(Ψ_1 ⊗ Ψ_2) = ξ_{D_1}(Ψ_1) · ξ_{D_2}(Ψ_2)  for all Ψ_1 ∈ H_1 and Ψ_2 ∈ H_2   (8)

holds for the D-extent is a much less trivial problem.
If the multiplicativity property (8) is satisfied, then computing the extent of a product state can be broken down into several smaller optimization problems: It suffices to compute the extent of each factor in the tensor product. Furthermore, the classical simulation cost (with typical algorithms) when applying several non-free ("magic") gates constructed by gadgets increases at an exponential rate determined by the individual gates. In contrast, if the extent is not multiplicative (i.e., the equality in (8) is not satisfied for some states Ψ_j ∈ H_j, j ∈ [2]), then such a simplification is not possible. More surprisingly, such a violation of multiplicativity implies that the classical simulation cost of applying certain non-free gates can be reduced by treating these jointly instead of individually. We note that in the slightly different context of so-called circuit knitting, similar savings in complexity have been shown to be significant [30]. Previous work established that the stabilizer extent is multiplicative, even for multiple factors, if the factors are 1-, 2- or 3-qubit states, i.e., n_j ∈ [3], see Ref. [18]. An example is the stabilizer extent of a tensor product of t copies of the magic (single-qubit) state |T⟩ = (|0⟩ + e^{iπ/4}|1⟩)/√2 associated with the T-gate. Multiplicativity for qubit states gives ξ_{STAB_t}(|T⟩^{⊗t}) = ξ_{STAB_1}(|T⟩)^t, where ξ_{STAB_1}(|T⟩) is known to be approximately 1.17 [17]. This translates to an overhead exponential in t in the runtime of stabilizer computations supplemented with t T-gates. Surprisingly, the stabilizer extent has been shown not to be multiplicative (for all pairs of states) in high dimensions [19].
For (pure) Gaussian states, the Gaussian extent of a 1-, 2- or 3-mode pure fermionic state is trivially equal to one, because any such state is Gaussian [31] and is thus an element of the dictionary. Hence the Gaussian extent is (trivially) multiplicative if the factors are 1-, 2- or 3-mode fermionic states. The simplest non-trivial case is that of n = 4 fermionic modes in each factor.

Our contribution
Our results concern fermionic linear optics, the computational model introduced in Section 1.1.2 described by the triple (G_n, E_Gauss, M_number) of fermionic Gaussian pure states on n fermions, Gaussian unitary operations and occupation number measurements. We propose classical simulation algorithms for the case where the initial state Ψ ∈ H_n is an arbitrary pure state in the n-fermion Hilbert space H_n (instead of belonging to the set G_n ⊂ H_n of Gaussian states). Our results are two-fold:

New simulation algorithms. We give algorithms realizing the functionalities described in Section 1.1 exactly for the triple (H_n, E_Gauss, M_number). This immediately gives rise to efficient algorithms for weak and strong simulation of circuits with non-Gaussian initial states. The corresponding runtimes of these building blocks, which we refer to as (χevolve, χmeasureprob, χpostmeasure), depend on the Gaussian rank χ = χ_Gn(Ψ) of the initial state Ψ and are summarized in Table 3. Table 3: Runtimes of the building blocks χevolve, χmeasureprob, χpostmeasure for exact simulation of n-fermion linear optics circuits with a non-Gaussian initial state Ψ of Gaussian rank χ = χ_Gn(Ψ). Evolution corresponds to the application of a Gaussian unitary from a set of generators of E_Gauss (specified below), and the set of measurements is given by occupation number measurements on each of the modes.
Key to the construction of these algorithms is a novel way of keeping track of relative phases in superpositions of Gaussian states, see Section 3. We argue that our techniques can be applied more generally to adapt simulation procedures developed, e.g., for Clifford circuits, to the setting of fermionic linear optics. To illustrate this, we apply our techniques to the simulation algorithms of [17, 18], obtaining efficient (approximate) classical simulation algorithms. In this way, we obtain new approximate simulation algorithms with runtimes depending linearly on the fermionic Gaussian extent ξ = ξ_Gn(Ψ) of the initial state Ψ, see Table 4 for a summary of the corresponding runtimes. They depend inverse-polynomially on parameters (δ, ε, p_f) determining the accuracy of the simulation. Table 4: Runtimes of building blocks approxevolve, approxmeasureprob, approxpostmeasure for approximate simulation of n-fermion linear optics circuits with a non-Gaussian initial state Ψ of Gaussian extent ξ = ξ_Gn(Ψ). The parameters (ε, δ, p_f) determine the quality of the approximation.
The error δ describes a certain "offset", i.e., a systematic error: Instead of simulating the dynamics of the circuit with the (ideal) initial state Ψ, the simulation algorithm emulates the dynamics when using a different starting state Ψ̃ which is δ-close to Ψ, i.e., which satisfies ‖Ψ − Ψ̃‖ ≤ δ. The algorithm approxevolve computes evolution exactly on the state used in the simulation (i.e., it preserves the approximation error δ relative to the ideal initial state). In contrast, the procedure approxmeasureprob can fail with probability p_f, and both approxmeasureprob and approxpostmeasure introduce an additional error quantified by ε (if approxmeasureprob succeeds): Instead of returning the probability p(0) of obtaining zero occupation number when measuring the state, the output of approxmeasureprob is a value p̃ which satisfies |p̃ − p(0)| ≤ O(ε). Similarly, the output of approxpostmeasure is a description of a state that is O(ε)-close to the actual post-measurement state. These parameters and runtimes are analogous to those obtained in [18] for simulating Clifford circuits with non-stabilizer initial states. In particular, they imply that a circuit with initial state Ψ involving T Gaussian unitaries and L occupation number measurements can be weakly simulated in time Õ(ε^{-2} ξ), such that the sampled measurement outcomes are ε-close in L1-distance to the ideal (joint) output distribution of all measurements. Here the notation Õ(·) suppresses a factor polynomial in n, T, L and log(ε^{-1}), see [17] for details.
On the multiplicativity of the Gaussian extent and the Gaussian fidelity. Motivated by the relevance of the Gaussian extent ξ_Gn(Ψ) for characterizing the complexity of classical simulation, we study multiplicativity properties of both the D-fidelity F_D(Ψ) as well as the D-extent ξ_D(Ψ) for a general infinite, i.e., continuously parameterized, dictionary D. We show that multiplicativity of the D-fidelity is closely related to that of the D-extent: For a general family of (discrete or continuous) dictionaries D_j ⊂ H_j for j ∈ [3] with the property {φ_1 ⊗ φ_2 : φ_1 ∈ D_1, φ_2 ∈ D_2} ⊆ D_3, multiplicativity of the D-fidelity, i.e., F_{D_3}(Ψ_1 ⊗ Ψ_2) = F_{D_1}(Ψ_1) · F_{D_2}(Ψ_2), implies multiplicativity of the D-extent, i.e., ξ_{D_3}(Ψ_1 ⊗ Ψ_2) = ξ_{D_1}(Ψ_1) · ξ_{D_2}(Ψ_2).
We note that for stabilizer states D = STAB_n, a similar route was followed in Ref. [18] to show multiplicativity of the stabilizer extent ξ_STABn with respect to tensor products of 1-, 2- and 3-qubit states. Our main contribution is an extension of this connection to the case of infinite dictionaries by the use of nets. We expect this connection to be helpful in proving or disproving multiplicativity of the extent more generally.
We subsequently make use of this connection to the Gaussian fidelity to show that the fermionic Gaussian extent is multiplicative for the tensor product of any two 4-mode fermionic states with positive parity, i.e.,

ξ_{G_8}(Ψ_1 ⊗ Ψ_2) = ξ_{G_4}(Ψ_1) · ξ_{G_4}(Ψ_2)  for all Ψ_1, Ψ_2 ∈ H_4^+ .   (9)

Here H_4^+ denotes the set of 4-mode fermionic states with positive parity. The proof of (9) relies on the Schmidt decomposition of fermionic Gaussian states and specific properties of 4-mode (positive parity) fermionic states.
The result (9) gives the first non-trivial example of multiplicativity of the Gaussian extent. Multiplicativity for more general cases such as that of multiple 4-mode fermionic factors remains an open problem.

Prior and related work
The starting point of our work is the fact that fermionic Gaussian operations acting on Gaussian states can be efficiently simulated classically, as shown in pioneering work by Terhal and DiVincenzo [9] and Knill [10]. The model and its simulability are closely related to those of matchgate computations introduced by Valiant [8], where so-called matchgates correspond to a certain subset of Gaussian operations (see also [32]). In analogy to the fermionic context, the efficient simulability of bosonic Gaussian circuits was recognized at around the same time [33,34]. In an effort to identify commonalities between simulation algorithms for a variety of quantum computational models, Somma et al. [35] provided a unifying Lie-algebraic treatment which gives a counterpart to the Gottesman–Knill theorem for the simulability of Clifford circuits [7,36].
While matchgate circuits, fermionic and bosonic linear optics, and Clifford circuits provide rich classes of efficiently simulable models for the study of many-body dynamics associated with quantum circuits, it is desirable to extend the applicability of such simulation methods. There has been significant interest in this problem resulting in a range of approaches. We only briefly discuss these here to give an overview, without attempting to give an exhaustive treatment.
A first prominent approach is the use of quasi-probability distributions to describe states and corresponding dynamics. Such a description typically applies to a subset of density operators: For example, it has been shown in [37,38] that circuits applying Gaussian operations to initial states with a positive Wigner function (a strict superset of the set of Gaussian states) can be simulated efficiently. Negativity of the Wigner function (both in the continuous-variable as well as the qubit context) thus serves as a resource for quantum computation, also see e.g., [39,40]. It is also closely related to contextuality, see [41], and thus connects contextuality to the complexity of classical simulation [42,43]. Not unlike the notorious sign problem in quantum Monte-Carlo methods applied in many-body physics, the runtimes of corresponding (randomized) simulation algorithms scale with certain measures of "negativity" of the initial state.
The concept of a convex-Gaussian state was introduced and studied in [31] to extend the range of fermionic linear optics simulation methods. This is related to quasi-probability representations in the sense that initial states of a particular form are shown to lead to efficient simulability. Here a density operator is called convex-Gaussian if it is a convex combination of fermionic Gaussian states. The utility of this concept was illustrated in [31] by showing a converse to the fault-tolerance threshold theorem: Sufficiently noisy quantum circuits can be simulated classically because the corresponding states turn out to be convex-Gaussian. A detailed characterization of convex-Gaussianity is necessary to translate this into explicit (numerical) threshold estimates. An infinite hierarchy of semidefinite programs was constructed in [31] to detect convex-Gaussianity, and this was subsequently shown to be complete [44]. This hierarchy also provides a way of determining whether a state is close to being convex-Gaussian [44].
A second important class of approaches are rank-based methods. Here the non-free resource (either a state or an operation) is decomposed into a linear combination of free (i.e., efficiently simulable) resources. Our work follows this approach, which is detailed in Section 4.1 for pure states. For Clifford computations, this involves writing general states as superpositions of stabilizer states. The development of such simulators was pioneered by Bravyi, Smith, and Smolin [16] with subsequent work dealing with approximate stabilizer decompositions [17].
The concept of low-rank (approximate) decompositions of quantum states or operations into more easily treatable basic objects appears in a variety of forms: For example, the work [18] also discusses -in addition to state vector decompositions -decompositions of non-Clifford unitaries into sums of Clifford operations. In Ref. [45], a similar approach was taken to approximately decompose non-Gaussian fermionic unitary operations into linear combinations of Gaussian channels. In all these cases, the main challenge is to identify optimal (or simply good) decompositions (e.g., in terms of rank or an extent-like quantity).
In more recent work, Mocherla, Lao and Browne [46] study the problem of simulating matchgate circuits using universality-enabling gates. They provide a simulation algorithm and associated runtime estimates for estimating expectation values of single-qubit observables in output states obtained by applying a matchgate circuit to a product state. This problem is closely related to the problem considered in this work, as matchgate circuits efficiently describe evolution under a quadratic fermionic Hamiltonian. The approach taken in [46] is quite different from ours, however: The classical simulator keeps track of the density operator by tracking its coefficients in the Pauli (operator) basis, using the structure of corresponding linear maps associated with matchgates. The effect of a specific set of universality-enabling gates is then analyzed in detail. This extends the sparse simulation method for matchgate circuits to circuits augmented with such gates. The runtime estimates of [46] apply to certain universality-enabling gates. In contrast, our constructions can in principle also be applied to (gadget-based) constructions of arbitrary gates and provide gate-specific information. For gates close to the identity, for example, this may provide additional resource savings (in terms of e.g., the rate of growth for several uses of such a gate).
Near the completion of our work, we also became aware of concurrent independent work on simulation of fermionic circuits with non-Gaussian operations; see the papers [47,48], which were posted to the arXiv simultaneously with our work.

Outline
The paper is structured as follows. In Section 2, we give some background on fermionic linear optics, reviewing fermionic Gaussian operations and states, inner product formulas for Gaussian states and tensor products of fermionic systems. In Sections 3 and 4 we describe classical algorithms for simulation of Gaussian and non-Gaussian fermionic circuits, respectively. Specifically, in Section 3 we provide an algorithm overlap for computing the overlap of two Gaussian states, an algorithm evolve to simulate unitary evolution of a Gaussian state, and algorithms measureprob and postmeasure to simulate measurements of occupation numbers. All these algorithms keep track of the phase of the state. In Section 4 we extend the simulation described in Section 3 to allow for non-Gaussian input states. The remainder of this work is focused on the multiplicativity of the fermionic Gaussian extent. In Section 5, we prove the multiplicativity of the fermionic Gaussian fidelity for the tensor product of any two 4-mode fermionic states with positive parity. Section 6 is devoted to showing that the multiplicativity of the D-fidelity implies multiplicativity of the D-extent for general (finite and infinite, i.e., continuously parameterized) dictionaries. Finally, the results from Sections 5 and 6 are used to prove the main result in Section 7, namely the multiplicativity of the fermionic Gaussian extent for the tensor product of any two 4-mode fermionic states with positive parity.

Background
In this section, we give some background on fermionic linear optics to fix notation.

Dirac and Majorana operators
Throughout, we consider fermionic systems composed of n modes, with (Dirac) creation and annihilation operators a_j†, a_j, j ∈ [n], satisfying the canonical anticommutation relations

{a_j, a_k} = 0 and {a_j, a_k†} = δ_{j,k} I for all j, k ∈ [n].

The fermionic vacuum state |0_F⟩ is the (up to a phase) unique unit vector satisfying a_j |0_F⟩ = 0 for all j ∈ [n]. For x = (x_1, . . . , x_n) ∈ {0, 1}^n, we define the number state |x⟩ by

|x⟩ = (a_1†)^{x_1} · · · (a_n†)^{x_n} |0_F⟩.

The states {|x⟩}_{x∈{0,1}^n} are an orthonormal basis of the underlying Hilbert space H_n ≅ (C²)^{⊗n}. A state |x⟩ is a simultaneous eigenstate of the occupation number operators a_j†a_j, j ∈ [n], where x_j is the eigenvalue of a_j†a_j. For later reference, we note that

a_j |x⟩ = x_j (−1)^{x_1+···+x_{j−1}} |x ⊕ e_j⟩ and a_j† |x⟩ = x̄_j (−1)^{x_1+···+x_{j−1}} |x ⊕ e_j⟩,

where we write 0̄ = 1 and 1̄ = 0, where e_j ∈ {0, 1}^n is given by (e_j)_k = δ_{j,k} for k ∈ [n], and where ⊕ denotes bitwise addition modulo 2.
It will be convenient to work with Majorana operators {c_j}_{j=1}^{2n} defined by

c_{2j−1} = a_j + a_j† and c_{2j} = i(a_j − a_j†) for j ∈ [n].

Majorana operators are self-adjoint and satisfy the relations

{c_j, c_k} = 2δ_{j,k} I for all j, k ∈ [2n].

For α ∈ {0, 1}^{2n}, we call the self-adjoint operator

c(α) = i^{|α|(|α|−1)/2} c_1^{α_1} · · · c_{2n}^{α_{2n}}

a Majorana monomial. Here |α| = Σ_{j=1}^{2n} α_j denotes the Hamming weight of α ∈ {0, 1}^{2n}. The set {c(α)}_{α∈{0,1}^{2n}} constitutes an orthonormal basis of the real vector space of self-adjoint operators on H_n equipped with the (normalized) Hilbert-Schmidt inner product ⟨A, B⟩ = 2^{−n} tr(A†B). The Majorana monomials satisfy

c(x)c(y) = (−1)^{|x||y| − x·y} c(y)c(x) for x, y ∈ {0, 1}^{2n},

where x·y = Σ_{j=1}^{2n} x_j y_j. In particular, if either x or y has even Hamming weight, then c(x)c(y) = (−1)^{x·y} c(y)c(x). In the following, we will denote the set of even- and odd-weight 2n-bit strings by {0, 1}^{2n}_+ and {0, 1}^{2n}_−, respectively. The parity operator P is the Majorana monomial associated with α = 1_{2n} = (1, . . . , 1). The parity operator commutes with every even-weight Majorana monomial and anti-commutes with every odd-weight Majorana monomial, i.e., we have

P c(α) = (−1)^{|α|} c(α) P for all α ∈ {0, 1}^{2n}.

The Hilbert space H_n = H_n^+ ⊕ H_n^− associated with n fermions decomposes into a direct sum of positive- and negative-parity vectors, i.e., the eigenspaces of P with eigenvalues +1 and −1. We call a state Ψ ∈ H_n of definite parity if either Ψ ∈ H_n^+ or Ψ ∈ H_n^−. An element X ∈ B(H_n) belonging to the set B(H_n) of linear operators on H_n is called even (odd) if it is a linear combination of Majorana monomials c(α) with α ∈ {0, 1}^{2n} of even (odd) weight. An immediate consequence of these definitions is that a state Ψ ∈ H_n has definite parity if and only if |Ψ⟩⟨Ψ| is even (see e.g., [49, Proposition 1] for a proof).
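These relations are easy to verify in a dense (Jordan-Wigner) representation for small n. The following numpy sketch is our own illustration (the helper names are ours), assuming the conventions c_{2j−1} = a_j + a_j† and c_{2j} = i(a_j − a_j†) used here:

```python
import numpy as np
from functools import reduce

def annihilation(j, n):
    """Dense Jordan-Wigner annihilation operator a_j on n modes."""
    Z = np.diag([1.0, -1.0])                # phase string (-1)^{x_1+...+x_{j-1}}
    a = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps |1> to |0> on a single mode
    ops = [Z] * (j - 1) + [a] + [np.eye(2)] * (n - j)
    return reduce(np.kron, ops).astype(complex)

def majoranas(n):
    """Majorana operators c_1, ..., c_{2n} with c_{2j-1} = a_j + a_j^dag
    and c_{2j} = i(a_j - a_j^dag)."""
    cs = []
    for j in range(1, n + 1):
        aj = annihilation(j, n)
        cs.append(aj + aj.conj().T)
        cs.append(1j * (aj - aj.conj().T))
    return cs
```

With this convention one can check that i c_{2j−1} c_{2j} = I − 2 a_j†a_j, i.e., the occupation number of mode j is encoded in a quadratic Majorana monomial; this is what the measurement formulas below rely on.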

Gaussian unitaries
A Gaussian unitary U_R acts on the Majorana operators by conjugation as

U_R† c_j U_R = Σ_{k=1}^{2n} R_{j,k} c_k for j ∈ [2n],     (15)

where R ∈ O(2n) is a real orthogonal matrix. Ignoring overall phases, the group of Gaussian unitary operators is generated by operators of the form

U_{j,k}(ϑ) = exp(ϑ c_j c_k / 2), 1 ≤ j < k ≤ 2n, ϑ ∈ [0, 2π),     (16)

and by operators

U_j = c_j, j ∈ [2n].     (17)

The operator U_{j,k}(ϑ) implements the rotation by the angle ϑ in the coordinate plane (j, k), i.e., it maps (c_j, c_k) to (cos(ϑ) c_j + sin(ϑ) c_k, −sin(ϑ) c_j + cos(ϑ) c_k) and leaves all other Majorana operators invariant. The operator U_j = c_j leaves c_j invariant and flips the sign of each c_k with k ≠ j, i.e., it implements the reflection R_j which acts as c_j ↦ c_j and c_k ↦ −c_k for k ≠ j. We note that by relation (14), every generator U_{j,k}(ϑ) is parity-preserving, whereas every generator U_j reverses the parity, i.e.,

U_{j,k}(ϑ) P = P U_{j,k}(ϑ) and U_j P = −P U_j.     (18)

Every orthogonal matrix R gives rise to a Gaussian unitary U_R satisfying (15). The unitary U_R is unique up to a global phase, and R ↦ U_R is called the metaplectic representation. We can fix the global phase of U_R uniquely, e.g., by the following procedure. Every element R ∈ O(2n) can be uniquely decomposed into a product

R = S_1 S_2 · · · S_L     (19)

with L ≤ 2n(2n−1)/2, where for each r ∈ [L], the matrix S_r is of the form R_{j,k}(ϑ) or R_j. Here R_{j,k}(ϑ) is associated with U_{j,k}(ϑ) according to Eq. (16). We note that R_{j,k}(ϑ) ∈ SO(2n) is a so-called Givens rotation, introduced in Ref. [50], and a decomposition of the form (19) can be found by a deterministic algorithm with runtime O(n³) (see e.g., Section 5.2.3 in Ref. [51]). In particular, application of this algorithm defines a function taking an arbitrary element R ∈ O(2n) to a unique product of the form (19). Given the (unique) decomposition (19) of R ∈ O(2n), we can then define U_R as the product of the corresponding generators (16) and (17). Overall, this defines a function R ↦ U_R from O(2n) to the set of Gaussian unitaries, fixing the phase ambiguity. Throughout the remainder of this work, U_R will denote the Gaussian unitary uniquely fixed by R.
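A decomposition of the form (19) can be sketched as a standard QR-style reduction by plane rotations. The following is our own minimal implementation (not the paper's pseudocode): it factors any R ∈ O(m) into at most m(m−1)/2 Givens rotations times a diagonal sign matrix, in O(m³) arithmetic operations:

```python
import numpy as np

def givens_decompose(R):
    """Sketch: factor an orthogonal R as R = G_1 @ ... @ G_L @ D, where each
    G_r is a Givens rotation and D is diagonal with entries +-1."""
    m = R.shape[0]
    A = R.copy()
    factors = []
    for col in range(m):
        for row in range(m - 1, col, -1):
            a, b = A[row - 1, col], A[row, col]
            r = np.hypot(a, b)
            if r < 1e-12:
                continue
            c, s = a / r, -b / r
            G = np.eye(m)
            G[row - 1, row - 1] = G[row, row] = c
            G[row - 1, row] = -s
            G[row, row - 1] = s
            A = G @ A            # zeroes the entry A[row, col]
            factors.append(G.T)  # G is orthogonal, so G^{-1} = G^T
    return factors, A            # A is now diagonal with entries +-1
```

For m = 2n this matches the bound L ≤ 2n(2n−1)/2 on the number of rotations and the O(n³) runtime quoted above; the leftover sign matrix accounts for reflections.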

Fermionic Gaussian (pure) states
The set of pure fermionic Gaussian states is the orbit of the vacuum state |0 F under the action of O(2n) defined by the metaplectic representation, i.e., fermionic Gaussian states are of the form U R |0 F with U R a fermionic Gaussian unitary. In more detail, every fermionic Gaussian state e iθ U R |0 F is uniquely specified by a pair (θ, R) with θ ∈ [0, 2π) and R ∈ O(2n). We will denote the set of all fermionic Gaussian states by By Eq. (18) and because P |0 F = |0 F , every pure fermionic Gaussian state Ψ has a fixed parity, i.e., it is an eigenvector of the parity operator P . This defines a disjoint partition G n = G + n ∪ G − n of the set of fermionic Gaussian states into positive-and negative-parity states.

Gaussianity condition
In Ref. [52], Bravyi established a necessary and sufficient condition to determine if a (possibly mixed) state ρ ∈ B(H_n) is Gaussian (see Theorem 1 therein). Here Gaussianity of a density operator ρ is defined by the condition that ρ has the form

ρ = K exp(i Σ_{1≤j<k≤2n} A_{j,k} c_j c_k)     (20)

for an antisymmetric matrix A = −A^T ∈ Mat_{2n×2n}(R) and a constant K > 0. We note that a pure state Ψ ∈ H_n is Gaussian if and only if the associated density operator ρ = |Ψ⟩⟨Ψ| is Gaussian. (This can be seen as follows: If ρ = |Ψ⟩⟨Ψ| is of the form (20), then it follows from Williamson's normal form for antisymmetric matrices that Ψ is of the form e^{iθ}U_R|0_F⟩, i.e., Gaussian. Conversely, for a Gaussian state |Ψ⟩ = e^{iθ}U_R|0_F⟩ ∈ G_n, we can use the expression for |0_F⟩⟨0_F| to argue that |Ψ⟩⟨Ψ| is of the form (20), i.e., Gaussian.) The characterization of Gaussian density operators established in [52] is the following. Based on this characterization [52], the following was shown in [31].
Lemma 2.2 ([31]). Let ρ ∈ B(H_n) be an even state. Then ρ is a Gaussian pure state if and only if Λ(ρ ⊗ ρ) = 0.
In the following, we only use the statement of Lemma 2.2 applied to pure states in order to distinguish between Gaussian and non-Gaussian pure states. We formulate this as follows: A pure state Ψ ∈ H_n of definite parity is Gaussian if and only if Λ(|Ψ⟩⟨Ψ| ⊗ |Ψ⟩⟨Ψ|) = 0. Proof. This follows immediately from the equivalence of the concepts of Gaussianity of pure states (vectors) and density operators because the density operator |Ψ⟩⟨Ψ| is even for any fixed-parity state Ψ.
We note that there is an elegant representation-theoretic interpretation of this characterization of Gaussianity [53]. It is derived from the fact that Gaussian states are the orbit of the vacuum state |0 F (a highest weight state) under the action of the metaplectic group, cf. [54, Section IV] and [55]. We use a version of this reformulation for 4 fermions, see Lemma 5.1 below, that has first been obtained in [56].

Covariance matrices, Gaussian states and Wick's theorem
The covariance matrix Γ = Γ(Ψ) ∈ Mat_{2n×2n}(R) of a state Ψ ∈ H_n is the antisymmetric matrix with entries

Γ_{j,k} = (i/2) ⟨Ψ, [c_j, c_k] Ψ⟩ for j, k ∈ [2n].

It satisfies ΓΓ^T ≤ I for any state Ψ ∈ H_n, with equality if and only if Ψ is a pure Gaussian state. The expectation value of a Hermitian operator with respect to a Gaussian state Ψ is fully determined by its covariance matrix Γ = Γ(Ψ). This is because the expectation value of a Majorana monomial c(α), α ∈ {0, 1}^{2n}, is given by Wick's theorem

⟨Ψ, c(α)Ψ⟩ = Pf(Γ[α]) for α ∈ {0, 1}^{2n}_+ and ⟨Ψ, c(α)Ψ⟩ = 0 for α ∈ {0, 1}^{2n}_−.

Here Γ[α] ∈ Mat_{|α|×|α|}(R) is the submatrix of Γ which includes all rows and columns with index j ∈ [2n] such that α_j = 1, and Pf denotes the Pfaffian. Evaluating such expectation values, i.e., computing Pfaffians of |α| × |α|-matrices (with |α| even), takes time O(|α|³). (Here and below we use the number of elementary arithmetic operations to quantify the time complexity of algorithms.)
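Wick's theorem can be checked numerically for small systems. The sketch below uses our own helper functions (assuming the conventions c_{2j−1} = a_j + a_j†, c_{2j} = i(a_j − a_j†) and Γ_{j,k} = (i/2)⟨[c_j, c_k]⟩); the Pfaffian is evaluated by naive first-row expansion, whereas a production implementation would use an O(|α|³) Pfaffian algorithm:

```python
import numpy as np
from functools import reduce

def majoranas(n):
    """Dense Jordan-Wigner Majorana operators c_1, ..., c_{2n}."""
    Z = np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])
    cs = []
    for j in range(1, n + 1):
        aj = reduce(np.kron, [Z] * (j - 1) + [a] + [np.eye(2)] * (n - j)).astype(complex)
        cs.append(aj + aj.conj().T)
        cs.append(1j * (aj - aj.conj().T))
    return cs

def covariance(psi, cs):
    """Covariance matrix Gamma_{jk} = (i/2) <psi|[c_j, c_k]|psi>."""
    m = len(cs)
    G = np.zeros((m, m))
    for j in range(m):
        for k in range(j + 1, m):
            G[j, k] = (0.5j * np.vdot(psi, (cs[j] @ cs[k] - cs[k] @ cs[j]) @ psi)).real
            G[k, j] = -G[j, k]
    return G

def pfaffian(M):
    """Pfaffian by expansion along the first row (fine for tiny matrices)."""
    m = M.shape[0]
    if m == 0:
        return 1.0
    total = 0.0
    for k in range(1, m):
        rest = [i for i in range(1, m) if i != k]
        total += (-1) ** (k - 1) * M[0, k] * pfaffian(M[np.ix_(rest, rest)])
    return total
```

For a state obtained by applying a Gaussian unitary to the vacuum, the expectation of every even Majorana monomial then agrees with the Pfaffian of the corresponding submatrix of Γ, and ΓΓ^T = I reflects purity.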

Inner product formulas for Gaussian states
The modulus of the inner product of two Gaussian states Φ_1, Φ_2 with identical parity σ ∈ {±1} and covariance matrices Γ_1, Γ_2 is given by the expression [57]

|⟨Φ_1, Φ_2⟩|⁴ = 2^{−2n} Det(Γ_1 + Γ_2).     (22)

For three Gaussian states Φ_0, Φ_1, Φ_2, the product ⟨Φ_0, Φ_1⟩⟨Φ_1, Φ_2⟩⟨Φ_2, Φ_0⟩ of pairwise overlaps is invariant under a change of the global phase of any of the states, and can therefore be computed by the covariance matrix formalism. An explicit expression was derived by Löwdin in [57]. In Ref. [58], Bravyi and Gosset gave the formula (23) expressing ⟨Φ_2, Φ_0⟩ in terms of ⟨Φ_0, Φ_1⟩, ⟨Φ_1, Φ_2⟩ and the covariance matrices Γ_j of Φ_j for j = 0, 1, 2. More generally, they obtained the formula (24) for any even-weight Majorana monomial c(α), α ∈ {0, 1}^{2n}_+. Here the relevant expressions involve a diagonal matrix and the matrix J_α ∈ Mat_{|α|×2n}(R) with entries defined in terms of the indices {i ∈ [2n] | α_i = 1} = {i_1 < · · · < i_r} associated with non-zero entries of α, that is, (J_α)_{j,k} = 1 if and only if k = i_j. In other words, (J_α)_{j,k} = 1 if and only if k is the position of the j-th non-zero element of α. As argued in [58], expressions (23) and (24) determine the inner product ⟨Φ_1, Φ_2⟩ and an expression of the form ⟨Φ_1, c(α)Φ_2⟩ entirely in terms of covariance matrices, assuming that the remaining two overlaps ⟨Φ_0, Φ_1⟩, ⟨Φ_2, Φ_0⟩ with a Gaussian reference state Φ_0 are given and non-zero. In this situation, these quantities can be computed in time O(n³).

Gaussian evolution and occupation number measurement
Underlying the known classical simulation algorithms for fermionic linear optics is the fact that Gaussian unitaries and occupation number measurements preserve Gaussianity. Explicitly, this can be described as follows: Given a Gaussian state Ψ with covariance matrix Γ(Ψ),

(i) the evolved state U_RΨ is Gaussian with covariance matrix Γ(U_RΨ) = R Γ(Ψ) R^T for any R ∈ O(2n);

(ii) measurement of the observable a_j†a_j for j ∈ [n] gives the outcome s ∈ {0, 1} with probability

p(s) = (1 + (−1)^s Γ(Ψ)_{2j−1,2j}) / 2.     (26)

Given that the measurement outcome is s ∈ {0, 1}, the post-measurement state is Gaussian with covariance matrix Γ(Ψ(s)) determined (by antisymmetry) by its lower-diagonal entries Γ(Ψ(s))_{k,ℓ} for k > ℓ.
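Both update rules can be cross-checked against a dense state-vector simulation for small n. The sketch below is our own illustration (helper names are ours), assuming the conventions c_{2j−1} = a_j + a_j†, c_{2j} = i(a_j − a_j†), Γ_{j,k} = (i/2)⟨[c_j, c_k]⟩, under which Γ(U_RΨ) = RΓ(Ψ)R^T and p(s) = (1 + (−1)^s Γ_{2j−1,2j})/2:

```python
import numpy as np
from functools import reduce

def majoranas(n):
    """Dense Jordan-Wigner Majorana operators c_1, ..., c_{2n}."""
    Z = np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])
    cs = []
    for j in range(1, n + 1):
        aj = reduce(np.kron, [Z] * (j - 1) + [a] + [np.eye(2)] * (n - j)).astype(complex)
        cs.append(aj + aj.conj().T)
        cs.append(1j * (aj - aj.conj().T))
    return cs

def covariance(psi, cs):
    """Covariance matrix Gamma_{jk} = (i/2) <psi|[c_j, c_k]|psi>."""
    m = len(cs)
    G = np.zeros((m, m))
    for j in range(m):
        for k in range(j + 1, m):
            G[j, k] = (0.5j * np.vdot(psi, (cs[j] @ cs[k] - cs[k] @ cs[j]) @ psi)).real
            G[k, j] = -G[j, k]
    return G

def evolve_covariance(G, R):
    """Covariance matrix of U_R Psi, given that of Psi."""
    return R @ G @ R.T

def measure_prob(G, j, s):
    """Probability of outcome s in {0,1} when measuring a_j^dag a_j (1-based j)."""
    return 0.5 * (1.0 + (-1) ** s * G[2 * j - 2, 2 * j - 1])
```

The test evolves the vacuum by a single Givens-type Gaussian unitary exp(ϑ c_1 c_3 / 2) and compares the covariance-level predictions with direct state-vector results.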

The tensor product of two fermionic states
Two density operators ρ_j ∈ B(H_{n_j}), j ∈ [2], are said to have the joint extension ρ ∈ B(H_{n_1+n_2}) if

tr(ρ c(α_1 α_2)) = tr(ρ_1 c(α_1)) · tr(ρ_2 c(α_2)) for all α_1 ∈ {0, 1}^{2n_1}, α_2 ∈ {0, 1}^{2n_2}.

Here α_1 α_2 ∈ {0, 1}^{2(n_1+n_2)} denotes the concatenation of α_1 and α_2. Theorem 1 in [59] implies that if either ρ_1 or ρ_2 is even, then a unique joint extension ρ of (ρ_1, ρ_2) exists. Furthermore, this extension is even if and only if both ρ_1 and ρ_2 are even. Theorem 2 in [59] shows that if ρ is even and ρ_1 and ρ_2 are pure, then ρ is also pure.
In particular, this means that for states Ψ_1, Ψ_2 of definite parity, there is a unique joint pure extension ρ = |Ψ⟩⟨Ψ| of (|Ψ_1⟩⟨Ψ_1|, |Ψ_2⟩⟨Ψ_2|). Since ρ is pure, this also means that Ψ is of definite parity. We will write Ψ = Ψ_1 ⊗̃ Ψ_2 for this state in the following, and we call ⊗̃ the fermionic tensor product. Note that Ψ is only defined up to a global phase. It follows immediately from these definitions that

|⟨xy, Ψ_1 ⊗̃ Ψ_2⟩| = |⟨x, Ψ_1⟩| · |⟨y, Ψ_2⟩| for all x ∈ {0, 1}^{n_1} and y ∈ {0, 1}^{n_2}.     (29)

Proof. Let x ∈ {0, 1}^{n_1} and y ∈ {0, 1}^{n_2} be arbitrary. By definition of the joint extension, the expectation value of the (even) projection |xy⟩⟨xy| factorizes, i.e., |⟨xy, Ψ⟩|² = |⟨x, Ψ_1⟩|² · |⟨y, Ψ_2⟩|², which gives the claim.

Refining Eq. (29), (relative) phase information between these matrix elements can be obtained from the explicit construction of Ψ_1 ⊗̃ Ψ_2 given in [59, Section 3.1] (see also [49, Proof of Theorem 1]): Consider the isometry

U : H_{n_1+n_2} → H_{n_1} ⊗ H_{n_2}, |x_1, . . . , x_{n_1+n_2}⟩ ↦ |x_1, . . . , x_{n_1}⟩ ⊗ |x_{n_1+1}, . . . , x_{n_1+n_2}⟩,

whose action on the Majorana operators is given by conjugation, where P_1, the parity operator acting on H_{n_1}, introduces phases. Then Ψ_1 ⊗̃ Ψ_2 can be obtained as U†(Ψ_1 ⊗ Ψ_2). It is straightforward from this definition to check that Ψ_1 ⊗̃ Ψ_2 is the extension of (Ψ_1, Ψ_2). We note that the fermionic tensor product preserves Gaussianity in the following sense.

Tracking relative phases in fermionic linear optics
The covariance matrix Γ(Ψ) of a fermionic Gaussian state |Ψ = e iθ U R |0 F ∈ G n fully determines expectation values by Wick's theorem, which is why Gaussian states and dynamics are amenable to efficient classical simulation (see Section 2.7). However, the description of Ψ in terms of the covariance matrix Γ(Ψ) does not capture information on the global phase e iθ of the state. For processes involving non-Gaussian states expressed as superpositions of Gaussian states, such phase information needs to be available for computing norms, expectation values and overlaps.
Here we provide an extended (classical) description of fermionic Gaussian states that incorporates phase information. A central feature of our construction is the fact that this description permits computing overlaps (including relative phases, i.e., not only the absolute value) of Gaussian states in an efficient manner.
Our construction is motivated by and relies on expression (23), which relates the inner product Ψ 1 , Ψ 2 of two Gaussian states Ψ 1 , Ψ 2 ∈ G n to their inner products Ψ 0 , Ψ 1 , Ψ 0 , Ψ 2 with a Gaussian reference state Ψ 0 ∈ G n and their covariance matrices Γ 0 , Γ 1 , Γ 2 . This suggests fixing a reference state Ψ 0 ∈ G n and using the pair (Γ(Ψ), Ψ 0 , Ψ ) as a classical description of any state |Ψ ∈ G n relevant in the computation. The problem with this idea is that Ψ 0 , Ψ may vanish, preventing the application of (23). To avoid this problem, we use -instead of a single state Ψ 0 -a (potentially) different reference state for each state Ψ. Specifically, we will show that using number states, i.e., states of the form (10), suffices. This motivates the following definition.
In our algorithms we will in fact ensure that | x, Ψ | 2 ≥ 2 −n , i.e., only a subset of valid descriptions is used. A description d = (Γ, x, r) with this property, i.e., satisfying |r| 2 ≥ 2 −n , will be called a good description. The restriction to good descriptions is necessary to make our algorithms work with finite-precision arithmetic.
More explicitly, necessary and sufficient conditions for d = (Γ(Ψ), x, r) to constitute a description of Ψ are that r ≠ 0 and |r|⁴ = 2^{−2n} Det(Γ(|x⟩) + Γ(Ψ)), because of formula (22) for the overlap of two states and because Det(·) = Pf²(·). Here Γ(|x⟩) is the covariance matrix of |x⟩. Since a Gaussian state Ψ generally has non-zero overlap with more than a single occupation number state |x⟩, there are several distinct valid descriptions of Ψ. We will denote the set of descriptions of |Ψ⟩ ∈ G_n by Desc(Ψ). We note that a description d = (Γ, x, r) uniquely fixes a Gaussian state Ψ ∈ G_n: The covariance matrix Γ determines all expectation values, and the global phase of Ψ is fixed by the overlap ⟨x, Ψ⟩, i.e., by r. Denoting by Desc_n = ∪_{Ψ∈G_n} Desc(Ψ) the set of all descriptions of fermionic Gaussian n-mode states, this means that we have a function

Ψ : Desc_n → G_n, d ↦ Ψ(d).     (33)

The main result of this section shows that expectation values, overlaps, and matrix elements of (Majorana) operators with respect to Gaussian states can be efficiently computed from their classical descriptions. Furthermore, when evolving a Gaussian state under a Gaussian unitary, the description of the resulting state can be computed efficiently. The same is true for the post-measurement state when applying an occupation number measurement.
For evolution, we note that it suffices to consider Gaussian unitaries of the form U_R where R ∈ O(2n) belongs to the set of generators Gen(O(2n)) introduced in Section 2.2, that is,

Gen(O(2n)) = {R_{j,k}(ϑ) | 1 ≤ j < k ≤ 2n, ϑ ∈ [0, 2π)} ∪ {R_j | j ∈ [2n]}.

Here R_{j,k}(ϑ) is a Givens rotation and R_j is a reflection. We note that each element of Gen(O(2n)) can be specified by a tuple (j, k, ϑ) ∈ [2n] × [2n] × [0, 2π) or an index j ∈ [2n], respectively. We assume that this parameterization is used in the following algorithms (but leave this implicit).
To state the properties of our (deterministic) algorithms, it is convenient to express these as functions.
Theorem 3.2 (Overlap, evolution, and measurement). Let Ψ(d) ∈ G_n be the Gaussian state associated with a description d ∈ Desc_n, see Eq. (33). Then the following holds: (i) The algorithm overlap : Desc_n × Desc_n → C given in Fig. 7 has runtime O(n³) and satisfies overlap(d_1, d_2) = ⟨Ψ(d_1), Ψ(d_2)⟩ for all d_1, d_2 ∈ Desc_n. (ii) The algorithm evolve : Desc_n × Gen(O(2n)) → Desc_n given in Fig. 9 has runtime O(n³) and satisfies Ψ(evolve(d, R)) = U_RΨ(d) for all d ∈ Desc_n and R ∈ Gen(O(2n)), where U_R denotes the Gaussian unitary associated with R ∈ O(2n). (iii) The algorithms measureprob and postmeasure simulate measurements of occupation numbers: measureprob(d, j, s) is the probability of obtaining the outcome s ∈ {0, 1} when measuring a_j†a_j on Ψ(d), and (if this probability is non-zero) postmeasure(d, j, s) is a description of the corresponding normalized post-measurement state. The output of both evolve and postmeasure is a good description for any input.
We argue that descriptions of relevant initial states can be obtained efficiently. Clearly, this is the case for any state of the form |Ψ = U R L · · · U R 1 |0 F obtained by applying a sequence {R j } j∈[L] ⊂ Gen(O(2n)) of generators to the vacuum state |0 F : Here we can use the algorithm evolve L times, producing a description of |Ψ in time O(Ln 3 ).
We will at times need a description of a state |Ψ⟩ but do not require fixing its global phase. This is the case, for example, when subsequent computational steps only involve phase-insensitive expressions, e.g., terms of the form |⟨Ψ, Φ⟩|². Such a description can be found efficiently from the covariance matrix Γ of |Ψ⟩. Since the phase can be fixed arbitrarily, the problem here is to find x ∈ {0, 1}^n such that ⟨x, Ψ⟩ is non-zero. Theorem 3.3. There is an algorithm describe : Mat_{2n×2n}(R) → Desc_n with runtime O(n³) such that for any covariance matrix Γ, the state Ψ(describe(Γ)) is a Gaussian state with covariance matrix Γ, and describe(Γ) is a good description.
The remainder of this section is devoted to the proofs of Theorem 3.2 and Theorem 3.3: We describe the algorithms evolve, overlap, measureprob, postmeasure and describe in detail, providing pseudocode, and verify that these satisfy the desired properties.

Subroutines
Our algorithms make use of subroutines called findsupport, relatebasiselements, overlaptriple and convert which we describe here.

Figure 1: Pseudocode for the algorithm findsupport (fragment): for j ← 1 to n, a measurement of a_j†a_j is simulated; if the probability q_j of the outcome 0 satisfies q_j ≥ 1/2, the higher-probability outcome is chosen by setting x_j ← 0 and p_j ← q_j, and otherwise x_j ← 1; the covariance matrix of the post-measurement state is then updated.
Proof. The main idea of the algorithm is to mimic a measurement in the number state basis executed in a sequential manner. Consider the following process: Suppose we start with the state Ψ^{(0)} = Ψ, and then measure a_j†a_j successively for j = 1, . . . , n. Let P(x_j | x_1 · · · x_{j−1}) denote the conditional probability of observing the outcome x_j ∈ {0, 1} (when measuring a_j†a_j), given that the previous measurements yielded (x_1, . . . , x_{j−1}). According to Born's rule, this is given by

P(x_j | x_1 · · · x_{j−1}) = ⟨Ψ^{(j−1)}_{x_1···x_{j−1}}, Π_j(x_j) Ψ^{(j−1)}_{x_1···x_{j−1}}⟩,

where Ψ^{(j−1)}_{x_1···x_{j−1}} is the post-measurement state after the first (j − 1) measurements. The probability of observing the sequence x ∈ {0, 1}^n of outcomes then is

P(x) = ∏_{j=1}^n P(x_j | x_1 · · · x_{j−1})

by the chain rule for conditional probabilities.
The algorithm findsupport simulates this process: For each j ∈ [n], the quantity q j computed in line 5 is equal to the conditional probability P (0|x 1 · · · x j−1 ) that the j-th measurement results in the outcome 0. Lines 6-11 ensure that the outcome x j ∈ {0, 1} with higher probability of occurrence is selected at each step, guaranteeing Property (34) (because of Eq. (35)). The matrix Γ (j) computed in steps 12-18 is the covariance matrix of the post-measurement state Ψ (j) Each measurement is thus realized in time O(n 2 ) yielding the overall complexity of O(n 3 ).
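The greedy selection and the resulting guarantee |⟨x, Ψ⟩|² ≥ 2^{−n} are easy to illustrate with a dense state vector. The sketch below (our own illustration) operates directly on amplitudes rather than covariance matrices, so it is exponential in n and only meant to show the logic: at each step, the chosen branch retains at least half of the remaining probability mass.

```python
import numpy as np

def findsupport_dense(psi, n):
    """Greedy dense-statevector sketch of findsupport: fix the bits
    x_1, ..., x_n one at a time, always taking the outcome of higher
    conditional probability. Returns (x, <x|psi>)."""
    amp = psi.reshape([2] * n)  # axis j corresponds to mode j+1
    x = []
    for _ in range(n):
        p0 = np.sum(np.abs(amp[0]) ** 2)
        p1 = np.sum(np.abs(amp[1]) ** 2)
        b = 0 if p0 >= p1 else 1
        x.append(b)
        amp = amp[b]            # restrict to the chosen branch
    return tuple(x), complex(amp)
```

Since each chosen conditional probability is at least 1/2, the returned amplitude satisfies |⟨x, Ψ⟩|² ≥ 2^{−n}‖Ψ‖², matching Property (34).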
The algorithm relatebasiselements is more straightforward: Given x, y ∈ {0, 1} n , it outputs (α, ϑ) ∈ {0, 1} 2n ×R such that c(α) |x = e iϑ |y . That is, it finds a Majorana monomial c(α) which maps the basis state |x to |y up to a phase and computes the corresponding phase. In Fig. 2 we give pseudocode for this algorithm.
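For small n, the defining property of relatebasiselements can be checked by brute force over all Majorana monomials in a dense representation. This is our own correctness sketch (the actual algorithm computes (α, ϑ) directly and efficiently); conventions as before:

```python
import numpy as np
from functools import reduce
from itertools import product

def majoranas(n):
    """Dense Jordan-Wigner Majorana operators c_1, ..., c_{2n}."""
    Z = np.diag([1.0, -1.0])
    a = np.array([[0.0, 1.0], [0.0, 0.0]])
    cs = []
    for j in range(1, n + 1):
        aj = reduce(np.kron, [Z] * (j - 1) + [a] + [np.eye(2)] * (n - j)).astype(complex)
        cs.append(aj + aj.conj().T)
        cs.append(1j * (aj - aj.conj().T))
    return cs

def monomial(alpha, cs):
    """Self-adjoint Majorana monomial c(alpha)."""
    w = sum(alpha)
    ops = [cs[j] for j, b in enumerate(alpha) if b]
    dim = cs[0].shape[0]
    return (1j) ** (w * (w - 1) // 2) * reduce(np.matmul, ops, np.eye(dim, dtype=complex))

def relate_basis_elements_dense(x, y, cs):
    """Brute-force sketch: find (alpha, theta) with c(alpha)|x> = e^{i theta}|y>."""
    n = len(x)
    dim = 2 ** n
    ex = np.zeros(dim, complex); ex[int("".join(map(str, x)), 2)] = 1.0
    ey = np.zeros(dim, complex); ey[int("".join(map(str, y)), 2)] = 1.0
    for alpha in product([0, 1], repeat=2 * n):
        olap = np.vdot(ey, monomial(alpha, cs) @ ex)
        if np.isclose(abs(olap), 1.0):
            return alpha, float(np.angle(olap))
    raise ValueError("no Majorana monomial relates the two basis states")
```

A solution always exists, since each Majorana operator maps a number state to another number state up to a phase; the search simply returns the first monomial found.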
Figure 3: The algorithm overlaptriple takes as input the covariance matrices Γ_j of three Gaussian states Φ_j, j = 0, 1, 2, with identical parity, α ∈ {0, 1}^{2n}_+, and the overlaps ⟨Φ_0, Φ_1⟩ and ⟨Φ_1, c(α)Φ_2⟩. The latter have to be non-zero. The algorithm computes the overlap ⟨Φ_2, Φ_0⟩ using Eq. (24). Fig. 4 gives a graphical representation of what the algorithm overlaptriple achieves. These graphical representations will be helpful to construct and analyze other algorithmic building blocks.
(b) Applying overlaptriple provides the inner product ⟨Φ_0, Φ_2⟩. In this diagrammatic representation, this completes the triangle with vertices Φ_0, Φ_1, Φ_2. The algorithm convert takes a description d = (Γ, x, r) of a Gaussian state Ψ(d) and y ∈ {0, 1}^n such that ⟨y, Ψ(d)⟩ ≠ 0, and outputs a description d' = (Γ, y, s) of the same state. In other words, it converts a description d of the state involving the reference state |x⟩ to a description d' of the same state but involving a different reference state |y⟩. In Fig. 5 we give pseudocode for this algorithm.
(a) The input to the algorithm convert specifies a Gaussian state Ψ, x ∈ {0, 1}^n such that ⟨x, Ψ⟩ ≠ 0, the value r = ⟨x, Ψ⟩, and an element y ∈ {0, 1}^n such that ⟨y, Ψ⟩ ≠ 0. The value of ⟨y, Ψ⟩ is not given.
(b) The algorithm applies the subroutine relatebasiselements to find (α, ϑ) such that c(α) |y = e iϑ |x . In particular, after this step, the value x, c(α)y = e iϑ is known and it is non-zero.
(c) The algorithm then applies the subroutine overlaptriple to compute w = y, Ψ . The triple (Γ, y, w) is a valid description of Ψ.
Furthermore, denoting the output of convert(d, y) by d' = (Γ', y', s'), we have y' = y as well as Ψ(d') = Ψ(d) (36) and Γ' = Γ (37). Proof. Let us denote the input to convert by (d, y), where d = (Γ, x, r) ∈ Desc_n and y ∈ {0, 1}^n. Then

r = ⟨x, Ψ(d)⟩ ≠ 0     (38)

since d is a description of Ψ(d). Furthermore, for (α, ϑ) as defined in line 2 we have

⟨x, c(α)y⟩ = e^{iϑ} ≠ 0     (39)

by definition of the algorithm relatebasiselements. In line 3 of convert, the matrices Γ_j, j = 0, 1, 2, are the covariance matrices of the three states Ψ(d), |x⟩ and |y⟩. We note that Eq. (38) and the assumption ⟨y, Ψ(d)⟩ ≠ 0 imply that these three states have identical parity. The value w computed in line 6 using overlaptriple is equal to the overlap

w = ⟨y, Ψ(d)⟩     (40)

because the requirements of overlaptriple are met by Eqs. (38) and (39). Eq. (40) together with the assumption ⟨y, Ψ(d)⟩ ≠ 0 shows that the output (Γ, y, w) is a description of Ψ(d). This completes the proof of Eq. (36). Eq. (37) is trivially satisfied because the covariance matrix is returned unchanged. The complexity of the algorithm is dominated by overlaptriple, which takes time O(n³).

Computing overlaps and descriptions of evolved/measured states
Based on the subroutines findsupport, relatebasiselements, overlaptriple and convert, we can now describe our main algorithms overlap, evolve, measureprob and postmeasure for computing overlaps, simulating Gaussian unitary evolution, and computing the outcome probability and the post-measurement state when measuring the occupation number, respectively. We give pseudocode for each algorithm and establish the associated claims. We give pseudocode for the algorithm overlap in Fig. 7 and we illustrate it in Fig. 8.

(a) The input to the algorithm overlap consists of (descriptions of) two Gaussian states Ψ_1, Ψ_2, elements x_1, x_2 ∈ {0, 1}^n, and the overlaps r_j = ⟨x_j, Ψ_j⟩, j ∈ [2]. The latter are both assumed to be non-zero.
(b) The algorithm uses the subroutine relatebasiselements to find (α, ϑ) such that c(α)|x_2⟩ = e^{iϑ}|x_1⟩. In particular, this means that the value ⟨x_1, c(α)x_2⟩ = e^{iϑ} is computed, and it is non-zero. Furthermore, this implies that ⟨x_1, c(α)Ψ_2⟩ = e^{iϑ} r_2 is also known and non-zero.
(c) In the last step of the algorithm, the subroutine overlaptriple is applied to complete a triangle: This amounts to computing w = Ψ 2 , Ψ 1 . The algorithm returns the complex conjugate w = Ψ 1 , Ψ 2 .
Furthermore, by Eq. (43) the states (Φ_0, Φ_1, Φ_2) have identical parity. It thus follows from the properties of overlaptriple that the quantity w computed in line 10 is equal to ⟨Ψ(d_2), Ψ(d_1)⟩. Since the output of the algorithm is the complex conjugate w̄ = ⟨Ψ(d_1), Ψ(d_2)⟩, this implies the claim (41). The runtime of overlap is dominated by overlaptriple, hence it is of order O(n³).
We give pseudocode for the algorithm evolve in Fig. 9 and we illustrate it in Fig. 10.
(a) The input to the algorithm evolve is (the description of) a Gaussian state Ψ, x ∈ {0, 1} n and the non-zero overlap r = x, Ψ , as well as an element R ∈ Gen(O(2n)) associated with a Gaussian unitary U R .
(b) By unitarity of U R , the input data also provides the inner product U R x, U R Ψ = x, Ψ = r, which is non-zero.
(c) The algorithm invokes the subroutine findsupport applied to the covariance matrix Γ_0 of the evolved state U_RΨ in order to find an element y ∈ {0, 1}^n such that ⟨y, U_RΨ⟩ ≠ 0. (The value of this inner product is not computed/available at this point.) (d) Using the fact that the action of U_R for a generator R ∈ Gen(O(2n)) is local, the algorithm determines an element z ∈ {0, 1}^n such that s = ⟨z, U_Rx⟩ ≠ 0, and computes the value s.
(e) The algorithm then uses the subroutine relatebasiselements to find (α, γ) such that c(α)|y⟩ = e^{iγ}|z⟩. This means that the inner product ⟨z, c(α)y⟩ = e^{iγ} is known and non-zero. Since ⟨z, U_Rx⟩ is known and non-zero (as ensured by the previous step (10d)), the value ⟨U_Rx, c(α)y⟩ = e^{iγ}⟨U_Rx, z⟩ is also non-zero and can be computed.
(f) In the last step, the subroutine overlaptriple is used to compute the quantity w = y, U R Ψ . It is non-zero by step (10c). Thus (Γ 0 , y, w) is a valid description of U R Ψ. Figure 10: An illustration of the algorithm evolve. Dotted lines correspond to inner products whose value is non-zero, but has not been computed at that stage of the algorithm.
Lemma 3.8. The algorithm evolve : Gen(O(2n)) × Desc_n → Desc_n given in Fig. 9 runs in time O(n³). Consider an arbitrary generator R ∈ Gen(O(2n)) and a description d ∈ Desc_n. Then

Ψ(evolve(R, d)) = U_RΨ(d),     (45)

that is, the output of evolve is a description of the evolved state U_RΨ(d). Furthermore, denoting the output by d' = (Γ', x', r') = evolve(R, d), we have |r'|² ≥ 2^{−n}, i.e., the output is a good description. Proof. Let us denote the input of evolve by (R, d) where R ∈ Gen(O(2n)) and d = (Γ, x, r) ∈ Desc_n. The state U_RΨ(d) has covariance matrix Γ_0 = RΓR^T computed in line 2 (see Section 2.7). By the properties of findsupport (see Lemma 3.4), the element y = findsupport(Γ_0) ∈ {0, 1}^n computed in line 3 is such that

|⟨y, U_RΨ(d)⟩|² ≥ 2^{−n}.     (46)

In particular, this overlap is non-zero. The remainder of the algorithm computes ⟨y, U_RΨ(d)⟩.
We first show the following: Claim 3.9. The element z ∈ {0, 1}^n and the value s computed by the algorithm satisfy s = ⟨z, U_Rx⟩ (47) and s ≠ 0 (48). Proof. Here we are using the fact that for any generator R ∈ Gen(O(2n)), the associated Gaussian unitary U_R has a local action on the mode operators. In particular, we can easily compute the image U_R|x⟩ of a number state |x⟩ under U_R. We distinguish two cases: (i) R = R_{j,k}(ϑ) is a Givens rotation: In this case, R is associated with the unitary evolution operator U_{j,k}(ϑ) = exp((ϑ/2) c_j c_k) = cos(ϑ/2) I + sin(ϑ/2) c_j c_k.
In particular, comparing with (49), it follows immediately that the claims (47) and (48) are satisfied.
(ii) R = R_j, j ∈ [2n], is a reflection (see Lines 12-15): Here R is associated with the unitary evolution operator U_j = c_j. Its action on |x⟩, x ∈ {0, 1}^n, is described by Eq. (50). This state is proportional to |x ⊕ e_j⟩, showing that the choice of z made by the algorithm indeed ensures that the claims (47) and (48) are satisfied.
Equipped with Claim 3.9, we can show that the algorithm evolve has the desired functionality. The matrix Γ_0 computed in Line 2 is the covariance matrix of the evolved state U_RΨ(d), whereas Γ_1, Γ_2 computed in Line 17 are the covariance matrices of U_R|x⟩ and |y⟩, respectively. Thus overlaptriple in Line 20 is invoked on the triple of states U_RΨ(d), U_R|x⟩ and |y⟩. To check that the requirements of overlaptriple are satisfied, first observe that ⟨U_Rx, U_RΨ(d)⟩ = ⟨x, Ψ(d)⟩ = r. Furthermore, this is non-zero because r (part of the input) is non-zero by definition of the description d = (Γ, x, r) of Ψ(d).
We conclude from the properties of overlaptriple that the value w computed in Line 20 equals ⟨y, U_RΨ(d)⟩. By construction of y using findsupport, we have |⟨y, U_RΨ(d)⟩|² ≥ 2^{−n}, see Eq. (46). In particular, we conclude that the triple (Γ_0, y, w) is a valid description of U_RΨ(d) with the desired property (45). This is what the algorithm returns.
The runtime of the algorithm evolve is dominated by the runtime O(n 3 ) of the algorithm overlaptriple.
We give pseudocode for the algorithm measureprob in Fig. 11. Lemma 3.10. The algorithm measureprob given in Fig. 11 has runtime O(1) and outputs the probability ⟨Ψ(d), Π_j(s)Ψ(d)⟩ of obtaining the outcome s, where Π_j(s) = (1/2)(I + (−1)^s i c_{2j−1}c_{2j}) is the projection onto the eigenvalue-s eigenspace of a_j†a_j. Proof. We denote the input to measureprob by (d, j, s), where d = (Γ, x, r) ∈ Desc_n is a description of a state Ψ(d), j ∈ [n] and s ∈ {0, 1}. Given the state Ψ(d), the probability of obtaining measurement outcome s when measuring the occupation number operator a_j†a_j is given by Eq. (26). This is the output of the algorithm in line 2 and gives the claim. Computing line 2 requires a constant number of arithmetic operations, giving the runtime O(1).
We give pseudocode for the algorithm postmeasure in Fig. 12 and we illustrate it in Fig. 13.
Figure 12: Pseudocode for the algorithm postmeasure (fragment): it computes the covariance matrix Γ' of the post-measurement state Ψ' and then constructs a description of Ψ'.
(b) After computing the covariance matrix Γ of the postmeasurement state Ψ , the algorithm uses the subroutine findsupport to find an element y ∈ {0, 1} n such that y, Ψ = 0. (The value of this inner product is not computed at this point.) The subroutine relatebasiselements is then used to find (α, ϑ) such that c(α) |y = e iϑ |x .
(d) The algorithm overlaptriple is used to compute the overlap w = y, Ψ .
(e) As argued in the proof of Lemma 3.11, the inner product y, Ψ can be computed from w and the probability p: it is equal to y, Ψ = w/ √ p.
Thus (Γ', y, w/√p) is a description of Ψ'. In particular, this overlap is also non-zero. The requirements to run overlaptriple in Line 14 are therefore met, and Line 14 returns the value w = ⟨y, Π_j(s)Ψ(d)⟩. It remains to show that (Γ', y, w/√p) (the expression returned by the algorithm) is a valid description of the post-measurement state Ψ', and to establish the bound |w/√p|² ≥ 2^{−n} (58) in order to prove Eq. (54).
because Π_j(s) is self-adjoint and with the Cauchy-Schwarz inequality. In particular, we have Π_j(s)y ≠ 0 and thus

Π_j(s)y = y     (60)

since any number state |y⟩ is an eigenvector of the projection Π_j(s). Inserting (60) into (59) and using (57), we obtain the bound establishing (58). Eq. (60) and the self-adjointness of Π_j(s) also imply that ⟨y, Ψ'⟩ = ⟨y, Π_j(s)Ψ(d)⟩/√p = w/√p. Since Γ' is the covariance matrix of Ψ' and p = ‖Π_j(s)Ψ(d)‖² is the probability of obtaining outcome s when measuring a_j†a_j, this shows that (Γ', y, w/√p) is a valid description of Ψ' as claimed.
The complexity of the algorithm is dominated by overlaptriple, which takes time O(n^3).

Initial states for computation
Using the algorithm evolve, it is straightforward to generate a description of a state that is obtained by applying a sequence of Gaussian unitaries (generators) to the vacuum state. This is all that is typically needed to describe initial states. In cases where we do not need to fix the overall phase, we can generate a description from the covariance matrix. The algorithm describe takes as input the covariance matrix Γ of a Gaussian state Ψ and outputs a description d ∈ Desc n of a Gaussian state which is equal to Ψ up to a global phase. It is given in Fig. 14 and it simply uses the subroutine findsupport and Eq. (22).
Proof. Let Γ ∈ Mat_{2n×2n}(R) be a covariance matrix and let Ψ be a Gaussian state with covariance matrix Γ. By definition of the algorithm findsupport, the value y ∈ {0, 1}^n computed in Line 3 satisfies ⟨y, Ψ⟩ ≠ 0. By Eq. (22), the value r computed in Line 5 satisfies r = |⟨y, Ψ⟩|. In particular, there is an angle ϑ ∈ [0, 2π) such that ⟨y, Ψ⟩ = e^{−iϑ} r. It follows immediately that d = (Γ, y, r) is a valid description of the Gaussian state e^{iϑ}Ψ = Ψ(d) with the required property (61).

Classical simulation of fermionic Gaussian circuits with non-Gaussian initial states
In this section, we argue that the techniques developed in Section 3 to describe fermionic Gaussian states (including relative phases) give rise to efficient classical simulation algorithms for computations composed of non-Gaussian initial states, Gaussian unitaries and occupation number measurements. Specifically, we argue that algorithms developed in the context of stabilizer circuits can immediately be translated to this fermionic setup. Furthermore, this translation maintains runtime bounds when the stabilizer extent is replaced by the fermionic Gaussian extent. Because this adaptation procedure is general (it applies to a variety of simulation algorithms, both for strong and for weak simulation), we restrict our attention to the key substitutions.
Our algorithms apply to the efficient classical simulation of fermionic circuits of the following form, involving n fermions.
(i) The initial state Ψ(0) = Ψ is a possibly non-Gaussian state. We assume that its fermionic Gaussian extent ξ(Ψ) and a corresponding optimal decomposition into a superposition of Gaussian states are known. This is the case, for example, for any four-fermion state, or a tensor product of two four-fermion states, see Section 5. Alternatively, we may assume that an upper bound ξ̄(Ψ) ≥ ξ(Ψ) and a corresponding decomposition of Ψ achieving this value are known: In this case, runtime upper bounds will depend on ξ̄(Ψ) instead of ξ(Ψ).
(ii) The computation proceeds in a sequence of timesteps. At each step t ∈ [T ], one of the following is performed: (a) A Gaussian unitary U R , R ∈ Gen(O(2n)) is applied to the state. Here the choice of R may depend (in an efficiently computable manner) on measurement results obtained previously. We will leave this dependence implicit and do not take it into account in our runtime estimates, as it will typically depend heavily on the circuit considered.
(b) An occupation number measurement, i.e., measurement of the operator a † j a j for some j ∈ [n] is performed, yielding a measurement outcome s ∈ {0, 1} and a corresponding postmeasurement state. The choice of the mode j ∈ [n] to be measured may again depend (in an efficient manner) on the measurement outcomes already obtained.
We note that the restriction to the set of Gaussian unitaries associated with generators of O(2n) in (ii a) incurs no loss of generality, at the cost of possibly increasing T by a factor of order O(n^2) and adding a term of order O(n^3) to the runtime: a decomposition of an arbitrary element R ∈ O(2n) of the form (19) as a product of L ≤ O(n^2) generators can be found in time O(n^3); see the discussion below Theorem 3.2.
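A generic way to obtain such a factorization is a QR-style reduction by Givens rotations, sketched below. This is a standard numerical routine, not necessarily the specific decomposition of the form (19) referenced in Theorem 3.2; it produces L ≤ n(n−1)/2 = O(n^2) rotations in O(n^3) time, each acting non-trivially on only two coordinates.

```python
import numpy as np

def givens_decompose(R, tol=1e-12):
    """Reduce an orthogonal matrix R to a diagonal sign matrix D by
    left-multiplying with Givens rotations G_1, ..., G_L, so that
    R = G_1^T ... G_L^T D.  Returns the rotation list and D."""
    n = R.shape[0]
    A = R.copy()
    rots = []
    for col in range(n):
        # zero out the subdiagonal entries of this column, bottom up
        for row in range(n - 1, col, -1):
            a, b = A[row - 1, col], A[row, col]
            if abs(b) < tol:
                continue
            r = np.hypot(a, b)
            c, s = a / r, b / r
            G = np.eye(n)
            G[row - 1, row - 1] = c
            G[row - 1, row] = s
            G[row, row - 1] = -s
            G[row, row] = c
            A = G @ A
            rots.append(G)
    # for orthogonal input, the reduced matrix is diagonal with +/-1 entries
    return rots, np.diag(np.diag(A))

def reconstruct(rots, D):
    """Rebuild R from its Givens factorization."""
    R = D
    for G in reversed(rots):
        R = G.T @ R
    return R
```

Each rotation mixes a single pair of coordinates, which is the sense in which a general orthogonal transformation reduces to a product of elementary generators.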
The use of arbitrary initial states Ψ in (i) allows us to model, in particular, the application of certain "magic gates" using so-called gadgets. These can be realized by using non-Gaussian auxiliary states combined with Gaussian operations, see e.g., [12,13]. Since all 1-, 2- and 3-fermion states are Gaussian [31], 4-fermion states provide the smallest non-trivial examples; these will also be our main focus in Section 5. We refer to e.g., [12,13] for a discussion of these constructions.
We proceed as follows: In Section 4.1, we formulate in general terms how simulation algorithms for a model can be generalized to initial states that are superpositions; this follows known approaches for stabilizer circuits augmented by magic states. In Section 4.2 we review the relationship between the D-extent and the D-rank defined by a dictionary D. In Section 4.3 we discuss fast algorithms for estimating norms of superpositions of dictionary states. In Section 4.4 we apply these constructions to our setup.

Extending simulation algorithms to superpositions
Here we discuss how to extend simulation algorithms for an efficiently simulable model (D, E, M) in such a way that the resulting extended algorithms (χevolve, χmeasureprob, χpostmeasure) work with any initial state Ψ which is a superposition of χ elements of D (i.e., has D-rank bounded by χ_D(Ψ) ≤ χ). Our discussion is standard and is included only for the reader's convenience: it follows the treatment of stabilizer states discussed in [18].
Recall that the dictionary D is a set of states, E a set of operations and M a set of measurements. In addition to the subroutines evolve, measureprob and postmeasure for evolution and measurement associated with (D, E, M), the construction discussed here requires an efficient algorithm overlap which computes inner products ⟨Ψ(d_1), Ψ(d_2)⟩ from descriptions (d_1, d_2) ∈ Desc_n^2. This means that the description d ∈ Desc_n of a state Ψ(d) must include phase information. For Gaussian states, the covariance matrix formalism has to be extended as discussed in Section 3.
Our goal is to find classical simulation algorithms for circuits of the following form: (i) The initial state Ψ(0) = Ψ is a superposition of χ states {ϕ_j}_{j=1}^χ ⊂ D with complex coefficients {γ_j}_{j=1}^χ. We assume that this decomposition is explicitly provided as an input to the classical algorithm in the form of a χ-tuple {(γ_j, d_j)}_{j=1}^χ, where d_j is an efficient classical description of the state ϕ_j.

The simulation maintains a tuple {(γ_j^{(t)}, d_j^{(t)})}_{j=1}^χ describing the instantaneous state Ψ(t) after step t as a linear combination of vectors belonging to the dictionary D, and the subroutines (χevolve, χmeasureprob, χpostmeasure) are used to successively update this description (respectively, to sample from the corresponding measurement outcomes). Before describing the extended routines χevolve, χmeasureprob, χpostmeasure in more detail, it is convenient to introduce a subroutine χnorm which takes as input a tuple {(γ_j, d_j)}_{j=1}^χ ∈ (C × Desc_n)^χ and outputs the squared norm ‖Σ_{j=1}^χ γ_j Ψ(d_j)‖^2. It is clear that such a primitive can be realized naively by using the algorithm overlap for computing inner products. This naive implementation, which we refer to as χnaivenorm, requires time O(χ^2) · time(overlap). Let us now describe the procedures χevolve, χmeasureprob and χpostmeasure, building on a (general) norm computation subroutine χnorm.
(a) if an evolution operation E ∈ E with description d_E is applied at time t, then the description is updated by setting d_j^{(t)} = evolve(d_j^{(t−1)}, d_E) for each j ∈ [χ] (the coefficients remain unchanged). This defines the algorithm χevolve. The runtime of this update is time(χevolve) = χ · time(evolve).
(b) if a (projective) measurement M = {M_s}_{s∈M} ∈ M with description d_M is applied to the state at time t, then the probability of obtaining s ∈ M is given by

p(s|Ψ^{(t−1)}) = ‖Σ_{j=1}^χ γ_j √(p(s|Ψ_j^{(t−1)})) Ψ(d_j(s))‖^2 ,

where d_j(s) is a description of the post-measurement state Ψ_j^{(t−1)}(M, s). Here the probability p(s|Ψ^{(t−1)}) is the squared norm of a superposition of elements from D. This expression (together with the norm computation routine χnorm) defines the algorithm χmeasureprob. In particular, given {γ_j, d_j(s), p(s|Ψ_j^{(t−1)})}_{j=1}^χ, the probability p(s|Ψ) can be evaluated (exactly) in runtime time(χnorm). Since χmeasureprob first has to compute the descriptions d_j(s) of the post-measurement states Ψ_j^{(t−1)}(M, s) and the probabilities {p(s|Ψ_j^{(t−1)})}_{j=1}^χ, its runtime is time(χmeasureprob) = time(χnorm) + χ · (time(measureprob) + time(postmeasure)).
One can easily verify the form of the post-measurement state after time step t. In particular, this means that (similarly as for χmeasureprob) we have an algorithm χpostmeasure which, given {(γ_j, d_j)}_{j=1}^χ and the observed outcome s, outputs an updated tuple describing the post-measurement state. Given the ability to compute p(s|Ψ) and assuming, e.g., that the number |M| of measurement outcomes is constant, one can then sample from this distribution (when the goal is to perform weak simulation) to obtain an outcome s ∈ M.
Using the naive algorithm χnaivenorm for χnorm gives the runtimes stated in Eq. (63).
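The Gram-matrix expansion behind χnaivenorm can be sketched as follows: ‖Σ_j γ_j Ψ(d_j)‖^2 = Σ_{j,k} γ̄_j γ_k ⟨Ψ(d_j), Ψ(d_k)⟩, at the cost of χ^2 calls to overlap. In the sketch, explicit vectors stand in for state descriptions and a plain inner product for the overlap subroutine; only the call structure is representative.

```python
import numpy as np

def chi_naive_norm(coeffs, descs, overlap):
    """Squared norm of sum_j coeffs[j] * Psi(descs[j]), computed via the
    Gram-matrix expansion with O(chi^2) calls to the overlap subroutine."""
    chi = len(coeffs)
    total = 0.0 + 0.0j
    for j in range(chi):
        for k in range(chi):
            total += np.conj(coeffs[j]) * coeffs[k] * overlap(descs[j], descs[k])
    return total.real  # the squared norm is real

# Toy stand-ins: "descriptions" are explicit vectors, overlap is <.,.>.
overlap = lambda d1, d2: np.vdot(d1, d2)
```

For actual Gaussian-state descriptions, overlap would be the phase-sensitive inner-product routine of Section 3; the quadratic scaling in χ is unchanged.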

Sparsification: Relating D-extent to approximate D-rank
Algorithms whose complexity depends on the D-extent ξ_D(Ψ) instead of the (exact) D-rank χ_D(Ψ) (see Eq. (3)) of the initial state Ψ can be obtained as follows. The idea consists in replacing Ψ by a state Ψ̃ which is δ-close to Ψ and has bounded D-rank. More precisely, it relies on the following result, which connects the D-extent ξ_D(Ψ) to the approximate D-rank χ_D^δ(Ψ) defined in Eq. (4).

Theorem 4.1 (cf. Theorem 1 in [18]). Suppose Ψ = Σ_{j=1}^m γ_j ϕ_j is a decomposition of a normalized vector Ψ into a superposition of elements {ϕ_j}_{j=1}^m belonging to the dictionary D. Then

χ_D^δ(Ψ) ≤ 1 + ‖γ‖_1^2 / δ^2 ,

where ‖γ‖_1 = Σ_{j=1}^m |γ_j| is the 1-norm of γ. In particular, we have the relationship χ_D^δ(Ψ) ≤ 1 + ξ_D(Ψ)/δ^2.

In [18], this result was established for the dictionary D = STAB_n consisting of n-qubit stabilizer states. Inspection of the proof immediately shows that the statement is applicable to any dictionary D (independently of, e.g., whether or not it is finite). In particular, Theorem 4.1 implies that in runtime upper bounds, the quantity χ_D can always be replaced by the (potentially much smaller) quantity ξ_D(Ψ)/δ^2, at the cost of introducing an O(δ)-error in L^1-distance in the sampled distribution. For example, using the naive norm estimation algorithm (i.e., inserting into (63)), this gives a quadratic scaling (for computing output probabilities) in ξ_D(Ψ). Note that here we are assuming that a decomposition of Ψ with squared L^1-norm ‖γ‖_1^2 of coefficients achieving ξ_D(Ψ) is given.
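The sparsification construction underlying results of this kind can be sketched as follows: sample k indices j independently with probability |γ_j|/‖γ‖_1 and average the corresponding phase-aligned dictionary states. The resulting random state has D-rank at most k, is an unbiased estimator of Ψ, and satisfies E‖Ψ̂ − Ψ‖^2 ≤ ‖γ‖_1^2/k. The toy below uses explicit vectors in place of dictionary states; it is an illustrative sketch, not the construction used in [18].

```python
import numpy as np

def sparsify(coeffs, states, k, rng):
    """Return a k-term random approximation of Psi = sum_j coeffs[j]*states[j].
    Each sample contributes (||gamma||_1 / k) * (gamma_j / |gamma_j|) * phi_j,
    which makes the estimator unbiased: E[Psi_hat] = Psi."""
    coeffs = np.asarray(coeffs, dtype=complex)
    l1 = np.sum(np.abs(coeffs))
    probs = np.abs(coeffs) / l1           # sampling weights |gamma_j| / ||gamma||_1
    idx = rng.choice(len(coeffs), size=k, p=probs)
    phases = coeffs[idx] / np.abs(coeffs[idx])
    return (l1 / k) * np.sum(phases[:, None] * np.asarray(states)[idx], axis=0)
```

The variance bound ‖γ‖_1^2/k is what turns the 1-norm of the decomposition (and hence the extent) into the relevant cost parameter.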

Fast norm estimation and approximate simulation
In pioneering work, Bravyi and Gosset [17] improved upon the sketched algorithm in the case of stabilizer states. This was achieved by replacing the O(χ^2)-runtime (naive) estimation algorithm χnaivenorm for the norm of a superposition of stabilizer states by a probabilistic algorithm χfastnorm. With success probability at least 1 − p_f, the algorithm χfastnorm provides an estimate N̂ of the squared norm N = ‖Ψ‖^2 of a superposition Ψ = Σ_{j=1}^χ γ_j ϕ_j of n-qubit stabilizer states {ϕ_j}_{j=1}^χ with multiplicative error ε (i.e., N̂ ∈ [(1 − ε)N, (1 + ε)N]), and has runtime O(χ · n^3 ε^{−2} p_f^{−1}). The key observation underlying the algorithm is the fact that the norm of interest can be expressed as in Eq. (64), i.e., it is proportional to the expected (squared) overlap of Ψ with a state |Θ⟩ drawn uniformly at random from the set of n-qubit stabilizer states. Given {γ_j, ϕ_j}_{j=1}^χ, the algorithm proceeds by taking R = p_f^{−1} ε^{−2} stabilizer states Θ_1, . . . , Θ_R chosen uniformly from the set of all stabilizer states, and producing the estimate N̂ of N via expression (65). Importantly, expression (65) can be computed from (the descriptions of) {Θ_k}_{k=1}^R, {ϕ_j}_{j=1}^χ and the coefficients {γ_j}_{j=1}^χ, using χ · R calls of a subroutine overlap which computes the overlap of two stabilizer states (including phases). This is because each summand in (65) can be written as a sum of χ such products. The resulting runtime of this norm estimation algorithm is thus O(χ · R · time(overlap)), which amounts to the claimed runtime O(χ · n^3 ε^{−2} p_f^{−1}). We note that a similar reasoning can be applied to any situation where the norm of a superposition of dictionary elements can be expressed as in Eq. (64), namely as the expected squared overlap of Ψ with a state Θ randomly chosen according to a suitable distribution over dictionary states. Specifically, as derived in Appendix B of [58] and discussed below (see Section 4.4), this is the case for the set of fermionic Gaussian states.
The corresponding norm estimation algorithm then has a runtime of the form

O(R · (time(samplestate) + χ · time(overlap))) ,   (67)

where R is the number of samples (scaling as ε^{−2} p_f^{−1} up to a distribution-dependent factor) and samplestate is an algorithm producing a description of a state Θ drawn randomly from the appropriate distribution. Importantly, the runtime (67) is linear in χ, resulting in a linear dependence when replacing χ by the D-extent ξ_D(Ψ) as discussed in Section 4.2. Algorithms (approxevolve, approxmeasureprob, approxpostmeasure) can now be obtained by using χfastnorm in place of χnorm. The algorithm approxevolve is identical to χevolve since it does not involve norm computations. In contrast, approxmeasureprob is a probabilistic algorithm that can fail with probability p_f, and both approxmeasureprob and approxpostmeasure introduce an error (in the sampled distribution and the post-measurement state, respectively). This is because χfastnorm only estimates the norm of a superposition.
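The structure of χfastnorm can be illustrated with a toy dictionary for which an identity of the form (64) is easy to verify: for Θ drawn uniformly from the computational basis of C^d, one has E|⟨Θ, Ψ⟩|^2 = ‖Ψ‖^2/d, so d times the empirical average of |⟨Θ_k, Ψ⟩|^2 estimates the squared norm. The actual algorithm replaces the basis by random stabilizer (or Gaussian) states; the function below is our illustrative stand-in, with each sample costing χ overlap evaluations.

```python
import numpy as np

def fast_norm_estimate(coeffs, states, num_samples, rng):
    """Monte Carlo estimate of ||sum_j coeffs[j] * states[j]||^2 using
    random computational-basis states as the toy 'dictionary' Theta_k."""
    d = len(states[0])
    total = 0.0
    for _ in range(num_samples):
        theta = rng.integers(d)  # Theta_k = |theta>, uniform over the basis
        # overlap <Theta_k, Psi> = sum_j coeffs[j] * <Theta_k, phi_j>:
        # one "overlap call" per term, chi calls per sample
        amp = sum(c * phi[theta] for c, phi in zip(coeffs, states))
        total += abs(amp) ** 2
    return d * total / num_samples
```

The cost per sample is linear (not quadratic) in the number of terms χ, which is the point of the fast estimator.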
Finally, replacing χ by the D-extent ξ_D(Ψ) (see Section 4.2) results in a triple of approximate algorithms (approxevolve, approxmeasureprob, approxpostmeasure) with parameters (ε, δ, p_f) describing the quality of approximation and the failure probability, as discussed in Section 1.5. By construction, the runtimes of these algorithms are as given in Eq. (68).

Fermionic linear optics with non-Gaussian initial states
The algorithms described above can be adapted in a straightforward manner to the problem of classically simulating fermionic linear optics with non-Gaussian initial states: We can simply use the efficient description of Gaussian states introduced in Section 3 and make use of the associated procedures overlap, evolve as well as measureprob and postmeasure. In particular, observe that combining Eq. (63) with the runtimes O(n 3 ) for the algorithms evolve, postmeasure and overlap, and O(1) for measureprob (see Section 3) results in the runtimes given in Table 3 for exact simulation.
To get a linear scaling in the Gaussian extent ξ_{G_n}(Ψ) of the initial state (for approximate simulation), the naive norm estimation needs to be replaced. A fast norm estimation scheme for superpositions of fermionic Gaussian states has been described in Appendix C of Ref. [58]: Consider the following probabilistic process defined for a superposition Ψ = Σ_{j=1}^χ γ_j ϕ_j, ϕ_j ∈ G_n, γ_j ∈ C of n-mode fermionic Gaussian states: (i) Sample K random Gaussian states {Θ_k}_{k=1}^K independently and identically from the distribution induced by picking a permutation π ∈ S_{2n} and a string y ∈ {0, 1}^n uniformly at random and outputting |Θ(π, y)⟩ = U_{R_π}|y⟩.
A description of a state proportional to Θ_k can be computed from the associated pair (π_k, y_k) ∈ S_{2n} × {0, 1}^n using the subroutine describe, for each k ∈ [K], as follows (see Fig. 15 for pseudocode for the algorithm). The covariance matrix Γ_k = R_{π_k} Γ(|y_k⟩) R_{π_k}^† of such a state can be computed in time O(n^3) from (π_k, y_k), and applying describe to Γ_k gives the desired description. We note that any such state can be used in place of Θ_k since the expression (69) (and, in particular, (66)) does not depend on the global phase of Θ_k. With the definition (69), it follows that the probabilistic process described here can be simulated in time given by Eq. (67) using K calls to the subroutine samplestate shown in Fig. 15, and subsequent use of overlap to compute the empirical average (69).
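The covariance-matrix computation inside samplestate can be sketched as follows. We assume the number-state covariance matrix Γ(|y⟩) = ⊕_j (−1)^{y_j} [[0, 1], [−1, 0]] (a sign convention that may differ from Fig. 15) and represent R_π by the 2n×2n permutation matrix.

```python
import numpy as np

def number_state_covariance(y):
    """Covariance matrix of the number state |y>, as a direct sum of
    2x2 blocks (-1)^{y_j} [[0, 1], [-1, 0]] (assumed sign convention)."""
    n = len(y)
    gamma = np.zeros((2 * n, 2 * n))
    for j, yj in enumerate(y):
        gamma[2 * j:2 * j + 2, 2 * j:2 * j + 2] = \
            (-1) ** yj * np.array([[0.0, 1.0], [-1.0, 0.0]])
    return gamma

def samplestate_covariance(n, rng):
    """Sample (pi, y) uniformly at random and return the covariance matrix
    R_pi Gamma(|y>) R_pi^T of the random Gaussian state Theta(pi, y)."""
    pi = rng.permutation(2 * n)
    y = rng.integers(0, 2, size=n)
    R = np.eye(2 * n)[pi]                 # permutation matrix for pi
    return R @ number_state_covariance(y) @ R.T
```

The output is antisymmetric and squares to −I, the defining properties of a pure-state covariance matrix, which is what the subsequent call to describe requires.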
Because the runtimes of describe and overlap are both upper bounded by O(n^3), this leads to an overall runtime of O(n^{7/2} ε^{−2} p_f^{−1} χ) for this algorithm for computing the estimate N̂ of ‖Ψ‖^2. We note that this conclusion about the runtime was also reached in Ref. [58], although the issue of a potential lack of a phase reference applicable throughout the computation was not considered there. This issue is resolved by our description of Gaussian states, see Section 3.
Combining this algorithm with the runtimes given in Eq. (68), and with the runtimes O(n^3) for the algorithms evolve, postmeasure and overlap and O(1) for measureprob (see Section 3), gives the runtimes claimed in Table 4 for the algorithms approxevolve, approxmeasureprob and approxpostmeasure.

Efficient additive-error strong simulation
In a different direction of generalization, building upon the work [17] and making innovative use of a concentration inequality by Hayes [60] for vector martingales, Ref. [26] gives a randomized algorithm which, for a state Ψ obtained by applying n-qubit Clifford gates and t (non-Clifford) T-gates to |0⟩^{⊗n}, provides an additive-error estimate p̂(x) for the probability p(x) = ‖(⟨x| ⊗ I^{⊗(n−a)})|Ψ⟩‖^2 of observing a qubits in the state |x⟩, with x ∈ {0, 1}^a. The algorithm is based on a procedure by which the probability p(x) of interest is expressed in terms of the squared norm ‖(⟨0|^{⊗(t−r)} ⊗ I^{⊗r}) W |Ψ′⟩‖^2 of a partially projected state, where Ψ′ is a tensor product of t non-stabilizer single-qubit states (arising from gadgetization), W a certain Clifford unitary, and r a circuit-dependent integer. The failure probability of the constructed algorithm is then upper bounded (see [26, Theorem 3]) by an expression depending on p(x), the error ε, the stabilizer extent ξ_{STAB_n}(Ψ′) (taking the role of χ) of the product state Ψ′, as well as two additional parameters that can be chosen freely but enter into the (polynomial) runtime estimate. The described method of adapting fast algorithms for simulating Clifford circuits with non-stabilizer initial states can be applied in a similar manner to this algorithm, since it also reduces to computing inner products (including phases) between Gaussian states.

Multiplicativity of the Gaussian fidelity for 4 fermions
The main result of this section is a proof that the fermionic Gaussian fidelity is multiplicative for the tensor product of any two positive-parity 4-fermion states.
We begin in Section 5.1 by laying out some specific properties of 4-fermion states. We discuss a Gaussianity condition specific to 4-fermion states [56] and we write an explicit expression for any 4-fermion state as a superposition of two orthogonal (Gaussian) states. This was first introduced in Refs. [54,56]. In Section 5.2 we establish properties of the fermionic Gaussian fidelity for 4-fermion states which are subsequently used in Section 5.3 to prove that the fermionic Gaussian fidelity is multiplicative for the tensor product of any two positive-parity 4-fermion states.

Four-fermion Gaussian and non-Gaussian states
Key to our considerations is a certain antiunitary map θ acting on H_4^+, the positive-parity subspace of 4 fermions spanned by {|x⟩}_{x∈{0,1}^4_+}. It is defined by its action

θ|x⟩ = ϑ(x)|x̄⟩ for x ∈ {0, 1}^4_+

on basis states (antilinearly extended to all of H_4^+), where ϑ(x) = (−1)^{x_1+x_3} = (−1)^{x_2+x_4} = ϑ(x̄). Here x̄ = (x̄_1, . . . , x̄_4) is obtained by flipping each bit of x. The relevant properties of this map are the following.

Lemma 5.1. A state Ψ ∈ H_4^+ is Gaussian if and only if ⟨θΨ, Ψ⟩ = 0.

We note that this statement has been given in [56, Eq. (9)], along with a negative-parity version.

Proof. This follows from the Gaussianity criterion given in Lemma 2.3. We give the proof in Appendix A.
Lemma 5.2. We have θc j c k = c j c k θ for all j, k ∈ [8].
Proof. See Appendix B.
Let Ψ be such that (73) holds. We define "real" and "imaginary" parts Ψ_1 and Ψ_2 of Ψ. It follows immediately from this definition that these have the required properties, where we used that θ is an antiunitary in the first step, and assumption (73) in the last step. The claim now follows with this choice of (Ψ_1, Ψ_2).

Theorem 5.4 ([54,56]). Let Ψ ∈ H_4^+ be an arbitrary unit vector. Then there are a Gaussian pure state Ψ_g ∈ G_4^+, ϕ ∈ [0, 2π) and f ∈ [1/2, 1] such that the state θΨ_g is Gaussian and orthogonal to Ψ_g and

Ψ = √f Ψ_g + e^{iϕ} √(1 − f) θΨ_g .   (74)

The triple (Ψ_g, ϕ, f) is uniquely defined by Ψ, i.e., a function of Ψ. Furthermore, the quantity f = f(Ψ) is invariant under the action of Gaussian unitaries associated with special orthogonal rotations: We have

f(U Ψ) = f(Ψ)   (75)

for any Gaussian unitary U = U_R with R ∈ SO(2n).
Rewriting Eq. (72) by expressing (Ψ_1, Ψ_2) in terms of (Ψ_g^+, Ψ_g^−) gives the claim (74) together with (76). It remains to show property (75) of the function f. This follows immediately from the fact that the antiunitary θ commutes with all quadratic monomials c_j c_k of Majorana operators (see Lemma 5.2), and hence with any Gaussian unitary U = U_R with R ∈ SO(2n), i.e., Uθ = θU. Retracing the steps of the proof, it is easy to check that if (Ψ_1, Ψ_2) are the states of Lemma 5.3 and Ψ_g the state in expression (74) for Ψ, then the corresponding states (Ψ_1′, Ψ_2′) and Ψ_g′ for the state Ψ′ = UΨ are given by Ψ_j′ = UΨ_j for j ∈ [2] and Ψ_g′ = UΨ_g, respectively. This implies the claim.

The Gaussian fidelity for 4-fermion states
For a subset E ⊂ {0, 1}^4, we define Ē = {x̄ | x ∈ E}. We also write Π_E = Σ_{x∈E} |x⟩⟨x| for the projection onto the span of {|x⟩}_{x∈E}.
Proof. Let f = f (Ψ) ∈ [1/2, 1], ϕ ∈ [0, 2π) and Ψ g ∈ G + n be as in Theorem 5.4 such that We define We claim that we have the identities Observe that these two identities immediately imply (77): Using expression (78), we have where we defined the vectors α = (α x ) x∈E , β = (β x ) x∈E ∈ C 4 . Since (79) and (80) are equivalent to the statement that we obtain by the Pythagorean theorem in C 4 (using (83)) and by maximizing over (α, β) satisfying (82). Inserting (84) into (81) results in the upper bound (77) on Π E Ψ 2 . It remains to prove the claimed identities (79) and (80). We argue that these are a consequence of the fact that Ψ g is normalized and Gaussian, respectively.
Proof. Observe that by definition of the antiunitary θ, we have the relation (85) between the coefficients of Ψ_g on E and on Ē. Eq. (79) now follows from the fact that Ψ_g is normalized and has positive parity: here we use the definition of α_x and (85) in the first step, and the assumption E ∪ Ē = {0, 1}^4_+ in the second identity. Similarly, Eq. (80) is a consequence of the fact that Ψ_g is Gaussian: here we use the definition of α_x and β_x in the first step, the fact that Ψ_g ∈ H_4^+ in the second step, and the characterization of Gaussianity from Lemma 5.1 in the last identity.
Lemma 5.5 immediately implies the following expression for the fermionic Gaussian fidelity. We note that a more general expression for the "Gaussian fidelity" of a mixed state has previously been obtained in [56]. The proof for pure states given here is more elementary and illustrates the use of Lemma 5.5.
Theorem 5.6 (Fermionic Gaussian fidelity for 4-mode pure states [54,56]). Let Ψ ∈ H_4^+ be a unit vector. Let f(Ψ) ∈ [1/2, 1] be defined as in Theorem 5.4. Then

F_{G_4^+}(Ψ) = f(Ψ) .

Proof. Let f = f(Ψ), ϕ ∈ [0, 2π) and Ψ_g ∈ G_4^+ be as in Theorem 5.4. Then we have F_{G_4^+}(Ψ) ≥ |⟨Ψ_g, Ψ⟩|^2 = f since θΨ_g is orthogonal to Ψ_g. It thus suffices to show the upper bound

F_{G_4^+}(Ψ) ≤ f .   (86)

Let Φ_g ∈ G_4^+ be an arbitrary positive-parity Gaussian pure state. Then there is a Gaussian unitary U = U_R with R ∈ SO(2n) and a phase µ ∈ [0, 2π) such that Φ_g = e^{iµ} U|0_F⟩. We will use any subset E ⊂ {0, 1}^4_+ of even-weight strings as in Lemma 5.5 with the additional property that 0000 ∈ E, e.g., E = {0000, 1100, 1010, 1001}. Then |0_F⟩ = Π_E|0_F⟩ is in the image of Π_E. It follows that |⟨Φ_g, Ψ⟩| ≤ √f, where we used the Cauchy-Schwarz inequality in the penultimate step, and Lemma 5.5 applied to the state U†Ψ. Since Φ_g ∈ G_4^+ was arbitrary, the claimed inequality (86) follows by taking the square and using that f(U†Ψ) = f(Ψ), see Eq. (75) of Theorem 5.4.
where we applied Corollary 5.7. Identical reasoning applies to Ē and yields the inequality (91). Combining Eqs. (90) and (91) with Eq. (89), we conclude the claimed bound; taking the square and simplifying gives the claim.
The following lemma will be useful to prove the main theorem.
Theorem 5.10 (Multiplicativity of the fermionic Gaussian fidelity for 4-mode pure states). Let H_n^+ be the set of pure n-fermion states with positive parity and let G_n^+ be the set of pure n-fermion Gaussian states with positive parity. We have that

F_{G_8^+}(Ψ_1 ⊗ Ψ_2) = F_{G_4^+}(Ψ_1) · F_{G_4^+}(Ψ_2) for all Ψ_1, Ψ_2 ∈ H_4^+ .

Proof. The inequality F_{G_8^+}(Ψ_1 ⊗ Ψ_2) ≥ F_{G_4^+}(Ψ_1) · F_{G_4^+}(Ψ_2) follows trivially from the definition of the fermionic Gaussian fidelity in Eq. (2). We conclude that it suffices to show the converse inequality, which is a consequence of the Schmidt decomposition for fermionic states put forward in Ref. [61] and of Lemmas 5.8 and 5.9. According to Ref. [61], an arbitrary pure fermionic state Φ ∈ H_n admits a Schmidt decomposition of the form (87). With this definition of m_x for n = 4, an arbitrary state Φ ∈ G_8^+ can be written as in Eq. (87) and the conditions of Lemma 5.8 apply. We have the corresponding bound, where E ⊂ {0, 1}^4_+ with |E| = 4 is a subset of even-weight strings such that E ∪ Ē = {0, 1}^4_+. Notice that x, y have even weight and that x ≠ y because E and Ē are disjoint sets. Identifying m_x (whose dependence on θ ∈ R^4 is implicit) with f(θ, x) in Lemma 5.9 (apart from a minus sign that is not relevant because we take the absolute value), we obtain the desired inequality, giving the claim.
Multiplicativity of D-fidelity implies that of D-extent

In this section, we show that multiplicativity of the D-fidelity implies multiplicativity of the D-extent. In Section 6.1 we prove this for finite dictionaries: This follows immediately from the fact that F_D(Ψ) and ξ_D(Ψ) are related by (convex programming) duality. In Section 6.2, we extend these results to infinite, i.e., continuously parameterized dictionaries. We achieve this extension by using (finite) ε-nets for the set of Gaussian states. Similar approaches have been applied in the signal processing context, see e.g., the work [62], which shows how to approximately solve atomic norm minimization problems for sparse recovery when the parameters indexing the dictionary lie in a small-dimensional space.

Multiplicativity for finite dictionaries
We will restrict our attention to finite dictionaries in this section. For |D| < ∞, the D-fidelity is related to the dual formulation of the D-extent as (see [19, Eq. (3.2)] and [18, Theorem 4])

ξ_D(Ψ) = max_{y ∈ H, ‖y‖ = 1} |⟨y, Ψ⟩|^2 / F_D(y) .   (92)

Let H_1, H_2 and H_3 be a triple of Hilbert spaces. Let {D_j}_{j∈[3]} be a family of dictionaries, where D_j ⊂ H_j for j ∈ [3]. We assume that

ϕ_1 ⊗ ϕ_2 ∈ D_3 for all ϕ_1 ∈ D_1 and ϕ_2 ∈ D_2 .   (93)

We are interested in the following two properties:

Mult_ξ(D_1, D_2, D_3): ξ_{D_3}(Ψ_1 ⊗ Ψ_2) = ξ_{D_1}(Ψ_1) · ξ_{D_2}(Ψ_2) for all Ψ_1 ∈ H_1, Ψ_2 ∈ H_2 ,
Mult_F(D_1, D_2, D_3): F_{D_3}(Ψ_1 ⊗ Ψ_2) = F_{D_1}(Ψ_1) · F_{D_2}(Ψ_2) for all Ψ_1 ∈ H_1, Ψ_2 ∈ H_2 .

As an important example, let n_j ∈ N for j ∈ [2], n_3 = n_1 + n_2, H_j = (C^2)^{⊗n_j}, and let STAB_n be the set of stabilizer states on (C^2)^{⊗n}. Then Mult_ξ(STAB_{n_1}, STAB_{n_2}, STAB_{n_1+n_2}) does not hold for certain (large) choices of n_1 and n_2 [19]. On the other hand, for n_1, n_2 ≤ 3 the multiplicativity property Mult_ξ(STAB_{n_1}, STAB_{n_2}, STAB_{n_1+n_2}) holds (see Ref. [18, Proposition 1]). This was shown using that the stabilizer fidelity is multiplicative, i.e., Mult_F(STAB_{n_1}, STAB_{n_2}, STAB_{n_1+n_2}). We claim that property Mult_F(D_1, D_2, D_3) implies property Mult_ξ(D_1, D_2, D_3).
Proof. We clearly have ξ_{D_3}(Ψ_1 ⊗ Ψ_2) ≤ ξ_{D_1}(Ψ_1) · ξ_{D_2}(Ψ_2) (94) for all Ψ_1 ∈ H_1 and Ψ_2 ∈ H_2 because of property (93) of the dictionaries {D_j}_{j=1}^3 and the definition (5) of ξ_D. To show the converse inequality, assume that y_1 ∈ H_1, y_2 ∈ H_2 are unit vectors achieving the maximum in the dual expression (92) for ξ_{D_1}(Ψ_1) and ξ_{D_2}(Ψ_2), respectively. Then ξ_{D_1}(Ψ_1) · ξ_{D_2}(Ψ_2) = |⟨y_1 ⊗ y_2, Ψ_1 ⊗ Ψ_2⟩|^2 / F_{D_3}(y_1 ⊗ y_2), where we used the assumption that property Mult_F(D_1, D_2, D_3) holds to obtain the equality. This implies that y_1 ⊗ y_2 is a feasible point of the dual program for the quantity ξ_{D_3}(Ψ_1 ⊗ Ψ_2), see Eq. (92). Thus ξ_{D_3}(Ψ_1 ⊗ Ψ_2) ≥ ξ_{D_1}(Ψ_1) · ξ_{D_2}(Ψ_2). (95) Expression (95) together with Eq. (94) gives the claim.
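The duality underlying this proof can be checked numerically in a toy example. The sketch below uses a real dictionary of three atoms in R^2 and an equivalent constrained form of the dual, maximizing |⟨y, Ψ⟩|^2 subject to |⟨ϕ, y⟩| ≤ 1 for all ϕ ∈ D; grid searches stand in for the actual convex programs and real coefficients replace complex ones, so this is purely illustrative.

```python
import numpy as np

# Toy dictionary: e_1, e_2 and the normalized "diagonal" atom.
ATOMS = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0 / np.sqrt(2), 1.0 / np.sqrt(2)]])

def primal_extent(psi, grid):
    """min ||gamma||_1^2 over real gamma with sum_j gamma_j * phi_j = psi,
    scanning the free coefficient of the third atom over a grid."""
    g1 = psi[0] - grid / np.sqrt(2)
    g2 = psi[1] - grid / np.sqrt(2)
    return float(((np.abs(g1) + np.abs(g2) + np.abs(grid)) ** 2).min())

def dual_extent(psi, grid):
    """max |<y, psi>|^2 over y with |<phi, y>| <= 1 for every atom phi."""
    Y1, Y2 = np.meshgrid(grid, grid)
    Y = np.stack([Y1.ravel(), Y2.ravel()], axis=1)
    feasible = np.all(np.abs(Y @ ATOMS.T) <= 1.0, axis=1)
    return float(((Y @ psi) ** 2)[feasible].max())
```

By weak duality, every feasible dual point lower-bounds every primal decomposition, so the dual value never exceeds the primal one; on a fine grid the two nearly coincide.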

Multiplicativity for infinite dictionaries
In this section, we extend the results of Section 6.1 to dictionaries D that may contain infinitely many elements. Our strategy is to use an ε-net for D ⊂ H with a finite number of elements, which we denote by D_ε. We relate the extent and fidelity with respect to the dictionary D to the extent and fidelity with respect to its net D_ε (see Lemmas 6.2 and 6.3) to prove that multiplicativity of the D-fidelity implies multiplicativity of the D-extent in Theorem 6.6. This result is a generalization of Theorem 6.1 (which considered finite dictionaries) to (possibly) infinite dictionaries. We will make use of the notion of an ε-net to replace our infinite set D by a finite set D_ε. Let ‖Ψ‖ = √⟨Ψ, Ψ⟩ for Ψ ∈ H denote the norm on H. Let D ⊂ H and let ε > 0. Then a set D_ε ⊂ H is called an ε-net for D if for any Ψ ∈ D there is some Φ ∈ D_ε such that ‖Φ − Ψ‖ ≤ ε.
We are interested in the case where for every ε > 0 there is a finite ε-net D_ε for D with the additional property that D_ε ⊂ D, i.e., the net consists of elements of D. A sufficient condition for this is that the subset D ⊂ H is compact. Since m ∈ N was arbitrary, we can take the limit m → ∞ and obtain the claim.

Proof. The first inequality follows trivially from the definitions using D_ε ⊂ D. For the second inequality, let Ψ ∈ H be arbitrary and let ϕ ∈ D achieve the fidelity F_D(Ψ). Then (by the fact that D_ε is an ε-net for D and ϕ ∈ D) there is an element ϕ_ε ∈ D_ε and a δ ∈ H with ‖δ‖ ≤ ε such that ϕ = ϕ_ε + δ. It follows that the claimed bound holds, where we used the definition of F_{D_ε}(Ψ) and the Cauchy-Schwarz inequality (in the penultimate step), together with the elementary bound ‖v‖_2 ≤ ‖v‖_1 for v ∈ C^d. The claim follows.

B Commutativity of the map θ and quadratic Majorana monomials
In the following, we prove Lemma 5.2.