The Fermionic Quantum Emulator

The fermionic quantum emulator (FQE) is a collection of protocols for emulating quantum dynamics of fermions efficiently taking advantage of common symmetries present in chemical, materials, and condensed-matter systems. The library is fully integrated with the OpenFermion software package and serves as the simulation backend. The FQE reduces memory footprint by exploiting number and spin symmetry along with custom evolution routines for sparse and dense Hamiltonians, allowing us to study significantly larger quantum circuits at modest computational cost when compared against qubit state vector simulators. This release paper outlines the technical details of the simulation methods and key advantages.


Introduction
High accuracy simulation of fermionic systems is an important and challenging problem and is a major motivation behind current attempts to develop quantum computers [1,4,12,31]. There has been significant experimental progress towards realizing the simulation of fermionic systems on current quantum devices [3,19]. As these experiments scale in size there is a growing need to understand the possibilities for quantum advantage, with one approach being to characterize the classical emulation complexity of the corresponding quantum circuits. In addition, the efficient emulation of near-term fermionic simulation experiments is crucial for experiment design, algorithm design, and testing. In this work, we describe an implementation of protocols to efficiently emulate quantum circuits describing time evolution under fermionic generators. We name the library that implements these protocols the Fermionic Quantum Emulator (FQE). 1 There have been many developments in quantum circuit simulation and emulation. Broadly these advancements can be classified into algorithmic improvements [6,7,14,14,[16][17][18]26] and computational implementation improvements [24,37,39,41]. However, despite this progress, there remains potential to improve the emulation of circuits relevant to fermionic simulation. For example, in general circuit emulation it is necessary to include additional circuit elements to handle the fermion encoding, but these can be eliminated and the fermionic sign accounted for implicitly in a specialized fermionic emulator. In addition, many fermionic systems have symmetries that can be used to reduce the resource requirements of the classical emulation. For example, molecular and material problems are often described by Hamiltonians that commute with a variety of global symmetry operators, such as the total particle number, total spin, time-reversal, and point-group and crystallographic symmetries. Though there are many open source software packages to carry out quantum chemistry calculations of fermionic systems using these symmetries [35,38,40], these are not designed to support computations using the quantum circuit model. Similarly, existing quantum circuit simulators and the corresponding computational techniques used within them are not naturally suited to efficiently working within symmetry reduced Hilbert spaces.
The FQE is a simulator of fermions represented in second quantization. It corresponds to a statevector simulator in that the wavefunction is explicitly stored. In the first release of the emulator, number and spin (S z ) symmetry are used to minimize the wavefunction memory footprint by using a generalization of the Knowles-Handy [21] determinant based configuration interaction wavefunction organization scheme. Time evolution of the state under a fermionic Hamiltonian is implemented in two ways: (1) via a custom routine similar to the cosine-sine construction for nilpotent operators applicable to sparse fermion generators, and (2) via a series expansion of the propagator for dense Hamiltonians taking advantage of efficient intermediate data structures. The library completely integrates with the fermion quantum simulation library OpenFermion [27] and has built in functions to convert wavefunction objects in FQE to the format used in OpenFermion as well as for the general purpose circuit simulator Cirq [9]. For example, the time evolution of a wavefunction can be generated using OpenFermion's FermionOperator objects or Hamiltonians defined through FQE utility functions. The FQE is an open source library distributed under the Apache 2.0 license [33]. The library is implemented in Python to facilitate code extension and reuse. However, some hot spots have been addressed with high-performance kernels written in the C language. These C extensions can be enabled or disabled by the user. This allows the library to function as either a high-performance library for computationally demanding applications or as a pure-Python library implemented with understandable and easily modifiable expressions.
This work describes the key technical aspects of the library, demonstrates how it can be used in quantum algorithm development, and makes single-thread and multi-threaded timing comparisons for key circuit primitives against Cirq [9] and the highly performant general purpose quantum circuit simulator Qsim [32]. We close with a perspective on the development of the library and future directions.

Wavefunctions and Hilbert space organization
When simulating spin-1 2 fermions in second quantization, the many-particle fermionic Hilbert space (sometimes called the Fock space) generated by a basis of L single-fermion states (termed orbitals) {|φ 1 , |φ 2 , . . . |φ L }, is spanned by basis states (termed determinants) labelled by L-digit binary strings {n 1 n 2 . . . n L } where n ∈ {0, 1}, and n i is the occupancy of orbital i. It is evident that the fermionic Hilbert space is isomorphic to the Hilbert space of L qubits, with the qubit computational basis corresponding to the fermion determinant basis. This allows one to use the traditional state vector representation of qubits to encode fermionic states. For many molecular and materials systems, the electronic Hamiltonian is an operator that commutes with various global symmetry operators. These symmetries thus provide useful quantum numbers to label sectors of Hilbert space. Because the Hamiltonian does not mix sectors, many simulations can be performed either in a single sector, or in a collection of independent sectors. A simple example is simulating a fixed number of electrons n. In this case, the Hilbert space contains only L n states. Because the determinants are eigenstates of the particle number operator, the spanning basis can be chosen to be determinants with total occupancy i n i = n. If we further assume the orbitals are eigenstates of S z (i.e. spin-orbitals), then the determinants are also eigenstates of S z . Then a Hilbert space sector labelled by n and S z is spanned by determinants only with the given n and S z .
The FQE uses such a decomposition into symmetry sectors to store the wavefunction compactly. A user specifies symmetry sectors of fixed particle number and given S z . The total wavefunction is then stored in the direct sum of these sectors. Considering all possible particle sectors and S z sectors corresponds to working in the full many-particle fermionic Hilbert space, and thus storing the full set of 2 L qubit amplitudes.
An efficient way to represent wavefunctions for a fixed particle number and S z sector was first introduced by Siegbahn [36] and refined by Knowles and Handy [21] in the context of the exact diagonalization of chemistry Hamiltonians. The binary string encoding a determinant is separated into a string for the spin-up (α) spin-orbitals and the spin-down (β) spin-orbitals. Commonly these two binary strings (integers) are referred to as the αand β-string respectively. A further simplification arises by assuming that the α and β spin-orbitals share a common set of spatial functions, known as spatial orbitals. Given M spatial orbitals in total, the total number of spin-orbitals is thus L = 2M .

]])
Each element of the input list labels a Hilbert space sector as the triple (n-electrons, S z , M -spatial-orbitals). The wavefunction in each sector is stored as a matrix of size M nα × M n β where n α and n β are the total number of α and β electrons. Each row of the matrix corresponds to a specific α-string, and each column to a β-string. The mapping between the integer values of the α-and β-strings to the row and column indices is somewhat arbitrary; we use the lexical ordering proposed by Knowles and Handy [20]. By storing the wavefunction coefficients as matrices we can leverage vectorized multiplications when performing updates or contractions on the wavefunction coefficients. The memory savings from simulating in specific symmetry sectors is substantial when compared to traditional state-vector simulators which use the full 2 L qubit space. In Figure 1 we show the relative size of half filling (n = L/2) and quarter filling (n = L/4) subspaces with S z = 0 along with the wavefunction memory footprint in gigabytes.
To make the FQE interoperable with other simulation tools we provide a number of utilities for transforming the FQE Wavefunction representation. We also provide human readable printing, saving wavefunctions as binary data in numpy's .npy format, various wavefunction initialization routines for random, Hartree-Fock, or user defined states, and conversion functions to OpenFermion and Cirq wavefunction representations. The printing functionality is demonstrated in the code snippet below.   The conversion to OpenFermion is handled by converting each coefficient in the Wavefunction object with its associated α, β-string to an ordered string of creation operators acting on a vacuum state, with the string represented by an OpenFermion FermionOperator object. This allows the user to leverage the normal ordering routines in OpenFermion. By using the FermionOperator intermediate we can also directly map the string of normal ordered operators acting on a vacuum to a state vector for integration with Cirq simulations. where g is a complex number andĝ is a single product of an arbitrary number (N op ) of creation and annihilation operators that create/annihilate orbitals with spatial orbital indices labelled by i, j and S z indices labelled by σ, ρ. We refer to this Hamiltonian as the excitation Hamiltonian because it only involves a single "excitation" termĝ and its Hermitian conjugate. To specifyĝ the FQE can digest a FermionOperator from OpenFermion. IfĤ excite is diagonal, i.e. a polynomial of number operators (such asâ † 1αâ † 2αâ 1αâ2α = −n 1αn2α , witĥ n 1α =â † 1α a 1α and similarly forn 2α ) evolution of the wave function can be performed using the techniques described later in Sec. 3.3.1. WhenĤ excite is not diagonal and contains no repeated indices (such asâ † 4αâ † 2βâ 1αâ3β ), evolution of the wavefunction is accomplished by where we defineP x as the projector onto the basis of determinants that are not annihilated byx. For other recent applications of this relation in quantum chemistry see e.g.
, a hybrid approach is used with the only additional complication being the fermion parity evaluation that arises from operator reordering.
Evolving under a single fermionic excitation Hamiltonian is the basis for arbitrary fermionic quantum circuit evolution and can be used to simulate arbitrary fermionic Hamiltonian evolution. Because of the isomorphism between fermion and qubit spaces, general qubit Hamiltonians, so long as they are symmetry preserving in the same sense, can be simulated with little overhead within the same scheme either by modifying the code to ignore signs arising from fermion parity or by explicitly inserting swap operations. This raises the possibility to simulate large quantum circuits associated with non-fermionic Hamiltonian evolution, that benefit from the memory saving associated with simulating the fixed "particle" or "spin" sectors of the corresponding qubit Hilbert space.
Though evolution by single excitation Hamiltonian provides a primitive to implement many fermionic circuit simulations, the FQE also implements efficient time-evolution routines for Hamiltonians with special forms and for generic quantum chemical Hamiltonians. These will be discussed in more detail in the following sections.

Evolution of Dense Hamiltonians
The FQE provides special routines to evolve sums of excitation Hamiltonians through series expansions. We begin by discussing dense Hamiltonians, which for the purposes of this work we define as a weighted sum over all possible excitation Hamiltonians with the same index structure. A dense two-particle Hamiltonian, for which the FQE implements special routines, would take the form whereĤ is Hermitian. In addition, specialized code is implemented for spin-conserving spin-orbital Hamiltonians (σ = σ and ρ = ρ in Eq. (5)), and for spin-free Hamiltonians (V ijkl ≡ V iσ,jρ,kσ ,lρ in Eq. (6)), which frequently appear in molecular and materials systems. Other cases which have specialized subroutines discussed below include sparse Hamiltonians (arbitrary sums of excitation Hamiltonians), quadratic Hamiltonians, and diagonal pair Hamiltonians. Time evolution with such dense Hamiltonians is performed by means of series expansions such as the Taylor and Chebyshev expansions. When the time step t is taken to be small, the operator exponential can be efficiently computed by a Taylor expansion that converges rapidly, We evaluate this by recursively computing the action of the operator on the wave functions, with |Ψ 0 ≡ |Ψ , using which the previous expression can be rewritten as We evaluate the norm of each term, |t n |||Ψ n |/n!, and when the norm becomes smaller than the given threshold, we stop the computation. Alternatively, one can specify the number of terms in the expansion, or both. When the spectral range of the operator (i.e., extremal eigenvalues [ min , max ]) is known or can be estimated in advance, the expansion can be made more robust by using the Chebyshev expansion [22]. The Chebyshev expansion is known to be a near optimal approximation of the exponential in a minimax sense. First, we scale the operator such that the eigenvalues are in [−1, +1],Ĥ where ∆ = ( max − min )/2w and shift = −( max + min )/2. The factor w is a number that is slightly smaller than 1 to ensure that the eigenvalues do not lie outside the window due to numerical noise or insufficient accuracy in the estimation of extremal eigenvalues.
In the FQE, we use w = 0.9875. Then we expand as Here J n (x) is the modified Bessel function of the first kind, and |Ψ n is obtained by recursion as We stop the expansion when the contribution from rank n becomes smaller than the given threshold, as in the Taylor expansion above. The dense evolution routines can be accessed through the Wavefunction interface which intelligently dispatches to the specialized routines depending on the form of the Hamiltonian the user provides. As an example, below is a code snippet assuming the user has defined an OpenFermion MolecularData object called 'molecule'. The Wavefunction interface provides access to the apply_generated_unitary routine which has options for each series expansion evolution described above. The above algorithms that use the series expansion are also employed for time evolution with sparse Hamiltonians, which are arbitrary weighted sums of multiple excitation Hamiltonians. In this case, the apply function used in Eqs. 9 and 13 is overloaded by a special function for sparse Hamiltonians, in which the action of the operator is evaluated one term at a time.

Diagonal pair Hamiltonians
Hamiltonians that are diagonal in the determinant basis are particularly simple as they are polynomials of number operators with terms that all commute. Evolution under such Hamiltonians appears as an important quantum circuit for chemical Hamiltonian Trotter steps and certain variational ansätze [23,29]. The FQE leverages the mutual commuting nature of the terms of Hamiltonians of this form to efficiently evolve under the generated unitaries. Specifically, the action of the unitary generated by diagonal pair Hamiltonians of the formĤ wheren r is the spin-summed number operator for spatial orbital r (i.e.,n r = σâ † rσâ rσ ), is constructed by iterating over all determinants and generating a phase associated with the coefficient W rs for each determinant. A direct wall clock time comparison to other simulators is made in Figure 2 by translating evolution under a random diagonal pair Hamiltonian (of the form in Eq. (14)) into a series of CPHASE gates with phases corresponding to 2W rs . For the half filling circuits on 14 orbitals (i.e., 14 electrons in 28 spin orbitals to be represented by 28 qubits) we observe that the python reference implementation of FQE is approximately eight times faster than Cirq and five times faster than single-threaded Qsim. The C-accelerated FQE single threaded is five hundred times faster than Qsim single threaded with default gate fusion. The C-accelerated FQE on twenty threads is twenty times faster than QSim using twenty threads and gate-fusion level eight for the same 14 orbital half filled system. FQE performance gains for lower filling fractions are even more substantial.   left The Python reference implementation of FQE is compared against Cirq's Python circuit evolution routines; right C-accelerated FQE is compared against Qsim, a highly optimized C++ circuit simulator with and without threading. FQE computes the results in-place. Qsim is executed in single precision mode with one (green dashed line) and twenty (green solid line) threads with gate gate-fusion set to eight through the qsimcirq python interface. Quantum circuits for the time evolution generated by the diagonal Coulomb Hamiltonian are constructed using 4M 2 CPHASE gates where M is the number of spatial orbitals. FQE at half filling with one and twenty threads is shown in blue while quarter (single thread) filling is in red.

Quadratic Hamiltonians
Evolving states by quadratic Hamiltonians is another important algorithmic primitive. In the quantum circuit context, it is closely related to matchgate circuit simulation [42]. The FQE performs this task by transforming the wave functions into the orbital representation that diagonalizes the quadratic Hamiltonian, evolving in the quadratic Hamiltonian eigenbasis, and then returning to the original basis. The algorithm is based on those in the quantum chemistry literature [5,25,28] that utilize the LU decomposition, with an improvement in the handling of pivoting in the LU decomposition. Using the unitary X that diagonalizes the operator matrix inÂ = ij (A) ijâ † iâ j , i.e., X † AX = a, one obtains exp(−iÂt)|Ψ =T (X † ) exp(−iât)T (X)|Ψ (15) whereâ is the diagonal operator after orbital rotation, andT (X) is the transformation on the wave function due to the change in the orbital basis described by the unitary X.
Here we ignore the pivoting for brevity; see Appendix B for more algorithmic details. The overall cost of rotating the wave function (i.e., action ofT (X)) is roughly equivalent to the cost to apply a single dense one-particle linear operator to a wave function. The diagonal operator is then applied to the rotated wave function, followed by the back transformation to the wave function in the original orbital basis. Figure 3 compares the wall clock time required to evolve a random quadratic Hamiltonian. For Cirq and Qsim, a circuit is generated using the optimal Givens rotation decomposition construction of Clements [10] in OpenFermion which is translated into a sequence of 2M 2 gates of the form exp (−iθ(XY − Y X)/2) and Rz. For the half filling circuits on fourteen orbitals (twenty eight qubits) we observe that the python-FQE LU decomposition algorithm is approximately forty times faster than Cirq. The single-thread C-accelerated FQE is approximately ten times faster than single-threaded Qsim with gate-fusion set to four. For twenty threads the C-accelerated FQE is five times faster than Qsim. Lower filling fractions show even more substantial savings. This illustrates the large advantage physics, symmetry, and algorithmic considerations can provide when targeting specialized quantum circuit emulation.

Operator Action on the Wavefunction
The FQE provides a variety of methods to apply fermionic operators to states. The general code allows a user to define an arbitrary individual excitation operator and apply it to a wavefunction. This functionality constructs the action of the user defined operator by computing the sparse matrix elements of such an operator in the determinant basis, which are then contracted against the wavefunction coefficients to obtain the action of the operator on the state.
The FQE also provides efficient methods to apply dense fermionic operators (such as a dense Hamiltonian) to wave functions, because these are the fundamental building blocks for fermionic time evolution under dense Hamiltonians (see, for example, Eqs. (9) and (13)). It is equally important for reduced density matrix (RDM) construction, efficient access to which is a crucial aspect of many fermionic simulation algorithms. In the following, we present the algorithms to apply dense spin conserving, spin-free Hamiltonians (Eq. (7)), but the FQE implements similar algorithms for S z conserving spin-orbital Hamiltonians (Eq. (6)) and S z non-conserving Hamiltonians (Eq. (5)) as well.

Knowles-Handy algorithm
In the Knowles-Handy algorithm [21], a resolution of the identity (RI) is inserted in the n-electron determinant space after operator reordering (1 = K |K K|) as where I, J, and K label determinants and C I is the wave function coefficient associated with determinant |I . The second term is trivially evaluated. The primitive operations for the first term are This form of the RI insertion has the advantage that it can be easily generalized to apply 3-and 4-particle operators, since these primitive operations can be performed recursively, i.e., The step in Eq. (18) is typically a computational hot spot and is performed using efficient numpy functions. For efficiency, the expectation values I σ |â † iσâ jσ |J σ are precomputed for the α-and β-strings and stored in an FQE object FciGraph. To facilitate the evaluation, the intermediate tensors are also stored using the α-and β-strings; for example, D I ij in Eq. (17) is stored as a four-index tensor D IαI β ij where I = I α ⊗ I β . When the Hamiltonian breaks spin symmetry (and, hence, S z symmetry), the mappings between strings that have N σ ± 1 electrons are also generated ( I σ |â † iσ |J σ and I σ |â iσ |J σ ) which are stored in the FQE object FciGraphSet. From these quantities, I|â † iσâ jσ |J can be easily computed on the fly.

Harrison-Zarrabian algorithm for low filling
Low filling cases are computed using the Harrison-Zarrabian algorithm [15], which is closely related to the algorithm developed by Olsen et al. [30]. This algorithm inserts the RI in the n − 2 electron space (1 = L |L L|) as where I and J label a determinant with n electrons and L labels determinants with n − 2 electrons. This is computed by the following primitive steps, The advantage of this algorithm, in comparison to the Knowles-Handy algorithm, is that the number of determinants for the RI can be far smaller than the number of the original determinants. This is pronounced when the wave function is at low filling. The disadvantage, however, is that there is certain overhead in computational cost because one cannot perform spin summation in Eq. (23). Therefore, the FQE switches to this algorithm when the filling is smaller than 0.3. Note that a similar algorithm can be devised for the highfilling cases by sorting the operators in the opposite order, but this is not implemented in the FQE. When such a system is considered, one can reformulate the high-filling problem into a low-filling one by rewriting the problem in terms of the hole operatorsb iσ = 1 −â iσ .

Olsen's algorithm
The approach of Knowles and Handy, while being simple to implement with optimized linear algebra routines in Python, can be further improved by taking advantage of the sparsity in the D J ij appearing in Equation 17. For example, D J ij will be zero if determinant J has orbital i unoccupied or orbital j doubly occupied. This means that extra work is performed during the most expensive step of the algorithm, Equation 18. Olsen and coworkers were able to leverage this sparsity by avoiding explicit reference to intermediate states [30]. In this approach, the same replacement lists used by Knowles and Handy in the construction of the D J ij are used to loop over only the non-zero Hamiltonian elements connecting bra and ket. The essence of the algorithm is to structure the loops in such a way that vector operations can be used. Details of the algorithm are given by Olsen et al. [30] and clearly reviewed by Sherrill and Schaefer [34]. While Olsen's algorithm cannot be efficiently implemented in pure Python, it is the basis of the FQE C extension for applying dense, two-particle Hamiltonians.

RDM computation
The FQE provides efficient routines using the intermediate tensors in the above algorithms to efficiently compute 1-through 4-RDMs. For example, a spin-summed 2-RDM element Γ ijkl defined as can be computed either using the intermediate tensor in the Knowles-Handy algorithm (Eq. (17)), or, in low filling cases, using the intermediate in the Harrison-Zarrabian algorithm, The corresponding generalization of Olsen's algorithm is used when the C extensions are enabled. In both Python algorithms, efficient numpy functions for matrix-matrix and matrix-vector multiplication are used. Spin-orbital RDMs (i.e. without the spin summation in Eq. (26)) are computed similarly. In addition to the standard (i.e., particle) RDMs, the FQE can compute RDMs that correspond to other operator orders, e.g., hole RDMs. This is done by performing Wick's theorem as implemented in fqe.wick and evaluating the resulting expression using the particle RDMs. For example, the spin-summed 2-body hole RDM element (Γ h ijkl ) defined as is computed as The FQE implements the spin-orbital counterpart of this feature as well.
For spin conserving Hamiltonians, the FQE also provides up to 4-body spin-summed RDMs and up to 3-body spin-orbital RDMs in the OpenFermion format. An example is: The implementation to compute RDMs offers a significant improvement over OpenFermion's native RDM generators which map Pauli operator expectations to RDM elements. As an example of this efficient RDM computation the FQE library provides a base implementation of the energy gradient due to a two-particle generatorĜ iσ,jρ,kσ ,lρ =â † iσâ † jρâ kσ â lρ , which corresponds to evaluating g iσ,jρ,kσ ,lρ = Ψ|[Ĥ,Ĝ iσ,jρ,kσ ,lρ ]|Ψ for all spatial and spin indices.

Closing thoughts and future directions
The Fermionic Quantum Emulator (FQE) described above is an open source library under the Apache 2.0 license [33]. Currently, the library is completely implemented in Python to facilitate extension and code reuse. Despite not being written in a high performance programming language, the FQE's algorithmic advantages allow us to outperform even heavily optimized quantum circuit simulators. In future releases, current computational bottlenecks will be addressed with additional performance improvements. The Fermionic Quantum Emulator (FQE) provides a user-friendly and developer-friendly code stack without sacrificing performance. Algorithmic advantages allow the FQE to clearly outperform both Python-based and heavily optimized quantum circuit simulators, even at half filling, with our C-accelerated code. Away from half filling, the efficiency of FQE is even more pronounced. Extending the code to GPUs or other architectures may lead to even better performance.
Though this first release of the FQE focuses on exact evolution in a statevector representation, our long-term goal is to provide implementations of fermionic circuit emulation that exploit symmetry within different simulation strategies. For example, approximate representation using fermionic matrix product states, can also be adapted to this setting. Ultimately, a variety of such techniques will need to be explored to reach an honest assessment of possible quantum advantage when simulating molecular and materials systems.

Acknowledgments
NCR thanks Ryan LaRose for testing and feedback on the code.
In addition,ĝ †ĝ andĝĝ † are both diagonal in the computational basis, because any of the unpaired operators inĝ would be paired with its conjugate inĝ †ĝ andĝĝ † . For example, letĝ = gâ † 4â † 2â 3â1 (we have omitted the spin indices for brevity), then It is worth noting that the parity associated with this reordering for normal-ordered g (all creation operators to the left of annihilation operators) is always even. We define the square root of this diagonal operator with the following phase convention, where we used the fact that the action of the number operators gives 0 or 1, and therefore, they are idempotent. Using these expressions, the Taylor expansion of the evolution operator is exactly re-summed (where the summations are performed separately for the odd and even rank terms) as Denoting the projection to the set of determinants that are not annihilated by an operator x asP x , together with the convention in Eq. (33), this can be further simplified to Eq. (3).

B Basis change: Evolution under quadratic Hamiltonians
In this work, we make use of the following primitive. Given an orbital basis {|φ } and manyelectron wave function |Ψ = I(φ) C I(φ) |I(φ) and a linear transformationX|φ → |φ , we wish to re-express |Ψ = I(φ ) C I(φ ) |I(φ ) where |I(φ ) is a determinant in the new orbital basis {|φ }. There have been several discussions of how to implement this transformation efficiently [5,25,28]. In the original work by Malmqvist [25], it was understood that the transformation can be performed by the successive application of one-body operators to the wave functions, in which the operator was computed from the LU decomposition of the orbital transformation matrix X (X ij = φ i |φ j ). Mitrushchenkov and Werner (MW) [28] later reported that the pivoting in the LU decomposition is necessary to make the transformation stable. In their work, a LAPACK [2] function was used to perform the LU decomposition with pivoting on the rows of the orbital transformation matrix. This leads to reordering of the orbitals in the determinants and additional phase factors in C I(φ ) that make it complicated to, for example, evaluate the overlap and expectation values. Therefore, we have implemented a column-pivoted formulation of the MW algorithm, which is presented below. We will show the spin-free formulation (i.e. same transformation for α and β orbitals) for spin-conserving wave functions as an example, but the procedure for the spin-broken case can be derived in the same way. First, we obtain the column pivoted LU decomposition. This is done using numpy.linalg.lu (which pivots rows) for the transpose of the orbital transformation matrix X, whereL andŪ are lower-and upper-triangular matrices, and the diagonal elements ofL are unit. It is then easily seen that, by taking the transpose, one obtains whereŪ T andL T are lower and upper triangular, respectively. If one scales the rows and columns ofŪ T andL T , respectively, such that the diagonal elements ofŪ T become unit, this can be rewritten as in which L and U are lower-and upper-triangular matrices with the diagonal elements of L being unit, and P =P T . These are done in the ludecomp and transpose_matrix subfunctions in the wavefunction.transform function. When pivoting is not considered (i.e., P = 1), wave functions are transformed as follows (see Ref. [25] for details). Suppose that the spin-free MO coefficients are rotated as Using L and U, we compute the matrix elements which is performed in the process_matrix function. It has been shown [25] that this operator can be used to transform the many-body wave functions. LetT be the oneparticle operator associated with this transformation, i.e., |Ψ =T (LU)|Ψ .
In the context of exact evolution with a quadratic Hermitian operator, the orbital transformation is performed to diagonalize the quadratic operator. Assume that the operator isÂ = ij A ijÊij withÊ ij = σ a † iσ a jσ ; we diagonalize it so that X † AX = a where a is a diagonal matrix. Note that X is unitary. Using the column-pivoted LU decomposition (Eq. 37), the propagation can be performed as whereâ is a diagonal operator,â To summarize, in this column-pivoted formulation, the pivot matrix P is used only to reorder the orbitals in the diagonal operator. This is more trivial and efficient than reordering the orbitals in the wave functions.