Accelerating Quantum Computations of Chemistry Through Regularized Compressed Double Factorization

We propose the regularized compressed double factorization (RC-DF) method to classically compute compressed representations of molecular Hamiltonians that enable efficient simulation with noisy intermediate-scale quantum (NISQ) and error corrected quantum algorithms. We find that already for small systems with 12 to 20 qubits, the resulting NISQ measurement scheme reduces the number of measurement bases by roughly a factor of three and the shot count to reach chemical accuracy by a factor of three to six compared to truncated double factorization (DF), and we see order of magnitude improvements over Pauli grouping schemes. We demonstrate the scalability of our approach by performing RC-DF on the CpdI species of cytochrome P450 with 58 orbitals and find that using the resulting compressed Hamiltonian cuts the run time of qubitization and truncated DF based error corrected algorithms almost in half and even outperforms the lambda parameters achievable with tensor hypercontraction (THC), while at the same time reducing the CCSD(T) energy error heuristic by an order of magnitude.

Many NISQ algorithms such as the variational quantum eigensolver (VQE) [2] are essentially methods to reduce circuit depth at the expense of requiring many repetitions, also called shots. In a similar fashion, error mitigation techniques [3,4] create a tolerance for the noise of NISQ devices by further increasing the number of shots needed to obtain a final result. The total number of shots is thus often the limiting factor on the path towards quantum advantage. This is a particularly pressing issue in quantum chemistry simulation. Here, molecular Hamiltonians, which in their second-quantized form have O(n^4) terms, where n is the number of spatial orbitals, need to be measured with very high accuracy. Naive measurement schemes require an extremely fast growing number of distinct observables and total number of shots [5] to reach the required accuracy.
Methods to cope with this problem fall into two broad classes. First, methods [6,7,8] which, starting from a decomposition of the observable into Pauli operators, group or otherwise combine these Pauli operators into sets that are jointly measurable with no or only minimal increase in circuit depth. Second, methods which yield a compressed and possibly approximate representation of the original Hamiltonian in the form of a tensor contraction. For fermionic second-quantized Hamiltonians, these are mainly density fitting [9], tensor hypercontraction (THC) [10,11,12], and double factorization (DF) [13,14] (see [14] for a comparison). While the Pauli grouping methods are applicable to general qubit Hamiltonians, the second class of methods typically yields better performance when applicable [5].
These compressed representations also enable drastic resource reductions in leading fault tolerant algorithms for the simulation of chemistry based on linear combinations of unitaries (LCU) and qubitization [15,12,16,17,11]. Here, run time is mainly a function of the so-called lambda parameter. Its precise definition depends on the algorithm and will be discussed later, but it can be thought of as a norm-like quantity that depends on the magnitude of the coefficients of the representation of the Hamiltonian. THC typically yields lower lambda parameters than existing DF schemes. However, the fact that some tensors in the THC decomposition are non-square and non-unitary causes other overheads and complications [17], which makes THC a poor fit for typical NISQ quantum algorithms.
In contrast, explicit double factorization (X-DF) and compressed double factorization (C-DF) [18,19,20,21,22] naturally yield a NISQ-friendly measurement scheme that only requires a linear-depth orbital/Givens rotation circuit before the final measurements and is compatible with particle number post-selection, as well as an LCU representation of the Hamiltonian suitable for error corrected algorithms based on qubitization [23].
The X-DF measurement scheme reduces the number of distinct measurement bases to at most n(n+1)/2 and drastically decreases the number of shots needed to reach a target accuracy when compared to Pauli-based schemes. The number of bases can be further reduced by truncating the X-DF representation of the Hamiltonian, thereby making the representation approximate. This can reduce the required number of shots, but the error resulting from the now approximate representation of the Hamiltonian quickly outweighs this benefit. C-DF is designed to overcome this issue by performing a tighter least-squares numerical tensor fitting of the molecular Hamiltonian to truncated double-factorized form. By lifting a rank constraint in the equation defining the X-DF Hamiltonian and using the resulting additional freedom to improve the representation of the molecular Hamiltonian by means of parameter optimization starting from a truncated X-DF guess, it achieves lower approximation errors than truncated X-DF. However, when attempting practical deployment of C-DF in the context of quantum algorithms, one encounters an additional major barrier: the optimization of the C-DF tensor fitting to minimize the least-squares error does not take into account the variance properties of the resulting representation. In practice, this means that the variance of the resulting energy estimator can fluctuate erratically and can be orders of magnitude higher than the variance of the X-DF energy estimator and the approximation error of both X-DF and C-DF.
In this work we propose the regularized compressed double factorization (RC-DF) method to fix this. RC-DF uses the same functional form of the compressed Hamiltonian as C-DF, but it adds a regularization term to the C-DF cost function that is used when optimizing the parameters of the compressed representation.¹ The regularization term stabilizes the optimization and reduces the variance of the resulting NISQ energy estimator as well as the λ parameter determining the resources of fault tolerant quantum algorithms. We find that RC-DF consistently outperforms both previous double factorization schemes in terms of variance, approximation error, and lambda parameter, and even yields lambda parameters lower than THC.

Comparison of factorization methods
We start from the well known form of the second-quantized electronic structure Hamiltonian

  Ĥ = E_nuc + Σ_pq h_pq Ê_pq + (1/2) Σ_pqrs (pq|rs) (Ê_pq Ê_rs − δ_qr Ê_ps),    (1)

where

  h_pq = ∫ dr φ_p(r) ( −∇²/2 − Σ_m Z_m / |r − r_m| ) φ_q(r)

are the symmetric one-electron integrals and

  (pq|rs) = ∫∫ dr₁ dr₂ φ_p(r₁) φ_q(r₁) φ_r(r₂) φ_s(r₂) / |r₁ − r₂|

the real and 8-fold symmetric two-electron integrals, with Z_m and r_m the charges and positions of the nuclei, φ the spatial molecular orbitals, E_nuc the nuclear repulsion energy, and Ê_pq := â†_{p↑} â_{q↑} + â†_{p↓} â_{q↓} the singlet excitation operator.

¹ When working out the implications of RC-DF for fault tolerant quantum algorithms, we became aware that a similar erratic behavior of the λ parameter of THC had been observed in [11] and an L1 regularization has been proposed as a cure there.
The exact X-DF representation of the Hamiltonian is determined by diagonalizing the modified one-electron integrals tensor F_pq and doubly diagonalizing the two-electron integrals tensor to obtain

  F_pq = Σ_k U^∅_pk F^∅_k U^∅_qk    (4)

and

  (pq|rs) = Σ_t g^t V^t_pq V^t_rs = Σ_t Σ_kl U^t_pk U^t_qk Z^t_kl U^t_rl U^t_sl,    (5)

where the U^t_pk result from diagonalizing the V^t_pq = Σ_k U^t_pk Λ^t_k U^t_qk, and consequently Z^t_kl = Λ^t_k g^t Λ^t_l is, for every t, a symmetric outer product, hence of rank one, and the U^t_pk are unitary (in fact, without loss of generality, special orthogonal). The second factorization is possible whenever (pq|rs) is real and 8-fold symmetric (as is always the case for non-relativistic Coulomb repulsion integrals), as this is enough to ensure that the V^t_pq are not only orthogonal but also real and symmetric for every t (see Lemma 1 in Appendix D). With n_t equal to the maximum number n(n+1)/2 of non-zero eigenvalues of (pq|rs), the Hamiltonian can then be written exactly (see Appendix F for the full derivation) as

  Ĥ = E_∅ − (1/2) Σ_k F^∅_k Û_∅ (Ẑ_{k↑} + Ẑ_{k↓}) Û_∅† + (1/8) Σ_t Σ_kl Z^t_kl Û_t (Ẑ_{k↑} + Ẑ_{k↓})(Ẑ_{l↑} + Ẑ_{l↓}) Û_t†,    (6)

where E_∅ is independent of the state, Û_∅ and Û_t rotate the orbitals according to U^∅_pk and U^t_pk for each t (see Fig. 1), and Ẑ_{k↑}, Ẑ_{k↓} are Pauli Ẑ operators on qubits 2k and 2k+1, respectively. If the sum over t is ordered according to |g^t|, the Hamiltonian can be approximated with a truncated X-DF representation with fewer terms (also called leafs).
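As a concrete illustration of the double diagonalization, the following minimal NumPy sketch (our own naming, not the authors' code) eigendecomposes a real 8-fold symmetric (pq|rs) tensor as an n² × n² matrix and then diagonalizes each symmetric eigenvector matrix V^t:

```python
import numpy as np

def xdf_factorize(eri, tol=1e-12):
    """Double factorization of a real, 8-fold symmetric tensor (pq|rs):
    eigendecompose the n^2 x n^2 matrix indexed by (pq),(rs), then
    diagonalize each symmetric eigenvector matrix V^t."""
    n = eri.shape[0]
    g, vecs = np.linalg.eigh(eri.reshape(n * n, n * n))
    order = np.argsort(-np.abs(g))             # order leafs by |g_t| for truncation
    leafs = []
    for t in order:
        if abs(g[t]) < tol:
            continue                           # at most n(n+1)/2 non-zero eigenvalues
        V = vecs[:, t].reshape(n, n)           # symmetric for 8-fold symmetric input
        lam, U = np.linalg.eigh(V)             # V^t = U^t diag(Lambda^t) (U^t)^T
        leafs.append((U, g[t] * np.outer(lam, lam)))  # Z^t_kl = Lambda_k g^t Lambda_l
    return leafs
```

Contracting Σ_t Σ_kl U^t_pk U^t_qk Z^t_kl U^t_rl U^t_sl over all returned leafs recovers the input tensor exactly; truncating the list gives the approximate representation described above.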
In (R)C-DF the rank-one constraint on the Z^t_kl is lifted and they are allowed to be arbitrary symmetric matrices. The orbital rotations U^t_pk and coefficients Z^t_kl are then obtained using a two-step gradient based optimization procedure, by first parametrizing the orbital rotations exponentially as U^t_pq := exp(X^t)_pq via anti-symmetric generators X^t_pq and then minimizing the squared Frobenius norm (in [16] this is called the incoherent error)

  Σ_pqrs (∆_pqrs)²  with  ∆_pqrs := (pq|rs) − Σ_t Σ_kl U^t_pk U^t_qk Z^t_kl U^t_rl U^t_sl    (8)

of the difference between the left and right hand side of (5) for some pre-set n_t ≤ n(n+1)/2, starting from a truncated X-DF initial guess (for details see [18]).
Irrespective of whether a Hamiltonian representation of the form (6) was found via X-DF or (R)C-DF, the energy can then be measured by means of a quantum circuit with a linear gate depth overhead of the form shown in Fig. 1.
[Fig. 1 caption (fragment): ... (6). From each U_∅ and U^t the parameters of a square-shaped fabric of Givens gates G can be computed. The results of Ẑ and Ẑ ⊗ Ẑ measurements in these n_t + 1 distinct bases can then be contracted against the F^∅_k and Z^t_kl tensors to obtain an energy estimator.]

The most favorable resource estimates of quantum algorithms for error corrected simulation of chemistry with qubitization have been obtained with THC [17,11]. In this context THC approximates the two-body part of the Hamiltonian according to

  (pq|rs) ≈ Σ_{k,l=1}^M χ^k_p χ^k_q ζ_kl χ^l_r χ^l_s    (9)

with Σ_k^M χ^k_p χ^k_q ≤ 1 and M ≤ n² the THC rank. Also here the symmetric ζ_kl and rectangular χ^k_p are found by means of minimizing the Frobenius norm error. When DF is used with qubitization [12,16] it is usually presented and performed according to

  Ĥ₂ = (1/2) Σ_t ( Σ_pq L^t_pq Ê_pq )²,    (10)

where the L^t_pq can be found with Cholesky decomposition or eigendecomposition as L^t_pq = √g^t V^t_pq, and the scalars λ^t_k = √g^t Λ^t_k and the U^t are the eigenvalues and diagonalizing unitaries of the L^t_pq (here g^t, V^t_pq, and Λ^t_k refer to the quantities introduced in the context of X-DF above). For a NISQ measurement scheme such as that in Fig. 1 it makes no sense to partially discard a leaf (because the measurement data is available for all contributions from a leaf), but in qubitization, truncation can be done on the level of setting individual λ^t_k equal to zero. The (R)C-DF form of the Hamiltonian, in which Z^t_kl is no longer rank one, can be re-cast as a sum of operator squares similar to (10). By taking the matrix square root W^t_kl := (√Z^t)_kl, so that Z^t_kl = Σ_i^n W^t_ki W^t_li (we use the implementation in scipy of the algorithm from [24], which works also for non-positive Z^t_kl matrices, in which case the W^t_ki come out complex, which is compatible with the scheme from [16]), one can write

  Ĥ₂ = (1/2) Σ_t Σ_i ( Σ_pq L^{t,i}_pq Ê_pq )²  with  L^{t,i}_pq := Σ_k U^t_pk W^t_ki U^t_qk.    (12)

This allows one to run the algorithm of [16] with (R)C-DF Hamiltonians as input.
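The matrix square root step can be reproduced with scipy (presumably `scipy.linalg.sqrtm`, which implements a blocked Schur algorithm along the lines of [24]); for an indefinite symmetric stand-in for Z^t the root comes out complex but still reproduces Z^t = W W^T:

```python
import numpy as np
from scipy.linalg import sqrtm

# A symmetric but indefinite stand-in for a Z^t matrix
# (illustrative values, not from the paper).
Z = np.array([[2.0, 0.5, 0.0],
              [0.5, -1.0, 0.3],
              [0.0, 0.3, 1.5]])

W = sqrtm(Z)   # complex, since Z has a negative eigenvalue
# The principal square root of a symmetric matrix is itself symmetric,
# so W @ W.T equals W @ W and reproduces Z:
assert np.iscomplexobj(W)
assert np.allclose(W @ W.T, Z, atol=1e-10)
```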
The block encoding of the qubitization method needs the Hamiltonian in the form of an LCU. The number of ancillary qubits and T gates is then determined by the number of terms of the LCU and a sort of normalization factor, called the lambda parameter [16,17]. The result of (truncated) X-DF, C-DF, and RC-DF is itself an LCU with a lambda factor

  λ^LCU_DF = Σ_k |F^∅_k| + (1/2) Σ_t Σ_kl |Z^t_kl|.    (13)

Alternatively, because of (12), one can use the algorithm from [16], which achieves a contribution from the two-body part of the Hamiltonian to lambda of (1/4) Σ_t ∥(L^t_pq)_pq∥₁² for Hamiltonians of the form (10) (where ∥·∥₁ is the Schatten 1-norm), and using (12) one can thus obtain

  λ^Burg_DF = Σ_k |F^∅_k| + (1/4) Σ_t Σ_i ∥(L^{t,i}_pq)_pq∥₁².    (14)

The difference between λ^Burg_DF and λ^LCU_DF stems from the different LCU representations of the Hamiltonian: the latter uses the equation in (6), which is evidently an LCU whose one-norm is (13), while the former uses the algorithm and LCU of [16]. Finally, for THC, Lee et al. [17] have obtained a lambda of

  λ^Lee_THC = Σ_k |F^∅_k| + (1/2) Σ_{k,l=1}^M |ζ_kl|.    (15)

The precise run times of the algorithms corresponding to the different lambda values differ and depend on factorization-specific quantities such as the THC rank M, but they all scale like their respective lambda divided by the allowable phase estimation energy error, times the sum of run times of certain circuit primitives, plus logarithmic overheads. The differences between the lambda values have turned out to outweigh the influence of other factors when comparing algorithm run times for similar overall target accuracies [17].
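The two-body contribution (1/4) Σ_t ∥L^t∥₁² quoted above can be evaluated with a few lines of NumPy; a sketch assuming a list of real symmetric L^t matrices (hypothetical helper, not from the paper's code):

```python
import numpy as np

def lambda_two_body(L_list):
    """Two-body lambda contribution (1/4) * sum_t ||L^t||_1^2 from [16],
    with ||.||_1 the Schatten 1-norm, i.e. the sum of the absolute
    eigenvalues, since each L^t here is real symmetric."""
    total = 0.0
    for L in L_list:
        schatten1 = np.sum(np.abs(np.linalg.eigvalsh(L)))
        total += schatten1 ** 2
    return total / 4.0
```

For example, a single 2 × 2 identity leaf has Schatten 1-norm 2 and thus contributes 2²/4 = 1 to lambda.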

Regularized compressed double factorization
While C-DF allows one to reduce the number of leafs needed for good accuracy from close to n² for X-DF to roughly linear in n while maintaining an approximate but sufficiently accurate representation of the Hamiltonian, it turns out that the optimization of C-DF often converges to Z^t_kl tensors with very large entries. This is problematic since the variance of the NISQ energy estimator and both lambda parameters grow with the number and magnitude of the |Z^t_kl| values. To solve this issue, we propose to add to the C-DF cost function from (8) a regularization term penalizing large |Z^t_kl| via a tensor of weights ρ_tkl ≥ 0:

  Σ_tkl ρ_tkl |Z^t_kl|^γ.    (17)

We have tested both weighted L1 and L2 regularizations of this form with γ ∈ {1, 2} but concentrate on L2 regularization with uniform regularization strength ρ_tkl = ρ in the rest of the main text.
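A minimal sketch of the regularized cost, with the residual ∆_pqrs of (8) formed by a single einsum (function name and array layout are our assumptions, not the paper's implementation):

```python
import numpy as np

def rcdf_cost(eri, U, Z, rho, gamma=2):
    """Squared Frobenius error of the (R)C-DF fit plus the weighted
    regularization term sum_{tkl} rho_{tkl} |Z^t_kl|^gamma.
    U: (n_t, n, n) orbital rotations, Z: (n_t, n, n) symmetric cores,
    rho: scalar or (n_t, n, n) tensor of non-negative weights."""
    approx = np.einsum('tpk,tqk,tkl,trl,tsl->pqrs', U, U, Z, U, U)
    frob2 = np.sum((eri - approx) ** 2)       # squared Frobenius norm error
    reg = np.sum(rho * np.abs(Z) ** gamma)    # L1 (gamma=1) or L2 (gamma=2) penalty
    return frob2 + reg
```

With ρ = 0 this reduces to the plain C-DF cost; for an exact fit the cost equals the regularization term alone.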
As in C-DF, a joint optimization of the U^t_pq and Z^t_kl has unfavorable performance also with regularization, but the two-step optimization of C-DF proposed in [18] can be adapted to the regularized case. Further, for large n, a very expensive 6-index matrix inversion can be circumvented by carrying it out in a matrix-free manner with, e.g., a conjugate gradient algorithm (for details see Appendix A). We have found that this step benefits from the L2 regularization, as it improves the conditioning of the matrix. In RC-DF, initialization can be done either from X-DF truncated to the target number of C-DF leafs, or one can start from the full X-DF factorization and put a high penalty on the leafs that are to be truncated in the end. In practice, contrary to the difficulties reported in [11,17] on converging THC, RC-DF seems to be rather well behaved. Convergence may take thousands of iterations, but we had no difficulty converging RC-DF in large active spaces to much tighter residual Frobenius norm errors (8) and coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) energy errors than those reported for THC [11] (see Appendix A for further details of the optimization procedure).
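The matrix-free strategy can be illustrated with scipy's conjugate gradient solver. The sketch below solves a generic regularized least-squares system through a `LinearOperator`, standing in for the actual RC-DF normal equations (the matrix A and the dimensions are illustrative, not from the paper):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Illustrative stand-in for the Z^t_kl update: solve the regularized
# normal equations (A^T A + rho * I) z = A^T b without ever forming
# A^T A explicitly, just as one avoids forming the 6-index matrix.
rng = np.random.default_rng(0)
m, d, rho = 50, 20, 1e-3
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

def matvec(z):
    # Only the action of the matrix is needed; in RC-DF this would be
    # a sequence of tensor contractions against the U^t_pk.
    return A.T @ (A @ z) + rho * z

op = LinearOperator((d, d), matvec=matvec)
z, info = cg(op, A.T @ b)
assert info == 0  # converged; rho > 0 improves the conditioning
```

The `rho * z` term is exactly what the L2 regularization contributes, shifting the spectrum away from zero and thereby speeding up CG convergence.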

Numerical results
Unless otherwise explicitly stated, all results shown in the following were obtained with L2 regularization, starting from a truncated X-DF guess, and with a uniform regularization factor ρ_tkl =: ρ for all n_t leafs.
We first investigate the advantages of RC-DF over other NISQ measurement schemes. The performance of any such scheme is determined by both the systematic error introduced in case the Hamiltonian is approximated and the variance, Var, of the estimator. We quantify the overall performance by the mean squared error and take the square root to obtain a quantity that has units of energy,

  √MSE := sqrt( ⟨Ĥ − Ĥ′⟩² + Var ),

where Ĥ′ is the compressed Hamiltonian obtained with the respective flavor of double factorization; for the Pauli grouping based schemes Ĥ′ = Ĥ.
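A small helper makes the bias-variance split explicit; the only assumption beyond the definition above is the 1/N shot-noise scaling of the variance of a mean:

```python
import numpy as np

def sqrt_mse(bias, variances_per_shot, shots):
    """sqrt(MSE) of an energy estimator: systematic error from the
    approximate Hamiltonian plus shot noise. variances_per_shot holds
    the single-shot variance of each measurement basis and shots the
    number of shots allocated to it (the variance of a mean is Var/N)."""
    var = np.sum(np.asarray(variances_per_shot) / np.asarray(shots))
    return np.sqrt(bias ** 2 + var)
```

For instance, with zero bias, a single basis of single-shot variance 4, and 4 shots, √MSE = 1.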
The variance further depends on the state, the overall shot budget, and the shot distribution. In the main text we show data for the state being the complete active space configuration interaction (CASCI) ground state, but the plots look very similar for representative states along a VQE optimization trajectory. We consider two shot distribution schemes: a "uniform" distribution, which divides the total number of shots uniformly among all bases in which measurements need to be performed, and an "according to weights" distribution, which distributes the shots according to the L2 norm of the coefficients of each group of jointly measurable Pauli operators. We chose the overall shot budget so that the best method is able to achieve chemical accuracy of 10⁻³ Hartree.

[Fig. 2 caption (fragment): ... in comparison with the naive termwise Pauli scheme as well as the tensor product basis [28] and a minimum clique cover [6] based Pauli grouping methods as implemented in [29], for 3 × 10⁵ shots each, in computing a single point energy in the (6e, 6o) CASCI ground state of para-benzyne on 12 qubits. To reach a MSE on par with that of RC-DF at n_t = 7 would require over 1.1 × 10⁶ shots with X-DF and n_t = 17 and about 3.8 × 10⁶ shots with the best Pauli grouping scheme. We also compare shot distribution schemes (see main text) and find that "according to weights" improves performance.]
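The two shot distribution schemes can be sketched as follows (hypothetical helper, not from the paper's code; integer rounding is handled by giving the remainder to the largest group):

```python
import numpy as np

def allocate_shots(weights, total_shots, scheme="according to weights"):
    """Distribute a total shot budget over measurement bases.
    'uniform' splits evenly; 'according to weights' splits in proportion
    to the (L2-norm-based) weight of each jointly measurable group."""
    weights = np.asarray(weights, dtype=float)
    if scheme == "uniform":
        frac = np.full(len(weights), 1.0 / len(weights))
    else:
        frac = weights / weights.sum()
    shots = np.floor(frac * total_shots).astype(int)
    shots[np.argmax(frac)] += total_shots - shots.sum()  # hand out the remainder
    return shots
```

With weights (1, 1, 2) and a budget of 400 shots, the weighted scheme assigns (100, 100, 200), directing more shots to the heavier group.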

NISQ measurement of the para-benzyne ground state
As a first test case we consider the CASCI ground state in a (6e, 6o) active space of FON-HF/cc-pVDZ (with fractional occupations determined with temperature 0.1 [1/Eh] and Gaussian broadening within the active space) [25,26,27] orbitals of para-benzyne (see Fig. 2). The weighted shot distribution yields lower variance than uniform shot distribution in all cases by directing more shots to the more important first leafs. RC-DF beats the second best method (X-DF) by approximately a factor of five, and while the maximum number 21 = 6(6+1)/2 of leafs yields the lowest MSE, chemical accuracy is consistently achievable with RC-DF from n_t = 7 on. The MSE of C-DF fluctuates widely, and while it by chance achieves a good variance and approximation error for 8 leafs, this is of limited use for practical applications. The results are virtually unchanged over a broad range of ρ values, and good values for ρ can be found in a systematic way even when running on quantum hardware (see Appendix B).

NISQ measurement of the singlet-triplet gap of naphthalene
As a second test case, we investigate the singlet-triplet energy gap in a (10e, 10o) active space consisting of the π system of naphthalene constructed with AVAS [30] as implemented in PySCF [31,32]. The reference state was computed at the HF/def2-SVP [33] level of theory. We only show data for double factorization based measurement schemes, since the Pauli grouping based schemes become increasingly less competitive for larger systems. Since the "according to weights" method is significantly better than uniform shot distribution, we use it exclusively in this case. We compute the factorization once per n_t and then evaluate the energetically lowest singlet and triplet energies from the same decomposition. The MSE of the singlet-triplet energy gap is given by the square of the difference between the noiseless energy gap ∆ computed with the exact Hamiltonian and the gap ∆′ from the compressed Hamiltonian, plus the sum of the variances Var_S and Var_T of the singlet and triplet energies,

  MSE = (∆ − ∆′)² + Var_S + Var_T.

We find the variances Var_S and Var_T of both states to be very similar, and for (R)C-DF the MSE is dominated by the variance contributions for n_t ≥ 8, whereas for X-DF the systematic energy error remains high until n_t ≈ 25. Surprisingly, RC-DF reaches chemical accuracy already at n_t = 6 with the assigned shot budget and is consistently more accurate than the other two factorization methods for all n_t > 2. By contrast, the variance of C-DF singlet-triplet gap estimations is quite erratic, as explained previously. While the accuracy in predicting the energy gap with X-DF is well controllable, an approximately 12 times larger shot budget and more leafs would be needed to reach chemical accuracy. This test case shows the reliable performance of RC-DF for medium-sized active spaces and indicates that the method could be employed for other chemical properties such as activation energies.

Combination with fluid fermionic fragments
We further explore how X-DF and RC-DF can be combined with the fluid fermionic fragments (FFF) [21] technique to further reduce shot budgets. The FFF technique exploits the fact that certain quadratic terms of the Hamiltonian, called fluid fermionic fragments, can be taken care of in different parts of the energy estimator. FFF minimizes the variance by optimizing how these terms are spread over the different possible locations (for more details see Appendix G). The variance can thereby either be approximated with that of a mock state, whose variance can be classically efficiently computed, or it can for example be estimated with part of the shot budget or from a classical shadow [34,35], which can then also be used for the energy estimation. In any case, the final form of the FFF optimized energy estimator is state dependent and optimized to have low variance for certain states. We find (see Figure 8 in Appendix G) that for the cases considered RC-DF and FFF nicely complement each other: using RC-DF as the initial point for FFF yields the lowest shot budgets, the FFF optimization converges faster when started from RC-DF than from X-DF, and the state independent distribution of the fluid fermionic fragments corresponding to the Hamiltonian as written in (6) is a good initial guess for the FFF coefficients.

[Fig. 4 caption (fragment): ... λ^Lee_THC for THC. The color scheme represents the number of leafs n_t for double factorization schemes, or the THC rank M. The active space Hamiltonian of the Cpd I model of cytochrome P450 and the data for THC and truncated DF were taken from [11]. The encircled THC data point was used for the resource estimates there. To compare different levels of convergence we vary the squared Frobenius norm error (8) at which we abort the RC-DF optimization (Conv. Tol.) and use ρ = 10⁻³. The data is tabulated in Appendix E.]

RC-DF for error corrected quantum computing
We now turn to exploring the usefulness of RC-DF for error corrected algorithms based on qubitization.
As an example we take the (34α+29βe, 58o) active space of the Cpd I species of cytochrome P450 proposed in [11]. As this system is beyond the regime accessible with CASCI, we use, as in [17,11], the CCSD(T) energy error as a heuristic to assess the quality of the compressed representation. Computational details can be found in Appendix E.
The aim is then to find a compressed representation of the Hamiltonian that achieves both a low CCSD(T) error and a low lambda value. We find λ^Burg_DF < λ^LCU_DF in all cases and thus only display and compare the former. As can be seen in Fig. 4, RC-DF outperforms both previous DF methods and THC by a substantial amount. Compared to truncated DF or X-DF we can almost cut λ^Burg_DF in half, and thereby also the run time of the quantum computer, while at the same time achieving a CCSD(T) error smaller than 5 × 10⁻⁵, which was not reached with THC [11]. Further comparing with THC, we can reduce the CCSD(T) error by an order of magnitude and the lambda value by roughly one third or, alternatively, RC-DF can achieve a λ^Burg_DF that is only 60% of λ_THC at comparable CCSD(T) error. This improvement becomes even more noteworthy when recognizing that when qubitizing the RC-DF Hamiltonian one need not worry about the complications caused by the non-orthogonal nature of THC that had to be worked around in [17] (see Section III of [17]). Converging the best RC-DF data point with n_t = 100 took 2944 L-BFGS iterations and approximately 11 hours on a single GPU with our not highly optimized JAX [36] based implementation.
To investigate the scaling of the achievable lambda values with system size we considered the hydrogen chain benchmark in the STO-6G basis previously proposed in [12,17]. Also there we find that at n = 100 orbitals (corresponding to 100 hydrogen atoms) RC-DF achieves values of λ^Burg_DF that are lower than previously reported values for λ_THC, and we find, as for THC, an approximately linear scaling of λ^Burg_DF with n if n_t is chosen such that the CCSD(T) error per particle is constant (see Appendix C for details).
It should be clear that the lambda values, while highly significant for determining the quantum resources of fault tolerant algorithms, are not the only relevant quantity and we do not claim that our analysis constitutes a comprehensive comparison of the quantum resources required for different methods.We also leave to future work the possibility of regularizing methods such as RC-DF or THC with quantities more directly related to the required quantum resources than the norm-like term in (17), which could more directly steer the optimization towards factorizations with low qubit or T gate count, as desired.

Conclusions
We proposed the regularized compressed double factorization (RC-DF) method which, from a unified framework, yields both a NISQ compatible measurement scheme with only linear circuit overhead and a representation that can be used in conjunction with qubitization in error corrected quantum algorithms for the simulation of chemistry. We found that in both of these scenarios RC-DF leads to lower quantum run times when compared to previous double factorization (DF) and tensor hypercontraction (THC) schemes. Contrary to THC, the Hamiltonian in DF form can also be used to construct Trotter schemes that only need to alternate between a very small number of non-commuting operators. It will be interesting to compare the quantum resources (e.g. the number of Toffoli gates) required for phase estimation based on THC and RC-DF via qubitization with those for RC-DF via trotterization.
In the NISQ setting, this advantage is a consequence of the fact that the regularization guides the optimization towards compressed representations of the Hamiltonian with smaller coefficients, which reduces the variance of the resulting energy estimator.In qubitization schemes, the smaller coefficients reduce the norm-like lambda parameter of the Hamiltonian on which the T gate count depends in a multiplicative fashion.
Avoiding a six-index intermediate quantity during the RC-DF optimization and adopting a two step gradient based scheme previously developed by some of us for non-regularized compressed DF, we were able to make RC-DF scale well into the regime where quantum computers may provide an advantage over classical methods.More work is needed to understand the precise scaling of RC-DF with active space and basis set size and to explore other options for regularization.

Data Availability
The data supporting the findings of this manuscript have been uploaded to Zenodo with the DOI 10.5281/zenodo.7866658 [37], including instructions on how to load the data using Python.

Appendix A: RC-DF optimization procedure
The RC-DF cost function is

  C(X, Z) := Σ_pqrs (∆_pqrs)² + Σ_tkl ρ_tkl |Z^t_kl|^γ    (18)

with regularization tensor ρ_tkl, ∆_pqrs as defined in (8), and γ = 1 in the L1 case and γ = 2 in the L2 case. Just like for C-DF as outlined in [18], also for RC-DF it is advisable to use a nested two-step optimization process to minimize C with respect to X and Z following these steps:

1. Update the X^t_pq for fixed Z^t_kl. This can be done with a gradient based optimizer.

2. Determine the optimal Z^t_kl given the updated X^t_pq by solving the stationarity condition

  ∂C/∂Z^t_kl = 0.    (20)

The gradient with respect to the orbital rotations' generators X^t_pq for a given Z^t_kl is independent of the regularization and stays unchanged compared to the original C-DF, as the U^t_pq do not appear in the regularization contribution to the cost function.

Let us now look at the second step. In the L2 case, (20) yields, after replacing ∆_pqrs by its expression from (8), an equation whose left-hand side is independent of the unknowns Z^o_mn, i.e., constant, and whose right-hand side can be written as a linear combination of the unknowns Z^o_mn. Therefore, we have a system of linear equations for the vectorized unknowns.

[Figure caption (fragment): ... compared to the well known one-over-square-root law. As can be nicely seen, there is a rather large window ρ ∈ [10⁻⁶, 10⁻²] where both errors (and thus their sum) are well below 10⁻³.]

In practice one finds a good ρ by estimating the standard deviation on a quantum device, starting from a comparably large regularization, and then reducing the regularization to lower the systematic error until the variance becomes too large to be compensated for by the affordable shot budget.
compared, again for 10 ≤ n ≤ 100. The "naïve" λ₂ is found to grow approximately proportionally to n^3.16, to values well beyond 5 × 10⁶. The λ₂ achievable with the non-orthogonal basis THC based qubitization proposed in [17] (which corresponds to the two-body part of λ^Lee_THC) is found to only grow approximately proportionally to n^1.16 and reaches approximately 4 × 10² at n = 100. Fig. 9 of the same work compares the full lambda values (including the one-body contribution) for different factorization methods, including THC (the main focus of that work and the most competitive of the factorization methods in this plot) and the Cholesky based DF described around (10). The parameters of these methods are chosen to yield a CCSD(T) correlation energy error per atom of at most 50 micro Hartree, amounting to 0.5 × 10⁻³ Hartree at n = 100. The authors find a scaling of roughly n^1.88 for DF and of n^1.11 for THC, and at n = 100 values of λ > 2 × 10³ for DF and λ ≈ 5 × 10² for THC.
With a regularization of ρ = 5 × 10⁻⁵ and even just a linear number of leafs n_t = ⌊(n + 1)/2⌋ (red line in Fig. 7), we find that RC-DF (with two-norm regularization γ = 2) is able to yield a constant Frobenius norm error and an absolute CCSD(T) error per atom |∆CCSD(T)|/n in line with the 50 micro Hartree per atom used in Fig. 9 of [17] (see the green line in Fig. 7b). At these parameters λ^Burg_RC-DF is found to scale approximately like n^{1.08±0.10} (fit through the values for 70 ≤ n ≤ 100) and reaches approximately 2.5 × 10² at n = 100, roughly a factor of two better than the THC results from [17]. Compared to X-DF, which requires n_t to grow faster than linearly to obtain acceptable accuracy (see the blue and yellow lines in Fig. 7b), RC-DF yields roughly one order of magnitude lower lambda values at n = 100, mostly owing to the fact that we find that the X-DF lambda values scale roughly quadratically with n. λ^Burg and λ^LCU seem to have a similar scaling for all double factorization schemes considered here.
Appendix D: Necessary and sufficient condition for the symmetry of the V^t_pq

In this section we present a proof that 8-fold symmetry of the (pq|rs) tensor is a sufficient condition for the V^t_pq to be symmetric for every t with eigenvalue g^t ≠ 0. Hence the Z^t_pq are real, which is required for the X-DF procedure to work, and the U^t_pq are orthogonal (and thus can be chosen to be special orthogonal without loss of generality), which is essential for their implementation on a quantum computer by means of a fabric of Givens rotations. Slightly abusing notation, in the following lemma we use (pq|rs) for a four index tensor that does not necessarily arise from electron overlap integrals but has the stated properties.

Lemma 1. Let (pq|rs) be any real, symmetric, i.e., (pq|rs) = (rs|pq), tensor of shape n × n × n × n. By grouping the indices pq and rs, let (g^t)_t be its n² eigenvalues, V^t_pq its diagonalizing unitary, and let T₊ = {t : g^t > 0}, T₋ = {t : g^t < 0}, and T := T₊ ∪ T₋. Then (pq|rs) can be written in the form

  (pq|rs) = Σ_{t∈T} g^t V^t_pq V^t_rs.    (32)

Further, if and only if (pq|rs) is in addition 8-fold symmetric, i.e.,

  (qp|rs) = (qp|sr) = (pq|sr) = (pq|rs) = (rs|pq) = (sr|pq) = (sr|qp) = (rs|qp),

the matrices (V^t_pq)_pq are symmetric, i.e., V^t_pq = V^t_qp for all t ∈ T, and |T| ≤ n(n + 1)/2.
Proof. That symmetry of the V^t_pq implies 8-fold symmetry of (pq|rs) can be verified directly from (32), and whenever the V^t_pq are symmetric, since they are also orthogonal for different values of t, there can be at most n(n + 1)/2 of them, and thus the bound on the size |T| of T holds.
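Lemma 1 can also be checked numerically. The sketch below symmetrizes a random tensor over all eight index permutations and verifies both the bound |T| ≤ n(n+1)/2 and the symmetry of the V^t for all non-zero eigenvalues:

```python
import numpy as np

# Build a random 8-fold symmetric tensor by averaging over all
# eight permutations of (p,q,r,s) that must leave it invariant.
n = 4
rng = np.random.default_rng(1)
t0 = rng.standard_normal((n, n, n, n))
perms = [(0, 1, 2, 3), (1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2),
         (2, 3, 0, 1), (3, 2, 0, 1), (2, 3, 1, 0), (3, 2, 1, 0)]
eri = sum(np.transpose(t0, p) for p in perms) / 8.0

g, vecs = np.linalg.eigh(eri.reshape(n * n, n * n))
nonzero = np.abs(g) > 1e-10
# The antisymmetric subspace of dimension n(n-1)/2 lies in the kernel:
assert nonzero.sum() <= n * (n + 1) // 2
# Every eigenvector with g_t != 0 reshapes to a symmetric matrix V^t:
for t in np.flatnonzero(nonzero):
    V = vecs[:, t].reshape(n, n)
    assert np.allclose(V, V.T, atol=1e-8)
```

The check works even for degenerate non-zero eigenvalues, since any eigenvector with g^t ≠ 0 lies in the range of the matrix, which is contained in the symmetric subspace.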
To show that 8-fold symmetry implies symmetry of all V^t_pq with t ∈ T, let us first define

  (pq|rs)_± := Σ_{t∈T_±} g^t V^t_pq V^t_rs.

For p = r and q = s we can specialize this to

  (pq|pq)_± = Σ_{t∈T_±} g^t (V^t_pq)².

The symmetry (pq|pq)_± = (qp|qp)_± together with (32) implies that, for each sign, the quantities Σ_{t∈T_±} |g^t| (V^t_pq)² and Σ_{t∈T_±} |g^t| (V^t_qp)² can be identified with the left and right hand sides of the Cauchy-Schwarz inequalities for the inner products of the vectors (√|g^t| V^t_pq)_{t∈T_±} and (√|g^t| V^t_qp)_{t∈T_±}. These Cauchy-Schwarz inequalities are thus fulfilled with equality, which is the case if and only if the two vectors are identical up to a prefactor, i.e., for each p, q there must exist a scalar α such that V^t_pq = α V^t_qp for all t ∈ T_±. Since (pq|rs) = (qp|rs), the prefactor must be α = 1, so that V^t_pq = V^t_qp for all t ∈ T. □

Appendix G: Comparison of RC-DF and FFF

The fluid fermionic fragments (FFF) method is based on the fact that some contributions to the Hamiltonian can be moved back and forth freely between the second and third term of the electronic structure Hamiltonian as written in (1). In the fermionic picture, these "fluid" parts of the Hamiltonian correspond to terms that are quadratic in the creation and annihilation operators (see (9) and (10) in Ref. [21]) and which, after diagonalization of the quadratic part, contribute to the terms proportional to particle number operators, which under Jordan-Wigner yield Pauli Ẑ operators.
Here we present the FFF method in the qubit picture. To that end, starting from (1), we first factorize the (pq|rs) part of the Hamiltonian only, which yields an expression in which U′_∅ and C^∅_k are the diagonalizing unitaries and eigenvalues of C_pq. The case of all c^t_k = 0 corresponds to how X-DF was introduced in [18], and this was taken as the prior-art benchmark in [21]. The other case corresponds to the way we wrote the Hamiltonian in (6), with no single-qubit Ẑ contributions in the two-body leafs. It turns out that neither of these choices is optimal with respect to variance and thus shot count, and optimizing the c^t_k can yield further improvements. Optimization can be done with a gradient-based optimizer; in [21], as well as here, we used L-BFGS-B as implemented in scipy. The number of shots is then optimized, via a proxy state, using a nested loop: in each iteration the coefficients c^t_k are updated using the partial derivatives at fixed shot distribution, and then the shots are optimally distributed according to the variances computed with the new c^t_k in the proxy state. For simplicity and better comparability with the results from [21] we use the exact ground state as the proxy state. This is not efficient, but [21] found little difference between using the real ground state and an approximate proxy state for which variances can be computed efficiently. We compute the needed derivatives by means of fully auto-differentiable code that computes the variances as a function of the c^t_k [29,36,41]. For some cases for which we have performed simulations (see Figure 8) we find that all c^t_k = 0 and/or X-DF with the c^t_k corresponding to (6) is a local minimum, and hence gradient-based optimization of the c^t_k does not work from these starting points. We consistently found good final shot counts from initializations drawn uniformly at random from [0, 1[. Alternatively, one can initialize from coefficients according to (88), which leads to faster convergence but seems to yield the same or very similar final shot budgets. Overall we find that combining RC-DF with FFF yields the lowest shot budgets. The term mapping used in (6) is significantly better than choosing all c^t_k = 0, and RC-DF (with and without FFF) consistently outperforms X-DF.
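The shot-distribution step of the nested loop admits a simple closed form: given per-shot variances Var_t of the measurement bases (computed in the proxy state), minimizing the total estimator variance Σ_t Var_t / m_t under a fixed budget Σ_t m_t puts m_t proportional to the standard deviation √Var_t. The function below is a simplified illustration of that step under these assumptions, not the actual implementation used here.

```python
import numpy as np

def allocate_shots(variances, total_shots):
    """Distribute total_shots over measurement bases so as to minimize
    sum_t variances[t] / m_t subject to sum_t m_t = total_shots.
    The Lagrange condition gives m_t proportional to sqrt(variances[t])."""
    sigma = np.sqrt(np.asarray(variances, dtype=float))
    fractions = sigma / sigma.sum()
    return np.floor(fractions * total_shots).astype(int)

# A basis with 4x the per-shot variance receives 2x the shots.
shots = allocate_shots([4.0, 1.0, 1.0], 300_000)  # -> [150000, 75000, 75000]
```

In the full FFF loop, the variances fed to this allocation change in every iteration as the c^t_k are updated, so allocation and coefficient updates alternate until convergence.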

Figure 1: DF measurement scheme according to the LCU decomposition in (6). From each U_∅ and U_t the parameters of a square-shaped fabric of Givens gates G can be computed. The results of Ẑ and Ẑ ⊗ Ẑ measurements in these n_t + 1 distinct bases can then be contracted against the F^∅_k and Z^t_kl tensors to obtain an energy estimator.
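The contraction step of this measurement scheme can be sketched as follows. The tensor shapes and the handling of the diagonal k = l entries are hypothetical conventions chosen for illustration; the actual signs, offsets, and index ranges follow (6).

```python
import numpy as np

def df_energy_estimate(F0, Zt, z_means, zz_means):
    """Contract the one-body coefficients F0[k] with the <Z_k> estimates
    measured in the U_0 basis, and the two-body tensors Zt[t, k, l] with
    the <Z_k Z_l> estimates measured in basis U_t, to form the energy
    estimator of the LCU decomposition (constant offsets omitted)."""
    one_body = np.dot(F0, z_means)
    two_body = np.einsum('tkl,tkl->', Zt, zz_means)
    return one_body + two_body
```

In practice z_means and zz_means would be empirical means accumulated from the Ẑ and Ẑ ⊗ Ẑ shot outcomes recorded in each of the n_t + 1 Givens-rotated bases.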

Figure 2: Performance of measurement schemes based on C-DF, X-DF, and RC-DF (with L2 regularization ρ = 10^-6) in comparison with the naive termwise Pauli scheme as well as the tensor product basis [28] and minimum clique cover [6] based Pauli grouping methods as implemented in [29], for 3 × 10^5 shots each, in computing a single-point energy in the (6e, 6o) CASCI ground state of para-benzyne on 12 qubits. To reach an MSE on par with that of RC-DF at n_t = 7 would require over 1.1 × 10^6 shots with X-DF and n_t = 17, and about 3.8 × 10^6 shots with the best Pauli grouping scheme. We also compare shot distribution schemes (see main text) and find that "according to weights" improves performance.

Figure 4: Comparison of the achievable CCSD(T) error heuristic and lambda values λ^DF_Burg for the truncated DF method based on (10), X-DF, and RC-DF, as well as λ^THC_Lee for THC. The color scheme represents the number of leafs n_t for the double factorization schemes, or the THC rank M. The active space Hamiltonian of the Cpd I model of cytochrome P450 and the data for THC and truncated DF were taken from [11]. The encircled THC data point was used for the resource estimates there. To compare different levels of convergence we vary the squared Frobenius norm error (8) at which we abort the RC-DF optimization (Conv. Tol.) and use ρ = 10^-3. The data is tabulated in Appendix E.

Figure 6: Standard deviation of the ground state energy estimator of para-benzyne for a budget of 300,000 shots as a function of ρ for different numbers of leafs n_t.
discussions regarding C-DF. QC Ware Corp. acknowledges generous funding from Covestro for the undertaking of this project. Covestro acknowledges funding from the German Ministry for Education and Research (BMBF) under the funding program quantum technologies as part of project HFAK (13N15630).
Figure 5: RC-DF systematic ground state energy approximation error of para-benzyne as a function of the regularization factor ρ for different numbers of leafs n_t.
8-fold symmetry of (pq|rs) implies 8-fold symmetry of |(pq|rs)| and thereby 8-fold symmetry of both (pq|rs)_+ and (pq|rs)_- individually. The symmetry (pq|rs)_± = (qp|rs)_± then implies a relation between the prefactors which is possible only if ±α = 1. Thus we have, as claimed, V^t_pq = V^t_qp ∀t ∈ T.

Table 2: X-DF performance summary for Cpd I

Table 3: Truncated DF performance summary for Cpd I. a) Referred to as L in Refs. [12,17]. c) Eigenvector screening threshold with which the accuracy of the factorization is tuned, see Ref. [12].