Quantum simulation of battery materials using ionic pseudopotentials

Ionic pseudopotentials are widely used in classical simulations of materials to model the effective potential due to the nucleus and the core electrons. Modeling fewer electrons explicitly results in a reduction in the number of plane waves needed to accurately represent the states of a system. In this work, we introduce a quantum algorithm that uses pseudopotentials to reduce the cost of simulating periodic materials on a quantum computer. We use a qubitization-based quantum phase estimation algorithm that employs a first-quantization representation of the Hamiltonian in a plane-wave basis. We address the challenge of incorporating the complexity of pseudopotentials into quantum simulations by developing highly-optimized compilation strategies for the qubitization of the Hamiltonian. This includes a linear combination of unitaries decomposition that leverages the form of separable pseudopotentials. Our strategies make use of quantum read-only memory subroutines as a more efficient alternative to quantum arithmetic. We estimate the computational cost of applying our algorithm to simulating lithium-excess cathode materials for batteries, where more accurate simulations are needed to inform strategies for gaining reversible access to the excess capacity they offer. We estimate the number of qubits and Toffoli gates required to perform sufficiently accurate simulations with our algorithm for three materials: lithium manganese oxide, lithium nickel-manganese oxide, and lithium manganese oxyfluoride. Our optimized compilation strategies result in a pseudopotential-based quantum algorithm with a total Toffoli cost four orders of magnitude lower than the previous state of the art for a fixed target accuracy.


Introduction
Quantum computing is being actively studied as a potential method to accurately simulate materials and support the development of nextgeneration lithium-ion batteries [34,79,41,20,21,80,89]. The driving motivation is that quantum algorithms are uniquely positioned to perform highly-accurate simulations without incurring prohibitive computational costs [65]. Nevertheless, considerable progress still needs to occur on both hardware and algorithms to make this promise a reality. The main theoretical challenges are the high cost of implementing quantum algorithms and the difficulty of identifying the applications.
There has been considerable progress in developing quantum algorithms for simulating the properties of molecules and materials. A variety of different strategies have been proposed, ranging from variational approaches designed for noisy hardware with few qubits [38, 105, 2, 20, 4], algorithms tailored for early fault-tolerant quantum computers [55,98,100,22], and variants of quantum phase estimation that require the full capabilities of large-scale fault-tolerant quantum computers [6, 78,43,87,70,49]. Particular attention has been devoted to improving the efficiency of quantum phase estimation and Hamiltonian simulation algorithms, which has led to an overall cost reduction of several orders of magnitude [78]. This progress has been fueled by innovations such as qubitization [60, 61, 12], Hamil-tonian factorization techniques [43,70,97,49], interaction-picture simulations [62,40,77], improved Trotter bounds [19], and first quantization methods [8,87].
Simulating bulk materials presents additional challenges beyond those associated with simulating finite molecules [93]. Arguably, among the most pressing ones is the frequent need to use large unit cells (supercells). For example, in the context of lithium-ion batteries, large supercells are needed to predict the most stable phases of cathode materials. Their energies calculated for different values of the lithium-ion concentration determine the voltage profile of the battery cell [94, 82,67]. The size of the supercell is even more critical for the simulation of chemical reactions at the electrode-electrolyte interface [99]. This results in systems with many hundreds of electrons requiring a very large number of planewave basis functions to achieve high-accuracy simulations [33].
In this work, we introduce a quantum algorithm that uses ionic pseudopotentials (PPs) to reduce the cost of using quantum phase estimation to simulate material, mirroring a strategy widely used for density functional theory simulations [81,51]. Replacing the bare Coulomb potential due to the nucleus and the core electrons with an effective potential leads to a substantial reduction in both the number of electrons and plane waves needed to accurately represent the system. However, this is accomplished at the price of a more complicated pseudopotential operator describing the interaction between the valence electrons and the effective ionic cores. This greatly complicates the implementation of the resulting quantum algorithm and can negate the benefits of reducing the number of electrons and plane waves.
We tackle this challenge by deriving highlyoptimized implementation strategies for qubitization-based quantum phase estimation in first quantization. We focus on the Hartwigsen-Goedecker-Hutter (HGH) pseudopotentials [29] and develop a tailored linear combination of unitaries decomposition for the pseudopotential term of the Hamiltonian. We also carefully engineer the compilation of the qubitization encoding to reduce the implementation cost. To avoid costly quantum arithmetic, our methods make frequent use of quantum read-only memory (QROM) subroutines. To the best of our knowledge, this is the first example of a quantum algorithm that can incorporate pseudopotentials. Overall, simulations with our algorithm requires orders-of-magnitude fewer plane waves to reach convergence for typical materials than comparable all-electron calculations. This results in circuit-depth reductions of many orders of magnitude for a given target accuracy.
However, while it is well-understood how one might use quantum computers to simulate key properties of batteries, such as equilibrium voltages [21], it has not yet been established which concrete class of battery simulation would benefit the most from the capabilities of quantum computers. After all, developing better lithium-ion batteries requires solving a multitude of problems, and scientists are already equipped with sophisticated simulation techniques that can be run on powerful supercomputers. It is therefore crucial to identify problems where the limitations of classical methods are most severe, and whose solution would be most impactful to battery development.
We propose an application of quantum computers for batteries that we contend meet these criteria: simulating lithium-excess cathode materials [106]. These materials offer an avenue to dramatically increase the energy density of stateof-the-art cathodes. With theoretical capacities that are roughly twice as high as commercial batteries, they would make the driving range and cost of electric vehicles competitive with internal combustion engines [53]. Unfortunately, lithiumexcess materials suffer from substantial capacity loss even after a single charging cycle [26]. The rapid degradation of lithium-excess materials responsible for the capacity loss has been attributed to irreversible structural transformations of the material, but the relationship between the proposed redox mechanisms and the observed transformations remains a topic of active discussion as these processes are difficult to probe experimentally [106]. To better understand these processes, researchers rely on density functional theory simulations. However, the available density functionals are not accurate enough to single out the dominant mechanisms driving the materials structural changes [106]. This makes it difficult to develop solutions for capacity loss in lithium-excess cathode materials using classical computer simulations.
To assess the potential for quantum computers to provide a solution to this problem, we perform a detailed estimation of the resources required to implement our quantum algorithm as applied to three lithiumexcess cathode materials: lithium manganese oxide (Li 2 MnO 3 ), lithium nickel-manganese oxide (Li[Li 0.17 Ni 0.25 Mn 0.58 ]O 2 ), and lithium manganese oxyfluoride (Li 0.75 MnO 2 F) [54,24,67]. In each of these cases, the use of ionic pseudopotentials allows a reduction of the number of electrons by a factor of two, and a reduction in the number of plane waves by roughly three orders of magnitude compared to all-electron simulations for a fixed target accuracy. This results in a total Toffoli cost for the algorithm that is about four orders of magnitude lower than the previous state-of-the-art [87].
The rest of this work is organized as follows. Section 2 provides background information on first-quantization quantum algorithms for electronic structure, the theory of ionic pseudopotentials, and the basic properties of quantum readonly memories. Our quantum algorithm is described in Section 3, outlining the linear combination of unitaries and qubitization subroutines that constitute the main technical contribution of this work. This is complemented with a detailed error analysis in Section 4 and a calculation of the qubit and gate cost of the full algorithm in Section 5. We then study the application of the algorithm to the simulation of lithium-excess cathode materials in Section 6.

Background
This work is an interdisciplinary effort covering topics across computational chemistry, quantum computing, and lithium-ion batteries. In an effort to make this manuscript self-contained, this section provides background information on key concepts that will be used throughout.

The plane-wave electronic Hamiltonian in first quantization
The ultimate goal of the quantum algorithm presented here is to solve the electronic structure problem: HΨ 0 (r 1 , . . . , r η ) = E 0 Ψ 0 (r 1 , . . . , r η ), (1) where H is the Hamiltonian of η interacting electrons, and Ψ 0 and E 0 are the ground-state wave function and energy, respectively.
In the Born-Oppenheimer approximation [15], the electronic Hamiltonian, H, is given by where T is the total kinetic energy operator, U is the Coulomb potential due to the nuclei and V is the electron-electron interaction term [46]. We have listed the symbols used in this paper in Appendix N.
Plane-wave functions are a natural basis set to represent the electronic states in periodic materials. They can be used to encode the translational symmetry of crystal structures and allow us to derive closed-form expressions for the Hamiltonian matrix elements. Plane-wave functions are defined as where Ω is the volume of the material's unit cell and G p is the reciprocal lattice vector where b 1 , b 2 , b 3 are the primitive vectors of the reciprocal lattice [5]. For a total number of plane waves N , the integer vectors p contained in the set define a uniform grid of points in the reciprocal lattice.
Previous works [8,9,87,21] have argued in favour of using first quantization techniques to accommodate the large number of plane waves that are required for accurate simulations. We follow that strategy in this work. In first quantization, the plane wave representation of the op-erators T , U and V are given by [87,21]: where Z I and R I are respectively the atomic number and position of the Ith atomic species, and L denotes the number of atoms in the unit cell. In Eqs. (7) and (8), G ν = G q − G p and G ν = G p − G s = G r − G q , respectively, and G 0 = G \ (0, 0, 0). The qubit representation of the N plane waves uses n p = ⌈log(N 1/3 + 1)⌉ (9) qubits to encode each component of the plane wave vector. Thus a total of 3ηn p qubits are required for the system register.

Ionic pseudopotentials
Performing accurate all-electron simulations of supercell structural models of battery materials is hampered by the huge number of plane waves that are needed to represent the core states and the valence states near the nucleus as sketched in Fig. 1. Different strategies such as the augmented and orthogonalized plane wave methods [86, 102, 31] have been proposed to overcome this limitation. However, a key step to retain the advantages of plane waves for materials simulations was taken by Phillips, Kleinman and Antoncik (PKA) [75,3]. Crucially, it follows from the PKA transformation that the nuclear potential and the core electrons can be replaced by an effective potential known as a pseudopotential that produces the same energies of the valence states. Furthermore, the associated pseudo wave functions superimpose the r rc  Figure 1: Representation of the oscillating character of the all-electron wave function, ϕ, versus the smooth behaviour of the pseudo wave function ,ϕ PP , in the core region r < r c . The pseudo wave function is a solution of the pseudopotential, u PP , and superimposes the valence wave function outside the core region. The two lower curves sketch the bare nuclear potential −Z/r (solid line) and the pseudopotential u PP (dotted line).
true valence wave functions outside the core region (see Fig. 1), and can be accurately represented using a significantly smaller number of plane waves. This is an excellent approximation since core electrons populate deep energy states that do not influence neither the chemical bonding nor the redox processes in battery materials.
Here, we focus on the use of ionic pseudopotentials (PPs) which have proven to be highly accurate for simulating materials [51,25,11]. Different pseudization schemes, including the Hartwigsen, Goedecker and Hutter (HGH) PPs adopted in this work and described in more details in Sec. 3.1, have been extensively benchmarked for a large set of elemental crystals [51]. Modern PPs exhibit small deviations (1-2.2 meV/atom) of the calculated equation of state with respect to analogous all-electron results. Furthermore, the transferability of ionic PPs have also been shown to reproduce the lattice constants of more complicated materials e.g., transition-metal oxides, with maximum root mean squared errors of the order of 2% [25,11]. Interestingly, the authors in Ref. [25] have noted that the differences between the PPs and the allelectron calculations are often comparable with the numerical uncertainties in the all-electron re-sults themselves.
Ionic PPs are typically generated from allelectron atomic calculations performed using density functional theory. Due to the spherical symmetry of the Coulomb potential, the pseudo wave functions, ϕ lm (r) = ϕ l (r)Y lm (ϕ, θ), are eigenstates of the angular momentum operator and characterized by the quantum numbers (l, m). Starting from a pseudo wave function ansatz, ϕ l (r), the effective model potential, u l (r), is found by inverting the radial Schrödinger equation [90,28,39,10]. Norm-conserving PPs [91] are obtained from pseudo wave functions enclosing the same charge as the true wave function in the core region r < r c , where r c denotes a core radius around the nucleus. Finally, the ionic PP is obtained by removing the screening effects due to the valence electrons in the atom. This is referred to as "unscreening" the pseudopotential, which makes the PP transferable to different atomic environments [59,23].
The general form of the pseudopotential operator is [46,64] where u loc := u loc (r) is a local potential, i.e., obtained by evaluating a simple function at point r. It represents a screened Coulomb potential which joins smoothly the all-electron atomic potential at some radius r < r c . The second operator in Eq. (10) is defined as where l max is the maximum angular momentum of the core electrons and ⟨r|lm⟩ = Y lm (ϕ, θ) denotes the spherical harmonics. The l-dependent potential, ∆u l (r) = u l (r) − u loc (r), is a shortranged potential vanishing beyond the core radius. Moreover, in the asymptotic limit, r → ∞, the full pseudopotential, u PP , behaves as −Z ion /r, where Z ion = Z − η core is the effective charge of the ionic core and η core is the number of core electrons. The operator u SL in Eq. (10) has a semi-local character since its action on a given basis function, ⟨r|q⟩ = φ q (r, ϕ, θ), is local in the radial coordinates but involves an integral over the angular variables (ϕ, θ) [64]. Computing its plane wave matrix elements u SL pq requires evaluating the radial integral [46], where j l denotes the spherical Bessel functions and G p = ∥G p ∥. The number of such integrals scales as O(LN 2 ), where L is the number of atoms in the material's unit cell and N is the total number of plane waves. Typically, N can be very large in actual simulations and computing these matrix elements becomes computationally expensive [46]. To reduce this computational cost, Kleinman and Bylander (KB) [45] proposed the separable pseudopotentials by expressing the radial operator ∆u l (r) in a form that is separable in the radial variables. The KB construction was further modified by Blöch [14] to construct the non-local (NL) pseudopotential with the operator u NL given by In Eq. (14) ⟨r|ϕ lm ⟩ = ϕ l (r)Y lm (ϕ, θ) is a pseudo wave function solution of the model potential u l (r), and ⟨r|ξ lm ⟩ = ξ l (r)Y lm (ϕ, θ) are projectors defined as where ε l is the all-electron reference energy associated with the pseudo wave function ϕ lm (r). Note from Eq. (14) that computing the matrix elements of the separable potential u NL pq , as opposed to the matrix elements of a semi-local operator (Eq. (12)), requires only to evaluate the product of the projection operations which scales linearly with the number of plane waves.
The non-local operator in Eq. (14) can be generalized to use two or more projectors per angular momentum quantum numbers [96], where the matrix B ij = ⟨ϕ i | ξ j ⟩ is used to define the projectors |β i ⟩ = j (B −1 ) ji |ξ j ⟩.
where each H ℓ is a unitary and the coefficients α ℓ > 0 are referred to as (unnormalized) selection probabilities. The specific choice of an LCU has a large impact on the cost of qubitization, especially through the parameter λ = ℓ α ℓ [58]. We then define the qubitization operator with the prepare and select unitaries given by Note that the reflection in Q and PREP act on an auxiliary register |0⟩. The latter prepares the so-called PREP state ℓ α ℓ λ |ℓ⟩ with amplitudes given by the selection probabilities. It does not alter the system register |ψ⟩, which is acted upon by SEL that applies the unitaries H ℓ . These subroutines satisfy the block-encoding equation The operator Q is block-diagonal, with each block Q k corresponding to a two-dimensional subspace W k spanned by an orthonormal basis span{|0, Φ k ⟩ , |ρ k ⟩}, where |Φ k ⟩ is an eigenstate of H with eigenvalue E k and |ρ k ⟩ is orthogonal to |0, Φ k ⟩. The term qubitization refers to these effective qubit subspaces. In its eigenbasis, Q k can be written as where |±θ k ⟩ are the eigenstates, with θ k = arccos(E k /λ). Thus, by applying QPE on Q with an initial state |0⟩ |Ψ 0 ⟩, where |0⟩ |Ψ 0 ⟩ = α |θ 0 ⟩ + β |−θ 0 ⟩ for some α, β, we always recover the ground state and its energy since cos(±θ 0 ) = E 0 /λ. In the QPE algorithm, the unitary Q is controlled on the state of auxiliary qubits, which increases the Toffoli cost of the algorithm. To avoid this, as shown in Ref.
[7], one can use a reflection to have the inverse unitary applied when the auxiliary qubit is in state |0⟩. The only requirement for this modification to work is for SEL to be self-inverse, which our algorithm satisfies.
When acting on an initial state |ψ⟩, the QPE algorithm outputs an estimate of the groundstate energy E 0 with probability p 0 = | ⟨ψ 0 |ψ⟩ | 2 , where |ψ 0 ⟩ is the ground state. The QPE routine needs to be repeated O(1/p 0 ) times on average to retrieve E 0 with high probability. It is thus necessary to prepare an initial state with a sufficiently large overlap with the ground state. This is the initial state preparation problem: a crucial and daunting challenge for quantum algorithms. Strategies for preparing initial states have been studied in Refs. [88,92,97,50], and Ref.
[21] described a method for preparing a Hartree-Fock state for periodic materials in first quantization. While we acknowledge the importance of developing better methods for initial state preparation, in this work we focus on the problem of reducing the cost of QPE.

Quantum read-only memory (QROM)
Our main use of QROM in the quantum algorithm is to prepare arbitrary states of few qubits. There are a variety of methods in the literature for this task; we refer to [66] for an overview of such techniques. A QROM, or a data-lookup oracle, is an operator O that reads a register |x⟩ and outputs a corresponding bitstring |θ x ⟩ into an auxiliary register as: . The output bitstring θ x is precomputed and available in a data-lookup table. The non-Clifford gate cost of QROM is exponential in the size of x and polynomial in the size of θ x . Parallelization can decrease the depth at the expense of using more qubits [7, 63].
We follow [63] in our description of state preparation using QROM. Consider the target state |ψ⟩ = x a x |x⟩, where we assume a x ≥ 0 for simplicity. For any bit-string y of length w ≤ n, where n is the total number of qubits, denote by p y the probability that the first w qubits of |ψ⟩ are in state |y⟩. Define also cos(θ y ) = p y0 /p y , where y0 is the bitstring y followed by 0, and the QROM oracles O w |y⟩ |0⟩ = |y⟩ |θ y ⟩, outputting the classically precomputed and tabulated θ y up to b bits of precision for each w. The QROM oracles can be used iteratively to prepare any state |ψ⟩ = x a x |x⟩ as follows. By induction, for 1 ≤ w ≤ n, do: where R w is a one-qubit rotation on the (w + 1)th qubit controlled on the state of the auxiliary system, and as a slight abuse of notation we use |0⟩ to denote all-zero states of different number of qubits. While we assumed non-negative amplitudes a x ≥ 0, the complex phases of |ψ⟩ can be implemented by storing ϕ x = arg[a x /|a x |] and applying it in the final iteration. In our applications, ϕ x = ±1, meaning the phase is always real and the last step is a simple Z gate. For future reference throughout the text, this entire state preparation procedure is called Algorithm 1. The precision of the rotation angle is the main Operation Additional qubits Toffoli Depth Toffoli count ≤ · + O(log ·) There are three different types of QROM oracles that can be used in Algorithm 1 as proposed in [63]. Two of these, called Select and Sel-SwapDirty, are of interest to us. The latter is our terminology for the oracle described in [63, Fig. 1d], also called QROAM by [12]. Select is mostly used when n is small while b is large. One important property of SelSwapDirty is the space-depth trade-off that it offers. With the help of dirty qubits, i.e., qubits that do not need to be initialized to any specific state, the depth of the circuit can be lowered by parallelizing the controlled-SWAP gates used in the Swap subroutine of SelSwapDirty. Furthermore, the dirty qubits are returned to their initial state, therefore any qubit not undergoing simultaneous computation in the quantum circuit can be borrowed as a dirty qubit. We briefly overview the cost of these routines in Table 4, and we refer the reader to Appendix E for more details.

Quantum algorithm
We now describe the pseudopotential-based quantum phase estimation algorithm. We begin with the construction of the plane-wave representation of the pseudopotential operators describing the ionic cores in the material, where we derive closed-form expressions for the matrix elements of the local and non-local potentials (see Appendix A). We then proceed to describe the LCU decomposition of the Hamiltonian and the implementation of the qubitization operator. These procedures exploit the structure of the pseudopotential to optimize the qubitiza-tion of the modified operator U , as discussed in Section 3.3.

Plane wave matrix elements of the pseudopotential operator
We focus on the description of the separable Gaussian pseudopotentials proposed by Hartwigsen, Goedecker and Hutter (HGH) [29]. The HGH pseudopotentials are relatively easy to define and have proven to be transferable and accurate.
The HGH local potential is defined as , r loc is a local radius parameter giving the charge distribution in the core, and the C i are tabulated coefficients [29]. On the other hand, the non-local part u NL is given by a sum of separable terms [29] where ⟨r|β where the A i l is a constant defined in Appendix A.2 and the radii r l give the range of the l-dependent projectors. The optimized values of the coefficients B The plane-wave matrix elements of the local and non-local components of the HGH pseudopotentials are derived in Appendix A. For an ion located at the coordinates R, the matrix elements of the local potential are given by The matrix elements of the non-local operator in Eq. (26) are derived in Appendix A.2. Typically, electronic structure calculations are performed using one or two projectors. For the sake of simplicity, we consider the case of one projector per angular momentum (i = j = 1). In this case, the plane wave matrix elements for the HGH non-local potential are given by where the coefficient B l := B (l) 11 . To see roughly how the projector expression in Eq. (26) could lead to the expression above, we show one example of how the equation decomposes to a sum of projections. Consider the first term in Eq. (29) 4B 0 r 3 0 e −(G 2 p +G 2 q )r 2 0 /2 |p⟩ ⟨q|. When summed over p, q, this can be expressed as a scalar multiple of the projection onto the Gaussian superposition state p∈G e −G 2 A similar rewriting applies to other terms in Eq. (29), and involves projection onto (derivatives) of Gaussian superpositions. See Appendix D for more details.

The pseudopotential Hamiltonian
By including the ionic pseudopotentials, the allelectron problem defined by the Hamiltonian in Eq. (2) transforms into a valence-only electron problem where η val = L I=1 Z ion I . This results in a substantial reduction of the total number of electrons, plane waves N , and the overall norm of the Hamiltonian describing the valence electrons.
In this approach, the expressions for the operators T and V (Eqs. (6) and (8)) remain formally identical. However, the operator U accounting for the electron-nuclei interactions in the all-electron case needs to be defined using the effective pseudopotentials describing the ionic cores: (31) Using Eq. (13), the plane-wave representation of the operator U becomes where u loc pq (R I ) and u NL pq (R I ) are the matrix elements given by Eqs. (28) and (29), respectively. This yields the pseudopotential Hamiltonian

Linear Combination of Unitaries
We study the structure of the matrix entries of each of the four operators in the pseudopotential Hamiltonian when deriving the LCU for H as in Eq. (35). There are two main differences from the setting in [87]: (i) We consider the general case of non-cubic lattices, that is, the primitive vectors a i have different lengths and are not orthogonal. Hence, the reciprocal lattice vector can no longer be substituted by G p = 2πΩ −1/3 p, and (ii) While the local term U loc resembles the operator U in the all-electron setting, the nonlocal term U N L has a radically different structure. Dealing with the complexity of this nonlocal term is one of the biggest challenges we face in deriving an efficient decomposition. We briefly show how the LCUs are derived for T, V, and U loc . The case of U N L is explained at a high level, with details appearing in Appendix D. Hereafter, we make use of the following conventions. We use η := η val to denote the number of valence electrons in the pseudopotential Hamiltonian. To make our future discussions more precise, we define the compound index I = (t, t j ) where t indicates the atomic species of the I-th ionic core, and t j enumerate the cores of that type (reading for example as the 'third oxygen atom'). To make the reference to nuclei I more explicit, we may sometimes use t I and t I,j .
LCU for T . Recall that T is a diagonal operator with entries and take the binary expansion where p ω ′ ,s is the s-th bit of p ω . Note the signed integer representation, where the (n p − 1)th bit determines the sign of the coordinate. One can apply the same trick as in the orthonormal lattice case [87], rewriting p ω,r p ω ′ ,s = (1 + (−1) pω,rp ω ′ ,s +1 )/2. By doing so, we reach the following decomposition: where sgn(x) = 0 if x ≥ 0 and is equal to 1 otherwise. Notice the appearance of the inner product in (−1) sgn(⟨bω,b ω ′ ⟩) , which yields one for an orthogonal lattice. The unitaries are in the last line of the equation. The LCU in Eq. (39) is also used for the kinetic term of the all-electron calculations as the lattices in our case studies are no longer orthonormal.
Remark 3.1. When the lattice is orthogonal, the sum corresponding to b = 0 is a multiple of the identity, which can be omitted in the qubitization of H. This shifting is used in our case studies with orthogonal lattices, giving an improvement over the LCU proposed in Ref.
[87], as it decreases the value of λ and thus the simulation cost.
LCU for V . The unitaries used here are signed translations of the lattice by the momentum vector ν. This strategy applies, given that the ma- , and its value depends only on ν (see [21, App. E.2]). This term identically appears in the all-electron setting. Furthermore, in that case, as the term 1 G 2 ν is shared by U and V selection probabilities, the LCU for U is similarly derived. See Appendix F for more details. The LCU is given by where as before the unitaries are denoted in the last line of the equation.
LCU for U loc . The LCU for the local term is derived similarly to V , given its symmetry with respect to any translation of the lattice by ν ∈ G 0 . However, the selection probabilities are different, as U loc entries include an additional exponentially decaying term in the numerator, denoted by γ I (G ν ) and defined below. The resulting LCU, with unitaries corresponding to the last line of the equation, is given by (42) The parameters r loc , C 1 , C 2 were previously introduced in Eq. (28). They depend on the atomic type t I , so the function γ I is determined by t I . We use this fact to change the notation to γ t when the context is clear.
LCU for U N L . The main insight in deriving an LCU for the non-local term is to exploit the projector representation of the operator as in Eq. (26). We then break down each projection P into a sum of the identity and a reflection operator 1 . This leads to a smaller λ and more efficient strategies for the qubitization of U N L . This decomposition also gives rise to the identity terms that lead to a shifted Hamiltonian. For convenience, we still refer to U N L and H by the same name after the shifting. A full derivation can be found in Appendix D, and the resulting LCU is given by where R(R I ) |p⟩ = e iGp·R I |p⟩ .
As before, the second line in the equation for the LCU are the unitaries in the decomposition. The coefficients c I,σ are complicated expressions representing sums over Gaussian terms, which we define fully in Appendix D. For example, the simplest one corresponds to These coefficients depend on the atomic specie t I and the label σ that denotes the type of Gaussian superposition defining the states |Ψ I,σ ⟩. There are eleven different choices for σ ∈ {0, . . . , 10}.
All corresponding Gaussian states are derived in Appendix D, with |Ψ I,σ ⟩ for σ = 0 shown in Eq. (30). Notice we need to use this sign function as the selection probabilities must be positive.
We make a few comments on the generalizability of the LCUs. Our preference in Eq. (42) to only include C 1 , C 2 , and not the higher order terms in Eq. (28) is motivated by the fact that C 3 = C 4 = 0 for all of our case studies. Nevertheless, the algorithm subroutines that are relevant to γ I (G ν ) apply without changes to any material for which C 3 , C 4 ̸ = 0. In fact, the LCUs above are applicable to any pseudopotential as defined in Eqs. (84) and (91). Indeed, for the local term, the LCU is identical to Eq. (41). For the non-local term, if one includes more than one projector, then the LCU will involve more Gaussian superpositions |Ψ I,σ ⟩, some of which do not appear in our explicit derivation in Appendix D. The generalization of our subroutines is straightforward as well, especially given the fact that we employ QROM for the preparation of superpositions, where the functional defining the amplitudes could change according to the specific parameters of the pseudopotential.

Breakdown of the qubitization subroutines
The qubitization of H = T + U loc + U N L + V is roughly broken down into a qubitization of each term. This means defining PREP and SEL subroutines for each of the four operators, for instance SEL T for the kinetic energy term. We use this notation from now on. Henceforth, 'AE' refers to the all-electron setting, with Hamiltonian H = T + U + V given by Eqs. (6) to (8) and no assumption on the lattice. In our discussion of PREP and SEL, we mention the needed adjustments to the AE algorithm when the lattice is not orthonormal. The all-electron setting with an orthonormal lattice, hereafter referred to by 'OAE', has been qubitized in [87,21].

Prepare operator (PREP)
Below, the target PREP state for the pseudopotential Hamiltonian is shown, as implied by their LCUs in Eqs. (39) to (41) and (43). The error analysis is done in Section 4. We refer to [87, Eq. (48)] for the similar equation in the OAE setting.
Each register is denoted by a subscript, such as X which labels the first register. While we borrow techniques from [87] to prepare some registers, there are also adjustments and new regis-ters: • We have to qubitize four operators instead of three.
• As the lattice is no longer orthonormal, the selection probabilities are more general, and in cases like T , they include the inner product of the reciprocal lattice vectors.
• When using pseudopotentials, the selection probabilities of U loc and V are not proportional to each other, hence their corresponding momentum state superposition cannot be shared. This is in contrast to the AE case, where the momentum state superposition of U and V is shared.
• The preparation of the state in (52) for the local term is via QROM, instead of the usual inequality test involving quantum arithmetic [87, Sec. II.C.]. The same holds for the register storing the amplitudes |c I,σ | for U N L .
We give a sketch of the preparation of the state of each register, with more details provided in Appendix H.
1. The state of register X is made of two qubits, and is a superposition prepared by QROM with amplitudes λχ/λ, where λχ is the sum of the selection probabilities of the LCU for χ . We enumerate T, V with χ = 00, 01 and U loc , U N L with χ = 10, 11. This state enables us to end up with the de- 2. The state f is made of five qubits, two for each of ω, ω ′ , and one storing sgn(⟨b ω , b ω ′ ⟩). This is prepared using QROM and is a part of the PREP state of T . There is a failure probability in the preparation of this state, where ineligible states are flagged by an additional qubit. This is not shown above to avoid cluttering. 6. The superposition over the momentum ν corresponding to V is given in registers j V , k V . This is prepared using an inequality test (Appendix F) followed by amplitude amplification to increase the probability of success, flagged by |0⟩ j V . The procedure is very similar to the OAE setting [87, Sec. II.C.], except that we use QROM instead of quantum arithmetic to compute one side of the inequality test.

The superposition in registers
7. The superposition with registers k loc , k ′ loc , s loc correspond to the momentum state of U loc . The register k ′ loc enumerates all nuclei I = (t, t j ). While for V the similar superposition is created using an inequality test, according to our simulations, the exponentially small term γ I (G ν ) makes the rejection of the inequality test happen too often, leading to excessive rounds of amplitude amplifications that increase the Toffoli cost. Therefore QROM is used to prepare almost the entire superposition. Note that γ I only depends on the atomic species, so QROM prepares the superposition over register k loc , s loc and the atomic species index t in the register k ′ loc . What remains to be done is a uniform superposition over all nuclei t j of each specie t, which is created using the technique in [49, App. A.2].
8. The registers k N L , k ′ N L , s N L give a superposition for the PREP state of U N L . The coordinate I in register k ′ N L enumerates all nuclei, with the same splitting mentioned above for k ′ loc . To prepare Eq. (53), recall that c I,σ only depends on the atomic type t I and the Gaussian superposition type σ. Therefore, a QROM produces the superposition over |t⟩ k ′ N L |σ⟩ k N L |sgn(c t,σ )⟩. Then a uniform superposition over the t j nuclei of type t is implemented, giving the desired Eq. (53).

Select operator (SEL)
We mostly borrow the corresponding implementation in the OAE case [87] for every SEL operator except SEL N L , while adjusting for general lattices and the different registers holding the momentum state for U loc and V . We devote more explanation to SEL N L as it is the operator with no similar precedent in the literature. Nevertheless, this section is not a detailed compilation, especially for SEL N L which is the most involved; we refer to Appendix H for more details.
There are some commonalities among all SEL operators, which we briefly discuss. The action of each operator is controlled on a register that flags the success of the corresponding state preparation. For example for T , we need to check three conditions: • The register c flags the success of i = j in registers d, e (i.e., |1⟩ c ), • The ancilla attached to register f flags the meaningful basis states (ω ̸ = 4 and ω ′ ̸ = 4) in the superposition prepared by QROM.
If all the above conditions hold, then we get |0⟩ T , and get |1⟩ T otherwise, in which case SEL T acts as the identity. Checking correctness of the PREP states can be performed using a few logical gates (Toffoli, CNOT, X). Note that these checks are performed as part of the PREP procedure, but we find it more informative to introduce them here. The second design shared by all select operators is a common CSWAP circuit and its inverse. This circuit first copies |p⟩ i (controlled on |i⟩ d ) into an auxiliary register, swaps it back into its place after the relevant SEL operations are carried out, then does the same for |p⟩ j controlled on |j⟩ e . This ensures that the SEL operations are all done on a single auxiliary register, obviating the need for the far more costly controlled operations directly on the system register [21, Eq. (E27)]. In the AE case, this is actually the costliest part of implementing the select operator. We now discuss how to implement each select operator, the sum of which is the desired SEL H .
1. SEL T : The transformation is given by The action above is essentially a phase, which is a combination of multi-controlled Z gates.
The addition and subtraction along with the phase implementation are all controlled on |0⟩ V . The non-diagonal superposition |i⟩ d |j⟩ e over the electrons, flagged by |0⟩ c , is acted upon by SEL V , while the rest of the SEL operators only act on the e register.
3. SEL loc : The operator acts in two main steps: This first step illustrates a controlled sign, along with a controlled subtraction |p⟩ j → |p − ν⟩ j . It also shows a QROM that reads |x⟩ loc |I⟩ k ′ loc and outputs R I into the register |·⟩ R only if x = 0. This is then used to implement the phase e iGν ·R I , following the techniques in [87, Eq. (50)]. The second step maps the previous state to (57) Here we erase the register R by taking the inverse of the QROM, and finish by applying Z s loc controlled on |0⟩ loc . As a remark, the AE circuit for SEL U is exactly the same as its OAE implementation in [87].
4. SEL N L : The unitary from the LCU in Eq. (43) is which can be broken down as a series of transformations, which we describe below. To avoid cluttering, we present only the registers involved in each stage. The first transformation applies a phase as follows: To implement this transformation, first a QROM reads |x⟩ N L |I⟩ k ′

N L
and outputs R I into |·⟩ R only if x = 0. Then the phase e iGq·R I is applied, similar to how e iGν ·R I was applied for U loc . The next stage is to apply the reflection onto a Gaussian state |Ψ I,σ ⟩. To do so, we need to prepare the state and apply a reflection: where U I,σ |0⟩ = |Ψ I,σ ⟩ acts on the same register as |q⟩ j . Note that the reflection also includes the flag qubit |0⟩ N L , ensuring that SEL N L acts by identity if the basis state has not been successfully prepared for the non-local term. This reflection is the most expensive part of SEL H , so it is worthwhile to discuss strategies to reduce its cost. Many materials have either orthogonal or partially orthogonal lattices, i.e., when a lattice vector is orthogonal to the other two. This crystallographic feature is prevalent among many materials of interest, including the ones utilized as cathode materials. For instance, about half of the crystal structures available in the Materials Project database [36] have orthogonal or partially orthogonal lattices. Assuming this, the Gaussian state can always be decomposed into the tensor product of three onedimensional (1D) Gaussian states, or a 1D+2D Gaussian state, respectively. Since QROM cost rises exponentially with the number of read qubits, it is important to exploit this decomposition. As a result, for orthogonal (and partially orthogonal lattices), one has three (two) QROMs acting in parallel and all reading n p (n p and 2n p ) qubits, instead of one QROM reading 3n p qubits. As an example of the decomposition, we have the following for Eq. (63) when the lattice is orthogonal: where G p,ω is the ω coordinate of G p . Once the reflection in Eq. (60) is implemented on |q⟩ j |0⟩ j , it leads to a superposition of the form p∈G ϕ I,σ (p) |p⟩ j for some amplitudes ϕ I,σ (p). The final steps are (i) the application of the phase e −iGp·R I , which is done similar to its inverse at the beginning, (ii) the erasure of R I from the register R by the inverse of the QROM that created it, and (iii) the Z gate Z s N L on s N L controlled on |0⟩ N L . Focusing on the notable registers, the final result is: Finally, we explain our choice of QROM for U I,σ in further detail. First, while the preparation of a discrete Gaussian state such as |Ψ I,0 ⟩ = p e −G 2 p r 2 0 /2 |p⟩ has been specifically treated in earlier [42], we have to also prepare higher-order derivatives of such states, for example: Even for a diagonal covariance matrix corresponding to an orthogonal lattice, the preparation method in [42] is inefficient compared to QROM as it assumes an arithmetic oracle. Furthermore, the specific case of a discrete Gaussian state with a non-diagonal covariance matrix has not been properly investigated and optimized. While the work in [42] provides some ideas like a simple basis change, the details regarding non-orthogonal lattices are far more complicated and our estimates show that the algorithm does not yield the actual state in a costefficient way. In our range of applications, more recent methods like inequality test coupled with quantum arithmetics [87] fail at providing a good balance of the product of number of qubits and number of gates. A more sophisticated preparation method called state preparation without coherent arithmetic [66] is promising. However, it is also more complicated for cost and error analysis and crucially provides less parallelization and qubit/gate trade-off opportunities, which we frequently exploit to reduce overall cost of the algorithm.

Error analysis and the effective value of λ
The quantum phase estimation algorithm targets a maximum total error that we denote by ε. To achieve this, we need to identify all individual sources of error in the algorithm. We use ε X to denote each source of error, where X will be replaced by a label describing the type of error. These errors are ultimately related to finite precision operations respectively using n X bits. The choice of n X further determines the number of qubits and non-Clifford gates used in the algorithm, and is involved in identifying the effective value of the normalization factor λ after qubitization. Below, we review the different sources of error, and discuss how to compute λ. We relegate the detailed derivations to Appendices I and J.

Overview of errors and finite size approximations
Many of our errors are related to the precision of the rotation angles θ in Algorithm 1 when building a superposition using QROM. Below, we show the complete list of all qubitization errors: 1. εχ is the error due to using nχ bits for the precision of the rotation angles necessary to build the superposition of register X (Eq. (47)) using QROM.
2. ε B is a similar error, due to using n B bits for preparing the superposition of register f in Eq. (48).
3. ε N L is the error due to using n N L bits in the QROM for building the PREP state for U N L in Eq. (53).
4. ε M V is the error due to using n M V bits in the QROM computing the inequality test for preparing the PREP V state in Eq. (51).
5. ε M loc is the error due to using n M loc bits for preparing the local PREP state Eq. (52) with QROM.
6. ε Ψ is the error due to using n Ψ bits for building the superpositions |Ψ I,σ ⟩ using QROM.
7. ε R ≤ ε R,loc + ε R,N L is the error due to the finite size register n R used to represent R I in register R for the implementation of SEL loc and SEL N L .
Finally, let ε QPE be the error from quantum phase estimation. To achieve an approximation ε of the ground state energy, it is necessary and sufficient that The proof follows a similar argument to the OAE setting [87, Thm. 4].
Each ε X is upper bounded by an expression depending on n X , which thus determines the value of n X needed to obtain an error ε X . We estimate all errors in Appendix J, with results summarized in Table 6. As an example, following the Lemma in Section 2.4, it can be shown that which implies that gives an error less than or equal to εχ.

The effective value of λ
The value of λ determines the scaling in the block-encoding of H and is a significant factor in the cost that is highly dependent on the chosen LCU and compilation strategy. Therefore, its accurate computation is important. The algorithm targets a PREP state that is different from the theoretical one implied by the LCUs, and the resulting effective λ is closely related but technically different from the one implied by the LCUs. Note that this is also a feature of previous work ([87, Thm. 4]).
Computing the effective λ requires finding the effective one for each of the four operators T, U loc , U N L , V . To do so, we identify any failure/success probability embedded in the PREP implementation. Here, failure refers to a basis state itself being inadmissible. This appears when preparing a uniform superposition over a basis that is not a power of two, or when an inequality test is rejected. Even upon success, the obtained amplitudes are usually different from the desired ones. For example, we use an inequality test to prepare amplitudes G −1 ν for Eq. (51), and after success, we get some complicated expression approximating G −1 ν (see Eq. (167)). Overall, the λ for each operator will roughly look like a sum of effective selection probabilities α ℓ divided by a product of success probabilities i P i . Then the effective total value λ = λ T + λ loc + λ N L + λ V can be derived. We do this in Appendix I and summarize the values in Table 5. As an example, we show λ T below. This is computed after the compilation of the LCU of T in Eq. (39). In the numerator of the expression below, we have the explicit value for α ℓ . In the denominator, we have a success probability related to the preparation of the uniform superposition over pairs of electrons in Eq. (50):

Gate and qubit cost
Having described the main steps of the algorithm and identified the sources of error, we now quantify the number of gates and qubits needed to run the full procedure. We follow standard practice in the literature and focus on Toffoli gates since they constitute the bulk of non-Clifford gates used in the algorithm. The compilation of the algorithm involves numerous strategies, each requiring its own separate cost estimate. This leads to an extensive analysis that cannot be summarized in simple expressions.
We have thus gathered all Toffoli costing calculations in the Appendix Table 7 and demonstrate the derivation of each in Appendix K. These formulas are implemented in code at https:// github.com/XanaduAI/pseudopotentials and used to perform resource estimation calculations. Here we focus on highlighting the most expensive steps of the algorithm.
First, we recall the rough estimate for the cost of a qubitization-based QPE algorithm: where we explicitly include the qubitization costs of PREP and SEL. The factor of two appears because in the qubitization operator we apply PREP and its complex conjugate. The multiplicative factor λ/ε is the largest contributor to the total cost. For example, for a system with N = 10 5 plane waves and a hundred electrons (Table 9), this fraction gives a factor of about 10 9 , while the qubitization cost contributes roughly a factor of 10 5 .
The two most costly subroutines in the qubitization part of the algorithm are: )⟩, which is part of the PREP of the local term.
• Performing the reflection on the Gaussian superpositions |Ψ I,σ ⟩, which is part of SEL of the non-local term.
The expressions for the Toffoli cost of these steps are roughly given by respectively, where n QROM = n p + ⌈log(T )⌉ + 2, where T is the number of atomic species in the cell. Given how the parameters β loc , β Ψ , β ′ Ψ divide the leading terms, their values are quite important in lowering the cost. However, these scalars have to satisfy constraints based, for example, on the number of available dirty qubits.
The space-time trade-offs of QROM during the resource estimation is explained further in Appendix M.
In our calculations, each of these constitutes the largest share of PREP cost and SEL cost , respectively. While SEL cost is slightly costdominant for a smaller number of plane waves, the trend is reversed as N grows beyond 10 5 . Consequently, one significant obstacle in further optimizing the qubitization algorithm is the preparation of Eq. (52), where γ I (G ν ) is the arithmetically involved term in Eq. (42).
The derivation of the qubit cost of the algorithm is of similar complexity to the gate cost. We provide a full analysis in Appendix L, where we reuse any uncomputed and clean qubits whenever possible. We also make the distinction between clean and dirty qubits as the latter is implicated in QROMs. Although our counting is different from [87, 21], our results in Section 7 follow the same behaviour: the overwhelming contribution to the qubit cost is the size of the system register, which requires 3ηn p qubits. As an example, the total clean qubit cost for simulating a system of 408 electrons with N = 10 5 plane waves is about 9,892, while the system register uses 3ηn p = 3 · 408 · 6 = 7, 344 clean qubits (Table 12).

Application: lithium-excess cathode materials
In this section, we focus on the quantum simulation of lithium-excess (Li-excess) cathode materials, which have been recently proposed for higher-capacity cathodes [106, 16, 37]. By replacing a fraction of the transition metals with Li atoms, Li-excess materials can potentially offer up to twice the specific capacity, i.e., the total amount of charge stored per unit mass, of conventional cathodes [103]. For example, Li 2 MnO 3 has a theoretical capacity of 460 mAh g −1 responsible for the voltage of ∼ 4 V [104]. This yields a specific energy of (460 mAh g −1 ×4 V) = 1840 Wh Kg −1 which is well above the specific energy (800 Wh Kg −1 at the cathode material level) required to enable full driving performance and significantly reduce the cost of electric vehicles [53]. However experiments reveal average voltages of ∼ 3.8 V and a much smaller capacity of about 180 mAh g −1 following the first charging cycle [101]. This voltage decay and the abrupt capacity loss, which is common to other Li-excess materials, is an important obstacle in designing higher-capacity batteries.
The voltage profile of a cathode material, i.e., the voltage V (x) measured as a function of the concentration of the Li ions x, provides relevant information about the Li insertion process [95]. The typical voltage profile of Li-excess materials is sketched in Fig. 2. A distinguishing feature of this profile is a long plateau region following the initial sloping curve during charge. This is indicative of the cathode material transitioning from a solid solution phase to a region where two phases of the material coexist [95]. Moreover, the hysteresis loop reveals that the removal of Li ions is accompanied by irreversible structural transformations causing the capacity loss observed experimentally [106].
The capacity gain of Li-excess materials is most frequently attributed to the oxidation of oxygen anions (O 2− → O n− , n < 2) (anion redox) [82]. This redox mechanism and alternative mechanisms have been proposed and discussed extensively in Ref.
[106] along with their relation to structural transforms resulting in materials degradation and performance loss. While important questions remain open, there seems to be a consensus that the increase of oxygen hole states and lithium vacancies destabilizes the metal-oxide chemical bonds and leads to the formation of oxygen dimers [17,67]. Furthermore, oxygen dimerization cooperatively favors the migration of transition metals to Li-vacancy sites in the structure, which is the main process driving the structural transformations in Li-excess materials [68, 44].
It follows that identifying the most stable phases of the cathode material for compositions with excess Li is crucial for determining the dominant redox mechanisms and, more importantly, for deriving potential strategies (e.g., doping, modifying the crystal structure) to retain more reversible capacity. The relevant quantity for ranking the stability of possible phases of the delithiated material is the formation energy computed for a given Li ion concentration [93]. For example, in the case of Li 2 MnO 3 , this is given  Figure 2: a) Sketch of a typical voltage profile for Liexcess materials. The curves depict the charging (orange) and discharging (dark-blue) processes. In the transition metal oxidation region, the extraction of Li ions is charge-compensated by electrons provided by the transition metals. Further Li extraction is thought to be possible via the oxidation of oxygen atoms (excess oxidation). Phase transformations occur within the plateau region. b) A typical formation energy plot with respect to the Li-ion concentration x in a Li-excess cathode. The brighter (dimmed) blue circles correspond to the more (less) stable phases in this representation. The dotted line indicates the convex hull consisting of the most stable phases. The orange circles correspond to the fully lithiated and delithiated phases where ab initio calculations are strongly supported by experimental data. Differences between stable, unstable, and inaccurate calculations (orange dimmed circle) are in the energy scale of millielectronvolts (meV), as depicted in the inset.

by [54]
where E(Li x MnO 3 ) is the ground-state energy of the material in a given phase, and E(Li 2 MnO 3 ) and E(Li 0 MnO 3 ) are respectively the total energies of the stable reference materials at the end points of the charging curve (x = 2 and x = 0).
where δ denotes the number of oxygen atoms removed per formula unit and E(O 2 ) is the groundstate energy of the oxygen molecule. Different strategies have been proposed to mitigate the capacity loss of Li-excess materials [52, 76,84,85,47,72,18,24]. Most of them rely on modifying the composition and/or the atomic structure of the material to suppress transition metal migration in the delithiated cathode. To that end, computing accurate site energies, i.e., the ground-state energy of the material as the transition metal occupies different lattice sites, has proven to be useful [24]. Furthermore, from the ground-state energy of the transition state (TS) of the material along the transition metal migration path [30], it is possible to compute the activation energy E TS −E 0 to describe the kinetic pathways and, crucially, determine the reversibility of the structural transformations of the material.
The accuracy of the electronic structure method used to simulate Li-excess materials is key as it impacts the entire computational methodology used to investigate these materials [106, 82,24,67]. Due to the practical computational limitations of more accurate methods, we only have access to approximate density functional theory (DFT) methods which can introduce significant errors as they rely on approximate parametrizations of the exchangecorrelation density functional. In particular, the simulation of key battery properties requires computing the difference of ground-sate energies of materials with very different electronic structures, see for example Eq. (71) to compute the formation energy. In these scenarios, DFT approximations do not benefit from cancellation of errors, and very accurate energies need to be computed. Standard local and semi-local approximations to the density functional can not properly describe the strong on-site electronic correlations between the d electrons in the transition metal [93].
This problem is partially mitigated by adding a Hubbard-like term in the Kohn-Sham Hamiltonian, the so-called DFT+U method [35]. However, the value of the Hubbard parameter U , typically obtained from experimental data [82,24,67], is strongly system-dependent which limits applicability of the method to explore new materials. In addition, authors in Ref. [82] noted that DFT+U cannot predict the band structure of Li-excess materials with the required accuracy. Instead, they used a hybrid functional [32] which incorporates a fraction of the exact exchange from Hartree-Fock theory as part of the exchange-correlation functional. This approach also depends on an adjustable parameter selecting the amount of HF exchange included in the calculations, and finding its optimal value is an issue if no experimental data is available. Finally, Zhang et al. [106] have recently pointed out that these approximations may break down in the presence of oxygen-oxygen bonding which is one of the most important relaxation process in delithiated Li-excess materials [67].
In summary, relevant electronic structure calculations for Li-excess materials include formation energies of delithiated phases, oxygenvacancy formation energies, site energies, and activation energies for kinetic pathways. Any of these simulations can be reduced to a series of ground-state energy calculations, each of which can be performed using our quantum algorithm. Note that the proposed algorithm is a full firstprinciples approach that, for a given structural model of the target material, can be used to compute its ground-state energy with guaranteed precision using a fault-tolerant quantum com-puter. It does not depend on any semi-empirical parameter and it can be used to simulate any material consisting of atomic species for which HGH PPs are accessible. We now study the resources required to implement our quantum algorithm for the ground-state energy calculations for Li-excess materials.

Resource estimation
We now estimate the number of qubits and Toffoli gates needed to implement our pseudopotential-based algorithm for three different Li-excess materials: lithium manganese oxide (Li 2 MnO 3 ), lithium nickel-manganese oxide (Li[Li 0.17 Ni 0.25 Mn 0.58 ]O 2 ), denoted as LLNMO, and lithium manganese oxyfluoride (Li 2 MnO 2 F).
To that end we have built supercell structural models of these materials as described in detail in Refs. [54, 24, 67]. The atomic models are shown in Fig. 3. The lattice parameters of the supercells are summarized in Table 2, and the procedure to delithiate the pristine materials is described in Appendix B.
The resource estimation results targeting chemical accuracy are shown in Table 3. All calculations are performed using our resource estimation software, which is available at https: //github.com/XanaduAI/pseudopotentials. This code allows us to compute the resources of both the pseudopotential and the all-electron algorithms for any material. In all cases, we see that the pseudopotential-based algorithm has gate counts that are roughly four orders of magnitude lower than in the all-electron case. Qubit numbers are also considerably lower, roughly needing less than half as many qubits.
We note that the cost of the pseudopotential (PP) and all-electron algorithms are comparable even when using the same number of planewaves. For Li 0.5 MnO 3 and N = 10 5 , the Toffoli cost for PP is 2.68 × 10 14 , and for the all-electron is 1.44×10 14 . We refer to the tables in Appendix M for comparisons for each material. The cost competitiveness for simulations using the same number of plane waves is not just the result of only having to use half the number of electrons with the pseudopotential, but also a result of our tailored compilation strategy, which involves a suitable LCU to lower the value for λ, and the use of   Figure 3: Representation of the atomic structures of the Li-excess materials selected to perform the resource estimation of the quantum algorithm. These structural models corresponds to the supercells described in Table 2. The polyhedra depict the octahedrally coordinated transition metals. These figures were generated using the VESTA package [69].
QROM for the state preparation method. When targeting the same energy accuracy, the cost of the all-electron algorithm is significantly higher as the number of electrons increases, and a very large number of plane waves are needed to properly describe the quantum system (see Table 2).
The number of plane waves used to perform resource estimation are reported in Table 2. These numbers were estimated by performing a convergence analysis of the total energy of the materials using density functional theory calculations with the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [74] (see Appendix C). Using pseudopotentials, we achieve convergence with roughly 6 × 10 4 plane waves. For the all-electron calculations, we need approximately 7 × 10 8 plane waves.
Calculations using the HGH pseudopotentials were performed using the Quantum ESPRESSO package [27], and the total energy was computed for the Gamma (Γ)-point only, which is sufficient to represent the large systems studied in this work [71]. A cutoff energy of 70 Ry ≈ 950 eV was set for both Li 0.5 MnO 3 and Li 0.75 MnO 2 F, while a cutoff of 80 Ry ≈ 1000 eV was set for Li[Li 0.17 Ni 0.25 Mn 0.58 ]O 2 (see Appendix Fig. 4). For the all-electron case, we used the localized augmented plane-wave plus local orbitals (LAPW+lo) method as implemented in the WIEN2k code [13]. In this case, a larger number of plane waves are needed to achieve convergence of the total energy. The most important parameters that have to be considered are the muffin-tin radii (R MT ) and the plane-wave cutoff (K max ) for  the expansion of the wave function in the interstitial. The basis set cutoff parameter is defined by the product R MT K max , which was set to 7 for all the structures in this work. In all cases, the convergence energy was calculated without applying structure relaxation as this would not significantly affect the total number of plane waves needed to achieve convergence.
Using the data in Table 2, we can measure the Toffoli and logical qubit cost of our algorithm. These estimates are benchmarked against the allelectron setting for the same materials. Results are also shown for the dilithium iron silicate cathode Li 2 FeSiO 4 , studied earlier in [21]. We only report the cost of performing one round of quantum phase estimation. A more comprehensive analysis of total cost should include the overhead due to a limited overlap of the initial state and the cost of initial state preparation. We can also consider the T-gate cost of the algorithm, but these are at least an order of magnitude lower than that of the qubitization-based QPE [21].
We also report the depth of the circuits along with the clean portion of the qubit cost in Appendix M illustrating further the advantage of using pseudopotentials and QROMs in our algorithm. We compute this for the following reasons. First, studying depth is the first step towards answering the more complex and interesting question of algorithmic runtime. Second, compared to other state preparation methods, QROM offers far more space-depth trade-off flexibility, something we would like to leverage. Also, given the high costs of the algorithm, it is reasonable to assume that capable hardware would also offer the possibility of simultaneous Toffoli applications.
To estimate the Toffoli depth, we have to specify certain parameters such as the maximum number n dirty of dirty qubits available for QROM, the maximum allowed number n tof of simultaneous Toffoli applications. More details are provided in Appendix M. Notice that due to the relatively significant use of QROMs, the PP-based algorithm stands to benefit more than the AE algorithm on circuit depth reduction. As an example, for Li 0.75 [Li 0.17 Ni 0.25 Mn 0.58 ]O 2 , the clean qubit cost and Toffoli depth for PP are respectively 11,130 and 9.59×10 14 , while for AE they are 29,784 and 3.59×10 19 .
Lastly, note that while computing depth gets us one step closer to physical resource estimation, we are still estimating resources at the logical level, i.e. estimating the depth of the logical circuit. One of the assumptions in the logical setting regarding QROM is the all-to-all connectivity of the logical qubits, an assumption that could break down in realistic hardware. However, such geometrical constraints can be overcome through clever architecture designs and lattice surgery techniques [57,56], showing only a logarithmic overhead of Clifford gates is incurred to bring together far away encoded logical qubits on the hardware.

Conclusions
This work introduced the first example of a quantum algorithm that makes use of ionic pseudopotentials to simulate periodic materials. Our main technical contributions are a collection of highly-optimized compilation strategies for qubitization-based quantum phase estimation in first quantization. They are designed to reduce the cost of implementing the algorithm despite the mathematical complexity of pseudopotential operators. A key ingredient is the use of quantum read-only memories to avoid performing complicated arithmetic operations on a quantum computer. This also helps navigate tradeoffs between the number of qubits and number of gates in the algorithm, which can be exploited to reduce costs. Overall, we reduce the cost of the quantum algorithm by about four orders of magnitude compared to the all-electron approach described in Ref.
[87] when applied to simulating lithiumexcess materials for a fixed target accuracy of the ground-state energy.
Using these tailored quantum algorithms, we estimated the number of qubits and Toffoli gates needed to simulate Li-excess cathode materials proposed in the literature. In each of these cases, a large supercell is necessary to ensure the quality of the simulation, which considerably increases the cost of the all-electron approach, making the use of pseudopotentials even more important. On the other hand, even though our algorithm is applicable to any lattice, it benefits from the orthogonality of the lattice vectors, as this reduces the cost of quantum read-only memory techniques for preparing relevant Gaussian superposition states. It is desirable to study strategies where the cost of implementing the quantum algorithm is less dependent on the specific structure of the lattice.
The accuracy of the quantum algorithm depends on the quality of the pseudopotentials. Therefore, in selecting the Hartwigsen-Goedecker-Hutter pseudopotential we rely on extensive benchmarking by the density functional theory community that have identified it as accurate and transferable [51]. Additionally, we have constrained our analysis to separable pseudopotentials with one projector per angular momentum channel in the non-local component of the pseudopotential operator. Possible extensions of this work might consider more than one projector and the use of ultrasoft pseudopotentials [96] to further reduce the number of plane-wave basis functions as observed in classical algorithms.
Quantum computing offers the potential to perform high-accuracy simulations of stronglycorrelated systems of unprecedented size. This is an outstanding challenge for classical methods, that either cannot offer the same accuracy guarantees, or incur prohibitive costs. Still, to realize the potential advantages of quantum computing, further efforts are invaluable to reduce the cost of quantum algorithms. For example, the quality of the initial state preparation method should be addressed, since a single round of quantum phase estimation may need to be repeated too many times for states with poor overlap.
Overall, we have demonstrated that the method of pseudopotentials can be effectively applied to reduce the cost of quantum algorithms to simulate battery materials that require large supercell structural models. This can potentially unlock the application of quantum computing to address more complicated processes involving doped materials as well as chemical reactions at the electrode-electrolyte interface.

Acknowledgments
We thank Matthew Kiser, Stepan Fomichev, Soran Jahangiri, and Nathan Killoran for their fruitful comments.

A Plane wave matrix elements of the pseudopotential operator
In this section we derive the expressions for the matrix elements of the local and non-local components of the pseudopotential operator described in Section 3.1 in a plane-wave basis.

A.1 Matrix elements of the local potential
For an ion located at the coordinates R, the plane-wave matrix element of the local potential is defined by the integral where G ν = G q − G p . By changing the variable r → r + R we obtain with r = ∥r∥. In spherical coordinates, the integral above transforms as dθ cos(G ν r cos(θ)) r 2 sin(θ).
To integrate the angular variable we use which simplifies Eq. (75) to the integral over the radial variable Note that Eq. (77) is general and it can be used to compute the matrix element of any given local potential u loc (r). For the case of the Hartwigsen-Goedecker-Hutter (HGH) pseudopotential we insert Eq. (25) into Eq. (77) to obtain the following expression: where α = 1 √ 2r loc . The first integral is computed as follows: To evaluate the second term we compute the integrals: Using Eq. (79) and Eqs. (80) to (83) we obtain a final expression for the matrix elements of the local potential

A.2 Matrix elements of the non-local potential
The matrix elements of the non-local potential u NL defined in Eq. (26) are given by where ⟨r|β ij are reported in Ref.
[29], and R denotes the coordinates of an ion in the material. Here Y lm (ϕ, θ) denote spherical harmonics. By changing the variable r → r + R the overlap integral ⟨φ p |β (lm) i ⟩ writes as The plane wave function φ p (r) can be expanded in terms of spherical harmonics Y lm (ϕ, θ) and Bessel functions j l (G p r) [1] wherer andĜ p are unitary vectors. Using this expansion we compute the overlap in Eq. (86) as Similarly, the projection ⟨β (lm) j |φ q ⟩ is given by Plugging Eqs.
where G ν = G q − G p , P l is a Legendre polynomial, and ⟨p|i⟩ l is an overlap integral over the radial coordinates given by To compute the matrix elements of the non-local component of the Hartwigsen-Goedecker-Hutter (HGH) pseudopotenial, we use the HGH projectors defined as: where the radii r l give the range of the l-dependent projectors, and with Γ denoting the gamma function. For the case of one projector (i = 1) and l max ≤ 2, the overlap integrals defined by Eq. (92) are calculated as: Using (95)-(97) we compute the product of the projections ⟨p|i⟩ l ⟨i|q⟩ l entering Eq. (91) to obtain the final expression for the matrix elements where the coefficient B l := B  Figure 4: Convergence of the ground-state energy for the selected materials computed using DFT at the level of the PBE-GGA exchange-correlation functional [73]. a) Energies computed using plane waves and the HGH pseudopotentials as implemented in the Quantum ESPRESSO package. b) Energies obtained for the all-electron calculations using WIEN2k as a function of the cutoff parameters R MT and K max . The number of plane waves used to perform the resource estimations for each material is indicated with arrows.

B Structural models of lithium-excess materials
Here we provide more details on how structural models were built for the lithium-excess materials studied in this work. These models are used for determining the number of plane waves needed for convergence of the total energy, computed using density functional theory, and for resource estimation of the quantum algorithm. . Similarly, we removed one lithium layer from the supercell to estimate the resources to compute the site energies reported in Ref. [24]. Finally, the structural model for the delithiated Li 0.75 MnO 2 F material was obtained by randomly taking Li ions off the pristine material until we match the Li-ion concentration of x = 0.75. The relaxed structure of the fully lithiated material was taken from Ref. [83].

C Number of plane waves used to perform resources estimation of the quantum algorithm
The purpose of estimating the number of plane waves is to have an appropriate number that we can use to perform resource estimations of the quantum algorithm. To this aim, we estimate the number of plane waves needed to perform a classical calculation using the structures studied in this work. We conducted an energy convergence test by changing the cutoff energy and calculating the total energy at the Γ-point (Fig. 4). Each point in Fig. 4a plot corresponds to the total energy value, calculated by using the Quantum ESPRESSO package, for a given kinetic energy cutoff. Fig. 4b) depicts a similar analysis conducted using the WIEN2k program, an all-electron approach, where the cutoff in the reciprocal space is given by the (R MT and K max ) parameter. The number of plane waves was chosen to ensure that the total energy variation was less than 1 kcal/mol = 0.043 eV (chemical accuracy).

D Linear combination of unitaries for the non-local potential
This section provides details on the construction of the linear combination of unitaries (LCU) decomposition for the non-local term of the pseudopotential operator. The LCU for the non-local term exploits the projector representation of the operator, and breaks down each projection P to the sum of unitaries 1 We decompose U N L by focusing on each individual term in the function f I . Define: Now for the term in Eq. (100) including r 0 , we define where R(R I ) |p⟩ = e iGp·R I |p⟩ is a phase action and we have suppressed the electron index j for the phase and for the register in state |Ψ I,0 ⟩. Thus, in the formula above, using the same notations of Eq. (43) in the main text, for σ = 0 we have The sign of c I,0 only depends on B 0 , i.e., on the atomic type of I. For the next term in U N L involving r 1 , we adopt the notation σ = (1, ω). For ω ∈ {x, y, z}, we define where G p = (G p,1 , G p,2 , G p,3 ), and we have the following decomposition depending on the coordinate ω: For the term including the scalar 32r 7 2 B 2 /45, corresponding to the index σ = (2, 0), we define and note the following decomposition: implying Finally, for the term including the scalar in Eq. (100), corresponding to the index σ = (2, (ω, ω ′ )) where ω ̸ = ω ′ ∈ {x, y, z}, we first define and derive the following decomposition implying This finishes the LCU for U N L : In each of the above decompositions, there is a multiple of the identity, which are not considered in the qubitization as we can shift the Hamiltonian by the appropriate scalar. Throughout the text, we use the alternative indexing 0 ≤ σ ≤ 10 to index the operators above in the order they were derived.

E QROM: application, parallelization and costs
This section of the appendix provides more details on how quantum read-only memories (QROMs) can be used to prepare arbitrary superposition states. It also explains the cost of implementing a QROM and the space-time tradeoffs that arise.

E.1 Using QROM to prepare superpositions
As mentioned in Section 2.4, there are three different types of QROMs that one can use. Two of these, called Select and SelSwapDirty, are of interest to us. Fig. 5 provides an overview of their circuit implementation. We now prove the error estimate in Lemma E.1. We use the same notations as in Section 2.4, which we briefly recall here. The target state is |ψ⟩ = x a x |x⟩, where we assume a x ≥ 0 for simplicity. For any bit-string y of length w ≤ n, define p y = prefix w (x)=y |a x | 2 , cos(θ y ) = p y0 /p y , and the QROM oracles O w |y⟩ |0⟩ = |y⟩ |θ y ⟩, outputting θ y up to b bits of precision for each w. The precise form of Lemma E.1 assumes an exact preparation for the synthesis of rotation R.

Select
Swap with norm ε b,y for each rotation, the next step gives Remark E.1. To synthesize a rotation, one needs a so-called gradient state. Given access to a gradient state, the synthesis of an exact rotation has cost b − 3 Toffolis [87, Eq. (55)]. We note that the gradient state used in all single-qubit rotations synthesis in our algorithm is precomputed. Even if we were to consider the error, the cost added is polylogarithmic and can be safely ignored in our resource estimations. Remark E.2. Assuming the amplitudes are not positive, there is one last iteration of the algorithm which we did not include in the previous estimate. If the phase ϕ x = arg[a x /|a x |] is nontrivial, then a QROM and its inverse reading n qubits along with a rotation are needed. Thus, in general, the error of this last rotation needs to be added to the estimate above. However, we explain below why there is no error for our specific cases.
Our applications are either part of the PREP subroutine or part of SEL N L . For SEL N L , the preparation of the Gaussian states |Ψ I,σ ⟩ of certain types σ has amplitudes with a phase that is ±1. In those cases, the rotation is a Z gate which has no error. For the PREP subroutines, such as those in Eqs. (52) and (53) for the momentum state of U loc and the PREP state corresponding to U N L , we notice a register s loc , s N L that holds a sign which is later used in the corresponding SEL subroutine. The cost of computing this is one last QROM that reads n qubits to compute the sign. Again, there is no error in this last step.

E.2 QROM gate and qubit costings
In this section, we list the costs that are relevant for performing resource estimation of various parts of the algorithm that use variants of QROM. Table 4 further below, which is a copy of Table 1 in the main text is used to derive the estimates, along with a careful examination of the cost of each step of Algorithm 1. In some cases, we are simply recomputing some of the estimates in [63, App. D.b] but with more precision. First, we recall the Toffoli cost of a single QROM for our variants of interest: Using the above for SelSwapDirty and recalling Remark E.1, we estimate the Toffoli cost of Algorithm 1 for N = 2 n . It is given by which is bounded by The leading factor of two is due to the application of O w and its inverse in the iterative process. Further, the sum over w goes from 0 to n, instead of 0 to n − 1, since in the last iteration we need to also output the phase ϕ To find the optimal value of β, one has to also take into account the maximum number of available dirty qubits n dirty . This number is set equal to the total number of qubits on the circuit, minus the ones already used as clean qubits in the QROM itself. As the QROM consumes βb many dirty qubits, we must have βb ≤ n dirty . Since the leading term in the cost is always 2(2 n+1 − 1), one can show that the optimal value of β satisfying its constraint is Lastly, the cost for preparing a state using the Select variant of QROM is The qubit costings are already described in Table 4. Note the clean qubit cost b + ⌈log(N )⌉ is independent of β.
Remark E.3. The variant of choice for QROM when applied on very few qubits is Select. Its implementation, as shown in Fig. 5, along with its cost analysis, is far simpler than that of SelSwapDirty.

E.3 QROM Toffoli depth
SelSwapDirty offers considerable flexibility in controlling the Toffoli depth. As shown in Fig. 5, the implementation of the operator SWAP in SelSwapDirty involves β-controlled swap operations on β + 1 registers, each of size b. Without any parallelization, the depth would be βb. However, such an operation can be extensively parallelized on a circuit to a depth of b log(β) as explained in [63, App. B.2.b], using simply controlled swap operations applied on two registers of size b. The parameters involved in this depth reduction and considered for the purpose of resource estimation, are • κ: This factor is between 1 and b and determines the extent to which we further parallelize the circuit by implementing κ of the b many controlled swap operations simultaneously. This would bring down the depth to (b/κ) log β. The assumption in [63] is that κ = b, meaning all controlled swap operations are applied in parallel.
• n tof : The maximum allowed number of simultaneous Toffoli application.
• n dirty : Already defined in Appendix E.2.
It can be shown that the depth of SelSwapDirty on n qubits is: As mentioned, the choice of κ = b gives Table 4, but we have the following constraints on the number of simultaneous Toffoli applications and dirty qubits: Finally, the depth of Algorithm 1 is Given the range of parameters in our case studies, we always have b, n ≪ 2 n+1 − 1. Thus, the optimal value of β satisfying the constraints is where log e is the logarithm in the natural basis. Note that if n dirty , n tof are large enough, for example ∼ O(2 n+2 ), then we could set κ = b and the minimum possible depth would be achieved with β = ⌊ 2·(2 n+1 −1) 3bn/κ log e 2⌋: 2 3 n + 2 − log(3n) + log(log e 2) n + 3n

F Inequality test
In this section, we review and generalize to general lattices the inequality test technique used for the preparation of the momentum state superposition for the all-electron Hamiltonian. Recall that this state is shared by the PREP state of U and V . The task is to prepare: where λ ν is a normalization factor. There are two main steps, with the first independent from the lattice structure.
1. Prepare a unary-encoded register on µ = 2 to n p + 1, i.e., np+1 µ=2 √ 2 µ |µ⟩. Then using controlled Hadamards over registers |ν x ⟩, |ν y ⟩, and |ν z ⟩, prepare a uniform superposition taking values from −2 µ−1 + 1 to 2 µ−1 − 1 as signed integers. These superpositions are over a series of nested cubes C µ = {ν : ∥ν∥ ∞ < 2 µ−1 }, whose differences are denoted by B µ = C µ \C µ−1 . In the previous preparation, |+0⟩ and |−0⟩ both appear, and the latter is flagged as failure. Furthermore, to avoid double-counting, for a ν prepared for µ, if ν / ∈ B µ , it is flagged as failure. Finally, we prepare a uniform superposition over a register |m⟩ of size n M , where n M is to be determined later by error analysis. Overall, we obtain the following state: where M = 2 n M and |Φ ⊥ ⟩ includes the basis states flagged as failure by any of the two qubits in register f lag. See [9, p. 4-5] for more details.
2. For every |µ⟩ |ν⟩ |m⟩, we check whether: where b min = σ 3 (B) is the smallest singular value of the lattice matrix B := (b 1 , b 2 , b 3 ) with b ω the reciprocal lattice column vector. Notice b min coincides with min ω ∥b ω ∥ for an orthogonal lattice. One must show the soundness of the inequality, in other words, as m < M , we need . To implement the inequality test, we opt to use a SelSwapDirty QROM circuit O to compute: where a ν = ⌈M (2 µ−2 b min /G ν ) 2 ⌉ is calculated classically. This is followed by an n M -bit comparator circuit to compare a ν to m and store the success as |0⟩ in the third flag qubit below: Upon success, the amplitude for each ν is as desired up to a uniform scale: Amplitude amplification for |000⟩ flag is the last step, amplifying its probability above some predetermined p th . This finishes the preparation of Eq. (129).

G Exact amplitude amplification with known initial amplitude
In our implementation of SEL N L , we need an improved amplitude amplification that yields a success probability equal to one for preparing the state |Ψ I,2,0 ⟩. As mentioned in [66, App. A], this technique is folklore knowledge but no reference could be found describing the costings of the method. This section gives an overview of the algorithm and discusses how to compute its cost.

G.2 Cost and error
The Toffoli cost in this case, assuming a priori access to a gradient state, is the one Toffoli used to prepare Eq. (135), plus n AA − 3 Toffolis to make the rotated ancilla with n AA bits of precision. The qubit cost is the two additional ones used in Eq. (135) plus the n AA qubits to be used in the gradient state to achieve required precision. In our resource estimations, we choose a default value of n AA = 35, which is more than enough to ensure high precision in our cases. Notice this cost is to be added to the cost of preparing Φ, and then multiplied by (2l + 1) to give the full amplitude amplification cost.
Overall, requiring a lot more bits of precision is very cheap, given that the costs are many orders of magnitude less than the other subroutines involved in the preparation of |Ψ I,2,0 ⟩. Lastly, no error on the ground state energy estimation is induced from this step; the effective value of λ changes very slightly, through the scaling of λ N L (which is by far the lowest contributor to λ) by a factor of about (1 − 2 −2n AA ) −1 . Due to its negligible impact in our resource estimations, we ignore the exact calculation of this change but note that it can be easily included.

H The details of Prepare and Select operators
This section describes the implementation of prepare and select operators in more depth. We make use of the following notation: • T : the number of atomic species in the cell, and τ = ⌈log(T )⌉, • N t : the number of nuclei with atomic type t in the cell, and n t = ⌈log(N t )⌉.
Note that T −1 t=0 N t = L by definition, where L is the number of atoms. Expressing the nuclei I = (t, t j ) in register k ′ loc or k ′ N L costs τ + max t n t qubits.

H.1 Prepare
We first address the registers that have a known or straightforward implementation. 2. For the states of the registers X (47) and f (48), as motivated by Remark E.3, we choose the Select variant of QROM in Algorithm 1. The small subtlety in the case of f is that Algorithm 1 is used to prepare the superposition over |ω, ω ′ ⟩ f , and the fifth qubit |sgn(⟨b ω , b ω ′ ⟩)⟩ f is computed directly by reading |ω, ω ′ ⟩ f using a Select QROM.
3. Finally, for the superposition in Eq. (51) on j V , k V , the exact same algorithm in Appendix F applies, where one only needs to change the notation from M → M V . For example, the inequality test is mG 2 ν < M V (2 µ−2 b min ) 2 . Next, we discuss the more involved superposition for the local and non-local term.

H.1.1 Momentum state superposition for U loc
We rewrite the superposition in Eq. (52) on k loc , k ′ loc , s loc : The preparation of the above state is done almost entirely using QROM. Recall that the function γ I is solely determined by the atomic type t of I = (t, t j ), where 0 ≤ t ≤ T − 1, and 0 ≤ t j ≤ N t − 1 enumerates the type t atoms in the cell. The state |t⟩ k ′ loc is represented with a binary string of length τ = ⌈log(T )⌉, and the state |t j ⟩ k ′ loc with one of length n t = ⌈log(N t )⌉. We prepare the state below over |ν⟩ k loc |t⟩ k ′ loc using Algorithm 1 with SelSwapDirty QROMs, followed by a final SelSwapDirty QROM that reads |ν⟩ k loc |t⟩ k ′ loc and outputs |sgn(γ t (G ν ))⟩ s loc (Remark E.2): Here, Ps(n, b) is the probability of success for the preparation of a uniform superposition over n basis states using b bits of precision for the involved rotation ([87, App. J]). In all of our implementations, we set b r = 8 which gives a very high success probability for all different values of n in our simulations. The state we get from using Algorithm 1 is not exactly that of Eq. (137) and, similarly to the preparation of Eq. (129), it may include inadmissible states, such as |ν⟩ = |±0⟩ or |t⟩ for a t ≥ T . Notice that for states prepared using QROM, the corresponding action of SEL on inadmissible basis states does not have to be a trivial action with an exactly computable scalar, as we are not attempting to shift the Hamiltonian by some known scalar. We take into account these inadmissible basis states by the error they induce throughout the block-encoding. Thus it is not necessary to flag these basis states, unless it is to make the action of SEL well defined. However, as shown by the definition of SEL, the action there is already well defined: these basis states can only be an issue when computing R I or the phase action e iGν ·R I . When ν = |±0⟩, G ν is the origin vector, and when t ≥ T leading to some undefined R I , the QROM computing R I is programmed to output the all-zero state. Therefore, the action is not only well-defined but also trivial.
The last step is a direct application of the technique in [49, App. A.2] to prepare a uniform superposition over the n t nuclei of type t, meaning t,ν∈G 0 n t Ps(n t , b r ) where |ψ ⊥ t ⟩ k ′ loc is some inadmissible state and |·⟩ I,loc flags the desired uniform superposition. Simplifying the expression above for |0⟩ I,loc yields the desired state of Eq. (136).

H.1.2 PREP state for U N L
We wish to prepare the state of registers k N L , k ′ N L , s N L in Eq. (53), rewritten below: 1 The process is very similar to U loc . First, the state T −1 t=0 10 σ=0 n t |c t,σ | Ps(n t , b r ) is prepared using a SelSwapDirty QROM reading τ + 4 many qubits, where the four qubits encode 0 ≤ σ ≤ 10. This is followed by a uniform superposition over n t basis states with success probability Ps(n t , b r ): t,σ n t |c t,σ | Ps(n t , b r ) Finally, for the value in register s N L , we need to know the atomic type t and which of the three subgroups of the 11 types σ refers to, i.e., σ = 0, or σ = (1, ω), or σ = (2, x) where x is either 0 or (ω, ω ′ ). This identification requires us to compute two bits of information, which can be done by inequality tests as the three subgroups correspond to the enumeration σ = 0, 1 ≤ σ ≤ 3, and 4 ≤ σ ≤ 10. Thus a Select QROM reading τ + 2 qubits can be used to finish the preparation of Eq. (139).

H.2 Select
Below, we go over the remaining details of SEL T , SEL V , and SEL loc , and dedicate a separate section for SEL N L .
1. SEL T : We recall the transformation implemented by this operator In addition to the CSWAP that sends back and forth the (j, ω, r), (j, ω ′ , s) coordinates to an auxiliary register, we have a phase that is controlled on the following: • The state | χ ⟩ is equal to |00⟩, • The register c flags the success of i = j in registers d, e (i.e., |1⟩ c ), • The ancilla attached to register f flags the admissible states ((ω ̸ = 4) ∧ (ω ′ ̸ = 4)) in the approximate superposition prepared by QROM. This uses three Toffolis and two additional qubits.
With the help of three additional Toffolis and a single auxiliary register, we can record the success of all three above in terms of the state |0⟩ T . This would indicate the success of the state preparation of T . Therefore the phase action of SEL T is only controlled on a single auxiliary state |·⟩ T when the lattice is orthogonal (Remark 3.1), and otherwise, it is controlled on an additional register |·⟩ b . Note this additional control is only for implementing (−1) b(pω,rp ω ′ ,s ) , where one needs two Toffolis.
2. SEL V : We recall the phase and controlled arithmetics carried out by this operator: The addition and subtraction along with the phase implementation must be controlled on the success of state preparation for V , which occurs when the state (| χ ⟩ X is equal to |01⟩ X ) and the j v , c registers are in state |0⟩ j V |0⟩ c . This can be computed via three Toffolis into one single register |0⟩ V . Thus addition and subtraction along with the phase implementation are controlled on a single auxiliary register.
3. SEL loc : We recall the overall action: The subtraction of ν from p is controlled on (| χ ⟩ X being equal to |10⟩ X ), and registers c and I, loc being in state |1⟩ c |0⟩ I,loc . This is computed via three Toffolis and recorded as |0⟩ loc , indicating the success of the state preparation for U loc . Recall |·⟩ I,loc was computed as part of the preparation algorithm in Eq. (138).
To implement e iGν ·R I , we employ a Select QROM that outputs R I into a register denoted by R and of size n R . This QROM reads |x⟩ loc |I⟩ k ′ loc |0⟩ R and outputs R I in register R if x = 0 and leaves |0⟩ R unchanged otherwise. Note that in case I's type is inadmissible (as discussed in Appendix H.1.1), one may simply leave R = 0. The implementation of the phase e iGν ·R I does not need to be controlled as for any case other than the local and non-local operators being qubitized, the register R is all-zero. Once the phase is implemented, we erase the register R (by applying the inverse of the QROM) and apply Z s loc controlled on |0⟩ loc .

The Select QROM-based preparation of the state
where |i⟩ is one-hotencoded in three qubits.

The parallel SelSwapDirty QROM-based preparation of three states, each QROM reading
|p ω ⟩ |t⟩ k ′ N L |·⟩ N L,c |i⟩ ω , a total of n p + τ + 1 + 1 qubits. We ensure that the state to be prepared is indeed of type (c) by reading |·⟩ N L,c . This register is computed using one Toffoli calculating the AND of |·⟩ N L and |σ == (2, 0)⟩, where the latter is determined using three Toffolis and three qubits by checking the condition σ == (2, 0) (recall 0 ≤ σ ≤ 10 is represented using four qubits).

•
The probability of success for |0⟩ flag is 1 8 , which is larger than sin π 2(2·2+1) 2 ≈ 0.095. Therefore, using the technique in Appendix G, we can ensure that after l = 2 steps of amplitude amplification, the success probability is very close to one, so that the error in our implementation of the reflection is negligible.
The case of non-orthogonal lattices. We have two materials in Table 2 with a non-orthogonal lattice. However, as one of the lattice vectors is orthogonal to the other two, the states |Ψ I,σ ⟩ admit a decomposition into states of size n p and 2n p qubits. Therefore, the QROM cost will need to change accordingly for all states of type (a), (b), and (d). For type (c), after a suitable change of axis, we Hence, the preparation of |Ψ I,2,0 ⟩ can be adapted as follows: prepare a superposition ). The amplitude amplification will need to be done on a qubit with success probability 1 4 . Thus only one amplification is necessary to get exact success probability 1, and there is no need to use the technique in Appendix G.

I Derivation of λ
In this section, we follow the guideline of Section 4.2 for computing the effective values of λ T , λ V , λ N L and λ loc . This includes finding the relevant success probabilities in state preparation, which flag the admissible states in the superposition. Then one needs to find the approximated amplitudes implemented by the algorithm for the admissible states.
ΩPs(η,br) 2 , where λ ν,loc = I,ν∈G 0 Table 5: Values of λ T , λ V , λ N L , λ loc and total λ according to calculations in Appendix I. The corresponding qubit numbers, e.g. n T are computed in Appendix J. The probability P amp ν,V given by Eq. (154) is an amplification of the initial probability P ν,V defined in Eq. (155). Each term must include the success probability Ps(η, b r ) 2 for preparing the superposition over electron pairs in Eq. (50).

I.1 λ T
For the kinetic term T , there are two success probabilities to consider. One is for preparing the states of registers g, h in Eq. (49), which brings an adjustment by a probability of (2 np−1 −1) 2 2 2np−2 , as demonstrated in [87,Eq. (71)]. The other is for preparing Eq. (50), which involves creating two uniform superposition over electrons, yielding the adjustment by Ps(η, b r ) 2 . So we need to replace the theoretical value λ T = η If the lattice is orthogonal, λ T is half the above value (Remark 3.1).

I.2 λ N L
Given the implementation in Appendix H.1.2, the amplitudes are correctly scaled such that the success probabilities Ps(n t , b r ) are canceled out. Thus, only the success probability Ps(η, b r ) 2 for the superposition over pairs of electrons must be taken into account. Therefore the theoretical value of λ N L = η I,σ |c I,σ | is adjusted as follows: Here, we need to consider multiple adjustments to the theoretical value of λ V . The first one is the probability of success P ν,V , flagged by |0⟩ j V . We shall amplify it above some set threshold probability p th . Furthermore, similar to U, V in the OAE case [87, Eq. (124)], the amplitudes implemented by the inequality test in Appendix H.1 are not exactly 1 Gν . Indeed, while theoretically λ ν,V = ν∈G 0 1/G 2 ν , the effective amplitudes are as mentioned in Eq. (134). Therefore, we have an adjustment for the normalization of the success state flagged by |0⟩ j V : This is then used to adjust the value of λ V , along with the amplified probability and the usual Ps(η, b r ) 2 for the electron pairs superposition: To complete our derivation, we recall the expression for the amplified probability P amp ν,V , where given a V many amplitude amplifications to reach a set success probability threshold p th . Lastly, following the Eq. (134), P ν,V can be shown to be given by

I.4 λ loc
We recall the implementation of the momentum state for U loc in Appendix H.1.1, where the QROM scaled the amplitudes by Ps(n t , b r ) −1 , ensuring this success probability gets canceled after preparing the superposition over the nuclei of each atomic type. Thus, similar to λ N L , we only need to adjust by the usual electron pairs success probability preparation: Eq. (186) ε Ψ ≤ 18(np+4+τ )πη 2 n Ψ I,σ |c I,σ | Table 6: Equations derived in Appendix J linking the target errors ε X and their associated finite size registers n X . By replacing the inequality with equality, one obtains the n X that achieves a target error ε X . In the estimation for ε R , F I,ν (Eq. (177)) is some expression bounding the entries of the non-local term.

J Error Analysis
We estimate the errors listed in Section 4.1. To do so, we make the following basic observation. Assume that an LCU of the form a α a U a is approximated by a ξ a V a . Here, ξ a is obtained after a series of approximations due to choosing finite size registers S α = (n s 1 , . . . , n s k ) and similarly for V a , where we have a series of approximations using finite size registers S U = (m s 1 , . . . , m s l ). We estimate ∥ a α a U a − a ξ a V a ∥ using the triangle inequality, by building the following LCU series: • a α a U a,j where 0 ≤ j ≤ l means we perform the series of approximations up to m st . Note that U a,l = V a and U a,0 := U a .
• a α a,i V a , where 0 ≤ i ≤ k means we use only the finite size registers up to n s i . Note that α a,k = ξ a and α a,0 = α a .
Then we can estimate a α a U a,j by a α a U a,j+1 for 0 ≤ j ≤ l − 1, and a α a,i V a by a α a,i+1 V a for For each of the four operator χ = T, V, U N L , U loc , we must identify the order in which the approximations must be introduced. For the selection probabilities, in addition to the choice nχ which is the first to be made for all of the four operators, there is only one other approximation. For example, for T , the order of approximation is (nχ, n B ) while for U N L , it is (nχ, n N L ). There is also at most one choice for the unitaries for all four operators, with the exception of the non-local term; there, the order of approximations is (n R , n Ψ ).
We finish this discussion with a lemma that is essential in getting an accurate estimate of the errors made by QROM when scaling a qubitized operator by its λ.
Lemma J.1. Assume the unit state |ψ⟩ = 1 √ λ a α a |a⟩ with α a ∈ C, is approximated by the unit state |ψ⟩ = a ξ a |a⟩ up to error: Proof. We use Cauchy-Schwarz and triangle inequality a ||ξ a The first equality is the conjugate identity, the inequality after is Cauchy-Schwartz. It is followed by a triangle inequality for (|ξ a | √ λ−|α a |) 2 ≤ (|ξ a √ λ−α a |) 2 and the expansion of the term (|ξ a | √ λ+|α a |) 2 . Then we use directly the assumption to bound the first term, while the second term expansion simplifies since a |ξ a | 2 = 1, a |α a | 2 = λ. The rest is another application of Cauchy-Schwartz.
While the order of approximation in Eq. (158) starts with the unitaries and then the selection probabilities, we found it more instructive to first discuss the errors related to PREP, i.e., the selection probabilities.

J.1 Errors in PREP
The register X is a superposition made by QROM with target amplitudes λχ/λ. As shown in Lemma E.1, the error in estimating the normalized state is ϵ = nπ 2 nχ where n is the number of qubits in register X . Since we have four operators, n = 2. Thus the equation determining εχ after taking into account the normalization λ and using Lemma J.1 is: Note that in the OAE setting, ε T [87, Eq. (D29)] is the closest analog to our εχ.

J.1.2 ε B
This error is derived similarly to the previous one. It approximates the normalized state in register f (Eq. (48)) up to error ϵ = 4π 2 n B as we use 4 qubits to denote the two coordinates ω, ω ′ . The normalization factor λ (in the context of Lemma J.1) is η2 2np−2 2 ω,ω ′ |⟨b ω , b ω ′ ⟩|, and thus the error induced is If the lattice is orthogonal, then ε B is bounded by half the estimate above (Remark 3.1).

J.1.3 ε N L
The superposition over |t⟩ k ′ According to Lemma E.1, this leads to an error ϵ = π(τ +4) 2 n N L in preparing the normalized state. By Lemma J.1, the error induced on the selection probabilities is Let us explain the factor η t,σ | ct,σnt Ps(nt,br) |, which is supposed to be the factor λ in Lemma J.1. First, note that η is simply taking into account the sum over the η electrons.

J.1.5 ε M loc
The error analysis here is similar to ε N L in Appendix J.1.3, as the preparation method of the momentum state for the local term is also based on QROM followed by a uniform superposition over n t basis states. The coefficients estimated by the QROM are α a = γt(Gν ) 1/2 n 1/2 t Gν Ps(nt,br) 1/2 , where a = (t, ν). After applying QROM, superpositions over n t nuclei of atomic species t are created which introduce an amplitude of Ps(nt,br) nt . So we need to bound the error a=(t,ν) nt Ps(nt,br) |ξ a − α a | 2 . We use the simple bound ε = max t ( nt Ps(nt,br) ) · π(3np+τ ) 2 n M loc where the latter term is the bound on a=(t,ν) |ξ a − α a | 2 given by the QROM approximation of the normalized state (Lemma E.1). Therefore, by virtue of Lemma J.1 with λ in that lemma set as 4πη We let U loc be the approximation of U loc as a result of using n R bits to compute the approximation R I of R I , and define δ R = max I ∥R I − R I ∥. We have δ R ≤ max ∥a i ∥ 2 n R +1 as R I = 3 i=1 a i r i,I , where 0 ≤ r i,I ≤ 1 are the given fractional coordinates of the nuclei in the cell. Given the LCU of U loc in Eq. (41), we have: 4ηπ The LCU of U N L can not be used like its local counterpart to facilitate the estimation of ε R,N L . Instead we have to first derive an estimate for the entries of U N L . Below, we provide two estimates, the first is the tighter one, the second is more pessimistic but easier to compute and is used for the purpose of resource estimation. Recall Then, The (k N L , k ′ N L , s N L ) register superposition prepared using QROM. β N L defined in Eq. (189).
Preparing the superposition for the register (j V , k V ) with amplitudes 1/G ν using QROM in inequality test; β V defined in Eq. (193).
Reflection on state preparation qubits Toffoli cost Reflection on the qubits used in state preparation; see Appendix K.3. states |Ψ I,σ ⟩ of type (c) in Eq. (147). We will not consider the second and higher order of errors in our approximation as their impact is too small, and we have already a pessimistic estimate above by taking ϵ 2 ≤ ϵ. The case for partially orthogonal lattices is simpler, as there are two QROMs and therefore two associated errors, however one must select n = 2n p + τ + 4, or n = 2n p + τ + 2. Thus, for the reflections, we have: Hence the error is estimated as: when the lattice is orthogonal, and we simply replace (n p + 4 + τ ) by (2n p + 4 + τ ) if the lattice is partially orthogonal. As explained in Appendix H.2, the implementation of the reflection onto |Ψ I,2,0 ⟩ requires an additional QROM to prepare a one-hot-encoded superposition 3 Denoting by n b the number of qubits used by the Algorithm 1 rotations to prepare the superposition, the associated error ε N L ′ satisfies ε N L ′ ≤ 3π · 2 −n b . However, to make the analysis easier for our case-studies while also retaining accuracy later on in our resource estimations, we choose n b so large that it gives the superposition with a negligible error. By choosing n b = 50, the error is of order 8e − 15, which is small enough to be safely ignored in our analysis. Even for larger materials than those in our case studies with a much larger λ N L , one can always increase n b by a small amount without any significant accrued gate and qubit cost.

K Gate costings
In this section, we derive the Toffoli cost expressions for the algorithm. A summary of the results is given in Table 7. We make a few general remarks on this table: 1. The cost calculated for the PREP subroutines always includes the uncomputation part by PREP † . Hence, most costs have a leading factor of two.
2. The cost for the reflections on |Ψ I,σ ⟩ for all σ will need to change slightly for the materials with non-orthogonal lattices (see Appendix K.2).
3. While we list the Toffoli cost in Table 7, we also study Toffoli depth, and calculating the latter mostly involves replacing the QROM costs expressions in Table 7 by their depth formulae in Table 4. 4. The parameters β − determine the space-depth tradeoff of the QROM (Table 4). Optimizing the expressions in Table 4 in terms of β − generally leads to a much higher total number of qubits compared to the AE case. Thus we determine β − in a way that satisfies constraints on the number n dirty of dirty qubits that can be used. When estimating depth, there will be an additional constraint posed by the maximum allowed number n tof of simultaneous Toffoli applications.
Finally, note that the gate and qubit costings in the AE case for general lattices is the same as OAE in [87] with two exceptions: • The costing for preparing the state of register f , over |ω, ω ′ , sgn(⟨b ω , b ω ′ ⟩)⟩, • The costing for preparing the momentum state superposition, which is identical to preparing the momentum state for V in the PP-based algorithm.
The gate and qubit estimates for these are derived further below.

K.1 Toffoli cost of Prepare
The superposition χ ∈{0,1} 2 λχ λ | χ ⟩ X on two qubits is prepared using the Select QROM-based Algorithm 1, reading 2 qubits and using a register of size nχ for the precision of the rotations. Its gate cost is directly derived from Eq. (122), substituting n = 2 and b = nχ. Notice that the inverse of the operation in PREP † is responsible for the doubling of the cost, yielding 2 · [2(2 2+1 − 1) + (nχ − 3)2].

K.1.2 Register f
We use a Select QROMs to prepare the superposition in register f and the same variant to output sgn(⟨b ω , b ω ′ ⟩) in the fifth qubit. According to Eq. (122), the former has cost 2(2 4+1 − 1) + (n B − 3)4 as 4 qubits are read, while the latter has cost 2 4 (117). The inverse of these operations for PREP † has the same cost.

K.1.3 Register R
We used two Select QROMs to output the nuclei coordinates R I into register R, one for each of the local and non-local term. Each reads the nuclei type and its enumeration, i.e. τ + max t n t qubits. They further read the qubit of the register loc and N L to effectively control their output. By a direct application of Eq. (117), the cost is 2 · [2 · 2 τ +maxt nt+1 ], which is doubled due to PREP † .
We apply the algorithm and cost estimate in [49, App. A.2] for preparing the uniform superposition over n t basis states enumerating the nuclei of type t in register k ′ loc and k ′ N L , giving the Toffoli cost 2 · [3 max t n t − 3v 2 (max t n t ) + 2b r − 9] for each, where v 2 (x) is the largest power of two dividing x ∈ Z. The uncomputation has the same cost, therefore doubling the said amount.
There is one small subtlety that we did not address when implementing the PREP states of U loc and U N L . Our application of [49, App. A.2] assumes that the number of qubits n t needed for creating the superposition over n t many basis states, and the number n t itself, are both stored in some registers. These two registers are computed using the Select variant of QROM that needs to read only the atomic type, and has cost 2 · 2 τ , which is further doubled due to the inverse of PREP.
Circuit Depth.
Following the same strategy in the previous case, we consider the constraints β loc κ loc ≤ n tof , β loc (n M loc + 1) ≤ n dirty . We are using n M loc + 1 instead of n M loc for the very last QROM, therefore, to get an upper bound of the resource estimate, we use n M loc + 1 in the constraints. The depth is where

K.2 Toffoli cost of Select
Below we briefly go over cost estimates that are very similar to the OAE setting. CSWAPs and the SEL cost for T . Recall that at the beginning and end of all SELχ operators, there is a shared circuit of CSWAPs. The cost estimate used in [87, Eq. (72-73)] applies without any change, to perform the CSWAPs on the plane wave vectors for all four operators and the bits r, s of coordinates ω, ω ′ for the operator T , copying them back and forth to an auxiliary register and implementing the necessary phases for T .
Controlled addition/subtraction of the momentum state vector. While the same cost in the OAE setting [87, Eq. (93)] was computed as 24n p , here one needs to take into account two separate application of this operation for U loc , V yielding 48n p .
Phasings by the nuclei coordinates. There are two such phasings, one for the local part, which cost 6n p n R is computed exactly as in [87, Eq, (97)], and another for the non-local, which cost is 12n p n R , as we apply the phase once for G q and then for G p after the reflection on |Ψ I,σ ⟩.

K.2.1 Reflection on |Ψ I,σ ⟩
There are two costs to be estimated. One is the preparation of |Ψ I,σ ⟩ by the operator U I,σ (and its inverse) and the other is the reflection onto |0⟩ N L |0⟩ ⊗3np . The latter's Toffoli cost is (3n p + 1) − 2. Below we compute the costs for a material with an orthogonal lattice, and end with a remark on the changes required for the non-orthogonal case.
There are three sets of SelSwapDirty QROMs used for each coordinate, along with their inverse that follows the reflection on |0⟩ N L |0⟩ ⊗3np . This means a factor of six. The QROMs read n qrom = n p + τ + 4, i.e. the number of bits in one coordinate p ω of the plane wave vector p, the type of the nuclei and Gaussian state (t, σ). But since τ + 4 of these qubits are already determined by the PREP state, the iterative process of QROM to build the superposition happens only n p times. Thus the reflection costs To the cost above, one needs to add the one for implementing V I,σ for the reflection onto |Ψ I,2,0 ⟩. We refer to Appendix H.2 for the relevant notations. We can summarize this cost as 5 · 2(QROM i + 3 · QROM Ψ + (n AA − 2) + 2), where • the factor of five is because of the exact amplitude amplification, • n AA − 2 is due to using the trick in Appendix G, • the additional 2 is to compute the flag qubit out of the three hot encoded qubits |i⟩, and with n ′ qrom = n p + τ + 2 and n b = 50 (Appendix J.2.2). QROM i is the cost of preparing the one-hot- j |i⟩, and QROM Ψ is the cost for preparing the superposition |Ψ i,I,σ ⟩ for each given i. According to Eq. (121) the expression for β ′ Ψ is Notice that while we are using a different β ′ Ψ ̸ = β Ψ , the same n Ψ in Eq. (201) is used. This enables us to also take into account the error in preparing |Ψ I,2,0 ⟩ when analyzing the error ε Ψ due to the choice n Ψ (Appendix J.2.2).
Non-orthogonal case. The non-orthogonal lattices in our case studies allow a decomposition of the Gaussian states into a 1D and 2D factor. The cost for U I,σ will change to include that of two different QROMs reading n p + τ + 4 and 2n p + τ + 4 qubits. Further, for the implementation of V I,σ , we only need one amplitude amplification, and the hot-encoded superposition above is over two qubits. The necessary changes to the cost formulae are straightforward. For example, for V I,σ , the cost changes to 3 · 2(QROM Ψ,1 + QROM Ψ,2 + QROM ′ i + 2), where QROM Ψ,1 is the same as (203), and where we note the substitution n ′ qrom = 2n p + τ + 2 and replacing n p with 2n p where appropriate. Circuit Depth. Given the decomposition of QROM to three parallel QROMs, we can parallelize the computation more so than in the previous procedures. We apply the three sets of QROMs in parallel, in addition to reducing their depth using Eq. (126), yielding a circuit depth of to implement |Ψ I,σ ⟩ for σ ̸ = (2, 0). However note that the number of dirty qubits used in this case is 3β Ψ n Ψ ≤ n dirty with the factor 3 due to simultaneously preparing the three 1D Gaussian states.
Similarly we have 3β Ψ κ Ψ ≤ n tof . These constraints imply the following optimization • Three qubits for the one-hot-encoded superposition 3 i=1 |i⟩. This is two for nonorthogonal lattices.
• All ancilla qubits used by all QROMs are either dirty and from the circuit itself, which are returned to their initial state, or are clean (such as in the Select variant of QROM), which are uncomputed either by the procedure itself or measurement and Clifford gates after the QROM.
Note that all other flag or ancilla qubits not mentioned above, such as the qubits |·⟩ χ s, are rezeroed. Overall, the total number of qubits and Toffoli cost for the reflection is at most and three less for the non-orthogonal cases.

L Qubit costings
We list the entire qubit cost below borrowing from [87, App. C] in parts where the subroutines involved stay the same.
1. The system register has size 3ηn p .
2. The control register for the phase estimation needs log πλ 2ϵ pha qubits.
3. The phase gradient state that is used for the phase rotations.
There are max(n R + 1, nχ, n B , n N L , n b , n Ψ ) bits used in the phasing; each of these n X 's is the number of bits of a phase gradient state used within the subroutine building the superposition on the corresponding register. 4. One qubit for the |T ⟩ state used catalytically for controlled Hadamards. 5. Two qubits for register X .
6. Four qubits for the four registers |·⟩ χ . 7. The 2n η + 5 qubits from the preparation of the superpositions over i and j; η qubits for each of these registers, 2 qubits for the rotation preparing the superpositions, 2 qubits that flag the success of the two preparations, and 1 qubit that flags whether i = j.
8. Eight qubits for register f . Five qubits in register f along with three additional qubits needed to flag the eligible basis states.
9. Five qubits in total used by the two QROMs to make the superposition in register f . Note five are used and rezeroed immediately, before four of them are reused to make the sgn(ω, ω ′ ) (and rezeroed again).
10. The states r and s are prepared in unary, and need n p qubits each, for a total of 2n p .
11. The register R itself uses 3n R qubits.
12. The two QROMs used for computing the register R each use and immediately clean τ +max t n t +1 ancilla qubits.
13. The register k ′ loc uses τ + max t n t qubits.
14. The registers k N L , k ′ N L , s N L use τ + max t n t + 4 + 1 qubits.
15. Four qubits for the uniform superposition over the n t nuclei of each type t in k ′ loc , k ′ N L . One for the rotation for each and one for the success flag.
16. Making the superposition on k N L , k ′ N L , s N L consumes β N L n N L dirty qubits and n N L + (τ + 4) clean qubits which are all returned to their initial state.
17. Flagging the eligible states in k ′ N L , k N L uses four qubits, one for the type, one for the σ index, and two to compute whether these two flags and the flag for n t superposition are successful.
19. The same three inequality tests use two qubits to determine σ's associated type for computing s N L .
20. Outputting the sign into s N L uses and rezeroes τ + 2 ancilla qubits.
21. The preparation of the momentum state superposition for V : (a) Storing ν requiring 3(n p + 1) qubits.
(c) n M V qubits for the equal superposition state.
(e) 2n p + 1 qubits used in signaling whether ν is outside B µ , including the flag qubit.
(f) The n M V bits required by QROM to compute one side of the inequality test, and the β V n M V dirty and n M V + 3n p clean ancilla qubits it uses and immediately returns to initial state to compute that side.
(g) The qubit j V resulting from the inequality test.
(h) Two qubits, one flagging success of all three of inequality test, no negative zero and ν not outside B µ , and the other an ancilla qubit used to produce the triply controlled Toffoli.
22. The preparation of the momentum state superposition for U loc : (a) Storing ν requires 3n p qubits.
(b) Storing I requires τ + max t n t qubits.
(c) One qubit for the s loc register.
(d) The n M loc bits required by QROM with the sign into s loc , and the β loc (n M loc + 1) dirty and (n M loc + 1) + (3n p + τ ) clean ancilla qubits it uses and immediately returns to initial state to compute that RHS.
24. The temporary ancillae used in the addition and subtraction of ν for the U loc , V . This is identical to the OAE case and we simply recall it to be thorough. The cost here is given by items (a) and (c), giving a total of 5n p + 1: (a) In implementing the SEL operations, we need to control a swap of a momentum register into an ancilla, which takes 3n p qubits for the output. The n η − 1 temporary ancillae for the unary iteration on the i or j register can be ignored because they are fewer than the other temporary ancillae used later.
(b) We use 2n p + 3 temporary qubits to implement the block encoding of T , where we copy components ω, ω ′ of the momentum into an ancilla, copy out two bits of these components of the momentum, then perform a controlled phase with those two qubits as control as well as the qubit flagging that T is to be performed.
(c) For the controlled addition or subtraction by ν in the SEL operations for U loc and V , we use n p bits to copy a component of ν into an ancilla, then there are another n p + 1 temporary qubits used in the addition, for a total of 2n p + 1 temporary qubits in this part. Even though the momentum ν registers are different for U loc , V , the same temporary ancillae can be used since the computations are not done in parallel for the two.
(d) There are also temporary qubits used in converting the momentum back and forth between signed and two's complement, but these are fewer than those used in the previous step.
25. There are 2 overflow qubits obtained every time we add or subtract a component of ν into a momentum. All these qubits must be kept, giving a total of 9.
26. There are also temporary qubits used in the arithmetic to implement e −iGν ·R I , e −iGq·R I , e iGp·R I . The arithmetic requires a maximum of 2(n R − 2) qubits. Note that the R I can be output by the QROM, the phase factor applied, and the R I erased, after (or before) the arithmetic for addition/subtraction of ν is performed. So we only need to take the maximum of the 5n R − 4 qubits used in this item and item 11, and the 5n p + 1 temporary qubits used in item 24.
27. Two qubits used to control between adding and subtracting ν in order to make SEL self-inverse. This is required to employ the techniques of [7] to avoid controlled application of the qubitization operator.
30. The one-hot-encoded superposition for |Ψ I,2,0 ⟩ requires 3 qubits, along with the QROM preparing that superposition requiring an additional 3 clean qubits which are immediately rezeroed. Both of these requirements become two instead of three when the lattice is non-orthogonal.
Remark L.1. In case we use the technique in Appendix G, we need to change the phase gradient qubit cost (item 3) to max(n R + 1, nχ, n AA , n B , n N L , n M loc , n b , n Ψ ). Furthermore, we need to add 2 each time we use this technique, which we do once for SEL N L when the lattice is orthogonal.
To sum up the above, we need to take into account which temporary ancillae can be reused, and take the maximum of the dirty qubits and clean qubits to get the total number of qubits used in the algorithm. First, we list the subroutines and the number of temporary ancillae they need. All ancillae below are clean and temporary unless mentioned otherwise. simultaneous Toffoli applications. To derive a more reasonable depth, we set the limits with which QROM can optimize its circuit depth. These limits are values we set for the parameters n dirty and n tof . The two set the constraints Eqs. (124) and (125) on the space-depth trade-off parameter β (127). Notice that the Toffoli cost has (only) the dirty qubit constraint (121). As a result it is the latter that needs to be determined first.
To do so, we first run a simulation to compute the clean qubit cost of the all-electron algorithm. Notice that the clean qubit cost is independent of the trade-off parameter β. Therefore, for each N , it is well-defined to set n dirty as the number of clean qubits that the all-electron algorithm needs for N many plane waves. This allows for a fair comparison as for a fixed number of plane waves, both algorithms have the same available number of dirty qubits to optimize the depth of their QROM computations.
Given the fact that the AE algorithm always consumes far more clean qubits, we would like our PPbased circuits with optimized depth to use as much as possible the dirty qubits available. Choosing a small value for n tof can prevent that and we found that setting n tof = 500 is approximately the smallest value that satisfies this requirement for all of our case studies.
As defined in Appendix E.3, there is another parameter κ involved in the QROM depth calculation that is subroutine-dependent (we have κ Ψ , κ loc , etc.). However we set a uniform κ = 1 value for all our estimations. Changing this value has shown insignificant or worsening impact on the depth.
The success probability threshold for the amplitude amplification involved in preparing ν 1 Gν |ν⟩ (51) has no impact on the qubit cost. However, it changes the Toffoli depth as λ V ∝ p −1 th . We set p th = 0.75 for all pseudopotential experiments. For the all-electron setting, given that the algorithm in [87] has slight variations according to the value of the initial probability of success (denoted by p ν in [87, Thm. 4]), we select a p th that gives the lowest Toffoli depth.
Lastly, we need to determine the errors listed in Appendix J while targeting the chemical accuracy ε = 0.043eV. Recall that we have to satisfy: As the cost formula πλ ε QPE (2PREP cost + SEL cost ) suggests, among all errors, the inverse of ε QPE contributes directly to the cost, as others only do so polylogarithmically. Therefore, we allocate the vast majority (99.5%) of the error to ε QPE , while distributing the rest equally among all other errors:

M.2 Clean and total qubit cost
As mentioned in the main text, the highest contribution to the qubit cost comes from the encoding of the plane waves, needing 3ηn p many clean qubits. The qubit cost in the Tables 8 to 12 is measured in two parts, clean and total. The clean cost is lower than the total cost for the pseudopotentials, but they are equal in the AE setting, as enforced by the definition of n dirty in Appendix M.1. Notice these numbers are reported for the n tof = 500 runs optimizing the depth of the circuit, and not for the optimized costs. Further, this is only relevant to the PP-based algorithm, and the clean and dirty costings when optimizing the Toffoli cost of the PP-based algorithm are even smaller.

M.3 Results
In all cases, taking into account the better accuracy, we can conclude that the PP-based is the better alternative. However, this difference is most clear in Table 8 where we choose the right number of plane waves N PP and N AE to hit chemical accuracy with both PP and AE calculations. There are multiple reasons why the depth and cost is competitive, even for the same number of plane waves: • The number of electrons is about half of the all-electron case.   Table 3 where the optimized quantity was Toffoli cost. Better numbers are indicated in bold. The total qubit cost in Table 3 includes dirty qubits. The clean qubit cost is lower than the total qubit cost for the pseudopotentials, but they are equal in the AE setting, as enforced by the definition of n dirty in this section.  Table 9: Resource estimation for dilithium iron silicate (Li 2 FeSiO 4 ). This material resource estimation was studied earlier in [21]. Better numbers are indicated in bold. The algorithm is using almost the entire dirty qubit capacity (n dirty ), as the total qubit count of both algorithms are almost equal.  Table 10: Resource estimation for lithium manganese oxyfluoride (Li 0.75 MnO 2 F). Better numbers are indicated in bold. Depth circuit optimization consumes almost the entire n dirty available for N = 10 4 , 10 5 , meaning it is determined by the dirty qubit constraint, instead of the Toffoli parallelization limits. For N = 10 3 , the optimization is fully achieved, i.e. n dirty , n tof are both large enough that we obtain the minimum possible depth for the optimized QROM circuit depths.   (Eq. (150)), aided by the fact that the number of unitaries involved in the LCU for U N L is smaller compared to other operators (∼ ηL).
• Even though the total PREP and SEL cost for qubitizing the pseudopotential Hamiltonian are larger than those in the AE case (2-4 times), it was important that our algorithm manages to control the qubitization cost, despite the more complicated expressions defining the pseudopotential matrix entries. This is accomplished thanks to our LCU and subroutine choices like QROM.

N List of notations
• p, q, ν -Plane wave indices as integer vectors. Normal font version is used for indexing other variables.
• R -Nuclear coordinates, also denoting the register R storing those coordinates • r loc , r i , α, C i , A i -HGH pseudopotential parameters that depend on the atom • p th -Chosen threshold for the amplitude amplification for success probability P ν,V amplified to P amp ν,V , corresponding to PREP V