Algorithmic Error Mitigation Scheme for Current Quantum Processors

We present a hardware agnostic error mitigation algorithm for near term quantum processors inspired by the classical Lanczos method. This technique can reduce the impact of different sources of noise at the sole cost of an increase in the number of measurements to be performed on the target quantum circuit, without additional experimental overhead. We demonstrate through numerical simulations and experiments on IBM Quantum hardware that the proposed scheme significantly increases the accuracy of cost functions evaluations within the framework of variational quantum algorithms, thus leading to improved ground-state calculations for quantum chemistry and physics problems beyond state-of-the-art results.

In this paper, we introduce a technique for quantum error mitigation inspired by the Lanczos algorithm for matrix diagonalization [34,35]. Our method provides a rigorous way to improve ground state estimates by enhancing the projection of any given solution to the true ground state of the problem Hamiltonian, therefore representing a direct extension in the context of quantum algorithms of the well known classical Krylov space methods. These could for example be employed to increase the quality of the variational ansatz whenever the latter is limited, e.g., by the shallowness of the corresponding quantum circuit or by hardware connectivity constraints [36]. In addition to that, here we go beyond a simple adaptation of classical protocols and we demonstrate a promising parallel application as a systematic quantum error mitigation procedure. The scheme remains fully compatible with the underlying variational principle and can be combined with other error mitigation techniques. Most importantly, our method does not require additional experimental resources (e.g., calibration of dedicated pulses) besides an increase in the number of observables to be measured on the quantum register. We demonstrate a practical implementation of our technique on a number of quantum physics and quantum chemistry models using state-of-the-art quantum hardware. The general applicability of the scheme is not restricted to such problems, but can be used for any quantum computing application that requires the calculation of ground state properties.
The results indicate that the proposed error mitigation scheme can systematically provide better accuracy for energy estimates without the need for a detailed description of the underlying noise model and without being subject to specific hardware constraints.
Lanczos Algorithm. Variational quantum algorithms are typically formulated in terms of a Hamiltonian H, whose smallest eigenvalue, and possibly ground eigenstate, represent the desired solution. In the variational approach, a quantum circuit with a set of tunable gate parameters θ is used to generate an optimal approximation |ψ( θ opt ) of the target quantum state in the variational subspace, using the expectation value of the Hamiltonian H θ = ψ( θ)|H|ψ( θ) as a cost function. Under the effect of gate, thermalization, and readout noise, represented here as a quantum operation E (see also Appendix A and B and Ref. [37]), the variational estimate of the target ground state energy can be described in terms of a density matrix ρ = E(|ψ( θ opt ) ) as E = Tr [ρH].
Following the classical Lanczos algorithm [34], both the ideal H θ and the noisy energy estimates can be improved by constructing a basis of the order-m Krylov subspace K (m) through, e.g., a power iteration scheme and then diagonalizing H in K (m) [34]. As an example, given an arbitrary pure variational state |ψ( θ) , any element in K (2) can be parametrized as |Ψ L (a 0 , a 1 , θ) = (a 0 − a 1 H)|Ψ( θ) / N (a 0 , a 1 ) where N (a 0 , a 1 ) = Ψ( θ)|(a 0 − a 1 H) 2 |Ψ( θ) . An improved upper bound for the true ground state energy E 0 can then be found by minimizing the expectation value of the Hamiltonian over the free parameters a 0 and a 1 . Similarly, for any density matrix ρ representing a noisy variational state, the improved energy estimate E L can be expressed as (see Appendix C for the general case) E L = min a 0 ,a 1 ∈R Tr ρH(a 0 − a 1 H) 2 Tr [ρ(a 0 − a 1 H) 2 ] . (1) Notice that this result coincides with the application of a single Euler step of size τ = a 1 /a 0 in imaginary time evolution [38][39][40]. The estimate in Eq. (1) can be shown to be reliable and at least as good as the initial estimate E even in the presence of noise, as it still obeys the variational principle E 0 ≤ E L while additionally satisfying E L ≤ E. Indeed, by defining, ∀a 0 , a 1 ∈ R, we can rewrite Eq. (1) as E L = min a 0 ,a 1 Tr [ρ L (a 0 , a 1 )H].
Since ρ L (a 0 , a 1 ) represents a valid density matrix, i.e., hermiticity, semi-positive-definitness and normalization are preserved under the application of Eq. (2), the variational bound E 0 ≤ E L must also hold. Moreover, the relation E L ≤ E follows from the observation that the expression minimized in Eq. (1) evaluates to E for a 0 = 1, a 1 = 0. With a similar proof, it can be shown that the same ordering E 0 ≤ E L ≤ E is respected also for higher order (m > 2) versions of the method.
Ifρ L is the density matrix achieving the minimum in Eq. (1), an improved estimate for any ground state observable O can be obtained as O L = Tr[ρ L O]. The accuracy of such estimate with respect to the true value is expected to grow, compared to the initial noisy result, as the overlap of ρ L with the true ground state |Ψ 0 will generally be larger than the one of the original ρ. A sufficient condition for this to hold, ifā 0 ,ā 1 are the optimal parameters from Eq. (1), is that Intuitively, the application of the Lanczos method modifies the original spectral weights ρ ii = i|ρ|i on the energy eigenstates {|i }, leading to the replacement where E i is the eigenvalue of H corresponding to eigenvector |i . The minimum condition of Eq. (1) then corresponds, similarly to the parameter shift rule in usual power iteration methods for matrix diagonalization, to a choice ofā 0 ,ā 1 reducing the spectral weight of excited states.
The number of independent operators that need to be measured at order m scales in the worst case as M 2m−1 , where M is the number of Pauli terms in the original Hamiltonian. Indeed, expectation values of the form H n for n = 0, . . . , m+1 appear e.g. in Eq. (1) and its generalization to m > 2. However, through grouping of Pauli operators into tensor-product basis sets [12] or by using partial state tomography and state reconstruction techniques [41][42][43] the actual overhead in the number of measurements required by the scheme can be significantly mitigated. With extensive numerical simulations, we also find that for many relevant quantum chemistry problems encoded on n q qubits the number of Pauli strings in H n remains below the theoretical predictions (O(M 3 ), i.e., O(n 12 q ), for n = 3) at least up to about n q = 20, see Appendix D for details. Moreover, simple analytical arguments predict a very favorable scaling for local Hamiltonians, like for instance the ones describing spin and lattice models. In the following, we will show practical examples for which even loworder (m = 2) results can provide estimates very close to noiseless results. Implementation of order m = 2 error mitigation. For the implementation of the method to lowest order, E L is expressed in terms of the three expectation values H k = Tr[ρH k ] with k = 1, 2, 3, which can all be efficiently measured in practice on a quantum processor. However, each of such estimates usually comes with an associated standard error σ H k arising from fluctuations of the readout due to the inherent quantum statistics of measurements as well as to hardware noise. As a matter of fact, when such statistical errors are propagated to the final outcome E L they can give rise to uncertainties σ E L which are larger than the standard errors affecting the original noisy estimate E = Tr [ρH]. The precision is particularly influenced by variations in time of the experimental noise levels and calibration accuracy, and can become critical at illconditioned values for which the minimization in Eq. (1) yields a small denominator. Besides increasing the precision of the original H k estimates e.g. via an increase in the size of the sta-tistical samples within the same experiment, several strategies can be adopted to enhance the stability of the proposed method. A direct improvement can for example be obtained by repeating the estimation of E L a number n repeat of times, thus obtaining a set of values {e L,i } and of associated errors {σ i } which are then combined in a weighted least square (wls) average E L,wls (see also Appendix E). Alternatives are obtained by replacing Eq. (1) with a cube root estimate E L,cube = 3 H 3 , which -under some assumptions -can also enhance the spectral weight corresponding to the exact ground state (see Appendix F), or by a direct empirical choice of the parameters a 0 and a 1 in a region with low standard error yielding E L,a 0 ,a 1 (see Appendix F and G for details). It is also worth mentioning that, in principle, Eq. (1) may become ill-conditioned whenever |Ψ( θ) is already very close to a pure eigenstate of H. However, in all our m = 2 experiments we always find that that H|Ψ and |Ψ can be assumed to be linearly independent. In the more general case, simple numerical schemes can be introduced to detect singularities in the Krylov subspace.
Applications. In all the examples presented below, we successfully use an order m = 2 method to correct for hardware noise and to enhance the quality of the ansätze generated by the Variational Quantum Eigensolver (VQE) algorithm [12,44]. We use the so called R y R z variational form (see Appendix H) [12] and optimize the rotation angles θ via the simultaneous perturbation stochastic approximation method (SPSA) [12,45]. To increase numerical stability and in view of the often rough optimization landscapes under study, in all cases we sample n init = 5 parameter sets θ from uniform distributions on [0, 2π] and use the one with the lowest energy expectation value as initial set for the optimization routine with a maximum of n steps SPSA iterations. The optimization procedure is repeated a number n VQE of times, choosing the { θ opt } with the lowest energy as the optimal VQE solution. Error mitigation is then applied to the optimal circuit by taking measurements of the operators H, H 2 and H 3 and employing the methods introduced above. In particular, we obtain the direct solution of Eq. (1) by analytic diagonalization of a 2 × 2 representation of the Hamiltonian constructed from estimates of H k , see Figure 1: Error mitigation for H 2 encoded on 4 qubits at equilibrium distance with n steps = 100, n VQE = 10, n shots = 8192 run on the ibmq_ourense quantum processor. The ground state is approximated with the hardware efficient R y R z variational form with one entangling layer. (a) Probability density functions (PDF) for the bare evaluation of H θopt on real hardware (green) and the corresponding mitigated results using Eq. (1)(blue), the cube root method (gray), the wls scheme (red) and the empirical selection of the a 0 and a 1 (orange). E 0 is the exact ground state energy, E nn denotes a numerical evaluation of H θopt using the optimal VQE ansatz, whose parameters were obtained on the real quantum processor. E CZ represents the value of the zero-noise extrapolation carried out on bare energy evaluations with a number of measurements comparable to the ones required for the Lanczos method. (b) Richardson extrapolation towards zero-noise energy estimations. Evaluations of H θopt are obtained by replacing every physical controlled phase (CZ) gate in the variational circuit with an odd (1, 3, 5, 7) number of copies, thus systematically increasing the impact of hardware errors. Markers at #CZ = 0 represent the outcomes of a linear extrapolation to zero noise. The green curve is constructed with the bare energy evaluations while the data in blue are obtained by applying to each point our proposed error mitigation technique defined in Eq. (1). Combined with Richardson extrapolation, the latter procedure can recover noiseless results (dotted line) almost perfectly. The color code and the meaning of the vertical lines are consistent with the definitions in panel (a).
Appendix C. The outcomes are compared to the original noisy estimate of the ground state energy E.
In the first example, we study the dissociation profile of the Hydrogen molecule H 2 [11,12] as a function of the internuclear distance d. We compute the Hamiltonian parameters using PySCF [46] in the STO-3G basis set [47] with the restricted Hartree-Fock method [48], leading to a four-qubit encoding. For comparison to previous state-of-the-art work [13], we also simulate H 2 on two qubits exploiting the spin conservation symmetry to reduce the size of the required quantum register [49].
In the second example, we increase the complexity of the simulation by focusing on the H 3 molecule, a prototypical example of a chemical system acquiring a Berry phase under a transformation between obtuse and acute triangular shapes [50][51][52]. Here, we tune the height h of an originally equilateral molecule at a constant basis side length of 1 Å: under this deformation, the spectrum exhibits a systematic change due to a level crossing at an arrangement of equidistant nuclei. The Hamiltonian is constructed similarly to the case of H 2 above, but using the unrestricted Hartree-Fock method.
As a last example, we consider the antiferromagnetic Heisenberg model for spin s = 1/2 on a tetrahedral cell where J > 0, σ α are the usual Pauli matrices and i, j denote nearest neighbors. We study the evolution of the ground state energy manifold as a function of the coupling strength of one of the bonds (J ) with respect to all the others. At the fully symmetric tetrahedron configuration, achieved for J = J , a level crossing occurs.
Results. In Fig. 1a we report, for the 4-qubit H 2 setup at the equilibrium bond length (0.74 Å), the distribution of outcomes for the evaluation of the ground state energy on the ibmq_ourense quantum processor, accessed via the IBM Quantum Experience using the Qiskit software stack [53]. The histograms are constructed by repeating 1335 times the experimental procedure, namely the evaluation of Hamiltonian expectation values for the optimal VQE parameters { θ opt } and the application of the algorithmic error mitigation  Bare results (green) are compared to error-mitigated ones (blue, n shots = 8 · 10 5 ). Black curves and dots denote the true ground state energies, and the dotted lines reference to the non-noisy energy evaluation obtained using the VQE optimal parameters resulting from (noisy) quantum hardware calculations. In all cases, the R y R z ansatz with one entangling layer is used.
method. For each data point, the noisy estimates of H k were constructed by using n shots = 8192 runs of the relevant quantum circuit. Compared to the exact result E 0 ≈ −1.137 Ha, obtained with numerical diagonalization, the bare measurement of E = H θopt on the optimized ansatz (green) yields the lowest accuracy, with an average of about −0.9145(6) Ha. A direct application of the order m = 2 error mitigation procedure, according to Eq. (1), significantly shifts the distribution towards the reference value: however, due to the normalization required for the Lanczosinspired algorithm, the distribution (blue) has a long tail towards small energies and large uncertainty. By increasing n shots one can directly obtain smaller uncertainties σ H k on the evaluations of H k θ within a single experiment. Despite being computationally expensive (typically σ H k ∝ 1/ √ n shots ), this approach leads in our case to an average mitigated result E L = −1.084(7) Ha with n shots = 10 7 . Less demanding alternatives are the application of the cube root method (grey), yielding the smallest improvement on the energy evaluation but the most stable results, the weighted least square over groups of n repeat = 5 repetitions (red), or the empirical selection of the a 0 and a 1 parameters (orange). The latter procedure is car- Figure 3: Ground state energies for the triangular H 3 molecule (a) and for the Heisenberg tetrahedron model (b, J corresponds to the orange bond). Bare results (green) are compared to error-mitigated ones (orange) obtained by fixing a 0 and a 1 to achieve a maximal estimated standard error of 0.04 Ha. Black curves and dots denote the true ground state energies, dotted lines are VQE results optimized numerically and evaluated in the absence of noise. The R y R z ansatz with 1 (3) entangling layers is used, with n steps = 2000, n VQE = 4 (10), n shots = 8 · 10 5 (8192), for H 3 (respectively the Heisenberg model).
ried out by choosing a 0 and a 1 in such a way to keep the average estimated measurement uncertainty below 0.02 Ha, which is far smaller than the investigated energy scales, while still providing an improvement over the bare results. In fact, by tuning the ratio a 0 /a 1 one can interpolate between the outcome of Eq. (1) and a more stable result, ultimately converging to the original bare value for a 0 /a 1 → ∞ (see also Appendix G). It is worth mentioning that a reasonable scale for the energy uncertainty, and therefore suitable values for a 0 and a 1 , can be identified through a systematic procedure involving only noisy energy estimates (see Appendix I) without prior knowledge of ideal results.
The performance of the proposed methods in presence of high noise values is assessed in Fig. 1b. Here, for the same configuration of Fig. 1a, we artificially increase the noise levels by replacing every controlled phase (CZ) operation in the entangling block of the variational circuit with an odd (3, 5, 7) number of copies [54, 55]. We observe a dominantly linear dependence on the number of CZ, with an increase of the uncertainties associated to the mitigated results E L for higher levels of noise. Fig. 1b also shows the application of the Richardson zero-noise extrapolation (ZNE) pro-cedure [14, 18] on both bare energy measurements and Lanczos-mitigated ones, reporting an almost complete cancellation of the hardware noise when both error mitigation strategies are combined.
To compare the Lanczos and the bare ZNE methods on an equal footing, we also take into account the overall number of measurements required in the two cases. In Fig. 1a, we report an additional histogram, E CZ , representing the distribution of error-mitigated energy reconstructions obtained with the zero-noise extrapolation (ZNE) procedure applied to the bare energies, using a similar amount of resources (i.e., measurements) as needed for the proposed Lanczosinspired procedure (i.e., E L results). For the latter, each of the data points contributing to the histogram is obtained by measuring 15 Pauli strings for the expectation value of the Hamiltonian H and 32 Pauli strings for H 2 and H 3 . In total, this amounts to 79 Pauli strings measured independently using 8192 shots each. To match as close as possible these overall requirements, we increase the statistics for the energy expectation reconstructions with the ZNE approach. More specifically, for each ZNE realization we measure four times H , one for each choice of #CZ = 1, 3, 5, 7. This consists of a total of 4 · 15 = 60 Pauli strings. For each Pauli string we double the number of shots used (2 · 8192) to estimate the expectation values, such that the overall number of measurements is actually in favour of the ZNE method. The whole procedure is repeated 50 times to construct the corresponding histogram. These results show that the use of more measurements for the ZNE method (keeping fixed the range of #CZ) only slightly reduces the uncertainties, without significantly improving the average energy expectation value (see also Fig. 14 in Appendix N). In fact, we consistently observe that the Lanczos method yields results closer to the true ground state energy, although with typically larger uncertainties. It is also worth stressing that we do not in general consider (as shown in Fig. 1b) the Lanczos and ZNE methods as alternative options, but rather as complementary approaches.
The results for the full dissociation profile of H 2 encoded on 2 and 4 qubits are shown in Fig. 2. The proposed error mitigation scheme enhances the quality of the energy estimates up to a factor of 5. In particular, for the 2-qubit calcula-tion this leads to an accuracy of about 10 mHa, which is competitive with state-of-the-art results obtained, e.g., on dedicated hardware [13,18] and comparable to the application of quantum subspace expansion (QSE) in Ref. [27]. We notice that for m = 2 the Lanczos algorithm requires a smaller subspace size than QSE at the cost of a larger number of measurements (i.e., more Pauli strings). It is also worth pointing out that in some cases our Lanczos-inspired error mitigation provides corrections going beyond a mere noise-cancellation effect in energy evaluations, and which can be interpreted as an improvement on the ansatz itself. This can be seen by comparing the mitigated results with what would be obtained form an ideal noiseless measurement E nn , simulated numerically starting from the variational ansatz learned on the real hardware (dashed lines). Additional quantitative details on both the improvement over noiseless energy estimates, due to the very nature of Krylov-subpace methods, and the fulfillment of Eq. (3), are provided in the Appendices J and K) Finally, in Fig. 3 we report the outcomes for the H 3 dissociation curve and for the Heisenberg tetrahedron model mapped on 6 and 4 qubits, respectively. Due to the significant increase in complexity and size of the required hardware simulations, for these models the optimal VQE parameters θ opt were computed via numerical simulations, while the final energy evaluations and the corresponding mitigated results were obtained by implementing the optimal circuit on the ibmq_almaden (H 3 ) and ibmq_ourense (tetrahedron) quantum processors. At difference with Fig. 2, here we made use of the empirical selection method for a 0 and a 1 in order to reduce the standard error on the mitigated points E L,a 0 ,a 1 (see Appendices for additional data). In both cases, the Lanczos-inspired scheme significantly helps in recovering the most relevant ground state features, i.e., the extremal points of the energy surfaces and level crossings.
Summary. We introduced a hardware agnostic technique to mitigate errors on near term quantum processors and we described its direct application to quantum chemistry and quantum physics problems. The proposed approach significantly improves the calculation of ground state properties by effectively enhancing the spectral weight of noisy variational ansatzes on the respec-tive optimal solutions while ensuring the physical consistency of the outcomes. This method does not require direct control on low-level hardware operations and can in principle be combined with other available noise mitigation protocols. We reported experimental results for energy evaluations closely approaching the exact reference values and demonstrated the robustness of the approach in several test cases with highly non-trivial ground state properties. The computational cost of this error mitigation procedure is ultimately associated to an increase in the number of measurements to be performed and can potentially be reduced by the introduction of more elaborate schemes for the grouping of Pauli terms appearing in the target Hamiltonian. In view of these arguments, we conclude that the proposed method has the potential to become a standard error mitigation scheme for any type of noisy quantum computing platform. Acknowledgements

A Kraus Operator Expression for Errors
The validity of the Lanczos method requires a density matrix description of the state of the quantum computer with noise. This is guaranteed as long as the whole evolution of the state of the quantum computer can be described with Kraus operators. Indeed, in the basic noise model of qiskit, the noise of the quantum computer is divided into gate errors, thermalization errors, and readout errors. While the Kraus operator expression for the gate and thermalization errors are well known and given below, the Kraus operator expression for the readout error is derived in section B. With each application of a gate an error occurs with a probability p depol. error . These errors are modeled via the depolarization channel, described for one qubit as returning the density matrix ρ n,depol. including the gate error given the state of the quantum computer after the application of the respective gate ρ 0 [37].
The thermalization error occurring during a gate duration of t is modelled using two different error sources. These are the phase flip and the generalized amplitude damping channel with respective Kraus operator expressions where τ 1 is a decay time to be fitted, and where ω is the energy difference between the two qubit levels, T the temperature and τ 2 a decay time.
While the phase flip channel describes a decay of the off-diagonal elements of a one-qubit density matrix, with the generalized amplitude damping channel a decay of the diagonal elements towards the Fermi-Dirac distribution is achieved. For each applied gate these Kraus expressions are applied consecutively on the involved qubits using the noise calibration data of an IBM quantum device to find τ 1 , τ 2 , p depol. error , t, T and ω.

B Kraus operator expression for readout errors
In this section we explain the simulation of readout errors in qiskit and derive a Kraus operator expression so as to incorporate the readout errors by changing the density matrix.
We consider an operator O in the Pauli basis with an expectation value of where s j ∈ {−1, 1} is the measurement outcome of σ i j ∈ {σ 0 = 1, σ x , σ y , σ z } with a measurement probability of p i (s) for the measurement result s ∈ {−1, 1} ⊗N for the Pauli string i. Now, this probability changes with the readout error p i r (s|s ) for a given Pauli string i, i.e., it encodes the probability to measure s instead of s given that the measurement outcomes of the single qubits were s = {s 0 , . . .}. Now, the measurement result with readout error (r.e.) can be written as In qiskit, p i r (s|s ) is described using the probability for a single qubit flip p i j r,j (s j |s j ) on qubit j, which is an approximation making the probability that a readout error occurs on one qubit independent of the measurement result of all other qubits. Furthermore, we assume that p This allows us to further simplify To emphasize the independence of p i j r,j (−s j |s j ) on s j by assumption we write p i j r,j ≡ p i j r,j (−s j |s j ). This allows us to write To be able to write down Kraus operators, we further assume that p x r,j = p y r,j = p z r,j which we refer to by p r,j in the following. This assumption is reasonable, as the only difference in the measurement of σ x , σ y , σ z is one single qubit rotation, which has a small error, which is neglected here. Furthermore, we have p 1 r,j = 0 as the identity does not need to be measured. Then, we can write Using this expression, it becomes evident, that j is a Kraus operator. This simplifies the error calculation to with = 0 • 1 · · · and j acts trivially on all sites except for site j. to describe the readout error by changing the density matrix.

C Relation to the Lanczos Algorithm and Generalization
In the Lanczos algorithm, the ground state is approximated using the ground state of the Hamiltonian expressed in an order m Krylov space K (m) . The matrix expression for the Hamiltonian is real and symmetric, as all matrix entries are real functions of { Ψ|H l |Ψ ∈ R} 2m−1 l=0 . Thus, every eigenstate can be chosen to be real. The ground state is the linear combination with real coefficient with minimal eigenvalues of all elements of the Krylov space. Since the minimal eigenvalue is the lowest possible measurement outcome for all states in K (2) the minimization in Eq. (1) cannot yield a lower energy estimate. Furthermore, the linear combination of states parametrized with a 0 and a 1 spans K (2) so that the ground state lies within the space parametrized with a 0 and a 1 . Hence, the minimization in Eq. (1) yields the ground state of the Hamiltonian expressed in K (2) and Eq. (1) evaluates to its eigenvalue.
This holds also for higher orders m for the generalized expression of Eq. (1) that reads for which E L,k,n,m > E 0 holds with odd k and even n. This follows by generalization of the statements of the main text, this section and Appendix F. For k = 1, n = 2 expression (21) is equivalent to the Lanczos method for the smallest eigenvalue in an order-m Krylov space, as can be shown with the same argument as for K (2) . Furthermore, E L,k=1,n,m ≤ E, as E is the special case of a 0 = 1, a i>0 = 0.
The direct relationship with the classical Lanczos algorithm immediately suggests a practical way to solve Eq. (1). For the 2-dimensional case (m = 2) presented in this work, the minimization is in fact equivalent to the diagonalization of a 2 × 2 matrix, which can be done analytically. The matrix is constructed by expressing all matrix elements in terms of the measurement results for H θ , H 2 θ and H 3 θ . The matrix elements can be obtained by constructing In this basis, the matrix expression for the Hamiltonian reads In higher orders (i.e., m > 2), the minimization can in principle be performed in a similar way by constructing and diagonalizing, in a classical post-processing stage, the corresponding matrix H K (m) . Alternatively, one could also achieve the desired result with an iterative minimization procedure whose cost function, parametrized by a 0 , a 1 , is defined in Eq. (1) of the main text. A similar approach is also possible at higher orders, when a suitable parametrization to navigate the appropriate Krylov subspace is considered.
Concerning the regularization techniques proposed in the main text and discussed in more details in some Appendix sections below, they can in principle be generalized straightforwardly to higher orders m. This is for instance true for the direct application of the Lanczos method (with the generalized version of Eq. (1) presented in Eq. (21) above) as well as for the closely related weighted least squares averaged version. The same Eq. (21) shows how the empirical tuning of the parameters a i could scale for higher orders, with m − 1 independent parameters to optimize for an order m mitigation. Although it is true that such approach could become impractical for large values of m, a viable solution could be to apply an iterative minimization procedure of a suitably parametrized cost function, which can be constructed from the noisy estimates of powers of the Hamiltonian. This could possibly be used to explore the neighborhood of the formal solution obtained with, e.g., the numerical diagonalization of the reduced Hamiltonian in the Krylov subspace. Finally, the cube root version can be replaced with similar higher order constructions of the form k H k θ , for odd k.  = 18), where the 18-qubit molecules are simulated using a two-qubit reduction scheme for implementation purposes. The Pauli terms for the Hamiltonian H are calculated using PySCF and the H 2 equilibrium interatomic distance of 0.75 Å and 1.6 Å for LiH. The geometry of the molecules was chosen to be one dimensional except NH 3 , which was simulated on a tetrahedron. The results for similar number of qubits n q tend to agree for different molecules.

D Scaling
In general terms and in the absence of more specific assumptions, when two-body interactions are present H 2 and H 3 are, respectively, 8th and 12th order operators, with correspondingly large numbers of Pauli strings to be measured in order to reconstruct average values. However, as we argue below through an extensive collection of numerical investigations, the reality for many cases of practical interest shows effective results which are much more favorable.
Focusing on quantum chemistry problems, in Fig. 4, we report the number of Pauli terms corresponding to the Hamiltonian operators H n (n = 1, 2, 3) for a collection of 11 different molecules which require up to n q = 18 encoding qubits. These were obtained numerically without resorting to grouping techniques, which might provide a further reduction of approximately an order of magnitude in the number of independent measurement settings required for the reconstruction of average values. We observe, in particular, that the number of terms in H seems to saturate around a power law of the form n 3 q , making it effectively a 3rd-order operator. This behaviour is typical whenever conservation laws are present, which provide implicit constraints to the number of available physical interactions. Regarding the terms H 2 and H 3 , although the trend is actually still far from achieving a power law behavior with a well established exponent, we can already explicitly observe that the actual number of Pauli strings remains far from the predicted values O(n 8 q ) for H 2 , respectively O(n 12 q ) for H 3 , and even below the O(n 6 q ) (respectively O(n 9 q )) that could be inferred by the 3rd order nature of the H operators employed. We interpret such results by mentioning two concurrent effects: on one hand, the commutation algebra of the Pauli strings effectively reduces the number of terms appearing in successive products of H. On the other hand, the total number of independent Pauli strings that can be constructed with n q qubits equals 4 nq , and can be quickly saturated for small and medium sized systems. While we cannot exclude that in the large n q limit an actual 8th or 12th order scaling is followed, the onset of such regime appears to be still beyond many of the examples of current practical interest in near term applications. The latter represent, in fact, some ideal candidates for our proposed error mitigation strategy.
Much more favourable scaling properties can also be predicted, independently of the size of the problem, for local Hamiltonians: There, the number of terms scales as n q , so that the Lanczos method scales in the worst case like n 3 q . This class of Hamiltonians is represented in the main text by the Heisenberg tetrahedron. Local Hamiltonians constitute by themselves an interesting field of research with many connections, e.g., to the theory of magnetism (see also a recent work by Vallury et al. [36], where the authors analyse an application of Krylov methods for local Hamiltonians), and include among others many examples of spin and lattice systems.
Finally, it is worth noticing that all the analysis conducted in the present work remains valid whenever a different observable O is used instead of the Hamiltonian. In such case, only OHO has to be measured, while the variational principle is still fulfilled and the ground state energy estimate will not be worse than the noisy prediction. In fact, H can actually be used to guide the choice of the operator O, which can be as small as the quantum computer is capable of dealing with. However, an analysis of this strategy requires a more detailed study which is left for future work.

E H 2 Saturation of wls
In this section we show the results of taking the weighted least square average (wls) over n repeat measurement outcomes. In this case the error mitigated energy estimate can be obtained via the formula: with standard error σ −2 E L ,wls = i σ −2 i . Notice that this solution fulfills the generalized Gauss-Markov Theorem [56], and therefore E L,wls has minimal variance. The results are shown in Fig. 5, where we plotted the dependency of the mean of the wls result and its standard deviation against varying n repeat . As expected, for increasing n repeat the standard deviation of the wls decreases faster than 1/ √ n repeat , which is validating the generalized Gauss-Markov Theorem. Furthermore, we can observe an increase of the mean of the wls result with increasing n repeat . The reason is the small uncertainty for results with larger energy, which therefore have a larger weight. The higher n repeat , the more unlikely are wls results which only contain low energy estimates with larger measurement uncertainty. Hence, the mean of the wls result increases. The mean for n repeat = 1 is below the true ground state energy showing that this value might not be useful and reflecting the necessity of considering the estimated measurement uncertainties.

F Variational Principle for the Cube Root Method
For the cube root method the variational principle holds as well since for any odd k and k √ . is monotone restricting the image of k √ . to the real axis.
The effective enhancement of the ground state spectral weight can be understood by comparing the expectation value where α i represents the probability of observing the i-th energy eigenstate on a given variational (noisy) ground state estimate. If we assume that the dominant contributions come from the true ground state (i = 0) and other eigenstates for which |E i | < |E 0 |, i.e., we have that the weight associated to the ground state energy in the expression of Eq. (25) is enhanced by a factor E k−1 , which is larger for the ground state, with respect to the quantity in Eq. (26). This assumption is reasonable, for example, in those cases where the ground state energy and the energies of the lowest excited states are negative. Although the cube root method is not a priori guaranteed to be helpful in all cases, the fact that it complies with the variational principle makes it straightforward to discard the result k H k θ whenever this is larger than the unmitigated value H θ . In such case, one could infer that the (noisy) variational quantum state features a significant occupation of high energy eigenstates.
G Dependence of the Lanczos Energy Estimate on a 0 , a 1 In Fig. 6 and 7 the dependence of the energy estimate on the choice of parameters a 0 , a 1 is shown. For H 3 even for large number of measurements we could observe an instability of the method. This is related to the fact, that the measurements have been taken over a long time period of six days during which the noise fluctuates significantly. This instability can be detected considering the estimates for the measurement uncertainty. This is shown in Fig. 6, where the measurement uncertainty also increases for too small energy estimates. In all cases considered, the ground state energy was found to be within a likely number of standard deviations of the measurement result so that the measurement uncertainty reflects the accuracy of the method sufficiently well.
E L (a 0 /a 1 ) E L (a 0 /a 1 ) ± σ Figure 6: The dependence of the mean of 100 energy estimates using formula (1) for fixed a 0 /a 1 with reference to E 0 and the mean of the estimated standard errors of the 100 results are shown for an height of 0.8 Å for H 3 . In general, we observe that for larger a 0 /a 1 the standard deviation decreases, while the energy estimate increases. For a 0 /a 1 ≈ E 0 the standard deviation is the largest as most of the weight of the statevector is projected out, the norm is close to 0. In conclusion, when fixing the parameter a 0 /a 1 a value needs to be chosen which provides a good compromise between increased ground state energy estimates and standard errors. For a 0 /a 1 → ∞ the method converges to the bare measurement of H θopt . In order to reduce the standard error by a factor of 1/ √ n, n estimates can be averaged over.   (1) for fixed a 0 /a 1 with reference to E 0 and mean of the estimated standard errors of the 100 results are shown for an internuclear distance of 0.74 Å for H 2 on four qubits. In general, we observe that for larger a 0 /a 1 the standard deviation decreases, while the energy estimate increases. For a 0 /a 1 ≈ E 0 the standard deviation is the largest as most of the weight of the statevector is projected out, the norm is close to 0. In conclusion, when fixing the parameter a 0 /a 1 a value needs to be chosen which provides a good compromise between increased ground state energy estimates and standard errors. For a 0 /a 1 → ∞ the method converges to the bare measurement of H θopt . In order to reduce the standard error by a factor of 1/ √ n, n estimates can be averaged over.

H Variational Form
In this section the used variational form, namely the R y R z ansatz, is shown in Fig. 8. It consists of a layer of initial rotations followed by entanglement layers. One such entanglement layer is shown below denoted with the box. This entanglement layer is repeated as often as required to obtain the highest achievable accuracy.
I Empirical selection of the parameters a 0 , a 1 In this chapter we want to give a quantitative description for how the bound of σ max = 40 mHa on the energy estimates E L,h,a 0 /a 1 , where h is the varied system parameter, is calculated. Such value was chosen by comparing the results for the whole dissociation profiles for different values of a maximal measurement uncertainty σ max , essentially identifying reasonable energy scales from typical energy differences in the system obtained from noisy estimates. This corresponds to a modification of Eq. (1) in the main text to E L,a 0 ,a 1 = min a 0 ,a 1 |C Tr(ρ L (a 0 , a 1 )H) with the additional constraint C : σ E L,a 0 ,a 1 ≤ σ max , where σ E L,a 0 ,a 1 is the estimated measurement uncertainty on E L,a 0 ,a 1 . Note that reasonable values of σ max can be found only in the range [σ E , σ E L,h ]. To quantify the choice of σ max , we evaluate the sum of the energy level distances for consecutive ground state energy estimates which we expect to be the lowest for a more pronounced energy profile (i.e. by requiring adjacent energy estimates to be close and therefore reducing noise induced fluctuations, this strategy highlights peaks or discontinuities in the profile, if they are present, without actually making any assumptions on the existence of such features). This is shown in Fig. 9 (a) for H 3 and (b) for H 2 , for the tetrahedron σ E L,J /J < σ max , for any σ max |E L,J /J,a 0 /a 1 − E L,J /J+δ,a 0 /a 1 | so that ∆ J /J E is constant. We denote with σ E L,J /J the measurment uncertainty for the energy estimate of Eq. (1) in the main text. We stress that choosing a σ max is always legitimate to do, as the results are bounded from below by the true values via the variational principle and from above by the non-mitigated, noisy results. Hence, by tuning the results as a whole an improvement within these boundaries is expected.
In summary, the proposed error mitigation procedure is particularly well suited to refine energy profiles and to bring to light extremal features in the spectrum. Towards this goal, we suggest to choose a common bound on the standard deviation in such a way that the calculated energy differences are the smallest. This does not involve any requirement regarding knowledge of the solution a priori.

J Groundstate overlap improvement
In the main text a sufficient condition for the improvement of the overlap between the true ground state and the ground state estimate is presented. It requires the fulfillment of Eq. (3)  (b) H 2 Figure 9: The dependence of the summed level distance of consecutive parameters is shown in (a) for H 3 and (b) for H 2 for different chosen cutoffs σ max , which is used for all heights h (distances d for H 2 ). Given the cutoff and the system parameter h, a 0 /a 1 is chosen as minimizing the ground state energy estimate yielding E L,h (Eq. (1) main text) if σ E L,h < σ max or as the smallest value a 0 /a 1 so that σ E L ,h,a0/a1 < σ max . We observe a saturation for large σ max , since all estimated errors are below the cutoff so that it is not applied. For small cutoffs, the ∆ h E increases up to the non-error mitigated value. In orange the estimated noise error on E L,h,a0/a1 averaged over h is shown. Its saturation for large cut offs σ max reflects the circumstance that the chosen cutoff is larger than the estimated standard error σ E L . and a 0 /a 1 > E L for the optimal a 0 /a 1 for a 1 = 0. In Tab. 1 we report the results for the evaluation of such condition for all the data points presented in the main text. We recall that whenever the presented value is larger than one, an improvement of the overlap between the ground state estimate and the true ground state is expected. For each of the values provided, we have that a 0 /a 1 > E L and, as reported in the Table, we evaluate the left hand side of Eq. (3) to be larger than 1 for all cases. Eq. (3) is therefore fulfilled and the overlap between the ground state and the estimate is improved.

K Noiseless Results Improvement
In this section, we provide insights about the improvement of the Lanczos estimates over noiseless energy estimates. This effect is a direct consequence of the application of the Lanczos diagonalization method within Krylov subspaces. Indeed, similarly to a classical diagonalization problem, any noiseless simulation of a variational ansatz essentially corresponds to an approximation of the eigenvector of the target Hamiltonian associated to the minimum eigenvalue. It is therefore not surprising that the quality of such approximation can benefit from the application of the Lanczos method, as this is explicitly designed to increase the spectral overlap with the exact solution. We notice, for example, that this appears to be the main motivation behind some recent work by Vallury et al. [36]. As we discuss in the main text, the aim of our work is, besides remarking the usefulness of this effect for quantum variational algorithms, to demonstrate its very promising parallel application as a systematic error mitigation procedure. For completeness, we report in Fig. 10 the results obtained from numerical noiseless simulations of the VQE algorithm for the four example models presented in the main text. These are obtained with Qiskit statevector simulator, which works out the linear algebraic calculations without the effect of sampling or hardware noise. It shows that the application of the method described in this work provides an improvement towards the true ground state values, which is purely due to the effective increased spectral overlap provided by the Lanczos procedure.

L Convergence Rate
In principle, due to the delicate scaling properties of the method already at the lowest m = 2 order, we do not expect that higher orders will be used extensively. Hence, we do not provide a detailed numerical study for the convergence of the method as a function of m. Instead we can make use of the Kaniel-Paige convergence theory for density matrices. This yields a convergence |E 0 − E L | < constant · r n , where n is the order of the method and r is a convergence rate.

M Effect of shot noise and error propagation
In this section, we provide additional insights about the results for E L reported in Fig. 1 of the main text, where some data points placed below the true ground state energy appear to contradict the  Figure 11: In this figure we show additional properties of the numerical data, which is also presented in Fig. 1 of the main text. For each data point we show on the x axis the energy estimate and, on the y axis, the corresponding deviation from the true ground state value in terms of the measurement uncertainty. In the lower panel the histogram of the data in the upper panel is shown, which is the same as presented in the main text in Fig. 1. Like in the figure in the main text, the solid black line marks E 0 , the dashed black line the noiseless estimate E nn .
validity of the variational principle. We support our discussion with the analysis presented in Fig. 11, where more detailed information is provided particularly regarding the uncertainty associated to the mitigated energy estimates. As it can be seen, the long tails of the energy distribution should be interpreted as a consequence of shot noise and error propagation effects. Indeed, as we highlighted in the main text and in Appendices A and B, all other possible sources of errors act as quantum channels, thus always yielding valid density matrices. However, statistical fluctuations due to the finite size of the samples taken from a quantum hardware can for example lead to apparent violations of the variational bounds whenever the minimization procedure in Eq. (1) of the main text yields an ill-conditioned denominator. Nevertheless, as we report in Fig. 11, such extreme cases are always clearly marked by large intrinsic errors accompanying the energy estimate and can thus be clearly identified and marked as unphysical, in what essentially represents a self-consistence mechanism for the proposed method.

N Additional Figures
This section contains additional results for the experiments presented in the main text. In Fig. 12 the result for the Lanczos method are shown for the parameters a 0 , a 1 minimizing the energy expectation value for the dataset also presented in Fig. 3 in the main text. The results for the tetrahedron coincide with the ones given in the main text as the chosen parameters a 0 , a 1 minimizing the energy yield an accuracy below σ/J = 0.03. In Fig. 13 the results for the different Lanczos methods for the VQE results of H 2 generated on four qubits are depicted. Also in this case the results for the parameters a 0 , a 1 achieving the exact minimum of Eq. (1) (blue) and the ones yielding a maximum standard error of 0.03 Ha (orange) almost coincide when the experimental estimates of H, H 2 , H 3 are obtained with n shots = 8 · 10 5 . Finally, Fig. 14 provides additional results for the comparison of the zero-noise extrapolation (ZNE) and the Lanczos error mitigation methods. In particular, we present the effect of increasing the total number of measurements in the ZNE approach.