Contextual Subspace Variational Quantum Eigensolver

We describe the contextual subspace variational quantum eigensolver (CS-VQE), a hybrid quantum-classical algorithm for approximating the ground state energy of a Hamiltonian. The approximation to the ground state energy is obtained as the sum of two contributions. The first contribution comes from a noncontextual approximation to the Hamiltonian, and is computed classically. The second contribution is obtained by using the variational quantum eigensolver (VQE) technique to compute a contextual correction on a quantum processor.


William M. Kirby: william.kirby@tufts.edu

Introduction
The variational quantum eigensolver (VQE) is the leading algorithm for quantum simulation on noisy intermediate-scale quantum (NISQ) computers, due to the limited resources it requires in both qubit count and coherence time [1][2][3][4][5][6][7][8][9][10][11][12][13]. VQE is a hybrid quantum-classical algorithm in which the expectation value of the Hamiltonian, or other observable, is computed for an ansatz state generated by a parameterized quantum circuit. Optimization of the ansatz parameters is performed iteratively using an optimization algorithm running on a classical computer. VQE algorithms require a large number of measurements to be performed, and give approximate results due to limitations of the ansatz that can be prepared and noise on the quantum device. As a result, the largest experiments to date either do not reach chemical accuracy [6], do not include all Hamiltonian terms [10], or simulate restricted models such as Hartree-Fock [13].
We address these limitations of VQE by providing an approximate simulation method for the full Hamiltonian that can be adjusted to use any amount of available quantum resources. We show that in many cases our method reduces the resources required to reach chemical accuracy. The method is based on the concept of a noncontextual Hamiltonian, in which definite values can be assigned to all Pauli terms simultaneously without contradiction [14]. In [15], we gave a classical algorithm for computing the ground state energies of noncontextual Hamiltonians based on a quasi-quantized model [16,17]. This raises the question of whether a truly hybrid algorithm can be developed for simulating general Hamiltonians, in which the noncontextual parts of the Hamiltonians are computed classically and contextual quantum corrections are computed using VQE. The method described in this paper is such an algorithm.
The contextual quantum correction is obtained by performing VQE restricted to quantum states that are consistent with the noncontextual ground state. We refer to the space of such quantum states as the contextual subspace. The contextual subspace represents the degrees of freedom that remain after the noncontextual degrees of freedom have been fixed. The resulting quantum computations only involve Hamiltonian terms in the complement of the noncontextual Hamiltonian. We also show how to adjust the noncontextual part of the Hamiltonian, in order to move more of the computation onto the quantum computer, while preserving the structure of the quasi-quantized model. The technique for accomplishing this is related to "subspace-search VQE" [18], in which excited energies are found by restricting the search space to be orthogonal to the (previously approximated) ground state. In our case, we are not looking for excited states, but we are implementing VQE in a restricted search space, and part of the technique for achieving this is similar to that in [18] (see Section 2 for details of the technique).
As an example, suppose we want to apply VQE to a Hamiltonian on n qubits, but the available quantum processor has only q qubits. CS-VQE permits us to adjust the noncontextual approximation method so that the associated quantum correction uses exactly q qubits. Increasing the number of qubits used on the quantum processor monotonically improves the quality of the overall approximation, interpolating between the noncontextual approximation with no quantum correction and full VQE. Thus, we can tune the quantum part to fit the available quantum resources, with the classical method making up the difference.
CS-VQE is the first hybrid quantum-classical algorithm of its kind, where a nonclassicality criterion (in our case, contextuality) is used to isolate the intrinsically quantum part of a quantum algorithm, and the classical remainder of the algorithm is simulated classically. Our prior works [14,15] defined contextual and noncontextual Hamiltonians, and gave a classical model for noncontextual Hamiltonians, respectively, so the current algorithm is distinct because it is applicable to arbitrary Hamiltonians and because it has a quantum component as well as a classical component.
In the remainder of the introduction, we review the necessary information from [15] about classical simulation of noncontextual Hamiltonians. In Section 2, we describe the quantum correction procedure. In Section 3, we describe how to implement CS-VQE, and show the results of simulating CS-VQE on Hamiltonians for small molecules. Finally, in Section 4, we summarize our results and discuss their implications for NISQ computing.
Let $S$ be the set of Pauli terms in a general Hamiltonian $H$. We divide $S$ into a noncontextual subset $S_{nc}$ and its complement $S_c$. This induces a decomposition of $H$ into a noncontextual part $H_{nc}$ whose Pauli terms are $S_{nc}$, and $H_c$ whose Pauli terms are $S_c$ [15]:
$$H = H_{nc} + H_c. \tag{1}$$
We also require that $S_{nc}$ be closed under inference within $S$:
Definition 1. $S_{nc}$ is closed under inference within $S$ if any operators in $S$ whose values can be inferred from the values of operators in $S_{nc}$ must be included in $S_{nc}$ [15].
Closure under inference is reviewed in detail in Appendix C. The decomposition (1) is the basis of the CS-VQE method. We will obtain an efficient classical description of the eigenspaces of $H_{nc}$, and use this description together with $H_c$ to compute, on a quantum processor, a correction to the noncontextual approximation to the ground-state energy.
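As an illustration of what it means for a set of Pauli terms to be noncontextual, the following Python sketch applies the criterion of [14]: after setting aside the operators that commute with the whole set, commutation must be an equivalence relation on the remainder. The string representation of Pauli operators and the function names `commute` and `clique_partition` are assumptions made for this sketch, not part of the original implementation.

```python
from itertools import combinations

def commute(p, q):
    """Return True if Pauli strings p and q (e.g. "ZXI") commute."""
    # Two Pauli strings commute iff they clash (differ, both non-identity)
    # on an even number of qubits.
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

def clique_partition(terms):
    """Split distinct Pauli strings into (center, cliques, is_noncontextual)."""
    terms = list(terms)
    # Center: operators that commute with every operator in the set.
    center = [p for p in terms if all(commute(p, q) for q in terms)]
    rest = [p for p in terms if p not in center]
    # Group the remainder by commutation with a clique representative.
    cliques = []
    for p in rest:
        for clique in cliques:
            if commute(p, clique[0]):
                clique.append(p)
                break
        else:
            cliques.append([p])
    # Noncontextual iff operators commute exactly when they share a clique.
    label = {p: i for i, clique in enumerate(cliques) for p in clique}
    ok = all(commute(p, q) == (label[p] == label[q])
             for p, q in combinations(rest, 2))
    return center, cliques, ok

# A three-qubit set with one fully commuting operator and five cliques.
example = ["ZII", "IXI", "IYI", "IZX", "IZY", "IZZ",
           "ZXI", "ZYI", "ZZX", "ZZY", "ZZZ"]
center, cliques, ok = clique_partition(example)

# A contextual set: commutation fails to be transitive.
_, _, ok_contextual = clique_partition(["XI", "IX", "ZI", "IZ"])
```

The pairwise check at the end is what distinguishes a noncontextual set from a contextual one: grouping by a representative always produces some partition, but only in the noncontextual case does that partition agree with every commutation relation.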
Because all terms in $H_{nc}$ can simultaneously be assigned definite values without contradiction, we can introduce a phase space description of its eigenspaces [14,36]. The phase space points are the possible joint value assignments to a set of observables derived from $S_{nc}$, which we describe below. The eigenstates of $H_{nc}$ are probability distributions over this phase space. This is a quasi-quantized model: a classical phase-space model with an uncertainty relation imposed upon the allowed probability distributions (sometimes called epistemic states) on the phase space [16,17]. We refer the reader to [15] and Appendix C for further general points about quasi-quantized models.
We now describe the states of the model, which we call noncontextual states. We first identify a set of observables that define the phase-space points in the model:
$$G \cup \{A_1, A_2, \ldots, A_N\}. \tag{2}$$
Here, as in [15], $Z$ denotes the set of operators in $S_{nc}$ that commute with all of $S_{nc}$, and $C_1, \ldots, C_N$ are the cliques into which the remaining operators are partitioned: operators in the same clique commute, and operators in different cliques anticommute. Each $A_i$ is one element of the corresponding clique $C_i$, so the $A_i$ pairwise anticommute. $G$ is an independent generating set for the Abelian group $\overline{Z}$, which includes $Z$ as well as all products of pairs of operators in the same clique. The phase space points are all assignments of values $\pm 1$ to the observables in the set (2) [15].
The noncontextual states are probability distributions over the phase space points generated by (2). Probability distributions corresponding to valid quantum states must obey an uncertainty relation [16,17]. A sufficient condition is that the commuting generators $G_j \in G$ take definite values, and that the expectation values of the $A_i$ form a unit vector [15]. This means that each noncontextual state is defined by parameters $(\vec{q}, \vec{r})$ such that
$$\langle G_j \rangle = q_j = \pm 1, \qquad \langle A_i \rangle = r_i, \qquad |\vec{r}| = 1. \tag{3}$$
In [15], we showed how these expectation values for the set (2) induce expectation values for all terms $S_{nc}$ in the noncontextual part $H_{nc}$ of the Hamiltonian; consequently, a noncontextual state induces an expectation value for $H_{nc}$. We also proved that all expectation values for $H_{nc}$ can be generated in this way. Minimizing this expectation value by varying the noncontextual state $(\vec{q}, \vec{r})$ thus provides a variational estimate of the ground state energy of $H_{nc}$ [15]. We refer to the minimizing assignment $(\vec{q}, \vec{r})$ as the noncontextual ground state.
The $A_i$ are anticommuting Pauli operators and $\vec{r}$ is a real unit vector, so the observable
$$A \equiv \sum_{i=1}^{N} r_i A_i \tag{4}$$
is a rotated Pauli operator, and thus has eigenvalues $\pm 1$. The unitary that maps $A$ to a single Pauli operator is a sequence of $N-1$ rotations generated by Pauli operators, all of which preserve the $G_j$. If each of the $A_i$ has expectation value $r_i$ then $A$ has expectation value $+1$, and vice versa [15]. Thus, the noncontextual state with parameters $(\vec{q}, \vec{r})$ is equivalent to a joint value assignment for the set of observables
$$G \cup \{A\}, \tag{5}$$
where the value assignments are $G_j \mapsto q_j = \pm 1$ for each $j$ and $A \mapsto +1$. We refer to the observables in (5) as the noncontextual generators.
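The claim that a unit-norm combination of anticommuting Pauli operators has eigenvalues $\pm 1$ is equivalent to the combination squaring to the identity, which can be checked symbolically. The Python sketch below does this with a small Pauli-string multiplication routine; the five operators and the coefficients are hypothetical values chosen for illustration, and the dictionary representation of operators is an assumption of the sketch.

```python
import math

# Products of distinct single-qubit Paulis: a*b = phase * c.
_PRODUCT = {
    ("X", "Y"): (1j, "Z"), ("Y", "X"): (-1j, "Z"),
    ("Y", "Z"): (1j, "X"), ("Z", "Y"): (-1j, "X"),
    ("Z", "X"): (1j, "Y"), ("X", "Z"): (-1j, "Y"),
}

def pauli_mul(p, q):
    """Multiply two Pauli strings; return (phase, pauli_string)."""
    phase, letters = 1, []
    for a, b in zip(p, q):
        if a == "I" or b == "I":
            letters.append(b if a == "I" else a)
        elif a == b:
            letters.append("I")
        else:
            factor, c = _PRODUCT[(a, b)]
            phase *= factor
            letters.append(c)
    return phase, "".join(letters)

def op_mul(f, g, tol=1e-12):
    """Multiply operators stored as {pauli_string: coefficient}."""
    out = {}
    for p, a in f.items():
        for q, b in g.items():
            phase, s = pauli_mul(p, q)
            out[s] = out.get(s, 0) + phase * a * b
    # Drop terms whose coefficients cancelled.
    return {s: c for s, c in out.items() if abs(c) > tol}

# Hypothetical example: five pairwise anticommuting Pauli operators
# combined with a normalized real coefficient vector.
reps = ["IXI", "IYI", "IZX", "IZY", "IZZ"]
raw = [0.1, -0.4, 0.5, 0.2, 0.3]
norm = math.sqrt(sum(x * x for x in raw))
A = {p: x / norm for p, x in zip(reps, raw)}

A_squared = op_mul(A, A)  # only the identity term should survive
```

Every cross term appears twice with opposite phases because the operators anticommute, so only the identity survives, with coefficient equal to the squared norm of the coefficient vector.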
The noncontextual states therefore correspond to subspaces of quantum states that are stabilized by the operators $q_j G_j$ and by $A$. These are almost stabilizer subspaces in the usual sense (see e.g. [37, Sec. 10.3]), except that $A$ is not a single Pauli operator, but is unitarily equivalent to one. Therefore, a noncontextual state can be thought of as a stabilizer subspace, one of whose stabilizers has been rotated by an efficiently-describable unitary.

Quantum correction
Let $(\vec{q}, \vec{r})$ be the noncontextual ground state. If we take the resulting energy of $H_{nc}$ as a classical estimate of the ground state energy of the full Hamiltonian $H$, we can obtain a quantum correction by minimizing the energy of the remaining terms in the Hamiltonian over the quantum states that are consistent with the noncontextual ground state. As discussed above, these states form the common eigenspace of the noncontextual generators, which is a stabilizer subspace up to a rotation on one of the stabilizers. We refer to this subspace as the contextual subspace.
Before we discuss how to find quantum corrections, we establish when such corrections can appear:
Theorem 1. Let $S$ be a set of Pauli operators, and let $S_{nc}$ be a noncontextual subset that is closed under inference within $S$ (see Definition 1). Then for any noncontextual state $(\vec{q}, \vec{r})$ as in (3) describing $S_{nc}$, there exists a quantum state consistent with $(\vec{q}, \vec{r})$ (i.e., that gives the same expectation values for $S_{nc}$ as $(\vec{q}, \vec{r})$) for which the expectation value of every operator in $S_c \equiv S \setminus S_{nc}$ is zero.
The proof may be found in Appendix A, and follows from the fact that S nc is closed under inference: no value of any operator in S c can be inferred from the values of operators in the noncontextual part of the Hamiltonian [15]. Theorem 1 has two useful corollaries. Proof. From Theorem 1, it follows that there exists a ground state of the noncontextual part for which the expectation value of every other term in the Hamiltonian is zero. Hence, the ground state energy of the noncontextual part is a possible expectation value for the energy of the full Hamiltonian, so it is lower-bounded by the ground state energy of the full Hamiltonian. To compute the quantum correction we minimize the expectation value of the full Hamiltonian over quantum states consistent with a given noncontextual state. This produces a variational estimate of the energy with the contribution from H nc given by the noncontextual state.

Mapping a contextual subspace to a stabilizer subspace
We now show how to map the contextual subspace corresponding to the noncontextual ground state to a subspace stabilized by single-qubit $Z$ operators. To achieve this goal we rotate the $G_j$ and subsequently the single operator $A$ to single-qubit $Z$ operators.
The number of $G_j$ is some $M < n$, as discussed in [15]. Therefore, the $G_j$ can be mapped to single-qubit $Z$ operators by a sequence of at most $2M$ $\frac{\pi}{2}$-rotations$^1$ (for completeness, we provide a constructive proof as Lemma A.2 in Appendix A). These rotations are Clifford operators, so they map the remaining Pauli operators in the Hamiltonian back to single Pauli operators while preserving their commutation relations. Let $D$ denote the composition of these rotations, and let $H' \equiv DHD^\dagger$ be the rotated Hamiltonian. After applying $D$, the noncontextual generators $G_j$ have been mapped to single-qubit $Z$ operators $G_j'$. Without loss of generality, since we have already found the noncontextual ground state at this point, we can choose the signs of the $G_j'$ such that they all have eigenvalue $+1$ in the noncontextual ground state, and thus stabilize it. We refer to this basis as the "rotated basis."
Once $D$ has been applied, we map $A' = DAD^\dagger$ to a single-qubit $Z$ operator as well. $A'$ is a normalized linear combination of the anticommuting Pauli operators $A_i' = DA_iD^\dagger$, as in (4), so we use the sequence of rotations employed in unitary partitioning [39,40]. The result is a sequence of $N-1$ rotations generated by products of pairs of the $A_i'$; we denote by $R$ the composition of these rotations, and the result of applying it to $A'$ is
$$R A' R^\dagger = A_1', \tag{6}$$
where $A_1'$ is a single Pauli operator. The rotations forming $R$ are generated by products of the $A_i'$, and the $A_i'$ commute with the operators $G_j'$, so $R$ commutes with the operators $G_j'$. Thus $A_1'$ commutes with and is independent of the operators in $G'$, so since it is a single Pauli operator, we can use Lemma A.2 to map it to a single-qubit $Z$ operator as well, without disturbing the operators in $G'$. Let $D_A$ denote the full rotation that maps $A'$ to a single-qubit $Z$ operator $A''$.
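The action of a single $\frac{\pi}{2}$-rotation on a Pauli term can be sketched in a few lines of Python. The sketch assumes the convention that the rotation generated by a Pauli operator $Q$ is $e^{i\frac{\pi}{4}Q}$, so that a Pauli $P$ is unchanged if it commutes with $Q$ and is mapped to $iQP$ if it anticommutes; other sign conventions differ only by signs. The function names and the particular two-step sequence are illustrative choices and are not claimed to match the construction of Lemma A.2.

```python
# Products of distinct single-qubit Paulis: a*b = phase * c.
_PRODUCT = {
    ("X", "Y"): (1j, "Z"), ("Y", "X"): (-1j, "Z"),
    ("Y", "Z"): (1j, "X"), ("Z", "Y"): (-1j, "X"),
    ("Z", "X"): (1j, "Y"), ("X", "Z"): (-1j, "Y"),
}

def pauli_mul(p, q):
    """Multiply two Pauli strings; return (phase, pauli_string)."""
    phase, letters = 1, []
    for a, b in zip(p, q):
        if a == "I" or b == "I":
            letters.append(b if a == "I" else a)
        elif a == b:
            letters.append("I")
        else:
            factor, c = _PRODUCT[(a, b)]
            phase *= factor
            letters.append(c)
    return phase, "".join(letters)

def rotate(p, q):
    """Conjugate Pauli string p by exp(i*pi/4*q); return (sign, string).

    Assumed convention: p -> p if [p, q] = 0, and p -> i*q*p otherwise.
    """
    phase_qp, product = pauli_mul(q, p)
    phase_pq, _ = pauli_mul(p, q)
    if phase_qp == phase_pq:  # p and q commute
        return 1, p
    # For anticommuting strings q*p carries a phase of +i or -i,
    # so i*q*p is a Pauli string with a real sign.
    return int(round((1j * phase_qp).real)), product

# Map the two-qubit generator ZZ to a single-qubit Z in two rotations.
sign1, step1 = rotate("ZZ", "IX")    # ZZ -> ZY
sign2, step2 = rotate(step1, "ZX")   # ZY -> -IZ
total_sign = sign1 * sign2
```

Because each rotation is a Clifford operation, applying the same two rotations to every term of the Hamiltonian keeps each term a single signed Pauli string, which is why the rotated Hamiltonian has the same number of terms as the original.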

Restricting the Hamiltonian to a contextual subspace
In the rotated basis, we will restrict the Hamiltonian to the subspace stabilized by the noncontextual generators $G_j'$. We will then obtain the quantum correction by minimizing the expectation value of this restricted Hamiltonian over $+1$-eigenvectors of the remaining noncontextual generator $A'$.
Let $\mathcal{H}_1$ denote the Hilbert space of the $n_1$ qubits acted on by the single-qubit Pauli $Z$ operators $G_j'$, and let $\mathcal{H}_2$ denote the Hilbert space of the remaining $n_2$ qubits. Thus the full Hilbert space is $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$ and the total number of qubits is $n = n_1 + n_2$. The contextual part of the Hamiltonian in the rotated basis is:
$$H_c' = \sum_{P \in S_c'} h_P P, \tag{7}$$
where $h_P$ denotes the real coefficient of the term $P$. The set of Pauli terms in $H_c'$ is $S_c'$, and terms in $S_c'$ in general act on both of the subspaces $\mathcal{H}_1$ and $\mathcal{H}_2$. We can write the terms $P$ in (7) as
$$P = P_1 \otimes P_2, \tag{8}$$
where $P_1$ is a Pauli operator acting on $\mathcal{H}_1$ and $P_2$ is a Pauli operator acting on $\mathcal{H}_2$. $P$ commutes with an element of $G'$ if and only if $P_1 \otimes 1_{\mathcal{H}_2}$ does (where $1_{\mathcal{H}_2}$ denotes the identity operator acting on $\mathcal{H}_2$), since the operators in $G'$ act only on $\mathcal{H}_1$. Since the noncontextual state corresponds to a subspace stabilized by $G'$, if $P$ anticommutes with any element of $G'$, then its expectation value in the noncontextual state is zero. Hence, any $P$ that admits a quantum correction must commute with all elements of $G'$, and thus $P_1 \otimes 1_{\mathcal{H}_2}$ must as well. The elements of $G'$ comprise all single-qubit $Z$ operators acting in $\mathcal{H}_1$, so $P_1$ must be a product of such operators. Hence the expectation value of $P_1$ is some $p_1 = \pm 1$, determined by the noncontextual state.
Let $|\psi_{(\vec{q}, \vec{r})}\rangle$ be any quantum state consistent with the noncontextual state $(\vec{q}, \vec{r})$. The action of any term $P$ that admits a quantum correction on $|\psi_{(\vec{q}, \vec{r})}\rangle$ has the form
$$P\,|\psi_{(\vec{q}, \vec{r})}\rangle = p_1 \left(1_{\mathcal{H}_1} \otimes P_2\right) |\psi_{(\vec{q}, \vec{r})}\rangle. \tag{9}$$
Therefore, if we denote by $H_c'|_{(\vec{q}, \vec{r})}$ the restriction of $H_c'$ to its action on the noncontextual ground state $(\vec{q}, \vec{r})$,
$$H_c'|_{(\vec{q}, \vec{r})} = 1_{\mathcal{H}_1} \otimes H_c'|_{\mathcal{H}_2}, \tag{10}$$
for
$$H_c'|_{\mathcal{H}_2} \equiv \sum_{\substack{P \in S_c' \\ [P, G_j'] = 0 \ \forall j}} p_1 h_P P_2, \tag{11}$$
where $p_1$ and $P_2$ are those associated to each $P$ via (8) and (9). This is a Hamiltonian on $n_2$ qubits, for $n_2$ given by
$$n_2 = n - |G|, \tag{12}$$
where $|G|$ is the number of noncontextual generators $G_j$. Furthermore, if terms $P \in S_c'$ are distinct only on their tensor factors $P_1$, the remaining operators $P_2$ in $H_c'|_{\mathcal{H}_2}$ will be identical. Also, any terms that anticommute with any of the noncontextual generators $G_j'$ are dropped entirely (since their expectation values are zero). Thus, the restricted Hamiltonian $H_c'|_{\mathcal{H}_2}$ may contain fewer than $|S_c'|$ terms.
This is illustrated in Fig. 3, in Section 3.2.
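The restriction described above is a purely classical bookkeeping step, and a minimal Python sketch of it follows. The function name `restrict`, the dictionary representation of the Hamiltonian, and the toy coefficients are assumptions for illustration; the sketch also allows the stabilized qubits to carry eigenvalue $-1$, whereas in the text the signs are chosen so that all eigenvalues are $+1$.

```python
def restrict(terms, stabilized):
    """Restrict a rotated Hamiltonian to a stabilizer subspace.

    terms:      {pauli_string: coefficient}
    stabilized: {qubit_index: +1 or -1}, the eigenvalue assigned to the
                single-qubit Z generator acting on that qubit.
    Returns a Hamiltonian on the remaining qubits.
    """
    restricted = {}
    for pauli, coeff in terms.items():
        # A term with X or Y on a stabilized qubit anticommutes with
        # that generator, so its expectation value is zero: drop it.
        if any(pauli[k] in "XY" for k in stabilized):
            continue
        # Each Z on a stabilized qubit is replaced by its eigenvalue.
        sign = 1
        for k, value in stabilized.items():
            if pauli[k] == "Z":
                sign *= value
        # Keep only the tensor factor on the remaining qubits, merging
        # terms that have become identical.
        reduced = "".join(c for k, c in enumerate(pauli)
                          if k not in stabilized)
        restricted[reduced] = restricted.get(reduced, 0.0) + sign * coeff
    return restricted

# Toy three-qubit example with qubit 0 stabilized at eigenvalue -1.
toy_terms = {"ZXI": 0.5, "IXI": 0.25, "XII": 1.0, "ZIZ": -0.3}
reduced_terms = restrict(toy_terms, {0: -1})
```

In this toy case four terms on three qubits reduce to two terms on two qubits: one term is dropped because it anticommutes with the generator, and two terms merge because they differ only on the stabilized qubit.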

Optimizing within a noncontextual subspace
To obtain the quantum correction, we perform $n_2$-qubit VQE on the restricted Hamiltonian $H_c'|_{\mathcal{H}_2}$ within the contextual subspace. The contextual subspace is the subspace of $\mathcal{H}_2$ that forms the $+1$-eigenspace of $A'$, the remaining noncontextual generator in the rotated basis. To search within the $+1$-eigenspace of $A'$, we can prepare ansätze in the $+1$-eigenspace of $A''$ (a single-qubit $Z$ operator), and then apply the inverse of $D_A$. This guarantees that every ansatz state is consistent with the noncontextual ground state $(\vec{q}, \vec{r})$. Note that we only explicitly restrict the rotated Hamiltonian to the subspace stabilized by the $G_j'$, whereas we restrict to the $+1$-eigenspace of $A'$ at the level of the ansatz. This is because although the operation $D_A$ that diagonalizes $A'$ can be efficiently implemented on a quantum computer, it is not a Clifford operation. Conjugating the Hamiltonian with $D_A$ can increase the number of terms by a factor of $\Theta(2^N)$, where $N$ is the number of cliques. Thus if $N$ is small, we could classically compute the Hamiltonian restricted to the $+1$-eigenspace of $A'$ and then perform VQE on this Hamiltonian. This would save one qubit and permit an unconstrained search for the quantum correction, but since $N$ can in principle scale as $\Theta(n)$ this approach will not be efficient in general.

Example
As an example, we construct a Hamiltonian for which most of the terms are included in the noncontextual part. Let $S = S_{nc} \cup S_c$, where
$$S_{nc} = \{ZII, IXI, IYI, IZX, IZY, IZZ, ZXI, ZYI, ZZX, ZZY, ZZZ\}, \tag{13}$$
and $S_c$ contains the extra terms. The set of terms $S_{nc}$ is noncontextual, partitioned into $Z = \{ZII\}$ and the cliques $\{IXI, ZXI\}$, $\{IYI, ZYI\}$, $\{IZX, ZZX\}$, $\{IZY, ZZY\}$, and $\{IZZ, ZZZ\}$. The extra terms $S_c$ all commute with $Z$. In this case, $G = Z$ since $Z$ contains only one operator, and this operator is already a single-qubit $Z$ operator, so $D$ is the identity. Thus $\mathcal{H}_2$ is the Hilbert space of the second two qubits. We also have $A = \sum_{i=1}^{5} r_i A_i$ for some unit vector $\vec{r}$, with one $A_i$ taken from each clique, so $D_A$ is the rotation that maps the restriction of $A$ to $\mathcal{H}_2$ to a single-qubit $Z$ operator, as described in Section 2.1. If we choose this to be the $Z$ operator on the first qubit in $\mathcal{H}_2$, then for an ansatz we may prepare any state whose value is $|0\rangle$ for the first qubit in $\mathcal{H}_2$, and then apply $D_A^\dagger$ to this state. Thus, we reduce an initial Hamiltonian on three qubits to a noncontextual approximation and a quantum correction that may be implemented on a two-qubit quantum processor.
To evaluate the performance of the resulting approximations, we generated 10000 Hamiltonians with the terms (13) by choosing coefficients for them uniformly at random from [−1, 1]. The resulting fractional errors in the ground state energies are plotted in Fig. 1; the average fractional error is 0.257 for the noncontextual approximation alone, and 0.0268 when the quantum correction is included. The quantum corrections were simulated classically by directly evaluating the lowest eigenvalues of the Hamiltonians restricted to the noncontextual ground states.

Contextual subspace VQE
The quantum correction to noncontextual approximations discussed in Section 2 allows us to use limited quantum resources to improve a classical simulation result. In this section we explain how we can systematically step back from the original noncontextual approximation in order to enlarge the contextual subspace, thus improving the overall accuracy of the approximation by using more quantum resources. This provides a parameter that can be specified based on the quantum resources available, taking us from the optimal noncontextual approximation at one extreme to full VQE at the other. We call this method contextual subspace VQE.

Method
We begin with a Hamiltonian $H$ whose noncontextual approximation is $H_{nc}$. As discussed above, the noncontextual ground state corresponds to a joint eigenspace of the noncontextual generators $G \cup \{A\}$. We can trade accuracy of the noncontextual approximation for an improved quantum correction by decreasing the size of $G \cup \{A\}$, which increases the dimension of the contextual subspace. We accomplish this by decreasing the size of $G$, the set of generators for the commuting part of $H_{nc}$. Since the number of qubits used in the quantum correction procedure is the total number of qubits minus the number of generators in $G$ (see (12)), reducing the size of $G$ increases the dimension of the search space for the quantum processor.
We work in the rotated basis, as in Section 2. In this basis, we select some subset of the noncontextual generators $G_j'$, and remove all terms generated by them from the noncontextual part. Since the $G_j'$ are single-qubit $Z$ operators, this means that for each generator to be dropped we remove from the noncontextual part all terms containing the corresponding single-qubit $Z$ operator as a tensor factor. All the terms thus removed should be added to the quantum correction Hamiltonian $H_c'$ (as in Section 2.2). We now implement the quantum correction on this expanded $H_c'$, keeping the same noncontextual ground state that we began with, but only applying its value assignments to the generators that remain in the noncontextual part.
The new noncontextual approximation on its own will in general be worse than the original noncontextual approximation. However, after including the new quantum correction the overall approximation cannot be worse, and will in general be better. This is because the values assigned in the original noncontextual approximation and quantum correction are still consistent with the noncontextual ground state, so quantum states that obtain those values are included in the new quantum search space. Thus in the worst case the new approximation will only recover the original approximation. If the new quantum correction is nonzero for any additional terms, the new approximation will be strictly better than the original approximation. In the limit where we remove all terms from the noncontextual part and simulate them on the quantum computer, there will be no noncontextual approximation left, and we will have recovered full VQE.
The additional terms that can have nonzero quantum corrections after the removal procedure are those that anticommute with any of the generators $G_j'$ that were removed, but commute with the remaining generators. These terms were previously restricted to null expectation values only because the noncontextual state was required to be a joint eigenstate of the removed generators, so when that is no longer enforced their expectation values can vary. Therefore, we can choose which subset of the generators to remove based on which will permit the optimal quantum correction.
Note that classically simulating the noncontextual part of the Hamiltonian is NP-complete, so in worst cases the classical simulation part of CS-VQE will not perform well [15]. However, worst case Hamiltonians for standard VQE are QMA-complete, meaning that a similar argument applies to VQE in general. Hence, in both cases we are interested in heuristic performance for specific Hamiltonians of interest, rather than worst cases. Framed in this way, what CS-VQE does is take standard VQE, which is a heuristic for an optimization problem over a set of parameters for a quantum circuit, and transform it into two heuristics (one classical and one quantum) for two smaller optimization problems. In practice, we have found that a combination of Monte Carlo and gradient descent methods works well for the classical part of the algorithm, but continuing to optimize this is a topic for future work.

Applications
We tested CS-VQE on a set of electronic structure Hamiltonians in the Jordan-Wigner mapping [41]. In order to distinguish CS-VQE from qubit tapering, we first tapered the Hamiltonians using symmetries as in [42,43], then implemented CS-VQE in order to remove even more qubits. The initial noncontextual approximation Hamiltonians were chosen via a greedy classical algorithm, as described in [15]. This algorithm runs in $O(N^5)$ time for an $N$-term Hamiltonian, since testing a particular Hamiltonian for noncontextuality takes $O(N^3)$ time [14], and a greedy algorithm that adds optimal terms one at a time requires $O(N^2)$ steps. This method is not optimal, but is efficient. The quantum parts of the procedures were simulated classically by directly evaluating the lowest eigenvalues of the quantum correction Hamiltonians. The results are given in Fig. 2, which shows the overall CS-VQE approximation errors versus the number of qubits used on the quantum computer, and in Fig. 3, which shows the number of terms that must be simulated on the quantum computer in order to reach chemical accuracy using CS-VQE. Our code is available on GitHub$^2$, and may be used to reproduce our results or to apply CS-VQE to new Hamiltonians of the reader's choosing.
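A greedy selection of the noncontextual part can be sketched as follows in Python. This is a simplified variant written for illustration, not the algorithm of [15]: it visits the terms once in order of decreasing coefficient magnitude (an assumed ranking) and keeps each term only if the selected set remains noncontextual, using a cubic-time transitivity test. The function names and the toy Hamiltonian are hypothetical.

```python
def commute(p, q):
    """Return True if Pauli strings p and q commute."""
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

def is_noncontextual(terms):
    """Cubic-time test: commutation must be transitive once the
    operators that commute with the whole set are set aside."""
    terms = list(terms)
    rest = [p for p in terms if not all(commute(p, q) for q in terms)]
    return all(
        commute(a, c)
        for a in rest for b in rest for c in rest
        if commute(a, b) and commute(b, c)
    )

def greedy_noncontextual(hamiltonian):
    """Split {pauli_string: coefficient} into (S_nc, S_c) greedily.

    Assumed heuristic: try terms in order of decreasing |coefficient|
    and keep a term only if the kept set stays noncontextual.
    """
    chosen = []
    for p in sorted(hamiltonian, key=lambda t: -abs(hamiltonian[t])):
        if is_noncontextual(chosen + [p]):
            chosen.append(p)
    remainder = [p for p in hamiltonian if p not in chosen]
    return chosen, remainder

# Toy two-qubit Hamiltonian whose full term set is contextual.
toy = {"XI": 1.0, "IX": 0.9, "ZI": 0.8, "IZ": 0.7}
s_nc, s_c = greedy_noncontextual(toy)
```

This single pass makes $N$ calls to the noncontextuality test, so it is cheaper than the $O(N^5)$ procedure described above, at the cost of never reconsidering the order in which terms are tried.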
As noted at the end of Section 3.1, CS-VQE is sensitive to the order in which the qubits are moved from the noncontextual approximation to the quantum processor. In the calculations to obtain Figs. 2 and 3, we used a heuristic that begins with the noncontextual approximation, then adds qubits to the quantum correction two at a time, greedily choosing each pair to maximize the decrease in the ground state energy estimate. This method is informed by the structure of the noncontextual and contextual parts of the molecular Hamiltonians, and performed best out of the heuristics we tried that can be implemented efficiently without performing full VQE. Details of the implementation of the heuristic are given in Appendix B.
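The control flow of this pairwise greedy heuristic can be sketched in Python as below. The energy estimate is supplied as a callback; in a real application it would be a CS-VQE run with the given qubits assigned to the quantum correction, whereas here a toy additive function stands in for it. The function names, the `step` parameter, and the toy weights are assumptions of the sketch, and the details of the authors' implementation are those given in Appendix B.

```python
from itertools import combinations

def greedy_qubit_order(n_qubits, energy, step=2):
    """Greedily move qubits to the quantum correction, `step` at a time.

    energy: callable taking a frozenset of qubit indices assigned to the
            quantum correction and returning an energy estimate
            (hypothetical interface standing in for a CS-VQE run).
    Returns a list of (qubits_so_far, energy_estimate) pairs.
    """
    chosen = []
    remaining = set(range(n_qubits))
    history = [((), energy(frozenset()))]
    while remaining:
        k = min(step, len(remaining))
        # Pick the group of k qubits that lowers the estimate the most.
        best = min(
            combinations(sorted(remaining), k),
            key=lambda c: energy(frozenset(chosen) | frozenset(c)),
        )
        chosen.extend(best)
        remaining -= set(best)
        history.append((tuple(chosen), energy(frozenset(chosen))))
    return history

# Toy stand-in for the energy estimate: each qubit moved to the quantum
# processor lowers the estimate by a fixed amount.
weights = [0.1, 0.5, 0.2, 0.4]

def toy_energy(qubits):
    return -sum(weights[i] for i in qubits)

history = greedy_qubit_order(4, toy_energy)
```

Each round evaluates the callback once per candidate pair, so the number of energy evaluations grows quadratically with the number of qubits still in the noncontextual part.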
This heuristic involves running CS-VQE repeatedly, since for sufficiently large applications one would have to use the quantum processor to compute the quantum corrections on the way to choosing the set of qubits for the final quantum correction. However, these preliminary computations would only be necessary once the number of qubits chosen becomes unfeasible for classical simulation, and from Fig. 3 we see that the number of terms required to reach chemical accuracy can be many times smaller than the number of terms required to implement full VQE. Therefore, even the repeated runs of CS-VQE required for this heuristic can require fewer measurements overall than full VQE, and of course they also require fewer qubits.
Alternatively, one could use a heuristic to determine the order without evaluating energies at all. However, all variants of this that we tried had substantially worse performance than the heuristic discussed above, so we suggest using that heuristic for real applications. We also tested an inefficient "optimal" heuristic that begins from full VQE and moves qubits to the noncontextual approximation one at a time, greedily minimizing the error penalty for each. Actually implementing this heuristic is even more costly than full VQE, but it did identify the optimal qubit orderings in cases small enough for us to find these by brute-force search. The first heuristic discussed above performed nearly as well as the "optimal" heuristic, requiring the same number of qubits to reach chemical accuracy in most cases and only one extra in the remaining few cases. Details of all of the heuristics and their relative performance are discussed in Appendix B.

Conclusion
In this paper, we showed how to use a quantum computer to obtain a correction to a noncontextual approximation of a ground state energy. We then showed how to adjust the number of qubits used on the quantum computer in order to increase the accuracy of the hybrid approximation. This method, contextual subspace VQE or CS-VQE, is a true hybrid quantum-classical algorithm, in which the quantum resources used may be set to match whatever resources are available, and the classical approximation algorithm accounts for the remainder. The method is approximate, but variational, as is VQE itself. Exact methods will only achieve approximate results on NISQ devices due to their noisy character. CS-VQE allows the quantum resources used to be increased systematically until the desired precision is achieved, if possible.
Standard VQE is a heuristic algorithm: there are no analytic characterizations of its performance for general Hamiltonians, or even for special classes like electronic structure Hamiltonians, upon scaling the system size. This is also true for CS-VQE: its performance is sensitive to the specific problem to which it is applied. We do not analytically characterize the errors as a function of the number of qubits used on the quantum processor. However, the examples in Section 3.2 illustrate that CS-VQE performs well in many cases of interest going well beyond the scale of VQE implementations to date, so we hope that as the available quantum processors continue to grow, CS-VQE can be used to allow larger systems to be simulated using those processors.
The technique for restricting the quantum correction to the subspace consistent with the noncontextual ground state may appear reminiscent of using qubit tapering to exploit symmetries as described in [42,43]. However, in CS-VQE the symmetries are intrinsic to the noncontextual ground state, rather than to the Hamiltonian (as in [42,43]), and are thus under the experimenter's control. We illustrated this point in Section 3.2 by applying CS-VQE to Hamiltonians that were already tapered using the methods of [42,43]; using CS-VQE we can eliminate additional qubits at will.
In this paper we did not explore how to implement ansätze for the restricted VQE instance used by CS-VQE, instead finding the exact ground states of the contextual parts. However, standard ansatz classes, like unitary coupled-cluster (UCC) for electronic structure Hamiltonians [44][45][46], can be transformed into ansätze for CS-VQE by projecting the gates onto the contextual subspace, just as the contextual part of the Hamiltonian is restricted to the contextual subspace. Detailed study of this is a topic for future research.
One concern for standard VQE as well as for CS-VQE is that the ansatz may suffer from the barren plateau problem [47][48][49], where the gradient of the cost function (in this case expected energy) vanishes exponentially with the system size. It is hoped that for standard VQE, using physically-motivated ansätze like UCC may avoid the barren plateau problem, so since we can use projections of the same ansätze for CS-VQE, this same hope transfers to our case. However, even physically-motivated ansätze may be subject to noise-induced barren plateaus [50]: to the best of our knowledge, all variational quantum algorithms have the potential to fail in this way, including CS-VQE. Nonlinear optimization and its attendant problems, including barren plateaus, may be avoided by the use of quantum imaginary time evolution (QITE) or similar methods [51,52]. In our case, QITE could be applied directly to the contextual part of the Hamiltonian.
It is possible that some of the qubit and term reductions we obtained using CS-VQE have explanations in terms of chemistry. However, in such cases CS-VQE identifies and exploits such features using principles that are derived from the foundations of quantum mechanics, and are consequently agnostic to any specific, high-level chemistry arguments. Identifying such chemical arguments would illustrate the role contextuality plays in chemistry, which would be of independent interest.
By using CS-VQE it is possible to reach chemical accuracy for ground state energies of numerous small molecules using many fewer qubits than would be required to implement full VQE on the tapered Hamiltonians. The number of terms and thus number of measurements required is also substantially reduced by using CS-VQE, since groups of terms become equivalent under the symmetry imposed by the noncontextual ground state. The number of measurements needed to obtain the quantum correction could be further reduced by the techniques described in [39,40,53-55]. We leave this and other optimizations of the method to future work. Current VQE implementations are limited in both qubit count and number of measurements by the available hardware, so we expect CS-VQE to be of immediate practical value in accessing new molecular simulation applications on NISQ computers.

A Proofs
We will use Lemma 1 from [15]:
Lemma A.1 (Lemma 1 in [15]). Let $P_1, P_2, \ldots, P_N$ be an anticommuting set of Pauli operators. For any unit vector $\vec{a} \in \mathbb{R}^N$, the operator $\sum_{i=1}^{N} a_i P_i$ has eigenvalues $\pm 1$. From this it follows that for any state, $\sum_{i=1}^{N} \langle P_i \rangle^2 \le 1$.
Theorem 1. Let $S$ be a set of Pauli operators, and let $S_{nc}$ be a noncontextual subset that is closed under inference within $S$ (see Definition 1). Then for any noncontextual state $(\vec q, \vec r)$ as in (3) describing $S_{nc}$, there exists a quantum state consistent with $(\vec q, \vec r)$ (i.e., that gives the same expectation values for $S_{nc}$ as $(\vec q, \vec r)$) for which the expectation value of every operator in $S_c \equiv S \setminus S_{nc}$ is zero.
Proof. Let $G \cup \{A\}$ be the independent, commuting set of observables associated to the noncontextual state $(\vec q, \vec r)$ describing $S_{nc}$ (see (3) and (5)): the values assigned to $G \cup \{A\}$ in the noncontextual state $(\vec q, \vec r)$ are $\langle G_j \rangle = q_j$ for each $G_j \in G$, and $\langle A \rangle = \langle \sum_i r_i A_i \rangle = 1$ (see (4) and the associated discussion). Let $P$ be a Pauli operator in $S \setminus S_{nc}$.
Case 1. If $P$ anticommutes with any operator in $G$, then $\langle P \rangle = 0$, since any quantum state consistent with $(\vec q, \vec r)$ is a simultaneous eigenstate of $G$.
Case 2. If $P$ commutes with the operators in $G$ and also with the $A_i$, then: 1. if $P$ is a product of operators in $G$, then $P$ can be inferred from $G$, so $P$ must in fact be included in $S_{nc}$, since by assumption $S_{nc}$ is closed under inference within $S$. This follows immediately from Definition 1, the definition of closure under inference within $S$.
Case 3. Finally, suppose $P$ commutes with the operators in $G$, but anticommutes with at least one of the $A_i$. In this case we want to prove that there exists a $+1$-eigenstate of $A$ for which $\langle P \rangle = 0$, as follows. Let $I_P$ be the set of indices such that
$$[P, A_i] = 0 \quad \text{for } i \in I_P.$$
If $I_P$ is empty, then $P$ anticommutes with all of the $A_i$: thus since $\sum_i \langle A_i \rangle^2 = \sum_i r_i^2 = 1$ (see (4) and the associated discussion), and $\langle P \rangle^2 + \sum_i \langle A_i \rangle^2 \le 1$ (by Lemma A.1), it follows that $\langle P \rangle = 0$. The remaining case is when $I_P$ is nonempty; there also exist $i \notin I_P$ by assumption. Let
$$K \equiv \sum_{i \in I_P} r_i A_i, \qquad L \equiv \sum_{i \notin I_P} r_i A_i,$$
so that $A = K + L$. Since $K$ and $L$ are linear combinations of anticommuting Pauli operators, their eigenvalues are $\pm k$ and $\pm l$, respectively, where
$$k \equiv \Big( \sum_{i \in I_P} r_i^2 \Big)^{1/2}, \qquad l \equiv \Big( \sum_{i \notin I_P} r_i^2 \Big)^{1/2}.$$
Therefore, $k^2 + l^2 = |\vec r|^2 = 1$. Since $P$ commutes with $K$ and is a Pauli operator, $PK$ is also an observable with eigenvalues $\pm k$, which commutes with $L$ (since both $P$ and $K$ anticommute with $L$). Thus, $PK$ commutes with $A$, so within the $+1$-eigenspace of $A$ there exist eigenstates $|\pm\rangle$ of $PK$ with eigenvalues $\pm k$, i.e., $A|\pm\rangle = |\pm\rangle$, $PK|\pm\rangle = \pm k|\pm\rangle$.
Note that since $P \notin S_{nc}$, $P$ cannot be written as a product of operators in $G$ with any of the $A_i$, so both of the states $|\pm\rangle$ are consistent with the noncontextual state $(\vec q, \vec r)$.

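Consider the Pauli operator
$$J_i \equiv \bigotimes_{k=1}^{n} \sigma_k^{(J_i)},$$
defined as follows: for each $k$...
where the values $\sigma_k^{(J_i)}$ and $\sigma_k^{(B_i)}$ differ for exactly one $k$ (as we noted above, at least one of the $\sigma_k^{(B_i)}$ is $X$ or $Y$). Consider a rotation by $\pi/2$ generated by $J_i$, i.e., $\exp\left(i\frac{\pi}{4} J_i\right)$. Upon conjugating $B_i$ by this operator, we obtain
$$\begin{aligned} & e^{i\frac{\pi}{4} J_i}\, B_i\, e^{-i\frac{\pi}{4} J_i} \\ &= \tfrac{1}{2}\,(1 + iJ_i)\, B_i\, (1 - iJ_i) \\ &= \tfrac{1}{2}\,(1 + iJ_i)^2\, B_i \\ &= i J_i B_i, \end{aligned}$$
where the third line follows because $J_i$ anticommutes with $B_i$, and the fourth line follows because $J_i$ is self-inverse. By the conditions on the $\sigma_k^{(J_i)}$, $\sigma_k^{(J_i)} \sigma_k^{(B_i)} = I$ or $\pm iZ$ for each $k$, and $\pm iZ$ appears exactly once, so (42) becomes, up to an overall sign,
$$D_i \equiv \bigotimes_{k=1}^{n} \sigma_k^{(D_i)},$$
where all $\sigma_k^{(D_i)} = I$ except one, which is $Z$. In other words, the rotation about $J_i$ has mapped $B_i$ to a single-qubit $Z$ operator, as desired.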
In each step, we apply the rotation $\exp\left(i\frac{\pi}{4} J_i\right)$ to all operators in the set. Thus we might worry that, having already mapped some subset of the $B_i$ to single-qubit $Z$ operators $D_i$, applying some later rotation $\exp\left(i\frac{\pi}{4} J_i\right)$ to map $B_i$ to a single-qubit $Z$ operator could change the previously obtained $D_i$. This turns out not to be the case, as we now show. Consider some particular one of the previously obtained operators, which we write as $D_j$ to distinguish it from the current $B_i$, and whose expansion as a tensor product is
$$D_j = \bigotimes_{k=1}^{n} \sigma_k^{(D_j)},$$
where one of the $\sigma_k^{(D_j)}$ is $Z$ and the others are $I$. $D_j$ commutes with $B_i$, since the previously applied rotations preserve commutation relations, so for all values of $m$ such that $\sigma_m^{(D_j)} = Z$, $\sigma_m^{(B_i)}$ (see (36)) must be $I$ or $Z$. But this implies that $J_i$ also commutes with $D_j$, since we know that $\sigma_m^{(J_i)} = \sigma_m^{(B_i)}$ for such $m$. Therefore, the rotation that maps $B_i$ to a single-qubit $Z$ preserves the previously obtained $D_j$.
Now suppose instead that $B_i$ is diagonal, i.e., $\sigma_k^{(B_i)} \in \{I, Z\}$ for all $k$. Since any previously obtained $D_j$ are single-qubit $Z$ operators and we assumed that the entire set is independent, $B_i$ cannot be the product of any subset of the previously obtained $D_j$. Therefore, there must exist some $m \in \{1, 2, \ldots, n\}$ such that $\sigma_m^{(B_i)} = Z$ and $\sigma_m^{(D_j)} = I$ for all of the previously obtained $D_j$. Apply the rotation $\exp\left(i\frac{\pi}{4} K_i\right)$, for $K_i$ defined by
$$K_i \equiv \bigotimes_{k=1}^{n} \sigma_k^{(K_i)},$$
where $\sigma_m^{(K_i)} = Y$ and $\sigma_k^{(K_i)} = I$ for all $k \neq m$. Thus $\exp\left(i\frac{\pi}{4} K_i\right)$ commutes with and therefore does not change any previously obtained $D_j$.
As in Case 1, applying $\exp\left(i\frac{\pi}{4} K_i\right)$ to $B_i$ obtains (up to an overall sign) a Pauli operator whose $m$th tensor factor is $X$ by construction, and whose $k$th tensor factor is $\sigma_k^{(B_i)}$ for all $k \neq m$. In other words, the rotation has changed the $Z$ at the $m$th spot in $B_i$ into an $X$, and left $B_i$ otherwise unchanged. We also apply this rotation to all other operators in the set, which does not change those that have already been mapped to single-qubit $Z$ operators, as we noted above. Now $B_i$ is no longer diagonal, so we proceed as described in Step 1 above, applying a second Pauli $\frac{\pi}{2}$-rotation to map $B_i$ to a single-qubit Pauli operator.
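The two kinds of rotation used in this construction can be illustrated numerically. The following is a minimal sketch on two qubits; the specific operators `B`, `J`, `B2`, `K` and the helper `rotate` are assumptions chosen for this example, not operators taken from the text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rotate(op, gen):
    """Return U op U^dagger with U = exp(i*pi/4*gen), for a Pauli operator gen."""
    dim = gen.shape[0]
    # gen squares to the identity, so exp(i*pi/4*gen) = (I + i*gen)/sqrt(2).
    u = (np.eye(dim) + 1j * gen) / np.sqrt(2)
    return u @ op @ u.conj().T

# Non-diagonal case: B = X (x) X and J = Y (x) X differ only on the first qubit,
# so J anticommutes with B and the rotation gives i*J*B = Z (x) I.
B = np.kron(X, X)
J = np.kron(Y, X)
D = rotate(B, J)

# Diagonal case: B2 = Z (x) Z. Rotating about K = Y (x) I turns the first Z
# into an X, here with an overall minus sign: i*K*B2 = -(X (x) Z).
B2 = np.kron(Z, Z)
K = np.kron(Y, I2)
B2_rot = rotate(B2, K)
```

In this example `D` is the single-qubit operator $Z \otimes I$, and `B2_rot` is no longer diagonal, so a further rotation of the first kind can be applied to it.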
B CS-VQE implementation details
B.1 Moving qubits from the noncontextual approximation to the quantum correction
We describe in detail how the noncontextual approximation is truncated to make room for improved quantum corrections. As in Section 2, we work in the rotated basis, denote by $\mathcal{H}_1$ the subspace of $n_1$ qubits acted upon by the noncontextual generators $G$, and denote by $\mathcal{H}_2$ the subspace of the $n_2$ remaining qubits, which are used to implement the quantum correction. Let $I_2$ be the set of indices of these latter $n_2$ qubits (those not acted upon by $G$), whose Hilbert space is the quantum search space $\mathcal{H}_2$. To increase the size of $\mathcal{H}_2$, we first need to select some subset of $G$ that we want to remove from $H_{nc}$. Since the elements of $G$ are single-qubit $Z$ operators, this subset defines a set $I_{add}$ of indices for qubits whose states are initially fixed by the noncontextual state, but that we will switch to simulating on the quantum processor. To begin with, from $S_{nc}$ (the terms in the noncontextual part of the Hamiltonian, in the rotated basis) we remove all terms that act on any of the qubits in $I_{add}$, including the elements of $G$ that act on these qubits. The remaining elements of $G$ form a new generating set $G' \subset G$ satisfying
$$|G'| = |G| - |I_{add}|. \tag{53}$$
Let the new noncontextual set of terms be denoted $S'_{nc}$, and let
$$H'_{nc} \equiv \sum_{P \in S'_{nc}} h_P P.$$
All terms that were removed from $S_{nc}$ to obtain $S'_{nc}$ should be added to $S_c$ to obtain an expanded contextual set of terms $S'_c$, whose corresponding Hamiltonian is
$$H'_c \equiv \sum_{P \in S'_c} h_P P.$$
We can see that this removal operation preserves closure under inference of $S'_{nc}$ by again thinking of $G \cup \{A\}$ as stabilizers (up to some signs) for the contextual subspace. The elements of $G$ are single Pauli operators that are generators for the stabilizer group of the subspace. In order to increase the dimension of the stabilized subspace, therefore, for each element $G_i$ of $G$ that we remove we must also remove all elements of the stabilizer group that include $G_i$ as a factor. In other words, in this instance preserving closure under inference is equivalent to preserving closure of a stabilizer group.
We can now implement the quantum correction on $H'_{nc}$ and $H'_c$, keeping the same noncontextual ground state that we began with, but only applying its value assignments to terms in $H'_{nc}$. Let $n'_2$ denote the new number of qubits used in the quantum correction procedure: then by (53),
$$n'_2 = n_2 + |I_{add}|,$$
where $n_2$ was the initial number of qubits used in the quantum correction procedure.
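The bookkeeping in this subsection can be sketched in a few lines. The example below is a hedged illustration only: it assumes Pauli terms are stored as strings over `I`, `X`, `Y`, `Z` in the rotated basis (so the generators are single-qubit `Z` strings), and the function name `move_qubits` and the three-qubit example sets are hypothetical.

```python
def move_qubits(s_nc, s_c, generators, i_add):
    """Move the qubits in i_add from the noncontextual part to the quantum correction.

    s_nc, s_c  : lists of Pauli strings (noncontextual and contextual terms)
    generators : list of single-qubit Z strings, one per noncontextual qubit
    i_add      : indices of the qubits to hand over to the quantum processor
    """
    i_add = set(i_add)

    def acts_on_added(pauli):
        # True if the term acts nontrivially on any qubit being moved.
        return any(pauli[k] != "I" for k in i_add)

    new_s_nc = [p for p in s_nc if not acts_on_added(p)]
    moved = [p for p in s_nc if acts_on_added(p)]
    new_s_c = list(s_c) + moved           # removed terms join the contextual set
    new_generators = [g for g in generators if not acts_on_added(g)]
    return new_s_nc, new_s_c, new_generators

# Hypothetical three-qubit example: qubits 0 and 1 start in the noncontextual part.
generators = ["ZII", "IZI"]
s_nc = ["ZII", "IZI", "ZZI"]
s_c = ["IIX", "XIX"]
new_s_nc, new_s_c, new_generators = move_qubits(s_nc, s_c, generators, i_add=[1])
```

Here moving qubit 1 removes `IZI` and `ZZI` from the noncontextual set and leaves one generator, so the generating set shrinks by exactly the number of moved qubits.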

B.2 Details of heuristics
The heuristic described in the main text starts from the pure noncontextual approximation and moves qubits two at a time to the quantum correction search space, greedily maximizing the improvement to the error at each step. We guessed that this heuristic might perform well for the following reasons. For the molecular Hamiltonians we tested, the noncontextual Hamiltonians chosen via the greedy heuristic discussed in [15] contain the diagonal terms in the full Hamiltonian (i.e., tensor products of combinations of Pauli Z and single-qubit identity), together with a single clique containing some off-diagonal terms. In particular, this means that the generating set G comprises single-qubit Z operators in the original basis, so the rotated basis is identical to the original basis. The highest weight terms that are not included in the noncontextual part are those containing only one or two off-diagonal Pauli tensor factors.
Figure 5: Number of terms simulated on the quantum processor required to reach chemical accuracy using CS-VQE versus using full VQE, for qubit ordering chosen by the optimal heuristic. The dashed line marks equality. All points represent either one, two, or three Hamiltonians.
As discussed in the main text, terms in the quantum correction Hamiltonian are "freed" to be optimized when the generators in the noncontextual part that they anticommute with are dropped. In the present case, these generators are the single-qubit Z operators that act on the qubits for which the terms in the quantum correction Hamiltonian are off-diagonal. Thus, greedily dropping pairs of generators allows a new subset of the highest weight terms in the quantum correction Hamiltonian to be freed for optimization at each step in the heuristic.
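The greedy selection described above can be sketched generically as follows. This is not the implementation used for the results in this paper: `error_fn` is a hypothetical stand-in for whatever routine returns the CS-VQE error when a given set of qubits is simulated on the quantum processor, and the toy error function below exists only to make the example runnable.

```python
from itertools import combinations

def greedy_order(n_qubits, error_fn, step=2):
    """Greedily move qubits, `step` at a time, into the quantum correction.

    error_fn(frozenset_of_qubits) -> error when those qubits are treated
    quantum mechanically (hypothetical callable supplied by the user).
    Returns a list of (chosen_qubits, error) pairs, one per step.
    """
    chosen = []
    remaining = set(range(n_qubits))
    history = [(tuple(chosen), error_fn(frozenset(chosen)))]
    while remaining:
        size = min(step, len(remaining))
        # Pick the group of qubits whose addition gives the lowest error.
        best = min(
            combinations(sorted(remaining), size),
            key=lambda group: error_fn(frozenset(chosen) | frozenset(group)),
        )
        chosen.extend(best)
        remaining -= set(best)
        history.append((tuple(chosen), error_fn(frozenset(chosen))))
    return history

# Toy error model (assumption): each qubit left out contributes a fixed penalty.
penalties = [5.0, 1.0, 3.0, 2.0]

def toy_error(qubits):
    return sum(w for k, w in enumerate(penalties) if k not in qubits)

history = greedy_order(4, toy_error, step=2)
```

With the toy penalties, the first step moves qubits 0 and 2, which carry the largest penalties, and the second step moves the remaining two.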
Of the other heuristics we tested, we refer to the one that led to the lowest errors as the optimal heuristic. The optimal heuristic begins with full VQE, then adds qubits to the noncontextual approximation one at a time while greedily minimizing the error penalty at each step. This heuristic is consequently as hard as performing full VQE, so using it in practical implementations would negate the value of CS-VQE. However, it is informative because it provides a good approximation to the optimal orderings and consequently to the ideal performance of CS-VQE.
For comparison to Fig. 2 and Fig. 3, which show the errors versus qubits and the terms required to reach chemical accuracy for the heuristic in the main text, we include here the corresponding figures for the optimal heuristic, as Fig. 4 and Fig. 5. Notably, the heuristic included in the main text matches the number of qubits required to reach chemical accuracy using the optimal heuristic in all but four cases. For F2, LiH in the 3-21G basis, and Mg, the heuristic in the main text requires one more qubit than the optimal heuristic, while for HeH+ in the 3-21G basis it requires one fewer qubit to reach chemical accuracy than the optimal heuristic.
As an alternative to heuristics that calculate actual energies, we tested two heuristics based on the total weight of the Pauli terms associated to each qubit. Both starting from the full VQE end (greedily minimizing the penalty for each qubit removed) and starting from the noncontextual end (greedily maximizing the improvement for each qubit added) had identical performance for the examples we tested, but unfortunately that performance was substantially worse than the performance of the heuristic discussed in the main text (as well as the optimal heuristic).
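A weight-based ordering of this kind might look like the sketch below. It rests on an assumption about what "total weight" means: here it is taken to be the sum of the absolute values of the coefficients of all Pauli terms that act nontrivially on a given qubit. The dictionary representation, the function name `qubit_weights`, and the example Hamiltonian are likewise hypothetical.

```python
def qubit_weights(hamiltonian):
    """Total weight per qubit for a Hamiltonian given as {pauli_string: coefficient}.

    Assumption: the weight of a qubit is the sum of |coefficient| over all
    terms that act nontrivially (not as identity) on that qubit.
    """
    n_qubits = len(next(iter(hamiltonian)))
    weights = [0.0] * n_qubits
    for pauli, coeff in hamiltonian.items():
        for k, symbol in enumerate(pauli):
            if symbol != "I":
                weights[k] += abs(coeff)
    return weights

# Hypothetical two-qubit Hamiltonian.
ham = {"ZI": 0.5, "XX": -0.2, "IZ": 0.1}
weights = qubit_weights(ham)
# Qubits sorted from largest to smallest total weight.
order = sorted(range(len(weights)), key=lambda k: -weights[k])
```

Such an ordering needs no energy evaluations, which is its appeal, although as noted above it performed worse than the energy-based heuristics in the examples tested.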

C Noncontextual Hamiltonians
As noted in the main text, a set of observables is noncontextual when it admits consistent joint valuations. The kind of contradiction that might prevent such a joint valuation is closely related to the notion of inference, which we can introduce as follows. If a pair of observables A, B commute, then they can be measured simultaneously together with their product AB. Thus, if we attempt to construct a classical, ontological model for some set of observables including A and B, in any assignment of values, the value assigned to AB must be the product of the values assigned to A and B. This is based on the fact that if an observer measures A, B, and AB, the values they obtain will always be consistent with the product relation. Hence, we say that given an assignment of values to A and B, we may infer the value assignment to AB.
Definition 2 (see [14]). Given an arbitrary set $S$ of Pauli operators, the closure under inference $\overline{S}$ of $S$ is the minimal set of Pauli operators, containing $S$ as a subset, such that for every commuting pair $A, B$ in $\overline{S}$, $AB$ is also in $\overline{S}$.
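As an illustration of Definition 2, the closure under inference of a small set can be computed by repeatedly adding products of commuting pairs. The sketch below makes two simplifying assumptions: Pauli operators are represented as strings over `I`, `X`, `Y`, `Z`, and overall phases and signs of products are ignored. The function names are hypothetical.

```python
from itertools import combinations

def commute(p, q):
    """Two Pauli strings commute iff they differ on an even number of
    positions where both are non-identity."""
    clashes = sum(a != "I" and b != "I" and a != b for a, b in zip(p, q))
    return clashes % 2 == 0

def product(p, q):
    """Product of two Pauli strings, ignoring the overall phase."""
    out = []
    for a, b in zip(p, q):
        if a == b:
            out.append("I")
        elif a == "I":
            out.append(b)
        elif b == "I":
            out.append(a)
        else:
            # Two distinct non-identity Paulis multiply to the third one.
            out.append(({"X", "Y", "Z"} - {a, b}).pop())
    return "".join(out)

def closure_under_inference(paulis):
    """Smallest superset closed under products of distinct commuting pairs."""
    closed = set(paulis)
    changed = True
    while changed:
        changed = False
        for p, q in combinations(sorted(closed), 2):
            if commute(p, q) and product(p, q) not in closed:
                closed.add(product(p, q))
                changed = True
    return closed

closure = closure_under_inference({"ZI", "IZ", "XX"})
```

In this example `ZI` and `IZ` commute, so `ZZ` is inferred, and `ZZ` commutes with `XX`, so `YY` is inferred in turn.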
Definition 3 (see [14]). A set S of Pauli operators is noncontextual if it is possible to assign values (±1) to S that respect all inference relations in S, i.e., such that for every commuting pair A, B ∈ S, the value assigned to AB is the product of the values assigned to A, B.
A set of Pauli operators $S_{nc}$ is noncontextual if and only if it has the form
$$S_{nc} = \mathcal{Z} \cup C_1 \cup C_2 \cup \cdots \cup C_N,$$
where $\mathcal{Z}$ is the subset of $S_{nc}$ containing all operators in $S_{nc}$ that commute with all other operators in $S_{nc}$, operators in the same $C_i$ commute, and operators in different $C_i$ anticommute [14]. In other words, $S_{nc}$ is noncontextual if and only if commutation is transitive on $S_{nc} \setminus \mathcal{Z}$. If commutation is transitive on a set, then because it is always reflexive and symmetric, it is an equivalence relation: as noted in the main text, we label its equivalence classes $C_i$, and also refer to these as cliques.
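The criterion above translates directly into a test for noncontextuality. The following is a minimal sketch, assuming Pauli operators are given as strings over `I`, `X`, `Y`, `Z`; the function name `noncontextual_structure` and the two example sets are hypothetical.

```python
def commute(p, q):
    """Two Pauli strings commute iff they differ on an even number of
    positions where both are non-identity."""
    clashes = sum(a != "I" and b != "I" and a != b for a, b in zip(p, q))
    return clashes % 2 == 0

def noncontextual_structure(paulis):
    """Return (is_noncontextual, center, cliques) for a list of Pauli strings."""
    paulis = list(dict.fromkeys(paulis))  # drop duplicates, keep order
    # Operators that commute with everything in the set.
    center = [p for p in paulis if all(commute(p, q) for q in paulis)]
    rest = [p for p in paulis if p not in center]
    # Tentatively group the rest by commutation with a clique representative.
    cliques = []
    for p in rest:
        for clique in cliques:
            if commute(p, clique[0]):
                clique.append(p)
                break
        else:
            cliques.append([p])
    # The grouping is valid only if operators commute exactly when they
    # share a clique, i.e. commutation is transitive outside the center.
    for i, ci in enumerate(cliques):
        for j, cj in enumerate(cliques):
            for p in ci:
                for q in cj:
                    if commute(p, q) != (i == j):
                        return False, center, cliques
    return True, center, cliques

ok, center, cliques = noncontextual_structure(["ZI", "IZ", "ZZ", "XX"])
bad, _, _ = noncontextual_structure(["XI", "IX", "ZI", "IZ"])
```

The first set is noncontextual with `ZZ` commuting with everything and two cliques, while in the second set `XI` commutes with `IX` and `IX` commutes with `ZI`, but `XI` and `ZI` anticommute, so commutation is not transitive.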
As in the main text, we partition a general Hamiltonian $H$ into a noncontextual part $H_{nc}$ and the remaining terms $H_c$, where the associated sets of Pauli operators are $S$, $S_{nc}$, and $S_c$, respectively. $S_{nc}$ must be closed under inference within $S$, which we may now define rigorously in terms of Definition 2: $S_{nc}$ is closed under inference within $S$ if and only if $S_{nc} = \overline{S_{nc}} \cap S$.
This is simply a formalization of Definition 1 in the main text. Subject to these constraints, we can choose H nc in any way we like.
The key step in building a quasi-quantized model for the noncontextual part $H_{nc}$ is construction of $R$, a new set of Pauli operators such that $\overline{R} = \overline{S_{nc}}$ (so that value assignments to $R$ induce value assignments to $S_{nc}$ by inference), and $R$ is independent:
Definition 4 (see [15]). A set $R$ of Pauli operators is independent if no value of any operator $A$ in $R$ can be inferred from any value assignment to a subset of $R$ not containing $A$. Equivalently, $R$ is independent if and only if for every commuting subset of $R$, its product is not in $R$.
Note that in a commuting set of Pauli operators, this notion of independence reduces to the usual definition of independence of subsets of an Abelian group [15]. Requiring R to be independent means that not only are some value assignments to R allowed (since R is noncontextual), but in fact all value assignments to R are allowed (which is not true for a general noncontextual set) [15].
The independent set $R$ is given by
$$R \equiv G \cup \{A_i \mid i = 1, 2, \ldots, N\},$$
where each $A_i \in C_i$, and $G$ is an independent generating set for
$$\mathcal{Z} \cup \{A_i P \mid P \in C_i \setminus \{A_i\},\ i = 1, 2, \ldots, N\}. \tag{60}$$
Note that since the set (60) is composed of elements of $\mathcal{Z}$ and products of pairs of elements in the same clique, its elements commute with all elements of $S_{nc}$. All elements of $G$ therefore commute with all of the $A_i$, although the $A_i$ pairwise anticommute (since each is an element of the corresponding $C_i$). As noted in (5) in the main text, states of the quasi-quantized model turn out to be equivalent to joint knowledge of the set of commuting observables
$$G \cup \{A\},$$
where the Pauli operators $G_j \in G$ have values $q_j = \pm 1$, and for some unit vector $\vec r$ the operator
$$A \equiv \sum_{i=1}^{N} r_i A_i$$
has value $+1$ (and consequently each operator $A_i$ has expectation value $r_i$). Since the Pauli operators $A_i$ anticommute and $|\vec r| = 1$, $A$ has eigenvalues $\pm 1$, as shown in [15] (see also Lemma A.1).
The states of the quasi-quantized model are thus parametrized by the values $(\vec q, \vec r)$, so we call such a parameter set a noncontextual state (which is the same as an epistemic state, as in [15,17]). The resulting expression for the expectation value of $H_{nc}$ is
$$\langle H_{nc} \rangle_{(\vec q, \vec r)} = \sum_{B \in \overline{G}} \Big( h_B + \sum_{i=1}^{N} h_{B,i}\, r_i \Big) \prod_{j \in J_B} q_j, \tag{63}$$
where the classical state parameters $(\vec q, \vec r)$ can take any values such that $q_j = \pm 1$ for each $j$ and $|\vec r| = 1$ [15]. The constants in (63), $h_B$ and $h_{B,i}$, are the coefficients in the original Hamiltonian $H_{nc}$ (under an efficiently classically calculable relabeling), and for each $B \in \overline{G}$, $J_B$ is the set of indices such that
$$B = \prod_{j \in J_B} G_j,$$
which is also efficiently classically calculable. Thus (63) expresses the expectation value of the noncontextual part of the Hamiltonian as a classical objective function of the parameters $(\vec q, \vec r)$, which may be both obtained and evaluated classically efficiently [15]. Given the objective function (63), estimating the ground state energy of the noncontextual part of the Hamiltonian requires minimizing (63) over the parameters $(\vec q, \vec r)$. For a Hamiltonian of $n$ qubits, the total dimension of $(\vec q, \vec r)$ is at most $2n + 1$. As noted in the main text, we refer to the setting $(\vec q, \vec r)$ that minimizes (63) as the noncontextual ground state.
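To make the classical objective concrete, the sketch below evaluates and minimizes a function of the assumed form: a sum over terms, each contributing $(h_B + \sum_i h_{B,i} r_i)$ times the product of the $q_j$ with $j \in J_B$. The data layout, the function names, and the brute-force search over $\vec q$ are all assumptions made for illustration. The search is exponential in the number of generators and is not meant as the minimization strategy of [15]. For fixed $\vec q$ the objective is linear in $\vec r$, so the minimizing unit vector is antiparallel to the coefficient vector.

```python
import itertools
import numpy as np

def energy(terms, q, r):
    """Objective value for terms given as (J_B, h_B, h_Bi) triples.

    J_B  : tuple of generator indices whose product is B
    h_B  : coefficient of B
    h_Bi : coefficients of B*A_i, one per clique
    """
    total = 0.0
    for J_B, h_B, h_Bi in terms:
        sign = np.prod([q[j] for j in J_B])  # empty product is 1
        total += (h_B + np.dot(h_Bi, r)) * sign
    return float(total)

def noncontextual_ground_state(terms, n_generators, n_cliques):
    """Brute-force minimization over q; closed-form minimization over r.

    Assumes n_cliques >= 1.
    """
    best = None
    for q in itertools.product([1, -1], repeat=n_generators):
        c0 = 0.0
        c = np.zeros(n_cliques)
        for J_B, h_B, h_Bi in terms:
            sign = np.prod([q[j] for j in J_B])
            c0 += h_B * sign
            c += sign * np.asarray(h_Bi, dtype=float)
        norm = np.linalg.norm(c)
        # Energy is c0 + c.r with |r| = 1, minimized at r = -c/|c|.
        r = -c / norm if norm > 0 else np.eye(n_cliques)[0]
        e = c0 - norm
        if best is None or e < best[0]:
            best = (float(e), q, r)
    return best

# Hypothetical example: H_nc = 0.5*G_0 + 0.3*A_1 + 0.4*A_2.
terms = [((0,), 0.5, [0.0, 0.0]), ((), 0.0, [0.3, 0.4])]
e_min, q_min, r_min = noncontextual_ground_state(terms, n_generators=1, n_cliques=2)
```

For this example the minimum is reached at $q_0 = -1$ and $\vec r = -(0.6, 0.8)$, giving energy $-1$.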