Quantum Error Mitigation using Symmetry Expansion

Even with the recent rapid developments in quantum hardware, noise remains the biggest challenge for the practical applications of any near-term quantum device. Full quantum error correction cannot be implemented in these devices due to their limited scale. Therefore, instead of relying on engineered code symmetries, symmetry verification was developed, which uses the inherent symmetries of the physical problem we are trying to solve. In this article, we develop a general framework named symmetry expansion which provides a wide spectrum of symmetry-based error mitigation schemes beyond symmetry verification, enabling us to achieve different balances between the estimation bias and the sampling cost of the scheme. We show that certain symmetry expansion schemes can achieve a smaller estimation bias than symmetry verification through cancellation between the biases due to the detectable and undetectable noise components. A practical way to search for such a small-bias scheme is introduced. By numerically simulating the Fermi-Hubbard model for energy estimation, the small-bias symmetry expansion we found can achieve an estimation bias 6 to 9 times below what is achievable by symmetry verification when the average number of circuit errors is between 1 and 2. The corresponding sampling cost for random shot noise reduction is just 2 to 6 times higher than for symmetry verification. Beyond symmetries inherent to the physical problem, our formalism is also applicable to engineered symmetries. For example, the recent scheme for exponential error suppression using multiple noisy copies of the quantum device is just a special case of symmetry expansion using the permutation symmetry among the copies.


Introduction
Zhenyu Cai: cai.zhenyu.physics@gmail.com

The ultimate goal of implementing a fully fault-tolerant quantum error correction scheme for quantum algorithms with provable speed-ups may still be years away. However, the recent rapid advances in quantum computing hardware have enabled certain computational tasks to be performed that are well beyond classical capabilities [1,2], prompting the question of whether it is possible to perform any practically useful tasks with the noisy intermediate-scale quantum (NISQ) devices that we will soon have.
One of the biggest roadblocks to such a goal is how to alleviate the damage done by the noise in these NISQ devices with the limited quantum resources we have. This brings us to quantum error mitigation, which, unlike quantum error correction, mostly relies on performing additional measurements instead of employing additional qubits to mitigate the damage caused by errors [3][4][5][6]. Quantum error mitigation has been successfully implemented in various experimental settings [7][8][9][10][11]. Symmetry verification is one such error-mitigation technique; it projects the noisy output quantum state back into the symmetry subspace defined by the physical problem we are trying to solve [12,13]. Besides measuring all symmetries in every circuit run and discarding the runs that fail any symmetry test, symmetry verification can also be carried out in a post-processing way with a much simpler measurement scheme inspired by quantum subspace expansion [12,14]. Since stabiliser code decoding can be viewed as symmetry verification with engineered symmetries, we can also apply similar post-processing techniques to reduce the practical challenges in stabiliser code implementations [15].
More recently, two teams have independently developed a quantum error mitigation scheme that uses multiple copies of the noisy quantum device to suppress errors exponentially as the number of copies increases [16,17]. The method was named virtual distillation by one of the teams. Its core idea relies on the permutation symmetry among the copies, but the way it is implemented does not involve projecting the noisy state into the correct symmetry subspace, and thus it does not fall within the symmetry verification framework. In fact, the corresponding permutation symmetry verification [18][19][20] can only suppress errors linearly with the number of copies instead of exponentially.
In this article, we will provide a general framework named symmetry expansion that encompasses a much wider range of symmetry-based error mitigation schemes beyond symmetry verification. We will discuss ways to search within this wide range of symmetry expansion schemes and identify one that can outperform symmetry verification, just as virtual distillation arises from the permutation symmetry group and vastly outperforms the permutation symmetry verification.
We will start by introducing the general concepts of quantum error mitigation in Section 2 and the framework of symmetry expansion in Section 3. Then in Section 4 and Section 5, we will discuss the estimation bias and the implementation costs of symmetry expansion. Moving on to Section 6, we will outline a practical way to identify a good symmetry expansion scheme. Finally, we will present an explicit numerical example of the application of symmetry expansion in Section 7, along with the conclusion and outlook in Section 8.

Quantum Error Mitigation
In quantum error correction, we encode the quantum information into a code space by employing additional qubits, so that we can recover the noiseless state ρ₀ by projecting our noisy state ρ back into the code space. In quantum error mitigation, on the other hand, instead of trying to recover the ideal state, we are trying to recover the ideal measurement statistics of some observable. For a given observable O, we will slightly abuse the notation and denote the random variable corresponding to its measurement results on the unmitigated noisy state ρ simply as Ô, whose expectation value is ⟨Ô⟩ = Tr(Oρ), while the random variable corresponding to the measurement results on the noiseless state ρ₀ is denoted as Ô₀, with ⟨Ô₀⟩ = Tr(Oρ₀). Using the noisy circuit to estimate the noiseless expectation value can lead to a large estimation bias |Bias[Ô]| = |⟨Ô⟩ − ⟨Ô₀⟩| when there is a large amount of noise present. Hence, we can instead try to construct an error-mitigated estimator Ô_em using the noisy data obtained from different noisy circuit configurations, which would hopefully lead to a smaller bias in the estimates:

|Bias[Ô_em]| = |⟨Ô_em⟩ − ⟨Ô₀⟩| ≤ |Bias[Ô]|.

Note that throughout this article, we often refer to the magnitude of the bias simply as the "bias", just as above; the exact meaning should be clear from the context.
An error-mitigated estimator that reduces the bias will usually come at the cost of a larger variance, which means that a larger number of samples is needed to reduce the shot noise in the sample estimate. Very often we care more about the factor of increase in variance of a given error mitigation scheme compared to the unmitigated scheme, which will be called its sampling cost:

C_em = Var[Ô_em] / Var[Ô].    (1)

Similarly for biases, we often care more about the fractional deviation from the ideal value, which we will call the relative bias and denote as:

ε[Ô] = −Bias[Ô]/⟨Ô₀⟩ = (⟨Ô₀⟩ − ⟨Ô⟩)/⟨Ô₀⟩.    (2)

The additional minus sign relative to the bias is for a simpler connection to quantum state infidelity later, but the essential idea does not change. When comparing two different error mitigation schemes, if one of the schemes has both a smaller bias and a smaller variance (sampling cost), then it is simply the better scheme. However, the more common and more interesting case is when one scheme has a smaller bias while the other has a smaller variance, which means we need to compare the bias-variance trade-off between the two schemes more carefully. As the number of samples increases, the shot-noise contribution from the variance will decrease and the error due to the bias will become more and more dominant; as a result, the small-bias scheme becomes more and more favourable. In Appendix A, we have shown that in practice the minimum number of samples required for a small-bias estimator A to outperform a small-variance estimator B for a Pauli observable O is:

N_min = (C_A − C_B) Var[Ô] / (Bias[Ô_B]² − Bias[Ô_A]²),    (3)

where Var[Ô] ≤ 1 for a Pauli observable. As we will see later, the general framework of symmetry expansion will enable us to predict the bias and variance of different symmetry-based error-mitigation schemes, and thus enable us to choose a suitable scheme for our given scenario (or at least rule out some impractical schemes).
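The bias-variance trade-off above can be made concrete with a small numerical sketch (all numbers here are illustrative assumptions, not taken from the text): scheme A has the smaller bias but the larger sampling cost, and overtakes scheme B once the shot budget exceeds the crossover point obtained by equating the two mean-squared errors.

```python
# Illustrative (made-up) parameters for two error-mitigation schemes
# acting on a Pauli observable (single-shot variance bounded by 1).
bias_A, cost_A = 0.01, 10.0   # small-bias scheme A
bias_B, cost_B = 0.05, 2.0    # small-variance scheme B
var_unmit = 1.0               # Var upper bound for a Pauli observable

def mse(bias, cost, n):
    # mean-squared error = bias^2 + (shot-noise variance) / n
    return bias**2 + cost * var_unmit / n

# crossover: A wins once n * (bias_B^2 - bias_A^2) >= (cost_A - cost_B) * Var
n_min = (cost_A - cost_B) * var_unmit / (bias_B**2 - bias_A**2)

print(f"N_min = {n_min:.0f}")
assert mse(bias_A, cost_A, 2 * n_min) < mse(bias_B, cost_B, 2 * n_min)
assert mse(bias_A, cost_A, 0.5 * n_min) > mse(bias_B, cost_B, 0.5 * n_min)
```

Below the crossover the small-variance scheme wins; above it the small-bias scheme does, matching the qualitative discussion above.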

Symmetry Expansion
We are given a circuit that would ideally generate the noiseless state |ψ₀⟩, which is stabilised by a group of symmetry operators G:

G|ψ₀⟩ = |ψ₀⟩  ∀ G ∈ G.

The group of symmetries is usually identified through our knowledge of the structure of the circuit and/or the physical meaning of the state (see Appendix B). The symmetry subspace that |ψ₀⟩ lives in, within which all G ∈ G act trivially, is defined by the projector [21]:

Π_G = (1/|G|) Σ_{G∈G} G,    (4)

which, being a uniform average of elements of G, is Hermitian:

Π_G† = Π_G.    (5)

In reality, the output of the circuit is a noisy state ρ that might have components outside the symmetry subspace. Hence, we will project ρ back into the symmetry subspace to mitigate the damage of the noise, and we will call this process symmetry verification [12,13]. The effective density matrix we obtain after symmetry verification is simply:

ρ_sym = Π_G ρ Π_G / Tr(Π_G ρ Π_G).    (6)

Throughout this article, we will use ⟨O⟩ = Tr(Oρ) to denote the expectation value of some observable O for the noisy circuit. Using Eq. (6), the expectation value of some observable O after symmetry verification will be:

Tr(O ρ_sym) = Tr(O Π_G ρ Π_G) / Tr(Π_G ρ Π_G).    (7)

In direct symmetry verification, we will measure the observable O and the symmetry verification result Π_G in the same circuit run, and discard the circuit runs that have Π_G measured to be 0. This would require

[O, Π_G] = 0,    (8)

and thus, using Π_G² = Π_G, we have:

Tr(O ρ_sym) = Tr(O Π_G ρ) / Tr(Π_G ρ) = ⟨O Π_G⟩ / ⟨Π_G⟩.    (9)

Alternatively, Eq. (9) will also hold if we have

[ρ, Π_G] = 0.    (10)

How this may arise will be further discussed in Section 6.4. Note that the normalisation factor ⟨Π_G⟩ is essentially the fraction of circuit runs that pass the symmetry verification (i.e. those runs with Π_G measured to be 1).
We can also apply symmetry verification in a post-processing way. Using Eq. (4), we see that we can actually obtain ⟨Π_G⟩ by uniformly sampling random G ∈ G. Similarly, we can also obtain ⟨OΠ_G⟩ by uniformly sampling random OG for G ∈ G. In each circuit run, if we measure O along with a random G ∈ G, we can then compose the outcomes to obtain both ⟨G⟩ and ⟨OG⟩, thus effectively obtaining samples for ⟨Π_G⟩ and ⟨OΠ_G⟩, which in turn give us the estimate of Tr(Oρ_sym) via Eq. (9).
Instead of sampling uniformly from G, we can sample each symmetry G with a positive weight w_G; we will call this more general sampling scheme symmetry expansion. Denoting the sampling weight distribution as w, the resultant observable following our new sampling scheme will be:

⟨O⟩_w = ⟨OΓ_w⟩ / ⟨Γ_w⟩ = Tr(Oρ_w),    (11)

where Γ_w is the symmetry expansion operator:

Γ_w = Σ_{G∈G} w_G G / Σ_{G∈G} w_G,    (12)

and ρ_w is the symmetry-expanded 'density operator':

ρ_w = Γ_w ρ / Tr(Γ_w ρ) = Γ_w ρ / ⟨Γ_w⟩.    (13)

The symmetry expansion operation acting on ρ here is not necessarily a positive map. Hence, though ρ_w has unit trace as expected, it is not necessarily positive semi-definite and thus not necessarily a valid density operator in the usual sense. Note that Γ_w is also a symmetry operator:

Γ_w |ψ₀⟩ = |ψ₀⟩.    (14)

An important class of symmetry expansions is a uniform expansion using a subset of symmetry operators F ⊆ G (note that F is not necessarily a group), which will be our main focus later on. We will denote the corresponding weight vector as w = F, and the symmetry expansion operator is simply

Γ_F = (1/|F|) Σ_{F∈F} F.    (15)

When F contains only one symmetry operator F, we will slightly abuse the notation and write F = F and thus Γ_F ≡ F. The unmitigated noisy state is simply a special case of symmetry expansion using just the identity operator: Γ_I = I and ρ_I = ρ. Symmetry verification, when used for the purpose of expectation value estimation in Eq. (9), is equivalent to a uniform symmetry expansion over the full set of symmetry operators G: Γ_G = Π_G and Tr(Oρ_G) = ⟨OΠ_G⟩/⟨Π_G⟩ = Tr(Oρ_sym). Hence, from here on, our discussion of general symmetry expansion schemes will also be applicable to

• the unmitigated case: w = I,
• the symmetry-verified case: w = G.
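To make the estimator concrete, here is a minimal numpy sketch on a single-qubit toy problem (the state, noise strength, and symmetry group {I, Z} are all illustrative assumptions, not the paper's setup): the ideal state |0⟩ is stabilised by Z, and the weighted estimate ⟨OΓ_w⟩/⟨Γ_w⟩ reproduces both the unmitigated case (w = I) and symmetry verification (uniform weights over {I, Z}) as special cases.

```python
import numpy as np

# Toy setup: the ideal state |0><0| is stabilised by the group {I, Z}.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

psi0 = np.diag([1.0, 0.0])                       # ideal density matrix |0><0|
p = 0.2
rho = (1 - p) * psi0 + p * np.diag([0.0, 1.0])   # bit-flip noise leaves the symmetry subspace

def expand(O, rho, ops, weights):
    # symmetry-expanded estimate <O>_w = Tr(O Γ_w ρ) / Tr(Γ_w ρ),
    # with Γ_w the weighted average of the sampled symmetry operators
    w = np.asarray(weights, dtype=float)
    gamma = sum(wi * op for wi, op in zip(w, ops)) / w.sum()
    return np.trace(O @ gamma @ rho).real / np.trace(gamma @ rho).real

unmit = expand(Z, rho, [I2], [1.0])               # w = I: plain noisy expectation, 1 - 2p
verified = expand(Z, rho, [I2, Z], [1.0, 1.0])    # uniform over the group = symmetry verification
print(unmit, verified)   # the detectable bit-flip error is fully removed: 0.6 -> 1.0
```

Here the error is entirely detectable by Z, so verification recovers the ideal expectation value exactly; with partially undetectable noise the residual bias discussed below would remain.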
As discussed in Ref. [12], symmetry verification is just a special case of quantum subspace expansion [14], and the idea of changing the weights of the symmetries in the projection operator has been discussed before under the quantum subspace expansion framework [15]. Assuming all symmetry operators G ∈ G commute with the observable O or with the noisy state ρ (a special case of Eq. (8) and Eq. (10)), the expectation value after performing quantum subspace expansion using the expansion operator Γ_v is given by

⟨O⟩_v = Tr(O Γ_v ρ Γ_v†) / Tr(Γ_v ρ Γ_v†).    (16)

Such quantum subspace expansion using symmetry operators will simply be called symmetry subspace expansion (not to be confused with symmetry expansion). Compared to Eq. (11), we see that symmetry subspace expansion is simply the special case of symmetry expansion in which the expansion operator Γ_w is (proportional to) the square of some other expansion operator, Γ_w ∝ Γ_v Γ_v†, and thus Γ_w is positive semi-definite. In Appendix C, we have shown that the symmetry subspace expansion scheme that maximises the fidelity against the ideal state is simply the symmetry verification scheme. A non-uniform weight distribution for symmetry subspace expansion will only bring advantages over symmetry verification when we are restricted to a truncated set of symmetry operators that do not form a group [15]. Hence, in this article we will only focus on the comparison between symmetry verification and symmetry expansion, since we are interested in the case in which we have access to the full set of symmetries.
To perform direct symmetry verification for a general symmetry group, we can use an ancilla register with its different basis states corresponding to different elements of the group, then perform controlled-symmetry operations from the ancilla to the data register and measure out the ancilla, as outlined in Ref. [19]. For Pauli symmetry groups (stabiliser groups), direct symmetry verification can be carried out by simply measuring the group generators, since such groups are abelian. Similarly, for post-processing symmetry verification, Pauli symmetries are much easier to measure than general symmetries since they are Hermitian. Hence, Pauli symmetries are the most relevant to practical implementations of symmetry-based error mitigation techniques and will be our main focus in this article. As a result, the symmetry expansion operator Γ_w we consider will be Hermitian, and since all the symmetries are also unitary, we have

G† = G = G⁻¹  ∀ G ∈ G,    (17)

and thus Γ_w† = Γ_w. Though our discussion mainly focuses on Pauli symmetries, many of the arguments hereafter are still applicable to general symmetries. An example of the application of symmetry expansion beyond Pauli symmetries is the virtual distillation scheme mentioned in Section 1, which makes use of the permutation symmetries among identical copies of the quantum system. The details of how it relates to symmetry expansion are outlined in Appendix H. Being a symmetry expansion scheme, it was shown to vastly outperform the corresponding symmetry verification scheme via both theoretical analysis and numerical simulations [16,17].

Estimation Bias of Symmetry Expansion

Absolute Infidelity
Using Eq. (2), the relative bias for a given observable O when symmetry expanded with weight w is simply:

ε_w(O) = (⟨O⟩₀ − Tr(Oρ_w)) / ⟨O⟩₀ = 1 − Tr(Oρ_w)/⟨O⟩₀.    (18)

Here ρ₀ = |ψ₀⟩⟨ψ₀| and ρ_w are the ideal state and the symmetry-expanded state, respectively. Recall that the unmitigated case is simply w = I while the symmetry-verified case is w = G.
To show that symmetry verification can produce a smaller estimation bias than the unmitigated case, one possible way is to compare their output state fidelities against the ideal state:

Tr(ρ₀ ρ) = ⟨ρ₀⟩,    (19)
Tr(ρ₀ ρ_G) = Tr(ρ₀ Γ_G ρ)/⟨Γ_G⟩ = ⟨ρ₀⟩/⟨Γ_G⟩.    (20)

Here we have used Eq. (13) and Eq. (14). We see that the symmetry-verified fidelity is just the unmitigated noisy fidelity ⟨ρ₀⟩ boosted by a factor of ⟨Γ_G⟩⁻¹.
We might be tempted to generalise the above argument and try to predict the performance of symmetry expansion using the symmetry-expanded fidelity:

F_w = Tr(ρ₀ ρ_w) = ⟨ρ₀⟩/⟨Γ_w⟩,    (21)

which is just the unmitigated noisy fidelity ⟨ρ₀⟩ boosted by a factor of ⟨Γ_w⟩⁻¹. However, unlike the unmitigated and symmetry-verified cases, Tr(ρ₀ρ_w) is no longer upper-bounded by 1, since ρ_w might not be positive semi-definite as mentioned in Section 3. Hence, we instead construct a metric that measures how far F_w = Tr(ρ₀ρ_w) deviates from the ideal value of 1, which is equivalent to the magnitude of the bias in Eq. (18) with the observable being the ideal state ρ₀:

|ε_w(ρ₀)| = |1 − Tr(ρ₀ρ_w)| = |1 − ⟨ρ₀⟩/⟨Γ_w⟩|.    (22)

For the unmitigated and symmetry-verified cases, we have:

ε_I(ρ₀) = 1 − ⟨ρ₀⟩,  ε_G(ρ₀) = 1 − ⟨ρ₀⟩/⟨Γ_G⟩,

which are simply the output state infidelities in the corresponding cases. Hence, we will call this metric the absolute infidelity.
In this article, we are going to use the absolute infidelity, which is the magnitude of the bias with observable ρ 0 , as our metric for the magnitude of the bias for a given error mitigation scheme. We have further discussed why this is a reasonable metric in Appendix D, and the validity of this metric will also be further verified in our numerical simulation later in Section 7.

Mechanism of Symmetry Expansion
The erroneous component of the symmetry-expanded state ρ_w is defined to be the component orthogonal to the ideal state ρ₀, which can be obtained by applying the projector 1 − ρ₀. Within this erroneous component, there are parts living within the symmetry subspace defined by the projector Γ_G, which are undetectable via symmetry verification. The coefficient of these undetectable erroneous components in ρ_w is simply:

B_u,w = Tr[(Γ_G − ρ₀) ρ_w] = (⟨Γ_G⟩ − ⟨ρ₀⟩)/⟨Γ_w⟩.    (23)

On the other hand, the erroneous components living outside the symmetry subspace are detectable, and their coefficient in ρ_w is:

B_d,w = Tr[(1 − Γ_G) ρ_w] = (⟨Γ_w⟩ − ⟨Γ_G⟩)/⟨Γ_w⟩.    (24)

From Eq. (22), we see that the absolute infidelity is simply the magnitude of the sum of the detectable and undetectable error coefficients:

|ε_w(ρ₀)| = |B_d,w + B_u,w|.    (25)

Hence, B_d,w and B_u,w are simply the biases of the symmetry-expanded estimate of the observable ρ₀ due to the detectable and undetectable errors, respectively. We will simply call them the detectable and undetectable bias. In the special case of w = I, i.e. the unmitigated case, B_u,I and B_d,I are simply the undetectable and detectable error probabilities, respectively. From Eq. (22), we can see that symmetry expansion with negative ⟨Γ_w⟩ will lead to |ε_w(ρ₀)| ≥ 1 ≥ |ε_I(ρ₀)|, i.e. the corresponding symmetry expansion scheme will perform worse than the unmitigated case, so there is no point in applying symmetry expansion in this regime. Therefore, we will only consider symmetry expansion operators with positive expectation values: 0 < ⟨Γ_w⟩ ≤ 1. In this regime, B_u,w will increase as ⟨Γ_w⟩ decreases and will always be positive, while B_d,w will decrease as ⟨Γ_w⟩ decreases and will become negative once ⟨Γ_w⟩ < ⟨Γ_G⟩. Hence, the expression for the absolute infidelity can be split into the following regions:

|ε_w(ρ₀)| = B_u,w + B_d,w   for ⟨Γ_G⟩ ≤ ⟨Γ_w⟩ ≤ 1,
|ε_w(ρ₀)| = B_u,w − |B_d,w|   for ⟨ρ₀⟩ ≤ ⟨Γ_w⟩ ≤ ⟨Γ_G⟩,
|ε_w(ρ₀)| = |B_d,w| − B_u,w   for 0 < ⟨Γ_w⟩ ≤ ⟨ρ₀⟩.    (26)

Figure 1: The change in the absolute infidelity, along with its detectable and undetectable components, against the change in the noisy expectation value of the symmetry expansion operator. We always have ⟨ρ₀⟩ ≤ ⟨Γ_G⟩ since the symmetry-verified fidelity in Eq. (20) is always smaller than 1.

Note that the absolute infidelity is simply the magnitude of the sum of the detectable bias and the undetectable bias, which is plotted in Fig. 1. Let us now go through these different regions of symmetry expansion.
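The three regimes can be checked numerically. A small sketch with illustrative values for the noisy fidelity ⟨ρ₀⟩ and the symmetry-subspace weight ⟨Γ_G⟩ (both made up for this example):

```python
import numpy as np

f0, gG = 0.60, 0.75   # assumed <rho_0> and <Γ_G>; must satisfy f0 <= gG <= 1

def biases(gw):
    # detectable and undetectable bias components at a given <Γ_w> = gw
    B_d = (gw - gG) / gw
    B_u = (gG - f0) / gw
    return B_d, B_u

for gw in (1.0, gG, f0, 0.5):
    B_d, B_u = biases(gw)
    # their sum always recovers ε_w(ρ0) = 1 - <rho_0>/<Γ_w>
    assert np.isclose(B_d + B_u, 1 - f0 / gw)

assert np.isclose(biases(gG)[0], 0.0)                  # B_d vanishes at symmetry verification
assert np.isclose(sum(biases(f0)), 0.0)                # perfect cancellation at <Γ_w> = <rho_0>
```

Sweeping gw from 1 down through gG and f0 traces out the standard-behaviour, bias-cancellation, and negative-infidelity regions in turn.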

Standard behaviour: ⟨Γ_G⟩ ≤ ⟨Γ_w⟩ ≤ 1
Here both the detectable bias B_d,w and the undetectable bias B_u,w are positive, and thus the absolute infidelity is simply |ε_w(ρ₀)| = B_u,w + B_d,w. At the start of this region we have ⟨Γ_w⟩ = ⟨Γ_I⟩ = 1, which is simply the noisy unmitigated case. As we modify our symmetry expansion scheme to decrease ⟨Γ_w⟩, the undetectable bias B_u,w will increase while the detectable bias B_d,w will decrease. In this region, the absolute infidelity |ε_w(ρ₀)| will decrease overall as ⟨Γ_w⟩ decreases.
When we reach the point ⟨Γ_w⟩ = ⟨Γ_G⟩, we have arrived at symmetry verification, for which B_d,G = 0. Hence, symmetry verification has removed all the detectable error components and we have |ε_w(ρ₀)| = B_u,w.

Bias cancellation: ⟨ρ₀⟩ ≤ ⟨Γ_w⟩ ≤ ⟨Γ_G⟩
In this region the detectable bias B_d,w turns negative and thus the overall error becomes |ε_w(ρ₀)| = B_u,w − |B_d,w|. Since the detectable bias B_d,w and the undetectable bias B_u,w now have different signs, the absolute infidelity becomes the difference between the magnitudes of the detectable and undetectable biases, allowing cancellation between them and thus reducing the overall bias.
In this region, the absolute infidelity |ε_w(ρ₀)| will still decrease overall as ⟨Γ_w⟩ decreases. As we arrive at the point ⟨Γ_w⟩ = ⟨ρ₀⟩, we have B_u,w = |B_d,w|, which means perfect cancellation between the detectable and undetectable biases, and thus the absolute infidelity is reduced to |ε_w(ρ₀)| = 0.

Negative infidelity: 0 < ⟨Γ_w⟩ ≤ ⟨ρ₀⟩
In this region, we still have a negative detectable bias B_d,w, and thus there will still be cancellation between the detectable and undetectable biases. However, now the detectable bias has grown to a larger magnitude than the undetectable bias, and thus in this region ε_w(ρ₀) is negative, which means:

|ε_w(ρ₀)| = |B_d,w| − B_u,w.

The absolute infidelity |ε_w(ρ₀)| will increase as we decrease ⟨Γ_w⟩. When ⟨Γ_w⟩ decreases beyond ⟨ρ₀⟩⟨Γ_G⟩/(2⟨Γ_G⟩ − ⟨ρ₀⟩), we have |ε_w(ρ₀)| > |ε_G(ρ₀)|, i.e. symmetry expansion will have a larger bias than symmetry verification. When ⟨Γ_w⟩ decreases beyond ⟨ρ₀⟩/(2 − ⟨ρ₀⟩), we have |ε_w(ρ₀)| > |ε_I(ρ₀)|, i.e. symmetry expansion will have a larger bias than the unmitigated state.

Simple example of bias cancellation
As explained above, some symmetry expansion schemes can achieve a smaller absolute infidelity than symmetry verification due to the cancellation between the detectable and undetectable biases. We will further illustrate this through a simple example in which there is just one single non-trivial symmetry operator: G = {I, G}.
We will compare the performance of three schemes: the unmitigated case w = I, symmetry verification w = G = {I, G}, and symmetry expansion using the single operator w = G. As mentioned, the undetectable/detectable biases for the unmitigated case are simply the probabilities that some errors undetectable/detectable by the symmetry G have occurred in a given circuit run, which we denote as P_u/P_d:

B_u,I = P_u,  B_d,I = P_d.    (27)

Hence, we can write the absolute infidelities of the various schemes in terms of P_u and P_d using Eq. (22):

ε_I(ρ₀) = P_u + P_d,
ε_{I,G}(ρ₀) = P_u / (1 − P_d),
ε_G(ρ₀) = (P_u − P_d) / (1 − 2P_d),    (28)

where ε_{I,G} denotes the symmetry-verified case w = {I, G}. We see that ε_I(ρ₀) is proportional to the sum of P_u and P_d since none of the errors are removed. We also have ε_{I,G}(ρ₀) being proportional to just P_u since the detectable errors are removed by symmetry verification. On the other hand, ε_G(ρ₀) is proportional to the difference between P_u and P_d, allowing cancellation between the detectable and undetectable error probabilities, which is why symmetry expanding with w = G can achieve a smaller bias than symmetry verification.
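Plugging illustrative numbers into the three infidelity expressions (reconstructed from ε_w(ρ₀) = 1 − ⟨ρ₀⟩/⟨Γ_w⟩ with ⟨ρ₀⟩ = 1 − P_u − P_d; the P values here are made up) shows the cancellation at work:

```python
P_u, P_d = 0.05, 0.04   # assumed undetectable / detectable error probabilities

eps_unmit = P_u + P_d                       # w = I: no errors removed
eps_verif = P_u / (1 - P_d)                 # w = {I, G}: detectable errors removed
eps_single = (P_u - P_d) / (1 - 2 * P_d)    # w = G: detectable and undetectable biases cancel

print(eps_unmit, eps_verif, eps_single)     # 0.09 > ~0.052 > ~0.011
```

Because P_d is close to P_u here, expanding with the single operator G nearly cancels the two biases and beats verification by a large margin; if P_d were much larger than P_u, the single-operator scheme would overshoot instead.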

Implementation
In direct symmetry verification, we need to measure the whole set of symmetry generators and the observable in the same circuit run, and discard runs that fail any of the symmetry tests. By using additional circuit structures to simultaneously diagonalise all the observables and symmetries we want to measure [22,23], we would only need to perform local measurements directly on the data register without needing any ancilla. The additional circuit structure required can be vastly simplified (or even removed) by using post-processing symmetry verification and, more generally, symmetry expansion, for which we only need to measure one symmetry operator along with the observables in each circuit run. The Fermi-Hubbard simulation that we will study in our numerical simulation later in Section 7 is an example in which no additional circuit structure is required to perform our measurements. The detailed measurement scheme for this case can be found in Ref. [24]. Since we are discarding circuit runs that fail the symmetry test in direct symmetry verification, we are essentially changing the form of the effective error channels in the circuit. Therefore, combining direct symmetry verification with other error mitigation techniques like quasi-probability and error extrapolation would require elaborate schemes [5], since they all rely on our knowledge of the error channels. On the other hand, there is no discarding of circuit runs in symmetry expansion, and thus it can be combined with other error mitigation techniques straightforwardly, e.g. by simply performing symmetry expansion following Eq. (11) using ⟨OΓ_w⟩ and ⟨Γ_w⟩ values that are themselves error-mitigated using the other techniques.

Sampling Cost
Now looking beyond the circuit implementation, we want to know how many circuit runs are needed for symmetry expansion. For a given error mitigation scheme, we have defined the corresponding sampling cost in Eq. (1) as the factor of increase in the number of circuit runs required compared to the unmitigated case. As shown in Appendix G, the sampling costs for implementing direct symmetry verification and symmetry expansion are

C_dir = ⟨Γ_G⟩⁻¹,  C_w = ⟨Γ_w⟩⁻².    (29)

As can be seen from Section 4.1, in the region where ⟨Γ_w⟩ ≥ ⟨ρ₀⟩ (i.e. where ε_w(ρ₀) is positive), the symmetry-expanded fidelity behaves like a regular fidelity and thus can be used as a metric for the estimation bias of symmetry expansion. For all symmetry expansions in this region, including post-processing symmetry verification, if we are allowed a sampling cost C for our error mitigation technique, symmetry expansion can give us a factor of √C boost in the fidelity based on Eq. (21) and Eq. (29). Hence, we see that symmetry expansion in this regime is as 'cost-effective' as post-processing symmetry verification. Direct symmetry verification, on the other hand, can achieve a factor of C boost in the fidelity and is thus more 'cost-effective', but it is more difficult to implement due to the measurement requirements and harder to combine with other error mitigation techniques, as discussed in Section 5.1.
In the region where ⟨Γ_w⟩ < ⟨ρ₀⟩, the symmetry-expanded fidelity is no longer well-defined, and thus our metric for the estimation bias becomes the absolute infidelity, as discussed in Section 4.1. In this region we can achieve the same bias as symmetry expansion in the ⟨Γ_w⟩ ≥ ⟨ρ₀⟩ region, but at a smaller ⟨Γ_w⟩ and thus a larger sampling cost C_w. Hence, symmetry expansion in this region is less 'cost-effective' than post-processing symmetry verification.
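The cost-effectiveness comparison can be checked in one line per regime (a sketch; the cost scalings C_w = ⟨Γ_w⟩⁻² for post-processing schemes and C = ⟨Γ_w⟩⁻¹ for direct verification follow the discussion above, and the ⟨Γ_w⟩ values are illustrative):

```python
for gw in (0.9, 0.7, 0.5):      # assumed noisy expectations <Γ_w>
    boost = 1 / gw              # fidelity boost factor
    c_expansion = gw ** -2      # post-processing expansion / verification cost
    c_direct = gw ** -1         # direct verification cost (inverse of kept-run fraction)
    assert abs(boost - c_expansion ** 0.5) < 1e-12   # boost = sqrt(C) for expansion
    assert abs(boost - c_direct) < 1e-12             # boost = C for direct verification
print("cost-effectiveness relations verified")
```

The same fidelity boost therefore costs quadratically more samples for the post-processing schemes than for direct verification, which is the trade-off against their easier implementation.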
Searching for Suitable Symmetry Expansion Schemes

Estimation of Fidelity and Symmetry Expectation Value
As discussed in Section 4 and Section 5, we can estimate the bias and the sampling cost of a given symmetry expansion scheme using the expectation value of its symmetry expansion operator ⟨Γ_w⟩ and the unmitigated fidelity ⟨ρ₀⟩. Hence, in order to compare different symmetry expansion schemes, we first need to estimate ⟨ρ₀⟩ and ⟨Γ_w⟩.
We will call the expected number of errors occurring in each circuit run the mean circuit error count and denote it by μ. In practical NISQ applications, we need a circuit that is large enough that the number of error locations is much greater than 1, yet not so noisy that the mean circuit error count exceeds order unity: μ ∼ 1. In this limit, by Le Cam's theorem, the probability that n errors occur in a given circuit run follows the Poisson distribution, provided the noise is Markovian:

P_n = e^(−μ) μⁿ / n!.

Hence, the probability that there are no errors in the circuit is:

P₀ = e^(−μ).

Let us use f to denote the effective fraction of errors that would move the ideal state |ψ₀⟩ out of the 1D subspace it spans. Then the average number of effective errors occurring on the ideal state is simply fμ, and the fidelity ⟨ρ₀⟩ is just the probability that zero effective errors happen:

⟨ρ₀⟩ = e^(−fμ).    (30)

In practice, we usually assume f = 1 for a crude estimate of ⟨ρ₀⟩, which gives ⟨ρ₀⟩ = e^(−μ). Moving on, using Eq. (12), ⟨Γ_w⟩ is simply

⟨Γ_w⟩ = Σ_{G∈G} w_G ⟨G⟩ / Σ_{G∈G} w_G.

Hence, we can estimate ⟨Γ_w⟩ by simply measuring G with sampling weight w_G at the end of the circuit.
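A small sketch of these estimates (the gate error probability is an assumed illustrative number; the gate count matches the 8-qubit circuit described later in Section 7):

```python
from math import exp, factorial

p_gate = 0.007            # assumed two-qubit gate error probability (illustrative)
n_gates = 144             # e.g. the 8-qubit Fermi-Hubbard circuit in Section 7
mu = p_gate * n_gates     # mean circuit error count, mu ~ 1

# Poisson error-count distribution: P_n = e^{-mu} mu^n / n!
P = [exp(-mu) * mu ** n / factorial(n) for n in range(10)]
assert abs(sum(P) - 1.0) < 1e-4   # first ten terms capture nearly all the mass

f = 1.0                           # crude assumption: every error moves us off the ideal state
fidelity_estimate = exp(-f * mu)  # <rho_0> ~ e^{-f mu}
print(f"mu = {mu:.3f}, P_0 = {P[0]:.3f}, <rho_0> ~ {fidelity_estimate:.3f}")
```

With f = 1 the fidelity estimate coincides with the zero-error probability P₀; any f < 1 would only raise it, which is why e^(−μ) serves as a lower bound later in the search.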

Search for a Small-bias Symmetry Expansion
In many practical applications, due to the large amount of circuit noise present, the main bottleneck of quantum error mitigation is often still achieving a small enough estimation bias. Therefore, we hope to use symmetry expansion to reduce the estimation bias of symmetry-based error mitigation beyond what symmetry verification can achieve. Here we will discuss how to search for a symmetry expansion scheme that has a smaller absolute infidelity, and thus a smaller bias, than symmetry verification. The absolute infidelity |ε_w(ρ₀)| of a given symmetry expansion can be estimated from ⟨ρ₀⟩ and ⟨Γ_w⟩ using Eq. (22). However, due to the approximations made in estimating ⟨ρ₀⟩ and ⟨Γ_w⟩, it is usually hard to fine-tune our expansion weights w such that the absolute infidelity |ε_w(ρ₀)| reaches the exact value we want. On the other hand, the precision of our |ε_w(ρ₀)| estimates is often enough for comparing among the restricted class of symmetry expansion schemes mentioned in Section 3, in which we only consider a uniform expansion using a subset of symmetry operators F ⊆ G. For such symmetry expansion schemes, we have:

⟨Γ_F⟩ = (1/|F|) Σ_{F∈F} ⟨F⟩.    (31)

In order for the symmetry expansion scheme with w = F to achieve an absolute infidelity |ε_F(ρ₀)| smaller than some threshold δ, using Eq. (22) we need to have:

⟨ρ₀⟩/(1 + δ) ≤ ⟨Γ_F⟩ ≤ ⟨ρ₀⟩/(1 − δ).    (32)

Finding symmetry expansion schemes that have a smaller bias than symmetry verification is simply the case of δ = |ε_G(ρ₀)|. One way to construct symmetry expansions that fulfil this requirement is by first identifying the set R ⊆ G consisting of all the symmetry operators whose noisy expectation values fall within this range:

R = {G ∈ G : ⟨ρ₀⟩/(1 + δ) ≤ ⟨G⟩ ≤ ⟨ρ₀⟩/(1 − δ)}.    (33)

In this way, any subset F ⊆ R will always satisfy Eq. (32). Using Eq. (32), we see that a scheme with a smaller |⟨Γ_F⟩ − ⟨ρ₀⟩| can achieve a smaller estimation bias. Following Eq. (30) and assuming f ≈ 1, we have ⟨ρ₀⟩ ≈ e^(−μ).
Hence, we should be able to achieve a small estimation bias using the set of symmetry operators that minimises |⟨Γ_F⟩ − e^(−μ)|, which is denoted as

G* = argmin_{F⊆G} |⟨Γ_F⟩ − e^(−μ)|.    (34)

We need to be careful about the case in which we have two schemes whose ⟨Γ_F⟩ are of similar distance to e^(−μ), one from above and one from below. In this case, the scheme with the larger ⟨Γ_F⟩ is expected to lead to a smaller bias, as its ⟨Γ_F⟩ will be closer to the actual ⟨ρ₀⟩ (which is lower-bounded by e^(−μ)), and it is also expected to have a smaller sampling cost, as discussed in Section 5.2. Hence, the scheme with the larger ⟨Γ_F⟩ is generally preferred in such a case. We will call the symmetry expansion scheme found in the way outlined above the small-bias expansion scheme.
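When the (reduced) symmetry group is small, this search can be done by brute force over subsets. A sketch with hypothetical measured ⟨G⟩ values (the operator names and numbers are made up for illustration):

```python
from itertools import combinations
import math

mu = 1.0                      # assumed estimated mean circuit error count
target = math.exp(-mu)        # crude fidelity estimate <rho_0> ~ e^{-mu} (f ~ 1)

# Hypothetical measured noisy expectation values <G> of each symmetry operator.
sym = {"I": 1.00, "G1": 0.55, "G2": 0.30, "G1G2": 0.18}

candidates = []
for r in range(1, len(sym) + 1):
    for F in combinations(sym, r):
        gF = sum(sym[s] for s in F) / len(F)      # <Γ_F> for a uniform expansion
        # rank by distance to the target; on near-ties prefer the larger <Γ_F>,
        # which is closer to the true fidelity and cheaper to sample
        candidates.append((abs(gF - target), -gF, F))

gap, neg_gF, best_F = min(candidates)
print(best_F, -neg_gF)   # -> ('G1', 'G1G2') with <Γ_F> = 0.365
```

The exhaustive search is exponential in the number of symmetry operators, which is why the equivalence-class reduction discussed below matters in practice.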
In Appendix D and later in the numerical simulation (Section 7), we can see that we usually have ε_w(O) ∼ ε_w(ρ₀). Hence, using Eq. (3), the minimum number of samples required for the small-bias expansion scheme (w = G*) to outperform symmetry verification (w = G) for a Pauli observable is simply:

N_min = (C_G* − C_G) Var[Ô] / ( ⟨Ô₀⟩² [ε_G(ρ₀)² − ε_G*(ρ₀)²] ).    (35)

Let us use f_G to denote the fraction of circuit errors detectable by a given symmetry operator G. Following arguments similar to those leading to Eq. (30), the noisy expectation value of G is:

⟨G⟩ = e^(−2 f_G μ).    (36)

Note that f_I = 0. Using Eq. (31), we then have:

⟨Γ_F⟩ = (1/|F|) Σ_{F∈F} e^(−2 f_F μ).    (37)

We can use Eq. (30) and Eq. (36) to rewrite Eq. (33) at the mean circuit error count μ as:

R = {G ∈ G : e^(−fμ)/(1 + δ) ≤ e^(−2 f_G μ) ≤ e^(−fμ)/(1 − δ)}.    (38)

In the limit of small δ, the first-order approximation of the inequality above gives:

−δ ≤ (f − 2 f_G) μ ≤ δ,

i.e.

f/2 − δ/(2μ) ≤ f_G ≤ f/2 + δ/(2μ).    (39)

In this small-δ limit, the relative bias of symmetry expanding using any F ⊆ R can be obtained using Eq. (22):

ε_F(ρ₀) = 1 − e^(−fμ)/⟨Γ_F⟩,

which is of O(δ) as expected. The corresponding sampling cost can be obtained using Eq. (29):

C_F = ⟨Γ_F⟩⁻² ≈ e^(2fμ).    (40)

We see that the sampling cost grows exponentially with the mean circuit error count μ, similar to other mainstream quantum error mitigation techniques [6]. Using Eq. (37) with f ≈ 1, the search for the small-bias expansion scheme in Eq. (34) can now be written as

G* = argmin_{F⊆G} | (1/|F|) Σ_{F∈F} e^(−2 f_F μ) − e^(−μ) |.    (41)

Similarly to our discussion before, when we have two schemes whose values of (1/|F|) Σ_{F∈F} e^(−2 f_F μ) are of similar distance to e^(−μ), the one with the larger value will have a larger ⟨Γ_w⟩ and thus is preferred.

Reducing Search Space using Equivalent Symmetry
If a given symmetry operator S commutes with the noisy state ρ (which also implies that S⁻¹ commutes with ρ), we then have:

⟨SGS⁻¹⟩ = Tr(SGS⁻¹ρ) = Tr(G S⁻¹ρS) = Tr(Gρ) = ⟨G⟩.    (42)

One possible way that such a symmetry S can arise is when it corresponds to some qubit permutation. In such a case, we would naturally design a state preparation circuit with a layout that respects this symmetry. Since we usually assume the nature of the noise of a given type of circuit component is the same for all of its instances in the circuit, we can expect the resultant noisy state ρ coming out of this circuit to follow the same symmetry: SρS⁻¹ = ρ, i.e. ρ commutes with S. Eq. (42) implies we can obtain the same value for ⟨Γ_w⟩ even if we sample SGS⁻¹ in place of G. Hence, G and SGS⁻¹ are equivalent when we are considering ⟨Γ_w⟩, which in turn means they are equivalent when trying to construct a symmetry expansion scheme with a given absolute infidelity and a given sampling cost.
We will denote the set of symmetry operators that commute with ρ as S, which is a subgroup of G. We say that two elements G and G′ are equivalent if they give the same noisy expectation value. Alternatively, based on our arguments above, we can write this as G′ = SGS^{-1} for some S ∈ S. We can prove that this is an equivalence relation, and thus it will partition the whole symmetry group into different equivalence classes. The symmetry operators within each equivalence class will have the same effect when sampled in symmetry expansion for calculating the fidelity against the ideal state. Hence, to find a suitable symmetry expansion, we only need to pick one element from each equivalence class and sample over these representatives instead of sampling over the whole symmetry group. This vastly reduces the search space for the suitable expansion weight distribution. In the extreme case that ρ commutes with all symmetry operators G ∈ G, the equivalence classes are just the conjugacy classes of G.
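The partition into equivalence classes can be sketched concretely. The toy example below (hypothetical, not the paper's code) uses two qubits standing in for the spin-up/down registers, the Z-string parity operators as the symmetry group, and a SWAP in the role of the spin-exchange symmetry S; conjugation by SWAP merges G_up and G_down into one class, so only one of them needs to be sampled.

```python
import numpy as np

# Toy equivalence-class reduction: merge symmetry operators G with S G S^{-1}
# for a symmetry S (here SWAP) that commutes with the noisy state.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

group = {"I": np.eye(4),
         "G_up": np.kron(Z, I2),
         "G_down": np.kron(I2, Z),
         "G_tot": np.kron(Z, Z)}

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

def find_class(classes, name):
    for cls in classes:
        if name in cls:
            return cls
    return None

classes = []
for name, G in group.items():
    img = SWAP @ G @ SWAP.T                      # SWAP is its own inverse
    img_name = next(n for n, M in group.items() if np.allclose(M, img))
    c1, c2 = find_class(classes, name), find_class(classes, img_name)
    if c1 is None and c2 is None:
        classes.append({name, img_name})
    elif c1 is None:
        c2.add(name)
    elif c2 is None:
        c1.add(img_name)
    elif c1 is not c2:
        c1 |= c2
        classes.remove(c2)

print(sorted(tuple(sorted(c)) for c in classes))
# [('G_down', 'G_up'), ('G_tot',), ('I',)]
```

Only one representative per class, e.g. {I, G_up, G_tot}, then needs to enter the search over expansion weights.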

Problem Setup
The physical problem that we will be looking at is the 2D Fermi-Hubbard model with nearest-neighbour interaction, with the Hamiltonian given as:

H = Σ_{⟨v,w⟩,σ} t_{v,w} (a†_{v,σ} a_{w,σ} + a†_{w,σ} a_{v,σ}) + Σ_v U_v n_{v,↑} n_{v,↓}.   (42)

Here a†_{v,σ}/a_{v,σ} are the creation/annihilation operators of site v with spin σ, and n_{v,↑/↓} are the number operators. The exact circuit we use is outlined in Ref. [24]; it has the same form as a first-order trotterisation circuit. We use the Jordan-Wigner encoding to map the individual interaction terms and the fermionic swaps to two-qubit gates in the circuit, with the gates that correspond to the interaction terms parametrised. This circuit can be used in, for example, variational eigensolvers. We perform simulations for the 2 × 2 and 2 × 3 half-filled Fermi-Hubbard models, which correspond to an 8-qubit circuit with 144 two-qubit gates and a 12-qubit circuit with 336 two-qubit gates, respectively. All two-qubit gates are assumed to be affected by two-qubit depolarising noise of the same strength, which is of the form:

D_p(ρ) = (1 − p) ρ + (p/15) Σ_{P ∈ P_2\{I}} P ρ P,

where P_2 is the full set of two-qubit Pauli operators. Similar simulations are also performed for bit-flip noise in Appendix J, which yield similar results.
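The two-qubit depolarising channel above is straightforward to sketch numerically. The snippet below (illustrative only, not the paper's QuESTlink code, applied to a hypothetical |00⟩ input) builds the 15 non-identity two-qubit Paulis and checks that the channel is trace-preserving.

```python
import numpy as np
from itertools import product

# Two-qubit depolarising channel:
# D_p(rho) = (1 - p) rho + (p/15) * sum over the 15 non-identity Paulis P of P rho P.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
paulis_2q = [np.kron(a, b) for a, b in product([I2, X, Y, Z], repeat=2)]

def depolarise_2q(rho, p):
    """Apply two-qubit depolarising noise of strength p."""
    noise = sum(P @ rho @ P.conj().T for P in paulis_2q[1:])  # skip I (x) I
    return (1 - p) * rho + (p / 15) * noise

rho = np.zeros((4, 4), dtype=complex)
rho[0, 0] = 1.0                       # hypothetical input |00><00|
out = depolarise_2q(rho, p=1.0)

print(np.trace(out).real)             # trace is preserved: 1.0
```

Since the full Pauli twirl satisfies Σ_P PρP = 4 Tr(ρ) I, the p = 1 output here is (4I − ρ)/15, i.e. close to but not exactly the maximally mixed state.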
We will look at a range of circuits with randomly generated parameters, which thus have random ideal output states. Using t_{v,w} = 1 and U_v = 2 for our Hamiltonian, we can measure the energy of the ideal output state and compare it to the energies of the noisy output state and the error-mitigated output states. In our simulation, we will focus on circuits whose ideal output energy magnitude is larger than 0.5, since in practice we are usually interested in states with non-negligible energy magnitude (otherwise the sampling cost for high energy precision will be very large). The simulation code used is available online [25].

Different Symmetry Expansion Schemes
The ideal circuit will conserve the number of fermions within each spin subspace. Hence, the symmetry operators we can enforce on the output state are the spin-up/down number parity symmetries, denoted as G_{↑/↓}. In the Jordan-Wigner encoding, G_{↑/↓} is just the tensor product of the Z operators acting on the qubits corresponding to the spin-up/down orbitals. Together they generate the number parity symmetry group G = {I, G_↑, G_↓, G_tot}, where G_tot = G_↑G_↓ is the total fermion number parity symmetry. As discussed in Section 6.2, our main focus will be on the symmetry expansion schemes that are uniform expansions of subsets of symmetry operators F ⊆ G. The Hamiltonian in Eq. (42) is invariant under the exchange of spin up and spin down. This spin-exchange symmetry is denoted as S and will naturally lead to a circuit construction that respects S. Hence, as shown in Section 6.4, we can prove that G_↑ and the corresponding conjugated symmetry SG_↑S^{-1} = G_↓ are equivalent when used for symmetry expansion. Note that for our practical implementation of the circuit, the trotter steps implemented do not exactly commute with S, thus the performances of G_↑ and G_↓ for symmetry expansion are only approximately the same in our case. However, they are close enough that symmetry expansion schemes with G_↓ in place of G_↑ or vice versa are considered to be identical, to simplify our comparison between different expansions.
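The parity symmetry generators can be sketched directly as Z-strings. The snippet below (a hypothetical 4-qubit layout, not the paper's 8- or 12-qubit circuit) builds G_↑, G_↓ and G_tot and checks the group relations used above.

```python
import numpy as np
from functools import reduce

# Under Jordan-Wigner, the spin-up/down number parity symmetries are
# Z-strings on the corresponding qubits; G_tot = G_up * G_down.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def z_string(n_qubits, support):
    """Tensor product of Z on the qubits in `support`, identity elsewhere."""
    ops = [Z if q in support else I2 for q in range(n_qubits)]
    return reduce(np.kron, ops)

n = 4                          # qubits 0,1 spin-up; qubits 2,3 spin-down
G_up = z_string(n, {0, 1})
G_down = z_string(n, {2, 3})
G_tot = z_string(n, {0, 1, 2, 3})

assert np.allclose(G_up @ G_down, G_tot)       # G_tot = G_up G_down
assert np.allclose(G_tot @ G_tot, np.eye(2**n))  # each element squares to I
```

Together with the identity, these four operators close under multiplication, confirming that {I, G_↑, G_↓, G_tot} is a group.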
As shown in Appendix I, the fraction of errors detectable by G_tot is estimated to be f_{G_tot} ∼ 8/15 and that by G_{↑/↓} is estimated to be f_{G_{↑/↓}} ∼ 2/5. Using these and following Eq. (40) and the discussions there, the small-bias expansion scheme that we will use is simply the uniform expansion with the set of symmetry operators G* = {G_↓, G_tot}.

Results
The absolute relative bias in the energy estimate, denoted as |ε(H)|, over different mean circuit error counts µ for the 8-qubit and 12-qubit simulations is shown in Fig. 2 (a) and (b). The corresponding absolute infidelity |ε(ρ_0)| is shown in Fig. 2 (c) and (d). From the close resemblance between the shapes of the curves for |ε(H)| and |ε(ρ_0)|, we see that |ε(ρ_0)| is indeed a good metric for comparing the estimation bias of different symmetry expansion schemes, as discussed in Section 4. This further validates our approach of searching for good symmetry expansions based on |ε(ρ_0)| in Section 6 and our assumption of ε_{ε,w}(O) ∼ 1 in Appendix D.

The symmetry expansion schemes that we look at are uniform expansions of different subsets of symmetry operators F ⊆ G, as discussed in Section 7.2. When F forms a group, symmetry expanding with F is simply performing symmetry verification using the symmetry group F. In our examples in Fig. 2, out of all symmetry verification schemes (solid lines), the symmetry verification using the full symmetry group G achieves the lowest bias, as expected. When we look at all symmetry expansion schemes including symmetry verifications, we see that the small-bias scheme we found in Section 7.2 achieves the lowest bias.
The sampling costs are shown in Fig. 2 (e), which is essentially a plot of the cost function C_F = Γ_F^{-2} in Eq. (29) for different F. Since the estimated detectable error fractions f_{G_{↑/↓}} and f_{G_tot} are independent of the number of qubits and the circuit parameters, Γ_w is mostly independent of the number of qubits and the circuit parameters by Eq. (37), and thus our cost plot is practically the same for both the 8-qubit and 12-qubit cases and for circuits with any parameters. Note that the costs for the symmetry verifications here are for post-processing verification. If we go for direct verification instead, then we need to take the square root of the cost.
Let us zoom in at the mean circuit error count µ = 1 and focus on the three most representative symmetry expansions (thickened lines): the unmitigated noisy state, symmetry verification using the full symmetry group G, and the small-bias symmetry expansion using G* = {G_↓, G_tot}, whose data are summarised in Table 1 (a). We see that at µ = 1, the energy estimate bias |ε(H)| for the unmitigated noisy output sits at ∼ 0.5 for the 8-qubit example and ∼ 0.4 for the 12-qubit example. Symmetry verification using G can reduce the bias to ∼ 0.2, more than halving the bias, at a sampling cost of ∼ 3. The bias in the energy estimate can be further reduced to ∼ 0.03 when we apply the small-bias expansion scheme in Eq. (44), which is a ∼ 15 times reduction compared to the unmitigated noisy state and a ∼ 6 times reduction compared to symmetry verification. The sampling cost for the symmetry expansion is C_{G*} ∼ 6, which is only twice the cost of symmetry verification. We see a similar, if not larger, improvement in the absolute infidelity |ε(ρ_0)| by using the small-bias symmetry expansion scheme over the unmitigated noisy state and symmetry verification. Using Eq. (35), the minimum number of samples needed for the small-bias expansion to outperform symmetry verification for a Pauli observable is one we can easily reach in practice, and thus the small-bias scheme should be much preferred in practice.

Now we move to µ = 2, with the data summarised in Table 1 (b). The bias in the energy estimate |ε(H)| for the unmitigated noisy output now rises to ∼ 0.74 for the 8-qubit example and ∼ 0.62 for the 12-qubit example. Symmetry verification can reduce the relative bias to ∼ 0.5 and ∼ 0.4 for the 8-qubit and 12-qubit cases, respectively, with a sampling cost of ∼ 7. The small-bias symmetry expansion, on the other hand, can still achieve a low relative bias of ∼ 0.05, which is around a 13 times reduction compared to the unmitigated state and around a 9 times reduction compared to symmetry verification.
Similar to the µ = 1 case, we again see a similar if not larger improvement in the absolute infidelity |ε(ρ_0)| compared to |ε(H)| when we use the small-bias symmetry expansion. However, due to the high circuit error rate µ = 2, the small-bias symmetry expansion has a high sampling cost of C_{G*} ∼ 41. Using Eq. (35), the minimum number of samples needed for the small-bias expansion to outperform symmetry verification for a Pauli observable is around N* ∼ 100. This is larger than in the µ = 1 case, but still reachable in practice.
Alternatively, at µ = 2, we can opt for symmetry expansion with only G_↓, which can also achieve a much lower bias than symmetry verification at ε_{G_↓}(ρ_0) ≈ 0.2, as seen from Fig. 2, but at a lower sampling cost than the small-bias scheme at C_{G_↓} ∼ 27. Hence, the minimum number of samples needed to outperform symmetry verification for a Pauli observable is around N* ∼ 70 in this case, which is smaller than for the small-bias scheme.

Conclusion
In this article, we have constructed a general framework named symmetry expansion for symmetry-based error mitigation techniques. Different symmetry expansions correspond to different sampling weight distributions among the symmetry operators, with symmetry verification corresponding to the uniform weight distribution over the full symmetry group. The effective 'density operator' after symmetry expansion is not positive semi-definite; thus, instead of using the infidelity against the ideal state to predict the estimation bias of a given symmetry expansion, we introduced the metric of absolute infidelity against the ideal state. Using the absolute infidelity as the metric, we have shown that some symmetry expansion schemes can achieve a smaller bias than symmetry verification through cancellation between the biases due to the detectable and undetectable noise components. For any given symmetry expansion scheme, we have shown ways to estimate its bias and sampling cost using the overall circuit error rate and measurements of the symmetry operators. Such performance prediction can be further simplified if we have knowledge about the error channels in the circuit. Using the bias prediction, we can search for the uniform symmetry expansion over some subset of symmetry operators that has the smallest predicted bias; the scheme found this way is simply called the small-bias expansion scheme.
We then applied our methods to the energy estimation of Fermi-Hubbard model simulations with random circuit parameters for both the 8-qubit and 12-qubit cases. From our simulations, we see that the absolute infidelity is indeed a good metric for predicting the bias in our energy estimate. When there is on average 1 error per circuit run, the relative energy estimation bias can be reduced from 0.4 ∼ 0.5 for the unmitigated noisy state and ∼ 0.2 for symmetry verification to ∼ 0.03 for the small-bias symmetry expansion. The sampling cost for symmetry verification in this case is 3, i.e. we need 3 times more circuit runs for shot noise reduction, while the sampling cost for the small-bias symmetry expansion is around 6. Hence, symmetry expansion can reach an estimation accuracy 6 times beyond what is achievable by symmetry verification at a sampling cost that is only 2 times higher. When there are on average 2 errors per circuit run, we found that the factor of bias reduction brought by the small-bias symmetry expansion further increases, but the sampling cost also rises to 41 due to the high circuit error rate. The small-bias scheme will still outperform symmetry verification given a reasonable number of allowed samples, but we can also turn to another expansion scheme with a slightly higher bias (still lower than symmetry verification) at a lower sampling cost.
In practice, choosing the error-mitigation scheme with the right bias-variance trade-off largely depends on the precision we wish to reach and/or the number of circuit runs allowed for our experiment. Symmetry expansion is able to provide us with a wider range of symmetry-based schemes beyond symmetry verification to fit our various practical requirements on the bias-variance trade-off. The estimation biases of symmetry expansion can be further improved if we could develop effective ways to search for the relevant symmetry expansion schemes, which could be done by, for example, constructing a better metric beyond the absolute infidelity or using the Clifford-approximation learning-based method outlined in Ref. [26].
In this article, we have only considered positive sampling weights for symmetry expansion, and it would be natural to extend this to negative sampling weights like in quasi-probability error mitigation [3,4]. On the other hand, it would also be interesting to see if any method can be developed to modify the sampling distribution within quasi-probability, like how we go from symmetry verification to symmetry expansion. Part of this has been explored in Ref. [26].
We have provided explicit examples of the application of symmetry expansion for the Pauli symmetry group and for virtual distillation (Appendix H) in this article. One could look into the more detailed application of symmetry expansion in other cases, e.g. for non-abelian symmetry groups. Note that non-abelian symmetry expansion is much easier to implement than direct non-abelian symmetry verification, as non-abelian symmetry verification cannot be done by simply measuring the group generators. However, in this case, the expectation values of the symmetry operators might not be real any more, and thus we would need to redevelop our methods for finding a suitable symmetry expansion, possibly involving complex weights. We might use methods in Ref. [27] to measure such complex expectation values.
All of our symmetry expansion arguments are directly applicable to the stabiliser symmetry groups of stabiliser codes. It would be interesting to look at an explicit example of how symmetry expansion might perform in the context of a stabiliser code, like what has been done for quantum subspace expansion [15]. To find the optimal symmetry expansion in this context, one might want to draw inspiration from Ref. [28] about ways to reduce the search space. In addition, it would also be interesting to see how symmetry expansion can be applied to the symmetries present in continuous-variable systems like bosonic codes [29][30][31].

Acknowledgements
The numerical simulations are performed using QuESTlink [32], which is the Mathematica interface of the high-performance quantum computation simulation package QuEST [33]. The author is grateful to those who have contributed to both these valuable tools.
The author would like to thank Simon Benjamin and Balint Koczor for valuable discussions and Tyson Jones for his help on the usage of QuESTlink.
The author is supported by the Junior Research Fellowship from St John's College, Oxford and acknowledges support from the QCS Hub (EP/T001062/1).

A Comparison between Error Mitigation Techniques
Using Ō_em to denote the sample mean of O_em after N samples, the mean square error of the error-mitigated sample mean estimator Ō_em is simply:

MSE[Ō_em] = Var[O_em]/N + Bias[O_em]^2.

The root mean square error √MSE[Ō_em] indicates the estimation precision we can reach in practice given N samples. We see that the contribution from the variance term can be reduced by increasing the number of samples N, but the overall precision is lower-bounded by the bias contribution |Bias[O_em]|. Suppose we are required to obtain an answer whose root mean square error is below a certain error bound δ; we then need to first make sure our estimation bias is below the error bound:

|Bias[O_em]| ≤ δ.

When comparing two error mitigation schemes, if both have their biases below this threshold, then we compare the number of samples N_δ required for each of them to reach this error threshold. The scheme with the smaller N_δ will be our go-to scheme.
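The sample-count comparison above can be sketched with a small calculation. The numbers below are made up for illustration; N_δ follows from solving Var/N + Bias^2 = δ^2 for N.

```python
# Number of samples N_delta needed for the root mean square error
# sqrt(Var/N + Bias^2) of an estimator to fall below a target bound delta.
def n_delta(bias, var, delta):
    assert abs(bias) < delta, "bias itself must sit below the error bound"
    return var / (delta**2 - bias**2)

delta = 0.1
# Hypothetical scheme A (small bias, large variance) vs scheme B.
nA = n_delta(bias=0.03, var=6.0, delta=delta)
nB = n_delta(bias=0.08, var=3.0, delta=delta)
print(round(nA), round(nB))   # 659 833
# The scheme with the smaller N_delta (here A) is the go-to scheme.
```

Note how scheme B, despite having half the variance, needs more samples overall because its bias eats up most of the error budget δ.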
In this article, we do not have a fixed precision requirement for our estimators, thus we turn to another way to compare between different error mitigation schemes.
If we are instead given a fixed number of allowed samples rather than a required precision, and the two error mitigation schemes are labelled 1 and 2, with scheme 1 having the smaller bias and scheme 2 having the smaller variance, then the two schemes will achieve the same mean square error when the number of samples is:

N* = (Var[O_1] − Var[O_2]) / (Bias[O_2]^2 − Bias[O_1]^2).

When the number of allowed samples is above this threshold, N > N*, the error due to bias dominates and thus scheme 1 with the smaller bias is preferred; on the other hand, when N < N*, the variance is the larger error source and thus scheme 2 with the smaller variance is preferred. N* can be rewritten using the expression of relative bias in Eq. (2) and the sampling cost in Eq. (1).

B Symmetry Verification

In a standard quantum circuit, we start with a quantum state |ψ_init⟩ and perform a series of unitary operations (gates) on it by evolving it under different Hamiltonians, which outputs a final state |ψ_out⟩, on which we can measure some observable of interest O. To obtain an estimate of the expectation value ⟨ψ_out|O|ψ_out⟩, we need to run the above circuit over and over again and take the average of the outputs. When we try to prepare a certain state |ψ_0⟩ for some physical problem, e.g. the ground state of a given Hamiltonian, we can usually identify some symmetries G that stabilise the state, G|ψ_0⟩ = |ψ_0⟩, based on the known physical properties of the ideal state |ψ_0⟩, e.g. the particle number, spin number, spatial symmetry, etc.
To produce a final state |ψ_out⟩ that is stabilised by the right symmetry G using our quantum circuit, we can start with an initial state that is stabilised by the given symmetry, G|ψ_init⟩ = |ψ_init⟩, and restrict our circuit U to those that commute with the symmetry, UG = GU. Note that this usually means the circuit is built from blocks that commute with the symmetry. As a result, when the circuit above is noiseless, the output state |ψ_out⟩ = U|ψ_init⟩ will be stabilised by the given symmetry G:

G|ψ_out⟩ = GU|ψ_init⟩ = UG|ψ_init⟩ = U|ψ_init⟩ = |ψ_out⟩.

In this case, any violation of the symmetry by the output state is due to noise in the circuit, and performing symmetry verification will enable us to remove noise that leads to symmetry violation.
On the other hand, if our circuit is not constructed in the way above and would output a noiseless state |ψ_out⟩ that does not respect the symmetry G, then symmetry verification can project |ψ_out⟩ into the same symmetry subspace as |ψ_0⟩, which enables us to remove errors due to an imperfect circuit representation of |ψ_0⟩ in addition to removing the noise in the circuit.
In this article, we assume that we have made a sensible choice of state preparation circuit such that it produces a |ψ_out⟩ that respects the same set of symmetries as |ψ_0⟩, as is usually the case in practice; thus symmetry verification will remove the noise in the circuit as discussed above.

C Optimal Symmetry Subspace Expansion
The subspace-expanded density operator with the expansion operator Γ_v = Σ_i v_i G_i is:

ρ_v = Γ_v ρ Γ_v† / Tr(Γ_v ρ Γ_v†),   (47)

and the corresponding expectation value for some observable H is:

E_v = Tr(ρ_v H) = (v† H v)/(v† S v),   (48)

with H_ij = Tr(G_i ρ G_j† H) and S_ij = Tr(G_i ρ G_j†). Note that we are only interested in v that satisfy

v† S v ≠ 0,   (51)

so that Eq. (47) and Eq. (48) are well-defined. Finding the extremal points of Eq. (48) is commonly done in the Rayleigh-Ritz method and is equivalent to solving the generalised eigenvalue problem:

H v = E_v S v.   (52)

In the case of H = ρ_0 = |ψ_0⟩⟨ψ_0| for some ideal pure state |ψ_0⟩, E_v is the fidelity of the subspace-expanded state against the ideal state, following Eq. (48). Since all G are symmetries of ρ_0, we have:

H_ij = Tr(G_i ρ G_j† ρ_0) = Tr(G_j† ρ_0 G_i ρ) = Tr(ρ_0 ρ) = ⟨ρ_0⟩.

Hence, H is a matrix with all entries equal to ⟨ρ_0⟩. This is a rank-1 matrix whose only non-zero eigenvalue is |G|⟨ρ_0⟩, with the all-ones vector 1 as the corresponding eigenvector.
Since the set of G_i forms a group G, we see that the rows of S are simply permutations of the set of expectation values {⟨G⟩ | G ∈ G}, and thus 1 will be an eigenvector of S with the eigenvalue Σ_{G∈G} ⟨G⟩.
Hence, we see that v = 1 satisfies Eq. (52) with E_v given by:

E_1 = |G|⟨ρ_0⟩ / Σ_{G∈G} ⟨G⟩.

For any other eigenvector of S, denoted as v_⊥, we will only be interested in those with non-zero eigenvalue s_{v_⊥} due to Eq. (51). Since v_⊥ is orthogonal to 1, we have H v_⊥ = 0. Hence, v = v_⊥ satisfies Eq. (52) with E_v = 0, i.e. all extremal points other than v = 1 give zero fidelity. Hence, the symmetry subspace expansion weight vector that maximises the fidelity when our symmetry operators form a full group is always v = 1, i.e. it is simply performing symmetry verification using the symmetry group.
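The linear-algebra claim above can be checked numerically on toy data. In the sketch below (made-up expectation values, not from the paper), H is the rank-1 all-⟨ρ_0⟩ matrix and each row of S permutes the same set of symmetry expectation values, here laid out on the Z2 × Z2 multiplication table.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.uniform(0.2, 1.0, size=4)        # stand-in values for {<G>}
# Each row of S permutes g, following the Z2 x Z2 group structure.
S = np.array([[g[0], g[1], g[2], g[3]],
              [g[1], g[0], g[3], g[2]],
              [g[2], g[3], g[0], g[1]],
              [g[3], g[2], g[1], g[0]]])
fid = 0.8                                # stand-in for <rho_0>
H = np.full((4, 4), fid)                 # rank-1 matrix, all entries <rho_0>

# v = 1 solves H v = E S v with E = |G| <rho_0> / sum_G <G>.
ones = np.ones(4)
E = 4 * fid / g.sum()
assert np.allclose(H @ ones, E * (S @ ones))

# Any vector orthogonal to 1 is annihilated by H, so it attains E_v = 0:
# the fidelity-maximising weights are v = 1, i.e. symmetry verification.
v_perp = np.array([1.0, -1.0, 0.0, 0.0])
assert np.allclose(H @ v_perp, 0.0)
```

The same check goes through for any row-permutation structure of S, since only the row sums and the rank-1 form of H enter the argument.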

D Linking Infidelity to Relative Bias
We can extract the noiseless component of the symmetry-expanded state ρ_w by writing it as:

ρ_w = F_w ρ_0 + (1 − F_w) ρ_{ε,w},

where F_w = Tr(ρ_w ρ_0) is the coefficient of the error-free component, as expected, and the erroneous component ρ_{ε,w} is simply

ρ_{ε,w} = (ρ_w − F_w ρ_0)/(1 − F_w),

which has unit trace but is not necessarily positive semi-definite. By definition we have Tr(ρ_0 ρ_{ε,w}) = 0. Using Eq. (18) and Eq. (53), the relative bias of expectation value estimation using symmetry expansion can be written as

ε_w(O) = ε_w(ρ_0) ε_{ε,w}(O),

where ε_w(ρ_0) = 1 − F_w is the infidelity of the symmetry expansion and

ε_{ε,w}(O) = 1 − Tr(O ρ_{ε,w})/Tr(O ρ_0)

is the relative bias of the erroneous component ρ_{ε,w}. In practice, we are usually only interested in the traceless part of the observable, since the trace only corresponds to a uniform shift of the spectrum of the observable. Hence, without loss of generality, we will assume our observable of interest O is traceless. Now in practice, the ideal output state ρ_0 is usually dominated by the eigenstates of the observable of interest O that lie close to the extremes of the spectrum (e.g. the ground state), such that the magnitude of Tr(Oρ_0) is not too small to be measured to high accuracy. The noisy component ρ_{ε,w}, on the other hand, is usually not dominated by any eigenstates of O, as it is the result of random noise. In such a case, we usually have Tr(Oρ_{ε,w}) ≪ Tr(Oρ_0), and thus the relative bias of the erroneous component will be close to 1, ε_{ε,w}(O) ∼ 1, which means:

ε_w(O) ∼ ε_w(ρ_0).

In Section 7, from numerical simulations, we see that even if some of the assumptions above are violated, e.g. when ρ_0 does not lie close to the extremes of the spectrum of O, ε_w(ρ_0) still serves as a good metric for ε_w(O).
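The decomposition and the bias factorisation are exact algebraic identities, which can be verified on random toy states (not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())             # ideal pure state rho_0

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho_w = A @ A.conj().T
rho_w /= np.trace(rho_w).real                # stand-in "expanded" state

F = np.trace(rho_w @ rho0).real              # error-free coefficient F_w
rho_eps = (rho_w - F * rho0) / (1 - F)       # erroneous component

assert np.isclose(np.trace(rho_eps).real, 1.0)          # unit trace
assert np.isclose(np.trace(rho0 @ rho_eps).real, 0.0)   # Tr(rho0 rho_eps) = 0

O = np.diag([1.0, 1.0, -1.0, -1.0])          # traceless observable
eps_w = 1 - np.trace(O @ rho_w).real / np.trace(O @ rho0).real
eps_eps = 1 - np.trace(O @ rho_eps).real / np.trace(O @ rho0).real
assert np.isclose(eps_w, (1 - F) * eps_eps)  # bias factorisation
```

Only the approximation ε_{ε,w}(O) ∼ 1 involves a physical assumption; the factorisation itself holds for any state and any traceless observable with Tr(Oρ_0) ≠ 0.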

E Estimation of Symmetry Expectation Values
Arguments in this section largely follow Ref. [5].
We will assume the average number of errors in each circuit run is µ. Looking at each error location in the circuit, only a fraction of the errors can be detected by a given symmetry G. We will denote this detectable error fraction, averaged over all error locations, as f_G. Correspondingly, the average number of such detectable errors in each circuit run will be f_G µ. If the number of error locations with non-negligible detectable error components is much larger than 1, then following the same arguments as in Eq. (21) while focusing only on detectable errors, the probability that n detectable errors occur in a given circuit run is:

P(n) = e^{−f_G µ} (f_G µ)^n / n!.

For a Pauli symmetry G, only an odd number of such detectable errors will be detected, while an even number of such detectable errors will pass the symmetry test. Hence, using Π_G to denote the projector onto the G = +1 subspace, the probability of passing the symmetry test using G is just:

Tr(Π_G ρ) = Σ_{even n} P(n) = (1 + e^{−2 f_G µ})/2.

Hence, we have:

⟨G⟩ = 2 Tr(Π_G ρ) − 1 = e^{−2 f_G µ}.

The expectation value of the more general projection operator with multiple Pauli symmetries is then simply:

⟨Π_G⟩ = |G|^{-1} Σ_{G∈G} e^{−2 f_G µ},

where f_G is the fraction of errors that are detectable by G. Note that f_I = 0. Now we look at a group of Pauli symmetries G generated by a set of generators. If the errors detectable by different symmetry generators are independent, i.e. each error component is detectable by one and only one of the symmetry generators, then the symmetry tests using different symmetry generators will be independent. Hence, we have:

⟨Π_G⟩ = Π_{generators G} (1 + e^{−2 f_G µ})/2,

which is a special case of Eq. (56).
Here we have implicitly assumed that the errors in the circuit are an incoherent mixture of detectable errors and undetectable errors, such that the average number of detectable errors occurring in each circuit run, f_G µ, is well-defined. If this is not true, then we can always try to decohere the detectable and undetectable errors via twirling [34].
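The even/odd Poisson counting above is easy to verify numerically. The check below uses made-up values of f_G and µ:

```python
import math

f_G, mu = 0.4, 1.5
lam = f_G * mu                    # mean number of detectable errors per run

# Summing the even Poisson counts reproduces the closed-form pass probability.
p_even = sum(math.exp(-lam) * lam**n / math.factorial(n)
             for n in range(0, 60, 2))
p_pass = (1 + math.exp(-2 * lam)) / 2
assert math.isclose(p_even, p_pass)

# For a Pauli symmetry with Pi_G = (I + G)/2, we then get <G> = exp(-2 f_G mu).
expect_G = 2 * p_pass - 1
assert math.isclose(expect_G, math.exp(-2 * lam))
```

The identity behind the closed form is Σ_{even n} e^{−λ} λ^n/n! = e^{−λ} cosh λ = (1 + e^{−2λ})/2.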

F Variance of Quotient
If we have a random variable C that is obtained by taking the quotient of two other random variables A and B, C = A/B, then the variance of C can be approximated to first order using:

Var[C] ≈ (E[A]/E[B])^2 ( Var[A]/E[A]^2 − 2 Cov[A,B]/(E[A]E[B]) + Var[B]/E[B]^2 ).

Throughout this section, we will use X̄ to denote the estimator of X after N circuit runs. If in each circuit run we can obtain one sample of A and one sample of B, then by definition we have:

Var[Ā] = Var[A]/N,  Var[B̄] = Var[B]/N,  Cov[Ā, B̄] = Cov[A,B]/N.
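The first-order quotient formula can be checked by Monte Carlo on toy distributions (made-up means and variances, with B kept well away from zero):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
B = rng.normal(1.0, 0.05, size=n)            # denominator, far from zero
A = 0.5 * B + rng.normal(0.8, 0.05, size=n)  # numerator, correlated with B

EA, EB = A.mean(), B.mean()
cov = np.cov(A, B)                           # 2x2 sample covariance matrix
approx = (EA / EB) ** 2 * (cov[0, 0] / EA**2
                           - 2 * cov[0, 1] / (EA * EB)
                           + cov[1, 1] / EB**2)
empirical = (A / B).var()

# First-order (delta-method) approximation agrees to within a few per cent.
assert abs(approx - empirical) / empirical < 0.05
```

The approximation error is governed by the relative fluctuations of B; here σ_B/E[B] = 5%, so the higher-order corrections are small.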

G Sampling Cost Analysis
In this section, we will follow the discussion in Section 3, in which our circuit outputs the noisy quantum state ρ and we want to estimate the value of a Pauli observable O by using symmetry verification with the symmetry projection operator Π_G or symmetry expansion with the expansion operator Γ_w. For a given observable X whose noisy expectation value is ⟨X⟩ = Tr(Xρ), we will slightly abuse notation and use X to also denote the unbiased estimator of ⟨X⟩.

G.1 Direct Symmetry Verification
The symmetry-verified expectation value of a Pauli observable O is:

⟨O⟩_ver = ⟨Π_G O Π_G⟩ / ⟨Π_G⟩.

Since each sample of Π_G O Π_G takes the values 0, ±1, and can be ±1 if and only if Π_G takes the value 1, we have (Π_G O Π_G)^2 = Π_G^2 = Π_G. Using this and Eq. (59), the variances and covariance of the observables are:

Var[Π_G O Π_G] = ⟨Π_G⟩ − ⟨Π_G O Π_G⟩^2,
Var[Π_G] = ⟨Π_G⟩ − ⟨Π_G⟩^2,
Cov[Π_G O Π_G, Π_G] = ⟨Π_G O Π_G⟩(1 − ⟨Π_G⟩).   (60)

If we can measure all symmetry generators and O in the same circuit run, then in each circuit run we obtain one sample of Π_G O Π_G and one sample of Π_G. Hence, after N circuit runs, our estimate of the symmetry-verified variable is the quotient of the two sample means, and the corresponding variance can be obtained using Eq. (58) and Eq. (60).

G.2 Symmetry Expansion
The symmetry-expanded expectation value of a Pauli observable O is:

⟨O⟩_w = ⟨OΓ_w⟩ / ⟨Γ_w⟩.

We can sample a symmetry operator G according to the weights w and measure the Pauli observables OG and G in the same circuit run, giving one sample each of the variables OΓ_w and Γ_w. Note that sampling in this way means that OΓ_w and Γ_w are variables that return ±1 for each sample. Hence, we have (OΓ_w)^2 = Γ_w^2 = 1, and also (OΓ_w)Γ_w = OΓ_w^2 = O. Using these and Eq. (63), the variances and covariance of the observables are:

Var[OΓ_w] = 1 − ⟨OΓ_w⟩^2,
Var[Γ_w] = 1 − ⟨Γ_w⟩^2,
Cov[OΓ_w, Γ_w] = ⟨O⟩ − ⟨OΓ_w⟩⟨Γ_w⟩.   (65)

When we sample in the way outlined above, for each circuit run we get one sample of OΓ_w and one sample of Γ_w, just like in the direct verification case; it is just that these two variables are different random variables now. Hence, after N circuit runs, our estimate of the symmetry-expanded variable is the quotient of the two sample means, and the corresponding variance can be obtained using Eq. (58) and Eq. (65).

G.3 Comparison
When we directly sample O from the noisy circuit, after N circuit runs, the variance of the estimator of ⟨O⟩ is:

Var[Ō] = (1 − ⟨O⟩^2)/N,

which is upper-bounded by 1/N. Comparing against the upper bounds of the variances using symmetry verification and symmetry expansion in Eq. (62) and Eq. (67), the sampling cost factors are given by:
• Direct Symmetry Verification: C_ver = ⟨Π_G⟩^{-1}.
• Symmetry Expansion (including post-processing symmetry verification): C_w = ⟨Γ_w⟩^{-2}.

H Virtual Distillation
Here we revisit the virtual distillation protocol introduced in Refs. [16,17] from the perspective of symmetry expansion. Suppose we are interested in the ideal state |φ_0⟩. We can prepare M copies of it, |ψ_0⟩ = |φ_0⟩^{⊗M}, such that it is invariant under any permutation of the copies. Hence, its symmetry group is the permutation group of M elements, denoted as S_M. The projection operator onto the subspace stabilised by S_M is simply:

Π_M = (1/M!) Σ_{S∈S_M} S.   (70)

In reality, we have M copies of the noisy state, ρ = σ^{⊗M}, instead. Hence, we can implement symmetry verification using the permutation group to project the noisy state into the symmetry subspace defined by Eq. (70) [18][19][20].
Since the M copies of the noisy state are identical, ρ = σ^{⊗M} commutes with all S ∈ S_M, and thus commutes with Π_M. Hence, the effective state we have after symmetry verification is:

ρ_ver = Π_M ρ Π_M / Tr(Π_M ρ) = Π_M ρ / Tr(Π_M ρ).

This is a special case of the symmetry-expanded state in Eq. (13) with G = S_M and uniform weights. From here on, our discussion will deviate from the main text, since in virtual distillation we only care about the observable on the first copy. If we measure an observable O_1 on the first copy of the symmetry-expanded state, we have:

⟨O_1⟩ = Tr(O_1 Π_M ρ) / Tr(Π_M ρ).
By direct calculation, for any permutation operator S we have

Tr(O_1 S σ^{⊗M}) / Tr(S σ^{⊗M}) = Tr(O_1 σ^{m_S}) / Tr(σ^{m_S}),

where m_S denotes the order of the cycle that copy 1 belongs to in the cyclic representation of the permutation operator S. Hence, using the structure of our problem, we have successfully identified sets of permutation operators within which all symmetry operators are equivalent for the purpose of serving as components of a symmetry expansion used to measure O_1. Each set contains the permutation operators that have copy 1 in a cycle of the same given order m.
Let us define C_m as the cyclic permutation operator acting on the first m copies, which is one of the operators in the set that has copy 1 in an m-cycle; we then simply have:

Tr(O_1 C_m σ^{⊗M}) / Tr(C_m σ^{⊗M}) = Tr(O_1 σ^m) / Tr(σ^m).

Hence, our symmetry expansion operator only needs to contain one element from each equivalence class, which are just the C_m for different m, and can be rewritten as Γ_w = Σ_{m=1}^{M} w_m C_m. The fidelity of the symmetry-expanded state against the ideal state of the first copy is then:

F_p = Σ_m w_m Tr(σ_0 σ^m) / Tr(σ^m),
i.e. F_p is a weighted sum of the terms Tr(σ_0 σ^m)/Tr(σ^m). From Refs. [16,17], we see that Tr(σ_0 σ^m)/Tr(σ^m) increases with m, which in turn implies that the weight distribution w maximising F_p is the one with w_M = 1 and all other w_m = 0. Hence, the optimal symmetry-expanded density operator is:

σ^M / Tr(σ^M),

which is just the density operator used for virtual distillation in Refs. [16,17]. Symmetry expansion in this case was shown to be able to reduce the infidelity exponentially with increasing M, while symmetry verification can only reduce the infidelity linearly with increasing M.
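The exponential suppression can be seen in a minimal diagonal toy model (made-up numbers, not from the paper), where the dominant eigenvector of σ is the ideal state:

```python
import numpy as np

p_err = 0.2
sigma = np.diag([1 - p_err, p_err / 2, p_err / 2])  # noisy state, unit trace
rho0 = np.diag([1.0, 0.0, 0.0])                     # ideal pure state

def distilled_fidelity(M):
    """Fidelity of the effective state sigma^M / Tr(sigma^M) with rho0."""
    sigma_M = np.linalg.matrix_power(sigma, M)
    return np.trace(rho0 @ sigma_M).real / np.trace(sigma_M).real

fids = [distilled_fidelity(M) for M in (1, 2, 3, 4)]
assert all(b > a for a, b in zip(fids, fids[1:]))   # fidelity grows with M
# For this diagonal model the infidelity 1 - F falls off exponentially in M,
# roughly like (p_err / (2 (1 - p_err)))^M.
```

At M = 1 the fidelity is just that of the bare noisy state (0.8 here), while by M = 4 it already exceeds 0.999.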
Besides the performance improvement, using symmetry expansion over symmetry verification in this case, and more generally for non-abelian symmetry groups, also has advantages in terms of circuit implementation. For symmetry expansion, we only require measuring one symmetry operator along with the observable in the same circuit run. If the expectation value of the symmetry operator is real, then no ancilla and no complicated long-range gates between the ancilla and the data qubits are required for symmetry expansion. In the example of virtual distillation, such circuits are constructed in Ref. [17].

I Fraction of Detectable Errors in Fermi-Hubbard Simulation
A similar discussion can be found in Ref. [24]. Since the quantum states in between our two-qubit gates have well-defined G_tot, any Pauli error that anti-commutes with G_tot will flip the G_tot value and be detected; similarly for G_↑ and G_↓. Under the Jordan-Wigner transformation, G_tot is simply the tensor product of Z acting on all qubits, while G_{↑/↓} is the tensor product of Z acting on the qubits in the spin-up/down subspace.

I.1 Depolarising Error
Here we assume all of the two-qubit gates in the circuit are affected by two-qubit depolarising errors. Hence, there are 15 possible two-qubit Pauli errors after each two-qubit gate; only the error components generated by {Z_1, Z_2, X_1X_2} (excluding the identity) are undetectable by G_tot. Hence, 8 out of the 15 Pauli error components are detectable by G_tot, which means

f_{G_tot} = 8/15.

Now, to calculate the fraction of errors detectable by G_↑, we need to divide our two-qubit gates into three types:
• Gates acting across both spin subspaces: Suppose qubit 1 is spin-up and qubit 2 is spin-down; then the two-qubit error components that are undetectable by G_↑ are generated by {Z_1, Z_2, X_2}, excluding the identity. Hence, for the errors in these gates, the fraction of errors detectable by G_↑ is 8/15.
• Gates acting within the spin-up subspace: Similar to our discussion of G_tot, 8/15 of the errors are detectable by G_↑.
• Gates acting within the spin-down subspace: All errors are undetectable by G_↑.
Since our circuit consists of alternating layers of gates acting across the spin subspaces and acting within the spin subspaces, we should expect roughly 1 2 of the gates are acting across different spin spaces, roughly 1 4 of the gates are acting within the spin-up subspace and roughly 1 4 within the spin-down subspace. Hence, the total fraction of errors detectable by G ↑ is thus: Similarly we also have:
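The counting claims above can be checked by brute force. The sketch below (illustrative code, not from the paper) enumerates all 15 non-identity two-qubit Paulis and counts how many anti-commute with the symmetry restricted to the gate's qubits, i.e. $Z\otimes Z$ for $G_{\text{tot}}$ and $Z\otimes I$ for $G_\uparrow$ when only qubit 1 is in the spin-up subspace:

```python
from itertools import product

def anticommutes(p, q):
    """Single-qubit Paulis p, q anti-commute iff both are non-identity and differ."""
    return p != 'I' and q != 'I' and p != q

def detectable_count(symmetry):
    """Number of the 15 non-identity two-qubit Paulis that anti-commute with
    the two-qubit symmetry operator (odd number of anti-commuting factors)."""
    errors = [e for e in product('IXYZ', repeat=2) if e != ('I', 'I')]
    return sum(
        sum(anticommutes(p, s) for p, s in zip(err, symmetry)) % 2
        for err in errors
    )

print(detectable_count(('Z', 'Z')))  # 8 of 15: gates within a subspace, or G_tot
print(detectable_count(('Z', 'I')))  # 8 of 15: cross-subspace gates seen by G_up

# Weighting the three gate types: 1/2 cross-subspace, 1/4 spin-up, 1/4 spin-down
f_up = (1 / 2) * (8 / 15) + (1 / 4) * (8 / 15) + (1 / 4) * 0
print(f_up)  # ~0.4
```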

I.2 Bit-flip Error
Here we assume all of the two-qubit gates in the circuit are followed by two single-qubit bit-flip error locations, i.e. after applying each two-qubit gate, we have probability $(1-p)^2$ that no error happens, probability $(1-p)p$ for each of the two cases in which exactly one of the qubits is flipped, and probability $p^2$ that both qubits are flipped. To consider the fraction of detectable errors, we only need to consider the fraction of detectable errors at each error location (not each gate), which in this case are all single-qubit bit-flip error locations. Any single-qubit bit-flip in the circuit will anti-commute with $G_{\text{tot}}$, so all such errors are detectable:
$$f_{G_{\text{tot}}} = 1.$$
Now to calculate the fraction of errors detectable by $G_\uparrow$, we simply need to note that half of our single-qubit bit-flip error locations are in the spin-up subspace, producing errors that are detectable by $G_\uparrow$, while the other half of the error locations are in the spin-down subspace, producing errors that are undetectable by $G_\uparrow$. Hence, we have
$$f_{G_\uparrow} = \frac{1}{2}$$
and similarly $f_{G_\downarrow} = \frac{1}{2}$.
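As a sanity check on the two fractions above, this small sketch tests every possible single-qubit bit-flip location directly. The 8-qubit layout with qubits 0-3 spin-up and 4-7 spin-down is an assumption chosen only for illustration:

```python
# Assumed toy layout: 8 qubits, indices 0-3 spin-up, 4-7 spin-down.
n = 8
spin_up = set(range(4))

def detected_by(q, support):
    """An X error on qubit q anti-commutes with a tensor product of Z over
    `support` iff q lies in the support, so the flip is detected there."""
    return q in support

f_tot = sum(detected_by(q, set(range(n))) for q in range(n)) / n
f_up = sum(detected_by(q, spin_up) for q in range(n)) / n
print(f_tot, f_up)  # 1.0 0.5
```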

J Numerical Simulation for Bit-flip Noise
The same circuit as in Section 7 is used for our simulation, which consists of two-qubit components that are affected by bit-flip noise as described in Appendix I.2. We have also shown in Appendix I.2 that the fraction of errors detectable by $G_{\text{tot}}$ is estimated to be $f_{G_{\text{tot}}} \sim 1$, and that by $G_{\uparrow/\downarrow}$ is estimated to be $f_{G_{\uparrow/\downarrow}} \sim \frac{1}{2}$. Following Eq. (40) and the discussion there, the small-bias expansion scheme is simply the expansion using $\mathcal{G}^* = \{G_\downarrow\}$.
The absolute relative bias in the energy estimate over different mean circuit error counts $\mu$ for the 8-qubit and 12-qubit simulations is shown in Fig. 3 (a) and (b). The corresponding absolute infidelity $|\epsilon(\rho_0)|$ is shown in Fig. 3 (c) and (d). The corresponding sampling costs are shown in Fig. 3 (e). In Fig. 3, we again see that out of all symmetry verification schemes (solid lines), the symmetry verification using the full symmetry group $\mathcal{G}$ achieves the lowest bias. When we look at all symmetry expansion schemes, including symmetry verifications, we see that the small-bias scheme we found can again achieve the lowest bias.
[Table 2 caption: The relative bias in the energy estimate $|\epsilon(H)|$, the absolute infidelity $|\epsilon(\rho_0)|$, and the sampling costs for three of the most representative symmetry expansions in our simulation: the unmitigated noisy state (Unmitigated), symmetry verification using the full symmetry group $\mathcal{G}$ (Verified), and the small-bias symmetry expansion using $\{G_\downarrow\}$ (Expanded), at the mean circuit error rates (a) $\mu = 1$ and (b) $\mu = 2$. Note that the costs for symmetry verification here are for post-processing symmetry verification.]
The bias and sampling cost comparison among the small-bias scheme, symmetry verification and no mitigation is shown in Table 2. At $\mu = 1$, the small-bias scheme achieves a much smaller bias than symmetry verification and no mitigation at a moderate sampling cost. At $\mu = 2$, the small-bias scheme maintains the small bias, but at a larger sampling cost. This is similar to what happens for the depolarising noise in the main text.
[Figure 3 caption: Data are averaged over circuit configurations with randomly generated parameters. The shaded areas indicate the spread of the lines over different parameters. The symmetry expansions we consider are all uniform expansions of a given set of symmetry operators. The legend indicates the set of symmetry operators used in the corresponding symmetry expansion. When the set of symmetry operators forms a group, the corresponding symmetry expansion is a symmetry verification, labelled using solid lines. All other symmetry expansions are labelled using dashed lines. Three of the most representative methods: without mitigation (black), symmetry verification using the full symmetry group $\mathcal{G} = \{I, G_\uparrow, G_\downarrow, G_{\text{tot}}\}$ (green), and the small-bias symmetry expansion using $\mathcal{G}^* = \{G_\downarrow\}$ (red) are labelled with thickened lines.]