Performance analysis of multi-shot shadow estimation

Shadow estimation is an efficient method for predicting many observables of a quantum state with a statistical guarantee. In the multi-shot scenario, one performs projective measurement on the sequentially prepared state for $K$ times after the same unitary evolution, and repeats this procedure for $M$ rounds of random sampled unitary. As a result, there are $MK$ times measurements in total. Here we analyze the performance of shadow estimation in this multi-shot scenario, which is characterized by the variance of estimating the expectation value of some observable $O$. We find that in addition to the shadow-norm $\|O \|_{\mathrm{shadow}}$ introduced in [Huang et.al.~Nat.~Phys.~2020\cite{huang2020predicting}], the variance is also related to another norm, and we denote it as the cross-shadow-norm $\|O \|_{\mathrm{Xshadow}}$. For both random Pauli and Clifford measurements, we analyze and show the upper bounds of $\|O \|_{\mathrm{Xshadow}}$. In particular, we figure out the exact variance formula for Pauli observable under random Pauli measurements. Our work gives theoretical guidance for the application of multi-shot shadow estimation.


Introduction
Learning the properties of quantum systems is of fundamental and practical interest, which can uncover quantum physics and enable quantum technologies. Traditional quantum state tomography [2][3][4] is not efficient with the increasing of qubit number in various quantum simulating and computing platforms [5,6]. Recently, randomized measurements [7], especially the shadow estimation method [1,8] are proposed for learning the quantum systems efficiently with a statistical guarantee.
In shadow estimation, a quantum state ρ is prepared sequentially, evolved by a randomly sampled unitary ρ → U ρU † , and finally measured in the computational basis. With the measurement results and the information of U , one can construct the shadow snapshotρ, which is an unbiased estimator of ρ. Using a few independent snapshots, many properties described by the observable O, for instance, the local Pauli operators and fidelities to some entangled states [1], can be predicted efficiently. Shadow estimation has been applied to many quantum information tasks, from quantum correlation detection [9][10][11], quantum chaos diagnosis [12,13], to quantum error mitigation [14,15], quantum machine learning [16,17], and near-term quantum algorithms [18]. There are also a few works aiming to enhance the performance of shadow estimation from various perspectives, for instance, derandomization [19] and locally biased shadow for measuring Pauli observables [20,21], hybrid shadow for polynomial functions [22], single-setting shadow [23], generalized-measurement shadow [24], shadow estimation with a shallow circuit [25][26][27] or restricted unitary evolution [28][29][30][31], and practical issues considering measurement noises [32,33].
In the current framework, shadow estimation in general needs to sample say M times random unitary to generate M shadow snapshots. However, the execution of different unitary is still resource-consuming in real experiments, which generally means changing control pulses or even physical settings. One direct idea is to add more measurement shots under the same unitary evolution [7,14], say K-shot per unitary, to compensate for the realization of random unitaries. This strategy, denoted as multi-shot shadow estimation, however, still lacks a rigorous performance analysis compared to the original one. In this work, we fill this gap by giving the statistical guarantee for multi-shot shadow estimation. We estimate the variance of measuring some observable O and relate it to the introduced cross-shadow-norm ∥O∥ Xshadow , which supplements the original shadow-norm ∥O∥ shadow [1]. The final variance is practically bounded by a interpolation of ∥O∥ Xshadow and ∥O∥ shadow controlled by the shot-number K. For both Pauli measurements and Clifford measurements, where the random unitary is sampled from single-qubit and global Clifford circuits respectively, we analyze ∥O∥ Xshadow and the variance. In the Pauli measurement, we find that multi-shot shadow estimation shows an advantage when estimating Pauli observable, conditioning on the expectation value of that observable. However, in the Clifford measurement, we find no clear benefit of increasing the shot-number K for a given number of random evolution M , especially for estimating the fidelity for many-qubit states. Our work could advance further applications of multi-shot shadow estimation.

Multi-shot shadow estimation
In this section, we introduce the background of shadow estimation and its multi-shot extension. Here, we focus on the n-qubit quantum system, that is, the state ρ ∈ H D with D = 2 n , and denote the computational basis of H D as {|b⟩} = {|b 1 b 2 · · · b n ⟩} with b i = 0/1. In shadow estimation [1], one prepares an unknown quantum state ρ in the experiment sequentially for M rounds. In the i-th round, one evolves the quantum system with a random unitary U sampled from some ensemble E to get U ρU † , and measures it in the computational basis to get the result b (i) . We call M the setting-number hereafter, since the sampled unitary U in each round determines the measurement setting. The shadow snapshot can be constructed as follows.
withb (i) a random variable.ρ (i) is an unbiased estimator of ρ, such that E {U,b (i) } ρ (i) = ρ. And the inverse (classical) post-processing M −1 is determined by the chosen random unitary ensemble [1,25,34,35]. For the n-qubit random Clifford circuit ensemble E Cl and the tensor-product of random single-qubit Clifford gate ensemble E Pauli , one has M −1 C = M −1 n and M −1 P = ⊗ n i=1 M −1 1 respectively, with M −1 n (A) = (2 n + 1)A − I 2 n tr(A) [1]. The random measurements from E Cl and E Pauli are denoted as Clifford and Pauli measurement primitives respectively.
Randomly choose U ∈ E and record it.

3:
for j = 1 to K do 4: Evolve the state ρ using U to get U ρU † .

5:
Measure the state in the computational basis {|b⟩}.
In the multi-shot shadow estimation illustrated in Fig. 1, one conducts K shots by applying the same unitary U sampled in each round, that is, the measurement settings of these K shots are the same. So the total preparation-and-measurement number is M K. Denote the measurement result in i-th round and j-th shot as b (i,j) . One can construct the estimator of ρ in this shot asρ (j) (i) following Eq. (1), and then average on K shots and M rounds to getρ This procedure is listed in Algorithm 1, and we remark that as K = 1, it reduces to the original shadow estimation.
To estimate the expectation value of an observable O, one can construct the unbiased estimator asÔ = tr(Oρ). The performance of shadow estimation is mainly characterized by the variance ofô, which determines the experiment time to control the estimation error, Here, the expectation value is taken on the random unitary U in M rounds and the measurement result b (i,j) in M K shots. In the following sections, we give a general expression for this key quantity, and then analyze the variance for Pauli and Clifford measurements respectively. We remark that even though we focus on estimating one observable O here, the variance result can be applied to simultaneously estimating a set of observables {O i } by using the median-of-mean technique [1], which is discussed in Appendix B.

The general variance expression and cross-shadow-norm
In this section, we give the general variance expression for the multi-shot shadow estimation and relate the variance to the cross-shadow-norm. First, we define two functions of a quantum state σ and an observable O as follows, where b, b ′ are computational basis for n-bit. The cross-shadow-norm is defined as follows.
The proof of it being a norm is in Appendix A.1. Note that the original shadow norm (short as Snorm hereafter) is defined as ∥O∥ shadow = max σ Γ 1 (σ, O) The proof of Theorem 1 is left in Appendix A.2. We remark that as K = 1, the variance in Eq. (6) and the upper bound in Eq. (7) reduce to the result of the original shadow estimation [1]. Note that the variance is an interpolation of the two functions Γ 1 (ρ, O 0 ) and Γ 2 (ρ, O 0 ) with the shot-number K. And the advantage of introducing multi-shot should come from the condition when Γ 2 (ρ, O 0 ) ≪ Γ 1 (ρ, O 0 ). We remark that Ref.
[14] also investigated on this kind of multi-shot variance with a focus on quantum error mitigation.
The function Γ 1 (σ, O) and the Snorm ∥O 0 ∥ shadow have been extensively studied in Ref. [1]. Thus, the remaining content of this work is to analyze Γ 2 (σ, O) and the XSnorm ∥O 0 ∥ Xshadow . To proceed, one can write Γ 2 (σ, O) in Eq. (4) in the following twirling channel form.
Here, we denote the 4-fold twirling channel as Φ (4,E) (·) = E {U ∈E} U †⊗4 (·)U ⊗4 , which is dependent on the random unitary ensemble E, and the 4-copy n-qubit diagonal operator is defined as For the global n-qubit Clifford ensemble, we denote the twirling channel as Φ (4,Cl) n , and the one of local Clifford is just the tensor product n i=1 Φ (4,Cl) 1,(i) . It is known that Clifford ensemble is not a 4-design [36]. Hence, in order to calculate Γ 2 (σ, O) and the XSnorm, one should apply the representation theory (rep-th) of Clifford group in the 4-copy Hilbert space [36,37]. In the following sections, we first figure out the exact result of random Pauli measurements, say single-qubit Clifford twirling based on direct calculation, without using the rep-th; and then give the bound of Γ 2 (σ, O) for random Clifford measurements, say n-qubit Clifford twirling with results from rep-th.

Variance analysis for Pauli measurements
In this section, we focus on the observable O being an n-qubit Pauli operator P , that is This choice is of practical interest, for example, in the measurement problem in quantum chemistry simulation [19,20,38]. We say a Pauli operator P is w-weight, if there are w qubits with P i ̸ = I 2 , i.e., there are single-qubit Pauli operators on w qubits and identity operators on the other n − w qubits. We have the following result for the Pauli observable.
We remark that compared to Γ Pauli 1 (σ, P ) = 3 w in the original shadow [1], Γ Pauli 2 (σ, P ) has an extra relevant term, that is the square of the expectation value tr(P σ) 2 . To prove Proposition 1, we mainly apply the following Lemma 1 of the single-qubit Clifford twirling.

Lemma 1. The 4-fold single-qubit Clifford twirling channel maps the operator
Here F (t) := 2 −t/2 (I ⊗t 2 + X ⊗t + Y ⊗t + Z ⊗t ) on t qubits, and F (2) is the swap operator on two qubits.
We prove Lemma 1 by giving the general t-copy twirling result of Φ (t,Cl) 1 on any Pauli operator in Appendix C.1. Note that the diagonal operator in Eq. (9) can be decomposed to Λ n = ⊗ n i=1 Λ 1,(i) , and the twirling result in Eq. (8) is thus in the tensor-product form 1,(i) (Λ 1,(i) ). Then one can prove Proposition 1 by applying the result of Lemma 1 to Eq. (8) and further calculation, which is left in Appendix C.2.

Theorem 2.
Suppose O = P is a Pauli observable with weight w, the variance to measure P using random Pauli measurements is in (a) and (b) respectively. The variance in (a) is almost independent with K, and for the one in (b) the dependence on K and M is the same. Note that in (a) and (b), tr(P ρ) 2 = 1 and 0 respectively, which are consistent with Eq. (13) and (14). In (c) and (d), we explore the variance dependence on weight w. Here, the processed state ρ is an 8-qubit GHZ state, and we set M = K = 64. In (c), the Pauli observable is P = σ ⊗w with w being an even number, and in (d), P = σ ⊗w , such that tr(P ρ) 2 = 1 and 0 respectively. The red triangles represent the simulated variance which shows consistency with the green dotted line, i.e., the analytical variance given by Eq. (13).
Theorem 2 shows that the dependence of Var [tr(P ρ)] on the shot-number K is related to tr(P ρ) 2 . For the extreme cases when tr(ρO) 2 = 0/1, One can see that when tr(P ρ) 2 = 1, the variance is independent of the shot-number K; for tr(P ρ) 2 = 0, the dependence of K is the same as the setting-number M . So it is advantageous to increase K for tr(P ρ) 2 being small, by considering that it is more convenient to repeat shots than change measurement settings in a real experiment. The variance result here can be extended to any w-local observable following the proof routine in Ref. [1], and this is investigated in Ref. [14]. We give more discussions on this point in Appendix B.
In Fig. 2, we show numerical results of the variance dependence on M , K and w, where we perform random Pauli measurements on the Greenberger-Horne-Zeilinger(GHZ) states. Please see Appendix F for details of the numerical simulation, and also extended discussions about the reduction of experiment time-cost. The numerical results are consistent with the variance given by Eq. (13) and (14).

Variance analysis for Clifford measurements
In this section, we analyze the statistical variance of random Clifford measurements. In this case, the inverse channel maps For the simplicity of the calculation, we decompose Λ n = Λ 0 n + Λ 1 n , with To calculate Γ Cl 2 (σ, O 0 ), one should apply the 4-fold twirling result of the Clifford group [36] on Λ 0 n and Λ 1 n , respectively. And we show in Appendix E that Λ 1 n contributes to the leading term of the final result. Since the n-qubit Clifford twirling is quite sophisticated, we do not calculate it in a very exact form but give the following upper bound on Γ Cl 2 (σ, O 0 ).
for the random Clifford measurements, where ∥A∥ 2 = tr(AA † ) is the Frobenius norm, and c is some constant independent of the dimension D. In this case, the XSnorm For comparison, in the original shadow [1],

Theorem 3. For a general observable O, the variance to measure O using the random Clifford measurements is upper bounded by
with c ′ = c + 3 some constant independent of the dimension D, and ∥A∥ 2 = tr(AA † ) is the Frobenius norm. . Note that the corresponding Frobenius norm ∥O 0 ∥ 2 is about a constant and 2 n/2 , respectively. The dotted lines and the slopes represent the fitting curves for the corresponding cases, which is consistent with Eq. (19).

In the original shadow, the variance bound is about
is an upper bound, it already gives a hint that it is not very helpful to increase the shot-number K in the random Clifford measurements. We demonstrate this phenomenon with numerical simulation considering measuring the fidelity. In Fig. 3, we take , the GHZ state with some phase. And the observable is taken to be O = |GHZ θ=0 ⟩ ⟨GHZ θ=0 |, that is, measuring the fidelity of the quantum state to the GHZ state. We find that the variance is almost independent with the shot-number K for various choices of θ which gives different expectation values of tr(Oρ). This is very different from the results of random Pauli measurements as shown in Fig. 2.

Conclusion and outlook
In this work, we systemically analyze the statistical performance of multi-shot shadow estimation by introducing a key quantity -the cross-shadow-norm. We find the advantage of this framework in Pauli measurements, however, the advantage in Clifford measurements is not significant. There are a few interesting points that merit further investigation. First, the application of the result of Pauli measurements to quantum chemistry problems is promising, especially combined with other measurement techniques, such as derandomization and local-bias [19,20]. Second, considering that the advantage of multi-shot for Clifford measurements is minor, it is interesting to investigate whether this hold or not for shallow circuit [25][26][27] or restricted unitary evolution [28][29][30][31]. Third, it is also intriguing to extend the current analysis to boson [39][40][41] and fermion systems [42,43], and to nonlinear functions of quantum states [9,11,22]. Finally, we expect the result here would also benefit the statistical analysis for other randomized measurement tasks, especially those related to high-order functions [44][45][46][47].
While completing this work, we became aware of a related work [48] by Jonas Helsen and Michael Walter which also considers shadow estimation under multi-shot. Different from Ref. [48], we analyze the performance under Pauli measurements. Our negative conclusion on Clifford measurements is consistent with theirs. Ref. [48] also demonstrates the weakness of Clifford measurements by giving the exact variance for estimating stabilizerfidelity. Additionally, they nicely extend the Clifford unitary to doped Clifford circuits to bypass this weakness. First one has ∥0∥ Xshadow = 0, and then we verify the triangle inequality as follows.
(20) Here the first and third inequalities are due to the definition of XSnorm in Eq. (5), and σ is assumed to be the optimal state in the maximization for O 1 + O 2 , but may be not the optimal one for O 1 and O 2 respectively. The second inequality is by the Cauchy-Schwarz inequality for the domain of the random unitary U . That is, we can take the formula of the cross term as the inner product of two vectors indexed by U .

A.2 Proof of Theorem 1
As ρ (i) are M i.i.d. random variables, we only need to focus on a specific i 0 , and get the final variance by directly dividing it by M . By definition the variance of tr O ρ (i 0 ) shows Here one can shift the operator to its traceless part without changing the variance. The expectation value can be written explicitly as The expectation value of the terms in the summation depends on the coincidence of the index j. For j = j ′ with totally K terms, one has by definition in Eq. (4). Here in the first equality we insert the definition of ρ For j ̸ = j ′ with totally K 2 − K terms, the measurements are under the same setting i 0 but for different shots, one has by definition in Eq. (4), and here the subscript i 0 is omitted without ambiguity.
As a result, the total variance shows and we finish the proof.

B Variance analysis of a collection of observables
In this section, we extend the variance results on a single observable O in the main text to a collection of observable {O i } L i=1 . In particular, we are interested in the summation of them H = i O i , and each O i = α i P i could be proportional to a Pauli operator P i . For instance, H can be a Hamiltonian of an n-qubit system, and we aim to estimate the expectation value of H within error ϵ. We first give general formulas of the measurement budget or the variance without a specific chosen random measurements, and then show the result of estimating H under random Pauli measurements, which is of practical interest. For simplicity, we denote the estimator based on the shadowρ asÔ = tr(Oρ) and the expectation value asŌ = tr(Oρ) in the following discussion.
We mainly give two solutions to this task. The first one is directly using the median-ofmean technique for all {O i } L i=1 following Theorem S1 of Ref. [1]. Remembering that we totally collect M i.i.d. estimators {ρ (k) }, and each of them is composed of measurement data from repeated K-shots as shown in Eq. (2). For the median-of-mean post-processing, {ρ (k) } is grouped into M 1 sets with each set containing M 2 elements (please refer to Eq. (S12) in Ref. [1] for more details). The estimator for each O i is denoted asÔ i (M 1 , M 2 ), and H(M 1 , M 2 ) is thus the summation of them. Different from estimating each O i within error ϵ there, we should instead keep the whole error within ϵ. Let ϵ i = ϵ∥O i ∥ ∞ / i ∥O i ∥ ∞ and ϵ = i ϵ i , one can find the following result.
with the variance dependent on the particular random measurement primitive. A collection of M = M 1 M 2 shadow estimators {ρ (k) } M k=1 via the the median-of-mean prediction is sufficient to make the estimating error of In particular, for each O i = α i P i and the random Pauli measurements, the number of independent estimators is at most Here ∥⃗ α∥ 1 = i |α i | is the 1-norm of the coefficient vector, and Var(P j ) of Pauli operator P j on a single estimatorρ (k 0 ) given in Eq. (13) (i.e., M = 1 there).
Here the error of estimating H is bounded by the triangle inequality of each error ϵ i of estimating O i .
The second way is to calculate and bound the variance ofĤ, based on the variance of eachÔ i . Let us first define some functions of a state σ and two observable O 1 and O 2 generalized from Eq. (4).
One has the following variance result of estimating H.

Theorem 4. By using multi-shot shadow estimation with M -round and K-shot, the statistical variance ofĤ with H = i O i shows
The variance of Var(Ô i ) is given in Eq. (6), and the covariance can be calculated via The proof follows similarly to that of Theorem . Applying this to random Pauli measurements, one finally gets the following result.

Corollary 2. For H = i α i P i and the random Pauli measurements, the variance of estimating H is upper bounded by
Var(P j ). (30) with Var(P j ) given in Eq. (13).
The second inequality in Eq. (30) is by the Holder's inequality (p = 1, q = ∞). By adopting the median-of-mean technique only for a single observerble H, one sets M 1 = 2 log(2/δ) and M 2 = 34ϵ −2 Var(Ĥ), with the variance on a single estimatorρ (k 0 ) . In this way, a collection shadow estimators {ρ (k) } M i=1 is sufficient to make the estimator error less than ϵ under confidence 1 − δ. By using upper bound in Eq. (30), one has M at most being Var(P j ).

(31)
It is clear that the upper bound of M obtained here is better than the one in Eq. (26).
We remark that one can also apply Cauchy-Schwarz inequality to Eq. (30) to find another upper bound Var(Ĥ) ≤ ∥⃗ α∥ 2 2 j Var(P j ). Note that Var(P j ) is a function of w i , P i and the shot-number K. One may further utilizes the commuting relation among the operators {P i } of interest to minimize the measurement budget, and we leave it to some further study.

C Proof for random Pauli measurements C.1 Proof of Lemma 1
We first decompose Λ 1 defined in Lemma 1 into the Pauli operator representation as follows.
Here I 2 and Z are single-qubit identity and Pauli Z operator. By inserting this decomposition inside the 4-copy single-qubit Clifford twirling channel, In the second line: the first term is due to Φ (4,Cl) 1 being unital, the two terms in the middle reduce to 2-copy twirling, and the last term is still a 4-copy twirling. In the third line, we insert the result of the following Lemma.

Lemma 2. The t-fold single-qubit Clifford twirling channel maps any Pauli operator P ⊗t
where in the first line we define F (t) := 2 −t/2 (I ⊗t 2 + X ⊗t + Y ⊗t + Z ⊗t ), and F (2) is the swap operator on two qubits.
Proof. By definition, random single-qubit Clifford gate takes P 1 ∈ {X, Y, Z} uniformly to all six directions in the Bloch sphere, i.e., {±X, ±Y, ±Z}. Consequently, the final result is the average of six terms, i.e., 1/6 P 1 ∈{±X,±Y,±Z} P ⊗t 1 . In the even t case, we thus have the equal weight summation of X ⊗t , Y ⊗t , Z ⊗t . In the odd k case, X ⊗t and (−X) ⊗t cancel with each other, same for Y, Z terms, thus it returns zero.

C.2 Proof of Proposition 1
Proof. Without loss of generality, suppose P = P 1 ⊗ · · · P w ⊗ I [n−w] with the first w-qubit owning Pauli operator, and of course traceless. The inverse channel maps . By definition of Eq. (4), one has Here in the first equality, we write the random unitary U = n i=1 u i and computational basis summation b = {b 1 b 2 · · · b n }, b ′ = {b ′ 1 b ′ 2 · · · b ′ n } on the single-qubit level; in the second equality we sum the b i , b ′ i for i > w in the trace, and it gives I [n−w] on the last n − w qubits no matter what u i is chosen. And the problem is reduced on the first w-qubit in the third equality, with σ [m] the reduced state of the first m-qubit from σ. In the last two lines, we arrange these operators and relate the result to the single-qubit Clifford twirling.
By inserting the Eq. (12) of Lemma. 1, one finally has Here the second line is by the following fact. Suppose there is an appearance of I ⊗2 i ⊗ F

D Statistical analysis for Haar random measurements
In this section, we aim to give the statistical analysis for the global Haar random measurement, that is, the unitary ensemble E is Haar random on H D (or any unitary 4-design ensemble). The central result shows as follows, as an analog of Proposition 2 in main text.

Proposition 3.
Suppose O 0 is a traceless observable, the function Γ 2 defined in Eq. (4) is upper bounded by Γ Haar for the random Haar measurements, where ∥A∥ 2 = tr(AA † ) is the Frobenius norm, and c 1 is some constant independent of the dimension D. In this case, the XSnorm ∥O 0 ∥ Haar Xshadow ≤ √ c 1 ∥O 0 ∥ 2 by Eq. (5).
The main task of this section is to prove this proposition, which is helpful for the discussion of random Clifford measurements shown in the next section.
Recall the essential quantity in Eq. (15) we would like to calculate shows with the only difference being that we use the Haar twirling Φ (4,Haar) here, and Λ n is decomposed into two parts as in main text, Before we calculate their contributions separately in the next subsections, we give a brief review of the result of the t-copy Haar twirling, and more details can be found in, for example Ref. [49][50][51]. Denote the permutation elements of the t-th order symmetric group as π ∈ S t , and there is a unitary representation of π on the t-copy Hilbert space H ⊗t D as with |b⟩ the basis state for one copy. By the celebrated Schur-Weyl duality, the twirling result is related to the irreducible representation (ir-rep) of S t as follows.

Lemma 3. The t-fold Haar twirling channel maps
where λ denotes the irreducible representation of S t , and P λ is the corresponding projector showing with χ λ (π) being the character of π.

D.1 The contribution of Λ 0 n in Haar case
Inserting the term Λ 0 n in the 4-copy Haar twirling channel in Eq. (40) with t = 4 one has Here the second line is by the fact tr |b⟩ ⟨b| ⊗4 T π −1 = 1 for any b and π; the third line is by the definition of the symmetric subspace as P sym = 1 4! π∈S 4 T π ; the final line is by the fact that symmetric subspace is one of the ir-rep subspace and orthogonal to others. Inserting this twirling result and the dimension D sym = (D + 3)(D + 2)(D + 1)D/4!, one has The final line shows that the result is in the order O(D −1 ) tr O 2 0 , by the following lemma. Lemma 4. For any π ∈ S 4 , the following inequality holds for a quantum state σ and a traceless observable O 0 , Proof. For the permuation operator T π , the formula could take the following values: tr σ 2 tr O 2 0 , tr O 2 0 , tr O 2 0 σ , tr O 2 0 σ 2 , tr(O 0 σO 0 σ). All these can be bounded by tr O 2 0 . We bound the last one with Cauchy-Schwarz inequality for operator as follows.

D.2 The contribution of Λ 1 n in Haar case
For the second term of Λ 1 n , by inserting the Haar twirling channel in Eq. (40) with t = 4, one has Here the second line is by the fact that the trace formula gives zero unless π ∈ S r , with the set S r = {(), (12), (34), (12)(34)} being a subgroup of S 4 . And we define the corresponding projector as P r = 1

E Statistical analysis for random Clifford measurements
Similar as the Haar case, here the we should calculate the quantity in Eq. (15) shown as follows, In the following subsections, as in the Haar case, we evaluate the contributions from Λ 0 n and Λ 1 n separately. As Clifford circuit is not a 4-design, the 4-copy twirling result is a little different compared to Eq. (40), which is shown as follows. One can refer to Ref. [36,51,52] for more details.

Lemma 5. The 4-fold n-qubit Clifford twirling channel maps
where λ denotes the irreducible representation of S t , and P λ is the corresponding projector shown in Eq. (41) with t = 4. The operator Q is a stabilizer code subspace projector, commuting with any T π and P λ , with W k running on all D 2 n-qubit Pauli operators including the identity; and Q ⊥ = I ⊗4 D − Q.

E.1 The contribution of Λ 0 n in Clifford case
For the first term Λ 0 n , by using the twirling formula in Eq. (51) one has (53) Here in the second line, we use the fact that T π |b⟩ ⊗4 = |b⟩ ⊗4 for any π. In the third line, tr |b⟩ ⟨b| ⊗4 Q = 1/D, since W k which only contains {I 2 , Z} on each qubit contributes to it and there are totally 2 n = D such terms. Therefore, tr |b⟩ ⟨b| ⊗4 Q ⊥ = 1 − 1/D. In the final line, we only have P sym due to the orthogonality of each P λ . Inserting this twirling result into the trace formula in Eq. (50) one gets ; in the last line, the second term is directly from the result in Eq. (43) of the Haar case, and the first term is on account of the following Lemma 6, by taking P 0 = P sym and P 1 = I there.
In total, we thus show that the contribution from Λ 0 n is still in the order O(D −1 ) tr O 2 as in the Haar case. Lemma 6. For any two projectors P 0 and P 1 , which may not commute with each other but commute with Q, the following inequality holds for a quantum state σ and an observable O, Proof. Suppose the spectrum decomposition of the observable is O = j a j |Ψ j ⟩ ⟨Ψ j |, and the first inequality is by using the Cauchy-Schwarz inequality, and the second inequality is by To further bound Eq. (56), we insert the formula of Q in Eq. (52) to get Here the second inequality is by the fact tr(σW k ) 2 ≤ 1, and the last line is by the decomposition of the n-qubit swap operator in the Pauli basis as S = D −1 k W k ⊗ W k .

E.2 The contribution of Λ 1 n in Clifford case
For the second term Λ 1 n , by using the twirling formula in Eq. (51) one has We define the first trace formula as There are two cases which give nonzero value depending on π. In the first case, via the permutation T π , |b⟩ is connected to ⟨b|, and this also holds for b ′ . That is, π ∈ S r with the set S r = {(), (12), (34), (12)(34)} and one has | ⟨b| W k |b⟩ | 2 | ⟨b ′ | W k |b ′ ⟩ | 2 .
As in the Haar case, one can only choose W k with {I 2 , Z} for each qubit, so it gives D/D 2 = D −1 . In the second case, all |b⟩ are connected to ⟨b ′ |, that is π ∈ S r ′ , with S r ′ = (13)(24)S r = S r (13) (24). And one has | ⟨b| W k |b ′ ⟩ | 4 . For b ̸ = b ′ , one needs to choose {I 2 , Z} for the qubit with the same bit value of b and b ′ , and {X, Y } for the qubit of different bit values. Thus the result is also D −1 . Besides these two cases, the term shows Note that Q 0 (π) + Q 1 (π) = 1 when π ∈ S r , otherwise it is 0, as shown in the Haar case. We thus have Inserting these into Eq. (59), and define the projector P r ′ = 1 4 π∈S r ′ T π similar as P r in the Haar case, one has by the fact D + λ = O(D 2 ) (otherwise it is zero) and D − λ = O(D 4 ) [36]. For the ir-rep where D + λ = tr(QP λ ) = 0, that is, QP λ = 0, one has Θ 1 (λ) = Θ 2 (λ) = 0. Recall Eq. (53), one finally has the contribution of Λ 1 n as (64) Here in the last but one line, we apply Lemma 6 for the first two terms, and the last two terms can be bounded by following Eq.  In this section, we clarify the technique and details of the numerical simulations for the scaling of statistical variance with shot-number K, setting-number M , weight w of Pauli observables, and qubit number n in Fig. 2 and 3.
We first generate the processed state ρ, which is either a GHZ state or a more general GHZ θ state in quantum simulators. Then we perform randomized Pauli or Clifford measurements on ρ for M different random bases, each one repeating K shots, and record the measurement results and the corresponding bases. Using the numerical results, we can construct the classical shadowρ through Eq. (1) and Eq.

F.2 Practical time-cost and accuracy analysis for random Pauli measurements
Here, we illustrate the advantage of the introduced multi-shot protocol, by investigating the variance and time cost enhancements against the original shadow protocol for fixed measurement budgets and accuracy requirements.
Generally, the total experimental time cost C tot depends on both the time to change measurement settings C M and the time to perform single-shot measurements C K . We assume, similar as in Ref. [14], C M = 1000 and C K = 1, which indicates that changing the measurement settings requires more time than performing measurements in a fixed basis. Thus, the measurement budget is approximately C tot = 1000M + M K. As indicated in Eq. (13), the variance to measure a Pauli operator P using random Pauli measurements depends on {M, K, w, tr(P ρ)}. Here, we directly set w = 2, tr(P ρ) = 0.2 and explore the scaling of the variance and total time cost with the shot-number K and setting-number M , as shown in Fig. 4. Consider a fixed measurement budget C tot = 4.4 × 10 5 , we find the optimal choice of (M, K) using our multi-shot protocol. It provides the lowest variance for estimating tr(P ρ), that is, Var ms = 0.001, with the optimal choice approximately M = 380 and K = 160. Under the same budget, as shown in Fig. 4, the variance using the original shadow protocol Var os is much larger than using the multi-shot method Var ms , i.e., Var os = 0.011 > Var ms = 0.001. In the meantime, if one fixes the variance requirement as Var = 0.001, the total experimental time cost of the original shadow is C ′ tot = 9.15 × 10 6 marked by the point of red triangle in Fig. 4, which is much larger than the one using our multi-shot technique. Therefore, the results of the example above show advantages of our multi-shot shadow protocol against the original shadow from both the measurement budgets and the accuracy aspects.