Sample-optimal classical shadows for pure states

We consider the classical shadows task for pure states in the setting of both joint and independent measurements. The task is to measure few copies of an unknown pure state $\rho$ in order to learn a classical description which suffices to later estimate expectation values of observables. Specifically, the goal is to approximate $\mathrm{Tr}(O \rho)$ for any Hermitian observable $O$ to within additive error $\epsilon$ provided $\mathrm{Tr}(O^2)\leq B$ and $\lVert O \rVert = 1$. Our main result applies to the joint measurement setting, where we show $\tilde{\Theta}(\sqrt{B}\epsilon^{-1} + \epsilon^{-2})$ samples of $\rho$ are necessary and sufficient to succeed with high probability. The upper bound is a quadratic improvement on the previous best sample complexity known for this problem. For the lower bound, we see that the bottleneck is not how fast we can learn the state but rather how much any classical description of $\rho$ can be compressed for observable estimation. In the independent measurement setting, we show that $\mathcal O(\sqrt{Bd} \epsilon^{-1} + \epsilon^{-2})$ samples suffice. Notably, this implies that the random Clifford measurements algorithm of Huang, Kueng, and Preskill, which is sample-optimal for mixed states, is not optimal for pure states. Interestingly, our result also uses the same random Clifford measurements but employs a different estimator.


Introduction
How many copies of an unknown state are required to construct a classical description of the state? The answer to this question will depend on several details: what constitutes an accurate description; what is already known about the state; and what restrictions are placed on the measurements of the state. Given the fundamental importance of this question, there has been significant prior work bounding the number of samples of the state required to perform this learning task in a variety of contexts.
The most well-known setting is called quantum state tomography, where the goal is to learn enough about the state to be able to completely reconstruct it: precisely, estimate the unknown d-dimensional quantum state to accuracy ϵ in the Schatten 1-norm. Tight upper and lower bounds for the number of copies required for this task are known: Θ(ϵ⁻²d³) copies of the state are needed with independent measurements [1], and Θ(ϵ⁻²d²) copies are needed when the unknown states can be simultaneously measured in a large joint measurement [2]. Independent measurements are easier to implement experimentally, while joint measurements explore what is possible with respect to the fundamental limits of quantum mechanics. A key takeaway in the joint measurement setting is that the algorithm for the upper bound achieves what would naïvely be the best possible result, given that a d-dimensional state has Θ(d²)-many independent parameters, and Θ(ϵ⁻²) samples are necessary to estimate any one parameter.
In some sense, the requirements of the quantum tomography question are quite rigid. For many applications, only some properties of the unknown state are important. Can we get away with fewer samples if we relax our notion of approximation? In particular, what if we only wish to learn the expected values of certain Hermitian observables? Aaronson gave a somewhat surprising answer to this question in a joint measurement setting called shadow tomography [3]: given M bounded observables {O_i}_{i=1}^M with ∥O_i∥ ≤ 1, estimate each Tr(O_i ρ) to within ϵ additive error.¹ In this setting, Aaronson showed that only Õ(ϵ⁻⁴ log⁴(M) log d) samples of the state are needed. Subsequent work by Bădescu and O'Donnell [4] improved this to Õ(ϵ⁻⁴ log²(M) log d), but there are still no matching lower bounds for this setting. That is, we do not know if we are extracting as much information about the unknown state as we can. In the independent measurement setting, Θ(min{M, d}/ϵ²) samples are necessary and sufficient [5].
One subtlety concerning these observable estimation tasks is whether or not the measurements are allowed to depend on the specific observables O_i. In shadow tomography, the measurements can depend on the observables, but an increasingly popular setting (inspired by the work of Huang, Kueng, and Preskill [6]) is one in which the observables O_i are unknown at the time of measurement. That is, the measurements must produce a classical description (called the classical shadow) from which the observable expected values can later be calculated. In their randomized Clifford measurement scheme, Huang, Kueng, and Preskill consider the independent measurement setting and show that Θ(Bϵ⁻² log M) copies of the unknown state are both necessary and sufficient provided that Tr(O_i²) ≤ B for all i (note that Tr(O_i²) ≤ d). Consider now how the classical shadows setting compares to the quantum state tomography setting with regard to the type of measurements allowed. In the quantum state tomography setting, we know that joint measurements allow us to extract more information from the state, yielding estimates of the unknown state with provably fewer samples than those required with independent measurements. In the classical shadows setting, however, it is not known how the type of measurement affects the number of samples required. Concretely, is it possible to perform the classical shadows task with fewer samples if we switch to joint measurements? We answer this question affirmatively in the setting of pure states.
Formally, we show the following: O((√B·ϵ⁻¹ + ϵ⁻²) log M) samples of the unknown pure state are sufficient for performing the classical shadows task with constant probability of failure. Compared to [6], this achieves almost a square-root reduction in sample complexity.
Remarkably, in analogy with the quantum state tomography setting, our joint measurement procedure is in some sense extracting the maximum amount of information possible. To see this, consider a simple setting in which B = d, ϵ is constant, and we only wish to estimate a single observable. Our algorithm uses O(√d) samples. However, Gosset and Smolin [7] show that even if you are given the state as an explicit density matrix, you cannot compress your description of the state down to fewer than Ω(√d)-many bits of information if you wish to estimate arbitrary observable expectation values. Notice, however, that to successfully execute the classical shadows task, one would first need to learn such a compressed description through measurement of the unknown state. A priori, the number of measurements required to do this could be much higher than the size of this compressed description. The fact that we find a matching upper bound implies that accessing the relevant information contained in the state is not the significant bottleneck.
We show that a similar phenomenon exists for arbitrary parameters B and ϵ. Namely, we refine the Gosset-Smolin lower bound for compression to Ω(√B·ϵ⁻¹)-many bits, which ultimately allows us to show that Ω(√B·ϵ⁻¹ + ϵ⁻²) samples of the state are required for the classical shadows task. Therefore, our joint-measurement algorithm above is sample-optimal (at least for a single observable and up to log factors).
Finally, we address the classical shadows question with pure states and independent measurements. We show that O((√(Bd)·ϵ⁻¹ + ϵ⁻²) log M) copies of the state suffice. It is worth noting that in certain parameter regimes, this upper bound is smaller than Θ(Bϵ⁻² log M). In other words, our algorithm uses fewer samples than the classical shadows algorithm of Huang, Kueng, and Preskill, which was designed for general mixed states. Indeed, their lower bound methods require the underlying state to be mixed.

The classical shadows task
We consider the classical shadows task introduced by Huang, Kueng, and Preskill [6]: given several copies of an unknown quantum state, produce a classical description of the state that is sufficiently representative to permit the reliable and accurate estimation of expectation values of some number of observables chosen from a broad class.
To formalise the task, let's begin with the class of observables we will use. Definition 1 (observable class). Obs(B) is the set of Hermitian observables O with ∥O∥∞ = 1 and Tr(O²) ≤ B. In summary, these observables have been scaled/normalized so that ∥O∥∞ = 1 and have a bound of B on their squared Frobenius norm Tr(O²). The latter condition is due to the fact that Tr(O²) is typically the dominant term in the sample complexity. We could also reasonably upper bound it by the rank of the observable, since Tr(O²) ≤ rank(O)·∥O∥² = rank(O). We remark that ∥O∥₂ = Tr(O²)^{1/2} and ∥O∥∞ are examples of Schatten p-norms, where p = 2 and p = ∞ respectively, defined in general as ∥A∥_p := Tr(|A|^p)^{1/p} for p ∈ [1, ∞). We will also use the Schatten 1-norm. Going forward, we write ∥O∥₁ for the 1-norm, ∥O∥ for the infinity norm, and prefer Tr(O²) over ∥O∥₂².
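As a concrete sanity check on these norms, the following sketch (my own illustration with arbitrary values of the dimension d and rank r, not taken from the paper) builds a random rank-r Hermitian observable, normalizes it so that ∥O∥ = 1, and verifies that Tr(O²) ≤ rank(O):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # illustrative dimension and target rank

# Build a random rank-r Hermitian (PSD) observable and normalize its operator norm.
G = rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))
O = G @ G.conj().T               # rank-r positive semidefinite matrix
O = O / np.linalg.norm(O, 2)     # spectral norm: now ||O|| (infinity norm) = 1

eigs = np.linalg.eigvalsh(O)
frob_sq = np.trace(O @ O).real   # Tr(O^2), the squared Frobenius norm

# With ||O|| = 1, each squared eigenvalue is at most 1, so Tr(O^2) <= rank(O).
assert abs(float(np.max(np.abs(eigs))) - 1.0) < 1e-9
assert frob_sq <= r + 1e-9
```

Scaling any Hermitian observable by its operator norm is exactly the normalization assumed by Obs(B), with B playing the role of an upper bound on Tr(O²).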
Definition 2 (Classical Shadows Task). The Classical Shadows Task consists of two separate phases, a measurement phase and an observable estimation phase, which are completed by two separate (randomized) algorithms, A_meas and A_est, respectively. In addition to the inputs below, each algorithm also depends on the four parameters s, B, ϵ, and δ. It's worth emphasizing that the input to the measurement algorithm is quantum (the state ρ^{⊗s}) and the output is classical (the classical shadow). This output is computed by measuring the input state with some POVM (with arbitrary post-processing). We say that A_meas and A_est constitute a valid protocol for the classical shadows task if their estimate is within ϵ of the expectation of the observable with probability at least 1 − δ over the randomness of A_meas and A_est.
Some may find it useful to think about the classical shadows task as a one-way communication protocol where one party (let's call her Melanie) is given copies of an unknown state and another party (say, Esteban) is given an observable. Melanie doesn't know Esteban's observable, and Esteban cannot send hints because we are assuming one-way communication from Melanie to Esteban, so there is only one course of action: Melanie must measure her unknown state and send (over a classical channel) a description of the state from which Esteban can estimate the expected value of his given observable.
Throughout this paper, we will focus on the classical shadows task with unknown pure states. This motivates the following definitions. Definition 3 (Sample Complexity of the Classical Shadows Task). Let Shadows(B, ϵ, δ) be the minimum number of samples s required to successfully carry out the classical shadows task on pure states with the set of observables Obs(B), to accuracy ϵ, and with failure probability at most δ.
Sometimes we will omit δ and write Shadows(B, ϵ) to denote the minimum number of samples to achieve these tasks with some constant probability of failure, say, 0.001. Definition 4 (Classical Shadows with Independent Measurements). Let I-Shadows(B, ϵ, δ) be the sample complexity for the classical shadows task with pure states when the measurement algorithm can only make independent measurements on the input state, that is, when the measurement POVM is the tensor product of POVMs on single copies of the state. These POVMs do not have to be identical, but the entire state must be measured at the same time; in other words, the output from a measurement on one copy of the state cannot influence the measurement on another.
We note that there are many possible variants of the sample complexity of the classical shadows task that we haven't given individual names. Most notable are the settings where the unknown states are mixed (rather than pure) and/or the measurements are allowed to be adaptive (while still acting on single copies of the state).

Summary of results
Our main result is to prove matching upper and lower bounds on the sample complexity of performing the classical shadows task with respect to joint measurements and pure states.
Notice that Theorem 5 consists of separate upper and lower bound results (for constant δ). These match up to logarithmic factors in B and ϵ⁻¹, and the technical relationship between B, ϵ, and d is only required for the lower bound. In Section 3, we will prove the upper bound, where we will also show that the dependence on the failure probability δ goes as log(1/δ). We note that this dependence on δ implies that there are efficient protocols for the calculation of several observables simultaneously: if the classical shadows task fails with probability at most δ on a single observable, then it fails with probability at most Mδ on one or more out of M observables by the union bound. In Section 4, we will prove the lower bound, where only the ϵ⁻² term will scale with log(1/δ).
We also prove an upper bound on the sample complexity of performing the classical shadows task with respect to independent measurements and pure states. Our upper bound can be compared to the matching upper and lower bounds of Huang, Kueng, and Preskill [6], which apply to independent measurements and general states. In certain parameter regimes, our upper bound achieves a smaller sample complexity than the lower bound in [6], which implies that in the independent measurement setting, the classical shadows task has smaller sample complexity for pure states. Theorem 6. For all ϵ, δ > 0, I-Shadows(B, ϵ, δ) = O((√(Bd)·ϵ⁻¹ + ϵ⁻²) log(δ⁻¹)). We discuss and prove Theorem 6 in Section 5.
Finally, we note that in all of our algorithms, the estimator ρ̂ we use for the unknown state ρ is not itself a proper state. In Appendix A, we show that this is a necessary price for the favorable sample complexity enjoyed by classical shadows schemes. Informally, we show that even for observables in Obs(1), learning an estimate ρ̂ that is a proper state, to sufficient accuracy to solve the classical shadows task via the formula Tr(Oρ̂), requires a sample complexity that scales linearly in d, the dimension of the unknown state.

Preliminaries
Here we cover key background material related to Haar random states, their moments, and the symmetric subspace. Throughout, we're working with qudits of dimension d ≥ 2 unless otherwise specified. Since the unitary group U(d) acts on the Hilbert space of dimension d, it has a corresponding Haar measure which is invariant under the action of the group. Haar random states, sampled proportionally to this measure, are ubiquitous in quantum information and essential for defining our measurement in Section 3.
To perform the necessary calculations on Haar random states, we need to discuss their moments and some ancillary concepts. Definition 7. For integer k ≥ 1, the k-th moment of an ensemble E of quantum states is M_k(E) := E_{|ψ⟩∼E}[(|ψ⟩⟨ψ|)^{⊗k}]. An ensemble E is a (state) t-design if its moments for 1 ≤ k ≤ t are identical to those of the Haar distribution (see Lemma 10). Definition 8 (permutation operator). Given a permutation π ∈ S_s (for s ≥ 1), define a permutation operator W_π ∈ C^{d^s × d^s} such that W_π(|ψ₁⟩ ⊗ ··· ⊗ |ψ_s⟩) = |ψ_{π⁻¹(1)}⟩ ⊗ ··· ⊗ |ψ_{π⁻¹(s)}⟩, and extend by linearity. That is, W_π acts on (C^d)^{⊗s} by permuting the qudits, sending the qudit in position i to position π(i).
Definition 9 (symmetric subspace). The symmetric subspace of an s-qudit system (C^d)^{⊗s} is the subspace invariant under W_π for all π ∈ S_s. We use κ_s to denote its dimension and define Π_sym^{(s)} to be the projector onto it (notationally omitting the dependence on d, the dimension of the qudit).
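For small systems, both the permutation operators and the dimension formula κ_s = C(d+s−1, s) can be checked directly. The sketch below is my own illustration (the helper `permutation_operator` is hypothetical, not from the paper), using the standard fact that the projector onto the symmetric subspace is the average (1/s!) Σ_{π∈S_s} W_π:

```python
import itertools, math
import numpy as np

def permutation_operator(perm, d):
    """W_pi on (C^d)^{tensor s}: the qudit in position j moves to position perm[j]."""
    s = len(perm)
    W = np.zeros((d**s, d**s))
    for idx in itertools.product(range(d), repeat=s):
        out = [0] * s
        for j, comp in enumerate(idx):
            out[perm[j]] = comp          # qudit j lands in slot perm[j]
        W[np.ravel_multi_index(tuple(out), (d,)*s),
          np.ravel_multi_index(tuple(idx), (d,)*s)] = 1
    return W

d, s = 3, 3
# Projector onto the symmetric subspace: average of all permutation operators.
perms = list(itertools.permutations(range(s)))
P = sum(permutation_operator(p, d) for p in perms) / math.factorial(s)

kappa = round(np.trace(P))               # dimension of the symmetric subspace
assert kappa == math.comb(d + s - 1, s)  # kappa_s = C(d+s-1, s) = 10 here
```

The trace of the projector recovers κ_s because each W_π contributes d^{#cycles(π)} to the trace, and the average of these counts the symmetric basis states.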
We have two characterizations of the symmetric subspace, known from, e.g., [8]: Lemma 10. For every s ≥ 1, Π_sym^{(s)} = (1/s!) Σ_{π∈S_s} W_π = κ_s ∫ (|ψ⟩⟨ψ|)^{⊗s} dψ, where the integral is over the Haar measure, Π_sym^{(s)} is the projector onto the symmetric subspace, and W_π is the operator that permutes s qudits by an s-element permutation π.
We will often need to compute the (partial) trace of (A₁ ⊗ A₂ ⊗ ··· ⊗ A_s)W_π for some linear operators A₁, ..., A_s ∈ C^{d×d}. It turns out that there is an extremely useful tensor-network-based pictorial representation that simplifies these calculations. Let us give a brief introduction to those techniques, though readers may also find more thorough treatments useful [9, 10].
To start, we draw a single d-dimensional linear operator A = Σ_{i,j∈[d]} a_{i,j}|i⟩⟨j| as a tensor block with one leg for the input index and one leg for the output index of A. Given another tensor B = Σ_{i,j∈[d]} b_{i,j}|i⟩⟨j|, composition, tensor product, and trace are each expressed by wiring the appropriate legs of the blocks together (figures omitted). The reason the tensor network picture is particularly nice for dealing with traces of W_π terms is that each W_π term is simply a permutation of wires in the tensor network picture. The key feature of tensor networks is that only the topology of the network matters, so we can simplify tensor networks just by moving the elements around. For example, consider a common partial trace that will arise in this paper: Tr₁((A ⊗ B)W_(1 2)). Drawing the tensor network, we can push the B tensor through the SWAP and around the trace loop to see that it is composed with A. In other words, we have just shown the identity Tr₁((A ⊗ B)W_(1 2)) = BA. As a generalization, we have the following useful fact. Fact 2. For any A₁, ..., A_n, we have Tr_{-1}((A₁ ⊗ ··· ⊗ A_n)W_{(1 2 ··· n)}) = A₁A_nA_{n−1} ··· A₂, where Tr_{-1} indicates the partial trace of all but the first qudit. Thus, Tr((A₁ ⊗ ··· ⊗ A_n)W_{(1 2 ··· n)}) = Tr(A₁A_nA_{n−1} ··· A₂). Proof. The fact is best seen with a small example: when n = 5, drawing the tensor network diagram and sliding each block around its trace loop yields the identity.
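These wire-pushing identities are easy to check numerically. The following sketch (my own verification under the conventions above, not code from the paper) confirms both the full trace Tr((A ⊗ B)W_(1 2)) = Tr(AB) and the partial trace Tr₁((A ⊗ B)W_(1 2)) = BA:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

A = rng.normal(size=(d, d))
B = rng.normal(size=(d, d))

# SWAP = W_(1 2): exchanges the two qudits.
S = np.zeros((d*d, d*d))
for i in range(d):
    for j in range(d):
        S[i*d + j, j*d + i] = 1

M = np.kron(A, B) @ S

# Full trace: Tr((A tensor B) W_(1 2)) = Tr(AB).
assert np.isclose(np.trace(M), np.trace(A @ B))

# Partial trace over the first qudit: Tr_1((A tensor B) W_(1 2)) = B A.
T1 = M.reshape(d, d, d, d).trace(axis1=0, axis2=2)  # contract first subsystem
assert np.allclose(T1, B @ A)
```

The reshape exposes the two subsystems so that `trace(axis1=0, axis2=2)` contracts exactly the first qudit's input and output wires, which is the algebraic analogue of closing that wire into a loop.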

Joint Measurement Upper Bound
The goal of this section is to prove the following upper bound for the sample complexity of classical shadows for pure states and joint measurements.
The proof of Theorem 11 is constructive; given B, ϵ, δ and d, we specify the number of samples and a pair of algorithms A meas and A est that solve the classical shadows task with that many samples.
In brief, the construction is as follows. We give a measurement M_s on s copies of ρ, where the outcome of the measurement is a classical description of a pure state Ψ. We apply an affine transformation to the outcome Ψ to produce an unbiased "shadow" estimator: a unit-trace Hermitian matrix ρ̂ such that E[ρ̂] = ρ. Increasing the number of samples, s, suppresses the additive error ϵ and the failure probability δ by a factor of s^{−O(1)}. To improve this to an inverse exponential suppression in the failure probability, we repeat the entire procedure k = O(log(δ⁻¹)) times and take the median of the batch estimates, akin to the median-of-means method [11, 12, 6]. A pseudocode description is given in Algorithms 1 and 2.
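To illustrate only the amplification step, here is a toy sketch of the median-of-batches trick. The batch values are synthetic stand-ins (each batch is assumed to be ϵ-accurate with probability 9/10), not outputs of the actual quantum measurement:

```python
import numpy as np

def median_of_batches(batch_estimates):
    """Combine independent batch estimates of Tr(O rho) by taking the median.
    If each batch is epsilon-accurate with probability > 1/2, the median fails
    only when at least half the batches fail, which a Chernoff bound makes
    exponentially unlikely in the number of batches k."""
    return float(np.median(batch_estimates))

rng = np.random.default_rng(2)
truth, eps, k = 0.5, 0.1, 101          # illustrative values only

# Each synthetic batch is epsilon-accurate with probability 0.9,
# and wildly wrong otherwise.
good = rng.random(k) < 0.9
batches = np.where(good,
                   truth + rng.uniform(-eps, eps, k),
                   rng.uniform(-1, 1, k))

# With more than k/2 accurate batches, the median is epsilon-accurate.
assert abs(median_of_batches(batches) - truth) <= eps
```

In Algorithm 2, the batch estimates Tr(Oρ̂(i)) would play the role of the synthetic values here.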
Algorithm 1: Algorithm for A_meas of Theorem 11 (pseudocode omitted). We will be interested in the setting where ρ is pure,² and therefore ρ^{⊗s} is in the symmetric subspace, so we will never see the "fail" outcome: it exists solely to make the POVM sum/integrate to I.
One might be concerned that the standard symmetric joint measurement is constructed from the Haar measure, resulting in a continuum of outcomes. This is technically inconsistent with Definition 2, where the measurement must output a finite-length bit string. However, it will turn out that our analysis (cf. Theorem 11) only requires the states that appear in the POVM to form an (s+2)-design, where s is the number of samples jointly measured. That is, it suffices to replace the continuous POVM M_s, whose elements are proportional to (|ψ⟩⟨ψ|)^{⊗s} dψ, with a finite POVM built from the states {|ψ_i⟩} of an (s+2)-design, where the associated weights p_i ≥ 0 define a finite probability distribution.
For some perspective, consider the independent measurement setting, in which s = 1. By the above observation, we require the measurement to form a 3-design. Since the set of multiqubit stabilizer states forms a 3-design, we recover the efficient measurement protocol of [6]. That said, our measurements typically involve many copies of the state, resulting in a large s. In such cases, we must use much more complicated constructions of designs (see, e.g., [13, 14, 15]). Nevertheless, these constructions result in a finite POVM that can, at least in principle, be implemented with a projective measurement using poly(d, log(1/ϵ))-many ancillas [16].

Analysis
After defining the measurement, the estimator, and how many samples we need, the only remaining technical component is to bound the probability of failure. This ultimately comes down to Chebyshev's inequality: Pr[|X − E[X]| ≥ ϵ] ≤ Var(X)/ϵ². Hence, we need to calculate the mean and variance of the random variable Tr(Oρ̂). To be precise, let ρ be a pure state and suppose we measure ρ^{⊗s} with the standard symmetric joint measurement M_s. Let Ψ be the density matrix random variable for |ψ⟩⟨ψ|, where ψ is the outcome of the measurement. Let's start with the mean. Lemma 13 (First moment). For measurement M_s on pure state ρ^{⊗s}, we have E[Ψ] = (I + sρ)/(d + s). Proof. To start, let's express the expectation as a Haar integral using the definition of M_s. Using the identity A Tr(B) = Tr₂(A ⊗ B) for all square matrices A and B, we can apply Lemma 10 to compute the integral, leaving a sum over permutations π ∈ S_{s+1}. We attack the right-hand side by evaluating Tr_{-1}(W_π(I ⊗ ρ^{⊗s})) for each π. In particular, we will show that Tr_{-1}(W_π(I ⊗ ρ^{⊗s})) equals I if π(1) = 1, and equals ρ otherwise. To do this, we take the cycle decomposition of π and analyze each cycle separately. Notice that any cycle not involving position 1 is completely traced out, and the cycle operator acts on a tensor power of ρ only, so Fact 2 says the trace is Tr(ρ^k) = Tr(ρ) = 1 (since ρ is pure). Thus, only the cycle through position 1 matters. If π(1) = 1, then this cycle is trivial, and the result is I. Otherwise, the cycle visits k ≥ 1 copies of ρ, leading to the product ρ^k = ρ. There are s! permutations which fix 1 (i.e., π(1) = 1) and hence s·s! which do not, so we conclude that Σ_{π∈S_{s+1}} Tr_{-1}(W_π(I ⊗ ρ^{⊗s})) = s!·I + s·s!·ρ, and therefore E[Ψ] = (κ_s/κ_{s+1})·(s!·I + s·s!·ρ)/(s+1)! = (I + sρ)/(d + s). Note that Tr(E[Ψ]) = 1, as a sanity check.
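The permutation-counting step in this proof can be verified numerically for small s. The sketch below is my own check (the helpers are hypothetical implementations of W_π and Tr_{-1} under the convention in Definition 8) that Σ_{π∈S_{s+1}} Tr_{-1}(W_π(I ⊗ ρ^{⊗s})) = s!·I + s·s!·ρ for a pure qubit state:

```python
import itertools, math
import numpy as np

def perm_op(perm, d):
    """Permutation operator W_pi: the qudit in slot j moves to slot perm[j]."""
    s = len(perm)
    M = np.zeros((d**s, d**s), dtype=complex)
    for idx in itertools.product(range(d), repeat=s):
        out = [0] * s
        for j, c in enumerate(idx):
            out[perm[j]] = c
        M[np.ravel_multi_index(tuple(out), (d,)*s),
          np.ravel_multi_index(tuple(idx), (d,)*s)] = 1
    return M

def ptrace_keep_first(M, d, n):
    """Tr_{-1}: trace out qudits 2..n, keeping only the first."""
    T = M.reshape((d,)*(2*n))
    for _ in range(n - 1):
        k = T.ndim // 2                         # current number of subsystems
        T = np.trace(T, axis1=k-1, axis2=2*k-1) # trace out the last subsystem
    return T

d, s = 2, 2
v = np.array([1.0, 1.0j]) / np.sqrt(2)          # an arbitrary pure qubit state
rho = np.outer(v, v.conj())

op = np.kron(np.eye(d), np.kron(rho, rho))      # I tensor rho^{tensor 2}
total = sum(ptrace_keep_first(perm_op(p, d) @ op, d, s + 1)
            for p in itertools.permutations(range(s + 1)))

# Claim from the proof: the sum equals s!*I + s*s!*rho  (here 2I + 4*rho).
expected = math.factorial(s) * np.eye(d) + s * math.factorial(s) * rho
assert np.allclose(total, expected)
```

Each permutation fixing position 1 contributes I, and each of the s·s! others contributes ρ, exactly as the cycle-decomposition argument predicts.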
We now turn to the variance calculation, which depends on the second moment of the estimator. Lemma 14 (Second moment). For measurement M_s on pure state ρ^{⊗s}, we have E[Ψ ⊗ Ψ] = (κ_s/κ_{s+2})·(1/(s+2)!) Σ_{π∈S_{s+2}} Tr_{-1,2}(W_π(I ⊗ I ⊗ ρ^{⊗s})), where the partial trace Tr_{-1,2} now preserves the first two qudits. In Figure 1, for the special case of s = 1, we show a complete derivation of how this trace simplifies using the tensor network notation, which may be useful to some readers before proceeding to the more general proof.
Let us evaluate the sum term-by-term. We divide the permutations into two types: those where 1 and 2 appear in separate cycles (type A) and those where 1 and 2 appear in the same cycle (type B). Consider the type A permutations first. As before, we end up getting I or ρ in each of the first two positions, depending on whether 1 and 2 were fixed by the permutation. The combinatorics is similar but not identical: there are (s+1)! permutations which fix 1, and of those, s! fix 2 and s·s! do not. Likewise, s·s! fix 2 but not 1. The remainder of the type A permutations fix neither 1 nor 2, and to count these we need a fact. Fact 3. The map π ↦ (1 2)π is a bijection between the type A and type B permutations of S_{s+2}. In other words, exactly half of all permutations (that is, (s+2)!/2) are type A, and half are type B. It follows that there are (s(s−1)/2)·s! type A permutations such that π(1) ≠ 1 and π(2) ≠ 2. Therefore, the overall contribution of the type A permutations is equal to s! times I ⊗ I + s(ρ ⊗ I) + s(I ⊗ ρ) + (s(s−1)/2)(ρ ⊗ ρ). Fortunately, we do not have to repeat this counting argument for the type B permutations. The bijection from Fact 3 decomposes each type B permutation π as (1 2)π′ where π′ is type A, and so the type B contribution is W_(1 2) times the type A contribution. Therefore, we can multiply our result for the type A permutations by I + W_(1 2) = 2Π_sym^{(2)} to get the total. The result follows from some careful accounting of the scalar factors.
Proof. Our goal will be to compute the mean and variance of the estimate Tr(Oρ̂) in order to apply Chebyshev's inequality. By Lemma 13, we have E[Tr(Oρ̂)] = Tr(Oρ). Putting aside the s⁻² factor for now, the (squared) first moment term is (Tr(O) + s·Tr(Oρ))². For the second moment term, we use Tr(OΨ)² = Tr((O ⊗ O)(Ψ ⊗ Ψ)) and Lemma 14 to write the second moment as a sum over the permutations in S_{s+2}. Recall that 2Π_sym^{(2)} = I + W_(1 2), so to simplify, consider the contribution of those two terms. In fact, because ρ is pure, we have³ that Tr((Oρ)²) = Tr(Oρ)². Combining all of the above, we arrive at a bound for the (scaled) variance of Tr(OΨ), where the last inequality uses Hölder's inequality.
Next consider the ∥O₀²∥ term, where O₀ := O − Tr(O)I/d. We have that Tr(O) ≤ ∥O∥d, and so the largest eigenvalue (in absolute value) of O − Tr(O)I/d is at most 2∥O∥. We get ∥O₀²∥ ≤ 4∥O∥². Hence Var(Tr(Oρ̂)) ≤ (1/s²)(Tr(O²) + 8s∥O²∥), and the result follows by Chebyshev's inequality.
At last, we can prove the main theorem for this section.
Proof of Theorem 11. Consider an arbitrary observable O, and use Corollary 15 to bound the probability that a single batch estimate is wrong: Pr[|Tr(Oρ̂(i)) − Tr(Oρ)| > ϵ] ≤ (Tr(O²) + 8s∥O²∥)/(s²ϵ²). Suppose we want this probability to be less than some constant p < 1/2; we leave it to the reader to check that s = O(1/(ϵ²p) + √B/(ϵ√p)) samples suffice, and note that s is chosen accordingly in Algorithm 1.
Recall that our final estimate E is the median of the k batch estimates Tr(Oρ̂(i)). Assume there are an odd number of batches, so the median is actually some Tr(Oρ̂(i)). If E is a bad estimate, i.e., |E − Tr(Oρ)| > ϵ, then at least k/2 of the batch estimates are wrong: either E and all the estimates higher than it, or E and all the estimates lower than it.
The batches are independent, so a Chernoff bound controls the probability of seeing at least k/2 failures.
Setting this less than the failure probability δ, we find that k = O(log(δ⁻¹)) batches suffice. Again, we note that k is set accordingly in Algorithm 1.

Discussion
Let us compare this result with the original classical shadows protocol of Huang, Kueng, and Preskill [6]. Their algorithm measures each copy of ρ with M₁, producing unbiased single-copy estimates ρ̂₁, ..., ρ̂_s for ρ, which are then averaged into a batch estimate ρ̂ = (1/s) Σ_{i=1}^s ρ̂_i. Given the observable O, the estimate is then Tr(Oρ̂), or the median of several batches if necessary to reduce the probability of failure.
We have just seen that the variance of a single-copy estimate is Var(Tr(Oρ̂_i)) ≤ Tr(O²) + Tr(O²ρ), and averaging s estimates together reduces the variance by a factor of 1/s. On the other hand, our measurement with M_s provides an unbiased estimate with variance at most (1/s²)(Tr(O²) + 8s∥O²∥). Since Tr(O²ρ) ≤ 1, we see that the quadratic denominator under Tr(O²) (which is the dominant term) is making all the difference.
Joint Measurement Lower Bound
The lower bound we prove in this section is Ω(ϵ⁻² log(δ⁻¹) + √B·ϵ⁻¹/log(B+1)). Notice that this bound matches the O((√B·ϵ⁻¹ + ϵ⁻²) log(δ⁻¹)) upper bound up to a log(B) and a log(1/δ) factor. We prove this as two separate lower bounds: Ω(ϵ⁻² log(δ⁻¹)) and Ω(√B·ϵ⁻¹/log(B+1)). The first lower bound, Ω(ϵ⁻² log(δ⁻¹)), is derived via a reduction from the problem of distinguishing two pure states, ρ₀ and ρ₁, at trace distance 2ϵ from each other. We then use the known performance of the optimal measurement (the Helstrom measurement).
The second lower bound is shown via a reduction from a problem in communication complexity known as Boolean Hidden Matching [21]. We will show that any protocol for the classical shadows task implies a protocol for the Boolean Hidden Matching problem, which has known communication complexity lower bounds. These communication lower bounds will imply that the classical shadow must contain a significant amount of information. However, Holevo's theorem gives an upper bound on the amount of information gained through measurement. Therefore, in order to successfully complete the classical shadows task, many copies of the unknown state are required.
Ω(ϵ⁻² log(δ⁻¹)) lower bound
The proof of the lower bound uses known results relating the trace distance between two states to our ability to distinguish the states by observables or binary measurements. In particular, the maximum gap for the expectation of a positive semi-definite observable is equal to the trace distance between the states: Lemma 17. For arbitrary states ρ and σ, max_{0 ⪯ O ⪯ I} Tr(O(ρ − σ)) = ½∥ρ − σ∥₁. Furthermore, there is an optimal O achieving this maximum with rank at most ½ rank(ρ − σ).

Proof. Write ρ − σ as the difference of orthogonal positive semi-definite parts, and let O₊ and O₋ be the projectors onto their respective supports. Since ρ − σ is traceless, it follows that Tr(O₊(ρ − σ)) = Tr(O₋(σ − ρ)) = ½∥ρ − σ∥₁, so either projector achieves the maximum. Since O₊ and O₋ are orthogonal, the rank of their sum, rank(ρ − σ), is the sum of their ranks. Hence, we can take whichever of O₊ and O₋ has rank at most ½ rank(ρ − σ).
Separately, we know the optimal measurement for distinguishing a uniformly randomly chosen ρ or σ: Lemma 18 (Helstrom measurement [22]). The optimal measurement for distinguishing states ρ and σ succeeds with probability ½(1 + ½∥ρ − σ∥₁). Proof. Let ρ₀ and ρ₁ be pure states with trace distance ½∥ρ₀ − ρ₁∥₁ = 2ϵ. We claim that a protocol for the classical shadows task to ϵ-approximate the expected values of observables of rank 1 with probability of failure at most δ can be used to distinguish the states ρ₀ and ρ₁ with probability of failure at most δ. (Indeed, by Lemma 17, ρ₀ and ρ₁ are distinguished by an observable of rank at most ½ rank(ρ₀ − ρ₁) = 1 whose expectation gap is 2ϵ, so an ϵ-accurate estimate of its expectation identifies the state.) We can view this classical shadows protocol as a binary measurement distinguishing ρ₀^{⊗s} and ρ₁^{⊗s}. Since it succeeds with probability 1 − δ, the optimal distinguishing measurement from Lemma 18 must do at least as well, so 1 − δ ≤ ½(1 + √(1 − Tr(ρ₀ρ₁)^s)), where we have used the relation (½∥ρ₀ − ρ₁∥₁)² = 1 − Tr(ρ₀ρ₁) between trace distance and fidelity for pure states [23]. Rearranging, we have (1 − 4ϵ²)^s ≤ 4δ(1 − δ). Taking logs and using 1 − 1/x ≤ ln x, we conclude s = Ω(ϵ⁻² log(δ⁻¹)).
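The quantities in this argument are easy to compute directly. The sketch below (my own illustration, not from the paper) evaluates the Helstrom success probability ½(1 + ½∥ρ₀ − ρ₁∥₁) for a pair of pure qubit states and checks the pure-state relation (½∥ρ₀ − ρ₁∥₁)² = 1 − Tr(ρ₀ρ₁):

```python
import numpy as np

def helstrom_success(rho0, rho1):
    """Optimal probability of distinguishing rho0 from rho1 (uniform prior):
    1/2 * (1 + (1/2)||rho0 - rho1||_1)."""
    eigs = np.linalg.eigvalsh(rho0 - rho1)
    trace_dist = 0.5 * np.sum(np.abs(eigs))   # (1/2)||rho0 - rho1||_1
    return 0.5 * (1 + trace_dist)

# Two pure qubit states separated by angle theta (illustrative value).
theta = 0.3
v0 = np.array([1.0, 0.0])
v1 = np.array([np.cos(theta), np.sin(theta)])
rho0, rho1 = np.outer(v0, v0), np.outer(v1, v1)

# Recover the trace distance from the success probability and verify the
# pure-state trace-distance/fidelity relation.
td = 2 * helstrom_success(rho0, rho1) - 1
assert np.isclose(td**2, 1 - np.trace(rho0 @ rho1))
```

For these states the overlap is cos²θ, so the trace distance is sin θ, matching the relation used in the proof above.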

Ω(√B·ϵ⁻¹/log(B + 1)) lower bound
To prove this lower bound, we leverage the perspective that the classical shadows task is fundamentally a one-way communication problem: recall the setup of the classical shadows task (cf. Section 1.1) where Melanie measures copies of an unknown state ρ and sends a classical message to Esteban that allows him to estimate the expectation of some observable O on ρ. Intuitively, measuring more copies of ρ means the message will contain more information about ρ. Conversely, if we can prove that Melanie's message must contain a lot of information, we can prove that she must have measured many copies of ρ. In other words, there is a tight correspondence between the sample complexity and the one-way classical communication complexity of the classical shadows task. Formalizing this correspondence is somewhat tricky, so we leave the precise details for later (in particular, Section 4.2.1). However, once this correspondence is established, the high-level structure of the proof is relatively straightforward.
Our starting point is a one-way communication task called "Boolean Hidden Matching". As in the classical shadows task, there are two parties involved: Alice and Bob. Alice has a labeled graph and Bob has a "partial matching" (a collection of vertex-disjoint edges from the graph). Together, these encode a secret bit.⁶ Bob doesn't know the labels on the graph, so Alice's goal is to send him a classical message from which he can extract the encoded bit. The work [21] shows a lower bound on the number of bits that Alice must send to be successful: namely, she must send Ω(√(n/α)) bits, where n is the number of vertices in Alice's graph and α is the fraction of edges in Bob's partial matching. Our goal will be to take this lower bound for Boolean Hidden Matching and turn it into a lower bound for the classical shadows task. To do this, we create an ensemble of states (corresponding to labeled graphs) and observables (corresponding to partial matchings) such that computing the expected value of an observable with a state solves the Boolean Hidden Matching problem for the corresponding graph and matching. In other words, if Alice and Bob want to solve the Boolean Hidden Matching problem, they can first create the corresponding states and observables, and then use a protocol for classical shadows. We give a depiction of this reduction in Figure 2.
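To make the encoding concrete, here is a toy instance of Boolean Hidden Matching in the formulation I am assuming (standard in the literature, but not spelled out in the text above: each matched edge (i, j) carries a label w_e with x_i ⊕ x_j ⊕ w_e equal to the secret bit b; the parameters n and m are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 12, 4                     # n vertices with hidden labels, m matched edges

x = rng.integers(0, 2, n)        # Alice's hidden vertex labels
b = 1                            # the secret bit to be encoded

# Bob's partial matching: m vertex-disjoint edges, with edge labels w chosen
# so that x_i XOR x_j XOR w_e equals b on every matched edge.
verts = rng.permutation(n)[:2 * m]
edges = verts.reshape(m, 2)
w = np.array([x[i] ^ x[j] ^ b for i, j in edges])

# If Bob ever learns x (e.g., from Alice's message), he recovers b from any edge.
decoded = {int(x[i] ^ x[j] ^ w[e]) for e, (i, j) in enumerate(edges)}
assert decoded == {b}
```

The reduction in the text replaces "Bob learns x" with "Bob estimates an observable expectation on a state encoding x", so the communication lower bound transfers to the classical shadows task.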
To tie everything together, we appeal to the equivalence between the sample complexity of the classical shadows task and the one-way communication complexity.Namely, Alice must measure a number of copies of her state (roughly) proportional to the number of bits she wants to send Bob.Since we have a lower bound on the number of bits she must send Bob, we have a lower bound on the number of copies she must measure.This completes the proof.
This overall idea draws considerable inspiration from the work of Gosset and Smolin on compressing classical descriptions of quantum states [7]. Our proof can be seen as a generalization of their techniques.
The remainder of this section is devoted to formalizing the above ideas. Section 4.2.1 introduces one-way communication complexity, culminating in a powerful theorem connecting the number of bits exchanged in a communication protocol and the amount of Shannon information exchanged in the protocol. In Section 4.2.2, we give the formal definition of the Boolean Hidden Matching problem and fill in the missing details from the proof outline above.

One-way communication complexity
Because our proof is based on principles from communication complexity, let's briefly introduce that topic. We are interested in one-way communication protocols where two parties, Alice and Bob, are trying to jointly compute some function f : X × Y → {0, 1}. Alice is given some input x ∈ X and Bob is given some input y ∈ Y. Alice's goal is to send a single message m ∈ {0, 1}* to Bob, so that he can compute f(x, y). Of course, she could choose to send her entire input x, but in many cases it may be possible to communicate fewer bits and still be successful.
To be precise about the size of the message Alice must send, let X and Y be the (possibly correlated) random variables for the inputs of Alice and Bob, respectively, and let M be the random variable for Alice's message to Bob. Notice that implicit in M is Alice's communication strategy, which may be an arbitrary (randomized) function of her input. Let's start with the easiest setting, where Alice and Bob run deterministic algorithms.

Definition 20 (Deterministic One-Way Communication Complexity). D^{(X,Y)}_δ(f), the bounded-error deterministic one-way communication complexity of f, is the minimum number of bits that Alice must send to Bob to compute f with at most δ probability of error whenever their inputs are chosen according to the distribution (X, Y).
A natural variant of classical one-way protocols is when Alice and Bob are allowed to run randomized algorithms. There are two settings: private-coin protocols, where Alice and Bob each have access to private random strings; and public-coin protocols, where Alice and Bob also have access to a shared random string along with their private strings. It will not be critical to completely understand the nuances of the various types of protocols for our proof, but we define them in order to precisely state the theorems on which the lower bound rests.

Definition 21 (Randomized One-Way Communication Complexity). R_δ(f), the bounded-error randomized one-way communication complexity of f, is the minimum number of bits that Alice must send to Bob with a public-coin protocol to compute f over all possible inputs with failure probability at most δ.
While randomized strategies may seem more powerful than deterministic strategies, Yao's minimax principle shows that there is always some input distribution for which the randomized and deterministic complexities coincide:

Theorem 22 (Yao's minimax principle [24]). R_δ(f) = max_{(X,Y)} D^{(X,Y)}_δ(f).

It turns out that we will eventually be interested in the amount of information contained in Alice's message M, not just its length, which is what is measured by the communication complexity. That said, these two quantities are intuitively related: if the information I(M : X) that Alice's message M reveals about her input X is much lower than the number of bits she is communicating, she should be able to send a smaller message and still be successful. The following theorem formalizes this message compression idea:

Theorem 23 (Message compression). D^{(X,Y)}_δ(f) = O(min I(M : X) + 1), where the minimization is over all one-way private-coin protocols for f with input distribution (X, Y) and probability of error at most δ.
In the next section, we will show that classical shadows must also contain a lot of information, which will be the basis of our lower bound.
Recall that our lower bound technique is to show that a classical shadows strategy with few samples implies a communication protocol for the Boolean Hidden Matching function with low complexity. To do this, let's look more closely at the BHM_α function.
It will be useful to describe the function directly as a communication problem with inputs x ∈ X for Alice and y ∈ Y for Bob. Alice is given (0, 1)-assignments for the n vertices of a graph, and Bob is given (0, 1)-assignments to αn vertex-disjoint edges in the graph. A set of vertex-disjoint edges from a graph is called a matching, hence the name of the function. Importantly, the function is only defined on inputs for Alice and Bob that satisfy the following promise: for each edge in the matching, the parity of the connected vertices (from Alice's input) plus Bob's edge bit assignment is some constant b ∈ {0, 1}. The output of the function is then defined as this bit b.
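To make the promise concrete, here is a small sketch (our own illustration; names and conventions are ours) that generates a random BHM_α instance satisfying the promise with secret bit b, and checks that every edge reveals the same bit:

```python
import random

def bhm_instance(n, alpha, b, rng):
    # Build a random BHM_alpha instance satisfying the promise with secret bit b.
    x = [rng.randrange(2) for _ in range(n)]            # Alice's vertex labels
    verts = rng.sample(range(n), 2 * int(alpha * n))    # endpoints of the matching
    matching = list(zip(verts[::2], verts[1::2]))       # alpha*n vertex-disjoint edges
    w = [x[u] ^ x[v] ^ b for (u, v) in matching]        # Bob's edge labels
    return x, matching, w

rng = random.Random(1)
b = 1
x, matching, w = bhm_instance(16, 0.25, b, rng)
# Promise check: x_u XOR x_v XOR w_e equals the same bit b for every edge.
parities = {x[u] ^ x[v] ^ we for (u, v), we in zip(matching, w)}
print(parities)   # {1}
```

Bob sees only `matching` and `w`; recovering b requires information about Alice's labels x, which is exactly what her message must convey.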

Promise: there exists a bit b ∈ {0, 1} such that x_u ⊕ x_v ⊕ w_e = b for every edge e = (u, v) in the matching, where w_e is Bob's bit assignment to edge e. The output is BHM_α(x, y) := b.

In the setting where α is constant, [21] show that the quantum communication complexity of the Boolean Hidden Matching problem is low: Alice only needs to send a (log n)-qubit state. Gosset and Smolin [7] notice that this implies the existence of a set of states and observables whose expectation values give solutions to the Boolean Hidden Matching function. We generalize this observation to the non-constant α setting below:

Theorem 25. There is a set of states {ρ_x}_{x∈X} on C^n and observables {O_y ∈ Obs(αn)}_{y∈Y} such that Tr(O_y ρ_x) = 2α · BHM_α(x, y). Furthermore, a protocol for the classical shadows task for observables of squared Frobenius norm B := αn, estimation accuracy ϵ := α, and failure probability δ implies a one-way private-coin protocol for Boolean Hidden Matching with failure probability δ.
Proof. Given valid inputs x and y = (M, w) to the BHM_α function, define the pure state and the observable

|ψ_x⟩ := (1/√n) Σ_{i=1}^{n} (−1)^{x_i} |i⟩,   O_y := Σ_{e=(u,v)∈M} |e⟩⟨e|,  where |e⟩ := (|u⟩ − (−1)^{w_e} |v⟩)/√2.

Notice that O_y ∈ Obs(αn) since O_y is an αn-rank projector (the vectors |e⟩ are orthonormal because the edges of M are vertex-disjoint). Letting ρ_x := |ψ_x⟩⟨ψ_x|, we get

Tr(O_y ρ_x) = Σ_{e=(u,v)∈M} |⟨e|ψ_x⟩|² = (1/2n) Σ_{e=(u,v)∈M} ((−1)^{x_u} − (−1)^{x_v + w_e})² = 2α · BHM_α(x, y),

since each edge contributes 2/n when x_u ⊕ x_v ⊕ w_e = 1 and 0 otherwise. In particular, this implies that if E is an α-approximation to Tr(O_y ρ_x), then |E/(2α) − BHM_α(x, y)| ≤ 1/2, or in other words, rounding E/(2α) recovers BHM_α(x, y).
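The following snippet checks one concrete choice of state and observable consistent with Theorem 25 (our reconstruction for illustration, so conventions may differ from the paper's): |ψ_x⟩ = n^{−1/2} Σ_i (−1)^{x_i}|i⟩, and O_y the projector onto span{(|u⟩ − (−1)^{w_e}|v⟩)/√2 : e = (u, v) ∈ M}:

```python
import numpy as np

def psi_state(x):
    # |psi_x> = n^{-1/2} * sum_i (-1)^{x_i} |i>
    n = len(x)
    return np.array([(-1.0) ** xi for xi in x]) / np.sqrt(n)

def observable(n, matching, w):
    # Rank-(alpha*n) projector onto span{(|u> - (-1)^{w_e} |v>)/sqrt(2)}
    O = np.zeros((n, n))
    for (u, v), we in zip(matching, w):
        e = np.zeros(n)
        e[u], e[v] = 1.0, -((-1.0) ** we)
        O += np.outer(e, e) / 2.0
    return O

n, matching = 8, [(0, 1), (2, 3)]                   # alpha = |M|/n = 1/4
x = [0, 1, 1, 0, 0, 0, 0, 0]
for b in (0, 1):
    w = [x[u] ^ x[v] ^ b for (u, v) in matching]    # promise holds with bit b
    psi, O = psi_state(x), observable(n, matching, w)
    val = psi @ O @ psi                             # Tr(O_y rho_x)
    print(b, round(val, 10))                        # 2 * alpha * b: 0.0 then 0.5
```

Under the promise, each edge's contribution is either exactly 2/n or exactly 0, which is why the expectation value cleanly encodes the hidden bit.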
We now claim that the existence of these states and observables implies a private-coin one-way protocol for the Boolean Hidden Matching problem (see Figure 2): Suppose we want a protocol for BHM_α with probability of failure at most δ. Let s = Shadows(αn, α, δ). On input x, Alice prepares the state ρ_x^{⊗s}, measures it with a valid classical shadows strategy, and sends the resulting classical shadow to Bob. On input y, Bob computes the observable O_y, and then computes an estimate E for Tr(O_y ρ_x) using the classical shadow sent by Alice. The correctness of the classical shadows strategy implies that E is an α-approximation to Tr(O_y ρ_x) with probability of failure at most δ. As shown above, Bob can then compute BHM_α(x, y) with failure probability at most δ by appropriately rounding the estimate.
Let us now note a key property of the one-way protocol in Theorem 25 for the Boolean Hidden Matching problem. Namely, once Alice prepares ρ_x^{⊗s}, she no longer uses her original input x. Her message (the classical shadow) only depends on her measurement of the state ρ_x^{⊗s}. In particular, if her message is to contain a lot of information about her input x, then Holevo's theorem stipulates that she must be measuring a state of high dimension, or, in other words, s must be large:

Theorem 26 (Holevo [26]). Let Z be the classical outcome of measuring a d-dimensional state drawn from an ensemble {ρ_x}_{x∈X} according to x ∼ X. Then, I(X : Z) ≤ log d.
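As a quick numerical sanity check of Theorem 26 (a toy example of our own), measuring an ensemble of qubit states in a fixed basis can never reveal more than log₂ d = 1 bit about the label X:

```python
import numpy as np

# Ensemble of qubit states (d = 2) with a uniform prior on the label X.
states = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
          np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]
# Z = outcome of a computational-basis measurement on the sampled state.
p_xz = np.array([[abs(s[z]) ** 2 / len(states) for z in range(2)] for s in states])
px, pz = p_xz.sum(axis=1), p_xz.sum(axis=0)
mi = sum(p * np.log2(p / (px[i] * pz[j]))
         for i, row in enumerate(p_xz) for j, p in enumerate(row) if p > 0)
print(mi)   # 0.5 bits, comfortably below the Holevo bound log2(d) = 1
```

Here the bound is not tight; Holevo's theorem only caps the information at the log of the measured dimension, which is exactly the leverage used in the proof below.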

Naïvely, the states ρ_x^{⊗s} in Theorem 25 consist of s qudits of dimension n, i.e., they live in a space of dimension n^s. However, since each ρ_x^{⊗s} is invariant under permutation of the copies, it is supported on the symmetric subspace, whose dimension is nearly a factor of s! smaller (see Fact 1).
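Numerically, the gap between n^s and the symmetric-subspace dimension binom(n+s−1, s) is dramatic (a standard fact; the snippet is our illustration):

```python
from math import comb

n, s = 16, 5
full_dim = n ** s                   # dimension of (C^n)^{tensor s}
sym_dim = comb(n + s - 1, s)        # dimension of the symmetric subspace
print(full_dim, sym_dim, full_dim // sym_dim)
```

For n = 16 and s = 5 the symmetric subspace is smaller by a factor of about 68, approaching the s! = 120 ceiling as n grows.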
We are now ready to put all of the pieces of the lower bound together:

Proof. Using the communication complexity of BHM_α as our starting point, we first show that Alice's message to Bob in every successful protocol for the Boolean Hidden Matching problem must contain a significant amount of information. To show this, note that by Yao's minimax principle (Theorem 22), there exists a distribution (X, Y) such that D^{(X,Y)}_δ(BHM_α) = R_δ(BHM_α) = Ω(√(n/α)) for any constant δ. It follows that I(M : X) = Ω(√(n/α)) for any one-way private-coin protocol for BHM_α. Now consider the classical shadows strategy for solving BHM_α as described by Theorem 25, and suppose that Alice measures s = Shadows(αn, α) copies of ρ_x. Recall that Alice's message M depends only on her measurement of ρ_X^{⊗s}, which has classical outcome Z. We get

I(M : X) ≤ I(Z : X) ≤ log binom(n + s − 1, s) ≤ s log(e(n + s − 1)/s) = O(s log(1 + n/s)),

where we have used (in order) the Data Processing Inequality, Holevo's theorem (Theorem 26) together with the dimension of the symmetric subspace (Fact 1), and the inequality binom(n + s − 1, s) ≤ (e(n + s − 1)/s)^s. Notice that we now have both an upper bound and a lower bound for the mutual information between Alice's input and her message for a one-way protocol for BHM_α with constant error probability. Setting ϵ := α and B := ϵn, we have Ω(√B/ϵ) = Ω(√(n/α)) ≤ I(M : X) ≤ O(s log(1 + n/s)), and hence s = Ω(√B / (ϵ log(1 + n/s))).
Notice that if we substitute any lower bound for s into the RHS of the equation above, then we get a new lower bound for s on the LHS. Unfortunately, plugging in the trivial lower bound (s ≥ 1) is not very tight. Instead, we will use the s = Ω(1/ϵ²) lower bound from the previous section. To justify this, notice that s = Shadows(B, ϵ) ≥ Shadows(1, ϵ) = Ω(1/ϵ²), where we have used Theorem 19 and the fact that B ≥ 1 since ∥O∥ = 1. Therefore, we can plug s = Ω(1/ϵ²) into the RHS above: since n/s = O(nϵ²) = O(Bϵ), we arrive at s = Ω(√B / (ϵ log(1 + Bϵ))).
Assuming ϵ ≤ 1, we simplify this to s = Ω(√B / (ϵ log(B + 1))). Finally, we point out that the construction of Theorem 25 operates in the regime where the states have dimension d := n and the observables have rank B = ϵd. One can extend the lower bound to apply to all observables of rank B ≤ ϵd by embedding the states used in Theorem 25 into a subspace of dimension n ≤ d and keeping the observables the same.
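Collecting the steps above into a single chain (our paraphrase of the argument, constants suppressed):

```latex
\Omega\!\Big(\frac{\sqrt{B}}{\epsilon}\Big)
 \;=\; \Omega\!\Big(\sqrt{\tfrac{n}{\alpha}}\Big)
 \;\le\; I(M:X)
 \;\le\; \log\binom{n+s-1}{s}
 \;\le\; s\,\log\frac{e\,(n+s-1)}{s},
```

with ϵ := α and B := ϵn. Plugging in s = Ω(ϵ⁻²) gives n/s = O(Bϵ) = O(B), so the right-hand side is O(s log(B + 1)), and rearranging yields s = Ω(√B / (ϵ log(B + 1))).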

Independent Measurement Upper Bound
Since the global Clifford group acting on qubits is a 3-design, the randomized Clifford measurement classical shadows algorithm of Huang, Kueng, and Preskill [6] can be viewed as simulating independent M_1 measurements on all copies of ρ and then constructing an unbiased estimator from the measurement outcome on each copy. Their result is for independent measurements and general mixed states, but it covers pure states as a special case.
Theorem 28 (Huang, Kueng, Preskill [6]). For all ϵ, δ > 0, the classical shadows task with independent measurements can be solved with O(Bϵ^{-2} log δ^{-1}) samples.

Huang, Kueng, and Preskill also show a matching lower bound, but the hard instances they construct are states of full rank. We give an independent-measurement classical shadows algorithm for pure states which is better in certain parameter regimes (and is no worse).
Theorem 29. For all ϵ, δ > 0, the classical shadows task for pure states with independent measurements can be solved with O((√(Bd) ϵ^{-1} + ϵ^{-2}) log δ^{-1}) samples.

For example, consider the parameter regime in which δ is a constant, B = d, and ϵ = o(1). Note that this encompasses natural settings such as estimating full-weight Paulis. One can check that (as d grows) the sample complexity given by Theorem 29 is O(d/ϵ + 1/ϵ²), which is evidently less than O(d/ϵ²), the sample complexity of the Huang-Kueng-Preskill protocol. In general, our approach gives lower sample complexity whenever ϵ = o(√(B/d)) and B = ω(1).
As it turns out, our measurement algorithm is the same as the one in [6]: on each copy of ρ, we make an independent measurement with the POVM M_1, which (on multi-qubit systems) can be performed with a random Clifford measurement since we only use third moments of the Haar measure. The difference is in how we construct our estimator for the unknown state. To see this, first let Ψ_1, ..., Ψ_s be the Hermitian random variables for the measurement outcomes. Using Lemma 13, notice that ρ̂_i := (d + 1)Ψ_i − I is an unbiased estimator for the unknown state, i.e., E[ρ̂_i] = ρ. The average of the ρ̂_i's is effectively the Huang-Kueng-Preskill estimator. Our key observation is that when ρ is pure, ρ̂_i ρ̂_j (for i ≠ j) is also an unbiased estimator of ρ:

E[ρ̂_i ρ̂_j] = E[ρ̂_i] E[ρ̂_j] = ρ² = ρ,

where we have used the independence of the measurements for the first equality and the purity of ρ for the last. In light of this, we consider an estimator Ŷ defined to be the average of the s(s − 1) quadratic terms ρ̂_i ρ̂_j where i ≠ j.
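The unbiasedness of both estimators is easy to check in simulation. The sketch below is our own illustration (the paper's Algorithm 3 and its median-of-means wrapper are not reproduced); it simulates the POVM M_1 by measuring each copy in a Haar-random basis, and compares the empirical means of the linear estimator ρ̂_i and the quadratic estimator ρ̂_i ρ̂_j against the true pure state:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
phi = np.array([1.0, 1.0]) / np.sqrt(2)          # a fixed pure state
rho = np.outer(phi, phi.conj())

def rho_hat():
    # One M_1 measurement: measure in a Haar-random basis, form (d+1)|psi><psi| - I.
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    q = q * (np.diag(r) / np.abs(np.diag(r)))     # phase fix -> Haar-random unitary
    p = np.abs(q.conj().T @ phi) ** 2
    k = rng.choice(d, p=p / p.sum())              # Born-rule outcome
    psi = q[:, k]
    return (d + 1) * np.outer(psi, psi.conj()) - np.eye(d)

pairs = 10_000
lin = np.zeros((d, d), complex)
quad = np.zeros((d, d), complex)
for _ in range(pairs):
    a, b = rho_hat(), rho_hat()
    lin += (a + b) / 2
    quad += (a @ b + b @ a) / 2                   # symmetrized rho_i rho_j, i != j
lin, quad = lin / pairs, quad / pairs
print(np.linalg.norm(lin - rho), np.linalg.norm(quad - rho))
```

Both empirical means converge to ρ; the interesting difference between the two estimators is in their variance, analyzed next.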
To analyze the accuracy of the estimator Ŷ, we will once again turn to Chebyshev's inequality:

Pr[|Tr(O Ŷ) − Tr(O ρ)| ≥ ϵ] ≤ Var[Tr(O Ŷ)] / ϵ².

Expanding out the variance term using the definition of Ŷ, we get

Var[Tr(O Ŷ)] = (1/(s(s − 1))²) Σ_{i≠j} Σ_{k≠ℓ} Cov(Tr(O ρ̂_i ρ̂_j), Tr(O ρ̂_k ρ̂_ℓ)).

We need to bound all of these covariance terms to bound the variance. When all indices i, j, k, ℓ are distinct, the covariance is 0 (by independence). For the other cases, we rely on Corollaries 35, 36, 37, and 38, which we summarize in the following lemma (proof in Appendix B):

Lemma 30. The non-zero covariance terms are bounded as follows:
• One shared index (|{i, j} ∩ {k, ℓ}| = 1): O(1)
• Order swapped (i = ℓ and j = k): O(Bd)
• Same order (i = k and j = ℓ): O(Bd)

Using Lemma 30, we account for the contribution of each type of covariance term to get

Var[Tr(O Ŷ)] = O(Bd/s² + 1/s),

where we have used that ∥O∥² ≤ 1 and that there are O(s⁴) terms where i, j, k, ℓ are distinct; O(s³) terms where exactly one index matches; and O(s²) terms where both indices match.
Putting everything together, we can now prove the claimed sample complexity in Theorem 29.
Proof of Theorem 29. We first point out that the sample complexity of our new estimator is only better (or at least no worse) when ϵ ≤ √(B/d), so when that does not hold we simply use the original X̂ estimator of Huang, Kueng, and Preskill. Otherwise, we use Algorithm 3 to measure the state and construct several Ŷ estimators (line 9), constituting the classical shadow. This shadow is then used in Algorithm 2 for the observable estimation step, which once again uses the median-of-means method, where the analysis is identical to that in the proof of Theorem 11. It suffices to analyze the variance of the estimator constructed within each batch, which is O(Bd/s² + 1/s) by Lemma 31. To apply Chebyshev's inequality, we need the variance to be at most ϵ², which holds for some s = O(√(Bd)/ϵ + 1/ϵ²).
We simulate the new quadratic estimator as shown in Figure 3. The plots show the empirical variances of the linear and quadratic estimators in a regime where the target observable has large Frobenius norm, namely B = d. For the linear estimator, one expects the variance to decrease linearly with the number of samples. For the quadratic estimator, the variance is O(Bd/s² + 1/s) by Lemma 31. Therefore, whenever Bd/s² dominates 1/s, the variance should decrease quadratically in the number of samples. Since the plots are shown on a log-log scale, this should result in a slope of −2. We see this scaling in the graph shown on the right since d is large (the slope of the regression for the linear estimator is −0.998 and the slope for the quadratic estimator is −1.936). However, for the left graph Bd = d² is only 64, so we expect the variance of the quadratic estimator to scale linearly after about s = 64 copies. Indeed, one can observe that the lines for the linear and quadratic estimators are essentially parallel after that point. As a final observation, we note that in both graphs the quadratic estimator becomes better than the linear estimator at the point where the variance becomes less than 1. Since the estimate is only useful once the variance is less than 1, one could interpret this as conveying that the quadratic estimator is always better than the linear estimator in this particular parameter regime.

Open Problems
Our upper and lower bounds almost completely settle the question of sample complexity for the classical shadows task with arbitrary measurements and arbitrary observables, but clearly many questions remain. First, can the remaining discrepancies between our upper and lower bounds be removed? That is, can we get the lower bounds to have the correct dependence on δ (which we conjecture to be log(δ^{-1})) and remove a log B factor?
Second, the sample complexity of learning states in the context of tomography is known for all combinations of independent vs. joint measurements, and pure vs. mixed states. Can we characterize the sample complexity of classical shadows as thoroughly? Ref. [6] gives matching bounds for independent measurements and mixed states, and our result gives matching bounds for joint measurements on pure states, but the independent/pure and joint/mixed cases are open. At the very least, we know that in the independent measurement setting the pure state case is different from the mixed state setting, since our upper bound in Theorem 29 is smaller than the lower bound in Ref. [6] for some regimes.
As a follow-up, tomography bounds are sometimes stated as a function of r, the rank of the unknown state. For example, the sample complexity is Θ(rdϵ^{-1}) for joint measurements, capturing the pure state (r = 1) and worst-case mixed state (r = d) behaviour simultaneously. We believe the sample complexity of the classical shadows task depends smoothly on r, but do not yet have a conjecture.
Third, we do not describe how to concretely implement our large joint measurements. We may replace the continuum of Haar-random pure states and elements A_ψ in the POVM M_s with a concrete (s + 2)-design, but even then it is not clear how to practically implement the measurement. Alternatively, can we make a similar POVM with a simpler ensemble of states, e.g., t-designs for some t < s + 2, and update the analysis to achieve an equivalent end result? We note that for our exact measurement, a lower bound on t can be computed by using upper bounds on the number of bits required to describe a state in a t-design. To see this, notice that the outcome of the POVM M_s suffices as a compressed description of the state for the purpose of observable estimation. Since we proved a state compression lower bound of Ω(√B ϵ^{-1})-many bits, the number of bits required to specify the state must be at least this large. In particular, this implies that one could not implement our measurement on n qubits with a 3-design, since Clifford states can be specified using O(n²)-many bits.
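For context on the final claim, a back-of-the-envelope check (our own): the number of n-qubit stabilizer states is |S_n| = 2^n ∏_{k=1}^{n} (2^k + 1), so log₂|S_n| ≈ n²/2 + O(n) bits suffice to name one:

```python
from math import log2

def stab_state_bits(n):
    # log2 of |S_n| = 2^n * prod_{k=1}^{n} (2^k + 1)
    return n + sum(log2(2 ** k + 1) for k in range(1, n + 1))

for n in (5, 10, 20, 40):
    bits = stab_state_bits(n)
    print(n, round(bits), round(bits / n ** 2, 3))   # the ratio tends to ~0.5
```

Since O(n²) bits cannot exceed the Ω(√B ϵ^{-1}) compression lower bound in the relevant regime, the Clifford (3-design) ensemble falls short of implementing the joint measurement.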
Fourth, there is the question of robustness to error or noise. For any classical shadows protocol, we can ask how it behaves when the samples are not exactly of the form ρ^{⊗s}, due to variation in the samples. For our pure state protocols, we are also interested in how quickly our algorithms degrade when given mixed states that are close to pure.
Fifth, Ref. [6] introduced two classical shadows protocols known as the random Pauli measurements and the random Clifford measurements schemes. The former scheme targets local observables. The latter scheme, like our protocols, targets observables with low Frobenius norm. These target classes of observables are mutually exclusive, and each scheme achieves lower sample complexity with respect to its target class. Recent work [27, 28] has also focused on the development of an intermediate scheme that achieves favourable sample complexity scaling in both target classes of observables. All these works focus on general states and independent measurements. Our work identifies an optimal protocol for the low-Frobenius-norm class of observables in the setting of pure states and joint measurements. So it is natural to consider the pure states and/or joint measurements setting in the context of local observables or the combined class.
Finally, one may consider cubic or higher-order generalizations of the quadratic estimator used in the proof of Theorem 29. We leave the analysis of such estimators to future work.
In this section, we show that any proper learning algorithm for classical shadows would require significantly more samples. Our starting point is a known lower bound for the quantum state tomography question for pure states:

Theorem 32 ([1]). Any quantum algorithm that takes copies of an unknown pure state ρ and outputs a classical estimate ρ̂ such that ∥ρ̂ − ρ∥₁ ≤ ϵ with constant failure probability δ < 1 requires Ω(dϵ^{-2} / log(d/ϵ)) samples.
In particular, we show that a proper classical shadows algorithm implies a state tomography algorithm.
Theorem 33. Suppose there exists a quantum learning algorithm that, given s copies of an unknown d-dimensional state ρ, outputs a classical description of a trace-1, Hermitian, PSD matrix ρ̂ such that, for all O ∈ Obs(1), with failure probability at most δ < 1: |Tr(O ρ̂) − Tr(O ρ)| ≤ ϵ. Then, s = Ω(d/ϵ).
The key point is that for a proper learner we may take the observable to be (a normalization of) the learner's own output ρ̂: its expectation Tr(ρ̂ ρ) measures the fidelity of ρ̂ with the pure state ρ, and for pure ρ the Schatten 1-norm distance is at most 2√(1 − Tr(ρ̂ ρ)). Hence, if we solve the classical shadows task to error ϵ on this observable, then we have estimated ρ to within Schatten 1-norm distance O(√ϵ). That is, a classical shadows learner is also a quantum tomography algorithm, so we can use the known lower bound from Theorem 32. It follows that a proper learning algorithm for the classical shadows task requires Ω(d/ϵ) samples.
In the commonly considered regime where d ≫ ϵ^{-1}, the above lower bound is significantly more than both our algorithm (from Section 3) and the original classical shadows algorithm, which use only O(1/ϵ²) samples.

B Covariance bounds
The goal of this subsection is to prove Lemma 30, which bounds each of the covariance terms Cov(Tr(O ρ̂_i ρ̂_j), Tr(O ρ̂_k ρ̂_ℓ)) that appears in the expansion of the variance of the estimator Ŷ. Because ρ̂_j appears twice in all of the covariance terms that are non-zero, it will be convenient to explicitly calculate the second moment of ρ̂_j.
Lemma 34. For all j, the second moment of ρ̂_j is

E[ρ̂_j ⊗ ρ̂_j] = (I ⊗ I + ρ ⊗ I + I ⊗ ρ)(W_{(1 2)} − 2Π^{(2)}_sym/(d + 2)).

Proof. Recall that ρ̂_j is obtained through an independent and identical measurement process, so it suffices to analyze a single ρ̂ := (d + 1)Ψ − I. By Lemma 13 and Lemma 14, we compute the first and second moments of Ψ for the special case s = 1 as

E[Ψ] = (ρ + I)/(d + 1),   E[Ψ ⊗ Ψ] = (I ⊗ I + ρ ⊗ I + I ⊗ ρ)(W_{(1)(2)} + W_{(1 2)}) / ((d + 1)(d + 2)).

Now, expanding out the second moment, we get

E[ρ̂ ⊗ ρ̂] = (d + 1)² E[Ψ ⊗ Ψ] − (d + 1)(E[Ψ] ⊗ I + I ⊗ E[Ψ]) + I ⊗ I = (I ⊗ I + ρ ⊗ I + I ⊗ ρ)((d + 1)W_{(1 2)} − W_{(1)(2)}) / (d + 2),

where we recall that W_{(1)(2)} = I ⊗ I to obtain the last equality. We arrive at the lemma by writing the final line in terms of the symmetric subspace projector Π^{(2)}_sym = (W_{(1)(2)} + W_{(1 2)})/2.

Our goal will now be to express all covariance terms in a manner such that we can apply Lemma 34. Since these equations can become quite cumbersome to write out fully, we will often drop the "⊗" symbol in expressions with I and ρ. For example, Iρ := I ⊗ ρ will be a common abbreviation. We enclose these abbreviations in parentheses when they are multiplied with other terms.
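Lemma 34 can be double-checked numerically by computing E[Ψ ⊗ Ψ] exactly from the third moment of the uniform POVM, ∫(|ψ⟩⟨ψ|)^⊗3 dψ = Π^{(3)}_sym / binom(d+2, 3), and comparing against the closed form (a verification sketch we wrote; not the paper's code):

```python
import numpy as np
from itertools import permutations
from math import comb, factorial

d = 3
rng = np.random.default_rng(1)
v = rng.normal(size=d) + 1j * rng.normal(size=d)
v /= np.linalg.norm(v)
rho = np.outer(v, v.conj())                       # a pure state

def perm_op(pi, k):
    # Permutation operator W_pi on (C^d)^{tensor k}
    W = np.zeros((d ** k, d ** k))
    for idx in np.ndindex(*([d] * k)):
        src = np.ravel_multi_index(idx, [d] * k)
        dst = np.ravel_multi_index(tuple(idx[pi[r]] for r in range(k)), [d] * k)
        W[dst, src] = 1.0
    return W

# E[Psi ⊗ Psi]: contract rho into the third register of Pi_sym^(3)
sym3 = sum(perm_op(pi, 3) for pi in permutations(range(3))) / factorial(3)
M = d * (np.kron(np.eye(d * d), rho) @ sym3) / comb(d + 2, 3)
E_PsiPsi = M.reshape(d, d, d, d, d, d).trace(axis1=2, axis2=5).reshape(d * d, d * d)

I1, I2 = np.eye(d), np.eye(d * d)
E_Psi = (rho + I1) / (d + 1)
# E[rho_hat ⊗ rho_hat] expanded from rho_hat = (d+1) Psi - I
lhs = (d + 1) ** 2 * E_PsiPsi - (d + 1) * (np.kron(E_Psi, I1) + np.kron(I1, E_Psi)) + I2
# Closed form claimed in Lemma 34
W = perm_op([1, 0], 2)
Pi_sym = (I2 + W) / 2
rhs = (I2 + np.kron(rho, I1) + np.kron(I1, rho)) @ (W - 2 * Pi_sym / (d + 2))
print(np.allclose(lhs, rhs))   # True
```

As consistency checks, the closed form also has unit trace and partial-traces back to ρ, matching E[Tr ρ̂] = 1 and E[ρ̂] = ρ.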
Let's first tackle the covariance terms Cov(Tr(O ρ̂_i ρ̂_j), Tr(O ρ̂_k ρ̂_ℓ)) where only one index is shared, i.e., |{i, j} ∩ {k, ℓ}| = 1. There are two subcases: a match in different positions (i = ℓ or j = k); or a match in the same position (i = k or j = ℓ). While the proofs are quite similar, we break them into two separate corollaries. In the first, the last equality uses that O, ρ̂_j, and ρ̂_k are Hermitian, and the second moment can be further decomposed using the independence of ρ̂_i, ρ̂_j, and ρ̂_k.

The second moment term E[Tr(O ρ̂_i ρ̂_j) Tr(O ρ̂_k ρ̂_j)] can then be evaluated with Lemma 34, where we have once again used the purity of ρ. Plugging everything in, we get the claimed bound.

Corollary 36. For all distinct i, j, k, the covariance Cov(Tr(O ρ̂_i ρ̂_j), Tr(O ρ̂_k ρ̂_j)) obeys the corresponding bound summarized in Lemma 30.

Proof. The proof is similar to that of Corollary 35. We expand the covariance and compute the second moment term using independence.

We now turn to the covariance terms Cov(Tr(O ρ̂_i ρ̂_j), Tr(O ρ̂_k ρ̂_ℓ)) which share two indices, i.e., |{i, j} ∩ {k, ℓ}| = 2. Once again, there are two subcases: the order is swapped (i = ℓ and j = k); or the order is the same (i = k and j = ℓ).
In both cases, there are terms of the covariance that are proportional to Tr(O); interestingly, we cannot assume O is traceless as we have earlier. This is due to the fact that the Ŷ estimator does not necessarily have trace 1. Nevertheless, it will turn out that this cannot affect the overall covariance of Ŷ too much, as we will show in the following corollaries. Using Lemma 34, we get an expression for the second moment terms:

E[ρ̂_i ⊗ ρ̂_i] E[ρ̂_j ⊗ ρ̂_j] = [(II + ρI + Iρ)(W_{(1 2)} − 2Π^{(2)}_sym/(d + 2))]²
= (II + 3ρI + 3Iρ + 2ρρ)(W_{(1)(2)} − 4Π^{(2)}_sym/(d + 2) + 4Π^{(2)}_sym/(d + 2)²)
= (II + 3ρI + 3Iρ + 2ρρ)(W_{(1)(2)} − 4(d + 1)Π^{(2)}_sym/(d + 2)²),

where we have used that ρ² = ρ, W_{(1 2)}² = W_{(1)(2)}, and Π^{(2)}_sym W_{(1 2)} = (Π^{(2)}_sym)² = Π^{(2)}_sym. Each expectation is a difference of two W_π permutation terms. Therefore, computing the product of the two expectations, we get four W_π terms. It will turn out that the dominant one is the term in which we take the W_{(1)(2 3)} term for ρ̂_i and the W_{(1 2)(3)} term for ρ̂_j (to visualize the largest contribution from this term, we refer to the tensor network picture in Figure 4). Notice that the last two terms are non-negative, and so multiplying them by −(d + 1)/(d + 2)² makes them non-positive. Since we want to give an upper bound on the covariance, these terms can be dropped. Altogether, and inserting the appropriate constants, we get the claimed upper bound on the covariance.

Figure 1 :
Figure 1: Second moment calculation in the special case s = 1.

Figure 2 :
Figure 2: Protocol for Boolean Hidden Matching using classical shadows subroutines (shown in blue). From her input x, Alice prepares ρ_x^{⊗s}, measures, and sends the classical shadow to Bob. From his input y, Bob computes O_y and then estimates Tr(O_y ρ_x) to accuracy α from the classical shadow that Alice sent to him. He then uses that estimate to answer the Boolean Hidden Matching problem. Correctness follows from the fact that Tr(O_y ρ_x) = 2α · BHM_α(x, y). For the details of ρ_x and O_y see Theorem 25.
Here δ is the probability of error for a given input distribution. Theorem 23 lets us upper bound the (deterministic) complexity with mutual information, and Theorem 24 proves a lower bound. Thus, D^{(X,Y)}_δ(BHM_α) = O(min I(M : X) + 1), and consequently min I(M : X) = Ω(D^{(X,Y)}_δ(BHM_α)).

Figure 3:
Figure 3: Empirical variances of the linear estimator (X̂, the estimator of [6]) and our quadratic estimator (Ŷ, defined in Section 5) for the classical shadows task using independent measurements. The data confirm our variance calculation in Lemma 31: O(Bd/s² + 1/s). Specifically, the quadratic estimator variance scales inverse-quadratically in s when Bd/s² dominates 1/s (a slope on the graph of −2), and the linear estimator scales inverse-linearly in s (a slope of −1). Each point in the graphs represents the empirical variance of 10⁴ trials of the following procedure: generate a random pure state ρ of dimension d = 2ⁿ and a random full-weight Pauli operator P on n qubits; independently measure s copies of ρ in a random basis to obtain outcomes ψ₁, ..., ψ_s; compute ρ̂_i = (d + 1)ψ_i − I for all i ∈ [s]; compute the estimates x̂ = (1/s) Σ_i ρ̂_i and ŷ = (1/(s(s − 1))) Σ_{i≠j} ρ̂_i ρ̂_j; output the errors from the true expectation value: Tr(P x̂) − Tr(P ρ) and Tr(P ŷ) − Tr(P ρ). See main text for a detailed explanation of the slopes of the lines.