Estimating expectation values using approximate quantum states

We introduce an approximate description of an $N$-qubit state, which contains sufficient information to estimate the expectation value of any observable to a precision that is upper bounded by the ratio of a suitably-defined seminorm of the observable to the square root of the number of the system's identical preparations $M$, with no explicit dependence on $N$. We describe an operational procedure for constructing the approximate description of the state that requires, besides the quantum state preparation, only single-qubit rotations followed by single-qubit measurements. We show that following this procedure, the cardinality of the resulting description of the state grows as $3MN$. We test the proposed method on Rigetti's quantum processor unit with 12, 16 and 25 qubits for random states and random observables, and find an excellent agreement with the theory, despite experimental errors.

We introduce an approximate description of an N -qubit state, which contains sufficient information to estimate the expectation value of any observable to a precision that is upper bounded by the ratio of a suitably-defined seminorm of the observable to the square root of the number of the system's identical preparations M , with no explicit dependence on N . We describe an operational procedure for constructing the approximate description of the state that requires, besides the quantum state preparation, only single-qubit rotations followed by single-qubit measurements. We show that following this procedure, the cardinality of the resulting description of the state grows as 3M N . We test the proposed method on Rigetti's quantum processor unit with 12, 16 and 25 qubits for random states and random observables, and find an excellent agreement with the theory, despite experimental errors.

I. INTRODUCTION
Quantum density operators represent our knowledge of the state of quantum systems and give us a way to calculate expectation values and predict experimental results via the Born rule. Our knowledge of a system's density operator is equivalent to our ability to calculate the expectation value of any observable of the system. However, for an N -body system, reconstructing the system's density operator requires the knowledge of exponentially many numbers in N . This in turn translates into practical difficulties in estimating and storing density operators, in the form of density matrices, even for systems composed of just a few tens of qubits.
Over the last few years, there has been a surge of interest in devising protocols to reduce the resources required to estimate and classically store quantum states [1][2][3][4][5][6][7][8][9]. There have been numerous approaches to reduce the cost of performing quantum state tomography by assuming that the state of the system has a specific structure. These approaches include, for example: matrix product states tomography [3], where the state is assumed to have an efficient matrix product state representation as a matrix; neural network tomography [6,7], where the state is assumed to have an efficient deep neural network representation; and compressed sensing tomography [4,5], where the state is assumed to have a low-rank density matrix representation. The theory of Probably Approximately Correct (PAC) learning was utilised in [2] to show that if, instead of full tomography, we are interested in approximating with high probability the expectation values of observables drawn from a distribution, then only polynomially-many samples in N of the observables' expectation values are required. It was later also shown [8] that estimating with high probability the expectation values of a set of observables can be achieved using only * mpaini@rigetti.com † amirk@isi.edu polynomially-many copies in N of the system. As for storage, it is well established [1] that quantum states that can be represented as low bond-dimension matrix product states can be efficiently stored on classical computers.
Recently it was shown [9] that a general quantum state of N qubits can be approximately described using its inner product with an order of √ 2 N stabiliser states. This result thus provides a polynomial improvement for classically storing quantum states over a brute-force approach of storing density matrices. In Sec. IV we discuss these and other works in connection to the results presented in this paper.
The growth of the number of copies of the system in N for estimation and the considerable requirements in N for classically storing the information associated with a generic quantum state lead to the question of whether a protocol that significantly reduces or eliminates the dependence on N is admissible. In this work, we give an affirmative answer to the question. Instead of focusing on the estimation and storage of the density matrix, we start by considering the equivalent problem of calculating the expectation value of any observable of the system up to a certain precision. In this article we show that, for a system of N qubits, the expectation value of any observable can be calculated with arbitrary precision with a number of copies of the system independent of N , but dependent on a properly-defined seminorm of the observable. The information obtained through the measurements on the copies of the system will be used to form an approximate description of the quantum state -a description with a cardinality that grows linearly in N and thus can be constructed and stored for large systems. The method for estimating expectation values, and the exact definition of an observable seminorm and of the storage approximate quantum state will be given in the next section.

II. EXPECTATION VALUE ESTIMATION AND THE APPROXIMATE QUANTUM STATE
We will consider specifically a system of N qubits. However, the considerations that will be made are applicable in general to other systems, with finite and infinitedimensional Hilbert spaces, since they are based on the general theory of irreducible representations of groups which provide explicit forms for a basis in the space of linear bounded operators defined on a Hilbert space. We refer the reader to Refs. [10,11] where these considerations have been developed.
We start by presenting the method to estimate the expectation value of any observable and notion of the approximate state for a single-qubit system. We then generalise these ideas for the more general N -qubit case.

A. One qubit
A single-qubit density operator can be written as [10,11] with the unitary spherical surface Σ as integration domain, n = (cos ϕ sin ϑ, sin ϕ sin ϑ, cos ϑ), with ϑ ∈ [0, π], ϕ ∈ [0, 2π), is a unit vector in R 3 , p(m, n) is the probability of obtaining the result m ∈ {−1, +1} when measuring the qubit in the eigenbasis of σ · n, where σ = (σ x , σ y , σ z ) is a vector of the Pauli operators, and K 1 (m, n) is the 1-qubit kernel operator given by see App. A for more details. Let us define a single-qubit (unbiased) estimator function where the · represents the argument of the function R 1 , i.e. a single-qubit observable (for a lighter notation we sometime omit the explicit dependence of the estimator on (m, n)). In particular, the expectation value of the Pauli operator σ i , with i = 0, 1, 2, 3 and σ 0 := 1 σ 1 := σ x , σ 2 := σ y , and σ 3 := σ z , can be written in terms of Calculating the estimator of the Pauli operators we arrive at The estimator R 1 [σ α ](m, n) is a random variable due to its dependence on the stochastic measurement outcome m. Hereafter, unless noted otherwise, we use the subscript α to index x, y, z and subscript i to index 0, 1, 2, 3.
In App. A we show that the variance of the estimator R 1 [σ α ] is upper bounded by 3, Since R 1 [1](m, n) is independent of m, it is deterministic and thus has a null variance. Equation (4) implies that we can construct an estimate of σ i using the following Monte Carlo sampling procedure: The estimate of σ i , which we denote by σ i M , is simply given by the average of corresponding estimator R 1 [σ i ](m, n) over the data set: where n j = (cos ϕ j sin ϑ j , sin ϕ j sin ϑ j , cos ϑ j ) is the unit vector sampled at the jth iteration and m j is the corresponding measurement outcome. By construction, in the limit M →∞, σ i M → σ i . The value of M determines the precision of the estimation. For M sufficiently large, σ i M has a normal statistical error, which due to Eq. (7) is smaller than or equal to 3/M . Note that the estimate 1 M = 1 for any value of M and it has a zero variance. The measurement in the eigenbasis of σ · n, at step 2 above, can be implemented by first applying a unitary operator U , that rotates this basis to the eigenbasis of σ z , and then measuring in the eigenbasis σ z (the computational basis). This unitary operator is given by U = e i ϑ 2 σ· n ⊥ , where n ⊥ = (− sin ϕ, cos ϕ, 0). We call the vector s j = (m j , ϑ j , ϕ j ) the j-th snapshot of the qubit system. We refer to the collection of the M snapshots, S = {s j } M j=1 , that we obtain following the sampling procedure above, as the approximate quantum state, as it enables us to estimate the expectation value of any singlequbit observable to a certain precision as we now describe.
Let O be a single-qubit observable, expanded in the Pauli basis Given the estimated expectation values of {σ i } 3 i=0 , by linearity of the estimator function (3), we can construct an estimate of O as We prove in App. A that for M sufficiently large, O M has a normal statistical error smaller than or equal to where Std stands for 'standard deviation' and is a seminorm of O that depends on its expansion in the Pauli operator basis.

B. N qubits
The method developed for one qubit in the previous subsection is easily generalised to N qubits, in the assumptions that the qubits are distinguishable and that a qubit-specific measurement capability is available. The space of the N -qubit system is the tensor product of the single-qubit spaces and thus we can write a general Nqubit density operator as [10,11] (13) where {m, n} is a shorthand for (m 1 , n 1 , . . . , m N , n N ), the N -qubit kernel operator K({m, n}) equals to the tensor product of single-qubit kernels and p({m, n}) represents the joint probability of finding the outcome (m 1 , . . . , m N ) ∈ {−1, 1} N when measuring the qubits in the product eigenbasis of σ 1 · n 1 ⊗ · · · ⊗ σ N · n N . Similarly to the single qubit case, this measurement can be implemented with single-qubit rotations followed by a measurement in the computational basis on each qubit. We define the N -qubit (unbiased) estimator function As a result of the tensor product structure of the kernel operator, , is a monomial of Pauli operators (a 'Pauli monomial' or a 'Pauli string' for short), and hereafter i k = 0, . . . , 3, is simply the product of the corresponding single-qubit random variables, where R 1 [σ i k ](m kj , n kj ) is given in Eqs. (5) and (6).
Similarly to the single qubit case of the previous section, we can write the expectation value of P i with respect to the state of the system in terms of R[P i ] as Equation (17) implies that an estimate P i M of P i can be calculated by performing the single-qubit tomographic procedure introduced in the above section, in parallel on all of the N qubits, as in Fig. 1. We call the vector s j = (m 1j , n 1j , . . . , m N j , n N j ) the jth snapshot of a system of N qubits, and we refer to a set S of such M independent snapshots as the approximate state of the N -qubit system.
The use of single-qubit Haar-random basis measurements was recently proposed, and experimentally implemented, to probe certain properties of an N -qubit system, such as purity [12,13] and the Frobenius inner product of two states [14]. In light of these results, as we now show, the measurement protocol we introduced above is a universal protocol for estimating the expectation value of any N -qubit observable. The protocol is efficient in terms of time and space resources, as long as the observable admits an efficient representation in the Pauli basis and has a small seminorm (e.g. not growing exponentially with N ).
To start with, we can use the approximate state S to calculate an estimate P i M of P i as where the first equality follows from Eq. (16). We show in App. B that for M sufficiently large, P i M (i = 0 v . = (0, . . . , 0)) has a normal statistical error that is smaller than or equal to 3 ri /M , where r i represents the weight of P i , i.e., the number of qubits on which it acts nontrivially. Note that the estimated expectation value for the N -qubit identity operator 1 ⊗N M = 1 and it has a zero variance.
With the ability to efficiently calculate the estimated expectation value of any given Pauli monomial operator, we can estimate the expectation value of N -qubit observables. Let O be an N -qubit observable such that From linearity of the estimator function (15) and from Eq. (18), the estimated expectation value of O is given by We note by passing that as a result of the tensor product structure of the kernel operator, Eq. (14), the estimate O M , when O is tensor product of qubit-subsystems observables, is simply given by We show in App. B that the standard deviation of O M has the upper bound with the seminorm of O defined as where the sum is extended to all values of i and j except 0 v . = (0, 0, . . . , 0), r ij is the number of indices k for which i k = 0 ∧ j k = 0, ∆ ij = 0 if there exists a k such that i k = 0∧j k = 0∧i k = j k and ∆ ij = 1 otherwise. Equation (23) reduces to Eq. (12) when O is an observable on a singlequbit space.
Importantly, in this procedure, the expectation values estimation error has no global dependence on the system size N . The error only depends on the observable seminorm and the number of snapshots. One could argue that the presence in the seminorm of 3 rij reflects an exponential growth of the estimates' variance in N , as, for example, for an observable given by the tensor product of N Pauli operators, which has seminorm 3 N/2 . However, the seminorm is a property of the observable and not of the dimension of the space: an N -qubit observable O and its extension to a larger (N +L)-qubit space, O ⊗ 1 ⊗L , have the same seminorm (23), independent of L. Irrespective of N , because of Eq. (22), the expectation values of all observables with unit seminorm can be obtained with statistical error bounded by 1/ √ M . Any observable with seminorm different from zero can be obtained from an observable with unit seminorm through Prepare the -qubit state: Measure in random directions: multiplication by a real number, representing in fact the observable's seminorm. For a given number of identical preparations and measurements, the statistical error in the expectation value of the generic observable has a bound only dependent on how "large" the observable is. In App. B, we also explain how the seminorm O 2 , defined by divided by √ M , with r i representing the weight of P i , is generally a good approximation for the standard deviation of O M . This result may be important for applications of the method since O 2 can be significantly smaller than O and since, for a chosen statistical error, the number of identical preparations required is inversely proportional to the variance of O M . See App. C for further details.
We close this part with a remark on the notion of the approximate quantum state, S. We have shown that the approximate description of the state, as we have defined it, comes with an explicit operational procedure to estimate the expectation value of any generic observable with a rigorous bound on the estimation statistical error. Therefore, the approximate state together with the rule (20) provides a description of a physical system in terms of statistical inferences based on observed experimental data and, in principle, could be assumed as primitive concepts instead, or, at least, as not requiring the existence of density operators and the Born rule.

III. EXPERIMENTAL RESULTS
As a proof-of-concept, we have tested the theory on Rigetti Computing quantum processor unit (QPU), Aspen7-28Q-A, with three systems, involving N = 12, 16 and 25 qubits, respectively. In each experiment the system is prepared in a random state. The state preparation circuit is composed of two layers of gates applied to the computational state |0 ⊗N . In the first layer, the gates are chosen at random from a flat distribution over the single-qubit gate set {X, Y, Z, H, S, T } and applied to N/2 qubits, chosen at random. While in the second layer the two-qubit gate XY (α) = e −iα(X⊗X+Y ⊗Y ) is applied to 50% of pairs of qubits, chosen at random, with different values of α sampled uniformly from [0, 2π). This gate composition results in shallow quantum circuits with N/2 random single-qubit gates and N/4 random two-qubit gates, to ensure that the experimentally prepared state has a high fidelity with the target state.
For each system, an approximate description of the prepared state was obtained using M = 10 4 snapshots. The approximate state was then used to estimate the expectation value of two sets of observables: (1) 20 random observables, (2) 20 random projectors onto computational basis states. Each random observable in the first set is a linear combination of 20 Pauli monomials. Each Pauli monomial is constructed by randomly sampling from the set {σ i } 3 i=0 for each qubit, whereas the linear combination coefficients (the a i 's) are sampled uniformly from the (0, 1]. The projectors onto computational basis states were sampled uniformly from the set {0, 1} N . For simplicity in the analysis, the 20 random observables are normalised to have a unit seminorm (the projectors already have unit seminorm with probability exponentially close to 1 as a function of N , see App. C). Figure 2 shows the results for the 25-qubit experiment. In this figure, we compare the estimated expectation values of random observables and of random computational basis projectors to their expectation values with respect to the 25-qubit target state. The results are plotted as a function of the number of snapshots. Table I summarises the experimental results of the 12-, 16-and 25-qubit systems for 10 4 snapshots. The figure and the table show that the experimental results generally mirror the theory, with the fraction of estimates within one and two standard deviations from their true values not far from 68% and 95% respectively. Other than statistical considerations, the cases with a fraction of the estimates greater than the expected percentages from a normal distribution can be explained in terms of how tightly the observables' seminorms bound the standard deviation of the estimators, in particular for the case of the projectors.
To complete the picture, including an explanation for the cases of a lower fraction of the estimates than predicted, we need to start examining the effects of hardware errors. It is worth noting that we obtain agreement with the theory despite an order of p err = 5% (averaged) readout error per qubit, which amounts to flipping the mea- surement outcome from +1 to −1 and vice versa. The resilience of our protocol to single-qubit readout errors can be attributed to the randomness of the measurement bases, as was discussed in [15]. Moreover, the following considerations provide additional insights in the robustness of the estimation protocol to readout errors. Firstly, let us consider the estimation of the expectation value of single-qubit observable. Let S be the approximate state description composed of M snapshots. Suppose that in M err of the snapshots we obtained the erroneous measurement outcome, M err /M ≈ p err . The estimation of the expectation value of any single-qubit Pauli operator σ α (α = x, y, z) is where m err j is the erroneous outcome. Since m err j = −m j we arrive at which, for sufficiently large M and M err , can be approximated by Therefore, for a single-qubit observable O = i a i σ i we arrive at Thus, the readout error introduces an attenuator (1 − 2p err ) to each coefficient different than a 0 in the expectation value.
Moving to the N -qubit case, we first note that only odd numbers of readout errors per snapshot contribute to the estimation error of an N -qubit expectation value. This is because the estimated expectation value is calculated via a product of the measurement outcomes which take values ±1, c.f. Eq. (18). Therefore, if a given snapshot includes readout errors on an even number of qubits, there are even number of erroneous outcomes m err j such that m err j = −m j , and the estimated expectation value remains unaffected by these errors. Let p odd (r) be the probability that in a given snapshot there is an odd number of readout errors on r qubits. Given a readout error p err per qubit, Similarly to the single qubit analysis, one can then show that the expectation value of a Pauli monomial P i with weight r i is We therefore obtain that the estimated expectation value of an N -qubit observable O = i a i P i is given by This equation indicates that the readout error does not depend on N . It introduces an attenuator (1 − 2p err ) ri to each term in the expectation value. Since the attenuator is an exponentially decreasing function in the weight r i , only low-weight Pauli monomials contribute to the expectation value O M . Thus, the proposed method is robust against readout error when estimating expectation values that have contributions dominantly from low-weight Pauli monomials (this does not necessarily exclude observables, such as projectors, that have contributions from all Pauli monomials: the contribution of higher weight Pauli monomials could be small on a class of states of interest, as in the case of the low entanglement states of this section's random projectors experiment). Finally, we remark that other errors, e.g. gate operation errors, could be included in this framework in a similar fashion and that, clearly, (31) is easily generalisable to the case of different readout errors per qubit.

IV. RELATED WORK
Quantum state tomography. In general, estimating an N -qubit state to error in trace distance requires O(2 N / 2 ) copies of the state [16]. The estimation of the density matrix with our tomographic procedure does not break this lower bound. If we choose the basis of Nqubit Pauli operators, Eq. (22) tells us that a few matrix elements will require M to be exponentially large in N . A different basis, such as the standard basis |i j|, has exponentially many matrix elements bounded by 1. However, calculating the trace distance requires adding exponentially many of them, creating again the need for an exponentially large M . The key point of this paper, however, is to be able to estimate approximately and efficiently expectation values without resorting to the notion of density matrix.
Shadow tomography. In shadow tomography the task is to estimate with high probability the expectation values of a set of K observables, to a certain precision . It was shown that [8] this can be done using M = O(log(K) 2 N 2 −8 ) copies of the state, by im-plementing a non-local measurement on all copies of the state. Recently, these results were improved [17]. The authors of [17] proposed an algorithm where M = O(B log(K)/ 2 ) copies of the state, with B max i O i 2 shadow , allow one to predict the expectation values of K observables, , to a precision . The norm · shadow depends on the particular measurement procedure used in the algorithm. For example, for random Pauli measurements the authors show (Proposition 3 there) that for a k-local observable O shadow 2 k O ∞ , where · ∞ denotes the spectral norm.
The algorithm of [17] has a similar structure to the algorithm proposed in this work. Similarly to [17], the choice of randomised single-qubit measurement directions of Eq. (13) is not the only possible one. The framework of [10,11] generates valid formulas for any irreducible projective representation in the space of the system, including for example the Clifford group and the representation of SU (2) in the space of a spin s = (2 N − 1)/2. Furthermore, one could also easily derive a general expression for the norm, bound of the standard deviation of the estimator, as Tr[O 2 ] multiplied by a factor only dependent on the group representation chosen. It is important to note, however, that the use of the Cauchy-Schwarz inequality in the derivation would generate suboptimal norms, often exponentially larger than the true standard deviation of the estimator. A tight norm (or seminorm) is key, as it gives a guarantee of performance with a lower number of preparations, especially if the separation from a suboptimal one is exponential. The closeness to the standard deviation of the estimator is what justifies the importance of O 2 of (24): O 2 can be exponentially smaller than norms depending exponentially on the locality or on the dimension of the system (App. C for the case of projectors onto the computational basis). Finally, while the task in shadow tomography is to estimate with high probability the expectation values of a set of K observables, to a certain precision , we focus here on the estimation with high probability the expectation values of a given, yet arbitrary, observable to a certain precision . We show that accomplishing this task requires M = O( O 2 / 2 ) copies of the state. If a certain precision is required for the combined evaluation of expectation values of several observables, one would simply calculate the joint probability of several normal variables of being smaller than a certain value which would lead to a log contribution to the error in the number of observables.

V. CONCLUSIONS AND OUTLOOK
We have introduced a concept of approximate description of a quantum state for a system of N qubits. We have shown that the approximate state comes with an explicit operational procedure to estimate the expectation value of any observable that admits an efficient rep-resentation in the Pauli basis with statistical error only dependent on the cardinality of the approximate quantum state and on the observable's seminorm, independent of N and growing at most exponentially with the locality of the observable. The operational procedure to obtain the approximate state, other than the initial state preparations, only requires single-qubit operations and measurement. The natural robustness of random singlequbit operations to experimental noise, as we saw in the experimental results and also discussed in [15], suggests that the proposed protocol could be appropriate for NISQ computers.
We expect the notion of the approximate quantum state to enable improvement over existing variational optimisation methods, as it would not require separate iterations for each set of values of the parameters. Machine learning optimisations, in particular for generative models known as Born Machines [18], may be an interesting case, since the explicit dependence of the expectation values of the projectors onto computational basis states on the variational parameters [19] may be obtained even for a large number of qubits, since, as shown in App. B, the estimators for the projectors onto computational basis state have, with high probability, variance bounded by 1 for all values of N .
The definition of the snapshots, constituting an approximate quantum state, can be generalised, too. The snapshots can in fact be derived from a group different from SU (2). This generalisation has interest that goes beyond the purely mathematical exercise. As we have seen, SU (2) tomography works well for observables expressed as linear combinations of products of Pauli operators, especially if the weights of the Pauli monomials are low. Electronic Hamiltonians can be represented as linear combinations of products of Pauli operators, but at the cost, in general, of introducing Pauli monomials with weights increasing with N . In second quantisation, electronic Hamiltonians contain linear combinations of products of only two or four ladder operators for any N . The bosonic tomographic formulas of [10] can be extended to the fermionic case, choosing the additive group of Grassmann numbers [20], instead of the additive group of complex numbers, as tomographic group and expressing the snapshots as vectors of sampled real numbers descending from Grassmann numbers and of measured presence or absence of electrons in orbitals. The estimators for ladder operators and for their linear combinations can then be calculated directly without the need for the mapping to Pauli operators and without the consequent appearance of weights of Pauli monomials growing with N . where s = (s x , s y , s z ) is spin of the system, and K(m s , n) is a kernel function given by K(m s , n) = 2s + 1 π 2π 0 dψ sin 2 ψ 2 e iψ(ms− s· n) .
Specialising Eq. (A2) for a spin-1/2 particle where m s = ±1/2 . = m/2 and s = (σ x , σ y , σ z )/2 . = σ/2 we can write where the subscript of K 1 is used to emphasise that is the kernel operator on a single qubit. We calculate (A3) utilising the identity e −i ψ 2 σ· n = 1 cos ψ 2 − i σ · n sin ψ 2 and directly obtain In the main text we have defined an unbiased estimator of σ α , with α = x, y, z and n α defined by n = (n x , n y , n z ). The variance of R 1 [σ α ] is We calculate the last term of (A6) utilising (A5): We define a seminorm of O as and obtain Therefore, for M sufficiently large, O M has a normal statistical error smaller than or equal to O / √ M .  is not explicitly indicated for a lighter notation): where the last sum on i and j is extended to all values of i and j except 0 v . = (0, 0, . . . , 0). In general, for every k, the indices i k and j k appearing in each R[P i ]R[P j ] of (B2) can assume any integer value between 0 and 3. For given i and j, let S be the set of indices k for which i k = 0 ∧ j k = 0, s the cardinality of S and S the set containing the remaining indices k of 1, . . . , N with cardinalitys = N − s. We use (16) for a Pauli monomial to arrive at If we introduce a seminorm of O as with r i representing the weight of P i , since This is not surprising, since the seminorm (B6) can be shown to be a bound for the variance of R[O] by expanding Var i =0v a i R[P i ] of (B2) and using Schwarz inequality without resorting to explicit properties of R as in (B3). The seminorm (B4) can also be expressed as with the seminorm O 2 defined by We use (B3) and rewrite (B2) as Equation (B10) shows how the mixed terms of (B8) are a weak upper bound for the last term of (B10). For a randomly chosen O or, in the same way, for a random ρ, the addends in the last term of (B10) have mixed positive and negative signs. Furthermore, for most states ρ, the expectation values of the expressions P i P j tend to become smaller for higher values of r ij . As a result, O 2 is generally a good approximation of the standard deviation of R[O].
To formalise this statement, we start by showing that the expectation value of the mixed terms of (B10) over the space of the density operators of the system is equal to zero. The probability of choosing a density operator from the space of density operators is the probability of choosing a set of eigenvalues multiplied by the probability of choosing a density operator with specific eigenvalues conditional to the choice of the eigenvalues. If for any choice of the eigenvalues, the expectation value over the density operators with the chosen eigenvalues is zero, then the expectation value over all density operators is zero. All density operators with the same eigenvalues can be obtained from one with a unitary transformation and, vice versa, all unitary transformations applied to a density operator preserve the eigenvalues. Therefore, we can calculate the expectation value of the mixed terms of (B10) over the density operators with given eigenvalues by averaging over the unitary group [21] with the unitary transformations applied to an arbitrary density operator ρ 0 To gain further information on how close O 2 is likely to be to O for a random state, we now consider the variance of the mixed terms of (B8) over the space of the density operators of the system. For simplicity, we specialise the analysis to pure states. From the unitary group notation of (B11), we switch to the average over the states expressed as in [22] and recall the following identities: for any operator A and B. The variance of the mixed terms of (B10) over the states of the system can be expressed as Var i =j( =0v) 3 rij ∆ ij Tr P i P j |ψ ψ| a i a j ψ = ψ dψ i =j( =0v) a i a j 3 rij ∆ ij Tr P i P j |ψ ψ| a i a j a i a j 3 rij 3 r i j ∆ ij ∆ i j ψ dψ Tr P i P j |ψ ψ| Tr P i P j |ψ ψ| a i a j a i a j 3 rij 3 r i j ∆ ij ∆ i j Tr (P i P j ) ⊗ (P i P j ) ψ dψ|ψ ψ| ⊗2 a i a j a i a j 3 rij 3 r i j ∆ ij ∆ i j Tr (P i P j ) ⊗ (P i P j ) 1 sym a i a j a i a j 3 rij 3 r i j ∆ ij ∆ i j Tr[P i P j P i P j ] a i a j a i a j 3 rij 3 r i j ∆ ij ∆ i j ∆ iji j , with ∆ iji j = 1 when i, j, i , j are such that P i P j P i P j = 1, otherwise ∆ iji j = 0. A general upper bound of the last expression is likely to be weak, given its strong dependence on O. The specific case of O given by a projector onto a computational basis state is examined in App. C.

Appendix C: Projectors onto the computational basis states
In this section we examine properties of the seminorm for an important special case, namely, the case of projectors onto a generic element of the computational basis state, where the ± of each factor is defined by the choice of the element of the computational basis |x . This case is important not only for its practical significance in a machine learning context, as briefly indicated in Sec. V, but also because it is representative of the cases with bounded O 2 and diverging O for increasing N , which makes the determination of whether the statistical error of the estimator can be approximated by O 2 of particular interest. To calculate the seminorms of any projector onto a computational basis state, we expand the tensor product of (C1) and use the seminorm definitions. For |x x| 2 this simply means |x x| A more involved combinatorial calculation would show that |x x| 2 (3/2) N and |x x| 2 = O((3/2) N ). We come back to (B15) for any projector onto computational basis state and observe that, for given i and j, there are 2 N ordered pairs i , j such that ∆ iji j = 1. The operator P i P j is in fact the tensor product of single-qubit identities and operators σ z . Each of them becomes the identity when multiplied by itself and this can be obtained with two choices for every qubit of P i and P j . Unfortunately, to complicate things, r i j is not independent of the choice of i and j for given i and j; however, we can simplify the evaluation of (B15) by considering a limit case and determining an upper bound. In particular, for every i and j, with i = j and i and j different from 0 v , i ,j With the substitutions a i = (1/2) N and ∆ ij = 1 valid for any projector |x x| and any value of the indexes, the last term of Eq. (B15) becomes 1 (2 N + 1) i =j( =0v) i =j ( =0v) a i a j a i a j 3 rij 3 r i j ∆ ij ∆ i j ∆ iji j < 1 (2 N + 1) 4 N i =j( =0v) (C4) Equation (C4) shows that, for any projector |x x|, Var(R[|x x|]) for a random state is bounded by |x x| 2 2 < 1 with probability exponentially close to 1 as a function of N .