Operational Quantum Average-Case Distances

We introduce distance measures between quantum states, measurements, and channels based on their statistical distinguishability in generic experiments. Specifically, we analyze the average Total Variation Distance (TVD) between output statistics of protocols in which quantum objects are intertwined with random circuits and measured in a standard basis. We show that for circuits forming approximate 4-designs, the average TVDs can be approximated by simple explicit functions of the underlying objects – the average-case (AC) distances. We apply AC distances to analyze the effects of noise in quantum advantage experiments and for efficient discrimination of high-dimensional states and channels without quantum memory. We argue that AC distances are better suited for assessing the quality of NISQ devices than common distance measures such as trace distance or the diamond norm.

case performance of a device in question. This may be impractical as well: it is not expected that the performance of typical experiments on a quantum device will be comparable to the worst-case scenario.
In this work, we consider the average Total-Variation (TV) distance between output statistics of two protocols in which random circuits interlace quantum objects of interest (see Figure 1). This can be thought of as mimicking the typical circumstances in which quantum states, measurements, or channels appear as parts of quantum-information protocols. We show that for a broad class of easy-to-implement random circuits (forming approximate 4-designs), the average TV distance is approximated by simple explicit functions expressible via degree-2 polynomials in the objects in question.
We use these functions to define distance measures between states, measurements, and channels. The so-defined average-case (AC) distances are thus distance measures that approximate the average-case total variation distance. Contrary to conventional distances such as the trace distance or the diamond norm, the AC distances capture the generic behavior of quantum objects in experiments involving only moderate-depth quantum circuits. This feature can be especially relevant in the context of near-term algorithms, such as the Quantum Approximate Optimization Algorithm (QAOA) [13,14,23] and the Variational Quantum Eigensolver (VQE) [33,46,47], as it is expected that generic variational circuits will, on average, have properties of unitary designs [41]. We present numerical results suggesting that AC distances are more suitable for quantifying the impact of imperfections on variational algorithms than the conventional distance measures.
Multiple recent quantum advantage proposals are based on random circuit sampling [5,51]. We apply AC distances to understand the effects of noise on such protocols. We approach the problem from two sides. First, the AC distances make it easy to lower-bound the average-case TV distance between the noisy distribution and the ideal distribution, thus giving insight into how well separated, on average, noisy distributions are from target distributions. Second, AC distances allow one to upper-bound the average-case TV distance between a noisy distribution and a (trivial) uniform distribution. This makes it possible to study how quickly the noise makes the average distribution useless. For example, we show that even in the absence of gate and state-preparation noise, local symmetric bitflip errors in measurements cause the noisy distribution to approach the trivial one exponentially quickly in system size.

Figure 1 (caption): For quantum states (a), we take the average over random unitaries applied to the state, followed by measurement in the standard basis. For quantum measurements (b), we take the average over random pure states measured on the detector. Finally, for quantum channels (c), we take the average over independent random unitaries applied before and after the application of the channel.
Recently there has been a lot of interest in algorithms that use randomized quantum circuits, such as shadow tomography [1,11,19,20,28] and randomized benchmarking [12,15,16,26,39,40]. Our results can be employed to quantify the performance of randomized algorithms in the task of statistical distinguishability of quantum objects. Namely, if the average-case distance between a pair of quantum objects on N-qubit systems is large, then they can be (statistically) distinguished almost perfectly using a randomized protocol with just a few implementations of local random circuits of depth O(N). We observe that such behavior takes place in two scenarios related to those recently analyzed in the context of so-called Quantum Algorithmic Measurement [2] and complexity growth of quantum circuits [10]: (i) distinguishing a Haar-random N-qubit pure state from the maximally mixed state and (ii) distinguishing an N-qubit Haar-random unitary from the maximally depolarizing channel. This shows that protocols employing random circuits can be used to efficiently discriminate quantum objects. Since they do not depend on the objects to be distinguished, randomized measurement schemes can be interpreted as "universal discriminators", analogous to the SWAP test but not requiring the usage of entanglement or coherent access to copies of quantum systems.
The manuscript is accompanied by a complementary work [37] that contains proofs of theorems, a thorough analysis of the properties of average-case quantum distances, and further examples. In contrast, this work focuses on providing intuition behind AC distances and demonstrating how they can be applied to understand the power of random quantum circuits in practically relevant scenarios, followed by numerical demonstrations.

We will consider general protocols consisting of three stages: (i) state preparation, in which a quantum system is initialized in state ρ, (ii) evolution given by a quantum channel Λ, and (iii) measurement of the resulting state Λ(ρ) by a POVM $M = (M_i)_i$. The outcome statistics of such a protocol are given by the Born rule: $p_i = \operatorname{tr}(M_i \Lambda(\rho))$. The TV distance between probability distributions p and q, $\mathrm{TV}(p,q) = \frac{1}{2}\sum_i |p_i - q_i|$, defines the statistical distinguishability of p and q. Specifically, in a task where we are asked to decide whether the provided samples come from p or q (where both are promised to be given with equal probability), the optimal probability of correctly guessing the answer is $p_{\mathrm{succ}} = \frac{1}{2}(1 + \mathrm{TV}(p,q))$. The related distance between quantum objects is constructed by considering the optimal success probability of distinguishing between pairs of relevant quantum objects, where the optimization is carried out not only over classical post-processing strategies but also over quantum strategies that produce classical outcomes given the objects in question (see Supplementary Material (SM) for details).
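The relation between the Born rule, the TV distance, and the single-sample success probability can be illustrated with a minimal NumPy sketch. The helper names (`born_probabilities`, `tv_distance`, `success_probability`) are illustrative choices, not taken from the paper, and the channel Λ is taken to be the identity for simplicity.

```python
import numpy as np

def born_probabilities(povm, rho):
    """Born rule p_i = tr(M_i rho) for a list of POVM effects M_i."""
    return np.array([np.trace(m @ rho).real for m in povm])

def tv_distance(p, q):
    """Total variation distance TV(p, q) = (1/2) * sum_i |p_i - q_i|."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

def success_probability(p, q):
    """Optimal guessing probability for equal priors: (1/2) * (1 + TV(p, q))."""
    return 0.5 * (1.0 + tv_distance(p, q))

# Illustration: |0><0| versus the maximally mixed qubit state,
# both measured in the computational basis (identity channel assumed).
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho = np.diag([1.0, 0.0])
sigma = np.eye(2) / 2
p = born_probabilities(povm, rho)
q = born_probabilities(povm, sigma)
print(tv_distance(p, q), success_probability(p, q))
```

Here the two output distributions are (1, 0) and (1/2, 1/2), so the TV distance is 1/2 and the success probability is 3/4.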
Here we propose alternative distance measures based on scenarios where the strategy of discrimination of quantum objects is based on intertwining them with random quantum circuits and then comparing their outcome statistics [37]. Specifically, consider output statistics $p^{\alpha,\beta}$ of a quantum protocol where α is a fixed quantum object while β is taken to be a random variable (specifying a quantum circuit) distributed according to a probability distribution ν. The average statistical distinguishability of two objects $\alpha_1, \alpha_2$ is quantified by $\mathrm{TV}_{\mathrm{av}}(\alpha_1,\alpha_2) = \mathbb{E}_{\beta\sim\nu}\,\mathrm{TV}(p^{\alpha_1,\beta}, p^{\alpha_2,\beta})$. Explicit computation of $\mathrm{TV}_{\mathrm{av}}(\alpha_1,\alpha_2)$ is difficult because TV(p, q) is not a polynomial function of the involved probabilities. However, if ν forms an approximate 4-design, it is possible to find simple estimates of $\mathrm{TV}_{\mathrm{av}}$. Unitary k-designs are measures on $\mathrm{U}(\mathcal{H}_d)$ that reproduce averages of the Haar measure µ on balanced polynomials of degree k in U [3]. For approximate k-designs these averages agree only approximately. Importantly, random quantum circuits in the 1D architecture formed from arbitrary universal gates that randomly couple neighboring qubits generate approximate k-designs efficiently in the number of qubits N [9,21,25,45]. Specifically, δ-approximate 4-designs are generated by the 1D random brickwork architecture in depth O(N + log(1/δ)), with moderate numerical constants [21].
Quantum average-case distances between states, measurements, and channels. We are now ready to formulate our main technical results: dimension-independent relative-error estimates on average TV distances between the three types of quantum objects depicted in Figure 1. To simplify the formulation of the Theorems, we will use the symbol ≈ to denote equality up to a dimension-independent relative error. The specific constants are given in [37]. In Appendix B we provide simplified proofs of the following theorems in the setting of exact unitary designs. The proofs for approximate unitary designs can be found in Appendix B of [37].
Quantum states.
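Let $p^{\rho,U}$ denote the probability distribution of a quantum process in which ρ undergoes a unitary transformation U and is subsequently measured in the computational basis of $\mathcal{H}_d$. In other words, $p^{\rho,U}_i = \operatorname{tr}(|i\rangle\langle i| U \rho U^\dagger)$.

Theorem 1 (Average-case distinguishability of quantum states - Theorem 1 from [37]). Let ρ, σ be quantum states on $\mathcal{H}_d$ and let ν be a distribution on the unitary group $\mathrm{U}(\mathcal{H}_d)$ forming a δ-approximate 4-design for $\delta = \frac{\delta'}{2d^4}$, with $\delta' \in (0, \frac{1}{3})$. We then have

$$\mathrm{TV}^{\nu}_{\mathrm{av}}(\rho,\sigma) \approx d^{s}_{\mathrm{av}}(\rho,\sigma) := \tfrac{1}{2}\|\rho-\sigma\|_{\mathrm{HS}}, \qquad (2)$$

where $\|X\|_{\mathrm{HS}} = \sqrt{\operatorname{tr}(X^2)}$ denotes the Hilbert-Schmidt norm.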
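The statement of Theorem 1 can be probed numerically on small systems. The sketch below is an illustration under stated assumptions, not the authors' code: it samples exactly Haar-random unitaries (via the QR construction) instead of an approximate 4-design, reads the average-case distance as half the Hilbert-Schmidt norm of ρ − σ, and uses hypothetical helper names throughout.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Ginibre matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale column j by the phase of r_jj

def random_state(d, rng):
    """Random full-rank density matrix (normalized Wishart matrix)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def d_av_states(rho, sigma):
    """Assumed closed form: (1/2) * ||rho - sigma||_HS (Frobenius norm)."""
    return 0.5 * np.linalg.norm(rho - sigma, "fro")

def average_tv_states(rho, sigma, n_samples, rng):
    """Monte Carlo estimate of the mean TV distance over random unitaries."""
    d = rho.shape[0]
    delta = rho - sigma
    total = 0.0
    for _ in range(n_samples):
        u = haar_unitary(d, rng)
        # diagonal of U delta U^dagger: differences of outcome probabilities
        diff = np.einsum("ij,jk,ik->i", u, delta, u.conj()).real
        total += 0.5 * np.abs(diff).sum()
    return total / n_samples

rng = np.random.default_rng(7)
rho, sigma = random_state(8, rng), random_state(8, rng)
ratio = average_tv_states(rho, sigma, 500, rng) / d_av_states(rho, sigma)
print("mean TV / d_av:", ratio)
```

The printed ratio plays the role of the dimension-independent relative error hidden in the symbol ≈; it should stay below 1 and bounded away from 0.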
The proof of Theorem 1 (and also of Theorems 2 and 3 stated below) is inspired by the proof of Theorem 4 from [3], where Berger's inequality (stating that for every random variable X with well-defined 2nd and 4th moments we have $\frac{(\mathbb{E}X^2)^{3/2}}{(\mathbb{E}X^4)^{1/2}} \le \mathbb{E}|X|$) was used to prove that two states far apart in Hilbert-Schmidt norm can be information-theoretically distinguished by a POVM constructed from an approximate 4-design.
Remark 1. We can interpret the above average statistical distinguishability as the TV distance of output statistics resulting from a measurement of a single POVM with effects $M_{i,U_j} = \nu_j U_j^\dagger |i\rangle\langle i| U_j$, where $\nu_j$ is the probability of occurrence of circuit $U_j$ in the ensemble ν (for simplicity of presentation we assumed that the ensemble ν is discrete). This POVM can be interpreted as a convex combination [44] of projective measurements $M^{U_j}$ with effects $M^{U_j}_i = U_j^\dagger |i\rangle\langle i| U_j$. A lower bound on the average TV distance implies that such a randomized protocol distinguishes between quantum states with high probability. It immediately follows that there also exists a deterministic (not randomized) optimal distinguishability protocol that achieves the same success probability. Such a measurement can be implemented, for example, via Naimark's dilation using an ancillary system [43]. An analogous interpretation also holds for the average TV distances from Theorems 2 and 3 below.
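The interpretation in Remark 1 can be checked directly for a small discrete ensemble. The sketch below is illustrative only: the ensemble of three Haar-random unitaries, the weights, and all helper names are assumptions. It builds the single POVM with effects ν_j U_j†|i⟩⟨i|U_j and confirms that its TV distance equals the ν-weighted average of the per-circuit TV distances.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Ginibre matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def rotated_basis_effects(u):
    """Projective measurement with effects U^dagger |i><i| U."""
    cols = u.conj().T  # column i of U^dagger is the vector U^dagger |i>
    return [np.outer(cols[:, i], cols[:, i].conj()) for i in range(u.shape[0])]

def randomized_povm(unitaries, weights):
    """Single POVM with effects nu_j * U_j^dagger |i><i| U_j."""
    return [w * e for w, u in zip(weights, unitaries) for e in rotated_basis_effects(u)]

def born(effects, rho):
    return np.array([np.trace(m @ rho).real for m in effects])

def tv(p, q):
    return 0.5 * np.abs(p - q).sum()

rng = np.random.default_rng(1)
d = 4
unitaries = [haar_unitary(d, rng) for _ in range(3)]
weights = [0.5, 0.3, 0.2]  # hypothetical ensemble probabilities nu_j
rho = np.diag([1.0, 0.0, 0.0, 0.0])
sigma = np.eye(d) / d

povm = randomized_povm(unitaries, weights)
tv_single_povm = tv(born(povm, rho), born(povm, sigma))
tv_weighted_mean = sum(
    w * tv(born(rotated_basis_effects(u), rho), born(rotated_basis_effects(u), sigma))
    for w, u in zip(weights, unitaries)
)
print(tv_single_povm, tv_weighted_mean)
```

The two printed numbers agree because the outcome probabilities of the combined POVM are the per-circuit probabilities rescaled by ν_j.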

Quantum measurements.
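Let $p^{M,\psi_V}$ denote the probability distribution of a quantum process in which a fixed pure quantum state $\psi_0$ is evolved by a unitary V and is subsequently measured via an n-outcome POVM $M = (M_i)_{i=1}^{n}$. In other words, $p^{M,\psi_V}_i = \operatorname{tr}(M_i V \psi_0 V^\dagger)$.

Theorem 2 (Average-case distinguishability of quantum measurements - Theorem 2 from [37]). Let M, N be n-outcome POVMs on $\mathcal{H}_d$ and let ν be a distribution on $\mathrm{U}(\mathcal{H}_d)$ forming a δ-approximate 4-design for $\delta = \frac{\delta'}{(2d)^8}$, with $\delta' \in (0, \frac{1}{3})$. We then have

$$\mathrm{TV}^{\nu}_{\mathrm{av}}(M,N) \approx d^{m}_{\mathrm{av}}(M,N), \qquad (3)$$

where

$$d^{m}_{\mathrm{av}}(M,N) := \frac{1}{2d}\sum_{i=1}^{n}\sqrt{\|M_i-N_i\|_{\mathrm{HS}}^2 + \left(\operatorname{tr}(M_i-N_i)\right)^2}.$$

Quantum channels.
Let $p^{\Lambda,\psi_V,U}$ be the probability distribution associated with a quantum process in which a fixed pure quantum state $\psi_0$ is successively acted on by a unitary V, a channel Λ, and a unitary U, and is subsequently measured in the computational basis of $\mathcal{H}_d$. In other words, we have $p^{\Lambda,\psi_V,U}_i = \operatorname{tr}(|i\rangle\langle i| U \Lambda(V \psi_0 V^\dagger) U^\dagger)$.

Theorem 3 (Average-case distinguishability of quantum channels - Theorem 3 from [37]). Let Λ, Γ be quantum channels acting on $\mathcal{H}_d$ [...]. Then we have

$$\mathrm{TV}^{\nu}_{\mathrm{av}}(\Lambda,\Gamma) \approx d^{ch}_{\mathrm{av}}(\Lambda,\Gamma), \qquad (4)$$

where

$$d^{ch}_{\mathrm{av}}(\Lambda,\Gamma) := \frac{1}{2}\sqrt{\|J_\Lambda - J_\Gamma\|_{\mathrm{HS}}^2 + \operatorname{tr}\left(\left((\Lambda-\Gamma)[\tau_d]\right)^2\right)},$$

$\tau_d = \frac{I}{d}$ is the maximally mixed state, and $J_\Lambda$ denotes the Jamiołkowski-Choi state of Λ.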
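Both distances are simple to evaluate for small systems. The following sketch assumes the closed forms d^m_av(M,N) = (1/2d) Σ_i sqrt(‖M_i − N_i‖²_HS + tr(M_i − N_i)²) and d^ch_av(Λ,Γ) = (1/2) sqrt(‖J_Λ − J_Γ‖²_HS + tr(((Λ − Γ)[τ_d])²)), with τ_d = I/d. This is our reading of the definitions and should be checked against [37]. Channels are represented by Kraus operators, and all function names are hypothetical.

```python
import numpy as np

def hs_norm_sq(x):
    """Squared Hilbert-Schmidt norm tr(X^dagger X)."""
    return float(np.sum(np.abs(x) ** 2))

def d_av_measurements(povm_m, povm_n):
    """Assumed form: (1/2d) * sum_i sqrt(||M_i - N_i||_HS^2 + tr(M_i - N_i)^2)."""
    d = povm_m[0].shape[0]
    total = 0.0
    for m_i, n_i in zip(povm_m, povm_n):
        diff = m_i - n_i
        total += np.sqrt(hs_norm_sq(diff) + np.trace(diff).real ** 2)
    return total / (2 * d)

def choi_state(kraus_ops):
    """Jamiolkowski-Choi state: the channel applied to half of a maximally entangled state."""
    d = kraus_ops[0].shape[0]
    omega = np.eye(d).reshape(d * d) / np.sqrt(d)
    choi = np.zeros((d * d, d * d), dtype=complex)
    for k in kraus_ops:
        v = np.kron(k, np.eye(d)) @ omega
        choi += np.outer(v, v.conj())
    return choi

def apply_channel(kraus_ops, rho):
    return sum(k @ rho @ k.conj().T for k in kraus_ops)

def d_av_channels(kraus_a, kraus_b):
    """Assumed form: (1/2) * sqrt(||J_a - J_b||_HS^2 + tr(((a - b)[tau_d])^2))."""
    d = kraus_a[0].shape[0]
    tau = np.eye(d) / d
    choi_term = hs_norm_sq(choi_state(kraus_a) - choi_state(kraus_b))
    out_diff = apply_channel(kraus_a, tau) - apply_channel(kraus_b, tau)
    return 0.5 * np.sqrt(choi_term + np.trace(out_diff @ out_diff).real)

def depolarizing_kraus(d):
    """Kraus operators |a><b| / sqrt(d) of the channel rho -> tr(rho) * I/d."""
    ops = []
    for a in range(d):
        for b in range(d):
            k = np.zeros((d, d), dtype=complex)
            k[a, b] = 1 / np.sqrt(d)
            ops.append(k)
    return ops

# Two-qubit illustrations
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
identity_channel = [np.eye(4, dtype=complex)]
x_on_first_qubit = [np.kron(pauli_x, np.eye(2))]
some_unitary = [np.kron(hadamard, np.eye(2))]

print(d_av_channels(x_on_first_qubit, identity_channel))   # single Pauli error
print(d_av_channels(some_unitary, depolarizing_kraus(4)))  # unitary vs depolarizing
```

Under the assumed formula, a single traceless unitary error gives 1/√2 (the value quoted in Example 5), and a unitary channel compared with the fully depolarizing channel gives (1/2)·sqrt(1 − 1/d²).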

Remark 3. Having defined randomized distinguishability strategies, it is natural to ask how they compare to optimal protocols on a d-dimensional Hilbert space $\mathcal{H}_d$. To answer this, we give upper bounds on the maximal ratio between worst-case and average-case distances. It turns out that this ratio is bounded from above by dimension-dependent quantities for quantum states, measurements, and channels, respectively (the explicit expressions are given in [37]). This implies that there exist scenarios where the optimal protocol for distinguishing two quantum objects performs exponentially better than a protocol using random quantum circuits. Indeed, in the technical version of the manuscript [37], we construct examples that saturate those bounds.
The above theorems suggest defining average-case distances between quantum states, measurements, and channels via the formulas $d^{s}_{\mathrm{av}}$, $d^{m}_{\mathrm{av}}$, $d^{ch}_{\mathrm{av}}$ appearing in approximations (2), (3), and (4). This approach has several pleasant consequences. First, the functions describing these distances can be expressed via simple, degree-two polynomials in the underlying objects and can be easily computed explicitly for objects acting on systems of moderate dimension (no optimization is needed, unlike in the case of the diamond norm [50]). Second, all average-case distances utilize in some way the Hilbert-Schmidt norm. This gives this norm an operational interpretation it did not possess before (especially for quantum states, for which $d^{s}_{\mathrm{av}}(\rho,\sigma) = \frac{1}{2}\|\rho-\sigma\|_{\mathrm{HS}}$). Third, it turns out that the so-defined distances satisfy a plethora of natural properties such as subadditivity or restricted data-processing inequalities (typically the various distances $d_{\mathrm{av}}$ are non-increasing under application of unital quantum channels). See [37] for details and proofs of various properties of average-case distances. Fourth, while it may seem that the condition of being an (approximate) 4-design is quite stringent, it follows from a recent paper [21] that the ensembles of quantum circuits required by Theorems 1-3 can be realized by random circuits in the 1D brickwork architecture in depth O(N) (with moderate prefactors). Finally, we expect that our average-case distances will more accurately capture the behavior of errors in the performance of quantum objects in generic moderate-size quantum algorithms (note that many architectures of variational circuits used in NISQ algorithms are expected to exhibit, on average, design-like behavior [41]). We back up this last claim numerically by testing the usefulness of our distance measures on families of random quantum circuits originating from random instances of variational quantum algorithms on few-qubit systems.
Applications. For all the reasons mentioned above, we believe that the introduced distances will prove useful in analyzing the practical performance of near-term quantum processors. We expect that they can also be useful in other branches of quantum information requiring the usage of randomized protocols, like quantum communication, quantum complexity theory, or quantum machine learning. The following simple examples illustrate the potential usefulness of our results. Here we consider examples which help to understand how noise affects average probability distributions in experiments with random circuit sampling. First, AC distances between noisy and ideal states allow one to lower-bound average TVDs between target and noisy distributions. Second, AC distances allow one to upper-bound the average-case TVD between the noisy distribution and the trivial (uniform) one.
Indeed, to bound the average TVD between the uniform and the noisy distribution, one calculates the AC distance to the maximally mixed state $\frac{I}{d}$ (states), the trivial POVM $M^{I} = (\frac{I}{d}, \ldots, \frac{I}{d})$ (measurements), or the maximally depolarizing channel $\Lambda_{\mathrm{dep}}$ that acts as $\Lambda_{\mathrm{dep}}(\rho) = \frac{I}{d}$ for any state ρ (channels). This follows directly from the definitions of AC distances; see Lemmas 23, 24, and 25 in [37].
In what follows, most of the examples make use of some average noise parameter $q_{\mathrm{av}}$ (with a different meaning in each example) that describes the average (over qubits) probability that errors of the considered type do not occur. In most of them, we assume that $q_{\mathrm{av}} \le \sqrt[N]{1/2}$. This is done solely to achieve a particularly appealing form of the lower bounds. One can derive expressions that are more complicated and do not require this assumption (see SM for details and proofs of the following examples). In gen- [...]

[...] $\sigma_j \rho \sigma_j$ with j ∈ {1, x, y, z}, $\sigma_1 = I$, and [...] $p_{r_i}$, i.e., the probability of applying on qubit i a gate that stabilizes the state of that qubit (namely, either the identity or the Pauli matrix of which $|\pm r_i\rangle$ is an eigenstate). Define average properties of noise as [...] for each qubit and that $q_{\mathrm{av}} \le \sqrt[N]{1/2}$. Then we have [...]

The above example might be relevant, for example, in QAOA algorithms, where the input state is often indeed a tensor product Pauli state [13], or can be useful for estimating the effects of state-preparation errors in the standard setting where the input state is $|0\rangle\langle 0|^{\otimes N}$. We see that with growing system size, the average noisy distribution approaches the uniform distribution exponentially quickly (while moving away from the target distribution).
This demonstrates that even in the absence of noise in random unitaries, the state-preparation errors will quickly aggregate. Exactly the same behaviour is demonstrated for the following simplified measurement noise model.

Example 2 (Symmetric bitflip measurement noise).
Consider a noisy version $T^{\mathrm{sym}}P$ of the computational basis measurement P, where $T^{\mathrm{sym}} = \bigotimes_{i=1}^{N} T^{\mathrm{sym}}_i$ and the k-th effect of the noisy measurement is given by $(T^{\mathrm{sym}}P)_k = \sum_l T^{\mathrm{sym}}_{kl} |l\rangle\langle l|$. Here for each qubit we have [...] is a bitflip error probability on the i-th qubit. Define [...] for each qubit. Then we have [...] The above means that even in the absence of state-preparation and gate errors, for symmetric bitflip noise the resulting average distribution converges exponentially quickly to uniform. We now consider the distance from the ideal measurement for the more realistic case of generic tensor product measurement noise.
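The exponential approach to the trivial measurement under symmetric bitflip noise can be reproduced numerically for a few qubits. The sketch below rests on assumptions, since the single-qubit matrix and the bounds are not legible in this copy: each qubit is taken to have the stochastic matrix [[q, 1−q], [1−q, q]] with q the probability of no bitflip, and the measurement distance is taken in the form (1/2d) Σ_k sqrt(‖Δ_k‖²_HS + tr(Δ_k)²). The closed forms in `closed_form_to_trivial` and `closed_form_to_ideal` follow from those assumptions by direct calculation and are not quoted from the paper.

```python
import numpy as np
from functools import reduce

def symmetric_bitflip_map(qs):
    """Tensor product of single-qubit matrices [[q, 1-q], [1-q, q]] (assumed form)."""
    return reduce(np.kron, [np.array([[q, 1 - q], [1 - q, q]]) for q in qs])

def effects_from_stochastic_map(t):
    """Effects (TP)_k = sum_l T_kl |l><l|, i.e. row k of T placed on the diagonal."""
    return [np.diag(row) for row in t]

def d_av_measurements(povm_m, povm_n):
    d = povm_m[0].shape[0]
    total = 0.0
    for m_i, n_i in zip(povm_m, povm_n):
        diff = m_i - n_i
        total += np.sqrt(np.sum(np.abs(diff) ** 2) + np.trace(diff).real ** 2)
    return total / (2 * d)

def closed_form_to_trivial(qs):
    """(1/2) * sqrt(prod_i (1 - 2 q_i (1 - q_i)) - 2^(-N)) under the assumptions above."""
    qs = np.asarray(qs, dtype=float)
    return 0.5 * np.sqrt(np.prod(1 - 2 * qs * (1 - qs)) - 2.0 ** (-len(qs)))

def closed_form_to_ideal(qs):
    """(1/2) * sqrt(prod_i (1 - 2 q_i (1 - q_i)) - 2 prod_i q_i + 1)."""
    qs = np.asarray(qs, dtype=float)
    return 0.5 * np.sqrt(np.prod(1 - 2 * qs * (1 - qs)) - 2 * np.prod(qs) + 1)

for n_qubits in range(1, 7):
    qs = [0.9] * n_qubits  # hypothetical 10% bitflip probability per qubit
    dim = 2 ** n_qubits
    noisy = effects_from_stochastic_map(symmetric_bitflip_map(qs))
    ideal = [np.diag(row) for row in np.eye(dim)]
    trivial = [np.eye(dim) / dim] * dim
    print(n_qubits, d_av_measurements(noisy, trivial), d_av_measurements(noisy, ideal))
```

With these (deliberately large) error rates, the distance to the trivial POVM shrinks with every added qubit while the distance to the ideal measurement grows.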
Example 3 (Generic tensor product measurement noise). Let $P = (|x\rangle\langle x|)_{x\in\{0,1\}^N}$ be a computational basis measurement on an N-qubit system. Let $M = (M_x)_{x\in\{0,1\}^N}$ be a POVM specified by effects $M_x = \bigotimes_{i=1}^{N} \Lambda_i^\dagger(|x_i\rangle\langle x_i|)$, where $\Lambda_i$ are quantum channels affecting the i-th qubit, and $\Lambda_i^\dagger$ is the conjugate of $\Lambda_i$. Define the classical success probability as $p^{(i)}(k|k) = \operatorname{tr}\left(\Lambda_i^\dagger(|k\rangle\langle k|)\,|k\rangle\langle k|\right)$ for k ∈ {0, 1}, the corresponding average $q^{(i)}_{\mathrm{av}} = \frac{p^{(i)}(0|0)+p^{(i)}(1|1)}{2}$, and $q_{\mathrm{av}} = \frac{1}{N}\sum_{i=1}^{N} q^{(i)}_{\mathrm{av}}$. Assume that for each qubit $q^{(i)}_{\mathrm{av}} \ge \frac{1}{2}$ and that $q_{\mathrm{av}} \le \sqrt[N]{1/2}$. Then we have [...]

The quantity $q_{\mathrm{av}}$ is the survival probability of a classical single-qubit state $|x_i\rangle\langle x_i|$ that goes through a channel $\Lambda_i$, averaged over all qubits and input states. We note that those quantities are routinely reported in experimental works, which makes the above bound particularly useful. Indeed, data from recent quantum advantage experiments [5,51] suggest that $q_{\mathrm{av}}$ is around 97% (we take the average of the values reported in both papers). Assume perfect gates, no state-preparation errors, and $q_{\mathrm{av}} = 0.97$. Furthermore, assume that the random circuits used in experiments form approximate 4-designs (this assumption is consistent with the results of [25]). Then from Theorem 2 it follows that if readout errors remain constant with scaling of the system, for a 54-qubit quantum computer, on average (over realizations of random quantum circuits) the output distributions $p^{M,\psi_V}$ will have a constant ≈ 0.13 TV distance from the ideal probability distributions $p^{P,\psi_V}$ solely due to the effects of readout noise.
Example 4 (Tensor product Pauli noise in the middle of the circuit). Consider the tensor product Pauli channel $\Lambda^{\mathrm{pauli}}$ defined in Example 1. For each qubit [...], and corresponding [...] as well as the average probability of application of the identity channel [...] Then we have [...] Recall that the above scenario corresponds to inserting local Pauli noise "between" two random circuits (two averages in Eq. (4)). Similarly to the previous cases, whenever there is non-zero noise, we will observe an exponential convergence to the trivial distribution and a high separation from the ideal distribution corresponding to the identity channel $\mathcal{I}$.

Example 5 (Single Pauli error in the middle of the circuit). Consider a tensor product channel $\Lambda^{(i)}_\sigma$ that applies some traceless unitary σ on qubit i (and the identity on all other qubits). Then we have $d^{ch}_{\mathrm{av}}(\Lambda^{(i)}_\sigma, \mathcal{I}) = \frac{1}{\sqrt{2}}$. Physically, the above may correspond to unitary noise applying one of the Pauli matrices on qubit i somewhere in the circuit. We then observe a constant separation (value of $\frac{1}{\sqrt{2}}$) between the ideal distribution and the noisy distribution. Such a significant average distance between the noisy and target distributions suggests that strong local coherent errors can dramatically affect the performance of a given device in typical circumstances. This result is in agreement with empirical observations made in Refs. [5,8], where single-qubit errors were causing the "speckle pattern" of output bitstring probabilities to break, resulting in very low cross-entropy benchmarking fidelity.
Application 2: Sample efficient distinguishability of quantum objects with incoherent access

Example 6. For any pure state ψ on $\mathcal{H}_d$ we have $d^{s}_{\mathrm{av}}(\psi, \frac{I}{d}) = \frac{1}{2}\sqrt{1-\frac{1}{d}}$. It follows that a single round of the randomized protocol implicit in the definition of $d^{s}_{\mathrm{av}}$ (cf. Remark 1), realized via an approximate 4-design and computational basis measurements, gives a constant bias in distinguishing any pure N-qubit state ψ from the maximally mixed state: $p^{\mathrm{av}}_{\mathrm{succ}} \gtrsim 0.57$. This probability can be made arbitrarily close to 1 by repeating the protocol and using the majority-vote strategy. Importantly, this method does not utilize coherent access or a quantum memory (in the sense defined, e.g., in [2,29]). We note that a related but distinct scenario is considered in Ref. [2]. There, the authors introduced the task of PurityTesting, corresponding to discrimination between an unknown Haar-random pure state and the maximally mixed state. For N-qubit systems, Theorem 4 of [2] implies an exponential lower bound for the query complexity k (number of usages of the unknown quantum state) needed to succeed in this task, given incoherent access to the objects in question. In contrast, our randomized measurement protocol gives high statistical distinguishability already for a single query for all states ψ. The difference comes from the fact that in the scenario considered in Example 6 the random state is arbitrary but known.
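The amplification by repetition mentioned in Example 6 can be made concrete with a short calculation. This sketch assumes that every round produces an independent guess that is correct with the single-round probability 0.57 quoted above, and that the final answer is the majority over an odd number of rounds; the function names are illustrative.

```python
from math import comb, sqrt

def d_av_pure_vs_mixed(d):
    """(1/2) * ||psi - I/d||_HS for a pure state psi, i.e. (1/2) * sqrt(1 - 1/d)."""
    return 0.5 * sqrt(1.0 - 1.0 / d)

def majority_vote_success(p_single, rounds):
    """Probability that more than half of `rounds` independent guesses are correct."""
    if rounds % 2 == 0:
        raise ValueError("use an odd number of rounds to avoid ties")
    return sum(
        comb(rounds, j) * p_single**j * (1 - p_single) ** (rounds - j)
        for j in range(rounds // 2 + 1, rounds + 1)
    )

print("d_av for N = 10 qubits:", d_av_pure_vs_mixed(2**10))
for rounds in (1, 11, 101, 401):
    print(rounds, majority_vote_success(0.57, rounds))
```

The number of repetitions needed for a given confidence depends only on the constant single-round bias, not on the number of qubits.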
Example 7. Let $\Lambda_U$ be a unitary channel corresponding to a unitary U on $\mathcal{H}_d$ and let $\Lambda_{\mathrm{dep}}$ be the depolarizing channel, i.e., $\Lambda_{\mathrm{dep}}(\rho) = \tau_d$ for any ρ. Then we have $d^{ch}_{\mathrm{av}}(\Lambda_U, \Lambda_{\mathrm{dep}}) = \frac{1}{2}\sqrt{1-\frac{1}{d^2}}$.

In the related task FixedUnitary studied in [2], one is asked to distinguish an unknown Haar-random unitary channel $\Lambda_U$ from $\Lambda_{\mathrm{dep}}$. An exponential query complexity lower bound for incoherent protocols was shown in [2]. By repeating the same reasoning as for states, we get that when $\Lambda_U$ is arbitrary but known, a randomized, non-adaptive, and incoherent protocol utilizing two realizations of approximate 4-designs gives a constant bias in the success probability of discriminating $\Lambda_U$ from $\Lambda_{\mathrm{dep}}$ using just a single query.
Application 3: Strong complexity of quantum states and unitaries. The above two examples have interesting consequences for the notion of strong state and unitary complexity investigated in [10]. There, the authors defined the complexity $C_\Delta$ of an N-qubit pure state ψ (resp. unitary circuit $\Lambda_U$) as the number of elementary gates needed to construct a circuit implementing a two-outcome measurement that discriminates between ψ and the maximally mixed state (resp. between $\Lambda_U$ and the depolarizing channel $\Lambda_{\mathrm{dep}}$) with success probability $p_{\mathrm{succ}} = \frac{1}{2} + \Delta$. Our results imply that if the requirement of a two-outcome measurement is relaxed, then measurements realizable with circuit depths r = poly(N) can succeed in these discrimination tasks with a constant bias $\Delta^*$ for all states ψ and unitary channels $\Lambda_U$. This renders the so-defined notion of complexity trivial: all states and unitaries will have complexity $C_\Delta \le \mathrm{poly}(N)$, unless the bias Δ satisfies $\Delta > \Delta^*$.

Figure 2 (caption, partially recovered): [...] (3), (4)). In the case of worst-case distance, "lb" indicates a lower bound. Average-case quantum distances were calculated explicitly. Mean TVDs were calculated between (exact numerical) probability distributions over 1000 random instances of random unitaries.
We note that large average-case distance d av implies only information-theoretic distinguishability of quantum objects.The cost of classical postprocessing needed to distinguish the probability distributions resulting from randomized protocols can be very large since they operate on exponentially large sample space.
Numerical results. Here we present the results of numerical studies of small-size quantum systems. We compare scaling with the system size for the worst-case distance, the average-case distance, and a mean TVD taken over an ensemble of random unitaries. The mean Total-Variation distance is calculated numerically over two types of ensembles of unitaries with the structure of variational circuits. One ensemble has a QAOA-like structure, while the other is a standard hardware-efficient VQE ansatz [47], both initialized with random parameters (see SM for the exact form). Based on recent results [41], we expect them to form (approximate) unitary 4-designs.
We consider the following scenarios.

1. (States) We compare a randomly chosen Pauli eigenstate affected by random local Pauli noise with its ideal version (Fig. 2a) and with the maximally mixed state $\frac{I}{d}$ (Fig. 2b). This is the scenario considered in Example 1. The error probabilities are chosen randomly from the range [0.001, 0.01].

2. (Measurements) The noisy measurement is a tensor product POVM constructed from single-qubit measurements obtained via Quantum Detector Tomography [35] of IBM's 15-qubit Melbourne device. We compare it to the ideal computational-basis measurement (Fig. 2c). Since the measurement noise in superconducting devices is usually highly asymmetric [36], we do not expect it to converge to the uniform distribution.

3. (Channels) We compare the channel corresponding to random tensor product 1-qubit rotations around a random axis with the ideal identity channel $\mathcal{I}$ (Fig. 2d). Explicitly, the unitary corresponding to the channel has the form [...], where $V^{(k)}$ is chosen randomly to be an X, Y, or Z gate, and $\gamma_k \in [0.025\pi, 0.0313\pi]$. Similarly to POVMs, we do not expect coherent errors to bring noisy distributions close to the uniform distribution.
In each case, the number of circuit layers is ⌊1.5N⌋. In Fig. 2 we collectively present the results of all simulations. Recall that both ensembles presented in Fig. 2 consist of variational QAOA and VQE circuits with random parameters. From the plots, it is clear that in all studied cases for those ensembles, the average-case quantum distance is both significantly closer and more similar in scaling to the mean Total Variation distance between the distributions in question, as compared to the worst-case distance.
[...] diamond norm distance $d_\diamond$ [43] between quantum channels [...] For the case of states, the maximization is over POVMs M used to distinguish them. We have a dual situation for measurements: the maximization is over input quantum states used to differentiate between one POVM and another. Finally, for the case of quantum channels and the diamond norm, the maximization is over both input states (on a possibly extended system) and over POVMs applied after a channel is implemented.

B Simplified proofs of main Theorems
Here we present simplified versions of the proofs of Theorems 1, 2, and 3 from the main text. We refer the Reader to [37] for detailed calculations. Since in the main text we omitted the dependence on δ in δ-approximate unitary designs, we consider here proofs only for exact (not approximate) unitary designs. The functional dependence for approximate designs, as well as proofs for approximate designs, can be found in [37].

B.1 Lower and upper bounds on absolute values
In the scenarios we consider, we aim to find bounds on a random variable that is a Total-Variation distance (TVD) between two probability distributions. Note that since the expectation value is linear, it suffices to focus attention on a single outcome probability, and then add the resulting bounds to obtain bounds on the TVD. Let us thus denote by $X_i = p_i - q_i$ the difference of probabilities of measurement outcome i taken from probability distributions p and q that correspond to two quantum-mechanical protocols. This is a shorthand notation: the protocols are described in the main text and correspond to discrimination between two states, measurements, or general channels. Conveniently, it turns out that for the considered scenarios and probability measures (Haar measure and unitary designs), one can find real parameters a such that the following holds.

Lemma 1. (Lower bound on absolute value) $\mathbb{E}|X_i| \ge a\sqrt{\mathbb{E}X_i^2}$,
where the value of a depends on whether we discriminate between states, measurements, or channels.
Proof. From Lemmas 4 and 5 in [37] it follows that one can find constants a such that $\mathbb{E}X_i^4 \le a^{-2}\left(\mathbb{E}X_i^2\right)^2$ (16). We note that Lemma 4 from [37] is Lemma 2 from [34], while Lemma 5 is one of the results of the accompanying technical manuscript [37]. Recall that Berger's inequality [7] states that for a random variable Y with well-defined 2nd and 4th moments, we have $\mathbb{E}|Y| \ge \frac{(\mathbb{E}Y^2)^{3/2}}{(\mathbb{E}Y^4)^{1/2}}$. Then the proof follows from combining Eq. (16) with Berger's inequality.
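At the same time, the following holds for any random variable Y.

Lemma 2. (Upper bound on absolute value) $\mathbb{E}|Y| \le \sqrt{\mathbb{E}Y^2}$.
Proof. The above is a special case of Jensen's inequality [30], which states that for a concave function f we have $\mathbb{E}f(Z) \le f(\mathbb{E}Z)$; here we take $f(z) = \sqrt{z}$ and $Z = Y^2$.

From the above one can see that to obtain both lower and upper bounds on the TVD it suffices to calculate the 2nd moment of $|X_i|$. To do so, the following Lemma will be useful.

Lemma 3 (Ancillary integral for 2nd moment). Let A be a Hermitian operator on $\mathcal{H}_d$ and µ be the Haar measure. Then we have $\int d\mu(U)\, \operatorname{tr}\left(|i\rangle\langle i| U A U^\dagger\right)^2 = \frac{\operatorname{tr}(A)^2 + \operatorname{tr}(A^2)}{d(d+1)}$.
Proof. We first write a simple manipulation: $\operatorname{tr}\left(|i\rangle\langle i| U A U^\dagger\right)^2 = \operatorname{tr}\left(|i\rangle\langle i|^{\otimes 2}\, U^{\otimes 2} A^{\otimes 2} (U^\dagger)^{\otimes 2}\right)$. This allows us to evaluate the RHS using standard techniques of Haar measure integration (see, e.g., [...]).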
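The second-moment integral behind Lemma 3 is easy to verify by Monte Carlo. Because the displayed formula is not legible in this copy, the sketch assumes the standard Haar identity E_ψ ⟨ψ|A|ψ⟩² = (tr(A)² + tr(A²)) / (d(d+1)), where ψ = U†|i⟩ is a Haar-random pure state. The function names are illustrative.

```python
import numpy as np

def haar_states(d, n_samples, rng):
    """Haar-random pure states: normalized complex Gaussian vectors (one per row)."""
    g = rng.normal(size=(n_samples, d)) + 1j * rng.normal(size=(n_samples, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def second_moment_monte_carlo(a, n_samples, rng):
    """Sample mean of <psi|A|psi>^2 over Haar-random psi."""
    psi = haar_states(a.shape[0], n_samples, rng)
    expvals = np.einsum("ni,ij,nj->n", psi.conj(), a, psi).real
    return float(np.mean(expvals**2))

def second_moment_exact(a):
    """Assumed closed form: (tr(A)^2 + tr(A^2)) / (d (d + 1))."""
    d = a.shape[0]
    return (np.trace(a).real ** 2 + np.trace(a @ a).real) / (d * (d + 1))

rng = np.random.default_rng(3)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
a = (g + g.conj().T) / 2  # random Hermitian operator
print(second_moment_monte_carlo(a, 50_000, rng), second_moment_exact(a))
```

For a traceless A = ρ − σ the first term vanishes, which is why only the Hilbert-Schmidt norm survives in the distance between states.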

B.2 Proofs of Theorems 1 and 2
For states and measurements, the proofs are essentially identical, thus we consider them together. As stated above, obtaining both bounds reduces to calculating second moments of $|X_i|$, which we will now outline. Consider discrimination of states ρ and σ. We calculate the second moment by applying Lemma 3 to the operator Δ = ρ − σ (for which tr Δ = 0), and obtain $\mathbb{E}X_i^2 = \frac{\|\rho-\sigma\|_{\mathrm{HS}}^2}{d(d+1)}$. Note that the RHS does not depend on the index i. The proof concludes by taking a square root of the RHS and summing over i. Consider discrimination of measurements M and N. In analogy to states, we calculate the 2nd moment by applying Lemma 3 to the operator $\Delta_i = M_i - N_i$, and obtain $\mathbb{E}X_i^2 = \frac{\operatorname{tr}(\Delta_i)^2 + \|\Delta_i\|_{\mathrm{HS}}^2}{d(d+1)}$.

B.3 Proof of Theorem 3
In the case of states and measurements, there was only a single average (over projective measurements for states and over pure states for measurements).However, for quantum channels we have both quantum inputs and outputs, thus we need to calculate two averages.Consider discrimination between two channels Λ and Γ .Denote ∆ = Λ − Γ .
To proceed, we first apply Theorem 1 to perform the averaging over projective measurements after the application of the channel (or, equivalently, the averaging over unitaries acting on the output of the channels, followed by a fixed measurement in the standard basis). In this way, we remove one integral and reduce the problem to finding bounds on the expected value of E The last term in the above can then be evaluated using standard techniques of Haar measure integration (see, e.g., [24, Prop. 6], and recall the proof of Lemma 3). The computation yields

$\sigma_j \rho \sigma_j$ with $j \in \{1, x, y, z\}$, $\sigma_1 = I$, and $p_{r_i}$, i.e., the probability of applying on qubit $i$ a gate that stabilizes the state of that qubit (namely, either the identity or the Pauli matrix of which $|{\pm r_i}\rangle$ is an eigenstate). Furthermore, assume that for each qubit $i$ we have $q^{(i)} \geq \frac{1}{2}$. Then we have

We start by defining the function $f^{(i)} = q^{(i)}(1 - q^{(i)})$, as well as the average noise properties $q_{\mathrm{av}} = \frac{1}{N}\sum_{i=1}^{N} q^{(i)}$ and $f_{\mathrm{av}} = \frac{1}{N}\sum_{i=1}^{N} f^{(i)}$. We then bound Eq. (24) from above as and continue with bounding the (positive) expression inside the square root as where in the first inequality we used the inequality between the geometric and arithmetic means together with the fact that $x^N \geq y^N$ for $x > y > 0$. In the second inequality we used that for $0 \leq x \leq 1$ and $N \geq 1$ we have $(1 - x)^N \leq \exp(-xN)$. Note that each term $2f^{(i)}$ lies in the interval $[0, \frac{1}{2}]$. Combining everything we obtain which concludes the proof of the first bound.

To bound Eq. (25) from below, we start by again employing the inequality between the geometric and arithmetic means, namely which, after combining with Eq. (25), yields The above bound is valid provided that the argument is still contained in the domain of the square root, i.e., we need to impose Note that $\sqrt[N]{\frac{1}{2}} \to 1$ as $N \to \infty$, and since $q_{\mathrm{av}}$ is by definition lower than 1, the bound becomes less restrictive for larger system sizes. For small systems it is valid only for high noise (small $q_{\mathrm{av}}$), but in such cases one can simply use the exact expressions from Eqs. (24) and (25).
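The elementary inequalities invoked in this proof can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `check_bounds` is hypothetical, and the particular products tested (of $q^{(i)}$ and of $1 - 2f^{(i)}$) are illustrative choices, since the exact expressions of Eqs. (24) and (25) are not reproduced here.

```python
import numpy as np

def check_bounds(q, tol=1e-12):
    """Check the elementary inequalities for success probabilities q^(i) in [1/2, 1].

    The products tested here are illustrative assumptions, not Eqs. (24)-(25).
    """
    q = np.asarray(q, dtype=float)
    n = q.size
    f = q * (1.0 - q)                 # f^(i) = q^(i) (1 - q^(i))
    q_av, f_av = q.mean(), f.mean()   # averages over qubits

    # Geometric mean <= arithmetic mean, written for products of N terms.
    amgm_q = np.prod(q) <= q_av**n + tol
    amgm_f = np.prod(1.0 - 2.0 * f) <= (1.0 - 2.0 * f_av) ** n + tol

    # (1 - x)^N <= exp(-x N) for 0 <= x <= 1, here with x = 2 f_av.
    exp_bound = (1.0 - 2.0 * f_av) ** n <= np.exp(-2.0 * f_av * n) + tol

    # Each term 2 f^(i) lies in [0, 1/2].
    in_range = bool(np.all((2.0 * f >= 0.0) & (2.0 * f <= 0.5 + tol)))
    return bool(amgm_q), bool(amgm_f), bool(exp_bound), in_range

rng = np.random.default_rng(0)
results = check_bounds(rng.uniform(0.5, 1.0, size=8))

# The validity threshold (1/2)^(1/N) approaches 1 as N grows.
thresholds = [0.5 ** (1.0 / n) for n in (1, 2, 10, 100)]
```

The list `thresholds` illustrates why the condition on $q_{\mathrm{av}}$ becomes less restrictive for larger systems: it starts at $1/2$ for $N = 1$ and increases toward 1.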
Exactly the same reasoning applies to Examples 2 and 4, for which all expressions have almost the same functional form (see [37]). We now consider the bound from Example 3 of the main text, for which the first part of the proof is slightly more involved because a more general noise model is considered.
Example 9 (Example 3 from the main text). Let $\mathbf{P} = (|x\rangle\langle x|)_{x \in \{0,1\}^N}$ be the computational basis measurement on an $N$-qubit system. Let $\mathbf{M} = (M_x)_{x \in \{0,1\}^N}$ be a POVM specified by effects $M_x = \bigotimes_{i=1}^{N} \Lambda_i^{\dagger}(|x_i\rangle\langle x_i|)$, where $\Lambda_i$ are quantum channels affecting the $i$th qubit, and $\Lambda_i^{\dagger}$ is the conjugate of $\Lambda_i$. Define the classical success probability as $p^{(i)}(k|k) = \mathrm{tr}\big(\Lambda_i^{\dagger}(|k\rangle\langle k|)\,|k\rangle\langle k|\big)$ for $k \in \{0, 1\}$, and the corresponding average $q^{(i)}_{\mathrm{av}} = \frac{p^{(i)}(0|0) + p^{(i)}(1|1)}{2}$. Let $q_{\mathrm{av}} := \frac{1}{N}\sum_{i=1}^{N} q^{(i)}_{\mathrm{av}}$. Assume $q^{(i)}_{\mathrm{av}} \geq \frac{1}{2}$ for each qubit $i$ and that $q_{\mathrm{av}} \leq \sqrt[N]{\frac{1}{2}}$. Then we have $d^{\mathrm{m}}_{\mathrm{av}}(\mathbf{M}, \mathbf{P}) >$

To prove the above, one first applies the maximally dephasing channel to both measurements and uses the data-processing inequality for the average-case distance to bound the distance from below by that of the diagonal part of the POVM $\mathbf{M}$. Specifically, define the dephased POVM $\Phi_{\mathrm{dep}}(\mathbf{M})$ via its effects $\Phi_{\mathrm{dep}}(\mathbf{M})_i = \Phi_{\mathrm{dep}}(M_i)$, where the maximally dephasing channel acts on any operator $A$ as $\Phi_{\mathrm{dep}}(A) = \mathrm{diag}(A)$, with $\mathrm{diag}(A)$ denoting the diagonal part of $A$. Note that for the computational basis measurement $\mathbf{P}$ we have $\Phi_{\mathrm{dep}}(\mathbf{P}) = \mathbf{P}$. Thus we have $d^{\mathrm{m}}_{\mathrm{av}}(\mathbf{M}, \mathbf{P}) \geq d^{\mathrm{m}}_{\mathrm{av}}(\Phi_{\mathrm{dep}}(\mathbf{M}), \mathbf{P})$. This allows us to treat the noise as classical and to look only at assignment infidelities for classical states (i.e., error probabilities when the measured states are computational-basis states).

Importantly, the maximally dephasing channel does not change the product structure of $\mathbf{M}$. We can therefore treat the dephased POVM $\Phi_{\mathrm{dep}}(\mathbf{M})$ as related to the computational basis measurement via a tensor product stochastic map $T = \bigotimes_{i=1}^{N} T^{(i)}$, where $T^{(i)}$ acts on the $i$th qubit and is specified by the two success probabilities $p^{(i)}(0|0)$ and $p^{(i)}(1|1)$ (see, for example, Ref. [38] for more details on stochastic readout noise). Thus we have $d^{\mathrm{m}}_{\mathrm{av}}(\mathbf{M}, \mathbf{P}) \geq d^{\mathrm{m}}_{\mathrm{av}}(T\mathbf{P}, \mathbf{P})$, (34) where $T\mathbf{P}$ is a POVM with $i$th effect given by $(T\mathbf{P})_i = \sum_j T_{ij} |j\rangle\langle j|$, and the stochastic map $T$ is defined via the diagonal elements of the original POVM $\mathbf{M}$ (as in the discussion above). Now one applies Lemma 28 from the technical version of the work [37], which lower bounds the distance via a symmetrized version of $T$, where now both error probabilities are the same and equal to $q^{(i)}_{\mathrm{av}} = \frac{p^{(i)}(0|0) + p^{(i)}(1|1)}{2}$ (note that this is equivalent to Pauli bitflip channel applied with probability q Therefore we have reduced the lower bound to the scenario considered in Example 2 from the main text, for which the bound was proved above.
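The reduction steps of this proof (dephasing, reading off the stochastic map, and symmetrizing) can be sketched numerically for a single qubit. This is an illustrative sketch only: the helper names `dephase`, `stochastic_map`, and `symmetrize` and the noisy effect `M0` are hypothetical and not taken from the paper.

```python
from functools import reduce

import numpy as np

def dephase(op):
    """Maximally dephasing channel: keep only the diagonal part of `op`."""
    return np.diag(np.diag(op))

def stochastic_map(effects):
    """T[i, j] = <j| M_i |j>, the probability of outcome i given basis state j."""
    return np.real(np.array([np.diag(m) for m in effects]))

def symmetrize(t):
    """Replace both success probabilities of a 2x2 map by their average q_av."""
    q_av = 0.5 * (t[0, 0] + t[1, 1])
    t_sym = np.array([[q_av, 1.0 - q_av], [1.0 - q_av, q_av]])
    return t_sym, q_av

# Hypothetical single-qubit noisy POVM with coherent (off-diagonal) errors.
M0 = np.array([[0.95, 0.02], [0.02, 0.10]])
M1 = np.eye(2) - M0

dephased = [dephase(M0), dephase(M1)]   # effects of the dephased POVM
T = stochastic_map([M0, M1])            # classical readout-noise matrix
T_sym, q_av = symmetrize(T)             # symmetrized map with q_av
T_two = reduce(np.kron, [T, T])         # product map on two qubits
```

In this sketch the columns of `T` index the prepared computational-basis state and the rows index the outcome, so every column sums to one, and the tensor product `T_two` inherits the same property.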

Figure 1: Measures of the distance between quantum objects based on average statistical distinguishability. For quantum states (a), we take the average over random unitaries applied to the state, followed by measurement in the standard basis. For quantum measurements (b), we take the average over random pure states measured on the detector. Finally, for quantum channels (c), we take the average over independent random unitaries applied before and after the application of the channel.

Application 1: Noise in quantum advantage experiments.
Figure 2: Results of numerical studies comparing the worst-case distance, the average-case quantum distance, and the numerically calculated mean TVD. Panels: (a) quantum states, distance to ideal distribution; (b) quantum states, distance to uniform distribution; (c) quantum measurements, distance to ideal distribution; (d) quantum channels, distance to ideal distribution. Plots 2a, 2c, and 2d correspond to the distance to the ideal (noiseless) distribution. For states, we additionally plot the distance to the uniform (trivial) distribution in plot 2b. For the average-case distance, we also plot the value corresponding to the lower bound on the average-case TVD (following from Eqs. (2), (3), (4)). For the worst-case distance, "lb" indicates a lower bound. Average-case quantum distances were calculated explicitly. Mean TVDs were calculated between (exact numerical) probability distributions, averaged over 1000 instances of random unitaries.
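The mean-TVD estimate described in this caption can be sketched for quantum states as follows. This is a minimal sketch under stated assumptions, not the authors' simulation code: it uses Haar-random unitaries as a stand-in for the ensembles used in the paper, all function names are hypothetical, and `half_hs_norm` (half the Hilbert-Schmidt norm of the difference) is included only as an example of a degree-2 quantity to compare against; the precise definition of the AC distance is not given in this section.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale column k by phases[k] to fix the QR phase ambiguity

def output_probs(rho, u):
    """Standard-basis outcome probabilities of U rho U^dagger."""
    return np.real(np.einsum("ij,jk,ik->i", u, rho, u.conj()))

def tvd(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * np.abs(p - q).sum()

def mean_tvd(rho, sigma, samples=1000, seed=0):
    """Monte Carlo average of the TVD over random unitaries."""
    rng = np.random.default_rng(seed)
    d = rho.shape[0]
    values = []
    for _ in range(samples):
        u = haar_unitary(d, rng)
        values.append(tvd(output_probs(rho, u), output_probs(sigma, u)))
    return float(np.mean(values))

def half_hs_norm(rho, sigma):
    """Half the Hilbert-Schmidt norm of rho - sigma (assumed comparison quantity)."""
    delta = rho - sigma
    return 0.5 * float(np.sqrt(np.real(np.trace(delta @ delta))))

# Hypothetical example: a pure state versus its depolarized version.
d = 8
rho = np.zeros((d, d), dtype=complex)
rho[0, 0] = 1.0
sigma = 0.8 * rho + 0.2 * np.eye(d) / d
estimate = mean_tvd(rho, sigma, samples=200)
reference = half_hs_norm(rho, sigma)
```

Here the output distributions are computed exactly for each sampled unitary, so the only statistical error in `estimate` comes from the finite number of sampled unitaries.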

Figure 3: Results of numerical studies comparing the worst-case distance, the average-case quantum distance, and the numerically calculated mean TVD. Panels: (a) quantum states, distance to ideal distribution; (b) quantum states, distance to uniform distribution; (c) quantum measurements, distance to ideal distribution; (d) quantum channels, distance to ideal distribution. The plot is exactly the same as Fig. 2 in the main text, but with an additional ensemble of unitaries considered (see text description).

Figure 4: Histograms of TVDs obtained for the random ensembles considered in the numerical simulations corresponding to Fig. 2 in the main text. Panels: (a) quantum states, distance to ideal distribution; (b) quantum measurements, distance to ideal distribution; (c) quantum channels, distance to ideal distribution. Different shades of a given color (blue or green) correspond to different system sizes for a given ensemble (QAOA or VQE). Bounds from average-case distances are indicated by dashed lines and, for each dimension, are the same for both ensembles (they depend only on the quantum objects in question, not on the choice of random ensemble).
Notation and basic concepts. Our results concern quantum systems on a finite-dimensional Hilbert space $\mathcal{H}_d \approx \mathbb{C}^d$. General quantum measurements, also known as POVMs, are described by tuples $\mathbf{M} = (M_i)_{i=1}^{n}$ of operators on $\mathcal{H}_d$ which satisfy $M_i \geq 0$ and $\sum_{i=1}^{n} M_i = I_d$, where $I_d$ is the identity on $\mathcal{H}_d$. A general quantum operation on $\mathcal{H}_d$ is described by a quantum channel, i.e., a completely positive trace-preserving map $\Lambda : \mathrm{Herm}(\mathcal{H}_d) \to \mathrm{Herm}(\mathcal{H}_d)$. We use the notation $\tau_d = I_d/d$ to denote the maximally mixed state on $\mathcal{H}_d$.
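The defining conditions above translate directly into numerical checks. The sketch below is illustrative only: the function names are hypothetical, and the channel is assumed to be given by Kraus operators (which guarantees complete positivity, so only trace preservation is tested).

```python
import numpy as np

def is_povm(effects, tol=1e-9):
    """Check M_i >= 0 for all i and sum_i M_i = I_d."""
    d = effects[0].shape[0]
    positive = all(np.linalg.eigvalsh(m).min() >= -tol for m in effects)
    complete = np.allclose(sum(effects), np.eye(d), atol=tol)
    return positive and complete

def is_trace_preserving(kraus, tol=1e-9):
    """Check sum_k K_k^dagger K_k = I_d for a channel given by Kraus operators."""
    d = kraus[0].shape[1]
    total = sum(k.conj().T @ k for k in kraus)
    return np.allclose(total, np.eye(d), atol=tol)

def maximally_mixed(d):
    """The maximally mixed state tau_d = I_d / d."""
    return np.eye(d) / d

# Computational-basis measurement on one qubit.
basis_povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# Hypothetical example channel: amplitude damping with damping parameter 0.1.
gamma = 0.1
kraus_ad = [
    np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]]),
    np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]]),
]
```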
the assumption becomes less restrictive for higher-dimensional systems and the presented bounds are intended for use in such cases.
Example 1 (Pauli eigenstates and tensor product Pauli noise). Consider the state $\psi_{\mathrm{pauli}} = \bigotimes_{i=1}^{N} |{\pm r_i}\rangle\langle{\pm r_i}|$, where $r_i \in \{x, y, z\}$, i.e., $|{\pm r_i}\rangle$ is any Pauli eigenstate on qubit $i$ (with eigenvalue $+1$ or $-1$). Consider tensor product Pauli channel

As mentioned in the main text, Examples 1-5 follow directly from more general expressions in the examples of the technical manuscript [37]. Specifically, Example 1 follows from Example 9, Examples 2 and 3 follow from Example 10 (in the case of Example 3 the arguments are slightly more involved, as presented below), while Examples 4 and 5 follow from Example 14. We now recall the statement of Example 9 for the reader's convenience. Consider the state $\psi_{\mathrm{pauli}} = \bigotimes_{i=1}^{N} |{\pm r_i}\rangle\langle{\pm r_i}|$, where $r_i \in \{x, y, z\}$, i.e., $|{\pm r_i}\rangle$ is any Pauli eigenstate on qubit $i$ (with eigenvalue $+1$ or $-1$). Consider tensor product Pauli channel