From estimation of quantum probabilities to simulation of quantum circuits

Investigating the classical simulability of quantum circuits provides a promising avenue towards understanding the computational power of quantum systems. Whether a class of quantum circuits can be efficiently simulated with a probabilistic classical computer, or is provably hard to simulate, depends quite critically on the precise notion of "classical simulation" and in particular on the required accuracy. We argue that a notion of classical simulation, which we call epsilon-simulation, captures the essence of possessing "equivalent computational power" as the quantum system it simulates: it is statistically impossible to distinguish an agent with access to an epsilon-simulator from one possessing the simulated quantum system. We relate epsilon-simulation to various alternative notions of simulation, predominantly focusing on a simulator we call a poly-box. A poly-box outputs 1/poly precision additive estimates of Born probabilities and marginals. This notion of simulation has gained prominence through a number of recent simulability results. Accepting some plausible complexity-theoretic assumptions, we show that epsilon-simulation is strictly stronger than a poly-box by showing that IQP circuits and unconditioned magic-state injected Clifford circuits are both hard to epsilon-simulate and yet admit a poly-box. In contrast, we also show that these two notions are equivalent under an additional assumption on the sparsity of the output distribution (poly-sparsity).


Introduction and summary of main results
Which quantum processes can be efficiently simulated using classical resources is a fundamental and longstanding problem [1,2,3,4,5,6]. Research in this area can be split into two broad classes: results showing the hardness of efficient classical simulation for certain quantum processes, and the development of efficient classical algorithms for simulating other quantum processes. Recently, there has been substantial activity on both sides of this subject. Works on boson sampling [7], instantaneous quantum polynomial (IQP) circuits [8,9], various translationally invariant spin models [10,11], quantum Fourier sampling [12], one clean qubit (also known as DQC1) circuits [13,14], chaotic quantum circuits [15] and conjugated Clifford circuits [16] have focused on showing the difficulty of classically simulating these quantum circuits. On the other hand, there has been substantial recent progress in classically simulating various elements of quantum systems including matchgate circuits with generalized inputs and measurements [17] (see also [3,4,18] for earlier works in this direction), circuits with positive quasi-probabilistic representations [19,20,21], stabilizer circuits supplemented with a small number of T gates [22], stabilizer circuits with small coherent local errors [23], noisy IQP circuits [24], noisy boson sampling circuits [25], low negativity magic state injection in the fault tolerant circuit model [26], quantum circuits with polynomial bounded negativity [27], Abelian-group normalizer circuits [28,29] and certain circuits with computationally tractable states and sparse output distributions [30]. In addition, there has been some work on using small quantum systems to simulate larger quantum systems [31] as well as using noisy quantum systems to simulate ideal ones [32].
An important motivation for showing efficient classical simulability, or hardness thereof, for a given (possibly non-universal) quantum computer is understanding what properties of a quantum computer give rise to super-classical computational power. In this context, we desire classical simulability to imply that the computational power of the target quantum computer is "contained in classical", and the hardness of classical simulability to imply that the target computational device can achieve at least some computational task beyond classical. Achieving these desiderata hinges crucially on the strength of the notion of simulation that is employed. As an extreme example, if one uses a notion of simulation that is too weak, then efficient classical "simulation" of universal quantum circuits may be possible (even if BQP ⊄ BPP). In such a case, the existence of a "simulator" does not imply that the computational power of the simulated system is contained within classical. As an opposite extreme, if one uses a notion of simulation that is too strong, then efficient classical "simulation" of even classical circuits may be impossible [33]. In this case, the nonexistence of such a simulator does not imply that the computational power of the "un-simulable" system is outside of classical. Once we establish a notion of simulation that is neither "too strong" nor "too weak", it will become evident that both too strong and too weak notions of simulation have been commonly used in the literature. To this end, we require a clear mathematical statement about which notion of simulation minimally preserves the computational power of the system it simulates.
From a computer science perspective, the computational power of a device can be characterized by the set of problems such a device can solve. However, when it comes to quantum devices that produce probabilistic output from an exponentially growing space, even the question of what problems these devices solve or what constitutes a solution is subtle. Given an efficient description of a quantum circuit, the exact task performed by a quantum computer is to output a sample from the probability distribution associated with the measurement outcomes of that quantum circuit. This suggests that for ideal quantum computers, sampling from the exact quantum distribution is what constitutes a solution. On the other hand, it is unclear what well justified necessary requirement fails to be met by an arbitrarily small departure from exact sampling. Perhaps due to these subtleties, the choice of notion of "classical simulation" for sampling problems lacks consensus and, under the umbrella term of weak simulation, a number of different definitions have been used in the literature. We will argue that some of these notions are too strong to be minimal and others are too weak to capture computational power. The cornerstone of this argument will be the concept of efficient indistinguishability: the ability of one agent to remain indistinguishable from another agent under the scrutiny of any interactive test performed by a computationally powerful referee, whilst simultaneously employing resources that are polynomially equivalent.
Examples of definitions that we argue are too strong include simulators required to sample exactly from the target distribution, or to sample from a distribution that is exponentially close (in L1-norm) to the target distribution [34]. These also include a notion of simulation based on approximate sampling where the accuracy requirement is the very strong condition that every outcome probability is within a small relative error of the target probability [35,8,14]. From the perspective of efficient indistinguishability, these notions of simulation are not minimal since they rule out weaker notions of simulation that are nonetheless efficiently indistinguishable from the target quantum system.
An example of a notion of approximate weak simulation that we argue is too weak requires that the classical algorithm sample from a distribution that is within some small fixed constant L1-norm of the target distribution [24,25,9,10,11,36,16]. We argue that such a notion does not capture the full computational power of the target, since it cannot perform a task that can be performed by the target device, namely passing some sufficiently powerful distinguishability test.
The focus of this paper will be on a notion of approximate weak simulation we call efficient polynomially small in L1-norm (epsilon) simulation (or ε-simulation for short). This has been used in prior works including Refs. [7,37,12,22]. We will advocate for this notion of simulation (over other definitions of weak simulation) by showing that an ε-simulator of a quantum computer achieves efficient indistinguishability, and that any simulator that achieves efficient indistinguishability satisfies the definition of an ε-simulator. Thus ε-simulation minimally captures computational power. The notion of ε-simulation is also closely related to the definition of a sampling problem from Ref. [37] (where the definition includes an exact statement of what constitutes a solution to the sampling problem). In this language, an ε-simulator of a family of quantum circuits can be exactly defined as an efficient classical algorithm which can solve all sampling problems defined by the family of quantum circuits in the natural way. Thus, our result shows that a device can solve all sampling problems defined by a quantum computer if and only if the device is efficiently indistinguishable from the quantum computer.
The conceptual significance of ε-simulation as a notion that minimally captures computational power motivates the study of its relation to other notions of simulation. This is particularly important for translating the existing results on simulability and hardness into statements about computational power relative to classical. Such a comparison to the above-mentioned approximate weak simulators is clear, but a comparison to simulators defined in terms of Born probability estimation can be significantly more involved. Simulators which output sufficiently accurate Born probability estimates can be called as subroutines in an efficient classical procedure in order to output samples from a desired target distribution. Such a procedure can be used to "lift" these simulators to an ε-simulator, implying that the computational power of all families of quantum circuits simulable in this way is contained within classical.
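The chain-rule idea behind such a lifting procedure can be sketched as follows. This is a minimal illustration of ours, not the procedure of the cited references: it assumes an oracle returning exact marginal Born probabilities, whereas a real estimator only returns approximations, and the function names are hypothetical.

```python
import random

def lift_to_sampler(marginal, n, rng=random):
    """Sample an n-bit string using only marginal-probability queries.

    `marginal(prefix)` returns P(X_1..X_k = prefix), the probability that
    the first k measured bits equal `prefix`.  Sampling proceeds bit by bit
    via the chain rule: P(x_k | x_1..x_{k-1}) = P(prefix + [x_k]) / P(prefix).
    """
    prefix = []
    p_prefix = 1.0                           # P(current prefix)
    for _ in range(n):
        p0 = marginal(prefix + [0])          # P(prefix followed by 0)
        cond0 = p0 / p_prefix if p_prefix > 0 else 0.5
        bit = 0 if rng.random() < cond0 else 1
        prefix.append(bit)
        p_prefix = p0 if bit == 0 else p_prefix - p0
    return prefix
```

With an exact marginal oracle this samples the target distribution exactly; the technical content of the lifting results lies in controlling how oracle errors accumulate across the n queries.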
Some commonly used notions of simulation, such as strong simulation and multiplicative precision simulation, require the ability to estimate Born probabilities extremely accurately. These simulators can be lifted to ε-simulators [3,4,35]. We focus on another notion of simulation that has been prominent in recent literature [27,26,23], which we call a poly-box. Compared to strong or multiplicative precision simulators, a poly-box has a much less stringent requirement on the accuracy of the Born probability estimates that it produces. We discuss the significant conceptual importance of poly-boxes owing to the fact that they capture computational power with respect to decision problems while simultaneously being weak enough to be admitted by IQP circuits, unconditioned magic-state injected Clifford circuits, and possibly other intermediate models for quantum computation.
Assuming some complexity theoretic conjectures, we show that a poly-box is a strictly weaker notion of simulation than ε-simulation. However, if we impose a particular sparsity restriction on the target family of quantum circuits, then we show that a poly-box can be lifted to an ε-simulator, implying that the two notions are, up to efficient classical computation, equivalent under this sparsity restriction.

Indistinguishability and ε-simulation.
In Sec. 2, we motivate the use of a particular notion of efficient simulation, which we call ε-simulation. Essentially, we say that an algorithm can ε-simulate a family of quantum circuits if, for any ε > 0, it can sample from a distribution that is ε-close in L1-norm to the true output distribution of the circuit, and if the algorithm runs in time polynomial in 1/ε and in the number of qubits. We provide an operational meaning for this notion by showing that "possessing an ε-simulator" for a family of circuits is equivalent to demanding that even a computationally omnipotent referee cannot distinguish the simulator's outputs from those of the target circuit family. Further, any simulator that satisfies efficient indistinguishability also satisfies the definition of an ε-simulator. This is captured by Theorem 1, presented in Sec. 2.

A family of binary outcome quantum circuits, where each circuit is indexed by a bit-string, defines a decision problem as follows: given a bit-string indexing a quantum circuit, decide which of the circuit's two possible outcomes is more likely.
Here, the only quantity relevant to the computation is the probability associated with the binary measurement outcome (decision). Hence, in this setting, simulation can be defined in terms of the accuracy to which this probability can be estimated. A commonly used notion of simulation known as strong simulation requires the ability to estimate Born probabilities extremely accurately. In Sec. 3, we will define a much weaker notion of simulation (a poly-box), which is a device that computes an additive polynomial precision estimate of the quantum probability (or marginal probability) associated with a specific outcome of a quantum circuit.
We show that families of quantum circuits must admit a poly-box in order to be ε-simulable.

Theorem 2.
If C is a family of quantum circuits that does not admit a poly-box algorithm, then C is not ε-simulable.
We advocate the importance of this notion on the grounds that whether or not a given family of quantum circuits admits a poly-box informs our knowledge of the computational power of that family relative to classical. In particular:
• if a (possibly non-universal) quantum computer can be efficiently classically simulated in the sense of a poly-box, then such a quantum computer cannot solve decision problems outside of classical;
• if a (possibly non-universal) quantum computer cannot be efficiently classically simulated in the sense of a poly-box, then such a quantum computer can solve a sampling problem outside of classical (Thm. 2).
We give three examples of poly-boxes. The first one is an estimator based on Monte Carlo sampling techniques applied to a quasiprobability representation. This follows the work of Ref. [27], where it was found that the efficiency of this estimator depends on the amount of "negativity" in the quasiprobability description of the quantum circuit. As a second example, we consider the family of circuits C_PROD, for which the n-qubit input state ρ is an arbitrary product state (with potentially exponential negativity), transformations consist of Clifford unitary gates, and measurements are of k ≤ n qubits in the computational basis. We present an explicit poly-box for C_PROD in Sec. 3. As a third example, we also outline a construction of a poly-box for instantaneous quantum polynomial-time (IQP) circuits C_IQP based on the work of Ref. [39].
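A minimal sketch (ours, not the construction of Ref. [27]) of the Monte Carlo idea underlying the first example: sample from the normalized absolute value of a quasiprobability vector and reweight by its sign, so that O((M/ε)²) samples give an additive-ε estimate, where M = Σ|W| quantifies the negativity. The cost is therefore polynomial whenever the negativity is polynomially bounded.

```python
import random

def quasi_mc_estimate(W, f, n_samples, rng):
    """Additive-precision Monte Carlo estimate of p = sum_l W[l] * f[l].

    W : quasiprobability weights (may be negative, sum to 1)
    f : response values in [0, 1] (e.g. an indicator of the outcome)
    Samples l with probability |W[l]| / M, where M = sum |W|, and averages
    M * sign(W[l]) * f[l].  Each term lies in [-M, M], so by Hoeffding's
    inequality O((M / eps)**2) samples give additive error eps w.h.p.
    """
    M = sum(abs(w) for w in W)
    cum, c = [], 0.0
    for w in W:                      # cumulative distribution of |W|/M
        c += abs(w) / M
        cum.append(c)
    total = 0.0
    for _ in range(n_samples):
        r = rng.random()
        l = next((i for i, cp in enumerate(cum) if r <= cp), len(W) - 1)
        sign = 1.0 if W[l] >= 0 else -1.0
        total += M * sign * f[l]
    return total / n_samples
```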

From estimation to simulation.
For the case of very high precision probability estimation algorithms, prior work has addressed the question of how to efficiently lift these to algorithms for high precision approximate weak simulation. In particular, Refs. [3,4,35] (see also Appendix B) lift estimation algorithms with small relative error. In Appendix B, we also present a potentially useful algorithm for lifting small additive error estimators. In Sec. 4, we focus on the task of lifting an algorithm for a poly-box to an ε-simulator. Since a poly-box is a much less precise probability estimation algorithm (in comparison to strong simulation), achieving this task in the general case is implausible (see Sec. 5). In Sec. 4, we will show that a poly-box can efficiently be lifted to an ε-simulator if we restrict the family of quantum distributions to those possessing a property we call poly-sparsity. This sparsity property measures the "peakedness versus uniformness" of distributions and is related to the scaling of the smooth max-entropy of the output distributions of quantum circuits. Loosely, a poly-sparse quantum circuit can have its outcome probability distribution well approximated by specifying the probabilities associated with polynomially many of the most likely outcomes. We formalize this notion in Sec. 4.

Theorem 3.
Let C be a family of quantum circuits with a corresponding family of probability distributions P. Suppose there exists a poly-box over C, and that P is poly-sparse. Then, there exists an ε-simulator of C.
We emphasize that the proof of this theorem is constructive, and allows for new simulation results for families of quantum circuits that were not previously known to be efficiently simulable. As an example, our results can be straightforwardly used to show that Clifford circuits with sparse outcome distributions and with small amounts of local unitary (non-Clifford) noise, as described in Ref. [23], are ε-simulable.

Hardness results.
Finally, in Sec. 5, we prove that the poly-box requirement of Theorem 2 is on its own not sufficient for ε-simulability. The challenge in proving such a result is identifying a natural family of non-poly-sparse quantum circuits for which a poly-box exists but for which ε-simulation is impossible.
We prove that the family C_PROD described above, which violates the poly-sparsity requirement, admits a poly-box. Then, by assuming a now commonly used "average case hardness" conjecture [7,9,10,12,36,16], we show that the ability to ε-simulate C_PROD implies the unlikely result that the polynomial hierarchy collapses to the third level. Loosely, this result suggests that there exist quantum circuits where the probability of any individual outcome (and marginals) can be efficiently estimated, but the system cannot be ε-simulated. Our hardness result closely follows the structure of several similar results, in particular that of the IQP circuits result of Ref. [9].
Our proof relies on a conjecture regarding the hardness of estimating Born rule probabilities to within a small multiplicative factor for a substantial fraction of randomly chosen circuits from C_PROD. This average case hardness conjecture (which we formulate explicitly as Conjecture 1) is a strengthening of the worst case hardness of multiplicative precision estimation of probabilities associated with circuits from C_PROD. Worst case hardness can be shown by applying the results of Refs. [40,41,42,43,44] and follows from an argument analogous to Theorem 5.1 of Ref. [16]. We note that our hardness result is implied by the hardness results presented in Refs. [10,11]; however, our proof is able to use a more plausible average case hardness conjecture than these references, owing to the fact that we prove hardness of ε-simulation rather than hardness of the yet weaker notion of approximate weak simulation employed by these references.
In Appendix D we also present Theorem 7. This theorem shows that the properties of poly-sparsity and anti-concentration are mutually exclusive.
The flow chart in Fig. 1 summarizes the main results in this paper by categorizing any given family of quantum circuits in terms of its computational power, based on whether or not the circuit family admits certain properties related to simulability.

Defining simulation of a quantum computer
While there has been a breadth of recent results in the theory of simulation of quantum systems, this breadth has been accompanied by a plethora of different notions of simulation. This variety brings with it challenges for comparing results. Consider the following results, which are all based on (often slightly) different notions of simulation. As a first example, the ability to perform strong simulation of certain classes of quantum circuits would imply a collapse of the polynomial hierarchy, while under a weaker (but arguably more useful) notion of simulation this collapse is only implied if additional mathematical conjectures hold true [7,9]. As another example, Ref. [14] shows that the quantum complexity class BQP is contained in the second level of the polynomial hierarchy if there exist efficient classical probabilistic algorithms for sampling a particular outcome (from the quantum circuits considered) with a probability that is exponentially close to the true quantum probability in terms of additive error (or polynomially close in terms of multiplicative error). As additional examples, Refs. [27,23] present efficient classical algorithms for additive polynomial precision estimates of Born rule probabilities. While many such technical results are crucially sensitive to these distinctions in the meaning of simulation, there is a growing need to connect the choice of simulation definition used in a proof against (or for) efficient classical simulability to a statement about proofs of quantum advantage (or the ability to practically classically solve a quantumly solvable problem). In particular, to the non-expert it can be unclear what the complexity of classical simulation (in each of the above mentioned notions of simulation) of a given quantum device says about the hardness of building a classical device that can efficiently solve the computational problems that are solvable by the quantum device.

Figure 1: An overview of the main results. An arbitrary family of quantum circuits C is partially classified by its computational power relative to universal classical computers. The unclassified category (admits a poly-box and is not poly-sparse) is known to contain circuit families that are hard to ε-simulate assuming some plausible complexity-theoretic conjectures. We give examples of circuit families in these categories. Here, C*_UNIV, C*_STAB, C*_polyN and C*_IQP refer to the following families of circuits: universal circuits, stabilizer circuits, circuits with polynomially bounded negativity and IQP circuits, respectively. The circuit families Ce and C_PROD are discussed in some detail in Sec. 4.1 and 3.4.2, respectively. The presence of a superscript represents an upper bound on the number of qubits to be measured.
In this section, we will discuss a meaningful notion of approximate weak simulation, which we call ε-simulation. This notion of simulation is a natural mathematical relaxation of exact weak simulation and has been used in prior works, e.g., in Refs. [7,12,22]. Further, this notion of simulation is closely related to the class of problems in complexity theory known as sampling problems [37]. Here, we define ε-simulation and prove that, up to polynomial equivalence, an ε-simulator of a quantum computer is effectively a perfect substitute for any task that can be performed by the quantum computer itself. In particular, we will show that ε-simulators satisfy efficient indistinguishability, meaning that they can remain statistically indistinguishable from (according to a computationally unbounded referee), and have a polynomially equivalent run-time to, the quantum computer that they simulate. We argue that efficient indistinguishability is a natural choice of a rigorously defined global condition which minimally captures the concept of computational power. The accuracy requirements of ε-simulation are rigorously defined at the local level of each circuit and correspond to solving a sampling problem (as defined in [37]) based on the outcome distribution of the circuit. Thus our result shows that the ability to solve all sampling problems solvable by a quantum computer C is a necessary and sufficient condition for being efficiently indistinguishable from C, or "computationally as powerful as C".

Strong and weak simulation
We note that every quantum circuit has an associated probability distribution that describes the statistics of the measurement outcomes. We will refer to this as the circuit's quantum probability distribution. As an example, Fig. 2 below depicts a quantum circuit. The output of running this circuit is a classical random variable X = (X 1 , . . . , X k ) that is distributed according to the quantum probability distribution.
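For concreteness, the quantum probability distribution of a small circuit can be written down directly. The following sketch (ours; it uses dense state vectors, so it scales only to a handful of qubits and is purely illustrative) computes the Born-rule distribution when all qubits are measured in the computational basis:

```python
import numpy as np

def born_distribution(U, psi0):
    """Output distribution of measuring all qubits after applying U.

    U    : (2**n, 2**n) unitary acting on the n-qubit register
    psi0 : initial state vector of dimension 2**n
    Returns P with P[x] = |<x| U |psi0>|**2 (the Born rule), indexing
    computational-basis outcomes x as integers.
    """
    psi = U @ psi0
    return np.abs(psi) ** 2

# Example: a Hadamard on one qubit gives the uniform distribution.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
ket0 = np.array([1.0, 0.0])
P = born_distribution(H, ket0)   # → [0.5, 0.5]
```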
Two commonly used notions of simulation are strong simulation and weak simulation. A weak simulator of a quantum circuit generates samples from the circuit's quantum probability distribution. In the strict sense of the term, a weak simulator generates samples from the exact quantum probability distribution. Loosely, having a weak simulator for a quantum system is an equivalent resource to using the quantum system itself.

Figure 2: An example of a quantum circuit. This circuit acts on n qubits (or, in general, qudits). The initial state is a product state. The unitary operation U must be constructed out of a sequence of local unitary gates. The first k qubits in this example are each measured in a fixed basis, yielding outcome (X1, X2, . . . , X_k). Qubits i > k, shown without a measurement, are traced over (marginalized).
The term weak simulation has also been used in reference to classical algorithms which sample from distributions that approximate the target probability distribution. There exist at least four distinct notions of approximate weak simulation appearing in the quantum computation literature. As background, we give a brief description of these here, although the focus of this paper will be on only one of them, which will be discussed in some detail later in this section.
1. The first notion of approximate weak simulation requires that the classical algorithm sample from a distribution that is exponentially close (in L1-norm) to the target distribution. This notion was used in Refs. [34,33].
2. Another notion of approximate weak simulation requires that the sampled distribution be sufficiently close to the target distribution so as to ensure that for every outcome x, the sampled distribution satisfies |P_sampled(x) − P_target(x)| ≤ ε P_target(x) for some fixed ε > 0. See Ref. [35] and also [8,14] for related variants.
3. The third notion of approximate weak simulation requires that the classical algorithm sample from a distribution that is inverse polynomially close (in L1-norm) to the target distribution. This notion of simulation has been used in prior works, e.g., in Refs. [7,37,12,22], both in the context of hardness of classical simulation and the existence of classical simulators. We call this ε-simulation.
4. The final prominent example of approximate weak simulation requires that the classical algorithm sample from a distribution that is within some small fixed constant L1-norm of the target distribution. This definition has predominantly featured in hardness proofs [9,10,11,36,16]. It has also featured in proofs of efficient classical simulability of noisy boson sampling circuits [25] and noisy IQP circuits [24].
A strong simulator, in contrast, outputs probabilities or marginal probabilities associated with the quantum distribution. More specifically, a strong simulator of a circuit is a device that outputs the quantum probability of observing any particular outcome, or the quantum probability of an outcome marginalized over one or more of the measurements. Note that a strong simulator requires an input specifying the event for which the probability of occurrence is required. Taking Fig. 2 as an example, a strong simulator could be asked to return the probability of observing the event (X1, X2) = (1, 0), marginalized over the measurements 3 to k. The requirement that a strong simulator can also output estimates of marginals is weaker than requiring it to estimate the quantum probability associated with any event (subset of the outcome space).
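The marginalization just described can be made concrete with a brute-force sketch (ours, exponential in n and so only illustrative): sum the full Born distribution over all outcomes consistent with the specified event.

```python
def marginal_probability(P, n, measured_bits):
    """Marginal probability of an event on a subset of qubits.

    P             : full distribution over all 2**n outcomes (integer index)
    measured_bits : dict {qubit_index: bit_value} fixing some qubits;
                    all other qubits are summed over (marginalized).
    Qubit 0 corresponds to the leftmost (most significant) bit.
    """
    total = 0.0
    for x in range(2 ** n):
        bits = [(x >> (n - 1 - i)) & 1 for i in range(n)]
        if all(bits[i] == b for i, b in measured_bits.items()):
            total += P[x]
    return total

# Example: uniform distribution on 3 qubits; the event (X1, X2) = (1, 0)
# marginalized over the third qubit has probability 2/8 = 0.25.
n = 3
P = [1.0 / 2 ** n] * 2 ** n
p_event = marginal_probability(P, n, {0: 1, 1: 0})   # → 0.25
```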
While the names 'strong' and 'weak' simulation suggest that they are in some sense different magnitudes of the same type of thing, we note that these two types of simulation produce different types of output. In particular, a strong simulator outputs probabilities. (More specifically, it outputs exponential additive precision estimates of Born rule probabilities and their marginals.) In contrast a weak simulator outputs samples (from the exact target probability distribution).
Ref. [33] provides a compelling argument advocating for the use of weak simulation in place of strong simulation by showing that there exist classically efficiently weak simulable probability distributions that are #P-hard to strong simulate, thus showing that aiming to classically strong simulate is an unnecessarily challenging goal. In a similar vein, here we will advocate for the notion of ε-simulation over other notions of simulation, including the alternative notions of approximate weak simulation.

ε-simulation
A weak simulator, which generates samples from the exact quantum probability distribution, is a very strict notion. Often, it would be sufficient to consider a simulator that generates samples from a distribution that is only sufficiently close to the quantum distribution, for some suitable measure of closeness. Such a relaxation of the requirement of weak simulation has been used by several authors, e.g., in Refs. [34,33,7,37,12,22,9,10,36,16,25,24]. Here, we define the notion of ε-simulation, which is a particular relaxation of the notion of weak simulation, and motivate its use.
We first define a notion of sampling from a distribution that is only close to a given distribution. Consider a discrete probability distribution P. Let B(P, ε) denote the ε-ball around the target P according to the L1 distance (or equivalently, up to an irrelevant constant, the total variation distance). We define ε-sampling of a probability distribution P as follows:

Definition 1.
Let P be a discrete probability distribution. We say that a classical device or algorithm can ε-sample P iff for any ε > 0, it can sample from a probability distribution P′ ∈ B(P, ε). In addition, its run-time should scale at most polynomially in 1/ε.
We note that the use of the L1-norm in the above is motivated by the fact that the L1-distance upper bounds the one-shot success probability of distinguishing between two distributions. More details can be found in the proof of Theorem 1 in Appendix A.
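As a small numeric illustration of this bound (our own sketch, not the proof in Appendix A): for two distributions at L1 distance ||P − Q||_1, the optimal single-sample test succeeds with probability 1/2 + ||P − Q||_1 / 4, so a sampler that is ε-close in L1-norm is nearly impossible to detect from one sample.

```python
def l1_distance(P, Q):
    """L1 distance between two distributions given as equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(P, Q))

def optimal_single_shot_success(P, Q):
    """Best probability of guessing which of P, Q produced a single sample.

    The optimal test guesses P on outcomes where P[x] >= Q[x], achieving
    P_correct = 1/2 + (1/4) * ||P - Q||_1 (equal priors assumed).
    """
    return 0.5 + 0.25 * l1_distance(P, Q)

P = [0.5, 0.5]
Q = [0.5 + 0.01, 0.5 - 0.01]   # ||P - Q||_1 = 0.02
# A single sample gives an advantage of at most 0.005 over random guessing.
```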
The definition above does not require the device to sample from precisely the quantum probability distribution P, but rather allows it to sample from any probability distribution P′ which is in the ε-ball around the target probability distribution P. We note that the device or algorithm will in general take time (or other resources) that depends on the desired precision in order to output a sample; hence the efficiency requirement ensures that these resources scale at most polynomially in the precision 1/ε.

Definition 2.
We say that a classical device or algorithm can ε-simulate a quantum circuit if it can ε-sample from the circuit's associated output probability distribution P.
We note that each of the above mentioned notions of simulation refers to the simulation of a single quantum circuit. More generally, we may be interested in (strong, weak, or ε-) simulators of uniform families of quantum circuits. In this setting we can discuss the efficiency of a simulator with respect to n, the number of qubits. As an example, consider a family of circuits described by a mapping from A* (finite strings over some finite alphabet A) to some set of quantum circuits C = {c_a | a ∈ A*}, where for each a ∈ A*, c_a is a quantum circuit with some efficient description given by the index a. In the case of strong (weak) simulation, we say that a device can efficiently strong (weak) simulate the family of quantum circuits C if the resources required by the device to strong (weak) simulate c_a ∈ C are upper-bounded by a polynomial in n. In the case of ε-simulation, we require that the simulator be able to sample from a distribution within ε of the quantum distribution efficiently in both n and 1/ε.

Definition 3.
We say that a classical device or algorithm can ε-simulate a uniform family of quantum circuits C if for all ε > 0 and for any c ∈ C (with number of qubits n and quantum distribution P) it can sample from a probability distribution P′ ∈ B(P, ε) in run-time O(poly(n, 1/ε)).

ε-simulation and efficient indistinguishability
As noted earlier, this definition ensures that ε-simulation is a weaker form of simulation than exact weak simulation. However, we point out that the notion of exact sampling may be weakened in a number of ways, with the ε-simulation approach being well suited to many applications related to quantum simulators. As an example, if the definition of simulation allowed for a fixed but small amount of deviation in L1 distance (as opposed to one that can be made arbitrarily small), then the computational power of a simulator would immediately be detectably compromised. The above notion of ε-simulation requires a polynomial scaling between the precision (1/ε) of the approximate sampling and the time taken to produce a sample. Below (Theorem 1), we will use a statistical indistinguishability argument to show that a polynomial scaling is precisely what should be demanded from a simulator. In particular, we will show that a run-time which scales subpolynomially in 1/ε puts unnecessarily strong demands on a simulator, while a super-polynomial run-time would allow the simulator's output to be statistically distinguishable from the output of the device it simulates.
We now introduce the hypothesis testing scenario we consider.

Hypothesis testing scenario. Suppose Alice possesses a quantum computer capable of running a (possibly non-universal) family of quantum circuits C, and Bob has some simulation scheme for C (whether it is an ε-simulator is to be decided). Further, suppose that a referee with unbounded computational power and full knowledge of the specifications of C will request data from either Alice or Bob and run a test that aims to decide between the hypotheses:
H_a: the requested data came from Alice's quantum computer, or H_b: the requested data came from Bob's simulator. The setup is as follows: at the start of the test, one of Alice or Bob is randomly appointed as "the candidate". Without knowing the candidate's identity, the referee then enters into a finite-length interactive protocol with the candidate (see Fig. 3). Each round of the protocol involves the referee sending a circuit description to the candidate, requesting that the candidate run the circuit and return the outcome. The choice of requests by the referee may depend on all prior requests and data returned by the candidate. The rules by which the referee:
1. chooses the circuit requested in each round,
2. chooses to stop making further circuit requests, and
3. decides on H_a versus H_b given the collected data
define the hypothesis test. The goal of the referee is as follows: for any given δ > 0, decide H_a versus H_b such that P_correct ≥ 1/2 + δ, where P_correct is the probability of deciding correctly. Bob's goal is to come up with a (δ-dependent) strategy for responding to the referee's requests that jointly achieves:
• indistinguishability: for any δ > 0 and for any test that the referee applies, P_correct < 1/2 + δ, and
• efficiency: for every choice of circuit request sequence α, Bob must be able to execute his strategy using resources that are O(poly(N(α), 1/δ)), where N(α) is the resource cost incurred by Alice for the same circuit request sequence.
We note that the referee can always achieve a success probability P_correct = 1/2 simply by randomly guessing H_a or H_b. Importantly, the referee has complete control over the number of rounds in the test and, additionally, does not have any upper bound imposed on the number of rounds. Hence, P_correct is the ultimate one-shot probability of the referee correctly deciding between H_a and H_b, and in no sense can this probability be amplified through more rounds of information requests. As such, we will say that the referee achieves distinguishability between Alice and Bob if, for all δ > 0, there exists a test that the referee can apply ensuring that P_correct ≥ 1 − δ (independent of Bob's strategy). Alternatively, we will say that Bob achieves indistinguishability (from Alice) if, for all δ > 0, there exists a response strategy for Bob such that P_correct ≤ 1/2 + δ (independent of what test the referee applies). We will show that if Bob has an ε-simulator, then there exists a strategy for Bob such that he jointly achieves indistinguishability (i.e. the referee cannot improve on a random guess by any fixed probability δ > 0) and efficiency. In this case, Bob can at the outset choose any δ > 0 and ensure that P_correct < 1/2 + δ for all strategies the referee can employ. The efficiency requirement imposed on Bob's strategy is with respect to the resource cost incurred by Alice. Here we will define what this means and justify the rationale behind this requirement. Let us first note that for any circuit c_a ∈ C, there is a resource cost R(c_a) incurred by Alice in order to run this circuit. This may be defined by any quantity, as long as this quantity is upper- and lower-bounded by some polynomial in the number of qubits. For example, R(c_a) may be defined by run-time, number of qubits, number of elementary gates, number of qubits plus gates plus measurements, length of circuit description, etc.
Since this quantity is polynomially equivalent to the number of qubits, without loss of generality we can treat n_a (the number of qubits used in circuit c_a) as the measure of Alice's resource cost R(c_a). We now note that for a given test, the referee may request outcome data from some string of circuits c_1, . . . , c_m ∈ C. We thus define the resource cost for Alice to meet this request by N := n_1 + · · · + n_m.
Bob's resource cost (run-time) with respect to each circuit c_a ∈ C depends polynomially on both n_a and the inverse of his choice of accuracy parameter ε. Thus, Bob's strategy is defined by the rules by which he chooses ε_j, the accuracy parameter for his response in the j-th round⁶. For a given sequence of circuit requests a_1, . . . , a_m ∈ A*, Bob will thus incur a total resource cost T(α, δ). The efficiency condition then requires that there exist some polynomial f(x, y) such that for all δ > 0 and for all possible request sequences α = (a_1, . . . , a_m), T(α, δ) ≤ f(N(α), 1/δ). The efficiency requirement imposed on Bob's strategy thus naturally requires that the resource costs of Alice and Bob be polynomially equivalent for the family of tests that the referee can apply.

Theorem 1. Bob has an ε-simulator of Alice's quantum computer if and only if, given the hypothesis testing scenario considered above, there exists a strategy for Bob which jointly achieves indistinguishability and efficiency.
The proof for this theorem can be found in Appendix A. The proof uses the fact that the L1 distance between Alice and Bob's output distributions over the entire interactive protocol can be used to upper-bound the probability of correctly deciding between H_a and H_b. Further, we show that the total L1 distance between Alice and Bob's output distributions over the entire interactive protocol grows at most additively in the L1 distance of each round of the protocol. We also note that an ε-simulator allows Bob to ensure that the L1 distance of each round decays like an inverse quadratic, ensuring that the sum of the L1 distances converges to the desired upper bound. The convergence of this inverse-quadratic (and hence inverse-polynomial) series motivates the significance of ε-simulators, i.e. simulators with run-time O(poly(n, 1/ε)).
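The accuracy schedule underlying this convergence argument can be sketched numerically. The constant 6/π² below is our own choice of inverse-quadratic schedule, picked so that the infinite series sums to exactly δ (any convergent inverse-polynomial schedule would do):

```python
import math

def round_accuracy(j, delta):
    """Accuracy parameter eps_j Bob uses in round j. Since sum(1/j^2) = pi^2/6,
    the prefactor 6*delta/pi^2 makes the total L1 budget sum to delta."""
    return (6.0 * delta / math.pi ** 2) / j ** 2

# each eps_j is inverse-polynomial in j and delta, so every round stays efficient,
# yet the accumulated L1 distance over arbitrarily many rounds never exceeds delta
delta = 0.01
total = sum(round_accuracy(j, delta) for j in range(1, 100001))
print(total)  # approaches delta = 0.01 from below
```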
We note that the "if" component of the theorem says that meeting the definition of an ε-simulator is necessary for achieving efficient indistinguishability; thus the notion of simulation cannot be weakened any further without compromising efficient indistinguishability.
Throughout this paper, we view a quantum computer as a uniform family of quantum circuits C = {c_a | a ∈ A*}. We note that by committing to the circuit model of quantum computation, our language, including important definitions such as ε-simulation, is not necessarily well suited to other models of computation unless these are first translated to the circuit model. For example, in a quantum computational model that makes use of intermediate measurements, such as the measurement-based quantum computing (MBQC) model, consider a procedure where part of the state is measured and then, conditioned on the outcome, a second measurement is conducted. This procedure (consisting of two rounds of measurement) can be described as a single circuit in the circuit model, but cannot be broken up into two rounds involving two separate circuits. This limitation becomes apparent when we consider the hypothesis testing scenario. If the referee performs a multi-round query, expecting the candidate to possess an MBQC-based quantum computer, then even Alice with a quantum computer may be unable to pass the test unless her computer operates in an architecture that can maintain quantum coherence between rounds. In the setting we consider, such a query by the referee is not allowed.

ε-simulation and computational power
In addition to the technical contribution of Theorem 1, we wish to make an argument for the conceptual connection between computational power and efficient indistinguishability. Intuitively, we wish to say that an agent A is at least as computationally powerful as agent B if A can "do" every task that B can do using an equivalent amount of resources. In our setting, we can restrict ourselves to polynomially equivalent resources and to the most general task: sampling from a target probability distribution given an efficient description of it. However, defining what constitutes an acceptable solution to the sampling task is not only of central importance but also difficult to conceptually motivate. Given a description of a probability distribution, can anything short of sampling exactly from the specified distribution constitute success? An answer in the negative seems unsatisfactory, because very small deviations⁷ from exact sampling are ruled out. However, an answer in the positive presents the subtlety of specifying the exact requirement for achieving the task. It is easy to offer mathematically reasonable requirements for what constitutes success at the local level of each task, but significantly more difficult to conceptually justify these as precisely the right notion. In our view, this difficulty arises because a well-formed, conceptually motivated requirement at the local level of each task must be inherited from a global requirement imposed at the level of the agent, across their performance on any possible task.
We advocate for efficient indistinguishability as the right choice of global requirement for defining computational power and implicitly defining what constitutes a solution to a sampling task. If an agent is efficiently indistinguishable from another then, for any choice of δ > 0 chosen at the outset, the referee cannot assign any computational task to the candidate and observe a consequence that will improve (over randomly guessing) their ability to correctly decide between H_a and H_b by a probability δ. Thus, there is no observable consequence⁸ to substituting an agent with another efficiently indistinguishable agent. For these reasons, we argue that in the setting where agents are being used as computational resources, an agent's ability to (efficiently and indistinguishably) substitute for another naturally defines containment of computational power. In light of this, the "only if" component of Theorem 1 says that the computational power of Bob (given an ε-simulator of C) contains that of Alice (given C), and the "if" component says that an ε-simulator is the minimal simulator that achieves this, since any simulator achieving efficient indistinguishability is an ε-simulator.
The referee can be seen as a mathematical tool for bounding the adversarial ability of any natural process to distinguish an agent from an efficiently indistinguishable substitute. As such, one may argue for further generalizing the concept of efficient indistinguishability from one defined with respect to (w.r.t.) a computationally unbounded referee to a notion dependent on the computational power of the referee. If we take the view that the computational power of all agents within this universe is bounded by universal quantum computation, then a particularly interesting generalization is efficient indistinguishability w.r.t. a referee limited to universal quantum computation. We return to this generalization in the discussion, elsewhere focusing on efficient indistinguishability w.r.t. a computationally unbounded referee.

Probability Estimation
As described in the previous section, an exact (or approximate in the sense of Ref. [34]) weak simulator produces outcomes sampled from the exact (or exponentially close to the exact) Born rule probability distribution associated with a quantum circuit. The notion of ε-simulation is a weaker notion of simulation, a fact we aim to exploit by constructing algorithms for ε-simulation that would not satisfy the above-mentioned stronger notions of simulation. In this paper, we describe an approach to ε-simulation of quantum circuits based on two components: first, estimating Born rule probabilities for specific outcomes of a quantum circuit to a specified precision, and second, using such estimates to construct a simulator. In this section, we describe the first component, which we call a poly-box. We motivate and define poly-boxes, discuss their conceptual importance and give a number of important examples. In the next section, we employ such an estimator to construct an ε-simulator under certain conditions.

Born rule probabilities and estimators
Consider the description c = {ρ, U, M} of some ideal quantum circuit, with ρ an initial state, U = U_L U_{L−1} · · · U_1 a sequence of unitary gates, and M a set of measurement operators (e.g., projectors).
The Born rule gives us the exact quantum predictions associated with observing any particular outcome x:

P(x) = Tr(M_x U ρ U†).  (1)

Further, probabilities associated with events S ⊆ {0, 1}^k are given by:

P(S) = Σ_{x∈S} P(x).

The task of efficiently classically estimating these probabilities for general quantum circuits is of great practical interest, but is known to be hard even for rather inaccurate levels of estimation. For example, given a circuit c_a from a family of universal quantum circuits with a Pauli Z measurement of the first qubit only, deciding whether P_a(0) > 2/3 or P_a(0) < 1/3 is BQP-complete. Monte Carlo methods are a common approach to estimating Born rule probabilities that are difficult to calculate directly [27,26,23]. Let p be an unknown parameter we wish to estimate, e.g., a Born rule probability. In a Monte Carlo approach, p is estimated by observing a number of random variables X_1, . . . , X_s and computing some function p̂_s(X_1, . . . , X_s) of the outcomes, chosen so that p̂_s is close to p in expectation. In this case, p̂_s is an estimator of p.
We first fix some terminology regarding the precision of an estimator, and how this precision scales with resources. We say that an estimator p̂_s of p is an additive (ε, δ)-precision estimator if:

Pr(|p̂_s − p| ≥ ε) ≤ δ.

We say that p̂_s is a multiplicative (ε, δ)-precision estimator if:

Pr(|p̂_s − p| ≥ εp) ≤ δ.

In the case where p ≤ 1 is a probability, a multiplicative precision estimator is more accurate than an additive precision estimator. For any estimator based on the Monte Carlo type of approach described above, there is a polynomial (typically linear) resource cost associated with the number of samples s. For example, the time taken to compute p̂_s will scale polynomially in s. More generally, s may represent some resource invested in computing the estimator p̂_s, such as the computation run-time. For this reason, we may wish to classify additive/multiplicative (ε, δ)-precision estimators by how s scales with 1/ε and 1/δ. We say that p̂_s is an additive polynomial precision estimator of p if there exists a polynomial f(x, y) such that for all ε, δ > 0, p̂_s is an additive (ε, δ)-precision estimator for all s ≥ f(ε^{−1}, log δ^{−1}). We say that p̂_s is a multiplicative polynomial precision estimator of p if there exists a polynomial f(x, y) such that for all ε, δ > 0, p̂_s is a multiplicative (ε, δ)-precision estimator for all s ≥ f(ε^{−1}, δ^{−1})⁹.
A useful class of additive polynomial precision estimators is given by application of the Hoeffding inequality. Suppose p̂_1 resides in some interval [a, b] and is an unbiased estimator of p (i.e. E(p̂_1) = p). Let p̂_s be defined as the average of s independent observations of p̂_1. Then, by the Hoeffding inequality, we have:

Pr(|p̂_s − p| ≥ ε) ≤ 2 exp(−2sε²/(b − a)²)

for all ε > 0. We note that for s ≥ (b − a)²/(2ε²) · log(2/δ), p̂_s is an additive (ε, δ)-precision estimator of p. With this observation, we see that additive polynomial precision estimators can always be constructed from unbiased estimators residing in a bounded interval.
As an important example, let us consider one way an agent can generate Born probability estimates when given access to some classical processing power and a family of quantum circuits C. Given a description of an event S and a description of a quantum circuit c_a ∈ C, the agent can efficiently estimate p = P_a(S). In this example, the agent constructs the estimator p̂_s by independently running the circuit s times. On each of the runs i = 1, . . . , s, she observes whether the outcome x is in the event S (in which case X_i = 1) or not in S (in which case X_i = 0). We then define p̂_s = (1/s) Σ_{i=1}^{s} X_i. Using the Hoeffding inequality, it is easy to show that the Born rule probability estimator p̂_s is an additive polynomial precision estimator of p. Thus, for all a ∈ A* and ε, δ > 0, there is a choice of s ∈ N such that this procedure can be used to compute an estimate p̂ of p := P_a(S) such that p̂ satisfies the accuracy requirement:

Pr(|p̂ − p| ≥ ε) ≤ δ  (6)

and the run-time required to compute the estimate p̂ is O(poly(n, ε^{−1}, log δ^{−1})).
Let us now discuss an important aspect that we have so far ignored: namely, the restrictions that need to be placed on the events S. We first note that since each event S is an element of the power set of {0, 1}^k, the total number of events grows doubly exponentially, implying that any polynomial-length description of events can only index a tiny fraction of the set of all events. Even once we make a particular choice as to how (and hence which) events are indexed by polynomial-length descriptions, deciding whether a bit-string x is in the event S is not computationally trivial (with the complexity depending on the indexing). Since the estimation procedure requires a computational step where the agent checks whether x is in S, there will be restrictions placed on the allowed events, depending on the computational limitations of the agent and the complexity of the indexing of events.
When discussing poly-boxes, we will be interested in the restricted set of events S ∈ {0, 1, •}^k. We use this notation to indicate the set of all specific outcomes and marginals. Specifically, an event S ∈ {0, 1, •}^k requires each qubit i whose entry S_i is in {0, 1} (qubits numbered i_1, . . . , i_{k−m}) to produce the specified measurement outcome S_i, while placing no restriction on the remaining qubits (numbered j_1, . . . , j_m), which may produce either a 0 or a 1 measurement outcome. The probability corresponding to such an event S is the marginal probability associated with observing the outcome bit-string x_1, . . . , x_{k−m} on the qubits numbered i_1, . . . , i_{k−m}, marginalized over the qubits j_1, . . . , j_m.

⁹ The observant reader will notice that additive and multiplicative precision estimators have different scalings in δ. Of course, one can define an alternative notion of additive estimation where s ≥ f(ε^{−1}, δ^{−1}) or an alternative notion of multiplicative estimation where s ≥ f(ε^{−1}, log δ^{−1}). Here, we have chosen to define the notions that are most useful as motivated by the existence of techniques and associated inequalities bounding their performance. In particular, Hoeffding's inequality allows the construction of additive (ε, δ)-precision estimators, while Chebyshev's inequality motivates the multiplicative (ε, δ)-precision estimator definition.

The poly-box: generating an additive polynomial precision estimate
Given a family of quantum circuits C, we will be interested in constructing an ε-simulator of C using estimates of Born rule probabilities associated with circuits in C. For this purpose, we define a poly-box over C.

Definition 4.
(poly-box). A poly-box over a family of quantum circuits C = {c_a | a ∈ A*} with associated family of probability distributions P = {P_a | a ∈ A*} is a classical algorithm that, for all a ∈ A*, ε, δ > 0 and S ∈ {0, 1, •}^{k_a} (where k_a is the number of measured qubits of c_a), can be used to compute an estimate p̂ of P_a(S) such that p̂ satisfies the accuracy requirement:

Pr(|p̂ − P_a(S)| ≥ ε) ≤ δ  (7)

and the run-time required to compute the estimate p̂ is O(poly(n, ε^{−1}, log δ^{−1})).
Eq. (7) gives an upper bound on the probability that the computed estimate p̂ is far from the target quantity. This probability is over the potential randomness in the process used to generate the estimate p̂. In addition, we implicitly assume that the output of this process is independent of prior output. In particular, let α = (a, ε, δ, S) be an input into a poly-box and p̂_α the observed output. Then, we implicitly assume that the probability distribution of p̂_α depends only on the choice of input α and, in particular, is independent of prior output.
Note that a poly-box over a family of quantum circuits C = {c_a | a ∈ A*} with associated family of probability distributions P = {P_a | a ∈ A*} is a classical algorithm that can be used to compute additive polynomial precision estimators p̂_s of P_a(S) for all a ∈ A*, s ∈ N and S ∈ {0, 1, •}^{k_a}, efficiently in s and n.

Conceptual significance of a poly-box
Whether or not a family of quantum circuits C admits a poly-box has bearing on the complexity of both the sampling problems and the decision problems solvable by C, and so we will find that the notion of a poly-box is a useful concept. We first note that the existence of a poly-box is a necessary condition for ε-simulation.

Theorem 2.
If C is a family of quantum circuits that does not admit a poly-box algorithm, then C is not ε-simulable.
Proof. We note that given an ε-simulator of C, a poly-box over C can be constructed in the obvious way, simply by observing the frequency with which the ε-simulator outputs outcomes in S and using this observed frequency as the estimator for P(S).
A poly-box over C is not only necessary for the existence of an ε-simulator over C but, as we will show in Theorem 3, combined with an additional requirement, it is also sufficient. In addition, we note that if C admits a poly-box, then all "generalized decision problems" solvable by C are solvable within BPP. As an illustrative but unlikely example, suppose there exists a classical poly-box over a universal quantum circuit family C_UNIV. Then, for any instance x of a decision problem L in BQP, there is a quantum circuit c_a ∈ C_UNIV that decides whether x ∈ L (correctly on at least 2/3 of the runs), simply by outputting the decision "x ∈ L" when the first qubit measurement outcome is 1 on a single run of c_a and, conversely, outputting the decision "x ∉ L" when the first qubit measurement outcome is 0. We note that, in order to decide whether x ∈ L, one does not need the full power of an ε-simulator over C_UNIV; in fact, it is sufficient to have access only to the poly-box over C_UNIV. Given a poly-box over C_UNIV, one can request an (ε, δ)-precision estimate p̂ of the probability p that the sampled outcome from c_a is in S = (1, •, . . . , •). For ε < 1/6 and δ < 1/3, one may decide "x ∈ L" if p̂ ≥ 1/2 and "x ∉ L" otherwise. This will result in the correct decision with probability ≥ 2/3, as required. A poly-box over C offers the freedom to choose any S ∈ {0, 1, •}^n, which can in general be used to define a broader class of decision problems. Of course, in the case of C_UNIV, this freedom cannot be exploited, because for every choice of a and S, there is an alternative, easily computable choice of a′ such that the probability that a run of c_{a′} ∈ C_UNIV results in an outcome in (1, •, . . . , •) is identical to the probability that a run of c_a ∈ C_UNIV results in an outcome in S.
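The decision rule with ε < 1/6 and δ < 1/3 just described can be sketched as follows. The poly-box is mocked here by a hypothetical noisy oracle of our own; only the threshold logic reflects the text:

```python
import random

def decide_membership(polybox_estimate):
    """Decide "x in L" iff the poly-box estimate of
    p = Pr(first output bit = 1) is at least 1/2."""
    return polybox_estimate >= 0.5

def mock_polybox(p_true, eps=1 / 6 - 0.01):
    """Hypothetical stand-in for a poly-box: returns an estimate that is
    deterministically within eps < 1/6 of p_true (i.e. delta = 0 here)."""
    return p_true + random.uniform(-eps, eps)

random.seed(0)
# x in L: the circuit accepts with p >= 2/3, and 2/3 - eps > 1/2
assert all(decide_membership(mock_polybox(2 / 3)) for _ in range(100))
# x not in L: p <= 1/3, and 1/3 + eps < 1/2
assert not any(decide_membership(mock_polybox(1 / 3)) for _ in range(100))
print("estimates with eps < 1/6 separate p >= 2/3 from p <= 1/3")
```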
However, since we are considering the general case of not-necessarily-universal families of quantum circuits, it is feasible that a poly-box over C will be computationally more powerful than a poly-box over C restricted to estimating only probabilities of events of the form S = (1, •, . . . , •). On the other hand, we do not wish to make poly-boxes exceedingly powerful. If we view a poly-box over C as a black box containing an agent with access to C who executes an estimation algorithm as per the aforementioned example, then by restricting the allowable events as above and choosing such a simple method of indexing them, we are able to limit the additional computational power given to the agent and/or poly-box.

Poly-boxes from quasiprobability representations
There are a number of known algorithms for constructing poly-boxes over certain non-universal families of quantum circuits [2,45,27,22,23]. In particular, we focus on the algorithm presented in Ref. [27], which can be used to construct a poly-box over any family of quantum circuits C where the negativity of quantum circuits grows at most polynomially in the circuit size. We refer the interested reader to Ref. [27] for a definition of the negativity of a quantum circuit, but note that this quantity depends on the initial state, sequence of unitaries and the final POVM measurement that defines the circuit. For general quantum circuits, the negativity can grow exponentially in both the number of qudits and the depth of the circuit.
A key application of this approach is to Clifford circuits. In odd dimensions, stabilizer states, Clifford gates, and measurements in the computational basis do not contribute to the negativity¹⁰ of a Monte Carlo based estimator. Including product-state preparations or measurements that are not stabilizer states, or non-Clifford gates such as the T gate, may contribute to the negativity of the circuit. Nonetheless, these non-Clifford operations can be accommodated within the poly-box provided that the total negativity is bounded polynomially. In addition, a poly-box exists for such circuits even in the case where the negativity of the initial state, or of the measurement, is exponential [27].

A poly-box over C PROD
As a nontrivial example of a class of Clifford circuits for which there exists a poly-box, consider the family of circuits C_PROD. This family consists of quantum circuits with an n-qubit input state ρ that is an arbitrary product state¹¹ (with potentially exponential Wigner function negativity [27] in the input state). The allowed transformations are non-adaptive Clifford unitary gates, and k ≤ n qubits are measured at the end of the circuit in the computational basis. Such a circuit family has been considered by Jozsa and Van den Nest [41], where it was referred to as INPROD, OUTMANY, NON-ADAPT. This circuit family will be discussed again in Sec. 5, where we will show the classical hardness of simulating this family according to another notion of simulation. Aaronson and Gottesman [2] provide the essential details of a poly-box for this family of circuits; for completeness, we present an explicit poly-box for C_PROD in the following lemma.

Lemma 1. A classical poly-box exists for the Clifford circuit family C_PROD.
Proof. Given an arbitrary circuit c = {ρ, U, M} ∈ C_PROD and an event S ∈ {0, 1, •}^n, we construct an estimator p̂_s of the probability P(S) as follows:
1. Let Π = ⊗_{i=1}^{n} Π_i be the projector corresponding to S. Here, we set Π_i = |0⟩⟨0| if S_i = 0, Π_i = |1⟩⟨1| if S_i = 1, and Π_i = I if S_i = •.

¹⁰ With respect to either the phase point operator or stabilizer states choice of frame.
¹¹ As an additional technical requirement, we impose that the input product state is generated from |0⟩^⊗n by the application of polynomially many gates from a universal single-qubit gate set with algebraic entries.

2. For each i where the i-th entry of S is 0 or 1, we use the decomposition Π_i = (I + (−1)^{S_i} Z)/2. In these cases, define a local Pauli operator P_i by sampling either I or (−1)^{S_i} Z with equal probability. For each i where the i-th entry of S is a •, we deterministically set P_i = I.
3. We construct the n-qubit Pauli operator P := ⊗_{i=1}^{n} P_i (including its sign ±).

4. Using the stabilizer formalism, we efficiently compute the Heisenberg-evolved operator U†PU, which is again a (signed) tensor product of Pauli operators.
5. We compute the single-sample estimate p̂_1 using the equation p̂_1 = Tr(U†PU ρ), which factorizes over the product state ρ and can hence be evaluated efficiently.
6. We compute the estimator p̂_s by computing s independent single-sample estimates and taking their average.
It is straightforward to show that the expectation value of p̂_s is the target quantum probability p := P(S). Further, the single-sample estimates are bounded in the interval [−1, 1]. Hence, by the Hoeffding inequality,

Pr(|p̂_s − p| ≥ ε) ≤ 2 exp(−sε²/2).

This algorithm can be executed efficiently in s and in n, and produces additive polynomial precision estimates of P(S) for any circuit c ∈ C_PROD and any S ∈ {0, 1, •}^n; it is thus a poly-box.
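For intuition, a toy instance of this Pauli-sampling estimator can be sketched in the trivial case U = I, where no stabilizer propagation is needed and Tr(Pρ) factorizes over the product state. The helper names and test values here are our own:

```python
import random

def sample_estimate(z_expectations, S):
    """One single-sample estimate p_hat_1 = Tr(P rho) for the toy case U = I.
    For each i with S_i in {0, 1}, P_i is sampled as I or (-1)^{S_i} Z with
    equal probability (from Pi_i = (I + (-1)^{S_i} Z) / 2); for S_i = '.',
    P_i = I. Over a product state, the trace factorizes qubit by qubit."""
    value = 1.0
    for z_exp, s in zip(z_expectations, S):
        if s == ".":
            continue                      # P_i = I: factor Tr(rho_i) = 1
        if random.random() < 0.5:
            continue                      # sampled P_i = I: factor 1
        sign = 1.0 if s == "0" else -1.0
        value *= sign * z_exp             # sampled P_i = +/- Z: factor +/- <Z_i>
    return value

def estimate(z_expectations, S, samples):
    """Average of independent single-sample estimates (the estimator p_hat_s)."""
    return sum(sample_estimate(z_expectations, S) for _ in range(samples)) / samples

# one qubit with <Z> = 0.6, event S = "0": true probability (1 + 0.6)/2 = 0.8
random.seed(1)
print(abs(estimate([0.6], "0", 20000) - 0.8) < 0.05)  # estimator concentrates
```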

A poly-box over C IQP
As an additional example, we note that C_IQP, the family of Instantaneous Quantum Polynomial-time (IQP) quantum circuits [46,47,8,9], which consist of computational basis preparations and measurements with all gates diagonal in the X basis, admits a poly-box. One can construct such a poly-box over C_IQP by noting that Proposition 5 from Ref. [39] gives a closed-form expression for all Born rule probabilities and marginals of these circuits. This expression:

P(S) = E_r [ Re((−1)^{r·s} α(P_r, π/4)) ]

is an expectation value over 2^k vectors r ∈ Z_2^n, where:
• {i_1, . . . , i_k} is the set of indices where the entries of S are in {0, 1};
• r is uniformly distributed over span{e_i | i ∈ {i_1, . . . , i_k}};
• s ∈ Z_2^n is defined by s_i = S_i when i ∈ {i_1, . . . , i_k} and s_i = 0 otherwise;
• P_r is the affinification of the m × n binary matrix P, which defines a Hamiltonian of the IQP circuit constructed from Pauli X operators;
• α(P, θ) is the normalized version of the weight enumerator polynomial (evaluated at e^{−2iθ}) of the code generated by the columns of P.
We note that this is an expectation over exponentially many terms whose real parts are bounded in the interval [−1, 1]. Further, for each r, the quantity α(P_r, π/4) can be evaluated efficiently using Vertigan's algorithm [48] and Ref. [39]. As such, one can construct an additive polynomial precision estimator for all Born rule probabilities and marginals simply by evaluating the expression:

p̂_1 = Re((−1)^{r·s} α(P_r, π/4))  (12)

for polynomially many independent, uniformly randomly chosen r ∈ span{e_i | i ∈ {i_1, . . . , i_k}} and computing the average over all choices. This can be shown to produce a poly-box by application of the Hoeffding inequality.

From estimation to simulation
Given the significance of ε-simulation as the notion that minimally preserves computational power, we now turn our attention to the construction of efficient algorithms for lifting a poly-box to an ε-simulator. We give strong evidence that in the general case, such a construction is not possible. This suggests that a poly-box is statistically distinguishable from an ε-simulator and hence computationally less powerful. However, by restricting to a special family of quantum circuits, we show an explicit algorithm for lifting a poly-box to an ε-simulator. Combined with Theorem 2, this shows that within this restricted family, a poly-box is computationally equivalent to an ε-simulator. The significance of ε-simulation also motivates the need to understand its relationship to other simulators defined in terms of Born probability estimation. At the end of this section and in Appendices B and C, we present two algorithms that lift an estimator of probabilities and marginals to a sampler.

A poly-box is not sufficient for ε-simulation
This section focuses on the relation between poly-boxes and ε-simulation. With a poly-box, one can efficiently estimate Born rule probabilities of outcomes of a quantum circuit with additive precision. However, assuming BQP ≠ BPP, a poly-box alone is not a sufficient computational resource for ε-simulation. We illustrate this using a simple but somewhat contrived example, wherein an encoding into a large number of qubits is used to obscure (from the poly-box) the computational power of sampling.
Define a family of quantum circuits C_e using a universal quantum computer as an oracle as follows:
1. take as input a quantum circuit description a ∈ A* (this is a description of some quantum circuit with n qubits);
2. call the oracle to output a sample outcome from this quantum circuit, and label the first bit of the outcome by X;
3. sample an n-bit string Y ∈ {0, 1}^n uniformly at random;
4. output the (n + 1)-bit string (X ⊕ f(Y), Y), where f(Y) := Y_1 ⊕ · · · ⊕ Y_n is the parity function on the input bit-string Y.
We note that C_e cannot admit an ε-simulator unless BQP ⊆ BPP, since simple classical post-processing reduces the ε-simulator over C_e to an ε-simulator over universal quantum circuits restricted to a single-qubit measurement.
We now show that C_e admits a poly-box:
1. take as input a ∈ A*, ε, δ > 0 and S ∈ {0, 1, •}^{n+1}. Our poly-box will output probability estimates that are deterministically within ε of the target probabilities, and hence we can set δ = 0;
2. if S specifies a marginal probability, i.e. k < n + 1 (where k is the number of non-marginalized bits in S), then the poly-box outputs the estimate 2^{−k}; otherwise,
(a) small-ε case: if ε < 2^{−n}, explicitly compute the quantum probability p := Pr(X = 1) and output the resulting exact outcome probability;
(b) large-ε case: if ε ≥ 2^{−n}, output the probability 2^{−(n+1)} as a guess.
This algorithm is not only a poly-box over C_e; in fact, it outputs probability estimates with exponentially small error.

Lemma 2.
For all a ∈ A*, ε > 0 and S ∈ {0, 1, •}^{n+1}, the above poly-box can output estimates within additive error ε of the target probability using O(poly(n, 1/ε)) resources. Further, the absolute difference between estimated and target probabilities will be ≤ min{2^{-(n+1)}, ε}.

Proof. We first note that the resource cost of this algorithm is O(poly(n, 1/ε)): in the small-ε case it is O(poly(2^n)) ⊆ O(poly(1/ε)), and in the large-ε case it is O(n).
We now consider the machine's precision, treating the case with no marginalization and the case with marginalization separately. We restrict the discussion below to the large-ε case, as the estimates are exact in the small-ε case.
Let z = (z_0, . . . , z_n) ∈ {0, 1}^{n+1} be fixed and define z̄ := (z_1, . . . , z_n). Then p(z) = 2^{-n} Pr(X ⊕ g(z̄) = z_0). So for S = z (i.e. no marginalization), we have an error given by max_z |p(z) − 2^{-(n+1)}| = 2^{-n} max_z |Pr(X ⊕ g(z̄) = z_0) − 1/2| ≤ 2^{-(n+1)}. For the case where S_i = • (i.e. there is marginalization over the i-th bit only and k = n), we note that the quantum marginal probability p(S) is given exactly by p(S) = Σ_{z_i ∈ {0,1}} p(z), where z_j := S_j for j ≠ i; since flipping any single bit of (z_0, z̄) flips whether X ⊕ g(z̄) = z_0 is satisfied, the two terms sum to exactly 2^{-n}. Iterating this argument implies that for all k < n + 1, the quantum marginal probability is exactly 2^{-k}. Thus, in the worst case (no marginalization and ε ≥ 2^{-n}), the error is ≤ 2^{-(n+1)}.
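The claim that every proper marginal equals 2^{-k} can be checked by brute force on a toy instance. This is our own check: `p` plays the role of Pr(X = 1), and we assume, as in the definition of C_e, that the output is (X ⊕ g(Y), Y) with g the parity function:

```python
from fractions import Fraction
from itertools import product

def ce_output_distribution(n, p):
    """Exact output distribution of C_e over (n+1)-bit strings,
    where p = Pr(X = 1) and the output is (X XOR parity(Y), Y)."""
    dist = {}
    for y in product([0, 1], repeat=n):
        g = sum(y) % 2                       # parity of Y
        for x, px in ((0, 1 - p), (1, p)):
            z = (x ^ g,) + y
            dist[z] = dist.get(z, Fraction(0)) + px * Fraction(1, 2 ** n)
    return dist

def marginal(dist, S):
    """Probability that the outcome matches pattern S over {'0','1','*'}."""
    return sum(q for z, q in dist.items()
               if all(s == '*' or int(s) == b for s, b in zip(S, z)))
```

For any choice of `p`, every pattern with k fixed bits (k < n + 1) evaluates to exactly 2^{-k}, while the full-outcome probabilities depend on `p`.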
This example clearly demonstrates that the existence of a poly-box for a class of quantum circuits is not sufficient for ε-simulation. In the following, we highlight the role of the sparsity of the output distribution in providing, together with a poly-box, a sufficient condition for ε-simulation.

Sparsity and sampling
Despite the fact that in general the existence of a poly-box for some family C does not imply the existence of an ε-simulator for C, for some quantum circuit families a poly-box does suffice. Here, we show that one can construct an ε-simulator for a family of quantum circuits C provided that there exists a poly-box over C and that the family of probability distributions corresponding to C satisfies an additional constraint on the sparsity of possible outcomes. We begin by reviewing several results from Schwarz and Van den Nest [30] regarding sparse distributions. In Ref. [30], they define the following property of discrete probability distributions:

Definition 5. (ε-approximately t-sparse). A discrete probability distribution is t-sparse if at most t outcomes have a non-zero probability of occurring. A discrete probability distribution is ε-approximately t-sparse if it has an L_1 distance less than or equal to ε from some t-sparse distribution.
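Because the closest t-sparse (normalized) distribution keeps the t largest masses and moves the remaining tail mass onto the kept support, the minimal L_1 distance works out to twice the tail mass. A small checker (our own helper, not from Ref. [30]):

```python
def l1_distance_to_t_sparse(probs, t):
    """Minimal L1 distance from the distribution `probs` to any
    (normalized) t-sparse distribution: keep the t largest masses
    and move the tail mass onto them, costing 2 * (tail mass)."""
    top_mass = sum(sorted(probs, reverse=True)[:t])
    return 2 * (1 - top_mass)

def is_approx_t_sparse(probs, t, eps):
    """True iff `probs` is eps-approximately t-sparse (Definition 5)."""
    return l1_distance_to_t_sparse(probs, t) <= eps
```

For example, the distribution (1/2, 1/4, 1/8, 1/8) is (1/2)-approximately 2-sparse but not (1/4)-approximately 2-sparse.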
The lemma below is a (slightly weakened) restatement of Theorem 11 from Ref. [30].

Lemma 3. (Theorem 11 of Ref. [30]). Let P be a distribution on {0, 1}^k that satisfies the following conditions: 1. P is promised to be ε-approximately t-sparse, where ε ≤ 1/6; 2. for every S ∈ {0, 1, •}^k, there exists an efficient randomized classical algorithm for sampling from an additive polynomial estimator of P(S). Then it is possible to classically sample from a probability distribution P' ∈ B(P, 12ε + δ) efficiently in k, t, ε^{-1} and log δ^{-1}.
We note that for every discrete probability distribution P, there is some unique minimal function t(ε) such that for all ε ≥ 0, P is ε-approximately t(ε)-sparse. If this function is upper-bounded by a polynomial in ε^{-1}, then a randomized classical algorithm for sampling from estimators of P(S) can be extended to a randomized classical algorithm for sampling from some probability distribution P' ∈ B(P, ε) efficiently in ε^{-1}. This fact motivates the following definition: Definition 6. (poly-sparse) Let P be a discrete probability distribution. We say that P is poly-sparse if there exists a polynomial P(x) such that for all ε > 0, P is ε-approximately t-sparse whenever t ≥ P(1/ε).
Let P be a family of probability distributions with P_a ∈ P a distribution over {0, 1}^{k_a}. We say that P is poly-sparse if there exists a polynomial P(x) such that for all ε > 0 and a ∈ A*, P_a is ε-approximately t-sparse whenever t ≥ P(k_a/ε).
The notion of poly-sparsity is related to the smooth max-entropy H^ε_max. In particular, P is poly-sparse iff there exists a polynomial P(x) such that for every P ∈ P with domain cardinality 2^n, we have H^{ε/2}_max(P) ≤ log_2 P(n/ε), where H^ε_max(P) := inf_{P'} log_2 |Supp(P')|, |Supp(P')| is the cardinality of the support of the distribution P', and the infimum is taken over all distributions P' subject to (1/2)||P' − P||_1 ≤ ε. This notion was first defined in Ref. [49], where it corresponds to the ε-smooth Rényi entropy of order α = 0.

Conditions for ε-simulation
With this notion of output distributions that are poly-sparse, we are in a position to state our main theorem of this section:

Theorem 3. Let C be a family of quantum circuits with a corresponding family of probability distributions P. Suppose there exists a poly-box over C, and that P is poly-sparse. Then there exists an ε-simulator of C.
Proof. Let a ∈ A* and ε > 0 be arbitrary. Then there exists t = t(a, ε) such that P_a is ε-approximately t-sparse. Further, due to the existence of the efficient classical poly-box over C, for all S ∈ {0, 1, •}^{k_a}, there exists an (s, k_a)-efficient randomized classical algorithm for sampling from an additive polynomial estimator of P_a(S). Thus by Lemma 3, it is possible to classically sample from a probability distribution P'_a ∈ B(P_a, ε) efficiently in ε^{-1}, t and k_a. We note that here we have removed the dependence on δ since we can make δ ≤ ε whilst remaining efficient in ε^{-1}, t and k_a. Finally, since poly-sparsity guarantees the existence of a t(a, ε) that is upper-bounded by a polynomial in k_a/ε, we arrive at the desired result.
As an example, consider families of quantum circuits C where each circuit of size n can only produce outcomes from some set of size at most poly(n). Then C is poly-sparse (even if the output distributions are uniform over the poly(n)-sized support). Hence, if C also admits a poly-box, then by Thm. 3 one can with high probability repeatedly sample from this space of poly(n) outcomes hidden within an exponentially large space of bit-strings.
We have shown that having a poly-box and a poly-sparsity guarantee for a family of quantum circuits gives us an ε-simulator. We emphasize that the proof of this theorem is constructive, and it yields new simulation results for families of quantum circuits not previously known to be efficiently simulable. As an example, our results can be straightforwardly used to show that Clifford circuits with sparse outcome distributions and with small amounts of local unitary (non-Clifford) noise, as described in Ref. [23], are ε-simulable.
Theorem 3 requires a promise of poly-sparsity. Since this is a property of infinite families of probability distributions, one cannot hope to algorithmically verify (or even falsify) it through sampling from member distributions. Nevertheless, for distributions generated by some particular family of quantum circuits, a proof that this property holds may be possible.
In summary, the results of Thms. 2 and 3 imply that in order to construct an ε-simulator of any particular family of quantum circuits, it is necessary to construct a poly-box, and further, if the family is poly-sparse, this is also sufficient. In Sec. 4.1, we also showed that there exists a somewhat artificial family of quantum circuits C_e with respect to which a poly-box is insufficient for ε-simulation. In the next section, we show that this phenomenon also occurs for much more natural families of quantum circuits.

On lifting stronger estimators to approximate samplers
In contrast to poly-boxes, certain stronger notions of simulation based on Born rule probability estimation can be lifted to ε-simulators (or even stronger approximate weak simulators). In Appendices B and C we present two such efficient classical algorithms.
The algorithm presented in Appendix C uses an estimator with multiplicative precision to construct an ε-simulator (in fact, it constructs an approximate weak simulator in the stronger sense of Ref. [35]). This algorithm exploits the fact that a ratio of multiplicative-precision estimates is itself a multiplicative-precision estimate in order to sample sequentially, one qubit's measurement outcome at a time, from the marginal probability of the next qubit's outcome conditioned on the previously sampled outcomes. This algorithm and its variants have been presented in Refs. [3,4,35] and are well known within the quantum-circuit-simulation community.
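A sketch of this sequential procedure, with an exact marginal oracle `marginal` standing in for the multiplicative-precision estimator (our own illustration; the estimator version has the same structure):

```python
import random

def chain_rule_sample(n, marginal, rng=random):
    """Sample an n-bit outcome one bit at a time.

    `marginal(prefix)` returns Pr(first bits = prefix); the conditional
    Pr(next bit = 1 | prefix) is the ratio of two marginals, which is
    why multiplicative-precision estimates of the marginals suffice.
    """
    prefix = []
    for _ in range(n):
        p_prefix = marginal(tuple(prefix))
        p_one = marginal(tuple(prefix + [1]))
        cond = p_one / p_prefix if p_prefix > 0 else 0.0
        prefix.append(1 if rng.random() < cond else 0)
    return tuple(prefix)
```

With multiplicative-precision estimates in place of the exact oracle, each conditional is still multiplicatively accurate, so the n-step product stays close to the target probability.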
The algorithm presented in Appendix B uses an estimator with exponentially small additive precision to construct an ε-simulator (in fact, it constructs an approximate weak simulator in the stronger sense of Ref. [34]). This algorithm maps a bit-string r (approximately representing a uniformly sampled point from the unit interval) to a bit-string representing the outcome of running the circuit. Such a mapping is defined for every ordering of the measurement outcomes. The algorithm makes intuitive use of marginal probability estimates to perform a binary search for the measurement outcome corresponding to r. This technique avoids computing ratios of probability estimates, making it useful in regimes where additive errors are small but larger than some of the probabilities in the target distribution. Hence, this algorithm has some advantages over that of Appendix C. In particular, it can be used to lift an additive-precision-ε estimator to a sampler from within L_1 distance O(2^n ε). This can be used to construct an ε-simulator in certain cases where the algorithm in Appendix C would fail: for example, when one has access to an estimator with additive precision ε = 2^{-n} κ, where κ > 0 can be made arbitrarily small in run-time O(poly(n, 1/κ)).
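The binary-search mapping can likewise be sketched with an exact marginal oracle in place of the additive-precision estimator (our own illustration); `r` plays the role of the uniformly sampled point of the unit interval:

```python
import random

def inverse_cdf_sample(n, marginal, rng=random):
    """Map a uniform point of [0, 1) to an n-bit outcome by descending
    the binary tree of prefixes, one marginal query per level."""
    r = rng.random()
    prefix = []
    for _ in range(n):
        p_zero = marginal(tuple(prefix + [0]))
        if r < p_zero:
            prefix.append(0)            # outcome lies in the 0-subtree
        else:
            r -= p_zero                 # skip the 0-subtree's mass
            prefix.append(1)
    return tuple(prefix)
```

No ratios are taken anywhere: additive errors in the queried marginals only shift the cell boundaries of this search, which is the source of the O(2^n ε) bound quoted above.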

Hardness results
In the previous section, we have shown that one can construct an ε-simulator for a family of quantum circuits C given a poly-box for this family together with a promise of poly-sparsity of the corresponding probability distribution. We also discussed a contrived construction of a family of quantum circuits that admits a poly-box but is not ε-simulable (unless BQP = BPP). In this section, we provide strong evidence (dependent only on standard complexity assumptions and a variant of the now somewhat commonly used [7,9,10,11,12,36,16] "average case hardness" conjecture) that a condition such as poly-sparsity is necessary even for natural families of quantum circuits. One such family has already been identified by noting that C_IQP admits a poly-box and is likely hard to ε-simulate [9]. Here, we also show the likely hardness of ε-simulating the non-poly-sparse Clifford circuit family C_PROD (defined in Sec. 3). These results mean that at least two (and possibly more) of the intermediate models of quantum computing have the property that the probabilities of individual outcomes and marginals can be estimated to 1/poly(n) additive error but, due to non-sparsity, their ε-simulability is implausible.
Our hardness result for classical ε-simulation of C_PROD closely follows the structure of several similar results, in particular that of the IQP circuits result of Ref. [9]. We note that this hardness result is implied by the hardness results presented in Refs. [10,11]; however, our proof is able to use a more plausible average case hardness conjecture than these references, owing to the fact that we prove hardness of ε-simulation rather than hardness of the yet weaker notion of approximate weak simulation employed by these references.
Despite the existence of a poly-box over C_PROD, we show that there cannot exist a classical ε-simulator of this family unless the average case hardness conjecture fails or the polynomial hierarchy collapses to the third level. We note that the hardness of exact weak simulation of C_PROD was shown in Ref. [41]; in contrast, here we show the hardness of ε-simulation for this family. Our proof relies on a conjecture regarding the hardness of estimating Born rule probabilities to within a small multiplicative factor for a substantial fraction of randomly chosen circuits from C_PROD. This average case hardness conjecture is a strengthening of the worst case hardness of multiplicative precision estimation of probabilities associated with circuits from C_PROD.
The hardness of ε-simulating C_PROD circuits is shown by first noting that the existence of a classical ε-simulator implies, via the Stockmeyer approximate counting algorithm [50], the existence of an algorithm (in the third level of the PH) for estimating the probabilities associated with the output distribution of the ε-simulator to within a multiplicative factor. These estimates can then be related to estimates of the exact quantum probabilities by noting two points: 1. the deviation between the ε-simulator's probability of outputting a particular outcome and the exact quantum probability is exponentially small for the vast majority of outcomes; we show this fact using Markov's inequality.
2. a significant portion of outcomes associated with a randomly chosen circuit in C_PROD must have outcome probabilities larger than a constant fraction of 2^{-n}; we show this property by proving that these circuits anti-concentrate.
These observations are combined to show that if there exists an ε-simulator of C_PROD, then there exists a classical algorithm (in the third level of the PH) that can estimate Born rule outcome probabilities to within a multiplicative factor for almost 50% of circuits sampled from C_PROD. This contradicts Conjecture 1, implying that an ε-simulator does not exist.
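Point 1 can be checked numerically on a toy pair of distributions (our own construction): if ||P' − P||_1 ≤ ε over n-bit strings, Markov's inequality bounds by β the fraction of outcomes that deviate by ε·2^{-n}/β or more:

```python
import random

def deviation_fraction(p, q, threshold):
    """Fraction of outcomes x with |p_x - q_x| >= threshold."""
    return sum(1 for a, b in zip(p, q) if abs(a - b) >= threshold) / len(p)

n = 8
N = 2 ** n
rng = random.Random(1)
# Build q as an eps-sized L1 perturbation of the uniform distribution p.
p = [1.0 / N] * N
noise = [rng.uniform(-1, 1) for _ in range(N)]
shift = sum(noise) / N
noise = [v - shift for v in noise]        # zero-sum perturbation
eps = 0.1
scale = eps / sum(abs(v) for v in noise)  # normalize the L1 size to eps
q = [a + v * scale for a, v in zip(p, noise)]
assert abs(sum(q) - 1) < 1e-12 and min(q) > 0
assert abs(sum(abs(a - b) for a, b in zip(p, q)) - eps) < 1e-12
beta = 0.25
# Markov: at most a beta fraction deviates by >= eps * 2^-n / beta.
assert deviation_fraction(p, q, eps / (N * beta)) <= beta
```

The bound is, of course, far from tight for this smooth perturbation; the point is only that it holds for any P' within L_1 distance ε.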

Conjecture regarding average case hardness
We begin by stating our conjecture that multiplicative precision estimation of Born rule probabilities for C_PROD is #P-hard in the average case.

Conjecture 1.
There exists an input product state ρ over n qubits such that, given a uniformly random Clifford unitary U acting on n qubits, estimating p := tr(U ρ U† |0⟩⟨0|) to within a multiplicative error of 1/poly(n) for 49% or more of the sampled Clifford unitaries is #P-hard.
We note that this average case hardness conjecture has an analogous worst case hardness version. The worst case hardness can be proven by applying the results of Refs. [40,41,42,43,44] and by an argument essentially identical to the proof of Theorem 5.1 in Ref. [16]. We omit the proof here but note that it relies on three key facts: 1. estimating Born rule probabilities for universal (indeed, even IQP) circuits that use a gate set with algebraic entries, to within any multiplicative factor in the open interval (1, √2), is #P-hard [44,42]; 2. for gate sets with algebraic entries, all non-zero output probabilities are lower-bounded by some inverse exponential [43]; 3. C_PROD circuits with post-selection (or adaptivity) are universal for quantum computation [40,41].

Anti-concentration of outcomes for C PROD
Next, we prove that Clifford circuits chosen uniformly at random from the family C_PROD satisfy an anti-concentration property. Specifically, for every n-qubit pure product state ρ, every outcome x ∈ {0, 1}^n and every α ∈ (0, 1):

Pr_U (p_x > α 2^{-n}) ≥ (1 − α)^2 / 2,    (16)

where p_x := tr(U ρ U† |x⟩⟨x|) is the Born rule probability for the outcome x.
Proof. We use the unitary 2-design property of the Clifford group. For a pure input state ρ,

E_U[(U ρ U†)^{⊗2}] = 2 P_Sym / (2^n (2^n + 1)),    (17)

where P_Sym = (1/2)(1 + SWAP) is the projection onto the symmetric subspace of C^{2^n} ⊗ C^{2^n}. Taking the overlap with |x⟩⟨x|^{⊗2} gives

E_U[p_x] = 2^{-n},  E_U[p_x^2] = 2 / (2^n (2^n + 1)).    (18)

We use the Paley-Zygmund inequality, which states that for a non-negative random variable R with finite variance, and for any α ∈ (0, 1):

Pr(R > α E[R]) ≥ (1 − α)^2 E[R]^2 / E[R^2].

Application of this inequality with Eqs. (17)-(18) then gives the desired result, since (1 − α)^2 (2^n + 1)/2^{n+1} ≥ (1 − α)^2 / 2.
We point out that the property of anti-concentration is inconsistent with poly-sparsity. This result is shown in Theorem 7 of Appendix D.

Hardness theorem
We are now in a position to prove our main theorem.

Theorem 4. Assuming Conjecture 1, there is no ε-simulator of C_PROD unless the polynomial hierarchy collapses to its third level.

Proof. Assuming there exists an ε-simulator of C_PROD, we can treat the ε-simulator as a deterministic Turing machine with a random input. Let T be the Turing machine that takes as input ε > 0 (representing the required L_1 error), r ∈ {0, 1}^{poly(n/ε)} (representing the random bit-string) and d_c ∈ A^{poly(n)} (representing an efficient description of an n-qubit circuit c ∈ C_PROD) and outputs an outcome X ∈ {0, 1}^k with the correct statistics (over uniformly random r inputs) up to ε in L_1 distance, in time poly(n, 1/ε). That is, the output satisfies:

Σ_{x ∈ {0,1}^k} |p'_x − p_x| ≤ ε,

where p_x := Pr(X = x) is the probability of observing outcome x on a single run of the quantum circuit c and p'_x := Pr_{r∼unif}(X = x) is the probability of observing outcome x on a single run of the Turing machine T for a uniformly distributed random r and fixed ε, d_c inputs. We now note that the problem of computing the proportion p'_x of bit-strings r that result in T(ε, r, d_c) = x is a problem in #P. Thus, the Stockmeyer algorithm gives us a means of estimating p'_x to within a multiplicative error in the complexity class FBPP^NP.
More precisely, there exists an algorithm in FBPP^NP which will output an estimate p̂_x such that:

|p̂_x − p'_x| ≤ p'_x / poly(n).

Thus we have that for all c and for all x:

|p̂_x − p_x| ≤ p'_x / poly(n) + |p'_x − p_x|.

We note that the expectation value of |p'_x − p_x| over a random choice of x ∼ unif({0, 1}^k) is upper-bounded by ε 2^{-k}. That is:

E_{x∼unif} |p'_x − p_x| = 2^{-k} Σ_x |p'_x − p_x| ≤ ε 2^{-k}.

Restricting our attention to circuits in C_PROD where all of the qubits are measured, i.e. k = n, we have:

E_{x∼unif} |p'_x − p_x| ≤ ε 2^{-n}.    (22)

We apply Markov's inequality, which states that for R a non-negative random variable and γ > 0, Pr(R ≥ γ) ≤ E[R]/γ. Applying this to the upper bound in Eq. (22), we find that for all β > 0:

Pr_{x∼unif}(|p'_x − p_x| ≥ ε 2^{-n}/β) ≤ β.

For any fixed choices of α ∈ (0, 1) and β, ε > 0, let us define the following events: E_ac, the event (over the random choice of U) that p_x > α 2^{-n}, and E_est, the event (over the random choice of x) that |p'_x − p_x| < ε 2^{-n}/β. By Eq. (16), we have Pr(E_ac) ≥ (1 − α)^2/2. This immediately implies the following:

Pr(E_ac ∧ E_est) ≥ (1 − α)^2/2 − β.    (28)

When both events hold, the estimate p̂_x achieves multiplicative error

|p̂_x − p_x| / p_x ≤ 1/poly(n) + (1 + 1/poly(n)) ε/(α β).

This can be further simplified by incorporating the randomness over x into the uniform randomness over the Clifford unitaries. Specifically, let y ∈ {0, 1}^n be arbitrarily fixed and let U_x := ⊗_{i=1}^n X^{x_i}. Then, noting that for all n-qubit Cliffords V, U_x V is again a Clifford unitary and tr(U_x V ρ V† U_x† |y⟩⟨y|) = tr(V ρ V† |y ⊕ x⟩⟨y ⊕ x|), a uniformly random x combined with a uniformly random V induces a uniformly random Clifford U = U_x V, where probabilities over U_x are chosen uniformly over all x ∈ {0, 1}^n. Applying this to Eq. (28), we find that for all y ∈ {0, 1}^n and for all n-qubit product states ρ, the estimate p̂_y is multiplicatively accurate with probability at least (1 − α)^2/2 − β over the choice of U. We recall that for an ε-simulator, ε > 0 can be made polynomially small efficiently in run-time and n. Thus, as an example, we may assign the scaling α = β = 1/n and ε = 1/n^3, for which ε/(α β) = 1/n while (1 − α)^2/2 − β → 1/2 as n → ∞. This argument shows that the existence of an ε-simulator of C_PROD implies that there exists an algorithm in ∆^p_3 that can, for any fixed product state ρ and measurement outcome x ∈ {0, 1}^n, output an O(1/n) multiplicative precision estimate of p_x := tr(U ρ U† |x⟩⟨x|) for almost 50% of uniformly randomly chosen Clifford unitaries U acting on n qubits. By Conjecture 1, this is #P-hard, implying that a #P-hard problem is solved in FBPP^NP. By Toda's theorem [51], this collapses the polynomial hierarchy to its third level.

Discussion
There is a substantial and growing body of results showing the classical "simulability" of some quantum computers and the hardness of "simulating" others. We hope that the results presented here will inform the interpretation of this literature when comparing the computational power of the relevant quantum computer to that of a universal classical computer. For some family of quantum circuits C, these results typically make statements of one of two forms: • Simulability: C can be classically "simulated"; or • Hardness: classically "simulating" C implies some implausible consequence. In the case of simulability proofs, our results show that whenever the notion of simulation used is stronger than or equivalent to ε-simulation, the useful computational power of C is contained within classical. Further, if the notion of simulation is a poly-box (a weaker notion than ε-simulation), this still applies provided that C is poly-sparse. If C is not known to be poly-sparse but admits a poly-box, then we can still conclude that, without non-trivial classical post-processing, C is incapable of solving decision problems outside of the complexity class BPP.
In the case of hardness proofs, our results show that whenever the notion of simulation used is weaker than or equivalent to ε-simulation, it is plausible that the useful computational power of C is beyond classical. However, for proofs of hardness based on weaker notions of simulation, it may be possible to alter the proof such that it shows the hardness of ε-simulation (rather than a yet weaker notion), with the added benefit that the hardness becomes more plausible.
Some hardness results show the implausibility of classically simulating C with respect to a notion of simulation much stronger than ε-simulation. Even if quantum computers can reliably achieve such a notion of simulation, these results cannot be seen as ruling out the existence of efficient classical devices that can be used as a perfectly good computational substitute for C.
The perspective of efficient indistinguishability gives us a natural avenue to defining the set of all problems solvable by a quantum device. We have seen that the minimal notion of simulation to achieve this is ε-simulation, a significantly weaker notion of simulation than many of the notions used in the literature [34,35,8,14]. Thus, the gap between classical and quantum computational power can be closed not only by the development of more powerful classical simulation algorithms but also by significantly reducing the computational hurdle classical devices must overcome in order to act as efficient substitutes for quantum computers. Our results exploit this feature in order to show that any family of quantum circuits that both admits a poly-box and satisfies the poly-sparsity condition can be ε-simulated. The existence of multiple known constructions of poly-boxes (see Refs. [2,45,27,22,23]) over restricted families of quantum circuits, and in particular Ref. [27], demonstrates the significant advantages offered by weakening the minimal requirements on classical simulators from the stronger notions of weak simulation to that of ε-simulation.
For any given family of quantum circuits, poly-sparsity can be trivially guaranteed by upper-bounding the number of measured qubits by log n. However, the condition of poly-sparsity permits significantly more complex probability distribution families (including families with exponentially growing support). Future exploration of how to non-trivially guarantee poly-sparsity offers yet more potential for identifying interesting families of quantum circuits that are ε-simulable using the techniques outlined here.
In this paper, we have argued that ε-simulation minimally captures computational power. However, the term "minimally" is relative to the computational power of the referee, which is unbounded in the setting we considered. This raises the importance of future work aimed at defining the notion of simulation which minimally captures the computational power of a quantum computer with respect to a referee that is computationally bounded by universal quantum computation (or equivalent). In light of this observation, our work suggests that even requiring a simulator to be capable of solving all sampling problems (as defined in Ref. [37]) solvable by the quantum device is too strong to be minimal (w.r.t. a universal-quantum-bounded referee). Future results in this direction would inform us on precisely how to further weaken the notion of sampling problems and to define a yet weaker complexity class than SampBQP (or more generally SampC) that (w.r.t. a universal-quantum-bounded referee) minimally captures computational power.
In an experimental setting where there is a constant lower bound to the noise present in the quantum device, the minimal requirements for efficient indistinguishability become yet weaker. In this setting, it is plausible that for IQP circuits and boson sampling circuits, classical computation can achieve the minimal requirements for efficient indistinguishability w.r.t. a universal quantum bounded referee. This possibility is supported by the existence of classical algorithms for simulating noisy IQP circuits [24] and noisy boson sampling circuits [25]. In the constant lower bounded noise setting, these algorithms fail to achieve efficient indistinguishability w.r.t. a computationally unbounded referee. However, whether or not they achieve efficient indistinguishability w.r.t. a universal quantum bounded referee remains a question to be resolved.
Aiming to tighten the separation between simulability and hardness is an important goal toward a deeper understanding of the computational power of quantum versus classical circuits. Specifically, the aim is to move towards a full classification of simulability by gradually reducing the "unclassified" space (of parameters describing a quantum computer that are both outside the range that ensures simulability and outside the range that ensures hardness of simulation). By focusing on the tension between anti-concentration and poly-sparsity, our work has made modest progress in this direction, with potential for further consolidation and progress with respect to this aim.
We have shown the poly-sparsity and anti-concentration properties to be mutually exclusive. If we assume that the polynomial hierarchy does not collapse and restrict to quantum computers that admit a poly-box and average case hardness (some plausible candidates being C_IQP, C_PROD, and their poly-sparse restricted counterparts), we see that either poly-sparsity holds, ensuring ε-simulability, or anti-concentration holds, ensuring hardness.
For general quantum circuit families, poly-sparsity and anti-concentration are not exhaustive. Future work directed towards finding interesting spaces of quantum computers where the two notions are exhaustive would help to classify more of the yet-unclassified computers in Fig. 1 (those admitting a poly-box and not poly-sparse). Restricted to this setting, all quantum computers that admit the appropriate average case hardness property would admit a hardness proof. Further, such work could give a much-needed new perspective on the peculiar nature of the transition from ε-simulability to hardness that the IQP and magic-state-injected Clifford circuit families undergo as they transition from poly-sparse to non-poly-sparse. In particular, this may shed light on whether this behavior (shared by IQP, magic-state-injected Clifford circuits and possibly others) is common to intermediate models of quantum computing for a good reason or is simply a coincidence.
Our work establishes the conceptual importance of a poly-box as a notion of simulation. Through the Hoeffding inequality and powerful sampling techniques such as Monte Carlo simulations, we inherit a number of important examples of poly-boxes, including those for IQP circuits, magic-state-injected Clifford circuits and circuits with polynomially bounded negativity (see also Refs. [2,45,27,22,23]). This is of immediate practical interest, as admitting a poly-box is sufficient for many useful tasks such as computing certain expectation values or estimating the probabilities of certain events.
Whether or not a family of quantum circuits C admits a poly-box significantly informs our understanding of the computational power of C relative to classical. Simulability of a family of quantum circuits C according to the notion of a poly-box implies that, without additional classical computational resources, C cannot solve decision problems outside of classical. If C is also poly-sparse (with binary-outcome circuits being a very special case), then even an agent with universal classical computational power and access to the quantum computer C is confined to universal classical computational power. However, when supplemented with a universal classical computer, if C admits a poly-box but is not poly-sparse, then it may be capable of solving decision problems beyond BPP. This possibility is not ruled out by our analysis and is consistent with the fact that C_PROD and C_IQP circuits both admit hardness proofs.
There is something conceptually unclear about circuit families that admit poly-boxes and a hardness proof of the type presented in Sec. 5. In particular, it is unclear whether these admit a poly-box purely because of the restriction placed on the types of events that a poly-box can be queried about, or whether hardness of ε-simulation could manifest even in circuit families that allow efficient classical polynomial-precision estimation of the probabilities associated with any family of events decidable in BPP. In the latter case, an agent with access to such circuits cannot solve any decision problem outside of BPP, even given access to a universal classical computer. The former case leaves open the possibility that these families of circuits behave like C_e (introduced in Sec. 4.1), where some appropriate classical post-processing of outcome samples renders them more powerful than BPP (assuming BQP ⊈ BPP). This question is closely related to an open question raised by Aaronson in Ref. [37].
It is perhaps surprising that examples of families of circuits admitting both a poly-box and a plausible hardness proof are far from rare; in fact, they may be typical among intermediate models of quantum computing. In addition to the families we have shown to be in this category (C_PROD and C_IQP), we note that linear optical networks C_LON and circuits with polynomially bounded negativity C_polyN are also plausible candidates. Due to an algorithm by Gurvits [6] (see also Ref. [52]), the family of linear optical quantum circuits considered in the boson sampling setting of Ref. [7] admits additive polynomial precision estimators of individual outcome probabilities. However, there is no known poly-box over this family, since it is currently unclear how to produce such estimators for all marginal probabilities. Alternatively, C_polyN is known to admit a poly-box [27]. Also, for odd prime d it contains the qudit generalization of C_PROD, which is both universal under post-selection and anti-concentrates. Hence an average case hardness conjecture is also plausible, implying that C_polyN admits a proof of hardness essentially identical to that of C_PROD. In light of these considerations, we are optimistic that useful and computationally interesting applications can be found for intermediate models of quantum computation.

Acknowledgments
This work was supported by the U.S. Army Research Office through grant W911NF-14-1-0103. HP also acknowledges support from the Australian Institute for Nanoscale Science and Technology Postgraduate Scholarship (John Makepeace Bennett Gift). The work of DG is supported by the Excellence Initiative of the German Federal and State Governments (ZUK 81), the DFG within the CRC 183 (project B01), and the DAAD.

A Statistical indistinguishability proof
We first show a well-known connection between the optimal probability of choosing the correct hypothesis in a hypothesis test and the L_1 distance.
Suppose P_1 and P_2 are probability distributions over some finite set I, and suppose a sample X is observed from the distribution Q, where either Q = P_1 (hypothesis H_1) or Q = P_2 (hypothesis H_2). Then any hypothesis test must have some H_1 acceptance region A_1 ⊆ I and some H_2 acceptance region A_2 := A_1^c ⊆ I. The probability of a type I error is α := Pr(X ∈ A_2 | X ∼ P_1) and the probability of a type II error is β := Pr(X ∈ A_1 | X ∼ P_2). The L_1 distance between P_1 and P_2 can be written as:

||P_1 − P_2||_1 = 2 sup_{A_1 ⊆ I} [Pr(X ∈ A_1 | X ∼ P_1) − Pr(X ∈ A_1 | X ∼ P_2)] = 2 (1 − α* − β*),

where the second equality can be verified by noting that the supremum is achieved when A_1 = {x ∈ I : P_1(x) ≥ P_2(x)}. Here, α* and β* are the type I and type II errors for the optimal choice of acceptance region/hypothesis test. We note that if, a priori, H_1 and H_2 are equally likely, then the probability of choosing the correct hypothesis, based on a single sample, using the optimal test is thus given by:

P_correct = (1 − α*)/2 + (1 − β*)/2 = 1/2 + ||P_1 − P_2||_1 / 4.    (A1)

The interactive protocol between the referee and the candidate proceeds as follows (see Figure 3): 1. Initially, the referee will fix a test by choosing a function a(·) that dictates how all data gathered in prior rounds determines the next circuit request. We note that while this can be further generalized by allowing stochastic maps (rather than functions), this has no bearing on our results, and our proof can easily be extended if required.
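The relation between the optimal success probability and the L_1 distance can be verified numerically; `p_correct_optimal` below implements the optimal test, accepting H_1 on the region where P_1 is at least P_2 (our own illustration):

```python
def l1_distance(p1, p2):
    """L1 distance between two distributions given as aligned lists."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

def p_correct_optimal(p1, p2):
    """Success probability of the optimal single-sample test when
    H1 and H2 are a priori equally likely: accept H1 on
    A1 = {x : P1(x) >= P2(x)}."""
    alpha = sum(a for a, b in zip(p1, p2) if a < b)    # type I error
    beta = sum(b for a, b in zip(p1, p2) if a >= b)    # type II error
    return 0.5 * (1 - alpha) + 0.5 * (1 - beta)

p1 = [0.5, 0.3, 0.2, 0.0]
p2 = [0.25, 0.25, 0.25, 0.25]
```

For this pair, both sides of the identity evaluate to 0.65 = 1/2 + 0.6/4.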
2. Initially, the referee will make the circuit request a_∅ ∈ A*.

3. The response from the candidate is denoted by the random variable Ỹ_{a_∅}, and the string of random variables a_∅, Ỹ_{a_∅} will be represented by X̃_1.

4. The referee may make another circuit request by applying the map a to X̃_1, thus defining the next circuit request a(X̃_1).

5. On the (j + 1)-th round, the referee's circuit request will be represented by a(X̃_j) and the response will be represented by Ỹ_{a(X̃_j)}, where X̃_{j+1} represents the string of random variables X̃_j, a(X̃_j), Ỹ_{a(X̃_j)}.
6. In addition, at the end of the j-th round for j = 1, 2, . . ., a fixed stochastic binary map h will be applied to X̃_j, with the outcome determining whether or not to halt the interactive procedure. We will assume that the test will eventually halt, and represent the final round of any given test by m ∈ N.
7. Finally, the referee will decide H_a vs. H_b by applying a fixed binary map d to the full collected data set X̃_m.
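The relation between the optimal single-sample test and the L_1 distance, P_correct = 1/2 + ||P_1 − P_2||_1/4, can be checked numerically. The sketch below uses illustrative example distributions (not from the text):

```python
# Verify P_correct = 1/2 + ||P1 - P2||_1 / 4 for the optimal single-sample
# hypothesis test on a pair of example distributions (illustrative numbers).
P1 = [0.5, 0.3, 0.2]
P2 = [0.2, 0.2, 0.6]

# L1 distance between the two distributions.
l1 = sum(abs(p - q) for p, q in zip(P1, P2))

# Optimal test: accept H2 exactly on the outcomes where P2(i) >= P1(i).
A2 = [i for i in range(len(P1)) if P2[i] >= P1[i]]
alpha = sum(P1[i] for i in A2)                            # type I error
beta = sum(P2[i] for i in range(len(P2)) if i not in A2)  # type II error

# With equal priors, the probability of choosing the correct hypothesis:
p_correct = 0.5 * (1 - alpha) + 0.5 * (1 - beta)
assert abs(p_correct - (0.5 + l1 / 4)) < 1e-12
```

Any other choice of acceptance region can only lower p_correct, which is what makes the L_1 distance the operative quantity in the protocol below.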
Figure 3: The figure above shows the interactive protocol between the referee and the candidate. In each round of the protocol, the referee sends a circuit description to the candidate. This circuit description is in general given by applying any fixed (possibly stochastic) map to all of the prior data collected by the referee. The candidate's responses (Ỹ) may depend only on the circuit request from the current round and the round number. When the candidate is known to be Alice or Bob, we will represent the variables corresponding to Ỹ and X̃ by Y and X, or Y′ and X′, respectively.
The set of all possible data collected by the referee (based on all probabilistic choices, including the choice of the candidate) over the course of the entire test can be viewed as a tree, where each branch corresponds to a distinct observed value of the random variable X̃_m. The two probability distributions P_1 and P_2 discussed above will each correspond to a distribution over all of the branches of this tree, conditioned on the choice of candidate. We note that one can easily incorporate probabilistic choices by the referee into the formalism; this only increases the number of branches of the tree. Eq. (35) shows that Bob can ensure suppression of P_correct by suppressing the L_1 distance ||P_1 − P_2||_1. However, if Bob has an ε-simulator, he can only directly control the L_1 distance between his output and that of Alice for a given circuit request. The proof below, culminating in Eq. (39), demonstrates that ||P_1 − P_2||_1 is sub-additive in the L_1 distance of each circuit request; thus Bob can upper bound P_correct by bounding each round's L_1 distance in such a way as to ensure the sum converges to the desired bound for ||P_1 − P_2||_1.
Proof. We now prove each direction of the "if and only if" statement of Thm. 1. "⇒": Here, we assume that Bob's simulation scheme is an ε-simulator over C and explicitly specify a strategy for Bob which simultaneously achieves indistinguishability and efficiency.
Bob's strategy will be as follows: if he becomes the candidate, then in the j-th round of the protocol, he will be asked to report the outcome of running some circuit indexed by a_j ∈ A*. In this case, Bob will ε-simulate the circuit c_{a_j} with the precision setting given by:

ε_j := 6ε/(π² j²),

so that Σ_{j=1}^∞ ε_j = ε. We note that Bob's strategy as outlined above is fixed and independent of the referee's hypothesis test. Further, we note that for all m ∈ N:

Σ_{j=1}^m ε_j ≤ ε.  (36)

We define the map E[X, X′], from any pair of random variables X with probability distribution P and X′ with probability distribution P′ to R, as the L_1 distance between P and P′.
We will show that for every test, the quantity on the LHS of Eq. (36) upper bounds E[X̃_m, X̃′_m]. Hence:

E[X_{j+1}, X′_{j+1}] = Σ_α Σ_β | Pr(X_j = α) Pr(Y_{a(α)} = β) − Pr(X′_j = α) Pr(Y′_{a(α)} = β) |  (37)
≤ E[X_j, X′_j] + Σ_α Pr(X′_j = α) E[Y_{a(α)}, Y′_{a(α)}],  (38)

where the sums are taken over α in the support of X̃_j and β in ∪_{a∈A*} supp(Ỹ_a). We note that the precision of Bob's response in any round only depends on the round number. Thus, Eq. (38) can be simplified by replacing E[Y_{a(α)}, Y′_{a(α)}] with the upper bound ε_{j+1}. Combined with the observation that E[X_1, X′_1] ≤ ε_1, we have shown that:

||P_1 − P_2||_1 = E[X_m, X′_m] ≤ Σ_{j=1}^m ε_j ≤ ε.  (39)

This proves that Bob's strategy meets the indistinguishability property. We now consider the efficiency of the strategy. We recall that given a circuit request sequence α, Alice's and Bob's resource costs are represented by N(α) and T(α) respectively. Further, Alice's resource cost is lower bounded by m, the number of rounds of the hypothesis test.
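The sub-additivity step can be illustrated on a toy two-round example (example numbers, not from the paper), where Alice's and Bob's joint distributions over a transcript factor into per-round distributions:

```python
# Toy check of sub-additivity: ||P x Q - P' x Q'||_1 <= ||P - P'||_1 + ||Q - Q'||_1.
# P, Q: Alice's round-1 and round-2 output distributions;
# Pp, Qp: Bob's corresponding (close-by) distributions.
P, Pp = [0.7, 0.3], [0.6, 0.4]
Q, Qp = [0.1, 0.9], [0.25, 0.75]

def l1(a, b):
    # L1 distance between two distributions given as lists.
    return sum(abs(x - y) for x, y in zip(a, b))

# Joint (product) distributions over the 4 possible two-round transcripts.
joint = [P[i] * Q[j] for i in range(2) for j in range(2)]
jointp = [Pp[i] * Qp[j] for i in range(2) for j in range(2)]

assert l1(joint, jointp) <= l1(P, Pp) + l1(Q, Qp) + 1e-12
```

Telescoping this bound over m rounds, with round j contributing at most ε_j, is exactly how the total L_1 distance is kept below ε.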
By definition of ε-simulation, there exist κ, c_1, c_2 ∈ N such that for a given circuit index a and precision ε, T_ε(a) ≤ c_1 (N(a)/ε)^κ + c_2. For simplicity, we will set c_1 = 1 and c_2 = 0, as this is immaterial given sufficiently large N(α) and 1/ε. For m = 1, clearly the strategy is efficient. Hence, given a string of inputs α = (a_1, . . . , a_m) with m ≥ 2, we have:

T(α) = Σ_{j=1}^m T_{ε_j}(a_j) ≤ Σ_{j=1}^m (N(a_j)/ε_j)^κ ≤ m (π² m² N(α)/(6ε))^κ ≤ N(α) (π² N(α)³/(6ε))^κ,  (43)

where:
• in Eq. (43) we used N(a_j) ≤ N(α), j ≤ m and m ≤ N(α);

hence, there exists a polynomial f(x, y) such that for all request strings α and δ > 0, T(α) ≤ f(N(α), 1/δ).
"⇐": We restrict ourselves to interactive protocols consisting of only one round. For each fixed circuit request, under the optimal choice of the decision map d, δ ∝ ||P − P′||_1; hence for all c ∈ C and for all ε > 0, Bob must be able to sample from some distribution P′ ∈ B(P, ε). Further, since Bob's strategy meets the efficiency condition, for every a ∈ A*, Bob must be able to output the sample using resources in O(poly(N(a), 1/δ)) ⊆ O(poly(n, 1/ε)).

B Strong simulation implies ε-simulation
In this appendix, we will show that the existence of a classical strong simulator for a family of quantum circuits implies the existence of an ε-simulator (in fact, one can construct an approximate weak simulator in the stronger sense of Ref. [34]). The algorithm relies on a mapping between bit-strings (representing outcomes of running the circuit) and subintervals of [0, 1]; a number r sampled uniformly from [0, 1] then selects an outcome. While such a mapping is defined for every ordering of the measurement outcomes, it cannot in general be computed efficiently. The algorithm instead makes intuitive use of marginal probability estimates to perform a binary search for the bit-string corresponding to r. This technique avoids computing ratios of probability estimates, making it useful in regimes where additive errors are small but larger than some of the probabilities in the target distribution. We start by giving a more precise definition of a strong simulator (than was presented in Sec. 2.1).

Definition 7.
(strong simulator). A strong simulator of a uniform family of quantum circuits C = {c_a | a ∈ A*} with associated family of probability distributions P = {P_a | a ∈ A*} is a classical algorithm that, for all a ∈ A*, ε, δ > 0 and S ∈ {0, 1, •}^{k_n}, can be used to compute an estimate p̂ of p := P_a(S) such that p̂ satisfies the accuracy requirement:

Pr(|p̂ − p| ≥ ε) ≤ δ  (48)

and the run-time required to compute the estimate p̂ is O(poly(n, log ε^{−1}, log δ^{−1})).
We point out that, much like a poly-box, a strong simulator outputs estimates of Born probabilities. The key difference is that a strong simulator achieves exponentially small precision in polynomial time, compared to the polynomially small precision of a poly-box. In particular, for any polynomial f, a strong simulator can (efficiently in n) output estimates such that Eq. (48) is satisfied for ε ∈ Ω(2^{−f(n)}) (as opposed to a poly-box, which generally requires ε ∈ Ω(1/f(n))). Hence, we note that the only difference between the definition of a strong simulator and that of a poly-box is the scaling of the run-time in ε.
Theorem 5. Let C be a uniform family of quantum circuits. If C admits a strong simulator, then C admits an ε-simulator.
In fact, we will prove an even stronger statement: a strong simulator implies approximate weak simulation in the much stronger sense used in Ref. [34] (exponentially small error in L_1 norm).
Before proving this theorem, we introduce an algorithm that uses output from a strong simulator to approximately sample from the output distribution of a circuit, i.e. to produce output consistent with the definition of an ε-simulator. Without loss of generality, let c ∈ C be an arbitrary n-qubit circuit with all n qubits measured. We will denote the quantum probabilities by p_S and the output of the strong simulator by p̂^{ε̄,δ̄}_S, suppressing the dependence on c. To give a rough intuition, the algorithm will first sample a polynomial-length bit-string r̃, which will be mapped to a probability r ∈ [0, 1]. This value will remain fixed and be used throughout the algorithm until a sample X̃ is generated from the approximate output distribution. This sample will be the output of the ε-simulator upon a single execution with the input (ε, c). The sample X̃ = (X̃_1, . . . , X̃_n) will be generated by sampling one bit at a time, starting with X̃_1. The choice of the j-th bit X̃_j is based on comparisons between the output of the strong simulator p̂^{ε̄,δ̄}_S and the probability r. This n-step process will require n calls to the strong simulator, where in each call the only variation in the inputs is the event S_j. Each event S_j will be chosen based on the previously sampled values X̃_1, . . . , X̃_{j−1}.
The algorithm will proceed as follows: 10. Set s_j = X̃_j.
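A minimal Python sketch of this binary-search sampling procedure. The helper `marginal_prob` is a hypothetical stand-in for the strong simulator and is computed exactly here, so the sketch isolates the search logic rather than the error analysis:

```python
import random

def sample_via_binary_search(probs, n, r=None):
    """Sample an n-bit outcome by binary search on the cumulative distribution,
    one bit at a time, using only marginal probabilities Pr(X_1..X_j = prefix)
    -- the quantity a strong simulator estimates. `probs` maps n-bit tuples
    (lexicographically ordered outcomes) to probabilities."""
    def marginal_prob(prefix):
        # Stand-in for the strong simulator: marginal probability of the
        # event "outcome starts with `prefix`", computed exactly here.
        return sum(p for x, p in probs.items() if x[:len(prefix)] == prefix)

    if r is None:
        r = random.random()  # the fixed uniform sample r in [0, 1)
    prefix, low = (), 0.0
    for _ in range(n):
        # The block of outcomes beginning with prefix + (0,) starts at `low`
        # and has width equal to its marginal probability.
        p0 = marginal_prob(prefix + (0,))
        if r < low + p0:
            prefix += (0,)
        else:
            prefix += (1,)
            low += p0
    return prefix

# Deterministic check: on 2 bits, r = 0.85 falls in the interval of (1, 1).
probs = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
print(sample_via_binary_search(probs, 2, r=0.85))  # -> (1, 1)
```

Note that each of the n rounds queries one marginal; no ratios of estimates are ever formed, which is the point of the construction.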
We now prove Theorem 5.
Proof. We wish to show that for all acceptable families of quantum circuits C, choices of c ∈ C and ε > 0:
• there exists a polynomially bounded function f(ε, n) which determines m, and
• there exist functions for determining ε̄ and δ̄,
such that, given a strong simulator of C, the above algorithm can be executed in run-time O(poly(n, ε^{−1})) and produce output X̃ from a distribution P̃ satisfying P̃ ∈ B(P, ε). We note that the probability distribution over x ∈ {0, 1}^n defines a partitioning (up to sets of measure zero) of the unit interval into 2^n intervals V_x, labeled by x, such that the uniform measure on these intervals corresponds to the quantum probability of outcome x. That is, we fix the partitioning such that for all x ∈ {0, 1}^n, |V_x| = p_x. To be specific, we can define V_x := [Σ_{x′<x} p_{x′}, Σ_{x′≤x} p_{x′}), where the above order on bit strings x and x′ is defined by lexicographical ordering. We note that given a uniform sample p from the unit interval, p will, up to measure zero, be strictly identified with an outcome x ∈ {0, 1}^n, which we denote o(p). In the ideal case where the strong simulator produces output which is deterministically exact, i.e. p̂^{ε̄,δ̄}_S = p_S for all S, we note that the above algorithm would, for a given r, produce output X̃ = o(r). For r distributed uniformly on the unit interval, this ensures X̃ is sampled from exactly the quantum distribution. We thus note that two sources of error arise. The first is from the inaccuracies introduced by the strong simulator's output. The second is from having to approximate a uniform sample over [0, 1] by a uniform sample over {0, 1}^m. Let p̃^{ε̄,δ̄}_x denote the probability Pr(X̃ = x). Given an interval V = [v_−, v_+] and α ∈ R, we define: V^{−α} := [v_− + α, v_+ − α]. If p_x ≥ 2ε̄ and r ∈ V_x^{−nε̄}, then X̃ = o(r). This can be seen by noting that, with probability ≥ 1 − δ̄, each requested probability estimate in step 8 will be within ε̄ of the corresponding quantum probability, resulting in X̃_j = o(r)_j.

C Multiplicative precision simulation implies ε-simulation
In this section, we present an algorithm (very similar to that of Ref. [35]) which uses an estimator with multiplicative precision to construct an ε-simulator (in fact, one can construct an approximate weak simulator in the stronger sense of Ref. [35]). This algorithm exploits the fact that ratios of multiplicative precision estimators retain multiplicative precision in order to sample sequentially, one qubit's measurement outcome at a time, from the marginal probability of the next qubit's measurement conditioned on the sampled outcomes of the prior measurements. This algorithm and its variants have also been presented in Refs. [3,4] and are well known within the simulation-of-quantum-circuits community.
Here, we claim without proof that this algorithm lifts a classical multiplicative precision simulator of a family of quantum circuits to an approximate weak simulator based on the stronger notion from Ref. [35]. This result has been shown in Ref. [35], but we discuss it here for completeness.
We start by giving a definition of a multiplicative precision simulator.

Definition 8. (multiplicative precision simulator).
A multiplicative precision simulator of a uniform family of quantum circuits C = {c_a | a ∈ A*} with associated family of probability distributions P = {P_a | a ∈ A*} is a classical algorithm that, for all a ∈ A*, ε, δ > 0 and S ∈ {0, 1, •}^{k_n}, can be used to compute an estimate p̂ of p := P_a(S) such that p̂ satisfies the accuracy requirement:

Pr(|p̂ − p| ≥ ε p) ≤ δ

and the run-time required to compute the estimate p̂ is O(poly(n, ε^{−1}, δ^{−1})).
We claim that a multiplicative precision simulator can be used to construct an ε-simulator.
Theorem 6. Let C be a uniform family of quantum circuits. If C admits a multiplicative precision simulator, then C admits an ε-simulator.
We omit a complete proof of this theorem, as it makes straightforward use of standard techniques. However, we outline the algorithm, which uses output from a multiplicative precision simulator to approximately sample from the output distribution of a circuit. Without loss of generality, let c ∈ C be an arbitrary n-qubit circuit with all n qubits measured. We will denote the quantum probabilities by p_S and the output of the multiplicative precision simulator by p̂^{ε̄,δ̄}_S, suppressing the dependence on c.
The algorithm will proceed as follows: 14. Reset j → j + 1 and go to step 5.
We note that multiplicative precision estimates can be divided by each other and still produce a multiplicative precision estimate. Hence c_j, computed in step 8, is a multiplicative precision estimate of the quantum conditional probability p_{S_j}/p_{S_{j−1}} = Pr(X_j = x_j | X_1 = x_1, . . . , X_{j−1} = x_{j−1}). This ensures that for ε = 1/poly(n), there exist polynomials f_1, f_2, f_3 such that ε̄ ≤ 1/f_1(n), δ̄ ≤ 1/f_2(n) and m ≥ f_3(n) satisfy the desired accuracy.
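The chain-rule sampling loop can be sketched as follows, with a hypothetical helper `estimate` standing in for the multiplicative precision simulator (computed exactly here, so the sketch shows only the sampling structure):

```python
import random

def chain_rule_sample(probs, n):
    """Sample an n-bit outcome bit by bit. The j-th bit is drawn from the
    conditional probability p(S_j)/p(S_{j-1}): a ratio of two marginal
    probabilities, which for multiplicative precision estimates is itself
    a multiplicative precision estimate. `probs` maps n-bit tuples to
    probabilities; `estimate` is an exact stand-in for the simulator."""
    def estimate(prefix):
        # Marginal probability that the outcome starts with `prefix`.
        return sum(p for x, p in probs.items() if x[:len(prefix)] == prefix)

    prefix = ()
    p_prev = 1.0  # p(S_0): the empty event has probability 1
    for _ in range(n):
        p0 = estimate(prefix + (0,))
        c = p0 / p_prev  # conditional Pr(next bit = 0 | sampled prefix)
        if random.random() < c:
            prefix, p_prev = prefix + (0,), p0
        else:
            prefix, p_prev = prefix + (1,), p_prev - p0
    return prefix
```

Repeating the call many times reproduces `probs` up to sampling noise; with multiplicative (rather than exact) estimates, the relative errors of the n conditionals compound only polynomially.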

D On poly-sparsity and anti-concentration
In this section we prove that poly-sparsity and anti-concentration cannot be simultaneously satisfied by any family of quantum circuits. This result is stated and proven as Theorem 7.
The condition of poly-sparsity forces output distributions over exponentially many outcomes to concentrate on polynomially many outcomes. Conversely, the property of anti-concentration forces the probability of observing any particular outcome, over random choices of circuits, to be low. Intuitively, these properties do appear to oppose each other. However, since they are statements with respect to different probability spaces, we must first translate each property into a statement about a common probability space with a common measure. This is done for anti-concentration and poly-sparsity in Lemmas 5 and 6 respectively. We then state and prove our main claim in Theorem 7.
First, let us restate the relevant definitions in some detail.

Definition 9.
(poly-sparse) Let P be a discrete probability distribution. We say that P is poly-sparse if there exists a polynomial P(x) such that for all ε > 0, P is ε-approximately t-sparse whenever t ≥ P(1/ε). Let P be a family of probability distributions with P_a ∈ P a distribution over {0, 1}^{k_a}. We say that P is poly-sparse if there exists a polynomial P(x) such that for all ε > 0 and a ∈ A*, P_a is ε-approximately t-sparse whenever t ≥ P(k_a/ε).
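A natural sufficient check for ε-approximate t-sparsity is that the probability mass outside the t largest outcomes is at most ε; a small illustrative sketch (helper name hypothetical):

```python
def approx_t_sparse(P, t, eps):
    """Sufficient check for eps-approximate t-sparsity: the probability
    mass outside the t largest outcomes is at most eps."""
    tail = sorted(P, reverse=True)[t:]
    return sum(tail) <= eps

# A distribution concentrated on 2 outcomes, with 8 small tail outcomes
# (all values dyadic, so the sums below are exact in floating point):
P = [0.5, 0.25] + [1 / 32] * 8  # tail mass outside the top 2 is 0.25
print(approx_t_sparse(P, 2, 0.25))  # -> True
print(approx_t_sparse(P, 2, 0.1))   # -> False
```

For a poly-sparse family, the threshold t needed to pass this check grows only polynomially in k_a/ε, even though the support may be exponentially large.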

Definition 10.
(anti-concentration) Let C be a family of quantum circuits with P its associated family of probability distributions. For all n ∈ N let σ n be a probability measure over A n . We say that C anti-concentrates with respect to the set of measures Σ := {σ n } n∈N iff ∀n ∈ N, ∀x ∈ {0, 1} n and ∀α ∈ (0, 1): where the probability is with respect to the measure σ n .
Lemma 5. For each n ∈ N, let σ_n be a probability measure over A_n, let ν_n be any probability measure over {0, 1}^n, and let τ_n be the product measure σ_n × ν_n over A_n × {0, 1}^n. Then C anti-concentrating with respect to {σ_n}_{n∈N} implies that ∀n ∈ N and ∀α ∈ (0, 1): where the probability is taken with respect to τ_n.