Lower Bounds on Stabilizer Rank

The stabilizer rank of a quantum state $\psi$ is the minimal $r$ such that $\left| \psi \right \rangle = \sum_{j=1}^r c_j \left|\varphi_j \right\rangle$ for $c_j \in \mathbb{C}$ and stabilizer states $\varphi_j$. The running time of several classical simulation methods for quantum circuits is determined by the stabilizer rank of the $n$-th tensor power of single-qubit magic states. We prove a lower bound of $\Omega(n)$ on the stabilizer rank of such states, improving a previous lower bound of $\Omega(\sqrt{n})$ of Bravyi, Smith and Smolin (arXiv:1506.01396). Further, we prove that for a sufficiently small constant $\delta$, the stabilizer rank of any state which is $\delta$-close to those states is $\Omega(\sqrt{n}/\log n)$. This is the first non-trivial lower bound for approximate stabilizer rank. Our techniques rely on the representation of stabilizer states as quadratic functions over affine subspaces of $\mathbb{F}_2^n$, and we use tools from analysis of boolean functions and complexity theory. The proof of the first result involves a careful analysis of directional derivatives of quadratic polynomials, whereas the proof of the second result uses Razborov-Smolensky low degree polynomial approximations and correlation bounds against the majority function.


Introduction
The conventional wisdom is that quantum computers are more powerful than classical computers. Among other reasons, this belief is supported by the fact that quantum computers are able to efficiently solve problems such as integer factorization [22], which are believed by some to be hard for classical computers; by provable black box separations [23,11,3,20]; and by quantum computers' advantage in solving certain sampling problems that are deemed intractable for classical computers under well established complexity theoretic conjectures [1].
There is, however, very little that we can unconditionally prove with regard to the impossibility of efficiently simulating quantum computers using classical computers. Indeed, barring a computational complexity theoretic breakthrough, such as, at the very least, separating P from PSPACE, we can't hope to prove general and unconditional impossibility results.
Nevertheless, it remains an interesting and important problem to prove lower bounds on the running time of certain restricted types of simulation techniques for quantum circuits. One such result is a lower bound of Huang, Newman and Szegedy [12], who prove √ 3) [6]. This suggests the possibility of simulating a general quantum circuit by decomposing $|H^{\otimes n}\rangle$ or $|R^{\otimes n}\rangle$ as a linear combination of stabilizer states.
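More formally, $|\varphi\rangle$ is a stabilizer state if $|\varphi\rangle = U|0^n\rangle$ where $U$ is an $n$-qubit Clifford unitary (see also Equation (1) and the following paragraph). The stabilizer rank of a state $|\psi\rangle$, denoted $\chi(\psi)$, is the minimal integer $r$ such that
$$|\psi\rangle = \sum_{j=1}^{r} c_j |\varphi_j\rangle,$$
where for every $1 \le j \le r$, $|\varphi_j\rangle$ is a stabilizer state and $c_j \in \mathbb{C}$.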
For any $n$-qubit state, the stabilizer rank is at most $2^n$. Interestingly, much smaller upper bounds can be shown for the stabilizer rank of $|H^{\otimes n}\rangle$: Bravyi, Smith and Smolin [7] proved that $\chi(H^{\otimes 6}) \le 7$, which implies that $\chi(H^{\otimes n}) \le 7^{n/6} \le 2^{0.468n}$. Bravyi, Smith and Smolin [7] then use this identity to obtain simulation algorithms for circuits with a small number of $T$ gates, whose running time is much faster than the trivial brute-force simulation. A slightly faster algorithm was presented by Kocia, who proved that $\chi(H^{\otimes 12}) \le 47$ [14], and the upper bound was further improved by Qassim, Pashayan and Gosset [19], who proved that $\chi(H^{\otimes n}) = O(2^{\alpha n})$ for $\alpha = \frac{1}{4}\log_2 3 \le 0.3963$.
When simulating quantum circuits, it is often enough, for all intents and purposes, to obtain an approximation of their output state. Thus, it's natural to define a similar approximation notion for stabilizer rank. The $\delta$-approximate stabilizer rank of $|\psi\rangle$, denoted $\chi_\delta(\psi)$, is defined as the minimum of $\chi(\varphi)$ over all states $|\varphi\rangle$ such that $\|\psi - \varphi\|_2 \le \delta$ [4]. By considering approximate stabilizer decompositions of $|H^{\otimes n}\rangle$, improved simulation algorithms were obtained by Bravyi and Gosset [5].
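The exponents quoted above follow from elementary arithmetic. The following is a minimal sketch that recomputes them; the variable names are illustrative and not taken from the cited works.

```python
import math

# Exponent c such that 7^(n/6) = 2^(c*n), coming from chi(H^{(x)6}) <= 7.
bss_exponent = math.log2(7) / 6
# Exponent alpha = (1/4) * log2(3) from the bound of Qassim, Pashayan and Gosset.
qpg_exponent = math.log2(3) / 4

print(f"log2(7)/6 = {bss_exponent:.4f}")
print(f"log2(3)/4 = {qpg_exponent:.4f}")
```

Both exponents are well below 1, the exponent of the trivial $2^n$ upper bound.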
A natural question is then what is the limit of such simulation methods. As the running time of the simulation scales with the stabilizer rank, an upper bound which is polynomial (in $n$) on $\chi(H^{\otimes n})$ or $\chi(R^{\otimes n})$ will imply that BPP = BQP and even (by simulating quantum circuits with postselection) P = NP [4], and thus seems highly improbable. Much stronger hardness assumptions than P ≠ NP, such as the exponential time hypothesis, imply that $\chi(H^{\otimes n}) = 2^{\Omega(n)}$ [17,12].
However, the starting point of this discussion was our desire to obtain unconditional impossibility results, and thus we are interested in provable lower bounds on $\chi(H^{\otimes n})$ and $\chi_\delta(H^{\otimes n})$, and similarly for $R^{\otimes n}$.
While it's easy to see, using counting arguments, that the stabilizer rank of a random quantum state would be exponential, it is a challenging open problem to prove super-polynomial lower bounds on the rank of $|H^{\otimes n}\rangle$ or of other explicit states. Bravyi, Smith and Smolin proved that $\chi(H^{\otimes n}) = \Omega(\sqrt{n})$. In this paper, we improve this lower bound, and also prove the first non-trivial lower bounds for approximate stabilizer rank.

Our results: Improved Lower Bounds on Stabilizer Rank and Approximate Stabilizer Rank
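Our first result is an improved lower bound on $\chi(H^{\otimes n})$ and $\chi(R^{\otimes n})$.
Theorem 1.1. $\chi(H^{\otimes n}) = \Omega(n)$, and similarly $\chi(R^{\otimes n}) = \Omega(n)$.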
As we remark in Section 1.4, proving a super-linear lower bound on $\chi(H^{\otimes n})$ will solve a notable open problem in complexity theory. We discuss this challenge, as well as some barriers preventing our technique from proving super-linear lower bounds, in Section 1.5.
The result of Theorem 1.1 can be immediately adapted to prove the same lower bounds on the $\delta$-approximate stabilizer rank for exponentially small $\delta$. We are, however, interested in much coarser approximations, and we are able to prove a meaningful result even for $\delta$ being a small enough positive constant.
Theorem 1.2. There exists an absolute constant $\delta > 0$ such that $\chi_\delta(H^{\otimes n}) = \Omega(\sqrt{n}/\log n)$, and similarly $\chi_\delta(R^{\otimes n}) = \Omega(\sqrt{n}/\log n)$.
By definition, the stabilizer rank of any two states which are Clifford-equivalent is the same, and thus the lower bounds of Theorem 1.1 and Theorem 1.2, while stated as lower bounds on the ranks of $|H^{\otimes n}\rangle$ and $|R^{\otimes n}\rangle$, hold for any state which is Clifford-equivalent to them, even up to a phase.

Technique: Stabilizer States as Quadratic Polynomials
The original proof of the Gottesman-Knill Theorem used the stabilizer formalism and tracked the current state of the circuit by storing the generators of the subgroup of the Pauli group which stabilizes the state, and updating them after each application of a Clifford operation. It turns out, however, that there is an alternative succinct representation of stabilizer states, using their amplitudes in the computational basis $\{|x\rangle\}_{x \in \mathbb{F}_2^n}$ [8,26]. This representation also leads to an alternative proof of the theorem, as explained in [26].
If $|\varphi\rangle$ is a stabilizer state then (up to normalization)
$$|\varphi\rangle = \sum_{x \in A} i^{\ell(x)} (-1)^{q(x)} |x\rangle, \tag{1}$$
where $A \subseteq \mathbb{F}_2^n$ is an affine subspace, $\ell(x)$ is an $\mathbb{F}_2$-linear function and $q(x)$ is a quadratic polynomial over $\mathbb{F}_2$. The amplitudes of $|H^{\otimes n}\rangle$ and $|R^{\otimes n}\rangle$ are also easy to compute. For example, recall that $|H\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$, and thus
$$|H^{\otimes n}\rangle = \sum_{x \in \mathbb{F}_2^n} \cos(\pi/8)^{n-|x|} \sin(\pi/8)^{|x|} |x\rangle,$$
where $|x|$ denotes the Hamming weight of $x$.
It is convenient to recast this problem as a problem about functions on the boolean cube in the following natural way. For an $n$-qubit state $|\psi\rangle$ we associate a function $F_\psi : \mathbb{F}_2^n \to \mathbb{C}$ such that $F_\psi(x)$ equals the amplitude of $|x\rangle$ when writing $|\psi\rangle$ in the computational basis. In this formulation, our "building blocks" are stabilizer functions, i.e., functions of the form
$$\varphi(x) = i^{\ell(x)} (-1)^{q(x)} 1_A(x),$$
where $A$ is an affine subspace, $1_A$ is the indicator function of $A$ (i.e., $1_A(x) = 1$ if $x \in A$ and zero otherwise), $\ell$ is a linear function and $q$ is a quadratic polynomial. Let $H_n$ denote the function associated with $|H^{\otimes n}\rangle$. We would like to show that in any decomposition
$$H_n(x) = \sum_{j=1}^{r} c_j \varphi_j(x),$$
where $c_j \in \mathbb{C}$ and $\varphi_j(x)$ are stabilizer functions, $r$ must be large.
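To make the function view concrete, here is a minimal sketch that tabulates $H_n$ on the boolean cube in two ways: from the closed form in terms of the Hamming weight, and by expanding the tensor product of single-qubit amplitudes. The helper names are hypothetical and chosen only for this illustration.

```python
import itertools
import math

def magic_state_function(n):
    """H_n as a dict: x in {0,1}^n -> cos(pi/8)^(n-|x|) * sin(pi/8)^|x|."""
    c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
    return {x: c ** (n - sum(x)) * s ** sum(x)
            for x in itertools.product((0, 1), repeat=n)}

def tensor_power_amplitudes(n):
    """Amplitudes of |H>^{(x)n}, obtained by expanding the tensor product directly."""
    single = (math.cos(math.pi / 8), math.sin(math.pi / 8))  # amplitudes of |0>, |1>
    return {x: math.prod(single[bit] for bit in x)
            for x in itertools.product((0, 1), repeat=n)}

n = 5
H_n = magic_state_function(n)
direct = tensor_power_amplitudes(n)
# Largest disagreement between the two tabulations, and the squared L2 norm.
max_gap = max(abs(H_n[x] - direct[x]) for x in H_n)
squared_norm = sum(a * a for a in H_n.values())
```

The two tabulations agree up to floating-point error, and the squared norm is 1, as expected for a normalized state.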
Our techniques for showing that use tools from the analysis of boolean functions and from complexity theory. In Section 1.4 we recall some similar questions that have arisen in complexity theory.
For the proof of Theorem 1.1, we show that if $f$ is a function of stabilizer rank at most, say, $n/100$, then it is possible to find two vectors $x, y \in \mathbb{F}_2^n$ such that the Hamming weight of $x$ is very small, the Hamming weight of $y$ is very large, and $f(x) = f(x + y)$. Since $|x + y| \ge |y| - |x|$, for the correctly chosen parameters we get that $|x + y| > |x|$, which leads to a contradiction if $f = H_n$, since $H_n$ takes a different value on each layer of the Hamming cube.
To find such $x$ and $y$, given a decomposition $\sum_{j=1}^{r} c_j i^{\ell_j(x)} (-1)^{q_j(x)} 1_{A_j}(x)$ with $r \le n/100$, we find $x, y$ such that $\ell_j(x) = \ell_j(x + y)$, $q_j(x) = q_j(x + y)$ and $1_{A_j}(x) = 1_{A_j}(x + y)$ for every $j \in [r]$. Observe that for a fixed $y \in \mathbb{F}_2^n$ and a quadratic polynomial $q(x)$, the equation $q(x) = q(x + y)$ is an affine linear equation in unknowns $x$. Thus, denoting $\Delta_y(q)(x) = q(x) + q(x + y)$ (this is also called the directional derivative of $q$ with respect to $y$), we get a system of affine linear equations $\{\Delta_y(q_j) = 0\}_{j \in [r]}$ in $x$, which, assuming $r$ is small, has many solutions (assuming it is solvable at all).
The additional requirements $\ell_j(x) = \ell_j(x + y)$ and $1_{A_j}(x) = 1_{A_j}(x + y)$ make things more complicated. However, using an averaging argument and by again utilizing the fact that $r$ is relatively small, we are able to find a large affine subspace $U$ of vectors which satisfy those equations, and then we analyze the above system of linear equations over the affine subspace $U$.
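The observation that the directional derivative of a quadratic polynomial is affine in $x$ can be checked by brute force on a small example. The sketch below assumes a particular encoding of quadratics (a strictly upper-triangular coefficient matrix, a linear part and a constant); this encoding and all helper names are choices made for the illustration only.

```python
import itertools
import random

def random_quadratic(n, rng):
    """A random quadratic over F_2: (strictly upper-triangular Q, linear part b, constant c)."""
    Q = [[rng.randint(0, 1) if i < j else 0 for j in range(n)] for i in range(n)]
    b = [rng.randint(0, 1) for _ in range(n)]
    return Q, b, rng.randint(0, 1)

def evaluate(q, x):
    """Evaluate q(x) = sum_{i<j} Q[i][j] x_i x_j + sum_i b_i x_i + c over F_2."""
    Q, b, c = q
    n = len(x)
    total = c + sum(b[i] * x[i] for i in range(n))
    total += sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return total % 2

def add(u, v):
    """Coordinate-wise sum in F_2^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def derivative(q, y, x):
    """Directional derivative Delta_y(q)(x) = q(x) + q(x + y) over F_2."""
    return (evaluate(q, x) + evaluate(q, add(x, y))) % 2

def is_affine(f, n):
    """f is affine over F_2 iff f(x) + f(x') + f(x + x') = f(0) for all x, x'."""
    cube = list(itertools.product((0, 1), repeat=n))
    zero = (0,) * n
    return all((f(x) + f(xp) + f(add(x, xp))) % 2 == f(zero)
               for x in cube for xp in cube)

rng = random.Random(0)
n = 4
q = random_quadratic(n, rng)
# Check every direction y in F_2^n.
all_derivatives_affine = all(
    is_affine(lambda x, y=y: derivative(q, y, x), n)
    for y in itertools.product((0, 1), repeat=n)
)
```

The quadratic terms $x_i x_j$ cancel in $q(x) + q(x+y)$, which is why the check passes for every direction $y$.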
In order to satisfy the conditions on the Hamming weights of x and y we use Kleitman's theorem [13] which gives an upper bound on the size of sets of the boolean cube with small diameter, as well as some elementary linear algebra. The full proof of Theorem 1.1 appears in Section 3.
The proof of Theorem 1.2 follows a different strategy. Starting from a state $|\psi\rangle$ of rank $r$ which is $\delta$-close to $|H^{\otimes n}\rangle$ for some small enough constant $\delta > 0$, we show how to use $|\psi\rangle$ in order to construct an $\mathbb{F}_2$-polynomial of degree $O(r \log r)$ which $(1 - \varepsilon)$-approximates the majority function on $m = \Omega(n)$ bits. By a well known correlation bound of Razborov and Smolensky [21,24,25], this implies that $r = \Omega(\sqrt{n}/\log n)$.
We now explain how to obtain this polynomial approximating the majority function. Let $p = \sin^2(\pi/8) = 0.146\ldots$. Instead of majority, it is convenient to first consider the function $\mathrm{THR}_{pn}$ which is 1 on all inputs $x$ whose Hamming weight is at least $pn$, and zero otherwise. Note that this function is trivial to approximate under the uniform distribution by the constant 1 polynomial, but the approximation question becomes meaningful when considering $B(n, p)$, the binomial distribution with parameter $p$ on the $n$-dimensional cube. This is useful since the $L_2$ mass of the vector $|H^{\otimes n}\rangle$ is distributed according to this distribution. In particular it is heavily concentrated on coordinates $x$ such that $|x| = pn \pm O(\sqrt{n})$, and a state $|\psi\rangle$ which is $\delta$-close to $|H^{\otimes n}\rangle$ must contain in almost all of these coordinates values which are very close to those of $|H^{\otimes n}\rangle$. It is then possible to obtain from $\psi$ a boolean function $f$ which approximates the function $\mathrm{THR}_{pn}$. We observe that a restriction $g$ of $f$ to a random set of $2pn$ coordinates will approximate the majority function, and further, assuming $|\psi\rangle$ has stabilizer rank $r$, and using standard techniques again borrowed from Razborov and Smolensky, $g$ itself can be approximated by a polynomial $\tilde{g}$ of degree $O(r \log r)$. It follows that $\tilde{g}$ approximates the majority function over $2pn$ bits. The full proof of Theorem 1.2 appears in Section 4.

Related Work
As mentioned above, the previous best lower bound was an $\Omega(\sqrt{n})$ lower bound for the exact stabilizer rank of $|H^{\otimes n}\rangle$ proved by Bravyi, Smith and Smolin [7]. Stronger lower bounds are known in restricted models. As mentioned by [7] (see also Lemma 2 in [5]), for every stabilizer state $|\varphi\rangle$ it holds that $|\langle \varphi | H^{\otimes n} \rangle| \le 2^{-\Omega(n)}$, which immediately implies an exponential lower bound in the case that the coefficients $c_j$ are bounded in magnitude (in particular, this holds if the states in the decomposition are orthogonal). It is worth noting that by Cramer's rule, in any rank $r$ decomposition the coefficients $c_j$ can be taken to be of magnitude at most exponential in $n$ and $r$.
Bravyi et al. [4] present a different restricted model in which they prove an exponential lower bound.
Related questions have been considered before in complexity theory. The so called "quadratic uncertainty principle" [9,27] is a conjecture which states that in any decomposition of the AND function as a sum
$$\mathrm{AND}(x) = \sum_{j=1}^{r} c_j (-1)^{q_j(x)} \tag{2}$$
for quadratic functions $\{q_j\}_{j \in [r]}$ and $c_j \in \mathbb{C}$, $r = 2^{\Omega(n)}$. The best lower bound known is $r \ge n/2$ (see [27]). Note that since in the stabilizer rank case we allow functions of the form $(-1)^q \cdot 1_A$ for affine subspaces $A$, the model we consider in this paper is stronger: in particular the AND function itself is a stabilizer function and its stabilizer rank is 1.
Williams [27] has constructed, for every positive integer $k$, a function $f_k \in$ NP which requires $r = \Omega(n^k)$ in any decomposition as in (2). It remains, however, an intriguing open problem to construct a boolean function in P which requires a super-linear number of summands.
We remark that proving super-linear lower bounds on the stabilizer rank of $|H^{\otimes n}\rangle$ will solve this problem. Indeed, as mentioned above, the stabilizer rank model is even stronger, and thus lower bounds carry over to weaker models. Furthermore, even though $H_n$ itself is not a boolean function, $|H\rangle$ is Clifford-equivalent (up to an unimportant phase) to $|T\rangle := \frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$ (see [7]), which implies that the stabilizer rank of $|H^{\otimes n}\rangle$ equals the stabilizer rank of $|T^{\otimes n}\rangle$. Denoting by $T_n$ the function associated with $T^{\otimes n}$, it is now evident that $T_n(x)$ depends only on $|x| \bmod 8$, and therefore $T_n = 2^{-n/2} \sum_{j=0}^{7} e^{i\pi j/4} M_j$, where $M_j$ is a boolean function such that $M_j(x) = 1$ if and only if $|x| = j \bmod 8$. Thus, a super-linear lower bound on the stabilizer rank of $|H^{\otimes n}\rangle$ will imply a super-linear lower bound on the rank of the (boolean) mod 8 function.
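The dependence of $T_n$ on $|x| \bmod 8$ can be verified numerically. The following sketch compares $T_n(x) = 2^{-n/2} e^{i\pi|x|/4}$ with the combination $2^{-n/2}\sum_{j=0}^{7} e^{i\pi j/4} M_j(x)$ of mod 8 indicators; the function names are illustrative.

```python
import cmath
import itertools
import math

def T_n(x):
    """Amplitude of |x> in |T>^{(x)n}, where |T> = (|0> + e^{i*pi/4}|1>)/sqrt(2)."""
    n, weight = len(x), sum(x)
    return 2 ** (-n / 2) * cmath.exp(1j * math.pi * weight / 4)

def M(j, x):
    """Boolean mod-8 indicator: 1 if |x| = j (mod 8), else 0."""
    return 1 if sum(x) % 8 == j else 0

def mod8_combination(x):
    """2^{-n/2} * sum_j e^{i*pi*j/4} * M_j(x)."""
    n = len(x)
    return 2 ** (-n / 2) * sum(cmath.exp(1j * math.pi * j / 4) * M(j, x) for j in range(8))

n = 10
cube = list(itertools.product((0, 1), repeat=n))
# Largest pointwise difference between T_n and the mod-8 combination.
max_gap = max(abs(T_n(x) - mod8_combination(x)) for x in cube)
```

Choosing $n = 10$ ensures that weights 8, 9 and 10 wrap around, so the periodicity is actually exercised.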
Following the initial publication of this work, our results were reproved using different techniques. Labib [15] used higher-order Fourier analysis in order to prove a result similar to Theorem 1.1, and extended it to qudits of any prime dimension. Lovitz and Steffan [16] proved nearly identical lower bounds for exact and approximate stabilizer rank using number-theoretic techniques.

Open Problems
While Theorem 1.1 improves upon the previous best lower bound known, we are unfortunately unable to prove super-polynomial or even super-linear lower bounds on χ(H ⊗n ) or χ(R ⊗n ). Further, our techniques seem incapable of proving super-linear lower bounds, as they extend to any representation of H n as an arbitrary function of r stabilizer functions, and not necessarily a linear combination of them.
As mentioned in Section 1.4, it seems that a first step could be proving super-linear lower bounds for the quadratic uncertainty principle problem. A different approachable open problem is to improve our lower bound on the δ-approximate stabilizer rank to be closer to Ω(n). This could perhaps be easier assuming δ is polynomially small in n.

General Notation
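As mentioned in the introduction, it is often convenient to speak about functions on the boolean cube rather than quantum states. For an $n$-qubit state $|\psi\rangle = \sum_{x \in \mathbb{F}_2^n} c_x |x\rangle$, the associated function $F_\psi : \mathbb{F}_2^n \to \mathbb{C}$ is defined by $F_\psi(x) = c_x$. The $L_2$ norm of the function $F : \mathbb{F}_2^n \to \mathbb{C}$ is then the same as the norm of the corresponding vector, i.e., $\|F\| = \sqrt{\sum_{x \in \mathbb{F}_2^n} |F(x)|^2}$.
As shown in [8,26], stabilizer functions indeed correspond to stabilizer states up to normalization (which has no effect on the stabilizer rank).
The stabilizer rank of a function $F : \mathbb{F}_2^n \to \mathbb{C}$, denoted $\chi(F)$, is the minimal $r$ such that $F = \sum_{j=1}^{r} c_j \varphi_j$ for $c_j \in \mathbb{C}$ and stabilizer functions $\varphi_j$.
For a vector $x \in \mathbb{F}_2^n$ we denote by $|x|$ its Hamming weight. We denote by $\mathrm{Maj}_m : \mathbb{F}_2^m \to \mathbb{F}_2$ the $m$-bit majority function, that is, $\mathrm{Maj}_m(x) = 1$ if and only if $|x| \ge m/2$.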
Here d(u, v) denotes the Hamming distance of u and v.
Kleitman [13] proved that sets of small diameter cannot be too large.
This result is obviously tight as shown by the example of the set of all vectors of Hamming weight at most k.

Linear Algebraic Facts
Recall that an affine subspace $U \subseteq \mathbb{F}_2^n$ is the set of solutions to a system of affine equations, i.e., a system of the form $Mx = b$ for some $M \in \mathbb{F}_2^{k \times n}$ and $b \in \mathbb{F}_2^k$. Every affine subspace can be written as $U = u + U_0$ for $u \in \mathbb{F}_2^n$ and a linear subspace $U_0 \subseteq \mathbb{F}_2^n$. In our terminology, linear subspaces are in particular affine subspaces (and similarly, linear functions are a special case of affine functions).
We record the following useful facts.  Proof. Follows immediately from applying Claim 2.5 with v = 0.
Finally, we define the directional derivative of a quadratic function over F 2 .
The directional derivative of $q$ in direction $y$ is defined to be the function
$$\Delta_y(q)(x) = q(x) + q(x + y).$$
Observe that for every $y$, $\Delta_y(q)$ is an affine function in $x$.

A Lower Bound for Exact Stabilizer Rank
In this section we prove Theorem 1.1. We first present the main lemma of this section.
Proof. In the case where $|B\rangle = |H\rangle$, the associated function $F_H : \mathbb{F}_2^n \to \mathbb{C}$ is defined by $F_H(x) = \cos(\pi/8)^{n-|x|} \sin(\pi/8)^{|x|}$. If $|B\rangle = |R\rangle$, the associated function $F_R : \mathbb{F}_2^n \to \mathbb{C}$ is defined by $F_R(x) = \cos(\beta)^{n-|x|} (e^{i\pi/4}\sin(\beta))^{|x|}$ where $\beta = \arccos(1/\sqrt{3})/2$. It is immediate to verify that for every $y, z \in \mathbb{F}_2^n$ of different Hamming weight those functions attain different values. Thus, by Lemma 3.1, their stabilizer rank is at least $n/100$.
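The claim that $F_H$ and $F_R$ separate the layers of the cube can be illustrated numerically. The sketch below computes the value of each function on every Hamming layer and checks that the magnitudes strictly decrease with the weight, which in particular makes the values pairwise distinct; this is an illustration for one small $n$, not a substitute for the (immediate) general argument that $\tan(\pi/8) < 1$ and $\tan(\beta) < 1$. The helper names are hypothetical.

```python
import cmath
import math

def layer_values_H(n):
    """Value of F_H on each Hamming layer k = 0..n."""
    c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
    return [c ** (n - k) * s ** k for k in range(n + 1)]

def layer_values_R(n):
    """Value of F_R on each Hamming layer k = 0..n, with beta = arccos(1/sqrt(3))/2."""
    beta = math.acos(1 / math.sqrt(3)) / 2
    c = math.cos(beta)
    s = cmath.exp(1j * math.pi / 4) * math.sin(beta)
    return [c ** (n - k) * s ** k for k in range(n + 1)]

def strictly_decreasing_magnitudes(values):
    """True if |values[0]| > |values[1]| > ... ."""
    return all(abs(a) > abs(b) for a, b in zip(values, values[1:]))

n = 30
h_layers_distinct = strictly_decreasing_magnitudes(layer_values_H(n))
r_layers_distinct = strictly_decreasing_magnitudes(layer_values_R(n))
```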
We turn to the proof of Lemma 3.1.
Proof of Lemma 3.1. Let $F : \mathbb{F}_2^n \to \mathbb{C}$ be a function of stabilizer rank at most $r \le n/100$, i.e.,
$$F(x) = \sum_{j=1}^{r} c_j i^{\ell_j(x)} (-1)^{q_j(x)} 1_{A_j}(x),$$
where for every $j \in [r]$, $\ell_j$ is a linear function, $q_j$ is a quadratic function, and $A_j \subseteq \mathbb{F}_2^n$ is an affine subspace.
To prove the statement of the lemma, we will show that there exist $y, z \in \mathbb{F}_2^n$ such that $|y| < |z|$ and for every $j \in [r]$ all of the following hold:
1. $\ell_j(y) = \ell_j(z)$.
2. $1_{A_j}(y) = 1_{A_j}(z)$.
3. $q_j(y) = q_j(z)$.
The first two items are handled by the following claim, which shows that there is a large affine subspace satisfying both conditions.
Claim 3.3. There is an affine subspace $U \subseteq \mathbb{F}_2^n$ of dimension at least $n - 3r$ such that for every $j \in [r]$ and for every $u_1, u_2 \in U$, $\ell_j(u_1) = \ell_j(u_2)$ and $1_{A_j}(u_1) = 1_{A_j}(u_2)$.
We defer the proof of Claim 3.3 to the end of this proof. Write $U = u + U_0$ where $u \in \mathbb{F}_2^n$ and $U_0 \subseteq \mathbb{F}_2^n$ is a linear subspace. The next claim handles the third item above.
Claim 3.4. There exists $v \in U_0$ with $|v| \ge 2n/3$ such that the system of equations
$$\{\Delta_v(q_j)(x) = 0\}_{j \in [r]} \tag{3}$$
(in unknowns $x$) has a solution in $U$.
We postpone the proof of this claim as well, and now explain how it implies the result.
Let $v \in U_0$ be as promised in Claim 3.4. The set of solutions in $U$ to the system of affine equations (3) is non-empty (by Claim 3.4), and thus by Fact 2.4, the set of solutions in $U$ to (3) is an affine subspace $V \subseteq U$ of dimension at least $n - 4r$. By Corollary 2.6, there is $y \in V$ with $|y| \le 4r$. Set $z = y + v$, so that $q_j(y) = q_j(y + v) = q_j(z)$ for all $j \in [r]$. Observe that $z \in U$, since Claim 3.4 promises that $v \in U_0$. Thus $y$ and $z$ attain the same values on $\ell_j$ and $1_{A_j}$ for all $j \in [r]$ as well. Finally note that $|y| \le 4r$ whereas $|z| = |y + v| \ge |v| - |y| \ge \frac{2n}{3} - 4r > 4r$.

It remains to prove Claim 3.3 and Claim 3.4.
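Proof of Claim 3.3. Let $V_1 \subseteq \mathbb{F}_2^n$ be the linear subspace defined by the system of equations $\{\ell_j = 0\}$ for all $j \in [r]$. It holds that $\dim(V_1) \ge n - r > 0$.
Consider now the map $E : V_1 \to \{0, 1\}^r$ defined by $E(x) = (1_{A_1}(x), \ldots, 1_{A_r}(x))$. By averaging, there exists $\alpha \in \{0, 1\}^r$ such that $|E^{-1}(\alpha)| \ge |V_1|/2^r \ge 2^{n-2r}$. Let $S$ be the support of $\alpha$, that is, the set of indices $j \in [r]$ such that $\alpha_j = 1$. We have that
$$E^{-1}(\alpha) \subseteq V_2 := V_1 \cap \bigcap_{j \in S} A_j.$$
Then $V_2$ is an affine subspace, and $|V_2| \ge |E^{-1}(\alpha)| \ge 2^{n-2r}$, so $\dim(V_2) \ge n - 2r > 0$.
Pick now an arbitrary $x_0 \in E^{-1}(\alpha)$. Thus, $x_0 \in V_2$, and for every $j \notin S$, $x_0 \notin A_j$. By Fact 2.3, for every $j \notin S$ there is an affine function $a_j$ such that $a_j(x_0) = 1$ and for all $x \in A_j$, $a_j(x) = 0$. Let
$$U = \{x \in V_2 : a_j(x) = 1 \text{ for all } j \notin S\}.$$
Then $U$ is an affine subspace (as it is defined by at most $r$ additional affine constraints on $V_2$), and it is non-empty (since $x_0 \in U$). By Fact 2.4, it follows that $\dim(U) \ge n - 2r - r = n - 3r$. Further, for every $x \in U$ and $j \in [r]$, it holds that $\ell_j(x) = 0$ and $1_{A_j}(x) = \alpha_j$, which completes the proof.
Proof of Claim 3.4. Consider the map $\Gamma : U \to \{0, 1\}^r$ defined by $\Gamma(x) = (q_1(x), \ldots, q_r(x))$. For every $\alpha \in \{0, 1\}^r$, let $\Gamma_\alpha = \{x_1 + x_2 : x_1, x_2 \in \Gamma^{-1}(\alpha)\}$. Observe that for every $\alpha$, $\Gamma_\alpha \subseteq U_0$. Furthermore, for every $v \in \Gamma_\alpha$, the set of affine equations in unknowns $x$,
$$\{\Delta_v(q_j)(x) = 0\}_{j \in [r]},$$
has a solution in $U$. Indeed, $v = x_1 + x_2$ where $x_1, x_2 \in \Gamma^{-1}(\alpha)$, and thus $q_j(x_1) = q_j(x_2) = q_j(x_1 + v)$ for every $j \in [r]$, which implies that $x_1$ is a solution.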

A Reduction from Threshold Functions to Majority
Let $0 < p < 1/2$. Recall that $\mathrm{THR}_{pn}(x)$ equals 1 if $|x| \ge pn$ and 0 otherwise. In this section we prove that given any function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ that approximates $\mathrm{THR}_{pn}$ with respect to $B(n, p)$, we can find a function $g$, which is a restriction of $f$ to $2pn$ random coordinates, which approximates the majority function on those bits with respect to the uniform distribution.
In anticipation of the next section, when considering approximations for THR pn we will work with a slightly different notion of approximation than approximation with respect to B(n, p), which we now explain.
Let $L_k = \{x \in \mathbb{F}_2^n : |x| = k\}$ denote the $k$-th layer of the boolean cube. We say that a function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ is $\varepsilon$-wrong on $L_k$ (with respect to $\mathrm{THR}_{pn}$) if the fraction of elements $x \in L_k$ such that $f(x) \ne \mathrm{THR}_{pn}(x)$ is more than $\varepsilon$. We say that $f$ $(\varepsilon, \gamma)$-approximates $\mathrm{THR}_{pn}$ if $f$ is $\varepsilon$-wrong on at most a $\gamma$ fraction of the layers $L_k$ for $k \in [pn - 5\sqrt{2pn}, pn + 5\sqrt{2pn}]$. For the rest of the proof we will always set $\varepsilon = \gamma = 0.01$.
Proof. Let $m = 2pn$. For every $D \subseteq [n]$ of size $m$, let $g_D$ be the function obtained from $f$ by fixing all input bits outside of $D$ to 0. It will be convenient to consider $g_D$ as a function whose domain is $\mathbb{F}_2^m$ using some bijection between $D$ and $[m]$. Every $x \in \mathbb{F}_2^n$ which is zero on coordinates outside of $D$ then corresponds to a unique $\tilde{x} \in \mathbb{F}_2^m$, and vice versa. We will now pick $D$ uniformly at random among all subsets of $[n]$ of size $m$, so that $g_D$ is a random restriction of $f$.

Since B(n, p) is heavily concentrated on layers
We say $x \in \mathbb{F}_2^n$ survives $D$ if the set of indices $j \in [n]$ such that $x_j = 1$ is contained in $D$. The probability that $x \in L_k$ survives $D$ is $\binom{m}{k} / \binom{n}{k}$. For an input $x \in \mathbb{F}_2^n$, we say $x$ is correct if $f(x) = \mathrm{THR}_{pn}(x)$, and incorrect otherwise. If $x$ is correct and survives, then $\mathrm{Maj}_m(\tilde{x}) = \mathrm{THR}_{pn}(x) = f(x) = g_D(\tilde{x})$.
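The survival probability can be confirmed by exhaustive enumeration on a small instance. The sketch below counts, over all $m$-subsets $D$ of $[n]$, how many contain the support of a fixed $x \in L_k$, and compares the result with $\binom{m}{k}/\binom{n}{k}$; the parameters and the function name are arbitrary choices for the illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def survival_probability(n, m, support):
    """Exact probability that a uniformly random m-subset D of [n] contains `support`."""
    subsets = list(combinations(range(n), m))
    hits = sum(1 for D in subsets if set(support) <= set(D))
    return Fraction(hits, len(subsets))

n, m, k = 8, 5, 3
support = (0, 2, 5)  # positions of the ones of some x in layer L_k
brute_force = survival_probability(n, m, support)
closed_form = Fraction(comb(m, k), comb(n, k))
```

The two quantities agree because $\binom{n-k}{m-k}/\binom{n}{m} = \binom{m}{k}/\binom{n}{k}$.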
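Let $X_k$ be a random variable which denotes the number of incorrect inputs $x \in L_k$ that survive $D$, and let $X = \sum_k X_k$, where the sum is over $k \in [pn - 5\sqrt{2pn}, pn + 5\sqrt{2pn}]$. By the assumption, for at least a 0.99 fraction of the layers $L_k$ in this range, the number of incorrect $x$'s is at most $0.01\binom{n}{k}$, and thus for each such layer $L_k$ for $k \in [pn - 5\sqrt{2pn}, pn + 5\sqrt{2pn}]$,
$$\mathbb{E}[X_k] \le 0.01 \binom{n}{k} \cdot \binom{m}{k} / \binom{n}{k} = 0.01 \binom{m}{k}.$$
We call such layers good. For the rest of the layers, which we call bad, $\mathbb{E}[X_k] \le \binom{m}{k}$. The number of layers in the range $[pn - 5\sqrt{2pn}, pn + 5\sqrt{2pn}]$ is at most $2 \cdot (5\sqrt{2pn} + 1) + 1 \le 11\sqrt{2pn}$, and thus the number of bad layers is at most $0.01 \cdot 11 \cdot \sqrt{2pn}$. Further, for every $k$, $\binom{m}{k} \le 2^m/\sqrt{m}$. Therefore,
$$\mathbb{E}[X] \le 0.01 \cdot 2^m + 0.11\sqrt{2pn} \cdot \frac{2^m}{\sqrt{m}} = \frac{12}{100} 2^m.$$
In particular, there is some $D_0$ such that the number of incorrect $x$'s in layers $[pn - 5\sqrt{2pn}, pn + 5\sqrt{2pn}]$ that survive $D_0$ is at most $\frac{12}{100} 2^m$. Let $g := g_{D_0}$.
We now claim that $g$ and $\mathrm{Maj}_m$ agree on more than 3/4 of the inputs in $\mathbb{F}_2^m$. First, by the Chernoff bound, the number of vectors $\tilde{x} \in \mathbb{F}_2^m$ whose Hamming weight is not in the range $[m/2 - 5\sqrt{m}, m/2 + 5\sqrt{m}]$ is at most $2e^{-((5/\sqrt{pn})^2 \cdot pn/6)} \cdot 2^m \le \frac{1}{15} 2^m$. On these inputs we have no guarantee. By the choice of $D_0$, the number of $\tilde{x}$'s such that $|\tilde{x}| \in [m/2 - 5\sqrt{m}, m/2 + 5\sqrt{m}]$ and $g(\tilde{x}) \ne \mathrm{Maj}_m(\tilde{x})$ is at most $\frac{12}{100} 2^m$. It follows that $g(\tilde{x}) \ne \mathrm{Maj}_m(\tilde{x})$ on less than $\frac{1}{4} \cdot 2^m$ inputs.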
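The numerical constants in the last step are easy to recheck. The sketch below evaluates the Chernoff tail expression quoted above (the factors of $pn$ cancel, leaving $2e^{-25/6}$) and adds the two error contributions; the variable names are illustrative.

```python
import math

# 2 * exp(-((5 / sqrt(pn))^2 * pn / 6)): the pn factors cancel, leaving 2 * exp(-25/6).
chernoff_tail = 2 * math.exp(-25 / 6)
# Fraction of inputs outside the central window, as bounded in the text.
outside_window = 1 / 15
# Fraction of inputs inside the window on which g may disagree with majority.
inside_window_errors = 12 / 100
total_disagreement = outside_window + inside_window_errors
```

Since `total_disagreement` is below 1/4, the restriction agrees with majority on more than 3/4 of the inputs.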
From here on, $\delta$ will denote a sufficiently small constant, which may depend on $|B\rangle$ and its parameters (i.e., $\delta$ is some function of $p$), but does not depend on $n$. Since we are interested in the case $|B\rangle = |H\rangle$ or $|B\rangle = |R\rangle$, $\delta$ can be taken to be some small universal constant.
For $k \in [n]$, let $m_k := |\alpha^{n-k}\beta^k|$ denote the absolute value of $F_B$ on the $k$-th layer. Let $w_k = m_k^2 = p^k(1 - p)^{n-k}$ and $W_k = \binom{n}{k} w_k$ the total mass on the $k$-th layer, with respect to $B(n, p)$. Let $\eta = |\beta|/|\alpha|$. Observe that by assumption, $0 < \eta < 1$.
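To get a feeling for how the layer masses $W_k$ concentrate, here is a small numerical sketch for $|B\rangle = |H\rangle$, i.e., $p = \sin^2(\pi/8)$. It computes $W_k$ in log space and sums the mass of the layers within $5\sqrt{2pn}$ of $pn$; the value $n = 400$ and the helper name are arbitrary choices for the illustration.

```python
import math

def layer_mass(n, k, p):
    """W_k = C(n, k) * p^k * (1 - p)^(n - k), computed in log space to avoid underflow."""
    log_w = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_w)

p = math.sin(math.pi / 8) ** 2
n = 400
W = [layer_mass(n, k, p) for k in range(n + 1)]
total_mass = sum(W)
# Mass of the layers k with |k - pn| <= 5 * sqrt(2pn).
half_width = 5 * math.sqrt(2 * p * n)
window_mass = sum(W[k] for k in range(n + 1) if abs(k - p * n) <= half_width)
```

For these parameters almost all of the mass lies inside the window, which is the range of layers considered throughout this section.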
We define a boolean function $f_\psi : \mathbb{F}_2^n \to \mathbb{F}_2$ as follows: (4)
The intuition for the definition is that, since $\|\psi - F_B\| \le \delta$, we expect $\psi(x)$ to be very close to $F_B(x)$ for most inputs $x$. For every such $x$, $f_\psi$ will correctly compute $\mathrm{THR}_{pn}$. Further, inputs $x$ such that $f_\psi(x) \ne \mathrm{THR}_{pn}(x)$ correspond to inputs $x$ such that $|\psi(x) - F_B(x)|$ is large. Assuming there are many such $x$'s will lead to a contradiction to the assumption that $\|\psi - F_B\| \le \delta$.

Lemma 4.2.
Let $\psi : \mathbb{F}_2^n \to \mathbb{C}$ be a function such that $\|\psi - F_B\| \le \delta$ for a sufficiently small $\delta$. Let $f_\psi$ be the boolean function defined as in (4). Then $f_\psi$ $(0.01, 0.01)$-approximates $\mathrm{THR}_{pn}$.
We begin with the following calculation.
By assumption, f ψ (x) = 1, which implies that Observe that m k = (η −1 ) pn−1−k m pn−1 ≥ m pn−1 for k ≤ pn − 1, and therefore by the triangle inequality which implies the statement of the claim (for k ≤ pn − 1) by squaring both sides. If k ≥ pn, then THR pn (x) = 1 which implies f ψ (x) = 0, i.e., Note that m k = η k−pn+1 m pn−1 and in particular m k ≤ ηm pn−1 for all k ≥ pn. Thus, which proves the lemma for this case as well.
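We use the following standard estimates on the concentration of the binomial distribution.
Claim 4.4. Let $C$ be a constant and let $k = pn + C\sqrt{n}$. Then $W_k = \Omega(1/\sqrt{n})$, where the constant hidden under the $\Omega$ notation depends on $C$ and $p$, but not on $n$.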
Observe that C in the above claim may be negative. The proof is a direct application of Stirling's approximation. For completeness, we provide a crude estimate which suffices for us in Appendix B.
We are now ready to prove the main lemma of the section.
Proof of Lemma 4.2. Let $k$ be a layer such that $f_\psi$ is 0.01-wrong on $L_k$. By Claim 4.3, Suppose, towards a contradiction, $f_\psi$ is 0.01-wrong on more than a 0.01 fraction of the layers $k \in [pn - 5\sqrt{2pn}, pn + 5\sqrt{2pn}]$, i.e., on more than $0.1\sqrt{2pn}$ layers. By Claim 4.4, for every such $k$, $W_k \ge c/\sqrt{n}$ for some constant $c$ which does not depend on $n$. It follows that which is a contradiction for $\delta < 0.001\sqrt{2p}\, c \left(\frac{1-\eta}{2}\right)^2$.

A Low Degree Polynomial Approximation
In this section we show that for the function f ψ defined as in (4), and for any restriction g D of f ψ as in Lemma 4.1, the function g D has a polynomial approximating it, whose degree is at most O(r log r). To prove this we apply standard approximation techniques used for proving lower bounds for bounded depth circuits with modular gates, although in our case the details are somewhat simpler. We begin with the following lemma that shows how to approximate indicator functions of affine subspaces with low degree polynomials.
where for every j ∈ [r], j is a linear function, q j a quadratic function, and A j an affine subspace.
For every j ∈ [r], let A j , j , q j denote the projection of A j , j , q j respectively, obtained by setting the coordinates outside of D to zero. Observe that A j ⊆ F m 2 is an affine subspace, j an m-variate linear function over F 2 , and q j an m-variate quadratic function over F 2 , and that (note that here v j ∈ {0, 1} is considered as a real number). Then For every j ∈ [r], let P j be a polynomial of degree O(log(r)) such that Prx ∈F m 20r , as guaranteed by Claim 4.5. Note that h is a function on 3r boolean variables, and hence can be represented exactly by a polynomial of degree at most 3r. As the j 's have degree 1 and q j 's degree 2, it follows that is a polynomial of degree O(r log r), and by the union bound

A Lower Bound for Approximate Stabilizer Rank via Correlation Bounds
We now observe that the results of Section 4.1, Section 4.2 and Section 4.3 imply our lower bounds. The final ingredient we require is the following correlation lower bound. Then deg(f ) = Ω( √ m).

A.1 Pauli and Clifford Group
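The Pauli matrices are three $2 \times 2$ complex unitary matrices defined as follows:
$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
These matrices generate a subgroup of $2 \times 2$ matrices of order 16, denoted by $P_1$ and called the single qubit Pauli group, that contains the elements $\{\pm I, \pm iI, \pm X, \pm iX, \pm Y, \pm iY, \pm Z, \pm iZ\}$. The $n$-qubit Pauli group $P_n$ is the group of $n$-fold tensor products of elements of $P_1$.
The Clifford group $C_n$ can now be defined as the normalizer of $P_n$ in the group $U(n)$ of $n$-qubit unitary matrices. It is convenient, however, to consider $C_n$ as a finite group, which is why it is usually defined modulo $U(1)$, i.e., we identify two matrices $U$ and $V$ if $U = cV$ for some $c \in \mathbb{C}$ with $|c| = 1$ ($c$ is called a global phase):
$$C_n := \{U \in U(n) : U P_n U^\dagger = P_n\} / U(1).$$
It turns out that $C_n$ has a set of generators which is very easy to describe. Every $U \in C_n$ can be generated using the following simple set of gates:
$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}.$$
The gate set $\{\mathrm{CNOT}, H, S, T\}$, where
$$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$$
is the so-called $\pi/8$ gate, is universal.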
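The order of the single qubit Pauli group can be confirmed by closing the standard matrices $X$, $Y$, $Z$ under multiplication. The following is a minimal sketch using plain nested tuples; the helper names (`snap`, `mat`, `generated_group`) are hypothetical and introduced only for this illustration.

```python
def snap(z):
    """Round a complex number with integer real and imaginary parts to canonical form."""
    return complex(round(z.real), round(z.imag))

def mat(rows):
    """Build a hashable 2x2 matrix with canonical entries."""
    return tuple(tuple(snap(complex(entry)) for entry in row) for row in rows)

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(snap(a[i][0] * b[0][j] + a[i][1] * b[1][j]) for j in range(2))
        for i in range(2)
    )

# The standard Pauli matrices.
X = mat([[0, 1], [1, 0]])
Y = mat([[0, -1j], [1j, 0]])
Z = mat([[1, 0], [0, -1]])

def generated_group(generators):
    """Close a finite set of matrices under right multiplication by the generators."""
    group = set(generators)
    frontier = list(generators)
    while frontier:
        g = frontier.pop()
        for h in generators:
            product = matmul(g, h)
            if product not in group:
                group.add(product)
                frontier.append(product)
    return group

pauli_group = generated_group([X, Y, Z])
identity = mat([[1, 0], [0, 1]])
```

Because the group is finite, closing under products of the generators already yields inverses, so no separate inversion step is needed.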

A.2 Magic States
As explained in Section 1.1, any circuit over the (universal) gate set $\{\mathrm{CNOT}, H, S, T\}$ can be converted to a circuit of roughly the same size with only Clifford gates, which is given as additional inputs an ample supply of qubits in a magic state. The two types of magic states defined by Bravyi and Kitaev [6] are $|H\rangle = \cos(\pi/8)|0\rangle + \sin(\pi/8)|1\rangle$, and $|R\rangle = \cos(\beta)|0\rangle + e^{i\pi/4}\sin(\beta)|1\rangle$, where $\beta = \arccos(1/\sqrt{3})/2$. We say two $n$-qubit states $\psi$ and $\varphi$ are Clifford-equivalent if $|\psi\rangle = U|\varphi\rangle$ for $U \in C_n$. Up to a phase, the state $|H\rangle$ is Clifford-equivalent to the state $|T\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i\pi/4}|1\rangle)$ (see [7]), and thus Clifford circuits provided with $|H^{\otimes n}\rangle$ as auxiliary inputs have the same computational power as Clifford circuits provided with $|T^{\otimes n}\rangle$.
By Stirling's approximation, $\binom{n}{pn}$ is, up to a constant factor,
$$\frac{\sqrt{2\pi n} \cdot (n/e)^n}{\sqrt{2\pi(pn)}\,(pn/e)^{pn} \cdot \sqrt{2\pi(1-p)n} \cdot ((1-p)n/e)^{(1-p)n}}.$$
Thus, $W_{pn} = \Omega(1/\sqrt{n})$, where the constant hidden under the $\Omega$ notation depends on $p$. Now, for $C > 0$, we will show that $W_{pn}/W_{pn+C\sqrt{n}}$ $(pn + C\sqrt{n}) \cdots (pn + 1)$ The last term is bounded by a constant, as A similar calculation works when $C < 0$.