Stabilizer extent is not multiplicative

The Gottesman-Knill theorem states that a Clifford circuit acting on stabilizer states can be simulated efficiently on a classical computer. Recently, this result has been generalized to cover inputs that are close to a coherent superposition of logarithmically many stabilizer states. The runtime of the classical simulation is governed by the stabilizer extent, which roughly measures how many stabilizer states are needed to approximate the state. An important open problem is to decide whether the extent is multiplicative under tensor products. An affirmative answer would yield an efficient algorithm for computing the extent of product inputs, while a negative answer would imply the existence of more efficient classical algorithms for simulating large-scale quantum circuits. Here, we answer this question in the negative. Our result follows from very general properties of the set of stabilizer states, such as having a size that scales subexponentially in the dimension, and can thus be readily adapted to similar constructions for other resource theories.


INTRODUCTION AND SUMMARY OF RESULTS
In the model of quantum computation with magic states [1], stabilizer circuits, whose computational power is limited by the Gottesman-Knill theorem [2,3], are promoted to universality by implementing non-Clifford gates via the injection of magic states. There has been a long line of research with the goal of designing classical algorithms to simulate such circuits: Quasiprobability-based methods [4][5][6][7][8] work on the level of density operators. The starting point is the observation that the (qudit) Wigner function [9] of stabilizer states is given by a probability distribution on phase space and thus gives rise to a classical model. Similar to the quantum Monte Carlo method of many-body physics, one can then devise randomized simulation algorithms whose runtime scales with an appropriate "measure of negativity" of more general input states.
Stabilizer rank methods [10,11], on the other hand, work with vectors in Hilbert space. The idea is to expand general input vectors as a coherent superposition of stabilizer states. The smallest number of stabilizer states required to express a given vector in this way is its stabilizer rank. Bravyi, Smith, and Smolin [10] proposed a fast simulation algorithm whose runtime scales with the stabilizer rank; Bravyi and Gosset [12] developed this technique further into a simulation procedure whose runtime scales with the size of approximate stabilizer decompositions.
No efficient methods are known for computing the stabilizer rank analytically or numerically. To address this issue, Bravyi et al. [13] introduced a computationally better-behaved convex relaxation: the stabilizer extent (see Definition 1). The central sparsification lemma of [13] states that a stabilizer decomposition with small extent can be transformed into a sparse decomposition that is close to the original state. In this way, the stabilizer extent defines an operational measure for the degree of "non-stabilizerness". We work in a slightly more general setting than [13], where the role of the stabilizer states is played by a finite set D ⊂ C^d which spans C^d, referred to as a dictionary.

Definition 1 ([13]). Let D ⊂ C^d be a finite set of vectors spanning C^d. For an element x ∈ C^d, the extent of x with respect to D is defined as

ξ_D(x) = min { ‖c‖₁² : x = Σ_{s∈D} c_s s },

where ‖c‖₁ = Σ_{s∈D} |c_s|. If d = 2^n and D = STAB_n is the set of stabilizer states, then ξ_D(x) is the stabilizer extent of x, and the notation is shortened to ξ(x).
As is widely known, ℓ1-minimizations such as ξ_D can be formulated as convex optimization problems (see for example [14]). In the complex case, this is a second-order cone problem [15], whose complexity scales polynomially in the dimension d. In particular, the complexity of determining the stabilizer extent of an arbitrary vector, ξ(x), scales exponentially in the number of qubits. Thus, the question arises whether it is possible to simplify the computation of ξ_D for certain inputs, e.g. product states of the form ψ = ⊗_j ψ_j.
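To make the ℓ1-minimization of Definition 1 concrete, the following sketch computes the (un-squared) quantity min ‖c‖₁ for a tiny real-valued dictionary; the extent of Definition 1 is the square of this value. It exploits the standard LP fact that an optimal solution can be chosen among basic solutions supported on d linearly independent dictionary elements, so brute-force enumeration suffices for small examples. The dictionary and input vectors are invented for illustration and are not stabilizer states.

```python
from itertools import combinations
import numpy as np

def l1_extent(dictionary, x):
    """Minimize sum_s |c_s| subject to x = sum_s c_s * s by enumerating the
    basic solutions of the underlying LP (feasible only for tiny dictionaries)."""
    D = np.column_stack(dictionary)          # d x m matrix of dictionary vectors
    d, m = D.shape
    best = np.inf
    for idx in combinations(range(m), d):    # candidate supports of size d
        sub = D[:, idx]
        if abs(np.linalg.det(sub)) < 1e-12:  # skip singular supports
            continue
        c = np.linalg.solve(sub, x)
        best = min(best, np.abs(c).sum())
    return best

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
h = (e0 + e1) / np.sqrt(2)                   # a third, "diagonal" dictionary word
D = [e0, e1, h]

# For x = h itself, the best decomposition is the single word h: l1 weight 1.
print(l1_extent(D, h))                       # close to 1.0
```

Note that adding the word h strictly helps: decomposing h over {e0, e1} alone would cost ‖c‖₁ = √2.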
Since the set of stabilizer states is closed under taking tensor products, one can easily see that the stabilizer extent is submultiplicative, that is,

ξ(⊗_j ψ_j) ≤ ∏_j ξ(ψ_j)

for any input state ⊗_j ψ_j. Bravyi et al. proved that it is actually multiplicative if the factors are composed of 1-, 2-, or 3-qubit states.
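Submultiplicativity can be checked numerically in a small real-valued toy model (the dictionary and input are invented for illustration; the helper enumerates basic LP solutions as above):

```python
from itertools import combinations
import numpy as np

def l1_extent(dictionary, x):
    """Min sum_s |c_s| s.t. x = sum_s c_s s, via basic solutions of the LP."""
    D = np.column_stack(dictionary)
    d, m = D.shape
    best = np.inf
    for idx in combinations(range(m), d):
        sub = D[:, idx]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue
        c = np.linalg.solve(sub, x)
        best = min(best, np.abs(c).sum())
    return best

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
h = (e0 + e1) / np.sqrt(2)
D = [e0, e1, h]
x = np.array([0.6, 0.8])

single = l1_extent(D, x)
# Product dictionary {s1 (x) s2 : s1, s2 in D} and product input x (x) x:
D_prod = [np.kron(s1, s2) for s1 in D for s2 in D]
product = l1_extent(D_prod, np.kron(x, x))

# Submultiplicativity: the product extent never exceeds the product of extents,
# because tensoring two decompositions yields a feasible product decomposition.
assert product <= single ** 2 + 1e-9
```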
Our main result is that stabilizer extent is not multiplicative in general. In fact, our result does not depend on the detailed structure of stabilizer states, but holds for fairly general families of dictionaries. The properties used (prime among them, that the size of the dictionaries scales subexponentially with the Hilbert space dimension) are listed as Properties (i) to (v) in the following theorem.

Theorem 1. Let (D_n) be a sequence of dictionaries with D_n ⊂ (C^{d_0})^{⊗n}. Assume that (D_n) satisfies the following properties:

(i) Normalization: ⟨s, s⟩ = 1 for all s ∈ D_n.
(ii) Subexponential size: |D_n| = 2^{o(d_0^{n/2})} (for instance, |STAB_n| = 2^{O(n²)}).

(iii) Closed under complex conjugation: if s ∈ D_n, then s* ∈ D_n.

(iv) Closed under tensor products: if s ∈ D_m and s′ ∈ D_n, then s ⊗ s′ ∈ D_{m+n}.
(v) Contains the maximally entangled state: For every n, the maximally entangled state

Φ = d_0^{-n/2} Σ_k e_k ⊗ e_k

is contained in the dictionary D_{2n}. Here, {e_k} is the standard ("computational") basis of (C^{d_0})^{⊗n}.
Let ψ be an n-qudit Haar-random state. Then, with probability converging to one as n → ∞,

ξ_{D_{2n}}(ψ ⊗ ψ*) < ξ_{D_n}(ψ) ξ_{D_n}(ψ*).

In particular, for sufficiently large n, the extent with respect to the dictionary sequence (D_n) is strictly submultiplicative.
Note that the main theorem also implies that other magic monotones recently defined in [11] (mixed state extent, dyadic negativity, and generalized robustness) fail to be multiplicative, since they all coincide with the stabilizer extent on pure states [16].
The remainder of the paper is organized as follows: In Section 2, we outline the geometric intuition behind the argument. The rigorous proof is given in Section 3. As an auxiliary result, we present an optimality condition on stabilizer extent decompositions in Section 4.

PROOF STRATEGY
In this section, we explain the geometric intuition behind the main result. To simplify the exposition, we present a version of the argument for real vector spaces.
We recall the convex geometry underlying the problem. In the real case, the extent can be formalized as a linear program:

ξ_D(x) = min { Σ_{s∈D} |c_s| : x = Σ_{s∈D} c_s s }

(in this real-valued sketch, we drop the square appearing in Definition 1). Using standard techniques for linear programming, see for example [17], one can dualize the program:

ξ_D(x) = max { x^⊤ y : |s^⊤ y| ≤ 1 for all s ∈ D },

where x^⊤ y = Σ_j x_j y_j denotes the inner product on R^d. Let

M_D = { y ∈ R^d : |s^⊤ y| ≤ 1 for all s ∈ D }

be the region of feasible points for the dual program. Since D is finite and contains a spanning set of R^d, the set M_D is a polytope. The dual formulation implies that for each x, there exists a witness y among the vertices of M_D such that ξ_D(x) = x^⊤ y. Conversely, with each vertex y ∈ M_D, one can associate the set of primal vectors x for which y is a witness:

C_y = { x ∈ R^d : ξ_D(x) = x^⊤ y }.

It is easy to see that the C_y are full-dimensional convex cones that partition R^d as y ranges over the vertices of M_D (see Figure 1 for an illustration). The cones C_y are called normal cones and the induced partition of R^d is referred to as the normal fan of M_D, see for example [18]. For x ∈ R^d, define the fidelity of x with respect to D as the maximal squared overlap of x with an element in D,

F_D(x) = max_{s∈D} |s^⊤ x|²

(the value √F_D(x) can also be viewed as an ℓ∞-norm of x with respect to D).
These notions allow us to analyze how the extent of a vector x changes when a word w is added to the dictionary D (in the proof below, we will track the extent when the maximally entangled state is added to a product dictionary). Indeed, if x is contained in the interior of some C_y, and if |w^⊤ y| > 1, then the vertex y is infeasible for the dual program with respect to the dictionary D ∪ {w} (i.e., y ∉ M_{D∪{w}}), and therefore ξ_{D∪{w}}(x) < ξ_D(x). Now, the argument of the proof of the main theorem proceeds in two steps:

(1) Assume x is chosen Haar-randomly from the unit sphere in R^d. Almost surely, there will be a unique witness y, i.e., x will lie in the interior of some normal cone C_y for some vertex y of M_D. Moreover, the norm of y is large with high probability,

‖y‖₂ ≳ √d.

To see why the latter holds, note that

‖y‖₂ ≥ x^⊤ y = ξ_D(x) ≥ x^⊤ (x / √F_D(x)) = 1 / √F_D(x),

where the second inequality follows because x/√F_D(x) ∈ M_D is feasible for the dual (as realized in [13]). A standard concentration-of-measure argument (as in [13], proof of Claim 2) shows that if |D| is not too large, the maximal inner product squared of x with any element of D will be close to the expected inner product squared with any fixed unit vector v, which is E|x^⊤ v|² = 1/d. Hence F_D(x) ≈ 1/d and the claim follows.
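The concentration heuristic in step (1) is easy to probe numerically in the real case (the dimension, dictionary size, and seed below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 64, 4000

# Haar-random unit vectors in R^d: normalized standard Gaussians.
xs = rng.standard_normal((n_samples, d))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)

# The expected squared overlap with a fixed unit vector v is exactly 1/d.
v = np.zeros(d)
v[0] = 1.0
mean_overlap_sq = np.mean((xs @ v) ** 2)
print(mean_overlap_sq)        # close to 1/64

# A "dictionary" of 200 random unit vectors: the maximal squared overlap
# F_D(x) stays far below 1 when |D| is small compared to exp(d).
D = rng.standard_normal((200, d))
D /= np.linalg.norm(D, axis=1, keepdims=True)
F = np.max((D @ xs[0]) ** 2)
print(F)                      # much smaller than 1
```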
Figure 1. The polytope M_D for the dictionary D = {s_1, s_2} ⊂ S¹ and the normal cone C_{y_2} of the vertex y_2. The active inequalities at y_2 yield the extreme rays of C_{y_2}.
(2) Now consider x ⊗ x. With respect to the product dictionary D ⊗ D, one easily finds

ξ_{D⊗D}(x ⊗ x) = ξ_D(x)²,

and that y ⊗ y is a unique witness and a vertex of M_{D⊗D}. If Φ = d^{-1/2} Σ_k e_k ⊗ e_k is the maximally entangled state, then

Φ^⊤ (y ⊗ y) = ‖y‖₂² / √d ≳ d / √d = √d > 1.

Thus adding Φ to the dictionary means that y ⊗ y becomes dually infeasible (i.e., y ⊗ y ∉ M_{D⊗D ∪ {Φ}}). It follows that the extent of x ⊗ x (in fact, the extent of any element in the interior of C_{y⊗y}) decreases if Φ is added.
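The key identity behind step (2), Φ^⊤(y ⊗ y) = ‖y‖₂²/√d, can be verified directly (real case; the test vector is arbitrary):

```python
import numpy as np

d = 5
rng = np.random.default_rng(1)
y = rng.standard_normal(d)

# Maximally entangled state Phi = d^{-1/2} * sum_k e_k (x) e_k.
phi = np.eye(d).flatten() / np.sqrt(d)

lhs = phi @ np.kron(y, y)     # overlap of Phi with y (x) y
rhs = (y @ y) / np.sqrt(d)    # ||y||^2 / sqrt(d)
assert abs(lhs - rhs) < 1e-12
```

So the overlap of y ⊗ y with Φ grows with the squared norm of y, which is exactly why a long dual witness becomes infeasible once Φ joins the dictionary.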

PROOF OF THE MAIN THEOREM
In preparation for proving the main theorem, we translate the convex geometry of ℓ1-minimization from the real case (sketched in the previous section) to the case of complex vector spaces. This problem has been treated before in various places in the literature, including in [10], in the context of the theory of compressed sensing (e.g. [19]), and in greater generality in the convex optimization literature (e.g. [20]). As we are not aware of a reference that gives a concise account of all the statements required, we present self-contained proofs in Appendix A.
We will use the superscripts R and I to denote, respectively, the real and imaginary part of a vector. The extent then has the following dual formulation (cf. Appendix A):

ξ_D(ψ) = max { (⟨ψ, y⟩^R)² : F_D(y) ≤ 1 },

where F_D(y) = max_{s∈D} |⟨s, y⟩|² and ⟨s, y⟩ := Σ_{j=1}^d s_j* y_j denotes the inner product on C^d. Let

M_D = { y ∈ C^d : |⟨s, y⟩| ≤ 1 for all s ∈ D }

be the set of feasible points for the dual. In contrast to the real case, M_D is not a polytope, but it is still a bounded convex set (viewed as a subset of R^{2d}; for an explanation, see Appendix A). Thus, by Krein-Milman, M_D is the convex hull of its extreme points, which can be characterized as follows (Appendix A contains a proof):

Proposition 2. A point y ∈ M_D is an extreme point of M_D if and only if the set A_y = { s ∈ D : |⟨s, y⟩| = 1 } spans C^d.

With every extreme point y, we associate the normal cone

C_y = cone{ e^{iφ} s : s ∈ D, φ ∈ [0, 2π), ⟨e^{iφ} s, y⟩ = 1 }. (3.1)

A final preparation step invokes complementary slackness (Appendix A contains a proof):

Lemma 3 (Complementary slackness conditions). Let ψ = Σ_{s∈D} c_s s be an optimal extent decomposition with respect to D and let y ∈ M_D be an optimal dual witness, i.e., ψ ∈ C_y and Σ_{s∈D} |c_s| = ⟨ψ, y⟩^R. Then, we have the following two conditions:

(A1) If c_s ≠ 0, then ⟨s, y⟩ = c_s / |c_s|.

(A2) If |⟨s, y⟩| < 1, then c_s = 0.
The complementary slackness conditions have the following two consequences: First, assume that ψ = Σ_{s∈D} c_s s is an optimal decomposition and that y ∈ C^d is optimal for the dual. From condition (A1), we obtain

⟨ψ, y⟩ = Σ_{s∈D} c_s* ⟨s, y⟩ = Σ_{s∈D} |c_s|,

so we can rewrite the dual program for the extent as

ξ_D(ψ) = max { |⟨ψ, y⟩|² : y ∈ M_D }, (3.2)

which coincides with the dual formulation given in [13]. Since ψ/√F_D(ψ) is feasible for the dual, we get the natural lower bound [13]

ξ_D(ψ) ≥ F_D(ψ)^{-1}. (3.3)

Secondly, if a state ψ is chosen Haar-randomly, the optimal dual witness y for ξ_D(ψ) is a unique extreme point of M_D with probability one, because of the following observation: A generic ψ will not be contained in a proper subspace spanned by elements of D, since the finite collection of all these lower-dimensional subspaces has measure zero. Thus, generically, if we expand ψ = Σ_{s∈D} c_s s in the dictionary D, the set {s ∈ D : c_s ≠ 0} has to span C^d. Therefore, the solution of the system of linear equations induced by condition (A1) of Lemma 3 is uniquely attained at an extreme point y of M_D. Note that such ψ's are also called non-degenerate in convex optimization [15].
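The lower bound (3.3) can likewise be probed in the real toy model, with ξ_D taken as the squared ℓ1-minimum as in Definition 1 (dictionary and input invented for illustration; the helper enumerates basic LP solutions):

```python
from itertools import combinations
import numpy as np

def l1_extent(dictionary, x):
    """Min sum_s |c_s| s.t. x = sum_s c_s s, via basic solutions of the LP."""
    D = np.column_stack(dictionary)
    d, m = D.shape
    best = np.inf
    for idx in combinations(range(m), d):
        sub = D[:, idx]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue
        c = np.linalg.solve(sub, x)
        best = min(best, np.abs(c).sum())
    return best

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
h = (e0 + e1) / np.sqrt(2)
D = [e0, e1, h]
x = np.array([0.6, 0.8])                  # a unit vector

xi = l1_extent(D, x) ** 2                 # extent = squared l1 minimum
F = max(float(s @ x) ** 2 for s in D)     # fidelity F_D(x)
assert xi >= 1.0 / F - 1e-9               # the bound xi_D(x) >= 1 / F_D(x)
```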
Analogously to the case of a normal cone in a real-valued vector space, we identify the interior int(C_y) of a normal cone C_y simply with all points ψ whose dual witness is a unique extreme point y. Note that this means that there exists an optimal extent decomposition

ψ = Σ_{s∈D} c_s s = Σ_{s∈D} α_s e^{iφ_s} s,

such that α_s ≥ 0, c_s = α_s e^{iφ_s}, e^{iφ_s} s ∈ C_y whenever α_s > 0, and {s ∈ D : c_s ≠ 0} spans C^d.
With the above notion, we are able to describe how the extent is affected by adding a word w to the dictionary D. As in the case of a real-valued vector space, an extreme point y ∈ M_D becomes dually infeasible if |⟨w, y⟩| > 1 (i.e., y ∉ M_{D∪{w}}). Hence, the extent of an element x decreases if y is the unique dual witness of x, that is, x ∈ int(C_y). In summary, we get the following theorem:

Theorem 4. Let D ⊂ C^d be a dictionary and let w ∈ C^d with ⟨w, w⟩ = 1. Let D′ = D ∪ {w}. Then ξ_{D′}(x) < ξ_D(x) if and only if x ∈ int(C_y) for an extreme point y ∈ M_D with |⟨w, y⟩| > 1.
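The mechanism of Theorem 4 can be illustrated in the real toy model of the proof sketch (all vectors invented for illustration; the helper enumerates basic LP solutions):

```python
from itertools import combinations
import numpy as np

def l1_extent(dictionary, x):
    """Min sum_s |c_s| s.t. x = sum_s c_s s, via basic solutions of the LP."""
    D = np.column_stack(dictionary)
    d, m = D.shape
    best = np.inf
    for idx in combinations(range(m), d):
        sub = D[:, idx]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue
        c = np.linalg.solve(sub, x)
        best = min(best, np.abs(c).sum())
    return best

e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x = np.array([1.0, 1.0]) / np.sqrt(2)

# For D = {e0, e1}, the dual witness of x is the vertex y = (1, 1) of M_D.
y = np.array([1.0, 1.0])
before = l1_extent([e0, e1], x)          # equals x . y = sqrt(2)

# Add the word w = x itself: |w . y| = sqrt(2) > 1, so y becomes dually
# infeasible and the extent of x strictly decreases.
w = x
after = l1_extent([e0, e1, w], x)        # now 1, since x is itself a word
assert after < before
```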
In order to analyze the multiplicativity properties of the extent for product inputs, we now turn our attention to product dictionaries. The argument starts with the observation that extreme points of M_D are closed under taking tensor products. That is, if y_1, y_2 are extreme points of the dually feasible sets M_{D_j} ⊂ C^{d_j} for two dictionaries D_1 and D_2, then y_1 ⊗ y_2 is an extreme point of M_{D_1⊗D_2}, where D_1 ⊗ D_2 ⊂ C^{d_1} ⊗ C^{d_2} is the product dictionary. Indeed, y_1 ⊗ y_2 ∈ M_{D_1⊗D_2}, and the set

{ s_1 ⊗ s_2 : s_1 ∈ A_{y_1}, s_2 ∈ A_{y_2} } ⊂ A_{y_1⊗y_2}

is a spanning set of C^{d_1} ⊗ C^{d_2}, so Proposition 2 applies. Moreover, by the characterization of the normal cone (3.1), it follows immediately that the normal cone of y_1 ⊗ y_2 has the form

C_{y_1⊗y_2} = cone{ e^{iφ_{s_1}} s_1 ⊗ e^{iφ_{s_2}} s_2 : e^{iφ_{s_j}} s_j ∈ C_{y_j}, j = 1, 2 }. (3.4)
This allows us to derive the following multiplicativity property of product dictionaries:

Lemma 5. Consider two dictionaries D_j ⊂ C^{d_j} and extreme points y_j ∈ M_{D_j}, j = 1, 2. Then C_{y_1} ⊗ C_{y_2} ⊂ C_{y_1⊗y_2} and int(C_{y_1}) ⊗ int(C_{y_2}) ⊂ int(C_{y_1⊗y_2}).
Proof. We will prove C_{y_1} ⊗ C_{y_2} ⊂ C_{y_1⊗y_2}; the statement int(C_{y_1}) ⊗ int(C_{y_2}) ⊂ int(C_{y_1⊗y_2}) can be proven analogously. Let ψ_j ∈ C_{y_j}, so

ψ_j = Σ_{s∈D_j} α_s^j e^{iφ_s^j} s,

where α_s^j ≥ 0 and, if α_s^j is positive, then e^{iφ_s^j} s ∈ C_{y_j}. Thus,

ψ_1 ⊗ ψ_2 = Σ_{s_1∈D_1} Σ_{s_2∈D_2} α_{s_1}^1 α_{s_2}^2 (e^{iφ_{s_1}^1} s_1) ⊗ (e^{iφ_{s_2}^2} s_2) ∈ C_{y_1⊗y_2}

by Equation (3.4).
In order to prove multiplicativity, it suffices to observe that, by the definition of the normal cone and the extent formulation (3.2), for ψ_j ∈ C_{y_j},

ξ_{D_1⊗D_2}(ψ_1 ⊗ ψ_2) = |⟨ψ_1 ⊗ ψ_2, y_1 ⊗ y_2⟩|² = |⟨ψ_1, y_1⟩|² |⟨ψ_2, y_2⟩|² = ξ_{D_1}(ψ_1) ξ_{D_2}(ψ_2).

Using the above lemma and the generic uniqueness of the dual witness y, we are now able to prove our main theorem. We subdivide the proof into two parts, where the first part is an adaptation of Claim 2 in [13] to the class of dictionaries defined in Theorem 1:

Proposition 6. Assume that the dictionary sequence (D_n) with D_n ⊂ (C^{d_0})^{⊗n} satisfies the assumptions of Theorem 1. Then, for a Haar-randomly chosen unit vector ψ ∈ (C^{d_0})^{⊗n} and any fixed ε > 0, it holds that

Pr[ F_{D_n}(ψ) ≥ (√(d_0^n) + ε)^{-1} ] → 0 as n → ∞.

In particular, F_{D_n}(ψ) ≤ (√(d_0^n) + ε)^{-1} for sufficiently large n and a typical unit vector ψ ∈ (C^{d_0})^{⊗n}.

Proof. We fix a unit vector ω ∈ (C^{d_0})^{⊗n} and choose a Haar-random unit vector ψ ∈ (C^{d_0})^{⊗n}. Following the proof of Claim 2 in [13], we can bound the probability of the event {|⟨ω, ψ⟩|² ≥ x} by

Pr[ |⟨ω, ψ⟩|² ≥ x ] = (1 - x)^{d_0^n - 1}.

If we set x = (√(d_0^n) + ε)^{-1} for ε > 0 and use Properties (i) and (ii), we can use a union bound to estimate the fidelity of ψ with respect to D_n by

Pr[ F_{D_n}(ψ) ≥ x ] ≤ |D_n| (1 - x)^{d_0^n - 1},

which converges to zero as n tends to infinity.
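The exact tail formula Pr[|⟨ω, ψ⟩|² ≥ x] = (1 − x)^{d−1} for a Haar-random unit vector ψ ∈ C^d can be checked by sampling (dimension, threshold, and seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
d, n_samples, x = 4, 20000, 0.3

# Haar-random unit vectors in C^d: normalized complex Gaussians.
psi = rng.standard_normal((n_samples, d)) + 1j * rng.standard_normal((n_samples, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)

# Squared overlap with the fixed unit vector omega = e_0.
overlap_sq = np.abs(psi[:, 0]) ** 2

empirical = np.mean(overlap_sq >= x)
exact = (1 - x) ** (d - 1)    # the (1 - x)^{d-1} tail used in Proposition 6
assert abs(empirical - exact) < 0.02
```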
The proposition shows that randomly chosen unit vectors generically have only a small overlap with the elements of the dictionary sequence. Starting from there, we proceed with the proof of the main theorem.
Proof of Theorem 1. Let ψ ∈ (C^{d_0})^{⊗n} be a unit vector satisfying F_{D_n}(ψ) ≤ (√(d_0^n) + ε)^{-1} for some ε > 0. Due to Proposition 6, this holds for a typical ψ and sufficiently large n. As a consequence of (3.3), we can lower bound the extent of ψ by

ξ_{D_n}(ψ) ≥ F_{D_n}(ψ)^{-1} ≥ √(d_0^n) + ε.

Let y ∈ M_{D_n} be an optimal dual witness, so ψ ∈ C_y. As pointed out earlier, we can further assume that y is an extreme point of M_{D_n} and that ψ ∈ int(C_y) generically. Applying Cauchy-Schwarz, we get a lower bound on the norm of y by

‖y‖² ≥ |⟨ψ, y⟩|² = ξ_{D_n}(ψ) ≥ √(d_0^n) + ε. (3.5)

Now consider ψ ⊗ ψ*. Assumption (iii) ensures that ξ_{D_n}(ψ) = ξ_{D_n}(ψ*) and ψ* ∈ int(C_{y*}). The proof of Lemma 5 tells us that the extreme point y ⊗ y* of M_{D_n⊗D_n} is optimal for

ξ_{D_n⊗D_n}(ψ ⊗ ψ*) = ξ_{D_n}(ψ) ξ_{D_n}(ψ*).

Moreover, it is the unique optimizer, as ψ ⊗ ψ* ∈ int(C_y) ⊗ int(C_{y*}) ⊂ int(C_{y⊗y*}).
Next, we add the maximally entangled state Φ to the dictionary and observe

ξ_{D_n⊗D_n ∪ {Φ}}(ψ ⊗ ψ*) ≤ ξ_{D_n⊗D_n}(ψ ⊗ ψ*),

since D_n ⊗ D_n ⊂ D_n ⊗ D_n ∪ {Φ}. The norm estimate (3.5) of y yields

|⟨Φ, y ⊗ y*⟩| = ‖y‖² / √(d_0^n) ≥ (√(d_0^n) + ε) / √(d_0^n) > 1,

therefore y ⊗ y* is not contained in the set of dually feasible points M_{D_n⊗D_n ∪ {Φ}} of the dictionary D_n ⊗ D_n ∪ {Φ}. Since ψ ⊗ ψ* ∈ int(C_{y⊗y*}), we can apply Theorem 4 to obtain

ξ_{D_n⊗D_n ∪ {Φ}}(ψ ⊗ ψ*) < ξ_{D_n⊗D_n}(ψ ⊗ ψ*).
To conclude, because of (iv) and (v),

ξ_{D_{2n}}(ψ ⊗ ψ*) ≤ ξ_{D_n⊗D_n ∪ {Φ}}(ψ ⊗ ψ*) < ξ_{D_n⊗D_n}(ψ ⊗ ψ*) = ξ_{D_n}(ψ) ξ_{D_n}(ψ*),

which proves the desired result.

AN OPTIMALITY CONDITION FOR THE STABILIZER EXTENT
In this section, we fix the dictionary to be the set STAB_n of n-qubit stabilizer states and derive a condition on optimal stabilizer extent decompositions. (While preparing this document, we learned that this fact has been observed earlier [21], but does not seem to have been published.)
Let P_n = { W_1 ⊗ · · · ⊗ W_n : W_i ∈ {I, X, Y, Z} } be the set of n-qubit Pauli matrices. The set of stabilizer states can be decomposed into a disjoint union of orthonormal bases, where each basis is labeled by a maximally commuting set S ⊂ P_n of Pauli matrices (see [22], Chapter 10, or [9, 23] for details). The projectors onto the basis elements can be written as

ss† = 2^{-n} Σ_{σ∈S} (-1)^{k_σ} σ,

where k_σ ∈ {0, 1} has to be chosen in such a way that {(-1)^{k_σ} σ : σ ∈ S} is a closed matrix group with 2^n elements.
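For a single qubit, the projector formula can be checked explicitly: the maximally commuting set S = {I, Z} with all signs positive generates the group {I, Z} and yields the projector onto e_0 (a minimal sketch; the analogous choice S = {I, X} gives the projector onto the +1 eigenstate of X):

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# ss^dagger = 2^{-n} sum_{sigma in S} (-1)^{k_sigma} sigma, here with n = 1.
proj_e0 = (I + Z) / 2           # S = {I, Z}, all k_sigma = 0
proj_plus = (I + X) / 2         # S = {I, X}, all k_sigma = 0

e0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

assert np.allclose(proj_e0, np.outer(e0, e0.conj()))
assert np.allclose(proj_plus, np.outer(plus, plus.conj()))
# Flipping the sign of Z (k_Z = 1) gives the other basis element e_1.
assert np.allclose((I - Z) / 2, np.diag([0.0, 1.0]).astype(complex))
```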
Theorem 7. Let ψ be an n-qubit state. Suppose that ψ = Σ_s c_s s is an optimal stabilizer extent decomposition, that is, ξ(ψ) = (Σ_{s∈STAB_n} |c_s|)². Then there is at most one non-zero coefficient c_s among the stabilizer states s belonging to any given orthonormal basis.
For the proof of the theorem, we will make use of the Clifford group C_n. For our purposes, this is the group of unitaries that preserve the set STAB_n, i.e., if U ∈ C_n, then Us ∈ STAB_n for all s ∈ STAB_n (more details can be found in [9]).
Optimal stabilizer extent decompositions are invariant under the Clifford group, that is, if ψ = Σ c_s s is optimal for ψ and U belongs to the Clifford group, then Uψ = Σ_{s∈STAB_n} c_s (Us) is optimal for Uψ. Since the Clifford group acts transitively on orthonormal stabilizer bases (independently of the number of qubits), an optimal extent decomposition of a 1-qubit state will never have two non-zero coefficients within any of the three orthonormal stabilizer bases. Thus, we have shown the result for the 1-qubit case.
For the n-qubit case, assume that ψ = Σ c_s s is a stabilizer decomposition with c_s c_{s′} ≠ 0 for two stabilizer states s, s′ ∈ STAB_n belonging to the same orthonormal basis. Due to the invariance of ξ under the Clifford group and its transitive action on orthonormal stabilizer bases, we may choose any orthonormal stabilizer basis. By possibly applying another Clifford unitary, we may even assume that s = e_0 ⊗ e_0 ⊗ · · · ⊗ e_0 and s′ = e_1 ⊗ e_0 ⊗ · · · ⊗ e_0. But if we consider the decomposition of the unnormalized state

ω = c_s e_0 ⊗ e_0 ⊗ · · · ⊗ e_0 + c_{s′} e_1 ⊗ e_0 ⊗ · · · ⊗ e_0 = (c_s e_0 + c_{s′} e_1) ⊗ e_0 ⊗ · · · ⊗ e_0,

the 1-qubit result, together with the fact that stabilizer states are closed under taking tensor products, shows that this decomposition of ω is not optimal. Now, the crucial observation is that if ψ = Σ c_s s is an optimal stabilizer extent decomposition, then ω = c_s s + c_{s′} s′ is an optimal decomposition for ω. But as the given decomposition of ω is not optimal, neither is the one of ψ.
There is an interesting connection between the derived optimality condition and the geometric properties of the stabilizer polytope SP_n, which is the convex hull of the projectors onto stabilizer states, i.e., SP_n = conv{ss† : s ∈ STAB_n}. As shown in [24, 25], two stabilizer projectors are connected by an edge if and only if they do not belong to the same orthonormal stabilizer basis. Thus, we can reformulate the above result: If ψ = Σ c_s s is an optimal stabilizer extent decomposition and c_s c_{s′} ≠ 0, then conv{ss†, s′(s′)†} is an edge of SP_n.

SUMMARY AND OUTLOOK
We have settled an open problem in stabilizer resource theory by showing that the stabilizer extent is generically submultiplicative in high dimensions. What is striking is that the previous multiplicativity results for one- to three-qubit states [10] made use of the detailed structure of the set of stabilizer states. In contrast, our counterexample involves only a small number of high-level properties of the stabilizer dictionary. Therefore, we see this work as evidence that ℓ1-based complexity measures on tensor product spaces should be expected to be strictly submultiplicative in the absence of compelling reasons to believe otherwise. In particular, it seems highly plausible that the assumptions that go into Theorem 1 can be considerably weakened. We leave this problem open for future analysis.

APPENDIX A
Next, we prove Proposition 2, which gives a characterization of the extreme points of the set of dually feasible points M D = {y ∈ C d : | s, y | ≤ 1 for all s ∈ D}.
Proof of Proposition 2. Let y ∈ M_D. First, we assume that the set A_y = {s ∈ D : |⟨s, y⟩| = 1} does not span C^d. Then there exists u ∈ C^d orthogonal to all elements of A_y and, since D is finite, we can find ε > 0 such that y ± εu ∈ M_D. Hence y = ½((y + εu) + (y − εu)) is a proper convex combination of y ± εu, so y is not an extreme point of M_D.
Conversely, assume that A_y spans C^d and that y = αu + (1 − α)v for some α ∈ (0, 1) and u, v ∈ M_D. For every s ∈ A_y there is φ_s ∈ R such that

1 = ⟨e^{iφ_s} s, y⟩ = α⟨e^{iφ_s} s, u⟩ + (1 − α)⟨e^{iφ_s} s, v⟩,

hence ⟨e^{iφ_s} s, u⟩^R = ⟨e^{iφ_s} s, v⟩^R = 1. But as |⟨s, u⟩| ≤ 1 and |⟨s, v⟩| ≤ 1, it must hold that

⟨e^{iφ_s} s, u⟩^I = ⟨e^{iφ_s} s, v⟩^I = 0.

Since the elements of A_y span C^d, the system ⟨e^{iφ_s} s, w⟩ = 1 for all s ∈ A_y, in the unknown w ∈ C^d, has the unique solution y, so y = u = v and y is an extreme point of M_D.
We will continue with the proof of Lemma 3, which is a consequence of complementary slackness.
Proof of Lemma 3. If (c_s, t_s)_{s∈D} is optimal for the primal and (y, (z_s)_{s∈D}) is optimal for the dual, then complementary slackness [15] enforces

Σ_{s∈D} t_s = ⟨ψ, y⟩^R = Σ_{s∈D} (c_s* ⟨s, y⟩)^R ≤ Σ_{s∈D} |c_s| |⟨s, y⟩| ≤ Σ_{s∈D} |c_s| ≤ Σ_{s∈D} t_s,

where the second-to-last inequality follows from |⟨s, y⟩| ≤ 1 and the last inequality from (c_s, t_s) ∈ L^{2+1}, for all s ∈ D.
Consequently, we have equality in each step. This leads to the conditions given in the lemma: for (A1), if c_s ≠ 0, then |⟨s, y⟩| = 1 and, by the first inequality, the vector (c_s^R, c_s^I) must be proportional (with a non-negative factor) to (⟨s, y⟩^R, ⟨s, y⟩^I); hence ⟨s, y⟩ = c_s / |c_s|. Condition (A2) is the contrapositive statement: equality in the second-to-last step forces c_s = 0 whenever |⟨s, y⟩| < 1.