LIMDD: A Decision Diagram for Simulation of Quantum Computing Including Stabilizer States

Efficient methods for the representation and simulation of quantum states and quantum operations are crucial for the optimization of quantum circuits. Decision diagrams (DDs), a well-studied data structure originally used to represent Boolean functions, have proven capable of capturing relevant aspects of quantum systems, but their limits are not well understood. In this work, we investigate and bridge the gap between existing DD-based structures and the stabilizer formalism, an important tool for simulating quantum circuits in the tractable regime. We first show that although DDs were suggested to succinctly represent important quantum states, they actually require exponential space for certain stabilizer states. To remedy this, we introduce a more powerful decision diagram variant, called Local Invertible Map-DD (LIMDD). We prove that the set of quantum states represented by poly-sized LIMDDs strictly contains the union of stabilizer states and other decision diagram variants. Finally, there exist circuits which LIMDDs can efficiently simulate, while their output states cannot be succinctly represented by two state-of-the-art simulation paradigms: the stabilizer decomposition techniques for Clifford + $T$ circuits and Matrix-Product States. By uniting two successful approaches, LIMDDs thus pave the way for fundamentally more powerful solutions for simulation and analysis of quantum computing.


Introduction
Classical simulation of quantum computing is useful for circuit design [1,2], verification [3,4] and studying noise resilience in the era of Noisy Intermediate-Scale Quantum (NISQ) computers [5]. Moreover, identifying classes of quantum circuits that are classically simulatable, helps in excluding regions where a quantum computational advantage cannot be obtained. For example, circuits containing only Clifford gates (a non-universal quantum gate set), using an all-zero initial state, only compute the so-called 'stabilizer states' and can be simulated in polynomial time [6][7][8][9][10]. Stabilizer states, and associated formalisms for expressing them, are fundamental to many quantum error correcting codes [8] and play a role in measurement-based quantum computation [11]. In fact, simulation of universal quantum circuits is fixed-parameter tractable in the number of non-Clifford gates [12], which is why many modern simulators are based on stabilizer decomposition [12][13][14][15][16][17]. Another method for simulating universal quantum computation is based on decision diagrams (DDs) [18][19][20][21], including Algebraic DDs [22][23][24][25], Affine Algebraic DDs [26], Quantum Multi-valued DDs [27,28], and Tensor DDs [29]. A DD is a directed acyclic graph (DAG) in which each path represents a quantum amplitude, enabling the succinct (and exact) representation of many quantum states through the combinatorial nature of this structure. A DD can also be thought of as a homomorphic (lossless) compression scheme, since various manipulation operations for DDs exist which implement any quantum gate operation, including measurement (without requiring decompression). Strong simulation is therefore easily implemented using a DD data structure [27][28][29]. Indeed, DD-based simulation was empirically shown to be competitive with state-of-the-art simulators [21,28,30] and is used in several simulator and circuit verification implementations [31,32]. DDs and the stabilizer formalism are introduced in Sec. 2.
QMDDs are currently the most succinct DD supporting quantum simulation, but in this paper we show that they require exponential size to represent a type of stabilizer state called a cluster state [33]. In order to unite the strengths of DDs and the stabilizer formalism and inspired by SLOCC (Stochastic Local Operations and Classical Communication) equivalence of quantum states [34,35], in Sec. 3, we propose LIMDD: a new DD for quantum computing simulation using local invertible maps (LIMs). Specifically, LIMDDs eliminate the need to store multiple states which are equivalent up to LIMs, allowing more succinct DD representations. For the local operations in the LIMs, we choose Pauli operations, creating a Pauli-LIMDD, which we will simply refer to as LIMDD. We prove that there is a family of quantum states -called pseudo cluster states-that can be represented by poly-sized (Pauli-)LIMDDs but that require exponentially-sized QMDDs and cannot be expressed in the stabilizer formalism. We also show the same separation for matrix product states (MPS) [36, 38? ]. Fig. 1 visualizes the resulting separations.
Further, we give algorithms for simulating quantum circuits using Pauli-LIMDDs. We continue by investigating the power of these algorithms compared to state-of-the-art simulation algorithms based on QMDD, MPS and stabilizer decomposition. We find circuit families which Pauli-LIMDD can efficiently simulate, which stands in stark contrast to the exponential space needed by QMDDbased, MPS-based and a stabilizer-decomposition-based simulator (the latter result is conditional on the exponential time hypothesis). This is the first analytical comparison between decision diagrams and matrix product states.
Efficient decision diagram operations for both classical [39] and quantum [2] applications crucially rely on dynamic programming (storing the result of each intermediate computation) and canonicity (each quantum state has a unique, smallest representative as a LIMDD) [40][41][42]. We provide algorithms for both in Sec. 4. Indeed, the main technical contribution of this paper is the formulation of a canonical form for Pauli-LIMDDs together with an algorithm which brings a Pauli-LIMDD into this canonical form. By interleaving this algorithm with the circuit simulation algorithms, we ensure that the algorithms act on LIMDDs that are canonical and as small as possible.
The canonicity algorithm effectively determines whether two n-qubit quantum states |φ⟩ , |ψ⟩, each represented by a LIMDD node φ, ψ, are equivalent up to a Pauli operator P , i.e, |φ⟩ = P |ψ⟩, which we call an isomorphism between |φ⟩ and |ψ⟩. Here P = P n ⊗ · · · ⊗ P 1 consists of single qubit Pauli operators P i (ignoring scalars for now). In general, there are multiple choices for P , so the goal is to make a deterministic selection among them, to ensure canonicity of the resulting LIMDD. To do so, we first take out one qubit and write the states as, e.g., |φ⟩ = c 0 |0⟩ |φ 0 ⟩ + c 1 |1⟩ |φ 1 ⟩ for complex coefficients c 0 , c 1 . We then realize that P rest = P n−1 · · · ⊗ P 1 must map the pair (|φ 0 ⟩ , |φ 1 ⟩) to either (|ψ 0 ⟩ , |ψ 1 ⟩) or (|ψ 1 ⟩ , |ψ 0 ⟩) (in case P n is a diagonal or antidiagonal, respectively). Hence P rest is a member of the intersection of the two sets of isomorphisms. Next, we realize that the set of all isomorphisms, e.g. mapping |φ 0 ⟩ to |ψ 0 ⟩, is a coset π · G of the stabilizer group G of |φ 0 ⟩ (i.e. the set of isomorphisms mapping |φ 0 ⟩ to itself) where π is a single isomorphism |φ 0 ⟩ → |ψ 0 ⟩. Thus, to find a (canonical) isomorphism between n-qubit states |φ⟩ → |ψ⟩ (or determine no such isomorphism exists), we need three algorithms: to find (a) an isomorphism between (n − 1)-qubit states, (b) the stabilizer group of an (n − 1)-qubit state (in fact: the group generators, which form an efficient description), (c) the intersection of two cosets in the Pauli group (solving the Pauli coset intersection problem). Task (a) and (b) are naturally formulated as recursive algorithms on the number of qubits, which invoke each other in the recursion step. For (c) we provide a separate algorithm which first rotates the two cosets such that one is a member of the Pauli Z group, hence isomorphic to a binary vector space, followed by using existing algorithms for binary coset (hyperplane) intersection. Having characterized all isomorphisms |φ⟩ → |ψ⟩, we select the lexicographical minimum to ensure canonicity. We emphasize that the algorithm works for arbitrary quantum states, not only stabilizer states.

Preliminaries
Here, we briefly introduce two methods to manipulate and succinctly represent quantum states: decision diagrams, which support universal quantum computing, and the stabilizer formalism, in which a subset of all quantum computations is supported which can however be efficiently classically simulated. Both support strong simulation, i.e. the probability distribution of measurement outcomes can be computed (through weak simulation one only samples measurement outcomes).
The Quantum Multi-valued Decision Diagram (QMDD) [27] is a data structure which can succinctly represent functions of the form f : {0, 1} n → C, and thus can represent any quantum state per Eq. 1. A QMDD is a rooted DAG with a unique leaf node 1 , representing the value 1. Fig. 2 (d) shows an example (and its construction from a binary tree). Each node has two outgoing edges, called its low edge (dashed line) and its high edge (solid line). The diagram has levels as each node is labeled with (the index of) a variable; the root has index n, its children n − 1, etc, until the leaf with index 0 (the set of nodes with index k form level k). Hence each path from root to leaf visits nodes representing the variables x 3 , x 2 , x 1 in order. The value f (x n , . . . , x 1 ) = ⟨x n . . . x 1 |φ⟩ is computed by traversing the diagram, starting at the root edge and then for each node at level i following the low edge (dashed line) when x i = 0, and the high edge (solid line) when x i = 1, while multiplying the edge weights (shown in boxes) along the path, e.g., f (1, 1, 0) = 1 2 · 1 · − √ 2 · 1 = − 1 √ 2 in Fig. 2. A path from the root to a node v with index k (on level k) thus corresponds to a partial assignment (x n = a n , . . . , x k−1 = a k−1 ), which induces subfunction f ⃗ a (x k , . . . , x 1 ) ≜ f (a n , . . . , a k−1 , x k , . . . , x 1 ). The node v represents this subfunction up to a complex factor γ, which is stored on the edge incoming to v along that path. This allows any two nodes which represent functions equal up Canonicity ensures that the QMDD is always as small as possible as redundant nodes are merged. But more importantly, canonicity allows for quick equality checks: two diagrams represent the same state if and only if their root edges are the same (i.e., have the same label and point to the same root node). This allows for efficient QMDD manipulation algorithms (i.e. updating the QMDD upon performing a gate or measurement) through dynamic programming, which avoids traversing all paths (exponentially many in the size of the diagram in the worst case). For all quantum gates, there are algorithms to update the QMDD accordingly and measurement is also supported (even efficiently). Therefore, QMDDs can simulate any quantum circuit, although they may become exponentially large (in the number of qubits) already after applying part of the gates from the circuit. The resulting simulator is strong, as the complete final state is computed as QMDD (and computing measurement outcome probabilities on QMDD is tractable).
Finally, we can also define the semantics of a node v recursively, overloading Dirac notation: |v⟩.

Pauli operators and stabilizer states
In contrast to decision diagrams, the stabilizer formalism [6] forms a classically simulatable subset of quantum computation. Instead of explicitly representing the (exponential) amplitude vector, the stabilizer formalism describes the symmetries a quantum state using so-called stabilizers. A unitary operator U stabilizes a state |φ⟩ if |φ⟩ is a +1 eigenvector of U , i.e., U |φ⟩ = |φ⟩. The formalism considers stabilizers U made up of the single-qubit Pauli operators Pauli ≜ {I, X, Y, Z} as defined below. In fact, a stabilizer is taken from the n-qubit Pauli group, defined as Pauli n ≜ Pauli ⊗n , i.e. it is the group generated by all n-qubit Pauli strings P n ⊗ · · · ⊗ P 1 with P i ∈ Pauli. Here we used the notation ⟨G⟩ = H to denote that G ⊆ H is a generator set for a group H. One can check that Pauli n = {i c P n ⊗ · · · ⊗ P 1 | P 1 , . . . , P n ∈ Pauli, c ∈ {0, 1, 2, 3}}, so in particular we have Pauli 1 = {±P, ±iP | P ∈ Pauli} (the Pauli set with a factor ±1 or ±i).
The set of Pauli stabilizers Stab(|φ⟩) ⊂ Pauli n of an n-qubit quantum state |φ⟩ necessarily forms a subgroup of Pauli n , since the identity operator I ⊗n is a stabilizer of any n-qubit state and moreover if U and V stabilize |φ⟩, then so do U V, V U and U −1 . Furthermore, any Pauli stabilizer group G is abelian, i.e. A, B ∈ G implies AB = BA. The reason for this is that elements of Pauli n either commute (AB = BA) or anticommute (AB = −BA) under multiplication and anticommuting elements can never be stabilizers of the same state |φ⟩, because if A, B ∈ Stab(|φ⟩) and AB = −BA then |φ⟩ = AB |φ⟩ = −(BA) |φ⟩ = − |φ⟩, a contradiction. Finally, note that −I ⊗n can never be a stabilizer. In fact, these conditions are necessary and and sufficient: the class of abelian subgroups G of Pauli n , not containing −I ⊗n , are precisely all n-qubit stabilizer groups. For clarity, we adopt the convention that we denote Pauli strings without phase using the symbols P, Q, R, . . . and we use the symbols A, B, C, . . . for Pauli operators including phase; e.g., we may write A = λP . The phase λ of any stabilizer λP ∈ Pauli can only be λ = ±1, derived as ∀λP ∈ Stab(|φ⟩) : |φ⟩ = (λP ) |φ⟩ = (λP ) 2 |φ⟩ = λ 2 I |φ⟩ = λ 2 |φ⟩ =⇒ λ = ±1. (2) The number of generators k for a n-qubit stabilizer group S can range from 1 to n, and S has 2 k elements. If k = n, then there is only a single quantum state |φ⟩ (a single vector up to complex scalar) which is stabilized by S; such a state is called a stabilizer state. Equivalently, |φ⟩ = C |0⟩ ⊗n where C is a circuit composed of only Clifford unitaries, a group generated by the Clifford gates: In the stabilizer formalism, an n-qubit stabilizer state is succinctly represented through n independent generators of its stabilizer group, each of which is represented by O(n) bits to encode the Pauli string (plus factor), yielding O(n 2 ) bits in total. Examples of (generators of) stabilizer groups are ⟨Z⟩ for |0⟩ and ⟨X ⊗ X, Z ⊗ Z⟩ for 1 √ 2 (|00⟩ + |11⟩). Updating a stabilizer state's generators after application of a Clifford gate or a single-qubit computational-basis measurement can be done in polynomial time in n [6,7]. Various efficient algorithms exist for manipulating stabilizer (sub)groups S, including testing membership (is A ∈ Pauli n a member of S?) and finding a generating set of the intersection of two stabilizer (sub)groups. These algorithms predominantly use standard linear algebra, e.g., Gauss-Jordan elimination, as described in App. A in detail.
In this work, we also consider states which are not stabilizer states and which therefore have a nonmaximal stabilizer group (i.e. < n generators). To emphasize that a stabilizer group need not be maximal, i.e. it is a subgroup of maximal stabilizer groups, we will use the term stabilizer subgroup. Such objects are also studied in the context of simulating mixed states [43] and quantum error correction [8]. Examples of stabilizer subgroups are {I} for 1 Stabilizer decomposition-based methods [12][13][14][15][16][17] extend the stabilizer formalism to families of Clifford circuits with arbitrary input states |φ n ⟩, enabling the simulation of universal quantum computation [46]. By decomposing the n-qubit state |φ n ⟩ as linear combination of χ stabilizer states followed by simulating the circuit on each of the χ stabilizer states, the measurement outcomes can be computed in time O(χ 2 · poly(n)), where the least χ is referred to as the stabilizer rank of |φ n ⟩. Therefore, stabilizer-rank based methods are efficient for any family of input states |φ n ⟩ whose stabilizer rank grows polynomially in n.
A specific method for obtaining a stabilizer decomposition of the output state of an n-qubit circuit is by rewriting the circuit into Clifford gates and T = |0⟩⟨0|+e iπ/4 |1⟩⟨1| gates (a universal gate set). Next, each of the T gates can be converted into an ancilla qubit initialized to the state T |+⟩ where |+⟩ = 1 √ 2 (|0⟩ + |1⟩); thus, an n-qubit circuit containing t T gates will be converted into an n + tqubit Clifford circuit with input state |φ⟩ = |0⟩ ⊗n ⊗ (T |+⟩) ⊗t [12]. We will refer to the resulting specific stabilizer-rank based simulation method as the 'Clifford + T simulator,' whose simulation runtime scales with χ t = χ((T |+⟩) ⊗t ), the number of stabilizer states in the decomposition of |φ⟩. Trivially, we have χ t ≤ 2 t , and although recent work [12,13] has found decompositions smaller than 2 t terms based on weak simulation methods, the scaling of χ t remains exponential in t. We emphasize that the Clifford + T decomposition is not necessarily optimal, in the sense that the intermediate states of the circuit might have lower stabilizer rank than |T ⟩ ⊗t does. Consequently, if a given circuit contains t = Ω(n) T -gates, then the Clifford + T simulator requires exponential time (in n) for simulating this n-qubit circuit, even if there exist polynomially-large stabilizer decompositions of each of the circuit's intermediate and output states (i.e., in principle, there might exist another stabilizer rank-based simulator that can simulate this circuit efficiently).

Matrix product states
Representing quantum states as matrix product states (MPS) has proven successful for solving a large range of many-body physics problems [36,37]. For qubits, an n-qubit MPS M is formally defined as a series of 2n matrices A Here, D k+1 is the matrix dimension over the k-th bond. The interpretation |M ⟩ is determined as ⟨x 1 x 2 . . . x n |M ⟩ = A x1 1 A x2 2 · · · A xn n for x 1 , . . . , x n ∈ {0, 1}. If the bond dimension may scale exponentially in the number of qubits, any family of quantum states can be represented exactly by an MPS.
The Schmidt rank of a state |φ⟩ on n qubits, relative to a bipartition of the qubits into two sets A and B, is the smallest integer m ≥ 1 such that |φ⟩ can be expressed as the superposition |φ⟩ = m j=1 c j |a j ⟩ A |b j ⟩ B for complex coefficients c j , where the states |a j ⟩ A (|b j ⟩ B ) form an orthonormal basis for the Hilbert space of the A register (B register). The relation with MPS is that the maximum Schmidt rank with respect to any bipartition A = {x 1 , . . . , x k }, B = {x k + 1, . . . , x n } is precisely the smallest possible bond dimension D k+1 required to exactly express a state in MPS.
Vidal [38] showed that MPS-based circuit simulation is possible in time O(n · poly(χ)) per elementary operation, where n is number of qubits and χ the maximum Schmidt rank for all intermediate states computed.

Local Invertible Map Decision Diagrams
Sec. 3.1 introduces a LIMDD definition parameterized with different local operations. We mainly consider the Pauli-LIMDD and refer to it simply as LIMDD. We show how LIMDDs generalize QMDDs and can represent arbitrary quantum states, normalized or not. We then use this definition in Sec. 3.2 to show how LIMDDs succinctly -i.e., in polynomial space-represent graph states (in particular cluster states), coset states and, more generally, stabilizer states. On the other hand, QMDDs and MPS require exponential size to represent two-dimensional cluster states.
We translate this exponential advantage in quantum state representation to (universal and strong) quantum circuit simulation in Sec. 3.3 by giving algorithms to update and query the LIMDD data structure. These LIMDD manipulation algorithms take a LIMDD φ, representing some state |φ⟩, and return another LIMDD ψ that represents the state |ψ⟩ = U |φ⟩ for standard gates U and also for arbitrary unitaries U (by preparing U in LIMDD form first; we show how). The measurement algorithm we give returns the outcome in linear time in size of the LIMDD representation of the quantum state.
For many quantum operations, we show that our manipulation algorithms are efficient on all quantum states, i.e., take polynomial time in the size of the LIMDD representation of the state. Algorithms for certain other operations are efficient for certain classes of states, e.g., all Clifford gates can be applied in polynomial time to a LIMDD representing a stabilizer state. We show that LIMDDs can be exponentially faster than QMDDs, while they are never slower by more than a multiplicative factor O(n 3 ). These algorithms use a canonical form of LIMDDs, such that for each state there is a unique LIMDD. We defer this subject to Sec. 4, which introduces reduced LIMDDs and efficient algorithms to compute them.
With these algorithms, a quantum circuit simulator can be engineered by applying the circuit's gates one by one on the representation of the state as LIMDD. Prop. 1 provides the bottom line of this section by comparing simulator runtimes. In Sec. 3.4, we prove Prop. 1.

Proposition 1. Let QSim
Clifford + T C denote the runtime of the Clifford + T simulator on circuit C (allowing for weak simulation as in [13]). Let QSim D C denote the runtime of strong simulation of circuit C using method D = (Pauli−)LIMDD, QMDD, QMDD ∪ Stab, MPS, QMDD ∪ Stab. * Here, the latter is an (imaginary) ideal combination of QMDD (not tractable for all Clifford circuits) and the stabilizer formalism (tractable for Clifford circuits), i.e., one that always inherits the best worst-case runtime from either method. The following holds, where Ω * discards polynomial factors, i.e., Ω * (f (n)) ≜ Ω(n O(1) f (n)).
There is a family of circuits C such that: 1. LIMDD is exponentially faster than Clifford + T :

The LIMDD data structure
Where QMDDs only merge nodes representing the same complex vector up to a constant factor, the LIMDD data structure goes further by also merging nodes that are equivalent up to local operations, called Local Invertible Maps (LIMs) (see Def. 1). As a result, LIMDDs can be exponentially more succinct than QMDDs, for example in the case of stabilizer states (see Sec. 3.2). We will call nodes which are equivalent under LIMs, (LIM-) isomorphic. This definition generalizes SLOCC equivalence (Stochastic Local Operations and Classical Communication); if we choose the parameter G to be the linear group, then the two notions coincide (see [34,App. A] and [35,47]). Figure 3: A QMDD (a) representing the state 1 , evolving into a LIMDD (d). As in Fig. 2, diagram nodes are horizontally ordered in 'levels' with qubit indices 4, 3, 2, 1. Low edges are dashed, high edges solid. See the text for an explanation. By convention, unlabelled edges have label 1 (for QMDD) or I ⊗k (for LIMDD nodes at level k).
Before we give the formal definition of LIMDDs in Def. 2, we give a motivating example in Fig. 3, which uses ⟨X, Y, Z, T ⟩-LIMs to demonstrate how the use of isomorphisms can yield small diagrams for a four-qubit state. This figure shows how to merge nodes in four steps, shown in subfigures (a)-(d), starting with a large QMDD (a) and ending with a small LIMDD (d). In the QMDD (a), the nodes labeled q 1 and q ′ 1 represent the single-qubit states |q 1 ⟩ = [1, 1] ⊤ and |q ′ 1 ⟩ = [1, −1] ⊤ , respectively. By noticing that these two vectors are related via |q ′ 1 ⟩ = Z |q 1 ⟩, we merge nodes q 1 , q ′ 1 into node ℓ 1 in (b), storing the isomorphism Z on all incoming edges that previously pointed to q ′ 1 . From step (b) to (c), we first merge q 2 , q ′ 2 into ℓ 2 , observing that |q ′ 2 ⟩ = I ⊗ Z |q 2 ⟩. Second, we create a node ℓ ′ 2 such that |p 2 ⟩ = T ZX ⊗ I |ℓ ′ 2 ⟩ and |p ′ 2 ⟩ = T ⊗ X |ℓ ′ 2 ⟩. So we can merge nodes p 2 , p ′ 2 into ℓ ′ 2 , placing these isomorphisms on the respective edges. To go from (c) to (d), we merge nodes q 3 , q ′ 3 into node ℓ 3 by noticing that |q ′ 3 ⟩ = (Z ⊗ I ⊗ Z) |q 3 ⟩. This isomorphism Z ⊗ I ⊗ Z is stored on the high edge out of the root node. We have |q 3 ⟩ = I ⊗ I ⊗ X |ℓ 3 ⟩, so we propagate the isomorphism I ⊗ I ⊗ X upward, and store it on the root edge. Therefore, the final LIMDD has the LIM 1 4 I ⊗ I ⊗ I ⊗ X on its root edge. The resulting data structure in Fig. 3 is a LIMDD of only six nodes instead of ten, but requires additional storage for the LIMs. Sec. 3.2 shows that merging isomorphic nodes sometimes leads to exponentially smaller diagrams, while the additional cost of storing the isomorphisms results only costs a linear factor of space (linear in the number of qubits).
The transformation presented above (for Fig. 3) only considers particular choices for LIMs. For instance, it would be equally valid to select LIM I ⊗ Z instead of −I ⊗ XZ for mapping q ′ 2 onto q 2 . In fact, efficient algorithms to select LIMs in such a way that a canonical LIMDD is obtained are a cornerstone for the LIMDD manipulation algorithms presented in Sec. 3.3. Sec. 4 provides a solution for ⟨Pauli⟩-LIMs (the basis for all results presented in the current article), which is based on using the stabilizers of each node, e.g., the group generated by {I ⊗ X, Y ⊗ I} for q 2 .

Definition 2.
An n-G-LIMDD is a rooted, directed acyclic graph (DAG) representing an n-qubit quantum state. Formally, it is a 6-tuple (Node ∪ {Leaf}, idx, low, high, label, e root ), where: • Leaf (a sink) is a unique leaf node with qubit index idx(Leaf) = 0; • Node is a set of nodes with qubit indices idx(v) ∈ {1, . . . , n} for v ∈ Node; • e root is a root edge without source pointing to the root node r ∈ Node with idx(r) = n; • low, high : Node → Node ∪ {Leaf} indicate the low and high edge functions, respectively.
We write low v (or high v ) to obtain the edge (v, w) with w = low(v) (or w = high(v)). For all v ∈ Node it holds that idx(low(v)) = idx(high(v)) = idx(v) − 1 (no qubits are skipped ‡ ); • label : low∪high∪{e root } → k−G-LIM∪{0} is a function labeling edges ( . , w) with k-G-LIMs or 0, where k = idx(w) We will find it convenient to write for a node u with low and high edges to nodes v and w labeled with A and B, respectively. We will also denote v A for a (root) edge to v labeled with A. When omitting A or B, e.g., v , the LIM should be interpreted as I ⊗idx(v) .
We define the semantics of a leaf, node v and an edge e to node v by overloading the Dirac notation: It follows from this definition that a node v with idx(v) = k represents a quantum state on k qubits. This state is however not necessarily normalized: For instance, a normalized state α |0⟩ + β |1⟩, So the node v represents a state up to global scalar. But, in general, any scalar can be applied to the root edge, or any other edge for that matter. So LIMDDs can represent any complex vector.
The tensor product |e 0 ⟩ ⊗ |e 1 ⟩ of the G-LIMDDs with root edges e 0 pointing to the e 1 root node w. The result is an n + m level LIMDD if e 0 has n levels and e 1 has m. In addition, the LIMs C on the other edges in the LIMDD e 0 should be extended to C ⊗ I ⊗m .
We can now consider various instantiations of the above general LIMDD definition for different LIM groups G. A G-LIMDD with G = {I} yields precisely all QMDDs by definition, i.e., all edges labels effectively only contain scalars. As all groups G contain the identity operator I, the universality of G-LIMDDs (i.e., all quantum states can be represented) follows from the universality of QMDDs. It also follows that any state that can efficiently be represented by QMDD, can be efficiently represented by a G-LIMDD for any G. Similarly, we can consider ⟨Z⟩ and ⟨X⟩, which are subgroups of the Pauli group, and define a ⟨Z⟩-LIMDD and a ⟨X⟩-LIMDD; instances that we will study for their relation to graph states and coset states in Sec. 3.2. Finally, and most importantly, ⟨Pauli⟩-LIMDDs can represent all stabilizer states in polynomial space, which is a feature that neither QMDDs nor matrix product states (MPS) posses, as shown in Sec. 3

Succinctness of LIMDDs
Succinctness is crucial for efficient simulation, as we show later. In this section, we show exponential advantages for representing states with LIMDDs over two other state-of-the-art data structures: QMDDs and Matrix Product States (MPS) [36,37]. Specifically, QMDDs and MPS require exponential space in the number of qubits to represent specific stabilizer states called (two-dimensional) cluster states. We also show that an ad-hoc combination of QMDD with the stabilizer formalism still requires exponential space for 'pseudo-cluster states.' These results are visualized in Fig. 1.  D2 signify an exponential separation between two classes, i.e., some quantum states have polynomial-size representation in D1, but only exponential-size in D2. By transitivity, QMDD is exponentially separated from all representations (not drawn for clarity). In particular, G-LIMDDs with G = ⟨Pauli⟩ can be exponentially more succinct than QMDDs, and retain this exponential advantage even with G = ⟨Z⟩ , ⟨X⟩. In Corollary 3, we show the strongest result, namely that LIMDDs are also more succinct than the union of QMDDs and stabilizer states, written QMDD ∪ Stab, which can be thought of a structure that switches between QMDD and the stabilizer formalism depending on its content (stabilizer or non-stabilizer state). This demonstrates that ad-hoc combinations of existing formalisms do not make LIMDDs obsolete. Fig. 4 hold.

Proposition 2. The inclusions and separations in
Proof. The inclusions between the sets of states shown in gray are well known [44,45]. The inclusions between decision diagrams hold because, e.g., a QMDD is a G-LIMDD with G = {I}, i.e., each label is of the form λI n with λ ∈ C, as discussed in Sec. 3.1. The relations between coset, graph, stabilizer states and G-LIMDD with G = ⟨X⟩ , ⟨Z⟩ , ⟨Pauli⟩ are proven in Th. 1 and App. C (which also shows that poly-sized LIMDD includes QMDD ∪ Stab). Corollary 3 shows that there is family of a non-stabilizer states (with small LIMDD) for which QMDD is exponential, hence the separation between QMDD ∪ Stab. Th. 2 shows the separation with QMDDs by demonstrating that the so-called (two-dimensional) cluster state, requires 2 Ω( √ n) nodes as QMDD. Finally, App. C proves the same for coset states.
Th. 1 shows that any stabilizer state can be represented as a ⟨Pauli⟩-Tower LIMDD (Def. 3).

Definition 3.
A n-qubit G-Tower-LIMDD, is a G-LIMDD with exactly one node on each level. Edges to nodes on level k are labeled as follows: low edges are labeled with I ⊗k , high edges with P ∈ G ⊗k ∪ {0} and the root edge is labeled with λ · P with P ∈ G ⊗k and λ ∈ C \ {0} (i.e., in contrast to high edges, the root edge can have an arbitrary scalar). Fig. 6 depicts a n-qubit G-Tower LIMDD. Theorem 1. Let n > 0. Each n-qubit stabilizer state is represented up to normalization by a ⟨Pauli⟩-Tower LIMDDs of Def. 3, e.g., where the scalars λ of the PauliLIMs λP on high edges are restricted as λ ∈ {0, ±1, ±i}. Conversely, every such LIMDD represents a stabilizer state.
We also note that Th. 1 demonstrates that for any n-qubit stabilizer state |φ⟩, the (n − 1)-qubit states (⟨0| ⊗ I 2 n−1 ) |φ⟩ and (⟨1| ⊗ I 2 n−1 ) |φ⟩ are not only stabilizer states, but also PauliLIM-isomorphic. While we believe this fact is known in the community, § we have not found this statement written down explicitly in the literature. More importantly for this work, to the best of our knowledge, the resulting recursive structure (which DDs are) has not yet been exploited in the context of classical simulation.
Next, Th. 2 shows the separation with QMDDs by demonstrating that the so-called (two-dimensional) cluster state, requires 2 Ω( √ n) nodes as QMDD. Corollary 3 shows that a trivial combination with stabilizer formalism does not solve this issue. Theorem 2. Denote by |G n ⟩ the two-dimensional cluster state, defined as a graph state on the n × n lattice. Each QMDD representing |G n ⟩ has at least 2 ⌊n/12⌋ nodes.
Proof sketch. Consider a partition of the vertices of the n × n lattice into two sets S and T of size 1 2 n 2 , corresponding to the first 1 2 n 2 qubits under some variable order. Then there are at least ⌊n/3⌋ vertices in S that are adjacent to a vertex in T [49,Th. 11]. Because the degree of the vertices is small, many vertices on this boundary are not connected and therefore influence the amplitude function independently of one another. From this independence, it follows that, for any variable order, the partial assignments ⃗ a ∈ {0, 1} → C is the amplitude function of |G n ⟩. The lemma follows by noting that a QMDD has a single node per unique subfunction modulo phase. For details see App. B.

Corollary 3 (Exponential separation between Pauli-LIMDD versus QMDD union stabilizer states).
There is a family of non-stabilizer states, which we call pseudo cluster states, that have polynomialsize Pauli-LIMDD but exponential-size QMDDs representation.
Proof. Consider the pseudo cluster state |φ⟩ = 1 √ 2 (|0⟩ + e iπ/4 |1⟩) ⊗ |G n ⟩ where |G n ⟩ is the graph state on the n × n grid. This is not a stabilizer state, because each computational-basis coefficient of a stabilizer state is of the form z · 1 √ 2 k for z ∈ {±1, ±i} and some integer k ≥ 1 [9], while is not of this form. Its canonical QMDD and Pauli-LIMDD have root nodes Gn 1

Gn e iπ/4
and Gn I Gn e iπ/4 I , where the respective diagram for G n is exponentially large (Th. 2) and polynomially small (Th. 1).

LIMDDs are exponentially more succinct than matrix product states
Th. 4 states that matrix product states (MPS) require large bond dimension for representing the two-dimensional cluster states, which follows directly from the well-known results that these states have large Schmidt rank.

Theorem 4.
To represent the graph state on the n × n grid (the two-dimensional cluster state on n 2 qubits), an MPS requires bond dimension 2 Ω(n) .
Proof. Van den Nest et al. [50] consider spanning trees over the complete graph where each node corresponds to a qubit and define the Schmidt-rank width: the largest encountered base-2 logarithm of the Schmidt rank between the two connected components resulting from removing an edge from the spanning tree, minimized over all possible spanning trees. It then follows from the relation between bond dimension and Schmidt rank (see Sec. 2) that any quantum state with Schmidt-rank width w requires bond dimension 2 w for representation by an MPS. Van den Nest et al. also showed that for graph states, the Schmidt-rank width equals the so-called rank width of the graph, which for n × n grid graphs was shown to equal n − 1 by Jelinek [51]. This proves the theorem. § For instance, this fact can be observed (excluding global scalars) by executing the original algorithm for simulating single-qubit computational-basis measurement on the first qubit, as observed in [6]. Similarly, the characterization in Prop. 2 of ⟨Z⟩-Tower-LIMDDs as representing precisely the graph states, is immediate by defining graph states recursively (see App. C). The fact that ⟨X⟩-Tower LIMDDs represent coset states is less evident and requires a separate proof, which we also give in App. C.
In contrast, the Pauli-LIMDD efficiently represents cluster states, and more generally all stabilizer states (Th. 1).

Pauli-LIMDD manipulation algorithms for simulation of quantum computing
In this section, we give all algorithms that are necessary to simulate a quantum circuit with Pauli-LIMDDs (referred to simply as LIMDD from now on). We provide algorithms which update the LIMDD after an arbitrary gate and after a single-qubit measurement in the computational basis. In addition, we give efficient specialized algorithms for applying a Clifford gate to a stabilizer state (represented by a ⟨Pauli⟩-Tower LIMDD) and computing a measurement outcome. We also show that many (Clifford) gates can in fact be applied to an arbitrary state in polynomial time. Table 1 provides an overview of the LIMDD algorithms and their complexities compared to QMDDs.
Central to the speed of many DD algorithms is keeping the diagram canonical throughout the computation. Recall from Sec. 3.1, that a G-LIMDD can merge isomorphic nodes v ≃ G w, i.e., if there exists a G-LIM C such that |w⟩ = C |v⟩. To achieve this, we require a 'MakeEdge' subroutine which, given the node MakeEdge algorithm for ⟨Pauli⟩-LIMDDs satisfying this specification. For now, the reader may assume the provisional implementation in Alg. 2, which does not yet merge LIM-isomorphic nodes and hence does not yield canonical diagrams.
In line with other existing efficient decision-diagram algorithms, we use dynamic programming in our algorithms to avoid traversing all paths (possibly exponentially many) in the LIMDD. In this approach, the decision diagram is manipulated and queried using recursive algorithms, which store intermediate results for each recursive call to avoid unnecessary recomputations. This recursive algorithmic structure that uses dynamic programming and reconstructs the diagram in the backtrack, is typical for all DD manipulation algorithms. Note that constant-time cache lookups (using a hash table) therefore require the canonical nodes produced by MakeEdge. LIMDDs additionally require the addition of LIMs to the caches; Sec. 3.3.3 shows how we do this. Table 1: Worst-case complexity of currently best-known algorithms for applying specific operations, in terms of the size of the input diagram size m (i.e., the number of nodes in the DD) and the number of qubits n. Although addition (Add) of quantum states is not, strictly speaking, a quantum operation, we include it because it is a subroutine of gate application. Note that several of the LIMDD algorithms invoke MakeEdge and therefore inherit its cubic complexity (as a factor).
if v / ∈ CanonicalCache then ▷ Compute result once for v and store in cache: 3: Finally, in this section, we often decompose LIMS using A = λP n ⊗ P ′ . Here λ ∈ C is a nonzero scalar, P ′ a Pauli string and P n ∈ {I, X, Z, Y } = Pauli. Our algorithms will use the Follow procedure from Alg. 4 to easily navigate diagrams according to edge semantics. Provided with a bit string x n . . . x 1 , the procedure is the same as ReadAmplitude. If however fewer bits are supplied, it returns a LIMDD root edge representing a subvector. For instance, the subvector for |10⟩ of the LIMDD root edge e r in Fig. 3 So, we can specify it as |follow b (e)⟩ = (⟨b| ⊗ I n−ℓ ) |e⟩, i.e., select the bth block of size 2 n−ℓ from the vector |e⟩ (or rather, return a LIMDD edge representing that block).

Algorithm 4
Follow: a generalization of ReadAmplitude, returning edges.

Performing a measurement in the computational basis
We discuss algorithms for measuring, sampling and updating after measurement of the top qubit.
App. E gives general algorithms with the same worst-case runtimes.
The procedure MeasurementProbability in Alg. 5 computes the probability p of observing the outcome |0⟩ for state |e⟩. If the quantum state can be written as |e⟩ = |0⟩ |e 0 ⟩ + |1⟩ |e 1 ⟩, then the probability is p = ⟨e 0 |e 0 ⟩ / ⟨e|e⟩, where we have ⟨e|e⟩ = ⟨e 0 |e 0 ⟩ + ⟨e 1 |e 1 ⟩. Hence we compute the squared norms of e x = follow x (e) using the SquaredNorm subroutine. The total runtime is dominated by the subroutine SquaredNorm, which computes the quantity ⟨e|e⟩ given a LIMDD edge e = v λP by traversing the entire LIMDD. We have ⟨e|e⟩ = |λ| 2 ⟨v| P † P |v⟩ = |λ| 2 ⟨v|v⟩, because P † P = I for Pauli matrices. Therefore, to this end, it computes the squared norm of |v⟩.
Since ⟨v|v⟩ = ⟨low v |low v ⟩+⟨high v |high v ⟩, this is accomplished by recursively computing the squared norm of the node's low and high edges. This subroutine visits each node at most once by virtue of dynamic programming, which stores intermediate results in a cache SNormCache : Node → R for all recursive calls (Line 7, 8). Therefore, it runs in time O(m) for a diagram with m nodes.

Algorithm 5 Algorithms
MeasurementProbability and UpdatePostMeas for respectively computing the probability of observing outcome |0⟩ when measuring the top qubit of a Pauli LIMDD in the computational basis and converting the LIMDD to the post-measurement state after outcome m ∈ {0, 1}. The subroutine SquaredNorm takes as input a Pauli LIMDD edge e, and returns ⟨e|e⟩. It uses a cache to store the value s of a node v. 1: procedure MeasurementProbability(Edge e) 2: s 0 := SquaredNorm(follow 0 (e)) 3: if v / ∈ SNormCache then ▷ Compute result once for v and store in cache: 8: if m = 0 then 12: e r := MakeEdge(follow 0 (e), 0 · follow 0 (e)) 13: else 14: e r := MakeEdge(0 · follow 0 (e), follow 1 (e)) 15: The outcome m ∈ {0, 1} can then be chosen by flipping a p-biased coin. The corresponding state update is implemented by the procedure UpdatePostMeas. In order to update the state |e⟩ = |0⟩ |e 0 ⟩+|1⟩ |e 1 ⟩ after the top qubit is measured to be m, we simply construct an edge |m⟩ |e m ⟩ using the MakeEdge subroutine. This state is finally normalized by multiplying (the scalar on) the resulting root edge with a normalization constant computed using squared norm.
To sample from a quantum state in the computational basis, simply repeat the measurement procedure for edge v with k = idx(v), throw a p-biased coin to determine x k , use follow x k ( v ) to go to level k − 1 and repeat the process.

Gates with simple LIMDD algorithms
As a warm up, before we give the algorithm for arbitrary gates and Clifford gates, we first give algorithms for several gates that have a relatively simple and efficient LIMDD manipulation operation. In the case of a controlled gate, we distinguish two cases, depending whether the control or the target qubit comes first; we call these a downward and an upward controlled gate, respectively.
Here, we let L k denote the unitary applying local gate L on qubit k, i.e., Applying a single-qubit Pauli gate Q to qubit k of a LIMDD, by updating the diagram's root edge from A to Q k A, i.e., change A = λP n ⊗ · · · ⊗ P 1 to λP n ⊗ · · · ⊗ P k+1 ⊗ QP k ⊗ P k−1 ⊗ · · · ⊗ P 1 . Since only nodes -and not root edges-need be canonical, this can be done in constant time, provided that the LIMDD is stored in the natural way (uncompressed with objects and pointers).
Applying any diagonal or antidiagonal single-qubit gate to the top qubit can be done efficiently, e.g., applying the T -gate to the top qubit. For root edge e = v Broot , we can construct e x = follow x (e), which propagates the root edge's LIM to the root's two children. Then, for a diagonal node α 0 0 β , we construct a new root node MakeEdge(α ·e 0 , β ·e 1 ). For the anti-diagonal gate 0 β α 0 , it is sufficient to note that 0 β α 0 = X · α 0 0 β ; thus, we can first apply a diagonal gate, and then an X gate, as described above.
is also efficient. Alg. 6 gives a recursive procedure.
Hence, we can 'push' S k through the LIMs down the recursion, rebuilding the LIMDD in the backtrack with MakeEdge on Line 6 and 7. To apply S k to v when k = n = idx(v), we finally multiply the high edge label with i on Line 4. Dynamic programming, using table SGateCache, ensures a linear amount of recursive calls in the number of nodes m. The total runtime is therefore O(mn 3 ), as MakeEdge's is cubic (see Sec. 4).

Algorithm 6 Apply gate
if v / ∈ SGateCache then ▷ Compute result once for v and store in cache: where Q is a single-qubit Pauli gate, c the control qubit and t the target qubit with t < c, to a node v can also be done recursively. If idx(v) > c, then since CQ c t is a Clifford gate, we may push it through the node's root label, and apply it to the children low(v) and high(v), similar to the S gate. Otherwise, if idx(v) = c, then update v's high edge label as B → Q t B, and do not recurse. Alg. 7 shows the recursive procedure, which is similar to Alg. 6 and also has O(mn 3 ) runtime.

Algorithm 7
Apply gate CX with control qubit c and target qubit t for Pauli-LIMDD v A . We let n = idx(v). We can replace CX, with CY, CZ. modifying Line 4 accordingly (i.e. to Y t , Z t ).
if v / ∈ CPauliCache then ▷ Compute result once for v and store in cache:

Applying a generic multi-qubit gate to a state
We use a standard approach [24] to represent quantum gates (2 n ×2 n unitary matrices) as LIMDDs.
Here a matrix U is interpreted as a function u(r 1 , c 1 , . . . , r n , c n ) ≜ ⟨r| U |c⟩ on 2n variables, which returns the entry of U on row r and column c. The function u is then represented using a LIMDD of 2n levels. The bits of r and c are interleaved to facilitate recursive descent on the structure. In particular, for x, y ∈ {0, 1}, the subfunction u xy represents a quadrant of the matrix, namely the submatrix u xy (r 2 , c 2 , . . . , r n , c n ) ≜ u(x, y, r 2 , c 2 , . . . , r n , c n ), as follows: Def. 4 formalizes this idea. Fig. 7 shows a few examples of gates represented as LIMDDs.

Definition 4 (LIMDDs for gates). A
where r, c are the row and column index, respectively, with binary representation r 1 , . . . , r n and c 1 , . . . , c n . The semantics of a LIMDD edge e as a matrix is denoted [e] ≜ U (as opposed to its semantics |e⟩ as a vector).
The procedure ApplyGate (Alg. 8) applies a gate U to a state |φ⟩, represented by LIMDDs e U and e φ . It outputs a LIMDD edge representing U |φ⟩. It works similar to well-known matrix-vector product algorithms for decision diagrams [24,27], except that we also handle edge weights with LIMs (see Fig. 8 for an illustration). Using the follow x (e) procedure, we write |φ⟩ and U as CNOT gate: Then, on Line 6, we compute each of the four terms U rc |φ c ⟩ for row/column bits r, c ∈ {0, 1}. We do this by constructing four LIMDDs f r,c representing the states |f r,c ⟩ = U r,c |φ c ⟩, using four recursive calls to the ApplyGate algorithm. Next, on Line 7 and 8, the appropriate states are added, using Add (Alg. 9), producing LIMDDs e 0 and e 1 for the states |e 0 ⟩ = U 00 |φ 0 ⟩ + U 01 |φ 1 ⟩ and for |e 1 ⟩ = U 10 |φ 0 ⟩ + U 11 |φ 1 ⟩. The base case of ApplyGate is the case where n = 0, which means U and |v⟩ are simply scalars, in which case both e U and e φ are edges that point to the leaf.
▷ Get canonical root labels 4: if (P ′ , u, Q ′ , v) / ∈ Apply-cache then ▷ Compute result for the first time:

5:
for r, c ∈ {0, 1} do 6: Edge f r,c := ApplyGate(follow rc ( ▷ Store in cache 10:  called, store an entry with key (P, u, Q, v) in the cache. This would allow us to retrieve the result the next time ApplyGate is called with the same parameters. However, we can do much better, in such a way that we can retrieve the result from the cache also when the procedure is called with This can happen even when λP ̸ = A or γQ ̸ = B; therefore this may prevent many recursive calls.
To this end, we store not just an edge-edge tuple from the procedure's parameters, but a canonical edge-edge tuple. To obtain canonical edge labels, our algorithms use the function RootLabel which returns a canonically chosen LIM, i.e., it holds that RootLabel( A specific choice for RootLabel is the lexicographic minimum of all possible root labels. In Alg. 17, we give an O(n 3 )-time algorithm for computing the lexicographically minimal root label, following the same strategy as the MakeEdge procedure in Sec. 4.2. As a last optimization, we opt to not store the scalars λ, γ in the cache (they are "factored out"), so that we can retrieve this result also when ApplyGate is called with inputs that are equal up to a complex phase. These scalars are then factored back in on Line 11 and 9.
The subroutine Add (Alg. 9) adds two quantum states, i.e., given two LIMDDs representing |e⟩ and |f ⟩, it returns a LIMDD representing |e⟩ + |f ⟩. It proceeds by simple recursive descent on the children of e and f . The base case is when both edges point to the diagram's leaf. In this case, these edges are labeled with scalars A, B ∈ C, so we return the edge Algorithm 9 Given two n-LIMDD edges e, f , constructs a new LIMDD edge a with |a⟩ = |e⟩+|f ⟩.
▷ Normalize for cache lookup 4: Caching in Add. A straightforward way to implement the cache would be to store a tuple with key ). However, we can do much better; namely, we remark that we are looking to construct the state A |v⟩+B |w⟩, and that this is equal to A·(|v⟩+A −1 B |w⟩). This gives us the opportunity to "factor out" the LIM A, and only store the tuple (v, A −1 B, w).
We can do even better by finding a canonically chosen LIM C = RootLabel( ) (on Line 4) and storing (v, C, w) (on line Line 8). This way, we get a cache hit at Line 5 upon the call This happens of course in particular when (A, v, B, w) = (D, v, E, w), but can happen in exponentially more cases; therefore, this technique works at least as well as the "straightforward" way outlined above. Finally, on Line 3, we take advantage of the fact that addition is commutative; therefore it allows us to pick a preferred order in which we store the nodes, thus improving possible cache hits by a factor two. We also use C in the recursive call at Line 6 and 7. The worst-case runtime of Add is O(n 3 2 n ) (exponential as expected), where n is the number of qubits. This can happen when the resulting LIMDD is exponential in the input sizes (bounded by 2 n ), as identified for QMDDs in [52, Table 2]. The reason for this is that addition may remove any common factors, as illustrated in Fig. 9. However, the Add algorithm is polynomialtime when v = w and v is a stabilizer state, which is sufficient to show that the Hadamard gate can be efficiently applied to stabilizers represented as LIMDD, as we demonstrate next in Sec. 3.3.4.

LIMDD operations for Clifford gates are polynomial time on stabilizer states
We give an algorithm for the Hadamard gate and then show that it can be applied to a stabilizer state in polynomial time. Together with the results of Sec. 3.3.2, this shows that all Clifford gates can be applied to stabilizer states in polynomial time. The key ingredient is Th. 7, which describes situations in which the Add procedure looks up the same tuples in the cache in both its recursive calls (modulo ±1). Th. 5 gives the final result.
To apply a Hadamard gate (H = 1 to the first qubit, we first construct edges representing the states |a 0 ⟩ = |e 0 ⟩ + |e 1 ⟩ and |a 1 ⟩ = |e 0 ⟩ − |e 1 ⟩, using the Add procedure (Alg. 9 and multiplying the root edge with −1). Then we construct an edge representing the state |0⟩ |a 0 ⟩ + |1⟩ |a 1 ⟩ using MakeEdge. Lastly, the complex factor on the new edge's root label is multiplied by 1 √ 2 . Since the Hadamard is also a Clifford gate, we can apply this operation to any qubit in the LIMDD by pushing it through the LIMs, as we saw in Sec. 3.3.2. Alg. 10 shows the complete algorithm. Proof. By virtue of the cache, HGate is called at most once per node. Since the LIMDD is a Tower, there are only n nodes; so HGate is called at most n times. For the node at level k, HGate makes two calls to Add on Line 4. By applying induction over the qubits 1, . . . , k, using if v / ∈ HGateCache then ▷ Compute result once for v and store in cache: 3: if idx(v) = k then 4: else 6: Th. 7, it is easy to see that at each level, the cache in Alg. 9 is consulted at Line 5 with a tuple . This tells us that Add performs at most 2k recursive calls. Each recursive call to Add may invoke the MakeEdge procedure, which runs in time O(n 3 ), yielding a total worst-case running time of O(n 4 ), when k = Ω(n).
Theorem 7. If Alg. 9 is called on two edges pointing to the same ⟨Pauli⟩-Tower-LIMDD node v with low(v) = high(v), then the recursive Add calls at Line 6, 7 both lookup the same LIM in cache up to a factor ±1.
Proof. Assume the algorithm is at Line 6. Let v w w Q be a node on which the algorithm was called. Let C = P n ⊗ P be the n qubit Pauli-LIM computed at Line 4 with P n ∈ Pauli and P an n−1 qubit Pauli-LIM. At Line 6 and 7, Add makes two recursive calls computing a x for x ∈ {0, 1}, as listed in the header of the below table. The follow x (e) semantics yield four cases cases for the parameters in a recursive Add calls, depending on x and P n . The following table shows the tuples computed for cache normalization at Line 4 in the recursive call, ignoring the RootLabel() In all four cases, the cache-normalized LIMs computed in both recursive calls are equivalent up to a factor ±1. Finally, our RootLabel() function from Sec. 4.2.1, which selects the lexicographic smallest label, satisfies RootLabel( This completes the proof. Since stabilizer states are closed under Clifford gates, ⟨Pauli⟩-Tower-LIMDDs should also be closed under the respective LIMDD manipulation operations. We show this in Lemma 4 (App. C).

Comparing LIMDD-based simulation with other methods
Prop. 1 shows exponential advantages of (Pauli-)LIMDDs over three state-of-the-art classical quantum circuit simulators: those based on QMDDs and MPS [36,37], and the Clifford + T simulator. In this section we prove the proposition, mainly using results from the current section: To show the separation between simulation with LIMDDs and Clifford + T , we present Th. 8.
Our proofs often rely on the fact that LIMDDs are exponentially more succinct representations of a certain class of quantum states S that are generated by circuits with a certain (non-universal) gate set G. For instance, the stabilizer states that are generated by the Clifford gate set. LIMDD-based simulation -similar to MPS [38] and QMDD-based [28] simulation-proceeds by representing a state |φ t ⟩ at time step t as a LIMDD φ t . It then applies the gate U t ∈ G in the circuit corresponding to this time step to obtain a LIMDD φ t+1 with |φ t+1 ⟩ = U t |φ t ⟩, thus yielding strong simulation at the final time step as reading amplitudes from the final LIMDD is easy (see Sec. 3.1).
It follows that LIMDD-based simulation is efficient provided that it can execute all gates U t in polynomial time (in the size of the LIMDD representation), at least for the states in S. Note in particular that since the execution stays in S, i.e., |φ t ⟩ ∈ S =⇒ |φ t+1 ⟩ ∈ S, the representation size can not grow to exponential size in multiple steps (S can be considered an inductive invariant in the style of Floyd [53] and de Bakker & Meertens [54]). On the other hand, since MPS and QMDD are exponentially sized for cluster states, they necessarily require exponential time on circuits computing this family of states.

LIMDD is exponentially faster than QMDD-based simulation
As state set S, we select the stabilizer states and for G the Clifford gates. Th. 1 shows that LIMDDs for stabilizers are always quadratic in size in the number of qubits n, as the diagram contains n nodes and n + 1 LIMs, each of size at most n (see Def. On the other hand, Th. 2 shows that QMDDs for cluster states are exponentially sized. It follows that in simulation also, there is an exponential separation between QMDD and LIMDD, proving that QSim QMDD For the other direction, we now show that LIMDDs are at most a factor O(n 3 ) slower than QMDDs on any given circuit. First, a LIMDD never contains more nodes than a QMDD representing the same state (because QMDD is by definition a specialization of LIMDD, see Sec. 3.1). The LIMDD additionally uses O(n) memory per node to store two Pauli LIMs; thus, the total memory usage is at most a factor O(n) worse than QMDDs for any given state. The ApplyGate and Add algorithms introduced in Sec. 3.3.3 are very similar to the ones used for QMDDs in [1,24]. In particular, our ApplyGate and Add algorithms never make more recursive calls than those for QMDDs. However, one difference is that our MakeEdge algorithm runs in time O(n 3 ) instead of O(1). Therefore, in the worst case these LIMDD algorithms make the same number of recursive calls to ApplyGate and Add, in which case they are slower by a factor O(n 3 ).
Finally, Corollary 3 shows that the pseudo-cluster state |φ⟩ has a polynomial representation in LIMDD. By definition of the pseudo-cluster state, post-selecting (constraining) the top qubit to 0 (or 1) yields the cluster state |G n ⟩. Therefore, QMDD for the pseudo-cluster state must have exponential size, as constraining can never increase the size of DD [55,Th 2.4.1]. Together with the universal simulation discussed above, this proves that the above also holds for for a simulator based on the combination QMDD ∪ Stab (Prop. 1 Item 5).

LIMDD is exponentially faster than MPS
In Sec. 3.4.1, we saw that LIMDD can simulate the cluster state in polynomial time. On the other hand, Th. 4 shows that MPS for cluster states are exponentially sized. It follows that in simulation also, there is an exponential separation between MPS and LIMDD, proving Prop. 1 Item 2.

LIMDD is exponentially faster than Clifford + T
In this section, we consider a circuit family that LIMDDs can efficiently simulate, but which is difficult for the Clifford+T simulator because the circuit contains many T gates, assuming the Exponential Time Hypothesis (ETH, a standard complexity-theoretic assumption which is widely believed to be true). This method decomposes a given quantum circuit into a circuit consisting only of Clifford gates and the T = 1 0 0 e iπ/4 gate, as explained in Sec. 2.
The circuit family, given my McClung [56], maps the input state |0⟩ ⊗n to the n-qubit W state |W n ⟩, which is the equal superposition over computational-basis states with Hamming weight 1, Arunachalam et al. showed that, assuming ETH, any circuit which deterministically produces the |W n ⟩ state in this way requires Ω(n) T gates [57]. Consequently, the Clifford + T simulator cannot efficiently simulate the circuit family, even when one allows for preprocessing with a compilation algorithm aiming to reduce the T -count of the circuit (such as the ones developed in [58,59]).

Theorem 8.
There exists a circuit family C n such that C n |0⟩ ⊗n = |W n ⟩, that Pauli-LIMDDs can efficiently simulate. Here simulation means that it constructs representations of all intermediate states, in a way which allows one to, e.g., efficiently simulate any single-qubit computational-basis measurement or compute any computational basis amplitude on any intermediate state and the output state.
We note that we could have obtained a similar result using the simpler scenario where one applies a T gate to each qubit of the (|0⟩+|1⟩) ⊗n input state. However, our goal is to show that LIMDDs can natively simulate scenarios which are relevant to quantum applications, such as the stabilizer states from the previous section. The W state is a relevant example, as several quantum communication protocols use the W state [60][61][62]. In contrast, the circuit with only T gates yields a product state, hence it is not relevant unless we consider it as part of a larger circuit which includes multi-qubit operations.
Lastly, it would be interesting to analytically compare LIMDD with general stabilizer rank based simulation (without assuming ETH). However, this would require finding a family of states with provably superpolynomial stabilizer rank, which is a major open problem. Instead, we implemented a heuristic algorithm by Bravyi et al. [14] to empirically find upper bounds on the stabilizer rank and applied it to a superset of the W states, so-called Dicke states, which can be represented as polynomial-size LIMDD. The O(n 2 )-size LIMDD can be obtained via a construction by Bryant [19], since the amplitude function of a Dicke state is a symmetric function. The results hint at a possible separation but are inconclusive due to the small number of qubits which the algorithm can feasibly investigate in practice. See App. G for details.

Canonicity: Reduced LIMDDs with efficient MakeEdge algorithm
Unique representation, or canonicity, is a crucial property for the efficiency and effectiveness of decision diagrams. In the first place, it allows for circuit analysis and simplification [20,27], by facilitating efficient manipulation operations through dynamic programming efficiently, as discussed in Sec. 3.3. In the second place, a reduced diagram is smaller than an unreduced diagram because it merges nodes with the same semantics. For instance, Pauli-LIMDDs allow all states in the same ≃ Pauli equivalence class to be merged. Here, we define a reduced Pauli-LIMDD, which is canonical.
In general, many different LIMDDs can represent a given quantum state, as illustrated in Fig. 10. However, by imposing a small number of constraints on the diagram, listed in Def. 5 and visualized in Fig. 11, we ensure that every quantum state is represented by a unique 'reduced ' Pauli-LIMDD. We present a MakeEdge algorithm (Alg. 11 in Sec. 4.2) that computes a canonical node assuming its children are already canonical. The algorithms for quantum circuit simulation in Sec. 3 Figure 10: Four different Pauli-LIMDDs representing the Bell state 1 √ 2 (|00⟩ + |11⟩). From left to right: as I-LIMDD, swapping high and low nodes v, v ′ by placing an X on the root LIM, merging v ′ into v by observing that |v⟩ = X |v ′ ⟩ and selecting a different high LIM −X together with changing the root LIM. This section shows that selecting a unique high LIM is the most challenging, as in general many LIMs can be chosen.

LIMDD canonical form
The main insight used to obtain canonical decision diagrams is that a canonical form can be computed locally for each node, assuming its children are already canonical. In other words, if the diagram is constructed bottom up, starting from the leaf, it can immediately be made canonical. (This is why decision diagram manipulation algorithms always construct the diagram in the backtrack of the recursion using a typical 'MakeNode' procedure for constructing canonical nodes [24], like in Sec. 3.3.) For instance, a QMDD node v α w β with α, β ∈ C \ {0} can be reduced into a canonical node by dividing out a common factor α and placing it on the root edge. Assuming that v, w are canonical, the resulting node v 1 w β /α can be stored as a tuple (1, v, β /α, w) in a hash table. Moreover, any other node that is equal to this node up to a scalar is reduced to the same tuple with this strategy [27] and thus merged in the hash table.
For LIMDD, we use a similar approach of dividing out 'common LIM factors.' However, we need to do additional work to obtain a unique high edge label ( β /α in the example above), as the PauliLIM group is more complicated than the group of complex numbers (scalars).
Def. 5 gives reduction rules for LIMDDs and Fig. 11 illustrates them. The merge (1) and low factoring (4) rules fulfill the same purpose as in the QMDD case discussed above. In a Pauli-LIMDD, we may always swap high and low edges of a node v by multiplying the root edge LIM with X ⊗ I, as illustrated in Fig. 10. The low precedence rule (3) makes this choice deterministic, but only in case low(v) ̸ = high(v). Next, the zero edges (2) rule handles the case when αór β are zero in the above, as in principle a edge e with label 0 could point to any node on the next level k, as this always yields a 0 vector of length 2 k (see semantics below Def. 2). The rule forces low(v) = high(v) in case either edge has a zero label. We explain the interaction among the zero edges (2), low precedence (3) and low factoring (4) rules below. Finally, the high determinism rule (5) defines a deterministic function to choose LIMs on high edges, solving the most challenging problem of uniquely selecting a LIM on the high edge. We give an O(n 3 ) algorithm for this function in Sec. 4.2.

Definition 5 (Reduced LIMDD).
A Pauli-LIMDD is reduced when it satisfies the following constraints. It is semi-reduced if it satisfies all constraints except possibly high determinism.

(Zero) edge:
For any edge (v, w) ∈ high ∪ low, if label(v, w) = 0, then both edges outgoing from v point to the same node, i.e., high(v) = low(v) = w.
3. Low precedence: Each node v has low(v) ≼ high(v), where ≼ is a total order on nodes.

Low factoring:
The label on every low edge to a node v is the identity  Figure 11: Illustrations of the reduction rules from Def. 5 applied at level k + 1 (i.e., k Note that, in general, the top edges are not necessarily root edges, but could be high and low edges for nodes on level k + 2. So, in general, there can be multiple such incoming edges (dashed and solid).

High determinism:
The label on the high edge of any node v is where HighLabel is a function that takes as input a semi-reduced n-Pauli-LIMDD node v, and outputs an ( Moreover, for any other semi-reduced node w with |v⟩ ≃ Pauli |w⟩, it satisfies HighLabel(w) = B high . In other words, the function HighLabel is constant within an isomorphism class.
We make several observations about reduced LIMDDs. First, let us apply this definition to a state |0⟩ ⊗ A |φ⟩ + |1⟩ ⊗ B |ψ⟩ with |φ⟩ ̸ ≃ Pauli |ψ⟩, where A, B ∈ PauliLIM. Assume we already have canonical LIMDDs for φ and ψ (note that necessarily φ ̸ = ψ). We will transform this node so that it satisfies all the reduction rules above. There is a choice between representing this state , as these are related by the isomorphism X ⊗ I. The low precedence rule resolves this choice here. Assuming φ ≺ ψ, low factoring can now be realized by dividing out the LIM A, yielding a node φ I ψ A −1 B (with root edge I ⊗ A as in Fig. 11 (4)). Otherwise, if ψ ≺ φ, we obtain node ψ I φ B −1 A with incoming edge X ⊗ B. Finally, since there might be other LIMs B high not equal to B −1 A that yield the same state, the high determinism rule is finally needed to obtain a canonical node ψ I φ B high as shown in Fig. 12. This last step turns a semi-reduced node into a (fully) reduced node. Sec. 4.2 discusses it in detail. Now, let us apply the definition to a state |1⟩ ⊗ A |φ⟩. First, notice that the zero edges rule forces low(v) = high(v) = φ in this case. There is a choice between representing this state as either , which denote the states |0⟩ ⊗ A |φ⟩ and |1⟩ ⊗ A |φ⟩, as these are related by the isomorphism X ⊗ I. The low factoring rule requires that the low edge label is I, yielding a node of the form φ A φ 0 with root label X ⊗ A: In other words, this rule enforces swapping high and low edges, placing a X on the root label, and dividing out the LIM A. Consequently, the high edge must be labeled with 0, and therefore, semi-reduction, in this case, coincides with (full) reduction (no high determinism is required). Notice also that there is no reduced LIMDD for the 0-vector, because low factoring requires low edges with label I. This is not a problem, since the 0-vector is not a quantum state.
The rules in Def. 5 are defined only for Pauli-LIMDDs, to which our results pertain (except for the brief mention of ⟨X⟩ and ⟨Z⟩-LIMDDs in Sec. 3.2). We briefly discuss alternative groups here. If G is a group without the element X ̸ ∈ G, the reduced G-LIMDD based on the same rules is not universal (does not represent all quantum states), because the low precedence rule cannot always be satisfied, since it requires that v 0 ≼ v 1 for every node. Hence, in this case, reduced G-LIMDD cannot represent a state |0⟩ |v 0 ⟩ + |1⟩ |v 1 ⟩ when v 1 ≺ v 0 . However, it is not difficult to formulate rules to support these groups G; for instance, when G = {I}, we recover the QMDD and may use its reduction rules [28].

, which represents
state [1, 1, i, i] ⊤ . Because the normalization constant was divided out (see factor 1 /4 on the root edge), this state is not normalized. In fact, the root node does not need to be normalized, as even reduced LIMDDs can represent any vector (except for the zero vector).
Lastly, the literature on other decision diagrams [18,19,48] often considers a "redundant test" or "deletion" rule to remove nodes with the same high and low child. This would introduce the skipping of qubit levels, which our syntactic definition disallows, as already discussed in Footnote ‡. However, if needed Def. 2 could be adapted and a deletion rule could be added to Def. 5.
We now give a proof of Th. 9, which states that reduced LIMDDs are canonical.
Theorem 9 (Node canonicity). For each n-qubit quantum state |φ⟩, there exists a unique reduced Pauli-LIMDD L with root node v L such that |v L ⟩ ≃ |φ⟩.
Proof. We use induction on the number of qubits n to show universality (the existence of an isomorphic LIMDD node) and uniqueness (canonicity).
Base case. If n = 0, then |φ⟩ is a complex number λ. A reduced Pauli-LIMDD for this state is the leaf node representing the scalar 1. To show it is unique, consider that nodes v other than the leaf have an idx(v) > 0, by the edges rule, and hence represent multi-qubit states. Since the leaf node itself is defined to be unique, the merge rule is not needed and canonicity follows.
Case |φ 0 ⟩ = 0 or |φ 1 ⟩ = 0: In case |φ 0 ⟩ ̸ = 0, by the induction hypothesis, there exists a Pauli-LIMDD with root node w satisfying |w⟩ ≃ |φ 0 ⟩. By definition of ≃, there exists an n-qubit Pauli isomorphism A such that |φ 0 ⟩ = A |w⟩. We construct the following reduced Pauli-LIMDD for |φ⟩: , adding a root edge e r = v I ⊗ A as illustrated in Fig. 12 (left). In case |φ 1 ⟩ ̸ = 0, we do the same for root node In case |φ 1 ⟩ ̸ = 0, we do the same for root |w⟩ ≃ |φ 1 ⟩ = A |w⟩, but switch the high and the low edge by instead a root edge e r = v X ⊗ A (similar to Fig. 11 (3)). In both cases, it is easy to check that the root node v is reduced as it can be represented by a tuple (I, w, 0, w), where w is canonical because of the induction hypothesis. Also in both cases, we also have |φ⟩ = |e r ⟩ because either |φ⟩ = I ⊗ A |v⟩ or |φ⟩ = X ⊗ A |v⟩.
Case |φ 0 ⟩ , |φ 1 ⟩ ̸ = 0: By applying the induction hypothesis twice, there exist Pauli-LIMDDs L and R with root nodes |v L ⟩ ≃ |φ 0 ⟩ and |v R ⟩ ≃ |φ 1 ⟩. The induction hypothesis implies only a 'local' reduction of LIMDDs L and R, but not automatically a reduction of their union. For instance, L might contain a node v and R a node w such that v ≃ w. While the other reduction rules ensure that v and w will be structurally the same, the induction hypothesis only applies the merge rule L and M in isolation, leaving two copies of identical nodes v, w. We can solve this by applying merge on the union of nodes in L and M , to merge any equivalent nodes, as they are already structurally equivalent by the induction hypothesis. This guarantees that (also) v L , v R are identical nodes.
By definition of ≃, there exist n-qubit Pauli isomorphisms A and B such that |φ 0 ⟩ = A |v L ⟩ and |φ 1 ⟩ = B |v R ⟩. In case v L ≼ v R , we construct the following reduced Pauli-LIMDD for |φ⟩: ) . Otherwise, if v R ≼ v L , then we construct the following reduced Pauli-LIMDD for |φ⟩: the root node ). It is straightforward to check that, in both cases, this Pauli-LIMDD is reduced. Moreover, |v⟩ isomorphic to |φ⟩ as illustrated in Fig. 12 (right).
The fact that these nodes are isomorphic means that there is a Pauli isomorphism P such that P |v L ⟩ = |v M ⟩. We write P = λP top ⊗ P rest ̸ = 0 where P top is a single-qubit Pauli matrix and P rest an (n − 1)-qubit Pauli LIM. Expanding the semantics of v L and v M , we obtain, We distinguish two cases from here on: where P top ∈ {I, Z} or P top ∈ {X, Y }.
, −1}, then Eq. 8 gives: We conclude that in both cases v L and v M have the same children and the same edge labels, so they are identical by the merge rule.
for z ∈ {1, i}, then Eq. 8 gives: because then |v 0 M ⟩ is the all-zero vector, which we excluded. The other case: if B M = 0, then it must be that λzP rest A L |v 0 L ⟩ is zero. Since λzP rest ̸ = 0 and A L = I, it follows that |v 0 L ⟩ is the all-zero vector, which is again excluded.
We conclude that v L and v M have the same children and the same edge labels for all choices of P top , so they are identical by the merge rule.

The MakeEdge subroutine: Maintaining canonicity during simulation
To construct new nodes and edges, our algorithms use the MakeEdge subroutine (Alg. 11), as discussed in Sec. 4.1. MakeEdge produces a reduced parent node (with root edge) given two reduced children, so that the LIMDD representation becomes canonical. Here we give the algorithm for MakeEdge and show that it runs in time O(n 3 ) (assuming the input nodes are reduced).
The MakeEdge subroutine distinguishes two cases, depending on whether both children are nonzero vectors, which both largely follow the discussion below Def. 5. It works as follows: • First it ensures low precedence, switching e 0 and e 1 if necessary at Line 3. This is also done if e 0 's label A is 0 to allow for low factoring (avoiding divide by zero).
• Low factoring, i.e., dividing out the LIM A, placing it on the root node, is visualized in Fig. 12 for the cases e 1 = 0/e 1 ̸ = 0, and done in the algorithm at Line 6,7 / 9,11.
• The zero edges rule is enforced in the B = 0 branch by taking v 1 := v 0 .
• The canonical high label B high is computed by GetLabels, discussed below, for the semireduced node with v 0 ̸ = v 1 . With the resulting high label, it now satisfies the high determinism rule of Def. 5 with HighLabel(w) = B high .
• Finally, we merge nodes by creating an entry (v 0 , B high , v 1 ) in a

Choosing a canonical high-edge label
In order to choose the canonical high edge label of node v, the MakeEdge algorithm calls GetLabels (Line 10 of Alg. 11). The function GetLabels returns a uniquely chosen LIM B high among all possible high-edge labels which yield LIMDDs representing states that are Pauli-isomorphic to |v⟩. We sketch the algorithm for GetLabels here and provide the algorithm in full detail in App. D (Alg. 12). First, we characterize the eligible high-edge labels. That is, given a semi- . Our characterization shows that, modulo some complex factor, the eligible labels C are of the form C ∝ g 0 ·Â · g 1 , for g 0 ∈ Stab(|v 0 ⟩), g 1 ∈ Stab(|v 1 ⟩) where Stab(|v 0 ⟩) and Stab(|v 1 ⟩) are the stabilizer subgroups of |v 0 ⟩ and |v 1 ⟩, i.e., the already reduced children of our input node v. Note that the set of eligible high-edge labels might be exponentially large in the number of qubits. Fortunately, eq. (10) shows that this set has a polynomial-size description by storing only the generators of the stabilizer subgroups.

Algorithm 11
Algorithm MakeEdge takes two root edges to (already reduced) nodes v 0 , v 1 , the children of a new node, and returns a reduced node with root edge. It assumes that idx(v 0 ) = idx(v 1 ) = n. We indicate which lines of code are responsible for which reduction rule in Def. 5. v := v0 Our algorithm chooses the lexicographically smallest eligible label, i.e., the smallest C of the form C ∝ g 0Â g 1 (the definition of 'lexicographically smallest' is given in App. A). To this end, we use two subroutines: (1) an algorithm which finds (a generating set of) the stabilizer group Stab(|v⟩) of a LIMDD node v; and (2) an algorithm that uses these stabilizer subgroups of the children nodes to choose a unique representative of the eligible-high-label set from eq. (10).
For (1), we use an algorithm which recurses on the children nodes. First, we note that, if the Pauli LIM A stabilizes both children, then I ⊗ A stabilizes the parent node. Therefore, we compute (a generating set for) the intersection of the children's stabilizer groups. Second, our method finds out whether the parent node has stabilizers of the form P n ⊗ A for P n ∈ {X, Y, Z}. This requires us to decide whether certain cosets of the children's stabilizer groups are empty. These groups are relatively simple, since, modulo phase, they are isomorphic to a binary vector space, and cosets are hyperplanes. We can therefore rely in large part on existing algorithms for linear algebra in vector spaces. The difficult part lies in dealing with the non-abelian aspects of the Pauli group. We provide the full algorithm, which is efficient, also in App. D.
Our algorithm for (2) applies a variant of Gauss-Jordan elimination to the generating sets of Stab(|v 0 ⟩) and Stab(|v 1 ⟩) to choose g 0 and g 1 in eq. (10) which, when multiplied withÂ as in eq. (10), yield the smallest possible high label C. (We recall that Gauss-Jordan elimination, a standard linear-algebra technique, is applicable here because the stabilizer groups are group isomorphic to binary vector spaces, see also App. A). We explain the full algorithm in App. D.

Checking whether two LIMDDs are Pauli-equivalent
To check whether two states represented as LIMDDs are Pauli-equivalent, it suffices to check whether they have the same root node. Namely, due to canonicity, and in particular the Merge rule (in Def. 5), there is a unique LIMDD representing a quantum state up to phase and local Pauli operators.

Related work
We mention related work on classical simulation formalisms and decision diagrams other than QMDD.
The Affine Algebraic Decision Diagram, introduced by Tafertshofer and Pedam [63], and by Sanner and McAllister [26], is akin to a QMDD except that its edges are labeled with a pair of real numbers (a, b), so that an edge v (a, b) represents the state vector a |v⟩ + b |+⟩ ⊗n (i.e., here b is added to each element of the vector a |v⟩). To the best of our knowledge, this diagram has not been applied to quantum computing.
Context-Free-Language Ordered Binary Decision Diagrams (CFLOBDDs) [64,65] extend BDDs with insights from visibly pushdown automata [66]. An extension of CFLOBDD to the complex domain [67] shows good performance for various simulation of quantum computing benchmarks. Sentential Decision Diagrams [68] generalize BDDs by replacing their total variable order with a variable tree (vtree). Although Kisa et al. [69] introduced an SDD which represents probability distributions, SDDs have not yet been used to simulate quantum computing, to the best of our knowledge. The Variable-Shift SDD (VS-SDD) [70] improves on the SDD by merging isomorphic vtree nodes. We remark that CFLOBDDs are similar to VS-SDD with a balanced vtree.
Günther and Drechsler introduced a BDD variant [71] which, in LIMDD terminology, has a label on the root node only. To be precise, this diagram's root edge is labeled with an invertible matrix A ∈ F n×n 2 . If the root node represents the function r : B n → B, then the diagram represents the function f (⃗ x) = r(A · ⃗ x). (This concepts extends trivially to the domain of pseudo-Boolean functions, by replacing the BDD with an ADD.) In contrast, LIMDDs allow a label on every edge in the diagram, not only the root edge. We show that this is essential to capture stabilizer states.
A multilinear arithmetic formula is a formula over +, × which computes a polynomial in which no variable appears raised to a higher power than 1. Aaronson showed that some stabilizer states require superpolynomial-size multilinear arithmetic formulas [33,45].

Discussion
We have introduced LIMDD, a novel decision diagram-based method to simulate quantum circuits, which enables polynomial-size representation of a strict superset of stabilizer states and the states represented by polynomially large QMDDs. To prove this strict inclusion, we have shown the first lower bounds on the size of QMDDs: they require exponential size for certain families of stabilizer states. Our results show that these states are thus hard for QMDDs. We also give the first analytical comparison between simulation based on decision diagrams, and matrix product states, and the Clifford + T simulator.
LIMDDs achieve a more succinct representation than QMDDs by representing states up to local invertible maps which uses single-qubit (i.e., local) operations from a group G. We have investigated the choices G = Pauli, G = ⟨Z⟩ and G = ⟨X⟩, and found that any choice suffices for an exponential advantage over QMDDs; notably, the choice G = Pauli allows us to succinctly represent any stabilizer state. Furthermore, we showed how to simulate arbitrary quantum circuits, encoded as Pauli-LIMDDs. The resulting algorithms for simulating quantum circuits are exponentially faster than for QMDDs in the best case, and never more than a polynomial factor slower. In the case of Clifford circuits, the simulation by LIMDDs is in polynomial time (in contrast to QMDDs).
We have shown that Pauli-LIMDDs can efficiently simulate a circuit family outputting the W states, in contrast to the Clifford + T simulator which requires exponential time to do so (assuming the widely believed ETH), even when allowing for preprocessing of the circuit with a T -count optimizer.
Since we know from experience that implementing a decision diagram framework is a major endeavor, we leave an implementation of the Pauli-LIMDD, in order to observe its runtimes in practice on relevant quantum circuits, to future work. We emphasize that from the perspective of algorithm design, we have laid all the groundwork for such an implementation, including the key ingredient for the efficiency of many operations for existing decision diagrams: the existence of a unique canonical representative of the represented function, combined with a tractable MakeEdge algorithm to find it.
Regarding extensions of the LIMDD data structure, an obvious next step is to investigate other choices of G. Of interest are both the representational capabilities of such diagrams (do they represent interesting states?), and the algorithmic capabilities (can we still find efficient algorithms which make use of these diagrams?). In this vein, an important question is what the relationship is between G-LIMDDs (for various choices of G) and existing formalisms for the classical simulation of quantum circuits, such as those based on match gates [72][73][74] and tensor networks [29,75]. It would also be interesting to compare LIMDDs to graphical calculi such as the ZX calculus [76], following similar work for QMDDs [77].
Lastly, we note that the current definition of LIMDD imposes a strict total order over the qubits along every path from root to leaf. It is known that the chosen order can greatly influence the size of the DD [55,78], making it interesting to investigate variants of LIMDDs with a flexible ordering, for example taking inspiration from the Sentential Decision Diagram [68? ].

A Linear-algebra algorithms for Pauli operators
In Sec. 2, we defined the stabilizer group for an n-qubit state |φ⟩ as the group of Pauli operators A ∈ Pauli n which stabilize |φ⟩, i.e. A |φ⟩ = |φ⟩. Here, we explain existing efficient algorithms for solving various tasks regarding stabilizer groups (whose elements commute with each other). We also outline how the algorithms can be extended and altered to work for general PauliLIMs, which do not necessarily commute. For sake of clarity, in the explanation below we first ignore the scalar λ of a PauliLIM or Pauli element λP . At the end, we explain how the scalars can be taken into account when we use these algorithms as subroutine in LIMDD operations.
A set of k Pauli strings thus can be written as 2n × k binary matrix, often called check matrix, as the following example shows.
Furthermore, if P, Q are Pauli strings corresponding to binary vectors (⃗ x P , ⃗ z P ) and (⃗ x Q , ⃗ z Q ), then and therefore the group of n-qubit Pauli strings with multiplication (disregarding factors) is group isomorphic to the vector space {0, 1} 2n (i.e., F 2n 2 ) with bitwise addition ⊕ (i.e., exclusive or; 'xor'). Consequently, many efficient algorithms for linear-algebra problems carry over to sets of Pauli strings. In particular, if G = {g 1 , . . . , g k } are length−2n binary vectors (/ n-qubit Pauli strings) with k ≤ n, then we can efficiently perform the following operations. RREF: bring G into a reduced-row echelon form (RREF) using Gauss-Jordan elimination (both are standard linear algebra notions) where each row in the check matrix has strictly more leading zeroes than the row above. The RREF is achievable by O(k 2 ) row additions (/ multiplications modulo factor) and thus O(k 2 · n) time (see [79] for a similar algorithm). In the RREF, the first 1 after the leading zeroes in a row is called a 'pivot'.

Construct Minimal-size Generator
Set convert G to a (potentially smaller) set G ′ by performing the RREF procedure and discarding resulting all-zero rows. It holds that ⟨G⟩ = ⟨G ′ ⟩, i.e., these sets generate the same group modulo phase. Intersection: determine all Pauli strings which, modulo a factor, are contained in both G A and G B , where G A , G B are generator sets for n-qubit stabilizer subgroups. More specifically, we obtain the generator set of this group, i.e., we obtain a set G C such that ⟨G C ⟩ = ⟨G A ⟩∩⟨G B ⟩. This can be achieved using the Zassenhaus algorithm [80] for computing the intersection of two subspaces of a vector space, in time O(n 3 ). if h j = 1 and G has a row g i with its pivot at position j then h := h ⊕ g i

Membership
The resulting h is h rem . This algorithm's runtime is dominated by the RREF step; O(n 3 ).
To include the scalar into the representation, we remark that Pauli LIMs that appear as labels on diagrams may have λ ∈ C, i.e., any complex number is allowed. Therefore, to store LIMs, we use a minor extension to the check vector form introduced above, in order to also include the phase. Specifically, the phase is stored using two real numbers, by writing λ = r · e iθ with r ∈ R >0 and θ ∈ [0, 2π). Consequently, the check vector has 2n + 2 entries, where the last entries store r and θ, e.g.: where we used 3 = 3 · e i·0 and − 1 2 i = 1 2 · e 3πi/2 . This extended check vector also easily allows a total ordering, namely, we simply use the ordering on real numbers for r and θ. For example, (1, 1, |0, 0|2, 1 2 ) < (1, 1|1, 0|3, 0). Let us stress that the factor encoding (r, θ) is less significant than the Pauli string encoding (x n , . . . , x 1 |z n , . . . , z 1 ). As a consequence, we can greedily determine the minimum of two Pauli operators, by reading their check vectors from left to right.
Finally, we emphasize that the algorithms above rely on bitwise xor-ing, which is a commutative operation. Since conventional (i.e., factor-respecting) multiplication of Pauli operators is not commutative, the algorithms above are not straightforwardly applicable to arbitrary PauliLIM n input. (When the input consist of pairwise commuting Pauli operators, such as stabilizer subgroups [7], the algorithms can be made to work by adjusting row addition to keep track of the scalar.) Fortunately, since Pauli strings either commute or anti-commute, row addition may only yield factors up to the ± sign, not the resulting Pauli strings. This feature, combined with the stipulated order assigning least significance to the factor, enables us to invoke the algorithms above as subroutine. We do so in Sec. 4.2.1 and Sec. D.1.

B Proof that cluster states and coset states need exponentially-large QMDDs
In this appendix, we show that QMDDs which represent both clusters states, and coset states, are exponentially large in the worst case (respectively, Th. 2 and Corollary 11). On the other hand, in App. C, we will show that these states can be represented using only O(n) nodes by ⟨X⟩-LIMDDs, showing that they are exponentially more succinct than QMDDs. We first fix notation and definitions, after which we prove the theorem using two lemmas.
Let G be an undirected graph with vertices For a subset of vertices S ⊆ V G , the S-induced subgraph of G has vertices S and edge set (S × S) ∩ E. Given G, its graph state |G⟩ is expressed as where f G (⃗ x) is the number of edges in the S-induced subgraph of G.
For a function f : {0, 1} n → C and bit string ⃗ a = a 1 · · · a k ∈ {0, 1} k , we denote by f ⃗ a the subfunction of f restricted to ⃗ a: (a 1 , . . . , a k , x k+1 , . . . , x n ) We also say that f ⃗ a is a subfunction of f of order |⃗ a| = k.
We will also need the notions of boundary and strong matching. Using these definitions and notation, we prove Th. 2.

Theorem 2.
Denote by |G n ⟩ the two-dimensional cluster state, defined as a graph state on the n × n lattice. Each QMDD representing |G n ⟩ has at least 2 ⌊n/12⌋ nodes.
Proof. Let G = lattice(n, n) be the undirected graph of the n × n lattice, with vertex set V = {v 1 , . . . , v n 2 }. Let σ = v 1 v 2 · · · v n 2 be a variable order, and let S = {v 1 , v 2 , . . . , v 1 2 n 2 } ⊂ V be the first 1 2 n 2 vertices in this order. The proof proceeds broadly as follows. First, in Lemma 1, we show that any (S, S)-strong matching M effects 2 |M | different subfunctions of f G . Second, Lemma 2 shows that the lattice contains a large (S, S)-strong matching for any choice of S. Put together, this will prove the lower bound on the number of QMDD nodes as in Th. 2 by the fact that a QMDD for the cluster state G has a node per unique subfunction of the function f G . Fig. 13 illustrates this setup for the 5 × 5 lattice.  a (0, . . . , 0, x t = 0, 0, . . . , 0) ⃗ a (0, . . . , 0, x t = 1, 0, . . . , 0) (13) We see that each subset of S M corresponds to a different subfunction of f G . Since there are 2 |M | subsets of M , f G has at least that many subfunctions.
We now show that the n × n lattice contains a large enough strong matching.

Lemma 2.
Let S = {v 1 , . . . , v 1 2 n 2 } be a set of 1 2 n 2 vertices of the n × n lattice, as above. Then the graph contains a (S, S)-strong matching of size at least 1 12 n .
Proof. Consider the boundary B S of S. This set contains at least n/3 vertices, by Theorem 11 in [49]. Each vertex of the boundary of S has degree at most 4. It follows that there is a set of 1 4 |B S | vertices which share no neighbors. In particular, there is a set of 1 4 |B S | ≥ 1 12 n vertices in B S which share no neighbors in S.
Put together, every choice of half the vertices in the lattice yields a set with a boundary of at least n/3 nodes, which yields a strong matching of at least 1 12 n edges, which shows that f G has at least 2 ⌊ 1 12 n⌋ subfunctions of order 1 2 n 2 . Our result follows by noting that if f has codomain {0, 1} as above, then the QMDD of the state |f ⟩ = x f (x) |x⟩ has the same structure as the BDD of f . Consequently, in particular the BDD and QMDD have the same number of nodes.
Proof. We will show that the QMDD has the same number of nodes as a BDD. A BDD encodes a function f : {0, 1} n → {0, 1}. In this case, the BDD encodes f V , the characteristic function of V . A BDD is a graph which contains one node for each subfunction of f . (In the literature, such a BDD is sometimes called a Full BDD, so that the term BDD is reserved for a variant where the nodes are in one-to-one correspondence with the subfunctions f which satisfy f 0 ̸ = f 1 ).
Similarly, a QMDD representing a state |φ⟩ = x f (x) |x⟩ can be said to represent the function f : {0, 1} n → C, and contains one node for each subfunction of f modulo scalars. We will show that, two distinct subfunctions of f V are never equal up to a scalar. To this end, let f V,a , f V,b be distinct subfunctions of f V induced by partial assignments a, b ∈ {0, 1} k . We will show that there is no λ ∈ C * such that f V,a = λf V,b . Since the two subfunctions are not pointwise equal, say that the two subfunctions differ in the point x ∈ {0, 1} n−k , i.e., f V,a (x) ̸ = f V,b (x). Say without loss of generality that f V,a (x) = 0 and f V,b (x) = 1. Then, since λ ̸ = 0, we have Because distinct subfunctions of f V are not equal up to a scalar, the QMDD of |V ⟩ contains a node for every unique subfunction of f V . We conclude that, since by Th. 10 with high probability the BDD representing f V has exponentially many nodes, so does the QMDD representing |V ⟩.
C How to write graph states, coset states and stabilizer states as Tower-LIMDDs In this appendix, we prove that the families of ⟨Z⟩-, ⟨X⟩-, and ⟨Pauli⟩-Tower-LIMDDs correspond to graph states, coset states, and stabilizer states, respectively, in Th. 12, Th. 13 and Th. 1 below. Def. 5 for reduced Pauli-LIMDDs requires modification for G = ⟨Z⟩-LIMDDs because of the absence of X as discussed below the definition. Note that the proofs do not rely on the specialized definition of reduced LIMDDs, but only on Def. 2 which allows parameterization of the LIM G. They only rely on the Tower LIMDD in Def. 3.
Before we give the proof, we remark that graph states present an interesting special case because the LIMDD's edge labels contain meaningful information. Namely, the labels on the high edges of a graph state's LIMDD are precisely the edges in the original graph. Specifically, suppose a graph G gives rise to a graph state |φ G ⟩ represented by a LIMDD. Let P = P k−1 ⊗ · · · ⊗ P 1 be the label on the high edge out of the LIMDD node at level k. Then G contains an edge (v k , v j ) if and only if P j = Z (with the roles of k and j reversed if k < j). These edge labels come about in a straightforward manner during the construction of the graph state. Namely, the graph state |φ G ⟩ is produced by starting from the state |+⟩ ⊗n , and applying controlled-Z gates to qubit pairs (u, v) for every edge (u, v) in the graph. Applying such a controlled-Z gate to qubit pair (u, v) has the effect of setting P v to Z in the high edge outgoing from the vertex at level u. In general, however, the labels on the high edges cannot be easily inferred from the stabilizer state.
A G-Tower-LIMDD representing an n-qubit state is a LIMDD which has n nodes, not counting the leaf. It has G-LIMs on its high edges. Def. 3 gives an exact definition.
Theorem 12 (Graph states are ⟨Z⟩-Tower-LIMDDs). Let n ≥ 1. Denote by G n the set of n-qubit graph states and write Z n for the set of n-qubit quantum states which are represented by ⟨Z⟩-Tower-LIMDDs a defined in Def. 3, i.e, a tower with low-edge-labels I and high-edge labels λ j P j with P j ∈ {I, Z} and λ = 1, except for the root edge where λ ∈ C \ {0}. Then G n = Z n .
Proof. We establish G n ⊆ Z n by providing a procedure to convert any graph state in G n to a ⟨Z⟩-Tower-LIMDD in Z n . See Fig. 14 for an example of a 4-qubit graph state. We describe the procedure by induction on the number n of qubits in the graph state.
Base case: n = 1. We note that there is only one single-qubit graph state by definition (see Eq. 11), which is |+⟩ := (|0⟩ + |1⟩)/ √ 2 and can be represented as LIMDD by a single node (in addition to the leaf node): see Fig. 14(a).

Induction case.
We consider an (n + 1)-qubit graph state |G⟩ corresponding to the graph G. We isolate the (n + 1)-th qubit by decomposing the full state definition from Eq. 11: where E is the edge set of G and G 1..n is the induced subgraph of G on vertices 1 to n. Thus, |G 1..n ⟩ is an n-qubit graph state on qubits 1 to n. Since |G 1..n ⟩ is a graph state on n qubits, by the induction hypothesis, we have a procedure to convert it to a ⟨Z⟩-Tower-LIMDD ∈ Z n . Now we construct a ⟨Z⟩-Tower-LIMDD for |G⟩ as follows. The root node has two outgoing edges, both Figure 14: Construction of the ⟨Z⟩-Tower LIMDD for the 4-qubit cluster state, by iterating over the vertices in the graph, as described in the proof of Th. 12. (a) First, we consider the single-qubit graph state, which corresponds to a the subgraph containing only vertex A. (b) Then, we add vertex B, which is connected to A by an edge. The resulting LIMDD is constructed from the LIMDD from (a) by adding a new root node. In the figure, the isomorphism is ZB ⊗ IA, since vertex C is connected to vertex B (yielding the Z operator) but not to A (yielding the identity operator I). (c) This process is repeated for a third vertex C until we reach the LIMDD of the full 4-qubit cluster state (d). For comparison, (d) also depicts a regular QMDD for the same graph state, which has width 4 instead of 1 for the LIMDD.
To prove Z n ⊆ G n , we show how to construct the graph corresponding to a given ⟨Z⟩-Tower LIMDD. Briefly, we simply run the algorithm outlined above in reverse, constructing the graph one node at a time. Here we assume without loss of generality that the low edge of every node is labeled I.
Base case. The LIMDD node above the Leaf node, representing the state |+⟩, always represents the singleton graph, containing one node.
Induction case. Suppose that the LIMDD node k + 1 levels above the Leaf has a low edge labeled I, and a high edge labeled P k ⊗ · · · ⊗ P 1 , with P j = Z aj for j = 1 . . . k. Here by Z aj we mean Z 0 = I and Z 1 = Z. Then we add a node labeled k + 1 to the graph, and connect it to those nodes j with a j = 1, for j = 1 . . . k. The state represented by this node is of the form given in Eq. 15, so it represents a graph state.
A simple counting argument based on the above construction shows that |Z n | = |G n | = 2 ( n 2 ) , so the conversion is indeed a bijection. Namely, there are 2 ( n 2 ) graphs, since there are n 2 edges to choose, and there are 2 ( n 2 ) ⟨Z⟩-Tower-LIMDDs, because the total number of single-qubit operators of the LIMs on the high edges is n 2 , each of which can be independently chosen to be either I or Z.
We now prove that coset states are represented by ⟨X⟩-Tower-LIMDDs.
Theorem 13 (coset states are ⟨X⟩-Tower-LIMDDs). Let n ≥ 1. Denote by V n the set of n-qubit coset states and write X n for the set of n-qubit quantum states which are represented by ⟨X⟩-Tower-LIMDDs as per Def. 3, i.e., a tower with low edge labels I and high edge labels λ j P j with P j ∈ {I, X} and λ ∈ {0, 1}, except for the root edge where λ ∈ C \ {0}. Then V n = X n .
Proof. We first prove V n ⊆ X n by providing a procedure for constructing a Tower-LIMDD for a coset state. We prove the statement for the case when C is a group rather than a coset; the result will then follow by noting that, by placing the label X an ⊗ · · · ⊗ X a1 on the root edge, we obtain the coset state |C + a⟩. The procedure is recursive on the number of qubits.
Base case: n = 1. In this case, there are two coset states: |0⟩ and (|0⟩ + |1⟩)/ √ 2, which are represented by a single node which has a low and high edge pointing to the leaf node with low/high edge labels 1/0 and 1/1, respectively. Induction case. Now consider an (n + 1)-qubit coset state |S⟩ for a group S ⊆ {0, 1} n+1 for some n ≥ 1 and assume we have a procedure to convert any n-qubit coset state into a Tower-LIMDD in X n . We consider two cases, depending on whether the first bit of each element of S is zero: (a) The first bit of each element of S is 0. Thus, we can write S = {0x | x ∈ S 0 } for some set S 0 ⊆ {0, 1} n . Then 0a, 0b ∈ S =⇒ 0a ⊕ 0b ∈ S implies a, b ∈ S 0 =⇒ a ⊕ b ∈ S 0 and thus S 0 is an length-n bit string vector space. Thus by assumption, we have a procedure to convert it to a Tower-LIMDD in X n . Convert it into a Tower-LIMDD in X n+1 for |S⟩ by adding a fresh node on top with low edge label I ⊗n and high edge label 0, both pointing to the the root S.
(b) There is some length-n bit string u such that 1u ∈ S. Write S as the union of the sets {0x | x ∈ S 0 } and {1x | x ∈ S 1 } for sets S 0 , S 1 ⊆ {0, 1} n . Since S is closed under element-wise XOR, we have 1u ⊕ 1x = 0(u ⊕ x) ∈ S for each x ∈ S 1 and therefore u ⊕ x ∈ S 0 for each x ∈ S 1 . This implies that S 1 = {u ⊕ x | x ∈ S 0 } and thus S is the union of {0x | x ∈ S 0 } and {1u ⊕ 0x | x ∈ S 0 }. By similar reasoning as in case (a), we can show that S 0 is a vector space on length-n bit strings.
We build a Tower-LIMDD for |S⟩ as follows. By the induction hypothesis, there is a Tower-LIMDD with root node v which represents |v⟩ = |S 0 ⟩. We construct a new node whose two outgoing edges both go to this node v. Its low edge has label I ⊗n and its high edge has label P = P n ⊗ · · · ⊗ P 1 where P j = X if u j = 1 and P j = I if u j = 0.
We now show V n ⊆ X n , also by induction.
Base case: n = 1. There are only two Tower-LIMDDs on 1 qubit satisfying the description above, namely (1) A node whose two edges point to the leaf. Its low edge has label 1, and its high edge has label 0. This node represents the coset state |0⟩, corresponding to the vector space V = {0} ⊆ {0, 1} 1 .
(2) A node whose two edges point to the leaf. Its low edge has label 1 and its high edge also has label 1. This node represents the coset state |0⟩ + |1⟩, corresponding to the vector space V = {0, 1}.

Induction case.
Let v be the root node of an n + 1-qubit Tower ⟨X⟩-LIMDD as described above. We distinguish two cases, depending on whether v's high edge has label 0 or not.
(b) the high edge has label P = P n ⊗ · · · ⊗ P 1 with P j ∈ {I, X}. Then |v⟩ = |0⟩ |v 0 ⟩ + |1⟩ ⊗ P |v 0 ⟩. By the observations above, this is a coset state, corresponding to the vector space , 1} n is a string whose bits are u j = 1 if P j = X and u j = 0 if P j = I, and V 0 is the vector space corresponding to the coset state |v 0 ⟩.
Lastly, we prove the stabilizer-state case, showing that they are exactly equivalent to the ⟨Pauli⟩-Tower-LIMDD, as defined in Def. 3. For this, we first need Lemma 3 and Lemma 4, which state that, if one applies a Clifford gate to a ⟨Pauli⟩-Tower-LIMDD, the resulting state is another ⟨Pauli⟩-Tower-LIMDD. First, Lemma 3 treats the special case of applying a gate to the top qubit; then Lemma 4 treats the general case of applying a gate to an arbitrary qubit.

Lemma 3.
Let |φ⟩ be an n-qubit stabilizer state which is represented by a ⟨Pauli⟩-Tower-LIMDD as defined in Def. 3. Let U be either a Hadamard gate or S gate on the top qubit (n-th qubit), or a downward CNOT with the top qubit as control. Then U |φ⟩ is still represented by a ⟨Pauli⟩-Tower-LIMDD.
Proof. The proof is on the number n of qubits.
Induction case. For n > 1, we first consider U = S and U = CNOT. Let R be the label of the root edge. If U = S, then the high edge of the top node is multiplied with i, while a downward CNOT (target qubit with index k) updates the high edge label A → X k A. Next, the root edge label is updated to U RU † , which is still a Pauli string, since U is a Clifford gate. Since the high labels of the top qubit in the resulting diagram is still a Pauli string, and the high edge's weights are still ∈ {0, ±1, ±i}, we conclude that both these gates yield a ⟨Pauli⟩-Tower-LIMDD. Finally, for the Hadamard, we decompose |φ⟩ = |0⟩ ⊗ |ψ⟩ + α |1⟩ ⊗ P |ψ⟩ for some (n − 1)-qubit stabilizer state |ψ⟩, α ∈ {0, ±1, ±i} and P is an (n − 1)-qubit Pauli string. Now we note that H |φ⟩ ∝ |0⟩ ⊗ |ψ 0 ⟩ + |1⟩ ⊗ |ψ 1 ⟩ where |ψ x ⟩ := (I + (−1) x αP ) |ψ⟩ with x ∈ {0, 1}. Now we consider two cases, depending on whether P commutes with all stabilizers of |ψ⟩: (a) There exist a stabilizer g of |ψ⟩ which anticommutes with P . We note two things. First, ⟨ψ|P |ψ⟩ = ⟨ψ|P g|ψ⟩ = ⟨ψ|g · (−P )|ψ⟩ = −⟨ψ|P |ψ⟩, hence ⟨ψ|P |ψ⟩ = 0. It follows from Lemma 15 of [82] that |ψ x ⟩ is a stabilizer state, so by the induction hypothesis it can be written as a ⟨Pauli⟩-Tower-LIMDD. Let v be the root node of this LIMDD. Next, we note that is the root node of a ⟨Pauli⟩-Tower-LIMDD for H |φ⟩. Proof. The proof is by induction on n. The case n = 1 is covered by Lemma 3. Suppose that the induction hypothesis holds, and let |φ⟩ be an n + 1-qubit state represented by a ⟨Pauli⟩-Tower-LIMDD. First, we note that a CNOT gate CX t c can be written as CX t c = (H ⊗ H)CX c t (H ⊗ H), so without loss of generality we may assume that c > t. We treat two cases, depending on whether U affects the top qubit or not.
Finally, we show that stabilizer states are precisely the ⟨Pauli⟩-Tower-LIMDDs. Proof. We first prove that each stabilizer state is represented by a ⟨Pauli⟩-Tower-LIMDD. We recall that each stabilizer state can be obtained as the output state of a Clifford circuit on input state |0⟩ ⊗n . Each Clifford circuit can be decomposed into solely the gates H, S and CNOT. The state |0⟩ ⊗n is represented by a ⟨Pauli⟩-Tower-LIMDD. According to Lemma 4, applying an H, S or CNOT gate to a ⟨Pauli⟩-Tower-LIMDD results a state represented by another ⟨Pauli⟩-Tower-LIMDD. One can therefore apply the gates of a Clifford circuit to the initial state |0⟩, and obtain a ⟨Pauli⟩-Tower-LIMDD for every intermediate state, including the output state. Therefore, every stabilizer state is represented by a ⟨Pauli⟩-Tower-LIMDD.

D Efficient algorithms for choosing a canonical high label
Here, we present an efficient algorithm which, on input Pauli-LIMDD node , returns a canonical choice for the high label B high (algorithm GetLabels, in Alg. 12). By canonical, we mean that it returns the same high label for any two nodes in the same isomorphism equivalence class, i.e., for any two nodes v, w for which |v⟩ ≃ Pauli |w⟩.
We first characterize all eligible labels B high in terms of the stabilizer subgroups of the children nodes v 0 , v 1 , denoted as Stab(v 0 ) and Stab(v 1 ) (see Sec. 2 for the definition of stabilizer subgroup). Then, we provide the algorithm GetLabels which correctly finds the lexicographically minimal eligible label (and corresponding root label), and runs in time O(n 3 ) where n is the number of qubits. Fig. 15 illustrates this process. In the figure, the left node w summarizes the status of the Ma-keEdge algorithm on Line 10, when this algorithm has enough information to construct the semi-reduced node w v0 I ⊗n v1 λP , shown on the left. The node v r , on the right, is the canonical node, and is obtained by replacing w's high edge's label by the canonical label B high . This label is chosen by minimizing the expression B high = (−1) s λ (−1) x g 0 P g 1 , where the minimization is over s, x ∈ {0, 1}, g 0 ∈ Stab(|v 0 ⟩), g 1 ∈ Stab(|v 1 ⟩), subject to the constraint that x = 0 if v 0 ̸ = v 1 . We have |w⟩ ≃ Pauli |v r ⟩ by construction as intended, namely, they are related via |w⟩ = B root |v r ⟩. Th. 14 shows that this way to choose the high label indeed captures all eligible high labels, i.e., a Proof. It is straightforward to verify that the isomorphism B root in eq. (18) indeed maps |w⟩ to |v⟩ (as x = 1 implies v 0 = v 1 ), which shows that |w⟩ ≃ |v⟩. For the converse direction, suppose there exists an n-qubit Pauli LIM C such that C |w⟩ = |v⟩, i.e., We show that if B high satisfies eq. (19), then it has a decomposition as in eq. (17). We write C = C top ⊗ C rest where C top is a single-qubit Pauli operator and C rest is an (n − 1)-qubit Pauli LIM (or a complex number ̸ = 0 if n = 1). We treat the two cases C top ∈ {I, Z} and C top ∈ {X, Y } separately: (a) Case C top ∈ {I, Z}. Then C top = 1 0 0 (−1) y for y ∈ {0, 1}. In this case, Eq. 19 implies C top |0⟩ C rest |v 0 ⟩ = |0⟩ |v 0 ⟩, so C rest |v 0 ⟩ = |v 0 ⟩, in other words C rest ∈ Stab(|v 0 ⟩). Moreover, Eq. 19 implies (−1) y λC rest P |v 1 ⟩ = B high |v 1 ⟩, or, equivalently, (−1) −y λ −1 P −1 C −1 rest B high ∈ Stab(v 1 ). Hence, by choosing s = y and x = 0, we compute Figure 15: Illustration of finding a canonical high label for a semi-reduced node w, yielding a reduced node v r . The chosen high label is the minimal element from the set of eligible high labels based on stabilizers g0, g1 of v0, v1 (drawn as self loops). The minimal element holds a factor λ (−1) x for some x ∈ {0, 1}. There are two cases: if v0 ̸ = v1 or x = 0, then the factor is λ and the root edge should be adjusted with an I or Z on the root qubit. The other case, x = 1, leads to an additional multiplication with an X on the root qubit.
From Eq. 20, we first note that |v 0 ⟩ and |v 1 ⟩ are isomorphic, so by Corollary 9, and because the diagram has merged these two nodes, we have v 0 = v 1 . Consequently, we find from Eq. 20 that z −1 C −1 rest B high ∈ Stab(v 0 ) and z −1 λC rest P ∈ Stab(v 1 ). Now choose x = 1 and choose s such that (−1) s · z −2 C −1 rest B high C rest = B high (recall that Pauli LIMs either commute or anticommute, so B high C rest = ±C rest B high ). This yields: where we used the fact that P 2 = I ⊗(n−1) because P is a Pauli string.

Corollary 15.
As a corollary of Th. 14, we find that taking, as in Fig. 15, yields a proper implementation of HighLabel as required by Def. 5, because it considers all possible B high such that |v⟩ ≃ Pauli |0⟩ |v 0 ⟩ + |1⟩ ⊗ B high |v 1 ⟩.
A naive implementation for GetLabels would follow the possible decompositions of eligible LIMs (see Eq. 17) and attempt to make this LIM smaller by greedy multiplication, first with stabilizers of g 0 ∈ Stab(v 0 ), and then with stabilizers g 1 ∈ Stab(v 1 ). To see why this does not work, consider the following example: the high edge label is Z and the stabilizer subgroups are Stab(v 0 ) = ⟨X⟩ and Stab(v 1 ) = ⟨Y ⟩. Then the naive algorithm would terminate and return Z because X, Y > Z, which is incorrect since the high-edge label X · Z · Y = −iI is smaller than Z.  B high := (−1) s · λ (−1) x · g 0 · P · g 1 10: To overcome this, we consider the group closure of both Stab(v 0 ) and Stab(v 1 ). See Alg. 12 for the O(n 3 )-algorithm for GetLabels, which proceeds in two steps. In the first step (Line 3), we use the subroutine ArgLexMin for finding the minimal Pauli LIM A such that A = λP · g 0 · g 1 for g 0 ∈ Stab(v 0 ), g 1 ∈ Stab(v 1 ). We will explain and prove correctness of this subroutine below in Sec. D.2. In the second step (Line 4-8), we follow Corollary 15 by also minimizing over x and s. Finally, the algorithm returns B high , the minimum of all eligible edge labels according to Corollary 15, together with a root edge label B root which ensures the represented quantum state remains the same.
Below, we will explain O(n 3 )-time algorithms for finding generating sets for the stabilizer subgroup of a reduced node and for ArgLexMin. Since all other lines in Alg. 12 can be performed in linear time, its overall runtime is O(n 3 ).

D.1 Constructing the stabilizer subgroup of a LIMDD node
In this section, we give a recursive subroutine GetStabilizerGenSet to construct the stabilizer subgroup Stab(|v⟩) = {A ∈ PauliLIM n | A |v⟩ = |v⟩} of an n-qubit LIMDD node v (see Sec. 2). This subroutine is used by the algorithm GetLabels to select a canonical label for the high edge and root edge. If the stabilizer subgroup of v's children have been computed already, GetSta-bilizerGenSet's runtime is O(n 3 ). GetStabilizerGenSet returns a generating set for the group Stab(|v⟩). Since these stabilizer subgroups are generally exponentially large in the number of qubits n, but they have at most n generators, storing only the generators instead of all elements may save an exponential amount of space. Because any generator set G of size |G| > n can be brought back to at most n generators in time O(|G| · n 2 ) (see App. A), we will in the derivation below show how to obtain generator sets of size linear in n and leave the size reduction implicit. We will also use the notation A · G and G · A to denote the sets {A · g | g ∈ G} and {g · A | g ∈ G}, respectively, when A is a Pauli LIM.
We now sketch the derivation of the algorithm. The base case of the algorithm is the Leaf node of the LIMDD, representing the number 1, which has stabilizer group {1}. For the recursive case, we wish to compute the stabilizer group of a reduced n-qubit . If B high = 0, then it is straightforward to see that λP n ⊗ P ′ |v⟩ = |v⟩ implies P n ∈ {I, Z}, and further that Stab(|v⟩) = ⟨{P n ⊗ g | g ∈ G 0 , P n ∈ {I, Z}}⟩, where G 0 is a stabilizer generator set for v 0 .
where Iso(v, w) denotes the set of Pauli isomorphisms A which map |v⟩ to |w⟩ and we have denoted π·G := {π · g | g ∈ G} for a set G and a single operator π. Lemma 5 shows that such an isomorphism set can be expressed in terms of the stabilizer group of |v⟩.
Given generating sets for Stab(v 0 ) and Stab(v 1 ), evaluating eq. (24) requires us to: • Compute Stab(A |w⟩) from Stab(w) (as generating sets) for Pauli LIM A and node w. It is straightforward to check that AgA † g ∈ G , with ⟨G⟩ = Stab(w), is a generating set for Stab(A |w⟩).
• Find a single isomorphism between two edges, pointing to reduced nodes. In a reduced LIMDD, edges represent isomorphic states if and only if they point to the same nodes. This results in a straightforward algorithm, see Alg. 16.
• Find the intersection of two stabilizer subgroups, represented as generating sets G 0 and G 1 (IntersectStabilizerGroups, Alg. 15). First, it is straightforward to show that the intersection of two stabilizer subgroups is again a stabilizer subgroup (namely, it is abelian and does not contain −I. It is never empty since I is a stabilizer of all states). ‖ The algorithm proceeds in two steps: first, we compute the intersection of G 0 and G 1 modulo phase; second, we "correct for" the fact that the phases play a role.
Very broadly speaking, we use the following algebraic properties of Pauli groups. First, when considering a Pauli string λP modulo phase, it is convenient to think of it as simply the Pauli string P with phase equal to +1. This allows us to take an element P ∈ G 0 from a Pauli stabilizer group G 0 modulo phase, and, by abuse of language, multiply it by a phase λ ∈ C to obtain λ · P ∈ G 0 . Second, for any Pauli string P ∈ G 0 ∩ G 1 , i.e., in the group modulo phase, there exists a unique λ such that λ·P ∈ ⟨G 0 ⟩; and a unique ω such that ω·P ∈ ⟨G 1 ⟩. Moreover, we have ω = ±λ in this case due to anti-commutativity. Consequently, if S = {P 1 , . . . , P ℓ } is a generating set for the group ⟨S⟩ = G 0 ∩ G 1 modulo phase, then we can divide these generators P j into two sets, S = S 0 ∪ S 1 , where each P ∈ S 0 satisfies λP ∈ G 0 ∩ G 1 for some λ ∈ C and each P ∈ S 1 satisfies λP ∈ G 0 and −λP ∈ G 1 . The algorithm finds these sets S 0 and S 1 , including phase, in the loop in Line 5-11. Given such sets S 0 , S 1 , any element in G 0 ∩ G 1 (i.e., the set we are interested in) can be written as a product of elements of S 0 and an even number of elements from S 1 . The set {e 1 · e 2 , . . . , e 1 · e m }, found by the algorithm in Line 14-16, generates precisely this set of elements generated from an even number of elements of S 1 .
All of the above steps can be performed in O(n 3 ) time, where n is the number of qubits. In particular, a generating set for the intersection of G 0 and G 1 modulo phase is simply the intersection of two vector spaces over We remark that the Zassenhaus algorithm cannot be straightforwardly applied to find the intersection of the groups G 0 and G 1 directly, since the elements of G 0 may not commute with those of G 1 .
‖ To be clear, here we consider the stabilizers including their phase, i.e., we are not considering the groups modulo phase. Indeed, computing the intersection of two groups modulo phase is relatively easy, as shown in App. A.
• IntersectIsomorphismSets: Find the intersection of two isomorphism sets, represented as single isomorphism (π 0 , π 1 ) with a generator set of a stabilizer subgroup (G 0 , G 1 ), see Lemma 5. This is the coset intersection problem for the PauliLIM n group. Isomorphism sets are coset of stabilizer groups (see Lemma 5) and it is not hard to see that that the intersection of two cosets, given as isomorphisms π 0/1 and generator sets G 0/1 , is either empty, or a coset of ⟨G 0 ⟩ ∩ ⟨G 1 ⟩ (this intersection is computed using Alg. 15). Therefore, we only need to determine an isomorphism π ∈ π 0 ⟨G 0 ⟩ ∩ π 1 ⟨G 1 ⟩, or infer that no such isomorphism exists.
We solve this problem in O(n 3 ) time in two steps (see Alg. 14 for the full algorithm). First, we note that that π 0 ⟨G 0 ⟩ ∩ π 1 ⟨G 1 ⟩ = π 0 [⟨G 0 ⟩ ∩ (π −1 0 π 1 )⟨G 1 ⟩], so we only need to find an element of the coset S := ⟨G 0 ⟩ ∩ (π −1 0 π 1 )⟨G 1 ⟩. Now note that S is nonempty if and only if there exists g 0 ∈ ⟨G 0 ⟩, g 1 ∈ ⟨G 1 ⟩ such that g 0 = π −1 0 π 1 g 1 , or, equivalently, π −1 0 π 1 ·g 1 ·g −1 0 = I. We show in Lemma 6 that such g 0 , g 1 exist if and only if I is the smallest element in the set Sπ −1 0 π 1 ⟨G 1 ⟩ · ⟨G 0 ⟩. Hence, for finding out if S is empty we may invoke the LexMin algorithm we have already used before in GetLabels and we will explain below in Sec. D.2. If it is not empty, then we obtain g 0 , g 1 as above using ArgLexMin, and output π 0 · g 0 as an element in the intersection. Since Lexmin and ArgLexMin take O(n 3 ) time, so does Alg. 14.
The four algorithms above allow us to evaluate each of the four individual terms in eq. (24). To finish the evaluation of eq. (24), one would expect that it is also necessary that we find the union of isomorphism sets. However, we note that if πG is an isomorphism set, with π an isomorphism and G an stabilizer subgroup, then P n ⊗ (πg) = (P n ⊗ π)(I ⊗ g) for all g ∈ G. Therefore, we will evaluate eq. (24), i.e. find (a generating set) for all stabilizers of node v in two steps. First, we construct the generating set for the first term, i.e. I ⊗ (Stab(|v 0 ⟩) ∩ Stab(B high |v 1 ⟩)), using the algorithms above. Next, for each of the other three terms P n ⊗ (πG), we add only a single stabilizer of the form P n ⊗ π for each P n ∈ {X, Y, Z}. We give the full algorithm in Alg. 13 and prove its efficiency below.

Lemma 7 (Efficiency of function GetStabilizerGenSet).
Let v be an n-qubit node. Assume that generator sets for the stabilizer subgroups of the children v 0 , v 1 are known, e.g., by an earlier call to GetStabilizerGenSet, followed by caching the result (see Line 28 in Alg. 13). Then Alg. 13 (function GetStabilizerGenSet), applied to v, runs in time O(n 3 ).
Proof. If n = 1 then Alg. 13 only evaluates Line 2-4, which run in constant time. For n > 1, the algorithm performs a constant number of calls to GetIsomorphism (which only multiplies two Pauli LIMs and therefore runs in time O(n)) and four calls to IntersectIsomorphismSets. Note that the function IntersectIsomorphismSets from Alg. 14 invokes O(n 3 )-runtime external algorithms: • the Zassenhaus algorithm [80] to calculate a basis for the intersection of two subspaces of a vector space, • the RREF algorithms mentioned in App. A, and • Algorithm 2 from [82] to synthesize a circuit that transforms any stabilizer state to a basis state. Specifically, this algorithm receives as input a stabilizer subgroup G and outputs a Clifford circuit U such that U GU † = {Z 1 , . . . , Z |G| }. We remark that García et al. assume in their work that G is the stabilizer group of a stabilizer state, i.e., |G| = n, but in fact the algorithm works also without that assumption, i.e., in the more general case when G is any abelian group of Pauli operators not containing −I. Our algorithms use this more general use case.

Algorithm 13
Algorithm for constructing the Pauli stabilizer subgroup of a Pauli-LIMDD node. The algorithm always returns a set in reduced row echelon form (see App. A), which is accomplished in line 27. In particular, the set always returns at most n elements for n-qubit states.
The central procedure in Alg. 17 is ArgLexMin, which, given a LIM A and sets G 0 , G 1 which generate stabilizer groups, finds g 0 ∈ ⟨G 0 ⟩ , g 1 ∈ ⟨G 1 ⟩ such that A · g 0 · g 1 reaches its lexicographic minimum over all choices of g 0 , g 1 . It first performs the scalar-ignoring minimization (Line 5) to find g 0 , g 1 modulo scalar. The algorithm LexMin simply invokes ArgLexMin to get the arguments g 0 , g 1 which yield the minimum and uses these to compute the actual minimum.
The subroutine FindOpposite finds an element g ∈ G 0 such that −g ∈ G 1 , or infers that no such g exists. It does so in a similar fashion as IntersectStabilizerGroups from Sec. D.1: by conjugation with a suitably chosen unitary U , it maps G 1 to {Z 1 , Z 2 , . . . , Z |G1| }. Analogously to our explanation of IntersectStabilizerGroups, the group generated by U G 1 U † contains precisely all Pauli LIMs which satisfy the following three properties: (i) the scalar is 1; (ii) its Pauli string has an I or Z at positions 1, 2, . . . , |G 1 |; (iii) its Pauli string has an I at positions

E Measuring an arbitrary qubit
Alg. 18 allows one to measure a given qubit. Specifically, given a quantum state |e⟩ represented by a LIMDD edge e, a qubit index k and an outcome b ∈ {0, 1}, it computes the probability of observing |b⟩ when measuring the k-th significant qubit of |e⟩. The algorithm proceeds by traversing the LIMDD with root edge e at Line 7. Like Alg. 5, which measured the top qubit, this algorithm finds the probability of a given outcome by computing the squared norm of the state when the k-th qubit is projected onto |0⟩, or |1⟩. The case that is added, relative to Alg. 5, is the case when n > k, in which case it calls the procedure SquaredNormProjected. On input e, y, k, the procedure SquaredNormProjected outputs the squared norm of Π y k |e⟩, where Π y k = I n−k ⊗ |y⟩ ⟨y| ⊗ I k−1 is the projector which projects the k-th qubit onto |y⟩.
After measurement of a qubit k, a quantum state is typically projected to |0⟩ or |1⟩ (b = 0 or b = 1) on that qubit, depending on the outcome. Alg. 19 realizes this. It does so by traversing the LIMDD until a node v with idx(v) = k is reached. It then returns an edge to a new node by calling MakeEdge(follow 0 (e), 0) to project onto |0⟩ or MakeEdge(0, follow 1 (e)) to project onto |1⟩, of register A acts as the control qubits and one qubit in register B is the target qubit. Lastly, it applies n − log n Controlled-X gates, where, in each gate, one qubit in register B is the control qubit and one or more qubits in register A are the target qubits. Each of the three groups of gates is highlighted in a dashed rectangle in Fig. 16. On input |0⟩ ⊗n , the circuit's final state is |W n ⟩. We emphasize that the Controlled-X gates are permutation gates (i.e., their matrices are permutation matrices). Therefore, these gates do not influence the number of non-zero computational basis state amplitudes of the intermediate states. We refer to the t-th gate of this circuit as U t , and the t-th intermediate state as |ψ t ⟩, so that |ψ t+1 ⟩ = U t |ψ t ⟩ and |ψ 0 ⟩ = |0⟩ is the initial state.
For the next lemma, we use the notion of a prefix of a LIMDD node. This lemma will serve as a tool which allows us to show that a LIMDD is small when its computational basis rank is low. We apply this tool to the intermediate states of the circuit in Lemma 11.
Definition 10 (Prefix of a LIMDD node). For a given string x ∈ {0, 1} ℓ , consider the path traversed by the follow x ( r R ) subroutine, which starts at the diagram's root edge and ends at a node v on level ℓ. We will say that x is a prefix of the node v. We let Labels(x) be the product of the LIMs on the edges of this path (i.e., including the root edge). The set of prefixes of a node v is denoted pre(v).

Lemma 9.
If a LIMDD represents the state |φ⟩, then its width at any given level (i.e., the number of nodes at that level) is at most χ comp (|φ⟩).
Proof. For notational convenience, let us number the levels so that the root node is on level 0, its children are on level 1, and so on, with the Leaf on level n (contrary to Fig. 3). Let r be the root node of the LIMDD, and R the root edge's label. By construction of a LIMDD, the state represented by the LIMDD can be expressed as follows, for any level ℓ ≥ 0, Since r R is the root of our diagram, if x is a prefix of v, then follow x ( r R ) = Labels(x) · |v⟩ (27) A string x ∈ {0, 1} ℓ can be a prefix of only one node; consequently, the prefix sets of two nodes on the same level are disjoint, i.e., pre(v p )∩pre(v q ) = ∅ for p ̸ = q. Moreover, each string x is a prefix of some node on level ℓ (namely, simply the node at which the follow x ( r R ) subroutine arrives). Say that the ℓ-th level contains m nodes, v 1 , . . . , v m . Therefore, the sets pre(v 1 ), . . . , pre(v m ) partition the set {0, 1} ℓ . Therefore, by putting Eq. 27 and Eq. 26 together, we can express the root node's state in terms of the nodes v 1 , . . . , v m on level ℓ: We now show that each term x∈pre(v k ) |x⟩ ⊗ Labels(x) · |v k ⟩ contributes a non-zero vector. It then follows that the state has computational basis rank at least m, since these terms are vectors with pairwise disjoint support, since the sets pre(v k ) are pairwise disjoint. Specifically, we show that each node has at least one prefix x such that Labels(x) · |v⟩ is not the all-zero vector. In principle, this can fail in one of three ways: either v has no prefixes, or all prefixes x ∈ pre(v k ) have Labels(x) = 0 because the path contains an edge labeled with the 0 LIM, or the node v represents the all-zero vector (i.e., |v⟩ = ⃗ 0). First, we note that each node has at least one prefix, since each node is reachable from the root, as a LIMDD is a connected graph. Second, due to the zero edges rule (see Def. 5), for any node, at least one of its prefixes has only non-zero LIMs on the edges. Namely, each node v has at least one incoming edge labeled with a non-zero LIM, since, if it has an incoming edge from node w labeled with 0, then this must be the high edge of w and by the zero edges rule the low edge of w must also point to v and moreover must be labeled with I by the low factoring rule. Together, via a simple inductive argument, there must be at least one non-zero path from v to the root. Lastly, no node represents the all-zero vector, due to the low factoring rule (in Def. 5). Namely, if v is a node, then by the low factoring rule, the low edge has label I. Therefore, if this edge points to node v 0 , and the high edge is v1 A , then the node v represents |v⟩ = |0⟩ |v 0 ⟩ + |1⟩ A |v 1 ⟩ with possibly A = 0, so, if |v 0 ⟩ ̸ = ⃗ 0, then |v⟩ ̸ = ⃗ 0. An argument by induction now shows that no node in the reduced LIMDD represents the all-zero vector.
Therefore, each node has at least one prefix x such that follow x ( r R ) ̸ = ⃗ 0. We conclude that the equation above contains at least m non-zero contributions. Hence m ≤ χ comp (R |r⟩), at any level 0 ≤ ℓ ≤ n. Fig. 16 (with n = 2 c ) has χ comp (|ψ⟩) ≤ n.

Lemma 10. Each intermediate state in the circuit in
Proof. The initial state is |ψ 0 ⟩ = |0⟩ ⊗n , which is a computational basis state, so χ comp (ψ 0 ) = 1. The first log n gates are Hadamard gates, which produce the state |ψ log n ⟩ = H ⊗ log n ⊗ I n−log n |0⟩ = |+⟩ ⊗ log n ⊗ |0⟩ ⊗n−log n = 1 √ n This is a superposition of n computational basis states, so we have χ comp (|ψ log n ⟩) = n. All subsequent gates are controlled-X gates; these gates permute the computational basis states, but they do not increase their number.
Lemma 11. The reduced LIMDD of each intermediate state in the circuit in Fig. 16 has polynomial size.
Proof. By Lemma 9, the width of a LIMDD representing |φ⟩ is at most χ comp (|φ⟩) at any level.
Since there are n levels, the total size is at most nχ comp (|φ⟩). By Lemma 10, the intermediate states in question have polynomial χ comp , so the result follows.

Lemma 12.
The LIMDD of each gate in the circuit in Fig. 16 (with n = 2 c ) has polynomial size.
Lemma 16. Each call to ApplyGate(U t , |ψ t ⟩) runs in polynomial time, for any gate U t in the circuit in Fig. 16 (with n = 2 c ).
Proof. If U t is a Hadamard gate, then LIMDDs can apply this in polynomial time by Th. 6, since |ψ t ⟩ is a stabilizer state. Otherwise, U t is one of the controlled-X gates. In this case there are a polynomial number of recursive calls to ApplyGate, by Lemma 13. Each recursive call to ApplyGate makes two calls to Add(|α⟩ , |β⟩), where both α and β are states with polynomial subfunction rank, by Lemma 15. By Lemma 14, these calls to Add all complete in time polynomial in the subfunction rank of its arguments.
Corollary 16. The circuit in Fig. 16 (with n = 2 c ) can be simulated by LIMDDs in polynomial time.

G Numerical search for the stabilizer rank of Dicke states
Given the separation between the Clifford + T simulator -a specific stabilizer-rank based simulatorand Pauli-LIMDDs, it would be highly interesting to theoretically compare Pauli-LIMDDs and general stabilizer-rank simulation. However, proving an exponential separation would require us to find a family of states for which we can prove its stabilizer rank scales exponentially, which is a major open problem. Instead, we here take the first steps towards a numerical comparison by choosing a family of circuits which Pauli-LIMDDs can efficiently simulate and using Bravyi et al.'s heuristic algorithm for searching the stabilizer rank of the circuits' output states [12]. If the stabilizer rank is very high (specifically, if it grows superpolynomially in the number of qubits), then we have achieved the goal of showing a separation. We cannot use W states for showing this separation because the n-qubit W state |W n ⟩ has linear stabilizer rank, since it is a superposition of only n computational basis states. Instead we focus on their generalization, Dicke states |D n w ⟩, which are equal superpositions of all n-qubit computational-basis status with Hamming weight w (note |W n ⟩ = |D n 1 ⟩), We implemented the algorithm by Bravyi et al.: see [83] for our open-source implementation. Unfortunately, the algorithm's runtime grows significantly in practice, which we believe is due to the fact that it acts on sets of quantum state vectors, which are exponentially large in the number of qubits. Our implementation allowed us to go to at most 9 qubits using the SURF supercomputing cluster. We believe this is a limitation of the algorithm and not of our implementation, since Bravyi et al. do not report beyond 6 qubits while Calpin uses the same algorithm and reaches at most 10 qubits [84]. Table 2 shows the heuristically found stabilizer ranks of Dicke states with our implementation. Although we observe the maximum found rank over w to grow quickly in n, the feasible regime (i.e. up to 9 qubits) is too small to draw a firm conclusion on the stabilizer ranks' scaling.