Efficient classical algorithms for simulating symmetric quantum systems

In light of recently proposed quantum algorithms that incorporate symmetries in the hope of quantum advantage, we show that with symmetries that are restrictive enough, classical algorithms can efficiently emulate their quantum counterparts given certain classical descriptions of the input. Specifically, we give classical algorithms that calculate ground states and time-evolved expectation values for permutation-invariant Hamiltonians specified in the symmetrized Pauli basis with runtimes polynomial in the system size. We use tensor-network methods to transform symmetry-equivariant operators to the block-diagonal Schur basis that is of polynomial size, and then perform exact matrix multiplication or diagonalization in this basis. These methods are adaptable to a wide range of input and output states including those prescribed in the Schur basis, as matrix product states, or as arbitrary quantum states when given the power to apply low depth circuits and single qubit measurements.


Introduction
In the physical sciences, symmetries are useful for simplifying difficult computational tasks by reducing the effective degrees of freedom of the problem.This general principle has been used to find exact solutions to many problems, such as integrable systems [1], topological fixed-point models [2], or conformal field theories [3].There has been a hope that similar symmetries may enable the efficiency of quantum algorithms for simulating or finding the ground state of a symmetric Hamiltonian.Indeed, it is known that there exist theoretical guarantees for quantum algorithms for finding the ground state [4] and fast-forwarding quantum dynamics [5] of Hamiltonians which commute under the action of the symmetric group S n on qubits.It has also numerically been shown that quantum algorithms are capable of finding the ground state of certain integrable systems [6,7] even when the symmetry is not explicitly given to the quantum algorithm a priori.Furthermore, prior work used Lie algebraic methods to efficiently classically simulate operators restricted to a Lie algebra whose dimension is polynomially large (independent of the potentially exponentially large Hilbert space dimension) [8,9].Quantum machine learning models that are symmetry equivariant are also believed to be more efficiently trainable than their general counterparts [10,11,12,13,14,15].These quantum models are partly inspired by classical neural network models that have enjoyed much recent success [16,17,18].However, restricting quantum algorithms to problems obeying many symmetries potentially allows for efficient classical algorithms which also take advantage of these same symmetries.This raises the natural question: are there efficient classical algorithms capable of performing these tasks?
This is what we investigate here.Intuitively, we show that problems constrained by large symmetry groups yield efficient classical algorithms for computing many properties of interest, as il-lustrated in Figure 1(a).We first broadly discuss very general classical algorithms for finding the ground state and energy of Hamiltonians constrained by many symmetries.We also consider the problem of simulating dynamics under symmetric Hamiltonians.We then specialize to the case of systems invariant under permutations of its qubits.Finally, we dequantize an algorithm for performing binary classification problems using permutation-invariant systems on qubits.

Motivation and setting
Our algorithms are motivated by the fact that symmetries significantly reduce the number of degrees of freedom for a given problem.For example, consider the classical setting of Boolean functions which are invariant under arbitrary permutations of the bits.Such functions are defined up to the orbits of the Boolean cube with respect to permutations of the bits.For a Boolean function on n bits, there are n + 1 orbits indexed by the Hamming weight of the bitstrings.Therefore, any problem over symmetric Boolean functions need only consider a given element of each of the n + 1 orbits to cover all possible degrees of freedom.As we will later show, the symmetric group acting over n qubits similarly reduces systems to O(n 3 ) degrees of freedom.By considering the algebra of the symmetric group on the symmetric subspace of linear operators, we will show that all these degrees of freedom can be manipulated solely through classical computation.
Before proceeding, we need to introduce important functions and definitions that will be used in this setting.We first formalize the notion of symmetry by speaking of invariant operators, defined in the following way: Definition 1 (Invariant operator).Given a compact group G with unitary representation R : G → U (N ), a linear operator H : (1) Note that any invariant operator is also an equivariant operator [13] in the sense that it commutes with the representation of the group.
Any operator can be projected onto the symmetric subspace induced by R (G) using the twirling superoperator Re R (more commonly known as the Reynold's operator in invariant theory) [19,20], which maps any operator onto the set of equivariant operators: Invariant subspaces of a larger Hilbert space can be identified by performing an isotypic decomposition of the representation of a group.As an example, in the case of systems invariant under permutations of the qudits, the Schur decomposition maps the computational basis into blocks of invariant subspaces.We graphically visualize this phenomenon in Figure 1(b) and provide further details in Appendix A.
Throughout this study, runtime complexities are denoted as a function of the matrix multiplication exponent ω.The best known upper bound is currently ω = 2.37188 [21], which implies asymptotic runtimes of O(n ω+α ) for any α > 0 for stably performing common linear algebraic routines including eigendecomposition, SVD, and matrix inversion [22].

Background for general symmetry groups
In this Section, we discuss the general problem of finding the ground state energy, ground state, and performing time evolution under a Hamiltonian H on a finite-dimensional Hilbert space which is invariant under some representation R of a symmetry group G.Some of these problems have previously been considered in a variety of special cases [23,8,9,24,25].Here, we provide background on the general techniques used in solving these problems using a unified framework.We also give an equivalent formulation in Appendix B in the language of tensor networks.
We consider the *-subalgebra of operators invariant under R to which H belongs.We think of this subalgebra as a standalone *-algebra X, such that the embedding of X into the full operator algebra defines a representation A of X.The practical relevance of these considerations is when the size of the total Hilbert space grows exponentially with some scaling parameter n.The paradigmatic example is the Hilbert space of n qubits.If there are enough symmetries, it can happen that the dimension N (n) of X only grows polynomially with n, in which case many properties can be calculated efficiently [8].This restriction of X to a lower-dimensional subspace may more generally happen beyond systems symmetric in the sense of Definition 1. Due to this, for now we focus explicitly on X and A, rather than on G and its representation R; we will discuss the connection of our results to G and R more specifically at the end of this Section.
For the various algorithms we now consider, we will assume that different properties of X and A are known.In the course of proving Lemma 2, we give an algorithm for finding the ground state of H. Every finite-dimensional *-algebra is isomorphic to a direct sum of irreducible blocks, and every representation is isomorphic to a direct sum of irreducible representations.That is, there is a block-diagonal orthonormal basis |λ, q λ , p λ ⟩ of the vector space acted upon by A, where λ labels an irrep of X, q λ labels a basis vector internal to λ, and p λ labels a basis vector in the multiplicity vector space of λ; this is the basis in which we compute the ground state of H (for some arbitrary and fixed multiplicity label p λ0 ).To prove our theorem, we assume knowledge of the matrix elements: In the following lemmas, we assume we are given the Hamiltonian H ∈ A(X) as h ∈ X expressed in the preferred basis, such that Though a given H may not be classically described in this exact basis, transformations into this basis are typically efficient when the number of irreps and irrep dimensions are polynomially sized.For example, a Pauli decomposition can be used to transform operators into the Pauli basis studied later in the permutation invariant setting.We first focus on the case when we are interested in finding the ground state of some representation of such a Hamiltonian, in a basis where the action of the representation is known.
Lemma 2 (Finding the ground state of symmetric Hamiltonians).Consider a subalgebra A(X), and assume that the matrix elements F i,λ q λ ,q ′ λ are known as discussed above.Then the ground state energy and ground state of H in the |λ, q λ , p λ0 ⟩ basis can be found in time O n 2 λ n 4 q , where n λ are the number of irreps of X and n q the maximum irrep dimension.
Proof.For each λ, we compute the operator with indices: ĥλ This takes time O dim (X) 2 = O n 2 λ n 4 q .Note that in the |λ, q λ , p λ ⟩ basis, H has a block diagonal form.Furthermore, as p λ labels isomorphic copies of irreps, we can find the ground state by fixing p λ0 WLOG.Namely, the ground state energy is given by: Then, for any p, is a ground state in the |λ, q λ , p λ ⟩ basis.The dimension of ĥλ is dim X (λ) × dim X (λ), and thus and thus diagonalizing it takes time O n ω q .As we need to do this for O(n λ ) many values of λ, the total runtime for finding |ψ * ⟩ is O n λ n ω q .So the runtime of the computation in Eq. ( 5) dominates, and the overall runtime is O n 2 λ n 4 q .
We now show that the dynamics of an initial state ρ under equivariant unitaries can be classically simulated even if ρ ̸ = A (X).This generalizes similar approaches taken in [23] and [25] for both free fermions and fermionic systems with particle number symmetry.The Lemma as written assumes the initial state is given in a basis where the multiplicity indices have been traced out; in Section 4 we will give concrete examples on how one can compute this efficiently under a variety of input models.

Let
be a projective measurement and a unitary operator.Let ρ be a quantum state, and let ρ (λ,q λ ,p λ ),(λ ′ ,q ′ λ ,p ′ λ ) be the coefficients of the state in the |λ, q λ , p λ ⟩ basis.Assume we are given the reduced density matrix Then, Proof.After going to the |λ, q λ , p λ ⟩ basis, O and U become block diagonal, so we can compute tr OU ρU † by restricting to fixed p λ0 .To this end, we compute õλ and ũλ Given the matrix elements The right-hand side can be computed in time O n λ n ω q , as it is given by the matrix multiplication of n λ blocks each of size at most n q × n q .
In the above considerations, the group G and representation R do not directly enter.In practice, however, we might want to start with those two.The irreps of X are in one-to-one correspondence with those of R. By a simple corollary of the von Neumann bicommutant theorem, multiplicities of irreps in X are the dimensions of the irreps of G, and the dimensions of the irreps of G are the multiplicities of the irreps of A. We thus have that: Thus, the problems discussed above become classically tractable if the number n λ of irreps of G with non-zero multiplicity in R, as well as the maximum multiplicity n q of an irrep λ in R, are both polynomially small.

Permutation invariance on qubits
We will now apply the general algorithms of Section 3 to the case where G is given by S n and R is the representation on n qubits acting by permutations, A straightforward basis for the algebra of invariant operators can be obtained by applying the Reynold's operator in Eq. (2) to the Pauli basis.
Normalizing such that all operators A i are sums of unit norm Pauli terms, we obtain (21) where σ denote Pauli operators, and σ 1 = id 2 .
Enumerating such 4-tuples shows that the dimension of the algebra X is of order O(n 3 ).Therefore, the algorithms of Section 3 would reduce the task of finding the ground state (energy) of a Hamiltonian to a polynomial runtime in n, compared to exact diagonalization of the full Hamiltonian with a runtime exponential in n.Let us start with computing the ground state energy and ground state of a permutation-invariant Hamiltonian H given as a vector h i in the basis of symmetrized Pauli monomials.
Theorem 4 (Efficient classical computation of the ground state of a permutation-invariant Hamiltonian).The ground state and ground state energy of a permutation-invariant Hamiltonian on n qubits, given as h i in the basis of symmetrized Pauli monomials above, can be computed in time O(n 7 ) via Lemma 2. When applied to a single Hamiltonian for each system size n, the bottleneck of the algorithm is the computation of the matrix elements F i,λ q λ ,q ′ λ that takes O(n 7 ), but is independent on h i .Once the matrix elements are computed, application of Lemma 2 only takes time O n 6 .The output ground state is given in the |λ, q λ , p λ0 ⟩ Schur basis for a fixed p λ0 .
Proof.The block-diagonal basis of the permutation representation R of G = S n is known as Schur basis, which we will review in Appendix A. From this it is easy to see that the number n λ of irreps with nonzero multiplicity is n λ = O (n), and the maximal dimension n q of these irreps is n q = O (n).Direct application of the algorithm in Lemma 2, then leads to a runtime of O n 6 for computing the ground state of S n -equivariant Hamiltonians in the Schur basis.All that remains is the computation of the O(n 6 ) many matrix elements F i,λ q λ ,q ′ λ for a given n.To this end, we make use of the fact that the Schur basis |λ, q λ , p λ0 ⟩ as well as the representation A both have matrix product operator (MPO) representations.We will here only give a sketch of the tensor-network computation, and refer to Appendix C for the details.The representations are of the form of bond dimension O(n), for each fixed value of λ, and of bond dimension O(n 3 ).With this, the overlap in Eq. ( 3) in the computation of the matrix elements of F can be expressed as a tensor-network diagram: Contracting this tensor network from the left to the right (for a fixed λ and ⃗ p λ0 ) is the same as a sequence of n−1 vector-matrix multiplications of dimension O(n 5 ).This naive contraction already yields a polynomial runtime, namely O(n 10 ) for each of O(n) contractions for each of O(n) values of λ, yielding O(n 12 ) in total.However, in Appendix C, we make use of a band-diagonal structure of the matrices to reduce the cost of a single contraction from O(n 10 ) to O(n 5 ), yielding O(n 7 ) in total.In Appendix E, we also give an alternative combinatorial way of calculating the matrix entries of F which has a slightly worse runtime of O(n 10 ) in total but is better parallelizable.
Remark 5.The output of the classical algorithm in Theorem 4 is the ground state in the Schur basis, which might not always be a useful description.However, from this output, we can be efficiently construct the ground state on n qubits with a quantum computer via the Schur transform [26,27,28].Furthermore, we can construct an efficient classical representation of a ground state as a low-bond-dimension MPS as we discuss in Section 4.2 and Appendix D.
We now consider an application of Lemma 3 to the symmetric group case.
Theorem 6 (Efficient classical simulation of permutation-equivariant dynamics).Consider the classical evaluation of ℓ (ρ) as in Eq. ( 14) using the same assumptions as Lemma 3 with symmetry group S n .ℓ (ρ) can be calculated in time O n 7 .
Proof.As for the proof of Theorem 4, we can compute the matrix elements F i,λ q λ ,q ′ λ in time O(n 7 ).The remainder is straight-forward application of Lemma 3 with n λ , n q = O(n).
Theorems 4 and 6 provide efficient end-to-end classical algorithms if the output ground state, or the input state (for computing the dynamics) are given in the Schur basis.Prior work has also shown similar results for Dicke states [29,24].It may be desirable and more natural to process inputs and outputs of the algorithms that do not explicitly depend on the Schur basis.In the following two subsections, we will discuss the algorithms for two different input and output formats, namely first as a quantum state, and second as a matrix product state.These results are summarized in Table 1 4

.1 Input and output as Quantum State
In this subsection, we will describe how Theorems 4 and 6 can be applied if the input or output is directly given as a quantum state on n qubits.Of course, this requires applying some quantum gates, measurements, and state preparations.However, the necessary quantum computation is of shallow depth and its function is merely to prepare the state or to measure a classical shadow of the state, whereas the core of the computation is classical.
Theorem 7 (Efficient classical preparation of the ground state of a permutation-invariant Hamiltonian).Given a Hamiltonian in the symmetrized Pauli basis as in Theorem 4, we can prepare one of its ground states as a quantum state to an accuracy ϵ in classical runtime O n 7 using a quantum circuit of depth Õ n poly log ϵ −1 .
Proof.We first use Theorem 4 to get a representation of the ground state in the |λ, q λ , p λ ⟩-basis, given by λ min and |ψ * ⟩.We then prepare the quantum state for some p λ min 0 of choice.This preparation takes time O (n).The quantum ground state is then obtained by applying V STO to the above state, which can be implemented in time Õ n poly log ϵ −1 on a quantum computer [27].
Theorem 8 (Efficient classical simulation of permutation-equivariant dynamics on a given quantum state).Consider the classical evaluation of ℓ (ρ) as in Eq. ( 14) using the same assumptions as Lemma 3 with symmetry group S n , except instead of ρ one is given ρ as a quantum state.Then, ℓ (ρ) can be estimated to additive error ϵ with probability Proof.This follows from using the Schur transform [26,27] to prepare ρ (as defined in Eq. ( 13)) from ρ, which can be done in time Õ n poly log ϵ −1 on a quantum computer.Note that ρ is a log 2 n 2 -qubit state as it only acts on the registers labeled by irreps λ and their basis vectors q λ , i.e., the multiplicity register p λ has been traced out.Given a source of ρ, then, a quantum computer can efficiently use classical shadow techniques [30] to output a classical description of ρ capable of estimating ℓ (ρ) to addi-  [24,29].
Table 1: Summary of runtimes and results for finding the ground state and simulating permutation invariant Hamiltonians on inputs of different forms.Throughout, n refers to the number of qubits, δ is the probability of success and ϵ is the error of the outcome.
tive error ϵ with probability 1 − δ using samples of the state.Here, ∥•∥ shadow is the shadow norm, and õ and ũ are defined as in Eqs. ( 15) and ( 16), respectively.As ũ † õũ is on log 2 n 2 qubits we have the general bound [30] ũ † õũ where ∥•∥ ∞ is the operator norm.We have then reduced the problem to a purely classical one, for which the classical runtime is given by Theorem 6.
The sample complexity of this procedure could potentially be improved to Õ(∥O∥ 2 ∞ ϵ −2 n 3 log δ −1 ) as ρ only has n 3 degrees of freedom.However, the block diagonal structure over irreps is lost when transforming to the bitstring encoding |λ⟩ |q⟩ |p⟩ via the Schur transform, and thus we arrive at the sample complexity given.

Input and Output as Matrix Product States
The previous subsection implemented algorithms with quantum states as outputs and inputs.In that particular setting, the readout of a quantum state requires some form of quantum computation of low depth.Here we describe alternative algorithms where we give the inputs and outputs purely classically as low-bond dimension MPS, which are discussed in more detail in Appendix D. Let us start by giving the output of Theorem 4 as an MPS.

Theorem 9. The output of Theorem 4 can be given purely classically as an MPS of bond dimension O(n).
Proof.The output of Theorem 4 is a value of λ and a vector ρ q λ .An MPO representation of the ground state can be obtained by contracting ρ with the MPO for the Schur basis in Eq. ( 22), As we mentioned around Eq. ( 22), the bond dimension of this MPO is O(n).We consider this MPS representation in more detail in Appendix D.
Next, let us consider Theorem 6 with an MPS as classical input.

Theorem 10. Consider a state ψ represented as an MPS of bond dimension χ. The time evolved expectation value ℓ (ρ) with ρ = |ψ⟩ ⟨ψ| as in Theorem 6 can be computed classically in runtime
Proof.Denoting the MPS as tensor-network diagram, the reduced density matrix ρλ as defined in Eq. ( 13) is given by the overlap with twice Eq. ( 22): The naive contraction of this tensor network from left to right would need runtime O(n 8 χ 4 ).However, using that the MPO in the middle is band diagonal, the runtime reduces to O(n 4 χ 4 ).We discuss this contraction in more detail in Appendix D. After we have obtained ρ, we can proceed as in Theorem 6.
Remark 11.If the matrix F in Theorem 4 is pre-computed, then this runtime is reduced to O(χ ω n 3 + n 6 ).

Applications to machine learning
Though it is widely believed that general quantum machine learning models are more expressive than their classical counterparts [31], it is known that generic quantum machine learning models are untrainable [32,33,34,7,35,36,37,11].This gives a strong motivation for constructing quantum machine learning models from symmetryequivariant time dynamics [13] which are believed to be efficiently trainable [36,37].This was formally proven for S n -equivariant models in Ref. [4].
As a specific application of our results, we now consider one of the learning problems for which a variational quantum algorithm was given in Ref. [4].We emphasize that here, just as in Lemma 3, we do not require that the input states ρ i respect the symmetries of the model.For simplicity we assume the quantum input setting of Theorem 8, but a similar theorem can be given for MPS inputs.

Corollary 12 (Efficient classical simulation of permutation-invariant models). Consider a binary classification problem with labels y
where ℓ θ (ρ i ) is as in Eq. ( 14) with a θ-dependent U and otherwise the same assumptions as in Theorem 8. L can be estimated to additive error ϵ at (33) with total probability of success at least 1 − δ.
Proof.This follows immediately from Theorem 8 with δ → δ P by the union bound.
Corollary 12 implies that the loss of these models can be estimated completely classically when the states ρ i are given as certain classical shadows or matrix product state descriptions, even if they do not respect the symmetries of the model.As a point of comparison, consider the runtime of using a variational quantum algorithm to perform this binary classification task.Assume the variational circuits are of depth Ω n 3 as required in Theorem 3 of Ref. [4] to ensure convergence.Then-taking Ω ∥O∥ 2 ∞ ϵ −2 samples for each measurement to achieve an overall shot noise of O (ϵ)-this yields an overall runtime of Ω M P ∥O∥ 2 ∞ ϵ −2 n 3 .For P sufficiently large, compare this to the time O M P n ω+1 algorithm found for the classical algorithm where, even if quantum states are given as input, a classical shadow representation can be measured in quantum depth only Õ n poly log ϵ −1 .Unlike the quantum algorithm, this algorithm can be parallelized over irreps (i.e., over n λ ) easily, giving an effective runtime O (M P n ω ) = o M P n 3 .Even for P small, given many QPUs capable of running depth ∼ n quantum circuits, the classical algorithm parallelizes more effectively than the quantum algorithm as the required shadow tomography can be straightforwardly parallelized over shots.

Conclusion
We have specified a general framework for classically simulating highly symmetric quantum systems.Specializing to the symmetric group, we showed that these techniques yield an efficient classical algorithm for finding the ground state of quantum systems obeying an S n symmetry, evaluating dynamics, and simulating S n -equivariant quantum machine learning models.We hope that this framework sets the foundations for the future study of classical characterizations of symmetric quantum systems.Potential applications include the physical analysis of symmetric quantum systems [9,38,39,40] and implementation of algorithmic primitives for measuring entanglement [41], performing measurements with distributed sensors [42], and learning or testing quantum states [43,44,45].
A major remaining open question is to what extent the classical algorithms we have proposed for the case of qubit permutation invariance carry over to the more general highly symmetric manybody quantum systems.The approach we presented in Section 3 already works for general symmetry groups and representations.However, the computation of the matrix elements of F , as well as the representations of ground and initial states as matrix product states or quantum states was only described for the case of qubit permutation invariance.The perhaps most obvious generalization to other symmetry groups of representations is to consider permutation-invariant systems consisting of n d-dimensional qudits instead of qubits.We sketch this generalization in Appendix G.As it turns out, all algorithms we have proposed generalize with polynomial runtimes, however, the exponents become very large for large d.Specifically, the computation of the matrix elements F , which is the bottleneck for many of our algorithms, takes time O(n 2d 2 −1 ) in this case.
It should be noted that the rather large exponents of the polynomial runtimes (O(n 7 ) for qubits and O(n 2d 2 −1 ) for qudits) do not come from the method we use, but from the way we phrase the problem itself: Specifying a symmetric Hamiltonian alone needs O(n 3 ) data for qubits, and O(n d 2 −1 ) data for qudits, so when measuring the runtime in the size of the input data, the scaling is just a little more than quadratic.For more efficient algorithms we would need to restrict the input to a subset of permutation-invariant Hamiltonians/operators, for example to Hamiltonians that are sparse in the symmetrized Pauli basis, or k-local Hamiltonians for constant k.
Our results indicate that care must be taken when constructing quantum algorithms to solve problems described by strongly symmetric quantum systems to ensure no equally efficient classical algorithm exists, as was demonstrated in Corollary 12.One approach to avoid classical simulation methods is to consider smaller symmetry groups as illustrated in Figure 1(a).Indeed, complexity theory arguments [46,47] suggest that solving the ground state problem for systems with sufficiently small symmetry groups-such as translational invariance-is expected to be difficult.It is also reasonable to expect quantum algorithms for strongly symmetric quantum systems to offer polynomial speedups in appropriately selected scenarios, such as has been demonstrated in Ref. [48].Notably, the tensor contraction algorithms outlined in Theorem 4 currently have a runtime of O(n 7 ), which in terms of the number of qubits n is not as optimal as quantum algorithms.We hope this work sparks further investigation into how problems involving symmetric quantum systems may produce a practical use case for quantum algorithms.

A The Schur basis
In the presence of permutation invariance, the action of operations can be fully understood by analyzing a much smaller subspace of the larger Hilbert space.To precisely understand the form of that subspace, we turn to the Schur-Weyl decomposition of n qubits into subspaces corresponding to irreducible representations of the symmetric and unitary groups labeled by Young diagrams.Schur-Weyl duality offers a means to perform this decomposition by considering the natural representations of the permutation group and n-fold unitary group acting on n qubits [27,49].To describe the Schur basis and the resulting Schur transform, first we note the natural action of a permutation operation R(π) acting on qubits: as in the main text.
Similarly, a unitary U ∈ U(2) acting as the n-fold product Q(U ) takes the form Schur-Weyl duality takes advantage of the fact that Q(•) and R(•) are each others' commutants, stating that the subspace of (C 2 ) ⊗n decomposes as where λ runs over the set of partitions of n into at most two elements, and ρ λ (•) and σ λ (•) are irreducible representations of the unitary group U(2) and the symmetric group S n , respectively.Note that irreps of both of these groups are indexed by partitions.More generally, for the space (C d ) ⊗n of n qudits of dimension d, the λ would span over partitions of n into at most d elements.Partitions can equivalently be enumerated by Young diagrams.For example for the setting of 4 qubits, we have the 3 Young diagrams below that appear in the decomposition above: A consequence of the above is that there exists a basis indexed by |λ, q λ , p λ ⟩ called the Schur basis where the actions of Q(•) and R(•) are separated [27]: where we have implicitly projected onto the subspace indexed by λ.Here, ρ λ (U ) and σ λ (π) act only on the q λ and p λ space, respectively.ρ λ (U ) and σ λ (π) are respectively the linear transformations corresponding to the irreducible representations of U d and S n for the irreducible representation indexed by λ.The above also presents a useful fact about permutation invariance.Namely, such an operation will act invariantly on the permutation register |p λ ⟩ thus significantly reducing the degrees of freedom of a problem.The Schur transform U Sch is a unitary transformation that acts as a change of basis from the computational to the Schur basis described above.The Schur transform can be efficiently implemented on a quantum computer running in time Õ (npoly(d, log n, 1/ϵ)) for error ϵ on qudit systems of local dimension d [27].We follow the notation of Ref. [27]: As noted in the main text, the total degrees of freedom reduces to n+3 The above is also enumerated by the tetrahedral numbers [50].
To expand and manipulate individual basis states indexed by the |q λ ⟩ register, one can use the Young symmetrizer Π p λ to project onto an explicit basis for each λ [27,51].Here, p λ is a particular Young tableau for the Young diagram λ.The Young symmetrizer projects onto a subspace isomorphic to the subspace spanned by |q λ ⟩: where Row(p λ ) and Col(p λ ) are the set of permutations which permute integers within only rows and columns of the Young tableau p λ , respectively [27,51].An example of the basis found via application of the Young symmetrizer is shown in Figure 2. Throughout our study, we consider the Young tableau formed by filling entries in order first column-wise and then row-wise to be the "canonical" basis that we study.As an example, for 4 qubits, there are the following Young tableaus in our "canonical" basis: B General formalism using tensor-network diagrams In this and the two subsequent appendices, we discuss the tensor-network based methods used in Section 4 of the main text in more detail.As a preparation, we will here recap the general formalism from Section 3 in a tensor-network language.
An associative algebra on a vector space B is determined by a linear multiplication map B ⊗ B → B, whose structure coefficients form a 3-index tensor, Here, a and b label basis vectors at the input of the multiplication map, and c is at the output.In our setting, there are three different relevant algebras, namely X, the group algebra of S n , and the group algebra of SU (2).A representation of the algebra on a vector space V is determined by a linear map V ⊗ B → V , whose structure coefficients are again a 3-index tensor, As depicted, we will often use different line styles for different vector spaces such as B and V ; here we made the V -indices thick.Again, there are three different relevant representations: The representation A of X, the representation R of S n , and the representation ( 1 2 ) ⊗n of SU (2), all of which act on the n-qubit Hilbert space.The statement that a Hamiltonian (or an arbitrary operator) H is invariant under the symmetry representation R can be written as The commutativity of the representation A of X with the representation R of S n reads Since by construction A spans the whole commutant of R, H can be parametrized by some X-element h as Eq. (45) then follows through Eq. (46).Next, let us describe how to obtain the ground state energy and ground state via Theorem 4 from the main text using the matrix F .To this end, we use a central theorem of representation theory [52]: 1) Every finite-dimensional *-algebra (which includes group algebras and the algebra X) is isomorphic to a direct sum of full matrix algebras, and 2) every (unitary) representation is isomorphic to a direct sum of irreducible representations, which are projections onto one of the matrix-algebra summands.The two isomorphisms, which we call O and S, respectively, are unitary matrices, Here, the index a labels basis elements of the algebra, the irrep index λ labels irreducible representations of the algebra, internal indices like q and q ′ label basis vectors within an irreducible representation λ, and the multiplicity index p labels the different copies of a fixed irreducible representation λ, if λ occurs multiple times in the representation.Note that indices with different bond dimensions were drawn with different line styles, such as thick for irrep, dotted for internal, and ticked for multiplicity indices.Note that the bond dimension of the internal and multiplicity indices can depend on the value λ of the nearby irrep index, so the diagrams are not classical tensor-network notation but a slightly generalized version thereof.In this notation, the isomorphism can be written as The small black dot on both right-hand sides is a δ-tensor, which is 1 if the irrep labels at all indices are equal, and 0 otherwise.The free lines represent identity matrices.The three identity matrices on the right-hand side of the left equation are matrix multiplication: Each matrix is represented by a double-index, and the second index of the first matrix is contracted with the first index of the second matrix.
We now use the isomorphism O of the symmetry group algebra, and the isomorphism S of its representation.Let us quickly recall what this means for the symmetry group S n and its qubit permutation representation R, even though the discussion here in principle works for general symmetries.As described in the first appendix, the irreps λ are Young diagrams and the internal indices label Young tableaux p λ .The dimension of the multiplicity index q is only non-zero for Young diagrams with one or two rows.In this case, we can label the irrep by a half-integer 0 ≤ λ ≤ n 2 corresponding to the Young diagram with two rows of lengths (n/2 + λ, n/2 − λ), but also to a SU (2) representation.We now let H denote the Hamiltonian (or an arbitrary invariant operator) H conjugated with S, Eq. (45) in this block-diagonal basis then becomes By contracting the left two open internal indices on the left-hand side with copies of a normalized vector v and applying a δ-tensor to the left two irrep indices, we find where Note that v is actually a collection of vectors, depending on the value λ of the irrep.Plugging in Eq. (50) and Eq.(47), we find with F is in fact the isomorphism O for the algebra X.The tensor h can be interpreted as a collection of matrices h λ for different irreps λ.As can be easily seen from Eq. (52), the ground state energy of H is the minimum over λ of the ground state energies of h λ .Furthermore, the ground states are given by where s min is the ground state of h λ min , and v is an arbitrary vector.Of course, there cannot be an efficient classical description which works for all ground states, since the dimension of v (and therefore the ground state degeneracy) is exponentially large in n.However, as we will later give a low-bond dimension MPS description for a full basis of ground states in Eq. (67).
Finally, let us consider the time evolution of a pure state |ψ⟩ under an invariant unitary U .More precisely, we want to calculate the time-evolved expectation value ⟨ψ| U OU † |ψ⟩ of an invariant operator O.We find As usual, we cannot efficiently contract this tensor network, since the internal p-index and the n-qubit indices have an exponential bond dimension in n.However, once we have the reduced density matrix the remaining contraction is fast, dominated by the contraction between F and u or o which takes time O(n 6 ).If ψ is given as many copies of the quantum state, then ρ can be obtained by tomography on the λ and q-subspace, as described in the main text.If ψ is given classically as an MPS, then we show later around Eq. (69) how time evolution can be performed efficiently.Note that time evolution can also be performed if U is defined through an invariant Hamiltonian H = A(h) via U = e itH .In this case, we calculate h as in Eq. (54), exponentiate the individual h λ , and plug e it h into Eq.(57) instead of u and F .

C Tensor-network method for calculation of the entries of F
In this Appendix, we show how to efficiently calculate the explicit matrix F defined in Eq. (55).We cannot directly efficiently contract the tensor network on the right-hand side since the two indices shared between A and S have bond dimension 2 n .In order to make contraction efficient, we will write A and S as MPOs and then contract the tensor network horizontally from qubit to qubit.Let us start by writing A as an MPO, The global prefactor on the right-hand side was not included in the definition of A in the main text, but is necessary for the normalization in Eq. (90).The arrows over the indices on the left-hand side indicate that they correspond to collections of indices on the right-hand side.We have also used a triple-line for the X-index ⃗ i, to indicate that it is labelled by a triple (i x , i y , i z ).The bond dimension of all three horizontal indices is n, and the tensor with the + label is defined as In order to write S as an MPO, we make use of the fact that the commutant of the representation R of S n is spanned by the representation ( 12 ) ⊗n of SU (2).Thus, the isomorphism S is the same for both those representations, just that the internal and multiplicity indices are exchanged.Now, the representation ( 12 ) ⊗n is a tensor product of on-site representations, and thus an MPO, Note that u labels an element of SU (2), so this is a tensor network with a continuous bond dimension, which might not be directly suitable for practical computation but still makes sense formally.In our case all the r i are the spin-1 2 representation, but we will discuss the following few steps for arbitrary representations.In order to get the isomorphism S for the overall representation (for us ( 1 2 ) ⊗n ), we start by block-diagonalizing the individual r i .We end up with many copies of the isomorphism O for the group algebra of SU (2).Those copies of O need to be pulled through all the δ-tensors.When doing this, the δ-tensor splits up into two copies of a tensor C, C exists since the group algebra together with the δ-tensor forms a Hopf algebra.When writing the bialgebra axiom in the block-diagonal basis, one can see that the left-hand side for a fixed configuration of irrep indices is a projector P between all three left internal indices to all three right internal indices.So C can be defined as an isometry such that CC † = P .The index connecting the two copies of C has a bond dimension which depends on the three irreps, and will be referred to as fusion index.The entries of C are known as the Clebsch-Gordon coefficients.We can now use this equation to pull the isomorphism O through all the δ-tensors in Eq. (61).As a result, we obtain S as For our present situation where the group is SU (2), the Clebsch-Gordon coefficients are well-known and can be computed efficiently up to p bits of precision (i.e., up to additive error exponentially small in p) in time poly (n, p) by the Racah formula [53].Note that the fusion index is always trivial, that is, it has dimension either 0 or 1.All local representations r i are equal to spin-1 2 such that all the local s i can be chosen equal to the identity.With those simplifications, we get, The tensor product of the spin-c 2 representation with the spin- Thus, any two consecutive irreps p i , p i+1 in the sequence p 0 , . . ., p n−1 = λ must differ by ± 1 2 , otherwise the MPO above yields 0. Such a sequence exists if and only if (2λ) mod 2 = n mod 2 and λ ≤ n 2 .In fact, one can directly see the correspondence between such sequences ⃗ p and the Young tableaux for the Young diagram with two rows of lengths (n/2 + λ, n/2 − λ): Starting at an empty diagram, we fill its fields consecutively with the numbers 0, . . ., n − 1.In the ith step, the number i is appended to the first row if p i = p i−1 + 1 2 , and added to the second row if p i = p i−1 − 1 2 .Note that a representation of the Schur-Weyl basis equivalent to this MPO has also been used in Ref. [54], and was originally presented in Ref. [27].
Let us now plug the derived MPO representations in Eq. (64) and Eq.(59) into Eq.(55).For the vector v, we choose a fixed basis vector corresponding to a valid sequence ⃗ p = (p 0 , . . ., p n−1 ) depending on λ, F λ q q ′ ⃗ i The maximal possible value of the p i is O(n), and thus the bond dimension of the horizontal q-indices is O(n) as well.We can already see that contracting the tensor network from the left to the right has polynomial runtime.The contraction becomes even faster by the following considerations.For a fixed λ ′ = λ ± 1 2 , consider the Clebsch-Gordon tensor, as collection of matrices M qq ′ for different values of l.Those matrices are simultaneously constantwidth block-diagonal.The + tensor is also band-diagonal, and the bond dimension of the vertical indices is O(1).Since there are 5 horizontal indices each of bond dimension O(n) and all tensors are band diagonal, adding one column of tensors to the contraction takes time O(n 5 ).Since there are n qubits/contraction steps, as well as O(n) different values of λ, the total runtime for the contraction is O(n 7 ).

D Time evolution and ground state using MPS
In this appendix, discuss the two algorithms from Theorems 9 and 10, which are classically end to end using MPS as input or output, in more detail.Let us start by Theorem 9 giving an MPS description of the ground states of an invariant Hamiltonian by combining Eq. (56) and Eq.(64).A basis of ground states is labelled by valid sequences p 0 , . . ., p n−1 = λ min .For every such sequence, the ground state is given as an MPS of bond dimension O(n), Next, let us discuss Theorem 10, which calculates the time-evolved expectation value giving an initial state ψ as an MPS, of bond dimension χ.We first note that for u representing an invariant unitary U = A(u), U |ψ⟩ is again an MPS of bond dimension χ O(n 3 ) due to the MPO representation of A in Eq. ( 59).An invariant observable O = A(o) can be written as an MPO as well, so the expectation value ⟨ψ| U OU † |ψ⟩ can be evaluated as a product of MPS and MPO from left to right in polynomial time.However, the most efficient way to calculate that expectation value is to build on the method presented around Eq. (58).
The reduced density matrix ρ in Eq. (58) can be computed efficiently using the MPO representation of S in Eq. (64), As usual, this tensor network can be contracted from the left to the right.There are five independent horizontal indices, two of them of bond dimension χ, and three of bond dimension O(n).As argued above, the Clebsch-Gordon coefficients are band-diagonal.This is even true if the irrep indices are not fixed, since they can only change by ± 1 2 .The runtime of adding the ith column of tensors to the contraction is thus dominated by contracting the MPS tensors A i , which takes time O(χ w n 3 ).Since there are n steps, the total runtime is O(χ w n 4 ).

E Alternative combinatorial method for calculation of the entries of F
In Appendix C, we have presented a tensor-network method for calculating the matrix entries F i,λ q λ ,q ′ λ used in the main text to efficiently simulate permutation symmetric Hamiltonians.In this section, we present an alternative method that calculated these matrix elements in polynomial runtime O(n 10 ) using combinatorics.Even though the runtime scaling for this method is slower, it may be more approachable and insightful for people without tensor-network background.
We also like to remark that in this method, the matrix entries are calculated individually, each taking a runtime of O(n 4 ).This might help if the given Hamiltonian is sparse in the symmetrized Pauli basis.In this case it suffices to calculate only the matrix elements for the non-zero Pauli monomials.For example, consider a family of permutation invariant Hamiltonians that is k-local for a constant k.In this case, we only need the matrix elements for a constant number of Pauli monomials and thus in this case (since the Schur basis has dimension O(n 3 )) the relevant matrix elements can also be calculated in O(n 7 ).In contrast, restricting to a sparse subset of symmetrized Pauli monomials does not directly lead to an improvment of the O(n 7 ) runtime of the tensor-network based algorithm.
The combinatorial formula for the matrix elements that we will derive in this appendix is as follows.
Lemma 13.Recalling that λ is given by a Young diagram, we choose p λ0 to be the standard Young tableaux for that diagram, with numbers increasing first in the column direction and then in row direction, as shown in Eq. (42).Then the tensor components F i,λ q λ ,q ′ λ discussed in the main text for the completely symmetrized Pauli representation are given by: where the sum is over a set of 12 non-negative integers fulfilling the constraints and λ 1 is the length of the second row of λ.
Proof.Following Appendix A, we can project onto the space with an S n irrep λ and a fixed multiplicity label p λ0 using the Young symmetrizer in Eq. (41).Acting with the Young symmetrizer on a computational basis state yields a superposition of basis states with the same number of 0s and 1s.Let us write λ = (λ 0 , λ 1 ) for the lengths of the first and second row of λ.Then we see that applying the Young symmetrizer yields 0 unless the number of 1s is between λ 1 and λ 0 .This is because the symmetrizer does not change the number of 1s, and the antisymmetrizer on λ 1 length-2 columns yields 0 if any columns are 00 or 11.Thus, the irrep basis states can be obtained by applying the Young symmetrizer to states with λ 1 + q λ ones, where 0 ≤ q λ ≤ n − 2λ 1 .Specifically, we can use Let us first evaluate where |Σ y x ⟩ denotes the equal-weight superposition of all computation basis states on x qubits with x − y zeros and y ones, which (up to normalization) is also known as Dicke state on x qubits [55,56].Π r→c denotes the permutation of qubits needed to obtain the "column-standard" Young tableau p λ0 from an analogous "row-standard" Young tableau where the numbers first increase in the row direction and then in column direction.In other words, if we think of the qubits being associated to the tiles of the Young diagram λ, then the qubits in the first row are in state Σ q λ λ 0 , and the qubits in the second row are in state |1⟩ ⊗λ 1 .
Next, for a two-row standard Young tableau p λ0 , we have where |Ψ⟩ is the 2-qubit singlet state and τ denotes the swap operator acting on two qubits.The qubits in the second row of λ in the state of Eq. ( 74) are fixed to |1⟩, so applying |Ψ⟩ ⟨Ψ| to each of the first λ 1 columns has the same effect as applying |Ψ⟩ ⟨Ψ| (|0⟩ ⟨0| ⊗ id 2 ).Applying |0⟩ ⟨0| to the first λ 1 qubits of Σ q λ λ 0 yields |0⟩ ⊗λ 1 ⊗ Σ q λ λ 0 −λ 1 .Thus, we find: Now, we are ready to evaluate where we used S x y to denote the set of bitstrings of length y with exactly x ones.This is a sum over (more than) exponentially many terms.Similarly to the previous Appendix, it can be evaluated efficiently by realizing that many summands have equal value.Thus, we instead sum over the different possible values multiplied with the number of summands with that value, which can be counted using combinatorics.Each summand is an overlap of two product states with a product operator in between.More precisely, we have a product of first λ 1 two-qubit overlaps, and then n−2λ 1 single-qubit overlaps.
For each summand in Eq. ( 78), let us denote by L ab with a, b ∈ {1, x, y, z} the subset of two-qubit pairs: L ab := {(2l, 2l + 1) : and let us write f ab = |L ab | for the number of elements in those subsets.The according overlap is 0 if a ̸ = b, so we only need to consider subsets where a = b.The number of summands for given numbers f aa is the number of decompositions of the first λ 1 qubit pairs into the four subsets L aa with a ∈ {1, x, y, z}, which equals The value which the overlap on the first λ 1 qubit pairs contributes to each summand only depends on the numbers f aa .The overlap in Eq. ( 80) is given by 1 if a = b = 1, and −1 if a = b otherwise.Thus, the overall contribution to each summand is Next, let us consider the n − 2λ 1 single-qubit overlaps.For each summand in Eq. ( 78), let us denote by K iaj for i, j ∈ {0, 1} and a ∈ {1, x, y, z} the subset of the last n − 2λ 1 qubits and let us write g iaj = |K iaj | for the number of elements in those subsets.The according overlap is only non-zero if i = j for a ∈ {1, z} and i ̸ = j for a ∈ {x, y}, so we can restrict to summands where only those 8 subsets are non-empty.The number of summands for given numbers g iaj is the number of decompositions of the set of the last n − 2λ 1 qubits into the 8 subsets K iaj , and is thus given by The contribution of the overlap on the last n − 2λ 1 qubits to each summand only depends on the numbers g iaj .The single-qubit overlap in Eq. ( 84) evaluates to 1 for g 010 , g 111 , g 0X1 , g 1X0 and g 0z0 , −1 for g 1z1 , i for g 0y1 , and −i for g 1y0 .Thus the overall contribution to each summand is Overall, the number of summands for given f aa and g iaj is the product of Eq. ( 81) and Eq. ( 85), and the value of each summand is given by the product of Eq. ( 82) and Eq. ( 86).Plugging this into Eq.(78) yields Eq. ( 70).The constraints in Eq. ( 71) are explained as follows.The first four constraints are due to the fact that the number of zeros and ones in s and s ′ is determined by q λ and q ′ λ , respectively.The last four constraints correspond to the fact that the number of Pauli operators σ 1 , σ x , σ y , σ z in p i is given by i 1 , i x , i y , and i z , respectively.
In a similar fashion to the previous Appendix, we can easily evaluate the runtime this method achieves in calculating all of the matrix elements.Note that each component is a sum over four independent variables due to the constraints, yielding a runtime of O n 4 .Taking into account the O n 6 tensor components of F yields the final runtime of O n 10 .Once again, it seems likely that the O n 4 runtime for a single tensor component can be reduced to a smaller exponent.We will leave this open to further investigation.

F Ground state energy via structure coefficients
In this appendix, we will give an alternative algorithm for finding the ground state energy only.The runtime of this algorithm is slower than for the algorithm presented in the main text.Yet it might help the reader develop further intuition for why the computation is tractable in polynomial time.It is also simpler in that it does not require any knowledge of the Schur transform at all, but only determining the structure coefficients of the algebra X.

F.1 Algorithm for general symmetry groups
Let us start by phrasing the alternative ground-state energy algorithm for general symmetry groups and representations.Lemma 14 (Finding the ground state energy of symmetric Hamiltonians).Consider a subalgebra X of dimension N , and assume that the structure constants of X in some preferred basis are known.Let H ∈ A(X) be a Hamiltonian given in the preferred basis as in Eq. (4).Then the ground state energy of H can be found in time O (N ω ).
Proof.Consider the operator with indices: which is nothing but the regular representation of h for the algebra X.Then we have that their ground state energies are equal: GSE(H) = GSE( ĥ) , or, in tensor-network notation (c.f.Eq. ( 43)), ĥ This is because the regular representation is faithful, and the ground state energy of an operator is the same in any faithful representation.Since X has dimension N , the ground state energy of ĥ can be found in time O(N ω ).
An advantage of this algorithm is that the only necessary information are the structure constants of X; no knowledge of the irreps of X is needed.However, due to this we have poor scaling with the number of irreps n λ , as the direct sum structure of X is not necessarily known.Another disadvantage of this approach is that it only gives the ground state energy, rather than the ground state itself (in a representation that is not the regular representation).

F.2 Algorithm for qubit permutation invariance
Let us now apply the algorithm in Lemma 14 to the case of qubit permutation invariance.
Corollary 15.The ground state energy of a permutation-symmetric Hamiltonian on n qubits, given as h i in the basis of symmetrized Pauli monomials above, can be computed in time O n 3ω via Lemma 14.
Proof.All that is needed for applying Lemma 14 are the structure constants of the algebra X, which are computed in the following.Let us start by showning how these structure coefficients are computed in general, given only the representation A. If we normalize A such that it defines an orthonormal set of operators, Here, we use the fact that by definition of X, A is a faithful definition of X.Note that the * denotes complex conjugation of the tensor.We cannot directly efficiently contract the tensor network on the right-hand side of Eq. ( 91), since the thick indices have a bond dimension of 2 n .This problem can be solved by realizing that A can be written as an MPO, as we have seen in Eq. ( 59).Plugging this MPO representation into Eq.(91), we obtain where we have vertically shrinked the + tensors and the distance between the horizontal indices, and * denotes complex conjugation.We see that now the tensor network can be contracted efficiently by contracting in the horizontal instead of the vertical direction.Indeed, the overall horizontal bond dimension is n 9 , so contraction can be performed in polynomial runtime.To further speed up computation, we note that the values of each pair of horizontal indices on the left and right of a + tensor differ by at most 1.Thus, the tensor + is constant-width band-diagonal in its horizontal indices, independent of the value of its vertical indices.Thus, the matrix corresponding to a column of three + tensors is band-diagonal as well.Applying a band-diagonal matrix to a vector of bond dimension d takes time O(d).Thus, contracting the above tensor network takes n steps of runtime O(n 9 ), so runtime O(n 10 ) in total.
F.3 Alternative combinatorial method for computing the structure coefficients of X In this Appendix we give an alternative combinatorial algorithm for calculating structure constants of the algebra X, similar to the algorithm used in Appendix E. Again, the algorithm is slower than the tensor-network method in Appendix F.2, namely O(n 15 ).However, it might have the advantage of being more accessible to readers not familiar with tensor network.
Lemma 16.The structure coefficients X i,j k of the completely symmetrized Pauli representation are given by: X i,j k = {f ab } a,b∈{1,x,y,z} /Eq.( 94),Eq.(95) (93) where the variables in the sum are non-negative integers subject to the constraints a∈{1,x,y,z} and Proof.For calculating the structure constants X, we first note that where P i is the set of Pauli words with i a times σ a for a ∈ {1, x, y, z}.Now we evaluate the product: This is a sum over products of exponentionally many Pauli words.The idea to evaluate this is that many of the summands have equal value, so it suffices to sum over a few different values multiplied by the number of summands with that value.For every summand, define the subsets of qubits L ab for a, b ∈ {1, x, y, z}, (100) In total, the number of decompositions into four subsets for different c is given by Finally, the prefactor α p i ,p j ,p k in Eq. ( 97) only depends on the f ab .Using the Pauli algebra, Let us quickly discuss the complexity of the computation of X i,j k .In the summation of Eq. (93), we sum over 16 variables within a range of the order n, so if we naively evaluate the sum, we already obtain a polynomial runtime O(n 16 ).However, due to the constraint Eq. (94), we can reduce the summation to only 9 variables f ab with a, b ∈ {x, y, z}.Eq. (95) poses another three independent constraints, reducing the summation to 6 variables.Thus, an individual entry X i,j k can be calculated in O(n 6 ) runtime, whereas all O(n 9 ) coefficients together take runtime O(n 15 ).

G Permutation invariance for qudits
In this appendix, we sketch the generalization of the proposed algorithms to permutation invariant systems with n d-dimensional qudits instead of qubits.In this case, the dimension of the symmetrized basis is dim(X) = n+d 2 −1 This can be derived by utilizing a stars and bars counting argument, representing the number of ways to place d 2 − 1 bars among n stars.Let us now determine the numbers n λ and n q that determine the runtimes of our algorithms, and can also be used to rederive dim(X) = O(n λ n 2 q ).The irreducible representations of SU (d) are identified with Young diagrams with d − 1 rows, which can be labelled by a sequence j = {j i } 0<i<d−1 with j i ≥ j i+1 where j i is the number of boxes in the ith row.The representation acting on a single qudit is the d-dimensional fundamental irrep, corresponding to a Young diagram with only one box, which we denote by d.The overall representation is R = d ⊗n In order to decompose it into irreps, we only need to know the tensor product of an arbitrary irrep j with the fundamental irrep d.It is given by a direct sum of at most d + 1 irreps with at most x + 1 boxes, if x is the number of boxes of j.Specifically, the direct sum consists of irreps which are obtained by adding a box to any row, as well as an irrep obtained by removing a box from every row (as long as these operation still yield valid Young diagrams).So we see that for n qubits, the irreps in the decomposition of R are Young diagrams with at most n boxes.The number of these Young diagrams is O(n d−1 ).The maximum dimension of an irrep is O(n 2 ), as can be seen from the asymptotics of the formula given in Ref. [27], with λ k ≈ n(k/d).
Let us now look at how the computation of the matrix elements F via Eq.( 24) carries over to the case of qudits.Analogous to Eq. ( 23), the symmetrized Pauli basis for qudits can be represented as an MPO of bond dimension O n d 2 −1 .This MPO is still band-diagonal.Next, the tensors in the MPO representation in Eq. ( 23) are the Clebsch-Gordan coefficients C (λ ′ ,q ′ λ ) (λ,q λ ),(d,q d ) , which can be found in Section 18.2.6. of Ref. [57].For an appropriate choice of basis, C band diagonal.That is, for a fixed λ and q λ , there is a constant number of λ ′ and q ′ λ with C (λ ′ ,q ′ λ ) (λ,q λ ),(d,q d ) ̸ = 0.For the λ part, this follows from the fact that tensor product with the fundamental representation yields at most d+1 irrep factors as discussed above.Overall, the total bond dimension in Eq. ( 24) is O(n 2 q n d 2 −1 ) = O(n 2d 2 −d−1 ).Since the overall matrix (consisting of twice C and the tensor +) is band diagonal, a single vector-matrix multiplication in the contracting of Eq. ( 24) from left to right takes time O(n 2d 2 −d−1 ).Since there are O(n λ ) = O(n d−1 ) involved irreps, and the contraction consists of n vector-matrix multiplications, the total runtime is O(n 2d 2 −1 ).So we see that we do get a polynomial runtime also for qudits, but the exponents explode quickly with increasing local dimension, for example we have O(n 17 ) for qutrits.

Figure 1 :
Figure1: (a) Small groups of symmetry leave too large of an effective dimension for the problem to be tractable via quantum computation.On the contrary, very restrictive symmetries render a problem classically tractable.Between these two regions lies an area of promise where quantum computers may offer an advantage.(b) The Schur-Weyl decomposition shows that only a smaller representative subspace (indicated by darker colors) of the larger Hilbert space needs to be considered for permutation invariant operations.The size of this subspace grows as O(n 3 ) for n qubits.

Figure 2 :
Figure2: Graphical depiction of Schur decomposition for n = 4 qubits.There are three Young diagrams of at most two rows for 4 qubits.Due to the presence of permutation invariance, we can restrict attention to the darker colored subspaces which correspond to a single subspace over the multiplicity of the permutation irreps.To project onto this darker colored subspace, we use the Young symmetrizer (Eq.(41)).

3
in settings with permutation invariance.To see this, note that the dimension of the |q λ ⟩ register for a partition (a, b) is equal to a − b + 1.Therefore, we have DOF = degrees of freedom.A similar calculation can be performed via a stars-and-bars counting argument.

L
ab := {l : (p i ) l = σ a , (p j ) l = σ b , 0 ≤ l < n}, (98) and let f ab := |L ab | (99) be the numbers of elements in those subsets.Since every Pauli operator i l at a qubit l is paired with some other Pauli operator j l , f ab fulfill the constraints in Eq. (94).The multiplication algebra of Pauli operators directly implies Eq. (95).Let us now count how many Pauli words there are in the sum for a fixed set of numbers f ab and a fixed resulting Pauli word p k .Every triple p i , p j , p k corresponds to a decomposition of each k celement set of qubits {l : (p k ) l = σ c } for c ∈ {1, x, y, z} into four subsets L ab for the four different combinations a, b ∈ {1, x, y, z} with σ a σ b ∝ σ c under the Pauli algebra.For each c, the number of decompositions into the corresponding four subsets is given by k c ! a,b:σaσ b ∝σc f ab ! .
* Bond dimension of output O(n).* * * * * Previous work provides efficient algorithms for symmetric Dicke states jx δ iy,jy δ iz,jz + (σ x ) lm δ ix,jx+1 δ iy,jy δ iz,jz + (σ y ) lm δ ix,jx δ iy,jy+1 δ iz,jz + (σ z ) lm δ ix,jx δ iy,jy δ iz,jz+1 .Note that the value of δ a,b is understood to be zero unless 0 ≤ a, b < n.We can see that the value of the bottom, middle, and top horizontal index in Eq. (59) is the count of Xs, Y s, and Zs up to this point, respectively.To achieve this, the value of the bottom horizontal indices of a + tensor increases by 1 from left to right if the two vertical indices are in an X configuration, and analogously for the middle and top horizontal indices.