An efficient high dimensional quantum Schur transform

The Schur transform is a unitary operator that block diagonalizes the action of the symmetric and unitary groups on an $n$-fold tensor product $V^{\otimes n}$ of a vector space $V$ of dimension $d$. Bacon, Chuang and Harrow \cite{BCH07} gave a quantum algorithm for this transform that is polynomial in $n$, $d$ and $\log\epsilon^{-1}$, where $\epsilon$ is the precision. A footnote in Harrow's thesis \cite{H05} briefly describes how to make the algorithm of \cite{BCH07} polynomial in $\log d$ using the representation theory of the unitary group; however, this has not been explained in detail anywhere. In this article, we present a quantum algorithm for the Schur transform that is polynomial in $n$, $\log d$ and $\log\epsilon^{-1}$ using a different approach. Specifically, we build this transform using the representation theory of the symmetric group, and in this sense our technique can be considered a ``dual'' algorithm to \cite{BCH07}. A novel feature of our algorithm is that we construct the quantum Fourier transform over the so-called \emph{permutation modules}, which could have other applications.


Introduction
Schur-Weyl duality is a remarkable correspondence between the irreducible representations of the symmetric group and those of the unitary group acting on an $n$-fold tensor product of a vector space $V$. This correspondence allows one to construct all of the so-called polynomial representations of the unitary, general linear and special linear groups. Polynomial representations of matrix groups such as the unitary groups are representations whose matrix entries are polynomials in the entries of the group element, i.e., $\rho(U)$ is a polynomial representation if the entries of $\rho(U)$ are polynomials in the entries of $U$. Schur-Weyl duality has been generalized to many other groups and algebras, including quantum groups [10,17].
Schur-Weyl duality has numerous applications in quantum information theory. It has been used to prove that the tensor product of many copies of a density operator $\rho$ is close to a projector; in fact, the projector is the one corresponding to the partition in Schur-Weyl duality that is closest to the spectrum of $\rho$ [2,26,21,9]. It has also been used to prove de Finetti theorems [30], which have many applications in security proofs of quantum key distribution systems. The Schur transform was first constructed for qubits by Bacon, Chuang and Harrow [4] and was extended to qudits by the same authors [5]. A quantum circuit for the Schur transform also has numerous applications in quantum information theory. It has been applied to universal distortion-free entanglement concentration [32], universal compression [21,22], and encoding and decoding into decoherence-free subspaces [42,29,24,3]. These applications and others are discussed in more detail in Harrow's thesis [18]. Recently, the Schur transform has been used as a primitive in an efficient algorithm for spectrum testing of a density operator [35] and in algorithms for sample-optimal state tomography [34,16], improving on previous algorithms [39,14,20,25]. It is also used in a scheme for optimal

Background in representation theory

Basics of induced representations
In this section, we briefly describe the representation theoretic concepts such as irreducible representations (irreps, for short), regular representations and induced representations. Induced representations are important in this article since the dual Schur transform is essentially a block diagonalization of induced representations. These concepts for finite groups are described in several texts such as [37,12]. Here we follow the development in [12].
A representation of a finite group $G$ on a finite-dimensional complex Hilbert space $V$ is a homomorphism $\rho: G \to \mathrm{U}(V)$ from $G$ to the unitary group $\mathrm{U}(V)$ of $V$. For every finite or compact group this is no restriction: any representation on $V$ can be made unitary, i.e., such that $\rho(g)$ is a unitary matrix for all $g \in G$. Very often, the space on which $\rho$ acts is identified with the representation itself. Two representations $\rho$ and $\rho'$ of $G$ acting on the same vector space $V$ are equivalent if there exists a unitary $U$ such that $U\rho(g)U^\dagger = \rho'(g)$ for all $g \in G$. A subspace $W$ of $V$ is called a subrepresentation if $\rho(g)$ preserves $W$ for all $g \in G$. In this case, the orthogonal complement of $W$ in $V$ is also a subrepresentation, and $V$ is the direct sum of these two subrepresentations. A representation is called irreducible (an irrep) if it does not contain any non-trivial subrepresentation. Breaking up $W$ and its complement further in the same way, one eventually arrives at a decomposition of $V$ into irreps, $V \cong V_1 \oplus \cdots \oplus V_n$, where some of the irreps in the decomposition may be equivalent. A special irrep is the trivial irrep, which acts on a one-dimensional space and sends every group element to the identity. A finite group $G$ has finitely many inequivalent irreps, and their number equals the number of conjugacy classes of $G$.
A regular representation of a finite group $G$ acts on the vector space $\mathbb{C}[G]$, where $g$ acts on any basis vector $|h\rangle$ by left multiplication, $g: |h\rangle \mapsto |gh\rangle$. The regular representation has the following direct sum decomposition into irreps:
$$\mathbb{C}[G] \cong \bigoplus_i W_i^{\oplus d_i}, \tag{2.1}$$
where $W_i$ is an irrep of $G$, $i$ runs over all the inequivalent irreps of $G$, and $d_i$ is the dimension of $W_i$. The quantum Fourier transform (QFT) over a group $G$ usually refers to the transform that performs this block diagonalization. It can be defined as the following basis transformation:
$$|g\rangle \mapsto \sum_{\rho} \sqrt{\frac{d_\rho}{|G|}} \sum_{i,j=1}^{d_\rho} \rho(g)_{ji}\, |\rho, i, j\rangle, \qquad d_\rho = \dim\rho,$$
where $\rho$ is the label of the irrep, $i$ is the multiplicity space index and $j$ is the irrep space index. From the above decomposition, the multiplicity space and the irrep space have the same dimension $d_\rho$, so $i$ and $j$ run over the same index set. A type of representation of particular importance here is the induced representation, defined as follows. Given a subgroup $H$ of $G$ and a representation $(\rho, W)$ of $H$, we construct a representation $V$ of $G$. As a vector space, it is the tensor product $V = \mathbb{C}[G/H] \otimes W$. The action of $G$ on this space can be described using a transversal $G/H = \{t_1, t_2, \ldots, t_m\}$ for $H$ in $G$, i.e., a set containing exactly one representative of each left coset $tH$, where $m = |G|/|H|$ is the number of cosets. The vectors $|t_1\rangle, \ldots, |t_m\rangle$ form an orthonormal basis of $\mathbb{C}[G/H]$. Let $\{|w_1\rangle, \ldots, |w_d\rangle\}$ be a basis of $W$. We denote the induced representation by $\uparrow_H^G \rho$, or $\uparrow^G \rho$ when $H$ can be inferred. In the basis $\{|t\rangle \otimes |w_k\rangle\}$, $(\uparrow_H^G \rho)(g)$ acts on basis vectors as follows (and extends linearly to other vectors):
$$(\uparrow_H^G \rho)(g)\,\big(|t\rangle \otimes |w\rangle\big) = |t'\rangle \otimes \rho(h)\,|w\rangle,$$
where $t' \in G/H$ and $h \in H$ are the unique elements for which $gt = t'h$.
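To make the induced action concrete, here is a small Python sketch (our own illustration, not part of the algorithms in this article). It assumes permutations of $\{0, 1, 2\}$ stored as tuples with product $(gh)(i) = g(h(i))$, takes $H = S_2$ to be the permutations fixing the letter 2, and induces the one-dimensional sign representation of $H$ to $G = S_3$. For each basis vector $|t\rangle$ it finds the unique $t'$ and $h$ with $gt = t'h$, and it checks that the resulting matrices multiply like the group elements. All function names are hypothetical.

```python
from itertools import permutations

def compose(g, h):
    """Product g*h of permutations stored as tuples: apply h first, then g."""
    return tuple(g[h[i]] for i in range(len(h)))

def inverse(g):
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def sign(g):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(g)) for j in range(i + 1, len(g)) if g[i] > g[j])
    return -1 if inversions % 2 else 1

def left_transversal(G, H):
    """Choose one representative t of each left coset tH."""
    reps, seen = [], set()
    for g in G:
        if g not in seen:
            reps.append(g)
            seen.update(compose(g, h) for h in H)
    return reps

def induced_matrix(g, transversal, H, sigma):
    """Matrix of the induced representation at g, for a one-dimensional rep sigma of H.

    Column t has a single non-zero entry sigma(h) in row t', where g t = t' h.
    """
    m = len(transversal)
    M = [[0] * m for _ in range(m)]
    H_set = set(H)
    for col, t in enumerate(transversal):
        gt = compose(g, t)
        for row, t_prime in enumerate(transversal):
            h = compose(inverse(t_prime), gt)   # h = t'^{-1} g t
            if h in H_set:                      # exactly one t' satisfies this
                M[row][col] = sigma(h)
                break
    return M

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

G = list(permutations(range(3)))   # S_3 acting on the letters {0, 1, 2}
H = [g for g in G if g[2] == 2]    # the subgroup S_2 fixing the letter 2
T = left_transversal(G, H)         # m = |G|/|H| = 3 coset representatives

# The induced matrices respect the group law: Ind(g1 g2) = Ind(g1) Ind(g2).
for g1 in G:
    for g2 in G:
        assert induced_matrix(compose(g1, g2), T, H, sign) == matmul(
            induced_matrix(g1, T, H, sign), induced_matrix(g2, T, H, sign))
```

Each column of an induced matrix has exactly one non-zero entry, reflecting the fact that $g$ permutes the cosets and acts by $\sigma(h)$ within the fibre.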

Irreducible representations of symmetric and unitary groups
The symmetric group on $n$ letters is denoted $S_n$ and consists of all permutations of the $n$ letters. There are $n!$ permutations, and every element can be written as a product of transpositions, where a transposition swaps two letters. We denote the transposition of $a$ and $b$ by $(a, b)$. The representation theory of the symmetric group is discussed in several books (see, for example, [12,23,36]). The irreps of $S_n$ are labeled by Young diagrams, which consist of left-justified rows of boxes whose lengths do not increase from top to bottom. A Young diagram corresponds to a partition of $n$, defined as a tuple $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)$ with $\lambda_j > 0$, $\sum_j \lambda_j = n$ and $\lambda_i \ge \lambda_j$ for $i < j$. The Young diagram of $\lambda$ has $k$ rows with $\lambda_j$ boxes in row $j$, and we say that $\lambda$ has $k$ parts. For example, the Young diagram of the partition $(2, 1)$ (not to be confused with a transposition) has two boxes in the first row and one box in the second. A Young tableau is a Young diagram with numbers in the boxes. If the numbers $1, \ldots, n$ each appear once and increase from left to right along each row and from top to bottom along each column, the tableau is called a standard Young tableau (SYT). For example, for the partition $(2, 1)$, the SYTs are
$$\begin{array}{cc} 1 & 3 \\ 2 & \end{array} \qquad \text{and} \qquad \begin{array}{cc} 1 & 2 \\ 3 & \end{array},$$
and these are the only possibilities. This reflects the fact that the irrep labeled by this Young diagram is two-dimensional. In general, the dimension of the irrep of $S_n$ corresponding to a partition $\lambda$ is the number of SYTs of shape $\lambda$. It is also given by the famous hook length formula
$$\dim \lambda = \frac{n!}{\prod_{(i,j) \in \lambda} h_\lambda(i,j)},$$
where the product in the denominator is over all boxes $(i, j)$ of $\lambda$ and $h_\lambda(i,j)$ is the hook length of box $(i, j)$: the number of boxes to its right (including $(i, j)$ itself) plus the number of boxes directly below it. We will return to other aspects of the symmetric group when we discuss subgroup adapted bases and permutation modules below.
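As a quick sanity check of the hook length formula, the following Python sketch (function names are our own) compares it with a direct count of SYTs, obtained by noting that the box containing $n$ must be a removable corner of the diagram. Partitions are assumed to be given as non-increasing tuples of positive integers.

```python
from math import factorial

def hook_lengths(shape):
    """Hook length of every box (i, j): boxes to the right (including (i, j)) plus boxes below."""
    conj = [sum(1 for row in shape if row > j) for j in range(shape[0])] if shape else []
    return [(shape[i] - j) + (conj[j] - i - 1) for i in range(len(shape)) for j in range(shape[i])]

def dim_hook_formula(shape):
    """Dimension of the S_n irrep labeled by `shape`, via the hook length formula."""
    denominator = 1
    for h in hook_lengths(shape):
        denominator *= h
    return factorial(sum(shape)) // denominator

def count_syt(shape):
    """Count SYTs directly: the box containing n must be a removable corner."""
    if sum(shape) == 0:
        return 1
    total = 0
    for i, row in enumerate(shape):
        if i + 1 == len(shape) or shape[i + 1] < row:   # row i ends in a removable corner
            smaller = list(shape)
            smaller[i] -= 1
            total += count_syt(tuple(r for r in smaller if r > 0))
    return total
```

For example, both functions return 2 for $(2, 1)$ and 35 for $(4, 2, 1)$, and for $n = 4$ the squared dimensions sum to $4! = 24$, as expected for the regular representation.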
The unitary group $\mathrm{U}(d)$ is the group of $d \times d$ unitary matrices; it is infinite but compact. Its irreps can be labeled in several ways. Here, we describe labelings by Dynkin labels and by Young diagrams. A Dynkin label is the set of coefficients of the highest weight in the so-called basis of fundamental weights. Every irrep has a basis of weight vectors, each with an associated weight. The highest (and lowest) weight vector of an irrep is unique up to a scalar, and every irrep of the unitary group is determined by its highest weight (there is a one-to-one correspondence between irreps and highest weights). The weights (as opposed to the weight vectors themselves) lie in a real vector space spanned by the fundamental weights, so every highest weight can be written as a linear combination of the fundamental weights. A special basis of weight vectors of an irrep of the unitary group, called the Gelfand-Tsetlin basis, is discussed in more detail below. Young diagrams can also be used to label irreps of the unitary group. A Dynkin label is converted to a Young diagram as follows: if the Dynkin labels are $(l_1, \ldots, l_r)$, then the corresponding partition $\lambda$ has components $\lambda_i = l_i + \cdots + l_r$. Conversely, a Young diagram $\lambda$ that represents an irrep is converted to Dynkin labels by setting $l_i = \lambda_i - \lambda_{i+1}$ (with $\lambda_{r+1} = 0$).
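The conversion between Dynkin labels and partitions amounts to taking suffix sums and consecutive differences. A minimal Python sketch, assuming that $r$ Dynkin labels correspond to a partition with at most $r$ parts (the function names are hypothetical):

```python
def dynkin_to_partition(labels):
    """lambda_i = l_i + ... + l_r: suffix sums of the Dynkin labels."""
    partition, running = [], 0
    for l in reversed(labels):
        running += l
        partition.append(running)
    return list(reversed(partition))

def partition_to_dynkin(partition):
    """l_i = lambda_i - lambda_{i+1}, with lambda_{r+1} = 0."""
    padded = list(partition) + [0]
    return [padded[i] - padded[i + 1] for i in range(len(partition))]
```

For example, the labels $(1, 0, 2)$ give the partition $(3, 2, 2)$, and converting $(3, 2, 2)$ back returns $(1, 0, 2)$.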

Subgroup adapted bases
A subgroup adapted basis is a canonical basis for an irrep of a group $G$ obtained from a tower of subgroups $\{1\} = G_0 \subset G_1 \subset \cdots \subset G_n = G$ running from the trivial group to $G$. To see how one obtains a canonical basis from a given subgroup tower, first consider an irrep $\rho$ of $G = G_n$ and restrict it to the subgroup $G_{n-1}$; then $\rho$ decomposes into irreps of $G_{n-1}$. If this restriction yields irreps $\sigma_i$ of $G_{n-1}$, each with multiplicity $m_i$, then choosing bases for the multiplicity spaces and the irrep spaces gives a basis for $\rho$. To choose a basis for each $\sigma_i$, we restrict to $G_{n-2}$, and so on down the subgroup tower. Finally, we reach the trivial subgroup, whose only irrep is one-dimensional, and this fixes the entire basis. In the special case that every restriction from $G_i$ to $G_{i-1}$ is multiplicity-free, i.e., all $m_i$ are zero or one, we get a canonical basis: every multiplicity space is one-dimensional, so there is no ambiguity in the basis except for the choice of a phase for each basis vector.
In most applications, one makes a projective measurement in this basis, which makes this phase choice irrelevant. However, if one were to perform other quantum operations conditioned on basis vectors after the Schur transform, then the phase choice might be relevant. But in that case, one can incorporate that phase choice into the conditional operations to be performed. We will see next that there are special subgroup towers for both the symmetric and unitary groups that have multiplicity-free branching along the tower and hence lead to canonical bases.
For the symmetric group, the tower of subgroups $\{1\} = S_1 \subset S_2 \subset \cdots \subset S_n$, where $S_i$ permutes the first $i$ letters and fixes the remaining $n - i$, gives a multiplicity-free branching rule from one subgroup to the next. This tower is fixed once we number the $n$ qudit registers. The resulting canonical basis is called the Young orthonormal basis (also the Young-Yamanouchi basis). Its vectors are labeled by Young diagrams with numbers in the boxes that strictly increase from left to right along each row and from top to bottom along each column. As mentioned above, such numbered Young diagrams are standard Young tableaux (SYT). An example is given below.

[Example of a standard Young tableau, equation (2.4).]
As is well known, the symmetric group is generated by the adjacent transpositions $(k, k+1)$. For an SYT $T$, let $c_T(m)$ denote the content (column index minus row index) of the box of $T$ containing $m$. If $k$ and $k+1$ are in different rows and columns of $T$, the action of the transposition $(k, k+1)$ on $|T\rangle$ is
$$(k, k+1)\,|T\rangle = a^T_k\, |T\rangle + b^T_k\, |(k, k+1)T\rangle,$$
where $a^T_k$ is the inverse of the signed (axial) distance in $T$ from the box labeled $k$ to the box labeled $k+1$, and $b^T_k = \sqrt{1 - (a^T_k)^2}$. This signed distance is the number of steps up plus the number of steps to the right minus the number of steps to the left minus the number of steps down; it equals $c_T(k+1) - c_T(k)$ and hence does not depend on the path taken. We use the notation $|(k, k+1)T\rangle$ for the tableau obtained from $T$ by interchanging $k$ and $k+1$, which is again an SYT in this case. If $k$ and $k+1$ are in the same row, they must be next to each other, and the action is
$$(k, k+1)\,|T\rangle = |T\rangle,$$
while if they are in the same column (and necessarily in adjacent rows), the action is
$$(k, k+1)\,|T\rangle = -|T\rangle.$$
Both special cases agree with the general formula, since the signed distance is then $+1$ and $-1$, respectively. For the unitary group, a subgroup tower that leads to a canonical basis is
$$\mathrm{U}_1 \subset \mathrm{U}_2 \subset \cdots \subset \mathrm{U}_d,$$
where $\mathrm{U}_i$ is the unitary group acting on the upper-left $i \times i$ block of the full $d \times d$ matrix (i.e., on the span of the first $i$ basis vectors). This tower is determined once we fix a basis for each qudit register. Like the tower for the symmetric group, it has multiplicity-free branching and hence gives rise to a canonical basis, called the Gelfand-Tsetlin basis. Given an irrep $\lambda$ of $\mathrm{U}_d$ in the form of a Young diagram, the diagrams in its restriction to $\mathrm{U}_{d-1}$ are obtained by removing at most one box from the bottom of each column in all possible ways, keeping only diagrams with at most $d - 1$ rows. If two columns have the same length in $\lambda$ and a box is removed from the left one, then a box must also be removed from the right one, so that a valid Young diagram is obtained. An example is given below.
Picking one of these diagrams and continuing to make choices down the tower, we might obtain the following sequence.
This sequence gives a basis vector of the irrep $\lambda$. It can be encoded by writing numbers into the boxes of the original diagram that record the stage at which each box is removed: a box is labeled $l$ if it is removed when restricting from $\mathrm{U}_l$ to $\mathrm{U}_{l-1}$. For the above sequence, the following numbering would hold.
Notice that the rows are weakly increasing, i.e., the numbers either increase or stay the same as we move right, and the columns are strictly increasing. Such a tableau is called a semi-standard Young tableau (SSYT). SSYTs with entries from the set $[d]$ label basis vectors of the irrep of $\mathrm{U}_d$. We will see below that SSYTs also play a role in certain induced representations of the symmetric group called permutation modules. Efficient encodings of these bases, using $\mathrm{poly}(\log d, n, \log(1/\epsilon))$ bits, are constructed in [5]. For an SSYT, the tuple containing the number of boxes labeled by each integer is called the content of the SSYT. For instance, in the example above, the content is $(2, 2, 3)$, corresponding to 2 boxes numbered one, 2 boxes numbered two and 3 boxes numbered three. SSYTs have a structure that is useful in our algorithms. If we consider all the boxes containing a specific number, no two of them lie in the same column; the skew diagram formed by these boxes is called a horizontal strip. An SSYT can thus be thought of as being composed of horizontal strips, and this composition can be made more precise, as we describe in the next subsection. An example of an SSYT and the associated horizontal strips is given below.
In terms of horizontal strips, the above decomposition of an irrep $\lambda$ of $\mathrm{U}(d)$ into irreps $\mu$ of $\mathrm{U}(d-1)$ can be rephrased as follows: the $\mu$ that appear are exactly the diagrams with at most $d - 1$ rows that can be obtained from $\lambda$ by removing a horizontal strip.
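The branching rule can be turned into a count of Gelfand-Tsetlin basis vectors by following the tower $\mathrm{U}(d) \supset \mathrm{U}(d-1) \supset \cdots \supset \mathrm{U}(1)$, removing a horizontal strip at each step. The Python sketch below (our own illustration; the function names are hypothetical) enumerates the interlacing diagrams $\mu$ and compares the resulting count with the hook-content formula $\prod_{(i,j)\in\lambda} (d + j - i)/h_\lambda(i,j)$, a standard formula for the dimension of the $\mathrm{U}(d)$ irrep $\lambda$ that we bring in only as a cross-check.

```python
from fractions import Fraction
from itertools import product

def restrictions(shape, d):
    """Irreps mu of U(d-1) in the restriction of the U(d) irrep `shape`.

    mu must interlace lambda (lambda_1 >= mu_1 >= lambda_2 >= ... >= mu_{d-1} >= lambda_d),
    i.e. lambda/mu is a horizontal strip and mu has at most d - 1 rows.
    """
    lam = list(shape) + [0] * (d - len(shape))
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(d - 1)]
    return [tuple(m for m in mu if m > 0) for mu in product(*ranges)]

def gt_dimension(shape, d):
    """Count Gelfand-Tsetlin basis vectors by walking down U(d) > U(d-1) > ... > U(1)."""
    if len(shape) > d:
        return 0
    if d <= 1:
        return 1   # irreps of U(1) (and of the trivial group) are one-dimensional
    return sum(gt_dimension(mu, d - 1) for mu in restrictions(shape, d))

def hook_content_dimension(shape, d):
    """Dimension of the U(d) irrep: product over boxes of (d + j - i) / hook(i, j)."""
    conj = [sum(1 for row in shape if row > j) for j in range(shape[0])] if shape else []
    result = Fraction(1)
    for i, row in enumerate(shape):
        for j in range(row):
            hook = (row - j) + (conj[j] - i - 1)
            result *= Fraction(d + j - i, hook)
    return int(result)
```

For example, $\lambda = (2, 1)$ with $d = 3$ gives 8 basis vectors by both methods, the dimension of the adjoint representation of $\mathrm{SU}(3)$.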

RSK algorithm and composition of Young tableaux
The discussion in this section is taken from the book by Fulton [11]. This procedure is used in step 4 of the algorithm DualSchur. For a more detailed explanation of how to obtain SSYTs with permuted content, see [11], Chapter 4. The RSK (Robinson-Schensted-Knuth) algorithm establishes a correspondence between pairs of words and pairs of tableaux. The main part of RSK is a procedure called row insertion, which inserts a letter into a tableau such that the resulting tableau is semi-standard with one more box. This correspondence has several applications, but the primary application here is to produce semi-standard Young tableaux whose content is permuted. In this subsection, we briefly describe this algorithm. The row insertion procedure takes as input a tableau $T$ and an integer $x$ and outputs a tableau with one more box than $T$. The procedure is as follows:
1. Find the numbers in the first row of $T$ that are strictly greater than $x$.
2. If there are none, place $x$ in a new box at the end of the first row and stop.
3. Otherwise, let $y$ be the leftmost (and hence smallest) of them, and place $x$ in $y$'s position ($x$ "bumps" $y$).
4. Repeat the previous steps with $y$ in place of $x$, starting with the second row. An empty row contains no number greater than $y$, so if $y$ is bumped out of the last row, it starts a new row at the bottom.
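The row insertion steps above can be sketched in Python as follows. A tableau is represented as a list of rows (an assumption of this sketch), and `row_insert` is a hypothetical name.

```python
def row_insert(tableau, x):
    """Row-insert x into a semi-standard tableau (list of rows); returns a new tableau."""
    rows = [list(r) for r in tableau]
    for row in rows:
        # Leftmost entry strictly greater than x; it is also the smallest such entry.
        for pos, y in enumerate(row):
            if y > x:
                row[pos], x = x, y   # x bumps y; continue by inserting y into the next row
                break
        else:
            row.append(x)            # nothing larger: x goes at the end of this row
            return rows
    rows.append([x])                 # bumped out of the last row: start a new row
    return rows
```

For example, inserting 2 into the tableau with rows $(1\,2\,2\,3)$, $(2\,3\,5\,5)$, $(4\,4\,6)$, $(5\,6)$ bumps 3 from the first row, then 5 from the second, then 6 from the third, and 6 finally lands at the end of the last row.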
The RSK algorithm uses the above row bumping procedure. It takes as input a pair of words $u = u_1 u_2 \ldots u_r$ and $v = v_1 v_2 \ldots v_r$ with two properties: $u$ is weakly increasing, and if $u_{k-1} = u_k$ then $v_{k-1} \le v_k$. Given such a pair, the procedure produces a pair of tableaux $(P, Q)$ iteratively as follows. Start with $P_1$ and $Q_1$, the one-box tableaux containing $v_1$ and $u_1$, respectively. Then, from any pair $(P_{k-1}, Q_{k-1})$, row insert $v_k$ into $P_{k-1}$ to get $P_k$, add a box to $Q_{k-1}$ in the position of the new box of $P_k$, and put $u_k$ in this box. This procedure allows us to define the product, or composition, of tableaux mentioned in the previous subsection. For two tableaux $S$ and $T$, the product $S \cdot T$ is defined as follows. If $T$ consists of only one box, then $S \cdot T$ is the result of row inserting its entry into $S$. If $T$ contains more than one box, we row insert its entries one by one into $S$, starting from the bottom row and moving left to right along each row and upwards through the rows.
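Building on row insertion, the following Python sketch implements the RSK construction of $(P, Q)$ and the tableau product $S \cdot T$ as described above: each $v_k$ is row inserted into $P$, $u_k$ is placed in $Q$ at the position of the new box, and $S \cdot T$ row inserts the entries of $T$ from the bottom row upwards, each row from left to right. Tableaux are lists of rows, and the helper names are hypothetical.

```python
def row_insert_with_position(tableau, x):
    """Row-insert x; return the new tableau and the index of the row that gained a box."""
    rows = [list(r) for r in tableau]
    for i, row in enumerate(rows):
        for pos, y in enumerate(row):
            if y > x:
                row[pos], x = x, y   # x bumps y into the next row
                break
        else:
            row.append(x)
            return rows, i
    rows.append([x])
    return rows, len(rows) - 1

def rsk(u, v):
    """RSK on a two-line array: u weakly increasing, ties in u ordered by v."""
    assert len(u) == len(v)
    P, Q = [], []
    for uk, vk in zip(u, v):
        P, i = row_insert_with_position(P, vk)
        if i == len(Q):
            Q.append([])
        Q[i].append(uk)              # the new box of P sits at the end of row i
    return P, Q

def tableau_product(S, T):
    """S . T: row insert the entries of T, bottom row first, each row left to right."""
    result = [list(r) for r in S]
    for row in reversed(T):
        for x in row:
            result, _ = row_insert_with_position(result, x)
    return result
```

For instance, `rsk([1, 2, 3], [2, 3, 1])` returns $P$ with rows $(1\,3), (2)$ and $Q$ with rows $(1\,2), (3)$, and the product of the tableau with rows $(1\,3)$ and the tableau with rows $(1\,2), (2)$ has rows $(1\,1\,2), (2), (3)$.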
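In this paper, we need this procedure (step 4 of the algorithm DualSchur) to create an SSYT $V$ from another SSYT $U$ with the content permuted. Since the adjacent transpositions generate all permutations, it suffices to describe the procedure for a single transposition $(k, k+1)$. As noted above, an SSYT consists of horizontal strips, each containing a single number. If the original SSYT $U$ contains $n_k$ boxes numbered $k$ and $n_{k+1}$ boxes numbered $k+1$, then the new SSYT $V$ should contain $n_{k+1}$ boxes numbered $k$ and $n_k$ boxes numbered $k+1$, with all other numbers unchanged. This is done using the product defined above. It turns out [11] that we can write $U = A \cdot B \cdot C$, where $A$ is an SSYT containing only the numbers $1$ through $k-1$, $B$ is an SSYT containing only the numbers $k$ and $k+1$, and $C$ is an SSYT containing the remaining numbers. Since $B$ contains only two labels and its columns strictly increase, it has at most two rows: its first row consists of $p$ boxes numbered $k$ followed by $q$ boxes numbered $k+1$, and its second row consists of $r \le p$ boxes numbered $k+1$. Let $B'$ be the unique SSYT of the same shape with the numbers of $k$'s and $(k+1)$'s interchanged: its first row has $q + r$ boxes numbered $k$ followed by $p - r$ boxes numbered $k+1$, and its second row has $r$ boxes numbered $k+1$. The new SSYT $V$ is then obtained by composing $A \cdot B' \cdot C$ using the RSK algorithm.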
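The step above that is specific to the transposition $(k, k+1)$ is replacing the two-letter tableau $B$ by the tableau with the numbers of $k$'s and $(k+1)$'s interchanged. A two-letter SSYT is determined by its shape and content, so this replacement is unique. The Python sketch below computes it; it does not implement the factorization $U = A \cdot B \cdot C$ itself, and the function names are hypothetical.

```python
def swap_two_letter_tableau(B, k):
    """Return the SSYT of the same shape as B with the numbers of k's and (k+1)'s interchanged.

    B is an SSYT (list of rows) whose entries all lie in {k, k+1}. Since columns
    strictly increase, B has at most two rows and its second row holds only k+1.
    """
    if not B:
        return []
    assert len(B) <= 2 and all(x == k + 1 for x in (B[1] if len(B) > 1 else []))
    first = B[0]
    num_second = len(B[1]) if len(B) > 1 else 0
    num_k1 = sum(row.count(k + 1) for row in B)   # number of (k+1)'s in B
    # After the swap there are num_k1 copies of k, and all of them must sit in the first row.
    new_first = [k] * num_k1 + [k + 1] * (len(first) - num_k1)
    return [new_first] + ([[k + 1] * num_second] if num_second else [])

def content(tableau):
    """Map each number to how many boxes contain it."""
    counts = {}
    for row in tableau:
        for x in row:
            counts[x] = counts.get(x, 0) + 1
    return counts
```

For example, with $k = 2$ the tableau with rows $(2\,2\,2\,3), (3)$ becomes the tableau with rows $(2\,2\,3\,3), (3)$: the shape is unchanged, the content changes from three 2's and two 3's to two 2's and three 3's, and applying the swap again recovers the original.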

Gelfand-Tsetlin bases
An alternative way of representing the basis vectors of the unitary group irreps is by Gelfand-Tsetlin (GT) patterns. GT patterns are useful in certain applications, and one can easily convert an SSYT to a GT pattern and vice versa. The power of GT patterns comes from the fact that in the GT basis, one can write down the matrix elements of the generators of the Lie algebra of $\mathrm{SU}(d)$, as derived by Gelfand and Tsetlin [13]. The formulae in this section can be found in the book by Vilenkin and Klimyk [40]. A GT pattern $M$ is a triangle of numbers $m_{k,l}$, $1 \le k \le l \le d$:
$$M = \begin{array}{ccccccc} m_{1,d} & & m_{2,d} & & \cdots & & m_{d,d} \\ & m_{1,d-1} & & \cdots & & m_{d-1,d-1} & \\ & & \ddots & & & & \\ & & & m_{1,1} & & & \end{array} \tag{2.13}$$
These numbers satisfy the in-betweenness condition
$$m_{k,l} \ge m_{k,l-1} \ge m_{k+1,l}, \qquad 1 \le k < l \le d. \tag{2.14}$$
The numbers in the first (top) row of the GT pattern are the numbers of boxes in the rows of the corresponding SSYT. The number of boxes with the number $l$ in row $k$ of the SSYT is $m_{k,l} - m_{k,l-1}$, so the total number of boxes with the number $l$ is the difference of row sums $\sum_k (m_{k,l} - m_{k,l-1})$ (where $m_{k,l} = 0$ if $k > l$). More systematically, to convert a GT pattern to an SSYT, we start from the bottom row of the GT pattern and create a partial SSYT with one row of $m_{1,1}$ boxes labeled 1. Next, we add $m_{1,2} - m_{1,1}$ boxes labeled 2 to this row and put $m_{2,2}$ boxes labeled 2 in the second row. Continuing in this way, we add $m_{1,l} - m_{1,l-1}$ boxes labeled $l$ to the first row, $m_{2,l} - m_{2,l-1}$ boxes labeled $l$ to the second row, and so on. The in-betweenness conditions guarantee that the skew tableau of boxes labeled $l$ is a horizontal strip, i.e., a skew tableau with at most one box in each of its columns. Conversely, to convert an SSYT to a GT pattern, we use the fact that the number of boxes labeled $l$ in row $k$ is $m_{k,l} - m_{k,l-1}$ (equivalently, $m_{k,l}$ is the number of boxes in row $k$ with entries at most $l$) and fill in the $k$-th diagonal. The generators of the Lie algebra of $\mathrm{SU}(d)$ are defined as follows.
where $E_{k,l}$ is the matrix with a one in the $(k, l)$-th position and zeros everywhere else, and $1 \le l \le d - 1$. The matrix elements of these generators can be expressed in the GT basis. The action of these elements on the basis vector $|M\rangle$ corresponding to a GT pattern $M$ is given by the Gelfand-Tsetlin formulae [40]. Here $\delta_{k,l}$ is a triangle of numbers like a GT pattern, with a one in the $k$-th diagonal and $l$-th row and zeros everywhere else; it is not a valid GT pattern on its own. In these formulae, only those $M \pm \delta_{k,l}$ that are valid GT patterns are included.
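The two conversions between GT patterns and SSYTs described above can be sketched in Python. A GT pattern is stored as a dictionary keyed by $(k, l)$ with $1 \le k \le l \le d$ and an SSYT as a list of rows; both representations and all names are choices made for this sketch.

```python
def ssyt_to_gt(tableau, d):
    """m[k, l] = number of boxes in row k with entries at most l, for 1 <= k <= l <= d."""
    m = {}
    for k in range(1, d + 1):
        row = tableau[k - 1] if k <= len(tableau) else []
        for l in range(k, d + 1):
            m[(k, l)] = sum(1 for x in row if x <= l)
    return m

def gt_to_ssyt(m, d):
    """Rebuild the SSYT: row k receives m[k, l] - m[k, l-1] boxes labeled l."""
    rows = []
    for k in range(1, d + 1):
        row = []
        for l in range(k, d + 1):
            previous = m[(k, l - 1)] if l - 1 >= k else 0   # m[k, l] = 0 when k > l
            row += [l] * (m[(k, l)] - previous)
        if row:
            rows.append(row)
    return rows

def satisfies_betweenness(m, d):
    """In-betweenness condition m[k, l] >= m[k, l-1] >= m[k+1, l] for 1 <= k < l <= d."""
    return all(m[(k, l)] >= m[(k, l - 1)] >= m[(k + 1, l)]
               for l in range(2, d + 1) for k in range(1, l))
```

For example, the SSYT with rows $(1\,1\,2\,3), (2\,3), (3)$ and $d = 3$ gives the top row $(4, 2, 1)$, i.e., the shape, and converting the pattern back recovers the tableau.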

Schur-Weyl duality
Schur-Weyl duality refers to the fact that the algebras generated by the actions of the symmetric group and of the unitary group on $V^{\otimes n}$ are each other's centralizers. A good description of Schur-Weyl duality can be found in the book by Goodman and Wallach [15]. In terms of representations, it can be stated as follows. Block diagonalizing the representation of the symmetric group on $V^{\otimes n}$ into irreps gives
$$V^{\otimes n} \cong \bigoplus_{\lambda} V_\lambda \otimes W_\lambda,$$
where $\lambda$ runs over the irreps of the symmetric group, $V_\lambda$ is the irrep space of $\lambda$, and $W_\lambda$ is the multiplicity space, on which the symmetric group acts trivially ($W_\lambda$ is zero unless $\lambda$ has at most $d$ rows). Now consider the diagonal action $U^{\otimes n}$ of the unitary group, in which the same unitary acts on each copy of $V$. This action clearly commutes with the action of the symmetric group, so each $W_\lambda$ is a representation of the unitary group. Schur-Weyl duality asserts that $W_\lambda$ is not just a representation but an irreducible representation of the unitary group.
Stated another way, the same unitary transformation that block diagonalizes the symmetric group representation into irreps also block diagonalizes the unitary group representation into irreps. To write this in terms of a basis, first pick a basis $\{|1\rangle, \ldots, |d\rangle\}$ for $V$. Then a basis for $V^{\otimes n}$ is the set
$$\{|i_1, \ldots, i_n\rangle : i_1, \ldots, i_n \in [d]\}.$$
The basis after block diagonalization can be written as $|\lambda, i, j\rangle$, where $\lambda$ labels the symmetric (or unitary) group irrep, $i$ indexes a basis of the unitary group irrep (the multiplicity space of the symmetric group irrep), and $j$ indexes a basis of the symmetric group irrep. In these terms, the (strong) Schur transform is the unitary transformation that changes the basis from the computational basis $|i_1, \ldots, i_n\rangle$ to the block diagonal basis $|\lambda, i, j\rangle$. The label $i$ of the unitary group irrep is a GT pattern, or equivalently an SSYT, and the label $j$ is an SYT, since it corresponds to a Young-Yamanouchi basis element of the symmetric group irrep. The irrep $\lambda$ of the unitary group has highest weight $(\lambda_1, \ldots, \lambda_d)$, and in the GT basis the highest weight vector is represented by the SSYT of shape $\lambda$ and content $\lambda$, i.e., its first row contains all ones, its second row all twos, and so on. In fact, the weight of any GT basis vector is the content of its SSYT. Therefore, all basis vectors corresponding to SSYTs of a fixed content $\mu$ have the same weight, and the multiplicity of the weight $\mu$ is the Kostka number $K_{\lambda,\mu}$, the number of SSYTs of shape $\lambda$ and content $\mu$. The correspondence between $K_{\lambda,\mu}$ and the multiplicity of the weight space with weight $\mu$ can be shown combinatorially (see, for instance, the book by Stanley [38]).
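A simple consequence of the decomposition $V^{\otimes n} \cong \bigoplus_\lambda V_\lambda \otimes W_\lambda$ is the dimension count $\sum_\lambda \dim V_\lambda \cdot \dim W_\lambda = d^n$. The Python sketch below checks it for small $n$ and $d$, computing $\dim V_\lambda$ by the hook length formula and $\dim W_\lambda$ by the hook-content formula (a standard formula brought in only for this check). The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Generate the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def hooks(shape):
    """Map each box (i, j) of the diagram to its hook length."""
    conj = [sum(1 for row in shape if row > j) for j in range(shape[0])] if shape else []
    return {(i, j): (row - j) + (conj[j] - i - 1) for i, row in enumerate(shape) for j in range(row)}

def dim_symmetric(shape):
    """dim V_lambda via the hook length formula."""
    denominator = 1
    for h in hooks(shape).values():
        denominator *= h
    return factorial(sum(shape)) // denominator

def dim_unitary(shape, d):
    """dim W_lambda as a U(d) irrep via the hook-content formula (zero if more than d rows)."""
    result = Fraction(1)
    for (i, j), h in hooks(shape).items():
        result *= Fraction(d + j - i, h)
    return int(result)

def schur_weyl_dimension(n, d):
    """Sum over lambda of dim V_lambda * dim W_lambda; should equal d**n."""
    return sum(dim_symmetric(lam) * dim_unitary(lam, d) for lam in partitions(n) if len(lam) <= d)
```

For $n = 3$ and $d = 2$, the partitions $(3)$ and $(2, 1)$ contribute $1 \cdot 4 + 2 \cdot 2 = 8 = 2^3$, while $(1, 1, 1)$ has too many rows to appear.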

Permutation modules of the symmetric group
Now, let us look at the so-called permutation modules of the symmetric group, which are useful for understanding the structure of the $S_n$ representation on $V^{\otimes n}$. For a more careful treatment, see the book by Sagan [36]. First, define the type of an $n$-tuple $E = (e_1, e_2, \ldots, e_n)$ with $e_i \in [d]$ to be the $d$-tuple $T(E) = (t_1, \ldots, t_d)$, where $t_i$ is the number of occurrences of $i$ in $E$. Clearly, $\sum_i t_i = n$, so to any $E$ corresponds a partition $\mu(E)$ of $n$. Given a type $T$, denote by $W(T)$ the set of all $n$-tuples of that type. This set can be obtained by starting with the tuple $E_0(T) = (1, \ldots, 1, 2, \ldots, 2, \ldots, d, \ldots, d)$, which contains $t_i$ entries equal to $i$, and applying all possible permutations to it. The permutation module corresponding to $T$ is the representation of $S_n$ on the vector space with basis $W(T)$, i.e., with basis vectors $|E\rangle = |e_1, e_2, \ldots, e_n\rangle$ such that $T(E) = T$. Let $\mu = \mu(T)$ denote the associated partition, i.e., the tuple obtained by arranging the non-zero elements of $T$ in decreasing order.
The representation of $S_n$ described above is an induced representation: it is induced from the trivial representation of a particular subgroup to the full group $S_n$. This subgroup, denoted $Y_T$, is called a Young subgroup; it is the stabilizer of $E_0$, i.e., the set of all permutations in $S_n$ that leave $E_0$ unchanged. So the permutation module is $P(T) = \uparrow_{Y_T}^{S_n} \mathbf{1}$. The representation of $S_n$ on $V^{\otimes n}$ is the direct sum of the permutation modules $P(T)$ over all possible types $T$. These permutation modules are reducible in general and decompose into irreps $\lambda$ of $S_n$. However, not all irreps appear in this decomposition: only those $\lambda$ that dominate $\mu(T)$, in the dominance order defined below, appear. Their multiplicities are called Kostka numbers and are denoted $K_{\lambda\mu}$. The dominance order on partitions (Young diagrams) is the following: $\lambda$ dominates $\mu$ if $\lambda_1 + \cdots + \lambda_k \ge \mu_1 + \cdots + \mu_k$ for all $k \ge 1$.
Let us now look at the structure of the multiplicity space of an irrep in a permutation module. We would like to understand this space and its basis since, in the dual version of the Schur transform, this space becomes the irrep space of the unitary group. As mentioned earlier, the multiplicity of $\lambda$ in the permutation module of the partition $\mu$ is $K_{\lambda\mu}$. The multiplicity space has a basis labeled by the semi-standard Young tableaux (SSYT) of shape $\lambda$ and content $\mu$ (both partitions of $n$), i.e., Young diagrams of shape $\lambda$ filled with $\mu_1$ ones, $\mu_2$ twos, etc., such that the numbers are strictly increasing down the columns and weakly increasing along the rows. As a special case, when $\lambda = \mu$, we have $K_{\lambda\lambda} = 1$: there is only one SSYT whose shape and content are given by the same Young diagram. For $\lambda = (2, 2)$, it is
$$\begin{array}{cc} 1 & 1 \\ 2 & 2 \end{array},$$
and such SSYTs correspond to highest weight vectors in the Gelfand-Tsetlin basis of the unitary group.
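The statements about permutation modules can be checked on small cases. The Python sketch below (our own illustration; all names are hypothetical) computes the type $T(E)$, enumerates $W(T)$ from $E_0(T)$, counts Kostka numbers by repeatedly peeling off the horizontal strip of the largest label, and tests the dominance order. The dimension of $P(T)$, i.e. $|W(T)|$, should equal $\sum_\lambda \dim\lambda \cdot K_{\lambda\mu}$.

```python
from itertools import permutations, product
from math import factorial

def partitions(n, max_part=None):
    """Partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def hook_dimension(shape):
    """Dimension of the S_n irrep `shape` via the hook length formula."""
    conj = [sum(1 for row in shape if row > j) for j in range(shape[0])] if shape else []
    denominator = 1
    for i, row in enumerate(shape):
        for j in range(row):
            denominator *= (row - j) + (conj[j] - i - 1)
    return factorial(sum(shape)) // denominator

def kostka(shape, content):
    """Number of SSYT of the given shape and content (t_1, t_2, ...), zeros allowed.

    The boxes holding the largest label form a horizontal strip; remove it and recurse.
    """
    shape = tuple(p for p in shape if p > 0)
    if not content:
        return 1 if not shape else 0
    strip = content[-1]
    lam = list(shape) + [0]
    total = 0
    for nu in product(*[range(lam[i + 1], lam[i] + 1) for i in range(len(shape))]):
        if sum(shape) - sum(nu) == strip:   # nu interlaces lam, so lam/nu is a horizontal strip
            total += kostka(nu, tuple(content[:-1]))
    return total

def type_of(E, d):
    """Type T(E) = (t_1, ..., t_d), where t_i counts the occurrences of i in E."""
    return tuple(E.count(i) for i in range(1, d + 1))

def words_of_type(T):
    """W(T): all distinct rearrangements of E_0(T) = (1,...,1, 2,...,2, ..., d,...,d)."""
    E0 = tuple(i + 1 for i, t in enumerate(T) for _ in range(t))
    return sorted(set(permutations(E0)))

def dominates(lam, mu):
    """lam dominates mu if every partial sum of lam is at least the corresponding one of mu."""
    length = max(len(lam), len(mu))
    a = list(lam) + [0] * (length - len(lam))
    b = list(mu) + [0] * (length - len(mu))
    return all(sum(a[:k]) >= sum(b[:k]) for k in range(1, length + 1))
```

For example, for $T = (2, 0, 1, 1)$ (so $\mu = (2, 1, 1)$), $|W(T)| = 4!/2! = 12$, which matches $1 \cdot 1 + 3 \cdot 2 + 2 \cdot 1 + 3 \cdot 1 = 12$ from the irreps $(4), (3,1), (2,2), (2,1,1)$ with Kostka numbers $1, 2, 1, 1$; the irrep $(1,1,1,1)$ does not dominate $\mu$ and does not appear.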

Quantum Fourier transforms

Precision of quantum transforms
The precision of a unitary operator can be defined as follows. Given a target unitary $V$, a unitary $U$ is called an $\epsilon$-approximation to $V$ if
$$\max_{|\psi\rangle:\ \||\psi\rangle\| = 1} \big\|(U - V)\,|\psi\rangle\big\| \le \epsilon,$$
where $\||\psi\rangle\|$ is the norm of the state $|\psi\rangle$. It can be shown [27] that a computation consisting of a sequence of $L$ $\epsilon$-approximate unitaries followed by a measurement with error probability $\delta$ has an overall error probability of at most $\delta + 2L\epsilon$. When one has an $m \times m$ unitary matrix whose entries can be computed efficiently, one can use the Solovay-Kitaev theorem to $\epsilon$-approximate it by a sequence of gates from a universal gate set using $O(m^2 \log^c(m^2/\epsilon))$ elementary operations. For constant $m$, this is efficient in $\log(1/\epsilon)$. As we will see below, the QFT over the symmetric group $S_n$ can be performed in time $\mathrm{poly}(n, \log(1/\epsilon))$.
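The accumulation of errors can be illustrated numerically. The Python sketch below (our own illustration, not the argument of [27]) applies $L$ random single-qubit unitaries $U_i$ and perturbed versions $V_i = U_i R(\delta_i)$, where $R(\delta_i)$ is a small real rotation, and compares the final states. By a telescoping argument, the distance between the final states is at most $\sum_i \|U_i - V_i\|$, and the difference in any outcome probability is at most twice that distance, consistent with the $2L\epsilon$ term above. All names and parameter values are hypothetical choices for this illustration.

```python
import cmath
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def apply(A, psi):
    return [A[0][0] * psi[0] + A[0][1] * psi[1], A[1][0] * psi[0] + A[1][1] * psi[1]]

def op_norm(A):
    """Largest singular value of a 2x2 matrix: sqrt of the top eigenvalue of A^dagger A."""
    H = [[sum(A[k][i].conjugate() * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    a, c, b = H[0][0].real, H[1][1].real, abs(H[0][1])
    return math.sqrt((a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b))

def random_unitary(rng):
    """e^{i phi} [[x, y], [-conj(y), conj(x)]] with |x|^2 + |y|^2 = 1 is unitary."""
    theta = rng.uniform(0, math.pi / 2)
    x = math.cos(theta) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    y = math.sin(theta) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    phase = cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    return [[phase * x, phase * y], [-phase * y.conjugate(), phase * x.conjugate()]]

def small_rotation(delta):
    return [[math.cos(delta), -math.sin(delta)], [math.sin(delta), math.cos(delta)]]

def simulate(L=20, delta=1e-3, seed=1):
    """Run L ideal gates U_i and their approximations V_i = U_i R(delta_i) on |0>."""
    rng = random.Random(seed)
    ideal = approx = [1.0 + 0j, 0j]
    total_eps = 0.0
    for _ in range(L):
        U = random_unitary(rng)
        V = matmul(U, small_rotation(rng.uniform(-delta, delta)))
        total_eps += op_norm([[U[i][j] - V[i][j] for j in range(2)] for i in range(2)])
        ideal, approx = apply(U, ideal), apply(V, approx)
    distance = math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(ideal, approx)))
    prob_gap = abs(abs(ideal[0]) ** 2 - abs(approx[0]) ** 2)   # outcome 0 of a basis measurement
    return distance, total_eps, prob_gap
```

In such runs one finds `distance <= total_eps` and `prob_gap <= 2 * distance`, as the telescoping bound predicts.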

QFT over the symmetric group
Although the dependence on the precision is implicit in the steps of Beals' algorithm [6], it is not written explicitly.
Here we show that it is $\mathrm{poly}(\log(1/\epsilon))$. We briefly explain the steps of Beals' algorithm for a quantum Fourier transform over $\mathbb{C}[S_n]$ and the labeling of the Fourier basis used in it. The algorithm proceeds by reducing each element of the symmetric group to a product of adjacent transpositions: the adjacent transpositions $\{(1, 2), (2, 3), \ldots, (n-1, n)\}$ generate the group, so any element can be written as a product of them. Beals' algorithm uses subgroup adapted bases and strong generating sets with small adapted diameter (techniques that have been generalized to several other groups in [33]). By inductively constructing the Fourier transform along the subgroup tower $S_1 \subset S_2 \subset \cdots \subset S_n$, the algorithm converts from the group basis $|g\rangle$ to the basis $|\lambda, i, j\rangle$. The indices $i$ and $j$ label the multiplicity space and the irrep space, which have the same dimension since this is the regular representation. Therefore, both are labeled by SYTs as defined above.
They can also be labeled by paths in the Bratteli diagram, a rooted, graded graph whose nodes at level $n$ are the inequivalent irreps of $S_n$. In this diagram, there is an edge between a node (irrep) $\rho$ at level $n$ and a node $\sigma$ at level $n - 1$ if $\sigma$ is contained in the restriction of $\rho$ to $S_{n-1}$, and the multiplicity of this edge equals the multiplicity of $\sigma$ in $\rho$. We will use this algorithm as a subroutine below. In the following, we denote a QFT over $S_n$ by QFT($S_n$) and a QFT over a Young subgroup $Y_T$ by QFT($Y_T$).
The main steps of the algorithm, proceeding from $S_1$ to $S_n$ along the tower, can be summarized as follows:
1. Embed an irrep of $S_k$ into an irrep of $S_{k+1}$.
2. Apply the unitary $\rho(t)$, where $\rho$ is an irrep of $S_{k+1}$ and $t$ is a transversal element, i.e., an element of $S_{k+1}/S_k$.
3. Sum over all cosets of $S_k$ in $S_{k+1}$.
Each of these steps involves applying a sparse unitary transform (as shown in [6]) with only a constant number of non-zero entries, which can be calculated efficiently. Using the standard results described in the previous subsection, we can therefore approximate the QFT to precision $\epsilon$ in time $\mathrm{poly}(n, \log(1/\epsilon))$.
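The labeling of Fourier basis vectors by paths in the Bratteli diagram can be illustrated by counting paths. In the Python sketch below (hypothetical names), the irreps at level $n$ are the partitions of $n$, each edge adds one box (the branching for the symmetric group is multiplicity-free), and the number of paths from the root to $\lambda$ equals the number of SYTs of shape $\lambda$, i.e., $\dim\lambda$. Summing the squares over level $n$ recovers $n! = \dim \mathbb{C}[S_n]$.

```python
def children(shape):
    """Irreps of S_{n+1} whose restriction to S_n contains `shape`: add one box."""
    result = []
    padded = list(shape) + [0]
    for i in range(len(padded)):
        if i == 0 or padded[i - 1] > padded[i]:   # the new box keeps the diagram valid
            new = padded[:]
            new[i] += 1
            result.append(tuple(p for p in new if p > 0))
    return result

def bratteli_level(n):
    """Map each irrep (partition) at level n to its number of paths from the root."""
    level = {(): 1}
    for _ in range(n):
        nxt = {}
        for shape, paths in level.items():
            for child in children(shape):
                nxt[child] = nxt.get(child, 0) + paths
        level = nxt
    return level
```

For example, at level 4 the partition $(3, 1)$ is reached by 3 paths and $(2, 2)$ by 2 paths, matching the dimensions of these irreps of $S_4$.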

Fourier transform over induced representations
As mentioned earlier, the usual Fourier transform is a unitary operator that changes the basis from the group elements $\{|g\rangle : g \in G\}$ to the block diagonal form given in (2.1). In this subsection, we use this transform to construct a Fourier transform for induced representations, i.e., a transform that block diagonalizes induced representations. The Fourier transform for induced representations allows us to construct the dual Schur transform. Suppose we have the representation $\uparrow_H^G \sigma$ induced from an irrep $\sigma$ of $H$ to $G$. The computational basis for this space can be written as $|t, v\rangle$, where $t$ is an element of the transversal and $v$ is a basis vector of the representation space of $\sigma$. This induced representation is in general reducible as a representation of $G$ and decomposes as a sum of irreps. By Frobenius reciprocity, the multiplicity of each irrep $\rho$ in the decomposition equals the multiplicity of $\sigma$ in the restriction of $\rho$ to $H$. We now describe how one can perform this transform.
The change of basis we want to implement is from the basis labeled $|t, v\rangle$ to a block diagonal basis labeled, say, $|\lambda, i, j\rangle$. Here $\lambda$ labels the irreps that appear in the decomposition, $i$ labels the multiplicity space and $j$ labels a basis vector of the irrep space.
1. First, we append an ancilla register of dimension $|H|/\dim(\sigma)$ (which is an integer) to the initial state, so that, excluding the transversal register, we have a register of dimension $|H|$. We now embed $|v\rangle$ into this register as $|\sigma, u, v\rangle$, where $u$ labels the multiplicity space of dimension $\dim(\sigma)$.
2. Next, perform the inverse QFT over $H$ to get the group basis $|h\rangle$. Including the transversal register, we now have $|t, h\rangle$, which we identify with the group basis state $|g\rangle$, where $g = th$. For the symmetric group this is easy, since both $t$ and $h$ are specified in terms of adjacent transpositions. However, for groups whose group basis is defined in a more complicated way, this step might be non-trivial.
3. Now perform the QFT over $G$ to obtain the basis $|\lambda, i, j\rangle$. The multiplicity label $i$ produced by the QFT over $G$ runs over all $\dim(\lambda)$ values, but for an induced representation only a subset of these values occurs in general; similarly, only some of the irrep labels $\lambda$ occur. If the irrep and multiplicity labels can be ordered so that the ones occurring in the decomposition come first, then we can easily return the ancilla register to its initial state and obtain a clean transform. For the goal of block diagonalization this ordering is not essential; however, we will see that it can be achieved for the symmetric group.
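For the special case $G = S_n$, $H = S_{n-1}$ and $\sigma$ the irrep labeled by a partition $\mu$ of $n-1$, Frobenius reciprocity together with the branching rule says that the induced representation contains each irrep $\lambda$ obtained by adding one box to $\mu$ exactly once. The plain-Python sketch below (hypothetical helper names; dimensions from the hook length formula) checks the resulting dimension count $n\dim\mu = \sum_\lambda \dim\lambda$ and the register sizes used in steps 1 and 2: a transversal register of dimension $[G:H] = n$ and an ancilla of dimension $|H|/\dim(\sigma)$. It also shows, for each $\lambda$, that only one of its $\dim\lambda$ multiplicity labels is used, which is the "smaller set" of step 3.

```python
from math import factorial


def dim(shape):
    """Hook length formula: dimension of the S_n irrep labeled by `shape`."""
    cols = [sum(1 for row in shape if row > c) for c in range(shape[0])] if shape else []
    hooks = 1
    for r, row in enumerate(shape):
        for c in range(row):
            hooks *= (row - c) + (cols[c] - r) - 1  # arm + leg + 1
    return factorial(sum(shape)) // hooks


def add_box(shape):
    """Partitions obtained from `shape` by adding one box."""
    rows, result = list(shape) + [0], []
    for r in range(len(rows)):
        if r == 0 or rows[r - 1] > rows[r]:
            new = rows[:]
            new[r] += 1
            result.append(tuple(x for x in new if x > 0))
    return result


def induced_from_s_n_minus_1(mu):
    """Ind_{S_{n-1}}^{S_n} mu as {lambda: (multiplicity, dim lambda)} (branching rule)."""
    return {lam: (1, dim(lam)) for lam in add_box(mu)}


def register_sizes(mu):
    """(transversal dimension [G:H], ancilla dimension |H| / dim sigma) for H = S_{n-1}."""
    n = sum(mu) + 1
    return n, factorial(n - 1) // dim(mu)


mu = (2, 1)
print(induced_from_s_n_minus_1(mu))  # {(3, 1): (1, 3), (2, 2): (1, 2), (2, 1, 1): (1, 3)}
print(register_sizes(mu))            # (4, 3)
```

Here $4 \cdot 3 \cdot \dim(2,1) = 24 = |S_4|$, so the transversal register, the ancilla and the $\sigma$ register together exactly fill the group algebra of $S_4$ before the QFT over $G$ is applied.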

Quantum Fourier transform over permutation modules
As explained above, permutation modules are representations of $S_n$ induced from the trivial representation of some Young subgroup $Y_T$. The Young subgroup $Y_T$ can be regarded as the stabilizer of some $n$-tuple $E = (e_1, \ldots, e_n)$ of type $T$ with entries $e_i \in [d]$, and the permutation module is spanned by all distinct permutations of $E$. In order to construct the Fourier transform over this space, we first rewrite these vectors in a way that reflects the structure of the induced representation: we choose a transversal of $Y_T$ in $S_n$ and regard its elements $|t\rangle$ as basis vectors. We can then use the algorithm above for Fourier transforms over induced representations. For clarity, we make the steps involved more explicit here.
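The following plain-Python sketch (the tuple `E` is a hypothetical example, and the helper names are ours) lists the distinct permutations of $E$, which span the permutation module, computes $Y_T$ as the group of position permutations that fix $E$, and checks the orbit-stabilizer count $|S_n| = [S_n : Y_T]\,|Y_T|$. The number of distinct rearrangements is the multinomial coefficient $n!/\prod_i \mu_i!$, where $\mu_i$ is the number of entries of $E$ equal to $i$.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def permutation_module_basis(e):
    """Distinct rearrangements of the n-tuple e; they span the permutation module."""
    return sorted(set(permutations(e)))


def stabilizer(e):
    """Position permutations p (one-line notation) with e[p[x]] == e[x] for every x."""
    n = len(e)
    return [p for p in permutations(range(n)) if all(e[p[x]] == e[x] for x in range(n))]


def multinomial(e):
    """n! / prod_i mu_i!, where mu_i counts the entries of e equal to i (the type of e)."""
    return factorial(len(e)) // prod(factorial(c) for c in Counter(e).values())


E = (1, 1, 2, 3, 3)  # hypothetical tuple of type mu = (2, 1, 2), d = 3
basis = permutation_module_basis(E)
Y = stabilizer(E)
print(len(basis), len(Y), multinomial(E))  # 30 4 30
```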
• First, take an ancilla register of $m = \lceil \log |Y_T| \rceil$ qubits, all in the state $|0\rangle$, where $|Y_T|$ is the number of elements of the subgroup $Y_T$. The state $|0\rangle^{\otimes m}$ is taken to be the label of the trivial irrep of $Y_T$.
• Then perform the inverse Fourier transform over $Y_T$ to obtain the equal superposition over all the elements of $Y_T$. Together with the transversal register, this can now be thought of as the subspace of $\mathbb{C}[S_n]$ spanned by equal superpositions over the elements of the cosets of $Y_T$ in $S_n$.
• On this space, we can perform the quantum Fourier transform over $S_n$. This produces the basis $|\lambda, i, j\rangle$, with $i$ and $j$ labeled by standard Young tableaux (SYTs).
The irreps $\lambda$ that appear in the decomposition are exactly those whose Young diagrams dominate the Young diagram $\mu$ corresponding to $Y_T$. Similarly, although the multiplicity space is embedded inside a space of dimension $d_\lambda$, it does not have support on all of it: it is a subspace of dimension $K_{\lambda\mu}$, the Kostka number, which by Frobenius reciprocity equals the multiplicity of the trivial representation of $Y_T$ in the restriction of $\lambda$ to $Y_T$. While the algorithm above already block diagonalizes the permutation module, we would ideally like the multiplicity index to label a basis of this $K_{\lambda\mu}$-dimensional subspace rather than a basis of the larger space. To do this, we change the basis of each multiplicity space so that it contains a basis of the vectors that transform trivially under $Y_T$. This is also important for reaching the Gelfand-Tsetlin basis in the dual Schur transform that we construct in the next section. To perform this change of basis in the multiplicity spaces, we use generalized phase estimation (GPE) [18], a generalization of Kitaev's phase estimation technique [28]. GPE block diagonalizes any representation $\rho$ of a group, provided one can apply $\rho(g)$ controlled on a register holding $g$. Here we use it to reorganize the multiplicity space into a basis that includes the vectors transforming trivially under $Y_T$.
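The dimension bookkeeping in this paragraph can be checked classically. The sketch below (plain Python, hypothetical helper names) computes Kostka numbers $K_{\lambda\mu}$ by peeling off one horizontal strip per letter, and checks three standard facts used above: $K_{\lambda\mu} > 0$ exactly when $\lambda$ dominates $\mu$, $K_{\mu\mu} = 1$, and $\sum_\lambda K_{\lambda\mu} d_\lambda = n!/\prod_i \mu_i!$, the dimension of the permutation module. Here $d_\lambda$ is obtained as $K_{\lambda,(1^n)}$, the number of SYTs of shape $\lambda$.

```python
from functools import lru_cache
from itertools import product
from math import factorial, prod


def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


@lru_cache(maxsize=None)
def kostka(lam, mu):
    """Number of SSYTs of shape lam and content mu (mu may be any composition)."""
    if not mu:
        return 1 if sum(lam) == 0 else 0
    # The boxes holding the largest letter form a horizontal strip lam/nu, i.e.
    # lam[i+1] <= nu[i] <= lam[i], and the strip must contain mu[-1] boxes.
    ranges = [range(lam[i + 1] if i + 1 < len(lam) else 0, lam[i] + 1) for i in range(len(lam))]
    total = 0
    for nu in product(*ranges):
        if sum(lam) - sum(nu) == mu[-1]:
            total += kostka(tuple(x for x in nu if x > 0), mu[:-1])
    return total


def dominates(lam, mu):
    """True if every partial sum of lam is at least the corresponding partial sum of mu."""
    s_lam = s_mu = 0
    for i in range(max(len(lam), len(mu))):
        s_lam += lam[i] if i < len(lam) else 0
        s_mu += mu[i] if i < len(mu) else 0
        if s_lam < s_mu:
            return False
    return True


mu = (2, 2, 1)
n = sum(mu)
table = {lam: kostka(lam, mu) for lam in partitions(n)}
print(table)
# {(5,): 1, (4, 1): 2, (3, 2): 2, (3, 1, 1): 1, (2, 2, 1): 1, (2, 1, 1, 1): 0, (1, 1, 1, 1, 1): 0}
print(sum(k * kostka(lam, (1,) * n) for lam, k in table.items()),
      factorial(n) // prod(factorial(m) for m in mu))  # 30 30
```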
The action of $Y_T$ and $S_n$ on the multiplicity space is the so-called right regular representation $R$. It acts on Young tableaux according to Young's orthogonal form, since under the right action of $S_n$ the multiplicity space is an irrep. The result of performing GPE on the basis vector

and λ(g) is the representation corresponding to the partition λ evaluated at the group element g. In order to have a controlled application inside the multiplicity space, we can first apply a controlled right multiplication and then apply the quantum Fourier transform. We can just combine GPE with the algorithm described above to get a decomposition of the multiplicity space. However, for completeness, we explicitly list out all the steps below. The algorithm for GPE and its performance guarantee are as follows [18].
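GPE$_\rho(G)$
Inputs: A quantum state $|\psi\rangle$ in the representation space of a representation $\rho$ of a group $G$. Black box: The ability to perform controlled multiplication in the representation $\rho$. Outputs: An irrep label $\lambda$ of $G$, obtained with probability $p_\lambda = \langle\psi|\Pi_\lambda|\psi\rangle$, where $\Pi_\lambda$ is the projector onto the $\lambda$-isotypic component of $\rho$. Runtime: $2T_{\mathrm{QFT}(G)} + T_{C\rho}$, where $T_{\mathrm{QFT}(G)}$ is the time to perform a QFT over the group $G$ and $T_{C\rho}$ is the time to perform controlled multiplication in the representation $\rho$.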
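1. Take an ancilla register consisting of $\lceil \log |G| \rceil$ qubits and create an equal superposition over the elements of $G$.
2. Conditioned on the ancilla register being in the state $|g\rangle$, apply $\rho(g)$ to $|\psi\rangle$.
3. Perform a QFT($G$) on the ancilla register.
4. Measure the irrep label of the ancilla register.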
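To make the output distribution of GPE concrete, the sketch below computes $p_\lambda = \langle\psi|\Pi_\lambda|\psi\rangle$ classically, using the standard character formula $\Pi_\lambda = \frac{d_\lambda}{|G|}\sum_{g}\overline{\chi_\lambda(g)}\,\rho(g)$ for the isotypic projector (characters of $S_n$ are real). As a hypothetical example we take $G = S_3$ acting on $\mathbb{C}^3$ by permuting coordinates, which is the permutation module of the Young subgroup $S_2 \times S_1$. This only checks the measurement statistics; it is not a simulation of the quantum circuit. Exact rational arithmetic is used, so real unnormalized input vectors are allowed, and the returned values are then scaled by $\langle\psi|\psi\rangle$.

```python
from fractions import Fraction
from itertools import permutations


def cycle_type(p):
    """Cycle type of a permutation of range(len(p)) given in one-line notation."""
    seen, lengths = set(), []
    for start in range(len(p)):
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


# Character table of S_3, indexed by cycle type.
CHARACTERS = {
    "trivial": {(1, 1, 1): 1, (2, 1): 1, (3,): 1},
    "sign": {(1, 1, 1): 1, (2, 1): -1, (3,): 1},
    "standard": {(1, 1, 1): 2, (2, 1): 0, (3,): -1},
}


def gpe_probabilities(psi):
    """p_lambda = <psi|Pi_lambda|psi> for S_3 permuting the coordinates of a real vector psi."""
    group = list(permutations(range(3)))
    probs = {}
    for name, chi in CHARACTERS.items():
        d = chi[(1, 1, 1)]
        total = Fraction(0)
        for g in group:
            # rho(g)|x> = |g(x)>, so <psi|rho(g)|psi> = sum_x psi[g(x)] * psi[x] for real psi.
            overlap = sum(psi[g[x]] * psi[x] for x in range(3))
            total += chi[cycle_type(g)] * overlap
        probs[name] = Fraction(d, len(group)) * total
    return probs


print({k: str(v) for k, v in gpe_probabilities([1, 0, 0]).items()})
# {'trivial': '1/3', 'sign': '0', 'standard': '2/3'}
```

The sign irrep never appears because the permutation module of $S_2 \times S_1$ decomposes as trivial plus standard, in agreement with Young's rule.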
The black box in the above algorithm can be made explicit in the following algorithm. The representation ρ turns out to be the usual right regular representation and can be efficiently implemented. The overall algorithm to block diagonalize permutation modules is the following. This algorithm is potentially applicable to other problems where one needs to block diagonalize induced representations and could be of independent interest. In the following, we will use GPE as a unitary module without the measurement step (step 4 above). The algorithm QFTPermMod described next is essentially the quantum part of the algorithm DualSchur described in the next section. To make things clear, we have included a block diagram of the algorithm QFTPermMod in Fig. 1.

QFTPermMod($Y_T$)
Inputs: A quantum register A whose computational basis is given by the elements of the transversal of $Y_T$ in $S_n$. Outputs: Quantum registers $|\lambda, i, j\rangle$ corresponding to the block diagonalization of the representation of $S_n$ induced from the trivial irrep of $Y_T$. Runtime: $O(\mathrm{poly}(n, \log \epsilon^{-1}))$.
1. Take an ancilla register B consisting of $\lceil \log |Y_T| \rceil$ qubits and create an equal superposition over the elements of $Y_T$.
2. Perform GPE$_R(Y_T)$ on registers AB, where $R$ is the right regular representation of $S_n$.
3. Perform a QFT($S_n$) on registers AB.

Proof. The proof can be broken into three parts.
1. We show that for states of the form $\sum_t a_t |t\rangle$, performing GPE and measuring the irrep of $Y_T$ always gives the trivial irrep.
2. We show that when the trivial irrep of $Y_T$ appears in the measurement, the computational basis of the multiplicity space is rotated from SYTs to (the normalized versions of) $\lambda(Y_T)|T\rangle$, where $T$ is a SYT and $\lambda(Y_T) = \sum_{h \in Y_T} \lambda(h)$.
3. Then we show that the states $\lambda(Y_T)|T\rangle$ are in one-to-one correspondence with SSYTs whenever they are non-zero.

Part (i)
To show that we always get the trivial irrep of $Y_T$ for states of the form $\sum_t a_t |t\rangle$, where $t$ runs over the transversal of $Y_T$ in $S_n$, we track the state through the steps of the algorithm.
1. Suppose we have the state $\sum_t a_t |t\rangle_A$, where $t$ is an element of the transversal of $Y_T$ in $S_n$. We take an ancilla register B consisting of $\lceil \log |Y_T| \rceil$ qubits to get $\sum_t a_t |t\rangle_A |0\rangle_B$.
2. Perform the inverse QFT over $Y_T$ on register B to obtain an equal superposition over the group elements of $Y_T$. This allows us to view registers A and B together as the group basis of $S_n$. This step takes $O(\mathrm{polylog}\,|Y_T|)$ time, since the QFT over $Y_T$ can be done efficiently for Young subgroups (they are direct products of symmetric groups). We now have the state $\sum_{t,h} b_t |t\rangle_A |h\rangle_B$, where $h$ runs over all the elements of $Y_T$ and $b_t = a_t/\sqrt{|Y_T|}$.
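3. Perform GPE, which consists of the following steps. This takes $O(\mathrm{polylog}\,|Y_T| + \mathrm{poly}(n))$ time, since controlled $R$ operations can be done in $\mathrm{poly}(n)$ time and there are two QFTs over $Y_T$.
• Take a register C of $\lceil \log |Y_T| \rceil$ qubits initialized to $|0\rangle$ and create an equal superposition over $Y_T$ in it. This gives us the state
$$\sum_{t,h_1,h_2} c_t\, |t\rangle_A |h_1\rangle_B |h_2\rangle_C,$$
where $h_1$ and $h_2$ run over all elements of $Y_T$ and $c_t = b_t/\sqrt{|Y_T|}$.
• Conditioned on register C, perform a controlled right multiplication on the group basis of $S_n$ in registers AB, i.e., perform
$$|g\rangle_{AB}|h\rangle_C \mapsto R(h)|g\rangle_{AB}\,|h\rangle_C = |gh\rangle_{AB}|h\rangle_C,$$
where $R$ is the right multiplication in $S_n$. This gives the state
$$\sum_{t,h_1,h_2} c_t\, |t h_1 h_2\rangle_{AB}|h_2\rangle_C,$$
which can be rewritten as
$$\sum_{t,h_1,h_3} c_t\, |t h_3\rangle_{AB}|h_1^{-1} h_3\rangle_C,$$
where we have replaced $h_1 h_2$ by $h_3$.
• Perform QFT($Y_T$) on C, with the convention $|h\rangle \mapsto \sum_{\mu,k,l}\sqrt{d_\mu/|Y_T|}\,[\mu(h)]_{k,l}\,|\mu,k,l\rangle$. This gives us the state
$$\sum_{\mu,k,l,h_1,h_3,t} c_t \sqrt{\frac{d_\mu}{|Y_T|}}\,[\mu(h_1^{-1}h_3)]_{k,l}\, |t h_3\rangle_{AB}|\mu,k,l\rangle_C,$$
where $\mu$ runs over all the irreps of $Y_T$ and $k$ and $l$ run over its dimension. Since $\sum_{h_1}\mu(h_1^{-1}h_3) = \big(\sum_{h \in Y_T}\mu(h)\big)\mu(h_3)$ vanishes unless $\mu$ is trivial, the sum over $h_1$ forces $\mu$ to be the trivial irrep of $Y_T$. Thus, for states of the form $\sum_t a_t |t\rangle$, we always get the trivial irrep when we measure $\mu$.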
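The key point of Part (i) is that the joint state of registers A and B is a function on $S_n$ that is constant on the cosets $tY_T$, so it is unchanged by right multiplication by any $h \in Y_T$, and every non-trivial isotypic projector of $Y_T$ annihilates it. The plain-Python sketch below checks this for the hypothetical choice $n = 4$, $Y_T = S_{\{0,1\}} \times S_{\{2,3\}}$ (positions are 0-indexed), with arbitrary integer amplitudes $a_t$ on the cosets and no normalization. Since this particular $Y_T$ is abelian, its irreps are the four products of block signs, which is all the sketch uses.

```python
from itertools import permutations


def compose(p, q):
    """One-line permutations composed as (p q)(x) = p(q(x))."""
    return tuple(p[q[x]] for x in range(len(q)))


def sign(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


n = 4
blocks = [(0, 1), (2, 3)]  # hypothetical Young subgroup S_{0,1} x S_{2,3}
group = list(permutations(range(n)))
young = [p for p in group if all(p[x] in b for b in blocks for x in b)]

# Give every left coset g*Y an arbitrary amplitude a_t; f is the (unnormalized) state of AB.
coset_amp = {}
for g in group:
    coset_amp.setdefault(frozenset(compose(g, h) for h in young), len(coset_amp) + 1)
f = {g: coset_amp[frozenset(compose(g, h) for h in young)] for g in group}


def character(h, signed_blocks):
    """1-dim irrep of Y: product of the signs of h restricted to the chosen blocks."""
    value = 1
    for b in signed_blocks:
        value *= sign(tuple(b.index(h[x]) for x in b))
    return value


def project(f, signed_blocks):
    """(P f)(g) = (1/|Y|) sum_h chi(h) f(g h): isotypic projector for the right action of Y."""
    return {g: sum(character(h, signed_blocks) * f[compose(g, h)] for h in young) / len(young)
            for g in group}


print(all(v == 0 for v in project(f, [blocks[0]]).values()), project(f, []) == f)
```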

Part (ii)
If we had performed the QFT over $S_n$ without performing GPE first, the basis of the multiplicity space would have been labeled by SYTs. Here we show that, after performing GPE and when $\mu$ (the irrep label of $Y_T$) is trivial, the basis of SYTs is rotated to $\lambda(Y_T)|T\rangle$, where $T$ is a SYT of shape $\lambda$. Starting with a state of the form $|t h_1\rangle$ and performing GPE (using the QFT convention $|h\rangle \mapsto \sum_{\mu,k,l}\sqrt{d_\mu/|Y_T|}\,[\mu(h)]_{k,l}\,|\mu,k,l\rangle$) gives the state
$$\frac{1}{|Y_T|}\sum_{\mu,k,l,h_2}\sqrt{d_\mu}\,[\mu(h_2)]_{k,l}\,|t h_1 h_2\rangle_{AB}|\mu,k,l\rangle_C.$$
Next we perform a QFT over $S_n$ on registers AB, which gives a superposition over $h_2, \lambda, \mu, T_1, T_2, k, l$, where $T_1$ and $T_2$ are SYTs. Since the right regular representation acts only on the first register, $\lambda(h_2)$ acts only on $|T_1\rangle$. When $\mu$ is trivial, the basis state $|T_1\rangle$, which corresponds to a SYT, is therefore taken to $\lambda(Y_T)|T_1\rangle$.

Part (iii)
We show next that $\lambda(Y_T)|T\rangle$ can be identified with a semistandard Young tableau (SSYT) of shape $\lambda$ and content defined by $Y_T$. In other words, if $Y_T = S_{X_1} \times S_{X_2} \times \cdots \times S_{X_k}$ for some $k$, then the content is $|X_1|$ 1s, $|X_2|$ 2s, etc. Note also that each set $X_i$ consists of consecutive integers. The SSYT associated with $\lambda(Y_T)|T\rangle$ is obtained from $T$ by replacing all the integers in $X_i$ by $i$. This is a valid SSYT if no two integers in $X_i$ are in the same column, or equivalently, if every column has at most one element of each $X_i$. If the boxes of the SYT labeled by elements of $X_i$ have this property, the skew Young diagram they form is called a horizontal strip, as described in section 2.3.
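The replacement rule above can be tested classically. In the plain-Python sketch below (hypothetical helper names; SYTs stored as the tuple of box positions of $1, \ldots, n$), the blocks are $X_1 = \{1, \ldots, \mu_1\}$, $X_2 = \{\mu_1 + 1, \ldots, \mu_1 + \mu_2\}$, and so on. It checks that replacing the entries of $X_i$ by $i$ yields a SSYT exactly when every $X_i$ occupies a horizontal strip, and collects the distinct SSYTs obtained in this way for a given shape.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def standard_tableaux(shape):
    """All SYTs of `shape`; each is a tuple pos with pos[v - 1] = (row, col) of entry v."""
    if sum(shape) == 0:
        return [()]
    result = []
    for r in range(len(shape)):
        if r + 1 == len(shape) or shape[r + 1] < shape[r]:
            smaller = list(shape)
            smaller[r] -= 1
            for t in standard_tableaux(tuple(x for x in smaller if x > 0)):
                result.append(t + ((r, shape[r] - 1),))
    return result


def fill(t, mu):
    """Replace every entry of X_i by i; returns {(row, col): letter}."""
    letters = [i for i, m in enumerate(mu, start=1) for _ in range(m)]
    return {pos: letters[v] for v, pos in enumerate(t)}


def is_ssyt(grid):
    """Rows weakly increase to the right and columns strictly increase downwards."""
    rows_ok = all((r, c + 1) not in grid or grid[(r, c)] <= grid[(r, c + 1)] for r, c in grid)
    cols_ok = all((r + 1, c) not in grid or grid[(r, c)] < grid[(r + 1, c)] for r, c in grid)
    return rows_ok and cols_ok


def is_horizontal_strips(t, mu):
    """No two entries of the same block X_i lie in the same column of t."""
    grid = fill(t, mu)
    columns = [(pos[1], grid[pos]) for pos in t]
    return len(columns) == len(set(columns))


def ssyt_images(shape, mu):
    """Distinct SSYTs obtained from the SYTs of `shape` by the replacement rule."""
    images = set()
    for t in standard_tableaux(shape):
        grid = fill(t, mu)
        if is_ssyt(grid):
            images.add(tuple(tuple(grid[(r, c)] for c in range(row)) for r, row in enumerate(shape)))
    return images


print(sorted(ssyt_images((3, 2), (2, 2, 1))))  # [((1, 1, 2), (2, 3)), ((1, 1, 3), (2, 2))]
```

Several SYTs can map to the same SSYT, so the sketch collects the images in a set; every SSYT of the given shape and content arises in this way, since its standardization is such an SYT.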
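We now only need to show, for every $X_i$, that $\lambda(Y_T)|T\rangle$ is non-zero if the boxes numbered with elements of $X_i$ form a horizontal strip, and that it is zero otherwise. To show this, we focus on a single $X_i$ and show that if two elements of $X_i$ are in the same column, then $\lambda(S_{X_i})|T\rangle = 0$, where $S_{X_i}$ is the sum of all permutations of $X_i$. This is done next in Lemma 2. Finally, the claimed dependence on $\epsilon$ follows from the fact that each of the steps in the algorithm (including the group multiplications and quantum Fourier transforms) can be done with $O(\mathrm{polylog}\,\epsilon^{-1})$ elementary gates, based on the results described in sections 3.1 and 3.2.

Lemma 2. Let $A \subseteq [n]$ be a set of consecutive integers, let $S_A$ be the sum of all permutations of the elements of $A$ in the group algebra of $S_n$, and let $T$ be a SYT of shape $\lambda$. If two elements of $A$ are in the same column of $T$, then $\lambda(S_A)|T\rangle = 0$.

Proof. Since the elements of $A$ are consecutive integers, we can assume without loss of generality that the two elements of $A$ that appear in the same column are in consecutive rows (any entry between them in that column is also in $A$). Let the elements be $i$ and $j$ with $j > i$. We first show that if $j = i + 1$, then $\lambda(S_A)|T\rangle = 0$, and then reduce the general case to this one. Below we abbreviate $\lambda(S_A)|T\rangle$ as $S_A|T\rangle$.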
So assume now that $i$ and $i+1$ are in the same column (they then have to be in consecutive rows). By Young's orthogonal form, the action of the transposition $(i, i+1)$ is $\lambda((i, i+1))|T\rangle = -|T\rangle$. Therefore, $\lambda(e + (i, i+1))|T\rangle = 0$, where $e$ is the identity element. It is easy to see that for any set $K = \{i_1, i_2, \ldots, i_r\}$, the group algebra element $S_K$ that is the sum of all permutations of the elements of $K$ can be factorized as follows.
$$S_K = \big((i_r, i_1) + \cdots + (i_r, i_{r-1}) + e\big) \cdots \big((i_3, i_1) + (i_3, i_2) + e\big)\big((i_2, i_1) + e\big). \tag{3.10}$$
While this factorization depends on the ordering of the elements, the overall group algebra element $S_K$ does not. Using this with $K = A$ and the ordering $A = \{i, i+1, i+2, \ldots, a_k, a_1, a_2, \ldots, i-1\}$, so that $i_1 = i$ and $i_2 = i+1$, we obtain
$$S_A = \tilde{S}_A\,\big(e + (i, i+1)\big), \tag{3.11}$$
where $\tilde{S}_A$ collects the rest of the terms, i.e., $\tilde{S}_A$ is a product of sums of transpositions coming from the factorization above. It is now easy to see that $S_A|T\rangle = 0$ if $T$ contains $i$ and $i+1$ in the same column.

For the general case, assume inductively that $S_A|T'\rangle = 0$ for every SYT $T'$ in which $i$ and $j-1$ are in the same column. Now suppose that $i$ and $j$ are in the same column and in consecutive rows of $T$. Let $k$ be the largest element between $i$ and $j$ that is in a different row from $j$, so that all elements between $k$ and $j$ are in the same row as $j$. For example, if $j-1$ is in a different row from $j$, then $k = j-1$. In particular, $k$ and $k+1$ are in different rows. If they are in the same column, then we are done by the case above. So assume that they are not in the same column. Let $(k, k+1)T$ denote the SYT obtained from $T$ by interchanging $k$ and $k+1$; since they are in neither the same row nor the same column, $(k, k+1)T$ is also a SYT. By Young's orthogonal form,
$$\lambda((k,k+1))|T\rangle = a^T_k|T\rangle + b^T_k|(k,k+1)T\rangle, \qquad \lambda((k,k+1))|(k,k+1)T\rangle = -a^T_k|(k,k+1)T\rangle + b^T_k|T\rangle, \tag{3.13}$$
where $a^T_k = 1/r$, with $r = c_T(k+1) - c_T(k)$ the axial distance from $k$ to $k+1$ in $T$ (here $c_T(x)$ is the column index minus the row index of the box containing $x$, and $|r|$ is the Manhattan distance between the two boxes), and $b^T_k = \sqrt{1 - (a^T_k)^2}$. Using these equations, we have
$$|T\rangle = \big(A\,e + B\,(k, k+1)\big)|(k, k+1)T\rangle, \tag{3.14}$$
where $A = a^T_k / b^T_k$ and $B = 1/b^T_k$. Therefore, we have
$$S_A|T\rangle = S_A\big(A\,e + B\,(k,k+1)\big)|(k,k+1)T\rangle = (A + B)\,S_A|(k,k+1)T\rangle, \tag{3.15}$$
where the last equality follows from $S_A\,(k, k+1) = S_A$, which holds because $k$ and $k+1$ both lie in $A$. Now note that in $(k, k+1)T$, $k+1$ and $k+2$ are not in the same row or column. Continuing this process, we get
$$S_A|T\rangle = c\, S_A|(j-1, j)\cdots(k, k+1)T\rangle \tag{3.16}$$
for some constant $c$. It is easy to see that $(j-1, j)\cdots(k, k+1)T$ is a SYT in which $i$ and $j-1$ are in the same column and in consecutive rows. By the induction hypothesis, $S_A|(j-1, j)\cdots(k, k+1)T\rangle = 0$, and hence $S_A|T\rangle = 0$.
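Lemma 2 can also be checked numerically on small cases. The plain-Python sketch below re-implements Young's orthogonal form (diagonal entry $1/r$, where $r$ is the content of the box of $k+1$ minus that of $k$, content meaning column minus row); all helper names are hypothetical. It builds $\lambda(S_A)$ for a consecutive set $A$ by generating $\mathrm{Sym}(A)$ from the adjacent transpositions inside $A$, and reports, for each SYT $T$, whether two elements of $A$ share a column and whether $\lambda(S_A)|T\rangle$ vanishes. As a sanity check on the matrices, the accompanying tests also compare $\operatorname{tr}\lambda(S_A)/|A|!$, the multiplicity of the trivial irrep of $\mathrm{Sym}(A)$, with the number of SYTs whose first $|A|$ entries lie in the first row (the same multiplicity, since $\mathrm{Sym}(A)$ is conjugate to $S_{|A|}$).

```python
from math import factorial, sqrt


def standard_tableaux(shape):
    """All SYTs of `shape`; pos[v - 1] = (row, col) of entry v."""
    if sum(shape) == 0:
        return [()]
    result = []
    for r in range(len(shape)):
        if r + 1 == len(shape) or shape[r + 1] < shape[r]:
            smaller = list(shape)
            smaller[r] -= 1
            for t in standard_tableaux(tuple(x for x in smaller if x > 0)):
                result.append(t + ((r, shape[r] - 1),))
    return result


def young_orthogonal(shape, k):
    """Young's orthogonal form of (k, k+1); diagonal 1/r with r = content(k+1) - content(k)."""
    tabs = standard_tableaux(shape)
    index = {t: i for i, t in enumerate(tabs)}
    mat = [[0.0] * len(tabs) for _ in tabs]
    for col, t in enumerate(tabs):
        (r1, c1), (r2, c2) = t[k - 1], t[k]
        axial = (c2 - r2) - (c1 - r1)
        mat[col][col] = 1.0 / axial
        if abs(axial) > 1:
            swapped = list(t)
            swapped[k - 1], swapped[k] = t[k], t[k - 1]
            mat[index[tuple(swapped)]][col] = sqrt(1.0 - 1.0 / axial ** 2)
    return mat


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def symmetrizer(shape, lo, hi):
    """lambda(S_A) for A = {lo, ..., hi}, summing lambda(p) over Sym(A) found by BFS."""
    tabs = standard_tableaux(shape)
    d = len(tabs)
    gens = {k: young_orthogonal(shape, k) for k in range(lo, hi)}
    start = tuple(range(1, sum(shape) + 1))
    seen = {start: [[float(i == j) for j in range(d)] for i in range(d)]}
    frontier = [start]
    while frontier:
        new_frontier = []
        for p in frontier:
            for k, mat in gens.items():
                q = list(p)
                q[k - 1], q[k] = q[k], q[k - 1]  # one-line notation of p * (k, k+1)
                q = tuple(q)
                if q not in seen:
                    seen[q] = matmul(seen[p], mat)  # lambda(p (k,k+1)) = lambda(p) lambda((k,k+1))
                    new_frontier.append(q)
        frontier = new_frontier
    total = [[sum(m[i][j] for m in seen.values()) for j in range(d)] for i in range(d)]
    return tabs, total, len(seen)


def shares_column(t, lo, hi):
    """True if two entries of A = {lo, ..., hi} lie in the same column of t."""
    cols = [t[v - 1][1] for v in range(lo, hi + 1)]
    return len(cols) != len(set(cols))


def trivial_multiplicity(shape, m):
    """Number of SYTs of `shape` with 1, ..., m all in the first row."""
    return sum(1 for t in standard_tableaux(shape) if all(t[v][0] == 0 for v in range(m)))


tabs, s_a, size = symmetrizer((3, 2), 2, 4)  # A = {2, 3, 4}, |Sym(A)| = 6
for c, t in enumerate(tabs):
    column_is_zero = all(abs(row[c]) < 1e-9 for row in s_a)
    print(t, shares_column(t, 2, 4), column_is_zero)
```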

Dual algorithm for the Schur transform
We are now ready to describe our dual algorithm for the Schur transform using the above tools. It involves two main steps: the first is a block diagonalization into permutation modules, and the second is a block diagonalization of each permutation module into irreps using the above transform. The algorithm is as follows. Here $G$ is the symmetric group $S_n$ and $H$ is a Young subgroup of the form $S_{\mu_1} \times S_{\mu_2} \times \cdots \times S_{\mu_k}$ for some $k \le d$.
To avoid clutter, we replace $Y_T$ above by $H$. We call the tuple $\mu$, which can be permuted to a valid Young diagram, the content corresponding to $H$, since it is the content of the SSYTs that we obtain. The sum over $g$ runs over elements of the transversal of $H$ in $G$. With the embedding of the permutation module into the group algebra of $S_n$ done in the previous section, we can interpret $|H\rangle$ as the computational basis state |11 . . . − for $l = 1, 2, \ldots, d-2$. These operators were defined in section 2.5. We show that these operators, acting on any $|\lambda, i, j\rangle$, leave the horizontal strip labeled $d$ intact. In other words, they give superpositions of states of the form $|\lambda, i, j'\rangle$, where $j'$ is a SSYT of shape $\lambda$ whose horizontal strip labeled $d$ is the same as the one in $j$.
To show this, let us look at the action of $J^{(l)}_\pm$. Since $J^{(l)}_-$ is the Hermitian conjugate of $J^{(l)}_+$, it suffices to look at $J^{(l)}_+$. We work with the unnormalized states given above.
where $g$ runs over the transversal of $H'$ in $G$, and $H'$ is the Young subgroup that differs from $H$ in the number of $l$s and $(l+1)$s, with one more $l$ and one less $l+1$ than $H$; i.e., if the content corresponding to $H$ is $\mu = (\mu_1, \ldots, \mu_k)$, then that of $H'$ is $\mu' = (\mu_1, \ldots, \mu_l + 1, \mu_{l+1} - 1, \ldots, \mu_k)$. The coefficient $c_g$ can be calculated to be
$$c_g = [\lambda(g J H)]_{i,j}, \tag{4.6}$$
where the operator $J$ is defined as where $\sigma_l = \mu_1 + \cdots + \mu_l$, and similarly for $\sigma_{l+1}$. This shows that the resulting state is a sum of SSYTs obtained by taking a SYT $j$ and applying $H$ and an element of $J$. This means that from the SSYT $j$, we get a sum that involves the SSYTs obtained by replacing some box labeled $l+1$ by $l$, as long as this gives a valid SSYT (if two boxes labeled $l$ end up in the same column, then by Lemma 2 that SSYT is taken to zero). Now we show that the action of $J^{(l)}_0$ is as described in section 2.5. To see this, note that $E_{l,l}$ acting on the computational basis counts the number of $l$s in the basis vector, which means that it counts the number of boxes labeled $l$ in the SSYT basis. Therefore since J
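Which SSYTs can appear when $J^{(l)}_+$ acts is easy to enumerate. The plain-Python sketch below (hypothetical helper names; a SSYT is stored as a tuple of rows) replaces one box labeled $l+1$ by $l$ in every possible way and keeps only the valid SSYTs; it says nothing about the coefficients $c_g$. Because only the letters $l$ and $l+1$ change, the boxes labeled $d$ are untouched whenever $l \le d-2$, consistent with the claim that the horizontal strip labeled $d$ is left intact.

```python
def is_ssyt(rows):
    """Rows weakly increase; columns strictly increase (rows given as tuples of letters)."""
    rows_ok = all(r[c] <= r[c + 1] for r in rows for c in range(len(r) - 1))
    cols_ok = all(rows[i][c] < rows[i + 1][c]
                  for i in range(len(rows) - 1) for c in range(len(rows[i + 1])))
    return rows_ok and cols_ok


def raising_moves(rows, l):
    """SSYTs reachable by replacing a single box labeled l+1 with l."""
    results = []
    for i, row in enumerate(rows):
        for c, letter in enumerate(row):
            if letter == l + 1:
                new_row = row[:c] + (l,) + row[c + 1:]
                candidate = rows[:i] + (new_row,) + rows[i + 1:]
                if is_ssyt(candidate) and candidate not in results:
                    results.append(candidate)
    return results


def content(rows, d):
    """Number of boxes labeled 1, ..., d."""
    return tuple(sum(r.count(x) for r in rows) for x in range(1, d + 1))


def boxes_with(rows, letter):
    return {(i, c) for i, r in enumerate(rows) for c, x in enumerate(r) if x == letter}


j = ((1, 1, 2), (2, 3))      # hypothetical SSYT of shape (3, 2) and content (2, 2, 1)
print(raising_moves(j, 1))  # [((1, 1, 1), (2, 3))]
print(raising_moves(j, 2))  # [((1, 1, 2), (2, 2))]
```

In the first call, replacing the 2 in the second row would put two 1s in the first column, so only one move survives, in line with the lemma-based argument above.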

Conclusions
We have presented an efficient algorithm for a high dimensional Schur transform that runs in time $O(\mathrm{poly}(n, \log d, \log 1/\epsilon))$. This is an exponential improvement in the dependence on the dimension $d$ over the prior work of Bacon, Chuang and Harrow [5]. As mentioned above, Harrow's thesis [18] contains a way to make the unitary group approach of [5] polynomial in $\log d$. Our algorithm is novel in that it uses the representation theory of the symmetric group rather than that of the unitary group. Another interesting feature is that it uses only the quantum Fourier transform and generalized phase estimation (which is itself based on the QFT) and essentially no new tools. A potentially useful feature of this algorithm, which could serve as a primitive for other problems, is the circuit for a Fourier transform over induced representations. Several permutation modules, which are induced representations, encode important problems, including element distinctness and collision finding. The subroutines that block diagonalize permutation modules could provide Fourier analytic algorithms for these problems and could generalize to other problems with permutational symmetry.