Introduction to Haar Measure Tools in Quantum Information: A Beginner’s Tutorial

The Haar measure plays a vital role in quantum information, but its study often requires a deep understanding of representation theory, posing a challenge for beginners. This tutorial aims to provide a basic introduction to Haar measure tools in quantum information, utilizing only basic knowledge of linear algebra and thus aiming to make this topic more accessible. The tutorial begins by introducing the Haar measure with a specific emphasis on characterizing the moment operator, an essential element for computing integrals over the Haar measure. It also covers properties of the symmetric subspace and introduces helpful tools like tensor network diagrammatic notation, which aid in visualizing and simplifying calculations. Next, the tutorial explores the concept of unitary designs, providing equivalent definitions, and subsequently explores approximate notions of unitary designs, shedding light on the relationships between these different notions. Practical examples of Haar measure calculations are illustrated, including the derivation of well-known formulas such as the twirling of a quantum channel. Lastly, the tutorial showcases the applications of Haar measure calculations in quantum machine learning and classical shadow tomography.

Although the Haar measure serves as a fundamental tool in quantum information, its study can be challenging for beginners due to its reliance on advanced concepts from representation theory. In order to make this topic more accessible, this tutorial provides an introduction to Haar measure tools using only concepts from linear algebra. Throughout the tutorial, we intentionally avoid delving into representation theory to enhance accessibility. For further reading, one can refer, for example, to [39, 40, 45-51].
The tutorial is structured as follows. Section 2 offers an overview of the notation and preliminary concepts that will be used throughout the tutorial. Section 3 focuses on introducing the Haar measure, with a specific emphasis on characterizing the moment operator, a crucial quantity for computing integrals over the Haar measure. In Section 4, the symmetric and antisymmetric subspaces are introduced, highlighting their properties and their connection to the Haar measure over pure states. Sections 5 and 6 introduce tools that facilitate calculations in Haar measure applications. Section 5 presents the vectorization formalism, while Section 6 introduces the tensor network diagrammatic notation, providing visual representations that enhance comprehension and streamline computations. Section 7 introduces the concept of unitary designs, a method employed to mimic some properties of the Haar measure, facilitating efficient protocols in quantum computing. Building upon this foundation, Section 8 explores approximate notions of unitary designs, elucidating the relationships between the different introduced notions. In Section 9, the tutorial showcases practical examples of Haar measure calculations. We begin by deriving well-known formulas such as the twirling of a quantum channel, average gate fidelity, average purity of reduced states in bipartite systems, and Haar averages of observable expectation values. Furthermore, we demonstrate how these expected value calculations can be translated into probability statements using concentration inequalities. Finally, we delve into two in-depth applications: the Barren Plateaus phenomenon in Quantum Machine Learning and Classical Shadow Tomography, both of which rely on the theory of k-designs.

Notation and Preliminaries
We use the following notation throughout this tutorial. L(C^d) is the set of linear operators that act on the d-dimensional complex vector space C^d, while Herm(C^d) is the set of Hermitian operators on C^d. The identity operator is denoted by I, and we define the operator I := I ⊗ I as the tensor product of two identity operators. The unitary group is denoted by U(d) and is defined as the set of operators U ∈ L(C^d) such that U†U = I. Furthermore, we use the notation [d] to denote the set of integers from 1 to d, i.e., [d] := {1, . . ., d}.
Let v ∈ C^d be a vector, and let p ∈ [1, ∞]. The p-norm of v is denoted by ∥v∥_p and is defined as ∥v∥_p := (Σ_{i=1}^d |v_i|^p)^{1/p}. The Schatten p-norm of a matrix A ∈ L(C^d) is given by ∥A∥_p := Tr((√(A†A))^p)^{1/p}, which corresponds to the p-norm of the vector of singular values of A. The trace norm and the Hilbert-Schmidt norm are important instances of Schatten p-norms and are respectively denoted as ∥•∥_1 and ∥•∥_2. The Hilbert-Schmidt norm is induced by the Hilbert-Schmidt scalar product ⟨A, B⟩_HS := Tr(A†B) for A, B ∈ L(C^d). The infinity norm, denoted as ∥•∥_∞, of a matrix is defined as its largest singular value. This norm can be understood as the limit of the Schatten p-norm of the matrix as p approaches infinity. Important facts about the Schatten p-norms that will be used in the tutorial are the following. For all matrices A and 1 ≤ p ≤ q, we have ∥A∥_q ≤ ∥A∥_p and ∥A∥_p ≤ rank(A)^{(1/p − 1/q)} ∥A∥_q. Additionally, for all unitaries U and V and any matrix A, we have the unitary invariance property ∥UAV∥_p = ∥A∥_p. Furthermore, we also have the tensor product property ∥A ⊗ B∥_p = ∥A∥_p ∥B∥_p and the submultiplicativity property ∥AB∥_p ≤ ∥A∥_p ∥B∥_p for all matrices A and B.
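These norm facts are easy to sanity-check numerically. The minimal sketch below (the helper name `schatten_norm` is ours, not from the tutorial) computes Schatten norms from singular values and verifies monotonicity, tensor multiplicativity, and submultiplicativity on random matrices:

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm: the vector p-norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s.max())
    return float((s**p).sum() ** (1.0 / p))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Monotonicity: ||A||_q <= ||A||_p for p <= q.
assert schatten_norm(A, 2) <= schatten_norm(A, 1)
# Tensor-product property: ||A ⊗ B||_p = ||A||_p * ||B||_p.
assert np.isclose(schatten_norm(np.kron(A, B), 2),
                  schatten_norm(A, 2) * schatten_norm(B, 2))
# Submultiplicativity: ||AB||_p <= ||A||_p * ||B||_p.
assert schatten_norm(A @ B, 1) <= schatten_norm(A, 1) * schatten_norm(B, 1)
```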
We use the bra-ket notation, where we denote a vector v ∈ C^d using the ket notation |v⟩ and its adjoint using the bra notation ⟨v|. We refer to a vector |ψ⟩ ∈ C^d as a (pure) state if ∥|ψ⟩∥_2 = 1. The canonical basis of C^d is denoted by {|i⟩}_{i=1}^d. We define the non-normalized maximally entangled state |Ω⟩ as |Ω⟩ := Σ_{i=1}^d |i⟩ ⊗ |i⟩ = Σ_{i=1}^d |i, i⟩. We define the set of density matrices (or quantum states) as S(C^d) := {ρ ∈ L(C^d) : ρ ≥ 0, Tr(ρ) = 1}. A quantum channel Φ : L(C^d) → L(C^d) is a linear map that is completely positive and trace-preserving. In particular, the complete positivity condition means that for all positive operators σ ∈ L(C^d ⊗ C^D) and all D ∈ N, the operator (Φ ⊗ I)(σ) is also positive. Here, I : L(C^D) → L(C^D) denotes the identity map, which simply maps any A ∈ L(C^D) to itself. Additionally, every quantum channel Φ can be expressed in terms of d² Kraus operators, i.e., there exist operators {K_i}_{i=1}^{d²} such that Φ(•) = Σ_i K_i (•) K_i†, where Σ_i K_i† K_i = I in order to satisfy the trace-preserving property.
Haar measure and moment operator

Definition 1 (Haar measure). The Haar measure on the unitary group U(d) is the unique probability measure µ_H that is both left and right invariant over the group U(d), i.e., for all integrable functions f and for all V ∈ U(d), we have:

∫_{U(d)} f(U) dµ_H(U) = ∫_{U(d)} f(V U) dµ_H(U) = ∫_{U(d)} f(U V) dµ_H(U).

In fact, for compact groups such as the unitary group, there exists a unique probability measure that is both left and right invariant under group multiplication [52]. However, we will not delve deeper into the theory of compact groups and measures in this tutorial, as our focus is on applications of the tools we describe. For a more comprehensive treatment of this topic, we recommend referring to [51, 52].
The Haar measure is a probability measure, satisfying the properties ∫_S 1 dµ_H(U) ≥ 0 for all sets S ⊆ U(d) and ∫_{U(d)} 1 dµ_H(U) = 1. Consequently, we can denote the integral of any function f(U) over the Haar measure as the expected value of f(U) with respect to the probability measure µ_H, denoted as E_{U∼µ_H}[f(U)]. When f(U) is a matrix function, the expected value is understood to be the expected value of each of its entries. We can prove the following proposition, which shows that any integral involving an unbalanced product of matrix entries of U and U* must vanish.

Proposition 2. For all k_1, k_2 ∈ N with k_1 ≠ k_2, we have:

E_{U∼µ_H}[U^{⊗k_1} ⊗ U*^{⊗k_2}] = 0.

Proof. We can use the right invariance, multiplying U by the unitary exp(iπ/(k_1−k_2)) I:

E_{U∼µ_H}[U^{⊗k_1} ⊗ U*^{⊗k_2}] = exp(iπ) E_{U∼µ_H}[U^{⊗k_1} ⊗ U*^{⊗k_2}] = −E_{U∼µ_H}[U^{⊗k_1} ⊗ U*^{⊗k_2}],

from which the claim follows.

The significance of investigating tensor powers of unitaries and their adjoints, i.e. U^{⊗k} ⊗ U*^{⊗k}, becomes evident when considering applications (Section 9). In brief, within various scenarios, the computation of integrals over the Haar measure of the relevant quantity reduces to evaluating integrals of homogeneous polynomials of degree k in the matrix elements of U and U*, for some k ∈ N. Remarkably, any such polynomial can be expressed as Tr(A U^{⊗k} ⊗ U*^{⊗k}), where A is a matrix that contains the coefficients of the polynomial.
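Statements like Proposition 2 can be checked by Monte Carlo sampling of Haar-random unitaries. A standard construction (a sketch; the name `haar_unitary` is ours) is the QR decomposition of a complex Gaussian matrix with a phase correction on the diagonal of R:

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a Haar-random unitary: QR of a complex Gaussian + phase fix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Multiply each column by the phase of the corresponding R diagonal
    # entry, so the resulting distribution is exactly Haar.
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(1)
d, n_samples = 2, 20000
# An unbalanced product of matrix entries (here a single entry U_{00},
# i.e. k1 = 1, k2 = 0) averages to zero over the Haar measure.
avg = np.mean([haar_unitary(d, rng)[0, 0] for _ in range(n_samples)])
assert abs(avg) < 0.05
```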
This next proposition will be useful in our subsequent calculations. It states that in Haar integrals, we are free to change the variable U to U†.

Proposition 3. For all integrable functions f defined on U(d), we have that:

∫_{U(d)} f(U) dµ_H(U) = ∫_{U(d)} f(U†) dµ_H(U).

Proof. Let µ† be the probability measure defined by ∫ f(U) dµ†(U) := ∫ f(U†) dµ_H(U). We will now show that µ† is right and left invariant, which implies that it coincides with µ_H because of the uniqueness of the Haar measure. To this end, let V be a fixed unitary matrix:

∫ f(U V) dµ†(U) = ∫ f(U† V) dµ_H(U) = ∫ f((V† U)†) dµ_H(U) = ∫ f(U†) dµ_H(U) = ∫ f(U) dµ†(U),

where in the third equality we used the left invariance of µ_H. This shows that µ† is right-invariant, as claimed. Left-invariance follows similarly.
A quantity that plays a crucial role in our analysis is the k-th moment operator, where k is a natural number.

Definition 4 (k-th moment operator). The k-th moment operator, with respect to the probability measure µ_H, is defined as

M^{(k)}_{µ_H}(O) := ∫_{U(d)} U^{⊗k} O (U†)^{⊗k} dµ_H(U), for O ∈ L((C^d)^{⊗k}).

By characterizing the moment operator and computing its matrix elements, we can explicitly evaluate integrals over the Haar measure. For example, let O = |i_1, . . ., i_k⟩⟨j_1, . . ., j_k| with i_a, j_a ∈ [d] for all a ∈ [k]; then we have:

⟨l_1, . . ., l_k| M^{(k)}_{µ_H}(O) |m_1, . . ., m_k⟩ = ∫_{U(d)} U_{l_1 i_1} · · · U_{l_k i_k} U*_{m_1 j_1} · · · U*_{m_k j_k} dµ_H(U),

where l_a, m_a ∈ [d] for all a ∈ [k]. In order to characterize the moment operator, we need to define the k-th order commutant of a set of matrices S.

Definition 5 (Commutant).
Given S ⊆ L(C^d), we define its k-th order commutant as

Comm(S, k) := {A ∈ L((C^d)^{⊗k}) : [A, U^{⊗k}] = 0 for all U ∈ S}.

It is worth noting that Comm(S, k) is a vector subspace. In the following we will show that the moment operator is the orthogonal projector onto the commutant of the unitary group Comm(U(d), k) with respect to the Hilbert-Schmidt inner product. In order to do so, we first prove the following lemma.

Lemma 6 (Properties of the moment operator).
The moment operator M^{(k)}_{µ_H} satisfies the following properties:

1. It is linear, trace-preserving, and self-adjoint with respect to the Hilbert-Schmidt inner product.

2. For all O ∈ L((C^d)^{⊗k}), we have M^{(k)}_{µ_H}(O) ∈ Comm(U(d), k).

3. For all A ∈ Comm(U(d), k), we have M^{(k)}_{µ_H}(A) = A.
Proof.
1. Linearity and the trace-preserving property follow easily from Definition 4. To show that the moment operator is self-adjoint, we need to prove that:

⟨M^{(k)}_{µ_H}(A), B⟩_HS = ⟨A, M^{(k)}_{µ_H}(B)⟩_HS

for all A, B ∈ L((C^d)^{⊗k}). This follows because:

⟨A, M^{(k)}_{µ_H}(B)⟩_HS = ∫ Tr(A† U^{⊗k} B (U†)^{⊗k}) dµ_H(U) = ∫ Tr(((U†)^{⊗k} A U^{⊗k})† B) dµ_H(U) = ⟨M^{(k)}_{µ_H}(A), B⟩_HS,

where in the last step we used Proposition 3.
2. For all V ∈ U(d), we have that:

V^{⊗k} M^{(k)}_{µ_H}(O) (V†)^{⊗k} = ∫ (V U)^{⊗k} O ((V U)†)^{⊗k} dµ_H(U) = M^{(k)}_{µ_H}(O),

where we used the left invariance of the Haar measure. Hence [M^{(k)}_{µ_H}(O), V^{⊗k}] = 0.
3. Since A ∈ Comm(U(d), k), we have:

M^{(k)}_{µ_H}(A) = ∫ U^{⊗k} A (U†)^{⊗k} dµ_H(U) = ∫ A U^{⊗k} (U†)^{⊗k} dµ_H(U) = A.

Theorem 7 (Projector onto the commutant). The moment operator is the orthogonal projector onto the commutant Comm := Comm(U(d), k) with respect to the Hilbert-Schmidt inner product. In particular, let P_1, . . ., P_dim(Comm) be an orthonormal basis of Comm and let O ∈ L((C^d)^{⊗k}). Then, we have:

M^{(k)}_{µ_H}(O) = Σ_{i=1}^{dim(Comm)} Tr(P_i† O) P_i.

Proof. Let us extend the orthonormal basis of the commutant with orthonormal operators P_i for i ∈ {dim(Comm) + 1, . . ., dim(V)}, where V := L((C^d)^{⊗k}). This extended basis forms an orthonormal basis for V. Therefore we have:

M^{(k)}_{µ_H}(O) = Σ_{i=1}^{dim(V)} Tr(P_i† M^{(k)}_{µ_H}(O)) P_i = Σ_{i=1}^{dim(V)} Tr(M^{(k)}_{µ_H}(P_i)† O) P_i = Σ_{i=1}^{dim(Comm)} Tr(M^{(k)}_{µ_H}(P_i)† O) P_i = Σ_{i=1}^{dim(Comm)} Tr(P_i† O) P_i,

where in the second line we used the fact that the moment operator is self-adjoint (Lemma 6.1), in the third line that M^{(k)}_{µ_H}(O) ∈ Comm (Lemma 6.2) and that the P_i with i ∈ {dim(Comm) + 1, . . ., dim(V)} are in its orthogonal complement, and in the fourth line that P_i ∈ Comm for i ∈ {1, . . ., dim(Comm)} together with Lemma 6.3.
We have just shown that the moment operator is intimately related to the k-th order commutant of the unitary group, i.e. the set of all matrices that commute with U^{⊗k} for any unitary U. A class of operations that certainly commutes with U^{⊗k} is the exchange of tensor factors. For this purpose, we will now define the operators that implement such transformations, namely the permutation operators.
Definition 8 (Permutation operators). Given an element π ∈ S_k of the symmetric group S_k, we define the permutation matrix V_d(π) to be the unitary matrix that satisfies:

V_d(π) |ψ_1⟩ ⊗ · · · ⊗ |ψ_k⟩ = |ψ_{π^{-1}(1)}⟩ ⊗ · · · ⊗ |ψ_{π^{-1}(k)}⟩ for all |ψ_1⟩, . . ., |ψ_k⟩ ∈ C^d.

Note that from this definition it follows that V_d(π) V_d(σ) = V_d(πσ). Equivalently, we can write the permutation matrix as:

V_d(π) = Σ_{i_1, . . ., i_k ∈ [d]} |i_{π^{-1}(1)}, . . ., i_{π^{-1}(k)}⟩⟨i_1, . . ., i_k|.

Thus, we have the following property: V_d(π)† = V_d(π^{-1}). A crucial and much celebrated result of representation theory is that the permutation operators characterize all possible matrices in the commutant: this is the Schur-Weyl duality.
Theorem 9 (Schur-Weyl duality [53]). The k-th order commutant of the unitary group is the span of the permutation operators associated to S_k:

Comm(U(d), k) = span{V_d(π) : π ∈ S_k}.

We will omit the proof of this theorem here, except for the cases k = 1 and k = 2, which we will show later in this section. Interested readers can refer to Refs. [46, 53, 54] for a detailed exposition. However, we can easily check that span{V_d(π) : π ∈ S_k} ⊆ Comm(U(d), k). To see why this is true, consider an arbitrary permutation V_d(π) with π ∈ S_k and U ∈ U(d). We have:

V_d(π) U^{⊗k} |ψ_1⟩ ⊗ · · · ⊗ |ψ_k⟩ = U|ψ_{π^{-1}(1)}⟩ ⊗ · · · ⊗ U|ψ_{π^{-1}(k)}⟩ = U^{⊗k} V_d(π) |ψ_1⟩ ⊗ · · · ⊗ |ψ_k⟩

for all |ψ_1⟩, . . ., |ψ_k⟩ ∈ C^d. Hence we have that [V_d(π), U^{⊗k}] = 0 for all π ∈ S_k, from which the claim follows.
The permutation matrices form a basis for the k-th order commutant of the unitary group, capturing its essential structure. However, it is important to note that they are not orthonormal with respect to the Hilbert-Schmidt inner product. Therefore, we cannot directly apply Theorem 7. Nevertheless, we have an alternative approach that allows us to evaluate the moment operator and, consequently, compute integrals over the Haar measure. The following theorem presents a recipe for accomplishing this task.
Theorem 10. For all O ∈ L((C^d)^{⊗k}), the moment operator can be written as

M^{(k)}_{µ_H}(O) = Σ_{π ∈ S_k} c_π(O) V_d(π),   (30)

where the coefficients c_π(O) can be determined by solving the following linear system of k! equations:

Tr(V_d†(σ) O) = Σ_{π ∈ S_k} c_π(O) Tr(V_d†(σ) V_d(π)), for all σ ∈ S_k.

This system always has at least one solution.
Proof. Equation (30) follows from Lemma 6.2, which states that M^{(k)}_{µ_H}(O) ∈ Comm(U(d), k), and from Schur-Weyl duality (Theorem 9). To obtain the linear system of equations, we begin by multiplying both sides of Eq. (30) by V_d†(σ) and taking the trace for all σ ∈ S_k. This yields:

Σ_{π ∈ S_k} c_π(O) Tr(V_d†(σ) V_d(π)) = Tr(V_d†(σ) M^{(k)}_{µ_H}(O)) = Tr(M^{(k)}_{µ_H}(V_d†(σ) O)) = Tr(V_d†(σ) O),

where in the first equality we used Eq. (30), in the second equality we used that V_d†(σ) commutes with U^{⊗k}, and in the last equality we used the fact that the moment operator is trace-preserving (Lemma 6.1). Since M^{(k)}_{µ_H}(O) lies in the span of the permutation operators, a solution to this linear system of equations always exists.
The previous theorem provides an explicit expression for the coefficients of the moment operator in the permutation basis as

c(O) = G⁺ b(O), with b_σ(O) := Tr(V_d†(σ) O).

Here, G is the Gram matrix, i.e. the matrix with coefficients G_{σ,π} := Tr(V_d†(σ) V_d(π)), and G⁺ is its pseudo-inverse. This result allows us to express the moment operator in terms of the so-called Weingarten coefficients Wg(π^{-1}σ, d) := (G⁺)_{π,σ} [51, 55-57], as follows:

M^{(k)}_{µ_H}(O) = Σ_{π,σ ∈ S_k} Wg(π^{-1}σ, d) Tr(V_d†(σ) O) V_d(π).   (35)

The Weingarten coefficients Wg(π^{-1}σ, d) can be written in terms of characters of the symmetric group. However, we will not explore this aspect here. For reference, see [51, 55, 59, 60]. The Gram matrix G has a simple expression in terms of the cycle structure of the permutations, given by:

G_{σ,π} = Tr(V_d(σ^{-1}π)) = d^{#cycles(σ^{-1}π)},

where the last equality follows from the fact that Tr(V_d(τ)) = Σ_{i_1, . . ., i_k ∈ [d]} ⟨i_1, . . ., i_k| V_d(τ) |i_1, . . ., i_k⟩ and observing that this sum has d^{#cycles(τ)} nonvanishing terms and they are all equal to 1. This fact is also evident in the tensor network notation that we will introduce in Section 6.
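For k = 2 the Gram matrix and the Weingarten coefficients can be computed explicitly. The sketch below (all helper names are ours) builds the two permutation operators of S_2, forms G, and recovers the familiar values Wg(e, d) = 1/(d² − 1) and Wg(swap, d) = −1/(d(d² − 1)):

```python
import numpy as np
from itertools import permutations

def perm_operator(p, d):
    """Matrix of the permutation operator V_d(p) acting on (C^d)^{⊗k}."""
    k = len(p)
    dims = [d] * k
    V = np.zeros((d**k, d**k))
    for idx in np.ndindex(*dims):
        out = [0] * k
        for a in range(k):
            out[p[a]] = idx[a]   # tensor factor a is sent to slot p[a]
        V[np.ravel_multi_index(tuple(out), dims),
          np.ravel_multi_index(idx, dims)] = 1.0
    return V

d, k = 3, 2
perms = list(permutations(range(k)))   # identity first, then the swap
# Gram matrix G_{sigma,pi} = Tr(V(sigma)^† V(pi)) = d^{#cycles(sigma^{-1} pi)}.
G = np.array([[np.trace(perm_operator(p, d).T @ perm_operator(q, d))
               for q in perms] for p in perms])
Wg = np.linalg.pinv(G)   # matrix of Weingarten coefficients

assert np.allclose(G, [[d**2, d], [d, d**2]])
assert np.isclose(Wg[0, 0], 1 / (d**2 - 1))
assert np.isclose(Wg[0, 1], -1 / (d * (d**2 - 1)))
```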
Proposition 11. The permutation matrices {V_d(π) : π ∈ S_k} are linearly independent if and only if k ≤ d.

Proof. Let us consider first the case where k ≤ d. We assume that there exist complex coefficients α_π ∈ C for all π ∈ S_k such that the following equation holds:

Σ_{π ∈ S_k} α_π V_d(π) = 0.

Since k ≤ d, we can choose k distinct elements i_1, . . ., i_k ∈ [d] and apply to the right of both sides of the previous equation the state |i_1, . . ., i_k⟩. For all σ ∈ S_k, we multiply to the left by ⟨i_{σ^{-1}(1)}, . . ., i_{σ^{-1}(k)}| and deduce α_σ = 0. Thus, we have shown linear independence. If k > d, then consider the operator:

A := Σ_{π ∈ S_k} sgn(π) V_d(π).

Now we show this linear combination is the zero operator, proving linear dependence. We consider the action of A on an arbitrary product basis state |i_1⟩ ⊗ · · · ⊗ |i_k⟩. Since k > d, at least two tensor factors must have matching entries, i.e., there exist l ≠ m ∈ [k] such that i_l = i_m. Due to anti-symmetrization, the output vector has to be the zero vector. In fact, denoting by τ the transposition of l and m and pairing each even permutation π with the odd permutation πτ, we have:

A |i_1, . . ., i_k⟩ = Σ_{π even} (V_d(π) − V_d(πτ)) |i_1, . . ., i_k⟩ = 0,

since V_d(πτ) |i_1, . . ., i_k⟩ = V_d(π) |i_1, . . ., i_k⟩ when i_l = i_m. In the following we define the identity permutation operator I and the Flip operator F (also often referred to as the SWAP operator), which are the permutation operators corresponding to the elements of the permutation group S_2.
Definition 12 (Identity and Flip operators). The identity permutation operator I is defined as the linear operator that leaves any tensor product state |ψ⟩ ⊗ |ϕ⟩ unchanged, that is:

I |ψ⟩ ⊗ |ϕ⟩ = |ψ⟩ ⊗ |ϕ⟩.

The Flip operator F is defined as the linear operator that interchanges the order of the tensor factors of any product state |ψ⟩ ⊗ |ϕ⟩, that is:

F |ψ⟩ ⊗ |ϕ⟩ = |ϕ⟩ ⊗ |ψ⟩.

Writing the identity and the Flip in the computational basis, we have:

I = Σ_{i,j=1}^d |i, j⟩⟨i, j|, F = Σ_{i,j=1}^d |i, j⟩⟨j, i|.

From this, it is evident that the Flip operator is Hermitian. Another key property of the Flip operator is the swap-trick, which states that for all operators A, B ∈ L(C^d), we have:

Tr((A ⊗ B) F) = Tr(AB).

This property can be easily verified using the definition of the Flip operator. The swap-trick is particularly useful as it allows us to simplify calculations involving tensor products and permutations, and it will be used extensively in the subsequent sections.
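The swap-trick is a one-liner to verify numerically; the construction of F below follows its computational-basis expression:

```python
import numpy as np

d = 3
# Flip (SWAP) operator on C^d ⊗ C^d: F |i, j> = |j, i>.
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[j * d + i, i * d + j] = 1.0

rng = np.random.default_rng(2)
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))

# Swap-trick: Tr((A ⊗ B) F) = Tr(AB).
lhs = np.trace(np.kron(A, B) @ F)
rhs = np.trace(A @ B)
assert np.isclose(lhs, rhs)
```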
We now present a corollary of Theorem 10, which plays a crucial role in many calculations involving Haar integrals in the context of quantum information, as we will see in Section 9. In the following corollary we assume d > 1 (which makes sense for an n-qubit system, since d = 2^n > 1).
Corollary 13 (First and second moment). Given O ∈ L(C^d), we have:

M^{(1)}_{µ_H}(O) = ∫ U O U† dµ_H(U) = (Tr(O)/d) I.

Given O ∈ L((C^d)^{⊗2}), we have:

M^{(2)}_{µ_H}(O) = ∫ U^{⊗2} O (U†)^{⊗2} dµ_H(U) = c_{I,O} I + c_{F,O} F,

where:

c_{I,O} = (Tr(O) − d^{-1} Tr(OF)) / (d² − 1), c_{F,O} = (Tr(OF) − d^{-1} Tr(O)) / (d² − 1).

Proof. According to Theorem 10, the first-order moment operator is proportional to the identity operator (the only permutation operator associated with the permutation group of one element S_1), i.e.
M^{(1)}_{µ_H}(O) = c_I I,

with c_I ∈ C. Taking the trace of both sides, we deduce that c_I = Tr(O)/d. Moreover, using Theorem 10, we can see that the second-order moment operator is a linear combination of the two permutation operators associated with the permutation group S_2, which are the identity I and the Flip F:

M^{(2)}_{µ_H}(O) = c_{I,O} I + c_{F,O} F,

with c_{I,O}, c_{F,O} ∈ C. To find these numbers, we left-multiply both sides by I and take the trace, which gives us:

Tr(O) = c_{I,O} d² + c_{F,O} d,

where we used the fact that Tr(I) = d² and Tr(F) = d. Similarly, by left-multiplying by F and taking the trace, we obtain the linear system of equations:

Tr(OF) = c_{I,O} d + c_{F,O} d².

Solving this system, we obtain c_{I,O} = (Tr(O) − d^{-1} Tr(OF))/(d² − 1) and c_{F,O} = (Tr(OF) − d^{-1} Tr(O))/(d² − 1). We have shown how to obtain the formulas for the first and second moment operator by utilizing the property that the moment operator lies in the commutant (Lemma 6.2) and that the commutant is the span of the permutation matrices, Comm(U(d), k) = span{V_d(π) : π ∈ S_k} for k = 1, 2, which are particular instances of Schur-Weyl duality. Now, we will provide explicit proofs for these cases.
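The first-moment formula can be confirmed by sampling (a sketch with our own helper `haar_unitary`; the second moment can be checked the same way, only with slower convergence):

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a Haar-random unitary: QR of a complex Gaussian + phase fix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(3)
d, n = 2, 20000
O = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

# First moment: E_U [U O U†] = Tr(O)/d * I.
est = sum((U := haar_unitary(d, rng)) @ O @ U.conj().T for _ in range(n)) / n
exact = np.trace(O) / d * np.eye(d)
assert np.allclose(est, exact, atol=0.1)
```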

Example 14.
Comm(U(d), k = 1) = span{I}, (57)

Proof. Consider an element A ∈ Comm(U(d), k = 1). In the canonical basis decomposition, A can be written as:

A = Σ_{i,j=1}^d A_{i,j} |i⟩⟨j|,

where A_{i,j} := ⟨i| A |j⟩. Since A is in the commutant, U A U† = A holds for all unitary matrices U. By choosing a unitary U that sends the vector |k⟩ to its negative, |k⟩ → −|k⟩, while leaving the other basis vectors unchanged, it follows that, for all k ≠ l ∈ [d], the term A_{k,l} |k⟩⟨l| is equal to its negative, and hence A_{k,l} = 0. Therefore, if A is in the commutant, then A must be diagonal, A = Σ_{k=1}^d A_{k,k} |k⟩⟨k|. Moreover, by choosing a unitary matrix U that exchanges the vectors |k⟩ and |l⟩ while leaving the other basis states unchanged, we derive that all diagonal elements A_{k,k} must be equal to each other for all k ∈ [d]. Thus, we have A = c Σ_{i=1}^d |i⟩⟨i| = cI, where c := A_{1,1}. Therefore an operator is in the commutant if and only if it is proportional to the identity I. Thus, we have shown the first formula we aimed to prove.
Consider now Q ∈ Comm(U(d), k = 2). Q can be decomposed as:

Q = Σ_{i,j,k,l=1}^d Q_{i,j;k,l} |i, j⟩⟨k, l|,

where Q_{i,j;k,l} := ⟨i, j| Q |k, l⟩. Since Q is in the commutant, U^{⊗2} Q (U†)^{⊗2} = Q holds for all unitary matrices U. For all sets of indices {i, j} ≠ {k, l}, let m be an index such that m ∈ {i, j} but m ∉ {k, l}, or vice versa (there is at least one). By choosing unitaries that send a basis vector to its negative, while leaving the other basis vectors unchanged, it follows that, for all {i, j} ≠ {k, l}, the terms Q_{i,j;k,l} |i, j⟩⟨k, l| are equal to their negatives, and hence Q_{i,j;k,l} = 0. Therefore, if Q is in the commutant, it must have the following form:

Q = Σ_{i,j} Q_{i,j;i,j} |i, j⟩⟨i, j| + Σ_{i≠j} Q_{i,j;j,i} |i, j⟩⟨j, i|.

Again, by choosing a unitary matrix U that exchanges the vectors |i, j⟩ with a different vector |k, l⟩ while leaving the other basis states unchanged, and using U^{⊗2} Q (U†)^{⊗2} = Q, we derive that:

Q = a Σ_i |i, i⟩⟨i, i| + b Σ_{i≠j} |i, j⟩⟨i, j| + c Σ_{i≠j} |i, j⟩⟨j, i|,

where a := Q_{1,1;1,1}, b := Q_{1,2;1,2} and c := Q_{1,2;2,1}. Note that we can also write Q as:

Q = b I + c F + (a − b − c) Σ_i |i, i⟩⟨i, i|.

Since I and F commute with U^{⊗2} for all U ∈ U(d), Q is in the commutant if and only if (a − b − c) Σ_i |i, i⟩⟨i, i| is also in the commutant. We now show that Σ_i |i, i⟩⟨i, i| is not in the commutant, which implies that Q is in the commutant if and only if (a − b − c) = 0, i.e., if and only if Q is a linear combination of only I and F. To see this, suppose that Σ_i |i, i⟩⟨i, i| is in the commutant. Then, it should commute with U^{⊗2} for all unitaries U. We can choose the (Discrete Fourier Transform) unitary U such that U |j⟩ = d^{-1/2} Σ_{k=1}^d ω^{(j−1)(k−1)} |k⟩, where ω := exp(i 2π/d), and obtain the following relation:

U^{⊗2} (Σ_i |i, i⟩⟨i, i|) (U†)^{⊗2} = Σ_i |i, i⟩⟨i, i|.

However, computing the expectation value on both sides of the previous equation over the state |1, 1⟩ leads to 1/d = 1, which is false for every d > 1 (for d = 1, the relation to prove follows immediately). Therefore, we conclude that Σ_i |i, i⟩⟨i, i| is not in the commutant, and hence Q is in the commutant if and only if Q is a linear combination of I and F.

Symmetric subspace
In this section, we introduce the symmetric subspace, which plays a crucial role when analyzing Haar random states. For a more in-depth analysis, see [61]. The symmetric subspace can be defined as the set of states |ψ⟩ in (C^d)^{⊗k} that are invariant under permutations of their constituent subsystems. Formally, we define the symmetric subspace as follows:

Sym^k(C^d) := {|ψ⟩ ∈ (C^d)^{⊗k} : V_d(π) |ψ⟩ = |ψ⟩ for all π ∈ S_k}.

To facilitate our analysis, we also define the operator P^{(d,k)}_sym as follows:

P^{(d,k)}_sym := (1/k!) Σ_{π ∈ S_k} V_d(π).

Theorem 16 (Projector on Sym^k(C^d)). P^{(d,k)}_sym is the orthogonal projector on the symmetric subspace Sym^k(C^d).
Proof. We start by observing that V_d(π) P^{(d,k)}_sym = P^{(d,k)}_sym for all π ∈ S_k, since composing with a fixed permutation merely reorders the sum over S_k. Using this, we can show that P^{(d,k)}_sym is idempotent:

(P^{(d,k)}_sym)² = (1/k!) Σ_{π ∈ S_k} V_d(π) P^{(d,k)}_sym = P^{(d,k)}_sym.

Furthermore, we have (P^{(d,k)}_sym)† = (1/k!) Σ_{π ∈ S_k} V_d(π^{-1}) = P^{(d,k)}_sym. Therefore, P^{(d,k)}_sym is an orthogonal projector. Moreover, Im(P^{(d,k)}_sym) = Sym^k(C^d): every vector in the image is invariant under all V_d(π), and every invariant vector |ψ⟩ satisfies P^{(d,k)}_sym |ψ⟩ = |ψ⟩.
We now turn to computing the dimension of the symmetric subspace.
Theorem 17 (Dimension of the symmetric subspace).

Tr(P^{(d,k)}_sym) = dim Sym^k(C^d) = (k+d−1 choose k).

Proof. The first equality follows from the fact that P^{(d,k)}_sym is the orthogonal projector onto Sym^k(C^d). Now we observe that:

Sym^k(C^d) = Im(P^{(d,k)}_sym) = span{P^{(d,k)}_sym |i_1, . . ., i_k⟩ : i_1, . . ., i_k ∈ [d]}.

To count the number of vectors in this set that are linearly independent, we make the following observation. Suppose |i_1, . . ., i_k⟩ and |j_1, . . ., j_k⟩ contain the same indices up to a permutation; then P^{(d,k)}_sym |i_1, . . ., i_k⟩ = P^{(d,k)}_sym |j_1, . . ., j_k⟩. Hence, the image is spanned by the vectors |n_1, . . ., n_d⟩ ∝ P^{(d,k)}_sym |1⟩^{⊗n_1} ⊗ · · · ⊗ |d⟩^{⊗n_d}, where n_i ≥ 0 counts how many times the index i appears and Σ_i n_i = k. It is easy to see that these vectors are orthogonal, and hence linearly independent. Therefore, the dimension of the symmetric subspace is equal to the number of linearly independent vectors in the set |n_1, . . ., n_d⟩ defined earlier. This is equal to the number of ways of assigning k indices among d possible labels, which is (k+d−1 choose k).
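The dimension formula can be verified by building P_sym explicitly (a small sketch; the helper `perm_operator` is our own name):

```python
import numpy as np
from itertools import permutations
from math import comb, factorial

def perm_operator(p, d):
    """Matrix of the permutation operator V_d(p) acting on (C^d)^{⊗k}."""
    k = len(p)
    dims = [d] * k
    V = np.zeros((d**k, d**k))
    for idx in np.ndindex(*dims):
        out = [0] * k
        for a in range(k):
            out[p[a]] = idx[a]
        V[np.ravel_multi_index(tuple(out), dims),
          np.ravel_multi_index(idx, dims)] = 1.0
    return V

d, k = 3, 3
# Symmetrizer: average of all permutation operators of S_k.
P_sym = sum(perm_operator(p, d) for p in permutations(range(k))) / factorial(k)

# P_sym is a projector whose trace equals binom(k + d - 1, k).
assert np.allclose(P_sym @ P_sym, P_sym)
assert np.isclose(np.trace(P_sym), comb(k + d - 1, k))
```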
Next, we introduce the anti-symmetric subspace.
Definition 18 (Anti-symmetric subspace). The anti-symmetric subspace is the set:

ASym^k(C^d) := {|ψ⟩ ∈ (C^d)^{⊗k} : V_d(σ) |ψ⟩ = sgn(σ) |ψ⟩ for all σ ∈ S_k},

where sgn(σ) denotes the sign of a permutation σ ∈ S_k.
Similarly as before, we can define the operator:

P^{(d,k)}_asym := (1/k!) Σ_{σ ∈ S_k} sgn(σ) V_d(σ),

and prove the following theorem.

Theorem 19. P^{(d,k)}_asym is the orthogonal projector on the anti-symmetric subspace ASym^k(C^d).
Proof. We have:

V_d(π) P^{(d,k)}_asym = (1/k!) Σ_{σ ∈ S_k} sgn(σ) V_d(πσ) = sgn(π) P^{(d,k)}_asym,

where we used sgn(πσ) = sgn(π) sgn(σ). Similarly, P^{(d,k)}_asym V_d(π) = sgn(π) P^{(d,k)}_asym. Using this, we can show that P^{(d,k)}_asym is idempotent:

(P^{(d,k)}_asym)² = (1/k!) Σ_{σ ∈ S_k} sgn(σ) V_d(σ) P^{(d,k)}_asym = (1/k!) Σ_{σ ∈ S_k} sgn(σ)² P^{(d,k)}_asym = P^{(d,k)}_asym.

We also have (P^{(d,k)}_asym)† = (1/k!) Σ_{σ ∈ S_k} sgn(σ) V_d(σ^{-1}) = P^{(d,k)}_asym, since sgn(σ) = sgn(σ^{-1}). We can show that Im(P^{(d,k)}_asym) ⊆ ASym^k(C^d), since V_d(π) P^{(d,k)}_asym |ψ⟩ = sgn(π) P^{(d,k)}_asym |ψ⟩ for all π ∈ S_k, where we used again that V_d(π) P^{(d,k)}_asym = sgn(π) P^{(d,k)}_asym. Conversely, if |ψ⟩ ∈ ASym^k(C^d), then we have:

P^{(d,k)}_asym |ψ⟩ = (1/k!) Σ_{σ ∈ S_k} sgn(σ) V_d(σ) |ψ⟩ = (1/k!) Σ_{σ ∈ S_k} sgn(σ)² |ψ⟩ = |ψ⟩.

Therefore, we have shown that P^{(d,k)}_asym is the orthogonal projector on the anti-symmetric subspace ASym^k(C^d).
Next, we compute the dimension of the anti-symmetric subspace.
Proposition 20 (Dimension of the anti-symmetric subspace). If d ≥ k, we have:

Tr(P^{(d,k)}_asym) = dim ASym^k(C^d) = (d choose k);

otherwise, Tr(P^{(d,k)}_asym) = 0.

Proof. First, we have:

ASym^k(C^d) = Im(P^{(d,k)}_asym) = span{P^{(d,k)}_asym |i_1, . . ., i_k⟩ : i_1, . . ., i_k ∈ [d]}.

To count the number of linearly independent vectors, note that if two of the indices coincide, say i_l = i_m with l ≠ m, then P^{(d,k)}_asym |i_1, . . ., i_k⟩ = 0. This is because:

P^{(d,k)}_asym |i_1, . . ., i_k⟩ = P^{(d,k)}_asym V_d(τ) |i_1, . . ., i_k⟩ = −P^{(d,k)}_asym |i_1, . . ., i_k⟩,

where τ is the transposition exchanging l and m, and we used P^{(d,k)}_asym V_d(τ) = sgn(τ) P^{(d,k)}_asym (as shown in the proof of Theorem 19). Therefore, i_1, . . ., i_k must all be distinct for P^{(d,k)}_asym |i_1, . . ., i_k⟩ to be nonzero. This also implies that if d < k, then Tr(P^{(d,k)}_asym) = 0. Therefore, we now focus on the case d ≥ k. If two index strings coincide as unordered sets, the corresponding vectors P^{(d,k)}_asym |i_1, . . ., i_k⟩ are equal up to a sign. It is easy to see that the remaining vectors are orthogonal and hence independent. Therefore, the dimension is given by the number of such independent vectors, which is equal to the number of ways to choose an unordered subset of k elements from a set of d elements, which is (d choose k). We will now present a proposition that establishes a relationship between the symmetric and anti-symmetric subspaces.

Proposition 21. For k ≥ 2, we have P^{(d,k)}_sym P^{(d,k)}_asym = 0.

Proof. We have:

P^{(d,k)}_sym P^{(d,k)}_asym = P^{(d,k)}_sym V_d(τ) P^{(d,k)}_asym = −P^{(d,k)}_sym P^{(d,k)}_asym = 0,

where we used that P^{(d,k)}_sym V_d(τ) = P^{(d,k)}_sym and V_d(τ) P^{(d,k)}_asym = sgn(τ) P^{(d,k)}_asym, with τ any transposition.
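For k = 2 both projectors are simple combinations of I and F, they have traces d(d+1)/2 and d(d−1)/2, and they resolve the identity, P_sym + P_asym = I. A quick numerical check:

```python
import numpy as np

d = 3
Id = np.eye(d * d)
# Flip operator F |i, j> = |j, i>.
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[j * d + i, i * d + j] = 1.0

P_sym = (Id + F) / 2    # projector onto Sym^2(C^d)
P_asym = (Id - F) / 2   # projector onto ASym^2(C^d)

assert np.isclose(np.trace(P_sym), d * (d + 1) / 2)
assert np.isclose(np.trace(P_asym), d * (d - 1) / 2)
assert np.allclose(P_sym + P_asym, Id)          # they resolve the identity
assert np.allclose(P_sym @ P_asym, 0 * Id)      # and are orthogonal
```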
Note that, for k = 2 (and d > 1), we have dim Sym²(C^d) = d(d+1)/2 > 0 and dim ASym²(C^d) = d(d−1)/2 > 0. Since P^{(d,2)}_sym and P^{(d,2)}_asym are both linear combinations of permutations, they commute with U^{⊗2} for all unitaries U. Consequently, they are elements of a basis for the commutant Comm(U(d), k = 2) and they are orthogonal to each other. Furthermore, the commutant can have at most dimension two, since it is spanned by the identity and Flip operators (Example 14). Thus, P^{(d,2)}_sym and P^{(d,2)}_asym form an orthogonal basis for the commutant.
Since the moment operator is the orthogonal projector onto the commutant (Theorem 7), we can derive the formula for the second-order moment operator (already derived in Corollary 13), using P^{(d,2)}_sym and P^{(d,2)}_asym as the basis. We express the moment operator as:

M^{(2)}_{µ_H}(O) = Tr(P^{(d,2)}_sym O) P^{(d,2)}_sym / Tr(P^{(d,2)}_sym) + Tr(P^{(d,2)}_asym O) P^{(d,2)}_asym / Tr(P^{(d,2)}_asym),

where in the first equality we used the fact that P^{(d,2)}_sym / √Tr(P^{(d,2)}_sym) and P^{(d,2)}_asym / √Tr(P^{(d,2)}_asym) form an orthonormal basis of the commutant. The symmetric projector also governs the average of tensor powers of Haar random pure states:

M^{(k)}_{µ_H}(|ϕ⟩⟨ϕ|^{⊗k}) = P^{(d,k)}_sym / (k+d−1 choose k),

with |ϕ⟩ ∈ C^d any pure state.

Proof. For all σ ∈ S_k, we left-multiply both sides of Eq. (30) by V_d(σ^{-1}) and obtain:

V_d(σ^{-1}) M^{(k)}_{µ_H}(|ϕ⟩⟨ϕ|^{⊗k}) = Σ_{π ∈ S_k} c_π V_d(σ^{-1}π).

Using that V_d(σ^{-1}) commutes with U^{⊗k} for all unitaries U and that V_d(σ^{-1}) |ϕ⟩^{⊗k} = |ϕ⟩^{⊗k}, the LHS is:

V_d(σ^{-1}) M^{(k)}_{µ_H}(|ϕ⟩⟨ϕ|^{⊗k}) = M^{(k)}_{µ_H}(V_d(σ^{-1}) |ϕ⟩⟨ϕ|^{⊗k}) = M^{(k)}_{µ_H}(|ϕ⟩⟨ϕ|^{⊗k}) = Σ_{π ∈ S_k} c_π V_d(π).

The RHS is:

Σ_{π ∈ S_k} c_π V_d(σ^{-1}π) = Σ_{π ∈ S_k} c_{σπ} V_d(π).

Therefore, we have:

Σ_{π ∈ S_k} c_π V_d(π) = Σ_{π ∈ S_k} c_{σπ} V_d(π) for all σ ∈ S_k.

Assuming d > 1, the previous equation implies that all the coefficients must be the same, i.e., c_σ = c_I for all σ ∈ S_k (note that this can be shown without using linear independence of the permutation matrices, but just using that the permutation matrices are all different linear operators for d > 1). Hence, we have:

M^{(k)}_{µ_H}(|ϕ⟩⟨ϕ|^{⊗k}) = c_I Σ_{π ∈ S_k} V_d(π) = c_I k! P^{(d,k)}_sym.

Thus, taking the trace, we get:

c_I k! Tr(P^{(d,k)}_sym) = 1, and therefore c_I = 1 / (k! (k+d−1 choose k)).

Vectorization formalism
We define the linear operator vec : L(C^d) → (C^d)^{⊗2} by its action on the canonical basis, vec(|i⟩⟨j|) := |i, j⟩, so that |A⟩⟩ := vec(A) = (A ⊗ I) |Ω⟩ for all A ∈ L(C^d). Given a linear superoperator Φ : L(C^d) → L(C^d), it is worth noting that the map |Φ(X)⟩⟩ is linear with respect to |X⟩⟩ for all |X⟩⟩ ∈ (C^d)^{⊗2}. This linearity implies the existence of a matrix vec(Φ) ∈ L((C^d)^{⊗2}) associated with this linear transformation. In other words, we can express the action of Φ on an operator X as follows:

|Φ(X)⟩⟩ = vec(Φ) |X⟩⟩.

It is important to mention that all linear superoperators Φ : L(C^d) → L(C^d) can be written in the form Φ(X) = Σ_i λ_i U_i X V_i†. To see why, observe that:

Φ(X) = Tr_2(ρ_Φ (I ⊗ X^T)), with ρ_Φ := (Φ ⊗ Id)(|Ω⟩⟨Ω|),   (106)

where Id is the identity channel and Tr_2 indicates the partial trace with respect to the second tensor space. Now, performing a singular value decomposition (SVD) on ρ_Φ, we have:

ρ_Φ = Σ_i λ_i |u_i⟩⟨v_i|,

where λ_i ≥ 0, |u_i⟩ and |v_i⟩ are respectively the singular values and the column vectors associated to the SVD unitaries, and we defined the operators U_i and V_i through |u_i⟩ = |U_i⟩⟩ and |v_i⟩ = |V_i⟩⟩. Substituting this expression back into (106), we obtain the desired claim. (Note that if Φ is completely positive, then its Choi matrix is positive, so we can perform an eigendecomposition instead of an SVD and arrive at the same conclusion as before, with V_i = U_i.) Using the ABC-rule |AXB⟩⟩ = (A ⊗ B^T) |X⟩⟩ (104), we can rewrite Φ(X) = Σ_i λ_i U_i X V_i†, which implies:

vec(Φ) = Σ_i λ_i U_i ⊗ V_i*.

Additionally, it is often advantageous to express the moment operator in its vectorized form.

Definition 24 (Vectorized moment operator). We define the vectorized moment operator as

M̃^{(k)}_{µ_H} := ∫_{U(d)} U^{⊗k} ⊗ U*^{⊗k} dµ_H(U).

Theorem 7 has a vectorized version as follows:

Proposition 25. Let the commutant space be Comm := Comm(U(d), k).
Let P_1, . . ., P_dim(Comm) be elements of an orthonormal basis of Comm with respect to the Hilbert-Schmidt inner product. Then, we have:

M̃^{(k)}_{µ_H} = Σ_{i=1}^{dim(Comm)} |P_i⟩⟩⟨⟨P_i|.

Moreover, we have Tr(M̃^{(k)}_{µ_H}) = dim(Comm).

Proof. Vectorizing the equation derived in Theorem 7, we have:

M̃^{(k)}_{µ_H} |O⟩⟩ = |M^{(k)}_{µ_H}(O)⟩⟩ = Σ_{i=1}^{dim(Comm)} ⟨⟨P_i|O⟩⟩ |P_i⟩⟩.

Since the previous equation holds for all O ∈ L((C^d)^{⊗k}), and therefore for all |O⟩⟩ ∈ (C^d)^{⊗2k}, we have the thesis.
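The vectorization identities can be checked with plain arrays. With the convention |A⟩⟩ = (A ⊗ I)|Ω⟩, vec coincides with numpy's row-major flattening, and the ABC-rule reads |AXB⟩⟩ = (A ⊗ B^T)|X⟩⟩ (a sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
A, X, B = (rng.standard_normal((d, d)) for _ in range(3))

# With |A>> := (A ⊗ I)|Omega>, Omega = sum_i |i,i>, vec(A) is the
# row-major flattening of A.
def vec(M):
    return M.reshape(-1)

# ABC-rule: |A X B>> = (A ⊗ B^T) |X>>.
assert np.allclose(vec(A @ X @ B), np.kron(A, B.T) @ vec(X))
# Hilbert-Schmidt inner product: <<A|B>> = Tr(A† B) (real matrices here).
assert np.isclose(vec(A) @ vec(B), np.trace(A.T @ B))
```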
Building upon the previous proposition and the fact that Tr(|P_i⟩⟩⟨⟨P_i|) = 1, we derive the following equation:

Tr(M̃^{(k)}_{µ_H}) = dim Comm(U(d), k).

In the case where d ≥ k (which is the case we are most interested in), we find that dim Comm(U(d), k) = k!. This is due to the fact that the k-th order commutant of the unitary group corresponds to the span of the permutation operators (as a consequence of Schur-Weyl duality, see Theorem 9), which are linearly independent for d ≥ k (as stated in Proposition 11).
It is also worth noting that the vectorized moment operator satisfies M̃^{(k)}_{µ_H} (U^{⊗k} ⊗ U*^{⊗k}) = M̃^{(k)}_{µ_H} for all U ∈ U(d) (due to the right invariance of the Haar measure) and (U^{⊗k} ⊗ U*^{⊗k}) M̃^{(k)}_{µ_H} = M̃^{(k)}_{µ_H} (due to the left invariance). Furthermore, as an exercise, we provide an alternative proof of Theorem 7 based on vectorization.
Theorem 26 (Projector onto the commutant). The moment operator is the orthogonal projector onto the commutant Comm := Comm(U(d), k) with respect to the Hilbert-Schmidt inner product. In particular, let P_1, . . ., P_dim(Comm) be an orthonormal basis of Comm and let O ∈ L((C^d)^{⊗k}). Then, we have:

M^{(k)}_{µ_H}(O) = Σ_{i=1}^{dim(Comm)} Tr(P_i† O) P_i.

Proof. Since the vectorized moment operator M̃^{(k)}_{µ_H} is an orthogonal projector, it admits an eigendecomposition with eigenvalues 0 and 1 and eigenvectors {|v_i⟩}_{i=1}^r associated to the eigenvalue 1, where r is the rank of the projector M̃^{(k)}_{µ_H}:

M̃^{(k)}_{µ_H} = Σ_{i=1}^r |v_i⟩⟨v_i| = Σ_{i=1}^r |P_i⟩⟩⟨⟨P_i|,

where we used the fact that for each |v_i⟩, there exists an operator P_i such that |v_i⟩ = |P_i⟩⟩. We note that the P_i are operators in Comm(U(d), k). This can be shown as follows:

|U^{⊗k} P_i (U†)^{⊗k}⟩⟩ = (U^{⊗k} ⊗ U*^{⊗k}) M̃^{(k)}_{µ_H} |P_i⟩⟩ = M̃^{(k)}_{µ_H} |P_i⟩⟩ = |P_i⟩⟩,

where in the last step we used Lemma 6.2. Furthermore, {P_i}_{i=1}^r form an orthonormal set with respect to the Hilbert-Schmidt inner product:

⟨P_i, P_j⟩_HS = ⟨⟨P_i|P_j⟩⟩ = ⟨v_i|v_j⟩ = δ_{i,j}.

Moreover, they constitute a basis for Comm since, for all A ∈ Comm, we have M̃^{(k)}_{µ_H} |A⟩⟩ = |M^{(k)}_{µ_H}(A)⟩⟩ = |A⟩⟩, so that the rank of M̃^{(k)}_{µ_H} is equal to the dimension of the commutant, i.e., r = dim(Comm). Consequently, we can conclude the proof by noting that for all O ∈ L((C^d)^{⊗k}):

M̃^{(k)}_{µ_H} |O⟩⟩ = Σ_{i=1}^{dim(Comm)} ⟨⟨P_i|O⟩⟩ |P_i⟩⟩ = |Σ_{i=1}^{dim(Comm)} Tr(P_i† O) P_i⟩⟩.

To address the issue of the permutation matrices not being orthonormal, we can introduce a vectorized version of Eq. (35) that provides a concise representation of the moment operator:

M̃^{(k)}_{µ_H} = Σ_{π,σ ∈ S_k} Wg(π^{-1}σ, d) |V_d(π)⟩⟩⟨⟨V_d(σ)|.
Proof. Using Eq. (35), we have:

M̃^{(k)}_{µ_H} |O⟩⟩ = |M^{(k)}_{µ_H}(O)⟩⟩ = Σ_{π,σ ∈ S_k} Wg(π^{-1}σ, d) Tr(V_d†(σ) O) |V_d(π)⟩⟩ = Σ_{π,σ ∈ S_k} Wg(π^{-1}σ, d) |V_d(π)⟩⟩⟨⟨V_d(σ)|O⟩⟩.

Since this equation holds for all vectors |O⟩⟩ in (C^d)^{⊗2k}, we conclude the proof.
Equations like the ones presented above can be effectively visualized and manipulated using Tensor Network diagrams.

Tensor network diagrams
Tensor Networks provide a graphical representation that simplifies the understanding and analysis of tensor operations [62]. In these diagrams, tensors are represented as boxes (or nodes), and tensor contractions are represented as connections between boxes. To illustrate the concept, we use specific diagrams for ket-states, bra-states, and matrices. A ket-state |ψ⟩ ∈ C^d is represented with a box and a leg as follows: Similarly, a bra-state ⟨ψ| is represented as: Matrices, such as A ∈ L(C^d), are depicted as boxes with one leg in and one leg out: In particular, the identity matrix I ∈ L(C^d) is simply represented by a line: The trace Tr(A) of an operator A ∈ L(C^d) is denoted as: Given A, B ∈ L(C^d), their product AB is represented by placing the box representing matrix A on the left side of the box representing matrix B. This can be illustrated as follows: The transpose A^T of the matrix A is depicted by: An operator in L((C^d)^{⊗2}) is represented with a box that has two (different) legs on the left and two on the right: The partial trace with respect to the second tensor space Tr_2(A) is denoted as: and similarly for the partial trace with respect to the first tensor space.
Particularly important is the non-normalized maximally entangled state |Ω⟩ = |I⟩⟩, which is the vectorization of the identity and is denoted as: Given an operator A ∈ L(C^d), its vectorization |A⟩⟩ = (A ⊗ I) |Ω⟩ is represented as: where the equality is due to the transpose-trick |A⟩⟩ = (A ⊗ I) |Ω⟩ = (I ⊗ A^T) |Ω⟩. In Tensor Network notation, the diagram below makes it clear that ⟨⟨A|B⟩⟩ = Tr(A†B): as well as the ABC-rule (104) |ABC⟩⟩ = (A ⊗ C^T) |B⟩⟩: where A, B, C ∈ L(C^d).
In accordance with Definition 23, a permutation matrix V_d(π), with π ∈ S_k, maps the tensor product state |ψ_1⟩ ⊗ · · · ⊗ |ψ_k⟩ to |ψ_{π^{-1}(1)}⟩ ⊗ · · · ⊗ |ψ_{π^{-1}(k)}⟩, and it can be represented by a diagram with lines connecting the tensor space of the i-th element on the right with the π(i)-th element on the left for each i ∈ [k]. For example, the identity operator I and the Flip operator F are represented as follows: The composition of permutations can be visualized by placing the corresponding diagrams next to each other. When considering a matrix A ∈ L(C^d) ⊗ L(C^{d′}), its vectorization is represented as: For instance, the vectorizations of the identity operator I and of the Flip operator F are represented as: Using this diagrammatic notation, we can express Eq. (123) |I⟩⟩⟨⟨I| as well as Eq. (124) |F⟩⟩⟨⟨F| diagrammatically. Another useful relation is the partial-swap-trick Tr_2((A ⊗ B) F) = AB, which can be visualized as: and thus we have the swap-trick formula Tr((A ⊗ B) F) = Tr(AB). Similarly, we also have Tr_1((A ⊗ B) F) = BA. More generally, consider the cyclic permutation π_cyc ∈ S_k, which corresponds to the unitary operator:

V_d(π_cyc) = Σ_{i_1, . . ., i_k ∈ [d]} |i_2, . . ., i_k, i_1⟩⟨i_1, . . ., i_k|.

This unitary operator can be depicted diagrammatically as: Now, we want to calculate Tr_{2,. . .,n}((A_1 ⊗ · · · ⊗ A_n) V_d(π_cyc)), and we will show that it simplifies to A_1 · · · A_n. We can verify this through direct calculation using resolutions of identities, or we can represent the calculation diagrammatically. Hence, we have the generalized swap-trick, known as the cyclic-permutation-trick:

Tr((A_1 ⊗ · · · ⊗ A_k) V_d(π_cyc)) = Tr(A_1 A_2 · · · A_k).
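The partial-swap-trick Tr_2((A ⊗ B)F) = AB, the k = 2 instance of the cyclic-permutation-trick, can be checked directly (helper name `partial_trace_2` is ours):

```python
import numpy as np

d = 3
# Flip operator F |i, j> = |j, i>.
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[j * d + i, i * d + j] = 1.0

rng = np.random.default_rng(5)
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))

def partial_trace_2(M, d):
    """Trace out the second tensor factor of M in L(C^d ⊗ C^d)."""
    return M.reshape(d, d, d, d).trace(axis1=1, axis2=3)

# Partial-swap-trick: Tr_2((A ⊗ B) F) = AB.
lhs = partial_trace_2(np.kron(A, B) @ F, d)
rhs = A @ B
assert np.allclose(lhs, rhs)
```

Taking the full trace of both sides recovers the swap-trick Tr((A ⊗ B)F) = Tr(AB).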

Unitary designs
Generating Haar random unitaries on a quantum computer can be a computationally expensive task, since most unitaries require an exponential number of elementary gates with respect to the number of qubits [3,39,63] to be implemented. However, for many applications in quantum information, only low-order moments of the Haar measure are needed [1, 10, 13-15, 21, 40]. This motivates the definition of unitary k-designs [64], which are distributions of unitaries that match the moments of the Haar measure up to the k-th order, where k ∈ N. As we will mention later, generating unitary k-designs can be done efficiently in the number of elementary gates.
Definition 27 (Unitary k-design). Let ν be a probability distribution defined over a set of unitaries S ⊆ U(d). The distribution ν is a unitary k-design if and only if:
E_{U∼ν}[U^{⊗k} O (U†)^{⊗k}] = E_{U∼μ_H}[U^{⊗k} O (U†)^{⊗k}] for all O ∈ L((C^d)^{⊗k}).
For instance, consider a distribution ν where the set of unitaries S is discrete and each unitary has an equal probability of being chosen. In this case, we have:
E_{U∼ν}[U^{⊗k} O (U†)^{⊗k}] = (1/|S|) Σ_{U∈S} U^{⊗k} O (U†)^{⊗k}.
An equivalent way to define a unitary k-design is to relate the vectorized moment operator of the distribution ν to that of the Haar measure μ_H. Specifically, we have the following:
Observation 28. A probability distribution ν is a unitary k-design if and only if M_ν^{(k)} = M_{μ_H}^{(k)}, where M_ν^{(k)} := E_{U∼ν}[U^{⊗k} ⊗ (U^*)^{⊗k}] is the vectorized moment operator.
Proof. By taking the vectorization of both sides of Eq. (158), we have E_{U∼ν}[U^{⊗k} ⊗ (U^*)^{⊗k}]|O⟩⟩ = E_{U∼μ_H}[U^{⊗k} ⊗ (U^*)^{⊗k}]|O⟩⟩ for all O, which holds if and only if the two vectorized moment operators coincide.
It is also useful to define the following quantity, the frame potential [65], which provides another way to verify if a probability distribution is a k-design.
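As a minimal numerical illustration of Definition 27 for k = 1: the uniform distribution over the single-qubit Pauli basis reproduces the Haar first moment E_{μ_H}[U O U†] = Tr(O) I/d, anticipating the 1-design discussion later in this section (a sketch for d = 2):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
paulis = [I2, X, Y, Z]

rng = np.random.default_rng(3)
O = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

# First moment of the uniform Pauli ensemble: E[U O U^dagger].
avg = sum(U @ O @ U.conj().T for U in paulis) / len(paulis)

# Haar first moment: Tr(O)/d * I.
haar = np.trace(O) / 2 * np.eye(2)
```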

Definition 30 (Frame potential). Let ν be a probability distribution defined over the set of unitaries S ⊆ U(d). For a given k ∈ N, we define the k-frame potential, denoted as F_ν^{(k)}, as follows:
F_ν^{(k)} := E_{U,V∼ν}[|Tr(U† V)|^{2k}].
As we will show in Proposition 34, a probability distribution ν is a k-design if and only if its k-frame potential coincides with that of the Haar measure μ_H. Thus, given a probability distribution ν and a set of unitaries S, we can compute the frame potential F_ν^{(k)} and compare it with that of the Haar measure to determine whether ν is a k-design or not. It is also useful to define the notion of a k-invariant distribution.
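Definition 30 is straightforward to evaluate for discrete ensembles. The sketch below computes the k-frame potential of the single-qubit Pauli ensemble: for k = 1 it equals the Haar value 1, while for k = 2 it exceeds the Haar value 2! = 2, so the Paulis form a 1-design but not a 2-design (anticipating Proposition 34):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
paulis = [I2, X, Y, Z]

def frame_potential(unitaries, k):
    # F^(k) = average over pairs (U, V) of |Tr(U^dagger V)|^(2k)
    vals = [abs(np.trace(U.conj().T @ V)) ** (2 * k)
            for U in unitaries for V in unitaries]
    return sum(vals) / len(unitaries) ** 2

f1 = frame_potential(paulis, 1)   # Haar value for k = 1 is 1! = 1
f2 = frame_potential(paulis, 2)   # Haar value for k = 2 is 2! = 2
```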

Definition 31 (k-invariant measure). Let ν be a probability distribution defined over a set of unitaries S ⊆ U(d). ν is k-invariant if and only if, for any polynomial p(U) of degree ≤ k in the matrix elements of U and U^*, it holds that
E_{U∼ν}[p(V U)] = E_{U∼ν}[p(U V)] = E_{U∼ν}[p(U)]
for all V ∈ S.
We begin by making the following observation, which will be useful for computing the frame potential of the Haar measure μ_H.
Lemma 32. Let ν be a probability distribution defined over a set of unitaries S ⊆ U(d). If ν is k-invariant, then we have:
F_ν^{(k)} = dim(span(Comm(S, k))),
where Comm(S, k) is defined in Definition 5.
Proof. By (right) k-invariance we have F_ν^{(k)} = E_{U,V∼ν}[|Tr(U† V)|^{2k}] = E_{U∼ν}[|Tr(U)|^{2k}]. Therefore:
F_ν^{(k)} = E_{U∼ν}[Tr(U^{⊗k} ⊗ (U^*)^{⊗k})] = Tr(M_ν^{(k)}).
We can conclude the proof by observing that Tr(M_ν^{(k)}) = dim(span(Comm(S, k))), since the moment operator of a k-invariant measure is the orthogonal projector onto the (vectorized) span of Comm(S, k) according to Proposition 25. Note that although the proof of Proposition 25 was given for the Haar measure on the unitary group, it applies equally to any k-invariant measure.
It is worth noting that the left and right invariance properties, and therefore Lemma 32, hold for any uniform probability distribution ν defined over a subgroup of unitaries S.
In the following lemma, we show that the difference between the frame potential of a probability distribution ν (not necessarily k-invariant) and that of the Haar measure μ_H equals the squared 2-norm of the difference between their vectorized moment operators.

Lemma 33 (Frame potential difference). Let F_ν^{(k)} and F_{μ_H}^{(k)} be the frame potentials of the probability distribution ν and the Haar measure μ_H, respectively. Then, we have:
F_ν^{(k)} - F_{μ_H}^{(k)} = ||M_ν^{(k)} - M_{μ_H}^{(k)}||_2^2.
Proof. First, we note that M_{μ_H}^{(k)} is an orthogonal projector due to Proposition 3, so (M_{μ_H}^{(k)})† = M_{μ_H}^{(k)} and (M_{μ_H}^{(k)})^2 = M_{μ_H}^{(k)}. Furthermore, because of the invariance of the Haar measure, we have M_ν^{(k)} M_{μ_H}^{(k)} = M_{μ_H}^{(k)} M_ν^{(k)} = M_{μ_H}^{(k)}. Thus, by using these properties, we obtain:
||M_ν^{(k)} - M_{μ_H}^{(k)}||_2^2 = Tr[(M_ν^{(k)})† M_ν^{(k)}] - Tr[(M_{μ_H}^{(k)})† M_{μ_H}^{(k)}].
We can conclude by observing that for μ = μ_H and μ = ν we have:
Tr[(M_μ^{(k)})† M_μ^{(k)}] = E_{U,V∼μ}[|Tr(U† V)|^{2k}] = F_μ^{(k)}.
It follows from the previous Lemma that showing that a distribution ν is a k-design can be achieved by computing its frame potential and comparing it with that of the Haar measure μ_H. This is due to the fact that if the two frame potentials coincide, the difference between their vectorized moment operators in 2-norm must be zero, hence they must be equal (which is an equivalent unitary design definition because of Observation 28). We can restate the result explicitly as follows:
Proposition 34 (Frame potential k-design condition). We have:
F_ν^{(k)} ≥ F_{μ_H}^{(k)}.
Moreover, the equality holds if and only if ν is a k-unitary design.
In particular, if k ≤ d, then F_{μ_H}^{(k)} = k!.
Proof. By Lemma 33, Lemma 32 and Schur-Weyl duality (Theorem 9), we have:
F_ν^{(k)} = F_{μ_H}^{(k)} + ||M_ν^{(k)} - M_{μ_H}^{(k)}||_2^2 ≥ F_{μ_H}^{(k)} = dim(span{V_d(π)}_{π∈S_k}).
Moreover, by Proposition 11, we have that if k ≤ d, then the number of linearly independent permutation matrices is k!. By utilizing this result, we can derive a straightforward lower bound on the cardinality of a discrete set S of unitaries necessary to form a k-design. To establish this bound, let us consider the frame potential F_ν^{(k)} associated with the uniform distribution ν over the discrete set S. We can then deduce that:
F_ν^{(k)} = (1/|S|^2) Σ_{U,V∈S} |Tr(U† V)|^{2k} ≥ (1/|S|^2) Σ_{U∈S} |Tr(U† U)|^{2k} = d^{2k}/|S|.
Furthermore, considering the fact that ν constitutes a k-design (with k ≤ d), by the previous proposition we have that F_ν^{(k)} = k!. This implies that the cardinality of the set S must satisfy |S| ≥ d^{2k}/k!. Consequently, if we are considering an n-qubit system where the Hilbert space dimension is d = 2^n, the cardinality of S must grow at least exponentially with the number of qubits. For a more comprehensive analysis of lower bounds for k-designs, we recommend referring to the following references [39,59,65,66].
The following proposition provides equivalent definitions of unitary k-designs:
Proposition 35 (Equivalent definitions of unitary k-design). Let ν be a probability distribution over a set of unitaries S ⊆ U(d). Then, ν is a unitary k-design if and only if any of the following equivalent conditions holds:
1. E_{U∼ν}[U^{⊗k} O (U†)^{⊗k}] = E_{U∼μ_H}[U^{⊗k} O (U†)^{⊗k}] for all O ∈ L((C^d)^{⊗k});
2. M_ν^{(k)} = M_{μ_H}^{(k)};
3. F_ν^{(k)} = F_{μ_H}^{(k)};
4. E_{U∼ν}[p(U)] = E_{U∼μ_H}[p(U)] for all polynomials p(U) homogeneous of degree k in the matrix elements of U and homogeneous of degree k in the matrix elements of U^*.
Proof. The equivalence between 1 and 2 is shown in Observation 28, and the one between 2 and 3 is shown in Proposition 34. To show that 2 implies 4, we observe that any polynomial p(U) homogeneous of degree k in the elements of U and in the elements of U^* can be written as p(U) = Tr(A U^{⊗k} ⊗ (U^*)^{⊗k}) for a matrix A ∈ L((C^d)^{⊗2k}) whose entries are the coefficients of the polynomial, so that E_{U∼ν}[p(U)] = Tr(A M_ν^{(k)}) = Tr(A M_{μ_H}^{(k)}) = E_{U∼μ_H}[p(U)]. Similarly, to show that 4 implies 2, we note that, for all matrices A ∈ L((C^d)^{⊗2k}), p_A(U) := Tr(A U^{⊗k} ⊗ (U^*)^{⊗k}) is a homogeneous polynomial of degree k in both the elements of U and U^*. Therefore we have Tr(A (M_ν^{(k)} - M_{μ_H}^{(k)})) = 0 for all A, which implies point 2.
Moreover, if ν is a uniform distribution over a set of unitaries S which forms a group, and if dim(Comm(S, k)) = dim(Comm(U(d), k)), then, by Lemma 32 and Proposition 34, ν is a unitary k-design. Now we turn to the definition of k-designs for distributions over sets of states, which are known as state k-designs or spherical k-designs [67].
Definition 36 (State k-design). Let η be a probability distribution over a set of states S ⊆ C^d.
The distribution η is said to be a state k-design (also called a spherical k-design) if and only if:
E_{|ψ⟩∼η}[(|ψ⟩⟨ψ|)^{⊗k}] = E_{|ψ⟩∼μ_H}[(|ψ⟩⟨ψ|)^{⊗k}].
A unitary k-design ν induces a state k-design for the probability distribution over the states U|ψ_0⟩, where U is drawn from ν and |ψ_0⟩ is fixed. This can be seen by setting O = (|ψ_0⟩⟨ψ_0|)^{⊗k} in Definition 27.
Similarly to the unitary k-design case, we can define the so-called state frame potential.
Definition 37 (State frame potential). The state frame potential is defined as:
F_η^{(k)} := E_{|ψ⟩,|φ⟩∼η}[|⟨ψ|φ⟩|^{2k}].
We now state a proposition that relates the state frame potential to k-designs:
Proposition 38 (State frame potential design). Let η be a probability distribution over a set of states S ⊆ C^d. Then:
F_η^{(k)} ≥ 1/(d+k-1 choose k),
with equality if and only if η is a state k-design.
Proof. Note that F_η^{(k)} = Tr(A^2) with A := E_{|ψ⟩∼η}[(|ψ⟩⟨ψ|)^{⊗k}], where we used cyclicity of the trace. Moreover, Tr(A P_sym^{(d,k)}) = Tr(A) = 1, where we used Eq. (95), the fact that P_sym^{(d,k)} squares to itself (Theorem 16) and the fact that P_sym^{(d,k)} is the orthogonal projector on the symmetric subspace (Theorem 16). By the Cauchy-Schwarz inequality,
1 = Tr(A P_sym^{(d,k)})^2 ≤ Tr(A^2) Tr(P_sym^{(d,k)}) = F_η^{(k)} dim Sym^k(C^d),
with equality if and only if A is proportional to P_sym^{(d,k)}, i.e., if and only if η is a state k-design. Here, dim Sym^k(C^d) can be calculated using Proposition 17, which shows that it is equal to the binomial coefficient (d+k-1 choose k). Therefore, we can conclude.
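The dimension formula dim Sym^k(C^d) = (d+k-1 choose k) can be verified numerically by building the symmetric projector as the average of all permutation operators and reading off its rank from its trace (the index convention inside perm_operator is one concrete choice; the resulting projector is convention-independent):

```python
import numpy as np
from itertools import permutations
from math import comb, factorial

def perm_operator(pi, d):
    # Permutation operator V_d(pi) acting on (C^d)^{tensor k}, k = len(pi).
    k = len(pi)
    dims = (d,) * k
    V = np.zeros((d ** k, d ** k))
    for idx in np.ndindex(*dims):
        out = tuple(idx[pi[j]] for j in range(k))
        V[np.ravel_multi_index(out, dims), np.ravel_multi_index(idx, dims)] = 1.0
    return V

def sym_projector(d, k):
    # P_sym = (1/k!) * sum over all permutation operators.
    return sum(perm_operator(pi, d) for pi in permutations(range(k))) / factorial(k)

# Rank of a projector equals its trace.
dims_sym = {(d, k): round(np.trace(sym_projector(d, k)))
            for (d, k) in [(2, 2), (2, 3), (3, 2), (3, 3)]}
```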
It is worth noting that any uniform distribution ν defined over a set of unitaries that forms a basis for L(C^d) and satisfies Tr(U_i† U_j) = d δ_{i,j} constitutes a 1-design. This can be easily proven by computing the frame potential as follows:
F_ν^{(1)} = (1/|S|^2) Σ_{i,j} |Tr(U_i† U_j)|^2 = (1/d^4) · d^2 · d^2 = 1 = F_{μ_H}^{(1)},
and applying Proposition 34. Therefore, the uniform distribution defined over the Pauli basis P := {I, X, Y, Z}^{⊗n} is a 1-design, where n is the number of qubits and d = 2^n. The Flip operator can be elegantly represented in terms of the Pauli basis using the following expression:
F = (1/d) Σ_{P∈P} P ⊗ P,
where we wrote the Flip operator in the Pauli basis and used the swap-trick. Using this, it is also evident that the Pauli group forms a 1-design (according to Definition 27). In fact, for any O ∈ L(C^d), we have:
(1/d^2) Σ_{P∈P} P O P† = Tr(O) I/d = E_{U∼μ_H}[U O U†],
where in the first step we used the partial-swap-trick (Eq. (152)) together with the Pauli expansion of the Flip operator, and in the last step Eq. (50). An important set of unitaries is the Clifford group Cl(n) [68], i.e. the set of unitaries which maps the Pauli group P_n into itself under the adjoint operation:
Cl(n) := {U ∈ U(d) : U P U† ∈ P_n for all P ∈ P_n},
where P_n := {i^k}_{k=0}^{3} × {I, X, Y, Z}^{⊗n}. It can be proven that the uniform distribution over the Clifford group forms a 3-design for all d = 2^n [69,70], but it fails to be a 4-design [71].
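A quick numerical check of the Pauli expansion of the Flip operator, F = (1/d) Σ_P P ⊗ P, for a single qubit (d = 2):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])

# Flip (swap) operator on C^2 ⊗ C^2: F|i,j> = |j,i>.
F = np.zeros((4, 4), dtype=complex)
for i in range(2):
    for j in range(2):
        F[2 * i + j, 2 * j + i] = 1.0

# Pauli-basis expansion: F = (1/d) * sum_P P ⊗ P with d = 2.
F_pauli = sum(np.kron(P, P) for P in (I2, X, Y, Z)) / 2
```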
Moreover, it can be shown that any Clifford circuit can be implemented with O(n^2/log(n)) gates from the set {H, CNOT, S}, where H, CNOT and S are the Hadamard, Controlled-NOT and Phase gate, respectively [63,72]. Furthermore, there are efficient algorithms to sample uniformly from the Clifford group [73,74]. This is of fundamental importance for applications in quantum computing, since if we are interested in reproducing moments of the Haar measure up to the third moment, it is sufficient to sample efficient quantum circuits that correspond to Clifford unitaries (instead of sampling from the Haar measure, which would require implementing quantum circuits with an exponential number of gates in the number of qubits [63]).

Approximate unitary designs
In many cases, having an exact unitary design may not be necessary, and an approximate one may suffice. Various definitions of approximate unitary designs have been proposed in the literature [39,49,59]. Here we will explore some of them and their relationships.
We begin by defining the concept of Tensor Product Expander (TPE) ε-approximate k-design [59]. To simplify the notation, we use U^{⊗k,k} := U^{⊗k} ⊗ (U^*)^{⊗k}.
Definition 39 (Tensor Product Expander (TPE) ε-approximate k-design). Let ε > 0. We say that ν is a TPE ε-approximate k-design if and only if:
||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞ ≤ ε,
where M_ν^{(k)} = E_{U∼ν}[U^{⊗k,k}] is the vectorized moment operator. The TPE notion of approximate unitary design is particularly advantageous because it can be "amplified". This means that if a distribution of unitaries ν is a TPE λ-approximate k-design, then the distribution of unitaries associated to the product of P unitaries independently distributed according to ν is a TPE λ^P-approximate design (as we prove below). We denote by ν_P this distribution of the product of P unitaries sampled independently from ν. Note that the vectorized moment operator associated with ν_P is:
M_{ν_P}^{(k)} = E_{U_1,...,U_P∼ν}[(U_1 ... U_P)^{⊗k,k}] = (M_ν^{(k)})^P,
where we used the independence of U_1, ..., U_P. We state the proposition below:
Proposition 40. Let ν be a TPE λ-approximate k-design with λ < 1, and let ε > 0. Then ν_P is a TPE ε-approximate k-design, where P is a positive integer such that P ≥ log(ε)/log(λ).
Proof. We have to bound the quantity ||M_{ν_P}^{(k)} - M_{μ_H}^{(k)}||_∞. We first use the expression for M_{ν_P}^{(k)} above. Moreover, we have:
(M_ν^{(k)} - M_{μ_H}^{(k)})^P = M_{ν_P}^{(k)} - M_{μ_H}^{(k)},
which can be shown by induction. The base case P = 1 is trivial. Suppose the claim is true for P; then:
(M_ν^{(k)} - M_{μ_H}^{(k)})^{P+1} = (M_{ν_P}^{(k)} - M_{μ_H}^{(k)})(M_ν^{(k)} - M_{μ_H}^{(k)}) = M_{ν_{P+1}}^{(k)} - M_{μ_H}^{(k)},
where in the last step we used the invariance of the Haar measure to conclude that M_{ν_P}^{(k)} M_{μ_H}^{(k)} = M_{μ_H}^{(k)} M_ν^{(k)} = (M_{μ_H}^{(k)})^2 = M_{μ_H}^{(k)}. Therefore we have:
||M_{ν_P}^{(k)} - M_{μ_H}^{(k)}||_∞ = ||(M_ν^{(k)} - M_{μ_H}^{(k)})^P||_∞ ≤ ||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞^P ≤ λ^P,
where we used the submultiplicativity of the infinity norm and that ν is a TPE λ-approximate k-design. By choosing P ≥ log(ε)/log(λ), we can conclude that ν_P is a TPE ε-approximate k-design.
The previous Proposition implies that if we have a TPE λ-approximate k-design and we want to amplify it in order to achieve a TPE with approximation precision ε inverse exponential in the number of qubits, i.e. ε = ε̃ d^{-c} where ε̃, c > 0 and d = 2^n, then we can achieve this by choosing a number of repetitions P ≥ (1/log(λ^{-1}))(log(ε̃^{-1}) + nc log(2)). Note that this expression is linear in the number of qubits n if λ = O(1).
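The amplification statement can be illustrated numerically for k = 1. Below, ν is the uniform distribution over the inverse-closed gate set {H, S, S†} (an illustrative choice; inverse-closure makes the moment operator Hermitian, so ||(M_ν - M_{μ_H})^P||_∞ equals λ^P exactly rather than only being bounded by it):

```python
import numpy as np

d = 2
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1.0 + 0j, 1j])

# k = 1 vectorized moment operator M = E[U ⊗ U*] of the uniform
# distribution over the inverse-closed set {H, S, S^dagger}.
gates = [H, S, S.conj().T]
M_nu = sum(np.kron(U, U.conj()) for U in gates) / len(gates)

# Haar moment operator for k = 1: |Omega><Omega| / d.
omega = np.eye(d).reshape(d * d).astype(complex)
M_haar = np.outer(omega, omega) / d

lam = np.linalg.norm(M_nu - M_haar, 2)        # TPE parameter lambda (operator norm)
P = 6
M_nuP = np.linalg.matrix_power(M_nu, P)       # moment operator of the P-fold product
amplified = np.linalg.norm(M_nuP - M_haar, 2)
```

The generated group is the full single-qubit Clifford group, whose k = 1 commutant is trivial, so λ < 1 and the distance decays as λ^P.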
We will now define the notion of diamond ε-approximate k-design.

Definition 41 (Diamond ε-approximate k-design). We say that ν is a diamond ε-approximate k-design if and only if
||Φ_ν^{(k)} - Φ_{μ_H}^{(k)}||_◇ ≤ ε, where Φ_ν^{(k)}(·) := E_{U∼ν}[U^{⊗k} (·) (U†)^{⊗k}],
and where the diamond norm of a superoperator Φ : L(C^D) → L(C^D) can be defined as
||Φ||_◇ := sup_{X≠0} ||(Φ ⊗ I_D)(X)||_1 / ||X||_1,
where I_D denotes the identity superoperator on L(C^D). It is worth mentioning that the diamond norm distance between two quantum channels has an important operational interpretation, as it is closely related to the one-shot distinguishability probability between the two channels [75].
In the following Proposition, we will see the relation between the Diamond approximate design definition and the TPE one previously introduced.

Proposition 42 (Diamond vs TPE).
• We have:
||Φ_ν^{(k)} - Φ_{μ_H}^{(k)}||_◇ ≤ d^k ||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞.
Thus, if ν is a TPE ε-approximate k-design, then ν is a diamond εd^k-approximate k-design.
• Conversely, we have:
||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞ ≤ d^k ||Φ_ν^{(k)} - Φ_{μ_H}^{(k)}||_◇.
Proof. Denote Δ := Φ_ν^{(k)} - Φ_{μ_H}^{(k)} and D := d^{2k}. For any X ∈ L(C^D ⊗ C^D) with ||X||_1 ≤ 1, we have ||(Δ ⊗ I)(X)||_1 ≤ d^k ||(Δ ⊗ I)(X)||_2 and ||X||_2 ≤ ||X||_1 ≤ 1, where we used that ||A||_1 ≤ √D ||A||_2 and ||A||_2 ≤ ||A||_1. Now we observe that:
sup_{||X||_2 ≤ 1} ||(Δ ⊗ I)(X)||_2 = ||vec(Δ ⊗ I)||_∞,
where we used that the Hilbert-Schmidt norm of a matrix is the 2-norm of the vectorized matrix, that vec((Δ ⊗ I)(X)) = vec(Δ ⊗ I)|X⟩⟩ (see Eq. (105)), and that ||A|v⟩||_2^2 = ⟨v|A†A|v⟩, whose supremum over the vectors |v⟩ normalized in 2-norm is the largest eigenvalue of A†A, which coincides with the squared largest singular value of A. Now, vec(Δ ⊗ I) coincides with (M_ν^{(k)} - M_{μ_H}^{(k)}) ⊗ I ⊗ I up to a permutation of the tensor factors, and since ||U A U†||_p = ||A||_p for any unitary U and ||A ⊗ B||_p = ||A||_p ||B||_p, we obtain:
||vec(Δ ⊗ I)||_∞ = ||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞.
Hence the first claim follows. For the second claim, reviewing in reverse what we have shown for the first claim, we have:
||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞ = sup_{||X||_2 ≤ 1} ||(Δ ⊗ I)(X)||_2 ≤ sup_{||X||_2 ≤ 1} ||(Δ ⊗ I)(X)||_1 ≤ d^k ||Δ||_◇,
where we used ||A||_2 ≤ ||A||_1, that ||X||_1 ≤ √D ||X||_2 ≤ d^k for ||X||_2 ≤ 1, and the definition of the diamond norm.
Motivated by Proposition 34, we introduce an alternative notion of approximation based on the frame potential. Notably, the frame potential may be easier to estimate numerically than the other approximation notions previously introduced (if approached naively by their definitions), as it involves computing traces of matrices in L(C^d), rather than working with matrices defined on tensor product spaces. Specifically, we define a frame potential ε-approximate k-design as follows:
Definition 43 (Frame potential ε-approximate k-design). We say that ν is a frame potential ε-approximate k-design if and only if:
√(F_ν^{(k)} - F_{μ_H}^{(k)}) ≤ ε.
Note that the argument of this square root is always non-negative (due to Lemma 33).
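The proofs in this section repeatedly use the Schatten-norm inequalities ||A||_∞ ≤ ||A||_2 ≤ ||A||_1 ≤ √D ||A||_2 for A ∈ L(C^D). A quick numerical sanity check on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
D = 6
A = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))

s = np.linalg.svd(A, compute_uv=False)  # singular values of A
norm_inf = s.max()                      # operator (infinity) norm
norm_2 = np.sqrt((s ** 2).sum())        # Hilbert-Schmidt (2-) norm
norm_1 = s.sum()                        # trace (1-) norm
```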
We now establish a relationship between the Frame potential notion of approximate k-design and the TPE one.

Proposition 44 (Frame potential vs TPE relation).
• We have:
√(F_ν^{(k)} - F_{μ_H}^{(k)}) ≤ d^k ||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞.
Therefore, if ν is a TPE ε-approximate k-design, then ν is a frame potential εd^k-approximate k-design.
• Conversely, we have:
||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞ ≤ √(F_ν^{(k)} - F_{μ_H}^{(k)}).
Proof. By using Lemma 33, we have:
√(F_ν^{(k)} - F_{μ_H}^{(k)}) = ||M_ν^{(k)} - M_{μ_H}^{(k)}||_2.
Now the first claim follows just by using that ||A||_2 ≤ √D ||A||_∞ for all A ∈ L(C^D) with D = d^{2k}, while the second claim follows from ||A||_∞ ≤ ||A||_2.
In the following, we relate the diamond notion of approximation with the frame potential one.

Proposition 45 (Diamond vs Frame potential).
• We have:
||Φ_ν^{(k)} - Φ_{μ_H}^{(k)}||_◇ ≤ d^k √(F_ν^{(k)} - F_{μ_H}^{(k)}).
Hence, if ν is a frame potential ε-approximate k-design, then ν is a diamond εd^k-approximate k-design.
• Conversely, we have:
√(F_ν^{(k)} - F_{μ_H}^{(k)}) ≤ d^{2k} ||Φ_ν^{(k)} - Φ_{μ_H}^{(k)}||_◇.
Proof. Using Eq. (203), we have:
||Φ_ν^{(k)} - Φ_{μ_H}^{(k)}||_◇ ≤ d^k ||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞ ≤ d^k ||M_ν^{(k)} - M_{μ_H}^{(k)}||_2 = d^k √(F_ν^{(k)} - F_{μ_H}^{(k)}),
where in the last step we used that ||A||_∞ ≤ ||A||_2 together with Lemma 33. Therefore we have shown the first claim. For the second claim, we have:
√(F_ν^{(k)} - F_{μ_H}^{(k)}) = ||M_ν^{(k)} - M_{μ_H}^{(k)}||_2 ≤ d^k ||M_ν^{(k)} - M_{μ_H}^{(k)}||_∞ ≤ d^{2k} ||Φ_ν^{(k)} - Φ_{μ_H}^{(k)}||_◇,
where in the first step we used Lemma 33 and that ||A||_2 ≤ √D ||A||_∞ for A ∈ L(C^D) with D = d^{2k}, while in the second step we used Eq. (204).
Another notion of unitary design approximation was introduced in [59], which we refer to as relative error ε-approximate k-design. This is defined as follows:
Definition 46 (Relative error ε-approximate k-design). We say that ν is a relative error ε-approximate k-design if and only if
(1 - ε) Φ_{μ_H}^{(k)} ≼ Φ_ν^{(k)} ≼ (1 + ε) Φ_{μ_H}^{(k)},
where A ≼ B means that B - A is completely positive, and A and B are linear superoperators.
For further details on this approximation notion, see [59]. In [59,76,77], it was shown that one-dimensional local random quantum circuits, which are quantum circuits on n qubits formed by 2-qubit Haar random gates, of size O(n^2 poly(k)) are (relative error) ε-approximate k-designs for all k ∈ O(2^{0.4n}). This result has been extended to higher dimensional quantum circuits in [78].

Examples and applications
This section explores diverse examples and applications where the Haar measure plays a fundamental role in quantum information. We derive well-known formulas that reduce to computing moments over the Haar measure, including the twirling of quantum channels and the average gate fidelity. These formulas lay the foundation for various applications, such as Randomized Benchmarking [79].
Moreover, we show how these moment calculations can be translated into probability statements using concentration inequalities. Concentration inequalities serve as valuable tools for establishing rigorous bounds and enhancing our understanding of the statistical behavior associated with Haar random variables.
Furthermore, we provide detailed insights into two notable examples showcasing the applications of the theory of unitary designs. We examine Barren Plateaus [30] in Variational Quantum Algorithms, shedding light on the optimization landscapes encountered in such algorithms. Additionally, we delve into Classical Shadow tomography [1], where the theory of unitary designs aids in designing efficient measurement strategies for reconstructing properties of unknown quantum states.

Examples of moment calculations
We start by deriving a formula that plays an important role in quantum information, in particular in Randomized Benchmarking [18,79]. This formula illustrates that when a unitary operator U, randomly chosen from a 2-design, is applied before a quantum channel Φ, followed by the application of its adjoint U†, the resulting output channel resembles a depolarizing channel.
Example 47 (Twirling of a quantum channel is a depolarizing channel). Let ν be a unitary 2-design distribution. Consider a quantum channel Φ : L(C^d) → L(C^d) and a quantum state ρ ∈ S(C^d). Then:
E_{U∼ν}[U† Φ(U ρ U†) U] = p_Φ ρ + (1 - p_Φ) Tr(ρ) I/d,
where the left-hand side represents the so-called "twirling" of Φ, and we define:
p_Φ := (d^2 F_e(Φ) - 1)/(d^2 - 1).
Here, F_e(Φ) denotes the entanglement fidelity, given by F_e(Φ) := ⟨Ω̃|(Φ ⊗ I)(|Ω̃⟩⟨Ω̃|)|Ω̃⟩, with |Ω̃⟩ := |Ω⟩/√d the normalized maximally entangled state.
Proof. Due to the 2-design property of ν and the fact that the quantity of interest is a second moment quantity, we can average over the Haar measure μ_H instead of ν. Considering a Kraus decomposition for Φ with operators {K_i}_{i=1}^{d^2}, we write the twirl as a sum of terms E_{U∼μ_H}[U† K_i U ρ U† K_i† U], use that AB = Tr_2((A ⊗ B) F), and use that E_{U∼μ_H}[f(U)] = E_{U∼μ_H}[f(U†)] for all integrable functions f. Using Eq. (51) for the second moment, we obtain:
E_{U∼μ_H}[U† Φ(U ρ U†) U] = c_I ρ + c_F Tr(ρ) I/d,
where the coefficients c_I and c_F are given by:
c_I = (Σ_i |Tr(K_i)|^2 - 1)/(d^2 - 1), c_F = 1 - c_I,
where we used the swap-trick and the fact that Σ_i K_i† K_i = I. To conclude the proof, we observe that c_I = p_Φ, as defined in Eq. (223). This follows from the relationship Σ_i |Tr(K_i)|^2 = d^2 F_e(Φ), as can easily be seen by expanding F_e(Φ) in the Kraus operators:
F_e(Φ) = ⟨Ω̃|(Φ ⊗ I)(|Ω̃⟩⟨Ω̃|)|Ω̃⟩ = (1/d^2) Σ_i |Tr(K_i)|^2.
It is worth noting that this equality can also be understood visually with the help of diagrams.
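Example 47 can be verified numerically by twirling over the single-qubit Clifford group, which is an exact 2-design. The sketch below builds the 24-element group (mod phase) from H and S, twirls an amplitude damping channel (the helper functions and the value γ = 0.3 are illustrative choices), and compares against the depolarizing channel predicted by the formula:

```python
import numpy as np

# --- build the 24-element single-qubit Clifford group (mod global phase) ---
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1.0 + 0j, 1j])

def canon(U):
    # Fix the global phase (first sizable entry made real positive),
    # round, and serialize so unitaries can be deduplicated.
    u = U.ravel()
    z = u[np.flatnonzero(np.abs(u) > 1e-9)[0]]
    return (np.round(U * (z.conjugate() / abs(z)), 9) + (0.0 + 0.0j)).tobytes()

group = {canon(np.eye(2, dtype=complex)): np.eye(2, dtype=complex)}
frontier = list(group.values())
while frontier:
    nxt = []
    for U in frontier:
        for G in (H, S):
            V = G @ U
            key = canon(V)
            if key not in group:
                group[key] = V
                nxt.append(V)
    frontier = nxt
cliffords = list(group.values())

# --- amplitude damping channel (gamma = 0.3, illustrative) ---
g = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)

def Phi(rho):
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

def twirl(rho):
    # E_U[U^dag Phi(U rho U^dag) U] over the uniform Clifford distribution
    return sum(U.conj().T @ Phi(U @ rho @ U.conj().T) @ U
               for U in cliffords) / len(cliffords)

# Depolarizing parameter predicted by Example 47.
d = 2
Fe = (abs(np.trace(K0)) ** 2 + abs(np.trace(K1)) ** 2) / d ** 2
p = (d ** 2 * Fe - 1) / (d ** 2 - 1)
```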
It should be noted that the above formula, along with many other derivations in this tutorial, is based on the calculation of the moment operator that depends on the commutant of the unitary group.However, if we were to compute the Haar expected value defined over another subgroup on the unitary group instead on the full unitary group, we could still use the same approach by characterizing the commutant of this subgroup.
The following example introduces the concept of average gate fidelity, a measure used to assess the quality of quantum gates [24,80].
Example 48 (Average gate fidelity). Let ν be a state 2-design distribution. Consider a quantum channel Φ : L(C^d) → L(C^d) and a unitary channel U(·) = U(·)U†. Then, the average gate fidelity is given by:
F_avg(Φ, U) := E_{|ψ⟩∼ν}[⟨ψ|U† Φ(|ψ⟩⟨ψ|) U|ψ⟩] = (d F_e(U† ∘ Φ) + 1)/(d + 1),
where U†(·) := U†(·)U and F_e is the entanglement fidelity defined in Example 47.
Proof. It is worth noting that this formula can also be easily derived from the previous Eq. (222) without the need to explicitly compute any additional Haar moment. However, for the sake of completeness, we will still perform the derivation without using Eq. (222). One expands the quantity of interest using the Kraus decomposition of Φ and the fact that ν is a 2-design; applying the swap-trick, the linearity of the expected value and the moment operator formula (95), then the swap-trick again, and finally the facts that Σ_i K_i† K_i = I and Σ_i |Tr(U† K_i)|^2 = d^2 F_e(U† ∘ Φ) (Eq. (233)), concludes the proof.
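The average gate fidelity formula can be checked by averaging over the six single-qubit stabilizer states, which form a state 2-design (a sketch for U = I and an amplitude damping channel; γ = 0.3 is an illustrative choice):

```python
import numpy as np

g = 0.3  # damping strength (illustrative)
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
kraus = [K0, K1]

def Phi(rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

# The six single-qubit stabilizer states (eigenstates of X, Y, Z)
# form a state 2-design.
s = 1 / np.sqrt(2)
states = [np.array(v, dtype=complex) for v in
          [[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]]]

# Direct average of <psi| Phi(|psi><psi|) |psi> (here U = I).
F_avg = np.mean([np.real(np.vdot(psi, Phi(np.outer(psi, psi.conj())) @ psi))
                 for psi in states])

# Closed form: F_avg = (d F_e + 1)/(d + 1), F_e = (1/d^2) sum_i |Tr K_i|^2.
d = 2
Fe = sum(abs(np.trace(K)) ** 2 for K in kraus) / d ** 2
F_formula = (d * Fe + 1) / (d + 1)
```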
The following example considers a bipartite state ρ ∈ S(C^d ⊗ C^d) shared between Alice and Bob, who apply unitary operations U and U^* to their respective subsystems, where U is sampled according to a 2-design [26]. The resulting output state can be expressed as a combination of the maximally mixed state and the maximally entangled state d^{-1}|Ω⟩⟨Ω|, and the coefficients of this combination depend on the overlap of the state ρ with the maximally entangled state d^{-1}|Ω⟩⟨Ω|.
Example 49. Let ν be a unitary 2-design distribution, and ρ ∈ S(C^d ⊗ C^d) be a quantum state. Then:
E_{U∼ν}[(U ⊗ U^*) ρ (U ⊗ U^*)†] = (1 - β) I/d^2 + β |Ω̃⟩⟨Ω̃|,
where |Ω̃⟩⟨Ω̃| = d^{-1}|Ω⟩⟨Ω| and β = (d^2 ⟨Ω̃|ρ|Ω̃⟩ - 1)/(d^2 - 1).
Proof. By considering the partial transpose over the second subsystem and the 2-design formula (Eq. (51)), we obtain the claimed expression, since the twirl preserves the trace and the overlap with |Ω̃⟩⟨Ω̃| (note that (U ⊗ U^*)|Ω⟩ = |Ω⟩).
Example 50 (Average purity of reduced states). Let ν be a state 2-design distribution over C^{d_A} ⊗ C^{d_B}, and for |ψ⟩ ∼ ν let ρ_A := Tr_B(|ψ⟩⟨ψ|). Then:
E_{|ψ⟩∼ν}[Tr(ρ_A^2)] = (d_A + d_B)/(d_A d_B + 1).
Proof. We can express the expected value as follows:
E[Tr(ρ_A^2)] = E[Tr((|ψ⟩⟨ψ|)^{⊗2} (F_A ⊗ I_B^{⊗2}))],
where in the first equality we used the swap-trick on the two copies of the subsystem A and denoted by F_A the Flip operator acting on such two copies, and in the second equality we used that ρ_A = Tr_B(|ψ⟩⟨ψ| (I_A ⊗ I_B)). In tensor diagrams this is also easy to visualize. Since ν is a 2-design, we have E[(|ψ⟩⟨ψ|)^{⊗2}] = (I + F)/(d(d+1)) with d = d_A d_B, and evaluating the two traces, Tr(F_A ⊗ I_B^{⊗2}) = d_A d_B^2 and Tr(F (F_A ⊗ I_B^{⊗2})) = d_A^2 d_B, yields the claimed formula.
Now let us consider a system as in the previous example, where d_A ≤ d_B. Since the entanglement entropy S(ρ_A) := -Tr(ρ_A log_2 ρ_A) is lower bounded by the 2-Renyi entropy S_2(ρ_A) := -log_2 Tr(ρ_A^2) (Fact 1 in [82]), we have:
E[S(ρ_A)] ≥ E[-log_2 Tr(ρ_A^2)] ≥ -log_2 E[Tr(ρ_A^2)],
where we used Jensen's inequality with the fact that -log_2(x) is a convex function. By using the result from the previous Example 50, we have:
E[S(ρ_A)] ≥ -log_2((d_A + d_B)/(d_A d_B + 1)) ≥ log_2(d_A) - log_2(1 + d_A/d_B).
Since S(ρ_A) ≤ log_2(d_A), we conclude that the expected value of the entanglement entropy of the reduced state of a pure state distributed according to a 2-design is close to its maximum value. However, it is important to note that for finite system sizes, there exists a gap between the exact average entanglement entropy and the maximum value, as demonstrated by Page [83,84]:
E_{μ_H}[S(ρ_A)] = (1/ln 2)(Σ_{k=d_B+1}^{d_A d_B} 1/k - (d_A - 1)/(2 d_B)).
This value is commonly referred to as the Page entropy. For a more in-depth analysis, see [41], where higher Renyi entropies are also discussed.
Example 51 (Expectation value). Consider a unitary k-design distribution denoted by ν, a fixed state |ψ⟩ ∈ C^d and an operator O ∈ L(C^d). Then:
E_{U∼ν}[⟨ψ|U† O U|ψ⟩^k] = Tr(O^{⊗k} P_sym^{(d,k)}) / (d+k-1 choose k).
In the case of k = 2, it simplifies to:
E_{U∼ν}[⟨ψ|U† O U|ψ⟩^2] = (Tr(O)^2 + Tr(O^2))/(d(d+1)).
Proof. This follows easily from Eq. (95), since U|ψ⟩ is distributed according to a state k-design. In particular, for k = 1, we have E_{U∼ν}[⟨ψ|U† O U|ψ⟩] = Tr(O)/d.

Concentration inequalities
By utilizing concentration inequalities, such as Markov's inequality, it is possible to establish a connection between the exponential decay of an expected value with the number of qubits and a probabilistic statement. Markov's inequality states that for a non-negative random variable X and any ε > 0, the probability that X exceeds ε is bounded by the ratio of the expected value of X to ε:
Pr[X ≥ ε] ≤ E[X]/ε.
In a more general form, if g is a strictly increasing non-negative function, the inequality can be expressed as:
Pr[X ≥ ε] = Pr[g(X) ≥ g(ε)] ≤ E[g(X)]/g(ε).
As an example, consider P ∈ {I, X, Y, Z}^{⊗n} \ {I^{⊗n}}, where X, Y, Z are Pauli matrices. By applying Markov's inequality with g(x) = x^2 and utilizing Example 51, we find that when sampling a (Haar) random state |ψ⟩, the probability that the expectation value of P exceeds a threshold ε > 0 decays exponentially with the number of qubits n:
Pr[|⟨ψ|P|ψ⟩| ≥ ε] ≤ E[⟨ψ|P|ψ⟩^2]/ε^2 = 1/((d+1)ε^2) = 1/((2^n + 1)ε^2).
The same holds if the expected value is taken with respect to a state 2-design.
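These expressions can be probed by direct sampling. The sketch below estimates E[⟨ψ|P|ψ⟩²] = 1/(d+1) for P = Z⊗Z⊗Z over Haar random states and compares the empirical tail with the Markov bound (a seeded Monte Carlo; the tolerances are intentionally loose):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
d = 2 ** n

# Traceless Pauli string P = Z ⊗ Z ⊗ Z.
Z = np.diag([1.0, -1.0])
P = Z
for _ in range(n - 1):
    P = np.kron(P, Z)

def haar_state(d, rng):
    # Normalized complex Gaussian vector is a Haar random state.
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

N = 20000
vals = np.array([np.real(np.vdot(psi, P @ psi))
                 for psi in (haar_state(d, rng) for _ in range(N))])

second_moment = vals.mean() if False else np.mean(vals ** 2)  # ≈ 1/(d+1)
eps = 0.5
empirical_tail = np.mean(np.abs(vals) >= eps)
markov_bound = second_moment / eps ** 2   # Markov with g(x) = x^2
```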
Another concentration inequality that can be particularly useful when analyzing the averages over Haar-random states is Levy's lemma.
Lemma 53 (Levy's lemma [85]). Consider the set of pure states in C^d, identified with the unit sphere S^{2d-1} ⊂ R^{2d}, and a function f : S^{2d-1} → R with Lipschitz constant L (with respect to the Euclidean norm). For all ε ≥ 0, we have the probability bound:
Pr_{|ψ⟩∼μ_H}[|f(ψ) - E[f]| ≥ ε] ≤ 2 exp(-(2 d ε^2)/(9 π^3 L^2)).
Using Levy's lemma, we can improve the dependence on the number of qubits n to a double exponential form compared to what we obtained by a simple application of Markov's inequality in Eq. (260), as shown in the next example. However, this double exponential form comes at the cost of requiring the state to be drawn from the Haar measure rather than a state 2-design.
Example 54. Let O ∈ Herm(C^d) be a Hermitian operator. For all ε ≥ 0, we have:
Pr_{|ψ⟩∼μ_H}[|⟨ψ|O|ψ⟩ - Tr(O)/d| ≥ ε] ≤ 2 exp(-(d ε^2)/(18 π^3 ||O||_∞^2)).
In particular, if O is a Pauli string P ∈ {I, X, Y, Z}^{⊗n} \ {I^{⊗n}}, we have:
Pr_{|ψ⟩∼μ_H}[|⟨ψ|P|ψ⟩| ≥ ε] ≤ 2 exp(-(2^n ε^2)/(18 π^3)).
Proof. To apply Levy's lemma, we consider the function f(ψ) = ⟨ψ|O|ψ⟩ and compute its expected value and Lipschitz constant. First, we observe that E[f] = Tr(O)/d. Next, we determine the Lipschitz constant. We have
|f(u) - f(v)| = |Tr(O(|u⟩⟨u| - |v⟩⟨v|))| ≤ ||O||_∞ |||u⟩⟨u| - |v⟩⟨v|||_1,
where we used the matrix Hölder inequality. We then observe that:
|||u⟩⟨u| - |v⟩⟨v|||_1 ≤ 2 |||u⟩ - |v⟩||_2.
This can be seen by considering Q := |u⟩⟨u| - |v⟩⟨v|. The Hermitian matrix Q has a rank of at most 2, which means it can have at most two non-zero eigenvalues, denoted as λ_1 and λ_2. Since the trace of Q is zero, we have λ_2 = -λ_1. Additionally, we know that Tr(Q^2) = 2λ_1^2 = 2(1 - |⟨u|v⟩|^2), so ||Q||_1 = 2|λ_1| = 2√(1 - |⟨u|v⟩|^2) ≤ 2|||u⟩ - |v⟩||_2. Hence f is Lipschitz with constant L ≤ 2||O||_∞, and the claim follows from Levy's lemma. More generally, using Markov's inequality with g(x) = x^k, we can derive bounds based on the k-th moment of a random variable. This approach is particularly useful in situations involving unitary k-designs, where we can often compute moments only up to a given order.
Another useful inequality when averaging over the full Haar measure is the following.
To illustrate an application of this, let us consider an example provided also in the notes of Richard Kueng [45].
Proof. The claim follows by a direct computation, using the result of Example 52.
For more in-depth exploration of the applications of concentration inequalities in the context of Haar random unitaries and k-designs, see [39,45,49,86].

Applications in Quantum Machine Learning
Let us now analyze an application of unitary designs in the context of Quantum Machine Learning, specifically in Variational Quantum Algorithms (VQAs) [32]. In VQAs, the problem at hand is formulated as the minimization of a cost function. This cost function is typically constructed using the expectation value of an observable, which is estimated on a quantum computer using a prepared quantum state that depends on the parameters of the quantum gates. These gate parameters are optimized with a classical optimizer with the goal of minimizing the cost function.
Specifically, consider a Hilbert space H associated with an n-qubit system of dimension d = 2^n. In VQAs, it is customary to define the cost function as:
C(θ) := Tr[O U(θ) ρ_0 U(θ)†],
where O ∈ Herm(H) represents an observable, ρ_0 ∈ S(H) is a fixed initial state, and U(θ) denotes the parameterized unitary transformation (or ansatz) defined as:
U(θ) := Π_{l=1}^{L} e^{-i θ_l H_l}.
Here, L is a positive integer representing the total number of parametrized quantum gates, θ_l ∈ R are the parameters associated with the parametrized gates, and H_l ∈ Herm(H) are their Hamiltonian generators. In the context of VQAs, it is often assumed that the observable O can be expressed as a linear combination of Pauli operators, O = Σ_i c_i P_i, where P_i ∈ {I, X, Y, Z}^{⊗n}; in particular, we assume here that O is traceless and that Tr(O^2) ∈ O(poly(n) 2^n).
Observation 56. Let ν be the probability distribution of the unitaries U(θ). If ν is a 2-design, then E[C(θ)] = 0 and Var[C(θ)] ∈ O(poly(n)/2^n).
Proof. By Eq. (50), we have:
E[C(θ)] = Tr(O) Tr(ρ_0)/d = 0,
where in the last step we used that O is traceless. The second claim follows from Var[C(θ)] = E[C(θ)^2] and Eq. (51):
Var[C(θ)] = c_{I,ρ_0^{⊗2}} Tr(O)^2 + c_{F,ρ_0^{⊗2}} Tr(O^2) = c_{F,ρ_0^{⊗2}} Tr(O^2),
where in the last step we used that O is traceless. The proof is concluded by noting that c_{F,ρ_0^{⊗2}} ≤ 1/(2^n(2^n + 1)) and using that Tr(O^2) ∈ O(poly(n) 2^n).
We can then apply Chebyshev's inequality, which states that for all ε > 0 we have:
Pr[|C(θ) - E[C(θ)]| ≥ ε] ≤ Var[C(θ)]/ε^2.
This inequality provides an upper bound on the probability of encountering a point in the parameter space where the cost function deviates from its expected value by more than ε. In particular, the probability of finding a point with a cost function larger (in modulus) than ε decays exponentially with the number of qubits:
Pr[|C(θ)| ≥ ε] ∈ O(poly(n)/(2^n ε^2)).
Similarly, with slightly more involved calculations, we can show that the exponential decay also applies to the variance of the partial derivatives of the cost function. This phenomenon, where the variance of the partial derivatives of the cost function decays exponentially with the number of qubits n, is commonly referred to as Barren Plateaus [30].
To analyze the partial derivatives of the cost function, we can express the parameterized unitary circuit U(θ) as the product of two unitary operators, U(θ) = U_A U_B, where U_A := Π_{l=μ+1}^{L} e^{-iθ_l H_l} and U_B := Π_{l=1}^{μ} e^{-iθ_l H_l}. Consequently, we can write the partial derivative of the cost function as follows:
∂_μ C(θ) = i Tr([U_B ρ_0 U_B†, H_μ] U_A† O U_A),
where we denoted by ∂_μ the partial derivative with respect to θ_μ, and we used that ∂_μ U(θ) = -i U_A H_μ U_B and the cyclicity of the trace. Using this expression for the partial derivative of the cost function, we can prove the following:
Observation 57. Let ν_A, ν_B be probability distributions defined over the sets of unitaries {U_A(θ)}_{θ∈R^{L-μ}} and {U_B(θ)}_{θ∈R^{μ}}, respectively. Suppose that both ν_A and ν_B are 2-design distributions, and that H_μ is traceless. In this case, the following properties hold:
E[∂_μ C(θ)] = 0 and Var[∂_μ C(θ)] ∈ O(poly(n)/2^n).
Proof. We have:
E[∂_μ C(θ)] = i E_{U_B}[Tr([U_B ρ_0 U_B†, H_μ] E_{U_A}[U_A† O U_A])] = i (Tr(O)/d) E_{U_B}[Tr([U_B ρ_0 U_B†, H_μ])] = 0,
where we used that O is traceless (and that the trace of a commutator is always zero). Thus, for the variance, we have to compute Var[∂_μ C(θ)] = E[(∂_μ C(θ))^2]. Averaging first over U_A with Eq. (51), using again that the trace of a commutator is always zero, and then averaging over U_B with Eq. (51) along with the fact that H_μ is traceless (together with the swap-trick), one obtains an expression of order Tr(O^2) Tr(H_μ^2)/2^{3n}, up to poly(n) factors. We can conclude by using that Tr(O^2) ∈ O(poly(n) 2^n) and Tr(H_μ^2) ∈ O(poly(n) 2^n). Using again Chebyshev's inequality, we can conclude that the probability of finding a point in the parameter space with a partial derivative larger than a threshold ε decreases exponentially with the number of qubits. Specifically, we have:
Pr[|∂_μ C(θ)| ≥ ε] ≤ Var[∂_μ C(θ)]/ε^2 ∈ O(poly(n)/(2^n ε^2)).
There are several recent works that leverage Haar integration to compute the concentration of variational cost functions. For example, [87-89] extend the Barren Plateaus diagnosing method (variance computation) to the case where the variational circuit lives in a unitary subgroup (over which it forms 2-designs), which is particularly relevant in the case of symmetric ansatzes [90,91]. Additionally, see [92-94] for cost-concentration analyses in the context of noisy circuits.

Classical shadow tomography
The classical shadow protocol, introduced by Huang et al. [1], is a method for predicting properties of quantum systems based on the theory of k-designs. In this tutorial, we will explore the classical shadow protocol as an example of application. Let ρ ∈ S(C^d) be a quantum state of n qubits, and let O_1, ..., O_M ∈ Herm(C^d) be Hermitian operators, where d = 2^n. The goal is to estimate Tr(O_1 ρ), ..., Tr(O_M ρ) with a desired accuracy and probability of success. We assume that the full classical description of the state ρ is unknown, but that it can be "queried" on a quantum device multiple times. When the state ρ is queried, a unitary U is sampled randomly from a probability distribution μ and applied to ρ. The resulting state, U ρ U†, is measured in the computational basis {|b⟩}_{b∈[d]}. Information about the sampled unitary U and the measurement outcome |b⟩ is stored in a classical memory, which can be done efficiently for appropriately chosen distributions of unitaries. The state U†|b⟩⟨b|U is referred to as a classical snapshot. Now, the expected value of the classical snapshot E[U†|b⟩⟨b|U] is considered, where U is distributed according to the probability distribution μ, and b is distributed according to the Born rule probability ⟨b|U ρ U†|b⟩. More explicitly, we can associate to E[U†|b⟩⟨b|U] a completely positive trace-preserving linear map M(ρ), denoted as the measurement channel:
M(ρ) := E_{U∼μ}[Σ_{b∈[d]} ⟨b|U ρ U†|b⟩ U†|b⟩⟨b|U].
Assuming that M is invertible, ρ̂ := M^{-1}(U†|b⟩⟨b|U) serves as an unbiased estimator for ρ, meaning E[ρ̂] = ρ. The matrix ρ̂ is commonly known as the classical shadow of the state ρ. Consequently, ô_i := Tr(O_i ρ̂) is an unbiased estimator for Tr(O_i ρ):
E[ô_i] = Tr(O_i ρ).
For appropriately chosen probability distributions μ, the estimator ô_i can be efficiently computed classically.
To estimate the number N of copies of ρ (sample complexity) needed to achieve an additive accuracy ε > 0 in the estimation of Tr(O_i ρ) for all i ∈ [M], with a failure probability of at most δ > 0, it is important to bound the variance of the estimator:
Var(ô_i) = E[ô_i^2] - E[ô_i]^2.
If the median of means [1,95] is used as the estimator to post-process the data ô_i for each i ∈ [M], then a number of copies
N ∈ O(log(M/δ) max_i Var(ô_i)/ε^2)
is enough to estimate, for each i ∈ [M], Tr(O_i ρ) up to precision ε with success probability at least 1 - δ. Therefore, the essential ingredients are the construction of the estimator through the inversion of the measurement channel M(·) and bounding the variance. Below we observe that the measurement channel M(·) is related to the second moment operator over the probability distribution μ, while the variance is related to a third moment.
For the variance in Eq. (303), we need to consider E[ô_i²] = E[Tr(O_i ρ̂)²]. We first observe that

Tr(A M(B)) = E_{U∼µ} Σ_{b∈[d]} ⟨b| U A U† |b⟩ ⟨b| U B U† |b⟩ = E_{U∼µ} Σ_{b∈[d]} Tr[ (A ⊗ B) (U† |b⟩⟨b| U)^{⊗2} ],

so that Tr(A M(B)) is governed by the second moment of U ∼ µ. We will now consider the unitary probability distribution corresponding to the uniform distribution over the Clifford group, which we defined in Eq. (188). It is important to remember that any unitary of the Clifford group can be implemented with O(n²/log(n)) elementary gates [72], and that there are efficient algorithms to sample uniformly from the Clifford group [73,74].
Since the uniform distribution over the Clifford group is an exact 3-design, its first three moments coincide with those of the Haar measure. Thus, we need to insert into Eq. (305) the formula for the second moment over the Haar measure to find the expression of the measurement channel M(ρ), and then invert it.
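As a concrete single-qubit (d = 2) sanity check, one can average the measurement channel exactly over the 24 single-qubit Cliffords and verify both the 2-design formula M(ρ) = (ρ + Tr(ρ) I)/(d+1) and its inverse M^{-1}(A) = (d+1)A − Tr(A) I. The helper names below are ours, and the group is generated by closing {H, S} under multiplication modulo global phase:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

def normalize_phase(U):
    # Remove the global phase so group elements can be compared as dictionary keys.
    idx = np.argmax(np.abs(U) > 1e-9)
    phase = U.flat[idx] / np.abs(U.flat[idx])
    return np.round(U / phase, 12) + 0.0  # +0.0 maps -0.0 to +0.0

def clifford_group_1q():
    # Generate the 24 single-qubit Cliffords (mod phase) by BFS closure of {H, S}.
    eye = np.eye(2, dtype=complex)
    group = {normalize_phase(eye).tobytes(): eye}
    frontier = [eye]
    while frontier:
        new = []
        for U in frontier:
            for g in (H, S):
                V = g @ U
                key = normalize_phase(V).tobytes()
                if key not in group:
                    group[key] = V
                    new.append(V)
        frontier = new
    return list(group.values())

def measurement_channel(rho, unitaries):
    # M(rho) = E_U sum_b <b|U rho U†|b> U†|b><b|U, averaged uniformly over `unitaries`.
    d = rho.shape[0]
    out = np.zeros((d, d), dtype=complex)
    for U in unitaries:
        for b in range(d):
            p = (U @ rho @ U.conj().T)[b, b]
            out += p * np.outer(U.conj().T[:, b], U[b, :])
    return out / len(unitaries)
```

Because the average runs over the whole finite group, the check is exact up to floating-point error, with no Monte Carlo noise.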
To bound the variance we need to compute a third moment over the Haar measure of the unitary group, due to the 3-design property of the Clifford group.
where we used the tensor network representation of permutations defined in Section 6. Substituting this expression into the first term of Eq. (319), up to a global factor ((d + 1)(d + 2))^{-1}, we obtain the desired expression. To simplify the calculations, we can utilize a trick exploited in [1], namely that the variance Var(ô_i) depends only on the traceless part of the operator O_i, which is defined as O_i^{(0)} := O_i − (Tr(O_i)/d) I. Using the previous bound on the variance, we have that a number of copies

N = O( log(M/δ) ε^{-2} max_{i∈[M]} Tr(O_i²) )  (330)

suffices to estimate, for each i ∈ [M], Tr(O_i ρ) up to precision ε with success probability at least 1 − δ. Therefore, if we want to estimate observables with bounded 2-norm, the protocol is sample-efficient. However, it is important to emphasize that sample efficiency does not imply that the protocol described is computationally efficient; in fact, to guarantee the latter, we must also be able to compute the unbiased estimator in Eq. (302) in a time-efficient manner. This is known for observables with a particular structure, such as projectors onto stabilizer states (i.e., states constructed by applying a Clifford circuit to a computational basis state); in this case, one can take advantage of the fact that the overlap between two stabilizer states can be computed efficiently on a classical computer [72]. See [1] for more details, and also for the interesting case in which one considers the uniform distribution over the set of unitaries formed by tensor products of single-qubit Clifford gates, better known as the "random Pauli basis". In this case, it turns out that the shadow protocol is both sample- and time-efficient for local observables, such as Paulis supported on a constant number of qubits. To explore additional examples of unitary distributions, one may refer, for example, to [96], which discusses the utilization of the distribution of fermionic Gaussian unitaries, or to [97,98], where unitary distributions generated by local random quantum circuits are
investigated. Additionally, [99] analyzes Pauli-invariant unitary ensembles.
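To make the pipeline concrete, the following sketch runs the full loop on a single qubit: sample a unitary (here Haar-random, which has the same second moment as the uniform Clifford distribution), measure, invert the channel to form the shadow ρ̂ = (d+1) U†|b⟩⟨b|U − I, and average ô = Tr(O ρ̂). The test state, observable, and all names are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)

def haar_unitary(d, rng):
    # Haar-random unitary via QR of a Ginibre matrix, with phases fixed.
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def shadow_estimate(rho, O, n_snapshots, rng):
    # Estimate Tr(O rho): each snapshot gives rho_hat = (d+1) U†|b><b|U - I,
    # i.e. the inverted measurement channel applied to the classical snapshot.
    d = rho.shape[0]
    total = 0.0
    for _ in range(n_snapshots):
        U = haar_unitary(d, rng)
        probs = np.real(np.diag(U @ rho @ U.conj().T))
        b = rng.choice(d, p=probs / probs.sum())
        proj = np.outer(U.conj().T[:, b], U[b, :])
        rho_hat = (d + 1) * proj - np.eye(d)
        total += np.real(np.trace(O @ rho_hat))
    return total / n_snapshots

Z = np.diag([1.0, -1.0])
rho = np.array([[0.8, 0.3], [0.3, 0.2]])  # a valid single-qubit density matrix; Tr(Z rho) = 0.6
est = shadow_estimate(rho, Z, 20000, rng)
```

The Monte Carlo average converges to Tr(Z ρ) = 0.6 at the 1/√N rate predicted by the variance bound above.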

Theorem 10 (Computing moments). Let O ∈ L((C^d)^{⊗k}). The moment operator can then be expressed as a linear combination of permutation operators:

M_H^{(k)}(O) := E_{U∼µ_H} [ U^{⊗k} O (U†)^{⊗k} ] = Σ_{π,σ∈S_k} c_{π,σ} Tr(W_π† O) W_σ,

where the coefficients c_{π,σ} are the entries of the pseudo-inverse of the Gram matrix with entries G_{π,σ} = Tr(W_π† W_σ) = d^{#cycles(π^{-1}σ)}.
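For k = 2 the commutant is spanned by W_id = I and W_swap = F, and solving the 2×2 Gram system yields E_U[U^{⊗2} O (U†)^{⊗2}] = αI + βF with α = (d Tr(O) − Tr(F O)) / (d(d² − 1)) and β = (d Tr(F O) − Tr(O)) / (d(d² − 1)). The following Monte Carlo sketch of this special case uses our own helper names (Haar sampling via QR):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2

def haar_unitary(d, rng):
    # Haar-random unitary via QR of a Ginibre matrix, with phases fixed.
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

# SWAP operator F on C^d ⊗ C^d: F |i,j> = |j,i>
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1.0

O = rng.standard_normal((d * d, d * d))  # a generic (real, for simplicity) test operator
avg = np.zeros((d * d, d * d), dtype=complex)
N = 50_000
for _ in range(N):
    U = haar_unitary(d, rng)
    U2 = np.kron(U, U)
    avg += U2 @ O @ U2.conj().T
avg /= N

# Coefficients from the k = 2 Gram system (commutant spanned by I and F)
a, b = np.trace(O), np.trace(O @ F)
alpha = (d * a - b) / (d * (d * d - 1))
beta = (d * b - a) / (d * (d * d - 1))
```

One can check the consistency of the coefficients directly: Tr(αI + βF) = Tr(O) and Tr((αI + βF)F) = Tr(OF), as the Gram system demands.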

Therefore we also have Sym^k(C^d) ⊆ Im P_sym^{(d,k)}, and so Sym^k(C^d) = Im P_sym^{(d,k)}.

Proposition 21. We have (P_asym^{(d,k)})† P_sym^{(d,k)} = 0. In particular, P_asym^{(d,k)} and P_sym^{(d,k)} are orthogonal with respect to the Hilbert–Schmidt inner product.
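For k = 2 these projectors take the explicit form P_sym^{(d,2)} = (I + F)/2 and P_asym^{(d,2)} = (I − F)/2, with F the SWAP operator, so the proposition (and the traces d(d ± 1)/2) can be verified directly:

```python
import numpy as np

d = 3
# SWAP operator F on C^d ⊗ C^d: F |i,j> = |j,i>
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1.0

I = np.eye(d * d)
P_sym = (I + F) / 2    # projector onto the symmetric subspace (k = 2)
P_asym = (I - F) / 2   # projector onto the antisymmetric subspace (k = 2)
```

Since F² = I, both operators square to themselves, and their product vanishes identically.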

In the second equality we used the fact that P_sym^{(d,2)} and P_asym^{(d,2)} form an orthogonal basis for the commutant; we then utilized the fact that P_sym^{(d,2)} and P_asym^{(d,2)} are orthogonal projectors (meaning that they square to themselves and are Hermitian); and in the third equality we substituted P_sym^{(d,2)} = (1/2)(I + F) and P_asym^{(d,2)} = (1/2)(I − F), along with their respective traces, to arrive at the desired formula. Now we show a theorem stating that, for every quantum state |ϕ⟩ in a d-dimensional Hilbert space, the moment operator of |ϕ⟩⟨ϕ|^{⊗k} is a uniform linear combination of permutations, which can be written in terms of the projector onto the symmetric subspace P_sym^{(d,k)}.

Theorem 22. Let d, k ∈ N. For all |ϕ⟩ ∈ C^d, the moment operator is a uniform linear combination of permutations:

E_{U∼µ_H} [ (U |ϕ⟩⟨ϕ| U†)^{⊗k} ] = P_sym^{(d,k)} / Tr(P_sym^{(d,k)}) = (1 / (d(d+1)···(d+k−1))) Σ_{π∈S_k} W_π.

Given A = Σ_{i,j=1}^d A_{i,j} |i⟩⟨j| ∈ L(C^d), we define vec(A) := Σ_{i,j=1}^d A_{i,j} |i⟩ ⊗ |j⟩. This justifies the name "vectorization", as the linear map vec(·) sends the d × d matrix A to the d²-dimensional vector vec(A). Importantly, vec(·) is a bijection, meaning that for all |v⟩ ∈ (C^d)^{⊗2} there exists a unique A ∈ L(C^d) such that |v⟩ = vec(A) (in fact, one can take A = Σ_{i,j=1}^d ⟨i, j|v⟩ |i⟩⟨j|) and vice versa. It is often convenient to express vec(A) in the form vec(A) = (A ⊗ I)|Ω⟩, where |Ω⟩ := vec(I) = Σ_{i=1}^d |i, i⟩ is the vectorization of the identity matrix, i.e., the non-normalized maximally entangled state. To simplify the notation, we use the shorthand |A⟩⟩ := vec(A). Another useful property is that the canonical inner product between two vectorized operators A and B is equal to their Hilbert–Schmidt scalar product, i.e., ⟨⟨A|B⟩⟩ = ⟨A, B⟩_HS, where ⟨⟨A|B⟩⟩ := vec(A)† vec(B) is the canonical inner product. Using the so-called transpose trick (A ⊗ I)|Ω⟩ = (I ⊗ A^T)|Ω⟩, which follows from ⟨a|b⟩ = δ_{a,b} with δ_{a,b} the Kronecker delta, we have the ABC-rule: |ABC⟩⟩ = (A ⊗ C^T)|B⟩⟩.
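With the convention vec(Σ_{ij} A_{ij} |i⟩⟨j|) = Σ_{ij} A_{ij} |i⟩ ⊗ |j⟩, vectorization is just a row-major reshape, and the stated properties can be checked numerically (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def vec(A):
    # Row-major reshape implements vec(A) = sum_ij A_ij |i> ⊗ |j>.
    return A.reshape(-1)

A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
C = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

omega = vec(np.eye(d))  # |Omega> = sum_i |i,i>, the unnormalized maximally entangled state
```

All four identities (the ABC-rule, vec(A) = (A ⊗ I)|Ω⟩, the transpose trick, and ⟨⟨A|B⟩⟩ = Tr(A†B)) hold exactly, not just on average.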

Now Eq. (160) follows by observing that for all |ψ⟩ ∈ (C^d)^{⊗2k} there exists O ∈ L((C^d)^{⊗k}) such that |O⟩⟩ = |ψ⟩. Similarly, the converse holds.

Observation 29. If ν is a (k + 1)-design, then ν is also a k-design.

Proof. Let O = O′ ⊗ I_d, where O′ ∈ L((C^d)^{⊗k}). The claim follows by applying Definition 27 with O so defined and taking the partial trace with respect to the (k + 1)-th tensor product space.

Noting that c_{I, ρ^{T_B}} = p_ρ d^{-2} and c_{F, ρ^{T_B}} = (1 − p_ρ) d^{-1}, we obtain the desired result. Now, let us consider the problem of evaluating the expected purity of the reduced state obtained from a pure state distributed according to a 2-design distribution [81].

Example 50 (Purity). Consider the complex Hilbert space of a two-qudit system H_A ⊗ H_B with dimensions d_A = dim(H_A) and d_B = dim(H_B). Given |ψ⟩ ∈ H_A ⊗ H_B, let ρ_A := Tr_B(|ψ⟩⟨ψ|). We have:

E_{|ψ⟩} [ Tr(ρ_A²) ] = (d_A + d_B) / (d_A d_B + 1).
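The average-purity formula E[Tr(ρ_A²)] = (d_A + d_B)/(d_A d_B + 1) can be checked by Monte Carlo over Haar-random bipartite pure states; the dimensions below are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(3)
dA, dB = 2, 3
N = 20_000

acc = 0.0
for _ in range(N):
    # Haar-random pure state on C^{dA} ⊗ C^{dB}: normalized Gaussian vector.
    psi = rng.standard_normal(dA * dB) + 1j * rng.standard_normal(dA * dB)
    psi /= np.linalg.norm(psi)
    M = psi.reshape(dA, dB)        # coefficient matrix; rho_A = M M†
    rhoA = M @ M.conj().T
    acc += np.real(np.trace(rhoA @ rhoA))

avg_purity = acc / N               # analytic value: (dA + dB) / (dA*dB + 1) = 5/7
```

Reshaping |ψ⟩ into a d_A × d_B matrix M makes the partial trace a one-liner, since ρ_A = M M†.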
Definition 23 (Haar measure on states). Given a fixed state |ϕ⟩ in C^d, we denote by E_{|ψ⟩∼µ_H} the expectation over states |ψ⟩ = U|ϕ⟩ with U ∼ µ_H, so that

E_{|ψ⟩∼µ_H} [ |ψ⟩⟨ψ|^{⊗k} ] := E_{U∼µ_H} [ (U |ϕ⟩⟨ϕ| U†)^{⊗k} ].

Note that the right invariance of the Haar measure implies that the definition of E_{|ψ⟩∼µ_H}[|ψ⟩⟨ψ|^{⊗k}] does not depend on the choice of |ϕ⟩. Moreover, due to Theorem 22, we have:

E_{|ψ⟩∼µ_H} [ |ψ⟩⟨ψ|^{⊗k} ] = P_sym^{(d,k)} / Tr(P_sym^{(d,k)}).

Hence, we have |f(u) − f(v)| ≤ 2∥O∥_∞ ∥u − v∥_2. By applying Levy's lemma with the Lipschitz constant of f equal to 2∥O∥_∞, we can conclude.
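Levy's lemma predicts that f(ψ) = ⟨ψ|O|ψ⟩ concentrates around its mean increasingly sharply as d grows; for a traceless O with ∥O∥_∞ = 1 and Tr(O²) = d, the exact Haar variance is Tr(O²)/(d(d+1)) = 1/(d+1), so one can watch the empirical variance shrink with d. A small sketch (function names are ours):

```python
import numpy as np

rng = np.random.default_rng(5)

def empirical_var(d, n_samples, rng):
    # O = diag(+1,...,+1,-1,...,-1): traceless, ||O||_inf = 1, Tr(O^2) = d.
    O_diag = np.concatenate([np.ones(d // 2), -np.ones(d // 2)])
    vals = np.empty(n_samples)
    for t in range(n_samples):
        # Haar-random pure state: normalized complex Gaussian vector.
        psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
        psi /= np.linalg.norm(psi)
        vals[t] = np.sum(O_diag * np.abs(psi) ** 2)  # <psi|O|psi>
    return vals.var()

v_small = empirical_var(4, 5000, rng)   # theory: 1/(d+1) = 0.2
v_large = empirical_var(64, 5000, rng)  # theory: 1/65 ≈ 0.015
```

The fluctuations of the expectation value decay as ~1/d, which is the concentration-of-measure phenomenon that Levy's lemma quantifies with exponential tail bounds.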