Matrix concentration inequalities and efficiency of random universal sets of quantum gates

For a random set $\mathcal{S} \subset U(d)$ of quantum gates we provide bounds on the probability that $\mathcal{S}$ forms a $\delta$-approximate $t$-design. In particular we have found that for $\mathcal{S}$ drawn from an exact $t$-design the probability that it forms a $\delta$-approximate $t$-design satisfies the inequality $\mathbb{P}\left(\delta \geq x \right)\leq 2D_t \, \frac{e^{-|\mathcal{S}| x \, \mathrm{arctanh}(x)}}{(1-x^2)^{|\mathcal{S}|/2}} = O\left( 2D_t \left( \frac{e^{-x^2}}{\sqrt{1-x^2}} \right)^{|\mathcal{S}|} \right)$, where $D_t$ is a sum over dimensions of unique irreducible representations appearing in the decomposition of $U \mapsto U^{\otimes t}\otimes \bar{U}^{\otimes t}$. We use our results to show that to obtain a $\delta$-approximate $t$-design with probability $P$ one needs $O( \delta^{-2}(t\log(d)-\log(1-P)))$ many random gates. We also analyze how $\delta$ concentrates around its expected value $\mathbb{E}\delta$ for random $\mathcal{S}$. Our results are valid for both symmetric and non-symmetric sets of gates.


Introduction and summary of main results
Practical realisations of quantum computers are constrained by noise and decoherence that affect large-scale quantum systems. Although quantum error correction codes can overcome those effects, they require thousands of physical qubits to implement a single logical, noiseless, fault-tolerant qubit [1]. This is clearly out of reach for contemporary quantum computers, whose number of physical qubits is of the order of hundreds [2]. Hence, in the near future we are forced to work with noisy intermediate-scale quantum (NISQ) devices [2][3][4]. Moreover, the best current error rates per gate are of the order of 0.1% [5,6], which implies that we cannot build circuits much longer than a thousand gates [2]. The length of a circuit is also limited by the coherence time and by the execution time of a single gate. It is noteworthy that it is hard to make gates that are both fast and have low error rates [5]. Clearly, there is a great need for quantum computation using as few gates as possible. One way of achieving this is to use universal sets of gates (gate-sets) of high efficiency, i.e. sets that can approximate any unitary with circuits of minimal length. This idea is also connected to the complexity of unitaries (see [7] for more details).
The efficiency of a universal set S (see [8][9][10] for criteria that allow deciding universality) is typically measured by the length of a circuit needed to approximate any quantum transformation with a given precision ε. The Solovay-Kitaev theorem states that all symmetric universal sets (i.e. sets closed under taking inverses) are roughly equally efficient. More precisely, the length of a circuit that ε-approximates any U ∈ SU(d) is bounded by $A(S)\log^c(1/\epsilon)$ [11], where c ≥ 1. There have recently been some new developments connected to the Solovay-Kitaev theorem for gate-sets without inverses. First, in [12,13] it was shown that an ε-approximating circuit of poly-logarithmic length exists even if one drops the assumption that S is symmetric. Moreover, in [14] an algorithm implementing this sequence was given. To estimate the value of A(S) one can use the concept of δ-approximate t-designs [13,15]. To this end, let $\{S, \nu_S\}$ be an ensemble of quantum gates, where S is a finite subset of U(d) and $\nu_S$ is a probability measure on S. Such an ensemble is called a δ-approximate t-design if and only if
$$\delta(\nu_S,t) := \left\|T_{\nu_S,t} - T_{\mu,t}\right\| \leq \delta,$$
where ‖·‖ is the operator norm and for any measure ν (in particular for the Haar measure μ) we define a moment operator
$$T_{\nu,t} := \int_{U(d)} d\nu(U)\, U^{t,t}, \qquad U^{t,t} := U^{\otimes t}\otimes\bar{U}^{\otimes t},$$
where $\bar{U}$ is the entry-wise conjugation of U. When $\delta(\nu_S,t) = 0$ we say that $\{S, \nu_S\}$ is an exact t-design. Thus (approximate) t-designs are ensembles of unitaries that (approximately) recover Haar averages of polynomials in the entries of unitaries up to order t. More precisely, any balanced polynomial of degree t on U(d) can be written as $f_A(U) = \mathrm{Tr}(AU^{t,t})$, where A is a fixed matrix of size $d^{2t}\times d^{2t}$. Assuming that $\{S, \nu_S\}$ is a δ-approximate t-design we have
$$\left|\int_{U(d)} d\nu_S(U)\, f_A(U) - \int_{U(d)} d\mu(U)\, f_A(U)\right| \leq \delta\,\|A\|_1, \quad (1)$$
where $\|A\|_1 = \mathrm{Tr}\sqrt{AA^\dagger}$. Thus $\delta(\nu_S,t)$ controls the error we make when averaging $f_A$ with respect to $\nu_S$ instead of the Haar measure. Unitaries constituting a δ-approximate t-design form ε-nets for $t \simeq d^{5/2}/\epsilon$ and $\delta \simeq (\epsilon^{3/2}/d)^{d^2}$ [13].
As a direct consequence of property (1), δ-approximate t-designs find numerous applications throughout quantum information, including randomized benchmarking [16,17], information transmission [18], quantum state discrimination [19], criteria for universality of quantum gates [10] and complexity growth [7][20][21][22]. It is also known that the constant A(S) from the Solovay-Kitaev theorem is inversely proportional to $1 - \delta(\nu_S)$, where $\delta(\nu_S) := \sup_t \delta(\nu_S,t)$, whenever $\delta(\nu_S) < 1$ [15]. Determining the value of $\delta(\nu_S)$ requires computing the norms of infinitely many operators $T_{\nu_S,t}$, which is analytically and numerically intractable. It is known, however, that $\delta(\nu_S) < 1$ under the additional assumption that the gates have algebraic entries [23,24]. In this case the constant c in the Solovay-Kitaev theorem is also equal to 1. Recent results [25][26][27][28][29] based on number-theoretic constructions give examples of universal sets with c = 1 and the smallest possible value of $\delta(\nu_S)$. The approach presented in these contributions has been unified in [29], where the author pointed out the connection of these new results to the seminal work concerning distributions of points on the sphere $S^2$ [30].
In contrast to the above-mentioned contributions, in this work we do not focus on concrete gate-sets. Instead, we aim to answer the natural question of how likely it is that a set of gates has high efficiency. To achieve this goal we need to characterize the efficiency of random universal gate-sets. Calculation of $\delta(\nu_S)$ is, of course, intractable. Therefore we follow the approach of [12,13] and consider $\delta(\nu_S,t)$ for some fixed t (which is determined by the precision ε). The results of [12,13] ensure that for a given precision ε the constant A(S) in the Solovay-Kitaev theorem is inversely proportional to $1 - \delta(\nu_S, t(\epsilon))$, where $t(\epsilon) = O(\epsilon^{-1})$, and that the constant c = 1. Therefore, to characterize the efficiency of random universal sets of gates we need to characterize the probability distribution of $\delta(\nu_S,t)$.
What remains is to make precise what kind of random gate-sets we want to consider. As there is a natural uniform measure on the unitary group, namely the Haar measure, one can consider two types of gate-sets:
1. $S = \{U_1, \ldots, U_n\}$, where the $U_k$'s are independent and Haar random unitaries from U(d). Such S will be called a Haar random gate-set.
2. $S = \{U_1, \ldots, U_n, U_1^{-1}, \ldots, U_n^{-1}\}$, where the $U_k$'s are independent and Haar random unitaries from U(d). Such S will be called a symmetric Haar random gate-set.
Another choice would be to start with an ensemble $\{D, \nu_D\}$ that forms an exact t-design and consider two sets:
3. $S = \{U_1, \ldots, U_n\} \subset D$, where the $U_k$'s are independent and distributed according to $\nu_D$. Such S will be called a t-random gate-set.
4. $S = \{U_1, \ldots, U_n, U_1^{-1}, \ldots, U_n^{-1}\}$, where $\{U_1, \ldots, U_n\} \subset D$ and the $U_k$'s are independent and distributed according to $\nu_D$. Such S will be called a symmetric t-random gate-set.
We note that putting D = U (d) and ν D = µ, where µ is the Haar measure on U (d), we get that a (symmetric) Haar random gate-set is a (symmetric) t-random gate set for any t. Thus (symmetric) Haar random gate-sets are (symmetric) ∞-random gate-sets and all the results that we prove for the (symmetric) t-random gate-sets are automatically true for (symmetric) Haar random gate-sets.
To simplify the notation we will often denote the cardinality of S by S (instead of |S|). Our main results are given in the form of bounds on the probability $\mathbb{P}(\delta(\nu_S,t) \geq \delta)$, where S is a (symmetric) t-random gate-set and $\nu_S$ is the uniform measure on S. To obtain them we first show that $T_{\nu_S,t}$ decomposes as a direct sum over irreducible representations of the unitary group U(d). These representations can be labeled by the elements λ of a subset $\Lambda_t \subset \mathbb{Z}^d$ (see formula (23)). The blocks appearing in this decomposition, which we denote by $T_{\nu_S,\lambda}$, have dimensions $d_\lambda$ given by the Weyl dimension formula (20). Using the union bound we reduce the problem to finding bounds on $\mathbb{P}(\delta(\nu_S,\lambda) \geq \delta)$. We achieve this by combining recently derived matrix concentration inequalities [31] with a recent result concerning the calculation of higher Frobenius-Schur indicators for semisimple Lie algebras [32], which we further improve. Our main results are Theorems 1 and 2, which give concrete, calculable bounds on $\mathbb{P}(\delta(\nu_S,t) \geq \delta)$.
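Theorem 1. Let S be a t-random gate-set and $\nu_S$ the uniform measure. Then for any δ < 1
$$\mathbb{P}\left(\delta(\nu_S,t) \geq \delta\right) \leq 2\Big(\sum_{\lambda\in\Lambda_t} d_\lambda\Big)\frac{e^{-\delta S\,\mathrm{arctanh}(\delta)}}{(1-\delta^2)^{S/2}},$$
where $\Lambda_t$ is given by (23) and $d_\lambda$ is given by (20).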
These theorems are then used to obtain bounds on the size of a t-random gate-set needed to form, with probability P, a δ-approximate t-design, for various t's and δ's. We show that this size is of the order $O(\delta^{-2}(t\log(d) - \log(1-P)))$. Moreover, we compare two quantities: the number $S_n$ of independent t-random n-qubit gates needed to form a 0.01-approximate 2-design with probability 0.99, and the size $|C_n|$ of the n-qubit Clifford group, which is known to be an exact 2-design. The ratio $|C_n|/S_n$ turns out to grow exponentially with the number of qubits. In Section 6.3 we also show that Theorems 1 and 2 can be easily generalised to random quantum circuits composed of independent t-random gates, in place of a t-random set of gates. Theorem 1 can be used whenever one needs to calculate the average of a (t,t)-polynomial over U(d). Such polynomials arise, for example, in randomized benchmarking protocols. There one is interested in assessing the quality of the implementation of quantum gates and considers the k-th moments of the fidelity [33][34][35].
where Φ is a quantum channel that represents gate-independent noise and is the identity for a perfect implementation of U. Of course, one can replace the Haar measure in (5) by an exact 2k-design, as the integrated function is a (2k,2k)-polynomial. As shown in [35], an exact 2k-design has exponential size in n √ 2k, where n is the number of qubits. Using our results we can replace it by a δ-approximate t-design of the size given in Section 7. Moreover, we can control the error using (1). Finally, we analyze the concentration properties of $\delta(\nu_S,t)$ around its mean value $\mathbb{E}_{U\sim\mu}\,\delta(\nu_S,t)$.
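As a rough illustration of how the bound of Theorem 1 translates into a required number of gates, the sketch below solves $2D_t e^{-S f(\delta)} \leq 1-P$ for S. Here $f(\delta) = \delta\,\mathrm{arctanh}(\delta) + \tfrac12\log(1-\delta^2)$ is the exponent per gate read off from the bound in the abstract. The function names are ours, and $D_t = \sum_{\lambda\in\Lambda_t} d_\lambda$ has to be supplied by the user. For t = 1 the set $\Lambda_1$ contains only the adjoint representation, so $D_1 = d^2 - 1$.

```python
import math


def rate(delta):
    """Exponent per gate in the Theorem 1 bound: delta*arctanh(delta) + log(1 - delta^2)/2.

    It is positive for 0 < delta < 1 and behaves like delta**2 / 2 for small delta.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return delta * math.atanh(delta) + 0.5 * math.log1p(-delta * delta)


def tail_bound(S, delta, D_t):
    """Upper bound on P(delta(nu_S, t) >= delta) for a t-random gate-set of size S (capped at 1)."""
    log_bound = math.log(2 * D_t) - S * rate(delta)
    return 1.0 if log_bound >= 0.0 else math.exp(log_bound)


def gates_needed(delta, P, D_t):
    """Smallest S for which the bound guarantees a delta-approximate t-design with probability >= P."""
    return math.ceil((math.log(2 * D_t) - math.log(1.0 - P)) / rate(delta))


if __name__ == "__main__":
    # one qubit, t = 1: D_1 = d**2 - 1 = 3
    S = gates_needed(delta=0.1, P=0.99, D_t=3)
    print(S, tail_bound(S, 0.1, 3))
```

For δ = 0.1 and P = 0.99 on a single qubit this gives S = 1278. Since $f(\delta) \approx \delta^2/2$, the required size behaves like $2\delta^{-2}(\log(2D_t) - \log(1-P))$. Because $D_t \leq d^{2t}$ (the blocks fit inside the $d^{2t}$-dimensional space of $U^{t,t}$), this is consistent with the $O(\delta^{-2}(t\log(d) - \log(1-P)))$ scaling quoted above.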
Our main result is Theorem 3. Let S be a Haar random gate-set. Then The methods behind its proof [36] can be extended to Haar random gate-sets with a particular structure or architecture. Following this observation, we analyze the efficiency of random d-mode circuits built from 2-mode beamsplitters. More precisely, we consider the Hilbert space $\mathcal{H} = \mathcal{H}_1\oplus\ldots\oplus\mathcal{H}_d$, where $\mathcal{H}_k \simeq \mathbb{C}$ and d > 2, and we call the spaces $\mathcal{H}_k$ modes. For a matrix B ∈ SU(2), which we call a 2-mode beamsplitter, we define the matrices $B_{ij}$, i ≠ j, as follows. Each $B_{ij}$ acts on the 2-dimensional subspace $\mathcal{H}_i\oplus\mathcal{H}_j \subset \mathcal{H}$ as B and on the other components of $\mathcal{H}$ as the identity. This way a matrix B ∈ SU(2) gives d(d − 1) matrices in SU(d). Applying this procedure to a Haar random gate-set S ⊂ SU(2) we obtain the random gate-set $S_d$ (see [37,38]), and it is natural to ask about its efficiency. Our main conclusion here is that a Haar random gate-set S ⊂ SU(2) gives a gate-set $S_d \subset SU(d)$ for which $\delta(\nu_{S_d},t)$ has the same concentration rate around the mean as a Haar random gate-set S ⊂ SU(d) of size: S = 2S d, i.e. Theorem 4. Let S ⊂ SU(2) be a Haar random gate-set and $S_d \subset SU(d)$ the corresponding d-mode gate-set. Then Using similar methods one can show concentration around the mean value of any function for $U_k$'s independent and Haar random, as long as F is L-Lipschitz: where $\|\cdot\|_F$ is the Frobenius (Hilbert-Schmidt) norm. We explain this in detail in Section 2.
A function F can be, for example, the k-th moment of the fidelity of a quantum circuit of length n (see [35] for more details). The paper is organized as follows. In Section 2 we present a short review of the matrix concentration inequalities that play a central role in our setting. Next, in Section 3 we provide the necessary information concerning irreducible representations of unitary groups. Then in Section 4 we introduce the notion of moment operators. In Section 5 we explain the role of Frobenius-Schur indicators and how to compute them. The main results of this paper are presented in Section 6, while Section 7 contains an analysis of the results and applications.

Short review of relevant matrix concentration inequalities
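In this section we review known upper bounds on the probability $\mathbb{P}(F(X) \geq \delta)$ for classes of random matrices X and real-valued functions F that are relevant in our setting. The interested reader is referred to [31,36] for more details. The first class of inequalities concerns the situation when $X = \sum_k X_k$, where $X_k \in \mathrm{Mat}(d,\mathbb{C})$ are independent random Hermitian matrices and F is the operator norm of X, $F(X) = \|X\|$. Thus we are looking for an upper bound on $\mathbb{P}(\|X\| \geq \delta)$. The line of reasoning is as follows. Let $\lambda_{\max}(X)$ and $\lambda_{\min}(X)$ denote the largest and the smallest eigenvalues of X, respectively. In the first step one uses the exponential Markov inequality, i.e. $\mathbb{P}(Y \geq t) = \mathbb{P}(e^{\theta Y} \geq e^{\theta t}) \leq e^{-\theta t}\,\mathbb{E}\,e^{\theta Y}$ for any θ > 0. Taking $Y = \lambda_{\max}(X)$ and using the fact that $e^{\theta\lambda_{\max}(X)} \leq \mathrm{tr}\,e^{\theta X}$ we get
$$\mathbb{P}(\lambda_{\max}(X) \geq \delta) \leq \inf_{\theta>0} e^{-\theta\delta}\,\mathbb{E}\,\mathrm{tr}\,e^{\theta X}. \quad (8)$$
Next we note that $\mathbb{P}(\lambda_{\min}(X) \leq -\delta) = \mathbb{P}(\lambda_{\max}(-X) \geq \delta)$, hence
$$\mathbb{P}(\lambda_{\min}(X) \leq -\delta) \leq \inf_{\theta>0} e^{-\theta\delta}\,\mathbb{E}\,\mathrm{tr}\,e^{-\theta X}. \quad (9)$$
Next, using the Lieb theorem [39] we obtain
$$\mathbb{E}\,\mathrm{tr}\exp\Big(\theta\sum_k X_k\Big) \leq \mathrm{tr}\exp\Big(\sum_k \log\mathbb{E}\,e^{\theta X_k}\Big) \quad (10)$$
for any θ ∈ ℝ. One can alternatively use the Golden-Thompson inequality [40,41], but the resulting bound is in general weaker [31]. Combining (8) and (9) with (10), and using $\|X\| = \max\{\lambda_{\max}(X), -\lambda_{\min}(X)\}$ together with the union bound, we get the following master bound.

Fact 1. Let $X = \sum_k X_k$, where $X_k \in \mathrm{Mat}(d,\mathbb{C})$ are independent, random, Hermitian matrices. Then
$$\mathbb{P}(\|X\| \geq \delta) \leq \inf_{\theta>0} e^{-\theta\delta}\Big[\mathrm{tr}\exp\Big(\sum_k\log\mathbb{E}\,e^{\theta X_k}\Big) + \mathrm{tr}\exp\Big(\sum_k\log\mathbb{E}\,e^{-\theta X_k}\Big)\Big]. \quad (11)$$
A master bound can also be derived for sums of non-Hermitian random matrices. To this end, following [31], we consider the Hermitian dilation map $\mathcal{H}: \mathrm{Mat}(d,\mathbb{C}) \to \mathrm{H}(2d)$, where H(2d) is the space of 2d × 2d Hermitian matrices, given by
$$\mathcal{H}(X) = \begin{pmatrix} 0 & X\\ X^\dagger & 0\end{pmatrix}.$$
This map is clearly real linear. One can also show (cf. [31]) that $\|X\| = \|\mathcal{H}(X)\| = \lambda_{\max}(\mathcal{H}(X))$. Applying (8) and (10) to $\mathcal{H}(X) = \sum_k \mathcal{H}(X_k)$ we get:

Fact 2. Let $X = \sum_k X_k$, where $X_k \in \mathrm{Mat}(d,\mathbb{C})$ are independent random matrices. Then
$$\mathbb{P}(\|X\| \geq \delta) \leq \inf_{\theta>0} e^{-\theta\delta}\,\mathrm{tr}\exp\Big(\sum_k\log\mathbb{E}\,e^{\theta\mathcal{H}(X_k)}\Big).$$
The upper bounds in Facts 1 and 2 can be further simplified by finding a majorization of $\log\mathbb{E}\,e^{\theta X_k}$ in terms of the moments $\mathbb{E}X_k^n$ that allows analytic optimization over θ (see chapter 6 of [31] for more details). Using moments up to order two leads to the matrix Bernstein inequality.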

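Fact 3 (Matrix Bernstein inequality). Let $X = \sum_k X_k$, where $X_k \in \mathrm{Mat}(d,\mathbb{C})$ are independent random matrices such that
$$\mathbb{E}\,X_k = 0 \quad\text{and}\quad \|X_k\| \leq L \quad \text{for all } k.$$
Let v be the matrix variance statistic of the sum:
$$v = \max\Big\{\Big\|\sum_k \mathbb{E}\,X_kX_k^\dagger\Big\|, \Big\|\sum_k\mathbb{E}\,X_k^\dagger X_k\Big\|\Big\}.$$
Then for all δ ≥ 0
$$\mathbb{P}(\|X\| \geq \delta) \leq 2d\,\exp\left(\frac{-\delta^2/2}{v + L\delta/3}\right).$$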

Bounds for Haar random matrices
The second type of inequalities we consider in this paper relies on the fact that the randomness comes from the Haar measure. The interested reader is referred to chapter 5 of [36] for a detailed discussion. Here we only give the main result. Its proof is based on the fact that, by the Bakry-Émery curvature criterion, the (special) unitary group equipped with the Haar measure satisfies the so-called logarithmic Sobolev inequality. Via the Herbst argument, this leads to concentration of measure for Lipschitz functions.
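Fact 4. Let $G^S := G \times \cdots \times G$ (S factors), where G = U(d) or G = SU(d), be equipped with the metric given by the $L_2$-sum of Hilbert-Schmidt (Frobenius) metrics on the group G. That is, the distance between $(U_1,\ldots,U_S)$ and $(V_1,\ldots,V_S)$ is $\big(\sum_{k=1}^S\|U_k - V_k\|_F^2\big)^{1/2}$. Suppose that $F: G^S \to \mathbb{R}$ is L-Lipschitz and that $U_1,\ldots,U_S$ are independent and chosen according to the Haar measure on G. Then for any α > 0
$$\mathbb{P}\big(F(U_1,\ldots,U_S) \geq \mathbb{E}\,F(U_1,\ldots,U_S) + \alpha\big) \leq e^{-\frac{d\alpha^2}{2CL^2}},$$
where C is equal to 2 for SU(d) and 6 for U(d).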

Irreducible representations of U (d) and SU (d)
In this section we recall some basic facts about Lie groups, Lie algebras and their representations in the context of groups G = U (d) and G = SU (d).
In Table 1 we summarize information about those groups, where we used $M^0_d(\mathbb{C}) := \{X \in M_d(\mathbb{C}) \mid \mathrm{Tr}(X) = 0\}$. We will denote by $\mathfrak{g}$ the Lie algebra of G. The Lie algebra su(d) has no non-trivial ideals and thus is semisimple. On the other hand, the algebra u(d) has an abelian ideal consisting of matrices proportional to the identity and is not semisimple. We note, however, that [u(d), u(d)] = su(d). We will call a Lie group semisimple if its Lie algebra is semisimple. The other algebras relevant for us are:
- the complexification $\mathfrak{g}_\mathbb{C} := \mathfrak{g} + i\mathfrak{g}$,
- the Lie algebra $\mathfrak{t}$ of the maximal torus T in G,
- the Cartan subalgebra (CSA) $\mathfrak{h} := \mathfrak{t} + i\mathfrak{t}$.
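A non-zero functional $\alpha \in \mathfrak{h}^*$ is called a root of $\mathfrak{g}$ if and only if there exists a non-zero $X_\alpha \in \mathfrak{g}_\mathbb{C}$ such that
$$[H, X_\alpha] = \alpha(H)\,X_\alpha \quad \text{for all } H \in \mathfrak{h}. \quad (13)$$
We denote the set of all roots by Φ and call it the root system. For a given root α, the subspace of all $X_\alpha$ satisfying (13) is called the root subspace of α and denoted by $\mathfrak{g}_\alpha$. The algebra $\mathfrak{g}_\mathbb{C}$ decomposes as $\mathfrak{g}_\mathbb{C} = \mathfrak{h} \oplus \bigoplus_{\alpha\in\Phi}\mathfrak{g}_\alpha$. Among roots we distinguish positive roots and positive simple roots. For $\mathfrak{g}_\mathbb{C} = \mathfrak{gl}(d,\mathbb{C})$ or $\mathfrak{sl}(d,\mathbb{C})$ these are, respectively, $L_i - L_j$ with i < j, and $L_i - L_{i+1}$ with 1 ≤ i ≤ d − 1, where $L_i(H) = H_{ii}$ for diagonal H ∈ h. For α, β ∈ h* we say that α is higher (lower) than β iff α − β is a linear combination of positive simple roots with non-negative (non-positive) coefficients. We denote this by α > β (α < β).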
We introduce an inner product ⟨·|·⟩ on $\mathfrak{h}^*$. The inner product gives us an isomorphism between $\mathfrak{h}^*$ and $\mathfrak{h}$, hence further in the text we will identify $\mathfrak{h}^*$ with $\mathfrak{h}$. For any α ∈ Φ the root system is preserved under the reflection about the hyperplane perpendicular to α:
$$s_\alpha(\beta) = \beta - 2\frac{\langle\alpha|\beta\rangle}{\langle\alpha|\alpha\rangle}\,\alpha.$$
The group W := ⟨s_α | α ∈ Φ⟩ is called the Weyl group of Φ. In our case W is isomorphic to the permutation group $S_d$, and σ ∈ W acts on $\mathfrak{h}$ by permuting the diagonal entries of H ∈ h. An element H ∈ h is called:
• analytically integral iff for all X ∈ h such that $e^{2\pi iX} = 1$ there holds ⟨X|H⟩ ∈ ℤ,

Accepted in Quantum 2023-04-05. Published under CC-BY 4.0.
For every finite-dimensional representation π: G → GL(V) of G there exists a representation ρ: g → gl(V) such that $\pi(e^X) = e^{\rho(X)}$ for any X ∈ g.
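Let us define the complexification of ρ to be $\rho_\mathbb{C}: \mathfrak{g}_\mathbb{C} \to \mathfrak{gl}(V)$, $\rho_\mathbb{C}(X + iY) := \rho(X) + i\rho(Y)$ for X, Y ∈ g. Then the following are equivalent:
• π is irreducible,
• ρ is irreducible,
• $\rho_\mathbb{C}$ is irreducible.
Further in the text we will slightly abuse notation and use ρ also for $\rho_\mathbb{C}$. A weight of ρ is an integral element μ such that there exists a non-zero vector $v_\mu \in V$ satisfying
$$\rho(H)\,v_\mu = \mu(H)\,v_\mu \quad \text{for all } H \in \mathfrak{h}. \quad (16)$$
The subspace of all $v_\mu$ satisfying (16) is called the weight space of μ, and we denote it by $\rho_\mu$ or $\pi_\mu$. The multiplicity $m_\mu$ (or m(μ)) of μ is the dimension of its weight space. Every irreducible representation is a direct sum of its weight spaces. The notions of weight space and root space are closely connected. Indeed, for $v_\mu \in \rho_\mu$, $X_\alpha \in \mathfrak{g}_\alpha$ and H ∈ h we have
$$\rho(H)\rho(X_\alpha)v_\mu = \rho(X_\alpha)\rho(H)v_\mu + \rho([H,X_\alpha])v_\mu = (\mu(H) + \alpha(H))\,\rho(X_\alpha)v_\mu,$$
thus $\rho(X_\alpha)v_\mu$ is either 0 or in $\rho_{\mu+\alpha}$. The highest weight λ is a weight of ρ that is higher than all other weights of ρ. Now we can state the theorem of the highest weight.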
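Fact 5 (Theorem of the highest weight). For a semisimple complex Lie algebra $\mathfrak{g}_\mathbb{C}$ and its finite-dimensional irreducible representation ρ we have:
1. ρ has a unique highest weight λ,
2. λ is dominant and integral,
3. two irreducible representations with the same highest weight are isomorphic,
4. if λ is dominant and integral then there exists a finite-dimensional irreducible representation of $\mathfrak{g}_\mathbb{C}$ with highest weight λ.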

Fact 6.
If G is compact and connected the analogous theorem holds with the only difference that the highest weight λ has to be analytically integral.
In the case of U(d) we will identify highest weights λ with sequences $(\lambda_1,\ldots,\lambda_d)$, where $\lambda_i := \langle\lambda|L_i\rangle$. In the case of SU(d) we will identify highest weights $\lambda^s$ with sequences $(\lambda^s_1,\ldots,\lambda^s_{d-1})$. From the second point of Fact 5 we have that $\lambda_i \geq \lambda_j$ for i < j, and from analytical integrality $\lambda_i \in \mathbb{Z}$. Moreover, from the fourth point of Fact 5, every sequence from $\mathbb{Z}^d$ satisfying those conditions uniquely defines the highest weight λ and the associated representation. Analogously, for SU(d) the condition is $\lambda^s_i \in \mathbb{Z}_+$, and any element of $\mathbb{Z}_+^{d-1}$ defines a highest weight. In both cases we use $\pi_\lambda$ and $\rho_\lambda$ to denote the irreducible finite-dimensional representations with highest weight λ.
The question arises what the relation between λ and $\lambda^s$ is. Moreover, irreducible representations of SU(d) are often labeled by Young diagrams instead of highest weights. In the following lemma we explore the relations between highest weights of U(d), highest weights of SU(d) and Young diagrams.
Then the corresponding representation of SU (d) has a Young diagram given by (λ 1 − λ d , . . . , λ d−1 − λ d ) and the standard highest weight of this representation is given by Proof. By the definition of λ we have that for the highest weight vector v λ ∈ ρ λ λ and any H ∈ h it holds: Then for the representation ρ λ := ρ λ | sl(d,C) and its any subspace V we have: Hence irreducibility of ρ λ implies the irreducibility of ρ λ .
Since $v_\lambda$ is the highest weight vector the sequence λ^Y : To simplify the description of highest weights we introduce the following notation. For any $\lambda = (\lambda_1,\ldots,\lambda_d) \in \mathbb{Z}^d$ we denote its length by l(λ) = d. By $\lambda^+$ we denote the subsequence of λ consisting of positive integers. Moreover, we write $\Sigma(\lambda) := \sum_i \lambda_i$ and $\|\lambda\|_1 := \sum_i|\lambda_i|$. The value of $d_\lambda$ is determined by the Weyl dimension formula
$$d_\lambda = \prod_{1\leq i<j\leq d}\frac{\lambda_i - \lambda_j + j - i}{j - i}. \quad (20)$$
Fact 7. [42] Suppose that $\pi_\lambda$ is an irreducible representation of a semisimple Lie group G. Then μ ∈ h is a weight of $\pi_\lambda$ if and only if the following two conditions are satisfied: We use Fact 7 to prove the following lemma.
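Proof. The first condition of Fact 7 reads which proves the first part of the lemma. The second condition from Fact 7 implies By applying Σ to both sides we obtain

Moment operators
Let $\{S, \nu_S\}$ be an ensemble of quantum gates, where S ⊂ U(d) is finite and $\nu_S$ is a probability measure on S. Such an ensemble is called a δ-approximate t-design if and only if
$$\delta(\nu_S,t) := \left\|T_{\nu_S,t} - T_{\mu,t}\right\| \leq \delta,$$
where ‖·‖ is the operator norm and for any measure ν (in particular for the Haar measure μ) we define a moment operator
$$T_{\nu,t} := \int_{U(d)} d\nu(U)\,U^{t,t}, \qquad U^{t,t} := U^{\otimes t}\otimes\bar{U}^{\otimes t},$$
where $\bar{U}$ is the entry-wise conjugation of U.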
One can easily show that 0 ≤ δ(ν S , t) ≤ 1 [13]. When δ(ν S , t) = 0 we say that S is an exact t-design and when δ(ν S , t) = 1 we say that S is not a t-design.
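Let us note that the entries of $U^{t,t}$ are monomials of order t in the entries of U and of order t in the entries of $\bar{U}$. We will call them (t,t)-monomials. The space of (t,t)-polynomials, which we denote by $H_t$, is defined as the linear span of (t,t)-monomials. One can write any element $f_A \in H_t$ as $f_A(U) = \mathrm{Tr}(AU^{t,t})$ for some matrix A, and then
$$\left|\mathbb{E}_{U\sim\nu_S}f_A(U) - \mathbb{E}_{U\sim\mu}f_A(U)\right| = \left|\mathrm{Tr}\big(A(T_{\nu_S,t} - T_{\mu,t})\big)\right| \leq \delta(\nu_S,t)\,\|A\|_1.$$
Thus $\delta(\nu_S,t)$ controls the error we make when averaging $f_A$ with respect to $\nu_S$ instead of the Haar measure.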
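To make the definition of $\delta(\nu_S,t)$ concrete, the following sketch estimates $\delta(\nu_S,1)$ for a Haar random single-qubit gate-set. It relies on the t = 1 decomposition $U\otimes\bar{U} \simeq \mathrm{Ad}_U \oplus 1$ discussed later in this section. The trivial block cancels against the Haar average, so $\delta(\nu_S,1)$ is the operator norm of the average of the adjoint (SO(3) rotation) matrices. We assume the standard choice of sampling SU(2) as uniformly random unit quaternions. A global phase in U(2) does not change $\mathrm{Ad}_U$, so the same numbers describe Haar random gates in U(2). All function names are ours.

```python
import math
import random


def haar_su2_adjoint(rng):
    """Adjoint (SO(3)) image of a Haar-random U in SU(2).

    A Haar-random element of SU(2) corresponds to a uniformly random unit quaternion
    (w, x, y, z); the matrix below is the rotation it induces, i.e. Ad_U in a real basis.
    """
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def largest_eigenvalue_sym3(A):
    """Largest eigenvalue of a real symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:
        return max(A[0][0], A[1][1], A[2][2])
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p = math.sqrt((sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1) / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2.0))) / 3.0
    return q + 2.0 * p * math.cos(phi)


def delta_t1_qubit(S, seed=0):
    """One Monte Carlo sample of delta(nu_S, 1) for S independent Haar-random qubit gates."""
    rng = random.Random(seed)
    M = [[0.0] * 3 for _ in range(3)]
    for _ in range(S):
        R = haar_su2_adjoint(rng)
        for i in range(3):
            for j in range(3):
                M[i][j] += R[i][j] / S
    # operator norm of M = sqrt(largest eigenvalue of M^T M)
    MtM = [[sum(M[k][i] * M[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return math.sqrt(max(0.0, largest_eigenvalue_sym3(MtM)))


if __name__ == "__main__":
    for S in (1, 10, 100, 1000):
        print(S, delta_t1_qubit(S))
```

A single gate is never a 1-design (δ = 1). As S grows, the sampled value shrinks roughly like $S^{-1/2}$, in line with the exponential tail of Theorem 1.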
The map $U \mapsto U^{t,t}$ is a representation of the unitary group U(d). This representation is reducible and decomposes into irreducible representations of U(d).

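Fact 9. ([43]) The irreducible representations that appear in the decomposition of $U \mapsto U^{t,t}$ are labeled by the set $\Lambda_t$ below. More precisely,
$$U^{t,t} \simeq m_0\,\mathbf{1} \oplus \bigoplus_{\lambda\in\Lambda_t} m_\lambda\,\pi_\lambda(U), \quad (22)$$
where
$$\Lambda_t := \{\lambda\in\mathbb{Z}^d : \lambda_1\geq\ldots\geq\lambda_d,\ \Sigma(\lambda) = 0,\ \lambda\neq 0,\ \|\lambda\|_1\leq 2t\}, \quad (23)$$
1 stands for the trivial representation and $m_0$ for its multiplicity, $m_\lambda$ is the multiplicity of $\pi_\lambda$, and ≃ stands for unitary equivalence of representations.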
The representations occurring in decomposition (22) are in fact irreducible representations of the projective unitary group PU(d). One can show that every irreducible representation of PU(d) arises this way for some, possibly large, t [44]. For t = 1 decomposition (22) is particularly simple and reads $U\otimes\bar{U} \simeq \mathrm{Ad}_U \oplus \mathbf{1}$. Here 1 stands for the trivial representation and $\mathrm{Ad}_U$ for the adjoint representation of U(d), and PU(d) ≃ Ad(U(d)). For any irreducible representation $\pi_\lambda$, λ ∈ Λ_t, we define
$$T_{\nu,\lambda} := \int_{U(d)} d\nu(U)\,\pi_\lambda(U).$$
Next we define $\delta(\nu_S,\lambda) := \|T_{\nu_S,\lambda}\|$. Since $T_{\mu,\lambda} = 0$ for every non-trivial irreducible representation, it follows directly from the definitions and the discussion above that
$$\delta(\nu_S,t) = \max_{\lambda\in\Lambda_t}\delta(\nu_S,\lambda). \quad (24)$$
One can also define $\delta(\nu_S) := \sup_t \delta(\nu_S,t)$. It is known that for S finite and $\nu_S$ uniform Moreover, the number of distinct irreducible representations $\pi_\lambda$, λ ∈ Λ_t, with $\|\lambda\|_1 = 2k$, 1 ≤ k ≤ t, is given by
$$\sum_{n=1}^{d-1} p_n(k)\,\tilde{p}_{d-n}(k), \quad (25)$$
where $p_n(k)$ is the number of partitions of k with exactly n integers and $\tilde{p}_n(k)$ is the number of partitions of k with at most n integers. When d ≥ 2t formula (25) simplifies to
$$p(k)^2, \quad (26)$$
where p(k) is the number of all partitions of k.
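Proof. First, we will prove that $\|\lambda\|_1 = 2k$. We define $\lambda^-$ to be the subsequence of λ consisting of negative integers. Then from the condition Σ(λ) = 0 we have $\Sigma(\lambda^+) = -\Sigma(\lambda^-)$. Next, from the definition of $\|\lambda\|_1$ we get $\|\lambda\|_1 = \Sigma(\lambda^+) - \Sigma(\lambda^-) = 2\Sigma(\lambda^+)$. Thus $\|\lambda\|_1 = 2k$ for $k = \Sigma(\lambda^+)$. Now let us put $n = l(\lambda^+)$. Then $\lambda^+$ is a non-increasing sequence of n positive integers that sum up to $\Sigma(\lambda^+) = k$, so it is a partition of k with exactly n integers. Similarly, $(-(\lambda^-)_{l(\lambda^-)}, \ldots, -(\lambda^-)_1)$ is a non-increasing sequence of $l(\lambda^-) \leq d - n$ positive integers that sum up to $-\Sigma(\lambda^-) = k$, so it is a partition of k with at most d − n integers.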
On the other hand, let η be a partition of k with exactly n integers and ζ a partition of k with at most d − n integers, so that d ≥ n + l(ζ). Then they can be combined into the sequence $\tilde\lambda = (\eta_1,\ldots,\eta_n,0,\ldots,0,-\zeta_{l(\zeta)},\ldots,-\zeta_1)$, which is clearly an element of $\Lambda_t$ (for k ≤ t) with $\|\tilde\lambda\|_1 = 2k$. Hence there is a one-to-one correspondence between such sequences λ and pairs of partitions (η, ζ). Thus, to prove formula (25), it only remains to note the following. The conditions Σ(λ) = 0 and λ ≠ 0 imply that λ has at least one positive and at least one negative entry, i.e. 1 ≤ n ≤ d − 1. Formula (26) follows from two facts: $p_n(k) = 0$ for n > k, and $\tilde{p}_n(k) = p(k)$ for n ≥ k. When d ≥ 2t ≥ 2k, every n ≤ k satisfies d − n ≥ k, so (25) reduces to $p(k)\sum_{n=1}^{k}p_n(k) = p(k)^2$.
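Formulas (25) and (26), and the quantity $D_t = \sum_{\lambda\in\Lambda_t} d_\lambda$ appearing in the abstract, can be checked by brute force for small d and t. The sketch below makes two assumptions. First, $\Lambda_t$ consists of the non-increasing, non-zero λ ∈ ℤ^d with Σ(λ) = 0 and $\|\lambda\|_1 \leq 2t$, as used in the proof above. Second, the Weyl dimension formula (20) takes its standard U(d) form $d_\lambda = \prod_{i<j}(\lambda_i - \lambda_j + j - i)/(j - i)$. All function names are ours.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement


def weyl_dimension(lam):
    """d_lambda = prod_{i<j} (lam_i - lam_j + j - i) / (j - i) for a non-increasing lam."""
    dim = Fraction(1)
    for i, j in combinations(range(len(lam)), 2):
        dim *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return int(dim)


def lambda_t(t, d):
    """All non-zero, non-increasing lam in Z^d with sum 0 and ||lam||_1 <= 2t."""
    # entries lie in [-t, t]; combinations_with_replacement of a descending range
    # yields exactly the non-increasing tuples
    return [lam for lam in combinations_with_replacement(range(t, -t - 1, -1), d)
            if sum(lam) == 0 and any(lam) and sum(map(abs, lam)) <= 2 * t]


def partitions_at_most(k, m):
    """Number of partitions of k into at most m parts (= parts of size at most m)."""
    ways = [1] + [0] * k
    for part in range(1, m + 1):
        for total in range(part, k + 1):
            ways[total] += ways[total - part]
    return ways[k]


def partitions_exactly(k, n):
    """Number of partitions of k into exactly n parts (n >= 1)."""
    return partitions_at_most(k, n) - partitions_at_most(k, n - 1)


def count_formula_25(k, d):
    """Right-hand side of (25): sum_{n=1}^{d-1} p_n(k) * p~_{d-n}(k)."""
    return sum(partitions_exactly(k, n) * partitions_at_most(k, d - n) for n in range(1, d))


def D(t, d):
    """D_t = sum of d_lambda over lambda in Lambda_t."""
    return sum(weyl_dimension(lam) for lam in lambda_t(t, d))


if __name__ == "__main__":
    for t, d in [(1, 2), (2, 2), (2, 3), (2, 4), (3, 6)]:
        print(t, d, len(lambda_t(t, d)), D(t, d))
```

For t = 1 the only element of $\Lambda_1$ is the adjoint weight (1, 0, ..., 0, −1), so $D_1 = d^2 - 1$. This is the value used for the single-qubit estimates elsewhere in this text.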
Proof. As the operator norm is unitarily invariant it is enough to show that for a diagonal unitary matrix U = diag e iφ 1 , . . . , e iφ d ∈ U (d), φ k ∈ (−π, π] we have: One easily sees that The eigenvalues of π λ (U ) are given by

Moment operators for quantum circuits
We can generalize the notion of moment operators to random quantum circuits in a natural way. Recall that a quantum circuit of length m is a product of m quantum gates. Thus we can identify quantum circuits with elements of $U(d)^{\times m}$.
Consider an ensemble {R, ν R } where R ⊂ U (d) ×m is a set of circuits of length m and ν R is a probability measure on R.
Note that the entries of U t,t form a basis of the space of all (t, t)−polynomials in entries of U 1 , ..., U m which we will call H t,m . The average of U t,t over all circuits with the same probability is where × is a product of measures. Using Fact 9 we get that: whereΛ t := Λ t ∪ {0}. To simplify notation let us define: It is easy to see that:

Frobenius-Schur indicators
In Section 2 we explained that to obtain bounds on the norm of the random matrix $X = \sum_k X_k$ one needs to compute all moments $\mathbb{E}(X_k^n)$. In this article we are interested in estimating $\delta(\nu_S,t_1)$ for $\nu_S$ uniform and S finite, with each element of S chosen from an exact $t_2$-design $\{D, \nu_D\}$. Thus in our scenario the role of X is played by $T_{\nu_S,\lambda}$, and the $X_k$'s are matrices proportional to $\pi_\lambda(U)$. We therefore need the averages $\mathbb{E}_{U\sim\nu_D}\pi_\lambda(U)^n$ for $\lambda \in \Lambda_{t_1}$ and n ∈ ℤ. In Lemma 11 we show that the corresponding Haar integral is proportional to $\mathbb{1}_{d_\lambda}$. The proportionality constant is the Frobenius-Schur indicator $\int_{U(d)}d\mu(U)\,\chi_\lambda(U^n)$ divided by $d_\lambda$.

Lemma 10.
Consider an irreducible finite-dimensional representation $\pi_\lambda$ of U(d) with highest weight λ ∈ Λ_t, an integer n and an exact (|n|·t)-design $\{D, \nu_D\}$. Then the average of $\pi_\lambda(U)^n$ taken over $\nu_D$ and μ is the same, that is
$$\mathbb{E}_{U\sim\nu_D}\pi_\lambda(U)^n = \mathbb{E}_{U\sim\mu}\pi_\lambda(U)^n.$$
Proof. If n < 0 we can use the unitarity of $\pi_\lambda$ to obtain that for any measure ν
$$\mathbb{E}_{U\sim\nu}\pi_\lambda(U)^n = \Big(\mathbb{E}_{U\sim\nu}\pi_\lambda(U)^{-n}\Big)^\dagger.$$
Therefore it is enough to prove the Lemma for n ≥ 0. In this case note that Entries of the right-hand side operator above are for all possible $\lambda_1, \ldots, \lambda_n \in \Lambda_t$ and $1 \leq i_k, j_k \leq d_{\lambda_k}$. Since $\nu_D$ is an (n·t)-design, integrating the expression (28) over $\nu_D$ and μ gives the same result. Thus the Lemma follows from the fact that the entries of $\pi_\lambda(U)^n$ are linear combinations of expressions (28) with $\lambda_1 = \ldots = \lambda_n = \lambda$ and $j_k = i_{k+1}$.

Lemma 11.
For any irreducible representation $\pi_\lambda$ of a compact Lie group G and n ∈ ℤ we have
$$\int_G d\mu(U)\,\pi_\lambda(U^n) = \delta_\lambda(n)\,\mathbb{1}_{d_\lambda}, \qquad \delta_\lambda(n) := \frac{1}{d_\lambda}\int_G d\mu(U)\,\chi_\lambda(U^n).$$
Proof. Assume that n ≥ 0. For any g ∈ G we have where in the last equality we performed the change of variables U = Vg and used the invariance of the Haar measure. Similarly where in the last equality we performed the change of variables U = gV and used the invariance of the Haar measure. In the case n < 0 the argument is analogous, but with the change of variables $U = g^{-1}V$ in the first equation and $U = Vg^{-1}$ in the second. Thus for any g ∈ G the matrix $\pi_\lambda(g)$ commutes with the integral in the statement of the lemma. By Schur's lemma this integral must be proportional to the identity. The proportionality constant $\delta_\lambda(n)$ can be established by taking the trace of both sides.

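Corollary 12. For any irreducible finite-dimensional representation $\pi_\lambda$ of U(d) with highest weight λ ∈ Λ_t, an integer n and an exact (|n|·t)-design $\{D, \nu_D\}$ we have
$$\mathbb{E}_{U\sim\nu_D}\pi_\lambda(U)^n = \delta_\lambda(n)\,\mathbb{1}_{d_\lambda}.$$
Proof. From Lemma 10 we know that $\mathbb{E}_{U\sim\nu_D}\pi_\lambda(U)^n = \mathbb{E}_{U\sim\mu}\pi_\lambda(U)^n = \mathbb{E}_{U\sim\mu}\pi_\lambda(U^n)$, and from Lemma 11 that $\mathbb{E}_{U\sim\mu}\pi_\lambda(U^n) = \delta_\lambda(n)\,\mathbb{1}_{d_\lambda}$.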
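From the orthogonality of characters we know that for λ ≠ 0
$$\int_{U(d)}d\mu(U)\,\chi_\lambda(U) = 0,$$
thus $\delta_\lambda(\pm1) = 0$. For $\delta_\lambda(\pm2)$ we use the well-known fact that the Frobenius-Schur indicator $\int_{U(d)}d\mu(U)\,\chi_\lambda(U^2)$ is equal to 1, 0 or −1 for $\pi_\lambda$ real, complex or quaternionic, respectively. Thus $\delta_\lambda(\pm2)$ equals $1/d_\lambda$, 0 or $-1/d_\lambda$, respectively. To calculate $\delta_\lambda(n)$ for |n| > 2 we will use the following result from [32].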
An immediate conclusion from Fact 10 is that there is $n_0$ such that for $n \geq n_0$ the elements $\frac{\rho - \sigma\cdot\rho}{n}$ are integral if and only if σ·ρ = ρ. In the next lemma we calculate $n_0$ for the group SU(d).

Lemma 13.
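For the group SU(d) the constant $n_0$ is equal to d + 1, and for $n \geq n_0$ the formula of Fact 10 simplifies to
$$\int_{SU(d)}d\mu(U)\,\chi_\lambda(U^n) = m_\lambda(0), \quad \text{i.e.} \quad \delta_\lambda(n) = \frac{m_\lambda(0)}{d_\lambda}. \quad (29)$$
Proof. First, let us calculate ρ, the half-sum of positive roots: $\rho_i = \frac{d+1}{2} - i$, i.e. $\rho = \big(\frac{d-1}{2}, \frac{d-3}{2}, \ldots, -\frac{d-1}{2}\big)$. Thus for i ≠ j we have $\rho_i \neq \rho_j$, and the only σ ∈ $S_d$ such that σ·ρ = ρ is the identity, which proves the simplified formula (29). To find the minimal $n_0$, note first that $(\rho - \sigma\cdot\rho)_i = \sigma(i) - i$. For $\frac{\rho - \sigma\cdot\rho}{n}$ to be an integral element it is required that for all 1 ≤ i ≤ d − 1
$$\sigma(i+1) - \sigma(i) - 1 = C_i\,n, \qquad C_i \in \mathbb{Z}. \quad (30)$$
If n = d then σ(i) = i − 1 mod d (with values in {1, ..., d}) satisfies this condition, thus $n_0 > d$. On the other hand, for n > d the condition |σ(i+1) − σ(i)| ≤ d − 1 implies that the constant $C_i$ in (30) has to be zero for all i. Let us choose j such that j ≠ d and σ(j) = d. Then σ(j+1) = σ(j) + 1 = d + 1, which is an obvious contradiction, so $\sigma^{-1}(d)$ has to be d. But this implies σ(d − 1) = σ(d) − 1 = d − 1, and analogously σ(i) = i for all 1 ≤ i ≤ d. Therefore for n > d the only σ satisfying (30) is the trivial one.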
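The combinatorial core of this proof can be checked by brute force over $S_d$ for small d. The sketch below assumes the convention $(\sigma\cdot\rho)_i = \rho_{\sigma(i)}$, so that $(\rho - \sigma\cdot\rho)_i = \sigma(i) - i$. It also uses the fact that a traceless vector is an integral weight of SU(d) iff all its consecutive differences are integers. Replacing σ by $\sigma^{-1}$ permutes the non-trivial solutions among themselves, so $n_0$ does not depend on this convention. The function names are ours.

```python
from itertools import permutations


def nontrivial_integral_sigmas(d, n):
    """Permutations sigma != id (0-based) for which (rho - sigma.rho)/n is integral.

    Assumed convention: (sigma.rho)_i = rho_{sigma(i)} with rho_i = (d - 1)/2 - i
    (0-based), so (rho - sigma.rho)_i = sigma(i) - i. Integrality of a traceless
    weight means integer consecutive differences, i.e. n | sigma(i+1) - sigma(i) - 1.
    """
    identity = tuple(range(d))
    return [sigma for sigma in permutations(range(d))
            if sigma != identity
            and all((sigma[i + 1] - sigma[i] - 1) % n == 0 for i in range(d - 1))]


def n0(d):
    """Smallest n0 such that only the identity survives for all scanned n >= n0 (n <= 2d)."""
    last_bad = max((n for n in range(1, 2 * d + 1) if nontrivial_integral_sigmas(d, n)),
                   default=0)
    return last_bad + 1


if __name__ == "__main__":
    for d in range(2, 7):
        print(d, n0(d))
```

The scan stops at n = 2d. The proof above shows that nothing new can happen for n > d, so the scan only corroborates the argument.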
In the next lemma we show when the representation $\pi_{\lambda^s}$ has the weight 0. In particular, it implies that for every λ ∈ Λ_t the representation $\pi_{\lambda^s}$ has the weight 0. Proof. We will start with 1) ⇒ 2). From Fact 7 there exist $c_1, \ldots, c_{d-1} \in \mathbb{Z}$ such that where we assume $c_0 = 0 = c_d$. It follows satisfies $\lambda_i \in \mathbb{Z}$ and $i < j \Rightarrow \lambda_i \geq \lambda_j$. Therefore there exists an irreducible finite-dimensional representation $\pi_\lambda$ of U(d) with highest weight λ. From Lemma 5 clearly What is more for Hence by Fact 7 the weight 0 is in $\pi_\lambda$, which implies that the weight 0 is in $\pi_{\lambda^s}$.
As an example we next explicitly calculate the value of δ λ (n) for G = SU (2).

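Lemma 15. The constant in Lemma 11 for SU(2) and non-trivial $\pi_\lambda$, i.e. $\lambda^s \geq 1$ and $d_\lambda = \lambda^s + 1$, is given by
$$\delta_\lambda(n) = \frac{1}{\lambda^s+1}\cdot\frac{2}{\pi}\int_0^\pi \frac{\sin((\lambda^s+1)n\theta)}{\sin(n\theta)}\sin^2\theta\,d\theta.$$
Hence, for n ≠ 0,
$$\delta_\lambda(n) = \begin{cases} 0, & |n| = 1,\\ \frac{1}{\lambda^s+1}, & |n| \geq 2,\ \lambda^s \text{ even},\\ -\frac{1}{\lambda^s+1}, & |n| = 2,\ \lambda^s \text{ odd},\\ 0, & |n|\geq 3,\ \lambda^s \text{ odd}.\end{cases}$$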
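The SU(2) values can be cross-checked numerically. We assume the Weyl integration formula for SU(2) in the form $\int f\,d\mu = \frac{2}{\pi}\int_0^\pi f(\theta)\sin^2\theta\,d\theta$ for class functions, where U has eigenvalues $e^{\pm i\theta}$. Writing $\chi_m(U^n) = \sum_{k=0}^{m}\cos((m-2k)n\theta)$ for the irrep with $\lambda^s = m$, the midpoint rule is exact up to rounding for these trigonometric polynomials. The sketch returns the Frobenius-Schur indicator $\nu_n = \int\chi_m(U^n)\,d\mu$. Dividing by $d_\lambda = m + 1$ gives $\delta_\lambda(n)$. The helper names are ours.

```python
import math


def fs_indicator_su2(m, n, grid=4096):
    """nu_n = int_{SU(2)} chi_m(U^n) dmu(U) for the irrep with lambda^s = m (dimension m + 1)."""
    h = math.pi / grid
    total = 0.0
    for step in range(grid):
        theta = (step + 0.5) * h  # midpoint rule on [0, pi]
        chi = sum(math.cos((m - 2 * k) * n * theta) for k in range(m + 1))
        total += chi * math.sin(theta) ** 2
    return 2.0 / math.pi * total * h


def delta_su2(m, n):
    """delta_lambda(n) = nu_n / d_lambda with d_lambda = m + 1."""
    return fs_indicator_su2(m, n) / (m + 1)


if __name__ == "__main__":
    for m in range(1, 5):
        print(m, [round(fs_indicator_su2(m, n), 6) for n in range(1, 6)])
```

The output exhibits three facts. Corollary 12 gives $\delta_\lambda(\pm1) = 0$. The usual indicator $(-1)^{\lambda^s}$ appears at n = 2 (integer spins are real, half-integer spins quaternionic). For n ≥ 3 = d + 1 the indicator equals the multiplicity of the weight 0, as in (29).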

Main results
In this section we will consider two types of gate-sets:
1. $S = \{U_1,\ldots,U_n\}$, where the $U_k$'s are independent random unitaries from an exact t-design D chosen according to the measure $\nu_D$. Such S will be called a t-random gate-set.
2. $S = \{U_1,\ldots,U_n,U_1^{-1},\ldots,U_n^{-1}\}$, where the $U_k$'s are independent random unitaries from an exact t-design D chosen according to the measure $\nu_D$. Such S will be called a symmetric t-random gate-set.
Note that in the symmetric case we have U k ∈ D but not necessarily U −1 k ∈ D so to obtain a symmetric t-random gate-set of size 2n we draw n gates from {D, ν D } and then we add to the set their inverses. We also want to point out that since {U (d), µ} is an exact t-design for any t all theorems we present below apply to gate-sets where gates are Haar random or Haar random with inverses (we could call them ∞-random gate-sets or symmetric ∞-random gate-sets). In order to simplify the notation we often denote the cardinality of S by S (instead of |S|). The moment operators associated with the above two types of random sets of gates are random matrices. When S is symmetric they are actually random Hermitian matrices. Using inequalities listed in Section 2 we derive upper bounds on P (δ(ν S , λ) ≥ δ) and P (δ(ν S , t) ≥ δ). Before we proceed with concrete inequalities we note that

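Lemma 16. Assume that S ⊂ U(d) is a random set of quantum gates. Then
$$\mathbb{P}(\delta(\nu_S,t) \geq \delta) \leq \sum_{\lambda\in\Lambda_t}\mathbb{P}(\delta(\nu_S,\lambda) \geq \delta).$$
Proof. By (24) and the union bound for probabilities we have
$$\mathbb{P}(\delta(\nu_S,t)\geq\delta) = \mathbb{P}\Big(\bigcup_{\lambda\in\Lambda_t}\{\delta(\nu_S,\lambda)\geq\delta\}\Big) \leq \sum_{\lambda\in\Lambda_t}\mathbb{P}(\delta(\nu_S,\lambda)\geq\delta).$$
Thus, to find a bound on $\mathbb{P}(\delta(\nu_S,t) \geq \delta)$ it is enough to find bounds on $\mathbb{P}(\delta(\nu_S,\lambda) \geq \delta)$, λ ∈ Λ_t, which we do in the next sections.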

Bernstein type bounds
In this section we derive bounds on $\mathbb{P}(\|T_{\nu_S,\lambda}\| \geq \delta)$ using the Bernstein inequality (see Section 2). Computationally, this is the simplest derivation presented in this paper, as it requires only knowledge of the second moments.
Theorem 17. Let λ be an element of Λ t and S be a t-random gate-set or a symmetric (2t)-random gate-set and ν S a uniform measure. Then Proof. In this proof we will use Fact 3 where we take X = k X k to be equal T ν S ,λ = 1 S U ∈S π λ (U ). Let us start with the Haar random case. In this case all elements of S are independent. Thus we will put X k = 1 S π λ (U k ) for k = 1, ..., S. Then from Corollary 12 Since π λ is unitary we have

Hence the $X_k$ satisfy the conditions of Fact 3 with $L=\frac{1}{S}$, and the result follows.
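When $\mathcal{S}$ is symmetric we put $X_k=\frac{1}{S}\left(\pi_\lambda(U_k)+\pi_\lambda(U_k^{-1})\right)$. From Corollary 12 we have $\mathbb{E}_{U_k\sim\nu_\mathcal{D}}X_k=0$. Then, using the triangle inequality, we get $\|X_k\|\leq\frac{2}{S}$, and the $X_k$ are Hermitian. Using Corollary 12 once more we obtain the second moments $\mathbb{E}X_k^2$ required by Fact 3. Hence, for $L=\frac{2}{S}$, the result follows from Fact 3.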
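The non-symmetric Bernstein bound can be evaluated with a few lines of Python. This is a sketch under an explicit assumption: we take Fact 3 to be the standard rectangular matrix Bernstein inequality, $\mathbb{P}(\|\sum_k X_k\|\geq\delta)\leq(d_1+d_2)\exp\left(-\frac{\delta^2/2}{\sigma^2+L\delta/3}\right)$, which may differ in constants from the form used in Section 2. With $X_k=\frac{1}{S}\pi_\lambda(U_k)$ the proof above gives $L=\frac{1}{S}$ and $\sigma^2=\|\sum_k\mathbb{E}X_kX_k^\dagger\|=\frac{1}{S}$. The function name `bernstein_tail` is hypothetical.

```python
import math


def bernstein_tail(delta, S, d_lambda):
    """Hypothetical Bernstein-type bound on P(||T_{nu_S,lambda}|| >= delta), non-symmetric case.

    ASSUMPTION: Fact 3 is taken to be the rectangular matrix Bernstein inequality
        P(||sum_k X_k|| >= delta) <= (d1 + d2) * exp(-(delta**2 / 2) / (sigma2 + L * delta / 3)),
    applied with X_k = pi_lambda(U_k) / S, so d1 = d2 = d_lambda, L = ||X_k|| = 1/S and
    sigma2 = ||sum_k E X_k X_k^dagger|| = S * (1 / S**2) = 1/S.
    """
    L = 1.0 / S
    sigma2 = 1.0 / S
    exponent = -(delta ** 2 / 2.0) / (sigma2 + L * delta / 3.0)
    return 2 * d_lambda * math.exp(exponent)


# The exponent simplifies to -3 * S * delta**2 / (6 + 2 * delta).
print(bernstein_tail(0.1, 1000, 3))
```

Only second moments enter, which is why this is the computationally simplest route.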

Master bounds
In this section we derive Theorems 1 and 2 from the Introduction using the formula for the master bound (see Section 2). This derivation requires knowledge of all moments and gives considerably tighter bounds than the Bernstein bound of the previous section.

Haar random gate-sets
For $t$-random gate-sets the derivation of the bound for $\mathbb{P}(\delta(\nu_\mathcal{S},\lambda)\geq\delta)$ turns out to be relatively simple.
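Theorem 18. Let $\lambda$ be an element of $\Lambda_t$, let $\mathcal{S}$ be a $t$-random gate-set and $\nu_\mathcal{S}$ the uniform measure. Then for any $\delta<1$
$$\mathbb{P}\left(\delta(\nu_\mathcal{S},\lambda)\geq\delta\right)\leq 2d_\lambda\,\frac{e^{-\delta S\,\mathrm{arctanh}(\delta)}}{(1-\delta^2)^{S/2}}.$$
Proof. To compute the master bound for a non-Hermitian matrix (Fact 2) we need to compute $\mathbb{E}_{U\sim\nu_\mathcal{D}}\exp\left(\frac{\theta}{S}H(\pi_\lambda(U))\right)$. First, let us note that for any unitary matrix $U\in U(d_\lambda)$ we have $H(U)^2=I_{2d_\lambda}$, and therefore $\exp\left(\frac{\theta}{S}H(U)\right)=\cosh\left(\frac{\theta}{S}\right)I_{2d_\lambda}+\sinh\left(\frac{\theta}{S}\right)H(U)$. Since $\pi_\lambda$ is unitary, this identity holds for $H(\pi_\lambda(U))$. From Corollary 12 and the orthogonality of characters, $\mathbb{E}_{U\sim\nu_\mathcal{D}}\pi_\lambda(U)=0$, so $\mathbb{E}_{U\sim\nu_\mathcal{D}}\exp\left(\frac{\theta}{S}H(\pi_\lambda(U))\right)=\cosh\left(\frac{\theta}{S}\right)I_{2d_\lambda}$. Hence the right-hand side of the master bound is $\inf_{\theta>0}2d_\lambda e^{-\theta\delta}\cosh^S\left(\frac{\theta}{S}\right)$. Setting the derivative to zero we easily get that the infimum is attained at $\theta=S\,\mathrm{arctanh}(\delta)$, and since $\cosh(\mathrm{arctanh}(\delta))=(1-\delta^2)^{-1/2}$ the bound is $2d_\lambda\frac{e^{-\delta S\,\mathrm{arctanh}(\delta)}}{(1-\delta^2)^{S/2}}$.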
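The optimisation over $\theta$ in the proof of Theorem 18 can be checked numerically. The sketch below (function names are our own, hypothetical) evaluates the logarithm of the master-bound right-hand side $2d_\lambda e^{-\theta\delta}\cosh^S(\theta/S)$ and confirms that a grid search over $\theta$ does not beat the closed form at $\theta^*=S\,\mathrm{arctanh}(\delta)$. Everything is computed in log space, since the bound itself is tiny for large $S$. Summing over $\lambda\in\Lambda_t$ via Lemma 16 replaces $d_\lambda$ by $D_t=\sum_{\lambda\in\Lambda_t}d_\lambda$.

```python
import math


def log_master_rhs(theta, delta, S, d_lambda):
    """log of 2*d_lambda * exp(-theta*delta) * cosh(theta/S)**S, the master-bound RHS."""
    return math.log(2 * d_lambda) - theta * delta + S * math.log(math.cosh(theta / S))


def master_bound(delta, S, d_lambda):
    """Closed form at theta* = S*artanh(delta), evaluated in log space to avoid underflow."""
    log_val = (math.log(2 * d_lambda)
               - delta * S * math.atanh(delta)
               - 0.5 * S * math.log(1 - delta ** 2))
    return math.exp(log_val)


# Sanity check: a grid search over theta does not beat the closed-form minimiser.
delta, S, d_lambda = 0.2, 500, 3
theta_star = S * math.atanh(delta)
grid_min = min(log_master_rhs(theta_star * k / 100, delta, S, d_lambda) for k in range(1, 300))
print(grid_min, math.log(master_bound(delta, S, d_lambda)))
```

The objective is convex in $\theta$ (its derivative $-\delta+\tanh(\theta/S)$ is increasing), so the stationary point is the global minimiser.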

Symmetric Haar random gate-sets
In the following we develop new bounds based on the fact that we can explicitly calculate the expected values that appear in the master bound (see Section 2). To perform this calculation we need to know all moments $\delta_\lambda(n)$. For $n\leq d$ we use the explicit formula from Fact 10, and for $n>d$ its simplified form from Lemma 13. We derive the bound in Theorem 19, which we then optimize using Fact 11 to obtain the main result of this section, Theorem 20.

Theorem 19.
Let $\mathcal{S}$ be a symmetric Haar random gate-set from $SU(d)$ and $\nu_\mathcal{S}$ the uniform measure. Then
where
Thus our main objective is to compute $\mathbb{E}_{U\sim\mu}\,e^{\frac{\theta}{S}X_{U,\lambda_s}}$. Using the binomial formula one easily gets:
From Lemma 13 we know that for $|n-2k|>d$ the value of $\delta_{\lambda_s}$ is
where we assume $\binom{n}{k}=0$ for $n<k$ or $k<0$. We consider separately the cases of different parities of $d$ and $n$:
• $d=2p$ and $n=2l$:
• $d=2p$ and $n=2l+1$: Then we have:
• $d=2p+1$ and $n=2l$:
• $d=2p+1$ and $n=2l+1$: Then we have:
Combining Eq. (35) and Eq. (36) with Fact 1 we get the desired result.
Thus for $\lambda_s$ with large coefficients the expression in brackets in Eq. (32) is approximately equal to $\gamma_{\lambda_s}(0)\,I_0\left(\frac{2\theta}{S}\right)$. In order to proceed we need the following property of the modified Bessel functions.
Fact 11. [45] For any $n\geq 0$ we have the following upper and lower bounds on the ratio of the modified Bessel functions
Combining Fact 11 with Theorem 19 we arrive at
Theorem 20. Let $\mathcal{S}$ be a symmetric Haar random gate-set from $SU(d)$ and $\nu_\mathcal{S}$ the uniform measure. Then
where $F(\cdot,\cdot,\cdot)$ is given by (33).
Proof. In order to find the best bound we need to determine the $\theta$ that realizes
As finding the minimum of the above functions is analytically intractable, in both cases we look for the $\theta$ that minimizes $e^{-\theta\delta}I_0\left(\frac{2\theta}{S}\right)^{S/2}$. Taking the derivative with respect to $\theta$ we get the stationarity condition $\frac{I_1(x)}{I_0(x)}=\delta$, where $x=\frac{2\theta}{S}$. Using the upper bound for the ratio of modified Bessel functions from Fact 11 we get
The result follows.
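The stationarity condition can also be solved numerically instead of through the closed-form Bessel-ratio bound of Fact 11, which we do not reproduce here; this root-finding approach is our own substitution. The sketch assumes, as read from the proof, that the simplified objective is $e^{-\theta\delta}I_0(2\theta/S)^{S/2}$. It uses SciPy's exponentially scaled Bessel functions `i0e` and `i1e` to avoid overflow, solves $I_1(x)/I_0(x)=\delta$ with `brentq`, and compares the result with a direct bounded minimisation. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.special import i0e, i1e


def log_objective(theta, delta, S):
    """log of exp(-theta*delta) * I_0(2*theta/S)**(S/2), using log I_0(x) = log(i0e(x)) + x."""
    x = 2.0 * theta / S
    return -theta * delta + 0.5 * S * (np.log(i0e(x)) + x)


def theta_stationary(delta, S):
    """Solve I_1(x)/I_0(x) = delta for x = 2*theta/S and return theta = S*x/2."""
    # The exponential scaling of i0e and i1e cancels in the ratio, so this is I_1(x)/I_0(x).
    x_star = brentq(lambda x: i1e(x) / i0e(x) - delta, 1e-12, 1e6)
    return 0.5 * S * x_star


S, delta = 100, 0.3
theta_root = theta_stationary(delta, S)
res = minimize_scalar(log_objective, bounds=(0.0, 20.0 * S), args=(delta, S),
                      method="bounded", options={"xatol": 1e-8})
print(theta_root, res.x)
```

Because $\log I_0$ is convex and $I_1/I_0$ increases from 0 to 1, the root is unique for $0<\delta<1$ and coincides with the minimiser.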

Bernstein and master bounds for random circuits
Let us consider a scenario where we draw $m$ independent gates from the exact $t_0$-design $\{\mathcal{D},\nu_\mathcal{D}\}$ and combine them into a circuit $\mathbf{U}=(U_1,\ldots,U_m)$. We repeat this procedure a number of times and in this way construct a set of random circuits $\mathcal{R}\subset\mathcal{D}^{\times m}$. Then for $f\in\mathcal{H}_{t,m}$ we ask how much averaging $f$ over $\mathcal{R}$ differs from averaging over all circuits $\mathcal{D}^{\times m}$. More precisely, we want to bound the tail probability $\mathbb{P}\left(\|T_{\nu_\mathcal{D}^{\times m},t}-T_{\nu_\mathcal{R},t}\|\geq\delta\right)$ for uniform $\nu_\mathcal{R}$, $0<\delta<1$ and $t\leq t_0$. Note that since $\mathcal{D}$ is an exact $t_0$-design, for any $t\leq t_0$ we have:
Thus
With this we can prove Bernstein and master bounds for random circuits in a very similar way to the case of random gates. For example, consider the calculation of moments of $\pi_\lambda(\mathbf{U})$ for $n\cdot t\leq t_0$:
Therefore the Bernstein, symmetric Bernstein and master bounds for random circuits are almost exactly the same as for random gates, with only small changes summarized in the following diagram:
and then for $\lambda=(\lambda_1,\ldots,\lambda_m)\in\Lambda_{t,m}$:
In the case of the symmetric master bound we additionally have to substitute

Concentration of δ(ν S , t) and δ(ν S , λ) around their expected values
In this section we derive Theorems 3 and 4 from the Introduction, concerning the concentration properties of $\delta(\nu_\mathcal{S},t)$ and $\delta(\nu_\mathcal{S},\lambda)$, using Fact 4.

Theorem 21. Let $\mathcal{S}\subset SU(d)$ be a Haar random gate-set. Then
Proof. Let $\mathcal{S}\subset SU(d)$ be any set of cardinality $n>1$. For convenience we denote by $\mathcal{S}_1=\{U_1,\ldots,U_n\}$ and $\mathcal{S}_2=\{V_1,\ldots,V_n\}$ two exemplary sets $\mathcal{S}$. Let $f(\mathcal{S})=\delta(\nu_\mathcal{S},t)$. We need to show that there is a constant $L$ such that for any $\mathcal{S}_1$ and $\mathcal{S}_2$ we have $|f(\mathcal{S}_1)-f(\mathcal{S}_2)|\leq L\left(\sum_{i=1}^n\|U_i-V_i\|_{HS}^2\right)^{1/2}$. In order to determine the value of $L$ we go through the following chain of inequalities
Thus the Lipschitz constant is $L=\frac{2t}{\sqrt{S}}$. Knowing the value of $L$ we use Fact 4 and obtain the result.
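The Lipschitz estimate can be tested numerically in the smallest case $t=1$, which is our own simplifying choice. We assume that $\delta(\nu_\mathcal{S},1)$ is the operator norm of $\frac{1}{S}\sum_U U\otimes\bar{U}$ minus its Haar average, which is the projector onto the maximally entangled vector $|\Phi\rangle$, and that distances between gate-sets are measured in the $\ell_2$-sum of Hilbert-Schmidt norms. The inequality $|f(\mathcal{S}_1)-f(\mathcal{S}_2)|\leq\frac{2t}{\sqrt{n}}\left(\sum_i\|U_i-V_i\|_{HS}^2\right)^{1/2}$ then holds deterministically, which the sketch checks. Helper names are hypothetical.

```python
import numpy as np


def haar_unitary(d, rng):
    """Sample a Haar-random unitary from U(d) via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diagonal(r)
    return q * (diag / np.abs(diag))


def delta_t1(gates):
    """||(1/S) sum_U U (x) conj(U) - |Phi><Phi| ||_op; the Haar t=1 moment is |Phi><Phi|."""
    d = gates[0].shape[0]
    T = sum(np.kron(u, u.conj()) for u in gates) / len(gates)
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)  # maximally entangled vector
    return np.linalg.norm(T - np.outer(phi, phi.conj()), 2)


def lipschitz_check(S1, S2):
    """Return (|delta(S1) - delta(S2)|, (2t/sqrt(n)) * sqrt(sum_i ||U_i - V_i||_HS^2)) for t = 1."""
    n = len(S1)
    dist = np.sqrt(sum(np.linalg.norm(u - v, "fro") ** 2 for u, v in zip(S1, S2)))
    return abs(delta_t1(S1) - delta_t1(S2)), 2.0 / np.sqrt(n) * dist


rng = np.random.default_rng(3)
S1 = [haar_unitary(3, rng) for _ in range(10)]
S2 = [haar_unitary(3, rng) for _ in range(10)]
print(lipschitz_check(S1, S2))
```

The chain behind it: $\|U\otimes\bar{U}-V\otimes\bar{V}\|\leq 2\|U-V\|$, the operator norm is dominated by the Hilbert-Schmidt norm, and Cauchy-Schwarz turns $\frac{1}{n}\sum_i$ into $\frac{1}{\sqrt{n}}(\sum_i\cdot^2)^{1/2}$.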

Theorem 22. Let $\mathcal{S}\subset SU(d)$ be a Haar random gate-set. Then
Proof. Let $\mathcal{S}\subset SU(d)$ be any set of cardinality $n>1$. For convenience we denote by $\mathcal{S}_1=\{U_1,\ldots,U_n\}$ and $\mathcal{S}_2=\{V_1,\ldots,V_n\}$ two exemplary sets $\mathcal{S}$. Let $f(\mathcal{S})=\delta(\nu_\mathcal{S},\lambda)$. We need to show that there is a constant $L$ such that for any $\mathcal{S}_1$ and $\mathcal{S}_2$ we have $|f(\mathcal{S}_1)-f(\mathcal{S}_2)|\leq L\left(\sum_{i=1}^n\|U_i-V_i\|_{HS}^2\right)^{1/2}$. In order to determine the value of $L$ we go through the following chain of inequalities
where in the second inequality we used Lemma 9. Thus the Lipschitz constant is L = π λ 1 2 √ S . Knowing the value of $L$ we use Fact 4 and obtain the result.

d-mode beamsplitters built from random 2-mode beamsplitters
In this section we consider the Hilbert space $\mathcal{H}=\mathcal{H}_1\oplus\cdots\oplus\mathcal{H}_d$, where each $\mathcal{H}_k$ is one-dimensional.
We call the spaces $\mathcal{H}_k$ modes. For a matrix $B\in SU(2)$, which we call a 2-mode beamsplitter, we define matrices $B_{ij}$, $i\neq j$, to be the matrices that act on the 2-dimensional subspace $\mathcal{H}_i\oplus\mathcal{H}_j\subset\mathcal{H}$ as $B$ and on the other components of $\mathcal{H}$ as the identity. This way a matrix $B\in SU(2)$ gives $d(d-1)$ matrices in $SU(d)$. Applying this procedure to a Haar random gate-set $\mathcal{S}\subset SU(2)$ we obtain the random gate-set $\mathcal{S}_d$ (see [37,38]).
Proof. Let $\mathcal{S}_1,\mathcal{S}_2\subset SU(2)$ be two gate-sets of the same size $n$, i.
Thus the Lipschitz constant is $L=\frac{2t}{\sqrt{S}}$. Knowing the value of $L$ we use Fact 4 and obtain the result.
As a conclusion we see that a Haar random gate-set $\mathcal{S}\subset SU(2)$ gives the gate-set $\mathcal{S}_d\subset SU(d)$ for which $\delta(\nu_{\mathcal{S}_d},t)$ has the same concentration rate around the mean as a Haar random gate-set $\mathcal{S}\subset SU(d)$ of size:
We note, however, that using this approach one cannot say anything about the relationship between $\mathbb{E}_{U\sim\mu}\delta(\nu_{\mathcal{S}_d},t)$ and $\mathbb{E}_{U\sim\mu}\delta(\nu_\mathcal{S},t)$.
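The embedding $B\mapsto B_{ij}$ described at the start of this section is easy to write down explicitly. In the following NumPy sketch the helper names are hypothetical, and Haar-random $SU(2)$ elements are generated from uniformly distributed unit quaternions, a standard parametrisation. The sketch builds $\mathcal{S}_d$ and checks that it contains $d(d-1)|\mathcal{S}|$ elements of $SU(d)$.

```python
import numpy as np


def random_su2(rng):
    """Haar-random SU(2) element from a uniformly distributed unit quaternion."""
    q = rng.standard_normal(4)
    a, b, c, e = q / np.linalg.norm(q)
    alpha, beta = a + 1j * b, c + 1j * e
    return np.array([[alpha, -np.conj(beta)], [beta, np.conj(alpha)]])


def embed_beamsplitter(B, i, j, d):
    """B_ij: acts as B on modes (i, j), in that order, and as the identity on the other modes."""
    M = np.eye(d, dtype=complex)
    M[i, i], M[i, j] = B[0, 0], B[0, 1]
    M[j, i], M[j, j] = B[1, 0], B[1, 1]
    return M


def lift_gate_set(S, d):
    """S_d = {B_ij : B in S, i != j}; every B contributes d(d-1) matrices in SU(d)."""
    return [embed_beamsplitter(B, i, j, d)
            for B in S for i in range(d) for j in range(d) if i != j]


rng = np.random.default_rng(5)
S = [random_su2(rng) for _ in range(3)]
S_d = lift_gate_set(S, 4)
print(len(S_d))  # 3 * 4 * 3 = 36
```

Ordered pairs $(i,j)$ and $(j,i)$ give different matrices, which is where the count $d(d-1)$ comes from. Each $B_{ij}$ has determinant $\det B=1$.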

Comparison of bounds
In this section we use numerical results to compare the derived bounds for various values of $t$ (Fig. 1), $d$ (Fig. 2) and $S$ (Fig. 3 and Fig. 4). Throughout this section we use the following convention:
• Bounds using Theorem 17 will be called Bernstein bounds and symmetric Bernstein bounds. They will be plotted with a dashed yellow line and a solid yellow line, respectively.
• Bounds using Theorem 18 will be called master bounds and they will be plotted with a dashed blue line.
• Bounds using Theorem 19 will be called symmetric master bounds and they will be plotted with a solid blue line.
• Bounds using Theorem 20 will be called simplified symmetric master bounds and they will be plotted with a solid orange line with crosses.
We assume that the gate-sets are $(c\cdot t)$-random, with $c$ chosen so that the theorems mentioned above can be applied to $\delta(\nu_\mathcal{S},t)$, that is: $c=1$ for Bernstein and master bounds, $c=2$ for symmetric Bernstein bounds, and $c=\infty$ for symmetric master bounds.
The first conclusion from Figures 1, 2, 3 and 4 is that the (symmetric) master bound is tighter than the (symmetric) Bernstein bound. The difference becomes more pronounced for larger $t$ and $d$ and smaller $S$. Next, note that the simplified symmetric master bound is almost identical to the symmetric master bound, which implies that our guess in the derivation of Theorem 20 was very close to optimal, even for small $t$. This can be explained by the fact that the functions under the infimum in (32) have derivatives very close to zero in a fairly wide interval around the minimum (see Fig. 5). Thus the range of near-optimal guesses for the infimum is wide as well.
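For the non-symmetric case, the ordering of the two bounds can be seen directly from their exponents, under the same assumption as before that Fact 3 is the standard matrix Bernstein inequality (so that both bounds carry the prefactor $2d_\lambda$). The master exponent is $g(\delta)=\delta\,\mathrm{arctanh}(\delta)+\frac{1}{2}\log(1-\delta^2)$, and the assumed Bernstein exponent is $b(\delta)=\frac{3\delta^2}{6+2\delta}$. Since $g(0)=0$ and $g'(\delta)=\mathrm{arctanh}(\delta)\geq\delta$, we have $g(\delta)\geq\delta^2/2\geq b(\delta)$. The sketch below, with hypothetical function names, tabulates both exponents.

```python
import math


def master_exponent(delta):
    """Per-gate exponent g(delta) = delta*artanh(delta) + 0.5*log(1 - delta**2) of Theorem 18."""
    return delta * math.atanh(delta) + 0.5 * math.log(1 - delta ** 2)


def bernstein_exponent(delta):
    """Per-gate exponent 3*delta**2 / (6 + 2*delta) of the ASSUMED matrix-Bernstein form."""
    return 3 * delta ** 2 / (6 + 2 * delta)


for delta in (0.05, 0.2, 0.5, 0.9):
    g, b = master_exponent(delta), bernstein_exponent(delta)
    # Both bounds share the prefactor 2*d_lambda, so their ratio is exp(-S*(g - b)).
    print(f"delta={delta}: master {g:.5f}  Bernstein {b:.5f}")
```

This only covers the non-symmetric bounds; the symmetric ones involve the Bessel-function expressions above.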
We also analyze the gain (according to our bounds) of adding inverses to random gate-sets, i.e. of making them symmetric. Figures 1, 2 and 3 indicate that the bounds for sets with inverses are tighter, but the difference decreases with growing $S$. On the other hand, when we compare gate-sets of the same size (see Fig. 4), the bounds for sets without inverses are better. Thus, we conclude that new random gates improve $\delta(\nu_\mathcal{S},t)$ more significantly than additional inverses of gates already in the set. Next we check the minimal size of $\mathcal{S}$ required to obtain a $\delta$-approximate $t$-design with probability at least $P$ according to the master bound. For $t$-random gate-sets the formula is easily obtained from Theorem 1 and reads:
$$S\geq\frac{\log(2D_t)-\log(1-P)}{\delta\,\mathrm{arctanh}(\delta)+\frac{1}{2}\log(1-\delta^2)}.\qquad(37)$$
For symmetric Haar random gate-sets, however, it is much more difficult to obtain an analogous formula using Theorem 2. Nevertheless, the numerical calculations in Table 2 suggest that the required number of gates is around half of (37), plus inverses.
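Solving the Theorem 1 bound $2D_t\,e^{-S\,g(\delta)}\leq 1-P$ for $S$, with $g(\delta)=\delta\,\mathrm{arctanh}(\delta)+\frac{1}{2}\log(1-\delta^2)$, gives a one-line computation. The sketch below (the name `min_gates` is hypothetical) returns the smallest integer $S$ satisfying the bound. Because $D_t$ is not computed here, the usage replaces it by the upper bound $D_t\leq d^{2t}$ quoted later in this section, so the resulting $S$ is conservative. Since $g(\delta)\approx\delta^2/2$ for small $\delta$, this reproduces the $O(\delta^{-2}(t\log d-\log(1-P)))$ scaling.

```python
import math


def min_gates(delta, P, D_t):
    """Smallest integer S with 2*D_t*exp(-S*g(delta)) <= 1 - P, where
    g(delta) = delta*artanh(delta) + 0.5*log(1 - delta**2) is the Theorem 1 exponent."""
    g = delta * math.atanh(delta) + 0.5 * math.log(1 - delta ** 2)
    return math.ceil((math.log(2 * D_t) - math.log(1 - P)) / g)


# D_t is replaced by the upper bound d**(2t), so the returned S is an overestimate.
d, t = 4, 2
print(min_gates(0.1, 0.99, d ** (2 * t)))
```

Halving $\delta$ roughly quadruples the required number of gates.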
For comparison, let $S_n$ denote the minimal cardinality of a $t$-random gate-set $\mathcal{S}_n\subset U(2^n)$ that forms a 0.01-approximate 2-design with probability $P=0.99$. On the other hand, the cardinality of the $n$-qubit Clifford group $\mathcal{C}_n$ is given by [11] $|\mathcal{C}_n|=2^{n^2+2n}\prod_{j=1}^{n}(4^j-1)$. Figure 6 shows $|\mathcal{C}_n|/S_n$ for up to 50 qubits. The ratio $|\mathcal{C}_n|/S_n$ grows at least exponentially with $n$. In fact, one can easily see that $\sum_{\lambda\in\Lambda_t}d_\lambda\leq d^{2t}$. Thus (37) grows only logarithmically with $d$.
Figure 6: Ratio of the cardinality of the set of $n$-qubit Clifford gates $\mathcal{C}_n$ and the minimal cardinality of the $t$-random gate-set $\mathcal{S}_n\subset U(2^n)$ that forms a 0.01-approximate 2-design with probability at least 0.99.
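The comparison with the Clifford group can be reproduced approximately in a few lines. The sketch below (helper names are hypothetical) computes $|\mathcal{C}_n|$ exactly with Python integers from the formula quoted from [11], and estimates $S_n$ from the Theorem 1 bound with $D_2$ replaced by its upper bound $d^{4}$, $d=2^n$. The resulting $S_n$ is therefore an overestimate, and the numbers may differ from those plotted in Figure 6.

```python
import math


def clifford_order(n):
    """|C_n| = 2^(n^2 + 2n) * prod_{j=1}^{n} (4^j - 1), computed exactly."""
    order = 2 ** (n * n + 2 * n)
    for j in range(1, n + 1):
        order *= 4 ** j - 1
    return order


def min_gates(delta, P, D_t):
    """Smallest S with 2*D_t*exp(-S*g(delta)) <= 1 - P (Theorem 1 master bound)."""
    g = delta * math.atanh(delta) + 0.5 * math.log(1 - delta ** 2)
    return math.ceil((math.log(2 * D_t) - math.log(1 - P)) / g)


def log10_ratio(n, t=2, delta=0.01, P=0.99):
    """log10(|C_n| / S_n), with D_t replaced by the upper bound d^(2t), d = 2^n."""
    d = 2 ** n
    S_n = min_gates(delta, P, d ** (2 * t))
    return math.log10(clifford_order(n)) - math.log10(S_n)


for n in (1, 10, 50):
    print(n, round(log10_ratio(n), 1))
```

The numerator grows like $2^{2n^2}$ while $S_n$ grows only linearly in $n$, which is why the ratio explodes.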
We also note that if $\nu_\mathcal{S}$ is a $\delta$-approximate $t$-design with probability at least $1-\epsilon$, then $\nu_\mathcal{S}^{*l}$, whose support consists of all circuits of depth $l$ built from gates from $\mathcal{S}$, is a $\delta^l$-approximate $t$-design with probability at least $1-\epsilon$. Thus one can first construct a $1/2$-approximate $t$-design and then, by building all possible circuits of length $l$, turn it into a $1/2^l$-approximate $t$-design. Table 2 presents numerical calculations for various $t$ and $d$.
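The $\delta^l$ improvement under convolution can be illustrated numerically for $t=1$, a simplifying choice of ours. We use the standard fact that the moment operator of a convolution is the product of the moment operators, so $\nu_\mathcal{S}^{*l}$ has moment operator $T^l$. The Haar moment operator $P=|\Phi\rangle\langle\Phi|$ satisfies $TP=PT=P$. Hence $T^l-P=(T-P)^l$, and therefore $\delta(\nu_\mathcal{S}^{*l},1)\leq\delta(\nu_\mathcal{S},1)^l$. The helper names below are hypothetical.

```python
import numpy as np


def haar_unitary(d, rng):
    """Sample a Haar-random unitary from U(d) via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diagonal(r)
    return q * (diag / np.abs(diag))


def t1_moment(gates):
    """T = (1/S) sum_U U (x) conj(U); the l-fold convolution of nu_S has moment operator T^l."""
    return sum(np.kron(u, u.conj()) for u in gates) / len(gates)


def delta_of_power(gates, l):
    """delta(nu_S^{*l}, 1) = ||T^l - |Phi><Phi| ||_op."""
    d = gates[0].shape[0]
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)
    haar = np.outer(phi, phi.conj())
    return np.linalg.norm(np.linalg.matrix_power(t1_moment(gates), l) - haar, 2)


rng = np.random.default_rng(7)
gates = [haar_unitary(2, rng) for _ in range(3)]
delta = delta_of_power(gates, 1)
for l in range(1, 6):
    print(l, delta_of_power(gates, l), delta ** l)
```

The inequality is deterministic: it holds for every realisation of $\mathcal{S}$, which is why the success probability $1-\epsilon$ carries over unchanged.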