Diagnosing Barren Plateaus with Tools from Quantum Optimal Control

Variational Quantum Algorithms (VQAs) have received considerable attention due to their potential for achieving near-term quantum advantage. However, more work is needed to understand their scalability. One known scaling result for VQAs is barren plateaus, where certain circumstances lead to exponentially vanishing gradients. It is common folklore that problem-inspired ansatzes avoid barren plateaus, but in fact, very little is known about their gradient scaling. In this work we employ tools from quantum optimal control to develop a framework that can diagnose the presence or absence of barren plateaus for problem-inspired ansatzes. Such ansatzes include the Quantum Alternating Operator Ansatz (QAOA), the Hamiltonian Variational Ansatz (HVA), and others. With our framework, we prove that avoiding barren plateaus for these ansatzes is not always guaranteed. Specifically, we show that the gradient scaling of the VQA depends on the degree of controllability of the system, and hence can be diagnosed through the dynamical Lie algebra $\mathfrak{g}$ obtained from the generators of the ansatz. We analyze the existence of barren plateaus in QAOA and HVA ansatzes, and we highlight the role of the input state, as different initial states can lead to the presence or absence of barren plateaus. Taken together, our results provide a framework for trainability-aware ansatz design strategies that do not come at the cost of extra quantum resources. Moreover, we prove no-go results for obtaining ground states with variational ansatzes for controllable systems such as spin glasses. Our work establishes a link between the existence of barren plateaus and the scaling of the dimension of $\mathfrak{g}$.


INTRODUCTION
Quantum computers hold the promise of achieving computational speed-ups over classical supercomputers for certain tasks [1,2,3,4]. However, despite tremendous recent progress in quantum technologies, present-day quantum devices, known as Noisy Intermediate-Scale Quantum (NISQ) devices, are constrained by their limited number of qubits, their limited connectivity, and the presence of quantum noise [5]. Hence, it becomes crucial to determine the capabilities and limitations of NISQ computers for achieving a quantum advantage.
One of the most promising computational models for making use of near-term quantum computers is that of Variational Quantum Algorithms (VQAs) [6]. Here, a task of interest is encoded into a parametrized cost function C(θ) that is efficiently computable on a noisy quantum computer. Part of the computational complexity is pushed onto classical computers by leveraging the power of classical optimizers that train the parameters θ to minimize the cost. VQAs have been proposed for tasks such as solving linear systems of equations [7,8,9] or performing dynamical quantum simulations [10,11,12,13,14,15,16,17], as well as for many other relevant applications [18,19,20,21,22,23,24,25,26,27,28,29,30].
Despite their wide range of applications, the widespread use of VQAs is still limited by several challenges that can hinder their success. For instance, it has been shown that the optimization task associated with minimizing the VQA cost function is in general an NP-hard non-convex optimization problem [31]. Moreover, beyond the typical difficulties encountered in classical non-convex optimization tasks, new challenges arise when training the parameters of VQAs, such as hardware noise or the limited precision arising from a finite number of shots. These difficulties have led to the development of several quantum-aware optimizers [32,33,34,10,35,36].
In addition, certain VQAs have been shown to exhibit the so-called barren plateau phenomenon, where the cost function becomes untrainable due to gradients that vanish, on average, exponentially with the system size [37,38,39,40,41,42,43,44,45,46,47,48]. Thus, barren plateaus have been recognized as one of the main limitations to overcome in order to preserve the hope of achieving quantum advantage with VQAs. Recently, a great deal of effort has been devoted to developing methods that can mitigate the effect of barren plateaus [49,50,51,52,53,54,55], but ideally one would like to devise and employ VQA ansatzes that do not exhibit barren plateaus at all.
For instance, it is known that one should avoid problem-agnostic ansatzes such as deep hardware efficient ansatzes, as these can exhibit barren plateaus due to their high expressibility [37,38,47]. Hence, so-called problem-inspired ansatzes have been speculated to be able to overcome barren plateaus by encoding information about the problem at hand in the ansatz. Here, the intuition is that problem-inspired ansatzes constrain the space explored during the optimization to a space that either contains the solution to the problem, or that at least contains a good approximation to the solution, while maintaining a low expressibility.
In this work we employ tools from Quantum Optimal Control (QOC) to diagnose the presence or absence of barren plateaus in certain families of problem-inspired ansatzes with a periodic structure. QOC theory is a long-standing theoretical framework developed to provide tools for the manipulation of quantum dynamical processes. As shown in Fig. 1, we here make use of the fact that periodic VQAs and QOC systems can be considered as different level formulations of a common variational problem [56], as both aim at driving a quantum system with a classical optimization loop. Most importantly, this connection allows us to understand and forecast the presence or absence of barren plateaus in problem-inspired variational ansatzes like the Quantum Alternating Operator Ansatz (QAOA) [19,57] and the Hamiltonian Variational Ansatz (HVA) [58,59]. We note that the procedure is perfectly suitable for other periodic ansatzes like the adaptive QAOA [57, 60] and the quantum optimal control ansatz [61]. Moreover, our results also extend to quantum neural network architectures used in the quantum machine learning literature [62]. Our results indicate that problem-inspired ansatzes are not immune to barren plateaus, and hence that certain ansatz strategies in the literature need to be revised.
Our main results are organized into propositions and theorems that show that one can diagnose the existence of barren plateaus by analyzing the controllability of the system, i.e., by studying the Dynamical Lie Algebra (DLA) of the system. The DLA is the subspace of operator space spanned by the nested commutators of the elements in the set of generators of the ansatz (e.g., see [63] for an introduction to quantum control theory). In an effort to give a comprehensive picture, our results follow the different controllability scenarios shown in Fig. 2.

Figure 1: VQAs and QOC can be regarded as two different levels of a theory that manipulates the evolution of a quantum system by training sets of parameters governing the system's dynamical evolution [56]. In VQAs (QOC) one applies a series of parametrized quantum gates (control pulses) to an input state. By gathering knowledge on the evolution via measurements on the resulting evolved state, the set of parameters (controls) is trained using a classical optimizer until a given task is completed. In this work we consider VQAs and QOC systems that have periodic structure ansatzes as in Eq. (2).
The manuscript is organized as follows. In Section 2 we present the theoretical framework for VQAs, which includes a description of the type of ansatz considered, as well as a basic review of concepts related to barren plateaus and ansatz expressibility. Then, in Section 3 we introduce the framework of QOC, and recall how in QOC theory the DLA of the ansatz generators is used to study the controllability of the system. Section 4 contains the main results of this work, while in Section 5 we present our numerical simulations. Finally, in Section 6 we present our discussions and conclusions.

VARIATIONAL QUANTUM ALGORITHMS
In this section we review the basic framework of Variational Quantum Algorithms (VQAs). In particular, we discuss a general form for ansatzes that have a periodic structure, which we consider throughout this work. Since our goal is to analyze the gradient scaling, we additionally provide an overview of the barren plateau phenomenon.

General framework
We consider an optimization task where the goal is to minimize a cost function of the form

$$C(\theta) = \mathrm{Tr}\left[O\,U(\theta)\,\rho\,U^{\dagger}(\theta)\right]. \qquad (1)$$

Here, $\rho$ is an input state on $n$ qubits in a $d$-dimensional Hilbert space with $d = 2^n$, $U(\theta)$ is a parametrized quantum circuit, and $O$ is a Hermitian operator that defines the task at hand. Throughout this work we consider layered parametrized quantum circuits that, as shown in Fig. 1, have a periodic structure of the form

$$U(\theta) = \prod_{l=1}^{L} U_l(\theta_l), \qquad U_l(\theta_l) = \prod_{k} e^{-i\theta_{lk}H_k}. \qquad (2)$$

Here, the index $l$ indicates the layer, $\theta_l = (\theta_{l1}, \ldots, \theta_{lK})$ contains the parameters of such layer (such that $\theta = \{\theta_l\}_{l=1}^{L}$), and $H_k$ are Hermitian traceless operators that generate the unitaries in the ansatz. For generality, one can also allow for certain layers to be unparametrized, in which case one would simply set certain $\theta_{lk}$ to be constant. In what follows we refer to this type of ansatz as a Periodic Structure Ansatz (PSA). We refer the reader to Appendix B for a detailed discussion of several ansatzes from the literature that are PSAs.
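As a concrete illustration of the cost function in Eq. (1) for a layered ansatz that repeats the same generators in every layer, the following is a minimal NumPy sketch rather than the implementation used in this work. The helper names (`expm_hermitian`, `psa_unitary`, `cost`), the two-qubit generators, the input state, and the convention that the generators within a layer are applied in list order are all assumptions made for the example.

```python
import numpy as np

# Single-qubit operators used to build the illustrative generators.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_hermitian(H, theta):
    """Return exp(-i * theta * H) for a Hermitian matrix H."""
    evals, evecs = np.linalg.eigh(H)
    # Scale each eigenvector column by its phase, then rotate back.
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.conj().T

def psa_unitary(generators, thetas):
    """Build U(theta) for a periodic structure ansatz.

    thetas has shape (L, len(generators)): every layer applies the same
    generators, each with its own angle. The gate ordering inside a layer
    is an assumption of this sketch.
    """
    d = generators[0].shape[0]
    U = np.eye(d, dtype=complex)
    for layer_angles in thetas:
        for theta, H in zip(layer_angles, generators):
            U = expm_hermitian(H, theta) @ U
    return U

def cost(O, rho, generators, thetas):
    """Evaluate C(theta) = Tr[O U(theta) rho U(theta)^dagger]."""
    U = psa_unitary(generators, thetas)
    return float(np.real(np.trace(O @ U @ rho @ U.conj().T)))

# Hypothetical two-qubit example with QAOA-like generators.
H_problem = np.kron(Z, Z)
H_mixer = np.kron(X, I2) + np.kron(I2, X)
generators = [H_problem, H_mixer]

rho = np.zeros((4, 4), dtype=complex)
rho[0, 0] = 1.0            # input state |00><00|
O = np.kron(Z, Z)          # operator defining the task

rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, size=(3, len(generators)))  # L = 3 layers
value = cost(O, rho, generators, thetas)
```

Setting all angles to zero gives the identity circuit, so in this example the cost reduces to Tr[Oρ] = 1, which is a convenient sanity check.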

Barren plateaus
Recently, it has been shown that the choice of ansatz can hinder the trainability of the parameters θ for large problem sizes due to the existence of the so-called barren plateau phenomenon. In this context, deep unstructured problem-agnostic ansatzes are known to exhibit barren plateaus [37,49,47]. Hence, the design of ansatzes that overcome barren plateaus has been recognized as one of the most important challenges to guarantee the success of VQAs [6], and problem-inspired ansatzes have been proposed as one of the most promising strategies. However, despite their promise, little is known about the existence of barren plateaus in problem-inspired ansatzes.
Let us now briefly recall that when the cost exhibits a barren plateau, the gradients are exponentially suppressed (on average) across the optimization landscape. This implies that an exponentially large precision is needed to navigate through the flat landscape and determine a cost-minimizing direction [37,49,42]. Hence, consider the following definition.

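Definition 1 (Barren Plateau). A cost function $C(\theta)$ as in Eq. (1) is said to have a barren plateau when training $\theta_\mu \equiv \theta_{pq} \in \theta$, if the cost function partial derivative $\partial C(\theta)/\partial\theta_\mu \equiv \partial_\mu C(\theta)$ is such that

$$\mathrm{Var}_\theta[\partial_\mu C(\theta)] \leq F(n), \quad \text{with} \quad F(n) \in O\left(\frac{1}{b^n}\right), \qquad (3)$$

for some $b > 1$. Here the variance is taken with respect to the set of parameters $\theta$.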
We refer the reader to Appendix C for additional details on barren plateaus.
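Definition 1 refers to the variance of a cost partial derivative over the parameter set. A minimal sketch of how one might estimate this quantity numerically for a small system is given below. Sampling the angles uniformly in [0, 2π), using a central finite difference for the derivative, and all function names are assumptions made for illustration, not choices taken from this work.

```python
import numpy as np

def expm_hermitian(H, theta):
    """Return exp(-i * theta * H) for a Hermitian matrix H."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * theta * evals)) @ evecs.conj().T

def cost(O, rho, generators, thetas):
    """C(theta) = Tr[O U rho U^dagger] for a layered, periodic ansatz."""
    U = np.eye(rho.shape[0], dtype=complex)
    for layer_angles in thetas:
        for theta, H in zip(layer_angles, generators):
            U = expm_hermitian(H, theta) @ U
    return float(np.real(np.trace(O @ U @ rho @ U.conj().T)))

def partial_derivative(O, rho, generators, thetas, p, q, step=1e-5):
    """Central finite-difference estimate of dC/dtheta_{pq}."""
    shift = np.zeros_like(thetas)
    shift[p, q] = step
    plus = cost(O, rho, generators, thetas + shift)
    minus = cost(O, rho, generators, thetas - shift)
    return (plus - minus) / (2 * step)

def gradient_variance(O, rho, generators, n_layers, p, q, n_samples=500, seed=0):
    """Sample variance of dC/dtheta_{pq} over uniformly random angles."""
    rng = np.random.default_rng(seed)
    shape = (n_layers, len(generators))
    grads = [
        partial_derivative(
            O, rho, generators, rng.uniform(0, 2 * np.pi, size=shape), p, q
        )
        for _ in range(n_samples)
    ]
    return float(np.var(grads))

# Hypothetical single-qubit check: one layer generated by X, measured in Z.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)
variance_estimate = gradient_variance(Z, rho0, [X], n_layers=1, p=0, q=0, n_samples=2000)
```

For this single-qubit example the cost is C(θ) = cos(2θ), so the exact variance of its derivative over uniformly sampled θ is 2, and the estimate should land close to that value. Diagnosing a barren plateau would require repeating such an estimate for increasing n and inspecting how it scales.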
It is worth remarking that the barren plateau phenomenon has been linked to the expressibility of the ansatz, as it has been shown that circuits with large expressibility will exhibit small gradients [47]. In this context, one can quantify the expressibility of an ansatz by comparing the distribution of unitaries obtained from $U(\theta)$ to the maximally expressive uniform (Haar) distribution $U_H$ [64]. Defining the $t$-th moment superoperator of the distribution generated by the ansatz $U(\theta)$,

$$\mathcal{M}^{(t)}_{U(\theta)} = \int dU(\theta)\, U(\theta)^{\otimes t} \otimes \left(U(\theta)^{*}\right)^{\otimes t}, \qquad (4)$$

we recall that its ordinary action on a given operator $X$ can be obtained by placing said operator into the center of the representation of $\mathcal{M}$:

$$\mathcal{M}^{(t)}_{U(\theta)}(X) = \int dU(\theta)\, U(\theta)^{\otimes t}\, X \left(U(\theta)^{\dagger}\right)^{\otimes t}. \qquad (5)$$

In our case, we will only be interested in second moments. For that reason, we will focus on the deviation of the second moments of the distribution generated by the ansatz, $\mathcal{M}^{(2)}_{U(\theta)}$, from the second moments of the Haar distribution, $\mathcal{M}^{(2)}_{U_H}$, via the norm of the superoperator

$$\mathcal{A}^{(2)}_{U(\theta)} = \mathcal{M}^{(2)}_{U_H} - \mathcal{M}^{(2)}_{U(\theta)}. \qquad (6)$$

For our purposes here, we find it convenient to define the expressibility as the infinity norm, $\|\mathcal{A}\|_\infty = \lambda_{\max}(\mathcal{A})$, with $\lambda_{\max}(\mathcal{A})$ the largest singular value of $\mathcal{A}$. Thus, the more expressible the ansatz, the smaller the norm $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty$, and the smaller the cost function gradients [47]. The limit $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty = 0$ is reached when $U(\theta)$ forms a 2-design, in which case the cost exhibits a barren plateau according to Definition 1 [37,49].

QUANTUM OPTIMAL CONTROL
Quantum Optimal Control (QOC) is a theoretical framework that provides tools for the systematic manipulation of quantum dynamical systems. The connection between VQAs and QOC has been previously established showing that one can use QOC tools to specify the parameters θ at a device-level [69,56,70] and to analyze VQA landscapes [71]. Conversely, tools from VQAs have been employed to determine optimal control sequences [72]. In particular, Ref. [56] notes that VQAs and QOC can be unified as formulations of variational optimization at the circuit level and pulse level, respectively. In addition, the framework of QOC has been employed to analyze the computational universality of quantum circuits [73,74,75], as well as their reachability [76].
In QOC one is interested in controlling the dynamical evolution of a quantum state $|\psi\rangle$ in a complex $d$-dimensional Hilbert space $\mathcal{H} = \mathbb{C}^d$ (where $d = 2^n$) [63]. In the typical setting, one has a Hamiltonian

$$H(t) = H_0 + \sum_{k} f_k(t) H_k \qquad (7)$$

that is tunable through some time-dependent functions $\{f_k(t)\}$, known as control fields or protocols. The fixed Hamiltonian $H_0$, usually called the drift, represents the natural or free evolution of the system, whereas the control Hamiltonians $\{H_k\}$ are associated with interactions with external degrees of freedom (usually electromagnetic radiation). Thus, $|\psi\rangle$ evolves through the parametrized propagator $U(t)$ as $|\psi(t)\rangle = U(t)|\psi\rangle$. In turn, $U(t)$ is the solution to the Schrödinger equation

$$\frac{dU(t)}{dt} = -iH(t)U(t), \qquad U(0) = \mathbb{1}. \qquad (8)$$

As shown in Appendix D, under standard assumptions, the Trotterized QOC propagator of Eq. (8) is a PSA as in Eq. (2). The variety of different dynamics a quantum control system in the form of Eq. (7) can undergo, upon variation of the control fields, is well understood through group theory. Since the Hamiltonian is Hermitian and traceless, $U(\theta)$ belongs to $SU(d)$, the Lie group of unitary $d \times d$ complex matrices with unit determinant, which preserve the standard inner product on $\mathcal{H}$. Surprisingly, the set of all unitaries $U(\theta)$ that can be accessed by such a control system forms itself a Lie group, known as the dynamical Lie group $G \subseteq U(d)$. Hence, a natural question which arises is: how can this group be determined?
First, let us define the set of generators.
Definition 2 (Set of generators). Given a parametrized quantum circuit of the form in Eq. (2), we define the set of generators $\mathcal{G} = \{H_k\}_{k=0}^{K}$ as the set (of size $|\mathcal{G}| = K + 1$) of the (traceless) Hermitian operators that generate the unitaries in a single layer of $U(\theta)$.
Naturally, the group $G$ depends on the set of generators $\mathcal{G}$, yet it is not sufficient to look at the individual elements of $\mathcal{G}$. Instead, one must consider the Lie algebra that emerges from their nested commutators. Hence, consider the following definition [77].
Definition 3 (Dynamical Lie Algebra). Given a control system with generators $\mathcal{G}$ (see Definition 2), the Dynamical Lie Algebra (DLA) $\mathfrak{g}$ is the subalgebra of $\mathfrak{su}(d)$ spanned by the repeated nested commutators of the elements in $\mathcal{G}$, i.e.,

$$\mathfrak{g} = \mathrm{span}\,\langle iH_0, \ldots, iH_K \rangle_{\mathrm{Lie}}, \qquad (9)$$

where $\langle S \rangle_{\mathrm{Lie}}$ denotes the Lie closure, i.e., the set obtained by repeatedly taking the nested commutators between the elements in $S$.
Here, su(d) is the special unitary algebra of degree d, the Lie algebra formed by the set of d × d skew-Hermitian, traceless matrices. In Appendix E, we lay down the basic procedure to build DLAs (see Algorithm 1) and provide some discussion on the complexity of such construction.
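To make the Lie closure of Definition 3 concrete, the sketch below computes the dimension of a DLA numerically for small systems. It is not Algorithm 1 of Appendix E, which is not reproduced here; it is one straightforward way to implement the nested-commutator idea. The function names, the tolerance, and the example generator sets are assumptions made for illustration. Linear independence is tested over the reals, since the DLA is a real vector space of skew-Hermitian matrices.

```python
import numpy as np

def _to_real_vector(A):
    """Flatten a complex matrix into a real vector (real and imaginary parts)."""
    return np.concatenate([A.real.ravel(), A.imag.ravel()])

def _to_matrix(v, d):
    """Inverse of _to_real_vector for d x d matrices."""
    return v[: d * d].reshape(d, d) + 1j * v[d * d :].reshape(d, d)

def dla_dimension(hermitian_generators, tol=1e-9):
    """Dimension of the Lie closure of {i H_k}, built from nested commutators."""
    gens = [1j * np.asarray(H, dtype=complex) for H in hermitian_generators]
    d = gens[0].shape[0]
    basis = []  # orthonormal real vectors spanning the algebra found so far

    def try_add(A):
        """Add A if it is independent of the basis; return the new direction or None."""
        v = _to_real_vector(A)
        norm = np.linalg.norm(v)
        if norm < tol:
            return None
        v = v / norm
        for _ in range(2):  # two Gram-Schmidt passes for numerical stability
            for u in basis:
                v = v - np.dot(u, v) * u
        norm = np.linalg.norm(v)
        if norm < tol:
            return None
        v = v / norm
        basis.append(v)
        return _to_matrix(v, d)

    # Start from the generators, then keep commuting new elements with them.
    frontier = [M for M in map(try_add, gens) if M is not None]
    while frontier:
        new_elements = []
        for A in frontier:
            for g in gens:
                M = try_add(A @ g - g @ A)
                if M is not None:
                    new_elements.append(M)
        frontier = new_elements
    return len(basis)

# Hypothetical examples on one and two qubits.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

dim_single = dla_dimension([X, Z])
dim_ising = dla_dimension([np.kron(X, I2) + np.kron(I2, X), np.kron(Z, Z)])
dim_local = dla_dimension(
    [np.kron(X, I2), np.kron(Y, I2), np.kron(I2, X), np.kron(I2, Y), np.kron(Z, Z)]
)
```

In these examples the single-qubit set {X, Z} closes into all of su(2) (dimension 3), and the last two-qubit set reaches the full dimension d² − 1 = 15, whereas the Ising-like pair stops at dimension 4. Since matrices of size 2ⁿ × 2ⁿ are stored explicitly, such a computation is only practical for a handful of qubits.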
Once the DLA is obtained from the set of generators, one can determine the set of unitaries that are expressible by the control system. Specifically, one can now properly define the dynamical Lie group as follows.
Definition 4 (dynamical Lie group). The set of unitaries $G$ that can be generated by a control system is determined by its DLA (see Definition 3) through

$$G = e^{\mathfrak{g}} := \{ e^{iH} : iH \in \mathfrak{g} \}. \qquad (10)$$

The dynamical Lie group in turn determines the set of states $|\psi(\theta)\rangle = U(\theta)|\psi\rangle$ that can be reached by evolving an initial state $|\psi\rangle$. Specifically, here $U(\theta)$ can attain values in the Lie group $G$. In addition, Definition 4 crucially shows that one can study the expressibility of a control system (i.e., the unitaries that can be generated, or the set of states that can be reached) via the DLA obtained from the set of generators. As shown in Fig. 2, when computing $\mathfrak{g}$ there are several cases of interest that can arise and which we here consider. For the sake of clarity, in what follows we briefly recall several key concepts that will be useful throughout the manuscript. We refer the reader to [63] for additional details.
First and foremost, we recall the concept of controllability. A control system is said to be controllable if its DLA is full rank, i.e., $\mathfrak{g} = \mathfrak{su}(d)$. This implies that $G = SU(d)$ and hence every unitary (up to a phase) can be obtained by appropriately choosing control parameters in Eq. (45). In particular, this means that for any two states $|\psi\rangle$ and $|\phi\rangle$, there always exists a unitary $U(\theta) \in G$ such that $U(\theta)|\psi\rangle = |\phi\rangle$.
If the DLA is not full rank, then the system is said to be uncontrollable. In this case $\mathfrak{g}$ is a proper subalgebra of $\mathfrak{su}(d)$, and only a proper subgroup $G \subset SU(d)$ is available to the control system, meaning that the set of reachable states $\{U(\theta)|\psi\rangle, \forall\, U(\theta) \in G\}$ is not the whole state space. As depicted in Fig. 2, there are two sources of uncontrollability [78].

Figure 2: Cases of interest for the Dynamical Lie Algebra. The Dynamical Lie Algebra (DLA) $\mathfrak{g}$ determines the set of unitaries expressible, and concomitantly, the set of states reachable. In this figure we show different scenarios that can arise when computing $\mathfrak{g}$. Our main results (on gradient scaling of VQAs and QOC systems) pertain to these different scenarios.
On one hand, if the generators in G share one or more common symmetries, i.e., there is at least one Hermitian operator Σ that commutes with every element in G, then every H ∈ g is block diagonal in the eigenbasis of Σ. This causes the state space to break into subspaces that are invariant under the action of g, in which case controllability is clearly disrupted. Here, the DLA is a reducible representation of some Lie algebra. On the other hand, even in the absence of symmetries, that is, when the DLA is irreducible, uncontrollability can arise simply because the Lie algebra is a proper subalgebra of su(d).
Let us finally remark that, as shown in Fig. 2, even though a reducible system cannot be controllable on the entire Hilbert space, it may still be controllable on some (or all) of the invariant subspaces. Given a DLA that is a direct sum of irreducible representations, i.e., $\mathfrak{g} = \bigoplus_j \mathfrak{g}_j$, the Hilbert space can be expressed as $\mathcal{H} = \bigoplus_j \mathcal{H}_j$, with $\mathcal{H}_j$ being invariant under the action of $\mathfrak{g}$. A system is said to be subspace controllable on subspace $\mathcal{H}_j$ if $\mathfrak{g}_j$ is full rank, i.e., $\mathfrak{g}_j = \mathfrak{u}(d_j)$, where $d_j = \dim(\mathcal{H}_j)$ and $\mathfrak{u}(d_j)$ denotes the unitary algebra of degree $d_j$.
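The role of a common symmetry can be checked directly on small examples. The sketch below tests whether a candidate Hermitian operator commutes with every generator and, if so, lists the dimensions of its eigenspaces, which are invariant subspaces of the dynamics. The two-qubit generators, the candidate symmetry, and the function names are hypothetical choices made for illustration. Note that the eigenspaces of a single symmetry may decompose further when additional symmetries are present.

```python
import numpy as np
from collections import Counter

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def is_common_symmetry(sigma, generators, tol=1e-12):
    """True if sigma commutes with every generator."""
    return all(np.linalg.norm(sigma @ H - H @ sigma) < tol for H in generators)

def eigenspace_dimensions(sigma, decimals=9):
    """Map each eigenvalue of the Hermitian operator sigma to its multiplicity."""
    # Adding 0.0 turns any -0.0 produced by rounding into 0.0.
    evals = np.round(np.linalg.eigvalsh(sigma), decimals) + 0.0
    return dict(sorted(Counter(evals.tolist()).items()))

# Hypothetical two-qubit generators that conserve the number of excitations.
generators = [np.kron(X, X) + np.kron(Y, Y), np.kron(Z, I2)]
sigma = np.kron(Z, I2) + np.kron(I2, Z)   # candidate symmetry Z_1 + Z_2

symmetric = is_common_symmetry(sigma, generators)
dims = eigenspace_dimensions(sigma)
```

Here `sigma` commutes with both generators, and its eigenvalues −2, 0, and 2 have multiplicities 1, 2, and 1, so the four-dimensional Hilbert space splits into invariant subspaces of those dimensions.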

MAIN RESULTS
As previously discussed, VQAs and QOC can be considered as two formulations of a common variational optimization problem that optimizes parameters controlling the dynamical evolution of a quantum system. In this section we present our main results, where we leverage tools from QOC to analyze the trainability and the existence of barren plateaus in VQAs. Specifically, we organize our results in terms of the different controllability settings shown in Fig. 2. In all cases, the proofs are presented in the Appendix. The main idea behind our results is that, given a PSA $U(\theta)$ as in Eq. (2), the study of the DLA of the ansatz can diagnose the presence (or absence) of barren plateaus in the VQA landscapes.

Controllable systems
First, let us consider controllable systems. It is well known that the distribution of unitaries generated by controllable systems converges to a 2-design in the long-time limit (i.e., for sufficiently deep circuits) [79]. However, the rate of convergence actually depends on the specific choice of generators. Hence, our first result analyzes the depth at which the expressibility $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty$ of a controllable system is $\varepsilon$ small.

Theorem 1. Consider a controllable system. Then, the PSA $U(\theta)$ will form an $\varepsilon$-approximate 2-design, i.e., $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty = \varepsilon$ with $\varepsilon > 0$, when the number of layers $L$ in the circuit is

$$L = \frac{\log(1/\varepsilon)}{\log\left(1/\|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty\right)}. \qquad (11)$$

Here $\|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty$ denotes the expressibility of a single layer $U_1(\theta_1)$ of the ansatz according to Eqs. (2) and (6).
See Appendix F for a proof of Theorem 1.
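We note that Theorem 1 arises from the following expression, which connects the expressibility of an $L$-layered PSA to the expressibility of a single layer of the ansatz raised to the $L$-th power:

$$\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty = \|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty^{L}. \qquad (12)$$

Here we can see that $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty$ shrinks with $L$ whenever $\|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty < 1$. Hence, as expected, PSAs that have more expressible layers require less depth to have an $\varepsilon$-expressibility (to be $\varepsilon$-approximate 2-designs). Conversely, one can also see that ansatzes with less expressible layers require more depth to become $\varepsilon$-approximate 2-designs.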
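The relation between depth and single-layer expressibility described above can be turned into a short numerical helper. The sketch below solves a^L = ε for L, where a stands for the single-layer norm; the function name and the sample values are assumptions made for illustration.

```python
import math

def layers_for_epsilon(single_layer_expressibility, epsilon):
    """Depth L solving a**L = epsilon, with a the single-layer norm in (0, 1)."""
    a = single_layer_expressibility
    if not (0.0 < a < 1.0) or not (0.0 < epsilon < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.log(1.0 / epsilon) / math.log(1.0 / a)

# Hypothetical values: a fairly expressible layer versus one whose norm is
# very close to 1, both targeting epsilon = 2**-10.
depth_expressible = layers_for_epsilon(0.5, 2.0 ** -10)
depth_inexpressible = layers_for_epsilon(1.0 - 2.0 ** -10, 2.0 ** -10)
```

With a = 1/2 about ten layers are enough, while a layer whose norm is within 2⁻¹⁰ of 1 needs several thousand layers for the same ε. In practice one would round the returned value up to the next integer, in which case the resulting norm is at most ε rather than exactly ε.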
The following corollary analyzes the scaling of L.
From Corollary 1 we have that when the single layer expressibility is (at most) polynomially vanishing with $n$, then a polynomial number of layers suffices to make the PSA $U(\theta)$ exponentially close to being a 2-design. We note, however, that in the case where the single layer expressibility is exponentially close to 1, one requires an exponential number of layers to form an $\varepsilon$-approximate 2-design. In all the aforementioned cases it is worth remarking that an exponential number of layers will always lead to $\varepsilon$-approximate 2-designs with $\varepsilon \in O(1/2^n)$, independently of the value of $\|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty$.
Once the depth of the ansatz is sufficient for the controllable system to be an $\varepsilon$-approximate 2-design, a barren plateau will arise. Hence, one can prove the following proposition from Theorem 1 and Corollary 1.

Proposition 1 (Controllable).
There exists a scaling of the depth for which controllable systems form $\varepsilon$-approximate 2-designs with $\varepsilon \in O(1/2^n)$, and hence the system exhibits a barren plateau according to Definition 1.
See Appendix H for a proof of Proposition 1.
Proposition 1 rephrases the well known barren plateau results of [37,47] in terms of controllability. Specifically, it has been shown that when an ansatz forms a 2-design, such randomness leads to a barren plateau. Hence, the proof of Proposition 1 simply follows the proof in [37], with the addition that the convergence to a 2-design comes from the fact that the system is controllable.
Evidently, it becomes relevant to determine systems that are controllable, as these can exhibit barren plateaus. In this work we prove that two relevant sets of generators lead to full rank DLAs, and hence to controllable systems.

Proposition 2. The following two sets of generators generate full rank DLAs, and concomitantly lead to controllable systems:

See Appendix I for a proof of Proposition 2.
The first case in Proposition 2 corresponds to the generators of a layered Hardware Efficient Ansatz [80] written as a PSA, and hence Proposition 1 indicates that this system can exhibit barren plateaus. While it is known that the layered Hardware Efficient Ansatz converges to a 2-design for sufficient depth [81,82,83,37,49], the proof of existence of barren plateaus for this ansatz presented here is novel in that we show that the system is controllable.
The second result in Proposition 2 pertains to determining the ground state energy of quantum spin glasses (usually configured to encode solutions to combinatorial optimization problems) [84, 85] with a PSA generated by $\mathcal{G}_{SG}$. Hence, since the system is controllable, according to Proposition 1, such an ansatz will also exhibit a barren plateau. This provides a no-go theorem for determining the ground state of certain spin glass Hamiltonians using deep PSA variational ansatzes.

Subspace controllable systems
Let us now consider the case of reducible DLAs, i.e., control systems with symmetries. Here we recall that in this case the DLA is a direct sum of the form $\mathfrak{g} = \bigoplus_j \mathfrak{g}_j$, such that any unitary $U(\theta)$ in the dynamical group $G$ preserves the subspaces, $U(\theta)\mathcal{H}_j \subset \mathcal{H}_j$. Then, similarly to the fully controllable case, if a system is subspace controllable in a given subspace there exists a depth at which the unitaries $U(\theta)$ form 2-designs in that subspace. In such a case, we can derive the following theorem for the variance of the cost function partial derivative with respect to a parameter $\theta_\mu \equiv \theta_{pq}$ associated to layer $p$ and generator $H_q$ (see Eq. (2) for a definition of the ansatz). In the following, we will slightly abuse notation and denote $H_\mu = H_q$.
Theorem 2 (Subspace controllable). Consider a system that is reducible, i.e., such that the Hilbert space is $\mathcal{H} = \bigoplus_j \mathcal{H}_j$ with each $\mathcal{H}_j$ invariant under the action of the dynamical Lie group $G$ (see Def. 4), and controllable on some $\mathcal{H}_k$ of dimension $d_k$ (i.e., $\mathfrak{g}_k = \mathfrak{u}(d_k)$ or $\mathfrak{su}(d_k)$). Consider a cost function $C(\theta)$ in the form of Eq. (1) and suppose that the number of layers $L$ in the circuit is enough to allow the distribution of unitaries $U(\theta)$ to be $\varepsilon$ close to a 2-design in $\mathcal{H}_k$. Then, if the initial state is such that $\rho \in \mathcal{H}_k$, the variance of the cost function partial derivative with respect to parameter $\theta_\mu$ is given by Eq. (13), where $O$ is the operator whose expectation value is being minimized and $H_\mu$ is the generator of the corresponding gate.

Theorem 2 shows that the input state $\rho$ can actually play a crucial role in determining the gradient scaling of the cost function. Specifically, if $\rho$ belongs to an invariant subspace where the system is controllable, then the scaling of the cost function partial derivative variance is determined by the dimension of the invariant subspace rather than by the dimension $d = 2^n$ of the Hilbert space. Hence, the cost function $C(\theta)$ might exhibit barren plateaus in some subspaces but not in others. This is formalized in the following corollary.

Figure 3: a) ... (16) will be block diagonal. Since the system is subspace controllable in each invariant subspace, the gradient scaling of the PSA can be analyzed via Theorem 2. b) The existence or absence of barren plateaus is directly determined by the dimension of the invariant subspace to which the input state $\rho$ belongs. For instance, the cost can be trainable for $\rho \in \mathcal{H}_1$, but will exhibit a barren plateau if $\rho \in \mathcal{H}_{n/2}$, as in the latter case the dimension $d_{n/2}$ is exponentially large.

Corollary 2.
Consider an ansatz of the form in Eq. (2) giving rise to a reducible DLA, and let $\rho \in \mathcal{H}_k$, with $\mathcal{H}_k$ some invariant subspace that is controllable (i.e., the DLA reduced to such subspace is full rank). The following bound holds

with a PSA generated by $\mathcal{G} = \mathcal{G}_{XXZ_U} \cup \{Z_1\}$. Here, the XXZ generators are accompanied by a control generator $Z_1$, which is introduced precisely to make the system (subspace) controllable. We remark that we here employ a $U$ subindex in $\mathcal{G}_{XXZ_U}$ to indicate that this set of generators is uncontrollable. Since all ele- . Because the example set $\mathcal{G}$ has a DLA that is full rank on every subspace [87], we can analyze the trainability of such a PSA using Theorem 2. The implications of Corollary 2 for such a VQA are schematically shown in Fig. 3. Here we find that the presence, or absence, of barren plateaus for the PSA $U(\theta)$ generated by Eq. (16) is completely determined by the scaling of the invariant subspace to which the input state belongs. For instance, the cost may not exhibit a barren plateau if $\rho$ has a number of excitations that does not scale with $n$, while it will have a barren plateau for $k = n/2$ (as in this case the dimension $d_{n/2}$ scales exponentially with the number of qubits).
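The contrast between a fixed number of excitations and k = n/2 can be made quantitative with a short calculation. The sketch below assumes, as the discussion of excitations suggests, that the invariant subspaces are labelled by the number of excitations k, so that the subspace dimension is the number of computational basis states with Hamming weight k, i.e., the binomial coefficient C(n, k). The function name is an assumption made for illustration.

```python
from math import comb

def excitation_subspace_dim(n_qubits, n_excitations):
    """Number of n-qubit computational basis states with the given Hamming weight."""
    return comb(n_qubits, n_excitations)

# Compare the single-excitation sector with the half-filling sector.
table = [
    (n, excitation_subspace_dim(n, 1), excitation_subspace_dim(n, n // 2))
    for n in (4, 8, 16, 32)
]
for n, d_one, d_half in table:
    print(f"n={n:2d}  d_1={d_one:3d}  d_(n/2)={d_half}")
```

The single-excitation dimension equals n and therefore grows only linearly, while the half-filling dimension is bounded below by 2ⁿ/(n + 1) and thus grows exponentially with the number of qubits.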

Uncontrollable and reducible systems
Analyzing the scaling of the gradients in the case of uncontrollable systems becomes much more intricate than in the controllable or subspace controllable cases, mainly because integrating over the Haar measure of proper subgroups of the unitary group is not so straightforward [88]. As shown in this (and the next) section, one can still obtain a few analytical results for these cases. In particular, one can derive an upper bound for the variance of partial derivatives in terms of the degree of expressibility on the invariant subspaces, in a spirit similar to that of [47].
Before presenting our main results for uncontrollable and reducible systems, it is convenient to introduce some notation. We will use $U_B$ and $U_A$, respectively, to address the portions of the circuit that come before and after a given parameter $\theta_\mu \equiv \theta_{pq} \in \theta$, where we have omitted the $\theta$ dependency for simplicity.

3 We recall that a state $|\psi\rangle$ has $m$ excitations if it can be expressed as a linear combination of computational basis states with Hamming weight $m$.
Then, the following theorem holds.
Theorem 3. Consider a system that is reducible and let ρ ∈ H k with H k an invariant subspace of dimension d k . Then, the variance of the cost function partial derivative is upper bounded by with Here For simplicity, we here employed the short-hand notation · U (k) See Appendix L for a proof of Theorem 3.
Theorem 3 generalizes the expressibility results in [47] to invariant subspaces. More specifically, Theorem 3 provides a bound on the variance of the cost function partial derivative ∂ µ C(θ) as a function of the ansatz expressibility on the relevant invariant subspace. Hence, similar to the results observed in [47], the more expressible an ansatz is in a subspace, the smaller the gradients will be. Moreover, ansatzes that are very expressible in subspaces with exponentially large dimensions can exhibit barren plateaus as the variance of the cost function partial derivative will vanish exponentially, according to Eq. (19).

Uncontrollable and irreducible systems
Here, we analyze a case where the DLA is an irreducible representation of some proper subalgebra of $\mathfrak{su}(d)$. Specifically, we consider a toy model ansatz whose generators are operators $S_j$ obeying the $\mathfrak{su}(2)$ commutation relations. That is, $[S_j, S_k] = 2i\epsilon_{jkl}S_l$, with $\epsilon_{jkl}$ the Levi-Civita symbol and $j, k, l \in \{x, y, z\}$. We address the task of minimizing a cost function of the form given in Eq. (20), where $|m\rangle$ is an eigenstate of $S_z$, i.e., $S_z|m\rangle = m|m\rangle$ with $m \in \{-S, -S+1, \ldots, S-1, S\}$.
Let us analyze the variance of the partial derivative with respect to parameter $\theta_\mu = \theta_{jx}$, i.e., the one corresponding to the generator $S_x$ on the $j$-th layer. Assuming a depth $p$ such that the distributions $U_A(\theta)$ and $U_B(\theta)$ converge to $\varepsilon$-approximate 2-designs on the dynamical Lie group $G$ (which in this case is the $d$-dimensional irreducible representation of $SU(2)$), we are able to explicitly integrate over the Haar measure on $G$ and find the following proposition to hold.

Proposition 3. Consider the cost function of Eq. (20). Let $\theta_\mu = \theta_{j,x}$, and let us assume that the circuit is deep enough to allow for the distribution of unitaries $U_A$ and $U_B$ to converge to 2-designs on $G = SU(2)$. Then the variance of the cost function partial derivative is given by Eq. (21).

See Appendix M for a proof of Proposition 3. Proposition 3 shows that the variance of the cost function partial derivative again depends on the input state $|m\rangle$, which is a similar result to the one obtained in Theorem 2. Moreover, here $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ can in fact be as large as $d^2$. This is due to the fact that the "size" (the difference between maxima and minima) of the landscape also grows with $d$. One can get rid of this effect by considering a normalized cost instead, $\widetilde{C}(\theta) = C(\theta)/S$, where the ad hoc factor $1/S$ guarantees that $|\widetilde{C}(\theta)| \leq 1$ for all values of $d$. The variance of such a normalized landscape vanishes exponentially for initial states with $|m| \in O(\mathrm{poly}(\log(d)))$. Similar to the subspace controllable results in Corollary 2, here the choice of initial state is again crucial as it can lead to the cost function exhibiting barren plateaus.

General case: linking gradient scaling to the dimension of the Lie algebra
In this section we note that the dimension of the DLA can be linked to the scaling of the variance of the cost function partial derivatives. This opens up the possibility of diagnosing the existence of barren plateaus of uncontrollable systems by analyzing the scaling of their DLAs. First, let us remark that a key aspect of the toy model in Section 4.4 is that the dimension of the DLA is dim(g) = 3. This is independent of the dimension d of the Hilbert space it acts on. Moreover, as shown in Eq. (21) the variance is also independent of d as it does not present the typical dimensionaldependent factor in the denominator that one usually obtains when integrating over unitary 2-designs (see Eq. (13) in Theorem 2).
For instance, when the system is controllable, $\dim(\mathfrak{g}) = d^2 - 1 = 2^{2n} - 1$, and thus the dimension of the DLA grows exponentially with the system size $n$. Concomitantly, one finds that the variance vanishes exponentially with the system size [37,49]. A similar result is obtained in the subspace controllable case (see Theorem 2), where the variance is governed by the dimension $d_k$ of the invariant subspace. These facts have led us to conjecture that the dimension of the DLA plays a key role in determining the presence or absence of barren plateaus in the cost function landscape. More specifically, for PSAs with sufficient depth (i.e., with a depth such that the distribution of unitaries generated by $U(\theta)$ has converged to the Haar measure on the Lie group $G$), the following conjecture appears to hold.

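Conjecture 1. Let the state $\rho$ belong to a subspace $\mathcal{H}_k$ associated with a subspace DLA $\mathfrak{g}_k$ (or sub-DLA, i.e., the subrepresentation of $\mathfrak{g}$ on which $\rho$ has support). Then, the scaling of the variance of the cost function partial derivative is inversely proportional to the scaling of the dimension of the DLA, i.e.,
$$\mathrm{Var}_\theta[\partial_\mu C(\theta)] \in O\left(\frac{1}{\mathrm{poly}(\dim(\mathfrak{g}_k))}\right). \tag{22}$$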
The implications of Conjecture 1 are as follows. First, it means that systems with a sub-DLA $\mathfrak{g}_k$ that grows polynomially with the system size can exhibit gradients that vanish only polynomially, and hence may not exhibit barren plateaus. Conversely, systems with a sub-DLA that grows exponentially with the system size would exhibit gradients that vanish exponentially with the system size, hence exhibiting barren plateaus. Here, we remark that systems with sub-DLAs that do not grow exponentially may still have barren plateaus which are not related to the dimension of the DLA. For instance, if the cost function is global, the system can still exhibit barren plateaus even with trivial ansatzes whose DLA dimension does not grow exponentially [49].
Using Conjecture 1, one could diagnose gradient scalings by determining the size of the Lie algebra of a given ansatz $U(\theta)$. This comes at the cost of taking the set of generators $G$ and computing the DLA. While numerical methods (as in Algorithm 1) can provide valuable insights for small system sizes, these algorithms will generally scale poorly with the number of qubits. Hence, performing a theoretical analysis of the DLA (similar to the one performed in Proposition 2) is a preferable method.
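As a rough illustration of the numerical route, the following minimal sketch computes the dimension of a DLA by repeatedly commuting the generators with the elements found so far and keeping only linearly independent results. It is not claimed to reproduce Algorithm 1; the function names (`dla_dimension`, `tfim_generators`, `pauli_on`), the tolerance, and the dense-matrix representation are assumptions made for this example, and the cost grows quickly with the number of qubits.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_on(pauli, site, n):
    """Embed a single-qubit operator on `site` of an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, pauli if k == site else I2)
    return out


def to_real_vec(m):
    """Flatten a complex matrix into a real vector (the DLA is a real vector space)."""
    return np.concatenate([m.real.ravel(), m.imag.ravel()])


def dla_dimension(hamiltonians, tol=1e-8):
    """Dimension of the real Lie algebra generated by {i*H_k} under commutation."""
    gens = [1j * h for h in hamiltonians]
    basis = []     # orthonormal real vectors spanning the algebra found so far
    elements = []  # matrices whose span equals the span of `basis`

    def try_add(m):
        norm = np.linalg.norm(m)
        if norm < tol:
            return False
        v = to_real_vec(m / norm)
        for _ in range(2):  # two Gram-Schmidt passes for numerical stability
            for b in basis:
                v = v - np.dot(b, v) * b
        residual = np.linalg.norm(v)
        if residual < tol:
            return False  # already in the span
        basis.append(v / residual)
        elements.append(m / norm)
        return True

    frontier = []
    for g in gens:
        if try_add(g):
            frontier.append(elements[-1])
    # Commuting the newest elements with the generators is enough to close the algebra.
    while frontier:
        new_frontier = []
        for e in frontier:
            for g in gens:
                if try_add(g @ e - e @ g):
                    new_frontier.append(elements[-1])
        frontier = new_frontier
    return len(basis)


def tfim_generators(n):
    """Open-chain generators sum_i Z_i Z_{i+1} and sum_i X_i (illustrative choice)."""
    zz = sum(pauli_on(Z, i, n) @ pauli_on(Z, i + 1, n) for i in range(n - 1))
    x = sum(pauli_on(X, i, n) for i in range(n))
    return [zz, x]


if __name__ == "__main__":
    print("single qubit {X, Z}:", dla_dimension([X, Z]))
    for n in (2, 3, 4):
        print("open TFIM-type generators, n =", n, "->", dla_dimension(tfim_generators(n)))
```

Because every basis element is stored as a dense $2^n \times 2^n$ matrix, this approach is only practical for a handful of qubits, which is consistent with the remark above that an analytical treatment of the DLA is preferable when available.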
Here we remark that there are simple (yet pathological) cases showing that Eq. (22) does not preclude the possibility that systems with algebras that grow polynomially with the system size may still exhibit barren plateaus. For instance, consider Eq. (13), where $\rho$ belongs to a subspace with a polynomially growing algebra: $d_k \in O(\mathrm{poly}(n))$. Then, note that if the input state is exponentially close to being maximally mixed on $\mathcal{H}_k$ (i.e., if $\Delta(\rho^{(k)}) \in O(1/2^n)$), one can easily verify that the system will exhibit a barren plateau according to Definition 1, as the cost function partial derivative will be exponentially vanishing. Here, the barren plateau arises not from the dimension of the DLA being exponentially large but rather from trying to train a VQA on an input state that is exponentially close to being maximally mixed. A similar result can be found if $H_\mu^{(k)}$ is exponentially close to the identity. Hence, we remark that Conjecture 1 does not imply that systems with polynomially growing algebras are exempt from having barren plateaus, as cases where $\rho$ (respectively, $H_\mu^{(k)}$) is exponentially close to being maximally mixed (respectively, the identity) will naturally be hard to train from the definition of the cost function in Eq. (1).
Finally, to further support the claim in Conjecture 1, in the following section we present results obtained by numerically computing the scaling of the variance of the cost function partial derivatives for systems whose DLAs have several different dependencies on the number of qubits. As discussed in Section 5, we see that Conjecture 1 holds true for all cases considered, as in these cases the scaling of the variance of the cost function partial derivative is inversely proportional to the scaling of the dimension of the DLA. In addition, based on our conjecture one can make predictions regarding whether a given modification to an ansatz (adding a new generator to $G$ by introducing a new unitary in each layer) might improve or be detrimental to the trainability of the parameters.

NUMERICAL SIMULATIONS
In this section we present results obtained by numerically computing the variance of the cost function partial derivatives for systems with different PSAs, and with DLAs of dimensions with different scaling. In particular, we consider systems that are controllable, subspace controllable, and subspace uncontrollable. As we show, in all cases Conjecture 1 is verified. Finally, we refer the reader to Appendix M for a numerical study of the toy model in Section 4.4, where g and G are, respectively, the d-dimensional irreducible representations of su(2) and SU(2).

Controllable systems
First, let us recall that when the system is controllable, $U(\theta)$ forms a 2-design (see Proposition 1). In this case the scaling of $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ has been widely analyzed in the literature (see for instance [37,47]). Controllable systems, as previously discussed, satisfy Conjecture 1. In Fig. 4 we show the variance of the cost function partial derivatives as a function of $1/\dim(\mathfrak{g})$ for a cost function evaluated on the input state $|\mathbf{0}\rangle = |0\rangle^{\otimes n}$, where $U(\theta)$ is a layered Hardware Efficient ansatz (see the circuit in the inset of Fig. 4) with 200 layers. For each value of $n = 2, 4, \ldots, 20$, the variance was computed by randomly initializing 1000 sets of parameters. Since this system is controllable (as proved in Proposition 1), $\dim(\mathfrak{g}) = d^2 - 1 = 4^n - 1$. In Fig. 4 we see that, as expected, the variance is a polynomial function of $1/\dim(\mathfrak{g})$ (indicated by a straight line in a log-log scale).
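The sampling procedure described above can be sketched for very small systems as follows. This is only an illustration under stated assumptions: the circuit (layers of $R_Y$ rotations followed by a chain of CZ gates), the observable $Z_1 Z_2$, the layer counts, and all function names are hypothetical choices made here, not the ansatz or cost function used for Fig. 4. The partial derivative is obtained with the parameter-shift rule, which is exact for gates of the form $e^{-i\theta Y/2}$.

```python
import numpy as np


def ry(theta):
    """Matrix of exp(-i * theta * Y / 2), which is real."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def apply_single(state, gate, qubit):
    """Apply a 2x2 gate to one qubit of a state stored with shape (2,)*n."""
    state = np.tensordot(gate, state, axes=([1], [qubit]))
    return np.moveaxis(state, 0, qubit)


def apply_cz(state, q1, q2):
    """Apply a controlled-Z between qubits q1 and q2."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[q1], idx[q2] = 1, 1
    state[tuple(idx)] *= -1
    return state


def cost(thetas, n):
    """<Z_1 Z_2> after a layered RY + CZ circuit; thetas has shape (layers, n), n >= 2."""
    state = np.zeros((2,) * n)
    state[(0,) * n] = 1.0
    for layer in thetas:
        for q in range(n):
            state = apply_single(state, ry(layer[q]), q)
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1)
    probs = np.abs(state) ** 2
    z = np.array([1.0, -1.0])
    signs = z.reshape((2, 1) + (1,) * (n - 2)) * z.reshape((1, 2) + (1,) * (n - 2))
    return float(np.sum(probs * signs))


def grad_param_shift(thetas, n, layer, qubit):
    """Partial derivative of the cost with respect to one angle (parameter-shift rule)."""
    plus, minus = thetas.copy(), thetas.copy()
    plus[layer, qubit] += np.pi / 2
    minus[layer, qubit] -= np.pi / 2
    return 0.5 * (cost(plus, n) - cost(minus, n))


def gradient_variance(n, layers, samples, rng):
    """Sample variance of one partial derivative over uniformly random parameters."""
    grads = [
        grad_param_shift(rng.uniform(0, 2 * np.pi, size=(layers, n)), n, layers // 2, 0)
        for _ in range(samples)
    ]
    return float(np.var(grads))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (2, 3, 4):
        print(n, gradient_variance(n, layers=4 * n, samples=200, rng=rng))
```

Plotting the resulting variances against $1/\dim(\mathfrak{g})$ on a log-log scale would mimic the kind of analysis shown in Fig. 4, although far larger depths and sample counts are needed for quantitative conclusions.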

The XXZ model
Let us first consider the task of finding the ground state energy of the XXZ Hamiltonian $H_{XXZ}$ of Eq. (15). First, let us notice that $G^{U}_{XXZ}$, the uncontrollable set of generators of Eq. (16), has two symmetries: magnetization and parity. Hence, the DLA is reducible, i.e., it is a direct sum of irreducible sub-representations labeled by the indices $m$ and $\sigma$, which indicate the number of excitations and the parity, respectively (see Appendix N for details). Notably, the system can be rendered subspace controllable (while preserving the invariant subspace structure) by introducing an additional generator consisting of local fields at the ends of the chain, $Z_1 + Z_n$ [89, 90]. The new set $G_{XXZ}$ of Eq. (24) generates a DLA that is full rank on each of the invariant subspaces, i.e., $\mathfrak{g}_{XXZ} = \bigoplus_{m=0}^{n} \bigoplus_{\sigma=\pm} \mathfrak{u}(d_{m,\sigma})$. In Figure 5(a), we sketch a single layer of the ansatz generated by $G_{XXZ}$. Note that upon removal of the unitary generated by $Z_1 + Z_n$ (indicated by a shaded area), one recovers the HVA ansatz with generators $G^{U}_{XXZ}$ proposed in Ref. [59].
Figure 5(b) shows numerical results obtained by computing $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ for the corresponding cost function with $J = 1$. Here, $U(\theta)$ is the HVA ansatz generated by $G_{XXZ}$ (see Eq. (24)) with $L = 6n$ layers, and $|\psi_{m,+}\rangle$ is an initial state with $m$ excitations and even parity $\sigma = +$ (see Appendix O for details). For each system size $n = 2, 4, \ldots, 20$, and for each value of $m$, we computed the variance with respect to $\theta_{L/2,2}$ by randomly initializing each $\theta_{pq} \in [0, 2\pi]$ and averaging over 9500 sets of parameters (for $n = 20$ we averaged over 2700 sets of parameters).
In Figure 5(b, left) we see that for $m = 1, 2, \ldots, 5$ the variance of the cost function partial derivative decreases polynomially with $n$, indicating that the cost function does not exhibit a barren plateau for initial states with a fixed number of excitations. However, in the case $m = n/2$ (see Figure 5(b, right)), one can observe that $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ vanishes exponentially. In addition, in Figure 5(b) we also show the curves for $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ obtained from the analytical result in Eq. (13) of Theorem 2. The agreement between theoretical and numerical results indicates that, already for the linear depths used in the experiments, the ansatz is well converged to a 2-design. Hence, the results in Theorem 2 suggest that the system will exhibit a barren plateau when initialized on any subspace where $d_m \in O(2^n)$, for example, in the case of $m = n/2$ excitations.
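The contrast between a fixed number of excitations and $m = n/2$ can be made concrete by counting basis states. The short sketch below tabulates the number of $n$-qubit computational basis states with exactly $m$ excitations, $\binom{n}{m}$. As a simplifying assumption it ignores the additional parity label $\sigma$, which can only split each sector further; the function name is hypothetical.

```python
from math import comb


def magnetization_subspace_dim(n, m):
    """Number of n-qubit computational basis states with exactly m excitations."""
    return comb(n, m)


if __name__ == "__main__":
    for n in range(4, 21, 4):
        fixed = magnetization_subspace_dim(n, 2)       # grows like n^2
        half = magnetization_subspace_dim(n, n // 2)   # grows exponentially in n
        print(f"n={n:2d}  dim(m=2)={fixed:4d}  dim(m=n/2)={half:7d}")
```

For fixed $m$ the count grows as a polynomial of degree $m$ in $n$, whereas at half filling it grows exponentially, which is the regime where the text above reports exponentially vanishing variances.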
In addition, Figure 5(b, right) shows the scaling of the variance for the PSA generated by $G^{U}_{XXZ}$, with an initial state with $m = n/2$. As previously noted, this case is not controllable and hence Theorem 2 does not hold. However, the gradient scaling of the cost function can still be diagnosed using the expressibility result of Theorem 3. First, we note that the variance values for the uncontrollable case are larger than the ones for the controllable case. This result is in accordance with the fact that the smallest variances are reached with the highest expressibilities. Still, despite the system not being controllable, we find that the cost function exhibits a barren plateau, as the variance vanishes exponentially with $n$.
In Figure 5(c) we show that for all subspace controllable cases considered, Conjecture 1 holds. Specifically, we have shown Var θ [∂ µ C(θ)] as a function of 1/ dim(g), and we see a linear dependence in a log-log scale. This is true both for the exponentially growing algebras (m = n/2) as well as for the polynomially growing algebras (m = 1, 2, . . . , 5). Moreover, we see that the Conjecture is verified on the subspace uncontrollable case of G XXZ U (pink stars), where the dimension of the DLA is exponentially growing, and concomitantly, the variance of the cost function partial derivative is exponentially suppressed.

The Ising Model
In this section we present results obtained by numerically simulating the use of a PSA to find the ground state of the Ising model. Specifically, consider the Hamiltonian of the one-dimensional Transverse Field Ising Model (TFIM), whose nearest-neighbor couplings $Z_i Z_{i+1}$ run over $i = 1, \ldots, n_f$, where $n_f = n - 1$ in the case of open boundary conditions, and $n_f = n$ in the case of periodic boundary conditions (where $Z_{n+1} \equiv Z_1$). Then, as shown in Figure 6(a, left), the ansatz is generated by the set $G_{TFIM} = \{\sum_{i=1}^{n_f} Z_i Z_{i+1}, \sum_{i=1}^{n} X_i\}$ of Eq. (27). Note that the PSA generated by $G_{TFIM}$ is in fact the QAOA employed for solving the MAXCUT problem on a 2-regular graph [19, 57].
As discussed in Appendix N, the generators in $G_{TFIM}$ (with open boundary conditions) have two symmetries: the parity symmetry $\Pi$, and the so-called $Z_2$ symmetry $\Pi_{Z_2}$ (representing an invariance under a global flip of the qubits). The Hilbert space is broken into four invariant subspaces, $\mathcal{H} = \bigoplus_{\sigma,\sigma'} \mathcal{H}_{\sigma,\sigma'}$, where $\sigma, \sigma' = \pm 1$ respectively label the eigenvalues of $\Pi$ and $\Pi_{Z_2}$, and where $\dim(\mathcal{H}_{\sigma,\sigma'})$ grows exponentially, i.e., $\dim(\mathcal{H}_{\sigma,\sigma'}) \in O(2^n)$. In turn, the DLA decomposes as $\mathfrak{g}_{TFIM} = \bigoplus_{\sigma,\sigma'} \mathfrak{g}_{\sigma,\sigma'}$, with $\mathfrak{g}_{\sigma,\sigma'} \subseteq \mathfrak{u}(d_{\sigma,\sigma'})$. However, employing Algorithm 1 we computed the dimension of the DLA generated by $G_{TFIM}$ and found that it grows only polynomially (quadratically) with $n$. Clearly, this implies that $\dim(\mathfrak{g}_{\sigma,\sigma'}) \leq n^2$ for all $\sigma, \sigma'$.
Note that the set $\{\mathbb{1}, \Pi\}$ constitutes a representation of $S_2$, the symmetric group of two elements, under which the open-boundary-condition TFIM generators are invariant. Instead, the TFIM generators with closed boundary conditions are invariant under a representation of $C_n$, the cyclic group of $n$ elements. As discussed in Appendix N, in that case the dimension of the DLA grows linearly instead of quadratically.
Similarly to the XXZ case, we can render the TFIM subspace controllable by introducing an extra generator. Consider the set $G_{LTFIM}$, obtained by adding the generator $\sum_{i=1}^{n} Z_i$ to $G_{TFIM}$, which leads to the PSA in Figure 6(a, right). The set $G_{LTFIM}$ can also be regarded as being constituted by the individual terms of the one-dimensional Longitudinal and Transverse Field Ising Model (LTFIM) Hamiltonian. In addition, the ansatz generated by $G_{LTFIM}$ is also a QAOA-type ansatz where an additional mixer has been added.
In the case of open boundary conditions, the $\sum_{i=1}^{n} Z_i$ term breaks the $Z_2$ symmetry, and thus the set $G_{LTFIM}$ only conserves the parity symmetry, $\mathfrak{g}_{LTFIM} = \bigoplus_{\sigma} \mathfrak{g}_{\sigma}$. Using Algorithm 1 we find that the DLA is full rank on both $\sigma = \pm 1$ parity subspaces, and hence its dimension grows exponentially with $n$. Similarly, in the closed boundary condition case, one also finds that the dimension of the DLA grows exponentially with $n$. This example shows how a simple modification to the ansatz (adding a layer generated by $\sum_{i=1}^{n} Z_i$) can greatly change the dimension of the DLA and, as discussed below, how such a small change can greatly affect the trainability of the cost function.
In Figure 6(b) we show results for numerically computing $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ for the cost function of Eq. (32), where $U(\theta)$ is the PSA generated by the set $G_{TFIM}$ of Eq. (27) with $L = 12n$ layers for open boundary conditions, and $L = 6n$ for closed boundary conditions. For each value of $n = 4, 6, \ldots, 18$ we computed the variance by picking 4400 random sets of parameters, while for $n = 20$ we picked 1000 random initializations. In all cases the partial derivative was taken with respect to $\theta_{L/2,2}$. We see from Figure 6(b) that the variance of the cost partial derivative vanishes polynomially with $n$ for both open and closed boundary conditions, and hence the system does not exhibit a barren plateau. Then, as shown in Figure 6(c), Conjecture 1 once again holds for both open and closed boundary conditions: $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ vanishes polynomially with $n$, while $\dim(\mathfrak{g}_{TFIM})$ grows polynomially with $n$.
Moreover, in Figure 6(b) we also depict results obtained by computing $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ for the LTFIM ansatz, using the same cost function of Eq. (32). Now, $U(\theta)$ is the PSA generated by the set $G_{LTFIM}$ with $L = 6n$ layers. Using the same number of samples as for the TFIM case, we find that the variance vanishes exponentially with $n$. It is worth noting that, as discussed before, and as shown in Figure 6(a), the difference between the TFIM and the LTFIM ansatz is given by an additional unitary in each layer (parametrized by a single angle). However, despite this simple difference, we find that the variances of the two cost functions scale differently: one cost exhibits a barren plateau while the other does not.

Erdös-Rényi model
Let us now consider the task of solving MAXCUT problems with a QAOA ansatz. Here, we recall that MAXCUT is specified by a graph $G = (V, E)$ of nodes $V$ and edges $E$, and that one seeks a partition of the nodes of $G$ into two sets that maximizes the number of edges connecting nodes in different sets. We consider the standard QAOA ansatz generated by the MAXCUT Hamiltonian and the mixer Hamiltonian $\sum_{i=1}^{n} X_i$, and we analyze the variance of the partial derivative of the corresponding cost, where we use $|E|$ (the number of edges in the graph) to normalize the cost function. For each value of $n = 2, 3, \ldots, 9$ we generated 90 graphs according to the Erdös-Rényi model [91]. That is, each graph $G$ was chosen uniformly at random from the set of all graphs of $n$ nodes. Then, for each graph we sampled 3000 random initializations with $L = 12n$ layers and we took the partial derivative with respect to the angle in the $L/2$-th layer associated with the mixer Hamiltonian $\sum_{i=1}^{n} X_i$.
In Fig. 7(a) we show results for $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ versus the number of qubits. Here we can see that, as expected, even for fixed $n$ different graphs have different values of the variance. However, by computing the median variance for each system size we found that the median decays exponentially with the system size. While this result does not preclude the possibility of generating graphs that do not have a barren plateau, it suggests that uniform sampling of graphs from the Erdös-Rényi model will lead to the landscape for a typical graph having a barren plateau. Then, as shown in Fig. 7(b), we compute the dimension of the DLA for each graph, and find that Conjecture 1 is confirmed, as the relation between $\mathrm{Var}_\theta[\partial_\mu C(\theta)]$ and $\dim(\mathfrak{g})$ is linear in a log-log scale.
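The graph sampling and problem Hamiltonian described above can be sketched as follows. Choosing a graph uniformly at random from all graphs on $n$ nodes is equivalent to including each possible edge independently with probability $1/2$. The sign and offset convention $H = -\frac{1}{2}\sum_{(i,j) \in E} (1 - Z_i Z_j)$, whose eigenvalue on a bitstring is minus the size of the corresponding cut, is an assumption made for this example, as are the function names.

```python
import itertools
import numpy as np


def sample_erdos_renyi(n, rng, p=0.5):
    """Sample a graph on n nodes, keeping each possible edge with probability p."""
    return [(i, j) for i, j in itertools.combinations(range(n), 2) if rng.random() < p]


def maxcut_diagonal(n, edges):
    """Diagonal of H = -1/2 * sum_{(i,j) in E} (1 - Z_i Z_j) in the computational basis."""
    bits = (np.arange(2 ** n)[:, None] >> np.arange(n)[None, :]) & 1  # shape (2^n, n)
    z = 1 - 2 * bits  # Z eigenvalues: bit 0 -> +1, bit 1 -> -1
    diag = np.zeros(2 ** n)
    for i, j in edges:
        diag += -0.5 * (1 - z[:, i] * z[:, j])
    return diag


def brute_force_maxcut(n, edges):
    """Largest number of edges cut by any bipartition (exhaustive search)."""
    best = 0
    for assignment in itertools.product([0, 1], repeat=n):
        best = max(best, sum(assignment[i] != assignment[j] for i, j in edges))
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 6
    edges = sample_erdos_renyi(n, rng)
    diag = maxcut_diagonal(n, edges)
    print("edges:", edges)
    print("ground energy:", diag.min(), " brute-force MAXCUT:", brute_force_maxcut(n, edges))
    if edges:
        print("normalized ground energy:", diag.min() / len(edges))
```

Dividing the spectrum by $|E|$, as in the last line, keeps the normalized cost in the interval $[-1, 0]$ under this convention, which mirrors the normalization described in the text.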

DISCUSSION
In this work, we have explored a fundamental connection between VQAs and the theory of QOC with the purpose of analyzing the existence of barren plateaus in a family of periodic-structured ansatzes which contain, as special cases, the QAOA and the HVA, among other widely used ansatzes in variational quantum algorithms and quantum machine learning. Our results show that one can diagnose the presence of barren plateaus in the cost function landscape by analyzing the degree of controllability of the system, characterized by the dimension of the dynamical Lie algebra (DLA) obtained from the set of generators of the ansatz.
Our main results are the following. First, we show that if the DLA is full rank, i.e., if the system is controllable, then the cost function exhibits a barren plateau. This follows from the fact that, as we show, controllable systems converge to 2-designs. Here, we also derive an expression relating the depth required for a given ansatz to become an $\varepsilon$-approximate 2-design to the expressibility of one of its layers.
We then consider systems with symmetries, where the Hilbert space partitions into invariant subspaces associated with the different eigenspaces. In this context, we show that when the system is subspace controllable, the existence of barren plateaus crucially depends on the input state to the VQA. For example, the cost might be trainable for certain input states, but might exhibit a barren plateau for others. Specifically, our results connect the scaling of the variance of cost function partial derivatives to that of the dimension of the subspace on which the input state has support. Instead, when the system is subspace uncontrollable, we show that one can still upper bound the variance of the cost function partial derivative using the expressibility of the ansatz in the relevant subspace. This indicates that larger subspace expressibilities lead to smaller gradients.
Finally, we present a conjecture stating that one can directly study the scaling of the cost function partial derivative variance by computing the dimension of the subspace DLA to which the input state belongs. This conjecture implies that ansatzes with polynomially growing DLAs can exhibit polynomially vanishing gradients, while ansatzes with exponentially growing DLAs should exhibit exponentially vanishing gradients.
In addition, we performed numerical simulations of VQAs with the hardware efficient ansatz, QAOA, and HVA, for problems such as preparing ground states of the XXZ model and of the Ising model, or solving MAXCUT problems on graphs generated from the Erdös-Rényi model. The numerical results match our theoretical predictions and hence verify our analytical results for controllable and subspace controllable systems. Moreover, in all cases considered we verify that our conjecture holds, further providing evidence that the scaling of the cost function partial derivative variance may be directly linked to the dimension of the subspace DLA.

Implications of our results to ansatz design
The broader implication of our results is that the framework introduced here can be used to design ansatzes, as one could potentially predict if an ansatz, or a modification to the ansatz, will lead to the cost function exhibiting a barren plateau. Hence, our work can be considered as paving the way towards trainability-aware ansatz design.
For instance, we have shown how a simple change in the ansatz structure, such as adding an additional parametrized unitary per layer, can greatly affect the gradient scaling of the cost by changing the controllability of the system. This means that one should be careful when employing schemes such as the Adaptive QAOA or quantum optimal control ansatz as the addition of an operator H to the set G of generators of the ansatz can lead to barren plateaus if the system becomes controllable (or subspace controllable in an exponentially growing subspace). In particular, if H does not commute with the elements in G, one should analyze how the DLA changes by such addition before proceeding to change the ansatz.
Here, we crucially remark that one of the main advantages of the aforementioned theoretical analysis is that it can be performed classically (either analytically or numerically) as it just requires the evaluation of the DLA. Hence, our methods save precious quantum resources as one does not need to run the quantum algorithm, or even access a quantum computer, to test the trainability of the ansatzes.
Finally, we remark that if our conjecture holds more generally, then one can use this additional tool to directly study the trainability of an ansatz by estimating the scaling of the variance of the cost function partial derivative through the scaling of the dimension of the DLA. For example, such results can be used to show that certain ansatzes might not have exponentially vanishing gradients. For instance, when considering a QAOA ansatz for solving MAXCUT on 2-regular graphs, a straightforward computation of the DLA reveals that its dimension scales only linearly in $n$. Hence, we expect (and we find) no barren plateaus. Similarly, one can use our conjecture to analyze ansatz proposals in the literature. For example, Ref. [71] recently proposed an ansatz generated by the set of products of up to $K$-body Pauli $X$ operators. Since the ansatz is abelian, the dimension of the DLA is just the number of generators. Thus, we expect that when using a polynomial number of layers the ansatz should be free of barren plateaus.

Outlook
In the present work, we have established a novel framework for diagnosing the presence of barren plateaus in VQAs. While here we mainly focus on the trainability of ansatzes for near-term quantum computing, our results should also be considered useful in the broader context of QOC. For instance, while the barren plateau phenomenon has recently been widely studied in VQAs, it is clear from our manuscript that barren plateaus can (and will) also arise in QOC schemes (see also [92]). Hence, we leave for future work the study of how some of the results derived for the trainability of VQAs can be used to analyze the trainability of QOC control pulses.
In addition, we note that since our work studies the trainability of certain families of ansatzes, we also leave for future work to show how the tools here presented can be employed to study more general ansatzes (e.g., ansatzes for quantum machine learning applications) which do not necessarily have a periodic structure. In addition, we leave as an open question how the results in our conjecture can be generalized and formally proved.

ACKNOWLEDGMENTS
We thank Marco Farinati and Robert Zeier for useful discussions on Lie algebras, and we also thank Zoe Holmes and Pablo Poggi for helpful discussions.

Appendices
In the following appendices we present additional information and derive proofs for the main results in the manuscript. In Appendix A we introduce preliminary notation and definitions that will be relevant for the rest of the appendices. Then, in Appendix B we provide additional details on different widely known ansatzes that are Periodic Structure Ansatzes (PSAs). In Appendix C we provide a brief review of barren plateaus. Appendices F-M contain the proofs of our main Theorems, Corollaries and Propositions. Finally, in Appendix N we discuss the symmetries in the XXZ and Ising spin models considered in the main text, and in Appendix O we provide additional details on the initial state used for the numerical simulations of the XXZ model.

A Preliminaries
Let us first review some definitions and prior results that will be relevant for the rest of the appendices.
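Symbolic integration. Let us present formulas that allow for symbolic integration with respect to the Haar measure on the unitary group [93]. For any $U \in \mathrm{U}(d)$ the following expressions are valid for the first two moments:
$$\int d\mu(U)\, u_{\boldsymbol{i}\boldsymbol{j}}\, u^{*}_{\boldsymbol{i}'\boldsymbol{j}'} = \frac{\delta_{\boldsymbol{i}\boldsymbol{i}'}\delta_{\boldsymbol{j}\boldsymbol{j}'}}{d}, \tag{36}$$
$$\int d\mu(U)\, u_{\boldsymbol{i}_1\boldsymbol{j}_1} u_{\boldsymbol{i}_2\boldsymbol{j}_2} u^{*}_{\boldsymbol{i}_1'\boldsymbol{j}_1'} u^{*}_{\boldsymbol{i}_2'\boldsymbol{j}_2'} = \frac{\delta_{\boldsymbol{i}_1\boldsymbol{i}_1'}\delta_{\boldsymbol{i}_2\boldsymbol{i}_2'}\delta_{\boldsymbol{j}_1\boldsymbol{j}_1'}\delta_{\boldsymbol{j}_2\boldsymbol{j}_2'} + \delta_{\boldsymbol{i}_1\boldsymbol{i}_2'}\delta_{\boldsymbol{i}_2\boldsymbol{i}_1'}\delta_{\boldsymbol{j}_1\boldsymbol{j}_2'}\delta_{\boldsymbol{j}_2\boldsymbol{j}_1'}}{d^2 - 1} - \frac{\delta_{\boldsymbol{i}_1\boldsymbol{i}_1'}\delta_{\boldsymbol{i}_2\boldsymbol{i}_2'}\delta_{\boldsymbol{j}_1\boldsymbol{j}_2'}\delta_{\boldsymbol{j}_2\boldsymbol{j}_1'} + \delta_{\boldsymbol{i}_1\boldsymbol{i}_2'}\delta_{\boldsymbol{i}_2\boldsymbol{i}_1'}\delta_{\boldsymbol{j}_1\boldsymbol{j}_1'}\delta_{\boldsymbol{j}_2\boldsymbol{j}_2'}}{d(d^2 - 1)}, \tag{37}$$
where $u_{\boldsymbol{i}\boldsymbol{j}}$ are the matrix elements of $U$. Assuming $d = 2^n$, we use the notation $\boldsymbol{i} = (i_1, \ldots, i_n)$ to denote a bitstring of length $n$ such that $i_1, i_2, \ldots, i_n \in \{0, 1\}$.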
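Useful identities. We introduce the following identities, which can be derived using Eq. (37) (see [49] for a review):
$$\int d\mu(U)\, \mathrm{Tr}[U A U^\dagger B] = \frac{\mathrm{Tr}[A]\mathrm{Tr}[B]}{d}, \tag{38}$$
$$\int d\mu(U)\, \mathrm{Tr}[U A U^\dagger B U C U^\dagger D] = \frac{\mathrm{Tr}[A]\mathrm{Tr}[C]\mathrm{Tr}[BD] + \mathrm{Tr}[AC]\mathrm{Tr}[B]\mathrm{Tr}[D]}{d^2 - 1} - \frac{\mathrm{Tr}[AC]\mathrm{Tr}[BD] + \mathrm{Tr}[A]\mathrm{Tr}[B]\mathrm{Tr}[C]\mathrm{Tr}[D]}{d(d^2 - 1)}, \tag{39}$$
$$\int d\mu(U)\, \mathrm{Tr}[U A U^\dagger B]\, \mathrm{Tr}[U C U^\dagger D] = \frac{\mathrm{Tr}[A]\mathrm{Tr}[B]\mathrm{Tr}[C]\mathrm{Tr}[D] + \mathrm{Tr}[AC]\mathrm{Tr}[BD]}{d^2 - 1} - \frac{\mathrm{Tr}[AC]\mathrm{Tr}[B]\mathrm{Tr}[D] + \mathrm{Tr}[A]\mathrm{Tr}[C]\mathrm{Tr}[BD]}{d(d^2 - 1)}, \tag{40}$$
where $A$, $B$, $C$, and $D$ are linear operators on a $d$-dimensional Hilbert space.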
Integration over parameter space. In the next sections we will derive analytical expressions for the variance of the partial derivatives of cost functions $C(\theta)$ over parametrized circuits $U(\theta)$. In such derivations, we will have to deal with integration over the parameter space. A key step in the following analysis will be to relate the integration over parameters to an integration over the ensemble of unitaries arising from different parameter choices. In this sense, we recall that given a set of parameters $\{\theta\}$ one can obtain an associated set of unitaries $\{U(\theta)\}$ generated by the quantum circuit. Then, consider the integration of some function $f(U(\theta))$ over $\theta$. Defining $\mathcal{U}$ as the distribution of unitaries generated by $U(\theta)$, the following identity holds:
$$\int_{\theta} d\theta\, f(U(\theta)) = \int_{\mathcal{U}} dU\, f(U). \tag{41}$$
In addition, if the distribution of unitaries $\mathcal{U}$ can be shown to converge to a 2-design, the integration over the distribution can be further converted into an integration over the Haar measure, allowing the use of identities (38), (39), and (40).
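As a sanity check of this kind of Haar integration, the following sketch estimates $\int d\mu(U)\, \mathrm{Tr}[U A U^\dagger B]$ by Monte Carlo sampling and compares it with the standard first-moment value $\mathrm{Tr}[A]\mathrm{Tr}[B]/d$. Haar-random unitaries are drawn with the usual QR-based recipe (QR decomposition of a complex Gaussian matrix followed by a phase correction). The function names, sample counts, and test operators are illustrative assumptions.

```python
import numpy as np


def haar_unitary(d, rng):
    """Draw a d x d unitary from the Haar measure via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale column k by the phase of r[k, k]


def haar_average_trace(A, B, samples, rng):
    """Monte Carlo estimate of the Haar average of Tr[U A U^dagger B] (Hermitian A, B)."""
    d = A.shape[0]
    total = 0.0
    for _ in range(samples):
        U = haar_unitary(d, rng)
        total += np.trace(U @ A @ U.conj().T @ B).real
    return total / samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    A = np.diag([1.0, 0.0, 0.0, 0.0]).astype(complex)
    B = np.diag([1.0, 2.0, 3.0, 4.0]).astype(complex)
    estimate = haar_average_trace(A, B, samples=5000, rng=rng)
    exact = (np.trace(A) * np.trace(B) / d).real
    print("Monte Carlo:", estimate, " Tr[A]Tr[B]/d:", exact)
```

The estimate fluctuates around the exact value with a statistical error that shrinks as the inverse square root of the number of samples, so agreement is only expected up to that sampling error.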

B Ansatzes
In general, ansatzes for parametrized quantum circuits can be divided into two primary categories: problem-agnostic and problem-inspired ansatzes. In a problem-agnostic ansatz one does not have any information about the problem, or its solution, that one can encode in the ansatz. Such is the case, for instance, in the task of estimating the spectrum of an unknown density operator [29]. On the other hand, problem-inspired ansatzes employ prior information about a given problem or task. For example, for the problem of estimating the ground state energy of a particular Hamiltonian, one can design ansatzes that preserve the symmetries of the problem Hamiltonian [94].
Here we remark that several well-known problem-agnostic and problem-inspired ansatzes in the literature are PSAs of the form in Eq. (2). In particular, our framework allows us to study the hardware-efficient ansatz (HEA), as well as the other ansatzes reviewed below.
Hardware efficient ansatz. The Hardware Efficient Ansatz (HEA) is a problem-agnostic ansatz, which relies on gates native to a given quantum hardware. In particular, an ansatz can be designed based on a gate alphabet, which depends on the architecture and the connectivity of the hardware. This procedure helps avoid the overhead associated with transpiling an arbitrary unitary into a sequence of native gates. For example, one can consider native gates such as single-qubit rotations $e^{-i\theta Z/2} e^{-i\gamma Y/2}$ and CNOTs, where $Y$ and $Z$ denote Pauli matrices, and a CNOT between the control qubit $i$ and the target qubit $j$ is given by $e^{-i\frac{\pi}{2}\left(|1\rangle\langle 1|_i \otimes (X_j - \mathbb{1}_j)\right)}$. Then an ansatz of the form in Eq. (2) can be generated as follows: one layer consists of parametrized single-qubit rotations on each qubit, followed by unparametrized CNOTs acting on neighboring qubits.
The HEA has been employed to prepare the ground state of molecules [80], to study Hamiltonians that are similar to the device's interactions [95], and in several other variational quantum algorithms [21, 96,97,49]. The HEA is also suitable for near-term implementations of VQAs due to its low-depth structure, which results in a lower-noise circuit in comparison to other ansatzes [97,39].
Quantum alternating operator ansatz. The Quantum Alternating Operator Ansatz (QAOA) is a problem-inspired ansatz that simulates a discretized adiabatic transformation [19]. Consider the goal of preparing the ground state of a problem Hamiltonian $H_P$. Let $H_M$ denote a mixer Hamiltonian, with corresponding ground state $|\psi\rangle$. Then the QAOA maps $|\psi\rangle$ to the ground state of $H_P$ by sequentially applying the problem unitary $e^{-i\gamma_l H_P}$, followed by the mixer unitary $e^{-i\beta_l H_M}$. Let $\theta = (\gamma, \beta)$. Then the QAOA is given by $U(\theta) = \prod_{l=1}^{L} e^{-i\beta_l H_M} e^{-i\gamma_l H_P}$, which follows the general form of the ansatz defined in Eq. (2). Here, $L$ (often denoted $p$ in the QAOA literature) is the order of the discretized adiabatic transformation, and it determines the precision of the solution [19]. The QAOA was originally introduced for finding approximate solutions to combinatorial optimization problems [19]. The QAOA has been generalized as a standalone ansatz [57] and its performance has been investigated in several tasks, including the task of learning a unitary [98]. Moreover, the QAOA has been shown to be computationally universal [74,75], and the choice of optimal mixer is still an open debate [99,100].
Adaptive QAOA. As a consequence of the adiabatic theorem, the QAOA should lead to good solutions for high values of $p$ [19]. However, for small values of $p$, the QAOA is an ad hoc ansatz, which is not necessarily an optimal strategy to approximate the ground state of the problem Hamiltonian. A way to improve such an ad hoc ansatz is to employ a variable mixer instead of a fixed mixer at each layer [57]. Let $\{G_k\}_{k=1}^{q}$ denote a set of mixer Hamiltonians. Then an adaptive QAOA can be defined as $U(\theta) = \prod_{l=1}^{L} e^{-i\beta_l G_l} e^{-i\gamma_l H_P}$, where each $G_l$ can be adaptively picked from $\{G_k\}_{k=1}^{q}$. One particular adaptive approach was introduced in [60], where at each layer $G_l$ is picked based on the largest gradient of the cost function among all $\{G_k\}$. Moreover, [60] observed that adaptive entangling mixers can improve performance and reduce the number of parameters and CNOTs needed to achieve a desired accuracy in comparison to the non-adaptive QAOA. We note that the adaptive QAOA follows the form in Eq. (2) if adaptive mixers are learned up to a fixed layer and then the whole structure is repeated. That is, $U(\theta) = \prod_{m=1}^{p} \prod_{l=1}^{r} e^{-i\beta_{l,m} G_{l,m}} e^{-i\gamma_{l,m} H_P}$, where the $G_{l,1}$ are learned adaptively for each $l \in \{1, \ldots, r\}$, and $G_{l,m} = G_{l,1}$ for all $m \in \{1, \ldots, p\}$.
Hamiltonian variational ansatz. The Hamiltonian Variational Ansatz (HVA) is another problem-inspired ansatz, which implements time evolution under the problem Hamiltonian via Trotterization [58]. It can be understood as a generalization of the QAOA to more than two non-commuting Hamiltonians. Let $H_P = \sum_l H_l$ denote a problem Hamiltonian, such that $[H_l, H_{l'}] \neq 0$. Then the HVA of order $p$ is given by $U(\theta) = \prod_{k=1}^{p} \prod_{l} e^{-i\theta_{l,k} H_l}$, which is of the form in Eq. (2). The HVA has been investigated in the study of one- and two-dimensional quantum many-body models [101,102].
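The layered product structure shared by the QAOA and the HVA can be sketched numerically for a few qubits. The following minimal example builds $U(\theta)$ as a product over layers of $e^{-i\theta_{l,k} H_k}$, with the first generator of each layer applied first; passing the pair $(H_P, H_M)$ reproduces the QAOA layer $e^{-i\beta_l H_M} e^{-i\gamma_l H_P}$. The two-qubit generators, the function names, and the dense eigendecomposition used for the matrix exponential are assumptions chosen for illustration.

```python
import numpy as np


def expm_hermitian(h, theta):
    """Return exp(-i * theta * h) for a Hermitian matrix h via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T


def psa_unitary(generators, thetas):
    """Layered ansatz: thetas has shape (layers, len(generators)).

    Within each layer, generators[0] is applied first, then generators[1], and so on.
    """
    d = generators[0].shape[0]
    U = np.eye(d, dtype=complex)
    for layer in thetas:
        for h, theta in zip(generators, layer):
            U = expm_hermitian(h, theta) @ U
    return U


if __name__ == "__main__":
    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    # Two-qubit QAOA-type example: H_P = Z Z, H_M = X_1 + X_2 (illustrative choice).
    H_P = np.kron(Z, Z)
    H_M = np.kron(X, I2) + np.kron(I2, X)

    rng = np.random.default_rng(0)
    thetas = rng.uniform(0, 2 * np.pi, size=(3, 2))  # columns: (gamma_l, beta_l)
    U = psa_unitary([H_P, H_M], thetas)
    print("unitary:", np.allclose(U.conj().T @ U, np.eye(4)))
```

Supplying more than two generators per layer gives an HVA-type ansatz in the same way; only the list of generators and the number of angles per layer change.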
A simple example where the HVA can be employed is the XXZ model, which is used to study magnetism. For a one-dimensional chain, the Hamiltonian of the XXZ model is the one given in Eq. (15) of the main text. Then, one way to parametrize an HVA of order $p$ is $U(\theta) = \prod_{l=1}^{p} e^{-i\beta_l H_X} e^{-i\gamma_l H_Y} e^{-i\delta_l H_Z}$ for $g = 1$, where $\theta = (\beta, \gamma, \delta)$. Other parametrizations of the HVA are also possible.
Quantum optimal control ansatz. The HVA discussed above helps constrain the variational search to a relevant symmetric subspace of the total Hilbert space. In general, this approach might require high values of $p$ to achieve a desired accuracy in approximating the ground state of many-body Hamiltonians. One way to avoid high values of $p$ is to introduce drive terms, in addition to the problem Hamiltonian, which break the symmetry of the problem Hamiltonian $H_P$. This approach falls under the framework of quantum optimal control [61]. In particular, let $\{H_k\}$ denote a set of drive terms. Then the updated time-dependent Hamiltonian is given by $H(t) = H_P + \sum_k c_k(t) H_k$, where the drive terms $H_k$ are picked such that $[H_P, H_k] \neq 0$ for all $k$. Here, $c_k(t)$ are time-dependent control parameters. Let $H_P = \sum_q H_q$ and let $\theta = (\gamma, \beta)$. Then the Quantum Optimal Control Ansatz (QOCA) of order $L$ is given by $U(\theta) = \prod_{l=1}^{L} \prod_{q} e^{-i\beta_{l,q} H_q} \prod_{k} e^{-i\gamma_{l,k} H_k}$, where $\gamma_{l,k}$ denote the discrete drive amplitudes of the control parameter $c_k(t)$.
In general, finding optimal drive Hamiltonian terms $\{H_k\}$ is a computationally challenging problem. One can employ an adaptive approach to pick drive Hamiltonians from a fixed set of Hamiltonians, similar to the adaptive QAOA [60]. In [61], the QOCA was shown to outperform other ansatzes, including the HEA and the HVA, for the task of preparing the ground state of the half-filled Fermi-Hubbard model.

C Barren Plateaus
As mentioned in the main text, the barren plateau phenomenon has been recognized as one of the most important challenges to overcome to guarantee the success of VQAs. When a cost function exhibits a barren plateau, its gradients are exponentially suppressed (on average) across the optimization landscape. Consider the following mathematical definition.

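Definition 1 (Barren Plateau). A cost function $C(\theta)$ as in Eq. (1) is said to have a barren plateau when training $\theta_\mu \equiv \theta_{pq} \in \theta$, if the cost function partial derivative $\partial C(\theta)/\partial\theta_\mu \equiv \partial_\mu C(\theta)$ is such that
$$\mathrm{Var}_\theta[\partial_\mu C(\theta)] \leq F(n), \quad \text{with} \quad F(n) \in O\!\left(\frac{1}{b^n}\right), \tag{43}$$
for some $b > 1$. Here the variance is taken with respect to the set of parameters $\theta$.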
Equation (43) implies that one requires a precision (i.e., a number of shots) that grows exponentially with $n$ to navigate through the flat landscape and determine a cost-minimizing direction when optimizing the cost function. Moreover, as shown in [40,42], barren plateaus affect both gradient-based and gradient-free methods, meaning that simply changing the optimization strategy does not mitigate or solve the barren plateau issue. Since the goal of VQAs is to have computational complexities that scale polynomially with $n$, such exponential scaling in the required precision destroys the hope of achieving a computational advantage with the VQA over classical methods (which usually scale exponentially with $n$).
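The link between a vanishing variance and the precision requirement can be made explicit with Chebyshev's inequality. The following is a sketch under two assumptions that are not restated in this appendix: that the partial derivative has zero mean over the parameters, and that $F(n) \in O(b^{-n})$ denotes the exponentially vanishing bound on the variance. For any threshold $c > 0$,

```latex
\begin{align}
\Pr\big(|\partial_\mu C(\theta)| \geq c\big)
  \;\leq\; \frac{\mathrm{Var}_\theta[\partial_\mu C(\theta)]}{c^2}
  \;\leq\; \frac{F(n)}{c^2},
  \qquad F(n) \in O\!\left(b^{-n}\right).
\end{align}
```

Under these assumptions, the probability of drawing parameters whose gradient component exceeds any fixed $c$ decays exponentially with $n$, so resolving a descent direction requires an exponentially small $c$.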
The first result for barren plateaus was obtained in [37], where it was shown that deep unstructured ansatzes that form 2-designs have barren plateaus. This phenomenon was then generalized to layered Hardware Efficient Ansatzes in [49], where it was proven that the locality of the cost function is connected to the existence of barren plateaus. That is, global cost functions (i.e., cost functions where $O$ in (1) acts non-trivially on all qubits) exhibit barren plateaus even for shallow depths, whereas local cost functions (i.e., cost functions where $O$ in (1) acts non-trivially on a small number of neighboring qubits) do not exhibit barren plateaus for short-depth ansatzes.
The barren plateau phenomenon has also been studied in the context of quantum neural networks [41,46,103] and in the problem of learning scramblers [43]. In addition, it has been shown that circuits that generate large amounts of entanglement [41,45,44] are prone to barren plateaus. To circumvent or mitigate the effect of barren plateaus, several strategies have been developed [52,51,54,53,46,104,28,29,105,106].

D Quantum Optimal Control
Here we recall for convenience that in a standard QOC setting one is interested in controlling the dynamical evolution of a quantum state $|\psi\rangle$ in a $d$-dimensional Hilbert space $\mathcal{H} = \mathbb{C}^d$ (where $d = 2^n$) [63]. Here, the system dynamics are determined by a Hamiltonian that is tunable through some time-dependent control fields $\{f_k(t)\}$. At its core, the problem in QOC is to determine how to shape the control fields such that the system evolves in a desired manner. A specific set of optimal fields is usually constructed by imposing a parametrization on the functions and applying standard numerical optimization routines. The success of such an optimization process depends on the structure of the underlying optimization spaces, the so-called quantum control landscapes [107,108,109].
For instance, a common choice is to consider piecewise-constant fields, where the protocol duration $T$ is divided into $L$ intervals $\Delta t_j = t_j - t_{j-1}$ (such that $T = \sum_{j=1}^{L} \Delta t_j$), in each of which the fields take a constant value, e.g., $f_k(t) = f_{k,j}$ if $t_{j-1} < t < t_j$. In this case, the propagator factorizes into a product of individual sub-propagators, each of which is generated by a constant Hamiltonian and thus takes the simple matrix exponential form of Eq. (45). Trotterizing Eq. (45), one finds Eq. (46), in which $f_{l,0} = 1$ for all $l$. Note that (46) is a PSA.
We remark that in the limit $\Delta t_l \to 0$, Eq. (46) becomes exact. In the general case, the exact and Trotterized ansatzes only approximately coincide, and a nontrivial correction of Eq. (47) is needed to make the correspondence exact. In any case, Eq. (47) allows us to henceforth use the notation $U(\{f_k(t)\}) = U(\theta)$ and to denote by $\theta$ the trainable parameters in a QOC setting.
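The statement that the Trotterized form becomes exact as $\Delta t_l \to 0$ can be checked numerically on a toy example. The sketch below assumes a single-qubit Hamiltonian $H_0 + f(t) H_1$ with $H_0 = Z$, $H_1 = X$ and a hypothetical control field $f(t) = \cos t$ sampled at the start of each interval; none of these choices come from the text, and the ordering of the two exponentials in each Trotter step is a convention chosen here.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def exp_hermitian(h, t):
    """Return exp(-i * t * h) for Hermitian h via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(-1j * t * evals)) @ evecs.conj().T


def propagators(h0, h1, fields, dt):
    """Piecewise-constant propagator and its first-order Trotterized version."""
    dim = h0.shape[0]
    exact = np.eye(dim, dtype=complex)
    trotter = np.eye(dim, dtype=complex)
    for f in fields:
        # One sub-propagator per interval, generated by a constant Hamiltonian.
        exact = exp_hermitian(h0 + f * h1, dt) @ exact
        # Same interval, with the drift and control exponentials split.
        trotter = exp_hermitian(h1, f * dt) @ exp_hermitian(h0, dt) @ trotter
    return exact, trotter


def trotter_error(total_time, n_steps):
    """Spectral-norm distance between the two propagators."""
    dt = total_time / n_steps
    fields = np.cos(dt * np.arange(n_steps))  # hypothetical amplitudes f(t_j)
    exact, trotter = propagators(Z, X, fields, dt)
    return np.linalg.norm(exact - trotter, 2)


coarse = trotter_error(0.5, 1)   # a single interval of length 0.5
fine = trotter_error(0.5, 50)    # fifty intervals of length 0.01
```

In this toy setting the distance between the two propagators shrinks as the intervals get shorter, in line with the remark above.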

E Dynamical Lie Algebra computation
First, let us recall that a Lie algebra is a vector space $\mathfrak{g}$ together with an operation $[\cdot,\cdot] : \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$, called the Lie bracket, that is bilinear, alternating (the output is zero if the inputs are linearly dependent) and satisfies the Jacobi identity $[x,[y,z]] + [y,[z,x]] + [z,[x,y]] = 0$. In our quantum context, Lie algebras manifest as matrix Lie algebras. For example, the space of quantum observables $\mathfrak{u}(d)$ is a subspace of the vector space of $d \times d$ complex matrices that is closed under the matrix commutator (playing the role of the Lie bracket). More generally, we will encounter Lie algebras that form Lie subalgebras of $\mathfrak{u}(d)$. A subalgebra is a subspace of an algebra that is itself closed under the Lie bracket. For example, in an $n$-qubit quantum system, the subspace $\Omega = \mathrm{span}\{X, Y, Z\}$, where $X = \sum_i X_i$, $Y = \sum_i Y_i$ and $Z = \sum_i Z_i$, is closed under commutation and thus constitutes a 3-dimensional subalgebra of the $4^n$-dimensional operator space.

Algorithm 1: Basis for the Dynamical Lie Algebra (DLA). Input: Set of generators $\mathcal{G}$ of the ansatz. Output: Basis $S$ of the algebra $\mathfrak{g}$ obtained from $\mathcal{G}$.

Specifically, we will be interested in the so-called dynamical Lie algebra (DLA). This Lie algebra is the subspace of $\mathfrak{u}(d)$ generated by the Lie closure of the generators of a parametrized quantum circuit (see Definition 3). In general, computing the DLA is a highly nontrivial task. One possible approach is direct construction, i.e., start with a set of generators defining a subspace of operator space (but not necessarily a subalgebra) and commute them, finding new elements until one obtains a basis of the DLA (see Algorithm 1). The complexity of such an approach is, in general, $O(\mathrm{poly}(d))$ with $d = 2^n$, that is, exponential in the number of qubits $n$. For example, a naive approach (representing operators as dense $d \times d$ matrices) yields roughly $O(d^2 d^6)$, since, in general, one has to check linear independence $O(d^2)$ times, for example by implementing LU or QR decompositions (whose cost is $O(N^3)$ for $N \times N$ matrices) with $N = d^2$, the number of entries of each $d \times d$ operator. Although such complexity can be reduced, for example by using more intelligent representations of operators, in essence direct construction attempts to build a basis for a subalgebra of $\mathfrak{su}(d)$, i.e., a basis with potentially as many as $d^2$ elements, and therefore it cannot generally avoid exponential scaling.
Despite being exponential, direct construction of the DLA (either numerically or analytically) on small system sizes can be a remarkably useful tool to later extrapolate or prove (e.g., by induction) the scaling of the DLA beyond those small 'affordable' system sizes. Moreover, note that in many cases one may only be interested in checking whether the dimension of the DLA is above a certain threshold, a task with complexity linear in the size of that threshold. Of course, this neglects the complexity of computing new DLA elements, which, as mentioned above, can be substantially reduced by choosing efficient representations for those operators.
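A minimal sketch of the direct construction described above is given below. It is not the authors' Algorithm 1, whose body is not reproduced here; it is a naive dense-matrix variant written for illustration. The function name `dla_basis`, the tolerance, and the optional `max_dim` early exit (for the threshold check mentioned above) are assumptions of this example, and the two-qubit generator set is a hypothetical hardware-efficient-style choice (single-qubit $X$, $Y$ plus a $ZZ$ entangler) that may differ from the exact set used in Proposition 2.

```python
import numpy as np


def _as_real_vector(op):
    """Flatten a complex matrix into a real vector (real parts, then imaginary)."""
    flat = op.reshape(-1)
    return np.concatenate([flat.real, flat.imag])


def dla_basis(generators, tol=1e-9, max_dim=None):
    """Return linearly independent anti-Hermitian operators spanning the Lie closure."""
    ops = []   # normalized basis operators found so far
    vecs = []  # orthonormal real vectors used for the independence check

    def try_add(op):
        norm = np.linalg.norm(op)
        if norm < tol:
            return False  # vanishing commutator
        op = op / norm
        v = _as_real_vector(op)
        for b in vecs:  # modified Gram-Schmidt against the current basis
            v = v - np.dot(b, v) * b
        residual = np.linalg.norm(v)
        if residual < tol:
            return False  # linearly dependent on the current basis
        vecs.append(v / residual)
        ops.append(op)
        return True

    for h in generators:
        try_add(1j * np.asarray(h, dtype=complex))

    i = 0
    while i < len(ops):  # newly appended elements are visited later
        for j in range(i):
            if max_dim is not None and len(ops) >= max_dim:
                return ops  # threshold reached, stop early
            try_add(ops[i] @ ops[j] - ops[j] @ ops[i])
        i += 1
    return ops


X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Hypothetical two-qubit generator set: local X, Y rotations and a ZZ entangler.
hea_two_qubits = [np.kron(X, I2), np.kron(I2, X),
                  np.kron(Y, I2), np.kron(I2, Y),
                  np.kron(Z, Z)]
dim_hea = len(dla_basis(hea_two_qubits))
dim_single = len(dla_basis([X, Y]))
```

Because every operator is stored as a dense matrix, this sketch inherits the exponential cost discussed above and is only meant for small system sizes.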

F Proof of Theorem 1: Convergence of controllable systems to 2-designs
In the following we provide a proof for Theorem 1, which we recall for convenience.

Theorem 1. Consider a controllable system. Then, the PSA $U(\theta)$ will form an $\varepsilon$-approximate 2-design, i.e., $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty = \varepsilon$ with $\varepsilon > 0$, when the number of layers $L$ in the circuit is
$$L = \frac{\log(1/\varepsilon)}{\log\big(1/\|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty\big)}.$$
Here $\|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty$ denotes the expressibility of a single layer $U_1(\theta_1)$ of the ansatz.

Proof. To study the convergence of the PSA $U(\theta)$ to an approximate 2-design we employ the tools of harmonic analysis. The following arguments are based on Ref. [110]. This is similar to using Fourier analysis to study the convergence of a probability distribution on the real line to the normal distribution, which is also known as the central limit theorem.
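The second moment operator corresponding to the distribution $\mathcal{U}$ over unitaries $U$ can be defined as $M^{(2)}_{\mathcal{U}} = \int_{\mathcal{U}} d\mu(U)\, U \otimes U \otimes U^* \otimes U^*$. An important property of the moment operator $M^{(2)}_{U_H}$ of the Haar distribution is that it is a projector onto a two-dimensional subspace, that is, $M^{(2)}_{U_H}$ has eigenvalues 0 or 1. We show this by noting that the following two properties hold: $\big(M^{(2)}_{U_H}\big)^2 = M^{(2)}_{U_H}$ and $\mathrm{Tr}\big[M^{(2)}_{U_H}\big] = 2$. The first follows from the left invariance of the Haar measure [used in Eq. (50)], and the second from using the Weingarten function to explicitly evaluate the integral [in (52)]. The first property puts in evidence that $M^{(2)}_{U_H}$ is a projector, and the second property shows that the eigenspace with eigenvalue 1 is a two-dimensional subspace.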
Let V = U L · · · U 2 U 1 , be an L-layered PSA, where each unitary U j is sampled from the same distribution dµ = P (U )dU . Then, the probability distribution and moment operator of V are respectively given by Equation (55) shows that the moment operator of an L-layered ansatz is equal to the L-th power of the moment operator of a single layer. We can also calculate this formally. In our case each U l is given by are the set of generators, and where the θ lk are sampled from the uniform distribution. Then, let us note that M (2) Here H k |l i,k = l i,k |l i,k , and W k is the unitary matrix that diagonalizes H k . To calculate the distance to a 2-design we need to prove some properties of eigenvalues and eigenvectors of M 1 .
The equality holds if and only if $|\phi\rangle$ is an eigenvector of $U_1 \otimes U_1 \otimes U_1^* \otimes U_1^*$ for all $U_1$ such that $P(U_1) \neq 0$. For the specific case of the Haar measure, this means that $|\phi\rangle$ is an eigenvector of $U \otimes U \otimes U^* \otimes U^*$ for all $U \in SU(d)$. We already showed that there are two such eigenvectors for the Haar measure with eigenvalue 1. We now show that those two are also the only eigenvectors of $M^{(2)}_{U_1}$ with eigenvalue 1, given that the set $\mathcal{G}$ is controllable. Let $|\phi\rangle$ be an eigenvector of $M^{(2)}_{U_1}$ with eigenvalue 1. Then $|\phi\rangle$ is also an eigenvector, with eigenvalue 1, of the moment operator of the layered circuit $V$. Since the system is controllable, for any unitary $U$ there exist parameters such that $\prod_{l,k} e^{-i\theta_{lk}H_k} = U$, so that $P(U) \neq 0$. That is, one can obtain any unitary in $U(d)$ by tuning the parameters in $V$. It also implies that $|\phi\rangle$ has to be an eigenvector of $U \otimes U \otimes U^* \otimes U^*$ with eigenvalue 1 for all $U \in SU(d)$. So $|\phi\rangle$ is also an eigenvector of $M^{(2)}_{U_H}$ with eigenvalue 1, and there are two such eigenvectors. Let us call them $|\phi_1\rangle$ and $|\phi_2\rangle$. The remaining eigenvalues $\{\lambda_i\}$ of $M^{(2)}_{U_1}$ then satisfy $0 \leq |\lambda_i| < 1$. Let $\lambda_{\max}$ be the eigenvalue with maximum modulus. Thus, we can now show that $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty = |\lambda_{\max}|^L$. Then, recalling that $|\lambda_{\max}| = \|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty$ is also the expressibility of one layer, one finds that $\|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty = \|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty^L$. Solving for $L$ and denoting $\varepsilon = \|\mathcal{A}^{(2)}_{U(\theta)}\|_\infty$ leads to $L = \log(1/\varepsilon)/\log\big(1/\|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty\big)$.

G Proof of Corollary 1: Rate of convergence of controllable systems to 2-designs

Here we prove Corollary 1.
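As a numerical illustration of the depth estimate, the sketch below assumes the relation $\varepsilon = \|\mathcal{A}^{(2)}_{U_1(\theta)}\|_\infty^L$ that the proof of Theorem 1 arrives at, and solves it for the smallest integer $L$. The function name and the sample values of the single-layer expressibility are hypothetical and chosen only for illustration.

```python
import math


def layers_for_target(epsilon, single_layer_norm):
    """Smallest integer L with single_layer_norm ** L <= epsilon."""
    if not (0.0 < epsilon < 1.0 and 0.0 < single_layer_norm < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(epsilon) / math.log(single_layer_norm))


# Hypothetical single-layer expressibility of 0.7, target epsilon = 2^{-n}.
depths = [layers_for_target(2.0 ** -n, 0.7) for n in (10, 20, 40)]
```

If the single-layer expressibility is held fixed (an assumption of this example), the depth needed to reach $\varepsilon = 2^{-n}$ grows linearly with $n$, since $\log(1/\varepsilon) = n \log 2$.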

H Proof of Proposition 1: Controllability leads to barren plateaus
Let us now provide a proof for Proposition 1.

Proposition 1 (Controllable).
There exists a scaling of the depth for which controllable systems form $\varepsilon$-approximate 2-designs with $\varepsilon \in O(1/2^n)$, and hence the system exhibits a barren plateau according to Definition 1.
Proof. Let us start by noting that all controllable systems form $\varepsilon$-approximate 2-designs with $\varepsilon \in O(1/2^n)$ for a depth scaling obtained from Eq. (64). Then, let us recall the definition of the expressibility superoperator $\mathcal{A}^{(2)}_{U(\theta)}$ and of its action on states. We then have that, for any quantum state $\rho$, $\|\mathcal{A}^{(2)}_{U(\theta)}(\rho)\|_\infty \leq \varepsilon$ holds, which is precisely the definition of an $\varepsilon$-approximate state 2-design. Finally, we can use the results from [37], which imply that, since $\varepsilon \in O(1/2^n)$, the variance of the cost function partial derivative vanishes exponentially with $n$. Thus, the cost function exhibits a barren plateau.

I Proof of Proposition 2: Controllability of the HEA and the Spin Glass model
Let us now prove Proposition 2.
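Proposition 2. The following two sets of generators generate full rank DLAs, and concomitantly lead to controllable systems: the set $\mathcal{G}_{\rm HEA}$ of generators of the Hardware Efficient Ansatz and the set $\mathcal{G}_{\rm SG}$ of generators of the spin glass system.
In the following we find that, by taking repeated nested commutators between the elements of the sets $\mathcal{G}$ in Proposition 2, one can obtain all $2^{2n} - 1$ Pauli strings, and hence that the DLA $\mathfrak{g}$ obtained is full rank.
Proof. We divide the proof into the HEA and SG models.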

I.0.1 Generators of the Hardware Efficient Ansatz (HEA)
We first start with the set G HEA corresponding to the set of generators of a HEA.
First, let us note that from the commutation of $X_i$ with $Y_i$ we get every $Z_i$, meaning that one can already obtain all single-qubit Pauli operators. Then, commuting $X_i$ and $Y_i$ with the entangling generators in $\mathcal{G}_{\rm HEA}$ yields operators $B_i$, and commuting these with $X_{i+1}$ and with the remaining single-qubit operators we get all nearest-neighbour two-body Pauli operators. Similarly, it can be readily verified that the commutators between the operators obtained so far yield all nearest-neighbour three-body Pauli operators. Now, let us show that $\mathfrak{g}$ also contains the remaining non-nearest-neighbour two-body operators. Consider the commutators between the three-body nearest-neighbour operators, whose Pauli factors we label by $N, M, O, P, Q \in \{X, Y, Z\}$. Clearly, the different choices of $M$, $O$, $P$ and $Q$ will generate all next-nearest-neighbour two-body operators. Iterating this procedure we obtain all $9\binom{n}{2}$ two-body terms. Then, once we have all two-body operators, we can use the one-body operators to get all three-body operators. Three-body operators with one-body operators give four-body operators, and so on, until we get all $n$-body operators. Thus the DLA of the HEA is full rank, which implies that the HEA is a controllable ansatz.

I.0.2 Generators of the Spin Glass (SG)
The generators of the spin glass system depend on real coefficients $h_i, J_{ij} \in \mathbb{R}$. For convenience, we define two operators, $H_p$ and $H_m$. Starting from the commutator of $H_p$ and $H_m$, and then repeatedly commuting and combining the results with $H_p$ and $H_m$, we obtain a sequence of operators $A_0, \ldots, A_7$, all of which belong to the Lie algebra. By repeating this procedure, we get that a set $\tilde{S}$ of $3n$ operators also belongs to the Lie algebra. Now, because the $h_i$ are sampled from a Gaussian distribution, we can safely assume them to be non-zero and different from each other. Then, using a Vandermonde-determinant type of argument, one can show that the $3n$ elements in $\tilde{S}$ are linearly independent and span the same subspace as the set $S$. Thus $S$ belongs to $\mathfrak{g}_{\rm SG}$. Combining this with $A_6$, we essentially get the generators in $\mathcal{G}_{\rm HEA}$, and hence one can again generate all $n$-body Pauli operators. Thus the DLA of the spin-glass system is also of full rank, which implies that the spin-glass system is controllable.

J Proof of Theorem 2: Variance in subspace controllable systems

In the following, we provide a proof for Theorem 2 by explicitly computing the variance of the cost function partial derivative in a subspace controllable setting. Consider a set of generators that share a symmetry (for simplicity we assume only one symmetry, although the generalization to multiple symmetries is straightforward), i.e., there is a Hermitian operator $\Sigma$ such that $[\Sigma, g] = 0$ $\forall g \in \mathfrak{g}$. Assuming $\Sigma$ has $N$ distinct eigenvalues, the DLA has the form $\mathfrak{g} = \bigoplus_{m=1}^{N} \mathfrak{g}_m$. Let us introduce some notation. Consider the $d \times d_m$ matrix $Q_m^\dagger$ that results from horizontally stacking the eigenvectors of $\Sigma$ associated with the $m$-th eigenvalue (of degeneracy $d_m$), such that $Q_m$ maps vectors from $\mathcal{H}$ to $\mathcal{H}_m$. These satisfy $Q_m Q_m^\dagger = \mathbb{1}_{d_m}$ and $Q_m^\dagger Q_m = P_m$, where $P_m$ are projectors onto the subspaces, such that $\sum_{m=1}^{N} P_m = \mathbb{1}_d$.
Let us now use the notation to denote the d m -dimensional reduced states and operators, respectively. Recall that, since any unitary U ∈ G produced by such a system is block diagonal, we can write U = m P m U P m . Also, let us note that if We are ready to prove of Theorem 2, which we here recall for convenience. Here is the Hilbert-Schmidt distance, and A (k) the reduction of operator A onto the subspace of H k as defined in Eq (93).
Proof. Consider the partial derivative of the cost function C(θ) with respect to the parameter θ pq (= θ µ ), i.e. the one associated with layer p and generator H q (= H µ ). We have where in the second line we have expanded U (θ) = U A U B , with corresponding to the unitaries before and after the parameter θ pq , and O A = U † A OU A . Then, the variance of the partial derivative is where U A and U B denote the distribution of before and after unitaries, respectively, and F (U B , ]. Now, expanding the block-diagonal unitaries in this equation and assuming that the initial state belongs to a particular invariant subspace, i.e., ρ (k) = P k ρP k = ρ, we find where we defined A ], and thus Note that here we have used the fact that, owing to Eq. (100), F (U B , U A ) actually depends only on the action of U At this point, we introduce the assumption of subspace controllability on H k . By virtue of Theorem 1 and Corollary 1, we know that the distribution of unitaries produced by a subspace controllable PSA constitute a ε-approximate 2-design in the subspace. Therefore, we can use Eq. (39) to integrate and get where ∆(A) is the Hilbert-Schmidt distance defined in Theorem 2. Here, we used that Tr[X] = 0. This is grounded in the fact that, because the generator V shares the symmetry, the commutator X in the subspace is still a commutator, i.e., If the initial state was spread across two (or more) subspaces, neither ρ (k) would be a density matrix, nor X would be a commutator and one would have to be more careful in the derivation.
Finally, we proceed to integrate Eq. (102) over U A . Notice again thatÕ (k) = P kÕ P k = U †(k) A so X is actually only a function of U (k) A (and not of the entire U A ). Consequently we can integrate over the reduced distribution U (k) A , that, according to the subspace controllability assumption, forms a ε-approximate 2-design over U(d k ). This leads to Here, we used the notation Var θ − → Var A,B to make explicit the assumption that both U Proof. We consider the situation in which we have a reducible system and an initial state ρ that belongs to an invariant subspace H k which, by assumption, is controllable. Notice that, using Eqs. (J) and (93), we find Then, since the operator M = AP k A is a positive semi-definite operator, and since P k 1, one can write Using this expression, the following bound on the Hilbert-Schmidt distance holds Using Eq. (112) and the equation for the variance in (106) we find where we have additionally used the fact that ∆(ρ (k) ) 2 ∀ρ. Recalling that we are interested in the case when d k ∈ O(2 n ) we find that, assuming Tr[V 4 ], Tr[O 4 ] ∈ O(2 n ), then the function is such that Hence, we finally find that Var θ [∂ µ C(θ)] T (n), and the cost exhibits a barren plateau from the fact that the variance of the cost function partial derivative is upper bounded by a function that vanishes exponentially with n.
Let us finally note that the proof of Corollary 2 follows from the fact that $\mathrm{Tr}[(H_\mu)^4], \mathrm{Tr}[O^4] \in O(2^n)$. We here note that the following relevant cases satisfy this assumption:
• $H_\mu$, $O$ are projectors of arbitrary rank.
• $H_\mu$, $O$ have a decomposition in the Pauli string basis of the form $\sum_i c_i \sigma_i$ (with $c_i$ real coefficients, and $\sigma_i \in \{\mathbb{1}, X, Y, Z\}^{\otimes n}$) with up to $O(\mathrm{poly}(n))$ terms such that $\sum_i c_i^4 \in O(\mathrm{poly}(n))$.
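The two simplest instances of these cases can be checked in one line each. As a short worked example (using the symbol $P$ for a generic projector and $\sigma$ for a single Pauli string, both notational choices made here):

```latex
\begin{align}
P^2 = P \;&\Rightarrow\; \mathrm{Tr}[P^4] = \mathrm{Tr}[P] = \mathrm{rank}(P) \leq 2^n, \\
\sigma^2 = \mathbb{1} \;&\Rightarrow\; \mathrm{Tr}[\sigma^4] = \mathrm{Tr}[\mathbb{1}] = 2^n .
\end{align}
```

In both cases the fourth-moment trace is at most $2^n$, consistent with the assumption $\mathrm{Tr}[(H_\mu)^4], \mathrm{Tr}[O^4] \in O(2^n)$.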

L Proof of Theorem 3: Expressibility in the subspace
Here we prove Theorem 3, which we now recall.
Theorem 3. Consider a system that is reducible and let $\rho \in \mathcal{H}_k$ with $\mathcal{H}_k$ an invariant subspace of dimension $d_k$. Then, the variance of the cost function partial derivative is upper bounded in terms of the expressibility of the circuit in the subspace.
Proof. We recall that $\mathcal{A}_\omega(\cdot)$ denotes the expressibility superoperator for the second moment in the $k$-th subspace, where $\omega = A, B$ indicates that one evaluates the expressibility of $U^{(k)}_\omega$. Writing the variance of the cost function partial derivative in terms of these superoperators, and using the triangle inequality and then the Cauchy-Schwarz inequality, one obtains the bound. Similarly, one can derive Eq. (118) by replacing $\mathcal{A}_A((O^{(k)})^{\otimes 2})$ into (119), which leads to (131), following a derivation similar to the one used in obtaining (125).

M Proof of Proposition 3: Variance on the irreducible representations of SU(2)

In this section we derive a proof for Proposition 3 for the variance on the irreducible representations of SU(2). Proposition 3 reads as follows.
Proposition 3. Consider the cost function of Eq. (20). Let $\theta_\mu = \theta_{j,x}$, and let us assume that the circuit is deep enough to allow for the distribution of unitaries $U_A$ and $U_B$ to converge to 2-designs on $G = SU(2)$. Then the variance of the cost function partial derivative $\partial_\mu C(\theta) = \partial C(\theta)/\partial\theta_\mu$ admits the closed-form expression derived in the proof.
Proof. First, let us set the context. We consider a toy model ansatz $U(\theta) = \prod_{l=1}^{L} e^{-i\theta_{lx}S_x} e^{-i\theta_{ly}S_y}$ with generators $\mathcal{G} = \{S_x, S_y\}$, where $\{iS_x, iS_y, iS_z\}$, with $S_\nu \in \mathbb{C}^{d \times d}$ ($\nu = x, y, z$), form a basis of the spin $S = (d-1)/2$ irreducible representation of $\mathfrak{su}(2)$. Here, it is convenient to use as a basis of the Hilbert space the set $\{|m\rangle\}$, $m = -S, -S+1, \ldots, S-1, S$, of eigenvectors of $S_z$. That is, we have $S_z|m\rangle = m|m\rangle$ and $S_\pm|m\rangle = \lambda_\pm|m \pm 1\rangle$, with $\lambda_\pm = \sqrt{S(S+1) - m(m \pm 1)}$, and where $S_+$ and $S_-$ are the spin ladder operators such that $S_x = \frac{1}{2}(S_+ + S_-)$ and $S_y = \frac{1}{2i}(S_+ - S_-)$. The dynamical group $G$ is the $d$-dimensional representation of $SU(2)$, and we will be interested in the partial derivative with respect to $\theta_\mu = \theta_{j,x}$, i.e., the parameter associated with the generator $V = S_x$ on the $j$-th layer of the circuit. We will assume that the circuit is deep enough to allow for the distribution of unitaries $U_A$ and $U_B$ to converge to 2-designs on $G = SU(2)$. In consequence, when computing the variance of the different gradient components, we will be allowed to replace the integration over the angles in the PSA with an integration over the Haar measure on the group $SU(2)$. Moreover, because irreducible representations of $SU(2)$ and $SO(3)$ are isomorphic, we can choose to integrate over the latter. That is, we choose a parameterization of $SO(3)$, e.g., in terms of Euler angles, $U(\alpha, \beta, \gamma) = e^{-i\alpha S_z} e^{-i\beta S_y} e^{-i\gamma S_z}$, where $\frac{1}{8\pi^2}\, d\alpha\, d\beta\, \sin\beta\, d\gamma$ is the normalized Haar measure for this parametrization of $SO(3)$.
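The spin operators of this toy model can be built explicitly from the ladder operators. The sketch below assumes the standard matrix elements $\langle m+1|S_+|m\rangle = \sqrt{S(S+1) - m(m+1)}$ and orders the basis by increasing $m$; the function name `spin_matrices` and the basis ordering are choices made for this example.

```python
import numpy as np


def spin_matrices(d):
    """Return (Sx, Sy, Sz) for spin S = (d - 1) / 2 in the basis m = -S, ..., S."""
    s = (d - 1) / 2.0
    m = np.arange(d) - s                      # eigenvalues of Sz, ascending
    sz = np.diag(m).astype(complex)
    # Matrix elements <m+1| S_+ |m> sit on the first subdiagonal.
    lam_plus = np.sqrt(s * (s + 1) - m[:-1] * (m[:-1] + 1))
    s_plus = np.diag(lam_plus, k=-1).astype(complex)
    s_minus = s_plus.conj().T
    sx = (s_plus + s_minus) / 2
    sy = (s_plus - s_minus) / 2j
    return sx, sy, sz


def commutator(a, b):
    return a @ b - b @ a


# Example: the d = 4 (spin-3/2) representation.
Sx, Sy, Sz = spin_matrices(4)
```

With these matrices one can check the $\mathfrak{su}(2)$ commutation relations, e.g. $[S_x, S_y] = iS_z$, for any dimension $d$.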
Considering an initial state that is an eigenstate of $S_z$, i.e., $\rho = |m\rangle\langle m|$, its evolution $U(\alpha, \beta, \gamma)\rho\, U(\alpha, \beta, \gamma)^\dagger$ can be conveniently expressed in terms of Wigner's small $d$-matrices. Now, the computation of the mean value of $\partial_\mu C(\theta)$ amounts to making the choice $M = X$, with $X = [H_\mu, U_A^\dagger O U_A]$. We readily find that the mean value vanishes, which follows from the fact that commutators are traceless.
(150) We can proceed to integrate over U A , assuming again we can replace U A with an integration over the Haar measure on G. Let us call v i each contribution to the variance, i.e. Var θ [∂ µ C(θ)] = Similarly, it is easy to see that Altogether, we finally find where we have introduced Var A,B to make explicit the assumption made, that is, that the before and after distributions of unitaries form 2-designs.
Here we numerically simulate the model employing a PSA of the form for minimizing the normalized cost function We compute Var θ [∂ µ C(θ)] for the initial states |m = |S and |m = |1/2 for L = 100 using 1200 random initialization. As shown in Figure 1, the theoretical prediction of Eq. (162) matches the numerical results, indicating that an initial state |m = |1/2 is trainable, while an initial state |m = |S leads to a barren plateau in the cost function. Here, Π is the unitary d-dimensional irreducible representation of the element of the symmetric group S n that corresponds to a reflection over the central constituent. As a result of these two symmetries, the state space is broken into invariant subspaces with well defined parity and excitation number The dimension of each excitation subspace is dim(H m ) = n m , whereas, the joint parity-excitation subspaces have dimension dim(H m,σ ) ≈ 1 2 dim(H m ). The DLA is, accordingly, a direct sum of simple algebras on each invariant subspace Notably, upon the addition of a generator consisting of local fields on either end of the chain, G XXZ = G XXZ U ∪ {Z 1 + Z N }, the DLA can be shown to become full rank on each subspace g XXZ =

N.2 The TFIM and LTFIM models
In this section we review the symmetries of the different variants of the Transverse Field Ising Model (TFIM) presented in Section 5.2.2.

N.2.1 Open boundary condition
Let us first consider open boundary conditions on the TFIM. In this case, the generators $\mathcal{G}_{\rm TFIM} = \{\sum_{i=1}^{n-1} Z_i Z_{i+1}, \sum_{i=1}^{n} X_i\}$ have two symmetries. On one hand, they commute with the parity symmetry $\Pi$ defined in Eq. (165). On the other hand, they commute with the so-called $\mathbb{Z}_2$ operator that amounts to a global flip of the qubits. Consequently, $\mathcal{H}$ is broken into four subspaces $\mathcal{H}_{\sigma,\sigma'}$ with $\sigma, \sigma' \in \{1, -1\}$. Because the initial state $|+\rangle^{\otimes n}$ for the cost function in Eq. (32) is an eigenstate of both symmetry operators with eigenvalues $\sigma = \sigma' = +1$, the dynamics under such a PQC is constrained to the $\mathcal{H}_{+1,+1}$ subspace. We find, using Algorithm 1, that the dimension of the DLA scales polynomially, $\dim(\mathfrak{g}) = n^2$ (and so does its restriction to the $\sigma = \sigma' = +1$ subspace).
In turn, consider open boundary conditions on the Longitudinal Transverse Field Ising Model (LTFIM), given by generators $\mathcal{G}_{\rm LTFIM} = \mathcal{G}_{\rm TFIM} \cup \{\sum_{i=1}^{n} Z_i\}$. While the parity symmetry is conserved, the introduction of this new global longitudinal field breaks the $\mathbb{Z}_2$ symmetry. Thus, we are left with only two subspaces, $\mathcal{H} = \bigoplus_{\sigma = \pm 1} \mathcal{H}_\sigma$, of dimensions $\dim(\mathcal{H}_\sigma) \approx 2^n/2$. As expected, the DLA breaks into two corresponding subspace DLAs, each of which we find to be full rank on its subspace, i.e., both subspaces are controllable.

N.2.2 Closed boundary conditions
Let us now consider closed boundary conditions. This case is slightly more involved since the parity symmetry is replaced by $C_n$, the cyclic group of $n$ elements. Hereafter, we follow Ref. [111]. Consider the operator $R$ whose action is to cycle the qubits in a state, i.e., $R|a_1, \ldots, a_n\rangle = |a_n, a_1, \ldots, a_{n-1}\rangle$. Clearly, $R^n = \mathbb{1}$. Now, as a consequence of the invariance of the generators under the action of this group of symmetries, the state space is broken into $n$ invariant subspaces, $\mathcal{H} = \bigoplus_{k=0}^{n-1} \mathcal{H}_k$. In particular, the state $|+\rangle^{\otimes n} \in \mathcal{H}_0$, and for that reason we will only focus on this subspace. Let us note that a general formula for the dimension of this subspace is, as far as we know, not known, but it is possible to derive a closed expression in the case of $n$ prime [111], $\dim(\mathcal{H}_0) = 2 + \frac{2^n - 2}{n}$, which makes evident that the subspace is exponentially large. Furthermore, we compute the DLA reduced to this subspace in the LTFIM case and observe an exponential scaling. In contrast, for the TFIM (which also has the $\mathbb{Z}_2$ symmetry) we find that the DLA reduced to the $k = 0$, $\sigma = +1$ subspace has a linear scaling, i.e., $\dim(\mathfrak{g}_{0,+}) \in O(n)$.
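The closed expression for prime $n$ can be cross-checked by brute force. The sketch below assumes that $\mathcal{H}_0$ is the subspace left invariant by the cyclic shift $R$, so that its dimension equals the number of orbits of computational basis states under cyclic shifts; the function names are hypothetical.

```python
def count_cyclic_orbits(n):
    """Number of orbits of n-bit strings under cyclic shifts of the bits."""
    mask = (1 << n) - 1
    seen = set()
    orbits = 0
    for x in range(1 << n):
        if x in seen:
            continue
        orbits += 1
        y = x
        for _ in range(n):
            seen.add(y)
            # Rotate the n-bit string by one position.
            y = ((y << 1) & mask) | (y >> (n - 1))
    return orbits


def prime_formula(n):
    """Closed expression 2 + (2^n - 2) / n, valid for prime n."""
    return 2 + (2 ** n - 2) // n


dims = {n: count_cyclic_orbits(n) for n in (2, 3, 5, 7, 11)}
```

For the primes listed, the brute-force count agrees with the formula; for composite $n$ the formula is not expected to apply.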
O Initial state for the numerical simulations of the XXZ spin chain model

The initial states $|\psi_{m,+}\rangle$ in the cost function of Eq. (25) are chosen to be eigenstates of the symmetries of the generators, namely $M = \sum_{i=1}^{n} Z_i$ and $\Pi$ (the reflective spatial symmetry w.r.t. the chain's center) defined in Eq. (165). Here $m$ denotes the number of excitations in the state, and $+$ the fact that it is an even-parity eigenstate.