Characterization of variational quantum algorithms using free fermions

We study variational quantum algorithms from the perspective of free fermions. By deriving the explicit structure of the associated Lie algebras, we show that the Quantum Ap-proximate Optimization Algorithm (QAOA) on a one-dimensional lattice – with and without decoupled angles – is able to prepare all fermionic Gaussian states respecting the symmetries of the circuit.


Introduction
Variational quantum algorithms have recently received much attention as a potential candidate for demonstrating quantum advantage. Originally proposed in the context of quantum chemistry [76] and classical optimization problems [28], such algorithms have since found wide use as a tool for leveraging the current generation of noisy, intermediate scale quantum computers [77] to tackle problems which Gabriel Matos: pygdfm@leeds.ac.uk are hard to solve classically. For that purpose they have been extended to a myriad of domains, including machine learning, [1,9,18,27,38,81], preparation of general condensed matter quantum states [29,41,46,79,80,101,105], finance [25,71], molecular biology and biochemisty [10,74], and linear algebra [11,17,106], and have been implemented in numerous quantum simulation platforms [5,34,48,75,102].
A major issue with this class of algorithms is that they are notoriously hard to optimize classically. Not only are they known to have an optimization landscape riddled with local minima [44,58], they also suffer from the phenomenon of "barren plateaus", in which the gradients with respect to the optimization parameters vanish exponentially with system size [63,99]. Though several strategies have been proposed to deal with this problem [35,52,56,57,64,90,110] it remains an active area of research.
Successfully employing parametrized circuits requires a balance between expressibility and trainability. Indeed, while universal circuit ansätze exist [8,66], problem-agnostic circuits typically present barren plateaus [43]. Thus, it is crucial to design parameterized circuits that are able to adequately prepare or approximate the quantum state of interest, while not being overly expressive so as not to compromise its trainability. To this end, a characterization of the expressibility of these algorithms has been performed in several contexts, and strategies to systematically quantify it have been proposed [1,2,23,68,84]. A way to constrain the expressibility of a parameterized circuit is to employ problem-tailored ansätze, such as the Hamiltonian Variational Ansatz [103], among other strategies such as exploiting symmetries in the problem [26,33,59,95,98] or removing redundant parameters [31,32].
Other factors not immediately linked to the expressibility of the circuit are known to affect the optimization, such as the boundary conditions used; these have been observed to greatly affect the success of optimization, and strategies have recently been pro-posed to address this issue [86]. Another set of such factors is related to the individual characteristics of states to be prepared, such as locality of correlations and entanglement [20,42,60,93,94,96]. Despite the extensive work done on this subject, there is no systematic characterization or theory explaining the interplay between all these aspects and their connection to expressibility.
Finally, it has been observed that an increase in the number of parameters in the circuit without a corresponding increase in the expressibility induces an overparameterized regime, where the optimization is known to converge significantly faster, becoming less prone to local minima and alleviating the aforementioned issues [49,50,53,54]. Though significant progress in explaining the mechanism behind this phenomenon has been achieved [54,108], open questions remain, such as how to determine the optimal circuit depth that best leverages this effect.
In this work, we begin by comprehensively characterizing the original QAOA protocol [28] on a 1D lattice and a variation on it using decoupled angles [40]. By deriving the explicit structure of the associated Lie algebras [54,66], we show that it can prepare exactly all fermionic Gaussian states satisfying the symmetries of the circuit. Guided by these results, we find that, while decoupling the angles increases the number of parameters and the expressibility of the circuit by removing symmetries, this makes the preparation of non-local states easier and of local states harder, flipping the behavior observed when these angles are coupled. This contrasts with the commonly held belief that the use of symmetries is beneficial to the optimization [6,33,55,65,73,82,83,92,109].
By leveraging covariance matrix and automatic differentiation techniques to simulate deep circuits and system sizes up to 80 qubits, we study the overparameterized regime of optimization, exploiting the fact that its onset is polynomial in lattice size for the ansätze we consider (see [53,54] and Section 3.1). In these conditions, we observe that the number of iterations to converge to the solution scales linearly with the size of the system. This contrasts with the polynomial scaling we encounter at smaller depths. Moreover, we quantify the degree of improvement that overparameterization confers to the optimization as the circuit depth increases. We find that the number of iterations to converge decays exponentially with circuit depth, and that the optimization hardness develops a minimum for a depth that is proportional to L 2 , the square of the system size. Finally, we find that the exponential convergence to the solution seen in the overparameterized regime can be explained in terms of an improvement of the quality of the linear approximations provided by the gradients, elucidating why the optimization hardness continues to decrease even after local minima disappear. This paper is organized as follows. In Sec. 2 we in-troduce the variational setup, focusing on free-fermion systems, and establish the associated notation. Moreover, we develop Lie theoretical tools for the analysis of variational algorithms. We apply these tools in Sec. 3 which contains our main results. In particular, Section 3.1 characterizes the expressibility of protocols introduced in Section 2.3, where they are presented alongside a concise introduction to free fermionic systems. In Section 3.2, we explore the optimization associated to the corresponding parameterized circuits, and highlight the influence of symmetries and the locality of the target state. In Section 3.3 we discuss the effect of overparameterization, and perform a scaling of the number of iterations to converge with system size and circuit depth. Our conclusions are presented in Section 4, while the Appendixes contain further numerical characterization of the circuit's optimization landscape in terms of gradient variance, staircase structure in optimization traces, and the details of the underlying Lie algebra structure.

Variational Quantum Algorithms
Variational quantum algorithms (VQA) [7,14] are generally formulated as a feedback loop between an optimization routine running on a classical computer, and a quantum simulator. This routine manipulates a set of controllable parameters defining a family quantum circuits with the objective of finding a circuit which is able to prepare a quantum state of interest (though more general definitions exist depending on the problem at hand). In this context, the parameterized family of quantum circuits we will focus on is constructed by following the alternating "bang-bang" structure of the Quantum Approximate Optimization Algorithm (QAOA) [28,107]. Given a set of Hamiltonians H = (H 1 , ..., H m ), the circuit is defined by the unitary operator where p controls the circuit depth and θ ≡ {θ 1,1 , θ 1,2 , . . . , θ p,m } are the parameters to be optimized. We call such a set of Hamiltonians H a protocol. The associated parameters are often called angles in the literature. Given an initial state |ψ(0) and a set of parameters θ, this circuit prepares the state The goal, as outlined above, is to employ a classical optimization routine in order to find a set of angles θ * , such that |ψ(θ * ) is the quantum state of interest, which we call the target state. This is done by supplying the optimizer with a cost function, which measures the distance between the state in preparation and the target state.
Here, we define the target state as being the unique ground state of some quantum Hamiltonian H. In this case, one can use a cost function based on the expectation value of the state with respect to this Hamiltonian. We use the shifted energy density as the cost function, where E 0 is the ground state energy of H and L is the size of the system under consideration. By the variational principle, and given that we assume that the target state is the unique ground state of H, this cost is minimized and is equal to zero when |ψ is the target state. In what follows, we refer to the Hamiltonian featuring in this cost function as the target Hamiltonian.

Lie Theory and Expressibility
Lie theoretical techniques provide a powerful tool to characterize the expressibility of quantum controllable systems [3]. They have been previously used in the literature to prove the universality of a set of protocols under certain assumptions [66]. These have been further employed to characterize controllability and overparameterisation in variational quantum algorithms [53,54], and have recently found use in characterizing symmetries in data in the context of machine learning [55,65].
Here we summarize and formalize the theory behind this and define the associated notation. Given U (θ, p) defined by the Hamiltonians H = (H 1 , ..., H m ) as in Section 2.1, we define The set U contains all unitaries that the protocol H can generate at arbitrary circuit depth. It is a group, as it contains the product of any two of its elements and the inverse of any of its elements. Further, since matrix multiplication is differentiable, it constitutes a Lie group. Associated to the Lie group U is a Lie algebra u, which can be defined at a point θ as  From an initial state, the set of unitaries generated by the parameterized circuit, U, prepares a manifold of states S. The space of directions that the protocol is able to explore at a given point is characterized by the Lie algebra u, and there is a redundancy in the unitaries preparing a state which is represented by a stabilizer gauge group G. Symmetries in the protocol constrain the optimization (a) to a submanifold of states Ssym, which affects the landscape by introducing local minima (b) and restricting the features available to the optimizer, causing the optimization to take longer (c), as explained in Section 3.2. By increasing the depth of the circuit, the variational protocol enters an overparameterized regime (d), where minima disappear and convergence of the cost function to a global minimum becomes exponential. 2. Circuits illustrating the variational protocols introduced in Section 2.3 and studied throughout the rest of this paper 3. The parameterization at minimal circuit depthp (e) matches the dimension of the manifold of states, so that minima exist and are unique. By the implicit function theorem, the overparameterization (f) defines local, lower-dimensional parameterizations matching the dimension of the manifold. These more adequately capture its features, while describing the optimization path in a piecewise manner (see Section 3.3). It is an open question whether a global exact parameterization with these properties could be found.

site-dependent
We further define (6) as the set of states preparable by the variational quantum circuit at depth p and at any depth, respectively. We emphasize that S depends on the initial state chosen.
We will restrict our analysis to finite dimensional spaces, for which there must exist a p * such that U = U p * [19]. By the same argument, there must be ap such that S = Sp. This represents the circuit depth at which the circuit has reached maximum expressibility for a given initial state. We will see that, in general,p < p * . This happens because there can be a set G of unitary matrices in U that leave the initial state |ψ(0) invariant, forming a stabilizer subgroup. Mathematically, this translates into U having a fiber bundle structure and S ∼ = U/G, where G represents a Gauge symmetry group [67] (see Figure 1.1 for a graphical summary). As a consequence of this, justifying that, in general, S will require less parameters to describe than U.
It is an open question whether there is a method to systematically determine p * and p given a set of Hamiltonians H. [19,66], the unitaries in U can approximate a quantum annealing protocol and thus adiabatically prepare the ground state of any Hamiltonian in u [62], provided that |ψ(0) is the ground state of some H 0 ∈ u. Conversely, if |ψ is prepared by U ∈ U, then it is the ground state of H = U H 0 U † ∈ u. Thus, the Lie algebra u fully characterizes the set S of preparable states; we will exploit this in Section 3.1 to study the expressibility of the protocols defined in Section 2.3.

Free fermionic systems and VQA
Throughout this paper, we work with a 1D spin system on a linear lattice, and we denote the corresponding lattice size by L. We define Majorana operators through the standard Jordan-Wigner transformation where we assume that the string of Zs stretches from the left end of the lattice to the ith position. Quadratic fermionic Hamiltonians (which we abbreviate to "quadratic Hamiltonians") are of the form where h j,k is a 2L×2L real and antisymmetric matrix. Note that all quadratic Hamiltonians preserve (commute with) the fermionic parity P = j Z j . The eigenstates of a quadratic Hamiltonian are fermionic Gaussian states (FGS), which are a class of quantum states that are fully determined (up to a phase) by their 2L × 2L covariance matrix which is real and antisymmetric.
Both FGS and quadratic Hamiltonians are efficiently representable, requiring a number of parameters that is quadratic in system size to be specified. Moreover, the quantum dynamics associated to the evolution of a covariance matrix under the action of a quadratic Hamiltonian is efficiently computable, as is its expectation value with respect to a quantum observable [88,91,97].
We now establish how symmetries in the 1D lattice are reflected on the structures we have just introduced. The covariance matrix of a translationally invariant FGS |ψ and a translationally invariant quadratic Hamiltonian defined by h j,k satisfy, respectively: for all integers m and it is understood that coefficients are taken modulo the lattice size. The covariance matrix of a lattice inversion symmetric FGS ψ and a lattice inversion symmetric quadratic Hamiltonian H defined by h j,k satisfy, respectively: Throughout, we denote by "symmetric FGS" and "symmetric quadratic Hamiltonians" those that are invariant both under translation and lattice inversion. Following the framework outlined in Section 2.1 we study two different protocols, both of which feature only quadratic Hamiltonians: 1. A site-independent protocol, defined by the set and we denote its Lie algebra by i.

A site dependent protocol, where
and we denote its Lie algebra by d.
The former corresponds to the original QAOA protocol [28] on a linear lattice, while the latter results from removing the layer-wise coupling in the angles of this original protocol [40]. We will see that this decoupling results in distinct properties with respect to the optimization of the associated variational algorithm. Writing out the corresponding unitary explicitly: As mentioned above, the site-independent protocol can be seen as the site-dependent one with the additional constraint that θ i a,P = θ a,P , i.e., the value of the angle is the same across a circuit layer.
The full structure of the Lie algebras corresponding to the protocols above is elucidated in Appendix B. In particular, a basis for these algebras is obtained by iteratively taking the commutators of their generators. Moreover, we clarify how their structure changes when we restrict ourselves to a particular sector of the parity symmetry. In the next section, we will employ them to study the expressibility of the corresponding protocols following Section 2.2.

Circuit expressibility and saturation
Here, we determine the expressibility of the protocols introduced in the previous section by examining the corresponding Lie algebra. In particular, we study the set of unitaries U that each protocol can generate and the set of states S that each can prepare. In what follows we consider the initial state to be a fermionic Gaussian state of a given parity respecting the symmetries of the circuit.
Our results are summarized in Table 1. All protocols are able to prepare every FGS with the same parity as the initial state and respecting the symmetries of the circuit. This can be proved by studying the structure of the corresponding Lie algebras, derived in Appendix B and referenced in Table 1. By applying the Jordan-Wigner transformation, these Lie algebras can be seen to form the set of free fermionic Hamiltonians satisfying the symmetries of the circuit. As shown in Section 2.2, every ground state of such a Hamiltonian can be prepared by the circuit at some depth; and the set of these ground states are precisely the fermionic Gaussian states having the same parity as the initial state and respecting the appropriate symmetries.
As outlined in Section 2.2, there can be a symmetry subgroup G of U that leaves the initial state invariant. In the case of a FGS, there is a U (L) freedom in the fermionic modes, which can be rotated without changing the underlying state [104]. The subset of these rotations contained in U will form G; this can be all of the U (L) freedom, as is the case of the site dependent protocol, or only part of it, in which case dim G < dim U (L).
We further attempt to determine the minimum depthp necessary to prepare any state in S for each of the cases in Table 1. In what follows, we denote by q the number of variational parameters per unit of p. While it is clear that qp must be greater than dim S, the circuit must be also be deep enough so that correlations are able to propagate across the lattice [62]. A consequence of this is thatp ≥ L/2 . We computep numerically by randomly generating Hamiltonians in u and verifying that their ground states are prepared to numerical precision. The quantity represents the number of parameters in the circuit that exceeds dim S. We find, in cases where periodic boundary conditions (PBC) are employed, or where the sitedependent protocol is used, thatp = L/2 , saturating the aforementioned lower bound. For the remaining case, which corresponds to the site-independent protocol using open boundary conditions (OBC),p proved to be unfeasible to determine numerically in a precise manner due to a significantly higher number of local minima.
In the site-independent case with periodic boundary conditions, qp = dim S, which suggests an exact parameterization of S. In contrast, in the sitedependent case with PBCs, there are qp − dim S = L redundant parameters atp. One can, however, do away with them by removing the last e −i j θ p,Z Zj layer from this circuit, while still being able to prepare all states in S. Thus, in this case, one can also obtain an exact parameterization. By abuse of terminology, we will refer to the behavior at circuit deptĥ p as the exactly parameterized regime, regardless of whether qp = dim S. A consequence of having an exact parameterization of S is that, when the associated angles are appropriately restricted, global minima exist and are unique.

Effect of Symmetries and Locality on the Optimization Landscape
We proceed to study the hardness of the optimization and the characteristics of the associated landscape when running a variational algorithm using the siteindependent, Eq. (13), and site-dependent, Eq. (14), protocols. We work with PBC, and target the ground state of two models: 1. The critical transverse field Ising model [41] (13), and site-independent, Eq. (14), protocols. The structure of the Lie algebras u corresponding to each protocol is given in Appendix B; the entries in this table refer to the respective basis. As outlined in the main text, these are used to analytically deduce U, the space of unitaries that each protocol can generate, and S, the space of states that each protocol can prepare. We assume that the initial state is a FGS of a given parity respecting the symmetries of the circuit.
The first is a well-known quantum-critical model in condensed matter physics [24], possessing a ground state whose entanglement entropy diverges logarithmically with system size [13]. The second Hamiltonian is obtained by sampling at random out of all the ones for which the ground state is possible to prepare with both protocols. As mentioned at the end of Section 2.3, this is characterized by the Lie algebra corresponding to each protocol; in practice, the algebra of the site-dependent one contains that of the site-independent, and the latter, when using PBC, consists of all quadratic Hamiltonians satisfying Eqs. (11)- (12). These Hamiltonians are sampled by directly generating entries in h ij using a normal distribution with mean equal to zero and standard deviation equal to one, following these constraints. Moreover, below we use a Z-polarized state as the initial state of the protocol. The classical minimization is performed using the BFGS optimization algorithm; though other optimizers such as Nelder-Mead and conjugate gradient were checked and the behavior obtained was qualitatively the same. Figure 2 shows the optimization traces after classically optimizing the algorithm. We compare two cir-cuit depths: p = L/2 [41,62,100], which we have determined to be the minimum depth for which the protocol reaches maximum expressibility, and p = L 2 /4, well into the overparameterized regime (as we quantify in Section 3.3), where the redundancy in parameters is known to greatly reduce the computational cost of the optimization [49,50,53,54]. We defer a discussion of the latter for Section 3.3.
We observe that changing the target state and the protocol employed can drastically alter the characteristics of the optimization. In particular, we identify a "staircase" pattern, signaling a harder optimization problem, which is discernible at higher system sizes. It emerges both when employing the site dependent protocol to target the Ising model [ Figure 2(a)], and when using the site independent protocol to target generic quadratic Hamiltonians [ Figure 2(d)]. We discuss and propose a mechanism for this phenomenon in Appendix C, where we see that the cost function is not able to distinguish the target state from other states in the Hilbert space, even if they are orthogonal. It was argued in [4] that this behavior is not possible when the barren plateau phenomenon is present; here, we see that it can represent an intermediate behavior of the landscape as the size of the system increases and gradients begin to vanish (we study the scaling of the gradients in Appendix A). The efficient classical simulation of FGS is pivotal in observing it, as it is not as evident in smaller system sizes.
The only case susceptible to local minima is the site independent one at p = L/2 when targeting generic symmetric quadratic Hamiltonians [ Figure 2(d)]. There, the optimization is highly sensitive to the initial condition, and the number of iterations to converge, along with the value found at the minimum, can vary drastically. This starkly contrasts with the other cases in Figure 2 -in particular, the one where we target the Ising model using the same protocol [ Figure 2(b)]. While the average number of iterations to convergence is approximately the same in both cases, the former has a high standard deviation, as can be seen in Figure 5. Thus, using a protocol with less symmetry produces more consistent results which are not susceptible to local minima. This shows that imposing more symmetries may not always be desirable, as it may have an unpredictable effect in the optimization landscape and can result in local minima.
We now study how different properties of the target Hamiltonian can give rise to the phenomenology we previously observed and thereby explain Figure 2. One obvious property that distinguishes the Ising Hamiltonian, Eq. (17), from that of the random Hamiltonian in Eq. (18) is the presence of locality in the former case. Locality of the target Hamiltonian is known to influence whether the optimization associated to a quantum circuit will feature barren plateaus, with non-local terms presenting exponentially vanish- , the cost initially undergoes a slow, but smooth, decrease, before sharply dropping when the state is prepared; when the target state is a generic symmetric FGS (d), the staircase pattern is again visible, this time also accompanied by local minima. This behavior is highlighted in Figure 9 in Appendix C. After increasing the depth of the circuit into the overparameterized regime, the differences in optimization between states vanish, and the cost function decrease becomes exponential with no local minima present.
ing gradients [15]. Locality can also have an influence below system sizes at which barren plateaus appear, and it has been argued that long-range interactions in the target Hamiltonian make the optimization harder [89], resulting in higher values of the cost function at the optimum and requiring more iterations to converge. In our case, we will see that the influence of the locality of the target Hamiltonian on optimization depends on the constraints of the protocol being used. In particular, we will show that the site-independent and the site-dependent protocol behave differently in this respect, which explains the observed differences in the optimization. We use three families of models to quantify how the locality of the target Hamiltonian affects the hardness of the optimization: 1. A special type of a long-range Ising Hamiltonian: where α describes exponentially decaying interactions in a lattice. The choice of this Hamiltonian is motivated by the fact that its ground state can be expressed in terms of free fermions for any α, unlike the related models with power-law decaying interactions recently studied in Refs. [42,89].

(k + 2)-local, symmetric, quadratic Hamiltonians
which are derived from the randomly-generated generic symmetric quadratic Hamiltonians in Eq. (18) by setting h jl = 0 for any pair of Majoranas at a distance ≥ 2(k + 2).

A cluster Ising model at criticality [21, 70]
for which the ground state in one of the gapped phases is a symmetry-protected topological state [78].
In the two latter models, interactions are strictly limited to sites at most k + 2 sites away, while in the first model they are exponentially suppressed. Figure 3 compares the effect of locality on the optimization. We vary the parameters controlling localization of the couplings in each of the models Eqs. (20)- (22), and we measure the success probability in the site-independent case or the number of iterations to converge in the site-dependent case. The success probability is defined as the ratio between the number of random initializations that resulted in the cost function dropping below numerical precision (and thus the target state being successfully prepared) versus the total number of initializations. This measure was not used as a benchmark for the site-dependent protocol, as this protocol is not susceptible to getting trapped in local minima, thus the success probability is always equal to one regardless of the locality of the Hamiltonian.
We see in Figure 3 that the more non-local the target Hamiltonian is, the lower the success probability is in the site-independent protocol. Surprisingly, however, we see that the more non-local the target Hamiltonian is, the lower the number of iterations is to converge is the site-dependent case. Both statements are verified for all the models introduced above. Thus, while locality makes it easier to prepare the target state using site-independent protocol, it makes the site-dependent protocol harder to optimize. We conclude that, on the one hand, the symmetry constraints in the site-independent protocol cause nonlocality in the cost function to drive the optimization into difficult regions that trap it in local minima. The site-dependent case, on the other hand, is free to explore the entire manifold of FGS and bypass these traps, and non-local terms lead the optimization to converge faster, consistent with Figure 1

Overparameterized regime
It has been pointed out [49,103] that taking the circuit depth to be very large, the optimization associated with Eq. (1) becomes considerably easier -a phenomenon dubbed overparameterizaton. The onset of the overparameterized regime has been argued to correspond the circuit depth at which the Quantum Fisher Information Metric saturates at every point θ in the optimization landscape [37,54]. This is equivalent to the circuit depth at which an increase in p does not lead to an increase in the states that can be prepared by the variational circuit Eq. (2), i.e., the circuit depth corresponding top as defined in Section 2.2. We numerically confirm this to be the case for all the cases discussed at the end of Section 3.1.
We perform a scaling analysis of the number of iterations that the optimizer takes to prepare the state as the depth of the circuit increases well into the overparameterized regime. We find that, as depth increases, the average number of iterations to converge initially suffers a large initial decay, until it slows down and saturates at p ∼ L 2 , i.e., no further increase in circuit depth provides a decrease in the average number of iterations to converge. We observe this trend consistently between different optimizers and different system sizes, the latter shown in Figure 4, where an exponential decay is seen. Further, by comparing how the average number of iterations the optimizer takes to converge scales with lattice size, both when the circuit depth is equal top and into the overparameterized regime, we see that what is initially a polynomial scaling turns into a linear scaling with lattice size -see Figure 5. From the above, we note that the overparameterized regime effectively represents a shift in the workload from the quantum computer to the classical one and vice-versa, as increasing the number of parameters makes the classical optimization easier, but the preparation of the state in the quantum computer harder given the increased circuit depth (and corresponding time to run the circuit, see Appendix D). Thus, as noise levels in a device decrease, increasing the depth of the circuit allows variational algorithms to take immediate advantage of these advancements. This has the caveat that when the number of parameters passes a certain threshold, the overhead associated with certain algorithms, such as BFGS, exceeds the advantage obtained from overparameterizing the . circuit; we quantify this in Appendix D. In this case, algorithms designed to handle a large number of parameters, such as ADAM or stochastic gradient descent, should be employed instead. Since in the site dependent case each circuit layer has ∼ L parameters, the behavior we have just described in terms of circuit depth p ∼ L 2 corresponds to ∼ L 3 parameters in this case. As all quantities in Table 1 scale quadratically or linearly with system size, the phenomenon of overparameterisation can neither be explained by the saturation of the manifold of preparable states nor by the saturation of the manifold of unitaries. In what follows, we propose and test an explanation for gradient based optimizers in terms of a change in the very properties of the parameterisation of the manifold as the circuit depth increases.
A gradient based optimizer is an algorithm that, given an initial condition θ 0 , iterates the following update function until it converges, that is, it can not find a value of η, called the learning rate, such that the update reduces the cost function e. The matrix A is a bias that provides extra information to the algorithm. It can be the inverse of the Hessian H −1 in the case of Newton based methods (or an approximation of it as in the case of quasi-Newton methods such as BFGS), or the inverse of the metric of the manifold being optimized over, as is is done in e.g. Quantum Natural Gradient descent methods [85]. Importantly, the gradient of a function is a linear local approximation of the function at that point. While that means that, if ∇e = 0, there is a value of η such that e(θ i+1 ) < e(θ), it does not offer any real guarantee about the actual change Here, we examine how good this local approximation is as the optimization progresses, both when the circuit depth is equal top and in the overparameterized regime. We run the BFGS algorithm, and the learning rate is picked on a per-iteration basis by using the strong Wolfe conditions, an established heuristic based on a minimum descent criterion [69]. This is a quasi-Newton method, and so A = H −1 will be an approximation to the inverse of the Hessian. In Figure 6, we plot which quantifies how much of the variation in the cost function can be attributed to the local approximation given by the gradient at the ith iteration. We see that in the overparameterized regime, most of the variation in the cost is accounted for by this approximation, and that it concentrates around an average ofk leading to the exponential decay seen in Figure 2 following the equation ∂e/∂θ =ke. We conclude that the overparameterized regime leads to parameterizations that are more amenable to optimization, as they capture the variation in the cost for longer distances in the parameter space (see panel 3 in Figure 1). achieve maximum expressibility. The efficient classical simulation of these states was employed to systematically study the optimization associated to these protocols. We observed that decoupling the angles of the protocol makes the preparation of non-local states easier, and of local states harder, which is the opposite of what is observed when the angles are coupled. We argued that this is due to the symmetries in the system constraining the features available to the optimizer. Further, we studied in detail the overparameterized regime, exploiting the larger system sizes and circuit depths accessible to us, where we find that the number of iterations to converge to the solution scales linearly with system size. Moreover, we found that the number of iterations to converge to the solution decreases exponentially with the depth of the circuit, until it saturates at a depth which is quadratic in system size. Finally, we observed that the improvement in the optimization can be explained in terms of of better local linear approximations provided by the gradients.
Beyond elucidating the current knowledge on the interplay between symmetry and optimization, and furthering the understing on the overparameterized regime of optimization, our work can serve as the basis for a benchmarking scheme for the implementation of variational algorithms in quantum computers. These have already been performed using free states such as, e.g., Majorana zero modes [87] or the ground state of the Ising model [16]; this scheme could potentially be leveraged into error correcting methods [12]. Moreover, it provides a framework to better understand the theory behind the preparation of free states using variational algorithms, already studied in models such as the Ising model [22,41], the Kitaev model in the exactly solvable limit [45], or the cluster model [70]. Furthermore, while there are established algorithms to build circuits that prepare FGS [47,51], these require a full description of the corresponding covariance matrix; a variational approach is relevant where this structure is not known beforehand e.g. when approximating interacting states [61,70] or maximizing a quantity of interest such as magic [39,72]. Finally, by fully characterizing these circuits on 1D lattices, we open the possibility to describe more complex graphs in terms of simpler ones following a divide-and-conquer strategy [30,111].

A Variance of gradient
Here, we study how the variance of the gradient with respect to the center angle scales with the lattice size and the circuit depth. This variance is known to capture the phenomenon of barren plateaus, as it provides an upper bound for the magnitude of the gradients across the optimization landscape [43,63]. The scaling with respect to the center angle is taken to be representative of the scaling with respect to the other angles [4]. Figure 7 illustrates the scaling of this variance with the lattice size, expressed in terms of the dimension of the Lie algebra, following Ref. [53] which conjectured that the variance of the gradient is inversely proportional to this dimension. We see that while this seems to hold in general, it depends on the circuit depth and the state under peparation. When preparing the Ising model, increasing the circuit depth changes this proportionality by a constant factor; while, curiously, when preparing generic FGS with the site dependent protocol, it is independent of the circuit depth. Note also that when the circuit enters the overparameter-ized regime, for instance when targeting the Ising model in the site-independent case at p = 7L, this relation can break down as the variance of the gradient saturates at high circuit depths (see next paragraph). Finally, we note that there are exceptions to this relation; in particular, we note that when preparing a generic FGS using the site independent protocol, the gradient seems to oscillate around a constant value of ∼ 10 without decaying as the system size increases. Figure 8 illustrates the scaling of the variance of the gradient with circuit depth. We see that when targeting the Ising model, there is an initial drop in this variance, which then stabilizes to a fixed value. In contrast, when preparing a generic FGS, the variance almost immediately converges to this stable value, particularly in the site-dependent case.

B Lie algebra structure
Here, we deduce the structure of the Lie algebras associated with the protocols in Eqs. (13)- (14) for both OBCs and PBCs. We do so by deriving a basis for the algebras generated by the aforementioned sets when unrestricted to any symmetry sector. Then, we restrict these bases to a fixed parity symmetry sector, and see how this affects the structure and dimensions of the algebra. Note that, for notational simplicity, throughout this section we often disregard signs when converting from spin to Majorana operators when it does not influence the generated Lie algebras.

with OBC is
and has dimension L(2L − 1).

with PBC is
where and this algebra has dimension 2L(2L − 1).

Lemma 2. The Lie algebra generated by
and this algebra has dimension L 2 [53].

with PBC is
and has dimension 3L − 2.
The proofs for these statements follow by inductively taking the brackets of the generators of these algebras. Here, we provide a proof for the structure of d OBC ; the others are derived in a similar fashion.
L =⇒ L + 1: Assume the Lemma holds for L. Then, define and let R a,b be the Lie algebra generated by G a,b . We must prove that R 1,L+1 = L 1,L+1 .
By induction hypothesis R 1,L = L 1,L . Using this, and the definition of Lie algebra generators, we obtain L 1,L ⊆ R 1,L+1 . Since it is easy to prove that and iT i,j , iZ k ∝ (δ ik + δ jk )iT i,j , we conclude that , and Z L+1 ∈ G 1,L+1 . Thus it must be the case that R 1,L+1 = L 1,L+1 .
Since, from the above, L 1,L+1 = L 1,L ∪ {T k,L+1 } k=1,...,L ∪{Z L+1 }, and since these sets are disjoint, using the induction hypothesis, the dimension of Noting that every element of the generators of the Lie algebras commute with P , we now state the structure of these algebras restricted to each of the parity symmetry sectors.
We first note that, as can be seen in Lemma 1 for the generators D, when restricting to a parity sector, the algebra in the OBC case remains unchanged, while the algebra in the PBC case is cut in half, and is equal to former. Hence: and it has dimension L(2L − 1).
As for the case of the set of generators I, we see that the algebra with OBC remains unchanged when restricted to a parity sector. Hence and it has dimension L 2 .
Finally, the same set of generators with PBC yields and it has dimension 3L/2 .

C Mechanism behind staircases
In Section 3.2, we described a pattern that we dubbed "staircase", where the optimizer gets stuck and the value of the cost function undergoes very little variation for a number of iterations until it sharply drops to a new plateau; we highlight this phenomenon more clearly in Figure 9. Here, we offer an explanation for this observation. In Figure 10, we plot the overlap of the state along the optimization with the eigenstates of the target Hamiltonian. We notice that the overlap of the state under preparation with the target Hamiltonian is orders of magnitude higher than with other excited eigenstates. The dynamics of state preparation is thus dominated by a competition between the ground state and the first excited state. We propose that the staircase plateaus we observe follow a similar mechanism: in each plateau, there is a state in the Hilbert space (akin to the first excited state in the previous description) that fully captures the features that the cost function struggles to distinguish from those of the ground state in each of these plateaus.

D Scaling of Hessian and optimization
In Figure 11, we examine the effect of increasing the circuit depth on the total time taken to run an op- timization and on the time it would take to prepare these states on a quantum simulator (measured by the sum of the angles). We see that despite increasing the circuit depth into the overparameterized regime making the optimization easier, the sum of the angles in the protocol grows linearly with the circuit depth, increasing the time to run the circuit on a quantum simulator and making it more prone to errors. Furthermore, we see that despite the number of iterations to converge decreasing into the overparameterized regime, when one looks at the actual time taken to run the optimizer, there is an inflexion point where it first goes down and then starts increasing again. This is due to the algorithm (BFGS) used, which stores an approximation to the Hessian; as the size of the Hessian increases, the computational cost associated to storing and manipulating it dominates the computational time. Thus, while it is feasible to a larger class of optimization algorithms at lower system sizes, as one increases the circuit depth, one has to switch to algorithms specialized to dealing with a large number of parameters e.g. ADAM or stochastic gradient descent.  Figure 10: Overlaps of lowest excited states of the target Hamiltonian with the state under preparation |ψ(θi) . Legend indicates the index of an eigenstate with which overlaps were taken, with 0 indicating the ground state, 1 indicating the first excited state, etc., while the ranges indicate a sum of the overlaps with the corresponding eigenstates. We see a high overlap with the first excited state throughout the optimization, indicating that the cost function cannot easily distinguish this state from the ground state. The siteindependent protocol was used and the Ising model (17) was targeted at the exactly parameterized depth p = L/2 for L = 16.  Figure 11: Different quantities characterizing the hardness of the optimization with increasing circuit depth. On the left, we plot the mean of the logarithm of the number iterations to converge; the center plot depicts the mean of the sum of all the angles of the protocol, where periodicity is appropriately taken into account; on the right, the average of the logarithm of the total computational time is shown. Generic symmetric quadratic Hamiltonians were targeted, and results were averaged over 5 random states and 5 random initializations per state. Filled line corresponds to the site-dependent protocol, while dotted line represents the site-independent protocol; these two cases essentially display the same behavior. Periodic boundary conditions were used, and the black vertical line indicates p = L/2, the depth at which the circuit is exactly parameterized. These results were obtained on an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz.