Non-trivial symmetries in quantum landscapes and their resilience to quantum noise

Very little is known about the cost landscape for Parametrized Quantum Circuits (PQCs). Nevertheless, PQCs are employed in Quantum Neural Networks and Variational Quantum Algorithms, which may allow for near-term quantum advantage. Such applications require good optimizers to train PQCs. Recent works have focused on quantum-aware optimizers specifically tailored for PQCs. However, ignorance of the cost landscape could hinder progress towards such optimizers. In this work, we analytically prove two results for PQCs: (1) We find an exponentially large symmetry in PQCs, yielding an exponentially large degeneracy of the minima in the cost landscape. Alternatively, this can be cast as an exponential reduction in the volume of relevant hyperparameter space. (2) We study the resilience of the symmetries under noise, and show that while they are conserved under unital noise, non-unital channels can break these symmetries and lift the degeneracy of minima, leading to multiple new local minima. Based on these results, we introduce an optimization method called Symmetry-based Minima Hopping (SYMH), which exploits the underlying symmetries in PQCs. Our numerical simulations show that SYMH improves the overall optimizer performance in the presence of non-unital noise at a level comparable to current hardware. Overall, this work derives large-scale circuit symmetries from local gate transformations, and uses them to construct a noise-aware optimization method.


Introduction
The era of Noisy Intermediate Scale Quantum (NISQ) [1] computing has led to the emergence of novel algorithmic paradigms. Arguably, the leading role has been played by Parametrized Quantum Circuits (PQCs), which are exploited for both Variational Quantum Algorithms [2,3,4,5,6,7,8,9,10,11,12,13,14] and Quantum Neural Networks [15,16,17,18,19]. Training PQCs involves a hybrid quantum-classical optimization loop. Typically, the problem is encoded in a cost (or loss) function that is ideally efficient to evaluate on a quantum computer, but computationally expensive for a classical one. In practice, the cost function is estimated via measurements on a quantum computer which are usually post-processed on a classical device. While the quantum computer is used to evaluate the cost, a classical optimizer updates some (usually continuous) parameters associated with the quantum operations. PQCs with fixed gate structure are often referred to as a variational ansatz.
High performance classical optimizers are crucial to successfully train PQCs. To aid in optimizer selection and development, a fair amount of work has gone into determining the nature of quantum variational cost landscapes [20]. Important contributions include the development of analytical expressions for gradients of all orders [21,22,23,24,25], bounds on those derivatives [26], and even explicit functional forms for the expectation values computed with PQCs [27]. In addition, some light has recently been shed on the scaling of the gradient of quantum cost functions through a result known as barren plateaus [28,29,30,24,31,32,33]. This demonstrates that the landscape flattens exponentially with problem size for deep, unstructured PQCs, and also for shallow PQCs with global cost functions.
Even if one manages to avoid these barren plateaus, there is an additional difficulty of optimizing in the presence of the hardware noise that defines NISQ devices [34]. Hardware noise is expected to modify the cost landscape, and indeed it was recently shown to produce a novel kind of barren plateau whose impact increases with PQC depth [35]. Furthermore, while some models of hardware noise have been shown to leave the optimal parameters unchanged [36], this does not hold in general [37].
These results collectively reveal that the optimization of noisy PQCs presents many novel and unexpected challenges that must be addressed. This has spawned the field of quantum-aware optimizers, where researchers are developing classical optimizers that are specifically tailored to the unusual landscape issues in the quantum setting. Examples include quantum natural gradient [38,39], sequential function fitting [40], shot-frugal stochastic gradient descent [26,41,42], landscape modeling [43], and others [44]. Many of these optimizers are good at finding a local minimum, which has been shown to be accelerated by gradient information [45]. However, there remains the question of how to escape or move between such minima to find the global minimum.
In this work, we present a technique for training PQCs that we call Symmetry-based Minima Hopping (SYMH, pronounced "Sim"). As the name suggests, SYMH is a method for hopping between local minima that exploits underlying symmetries in PQCs. The SYMH technique can be combined with other optimizers, such as those in Refs. [38,39,40,26,41,42,43,44], to construct optimizers that search for minima that achieve lower costs by lowering the impact of noise. In this sense, our method is complementary to previous work, as classical local optimizers are easily integrated into the SYMH framework.
At the heart of our work is a novel understanding of symmetries and symmetry breaking in PQCs. In particular, we analytically prove that given some non-restrictive conditions on the ansatz, the cost landscape must have a certain periodicity, which gives rise to a large degeneracy of minima in the absence of noise. This is schematically shown in Fig. 1(a). In the noise-free setting, one can move between these degenerate minima with pulses that rotate the parameters by specific angles.
However, our second analytical result is that the symmetry can be broken by certain types of noise, specifically, non-unital noise (such as amplitude damping) and coherent noise. As a consequence, the degeneracy of various minima is lifted by noise, leading to landscapes like in Fig. 1(b). We denote this phenomenon noise-induced breaking of symmetries (NIBS).
The NIBS phenomenon, in turn, allows us to construct an optimizer where one exploits circuit symmetries to hop between local minima valleys. Because these valleys are no longer degenerate, such hopping potentially leads to lower cost values and mitigates the effect of noise by leading to better solutions. This is the idea behind our Symmetry-based Minima Hopping technique.
In what follows, we first lay out our general framework in the next section. We then discuss symmetries in PQCs in Section 3.1. In Section 3.2 we present a method to move between these symmetries that we call the σ-Pulse method. Section 4 considers the impact of noise and presents our results on the NIBS phenomenon. Section 5 presents our SYMH optimizer. Finally, Section 6 shows our numerical implementations of SYMH, and demonstrates that SYMH leads to a significant improvement in the optimizer performance for various problems under realistic noise models.

Figure 1: (a) Schematic diagram of a noise-free cost landscape. Here the cost function is periodic under some translation in the parameters of the ansatz V(θ), which leads to the global minima being degenerate. By randomly initializing and optimizing θ one can converge to any of the degenerate minima. (b) Schematic diagram of the noisy cost landscape. The presence of quantum noise breaks the periodicity in the landscape, so that some of the global minima that were degenerate in the noiseless scenario become local minima. Now, when optimizing θ one can get trapped in a local minimum.

General Framework
In this section we introduce the general framework for our work. Specifically, we consider here a generic quantum machine learning task where the goal is to minimize a parametrized cost function of the form
$$C(\theta) = \sum_{x=1}^{S} f_x\big(\rho_x, V(\theta)\big). \tag{1}$$
Here, $\mathcal{S} = \{\rho_x\}_{x=1}^{S}$ is a training set of input states, and $f_x$ is a function that determines the task at hand and which can be different for each input state. Moreover, $V(\theta)$ is a PQC, and $\theta$ are trainable parameters.
The cost function in (1) is in fact quite general and includes as special cases the cost functions for many important VQAs and QNNs. For example, in the Variational Quantum Eigensolver [3] we have $S = 1$ and the cost $C = \mathrm{Tr}[H V(\theta)\rho V^\dagger(\theta)]$ is the expectation value of a given Hamiltonian $H$. Alternatively, in a binary classification problem, the cost can be expressed as the mean-squared error $C = \frac{1}{2K}\sum_x [y_x - \tilde{y}_x(\rho_x, V(\theta))]^2$, with $y_x$ the true label, and $\tilde{y}_x(\rho_x, V(\theta))$ the predicted label for each state in the training set [16].
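We here consider PQCs $V(\theta)$ that can be expressed as the product of $L$ unitaries as
$$V(\theta) = \prod_{l=1}^{L} V_l(\theta_l), \tag{2}$$
where $\theta = \{\theta_l\}_{l=1}^{L}$ is a set of continuous parameters. Each unitary $V_l(\theta_l)$ can in turn be expanded as
$$V_l(\theta_l) = \prod_{m} R_\mu(\theta_l^m)\, W_l^m, \tag{3}$$
where $R_\mu(\theta_l^m) = e^{-i\sigma_\mu\theta_l^m/2}$ is a single-qubit rotation, and where $\sigma_\mu \in \{X_j, Y_j, Z_j\}_{j=1}^{n}$ is a Pauli operator on qubit $j$. Moreover, $W_l^m$ denote unparametrized gates. For simplicity, we consider here that the $W_l^m$ are CNOTs or identities. That is, $W_l^m \in \{\mathbb{1}, \mathrm{CX}_{ij}\}$ with $(i,j) \in \mathcal{C}$, where $\mathrm{CX}_{ij}$ is a CNOT with control qubit $i$ and target qubit $j$, and where $\mathcal{C}$ is a graph of the qubit connectivity. Finally, take $M$ to be the total number of controllable parameters in the PQC.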
The PQC in (2)-(3) includes as special cases many ansatzes widely used in the literature. For instance, if $V(\theta)$ is a hardware-efficient ansatz [46], then the set of available CNOTs is determined by the device connectivity. In addition, $V(\theta)$ also includes as special cases the Quantum Alternating Operator Ansatz (QAOA) [5,47], and the Unitary Coupled Cluster (UCC) [48,49,50] ansatz. Specifically, when implementing the QAOA or a UCC ansatz one usually performs first-order Trotter approximations of unitaries of the form $e^{-i\theta H}$, where $H$ is a Hermitian operator with an efficient decomposition in the Pauli basis. The latter then leads to a PQC that fits into the framework presently considered [35].
Given a PQC V (θ), we define its buffered version as follows.
Definition 1 (Buffered PQC). Let $V(\theta)$ be a PQC as in (2) and (3). We define the buffered version of this PQC as the gate sequence
$$V_B(\theta, \gamma) = U_B(\gamma)\, V(\theta),$$
where $U_B(\gamma)$ is the so-called buffer unitary given by
$$U_B(\gamma) = \bigotimes_{j=1}^{n} R_x(\gamma_j^{x})\, R_y(\gamma_j^{y}).$$
As shown in Fig. 2, the buffer unitary is simply given by a tensor product of single-qubit rotations around the x and y axes which are parametrized by the vector $\gamma$ of length $2n$.
We note that our main results, stated below, are valid for buffered PQCs. However, since V (θ) = V B (θ, 0) one can always trivially extend any PQC to its buffered version. That is, any PQC of the form (2)-(3) can be considered as a buffered PQC with trivial rotation angles in the buffer unitary. In addition, we also remark that our results will also hold for any PQC where V (θ) contains (at least) two single-qubit rotations about different axes in every qubit (not necessarily sequentially or in parallel). However, for the sake of simplicity in introducing the method, we consider the case where one appends a buffer unitary to the PQC.
Finally, we remark that in some practical settings the buffer unitary can be implemented virtually without any additional computational overhead. Whenever V(θ) acts immediately before measurement, the single-qubit rotations in the buffer layer can be absorbed into the measurement operator and executed classically by post-processing the measurement statistics. However, when the buffer unitary does not act immediately prior to the measurements, such as when only a portion of the circuit is buffered, then U_B(γ) must be included in the circuit.
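As an illustration of the absorption argument, the following minimal sketch checks numerically that measuring O after the buffer gives the same expectation value as measuring the rotated observable U_B† O U_B without the buffer. The one-qubit circuit, the angle values, and the ordering of the x and y rotations in the buffer are assumptions chosen for the example, not taken from the text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {"x": X, "y": Y, "z": Z}


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def expectation(obs, unitary, rho):
    """Tr[obs * U rho U^dagger], returned as a real number."""
    return float(np.real(np.trace(obs @ unitary @ rho @ unitary.conj().T)))


# Hypothetical toy circuit: V(theta) = R_z(theta) acting on the state |+><+|.
theta, gamma_x, gamma_y = 0.4, 1.1, -0.6
V = rot("z", theta)
U_B = rot("x", gamma_x) @ rot("y", gamma_y)  # assumed ordering: y first, then x
rho = np.full((2, 2), 0.5, dtype=complex)

# Cost with the buffer executed as part of the circuit.
cost_buffered = expectation(Z, U_B @ V, rho)

# Cost with the buffer absorbed into the measured observable.
rotated_obs = U_B.conj().T @ Z @ U_B
cost_absorbed = expectation(rotated_obs, V, rho)

# A buffer with all angles set to zero is the identity, so V_B(theta, 0) = V(theta).
trivial_buffer = rot("x", 0.0) @ rot("y", 0.0)
```

The two costs agree by cyclicity of the trace; in practice the rotated observable would still have to be expressed in terms of measurable operators.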

σ-Pulse method for finding parameter symmetries in PQCs
In this section, we first discuss symmetries in PQCs, which lead to degeneracies in the cost function landscape. We then present a method to move between different sets of parameters that are symmetric by propagating Pauli gates throughout the circuit, which we therefore call the σ-Pulse method.

Symmetries in PQCs
As was recently pointed out in [37], different sets of parameters θ in a PQC can lead to the same unitary being produced. In order to further analyze this phenomenon, we first introduce the following definition:
Definition 2 (Parameter symmetries). Let $V(\theta)$ be a PQC. We say that two distinct sets of parameters $\theta$ and $\hat{\theta}$ are symmetric if $V(\theta)$ is equal to $V(\hat{\theta})$ (up to a global phase).
Let us here make two important remarks. First, note that Definition 2 implies that the structure of the circuit remains unchanged between $V(\theta)$ and $V(\hat{\theta})$, as no gates in the circuit are being added or replaced; only their parameter values differ. Second, we remark that these parameter symmetries naturally translate into cost function landscape degeneracies. That is, given two symmetric sets of parameters $\theta$ and $\hat{\theta}$ we have $C(\theta) = C(\hat{\theta})$.
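Note that there are many mechanisms which can lead to symmetries in θ. For instance, they can arise from the wrapping symmetry in a rotation, i.e., from the fact that for any single-qubit rotation we have $R_\mu(\theta_\mu) = R_\mu(\theta_\mu + 2\pi)$ (up to a global phase). Similarly, parameter symmetries can also be obtained from other types of mechanisms, such as commutation symmetries. Consider for example a two-qubit PQC composed of a CNOT preceded and followed by single-qubit rotations about the z axis on the first (control) qubit. Since these rotations commute with the CNOT, the total rotation angle can be redistributed between them. That is,
$$R_z(\theta_2)\,\mathrm{CX}_{12}\,R_z(\theta_1) = R_z\big((1-p)\theta_1 + (1-q)\theta_2\big)\,\mathrm{CX}_{12}\,R_z(p\,\theta_1 + q\,\theta_2)$$
for any $\theta_1$ and $\theta_2$, and for any $p, q \in (0, 1)$.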
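The two mechanisms mentioned above can be checked numerically. The sketch below verifies the wrapping symmetry up to a global phase, and one possible form of the commutation symmetry in which the total z-rotation angle on the control qubit is redistributed before and after the CNOT. The helper names and the specific redistribution with weights p and q are assumptions made for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# CNOT with the first tensor factor as control and the second as target.
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=complex)


def rz(theta):
    """R_z(theta) = exp(-i * theta * Z / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * Z


def equal_up_to_phase(a, b, tol=1e-9):
    """True if the unitaries a and b differ only by a global phase."""
    return abs(abs(np.trace(a.conj().T @ b)) - a.shape[0]) < tol


def circuit(first, second):
    """R_z(first) on the control, then CNOT, then R_z(second) on the control."""
    return np.kron(rz(second), I2) @ CX @ np.kron(rz(first), I2)


# Wrapping symmetry: shifting by 2*pi only changes the global phase.
wrapping_holds = equal_up_to_phase(rz(0.9), rz(0.9 + 2 * np.pi))

# Commutation symmetry: only the sum of the two angles matters.
t1, t2, p, q = 0.3, 1.2, 0.25, 0.6
original = circuit(t1, t2)
redistributed = circuit(p * t1 + q * t2, (1 - p) * t1 + (1 - q) * t2)
commutation_holds = np.allclose(original, redistributed)
```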
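In this work we provide a general theory to analyze a class of non-trivial discrete parameter symmetries which were first reported in [37] and which we generalize here. Specifically, we introduce a method for finding and characterizing the following symmetries:
Definition 3 (σ-Pulse symmetries). Let $V_B(\theta, \gamma)$ be a buffered PQC. We say that two sets of parameters $\{\theta, \gamma\}$ and $\{\hat{\theta}, \hat{\gamma}\}$ are σ-Pulse symmetric if
$$\hat{\theta}_j = (-1)^{p_j}\theta_j + q_j\pi, \qquad \hat{\gamma}_j = (-1)^{p'_j}\gamma_j + q'_j\pi \tag{6}$$
for some $p_j, q_j, p'_j, q'_j \in \{0, 1\}$, such that $\{\theta, \gamma\}$ and $\{\hat{\theta}, \hat{\gamma}\}$ are symmetric.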
In the next section we present the so-called σ-Pulse method for determining σ-Pulse symmetries.

σ-Pulse method
Here we introduce the σ-Pulse method, which allows us to start from a set of parameters $\{\theta, \gamma\}$ of a buffered PQC and obtain a second set $\{\hat{\theta}, \hat{\gamma}\}$ which is σ-Pulse symmetric to $\{\theta, \gamma\}$ according to Definition 3. This method is based on three basic steps: (1) the creation of the so-called σ-Pulses, (2) the propagation of said pulses through the circuit, and (3) the absorption of the σ-Pulses in the buffer unitary. The mathematical rules for these steps (described below in detail) are schematically shown in the ZX-calculus [51] notation of Fig. 3(a).

Creation of σ-Pulses
Figure 3: (a) ... and for a CNOT gate. We additionally introduce notation for the σ-Pulses (rotation of angle π), and for rotations of angles ±π/2. (b) Rules for creating and absorbing σ-Pulses. (c) Non-trivial commutation rules for σ-Pulses. We remark that in panels (b) and (c), the rules for a $(i\sigma_y)$ pulse can be derived from those of $(i\sigma_x)$ and $(i\sigma_z)$. In addition, in those panels an equal sign indicates that the unitaries are equal up to a global phase.

The first step of the σ-Pulse method is based on the fact that any single-qubit rotation around a principal axis satisfies the following identity
$$R_\mu(\theta) = e^{-i\theta\sigma_\mu/2} = e^{-i(\theta+\pi)\sigma_\mu/2}\, e^{i\pi\sigma_\mu/2} = R_\mu^{01}(\theta)\,(i\sigma_\mu), \tag{7}$$
where we defined the shifted rotations
$$R_\mu^{pq}(\theta) = R_\mu\big((-1)^{p}\theta + q\pi\big) \tag{8}$$
for $p, q \in \{0, 1\}$ and $\mu \in \{x, y, z\}$. As shown in Fig. 3(b), Eq. (7) implies that the angle of a single-qubit rotation in $V(\theta)$ can be shifted by π at the expense of adding to the circuit a $i\sigma_\mu$ gate, i.e., at the expense of creating a σ-Pulse. When Eq. (7) is employed to generate a pulse, we say that the gate is a generator of a primary pulse. Note that simply employing (7) changes the structure of the ansatz as we have added new gates to the circuit. For the structure of $V_B(\theta, \gamma)$ to be preserved, the σ-Pulses need to be propagated through the circuit towards $U_B(\gamma)$, where they can be absorbed.
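The creation rule can be verified directly. The sketch below checks, for each principal axis, that a rotation by θ equals a rotation by θ + π multiplied by the pulse iσ, with no leftover phase. The helper names and the test angle are illustrative choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


theta = 0.73
creation_holds = {}
for axis, sigma in PAULI.items():
    pulse = 1j * sigma                       # the sigma-Pulse i * sigma_mu
    shifted = rot(axis, theta + np.pi)       # rotation with its angle shifted by pi
    # The pulse commutes with a rotation about the same axis, so the order
    # of the two factors does not matter here.
    creation_holds[axis] = (np.allclose(rot(axis, theta), shifted @ pulse)
                            and np.allclose(shifted @ pulse, pulse @ shifted))
```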

Propagation of σ-Pulses
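Propagating the σ-Pulses through the circuit requires knowing how $i\sigma_\mu$ commutes with all other gates in the ansatz. The commutation of a $\sigma_\mu$-Pulse through a single-qubit rotation $R_\nu(\theta)$ is given by
$$(i\sigma_\mu)\, R_\nu(\theta) = R_\nu\big((-1)^{1-\delta_{\mu\nu}}\theta\big)\,(i\sigma_\mu). \tag{9}$$
Equation (9) shows that if $\mu \neq \nu$, the commutation of a pulse with a rotation leads to the angle of said rotation picking up a minus sign. Moreover, the following identities provide the non-trivial commutation rules between a pulse and a CNOT with control qubit $i$ and target qubit $j$:
$$\mathrm{CX}_{ij}\,(i\sigma_x)_i = -i\,(i\sigma_x)_i\,(i\sigma_x)_j\,\mathrm{CX}_{ij}, \qquad \mathrm{CX}_{ij}\,(i\sigma_z)_j = -i\,(i\sigma_z)_i\,(i\sigma_z)_j\,\mathrm{CX}_{ij}, \tag{10}$$
while $\sigma_z$ pulses on the control qubit and $\sigma_x$ pulses on the target qubit commute with the CNOT. These commutation rules, which we illustrate in Fig. 3(c), show that commuting a σ-Pulse on the control (target) qubit through a CNOT can lead to the creation of a secondary pulse on the target (control) qubit, plus an unobservable global phase. For the gate structure of $V_B(\theta, \gamma)$ to remain unchanged, the secondary pulses also need to be propagated towards the measurement, which in turn means that they can create additional secondary pulses.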
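The propagation rules are standard Pauli commutation identities and can be checked numerically. The following sketch verifies the sign flip of a rotation angle when a pulse about a different axis passes through it, and the creation of a secondary pulse when an x pulse on the control or a z pulse on the target passes through a CNOT. The matrix conventions (first tensor factor is the control) are assumptions of this example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {"x": X, "y": Y, "z": Z}
# CNOT with the first tensor factor as control and the second as target.
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=complex)


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


theta = 1.3
rotation_rule = True
for mu in "xyz":
    for nu in "xyz":
        sign = 1.0 if mu == nu else -1.0     # angle flips sign only when axes differ
        pulse = 1j * PAULI[mu]
        rotation_rule &= np.allclose(pulse @ rot(nu, theta),
                                     rot(nu, sign * theta) @ pulse)

# x pulse on the control creates a secondary x pulse on the target.
x_control_rule = np.allclose(CX @ np.kron(1j * X, I2),
                             -1j * np.kron(1j * X, 1j * X) @ CX)
# z pulse on the target creates a secondary z pulse on the control.
z_target_rule = np.allclose(CX @ np.kron(I2, 1j * Z),
                            -1j * np.kron(1j * Z, 1j * Z) @ CX)
# z on the control and x on the target pass through unchanged.
trivial_rules = (np.allclose(CX @ np.kron(1j * Z, I2), np.kron(1j * Z, I2) @ CX)
                 and np.allclose(CX @ np.kron(I2, 1j * X), np.kron(I2, 1j * X) @ CX))
```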

Absorption of σ-Pulses
Once all the primary and secondary pulses have been propagated to the buffer unitary, they can be absorbed by shifting the rotation angles in $U_B(\gamma)$ according to Eq. (11), where we use the definition of the shifted rotations $R_\mu^{pq}$ of (8). Here we remark that the minus signs on the right-hand side of (11) simply correspond to unobservable global phases. These absorption rules are shown in Fig. 3(b).
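To make the absorption step concrete, the sketch below checks two identities that follow from the creation and propagation rules: a pulse about the same axis as a buffer rotation is absorbed as a π shift (with a minus sign when written as +π), and a z pulse is absorbed by an x-y buffer through shifts and a sign flip. The ordering of the buffer (y rotation applied first, then x) is an assumption of this example, and the identities shown are illustrations rather than a transcription of the absorption equations of the text.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {"x": X, "y": Y, "z": Z}


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


gamma_x, gamma_y = 0.8, -0.35

# Same-axis absorption: R(gamma) (i sigma) = R(gamma - pi) = -R(gamma + pi).
same_axis_rule = all(
    np.allclose(rot(a, gamma_x) @ (1j * PAULI[a]), rot(a, gamma_x - np.pi))
    and np.allclose(rot(a, gamma_x - np.pi), -rot(a, gamma_x + np.pi))
    for a in "xyz"
)

# z pulse absorbed by the assumed buffer R_x(gamma_x) R_y(gamma_y):
# i Z = (i Y)(i X), so the y angle is shifted and flipped and the x angle is shifted.
buffer_with_pulse = rot("x", gamma_x) @ rot("y", gamma_y) @ (1j * Z)
absorbed_buffer = rot("x", gamma_x - np.pi) @ rot("y", np.pi - gamma_y)
z_pulse_rule = np.allclose(buffer_with_pulse, absorbed_buffer)
```

Both shifted angles have the form (−1)^p γ + qπ up to a multiple of 2π, which matches the structure of the σ-Pulse symmetric parameters.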

Parameter symmetries
Equations (9)-(11) provide the framework for determining symmetries in $V_B(\theta, \gamma)$ with the σ-Pulse method. Given a set of angles $\{\theta, \gamma\}$, one can select any number of rotations to generate primary pulses. Once the primary and secondary pulses are propagated and absorbed in the buffer unitary, we define $\{\hat{\theta}, \hat{\gamma}\}$ as the ensuing new set of angles. From Eqs. (7) and (11) it is straightforward to see that $\{\hat{\theta}, \hat{\gamma}\}$ and $\{\theta, \gamma\}$ are symmetric according to Eq. (6) in Definition 3. In Fig. 4 we explicitly show this procedure.
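The full create, propagate, and absorb procedure can be followed by hand on a small circuit. The sketch below uses a hypothetical two-qubit buffered PQC (not the circuit of Fig. 4): a layer of y rotations, a CNOT, a layer with a z rotation on qubit 0 and an x rotation on qubit 1, and an assumed x-y buffer. Shifting the first angle by π creates a y pulse on the control, which becomes a y pulse on qubit 0 and an x pulse on qubit 1 after the CNOT; the resulting symmetric parameters were derived manually from the rules above.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {"x": X, "y": Y, "z": Z}
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]], dtype=complex)  # control = qubit 0


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def equal_up_to_phase(a, b, tol=1e-9):
    """True if the unitaries a and b differ only by a global phase."""
    return abs(abs(np.trace(a.conj().T @ b)) - a.shape[0]) < tol


def buffered_pqc(theta, gamma):
    """Hypothetical buffered PQC; gamma = [x on q0, y on q0, x on q1, y on q1]."""
    layer1 = np.kron(rot("y", theta[0]), rot("y", theta[1]))
    layer2 = np.kron(rot("z", theta[2]), rot("x", theta[3]))
    buffer = np.kron(rot("x", gamma[0]) @ rot("y", gamma[1]),
                     rot("x", gamma[2]) @ rot("y", gamma[3]))
    return buffer @ layer2 @ CX @ layer1


theta = np.array([0.3, 1.1, -0.7, 2.0])
gamma = np.array([0.5, -1.2, 0.9, 0.4])

# Primary pulse generated by theta[0]; corrections obtained by propagation:
# theta[2] flips sign (y pulse through R_z), the y buffer angle on qubit 0 and the
# x buffer angle on qubit 1 absorb a pulse, and the y buffer angle on qubit 1 flips.
theta_hat = np.array([theta[0] + np.pi, theta[1], -theta[2], theta[3]])
gamma_hat = np.array([gamma[0], gamma[1] + np.pi, gamma[2] + np.pi, -gamma[3]])

U = buffered_pqc(theta, gamma)
U_symmetric = buffered_pqc(theta_hat, gamma_hat)
# Shifting theta[0] alone, without the corrections, changes the unitary.
U_shift_only = buffered_pqc(np.array([theta[0] + np.pi, *theta[1:]]), gamma)
```

Note that the gate structure is identical in all three circuits; only the angle values differ.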
Definition (3), and more specifically, Eq. (6), allows us to derive the following proposition, which is proved in the Appendix.

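Proposition 1. Let $V_B(\theta, \gamma)$ be a buffered PQC as in Definition 1. Then, for any set of parameters $\{\theta, \gamma\}$ there exist $2^M$ sets of σ-Pulse symmetric parameters $\{\hat{\theta}, \hat{\gamma}\}$ according to Definition 3. Each symmetric set can be characterized by a bitstring $\beta$ of length $M$. We denote as $\mathcal{B}$ the set of such bitstrings.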

Proposition 1 has several important implications.
First, it shows that each point in the cost function landscape is exponentially periodic, as the cost is symmetric over the parameter translation $\theta_j \to (\theta_j + \pi)$ for every $j$. This holds up to a sign change $\theta_k \to -\theta_k$ in some other parameters with $k \neq j$ and a correction rotation in the buffer layer parameters $\gamma$, due to the necessity to propagate and absorb the extra σ-Pulses. In particular, denoting as $\{\theta_{\mathrm{opt}}, \gamma_{\mathrm{opt}}\}$ a set of parameters that minimize the cost $C(\theta, \gamma)$, we have that the global minimum is $2^M$-fold degenerate.
In addition, Proposition 1 implies the next Corollary.
The proof of Corollary 1 follows from Proposition 1 by taking β = 0. This Corollary implies an exponential reduction of the effective hyperparameter space by restricting the domain of all angles in $V(\theta)$ from $[0, 2\pi)$ to $[0, \pi)$. Hence, all relevant features of the cost function landscape (including the global minima) can be found in $[0, \pi)^M$. We finally remark that the domain restriction in Corollary 1 is nontrivial as it does not arise from a wrapping symmetry in the rotation parameters (i.e., it does not arise from the fact that $R_\mu(\theta) = R_\mu(\theta + 2\pi)$). Instead this domain reduction arises from the σ-Pulse symmetries.
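The domain reduction can be illustrated on the smallest possible case. The sketch below takes a hypothetical one-parameter circuit R_x(θ) followed by an assumed x-y buffer, and folds any θ into [0, π) while compensating in the buffer angles so that the unitary is unchanged up to a global phase. The function name `fold` and the circuit are illustrative assumptions.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
}


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def equal_up_to_phase(a, b, tol=1e-9):
    """True if the unitaries a and b differ only by a global phase."""
    return abs(abs(np.trace(a.conj().T @ b)) - a.shape[0]) < tol


def circuit(theta, gamma_x, gamma_y):
    """R_x(theta) followed by the assumed buffer R_x(gamma_x) R_y(gamma_y)."""
    return rot("x", gamma_x) @ rot("y", gamma_y) @ rot("x", theta)


def fold(theta, gamma_x, gamma_y):
    """Map theta into [0, pi), absorbing the x pulse in the buffer angles."""
    theta = theta % (2 * np.pi)              # wrapping symmetry first
    if theta >= np.pi:                       # then one sigma-Pulse hop
        return theta - np.pi, gamma_x + np.pi, -gamma_y
    return theta, gamma_x, gamma_y


params = (4.0, 0.3, -1.1)                    # theta = 4.0 lies in [pi, 2*pi)
folded = fold(*params)
same_unitary = equal_up_to_phase(circuit(*params), circuit(*folded))
```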

Noise-induced lifting of the symmetries
In this section we analyze how noise affects the symmetries in $V_B(\theta, \gamma)$ and hence the degeneracies in the cost landscape. Our main results are presented in the form of two theorems, with Theorem 1 analyzing the effect of unital Pauli noise, and Theorem 2 the effect of non-unital Pauli noise. We recall that unital Pauli noise channels include $T_2$ processes (i.e., the dephasing channel) and depolarizing channels as special cases. On the other hand, non-unital Pauli noise channels include $T_1$ processes as a special case, i.e., the amplitude damping channel is a non-unital Pauli channel.

Unital Pauli noise
Consider the following definition:
Definition 4 (Unital Pauli noise model). We define the unital Pauli noise model as a process in which a unital Pauli channel acts after every layer of gates acting in parallel in $V_B(\theta, \gamma)$.
Here we recall that unital Pauli noise channels are completely positive trace-preserving maps $\mathcal{P}_U$ whose superoperator is diagonal in the Pauli basis. The action of $\mathcal{P}_U$ on an $n$-qubit Pauli operator $X^a Z^b$ is given by
$$\mathcal{P}_U(X^a Z^b) = p_{ab}\, X^a Z^b,$$
where $p_{00} = 1$, and where we assume that $-1 \le p_{ab} \le 1$ for all $a, b$. Here $a, b \in \{0, 1\}^{\otimes n}$ are bitstrings of length $n$, and we employ the notation $X^a = X_1^{a_1} \otimes \cdots \otimes X_n^{a_n}$, and similarly for $Z^b$. As explicitly shown in the Appendix, the following theorem holds.
Theorem 1. Let $V_B(\theta, \gamma)$ be a buffered PQC as in Definition 1. Then, the σ-Pulse parameter symmetries in $V_B(\theta, \gamma)$ are preserved under the action of a unital Pauli noise model.
Theorem 1 shows that the parameter symmetries in $V_B(\theta, \gamma)$ arising from σ-Pulse symmetries are completely preserved by the action of unital Pauli noise channels, which include as special cases the action of local (or global) depolarizing channels, as well as dephasing channels.
In addition, Theorem 1 implies that the degeneracy in the cost function landscape also remains unchanged. Particularly, we then know that the optimal parameters $\{\tilde{\theta}_{\mathrm{opt}}, \tilde{\gamma}_{\mathrm{opt}}\}$ leading to the global minimum of the noisy cost function will still be $2^M$-fold degenerate in $[0, 2\pi)$. That is, starting from $\{\tilde{\theta}_{\mathrm{opt}}, \tilde{\gamma}_{\mathrm{opt}}\}$, all the symmetric parameters obtained from any bitstring in $\mathcal{B}$ will have the same energy. In a practical scenario Theorem 1 implies that the minimum cost achievable from randomly initializing the parameters $\{\theta, \gamma\}$ will be independent of the bitstring $\beta$ characterizing the initial point.
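The preservation of the symmetry under unital Pauli noise can be seen in a small density-matrix simulation. The sketch below uses a hypothetical one-qubit buffered circuit (R_x(θ) followed by an assumed buffer with the y rotation applied before the x rotation), applies a noise channel after every gate, and compares the cost ⟨Z⟩ at a pair of σ-Pulse symmetric parameters. The channel strengths and angles are arbitrary illustrative values.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {"x": X, "y": Y, "z": Z}


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def depolarizing(rho, p=0.1):
    """Unital Pauli channel: shrinks every Pauli component by (1 - p)."""
    return (1 - p) * rho + p * np.trace(rho) * I2 / 2


def dephasing(rho, p=0.1):
    """Unital Pauli channel modelling a T2 process."""
    return (1 - p) * rho + p * Z @ rho @ Z


def noisy_cost(theta, gamma_x, gamma_y, channel):
    """<Z> after R_x(theta), R_y(gamma_y), R_x(gamma_x), with noise after each gate."""
    rho = np.array([[1, 0], [0, 0]], dtype=complex)
    for gate in (rot("x", theta), rot("y", gamma_y), rot("x", gamma_x)):
        rho = channel(gate @ rho @ gate.conj().T)
    return float(np.real(np.trace(Z @ rho)))


theta, gamma_x, gamma_y = 0.7, 0.3, -1.1
# Symmetric parameters for a primary x pulse generated by theta.
symmetric = (theta + np.pi, gamma_x + np.pi, -gamma_y)

gaps = {
    name: abs(noisy_cost(theta, gamma_x, gamma_y, ch) - noisy_cost(*symmetric, ch))
    for name, ch in (("depolarizing", depolarizing), ("dephasing", dephasing))
}
```

Both gaps vanish to numerical precision because a Pauli channel commutes with conjugation by the pulse, so the pulse can still be propagated through the noise.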
Note however that for general cost functions the presence of quantum noise can change the cost landscape such that the optimal parameters of the noisy cost function are different from the ones for the noiseless case [4]. That is, one generally has $\{\tilde{\theta}_{\mathrm{opt}}, \tilde{\gamma}_{\mathrm{opt}}\} \neq \{\theta_{\mathrm{opt}}, \gamma_{\mathrm{opt}}\}$. For the special cases when $\{\tilde{\theta}_{\mathrm{opt}}, \tilde{\gamma}_{\mathrm{opt}}\} = \{\theta_{\mathrm{opt}}, \gamma_{\mathrm{opt}}\}$ we say that the cost has optimal parameter resilience. This phenomenon has been analyzed in [36] for the problem of variational quantum compiling.
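In addition, we also know that the value of the noisy cost function evaluated at the optimal parameters, $\tilde{C}(\tilde{\theta}_{\mathrm{opt}}, \tilde{\gamma}_{\mathrm{opt}})$, can also change due to the presence of noise. Here $\tilde{C}$ denotes the noisy cost function. In fact, let us consider a cost function of the form
$$C(\theta, \gamma) = \mathrm{Tr}\big[O\, V_B(\theta, \gamma)\,\rho\, V_B^\dagger(\theta, \gamma)\big],$$
where $O$ is a Hermitian operator, and assume that the parameters $\{\theta_{\mathrm{opt}}, \gamma_{\mathrm{opt}}\}$ yield the operator's ground state. Then we have trivially that
$$\tilde{C}(\tilde{\theta}_{\mathrm{opt}}, \tilde{\gamma}_{\mathrm{opt}}) \ge C(\theta_{\mathrm{opt}}, \gamma_{\mathrm{opt}}).$$
In general, the output state of the PQC converges to the fixed point of the noise model [52], and so the cost function will be increasingly different as noise increases.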

Non-unital Pauli noise
Let us here analyze the effect of non-unital Pauli noise on the σ-Pulse symmetries. Hence, consider the following definition:
Definition 5 (Non-unital Pauli noise model). We define the non-unital Pauli noise model as a process in which a non-unital Pauli channel acts after every layer of gates acting in parallel in $V_B(\theta, \gamma)$.
Here we recall that non-unital Pauli noise channels are completely positive trace-preserving maps $\mathcal{P}_{NU}$ whose action on the identity operator is
$$\mathcal{P}_{NU}(\mathbb{1}) = \mathbb{1} + \sum_{(a,b) \neq (0,0)} d_{ab}\, X^a Z^b,$$
with at least one non-zero coefficient $d_{ab}$. On the other hand, its action on all other Pauli operators is given by
$$\mathcal{P}_{NU}(X^a Z^b) = p_{ab}\, X^a Z^b.$$
As explicitly shown in the Appendix, the following theorem holds.
Theorem 2 (Symmetry breaking). Let V B (θ, γ) be a buffered PQC as in Definition 1. Then, the σ-Pulse parameter symmetries in V B (θ, γ) can be broken under the action of a non-unital Pauli noise model.
From Theorem 2 we have that given two sets of symmetric parameters $\{\theta, \gamma\}$ and $\{\hat{\theta}, \hat{\gamma}\}$, there exists some non-unital noise such that, for the noisy circuits, $V(\theta, \gamma) \neq V(\hat{\theta}, \hat{\gamma})$. This implies that the $2^M$-fold symmetry of the optimal noisy parameters, and hence the degeneracy in the cost landscape, is broken.
In particular we now have that some of the (previously) global minima are transformed into local minima. In practical terms, due to this degeneracy breaking in the cost landscape not all randomly initialized {θ, γ} will converge to the global minima. In the next section we show how this effect can be mitigated by exploiting the knowledge of parameter symmetries to construct an optimizer.
We remark that the proof for Theorem 2 in the Appendix is valid for more general noisy channels that include as a special case non-unital Pauli noise channels. For instance, we show that coherent errors, such as qubit drift, can also break the σ-Pulse symmetries.
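The symmetry breaking can be reproduced in a minimal simulation. The sketch below uses a hypothetical one-qubit buffered circuit (R_x(θ) followed by an assumed buffer with the y rotation applied before the x rotation, here with trivial buffer angles) and applies amplitude damping after every gate. In the noiseless case the two symmetric parameter sets give the same cost ⟨Z⟩, while with damping the costs separate. The damping strength 0.1 is an arbitrary illustrative value.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {"x": X, "y": Y, "z": Z}


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def amplitude_damping(rho, g):
    """Non-unital channel modelling a T1 process with decay probability g."""
    k0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T


def noisy_cost(theta, gamma_x, gamma_y, g):
    """<Z> after R_x(theta), R_y(gamma_y), R_x(gamma_x), with damping after each gate."""
    rho = np.array([[1, 0], [0, 0]], dtype=complex)
    for gate in (rot("x", theta), rot("y", gamma_y), rot("x", gamma_x)):
        rho = amplitude_damping(gate @ rho @ gate.conj().T, g)
    return float(np.real(np.trace(Z @ rho)))


theta, gamma_x, gamma_y = 0.7, 0.0, 0.0     # buffer with trivial angles
symmetric = (theta + np.pi, gamma_x + np.pi, -gamma_y)

noiseless_gap = noisy_cost(theta, gamma_x, gamma_y, 0.0) - noisy_cost(*symmetric, 0.0)
damped_gap = noisy_cost(theta, gamma_x, gamma_y, 0.1) - noisy_cost(*symmetric, 0.1)
```

For these particular values the damped gap works out to 0.342, so in this toy landscape the hopped parameters sit in the lower of the two formerly degenerate valleys.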

Symmetry-based Minima Hopping (SYMH) optimizer
Here we present the Symmetry-based Minima Hopping (SYMH) optimizer, which is meant to be employed in the presence of non-unital quantum noise. As its name indicates SYMH employs the σ-Pulse symmetries to hop around the degeneracy-broken landscape and attempt to find the minima that are less sensitive to noise. As further explained below, the strength of SYMH is that it should be considered a general tool which can be implemented along with other optimization and error mitigation techniques.
Consider the problem of minimizing the cost function C(θ, γ) in Eq. (1). As shown in Fig. 5(a), a common strategy is to randomly initialize the parameters {θ, γ} and employ a classical optimizer which takes as input the value of the cost (or its gradients) to determine a cost-minimizing direction. Usually one optimizes until some stopping criterion has been met, at which point one hopes that the minimum reached corresponds to a global optimum. We denote the parameters obtained at the end of the optimization as $\{\theta_f, \gamma_f\}$, where the $f$ stands for final.

Figure 5: Main idea behind the SYMH optimizer. The contour plots correspond to the cost landscape of Fig. 1(b), where the presence of non-unital quantum noise has broken the landscape degeneracy. (a) The parameters are randomly initialized. (b) By employing a classical optimizer we can determine a cost-minimizing direction. In this case, the minimum to which the optimizer converges is a local minimum which used to correspond to a global minimum in the noiseless case. (c) Using the σ-Pulse method we can hop in the landscape and land in the vicinity of another minimum. (d) By performing an optional second optimization we can find the global minimum of the problem.
However, as shown in Fig. 5(b), randomly initializing the parameters can lead to the optimizer getting trapped in a local minimum which used to correspond to a global minimum in the noiseless scenario. In this case, one can attempt to find the global minimum by taking the final parameters $\{\theta_f, \gamma_f\}$ and employing the σ-Pulse method (i.e., using Eq. (6)) to obtain a new set of parameters $\{\hat{\theta}_f, \hat{\gamma}_f\}$. As depicted in Fig. 5(c) this will effectively lead to a hop in the cost landscape whose ending point can be in the well of another minimum.
Since the cost landscape degeneracy is broken, $\{\hat{\theta}_f, \hat{\gamma}_f\}$ might not correspond to a critical point, and one may therefore have to re-optimize. This is schematically shown in Fig. 5(d), where the hop leads to the vicinity of the minimum and an additional optional optimization could be needed. Note that in general this optional optimization will only require a small number of circuit evaluations, and hence will not add significant overhead to the optimization.
Here we remark that there are several possibilities for where SYMH takes us in the landscape. As previously mentioned, in the best-case scenario the hopping can lead to the vicinity of a minimum whose cost function value is smaller than $C(\theta_f, \gamma_f)$. Here SYMH was successful as it allowed us to mitigate the effect of noise and improve the quality of the solutions. It might also happen that one lands in the vicinity of a minimum whose cost function value is larger than or equal to $C(\theta_f, \gamma_f)$. In this case one simply rejects this particular hop and uses Eqs.
Finally, if the cost landscape has shifted in such a way that the hop does not lead to the vicinity of a minimum, one can still perform the second optimization. This would simply correspond to an optimization starting from a new random seed. In all cases, the overhead added by employing SYMH does not change the overall complexity of the optimization.
Note that this high-level description of SYMH is intended as a template that can be employed in many different scenarios. Due to the versatility of the method we do not intend to present here an exhaustive way of employing SYMH but rather to introduce it as a general method which can be coupled to other optimization and error mitigation techniques. In what follows we present different SYMH-based optimization techniques.

Parameter sweeping method
One of the main challenges that can arise when employing SYMH is the exponentially large number of possible hops one can take (arising from the $2^M$ bitstrings in $\mathcal{B}$). In this section we present a technique called the sweeping method in which, instead of exploring all hops, one explores a reduced subset of hops which adds an overhead of at most $O(M)$ per sweep.
In the sweeping method one starts with $\{\theta_f, \gamma_f\}$ and employs SYMH to obtain a new set $\{\hat{\theta}_f, \hat{\gamma}_f\}$ such that $\theta_f$ and $\hat{\theta}_f$ differ only in the first parameter. That is, the first parameters in the sets $\theta_f$ and $\hat{\theta}_f$ are related according to (6). This guarantees that all the parametrized gates in $V(\theta)$ except for the first one remain the same. Then, as previously described, one performs a second optimization to find the cost function value at $\{\hat{\theta}_f, \hat{\gamma}_f\}$, which determines if the shift is accepted or rejected. This procedure is sequentially repeated by sweeping $n_s$ times through all $M$ parameters. We refer the reader to Algorithm 1 for a more detailed description. One of the main advantages of this method is that it allows us to identify the parameters that, when shifted, yield the biggest improvement in terms of cost function minimization.

Algorithm 1: Parameter sweeping method
    Input: final parameters α_f = {θ_f, γ_f} with cost C_f; hopping routine symh; optimizer opt; maximum number of sweeps n_s.
    repeat
        for each of the M parameters do
            Use symh with input α_f to shift the corresponding parameter, obtaining α̂;
            (Optional) Use opt with input α̂ to find α̂_f;
            Evaluate C(α̂_f).
        At end of cycle append the best parameter index to pulse_lst; α_f ← α̂_f; C_f ← C(α̂_f).
    until no more improvement in C_f; at most n_s times.
    (Optional) Do final round of optimization with opt.
    return C_f, α_f.
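The sweeping procedure can be sketched in a few lines. The code below is a simplified reading of the method under stated assumptions, not the implementation used for the numerical results: the function `sweep`, the per-parameter hop callables, and the toy one-qubit cost with amplitude damping (with an assumed buffer ordering of y rotation before x rotation) are all hypothetical, and the optional re-optimization step is left as a pluggable argument.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def rot(axis, theta):
    """Single-qubit rotation R_mu(theta) = exp(-i * theta * sigma_mu / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * PAULI[axis]


def cost(alpha, g=0.1):
    """Toy noisy cost <Z> for alpha = (theta, gamma_x, gamma_y), damping after each gate."""
    theta, gamma_x, gamma_y = alpha
    k0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
    rho = np.array([[1, 0], [0, 0]], dtype=complex)
    for gate in (rot("x", theta), rot("y", gamma_y), rot("x", gamma_x)):
        rho = gate @ rho @ gate.conj().T
        rho = k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T
    return float(np.real(np.trace(PAULI["z"] @ rho)))


def hop_theta(alpha):
    """sigma-Pulse hop generated by theta, with the buffer corrections."""
    theta, gamma_x, gamma_y = alpha
    return (theta + np.pi, gamma_x + np.pi, -gamma_y)


def sweep(alpha, cost_fn, hops, n_sweeps=3, opt=None):
    """Try every single-parameter hop per sweep and keep the best improving one."""
    best_alpha, best_cost, pulse_lst = alpha, cost_fn(alpha), []
    for _ in range(n_sweeps):
        trials = []
        for index, hop in enumerate(hops):
            trial = hop(best_alpha)
            if opt is not None:              # optional local re-optimization
                trial = opt(trial)
            trials.append((cost_fn(trial), index, trial))
        trial_cost, index, trial = min(trials, key=lambda t: t[0])
        if trial_cost >= best_cost:          # no improvement: stop sweeping
            break
        pulse_lst.append(index)
        best_alpha, best_cost = trial, trial_cost
    return best_cost, best_alpha, pulse_lst


alpha_f = (0.7, 0.0, 0.0)
initial_cost = cost(alpha_f)
final_cost, alpha_best, pulse_lst = sweep(alpha_f, cost, [hop_theta])
```

In this toy case the first sweep accepts the hop and the second sweep rejects hopping back, so the loop stops after two sweeps. Each sweep costs one hop (plus an optional short optimization) per parameter, which is the linear overhead described above.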

Using SYMH for ansatz symmetry breaking and landscape exploration
Let us now discuss how to implement SYMH to improve the solution quality in problems where the ansatz encodes some additional symmetry beyond that of the σ-Pulses. In general those symmetries translate into constraints on the parameters of V(θ). Specifically, we analyze the possibility of using SYMH to break those additional constraints and improve the solution quality via the parameter hops obtained through the σ-Pulse method.
Unstructured ansatzes such as the hardware-efficient ansatz [46] have been widely implemented in the literature for problems in which one has little to no information about the solution of the problem. However, there are many tasks in which one possesses knowledge which can be employed to construct so-called physically-inspired ansatzes. Such information can come in the form of a specific symmetry that the ansatz must preserve [53], an adiabatic transformation that must be followed [5,47] (as in QAOA), or the operators that the ansatz must contain [48,49,50] (as in UCC). Preserving these additional symmetries during the parameter training can guarantee that the ansatz explores a subspace of states related to the solution of the problem.
In most cases, these additional symmetries are translated into parameter constraints which the circuit description of V(θ) must obey. For instance, in QAOA all the parameters in a given mixing or driving layer are correlated [54]. Note that in the SYMH formalism this corresponds to exploring certain subspaces of $\mathcal{B}$ where the bitstrings β respect the ansatz symmetry.
In the presence of noise, however, it could happen that one can obtain a higher quality solution by starting from the set of ansatz constraint-preserving parameters {θ f , γ f }, and using SYMH to hop to an ansatz constraint-breaking set of parameters. In this case, SYMH allows us to explore circuit configurations inaccessible to the original circuit structure.

The Quantum Alternating Operator Ansatz
We explicitly characterize the symmetries possessed by a Quantum Alternating Operator Ansatz (QAOA) [47], without a buffer layer. To recap, the QAOA, which generalises the ansatz in the Quantum Approximate Optimization Algorithm [5], has a layered structure, each layer being composed of two unitaries: a problem unitary $U_P(\beta_i) = e^{-i\beta_i H_P}$, where $H_P$ is the problem Hamiltonian (consisting of Z-terms only), and a mixing unitary $U_M(\gamma_i) = e^{-i\gamma_i H_M}$, where typically $H_M = \sum_j X_j$. The parameters of QAOA can therefore be represented as two vectors $\{\beta, \gamma\}$, one for each type of unitary. Now let us consider the effect of shifting one of these parameters by π.
If the parameter belongs to a mixing unitary, the shift will be cancelled by the creation of a σ-Pulse in the X direction on all the qubits. It can be shown that commuting any number of X pulses through a gate consisting of Z terms of any order generates no additional pulses, but leads to the parameter of the gate acquiring a negative sign. Consequently, the X pulses generated by such a shift can only be annihilated by a restoring shift of another mixing parameter. Therefore, a symmetry is the following:
If instead we focus on the problem unitary, we see that the parameter shift will create, on a given qubit, one Z pulse for each Z acting on that qubit. Since the mixing unitary presents one X rotation for each qubit, one concludes that, in order to be able to commute all of the pulses past a mixing unitary with only a sign modification of its parameter, there must be the same number of Z pulses on all the qubits. Therefore, the ansatz will only possess such a symmetry if the problem Hamiltonian features the same number of Zs on each qubit. This happens, for example, in MaxCut problems on n-regular graphs. In this specific case, when n is odd each shift will generate one total Z per qubit, and for any i, j with i < j the symmetries are:
Otherwise, for n even all the pulses will cancel and the symmetries will simply be:
Identifying these symmetries is significant because they imply, if not a complete parameter space reduction, that restricting the range of some parameters will not affect the result of the algorithm.
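The condition stated above (the same number of Z operators acting on every qubit) can be checked directly for a MaxCut instance, where each edge contributes one ZZ term and the Z count of a qubit equals its vertex degree. The following is a minimal Python sketch under that assumption; the helper names `z_counts` and `pulse_parity` and the edge-list input format are hypothetical and introduced only for illustration.

```python
from collections import Counter

def z_counts(edges, n_qubits):
    """Number of Z operators acting on each qubit for H_P = sum over edges of Z_i Z_j."""
    counts = Counter()
    for i, j in edges:
        counts[i] += 1
        counts[j] += 1
    return [counts[q] for q in range(n_qubits)]

def pulse_parity(edges, n_qubits):
    """Return None if the Z counts differ between qubits (criterion not met).
    Otherwise return the common count modulo 2:
    1 means one net Z pulse per qubit, 0 means all Z pulses cancel."""
    counts = z_counts(edges, n_qubits)
    if len(set(counts)) != 1:
        return None
    return counts[0] % 2

triangle = [(0, 1), (1, 2), (0, 2)]                      # 2-regular graph
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]    # 3-regular graph
path = [(0, 1), (1, 2)]                                  # not regular

parity_triangle = pulse_parity(triangle, 3)
parity_k4 = pulse_parity(k4, 4)
parity_path = pulse_parity(path, 3)
```

In this sketch the triangle falls in the even case (all pulses cancel), the complete graph on four vertices falls in the odd case, and the path graph does not satisfy the criterion at all.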

Implementations
In this section we present heuristic results where we employ SYMH to improve the solution quality of VQAs in the presence of quantum noise. Specifically, we simulate VQAs for two implementations: variational quantum compiling and a Variational Quantum Eigensolver problem.

Variational Quantum Compiling
Quantum compiling [55,56,57] refers to the task of transforming a high-level algorithm into low-level code that quantum hardware can efficiently implement. In the near term, one of the main applications of quantum compiling is transforming a unitary with a deep quantum circuit description into a shorter-depth gate sequence, which mitigates the effect of noise. Several Variational Quantum Compiling (VQC) [58,59,7] architectures have recently been introduced where one trains the parameters in a short-depth PQC so that its outputs approximate those of a target unitary U.
Here we consider a quantum compiling application where the goal is to train the parameters in the PQC so that $V_B(\theta, \gamma)|\mathbf{0}\rangle = U|\mathbf{0}\rangle$, where $U$ is the W-state preparation circuit for three qubits (see [36] for an explicit circuit), and where $|\mathbf{0}\rangle = |0\rangle^{\otimes 3}$ is the all-zero state. Explicitly, $U|\mathbf{0}\rangle = |W\rangle$, where $|W\rangle$ is the three-qubit W-state. For simplicity, we consider the cost function
$$C(\theta, \gamma) = 1 - |\langle W| V_B(\theta, \gamma)|\mathbf{0}\rangle|^2 , \quad (17)$$
which vanishes if $V_B(\theta, \gamma)|\mathbf{0}\rangle = |W\rangle$ (up to a global phase).
Figure 6: ... $V_B(\theta, \gamma)$. By measuring the probability $P(\mathbf{0})$ of all qubits being in the zero state, the cost can be computed as $C = 1 - P(\mathbf{0})$. Here the buffer layer is only composed of rotations about the y axis, as those are sufficient.
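To make the behaviour of the compiling cost concrete, the following Python sketch evaluates a cost of the fidelity form $C = 1 - |\langle W|\phi\rangle|^2$ on explicit state vectors. The fidelity form is an assumption consistent with $C = 1 - P(\mathbf{0})$ and with the cost vanishing up to a global phase; the function names are hypothetical, and no circuit simulation is performed.

```python
import cmath
import math

def w_state(n=3):
    """Return the n-qubit W state as a list of 2**n complex amplitudes."""
    amp = complex(1 / math.sqrt(n))
    state = [0j] * (2 ** n)
    for q in range(n):
        state[1 << q] = amp  # basis states with exactly one qubit in |1>
    return state

def compiling_cost(target, output):
    """C = 1 - |<target|output>|^2 for two normalized state vectors."""
    overlap = sum(t.conjugate() * o for t, o in zip(target, output))
    return 1 - abs(overlap) ** 2

w = w_state()
phased = [cmath.exp(0.7j) * a for a in w]  # same state with a global phase
zero = [1 + 0j] + [0j] * 7                 # the all-zero state |000>

cost_same = compiling_cost(w, phased)  # vanishes: a global phase is irrelevant
cost_zero = compiling_cost(w, zero)    # equals 1: |000> is orthogonal to |W>
```

The second value reflects that an untrained circuit acting as the identity on the all-zero state has no overlap with the W-state.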
As outlined in Fig. 6, for the buffered PQC we choose a layered hardware-efficient ansatz of the form $V_B(\theta, \gamma) = U_B(\gamma)\, U_L(\theta_L) \cdots U_1(\theta_1)$, where each $U_i(\theta_i)$ consists of single-qubit rotations followed by CNOTs. Note that here $U_B(\gamma)$ is only composed of $R_y$ rotations, as we can only create $\sigma_y$ pulses.
In Fig. 7 we present our numerical results. Here we employed a noisy quantum circuit simulator with realistic amplitude damping noise acting after every gate (including idle gates), obtained from the average T1 and T2 parameters and gate times of the ibmq_melbourne quantum computer. For the classical optimization we used the COBYLA [60] optimizer, initialized at random angles. In addition, we simulated the circuit with an ansatz composed of L = 1, 2, 3 layers. In a noiseless scenario, a single layer is not enough to prepare the W state, while L = 2, 3 can reproduce the desired target state. For each number of layers we ran 100 randomly initialized simulations, and once the first optimization was completed we implemented the parameter sweeping method of Algorithm 1 with $n_s = 4$ maximum sweeps.
As seen in Fig. 7, for all values of L employing SYMH leads to a systematic improvement in the cost function value. This improvement can be measured by taking either the best run (out of 100) or the average of all runs before and after SYMH. In addition, we can see that the improvement increases with L, as deeper circuits accumulate more noise and hence leave room for a larger improvement.

Variational Quantum Eigensolver
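In this section we present our results for a Variational Quantum Eigensolver (VQE) implementation. Here, the goal is to train a PQC such that $V_B(\theta, \gamma)$ prepares the ground state of a given Hamiltonian $H$. Specifically, we consider $H$ to be the SU(2)-symmetric Heisenberg XXX model on $n$ qubits
$$H_{XXX} = \sum_{i=1}^{n} \left( X_i X_{i+1} + Y_i Y_{i+1} + Z_i Z_{i+1} \right),$$
where we assume periodic boundary conditions so that $n + 1 \equiv 1$. Here the cost function is simply given by
$$C(\theta, \gamma) = \langle\psi| V_B^\dagger(\theta, \gamma)\, H_{XXX}\, V_B(\theta, \gamma) |\psi\rangle,$$
where $|\psi\rangle$ is an efficiently preparable input state.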
For the PQC ansatz we consider a subclass of the QAOA called the Hamiltonian Variational Ansatz (HVA), first introduced in [61]. Specifically, we follow the circuit structure for HVAs of [62], where we split $H_{XXX}$ into two summations with the $i$ index being even and odd. That is, $H_{XXX} = H_{\text{odd}} + H_{\text{even}}$, so that the ansatz can be expressed as alternating applications of the unitaries $e^{-i\theta_i H_{\text{odd}}/2}$ and $e^{-i\theta_i H_{\text{even}}/2}$. In addition, here we choose the initial state to be a tensor product of Bell states, $|\psi\rangle = \frac{1}{2^{n/4}} (|01\rangle - |10\rangle)^{\otimes n/2}$, which is the ground state of $H_{\text{even}}$. As shown in Fig. 8, the circuit description of $V(\theta)$ can be obtained from a first-order Trotter expansion of these unitaries and hence features alternating layers of XX, YY and ZZ interactions, first on odd qubits and then on even qubits.
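As a quick consistency check on the initial state, the two-qubit factor has squared norm 2, so $n/2$ copies require the prefactor $2^{-n/4}$. The second line assumes a single Heisenberg bond with unit coupling (an assumption about the normalization of the Hamiltonian) and uses the standard identity $X\otimes X + Y\otimes Y + Z\otimes Z = 2\,\mathrm{SWAP} - \mathbb{1}$.

```latex
\begin{align}
\big\| |01\rangle - |10\rangle \big\|^2
  &= \langle 01|01\rangle + \langle 10|10\rangle = 2
  \;\;\Rightarrow\;\;
  \big\| (|01\rangle - |10\rangle)^{\otimes n/2} \big\|
  = \left(\sqrt{2}\right)^{n/2} = 2^{n/4}, \\
\left( X\otimes X + Y\otimes Y + Z\otimes Z \right)(|01\rangle - |10\rangle)
  &= \left( 2\,\mathrm{SWAP} - \mathbb{1} \right)(|01\rangle - |10\rangle)
  = -3\,(|01\rangle - |10\rangle).
\end{align}
```

Since the remaining (triplet) eigenvalue of such a bond is $+1$, each pair in the product state sits in the lowest-energy state of its own bond under this assumed normalization.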
Let us remark that the parameters in V(θ) encode additional problem symmetries, as all the gates in a layer are constrained to be identical. To show that SYMH can be used to break these constraints and improve the solution quality in the presence of noise, our heuristics were performed with the optimization schedules of Fig. 9. First, we set the parameters in the buffer layer to be γ = 0 so that $U_B(\gamma) = \mathbb{1}$.
Then, we randomly initialize the θ and optimize them respecting their correlation (see Fig. 9(1)). As indicated in Fig. 9(2), we then implement SYMH to break the HVA constraints, and hop in the landscape. This hopping can be followed by a second optimization, which we call a "free" optimization, where the HVA constraints are broken and all parameters are independently optimized (see Fig. 9(3)). To verify that the improvements arise from the SYMH and not from the free optimization, for each run we additionally perform a free optimization without SYMH (Fig. 9(4)).
The results from our numerical simulations are presented in Fig. 10. Here we considered VQE problems with n = 4, 6, 8, 10 qubits and where the ansatz is composed of a single layer. Moreover, we employ the same noise model and optimizer as described in the previous section. For each n we ran 100 instances of the optimization. In Fig. 10(a) we present the best run for each number of qubits and for the different optimization methods in Fig. 9. As shown, one can always improve the solution quality by breaking the ansatz constraints. However, the best solution is always achieved when employing SYMH, meaning that the optimal improvement follows from hopping in the landscape.
A more detailed comparison of the cost function improvement for the different methods is presented in Fig. 10(b), where the improvement is defined in terms of three energies: $E_{GS}$ denotes the true ground-state energy, $E_{HVA}$ is the energy obtained from (1) in Fig. 9, and $E_f$ is the final energy from the optimization schemes (2), (3) and (4) of Fig. 9. As shown in Fig. 10(b), employing SYMH and breaking the parameter constraints always seems to lead to the best improvement. In fact, for all values of n the improvement is larger than 7%.

Discussion
Analyzing the cost function landscape of variational quantum algorithms and quantum neural networks is a fundamental task to improve their performance. While some rigorous results have been derived that examine the connection between the ansatz and the cost landscape, much still remains to be done. In this work we discussed two phenomena related to the cost function landscape: the first is an exponentially large symmetry in the parameters of a PQC, and the second is how quantum noise affects these symmetries. The latter provides the theoretical grounding for the SYMH optimization method. We will now separately summarise these results and discuss their implications.

Exponential symmetry in PQCs
The first phenomenon can be condensed into the following: for buffered PQCs, there are exponentially many sets of symmetric parameters $\theta$ and $\theta'$ such that $V(\theta) = V(\theta')$ (up to a global phase). These symmetries translate in turn into exponential degeneracies in the cost landscape. To understand and analyze these symmetries we have introduced the so-called σ-Pulse method. The main idea behind this method is the creation, propagation, and absorption of virtual gates in the PQC, which allow us to obtain symmetric sets of parameters. Despite the simple interpretation of the σ-Pulse method, its implications are non-trivial. For instance, we can show that all the relevant features of the cost landscape can be found in a subspace of the parameter hyperspace, hence providing an exponential reduction of the search space of variational quantum algorithms.
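The creation, propagation, and absorption steps can be illustrated on single-qubit gates. The sketch below assumes the rotation convention $R_\mu(\theta) = e^{-i\theta\sigma_\mu/2}$ (an assumption, since the convention is fixed elsewhere in the paper) and checks three elementary identities numerically: shifting a rotation angle by π creates a Pauli pulse, an X pulse flips the sign of a Z-rotation angle when commuted through it, and an X pulse is absorbed by shifting an X-rotation angle by π. All helper names are hypothetical.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rot(pauli, theta):
    """R(theta) = exp(-i theta sigma / 2) = cos(theta/2) I - i sin(theta/2) sigma."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * I2[i][j] - 1j * s * pauli[i][j] for j in range(2)]
            for i in range(2)]

def equal_up_to_phase(a, b, tol=1e-12):
    """True if a = e^{i phi} b for some global phase phi."""
    for i in range(2):
        for j in range(2):
            if abs(b[i][j]) > tol:
                phase = a[i][j] / b[i][j]  # candidate global phase
                return abs(abs(phase) - 1) < tol and all(
                    abs(a[p][q] - phase * b[p][q]) < tol
                    for p in range(2) for q in range(2))
    return False

theta, gamma = 0.83, -1.2

# Creation: R_x(theta + pi) equals R_x(theta) X up to a global phase.
created = equal_up_to_phase(rot(X, theta + math.pi), matmul(rot(X, theta), X))

# Propagation: X R_z(theta) = R_z(-theta) X, so the angle picks up a minus sign.
propagated = equal_up_to_phase(matmul(X, rot(Z, theta)),
                               matmul(rot(Z, -theta), X))

# Absorption: X R_x(gamma) equals R_x(gamma + pi) up to a global phase.
absorbed = equal_up_to_phase(matmul(X, rot(X, gamma)), rot(X, gamma + math.pi))
```

All three checks hold for arbitrary angles, which is what allows a shifted parameter set to implement the same unitary up to a global phase.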
To the best of our knowledge, the current work represents the first discussion of such a broad degeneracy in parametrized quantum circuits (although subsequent works have studied the existence of exponentially many local minima of overparameterized quantum neural networks [63]). Nonetheless, there exists a connection with concepts in Measurement-Based Quantum Computing (MBQC) [64], and with the concept of flow [65] in particular. The symmetries presented here are related to the correction operations following a measurement of a qubit. A variant of the "angles can be restricted to [0, π)" principle is even known for such measurement operations (private communication with Will Simmons).
In the context of variational algorithms, the result is however novel and may have numerous implications. For instance, the fact that the noiseless quantum landscape has been demonstrated to consist of an exponentially large number of translated copies of a single landscape "unit cell" means that the informational content of the landscape is actually greatly reduced. As such, one may hope to construct quantum algorithms that are more sample-efficient by exploiting these regularities. Beyond variational algorithms, quantum compilers might leverage the symmetries to reduce the single-qubit gate count of static circuits. Indeed, any $R_\mu(\pi)$ gate may be eliminated by shifting it to an identity gate via an appropriate σ-Pulse. This might also be done in an approximate manner, for example if the rotation is only a small angle away from π. However, such an approach would leave the circuit's T count invariant.

Noise-broken symmetries and SYMH
The second theoretical contribution considers what happens when noise is introduced in such a buffered PQC. First we rigorously showed that the parameter symmetries are preserved under the action of unital Pauli noise, implying that dephasing and depolarizing noise acting throughout the PQC have no effect on the overall symmetric structure of the cost landscape. We then proved that non-unital Pauli noise can break the parameter symmetries and hence the degeneracies in the cost landscape. This result implies that, when training in the presence of noise, some of the previously exponentially degenerate global minima can become local minima. Hence, optimization strategies that randomly initialize the parameters could converge to one of those local minima and not obtain an optimal solution.
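The difference between the two noise classes can be seen on a single qubit. The sketch below uses dephasing as an example of a unital Pauli channel and amplitude damping as a stand-in for non-unital noise (the noise type used in the numerics); the Kraus operators, noise strengths, and test state are illustrative assumptions. It checks whether applying a Pauli pulse before the channel gives the same result as applying it after.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def apply_channel(kraus_ops, rho):
    """rho -> sum_k K rho K^dagger for a list of 2x2 Kraus operators."""
    out = [[0j, 0j], [0j, 0j]]
    for k_op in kraus_ops:
        term = matmul(matmul(k_op, rho), dagger(k_op))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

def conjugate_by(pauli, rho):
    return matmul(matmul(pauli, rho), dagger(pauli))

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def commutes_with_pulse(kraus_ops, pauli, rho):
    """Compare pulse-then-noise with noise-then-pulse on the state rho."""
    lhs = apply_channel(kraus_ops, conjugate_by(pauli, rho))
    rhs = conjugate_by(pauli, apply_channel(kraus_ops, rho))
    return close(lhs, rhs)

p, g = 0.1, 0.2  # illustrative dephasing and damping strengths
dephasing = [[[math.sqrt(1 - p) * v for v in row] for row in I2],
             [[math.sqrt(p) * v for v in row] for row in Z]]
damping = [[[1, 0], [0, math.sqrt(1 - g)]],
           [[0, math.sqrt(g)], [0, 0]]]
rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]

dephasing_unital = close(apply_channel(dephasing, I2), I2)
damping_unital = close(apply_channel(damping, I2), I2)
dephasing_commutes_x = commutes_with_pulse(dephasing, X, rho)
damping_commutes_x = commutes_with_pulse(damping, X, rho)
damping_commutes_z = commutes_with_pulse(damping, Z, rho)
```

In this example amplitude damping still commutes with a Z pulse but not with an X pulse, which illustrates why only some parameter shifts change the noisy cost.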
To mitigate the effect of noise when training in degeneracy-broken landscapes, we introduced a novel optimization method which we call the Symmetry-based Minima Hopping (SYMH) optimizer. SYMH employs the parameter symmetries to hop around the landscape and attempt to converge to more noise-resilient minima. The main advantage of SYMH is its versatility, in the sense that it can be easily combined with any optimization or error mitigation technique without significantly increasing the computational overhead.
To showcase the effectiveness of SYMH we numerically simulated two variational quantum algorithms in the presence of quantum noise. Namely, we implemented a quantum compiling task and the Variational Quantum Eigensolver. In both cases we heuristically showed that employing SYMH when optimizing consistently improves the solution quality. We remark that these are preliminary results, and more thorough approaches to constructing optimizers with SYMH might well prove to be even more fruitful. Hence, SYMH is an additional tool in the quantum variational toolbox, and it can be regarded as a quantum-aware optimizer that accounts for symmetries and symmetry breaking in noisy PQCs.
At the same time, one should be mindful that SYMH in its present formulation cannot be employed as a standalone optimizer, but must be paired with a suitable local optimization routine. The local gradient descent step can be kept minimal, but it is nonetheless unavoidable as without it the optimization would be constrained to a discrete grid of points connected by σ-Pulse symmetries, where it is highly unlikely that the global minimum lies.
To better gauge the utility of SYMH, it is helpful to compare it with similar but non-quantum-native approaches to global optimization. The classical machine learning literature offers several such algorithms, the main class being simulated annealing (SA) optimizers [66], which include Metropolis-based algorithms like the particle-collision algorithm [67]. These explore a number of random candidate solutions and select the best ones based on a probabilistic method. The similarity with SYMH is that such algorithms are designed to evade local minima and explore as much of the landscape as possible. Some of them may also incorporate a gradient-based local subroutine [68]. However, one considerable difference is that these algorithms are still local, as at each step they make gradual deviations away from the starting point. Setting a large step size would lead to unpredictable variations in the cost. Instead, SYMH is in principle able to make large jumps across the landscape while limiting the change in cost, as the algorithm is rooted in knowledge of the underlying symmetries of the landscape. Similar considerations also hold for the method of gradient descent with warm restarts [69]. Nonetheless, it is an interesting question for future research whether a combination of SA-style approaches and SYMH jumps would be more effective than either for a quantum problem.
Interestingly, the SYMH method also has a natural connection with the broad family of randomized methods for noise mitigation, which include randomized compiling and benchmarking [70,71], twirling [72] and probabilistic error cancellation [73]. In fact the parameter symmetries obtained through our σ-Pulse method are a special subset of twirling operations that preserve the structure of the ansatz. Nonetheless, the aforementioned methods involve a modification of the circuit via the addition of extra Pauli gates, while we achieve this implicitly by shifting the parameter values. Another difference is that the symmetries are specific to variational circuits, and violate the assumption of Clifford operations required by most twirling-based methods. As noise mitigation techniques, SYMH and randomized error mitigation methods are distinct. The latter aim at converting general noise channels into Pauli ones by a symmetrization procedure, while SYMH leverages the broken symmetries to achieve an improved cost function. In addition SYMH does not require any averaging operation, leading to a smaller overhead.
Despite its merits, SYMH presents some limitations that we now discuss. First, the parameter sweeping algorithm is purely heuristic and comes with no guarantee of convergence. Certainly, if one were given knowledge of the precise noise channels affecting the system, it is reasonable to expect that more refined optimization methods could be identified. As such, we expect that more advanced versions of SYMH can be derived given knowledge of the noise structure. Still, the numerical results found here suggest that the sweeping method is able to improve the cost function significantly, see Figure 10.
Secondly, even with an ideal optimization schedule SYMH would only be effective up to unital channels, which cannot be corrected by SYMH. These channels, e.g. Pauli channels, are widely present in quantum computers [74]. However, it is also true that many noise mitigation methods exist that are effective for unital Pauli channels [73,75,76,77], and such methods may be easily integrated alongside SYMH. Ultimately we propose SYMH as a first step towards leveraging the quantum landscape structure to optimize cost functions. Moreover, SYMH should not be considered as a standalone technique, but one to be used in conjunction with more standard strategies.
Several future research directions follow for the SYMH method. First, we heuristically verified that hopping does not lead to a critical point, but rather to the valley of a minimum. Analyzing how much the minima can shift could provide additional guarantees for SYMH without a second optimization. Second, in our numerics we observed that shifting some parameters leads to the greatest improvements, but no clear pattern emerged. We leave for future work the analysis of this phenomenon.
Science, Office of Advanced Scientific Computing Research, under the Quantum Computing Application Teams program.

APPENDIX
In this appendix we provide the proofs of our main results, with Appendix A containing the proofs of Proposition 1 and Corollary 1. Then, in Appendix B we present the proofs of Theorem 1 and Theorem 2.

A Proof of Proposition 1 and Corollary 1
In what follows we present the proof for Proposition 1.
Proof. Let us first recall that, as mentioned in the main text, there are M parameters in the PQC V(θ). Moreover, let us denote as G the set of all possible generator choices. The number of distinct symmetric sets of parameters {θ, γ} obtained through the σ-Pulse method is given by the cardinality of G: $|G| = 2^M$. Similarly, we can count the number of symmetric sets of parameters as the number of bitstrings β in B of length M, which is precisely $2^M$.
Now we prove Corollary 1. In particular we show that given any {θ, γ} one can always find a set $\{\tilde{\theta}, \tilde{\gamma}\}$ with β = 0.
Proof. Let us assume that the parameters in $\theta = (\theta_1, \theta_2, \ldots)$ are ordered by layer, where a layer consists of quantum gates that can be performed in parallel and where the first layer contains the first gates in V(θ). We can assume without loss of generality that the angles in θ are in [0, 2π). We now describe a sequential procedure that can be used to obtain the vector $\tilde{\theta}$ in which every parameter not in the buffer layer is in the reduced domain [0, π).
If $\theta_1 \in [0, \pi)$ then we do nothing, but if $\theta_1 \in [\pi, 2\pi)$ we create and forward propagate a σ-Pulse. According to Eq. (7), this will add π to $\theta_1$ (modulo 2π), which maps it to the interval $[0, \pi)$. This procedure is then sequentially repeated for each parameter in θ not in the buffer layer. We remark that since σ-Pulses propagate forward in the circuit, creating a pulse in $\theta_j$ does not affect any angle $\theta_k$ with $k < j$. Moreover, we know from Eq. (9) that as the σ-Pulses propagate they can add a minus sign to other angles in θ. Hence, if a given $\theta_j$ that was originally in $[\pi, 2\pi)$ picked up a minus sign then we do nothing, as it will now be in $[0, \pi)$. On the other hand, if it was originally in $[0, \pi)$ and picked up a minus sign, we have to create a σ-Pulse to map it back to $[0, \pi)$. Note that at the end of this procedure every parameter not in the buffer layer will be mapped to the reduced domain $[0, \pi)$.

B Proof of our main theorems

B.1 Proof of Theorem 1
Let us start by presenting a more detailed definition of a unital Pauli channel:
Unital Pauli channels. A Pauli noise channel corresponds to the action of random Pauli operators according to a given probability distribution. Let $\mathcal{P}_U$ denote an n-qubit Pauli channel. The action of $\mathcal{P}_U$ on any given n-qubit operator $A$ is given by
$$\mathcal{P}_U(A) = \sum_{l,k} p_{l,k}\, X^{l} Z^{k} A\, Z^{k} X^{l}, \quad (24)$$
where $0 \leq p_{l,k} \leq 1$, and $\sum_{l,k} p_{l,k} = 1$. By using the fact that
$$X^{l} Z^{k} \left( X^{a} Z^{b} \right) Z^{k} X^{l} = (-1)^{a \cdot k} (-1)^{b \cdot l}\, X^{a} Z^{b},$$
we find
$$\mathcal{P}_U(X^{a} Z^{b}) = p_{a,b}\, X^{a} Z^{b}, \quad (25)$$
where $p_{a,b} = \sum_{l,k} (-1)^{a \cdot k} (-1)^{b \cdot l}\, p_{l,k}$ and $-1 \leq p_{a,b} \leq 1$ for all $a, b \in \{0, 1\}^n$.
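The relation between the probabilities $p_{l,k}$ and the coefficients $p_{a,b}$ can be verified numerically on a single qubit. The sketch below assumes a channel of the form $A \mapsto \sum_{l,k} p_{l,k}\, X^l Z^k A\, Z^k X^l$ and an arbitrary illustrative probability distribution; the helper names are hypothetical. It checks that every operator $X^a Z^b$ is mapped to itself multiplied by $p_{a,b} = \sum_{l,k} (-1)^{a\cdot k}(-1)^{b\cdot l} p_{l,k}$.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(m):
    # The operators X^l Z^k are real, so the adjoint is the transpose.
    return [[m[j][i] for j in range(2)] for i in range(2)]

def pauli(a, b):
    """Single-qubit operator X^a Z^b for bits a and b."""
    return matmul(X if a else I2, Z if b else I2)

# Illustrative probabilities p_{l,k}, keyed by (l, k); they sum to one.
probs = {(0, 0): 0.85, (1, 0): 0.05, (0, 1): 0.07, (1, 1): 0.03}

def pauli_channel(op):
    """Apply sum_{l,k} p_{l,k} (X^l Z^k) op (X^l Z^k)^dagger."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for (l, k), p in probs.items():
        e = pauli(l, k)
        term = matmul(matmul(e, op), transpose(e))
        out = [[out[i][j] + p * term[i][j] for j in range(2)] for i in range(2)]
    return out

def eigenvalue(a, b):
    """p_{a,b} = sum_{l,k} (-1)^(a*k) (-1)^(b*l) p_{l,k}."""
    return sum((-1) ** (a * k) * (-1) ** (b * l) * p
               for (l, k), p in probs.items())

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

all_match = all(
    close(pauli_channel(pauli(a, b)),
          [[eigenvalue(a, b) * v for v in row] for row in pauli(a, b)])
    for a, b in product((0, 1), repeat=2))
```

For the identity operator the coefficient equals the total probability, which is the single-qubit statement that such a channel is unital.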
We now prove Theorem 1.
Proof. Let us now consider a buffered circuit $V_B(\theta, \gamma)$ which is implemented in the presence of a unital Pauli noise model as presented in Definition 4. Here we show that propagating the pulses (Pauli operators) through the circuit does not change the unitary being produced, as the Pauli operators commute with unital Pauli noise.
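Let us now consider the channel $\mathcal{V}_B$ which implements the unitary $V_B(\theta, \gamma)$. This channel can be expressed as
$$\mathcal{V}_B = \mathcal{U}_B \circ \mathcal{V}_L \circ \cdots \circ \mathcal{V}_1,$$
where $\mathcal{V}_l$ is the channel that implements the unitaries in the l-th layer, and where $\mathcal{U}_B$ is the channel that implements the buffer unitary. From the σ-Pulse method we know that we can find a symmetric set of parameters by creating a σ-Pulse, propagating it, and absorbing it in the buffer layer. In the channel notation this procedure can be expressed as
• Creation of a primary σ-Pulse: $\mathcal{V}_B = \mathcal{U}_B \circ \mathcal{V}_L \circ \cdots \circ \mathcal{V}_2 \circ \Sigma_1 \circ \tilde{\mathcal{V}}_1$, with $\Sigma_1$ the channel that implements the σ-Pulse, and with $\tilde{\mathcal{V}}_1$ the parameter-shifted unitary.
• Propagation of the σ-Pulses: $\mathcal{V}_B = \mathcal{U}_B \circ \Sigma_L \circ \tilde{\mathcal{V}}_L \circ \cdots \circ \tilde{\mathcal{V}}_1$, where now $\Sigma_L$ is the channel that implements the primary and secondary σ-Pulses.
• Absorption of the σ-Pulses: $\mathcal{V}_B = \tilde{\mathcal{U}}_B \circ \tilde{\mathcal{V}}_L \circ \cdots \circ \tilde{\mathcal{V}}_1$. Here we can see that the channel $\mathcal{V}_B$ remains unchanged, meaning that $V_B(\theta, \gamma) = V_B(\tilde{\theta}, \tilde{\gamma})$.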
Let us now analyze this procedure in the presence of noise. The noisy version of the channel that implements $V_B(\theta, \gamma)$ can be expressed as with $\mathcal{P}_U^{(l)}$ the noisy channel acting after every layer of gates. Once a σ-Pulse has been created we will have As we now show, any unital noisy channel $\mathcal{P}_U^{(i)}$ always commutes with the channels $\Sigma_k$ that implement σ-Pulses. Explicitly, the action of $\Sigma_k$ on any given n-qubit Pauli operator is given by Hence, using Eqs. (25), (33) and (24) we have that the following chain of equalities always holds