On the energy landscape of symmetric quantum signal processing

Symmetric quantum signal processing provides a parameterized representation of a real polynomial, which can be translated into an efficient quantum circuit for performing a wide range of computational tasks on quantum computers. For a given polynomial $f$, the parameters (called phase factors) can be obtained by solving an optimization problem. However, the cost function is non-convex, and has a very complex energy landscape with numerous global and local minima. It is therefore surprising that the solution can be robustly obtained in practice, starting from a fixed initial guess $\Phi^0$ that contains no information about the input polynomial. To investigate this phenomenon, we first explicitly characterize all the global minima of the cost function. We then prove that one particular global minimum (called the maximal solution) belongs to a neighborhood of $\Phi^0$, on which the cost function is strongly convex under the condition ${\left\lVert f\right\rVert}_{\infty}=\mathcal{O}(d^{-1})$ with $d=\mathrm{deg}(f)$. Our result provides a partial explanation of the aforementioned success of optimization algorithms.

When the phase factors are restricted to be symmetric, i.e., $\varphi_j = \varphi_{d-j}$ for all $j = 0, \ldots, d$, the representation is referred to as symmetric quantum signal processing. The simplest example is $\Phi = (0, \ldots, 0)$. This gives $U(x, \Phi) = e^{i d \arccos(x) X}$ and $g(x, \Phi) = \cos(d \arccos(x)) = T_d(x)$, where $T_d$ is the Chebyshev polynomial of the first kind of degree $d$.
Due to the parity constraint, the number of degrees of freedom in the target polynomial $f(x)$ is $\tilde{d} := \lceil \frac{d+1}{2} \rceil$. Hence $f(x)$ is entirely determined by its values on $\tilde{d}$ distinct points. Throughout the paper, we choose these points to be $x_k = \cos\left(\frac{(2k-1)\pi}{4\tilde{d}}\right)$, $k = 1, \ldots, \tilde{d}$, i.e., the positive nodes of the Chebyshev polynomial $T_{2\tilde{d}}(x)$. For any target polynomial $f(x)$ defined above, the solution to Eq. (1) exists [14,6], and Ref. [5] suggests that the solution can be restricted to be symmetric. The existence of the solution implies that the problem of symmetric quantum signal processing can be equivalently solved via the optimization problem in Eq. (5): $\Phi^* = \operatorname{argmin}_{\Phi \in [-\pi,\pi)^{d+1},\ \text{symmetric}} F(\Phi)$, where the cost function $F(\Phi)$ measures the squared deviation of $g(x_k, \Phi)$ from $f(x_k)$ at the points $x_k$.
That is, any solution $\Phi^*$ to Eq. (1) achieves the global minimum of the cost function with $F(\Phi^*) = 0$, and vice versa. However, the energy landscape of the cost function $F(\Phi)$ is very complex, and has numerous global as well as local minima (see Section 7). This is already the case with the symmetry constraint. (Without the symmetry constraint, the number of variables is larger than the number of equations and there should be an infinite number of global minima.) Starting from a random initial guess, an optimization algorithm can easily be trapped at a local minimum already when $d$ is small. It is therefore surprising that, starting from the special symmetric initial guess $\Phi^0 = (\pi/4, 0, 0, \ldots, 0, 0, \pi/4)$ (Eq. (6)), which is independent of the target function, at least one global minimum can be robustly identified using standard unconstrained optimization algorithms even when $d$ is as large as $10{,}000$ [5], and the optimization method is free from being trapped by any local minima. Direct calculation shows that $g(x, \Phi^0) = 0$, and therefore $\Phi^0$ does not contain any a priori information about the target polynomial $f(x)$!
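The following is a minimal numerical sketch of this setup (our own helper code using numpy; all names are ours, and we take $g(x,\Phi) = \mathrm{Re}\,\langle 0|U(x,\Phi)|0\rangle$ and a mean-squared sampled cost, so the normalization may differ from Eq. (5)). It checks that $g(x, \Phi^0) = 0$ for the initial guess in Eq. (6).

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def qsp_unitary(x, phi):
    """U(x, Phi) = e^{i phi_0 Z} * prod_{j=1}^{d} [ e^{i arccos(x) X} e^{i phi_j Z} ]."""
    W = np.cos(np.arccos(x)) * np.eye(2) + 1j * np.sin(np.arccos(x)) * X  # e^{i arccos(x) X}
    U = np.cos(phi[0]) * np.eye(2) + 1j * np.sin(phi[0]) * Z
    for p in phi[1:]:
        U = U @ W @ (np.cos(p) * np.eye(2) + 1j * np.sin(p) * Z)
    return U

def g(x, phi):
    """g(x, Phi) = Re <0| U(x, Phi) |0>."""
    return qsp_unitary(x, phi)[0, 0].real

def cost(phi, f, d):
    """Mean squared deviation from f at the positive Chebyshev nodes (cf. Eq. (5))."""
    dt = (d + 2) // 2                                 # tilde{d} = ceil((d+1)/2)
    xk = np.cos((2 * np.arange(1, dt + 1) - 1) * np.pi / (4 * dt))
    return np.mean([(g(x, phi) - f(x)) ** 2 for x in xk])

d = 7
phi0 = np.concatenate(([np.pi / 4], np.zeros(d - 1), [np.pi / 4]))    # Eq. (6)
print(max(abs(g(x, phi0)) for x in np.linspace(-0.99, 0.99, 11)))     # ~0: g(., Phi^0) vanishes
f = lambda x: 0.3 * np.cos(d * np.arccos(x))                          # made-up target 0.3*T_d(x)
print(cost(phi0, f, d))                                               # equals the mean of f(x_k)^2
```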

Main results
In this work, we provide a theory to characterize the energy landscape of the cost function $F(\Phi)$, and to explain this unusual behavior of numerical optimization. Under the assumption on the max-norm $\|f\|_\infty$ stated in Corollary 7, this leads to the first provable algorithm for solving the QSP problem without resorting to extended precision arithmetic operations [8].
Let $D_d$ denote the domain of the symmetric phase factors of length $d+1$. Our first main result is the existence and uniqueness of symmetric phase factors for a class of polynomial matrices in SU(2).

Theorem 1 (Existence and uniqueness of symmetric phase factors). Consider polynomials $P \in \mathbb{C}[x]$ and $Q \in \mathbb{R}[x]$ satisfying the following conditions:

(1) $\deg(P) = d$ and $\deg(Q) = d-1$.

(2) $P$ has parity $(d \bmod 2)$ and $Q$ has parity $(d-1 \bmod 2)$.

(3) (Normalization) $|P(x)|^2 + (1-x^2)\, Q(x)^2 = 1$ for all $x \in [-1, 1]$.

(4) If $d$ is odd, then the leading coefficient of $Q$ is positive.

Then there exists a unique set of symmetric phase factors $\Phi := (\varphi_0, \varphi_1, \cdots, \varphi_1, \varphi_0) \in D_d$ such that $U(x, \Phi)$ is the SU(2) matrix represented by the pair $(P, Q)$.

This result is a refinement of [6, Theorem 4], which only shows the existence of the phase factors without symmetry constraints. Theorem 1 states that in the presence of the symmetry constraint, the phase factors are also unique. Besides, under some constraint, the converse of Theorem 1 also holds (see Remark 12). It follows that there is a bijection between the global minimizers of Eq. (5) and all possible pairs $(P(x), Q(x))$ satisfying the assumptions in Theorem 1 and $\mathrm{Re}[P](x) = f(x)$. The conditions (1), (2) for the target polynomial $f$ are compatible with the first two requirements in Theorem 1. The condition (3) on the max-norm, i.e., $\|f\|_\infty < 1$, is compatible with the normalization condition, which is itself a natural condition due to the unitarity of $U(x, \Phi)$.
To simplify the discussion, we introduce the following definition of an admissible pair of polynomials associated with a target polynomial. Here $P_{\mathrm{Re}} := \mathrm{Re}[P]$ and $P_{\mathrm{Im}} := \mathrm{Im}[P]$. Comparing Theorem 1 and Definition 2, the main modification is that the leading coefficient of the complementary polynomial $Q$ is always restricted to be positive. This allows us to unify the discussion of odd and even values of $d$. The bijection relation can then be concisely formulated as follows.
Corollary 3 (Bijection between global minima and admissible pairs). If d is odd, there is a bijection between the global minima of Eq. (5) and all admissible pairs (P, Q).
If d is even, there is a bijection between the global minima of Eq. (5) and all pairs of polynomials (P, ±Q), where (P, Q) is an admissible pair.
The proof of Theorem 1 is constructive, so given an admissible pair, we have an algorithm to evaluate the symmetric phase factors. Theorem 4 generalizes the result of [8, Lemma 4] by explicitly constructing all admissible pairs. Together with Theorem 1, this gives a complete description of all global minima of the cost function in Eq. (5).

Theorem 4 (Construction of admissible pairs).
Given a target polynomial $f(x)$, all admissible pairs $(P, Q)$ must take the form given in Eq. (43) (restated as Theorem 16 below). Here $e(z) := z^{-d} \prod_{i=1}^{2d} (z - r_i)$, where each $r_i$ ($i = 1, \ldots, 2d$) is a root of the Laurent polynomial $\mathcal{F}(z) := 1 - f\left(\frac{z + z^{-1}}{2}\right)^2$, and $\alpha \in \mathbb{C}$ satisfies $\mathcal{F}(z) = \alpha\, e(z)\, e(z^{-1})$.
Note that the set $\{r_i\}_{i=1}^{2d}$ does not include all roots of the Laurent polynomial $\mathcal{F}(z)$, which has $4d$ roots in total. By properly choosing the roots (see a more detailed description in Remark 17), we can construct all the admissible pairs. The number of global minima is finite, which is a consequence of the compactness of the domain $D_d$. Theorem 4 shows that the number of global minima can grow combinatorially with respect to $d$.
Unfortunately, the procedure described above for finding phase factors is numerically unstable. It requires extended precision arithmetic operations and is therefore very expensive when d is large (see detailed discussions in Section 1.2). This is noticeably different from solving the optimization problem in Eq. (5), which is numerically stable and can be readily performed using standard double precision arithmetic operations.
Among the myriad of global minima, Theorem 4 allows us to identify a special global minimum, which is obtained by choosing $\{r_i\}_{i=1}^{2d}$ to be the roots of $\mathcal{F}(z)$ within the unit disc. The unique symmetric phase factors associated with this admissible pair are referred to as the maximal solution (the reason for the naming is technical and is explained in Section 4.2).
The maximal solution enjoys many desirable properties. For any target polynomial $f$ with $\|f\|_\infty \le \frac{1}{2}$, the maximal solution lies in a neighborhood of $\Phi^0$, which is defined in Eq. (6). To characterize this property, we first introduce the definition of reduced phase factors. Given any symmetric phase factors $\Phi$ of length $d+1$, the reduced phase factors refer to the left half of $\Phi$, namely $(\varphi_0, \varphi_1, \cdots, \varphi_{\tilde{d}-1})$. Recall that $\tilde{d} := \lceil \frac{d+1}{2} \rceil$. Due to symmetry, $\Phi$ is fully determined by its reduced phase factors. Hence, we may measure the Euclidean distance between the reduced phase factors of $\Phi^*$ and $\Phi^0$ instead of that between $\Phi^*$ and $\Phi^0$ themselves. Furthermore, throughout this paper, we choose the reduced phase factors as the free variables. With some abuse of notation, we identify the cost function $F(\Phi)$ with its restriction to the reduced phase factors.
Theorem 5 (Distance between the maximal solution and $\Phi^0$). Let $\Phi^*$ be the maximal solution for the target function $f(x)$, and consider the reduced phase factors of $\Phi^*$ and $\Phi^0$. If $\|f\|_\infty \le \frac{1}{2}$, then the squared Euclidean distance between these reduced phase factors is bounded by $\frac{\pi^2}{3} \|f\|_\infty^2$.
In particular, if the target polynomial is $f = 0$, then the maximal solution is exactly the initial guess $\Phi^0$. We then prove that when $\|f\|_\infty$ is sufficiently small, the maximal solution belongs to a neighborhood of $\Phi^0$ on which the cost function $F$ is strongly convex, i.e., the Hessian matrix, denoted $\mathrm{Hess}$, is positive definite.

The local strong convexity result in Theorem 6 immediately implies that when $\|f\|_\infty$ is sufficiently small, standard optimization algorithms, such as the projected gradient method [1,11], can converge in the neighborhood of $\Phi^0$ without being trapped by any local minima.
Corollary 7 (Convergence of projected gradient method). If the target polynomial satisfies $\|f\|_\infty \le \frac{\sqrt{3}}{20\pi d}$, then, starting from $\Phi^0$, the projected gradient method with step size $t = \frac{1}{L}$ converges exponentially to the maximal solution $\Phi^*$, i.e., the error at the $\ell$-th iteration decays geometrically, with $\sigma = \frac{1}{4}$ and $L = \frac{25}{4}$.
Therefore, when $\|f\|_\infty \le \frac{\sqrt{3}}{20\pi d}$, in order to reach precision $\epsilon$, the projected gradient method can be terminated after $\mathcal{O}(\log(1/\epsilon))$ steps, independent of the details of $f$. Since each iteration is numerically stable, the maximal solution can be readily obtained using standard double precision arithmetic operations in practice.

Background and related works
The mapping in Eqs. (1) and (2) may seem a very peculiar way of encoding a polynomial (and it is). However, such an SU(2) representation can be directly translated into a quantum circuit, which is so far the most concise way of performing the eigenvalue transformation $f(A)$ (when $A$ is a Hermitian matrix) [14] and the singular value transformation $f^{\mathrm{SV}}(A)$ (when $A$ is a general matrix) on quantum computers [6]. Many tasks in quantum computation can be formulated using such transformations. When $f$ is not a polynomial, it can be approximated by a polynomial with bounded precision. This strategy can be used to unify a large class of quantum algorithms including Grover's search and quantum phase estimation [17,16], and to perform a wide range of computational tasks, such as solving linear systems of equations [17,13], eigenvalue problems [12], and Hamiltonian simulation [14,6].
After the initial efforts [14,4], significant progress has been made by using direct methods to obtain phase factors [6,8]. These methods are based on finding roots of high degree polynomials to high precision and are not numerically stable. Specifically, these algorithms require $\mathcal{O}(d \log(d/\epsilon))$ bits of precision, where $d$ is the degree of $f(x)$ and $\epsilon$ is the target accuracy [8]. It is worth mentioning that the extended precision needed in these algorithms is not an artifact of the proof technique. For instance, for $d \approx 500$, the number of bits needed to represent each floating point number can be as large as $1000 \sim 2000$ [5], which is much larger than the 64 bits provided by the standard double precision floating point format.
There have been two recent improvements of the factorization based method, based on the capitalization method [3], and the Prony method [19], respectively. Although the two methods differ significantly, empirical results indicate that both methods are numerically stable, and are applicable to polynomials of large degrees. The algorithm in [3] is a variation of the algorithm in [6], which introduces a small perturbation to the high order Chebyshev coefficients to enhance the numerical stability. The Prony method in [19] avoids the root finding of high degree polynomials by directly constructing the admissible pairs for a given target polynomial, and uses randomness to enhance the numerical stability. Ref. [5] proposes a very different optimization based algorithm and suggests the use of the symmetric phase factors. This work provides a detailed discussion of the analytic structure of quantum signal processing with symmetric phase factors, and demonstrates the effectiveness of the optimization based approach under the condition that f ∞ is sufficiently small. This provides a partial solution to the open problem of finding phase factors using numerically stable algorithms, and justifies the usage of standard double precision arithmetic operations in practice.

Discussion and open questions
Without the symmetry constraint, the optimization problem in Eq. (5) has $d+1$ degrees of freedom. This is larger than the number of degrees of freedom of the target function $f$, which is only $\tilde{d}$ due to the parity constraint. As a result, the Hessian matrix is always singular, preventing us from proving the exponential convergence of optimization methods [18].
Our analytical result is the first result trying to explain the unexpected success of optimization algorithms for the problem of finding phase factors. In Theorem 6, we prove the local strong convexity of the cost function when $\|f\|_\infty = \mathcal{O}(d^{-1})$. Therefore, for a given target polynomial, in order to apply Corollary 7, we need to first multiply the target polynomial by a scaling factor $c = \mathcal{O}(d^{-1})$ and find the phase factors for $c f(x)$ instead. Such a scaling factor can be undesirable and may lead to suboptimal query complexities when implementing certain quantum algorithms. On the other hand, numerical results suggest that optimization algorithms can robustly converge to a global minimum when $\|f\|_\infty$ is smaller than a constant (less than 1) that is independent of $d$. Note that the estimate in Theorem 5 is already independent of $d$. Hence the gap is mainly due to Theorem 6, where the eigenvalue estimate of the Hessian matrix is obtained via estimates of each matrix entry, which introduces the $d$-dependence. It is a natural open question to prove the strong convexity of the cost function near $\Phi^0$ with a bound on $\|f\|_\infty$ that is independent of $d$.
The proof of Corollary 7 requires projecting the reduced phase factors onto the domain $\{ \Phi : \|\Phi - \Phi^0\|_2 \le \frac{1}{20 d} \}$ at each iteration. This is because the steepest descent method may potentially overshoot and deviate from the locally strongly convex region. Numerical observations show that unconstrained optimizers, such as quasi-Newton methods, can robustly achieve the global minimum, which is beyond our current theoretical analysis.
Finally, since $\Phi^0$ is only one of the solutions when $f = 0$, we may ask what happens if we run the projected gradient method starting from other solutions for $f = 0$, in order to obtain phase factors for more general target polynomials. These starting points correspond to non-maximal solutions. Numerical observations show that the convergence rate towards such non-maximal solutions can be significantly slower than that towards the maximal solution (see Figs. 5 and 6). The analysis of this behavior requires generalizing the estimates in Theorems 5 and 6 to non-maximal solutions. These are left for future work.

Chebyshev polynomials
We first collect some basic facts about Chebyshev polynomials, which are used to formulate the optimization problem in Eq. (5) and will be used extensively in Sections 5 and 6. Given $x \in [-1, 1]$, the Chebyshev polynomial of the first kind is $T_n(x) = \cos(n \arccos(x))$, and that of the second kind is given by $U_n(x) \sin(\arccos(x)) = \sin((n+1) \arccos(x))$.
A useful fact is that both $T_n$ and $U_n$ form sequences of orthogonal polynomials. The Chebyshev polynomials of the first kind $T_n$ are orthogonal with respect to the weight $\frac{1}{\sqrt{1-x^2}}$ on $[-1, 1]$, and those of the second kind $U_n$ are orthogonal with respect to the weight $\sqrt{1-x^2}$. The induced norms are referred to as the Chebyshev norm of the first kind and of the second kind, respectively. Furthermore, any function $h$ with $\|h\|_T < \infty$ can be uniquely expressed as a series in Chebyshev polynomials of the first kind, and any function $h$ with $\|h\|_U < \infty$ can be uniquely expressed as a series in Chebyshev polynomials of the second kind. Apart from the weighted orthogonality, the Chebyshev polynomials of the first kind $T_n$ satisfy a discrete orthogonality condition over the nodes $x_j = \cos\left(\frac{(2j-1)\pi}{4\tilde{d}}\right)$ of $T_{2\tilde{d}}$, valid for $\max(n, m) \le 2\tilde{d}$. This set of points is also referred to as the Chebyshev nodes. Furthermore, if $n$ and $m$ have the same parity, the discrete orthogonality also holds when the sum is restricted to the positive nodes.
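As a quick numerical illustration of the discrete orthogonality (a sketch with our own variable names; the value $\tilde d = 5$ is arbitrary):

```python
import numpy as np

dt = 5                                          # tilde{d}
N = 2 * dt
j = np.arange(1, N + 1)
x = np.cos((2 * j - 1) * np.pi / (2 * N))       # the 2*tilde{d} zeros of T_{2 tilde{d}}
T = np.array([np.cos(n * np.arccos(x)) for n in range(N)])   # T_0, ..., T_{N-1} at the nodes
G = T @ T.T                                     # discrete inner products sum_j T_n(x_j) T_m(x_j)
print(np.allclose(G, np.diag([N] + [N / 2] * (N - 1))))       # off-diagonal entries vanish

# For n, m of the same parity, T_n*T_m is an even function, so the sum over the
# positive nodes alone equals half of the full sum.
xp = x[x > 0]
n, m = 2, 2
print(np.isclose(np.sum(np.cos(n * np.arccos(xp)) * np.cos(m * np.arccos(xp))), 0.5 * G[n, m]))
```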

Notation
Given a positive integer $n$, $K_n[x]$ denotes the set of all polynomials of degree at most $n$ in the variable $x$ with coefficients in the field $K$. A Laurent polynomial is a linear combination of positive as well as negative powers of the variable $x$ with coefficients in $K$, and $K[x, x^{-1}]$ denotes the set of all Laurent polynomials. Specifically, we are only interested in the cases where $K$ is either $\mathbb{R}$ or $\mathbb{C}$. Unless otherwise noted, the term "polynomial" refers to a real or complex polynomial, which only has nonnegative powers of the variable $x$. We assume that the length of the set of phase factors is $d+1$ unless otherwise noted. We also assume that a pair of polynomials $(P, Q)$ satisfies the conditions in Theorem 1 unless otherwise noted. The QSP problem involves some delicate relations between polynomial coefficients. Throughout the paper, we will interchangeably expand a polynomial in the monomial basis and in the Chebyshev basis; boldface symbols denote coefficients in the monomial basis and regular symbols denote coefficients in the Chebyshev basis. For convenience, we set the coefficients $f_j$, $p_j$, $s_j$, $q_j$ (in either basis) with a negative integer index to zero.
The coefficients in the monomial and Chebyshev bases satisfy a linear relation (Eq. (20)). The set $[n] := \{0, 1, \cdots, n-1\}$ is referred to as the index set generated by a positive integer $n$. The row and column indices of an $n$-by-$n$ matrix run from $0$ to $n-1$, namely over the index set $[n]$. For a matrix $A \in \mathbb{C}^{m \times n}$, the transpose, Hermitian conjugate, and complex conjugate are denoted by $A^\top$, $A^\dagger$, and $A^*$, respectively. The same notations are also used for vectors. We use the convention in quantum computing to write the basis vectors of $\mathbb{C}^2$, i.e., $|0\rangle = (1, 0)^\top$ and $|1\rangle = (0, 1)^\top$. Given a 2-by-2 matrix $A = (a_{ij})$, we have $\langle i | A | j \rangle = a_{ij}$ for any $i, j \in \{0, 1\}$. For a matrix $A \in \mathbb{C}^{m \times n}$, we denote by $\|\cdot\|_2$ the operator norm induced by the vector 2-norm, and by $\|\cdot\|_F$ the Frobenius norm. We use $\sigma_k(A)$ to denote its $(k+1)$-th largest singular value, and define $\sigma_{\min}(A) := \sigma_{\min(m,n)-1}(A)$ and $\sigma_{\max}(A) := \sigma_0(A)$. Here, the index $k$ starts from 0, which agrees with the convention for matrix indices. For a Hermitian matrix $A \in \mathbb{C}^{n \times n}$, we use $\lambda_k(A)$ to denote the $(k+1)$-th largest eigenvalue, i.e., $\lambda_{n-1}(A) \le \cdots \le \lambda_1(A) \le \lambda_0(A)$.

Quantum signal processing
In the absence of the symmetry constraint, the existence of phase factors for a unitary representation of a pair of polynomials $(P, Q)$ is proved in [6].

Theorem 8 ([6]). There exists a set of phase factors $\Phi \in \mathbb{R}^{d+1}$ such that $U(x, \Phi) = \begin{pmatrix} P(x) & i Q(x) \sqrt{1-x^2} \\ i Q^*(x) \sqrt{1-x^2} & P^*(x) \end{pmatrix}$ if and only if $P, Q \in \mathbb{C}[x]$ satisfy (1) $\deg(P) \le d$ and $\deg(Q) \le d-1$, (2) $P(x)$ has parity $(d \bmod 2)$ and $Q(x)$ has parity $(d-1 \bmod 2)$, and (3) $|P(x)|^2 + (1-x^2)|Q(x)|^2 = 1$ for all $x \in [-1, 1]$.

When only the real component of $P$ is of interest, the conditions for the existence of phase factors can be further simplified.
Corollary 9 ([6]). For any target polynomial $f$ satisfying the conditions (1)-(3) stated above for $f$, there exist some $P, Q \in \mathbb{C}[x]$ satisfying the conditions of Theorem 8 such that $P_{\mathrm{Re}}(x) = f(x)$.

Quantum signal processing as an optimization problem
The existence result in Corollary 9 enables the following optimization based strategy to find phase factors. Given the target polynomial $f(x) \in \mathbb{R}[x]$, we may minimize the $L^2$ distance between $g(x, \Phi)$ and the target function $f(x)$; the resulting cost function is denoted $L(\Phi)$ (Eq. (24)). The desired phase factors $\Phi^*$ satisfy $L(\Phi^*) = 0$.
Since both $g(x, \Phi)$ and $f(x)$ are polynomials of degree $d$ with definite parity, the number of degrees of freedom is $\tilde{d} := \lceil \frac{d+1}{2} \rceil$. These polynomials are determined by their values at $x_k = \cos\left(\frac{(2k-1)\pi}{4\tilde{d}}\right)$, $k = 1, \ldots, \tilde{d}$, which are the positive Chebyshev nodes of $T_{2\tilde{d}}(x)$. This leads to the optimization problem in Eq. (5).

Symmetric quantum signal processing

In this section, we present some basic properties of symmetric quantum signal processing. We first explain a constraint on the sign of the leading coefficient of $Q(x)$ when $d$ is odd. Then we introduce a constructive method for finding phase factors given suitable polynomials $(P, Q)$. In the end, we prove Theorem 1, which establishes a bijection between the global minimizers of Eq. (5) and all suitable complementary polynomials $(P_{\mathrm{Im}}, Q)$.

Sign of the leading coefficient of Q(x)
We first derive an explicit expression of the coefficients of T d and U d−1 in the Chebyshev expansion of P and Q respectively.

Lemma 10.
Let $\Phi \in \mathbb{R}^{d+1}$ be a set of phase factors, and let $P(x)$ and $Q(x)$ be defined by Eq. (23).
Considering the Chebyshev expansions of $P$ and $Q$ in Eq. (19), one has
$p_d = e^{i(\varphi_0 + \varphi_d)} \prod_{j=1}^{d-1} \cos\varphi_j, \qquad q_{d-1} = e^{i(\varphi_0 - \varphi_d)} \prod_{j=1}^{d-1} \cos\varphi_j. \qquad (26)$
Proof. See Appendix A.
After imposing the symmetry constraint on the phase factors, we find that there is an additional constraint on the leading coefficient of $Q$ when $d$ is odd. It originates from the observation that when $\Phi$ is symmetric, one has $\varphi_0 = \varphi_d$, and then the second equality in Eq. (26) becomes $q_{d-1} = \prod_{j=1}^{d-1} \cos\varphi_j$. Note that $\deg(Q)$ may be less than $d-1$. The more precise statement is given in Lemma 11.
Lemma 11. Let $\Phi \in \mathbb{R}^{d+1}$ be a set of phase factors, and let $P(x)$ and $Q(x)$ be defined by Eq. (23). If $\Phi$ is symmetric and $d$ is odd, then the leading coefficient of $Q$ in the Chebyshev basis has a definite sign; in particular, it is positive when $\deg(Q) = d-1$. Proof. See Appendix B.
As a remark, the statement in Lemma 11 also holds with the Chebyshev basis replaced by the monomial basis, which is a consequence of directly applying the relation in Eq. (20).
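The following small numerical check (our own code and naming; it fits $(P, Q)$ from sampled values of $U(x, \Phi)$) illustrates this for a random symmetric $\Phi$ with odd $d$: the extracted $Q$ is real, the normalization holds, and the leading Chebyshev coefficient of $Q$ is positive, consistent with Lemma 10, Lemma 11, and the conditions of Theorem 1.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def qsp_unitary(x, phi):
    W = np.cos(np.arccos(x)) * np.eye(2) + 1j * np.sin(np.arccos(x)) * X
    U = np.cos(phi[0]) * np.eye(2) + 1j * np.sin(phi[0]) * Z
    for p in phi[1:]:
        U = U @ W @ (np.cos(p) * np.eye(2) + 1j * np.sin(p) * Z)
    return U

rng = np.random.default_rng(0)
d = 5                                                     # odd
half = rng.uniform(-0.5, 0.5, (d + 1) // 2)               # reduced phase factors
phi = np.concatenate([half, half[::-1]])                  # symmetric, length d+1

xs = np.linspace(-0.99, 0.99, 200)
Us = np.array([qsp_unitary(x, phi) for x in xs])
P = Us[:, 0, 0]                                           # P(x)
Q = Us[:, 0, 1] / (1j * np.sqrt(1 - xs**2))               # Q(x): real for symmetric Phi

print(np.max(np.abs(Q.imag)))                             # ~0: Q is a real polynomial
q = C.chebfit(xs, Q.real, d - 1)                          # Chebyshev coefficients of Q
print(q[-1] > 0)                                          # leading coefficient positive (d odd)
print(np.max(np.abs(np.abs(P)**2 + (1 - xs**2) * Q.real**2 - 1)))   # normalization ~0
```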

Remark 12.
By directly applying Lemma 11 and Theorem 8, one gets that the converse of Theorem 1 also holds, given that deg(P ) = d and deg(Q) = d − 1.

Existence of symmetric phase factors
We present a useful lemma, which evaluates the outermost symmetric phase factor and reduces the polynomial degree by 2. It also connects the leading coefficients in the monomial basis before and after the reduction. The result of Lemma 13 will be used throughout the paper. To clarify, we only consider expansions of polynomials in the monomial basis in the rest of this section.
Suppose $(P, Q)$ satisfies the conditions in Theorem 1, let $\varphi$ be given, and define $(P^{(1)}, Q^{(1)})$ accordingly. Consider the expansions of $P$, $Q$, $P^{(1)}$ and $Q^{(1)}$ in the monomial basis. Then the leading coefficients before and after the reduction are related by the recursion in Eq. (32), and $P^{(1)}$, $Q^{(1)}$ also satisfy condition (4) in Theorem 1.
Proof. See Appendix C.
For convenience we denote $(P^{(0)}, Q^{(0)}) = (P, Q)$. Inspired by Lemma 13, we choose $\varphi_\ell$ to satisfy Eq. (34) and repeat the construction as long as $\deg P^{(\ell)} \ge 2$. Notice that each reduction step decreases the degree by 2. Thus, the iteration stops when $\ell = \tilde{d} - 1$. Moreover, $\deg(P^{(\tilde{d}-1)}) = 0$ if $d$ is even, and $\deg(P^{(\tilde{d}-1)}) = 1$ if $d$ is odd. For convenience, we introduce the notation $\hat{d}$, defined in Eq. (35).

Lemma 14. Suppose that $P(x)$, $Q(x)$ satisfy the conditions in Theorem 1. Then there exists a set of symmetric phase factors $\Phi \in D_d$ realizing $(P, Q)$.
Here $\mathbf{p}^{(\ell)}_{d-2\ell}$ and $\mathbf{q}^{(\ell)}_{d-1-2\ell}$ are the leading coefficients of the expansions of $P^{(\ell)}$ and $Q^{(\ell)}$ in the monomial basis, respectively. We use the notation $(P^{(0)}, Q^{(0)}) = (P, Q)$.

Proof. By repeating the construction in Eq. (34), we obtain a sequence of polynomials whose leading coefficients remain positive (condition (4) is preserved at each step by Lemma 13). If $d$ is odd, by the normalization condition $P^{(\tilde{d}-1)}$ must be of the form $cx$ with $|c| = 1$; the last phase factor is then chosen according to Eq. (36), and one can check that the resulting symmetric phase factors reproduce $(P, Q)$. When $d$ is even, $P^{(\tilde{d}-1)}$ is a constant of absolute value 1, and the last phase factor is chosen analogously.

Proof of Theorem 1
Now we are ready to prove Theorem 1.
Proof of Theorem 1. The existence is given by Lemma 14; we only need to show the uniqueness. Suppose there exists another set of symmetric phase factors $\Psi := (\psi_0, \psi_1, \cdots, \psi_1, \psi_0) \in D_d$ satisfying the requirement, and consider the polynomials obtained after one reduction step. Since these can be obtained from a shorter set of symmetric phase factors, comparing the leading coefficients yields an equation for $\psi_0$ that has only one solution over $[-\frac{\pi}{2}, \frac{\pi}{2})$; hence $\psi_0 = \varphi_0$. Then $\Phi = \Psi$ follows inductively.

Optimization based method for finding symmetric phase factors
Ref. [5] suggests that the optimization problem in Eq. (5) should be solved by imposing the symmetry constraint on the phase factors. This allows us to use the reduced phase factors $(\varphi_0, \varphi_1, \cdots, \varphi_{\tilde{d}-1})$ as optimization variables. For any scalar-valued function of the set of reduced phase factors, we use subscripts to represent the partial derivative with respect to the corresponding phase factor; for example, $g_i(x, \Phi)$ denotes the partial derivative of $g(x, \Phi)$ with respect to $\varphi_i$. We also use subscripts to represent the corresponding entry of any matrix. These notations and conventions simplify the discussion in Sections 5 and 6.
The Hessian matrix of the problem in Eq. (5) can be written element-wise as a sum of two terms (Eq. (38)): a Gram-matrix term built from the Jacobian of $g(x_k, \Phi)$ with respect to the reduced phase factors, and a term involving a real $\tilde{d} \times \tilde{d}$ matrix $R$ closely related to the residual $g(x_k, \Phi) - f(x_k)$ (note that $R_{ij}$ does not denote a partial derivative here). Due to the existence of phase factors stated in Theorem 1, at the optimal point $\Phi^*$ the second term vanishes, namely $R(\Phi^*) = 0$, and the Hessian at the optimum reduces to the Gram-matrix term $\mathrm{Hess}(\Phi^*) = (A^*)^\top A^*$, where $A^*$ denotes the Jacobian-type matrix evaluated at the reduced phase factors corresponding to the symmetric phase factors $\Phi^*$.
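To make this concrete, the following finite-difference check (our own code; it does not use the analytic expressions of the paper, and the target is a made-up small example) shows that the sampled cost is locally strongly convex around $\Phi^0$ for a small target, consistent with Theorem 6.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def g(x, red):
    """g(x, Phi) for the symmetric Phi determined by the reduced phase factors `red` (d odd)."""
    phi = np.concatenate([red, red[::-1]])
    W = np.cos(np.arccos(x)) * np.eye(2) + 1j * np.sin(np.arccos(x)) * X
    U = np.cos(phi[0]) * np.eye(2) + 1j * np.sin(phi[0]) * Z
    for p in phi[1:]:
        U = U @ W @ (np.cos(p) * np.eye(2) + 1j * np.sin(p) * Z)
    return U[0, 0].real

d = 7
dt = (d + 1) // 2
xk = np.cos((2 * np.arange(1, dt + 1) - 1) * np.pi / (4 * dt))
fk = 0.01 * np.cos(d * np.arccos(xk))                     # small made-up target 0.01*T_d(x)

def cost(red):
    return np.mean([(g(x, red) - fv) ** 2 for x, fv in zip(xk, fk)])

red0 = np.concatenate(([np.pi / 4], np.zeros(dt - 1)))    # reduced Phi^0
h = 1e-3
H = np.zeros((dt, dt))
for i in range(dt):
    for j in range(dt):
        ei = np.zeros(dt); ei[i] = h
        ej = np.zeros(dt); ej[j] = h
        H[i, j] = (cost(red0 + ei + ej) - cost(red0 + ei - ej)
                   - cost(red0 - ei + ej) + cost(red0 - ei - ej)) / (4 * h * h)
print(np.linalg.eigvalsh(H))   # all eigenvalues positive in this example: local strong convexity
```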

Global minimizer
According to the last section, the problem of finding all global minima of the optimization problem is converted to that of finding all admissible pairs of (P, Q) associated with the target polynomial f . In this section, we provide a characterization of all possible complementary polynomials (P Im , Q).

Characterizing complementary polynomials
Before constructing all possible pairs of (P (x), Q(x)) satisfying Definition 2, we first introduce the following lemma, which views the problem from the perspective of Laurent polynomials.

Lemma 15. Given any real polynomial $f$ as above, the Laurent polynomial $\mathcal{F}(z) = 1 - f\left(\frac{z+z^{-1}}{2}\right)^2$ admits a decomposition of the form $\mathcal{F}(z) = \alpha \prod_{i=1}^{n}(z - r_i)(z^{-1} - r_i)$.
The decomposition is unique in the following sense: if $\alpha \in \mathbb{C}$ and $r_i \in \mathbb{C}$, $i = 1, \cdots, n$, satisfy the same relation, then they agree with the original choice up to permutation of the roots.
Proof. According to the fundamental theorem of algebra, $1 - f(x)^2$ has a unique factorization in terms of its roots $s_i$. For each $s_i$, the equation $\frac{z+z^{-1}}{2} = s_i$ always has two nonzero solutions, say $z_1, z_2$, and in particular $z_1 = z_2^{-1}$. For each $s_i$, pick one solution $r_i$; using the identity $x - s_i = -\frac{1}{2 r_i}(z - r_i)(z^{-1} - r_i)$ with $x = \frac{z+z^{-1}}{2}$, one obtains a decomposition of $\mathcal{F}(z)$ satisfying the requirement. On the other hand, given any decomposition of this form, reversing the argument and invoking the uniqueness of the factorization of $1 - f(x)^2$, we obtain Eq. (41) up to permutation of the roots.
In particular, we are interested in decompositions of $\mathcal{F}(z)$ in which the chosen multiset of roots $D$ is closed under complex conjugation and additive inverse; such a decomposition is referred to as an admissible decomposition.
Firstly, $D$ contains half of the roots of $\mathcal{F}(z)$, counted with multiplicity (hence $D$ is a multiset rather than a set). Secondly, note that $\alpha > 0$ follows naturally from $\alpha = \frac{1 - f(1)^2}{\prod_{r \in D}(1 - r)^2} > 0$ for any $D$ closed under complex conjugation. Another observation is that no root of $\mathcal{F}(z)$ can have absolute value 1: otherwise $1 - f(\cos\theta)^2 = 0$ for some $\theta$, i.e., $|f(\cos\theta)| = 1$, which contradicts the constraint on the max-norm of $f$.
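The root-selection step can be illustrated numerically. The sketch below (our own code, on a made-up odd target; it assumes $\mathcal{F}(z) = 1 - f\left(\frac{z+z^{-1}}{2}\right)^2$ as above) builds the Laurent coefficients of $\mathcal{F}(z)$ via $T_k\left(\frac{z+z^{-1}}{2}\right) = \frac{z^k + z^{-k}}{2}$, finds its $4d$ roots, and keeps the $2d$ roots inside the unit disc, which is the choice of $D$ corresponding to the maximal solution.

```python
import numpy as np

# Made-up target f(x) = 0.3*T_3(x) + 0.1*T_1(x): odd parity, ||f||_inf <= 0.4 < 1.
cheb = {1: 0.1, 3: 0.3}
d = 3

# Laurent coefficients of f((z + 1/z)/2): a[j] is the coefficient of z^{j-d}.
a = np.zeros(2 * d + 1)
for k, c in cheb.items():
    if k == 0:
        a[d] += c
    else:
        a[d + k] += c / 2
        a[d - k] += c / 2

Fz = -np.convolve(a, a)            # coefficients of -f(.)^2, indexed so Fz[j] multiplies z^{j-2d}
Fz[2 * d] += 1.0                   # F(z) = 1 - f((z+1/z)/2)^2
roots = np.roots(Fz[::-1])         # roots of z^{2d} F(z)

print(len(roots), np.min(np.abs(np.abs(roots) - 1.0)))   # 4d roots, none on the unit circle
D = roots[np.abs(roots) < 1.0]                            # maximal-solution choice
print(len(D))                                             # 2d roots inside the unit disc
K = -np.prod(D)                    # D is closed under conjugation and r -> -r, so K is real
print(abs(K.imag) < 1e-8, K.real > 0)                     # K > 0, as used in the text
```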
To be more precise, we rephrase Theorem 4 as follows and present its proof.
(⟹) Given an admissible pair, write $P_{\mathrm{Im}}$ and $Q$ in terms of Laurent polynomials $p_{\mathrm{Im}}(z)$ and $q(z)$, with $e(z)$ as above. Then the normalization condition gives a corresponding identity for the associated Laurent polynomial $H(z)$. Notice that $z^d H(z)$ is a real polynomial of degree no more than $2d$. Thus $z^d H(z)$ can be represented as $c \prod_{r \in D}(z - r)$, where $c \in \mathbb{C}$ and the multiset $D$ contains all of its roots. By Lemma 15, the cardinality of $D$ is $2d$, and thus $z^d H(z)$ is of degree $2d$. Moreover, $r \in \mathbb{C} \setminus \{0\}$ for all $r \in D$.

Besides, one can check that the parity of $z^d H(z)$ is even by examining the parities of $p_{\mathrm{Im}}(z)$ and $q(z)$. Thus $D$ is closed under additive inverse and Eq. (43) holds as well. (⟸) One can verify that $P_{\mathrm{Im}}$ and $Q$ are well defined and that $(P(x) = f(x) + i P_{\mathrm{Im}}(x), Q(x))$ satisfies the normalization condition. Since $D$ is closed under additive inverse, $P_{\mathrm{Im}}(x)$ has parity $(d \bmod 2)$ and $Q(x)$ has parity $(d-1 \bmod 2)$. Another important observation is that $\prod_{r \in D} r < 0$; this follows from the fact that the coefficient of $z^{2d}$ in $\mathcal{F}(z)$ is negative. Comparing leading coefficients according to Eq. (44) (here l.o. denotes any linear combination of $z^j$ with coefficients in $\mathbb{C}$ and $|j| < d-1$) completes the proof.

Maximal solution
Among all possible choices of $D$, we are particularly interested in choosing $D$ to contain all roots of $\mathcal{F}(z)$ within the unit disc, or all roots outside the unit disc. As will be shown below, the associated complementary polynomials satisfy many desirable properties. Let $(P_{\mathrm{Im}}, Q)$ and $(\widetilde{P}_{\mathrm{Im}}, \widetilde{Q})$ denote the corresponding pairs of complementary polynomials, respectively. Moreover, throughout the paper, unless otherwise noted, we use $(\widetilde{P}_{\mathrm{Im}}, \widetilde{Q})$ to denote the complementary polynomials constructed according to Eq. (43), where $D$ contains all roots of $\mathcal{F}(z)$ outside the unit disc.

Proof. Given any multiset $D$ satisfying the condition mentioned in Remark 17, note that $\mathcal{F}(z)$ can be represented in the stated form for some $\beta \in \mathbb{C}$. By checking the leading coefficient and recalling from the proof of Theorem 16 that $\alpha \prod_{r \in D} r = \beta$ by Eq. (44), together with Eq. (46) we obtain an expression for the leading coefficient of $Q$ in terms of $\frac{1}{K} + K$, where $K := -\prod_{r \in D} r > 0$ by Eq. (45). When $D$ contains all the roots within the unit disc or all the roots outside the unit disc, $\frac{1}{K} + K$ is the largest among all possibilities; that is, the leading coefficients of $Q(x)$ and $\widetilde{Q}(x)$ have the largest magnitude.
The following lemma shows that $(P_{\mathrm{Im}}, Q)$ and $(\widetilde{P}_{\mathrm{Im}}, \widetilde{Q})$ are directly related to each other.
Due to this relation, we may focus only on the admissible pair $(P(x), Q(x))$ obtained by choosing $D$ to contain all the roots of $\mathcal{F}(z)$ within the unit disc. Due to the property in Lemma 18, the corresponding phase factors in $D_d$ will be referred to as the maximal solution.
Remark 20 (Maximal solution at $f = 0$). Lemma 19 agrees with the observation that two particular admissible pairs associated with $f = 0$ are $(iT_d(x), U_{d-1}(x))$ and $(-iT_d(x), U_{d-1}(x))$, which are obtained from the phase factors $\Phi^0$ and $\widetilde{\Phi}^0 := (-\pi/4, 0, \ldots, 0, -\pi/4)$, respectively. In the following discussion, we will show that the maximal solution converges to $\Phi^0$ as $\|f\|_\infty \to 0$. The same argument applies to $(\widetilde{P}_{\mathrm{Im}}, \widetilde{Q})$, with $\Phi^0$ replaced by $\widetilde{\Phi}^0$.

Remark 21.
There is a useful observation that the coefficient of $x^d$ in $P_{\mathrm{Im}}$ is positive. By Eq. (43), expanding $e(z) = z^{-d} \prod_{r \in D} (z - r)$ in the monomial basis, only the terms $z^d$ and $\left(\prod_{r \in D} r\right) z^{-d}$ contribute to the leading order (here l.o. denotes any polynomial of degree $< d$). The specific choice of $D$ yields $1 + \prod_{r \in D} r > 0$. Hence the leading coefficient of $P_{\mathrm{Im}}$ is positive.

Relation to other constructive methods
There are two other methods for constructing complementary polynomials, which will be referred to as the GSLW method [6] and the Haah method [8], respectively. The two methods are equivalent. The GSLW method solves the problem without using Laurent polynomials. The Haah method first points out that the decomposition can be viewed more naturally in terms of Laurent polynomials.
The Haah method considers the Laurent polynomial in Eq. (55) and the corresponding decomposition, where $D$ contains all the roots within the unit disc.
In our construction, we may omit the term $Q_{\mathrm{Im}}$ due to the symmetry constraint. However, one can check that our method is also suitable for the case $Q_{\mathrm{Im}} \neq 0$ by simply replacing $\mathcal{F}(z)$ with the Laurent polynomial in Eq. (55). By the same argument as in Theorem 16, it can be shown that any admissible complementary polynomial $(P_{\mathrm{Re}}, Q_{\mathrm{Re}})$ must take the form of Eq. (43), with $Q$ replaced by $Q_{\mathrm{Re}}$. Hence our method can be generalized to find all admissible complementary polynomials by going through all possible $e(z) = z^{-d} \prod_{r \in D} (z - r)$. In particular, the Haah method considers only the special case where $D$ contains all the roots of $\mathcal{F}(z)$ within the unit disc, and therefore does not generate all admissible complementary polynomials.
We would like to clarify that Ref. [8] states that we can construct $P_{\mathrm{Im}}$ and $Q$ as long as the union (as a multiset) of a conjugate-closed multiset $D$ and its reciprocal $\{\frac{1}{r} : r \in D\}$ is the list of all the roots of Eq. (55), provided there is no parity requirement for $P_{\mathrm{Im}}$ and $Q$. After imposing the parity condition, according to Theorem 16, another necessary condition for suitable complementary polynomials is that $D$ is closed under additive inverse. Without it, the parity condition may not hold. This can be verified from the proof of Theorem 16.

Local convergence
We prove Theorem 5 in this section. Throughout this section, polynomials are expanded in the Chebyshev basis unless otherwise noted. This is more convenient than the expansion in the monomial basis.
According to the definition of the Chebyshev norms in Section 2.1 and the convention for the coefficients in Section 2.2, we list a few relations which exploit the weighted orthogonality and will be used frequently in the proof. By integrating the point-wise normalization condition $f(x)^2 + P_{\mathrm{Im}}(x)^2 + (1 - x^2) Q(x)^2 = 1$ against the weight function $\frac{1}{\sqrt{1 - x^2}}$, we obtain a Parseval's equality relating the Chebyshev coefficients of $f$, $P_{\mathrm{Im}}$, and $Q$ (Eq. (58)).

Sketch of the proof
We first give a sketch of the proof as a road map. For simplicity, we assume d is odd in the sketch, while the proof works for any d.
1. We first show that each element of $\Phi^* - \Phi^0$ is bounded in magnitude by $\frac{\pi}{4}$ by properly restricting $\|f\|_\infty$.

2. We further upper bound the distance between the reduced phase factors of $\Phi^*$ and $\Phi^0$. This step uses the inequality $|x| \le \frac{\pi}{2}\,|\sin(x)|$, which holds for any $|x| \le \frac{\pi}{2}$; the bound in the first step ensures that the inequality can be invoked.

3. Combining the previous results, we finally obtain a bound on the distance, whose upper bound can be further simplified to $\frac{\pi^2}{3}\,\|f\|_\infty^2$ by exploiting the estimate on $q_{d-1}$ (see Lemma 23). The statement in Theorem 5 then follows.

Recursion relation in terms of Chebyshev coefficients
In this subsection, we convert the recursion relation in terms of monomial coefficients into one in terms of Chebyshev coefficients. Furthermore, we derive some useful properties which will be used as intermediate steps in the proof of Theorem 5. Let $(P, Q)$ be a pair of polynomials satisfying the conditions in Theorem 1. The symmetric reduction procedure yields a sequence of polynomials $(P^{(\ell)}, Q^{(\ell)})$ satisfying the conditions, and an associated sequence of phase factors. In particular, $(P^{(0)}, Q^{(0)}) = (P, Q)$. An important recursion relation generating the coefficients of the new polynomials in the monomial basis is given in Eq. (32), which holds whenever $d \ge 3$; recall that the boldface symbols refer to coefficients in the monomial basis. We will first rewrite this relation using the Chebyshev basis. As a remark, it suffices to consider the case $\ell = 0$; the general case can be recovered by substituting $(P, Q, P^{(1)}, Q^{(1)}, \varphi_0) \leftarrow (P^{(\ell)}, Q^{(\ell)}, P^{(\ell+1)}, Q^{(\ell+1)}, \varphi_\ell)$ in the arguments, as long as $\deg P^{(\ell)} \ge 3$.
For simplicity, let us write the Chebyshev polynomial in the monomial basis as $T_d(x) = 2^{d-1} x^d + t_d\, x^{d-2} + \mathrm{l.o.}$, where l.o. denotes any polynomial of degree at most $d-3$ and $t_d$ is a real number whose exact value is irrelevant in the proof.
Expanding the polynomials in the monomial basis and the Chebyshev basis and comparing the coefficients of the terms $x^d$ and $x^{d-2}$, we have $\mathbf{p}_d = 2^{d-1} p_d$, together with a corresponding relation for the coefficient of $x^{d-2}$.

An important observation follows from the fact that $q_{d-1} t_d \in \mathbb{R}$.
Plugging the previous relations in terms of Chebyshev coefficients into Eq. (32), we obtain a recursion relation in Eq. (60). The second equality implies a sign-preserving property and the monotonicity of the leading coefficients. Regarding the next phase factor $\varphi_1$, its defining equation is converted similarly to the Chebyshev basis. Furthermore, we can use the derived recursion relation in Eq. (60) to obtain the corresponding relation at the next level. The derived results can be generalized to $P^{(\ell)}, Q^{(\ell)}$ for $1 \le \ell \le \hat{d}$ ($\hat{d}$ is defined by Eq. (35)). We summarize these results as the following theorem.

Theorem 22.
1. For any $1 \le \ell \le \hat{d}$, the leading coefficients of $P^{(\ell)}$ and $Q^{(\ell)}$ satisfy the recursion relation derived above.

2. For any $0 \le \ell \le \hat{d}$, the stated bounds on the leading coefficients hold, and a further relation holds if $1 \le \ell \le \hat{d}$.

3. (The monotonicity of the leading coefficient of $Q^{(\ell)}$ in the Chebyshev basis.) If the leading coefficient $q^{(\ell)}_{d-1-2\ell}$ is positive, the monotonicity stated below holds.

Proof. The first equality is obtained by direct computation. Comparing the constant terms and taking the modulus on both sides, then applying the Parseval's equality derived from the normalization condition and plugging the result into Eq. (66), we obtain that the relevant cross term is purely imaginary, which yields Eq. (64).

Bounding Chebyshev coefficients of the maximal solution
In this subsection, we assume that $(P, Q)$ is the admissible pair corresponding to the maximal solution. We will derive bounds on the leading coefficients of $Q$ and $P_{\mathrm{Im}}(x)$ in the Chebyshev basis, i.e., $q_{d-1}$ and $s_d$, respectively.
Lemma 23. The leading coefficient of $Q$ in the Chebyshev basis satisfies the estimate stated below.

Proof. Expanding the polynomials in the monomial basis, the normalization condition reduces to an identity between the leading coefficients up to l.o. terms, where l.o. denotes any polynomial of degree less than $2d$. This yields an upper bound on $q_{d-1}$; in the second step, we exploit Eq. (58), the Parseval's equality derived from the normalization condition.
To lower bound $q_{d-1}$, we first remark that Eq. (47) can be rewritten in terms of $K := -\prod_{r \in D} r > 0$, where $D$ contains all roots of $\mathcal{F}(z)$ within the unit disc for the maximal solution.
Given a polynomial fully factored as $g(z) = \gamma \prod_{i=1}^{n} (z - a_i) \in \mathbb{C}[z]$, its Mahler measure [15] is defined as $M(g) := |\gamma| \prod_{i=1}^{n} \max(1, |a_i|)$. By Jensen's formula [10], this quantity is equal to the geometric mean of $|g(z)|$ on the unit circle, $M(g) = \exp\left(\frac{1}{2\pi} \int_0^{2\pi} \log\left|g(e^{i\theta})\right| \,\mathrm{d}\theta\right)$. Remarkably, $|\beta| \frac{1}{K}$ is the Mahler measure of a polynomial $h(z)$ defined in terms of the decomposition above. Therefore, one obtains the bound in Eq. (71), which completes the proof.
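A quick numerical check of Jensen's formula for the Mahler measure, on a made-up polynomial (our own code):

```python
import numpy as np

# Made-up polynomial g(z) = 2(z - 0.5)(z - 3)(z - 1.5j)(z + 1.5j).
gamma = 2.0
roots = np.array([0.5, 3.0, 1.5j, -1.5j])
M_roots = abs(gamma) * np.prod(np.maximum(1.0, np.abs(roots)))   # |gamma| * prod max(1, |a_i|)

coeffs = gamma * np.poly(roots)                        # polynomial coefficients, highest degree first
theta = np.linspace(0, 2 * np.pi, 20001, endpoint=False)
vals = np.polyval(coeffs, np.exp(1j * theta))
M_circle = np.exp(np.mean(np.log(np.abs(vals))))       # geometric mean of |g| on the unit circle

print(M_roots, M_circle)                               # the two values agree (here both are 13.5)
```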
Remark 24 (Convergence of the maximal solution towards $\Phi^0$). The derived bounds on $q_{d-1}$ consequently yield bounds on $s_d$. Using the equality $f_d^2 + s_d^2 = q_{d-1}^2$, we get a corresponding estimate for $s_d$. Combining the derived results, the remaining Chebyshev coefficients of $P_{\mathrm{Im}}$ and $Q$ can be upper bounded as in Eq. (74).

As a consequence of the upper bound, in the limit $\|f\|_\infty \to 0$ it follows that $s_d = q_{d-1}$ (since $f_d \to 0$), and the normalization condition yields $s_d = q_{d-1} = 1$ when the leading coefficient of $Q$ is fixed to be positive.

Proof of Theorem 5
In this subsection, we will prove Theorem 5 based on the previous results. We start from a technical lemma strengthening the results in Theorem 22 under additional assumptions.
Proof. The first statement directly follows from Lemma 23.
If $d$ is even, following Theorem 22 statement 3 and invoking the first statement, we obtain the corresponding bound. The second statement of the previous lemma ensures that the inequality $\frac{\pi}{2}\,|\sin(x)| \ge |x|$, valid for any $|x| \le \frac{\pi}{2}$, can be invoked. Now we are ready to prove Theorem 5.
Proof of Theorem 5. We first estimate $\sin\left(2\varphi_0 - \frac{\pi}{2}\right)^2$, using the inequality established above. Here, the monotonicity in Theorem 22 implies that $q_{d+1-2\ell} \ge 0$. If $d$ is odd, one can derive an upper bound by telescoping and Lemma 25, and the claimed bound follows. If $d$ is even, we have to treat $\varphi_{\tilde{d}-1}$ separately, using Theorem 22 statement 3.

We next turn to the Hessian matrix at $\Phi^0$ (Theorem 26). Consider a monomial $M_{d_1, d_2, d_3}(x)$ built from $W(x)$ factors and $Z$ gates. We will show that the upper-left element $\langle 0 | M_{d_1, d_2, d_3}(x) | 0 \rangle$ is real. Following the anticommutation relation $ZXZ = -X$, a useful equality can be derived: $Z W(x) Z = e^{i \arccos(x) ZXZ} = e^{-i \arccos(x) X} = W(x)^{-1}$. Consequently, by moving $Z$ gates to the same site for cancellation, it can be shown that $\langle 0 | M_{d_1, d_2, d_3}(x) | 0 \rangle$ is real. Note that taking a second-order derivative of the QSP unitary at $\Phi^0$ results in an integer-coefficient superposition of monomials of this form. To illustrate, given an odd integer $d = 2\tilde{d} - 1$ and $0 \le i < j < \tilde{d}$, one can verify this directly. Then, by taking the imaginary component, the second-order derivative $g_{ij}$ vanishes at $\Phi^0$. Thus, the second term on the right-hand side of Equation (38) vanishes. According to the discrete orthogonality of the Chebyshev nodes, and following Eq. (38), the Hessian matrix at $\Phi^0$ takes the form stated in the theorem, which completes the proof.
To show that the Hessian matrix is bounded from below, we need the following lemma regarding the perturbation analysis to the singular values of a matrix.
Lemma 27 ([7, Corollary 8.6.2]). Given $A, A+E \in \mathbb{R}^{m \times n}$ with $m \ge n$, we have $|\sigma_k(A+E) - \sigma_k(A)| \le \|E\|_2$ for each $k$. Next, we bound the perturbation of the Jacobian matrix in terms of the perturbation of the phase factors.

Lemma 28. Given any two sets of symmetric phase factors, the corresponding Jacobian matrices differ by at most a constant multiple of the distance between their reduced phase factors.
Proof. Recall that in the proof of Theorem 26, we have shown that taking the derivative with respect to $\varphi_i$ yields terms obtained by inserting an $iZ$ factor to the left of $e^{i\varphi_i Z}$. Since $iZ = e^{i\frac{\pi}{2} Z}$, this shifts the $i$-th site by $\frac{\pi}{2}$, which essentially breaks the symmetry constraint. That means we have to use an asymmetric set of phase factors to characterize the derivative. To capture this, we define the evaluation map for asymmetric phase factors as follows. Given an asymmetric set of phase factors $\Phi^{\mathrm{as}} := (\phi_0, \phi_1, \cdots, \phi_d)$, we define $U^{(\mathrm{as})}(x, \Phi^{\mathrm{as}}) := e^{i\phi_0 Z} e^{i \arccos(x) X} e^{i\phi_1 Z} e^{i \arccos(x) X} \cdots e^{i\phi_{d-1} Z} e^{i \arccos(x) X} e^{i\phi_d Z}$.
We use the superscript $(\mathrm{as})$ to distinguish it from $U(x, \Phi)$, which takes the reduced phase factors of a symmetric set as its argument.
When $d$ is odd, we define an asymmetric set of phase factors which unrolls the full set of phase factors and adds a $\frac{\pi}{2}$ shift on the $i$-th site. As a remark, the shifted site $i$ can be greater than $\tilde{d}$, in which case the phase shift is added on the right half.
Taking the derivative and using unitarity yields the corresponding bound. To further take the second derivative, let us first introduce a double-shifted asymmetric set of phase factors, defined separately for $0 \le i < j < \tilde{d}$ and for $0 \le i = j < \tilde{d}$; similarly, when $i$ or $j \ge \tilde{d}$, the phase shift is added on the right half accordingly.
With the defined notation, we can compute the second derivative, and unitarity yields the corresponding bound for any given odd integer $d$ and symmetric $\Phi \in \mathbb{R}^{d+1}$. When $d$ is even, the full set of phase factors is $(\varphi_0, \ldots, \varphi_{\tilde{d}-1}, \ldots, \varphi_0)$. The only difference in the unrolled structure compared to the odd case is that there is a single unpaired phase factor $\varphi_{\tilde{d}-1}$ in the middle. Therefore, when the derivative only involves $\varphi_0, \cdots, \varphi_{\tilde{d}-2}$, the analysis is exactly the same as in the odd case.
When $i = \tilde{d}-1$, taking the derivative with respect to $\varphi_{\tilde{d}-1}$ is equivalent to shifting this site by $\frac{\pi}{2}$, yielding a set $\Psi$ of symmetric phase factors (and therefore we use a different notation). Then, taking the second derivative of $g(x, \Phi)$ is equivalent to taking the first derivative of $g(x, \Psi)$, whose analysis is the same as in the previous case. These results lead to the desired inequality; thus, the inequality in Eq. (81) also holds when $d$ is even.
To conclude, for any given integer $d$ and symmetric $\Phi \in \mathbb{R}^{d+1}$, we obtain a uniform upper bound on the gradient of $g_j(x, \Phi)$. Following the mean value theorem, there exists $t \in (0, 1)$ such that the stated bound holds for any $j \in [\tilde{d}]$ and $x \in [-1, 1]$; the lemma then follows. The perturbation bound for the singular values of the Jacobian matrix is obtained by directly applying Lemma 27.
Proof. Applying Lemma 27, using the value of $\sigma_{\max}(A(\Phi^0))$ obtained as a consequence of Theorem 26, and recalling that the Hessian matrix at the maximal solution satisfies $\mathrm{Hess}(\Phi^*) = (A^*)^\top A^*$, the claimed bound follows, which completes the proof.

Lower bounding the distance to the initial point
In the last subsection, we showed that the Hessian matrix is positive definite at the maximal solution. To show the positive definiteness of the Hessian matrix in a neighborhood of the optimum, we need the following lower bound as an intermediate step.

Theorem 31. Given any symmetric phase factor
Proof. We denote the expansion of the polynomial $g(x, \Phi)$ in the Chebyshev basis as in Eq. (19). For simplicity, we drop the $\Phi$-dependence in the Chebyshev coefficients $g_k$ (in particular, $g_k$ here is not the derivative of $g(x, \Phi)$ with respect to $\varphi_k$ as in the previous section). Then we apply Eq. (26) to estimate $q_{d-1}$, the leading coefficient of $Q(x)$. Notice that $\cos(\varphi_\ell)$ is positive for every $\ell$ in the relevant range. We use Eq. (26) and the inequality $\cos(\varphi_{\tilde{d}-1}) \ge \cos(\varphi_{\tilde{d}-1})^2$ when $d$ is even, and obtain a positive lower bound. Hence $\deg(Q) = d-1$ and the leading coefficient of $Q$ is positive. Now we are able to apply Theorem 22 and Lemma 23. According to the reduction procedure, the leading Chebyshev coefficients and the phase factors are connected as stated, where the last inequality follows from $|q_{d-1}| \le 1$. Following Eq. (36), for any $1 \le \ell \le \hat{d}$, a corresponding bound holds, where the last inequality follows from the monotonicity in Theorem 22. We then derive a bound involving $g_d$, the leading Chebyshev coefficient of $g(x, \Phi)$. When $d$ is odd, we use the fact that $q^{(\hat{d})}_0 = 1$, obtained from the proof of Lemma 14. When $d$ is even, the dangling unpaired phase factor $\varphi_{\tilde{d}-1}$ has to be analyzed individually. The residual term is estimated using the discrete orthogonality of the Chebyshev polynomials, and we obtain the claimed inequality.

Algorithm 1: Projected gradient descent algorithm for optimizing symmetric phase factors (with performance guarantee)

Input:
A target real polynomial $f$ of degree $d$ with definite parity and $\|f\|_\infty \le \frac{\sqrt{3}}{20\pi d}$; an initial guess $\Phi^0 = \left(\frac{\pi}{4}, 0, \cdots, 0\right) \in S$ (in terms of reduced phase factors); and an error tolerance $\epsilon$.
Output: a set of reduced phase factors $\Phi$ such that $F(\Phi) \le \epsilon$.
Evaluate the objective function $F(\Phi)$ at $\Phi^0$ according to Eq. (5).
By the cited convergence result [Theorem 3.10], the projected gradient method with step size $\frac{1}{L}$ converges exponentially; here $\Phi^\ell$ denotes the set of reduced phase factors at the $\ell$-th iteration step. In particular, $\Phi^\ell \in S$ for all $\ell$ follows from the construction of the algorithm.
Let $\Phi^\ell$ be the set of reduced phase factors at the $\ell$-th iteration step. Note that $F(\Phi^*) = 0$, $\nabla F(\Phi^*) = 0$, and $\Phi^* \in S$. Following Taylor's theorem, there exists $s \in (0, 1)$ such that the remainder is controlled by the Hessian at an intermediate point. Because $\Phi^\ell, \Phi^* \in S$ and $S$ is convex, the intermediate point $s \Phi^\ell + (1-s) \Phi^*$ also lies in $S$, and the Hessian there is upper bounded by Theorem 6. Furthermore, using Corollary 7, to obtain $F(\Phi^\ell) \le \epsilon$ it suffices to take $\ell = \mathcal{O}\left(\log \frac{1}{\epsilon}\right)$ iterations. Remarkably, Algorithm 1 does not require a large number of bits. To see the claim, we consider the model of finite precision arithmetic (see the standard axioms in [9]). The parameter $u$ is referred to as the unit roundoff. In each iteration step, the rounding error incurred when computing the function $F$ and its gradient is $\mathcal{O}(d^2 u)$. Furthermore, the projection needs to be executed when the proposed new $\Phi$ lies outside $S$. Under this circumstance, the possible error amplification from taking square roots and divisions does not contribute significantly, and the total rounding error accumulated in each iteration step remains $\mathcal{O}(d^2 u)$. To conclude, the numerical error of the algorithm is upper bounded by invoking the triangle inequality, so that there exists some constant $C$ controlling the accumulated error. Therefore, the number of bits required by Algorithm 1 is $\mathcal{O}\left(\log d \, \log(1/\epsilon)\right)$, the algorithm is numerically stable, and it suffices to use standard double precision arithmetic operations.
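For concreteness, here is a minimal runnable sketch of Algorithm 1 (our own code: the gradient is approximated by finite differences for brevity, whereas the algorithm evaluates it directly; the target is a made-up example at the quoted max-norm threshold, and the trust-region radius follows the set quoted in the discussion section).

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def g(x, red):
    phi = np.concatenate([red, red[::-1]])             # symmetric phase factors (d odd)
    W = np.cos(np.arccos(x)) * np.eye(2) + 1j * np.sin(np.arccos(x)) * X
    U = np.cos(phi[0]) * np.eye(2) + 1j * np.sin(phi[0]) * Z
    for p in phi[1:]:
        U = U @ W @ (np.cos(p) * np.eye(2) + 1j * np.sin(p) * Z)
    return U[0, 0].real

def cost_and_grad(red, fvals, xk, h=1e-6):
    F = np.mean([(g(x, red) - fv) ** 2 for x, fv in zip(xk, fvals)])
    grad = np.zeros_like(red)
    for i in range(len(red)):                          # finite-difference gradient, for brevity
        e = np.zeros_like(red); e[i] = h
        Fp = np.mean([(g(x, red + e) - fv) ** 2 for x, fv in zip(xk, fvals)])
        Fm = np.mean([(g(x, red - e) - fv) ** 2 for x, fv in zip(xk, fvals)])
        grad[i] = (Fp - Fm) / (2 * h)
    return F, grad

d = 11
dt = (d + 1) // 2
xk = np.cos((2 * np.arange(1, dt + 1) - 1) * np.pi / (4 * dt))
fvals = (np.sqrt(3) / (20 * np.pi * d)) * np.cos(d * np.arccos(xk))   # ||f||_inf at the threshold

red0 = np.concatenate(([np.pi / 4], np.zeros(dt - 1)))
L, radius = 25.0 / 4.0, 1.0 / (20 * d)
red = red0.copy()
for it in range(200):
    F, grad = cost_and_grad(red, fvals, xk)
    red = red - grad / L                               # gradient step with step size 1/L
    delta = red - red0
    if np.linalg.norm(delta) > radius:                 # project back onto the ball around Phi^0
        red = red0 + radius * delta / np.linalg.norm(delta)
F, _ = cost_and_grad(red, fvals, xk)
print(F)                                               # decreases toward 0: the maximal solution is reached
```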

Additional numerical results
In this section, we present some additional numerical results demonstrating some of the features of the energy landscape of the optimization problem for symmetric phase factors in Eq. (5).

Hessian matrix without symmetry constraint is singular at global optima
When the symmetry constraint is not imposed, the Hessian matrix can be derived from Eq. (38). Note that the Jacobian matrix $A \in \mathbb{R}^{\tilde{d} \times (d+1)}$ does not have full column rank because $d + 1 > \tilde{d}$. Then, since $\ker(A^\top A) = \ker(A)$, we have $\mathrm{rank}(A^\top A) \le \tilde{d} < d+1$. Note that at the global optima, the Hessian matrix is precisely $\mathrm{Hess}(\Phi^*) = (A^*)^\top A^*$ due to the vanishing residual term $R = 0$. Here, $A^*$ refers to the Jacobian matrix at the optimum. Thus, the Hessian matrix at the optimum is singular. Because the residual term $R$ is small near the optimum by continuity, the Hessian matrix is almost singular in a neighborhood of the optimum. This behavior is demonstrated by the numerical results displayed in Fig. 1 (a). On the other hand, when the symmetry constraint is imposed, Fig. 1 (b) shows that the Hessian matrix is positive definite in a neighborhood of the maximal solution, which agrees with our theoretical results about the local strong convexity of the cost function.
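This near-singularity can be reproduced with a small finite-difference experiment (our own code; we take $f = 0$ so that the padded $\Phi^0$ is a global optimum of the unconstrained problem):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def g_full(x, phi):
    """g(x, Phi) for a full (possibly asymmetric) set of phase factors."""
    W = np.cos(np.arccos(x)) * np.eye(2) + 1j * np.sin(np.arccos(x)) * X
    U = np.cos(phi[0]) * np.eye(2) + 1j * np.sin(phi[0]) * Z
    for p in phi[1:]:
        U = U @ W @ (np.cos(p) * np.eye(2) + 1j * np.sin(p) * Z)
    return U[0, 0].real

d = 5
dt = (d + 1) // 2
xk = np.cos((2 * np.arange(1, dt + 1) - 1) * np.pi / (4 * dt))
fvals = np.zeros(dt)                                   # f = 0

def cost(phi):
    return np.mean([(g_full(x, phi) - fv) ** 2 for x, fv in zip(xk, fvals)])

phi_star = np.concatenate(([np.pi / 4], np.zeros(d - 1), [np.pi / 4]))   # a global optimum
h, n = 1e-3, d + 1
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei = np.zeros(n); ei[i] = h
        ej = np.zeros(n); ej[j] = h
        H[i, j] = (cost(phi_star + ei + ej) - cost(phi_star + ei - ej)
                   - cost(phi_star - ei + ej) + cost(phi_star - ei - ej)) / (4 * h * h)
print(np.linalg.eigvalsh(H))   # at most tilde{d} eigenvalues are away from zero: the Hessian is singular
```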

Existence of local minima
To visualize the full energy landscape of the optimization problem, we first consider the simplest scenario, in which two phase factors suffice to solve the optimization problem exactly, and we choose two target polynomials. The energy landscape on the irreducible domain is displayed in Fig. 2, with the global minima derived from the proposed method annotated in the domain. In this special case, there are no local minima. However, this is a rare exception rather than the norm. The existence of local minima can be observed by slightly increasing the degree of the target polynomial, e.g., $f(x) = 0.2103\, T_4(x) + 0.1231\, T_2(x) + 0.1666$. A local minimizer $\Phi^{\mathrm{loc}}$ is found numerically by randomly initializing the stochastic gradient descent algorithm for Eq. (5). The value of the objective function at $\Phi^{\mathrm{loc}}$ is 0.0218. The Hessian matrix there has eigenvalues 0.1359, 4.5815, 7.7510, which indicates that $\Phi^{\mathrm{loc}}$ is indeed a local minimum. To visualize the landscape, we plot a two-dimensional section of the optimization landscape, namely the affine plane spanned by the eigenvectors corresponding to the two smallest eigenvalues of the Hessian matrix (Fig. 2). Next, we present an example demonstrating the complexity of the full landscape of the optimization problem. We choose $f(x) = \frac{1}{\alpha}\left(T_4(x) + 2 T_2(x) + T_0(x)\right)$ as the target polynomial. The scale factor $\alpha = 440$ is set large enough to satisfy the requirements of Theorem 6. We first found a global minimum around $\Phi^0 = \left(\frac{\pi}{4}, 0, 0\right)$ by taking $\Phi^0$ as the initial guess for Algorithm 1. We also numerically found two local minimizers by randomly initializing the stochastic gradient descent algorithm for Eq. (5). We use $\Phi^*$, $\Phi^1$, and $\Phi^2$ to denote the global minimum and the two local minima, respectively. The values of the objective function at $\Phi^1$ and $\Phi^2$ are both around 2.5769e-06. The Hessian matrices at both points are positive semidefinite, which indicates that they are indeed local minima. To visualize the landscape, we plot a two-dimensional section of the optimization landscape, namely the affine plane spanned by $\Phi^1 - \Phi^*$ and $\Phi^2 - \Phi^*$ (Fig. 4). It agrees with our conclusion that the energy landscape of the optimization problem remains complex and has many local minima as well as global minima. This confirms that even when $\|f\|_\infty$ is sufficiently small, the global energy landscape is still very complex, and the local energy landscape described by Theorem 6 is nontrivial.
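A scan of this kind is easy to reproduce. The sketch below (our own code, with a made-up degree-3 target, so the specific polynomials and figures of the paper are not reproduced) evaluates the sampled cost on a grid over the two reduced phase factors and locates a grid point near a global minimum.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def g(x, red):
    phi = np.concatenate([red, red[::-1]])            # d = 3: Phi = (phi0, phi1, phi1, phi0)
    W = np.cos(np.arccos(x)) * np.eye(2) + 1j * np.sin(np.arccos(x)) * X
    U = np.cos(phi[0]) * np.eye(2) + 1j * np.sin(phi[0]) * Z
    for p in phi[1:]:
        U = U @ W @ (np.cos(p) * np.eye(2) + 1j * np.sin(p) * Z)
    return U[0, 0].real

d, dt = 3, 2
xk = np.cos((2 * np.arange(1, dt + 1) - 1) * np.pi / (4 * dt))
f = lambda x: 0.3 * np.cos(d * np.arccos(x))          # made-up target 0.3*T_3(x)

grid = np.linspace(-np.pi, np.pi, 101)
F = np.array([[np.mean([(g(x, np.array([a, b])) - f(x)) ** 2 for x in xk])
               for b in grid] for a in grid])
i, j = np.unravel_index(np.argmin(F), F.shape)
print(F.min(), grid[i], grid[j])                      # grid point near a global minimum (value ~0)
```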

A Proof of Lemma 10
Proof. One can check that the claim holds in the base case. Then, we expand the product of matrices from left to right using $e^{i\varphi Z} = \cos(\varphi) + i \sin(\varphi) Z$ and $W(x) Z = Z W(x)^{-1}$. We want to show that the terms containing $\sin(\varphi_j)$ do not contribute to $p_d$ and $q_{d-1}$. We use l.o. to denote any polynomial, or polynomial-valued matrix, of degree $< d$. Here, we relax the definition of a polynomial and classify the entries of the matrices of interest as polynomials by treating $\sqrt{1 - x^2}$ as another variable. Note that the product in the second term is a matrix whose elements are polynomials of degree $\le d - 2$.
Using this fact, we may prove the claim inductively, which proves the lemma.

B Proof of Lemma 11
Proof. Since $d$ is odd and $\Phi$ is symmetric, Lemma 10 indicates that $q_{d-1} = \prod_{j=1}^{(d-1)/2} \cos^2(\varphi_j) \ge 0$.
If $\varphi_j \ne \frac{\pi}{2} + k\pi$ for all $1 \le j \le d-1$, where $k$ is any integer, then the leading Chebyshev coefficient of $Q$ is $q_{d-1}$, which is positive. Otherwise, there exist a positive integer $j_0$ and an integer $k$ such that $\varphi_{j_0} = \varphi_{d-j_0} = \frac{\pi}{2} + k\pi$, and either of the following equalities holds: $e^{i\varphi_{j_0} Z} = e^{i\varphi_{d-j_0} Z} = e^{i\frac{\pi}{2} Z}$ or $e^{i\varphi_{j_0} Z} = e^{i\varphi_{d-j_0} Z} = e^{-i\frac{\pi}{2} Z}$.
We can then perform a similar reduction, so that $U(x, \Phi)$ can be constructed from a new, shorter set of symmetric phase factors.

C Proof of Lemma 13
First, we consider the expansion of P (x) and Q(x) in the monomial basis, and derive some useful equalities from the normalization condition, which comes from the unitarity of U (x, Φ).
Here we use the convention that q k = p k = 0 if k < 0.
Proof of Lemma 13. Direct computation shows the stated identities; hence $Q^{(1)}(x) \in \mathbb{R}[x]$. The normalization condition is preserved due to unitarity. We can also verify that the parity of $P^{(1)}$ is the same as that of $P$, and the parity of $Q^{(1)}$ is the same as that of $Q$. Now we apply the results of Lemma 34 and first examine the coefficients of $P^{(1)}$; for its leading coefficient we use Eq. (100) in the second equality and Eq. (101) in the third equality. Then we also apply the results of Lemma 34 to examine the coefficients of $Q^{(1)}$, where we exploit Eq. (101). Furthermore, we find that $\mathbf{q}^{(1)}_{d-3}$ is positive if $d$ is odd.