Stable factorization for phase factors of quantum signal processing

This paper proposes a new factorization algorithm for computing the phase factors of quantum signal processing. The proposed algorithm avoids root finding of high degree polynomials by using a key step of Prony's method and is numerically stable in double precision arithmetic. Experimental results are reported for Hamiltonian simulation, eigenstate filtering, matrix inversion, and the Fermi-Dirac operator.


Background
This paper is concerned with the problem of quantum signal processing. Quantum computing mostly works with unitary operators, since quantum gates and circuits are unitary. However, in recent years, we have witnessed great progress in representing nonunitary operators efficiently with quantum circuits.
Let A be an N × N Hermitian matrix with N = 2^n and ‖A‖_2 < 1 (after scaling if needed). For simplicity, we only consider Hermitian matrices in this paper and refer the readers to [6,4] for more general cases. One of the most successful methods for representing A on a quantum circuit is the Hermitian block encoding, where U_A is a Hermitian unitary matrix of size (2^m · N) × (2^m · N), A is the top-left corner of U_A, and U_A can be implemented using a quantum circuit with n + m input qubits. In most quantum problems in scientific computing, such as Hamiltonian simulation, filtering, and quantum linear algebra [3,10,4,13], one is often interested in applying a function f of the Hermitian matrix A. A key question is whether there is an algorithm that builds the quantum circuit U_{f(A)} from the circuit U_A by using only the knowledge of the function f(x), treating U_A as a black box. This question is answered by the quantum eigenvalue transformation described in [9,6].
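The text describes the Hermitian block encoding only abstractly. One standard single-ancilla construction (an illustration, not necessarily the encoding used in the paper) takes U_A = [[A, B], [B, −A]] with B = (I − A²)^{1/2}, so that m = 1 and the circuit acts on n + 1 qubits. The numpy sketch below (sizes and scaling are arbitrary choices) verifies that this U_A is a Hermitian unitary with A in its top-left corner:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
N = 2 ** n                                   # N = 2^n

# A random Hermitian A, rescaled so that ||A||_2 = 1/2 < 1.
M = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (M + M.conj().T) / 2
A = A / (2 * np.linalg.norm(A, 2))

# Single-ancilla Hermitian block encoding U = [[A, B], [B, -A]],
# where B = (I - A^2)^{1/2} commutes with A (same eigenbasis).
lam, Q = np.linalg.eigh(A)
B = Q @ np.diag(np.sqrt(1.0 - lam ** 2)) @ Q.conj().T
U = np.block([[A, B], [B, -A]])

assert np.allclose(U, U.conj().T)            # Hermitian
assert np.allclose(U @ U, np.eye(2 * N))     # U^2 = I, hence unitary
assert np.allclose(U[:N, :N], A)             # A sits in the top-left corner
```

Because U is both Hermitian and an involution, it is automatically unitary, which is the property the quantum eigenvalue transformation relies on.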
To simplify the discussion, we assume that all Hermitian matrices mentioned below satisfy ‖A‖_2 < 1 and all functions defined on [−1, 1] satisfy ‖f‖_∞ < 1. The quantum eigenvalue transform proceeds as follows (see [8] for example for details).
• Split the function f(x) into the even and odd parts f_e(x) and f_o(x) on x ∈ [−1, 1].
• Approximate the even part f_e(x) with an even degree polynomial a_e(x) and implement a_e(A) with the circuit shown in Figure 1(b) with appropriate phase factors φ^e_0, . . . , φ^e_{d_e}, where d_e is equal to the degree of a_e(x). The odd part f_o(x) is approximated and implemented analogously with an odd degree polynomial a_o(x).
• Combine the circuits implementing the two components by linear combination of unitaries (LCU) [2].
The key remaining step is how to construct the phase factors φ_0, . . . , φ_d for an even or odd polynomial a(x) of degree d. This is answered by the quantum signal processing theorem [9,6,10]. In this paper, we use the following notational convention. It is often convenient to work with the variable t ∈ [−π, π] and to lift functions to the t space via the transform x = cos(t): a function given for t ∈ [0, π] is extended to t ∈ [−π, 0] analytically. For example, f(x) = x lifts to f(t) = cos(t). In the t variable, (1) can be written more compactly. We can also use the complex variable z = e^{it} and lift f(t) analytically to a Laurent polynomial in z, using cos(t) = (z + z^{−1})/2 and sin(t) = (z − z^{−1})/(2i). In the following discussion, we often work with the lifted functions a(z), b(z), c(z), and d(z) over the complex plane.
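The lifting convention can be checked numerically. The sketch below (grid and degree are arbitrary) verifies that cos(t) = (z + z^{−1})/2 on the unit circle, and that the degree-5 Chebyshev polynomial T_5(x) lifts to cos(5t):

```python
import numpy as np

t = np.linspace(0.0, np.pi, 101)
x = np.cos(t)                      # the lifting x = cos(t)
z = np.exp(1j * t)                 # the complex variable z = e^{it}

# cos(t) = (z + 1/z)/2 on the unit circle.
assert np.allclose((z + 1.0 / z) / 2, x)

# The degree-5 Chebyshev polynomial T_5(x) lifts to cos(5t).
T5 = np.polynomial.chebyshev.chebval(x, [0, 0, 0, 0, 0, 1])
assert np.allclose(T5, np.cos(5 * t))
```

This is the mechanism by which a polynomial of degree d in x becomes a trigonometric polynomial of degree d in t.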

Previous work
There are two main approaches for computing the phase factors. The first one [6,7,1] is based on polynomial factorization. Following the notation of [7], this approach starts by choosing a function b(·) that has the right parity and satisfies a^2(t) + b^2(t) < 1. Let {ξ_j} be the set of 2d roots of the Laurent polynomial 1 − a^2(z) − b^2(z) inside the unit circle. Define e(z) = z^{−d} ∏_{|ξ_j|<1}(z − ξ_j). By setting α ≡ (1 − a^2(z) − b^2(z))/(e(z)e(1/z)), the functions c(z) and d(z) are then determined explicitly [7]. With a(z), b(z), c(z), and d(z) available, p(z) = a(z) + ic(z) and r(z) = b(z) + id(z) as defined in (2). Given p(z) and r(z), the algorithm for extracting the phase factors Φ = (φ_0, . . . , φ_d) is quite straightforward (see for example Theorem 3 of [5]); for completeness, it is also included in Section 3.1 following our notation. Though this approach is direct, the implementation requires finding roots of a high degree Laurent polynomial, which is often unstable in double precision arithmetic. It was shown in [7] that O(d log(d/ε)) classical bits are needed to reach accuracy ε, so the algorithms are often implemented with variable precision arithmetic. In [1], an algorithm based on the halving and capitalization techniques is proposed to mitigate the numerical issue; it scales to more than 3000 phase factors.
The second approach is based on optimization [4], i.e., directly minimizing the distance between the target polynomial and the function represented by the phase factors, with the search space restricted to the symmetric phase factors Φ (i.e., φ_j = φ_{d−j}). Though this minimization problem is highly non-convex, [4] demonstrates numerically that, starting from the initial guess Φ^0 = (π/4, 0, . . . , 0, π/4), a quasi-Newton method is able to find the maximal solution, which corresponds to b(t) = 0 in our notation. The numerical results in [4] demonstrate robust computation of up to 10000 phase factors. A recent study [14] proves that for ‖a‖_∞ ≤ O(1/d) a projected gradient method converges to the maximal solution.

Contribution
The main contribution of this paper is a new stable algorithm of the factorization approach. It is based on two observations. First, the factorization approach does not really need the roots {ξ_j} themselves, since the function e(z) = z^{−d} ∏_{|ξ_j|<1}(z − ξ_j) only depends on the characteristic polynomial ∏_{|ξ_j|<1}(z − ξ_j) of these roots. We show that this characteristic polynomial can be computed directly via a key component of Prony's method [12,11], without knowing the roots {ξ_j}. This avoids the root-finding, which is the main source of instability of the factorization approach. Second, in order to compute the characteristic polynomial in a robust way, we propose to pick b(z) randomly with a dominant highest frequency, i.e., in some sense opposite to the symmetric phase factors. This allows us to compute the characteristic polynomial using standard numerical linear algebra routines.
The resulting algorithm is conceptually simple and easy to implement. On the numerical side, compared with the state-of-the-art results in [4], our algorithm achieves comparable accuracy (∼10^{−12}) and has the same O(d^2) computational cost. The longest sequence reported in our experiments contains over 50000 phase factors.
The rest of the paper is organized as follows. Section 2 reviews Prony's method. Section 3 describes the main algorithms. The numerical results are given in Section 4.

Review of Prony's method
Let us explain Prony's method with a simple but key example. Let (f_k)_{k∈Z} be a sequence of the form f_k = ∑_{j=1}^{d} r_j e^{iω_j k}, where d is the number of terms, {ω_j} are the frequencies, and {r_j} are the weights. Assume that d, {ω_j}, and {r_j} are all unknown to us. The computational problem is to recover d, {ω_j} (up to 2π), and {r_j} from potentially noisy values of (f_k)_{k∈Z}.
Prony's method starts by considering the infinite vector [e^{iω_j k}]_{k∈Z} for some j and the upward shift operator S defined by (Sv)_k = v_{k+1}. Applying S to this vector gives S[e^{iω_j k}]_k = e^{iω_j} [e^{iω_j k}]_k, so the operator S − e^{iω_j} annihilates it. Taking the product over all j shows that m(S) ≡ ∏_j (S − e^{iω_j}) annihilates every vector [e^{iω_j k}]_k, where m(z) ≡ ∏_j (z − e^{iω_j}) = m_0 + m_1 z + ⋯ + m_d z^d. Taking the linear combination of the vectors over j with the unknown weights r_j gives m(S)f = 0 for the vector f = [f_k]_{k∈Z}. Written out entrywise, the last equality becomes the linear system ∑_{l=0}^{d} m_l f_{k+l} = 0 for every k, i.e., the matrix with entries f_{k+l} applied to the coefficient vector (m_0, . . . , m_d) is zero; this is the linear system (5). The final linear system contains a great deal of information.
• The rank of the matrix in (5) gives d.
• Any non-zero vector in the null space of the matrix in (5) gives the coefficients m_0, . . . , m_d of the polynomial m(z).
• The roots of m(z) give {e^{iω_j}}.
• Solving a least-squares problem with the recovered nodes {e^{iω_j}} (a Vandermonde system) gives the weights {r_j}.
Though we describe Prony's method using infinite vectors, it is clear that only d + 1 rows of the matrix are needed. Due to the shifting nature of the matrix, only 2d + 1 consecutive values of (f_k) are required. The main advantages of Prony's method are that (1) it is adaptive in the sense that {ω_j} do not need to fall on any discrete grid, (2) it is conceptually simple, and (3) it leverages standard numerical routines such as root-finding and null-space computation. The main disadvantage is that root-finding can often be unstable when noise is present.
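The steps above can be sketched in a few lines of numpy (the frequencies, weights, and sample count are arbitrary illustrative choices; the noiseless case is shown):

```python
import numpy as np

# Synthetic signal f_k = sum_j r_j e^{i omega_j k} with d = 3 terms.
d = 3
omega = np.array([0.7, 1.9, -2.4])      # frequencies, not on any grid
r = np.array([1.0, -0.5, 0.3])          # weights
k = np.arange(2 * d + 1)                # 2d + 1 consecutive samples
f = (r * np.exp(1j * np.outer(k, omega))).sum(axis=1)

# (d+1) x (d+1) matrix of shifted samples; its null vector is (m_0..m_d).
H = np.array([f[i:i + d + 1] for i in range(d + 1)])
m = np.linalg.svd(H)[2][-1].conj()      # right singular vector of smallest s.v.

# Roots of m(z) recover the nodes e^{i omega_j}.
nodes = np.roots(m[::-1])
assert np.allclose(np.sort(np.angle(nodes)), np.sort(omega), atol=1e-8)

# The weights follow from a Vandermonde least-squares problem.
V = nodes[None, :] ** k[:, None]
r_rec = np.linalg.lstsq(V, f, rcond=None)[0]
assert np.allclose(r_rec[np.argsort(np.angle(nodes))],
                   r[np.argsort(omega)], atol=1e-8)
```

In the phase-factor algorithm of Section 3, the same machinery is applied with nodes ξ_j strictly inside the unit circle instead of on it.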

Key components
We start by choosing the function b(t) to be of the form b(t) = b_d sin(dt) + b_{d−2} sin((d − 2)t) + ⋯. Here the leading coefficient b_d is the dominant one and the rest of the coefficients (b_{d−2}, . . .) are chosen randomly. The reason for doing so will be explained below.
Recall that the key function of the factorization approach is e(z) = z^{−d} ∏_{|ξ_j|<1}(z − ξ_j). The first idea is that it is possible to compute the characteristic polynomial ∏_{|ξ_j|<1}(z − ξ_j) directly without first calculating the roots. This avoids the root-finding, which is the main source of instability of the factorization approach. A simple but key observation is that these roots are the poles of the reciprocal g(z) ≡ (1 − a^2(z) − b^2(z))^{−1}. Since g(z) is meromorphic, g(z) takes the partial fraction form g(z) = ∑_j w_j/(ξ_j − z), where the sum is taken over the roots both inside and outside D. Let us consider the integrals (1/2πi) ∮_γ g(z) z^{−k} (dz/z) for integer values of k ≤ −1, where γ is the boundary of D in the counterclockwise orientation. For a fixed k ≤ −1,
(1/2πi) ∮_γ g(z) z^{−k−1} dz = ∑_j (1/2πi) ∮_γ w_j z^{−k−1}/(ξ_j − z) dz = −∑_{|ξ_j|<1} w_j ξ_j^{−k−1},
where the second equality relies on the analyticity of w_j z^{−k−1}/(ξ_j − z) in D for |ξ_j| > 1 and the third equality uses the residue theorem at the poles {ξ_j} inside D. This computation shows that the integrals for k ≤ −1 contain important information about the poles inside D.
The integral (1/2πi) ∮_γ g(z) z^{−k} (dz/z) over the unit circle is exactly the k-th Fourier coefficient of the function g(t) ≡ g(e^{it}):
ĝ_k = (1/2π) ∫_{−π}^{π} g(t) e^{−ikt} dt = (1/2πi) ∮_γ g(z) z^{−k} (dz/z) = −∑_{|ξ_j|<1} w_j ξ_j^{−k−1}, for k ≤ −1.
In order to recover the characteristic polynomial ∏_{|ξ_j|<1}(z − ξ_j) (the key part of e(z)), we apply Prony's method to these Fourier coefficients. Slightly different from the description in Section 2, we define the semi-infinite (instead of infinite) vector ĝ_− ≡ (ĝ_{−1}, ĝ_{−2}, ĝ_{−3}, . . .) = −∑_{|ξ_j|<1} w_j (1, ξ_j, ξ_j^2, . . .). Let S be the shift operator that shifts a semi-infinite vector upward (i.e., dropping the first element). For any ξ_j with |ξ_j| < 1, (S − ξ_j)(1, ξ_j, ξ_j^2, . . .) = 0. Since the operators S − ξ_j all commute, the product ∏_{|ξ_j|<1}(S − ξ_j) annihilates every one of these vectors. Since ĝ_− is a linear combination of such semi-infinite vectors with weights {−w_j}, it follows that ∏_{|ξ_j|<1}(S − ξ_j) ĝ_− = 0. Since b(z) is chosen randomly, with probability 1 the roots {ξ_j} are distinct, and the polynomial ∏_{|ξ_j|<1}(z − ξ_j) is of degree 2d. Denoting it by m(z) = m_0 + m_1 z + ⋯ + m_{2d} z^{2d}, the relation m(S) ĝ_− = 0 reads ∑_{l=0}^{2d} m_l ĝ_{−(k+l)} = 0 for all k ≥ 1, i.e., the semi-infinite Hankel linear system (10) for the coefficient vector (m_0, . . . , m_{2d}).
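The identity ĝ_{−k} = −∑_{|ξ_j|<1} w_j ξ_j^{k−1} (assuming the Fourier convention ĝ_k = (1/2π)∫ g(t) e^{−ikt} dt) can be checked on the simplest possible example, g(z) = 1/(ξ − z) with a single pole ξ inside the unit circle, for which w = 1. The grid size below is arbitrary:

```python
import numpy as np

xi = 0.5                                # one pole inside the unit circle
N = 256
t = 2 * np.pi * np.arange(N) / N
g = 1.0 / (xi - np.exp(1j * t))         # g(e^{it}) sampled on the circle

ghat = np.fft.fft(g) / N                # trapezoidal rule == FFT here
for k in range(1, 8):
    # negative Fourier coefficients: ghat_{-k} = -w * xi^{k-1} with w = 1
    assert np.allclose(ghat[-k], -xi ** (k - 1))
```

The sequence (ĝ_{−1}, ĝ_{−2}, . . .) is geometric in the pole location, which is exactly the structure Prony's method exploits.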

Phase factors from p(z).
The construction of the actual phase factors is given as follows, essentially following Theorem 3 of [5] but in terms of the t variable.
For each n = d down to 0, perform the following two steps.

• In the t variable, p(t) and r(t) are trigonometric polynomials of degree n. Write p(t) = p_n e^{int} + ⋯ and r(t) = r_n e^{int} + ⋯, where p_n and r_n are the degree-n coefficients. Solve for φ_n from e^{2iφ_n} = p_n/r_n.

• Transform p(t) and r(t) via the update of Theorem 3 of [5]. This brings the top coefficients of p(t) and r(t) to zero, hence reducing the degree by one.
Within this loop, the switch between the angular function p(t) and the coefficients {p_j}_{−n≤j≤n} can be done with the fast Fourier transform (FFT). Because of the O(d log d) complexity of the FFT, the overall cost of this loop is O(d^2 log d).

Robust computation of polynomial coefficients
The remaining issue is to compute (m_0, . . . , m_{2d}) in a numerically stable way, which is in fact not always guaranteed. Consider for example the case where a(z) has negligible coefficients at high frequencies. If we set b(z) = 0, then g(z) = (1 − a^2(z) − b^2(z))^{−1} = (1 − a^2(z))^{−1} might lack high frequency content. This implies that the coefficients ĝ_k might be negligible for large |k|. A direct consequence is that the matrix in (10) might have a numerical rank much smaller than 2d. In order to resolve this issue, we choose b(z) to have a large leading coefficient as suggested in (6). This is the second contribution of this paper. Figure 2 illustrates the difference between b(t) = 0 and b(t) ∼ sin(dt) + ⋯ for the Fermi-Dirac operator (the last example in Section 4) at β = 100. Notice that the leading term sin(dt) in b(t) introduces a dominant anti-diagonal in the matrix of (10), ensuring that it has numerical rank 2d.
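To illustrate, the following sketch builds the Hankel matrix for a hypothetical low-degree instance (d = 2, coefficients picked by hand in the spirit of the dominant-leading-term choice) and checks both the numerical rank 2d and that the null vector reproduces the characteristic polynomial obtained by direct root-finding:

```python
import numpy as np

# Hypothetical low-degree instance: d = 2, b has a dominant leading term.
d, N = 2, 512
t = 2 * np.pi * np.arange(N) / N
a = 0.20 * np.cos(t) + 0.10 * np.cos(2 * t)
b = 0.30 * np.sin(2 * t) + 0.10 * np.sin(t)

g = 1.0 / (1.0 - a ** 2 - b ** 2)       # bounded well away from singular
ghat = np.fft.fft(g) / N                # Fourier coefficients of g(t)

# Hankel matrix H[i, j] = ghat_{-(i + j + 1)} with l = 2d + 2 rows.
H = np.array([[ghat[-(i + j + 1)] for j in range(2 * d + 1)]
              for i in range(2 * d + 2)])
_, s, Vh = np.linalg.svd(H)
assert s[2 * d - 1] / s[0] > 1e-8       # numerical rank is at least 2d ...
assert s[2 * d] / s[0] < 1e-10          # ... with a large gap after s_{2d}

# The null vector carries the characteristic polynomial m(z); its roots
# agree with the roots of 1 - a(z)^2 - b(z)^2 inside the unit circle.
m = Vh[-1].conj()
inside = np.roots(m[::-1])
inside = inside[np.argsort(np.angle(inside))]

# Reference: explicit Laurent coefficients of 1 - a(z)^2 - b(z)^2.
acoef = np.array([0.05, 0.10, 0.0, 0.10, 0.05])        # powers z^-2..z^2
bcoef = np.array([0.15j, 0.05j, 0.0, -0.05j, -0.15j])  # powers z^-2..z^2
q = -(np.convolve(acoef, acoef) + np.convolve(bcoef, bcoef))
q[4] += 1.0                                            # add the constant 1
ref = np.roots(q[::-1])
ref = ref[np.abs(ref) < 1.0]
ref = ref[np.argsort(np.angle(ref))]
assert len(ref) == 2 * d
assert np.allclose(inside, ref, atol=1e-6)
```

No root-finding of the high degree Laurent polynomial is needed in the algorithm itself; the `np.roots` call here only serves as an independent reference for the toy instance.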

Implementation
To implement this algorithm numerically, we need to take care of several issues.
• The computation in (8) requires the Fourier coefficients of g(t) = (1 − a^2(t) − b^2(t))^{−1}. When a^2(t) + b^2(t) is close to 1, g(t) is near singular and hence hard to handle with numerical quadrature. In practice, we make sure that ‖a‖_∞ and ‖b‖_∞ are bounded by 1/3.
• The evaluation of (7) with the trapezoidal rule. The trapezoidal rule is exponentially convergent for smooth periodic functions when N_s is sufficiently large. In the current setting, since 1 − a^2(t) − b^2(t) is bounded well away from zero, g(t) does not exhibit singular behavior. As a result, the highest non-trivial frequency in g(t) is of the same order as the highest non-trivial frequency in a(t) and b(t). Therefore, by setting N_s to be about 40 times d in practice, we ensure that the trapezoidal rule is exponentially accurate. Applying the fast Fourier transform to {g(t_n)} gives the Fourier coefficients {ĝ_k}.
• The semi-infinite matrix in (10). In the implementation, we only keep the first l rows of this semi-infinite matrix, with l ≥ 2d + 1, and define the resulting l × (2d + 1) matrix as in (12). In practice, setting l = 2d + 2 seems to be sufficient.
• The computation of the vector m. The most straightforward way is to compute the singular value decomposition (SVD) of H in (12) and take m to be the last column of the V matrix, which unfortunately has O(d^3) time complexity.
This complexity can be improved based on the following observation. Let s_1, . . . , s_{2d+1} be the singular values of H. Numerically, our choice of b(t) leads to a large gap between s_{2d} and s_{2d+1} that is in fact proportional to s_1. As a result, we propose the iterative procedure (13), where the shift parameter is a small positive constant. A few remarks are in order here.
• The symmetric phase factors obtained in [4] correspond to b(t) = 0, which helps the optimization approach. In the current algorithm, the choice b(t) = b_d sin(dt) + ⋯ helps the direct factorization method, and the resulting phase factors are non-symmetric.
• Whether symmetric or non-symmetric phase factors are preferred in practice is not clear. The main part of implementing QSVT on quantum circuits as in (3) is actually related to the terms e^{itX}, i.e., the implementation of the circuit U_A. The actual choice of the phase factors might very well depend on the circuit architecture.
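Since the explicit form of the iteration (13) is not reproduced in this text, the following sketch shows one plausible realization of the idea: a shifted inverse iteration on H*H + εI, which converges to the right singular vector of the smallest singular value precisely when there is a large gap between s_{2d} and s_{2d+1}. The shift value, iteration count, and test matrix are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def smallest_right_singular_vector(H, shift=1e-12, iters=30, seed=0):
    """Inverse iteration on H^* H + shift * I; converges to the right
    singular vector of the smallest singular value when there is a
    large gap in the singular spectrum."""
    n = H.shape[1]
    A = (H.conj().T @ H + shift * np.eye(n)).astype(complex)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.linalg.solve(A, v)   # amplify the smallest direction
        v /= np.linalg.norm(v)
    return v

# Compare against a full SVD on a matrix with one tiny singular value.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((8, 5)))
W, _ = np.linalg.qr(rng.standard_normal((5, 5)))
H = U @ np.diag([3.0, 2.0, 1.0, 0.5, 1e-13]) @ W.T
v = smallest_right_singular_vector(H)
ref = np.linalg.svd(H)[2][-1].conj()
assert abs(abs(v @ ref.conj()) - 1.0) < 1e-8   # same direction up to phase
```

Each step is a single small linear solve, which is how the large gap guaranteed by the choice of b(t) translates into fast, cheap convergence.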

Setup
The algorithm is implemented with standard double precision arithmetic. All numerical results are obtained on a laptop with a 2.6 GHz 6-Core Intel Core i7 CPU. The computation of the vector m is performed via the iteration (13). The complexity is quadratic in the degree d, and the actual computation typically finishes within a couple of minutes.
As mentioned in Section 1, when the target function f(x) is not a polynomial, the first step is to construct an accurate polynomial approximation a(x). Since polynomial approximation in x is equivalent to trigonometric approximation in t, this task is performed in the t space. Let us introduce an equally spaced grid t_n = 2πn/N_s, i.e., the points z_n = e^{i·2πn/N_s} on the unit circle, where the grid size N_s is taken in practice to be 40 times d as before.
• First, the values of f (t) at {t n } are computed.
• After applying the fast Fourier transform to {f(t_n)}, we identify a frequency d such that all Fourier coefficients above frequency d are below a threshold multiplied by the maximum Fourier coefficient in absolute value. In the experiments, the threshold is chosen to be around 10^{−12}, since this is right above the accuracy of the QSP algorithm in double precision arithmetic [4] and further improvement below this threshold is not necessary.
Here d is enforced to have the same parity as f (x).
• The Fourier coefficients above frequency d are then set to zero, and transforming back gives the desired trigonometric approximation to f(t) in the t space. The final function a(t) is also scaled to have infinity norm equal to 0.3.
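The three steps can be sketched as follows, using the even target f(x) = cos(τx) from the Hamiltonian simulation example (the values of τ and the grid size are illustrative; the threshold plays the role of the 10^{−12} cutoff):

```python
import numpy as np

tau, Ns = 5.0, 256                      # illustrative: f(x) = cos(tau x)
t = 2 * np.pi * np.arange(Ns) / Ns
f = np.cos(tau * np.cos(t))             # the lifted target f(t)

c = np.fft.fft(f) / Ns                  # Fourier coefficients
freq = np.fft.fftfreq(Ns, d=1.0 / Ns)   # integer frequency of each bin
thresh = 1e-12 * np.abs(c).max()

# Largest frequency whose coefficient is above the threshold.
d = int(np.abs(freq[np.abs(c) > thresh]).max())
d += d % 2                              # enforce the parity of f(x) (even)

# Zero out the high frequencies and transform back.
c_trunc = np.where(np.abs(freq) <= d, c, 0.0)
approx = np.fft.ifft(c_trunc * Ns).real
assert np.abs(approx - f).max() < 1e-10

# In the algorithm the result is additionally rescaled:
a = 0.3 * approx / np.abs(approx).max()
```

The truncation error is bounded by the sum of the discarded coefficients, so the final approximation inherits the chosen threshold up to a modest constant.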
Regarding the choice of b(t), simply setting b(t) = 0.4 · sin(dt) suffices in the examples presented below. However, in principle, one might need to choose b(t) according to (6) in order to avoid identical roots.
The polynomial coefficient vector m is computed with the iterative procedure (13). The error of the constructed phase factors is measured in the relative L_∞ norm. More precisely, we compute a function p̂(x) following (1) using the constructed phase factors Φ = (φ_0, . . . , φ_d).
With p̂(x) as an approximation to p(x), the error of the phase factor computation is estimated with the relative L_∞ difference between p̂(x) and p(x) over the interval [−1, 1].
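The definition (1) is not reproduced in this text, but one common QSP convention evaluates p(x) as the (0,0) entry of e^{iφ_0 Z} ∏_{j=1}^{d} W(x) e^{iφ_j Z} with W(x) = [[x, i√(1−x²)], [i√(1−x²), x]]; the paper's (1) may differ in ordering or signs. Under that assumption, a minimal evaluator with a sanity check (all-zero phases reproduce the Chebyshev polynomial T_d(x)) looks as follows:

```python
import numpy as np

def qsp_p(phis, x):
    """(0,0) entry of e^{i phi_0 Z} prod_j W(x) e^{i phi_j Z} in the
    W_x convention; the paper's (1) may differ in ordering or signs."""
    s = np.sqrt(1.0 - x * x)
    W = np.array([[x, 1j * s], [1j * s, x]])
    def Z(phi):
        return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])
    U = Z(phis[0])
    for phi in phis[1:]:
        U = U @ W @ Z(phi)
    return U[0, 0]

# Sanity check: all-zero phases reproduce the Chebyshev polynomial T_d.
d = 7
for x in np.linspace(-0.9, 0.9, 11):
    assert np.isclose(qsp_p(np.zeros(d + 1), x), np.cos(d * np.arccos(x)))
```

Sampling such an evaluator on a fine grid of [−1, 1] is one way to estimate the relative L_∞ error described above.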

Examples
In what follows, we present the numerical results for four examples: Hamiltonian simulation, eigenstate filtering, matrix inversion, and the Fermi-Dirac operator. Among them, the first three examples are also studied numerically in [4], arguably the most complete numerical study of phase factor computation. In each example, we have chosen the parameters to be on par with or harder than those used in [4], so that the actual instances have at least the same level of difficulty. Overall, our algorithm exhibits similar accuracy and short wall clock times compared with [4]. The longest sequence reported below consists of more than 50000 phase factors.

Hamiltonian simulation
Assume that the Hamiltonian H satisfies ‖H‖_2 ≤ 1. Hamiltonian simulation for a period of time τ boils down to the quantum signal processing problem for f(x) = e^{−iτx}. The even and odd parts are Re(f(x)) = cos(τx) and Im(f(x)) = −sin(τx).
Since neither function is a polynomial, we first use the procedure described above to compute trigonometric polynomial approximations a_Re(t) ≈ 0.3 · Re(f(t)) and a_Im(t) ≈ 0.3 · Im(f(t)), both scaled so that the L_∞ norm is below 1/3. Here a_Re(x) is even in x with even degree d_Re, while a_Im(x) is odd in x with odd degree d_Im. We perform the tests with τ = 1000, 2000, 3000, 4000, 5000 and the results are summarized in Figure 3.

Eigenstate filtering
For a fixed gap ∆, we follow [4] and consider the filtering function f(x) centered at the origin, defined in terms of the Chebyshev polynomial T_k (see [4] for the explicit formula). The parameter k is set to 20/∆ so that f(x) is negligible outside the ∆-neighborhood of the origin. In the variable t, f(t) is already a trigonometric polynomial, so we simply set a(t) = 0.3 · f(t).

Matrix inversion
We consider the inversion of matrices with spectrum contained in D_κ = [−1, −1/κ] ∪ [1/κ, 1], where κ is the condition number. The QSP problem here amounts to approximating the function 1/x over D_κ. We choose f(x) = (1 − e^{−(5κx)^2})/x, for which the difference between f(x) and 1/x over D_κ is negligible in double precision arithmetic. In the t variable, this is f(t) = (1 − e^{−(5κ cos(t))^2})/cos(t).
The procedure mentioned above is used to compute a trigonometric approximation a(t) to f(t) (up to a constant factor) with ‖a‖_∞ = 0.3. The tests are performed with κ = 16, 64, 256, 1024 and the results are summarized in Figure 5. The longest sequence, for κ = 1024, has more than 50000 phase factors.
Fermi-Dirac operator

The last example is the Fermi-Dirac operator at inverse temperature β. In the t variable, the target function takes the form f(t) = (1 − e^{β cos(t)})/(1 + e^{β cos(t)}).
The procedure mentioned above is used to compute a trigonometric approximation a(t) to f(t) (up to a constant factor) with ‖a‖_∞ = 0.3. We perform the tests for β = 100, 200, 400, 800, 1600 and the results are summarized in Figure 6.

Discussions
In this paper, we proposed a new factorization algorithm for computing the phase factors of quantum signal processing. The proposed algorithm avoids root finding of high degree polynomials by using a key component of Prony's method. The resulting algorithm is numerically stable in double precision arithmetic. We have demonstrated the numerical performance with several important examples, including Hamiltonian simulation, eigenstate filtering, matrix inversion, and the Fermi-Dirac operator. For future work, the immediate question is to prove theoretically the stability of the algorithm with the proposed choice of b(t).