On efficient quantum block encoding of pseudo-differential operators

Block encoding lies at the core of many existing quantum algorithms. Meanwhile, efficient and explicit block encodings of dense operators are commonly acknowledged as a challenging problem. This paper presents a comprehensive study of the block encoding of a rich family of dense operators: the pseudo-differential operators (PDOs). First, a block encoding scheme for generic PDOs is developed. Then we propose a more efficient scheme for PDOs with a separable structure. Finally, we demonstrate an explicit and efficient block encoding algorithm for PDOs with a dimension-wise fully separable structure. Complexity analysis is provided for all block encoding algorithms presented. The application of theoretical results is illustrated with worked examples, including the representation of variable coefficient elliptic operators and the computation of the inverse of elliptic operators without invoking quantum linear system algorithms (QLSAs).


Introduction
Block encoding [25] is a widely used technique in quantum computing and a crucial component of many quantum algorithms with a potentially exponential advantage over classical algorithms, such as quantum phase estimation (QPE) [23,31], the HHL algorithm [21], quantum singular value transformation (QSVT) [18] and various quantum linear system solvers [1,13,24], to name a few. The idea of block encoding is to embed a linear operator A into a unitary operator U A with larger dimensions after appropriate scaling. The unitary U A is then converted into a quantum circuit, allowing a quantum computer to access U A for actual computations.
The potential advantage of quantum algorithms depends critically on efficient and practical quantum circuits for block-encoding of the operators involved, and the construction of such circuits can be non-trivial in general. Researchers have constructed block encoding schemes leveraging different structures of the operators studied. For example, a block encoding scheme is provided in [5,18] for sparse matrices, and a recipe is presented for hierarchical matrices in [30]. In this paper, we consider the problem of block encoding a large family of dense operators: the pseudo-differential operators (PDOs). PDO is a rich family of linear operators that include many commonly used examples in scientific problems, which is typically given in the following form: where a(x, ξ) ∈ C ∞ (R d × R d ) is called the symbol of A and f is the Fourier transform of f . A major motivation for studying operators with the form (1) is that differential operators often enjoy a simple representation in the Fourier domain. For example, the elliptic operator: with ω(x) > 0 can be represented in the form of (1) with the symbol More generally, an m-th order linear partial differential operator P (x, D) = |α|≤m a α (x)D α with D = − i 2π ∇ x can be represented by where α = (α 1 , . . . , α d ) is the d-dimensional multi-index and |α| = d j=1 α j . Another popular example is the translation-invariant operator. Let φ ξ (x) = e 2πix·ξ be a function of x. If an operator A is translation-invariant, i.e., (Aφ ξ )(x) = a(ξ)φ ξ (x), then in which case we say that the symbol a(ξ) is a multiplier. Apart from the examples mentioned above, the PDO family also contains other operators such as convolution operators, singular integral operators, etc. Moreover, a space of PDOs is often closed with respect to many elementary operations under certain conditions. For example, for the operator A in (4) with symbol a(ξ) ̸ = 0, the inverse of A can be simply represented by In general, the operator defined by C ∞ function a(x, ξ) as in (1) is called a pseudodifferential operator only if a(x, ξ) satisfies some additional requirements such as where ⟨ξ⟩ := 1 + |ξ| 2 , and the space of the corresponding PDOs is denoted by S m . There are multiple monographs on PDOs that interested readers can refer to, such as [34,37]. The PDOs considered in this paper are equipped with a periodic boundary condition on the space domain Ω = [0, 1] d . The frequency variable ξ thus takes the value on the integer grid, and the operator A becomes wheref is the coefficient of the Fourier series of f . In this paper, we derive block encoding schemes for the PDO (5) based on different additional structures of the symbol a(x, ξ). First, we present a block encoding scheme for generic symbols a(x, ξ) without additional structures. We then point out that the success probability of the quantum circuit can be significantly improved if the symbol a(x, ξ) can be expanded into series: a(x, ξ) = j α j (x)β j (ξ), (6) with only O(1) terms. Furthermore, the circuit can be constructed in a much more explicit way with the help of quantum signal processing (QSP) and quantum eigenvalue transformation (QET) if the symbol is a sum of fully separable terms, i.e., it can be expanded as a(x, ξ) = j α j1 (x 1 ) · · · α jd (x d )β j1 (ξ 1 ) · · · β jd (ξ d ), with O(1) terms, where x = (x 1 , . . . , x d ) and ξ = (ξ 1 , . . . , ξ d ). See Definition 2 and Section 5 for details. Complexity analysis is included for all block encoding schemes, and their applications are showcased with specific examples. The contributions of this paper can be summarized as follows: • We provide practical block encoding schemes for pseudo-differential operators, including algorithms applicable to generic PDOs (see Figure 3), efficient block encoding for separable PDOs (see Figure 6) and explicit circuits for fully separable PDOs (see Figure 7). Novel ideas of circuit design, such as the phase multiplication circuit (see Figure 4) and the prototype for diagonal multiplication (see Figure 8), are included in the block encoding schemes.
• We conduct comprehensive complexity analysis for the block encoding schemes proposed. The result for complexity analysis includes the success probability, the number of ancilla qubits needed, and the number of gates used. In addition to theorems applicable to general cases, we also demonstrate possibilities of improving the complexity results by leveraging particular structures of the problem (see Section 6.1 for example).
• We demonstrate the usage of our results with explicit examples. One can use our block-encoding scheme not only as an integrated part of established quantum algorithms but also as an option for conducting operations directly on certain operators. For the example shown in Section 6.2, we use the idea of symbol calculus to directly block-encode the inverse of an elliptic operator and the dependence of the complexity on P (the number of discretization points used for each dimension) is at least quadratically improved compared to previous results for block-encoding the inverse matrix (see Remark 3).

Contents
The paper is organized as follows. In Section 2, we specify the notation used in this paper and provide preliminary results needed in subsequent sections, such as quantum Fourier transform (QFT), the linear combination of unitaries (LCU), quantum signal processing (QSP) and quantum eigenvalue transformation (QET). In Section 3, an algorithm is given for block encoding of generic symbols. For a separable symbol a(x, ξ) = α(x)β(ξ), a more efficient block encoding scheme is provided in Section 4. Then a more explicit and practically feasible block encoding is developed in Section 5 for fully separable symbols of the form displayed in (7). Finally, Section 6 presents the application of the proposed block encoding method to two types of widely used PDOs, including a variable coefficient second-order elliptic operator and the inverse of a constant coefficient elliptic operator. The paper is ended with a conclusion and discussion for future directions in Section 7.

Block encoding
Most of the previous work [2,9,25] assumes that we have access to a matrix by querying two oracles that encode the locations and values of the non-zero elements of the objective matrix. Among them, [18]*Lemma 48 provides a general framework to explicitly construct the block encoding of sparse matrices if we are given these two oracles. Following this routine, [5] constructs the block encoding of banded circulant matrices, extended binary tree matrices, and quantum walk operators.
For general non-sparse matrices, it is clearly impossible to block-encode them in logarithmic time, and [4] proposes a near-optimal scheme for block encoding general unstructured matrices. Many methods are also proposed to implement the block encoding for full-rank dense matrices with certain structures, such as Toeplitz and Hankel systems [26], and linear group convolutions [7] based on quantum Fourier transforms. The authors of [30] introduce a new method for kernel matrices with a hierarchical structure, which can be applied to non-uniform grids the Fourier transform cannot be used.

Quantum PDE solvers
Along with the development of quantum linear system solvers [9,13,18,21,24], many quantum PDE solvers are proposed to take advantage of exponential acceleration. Quantum counterparts of the finite element method (FEM) [28] and the finite difference method (FDM) [6,12] emerged for solving Poisson's equation and wave equation. In [10], adaptive finite difference and spectral methods are proposed to improve the dependence of the complexity on the error ϵ from O(poly(1/ϵ)) to O(polylog(1/ϵ)). It is worth noting that the process of block encoding the discretized differential operator is often not provided in these works, and constructing the block encoding for generic partial differential operators is highly non-trivial.

Numerical algorithms for PDOs
There are also various classical numerical algorithms that compute PDOs efficiently. For example, [15] exploits the following expansion of symbols: The paper presents efficient numerical approximations of β j (ξ) with Chebyshev polynomials and hierarchical splines and further reduces the number of terms in the expansion by SVD or QR decomposition. However, a naive extension to high-dimensional PDOs leads to exponential overhead, as is the case for most classical methods. This is also one of the reasons why a quantum implementation of PDOs can be potentially useful.

Preliminaries and notations 2.1 Notations
We adopt the commonly used notation for binary numbers: for an integer power of two P = 2 p and any y ∈ {0, . . . , P − 1} if y = y 0 + 2y 1 + · · · + 2 p−1 y p−1 = (y p−1 y p−2 · · · y 0 .) in the binary system, the corresponding quantum state is |y⟩ ≡ |y p−1 . . . y 0 ⟩. This extends to an m-tuple x = (x 1 , . . . , x m ) with x j ∈ {0, . . . , P − 1}. The corresponding quantum state is given by |x m ⟩ · · · |x 1 ⟩, where |x j ⟩ = |x j,p−1 · · · x j,0 ⟩ for each j. For a multivariate function g : {0, . . . , P − 1} m → R, we denote by D g the diagonal multiplication operator on the Hilbert space C mp : The notation |v| for a d-dimensional vector v stands for the Euclidean norm We also use the single qubit rotations To simplify the discussion, we assume access to all single qubit rotations, the Hadamard gate, the CNOT gate, the 2-qubit SWAP gate, and the Toffoli gate when counting the number of elementary gates used. If one wants to use certain commonly used universal gate sets such as Hadamard and Toffoli, there will be some overhead linear in the number of gates involved and ploylogarithm in the precision ϵ, as bounded by the famous Solovay-Kitaev theorem [22]. There are also many established results on decomposing commonly seen quantum gates with a certain universal gate set, such as [14,32,36], to name a few.
where I n denotes the n-qubit identity operator. In the matrix form, a (γ, m, ϵ)-blockencoding is a 2 m+n dimensional unitary matrix where * can be any block matrices of the correct size and ∥ A − A∥ ≤ ϵ. In addition, when A is a Hermitian matrix, it is possible to construct U A such that it is also Hermitian, in which case it is called a (γ, m, ϵ)-Hermitian-block-encoding of A. The error ϵ is omitted in the notation of block encodings if ϵ = 0.
For an n-qubit system, the quantum Fourier transform (QFT) is an implementation of where N = 2 n , using a circuit U FT with O(n 2 ) elementary gates and no ancilla qubit. The elementary gates involved include 2-qubit swap gates and 2-qubit controlled rotation gates. We refer the readers to [11,31] for more details on QFT. If only an approximation of U FT is needed, one can use approximated QFT [29], which has gate complexity O(n log(n/ϵ)), where ϵ is the spectral norm error of the approximation.

Linear combination of unitaries (LCU)
Given a few block-encoded matrices, a block encoding of a certain linear combination of them is often needed in practice. To this end, the linear combination of unitaries (LCU) technique has been developed ( [2,9,18]). For example, for two matrices A and B, a block encoding of A + B can be given by the circuit in Figure 1, where U A and U B are block encodings of A and B, respectively.

|0⟩
H For general linear combinations, we recall the following result from [18] for general linear combinations.
where I a+s denotes the identity operator with size 2 a+s × 2 a+s .

Quantum eigenvalue transformation and quantum signal processing
Given a Hermitian block encoding of Hermitian matrix A, one can construct a block encoding of f (A) for a certain function f using the qWeET) technique [18,25] ) be the even and odd part of f (x), respectively. The standard procedure consists of the following steps, where we assume that f (x) is properly scaled such that ∥f e ∥ < 1, ∥f o ∥ < 1, and ∥ · ∥ denotes the L ∞ norm on [−1, 1].
1. Approximate f e and f o with degree deg f e (ϵ) even polynomialf e and degree deg

Find two sequences of phase factors
where p e and p o are complex polynomials with degree deg f e (ϵ) and deg f o (ϵ), respectively, given by (10) Here, the superscripts e and o are omitted for simplicity. The phase factors are then used in the QET circuit shown in Figure 2(b) to construct block encodings U f e (A) and U f o (A) , where the controlled rotation gate CR ϕ is described in Figure 2  There are several methods to find the phase factors in a stable and efficient way. For example, we refer to [8,16,17,20,38] for more details. We summarize the procedure given above in Lemma 2 below, where we assume that f is either even or odd for simplicity.

Discretization of pseudo-differential operators
As mentioned in Section 1, the PDO considered in this paper is defined for periodic functions on Ω = [0, 1] d : In most numerical treatments, the function f is given on a discrete grid where P = 2 p is the number of discrete points used for each dimension. Notice that here we slightly abuse the notation by reusing x for the integer index of the grid points. Since the space variable takes value on the Cartesian grid X, the frequency domain is discretized correspondingly on {− P 2 , . . . , P 2 − 1} d , which leads to the discretized PDO: where Ξ = {0, 1, . . . , P − 1} d . We adopt an abuse of notation and denote the discretized PDO by A too. Though the frequency variable ξ is discretized on (11), by the convention of discrete Fourier transform (DFT) and fast Fourier transform (FFT), the frequency (P/2, . . . , P − 1) is often identified with (−P/2, . . . , −1), respectively, since P is a period for the frequency variable after DFT. In other words, the discretized PDO can be written as and e j is the j-th standard basis vector in C d . As an example, when d = 1, we havẽ To simplify the notation and avoid repetitive use of P , we further definȇ and the discretized PDO becomes It is clear that sup |ȃ| = sup |a|. In what follows, we also refer toȃ as the symbol of the PDO to be computed.

Block encoding for generic symbols
This section is concerned with the block encoding of the PDO (11) (or rather (13)) with a generic symbol a(x, ξ), without assuming any additional structure. In order to compute the PDO in (13), a simple strategy is to first lift the state to the phase space Ξ × Ξ. Then the multiplication ofȃ(x, ξ) in (13) can be performed by diagonal matrix block encodings. Combining the QFT circuit and the block encoding of diagonal matrices, one can construct the entire circuit as illustrated in Figure 3. |f | x f ( x P ) |x⟩ is the normalized input data , U FT is the QFT circuit, U ph and Uȃ are the circuits that perform the multiplication of e 2πix·ξ/P and a(x, ξ) in (13), and the desired output is obtained with normalizing factor 1 |Af | when getting |0 pd+b ⟩ for the pd + b qubits on the bottom. Now we explain the circuit displayed in Figure 3 in more detail. First, we assume that the information of the function f is prepared by a normalized vector 1 |f | For functions f with certain properties such as integrability, the state 1 |f | x f ( x P ) |x⟩ can be constructed efficiently (see [19] for more details). For the rest of the paper, we assume the accessibility of the state 1 |f | x f ( x P ) |x⟩ as an input.
Step 1. Apply QFT and lift the input state to the phase space. We first obtain the representation of f in the frequency domain by QFT. After applying the (inverse) QFT to the state 1 |f | Then the state 1 |f | x,ξf (ξ) |x⟩ |ξ⟩ is obtained by applying the Hadamard gates H ⊗pd to the x-register and putting both registers together.
Step 2. Multiply the phase e 2πix·ξ/P with U ph . A naive way of multiplying the phase e 2πix·ξ/P is to use Proposition 19, which involves many ancilla qubits and reduces the success probability. Here, we develop an efficient implementation for multiplication without involving any extra error or ancilla qubits in the following lemma. Figure 4 implements the unitary operator: with O(p 2 ) gate complexity precisely without ancilla qubits.
Proof. The idea of the construction is similar to the implementation of QFT, which is based on bit-wise controlled rotation. We first write the binary representation of integers and do the following calculation: where the second equality is true because e 2πix j ξ k ·2 j+k−p = 1 when j + k ≥ p. The circuit corresponding to the unitary in (16) can be implemented by a series of controlled rotations Finally, the circuit shown in Figure 4 is obtained after arranging the controlled rotations in the corresponding places.
Step 3. Multiply the symbolȃ(x, ξ). The next component in Figure 3 is the diagonal multiplication Uȃ, which is designed to approximate the map where C a > 0 is a constant that depends onȃ(x, ξ), b is the number of ancilla qubits used for Uȃ and ⊥ is an unnormalized state that is orthogonal to any state of the form |x⟩ |ξ⟩ |0 b ⟩.
As mentioned earlier, the idea is to treat (x, ξ) as a 2d-dimensional variable and utilize the result from arithmetic circuit construction. Leveraging the reversible computational model and the uncomputation technique, any classical arithmetic operation can be implemented by a quantum circuit efficiently. More specifically, one can construct a corresponding quantum circuit using O(polylog( 1 ϵ )) ancilla qubits and O(polylog( 1 ϵ )) gates, where ϵ is the precision one wants to achieve (Cf. [31,32]). We state a general result for an efficient block encoding of diagonal matrices D g , as defined in (8), which is summarized in Proposition 4. A similar idea has been used in [19] to create a given state, in [21] to construct the reciprocals of the eigenvalues and in [35] to implement the diagonal preconditioner. Proof. The circuit U g is constructed as follows. Let t = ⌈log 2 ( Cπ ϵ )⌉, and let θ(x 1 , . . . , x m ) be a map that gives |θ sgn θ t−1 θ t−2 . . . θ 0 ⟩, where (.θ t−1 θ t−2 . . . θ 0 ) is the closest t-bit fixedpoint representation of 1 π arcsin(|g(x 1 , . . . , x m )|/C), and θ sgn is assigned the value 0 if g ≥ 0 and the value 1 otherwise. For an arbitrary basis state |x m ⟩ · · · |x 1 ⟩, we consider the system with t + 2 ancilla qubits |0⟩ |x m ⟩ · · · |x 1 ⟩ |0 t+1 ⟩. Here |x j ⟩ = |x j,p−1 · · · x j,0 ⟩ for each j. Using the reversible computational model and uncomputation ( [31,32]), the classical circuit: can be constructed with O(poly(t) + poly(mp)) gates and O(poly(mp)) ancilla qubits. Then we apply the circuit in Figure 5 on the t + 2 ancilla qubits. The state obtained is: which can then be mapped to . , x m ))| < ϵ C , which means the desired state 1 C g(x 1 , . . . , x m ) |x m ⟩ · · · |x 1 ⟩ is obtained with error at most ϵ C upon measuring the ancilla qubits and getting |0 t+2 ⟩.
Step 4. Sum over the frequency variable. Finally, after applying the Hadamard gate H ⊗pd to the ξ registers, we obtain the state where⊥ is an unnormalized state that is orthogonal to all states of the form |x⟩ |0 pd+b ⟩ and which can be seen from (18). Therefore, one obtains the desired state on the x registers upon measuring |0 pd ⟩ for the ξ registers and |0 b ⟩ for the ancilla qubits. Notice that there is an extra scaling factor 1 √ P d due to the application of Hadamard gates, so the success probability is O(2 −pd ). The complete circuit we use is exactly the one shown in Figure 3.
The block encoding scheme of the PDO (13) constructed in this section can be summarized in the following theorem: For a generic symbol a(x, ξ), a block encoding of the corresponding discretized PDO defined by (13) can be (2 pd 2 C a , O(poly(pd) + polylog(1/ϵ)), ϵ)-block-encoded using the circuit displayed in Figure 3 with gate complexity O(poly(pd) + polylog(1/ϵ)), where C a ≥ sup |a| is a constant.
Challenge. Despite being applicable to generic symbols, one can observe from (20) that the success probability of the circuit in Figure 3 can be low when pd is large. In the following sections, we show that this challenge can be overcome when the symbol a(x, ξ) has additional structures.

Efficient block encoding for separable symbols
As explained in Section 3, the circuit designed as in Figure 3 suffers from exponentially small success probability, despite being simple and applicable to generic PDOs. In this section, we are concerned with symbols with particular structures and an efficient block encoding of the corresponding PDOs with O(1) success probability is constructed.

Separable symbols
We first give the following definition for separable symbols.

Definition 1. A symbol a(x, ξ) is separable if a(x, ξ) = α(x)β(ξ).
As explained in Section 2.4, we identify the frequency (P/2, . . . , P −1) with (−P/2, . . . , −1), respectively, since P is a period for the frequency variable after DFT. We also definȇ where e j is the j-th standard basis vector in C d . With help of the notations (21), the PDO (11) becomes It is clear from the definition thatα is P -periodic since α is 1-periodic, and we also have sup |α| = sup |α|, sup |β| = sup |β|. Figure 6: Circuit for an efficient block encoding for separable PDOs (22). Here b 1 , b 2 are the number of ancilla qubits needed for Uβ and Uα, respectively, 1 |f | x f ( x P ) |x⟩ is the normalized input data, U FT is the QFT circuit, Uβ, and Uα are the block encodings of Dβ and Dα, respectively, and the desired output is obtained with normalizing factor 1 |Af | when getting |0 b1+b2 ⟩ for the b 1 + b 2 ancilla qubits. Dβ and Dα are diagonal matrices defined in (8).
For the PDO (22), we propose an efficient block encoding illustrated by the following circuit.
For the rest of this section, we explain the circuit in Figure 6 with more details and show that it significantly improves the success probability compared with Figure 3. The circuit begins with a QFT step similar to that in Figure 3.
Step 2. Apply QFT and multiply the factorα(x). In order to multiply the factoȓ α(x) in the symbol, we apply QFT and convert the state to the space domain. Since where we have used (24) in the last line. By using Proposition 4 again with m = d, g =α and ϵ replaced by ϵ 2C β , the state |ϕ 1 ⟩ |0 b 2 ⟩ is mapped to |ϕ 2 ⟩ |0 b 2 ⟩ + ⊥ 2 , where ⊥ 2 is an unnormalized state orthogonal to all state of the form |x⟩ |0 b 2 ⟩ and |ϕ 2 ⟩ satisfies ∥ |ϕ 2 ⟩ − 1 Cα DαU FT ⊗d |ϕ 1 ⟩ ∥ < ϵ 2CαC β . In this step, O(poly(pd) + polylog(1/ϵ)) gates and b 2 = O(poly(pd)+polylog(1/ϵ)) ancilla qubits are used. The image of ⊥ 1 is still orthogonal to all states of the form |ξ⟩ |0 b 1 ⟩ since the b 1 ancilla qubits used in the previous step are unchanged. Therefore, the final state is where ⊥ is an unnormalized state orthogonal to all states of the form |x⟩ |0 b 1 +b 2 ⟩ and |ϕ 2 ⟩ satisfies where we have used the inequality (25) and the fact that C α ≥ sup |α| = sup |α| in the last line. By checking the definition of block encoding and adding up the gates and ancilla qubits used, we obtain the following theorem. Theorem 6. If a(x, ξ) = α(x)β(ξ) is a separable symbol as defined in Definition 1, then the discretized PDO (22) can be (C α C β , O(poly(pd) + polylog(1/ϵ)), ϵ)-block-encoded using the circuit displayed in Figure 6 with gate complexity O(poly(pd) + polylog(1/ϵ)), where C α , C β > 0 are constants such that C α ≥ sup |α| and C β ≥ sup |β|. Remark 1. In contrast to the result in Theorem 5, one can observe that the exponential factor 2 pd 2 is removed, and thus the success probability for the circuit in Figure 6 is improved exponentially compared to the one in Figure 3.

Linear combination of separable terms
With the block encoding for PDOs with separable symbols ready, the PDO for a linear combination of separable terms, i.e., can also be block-encoded, thanks to LCU (see Section 2.2). More precisely, we have the following corollary.  a (1, a, ϵ)-block-encoding of the discretized PDO A j associated with symbol a j (x, ξ) (see (22)) constructed in Theorem 6 with a = O(poly(pd)+polylog(1/ϵ)). Then (U † L ⊗I a+s )W (U R ⊗I a+s ) is a (δ, a+b, (1+δ)ϵ)-blockencoding of m−1 j=0 y j A j , where I a+s denotes the identity operator with size 2 a+s × 2 a+s . The gate complexity of the corresponding circuit is O(m(poly(pd) + polylog(1/ϵ))).

Efficient block encoding for fully separable symbols with explicit circuits
For separable symbols, the circuit presented in Figure 6 significantly increases the success probability compared to the one in Figure 3. However, this relies on circuits for arithmetic functions (see Proposition 4), which can still be challenging to construct in practice. In this section, we develop a more explicit circuit with the help of QSP and QET (see Section 2.3).

Dimension-wise fully separable symbols
To begin with, we consider the fully separable symbols defined as follows.
is a real even function, a real odd function or an exponential function of the form f (y) = exp(iθy) for some real parameter θ.
Similar with (21), we introduce the following notations: Then the discretized PDO (11) becomes (29) In order to block-encode the PDO (29), we adopt the following circuit, as shown in Figure 7, which is similar with the one in Figure 6:  |f | x f ( x P ) |x⟩ is the normalized input data, U FT is the QFT circuit, Uβ k and Uα k are the block encodings of Dβ k and Dα k , respectively, and the desired output is obtained with normalizing factor 1 |Af | when getting |0 b1+b2 ⟩ for the b 1 + b 2 ancilla qubits. Here Dβ k and Dα k are diagonal matrices defined in (8).
Exploiting the fully separable structure of the symbol, one can construct explicit circuits for the diagonal multiplications shown in Figure 7 by leveraging QSP and QET (see Section 2.3). To this end, we first introduce a lemma that gives Hermitian block encodings for two diagonal multiplication prototypes that allow us to build the QET circuit afterward. More specifically, by combining a series of single-qubit rotations, one can construct a diagonal matrix with diagonal elements {exp(ijθ)} P −1 j=0 . Then one can build a diagonal matrix with diagonal elements {sin(ijθ)} P −1 j=0 using a simple LCU circuit, from which a diagonal matrix with diagonal elements {g(ijθ)} P −1 j=0 can be built with QET for some smooth function g. Since the grid points of the frequency variables are taken as {− P 2 , − P 2 + 1, . . . , P 2 − 1}, one also needs the corresponding diagonal matrices where the index j takes values in {− P 2 , − P 2 + 1, . . . , P 2 − 1} rather than from 0 to P . We denote the corresponding matrices by subscript − as opposed to + if the index j goes from 0 to P . During the preparation of this paper, we notice that a similar result is built in [27] independently.
where R z is the single qubit rotation defined in Section 2.1. Let U σ be the circuit displayed in Figure 8, where σ ∈ {−, +}, then U σ is a (1, 1)-Hermitian-block-encoding of sin(θD σ ). In fact, the matrix corresponding to U σ is and this is a Hermitian matrix with the first diagonal block being sin(θD σ ) after ignoring the global phase factor i, since sin(θD σ ) cos(θD σ ) cos(θD σ ) − sin(θD σ ) is Hermitian. It can be seen from Figure 8 that the number of elementary gates used is 2p + 5.
Now, we aim to construct the block encoding of diagonal matrices Dα j =α j (D + ) and Dβ j =β j (D + ) = β j (D − ), namely Uα k and Uβ k in Figure 7. For the case where α j (x j ) = exp(iθx j ) or β j (ξ j ) = exp(iθξ j ), the matrices R − = exp(iθD − ) and R + = exp(iθD + ) constructed in Lemma 8 are exactly the diagonal matrices Dβ j = β j (D − ) and Dα j =α j (D + ), respectively. Therefore, we devote the rest of this section to the case where β j and α j are even or odd real functions. Thanks to the block encodings U + and U − introduced in Lemma 8, what remains to do is to find polynomial approximations of α and β so as to complete the QET procedure described in Section 2.3. Specifically, we restrict the parameter θ to be 0 < θ < π 2P in order to recover θD σ from sin(θD σ ) with the arcsin function. Going through the QET procedure, one obtains the following result.

Proposition 9.
Let U − and U + be the (1, 1)-Hermitian-block-encodings of sin(θD − ) and sin(θD + ) constructed in Lemma 8 for 0 < θ < π 2P with P = 2 p . Assume that g is an even (resp. odd) continuous real function on [−P, P ] and that deg g (ϵ) is the smallest positive integer such that there exists an even (resp. odd) polynomialg with the degree bounded by deg g (ϵ) satisfying sup − sin(P θ)≤x≤sin(P θ) where ∥ · ∥ is the L ∞ norm on [−1, 1] and C g is a constant such that C g ≥ sup |g|. Then there is a (C g , 2, ϵ)-block-encoding for both g(D − ) and g(D + ) with O(p deg g (ϵ)) gates, where D − and D + are defined in Lemma 8.
Proof. The circuit in Figure 2(b) with U A replaced by U σ gives a (C g , 2)-block-encoding forg(sin(θD σ )), and since sup − sin(P θ)≤x≤sin(P θ) where the operator 2-norm is used. Thus the circuit in Figure 2(b) with U A replaced by U σ gives a (C g , 2, ϵ)-block-encoding for where σ ∈ {−, +} and we have used the fact that 0 < θ < π 2P in the first equality. Since O(p) gates are used in U σ , the gate complexity of the circuit described above is O(p deg g (ϵ)), which closes the proof.
The gate complexity of the circuit built with QET in Proposition 9 depends on the smoothness of g. For instance, we have the following corollary.

Corollary 10. Assume that g is an even (resp. odd) differentiable real function on
. Let η g (ϵ, θ) be the smallest integer such that there exists a polynomial u with degree η g (ϵ, θ) satisfying sup |y|< π 2θ |u(y) − g(y)| < ϵ 3 , then the gate complexity of the circuit used in Proposition 9 is |g ′ (y)|. In particular, if g is a polynomial, the gate complexity reduces Proof. Without loss of generality, assume 1 > ϵ 2Cg , otherwise we can just letg = 0 (30). Since 1 θ arcsin(x) is an analytic function whose power series centered at x = 0 has convergence radius 1 > sin(P θ), there is a truncation v of the Taylor series of 1 θ arcsin(x) with degree O(log the coefficients of the Taylor series of 1 θ arcsin(x) at x = 0 are all non-negative, it holds , and therefore |g( 1]. In addition, we have for x ∈ [− sin(P θ), sin(P θ)]. According to Proposition 9, we have deg g (ϵ) = O(log C ′ g θϵ η g (ϵ, θ)) and the gate complexity of the circuit used in Proposition 9 is O p log C ′ g θϵ η g (ϵ, θ) , where the factor p comes from preparing sin(θD σ ) as mentioned in Lemma 8. In the case that g is a polynomial, we can simply let u = g and thus η g (ϵ, θ) ≤ deg g.
For the final step, we first introduce the notation where degα k (ϵ) and deg β k (ϵ) are defined in Proposition 9. Denote by Uα k and Uβ k the block encodings ofα k (D + ) andβ k (D + ) = β k (D − ) obtained in Proposition 9 with ϵ replaced by ϵ/2dC, respectively, where and C ≥ d k=1 (sup |α k | sup |β k |) since Cα k ≥ sup |α k | and Cβ k ≥ sup |β k |. Now we are ready to prove the following theorem that relies on the block encodings d k=1 Uα k and d k=1 Uβ k in Figure 7.
is a fully separable symbol as defined in Definition 2, then the corresponding PDO defined by (29) can be (C, O(d), ϵ)-block-encoded with gate complexity O(dp deg a ϵ 2dC + dp 2 ) using the circuit displayed in Figure 7, where C > 0 is a constant defined in (32), and deg a ϵ 2dC is defined in (31).
Proof. Since each Uα k is a (Cα k , 2, ϵ/2dC)-block-encoding forα k (D + ) according to Propo- Hence by the same argument as the proof of Theorem 6 (especially (23), (24), (25), (26) and (27)), the circuit in Figure 7 gives a (C, 4d, ϵ) block encoding of the PDO (29) [29] for example) can be used to replace the QFT blocks in Figure 7. With similar arguments as in the proof of Theorem 11, one can show that the gate complexity can be reduced to O(dp deg a ϵ 2dC ).

Linear combination of fully separable terms
Similar to Corollary 7, we can block-encode the PDO for a linear combination of fully separable terms, i.e., with LCU (see Section 2.2) and Theorem 11, which is stated in the following corollary.   1, a, ϵ)block-encoding of the discretized PDO A j associated with symbol a j (x, ξ) (see (29)) constructed in Theorem 11 with a = O(d). Then (U † L ⊗ I a+s )W (U R ⊗ I a+s ) is an (δ, a + b, (1 + δ)ϵ)-block-encoding of m−1 j=0 y j A j , where I a+s denotes the identity operator with size 2 a+s × 2 a+s . The gate complexity of the corresponding circuit is O(dp m−1 j=0 deg a ϵ 2d + dp 2 m).

Applications
In this section, we provide worked examples for particular symbols using the circuit shown in Figure 7 and provide complexity analysis, beginning with a variable coefficient second-order elliptic operator.

Second-order elliptic operator with variable coefficients
Recall that the elliptic operator introduced in (2) is of the following form: In this section, we assume that ω(x) > 0 has a low-rank Fourier expansion Many commonly seen functions have low-rank expansions or approximations. For instance, ω(x) = 2 + sin(2π d l=1 x l ) > 0 can be written in the rank-3 form Plugging the form (34) of ω into (3), one obtains the symbol associated with the PDO above a(x, ξ) = 1 + where P = 2 p is the number of discrete points used for each dimension (see Section 2.4).
Notice that the terms e 2πiq j ·x ξ l P and e 2πiq j ·x ξ 2 l P 2 above are fully separable by Definition 2, thus by Corollary 12, we know that the corresponding PDO can be block-encoded. As explained in Section 5, the multiplication of e 2πiq j ·x = d l=1 e 2πiq jl x l can be implemented directly using R − and R + constructed in Lemma 8. Consequently, the number of gates needed for multiplying each e 2πiq jl x l factor without error is O(p), and no ancilla qubits are used. Since ξ l P and ξ 2 l P 2 are polynomials, by Corollary 10, the multiplication of each ξ l P and ξ 2 l P 2 factor can be implemented with O p log 1 ϵ gates to O(ϵ) precision, and O(d) ancilla qubits are used. Going through the proof of Theorem 11, one can see that the PDO associated with e 2πiq j ·x ξ l P and e 2πiq j ·x ξ 2 l P 2 can be (1, O(d), ϵ)-block-encoded with gate complexity O(p log( 1 ϵ ) + p 2 + dp), where the three terms account for implementing the polynomials of ξ l , the QFT of the l-th component, and the multiplication of e 2πiq j ·x , respectively. Finally, going through the LCU step as in Corollary 12 with O(dr) terms, one obtains a (γ, O(d + log(dr)), (1 + γ)ϵ)-block-encoding of the PDO (2) with total gate complexity O(dr(p log 1 ϵ + p 2 + dp)) = O dpr(log 1 ϵ + d + p) , where γ = 1 + 4π 2 (P r j=1 |c j |∥q j ∥ 1 + P 2 d r j=1 |c j |). This result is summarized in the following theorem, where we have used O(d + log(dr)) = O(d + log(r)).
Similar to previous sections and by a slight abuse of notation, we denote the discretization of the operator defined in (33) also by A. Now we can use the QLSA in [13] to get the following corollary: Corollary 14. Let (A, b) be the discretization of the operator and the right-hand side of (33), respectively, there is a quantum algorithm finding the normalized state |A −1 b⟩ = In other words, U A is a (1, O(d + log(r)), 0)-block-encoding of some matrixÃ/γ such that ∥A −Ã∥ < ϵ/γ. Therefore, we have ∥Ã∥ = O(γ), ∥Ã −1 ∥ = O(1), and thus κ(Ã) = O(γ).
The main theorem of [13] gives a quantum algorithm that can output a state |y⟩ that is O(ϵ) close to |(Ã/γ) −1 b⟩ = |Ã −1 b⟩ =Ã So |y⟩ is also an O(ϵ) approximation of |A −1 b⟩ and the overall gate complexity is O γdr log 1 ϵ log P log γP ϵ + d .

Conclusion and Discussion
This paper systematically investigates block encodings for pseudo-differential operators (PDOs) under different structural assumptions. A block encoding scheme for PDOs with generic symbols is developed in Section 3, and the quantum circuit is illustrated in Figure 3. For PDOs with linear combinations of separable symbols, we improve the success probability exponentially and present an efficient block encoding algorithm in Section 4. Then a more explicit and practical block encoding scheme is derived in Section 5 with the help of QSP and QET, along with which the complexity analysis is provided. Plenty of worked examples are given in Section 6, including the block encoding of elliptic operators with a variable coefficient that is difficult to deal with for quantum solvers that use finite difference schemes and the block encoding of the inverse of constant-coefficient elliptic operators without using quantum linear system algorithms. The block encoding schemes presented in this paper enrich the study of the block encoding of dense operators and shed new light on designing practical quantum circuits for scientific computing.
For future directions, one can apply the established results in this paper to other PDOs besides the ones presented in Section 6. One can also use the idea of symbol calculus to implement different operations on the PDO, such as taking square root or exponential, which can help solve certain PDEs in practice.  a j b k |j 0 · · · j n−1 ⟩ |k 0 · · · k n−1 ⟩ , and after applying the CNOT gates, the k register is only |0⟩ when j l = k l for all l = 0, . . . , n − 1. which means the state becomes j 0 ,j 1 ,...,j n−1 a j b j |j 0 · · · j n−1 ⟩ |0⟩ + |⊥⟩ =

A Multiplication of two states
where |⊥⟩ is a term that is orthogonal to |j⟩ |0⟩ for any j. Thus the probability of obtaining |0⟩ after the measurement is N −1 j=0 |a j b j | 2 = c 2 , and the outcome of the system register is