Block-encoding dense and full-rank kernels using hierarchical matrices: applications in quantum numerical linear algebra

Many quantum algorithms for numerical linear algebra assume black-box access to a block-encoding of the matrix of interest, which is a strong assumption when the matrix is not sparse. Kernel matrices, which arise from discretizing a kernel function $k(x,x')$, have a variety of applications in mathematics and engineering. They are generally dense and full-rank. Classically, the celebrated fast multipole method performs matrix multiplication on kernel matrices of dimension $N$ in time almost linear in $N$ by using the linear algebraic framework of hierarchical matrices. In light of this success, we propose a block-encoding scheme of the hierarchical matrix structure on a quantum computer. When applied to many physical kernel matrices, our method can improve the runtime of solving quantum linear systems of dimension $N$ to $O(\kappa \operatorname{polylog}(\frac{N}{\varepsilon}))$, where $\kappa$ and $\varepsilon$ are the condition number and error bound of the matrix operation. This runtime is near-optimal and, in terms of $N$, exponentially improves over prior quantum linear systems algorithms in the case of dense and full-rank kernel matrices. We discuss possible applications of our methodology in solving integral equations and accelerating computations in N-body problems.


Introduction
One of the most promising applications of quantum information processing is in solving linear algebraic problems in high dimensions efficiently. Since the Harrow-Hassidim-Lloyd (HHL) algorithm was first developed for solving linear systems of equations [1], a number of improvements have been proposed. Significantly, the breakthrough algorithms in [2] and [3] solved sparse linear systems of equations in time polylogarithmic in the system size N and the target error bound ε. When these algorithms are applied to dense matrices, however, their complexity scales linearly in N (see Table 1). Using a quantum random access memory (QRAM) data structure [4], the work of [5] improved the runtime to $O(\sqrt{N})$. Classically, quantum-inspired algorithms [6,7] use an analog of QRAM called sample-query access and apply sketching techniques [8] to solve linear systems with runtimes independent of N but scaling poorly with the rank k of the matrix, as $O(k^6)$ (see Appendix A).
In fact, achieving a runtime logarithmic in N for dense and full-rank matrices is generally challenging unless there are specific symmetries or structures inherent in the matrix.
For example, quantum algorithms have been developed that achieve polylogarithmic complexity in N for Toeplitz systems [9], Hankel matrices [10], and linear group convolutions [11], all of which use group quantum Fourier transforms (QFTs) [12] to implement the operations as sparse matrices in the Fourier basis. This approach of using group QFTs, however, only applies to certain groups and requires the matrix entries to be exactly uniformly sampled from a generating function, and is thus inapplicable to more general settings (e.g., matrix entries corresponding to atoms on a lattice vibrating around their equilibrium positions).
Kernel matrices are a class of matrices whose entries are calculated as a function of a kernel k(x, x'). They have applications in mathematical physics [13], engineering [14], and machine learning [15], where they are often obtained by sampling or discretizing a smooth kernel k(x, x') that reflects the physical model at hand. In general, kernel matrices are dense and full-rank. Hierarchical matrices (H-matrices) [16,17] are a classical framework for efficiently performing matrix operations on commonly found kernel matrices by hierarchically splitting the "space" dimension of the matrix entries. In terms of the matrix dimension N, classically performing matrix-vector multiplication with H-matrices takes time O(N polylog N), a significant improvement over the generic $O(N^2)$ time. This H-matrix structure lies at the heart of the celebrated fast multipole method [13]. The improvement intuitively arises because kernel matrices often decay or change slowly along entries that are farther from the diagonal of the matrix; H-matrices approximate the matrix in progressively larger low-rank blocks for farther off-diagonal entries to efficiently implement kernel matrices.
We present an efficient quantum algorithm to implement H-matrices in the block-encoding framework, which embeds matrices inside a unitary operator that can be queried multiple times. This construction, which is at the core of many quantum algorithms, allows one to conveniently perform matrix computations on a quantum computer [18-21]. As many recently proposed algorithms use block-encodings as a black box, one of the major challenges for quantum numerical linear algebra in this framework is to efficiently block-encode matrices so that the asymptotic runtimes remain efficient even when the block-encoding complexity is included. Our H-matrix block-encoding procedure, combined with existing quantum linear systems algorithms (e.g., the adiabatic solver in [22]), provides near-optimal runtime in solving quantum linear systems for many typical kernels that are neither sparse nor low-rank, e.g., $k(x, x') = \|x - x'\|^{-1}$ (see Table 1 for comparisons to prior work). Apart from the H-matrix block-encoding, we also provide in the Appendix a list of common kernel matrices that can be efficiently block-encoded using existing techniques.
We proceed as follows. First, we overview classical hierarchical matrices in Section 2 and the quantum block-encoding framework in Section 3. We then present an algorithm for efficiently block-encoding hierarchical matrices and apply this to kernel matrices in Section 4. In Section 4.4, we generalize our methodology to construct block encodings for a more general class of matrices whose structure is derived from hierarchical matrices. Finally, we discuss applications of our algorithm in calculating N -body gravitational potentials and solving integral equations in Section 5.

Background in H-matrices and hierarchical splitting
To observe why hierarchical matrices are useful, consider a physical model which requires summation or integration over spacetime. The N-body gravitational force problem, for example, requires evaluating N potentials of the form
$$\Phi(x_i) = \sum_{j \neq i} \frac{m_j}{|x_i - x_j|},$$
where $m_i$ and $x_i$ denote the mass and location of particle i. Summing or integrating over all particles is equivalent to performing matrix-vector multiplication, generally requiring $O(N^2)$ operations unless the underlying matrix is sparse or has special properties. As we will show, when particles are placed in a favorable arrangement (e.g., approximately uniformly), this matrix-vector multiplication can be approximately performed in time O(N polylog N).

Table 1: Runtimes of previous methods versus our method for matrix multiplication (forward) and solving linear systems (inversion) of dense and full-rank kernel matrices, e.g., $k(x, x') = \|x - x'\|^{-1}$. For purposes of comparison, we assume the operator norm of the matrix is bounded. The $\widetilde{O}$ notation hides terms that grow as $\operatorname{polylog}(\frac{N}{\varepsilon})$. Here, κ is the condition number of the matrix operation and ε is the error bound on the output. For quantum algorithms, we assume the input vector is given as a quantum state and the runtime is equal to the circuit depth times the number of queries needed to obtain the output as a quantum state. We note that the classical H-matrix literature and this work focus on kernel matrices, whereas the quantum algorithms we compare to are "general-purpose"; for these general-purpose algorithms, runtimes are calculated assuming they are applied directly to kernel matrices. More details about prior works can be found in Appendix A. Footnotes: (a) [3] (also see Lemma 3.2); (b) Lemma 5.2 uses the adiabatic solver in [22], which has optimal runtime dependence on κ; (c) this input model is strictly more powerful than the oracle access model.
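As a purely classical baseline (illustrative code, not part of the quantum algorithm), the potential sum above can be evaluated by direct summation in $O(N^2)$ time; the hierarchical methods discussed below aim to beat exactly this cost.

```python
import numpy as np

def potentials_direct(x, m):
    """Phi_i = sum_{j != i} m_j / |x_i - x_j|, by direct O(N^2) summation."""
    N = len(x)
    phi = np.zeros(N)
    for i in range(N):
        for j in range(N):
            if j != i:
                phi[i] += m[j] / abs(x[i] - x[j])
    return phi

# 8 unit-mass particles at integer positions on a line (an arbitrary example)
x = np.arange(8, dtype=float)
m = np.ones(8)
phi = potentials_direct(x, m)
print(phi)
```

For N particles this double loop performs Θ(N²) kernel evaluations, which is the cost the H-matrix framework reduces to O(N polylog N).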
Hierarchical matrices (H-matrices) [16,17,24] are a powerful algebraic framework for accelerating matrix computations involving displacement kernels $k(x, x') = k(\|x - x'\|)$ which satisfy certain properties outlined below. The key idea is to (i) split the kernel matrix into hierarchically off-diagonal blocks and (ii) approximate each block by a low-rank matrix. This low-rank approximation is justified because off-diagonal blocks represent distant interactions that are small or slowly changing with distance. We now detail the two ingredients of H-matrices, namely the hierarchical splitting and the off-diagonal low-rank approximation.

Hierarchical splitting into admissible blocks
Hierarchical splitting partitions a kernel matrix into so-called "admissible" blocks, where entries representing more distant interactions are grouped together into larger blocks. Consider a system of N particles whose pairwise interaction is described by the kernel k(x, x').

Figure 1: (A) Structure of a hierarchical splitting for a 1D array of 16 particles (gray circles), which are close to uniformly distributed. The accumulated potential at the destination location (star, which can also be one of the sources) is decomposed into contributions from clusters of source particles whose sizes grow with the distance to the destination. (B) Cluster size decreases exponentially with the level of the hierarchical splitting. (C) The associated kernel matrix of this 16-particle system takes the form of an H-matrix, assuming that the destination locations are the same as the source locations. This matrix is decomposed into hierarchically structured blocks using the hierarchical splitting in (A), where each block is numerically low-rank.

The kernel matrix of this system is defined as $K = (k(x_i, x_j))_{i,j=0}^{N-1}$,
where $x_i$ is the location of particle i. For ease of analysis, we assume the particles are uniformly distributed and choose $x_i = i/N$ for $i \in \{0, 1, \dots, N-1\}$, with $N = 2^L$. For example, this is the case when the kernel matrix is obtained by discretizing an integral operator over the domain [0, 1] (see [16]). In general, a kernel matrix need not be square, and the source locations $x_i$ need not be one-dimensional or equally spaced; we will show how to generalize the current setup to these cases in later sections. We refer the interested reader to [17,25-28] for further details.
A set of m particles located at $\{x_{j_1}, \dots, x_{j_m}\}$ is denoted as a cluster σ, storing the list of individual indices $\sigma = \{j_1, \dots, j_m\}$.

Definition 2.1 (Admissible block). Consider the kernel matrix $K = (k(x_i, x_j))_{i,j=0}^{N-1}$ and let $\mathcal{N} = \{0, 1, \dots, N-1\}$. Let $P(\mathcal{N})$ be a partition of $\mathcal{N}$, i.e., $\mathcal{N} = \bigcup_{\sigma \in P(\mathcal{N})} \sigma$. For a cluster $\sigma \in P(\mathcal{N})$, define the center of σ as $c_\sigma = \operatorname{mean}\{x_i \mid i \in \sigma\}$ and the radius of σ as $r_\sigma = \max_{i \in \sigma} |x_i - c_\sigma|$. Furthermore, for $\sigma, \rho \in P(\mathcal{N})$, define the distance between the two clusters σ and ρ as $\operatorname{dist}(\sigma, \rho) = \min_{i \in \sigma, j \in \rho} |x_i - x_j|$. Then, the matrix block $K_{\sigma,\rho} = (k(x_i, x_j))_{i \in \sigma, j \in \rho}$ is called admissible if $\max\{r_\sigma, r_\rho\} \le \eta \operatorname{dist}(\sigma, \rho)$, where we take η = 1.

Informally, a block is admissible if the distance between its two corresponding clusters is larger than their radii (other values of η can give rise to different forms of the hierarchical splitting; see [16,25]). We now split the matrix K into a hierarchy of admissible blocks consisting of $L = \log_2 N$ levels of decreasing radii. At level $\ell$, define the following partition of $\mathcal{N} = \{0, 1, \dots, N-1\}$:
$$P^{(\ell)} = \big\{\sigma^{(\ell)}_k \,\big|\, k = 0, \dots, 2^{\ell} - 1\big\}, \qquad \sigma^{(\ell)}_k = \{k\,2^{L-\ell}, \dots, (k+1)\,2^{L-\ell} - 1\}.$$
For example, $P^{(0)} = \{\mathcal{N}\}$, $P^{(L)} = \{\{i\} \mid i \in \mathcal{N}\}$, and $P^{(\ell)}$ for $\ell \in \{2, 3, 4\}$ are shown in Figure 1B. Note that admissible blocks only exist for $\ell \ge 2$. We now construct a hierarchical splitting of our kernel matrix K into admissible blocks starting from the coarsest level $\ell = 2$.
First, we write $K = K^{(2)} + K^{(2)}_{\text{inadm}}$, where $K^{(2)}$ collects the admissible blocks at level 2. Next, we split $K^{(2)}_{\text{inadm}}$ into admissible and inadmissible blocks in $P^{(3)}$ (excluding the admissible blocks of $P^{(3)}$ that are already covered in $K^{(2)}$): $K^{(2)}_{\text{inadm}} = K^{(3)} + K^{(3)}_{\text{inadm}}$. Recursively applying this decomposition yields
$$K = \sum_{\ell=2}^{L} K^{(\ell)} + K_{\text{ad}},$$
where $K_{\text{ad}} = K^{(L)}_{\text{inadm}}$ is a tridiagonal matrix which represents the "adjacent" interaction of neighboring particles (see Figure 1C). Observe that the matrices $K^{(\ell)}$ are block-sparse; that is, each block-row or block-column of $K^{(\ell)}$ contains at most three nonzero admissible blocks. Furthermore, these blocks represent "far-field" interactions which can be approximated by low-rank matrices if the kernel k(x, x') satisfies some additional properties discussed in the next section.
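The recursion above is straightforward to verify classically. The sketch below is an illustrative reconstruction (with a hypothetical admissibility rule: 1D dyadic clusters, a block kept at the coarsest level where its cluster indices differ by at least 2); it assembles the levels and checks that they recombine into K with a tridiagonal remainder.

```python
import numpy as np

def hierarchical_split(K, L):
    """Split K (dimension 2^L) into per-level admissible parts plus a remainder.
    A block (a, b) at level l is admissible if |a - b| >= 2, and is kept at the
    coarsest level where that first happens (its parent pair is inadmissible)."""
    parts = {}
    for l in range(2, L + 1):
        w = 2 ** (L - l)                       # cluster width at level l
        Kl = np.zeros_like(K)
        for a in range(2 ** l):
            for b in range(2 ** l):
                if abs(a - b) >= 2 and abs(a // 2 - b // 2) < 2:
                    Kl[a*w:(a+1)*w, b*w:(b+1)*w] = K[a*w:(a+1)*w, b*w:(b+1)*w]
        parts[l] = Kl
    return parts, K - sum(parts.values())      # remainder = adjacent interactions

L = 4
N = 2 ** L
x = np.arange(N, dtype=float)
with np.errstate(divide="ignore"):
    K = 1.0 / np.abs(x[:, None] - x[None, :])  # 1/|x - x'| kernel
np.fill_diagonal(K, 0.0)                       # self-interaction C = 0
parts, K_ad = hierarchical_split(K, L)
off_band = np.abs(x[:, None] - x[None, :]) > 1
print(np.allclose(sum(parts.values()) + K_ad, K), np.all(K_ad[off_band] == 0))
```

Every entry with |i - j| >= 2 is captured at exactly one level (once a cluster pair is admissible, all of its descendant pairs are too), which is why the remainder is exactly tridiagonal.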
In Appendix D.3, we describe a generalization of the hierarchical splitting to higher dimensions, where the two key properties still hold, i.e., there are only log N levels and each level $K^{(\ell)}$ is block-sparse.

Low-rank approximation of admissible blocks
For far-field interactions in admissible blocks, the kernel k(x, x') can be well approximated by a truncated Taylor series if it satisfies the following asymptotic smoothness condition:
$$\big|\partial_{x'}^{q}\, k(x, x')\big| \le C\, q!\, |x - x'|^{-q} \quad \text{for all } q \ge 1, \qquad (6)$$
where C is a constant. For instance, the log kernel k(x, x') = log(|x - x'|) directly satisfies the above. This condition enables approximating any admissible block $K_{\sigma,\rho}$ by a rank-p matrix
$$K_{\sigma,\rho} \approx \Psi_{\sigma,\rho}\, D\, (\Phi_\rho)^{\dagger}, \qquad (7)$$
where $D = \operatorname{Diag}(1/q!)_{q \in P} \in \mathbb{R}^{p \times p}$, $P = \{0, 1, \dots, p-1\}$, and the factors $(\Psi_{\sigma,\rho})_{iq} = \partial_{x'}^{q} k(x_i, c_\rho)$ and $(\Phi_\rho)_{jq} = (x_j - c_\rho)^{q}$, for $i \in \sigma$ and $j \in \rho$, collect the Taylor expansion of the block about the cluster center $c_\rho$. Here, p regulates the precision of the approximation (typically p = polylog(N/ε) for an error bound ε). A detailed example and error analysis of this far-field approximation can be found in Appendix D. In addition, we note that conditions other than that of Equation 6 can also be used to justify the approximation of admissible blocks (see [29-31]) by low-rank kernels $k(x, x') = \sum_{q=0}^{p-1} \psi_q(x)\, \varphi_q(x')$. Combining this low-rank approximation and the hierarchical splitting forms a complete H-matrix. In Appendix D, we show that matrix-vector multiplication with an H-matrix can be performed in time O(N p log N). Other H-matrix operations, such as matrix-matrix multiplication and matrix inversion, run in time $O(N p^2 \log^2 N)$; we refer the reader to [16,24,25] for further details. The rank p can also be chosen separately at each level to further optimize computational complexity versus approximation error [26].
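Numerically, this factorization is easy to reproduce. The sketch below is illustrative (cluster positions and ranks are chosen arbitrarily): it builds the Taylor factors for the log kernel, whose x'-derivatives are $-(q-1)!/(x - x')^{q}$, and checks that the block error decays rapidly with p for a well-separated cluster pair.

```python
import numpy as np
from math import factorial

def taylor_block(xs, xt, p):
    """Rank-p Taylor approximation Psi @ D @ Phi.T of the log-kernel block
    k(x, x') = log|x - x'|, expanded in x' about the target-cluster center."""
    c = xt.mean()
    # q-th derivative of log|x - x'| w.r.t. x' at x' = c: -(q-1)! / (x - c)^q
    Psi = np.stack([np.log(np.abs(xs - c)) if q == 0
                    else -factorial(q - 1) / (xs - c) ** q
                    for q in range(p)], axis=1)
    D = np.diag([1.0 / factorial(q) for q in range(p)])
    Phi = np.stack([(xt - c) ** q for q in range(p)], axis=1)
    return Psi @ D @ Phi.T

# an admissible pair: dist(sigma, rho) exceeds both cluster radii
xs = np.arange(0.0, 8.0)          # source cluster sigma
xt = np.arange(16.0, 24.0)        # target cluster rho
exact = np.log(np.abs(xs[:, None] - xt[None, :]))
for p in (2, 4, 8):
    print(p, np.abs(taylor_block(xs, xt, p) - exact).max())
```

The error shrinks geometrically in p at a rate set by the ratio of cluster radius to cluster distance, which is what makes the rank p = polylog(N/ε) choice work.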

Block-encoding arithmetic
Throughout this paper, we employ the block-encoding framework to perform matrix computations on a quantum computer [18]. Here, an s-qubit (non-unitary) operator $A \in \mathbb{C}^{2^s \times 2^s}$ (let $N = 2^s$) is embedded in a unitary operator $U \in \mathbb{C}^{2^{s+a} \times 2^{s+a}}$ using a ancilla qubits, such that the top left block of U is proportional to A:
$$U = \begin{pmatrix} A/\alpha & \cdot \\ \cdot & \cdot \end{pmatrix},$$
where α is the normalization factor of the block-encoding. Formally, the (s+a)-qubit unitary U is an (α, a, ε)-block-encoding of A if
$$\big\| A - \alpha\, (\langle 0^a | \otimes I_s)\, U\, (| 0^a \rangle \otimes I_s) \big\| \le \varepsilon,$$
where $\|\cdot\|$ denotes the operator norm of a matrix and $I_s$ is the identity operator on s qubits. In other words, applying the unitary U to a quantum state $|\psi\rangle$ and post-selecting on the measurement outcome $|0^a\rangle$ on the ancilla qubits is equivalent to applying the operation A on $|\psi\rangle$:
$$U\, (|0^a\rangle |\psi\rangle) = \frac{1}{\alpha}\, |0^a\rangle\, A|\psi\rangle + |G\rangle,$$
where $|G\rangle$ is a garbage state that is orthogonal to the subspace $|0^a\rangle$, i.e., $(\langle 0^a| \otimes I_s)|G\rangle = 0$. The probability of successfully post-selecting $|0^a\rangle$ is thus equal to $\|A|\psi\rangle\|^2 \alpha^{-2} \le (\|A\|/\alpha)^2$. Observe that $\alpha \ge \|A\|$ is required to embed A inside a unitary matrix. Further, since the probability of success of the post-selection step depends on the ratio $\|A\|/\alpha$, a block-encoding U is optimal if $\alpha = \Theta(\|A\|)$. More generally, if the ratio $\alpha/\|A\|$ grows logarithmically with the size of the matrix N, the probability of success also only changes logarithmically with N. Moreover, as we will see in examples in Section 5, the complexity of quantum algorithms [3] in the block-encoding framework depends on the ratio $\alpha/\|A\|$. This important observation motivates the following definition of a "good" block-encoding.

Definition 3.1 (Good block-encoding). An (α, a, ε)-block-encoding U of A is called good if $\alpha/\|A\| = O(\operatorname{polylog}(\frac{N}{\varepsilon}))$ and the circuit depth of U is $O(\operatorname{polylog}(\frac{N}{\varepsilon}))$. The block-encoding is called optimal if $\alpha = \Theta(\|A\|)$.
Throughout this paper, we assume that matrices are normalized to have the largest entry at most 1, and we analyze the optimality of block-encodings by comparing the normalization factor α to the operator norm A .
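For a concrete (non-circuit) illustration of these objects, the linear-algebra sketch below embeds A/α in a unitary via a standard dilation and checks the defining property; this is an assumption-free numerical toy, not an efficient quantum implementation.

```python
import numpy as np

def block_encode(A, alpha=None):
    """Embed A / alpha as the top-left block of a unitary via unitary dilation.
    Requires alpha >= ||A|| (spectral norm)."""
    if alpha is None:
        alpha = np.linalg.norm(A, 2)
    B = A / alpha

    def psd_sqrt(M):
        # principal square root of a positive semidefinite matrix
        w, V = np.linalg.eigh(M)
        return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

    n = A.shape[0]
    U = np.block([
        [B, psd_sqrt(np.eye(n) - B @ B.conj().T)],
        [psd_sqrt(np.eye(n) - B.conj().T @ B), -B.conj().T],
    ])
    return U, alpha

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
U, alpha = block_encode(A)
print(np.allclose(U @ U.conj().T, np.eye(8), atol=1e-8))  # U is unitary
print(np.allclose(alpha * U[:4, :4], A))                   # top-left block is A/alpha
```

Choosing alpha = ||A|| here makes the dilation "optimal" in the sense of Definition 3.1; inflating alpha still yields a valid block-encoding but shrinks the post-selection probability (||A||/alpha)^2.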
Before proceeding to block-encode a hierarchical matrix, we provide various methods for block-encoding the individual matrix blocks in the hierarchical splitting. Depending on the locations of the matrix blocks and on how the values of the kernel entries can be accessed by a quantum computer, there are several ways one can block-encode a given matrix. Previous work has proposed efficient means to block-encode sparse matrices, purified density matrices, etc., as well as transformations of these matrices in the block-encoding framework [3,18,32]. We list some previously established results in this framework in Appendix B, among which we particularly use Lemma B.5 to block-encode sparse matrices with oracle access to entries, Lemma B.3 to linearly combine block-encoded matrices, and Lemma B.4 to multiply block-encoded matrices. Later, we will apply and combine these individual block-encoding lemmas to form a block-encoding of the complete hierarchical matrix.
First, we provide a version of Lemma B.5 applied to dense matrices, which we refer to as the "naive" block-encoding approach since it is unaware of the inherent structure of the block-encoded matrix.

Lemma 3.2 (Naive block-encoding of dense matrices with oracle access). Let $A \in \mathbb{C}^{N \times N}$ (where $N = 2^s$) and let $\hat{a} \ge \max_{i,j} |a_{ij}|$. Suppose the following oracle is provided:
$$O_A : |i\rangle |j\rangle |0^b\rangle \mapsto |i\rangle |j\rangle |\tilde{a}_{ij}\rangle,$$
where $\tilde{a}_{ij}$ is a b-bit binary description of $a_{ij}$. Then one can construct a block-encoding of A with normalization factor $\hat{a} N$, using two queries to $O_A$, $O(\operatorname{polylog}(\frac{\hat{a} N}{\varepsilon}))$ one- and two-qubit gates, and $O(b + \operatorname{polylog}(\frac{\hat{a} N}{\varepsilon}))$ extra qubits (which are discarded before the post-selection step).

Assuming $\hat{a} = 1$, the normalization factor of this block-encoding is N, which is suboptimal if $\|A\| \ll N$; this situation arises, e.g., when the matrix A has many small entries. This suboptimal block-encoding could lead to an exponentially small probability of success in the post-selection step. Therefore, a structure-aware block-encoding procedure is needed to achieve an optimal runtime in terms of N.
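To see how severe this can be, a quick numerical check (with an illustrative decaying kernel, softened on the diagonal so the maximum entry is 1) shows ||A|| staying O(1) while the naive normalization grows as N, so the post-selection success probability (||A||/N)^2 decays quadratically in N:

```python
import numpy as np

norms = []
for L in (6, 8, 10):
    N = 2 ** L
    i = np.arange(N, dtype=float)
    # polynomially decaying entries, max entry 1, many tiny entries
    K = 1.0 / (np.abs(i[:, None] - i[None, :]) + 1.0) ** 2
    norms.append(np.linalg.norm(K, 2))
    print(N, norms[-1], (norms[-1] / N) ** 2)   # success prob ~ 1/N^2 for naive alpha = N
```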
To directly implement the H-matrices outlined in Section 2, we need procedures to block-encode block-sparse and low-rank matrices which we provide in the Lemmas below (all proofs are deferred to Appendix B). First we define state preparation pairs, which are frequently used in this study to prepare the coefficients in a linear combination of matrices.

Definition 3.3 (State preparation pair [3]).
Let $y \in \mathbb{C}^m$ with $\|y\|_1 \le \beta$. The pair of unitaries $(P_L, P_R)$ is called a $(\beta, b, \varepsilon)$-state-preparation-pair for y if $P_L |0^b\rangle = \sum_{j=0}^{2^b - 1} c_j |j\rangle$ and $P_R |0^b\rangle = \sum_{j=0}^{2^b - 1} d_j |j\rangle$ such that $\sum_{j=0}^{m-1} |\beta\, c_j^* d_j - y_j| \le \varepsilon$ and $c_j^* d_j = 0$ for $m \le j < 2^b$.

Next, we show how to implement a block-encoding of low-rank matrices.

Lemma 3.4 (Block-encoding of low-rank operators with state preparation unitaries, inspired by Lemma 1 of [33]). Let $A = \sum_{i=0}^{p-1} \sigma_i u_i v_i^{\dagger}$ be an s-qubit operator of rank at most p, let $r = \lceil \log p \rceil$, and suppose $\sum_{i=0}^{p-1} |\sigma_i| \le \beta$. Suppose the following (r+s)-qubit unitaries are provided:
$$G_L : |i\rangle |0^s\rangle \mapsto |i\rangle |u_i\rangle, \qquad G_R : |i\rangle |0^s\rangle \mapsto |i\rangle |v_i\rangle,$$
where $0 \le i < p$ and $|u_i\rangle$ ($|v_i\rangle$) is the quantum state whose amplitudes are the entries of $u_i$ ($v_i$). Let $(P_L, P_R)$ be a (β, r, ε)-state-preparation-pair for the vector $[\sigma_0, \dots, \sigma_{p-1}]$. Then, one can construct a (β, r+s, ε)-block-encoding of A with one use of each of $G_L$, $G_R^{\dagger}$, $P_L^{\dagger}$, $P_R$, and a SWAP gate on two s-qubit registers.
The above lemma provides one method to block-encode the approximately low-rank matrices described in Section 2, but, as we will see, it is not the only option at one's disposal. When the rank of a matrix is small and bounded, the time needed to prepare the unitaries $P_L$, $P_R$ is typically negligible; e.g., they can be efficiently implemented with an oracle providing access to $\sigma_i$ (and β is close to the sum of the singular values). In addition, the unitaries $G_R$, $G_L$ can be constructed using existing state preparation methods based on the structure of the entries in the singular vectors (e.g., [34] utilized efficiently integrable distributions; [4] used a QRAM data structure). In Appendix B.3, we also present an approach to efficiently construct $G_R$, $G_L$ from oracles providing access to the entries of $u_i$, $v_i$ for cases where the entries change slowly (Lemma B.8).
Next, the following lemma block-encodes a block-sparse matrix whose blocks are themselves block-encoded.

Lemma 3.5 (Block-encoding of block-sparse matrices). Let $A = \sum_{i,j=0}^{2^t - 1} |i\rangle\langle j| \otimes A_{ij}$ be a $d_r$-row-block-sparse and $d_c$-column-block-sparse matrix, where each $A_{ij}$ is an s-qubit operator. Let $U_{ij}$ be an $(\alpha_{ij}, a, \varepsilon)$-block-encoding of $A_{ij}$. Suppose that we have access to the following 2(t+1)-qubit oracles:
$$O_r : |i\rangle |k\rangle \mapsto |i\rangle |r_{ik}\rangle, \qquad O_c : |l\rangle |j\rangle \mapsto |c_{lj}\rangle |j\rangle,$$
where $r_{ik}$ is the index of the k-th non-zero block of the i-th block-row of A, or $r_{ik} = k + 2^t$ if there are fewer than k non-zero blocks, and similarly $c_{lj}$ is the index of the l-th non-zero block of the j-th block-column of A, or $c_{lj} = l + 2^t$ if there are fewer than l non-zero blocks. Additionally, suppose an oracle applying the block-encodings controlled on the block indices, $\sum_{i,j} |i\rangle\langle i| \otimes |j\rangle\langle j| \otimes U_{ij}$, is provided. Then one can construct a block-encoding of A with normalization factor $\sqrt{d_r d_c}\, \max_{i,j} \alpha_{ij}$ (the precise statement and proof are given in Appendix B).

Block-encoding of kernel matrices via hierarchical splitting
In this section, we apply the block-encoding techniques delineated in Section 3 to block-encode H-matrices. Our goal is to obtain good block-encodings (Definition 3.1) of these matrices, where the ratio between the normalization factor and the operator norm is at most O(polylog(N)). We provide an optimal block-encoding procedure for kernel matrices arising from a variety of polynomially decaying and exponentially decaying kernels. We show that using only the hierarchical splitting, without the subsequent low-rank approximation, gives an optimal block-encoding for these kernel matrices, enabling matrix transformations (e.g., multiplication, inversion, polynomial transformations) with optimal query and gate complexities. Then, we provide a block-encoding procedure for general H-matrices whose admissible blocks are approximated by low-rank matrices, and discuss the applicability of our procedure to other variants of hierarchical splitting.

Polynomially decaying kernels
We use the hierarchical splitting described in Section 2 to implement an optimal block-encoding procedure for polynomially decaying kernels, which take the following form:
$$k(x, x') = \begin{cases} \|x - x'\|^{-p}, & x \ne x', \\ C, & x = x', \end{cases} \qquad (13)$$
where p > 0 and |C| ≤ 1 represents the self-interaction term (e.g., C = 0 in Coulomb point-source interactions). For concreteness, we consider the problem of computing pairwise interactions in a system of N particles. Particularly, consider a 1D system of N particles $\{(x_i, m_i)\}$, where $x_i$ and $m_i$ are the location and mass of particle i, respectively. The potential at $x_i$ is the sum over the pairwise interactions
$$u(x_i) = \sum_{j \ne i} m_j\, k(x_i, x_j).$$
For simplicity and ease of analysis, we consider a system of equally spaced particles in which $x_j = j$ and $m_j = 1$ for any $j \in [N]$, where $N = 2^L$. Note that we choose the domain to be [1, N] instead of [0, 1] as considered in Section 2, so that the maximum entry in the kernel matrix is 1. Had the particles been placed in the region [0, 1], we could simply downscale the matrix by dividing it by $N^p$ so that the entries are bounded by 1. Recall from Section 3 that doing so does not change the block-encoding and algorithmic runtime, as these depend on the ratio between the operator norm and the normalization factor. This ratio is unchanged under global rescaling of the matrix because the normalization factor rescales accordingly in the block-encoding procedure.
To access the kernel function on a quantum computer, we assume the following oracle is given:
$$O_k : |i\rangle |j\rangle |0\rangle \mapsto |i\rangle |j\rangle |\tilde{k}(x_i, x_j)\rangle,$$
where $\tilde{k}(x_i, x_j)$ is a binary description of $k(x_i, x_j)$. Since k can be computed via an efficient classical circuit, one can construct a comparably efficient quantum circuit for $O_k$ [35]. In particular, this is the case when the distribution of particles is known and efficiently computable (e.g., the $x_j$ are equally spaced points in the discretization of integral operators). We optimally block-encode the kernel matrix $K = (k(x_i, x_j))_{i,j=0}^{N-1}$ up to error ε with a normalization factor of $\Theta(\|K\|)$ by using only two queries to $O_k$ and $O(\operatorname{polylog}(N, \frac{1}{\varepsilon}))$ additional resources (i.e., one- and two-qubit gates and ancilla qubits). At a high level, our method proceeds as follows. First, we decompose K into a hierarchical structure of admissible blocks as described in Section 2: $K = \sum_{\ell=2}^{L} K^{(\ell)} + K_{\text{ad}}$. Note that this hierarchical decomposition can also be performed in higher-dimensional systems (see Appendix D.3). Next, we block-encode each admissible block (which is a dense matrix) in the level $K^{(\ell)}$ using the naive procedure for dense matrices (Lemma 3.2). Then, we use the procedure for block-sparse matrices (Lemma 3.5) to form a block-encoding of $K^{(\ell)}$. Finally, we sum over hierarchical levels to obtain a block-encoding of K. Specifically, at level $\ell$ of the hierarchy, each admissible block has dimension $2^{L-\ell}$. In addition, the minimum pairwise distance between particles in an admissible block of $K^{(\ell)}$ is $d^{(\ell)}_{\min} = 2^{L-\ell}$, so the maximum entry of an admissible block $K_{\sigma,\rho}$ in $K^{(\ell)}$ is bounded by $k^{(\ell)}_{\max} = (d^{(\ell)}_{\min})^{-p} = 2^{-(L-\ell)p}$. It follows that an admissible block at level $\ell$ can be $(\alpha_\ell, L-\ell+1, \frac{\varepsilon}{3})$-block-encoded via the naive procedure for dense matrices (Lemma 3.2), where $\alpha_\ell = 2^{(L-\ell)(1-p)}$, using two queries to $O_k$ and $O(\operatorname{polylog}(\frac{N}{\varepsilon}))$ additional one- and two-qubit gates. Note that in using Lemma 3.2 to block-encode a block $K_{\sigma,\rho}$ at level $\ell$, one actually needs to query $k(x_i, x_j)/k^{(\ell)}_{\max}$ rather than $k(x_i, x_j)$ itself.
This is readily achieved from the oracle $O_k$ by conditioning on a register that indexes the level $\ell$ and the block indices σ, ρ (see Appendix C.1 for details). From here, each $K^{(\ell)}$, which is a block-sparse matrix of block-sparsity 3, can be $(3\alpha_\ell, L+3, \varepsilon)$-block-encoded via Lemma 3.5 (particularly Remark 3.6). Observe that while $K^{(\ell)}$ is a $2^L \times 2^L$ matrix, the normalization factor of this block-encoding is only $3\alpha_\ell$, which is better than the normalization factor of N that one would get by naively applying Lemma 3.2 to block-encode $K^{(\ell)}$. In addition, we still only need two queries to $O_k$ and $O(\operatorname{polylog}(\frac{N}{\varepsilon}))$ additional gates because all blocks $K_{\sigma,\rho}$ are constructed in parallel in Lemma 3.5 (see the detailed analysis in Appendix C.1).
The adjacent interaction part $K_{\text{ad}}$, which is a tridiagonal matrix, can be $(3, L+3, \varepsilon)$-block-encoded with the procedure for sparse matrices (Lemma B.5), using two queries to $O_k$ and $O(\operatorname{polylog}(\frac{N}{\varepsilon}))$ one- and two-qubit gates. Finally, we use the procedure for linearly combining block-encoded matrices (Lemma B.3) to obtain a block-encoding of $K = \sum_{\ell=2}^{L} K^{(\ell)} + K_{\text{ad}}$. This step requires a (log L)-qubit state preparation pair (Definition 3.3) which prepares the coefficients in the linear combination. We neglect the error in implementing this state preparation pair since it can be performed with log L = log log N qubit operations (see Appendix C.1). This entire procedure results in an $(\alpha, L + \log L + 3, \alpha\varepsilon)$-block-encoding of the exact kernel matrix K, where, for L = log N,
$$\alpha = 3\sum_{\ell=2}^{L} \alpha_\ell + 3 = 3\sum_{\ell=2}^{L} 2^{(L-\ell)(1-p)} + 3 = \begin{cases} O(1), & p > 1, \\ O(\log N), & p = 1, \\ O(N^{1-p}), & p < 1. \end{cases}$$
As before, the linear combination step does not significantly increase the circuit complexity since all levels can be constructed in parallel with only two queries to the oracle $O_k$. See Appendix C.1 for a more detailed analysis of the entire block-encoding procedure.
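As a classical sanity check on this accounting (illustrative only; it uses the rough per-level bounds above rather than the sharp analysis of Appendix C.1), one can compare the hierarchical normalization factor against the true operator norm and against the naive factor N:

```python
import numpy as np

def hierarchical_alpha(L, p):
    """alpha = 3 * sum_{l=2}^{L} 2^{(L-l)(1-p)} + 3, from the per-level bounds."""
    return 3 * sum(2.0 ** ((L - l) * (1 - p)) for l in range(2, L + 1)) + 3

p = 0.5
ratios = []
for L in (6, 8, 10):
    N = 2 ** L
    x = np.arange(1, N + 1, dtype=float)    # particles at x_j = j
    with np.errstate(divide="ignore"):
        K = np.abs(x[:, None] - x[None, :]) ** -p
    np.fill_diagonal(K, 0.0)                # self-interaction C = 0
    ratios.append(hierarchical_alpha(L, p) / np.linalg.norm(K, 2))
    print(N, ratios[-1])                    # alpha / ||K|| stays O(1); naive gives N / ||K||
```

Here p = 0.5 < 1, so both alpha and ||K|| grow like N^{1-p}, and the ratio remains a small constant while the naive normalization would be worse by a polynomial factor in N.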

Remark 4.1 (Optimality). The block-encoding of the polynomially decaying kernels in Equation 13 is optimal in the normalization factor α. Indeed, a lower bound on the operator norm of K can be obtained by evaluating the norm of the vector $K|\mathbf{1}\rangle$, where $|\mathbf{1}\rangle = \frac{1}{\sqrt{N}}\sum_{i=0}^{N-1}|i\rangle$ is the normalized all-ones state. Observe that
$$\|K\| \ge \big\| K|\mathbf{1}\rangle \big\| = \Big(\frac{1}{N}\sum_{i}\Big(\sum_{j} k(x_i, x_j)\Big)^{2}\Big)^{1/2} = \Omega(\alpha).$$
Clearly, the existence of our block-encoding procedure also implies that these lower bounds (up to a constant factor) are an upper bound for $\|K\|$.

Remark 4.2 (Generalized polynomially decaying kernel). Since the above analysis only relies on the maximum entry in admissible blocks, it still holds for the case of non-uniform (bounded) masses $m_i$, or for more general kernels whose entries decay polynomially with distance, i.e., $|k(x, x')| = O(\|x - x'\|^{-p})$.
In these cases, we simply scale the kernel matrix such that the maximum entry is 1 (if needed) and apply the above analysis. This generalized class includes a variety of kernels, ranging from power-law interactions to common Green's functions [36].
The above block-encoding procedure similarly applies in higher-dimensional settings where the coordinates $x_j$ lie in a d-dimensional space with d constant. In Appendix D.3, we show that there are still only log N hierarchical levels in this higher-dimensional setting and that each level $K^{(\ell)}$ is still block-sparse. The two key features of the hierarchical splitting that allow for optimal block-encodings are maintained, and the normalization factor is still $\alpha = \Theta(\|K\|)$. The above optimal block-encoding construction and remarks are summarized in the following lemma.

Lemma 4.3 (Optimal block-encoding of polynomially decaying kernel matrix using H-matrix).
The kernel matrix $K = (k(x_i, x_j))_{i,j=0}^{N-1}$ of the polynomially decaying kernel in Equation 13 can be (α, polylog N, ε)-block-encoded with $\alpha = \Theta(\|K\|)$ using the hierarchical matrix splitting; that is, the block-encoding is optimal according to Definition 3.1. This block-encoding can be constructed with two queries to $O_k$ and $O(\operatorname{polylog}\frac{N}{\varepsilon})$ additional one- and two-qubit gates.

Our block-encoding procedure is optimal for common kernel matrices which are neither sparse nor low-rank, thus expanding the scope of application of quantum computers in the block-encoding framework. In Section 5, for example, we discuss applications of this block-encoding in N-body force computations and solving quantum linear systems.

Remark 4.4.
When the sources are not close to uniformly distributed, one must first apply classical adaptive methods [27,28], which discretize the domain into a finer mesh such that the number of nonzero-mass sources in each mesh site is O(1). The dimension of the kernel matrix in this method depends on the inverse of the smallest pairwise distance between the actual sources. Therefore, the runtime of the block-encoding remains polylogarithmic in the number of particles N so long as the minimum pairwise distance is not exponentially small (see Appendix C.3).
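A minimal classical sketch of this preprocessing (the helper below is hypothetical, not taken from the paper's appendix): repeatedly halve the cell width of a mesh on [0, 1) until every cell contains at most one source, so the mesh size, and hence the effective matrix dimension, scales with the inverse minimum pairwise distance.

```python
import numpy as np

def refine_mesh(points):
    """Halve the cell width until every cell of [0, 1) holds at most one point."""
    cells = 1
    while True:
        occupancy = np.bincount((points * cells).astype(int), minlength=cells)
        if occupancy.max() <= 1:
            return cells
        cells *= 2

# non-uniform sources; the pair (0.1, 0.11) forces a fine mesh
pts = np.array([0.1, 0.11, 0.5, 0.9])
cells = refine_mesh(pts)
print(cells)   # mesh size ~ 1 / (minimum pairwise distance)
```

With a minimum pairwise distance of 0.01, the refinement stops at the first power of two that separates the closest pair, so the mesh (and matrix dimension) stays polynomial in the inverse minimum distance.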

Exponentially decaying kernels
The procedure used in the previous section readily generalizes to exponentially decaying kernels. Consider, for example, the following kernel:
$$k(x, x') = P(\|x - x'\|)\, e^{-q\|x - x'\|},$$
where q > 0 and $P(\|x - x'\|)$ is a function growing polynomially or slower; specifically, $|P(y)| = O(y^k)$. Typical kernels of this form include the Laplace and Gaussian kernels, which are commonly used in machine learning [15]. As before, we first hierarchically decompose the matrix into admissible blocks. For each admissible block $K_{\sigma,\rho}$ at level $\ell$, the minimum pairwise distance between particles is $d^{(\ell)}_{\min} = 2^{L-\ell}$, and therefore the maximum entry in the block can be bounded as
$$k^{(\ell)}_{\max} \le P(d^{(\ell)}_{\min})\, e^{-q\, d^{(\ell)}_{\min}} = O\big(t!\; 2^{-(L-\ell)(qt-k)}\big),$$
where we choose an integer t that satisfies qt - k > 1.
Thus, an admissible block $K_{\sigma,\rho}$ at level $\ell$ can be $(\alpha_\ell, L-\ell+3, \varepsilon/3)$-block-encoded via the naive procedure for dense matrices (Lemma 3.2), where $\alpha_\ell = t!\; 2^{(L-\ell)(1-(qt-k))}$. From here, an efficient block-encoding of K can be obtained via the same procedure described in the previous section, with p now replaced by qt - k > 1. It can be verified that the normalization factor of this block-encoding is Θ(1), which is optimal since the maximum entry of K is of magnitude Θ(1).
We note, however, that one can instead use a sparsification approach, in which K is approximated by a sparse banded matrix, and apply the block-encoding procedure for sparse matrices (Lemma B.5). This approach yields a block-encoding with sparsity, and hence normalization overhead, logarithmic in the inverse of the sparsification error (see Section 5.3).
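The trade-off is easy to quantify classically: truncating a kernel with decay $e^{-q\|x-x'\|}$ to a band of half-width w leaves a tail of size $O(e^{-qw})$, so bandwidth $O(q^{-1}\log(1/\varepsilon))$ suffices. An illustrative check with the Laplace kernel (P = 1):

```python
import numpy as np

q, N = 1.0, 256
i = np.arange(N, dtype=float)
dist = np.abs(i[:, None] - i[None, :])
K = np.exp(-q * dist)                    # Laplace kernel, P = 1

errs = []
for w in (4, 8, 16):
    band = np.where(dist <= w, K, 0.0)   # keep only the band |i - j| <= w
    # max row sum of the discarded tail upper-bounds the spectral-norm error
    errs.append(np.abs(K - band).sum(axis=1).max())
    print(w, errs[-1])                   # shrinks roughly like e^{-q w}
```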

Block-encoding of general H-matrices
In the setting most consistent with that of the classical H-matrix literature, one may be provided oracle access to the vectors in the low-rank factors $\Psi_{\sigma,\rho}$ and $\Phi_\rho$ which approximate an admissible block $K_{\sigma,\rho} = \Psi_{\sigma,\rho} D (\Phi_\rho)^{\dagger}$ (as seen in Equation 7). Such a situation may also arise when one wants to apply an arbitrary H-matrix which has no connection to any kernel function; e.g., see [37]. In this case, we can use the block-encoding procedure for low-rank matrices (Lemma 3.4) to block-encode $K_{\sigma,\rho}$.
After block-encoding the admissible blocks in each level of the hierarchical splitting, we follow the same procedure as in the previous sections to obtain the entire kernel matrix. Namely, we apply the procedure for block-sparse matrices (Lemma 3.5) to construct a block-encoding of $K^{(\ell)}$ for each level $\ell$ of the hierarchy; then we sum over all levels using the procedure for linear combinations of block-encoded matrices (Lemma B.3) to obtain a block-encoding of K. The normalization factor of the block-encoding of K is bounded as
$$\alpha \le 3\sum_{\ell=2}^{L} \alpha_\ell + 3,$$
where $\alpha_\ell$ is the normalization factor of the admissible blocks $K_{\sigma,\rho}$ at level $\ell$ (for ease of analysis, we assume admissible blocks at the same level have the same block-encoding normalization factor). As before, assuming the maximum entry of K is 1, the naive block-encoding (Lemma 3.2) yields a normalization factor of N, the dimension of the matrix, whereas the H-matrix block-encoding can yield an exponentially better normalization factor when the $\alpha_\ell$ are O(polylog(N)). This is not the case for some kernels studied in the classical H-matrix literature [13,25], such as the log kernel k(x, x') = log(|x - x'|). In fact, we show in Appendix C.6 that these particular kernels and most polynomially growing kernels can already be optimally block-encoded using the naive approach of Lemma 3.2. The naive approach works intuitively because these kernels are numerically low-rank and their top singular vectors are nearly uniform.

Variants of hierarchical splitting
The form of the hierarchical splitting can be slightly modified for other types of decaying kernels. For example, consider kernels of the form $k(x, x') = \frac{1}{|(x - x') + c|^p}$ on the domain $\Omega = [0, N-1)$, where $-N < c < N$ is an integer. In this case, we can use a "shifted" hierarchical splitting, in which the hierarchy is shifted in the horizontal direction (along the rows), to obtain an optimal block-encoding of the kernel matrix $K = (|i - j + c|^{-p})_{i,j=0}^{N-1}$. Similarly, for kernels whose entries depend on $x + x'$, we can apply a shifted hierarchical splitting along the skew-diagonal direction. We illustrate this approach in Appendix C.2. In Appendix C.4, we also describe a more general block-encoding that exploits the structure of entry magnitudes in a given matrix even if the entries of the admissible blocks are not neighboring (in fact, Definition 2.1 does not require the entries of an admissible block to be contiguous). At a high level, we use a pointer oracle which, conditioned on the level $\ell$, points to the entries whose magnitudes are in the range $[2^{-(\ell+1)}, 2^{-\ell})$. When this oracle can be efficiently implemented, this block-encoding procedure might find applications in other classes of dense matrices.

Proposition 4.5 (Block-encoding using generalized hierarchical splitting). Let $A \in \mathbb{R}^{N \times N}$ be a Hermitian matrix with non-negative entries which are provided via an oracle $O_A : |i\rangle|j\rangle|z\rangle \to |i\rangle|j\rangle|z \oplus \tilde a_{ij}\rangle$, where $\tilde a_{ij}$ is an exact bit string description of $a_{ij}$. Suppose $N = 2^L$ and $2^{-L} \le a_{ij} \le 1$. For any $0 \le \ell < L = \log N$ and a column $j$, define its level-row-index subset to be $I_\ell(j) = \{i \,|\, 2^{-(\ell+1)} < a_{ij} \le 2^{-\ell}\}$ and let $n_\ell(j) = |I_\ell(j)|$. Suppose there exist $n_\ell$ and $\gamma$ such that $\gamma n_\ell \le n_\ell(j) \le n_\ell$ for any $j$. Furthermore, suppose there exists an invertible function $f_j(\ell, k)$ which returns the row index of the $k$-th element (according to some fixed ordering, e.g., from top to bottom in the column) in $I_\ell(j)$.
Then, one can construct a block-encoding of $A$ with a normalization factor of at most $2\sum_{\ell=0}^{L-1} 2^{-\ell} n_\ell$. Intuitively, $\gamma$ measures how balanced the magnitudes of the matrix entries are across rows and columns. If $1/\gamma = \operatorname{polylog}(N)$, this yields a good block-encoding according to Definition 3.1. For instance, $1/\gamma$ is a constant for kernel matrices. In the proof of Proposition 4.5, we also present an efficient state preparation procedure using methods of H-matrix splitting in Appendix C.4, which could be of independent interest.
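A small classical illustration of the level-row-index subsets (hypothetical code, not from the paper): for the kernel $a_{ij} = |i - j|^{-1}$ with zero diagonal, the levels partition the non-zero entries, and the quantity $\sum_\ell 2^{-\ell} n_\ell$ dominates every column sum (and hence $\|A\|$ for this symmetric non-negative matrix) while staying $O(\log N)$ rather than $N$:

```python
import numpy as np

L = 9
N = 2 ** L
i, j = np.indices((N, N))
d = np.abs(i - j)
A = np.where(d > 0, 1.0 / np.maximum(d, 1), 0.0)

# I_ell(j) = {i : 2^{-(ell+1)} < a_ij <= 2^{-ell}}; n_ell = max_j |I_ell(j)|
n = []
total = 0
for ell in range(L):
    in_level = (A > 2.0 ** -(ell + 1)) & (A <= 2.0 ** -ell)
    counts = in_level.sum(axis=0)      # |I_ell(j)| for each column j
    n.append(counts.max())
    total += in_level.sum()

bound = sum(2.0 ** -ell * n_ell for ell, n_ell in enumerate(n))
col_sums = A.sum(axis=0)
print(bound, col_sums.max())
```

Since every entry on level $\ell$ is at most $2^{-\ell}$, each column sum is bounded by $\sum_\ell 2^{-\ell} n_\ell(j) \le \sum_\ell 2^{-\ell} n_\ell$, which is what the check confirms.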

Applications
In this section, we discuss applications of our quantum hierarchical block-encoding method in two commonly studied settings: (i) gravitational potential calculation [23] and (ii) solving integral equations [13]. The first problem requires performing matrix-vector multiplication on a quantum computer, while the second problem is equivalent to solving a quantum linear system. We also compare our procedure to the sparsification approach and address potential issues with large condition numbers.
We first state the following lemmas, which allow one to apply a block-encoded matrix or its inverse to a quantum state. Other polynomial transformations of block-encoded matrices can be performed by the quantum singular value transform (Theorem 17 of [3]).

Lemma 5.1 (Applying block-encoded matrices).
If $U$ is an $(\alpha, a, \varepsilon)$-block-encoding of the matrix $A \in \mathbb{C}^{2^s \times 2^s}$ and $U_x$ is a unitary that prepares the $s$-qubit quantum state $|x\rangle$, then it takes $O\big(\frac{\alpha\kappa}{\|A\|}\big)$ queries to $U$ and $U_x$ to obtain a quantum state $|y\rangle$ such that $\| |y\rangle - |Ax\rangle \| \le \varepsilon$, where $\kappa$ is the condition number of $A$.
Proof. Applying the operator $U (I_a \otimes U_x)$ to the state $|0\rangle^{\otimes a} \otimes |0\rangle^{\otimes s}$, we obtain $\frac{\|Ax\|}{\alpha} |0\rangle^{\otimes a} |Ax\rangle + |\mathrm{garbage}\rangle$, where $|\mathrm{garbage}\rangle$ is orthogonal to the subspace $|0\rangle^{\otimes a}$. By post-selecting on the subspace $|0\rangle^{\otimes a}$, we obtain the desired quantum state $|Ax\rangle$. This step succeeds with a probability of $\frac{\|Ax\|^2}{\alpha^2} = \Omega\big(\big(\frac{\|A\|}{\alpha\kappa}\big)^2\big)$. Using amplitude amplification [38], the number of repetitions can be reduced to $O\big(\frac{\alpha\kappa}{\|A\|}\big)$, giving us the specified query complexity.
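The post-selection arithmetic in this proof can be checked classically with an explicit one-ancilla unitary dilation (a numpy sketch under the simplifying assumption that $A$ is symmetric; this is not the quantum circuit itself):

```python
import numpy as np

rng = np.random.default_rng(0)
s = 3
N = 2 ** s
A = rng.standard_normal((N, N))
A = (A + A.T) / 2                       # symmetric, for a simple dilation
alpha = 1.5 * np.linalg.norm(A, 2)      # any alpha >= ||A|| is a valid factor
B = A / alpha                            # contraction encoded in the top-left block

w, V = np.linalg.eigh(B)
S = V @ np.diag(np.sqrt(1 - w ** 2)) @ V.T   # sqrt(I - B^2), commutes with B
U = np.block([[B, S], [S, -B]])              # one-ancilla unitary dilation of B

x = rng.standard_normal(N)
x /= np.linalg.norm(x)
out = U @ np.concatenate([x, np.zeros(N)])   # U (|0>|x>), ancilla-|0> branch first
p_success = np.linalg.norm(out[:N]) ** 2     # post-selection probability ||Ax||^2/alpha^2
```

The post-selected branch is exactly $Ax/\alpha$, so the success probability is $\|Ax\|^2/\alpha^2$, as in the proof.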

Lemma 5.2 (Applying inverse block-encoded matrices, adapted from [22]). If $U$ is an $(\alpha, a, \varepsilon)$-block-encoding of an invertible matrix $A \in \mathbb{C}^{2^s \times 2^s}$ and $U_b$ is a unitary that prepares the $s$-qubit quantum state $|b\rangle$, then it takes $O\big(\frac{\alpha\kappa}{\|A\|} \log(1/\varepsilon)\big)$ queries to $U$ and $U_b$ to obtain a quantum state $|y\rangle$ such that $\| |y\rangle - |A^{-1}b\rangle \| < \varepsilon$.
Proof. We apply the main theorem of [22], which assumed $\alpha = \|A\| = 1$ to achieve a query complexity of $O(\kappa \log(1/\varepsilon))$. In general, $\kappa$ needs to be replaced by the inverse of the minimum singular value of the block-encoded part. In our case, this singular value is $\frac{\|A\|}{\alpha\kappa}$.
As noted in Section 3, the query complexities of the above procedures, and more generally of quantum algorithms in the block-encoding framework [3], depend only on the ratio $\alpha/\|A\|$. Hence, a block-encoding is optimal (Definition 3.1) when $\alpha = \Theta(\|A\|)$. In the following, we apply such optimal block-encodings from Section 4 to two specific applications of kernel matrices.

Gravitational potential calculation: Quantum fast multipole method
We apply the block-encoding obtained in Section 4.1 to solve the gravitational potential problem [23] on a quantum computer. Since we use a quantum version of the hierarchical splitting, this is a quantum analogue of the classical fast multipole method [39]. Consider a 1D system of $N$ particles which are approximately uniformly distributed in the domain, where the mass and location of particle $j$ are $m_j$ and $x_j$, respectively. Furthermore, assume that the $m_j$ are $\Theta(1)$ and the smallest pairwise distance between the particles is $\Omega(1)$. We would like to compute the accumulated potential at every particle $i$, stored in the vector $\Phi$ with entries $\Phi_i = \sum_{j \neq i} \frac{m_j}{|x_i - x_j|}$.

Proposition 5.3 (Quantum fast multipole method). Given an oracle and a procedure $P_m$ that prepares the quantum state $|m\rangle$ which stores the normalized masses of the particles, we can obtain a quantum state $|\phi\rangle$ that $\varepsilon$-approximates the normalized vector $\Phi/\|\Phi\|$, whose $j$-th entry is the accumulated potential at particle $j$.

Proof. Using Lemma 4.3, we can construct a unitary $U$, which is an $(\alpha, a, \varepsilon)$-block-encoding of $K$, with the specified gate complexity and wherein $\alpha = \Theta(\log N)$ and $a = O(\log N)$. Then, we apply Lemma 5.1, with a more careful analysis of the post-selection step. Here, the probability of successfully post-selecting the subspace $|0\rangle^{\otimes a}$ is $\big(\frac{\|\Phi\|}{\alpha\|m\|}\big)^2 = \Theta(1)$, since $\|m\| = \Theta(\sqrt N)$ and $\|\Phi\| = \Omega(\sqrt N \log N)$ (by noting that the particle masses are $\Theta(1)$ and using similar calculations to those in Remark 4.1). This gives us the specified query complexity.
We remark that the (non-unitary) state preparation procedure $P_m$ can be efficiently implemented when $m_j = \Theta(1)$ (see Theorem 1 of [40]). In addition, if particles are not (nearly) uniformly distributed, we can discretize the domain to form a finer mesh as described in Remark 4.4. Here we have assumed the pairwise distances are $\Omega(1)$, hence the size of the mesh (and the size of the kernel matrix) is $O(N)$. This method also enables computing the potential at locations other than the sources themselves. This proposition also applies to $N$-body calculations in higher-dimensional systems, as the hierarchical splitting and Lemma 4.3 can be generalized to these cases (see Appendix D.3).
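A classical sanity check of the success-probability claim in Proposition 5.3 (a sketch with unit masses on the integer grid; $\alpha = 3 + 3(L-1)$ is the level-count normalization factor from Appendix C.1 for $p = 1$):

```python
import numpy as np

L = 9
N = 2 ** L
i, j = np.indices((N, N))
d = np.abs(i - j)
K = np.where(d > 0, 1.0 / np.maximum(d, 1), 0.0)   # gravitational kernel, p = 1

m = np.ones(N)                  # Theta(1) masses
Phi = K @ m                     # accumulated potentials Phi = K m
alpha = 3 + 3 * (L - 1)         # hierarchical normalization, Theta(log N)
p_success = (np.linalg.norm(Phi) / (alpha * np.linalg.norm(m))) ** 2
print(p_success)
```

With $\|m\| = \sqrt N$ and $\|\Phi\| = \Omega(\sqrt N \log N)$, the post-selection probability stays bounded away from zero as claimed.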

Integral equation: Quantum algorithm for linear systems of kernel matrices
We provide an application of our hierarchical block-encoding procedure in solving a quantum linear system of kernel matrices. These problems arise in many research areas, such as Gaussian processes [15] and integral equations [14]. Consider an integral equation of the second kind, $f(x) + \int_\Omega k(x, x') f(x')\, dx' = g(x)$, over the circular strip of unit radius and height/width $\lambda \ll 1$ as shown in Figure 2, where $g(x)$ is a known smooth function and $f(x)$ is the unknown function we would like to solve for. Equations of this form are commonly found in, e.g., Dirichlet problems, where $g(x)$ is the given potential on the strip and $f(x)$ is the unknown charge density. Here, we use the collocation method to numerically solve this integral equation: we represent the unknown function in a basis of $N$ panels, so that the equation can be rewritten as a finite linear system. In the collocation method, we solve the integral equation at the centroids of the panels, located at $c_i = 2\pi(i + 0.5)/N$. In other words, we solve the linear system $(I + K)\mathbf{f} = \mathbf{g}$, where bold text indicates discretized values. Observe that this collocation method can handle the singularity at $i = j$, since the integral of the kernel over a single panel remains finite. For panels $i, j$ close to each other (e.g., $|i - j| < O(1)$), simple adaptive methods can be used to approximate the integral. For instance, one can subdivide panel $j$ into eight smaller panels and approximate the integral as the sum over these eight sub-panels; we neglect the details and refer the interested reader to [41]. In summary, the entries of $K$ are simple to calculate, and they are approximately $K_{ij} \approx \frac{1}{N (2 \sin(\pi|i-j|/N))^p}$. The error bound and the uniqueness of the solution in the collocation method are well-studied in the classical literature; we refer to [14,41,42] for more details.
We now proceed to block-encode this kernel matrix. Observe that $\frac{1}{N(2\sin(\pi|i-j|/N))^p} \le \frac{1}{(\pi|i-j|)^p}$ for any panels $i, j$ such that $|i - j| \le N/2$; hence we can apply a "cyclic" version of the hierarchical splitting in Section 2 to optimally block-encode $K$ (see Figure 2B). As analyzed in Section 4.1, this block-encoding has a circuit depth of only $O(\operatorname{polylog}\frac{N}{\varepsilon})$ and a normalization factor of $\alpha = \Theta(\|K\|)$, as the operator norm can be lower-bounded using the same technique as in Remark 4.1. Finally, it is straightforward to use Lemma B.3 (linear combination of block-encoded matrices) to obtain an optimal block-encoding of $I + K$. Then, we directly apply Lemma 5.2 to solve the linear system in Equation 24. Since the block-encoding is optimal, the query complexity is $O(\kappa \log(1/\varepsilon))$, for a total runtime of $O(\kappa \operatorname{polylog}(N/\varepsilon))$ when including the circuit complexity of the block-encoding (Lemma 4.3). We address two practical aspects when applying Proposition 5.4. First, we must prepare $g$ as a quantum state. In Appendix E, we present an efficient procedure for preparing $|g\rangle$ in the case where $g$ is sampled from a smooth function. At a high level, our procedure prepares the quantum state in the Fourier regime, which is approximately sparse due to the smoothness of $g$; then we apply the quantum Fourier transform to obtain the desired real-regime state. To prove Proposition 4.5, we also present another state preparation procedure using ideas from H-matrix splitting in Appendix C.4, which could be of independent interest. Second, the query complexity of the block-encoding depends on the condition number $\kappa$ of the matrix $I + K$, which can be bounded as $\kappa \le \frac{1 + \|K\|}{1 - \|K\|}$. Note that here, $\|K\|$ scales with $\lambda$. Since we require $\lambda \ll 1$ for the collocation method to work, the operator norm $\|K\|$, and thus $\kappa$, is bounded. Therefore, the runtime of Proposition 5.4 is $O(\operatorname{polylog}\frac{N}{\varepsilon})$.
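The collocation system and its conditioning can be illustrated classically (a hypothetical discretization sketch: we pick $p = 0.5$, a small coupling $\lambda$, and a crude stand-in value for the self-panel integral; none of these numerical choices come from the text):

```python
import numpy as np

N, p, lam = 256, 0.5, 0.1
idx = np.arange(N)
d = np.abs(idx[:, None] - idx[None, :])
d = np.minimum(d, N - d)                  # panel distance on the circle
chord = 2 * np.sin(np.pi * d / N)         # chord length between centroids
K = np.zeros((N, N))
off = d > 0
K[off] = lam / (N * chord[off] ** p)      # K_ij ~ 1 / (N (2 sin(pi|i-j|/N))^p)
K[~off] = 2 * lam / N                     # crude stand-in for the self-panel integral

c = 2 * np.pi * (idx + 0.5) / N           # collocation points (centroids)
g = np.cos(c)                             # smooth right-hand side
f = np.linalg.solve(np.eye(N) + K, g)     # collocation linear system (I + K) f = g
kappa = np.linalg.cond(np.eye(N) + K)
print(kappa)
```

Because $\|K\|$ scales with $\lambda \ll 1$, the regularized matrix $I + K$ remains well conditioned.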

Notes on sparsification approach
Since the entries of kernel matrices often decay along a row or column, one may hope to approximate them by keeping only a sparse set of matrix entries and applying the naive block-encoding method for sparse matrices (Lemma B.5). Here, we analyze this approach and show that it is only favorable when kernel entries decay exponentially, not polynomially. In fact, the runtime of this method is exponentially worse in the target error bound $\varepsilon$ compared to that of our approach. Consider the polynomially decaying kernel in Equation 13. One could attempt to approximate this kernel matrix by a band matrix $K_b$ of sparsity $d$. Defining the error to be $\|K - K_b\|$, it can be seen that the error is of magnitude $\int_d^N x^{-p}\, dx$. Since this integral diverges when $p \le 1$, one can only sparsify the matrix when $p > 1$. In this case, to achieve an error bound of $\varepsilon_s$, the sparsity $d$ needs to be $\Omega(\varepsilon_s^{-1/(p-1)})$, which is polynomial in $1/\varepsilon_s$. For exponential kernels, however, the sparsification method requires keeping only $\Theta(\log(1/\varepsilon_s))$ or more entries per row or column. With Lemma B.5, the sparsified block-encoding has a normalization factor of $\Theta(\log(1/\varepsilon_s))$ using the same gate and query complexities as above. Thus, exponential kernels may admit a sparsification approach with an extra factor of $\Theta(\log(1/\varepsilon))$ in the runtime.
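The two decay regimes can be compared numerically (sketch; the cutoffs $d$ follow the scalings above, with small constants chosen for illustration):

```python
import numpy as np

def poly_tail(p, d, N=1 << 14):
    # column-sum error from dropping entries x^{-p} with x >= d (~ int_d^N x^{-p} dx)
    x = np.arange(d, N, dtype=float)
    return np.sum(x ** -p)

def exp_tail(d, N=1 << 14):
    # same tail error for an exponentially decaying kernel e^{-x}
    x = np.arange(d, N, dtype=float)
    return np.sum(np.exp(-x))

eps = 1e-3
d_poly = int(np.ceil(eps ** (-1.0 / (2 - 1))))   # p = 2: d = Omega(eps^{-1/(p-1)})
d_exp = int(np.ceil(np.log(1 / eps))) + 1        # d = O(log(1/eps)), +1 for the constant
print(d_poly, poly_tail(2.0, d_poly), d_exp, exp_tail(d_exp))
```

The polynomial kernel needs a bandwidth polynomial in $1/\varepsilon_s$, while the exponential kernel reaches the same error with a logarithmic bandwidth.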

Notes on condition number
The condition number $\kappa$ and its relation to the dimension $N$ of the matrix play a central role in the runtime of our algorithms. Numerical experiments shown in Appendix C.5 indicate that for purely polynomially decaying kernels, without any regularization and ignoring self-interacting terms (diagonal entries set to zero), the condition number grows polynomially with $N$, i.e., the minimum singular value decays with $N$. In practice, however, the input vector often lives in the well-conditioned subspace where these smaller singular values can be ignored, as observed in the gravitational example earlier. For force calculation problems, this is the case when particles have comparable masses/charges. For numerical integration or integral equations, this is equivalently the case when the input functions change slowly.
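A minimal version of such a numerical experiment (sketch; the precise setup of Appendix C.5 may differ):

```python
import numpy as np

def kernel(N, p=1.0):
    # polynomially decaying kernel with self-interacting terms set to zero
    i, j = np.indices((N, N))
    d = np.abs(i - j)
    return np.where(d > 0, 1.0 / np.maximum(d, 1) ** p, 0.0)

conds = [np.linalg.cond(kernel(N)) for N in (64, 128, 256)]  # grows with N

# a slowly varying (uniform) input lives in the well-conditioned subspace:
K = kernel(256)
u = np.ones(256) / np.sqrt(256)
effective = np.linalg.norm(K @ u) / np.linalg.norm(K, 2)
print(conds, effective)
```

The condition number grows with $N$, yet the uniform input vector is mapped with norm comparable to $\|K\|$, i.e., it barely overlaps the small-singular-value subspace.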
We note that Lemma 5.2 for matrix inversion uses the discrete adiabatic theorem, whose runtime depends directly on the condition number $\kappa$ of the matrix operation. Hence, in the case where the matrix $A$ is ill-conditioned but the input vector lives in the well-conditioned subspace, we can use techniques that allow for filtering out small eigenvalues before the inversion step, such as the QSVT [3]. The query complexity of the block-encoding in matrix inversion in this case is $O\big(\frac{\operatorname{polylog}(1/\zeta, 1/\varepsilon)}{\zeta\, \| A^{-1}|b\rangle \|}\big)$, where $\zeta$ is a chosen threshold bounding the singular values in the well-conditioned subspace.
Fortunately, for applications in numerical analysis or machine learning, the kernel matrix is often regularized by an added matrix, e.g., the addition of an identity matrix as in Fredholm integral equations of the second kind, or the noise matrices $\sigma_n^2 I$ in Gaussian processes and kernel ridge regression [14,15]. In the integral equation setting, this regularization is equivalent to scaling the magnitude of self-interacting terms (diagonal entries). Furthermore, if the input vector is potentially in the ill-conditioned subspace, preconditioning methods in [43] may be used to improve the condition number.

Conclusion
Our results show that factoring certain kernel matrices into hierarchical or H-matrices leads to efficient algorithms for implementing such kernel matrices on a quantum computer, thus expanding the scope of application of quantum computers to a class of matrices which are neither low-rank nor sparse. In fact, our approach to implementing such hierarchical matrices, in a sense, blends prior quantum methods for implementing unitary block-encodings, by hierarchically splitting a matrix into a sparse component close to the diagonal and progressively larger blocks farther away from the diagonal. Looking forward, it is an interesting question whether any extensions to the H-matrix framework can lead to further improvements in the quantum setting. Classically, the most significant extension of the H-matrix framework is perhaps the development of $H^2$-matrices, which incorporate a further level of hierarchically nested bases to provide a logarithmic speedup in the matrix dimension for performing matrix operations [24,44]. Furthermore, recent classical work has parameterized H-matrices to integrate them into neural networks and design learning algorithms with inductive biases that match the form of the hierarchical splitting [37,45]. Future work can study whether such parameterizations can also be performed in the quantum setting via variational quantum algorithms or other quantum analogues of classical neural networks [46].

A Prior works on block-encoding and quantum linear systems algorithms
In this section, we provide a brief overview and discuss the runtimes of previous works in Table 1. As a reminder, the quantum linear system problem (QLSP) seeks an approximate solution $|x\rangle \propto A^{-1}|b\rangle$ to the system $Ax = b$. Let $N$ be the dimension, $d$ the sparsity, $\kappa$ the condition number, and $k$ the rank of the matrix $A$, which is assumed to be square, normalized ($\|A\| = 1$), and Hermitian$^1$. The original HHL algorithm [1] is based on quantum phase estimation (QPE), with a runtime scaling polynomially in $d$, $\kappa$, and $1/\varepsilon$. Later, [2] noticed that the QPE step can be circumvented by utilizing the fact that the inverse function $\frac{1}{x}$ can be written, to the desired precision $\varepsilon/\kappa$, as a linear combination of $O(\kappa \operatorname{polylog}(\kappa/\varepsilon))$ Chebyshev polynomials. Combining this expansion with the variable-time amplitude amplification technique in [48], they achieved a runtime of $O(d\kappa \operatorname{polylog}(N\kappa/\varepsilon))$. This polynomial expansion approach is generalized in the block-encoding framework by [3,19] to perform almost any polynomial matrix transformation with the same complexity. All these works use a naive block-encoding in the oracle-access model (Lemma 48 in [3], same as Lemma B.5 or Lemma 3.2).
In another approach, [5] used a QRAM data structure to implement a unitary $W$ whose eigenvalues are related to the singular values of $A$ (this algorithm does not require $A$ to be square or Hermitian) via its Frobenius norm $\|A\|_F$. Thus, performing QPE on $W$ yields a quantum singular value estimation algorithm for $A$. Matrix multiplication and inversion can then be implemented immediately with a rotation operator controlled on the register storing the singular values, just as done in HHL. The runtime of both operations, including the post-selection step, is $O(\kappa^2 \|A\|_F \operatorname{polylog}(N)/\varepsilon)$. For a full-rank matrix whose singular values are $O(1)$, we have $\|A\|_F = \sqrt N$. This QRAM model, however, is strictly stronger than the oracle-access model, as it requires that the entries be loaded and pre-processed in the data structure.

$^1$ If $A$ is not Hermitian, one can consider the equivalent linear system $\begin{pmatrix} 0 & A \\ A^\dagger & 0 \end{pmatrix} y = \begin{pmatrix} b \\ 0 \end{pmatrix}$, whose solution is $y = \begin{pmatrix} 0 \\ x \end{pmatrix}$.
Recent adiabatic algorithms for QLSP [49-51] also enjoy success, culminating in [22], which achieved an optimal query complexity of $O(\kappa \log(1/\varepsilon))$ by constructing a sophisticated schedule function for the adiabatic evolution. This query complexity is better than the algorithms in [2,3,5] and has been shown to be optimal [22]. However, all of these assume that the matrix $A$ is perfectly block-encoded inside a unitary $U = \begin{pmatrix} A & \cdot \\ \cdot & \cdot \end{pmatrix}$, which is generally hard to construct without prior knowledge of the structure of $A$. The exact query complexity and total runtime in the general case is stated in Lemma 5.2 and governed by the ratio $\frac{\|A\|}{\alpha}$, where $\alpha$ is the normalization factor in the block-encoding. The runtimes shown in Table 1 are obtained by combining the optimal adiabatic solver in [22] (see Lemma 5.2) with the corresponding block-encodings used in [2-5,19]. In particular, the naive oracle-access block-encoding in [2,3,19] yields a normalization factor of $\alpha = d = N$ for dense matrices. The QRAM-based block-encoding in [4,5] yields $\alpha = \|A\|_F = \sqrt N$ for full-rank matrices. The runtime for matrix multiplication is simply obtained by computing the probability of success in the post-selection step after applying the block-encoding, as described in Section 3 and Lemma 5.1.
Quantum-inspired algorithms are a class of classical algorithms equipped with sample-query access, a classical analogue of QRAM. The currently fastest quantum-inspired algorithms for QLSP [6,7] are based on stochastic gradient descent (specifically, the randomized Kaczmarz method) and run in time $\tilde O(\|A\|_F^6 \kappa^8/\varepsilon^2)$ (where the notation $\tilde O$ suppresses the log factors). Assuming $\kappa = O(1)$ and the matrix has rank $k$, this runtime is $\tilde O(k^6/\varepsilon^2)$, which is prohibitive when the matrix is full-rank.
We note that all the algorithms above are "general-purpose," since they do not assume any inherent structure in the matrix $A$. In contrast, our work builds on these past works and focuses on optimizing the performance of QLSP algorithms for dense and full-rank kernel matrices, and potentially other classes of matrices that suit the H-matrix framework.

B Details of block-encodings
We list some previously established results in the block-encoding framework and prove the helper lemmas presented in the main text. Most of the following results are drawn from Section 4 of [3]. As a reminder, a block-encoding is defined as follows.

Definition B.1 (Block-encoding [3]). Suppose that $A$ is an $s$-qubit operator and $\alpha, \varepsilon > 0$. Then we say that the $(s+a)$-qubit unitary $U$ is an $(\alpha, a, \varepsilon)$-block-encoding of $A$ if $\big\| A - \alpha (\langle 0|^{\otimes a} \otimes I) U (|0\rangle^{\otimes a} \otimes I) \big\| \le \varepsilon$,
where $\|\cdot\|$ denotes the operator norm of a matrix.

B.1 Linear combinations and multiplications of block-encodings
Block-encoded matrices can be efficiently combined linearly or multiplied with each other. First, we define the state-preparation-pairs used to prepare the coefficients in a linear combination.
We can then implement a linear combination of block-encoded matrices using such a state-preparation-pair.

Lemma B.3 (Linear combination of block-encoded matrices, adapted from [3]). Let
Proof. We note that in the original work [3], Lemma 52 states that the error of the block-encoding is $\alpha\varepsilon_1 + \alpha\beta\varepsilon_2$, which is a typo. Lemma B.3 also applies to cases where $P_L$ and $P_R$ are non-unitary. As long as block-encodings of $P_L, P_R$ are provided, we can still implement the unitary $W$ in Lemma B.3 by using the following lemma, which allows one to multiply block-encoded matrices.

B.2 Block-encoding of oracle-access matrices
The following lemma can be used to block-encode any matrix whose entries are provided by an oracle.
Lemma B.5 (Block-encoding of sparse matrices, adapted from [3]). Let $A \in \mathbb{C}^{2^s \times 2^s}$ be a $d_r$-row-sparse and $d_c$-column-sparse matrix. Suppose that we have access to the following $2(s+1)$-qubit oracles $O_r : |i\rangle|k\rangle \to |i\rangle|r_{ik}\rangle$ and $O_c : |l\rangle|j\rangle \to |c_{lj}\rangle|j\rangle$, where $r_{ik}$ is the index of the $k$-th non-zero entry of the $i$-th row of $A$ (or, if there are fewer than $k$ non-zero entries, $r_{ik} = k + 2^s$), and similarly $c_{lj}$ is the index of the $l$-th non-zero entry of the $j$-th column of $A$ (or, if there are fewer than $l$ non-zero entries, $c_{lj} = l + 2^s$). Additionally, let $\hat a \ge \max_{i,j} |a_{ij}|$ and suppose the following oracle is provided: $O_A : |i\rangle|j\rangle|z\rangle \to |i\rangle|j\rangle|z \oplus \tilde a_{ij}\rangle$, where $\tilde a_{ij}$ is a $b$-bit description of $a_{ij}$. Then one can construct a $(\sqrt{d_r d_c}\,\hat a, s+3, \varepsilon)$-block-encoding of $A$ with two uses of $O_A$, one use each of $O_r$ and $O_c$, and additionally $O(s + \operatorname{polylog}(\frac{\hat a \sqrt{d_r d_c}}{\varepsilon}))$ one- and two-qubit gates, while using $O(b + \operatorname{polylog}(\frac{\hat a \sqrt{d_r d_c}}{\varepsilon}))$ ancilla qubits (which are discarded before the post-selection step).
Proof. Let $G_d$ be the $(s+1)$-qubit unitary that performs $|0\rangle \to \frac{1}{\sqrt d}\sum_{k=0}^{d-1}|k\rangle$, which can be implemented with $O(s)$ Hadamard gates (assume $d$ is a power of 2 and note that $d \le 2^s$). Additionally, define the following $2(s+1)$-qubit operators: $V_R = O_c (G_{d_c} \otimes I_{s+1})$ and $V_L = O_r (I_{s+1} \otimes G_{d_r})\,\mathrm{SWAP}_{s+1}$, where $\mathrm{SWAP}_{s+1}$ swaps two $(s+1)$-qubit registers (which uses $s+1$ two-qubit swap gates). For any $0 \le i, j < 2^s$, the states $V_R |0\rangle|j\rangle$ and $V_L |0\rangle|i\rangle$ are uniform superpositions over the non-zero entries of column $j$ and row $i$, respectively. Observe that after applying $V_R$, instead of applying $V_L^\dagger$ right away, we can first append a $b$-qubit register and call $O_A$ to query the matrix entry, then append the register $|0\rangle$ and perform the controlled rotation $|0\rangle|\tilde a_{ij}\rangle \to \big(\frac{a_{ij}}{\hat a}|0\rangle + \sqrt{1 - |a_{ij}/\hat a|^2}\,|1\rangle\big)|\tilde a_{ij}\rangle$, and finally call $O_A$ again to uncompute the $b$-qubit ancilla register. Conditioning on the ancilla qubit being $|0\rangle$, we obtain the desired block-encoding. We only count $s+3$ qubits as ancilla qubits in the block-encoding notation because the others can be discarded immediately after the uncomputation step.
We analyze the circuit complexity of the controlled rotation step above. Observe that, assuming the $b$-qubit description $|\tilde a_{ij}\rangle$ is exact, if the controlled rotation is performed with an accuracy of $O(\frac{\varepsilon}{\hat a \sqrt{d_r d_c}})$, then we achieve the overall accuracy $\varepsilon$ of the block-encoding. Hence, $O(\operatorname{polylog}(\frac{\hat a \sqrt{d_r d_c}}{\varepsilon}))$ controlled quantum gates suffice to perform the controlled rotation.
We can see that the normalization factor of the above block-encoding depends on the sparsity of the matrix. Without loss of generality, assume $\hat a = 1$. Then, the above lemma applied to dense matrices produces a block-encoding with a normalization factor of $O(N)$. This is not optimal (see the definition in Section 3) if the operator norm of the matrix is significantly smaller than $O(N)$. Hence, we refer to it as the "naive" block-encoding. We provide some examples in which this lemma works optimally in Appendix C.6.

Remark B.6 ("Naive" block-encoding of dense oracle-access matrices (same as Lemma 3.2)). In the dense case, the oracles $O_r, O_c$ are not needed; we can simply use $s$ Hadamard gates to query all matrix entries. It can be easily verified that in this case we obtain an $(\hat a 2^s, s+1, \varepsilon)$-block-encoding with two uses of $O_A$ and additionally $O(s + \operatorname{polylog}(\frac{\hat a 2^s}{\varepsilon}))$ one- and two-qubit gates, while using $O(b + \operatorname{polylog}(\frac{\hat a 2^s}{\varepsilon}))$ ancilla qubits.
If the matrix $A$ is rectangular, that is, $A \in \mathbb{C}^{2^s \times 2^t}$ where we assume $s > t$ without loss of generality, we can block-encode $A$ by applying the above lemmas to the $2^s \times 2^s$ matrix $A' = \begin{pmatrix} A & 0 \end{pmatrix}$ obtained by padding $A$ with zero columns. Then, for an input state $|\psi\rangle$, we can obtain $A|\psi\rangle$ by applying $A'$ to the state $|0^{s-t}\rangle \otimes |\psi\rangle$.
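This zero-padding reduction is easy to verify classically (a sketch; it assumes the convention that $|0^{s-t}\rangle \otimes |\psi\rangle$ places $|\psi\rangle$ in the first $2^t$ coordinates):

```python
import numpy as np

rng = np.random.default_rng(2)
s, t = 3, 2
A = rng.standard_normal((2 ** s, 2 ** t))
Apad = np.hstack([A, np.zeros((2 ** s, 2 ** s - 2 ** t))])   # A' = (A  0)

psi = rng.standard_normal(2 ** t)
embedded = np.concatenate([psi, np.zeros(2 ** s - 2 ** t)])  # |0^{s-t}> (x) |psi>
print(np.allclose(Apad @ embedded, A @ psi))
```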

B.3 Block-encoding of low-rank matrices
Recall Lemma 3.4, copied below.
Suppose the following $(r+s)$-qubit unitaries are provided: $G_L : |i\rangle|0\rangle \to |i\rangle|u_i\rangle$ and $G_R : |i\rangle|0\rangle \to |i\rangle|v_i\rangle$, where $0 \le i < p$ and $|u_i\rangle$ ($|v_i\rangle$) is the quantum state whose amplitudes are the entries of $u_i$ ($v_i$). Let $(P_L, P_R)$ be a $(\beta, r, \varepsilon)$-state-preparation-pair for the vector $[\sigma_0, \ldots, \sigma_{p-1}]$. Then, one can construct a $(\beta, r+s, \varepsilon)$-block-encoding of $A$ with one use of each of $G_L, G_R^\dagger, P_L^\dagger, P_R$ and a SWAP gate on two $s$-qubit registers.
The unitaries $G_R, G_L$ can be constructed using existing state preparation methods, based on the structure of the entries of the singular vectors (e.g., [34] utilized efficiently integrable distributions, and [4] used a QRAM data structure). As an example, when given oracle access to the entries of the vectors $u, v$, one can use the following lemma to block-encode rank-1 operators.
Lemma B.8 (Block-encoding of rank-1 operators with oracle access to entries). Let $A = uv^\dagger \in \mathbb{C}^{2^s \times 2^s}$ be a rank-1 operator. Suppose oracles $O_u$ and $O_v$ returning bit descriptions of the entries of $u$ and $v$ are provided, and let $\hat u \ge \max_i |u_i|$ and $\hat v \ge \max_j |v_j|$. Then one can obtain a block-encoding of $A$ with normalization factor $2^s \hat u \hat v$.

Proof. Let $CR_{\hat u}$ denote the controlled rotation operator that, given the bit description of $u_i$, rotates an ancilla qubit with amplitude $u_i/\hat u$; together with Hadamard gates, this implements a unitary $G_u$ whose first column is proportional to $u$. Define $G_v$ similarly. Combining $G_u$ and $G_v^\dagger$ as in Lemma 3.4 yields the desired block-encoding.$^2$

$^2$ This low-rank block-encoding can also be applied to rectangular matrices. Suppose $u \in \mathbb{C}^{2^m}$ and $v \in \mathbb{C}^{2^n}$, where we assume without loss of generality $m > n$; then we can apply the lemmas to the matrix $A' = u(\langle 0|^{\otimes(m-n)} \otimes v^\dagger)$.
Remark B.9. The normalization factor $2^s \hat u \hat v$ ensures that the block-encoding is unitary. Thus, Lemma B.8 works best when the entries $u_i$ and $v_j$ change slowly. If $u$ or $v$ is sparse, we can apply the techniques in Lemma B.5 to improve the normalization factor of this low-rank block-encoding. In general, however, the above lemma is rather redundant, since we can directly apply Lemma B.5 by constructing the matrix entry-access oracle from $O_u$ and $O_v$. We simply present this lemma here for completeness.

B.4 Block-encoding of block-sparse matrices (Lemma 3.5)
In this section, we provide a block-encoding procedure for block-sparse matrices. First we consider the block-diagonal case and later use these techniques for constructing general block-sparse matrices.
Lemma B.10 (Block-encoding of block-diagonal matrices). Let $\{A_j : 0 \le j \le 2^t - 1\}$ be a set of $s$-qubit operators and let each $U_j$ be an $(\alpha_j, a, \varepsilon)$-block-encoding of $A_j$. Let $W = \sum_{j=0}^{2^t-1} |j\rangle\langle j| \otimes U_j$ and $\bar\alpha = \max_j \alpha_j$. Furthermore, suppose that the following oracle is provided: $O_\alpha : |j\rangle|z\rangle \to |j\rangle|z \oplus \tilde\alpha_j\rangle$, where $\tilde\alpha_j$ is the (exact) $b$-qubit description of $\alpha_j$ and $z$ is a $b$-qubit string. Then one can obtain an $(\bar\alpha, a+1, 2\varepsilon)$-block-encoding of $A = \sum_{j=0}^{2^t-1} |j\rangle\langle j| \otimes A_j$ with two uses of $O_\alpha$, one use of $W$, and additionally $O(\operatorname{polylog}(\frac{\bar\alpha}{\varepsilon}))$ one- and two-qubit gates, while using $O(b + \operatorname{polylog}(\frac{\bar\alpha}{\varepsilon}))$ ancilla qubits (which are discarded before the post-selection step).
We do not include b in the notation of the block-encoding because this register can actually be taken out of the system right after the uncomputation step.
In the above lemma, if the normalization factors $\alpha_j$ are the same for all blocks, then the operator $W$ is already an $(\bar\alpha, a, \varepsilon)$-block-encoding of the desired block-diagonal matrix.
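The rescaling arithmetic behind Lemma B.10 can be checked classically (a sketch of one plausible reading: each $U_j$ encodes $A_j/\alpha_j$, and the controlled rotation against $O_\alpha$ multiplies branch $j$ by $\alpha_j/\bar\alpha$, so every block ends up expressed with the common factor $\bar\alpha$):

```python
import numpy as np

rng = np.random.default_rng(3)
blocks = [rng.standard_normal((4, 4)) for _ in range(4)]
alphas = [np.linalg.norm(B, 2) for B in blocks]   # per-block normalization factors
abar = max(alphas)

# branch j: (A_j / alpha_j) from U_j, times the rotation amplitude alpha_j / abar
encoded_blocks = [(B / a) * (a / abar) for B, a in zip(blocks, alphas)]

A = np.zeros((16, 16))
enc = np.zeros((16, 16))
for k, (B, E) in enumerate(zip(blocks, encoded_blocks)):
    A[4 * k:4 * k + 4, 4 * k:4 * k + 4] = B
    enc[4 * k:4 * k + 4, 4 * k:4 * k + 4] = E
print(np.allclose(enc * abar, A))
```

The combined encoded block is exactly $A/\bar\alpha$, a valid contraction since $\bar\alpha \ge \max_j \|A_j\| = \|A\|$.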
Next, we apply the proof techniques above to generalize to the block-sparse case. The following lemma is the same as Lemma 3.5.
Lemma B.11 (Block-encoding of block-sparse matrices). Let $A = \sum_{i,j=0}^{2^t-1} |i\rangle\langle j| \otimes A_{ij}$ be a $d_r$-row-block-sparse and $d_c$-column-block-sparse matrix, where each $A_{ij}$ is an $s$-qubit operator. Let $U_{ij}$ be an $(\alpha_{ij}, a, \varepsilon)$-block-encoding of $A_{ij}$. Suppose that we have access to the following $2(t+1)$-qubit oracles $O_r : |i\rangle|k\rangle \to |i\rangle|r_{ik}\rangle$ and $O_c : |l\rangle|j\rangle \to |c_{lj}\rangle|j\rangle$, where $r_{ik}$ is the index of the $k$-th non-zero block of the $i$-th block-row of $A$ (or, if there are fewer than $k$ non-zero blocks, $r_{ik} = k + 2^t$), and similarly $c_{lj}$ is the index of the $l$-th non-zero block of the $j$-th block-column of $A$ (or, if there are fewer than $l$ non-zero blocks, $c_{lj} = l + 2^t$). Additionally, suppose an oracle $O_\alpha$ for the normalization factors $\alpha_{ij}$ is provided, as in Lemma B.10.

Proof. Let $CR$ be the controlled rotation operator defined as in Lemma B.10, and let $W$ be the $(2t + 2 + s + a)$-qubit unitary $W = \sum_{i,j : A_{ij} \neq 0} (|i\rangle|j\rangle\langle i|\langle j|) \otimes U_{ij} + \big(I - \sum_{i,j : A_{ij} \neq 0} |i\rangle|j\rangle\langle i|\langle j|\big) \otimes I_{s+a}$. Then, similarly to Lemma B.10, combining $W$ with the analogues of $V_L$ and $V_R$ from Lemma B.5 gives the desired block-encoding. Indeed, this follows directly from Equation 30 and a similar proof to that of Lemma B.10, noting that $W$ leaves the $|c_{lj}\rangle$ and $|j\rangle$ registers unchanged. Implementing $G_d$ and the controlled rotation operator requires the indicated additional gate and ancilla qubit complexities (see the explanation in the proof of Lemma B.5).
Note that this lemma does not require block-sparsity to work; it generalizes Lemma B.5 (also Lemma 48 of [3]) for block-encoding matrices with oracle access to entries. If the $\alpha_{ij}$ are the same for all blocks, we can omit the oracle $O_\alpha$ and the controlled rotation step, hence only $t + a + 2$ ancilla qubits are needed.
C Detailed analysis of block-encodings of kernel matrices

C.1 Circuit complexity analysis of Section 4.1

In this section, we provide a detailed analysis of the circuit and query complexities of the block-encoding procedure presented in Section 4.1. Our problem is to optimally block-encode the kernel matrix $K = (k(x_i, x_j))_{i,j=0}^{N-1}$, where $x_i = i$, $N = 2^L$, and $k(x, x') = |x - x'|^{-p}$ is a polynomially decaying kernel, given the oracle $O_k : |i\rangle|j\rangle|z\rangle \to |i\rangle|j\rangle|z \oplus \tilde k_{ij}\rangle$, where $\tilde k_{ij}$ is a bit string description of $k(x_i, x_j)$. As a reminder, our procedure uses the hierarchical splitting in Figure 1. Specifically, at level $\ell$ of the hierarchy, each admissible block has size $2^{L-\ell}$. In addition, the maximum entry of an admissible block in $K^{(\ell)}$ is bounded by $d_{\min}^{-p} = 2^{-(L-\ell)p}$. Thus, an admissible block at level $\ell$ can be $(\alpha_\ell, L-\ell+1, \frac{\varepsilon}{3})$-block-encoded via the naive procedure for dense matrices (Lemma 3.2 or Remark B.6), where $\alpha_\ell = 2^{(L-\ell)(1-p)}$. From here, each $K^{(\ell)}$, which is a block-sparse matrix of sparsity 3, can be $(3\alpha_\ell, L+3, \varepsilon)$-block-encoded via the procedure for block-sparse matrices (Lemma 3.5). The adjacent interaction part $K_{\mathrm{ad}}$, which is a tridiagonal matrix, can be $(3, L+3, \varepsilon)$-block-encoded with Lemma B.5. Finally, we use the procedure for linear combinations of block-encoded matrices (Lemma B.3) to obtain a block-encoding of $K = \sum_{\ell=2}^{L} K^{(\ell)} + K_{\mathrm{ad}}$, combining the block-encodings $U^{(\ell)}$ and $U_{\mathrm{ad}}$ with coefficients $3\alpha_\ell$ and $3$, respectively. This step requires a $(\log L)$-qubit state-preparation-pair (Definition 3.3) which prepares the coefficients $[3, 3\alpha_2, \ldots, 3\alpha_L]$ in the linear combination. This entire procedure results in an $(\alpha, L + \log L + 3, \alpha\varepsilon)$-block-encoding of the exact kernel matrix $K$, where $\alpha = 3 + \sum_{\ell=2}^{L} 3\alpha_\ell$. In the main text, we have shown that this procedure results in an optimal block-encoding. That is, in the obtained $(\alpha, \log N + \log\log N + 3, \varepsilon)$-block-encoding, the normalization factor $\alpha$ is exactly $\Theta(\|K\|)$.
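The hierarchical splitting itself is purely classical and can be verified directly (sketch; the admissibility mask below, blocks that are non-adjacent while their parents are adjacent, is one standard way to realize the 3-block-sparse levels):

```python
import numpy as np

L, p = 7, 1.0
N = 2 ** L
i, j = np.indices((N, N))
d = np.abs(i - j)
K = np.where(d > 0, 1.0 / np.maximum(d, 1) ** p, 0.0)

K_ad = np.where(d <= 1, K, 0.0)             # tridiagonal adjacent part
levels = {}
for ell in range(2, L + 1):
    b = 2 ** (L - ell)                      # admissible block size at level ell
    bi, bj = i // b, j // b                 # block coordinates at level ell
    # admissible: blocks non-adjacent, but their parents are adjacent
    mask = (np.abs(bi - bj) >= 2) & (np.abs(bi // 2 - bj // 2) <= 1)
    levels[ell] = np.where(mask, K, 0.0)

recon = K_ad + sum(levels.values())
print(np.allclose(recon, K))
```

Each entry with $|i - j| \ge 2$ is captured at exactly one level (the first level at which its blocks separate), every level is at most 3-block-sparse per block-row, and the maximum entry at level $\ell$ is below $2^{-(L-\ell)p}$.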
Here, we show why exactly two queries to $O_k$ and $O(\operatorname{polylog}(\frac{N}{\varepsilon}))$ extra one- and two-qubit gates are needed, by carefully combining Lemma 3.5, Lemma B.3, and Lemma 3.2.
Resources. We use 4 registers: $A$ ($\log L$ qubits), $B$ ($L+1$ qubits), $C$ ($L+1$ qubits), and $D$ (a single qubit). For simplicity, assume $\log L$ is an integer. We want to construct a $(2L + \log L + 3)$-qubit unitary $U$ such that, for any $0 \le m, n \le 2^L - 1$, we have $(\langle 0|_{\mathrm{anc}} \otimes \langle m|)\, U\, (|0\rangle_{\mathrm{anc}} \otimes |n\rangle) = \frac{k_{mn}}{\alpha}$, where $|0\rangle_{\mathrm{anc}}$ denotes the ancilla registers. Here, although $0 \le m, n \le 2^L - 1$, we allocate $L+1$ qubits to the registers $B, C$ for later use.

State preparation pair
Observe that Lemma B.3 requires a state-preparation-pair $(P_L, P_R)$ (Definition 3.3) that prepares the coefficient vector $\beta = [3, 3\alpha_2, \ldots, 3\alpha_L]$ (the first entry being $\alpha_{\mathrm{ad}}$ of $K_{\mathrm{ad}}$). Note that $\|\beta\|_1 = \alpha$, the overall normalization factor. These operators act on $\log L = \log\log N$ qubits, hence we assume they are provided for now (later we will show how to efficiently construct them). In particular, we choose $P_L = P_R$ with $P_R |0\rangle = \sum_{\ell=1}^{L} \sqrt{\beta_{\ell-1}/\alpha}\, |\ell - 1\rangle$.

Implementing the "right" part. Suppose the registers are in the state $|0\rangle_A |0\rangle_B |n\rangle_C |0\rangle_D$. We apply $P_R$ to get (omitting the register names for ease of notation) $\sum_{\ell=1}^{L} \sqrt{\beta_{\ell-1}/\alpha}\, |\ell-1\rangle |0\rangle |n\rangle |0\rangle$. Next, we take care of Lemma 3.5 for the block-sparse levels $K^{(\ell)}$. Conditioned on register $A$ being $\ell - 1$, where $2 \le \ell \le L$, we focus on a particular $\ell$. Define the operator $D_3 : |0\rangle \to \frac{1}{\sqrt 3}\sum_{c=0}^{2}|c\rangle$, which creates a uniform superposition over the (at most 3) non-zero blocks. Next, we apply the controlled block-row index oracle $O_r$, which, based on $\ell, c, n$, computes the block-row index $m_b$ of the $c$-th non-zero admissible block in $K^{(\ell)}$ that intersects column $n$ (there are at most 3 such blocks since $K^{(\ell)}$ is 3-block-sparse) and writes the result $m_b$ to the first $\ell + 1$ qubits of register $B$. Note that this computation is easy and reversible. We denote this set of non-zero block-row indices $m_b$ as $I(\ell, n)$. If there are fewer than $c$ such blocks, $O_r$ writes $c + 2^\ell$ instead (this choice ensures the oracle is reversible, and the first qubit of $|m_b\rangle$ acts as a flag that will be useful later).
Next, we take care of the steps in Lemma 3.2. Also conditioned on register A being $\ell - 1$, where $2 \le \ell \le L$, we apply Hadamard gates on the last $L - \ell$ qubits of register B, creating a uniform superposition over the intra-block row indices $|m_i\rangle$. Observe that when the first qubit of $|m_b\rangle$ is zero, $|m_b\rangle|m_i\rangle$ together form a valid row index $m < 2^L$. If this qubit is one, then $|m_b\rangle|m_i\rangle$ is not a valid row index and we can simply take $k_{mn} = 0$ when querying the matrix entry oracle $O_k$. In particular, we query the oracle $O_k$ on registers B, C to obtain $|\tilde{k}_{mn}\rangle$ in another appended $b$-qubit register. Then, we perform the controlled rotation on register D:
$|\tilde{k}_{mn}\rangle|0\rangle_D \mapsto |\tilde{k}_{mn}\rangle \Big( \tfrac{k_{mn}}{k^{(\ell)}_{\max}} |0\rangle + \sqrt{1 - \big|\tfrac{k_{mn}}{k^{(\ell)}_{\max}}\big|^2}\, |1\rangle \Big)_D,$
where $k^{(\ell)}_{\max}$ is the maximum entry in the admissible blocks at level $\ell$, which we have shown to be $2^{-(L-\ell)p}$. Finally, we call $O_k$ one more time to uncompute and discard the appended register. On the other hand, conditioned on $\ell = 1$ (register A being $|0\rangle$), we take care of the adjacent part $K_{\mathrm{ad}}$ rather simply. Apply the $(2L+2)$-qubit operator $O_{\mathrm{ad}}$, which performs the map $|0\rangle|n\rangle \mapsto \frac{1}{\sqrt{3}} \sum_{i=-1}^{1} |n+i\rangle|n\rangle$ for $0 \le n < 2^L$ on registers B, C (if $n + i = -1$ then it writes to the first register any value that is not a valid row index, e.g., $2^{L+1} - 1$). Then, we perform the oracle query, controlled rotation, and uncomputation similarly as done above, where $k_{n+i,n} = 0$ if $n+i$ is an invalid row index.
To summarize, the entire procedure from Equation 36 to Equation 40 (and Equation 41), denoted by $U_R$, performs the map described in Equation 42, where $\beta_0 = 3$ is the normalization factor of the adjacent level $K_{\mathrm{ad}}$ and $\beta_{\ell-1} = 3\alpha_\ell$ is the normalization factor of the hierarchical level $K^{(\ell)}$. Importantly, notice that register B has full support on all $2^L$ possible values of the row index $m$ in this superposition.
Implementing the "left" part. We would like to transform the state $|0\rangle_A |0\rangle_B |m\rangle_C |0\rangle_D$ (43) to obtain a result similar to that of the "right" procedure above. The procedure follows the same steps, except:
• $P_R$ needs to be replaced by $P_L$.
• $O_r$ needs to be replaced by $O_c$, the controlled block-column index oracle, which, based on $\ell, c, m$, computes the block-column indices $n_b$ of the non-zero admissible blocks in $K^{(\ell)}$ that intersect row $m$ (there are at most 3 such blocks since $K^{(\ell)}$ is 3-block-sparse) and writes the result $n_b$ to the first $\ell + 1$ qubits of register B. We denote this set of non-zero block-column indices $n_b$ as $J(\ell, m)$.
• There is no need to query the entry oracle $O_k$ (nor to apply the accompanying controlled rotation).
• Registers B and C are swapped at the end.
Using notation similar to that of Equation 42, the above modifications give the analogous map for $U_L$ (Equation 44).

Combining both parts. Combining Equation 42 with Equation 44, and noting that the two superpositions overlap only on matching level and block indices (so that the $\beta$ factors cancel), we find that $U = U_L^\dagger U_R$ satisfies
$(\langle 0|_A \langle 0|_B \langle m|_C \langle 0|_D)\, U\, (|0\rangle_A |0\rangle_B |n\rangle_C |0\rangle_D) = k_{mn}/\alpha,$
which is exactly what we set out to prove (Equation 33).
Circuit complexity. As seen above, this block-encoding uses only two calls to $O_k$, and one call to each of $O_r$, $O_c$, $P_R$, and $P_L$. To achieve an overall error bound of $\varepsilon$, it suffices for the controlled rotation on $|\tilde{k}_{mn}\rangle|0\rangle$ to have error $O(\varepsilon/N)$. This can be done with $O(b)$ extra qubits and $O(\operatorname{polylog}(N/\varepsilon))$ extra one- and two-qubit gates (see the explanation in the proof of Lemma B.5).
Constructing the state preparation pair. $P_L$ and $P_R$ act on only $\log\log N$ qubits, hence were assumed given as unitaries. If this is not the case, we can still construct them (Equation 34) as block-encoded operators with only an extra normalization factor of $\log N$, as follows. Apply Hadamard gates to obtain the uniform superposition $\frac{1}{\sqrt{L}} \sum_{\ell=1}^{L} |\ell-1\rangle$, query $\beta_{\ell-1}$ into an appended register, and perform a controlled rotation of an ancilla qubit with amplitude $\sqrt{\beta_{\ell-1}/\bar{\beta}}$, where $\beta_0 = 3$, $\beta_{\ell-1} = 3\alpha_\ell = 3 \cdot 2^{(L-\ell)(1-p)}$ for $2 \le \ell \le L$, and $\bar{\beta} = \max_\ell |\beta_\ell|$. As explained in Lemma B.5, the controlled rotation can be implemented with error bound $\varepsilon$ using $O(\operatorname{polylog}(L/\varepsilon))$ extra one- and two-qubit gates. Comparing with Equation 34, we see that $P$ is block-encoded with a normalization factor of $\sqrt{L\bar{\beta}/\alpha}$. In other words, the overall normalization factor will become a factor of $L\bar{\beta}/\alpha$ larger than that of the optimal block-encoding. We can bound this extra factor rather easily: since $\alpha = \|\beta\|_1 \ge \bar{\beta}$, the factor is at most $L = \log N$. As a reminder, $\alpha$ is given in Equation 32.
Therefore, this extra factor only increases the runtimes of Lemma 5.1 (applying block-encoded matrices) and Lemma 5.2 (applying the inverse of block-encoded matrices) by a factor of $O(\log N)$.
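The bound $L\bar{\beta}/\alpha \le L$ is easy to check numerically. The following is our own small sketch (the function name `extra_factor` is not from the paper) evaluating the overhead for several values of $L$ and $p$:

```python
import numpy as np

def extra_factor(L, p):
    """Extra normalization factor L * max(beta) / ||beta||_1 incurred when
    constructing P_L = P_R as block-encoded operators."""
    beta = [3.0] + [3.0 * 2.0 ** ((L - l) * (1 - p)) for l in range(2, L + 1)]
    alpha = sum(beta)                 # alpha = ||beta||_1
    return L * max(beta) / alpha

factors = {(L, p): extra_factor(L, p)
           for L in (8, 12, 16) for p in (0.5, 1.0, 2.0)}
```

In every case the factor stays between $1$ and $L = \log N$, matching the $O(\log N)$ overhead stated above.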

C.2 Variants of hierarchical splitting
The form of the hierarchical splitting can be slightly modified for other types of decaying kernels. For example, consider kernels of the form $k(x, x') = \frac{1}{|(x-x')+c|^p}$ on the domain $\Omega = [0, N-1)$, where $-N < c < N$ is an integer. In this case, we can use a "shifted" hierarchical splitting to obtain an optimal block-encoding of the kernel matrix $K = (|i - j + c|^{-p})_{i,j=0}^{N-1}$, in which the hierarchy is shifted in the horizontal direction (along the rows). For example, for the kernel $\frac{1}{|x - x' + 4|^p}$ with a uniform distribution of particles, the shifted splitting is obtained by shifting the splitting in Figure 1C by 4 units to the right. Note that the fine splittings in the first 4 columns are actually only needed when the sum $x - x' + 4$ is taken modulo $N$.

C.3 Adaptive algorithm for non-uniform particle distribution
In particle simulation tasks, if the distribution of particles is not approximately uniform, an adaptive hierarchical splitting can be employed. Let the smallest interparticle spacing be $N^{-c}$, where $N$ is the number of particles. Then, we can scale the mesh such that the interparticle distance is at least 1 and the mesh size is $N^c$. The added mesh sites are treated as zero-mass particles. The kernel matrix has size $N^c \times N^c$, and the hierarchical splitting constructed on this mesh has $c \log N$ levels. The entire block-encoding procedure presented in Section 4 applies, with only an extra factor of $\operatorname{poly}(c)$ in the normalization factor and the required resources of the block-encoding. The optimality of the normalization factor of this adaptive block-encoding is harder to state precisely, since it depends on the distribution of particles, which is not determined a priori.
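The mesh-rescaling step can be sketched as follows. This is our own illustration (the function name `adaptive_mesh` is not from the paper), assuming unit-mass particles in $[0,1)$ with minimum spacing $N^{-c}$: after scaling by $N^c$, the spacing is at least 1, so distinct particles snap to distinct integer mesh sites, and unoccupied sites hold zero-mass particles.

```python
import numpy as np

def adaptive_mesh(positions, N, c):
    """Rescale positions so the interparticle distance is >= 1, snap to an
    integer mesh of N^c sites, and pad unoccupied sites with zero mass."""
    sites = np.floor(np.asarray(positions) * N ** c).astype(int)
    masses = np.zeros(N ** c)
    masses[sites] = 1.0               # unit mass at occupied sites
    return sites, masses

N, c = 16, 2
rng = np.random.default_rng(0)
# Non-uniform positions with spacing at least 1/(2N) >= N^-c (= 1/256 here)
pos = (np.arange(N) + 0.5 * rng.random(N)) / N
sites, masses = adaptive_mesh(pos, N, c)
```

The resulting mesh has $N^c = 256$ sites, of which $N = 16$ carry mass; the hierarchical splitting on this mesh then has $c \log N = 8$ levels.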

C.4 Generalization of quantum hierarchical block-encoding
Here, we provide a block-encoding procedure in which one has direct access to the structure of the entry magnitudes, thus generalizing the hierarchical block-encoding procedure presented in Section 4.1 and Appendix C.1.
First, we present a procedure for preparing a quantum state $|x\rangle \in \mathbb{C}^N$, where $N = 2^L$, with high probability of success. For any $0 \le \ell < L - 1$, let $I_\ell = \{i : 2^{-(\ell+1)} < |x_i| \le 2^{-\ell}\}$, and let $I_{L-1} = \{i : |x_i| \le 2^{-(L-1)}\}$. Also, let $n_\ell = |I_\ell|$. Apart from the oracle $O_x : |i\rangle|0\rangle \mapsto |i\rangle|x_i\rangle$, our procedure assumes the following oracle is provided:
$O_f : |\ell\rangle|j\rangle \mapsto |f(\ell, j)\rangle,$
where the first register is $\log L$ qubits, the second register is $L = \log N$ qubits, and $f(\ell, j)$ is an efficiently computable and reversible function that computes the index of the $j$-th element in the set $I_\ell$ (according to some fixed ordering, e.g., top to bottom if $|x\rangle$ is treated as a column vector) if $0 \le j < n_\ell$; otherwise, the oracle writes $N + (j - n_\ell + 1) + \sum_{l=0}^{\ell-1}(N - n_l)$ as a bit string on the concatenation of the two registers (so that the first register will be non-zero). This ensures the oracle is reversible. An example of such an efficient and reversible $f(\ell, j)$ arises in the regular H-matrix splitting. Our procedure is as follows: prepare a weighted superposition over the levels $|\ell\rangle$, create a uniform superposition over $|j\rangle$, apply $O_f$, then query $x_{f(\ell,j)}$ and rotate an ancilla qubit accordingly. Above, we have ignored the terms where $j$ is invalid ($j \ge n_\ell$) in the sum, because we can simply set $x_{f(\ell,j)} = 0$ for those cases. Furthermore, observe that the superposition has support on all indices. Thus, conditioning on the ancilla being zero, we obtain the desired state $|x\rangle$. The probability of successful post-selection is $\frac{1}{\beta^2}$, where $\beta^2 \le 8$ is the normalization constant of the banded decomposition. We also assumed that the first operation in the procedure, which maps $|0\rangle^{\otimes \log L}$ to a weighted superposition over $|\ell\rangle$, can be done perfectly. In fact, it can only be block-encoded with an extra normalization factor of $\sqrt{L}$: for a vector $|y\rangle \in \mathbb{C}^L$, one can simply apply Hadamard gates to obtain a uniform superposition, query $y_\ell$, and rotate an ancilla qubit. When including this step in the procedure above, the probability of successfully post-selecting $|00\rangle$ and obtaining $|x\rangle$ is $\frac{1}{L\beta^2} \ge \frac{1}{8 \log N}$. Intuitively, our procedure exploits the structure of the amplitudes such that, in the controlled rotation step, the ancilla has a large component in the good subspace $|0\rangle$.
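The success probability can be estimated classically. The following is a numerical sketch under our own simplifying assumptions: each band's branch is taken as a uniform superposition over all $N$ slots, and the ancilla rotation inside band $\ell$ scales the amplitude by $x_i \cdot 2^\ell$ (so the post-selected amplitude of $|i\rangle|0\rangle$ is $x_i 2^{\ell(i)}/\sqrt{LN}$); the name `banded_success_prob` is ours.

```python
import numpy as np

def banded_success_prob(x):
    """Success probability of the band-by-band preparation of |x> (a sketch)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    L = int(np.log2(N))
    # Band l contains entries with 2^-(l+1) < |x_i| <= 2^-l; the last band
    # (l = L-1) also absorbs all smaller entries.
    band = np.clip(np.floor(-np.log2(np.abs(x) + 1e-300)).astype(int), 0, L - 1)
    # Amplitude of |i>|0> is x_i * 2^band(i) / sqrt(L * N)
    return float(np.sum((x * 2.0 ** band) ** 2) / (L * N))

N = 256
rng = np.random.default_rng(1)
x = rng.standard_normal(N)
x /= np.linalg.norm(x)
prob = banded_success_prob(x)
```

Since every entry outside the last band satisfies $|x_i| 2^{\ell(i)} > 1/2$, the probability stays above the stated $\frac{1}{8 \log N}$ bound.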
C.5 Condition number of polynomially and exponentially decaying kernels

Figure 4: The condition numbers of polynomially decaying kernels ($k(r) = 1/r$, $k(r) = 1/r^2$, $k(r) = 1/r^3$) grow polynomially with the matrix dimension, whereas the condition number of the exponentially decaying kernel $k(r) = e^{-r}$ converges. Here, the kernel matrices take the form $K_{ij} = k(|i-j|)$ with $k(0) = 0$ (no self-interactions).

We remark that the condition number can be dramatically improved if the self-interaction terms are sufficiently large. For example, for $k(0) = 2$ we found that the condition number for $k(r) = 1/r$ grows only logarithmically in $N$, while those of the other kernels scale as $O(1)$. This can be explained using calculations similar to Equation 26.
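The polynomial-growth trend in Figure 4 is straightforward to reproduce. Below is a minimal sketch (our own setup: $K_{ij} = k(|i-j|)$ with zero diagonal, as in the figure caption); the exponentially decaying kernel is computed alongside for comparison.

```python
import numpy as np

def kernel_cond(kfun, N):
    """2-norm condition number of the N x N kernel matrix with k(0) = 0."""
    i, j = np.indices((N, N))
    r = np.abs(i - j).astype(float)
    K = np.where(r > 0, kfun(np.maximum(r, 1.0)), 0.0)
    return np.linalg.cond(K)

sizes = [16, 32, 64, 128]
cond_poly = [kernel_cond(lambda r: 1.0 / r, N) for N in sizes]    # k(r) = 1/r
cond_exp = [kernel_cond(lambda r: np.exp(-r), N) for N in sizes]  # k(r) = e^-r
```

For $k(r) = 1/r$ the condition number grows clearly with $N$, consistent with the polynomial scaling claimed above.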
C.6 Kernels for which the naive block-encoding approach (Lemma 3.2) works optimally

We provide examples of kernels studied in the classical H-matrix and fast multipole method literature for which the naive block-encoding approach of Lemma 3.2 (or Remark B.6) can be directly used to produce an optimal block-encoding of their kernel matrices.
The log kernel $k(x, x') = \log|x - x'|$ (Chapter 1, [54]). We use the same setup as in the main text. Let the kernel matrix be $K = (k(x_i, x_j))_{i,j=0}^{N-1}$, where $x_i = i$, $N = 2^L$, and $k(x, x) = 0$. Directly applying Lemma 3.2 yields an $(N \log N, \log N + 1, \varepsilon)$-block-encoding using only $O(\operatorname{polylog}(N/\varepsilon))$ additional resources. To show that this block-encoding is optimal, we bound the operator norm from below by $\|K\| \ge \langle 1|K|1\rangle = \Omega(N \log N)$, where $|1\rangle$ is the normalized all-ones vector.

The multiquadric kernel $k(x, x') = \sqrt{c^2 + |x - x'|^2}$ (Chapter 2.1, [13]). For this kernel, let the kernel matrix be $K = (k(x_i, x_j))_{i,j=0}^{N-1}$ with $x_j = j/N$. Directly applying Lemma 3.2 yields a $(2N, \log N + 1, \varepsilon)$-block-encoding using only $O(\operatorname{polylog}(N/\varepsilon))$ additional resources. The operator norm of $K$ is bounded as $\|K\| \ge \|K|1\rangle\| \ge N \int_0^{1/2} \sqrt{c^2 + x^2}\, dx = \Omega(N)$, implying that the block-encoding is optimal.
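The optimality claim for the log kernel can be checked directly: the naive normalization $\alpha = N \log N$ should be within a constant factor of $\|K\|$. A small sketch (our own, using natural logarithms):

```python
import numpy as np

N = 256
i, j = np.indices((N, N))
# Log kernel with k(x, x) = 0
K = np.where(i != j, np.log(np.maximum(np.abs(i - j), 1)), 0.0)
alpha = N * np.log(N)                 # naive normalization: N * (max entry)
norm = np.linalg.norm(K, 2)           # operator norm ||K||
ones = np.ones(N) / np.sqrt(N)
lower = ones @ K @ ones               # ||K|| >= <1|K|1>
```

Numerically, $\langle 1|K|1\rangle \le \|K\| \le \alpha$ with $\|K\|/\alpha$ bounded away from zero, so the naive block-encoding is optimal up to a constant.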
We also list here, omitting the proofs, some typical kernels (not considered in the classical H-matrix literature) that can be optimally block-encoded by the naive approach of Lemma 3.2. These include, for example, polynomially growing kernels $k(x, x') = |x - x'|^p$ and polyharmonic radial basis functions $k(x, x') = |x - x'|^p \log|x - x'|$. Our numerical experiments indicate that these kernels are numerically low-rank, as shown in Figure 5.
Figure 5: Singular value decay of the log, polyharmonic, and multiquadric kernels. Here $r = |x - x'|$, $x_j = j/N$ for the multiquadric kernel ($k(r) = \sqrt{1/4^2 + r^2}$), and $x_j = j$ for the other kernels. The plots indicate that all kernels are numerically low-rank. Moreover, the top singular vector (corresponding to the largest singular value) is close to uniform (the normalized all-ones vector). This explains why these kernels can be block-encoded by the naive block-encoding.
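The numerical-rank observation of Figure 5 can be reproduced as follows. This is our own sketch with particular choices (polyharmonic exponent $p = 1$, multiquadric constant $c = 1/4$, rank counted at a relative singular-value threshold of $10^{-2}$):

```python
import numpy as np

def numerical_rank(K, tol=1e-2):
    """Number of singular values above tol * (largest singular value)."""
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

N = 256
i, j = np.indices((N, N))
r = np.abs(i - j).astype(float)
safe_r = np.maximum(r, 1.0)          # avoid log(0); diagonal is zeroed anyway
ranks = {
    "log":          numerical_rank(np.where(r > 0, np.log(safe_r), 0.0)),
    "polyharmonic": numerical_rank(np.where(r > 0, safe_r * np.log(safe_r), 0.0)),
    "multiquadric": numerical_rank(np.sqrt(0.25 ** 2 + (r / N) ** 2)),
}
```

All three ranks come out far below the matrix dimension, consistent with the figure.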

D.1 Low-rank approximation using asymptotic smoothness
In this section, we show how the asymptotic smoothness condition (Equation 6) can be used to approximate an admissible block by a low-rank matrix. For instance, the log kernel $k(x, x') = \log(|x - x'|)$ directly satisfies this property. Consider the admissible block $K_{\sigma,\rho} = (k(x_i, x_j))_{i \in \sigma, j \in \rho}$, where $\sigma, \rho \in P^{(\ell)}$. Let $c_\rho$ be the center of $\rho$; we first take the $p$-th order Taylor expansion of $k(x, x')$ in $x'$ about $c_\rho$,
$k(x, x') = \sum_{q=0}^{p-1} \frac{(x' - c_\rho)^q}{q!}\, \partial_{x'}^q k(x, c_\rho) + R(x, x'),$
where the remainder $R(x, x')$ can be bounded using the asymptotic smoothness property together with the admissibility criteria (Definition 2.1) and the fact that $r_\sigma = r_\rho$. Thus, $K_{\sigma,\rho}$ can be approximated with exponentially small error by a rank-$p$ matrix $K_{\sigma,\rho} \approx A D B^{\top}$, where $D = \operatorname{Diag}(1/q!)_{q \in P} \in \mathbb{R}^{p \times p}$, $P = \{0, 1, \ldots, p-1\}$, $A_{iq} = \partial_{x'}^q k(x_i, c_\rho)$, and $B_{jq} = (x_j - c_\rho)^q$.

Next, we give an example of a kernel that does not directly satisfy the asymptotic smoothness condition (Equation 6), but can still be manipulated so that the low-rank approximation applies. Consider the kernel $k(x, x') = |x - x'|^{-2}$. For an admissible block $K_{\sigma,\rho} = (k(x_i, x_j))_{i \in \sigma, j \in \rho}$ at level $\ell$, the kernel can be rewritten as
$k(x_i, x_j) = \frac{1}{(x_i - c_\rho)^2} \cdot \frac{1}{\big(1 + \frac{c_\rho - x_j}{x_i - c_\rho}\big)^2},$
where $c_\rho$ is the center of $\rho$. In our setting (see Section 2), $c_\rho$ is simply the midpoint of the corresponding segment of the domain $\Omega$. Furthermore, noting that $k^{(q)}(1) = (-1)^q (q+1)!$ for $k(t) = t^{-2}$, we can expand
$k(x_i, x_j) = \frac{1}{(x_i - c_\rho)^2} \sum_{q=0}^{\infty} (-1)^q (q+1) \Big( \frac{c_\rho - x_j}{x_i - c_\rho} \Big)^q. \quad (59)$
Since $K_{\sigma,\rho}$ is an admissible block (Definition 2.1), we have $\big| \frac{c_\rho - x_j}{x_i - c_\rho} \big| \le \frac{1}{2}$ for all $i \in \sigma, j \in \rho$. Therefore, the truncation error of the series at order $p$ is controlled by
$\sum_{q=p}^{\infty} (q+1) 2^{-q} = 2^{-p}(2p+4). \quad (60)$
Thus, $k(x_i, x_j)$ can be approximated with error $\varepsilon/N$ by the truncated sum in Equation 59 of order $p = O(\operatorname{polylog}(N/\varepsilon))$. Accordingly, the admissible block $K_{\sigma,\rho}$ can be approximated to error $\varepsilon$ by a rank-$p$ matrix.
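The series expansion and its truncation bound can be verified on a concrete block. Below is our own sketch with one admissible pair of clusters for $k(x, x') = |x - x'|^{-2}$ (the cluster choice and the name `lowrank_block` are ours): the rank-$p$ factorization comes directly from truncating the geometric series at order $p$, and the entrywise error respects the $(1/d_{\min}^2)\, 2^{-p}(2p+4)$ bound.

```python
import numpy as np

sigma = np.arange(16, 24)                 # target cluster
rho = np.arange(0, 8)                     # source cluster, admissible vs sigma
c_rho = rho.mean()                        # centre of rho (= 3.5)
K_exact = 1.0 / (sigma[:, None] - rho[None, :]) ** 2

def lowrank_block(p):
    """Rank-p factorization A @ B.T from truncating the series at order p."""
    d = sigma - c_rho                     # x_i - c_rho, at least 12.5 here
    q = np.arange(p)
    A = (-1.0) ** q * (q + 1) / d[:, None] ** (q + 2)
    B = (c_rho - rho)[:, None] ** q       # (c_rho - x_j)^q
    return A @ B.T

p = 10
err = np.max(np.abs(lowrank_block(p) - K_exact))
# Entrywise truncation bound: (1/d_min^2) * 2^-p * (2p + 4)
bound = (1.0 / 12.5 ** 2) * 2.0 ** (-p) * (2 * p + 4)
```

Here the ratio $|c_\rho - x_j| / |x_i - c_\rho| \le 3.5/12.5 < 1/2$, so the admissibility assumption behind Equation 60 holds.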

D.2 Classical fast H-matrix-vector multiplication
We count the number of operations needed to compute the matrix-vector multiplication $u = Kv$, where $K \in \mathbb{R}^{N \times N}$ and $v \in \mathbb{R}^N$. The adjacent term $K_{\mathrm{ad}}$ is 3-sparse, hence requires only $O(N)$ operations. At level $\ell$, each block-row (row of blocks) requires computing 3 matrix-vector multiplications on dense matrices of size $2^{L-\ell} = N \cdot 2^{-\ell}$, which takes $O(p N 2^{-\ell})$ operations (where $p$ is the rank of the admissible blocks). There are $2^\ell$ block-rows, thus computing each $u_\ell = K^{(\ell)} v$ requires $O(pN)$ operations. Since there are $L - 1 = O(\log N)$ levels, the total operation count to compute all $u_\ell$ is $O(pN \log N)$. Adding up the $u_\ell$ and $u_{\mathrm{ad}}$ costs another $O(N \log N)$. As we have shown above, $p = O(\operatorname{polylog}(N/\varepsilon))$ is required to achieve an error bound of $\varepsilon$ on $K$. Thus, the overall runtime of matrix-vector multiplication on H-matrices is $O(N \operatorname{polylog}(N/\varepsilon))$. We refer to [16] for the operation complexities of other tasks on H-matrices, such as matrix-matrix multiplication and matrix inversion, which are also $O(N \operatorname{polylog}(N/\varepsilon))$ when $p = O(\operatorname{polylog}(N/\varepsilon))$.
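The level-by-level matvec can be sketched as follows. This is our own illustration for the 1D kernel $k(r) = 1/r$ (zero diagonal), using the weak-admissibility rule from earlier in the appendix and truncated SVD as a stand-in for the rank-$p$ compression of each admissible block; the function name `hmatrix_matvec` is ours.

```python
import numpy as np

def hmatrix_matvec(K, v, p):
    """u = K v via tridiagonal adjacent part + rank-p admissible blocks."""
    N = len(v)
    L = int(np.log2(N))
    u = np.diagonal(K) * v                      # adjacent part: O(N)
    u[:-1] += np.diagonal(K, 1) * v[1:]
    u[1:] += np.diagonal(K, -1) * v[:-1]
    for l in range(2, L + 1):
        b = 2 ** (L - l)                        # block size at level l
        for I in range(2 ** l):
            for J in range(2 ** l):
                # admissible: non-adjacent blocks with adjacent/equal parents
                if abs(I - J) >= 2 and abs(I // 2 - J // 2) <= 1:
                    blk = K[I*b:(I+1)*b, J*b:(J+1)*b]
                    U, s, Vt = np.linalg.svd(blk, full_matrices=False)
                    u[I*b:(I+1)*b] += U[:, :p] @ (s[:p] * (Vt[:p] @ v[J*b:(J+1)*b]))
    return u

N = 128
i, j = np.indices((N, N))
K = np.where(i != j, 1.0 / np.maximum(np.abs(i - j), 1), 0.0)
rng = np.random.default_rng(2)
v = rng.standard_normal(N)
u = hmatrix_matvec(K, v, p=8)
rel_err = np.linalg.norm(u - K @ v) / np.linalg.norm(K @ v)
```

With rank-$p$ blocks stored in factored form, each level costs $O(pN)$, giving the $O(pN\log N)$ total discussed above; here the SVD is recomputed only for demonstration.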

D.3 High dimensional hierarchical splittings
In the main text, we have seen that two key properties of the 1D hierarchical splitting that enable optimal quantum block-encodings are the logarithmic number of hierarchical levels and the block sparseness of each level (the third key property is that the off-diagonal entries are decaying or sufficiently smooth). Here, we show that these properties still hold in the 2D case and sketch the generalization to higher dimensions. The 2D and 3D hierarchical splittings are well-studied in the classical literature [17]. We will follow the indexing rules and conventions of the original work [17]. For an overview of hierarchical matrices and splittings, we refer interested readers to [13], which covers the fast multipole algorithm (a special case of hierarchical splittings); [16], the original paper that introduced the 1D hierarchical splitting; and [17], the follow-up paper that generalized hierarchical splittings to 2D and 3D problems.
Consider a uniform 2D grid of size $N \times N$, where $N = 2^L$. Before we proceed to construct the hierarchical splitting of the kernel matrix, we first have to decide how to number the particles on a 2D grid. We follow the "canonical" choice in Section 4 of [17] (see Figure 6). In this numbering rule, the conversion from a grid site $(i, j)$ to the corresponding vectorized index $m$ is given by the base-4 representation $m = (f(i_1, j_1),\, f(i_2, j_2),\, \ldots,\, f(i_L, j_L))_4$, where $i_1 \ldots i_L$ ($j_1 \ldots j_L$) is the binary representation of the row (column) index, which ranges from $0$ to $2^L - 1$, and $f$ is the CNOT function whose target bit is the second: $f(0,0) = 0$, $f(0,1) = 1$, $f(1,0) = 3$, $f(1,1) = 2$. The conversion from the vectorized index $m$ back to the grid site $(i, j)$ can be done by first writing $m$ in its base-4 representation, then inverting the function $f$ digit by digit. In summary, the entries of the kernel matrix are of the form $K_{m(i_1,j_1), m(i_2,j_2)} = k((i_1, j_1), (i_2, j_2))$. Next, we also need to define the admissibility criteria (Definition 2.1) for the now 2D clusters. For two clusters $\sigma, \rho$ at the same level, the admissibility condition takes the form of Definition 2.1 with $\eta = \frac{1}{\sqrt{2}}$. In other words, two clusters (hence the block of the kernel matrix $K$ that contains the interactions between them) are admissible when they are not neighboring along a column, row, or diagonal.
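The numbering rule above can be implemented in a few lines. This sketch (our own; `grid_to_index` and `index_to_grid` are not names from [17]) interleaves the bits of $(i, j)$ into base-4 digits via the pairing $f$ and inverts the map digit by digit, confirming that the numbering is a bijection on the grid:

```python
# The "CNOT" pairing f from Section 4 of [17]: target bit is the second
F = {(0, 0): 0, (0, 1): 1, (1, 0): 3, (1, 1): 2}
F_INV = {v: k for k, v in F.items()}

def grid_to_index(i, j, L):
    """Vectorized index m whose base-4 digits are f(i_k, j_k), MSB first."""
    m = 0
    for k in reversed(range(L)):
        m = 4 * m + F[((i >> k) & 1, (j >> k) & 1)]
    return m

def index_to_grid(m, L):
    """Invert the numbering: recover the grid site (i, j) from m."""
    i = j = 0
    for k in range(L):
        bi, bj = F_INV[(m >> (2 * k)) & 3]
        i |= bi << k
        j |= bj << k
    return i, j

L = 3  # an 8 x 8 grid with 64 sites
```

Since $f$ is a bijection on pairs of bits, the whole map is a bijection between grid sites and indices $0, \ldots, 4^L - 1$.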
Having established the admissibility criteria, we proceed similarly to Section 2 to obtain the decomposition of the kernel matrix $K = \sum_{\ell} K^{(\ell)} + K_{\mathrm{ad}}$, where $K^{(\ell)}$ contains the admissible interactions at level $\ell$ that have not been included in $K^{(\ell-1)}$. Next, we calculate the block-sparsity of $K^{(\ell)}$ by counting, among the $4^\ell$ clusters at level $\ell$, how many clusters contribute to the potential at the particles inside a particular cluster $\sigma^{(\ell)}_{I,J}$. In fact, at most $6^2 - 3^2 = 27$ level-$\ell$ clusters need to be included. This can be most easily observed in Figure 7. Thus, the row-block-sparsity of $K^{(\ell)}$ is 27, and since $K^{(\ell)}$ is symmetric, the column-block-sparsity is also 27. In summary, we still have $\log N$ hierarchical levels, each of which is block-sparse, in the 2D case. Therefore, we can apply the entire block-encoding procedure of Section 4 to block-encode kernel matrices on a 2D grid. Below, we provide an analysis similar to that of Section 4.1 for polynomially decaying kernels ($k(x, x') = \|x - x'\|^{-p}$), showing that the hierarchical block-encoding is optimal in the 2D case as well.

Figure 7: (A) shows the structure of the clusters at level $\ell$ of a 1D hierarchical splitting. Since most clusters have been taken into account in previous levels of the hierarchy, at most $6 - 3 = 3$ of them (green) will contribute to the potential at the target (red star). Similarly, (B) shows the structure of the clusters at level $\ell$ ($\ell = 4$) of a 2D hierarchical splitting. Since most clusters have been included in previous levels of the hierarchy, at most $6^2 - 3^2 = 27$ of them (green) will contribute to the potential at the target.
At level $\ell$ ($2 \le \ell \le L$), an admissible matrix block $K_{\sigma,\rho}$, which contains the interactions between two clusters $\sigma, \rho$ at level $\ell$, has size $4^{L-\ell} \times 4^{L-\ell}$, and the minimum pairwise distance between particles within this block is $2^{L-\ell}$ by the definition of admissibility (Equation 66). It follows that the maximum entry of $K_{\sigma,\rho}$ is $2^{-(L-\ell)p}$. Thus, $K_{\sigma,\rho}$ can be block-encoded by Lemma 3.2 with a normalization factor of $\alpha_\ell = 2^{(L-\ell)(2-p)}$. Then, $K^{(\ell)}$ can be block-encoded by Lemma 3.5 with a normalization factor of $27\alpha_\ell$. The adjacent interaction part, $K_{\mathrm{ad}}$, is a sparse matrix with sparsity 9 and largest entry equal to 1; thus, $K_{\mathrm{ad}}$ can be block-encoded with a normalization factor of 9 using Lemma B.5. Finally, a block-encoding of the entire kernel matrix $K$ is obtained by summing over all levels using Lemma B.3. The overall normalization factor is $\alpha = 9 + 27 \sum_{\ell=2}^{L} 2^{(L-\ell)(2-p)}$. This block-encoding can be shown to be optimal by a method similar to Remark 4.1. We have $\|K\| \ge \|K|1\rangle\|$, where $|1\rangle$ is the normalized all-ones vector, and the corresponding sum can be compared with the integral of $\|x\|^{-p}$ over the $N \times N$ grid. In the limit $N \to \infty$, it can be verified that this integral is $\Theta(N^{2-p})$ when $p < 2$ and $\Theta(\log N)$ when $p = 2$. Therefore, $\alpha = \Theta(\|K\|)$ and the block-encoding procedure using the hierarchical splitting is optimal. Finally, the analysis in this section generalizes to any $d$-dimensional setting, where the block-sparsity of each hierarchical level $K^{(\ell)}$ is $6^d - 3^d$ and the sparsity of the adjacent part $K_{\mathrm{ad}}$ is $3^d$. For a visualization of 2D and 3D hierarchical splittings, see Figure 6 of [55], which has a smaller block-sparsity than the aforementioned value, since diagonally neighboring clusters are considered admissible by the authors of [55], as opposed to our treatment in this section.