Quantum Regularized Least Squares

Linear regression is a widely used technique to fit linear models and finds widespread applications across different areas such as machine learning and statistics. In most real-world scenarios, however, linear regression problems are often ill-posed or the underlying model suffers from overfitting, leading to erroneous or trivial solutions. This is often dealt with by adding extra constraints, known as regularization. In this paper, we use the frameworks of block-encoding and quantum singular value transformation (QSVT) to design the first quantum algorithms for quantum least squares with general $\ell_2$-regularization. These include regularized versions of quantum ordinary least squares, quantum weighted least squares, and quantum generalized least squares. Our quantum algorithms substantially improve upon prior results on quantum ridge regression (polynomial improvement in the condition number and an exponential improvement in accuracy), which is a particular case of our result. To this end, we assume approximate block-encodings of the underlying matrices as input and use robust QSVT algorithms for various linear algebra operations. In particular, we develop a variable-time quantum algorithm for matrix inversion using QSVT, where we use quantum singular value discrimination as a subroutine instead of gapped phase estimation. This ensures that substantially fewer ancilla qubits are required for this procedure than prior results. Owing to the generality of the block-encoding framework, our algorithms are applicable to a variety of input models and can also be seen as improved and generalized versions of prior results on standard (non-regularized) quantum least squares algorithms.


Introduction
The problem of fitting a theoretical model to a large set of experimental data appears across various fields ranging from the natural sciences to machine learning and statistics [Mur12]. Linear regression is one of the most widely used procedures for achieving this. By assuming that, for the underlying model, there exists a linear relationship between a dependent variable and one or more explanatory variables, linear regression constructs the best linear fit to the series of data points. Usually, it does so while minimizing the sum of squared errors, a procedure known as the least squares method.
In other words, suppose that we are given $N$ data points $\{(a_i, b_i)\}_{i=1}^N$, where $a_i \in \mathbb{R}^d$ and $b_i \in \mathbb{R}$. The assumption is that each $b_i$ is linearly dependent on $a_i$ up to some random noise of mean 0. Suppose $A$ is the data matrix of dimension $N \times d$, such that its $i$th row is the vector $a_i$, and $b \in \mathbb{R}^N$ is such that $b = (b_1, \cdots, b_N)^T$. Then the procedure, known as ordinary least squares, obtains a vector $x \in \mathbb{R}^d$ that minimizes the objective function $\|Ax - b\|_2^2$. This problem has a closed-form solution given by $x = (A^TA)^{-1}A^Tb = A^+b$, where $A^+$ denotes the Moore-Penrose inverse of the matrix $A$. Thus computationally, finding the best fit by linear regression reduces to finding the pseudoinverse of a matrix that represents the data, a task that is expensive for classical machines for large data sets.
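As a quick classical sanity check, the two closed-form expressions above can be compared numerically; this is a minimal NumPy sketch with illustrative (randomly generated) data:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 12, 4
A = rng.standard_normal((N, d))   # data matrix, rows are the a_i
b = rng.standard_normal(N)

# Closed-form OLS solution via the Moore-Penrose pseudoinverse: x = A^+ b
x_pinv = np.linalg.pinv(A) @ b

# Equivalent normal-equations form, valid here because A^T A is non-singular
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x_pinv, x_ne)
```

For a random Gaussian matrix with $N > d$, the columns are linearly independent almost surely, so both formulas agree.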
In practice, however, least squares regression runs into problems such as overfitting. For instance, the solution might fit most data points, even those corresponding to random noise. Furthermore, the linear regression problem may also be ill-posed, for instance, when the number of variables exceeds the number of data points, in which case the fit is no longer unique. These issues come up frequently with linear regression models and result in erroneous or trivial solutions. Another frequent occurrence is that the data matrix $A$ has linearly dependent columns. In this scenario, the matrix $A^TA$ is not full rank and therefore is not invertible.
Regularization is a widely used technique to remedy these problems, not just for linear regression but for inverse problems in general [EHN96]. In the context of linear regression, broadly, this involves adding a penalty term to the objective function, which constrains the solution of the regression problem. For instance, in the case of $\ell_2$-regularization, the objective is to obtain $x$ that minimizes
$$\|Ax - b\|_2^2 + \lambda \|Lx\|_2^2,$$
where $L$ is an appropriately chosen penalty matrix (or regularization matrix) of dimension $N \times d$ and $\lambda > 0$ is the regularization parameter, an appropriately chosen constant. This regularization technique is known as general $\ell_2$-regularization or Tikhonov regularization in the literature [Hem75, HH93, Bis95, GHO99, vW15]. It is a generalization of ridge regression, which corresponds to the case when $L$ is the identity matrix [HK00, Mar70, Vin78]. The closed-form solution of the general $\ell_2$-regularized ordinary least squares problem is given by
$$x = (A^TA + \lambda L^TL)^{-1}A^Tb.$$
A straightforward observation is that even when $A^TA$ is singular, a judicious choice of the penalty matrix $L$ can ensure that the effective condition number (ratio of the maximum and the minimum non-zero singular values) of the overall matrix is finite and $A^TA + \lambda L^TL$ is invertible.
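The closed-form Tikhonov solution above can be checked against the variational characterization: since the regularized objective is strictly convex for $\lambda > 0$, no perturbation of the closed-form solution should decrease it. A minimal NumPy sketch (dimensions, $\lambda$, and the penalty matrix are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
N, d, lam = 8, 3, 0.5
A = rng.standard_normal((N, d))
b = rng.standard_normal(N)
L = np.diag([1.0, 2.0, 0.5])   # an illustrative positive-definite penalty matrix

# Closed-form Tikhonov solution: x = (A^T A + lam * L^T L)^{-1} A^T b
x_star = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T @ b)

def objective(x):
    return np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(L @ x) ** 2

# The regularized objective is strictly convex for lam > 0, so no
# perturbation of x_star should decrease it.
for _ in range(100):
    v = 0.01 * rng.standard_normal(d)
    assert objective(x_star) <= objective(x_star + v) + 1e-12
```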
In this paper, we develop quantum algorithms for linear regression with general $\ell_2$-regularization. If the optimal solution is $x = (x_1, \cdots, x_d)^T$, then our quantum algorithm outputs a quantum state that is $\delta$-close to $|x\rangle = \sum_{j=1}^d x_j |j\rangle / \|x\|$, assuming access to the matrices $A$, $L$, and the quantum state $|b\rangle$ via general quantum input models.
In several practical scenarios, depending on the underlying theoretical model, generalizations of the ordinary least squares (OLS) technique are more useful to fit the data. For instance, certain samples may be of more importance (and therefore have more weight) than the others, in which case weighted least squares (WLS) is preferred. Generalized least squares (GLS) is used when the underlying samples obtained are correlated. These techniques also suffer from the issues commonplace with OLS, warranting the need for regularization [vW15]. Consequently, we also design algorithms for quantum WLS with general $\ell_2$-regularization and quantum GLS with general $\ell_2$-regularization.
Organization of the paper: In the remainder of Section 1, we formally describe $\ell_2$-regularized versions of OLS, WLS, and GLS (Section 1.1), discuss prior and related work (Section 1.2), and outline our contributions and results (Section 1.3). In Section 2, we briefly outline the framework of block-encoding and quantum input models that are particular instances of it (Section 2.2). We also briefly introduce quantum singular value transformation (QSVT) (Section 2.3) and variable-time amplitude amplification (VTAA) (Section 2.4). Following this, in Section 3, we develop several algorithmic primitives involving arithmetic of block-encodings (Section 3.1), quantum singular value discrimination (Section 3.2), and quantum linear algebra using QSVT (Section 3.3). These are the technical building blocks for designing our quantum regularized regression algorithms. Using these algorithmic primitives, we design quantum algorithms for quantum least squares with $\ell_2$-regularization in Section 4. Finally, we conclude by discussing some possible future research directions in Section 5.

Linear regression with $\ell_2$-regularization
Suppose we are given data points $\{(a_i, b_i)\}_{i=1}^N$, with $a_i \in \mathbb{R}^d$ and $b_i \in \mathbb{R}$, i.e. they are sampled i.i.d. from some unknown distribution $\mathcal{D}$, assumed to be linear. We want to find a vector $x \in \mathbb{R}^d$ such that the inner product $x^Ta_j$ is a good predictor for the target $b_j$ for some unknown $a_j$. This can be done by minimizing the total squared loss over the given data points, leading to the ordinary least squares (OLS) optimization problem. The task then is to find $x \in \mathbb{R}^d$ that minimizes $\|Ax - b\|_2^2$, where $A$ is the $N \times d$ data matrix such that the $i$th row of $A$ is $a_i$, and the $i$th element of the vector $b$ is $b_i$. Assuming that $A^TA$ is non-singular, the optimal $x$ satisfies
$$x = (A^TA)^{-1}A^Tb,$$
which corresponds to solving a linear system of equations. Suppose that out of the samples present in the data, we have higher confidence in some of them than others. In such a scenario, the $i$th observation can be assigned a weight $w_i \in \mathbb{R}$. This leads to a generalization of the OLS problem to weighted least squares (WLS). In order to obtain the best linear fit, the task is now to minimize the weighted version of the loss,
$$\sum_{i=1}^N w_i \left(x^Ta_i - b_i\right)^2.$$
As before, assuming $A^TWA$ is non-singular, the above loss function has the following closed-form solution:
$$x = (A^TWA)^{-1}A^TWb,$$
where $W$ is a diagonal matrix with $w_i$ being the $i$th diagonal element. There can also arise scenarios where there exists some correlation between any two samples. For generalized least squares (GLS), the presumed correlations between pairs of samples are given in a symmetric, non-singular covariance matrix $\Omega$. The objective is to find the vector $x$ that minimizes
$$(Ax - b)^T\Omega^{-1}(Ax - b).$$
Similarly, the closed-form solution for GLS is given by
$$x = (A^T\Omega^{-1}A)^{-1}A^T\Omega^{-1}b.$$
As mentioned previously, in several practical scenarios, the linear regression problem may be ill-posed or suffer from overfitting. Furthermore, the data may be such that some of the columns of the matrix $A$ are linearly dependent. This shrinks the rank of $A$, and consequently of the matrix $A^TA$, rendering it singular and, therefore, non-invertible.
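The WLS and GLS closed-form solutions, and the fact that WLS is the special case of GLS with $\Omega = W^{-1}$, can be illustrated with a short NumPy sketch (the matrices here are randomly generated placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 10, 3
A = rng.standard_normal((N, d))
b = rng.standard_normal(N)

# WLS: per-sample weights w_i collected in the diagonal matrix W
w = rng.uniform(0.5, 2.0, size=N)
W = np.diag(w)
x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# GLS: a symmetric, non-singular (here positive-definite) covariance matrix Omega
M = rng.standard_normal((N, N))
Omega = M @ M.T + N * np.eye(N)
Oinv = np.linalg.inv(Omega)
x_gls = np.linalg.solve(A.T @ Oinv @ A, A.T @ Oinv @ b)

# WLS is GLS with Omega = W^{-1}: plugging Omega = diag(1/w) into the GLS
# formula recovers the WLS solution.
Oinv_w = np.linalg.inv(np.diag(1.0 / w))
x_check = np.linalg.solve(A.T @ Oinv_w @ A, A.T @ Oinv_w @ b)
assert np.allclose(x_wls, x_check)
```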
Recall that the closed-form solution of OLS exists only if $A^TA$ is non-singular, which is no longer the case. Such scenarios arise even for WLS and GLS problems [vW15].
In such cases, one resorts to regularization. Let $\mathcal{L}$ be the loss function to be minimized for the underlying least squares problem (such as OLS, WLS, or GLS). Then general $\ell_2$-regularization (Tikhonov regularization) involves an additional penalty term, so that the objective now is to find the vector $x \in \mathbb{R}^d$ that minimizes
$$\mathcal{L} + \lambda \|Lx\|_2^2.$$
Here $\lambda$, known as the regularization parameter, is a positive constant that controls the size of the vector $x$, while $L$ is known as the penalty matrix (or regularization matrix) that defines a (semi)norm on the solution through which the size is measured. The Tikhonov regularization problem also has a closed-form solution. For example, in the OLS problem, when $L = L_O$, we have that
$$x = \left(A^TA + \lambda L_O^TL_O\right)^{-1}A^Tb.$$
It is worth noting that when $L = I$, the $\ell_2$-regularized OLS problem is known as ridge regression. For the unregularized OLS problem, the singular values $\sigma_j$ of $A$ are mapped to $1/\sigma_j$. The penalty term due to $\ell_2$-regularization results in a shrinkage of the singular values. This implies that even in the scenario where $A$ has linearly dependent columns (some $\sigma_j = 0$) and $(A^TA)^{-1}$ does not exist, the inverse $(A^TA + \lambda L^TL)^{-1}$ is well defined for $\lambda > 0$ and any positive-definite $L$. Throughout this article, we refer to such an $L$ (which is positive definite) as a good regularizer. The penalty matrix $L$ allows for penalizing each regression parameter differently and leads to joint shrinkage among the elements of $x$. It also determines the rate and direction of shrinkage. In the special case of ridge regression, as $L = I$, the penalty shrinks each element of $x$ equally along the unit vectors $e_j$. Also note that by definition, $I$ is a good regularizer. Closed-form expressions can also be obtained for the WLS and the GLS problems ($L = L_W, L_\Omega$ respectively), and finding the optimal solution $x$ reduces to solving a linear system. The quantum versions of these algorithms output a quantum state that is $\delta$-close to $|x\rangle = \sum_j x_j |j\rangle / \|x\|$.
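The shrinkage described above can be made concrete for ridge regression ($L = I$): writing $A$ in its SVD, the regularized inverse maps each singular value $\sigma_j$ to $\sigma_j/(\sigma_j^2 + \lambda)$, which remains finite even at $\sigma_j = 0$. A NumPy sketch with a deliberately rank-deficient $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, lam = 6, 4, 0.3

# Rank-deficient A: the last column duplicates the first, so A^T A is singular
A = rng.standard_normal((N, d - 1))
A = np.hstack([A, A[:, :1]])
b = rng.standard_normal(N)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Ridge (L = I) replaces 1/sigma_j by the shrunken factor sigma_j/(sigma_j^2 + lam),
# which stays finite even when sigma_j = 0.
x_svd = Vt.T @ ((s / (s**2 + lam)) * (U.T @ b))
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
assert np.allclose(x_svd, x_ridge)
```

The agreement of the two computations shows that $(A^TA + \lambda I)^{-1}A^T$ acts on the singular values exactly as the shrinkage factor, with no blow-up at the zero singular value.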
Throughout this work, while designing our quantum algorithms, we shall assume access (via a block-encoding) to the matrices A, W , Ω, and L and knowledge of the parameter λ. Classically, however, the regularization matrix L and the optimal parameter λ are obtained via several heuristic techniques [HH93,GHO99,vW15].

Prior work
Quantum algorithms for (unregularized) linear regression were first developed by Wiebe et al. [WBL12], wherein the authors made use of the HHL algorithm for solving a linear system of equations [HHL09]. Their algorithm assumes query access to a sparse matrix $A$ (sparse-access model) and a procedure to prepare $|b\rangle = \sum_i b_i |i\rangle$. They first prepare a quantum state proportional to $A^T|b\rangle$, and then use the HHL algorithm to apply the operator $(A^TA)^{-1}$ to it. Overall, the algorithm runs in a time scaling as $\kappa_A^6$ (where $\kappa_A$ is the condition number of $A$) and inverse polynomial in the accuracy $\delta$. Subsequent results have considered the problem of obtaining classical outputs for linear regression. For instance, in Ref. [Wan17], $A^+$ is directly applied to the quantum state $|b\rangle$, followed by amplitude estimation to obtain the entries of $x$. On the other hand, Ref. [SSP16] used the techniques of quantum principal component analysis in [LMR14] to predict a new data point for the regression problem. These algorithms also work in the sparse access model and run in a time that scales as $\mathrm{poly}(\kappa, 1/\delta)$. Kerenidis and Prakash [KP20] provided a quantum algorithm for the WLS problem wherein they used a classical data structure to store the entries of $A$ and $W$. Furthermore, they assumed QRAM access to this data structure [Pra14, KP17], which allows the preparation of quantum states proportional to the entries of $A$ and $W$ efficiently. They showed that in this input model (quantum data structure model), an iterative quantum linear systems algorithm can prepare $|x\rangle$ efficiently. Subsequently, the authors of Ref. [CGJ19] applied the framework of block-encoding along with (controlled) Hamiltonian simulation of Low and Chuang [LC19] to design improved quantum algorithms for solving linear systems. Quantum algorithms developed in the block-encoding framework are applicable to a wide variety of input models, including the sparse access model and the quantum data structure model of [KP20].
They applied their quantum linear systems solver to develop quantum algorithms for quantum weighted least squares and generalized least squares. Their quantum algorithm for WLS has a complexity in $O(\alpha\kappa\,\mathrm{polylog}(Nd/\delta))$, where $\alpha = s$, the sparsity of the matrix $A^T\sqrt{W}$, in the sparse access model, while $\alpha = \|\sqrt{W}A\|_F$ for the quantum data structure input model. For GLS, their quantum algorithm outputs $|x\rangle$ at a cost of $O(\kappa_A\kappa_\Omega(\alpha_A + \alpha_\Omega\kappa_\Omega)\,\mathrm{polylog}(1/\delta))$, where $\kappa_A$ and $\kappa_\Omega$ are the condition numbers of $A$ and $\Omega$ respectively, while $\alpha_A$ and $\alpha_\Omega$ are parameters that depend on how the matrices $A$ and $\Omega$ are accessed in the underlying input model.
While quantum linear regression algorithms have been designed and subsequently improved over the years, quantum algorithms for regularized least squares have not been developed as extensively. Yu et al. [YGW21] developed a quantum algorithm for ridge regression in the sparse access model using the LMR scheme [LMR14] for Hamiltonian simulation together with quantum phase estimation. Their algorithm to output $|x\rangle$ has a cubic dependence on both $\kappa$ and $1/\delta$, and they use it as a subroutine to determine a good value of the regularization parameter $\lambda$. A few other works [SX20, CYGL22] have considered the quantum ridge regression problem in the sparse access model, all of which can be implemented with $\mathrm{poly}(\kappa, 1/\delta)$ cost.
Recently, Chen and de Wolf designed quantum algorithms for lasso ($\ell_1$-regularization) and ridge regression from the perspective of empirical loss minimization [CdW21]. For both lasso and ridge, their quantum algorithms output a classical vector $\tilde{x}$ whose loss (mean squared error) is $\delta$-close to the minimum achievable loss. In this context, they prove a quantum lower bound of $\Omega(d/\delta)$ for ridge regression, which indicates that in their setting, the dependence on $d$ cannot be improved on a quantum computer (the classical lower bound is also linear in $d$, and there exists a matching upper bound). Note that $\tilde{x}$ is not necessarily close to the optimal solution $x$ of the corresponding least squares problem, even though their respective loss values are. Moreover, their result (of outputting a classical vector $\tilde{x}$) is incomparable to our objective of obtaining a quantum state encoding the optimal solution to the regularized regression problem.
Finally, Gilyén et al. obtained a "dequantized" classical algorithm for ridge regression assuming norm-squared access to input data, similar to the quantum data structure input model [GST22]. Furthermore, similar to the quantum setting, where the output is the quantum state $|x\rangle = \sum_j x_j |j\rangle/\|x\|$ instead of $x$ itself, their algorithm obtains samples from the distribution $x_j^2/\|x\|^2$. For the regularization parameter $\lambda = O(\|A\|\|A\|_F)$, the running time of their algorithm is in $O(\kappa^{12} r_A^3/\delta^4)$, where $r_A$ is the rank of $A$. Their result (and several prior results) does not have a polynomial dependence on the dimension of $A$ and therefore rules out the possibility of a generic exponential quantum speedup (except in $\delta$) in the quantum data structure input model.

Our contributions
In this work, we design the first quantum algorithms for OLS, WLS, and GLS with general $\ell_2$-regularization. We use the Quantum Singular Value Transformation (QSVT) framework introduced by Gilyén et al. [GSLW19]. We assume that the relevant matrices are provided as input in the block-encoding model, in which access to an input matrix $A$ is given by a unitary $U_A$ whose top-left block is (close to) $A/\alpha$. The parameter $\alpha$ takes specific values depending on the underlying input model. QSVT then allows us to implement nearly arbitrary polynomial transformations to a block of a unitary matrix using a series of parameterized, projector-controlled rotations (quantum signal processing [LC17b]).
More precisely, given approximate block-encodings of the data matrix $A$ and the regularizing matrix $L$, and a unitary procedure to prepare the state $|b\rangle$, our quantum algorithms output a quantum state that is $\delta$-close to $|x\rangle$, the quantum state proportional to the solution of the $\ell_2$-regularized ordinary least squares (or weighted least squares or generalized least squares) problem. We briefly summarize the query complexities of our results in Table 1.
For the OLS problem with general $\ell_2$-regularization (Section 4.2, Theorem 32), we design a quantum algorithm which, given an $(\alpha_A, a_A, \varepsilon_A)$-block-encoding of $A$ (implemented in cost $T_A$), an $(\alpha_L, a_L, \varepsilon_L)$-block-encoding of $L$ (implemented in cost $T_L$), a parameter $\lambda > 0$, and a procedure to prepare $|b\rangle$ (in cost $T_b$), outputs a quantum state which is $\delta$-close to $|x\rangle$. The cost of the algorithm scales with a parameter $\kappa$, which can be thought of as a modified condition number, related to the effective condition numbers of $A$ and $L$ (the precise expressions are given in Section 4.2). Notice that when $L$ is a good regularizer, $\kappa$ is independent of $\kappa_A$, the condition number of the data matrix $A$, which underscores the advantage of regularization. The parameters $\alpha_A$ and $\alpha_L$ take specific values depending on the underlying input model. For the sparse access input model, $\alpha_A = s_A$ and $\alpha_L = s_L$, the respective sparsities of the matrices $A$ and $L$. On the other hand, for the quantum data structure input model, $\alpha_A = \|A\|_F$ and $\alpha_L = \|L\|_F$. The complexity of quantum ridge regression can be obtained by substituting $L = I$ in the above complexity, with $\kappa = 1 + \|A\|/\sqrt{\lambda}$, by noting that the block-encoding of $I$ is trivial while the norm and condition number of the identity matrix are one. For this problem of quantum ridge regression, our quantum algorithms are substantially better than prior results [SX20, YGW21, CYGL22], exhibiting a polynomial improvement in $\kappa$ and an exponential improvement in $1/\delta$.
For the $\ell_2$-regularized GLS problem (Section 4, Theorem 42), we design a quantum algorithm that, along with approximate block-encodings of $A$ and $L$, takes as input an $(\alpha_\Omega, a_\Omega, \varepsilon_\Omega)$-block-encoding of the matrix $\Omega$ (implementable at a cost of $T_\Omega$) to output a state $\delta$-close to $|x\rangle$. Here too, when $L$ is a good regularizer, the cost depends on a modified condition number $\kappa$, defined analogously to the OLS case. The WLS problem is a particular case of GLS, wherein the matrix $\Omega$ is diagonal. However, we show that better complexities for the $\ell_2$-regularized WLS problem can be obtained if we assume QRAM access to the diagonal entries of $W$ (Section 4, Theorem 39 and Theorem 40). Table 1 summarizes the complexities of our algorithms for quantum linear regression with general $\ell_2$-regularization. For better exposition, here we assume that $\|A\|$, $\|L\|$, $\|\Omega\|$, and $\lambda$ are $\Theta(1)$. For the general expressions of the complexities, we refer the readers to Section 4.

In order to derive our results, we take advantage of the ability to efficiently perform arithmetic operations on block-encoded matrices, as outlined in Section 3. Along with this, we use QSVT to perform linear algebraic operations on block-encoded matrices. To this end, we adapt the results in Refs. [GSLW19, MRTC21] to our setting. One of our contributions is that we work with robust versions of many of these algorithms. In prior works, QSVT is often applied to block-encoded matrices assuming perfect block-encodings. For the quantum algorithms in this paper, we rigorously obtain the precision $\varepsilon$ required to obtain a $\delta$-approximation of the desired output state.
For instance, a key ingredient of our algorithm for regularized least squares is to make use of QSVT to implement $A^+$, given an $\varepsilon$-approximate block-encoding of $A$. In order to obtain a (near) optimal dependence on the condition number of $A$ by applying variable-time amplitude amplification (VTAA) [Amb12], we recast the standard QSVT algorithm as a variable stopping-time quantum algorithm. Using QSVT instead of controlled Hamiltonian simulation ensures that the variable-time quantum procedure to prepare $A^+|b\rangle$ has a slightly better running time (by a log factor) and considerably fewer additional qubits than Refs. [CKS17, CGJ19].
Furthermore, for the variable-time matrix inversion algorithm, a crucial requirement is the application of the inversion procedure to the portion of the input state that is spanned by singular values larger than a certain threshold. In order to achieve this, prior results have made use of Gapped Phase Estimation (GPE), which is a simple variant of the standard phase estimation procedure that decides whether the eigenvalue of a Hermitian operator is above or below a certain threshold [Amb12, CKS17, CGJ19]. However, GPE can only be applied to a Hermitian matrix and requires additional registers that store the estimates of the phases, which are never used for variable-time amplitude amplification. In this work, instead of GPE, we develop a robust version of quantum singular value discrimination (QSVD) using QSVT, which can be directly applied to non-Hermitian matrices. This algorithm decides whether some singular value of a matrix is above or below a certain threshold without storing estimates of the singular values. This leads to a space-efficient variable-time quantum algorithm for matrix inversion by further reducing the number of additional qubits required by a factor of $O(\log^2(\kappa/\delta))$ as compared to prior results [CKS17, CGJ19]. Consequently, this also implies that in our framework, quantum algorithms for (unregularized) least squares (which are special cases of our result) have better complexities than those of Ref. [CGJ19].

Preliminaries
This section lays down the notation and introduces the quantum singular value transformation (QSVT) and block-encoding frameworks, which are used to design our algorithms for quantum regression.

Notation
For a matrix $A \in \mathbb{R}^{N \times d}$, $A_{i,\cdot}$ denotes the $i$th row of $A$, and $\|A_{i,\cdot}\|$ denotes the vector norm of $A_{i,\cdot}^T$. The row and column sparsities $s_r^A$ and $s_c^A$ denote the maximum number of non-zero entries in any row and any column of the matrix, respectively.
We write $\sigma_{\max}(A)$ for the largest singular value of $A$, $\sigma_{\min}(A)$ for its smallest non-zero singular value, and $\kappa_A = \sigma_{\max}(A)/\sigma_{\min}(A)$ for the effective condition number. If $A$ is full-rank, then $\sigma_{\min}(A)$ coincides with the smallest singular value, and $\kappa_A$ becomes the condition number of the matrix. In this text, unless stated otherwise, we always refer to $\kappa_A$ as (an upper bound on) the effective condition number of a matrix, and not the true condition number.
Norm. Unless otherwise specified, $\|A\|$ denotes the spectral norm of $A$, while $\|A\|_F$ denotes the Frobenius norm of $A$, defined as $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$. Unless otherwise specified, when $A$ is assumed to be normalized, it is with respect to the spectral norm.
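A small NumPy example illustrating the two norms and the effective condition number (ratio of the largest to the smallest non-zero singular value):

```python
import numpy as np

# A rank-2 matrix with singular values 3, 1, and 0
A = np.array([[3.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

spec = np.linalg.norm(A, 2)        # spectral norm: largest singular value
frob = np.linalg.norm(A, 'fro')    # Frobenius norm: sqrt of sum of squared entries
assert np.isclose(spec, 3.0)
assert np.isclose(frob, np.sqrt(10.0))

# Effective condition number: ignore (numerically) zero singular values
s = np.linalg.svd(A, compute_uv=False)
s_nz = s[s > 1e-12]
kappa = s_nz.max() / s_nz.min()
assert np.isclose(kappa, 3.0)
```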
Controlled Unitaries. If $U$ is an $s$-qubit unitary, then C-$U$ is an $(s+1)$-qubit unitary defined by C-$U = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U$. Throughout this text, whenever we state that the time taken to implement a unitary $U_A$ is $T_A$ and the cost of an algorithm is $O(nT_A)$, we imply that the algorithm makes $n$ uses of the unitary $U_A$. Thus, if the circuit depth of $U_A$ is $T_A$, the circuit depth of our algorithm is $O(nT_A)$.
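The controlled unitary can be built explicitly as $|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U$; a minimal NumPy sketch (using the convention that the control qubit is the most significant one):

```python
import numpy as np

def controlled(U):
    """Build C-U = |0><0| (x) I + |1><1| (x) U for an s-qubit unitary U."""
    dim = U.shape[0]
    P0 = np.array([[1, 0], [0, 0]])
    P1 = np.array([[0, 0], [0, 1]])
    return np.kron(P0, np.eye(dim)) + np.kron(P1, U)

X = np.array([[0, 1], [1, 0]])      # single-qubit NOT
CX = controlled(X)                  # the familiar CNOT gate

# CNOT flips the target only when the control (first) qubit is |1>
ket = lambda bits: np.eye(4)[int(bits, 2)]
assert np.allclose(CX @ ket('10'), ket('11'))
assert np.allclose(CX @ ket('00'), ket('00'))
```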

Quantum Input Models
The complexities of quantum algorithms often depend on how the input data is accessed. For instance, in quantum algorithms for linear algebra (involving matrix operations), it is often assumed that there exists a black-box that returns the positions of the non-zero entries of the underlying matrix when queried. The algorithmic running time is expressed in terms of the number of queries made to this black-box. Such an input model, known as the Sparse Access Model, helps design efficient quantum algorithms whenever the underlying matrices are sparse. Various other input models exist, and quantum algorithms are typically designed and optimized for specific input models.
Kerenidis and Prakash [KP17, Section 5.1] introduced a different input model, known as the quantum data structure model, which is more conducive for designing quantum machine learning algorithms. In this model, the input data (e.g., entries of matrices) arrive online and are stored in a classical data structure (often referred to as the KP-tree in the literature), which can be queried in superposition by using a QRAM. This facilitates efficiently preparing quantum states corresponding to the rows of the underlying matrix, which can then be used for performing several matrix operations. Subsequently, several quantum-inspired classical algorithms have also been developed following the breakthrough result of Tang [Tan19]. Such classical algorithms have the same underlying assumptions as the quantum algorithms designed in the data structure input model and are only polynomially slower provided the underlying matrix is low rank.
In this work, we will consider the framework of block-encoding, wherein it is assumed that the input matrix $A$ (up to some sub-normalization) is stored in the top-left block of some unitary. The advantage of the block-encoding framework, which was introduced in a series of works [LC19, Definition 1], [CGJ19, Section 1], [GSLW19, Section 1.3], is that it can be applied to a wide variety of input models. For instance, it can be shown that both the sparse access input model as well as the quantum data structure input model are specific instances of block-encoded matrices [CGJ19, Sections 2.2 and 2.4], [GSLW19, Section 5.2]. Here we formally define the framework of block-encoding and also express the sparse access model as well as the quantum data structure model as block-encodings. We refer the reader to [CGJ19, GSLW19] for proofs.

Definition 1 (Block Encoding, restated from [GSLW19], Definition 24). Suppose that $A$ is an $s$-qubit operator, $\alpha, \varepsilon \in \mathbb{R}_+$ and $a \in \mathbb{N}$. Then we say that the $(s+a)$-qubit unitary $U_A$ is an $(\alpha, a, \varepsilon)$-block-encoding of $A$ if
$$\left\|A - \alpha\left(I \otimes \langle 0|^{\otimes a}\right) U_A \left(I \otimes |0\rangle^{\otimes a}\right)\right\| \leq \varepsilon.$$
Let $|\psi\rangle$ be an $s$-qubit quantum state. Then applying $U_A$ to $|\psi\rangle|0\rangle^{\otimes a}$ and projecting the ancilla register onto $|0\rangle^{\otimes a}$ outputs an (unnormalized) quantum state that is $\frac{\varepsilon}{\alpha}$-close to $\frac{A}{\alpha}|\psi\rangle$. In the subsequent sections, we provide an outline of the quantum data structure model and the sparse access model, which are particular instances of the block-encoding framework.

Quantum Data Structure Input Model
Kerenidis and Prakash introduced a quantum-accessible classical data structure which has proven to be quite useful for designing several quantum algorithms for linear algebra [KP17]. The classical data structure stores entries of matrices or vectors and can be queried in superposition using a QRAM (quantum random access memory). We directly state the following theorem from their work.
Theorem 2 (Implementing quantum operators using an efficient data structure, [Pra14, KP17]). Let $A \in \mathbb{R}^{N \times d}$, and let $w$ be the number of non-zero entries of $A$. Then there exists a data structure of size $O(w \log^2(dN))$ that, given the matrix elements $(i, j, a_{ij})$, stores them at a cost of $O(\log(dN))$ operations per element. Once all the non-zero entries of $A$ have been stored in the data structure, there exist quantum algorithms that are $\varepsilon$-approximations to the following maps:
$$U : |i\rangle|0\rangle \mapsto |\psi_i\rangle = |i\rangle \frac{1}{\|A_{i,\cdot}\|} \sum_{j=1}^d A_{ij} |j\rangle,$$
$$V : |0\rangle|j\rangle \mapsto \frac{1}{\|A\|_F} \sum_{i=1}^N \|A_{i,\cdot}\| |i\rangle|j\rangle,$$
where $\|A_{i,\cdot}\|$ is the norm of the $i$th row of $A$ and the second register of $|\psi_i\rangle$ is the quantum state corresponding to the $i$th row of $A$. These operations can be applied at a cost of $O(\mathrm{polylog}(Nd/\varepsilon))$.
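The essential idea of the data structure, a binary tree whose internal nodes store sums of squared entries so that child-to-parent ratios determine the state-preparation rotation angles, can be sketched classically as follows. This is a simplified toy version for a single vector of power-of-two length, not the actual KP-tree implementation (which additionally stores signs at the leaves and supports online updates):

```python
import numpy as np

def build_tree(v):
    """Binary tree over v: leaves (positions n..2n-1) hold v_i^2, and each
    internal node holds the sum of its two children's values."""
    n = len(v)                                # assumed to be a power of two
    tree = [None] * n + [x * x for x in v]
    for i in range(n - 1, 0, -1):
        tree[i] = tree[2 * i] + tree[2 * i + 1]
    return tree

# Descending from the root, the rotation at each node is fixed by the ratio
# left_child / node; the leaf probabilities then reproduce |v_i|^2 / ||v||^2,
# i.e. the amplitudes of the normalized state sum_i (v_i/||v||)|i>.
v = [0.5, 1.5, 2.0, 1.0]
tree = build_tree(v)
n = len(v)
probs = [tree[n + i] / tree[1] for i in range(n)]   # leaf value / root value
amps = np.sqrt(probs)
assert np.allclose(amps, np.abs(v) / np.linalg.norm(v))
```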

It was identified in Ref. [CGJ19] that if a matrix $A$ is stored in this quantum-accessible data structure, there exists an efficiently implementable block-encoding of $A$. We restate their result here.
Lemma 3 (Implementing block encodings from quantum data structures, [CGJ19], Theorem 4). Let the entries of the matrix $A \in \mathbb{R}^{N \times d}$ be stored in a quantum-accessible data structure. Then there exist unitaries $U_R, U_L$ that can be implemented at a cost of $O(\mathrm{polylog}(Nd/\varepsilon))$ such that $U_R^\dagger U_L$ is a $(\|A\|_F, \lceil \log(N+d) \rceil, \varepsilon)$-block-encoding of $A$.
Proof. The unitaries $U_R$ and $U_L$ can be implemented via $U$ and $V$ in the previous theorem. Moreover, since only $\varepsilon$-approximations of $U$ and $V$ can be implemented, we have that $U_R^\dagger U_L$ is a $(\|A\|_F, \lceil \log(N+d) \rceil, \varepsilon)$-block-encoding of $A$ implementable at the same cost as $U$ and $V$.
Ref. [KP20] argued that in certain scenarios, storing the entries of $A^{(p)}$ and $(A^{(1-p)})^\dagger$ for some $p \in [0, 1]$ might be more useful than storing $A$ directly. In such cases, the quantum data structure yields a block-encoding of $A$ with normalization factor $\mu_p(A)$. Throughout the work, whenever our results are expressed in the quantum data structure input model, we shall state our complexity in terms of $\mu_A$. When the entries of $A$ are directly stored in the data structure, $\mu_A = \|A\|_F$. Although we will not state it explicitly each time, our results also hold when fractional powers of $A$ are stored in the data structure; simply substituting $\mu_A = \mu_p(A)$ yields the required complexity.

Sparse Access Input Model
The sparse access input model considers that the input matrix $A \in \mathbb{R}^{N \times d}$ has row sparsity $s_r$ and column sparsity $s_c$. Furthermore, it assumes that the entries of $A$ can be queried via an oracle as
$$O_A : |i\rangle|j\rangle|0\rangle \mapsto |i\rangle|j\rangle|A_{ij}\rangle,$$
and the indices of the non-zero elements of each row and column can be queried via the following oracles:
$$O_r : |i\rangle|j\rangle \mapsto |i\rangle|r_{ij}\rangle, \qquad O_c : |i\rangle|j\rangle \mapsto |c_{ij}\rangle|j\rangle,$$
where $r_{ij}$ is the $j$th non-zero entry of the $i$th row of $A$ and $c_{ij}$ is the $i$th non-zero entry of the $j$th column of $A$. Gilyén et al. [GSLW19] showed that a block-encoding of a sparse $A$ can be efficiently prepared by using these three oracles. We restate their lemma below.
Lemma 4 (Constructing a block-encoding from sparse-access to matrices, [GSLW19], Lemma 48). Let $A \in \mathbb{R}^{N \times d}$ be an ($s_r$, $s_c$) row, column sparse matrix given as a sparse access input. Then for all $\varepsilon \in (0, 1)$, a $(\sqrt{s_r s_c}, \mathrm{polylog}(Nd/\varepsilon), \varepsilon)$-block-encoding of $A$ can be implemented using $O(1)$ queries to the sparse access oracles, along with polylogarithmically many elementary gates.
Throughout the paper, we shall assume input matrices are accessible via approximate block-encodings. This also allows us to write down the complexities of our quantum algorithms in this general framework. Additionally, we state the complexities in both the sparse access input model as well as the quantum-accessible data structure input model as particular cases.

Quantum Singular Value Transformation
In a seminal work, Gilyén et al. presented a framework, known as Quantum Singular Value Transformation (QSVT), to apply an arbitrary polynomial function to the singular values of a matrix [GSLW19]. QSVT is quite general: many quantum algorithms can be recast in this framework, and for several problems better quantum algorithms can be obtained [GSLW19, MRTC21]. In particular, QSVT has been extremely useful in obtaining optimal quantum algorithms for linear algebra. For instance, given a block-encoding of a matrix A, QSVT can be used to obtain A^{−c} for c ∈ [0, ∞) with optimal complexity, using fewer additional qubits than prior art. This section briefly describes this framework, which is a generalization of Quantum Signal Processing (QSP) [LC19, Section 2], [LC17b, Theorem 2], [LYC16]. The reader may refer to [MRTC21] for a more pedagogical overview of these techniques.
Let us begin by discussing the framework of Quantum Signal Processing. QSP is a quantum algorithm that applies a degree-d bounded polynomial transformation with parity d mod 2 to an arbitrary quantum subsystem, using a quantum circuit U_Φ consisting of only single-qubit rotations. This is achieved by interleaving a signal rotation operator W (an x-rotation by some fixed angle θ) and a signal processing operator S_φ (a z-rotation by a variable angle φ ∈ [0, 2π]). The signal rotation operator is defined as

W(x) := [[x, i√(1−x²)], [i√(1−x²), x]],

which is an x-rotation by angle θ = −2 arccos(x), and the signal processing operator is defined as

S_φ := e^{iφZ},

which is a z-rotation by an angle −2φ. Interestingly, sandwiching them together for some Φ := (φ_0, φ_1, . . . , φ_d) ∈ R^{d+1}, as shown in Equation 14, gives us a matrix whose entries are polynomial transformations of x:

U_Φ := e^{iφ_0 Z} ∏_{j=1}^{d} ( W(x) e^{iφ_j Z} ) = [[P(x), iQ(x)√(1−x²)], [iQ*(x)√(1−x²), P*(x)]],    (14)

where P, Q ∈ C[x] satisfy deg(P) ≤ d and deg(Q) ≤ d − 1. Following the application of the quantum circuit U_Φ for an appropriate Φ, one can project onto the top-left block of U_Φ to recover the polynomial ⟨0|U_Φ|0⟩ = P(x). Projecting onto other bases allows one to perform more interesting polynomial transformations, which can be linear combinations of P(x), Q(x), and their complex conjugates. For example, projecting onto the {|+⟩, |−⟩} basis gives us ⟨+|U_Φ|+⟩ = Re(P(x)) + i Re(Q(x))√(1−x²). Quantum Signal Processing can be formally stated as follows.
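The interleaving above is easy to check numerically. The sketch below is a minimal numpy illustration (our own, not an implementation from the references): it builds U_Φ and verifies the textbook special case that all-zero phases yield the Chebyshev polynomial T_d(x) = cos(d arccos x) in the top-left entry.

```python
import numpy as np

def qsp_unitary(phis, x):
    """Build U_Phi = e^{i phi_0 Z} prod_j W(x) e^{i phi_j Z} for a signal x in [-1, 1]."""
    W = np.array([[x, 1j * np.sqrt(1 - x**2)],
                  [1j * np.sqrt(1 - x**2), x]])
    S = lambda phi: np.diag(np.exp(1j * phi * np.array([1, -1])))  # e^{i phi Z}
    U = S(phis[0])
    for phi in phis[1:]:
        U = U @ W @ S(phi)
    return U

# With all phases zero, U_Phi = W(x)^d, so <0|U_Phi|0> = T_d(x).
d, x = 5, 0.3
P = qsp_unitary(np.zeros(d + 1), x)[0, 0]
assert abs(P - np.cos(d * np.arccos(x))) < 1e-12
```

Varying the phases Φ sweeps over all achievable parity-(d mod 2) polynomials P, which is the content of Theorem 5 below.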
Theorem 5 (Quantum Signal Processing, Corollary 8 from [GSLW19]). Let P ∈ C[x] be a polynomial of degree d ≥ 2 such that
• P has parity-(d mod 2), and
• |P(x)| ≤ 1 for all x ∈ [−1, 1].
Then there exists a Φ ∈ R^{d+1} such that P(x) can be recovered from the circuit U_Φ of Equation 14.

Thus, QSP allows us to implement any polynomial P(x) that satisfies the aforementioned requirements. Throughout this article, we refer to any such polynomial P(x) as a QSP polynomial. Quantum Singular Value Transformation is a natural generalization of this procedure: it applies a QSP polynomial transformation to each singular value of an arbitrary block of a unitary matrix. In addition to this generalization, QSVT relies on the observation that several functions can be well approximated by QSP polynomials. Thus, through QSVT, one can transform each singular value of a block-encoded matrix by any function that can be approximated by a QSP polynomial. Since several linear algebra problems boil down to applying specific transformations to the singular values of a matrix, QSVT is particularly useful for developing fast quantum linear algebra algorithms. We introduce QSVT formally via the following theorem.
Theorem 6 (Quantum Singular Value Transformation, [GSLW18], Section 3.2). Suppose A ∈ R^{N×d} is a matrix with singular value decomposition A = Σ_{j=1}^{d_min} σ_j |v_j⟩⟨w_j|, where d_min = min{N, d} and |v_j⟩ (|w_j⟩) is the left (right) singular vector with singular value σ_j. Furthermore, let U_A be a unitary such that A = Π̃ U_A Π, where Π and Π̃ are orthogonal projectors. Then, for any QSP polynomial P(x) of degree n, there exists a vector Φ = (φ_1, φ_2, · · · , φ_n) ∈ R^n and a unitary U_Φ such that

P^{SV}(A) = Π̃ U_Φ Π if P is odd, and P^{SV}(A) = Π U_Φ Π if P is even,

where P^{SV}(A) is the polynomial transformation of the matrix A defined as

P^{SV}(A) := Σ_j P(σ_j) |v_j⟩⟨w_j| for odd P, and P^{SV}(A) := Σ_j P(σ_j) |w_j⟩⟨w_j| for even P.    (20)

Theorem 6 tells us that for any QSP polynomial P of degree n, we can implement P^{SV}(A) using one ancilla qubit, Θ(n) applications of U_A and U_A†, and controlled reflections I − 2Π and I − 2Π̃. Furthermore, if in some well-defined interval a function f(x) is well approximated by a degree-n QSP polynomial P(x), then Theorem 6 also allows us to implement a transformation that approximates f^{SV}(A). The following lemma from Ref. [GSLW18] deals with the robustness of the QSVT procedure, i.e., with how errors propagate in QSVT. In particular, for two matrices A and Ã, it shows how close their polynomial transformations P^{SV}(A) and P^{SV}(Ã) are, as a function of the distance between A and Ã.
Lemma 7 (Robustness of QSVT). Let P ∈ C[x] be a QSP polynomial of degree n, and let A, Ã ∈ C^{N×d} be matrices of spectral norm at most 1/2, such that ‖A − Ã‖ ≤ ε. Then

‖P^{SV}(A) − P^{SV}(Ã)‖ ≤ 2nε.

We will apply this lemma to develop a robust version of QSVT. More precisely, in order to implement QSVT we require access to a unitary U_A that is a block-encoding of some matrix A. In most practical scenarios this block-encoding is not perfect: we only have access to an ε-approximate block-encoding of A. If we want a δ-accurate implementation of P^{SV}(A), how precise should the block-encoding of A be? Such a robustness analysis has been absent from prior work involving QSVT and will allow us to develop robust versions of a number of quantum algorithms in subsequent sections. The following theorem determines the precision ε required in the block-encoding of A in terms of n, the degree of the QSP polynomial we wish to implement, and δ, the desired accuracy of P^{SV}(A).
Theorem 8 (Robust QSVT). Let P ∈ C[x] be a QSP polynomial of degree n and let γ ∈ (0, 1). Suppose U is an (α, a, ε)-block-encoding of a matrix A with ‖A‖ ≤ α/2, implementable in time T_U, such that ε ≤ αγ/2n. Then we can implement a (1, a + 1, γ)-block-encoding of P^{SV}(A/α) at a cost of O(nT_U).

Proof. Let Ã be the encoded block of U; then ‖A − Ã‖ ≤ ε. Applying QSVT on U with the polynomial P, we get a block-encoding of P^{SV}(Ã/α), with O(n) uses of U and U†, and as many multiply-controlled NOT gates. Therefore, the error in the final block-encoding is obtained by invoking Lemma 7 with the matrices A/α and Ã/α:

‖P^{SV}(A/α) − P^{SV}(Ã/α)‖ ≤ 2n ‖A − Ã‖/α ≤ 2nε/α ≤ γ.

In Section 3, we will make use of Theorem 8 to develop robust quantum algorithms for singular value discrimination, variable-time matrix inversion, and positive and negative powers of matrices. Subsequently, in Sec. 4, we shall combine these algorithmic primitives to design robust quantum regularized least squares algorithms.

Variable Time Amplitude Amplification
Ambainis [Amb12] defined the notion of a variable-stopping-time quantum algorithm and formulated the technique of Variable Time Amplitude Amplification (VTAA), a tool that amplifies the success probability of a variable-stopping-time quantum algorithm to a constant by exploiting the fact that some branches of the computation may complete earlier than others. The key idea is to view a quantum algorithm A acting on a state |ψ⟩ as a product of m quantum sub-algorithms, A = A_m · A_{m−1} · · · A_1, each acting on |ψ⟩ conditioned on some ancilla flag being set. Formally, a variable-stopping-time algorithm A = A_m · A_{m−1} · · · A_1 acts on H = H_C ⊗ H_A, where H_C = H_{C_1} ⊗ · · · ⊗ H_{C_m} and each H_{C_i} is a single-qubit clock register, and each A_i acts on H_{C_i} ⊗ H_A controlled on the first i − 1 clock qubits being in the state |0⟩.

In VTAA, H_A has a flag space consisting of a single qubit to indicate success, H_A = H_F ⊗ H_W, where H_F = Span(|g⟩, |b⟩) flags the good and bad parts of the run. Furthermore, for 1 ≤ j ≤ m, define the stopping times t_j, with t_1 < t_2 < · · · < t_m = T_max, such that the algorithm A_j A_{j−1} · · · A_1, of (gate/query) complexity t_j, halts with probability

p_j = ‖Π_{C_j} A_j A_{j−1} · · · A_1 |0⟩_H‖²,

where |0⟩_H ∈ H is the all-zero quantum state and Π_{C_j} is the projector onto |1⟩ in H_{C_j}. From this, one can define the (ℓ2-averaged) stopping time of the algorithm A as

T_2 := √( Σ_{j=1}^m p_j t_j² ).

For a variable-stopping-time algorithm whose average stopping time T_2 is less than the maximum stopping time T_max, VTAA can amplify the success probability (p_succ) much faster than standard amplitude amplification. In this framework, the success probability of A is given by

p_succ = ‖(|g⟩⟨g|_F ⊗ I_W) A |0⟩_H‖².

Then, with success probability ≥ 1 − δ, we can create a variable-stopping-time algorithm A′ that prepares the state a|0⟩_A|ψ_0⟩ + √(1 − a²)|1⟩|ψ_garbage⟩, such that a = Θ(1) is a constant, and A′ has complexity O(Q), where Q depends on T_max, T_2, and p_succ.
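To see why the ℓ2-averaged stopping time governs the cost, consider a toy distribution of stopping times (hypothetical numbers, not taken from any algorithm in this paper) in which most branches halt early:

```python
import numpy as np

# Toy illustration: stopping times t_j = 2^j with halting probabilities
# concentrated on early branches, as happens when most singular values are
# far from the smallest one. Then T_2 << T_max, which is where VTAA wins.
m = 10
t = 2.0 ** np.arange(1, m + 1)       # t_1 < t_2 < ... < t_m = T_max
p = 2.0 ** -np.arange(1, m + 1)
p[-1] += 1 - p.sum()                 # normalize: sum_j p_j = 1

T_max = t[-1]
T_2 = np.sqrt(np.sum(p * t**2))      # l2-averaged stopping time

assert T_2 < T_max                   # VTAA's advantage stems from this gap
```

Standard amplitude amplification pays T_max per round; VTAA's cost is controlled by T_2 instead.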
One cannot simply replace standard amplitude amplification with VTAA to boost the success probability of a quantum algorithm: a crucial task is to recast the underlying algorithm in the VTAA framework. We will apply VTAA to the quantum algorithm for matrix inversion by QSVT, and so that algorithm must first be recast as a variable-stopping-time algorithm.
Originally, Ambainis [Amb12] used VTAA to improve the running time of the HHL algorithm from O(κ² log N) to O(κ log³(κ) log N). Childs et al. [CKS17] designed a quantum linear systems algorithm with a polylogarithmic dependence on the accuracy. Additionally, they recast their algorithm into a framework where VTAA could be applied, obtaining a linear dependence on κ. Later, Chakraborty et al. [CGJ19] modified Ambainis' VTAA algorithm to perform variable-time amplitude estimation.
In this work, to design quantum algorithms for ℓ2-regularized linear regression, we use a quantum algorithm for matrix inversion by QSVT. We recast this algorithm in the framework of VTAA to achieve a nearly linear dependence on κ (the effective condition number of the matrix to be inverted). Using QSVT instead of controlled Hamiltonian simulation improves the complexity of the overall matrix inversion algorithm (QSVT and VTAA) by a log factor. It also substantially reduces the number of additional qubits. Furthermore, we replace the gapped quantum phase estimation procedure with a more efficient quantum singular value discrimination algorithm based on QSVT. This further reduces the number of additional qubits by O(log²(κ/δ)) compared to Refs. [CKS17, CGJ19], where κ is the condition number of the underlying matrix and δ is the desired accuracy. The details of the variable-stopping-time quantum algorithm for matrix inversion by QSVT are laid out in Section 3.3.

Algorithmic Primitives
This section introduces the building blocks of our quantum algorithms for quantum linear regression with general ℓ2-regularization. As mentioned previously, we work in the block-encoding framework. We develop robust quantum algorithms for arithmetic operations, inversion, and positive and negative powers of matrices using quantum singular value transformation, assuming access to approximate block-encodings of the underlying matrices. While some of these results were previously derived assuming perfect block-encodings [GSLW19, CGJ19], we calculate the precision required in the input block-encodings to output a block-encoding or quantum state arbitrarily close to the target.
Given an (α, a, ε)-block-encoding of a matrix A, we can efficiently amplify the sub-normalization factor from α to a constant and obtain an amplified block-encoding of A. For our quantum algorithms in Sec. 4, we show that working with pre-amplified block-encodings often yields better complexities. We state the following lemma, which was proven in Ref. [LC17a].

Lemma 11 (Uniform Block Amplification of Contractions, [LC17a]). Let A ∈ R^{N×d} be such that ‖A‖ ≤ 1. If α ≥ 1 and U is an (α, a, ε)-block-encoding of A that can be implemented at a cost of T_U, then there is a (√2, a + 1, ε + γ)-block-encoding of A that can be implemented at a cost of O(αT_U log(1/γ)).

Corollary 12 (Uniform Block Amplification). Let
We now obtain the complexity of applying a block-encoded matrix to a quantum state, which is a generalization of a lemma proven in Ref. [CGJ19].

Arithmetic with Block-Encoded Matrices
The block-encoding framework embeds a matrix in the top-left block of a larger unitary U. It has been demonstrated that this framework allows us to obtain sums, products, and linear combinations of block-encoded matrices, which is particularly useful for solving linear algebra problems. Here, we state the arithmetic operations on block-encoded matrices that we shall use to design the quantum algorithms of Section 4, tailoring existing results to our requirements.
First, we prove a slightly more general form of the linear-combination-of-unitaries technique in the block-encoding framework, presented in [GSLW19]. To do this, we assume that we are given optimal state preparation pairs, defined as follows.
The proof is similar to the one in Ref. [GSLW19], with some improvements to the bounds; the detailed proof can be found in Appendix A. We now specialize the above lemma to the case of a linear combination of just two unitaries. This is the case used in this work, and we obtain a better error scaling by giving an explicit state-preparation unitary.
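For the two-unitary case, the construction can be sanity-checked classically. The sketch below is our own minimal illustration, using a Hadamard pair as the state-preparation pair, so that the resulting circuit block-encodes (U_A + U_B)/2:

```python
import numpy as np

# Minimal sketch of linear combination of two unitaries: with a Hadamard
# state-preparation pair on one ancilla, H * select(U_A, U_B) * H is a
# block-encoding (sub-normalization 2) of U_A + U_B.
rng = np.random.default_rng(4)
U_A, _ = np.linalg.qr(rng.standard_normal((4, 4)))
U_B, _ = np.linalg.qr(rng.standard_normal((4, 4)))

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
select = np.block([[U_A, np.zeros((4, 4))],       # ancilla |0>: apply U_A
                   [np.zeros((4, 4)), U_B]])      # ancilla |1>: apply U_B
W = np.kron(H, np.eye(4)) @ select @ np.kron(H, np.eye(4))

assert np.allclose(W[:4, :4], (U_A + U_B) / 2)
```

General coefficients only change the single-qubit state-preparation unitary and the resulting sub-normalization.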
Given block-encodings of two matrices A and B, it is easy to obtain a block-encoding of AB.
Lemma 19 (Product of Block-Encodings, [GSLW18], Lemma 53). If U_A is an (α, a, δ)-block-encoding of an s-qubit operator A implemented in time T_A, and U_B is a (β, b, ε)-block-encoding of an s-qubit operator B implemented in time T_B, then there is an (αβ, a + b, αε + βδ)-block-encoding of AB that can be implemented in time O(T_A + T_B).

Directly applying Lemma 19 results in a block-encoding of AB/(αβ). If α and β are large, the sub-normalization factor αβ might incur an undesirable overhead in the cost of any algorithm that uses it. In many cases, the complexity of obtaining products of block-encodings can be improved if we first amplify the block-encodings (using Corollary 12) and then apply Lemma 19. We prove the following lemma:

Proof. Using Corollary 12 for some δ_A ≥ 2ε_A, we get a (√2, a_A + 1, δ_A)-block-encoding of A. Similarly, for some δ_B ≥ 2ε_B, we get a (√2, a_B + 1, δ_B)-block-encoding of B. Now, using Lemma 19, we get a (2, a_A + a_B + 2, √2(δ_A + δ_B))-block-encoding of AB, and choosing δ_A and δ_B appropriately bounds the final block-encoding error by δ.

Observe that we have assumed that A and B are s-qubit operators. For any two matrices of dimension N × d and d × K, with N, d, K ≤ 2^s, we can always pad them with rows and columns of zero entries and convert them to s-qubit operators. Thus, when A and B are not s-qubit operators, one can consider block-encodings of padded versions of these matrices. This does not affect the operations on the sub-matrix blocks encoding A and B, so the above results can be used to perform block-encoded matrix products for arbitrary (compatible) matrices.
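The product lemma can likewise be checked numerically. In the sketch below (our own construction; `dilation` is a generic (1, 1, 0)-block-encoding of a contraction, not the specific encoding used in the paper), multiplying the two lifted block-encodings leaves AB in the top-left block:

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def dilation(A):
    """A (1, 1, 0)-block-encoding of a contraction A: a unitary whose
    top-left block (ancilla in |0>) is A itself."""
    n = A.shape[0]
    I = np.eye(n)
    return np.block([[A, psd_sqrt(I - A @ A.T)],
                     [psd_sqrt(I - A.T @ A), -A.T]])

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n)); A /= 2 * np.linalg.norm(A, 2)
B = rng.standard_normal((n, n)); B /= 2 * np.linalg.norm(B, 2)
UA, UB = dilation(A), dilation(B)

# Lift U_A to act on (anc_a, sys) and U_B on (anc_b, sys); register order (anc_a, anc_b, sys).
I2 = np.eye(2)
LA = np.einsum('isjt,bc->ibsjct', UA.reshape(2, n, 2, n), I2).reshape(4 * n, 4 * n)
LB = np.einsum('bsct,ij->ibsjct', UB.reshape(2, n, 2, n), I2).reshape(4 * n, 4 * n)

W = LA @ LB
# Top-left block (both ancillas in |0>) encodes the product AB.
assert np.allclose(W[:n, :n], A @ B, atol=1e-10)
```

Note the ancilla registers simply accumulate (a + b in total), exactly as in Lemma 19.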
Next, we show how to obtain a block-encoding of the tensor product of matrices from their individual block-encodings. This procedure will be useful in creating the dilated matrices required for regularization. The proof can be found in Appendix A.
We will now use Lemma 21 to augment one matrix into another, given their approximate block-encodings.
Lemma 22 (Block-encoding of an augmented matrix). If U_A is an (α_A, a_A, ε_A)-block-encoding of an s-qubit operator A that can be implemented in time T_A, and U_B is an (α_B, a_B, ε_B)-block-encoding of an s-qubit operator B that can be implemented in time T_B, then we can implement an (α_A + α_B, max(a_A, a_B) + 2, ε_A + ε_B)-block-encoding of an augmented matrix formed from A and B, in time O(T_A + T_B).

Robust Quantum Singular Value Discrimination
The problem of deciding whether the eigenvalues of a Hamiltonian lie above or below a certain threshold, known as eigenvalue discrimination, finds widespread applications. For instance, the problem of determining whether the ground energy of a generic local Hamiltonian is < λ_a or > λ_b is QMA-Complete [KKR06]. Nevertheless, quantum eigenvalue discrimination has been useful in preparing ground states of Hamiltonians. Generally, a variant of quantum phase estimation, which effectively performs a projection onto the eigenbasis of the underlying Hamiltonian, is used to perform eigenvalue discrimination [GTC19]. Recently, it has been shown that QSVT can be used to approximate a projection onto the eigenspace of an operator by implementing a polynomial approximation of the sign function [LT20a]; this was then used to design improved quantum algorithms for ground state preparation.

In our work, we design a more general primitive, known as Quantum Singular Value Discrimination (QSVD). Instead of eigenvalues, the algorithm distinguishes whether a singular value σ is ≤ σ_a or ≥ σ_b. This is particularly useful when the block-encoded matrix is not necessarily Hermitian and hence may not have well-defined eigenvalues. We use this procedure to develop a more space-efficient variable-stopping-time matrix inversion algorithm in Section 3.3. Owing to the widespread use of singular values in a plethora of fields, we believe that our QSVD procedure is of independent interest.

Let us define the sign function sign : R → R as

sign(x) := −1 if x < 0, 0 if x = 0, and 1 if x > 0.

Given a threshold singular value c, Low and Chuang [LC17a] showed that there exists a polynomial approximation to sign(c − x), based on their polynomial approximation of the erf function. We use the result of Ref. [MRTC21], where such a polynomial of even parity was considered.
This is crucial, as for even polynomials, QSVT maps right (left) singular vectors to right (left) singular vectors, which enables us to use the polynomial in [MRTC21] for singular value discrimination.

Lemma 23 (Polynomial approximation to the sign function, [LC17a, Low17, MRTC21]). For any ε, Δ, c ∈ (0, 1), there exists an efficiently computable even polynomial P_{ε,Δ,c}(x) of degree ℓ = O((1/Δ) log(1/ε)) such that

|P_{ε,Δ,c}(x) − sign(c − x)| ≤ ε for all x ∈ [−1, 1] with |x − c| ≥ Δ/2.

Therefore, given a matrix A with singular values in [0, 1], we can use QSVT to implement P_{ε,Δ,c}(A), which correctly distinguishes between singular values of A that are less than c − Δ/2 and those that are greater than c + Δ/2. For our purposes, we are given U_A, an (α, a, ε)-block-encoding of a matrix A, and our goal is to distinguish whether a certain singular value σ satisfies 0 ≤ σ ≤ ϕ or 2ϕ ≤ σ ≤ 1. Since U_A (approximately) implements A/α, the task can be rephrased as distinguishing whether a singular value of A/α lies in [0, ϕ/α] or in [2ϕ/α, 1]. For this, we develop a robust version of quantum singular value discrimination, QSVD(ϕ, δ), for which we determine the precision ε required in the block-encoding so that the overall error is at most δ.
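A classical sketch of such a polynomial (heuristic constants, erf-based smoothing in the spirit of [LC17a], and ignoring the even-parity constraint that the quantum construction requires):

```python
import numpy as np
from math import erf

# Hypothetical numerical sketch: smooth sign(c - x) with an erf of width ~Delta,
# then fit a Chebyshev polynomial. The sharpness k and the degree are heuristic
# choices; the rigorous construction in [LC17a] fixes both analytically.
c, Delta, eps = 0.5, 0.1, 1e-3
k = 2.0 * np.sqrt(np.log(2.0 / eps)) / Delta
target = lambda x: np.array([erf(k * (c - t)) for t in np.atleast_1d(x)])

xs = np.linspace(-1, 1, 3001)
P = np.polynomial.chebyshev.Chebyshev.fit(xs, target(xs), deg=300)

# Check the approximation away from the gap (c - Delta/2, c + Delta/2).
grid = np.linspace(0, 1, 2001)
outside = np.abs(grid - c) >= Delta / 2
err = np.max(np.abs(P(grid[outside]) - np.sign(c - grid[outside])))
assert err < 1e-2
```

Applied through QSVT, such a polynomial sends singular values below c − Δ/2 towards −1 and those above c + Δ/2 towards +1 (here the sign convention follows sign(c − x), so the roles are reversed), which is exactly the discrimination primitive used below.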

This algorithm has a cost of O((α/ϕ) T_A log(1/δ)).
Proof. We invoke Lemma 23 with parameters ε′ := δ/2, c := 3ϕ/(2α), and Δ := ϕ/(2α) to construct an even polynomial P := P_{ε′,Δ,c} of degree n := O((α/ϕ) log(1/ε′)), which is an ε′-approximation of f(x) := sign(3ϕ/(2α) − x) for x ∈ [0, ϕ/α] ∪ [2ϕ/α, 1]. Invoking Theorem 8 with P and U_A, we get U_B, a (1, a + 1, γ)-block-encoding of B := P^{SV}(A/α), implemented at cost O(nT_A), where ε must satisfy ε ≤ αγ/2n. Now consider the unitary W, acting on s + a + 2 qubits, in which W is the required block-encoding of D and SWAP_{[l,r]} sequentially swaps adjacent qubits with indices in the range [l, r], effectively moving qubit l to the right of qubit r (qubits are zero-indexed, with higher indices for ancillas). Let B̃ be the top-left block of U_B (so that ‖B − B̃‖ ≤ γ); then we can extract D̃, the top-left block of W.

In Section 3.3, we develop a variable-stopping-time quantum algorithm for matrix inversion using QSVT. In order to recast the usual matrix inversion algorithm in the VTAA framework, we need to be able to apply it to specific ranges of the singular values of the matrix. This is achieved by applying a controlled QSVD algorithm to determine whether the input singular vector corresponds to a singular value less than (or greater than) a certain threshold. Based on the outcome of the controlled QSVD, the standard inversion algorithm is applied. These two steps correspond to the sub-algorithms A_j of the VTAA framework.
In prior works such as Refs. [Amb12, CKS17, CGJ19], gapped phase estimation (GPE) was used to implement this step. GPE requires an additional register of O(log(κ) log(1/δ)) qubits to store the estimated phases, and the whole VTAA procedure needs log κ such registers. As a result, by substituting GPE with QSVD, we save O(log²(κ) log(1/δ)) qubits.

Variable-Time Quantum Algorithm for Matrix Inversion using QSVT
Matrix inversion by QSVT applies a polynomial approximation of f(x) = 1/x satisfying the constraints laid out in Section 2.3. Here, we make use of the result of [MRTC21] to implement A⁺. We adapt their result to the scenario where we have an approximate block-encoding of A as input. Finally, we convert this into a variable-stopping-time quantum algorithm and apply VTAA to obtain a linear dependence on the condition number of A.

Theorem 26 (Inverting Normalized Matrices using QSVT). Let A be a normalized matrix with non-zero singular values in the range [1/κ_A, 1]. For some ε = o(δ/(κ_A² log(κ_A/δ))) and α ≥ 2, let U_A be an (α, a, ε)-block-encoding of A, implemented in time T_A. Then we can implement a (2κ_A, a + 1, δ)-block-encoding of A⁺ at a cost of

O(κ_A α T_A log(κ_A/δ)).    (24)

Proof. We use the matrix inversion polynomial defined in Lemma 25, P := P^{MI}_{φ,κ}, with κ = κ_A α and an appropriate φ. This polynomial has degree n := O(κ_A α log(κ_A α/φ)). We invoke Theorem 8 to apply QSVT using the polynomial P, the block-encoding U_A, and an error parameter γ such that ε ≤ αγ/2n, to get a unitary U that is a (1, a + 1, γ)-block-encoding of P^{SV}(A/α). As P is a (φ/2κ)-approximation of f(x) := 1/(2κx), we have

‖P^{SV}(A/α) − f(A/α)‖ ≤ φ/2κ,

which implies that U is a (1, a + 1, γ + φ/2κ)-block-encoding of f(A/α). And because f(A/α) = αA⁺/(2κ) = A⁺/(2κ_A), we can re-interpret U as a (2κ_A, a + 1, 2κ_A γ + φ/α)-block-encoding of A⁺. Choosing 2κ_A γ = φ/α = δ/2, the final block-encoding has an error of δ. This gives φ = αδ/2 and γ = δ/(4κ_A), so that n = O(κ_A α log(κ_A/δ)) and the cost is as in Equation 24.

Next, we design a map W(γ, δ) that uses QSVT to invert the singular values of a matrix if they belong to a particular domain. This helps us recast the usual matrix inversion algorithm as a variable-stopping-time algorithm and will be a key subroutine for boosting the success probability of this algorithm using VTAA. This procedure was also used in Refs. [CKS17, CGJ19] for quantum linear systems algorithms.
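Classically, the transformation that Theorem 26 implements coherently is just the application of f(x) = 1/(2κx) to the singular values, as the numpy sketch below illustrates (our own illustration; the quantum algorithm, of course, never performs an explicit SVD):

```python
import numpy as np

# Classical sketch of matrix inversion "by singular value transformation":
# apply f(x) = 1/(2*kappa*x) to each singular value of a normalized A,
# which yields A^+/(2*kappa) -- the sub-normalization of the block-encoding.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
kappa = s.max() / s.min()
A_norm = A / s.max()                 # singular values now lie in [1/kappa, 1]

f = lambda x: 1.0 / (2 * kappa * x)
# f applied in the SVD basis (the odd transformation swaps the roles of the
# left/right singular vectors, giving a matrix of the shape of A^+).
fA = (Vt.T * f(s / s.max())) @ U.T

assert np.allclose(2 * kappa * fA, np.linalg.pinv(A_norm), atol=1e-8)
```

Rescaling by the sub-normalization 2κ, as amplitude amplification effectively does, recovers A⁺ exactly.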
Theorem 27 (Efficient inversion of a block-encoded matrix). Let A be a normalized matrix with non-zero singular values in the range [1/κ, 1], for some κ ≥ 1. Let δ ∈ (0, 1] and 0 < γ ≤ 1, and let U_A be an (α, a, ε)-block-encoding of A.

Proof. Since we only need to invert the singular values in a particular range, we can use the procedure of Theorem 26 with κ_A modified to the restricted range. That gives us the description of a quantum circuit W(γ, δ) that implements a map of the form

W(γ, δ) : |ψ⟩|0⟩_Q → (γ/2) f(A)|ψ⟩|0⟩_Q + |⊥⟩,

where |⊥⟩ is an unnormalized state with no component along |0⟩_Q. This has the same cost as Equation 24. Here ‖f(A)|ψ⟩ − A⁺|ψ⟩‖ ≤ δ whenever |ψ⟩ is a unit vector in the span of the singular vectors of A corresponding to singular values in [γ, 1]; this follows from the sub-multiplicativity of the matrix-vector product. Next, we must make the amplitude of the good part of the state independent of γ. To achieve this, we flag it with an ancillary qubit so that a controlled rotation can modify the amplitude: we add a single-qubit register |0⟩_F and flip this register controlled on register Q being in the state |0⟩ (the good part). Then we use a controlled rotation to replace the amplitude γ/2 with a constant a_max^{−1} that is independent of γ, which is achieved by introducing the relevant phase on the flag space.

This gives us the desired W (γ, δ) as in Equation 23.
Given such a unitary W(γ, δ), Ref. [CGJ19] laid out a procedure for a variable-time quantum algorithm A that takes as input the block-encoding of an N × d matrix A and a state preparation procedure U_b : |0⟩^{⊗n} → |b⟩, and outputs a quantum state that is a bounded distance away from A⁺|b⟩/‖A⁺|b⟩‖. In order to determine the branches of the algorithm on which to apply VTAA at a particular iteration, Refs. [CKS17, CGJ19, Amb12] use the technique of gapped phase estimation which, given a unitary U, a threshold φ, and an eigenstate |λ⟩ of U, decides whether the corresponding eigenvalue is a bounded distance below the threshold or a bounded distance above it. In this work, we replace gapped phase estimation with the QSVD algorithm (Theorem 24), which can be applied directly to any block-encoded (not necessarily Hermitian) matrix A and allows us to save O(log²(κ/δ)) qubits.
The Variable-Time Algorithm: This algorithm is a sequence of m sub-algorithms A = A_m · A_{m−1} · · · A_1, where m = log κ + 1. The overall algorithm acts on the following registers:
• m single-qubit clock registers C_i, i ∈ [m].
• An input register I, initialized to |0 ⊗s .
• Ancillary register space Q for the block encoding of A, initialized to |0 ⊗a .
• A single qubit flag register |0 F used to flag success of the algorithm.
Once we have prepared the above registers, we use the state preparation procedure to prepare the state |b⟩. Now we can define how each A_j acts on the state space. Let ε′ = δ/(a_max m). The action of A_j can be broken down into two parts: 1. If C_{j−1} . . . C_1 is in the state |0⟩^{⊗(j−1)}, apply QSVD(2^{−j}, ε′) (Theorem 24) to the state |b⟩, writing the output to the clock register C_j.
2. If the state of C_j is now |1⟩, apply W(2^{−j}, ε′) to registers I ⊗ F ⊗ Q.
Additionally, we need algorithms A′ = A′_m · · · A′_1, which are similar to A except that in Step 2, A′_j implements W′, which additionally sets the flag register to 1. We are now in a position to define the variable-time quantum linear systems algorithm using QSVT.
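The control flow of the sub-algorithms can be mocked up classically. In the schematic below (a hypothetical mock-up: an exact comparison stands in for QSVD(2^{−j}, ε′) and scalar inversion stands in for W(2^{−j}, ε′)), a singular value σ halts at the first step j with σ ≥ 2^{−j}:

```python
import numpy as np

# Schematic classical mock-up (not a quantum simulation) of the variable-time
# structure: branch j "halts" once the singular value it handles exceeds 2^-j.
def variable_time_invert(sigma, m):
    """Return (stopping step j, inverted value) for a singular value sigma."""
    for j in range(1, m + 1):
        if sigma >= 2.0 ** -j:       # role of QSVD(2^-j, eps'): discriminate
            return j, 1.0 / sigma    # role of W(2^-j, eps'): invert this branch
    return m, 1.0 / sigma

kappa = 100
m = int(np.ceil(np.log2(kappa))) + 1
j, inv = variable_time_invert(0.3, m)
assert j == 2 and np.isclose(inv, 1 / 0.3)
```

Large singular values stop after cheap early steps, while only the values near 1/κ reach the expensive late steps; this is what makes T_2 small compared to T_max.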
Proof. The correctness of the algorithm is similar to that of Refs. [CKS17, CGJ19], except that here we use QSVD instead of gapped phase estimation. According to Lemma 10, we need T_max (the maximum time any of the sub-algorithms A_j takes), T_2² (the squared ℓ2-averaged stopping time of the sub-algorithms), and √p_succ (the square root of the success probability). Each sub-algorithm consists of two steps: implementing QSVD with precision 2^{−j} and error ε′, followed by W(2^{−j}, ε′). From Theorem 24, the first step costs O(α T_A 2^j log(1/ε′)), and the cost of implementing W(2^{−j}, ε′) is as described in Equation 24. Thus the overall cost of A_j, which is the sum of these two costs, turns out to be

O(α T_A 2^j log(2^j/ε′)).    (26)

Note that the time t_j required to implement A_j · · · A_1 is also of the same order as Equation 26. The quantity T_2² depends on the probability that A stops at the j-th step, given by p_j = ‖Π_{C_j} A |0⟩‖², where Π_{C_j} is the projector onto |1⟩_{C_j}, the j-th clock register. From this, T_2² can be calculated. Next, we calculate the success probability.
Given these, we can use Lemma 10 to write the final complexity of matrix inversion with VTAA. The upper bound on the precision ε required for the input block-encoding can be calculated from the bounds on the precision required by W(κ, ε′) (Theorem 27) and QSVD(κ, ε′) (Theorem 24). The overall complexity is better by a log factor, and requires O(log²(κ/δ)) fewer additional qubits, compared to the variable-time algorithms of Refs. [CKS17, CGJ19].

Negative Powers of Matrices using QSVT
We now consider the following problem: given an approximate block-encoding of a matrix A, prepare a block-encoding of A^{−c} for c ∈ (0, 1). This procedure will be used to develop algorithms for ℓ2-regularized versions of GLS. We directly use the results of [GSLW19], which show that, given an (α, a, ε)-block-encoding of A implemented in time T_A, one can construct a (2κ^c, a + 1, δ)-block-encoding of A^{−c}.

Proof. From Lemma 29, using Δ := 1/(κα) and an appropriate ϕ ∈ (0, 1/2], we get an even QSP polynomial P := P_{c,ϕ,Δ} which is ϕ-close to f(x) := 1/(2κ^c α^c x^c) and has degree n = O(ακ log(1/ϕ)). Using Theorem 8, we can construct U_P, a (1, a + 1, γ)-block-encoding of P^{SV}(A/α), given that ε ≤ αγ/2n. From the triangle inequality, it follows that U_P is a (1, a + 1, ϕ + γ)-block-encoding of f(A/α). And because f(A/α) = A^{−c}/(2κ^c), U_P can be re-interpreted as a (2κ^c, a + 1, 2κ^c(ϕ + γ))-block-encoding of A^{−c}. We therefore choose ϕ = γ = δ/(4κ^c), and choose ε accordingly.

Having discussed the necessary algorithmic primitives, we are now in a position to design quantum algorithms for linear regression with general ℓ2-regularization. We first deal with ordinary least squares, followed by weighted and generalized least squares.
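Classically, the map implemented by this theorem is again just a function of the spectrum, as the following numpy sketch for a positive definite A and c = 1/2 illustrates (our own illustration):

```python
import numpy as np

# Classical sketch of the negative-fractional-power transformation: apply
# f(x) = 1/(2*kappa^c * x^c) to the spectrum of a normalized positive
# definite A; rescaling by the sub-normalization 2*kappa^c recovers A^{-c}.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 0.1 * np.eye(5)
A /= np.linalg.norm(A, 2)            # normalize: eigenvalues in (0, 1]

c = 0.5
w, V = np.linalg.eigh(A)
kappa = w.max() / w.min()
f = lambda x: x ** (-c) / (2 * kappa ** c)
fA = (V * f(w)) @ V.T

A_inv_c = 2 * kappa ** c * fA        # this is A^{-1/2}
assert np.allclose(A_inv_c @ A_inv_c, np.linalg.inv(A), atol=1e-6)
```

For c = 1/2 this is exactly the inverse square root used, e.g., to whiten a covariance matrix in GLS.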

Quantum Least Squares with General ℓ2-Regularization
In this section, we derive the main results of our paper, namely quantum algorithms for quantum ordinary least squares (OLS), quantum weighted least squares (WLS), and quantum generalized least squares (GLS) with ℓ2-regularization.

Quantum Ordinary Least Squares
Given N data points {(a_i, b_i)}_{i=1}^N such that a_i ∈ R^d and b_i ∈ R, the objective of linear regression is to find x ∈ R^d that minimizes the loss function

L_{OLS}(x) = Σ_{i=1}^N (a_i^T x − b_i)² = ‖Ax − b‖₂²,

where A is the N × d matrix (known as the data matrix) whose i-th row is the vector a_i transposed, and b = (b_1 · · · b_N)^T. The solution to the OLS problem is given by x = A⁺b, which equals (A^T A)^{−1} A^T b when A^T A is invertible. For the ℓ2-regularized version of the OLS problem, a penalty term is added to the objective function. This has the effect of shrinking the singular values of A, which helps overcome problems such as rank deficiency and overfitting. The loss function to be minimized is of the form

L(x) = ‖Ax − b‖₂² + λ‖Lx‖₂²,

where L is the N × d penalty matrix and λ > 0 is the regularization parameter. The solution is then x = (A^T A + λL^T L)^{−1} A^T b. Therefore, for quantum ordinary least squares with general ℓ2-regularization, we assume that we have access to approximate block-encodings of the data matrix A and of L, and a procedure to prepare the quantum state |b⟩ = Σ_{j=1}^N b_j |j⟩/‖b‖. Our algorithm outputs a quantum state close to

|x⟩ ∝ (A^T A + λL^T L)^{−1} A^T |b⟩.    (31)

A straightforward quantum algorithm for this would be the following. First, construct block-encodings of A^T A and L^T L from the block-encodings of A and L (using Lemma 19), and then implement a block-encoding of A^T A + λL^T L (by Lemma 17). Separately, prepare a quantum state proportional to A^T |b⟩ by using the block-encoding of A and the unitary preparing |b⟩. Finally, using the block-encoding of A^T A + λL^T L, implement a block-encoding of (A^T A + λL^T L)^{−1} (using Theorem 26) and apply it to the state proportional to A^T |b⟩. Although this procedure would output a quantum state close to |x⟩, it is not efficient: the inverse of A^T A + λL^T L would be implemented with a complexity that depends quadratically on the condition numbers of A and L. This would be undesirable, as it would perform worse than the unregularized quantum least squares algorithm, where one is able to implement A⁺ directly. However, it is possible to design a quantum algorithm that performs significantly better than this.
The first observation is that it is possible to recast this problem as applying the pseudoinverse of an augmented matrix. Given the data matrix A ∈ R^{N×d} and the regularizing matrix L ∈ R^{N×d}, define the augmented matrix

A_L := [[A, 0], [√λ L, 0]].

It is easy to see that the top-left block of A_L⁺ is (A^T A + λL^T L)^{−1} A^T, which is the required linear transformation to be applied to b. Consequently, our strategy is to implement a block-encoding of A_L, given block-encodings of A and L. Following this, we use matrix inversion by QSVT to implement A_L⁺ |b⟩|0⟩. The first register is then left in the quantum state given in Equation 31.
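The augmentation trick is easy to verify numerically. The sketch below uses the vertically stacked matrix [[A], [√λ L]] (omitting the zero-padding, which does not affect the encoded transformation):

```python
import numpy as np

# Sketch of the augmentation trick: the first N columns of the pseudoinverse
# of A_L = [[A], [sqrt(lam) * L]] equal (A^T A + lam L^T L)^{-1} A^T.
rng = np.random.default_rng(2)
N, d, lam = 8, 3, 0.7
A = rng.standard_normal((N, d))
L = rng.standard_normal((N, d))

A_L = np.vstack([A, np.sqrt(lam) * L])
ridge = np.linalg.solve(A.T @ A + lam * L.T @ L, A.T)
assert np.allclose(np.linalg.pinv(A_L)[:, :N], ridge, atol=1e-9)

# Applying A_L^+ to (b, 0) -- the classical analogue of A_L^+ |b>|0> --
# yields the regularized least squares solution.
b = rng.standard_normal(N)
x = np.linalg.pinv(A_L) @ np.concatenate([b, np.zeros(N)])
assert np.allclose(x, ridge @ b, atol=1e-9)
```

This is why a single (variable-time) pseudoinversion of A_L suffices, avoiding the quadratic condition-number overhead of inverting A^T A + λL^T L directly.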
From this, it is clear that the complexity of our quantum algorithm depends on the effective condition number of the augmented matrix A_L. In this regard, we assume that the penalty matrix L is a good regularizer, i.e., L is chosen such that it has no zero singular values (it is positive definite). This is a fair assumption: if L has only non-zero singular values, the minimum singular value of A_L is guaranteed to be lower bounded in terms of the minimum singular value of L. This ensures that the effective condition number of A_L depends on κ_L even when the data matrix A has zero singular values and A^T A is not invertible. Consequently, this also guarantees that regularized least squares provides an advantage over its unregularized counterpart.
Next, we obtain bounds on the effective condition number of the augmented matrix A L for a good regularizer L via the following lemma: Lemma 31 (Condition number and Spectral Norm of A L ). Let the data matrix A and the positive definite penalty matrix L have spectral norms A and L , respectively. Furthermore, suppose their effective condition numbers be upper bounded by κ A and κ L . Then the ratio between the maximum and minimum (non-zero) singular value of A L is upper bounded by We can also bound the spectral norm as Proof. To bound the spectral norm and condition number of A L , consider the eigenvalues of the following matrix: This implies that the non-zero eigenvalues of A T L A L are the same as those of A T A + λL T L. Therefore, using triangle inequality, the spectral norm of A L can be upper-bounded as follows: Similarly A L ≥ A and A L ≥ √ λ L , which effectively gives the tight bound for A L .
As $L^T L$ is positive definite, its minimum singular value is $\sigma_{\min}(L) = \|L\|/\kappa_L$. Since $A^T A$ is positive semidefinite, by Weyl's inequality the minimum singular value of $A_L$ is lower bounded by
$$\sigma_{\min}(A_L) \ge \sqrt{\lambda}\, \frac{\|L\|}{\kappa_L}.$$
In the theorems and lemmas for regularized quantum linear regression and its variants that we develop in this section, we assume that $L$ is a good regularizer in order to provide a simple expression for $\kappa$. However, this is without loss of generality: when $L$ is not a good regularizer, the expressions for the respective complexities remain unaltered, except that $\kappa$ would now correspond to the condition number of the augmented matrix.

Now, it might be possible that $|b\rangle$ does not belong to the row space of $(A^T A + \lambda L^T L)^{-1} A^T$, which is equivalent to saying that $|b\rangle|0\rangle$ may not lie in $\mathrm{row}(A_L^+)$. However, it is reasonable to expect that the initial hypothesis of the underlying model being close to linear is correct. That is, we expect $|b\rangle$ to have a good overlap with $\mathrm{row}(A_L^+) = \mathrm{col}(A_L)$. The quantity that quantifies how far the model is from being linear is the so-called normalized residual sum of squares. For $\ell_2$-regularized ordinary least squares, this is given by
$$S_O = 1 - \left\| \Pi_{\mathrm{col}(A_L)} |b\rangle|0\rangle \right\|^2.$$
If the underlying data can indeed be fit by a linear function, $S_O$ will be low. Subsequently, we assume that $\| \Pi_{\mathrm{col}(A_L)} |b\rangle|0\rangle \|^2 = \Omega(1)$, implying that the data can be reasonably fit by a linear model. Now we are in a position to present our quantum algorithm for the quantum least squares problem with general $\ell_2$-regularization. We also present an improved quantum algorithm for the closely related quantum ridge regression, which is a special case of the former.
Theorem 32 (Quantum Ordinary Least Squares with General $\ell_2$-Regularization). Let $A, L \in \mathbb{R}^{N \times d}$ be the data and penalty matrices with effective condition numbers $\kappa_A$ and $\kappa_L$ respectively, and let $\lambda \in \mathbb{R}^+$ be the regression parameter. Let $U_A$ be an $(\alpha_A, a_A, \varepsilon_A)$-block-encoding of $A$ implemented in time $T_A$ and $U_L$ be an $(\alpha_L, a_L, \varepsilon_L)$-block-encoding of $L$ implemented in time $T_L$. Furthermore, suppose $U_b$ is a unitary that prepares $|b\rangle$ in time $T_b$, and define $\kappa$ as the effective condition number of the augmented matrix $A_L$. Then for any $\delta \in (0, 1)$ such that $\varepsilon_A$ and $\varepsilon_L$ satisfy Equation 34, we can prepare a state that is $\delta$-close to $A_L^+|b\rangle|0\rangle / \|A_L^+|b\rangle|0\rangle\|$, using only $O(\log \kappa)$ additional qubits.

(Footnote: Our results also hold if we instead assume that $S_O \le \gamma$ for some $\gamma \in (0, 1)$, i.e., $\|\Pi_{\mathrm{col}(A_L)} |b\rangle|0\rangle\|^2 \ge 1 - \gamma$. In such a scenario, the complexity to prepare $A_L^+|b, 0\rangle / \|A_L^+|b, 0\rangle\|$ is rescaled by $1/\sqrt{1-\gamma}$.)
Proof. We invoke Lemma 22 to obtain a unitary $U$, which is an $(\alpha_A + \sqrt{\lambda}\alpha_L,\ \max(a_A, a_L) + 2,\ \varepsilon_A + \sqrt{\lambda}\varepsilon_L)$-block-encoding of the matrix $A_L$, implemented at a cost of $O(T_A + T_L)$. Note that in Lemma 22, $A$ and $L$ are considered to be $s$-qubit operators. For $N \times d$ matrices with $N, d \le 2^s$, we can pad them with zero entries. Padding $A$ and $L$ with zeros may result in the augmented matrix $A_L$ having some zero rows between $A$ and $\sqrt{\lambda}L$. However, this is not an issue, as we are only interested in the top-left block of $A_L^+$, which remains unaffected.
Note that $U$ can be reinterpreted as a $\left(\frac{\alpha_A + \sqrt{\lambda}\alpha_L}{\|A_L\|},\ \max(a_A, a_L) + 2,\ \frac{\varepsilon_A + \sqrt{\lambda}\varepsilon_L}{\|A_L\|}\right)$-block-encoding of the normalized matrix $A_L / \|A_L\|$. Furthermore, we can prepare the quantum state $|b\rangle|0\rangle$ in time $T_b$. Now, by using Theorem 28 with $U$ and an appropriately chosen $\delta$ as specified above, we obtain a quantum state that is $\delta$-close to $A_L^+|b\rangle|0\rangle / \|A_L^+|b\rangle|0\rangle\|$ in the first register.
In the above complexity, when $L$ is a good regularizer, $\kappa$ is independent of $\kappa_A$; indeed, $\kappa$ can be made arbitrarily smaller than $\kappa_A$ by an appropriate choice of $L$. Thus, the regularized version has significantly better time complexity than the unregularized case. One example of a good regularizer arises in quantum ridge regression, where the identity matrix is used to regularize. The corollary below elucidates this.
Corollary 33 (Quantum Ridge Regression). Let $A$ be a matrix of dimension $N \times d$ with effective condition number $\kappa_A$, and let $\lambda \in \mathbb{R}^+$ be the regression parameter. Let $U_A$ be an $(\alpha, a, \varepsilon)$-block-encoding of $A$ implemented in time $T_A$, and let $U_b$ be a unitary that prepares $|b\rangle$ in time $T_b$. If $\kappa = 1 + \|A\|/\sqrt{\lambda}$, then for any $\delta$ such that
$$\varepsilon = o\!\left(\frac{\delta}{\kappa^3 \log^2(\kappa/\delta)}\right),$$
we can prepare a state $\delta$-close to $A_I^+|b\rangle|0\rangle / \|A_I^+|b\rangle|0\rangle\|$ with probability $\Theta(1)$, using only $O(\log \kappa)$ additional qubits.
Proof. The identity matrix $I$ is a trivial $(1, 0, 0)$-block-encoding of itself, and $\kappa_I = 1$. We invoke Theorem 32 with $L = I$ to obtain the solution.
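The ridge special case can be illustrated classically: the familiar closed form $(A^T A + \lambda I)^{-1} A^T b$ coincides with applying the pseudoinverse of the augmented matrix with $L = I$ (a NumPy sketch with arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, lam = 12, 4, 0.7

A = rng.standard_normal((N, d))
b = rng.standard_normal(N)

# Ridge regression closed form: x = (A^T A + lam I)^{-1} A^T b
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

# Same solution via the augmented matrix with L = I (Corollary 33 viewpoint):
# A_I = [A ; sqrt(lam) I], and x = A_I^+ [b ; 0]
A_I = np.vstack([A, np.sqrt(lam) * np.eye(d)])
x_aug = np.linalg.pinv(A_I) @ np.concatenate([b, np.zeros(d)])

assert np.allclose(x_ridge, x_aug)
```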
Being in the block-encoding framework allows us to express the complexity of our quantum algorithm in specific input models such as the quantum data structure input model and the sparse access model. We express these complexities via the following corollaries.
Corollary 34 (Quantum Ordinary Least Squares with $\ell_2$-Regularization in the Quantum Data Structure Input Model). Let $A, L \in \mathbb{R}^{N \times d}$ have effective condition numbers $\kappa_A, \kappa_L$ respectively. Let $\lambda \in \mathbb{R}^+$ and $b \in \mathbb{R}^N$, and let $\kappa$ be the effective condition number of the augmented matrix $A_L$. Suppose that $A$, $L$, and $b$ are stored in a quantum-accessible data structure. Then for any $\delta > 0$, there exists a quantum algorithm to prepare a quantum state $\delta$-close to $A_L^+|b\rangle|0\rangle / \|A_L^+|b\rangle|0\rangle\|$.

Proof. Since $b$ is stored in the data structure, for some $\varepsilon_b > 0$ we can prepare a state that is $\varepsilon_b$-close to $|b\rangle = \sum_i b_i |i\rangle / \|b\|$ using $T_b = O(\mathrm{polylog}(N/\varepsilon_b))$ queries to the data structure (see Section 2.2.1). Similarly, for some parameters $\varepsilon_A, \varepsilon_L > 0$, we can construct a $(\mu_A, \lceil \log(d+N) \rceil, \varepsilon_A)$-block-encoding of $A$ using $T_A = O(\mathrm{polylog}(Nd/\varepsilon_A))$ queries to the data structure, and a $(\mu_L, \lceil \log(d+N) \rceil, \varepsilon_L)$-block-encoding of $L$ using $T_L = O(\mathrm{polylog}(Nd/\varepsilon_L))$ queries. We invoke Theorem 32 with precision $\delta/2$, choosing $\varepsilon_A$ and $\varepsilon_L$ such that Equation 34 is satisfied. This gives us a state that is $\delta/2$-close to the target. To make the final precision $\delta$, we use Lemma 15 with $\varepsilon_b = \frac{\delta}{2\kappa}$. The complexity is obtained by plugging the relevant values into Equation 35.

In the previous corollary, $\mu_A = \|A\|_F$ and $\mu_L = \|L\|_F$ when the matrices $A$ and $L$ are stored in the data structure. Similarly, $\mu_A = \mu_p(A)$ and $\mu_L = \mu_p(L)$ when the matrices $A^{(p)}, A^{(1-p)}$ and $L^{(p)}, L^{(1-p)}$ are stored in the data structure. Now, we discuss the complexity of quantum ordinary least squares with $\ell_2$-regularization in the sparse access input model. We call a matrix $M$ $(s_r, s_c)$ row-column sparse if it has row sparsity $s_r$ and column sparsity $s_c$.
Corollary 35 (Quantum Ordinary Least Squares with $\ell_2$-Regularization in the Sparse Access Model). Let $A \in \mathbb{R}^{N \times d}$ be $(s_r^A, s_c^A)$ row-column sparse and, similarly, let $L \in \mathbb{R}^{N \times d}$ be $(s_r^L, s_c^L)$ row-column sparse, with effective condition numbers $\kappa_A$ and $\kappa_L$ respectively. Let $\lambda \in \mathbb{R}^+$ and $\delta > 0$. Suppose there exists a unitary that prepares $|b\rangle$ at cost $T_b$. Then there is a quantum algorithm to prepare a quantum state that is $\delta$-close to $A_L^+|b\rangle|0\rangle / \|A_L^+|b\rangle|0\rangle\|$.

Proof. The proof is similar to that of Corollary 34, but with $\alpha_A = \sqrt{s_r^A s_c^A}$ and $\alpha_L = \sqrt{s_r^L s_c^L}$.

Quantum Weighted And Generalized Least Squares
This technique of working with an augmented matrix also applies to the other variants of ordinary least squares. In this section, we begin by briefly describing these variants before moving on to designing quantum algorithms for the corresponding problems.
Weighted Least Squares: For the WLS problem, each observation $\{a_i, b_i\}$ is assigned some weight $w_i \in \mathbb{R}^+$, and the objective function to be minimized is of the form
$$\sum_{i=1}^{N} w_i \left( b_i - a_i^T x \right)^2.$$
If $W \in \mathbb{R}^{N \times N}$ is the diagonal matrix with $w_i$ as its $i$-th diagonal entry, then the optimal $x$ satisfies $x = (A^T W A)^{-1} A^T W b$. The $\ell_2$-regularized version of WLS satisfies
$$x = (A^T W A + \lambda L^T L)^{-1} A^T W b.$$
Our quantum algorithm outputs a state close to $|x\rangle$, given approximate block-encodings of $A$, $W$, and $L$. Much like Equation 32, finding the optimal solution reduces to finding the pseudoinverse of an augmented matrix $A_L$ given by
$$A_L = \begin{pmatrix} \sqrt{W} A \\ \sqrt{\lambda}\, L \end{pmatrix}.$$

The top-left block of $A_L^+$ is
$$(A^T W A + \lambda L^T L)^{-1} A^T \sqrt{W},$$
which is the required linear transformation to be applied to the vector $y = \sqrt{W} b$. The ratio $\kappa$ between the maximum and minimum singular values of $A_L$ can be obtained analogously to Lemma 31. For the $\ell_2$-regularized WLS problem, the normalized residual sum of squares is given by
$$S_W = 1 - \left\| \Pi_{\mathrm{col}(A_L)} |y\rangle|0\rangle \right\|^2.$$
Subsequently, we assume that $S_W \le \gamma < 1/2$. This in turn implies that $\|\Pi_{\mathrm{col}(A_L)} |y\rangle|0\rangle\|^2 = \Omega(1)$, i.e., the data can be reasonably fit by a linear model.
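The WLS reduction above can also be checked classically (a NumPy sketch with arbitrary weights and dimensions): applying the pseudoinverse of $[\sqrt{W}A;\ \sqrt{\lambda}L]$ to $\sqrt{W}b$ (padded with zeros) recovers the regularized WLS solution.

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, lam = 10, 3, 0.4

A = rng.standard_normal((N, d))
L = rng.standard_normal((N, d))
b = rng.standard_normal(N)
w = rng.uniform(0.5, 2.0, size=N)        # positive weights
W = np.diag(w)

# Closed form for l2-regularized WLS: x = (A^T W A + lam L^T L)^{-1} A^T W b
x_wls = np.linalg.solve(A.T @ W @ A + lam * L.T @ L, A.T @ W @ b)

# Reduction used in the text: apply A_L^+ = pinv([sqrt(W) A ; sqrt(lam) L])
# to y = sqrt(W) b, padded with zeros
sqrtW = np.diag(np.sqrt(w))
A_L = np.vstack([sqrtW @ A, np.sqrt(lam) * L])
x_aug = np.linalg.pinv(A_L) @ np.concatenate([sqrtW @ b, np.zeros(N)])

assert np.allclose(x_wls, x_aug)
```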
Generalized Least Squares: Similarly, we can extend this to the GLS problem, where the input data may be correlated. These correlations are given by a non-singular covariance matrix $\Omega \in \mathbb{R}^{N \times N}$. The WLS problem is the special case of the GLS problem in which $\Omega$ is a diagonal matrix. The objective function to be minimized is
$$(b - Ax)^T\, \Omega^{-1}\, (b - Ax).$$
The optimal $x \in \mathbb{R}^d$ satisfies $x = (A^T \Omega^{-1} A)^{-1} A^T \Omega^{-1} b$. Similarly, the $\ell_2$-regularized GLS solver outputs $x$ such that
$$x = (A^T \Omega^{-1} A + \lambda L^T L)^{-1} A^T \Omega^{-1} b.$$
So, given approximate block-encodings of $A$, $\Omega$, and $L$, a quantum GLS solver outputs a quantum state close to $|x\rangle$. The augmented matrix $A_L$ is defined as
$$A_L = \begin{pmatrix} \Omega^{-1/2} A \\ \sqrt{\lambda}\, L \end{pmatrix}.$$
Then, applying the top-left block of $A_L^+$ to the vector $y = \Omega^{-1/2} b$ yields the optimal $x$. Thus, the quantum GLS algorithm with $\ell_2$-regularization first prepares $\Omega^{-1/2}|b\rangle|0\rangle$ and then uses the matrix inversion algorithm by QSVT to implement $A_L^+\, \Omega^{-1/2}|b\rangle|0\rangle$. Analogous to OLS and WLS, we assume that the normalized residual sum of squares satisfies $S_\Omega \le \gamma < 1/2$.
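The GLS reduction admits the same classical sanity check (NumPy sketch; $\Omega^{-1/2}$ is computed via an eigendecomposition of a randomly generated positive definite $\Omega$):

```python
import numpy as np

rng = np.random.default_rng(4)
N, d, lam = 8, 3, 0.2

A = rng.standard_normal((N, d))
L = rng.standard_normal((N, d))
b = rng.standard_normal(N)

# Random symmetric positive definite covariance matrix Omega
M = rng.standard_normal((N, N))
Omega = M @ M.T + N * np.eye(N)

# Omega^{-1/2} and Omega^{-1} via the eigendecomposition of Omega
evals, evecs = np.linalg.eigh(Omega)
Om_inv_half = evecs @ np.diag(evals**-0.5) @ evecs.T
Om_inv = evecs @ np.diag(1.0 / evals) @ evecs.T

# Closed form: x = (A^T Omega^{-1} A + lam L^T L)^{-1} A^T Omega^{-1} b
x_gls = np.linalg.solve(A.T @ Om_inv @ A + lam * L.T @ L, A.T @ Om_inv @ b)

# Reduction: A_L = [Omega^{-1/2} A ; sqrt(lam) L], y = Omega^{-1/2} b
A_L = np.vstack([Om_inv_half @ A, np.sqrt(lam) * L])
x_aug = np.linalg.pinv(A_L) @ np.concatenate([Om_inv_half @ b, np.zeros(N)])

assert np.allclose(x_gls, x_aug)
```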

Quantum Weighted Least Squares
In this section, we derive the complexity of the $\ell_2$-regularized WLS problem. We assume a diagonal weight matrix $W \in \mathbb{R}^{N \times N}$ whose smallest and largest diagonal entries are $w_{\min}$ and $w_{\max}$, respectively. This implies $\|W\| = w_{\max}$ and $\kappa_W = w_{\max}/w_{\min}$. We take advantage of the fact that $W$ is diagonal and apply controlled rotations to directly implement a block-encoding of $\sqrt{W}A$. Additionally, given a state preparation procedure for $|b\rangle$, we can easily prepare a state proportional to $\sqrt{W}|b\rangle$. We then use Theorem 32 to solve the quantum WLS problem.
We first formalize this idea in Theorem 36, assuming direct access to (i) a block-encoding of $B = \sqrt{W}A$, and (ii) a procedure for preparing the state $|b_w\rangle = \sqrt{W}|b\rangle / \|\sqrt{W}|b\rangle\|$. Subsequently, for the specific input models, we show that we can indeed efficiently obtain a block-encoding of $B$ and prepare the state $|b_w\rangle$.
Theorem 36 (Quantum Weighted Least Squares with General $\ell_2$-Regularization). Let $A, L \in \mathbb{R}^{N \times d}$ be the data and penalty matrices, with effective condition numbers $\kappa_A$ and $\kappa_L$, respectively. Let $\lambda \in \mathbb{R}^+$ be the regularizing parameter, and let $W \in \mathbb{R}^{N \times N}$ be a diagonal weight matrix with largest and smallest diagonal entries $w_{\max}$ and $w_{\min}$, respectively. Suppose $U_B$ is a block-encoding of $B = \sqrt{W}A$, and let $U_{b_w}$ be a unitary that prepares $|b_w\rangle$. Then for any $\delta > 0$, we can prepare a quantum state that is $\delta$-close to $A_L^+|b_w\rangle|0\rangle / \|A_L^+|b_w\rangle|0\rangle\|$, using only $O(\log \kappa)$ additional qubits.
Proof. We invoke Theorem 32 with $B$ and $L$ as the data and regularization matrices, respectively. This requires $\varepsilon_B, \varepsilon_L$ to satisfy the precision requirement of Theorem 32, which yields the required upper bounds on the precisions $\varepsilon_B, \varepsilon_L$. This gives us a quantum state $\delta$-close to the target.

Lemma 37 (Block-Encoding of $\sqrt{W}A$ in the Quantum Data Structure Model). Suppose the matrices $W$ and $A$ are stored in a quantum-accessible data structure. Then, for any $\delta > 0$, a block-encoding of $\sqrt{W}A$ can be implemented using $O(\mathrm{polylog}(Nd/\delta))$ queries to the data structure.
Proof. For all $j \in [N]$ and all $k \in [d]$, define the row and column states $|\psi_j\rangle$ and $|\phi_k\rangle$ of $\sqrt{W}A$, analogously to Lemma 3. Given quantum data structure access to $W$ and $A$, one can construct quantum circuits $W_R$ and $W_L$, similar to $U_L$ and $U_R$ from Lemma 3, that prepare $|\phi_k\rangle$ and $|\psi_j\rangle$ above. The state $|\phi_k\rangle$ can be prepared just as in Lemma 3, while $|\psi_j\rangle$ can be prepared by adding an ancilla qubit and applying controlled rotations by $\sqrt{w_j/w_{\max}}$ (computed via the QRAM access to $W$) to the state obtained from the QRAM access to $A$. Thus, $W_R^\dagger W_L$ is the required block-encoding which, according to Theorem 2, can be implemented using $O(\mathrm{polylog}(Nd/\delta))$ queries.

Lemma 38 (Efficiently Preparing $\sqrt{W}|b\rangle$ in the Quantum Data Structure Model). Let $b \in \mathbb{R}^N$ and $W \in \mathbb{R}^{N \times N}$. Suppose that $b$ and $W$ are stored in a quantum-accessible data structure, so that we have a state preparation procedure $U_W$ for $|b\rangle$ and query access to the diagonal entries $w_j$. Then for any $\delta > 0$, we can prepare a quantum state that is $\delta$-close to $|b_w\rangle = \sqrt{W}|b\rangle / \|\sqrt{W}|b\rangle\|$.

Proof. Using query access to $W$, the map $|j\rangle|0\rangle \mapsto |j\rangle\left(\sqrt{\frac{w_j}{w_{\max}}}|0\rangle + \sqrt{1 - \frac{w_j}{w_{\max}}}|1\rangle\right)$ can be applied using some controlled rotations, a square-root circuit, and $U_W$. This gives us the state (ignoring some blank registers)
$$\sum_j b_j \sqrt{\frac{w_j}{w_{\max}}}\, |j\rangle|0\rangle + |\mathrm{junk}\rangle|1\rangle.$$
The probability of the ancilla being in the $|0\rangle$ state is $\Omega\!\left(\frac{w_{\min}}{w_{\max}}\right)$.
Thus, performing $O\!\left(\sqrt{w_{\max}/w_{\min}}\right)$ rounds of amplitude amplification on the $|0\rangle$ outcome gives a constant probability of observing $|0\rangle$, and therefore of obtaining the desired state $|b_w\rangle$. Using the above two lemmas and the quantum OLS solver (Theorem 32), we can construct an algorithm for regularized quantum WLS.
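The postselection probability in Lemma 38 can be illustrated classically (NumPy sketch with arbitrary weights; amplitudes stand in for the quantum state):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 16

b = rng.standard_normal(N)
b = b / np.linalg.norm(b)                 # amplitudes of |b>
w = rng.uniform(0.25, 4.0, size=N)        # diagonal of W
w_min, w_max = w.min(), w.max()

# After the controlled rotations, the |0>-ancilla branch carries amplitudes
# b_j * sqrt(w_j / w_max); postselection succeeds with probability:
p_success = np.sum(b**2 * w / w_max)

# The success probability is at least w_min / w_max, so
# O(sqrt(w_max / w_min)) rounds of amplitude amplification suffice
assert w_min / w_max <= p_success <= 1.0

# Postselecting on |0> yields the normalized state sqrt(W)|b> / ||sqrt(W)|b>||
b_w = b * np.sqrt(w)
b_w = b_w / np.linalg.norm(b_w)
assert np.allclose(np.linalg.norm(b_w), 1.0)
```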
Theorem 39 (Quantum Weighted Least Squares with General $\ell_2$-Regularization in the Quantum Data Structure Model). Let $A, L \in \mathbb{R}^{N \times d}$, with effective condition numbers $\kappa_A, \kappa_L$ respectively, be stored in an efficient quantum-accessible data structure. Let $W \in \mathbb{R}^{N \times N}$ be a diagonal matrix with largest and smallest diagonal entries $w_{\max}, w_{\min}$ respectively, also stored in an efficient quantum-accessible data structure. Furthermore, suppose the entries of the vector $b \in \mathbb{R}^N$ are also stored in a quantum-accessible data structure. Then for any $\delta > 0$, we can prepare a quantum state that is $\delta$-close to $A_L^+|b_w\rangle|0\rangle / \|A_L^+|b_w\rangle|0\rangle\|$.

Proof. Choose some precision parameter $\varepsilon > 0$ for accessing the data structure. Given access to $W$ and $A$, we can use Lemma 37 to prepare a block-encoding of $\sqrt{W}A$ using $T_A := O(\mathrm{polylog}(Nd/\varepsilon))$ queries to the data structure. Similarly, Lemma 3 allows us to build a $(\|L\|_F, \lceil\log(N+d)\rceil, \varepsilon)$-block-encoding of $L$ using $T_L := O(\mathrm{polylog}(Nd/\varepsilon))$ queries to the data structure.
Next, using Lemma 38, for any $\varepsilon_b > 0$, we can prepare a state $\varepsilon_b$-close to $|b_w\rangle$. Finally, for the output state to be $\delta$-close to the required state, we choose $\delta_b = \delta/2$ and $\varepsilon_b = \delta/2\kappa$, and apply the robustness result from Lemma 15. Substituting the cost of the individual components into Equation 35 yields the final cost.

For the sparse access model, we can obtain a block-encoding similar to Lemma 37 and a quantum state similar to Lemma 38, with the same query complexities. Thus, we have an algorithm analogous to Theorem 39 in the sparse access model as well; we directly state its complexity.
Theorem 40 (Quantum Weighted Least Squares with General $\ell_2$-Regularization in the Sparse Access Model). Let $A \in \mathbb{R}^{N \times d}$ be $(s_r^A, s_c^A)$ row-column sparse and, similarly, let $L \in \mathbb{R}^{N \times d}$ be $(s_r^L, s_c^L)$ row-column sparse, with effective condition numbers $\kappa_A$ and $\kappa_L$ respectively. Let $\lambda \in \mathbb{R}^+$, and let $W \in \mathbb{R}^{N \times N}$ be a diagonal matrix with largest and smallest diagonal entries $w_{\max}, w_{\min}$, respectively. Suppose the diagonal entries of $W$ are stored in a QROM such that, for any $\delta > 0$, we can implement $|j\rangle|0\rangle \mapsto |j\rangle|w_j\rangle$ at cost $O(\mathrm{polylog}(Nd/\delta))$, and that $w_{\max}$ is known. Furthermore, suppose there exists a unitary that prepares $|b\rangle$ at cost $T_b$. Then for any $\delta > 0$, we can prepare a quantum state that is $\delta$-close to $A_L^+|b_w\rangle|0\rangle / \|A_L^+|b_w\rangle|0\rangle\|$.

Quantum Generalized Least Squares
In this section, we assume that we have block-encoded access to the correlation matrix $\Omega \in \mathbb{R}^{N \times N}$, with condition number $\kappa_\Omega$. We begin by preparing a block-encoding of $\Omega^{-1/2}$, given an approximate block-encoding of $\Omega$ (Lemma 41).
We will now use this lemma in conjunction with Theorem 32 to develop quantum algorithms for GLS with general 2 -regularization.
Theorem 42 (Quantum Generalized Least Squares with General $\ell_2$-Regularization). Let $A, L \in \mathbb{R}^{N \times d}$ be the data and penalty matrices with effective condition numbers $\kappa_A, \kappa_L$ respectively. Let $\Omega \in \mathbb{R}^{N \times N}$ be the covariance matrix with condition number $\kappa_\Omega$. Let $\delta > 0$ be the precision parameter, and define $\kappa$ as the effective condition number of the augmented matrix $A_L$. For some sufficiently small $\varepsilon_A$, suppose we have access to $U_A$, an $(\alpha_A, a_A, \varepsilon_A)$-block-encoding of $A$ implemented in time $T_A$, and to $U_\Omega$, an $(\alpha_\Omega, a_\Omega, \varepsilon_\Omega)$-block-encoding of $\Omega$ implemented in time $T_\Omega$. Let $U_b$ be a unitary that prepares the state $|b\rangle$ in time $T_b$. Then we can prepare a quantum state that is $\delta$-close to $A_L^+\,\Omega^{-1/2}|b\rangle|0\rangle / \|A_L^+\,\Omega^{-1/2}|b\rangle|0\rangle\|$, using only $O(\log \kappa)$ additional qubits.
Proof. Observe that by choosing $A' := \Omega^{-1/2}A$, $L' := L$, and $|b'\rangle := \Omega^{-1/2}|b\rangle$ (up to normalization) in the quantum ordinary least squares algorithm, we obtain a state proportional to $(A^T \Omega^{-1} A + \lambda L^T L)^{-1} A^T \Omega^{-1} |b\rangle$. For convenience, let us define the matrix $B := \Omega^{-1/2}$ (and therefore $\kappa_B = \sqrt{\kappa_\Omega}$ and $\|B\| = \sqrt{\kappa_\Omega/\|\Omega\|}$). We now need to prepare a block-encoding of $BA$ and the quantum state $B|b\rangle / \|B|b\rangle\|$, which we then use to invoke Theorem 32.
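The two facts about $B = \Omega^{-1/2}$ used here can be checked numerically (NumPy sketch; $\Omega$ is a random positive definite matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 6

# Random symmetric positive definite Omega
M = rng.standard_normal((N, N))
Omega = M @ M.T + np.eye(N)

evals, evecs = np.linalg.eigh(Omega)
B = evecs @ np.diag(evals**-0.5) @ evecs.T    # B = Omega^{-1/2}

kappa_Omega = evals.max() / evals.min()
sB = np.linalg.svd(B, compute_uv=False)
kappa_B = sB.max() / sB.min()

# kappa_B = sqrt(kappa_Omega) and ||B|| = sqrt(kappa_Omega / ||Omega||)
assert np.allclose(kappa_B, np.sqrt(kappa_Omega))
assert np.allclose(sB.max(), np.sqrt(kappa_Omega / evals.max()))
```

This holds because the singular values of $\Omega^{-1/2}$ are exactly the inverse square roots of the eigenvalues of $\Omega$.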
We begin by using Lemma 41 with some precision $\varepsilon_B$ to construct an $(\alpha_B, a_B, \varepsilon_B)$-block-encoding of $B = \Omega^{-1/2}$, where $\alpha_B = 2\sqrt{\kappa_\Omega/\|\Omega\|} = 2\|B\|$ and $a_B = a_\Omega + 1$; this bounds $\varepsilon_\Omega$ and has a cost of $O(T_\Omega)$. Then, using Lemma 20 with precision $\gamma$ satisfying $\gamma \ge 4\sqrt{2}\max(\|B\|\varepsilon_A, \|A\|\varepsilon_B)$, we get a $(2\|A\|\|B\|,\ a_A + a_B + 3,\ \gamma)$-block-encoding of $A' := BA = \Omega^{-1/2}A$.

To prepare $B|b\rangle / \|B|b\rangle\|$, we use Lemma 13 with precision $\varepsilon_b \ge 2\varepsilon_B \kappa_B / \|B\|$. This prepares a state that is $\varepsilon_b$-close to $|b'\rangle := B|b\rangle / \|B|b\rangle\|$ with constant success probability.

We could invoke the OLS solver directly using the above two, but that would introduce a product of sub-normalization factors ($\alpha$ terms) into the complexity. We want to avoid this because, in most common cases, the $\alpha$'s of block-encodings are quite large. So we also pre-amplify $U_L$ using Corollary 12: for any $\delta_L \ge 2\varepsilon_L$, we get a $(\sqrt{2}\|L\|,\ a_L + 1,\ \delta_L)$-block-encoding of $L$.

With these in place, we can use Theorem 32 to obtain a quantum state $\delta'$-close to the target. To simplify the ratio-of-norms term, we first lower bound $\|BA\| \ge \|A\|/\|B^{-1}\| = \|A\|/\sqrt{\|\Omega\|}$. As $\|B\| = \sqrt{\kappa_\Omega/\|\Omega\|}$, the whole term simplifies to $O(\sqrt{\kappa_\Omega})$.

We can compute the error between $|\psi\rangle$ and the expected state using Lemma 15. For the final error to be $\delta$, we choose $\varepsilon_b = \delta/2\kappa$ and $\delta' = \delta/2$. Combining both bounds on $\varepsilon_B$, we can effectively bound $\varepsilon_B$ by the smaller of the two. Finally, for the total cost, we calculate the respective coefficients of the terms $T_A$, $T_\Omega$, $T_L$, and $T_b$ (excluding the common factor of $\kappa\sqrt{\kappa_\Omega}\log\kappa$ for brevity). Let us label these "coefficient extraction" functions as $C$ with matching subscripts, and the total cost as $T$.
Hence, the final complexity is given by $T = O\big(\kappa\sqrt{\kappa_\Omega}\log\kappa\,(C_A T_A + C_\Omega T_\Omega + C_L T_L + C_b T_b)\big)$. One immediate observation is that, for the special case of the (unregularized) quantum GLS problem (when $L = 0$ and $\lambda = 0$), our algorithm has a slightly better complexity than [CGJ19] and requires fewer additional qubits. We now state the complexities of this algorithm in specific input models, namely the quantum data structure model and the sparse access input model.
Corollary 43 (Quantum Generalized Least Squares with General $\ell_2$-Regularization in the Quantum Data Structure Model). Let $A, L \in \mathbb{R}^{N \times d}$ be the data and penalty matrices with effective condition numbers $\kappa_A, \kappa_L$ respectively, and let $\Omega \in \mathbb{R}^{N \times N}$ be the covariance matrix with condition number $\kappa_\Omega$. Let the matrices $A$, $L$, $\Omega$ and the vector $b$ be stored in a quantum-accessible data structure, and let $\kappa$ be the effective condition number of the augmented matrix $A_L$. Then for any $\delta > 0$, we can prepare a quantum state that is $\delta$-close to $A_L^+\,\Omega^{-1/2}|b\rangle|0\rangle / \|A_L^+\,\Omega^{-1/2}|b\rangle|0\rangle\|$.

Proof. The proof is very similar to that of Corollary 34, with the extra input $\Omega$. We use the data structure to prepare the block-encodings of $A$, $L$, $\Omega$ and the state $|b\rangle$, with precisions $\varepsilon_A, \varepsilon_L, \varepsilon_\Omega, \varepsilon_b$ respectively. We invoke Theorem 42 with precision $\delta_b$, choosing the above $\varepsilon$ terms to be equal to their corresponding upper bounds. Finally, we use Lemma 15 with $\varepsilon_b = \delta/2\kappa$ and $\delta_b = \delta/2$ to bring the final error to $\delta$.
Now, $\mu_A = \|A\|_F$ (and similarly for $\mu_L$ and $\mu_\Omega$). As $\|A\|_F \le \sqrt{r(A)}\,\|A\|$, where $r(A)$ is the rank of $A$, the complexity of Corollary 43 can be re-expressed as
$$O\left(\kappa\sqrt{\kappa_\Omega}\left(\sqrt{r(A)} + \sqrt{r(L)} + \sqrt{r(\Omega)\kappa_\Omega}\right)\mathrm{polylog}\!\left(\frac{Nd\kappa}{\delta}\right)\right). \tag{54}$$

Corollary 44 (Quantum Generalized Least Squares with General $\ell_2$-Regularization in the Sparse Access Model). Let $A \in \mathbb{R}^{N \times d}$ be an $(s_r^A, s_c^A)$ row-column sparse data matrix, $L \in \mathbb{R}^{N \times d}$ an $(s_r^L, s_c^L)$ row-column sparse penalty matrix, and $\Omega \in \mathbb{R}^{N \times N}$ an $(s_r^\Omega, s_c^\Omega)$ row-column sparse covariance matrix. Suppose we have a procedure to prepare $|b\rangle$ at cost $T_b$, and let $\kappa$ be the effective condition number of the augmented matrix $A_L$. Then for any $\delta > 0$, we can prepare a quantum state that is $\delta$-close to $A_L^+\,\Omega^{-1/2}|b\rangle|0\rangle / \|A_L^+\,\Omega^{-1/2}|b\rangle|0\rangle\|$.

Future Directions
Our algorithms for quantum linear regression with general $\ell_2$-regularization made use of QSVT to implement several matrix operations. However, it is possible to use QSVT directly to obtain the solution to quantum ridge regression. This requires computing a polynomial approximation of the transformation $\sigma \mapsto \sigma/(\sigma^2 + \lambda)$, to be applied to the singular values of $A$, which lie in $[1/\kappa_A, 1]$. However, it is unclear how to extend this approach to general $\ell_2$-regularization. For instance, even when the data matrix and the penalty matrix share the same right singular vectors, this approach involves obtaining polynomial approximations to directly implement transformations of the form $\sigma \mapsto \sigma/(\sigma^2 + \lambda\tilde{\sigma}^2)$, where $\tilde{\sigma}$ is the corresponding singular value of the penalty matrix $L$. A polynomial in a single singular value is no longer sufficient to approximate such a singular value transformation. It would be interesting to explore whether the newly developed ideas of M-QSVT [RC22] can be used to implement such transformations directly with improved complexity.

While developing quantum machine learning algorithms, it is essential to point out the caveats, even at the risk of being repetitive [Aar15]. Our quantum algorithms output a quantum state $|x\rangle$ whose amplitudes encode the solution of the classical (regularized) linear regression problem. While, given access to the data matrix and the penalty matrix, we achieve an exponential advantage over classical algorithms, this advantage is not generic. If similar assumptions ($\ell_2$-sample and query access) are provided to a classical device, the advantage shrinks: Gilyén et al. developed a quantum-inspired classical algorithm [GST22] for ridge regression (building upon [CGL+20]) with running time $O(\mathrm{poly}(\kappa, \mathrm{rank}(A), 1/\delta))$. This implies that any quantum algorithm for this problem can be at most polynomially faster in $\kappa$ under these assumptions. One might posit that similar quantum-inspired classical algorithms for general $\ell_2$-regression can also be developed.
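The singular value transformation $\sigma \mapsto \sigma/(\sigma^2 + \lambda)$ mentioned above can be applied classically via the SVD, recovering the ridge solution (a NumPy sketch, not the QSVT circuit itself):

```python
import numpy as np

rng = np.random.default_rng(7)
N, d, lam = 10, 4, 0.6

A = rng.standard_normal((N, d))
b = rng.standard_normal(N)

# Ridge regression corresponds to the singular value transformation
# sigma -> sigma / (sigma^2 + lam) applied to the singular values of A
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svt = Vt.T @ np.diag(s / (s**2 + lam)) @ U.T @ b

# Agrees with the closed form (A^T A + lam I)^{-1} A^T b
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
assert np.allclose(x_svt, x_ridge)
```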
The exponential quantum speedup, however, is retained when the underlying matrices are sparse.
Another future direction of research would be to recast our algorithms in the framework of adiabatic quantum computing (AQC) following the works of [LT20b,AL22]. Quantum algorithms for linear systems in this framework have the advantage that a linear dependence on κ can be obtained without using complicated subroutines like variable-time amplitude amplification. The strategy is to implement these problems in the AQC model and then use time-dependent Hamiltonian simulation [LW18] to obtain their complexities in the circuit model. One caveat is that, so far, time-dependent Hamiltonian simulation algorithms have only been developed in the sparse-access model and therefore the advantage of the generality of the block-encoding framework is lost.
In the future, it would also be interesting to explore other quantum algorithms for machine learning, such as principal component regression and linear support vector machines [RML14], using QSVT. Finally, following the results of [CdW21], it would be interesting to investigate techniques for quantum machine learning that do not require the quantum linear systems algorithm as a subroutine.

Acknowledgments. SC acknowledges support from the Science and Engineering Research Board, Department of Science and Technology (SERB-DST), Government of India, via grant number SRG/2022/000354. SC is also supported by IIIT Hyderabad via the Faculty Seed Grant. AP thanks Michael Walter for useful discussions. AP acknowledges support by the BMBF through project Quantum Methods and Benchmarks for Resource Allocation (QuBRA).

A.1 Arithmetic with Block-Encoded Matrices
Lemma 17 (Linear Combination of Block-Encoded Matrices). For each $j \in \{0, \ldots, m-1\}$, let $A_j$ be an $s$-qubit operator and $y_j \in \mathbb{R}^+$. Let $U_j$ be an $(\alpha_j, a_j, \varepsilon_j)$-block-encoding of $A_j$, implemented in time $T_j$. Define the matrix $A = \sum_j y_j A_j$ and the vector $\eta \in \mathbb{R}^m$ such that $\eta_j = y_j \alpha_j$. Let $U_\eta$ be an $\eta$ state-preparation unitary, implemented in time $T_\eta$. Then we can implement a
$$\left(\sum_j y_j \alpha_j,\ \max_j(a_j) + s,\ \sum_j y_j \varepsilon_j\right)$$
block-encoding of $A$ at a cost of $O\left(\sum_j T_j + T_\eta\right)$.
Proof. Let $a = \max_j(a_j) + s$ and $\alpha = \sum_j y_j \alpha_j$. For each $j \in \{0, \ldots, m-1\}$, construct the extended unitary $U_j'$ by padding ancillas to $U_j$, i.e., $U_j' = I_{a - s - a_j} \otimes U_j$. Note that $U_j'$ is an $(\alpha_j, a - s, \varepsilon_j)$-block-encoding of $A_j$. Let $B_j = (\langle 0|^{\otimes a_j} \otimes I_s)\, U_j\, (|0\rangle^{\otimes a_j} \otimes I_s)$ denote the top-left block of $U_j$ (and of $U_j'$), and observe that $\|A_j - \alpha_j B_j\| \le \varepsilon_j$. We also construct $P$, an $\eta$ state-preparation unitary such that $P|0\rangle = \sum_j \sqrt{y_j \alpha_j / \alpha}\, |j\rangle$, by invoking Definition 16.
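The linear-combination-of-unitaries (LCU) construction behind this lemma can be verified numerically. The sketch below (plain NumPy; exact block-encodings built via a standard unitary dilation, $m = 2$ terms for simplicity) checks that the top-left block of $(P^\dagger \otimes I)\,\mathrm{select}(U)\,(P \otimes I)$ equals $\sum_j y_j A_j / \alpha$:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 2, 4          # m = number of terms, n = dimension of each A_j

def embed(Ablock):
    """Orthogonal matrix whose top-left n x n block is Ablock (||Ablock|| <= 1)."""
    U_, s_, Vt_ = np.linalg.svd(Ablock)
    c = np.sqrt(1 - s_**2)
    top = np.hstack([Ablock, U_ @ np.diag(c) @ U_.T])
    bot = np.hstack([Vt_.T @ np.diag(c) @ Vt_, -Ablock.T])
    return np.vstack([top, bot])

# Exact (alpha_j, 1, 0)-block-encodings U_j of matrices A_j
A_list, alpha, y, U_list = [], [], [], []
for j in range(m):
    Aj = rng.standard_normal((n, n))
    aj = 2 * np.linalg.norm(Aj, 2)          # sub-normalization alpha_j
    A_list.append(Aj); alpha.append(aj); y.append(rng.uniform(0.5, 1.5))
    U_list.append(embed(Aj / aj))

# State-preparation unitary P with P|0> = sum_j sqrt(y_j alpha_j / alpha_tot)|j>
eta = np.array([y[j] * alpha[j] for j in range(m)])
alpha_tot = eta.sum()
v = np.sqrt(eta / alpha_tot)
P = np.column_stack([v, np.array([-v[1], v[0]])])   # 2x2 unitary since m = 2

# LCU circuit: (P^dag x I) . select(U) . (P x I)
select = np.zeros((m * 2 * n, m * 2 * n))
for j in range(m):
    select[j*2*n:(j+1)*2*n, j*2*n:(j+1)*2*n] = U_list[j]
W = np.kron(P.conj().T, np.eye(2 * n)) @ select @ np.kron(P, np.eye(2 * n))

# Its top-left n x n block equals (sum_j y_j A_j) / alpha_tot
A_sum = sum(y[j] * A_list[j] for j in range(m))
assert np.allclose(W[:n, :n] * alpha_tot, A_sum)
assert np.allclose(W @ W.T, np.eye(m * 2 * n))       # W is indeed unitary
```

The `embed` dilation is one standard way to build an exact block-encoding of a sub-normalized real matrix; the lemma itself works with arbitrary approximate block-encodings.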