Variational Quantum Linear Solver

Previously proposed quantum algorithms for solving linear systems of equations cannot be implemented in the near term due to the required circuit depth. Here, we propose a hybrid quantum-classical algorithm, called Variational Quantum Linear Solver (VQLS), for solving linear systems on near-term quantum computers. VQLS seeks to variationally prepare | x ⟩ such that A | x ⟩ ∝ | b ⟩ . We derive an operationally meaningful termination condition for VQLS that allows one to guarantee that a desired solution precision ϵ is achieved. Specifically, we prove that C ⩾ ϵ 2 /κ 2 , where C is the VQLS cost function and κ is the condition number of A . We present efficient quantum circuits to estimate C , while providing evidence for the classical hardness of its estimation. Using Rigetti’s quantum computer, we successfully implement VQLS up to a problem size of 1024 × 1024 . Finally, we numerically solve non-trivial problems of size up to 2 50 × 2 50 . For the specific examples that we consider, we heuristically find that the time complexity of VQLS scales efficiently in ϵ , κ , and the system size N .


Introduction
Linear systems of equations play an important role in many areas of science and technology, including machine learning [1,2], solving partial differential equations [3], fitting polynomial curves [4], and analyzing electrical circuits [5].In the past decade, significant attention has been given to the possibility of solving linear systems on quantum computers.Classically solving an N × N linear system (N equations for N unknowns) scales polynomially in N .In contrast, Harrow-Hassidim-Lloyd (HHL) introduced a quantum algorithm that scales logarithmically in N , suggesting that quantum computers may provide an exponential speedup for certain linear system problems [6].More precisely, the HHL algorithm treats the Quantum Lin-ear Systems Problem (QLSP), where the goal is to prepare a quantum state |x⟩ that is proportional to a vector x that satisfies the equation Ax = b.If both A and b are sparse, then for a fixed precision ϵ in the solution, the complexity of HHL scales polynomially in log N and κ, where κ is the condition number of A, i.e, the ratio of the largest to the smallest singular value.Further improvements to HHL have reduced the complexity to linear κ scaling [7,8] and polylogarithmic scaling in 1/ϵ [9,10], as well as improved the sparsity requirements [11].
The aforementioned quantum algorithms hold promise for the future, when large-scale quantum computers exist with enough qubits for quantum error correction.The timescale for such computers remains an open question, but is typically estimated to be on the order of two decades.On the other hand, commercial quantum computers currently exist with ∼ 50 noisy qubits, with the number of qubits rapidly increasing.A crucial question is how to make use of such noisy intermediate-scale quantum (NISQ) computers [12].In principle, one can implement the aforementioned quantum algorithms on NISQ devices, however noise limits the problem size to be extremely small.For example, the HHL algorithm has been implemented with superconducting qubits [13,14], nuclear magnetic resonance (NMR) [15], and photonic devices [16,17], but these experiments were limited to a problem size of 2 × 2.More recently, an alternative approach based on an adiabatic-inspired quantum algorithm [8] was implemented with NMR for an 8 × 8 problem, and this appears to be the current record for the largest linear system solved with a gate-based quantum computer [18].
In this work, we propose a VHQCA for solving the QLSP.Our algorithm, called the Variational Quantum Linear Solver (VQLS), employs a cost function that quantifies either the global or local closeness of the quantum states A|x⟩ and |b⟩, where the latter is a normalized version of b.We provide efficient quantum circuits to estimate our cost functions, and show under typical complexity assumptions that they cannot be efficiently estimated classically.Furthermore, we derive operational meaning for our cost functions, as upper bounds on ϵ 2 /κ 2 .This is crucial since it gives a termination criterion for VQLS that guarantees a desired precision ϵ.
It is important to emphasize that all VHQCAs are heuristic algorithms, making rigorous complexity analysis difficult.Nevertheless, our numerical simulations (without finite sampling, and both for specific A and for randomly chosen A) indicate that the run time of VQLS scales efficiently in κ, ϵ, and N .Namely, we find evidence of (at worst) linear scaling in κ, logarithmic scaling in 1/ϵ, and polylogarithmic scaling in N .
We employ Rigetti's Quantum Cloud Services to implement VQLS.With their quantum hardware, we were able to successfully solve a particular linear system of size 1024 × 1024.We are therefore optimistic that VQLS could provide a near-term approach to the QLSP.

Overview
Figure 1 shows a schematic diagram of the VQLS algorithm.The input to VQLS is: (1) an efficient gate sequence U that prepares a quantum state |b⟩ that is proportional to the vector b, and (2) a decomposition of the matrix A into a linear combination of L unitaries of the form where the A l are unitaries, and the c l are complex numbers.The assumption that A is given in this form is analogous to the assumption that the Hamiltonian H in the variational quantum eigensolver [20] is given as a linear combination of Pauli operators H = L l=1 c l σ l , where naturally one makes the assumption that L is only a polynomial function of the number of qubits, n.Additionally, we assume κ < ∞ and ||A|| ⩽ 1, and that the A l unitaries can be implemented with efficient quantum circuits.Appendix A describes an efficient method to decompose A in the form of (1) for the case when A is a sparse matrix.
With this input, the Quantum Linear Systems Problem (QLSP) is to prepare a state |x⟩ such that A|x⟩ is proportional to |b⟩.To solve this problem, VQLS employs an ansatz for the gate sequence V (α) that prepares a potential solution |x(α)⟩ = V (α)|0⟩.The parameters α are input to a quantum computer, which prepares |x(α)⟩ and runs an efficient quantum circuit that estimates a cost function C(α).The precise details of the cost function and its estimation are discussed below.We simply remark here that C(α) quantifies how much component A|x⟩ has orthogonal to |b⟩.The value of C(α) from the quantum computer is returned to the classical computer which then adjusts α (via a classical optimization algorithm) in an attempt to reduce the cost.This process is iterated many times until one reaches a termination condition of the form C(α) ⩽ γ, at which point we say that α = α opt .
VQLS outputs the parameters α opt , which can then be used to prepare the quantum state |x(α opt )⟩ = V (α opt )|0⟩.One can then measure observables of interest on the state |x(α opt )⟩ in order to characterize the solution vector.Due to the operational meaning of our cost function (discussed below), one can upper bound the deviation of observable expectation values for |x(α opt )⟩ from those of the true solution, based on the value of the cost function.Hence, before running VQLS, one can decide on a desired error tolerance ϵ, where is the trace distance between the exact solution |x 0 ⟩ and the approximate solution |x(α opt )⟩.This ϵ then translates into a threshold value γ that the final cost C(α opt ) must achieve (see (9) for the relation between ϵ and γ).
Here we discuss several reasonable cost functions.A simple, intuitive cost function involves the overlap between the (unnormalized) projector |ψ⟩⟨ψ|, with |ψ⟩ = A|x⟩, and the subspace orthogonal to |b⟩, as follows: We note that one can view this cost function as the expectation value of an effective Hamiltonian which is similar to the final Hamiltonian in Ref. [8].
The C G function is small if |ψ⟩ is nearly proportional to |b⟩ or if the norm of |ψ⟩ is small.The latter does not represent a true solution, and hence to deal with this, one can divide C G by the norm of |ψ⟩ to obtain where |Ψ⟩ = |ψ⟩/ ⟨ψ|ψ⟩ is a normalized state.As discussed in the Supplemental Material, C G and C G have similar performance for the QLSPs that we considered.
We emphasize that global cost functions such as those in (3) and (5) can exhibit barren plateaus, i.e., cost function gradients that vanish exponentially in the number of qubits n, see Ref. [38].To improve trainability for large n, one can introduce local versions of these costs, as follows: where the effective Hamiltonian is with |0 j ⟩ the zero state on qubit j and 1 j the identity on all qubits except qubit j.One can show that (see Appendix B) We assume that κ is not infinite (i.e., that A is full rank) and hence that ⟨ψ|ψ⟩ ̸ = 0.This implies that all four cost functions vanish under precisely the same conditions, namely, when |ψ⟩ ∝ |b⟩, which is the case when |x⟩ is a solution to the QLSP.As shown in Fig. 2, as n increases it becomes increasingly hard to optimize the global cost function C G .On the other hand, the local cost function C L performs significantly better, as we are able to train C L for systems of size up to 2 50 × 2 50 (i.e., with 50 qubits).These results show that the vanishing gradients of global cost functions could make them untrainable for large n, and hence we propose using our local cost functions for large-scale implementations.

Operational meaning of cost functions
Here we provide operational meanings for the aforementioned cost functions.These operational meanings are crucial since they allow one to define termination conditions for VQLS in order to achieve a desired precision.In particular, we find that the fol-lowing bounds hold in general: (9) Note that one can take the right-hand-sides of these inequalities as the γ quantity shown in Fig. 1.
We remark that, for C G and C L , the bounds in (9) can be tightened (by using the bounds on C G and C L in (9)) as follows: Here, ⟨ψ | ψ⟩ is experimentally computable (see (14) below) and satisfies ⟨ψ |ψ⟩ ⩽ 1.Hence, when training C G or C L , one can employ the right-hand-sides of (10) as opposed to those of (9) as the termination condition γ.Furthermore, one can employ the operational meaning of the trace distance [39] to note that, for any POVM element M , we have ϵ ⩾ D(M ), where measures the difference between expectation values on |x⟩ and |x 0 ⟩.Relaxing to the general case where M is any Hermitian observable gives ϵ ⩾ D(M )/(2∥M ∥), and hence (9) is a bound on observable differences.
Let us now provide a proof for (9).Consider first that C G = ⟨H G ⟩, with the eigenstates and eigenvalues of H G denoted by {|x i ⟩} and {E i }, respectively for i = 0, 1, . . . .By construction |x 0 ⟩ is the ground state of H G with E 0 = 0.In what follows we assume for simplicity that |x 1 ⟩ is non-degenerate, although the same proof approach works for the degenerate case.
It is clear that for a given ϵ, the smallest energy ⟨H G ⟩ (hence cost) is achieved if the state |x⟩ is a superposition of |x 0 ⟩ and |x 1 ⟩ only.One can see this by expanding an arbitrary state |x⟩ in the energy eigenbasis, |x⟩ = i χ i |x i ⟩, and noting that ϵ depends only on the magnitude of χ 0 .Hence for a fixed ϵ, one is free to vary the set of coefficients {χ i } i̸ =0 , and the set that minimizes the energy corresponds to choosing χ i = 0 for all i > 1.

Cost evaluation
In principle, all the aforementioned cost functions can be efficiently evaluated using the Hadamard Test circuit and simple classical post-processing.However, in practice, care must be taken to minimize the number of controlled operations in these circuits.Consider evaluating the term ⟨ψ|ψ⟩, which can be written as with There are L(L − 1)/2 different β ll ′ terms that one needs to estimate, and which can be measured with Hadamard Tests.The Hadamard Test involves acting with V on |0⟩, and then using an ancilla as the control qubit, applying , where C W denotes controlled-W (see Appendix for precise circuits).
In addition, for C G and C G , one needs to evaluate with The γ ll terms are easily estimated by applying U † A l V to |0⟩ and then measuring the probability of the all-zeros outcome.For the L(L − 1)/2 terms with l ̸ = l ′ , there are various strategies to estimate γ ll ′ .For example, one could estimate the L terms of the form ⟨0|U † A l V |0⟩ with a Hadamard Test, but one would have to control all of the unitaries: V , A l , and U † .Instead, we introduce a novel circuit called the Hadamard-Overlap Test that directly computes γ ll ′ without having to control V or U at the expense of doubling the number of qubits.This circuit is schematically shown in Fig. 1 and explained in detail in Appendix C. Finally, for C L and C L , one needs to estimate terms of the form (18)   These terms can either be estimated with the Hadamard-Overlap Test or with the Hadamard Test, which are discussed in Appendix C.

Classical hardness of computing the cost functions
Here we state that computing the cost functions in (3), (5), and (6) is classically hard under typical complexity assumptions.As shown in Appendix D, the following proposition holds: As indicated by the dashed box, each layer is composed of controlled-Z gates acting on alternating pairs of neighboring qubits which are preceded and followed by single qubit rotations around the y-axis, Ry(αi) = e −iα i Y /2 .Shown is the case of four layers and n = 10 qubits.The number of variational parameters and gates scales linearly with n: for n = 50, four layers of this ansatz consist of 640 gates and 440 variational parameters.

Proposition 1. The problem of estimating the VQLS cost functions C
Recall that the complexity class Deterministic Quantum Computing with 1 Clean Qubit (DQC1) consists of all problems that can be efficiently solved with bounded error in the one-clean-qubit model of computation [40].Moreover, classically simulating DQC1 is impossible unless the polynomial hierarchy collapses to the second level [41,42], which is not believed to be the case.Hence, Proposition 1 strongly suggests that a classical algorithm cannot efficiently estimate the VQLS cost functions, and hence VQLS cannot be efficiently simulated classically.

Ansatz
In the VQLS algorithm, |x⟩ is prepared by acting on the |0⟩ state with a trainable gate sequence V (α).Without loss of generality, V (α) can be expressed in terms of L gates from a gate alphabet A = {G k (α)} as Here k = (k L , . . ., k 1 ) identifies the types of gates and their placement in the circuit (i.e., on which qubit they act), while α are continuous parameters.When working with a specific quantum hardware, it is convenient to choose a Hardware-Efficient Ansatz [43], where A is composed of gates native to that hardware.This reduces the gate overhead that arises when implementing the algorithm in the actual device.We use the term "fixed-structure ansatz" when the gate structure of V (α) is fixed (i.e., when k is fixed), and when one only optimizes over α. Figure 3 shows an example of such an ansatz, with A composed of single qubit y-rotations and controlled-Z gates.We employ the ansatz in Fig. 3 for the heuristics in Section 2.2.1.
Let us remark that this ansatz can have trainability issues [38,44] for large-scale problems.
Strategies such as layer-by-layer training [45] and correlating the α parameters [46] have been shown to be effective to address these trainability issues.In addition, trainability could be further improved by combining these strategies with more advanced ansatz architectures, and we now consider two such architectures.First we discuss a "variable structure ansatz" [30,47], where one optimizes over the gate angles and the gate placement in the circuit, i.e., where one optimizes over α and also over k.We employ such ansatz for our heuristics in Section 2.2.2.We refer the reader to Appendix E for a discussion of the optimization method employed for a variable structure ansatz.
In addition to the aforementioned ansatz, one can also employ the Quantum Alternating Operator Ansatz (QAOA) [48,49] to construct the unitary V (α) and avoid trainability issues.The QAOA consists of evolving the H ⊗n |0⟩ state (where H denotes the Hadamard unitary) by two Hamiltonians for a specified number of layers, or rounds.These Hamiltonians are conventionally known as driver and mixer Hamiltonians, and respectively denoted as H D and H M .Since the ground state of both H G and H L is |x 0 ⟩, we can either use (4) or (7) as the driver Hamiltonian H D .Evolving with H D for a time α i corresponds to the unitary operator U D (α i ) := e −iH D αi .Moreover, one can take the mixer Hamiltonian to be the conventional H M = n i=1 X i , where X i denotes Pauli X acting on the ith qubit.Accordingly, evolving with H M for a time α j yields the unitary operator U M (α j ) := e −iH M αj .The trainable ansatz V (α) is then obtained by alternating the unitary operators U D (α i ) and U M (α j ) p times: In this ansatz, each α i is a trainable continuous parameter.We note that QAOA is known to be universal as the number of layers p tends to infinity [48,50], and that finite values of p have obtained good results for several problems [51][52][53].In the Supplemental Material we present results of a small scale implementation of VQLS with a QAOA ansatz.
Let us remark that Ref. [6] showed that it is possible to efficiently generate an accurate approximation to the true solution |x 0 ⟩, i.e., with a number of gates that is polynomial in n, assuming certain constraints on A and b.Therefore, in principle, one may efficiently approximate these sort of solutions with a universal variational ansatz, such as the ones that we discussed above.

Training algorithm
There are several classical optimizers that may be employed to train V (α) and minimize the cost functions of VQLS.For example, our heuristics in Section 2.2.1 employ an optimization method that, at each iteration, chooses a random direction w in the parameter space along which to perform a line search, i.e., to solve min s∈R C(α + sw).On the other hand, in Section 2.2.2 we perform an optimization where all the parameters in α are independently optimized at each iteration.
In addition, there has been an increasing interest in gradient-based methods for VHQCAs [54][55][56] as it has been shown that the first-order gradient information can be directly measured [57,58] and can lead to faster rates of convergences to the optimum [59].To enable gradient-based strategies, Appendix F derives explicit formulas for the gradients of the cost functions and shows that the same circuits used to compute the cost functions can be used to compute their gradients.Finally, when employing the QAOA ansatz we leverage literature on QAOA-specific training (for instance, Ref. [52]).

Resilience to noise
Recently, it was shown that certain VHQCAs, specifically those for compiling, can exhibit noise resilience in the sense that the optimal parameters are unaffected by certain noise models [60].However, the generality of this Optimal Parameter Resilience (OPR) phenomenon is not clear.For the VQLS algorithm, the analysis of noise is made complicated by the fact that different quantum circuits are used to compute different terms in the cost function.However, perhaps surprisingly, we are able to prove that VQLS does exhibit the OPR phenomenon.Specifically, we show in Appendix G that our normalized cost functions C G and C L are both resilient to global depolarizing noise.We also show that C L is resilient to measurement noise.This is encouraging since C L is our proposed cost function of choice for large-scale implementations, although we leave an analysis of more complicated noise models for future work.
Because of OPR, we are able to train the variational ansatz and learn the correct parameters which prepare the solution |x⟩.However, we still have a noisy estimate of the cost function which affects the termination conditions ((9) or (10)) used to achieve the desired precision.In what follows, we outline an error mitigation procedure known as Probabilistic Error Cancellation (PEC) [61] which allows us to certify the termination conditions in the presence of noise.
The basic idea of PEC is to represent an ideal gate in a basis of noisy gates which a given quantum computer can implement.Exact procedures to do so are known for simple noise models such as depolarizing and amplitude damping noise.Generally, as long as the gates which a quantum computer can implement form a basis for unitary operations, an arbitrary unitary can be expressed in this basis by solving a linear program [61].Using the notation of Ref. [61], an ideal circuit U β can thus be written Here, p α (β) is a probability distribution, σ α (β) = ±1, and γ β is the negativity of the quasi-probability distribution a α (β) := γ β σ α (β)p α (β).For any observable A † = A, the noise-free estimate is where T is the number of gates in the circuit, and Ω T is the index set over all possible circuits which has size |Ω T | = m T , where m is the number of gates in the noisy basis.This equation is exact, but the summation has exponentially terms in the number of gates T .However, we may define an estimate of the noise-free observable as the random variable where M is the number of samples.To achieve pre- Concretely, this means running M = (γ β /δ) 2 circuits sampled from the distribution p α (β) and combining the results via (23).
We can apply this to VQLS as follows.Let C ∈ {C G , C L , ĈG , ĈL } be a cost function and C be its noisy estimate.The cost C is computed by evaluating poly(L) circuits (terms).Using PEC to compute each term with M = (γ β /δ) 2 circuits, we have for any δ > 0, we thus may run circuits.
In summary, OPR shows that we can obtain the optimal parameters for certain noise models, meaning that we will indeed obtain the optimal solution |x⟩ to the QLSP.The above procedure based on PEC shows that, with polynomial overhead, we can verify if the solution is optimal by satisfying the termination conditions.Note that this overhead is not required during the training phase of the VQLS but rather only after the training phase has halted.If the verification fails, additional training may be done, or the training may be restarted.The time-to-solution is the number of executions needed to guarantee a precision of ϵ = 0.002 (solid line) and ϵ = 0.01 (dashed line).Curves are shown for n = 10, 20, 30 qubits.In each case we averaged over 30 runs of the VQLS algorithm with four layers of the Layered Hardware-Efficient Ansatz of Fig. 3, and we trained the gate sequence by minimizing CL of (6).While the the κ scaling appears to be sub-linear here, it is known that linear scaling is optimal in general [6], and hence the observed scaling is likely specific to this example.

Heuristic Scaling
Here we study the scaling of VQLS with the condition number κ, error tolerance ϵ, and number of qubits n.First we consider a specific QLSP for which |x 0 ⟩ admits an efficient matrix-product-state representation, allowing us to simulate large values of n.We then consider QLSPs where the matrix A is randomly generated.In both cases we restrict A to be a sparse matrix, which is standard for QLSPs [6], and we simulate VQLS without finite sampling.Moreover, we quantify the run time of VQLS with the time-to-solution, which refers to the number of exact cost function evaluations during the optimization needed to guarantee that ϵ is below a specified value.In practice, for largescale implementations where the true solution |x 0 ⟩ is unknown, ϵ cannot be directly calculated.Rather, one can use the operational meaning of our cost function in (9) to upper-bound ϵ.Hence, we take this approach in all of our heuristics, i.e., we use the value of the cost, combined with (9), to determine the worstcase ϵ.We emphasize that, while it is tempting to directly compute ϵ from (2) in one's heuristics, this is essentially cheating since |x 0 ⟩ is unknown, and this is why our certification procedure is so important. .In all cases V (α) was composed of four layers of the Layered Hardware-Efficient Ansatz of Fig. 3, and we trained the local cost CL.The time-to-solution was obtained by averaging over 30 runs of the VQLS algorithm.The inset depicts the same data in a logarithmic scale.The dependence on 1/ϵ appears to be logarithmic, i.e., linear on a logarithmic scale.

Ising-inspired QLSP
Here we numerically simulate VQLS to solve the QLSP defined by the sparse matrix where the subscripts in (26) denote the qubits acted upon non-trivially by the Pauli operator.Here, we set J = 0.1.The parameters ζ and η are chosen such that the smallest eigenvalue of A is 1/κ and its largest eigenvalue is 1, which involves analytically computing [62] the smallest eigenvalue of the first two terms of A and then re-scaling A. As previously mentioned, this QLSP example is motivated from the fact that for J = 0 the solution is given by |x 0 ⟩ = |b⟩.Hence for small J, |x 0 ⟩ admits an efficient matrix-product-state representation.
Dependence on κ: Figure 4 shows our results, plotting time-to-solution versus κ for the QLSP in (26).Our numerical results were obtained by employing the layered Hardware-Efficient Ansatz of Fig. 3, and by training the local cost C L for different values of n. Figure 4 shows that as the condition number κ is increased, the time-to-solution needed to achieve a given ϵ increases with a scaling that appears to be sub-linear.Hence VQLS scales efficiently with κ for this example.It is known that linear scaling is optimal [6].Hence we expect that the scaling observed here is specific to this example, and indeed the example in the next subsection shows scaling that is closer to linear.Dependence on ϵ: To study the scaling of VQLS with ϵ, we numerically solved the QLSP in (26) for different values of κ and n.In all cases we trained the gate parameters by optimizing the C L cost function.Figure 5 shows the time-to-solution versus 1/ϵ.These results show that as 1/ϵ grows, the time-to-solution exhibits a logarithmic growth.
Dependence on n: The QLSP of (26) allows us to increase the number of qubits and analyze the scaling of VQLS with n.Here we implemented VQLS with n = 6, 8, . . ., 30 and for κ = 60, 120, 200 by training the local cost function C L . Figure 6 shows time-tosolution versus n.As the number of qubits increases the time-to-solution needed to guarantee a particular ϵ with κ fixed appears to increase linearly with n.This corresponds to logarithmic scaling in the linear system size N , analogous to that of the HHL algorithm [6].

Randomly generated QLSP
In this section we present scaling results for the case when the matrix A is randomly generated with the form Here p is either 0 or 1 according to a fixed binomial distribution, a j,k are random weights in (−1, 1), and σ α j is the Pauli matrix acting on qubit j with α = x, y, z.For each j, k = 1, . . ., n in (27), α and β are randomly chosen.Finally, we remark that ξ 1 , and ξ 2 are normalization coefficients that rescale the matrix so that its largest eigenvalue is 1 and its smallest is 1/κ (where κ is fixed).
For a given number of qubits n, we randomly created a matrix A according to (27), and we ran four independent instances of VQLS.We then selected the best run, i.e., the instance that required the small-est number of cost function evaluations to reach a specified value of guaranteed ϵ (guarenteed via (9)).This procedure was then repeated for 10 independent random matrices A, and the time-to-solution was obtained as the average of the best run for each matrix.
Dependence on κ: In Fig. 7(a) we show the timeto-solution versus κ for matrices randomly generated according to (27), and for n = 4.Here we employed a variable-structure ansatz as described in Appendix E, and we trained the local cost in (6).Different curves represent different desired precision ϵ.The data were plotted in a log-log scale and each curve was fitted with a power function κ m .In all cases we found m < 1, indicating that the scaling in κ for these examples is at worst linear.Linear scaling in κ is known to be optimal [6].
Dependence on ϵ: Let us now analyze the scaling of VQLS with respect to ϵ for matrices with different condition numbers.Figure 7(b) depicts the timeto-solution versus ϵ for matrices randomly generated according to (27), and for n = 4.All curves were fitted with a linear function, and since the x axis is in a logarithmic scale, the dependence on 1/ϵ appears to be logarithmic.Upon examining Figures 7(a) and 7(b) collectively, the apparent scaling of the timeto-solution with κ and log(1/ϵ) seems to exhibit a multiplicative behavior.
Dependence on n: In Fig. 7(c) we present the timeto-solution versus n needed to guarantee ϵ = 0.3 for QLSPs with n = 2, . . ., 7. All matrices A had condition number κ = 10.The data were fitted with a power function and we obtained the relation y ∼ n 8.5 .This corresponds to polylogarithmic scaling in N , which is the standard goal of quantum algorithms for the QLSP [7][8][9][10].
We refer the reader to the Supplemental Material for additional numerical simulations of VQLS for other QLSP examples, both with a Hardware-Efficient Ansatz and with a QAOA ansatz.These examples also exhibit efficient scaling behavior.

Implementation on quantum hardware
Here we present the results of a 1024 × 1024 (i.e., 10qubit) implementation of VQLS using Rigetti's 16Q Aspen-4 quantum computer.Specifically, we solved the QLSP defined by the matrix A in (26), with ζ = η = 1, and where the vector |b⟩ = |0⟩ was the all zero state.The ansatz consisted of R y (α i ) gates acting on each qubit.To adapt to hardware constraints, we computed the cost function C G in (5) by expanding the effective Hamiltonian H G in terms of Pauli operators and then employing Rigetti's quantum computer to estimate the expectation values of these terms.
The results of two representative VQLS runs are shown in Fig. 8.As shown, the cost function data obtained by training in a quantum computer closely matches the one obtained from training on a noiseless Figure 7: VQLS heuristic scaling for random matrices generated according to (27).The time-to-solution is the number of executions needed to guarantee a desired precision ϵ.In all cases we employed a variable-structure ansatz V (α) as described in Appendix E, and we trained the local cost CL of (6).a) Time-to-solution versus κ for a system of n = 4 qubits.Axes are shown in a log-log scale.For each value of ϵ the data were fitted with a power function κ m and in all cases m < 1, suggesting that the κ scaling appears to be sub-linear.b) Time-to-solution versus 1/ϵ for a system of n = 4 qubits.The x axis is shown in a log scale.Each curve corresponds to a different condition number.For all values of κ the data were fitted with a linear function, implying that the 1/ϵ scaling is logarithmic.c) Time-to-solution versus n needed to guarantee ϵ = 0.3.All matrices had a condition number κ = 10.The plot employs a log-log scale.The data were fitted with a power function y ∼ n 8.5 , suggesting that the N dependence is polylogarithmic.
simulator.For each run on the QPU, the value of the cost function approaches zero, indicating that a good solution to the linear system was found.
Additional experiments performed on quantum hardware are presented in the Supplemental Material.

Discussion
In this work, we presented a variational quantumclassical algorithm called VQLS for solving the quantum linear systems problem.On the analytical side, we presented four different faithful cost functions, we derived efficient quantum circuits to estimate them while showing that they are difficult to estimate classically, and we proved operational meanings for them as upper bounds on ϵ 2 /κ 2 .On the numerical side, we studied the scaling of the VQLS run time by solving non-trivial problems of size up to 2 50 × 2 50 .For the examples considered, we found VQLS to scale efficiently, namely, at worst linearly in κ, logarithmically in 1/ϵ, and polylogarithmically in the linear system size N .
It remains to be seen how the VQLS training is affected by finite sampling, which is not accounted for in our heuristics.Our solution verification procedure in Sec.2.1.3will require the shot noise to appropriately scale with ϵ and κ as dictated by (9).Namely, the number of shots would need to scale as (κ/ϵ) 4 , although this complexity might be reduced if one does not require solution verification.
Furthermore, we utilized Rigetti's Quantum Cloud Services to implement VQLS for a particular problem up to a size of 1024 × 1024, which to our knowledge is the largest implementation of a linear system on quantum hardware.Interestingly, with our implementation on Rigetti's hardware, we noticed some preliminary evidence of noise resilience, along the same lines as those discussed in Ref. [60] for a different varia-  (26).One can observe that for each QPU run the cost function is reduced to a value below 10 −1 .Due to noise present in the quantum device the cost does not go to zero.tional algorithm.Namely, we noticed optimal parameter resilience, where VQLS learned the correct optimal parameters despite various noise sources (e.g., measurement noise, decoherence, gate infidelity) acting during the cost evaluation circuit.We will explore this in future work, including tightening our certification bound in (9) when accounting for noise.
Finally, we discuss how VQLS fits into the larger literature on quantum algorithms for linear systems.Most prior algorithms rely on time evolutions with the matrix A [6,7,9] or a simple function of it [8].In these algorithms, the duration of the time evolution is O(κ) in order to prepare a state |x⟩ that is ϵ-close to the correct answer.In general, this can only be achieved with a quantum circuit of size linear in κ as per the "no fast-forwarding theorem" [63,64].This is even true if there exists a very short quantum circuit that prepares the desired state |x⟩.The non-variational algorithms simply cannot exploit this fact.On the other hand, a variational algorithm with a short-depth ansatz might be used to prepare such a state.This does not mean, however, that the overall complexity of the variational algorithm does not depend on the condition number.This dependence enters through the stopping criteria given in (9).As the condition number increases, the cost has to be lowered further in order to guarantee an error of ϵ.This will undoubtedly require more iterations of the variational loop to achieve.In effect, our variational approach trades the gate complexity of non-variational algorithms with the number of iterations for a fixed circuit depth.This trade-off can be useful in utilizing NISQ devices without error correction.
We remark that other variational approaches to the QLSP distinct from ours were very recently proposed [65,66].Relatively speaking, the distinct aspects of our work include: (1) our quantitative certification procedure for the solution, (2) our clear approach to improve trainability for large-scale prob-lems, (3) our novel circuits for efficient cost evalutaion, (4) our large-scale heuristics demonstrating efficient scaling, and (5) our large-scale implementations on quantum hardware.Finally, it exciting that, shortly after our paper was posted, two independent tutorials for the VQLS algorithm were created and added to IBM's open-source Qiskit textbook [67], and to Xanadu's PennyLane library [68].

A Sparse Matrices
In this section we describe how sparse matrices can be expressed as a linear combination of unitaries.Once this is achieved, the cost function can be computed using the methods described elsewhere in this paper.The construction below from Ref. [8] uses a version of Szegedy walks that applies to Hermitian matrices [69,70].Let A be a d-sparse matrix of dimension N and n = log 2 N .We define unitary operations U x , U y , and S that act as follows: The first two registers have n qubits each, the last two registers have a single qubit each, and F j is the set of indices i for which A ji is nonzero.It follows that where we defined | 0⟩ := |0⟩|0⟩|0⟩ for the state of the last three registers and P = | 0⟩⟨ 0|.Equation ( 31) is a decomposition of A as a linear combination of 4 unitaries with equal weights of d/4.We assume access to an oracle for A that acts as Here, j and i label the row and column of A, respectively, so that j, i ∈ {1, . . ., N }, and f (j, l) is the column index of the l'th nonzero element of A in row j.We refer to this oracle as O A .This is the same as that used in previous works for the QLSP and Hamiltonian simulation such as Refs.[9,70].
U x can then be implemented in five steps as follows: The third register is used to temporarily store the matrix elements of A and is discarded at the end.Its size depends on the precision with which the matrix elements of A are specified.A similar procedure can be followed to implement U y .Next, we briefly analyze the gate complexity of the unitaries in the decomposition of A given by Eq. (31).
Since both U x and U y use 3 queries to O A , each unitary uses 6 queries.Other than the queries the main gate complexity comes from the implementation of the operations U x and U y .The gate complexity of both of these strongly depends on the form and precision with which the matrix elements of A are specified.The gate complexity of the unitaries e iπP is O(log N ) = O(n).
Finally, we note that the decomposition in Eq. (31) has a prefactor proportional to d multiplying all the unitaries in the linear combination.This implies that in order to compute the cost function with a given precision, we have to estimate the expectation value of each unitary with a precision that is inversely proportional to d.For this reason we can only use this approach for sparse matrices where d ≪ N .

B Faithfulness of the cost functions
We now prove (8), which is restated here: For the lower bound, let For the upper bound, note that Π G = z̸ =0 |z⟩⟨z|.Let S j denote the set of all bitstrings that have a one at position j, and let S = j S j denote the union of all of these sets.Then

Hence we have nH
Equation (8) implies the faithfulness of the cost functions as follows.Because Π G ⩾ 0 and Π L ⩾ 0, we have that all four cost functions are non-negative.Furthermore, it is clear that if A|x⟩ = |b⟩, then we have

C Cost evaluation circuits
In this section we present short-depth circuits for computing the cost functions of Eqs.(3), ( 5) and (6).In particular, we introduce the Hadamard-Overlap Test circuit, which should be of interest on its own as it is likely to have applications outside of the scope of VQLS.

C.1 Hadamard Test
Figure 9(a) shows a Hadamard Test which can be used to measure the coefficients β ll ′ defined in (15), and used to compute ⟨ψ|ψ⟩ as in (14).When the phase gate is excluded, the probability of measuring the ancilla qubit in the |0⟩ a state is P (0) = (1 + Re[β ll ′ ])/2, while the probability of measuring it in the |1⟩ a state is As we now show, in order to compute the coefficients δ (j) ll ′ in (18) we can use the previous result combined with those obtained by means of the Hadamard test of Figure 9(c).In particular, since (37)   Hence, in order to calculate δ (j) ll ′ one only needs to measure the real and imaginary parts of the matrix elements

C.2 Hadamard-Overlap Test
Consider the circuit in Fig. 9(b), which we refer to as the Hadamard-Overlap Test.A nice feature of the Hadamard-Overlap Test is that it only requires one application of both U and V , and these unitaries do not need to be controlled, in contrast to the Hadamard Test.As explained below, the circuit for the Hadamard-Overlap Test can be obtained by combining the Hadamard Test with the Overlap circuit of Refs.[47,71].This circuit requires 2n + 1 qubits and classical post-processing (which scales linearly with n) similar to that of the Overlap circuit.
Given that the Hadamard-Overlap Test combines two different ideas, let us describe its inner workings in detail.First, the circuit is initialized to the state |0⟩ a ⊗ |0⟩ ⊗ |0⟩, with |0⟩ = |0⟩ ⊗n .We will denote the first qubit, initialized to |0⟩ a , as the ancilla, while we will refer to the next two sets of n-qubits as subsystems S 1 and S 2 , respectively, as in Fig. 9(b).One then applies a Hadamard gate to the ancilla qubit, V to qubits in S 1 , and the unitary U to the qubits in S 2 , producing the state The application of the controlled A l and A † l ′ unitaries leads to and calculate the inner product ⟨ψ|ψ⟩ of (14).The phase gate in the colored box is excluded when calculating the real part of β ll ′ and included when calculating its imaginary part.b) Hadamard-Overlap Test used to compute the coefficients γ ll ′ defined in (17).The Overlap circuit of Refs.[47,71] is indicated in the dashed box.Here, the Rz gate in the colored box denotes a rotation about the z axis of an angle −π/2.Excluding (including) this rotation allows one to calculate the real (imaginary) part of γ ll ′ .As explained in the text, additional post-processing is required.c) Hadamard Test circuit for computing δ (j) ll ′ as defined in (37).Shown here is case when j = 1.
Next, let us consider the case when the R z gate in Fig. 9(b) is excluded.The next Hadamard gate on the ancilla leads to From here, one needs to perform a measurement over state |ψ⟩.As shown in Fig. 9(b), we measure the ancilla on the computational basis and the S 1 and S 2 qubits on the Bell basis.More specifically, we can see that we are measuring the first qubit on S 1 and the first qubit in S 2 in the Bell basis, as well as the second qubit on S 1 and the second qubit in S 2 in the Bell basis, and so-on.At this point, we find it convenient to recall that the Bell basis on two qubits is composed of four states: Note that given a two-qubit state ρ, and denoting as δ p j q j ,11 P (xϕ for x = 0, 1, one can readily see that where SWAP n is the SWAP operator that exchanges the state of the qubits in S 1 and S 2 .An explicit evaluation leads to and Then, combining (40) and (41) yields Following a similar procedure, it can be shown that including the R z gate allows us to calculate Im[γ ll ′ ].
Note that the Hadamard-Overlap test can also be used to compute the real and imaginary parts of δ (j) ll ′ in (18).In this case an additional random unitary R j must be initially applied to the qubits in register S 2 in order to generate the input state |0 j ⟩⟨0 j | ⊗ 1 j .Specif- ically, R j randomly applies a bit-flips to all qubits except qubit j: with r = r 1 , r 2 . . ., r n a random bitstring of length n.

D Proof of Proposition 1
Here we prove Proposition 1 of the main text, which we restate for convenience.

Proposition 1. The problem of estimating the VQLS cost functions C
Proof.Let us first show that estimating C G and C G is DQC1-hard.Our proof is a reduction from the problem of estimating the Hilbert-Schmidt inner-product magnitude ∆ HS between two quantum circuits Ũ and Ṽ acting on n-qubits [33], where we have defined In particular, let us consider the following specific case of estimating ∆ HS , which in turn can be identified as a specific instance of approximating the cost functions C G or C G .Let A = 1, and let |x⟩ and |b⟩ be 2n-qubit states given by where E is an efficient unitary gate that produces a maximally entangled state (e.g., a depth-two circuit composed of Hadamard and CNOT gates).Note that here |x⟩ and |b⟩ correspond to the Choi states of Ṽ and Ũ , respectively.The global cost function is given by Moreover, it is known that approximating ∆ HS to within inverse polynomial precision is DQC1hard [33], and hence the result follows.We additionally remark that estimating Eq. ( 47) can also be interpreted as estimating the Fidelity between two pure states, which was also shown to be DQC1-hard [36].
We now show that estimating C L and C L is also DQC1-hard.In this case our proof is reduced to the problem of approximating the trace of a unitary matrix W .Let |x⟩ = V |0⟩ and |b⟩ = U |0⟩ be 2n + 1qubit states and let A = 1.Moreover, as depicted in Fig. 10, let us denote the first qubit as S 1 , while qubits 2, . . ., n + 1 compose system S 2 , and qubits n + 2, . . ., 2n + 1 compose S 3 .Let where H S1 is a Hadamard gate acting on S 1 , C S1S2 W denotes a controlled-W gate with S 1 the control and S 2 the target, and E is a maximally-entangling gate on S 2 and S 3 .The local cost is then Consider first the case when j = 1 and we measure the qubit in S 1 .It is straightforward to see that the probability of measuring this qubit in the |0⟩ state is P (0) = (1 + Re(Tr W ))/2. On the other hand, the probability P (0) = 1/2 for all qubits in S 2 or S 3 .Hence, we find which implies that the real part of Tr W can be computed as Similarly, by adding a phase gate on S 1 (as indicated in Fig. 10) we can compute the imaginary part of Tr W .By choosing U and V according to (50) one finds that the problem of estimating the local cost function up to inverse polynomial precision is equivalent to approximating the real (or imaginary) part of Tr W . Hence, computing C L or C L is hard for DQC1, since all problems in DQC1 can be seen as estimating the real part of a trace of a unitary matrix [33].

E Variable ansatz optimization
Here we discuss the optimization method employed for the heuristics in Section 2.2.2.As mentioned in the main text, we employed a variable-structure ansatz where the gate placement and the type of gates in V (α) can change during the optimization.Our approach here is similar to the variable-structure ansatzes employed in Refs.[30,47].
First, the gate structure and the angles of V (α) are randomly initialized.That is, one randomly chooses k, and α in (19).Then, the optimization is performed in two alternating loops: an inner loop and an outer loop.During the inner loop, k is fixed and one optimizes over α.Once a local minima is reached, the circuit layout is changed in the outer optimization loop.In this outer loop, the circuit is randomly grown by inserting into V (θ) a sequence of parametrized gates which compile to identity, such that they do not change the cost value.The previous process is then repeated by alternating between the inner and outer loops until the optimization termination condition is met.
Here we remark that the goal of the outer loop is to enhance the expressivity of V (α) and lead to smaller cost values during the next inner loop.However, it may happen that after growing the circuit, the optimizer is not able to minimize the cost function.This is due to the fact that some gate insertions do not lead to more expressive circuits.In order to avoid such unnecessary circuit growth, one can then accept the parametrized gate insertion conditioned to leading to smaller cost values.

F Gradient-based optimization
Here we derive analytical expressions for the gradients of our cost functions, and we show that these gradients can be computed with the same circuits as those introduced in Section C.This enables one to take a gradient-descent optimization approach to VQLS, and is inspired by gradient-based approaches in previous work [33,57] for other VHQCAs.
For simplicity, let us first consider the global cost functions.Let us recall from that the trainable unitary V (α) can be expressed as a sequence of gates G(α i ), where we have dropped the subscript on G.In turn, each gate G(α i ) can always be parametrized by a single-qubit rotation angle of the form e −iαiσi/2 .The gradient of V (α) with respect to α is then given by and each partial derivative is As we now show, the gradient of β ll ′ and γ ll ′ can be computed with the Hadamard-Overlap test and the Hadamard test, respectively.Let us consider the partial derivatives which is valid for any matrix A, we combine Eqs. ( 55)-( 57) to obtain where we have defined Each term in (58) can be computed by means of the Hadamard-Overlap test, while the terms in (59) can be determined via the Hadamard test.These results imply that the gradient with respect to α of C G and C G are determined by and and hence that they can be computed by means of the Hadamard-Overlap test and the Hadamard test.
A similar derivation can be used with Hence the gradient of the local cost functions, C L and C L , can also be computed with the circuits in Sec. C.

G Resilience to simple noise models
Here we show that the normalized cost functions C G and C L are resilient to certain kinds of noise.In particular, we consider global depolarizing noise and measurement noise, which are relatively simple noise models.We leave the analysis of more complicated noise models for future work.
For noise resilience, we employ the definition in Ref. [60], where they define Optimal Parameter Resilience (OPR).A cost function exhibits OPR to a given noise model if this noise does not shift the global minima in parameter space, i.e., if the optimal parameters are not changed by the noise.
OPR is very important to the success of VQLS for the following reason.If VQLS exhibits OPR, then that means that optimizing the noisy cost function can still lead to the correct optimal parameters.Then, these correct parameters, one can prepare the state |x⟩ corresponding to the correct solution of the QLSP.One can then use standard error mitigation techniques, such as zero-noise extrapolation, to obtain noise-free estimates of observable expectation values of the form ⟨x|O|x⟩. Hence, OPR during the optimization process combined with error mitigation after the optimization is over can potentially lead to accurate estimation of observables on the true solution of the QLSP.
Let us first consider global depolarizing noise, which transforms the state according to ρ → Note that if a circuit has L layers, with noise acting after each layer, then the final state is q L ρ + (1 − q L )1/d.First consider μ, which is estimated by the circuit in Fig. 9(a).We assume noise affects the estimation of the real and imaginary parts of βll ′ to an equal extent, where βll ′ is the noisy version of β ll ′ .The maximally mixed state has zero expectation value for the circuit in Fig. 9(a), which measures the Pauli Z operator on the ancilla.Therefore, we obtain that βll ′ = q L(β ll ′ ) β ll ′ .Here, L(β ll ′ ) is the number of layers in the circuit used to estimate β ll ′ .Hence, we have μ = ll ′ c l c * l ′ q L(β ll ′ ) β ll ′ .Finally, one can make an additional assumption that L(β ll ′ ) = L β is independent of l and l ′ , which is a reasonable approximation since the depth of the circuit will likely be dominated by the ansatz V rather than by the controlled gates in Fig. 9(a).This gives: Next consider ν, which we assume is estimated by the circuit in Fig. 9(b).We assume noise affects the estimation of the real and imaginary parts of γll ′ to an equal extent, where γll ′ is the noisy version of γ ll ′ .The maximally mixed state has zero expectation value for the circuit in Fig. 9(b), which measures the Pauli Z operator on the ancilla.Therefore, we obtain that γll ′ = q L(γ ll ′ ) γ ll ′ .Here, L(γ ll ′ ) is the number of layers in the circuit used to estimate γ ll ′ .Hence, we have ν = ll ′ c l c * l ′ q L(γ ll ′ ) γ ll ′ .Finally, one can make an additional assumption that L(γ ll ′ ) = L γ is independent of l and l ′ , which is a reasonable approximation since the depth of the circuit will likely be dominated by the ansatz V rather than by the controlled gates in Fig. 9(b).This gives: Combining ( 65) and (66) gives From this expression, we see that Furthermore, it is clear that Hence we arrive at This proves our desired statement of OPR, since it shows that the optimal parameters are unaffected by the noise.

G.2.1 Global depolarizing noise
Now consider the local cost C L .We can write where ll ′ .Let CL , ω, and δ(j) ll ′ denote the noisy versions of these quantities.
Recall that we can expand δ (j) ll ′ according to (37) as δ (j) under the same assumptions as those considered above.This gives where ll ′ is estimated by the circuit in Fig. 9(c).We assume noise affects the estimation of the real and imaginary parts of ζ(j) ll ′ to an equal extent.The maximally mixed state has zero expectation value for the circuit in Fig. 9(c), which measures the Pauli Z operator on the ancilla.Therefore, we obtain that ζ(j) ll ′ ) is the number of layers in the circuit used to estimate ζ (j) ll ′ .Finally, one can make an additional assumption that L(ζ and j, which is a reasonable approximation since the depth of the circuit will likely be dominated by either the ansatz V or the unitary U , rather than by the controlled gates in Fig. 9(c).This gives ζ(j) where |χ⟩ = U † A|x⟩ is an unnormalized state, and ) is a Hermitian (but not necessarily positive semidefinite) matrix.Note that we have μ = q L β µ, which gives: where | χ⟩ = |χ⟩/ √ µ is a normalized state.
Let us now argue that CL is a faithful cost function, reaching its minimum value iff A|x⟩ ∝ |b⟩.Let us first note that the minimum value of the cost can be found as follows.Note that Tr(|χ⟩⟨χ|(ρ j ⊗ 1 j )) is not larger than the largest eigenvalue of (ρ j ⊗ 1 j ), and this eigenvalue is The lower bound here is actually achievable and hence corresponds to the minimum value of CL .Let us denote this minimum value as Cmin L .Namely, if we assume that |x⟩ is a solution to the QLSP, then we have A|x⟩ ∝ |b⟩, which implies |χ⟩ ∝ |0⟩ and | χ⟩ = |0⟩.In turn, this implies Tr(|χ⟩⟨χ|(ρ j ⊗ 1 j )) = (1 + q L ζ /q L β )/2 and hence CL = Cmin L .Conversely, if we assume that CL = Cmin L , then this implies that | χ⟩ must be an eigenvector of (ρ j ⊗ 1 j ) for all j, with an eigenvalue of (1 + q L ζ /q L β )/2.This implies that Tr j (| χ⟩⟨ χ|) = |0⟩⟨0| for all j, i.e., that | χ⟩ is locally the zero state on each qubit.The only state that satisfies this is | χ⟩ = |0⟩.This state corresponds to A|x⟩ ∝ |b⟩, and hence the QLSP is solved in this case.
This completes the argument that CL is a faithful cost function.Note that this faithfulness property corresponds to OPR, assuming the ansatz V (α) is capable of expressing the true solution.Hence we have proven OPR for CL under global depolarizing noise.

G.2.2 Measurement noise
Finally, we remark that the above proof has implications for measurement noise as well.Measurement noise that is symmetric with respect to the 0 and 1 outcomes can be modeled as a local depolarizing channel acting on the state just prior to the measurement.Moreover, if only a single qubit is being measured, then this local depolarizing channel can be replaced by a global depolarizing channel.We remark that this is the case for the circuits in Fig. 9(a) and (c), which are the circuits used to compute the local cost CL .Hence, for this local cost, any symmetric measurement noise can mathematically be viewed as an additional global depolarizing channel acting just prior to measurement.This means that CL exhibits OPR to a noise model that includes both global depolarizing as well as symmetric measurement noise.
Supplementary Figure S2: Time-to-solution versus condition number κ.The time-to-solution is the mean number of executions needed to guarantee a desired precision ϵ.The QLSP is determined by |b⟩ of (S79), and A given by: Left: Matrix A1 of (S76).Center: Matrix A2 of (S77).Right: Matrix A3 of (S78).For each data point we ran and averaged 1000 instances of the VQLS algorithm.In all cases we trained the gate sequence by minimizing CG, CG, CL, and CL.As can be seen, the scaling in terms of the condition number κ appears to be efficient for all A matrices and for all cost functions.
Supplementary Table S1: Expectation value of an observable M computed with the exact solution, and with the output solution of VQLS.D(M ) measures the difference between these two results.The linear system considered is A2×2 = H, and |b⟩ = X|0⟩.
was performed with the Powell method [72].For every run, the local cost function C L of (6) achieved a value of ∼ 7 × 10 −2 (hardware noise prevented further cost reduction).While this cost value led to a trivial bound on ϵ via (9), we nevertheless found the solution |x⟩ to be of high quality.We verified this by measuring the expectation value of different Hermitian observables M on the state |x⟩ prepared on the quantum computer.According to (11), we can use D(M ) 2 as a figure of merit to quantify the quality of our solution.For all M we considered, D(M ) 2 was no larger than 0.01, and hence the results have a good agreement with the exact solution.See Table S7 for all values of D(M ) 2 .Figure S5 shows the value of the cost function versus the number of optimization steps for different linear systems and for several runs.It is worth mentioning that the cost function is reduced to values ≲ 0.1 for every example, except for the case depicted in panel (b).In this particular case, the solution of the 2 × 2 linear system is |x 0 ⟩ = |1⟩.Therefore, one may note the effect of relaxation to the state |0⟩ in the quantum device, which likely significantly affected the result quality.The Tables S1, S2, S3, S4, S5 and S6 correspond to the examples shown in Figure S5.In the tables we show the expectation values of several observables M , obtained from the output of the VQLS and we compare them to the exact ones.
Supplementary Figure S3: (a) Landscape for CG with a QAOA ansatz of p = 1 layer and unscaled parameters α1 and α2.Here, α1 (α2) corresponds the the parameter in the driver (mixer) Hamiltonian.(b) Landscape for CG with a QAOA ansatz of p = 1 layer where α1 was scaled by the condition number κ.In both cases the QLSP is defined by a randomly generated 4 × 4 matrix with condition number κ ≈ 11, and with |b⟩ given by (S79).The scaled landscape contains more regions of low cost and thus makes optimization more likely to be successful.(c) Time-to-solution versus condition number κ for the QLSP on three qubits defined by the A2 matrix of (S77) and with |b⟩ given by (S79).Three curves are shown for ϵ 2 = 0.10, 0.02, and 0.01.The inset depicts the same data on a logarithmic scale.As can be seen from the inset, the scaling in κ is sub-exponential for each ϵ considered.

Figure 1 :
Figure 1: Schematic diagram for the VQLS algorithm.The input to VQLS is a matrix A written as a linear combination of unitaries A l and a short-depth quantum circuit U which prepares the state |b⟩.The output of VQLS is a quantum state |x⟩ that is approximately proportional to the solution of the linear system Ax = b.Parameters α in the ansatz V (α) are adjusted in a hybrid quantum-classical optimization loop until the cost C(α) (local or global) is below a user-specified threshold.When this loop terminates, the resulting gate sequence V (αopt) prepares the state |x⟩ = x/||x||2, from which observable quantities can be computed.Furthermore, the final value of the cost C(αopt) provides an upper bound on the deviation of observables measured on |x⟩ from observables measured on the exact solution.

Figure 2 :
Figure 2: Comparison of local CL and global CG cost performance.Here we consider the QLSP of Eq. (26) for different system sizes.In all cases κ = 20.For each n ∈ {10, . . ., 50}, we plot the cost value versus the number of cost function evaluations.As n increases it becomes increasingly hard to train to global cost function.At n = 50, our optimization cannot significantly lower CG below a value of one.On the other hand, we are able to train CL for all values of n considered.

Figure 3 :
Figure3: Fixed-structure layered Hardware-Eficient Ansatz for V (α).As indicated by the dashed box, each layer is composed of controlled-Z gates acting on alternating pairs of neighboring qubits which are preceded and followed by single qubit rotations around the y-axis, Ry(αi) = e −iα i Y /2 .Shown is the case of four layers and n = 10 qubits.The number of variational parameters and gates scales linearly with n: for n = 50, four layers of this ansatz consist of 640 gates and 440 variational parameters.

Figure 4 :
Figure4: Scaling with κ for the Ising-inspired QLSP.The time-to-solution is the number of executions needed to guarantee a precision of ϵ = 0.002 (solid line) and ϵ = 0.01 (dashed line).Curves are shown for n = 10, 20, 30 qubits.In each case we averaged over 30 runs of the VQLS algorithm with four layers of the Layered Hardware-Efficient Ansatz of Fig.3, and we trained the gate sequence by minimizing CL of (6).While the the κ scaling appears to be sub-linear here, it is known that linear scaling is optimal in general[6], and hence the observed scaling is likely specific to this example.

Figure 5 :
Figure 5: Scaling with 1/ϵ for the Ising-inspired QLSP.Curves are shown for n = 10, 20, 30 qubits, with κ = 60 (solid line) or κ = 200 (dashed line).In all cases V (α) was composed of four layers of the Layered Hardware-Efficient Ansatz of Fig.3, and we trained the local cost CL.The time-to-solution was obtained by averaging over 30 runs of the VQLS algorithm.The inset depicts the same data in a logarithmic scale.The dependence on 1/ϵ appears to be logarithmic, i.e., linear on a logarithmic scale.

Figure 6 :
Figure 6: Scaling with n for the Ising-inspired QLSP.Curves are shown for ϵ = 0.01 and for κ = 60, 120, 200.In all cases we trained the local cost CL with four layers of the Layered Hardware-Efficient Ansatz of Fig. 3.The dependence on n appears to be linear (logarithmic in N ) for this example.

Figure 8 :
Figure8: Implementation of VQLS on Rigetti's quantum hardware.Cost function CG is plotted against the number of optimization steps, where A is defined in(26).One can observe that for each QPU run the cost function is reduced to a value below 10 −1 .Due to noise present in the quantum device the cost does not go to zero.
which can be accomplished by means of the circuit in Fig.9(c).

Figure 9 :
Figure 9: a) Circuit for the Hadamard Test used to compute the coefficientsβ ll ′ = ⟨0|V † A † l ′ A l V |0⟩and calculate the inner product ⟨ψ|ψ⟩ of(14).The phase gate in the colored box is excluded when calculating the real part of β ll ′ and included when calculating its imaginary part.b) Hadamard-Overlap Test used to compute the coefficients γ ll ′ defined in(17).The Overlap circuit of Refs.[47,71] is indicated in the dashed box.Here, the Rz gate in the colored box denotes a rotation about the z axis of an angle −π/2.Excluding (including) this rotation allows one to calculate the real (imaginary) part of γ ll ′ .As explained in the text, additional post-processing is required.c) Hadamard Test circuit for computing δ

1 (− 1 ) 1 ϕ p2q2 2 .p2q2 2 .
Tr[ρ|ϕ pq ⟩⟨ϕ pq |] the probability of obtaining the outcome |ϕ peq ⟩, by measuring ρ on the Bell basis, then pq=0,δ pq,11 P (ϕ pq ) = P (ϕ 00 ) + P (ϕ 01 ) + P (ϕ 10 ) − P (ϕ 11 ) = Tr[ρSWAP] , (38)where here SWAP denotes the standard two-qubit SWAP gate.The previous result follows from the fact that the SWAP gate is diagonal in the Bell basis with |ϕ 00 ⟩, |ϕ 01 ⟩, |ϕ 10 ⟩ being eigenvectors of eigenvalue one, and |ϕ 11 ⟩ being an eigenvector of eigenvalue minus one.As such, a measurement on the Bell basis plus appropriate classical post-processing allows us to compute the expectation value of the SWAP operator.From the previous, let us note that the measurement outcome from Fig.9(b) can be expressed by the tuple xϕ p1q1 . .ϕ pnqn n where x = 0, 1 denotes the measurement outcome of on the ancilla qubits, and ϕ pj qj j with p j , q j = 0, 1 and the Bell basis measurement outcome on the j-th qubits of S 1 and S 2 .With this notation, P (xϕ p1q1 1 ϕ

Figure 10 :
Figure 10: Schematic representation of the circuit used to compute CL for the specific case when A = 1, and |x⟩ = V |0⟩, |b⟩ = U |0⟩ are 2n + 1-qubit states such that U † V is given by (50).Shown is the measurement of the qubit in S1.The dashed box indicates the entangling gate E. The phase gate in the colored box is excluded when calculating the real part of Tr W and included when calculating the imaginary part.
Conversely, assuming that A|x⟩ ̸ = |b⟩ implies that C G > 0 and hence that all four cost functions are positive.Therefore, all four cost functions are faithful, vanishing if and only if A|x⟩ = |b⟩.
Hence, by means of the Hadamard Test we can compute the real part of β ll ′ as Re[β ll ′ ] = P (0) − P (1) .With a similar argument it can be easily shown that by including the phase gate one can compute Im[β ll ′ ].

Table S6 :
Supplementary TableS2: Expectation value of an observable M computed with the exact solution, and with the output solution of VQLS.D(M ) measures the difference between these two results.The linear system considered is A2×2 = 1 Expectation value of observables M computed with the exact solution, and with the output solution of VQLS.D(M ) measures the difference between these two results.The linear system considered is A32×32 = 1 + 0.25X5, and |b⟩ = H ⊗5 |0⟩.

Table S7 :
Expectation value of observables M computed with the exact solution, and with the output solution of VQLS.D(M ) measures the difference between these two results.The linear system considered is A32×32 = 1 + 0.2X1Z2 + 0.2X1, and |b⟩ = H1H3H4H5|0⟩.