Variational Quantum Singular Value Decomposition

Singular value decomposition (SVD) is central to many problems in engineering and the sciences. Several quantum algorithms have been proposed to determine the singular values and their associated singular vectors of a given matrix. Although these algorithms are promising, the required quantum subroutines and resources are too costly on near-term quantum devices. In this work, we propose a variational quantum algorithm for singular value decomposition (VQSVD). By exploiting the variational principles for singular values and the Ky Fan Theorem, we design a novel loss function such that two quantum neural networks (or parameterized quantum circuits) can be trained to learn the singular vectors and output the corresponding singular values. Furthermore, we conduct numerical simulations of VQSVD for random matrices as well as its application to image compression of handwritten digits. Finally, we discuss applications of our algorithm in recommendation systems and polar decomposition. Our work explores new avenues for quantum information processing beyond the conventional protocols that only work for Hermitian data, and reveals the capability of matrix decomposition on near-term quantum devices.


Introduction
Matrix decompositions are integral parts of many algorithms in optimization [1], machine learning [2], and recommendation systems [3]. One crucial approach is the singular value decomposition (SVD). Mathematical applications of the SVD include computing the pseudoinverse, matrix approximation, and estimating the range and null space of a matrix. The SVD has also been successfully applied to many areas of science and engineering, such as data compression, noise reduction, and image processing. The goal of SVD is to decompose a square matrix $M$ as $M = UDV^\dagger$ with a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_r)$ and unitaries $U$ and $V$, where $r$ denotes the rank of the matrix $M$.
Quantum computing is believed to deliver new technology to speed up computation, and it already promises speedups for integer factoring [4] and database search [5] in theory. Enormous efforts have been made to explore the possibility of using quantum resources to speed up other important tasks, including linear system solvers [6][7][8][9][10], convex optimization [11][12][13][14], and machine learning [15][16][17]. Quantum algorithms for SVD have been proposed in [18,19], which lead to applications in solving linear systems of equations [9] and developing quantum recommendation systems [18]. However, these algorithms are too costly to be convincingly validated on near-term quantum devices, which only support a restricted number of physical qubits and limited gate fidelity.
In this paper, we formulate the task of SVD as an optimization problem and derive a variational quantum algorithm for singular value decomposition (VQSVD) that can be implemented on near-term quantum computers. The core idea is to construct a novel loss function inspired by the variational principles and properties of singular values. We theoretically show that quantum neural networks optimized with this loss function learn the singular vectors of a given matrix. That is, we train two quantum neural networks $U(\alpha)$ and $V(\beta)$ to learn the singular vectors of a matrix $M$ in the sense that $M \approx U(\alpha) D V(\beta)^\dagger$, where the diagonal matrix $D$ provides the singular values. Our approach generalizes the conventional methods of Hamiltonian diagonalization [29,50] to the non-Hermitian regime, extending the capabilities of matrix decomposition on near-term quantum computers. As a proof of principle, we conduct numerical simulations to estimate the SVD of random $8 \times 8$ matrices. Furthermore, we explore the possibility of applying VQSVD to compress images of $32 \times 32$ pixels, including images from the famous MNIST dataset. Finally, we showcase the applications of VQSVD in recommendation systems and polar decomposition.

Variational quantum singular value decomposition
In this section, we present a variational quantum algorithm for the singular value decomposition of $n \times n$ matrices; it can be naturally generalized to $n \times m$ complex matrices. For a given matrix $M \in \mathbb{R}^{n \times n}$, there exists a decomposition of the form $M = UDV^\dagger$, where $U, V \in \mathbb{R}^{n \times n}$ are unitary operators and $D$ is a diagonal matrix with $r$ positive entries $d_1, \ldots, d_r$, with $r$ the rank of $M$. Alternatively, we write $M = \sum_{j=1}^r d_j |u_j\rangle\langle v_j|$, where $\{|u_j\rangle\}$, $\{|v_j\rangle\}$, and $\{d_j\}$ are the left singular vectors, right singular vectors, and singular values of $M$, respectively.
A vital issue in designing NISQ algorithms is choosing a suitable loss function. A desirable loss function here should output the target singular values and vectors after the optimization and, in particular, should be implementable on near-term devices. As a key step, we design such a loss function for quantum singular value decomposition (cf. Section 2.2).
The input of our VQSVD algorithm is a decomposition of the matrix $M$ into a linear combination of $K$ unitaries of the form $M = \sum_{k=1}^K c_k A_k$ with real numbers $c_k$. For instance, one could decompose $M$ into a linear combination of Pauli terms; a method for finding such a decomposition was proposed in a recent work [51]. Several important generic matrices in engineering and science also admit such LCU forms, including Toeplitz, circulant, and Hankel matrices [52]. After taking in the inputs, our VQSVD algorithm enters a hybrid quantum-classical optimization loop to train the parameters $\alpha$ and $\beta$ in the parameterized quantum circuits $U(\alpha)$ and $V(\beta)$ via a designed loss $L(\alpha, \beta)$ (cf. Section 2.2). This loss function can be computed on quantum computers via Hadamard tests. The value of the loss function, or its gradients in gradient-based optimization, is then fed to a classical computer, which adjusts the parameters $\alpha$ and $\beta$ for the next round of the loop. The goal is to find the global maximum of $L(\alpha, \beta)$, i.e., $\alpha^\star, \beta^\star := \arg\max_{\alpha,\beta} L(\alpha, \beta)$ (in practice, one minimizes $-L(\alpha, \beta)$).
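For concreteness, the following minimal numpy sketch (our illustration, not the implementation used in the paper or in [51]; function names are ours) computes such a Pauli-basis LCU decomposition via Hilbert-Schmidt inner products. For non-Hermitian $M$ the coefficients may be complex; their phases can be absorbed into the unitaries $A_k$.

```python
# Sketch: decompose a 2^n x 2^n matrix M into the Pauli basis, M = sum_k c_k P_k.
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_string(labels):
    P = np.array([[1.0 + 0j]])
    for l in labels:
        P = np.kron(P, PAULIS[l])
    return P

def pauli_decompose(M, n):
    """Return {label: c_k} such that M = sum_k c_k * P_k."""
    terms = {}
    for labels in itertools.product("IXYZ", repeat=n):
        P = pauli_string(labels)
        c = np.trace(P.conj().T @ M) / 2**n  # Hilbert-Schmidt inner product
        if abs(c) > 1e-12:
            terms["".join(labels)] = c       # complex phases absorbable into A_k
    return terms

# Sanity check on a random non-Hermitian 3-qubit (8 x 8) matrix.
M = np.random.randn(8, 8)
terms = pauli_decompose(M, n=3)
M_rec = sum(c * pauli_string(l) for l, c in terms.items())
print(len(terms), np.allclose(M, M_rec))     # K terms; reconstruction holds
```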
In practice, one needs to set a termination condition (e.g., a convergence tolerance $\varepsilon$) on the optimization loop. After the hybrid optimization, one obtains values $\{m_j\}_{j=1}^T$ and optimal parameters $\alpha^\star$ and $\beta^\star$. The outputs $\{m_j\}_{j=1}^T$ approximate the singular values of $M$, and approximate singular vectors of $M$ are obtained by applying the parameterized circuits $U(\alpha^\star)$ and $V(\beta^\star)$ to the orthonormal vectors $|\psi_j\rangle$ for all $j = 1, \ldots, T$. The detailed VQSVD algorithm is included in Algorithm 1, and a schematic diagram is shown in Fig. 1. Besides the matrix $M$, the algorithm requires the desired number of singular values $T$, orthonormal input states satisfying $\langle\psi_i|\psi_j\rangle = \delta_{ij}$, and two parameterized unitaries $U(\alpha)$ and $V(\beta)$; the weights are usually set to be the integers $q_j = T + 1 - j$. These inputs are sent to the hybrid optimization loop, where the quantum computer (QPU) estimates each singular value $m_j = \mathrm{Re}\,\langle\psi_j|U(\alpha)^\dagger M V(\beta)|\psi_j\rangle$ via the Hadamard test. These estimates are sent to a classical computer (CPU) to evaluate the loss function until it converges to tolerance $\varepsilon$. Once the global optimum is reached, the singular vectors $\{|\hat u_j\rangle, |\hat v_j\rangle\}$ are produced by applying the learned unitaries $U(\alpha^\star)$ and $V(\beta^\star)$ to the orthonormal basis $\{|\psi_j\rangle\}_{j=1}^T$ to extract the column vectors.

Loss function
In this section, we provide more details and intuition for the loss function in VQSVD. The key idea is to exploit variational principles in matrix computation, which are of great importance in the error analysis of matrix computations. In particular, the singular values satisfy a subtler variational property that incorporates both left and right singular vectors at the same time. For a given $n \times n$ matrix $M$, the largest singular value of $M$ can be characterized by
$$d_1 = \max_{|u\rangle, |v\rangle \in S} \mathrm{Re}\,\langle u|M|v\rangle,$$
where $S$ is the set of pure states (normalized vectors) and $\mathrm{Re}[\cdot]$ means taking the real part. Moreover, denoting the optimal singular vectors by $|u_1\rangle, |v_1\rangle$, the remaining singular values ($d_2 \geq \cdots \geq d_r$) can be deduced by the same maximization restricted to unit vectors orthogonal to the previously found singular vectors, i.e.,
$$d_k = \max\big\{\mathrm{Re}\,\langle u|M|v\rangle : |u\rangle, |v\rangle \in S,\ \langle u|u_j\rangle = \langle v|v_j\rangle = 0\ \forall j < k\big\}.$$
Another useful fact (Ky Fan Theorem, cf. [53,54]) is that
$$\sum_{j=1}^T d_j = \max_{\{|u_j\rangle\},\,\{|v_j\rangle\}\ \text{orthonormal}} \sum_{j=1}^T \mathrm{Re}\,\langle u_j|M|v_j\rangle.$$
For a given matrix $M$, the loss function in our VQSVD algorithm is defined as
$$L(\alpha, \beta) = \sum_{j=1}^T q_j \cdot \mathrm{Re}\,\langle\psi_j|U(\alpha)^\dagger M V(\beta)|\psi_j\rangle,$$
where $q_1 > \cdots > q_T > 0$ are real weights and $\{|\psi_j\rangle\}_{j=1}^T$ is a set of orthonormal states. The setting of the constants $q_j$ not only theoretically guarantees the correct outcome but also makes our approach more flexible; in practice, choosing weights with excellent performance is an empirical matter.
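As a sanity check of this definition, the following toy numpy snippet (a classical stand-in, with exact matrices in place of the circuits $U(\alpha)$ and $V(\beta)$) evaluates the weighted loss and confirms that plugging in the true singular vectors attains the Ky Fan value $\sum_j q_j d_j$:

```python
import numpy as np

n, T = 8, 4
M = np.random.randn(n, n)
U, d, Vh = np.linalg.svd(M)            # exact SVD: M = U @ diag(d) @ Vh
q = np.arange(T, 0, -1).astype(float)  # weights q_1 > ... > q_T > 0
Psi = np.eye(n)[:, :T]                 # orthonormal inputs |psi_j> = |j>

def loss(Umat, Vmat):
    # L = sum_j q_j * Re <psi_j| U^dag M V |psi_j>
    vals = np.real(np.diag(Psi.conj().T @ Umat.conj().T @ M @ Vmat @ Psi))
    return q @ vals

print(loss(U, Vh.conj().T))  # attained at the true singular vectors ...
print(q @ d[:T])             # ... equals the Ky Fan value of Theorem 1
```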

Theorem 1 For a given matrix $M$, the loss function $L(\alpha, \beta)$ is maximized if and only if
$$L(\alpha, \beta) = \sum_{j=1}^T q_j d_j,$$
where $d_1 \geq \cdots \geq d_T$ are the largest $T$ singular values of $M$ and $|\psi_1\rangle, \ldots, |\psi_T\rangle$ are orthonormal vectors. Moreover, $\langle\psi_j|U(\alpha)^\dagger$ and $V(\beta)|\psi_j\rangle$ are the left and right singular vectors, respectively.
Proof Let us assume that $\langle\psi_j|U(\alpha)^\dagger M V(\beta)|\psi_j\rangle = m_j$ are real numbers for simplicity, which can be achieved after the ideal maximization process. Setting $q_{T+1} = 0$ and applying Abel summation, we have
$$\sum_{j=1}^T q_j m_j = \sum_{s=1}^T (q_s - q_{s+1}) \sum_{j=1}^s m_j \leq \sum_{s=1}^T (q_s - q_{s+1}) \sum_{j=1}^s d_j = \sum_{j=1}^T q_j d_j.$$
The inequality holds term by term: since $\{\langle\psi_j|U(\alpha)^\dagger\}$ and $\{V(\beta)|\psi_j\rangle\}$ are orthonormal sets, the Ky Fan Theorem gives $\sum_{j=1}^s m_j \leq \sum_{j=1}^s d_j$ for each $s$, and the coefficients $q_s - q_{s+1}$ are positive because the weights are strictly decreasing. Equality requires every partial sum to achieve its maximum, which forces $m_j = d_j$ for all $j$.
Therefore, the loss function is maximized if and only if $\langle\psi_j|U(\alpha)^\dagger M V(\beta)|\psi_j\rangle$ extracts the singular value $d_j$ for each $j$ from $1$ to $T$. Further, due to the variational property of singular values in Eq. (2) and Eq. (3), we conclude that the quantum neural networks $U$ and $V$ learn the singular vectors in the sense that $\langle\psi_j|U(\alpha)^\dagger$ and $V(\beta)|\psi_j\rangle$ extract the left and right singular vectors, respectively.
In particular, the weights $q_j$ can be any sequence that satisfies the condition in Theorem 1. They are the key to training the quantum neural networks to learn the correct singular vectors, and can be understood as a tool to discriminate and identify the individual singular subspaces.

Proposition 2 The loss function $L(\alpha, \beta)$ can be estimated on near-term quantum devices.
To estimate the quantity in Eq. (12), we can use quantum subroutines for estimating $\mathrm{Re}\,\langle\psi|U|\psi\rangle$ for a general unitary $U$. One such subroutine is the well-known Hadamard test [55], which requires only one ancillary qubit, one copy of the state $|\psi\rangle$, and one controlled-$U$ operation, and hence can be implemented on near-term quantum hardware. To be specific, the Hadamard test (see Fig. 2) starts with the state $|+\rangle_A|\psi\rangle_W$, where $A$ denotes the ancillary qubit and $W$ denotes the work register. It then applies the unitary $U$ to register $W$, controlled on the qubit in register $A$, to prepare the state $\frac{1}{\sqrt{2}}(|0\rangle_A|\psi\rangle_W + |1\rangle_A U|\psi\rangle_W)$, and finally applies a Hadamard gate on the ancillary qubit and measures it. If the measurement outcome is $0$, let the output be $1$; otherwise, let the output be $-1$. The expectation of the output is $\mathrm{Re}\,\langle\psi|U|\psi\rangle$. The imaginary part $\mathrm{Im}\,\langle\psi|U|\psi\rangle$ can be estimated via the same Hadamard test by starting instead with the state $\frac{1}{\sqrt{2}}(|0\rangle - i|1\rangle)_A|\psi\rangle_W$. Remark 1: Notice that the number of terms in Eq. (12) may be exponentially large for matrices arising in particular real-world applications, posing a potential obstacle to the loss evaluation. However, we can apply an importance sampling technique to circumvent this issue; we provide a detailed discussion in Appendix B. In particular, we show that the efficiency of the loss evaluation depends on the $\ell_1$-norm of the coefficients $c_k$ rather than on the integer $K$. Hence, for a matrix with a reasonable $\ell_1$-norm (e.g., of polynomial size), the evaluation can be efficient even for large problem sizes. Besides, we take the circulant matrix as an illustrative example in Appendix B, suggesting potential applications of our approach in related practical fields.
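The following statevector simulation of the Hadamard test (a sketch under the assumption of exact simulation; helper names are ours) illustrates the estimation of $\mathrm{Re}\,\langle\psi|U|\psi\rangle$ and, with the ancilla prepared in $(|0\rangle - i|1\rangle)/\sqrt{2}$, of the imaginary part:

```python
import numpy as np

def hadamard_test(U, psi, imaginary=False, shots=100000, seed=0):
    """Finite-shot estimate of Re (or Im) <psi|U|psi> via the Hadamard test."""
    rng = np.random.default_rng(seed)
    d = len(psi)
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    state = np.kron(H @ np.array([1, 0], dtype=complex), psi)  # |+>|psi>
    if imaginary:                                # |-i>|psi> for the Im part
        state = np.kron(np.diag([1, -1j]), np.eye(d)) @ state
    cU = np.block([[np.eye(d), np.zeros((d, d))],
                   [np.zeros((d, d)), U]])       # U controlled on the ancilla
    state = np.kron(H, np.eye(d)) @ (cU @ state) # final Hadamard on ancilla
    p0 = np.linalg.norm(state[:d])**2            # Pr[ancilla = 0]
    outcomes = rng.random(shots) < p0            # sample measurement outcomes
    return 2 * outcomes.mean() - 1               # E[+1/-1] estimate

# Sanity check against the exact value.
U = np.linalg.qr(np.random.randn(4, 4) + 1j * np.random.randn(4, 4))[0]
psi = np.zeros(4, dtype=complex); psi[0] = 1
print(hadamard_test(U, psi), np.real(psi.conj() @ U @ psi))
print(hadamard_test(U, psi, imaginary=True), np.imag(psi.conj() @ U @ psi))
```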

As a generalization of the VQE family
In this subsection, we discuss the connection between our VQSVD algorithm and the variational quantum eigensolver (VQE) [25], which estimates the ground-state energy of a given molecular electronic-structure Hamiltonian $H$. This is a central problem in quantum chemistry, as many properties of a molecule can be calculated once we determine the eigenstates of $H$. Several related algorithms have been proposed to improve the spectrum-learning capability of VQE, such as SSVQE [31], so that the properties of excited states can also be explored. We consider these algorithms as a unified family with promising applications in quantum chemistry.
For practical reasons, one has to discretize the aforementioned Hamiltonian $H$ into a linear combination of Pauli tensor products $\sigma_{i_1} \otimes \cdots \otimes \sigma_{i_n}$ to work on quantum devices. Many physical models naturally fit into this scheme, including the quantum Ising model and the Heisenberg model. In particular, if the input of VQSVD in Eq. (5) is a Hamiltonian in such a discretized form (i.e., a linear combination of Pauli tensor products, or equivalently a Hermitian matrix), VQSVD can be naturally applied to diagonalize this Hamiltonian and prepare its eigenstates. Therefore, VQSVD can be seen as a generalization of the VQE family that works not only for Hamiltonians but also for more general non-Hermitian matrices.

Optimization of the loss function
Finding the optimal parameters $\{\alpha^\star, \beta^\star\}$ is a significant part of variational hybrid quantum-classical algorithms. Both gradient-based and gradient-free methods can be used for the optimization. Here, we provide analytical details on the gradient-based approach, and we refer to [56] for more information on optimization subroutines in variational quantum algorithms. References on gradient estimation via quantum devices can be found in Refs. [30,57,58].

Gradients estimation
Here, we discuss the computation of the gradient of the global loss function $L(\alpha, \beta)$ by giving an analytical expression, and we show that the gradients can be estimated by shifting parameters of the same circuits used for evaluating the loss function. In Algorithm 1, to prepare the states $|\hat u_j\rangle$ and $|\hat v_j\rangle$, we apply gate sequences $U = U_{\ell_1} \cdots U_1$ and $V = V_{\ell_2} \cdots V_1$ in turn to the state $|\psi_j\rangle$, where each gate $U_l$ and $V_k$ is either fixed (e.g., a CNOT gate) or parameterized, for $l = 1, \ldots, \ell_1$ and $k = 1, \ldots, \ell_2$. The parameterized gates $U_l$ and $V_k$ have the forms $U_l = e^{-iH_l\alpha_l/2}$ and $V_k = e^{-iQ_k\beta_k/2}$, respectively, where $\alpha_l$ and $\beta_k$ are real parameters, and $H_l$ and $Q_k$ are tensor products of Pauli matrices. Hence the gradient of the loss function $L$ depends on the parameters $(\alpha, \beta)$, and the following proposition shows that it can be computed on near-term quantum devices.
Proof Notice that the partial derivatives of the loss function are given by Eqs. (14) and (15); hence the gradient is computed by shifting the parameters of the circuits that are used to evaluate the loss function. Since the loss function can be estimated on near-term quantum devices, as claimed in Proposition 2, the gradient can be calculated on near-term quantum devices as well. The derivations of Eqs. (14) and (15) use the fact that $\partial_\theta(\mathrm{Re}\,z(\theta)) = \mathrm{Re}(\partial_\theta z(\theta))$, where $z(\theta)$ is a parameterized complex number; more details of the derivation are deferred to Appendix A.
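As an illustration of the parameter-shift rule derived in Appendix A, the following toy snippet (single parameterized $R_y$ gates standing in for $U(\alpha)$ and $V(\beta)$; an assumption made for brevity) checks the analytical gradient against finite differences:

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]])
Ry = lambda t: np.cos(t / 2) * np.eye(2) - 1j * np.sin(t / 2) * Y
M = np.random.randn(2, 2)
psi = np.array([1, 0], dtype=complex)

def L(a, b):
    # Single-term loss: Re <psi| U(a)^dag M V(b) |psi> with U = V = Ry.
    return np.real(psi.conj() @ Ry(a).conj().T @ M @ Ry(b) @ psi)

a, b, eps = 0.3, -0.7, 1e-6
grad_shift = 0.5 * L(a + np.pi, b)                   # pi-shift rule (Appendix A)
grad_fd = (L(a + eps, b) - L(a - eps, b)) / (2 * eps)  # finite differences
print(grad_shift, grad_fd)                           # the two agree
```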

Barren plateaus
It has been shown that when employing hardware-efficient ansatzes (which are usually problem agnostic) [59], global cost functions of the form $C(\theta) = \mathrm{Tr}[\hat O\, U(\theta)\rho U(\theta)^\dagger]$ for an observable $\hat O$ are untrainable for large problem sizes, since they exhibit gradients that vanish exponentially in the qubit number $N$, making the optimization landscape flat (i.e., barren plateaus [60]). Consequently, traditional optimization methods, including the Adam optimizer utilized in our numerical experiments, are impacted. This trainability issue occurs even when the ansatz has short depth [61], and the work [62] showed that the barren plateau phenomenon can also arise in the architecture of dissipative quantum neural networks [63][64][65].
Several approaches have been proposed to mitigate this problem. One can either implement an identity-block initialization strategy [66] or employ the technique of local cost functions [61]: a local cost function is defined by first constructing a local observable $\hat O_L$, calculating the expectation value with respect to each individual qubit rather than gathering information in a global sense, and finally adding up all the local contributions. The latter strategy has been verified to extend the trainable circuit depth up to $D \in O(\mathrm{poly}(\log N))$. Recently, a new method of constructing adaptive Hamiltonians during the optimization loop has been proposed to resolve similar issues [67].

Numerical experiments and applications
Here we numerically simulate the VQSVD algorithm with randomly generated $8 \times 8$ non-Hermitian real matrices as a proof of concept. Then, to demonstrate the possibility of scaling VQSVD to larger and more exciting applications, we simulate VQSVD to compress standard gray-scale $32 \times 32$ images. Fig. 3 shows the variational ansatz used. Since only real matrices are involved, the combination of $R_y$ rotation gates and CNOTs is sufficient. For the case of complex matrices and different ansatzes, see Appendix C. We choose the input states to be $\{|\psi_j\rangle\} = \{|000\rangle, |001\rangle, \ldots, |111\rangle\}$, and the circuit depth is set to $D = 20$.

Three-qubit example
The VQSVD algorithm described in Section 2.1 can find the $T$ largest singular values of a matrix $M_{n \times n}$ at once. Here, we choose the weights to be the positive integers $(q_1, \ldots, q_T) = (T, T-1, \ldots, 1)$. Fig. 4 shows the learning process. One can see that this approach successfully finds the desired singular values. To assess the quality of the learned decomposition, we measure the distance between $M$ and its rank-$T$ reconstruction in the matrix norm $\|A_{n \times n}\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij}|^2}$, where the $a_{ij}$ are the matrix elements. As illustrated in Fig. 4, the distance decreases as more and more singular values are used.
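The benchmark can be reproduced classically; the sketch below (with numpy's exact SVD standing in for the trained circuits $U(\alpha^\star)$ and $V(\beta^\star)$) computes the rank-$T$ reconstruction and the Frobenius distance for increasing $T$:

```python
import numpy as np

n = 8
M = np.random.randn(n, n)
U, d, Vh = np.linalg.svd(M)  # stand-ins for the learned U(alpha*), V(beta*)

for T in range(1, n + 1):
    M_T = U[:, :T] @ np.diag(d[:T]) @ Vh[:T, :]  # rank-T reconstruction
    dist = np.linalg.norm(M - M_T)               # Frobenius norm
    print(T, dist)                               # equals sqrt(sum_{j>T} d_j^2)
```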

Image compression
Next, we apply the VQSVD algorithm to compress a $32 \times 32$ pixel handwritten digit image taken from the MNIST dataset with only 7.81% (rank $T = 5$) of its original information. Comparing with the classical SVD method, one can see that the digit 7 is successfully reconstructed with some background noise. Notice that the circuit structure shown in Fig. 3 is generic, and circuit architecture design for this task is not yet a well-studied topic. Future studies are needed to efficiently load classical information into NISQ devices beyond the LCU assumption.
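A classical analogue of this experiment (with numpy's SVD in place of the trained circuits, and a random array standing in for an MNIST digit) looks as follows:

```python
import numpy as np

img = np.random.rand(32, 32)     # stand-in for a 32 x 32 MNIST digit
U, d, Vh = np.linalg.svd(img)
T = 5                            # keep only the top T singular triplets
compressed = U[:, :T] @ np.diag(d[:T]) @ Vh[:T, :]
# Relative reconstruction error of the rank-5 compression.
print(np.linalg.norm(img - compressed) / np.linalg.norm(img))
```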

Solution quality estimation
After obtaining the results (singular values and vectors) from Algorithm 1, it is natural and desirable to have a method to benchmark or verify these results. In this section, we introduce a procedure for verification purposes. In particular, we propose a variational quantum algorithm for estimating the sum of the largest $T$ squared singular values, i.e., $\sum_{j=1}^T d_j^2$, of a matrix $M \in \mathbb{C}^{m \times n}$ as a subroutine. In the following, we first define the errors of the inferred singular values and singular vectors, and then show how to estimate them via the tool provided in Algorithm 3.
Let $\{m_j\}_{j=1}^T$ denote the inferred singular values of the matrix $M$, arranged in descending order, and let $\{|\hat u_j\rangle\}_{j=1}^T$ and $\{|\hat v_j\rangle\}_{j=1}^T$ denote the associated inferred left and right singular vectors. The error of the inferred singular values is defined as
$$\varepsilon_d := \sum_{j=1}^T (m_j - d_j)^2,$$
where the $d_j$ are the exact singular values of $M$, also arranged in descending order. The error $\varepsilon_v$ of the inferred singular vectors is defined as
$$\varepsilon_v := \sum_{j=1}^T \Big( \big\| H|\hat e_j^+\rangle - m_j|\hat e_j^+\rangle \big\|^2 + \big\| H|\hat e_j^-\rangle + m_j|\hat e_j^-\rangle \big\|^2 \Big),$$
where $H$ is a Hermitian operator of the form $H = |0\rangle\langle 1| \otimes M + |1\rangle\langle 0| \otimes M^\dagger$ and $|\hat e_j^\pm\rangle = (|0\rangle|\hat u_j\rangle \pm |1\rangle|\hat v_j\rangle)/\sqrt{2}$. It is worth pointing out that when the inferred vectors $|\hat e_j^\pm\rangle$ approximate the eigenvectors $|e_j^\pm\rangle$ of $H$, i.e., $\varepsilon_v \to 0$, the inferred singular vectors $|\hat u_j\rangle$ and $|\hat v_j\rangle$ approximate the singular vectors $|u_j\rangle$ and $|v_j\rangle$, respectively, and vice versa. Now we are ready to introduce the procedure for verifying the quality of the VQSVD outputs. To quantify the errors, we exploit the fact that $\varepsilon_d$ and $\varepsilon_v$ are upper bounded in terms of quantities that can be estimated; we refer to Lemma 4 in Appendix D for the precise bounds, Eqs. (18) and (19), and their proofs.
Thus, we only need to evaluate the sum $\sum_{j=1}^T d_j^2$ (for $T = r$ this equals the squared Frobenius norm of $M$) and the sum $\sum_{j=1}^T m_j^2$ independently. The latter can be directly computed from the outputs of VQSVD, while the former quantity can be estimated through Algorithm 3 presented in Appendix D.
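As a toy illustration of this check (with a classical SVD standing in for Algorithm 3, and artificially perturbed values standing in for VQSVD outputs), one can compare the two sums and use the gap as a quality witness:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
d = np.linalg.svd(M, compute_uv=False)   # exact singular values (Algorithm 3 stand-in)
T = 4
m = d[:T] * (1 - 1e-2 * rng.random(T))   # stand-in for imperfect VQSVD outputs
gap = np.sum(d[:T]**2) - np.sum(m**2)    # small gap indicates a good solution
print(gap)
```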

Applications
In this section, we briefly discuss the applications of VQSVD in recommendation systems and in other problems such as polar decomposition.

Recommendation systems
The goal of a recommendation system is to provide personalized items to customers based on their ratings of items or other information. Usually, the preferences of the customers are modeled by an $m \times n$ matrix $A$, where we assume there are $m$ customers and $n$ items. The element $A_{lj}$ indicates the rating that customer $l$ gives to item $j$. In particular, finding suitable recommendations amounts to locating the large entries of rows.
One commonly used method for locating the large entries of rows is matrix reconstruction: given an arbitrary matrix and an integer $k$, the goal is to output a rank-$k$ matrix as an approximation. In recommendation systems, such a low-rank approximation $A_k$ of the preference matrix $A$ is usually used to recommend items. In this sense, our VQSVD algorithm finds application in recommendation systems by learning such a low-rank matrix; more specifically, it provides access to the relevant singular values and singular vectors.
After obtaining the low-rank matrix $A_k$, the recommendation process for customer $l$ proceeds as follows: project the $l$-th row vector of $A$ onto the space spanned by the right singular vectors of $A_k$. Specifically, denote the right singular vectors of $A_k$ by $|v_1\rangle, \ldots, |v_k\rangle$ and the projection operator by $\Pi := \sum_{t=1}^k |v_t\rangle\langle v_t|$; then apply $\Pi$ to the $l$-th normalized row vector $|b\rangle$ of $A$. Let $|\tilde b\rangle$ denote the normalized vector after projection, given by
$$|\tilde b\rangle = \frac{1}{G}\,\Pi|b\rangle = \frac{1}{G}\sum_{t=1}^k \xi_t |v_t\rangle,$$
where the right singular vectors $\{|v_t\rangle\}$ form an orthonormal basis, the $\xi_t = \langle v_t|b\rangle$ are the corresponding coefficients, and $G$ denotes the normalization factor, i.e., $G = \sqrt{\sum_{t=1}^k |\xi_t|^2}$. The state $|\tilde b\rangle$ can be efficiently prepared via the parameterized circuit $V(\beta)$ as long as we can prepare the state $\frac{1}{G}\sum_{t=1}^k \xi_t|\psi_t\rangle$. In particular, the $k$ coefficients $\xi_t$ in Eq. (20) determine $|\tilde b\rangle$, and we present a procedure for estimating these coefficients in Algorithm 2 (shown below).

Algorithm 2 (surviving fragment): 5: Apply $U_b$ to the state $|0\rangle$ and obtain $|b\rangle = U_b|0\rangle$; finally, measure $|\tilde b\rangle$ in the computational basis and output the outcome.
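A classical numpy sketch of the projection step (with exact singular vectors standing in for the learned circuit $V(\beta^\star)$; variable names are ours) is given below:

```python
import numpy as np

m, n, k, l = 20, 10, 3, 0
A = np.random.rand(m, n)                 # preference matrix
U, d, Vh = np.linalg.svd(A)
V_k = Vh[:k, :].conj().T                 # right singular vectors |v_1>..|v_k> as columns
b = A[l] / np.linalg.norm(A[l])          # normalized preference row of customer l
xi = V_k.conj().T @ b                    # coefficients xi_t = <v_t|b>
G = np.linalg.norm(xi)                   # normalization factor
b_tilde = (V_k @ xi) / G                 # |b~> = (1/G) sum_t xi_t |v_t>
# Sampling |b~> in the computational basis favors items with large entries,
# which implements the recommendation rule described above.
print(b_tilde)
```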

Polar decomposition and other applications
Another important application of our VQSVD algorithm is finding the polar decomposition, which has many applications in linear algebra [71,72]. Recently, Lloyd et al. [73] proposed a quantum polar decomposition algorithm that performs in a deterministic way. For a given complex matrix $M \in \mathbb{C}^{n \times n}$, the right polar decomposition is defined as
$$M = WP,$$
where $W$ is a unitary and $P$ is a positive-semidefinite Hermitian matrix. In particular, if the singular value decomposition $M = UDV^\dagger$ is computed through the VQSVD algorithm, then $W = UV^\dagger$ and $P = VDV^\dagger$. For Hermitian matrices, the VQSVD algorithm can also be applied as an eigensolver, since singular value decomposition reduces to spectral decomposition in this case. Recently, several works have been proposed to extract eigenvalues and reconstruct eigenvectors of Hermitian matrices [31,74] and density operators [75].
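The reduction from SVD to polar decomposition can be verified directly; the following numpy snippet (our illustration) checks $M = WP$ with $W = UV^\dagger$ and $P = VDV^\dagger$:

```python
import numpy as np

M = np.random.randn(5, 5) + 1j * np.random.randn(5, 5)
U, d, Vh = np.linalg.svd(M)
V = Vh.conj().T
W = U @ Vh                               # unitary factor W = U V^dag
P = V @ np.diag(d) @ Vh                  # positive-semidefinite Hermitian factor
print(np.allclose(M, W @ P))             # True: M = W P
print(np.allclose(W.conj().T @ W, np.eye(5)))  # W is unitary
```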
In the context of quantum information, SVD can be used to compute the Schmidt decomposition of bipartite pure states, and our VQSVD algorithm can also be applied to this task. Meanwhile, a recent work by Bravo-Prieto et al. [76] introduced a variational quantum algorithm for obtaining the Schmidt coefficients and associated orthonormal vectors of a bipartite pure state. In comparison, our VQSVD algorithm can handle the SVD of general matrices, while Ref. [76] is intended for the Schmidt decomposition of bipartite pure states.

Discussion and outlook
To summarize, we have presented a variational quantum algorithm for singular value decomposition on NISQ devices. One key contribution is the design of a loss function that can be used to train quantum neural networks to learn the left and right singular vectors and output the target singular values. Further improvements in the performance of our VQSVD algorithm may be possible for sparse matrices together with more sophisticated ansatzes. We have numerically verified our algorithm on the singular value decomposition of random matrices and on image compression, and we have proposed applications in recommendation systems and polar decomposition. As a generalization of the VQE family of algorithms for Hamiltonian diagonalization (or spectral decomposition), the VQSVD algorithm may have potential applications in quantum chemistry, quantum machine learning, and quantum optimization in the NISQ era.
Our algorithm in principle works for large-scale problems, provided the trainability and accuracy of the QNNs are handled carefully. As with other variational quantum algorithms, several challenges still need to be addressed to maintain the hope of achieving quantum speedups when scaling to large problems on NISQ devices. We refer to the recent review papers [22][23][24] for potential solutions to these challenges.
One future direction is to develop near-term quantum algorithms for non-negative matrix factorization [3], which has various applications and broad interest in machine learning; see [77] for an example of a quantum solution. Another interesting direction is to develop near-term quantum algorithms for higher-order singular value decomposition [78].
A Supplemental material for gradient estimation

Consider the parameterized quantum circuits $U(\alpha) = \prod_{i=n}^{1} U_i(\alpha_i)$ and $V(\beta) = \prod_{j=m}^{1} V_j(\beta_j)$. For convenience, denote $U_{i:j} = U_i \cdots U_j$. We can write the cost function as
$$L(\alpha, \beta) = \sum_{j=1}^T q_j\, \mathrm{Re}\,\langle\psi_j| U(\alpha)^\dagger M V(\beta) |\psi_j\rangle.$$
To differentiate with respect to $\alpha_\ell$ and $\beta_k$, absorb most gates into the state $|\psi_j\rangle$ and the matrix $M$: let $|\varphi_j\rangle = V_{k-1:1}|\psi_j\rangle$, $|\phi_j\rangle = U_{\ell-1:1}|\psi_j\rangle$, and $G \equiv U_{\ell+1:n}^\dagger M V_{m:k+1}$, so that each summand becomes $\mathrm{Re}\,\langle\phi_j| U_\ell(\alpha_\ell)^\dagger\, G\, V_k(\beta_k)|\varphi_j\rangle$. We assume $U_\ell = e^{-i\alpha_\ell H_\ell/2}$ is generated by a Pauli product $H_\ell$, and similarly $V_k = e^{-i\beta_k Q_k/2}$. The derivatives with respect to a given angle are
$$\partial_{\alpha_\ell} U_\ell(\alpha_\ell)^\dagger = \frac{iH_\ell}{2}\, U_\ell(\alpha_\ell)^\dagger, \qquad \partial_{\beta_k} V_k(\beta_k) = -\frac{iQ_k}{2}\, V_k(\beta_k).$$
With the following property, we can absorb the Pauli product $H_\ell$ by a rotation of the angle $\alpha_\ell \to \alpha_\ell + \pi$:
$$U_\ell(\pm\pi) = e^{\mp i\pi H_\ell/2} = \cos(\pi/2)\, I \mp i\sin(\pi/2)\, H_\ell = \mp i H_\ell.$$
Plugging this back, the partial derivatives simplify to
$$\frac{\partial L}{\partial \alpha_\ell} = \frac{1}{2}\, L(\alpha_1, \ldots, \alpha_\ell + \pi, \ldots, \alpha_n; \beta), \qquad \frac{\partial L}{\partial \beta_k} = \frac{1}{2}\, L(\alpha; \beta_1, \ldots, \beta_k + \pi, \ldots, \beta_m),$$
or equivalently, since each summand is sinusoidal in the half-angle, $\partial_{\alpha_\ell} L = \frac{1}{4}[L(\alpha_\ell + \pi) - L(\alpha_\ell - \pi)]$. One can see from this derivation that calculating the analytical gradient of the VQSVD algorithm simply amounts to rotating a specific gate parameter (an angle $\alpha_\ell$ or $\beta_k$) by $\pm\pi$, which can be easily implemented on near-term quantum devices.

B Supplemental material for cost evaluation
In this section, we show how to apply the importance sampling technique to reduce the cost of estimating the loss function in Eq. (12). For convenience, we restate Eq. (12) below:
$$C(\alpha, \beta) = \sum_{j=1}^T q_j \sum_{k=1}^K c_k\, \mathrm{Re}\,\langle\psi_j| U(\alpha)^\dagger A_k V(\beta) |\psi_j\rangle. \quad (34)$$
We assume all coefficients $c_k$ in Eq. (12) are positive, since their signs can be absorbed into the unitaries $A_k$. Define an importance-sampling random variable $R$ taking value $k$ with probability
$$\Pr[R = k] = \frac{c_k}{\|c\|_1},$$
where $c = (c_1, c_2, \ldots, c_K)$ and $\|\cdot\|_1$ denotes the $\ell_1$-norm. The random variable $R$ indicates that each unitary $A_k$ is selected at random with probability proportional to its weight. We can then rewrite Eq. (34) using $R$:
$$C(\alpha, \beta) = \|c\|_1\, \mathbb{E}_R\left[ \sum_{j=1}^T q_j\, \mathrm{Re}\,\langle\psi_j| U(\alpha)^\dagger A_R V(\beta) |\psi_j\rangle \right].$$
As $C(\alpha, \beta)$ is expressed as an expectation, the sample mean can be used to estimate it. By Chebyshev's inequality, $O(\mathrm{Var}/\epsilon^2)$ samples suffice to compute an estimate up to precision $\epsilon$ with high probability, where $\mathrm{Var}$ denotes the variance. By the Chernoff bound, the success probability can be further improved to $1 - \delta$ at an additional multiplicative cost of $O(\log(1/\delta))$. For $C(\alpha, \beta)$, the variance is bounded by $\|c\|_1^2$. We therefore only need to sample $O(\|c\|_1^2 \log(1/\delta)/\epsilon^2)$ unitaries to evaluate $C(\alpha, \beta)$. Viewed from this angle, the cost of the evaluation depends on $\|c\|_1$ instead of the integer $K$. As a result, the loss evaluation can be efficient if $\|c\|_1$ is small, even when there are exponentially many terms.
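The following sketch (with random numbers standing in for the Hadamard-test values $\mathrm{Re}\,\langle\psi_j|U^\dagger A_k V|\psi_j\rangle$) illustrates the unbiased sampling estimator and its dependence on $\|c\|_1$ rather than $K$:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 1000
c = rng.random(K)                        # positive LCU weights c_k
r = rng.uniform(-1, 1, K)                # stand-ins for Hadamard-test values
exact = np.dot(c, r)                     # the full sum over all K terms

c1 = c.sum()                             # ||c||_1
samples = rng.choice(K, size=2000, p=c / c1)   # importance sampling
estimate = c1 * r[samples].mean()        # unbiased: E[estimate] = sum_k c_k r_k
print(exact, estimate)
```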
Example. Suppose we use VQSVD to compute SVDs of circulant matrices, which have applications in signal processing, image compression, number theory, and cryptography. A circulant matrix $C_d$ is determined by a sequence $c = (c_0, c_1, \ldots, c_{d-1})$: its first column is $c$, and each subsequent column is a cyclic shift of the previous one. It can be easily decomposed into a weighted sum of cyclic permutations,
$$C_d = \sum_{k=0}^{d-1} c_k P^k, \qquad \text{where } P = \sum_{i=0}^{d-1} |(i+1) \bmod d\rangle\langle i|$$
is the cyclic permutation operator. The loss for a circulant matrix in Eq. (34) is then given by
$$C(\alpha, \beta) = \|c\|_1\, \mathbb{E}_R\left[ \sum_{j=1}^T q_j\, \mathrm{Re}\,\langle\psi_j| U(\alpha)^\dagger P^R V(\beta) |\psi_j\rangle \right].$$
Hence, for circulant matrices with feasible cost $\|c\|_1$ (e.g., of polynomial size), the loss evaluation can be quite efficient even for large problem sizes. Our VQSVD algorithm is thus expected to find further applications in circulant-matrix-related problems.
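A small numpy check of this LCU structure (our illustration) builds a circulant matrix from powers of the cyclic permutation:

```python
import numpy as np

d = 8
c = np.random.rand(d)                    # sequence (c_0, ..., c_{d-1})
P = np.roll(np.eye(d), 1, axis=0)        # cyclic permutation |i> -> |i+1 mod d>
C = sum(ck * np.linalg.matrix_power(P, k) for k, ck in enumerate(c))
# The first column is c itself; each column is a cyclic shift of the previous one.
print(C[:, 0], np.allclose(C[:, 1], np.roll(c, 1)))
```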

C Supplemental material for different ansatzes
In this section, we supplement additional numerics on $8 \times 8$ real matrices to explore the effect of different circuit ansatzes, and we later test our VQSVD algorithm on random complex matrices. First, we introduce several ansatz candidates. After introducing all the candidates, we test the distance measure on a real matrix, similarly to Fig. 4. For a fair comparison, we want all the ansatz candidates to have the same total number of trainable parameters. Other settings, including the desired rank $T = 8$, the weight coefficients $(T, T-1, \ldots, 1)$, the random seed for generating the trainable parameters, the number of optimization iterations ITR = 200, the learning rate LR = 0.05, and the Adam optimizer, are fixed. The results are illustrated in Fig. 7. The advantage of candidates (a) and (d) is observed consistently in our numerical experiments. Finally, we repeat the same experiments on random $8 \times 8$ complex matrices. No clear advantage of a specific ansatz candidate is observed in this setting. Most of the time, candidates (a), (c), and (d) return better results compared with candidate (b), but in certain cases the performance of candidate (a) can be very poor due to its lack of entangling capability. We report a case study in Fig. 8. Further studies are needed to examine more ansatz candidates by manipulating the entangling structure.

D Supplemental material for verification of the solution quality
In this section, we provide the proofs needed in Sec. 5 and a detailed discussion of the variational quantum Frobenius norm estimation algorithm.

D.1 Definitions
Recall that the error of the inferred singular values is defined as
$$\varepsilon_d := \sum_{j=1}^T (m_j - d_j)^2,$$
where the $d_j$ are the exact singular values of $M$, arranged in descending order. The error $\varepsilon_v$ of the inferred singular vectors is defined as
$$\varepsilon_v := \sum_{j=1}^T \Big( \big\| H|\hat e_j^+\rangle - m_j|\hat e_j^+\rangle \big\|^2 + \big\| H|\hat e_j^-\rangle + m_j|\hat e_j^-\rangle \big\|^2 \Big),$$
where $H$ is a Hermitian operator of the form $H = |0\rangle\langle 1| \otimes M + |1\rangle\langle 0| \otimes M^\dagger$ and $|\hat e_j^\pm\rangle = (|0\rangle|\hat u_j\rangle \pm |1\rangle|\hat v_j\rangle)/\sqrt{2}$. The quantity $\| H|\hat e_j^\pm\rangle \mp m_j|\hat e_j^\pm\rangle \|^2$ quantifies the component of $H|\hat e_j^\pm\rangle$ that is perpendicular to $|\hat e_j^\pm\rangle$, which follows from considering $(I - |\hat e_j^\pm\rangle\langle\hat e_j^\pm|) H|\hat e_j^\pm\rangle$. It is worth pointing out that when the inferred vectors $|\hat e_j^\pm\rangle$ approximate the eigenvectors $|e_j^\pm\rangle = (|0\rangle|u_j\rangle \pm |1\rangle|v_j\rangle)/\sqrt{2}$ of $H$, i.e., $\varepsilon_v \to 0$, the inferred singular vectors $|\hat u_j\rangle$ and $|\hat v_j\rangle$ approximate the singular vectors $|u_j\rangle$ and $|v_j\rangle$, respectively, and vice versa. In other words, the error $\varepsilon_v$, which quantifies how well the vectors $|\hat e_j^\pm\rangle$ approximate the eigenvectors $|e_j^\pm\rangle$, also quantifies how well the inferred vectors $\{|\hat u_j\rangle\}$ and $\{|\hat v_j\rangle\}$ approximate the singular vectors. Specifically, the corresponding distances are equal:
$$D\big(\{|u_j\rangle, |v_j\rangle\}, \{|\hat u_j\rangle, |\hat v_j\rangle\}\big) = D\big(\{|e_j^\pm\rangle\}, \{|\hat e_j^\pm\rangle\}\big), \quad (42)$$
where $D$ denotes the distance between the vectors. Here we give the explicit forms of the distances in Eq. (42). The distance between $\{|u_j\rangle, |v_j\rangle\}$ and $\{|\hat u_j\rangle, |\hat v_j\rangle\}$ is defined as
$$D\big(\{|u_j\rangle, |v_j\rangle\}, \{|\hat u_j\rangle, |\hat v_j\rangle\}\big) \equiv \big\| |u_j\rangle - |\hat u_j\rangle \big\|^2 + \big\| |v_j\rangle - |\hat v_j\rangle \big\|^2, \quad (44)$$
and the distance between $|e_j^\pm\rangle$ and $|\hat e_j^\pm\rangle$ is defined as
$$D\big(\{|e_j^\pm\rangle\}, \{|\hat e_j^\pm\rangle\}\big) \equiv \big\| |e_j^+\rangle - |\hat e_j^+\rangle \big\|^2 + \big\| |e_j^-\rangle - |\hat e_j^-\rangle \big\|^2. \quad (46)$$
Notice that both right-hand sides of Eqs. (44) and (46) are equal to $4 - 2(\mathrm{Re}\,\langle u_j|\hat u_j\rangle + \mathrm{Re}\,\langle v_j|\hat v_j\rangle)$, from which the relation in Eq. (42) follows.

D.2 Error analysis
After running VQSVD, it would be ideal to be able to verify the quality of its outputs. To this end, we show that the errors $\varepsilon_d$ and $\varepsilon_v$ are upper bounded and give explicit forms of the upper bounds. We present the derivation of these upper bounds in the following lemma.