Quantum Phase Recognition via Quantum Kernel Methods

The application of quantum computation to accelerate machine learning algorithms is one of the most promising areas of research in quantum algorithms. In this paper, we explore the power of quantum learning algorithms in solving an important class of Quantum Phase Recognition (QPR) problems, which are crucial to understanding many-particle quantum systems. We prove that, under widely believed complexity-theoretic assumptions, there exists a wide range of QPR problems that cannot be efficiently solved by classical learning algorithms with classical resources. In contrast, using a quantum computer, we prove the efficiency and robustness of quantum kernel methods in solving QPR problems characterized by Linear order parameter Observables. We numerically benchmark our algorithm on a variety of problems, including recognizing symmetry-protected topological phases and symmetry-broken phases. Our results highlight the capability of quantum machine learning in predicting such quantum phase transitions in many-particle systems.

quantum phases of matter and proposed a provably efficient classical method for learning them through shadow tomography [18].
However, for quantum many-body systems that possess intricate and long-range entanglement, if the target quantum phase is determined by a nonlocal order parameter observable, the required sampling complexity is expected to increase exponentially with the system size. Quantum machine learning has been intensely studied in terms of its expressive ability [19-21], optimization [22-25], provable quantum advantages [26-28], as well as potential limitations [29]. Meanwhile, recent pioneering experiments on quantum processors [26, 30-32] have demonstrated significant quantum computing advantages in random state sampling [30-32] and density matrix learning problems [26]. Specifically, Huang et al. [26,27] proved that the entangled Bell-measurement protocol can efficiently extract information from an unknown density matrix and predict its linear properties, while the task is classically hard in the worst-case scenario. Therefore, two critical questions remain open: (1) what is the limitation of classical machine learning in solving quantum many-body problems? and (2) can a near-term quantum computer enhance the power of classical learning algorithms in solving practical problems?
In this paper, we provide a novel approach to address the above two questions. For a many-body quantum system described by an n-qubit parameterized Hamiltonian H(a) = Σ_{j=1}^m a_j P_j, where a ∈ R^m represents external parameters and the P_j are n-qubit Pauli operators, we focus on a class of Quantum Phase Recognition problems that can be distinguished by a Linear order parameter Observable, termed the LO-QPR problem. We aim to learn detailed phase transitions of many-particle quantum systems using a quantum kernel method. In the learning phase, a classical training data set {(a_i, b_i)} is used, where a_i and b_i are respectively the external parameters and the ground state property observed from experiments. In the prediction phase, the learning algorithm succeeds if it correctly predicts the property b of the ground state |ψ(a)⟩. For example, for the Ising Hamiltonian, the external parameter a could be the strength of the transverse magnetic field, and b represents quantum phases such as the paramagnetic, ferromagnetic, and antiferromagnetic phases. Phase transitions occur as the external parameters vary [33], and the ability to correctly predict the quantum phase transition boundary can help us understand many strongly correlated systems, even for canonical microscopic physical models [34].
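To make the setting concrete, for small n the parameterized Hamiltonian H(a) = Σ_j a_j P_j and its ground state can be built by exact diagonalization. The following Python sketch (helper names are our own; a toy 3-qubit transverse-field Ising instance serves as the example) illustrates the objects involved:

```python
import numpy as np
from functools import reduce

# Single-qubit Pauli matrices.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_string(ops):
    """Tensor product of single-qubit operators, e.g. [Z, Z, I] -> Z x Z x I."""
    return reduce(np.kron, ops)

def hamiltonian(a, paulis):
    """H(a) = sum_j a_j P_j for external parameters a and Pauli strings P_j."""
    return sum(a_j * P_j for a_j, P_j in zip(a, paulis))

def ground_state(H):
    """Lowest eigenvalue and eigenvector of H by exact diagonalization."""
    vals, vecs = np.linalg.eigh(H)
    return vals[0], vecs[:, 0]

# Toy 3-qubit transverse-field Ising instance:
# H = -sum_i Z_i Z_{i+1} - g sum_i X_i, encoded as a_j P_j terms.
n, g = 3, 0.5
paulis = [pauli_string([Z, Z, I]), pauli_string([I, Z, Z])] + \
         [pauli_string([X if i == j else I for i in range(n)]) for j in range(n)]
a = [-1.0, -1.0] + [-g] * n
H = hamiltonian(a, paulis)
E0, psi = ground_state(H)
```

Here the property b of Task 4 would be an expectation value ⟨ψ(a)|M|ψ(a)⟩ computed in the state `psi`.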
Under two widely accepted assumptions, namely that (1) the polynomial hierarchy of computational complexity theory does not collapse and (2) ground state sampling is classically hard, we prove that certain LO-QPR problems are hard for any classical machine learning method with classical resources. We demonstrate that if these LO-QPR problems could be solved by a classical learner with classical resources (even up to a general additive error tolerance), then the infinite tower of the polynomial hierarchy would collapse to its second level. While this would not imply P = NP, such a collapse is also widely regarded as implausible. We therefore answer the first question by delineating the exact limitation of classical machine learning on LO-QPR.
The rapid advancement of realistic quantum devices provides an opportunity to answer the second question in a fundamentally different and more powerful way than classical ML. Instead of classically simulating ground states and then inferring the quantum phase transition, we utilize quantum machine learning to extract high-level abstractions from observed data and directly process quantum ground state information on a quantum computer. Here, the ground state |ψ(a)⟩ of H(a) embeds the classical external parameter a into a specific quantum-enhanced feature space, where inner products of such quantum feature states give rise to a quantum kernel, a metric that characterizes distances in the feature space. As a result, predicting the ground state property can be transformed into a quantum state overlap computation, and thus bypasses the exponential sample complexity required in Ref. [17]. We prove that the proposed Quantum Kernel Alphatron (QKA) algorithm can efficiently learn from quantum data and solve LO-QPR problems with a promisingly small learning error. We benchmark the proposed QKA in detecting symmetry-protected topological phases and symmetry-broken phases, and simulation results show better performance than previous quantum ML [35] and classical ML [17] approaches.

This paper is organized as follows. In Sec. 2 we review related works, give the definition of the LO-QPR problem, and introduce supervised learning with quantum feature spaces. We prove the hardness of classical learning algorithms in Sec. 3. Sec. 4 details our quantum learning algorithm for the LO-QPR problem. Sec. 5 presents our numerical simulations. Sec. 6 classifies various complexity classes of learning algorithms. Finally, Sec. 7 concludes the paper.

Preliminaries
To clearly demonstrate our contributions in this paper, we first review previous related works and define the learning and computation tasks of interest.

Task 1 (Density Matrix Learning [26,27]). Given an artificial n-qubit density matrix ρ = (I + 0.9P)/2^n, where P ∈ 𝒫 = {I, X, Y, Z}^{⊗n} \ {I^{⊗n}}, the learning algorithm learns about ρ through conventional or quantum-enhanced measurement strategies. The learning algorithm succeeds if it can correctly predict the expectation value Tr(ρQ) within an additive error ε with probability 3/4 for every Q ∈ 𝒫.
In [26,27], the authors proved that the entangled Bell-measurement protocol can use O(n/ε⁴) copies of ρ to solve this learning problem, while estimating Tr(ρQ) for some Q ∈ 𝒫 is classically hard in the worst-case scenario.

Task 2 (Quantum Phase Learning [17]). Given the shadow tomography data set D = {(a_i, Φ_shadow(a_i))}, where Φ_shadow(a_i) is a classical representation of the ground state of a Hamiltonian H(a_i), the task is to determine the quantum phase of matter for each Φ_shadow(a_i) ∈ D.
In [17], the authors developed a "classical" learning algorithm based on classical data, which are however obtained by shadow tomography of the target ground state generated by a quantum computer. As discussed in [36], when the quantum phase transition can only be determined by a non-local order parameter observable, the sample complexity is expected to increase exponentially with the system size. In this case, a better approach is to use a quantum computer to learn directly from the quantum phase value b ∈ {0, 1} observed in experiments, which is defined as Task 4 below.

Task 3 (Ground state Linear Property (GLP) problem). Given an n-qubit Hamiltonian H(a) with external parameters a and an observable M ∈ C^{2^n × 2^n}, the goal of GLP is to approximate the ground state property b = ⟨ψ(a)|M|ψ(a)⟩ to additive error ε = 1/poly(n), that is, |b̂ − b| ≤ ε, where b̂ denotes the estimated property and |ψ(a)⟩ is the ground state of H(a).
Note that the GLP problem characterizes a class of quantum phase problems determined by a linear order parameter, which is one of the most significant elements in understanding quantum fluctuation phenomena in condensed-matter systems [41,42]. For example, for the Ising Hamiltonian, the external parameter a and the order parameter observable M could be the strength of the transverse magnetic field and the spin correlation, respectively, and the quantum phases b include the paramagnetic, ferromagnetic, and antiferromagnetic phases. In general, it is hard to recognize the quantum phases of an arbitrary many-body quantum system, owing to the hardness of obtaining the ground state and the fact that the order parameter is generally unknown. Nevertheless, there may also exist cases where the problem is exactly and efficiently solvable for very specific choices of parameters. A natural question is then, based on the solvable or experimentally known quantum phases, whether we can learn and predict GLP for other external parameter domains. It is therefore reasonable to consider the learning version of the GLP problem.

Task 4 (LO-QPR). Given the training data S = {(a_i, b_i)}_{i=1}^N, where a_i indicates the external parameter and b_i represents its quantum phase value characterized by some unknown observable, associated with an n-qubit Hamiltonian H(a_i) = Σ_k a_i^(k) P_k, our aim is to efficiently learn a model h(a) whose risk R(h) is upper bounded by 1/poly(n).

The crux of solving LO-QPR problems can be summarized at two levels: how to generate high-quality training data, and how to efficiently learn from these data. Solving LO-QPR problems inevitably involves a high-dimensional Hilbert space, so classical learning algorithms struggle both to generate and to process training data that carry quantum entanglement.

Expressing ground states by quantum circuits
The brickwork architecture A is a general approach to constructing an ansatz for ground state computation [37]. Its structure is formed as follows: perform a string of two-qubit gates U_1 ⊗ U_2 ⊗ … ⊗ U_{n/2} as the first layer, then perform a staggered string of gates, as illustrated in Fig. 1(a). The brickwork architecture also induces structured variational quantum circuits, a well-established method for approximating the ground states of many-body Hamiltonians H(x) [39, 43-47].
The key idea of variational quantum circuits is that the parameterized quantum state is prepared and measured on a quantum computer, while a classical optimizer updates the parameter θ according to the measurement information. With the brickwork architecture A, the variational ansatz |Ψ(θ)⟩ = U(θ)|0^n⟩ can be prepared, where U(θ) is composed of D unitaries U_d(θ_d) whose structure is shown in Fig. 1(b). After several classical optimization steps, the classical optimizer provides a parameter θ_x such that |Ψ(θ_x)⟩ approximates |ψ(x)⟩. We provide a deterministic method for finding such a θ_x using quantum imaginary time evolution in Appendix B. Since the variational quantum circuit U(θ) has the same architecture A as the random circuit U, and the two-qubit gates U_i(θ_i) are sampled from a subset of SU(4), the relationship U_A(θ) ⊆ U_A holds, where U_A(θ) and U_A denote the set of variational circuits and the random circuit set, respectively.
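A minimal statevector sketch of the brickwork layout described above (Python; the specific two-qubit gate parameterization and helper names are our own illustrative choices, not the paper's exact circuit):

```python
import numpy as np

def apply_two_qubit(state, U, i, n):
    """Apply a 4x4 unitary U to qubits (i, i+1) of an n-qubit statevector."""
    psi = state.reshape([2] * n)
    psi = np.moveaxis(psi, [i, i + 1], [0, 1]).reshape(4, -1)
    psi = (U @ psi).reshape([2, 2] + [2] * (n - 2))
    psi = np.moveaxis(psi, [0, 1], [i, i + 1])
    return psi.reshape(-1)

def two_qubit_gate(theta):
    """A toy parameterized gate exp(-i theta XX), a subset of SU(4)."""
    XX = np.kron([[0, 1], [1, 0]], [[0, 1], [1, 0]]).astype(complex)
    return np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * XX

def brickwork_state(thetas, n, depth):
    """Brickwork ansatz: even-pair layer, then staggered odd-pair layer."""
    state = np.zeros(2**n, dtype=complex)
    state[0] = 1.0                            # start from |0...0>
    k = 0
    for d in range(depth):
        start = 0 if d % 2 == 0 else 1        # stagger alternate layers
        for i in range(start, n - 1, 2):
            state = apply_two_qubit(state, two_qubit_gate(thetas[k]), i, n)
            k += 1
    return state

# Two layers on 4 qubits use 2 + 1 = 3 gate parameters.
psi = brickwork_state([0.1, 0.2, 0.3], n=4, depth=2)
```

In a variational loop, a classical optimizer would adjust `thetas` to minimize the measured energy ⟨Ψ(θ)|H(x)|Ψ(θ)⟩.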

Classical hardness results
The GLP problem is an instance of the mean-value problem, which is the central component of variational quantum algorithms; "Is the quantum computer necessary for the mean value problem?" remains an open problem, as noted in [48]. For a quantum state |ψ(x)⟩ = U|0^n⟩, Ref. [48] proposed an upper bound on estimating ⟨ψ(x)|M|ψ(x)⟩ in the case of a poly(n)-depth U. Here, we show a reduction from the quantum circuit sampling problem to the GLP problem (see the proof of Lemma 1), and prove that the classical hardness of quantum sampling implies the hardness of the GLP problem in the worst-case scenario, thus providing a lower bound on estimating ⟨ψ(x)|M|ψ(x)⟩.
The proof of Lemma 1 is inspired by the classical hardness conjecture for the random quantum state sampling problem in [49]. Random quantum state sampling is an artificial problem that samples from the output distribution of some experimentally feasible quantum algorithm. Sampling one particular bitstring j from a random quantum circuit U with the exact probability p_U(j) is believed to be classically hard under standard complexity assumptions; this is recognized as worst-case hardness.
However, a convincing quantum advantage must be established in the average-case scenario; that is, the classical hardness should hold across the entire distribution rather than being concentrated in a single quantum process and output. Bouland et al. [49] used the Feynman path integral to build a bridge between a fixed outcome j and a low-degree multivariate polynomial in the quantum gates, and thus proved the worst-to-average-case reduction. We utilize a similar Feynman path integral method to prove the classical learning hardness result for the LO-QPR problem (see Theorem 1).
The outcome probabilities of a truly random quantum state |ψ⟩ follow the Porter-Thomas (PT) distribution Pr(|⟨j|ψ⟩|²) = 2^n e^{−2^n |⟨j|ψ⟩|²}, which is known to be classically hard to sample from [30,50]. Whether the ground states |ψ(x)⟩ of a family of Hamiltonians H(x) satisfy Conjecture 1 can be verified by comparing the probability distribution of |ψ(x)⟩ to the PT distribution. We show that if the probability distribution of a ground state |ψ(x)⟩ is O(n^{−1})-close to the PT distribution in trace distance, then sampling from |ψ(x)⟩ is classically hard; see Theorem 4 in Appendix E.¹ Using this result and the relationship between ground states and variational quantum states, the hardness result for the GLP problem can be stated as the following lemma; we provide the detailed proof in Appendix C.1. This lemma also serves the hardness of the LO-QPR problem. Let |ψ(a)⟩ = U(θ_a)|0^n⟩ be the ground state of a Hamiltonian H(a) satisfying Lemma 1, where U(θ_a) ∈ U_A(θ). For a general family of Hamiltonians H(a), an explicit mathematical expression of the ground state is hard to characterize; a variational quantum circuit is a natural representation that characterizes the circuit complexity of the concerned ground states without changing their properties [39, 43-45]. Starting from the worst-case scenario |ψ(a)⟩ = U(θ_a)|0^n⟩, we provide a method for constructing ground states T = {|ψ(a_i)⟩} using the variational circuit U(θ); details are given in Appendix C.3. We then have the following theorem for the average-case hardness of the LO-QPR problem on T.

¹ Since any quantum state can be encoded as the ground state of some Hamiltonian, the presented conjecture uses 'ground state' to substitute for the 'random quantum state' of Ref. [49].
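The closeness to the Porter-Thomas distribution invoked above can be checked numerically: for a Haar-random state, the rescaled outcome probabilities x_j = 2^n |⟨j|ψ⟩|² should be exponentially distributed with unit mean. A small Python sketch (our own illustrative check, not the paper's Theorem 4 procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_random_state(n):
    """Haar-random n-qubit state: a normalized complex Gaussian vector."""
    v = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    return v / np.linalg.norm(v)

n = 10
dim = 2**n
probs = np.abs(haar_random_state(n))**2   # p_j = |<j|psi>|^2

# Porter-Thomas prediction: x = 2^n * p_j follows Pr(x) = e^{-x}, so the
# mean of x is 1 (exactly, since the p_j sum to 1) and the fraction of
# outcomes with x > 1 should be close to e^{-1} ~ 0.368.
x = dim * probs
mean_x = np.mean(x)
frac_above = np.mean(x > 1.0)
```

A distribution far from this exponential shape would signal that the state fails the PT closeness condition.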
Given classical training data S (given a_i, its label b_i can be computed by some classical Turing machine), we prove that no classical learning algorithm can efficiently learn a hypothesis h* such that R(h*(x)) is upper bounded by 1/poly(n) for x ∈ T.

Theorem 1. Given training data S = {(a_i, b_i)}_{i=1}^N (acquired by classical methods), where (a_i, b_i) indicate the external parameter and phase value associated with the n-qubit Hamiltonian H(a_i), there exists a testing set T = {x_i}_{i=1}^M such that predicting 8/9 of the labels y_i of x_i ∈ T to additive error 1/poly(n) is hard for any classical ML algorithm, assuming that Conjecture 1 holds and the PH does not collapse, where the scale of the testing data is M = poly(N) and N = poly(n).

We now fix notation for the training set S (testing set T). Generally, the task of supervised learning is to learn the label y of a testing datum x ∈ T ⊂ X drawn from a distribution D(x) defined on the space X, according to some decision rule h. The decision rule h is assigned by a selected machine learning model from the training set S = {(a_i, b_i)}_{i=1}^N, where a_i ∈ X follows the distribution D(a_i), the label is b_i = h(a_i), and N is the size of the training set. Given the training set S, an efficient learner needs to generate a classifier h in poly(N) time that minimizes the prediction error. The datum x is sampled randomly according to D(x) in both the training and testing procedures, and the size N of the training set is polynomial in the data dimension.
The kernel method has played a crucial role in the development of supervised learning [51-53], providing an approach to increase the expressivity and trainability of models over the original training set. A kernel function K : X × X → R can be written as K(x, x′) = Ψ(x)^T Ψ(x′), where Ψ : X → H is a feature map that sends a datum x ∈ X to a higher-dimensional feature space H. Numerous classical kernel methods [52, 53] have been proposed to learn non-linear functions or decision boundaries. With the rapid development of quantum computers, there is growing interest in exploring whether quantum kernel methods can surpass classical kernels [54-71].
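As a concrete illustration of the feature-map picture K(x, x′) = Ψ(x)^T Ψ(x′), the degree-2 polynomial kernel admits an explicit three-dimensional feature map (a textbook example, not specific to this paper):

```python
import numpy as np

def feature_map(x):
    """Explicit feature map Psi: R^2 -> R^3 for the degree-2 polynomial kernel."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def kernel(x, y):
    """K(x, y) = (x . y)^2, which equals Psi(x) . Psi(y) -- the 'kernel trick'."""
    return np.dot(x, y)**2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# The kernel evaluates the inner product in feature space without ever
# constructing Psi explicitly.
assert np.isclose(np.dot(feature_map(x), feature_map(y)), kernel(x, y))
```

The quantum kernel below plays the same role, with ground states |ψ(a)⟩ serving as the (exponentially large) feature vectors.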
Here we leverage the quantum kernel Q(a_i, x) = |⟨ψ(a_i)|ψ(x)⟩|² as our kernel.

Quantum Kernel Alphatron
Here, we show how to solve the LO-QPR problem with quantum data by combining the quantum kernel method with the Alphatron algorithm [52]. From the learning theory perspective, training can be phrased as empirical risk minimization, and the learning model h* that minimizes the empirical risk follows the representer theorem.
Let S = {(a_i, b_i)}_{i=1}^N be the training data set and Q : X × X → R be a quantum kernel with kernel space H. Consider a strictly monotonically increasing regularization function g : [0, ∞) → R and the regularized empirical risk R̂_L. Then any minimizer h* of the regularized empirical risk R̂_L(h*) admits a representation of the form h*(x) = Σ_{i=1}^N α_i Q(a_i, x), where α_i ∈ R for all i ∈ {1, 2, ..., N}, and x and the a_i are drawn from the same distribution.
According to the above theorem, one choice of quantum kernel is Q(a_i, x) = |⟨ψ(a_i)|ψ(x)⟩|², where the quantum feature state |ψ(a_i)⟩ = U(θ_{a_i})|0⟩^{⊗n} is prepared with θ_{a_i} := argmin_θ ⟨0^{⊗n}|U†(θ) H(a_i) U(θ)|0^{⊗n}⟩. Here |ψ(a_i)⟩ represents the ground state of H(a_i), and the unknown order parameter observable M can thus be represented as a linear combination of feature states, that is, M ≈ Σ_i α_i |ψ(a_i)⟩⟨ψ(a_i)|. Given the kernel matrix Q = [Q(a_i, a_j)]_{N×N}, the optimal weight parameters α_i in the expression of M have a closed-form solution via linear regression, which requires O(N^{2.373}) time [73].
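The closed-form step mentioned above amounts to kernel (ridge) regression on the Gram matrix Q. A minimal Python sketch, with random unit vectors standing in for the ground states |ψ(a_i)⟩ (all names illustrative):

```python
import numpy as np

def fit_alpha(Q, b, ridge=1e-6):
    """Closed-form kernel ridge regression: alpha = (Q + ridge*I)^{-1} b."""
    N = Q.shape[0]
    return np.linalg.solve(Q + ridge * np.eye(N), b)

def predict(alpha, q_row):
    """h(x) = sum_i alpha_i Q(a_i, x), given the kernel values against x."""
    return np.dot(alpha, q_row)

# Toy kernel matrix from random unit 'feature states' (stand-ins for |psi(a_i)>).
rng = np.random.default_rng(1)
states = rng.normal(size=(5, 8)) + 1j * rng.normal(size=(5, 8))
states /= np.linalg.norm(states, axis=1, keepdims=True)
Q = np.abs(states @ states.conj().T)**2    # Q_ij = |<psi(a_i)|psi(a_j)>|^2
b = rng.normal(size=5)                     # toy labels
alpha = fit_alpha(Q, b)
```

The small `ridge` term plays the role of the regularization function g, keeping the solve well-conditioned.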
Here, we provide a more advanced method for learning the α_i. From the learning theory perspective, training can be phrased as empirical risk minimization, and the learning model ĥ* that minimizes the empirical risk can be learned as in Alg. 1. As a result, given the quantum kernel matrix Q, the LO-QPR problem can be solved in T × N = O(N^{1.5}) running time, as shown in Theorem 3. The outline of the quantum learning algorithm is shown in Fig. 2.
Theorem 3. Suppose g is a strictly monotonically increasing regularization function such that E[g²] ≤ ε_g. Then, for failure probability δ ∈ (0, 1), using O(N^{5/2}) copies of quantum states to estimate Q(a_i, x), Alg. 1 outputs a hypothesis ĥ* whose training error R̂_L(ĥ*) is bounded as in Eq. 5.

The essential difference between Alg. 1 and the original Alphatron algorithm is that we substitute the quantum kernel Q(a_i, x) for the classical kernel function. Although estimating the quantum kernel introduces an additive error ε_q, we rigorously prove the robustness of Alg. 1 under such measurement errors. Specifically, utilizing O(N^{5/2}) copies of quantum states to estimate the quantum kernel function, the estimation error ε_q of Q(a_i, x) can be upper bounded, at the cost of an O(N^{5/2}) quantum joint measurement overhead for the quantum learner. This gives an upper bound on |h_t(x) − ĥ_t(x)| at the t-th iteration step, and the quantum empirical error R(ĥ) saturates the upper bound indicated in Eq. 5.
Remark (properties of Alg. 1). As the training data scale N increases, Alg. 1 drives R̂(ĥ*) for QPR toward a low empirical risk, as promised by Theorem 3. Specifically, if the quantum kernel matrix Q can be calculated exactly, then by Goel and Klivans [52], QKA outputs a hypothesis h* with an O((log(1/δ)/N)^{1/4}) empirical risk, where δ denotes the failure probability. However, the quantum kernel Q(a_i, x) is in practice obtained by running the Destructive-Swap-Test algorithm [74] for finitely many rounds, so the estimate Q̂ carries an additive error ε_q relative to the exact Q. In this quantum scenario, utilizing O(N^{5/2}) copies of quantum feature states to estimate the quantum kernel function suffices to estimate Q(a_i, x) with additive error ε_q = O(N^{−5/4} log(1/δ)), and this bounds the resulting quantum empirical error; the total running time also scales with t_Q, the time required to compute the kernel function Q̂. The above procedure introduces an O(N^{5/2}) quantum joint measurement overhead for the quantum learner; nevertheless, it provides convincing performance for QKA. Further details are given in Appendix C.4.
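The remark above can be sketched as an Alphatron-style update of the weights α_i on a kernel matrix whose entries are estimated with finitely many shots, mimicking the Destructive-Swap-Test noise (identity link function, toy data; the names and the binomial shot model are our own simplifications, not the paper's Alg. 1):

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_kernel(Q_exact, shots):
    """Finite-shot estimate of each kernel entry, as a binomial proportion."""
    return rng.binomial(shots, np.clip(Q_exact, 0.0, 1.0)) / shots

def alphatron(Q, b, T=200, lr=1.0):
    """Alphatron-style iteration: alpha += (lr/N) * (b - h), h = Q @ alpha.

    With an identity link function this iterates toward the kernel-expansion
    hypothesis h(x) = sum_i alpha_i Q(a_i, x).
    """
    N = len(b)
    alpha = np.zeros(N)
    for _ in range(T):
        h = Q @ alpha                  # hypothesis values on the training set
        alpha += (lr / N) * (b - h)    # residual-driven weight update
    return alpha

# Toy setup: nearly orthogonal 'feature states' give a well-conditioned Q.
states = rng.normal(size=(4, 64)) + 1j * rng.normal(size=(4, 64))
states /= np.linalg.norm(states, axis=1, keepdims=True)
Q = np.abs(states @ states.conj().T)**2     # Q_ij = |<psi_i|psi_j>|^2
b = np.array([1.0, 0.0, 1.0, 0.0])          # toy phase labels
alpha = alphatron(noisy_kernel(Q, shots=10_000), b)
```

Even though the iteration only ever sees the shot-noisy estimate of Q, the learned weights still fit the labels well, illustrating the robustness claim.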

Numerical Simulations
Given a general parameterized Hamiltonian H(a), only specific choices of the parameters a admit a classically solvable ground state |ψ(a)⟩, so the number of collectable training data is often limited. In this paper, we explore the capability of the QKA algorithm with small-scale training data in recognizing quantum phases for several instances of LO-QPR tasks.
Firstly, we consider a warm-up case: detecting the appearance of the staggered magnetization in the S = 1/2 XXZ spin chain in the Ising limit [75]. In the Hamiltonian H_w, S_i^α is the α-component of the S = 1/2 spin operator at the i-th site, and g is the strength of the transverse field. The exchange coupling constant in the xy plane is denoted by J_1 and that along the z-axis by J_2. Here, we set J_1 = 0.2, J_2 = 1 and depict the phase diagram M_x = ⟨X⟩ as a function of g (see the green curve in Fig. 3(c)), where the expectation is taken in the ground state of H_w. In this case, the number of qubits is n = 16; the training data consist of pairs (g_i, b_i), where g_i is randomly sampled from the interval [0, 2], and the testing data contain all the points {g_t = 0.067t} for 0 ≤ t ≤ 30. The predictions of Alg. 1 are illustrated in Fig. 3, which shows that Alg. 1 yields more accurate predictions as more training data are used. This provides simulation support for the theoretical bound in Eq. 5.

Secondly, we consider a Z_2 × Z_2 symmetry-protected topological (SPT) phase P which contains the S = 1 Haldane chain. In the Hamiltonian H_s, X_i and Z_i are Pauli operators for the spin at site i, n is the number of spins, and h_1, h_2 and J are parameters. In Fig. 4(a), the blue and red curves show the phase boundary points, and the background shading represents the phase diagram as a function of x = (h_1/J, h_2/J). When the parameter h_2 = 0, the ground states of H_s are exactly solvable via the Jordan-Wigner transformation, and global order parameters can efficiently detect whether these ground states belong to the SPT phase P. Here, we utilize N = 40 data pairs {a = (h_1/J, h_2/J), b} as the training data, in which h_2 = 0 and b indicates the phase value at a (see the yellow points in Fig. 4(a)). Our target is to identify whether a given, unknown ground state |ψ(x)⟩ belongs to P.
In principle, the SPT phase P can be detected by measuring a non-local order parameter, a string operator acting across the n qubits of the concerned Hamiltonian. Here, we choose S = Z_1 X_2 ⋯ X_15 Z_16 and utilize N^{5/2} = 40^{2.5} ≈ 10^5 quantum measurements to estimate the quantum kernel function. The classification results for n = 16 and N = 15, 40 are illustrated in Fig. 4(b) and (c), respectively, which show that the approximation of the order parameter can be systematically improved by increasing the number of training samples. As shown in Fig. 4(c), although the training data lie only on the line h_2 = 0, which can be classically simulated [35], a classical learner cannot learn from these data to predict the target quantum phase if the testing data satisfy Conjecture 1. However, we demonstrate that even if the training data can be classically simulated, the quantum kernel Alphatron still works. We now discuss why training on classical data yields a predictive model for quantum points. The reason is that the order parameter observable is approximated by a linear combination of feature states in the training set, that is, M ≈ Σ_i α_i |ψ(a_i)⟩⟨ψ(a_i)|, so the prediction of the order parameter is largely determined by the quality of the training data. Although the training data are classically simulated, 40 ground states on the line h_2 = 0 suffice to approximate an observable that accurately classifies the quantum phase transition boundary. A quantum learner can then use the SWAP-test technique to efficiently estimate the quantum kernel |⟨ψ(x)|ψ(a_i)⟩|² and predict the quantum phase of |ψ(x)⟩. In our model, we introduced a regularization term g(‖h‖) to avoid over-fitting and to enhance generalization: the model will not necessarily fit all training data S, but it generalizes better to the testing data T.
As a result, the qualitative domain walls are correctly mapped, although some error remains in the vicinity of the training data.
We also note that the quantum convolutional neural network (QCNN) method [35] has been proposed to solve the same problem by applying a CNN-structured quantum circuit to the quantum state. Given n-qubit ground states |ψ(x)⟩, the QCNN method requires additional multi-qubit operations and 4d single-qubit rotations to provide an output at depth d, whereas the proposed quantum kernel Alphatron requires an additional O(n) two-qubit operations to estimate the quantum kernel matrix. The measurement complexity of QCNN is determined by the number of iteration steps, which is hard to analyze theoretically; by contrast, the quantum kernel Alphatron requires only O(N²/ε²) quantum measurements in the whole training procedure. The comparison of computational resources is summarized in Table 1.
Finally, we consider the bond-alternating XXZ model, which has three different phases that can be detected by the topological invariant Z_R(J_1/J_2, δ) [72]. Here, we select in total N = 60 pairs {(a_i, b_i)} as the training data on the δ = 0.5 and δ = 3.0 horizontal lines. In Fig. 5(b) and Fig. 6, we utilize Alg. 1 to generate the phase diagram as a function of x = (J_1/J_2, δ), where the colored shading represents the phase classification results on a 16-qubit system. The data in phase diagram P_3 are post-processed by the averaging scheme. The testing data contain 900 ground states, of which 59 points are mis-classified in the vicinity of the quantum phase transition boundary, giving a classification accuracy of v_s = 0.934.

The Power of Learning Algorithms
In this paper, we discuss the relationships between four machine learning classes, distinguished by the method that produces the training data and by the learning algorithm.
• A "Q-Data" point (a, b) is one whose property value b is observed from physical experiments associated with the system H(a), while a "C-Data" point (a, b) is one whose property b can be efficiently computed by some classical Turing machine given a.
In this paper, a separation is proved between (C-Learning Alg. + C-Data) and (Q-Learning Alg. + Q-Data). Note that the definition of C-Learning Alg. + C-Data differs from the 'classical' machine learning of Ref. [17], and the proposed theoretical result does not contradict their statement.
(1) As shown in Ref. [77] (see also Appendix C.2), the power of C-Learning Alg. + C-Data gradually increases as training (advice) data accumulate, and the set of problems solvable by classical learning algorithms is the class BPP/poly. As the training data set grows, the learner obtains more and more advice data, and the BPP/poly class converges to the P/poly class. It has been proved that the relationship BPP ⊆ BPP/poly ⊆ P/poly holds. Hence, a machine learning task in which some data (even classically generated) are provided can be considerably different from commonly studied computational tasks. Classical learning algorithms with classical data (C-Learning Alg. + C-Data) have recently proven successful in many practical applications [78-82]. However, the direct application of these learning algorithms is challenging for intrinsically quantum problems, because the extremely large Hilbert space hinders an efficient translation of many-body problems into a classical learning framework. A natural question thus arises: where is the limitation of such learning algorithms on quantum problems? Our first result rigorously proves that there exists an LO-QPR problem that cannot be solved by such classical learning algorithms under standard complexity assumptions (see Lemma 1 and Theorem 1).
(2) Quantum learning algorithms with quantum data (Q-Learning Alg. + Q-Data) are another significant theme of this paper. The Q-Learning Alg. can utilize quantum-enhanced feature spaces in the learning process, with ground states provided by quantum circuits of polynomial size. Here we select variational quantum circuits to provide the quantum-enhanced feature states. The proposed quantum learner first learns the order parameter from the training data (a, b), then utilizes this approximated order parameter to predict the quantum phase transition. Our second result shows that the LO-QPR problem can be efficiently solved by Q-Learning Alg. + Q-Data (see Alg. 1 and Theorem 3). These two results together imply that "Q-Learning Alg. + Q-Data" is strictly stronger than "C-Learning Alg. + C-Data" under suitable complexity assumptions. In particular, the quantum kernel Alphatron with quantum data can efficiently solve the LO-QPR problem whenever there exists a quantum algorithm (or a quantum circuit) that provides the ground states of the concerned Hamiltonians.
(3) Another interesting class is the classical learning algorithm with quantum data (C-Learning Alg. + Q-Data). Here, we briefly discuss a special learning scenario for the LO-QPR problem. Suppose the training data are provided by a quantum computer, and the quantum linear property is described by a nonlocal order parameter M. In this setting, the training data have several possible forms, such as external parameter and quantum phase pairs (a, b) obtained from a quantum computer, or the classical shadow representation used in [17]. Given an unknown ground state |ψ(x)⟩, we ask whether a classical learner can predict its quantum phase. Mirroring the quantum learning process, a possible classical learning procedure is: (a) approximate the order parameter M = Σ_i O_i from the quantum training data, where each O_i is a tensor product of Pauli operators; (b) given the learned M, estimate the quantum phase value using pre-obtained random computational-basis measurements ({0, 1}^n bit-string samples) of the quantum state |ψ(x)⟩. Under these two steps, can a classical learner solve LO-QPR efficiently? The answer is yet unknown, but we are pessimistic. According to Theorem 2 in Ref.
[18], at least Ω(3^{L(M)} ‖M‖²_∞ / ε²) samples are needed to approximate ⟨ψ(x)|M|ψ(x)⟩ to additive error ε, where ‖·‖_∞ denotes the spectral norm and L(M) is the locality of M. In the worst case, L(M) = n, so an exponential number of samples is needed to learn the order parameter. This example implies that a classical learner (without a quantum computer) might not efficiently predict a quantum phase transition described by non-local order parameters. For such a non-local order parameter M, however, we can successfully learn it with Q-Learning Alg. + Q-Data.
(4) The last category is the quantum learning algorithm with classical data (Q-Learning Alg. + C-Data). Since the relationship BPP ⊆ BQP holds², "Q-Learning Alg. + C-Data" could also be strictly stronger than "C-Learning Alg. + C-Data". In this paper, we provide simulation results for this scenario (see Fig. 4): the quantum algorithm learns from classical data and can predict a classically hard quantum phase with high accuracy. Combined with the classical hardness results in Theorem 1, we conjecture that even if the training data are classically simulated, quantum learning algorithms might still retain quantum advantages. We leave a rigorous proof of this scenario to future work.
We summarize the complexity relationship of these four categories in Fig. 7, where "Q-Learning Alg." refers to the use of a quantum computer, while "C-Learning Alg." relies only on a classical computer; "Q-Data" represents learning data directly observed from physical quantum experiments, while "C-Data" are efficiently producible by classical Turing machines.

Conclusion
In this paper, we study the power of classical and quantum learning algorithms in solving LO-QPR problems. Specifically, we prove that under widely accepted assumptions, there exist LO-QPR problems that cannot be efficiently solved by classical machine learning with classical data. We then prove that LO-QPR problems can be efficiently solved by leveraging the QKA algorithm with quantum data. Furthermore, we provided strong numerical evidence showing that the LO-QPR problems can be solved by the QKA algorithm with quantum data; in some cases, the QKA algorithm succeeded even with only classical data. Based on the above theoretical and simulation results, we discussed the complexity relationships of four different machine learning classes in terms of the training data resources and the learning algorithm. We believe the proposed complexity classification helps in understanding the power and limitations of classical and quantum learning algorithms.

² Bounded-error Probabilistic Polynomial time (BPP): the class of decision problems solvable by a probabilistic Turing machine in polynomial time such that, if the answer is 'yes', it accepts with probability at least 2/3, and if the answer is 'no', it accepts with probability at most 1/3.
This work leaves room for further research. For example, our numerical results witnessed the possibility of efficiently solving some LO-QPR problems by QML with classical data, then whether theoretical guarantees exist in showing that the LO-QPR problem belongs to the "Q-Learning Alg.+C-Data" class deserves to be further investigated. Finally, exploring the influences of noisy quantum channels on the effectiveness of quantum learning algorithms in solving LO-QPR would be important in practice.

Author Contributions
Y. Wu and B. Wu contributed equally to this work. All authors contributed to the discussion of results and writing of the manuscript.

A Comparison to related works
Refs. [26, 27] focused on designing efficient measurement protocols that learn from an unknown density matrix and then predict its linear properties using the accumulated measurement results, a learning analogue of the shadow tomography problem shown in Task 1.
The authors of [26, 27] proved that the entangled Bell-measurement protocol can efficiently solve Task 1, while estimating Tr(ρQ) for some Q ∈ P is classically hard in the worst case. Note that the claimed quantum advantages might disappear without entangled measurements when Q represents a global observable. In contrast, the quantum advantage proposed in this paper does not depend on entangled measurements across multiple copies of quantum states; instead, the power of the quantum-enhanced feature space plays the essential role.
In Task 4, the training set S only contains external parameters a and corresponding phase values b rather than an artificially designed density matrix. Furthermore, the order parameter observable M is unknown, and the learning protocol uses the Quantum Kernel Alphatron (QKA) to approximate M by extracting abstract patterns from the data set S = {(a_i, b_i)}_{i=1}^N, then uses the approximated M to construct a prediction model h(a). We rigorously proved that there exists a testing set T = {(x_i, y_i)}_{i=1}^M such that 8/9 of the y_i in T cannot be efficiently predicted by any classical ML algorithm under standard complexity assumptions. In Table 2, we summarize the main differences between previous works and this paper.

From the above comparison, it is clear that previous approaches focus on learning from a single density matrix, whereas our paper learns patterns from a series of external parameters and their corresponding quantum phases. The Bell-measurement methods of [26, 27] therefore might not be directly applicable to the problem studied in this paper.
Recently, Huang et al. [17] utilized an unsupervised learning method to learn from samples of the provided ground states, which can be summarized as Task 2. We show that the sample complexity T required by shadow-tomography-based classical ML is expected to increase exponentially with the system size when the order parameter M acts on O(n) qubits. As discussed in [36], only a few LO-QPR problems determined by a global observable admit a few-body observable approximation. Therefore, it is reasonable to consider a scenario where the shadow-tomography-formed training data are provided by a quantum computer, and the quantum phase transition can only be determined by a non-local order parameter M. Given an unknown ground state |ψ(x)⟩, we discuss whether a classical learner can predict its quantum phase. Following the classical learning steps (a) and (b) described above, in the worst case L(M) = n it is expected to require an exponential number of samples to solve the LO-QPR problem. This example implies that a classical learner (without a quantum computer) might not efficiently predict a quantum phase transition that can only be determined by non-local order parameters. In contrast, our work utilizes quantum machine learning to extract high-level abstractions from observed data and directly process quantum ground-state information on a quantum computer. Here, the ground state |ψ(a)⟩ of H(a) embeds the classical external parameter a into a specific quantum-enhanced feature space, where inner products of such quantum feature states give rise to a quantum kernel, a metric characterizing distances in the feature space. As a result, predicting the ground-state property can be transformed into a quantum state overlap computation, thus bypassing the exponential sample complexity.
We emphasize that our definition of "classical ML" differs from that in [17]. In our paper, we discuss the complexity relationship of four categories in terms of the method that produces the training data and the learning algorithm. Here, "Q-Learning Alg." refers to the use of a quantum computer, while "C-Learning Alg." relies only on a classical computer; "Q-Data" represents learning data directly observed from physical quantum experiments, while "C-Data" are efficiently producible by classical Turing machines. In this paper, a separation is proved between C-Learning Alg. + C-Data and Q-Learning Alg. + Q-Data.
Finally, we point out that C-Learning Alg. + C-Data represents a nontrivial class. As shown in [77], the power of classical learning algorithms gradually grows with the accumulation of training (advice) data, and the set of problems solvable by classical learning algorithms is defined as the BPP/poly class. As the training set grows, the learner obtains more and more advice data, and the BPP/poly class converges to the P/poly class. It has been proved that BPP ⊆ BPP/poly ⊆ P/poly. Hence, a machine learning task where some data (even classically generated) is provided can differ considerably from commonly studied computational tasks. In this manuscript, we demonstrate quantum advantages by introducing quantum computational resources into learning algorithms; our main contribution is to prove that there exist LO-QPR problems that cannot be efficiently solved by any 'C-Learning Alg. + C-Data', whereas 'Q-Learning Alg. + Q-Data' can solve them efficiently, which illustrates a quantum advantage.

We first review the method for constructing a quantum random circuit. As presented in Ref. [30], a quantum random circuit can be constructed iteratively in realistic physical experiments. The construction starts with an initial layer of Hadamard gates to rotate into the X basis, and the next D layers alternately insert controlled-Z (CZ) configurations. One-qubit gates, randomly sampled from the set {X^{1/2}, Y^{1/2}, T}, are placed between consecutive CZ configurations. Theoretically, the brickwork architecture can also be used to generate quantum random circuits. A brickwork circuit is formed as follows: perform a string of 2-qubit gates U_1 ⊗ U_2 ⊗ ... ⊗ U_{n/2} as the first layer, then perform a staggered string of gates, as illustrated in Fig. 2 (a) of the main file.
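The brickwork construction described above can be sketched with a small state-vector simulation. The Haar-random two-qubit gates below are drawn via QR decomposition of a Gaussian matrix (a standard construction, not the experimental gate set of Ref. [30]), and the qubit count and depth are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_2q(rng):
    # Haar-random 2-qubit gate via QR of a complex Gaussian (Ginibre) matrix
    z = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))    # phase fix makes the distribution Haar

def apply_2q(state, gate, i, n):
    # apply a 2-qubit gate on adjacent qubits (i, i+1) of an n-qubit state vector
    psi = state.reshape((2 ** i, 4, 2 ** (n - i - 2)))
    return np.einsum('ba,iaj->ibj', gate, psi).reshape(-1)

def brickwork_state(n, depth, rng):
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0                               # start from |0^n>
    for layer in range(depth):
        for i in range(layer % 2, n - 1, 2):     # staggered ("brickwork") layers
            state = apply_2q(state, haar_2q(rng), i, n)
    return state

psi = brickwork_state(n=6, depth=8, rng=rng)
```

Each layer shifts the gate positions by one qubit, reproducing the staggered pattern of Fig. 2 (a); unitarity of the sampled gates guarantees the output stays normalized.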

Definition 3 (Haar random quantum circuit). Let A be an architecture over circuits and let the gates in the architecture be U_1, . . . , U_R. Define the distribution H_A over circuits in A by drawing each 2-qubit gate U_i independently from the Haar measure. Then construct the unitaries along the edges of A; each circuit so constructed is called a Haar random quantum circuit.
In the field of quantum computation, the variational quantum circuit is a popular method for approximating the ground state of H(x). The key idea is that a parameterized quantum state |Ψ(θ)⟩ is prepared and measured on a quantum computer, and a classical optimizer updates the parameters θ according to the measurement information. The quantum state is prepared as |Ψ(θ)⟩ = U(θ)|0^n⟩, where U(θ) is composed of D unitaries U_d(θ_d). Note that the variational quantum circuit U(θ) has the same architecture as the random circuit U, and the two-qubit gates U_i(θ_i) are sampled from a subset of SU(4). Hence the relationship U_A(θ) ⊆ U_A holds, where U_A(θ) and U_A denote the set of circuits U(θ) and of random quantum circuits U based on A, respectively. We now show how to utilize an instance U(θ) ∈ U_A(θ) ⊆ U_A to generate ground states of a family of Hamiltonians H(x) [45]. The ground state |ψ(x)⟩ of H(x) can be obtained from the imaginary time evolution, |ψ(x)⟩ = lim_{β→∞} A(β) e^{−βH(x)}|0^n⟩, where β indicates the inverse temperature and A(β) = 1/√(⟨0^n|e^{−2βH(x)}|0^n⟩) is the normalization factor. Projecting the imaginary-time Schrödinger dynamics onto the variational circuit state space, the parameter dynamics is governed by an equation of motion involving the energy E_β(x) = ⟨η(β, x)|H(x)|η(β, x)⟩, where θ(β) denotes the variational parameters in the circuit U(θ).
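The imaginary-time relation above can be checked numerically on a small example. The 3-qubit transverse-field Ising Hamiltonian below is a hypothetical stand-in for H(x), and the exact diagonalization replaces the variational circuit:

```python
import numpy as np

def kron_all(ops):
    out = ops[0]
    for o in ops[1:]:
        out = np.kron(out, o)
    return out

I2, X, Z = np.eye(2), np.array([[0., 1.], [1., 0.]]), np.diag([1., -1.])
n = 3
# hypothetical 3-qubit transverse-field Ising Hamiltonian standing in for H(x)
H = -sum(kron_all([Z if k in (i, i + 1) else I2 for k in range(n)]) for i in range(n - 1))
H -= sum(kron_all([X if k == i else I2 for k in range(n)]) for i in range(n))

w, v = np.linalg.eigh(H)
ground = v[:, 0]                       # exact ground state for comparison

phi0 = np.zeros(2 ** n); phi0[0] = 1.0  # |0^n>
beta = 30.0
# e^{-beta H}|0^n> via eigendecomposition (shifted by w[0] for numerical stability)
evolved = v @ (np.exp(-beta * (w - w[0])) * (v.T @ phi0))
evolved /= np.linalg.norm(evolved)      # the A(beta) normalisation

overlap = abs(ground @ evolved)
```

As β grows, the excited components are exponentially suppressed and the normalized evolved state converges to the ground state, provided the initial state has nonzero overlap with it.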

C Proof of theorems
Here, we provide technical details for the proof of theorems in the main text.

C.1 Proof of Lemma 1
We first review several lemmas and assumptions which are closely related to our proof.
(Stockmeyer approximate counting [83]) There exists a BPP^{NP} algorithm that outputs a multiplicative approximation for the value if the function f can be computed efficiently given x.
Conjecture 1 (restated). There exists an n-qubit quantum circuit U such that the following task is #P-hard: approximate p_U(j) = |⟨j|U|0^n⟩|^2 to additive error c/2^n with probability 3/4 + 1/poly(n), where j is a {0, 1}^n bit string and c = 1/poly(n).
Here, a candidate worst-case U ∈ C^{2^n × 2^n} is a unitary of size m ≤ poly(n), where each basic gate is a two-qubit gate following some fixed gate-position architecture A; we denote this distribution by H_A. Note that the presented conjecture asserts that it is #P-hard to compute anything in an interval of radius 1/(2^n poly(n)) around the point p_U(j) for most choices of U; Bouland et al. proved that it is #P-hard to compute a truncated quantity that is close to p_U(j) up to an exponentially small error. Since this hardness interval is completely contained within the domain of conjectured hardness, their result is necessary for the conjecture. Therefore, if this conjecture holds, computing most of the p_U(j) is #P-hard, and the worst-case and average-case quantum circuit instances share the same property in the architecture A.
Let o_s = ⟨Ψ|Z_1^{s_1} ⊗ · · · ⊗ Z_n^{s_n}|Ψ⟩, so that o_s/2^n is the Fourier transform of p(j). Based on the algebraic symmetry between p(j) and o_s/2^n, we have p(j) = (1/2^n) Σ_s (−1)^{s·j} o_s. If o_s could be efficiently approximated by a classical computer given s, there would exist a BPP^{NP^{BPP}} algorithm approximating p(j) with multiplicative error 1/poly(n), based on a theorem by Stockmeyer [83]. Considering BPP ⊆ P/poly and that approximating p(j) is #P-hard, this yields P^{#P} ⊆ BPP^{NP^{BPP}} ⊆ BPP^{NP}/poly. Since NP^{NP} ⊆ P^{#P}, one has NP^{NP} ⊆ BPP^{NP}/poly, which implies that PH collapses to the second level [84]. Therefore, under the assumption that PH does not collapse and Conjecture 1 holds, there is no classical algorithm that can efficiently calculate o_{s*} for some s* ∈ {0, 1}^n. Without loss of generality, let M(s*) = Z_1^{s*_1} ⊗ Z_2^{s*_2} ⊗ · · · ⊗ Z_n^{s*_n}, and let the physical order parameter observable of interest be M_t = P_1 ⊗ · · · ⊗ P_n (for example, the ferromagnetic parameter ⊗_i X_i or the SPT parameter Z_i X_{i+1} X_{i+3} ... X_{j−3} X_{j−1} Z_j). Since a Clifford gate C maps a Pauli operator to another Pauli operator, the target order parameter observable can be expressed as M_t = C† M(s*) C, where C represents a Clifford circuit. Therefore, for any target n-qubit Pauli observable M_t, the expectation value satisfies ⟨ψ(x)|M_t|ψ(x)⟩ = ⟨ψ(x)|C† M(s*) C|ψ(x)⟩ = ⟨ψ(y)|M(s*)|ψ(y)⟩ with |ψ(y)⟩ = C|ψ(x)⟩. The last equality is valid because CH(x)C† belongs to the Hamiltonian family H: if |ψ(x)⟩ is the ground state of H(x), then C|ψ(x)⟩ is the ground state of CH(x)C†. Therefore, for any order parameter observable M_t ∈ P of interest, there exists a ground state |ψ(y)⟩ ∈ H whose corresponding quantum phase value is classically hard.
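The Fourier relation between the output distribution p(j) and the Z-string expectations o_s used in this argument can be verified directly on a small random state:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
psi = rng.standard_normal(2 ** n) + 1j * rng.standard_normal(2 ** n)
psi /= np.linalg.norm(psi)
p = np.abs(psi) ** 2                       # output distribution p(j)

def parity(j, s):
    return (-1) ** bin(j & s).count('1')   # (-1)^{s.j} over bit strings

# o_s computed directly as <psi| Z^{s_1} x ... x Z^{s_n} |psi> (Z-strings are diagonal) ...
o_direct = [np.real(np.vdot(psi, np.array([parity(j, s) for j in range(2 ** n)]) * psi))
            for s in range(2 ** n)]
# ... and via the Fourier (Walsh-Hadamard) sum over p(j)
o_fourier = [sum(parity(j, s) * p[j] for j in range(2 ** n)) for s in range(2 ** n)]
```

The two computations agree for every s, and o_0 = Σ_j p(j) = 1 recovers normalization, matching the symmetry between p(j) and o_s/2^n exploited in the proof.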

C.2 Complexity argument for the power of data
Here, we review the power of classical ML algorithms that can learn from data by means of a complexity class, defined as BPP/poly in Ref. [77]. A language L of bit strings is in BPP/poly if and only if the following holds. Suppose M and D are two probabilistic Turing machines, where D generates samples x with |x| = n in polynomial time for any size n, and D defines a sequence of input distributions {D_n}. M takes an input x of size n along with a polynomial-size training set {(x_i, y_i)}, where each x_i is sampled from D_n using D and y_i denotes the corresponding label: if x_i ∈ L, one has y_i = 1, else y_i = 0. Specifically, one requires: (1) The probabilistic Turing machine M processes all inputs x in polynomial time. (2) For all x ∈ L, M outputs 1 with probability at least 2/3.
(3) For all x ∉ L, M outputs 1 with probability at most 1/3. From the above definition, we see that BPP is contained in this complexity class. We now detail the separation between classical ML algorithms with classical data and BPP. Consider an undecidable language L_h = {1^n | n ∈ A}, where A is a subset of the natural numbers, and consider a classically easy language L_e ∈ BPP. Assume that for every input size n there exist an input a_n ∈ L_e and an input b_n ∉ L_e. Then a new language L can be defined: for each size n, if 1^n ∈ L_h, the language L includes all x ∈ L_e with |x| = n; otherwise, L includes all x ∉ L_e with |x| = n. Consequently, if one could decide whether x ∈ L for an input x using a classical algorithm, one could decide whether 1^n ∈ L_h by checking whether x ∈ L_e. This is impossible since the language L_h is undecidable, hence L is not in BPP. On the other hand, if training data {(x_i, y_i)} are provided, where the label y_i indicates whether x_i belongs to L, then we can decide whether 1^n belongs to L_h. Based on the above discussion, the power of classical learning algorithms gradually grows with the accumulation of training (advice) data, and the set of problems solvable by classical learning algorithms is defined as the BPP/poly class. As the training set grows, the learner obtains more and more advice data, and the BPP/poly class converges to the P/poly class. Hence, a machine learning task where some data is provided can differ considerably from commonly studied computational tasks. In this manuscript, we demonstrate quantum advantages by introducing quantum computational resources into learning algorithms, and our main contribution is to rigorously prove that no 'C-Learning Algorithm + C-Data' can solve the quantum phase learning problem.
However, the 'Q-Learning Algorithm + Q-Data' can efficiently solve this learning problem which thus illustrates quantum advantages.

C.3 Proof of Theorem 1
The following lemma gives an average-case hardness for the quantum phase computation problem.

Lemma 3.
With the assumption that Conjecture 1 holds and that the PH in computational complexity theory does not collapse, it is classically hard to approximate 8/9 of the instances of the quantum phase computation problem for a given n-qubit Hamiltonian H(a) to additive error ε = 1/poly(n), where the ground state is |ψ(a)⟩ = U(θ(a))|0^n⟩ with U(θ(a)) ∈ U_A(θ).
Proof. Suppose we take a worst-case ground state |Ψ⟩ = U(θ)|0^n⟩ of H(a), generated by a variational quantum circuit U(θ) ∈ U_A(θ), such that computing p(j) = |⟨j|Ψ⟩|^2 to within additive error 2^{−poly(n)} is #P-hard (based on Conjecture 1 in the main file). The two-qubit gate structure of U(θ) = U_{DL}(θ_{DL}) · · · U_1(θ_1) is specified by A, where U_r(θ_r) denotes the r-th two-qubit gate for r ∈ [DL], θ = (θ_1, . . . , θ_{DL}), and θ_r ∈ R^{15}. Denote R = DL; each two-qubit gate can be written as U(θ_r) = exp(−i Σ_{j_1, j_2} θ_r(j_1, j_2) P_{j_1} ⊗ P_{j_2}), where P_j ∈ {I, X, Y, Z} and each θ_r(j_1, j_2) ∈ [0, 2π]. Truncating the Taylor series of this exponential at degree K gives an approximation U(θ_r)_tr, and for arbitrary bit strings x, y the standard Taylor remainder bound yields |⟨x|U(θ_r) − U(θ_r)_tr|y⟩| ≤ κ/K! for some constant κ. Therefore, for an arbitrary observable M, the expectation value ⟨0^n|U†(θ)MU(θ)|0^n⟩ can be expanded as a sum over intermediate bit strings y_1, y_2, ..., which represent the Feynman integration paths. Since each transition amplitude ⟨y_r|U(θ_r)|y_{r−1}⟩ can be approximated by a polynomial of degree K via Taylor truncation, the expectation value can be rewritten in terms of functions f_r, where f_r is a polynomial of degree RK. Furthermore, f_r(θ_1, ..., θ_R) can be approximated by a low-degree function with at most C_R^q (K)^q terms, where q = O(1) and C_R^q denotes the binomial coefficient. Let f_r(θ) = Σ_i α_i θ_1^{i_1} · · · θ_R^{i_R}, where i = (i_1, ..., i_R) and each i_l ∈ [K], l ∈ [R]. For every term α_i θ_1^{i_1} · · · θ_v^{i_v} with i_1 = · · · = i_v = K and v > q, its coefficient satisfies α_i ≤ 1/(K!)^q (by the Taylor series bound). Therefore, taking q = O(1), R = O(n^2) and K = poly(n), the truncation f̃_r provides an estimate of f_r within 2^{−poly(n)} additive error, and f̃_r has only C_R^q (K)^q = poly(n) terms. Eq. 26 can then be represented by a multi-variable polynomial function f(θ, M) with R variables and at most poly(n) terms, and the corresponding approximation relationship holds. Suppose the variational quantum circuit U(θ) is powerful enough that it can prepare the ground state of the concerned Hamiltonian H(a).
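The κ/K! style truncation bound can be illustrated on a single parameterized rotation; the gate exp(−i θ Y / 2) and the constants below are illustrative, not those of the proof:

```python
import numpy as np

Y = np.array([[0., -1j], [1j, 0.]])

def truncated_rotation(theta, K):
    # degree-K Taylor truncation of exp(-i theta Y / 2)
    A = -1j * (theta / 2) * Y
    term = np.eye(2, dtype=complex)
    total = np.eye(2, dtype=complex)
    for k in range(1, K + 1):
        term = term @ A / k          # accumulates A^k / k!
        total = total + term
    return total

theta = 1.3
# exact rotation via Euler's formula for a Pauli generator
exact = np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * Y
errors = [np.max(np.abs(truncated_rotation(theta, K) - exact)) for K in (2, 4, 8)]
```

The error falls factorially with the truncation degree K, which is the mechanism that lets the proof replace each gate amplitude by a low-degree polynomial with only 2^{−poly(n)} loss.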
The 'worst-to-average-case' reduction is achieved by contradiction. Since the variational quantum circuit can generate a ground state set G for a family of Hamiltonians H(a), suppose there exists a classical algorithm O which can efficiently approximate 8/9 of the phase values {b_i}. This implies that for at least 2/3 of the choices of {U(θ(a_i))}, O correctly approximates b_i = ⟨0^n|U†(θ(a_i)) M U(θ(a_i))|0^n⟩. According to the assumption on G, the variational quantum circuit provides a map a_i → θ(a_i). From Eq. 28, one can fit a polynomial function in θ that recovers the value of ⟨0^n|U†(θ)MU(θ)|0^n⟩ by using {θ(a_i)}. However, according to Lemma 1, successfully approximating ⟨0^n|U†(θ)MU(θ)|0^n⟩ (the worst-case scenario) with a BPP algorithm would imply a PH collapse. Hence it is hard to approximate 8/9 of the {b_i}. The pairs (θ(a_i), b_i) can then be used to construct a testing set in which |ψ(x_i)⟩ = U(θ(a_i))|0^n⟩ and y_i = b_i.
Note. One might think that the above procedure could inspire a classical learning algorithm for predicting a hard quantum phase using quantum data; however, it cannot be directly used to solve LO-QPR problems. The reason is that the above procedure fits a polynomial function in θ rather than in the external parameter a from the training data set, so the proof is not sufficient to establish the efficiency of C-Learning Alg. + Q-Data on LO-QPRs determined by a global linear observable.
Proof of Theorem 1. Our proof depends on the quantum circuit representation of the concerned ground states. We first describe the construction of a testing set T. According to the construction in Sec. B, the variational quantum circuit state |Ψ(θ(x))⟩ can approximate a ground state |ψ(x)⟩ of the Hamiltonian H(x). Starting from |Ψ(θ(x))⟩, we want to approximate the ground state |ψ(x_i + δx)⟩ of H(x_i + δx) at external parameter x_i + δx by |Ψ(θ(x) + δθ)⟩. The value of δθ = (δθ_1, δθ_2, ..., δθ_{DL}) is determined by minimizing the distance L^2(δθ) between |Ψ(θ(x) + δθ)⟩ and the target ground state, where ||·|| denotes the fidelity norm. Focusing on the m-th variable δθ_m, the stationarity conditions of L^2(δθ) yield a set of linear equations with coefficients B_{s,m} and E_m. Once each element is estimated, the variation of parameters δθ can be efficiently computed by solving the linear system B(θ(x)) δθ = E(θ(x)), where B(θ(x)) = (B_{s,m})_{DL×DL} and E(θ(x)) = (E_1, ..., E_{DL})^T. Provided the real symmetric matrix B is non-singular, its inverse exists, and θ(x) is updated by θ(x + δx) = θ(x) + δθ. The ground state |ψ(x_i + δx)⟩ is then approximated by |Ψ(θ(x) + δθ)⟩. Iterating this procedure, one constructs a series (|Ψ(θ(x))⟩, |Ψ(θ(x) + δθ)⟩, ...) representing the ground states |ψ(x)⟩, |ψ(x + δx)⟩, ... of the family of Hamiltonians H(x), and this yields a testing set T based on the Hamiltonian H(x) and the architecture A. It remains to prove that no efficient classical ML algorithm can predict y_i for x_i ∈ T (constructed in Lemma 3) with probability 8/9. The basic idea is: if a classical ML algorithm could predict all y_i ∈ T, then we could design an efficient classical algorithm solving the worst-case-hard GLP problem, which is a contradiction. Given the classical training set S (C-Data), the power of classical ML is characterized by the BPP/samp class [77].
Suppose there exists a classical ML algorithm that can efficiently predict 8/9 of the y_i ∈ T. Given this algorithm and the vector θ(x) (the parameter in the worst-case scenario), one can fit a polynomial function in θ that recovers the value of ⟨0^n|U†(θ(x))MU(θ(x))|0^n⟩, which is the worst-case scenario. According to Lemma 1, an algorithm O that approximates the worst-case value ⟨0^n|U†(θ)MU(θ)|0^n⟩ with 1/poly(n) additive error implies a BPP^{NP^O} algorithm that approximates p(j) = |⟨j|Ψ⟩|^2 = |⟨j|U(θ)|0^n⟩|^2 with multiplicative error 1/poly(n), based on a theorem by Stockmeyer [83]. Therefore, if a classical ML algorithm with classical data could efficiently predict 8/9 of the y_i ∈ T, there would exist a BPP^{NP^{BPP/samp}} algorithm approximating p(j) with multiplicative error. Considering BPP/samp ⊆ P/poly and that approximating p(j) is #P-hard, this yields P^{#P} ⊆ BPP^{NP^{BPP/samp}} ⊆ BPP^{NP}/poly. Since NP^{NP} ⊆ P^{#P}, one has NP^{NP} ⊆ BPP^{NP}/poly, which implies that PH collapses to the second level [84]. Hence, under the assumption that PH does not collapse, classical machine learning with classical resources cannot solve LO-QPRs even in the average-case scenario on T.
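The linear-solve update δθ = B^{−1}E used in the construction above can be illustrated on a toy one-qubit example. The Hamiltonian H(x) = xX + Z, the two-parameter state, and the finite-difference construction of B and E are hypothetical simplifications introduced for illustration:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0., 1.], [1., 0.]])
Y = np.array([[0., -1j], [1j, 0.]])
Zp = np.diag([1., -1.])

def psi_var(theta):
    # |Psi(theta)> = Rz(theta2) Ry(theta1) |0>, a 2-parameter stand-in for U(theta)|0^n>
    Ry = np.cos(theta[0] / 2) * I2 - 1j * np.sin(theta[0] / 2) * Y
    Rz = np.cos(theta[1] / 2) * I2 - 1j * np.sin(theta[1] / 2) * Zp.astype(complex)
    return Rz @ Ry @ np.array([1., 0.], dtype=complex)

def ground_state(x):
    w, v = np.linalg.eigh(x * X + Zp)   # toy Hamiltonian H(x) = x X + Z
    return v[:, 0]

def dist2(theta, target):
    # infidelity-based distance L^2
    return 1.0 - abs(np.vdot(psi_var(theta), target)) ** 2

def update(theta, target, h=1e-4):
    # build E (negative gradient) and symmetric B (Hessian) by central finite
    # differences, then solve the linear system B dtheta = E
    d = len(theta)
    E, B = np.zeros(d), np.zeros((d, d))
    for m in range(d):
        em = np.zeros(d); em[m] = h
        E[m] = -(dist2(theta + em, target) - dist2(theta - em, target)) / (2 * h)
        for s in range(d):
            es = np.zeros(d); es[s] = h
            B[s, m] = (dist2(theta + es + em, target) - dist2(theta + es - em, target)
                       - dist2(theta - es + em, target) + dist2(theta - es - em, target)) / (4 * h * h)
    return theta + np.linalg.solve(B, E)

g0 = ground_state(0.5)
theta = np.array([2 * np.arctan2(g0[1], g0[0]), 0.0])  # exact theta(x) at x = 0.5
target = ground_state(0.6)                             # ground state at x + dx
init_dist = dist2(theta, target)
for _ in range(2):
    theta = update(theta, target)
```

Starting from the parameters of a nearby ground state, two linear-solve steps already track the new ground state to high fidelity, which is the mechanism the iterative construction of T relies on.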

C.4 Proof of Theorem 3
Proof sketch of Theorem 3. Let w = Σ_{i=1}^N α_i |ψ(a_i)⟩ ⊗ |ψ(a_i)⟩*, where |ψ(·)⟩* is the conjugate of |ψ(·)⟩, and let the reproducing-kernel feature vector be Ψ(a_i) = |ψ(a_i)⟩ ⊗ |ψ(a_i)⟩*. By the definitions of w and Ψ(a_i), E[b_j | a_j] = ⟨w, Ψ(a_j)⟩ + g(a_j). Hence, the theorem follows by substituting the quantum kernel Q and feature map Ψ into the generalization bound of Eq. 41, which holds with probability 1 − δ, where h_t(x) and ĥ_t(x) represent the ideal QKA model and the estimated QKA model at the t-th iteration step.
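The doubled feature vector Ψ(a) = |ψ(a)⟩ ⊗ |ψ(a)⟩* makes the quantum kernel an honest inner product, since ⟨Ψ(a), Ψ(b)⟩ = |⟨ψ(a)|ψ(b)⟩|². A quick numerical check with random states standing in for the ground states:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_state(n, rng):
    v = rng.standard_normal(2 ** n) + 1j * rng.standard_normal(2 ** n)
    return v / np.linalg.norm(v)

psi_a, psi_b = random_state(3, rng), random_state(3, rng)

feature = lambda psi: np.kron(psi, psi.conj())   # Psi(a) = |psi(a)> (x) |psi(a)>*
lhs = np.vdot(feature(psi_a), feature(psi_b))    # <Psi(a), Psi(b)>
rhs = abs(np.vdot(psi_a, psi_b)) ** 2            # fidelity kernel Q(a, b)
```

The inner product of the doubled vectors is real, non-negative, and equal to the fidelity kernel, which is why the QKA risk analysis can treat Q as a positive-definite kernel.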
The anticipated range of Eq. 41 matches our simulation results, as can be checked from the generalization risk R̂_L(ĥ*) at t = T, demonstrated in Fig. 4 (b) and the Supplementary Material. For example, the theoretical upper bound for detecting SPT is R̂_L(ĥ*) ≤ 0.483 (N = 40, δ = 0.1), and the numerical risk drops below 0.483 after 15 iteration steps, which is consistent with the anticipated bound.

Lemma 4 ([51]). Fix a data distribution (x, y) ∼ D, a kernel function Q, and a training data size N. Then, with probability at least 1 − δ, the stated generalization bound holds.
Proof. Notice that if the quantum kernel Q can be exactly calculated, then by Goel and Klivans [52], the Quantum Kernel Alphatron in the main file outputs a hypothesis h* whose risk is bounded; this follows from bounding R̂(h_{t*}) for some t* ≤ T = O(N/log(1/δ)). Nevertheless, if the quantum kernel Q is approximated by executing quantum circuits, Eq. (44) should be replaced by its counterpart for the estimated hypothesis ĥ_{t*}, where Q̂ is the approximation of Q. In the following, we prove that R̂(ĥ_{t*}) − R̂(h_{t*}) is bounded, and hence that R(h*) is bounded by combining Eqs. (43), (44) and (45). By Theorem 3 in the main file, Q̂(a_i, x) is an ε_Q-approximation of Q(a_i, x) with high probability. For convenience, in the remainder of the proof we require that for all i, the deviations δ_Q^i = Q̂(a_i, x) − Q(a_i, x) are identical, and likewise the deviations δ_α^{t,i} = α̂_i^t − α_i^t; denote them by δ_Q and δ_α^t, respectively. Since the δ_Q^i are of the same order for all i ∈ [N] (and similarly for δ_α^{t,i}), these assumptions are reasonable, and the same upper bound on R(h*) can be derived without them at the cost of a more tedious proof. Using the facts that 0 ≤ Q̂_i ≤ 1 and A_k ≤ (k−1)/N, the absolute value of δ_α^t can be bounded; the first inequality follows from the definitions of R̂(h_{t*}) and R̂(ĥ_{t*}), and the last inequality holds since t = O(N/log(1/δ)) and G = Ω(1). Combining the resulting inequality with Lemma 4, the upper bound on the generalization error is obtained.

D Implementation of quantum kernel with SWAP test
By the Chernoff bound, the quantum kernel can be approximated by independently performing the Destructive Swap Test [74] on O(log(1/δ)/ε_Q^2) copies of the 2n-qubit state |φ(a_i)⟩ ⊗ |φ(x)⟩, with additive error ε_Q and failure probability δ. The expectation of the Destructive-Swap-Test measurement outcome is ⟨φ(a_i) ⊗ φ(x)|SWAP|φ(a_i) ⊗ φ(x)⟩ = |⟨φ(a_i)|φ(x)⟩|^2, where SWAP|φ(a_i)⟩ ⊗ |φ(x)⟩ = |φ(x)⟩ ⊗ |φ(a_i)⟩ denotes the 2n-qubit swap operator. For the QPL problem, |φ(a_i)⟩ and |φ(x)⟩ can all be generated with polynomial-size circuits, hence the Destructive Swap Test can be performed efficiently.
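The sampling cost can be sketched statistically: the swap-test outcome is ±1 with mean equal to the fidelity, so a Chernoff-style shot count suffices. The simulation below samples outcomes from that distribution directly rather than simulating the circuit, and the error and failure-probability values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_state(n, rng):
    v = rng.standard_normal(2 ** n) + 1j * rng.standard_normal(2 ** n)
    return v / np.linalg.norm(v)

phi_a, phi_x = random_state(4, rng), random_state(4, rng)
F = abs(np.vdot(phi_a, phi_x)) ** 2       # true fidelity |<phi(a)|phi(x)>|^2

eps_Q, delta = 0.05, 0.01
# Chernoff-style shot count ~ O(log(1/delta)/eps_Q^2)
shots = int(np.ceil(2 * np.log(2 / delta) / eps_Q ** 2))
# the (destructive) swap test yields +/-1 with Pr(+1) = (1 + F)/2
outcomes = rng.choice([1.0, -1.0], size=shots, p=[(1 + F) / 2, (1 - F) / 2])
F_hat = outcomes.mean()
```

With a few thousand shots the empirical mean estimates the fidelity well within the target error, matching the stated copy complexity.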

E Discussions on sample complexity of ground states
Here, we carry out further theoretical analysis and numerical calculations on the output probability distributions of the ground states |ψ_g(x)⟩ of the parameterized Hamiltonian H(x). We aim to provide a numerical window of x for which the corresponding Hamiltonian simulation is expected to be classically hard.
The probability distribution of a truly random quantum state |ψ⟩ follows the Porter-Thomas (PT) distribution Pr(p) = 2^n e^{−2^n p} with p = |⟨j|ψ⟩|^2, which is known to be classically hard to sample [30, 50]. In the following, we compare the output probability distributions of ground states |ψ_g(x)⟩ of the parameterized Hamiltonian H(x) with the Porter-Thomas distribution.
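The Porter-Thomas form can be reproduced by sampling a Haar-random state as a normalized complex Gaussian vector; the rescaled probabilities 2^n p(j) should then behave like Exp(1) variables:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
v = rng.standard_normal(2 ** n) + 1j * rng.standard_normal(2 ** n)
psi = v / np.linalg.norm(v)            # Haar-random state (Gaussian + normalise)
scaled = 2 ** n * np.abs(psi) ** 2     # 2^n p(j): ~ Exp(1) under Porter-Thomas

mean_scaled = scaled.mean()            # exactly 1 by normalisation
var_scaled = scaled.var()              # close to 1, as for an Exp(1) variable
```

Both the mean and variance of the rescaled probabilities sit near 1, the signature of the exponential Porter-Thomas law used as the hardness benchmark below.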
Proof sketch: We argue by contradiction. On one hand, it is known to be #P-hard to approximate p(j) to additive error O(1/(n2^n)) with a constant probability [49]. On the other hand, if ε ≤ n^{−1} and there exists a classical algorithm A that can efficiently sample from |ψ_g(x)⟩, then A can be used to efficiently estimate p(j) to additive error O(1/(n2^n)) with probability 1/4. This leads to a contradiction; therefore, if ε ≤ n^{−1}, no such classical sampling algorithm A exists.
Taking everything together, the classical algorithm A combined with classical post-processing can provide an estimate of p(j) within additive error ε, valid with probability 1/4. Therefore, under the condition ε ≤ n^{−1}, and assuming A can efficiently sample from the ground state |ψ_g(x)⟩, there exists a classical algorithm that can estimate p(j) to O(1/(n2^n)) additive error with a constant probability.
However, we already know that it is #P-hard to estimate p(j) to O(1/(n2^n)) additive error with a constant probability. Then, if ε < n^{−1}, the existence of such a classical algorithm A would lead to a collapse of the Polynomial Hierarchy to its second level [84]. Therefore, no classical algorithm can efficiently sample from |ψ_g(x)⟩ if ε < n^{−1}.
To illustrate the usefulness of Theorem 4, we show numerical calculations for (1) the lattice transverse-field Ising model and (2) the Fermi-Hubbard model. The lattice transverse-field Ising model consists of transverse-field terms X_i, nearest-neighbour couplings Z_i Z_j, and longitudinal-field terms Z_i, where X_i and Z_i are Pauli operators on the i-th qubit and x = (W, J, F) determines the relative strengths of the Hamiltonian terms. The Fermi-Hubbard model is given by H_H(t, U) = −t Σ_{⟨i,j⟩,s} (a†_{i,s} a_{j,s} + a†_{j,s} a_{i,s}) + U Σ_i n_{i↑} n_{i↓}, where x = (t, U) determines the relative strengths of the Hamiltonian terms, a†_{i,s} and a_{i,s} are fermionic creation and annihilation operators, and n_{i↑} = a†_{i↑} a_{i↑} (similarly for n_{i↓}). The notation ⟨i, j⟩ in the first sum runs over sites adjacent in an (n_a × n_b) lattice, and s ∈ {↑, ↓}. Below, we denote both Hamiltonians by H(x) and analyze the probability distribution of |ψ_g(x)⟩, the ground state of H(x).
We use the trace distance Tr(p, q) ∈ [0, 1] as a measure between two distributions, with Tr(p, q) = 0 if and only if p = q. In Figure 10(a), we plot our results for the lattice transverse-field Ising model with n = 12, W = 1, and a range of J and F values. When the nearest-neighbour coupling strength J = 0, the corresponding ground states are classically solvable, and sampling from these ground states is easy for classical algorithms, as indicated by the large trace-distance values in Figure 10(a). On the other hand, J ≠ 0 induces complex ground states |ψ(x)⟩, resulting in much smaller trace distances. However, the ground states of the 2-dimensional lattice model do not saturate into the ε ≤ n^{−1} domain, which has been proved to be classically hard.
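For discrete output distributions, the trace distance used here coincides with the total-variation distance. A minimal implementation with two illustrative distributions (not the ones from Figure 10):

```python
import numpy as np

def trace_distance(p, q):
    # total-variation (trace) distance between two discrete distributions
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.abs(p - q).sum()

uniform = np.full(4, 0.25)
peaked = np.array([1.0, 0.0, 0.0, 0.0])
d_same = trace_distance(uniform, uniform)
d_far = trace_distance(uniform, peaked)
```

Identical distributions give distance 0, while a point mass against the uniform distribution on 4 outcomes gives 0.75, illustrating the [0, 1] range used for the comparisons against the Porter-Thomas distribution.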
In Figure 10(b), we plot ground states of the Hubbard model for U = 1 and a range of n = (n_a × n_b) and t values. We observe that varying t increases the sample complexity of |ψ_g(x)⟩ for different sizes of the Hubbard model. The ground states of the (2 × 5) Fermi-Hubbard model saturate into the ε = n^{−1} = 0.1 error domain. This also demonstrates that Fermi-Hubbard ground states are harder to sample than the lattice Ising model ground states of panel (a), which is consistent with physical intuition.

F Numerical comparison to related works

F.1 Comparison to QCNN
To provide a fair comparison, we first detail the computational overhead of both methods in each iteration step. For the proposed Algorithm 1, the quantum learner first calculates an N × N quantum kernel matrix with O(N^2/ε^2) sample complexity, using a quantum circuit with O(n) controlled-Z gates. After that, Algorithm 1 does not need any quantum resource in each iteration step. The QCNN loss function L(U, V, F) is defined in terms of the QCNN output f_{U,V,F}, where U, V, F represent the variational quantum circuits in the QCNN. At depth d, the QCNN method requires O(7n/2 (1 − 3^{1−d}) + n3^{1−d}) multi-qubit operations and 4d single-qubit rotations to produce one output. Repeating this quantum circuit N times, the QCNN obtains the loss L(U, V, F) in a single iteration step.
Consider a classification task on a set C = {c₁, c₂} of two classes in a supervised learning scenario. In such settings, a training set S and a testing set T are both assumed to be labeled by a map m : S ∪ T → C, and both S and T are provided to the learner, but only the training set S carries its labels. Formally, the learner has access only to the restriction m|_S of the indexing map. Suppose the learner outputs a model h(·) to predict the label of data x ∈ T; the classification result is defined by assigning c₁ if h(x) ≤ t₁ and c₂ if h(x) ≥ t₂, where t₁ and t₂ are selected thresholds. The accuracy of the model is then quantified by the classification success rate, i.e., the fraction of test points x ∈ T whose predicted class coincides with m(x). Since ref. [35] does not provide the exact accuracy of the QCNN, we simulate the QCNN based on the code at https://github.com/Jaybsoni/Quantum-Convolutional-Neural-Networks. The simulation results show that its output cannot perfectly fit the antiferromagnetic boundary, which is consistent with Fig. 4 in ref. [35]. Using the same testing set with 4096 data points, 116 points are misclassified, giving a classification accuracy of v_s = 0.971. The predicted phase diagram is illustrated in Fig. 9. We also numerically show that our method takes fewer steps to reach a relatively stable landscape than the QCNN method (see Fig. 11).
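The threshold decision rule and the success rate defined above can be sketched directly; the names `classify` and `success_rate` and the toy data are illustrative, not from the paper:

```python
def classify(h_value, t1, t2):
    """Two-class decision rule with thresholds t1 <= t2:
    assign c1 if h(x) <= t1, c2 if h(x) >= t2, else abstain."""
    if h_value <= t1:
        return "c1"
    if h_value >= t2:
        return "c2"
    return None  # undecided region between the thresholds

def success_rate(h, test_set, labels, t1, t2):
    """Fraction of test points whose predicted class matches its label."""
    hits = sum(1 for x, y in zip(test_set, labels)
               if classify(h(x), t1, t2) == y)
    return hits / len(test_set)

# Toy model: h(x) = x with thresholds 0.4 and 0.6.
xs = [0.1, 0.3, 0.5, 0.9]
ys = ["c1", "c1", "c2", "c2"]
rate = success_rate(lambda x: x, xs, ys, t1=0.4, t2=0.6)
assert rate == 0.75  # the point 0.5 falls in the undecided region
```

With 4096 test points and 116 misclassifications this rate evaluates to 3980/4096 ≈ 0.971, matching the v_s quoted above.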

F.2 Comparison to Ref. [17]
The method of ref. [17] cannot provide a classification result comparable to the QKA method (see Fig. 12(c) and Fig. 13(c)); some spurious quantum phase assignments appear. For example, Fig. 12(a) shows two misclassified ground states in the top right corner that do not correspond to any quantum phase, and a similar phenomenon is observed on the left side of Fig. 13(a).

G Review of Alphatron algorithm
In this section, we review the Alphatron algorithm [52] and compare the Alphatron, the Quantum Alphatron [88], and the Quantum Kernel Alphatron algorithm (this paper).
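The core of the Alphatron is an additive update of kernel weights toward the residual error. A minimal sketch of this iteration (an assumed simplified form, after Goel and Klivans; the function names and toy data are illustrative):

```python
import numpy as np

def alphatron(K, y, u, lam=1.0, steps=200):
    """Minimal Alphatron-style sketch: learn a hypothesis
    h(x) = u(sum_i alpha_i K(x_i, x)) by the additive update
    alpha_i += (lam / N) * (y_i - h(x_i)).
    K is the precomputed N x N kernel Gram matrix (classical or
    quantum), y the training labels, u a monotone link function."""
    N = len(y)
    alpha = np.zeros(N)
    for _ in range(steps):
        pred = u(K @ alpha)              # hypothesis on the training set
        alpha += (lam / N) * (y - pred)  # move weights toward the residual
    return alpha

# Toy run: identity link and identity kernel, so the fixed point
# of the update is alpha = y.
K = np.eye(3)
y = np.array([0.0, 0.5, 1.0])
alpha = alphatron(K, y, u=lambda z: z, lam=1.0, steps=200)
assert np.allclose(alpha, y, atol=1e-3)
```

In the quantum-kernel variant discussed below, only the Gram matrix K changes; the iteration itself stays fully classical, which is why no quantum resources are needed per step.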
The Alphatron algorithm requires poly(N, d, log(1/δ), t_K) running time to train a kernel model, where t_K is the running time of computing the kernel function. In the Quantum Kernel Alphatron algorithm, we let the kernel be the quantum kernel Q(x, a_i) = |⟨0^n| U(x)† U(a_i) |0^n⟩|². Since Q(x, a_i) is approximated via the SWAP test, the risk bound in Eq. (64) does not hold directly. Nevertheless, we prove that with O(N^{5/2}) copies of quantum states for each training data point, the risk of the Quantum Kernel Alphatron is also bounded.
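The quantum kernel Q(x, a_i) is the squared overlap of two feature states. A minimal sketch evaluating it from exact statevectors (on hardware this overlap would instead be estimated via the SWAP test); the single-qubit feature map below is a made-up illustration:

```python
import numpy as np

def quantum_kernel(psi_x, psi_a):
    """Fidelity-style quantum kernel Q(x, a) = |<0|U(x)^dag U(a)|0>|^2,
    computed here from exact statevectors psi_x = U(x)|0^n> and
    psi_a = U(a)|0^n>."""
    return abs(np.vdot(psi_x, psi_a)) ** 2

# Toy single-qubit feature map: U(theta)|0> = [cos(theta), sin(theta)].
def feature_state(theta):
    return np.array([np.cos(theta), np.sin(theta)])

k_same = quantum_kernel(feature_state(0.3), feature_state(0.3))
k_diff = quantum_kernel(feature_state(0.0), feature_state(np.pi / 2))
assert abs(k_same - 1.0) < 1e-12  # identical states give kernel 1
assert abs(k_diff) < 1e-12        # orthogonal states give kernel 0
```

The SWAP-test estimate of this quantity carries shot noise, which is exactly why the exact-kernel risk bound of Eq. (64) must be re-derived with the stated number of state copies.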
As a comparison, the Quantum Alphatron algorithm [88] quantizes the Alphatron algorithm to provide a quantum implementation of the well-known polynomial kernel function, which accelerates the running time of the original Alphatron algorithm when the dimension d of the data is much larger than the other parameters.