On the Quantum versus Classical Learnability of Discrete Distributions

Here we study the comparative power of classical and quantum learners for generative modelling within the Probably Approximately Correct (PAC) framework. More specifically, we consider the following task: Given samples from some unknown discrete probability distribution, output with high probability an efficient algorithm for generating new samples from a good approximation of the original distribution. Our primary result is the explicit construction of a class of discrete probability distributions which, under the decisional Diffie-Hellman assumption, is provably not efficiently PAC learnable by a classical generative modelling algorithm, but for which we construct an efficient quantum learner. This class of distributions therefore provides a concrete example of a generative modelling problem for which quantum learners exhibit a provable advantage over classical learning algorithms. In addition, we discuss techniques for proving classical generative modelling hardness results, as well as the relationship between the PAC learnability of Boolean functions and the PAC learnability of discrete probability distributions.


Introduction
Since its introduction, Valiant's model of "Probably Approximately Correct" (PAC) learning [1], along with a variety of natural extensions and modifications, has provided a fruitful framework for studying both the computational and statistical aspects of machine learning [2,3]. Importantly, the PAC framework also provides a natural setting for the rigorous comparison of quantum and classical learning algorithms [4,5]. In fact, while the recent availability of "noisy intermediate scale quantum" (NISQ) devices has spurred a huge interest in the potential of quantum enhanced learning algorithms [6][7][8], it is interesting to note that there is a rich history of quantum learning theory, beginning as early as 1995 with the seminal work of Bshouty and Jackson [4,9]. Despite this rich history, the majority of previous work on quantum learning theory has focused on the classical versus quantum learnability of different classes of Boolean functions, which provides an abstraction of supervised learning [3].
In this work, we study the classical versus quantum PAC learnability of discrete probability distributions. More specifically, at an informal level we explore the following question, from the perspective of both classical and quantum learning algorithms: Given samples from some unknown probability distribution, output with high probability an efficient algorithm for generating new samples from a good approximation of the original distribution. We refer to this task as generative modelling. Note that one could also consider the related problem not of generating new samples from a distribution, but of learning a description of the distribution itself - a problem known as density estimation [10][11][12].
Here, we focus exclusively on generative modelling for a variety of reasons. Firstly, from a purely classical perspective, modern heuristic models and algorithms for generative modelling, such as Generative Adversarial Networks (GANs) [13], Variational Auto-Encoders [14] and Normalizing Flows [15] have proven extremely successful, with a wide variety of practical applications, and as such understanding whether quantum algorithms may be able to offer an advantage for this task is of natural interest. Additionally, a variety of quantum models and algorithms for generative modelling have recently been proposed, such as Born Machines [16][17][18][19], Quantum GANs [20][21][22][23] and Quantum Hamiltonian-based models [24]. While the majority of these approaches remain ill-understood from a theoretical perspective, Ref. [19] has indeed already established evidence for a meaningful generative modelling advantage, in some specific instances, when using a particular tensor network inspired quantum generative model. Furthermore, we know that there exist probability distributions which cannot be efficiently sampled from classically, but which can be efficiently sampled from using quantum devices [25][26][27]. In light of this fact, and the emergence of quantum algorithms for generative modelling, Ref. [16] has formalized the question, within the PAC framework, of whether there also exist classes of probability distributions which can be efficiently PAC learned (in a generative sense) with quantum resources, but not with purely classical approaches. Our primary contribution in this work is to answer this question in the affirmative, through the explicit construction of a concept class of discrete probability distributions which, under the Decisional Diffie-Hellman (DDH) assumption [28], is provably not efficiently PAC learnable from classical samples by a classical learning algorithm, but for which we provide an efficient quantum PAC learning algorithm. This class of distributions therefore provides a concrete example of a generative modelling problem for which quantum learners exhibit a provable advantage over classical learning algorithms, within the PAC framework.
The following important points regarding the setting of the result are worth clarifying. Firstly, although it might seem natural to consider the learnability of probability distributions describing the outcome of quantum processes, such as the measurement of a parameterized quantum state or random quantum circuit, we focus exclusively in this work on probability distributions describing the outcomes of classical circuits. Apart from allowing us to exploit existing results concerning the hardness of learnability for specific classes of discrete probability distributions [29], this restriction also allows us to demonstrate a quantum generative modelling advantage for purely classical problems. Additionally, in the context of Boolean function learning, it is often of interest to consider quantum learning algorithms which have access to quantum examples - in essence a superposition of input/output pairs from the unknown function to be learned [4]. In the setting we are concerned with here, it is also possible to consider a notion of quantum samples from a distribution; however, once again we choose to restrict ourselves to quantum learning algorithms with access to classical samples from the unknown probability distribution, which provides the fairest comparison of quantum versus classical learners for generative modelling. Finally, it is important to stress that the efficient quantum learning algorithm which we provide is expected to require a universal fault-tolerant quantum computer, as it makes use of the efficient quantum algorithm for discrete logarithms [30]. As a result, we do not expect that the separation we show in this work can be experimentally demonstrated on NISQ devices. Studying the learnability of probability distributions generated by quantum processes, the power of learners with quantum samples, and the power of near-term quantum learning algorithms remain interesting open problems, and as such we will also discuss the consequences of our results and techniques for approaching these questions.
To provide a generative modelling task for which there exists a definitive provable separation between the power of quantum and classical learners, we rely heavily on techniques at the rich interface of computational learning theory and cryptography [2]. More specifically, we start from the prior work of Kearns, Mansour, Ron, Rubinfeld, Schapire and Sellie (KMRRSS) [29], who have shown that given any pseudorandom function collection it is possible to construct a class of probability distributions for which no efficient classical generative modelling algorithm exists. We show that in order for such a class of distributions to be efficiently quantum learnable, one requires a pseudorandom function collection for which there exists a quantum adversary, who in addition to distinguishing keyed instances of the pseudorandom function collection from random functions via membership queries, can also learn the secret key using only random examples. By using the DDH assumption as a primitive, we are then able to construct such a pseudorandom function collection via a slight modification of the Goldreich-Goldwasser-Micali (GGM) construction [31].
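At a high level, the GGM construction builds a pseudorandom function from any length-doubling pseudorandom generator by descending a binary tree selected by the input bits. The following toy sketch is not from the paper: SHA-256 merely stands in for a length-doubling PRG (a hash is not a proven PRG), and the paper's DDH-based modification is omitted. It illustrates only the tree structure of the construction:

```python
import hashlib

def prg(seed: bytes) -> tuple:
    """Stand-in for a length-doubling PRG: expands a seed into
    two child seeds (G_0(seed), G_1(seed)). Illustrative only."""
    left = hashlib.sha256(b"0" + seed).digest()
    right = hashlib.sha256(b"1" + seed).digest()
    return left, right

def ggm_prf(key: bytes, x: str) -> bytes:
    """GGM tree evaluation: F_key(x) = G_{x_n}( ... G_{x_1}(key) ... ),
    i.e. walk down the binary tree, branching on each input bit."""
    seed = key
    for bit in x:
        left, right = prg(seed)
        seed = right if bit == "1" else left
    return seed

# The keyed function is deterministic in (key, x), and distinct
# inputs select distinct leaves of the tree.
value = ggm_prf(b"secret-key", "0101")
```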
Although the classical hardness result of KMRRSS [29] is a sufficient starting point for our purposes, we also address in this work the possibility of obtaining similar classical hardness results for generative modelling from primitives other than pseudorandom function collections. More specifically, we formulate and discuss conjectures concerning the possibility of proving classical hardness results for generative modelling from both weak pseudorandom function collections, and from existing hardness results for the PAC learnability of Boolean functions. Apart from being of conceptual interest, in the former case these considerations are motivated by the possibility of using such results to address questions concerning generative modelling with near-term quantum learners, as well as quantum learners with quantum samples.
In the latter case, these considerations are motivated by a desire to understand better the relationship between the PAC learnability of discrete probability distributions, and the PAC learnability of Boolean functions.
From the above outline one can see that both the results and techniques of this work lie at the intersection of quantum machine learning, computational learning theory and cryptography. In particular, while our primary result is very much in the spirit of computational learning theory, and contributes new ideas and techniques in this vein, it is also certainly of interest to the quantum machine learning community, and largely motivated by a desire to understand more clearly the potential and limitations of quantum enhanced machine learning. As a result, in order for this work to be accessible to readers with differing backgrounds and interests, we will provide a detailed and pedagogical presentation of the foundational material necessary for understanding both the context and details of our main result.
We proceed in this work as follows: Firstly, we begin in Section 2 with an introduction to the PAC framework, both for concept classes consisting of Boolean functions, and for the generative modelling of concept classes consisting of probability distributions over discrete domains. Given these foundations, we conclude Section 2 with the statement of Question 1, which provides a precise technical description of the primary question that we address in this work, and which we have described informally above. With this in hand, we then proceed in Section 3 to answer Question 1 in the affirmative. More specifically, after providing an overview of the necessary cryptographic notions in Section 3.1, we present in Section 3.2 a technique due to KMRRSS [29] for constructing from any pseudorandom function collection a distribution class which is provably not efficiently learnable by classical learning algorithms. This technique then allows us to construct in Section 3.3 a distribution class which, under the DDH assumption, is provably not efficiently learnable by any classical learning algorithm, but for which we provide explicitly an efficient quantum learner for the generative modelling task. We then briefly discuss in Section 3.4 a method for the verification of the advantage exhibited by the quantum learner we provide. Having fully addressed Question 1 at this point, we then shift gears and explore in Section 4 the possibility of obtaining classical generative modelling hardness results from primitives other than pseudorandom function collections. In particular, in Section 4.1 we discuss whether weak pseudorandom function collections would be sufficient, and in Section 4.2 we examine the relationship between PAC learnability of Boolean functions, and the PAC generative modelling of associated probability distributions. Finally, in Section 5 we summarize our results, and provide an overview of interesting related and open questions, focusing specifically on the setting of probability distributions generated by quantum processes.

Quantum and Classical PAC Learning
In this section, we begin by defining the notion of probably approximately correct (PAC) learnability, both for concept classes consisting of Boolean functions, and concept classes consisting of probability distributions over discrete domains. As we will see, these notions provide a meaningful abstract framework for studying computational aspects of both supervised learning and probabilistic/generative modelling respectively. While the main result of this work is concerned with the latter setting, we begin with the more familiar context of Boolean functions in order to introduce both the fundamental ideas, and a variety of oracle models which will be important throughout this work. Additionally, as mentioned in the introduction, after presentation of our main distribution learning results in Section 3, we will in Section 4.2 discuss in detail the relationship between PAC learnability of Boolean function classes, and PAC learnability of discrete distribution classes.

PAC Learning of Boolean Functions
Let us denote by F_n the set of all Boolean functions on n bits - i.e. F_n = {f | f : {0,1}^n → {0,1}}. Notice that any function in F_n can be specified via its truth table, and therefore F_n ≅ {0,1}^{2^n}. We call any subset C ⊆ F_n a concept class. For any f ∈ F_n we can define various types of classical and quantum oracle access to f. Classically, we define the membership query oracle MQ(f) as the oracle which on input x returns f(x), and the random example oracle PEX(f, D) as the oracle which when invoked returns a tuple (x, f(x)), where x is drawn from the distribution D over {0,1}^n. It will also be useful to us later to define the oracle RPEX(f, D) which when invoked returns only f(x), with x drawn from D. This can be summarized as follows:

Query[MQ(f)](x) = f(x), (1)
Query[PEX(f, D)] = (x, f(x)) with x ← D, (2)
Query[RPEX(f, D)] = f(x) with x ← D, (3)

where we have used the notation x ← D to indicate that x is drawn from D. Additionally, we define the quantum membership query oracle QMQ(f) as the oracle which on input |x⟩ ⊗ |y⟩ returns |x⟩ ⊗ |f(x) ⊕ y⟩, and the quantum random example oracle QPEX(f, D) as the oracle which when invoked returns the quantum state Σ_x √(D(x)) |x⟩ ⊗ |f(x)⟩, where again D is some distribution over {0,1}^n. This can be summarized as

Query[QMQ(f)](|x⟩ ⊗ |y⟩) = |x⟩ ⊗ |f(x) ⊕ y⟩, (4)
Query[QPEX(f, D)] = Σ_x √(D(x)) |x⟩ ⊗ |f(x)⟩. (5)

As it will be convenient later, we also define MQ(f, D) := MQ(f) and QMQ(f, D) := QMQ(f) for all distributions D. For a more detailed discussion of these oracles, and in particular the motivation behind their definitions and the relationships between them, we refer to Ref. [4]. Given these notions, we can then formulate the following definition of a PAC learner for a given concept class:

Definition 1 (PAC Learners). An algorithm A is an (ε, δ, O, D)-PAC learner for a concept class C ⊆ F_n, if for all c ∈ C, when given access to oracle O(c, D), with probability at least 1 − δ, the learner A outputs a hypothesis h ∈ F_n such that

Pr_{x←D}[h(x) ≠ c(x)] ≤ ε. (6)

An algorithm A is an (ε, δ, O)-PAC learner for a concept class C, if it is an (ε, δ, O, D)-PAC learner for all distributions D.
Before continuing, it is useful to make some comments concerning this definition. Firstly, note that the above formulation allows us to consider both classical learning algorithms, with either classical membership query or classical random example oracle access, as well as quantum learning algorithms, with either classical or quantum oracle access of any type. Additionally, it is important to point out that for a given model of oracle access O we can consider either distribution dependent learners - i.e. learners which are required to succeed (in the sense of being probably approximately correct) only with respect to samples drawn from some fixed distribution D - or distribution independent learners, which should succeed with respect to samples drawn from all possible distributions. In light of these observations, we see that Definition 1 provides for us a flexible abstraction of supervised learning, which allows for the comparison of a variety of different learning algorithms, each of which models the supervised learning problem in a different context. In order to perform a meaningful computational comparison of these different learning algorithms, we need the following notions of sample and time complexity:

Definition 2 (Sample and Time Complexity). The sample complexity of a learner A is the number of queries A makes to its oracle, and the time complexity of A is its total runtime. We say that a learner A is efficient if both its sample complexity and time complexity are O(poly(n, 1/ε, 1/δ)).

Given this, the following definition formalizes a variety of notions for the efficient PAC learnability of a concept class:

Definition 3 (Efficient PAC Learnability of a Concept Class). We say that a concept class C is efficiently classically (quantum) PAC learnable with respect to distribution D and oracle O if for all 0 < ε, δ < 1 there exists an efficient classical (quantum) (ε, δ, O, D)-PAC learner for C. Similarly, C is efficiently classically (quantum) PAC learnable with respect to oracle O if for all 0 < ε, δ < 1 there exists an efficient classical (quantum) (ε, δ, O)-PAC learner for C.
For a complete overview of known results and open questions concerning classical versus quantum learnability of Boolean function concept classes, we again refer to Ref. [4].
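As a minimal illustration of the classical oracle models above, the oracles MQ, PEX and RPEX can be sketched as follows (all function and variable names here are illustrative, not from the paper):

```python
import random

def make_mq(f):
    """Membership query oracle MQ(f): on input x, return f(x)."""
    return lambda x: f(x)

def make_pex(f, d_sampler):
    """Random example oracle PEX(f, D): each query returns a tuple
    (x, f(x)) with x drawn from D via the supplied sampler."""
    def query():
        x = d_sampler()
        return (x, f(x))
    return query

def make_rpex(f, d_sampler):
    """RPEX(f, D): each query returns only f(x), with x drawn from D."""
    return lambda: f(d_sampler())

# Example concept: parity on 3 bits, with D the uniform distribution.
n = 3
parity = lambda x: sum(x) % 2
uniform = lambda: tuple(random.randint(0, 1) for _ in range(n))

mq = make_mq(parity)
pex = make_pex(parity, uniform)
rpex = make_rpex(parity, uniform)
```

Note that a PEX query reveals both the point and the label, whereas an RPEX query reveals only the label, which is the appropriate abstraction for sample access to an output distribution.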

PAC Learning of Discrete Distributions
In the previous section we provided definitions for the PAC learnability of concept classes consisting of Boolean functions, which provides an abstract framework for studying and comparing computational properties of different supervised learning algorithms. In this section, we formulate a generalization to concept classes consisting of discrete distributions, which builds on and refines the prior work of Refs. [16,29], and provides an abstract framework for studying probabilistic modelling from a computational perspective. Additionally, this formulation allows us to state precisely the primary question that we address in this work. For simplicity (and without loss of generality) we will consider distributions over bit strings, and as such we denote the set of all distributions over {0,1}^n as D_n, and we call any C ⊆ D_n a distribution class. We also denote the uniform distribution over {0,1}^n as U_n. In order to provide a meaningful generalization of PAC learning to this setting, the first thing that we require is a meaningful notion of a query to a distribution. To do this, given some distribution D ∈ D_n, we define the sample oracle SAMPLE(D) as the oracle which when invoked returns some x drawn from D. More specifically, we have that

Query[SAMPLE(D)] = x ← D. (7)

Additionally, it is natural to define the quantum sample oracle QSAMPLE(D) via

Query[QSAMPLE(D)] = Σ_{x∈{0,1}^n} √(D(x)) |x⟩. (8)

In particular, note that given access to QSAMPLE(D) one can straightforwardly simulate access to SAMPLE(D) by simply querying QSAMPLE(D), and then performing a measurement in the computational basis. At this point, it is important to point out that unlike in the case of function concept classes - where what it means to "learn a function" is relatively straightforward - there are two distinct notions of what it means to "learn a distribution" [29]. Informally, given some unknown distribution D, as well as access to either a classical or quantum sample oracle, we could ask that a learning algorithm outputs an evaluator for D - i.e. some function D̃ : {0,1}^n → [0,1] which on input x ∈ {0,1}^n outputs an estimate for D(x), and therefore provides an approximate description of the distribution. This is perhaps the most intuitive notion of what it means to learn a probability distribution, and one can indeed construct a corresponding notion of PAC learnability [29], for which a variety of results are known for different distribution classes [10][11][12][29]. However, in many practical settings one might not be interested in learning a full description of the probability distribution (an evaluator for the probability of events) but rather in being able to generate samples from the distribution. As such, instead of asking for a description of the unknown probability distribution (an evaluator) we could ask that the learning algorithm outputs a generator for D - i.e. a probabilistic (quantum or classical) algorithm which when run generates samples from D.
From a heuristic perspective we note that many of the most widely utilized probabilistic modelling architectures and algorithms, such as generative adversarial networks, are precisely learning algorithms of this type. Additionally, there has recently been a surge of interest in quantum learning algorithms of this type - so called Born machines [16][17][18] - which are based on the simple observation that one can sample from a given distribution by preparing and measuring an appropriate quantum state (such as the state provided by the QSAMPLE oracle). We note that interestingly "learning to evaluate" and "learning to generate" are incomparable learning problems, in the sense that being able to learn an evaluator does not imply being able to learn a generator and vice versa [29]. While the learning of evaluators is certainly both interesting and important, with many open questions [12], in this work we will focus exclusively on the problem of learning generators for distribution classes. To this end, we start with the following definition, adapted from Refs. [16,29], which formalizes the notions of efficient classical and quantum generators:

Definition 4 (Efficient Generators). Given a distribution D ∈ D_n, we call a classical (quantum) algorithm GEN_D (QGEN_D) an efficient classical (quantum) generator for D if it runs in time O(poly(n)) and on each invocation outputs a sample x ∈ {0,1}^n drawn from D.

Definition 5 (Efficiently Generated Distribution Class). We say that a distribution class C ⊆ D_n can be efficiently classically (quantum) generated if for all D ∈ C there exists an efficient classical (quantum) generator for D.

In addition, we require a notion of distance between distributions with respect to which the quality of a learned generator can be measured. In this work we will use primarily the Kullback-Leibler (KL) divergence, defined via

KL(D‖D′) = Σ_{x∈{0,1}^n} D(x) log[D(x)/D′(x)], (9)

as well as the total variation (TV) distance

TV(D, D′) = (1/2) Σ_{x∈{0,1}^n} |D(x) − D′(x)|. (10)

We note that by virtue of its asymmetry the KL-divergence is not strictly a metric; however, via Pinsker's inequality we have that

TV(D, D′) ≤ √(KL(D‖D′)/2). (11)

For more details on the interpretation of these and other relevant distance measures, we refer to Ref. [32]. Given these preliminaries, the following definition provides a natural generalisation of Definition 1 to the generative modelling context in which we are interested:

Definition 6 (PAC Generator-Learners). An algorithm A is an (ε, δ, O, d)-PAC GEN-learner (QGEN-learner) for a distribution class C ⊆ D_n, if for all D ∈ C, when given access to oracle O(D), with probability at least 1 − δ, the learner A outputs a classical (quantum) generator for some distribution D′ satisfying d(D, D′) ≤ ε.

Before continuing it is worth clarifying two important aspects of Definition 6 (also illustrated in Figure 1):

1. The learning algorithm A could be either classical or quantum, and in the quantum case the learner could have access to either the classical or quantum sample oracle (i.e. in the quantum case O could be either SAMPLE or QSAMPLE).
2. Both classical and quantum learning algorithms could output either a classical generator or a quantum generator. In the former case we refer to the learner as a GEN-learner, and in the latter case as a QGEN-learner. In particular, while perhaps counterintuitive, we could consider classical QGEN-learners (which could for example output a description of a quantum sampling circuit) as well as quantum GEN-learners (which could output descriptions of classical circuits).
Given the above definition, we can now define the sample/time complexity of PAC generator-learners analogously to how we have defined these notions in Definition 2; in particular, a PAC generator-learner is efficient if both its sample complexity and time complexity are O(poly(n, 1/ε, 1/δ)). This leads to the following notion:

Definition 9 (Efficient PAC Learnability of a Distribution Class). We say that a distribution class C is efficiently classically (quantum) PAC GEN-learnable with respect to oracle O and distance measure d if for all 0 < ε, δ < 1 there exists an efficient classical (quantum) (ε, δ, O, d)-PAC GEN-learner for C.

[Figure 1: Illustration of the PAC generator-learning setting. Given SAMPLE(D) access to an unknown distribution D from an efficiently generated distribution class C, a learner with runtime O(poly(n, 1/ε, 1/δ)) must, for all (ε, δ), output with probability at least 1 − δ a generator for some distribution D′ satisfying d(D, D′) ≤ ε.]

Given these definitions, we are finally in a position to state precisely the primary question that we explore in this work:

Question 1. Does there exist a distribution class C, which can be efficiently classically generated, and which (a) is not efficiently classically PAC GEN-learnable with respect to the SAMPLE oracle and the TV-distance, but (b) is efficiently quantum PAC GEN-learnable with respect to the SAMPLE oracle and the TV-distance?
In the following section we will show, via an explicit construction of such a distribution class, that up to a standard cryptographic assumption, the answer to this question is "Yes". Before continuing though, it is again worth clarifying a few potentially subtle aspects of the above question (also illustrated in Fig. 1):

(1) Classically generated distribution class: We have restricted our attention in Question 1 to distribution classes which can be efficiently classically generated - i.e. distribution classes C with the property that for all D ∈ C there exists an efficient classical generator GEN_D. We note that this restriction is not necessary in principle: if at a high level one's goal is simply to construct a generative modelling problem (i.e. distribution class) which is solvable using quantum resources (i.e. either efficiently quantum PAC GEN-learnable or efficiently quantum PAC QGEN-learnable), but not efficiently solvable with purely classical resources (i.e. not efficiently classically PAC GEN-learnable), then one could indeed also consider concept classes which are efficiently quantum generated. However, we have chosen to impose this additional constraint here both in order to make clear the conceptual distinction between generative modelling problems defined by underlying classical processes and those defined by underlying quantum processes, and to demonstrate clearly that quantum learning algorithms can obtain a clear advantage even for problems defined by underlying classical processes. Despite this focus on distribution classes which can be efficiently classically generated, the question of whether one can prove a quantum learning advantage for distribution classes which are specified by efficient quantum generators (such as those used for the demonstration of "quantum computational supremacy" [25][26][27]) remains an interesting open question, which we discuss in Section 5.
(2) Classical sample oracle: We note that in Question 1 we have restricted both the classical and quantum learning algorithms to classical SAMPLE oracle access to the unknown distributions. Once again, as mentioned before, this is not strictly necessary, and one could also consider quantum learners with access to the QSAMPLE oracle. However, we have chosen to restrict ourselves here to classical SAMPLE oracle access both because this is the most natural abstraction of a typical applied generative modelling problem, and because this provides the "fairest" playing field on which to compare the power of classical and quantum learners. That said, understanding the additional power that quantum samples might offer a quantum generator-learner is indeed also an interesting open question, which we formulate and discuss in Section 5.
(3) Classical output generators: Given that we have restricted ourselves to concept classes which can be efficiently classically generated, it is natural as a first step to consider both classical and quantum GEN-learners - i.e. learning algorithms which are restricted to outputting classical generators. It is however also of great interest to determine whether the answer to Question 1 is still "yes" if one considers quantum QGEN-learners, as most recent proposals for quantum generative modelling algorithms are of this type. Once again we discuss this possibility further in Section 5.
(4) Total variation distance: In order to motivate our choice of the TV-distance in Question 1 we note that Pinsker's inequality (Eq. (11)) implies that if a distribution class is efficiently PAC GEN-learnable with respect to the KL-divergence, then it is efficiently PAC GEN-learnable with respect to the TV-distance. As a result, if a concept class is not efficiently classically PAC GEN-learnable with respect to the TV-distance, then it is not efficiently classically PAC GEN-learnable with respect to the KL-divergence. Given this, providing a positive answer to Question 1 with the TV-distance yields a stronger result than if one were to use the KL-divergence. In particular, if one were to only prove that a concept class was not PAC GEN-learnable with respect to the KL-divergence, this would not rule out its efficient PAC GEN-learnability with respect to the TV-distance.
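Pinsker's inequality, and the asymmetry of the KL-divergence, can be checked numerically; the sketch below uses the natural logarithm and two illustrative three-outcome distributions (values chosen only for illustration):

```python
import math

def kl(d1, d2):
    """KL divergence KL(D||D') = sum_x D(x) log(D(x)/D'(x)), natural log."""
    return sum(px * math.log(px / qx) for px, qx in zip(d1, d2) if px > 0)

def tv(d1, d2):
    """Total variation distance TV(D, D') = (1/2) sum_x |D(x) - D'(x)|."""
    return 0.5 * sum(abs(px - qx) for px, qx in zip(d1, d2))

D = [0.5, 0.3, 0.2]
Dp = [0.4, 0.4, 0.2]

# Pinsker's inequality: TV(D, D') <= sqrt(KL(D||D') / 2).
assert tv(D, Dp) <= math.sqrt(kl(D, Dp) / 2)
# The KL-divergence is asymmetric, hence not a metric.
assert kl(D, Dp) != kl(Dp, D)
```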
Finally, in addition to the above points, we note that Ref. [16] has defined "quantum learning supremacy" as the existence of a distribution class for which, for some distance measure d (not necessarily the total variation distance), for all 0 < δ < 1, and for some fixed ε > 0, there exists an efficient (ε, δ, SAMPLE, d) quantum PAC generator-learner (either a GEN-learner or a QGEN-learner), while for at least one value of 0 < δ < 1 there does not exist an efficient classical (ε, δ, SAMPLE, d) PAC GEN-learner. As such, we see that a positive answer to Question 1 provides not only a clear example of what Ref. [16] has called "quantum learning supremacy", but in fact something slightly stronger, as a result of the fact that in order for a distribution class to be efficiently quantum PAC GEN-learnable (as per Definition 9) there should exist an efficient (ε, δ, SAMPLE, d) quantum PAC GEN-learner not just for some fixed ε > 0, but for all ε > 0.

A Quantum/Classical Distribution Learning Separation
In this section we answer Question 1 in the affirmative, providing the main result of this work. To this end, we take as our starting point the result of KMRRSS [29], who have shown that any pseudorandom function (PRF) can be used to construct a distribution class which is not efficiently classically PAC GEN-learnable, with respect to SAMPLE and the KL-divergence. In addition, each distribution in the concept class defined by KMRRSS admits an efficient classical generator, which is fully specified by a key of the underlying PRF. In light of this, we begin by strengthening the above result to hold also for the TV-distance. We then design a keyed function which, under the decisional Diffie-Hellman (DDH) assumption for the group family of quadratic residues [28], is pseudorandom from the perspective of classical adversaries, but not pseudorandom from the perspective of quantum adversaries, who in addition to distinguishing keyed instances of the function from random with membership queries, can also learn the secret key using only random examples. Instantiating a slight modification of the KMRRSS construction with this DDH-based PRF yields a distribution class which answers Question 1 in the affirmative, under the DDH assumption for quadratic residues. We proceed by introducing all the necessary cryptographic primitives in Section 3.1. Equipped with these preliminaries, we then present in Section 3.2 the classical hardness result of KMRRSS [29], along with some important corollaries and modifications. Finally, given this result, in Section 3.3 we use the DDH assumption to explicitly construct a distribution class which, due to the results in Section 3.2, is provably not efficiently classically learnable, but for which we are able to construct explicitly an efficient quantum learner.

Cryptographic Primitives
We begin here with a brief overview of the cryptographic notions which are necessary to understand the constructions in the following sections. For a more detailed introduction to these concepts and constructions, we refer to Ref. [33]. The first notion that we need is that of a parameterization set.
Definition 10 (Parameterization Set). We call some infinite set P a parameterization set if: 1. P is the union of countably many pairwise disjoint sets, i.e. P = ∪_{n∈N} P_n with P_i ∩ P_j = ∅ for all i ≠ j.
2. There exists an efficient (possibly probabilistic) instance generation algorithm IG which, for all n ∈ N, on input 1^n outputs some P ∈ P_n.
3. There exists an efficient algorithm which, for all P ∈ P, on input P outputs the unique n ∈ N such that P ∈ P_n.
We note that two particularly simple "textbook" examples of parameterization sets, useful for gaining intuition, are P = N and P = {p ∈ N | p is prime}. In the former case one has P_n = {n}, along with the deterministic instance generation algorithm IG(1^n) = n, and in the latter case one can define P_n as the set of n-bit primes, and use as an instance generation algorithm existing efficient algorithms for sampling from the n-bit primes [34]. In particular, we note that in general, on input 1^n, the instance generation algorithm IG effectively samples from some implicitly defined distribution over P_n - i.e. IG(1^n) is a random variable taking values in P_n. Using such parameterization sets we can then define indexed collections of efficiently computable functions: Definition 11 (Indexed Collection of Efficiently Computable Functions). Given some parameterization set P, we say that the set of functions {F_P | P ∈ P} is an indexed collection of efficiently computable functions if for all P ∈ P we have that F_P : X_P → Y_P, and there exists: 1. An efficient instance description algorithm which, for all P ∈ P, on input P outputs a description of the domain X_P and the codomain Y_P.
2. An efficient evaluation algorithm which, for all P ∈ P and all x ∈ X_P, on input P and x, outputs F_P(x). Using this, we are then able to define the notion of a collection of pseudorandom generators as follows:

Definition 12 (Collection of Pseudorandom Generators). An indexed collection of efficiently computable functions {G_P} is called a collection of pseudorandom generators if for all classical probabilistic polynomial time algorithms A, all polynomials p and all sufficiently large n it holds that

|Pr[A(P, G_P(x)) = 1] − Pr[A(P, y) = 1]| ≤ 1/p(n),

with P ← IG(1^n), x ← U(X_P) and y ← U(Y_P). Here U(X) denotes the uniform distribution over the set X, and IG is the instance generation algorithm for {G_P}.
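To make Definitions 10 and 11 concrete, the following sketch implements the "textbook" parameterization set of n-bit primes discussed above. The helper names (IG, instance_size) are illustrative only, and a standard probabilistic Miller-Rabin test stands in for the prime-sampling algorithms of Ref. [34]:

```python
import random

def is_probable_prime(m, rounds=40):
    """Miller-Rabin primality test (probabilistic)."""
    if m < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13):
        if m % sp == 0:
            return m == sp
    # write m - 1 = 2^r * d with d odd
    r, d = 0, m - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(rounds):
        a = random.randrange(2, m - 1)
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False
    return True

def IG(n):
    """Instance generation (Definition 10, item 2): sample some P in P_n,
    here an n-bit prime, by rejection sampling."""
    while True:
        p = random.getrandbits(n) | (1 << (n - 1)) | 1  # force n bits, odd
        if is_probable_prime(p):
            return p

def instance_size(p):
    """Recover the unique n with P in P_n (Definition 10, item 3)."""
    return p.bit_length()

p = IG(16)
assert instance_size(p) == 16 and is_probable_prime(p)
```

Note that item 3 of Definition 10 is trivial here: the bit length of the instance already determines n.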
We would now like to define pseudorandom functions. To do this we will need the slightly modified notion of an efficiently computable indexed collection of keyed functions: Definition 13 (Indexed Collection of Efficiently Computable Keyed Functions). Given some parameterization set P, we call a collection of functions {F_P | P ∈ P} an indexed collection of efficiently computable keyed functions if for all P ∈ P we have that F_P : K_P × X_P → Y_P, and there exists: 1. An efficient instance description algorithm which, for all P ∈ P, on input P outputs a description of the key space K_P, effective domain X_P, and codomain Y_P.
2. An efficient probabilistic key selection algorithm which, for all P ∈ P, on input P can sample efficiently from the uniform distribution over K_P.
3. An efficient evaluation algorithm which, for all P ∈ P, all k ∈ K_P and all x ∈ X_P, on input (P, k, x) outputs F_P(k, x).
Given this, following Refs. [35,36], we can define various types of pseudorandom function collections (Definition 14), each defined by a condition of the form

|Pr[A^{MQ(F_P(k,·))}(P) = 1] − Pr[A^{MQ(F)}(P) = 1]| ≤ 1/p(n),

where U(F : X_P → Y_P) denotes the uniform distribution over all functions from X_P to Y_P, F ← U(F : X_P → Y_P), and A is given oracle access to either the keyed instance F_P(k,·) (first term) or the random function F (second term). The variants in (a)-(c) of the definition differ in the class of adversaries A and in the type of oracle access allowed. In order to clarify the above definition, we summarize informally below, using the abbreviation "AECA" for "all efficient classical algorithms" and the abbreviation "AEQA" for "all efficient quantum algorithms": AECA with classical random example oracle access ⟹ weak-secure. AECA with classical membership query oracle access satisfying Eq. (12) ⟹ classical-secure. AEQA with classical membership query oracle access satisfying Eq. (13) ⟹ standard-secure. AEQA with quantum membership query oracle access satisfying Eq. (14) ⟹ quantum-secure.
While at first glance the above naming conventions may seem extremely confusing, we note that if one assumes the existence of quantum computers, then the "standard" setting in which one would like to prove pseudorandomness of a function collection -i.e. the setting which corresponds to most realistic physical scenarios -is the setting in which any possible adversary (including quantum adversaries) has classical membership query access to the unknown functions [35].

Classical Hardness from Classical-Secure Pseudorandom Functions
Given the preliminaries from the previous section, we present below - in Theorem 1 - a construction due to KMRRSS [29], which allows one to use (almost) any classical-secure pseudorandom function collection to construct a distribution class - in fact, infinitely many such classes - which was proven in Ref. [29] to be not efficiently classically GEN-learnable, with respect to the SAMPLE oracle and the KL-divergence. Before presenting this construction, however, a few remarks are necessary. Firstly, we note that Theorem 1, as presented below, is in fact both a slight generalization and a strengthening of the original result from Ref. [29]. More specifically, Theorem 1 below makes it explicit that (a) one can use classical-secure pseudorandom function collections parameterized by arbitrary parameterization sets (as opposed to simply P = N), provided the domain and co-domain satisfy mild requirements, and (b) this construction actually results in distribution classes which are not efficiently classically GEN-learnable with respect to the SAMPLE oracle and the TV-distance. While the motivation for strengthening the result is clear, the generalization will be necessary for us, as in the following section we wish to instantiate this distribution class construction using a concrete pseudorandom function candidate, based on the DDH assumption. Additionally, in this work we wish to construct a distribution class which is not only provably hard to learn classically, but which is also provably efficiently quantum learnable. To do this we will require another modification of the construction from Ref. [29], which is presented as Corollary 1.1, and whose significance will be discussed at length in the following section. Finally, we note that KMRRSS have provided in Ref. [29] only a sketch of a proof that their construction yields distribution classes which are classically hard to learn. As we both generalize and strengthen this result, as well as ultimately require a modification (Corollary 1.1) of this construction, we provide here a full proof for Theorem 1, based on the original sketch from Ref. [29].
At this stage we are almost ready to present the construction, in a language sufficiently general for our requirements. As a final preliminary consideration, we note that for all non-negative integers x ∈ N_0 we will denote by BIN(x) the shortest possible binary representation of x, and by BIN_n(x) the n-bit binary representation obtained by padding BIN(x) with zeros. We also denote by x||y the concatenation of bit strings x and y. Additionally, for any set X ⊂ N_0 we write X ⊆ {0,1}^n when BIN_n(x) exists for all x ∈ X. Given these definitions we state the following theorem, which is a reformulation, generalization and strengthening of the original result from KMRRSS [29]: Theorem 1 (Classical Hardness from Classical-Secure Pseudorandom Functions). Let {F_P} be a classical-secure pseudorandom function collection with the property that for all n, for all P ∈ P_n, it is the case that F_P : K_P × {0,1}^n → Y_P, with Y_P ⊆ {0,1}^n. For all P, and all k ∈ K_P, we then define KGEN_(P,k) : {0,1}^n → {0,1}^{2n} via

KGEN_(P,k)(x) = x||BIN_n(F_P(k, x)).

Additionally, we denote by D̃_(P,k) the discrete distribution over {0,1}^{2n} for which KGEN_(P,k) is a classical generator. For all sufficiently large n the distribution class C̃_n := {D̃_(P,k) | P ∈ P_n, k ∈ K_P} is not efficiently classically PAC GEN-learnable with respect to the SAMPLE oracle and the TV-distance.
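To make the sample format of Theorem 1 concrete, the following sketch draws samples of the form x||BIN_n(F_P(k, x)) from a toy instance. Here HMAC-SHA256 truncated to n bits is purely a placeholder for the keyed function F_P; it is not the DDH-based PRF constructed later in this section:

```python
import hmac, hashlib, random

n = 16  # toy parameter; real instances use cryptographic sizes

def BIN(x, bits):
    """BIN_bits(x): zero-padded binary representation, as in the text."""
    return format(x, '0{}b'.format(bits))

def F(k, x_bits):
    """Placeholder for the keyed function F_P(k, .) with n-bit outputs.
    (HMAC-SHA256 truncated to n bits, for illustration only.)"""
    d = hmac.new(k, x_bits.encode(), hashlib.sha256).digest()
    return int.from_bytes(d, 'big') % (2 ** n)

def KGEN(k):
    """One call to the generator KGEN_(P,k): draw x uniformly at random
    and output the 2n-bit sample x || BIN_n(F(k, x))."""
    x_bits = BIN(random.randrange(2 ** n), n)
    return x_bits + BIN(F(k, x_bits), n)

sample = KGEN(b'secret-key')
assert len(sample) == 2 * n and set(sample) <= {'0', '1'}
```

The point of the construction is that such samples look uniformly random on their second half to any efficient classical observer, yet are fully determined by the key.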
In order to simplify the presentation of the proof of Theorem 1, it will be convenient to begin with a few preliminary lemmas.The first result that we need is an alternative characterization of classical-secure pseudorandom function collections, which we develop below, and illustrate in Fig. 2.
Definition 15 (Polynomial Inference [31]). Let {F_P} be an indexed collection of keyed functions, and let A be some probabilistic polynomial time classical algorithm capable of oracle calls. On input P ∈ P_n, algorithm A is given oracle access to MQ(F_P(k,·)), and carries out a computation in which it queries the oracle on x_1, ..., x_j ∈ X_P. Algorithm A then outputs some x ∈ X_P, which must satisfy x ∉ {x_1, ..., x_j}. We call x the "exam string". At this point, A is then disconnected from MQ(F_P(k,·)) and presented the two values F_P(k, x) and y ← U(Y_P) in random order. We say that A "passes the exam" if it correctly guesses which of the two values is F_P(k, x). Let Q be some polynomial. We then say that A Q-infers the collection {F_P} if for infinitely many n, given input P ∈ P_n, it passes the exam with probability at least 1/2 + 1/Q(n), where the probability is taken uniformly over all possible choices of P ∈ P_n, k ∈ K_P, all possible choices of y ∈ Y_P and all possible orders of the exam strings F_P(k, x) and y. We say that an indexed collection of keyed functions {F_P} can be polynomially inferred if there exists a polynomial Q and a probabilistic polynomial time algorithm A which Q-infers {F_P}.

Lemma 1 ([31]). Let {F_P} be an indexed collection of efficiently computable keyed functions. Then, {F_P} cannot be polynomially inferred if and only if it is a classical-secure pseudorandom function collection.
Additionally, we will need the following observation. Lemma 2. Let GEN_D be a (d_TV, ε)-generator for D̃_(P,k), for some ε < 1/5. Then, for at least 2^n/2 of the 2^n possible strings of the form y = x||BIN_n(F_P(k, x)) ∈ {0,1}^{2n} with x ∈ {0,1}^n it is the case that

Pr[y ← GEN_D] ≥ ε/2^n.

Figure 2: An illustration of the distinguishing algorithm underlying the definition of a classical-secure pseudorandom function collection (top), and of the inference algorithm of Definition 15 (bottom), in which we consider an inference algorithm which, after a learning phase, should try to pass an "exam" of its own choosing. As per Lemma 1, for a given indexed collection of keyed functions, there exists a suitable distinguishing algorithm if and only if there exists a suitable inference algorithm.
Proof. Assume that the claim is false, i.e. that for at least 2^n/2 of the strings y = x||BIN_n(F_P(k, x)) it holds that Pr[y ← GEN_D] < ε/2^n. As each such string has probability exactly 1/2^n under D̃_(P,k), each contributes at least (1 − ε)/2^n to the total variation distance, and we therefore have that

d_TV(GEN_D, D̃_(P,k)) ≥ (1/2) · (2^n/2) · ((1 − ε)/2^n) = (1 − ε)/4 > ε,

where the final inequality follows from ε < 1/5. This contradicts the assumption that GEN_D is a (d_TV, ε)-generator for D̃_(P,k).
Finally, given these preliminary results and observations, we can present a full proof of Theorem 1.

Proof (Theorem 1). At a high level, the idea of the proof is to assume that C̃_n is efficiently classically PAC GEN-learnable for infinitely many n, and use the associated learning algorithms to construct a poly-time algorithm which Q-infers {F_P}, for some polynomial Q. By Lemma 1 this implies that {F_P} is not classical-secure pseudorandom, which then gives a proof by contradiction. To do this, let us denote the assumed learning algorithm for C̃_n as Ã_n. Our goal is now to construct an inference algorithm for {F_P}, which we denote as A. Now, as per Definition 15, on input P ∈ P_n, when given access to MQ(F_P(k,·)), algorithm A proceeds in two steps as follows: 1. Obtain an approximate generator for D̃_(P,k) by simulating the learning algorithm Ã_n. Specifically, run the learning algorithm Ã_n, with ε = 1/n and δ = 1/2, by using access to MQ(F_P(k,·)) to simulate SAMPLE(D̃_(P,k)). Each time Ã_n queries SAMPLE(D̃_(P,k)), algorithm A simply draws some x ∈ {0,1}^n uniformly at random, queries MQ(F_P(k,·)) on input x, and then provides Ã_n with the sample x||BIN_n(F_P(k, x)). Let us denote by GEN_D the generator output by Ã_n, and by X = {x_1, ..., x_j} the set of strings used by A to simulate Ã_n. We know that with probability 1 − δ = 1/2, the output generator GEN_D is a (d_TV, ε)-generator for D̃_(P,k). Additionally, it follows from the efficiency of Ã_n that |X| = poly(n, 1/δ, 1/ε) = poly(n).
2. Use GEN_D to attempt to pass the exam. Specifically, draw a single sample from GEN_D, and write it as x||y, with x, y ∈ {0,1}^n. Then:
• If x ∉ X, then submit x as the exam string, and receive the strings y_1, y_2. If y ∈ {y_1, y_2}, then output y. Let us call this case-a. Else, if y ∉ {y_1, y_2} then output either y_1 or y_2 uniformly at random. We call this case-b.
• Else, if x ∈ X, then select any x′ ∉ X as the exam string, and after receiving y_1, y_2 simply output either y_1 or y_2 at random. Call this case-c.
We now want to determine a lower bound on the probability that A passes the exam. To do this, let us denote by Pr[z] the probability that case-z occurs, and by Pr_pass[z] the conditional probability that A passes the exam, given that case-z has occurred. Clearly, Pr_pass[b] = Pr_pass[c] = 1/2, so let us look at case-a more carefully. In particular, there are two possibilities: 1. The first possibility is that y = F_P(k, x). Let's call this case-a1. In this case A definitely passes the exam - i.e. we have that Pr_pass[a1] = 1.
2. The second possibility is that y ≠ F_P(k, x), and that whichever string from {y_1, y_2} was randomly drawn just happens to equal y. Let's call this case-a2. In this case A definitely fails the exam - i.e. we have that Pr_pass[a2] = 0.
In light of this, the probability that A passes the exam is then given by

Pr[pass] = Pr[a1] + (1/2)(Pr[b] + Pr[c]) = 1/2 + (1/2)(Pr[a1] − Pr[a2]).

So, to proceed we now analyze Pr[a1] and Pr[a2]. Notice that case-a1 occurs when x||y = x||F_P(k, x) for some x ∉ X. Using Lemma 2, with ε = 1/n, and n > 5 (so that ε < 1/5), we know that, when GEN_D is a (d_TV, ε)-generator for D̃_(P,k), there exist at least 2^n/2 strings of the form x||F_P(k, x) for which

Pr[x||F_P(k, x) ← GEN_D] ≥ 1/(n 2^n).

Using the above, along with the fact that |X| = p(n) for some polynomial p, and the fact that GEN_D is a (d_TV, ε)-generator with probability at least 1/2, we then have that

Pr[a1] ≥ (1/2) · (2^n/2 − p(n)) · 1/(n 2^n) = 1/(4n) − p(n)/(2n 2^n).

As a result, there exists some n_1 such that for all n ≥ n_1 we have that Pr[a1] ≥ 1/(6n). So, at this point we know that for all n large enough

Pr[pass] ≥ 1/2 + 1/(12n) − (1/2)Pr[a2].

Now, note that case-a2 occurs when x ∉ X and when whichever of y_1 or y_2 is randomly drawn is equal to y. As a result, we have that Pr[a2] ≤ 1/2^n. Using this, we see that for all n ≥ n_1,

Pr[pass] ≥ 1/2 + 1/(12n) − 1/2^{n+1}.

Similarly to the previous case, we now know that there exists some n_2, such that for all n ≥ max{n_1, n_2},

Pr[pass] ≥ 1/2 + 1/(24n).

In light of the above, we therefore see that for all sufficiently large n, A Q-infers {F_P}, and therefore, via Lemma 1, {F_P} cannot be classical-secure pseudorandom, which contradicts the assumptions of the theorem.
As mentioned earlier, while Theorem 1 provides a method for the construction of distribution classes which are not efficiently classically learnable, in order to construct such a distribution class which is also efficiently quantum learnable, it will be helpful to formulate the following modified construction: Corollary 1.1. Let {F_P} be a classical-secure pseudorandom function collection satisfying all the properties required for Theorem 1. In addition, for all n, we assume that for all P ∈ P_n there exists an efficient m = poly(n) bit encoding of P, which we denote as BIN_m(P). For all P, and all k ∈ K_P, we then define GEN_(P,k) : {0,1}^n → {0,1}^{2n+m} via

GEN_(P,k)(x) = x||BIN_n(F_P(k, x))||BIN_m(P).

Additionally, we define D_(P,k) as the discrete distribution over {0,1}^{2n+m} for which GEN_(P,k) is a classical generator. For all sufficiently large n the distribution class C_n := {D_(P,k) | P ∈ P_n, k ∈ K_P} is not efficiently classically PAC GEN-learnable with respect to the SAMPLE oracle and the TV-distance.
To make clear the difference between the constructions of Theorem 1 and Corollary 1.1 we summarize informally as follows: samples from D̃_(P,k) are of the form x||BIN_n(F_P(k, x)), while samples from D_(P,k) are of the form x||BIN_n(F_P(k, x))||BIN_m(P) - i.e. each sample from D_(P,k) additionally carries an encoding of the parameterization P. Before describing the motivation behind such a modification, we note that we have stated this construction as a corollary due to the fact that the proof is essentially the same as the proof of Theorem 1. The only difference is that when the polynomial inference algorithm A is given input P ∈ P_n and access to MQ(F_P(k,·)), in order to simulate the learning algorithm Ã_n, it should respond to a SAMPLE query by drawing some x ∈ {0,1}^n uniformly at random, and then returning x||BIN_n(F_P(k, x))||BIN_m(P). To see why such a modified construction will be helpful for constructing an efficient quantum learner, note that both KGEN_(P,k) and GEN_(P,k) are fully specified by the parameterization P, and some key k ∈ K_P. As such, given SAMPLE access to either generator, it would be sufficient for a generator learning algorithm to learn the tuple (P, k). If one uses the distribution class D̃_(P,k) of Theorem 1, generated by KGEN_(P,k), then a learner really has to learn both the parameterization P, and the key k. However, if one uses the distribution class D_(P,k) of Corollary 1.1, generated by GEN_(P,k), then with every sample from the distribution the learner is given a binary encoding of P, and as such only needs to learn the key k.
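The difference between the two sample formats can be sketched minimally as follows, with a toy stand-in for F_P (all names here are illustrative only):

```python
import random

n, m = 8, 8  # toy sample and parameterization-encoding lengths

def BIN(x, bits):
    """Zero-padded binary representation (BIN_bits in the text)."""
    return format(x, '0{}b'.format(bits))

def F(k, x):
    """Toy stand-in for the keyed function F_P(k, x) (illustration only)."""
    return (k * x + k) % (2 ** n)

def KGEN(P, k):
    """Theorem 1 sample: x || BIN_n(F_P(k, x))."""
    x = random.randrange(2 ** n)
    return BIN(x, n) + BIN(F(k, x), n)

def GEN(P, k):
    """Corollary 1.1 sample: the same, with BIN_m(P) appended, so every
    sample hands the learner the parameterization P 'for free'."""
    x = random.randrange(2 ** n)
    return BIN(x, n) + BIN(F(k, x), n) + BIN(P, m)

P, k = 37, 201
assert len(KGEN(P, k)) == 2 * n
assert len(GEN(P, k)) == 2 * n + m
assert GEN(P, k).endswith(BIN(P, m))  # the suffix encodes P
```

With the Corollary 1.1 format, a learner that can recover k from random examples of F_P(k, ·) immediately recovers the full generator description (P, k).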

Quantum Learnability and Classical Hardness for a DDH Based Distribution Class
Theorem 1 and Corollary 1.1 provide the first ingredient necessary to answer Question 1 in the affirmative; namely a technique for constructing, from almost any classical-secure pseudorandom function collection, a distribution class which can be efficiently classically generated, and which is not efficiently classically PAC GEN-learnable. Given this result, on first impressions one might think that all that is required for such distribution classes to be efficiently quantum PAC GEN-learnable is that the underlying classical-secure PRF collection is not standard-secure - i.e. is not pseudorandom from the perspective of quantum adversaries with classical membership query access. In particular, it seems plausible that if the underlying PRF is classical-secure but not standard-secure, then one could exploit the quantum PRF adversary A′ for the construction of a quantum generator learner A. Unfortunately, however, it is in fact not so straightforward, for the following reasons: Firstly, we note that

Query[SAMPLE(D_(P,k))] = x||BIN_n(F_P(k, x))||BIN_m(P), with x ← U_n.

Additionally, we have that

Query[PEX(F_P(k,·), U_n)] = (x, F_P(k, x)), with x ← U_n,

and so by comparison we see that if a learning algorithm A is given access to the oracle SAMPLE(D_(P,k)), then it can efficiently simulate oracle access to PEX(F_P(k,·), U_n); however, it cannot simulate oracle access to MQ(F_P(k,·)). As such, even if there exists an efficient quantum adversary A′ for the classical-secure PRF {F_P}, a learning algorithm with oracle access to SAMPLE(D_(P,k)) could not simulate A′, which requires access to MQ(F_P(k,·)). Additionally, note from Eq. (13) that any quantum PRF adversary A′ requires as input the corresponding parameterization P. As such, even if this quantum adversary could succeed with only PEX access, a learning algorithm with access only to the SAMPLE(D̃_(P,k)) oracle could not simulate A′, as it does not have access to P. However, a learning algorithm with access to SAMPLE(D_(P,k)) could simulate A′, as an encoding of the parameterization P is given in the suffix of each sample from the distribution. This in fact provides an additional important motivation for the formulation of Corollary 1.1. Secondly, in order for a collection of keyed functions to not be standard-secure, all that is required is that a quantum adversary with classical membership query access can, for all sufficiently large n, distinguish instances of the keyed function from randomly drawn functions with non-negligible probability, as formalized by condition (13). As such, even if the classical-secure PRF underlying the distribution class is not standard-secure, and even if it is not standard-secure with respect to PEX access as opposed to MQ access, this does not instantly imply that the quantum PRF adversary could be turned into a quantum distribution learner, which should for all valid (δ, ε) be able to learn a (d_TV, ε)-generator with probability 1 − δ. However, as per the discussion in the previous section, we know that the generators {GEN_(P,k)} for the distribution class C_n = {D_(P,k)} are fully specified by the tuple (P, k), and that as an encoding of P is given "for free" with each sample from the distribution, being able to learn the key k ∈ K_P from SAMPLE access to {D_(P,k)} is sufficient to learn the exact generator GEN_(P,k). Given this, we see that if the underlying classical-secure but not standard-secure PRF is such that the quantum adversary A′ (a) requires only PEX access, as opposed to MQ access, and (b) can, in addition to distinguishing an instance of the keyed function from a random function, also learn the key itself with any specified probability, then this PRF adversary could be efficiently simulated by a learner A with access to SAMPLE(D_(P,k)), and used to learn GEN_(P,k) with any desired probability (as illustrated in Fig. 3). Instantiating the construction of Corollary 1.1 with this PRF would then guarantee that {D_(P,k)} can not be efficiently classically learned, and therefore yield a distribution class which provides an affirmative answer to Question 1. In light of this, we therefore in the remainder of this section construct a classical-secure but not standard-secure PRF with the properties listed above.
The PRF we construct will be classical-secure under the DDH assumption for the group family of quadratic residues. We use such a construction as this assumption is known not to hold for quantum adversaries [28]. In order to present this construction, we require the following preliminary definitions, following closely the presentation of Ref. [28]:

Figure 3: An illustration of how a quantum GEN-learner with access to SAMPLE(D_(P,k)) can simulate PEX(F_P(k,·), U_n) access for a quantum key-learning algorithm: each sample provides both the parameterization P and a random example (x, F_P(k, x)), from which the key-learner obtains the key, allowing the GEN-learner to output the exact generator GEN_(P,k).

Definition 16 (Efficient Group Family). Given some parameterization set P, we say that a set of groups G = {G_P | P ∈ P} is an efficient group family if for all P ∈ P the group G_P is a finite cyclic group, and there exists: 1. An efficient instance description algorithm ID which, for all P ∈ P, on input P outputs a generator g for the group G_P.
2. An efficient group product algorithm which, for all P ∈ P and for all a, b ∈ G_P, on input (P, a, b) outputs the group product a * b ∈ G_P.
Definition 17 (Quadratic Residues in Z*_N). Let Z*_N denote the multiplicative group of integers modulo N. We say that an element y ∈ Z*_N is a quadratic residue modulo N if there exists an x ∈ Z*_N such that x^2 ≡ y mod N. We denote the set of quadratic residues modulo N by QR_N.
We note that QR_N forms a subgroup of Z*_N. Additionally, when N is prime, squaring modulo N is a two-to-one map, and therefore exactly half the elements of Z*_N are quadratic residues. As a result, if N is a safe prime - i.e. N = 2q + 1 with q prime - then QR_N is of prime order q = (N − 1)/2, and therefore cyclic, with all elements (except the identity) as generators. Given this, we make the following observations: Observation 1 (Parameterization Set of Safe Primes). The set

P_safe primes := {p | p = 2q + 1, with both p and q prime}

is a valid parameterization set. In particular, we have by definition that P_i ∩ P_j = ∅ for all i ≠ j, there exists an efficient probabilistic instance generation algorithm which, for all n ∈ N, maps from 1^n to the set of n-bit safe primes [34], and for all n ∈ N, given some p ∈ P_n one can recover n efficiently.
Observation 2 (Group Family of Quadratic Residues). The set of groups G = {QR_p | p ∈ P_safe primes} is an efficient group family. More specifically, we have already established that P_safe primes is a valid parameterization set. Additionally, one can construct an efficient instance description algorithm from (a) the existence of an efficient algorithm for testing membership in QR_p [37], and (b) the observation that all elements of QR_p except the identity are generators. Finally, the existence of an efficient group product algorithm follows from the fact that modular multiplication can be performed efficiently [34]. We refer to {QR_p} as the group family of quadratic residues.
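The structural facts used above can be checked directly on a toy instance. The helper names below are illustrative, trial division stands in for a proper primality test, and membership in QR_p is tested via Euler's criterion:

```python
def is_prime(m):
    """Trial-division primality check (fine for toy sizes)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def is_safe_prime(p):
    """p is a safe prime if both p and q = (p - 1)/2 are prime."""
    return is_prime(p) and is_prime((p - 1) // 2)

def is_qr(y, p):
    """Euler's criterion: y in QR_p iff y^((p-1)/2) = 1 mod p."""
    return pow(y, (p - 1) // 2, p) == 1

p = 23            # toy safe prime: 23 = 2*11 + 1
q = (p - 1) // 2  # group order |QR_p| = q = 11
assert is_safe_prime(p)

qr = sorted({pow(x, 2, p) for x in range(1, p)})
assert len(qr) == q                      # exactly half of Z_p^* are QRs
assert all(is_qr(y, p) for y in qr)

# every non-identity element of the prime-order group QR_p generates it
g = 4
assert sorted(pow(g, e, p) for e in range(q)) == qr
```

The last assertion is exactly point (b) of the observation: since |QR_p| = q is prime, every non-identity element has order q.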
With these preliminaries, we can now state the decisional Diffie-Hellman assumption: Definition 18 (Decisional Diffie-Hellman Assumption [28,38]). We say that a group family G = {G_P | P ∈ P}, with instance generator IG and instance description algorithm ID, satisfies the DDH assumption if for all probabilistic polynomial time algorithms A, all polynomials p(·), and all sufficiently large n it holds that

|Pr[A(P, g, g^a, g^b, g^{ab}) = 1] − Pr[A(P, g, g^a, g^b, g^c) = 1]| ≤ 1/p(n),

where the probability in the first term is taken over the random variable IG(1^n) (i.e. a random parameterization P ∈ P_n), the random variable ID(P) (i.e. a random generator g for the group G_P) and a, b ∈ Z_|G_P| selected uniformly at random, and in the second term c ∈ Z_|G_P| is also chosen uniformly at random.
From this point on we will restrict ourselves to the group family of quadratic residues parameterized by safe primes, which is believed to satisfy the DDH assumption [28]. The first thing to note is that the DDH assumption instantly implies a method for the construction of a collection of pseudorandom generators. To show this, let us start by defining the parameterization set P_(p,g,g^a) as the infinite set of all tuples of the form (p, g, g^a) where p = 2q + 1 is some safe prime, g is a generator for QR_p and a ∈ Z_q. We denote the subset of all such tuples in which p is an n-bit prime as P_n,(p,g,g^a) and we note that P_(p,g,g^a) = ∪_{n∈N} P_n,(p,g,g^a) is a valid parameterization set. In particular, for all n ∈ N, on input 1^n the instance generation algorithm for P_(p,g,g^a) first runs the instance generation algorithm for P_safe primes to obtain an n-bit safe prime p, then runs the instance description algorithm for the group family {QR_p} to obtain a generator g for QR_p, before finally selecting an element a of Z_q uniformly at random. We then define the function modexp_p,g : Z_q → QR_p via

modexp_p,g(x) = g^x mod p,

which allows us, for all tuples (p, g, g^a) ∈ P_(p,g,g^a), to define the function

G_(p,g,g^a)(b) := (modexp_p,g(b), modexp_p,g^a(b)) = (g^b mod p, g^{ab} mod p).

Given this construction, we now note, following Ref. [28], that {G_(p,g,g^a) | (p, g, g^a) ∈ P_(p,g,g^a)} is a collection of pseudorandom generators, under the assumption that {QR_p} satisfies the DDH assumption.
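A sketch of this PRG on toy parameters (illustrative values only; a real instantiation would use a cryptographically large safe prime):

```python
p = 23            # toy safe prime (p = 2q + 1, q = 11)
q = (p - 1) // 2
g = 4             # generator of QR_p

def modexp(p, base, exponent):
    """modexp_{p,base}: Z_q -> QR_p, the modular exponentiation map."""
    return pow(base, exponent, p)

def G(p, g, ga, b):
    """DDH-based PRG candidate G_(p,g,g^a): on seed b in Z_q, output
    (g^b mod p, (g^a)^b mod p) = (g^b, g^ab)."""
    return (modexp(p, g, b), modexp(p, ga, b))

a = 7
ga = pow(g, a, p)   # the public parameterization element g^a
b = 5               # a uniformly random seed in Z_q
out = G(p, g, ga, b)
assert out == (pow(g, b, p), pow(g, a * b, p))  # the pair (g^b, g^ab)
```

Under DDH the second component g^{ab} is indistinguishable from a fresh uniform element g^c, which is exactly the pseudorandomness of the output pair.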
Observation 3 ({G_(p,g,g^a)} is a Collection of Pseudorandom Generators [28]). Note that when using the group family of quadratic residues, we can rewrite Eq. (38) as

|Pr[A((p, g, g^a), G_(p,g,g^a)(b)) = 1] − Pr[A((p, g, g^a), y) = 1]| ≤ 1/p(n),

with b ← U(Z_q) and y ← U(QR_p × QR_p), where we have used the fact that for a generator g of QR_p, the pair (g^b, g^c) with b, c ← U(Z_q) is uniformly distributed over QR_p × QR_p. By comparison with Definition 12, we therefore see that {G_(p,g,g^a)} is an indexed collection of pseudorandom generators, under the assumption that {QR_p} satisfies the DDH assumption.
We would now like to use the PRG {G_(p,g,g^a)} to define a suitable classical-secure PRF collection. As {G_(p,g,g^a)} is a collection of effectively length doubling pseudorandom generators, it has been observed multiple times [28,36,38] that one could in principle construct a classical-secure PRF collection from {G_(p,g,g^a)} via the Goldreich-Goldwasser-Micali (GGM) construction [31,33]. However, as noted and discussed in Ref. [39], one needs to take some care in order to construct such a PRF collection in a rigorous way. More specifically, the GGM construction requires that the functions G^0_(p,g,g^a) and G^1_(p,g,g^a) (defined via Eq. (41)) be iterated in an order defined via the input to the PRF, and for such an iteration to be well defined, and for the GGM proof to hold with only minor modifications, it is essential that there exists an efficient bijection from the codomain of G^i_(p,g,g^a) to its domain - i.e. a function which efficiently enumerates the elements of the group. However, for the group family of quadratic residues, such an efficient bijection exists, and as such it is indeed possible to construct a PRF collection, via the GGM construction, starting from a slightly modified DDH based PRG collection. To make this more precise, given some safe prime p = 2q + 1, we define the function f_p : QR_p → Z_q via

f_p(y) = y if y ≤ q, and f_p(y) = p − y otherwise.

As noted in Ref. [39], this function is in fact a bijection, whose inverse f_p^{−1} : Z_q → QR_p is given by

f_p^{−1}(x) = x if x ∈ QR_p, and f_p^{−1}(x) = p − x otherwise.

While it is clear that f_p can be efficiently computed, the efficiency of f_p^{−1} is less obvious, and follows from the fact that group membership in QR_p can be efficiently tested [37]. With this in hand, we now define the indexed collection of functions {Ĝ_(p,g,g^a)}, where for all valid parameterizations Ĝ_(p,g,g^a) : Z_q → Z_q × Z_q (45) is defined via

Ĝ_(p,g,g^a)(b) := (f_p(modexp_p,g(b)), f_p(modexp_p,g^a(b))) =: (Ĝ^0_(p,g,g^a)(b), Ĝ^1_(p,g,g^a)(b)).

As f_p is a bijection, we again have, analogously to Observation 3, that {Ĝ_(p,g,g^a)} is an indexed collection of PRG's, under the DDH assumption for {QR_p}. Given this, we can finally construct the DDH based PRF which will fulfill all our requirements. Specifically, we consider an indexed collection of keyed functions {F_(p,g,g^a) | (p, g, g^a) ∈ P_(p,g,g^a)}, where F_(p,g,g^a) : Z_q × {0,1}^n → Z_q is defined algorithmically, via the GGM construction [31,33], as follows:

Algorithm 1 Algorithmic implementation of F_(p,g,g^a)(b, x)
1: Given parameterization (p, g, g^a), as well as key b ∈ Z_q and input x ∈ {0,1}^n
2: b_0 ← b
3: for all 1 ≤ j ≤ n do
4:   if x_j = 0 then
5:     b_j ← Ĝ^0_(p,g,g^a)(b_{j−1})
6:   else
7:     b_j ← Ĝ^1_(p,g,g^a)(b_{j−1})
8:   end if
9: end for
10: Output F_(p,g,g^a)(b, x) := b_n

The above algorithm is also illustrated in Fig. 4, which serves to illustrate that for all tuples (b, x) the desired output F_(p,g,g^a)(b, x) can be calculated by moving through a binary tree, with the key b ∈ Z_q at the root, and where at each level either Ĝ^0_(p,g,g^a) or Ĝ^1_(p,g,g^a) is applied, with the path determined by the input string x ∈ {0,1}^n. We now make the following claims:

Claim 1. {F_(p,g,g^a)} is a classical-secure pseudorandom function collection.

Claim 2. For all n ∈ N, there exists an exact efficient quantum key-learning algorithm which, for all valid parameterizations (p, g, g^a) ∈ P_n,(p,g,g^a), all x ∈ {0,1}^n, and all b ∈ Z_q, on input (p, g, g^a), x and F_(p,g,g^a)(b, x) returns b with probability 1.
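The bijection f_p and the GGM evaluation of Algorithm 1 can be sketched on toy parameters as follows. One implementation choice of this sketch (not spelled out in the text) is the identification of the image {1,...,q} of f_p with Z_q by mapping q to 0:

```python
p = 23            # toy safe prime p = 2q + 1, so q = 11
q = (p - 1) // 2
g, ga = 4, pow(4, 7, p)   # generator of QR_p, and g^a for a = 7

def is_qr(y, p):
    """Euler's criterion for membership in QR_p."""
    return pow(y, (p - 1) // 2, p) == 1

def f(p, y):
    """The bijection f_p : QR_p -> Z_q; sending q to 0 identifies the
    image {1,...,q} with Z_q (an implementation choice of this sketch)."""
    v = y if y <= (p - 1) // 2 else p - y
    return v % ((p - 1) // 2)

def f_inv(p, x):
    """f_p^{-1} : Z_q -> QR_p, using the QR-membership test."""
    v = x if x != 0 else (p - 1) // 2
    return v if is_qr(v, p) else p - v

def G0(b):  # \hat{G}^0 = f_p o modexp_{p,g}
    return f(p, pow(g, b, p))

def G1(b):  # \hat{G}^1 = f_p o modexp_{p,g^a}
    return f(p, pow(ga, b, p))

def F(b, x_bits):
    """GGM evaluation of F_(p,g,g^a)(b, x) (Algorithm 1): walk the binary
    tree from the root key b, applying G0 for a 0 bit and G1 for a 1 bit."""
    for bit in x_bits:
        b = G0(b) if bit == '0' else G1(b)
    return b

qrs = sorted({pow(x, 2, p) for x in range(1, p)})
assert sorted(f(p, y) for y in qrs) == list(range(q))   # f_p is a bijection
assert all(f_inv(p, f(p, y)) == y for y in qrs)         # with inverse f_inv

out = F(5, '0110')
assert 0 <= out < q
```

The two assertions on qrs verify exactly the properties needed for the GGM iteration to be well defined: f_p enumerates QR_p, and f_p^{-1} can be computed using only QR-membership tests.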
Figure 4: (Left Panel) An illustration of the GGM tree construction for {F_(p,g,g^a)} (Algorithm 1). For all input tuples (b, x) the desired output F_(p,g,g^a)(b, x) can be calculated by moving through a binary tree, with the key b ∈ Z_q at the root, and where at each level either Ĝ^0_(p,g,g^a) = f_p ∘ modexp_p,g or Ĝ^1_(p,g,g^a) = f_p ∘ modexp_p,g^a is applied, with the path determined by the input string x ∈ {0,1}^n. (Right Panel) An illustration of the "tree-reversal" key-learning algorithm (Algorithm 2), with input the parameterization (p, g, g^a) and a "sample" x||F_(p,g,g^a)(b, x), which, given this leaf of the tree along with the description of the path x from the root to this leaf, exploits knowledge of the path, together with the ability to invert Ĝ^i_(p,g,g^a), to reverse the tree and obtain the value of the root node.
Before proceeding to prove these claims, we make the following observations: Firstly, note that given Claim 1, the indexed collection of keyed functions {F_(p,g,g^a)} satisfies all the requirements of both Theorem 1 and Corollary 1.1, and as such, for all sufficiently large n, the distribution class defined as per Corollary 1.1 is not efficiently classically PAC learnable with respect to SAMPLE and the TV-distance.
Additionally, let us assume that a quantum GEN-learner A is given access to SAMPLE(D_((p,g,g^a),b)), for some (p, g, g^a) ∈ P_(n,(p,g,g^a)). Note that

Query[SAMPLE(D_((p,g,g^a),b))] = x||BIN(F_(p,g,g^a)(b, x))||BIN(p, g, g^a) with x ← U_n,    (49)

and therefore, as illustrated in Fig. 3, from any single such sample the quantum GEN-learner A can run the quantum key-learning algorithm from Claim 2 on input (p, g, g^a), x and F_(p,g,g^a)(b, x), and obtain the key b with probability 1, which, coupled with the parameterization (p, g, g^a), provides a complete description of the exact generator GEN_((p,g,g^a),b) for D_((p,g,g^a),b). As a result, it follows from Claim 2 that the distribution class {D_((p,g,g^a),b)} is efficiently quantum PAC GEN-learnable with SAMPLE access, with respect to both the KL-divergence and the TV-distance (due to the fact that the learner always learns an exact generator). Given this discussion, we see that the following theorem, providing an affirmative answer to Question 1, follows from Claims 1 and 2:

Theorem 2 (Quantum GEN-Learning Advantage for {D_((p,g,g^a),b)}). For all sufficiently large n the distribution class is not efficiently classically PAC GEN-learnable with respect to the SAMPLE oracle and the TV-distance, but is efficiently quantum PAC GEN-learnable with respect to the SAMPLE oracle and the TV-distance.
At this point, all that remains is to provide the proofs of Claims 1 and 2. As {G_(p,g,g^a)} is a collection of length-doubling PRGs, the proof of Claim 1 is essentially identical to the original GGM proof [31,33], with only minor straightforward modifications required to the definition of the hybrid distributions in order to account for the slightly more general parameterization set. As such, we omit this proof and proceed with a constructive proof of Claim 2. Intuitively, as illustrated in Fig. 4, given (p, g, g^a), x and F_(p,g,g^a)(b, x), i.e. a "leaf" of the GGM tree, we construct an algorithm which can reverse the path taken through the tree to reach this leaf, and return the value of the root node, which is the desired key. More formally, the proof is as follows:

Proof (Claim 2). For all tuples (p, g), where p = 2q + 1 is an n-bit safe prime and g is a generator for QR_p, we define the discrete log function dlog_(p,g) : QR_p → Z_q via

dlog_(p,g)(g^x) = x mod q,

for all x ∈ N. As shown in Ref. [30], for all relevant tuples (p, g) there exists an exact efficient quantum algorithm for computing dlog_(p,g), i.e. an efficient quantum algorithm which on input (p, g, g^x) outputs dlog_(p,g)(g^x) with probability 1. Given the ability to efficiently calculate dlog as a subroutine, we can now construct a "tree-reversal" algorithm by using the string x to determine whether f_p ∘ modexp_(p,g) or f_p ∘ modexp_(p,g^a) was applied at a given level of the tree, and then applying either dlog_(p,g) ∘ f_p^(-1) or dlog_(p,g^a) ∘ f_p^(-1) as appropriate. More specifically, the following algorithm, also illustrated in Fig. 4, provides a constructive proof for the claim:

Algorithm 2: Exact efficient quantum key learner for F_(p,g,g^a)(b, x)
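The tree-reversal idea can be made concrete with a small classical simulation. The sketch below is illustrative only: a brute-force discrete log stands in for the exact quantum dlog subroutine of Ref. [30], and the bijection f_p : QR_p → Z_q is instantiated as y ↦ min(y, p − y) mod q, a standard choice for safe primes which we assume here for the toy parameters.

```python
# Toy classical simulation of Algorithm 2 (GGM tree reversal).
# Assumptions of this sketch: tiny safe prime p = 2q + 1, brute-force dlog
# in place of the exact quantum algorithm, and f_p(y) = min(y, p - y) mod q.

p, q = 23, 11          # toy safe prime p = 2q + 1
g = 2                  # generator of QR_23 (2 = 5^2 mod 23)
a = 5
ga = pow(g, a, p)      # g^a = 9

def f_p(y):
    """Bijection QR_p -> Z_q (valid since -1 is a non-residue mod a safe prime)."""
    return min(y, p - y) % q

def f_p_inv(z):
    """Inverse bijection Z_q -> QR_p: pick the representative that is a QR."""
    y = z if z != 0 else q
    return y if pow(y, q, p) == 1 else p - y   # Euler criterion: y^q = 1 mod p iff y in QR_p

def dlog(base, y):
    """Brute-force discrete log in QR_p; a quantum computer would run Shor here."""
    return next(e for e in range(q) if pow(base, e, p) == y)

def ggm_eval(b, x_bits):
    """Forward GGM evaluation of F_(p,g,g^a)(b, x): G0 uses base g, G1 uses g^a."""
    v = b
    for bit in x_bits:
        v = f_p(pow(g if bit == 0 else ga, v, p))
    return v

def key_learner(leaf, x_bits):
    """Tree reversal: undo f_p, then take dlog in the base the path dictates."""
    v = leaf
    for bit in reversed(x_bits):
        v = dlog(g if bit == 0 else ga, f_p_inv(v))
    return v

x = [0, 1, 1, 0]
print(key_learner(ggm_eval(7, x), x))  # recovers the key: 7
```

Each reversal step inverts one level exactly, which is why the learner succeeds with probability 1 once an exact dlog subroutine is available.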

Verification of Quantum GEN-Learning Advantage
In light of our main result, given in Theorem 2, it is natural to ask whether such a quantum GEN-learning advantage could be efficiently verified. While there are a variety of ways to formalize the notion of a verification procedure in this context, one approach is as follows: We consider an honest referee, Ronald, who for some sufficiently large n is in possession of a full description of some element D_((p,g,g^a),b) of the distribution class C_n, which in this case means possession of the generator GEN_((p,g,g^a),b), its defining parameters (p, g, g^a) ∈ P_(n,(p,g,g^a)) and b ∈ Z_q, and a description of its construction. Ronald then selects some valid (ε, δ) pair and asks Alice, who claims to have a quantum computer, to run the claimed quantum GEN-learning algorithm A for C_n on input (ε, δ), by using his access to GEN_((p,g,g^a),b) to respond to Alice's SAMPLE queries. Alice then obtains from her claimed quantum learning algorithm some classical hypothesis generator HGEN, which generates samples from some distribution D_h. If Alice's learning algorithm is indeed a PAC GEN-learner for C_n, then with probability 1 − δ, HGEN should be a classical (d_TV, ε)-generator for D_((p,g,g^a),b). The question of verification is then whether there exists some efficient interactive procedure V (with suitable completeness and soundness guarantees) through which Alice, if HGEN is indeed a (d_TV, ε)-generator for D_((p,g,g^a),b), can convince Ronald of this fact.
In the most general (weak-membership) setting of verification, Ronald should with high probability reject HGEN if the generated distribution D_h is at least ε-far in TV-distance from the target distribution D_((p,g,g^a),b), and accept with high probability if the generated distribution D_h is ε'-close to the target, for some ε' < ε. As such, the weak-membership verification problem, with non-zero completeness parameter ε', is precisely the well-studied problem of tolerant identity testing between the learned distribution D_h and the target distribution D_((p,g,g^a),b) [40].
In the simplest (black-box) verification protocol, Alice can only send samples from the generator HGEN to Ronald to convince him of the fact that HGEN is indeed a (d_TV, ε)-generator for D_((p,g,g^a),b); that is, she allows Ronald random example queries to HGEN. However, given that HGEN is a classical generator, Alice could also allow Ronald membership queries to HGEN (after publishing a description of the domain). Clearly, such a protocol is stronger, as in the latter case Ronald has access not only to random samples from D_h, but to additional information about the generator as well. Importantly, as alluded to above, we note that membership-query based verification is only a possibility in the case of GEN-learners, which output classical generators. In the case of QGEN-learners, which output quantum generators, it is only possible to consider the setting of black-box (sample-based) verification.
As we are primarily concerned here with the verification of the GEN-learner presented in the previous section, let us focus first on the setting in which Alice publishes the domain of HGEN and allows Ronald membership queries to HGEN. Recall that GEN_((p,g,g^a),b) : {0,1}^n → {0,1}^(2n+m), i.e. its domain is {0,1}^n. We can now distinguish between two possibilities. The first possibility is that the domain of HGEN is the same as that of GEN_((p,g,g^a),b), i.e. HGEN : {0,1}^n → {0,1}^(2n+m), and the second is that it differs, i.e. HGEN : X → {0,1}^(2n+m) for some domain X ≠ {0,1}^n. The following lemma allows us to show that efficient verification is indeed possible in the first case, and therefore, in particular, for the GEN-learner presented in the previous section.
Proof. We start with the observation that for any distribution D ∈ D_m generated by GEN_D : {0,1}^n → {0,1}^m we have that

D(y) = (1/2^n) Σ_{x ∈ {0,1}^n} δ(y, GEN_D(x)),

where δ(y, y') = 1 if y = y' and δ(y, y') = 0 otherwise. Using this we then have that

d_TV(D_1, D_2) = (1/2) Σ_y |D_1(y) − D_2(y)| ≤ Pr_{x ← U_n}[GEN_D1(x) ≠ GEN_D2(x)] =: dist(GEN_D1, GEN_D2).

In particular, when HGEN : {0,1}^n → {0,1}^(2n+m) we can apply the above lemma, with HGEN as GEN_D1 and GEN_((p,g,g^a),b) as GEN_D2, which allows us to see that Ronald can convince himself that d_TV(D_h, D_((p,g,g^a),b)) ≤ ε by checking that dist(HGEN, GEN_((p,g,g^a),b)) ≤ ε. To do this, Ronald draws a uniformly random x ∈ {0,1}^n, queries both HGEN and GEN_((p,g,g^a),b), and outputs 1 in the event that HGEN(x) ≠ GEN_((p,g,g^a),b)(x) and 0 otherwise. By repeating this procedure O((1/ε̃^2) log(1/η)) many times, he can estimate the bias dist(HGEN, GEN_((p,g,g^a),b)) of the corresponding coin up to error ε̃ with failure probability η. He accepts the hypothesis generator HGEN if his estimate b̃ satisfies b̃ + ε̃ < ε and rejects otherwise. Given this, we note that if Alice runs the quantum GEN-learner described in the previous section, then she will obtain, with certainty, the exact generator GEN_((p,g,g^a),b) for D_((p,g,g^a),b), so that dist(HGEN, GEN_((p,g,g^a),b)) = 0, and the above verification procedure can be applied. In the case of the quantum GEN-learner that we have provided, it is therefore straightforward to verify the quantum learning advantage exhibited by this learner.
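Ronald's coin-estimation procedure is simple to realize. The following is a minimal sketch, with toy stand-in generators rather than the actual construction; the repetition count comes from a Hoeffding bound.

```python
import math
import random

def estimate_dist(hgen, gen, n, eps_tilde, eta, rng):
    """Estimate dist(HGEN, GEN) = Pr_x[HGEN(x) != GEN(x)] over uniform x in {0,1}^n.
    Hoeffding: ceil(log(2/eta) / (2 * eps_tilde^2)) repetitions give additive
    error eps_tilde except with probability eta."""
    reps = math.ceil(math.log(2 / eta) / (2 * eps_tilde ** 2))
    disagreements = 0
    for _ in range(reps):
        x = rng.getrandbits(n)          # uniformly random domain point
        disagreements += (hgen(x) != gen(x))
    return disagreements / reps

def verify(hgen, gen, n, eps, eps_tilde, eta, rng):
    """Accept iff the estimated bias plus its error margin lies below eps."""
    return estimate_dist(hgen, gen, n, eps_tilde, eta, rng) + eps_tilde < eps

# Toy example on {0,1}^8: the hypothesis disagrees on exactly 1/4 of inputs.
gen = lambda x: (x, 0)                               # stand-in target generator
hgen = lambda x: (x, 1) if x % 4 == 0 else (x, 0)    # wrong whenever x % 4 == 0

est = estimate_dist(hgen, gen, 8, 0.02, 0.01, random.Random(0))  # close to 0.25
```

Note that when Alice has run the exact learner of the previous section, every query agrees, so the estimate is 0 and Ronald accepts for any ε > ε̃.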
Finally, if we are in the latter case, i.e. HGEN : X → {0,1}^(2n+m) for some set X ≠ {0,1}^n, then it is unclear how membership queries to the hypothesis generator HGEN would aid Ronald in verifying closeness. The problem of verification with membership queries then reduces to the problem of verification with random example queries, that is, to tolerant identity testing, from random samples, of the distributions D_h and D_((p,g,g^a),b). We note that the distribution D_((p,g,g^a),b) is in fact the uniform distribution over a size-2^n subset of {0,1}^(2n+m), and therefore it is known that there exists a universal constant ε_0 > 0 such that for any ε' ∈ (0, ε_0) and ε = cε' with c > 1, at least Ω(2^n/(ε n)) many samples from D_h are required to verify the hypothesis generator [40-42]. In the simpler setting in which ε' = 0 (i.e. only generators of the exact target distribution should be accepted, while ε-far distributions are still rejected) it is known that Ω(√(2^n)/ε^2) many samples from D_h are required [40,43,44]. In summary, we see that the quantum learning advantage exhibited by the learning algorithm provided in the previous section can indeed be efficiently verified, as a result of the fact that the learner always obtains an exact classical generator whose domain coincides with that of the target generator, and therefore allows for efficient membership-query based tolerant identity testing. However, in general, verification of a claimed GEN-learner (or QGEN-learner) reduces to the problem of black-box (sample-based) tolerant distribution identity testing, which in general requires exponential sample complexity. Given this, we note that in general, verification of a generative learner is a harder task than verification of a supervised learner, which can be performed efficiently in the black-box setting [45].

On Classical Hardness Results From Alternative Primitives
One of the key tools which allowed us to prove our main result (Theorem 2) was the construction and associated classical hardness result of KMRRSS [29], presented here (generalized and strengthened) as Theorem 1. In this section we shift direction slightly, and explore the possibility of obtaining similar classical hardness results from alternative primitives. More specifically, as we have seen in the previous section, KMRRSS have shown that given any classical-secure PRF collection, it is possible to construct a distribution class which is not efficiently classically learnable. In Section 4.1 we ask whether it is necessary to use a PRF collection to obtain such a classical hardness result, or whether the weaker notion of a weak-secure PRF collection [36] would be sufficient. This question is motivated partly by the existence of weak-secure PRFs with known quantum adversaries [36]: if a classical hardness result could be obtained from such primitives, then one might be able to obtain additional quantum/classical distribution learning separations, possibly achievable by near-term quantum learners. In Section 4.2 we then ask whether one could instantiate the construction of KMRRSS with a Boolean function concept class which is provably not efficiently classically PAC learnable. In this latter case, the motivation is to better understand the relationship between PAC learning of Boolean functions and PAC learning of discrete distributions, and in particular whether one could leverage existing classical/quantum separations for learning Boolean functions into distribution learning separations. In Sections 4.1 and 4.2 we formulate conjectures concerning the possibility of using these alternative primitives for classical hardness results, and describe clearly some obstacles to proving these conjectures.

Classical Hardness Results from Weak-Secure Pseudorandom Function Collections
Let us begin by discussing the possibility of proving a classical/quantum learning separation based on weak-secure PRFs, as opposed to the classical-secure PRFs used in Theorem 1. In order to understand the motivation for this question, it is necessary to briefly return to an analysis of the proof of Theorem 1. In particular, at an informal level (i.e. modulo some details) we recall the following:

1. Theorem 1 states that if {F_P} is a classical-secure PRF collection, then for infinitely many n the distribution class C_n = {D_(P,k)} is not classically efficiently PAC GEN-learnable.
2. The proof of Theorem 1 proceeds by assuming that the concept class C_n is efficiently PAC GEN-learnable for infinitely many n, and then using the associated learning algorithms Ã_n to construct a polynomial time algorithm A which Q-infers the function collection {F_P}, for some polynomial Q, thereby contradicting the assumption that {F_P} is a classical-secure pseudorandom function collection.
More specifically, the proof of Theorem 1 begins with the observation that

Query[SAMPLE(D_(P,k))] = x||BIN_n(F_P(k, x)) with x ← U_n,

and therefore, when given oracle access to MQ(F_P(k,·)), the polynomial inference algorithm A can simulate the learner Ã_n by responding to sample queries of Ã_n with x||BIN_n(Query[MQ(F_P(k,·))](x)), where x has been drawn uniformly at random. Algorithm A then uses the obtained generator to "pass the exam" described in Definition 15. However, we note that in order to simulate the learner Ã_n, it is not necessary for the inference algorithm to have membership query access to F_P(k,·). In particular, we have that

Query[PEX(F_P(k,·), U)] = (x, F_P(k, x)) with x ← U_n,

and therefore the entire proof of Theorem 1 holds even if the polynomial inference algorithm A only has random example oracle access to F_P(k,·). In light of this observation, one may immediately think that a weak-secure PRF collection would be sufficient for instantiating the construction described in Theorem 1.
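In code, simulating a SAMPLE query from random example access alone is immediate. The sketch below is illustrative: the toy keyed function and the bit encodings are assumptions of this sketch, not the paper's exact BIN_n.

```python
import random

def pex(f, n, rng):
    """Random example oracle PEX(f, U): returns (x, f(x)) for uniform x."""
    x = tuple(rng.randrange(2) for _ in range(n))
    return x, f(x)

def sample_from_pex(f, n, rng):
    """Simulate one SAMPLE(D_(P,k)) query: the concatenation x || f(x).
    A single random example suffices; no membership query to f is needed."""
    x, fx = pex(f, n, rng)
    return x + fx                 # fx is already a bit tuple in this toy

# Toy keyed function: inner product of x with a fixed key, as a 1-bit tuple.
key = (1, 0, 1, 1)
f = lambda x: (sum(a * b for a, b in zip(key, x)) % 2,)

s = sample_from_pex(f, 4, random.Random(0))   # a 5-bit string x || f(x)
```

This is precisely why the inference algorithm built from a distribution learner never needs to choose its query points, which motivates asking whether weak security suffices.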
In other words, given that the proof of Theorem 1 holds when the polynomial inference algorithm only has random example oracle access to F_P(k,·), it seems plausible that if one can efficiently learn the distribution class {D_(P,k)}, then one can use this learner to construct an adversary (i.e. distinguishing algorithm) for the function collection {F_P} which only requires random example oracle access. To formalize this idea, we make the following conjecture, a slight variant of Theorem 1:

Conjecture 1. Let {F_P} be a weak-secure pseudorandom function collection with the additional properties stated in Theorem 1. Then, for all sufficiently large n the distribution class C_n := {D_(P,k) | P ∈ P_n, k ∈ K_P} is not efficiently classically PAC GEN-learnable with respect to the SAMPLE oracle and the TV-distance.
Given the proof of Theorem 1, and the observation that the polynomial inference algorithm involved in the proof requires only random example oracle access, one might think that a proof of Conjecture 1 would follow immediately. Unfortunately, this is not the case. However, before discussing the obstacles one faces in adapting the proof of Theorem 1 to prove Conjecture 1, it is worth examining briefly why Conjecture 1 is interesting, and what consequences its truth might have for obtaining classical/quantum distribution learning separations. Firstly, as discussed in Ref. [36], there is currently strong evidence that weak-secure PRFs are indeed a less complex object than classical-secure PRFs. More specifically, it is believed that there exist weak-secure PRFs which are not classical-secure PRFs [36], and as such Conjecture 1 would provide evidence that the existence of classical-secure PRFs is not necessary for the construction of distribution classes which are provably not classically efficiently PAC GEN-learnable.
Additionally, one candidate for such a weak-secure PRF collection is based on the "learning parity with noise" problem, which is strongly believed to be hard for classical algorithms with classical random examples [46], but which is known to be efficiently solvable by quantum algorithms with quantum random examples [47,48]. Importantly, unlike quantum algorithms for the discrete logarithm [30], which seem to require a universal fault-tolerant quantum computer, quantum algorithms for "learning parity with noise" (LPWN) are robust against certain types of noise [47], and have in fact already been demonstrated on existing NISQ devices [49]. As such, while demonstrating a quantum advantage for the generator-learnability of the DDH-based concept class described in Section 3.3 would require a universal fault-tolerant quantum computer, it is plausible that if Conjecture 1 is true, then one could construct an LPWN-based concept class which is not classically efficiently PAC GEN-learnable, but which is quantum efficiently PAC GEN-learnable using existing or near-term quantum devices, albeit with quantum random samples. With these observations in mind, let us return to a discussion of the difficulties in adapting the proof of Theorem 1 into a proof of Conjecture 1.
Analogously to the proof of Theorem 1, we would like to show that if the distribution class {D_(P,k)} is efficiently PAC generator-learnable (with respect to SAMPLE and the KL-divergence), then the function collection {F_P} is not weak-secure. However, it is critical to note that in proving Theorem 1 we relied heavily on the alternative characterization of classical-secure PRFs provided by Lemma 1. More specifically, we used the fact that if there exists a polynomial inference algorithm (using membership queries) for an indexed collection of keyed functions, then this collection of functions is not standard-secure. Now, from the previous discussion, we know that if the distribution class {D_(P,k)} is efficiently PAC GEN-learnable, then there exists an efficient polynomial inference algorithm for {F_P} which only requires random examples, as opposed to membership queries. However, it is not clear that this implies that the function collection {F_P} is not weak-secure! In other words, in order to adapt the proof of Theorem 1 to a proof of Conjecture 1, we need an alternative characterization of weak-secure PRFs analogous to Lemma 1, i.e. a statement that if there exists an efficient polynomial inference algorithm for {F_P} which only requires random examples, then {F_P} is not a weak-secure PRF collection. To understand why obtaining such a characterization is tricky, it is necessary to sketch the original proof of Lemma 1 from Ref.
[31]. In order to prove the direction that we are concerned with, one starts by assuming that there exists a polynomial inference algorithm A for {F_P}, and then uses this algorithm to construct a new distinguishing algorithm A' which, when given membership query access to some unknown function F, can with non-negligible probability determine whether this function was drawn uniformly at random from the set of all functions F : D_P → D_P, or uniformly at random from the set of functions {F_P(k,·) | k ∈ K_P}. More specifically, algorithm A' works as follows:

1. When given some parameterization P, along with oracle access to MQ(F), the distinguishing algorithm A' begins by simulating the inference algorithm A, which returns an "exam string" x.
2. Using MQ(F), algorithm A' then "prepares the exam", i.e. presents algorithm A with y_1 = F(x) and y_2 ← U_(D_P) in a random order.
3. The inference algorithm A then "takes the exam", and picks either y_1 or y_2.
4. If A picks y_1, then A' outputs 1; otherwise A' outputs 0.
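The four steps above can be sketched generically, with the inference algorithm passed in as a black box. Everything here (the names, the toy function, the "perfect" inference algorithm) is illustrative rather than the paper's formal objects.

```python
import random

def distinguish(mq, infer, domain, rng):
    """Distinguisher A' built from a polynomial inference algorithm A.
    mq:    membership query oracle for the unknown function F
    infer: inference algorithm; returns an exam string x and a callback that,
           shown the exam answers in random order, picks one of them."""
    x, take_exam = infer(mq)            # step 1: A returns an exam string x
    y1 = mq(x)                          # step 2: prepare the exam with F(x) ...
    y2 = rng.choice(domain)             # ... and a uniformly random alternative
    exam = [y1, y2]
    rng.shuffle(exam)                   # presented in a random order
    picked = take_exam(exam)            # step 3: A takes the exam
    return 1 if picked == y1 else 0     # step 4

# Toy demonstration: an inference algorithm that has perfectly learned F
# always passes the exam, so A' always outputs 1 on the structured function.
domain = list(range(16))
F = lambda x: (3 * x + 1) % 16

def perfect_infer(mq):
    x = 5
    fx = mq(x)                          # value learned during "inference"
    return x, lambda exam: fx if fx in exam else exam[0]
```

On a truly random F, any inference algorithm picks y_1 with probability close to 1/2, which is exactly the gap A' exploits; the obstacle discussed next is that preparing y_1 = F(x) requires a membership query.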
The fact that A' succeeds with non-negligible probability then follows straightforwardly from the fact that A Q-infers {F_P} for some polynomial Q [31]. In light of the above sketch, we can now analyze the difficulties one faces in adapting the above proof when both the distinguishing algorithm and the inference algorithm are only allowed random example oracle access. Recall, we want to show that if A Q-infers {F_P} using only random examples, i.e. with PEX access, then we can build a suitable distinguishing algorithm A' which also only requires random examples (which would imply that {F_P} is not weak-secure). So, as per the proof sketched above for the case of membership queries, when given access to PEX(F, U), the distinguishing algorithm A' could start by simulating the inference algorithm A, which returns an exam string x. It is at this point that we encounter a problem! Specifically, given the exam string x, A' should prepare the exam by returning y_1 = F(x) along with some y_2 drawn uniformly at random from D_P. If A' has access to MQ(F), then it is straightforward to prepare the exam, as Query[MQ(F)](x) = F(x). However, if A' only has access to PEX(F, U), then it cannot prepare the exam! Note that if we modified the definition of polynomial inference (given as Definition 15) so that the inference algorithm does not get to choose its exam string, but is instead given an exam string sampled uniformly at random (from the set of strings which have not yet been used), then algorithm A' could prepare the exam for algorithm A, and the rest of the proof would hold, yielding an alternative characterization of weak-secure PRF collections in terms of a slightly modified notion of polynomial inference. However, we note that with such a modified definition of polynomial inference, the proof of Theorem 1 no longer works! In particular, recall that the proof of Theorem 1 relies heavily on the fact that the constructed inference algorithm can use the generator it obtained from the distribution learner to choose its own exam string. In other words, if a polynomial inference algorithm for {F_P} is required to pass a randomly drawn exam with non-negligible probability, then it is completely unclear how a distribution learner for {D_(P,k)} can be used to construct a successful polynomial inference algorithm. Given these observations, we see that while Conjecture 1 seems plausible, and has a variety of interesting consequences if true, one cannot simply adapt the proof of Theorem 1 to this modified setting.

Classical Hardness Results from Hard to Learn Function Concept Classes
While in this work we have so far focused primarily on the PAC learnability of distribution concept classes, as an abstraction of generative modelling, there exists already a large body of work concerning the quantum versus classical PAC learnability of Boolean function concept classes [4]. In this section, we aim to explore the relationship between these two notions, and in particular whether existing results in the latter context can be leveraged to obtain results in the former. As a starting point, we note that in principle one could instantiate the distribution class construction from KMRRSS [29] with a Boolean function concept class, as formalized by the following definition:

Definition 19. Given some Boolean function f ∈ F_n, we define the distribution D_f ∈ D_(n+1) as the distribution defined via the classical generator

GEN_f : {0,1}^n → {0,1}^(n+1), GEN_f(x) = x||f(x).

Additionally, given some concept class C ⊆ F_n, we define the distribution class

D_C := {D_f | f ∈ C}.

Given the above construction, we proceed in this section to prove Theorem 3, and to discuss in detail its converse statement, which we formalize as Conjecture 2.
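Definition 19, and the way a function hypothesis h is converted into a generator in the direction established by Theorem 3, can be sketched as follows. This is a toy illustration; the bit encodings and the helper names are assumptions of this sketch.

```python
import random

def gen_f(f, x):
    """The classical generator of Definition 19: GEN_f(x) = x || f(x)."""
    return x + (f(x),)

def sample_D_f(f, n, rng):
    """One sample from D_f: apply GEN_f to a uniformly random input."""
    x = tuple(rng.randrange(2) for _ in range(n))
    return gen_f(f, x)

def hypothesis_generator(h):
    """A GEN-learner for D_C can wrap a function learner: given the hypothesis
    h output by the function learner, simply output the generator GEN_h."""
    return lambda x: gen_f(h, x)

# Toy concept: a parity function on 3 bits.
c = lambda x: x[0] ^ x[2]
rng = random.Random(0)
samples = [sample_D_f(c, 3, rng) for _ in range(5)]   # each sample is x || c(x)
```

The last bit of every sample is determined by the preceding bits, which is exactly why samples from D_c simulate random examples of c and vice versa.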
Apart from shedding some light on the relationship between function learnability and distribution learnability, what we might hope is that, taken together, Theorem 3 and Conjecture 2 (if true) would allow us to immediately leverage some existing separation between the classical versus quantum learnability of a particular Boolean function concept class C to obtain a separation between the classical versus quantum learnability of the associated distribution class D_C. Unfortunately, however, this is not the case. In particular, we stress that both Theorem 3 and Conjecture 2 describe a relationship between the generator-learnability of D_C and the distribution-specific PAC learnability of the concept class C, with respect to the uniform distribution, as well as with respect to the classical random example PEX oracle. More specifically, what this means is that, if Conjecture 2 is true, and if there exists a concept class C which has the following properties: (a) C is not efficiently classically PAC learnable, with respect to the uniform distribution and the PEX oracle, (b) C is efficiently quantum learnable, with respect to the uniform distribution and the PEX oracle, then the distribution class D_C would not be efficiently classically PAC GEN-learnable (via Conjecture 2), but it would be efficiently quantum PAC GEN-learnable (via Theorem 3). However, at present it is not known whether a concept class C with both of the above properties exists. More specifically, as discussed in Ref.
[4], Kearns and Valiant [50] have constructed a concept class which, under the assumption that there exists no efficient algorithm for the factorization of Blum integers, is not efficiently PAC learnable with respect to the PEX oracle, but which Servedio and Gortler [51] have shown is efficiently quantum PAC learnable with respect to the PEX oracle. However, recall that in order to prove that a concept class is not efficiently PAC learnable, all one has to do is prove that there exists a single distribution D with respect to which the concept class is not efficiently PAC learnable. As such, it may be that a concept class is not efficiently PAC learnable, while still being efficiently PAC learnable with respect to the uniform distribution, which is the case for the factoring-based concept class of Kearns and Valiant.
If one restricts attention to PAC learnability with respect to the uniform distribution, Bshouty and Jackson [9] have shown that the concept class of s-term DNF, whose best known classical learner with PEX access requires quasi-polynomial time [52], is efficiently quantum PAC learnable if one allows the learner access to the quantum random example oracle QPEX. As such, we see that the factoring-based concept class of Kearns and Valiant fails to satisfy our requirements because it is efficiently classically PAC learnable with respect to the uniform distribution, while the concept class of s-term DNF fails to satisfy our requirements because the efficient quantum learner requires quantum random examples. Despite this, we note that Kharitonov [53,54] has given a variety of concept classes which, under various cryptographic assumptions, satisfy property (a) above, i.e. are not efficiently classically PAC learnable with respect to the uniform distribution and the random example oracle. In light of these results, we see that the truth of Conjecture 2 would at the least imply the existence of a distribution class which is not efficiently classically PAC GEN-learnable. Given these observations, we proceed to prove Theorem 3, and to discuss in more detail Conjecture 2.
In order to prove Theorem 3 we begin with a few preliminary results and observations. The first such observation follows directly from Definition 4: Query[SAMPLE(D_c)] = x||c(x) with x ← U_n, while Query[PEX(c, U)] = (x, c(x)) with x ← U_n. As such, it is clear that any algorithm given oracle access to PEX(c, U) can efficiently simulate oracle access to SAMPLE(D_c), and vice versa. Now, we assume that C is efficiently classically (quantum) learnable with respect to the uniform distribution and the PEX oracle, i.e. for all valid ε and δ there exists an efficient classical (quantum) (ε, δ, PEX, U)-PAC learner for C, which we denote A_(ε,δ). Using this, we show that for all valid ε, δ there exists an efficient classical (quantum) (ε, δ, SAMPLE, TV) PAC GEN-learner for the distribution class D_C, which we denote A'_(ε,δ). More specifically, given some valid ε, δ, when given access to SAMPLE(D_c), algorithm A'_(ε,δ) does the following:

1. Simulate A_(ε,δ), using SAMPLE(D_c) to respond to its PEX queries, and obtain some h ∈ F_n.
2. Output the generator GEN_h.

Given the above result, we now move on to a discussion of Conjecture 2. As per the previous discussion, if Conjecture 2 is true, this would allow one to use any concept class which is not efficiently classically PAC learnable with respect to the uniform distribution and random examples (such as those discussed by Kharitonov [53,54]) to obtain a distribution class which is not efficiently classically PAC GEN-learnable with respect to the SAMPLE oracle and the total variation distance. As we will see, a primary obstacle in trying to prove Conjecture 2 is the non-uniqueness of exact classical generators for a given discrete probability distribution. In fact, this difficulty clearly illustrates a key difference between the learnability of Boolean functions and the GEN-learnability of distribution classes. More specifically, given a concept class C which (up to some assumption) is provably not efficiently classically learnable (with respect to random examples drawn from the uniform distribution), a natural proof strategy for Conjecture 2 would be to obtain a contradiction by showing that if the
distribution class D_C were efficiently GEN-learnable, then the concept class C would be efficiently learnable. Similarly to the proof of Theorem 3, when given access to PEX(c, U) for some c ∈ C, a function learner A for C could easily simulate a distribution-class learner A' for D_C by using the PEX(c, U) oracle to simulate the SAMPLE(D_c) oracle. However, unlike in the proof of Theorem 3, the concept class learner A cannot simply extract a function hypothesis h from the approximate generator output by A'. To make this more precise, and to pinpoint clearly the key difficulty, we begin with the following series of observations and lemmas, which fully characterize the non-uniqueness of exact classical generators for D_c.

which would in fact be an exact solution. The key point, however, is that the learner A does not even know the permutation P. A natural question is then whether A could use GEN_(D_c,m,P) to learn P^(-1). Well, we note that

GEN_(D_c,m,P)(x) = P(x)_[1,n] || c(P(x)_[1,n])    (80)

and so, at least in the case that m = n, it is clear that one can generate a dataset of input/output pairs (x, P(x)) := (P^(-1)(y), y). Unfortunately, however, it is known that even with respect to membership queries there does not exist an efficient exact learner for the concept class of permutations [55], and so the possibility of efficiently exactly learning P^(-1) from GEN_(D_c,m,P) is ruled out. Of course, in principle it could be sufficient to learn an approximation to P^(-1) from polynomially many random examples; however, whether or not this is possible efficiently is not known. Additionally, as mentioned before, all of this is under the overly strong assumption that the generator learner is an exact learner which outputs an exact generator with m = n. As can be seen from the above discussion, lifting either of these assumptions makes the fundamental problem of defining a suitable hypothesis from the output generator significantly harder.
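The non-uniqueness at the heart of this obstacle is easy to exhibit concretely. In the toy sketch below (with m = n), two syntactically different generators, one composed with a hidden permutation P, induce exactly the same distribution D_c, yet only the first reveals c directly.

```python
import random
from collections import Counter
from itertools import product

n = 3
inputs = list(product([0, 1], repeat=n))

c = lambda x: (x[0] ^ x[1] ^ x[2],)          # a toy concept c

# The "canonical" generator for D_c: x -> x || c(x).
gen1 = lambda x: x + c(x)

# A second exact generator: first apply a hidden permutation P of {0,1}^n.
rng = random.Random(42)
shuffled = inputs[:]
rng.shuffle(shuffled)
P = dict(zip(inputs, shuffled))
gen2 = lambda x: P[x] + c(P[x])

# Over a uniformly random input both generators induce the same distribution,
# since P merely permutes the inputs; but extracting a hypothesis for c from
# gen2 requires knowing (or learning) the hidden permutation P.
dist1 = Counter(gen1(x) for x in inputs)
dist2 = Counter(gen2(x) for x in inputs)
print(dist1 == dist2)  # True
```

A GEN-learner is only required to output *some* generator for a close distribution, and it may well output one of the gen2 form, which is precisely why a function hypothesis cannot simply be read off.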
At this stage we have outlined the primary difficulty with one natural strategy for proving Conjecture 2, which provides a clear illustration of a key conceptual and technical difference between the PAC learnability of Boolean function concept classes and the generator-learnability of distribution classes. Of course, one could conceive of a variety of other strategies, based for example on alternative characterizations of efficient PAC learnability [56], Occam algorithms [57] or VC dimensions [58]; however, it is important to keep in mind the restriction to efficient learnability with respect to random examples from the uniform distribution, which makes it unclear how to immediately apply existing results involving some of the above-mentioned tools and characterizations.

Discussion and Conclusion
Given the results and insights of this work, we provide here a brief summary, as well as a perspective on interesting open questions and future directions. Firstly, to summarize, in Section 3 we have constructed a class of probability distributions, specified by classical generators, which under the DDH assumption is provably not efficiently PAC GEN-learnable by any classical algorithm, but for which we have constructed an efficient quantum GEN-learner, which only requires access to classical samples from the unknown distribution. This construction therefore provides a clear example of a generative modelling task for which quantum learners exhibit a provable quantum advantage. Despite this, there of course remain a variety of interesting open questions, for which the insights and conjectures from Section 4 may prove useful:

1. What can one say about the quantum versus classical PAC learnability (in a generative sense) of the probability distributions used for demonstrations of "quantum computational supremacy" [25][26][27]? In particular, there are a variety of distinct questions. Firstly, while it is known that there exists no efficient classical algorithm mapping randomly drawn quantum circuit parameters to samples from a distribution close to the one generated by the corresponding quantum circuit, is it possible to prove that, when given samples from such distributions (as opposed to circuit parameters), it is also not possible to efficiently classically learn a description of a suitable generator? Intuitively, this question seems closely related to the question of whether or not efficient classical verification of such distributions is possible. To this end, we note that Ref.
[44] has proven that efficiently verifying certain such distributions given classical samples is not possible. It seems plausible that the existence of an efficient classical PAC GEN-learner would imply the existence of an efficient classical black-box verification algorithm, which would then rule out the possibility of an efficient classical PAC GEN-learner. However, there are two important obstacles. Firstly, as discussed at length in Section 4.2, it is important to note that there is no unique generator for a given probability distribution. Secondly, the PAC framework places no requirement on the behaviour of GEN-learners when given access to samples from some distribution outside of the learnable distribution class. As such, while plausible, it is not clear how exactly to exploit an efficient PAC GEN-learner for the construction of an efficient black-box verification algorithm with suitable soundness guarantees, and formalizing this connection would certainly be of great interest. Secondly, in addition to proving hardness of classically learning such distributions, we would of course also like to investigate the possibility of efficient quantum learnability, and how this relates to quantum verifiability [44, 59-62]. Once again, while it is known that there exist efficient quantum generators for such distributions, this certainly does not immediately imply the existence of efficient quantum PAC QGEN-learners. As such, understanding whether or not there exist efficient quantum PAC QGEN-learners for such distributions is of natural interest, particularly in light of the potential connections between generator learnability and black-box verification.
2. As the quantum learner that we have constructed in Section 3.3 relies on the quantum algorithm for discrete logarithms [30], it is most likely the case that the quantum advantage exhibited by this learner would require the existence of a universal fault-tolerant quantum computer. Given the current availability of NISQ devices, it is of natural interest to ask whether there exists a generative modelling problem for which near-term quantum learners (and in particular QGEN-learners) can exhibit a provable advantage over classical learning algorithms. In order to answer this question it will certainly be necessary to better understand the theoretical properties of previously proposed hybrid quantum-classical NISQ algorithms for generative modelling, such as Born machines [16]. Additionally, it also seems likely that techniques for proving classical hardness results under weaker assumptions, as discussed in Section 4, would be of great help. Alternatively, it may help to focus on probability distributions which can be generated by near-term quantum devices, but not classical devices, as discussed in the previous point. It is also important to reiterate that the separation we have obtained here relies fundamentally on the known advantage quantum computers offer for computing discrete logarithms, and as such this work does not provide a new primitive for proving classical/quantum separations. Whether one can construct a quantum/classical learning separation without relying on such prior primitives is an interesting open question.
3. It is of interest to note that the efficient quantum GEN-learner that we have constructed in Section 3.3 requires only a single oracle query, and always outputs an exact generator. Such a quantum GEN-learner is certainly formally sufficient for the purpose of answering Question 1 in the affirmative, and from one perspective provides the "optimal" GEN-learner, in the sense that its query complexity is clearly optimal, and its behaviour (both run-time and output) is independent of ε and δ: for all ε and δ the algorithm returns an exact generator with certainty. However, intuitively we might think of a "learning" algorithm as one which requires multiple samples (i.e. learns from a "data-set") and most often outputs only approximate solutions, and from this perspective it is not clear to what extent the GEN-learner we have constructed can be considered a "learning algorithm". As such, from a conceptual perspective it is interesting to ask whether there exists a distribution class which provides an affirmative answer to Question 1, but for which the efficient quantum generator learner requires a non-trivial query complexity, and at best outputs only a suitably approximate generator, with sufficiently high probability. In particular, while the GEN-learner we have constructed is clearly highly specific to the distribution class we have constructed, it is possible that by considering concept classes for which always-exact, constant-query-complexity learners do not exist, one may be forced to construct or consider quantum generative modelling algorithms which are not as task-specific as the learner we have constructed here, and may also be suitable for near-term devices.
4. In this work we have considered quantum and classical GEN-learners, both of which only have access to classical samples from the unknown probability distribution, i.e. to the SAMPLE oracle. Analogously to the Boolean function setting [4], it is also interesting to ask whether there exists a distribution class for which a quantum learner (either a GEN-learner or a QGEN-learner) exhibits a quantum advantage, but only if the quantum learner has access to quantum samples from the QSAMPLE oracle. As discussed in Section 4.1, it seems likely that if Conjecture 1 is true then one could construct such a distribution class using the weak-secure pseudorandom function collection based on the Learning Parity with Noise problem. Additionally, it is also plausible that if Conjecture 2 is true, then one could modify both this construction and Theorem 3 to prove both learnability and hardness results for quantum learners with quantum samples, from corresponding results for Boolean function concept classes.

Definition 2 (Complexity of PAC Learners). The sample (time) complexity of an (ε, δ, O, D)-PAC learner A for a concept class C is the maximum number of queries made by A to the oracle O(c, D) (maximum run-time required by A), over all c ∈ C and over all internal randomness of the learner. The sample (time) complexity of an (ε, δ, O)-PAC learner A for a concept class C is the maximum number of queries made by A to the oracle O(c, D) (maximum run-time required by A), over all c ∈ C, all possible distributions D and all internal randomness of the learner. Both an (ε, δ, O, D)-PAC and an (ε, δ, O)-PAC learner for a concept class C ⊆ F_n are called efficient if their time complexity is O(poly(n, 1/δ, 1/ε)).
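As a minimal illustration of how the sample complexity in Definition 2 is accounted for, one can wrap an example oracle in a query counter; the oracle and the 3-bit majority concept below are illustrative placeholders, not objects from the text.

```python
import random

class CountingOracle:
    """Wraps a PEX-style example oracle and counts queries. The sample
    complexity of a learner is then the maximum of `queries` over all
    concepts c in C and all internal randomness of the learner."""

    def __init__(self, draw_example):
        self._draw = draw_example
        self.queries = 0

    def __call__(self):
        self.queries += 1
        return self._draw()

# Toy PEX(U, c) for a majority-vote concept on 3 bits (an assumption):
def pex():
    x = random.getrandbits(3)
    return x, int(bin(x).count("1") >= 2)

oracle = CountingOracle(pex)
examples = [oracle() for _ in range(10)]  # a learner drawing 10 samples
assert oracle.queries == 10
```

The time-complexity side of the definition would be measured analogously, by timing the learner rather than counting its oracle calls.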

Definition 6 (PAC Generator Learners). A learning algorithm A is an (ε, δ, O, d)-PAC GEN-learner (QGEN-learner) for a distribution class C if, for all D ∈ C, when given access to the oracle O(D), with probability at least 1 − δ the learner A outputs a classical (d, ε)-generator GEN_D (quantum (d, ε)-generator QGEN_D) for D.
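Definition 6 is parameterized by a distance measure d. For intuition, the sketch below checks the (d_TV, ε)-generator condition exactly for toy distributions over a small domain; the generators and target distribution are invented for illustration and are not the paper's construction.

```python
from collections import Counter

def tv_distance(p, q):
    # d_TV(p, q) = (1/2) * sum_s |p(s) - q(s)| over the joint support,
    # with p and q given as {outcome: probability} dictionaries.
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

def output_distribution(gen, seeds):
    # Exact distribution GEN(U) induced by running gen on uniform seeds.
    counts = Counter(gen(x) for x in seeds)
    return {s: c / len(seeds) for s, c in counts.items()}

seeds = range(8)
target = {s: 0.25 for s in range(4)}   # target distribution D
exact_gen = lambda x: x % 4            # a (d_TV, 0)-generator for D
lossy_gen = lambda x: min(x % 4, 2)    # collapses outcome 3 onto 2

assert tv_distance(output_distribution(exact_gen, seeds), target) == 0.0
assert tv_distance(output_distribution(lossy_gen, seeds), target) == 0.25
```

Here exact_gen satisfies the generator condition for any ε, while lossy_gen is a (d_TV, ε)-generator only for ε ≥ 1/4.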

Definition 7 (Complexity of PAC Generator Learners). The sample (time) complexity of either an (ε, δ, O, d)-PAC GEN-learner or an (ε, δ, O, d)-PAC QGEN-learner A for a distribution class C is the maximum number of queries made by A to the oracle O(D) (maximum run-time required by A), over all D ∈ C and over all internal randomness of the learner. Both an (ε, δ, O, d)-PAC GEN-learner and an (ε, δ, O, d)-PAC QGEN-learner for a distribution class C are called efficient if their time complexity is O(poly(n, 1/δ, 1/ε)). Additionally, we can define both the efficient classical PAC generator-learnability of a distribution class (Definition 8) and the efficient quantum PAC generator-learnability of a distribution class (Definition 9).

Figure 2: Illustration of equivalent notions of a classical-secure PRF collection. The left panel illustrates the setting as per Definition 14, in which we consider distinguishing algorithms which, with membership query access, try to distinguish between a randomly drawn instance of the keyed function collection F_P(k, ·) and a function R drawn uniformly at random. The right panel illustrates Definition 15, in which we consider an inference algorithm which, after a learning phase, should try to pass an "exam" of its own choosing. As per Lemma 1, for a given indexed collection of keyed functions, there exists a suitable distinguishing algorithm if and only if there exists a suitable inference algorithm.

Figure 3: Overview of the strategy for proving quantum learnability. In particular, if the classical-secure PRF collection {F_P} has the property that there exists an efficient quantum key learner, i.e. an efficient quantum algorithm A which, on input P and with access to PEX(F_P(k, ·), U_n), can learn the key k, then one can construct a GEN-learner A' for {D_(P,k)} by using oracle access to SAMPLE(D_(P,k)) to simulate the key learner A.

Definition 8 (Efficient Classical PAC Generator-Learnability of a Distribution Class). We say that a distribution class C is efficiently classically PAC GEN-learnable (QGEN-learnable) with respect to oracle O and distance measure d if for all ε > 0 and 0 < δ < 1 there exists an efficient classical (ε, δ, O, d)-PAC GEN-learner (QGEN-learner) for C.

Definition 9 (Efficient Quantum PAC Generator-Learnability of a Distribution Class). We say that a distribution class C is efficiently quantum PAC GEN-learnable (QGEN-learnable) with respect to oracle O and distance measure d if for all ε > 0 and 0 < δ < 1 there exists an efficient quantum (ε, δ, O, d)-PAC GEN-learner (QGEN-learner) for C.
GEN_{D_h}. By Lemma 4 we know that if Pr_{x←U_n}[h(x) ≠ c(x)] ≤ ε, then GEN_{D_h} is a (d_TV, ε)-generator for D_c. Therefore it follows, from the fact that A_{ε,δ} is an efficient (ε, δ, PEX, U)-PAC learner for C, that A' is an (ε, δ, SAMPLE, d_TV)-PAC GEN-learner for the distribution class D_C. Before continuing, we note that the proof of Theorem 3 relies strongly on the fact that the concept class C is learnable from random examples drawn from the uniform distribution. In particular, if the concept class C were only learnable with respect to membership queries, or random examples drawn from some other distribution, then the distribution-class learner A' could not simulate the concept-class learner A. It is this observation that motivates our restriction to the uniform-distribution-specific learnability of concept classes from random examples.
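The two reduction steps used in this proof sketch, simulating SAMPLE(D_c) with a single PEX(U, c) query and turning a function hypothesis h into the generator GEN_{D_h}, can be sketched as follows. The parity concept c is an illustrative placeholder, not a member of the hard concept class from the text.

```python
import random

n = 8

def c(x):
    # placeholder concept: parity of x (an assumption for illustration)
    return bin(x).count("1") % 2

def pex_uniform():
    # PEX(U, c): a labelled example (x, c(x)) with x drawn from U_n
    x = random.getrandbits(n)
    return x, c(x)

def simulated_sample():
    # SAMPLE(D_c) via one PEX query: emit the (n+1)-bit string x || c(x)
    x, label = pex_uniform()
    return (x << 1) | label

def gen_from_hypothesis(h):
    # GEN_{D_h}: maps a uniform seed x to x || h(x). If h errs on at most
    # an eps-fraction of inputs, Lemma 4 makes this a (d_TV, eps)-generator.
    return lambda x: (x << 1) | h(x)

gen = gen_from_hypothesis(c)  # with h = c the generator is exact
assert all(gen(x) == (x << 1) | c(x) for x in range(2 ** n))
```

Note that the simulation direction only works because the examples come from the uniform distribution: a SAMPLE(D_c) draw and a PEX(U, c) example carry exactly the same information, which is the observation emphasized above.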