Amplitude Ratios and Neural Network Quantum States

Neural Network Quantum States (NQS) represent quantum wavefunctions by artificial neural networks. Here we study the wavefunction access provided by the NQS defined in [Science, \textbf{355}, 6325, pp. 602-606 (2017)] and relate it to results from distribution testing. This leads to improved distribution testing algorithms for such NQS. It also motivates an independent definition of a wavefunction access model: the amplitude ratio access. We compare it to the sample and sample-and-query access models previously considered in the study of dequantization of quantum algorithms. First, we show that the amplitude ratio access is strictly stronger than sample access. Second, we argue that the amplitude ratio access is strictly weaker than sample-and-query access, but also show that it retains many of its simulation capabilities. Interestingly, we show this separation only under computational assumptions. Lastly, we use the connection to distribution testing algorithms to produce an NQS with just three nodes that does not encode a valid wavefunction and cannot be sampled from.

• the ability to compute expectation values of sparse observables.
We give evidence that SQ is a strictly stronger access model than AR and show that AR is a strictly stronger model than PCOND. We derive a robust version of the fidelity estimator and show how to estimate sparse observables with AR access.

NQS postselection gadgets:
We show how to postselect the Born distribution of an NQS by changing its network structure. We call such a transformation an NQS postselection gadget and use it to give an NQS with only three nodes (and polynomially-bounded weights and biases) that does not encode a valid wavefunction. As the result implies that the NQS distribution cannot be sampled from, it can be understood as a counterpart to the best known hardness of sampling result for Restricted Boltzmann Machines, which shows that a certain distribution cannot be represented (and hence sampled from) with a polynomially-sized RBM [20]. We briefly discuss other possible applications of the gadgets.

Neural Network Quantum States (NQS)
Ref. [6] proposed the NQS representation of quantum wavefunctions by neural networks with hidden nodes, largely inspired by Restricted Boltzmann Machines (RBM) [23,12]. We briefly review it here.
Let v ∈ {−1, +1}^n and define ψ_θ(v) = f_θ(v)/√(Z_θ), where:

f_θ(v) = exp(Σ_{i=1}^n a_i v_i) ∏_{j=1}^m 2cosh(b_j + Σ_{i=1}^n v_i W_{ij}),   (1)

and

Z_θ := Σ_v |f_θ(v)|².   (2)

The parameters a ∈ C^n, b ∈ C^m, W ∈ C^{n×m} are all complex-valued and fully specify the model. We denote ‖θ‖_∞ = max(‖a‖_∞, ‖b‖_∞, ‖W‖_∞) and assume that ‖θ‖_∞ ≤ poly(n). Note that Z_θ sums over all 2^n configurations v and that there is a priori no simple way to evaluate it. In contrast, the numerator f_θ(v) in Eq. 1 can be evaluated to machine precision in time polynomial in m and n. This follows by observing that each of the m factors only depends on b_j + Σ_i v_i W_{ij}, which is a sum of at most n + 1 terms. See Fig. 1.

The set of parameters θ is usually found by variational optimization [6]. This relies on sampling from the Born distribution |ψ_θ(v)|² and gradually updating the network parameters. Sampling from the Born distribution is usually done by "thermalizing" the model with a Markov Chain Monte Carlo, or by Gibbs sampling from the Born distribution. The same method is then used to sample from the Born distribution of a trained model. There is generally no guarantee that the Markov chains, during either the training or the testing phase, converge to the target distribution rapidly for a given θ. Remarkably, we show in Sec. 4.2 that there exist parameters for which no such Markov chain converges.

Figure 1: NQS is composed of hidden (h) and visible (v) nodes arranged in a bipartite graph. Every node can be in a state +1 or −1. Each edge (i, j) of the graph carries a complex-valued interaction strength W_ij and every node carries a local field a_i (visible) or b_j (hidden). The model encodes the unnormalized wavefunction as a "marginal amplitude" over the hidden nodes (Eq. 1). It can be sampled from using Gibbs sampling.
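The polynomial-time evaluation of f_θ(v) can be sketched in code. This is a minimal illustration assuming the standard parametrization above; the parameter values and seeding are my own illustrative choices, and the brute-force normalization sum is only feasible at toy sizes:

```python
import numpy as np
from itertools import product

# Minimal sketch of NQS amplitude evaluation (Eq. 1):
# f_theta(v) = exp(sum_i a_i v_i) * prod_j 2cosh(b_j + sum_i v_i W_ij).
def f_theta(v, a, b, W):
    """Unnormalized NQS amplitude; O(n*m) arithmetic operations."""
    angles = b + v @ W                     # m sums, each of at most n + 1 terms
    return np.exp(a @ v) * np.prod(2 * np.cosh(angles))

rng = np.random.default_rng(0)
n, m = 4, 3
a = rng.normal(size=n) + 1j * rng.normal(size=n)
b = rng.normal(size=m) + 1j * rng.normal(size=m)
W = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))

# The normalization Z_theta = sum_v |f_theta(v)|^2 needs 2^n terms: cheap only
# for this toy size, and a priori intractable in general.
Z = sum(abs(f_theta(np.array(v), a, b, W)) ** 2 for v in product([-1, 1], repeat=n))
print(Z > 0)
```

Note that amplitude ratios ψ_θ(i)/ψ_θ(j) = f_θ(i)/f_θ(j) never require Z, which is the observation underlying the access model studied later.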

NQS and Restricted Boltzmann Machines (RBMs):
NQS were inspired by Restricted Boltzmann Machines (RBMs) [6] and while the two models share many similarities, there are important differences. An RBM represents a distribution p_θ(v) for v ∈ {−1, +1}^n by a marginal over the Gibbs-Boltzmann distribution of an Ising model on a bipartite graph. The model is, in the simplest setting, defined by a set of real-valued weights and biases (local fields). The output distribution is represented as the thermal distribution marginalized over the hidden nodes:

p_θ(v) = (1/Z_θ) Σ_{h ∈ {−1,+1}^m} exp(Σ_i a_i v_i + Σ_j b_j h_j + Σ_{ij} v_i W_{ij} h_j).   (3)

Even though Eq. 1 and Eq. 3 look very similar, an NQS is not straightforwardly represented by a marginal distribution. The model instead implicitly defines a Born distribution:

|ψ_θ(v)|² = |f_θ(v)|² / Z_θ.   (4)

This allows for interference between the summands of the hidden-node sum inside f_θ(v). Because the network parameters can be complex-valued, such interference can be destructive, so Eq. 4 can evaluate to zero. In comparison, the marginal sum for an RBM in Eq. 3 is lower-bounded by 2^m exp(−‖a‖_∞ n), which is large for m ≫ n and small ‖a‖_∞. In such a setting, the RBM cannot faithfully represent zeros in the output distribution. We show in Sec. 4.2 that this implies that there exist parameters for which the NQS does not define a valid distribution.
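The destructive-interference argument can be checked numerically. The sketch below, with illustrative parameter values of my own choosing, shows a single hidden-node factor 2cosh(b + Wv) vanishing exactly for complex parameters while remaining strictly positive for real, RBM-like parameters:

```python
import numpy as np

# The hidden-node sum over h = +/-1 equals 2cosh(b + W*v). With complex
# parameters it can vanish exactly; with real parameters it is always > 0.
def hidden_factor(v, b, W):
    return 2 * np.cosh(b + W * v)

# NQS-style complex parameters: b + W*v = i*pi/2 at v = +1, and
# 2cosh(i*pi/2) = 2cos(pi/2) = 0 (destructive interference).
b_nqs, W_nqs = 1j * np.pi / 4, 1j * np.pi / 4
amp_plus = hidden_factor(+1, b_nqs, W_nqs)    # vanishes
amp_minus = hidden_factor(-1, b_nqs, W_nqs)   # 2cosh(0) = 2

# RBM-style real parameters: marginal factor strictly positive for both v.
b_rbm, W_rbm = 0.3, -0.7
p_plus = hidden_factor(+1, b_rbm, W_rbm)
p_minus = hidden_factor(-1, b_rbm, W_rbm)

print(abs(amp_plus) < 1e-12, np.isclose(abs(amp_minus), 2.0),
      p_plus > 0 and p_minus > 0)
```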

Approximations
Here we introduce some notions of approximation that will be used throughout the work. We also state and justify the key assumption (Assumption 1), which ensures that the notion of amplitude ratios is well-defined.
Additive approximations: Let ε ≥ 0. An ε-additive approximation R̂ to a real number R is a real number that satisfies:

|R̂ − R| ≤ ε.

Relative approximations: Let ε > 0. An ε-relative approximation R̂ to a non-negative real number R ∈ R₀⁺ is a real number that satisfies:

(1 + ε)^{−1} R ≤ R̂ ≤ (1 + ε) R.

Lemma 1. Fix 0 < ε ≤ 1. Let Q̂ and R̂ be ε-relative approximations to Q and R respectively. Then Q̂/R̂ is a 3ε-relative approximation to Q/R and Q̂R̂ is a 3ε-relative approximation to QR.
Proof. First note that:

(1 + ε)^{−2} QR ≤ Q̂R̂ ≤ (1 + ε)² QR.

Set 1 + ε' = (1 + ε)² as the degree of relative approximation to the product, from which ε' = 2ε + ε² ≤ 3ε for ε ≤ 1. This means that Q̂R̂ is a 3ε-relative approximation to QR. Moreover, given an ε-relative approximation Q̂ to Q, 1/Q̂ is an ε-relative approximation to 1/Q; combining the two observations shows that Q̂/R̂ is a 3ε-relative approximation to Q/R.
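A quick numerical sanity check of Lemma 1, with illustrative worst-case values:

```python
# If Qh, Rh are eps-relative approximations of Q, R, then Qh*Rh and Qh/Rh are
# within a (1+eps)^2 <= 1+3eps relative factor of QR and Q/R (for eps <= 1).
eps = 0.01
Q, R = 3.7, 0.52
Qh, Rh = Q * (1 + eps), R / (1 + eps)   # worst-case eps-relative approximations

for exact, approx in [(Q * R, Qh * Rh), (Q / R, Qh / Rh)]:
    ratio = approx / exact
    assert 1 / (1 + 3 * eps) <= ratio <= 1 + 3 * eps
print("ok")
```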
Complex numbers: We will require a similar notion of approximation for complex numbers. To simplify our analysis, we assume that complex numbers are stored in polar form as e^{iα} R := (α, R) for α ∈ [0, 2π) and R ∈ R₀⁺, and that the phase α is stored exactly. By an ε-relative approximation to a complex number C = e^{iα} R, we mean a number Ĉ = e^{iα̂} R̂ that satisfies:

(1 + ε)^{−1} R ≤ R̂ ≤ (1 + ε) R and α̂ = α.

Lemma 1 carries over to this notion of approximation if we assume that the result of adding two phases is always mapped back to [0, 2π).

Machine precision:
Machine precision is an upper bound on the relative error due to rounding in floating point arithmetic: numbers are represented with a finite number of bits, which leads to usually insignificant additive errors. These errors typically imply a good relative approximation. We define this more precisely now:

Lemma 2. Let R ∈ R⁺ and ε ∈ [0, 1). Assuming ε ≤ R, an (ε²/2)-additive approximation R̂ to R is also an ε-relative approximation to R.
Proof. This follows from:

R̂ ≤ R + ε²/2 ≤ R(1 + ε/2) ≤ (1 + ε) R and R̂ ≥ R − ε²/2 ≥ R(1 − ε/2) ≥ (1 + ε)^{−1} R,

where we used ε²/2 ≤ εR/2.

Throughout this work, we will make the simplifying assumption that by evaluation to machine precision we mean evaluation to ε-relative error in poly(log(1/ε)) time, and we justify it shortly. We will often require evaluation of some quantities to machine precision in poly(n, log(1/ε)) time, where n is the input size of the problem. From this, we bind the error parameter to the input size and simply assume that standard functions, such as exp or cosh, can be efficiently evaluated to 2^{−poly(n)} error for arguments with magnitude bounded by some polynomial poly(n) (similarly to Eq. 1).
The following is an important consequence of Lemma 2:

Corollary 1. If Q and R can be efficiently evaluated to machine precision, then so can 1/Q, QR and Q/R.

The key simplifying assumption
To make use of Corollary 1, we will often need the following assumption regarding wavefunctions:

Assumption 1. For every v, either |ψ(v)| ≥ 2^{−poly(n)} or ψ(v) = 0, for some sufficiently large polynomial poly(n).
This always holds for any wavefunction represented by an NQS, because for ‖θ‖_∞ ≤ poly₁(n) we have |f_θ(v)| ≥ 2^{−poly₂(n)} whenever f_θ(v) ≠ 0 for a suitable polynomial (perhaps distinct from poly₁(n)), and Z_θ ≤ 2^{poly₃(n)}. The condition is not automatically guaranteed for other families of wavefunctions, and this may become problematic for the access models that we study in the next section. The assumption also guarantees that a machine precision approximation to ψ(v) is always representable by poly(n) bits.

Amplitude Ratios
We first compare the NQS wavefunction access model to the pair-cond (PCOND) query access to distributions from Refs. [7,3]. PCOND queries are strictly more powerful than sampling, and we show that they can be simulated efficiently for any NQS distribution. This gives improved algorithms for distribution testing. It also leads to a modification of PCOND for quantum wavefunctions. We define this as amplitude ratio (AR) access and study it independently of NQS. We compare AR to the sample and query (SQ) access used in dequantization [25] and probabilistic simulation [27]. We argue that AR is a weaker access model than SQ with normalized queries, but retains many of the classical simulation techniques of SQ.
Theorem 1. Let ψ_θ be an NQS with ‖θ‖_∞ ≤ poly(n). Then the amplitude ratio ψ_θ(i)/ψ_θ(j) can be evaluated to machine precision in polynomial time for any i, j ∈ {−1, +1}^n.

Proof. Observe that ψ(i)/ψ(j) = f_θ(i)/f_θ(j), as the normalization factor cancels. The claim follows from Eq. 1 by noting that each of the m factors only depends on b_j + Σ_i v_i W_{ji}, which is a sum of at most n + 1 terms. Since we assumed that ‖θ‖_∞ ≤ poly(n) below Eq. 2, f_θ(v) can be efficiently evaluated to machine precision (Sec. 2.3). The result follows from Corollary 1.
There is an analogy between amplitude comparison and pair-cond (PCOND) oracle access used in conditional distribution testing [3].
Definition 1 (PCOND [3]). Let p be a probability distribution over Ω. The PCOND oracle accepts an input set S that is either S = Ω or S = {i, j} for some i, j ∈ Ω. It returns an element i' ∈ S with probability p(i')/p(S), where p(S) = Σ_{i'∈S} p(i').
Refs. [3,7] show that PCOND queries lead to significant complexity improvements for some distribution testing tasks; see Tab. 1.

Table 1: Query complexity of distribution testing tasks, such as "Is p_θ(v) ε-close to the uniform distribution?", with random samples versus PCOND queries [3,7].
To show that an NQS allows PCOND queries, fix Ω = {−1, 1}^n in the definition of PCOND and implement the oracle as follows:
1. On input S = Ω, output a sample from the Born distribution encoded by the NQS.
2. On input S = {i, j} for i, j ∈ Ω, compute:

r = |f_θ(i)|² / (|f_θ(i)|² + |f_θ(j)|²).   (12)

Return i with probability r and j otherwise (if both |f_θ(i)|² and |f_θ(j)|² are zero, return one of i, j uniformly at random).
The ratio in Eq. 12 is not computed exactly, but the deviation from its exact value will only become observable after exponentially many coin flips in the input size. Since we are interested in algorithms that make at most polynomially many PCOND queries, we neglect this.
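The pair-query branch of this PCOND simulation can be sketched as follows, assuming the f_θ evaluator of Sec. 2.1; the network weights below are illustrative:

```python
import numpy as np

def f_theta(v, a, b, W):
    # Unnormalized NQS amplitude (Eq. 1); real parameters here for simplicity.
    return np.exp(a @ v) * np.prod(2 * np.cosh(b + v @ W))

def pcond_pair(i, j, a, b, W, rng):
    """Return i or j with probability proportional to |f_theta|^2 (Eq. 12)."""
    pi2 = abs(f_theta(i, a, b, W)) ** 2
    pj2 = abs(f_theta(j, a, b, W)) ** 2
    if pi2 == 0 and pj2 == 0:                  # convention: uniform over {i, j}
        return i if rng.random() < 0.5 else j
    r = pi2 / (pi2 + pj2)
    return i if rng.random() < r else j

rng = np.random.default_rng(1)
n, m = 3, 2
a = rng.normal(size=n); b = rng.normal(size=m); W = rng.normal(size=(n, m))
i = np.array([+1, -1, +1]); j = np.array([-1, -1, -1])

# Empirical check against the exact conditional probability.
hits = sum(np.array_equal(pcond_pair(i, j, a, b, W, rng), i) for _ in range(20000))
pi2 = abs(f_theta(i, a, b, W)) ** 2; pj2 = abs(f_theta(j, a, b, W)) ** 2
print(abs(hits / 20000 - pi2 / (pi2 + pj2)) < 0.02)
```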
Theorem 1 enables efficient implementation of distribution testing algorithms for NQS; see Tab. 1. The algorithms test the total variation distance between two NQS Born distributions and have an exponential runtime advantage over sampling algorithms.

Procedure compare
The key algorithmic tool used in Ref. [3] is a procedure compare that uses PCOND queries for estimating probability ratios. Given that the ratio f_θ(i)/f_θ(j) is not too large or too small, the procedure compare outputs a 1/poly(n)-additive approximation to it with high probability of success using poly(n) many PCOND queries [3]. For NQS, this ratio can be efficiently computed to machine precision, which offers further simplifications of the algorithms. This motivates the definition of the amplitude ratio wavefunction access model introduced in the next section.
PCOND and RBM: PCOND can also be instantiated for RBMs. The PCOND algorithms of Ref. [3] apply with little modification to RBMs and imply significant speedups for some distribution testing tasks. The results of Ref. [3] were presented as part of a theoretical analysis in conditional property testing and do not have runtimes that would make them immediately practical. As of mid 2021, no natural and efficient instantiation of the PCOND oracle (such as the one for RBMs) was known to the authors of Ref. [2]. It is therefore possible that the algorithms could be optimized and perhaps used in practice.
Previous quantum information work on PCOND: PCOND access was studied from the quantum computing viewpoint by Sardharwalla, Strelchuk and Jozsa in Ref. [22], but the AR definition (introduced below) is new. The difference is that their quantum extension of the PCOND oracle (PQCOND) provides access to conditional probabilities associated with the underlying distribution: their oracles are defined at the level of quantum states, while AR access is always defined with respect to some fixed basis/wavefunction. The authors give a PQCOND version of the compare procedure from Ref. [3] and show that PQCOND queries yield polynomial improvements over many of the PCOND distribution testing results. They also derive results on boolean distribution testing and quantum spectrum testing. There does not seem to be an obvious connection between AR access and PQCOND, but it would be interesting to understand this better.

Amplitude Ratio (AR) Access
Because an NQS allows for the computation of amplitude ratios to high precision, it should offer stronger access to the wavefunction than PCOND. We study this as a wavefunction access model, independently of NQS.

Definition 2 (Exact AR). Let ψ : Ω → C be a wavefunction over Ω. The AR oracle accepts as an input either an ordered pair S = (i, j) with i, j ∈ Ω or S = Ω. If S = Ω, it returns a random sample from the Born distribution |ψ(v)|². If S = (i, j), it returns the ratio ψ(i)/ψ(j). If the ratio diverges, it returns a special symbol 'DIV', and it conventionally returns 1 for ψ(i) = ψ(j) = 0.
Definition 2 assumes that the queries are answered exactly. This becomes problematic if ψ(i)/ψ(j) is an irrational number, as the query result would then be infinitely long. This issue can be dealt with as follows:

Definition 3 (AR). Let ψ : Ω → C be a wavefunction over Ω subject to Assumption 1. For ε ∈ [0, 1), the AR(ε) oracle accepts as an input either an ordered pair S = (i, j) or S = Ω. If S = Ω, it returns a random sample from the Born distribution |ψ(v)|² over v ∈ {−1, 1}^n. If S = (i, j), it returns an ε-relative approximation to ψ(i)/ψ(j). If the ratio diverges, it returns 'DIV', and it conventionally returns 1 for ψ(i) = ψ(j) = 0. We write AR := AR(ε) if ε scales as 2^{−poly(n)} for some polynomial poly(n).

SQ vs AR
We now compare AR with another type of access to quantum wavefunctions: the sample and query (SQ) access. SQ was defined in Ref. [25], but a similar notion with additional computational requirements was previously used in Ref. [27]. There is a difference between the two definitions: Ref. [25] defines SQ access without normalizing the queries but subsequently assumes knowledge of the normalization factor (see for example Prop. 4.2 of [25]), while Ref. [27] assumes normalization (as well as efficiency) in the definition of computationally tractable states (Ref. [27], Def. 1). Tang's definition of SQ also does not treat the underlying object as a wavefunction, but more generally as a real-valued vector with ℓ₂-norm sampling access. This presentation puts less emphasis on the need for the normalization factor. Here we use the SQ access model assuming that the queries yield normalized amplitudes, but impose no efficiency constraints.
Definition 4 (Exact SQ). Let ψ : Ω → C be a wavefunction over Ω. The wavefunction has SQ access if ψ(i) can be computed for any i ∈ {−1, +1}^n and its Born distribution |ψ(i)|² can be sampled from.

The above definition has the same problem as exact AR (Def. 2): if the result of a query is an irrational number, it will not have bounded size. We update it as follows:

Definition 5 (SQ). Let ψ : Ω → C be a wavefunction over Ω subject to Assumption 1. For ε ∈ [0, 1), the SQ(ε) oracle returns a random sample from the Born distribution |ψ(v)|² on a sampling query, and an ε-relative approximation to ψ(i) on an amplitude query i ∈ Ω. We write SQ := SQ(ε) if ε scales as 2^{−poly(n)}.

AR and unnormalized SQ: Tang used SQ with unnormalized queries in [25], but subsequently assumed knowledge of the normalizing factor of the wavefunction throughout the work. Without knowledge of the normalization factor, the only information about the vector available through such access comes from amplitude ratios and sampling. In this form, the definition is essentially the same as AR, so why bother with AR?
The key reason for defining AR is to emphasize that the normalization factor of the wavefunction is simply unavailable, aside from its empirical estimate by sampling. It seems problematic to guarantee this (at least somewhat) rigorously: does evaluation of the state up to an "arbitrary" normalization factor always force you not to use it? That subtlety aside, one can also see AR as unnormalized SQ access and the rest of this section as a comparison between the normalized and unnormalized variants of SQ.
AR is not stronger than SQ: AR can be simulated by SQ. A sampling query is identical in both models, and an AR pair query (i, j) can be answered by two SQ amplitude queries for ψ(i) and ψ(j), followed by taking their ratio; by Lemma 1, the result is a 3ε-relative approximation to ψ(i)/ψ(j).
Evidence that AR is weaker than SQ: To show that AR access is in some sense weaker than SQ, we show a conditional separation between variants of the two models under efficiency constraints. SQ is related to the concept of computationally tractable (CT) states [27]:

Definition 6 (CT states [27]). A wavefunction ψ : Ω → C, subject to Assumption 1, is computationally tractable (CT) if both queries in Def. 5 can be implemented with a poly(n)-time randomized algorithm.
This can be seen as SQ with an efficient classical sampler and an efficient classical algorithm for the amplitude queries. The key capability of CT states is an efficient algorithm that, given two CT wavefunctions ψ(v) and φ(v), approximates ⟨ψ|φ⟩ in polynomial time to inverse-polynomial precision (see Theorem 3 and Lemma 3 in Ref. [27], or equivalently Prop. 4.8 in Ref. [25]). This enables estimation of constant-local bounded observables on CT states and simulation of sparse quantum circuits. Techniques related to the CT framework were used in dequantization algorithms in Ref. [25] and in quantum algorithm analysis in Ref. [11].
Analogously to CT states, we define amplitude ratio (AR) states and show that their fidelities and expectation values of constant-local observables can also be efficiently computed. They subsume CT states, which suggests that the CT requirement can be relaxed to AR in many applications. We show that, subject to Assumption 1, all CT states are AR states. Our evidence that SQ access is somewhat stronger than AR access follows from the fact that, for machine precision, not all AR states are CT, unless #P = FBPP. This is shown using the separation between (exact) counting and uniform sampling by Jerrum, Valiant and Vazirani (Sec. 4 of [15]).

Theorem 2 (There is an exact-AR state that is not exact-CT). The proof uses the observation that uniform sampling of solutions to a given boolean formula over n variables in disjunctive normal form (DNF) is easy, while their exact enumeration is #P-complete (Sec. 4 of Ref. [14]). We consider a quantum state that is a uniform superposition over satisfying assignments of the boolean formula and show that its Born distribution can be easily sampled from. The normalization factor of such a state counts the number of satisfying assignments of the formula, which is #P-complete to compute exactly. We show that a good relative approximation to the amplitude determines this quantity.
Given a boolean formula in DNF over n variables, interpret v ∈ {−1, +1}^n as an assignment to its variables: if v_i = +1, the i-th variable is true, and if v_i = −1, the i-th variable is false. Define the DNF formula-"state" as follows:

ψ_DNF(v) = 1/√Z if v satisfies the formula, and ψ_DNF(v) = 0 otherwise.

Notice that Z is the number of variable assignments that satisfy the formula. For any input v, the predicate |ψ_DNF(v)| > 0 can be tested by plugging the variable assignment into the formula. Any non-zero amplitude evaluates to 1/√Z, from which it is possible to compute any amplitude ratio as required by exact AR.
The Born distribution |ψ_DNF(v)|² is the uniform distribution over the satisfying assignments of the boolean formula. This distribution can be sampled from exactly by a polynomial-time randomized algorithm described in Sec. 4 of Ref. [15]. It works as follows. For a DNF boolean formula F = F₁ ∨ F₂ ∨ . . . ∨ F_m in n variables, where each F_j is a conjunction of literals, let S_j ⊆ {−1, +1}^n be the set of satisfying assignments of F_j. Note that |S_j| is easy to compute, because F_j fixes the variables appearing in it while the remaining variables can take arbitrary values. Let S = ∪_j S_j be the set of all satisfying assignments of F. The aim is to sample uniformly over S, which is achieved by Algorithm 1.

Algorithm 1 (uniform DNF sampler [15]):
for t = 1, 2, . . . do
  Pick j ∈ [m] with probability |S_j| / Σ_i |S_i| and sample a uniformly from S_j.
  Let N be the number of clauses that a satisfies.
  With probability 1/N, output a and halt.
end for
The algorithm does not halt in a given iteration with some probability; if it does not halt, the for-loop is rerun. The output of the algorithm is a uniformly random satisfying variable assignment. This implies that ψ_DNF(v) is an exact-AR state. The family of states ψ_DNF is not exact-CT unless #P = FBPP, because ψ_DNF(v) = 1/√Z for any satisfying variable assignment v. Evaluating this exactly is a #P-complete problem, because Z is the number of satisfying assignments of the boolean formula. So unless #P = FBPP, ψ_DNF(v) is an exact-AR state that is not exact-CT.
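Algorithm 1 can be sketched in code. The clause encoding (lists of (variable, sign) literals) is my own illustrative choice; the toy formula below has Z = 5 satisfying assignments:

```python
import random

def sample_satisfying(clauses, n, rng):
    """Uniform sample over assignments satisfying an OR of conjunctions [15]."""
    sizes = [2 ** (n - len(c)) for c in clauses]       # |S_j| is easy to compute
    while True:
        # pick clause j with probability |S_j| / sum_i |S_i|
        j = rng.choices(range(len(clauses)), weights=sizes)[0]
        v = {var: val for var, val in clauses[j]}      # literals of F_j fixed
        for var in range(n):                           # free variables: uniform
            v.setdefault(var, rng.choice([-1, +1]))
        # N = number of clauses that the assignment satisfies
        N = sum(all(v[var] == val for var, val in c) for c in clauses)
        if rng.random() < 1.0 / N:                     # accept with probability 1/N
            return tuple(v[var] for var in range(n))

# F = (x0 AND x1) OR (NOT x2) over n = 3 variables: Z = 2 + 4 - 1 = 5.
clauses = [[(0, +1), (1, +1)], [(2, -1)]]
rng = random.Random(0)
counts = {}
for _ in range(50000):
    s = sample_satisfying(clauses, 3, rng)
    counts[s] = counts.get(s, 0) + 1
print(len(counts), abs(max(counts.values()) / 50000 - 0.2) < 0.02)
```

Each satisfying assignment is output with per-iteration probability 1/Σ_i |S_i|, so the accepted samples are exactly uniform over S.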
It remains to show that ψ_DNF(v) is not CT. To do this, we show that CT access can compute Z to sufficient accuracy to determine it exactly. First note that Z ≤ 2^n. We want to choose ε so that ε-relative approximations to Z and Z + 1 can be distinguished for any 0 ≤ Z ≤ 2^n. This happens if (1 + ε)Z < (1 + ε)^{−1}(Z + 1). Since ε < 1, we have (1 + ε)² < 1 + 3ε, and hence (1 + 3ε)Z < Z + 1 implies (1 + ε)²Z < Z + 1. As 3εZ ≤ 3ε·2^n, we can choose ε ≤ 2^{−n−2} to recover Z exactly. It follows that if ψ_DNF(v) were CT, Z could be computed exactly by a polynomial-time randomized algorithm (FBPP). This problem is however #P-complete, so this gives a contradiction unless #P = FBPP.
We remark that the above argument does not work for ε = 1/poly(n). The reason is that there is a polynomial-time randomized algorithm for approximating the number of DNF satisfying assignments to ε relative error that outputs the solution with high probability in poly(n, 1/ε) time [17]. This is a worse approximation than the one assumed in the definition of AR states, which shows that one has to use a different argument to separate CT(ε) from AR(ε) for ε scaling as 1/poly(n). A similar argument leads to a potential separation of SQ and AR in terms of their query complexity:

Lemma 5. Given SQ access to ψ_DNF from the previous theorem, a single query suffices to approximate Z to machine precision.
Proof (sketch). Given ψ_DNF and SQ access to it, the algorithm draws a sample v from |ψ_DNF(v)|² and queries the amplitude ψ_DNF(v) = 1/√Z, which gives a machine precision approximation to Z.

Conjecture 1. Z for ψ_DNF can be query-efficiently estimated with AR access to no better than 1/poly(n) relative error.
The reasoning behind the conjecture is the following. Given AR access to ψ_DNF, any amplitude ratio query on a pair of nonzero amplitudes evaluates to 1. Any amplitude ratio query on a pair of a zero and a nonzero amplitude gives either 0 or 'DIV'. By finding some v with zero amplitude and a single w with a nonzero amplitude, AR can detect whether ψ_DNF(z) > 0 for any z ∈ {−1, +1}^n. Possibly the best way to approximate Z with this access are variants of importance/nested sampling (see for example [17,24,8,13]), which at best lead to a poly(n)-sample algorithm that estimates Z to 1/poly(n) relative error. It would be extremely surprising if these methods were not asymptotically optimal. I don't have a proof though.

Theorem 3. Assuming Conjecture 1, there is a task that requires just one SQ query, but at least poly(n)-many AR queries.
AR is stronger than sample access: This is shown by noting that AR access implies PCOND access and then referencing the known results from conditional distribution testing that separate PCOND from sample access.
AR access is stronger than PCOND access, and PCOND is strictly stronger than sample access (Tab. 1). It is worth noting that the algorithms of [2] work even if the AR ratios can be approximated only to 1/poly(n) additive error. It may therefore be interesting to study weaker variants of AR.

Lemma 7. AR is stronger than PCOND.
Proof (sketch). AR can decide whether |ψ(i)/ψ(j)|² ≤ 2^{−n} in a single query, while the same task requires exponentially many PCOND queries. This is because the PCOND model can estimate the ratio only by sampling, which is limited by the usual concentration bounds; these imply a lower bound of exponentially many queries.
One can object that the above comparison is rather unfair, because AR computes the amplitude ratio to a high degree of precision with a single query, while PCOND can only do so query-efficiently to 1/poly(n) additive error. We strengthen the above lemma to a separation from a version of high-precision PCOND that can query for exact ratios of Born probabilities:

Lemma 8. Consider two states ψ and φ with identical uniform Born distributions that differ only in their phases. Quantities such as ψ(i)φ(j)/(ψ(j)φ(i)), which distinguish the states, can be computed with 2 AR queries. The states have the same, uniform, Born distributions, so all PCOND ratio queries evaluate to 1. The PCOND access model therefore cannot evaluate the overlap of ψ and φ.
Summary: The results of Sec. 3.3 can be summarized as

sample ≺ PCOND ≺ AR ≺* SQ,

since PCOND access is separated from sampling access to the Born distribution of a wavefunction by Tab. 1, AR is separated from PCOND access by Lemma 8, and SQ is almost surely (hence the star) separated from AR by Theorems 2 and 3. By separation, we mean that there exists at least one problem that can be solved query-efficiently in one of the models, but not in the other.

AR and probabilistic simulation
We now show that AR states retain many simulation capabilities of CT states. Most of the results follow from standard algorithms used with NQS that implicitly use AR access. We improve some of them, for example by a robustification of the AR fidelity estimator, to give the closest possible analogues of the previous results for CT states [27].
AR fidelity estimator: Given two amplitude ratio (AR) states ψ and φ, there is a randomized algorithm that approximates their fidelity |⟨ψ|φ⟩|² in polynomial time to inverse-polynomial precision. Medvidovic and Carleo use the following estimator for NQS in Ref. [21]:

|⟨ψ|φ⟩|² = E_{x∼|φ(x)|², y∼|ψ(y)|²} [ G(x, y) ], where G(x, y) := (ψ(x)/ψ(y)) (φ(y)/φ(x)).   (21)

Every term in the summation uses two AR ratio queries, which can be seen by grouping the factors of G(x, y) as the AR ratio on ψ for (x, y) times the AR ratio on φ for (y, x). This can be operationally understood as follows: sample (x, y) ∼ |φ(x)|² |ψ(y)|² (which is a valid product distribution) and compute the product of the two AR ratios. We now show that the estimator has finite variance and give a robust version of it with fast concentration around the mean. Notice that:

E[G] = Σ_{x,y} |φ(x)|² |ψ(y)|² (ψ(x)/ψ(y)) (φ(y)/φ(x)) = ⟨φ|ψ⟩⟨ψ|φ⟩ = |⟨ψ|φ⟩|²,

and that E[|G|²] = Σ_{x,y} |ψ(x)|² |φ(y)|² = 1. From this, the variance becomes:

σ²[G] = E[|G|²] − |E[G]|² = 1 − |⟨ψ|φ⟩|⁴ ≤ 1.

While the main utility of G(x, y) is a simplification of the variance analysis, it could also allow for additional cancellation between f_ψ(x) and f_φ(y) that could not be exploited in the product-of-means estimator of Ref. [21].
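The estimator can be sketched as a Monte Carlo procedure. The toy states below are explicit vectors, so exact sampling and the exact fidelity are available for comparison; a genuine AR oracle would expose only samples and ratio queries:

```python
import numpy as np

# Monte Carlo sketch of the AR fidelity estimator (Eq. 21): sample
# (x, y) ~ |phi(x)|^2 |psi(y)|^2 and average G(x, y) = (psi(x)/phi(x))(phi(y)/psi(y)).
rng = np.random.default_rng(2)
dim = 8
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi); phi /= np.linalg.norm(phi)

exact = abs(np.vdot(psi, phi)) ** 2

N = 200000
xs = rng.choice(dim, size=N, p=abs(phi) ** 2)   # x ~ |phi|^2
ys = rng.choice(dim, size=N, p=abs(psi) ** 2)   # y ~ |psi|^2
G = (psi[xs] / phi[xs]) * (phi[ys] / psi[ys])   # two AR ratio queries per term
est = G.mean().real                             # E[G] = |<psi|phi>|^2

print(abs(est - exact) < 0.02)
```

Since σ²[G] ≤ 1, the standard error after N samples is at most 1/√N, consistent with the tolerance checked above.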

Robust AR Fidelity Estimator:
Despite the fact that the estimator has finite variance, the random variable G(x, y) is unbounded: it contains a ratio of wavefunctions evaluated at two distinct points and may explode if either f_ψ(y) or f_φ(x) is close to zero. If we are unlucky enough to obtain such an outlier in our empirical estimation, it can significantly skew the statistics. It is therefore desirable to use an estimator that is less affected by outliers; such estimators are often called robust. We show how to make the above estimator robust using median-of-means amplification.
Theorem 4 (Median of means estimator). Let k, ℓ be two integers and ε > 0. Define the empirical mean Ḡ as Ḡ = Σ_{i=1}^k G_i / k. Compute ℓ such empirical means and use their median G̃ as the estimator. Then, provided k ≥ 4σ²[G]/ε²:

Pr[ |G̃ − E[G]| ≥ ε ] ≤ e^{−ℓ/8}.   (24)

Proof. We have σ²(Ḡ) = σ²[G]/k. By the Chebyshev inequality:

Pr[ |Ḡ − E[G]| ≥ ε ] ≤ σ²[G]/(kε²) ≤ 1/4.   (25)

The median-of-means condition in Eq. 24 is violated only if the majority of the empirical means violate the Chebyshev condition in Eq. 25. The probability of this happening is at most:

Pr[ Bin(ℓ, 1/4) ≥ ℓ/2 ],

where the inequality follows by monotonicity. We can bound this by e^{−ℓ/8}, where we used a tail bound on the binomial distribution.
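A standalone sketch of the median-of-means amplification of Theorem 4, on an illustrative heavy-tailed distribution (mean 1, variance 99) standing in for G(x, y):

```python
import random, statistics

def median_of_means(draw, k, l, rng):
    """l empirical means of k samples each, combined by taking their median."""
    means = [sum(draw(rng) for _ in range(k)) / k for _ in range(l)]
    return statistics.median(means)

def draw(rng):
    # Heavy-tailed toy distribution: E = 1, sigma^2 = 99.
    return 100.0 if rng.random() < 0.01 else 0.0

rng = random.Random(3)
eps = 2.0
k = int(4 * 99 / eps ** 2)                      # k >= 4 sigma^2 / eps^2
est = median_of_means(draw, k, l=40, rng=rng)   # fails w.p. <= e^{-40/8}
print(abs(est - 1.0) <= eps)
```

A plain mean over the same budget would be dragged around by the rare large outliers; the median of means suppresses their influence exponentially in ℓ.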
This technique is commonly used in computer science (see e.g. [15] or a recent review [18]) and was previously used with SQ access/CT states by Tang [25], where it was presented as a standard technique. An alternative way to make the SQ estimators robust was used by Van den Nest in Ref. [27], but it does not work for AR. A corollary of Theorem 4 is a polynomial-time robust estimator of the overlap:

Corollary 2 (Robust AR fidelity estimator). There is an algorithm that estimates |⟨ψ|φ⟩|² to ε-additive error with probability 1 − e^{−n} using 64n/ε² AR queries.
Proof. Use the median of means estimator in Theorem 4. One evaluation of the random variable G(x, y) in Eq. 21 costs two AR queries. Use the median of ℓ empirical means of G(x, y) (each over k evaluations of the random variable) as the estimator of the overlap. Setting k = 4/ε² and ℓ = 8n in Theorem 4 gives:

Pr[ |G̃ − |⟨ψ|φ⟩|²| ≥ ε ] ≤ e^{−n},

which follows from σ²[G] ≤ 1, so that k = 4/ε² ≥ 4σ²[G]/ε². The overall number of AR queries that achieves this is at most 2kℓ = 64n/ε².
We briefly compare this estimator to the CT estimator of Ref. [27]. The algorithms achieve the same goal and their asymptotic query complexities in ε and n are the same. The above fidelity estimator, however, does not require computation of the amplitudes but only of amplitude ratios, which is a computationally easier problem, as argued in Theorem 2.
Estimating Sparse Observables: Given AR access, there is an algorithm for estimating the expectation value of a (hermitian) observable O expanded in the same basis as the wavefunction. This algorithm is well known in computational physics as local observable estimation:

⟨ψ|O|ψ⟩ = Σ_j |ψ(j)|² X(j) = E_{j∼|ψ(j)|²}[X], where X(j) := Σ_k O_{jk} ψ(k)/ψ(j).

This can be interpreted as sampling j from the Born distribution |ψ(j)|² and querying the AR ratio ψ(k)/ψ(j) for every k that appears in the j-th row of O. This means that if O has at most poly(n) non-zero entries in each row, we can compute this estimator in polynomial time. As previously, we bound the variance of this estimator. We have that:

E[X] = ⟨ψ|O|ψ⟩ and E[|X|²] = Σ_j |Σ_k O_{jk} ψ(k)|² = ⟨ψ|O²|ψ⟩.

We have that ⟨ψ|O²|ψ⟩ = Σ_λ λ² |⟨λ|ψ⟩|² ≤ λ²_max, where λ_max is the largest eigenvalue of O in magnitude. For hermitian matrices, this coincides with the operator norm ‖O‖ = λ_max, from which we have that ⟨ψ|O²|ψ⟩ ≤ ‖O‖². It follows that σ²[X] ≤ ‖O‖².
Theorem 5 (Estimating Sparse Observables with AR). There is an algorithm that estimates ⟨ψ|O|ψ⟩ for ‖O‖ ≤ 1 to ε-additive error with probability at least 1 − e^{−n} using at most 32sn/ε² AR queries, where s is the row-sparsity of O.
Proof. Use the median of means estimator in Theorem 4. One evaluation of the random variable X costs at most s AR queries, where s is the row sparsity of O. The rest follows as in Corollary 2.
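The local-observable estimator can be sketched as follows. The toy state and observable are illustrative (the observable is dense, so its row sparsity is s = dim); normalizing O to unit operator norm matches the assumption of Theorem 5:

```python
import numpy as np

# Sketch of local observable estimation: sample j ~ |psi(j)|^2 and evaluate
# X(j) = sum_k O_jk psi(k)/psi(j), one AR ratio query per nonzero row entry.
rng = np.random.default_rng(4)
dim = 8
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

O = rng.normal(size=(dim, dim))
O = (O + O.T) / 2                 # hermitian (real symmetric) observable
O /= np.linalg.norm(O, 2)         # enforce ||O|| <= 1 as in Theorem 5

exact = np.vdot(psi, O @ psi).real

N = 100000
js = rng.choice(dim, size=N, p=abs(psi) ** 2)             # j ~ |psi(j)|^2
X = (O[js] * (psi[None, :] / psi[js, None])).sum(axis=1)  # AR ratio queries
est = X.mean().real

print(abs(est - exact) < 0.05)
```

The variance bound σ²[X] ≤ ‖O‖² ≤ 1 makes the N^{-1/2} standard error of the sample mean small compared to the tolerance checked above.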

AR and dequantization?
SQ access was studied in dequantization, and it is natural to ask if AR can lead to some improvements in that framework. We outline mostly negative results on improving the algorithms of Ref. [25] using AR.
Ref. [25], Proposition 4.2 gives an algorithm for estimating the inner product ⟨x, y⟩ of two real vectors x and y, using knowledge of the normalization constant of one of the vectors, with an error that depends on the normalization factors of both. Assuming ⟨x, y⟩ ≥ 0 (i.e. for non-negative vectors), the AR fidelity algorithm gives a good estimator of the inner product without knowledge of the normalizing factors. Proposition 4.3 of Ref. [25] crucially depends on computing a rejection sampling filter. This filter can be computed with O(k²) AR ratio queries to V, assuming that the entire matrix has been normalized with the same (possibly unknown) normalization factor. This assumption is most likely too strong to be useful in the context of Tang's algorithm.

A remark on the sparse-observable estimator above: the cancellation of |ψ(j)|² is problematic for all values for which ψ(j) = 0, because the random variable becomes unbounded on values outside of the support of |ψ(j)|². The expectation value would then depend on the values that the observable O_{jk} takes on samples outside of the support of |ψ(j)|², which is undesirable. Let Ω be the domain of |ψ(j)|² and Σ := {j ∈ Ω : |ψ(j)|² > 0} be its support. To alleviate this issue, it is more natural to define:

⟨ψ|O|ψ⟩ = Σ_{j∈Σ} |ψ(j)|² Σ_k O_{jk} ψ(k)/ψ(j),

such that the cancellation of |ψ(j)|² is well-defined. Irrespective of the definition of E[|X|²], the variance bound σ²[X] ≤ ‖O‖² holds. I want to thank Giuseppe Carleo for pointing out this subtlety.
Lastly, the modified version of the FKV algorithm [9] (Algorithm 2 in [25]) crucially relies on sampling from the distribution induced by the norms of the matrix columns (normalization factors). There does not seem to be a simple way to circumvent this with AR access. It would be interesting to see whether some of these limitations can be avoided in other dequantization algorithms.

Postselection Gadgets
Here we explore a different way of accessing the wavefunction that can be implemented with an NQS: postselection gadgets. Postselection gadgets are maps between NQSs that allow for a different set of conditional queries to the Born distribution encoded in the NQS, called subcube conditional queries in Ref. [4]. In contrast to PCOND, postselection gadgets cannot be instantiated efficiently for an arbitrary NQS. We use this property to show that there is an NQS with just three nodes that does not encode a valid wavefunction and cannot be sampled from, a counterpart to similar results for RBMs [19, 20]. It is possible that the gadgets have applications beyond this, but the analysis seems to be beyond the reach of the techniques known to the author.

Postselection Gadgets
Let |ψ_θ(v|r)|² := p_θ(v|r), where v ∈ {−1, 1}ⁿ and r ∈ {−1, 1, ∗}ⁿ, be the distribution |ψ_θ(v)|² := p_θ(v) conditioned on the event that, for every non-∗ entry of r, the corresponding bit of v is fixed to that bit of r. A postselection gadget is a function that transforms an NQS θ into another NQS θ̃ (possibly with additional nodes) such that |ψ_θ(v|r)|² = |ψ_θ̃(v)|².

Theorem 6 (NQS postselection gadget). For every NQS θ and every r ∈ {−1, 1, ∗}ⁿ, there is an NQS θ̃ with one additional hidden node per non-∗ entry of r such that |ψ_θ̃(v)|² = |ψ_θ(v|r)|².

Proof. The following proof uses the notation of Sec. 2.1. Assume without loss of generality that the non-∗ entries of r are r₁, ..., r_k. For every such entry, introduce a hidden node g and attach it to the corresponding visible node. Set the bias on this hidden node to iπ/4 and couple it to the visible node with strength −iπ/4 r_i (see Fig. 2). This gives:

f_θ̃(v) = f_θ(v) ∏_{i=1}^{k} 2 cosh((iπ/4)(1 − r_i v_i)).   (41)

Let F be the event¹² that v₁ = r₁, v₂ = r₂, ..., v_k = r_k. Because 2 cosh(iπ/2) = 0, we have that:

f_θ̃(v) = 2^k f_θ(v) for v ∈ F, and f_θ̃(v) = 0 otherwise.   (42)

Let E ⊆ {−1, +1}ⁿ be an arbitrary event. We have that:

p_θ̃(E) = p_θ̃(E|F) p_θ̃(F) + p_θ̃(E|F̄) p_θ̃(F̄).   (43)

From Eq. 42, it follows that:

p_θ̃(F) = 1 and p_θ̃(E|F) = p_θ(E|F).   (44)

Hence, from Eq. 43, p_θ̃(E) = p_θ(E|F). Thus, the output distribution of the augmented state is exactly the conditional of the original distribution. See Fig. 2. Note that the function θ → θ̃ is easy to compute.

¹² An event F ⊆ Ω is a subset of the domain Ω of the probability distribution.

Postselection gadgets allow for sampling from a subset of conditional distributions of the distribution encoded in the NQS. Additional examples of postselection gadgets can be found in the appendix. The construction extends almost trivially to Deep Boltzmann Machines [10, 5]. While the analysis was inspired by its RBM counterpart by Long and Servedio in Ref. [19], the gadget construction here does not straightforwardly extend to RBMs.
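As a sanity check, the gadget can be verified numerically on a small complex RBM by brute-force enumeration. The following sketch uses arbitrary random parameters (not taken from the paper), pins the first visible node to +1 with one extra hidden node, and compares the augmented Born distribution with the conditional of the original one:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2  # visible and hidden nodes of a toy complex RBM (NQS)

# Arbitrary random complex parameters (illustrative only).
a = rng.normal(size=n) + 1j * rng.normal(size=n)
b = rng.normal(size=m) + 1j * rng.normal(size=m)
W = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))

def amplitude(v, a, b, W):
    # Unnormalized NQS amplitude f(v) = exp(a.v) * prod_j 2 cosh(b_j + (Wv)_j).
    v = np.asarray(v, dtype=float)
    return np.exp(a @ v) * np.prod(2 * np.cosh(b + W @ v))

def born(a, b, W):
    # Brute-force Born distribution over all 2^n configurations.
    vs = list(itertools.product([-1, 1], repeat=n))
    f = np.array([amplitude(v, a, b, W) for v in vs])
    pr = np.abs(f) ** 2
    return vs, pr / pr.sum()

# Gadget node pinning the first visible node to r = +1:
# bias i*pi/4 and coupling -i*pi/4 * r to that node.
r = 1
b_g = np.append(b, 1j * np.pi / 4)
W_g = np.vstack([W, np.zeros(n)])
W_g[m, 0] = -1j * np.pi / 4 * r

vs, p = born(a, b, W)
_, p_g = born(a, b_g, W_g)

# The augmented distribution equals the original conditioned on the pinned bit.
mask = np.array([v[0] == r for v in vs])
cond = np.where(mask, p, 0.0)
cond /= cond.sum()
assert np.allclose(p_g, cond)
```

The key step is that the gadget factor 2 cosh((iπ/4)(1 − r v)) equals 2 when v = r and 2 cosh(iπ/2) = 0 otherwise, which zeroes out all amplitudes outside the postselected subcube.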

Not all NQS encode valid quantum states
We show that many NQS do not encode valid quantum states. This follows from the fact that the postselection gadget allows postselection on probability-zero events. Any NQS can be modified by adding two hidden nodes g₁ and g₂, as in Sec. 4, that fix the value of the visible node v₁ to 1 and −1 respectively. Eq. 41 then gives f_θ̃(v) = 0, which means that the encoded "wavefunction" is identically zero. The resulting NQS then does not encode any wavefunction because it cannot be normalized. The smallest NQS for which this works is one with two hidden nodes and one visible node. Such an NQS naturally cannot be sampled from by any algorithm.

Figure 3: An NQS that does not encode a valid quantum state can be constructed by appending two auxiliary nodes, each of which fixes the value of v₁ to +1 and −1 respectively.
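The three-node construction can be checked directly; in the following brief sketch the two gadget hidden nodes force the single-visible-node amplitude to vanish (up to floating-point error) for both values of the visible node:

```python
import numpy as np

def nqs_amplitude(v, b, w):
    # NQS with a single visible node v in {-1, +1}:
    # product of hidden-node factors 2 cosh(b_j + w_j * v).
    return np.prod(2 * np.cosh(np.asarray(b) + np.asarray(w) * v))

# Two gadget hidden nodes: one pins v to +1, the other pins v to -1.
b = [1j * np.pi / 4, 1j * np.pi / 4]
w = [-1j * np.pi / 4 * (+1), -1j * np.pi / 4 * (-1)]

# For either value of v, one factor equals 2*cosh(i*pi/2) = 0,
# so the encoded "wavefunction" is identically zero and cannot be normalized.
for v in (-1, +1):
    assert abs(nqs_amplitude(v, b, w)) < 1e-12
```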
There does not seem to be an analogous simple construction for restricted Boltzmann machines with real-valued coefficients. The reason, as shown earlier, is that RBMs cannot encode zeroes in the output probability: each outcome probability is lower-bounded by 2^m exp(−‖a‖_∞ n), and the contribution of any "gadget" hidden node to the output RBM probability is a factor of cosh(x) for some real x, which is lower-bounded by 1. We remark that the existence of such "zero-valued" NQS is, however, not significant for applications in which the network is trained by sampling from the output distributions of a sequence of NQSs. Some care must, however, be taken when encoding quantum states directly.

Other applications?
The postselection gadget can in principle be used for postselection without retries. However, even when the original NQS θ can be sampled from easily, it may well be (and in some cases is very likely) that the Gibbs sampling algorithm for the modified NQS θ̃ is no longer efficient. Still, one toy example suggesting that gadgets can beat resampling (albeit a practically useless one) is the case in which the encoded distribution is a product distribution: fixing output bits of such a distribution using gadgets does not slow down sampling at all.
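The product-distribution case is easy to see concretely: pinning a bit of a product distribution simply removes it from the sampler, with no rejections and no extra mixing time. A trivial sketch with hypothetical bit probabilities:

```python
import numpy as np

rng = np.random.default_rng(2)

# Product Born distribution over n spins: P(v_i = +1) = q_i independently.
n = 5
q = rng.uniform(0.2, 0.8, size=n)

def sample(fixed=None):
    """Sample v in {-1,+1}^n; bits in `fixed` are pinned, as a gadget would do."""
    fixed = fixed or {}
    v = np.where(rng.random(n) < q, 1, -1)
    for i, r in fixed.items():
        v[i] = r  # conditioning a product distribution only pins the bit
    return v

v = sample(fixed={0: -1, 3: +1})
assert v[0] == -1 and v[3] == +1
```

By contrast, postselection by resampling would discard a fraction of samples, and for general (non-product) NQSs the pinned Gibbs chain may mix slowly.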
In the case where the sampling algorithm remains efficient for the appropriate sequence of conditionals, one can also obtain a very crude multiplicative approximation of the normalizing factor of the NQS wavefunction, essentially by retracing the RBM algorithm presented in Ref. [19]. The question of characterizing which conditional gadgets allow for efficient sampling, however, remains wide open.
Lastly, Ref. [16] used a similar gadget construction to simulate universal random circuits with NQSs. The postselection gadgets of Thm. 6 can be seen as an extension of their result. See Appendix A for additional postselection gadgets and Appendix B for the encoding of Pauli gates.

Discussion
We studied the access model offered by neural network quantum states (NQS), which, together with connections to previous results in conditional distribution testing, motivated the definition of the amplitude ratio (AR) access model. We related AR to sample and query (SQ) access and showed that it retains some of the simulation capabilities of SQ. We gave some evidence that AR may be weaker than SQ and showed that existing results in distribution testing imply that AR is stronger than sample access. We then considered alternative access to NQS wavefunctions by means of subcube conditional queries and showed that even small NQS may not encode valid distributions. Our work leaves several questions open:

• It would be interesting to further explore the connections between AR and dequantization, and to understand whether the SQ normalization requirement can be meaningfully relaxed.
• Both definitions of CT and AR states assumed the existence of a classical randomized sampler for the Born distribution. One may thus ask if there are any nontrivial states in the quantum-classical generalization of CT and AR states, where we can sample efficiently using a quantum algorithm but still classically estimate the ratios or amplitudes. Such states may be useful for constructing quantum algorithms based on the conditional property testing results of Refs. [3, 7]. It might, for example, be interesting to understand how this interacts with some of the known supremacy results in which approximating a target amplitude is known to be #P-hard, yet there is an efficient quantum algorithm that samples the output [1]. It may be that, while the amplitudes are hard to compute, approximating their ratios remains tractable.
• It would also be interesting to understand, perhaps numerically, whether and for which problems the NQS postselection gadgets provide an advantage over postselection by resampling in some of the applications of NQS.

Acknowledgements
I want to sincerely thank Ashley Montanaro and Noah Linden for their help. I also want to thank Giuseppe Carleo, Srini Arunachalam, Sergii Strelchuk, James Stokes and Juani Bermejo-Vega for their suggestions and discussions.