Empirical Sample Complexity of Neural Network Mixed State Reconstruction

Quantum state reconstruction using Neural Quantum States has been proposed as a viable tool to reduce quantum shot complexity in practical applications, and its advantage over competing techniques has been shown in numerical experiments focusing mainly on the noiseless case. In this work, we numerically investigate the performance of different quantum state reconstruction techniques for mixed states, focusing on the finite-temperature transverse-field Ising model. We show how to systematically reduce the quantum resource requirement of the algorithms by applying variance reduction techniques. Then, we compare the two leading neural quantum state encodings, namely the Neural Density Operator and the positive operator-valued measurement representation, and illustrate their different performance as the mixedness of the target state varies. We find that certain encodings are more efficient in different regimes of mixedness and point out the need for designing more efficient encodings in terms of both classical and quantum resources.


Introduction
Recent advances in quantum technologies [1] have led to diversified applications in areas such as quantum simulation [2], communication [3], cryptography [4] and machine learning [5]. However, present-day noisy intermediate-scale quantum (NISQ) devices are inherently noisy [6] and limited in size. Techniques to mitigate noise [7], reduce quantum circuit depth [8], minimize the number of quantum shots, and verify experimentally prepared states [9] are crucial to leveraging these devices in realistic applications.
One such technique, quantum state reconstruction (or tomography) [10], uses a limited number of measurements to produce a classical approximation ρ̂ of the quantum state ρ prepared on a device. The classical reconstruction facilitates the efficient computation of many observables without further quantum circuit evaluations and serves to validate the quantum device [9]. Classical reconstructions, which vary in computational overhead, are all derived from datasets of measurement outcomes collected by repeatedly measuring a set of observables on the state ρ prepared on the device. The accuracy of the reconstruction can be quantified via several reconstruction errors ϵ, such as the difference in expectation values between the reconstruction and the original state, or other distance measures such as the infidelity or trace distance between ρ̂ and ρ. An important indicator of the asymptotic performance of such methods is the sample complexity: the size of the quantum-generated dataset needed to obtain a classical reconstruction with a certain error ϵ.
Recent research has established that tomography methods targeting generic quantum states, such as maximum likelihood estimation [11,12], necessitate a sample size that grows exponentially with the system size [13,14,15]. One way to circumvent this is to design randomized measurement protocols that only estimate expectation values of certain observables (e.g., classical shadow tomography [16,17,18,19]). Alternatively, one can exploit the fact that only a small set of possible states, with low complexity, are actually observed in physical models [20,21,22], and design more efficient encodings of those physically realizable states to alleviate the data requirement, similar to the variational methods commonly adopted to simulate quantum systems classically. For example, one can use matrix product states to efficiently encode and reconstruct one-dimensional pure states with area-law entanglement [23,24,25]. For more general states, generative neural networks (NNs) can be used as a variational ansatz trained to reproduce the measurement data [26,27,28,29,30,31,32]. These ansatze, called Neural Quantum States (NQS), encode the quantum state in a compact form, significantly reducing the data requirement for reconstruction. The cost associated with this approach is the introduction of a non-convex optimization problem, which established techniques in modern artificial intelligence can approximately solve [33,34].
Designing efficient NQS encodings for general mixed states presents a significant challenge. Three broad categories exist: (i) an expansion onto a set of pure states encoded with standard NQS, which is exponentially costly for states with large entropy [35]; (ii) a physical (positive-semidefinite) neural-network encoding of the mixed state in the Pauli-Z computational basis, known as the Neural Density Operator (NDO) [32,36]; and (iii) a neural-network encoding of the probability distribution over the outcomes of a set of informationally complete positive operator-valued measurements (POVM-NQS). This last approach is less costly than (ii), but may result in unphysical density matrices [28,31,30,37]. While the approximation used in the first approach is well understood, the relationship between the latter two methods remains unclear. Furthermore, no comparison has been made thus far regarding their effectiveness in quantum state reconstruction tasks. In particular, whether these two methods share the same sample complexity is uncertain.
While the dependence on system size has been extensively studied for pure states (e.g., [38], Fig. 7), few comparisons have been made regarding the dependence on the reconstruction error ϵ. This ϵ dependence can be especially significant for NISQ algorithms such as variational quantum eigensolvers (VQEs) [39,8], as a worse scaling leads to a large increase in the number of quantum executions needed to achieve the same accuracy. For classical shadow tomography, the sample complexity is known to scale as ϵ−2, which does not yield an asymptotic improvement over naive statistical averaging [17,40]. More recently, numerical evidence suggests that the pure-state NQS method holds an approximately quadratic advantage (i.e., ϵ−1) over classical shadows in energy estimation for certain molecular ground states [41]. However, it is unknown if this advantage persists when the target state is mixed.
In this work, we first improve the reconstruction algorithm by considerably reducing its classical computational overhead using the Control-Variates variance reduction technique (section 3).
Then, we conduct comprehensive numerical simulations to investigate the sample complexity of mixed-state reconstruction for different NQS encodings. In particular, we benchmark NDO and POVM-NQS on reconstructing the finite-temperature density matrix of the transverse-field Ising model. We numerically demonstrate that for NDO, the quadratic advantage observed in pure-state reconstruction survives only when the state is slightly mixed, and the scaling deteriorates when the state is highly mixed. On the other hand, POVM-NQS does not hold such an advantage even for pure states and has a scaling similar to classical shadows, independent of how mixed the target state is. Consequently, NDO performs better than POVM-NQS for nearly pure states, while for highly mixed states the situation is reversed. We also propose a phenomenological model that can explain these results. These results provide valuable guidance for the practical implementation of NQS-based state reconstruction and also point out the need for designing encodings that are more efficient in terms of quantum resources.

Neural Quantum State Reconstruction
We begin by describing the general framework of NQS reconstruction. The fundamental concept involves training a (potentially generative) NN that approximates a quantum state in a well-defined basis to reproduce the statistics of the measurement data. Let us assume that in the experiments the system state ρ has been measured with N_b measurement settings, where each setting b = 1, ..., N_b is a K-outcome POVM {P^b_σ}. This includes projective measurements, such as Pauli string measurements, as a special case. Assume that we have gathered N_d measurement outcomes under each measurement basis, where each outcome is denoted by a number σ^b_j ∈ {1, ..., K} for j = 1, ..., N_d. The dataset that we aim to reproduce can then be represented as D = {(b, σ^b_j)}. We use p_b(σ) to denote the probability of obtaining the measurement outcome σ by measuring ρ under basis b, and q^b_θ(σ) to denote the corresponding probability given by the NQS ρ_θ with variational parameters θ. Our goal is to minimize the averaged distance between these two probability distributions over all bases. We quantify this distance by the Kullback-Leibler divergence KL(p∥q) = E_{σ∼p}[log(p(σ)/q(σ))]. We therefore define the loss function as L(θ) = −(1/(N_b N_d)) Σ_b Σ_j log q^b_θ(σ^b_j), where we have omitted the constants that do not depend on θ and approximated the expectation values E with the sample average over the finite dataset. We then use gradient-based optimization methods to optimize the parameters θ, in which the t-th iteration reads θ_{t+1} = opt(θ_t, ∇_θ L(θ_t)), where opt refers to the optimization algorithm used (e.g., for gradient descent with learning rate α, opt(θ, g) = θ − αg). In practice, when the dataset is large, we employ a technique called mini-batching to reduce the computational cost: the KL divergence and its gradient are estimated not on the whole dataset, but only on a smaller subset of it.
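As a concrete illustration, the loss and mini-batch update above can be sketched as follows. This is a minimal toy sketch, not an actual NQS: the sizes K, N_b, N_d, the learning rate, and the per-basis softmax parameterization standing in for q^b_θ are all hypothetical choices.

```python
import numpy as np

# Toy sketch of the training loop: q_theta is a per-basis softmax over K
# outcomes, and we minimize the negative log-likelihood of the dataset
# (the basis-averaged KL divergence up to a theta-independent entropy
# constant) by mini-batch gradient descent.
rng = np.random.default_rng(0)
K, N_b, N_d = 4, 3, 2000

p = rng.dirichlet(np.ones(K), size=N_b)              # target distributions p_b
data = np.stack([rng.choice(K, size=N_d, p=p[b]) for b in range(N_b)])

theta = np.zeros((N_b, K))                           # variational logits

def q(theta):
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(theta, outcomes):
    logq = np.log(q(theta))
    return -np.mean([logq[b, s] for b in range(N_b) for s in outcomes[b]])

lr, batch = 0.5, 100
loss_start = loss(theta, data)
for t in range(500):
    idx = rng.choice(N_d, size=batch, replace=False)  # mini-batching
    freq = np.stack([np.bincount(data[b, idx], minlength=K) / batch
                     for b in range(N_b)])
    # softmax NLL gradient: q - empirical frequencies (constant factor in lr)
    theta -= lr * (q(theta) - freq)
loss_end = loss(theta, data)
```

Running the loop drives the variational distributions toward the empirical outcome frequencies, reducing the loss toward the entropy of the data.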
Once the training procedure has converged, the NQS can be used to generate samples, predict properties of interest, or, for sufficiently small systems, retrieve the full density matrix. We now briefly introduce the two NQS encodings that we compare in our investigation: NDO and POVM-NQS.

Neural Density Operator
The NDO encoding is compatible with projective measurements. We take Pauli string measurements {X, Y, Z}^⊗N on an N-qubit system as an example. The corresponding projectors are denoted {P^b_i} and the basis rotation matrices {U_b}. Measurement outcomes can be denoted by bit-strings of length N (e.g., σ^b_j = (0100) for N = 4). When a properly normalized neural network (for instance, an autoregressive NN) is used to parameterize the density matrix elements ⟨η|ρ_θ|η′⟩ = NN_θ(η, η′), the variational probability distribution reads q^b_θ(σ) = ⟨σ|U_b ρ_θ U†_b|σ⟩. (4) If the NN parameterization is not normalized, an additional normalization term appears in the loss function, whose gradient can be estimated through Monte Carlo sampling [27]. We remark that, to efficiently evaluate eq. (4), the number of connected elements in the rotation unitary U_b must be bounded. However, their number grows exponentially with the number of X and Y rotations present in the measurement basis. A similar limitation appears in pure-state tomography.
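The rotated-basis probability q^b_θ(σ) = ⟨σ|U_b ρ_θ U†_b|σ⟩ can be checked directly on a single qubit. In this sketch the Hadamard rotation (Z- to X-basis) and the choice of ρ are illustrative examples, not the paper's ansatz:

```python
import numpy as np

# Probability of outcome |sigma> when measuring rho in the basis rotated
# by U_b: the diagonal of U rho U^dag gives the outcome probabilities.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard: X-basis rotation
rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|

def basis_probs(rho, U):
    return np.real(np.diag(U @ rho @ U.conj().T))

probs_z = basis_probs(rho, np.eye(2))  # computational basis
probs_x = basis_probs(rho, H)          # X basis: |0> gives 1/2, 1/2
```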
For benchmarking purposes, we use all 3^N possible bases in this work, to guarantee that the dataset contains enough information to determine the mixed target state. Nevertheless, this is not necessary in practice, since NDO works for generic datasets and measurement bases that may not be informationally complete [32,38,41].

POVM-NQS
The POVM-NQS method exploits the one-to-one correspondence between the state ρ and the outcome statistics of a single (N_b = 1) informationally complete POVM {P_i}^K_{i=1}. The probability of obtaining an outcome σ ∈ {1, ..., K} is given by p(σ) = tr(ρP_σ). Conversely, the density matrix can be obtained by the inverse formula ρ = Σ_{σ,σ′} p(σ) (T^{−1})_{σ,σ′} P_{σ′}, where T_{σ,σ′} = tr(P_σ P_{σ′}) is the overlap matrix [42]. Therefore, reconstructing p(σ) suffices to determine ρ. As an example, we consider tensor products of the single-qubit Pauli-4 POVM for an N-qubit system, whose first three elements are proportional to projectors onto eigenstates of the three Pauli operators, with a fourth element completing the resolution of the identity [30]. Under this measurement scheme, a normalized neural network is used to approximate p(σ): q_θ(σ) = NN_θ(σ). Once the NN is trained, one can use the inverse formula to reconstruct the target state, or directly estimate relevant properties via sampling [30].
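The inverse formula can be verified numerically for a single qubit. The Pauli-4 construction below (projectors onto the +1 eigenstates of Z, X, Y, each with weight 1/3, plus a completing fourth element) is one common normalization choice following [30]; the test state is arbitrary:

```python
import numpy as np

# Inverse formula rho = sum_{s,s'} p(s) (T^{-1})_{s,s'} P_{s'} for a
# single-qubit Pauli-4 POVM.
ket0 = np.array([1, 0], dtype=complex)
ketp = np.array([1, 1], dtype=complex) / np.sqrt(2)
keti = np.array([1, 1j], dtype=complex) / np.sqrt(2)
P = [np.outer(k, k.conj()) / 3 for k in (ket0, ketp, keti)]
P.append(np.eye(2) - sum(P))                     # ensures sum_s P_s = I

def reconstruct(p):
    # Overlap matrix T_{s,s'} = tr(P_s P_{s'}); solve T a = p, then
    # rho = sum_s a_s P_s (T is symmetric, so this matches the formula).
    T = np.real([[np.trace(Pi @ Pj) for Pj in P] for Pi in P])
    a = np.linalg.solve(T, p)
    return sum(ai * Pi for ai, Pi in zip(a, P))

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])   # some mixed state
p = np.real([np.trace(rho @ Ps) for Ps in P])             # p(s) = tr(rho P_s)
rho_rec = reconstruct(p)
```

Because the four POVM elements span the space of 2×2 Hermitian matrices, the reconstruction is exact.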

Variance Reduction via Control Variates
To investigate the asymptotic behavior of the sample complexity, we must train the NN to high precision. However, we observe that the noise introduced by the mini-batching strategy makes accurate training prohibitively expensive in practice. To understand this, consider randomly sampling a batch of outcomes B from the dataset at each iteration. The gradient of the loss function evaluated on this batch is given by g_B(θ) = −(1/|B|) Σ_{(b,σ)∈B} ∇_θ log q^b_θ(σ), which is an unbiased estimator of ∇_θ L. However, its variance, Var[g_B] = (1/|B|) Var_{(b,σ)∼D}[∇_θ log q^b_θ(σ)], (7) remains finite as q^b_θ approaches the target p_b. This asymptotically finite variance is in sharp contrast to the zero-variance property of Variational Monte Carlo, which allows for accurate optimization of the ground state with a relatively small number of samples [43,44]. The statistical fluctuations in the gradient estimation introduce noise that prevents the reconstruction error from dropping below its standard deviation, which scales proportionally to 1/√|B|. Consequently, in situations where we cannot afford training with large batch sizes, we must reduce the variance of the gradient estimator.
The Control Variates (CV) method is a well-established statistical technique for variance reduction [45], which was recently discussed in the context of Variational Monte Carlo [46,47,48,44]. Specifically, when estimating the gradient g_B(θ_t) at step t, we introduce a second random variable (the CV), which represents the gradient evaluated at an earlier step t′: g_B(θ_{t′}). We then adjust the gradient estimator to g_B(θ_t) − g_B(θ_{t′}) + E[g_B(θ_{t′})]. This revised estimator is unbiased but can have lower variance because g_B(θ_t) and g_B(θ_{t′}) are correlated. The expectation can be computed by averaging over the entire dataset. Consequently, we obtain the following variance-reduced training rule: θ_{t+1} = opt(θ_t, g_B(θ_t) − g_B(θ_{t′}) + E[g_B(θ_{t′})]). To further reduce the computational cost, we update the CV only once every T steps, i.e., t′ = T⌊t/T⌋. For all simulations in this paper, we set T = 50. This method is also known as stochastic variance reduced gradient (SVRG) in the machine learning literature [49,50].
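The CV (SVRG-style) update rule above can be sketched on a toy problem. The least-squares objective, sizes, learning rate, and refresh period below are hypothetical stand-ins for the actual NQS loss:

```python
import numpy as np

# SVRG-style control-variates update on a toy least-squares problem:
#   g_B(theta_t) - g_B(theta_t') + E[g_B(theta_t')],
# with the control variate refreshed every T steps.
rng = np.random.default_rng(1)
n, d = 500, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=n)

def grad(theta, idx):
    # gradient of 0.5 * mean((X theta - y)^2) restricted to the subset idx
    r = X[idx] @ theta - y[idx]
    return X[idx].T @ r / len(idx)

theta = np.zeros(d)
T, lr, B = 50, 0.1, 10
for t in range(2000):
    if t % T == 0:                                   # refresh the CV
        theta_ref = theta.copy()
        full_grad_ref = grad(theta_ref, np.arange(n))  # E[g_B(theta_t')]
    idx = rng.choice(n, size=B, replace=False)
    g = grad(theta, idx) - grad(theta_ref, idx) + full_grad_ref
    theta -= lr * g

theta_exact = np.linalg.lstsq(X, y, rcond=None)[0]
```

Near the optimum the two batch gradients cancel, so the estimator's variance vanishes and the iterates converge to the exact minimizer despite the tiny batch size.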
To further substantiate our approach, we conduct a systematic numerical analysis. As a benchmark problem, we consider the NDO reconstruction of the one-dimensional open-boundary transverse field Ising model (TFIM), H = −Σ_i σ^z_i σ^z_{i+1} − h Σ_i σ^x_i. We set h = 1 and N = 3, small enough to study the batch size's effects systematically. We randomly generate 10^3 measurement shots for each of the 3^3 = 27 Pauli bases, train the NDO with and without CV for different values of the batch size B, and repeat every simulation 100 times with different initial conditions and random seeds. The details of the training process and the NDO are listed in appendix A. In fig. 1, we compare three metrics: the KL divergence averaged over all measurement bases (KL), the error in energy ε = |tr(Hρ_θ) − tr(Hρ)|, and the infidelity. Dashed and solid lines correspond to training with and without CV. It is evident that when the NQS is trained without CV, the errors scale like 1/√B, as expected from eq. (7), and then saturate for large batch sizes at an intrinsic limit set by the dataset. When the CV method is applied, the adverse effect of mini-batching is eliminated, and the errors are independent of the batch size. These results validate the effectiveness of our CV method, which we use in the rest of our analysis.
We note that this CV method applies to generic NQS reconstruction algorithms, including different pure and mixed encodings, and should be used as a default technique when mini-batching introduces noise. The code is implemented and open-sourced in NetKet [51,52,53].

Simulations
To study the performance of NDO and POVM-NQS, we simulate the finite-temperature Gibbs ensemble of the TFIM, which is representative of the mixed states arising when a prepared state interacts with a thermal bath.
We generate measurement datasets of varying sizes and use different NQS ansatze for the reconstruction to understand the asymptotic scaling behavior of the sample complexity. In this work, we focus on sample sizes in the regime of 10^2 to 10^4, which are currently achievable with modern quantum devices [54,55]. By plotting the errors for different sample sizes on a log-log scale, we can determine the sample-complexity scaling exponents from the slopes of linear fits. We use the loss value (defined in eq. (2)), the infidelity I, and the error in energy ε as metrics for the reconstruction error. As the density matrix reconstructed through the POVM-NQS method might not be positive semidefinite, the infidelity may not always be a good indicator. Therefore, we take the absolute value of the infidelity and also calculate the average classical infidelity over the measurement bases, which is commonly used as a performance indicator in the literature on POVM-NQS.
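The exponent-extraction procedure amounts to a linear fit in log-log space. A minimal sketch with synthetic data (the sizes, prefactor, and noise level are illustrative, not the paper's results):

```python
import numpy as np

# Fit log(error) vs log(sample size); the slope estimates the scaling
# exponent. Synthetic errors decaying like N^-1 with small log-normal noise.
rng = np.random.default_rng(2)
sizes = np.logspace(2, 4, 8)                      # 10^2 .. 10^4 samples
errors = 3.0 * sizes**-1.0 * np.exp(0.05 * rng.normal(size=8))

slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
```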
We consider the 3-qubit one-dimensional open-boundary TFIM at h = 1, and use the thermal states ρ_β = exp(−βH_Ising)/tr[exp(−βH_Ising)] across a wide range of inverse temperatures β ∈ [10^−1, 10^1] as the target states. The numerical details can be found in appendix A. In Fig. 2, we show three inverse temperatures, β = 10, 1, 0.1, representing the low-, medium-, and high-temperature regimes, respectively. We plot the sample-complexity scaling behaviors using solid lines, with the linear fits shown as dashed lines. The slopes and r-squared values of the linear fits are reported in the corresponding legends. In Fig. 3, we summarize the scaling exponents for different inverse temperatures.
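The target Gibbs states ρ_β can be built by exact diagonalization for this system size. The sketch below is a plain-numpy illustration (not the paper's NetKet implementation); the TFIM convention matches the Hamiltonian used in the text:

```python
import numpy as np
from functools import reduce

# Gibbs state of the 3-qubit open-boundary TFIM at h = 1, computed via
# eigendecomposition: exp(-beta H) = V diag(exp(-beta w)) V^T.
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
I2 = np.eye(2)

def kron_all(ops):
    return reduce(np.kron, ops)

def tfim(N, h):
    H = np.zeros((2**N, 2**N))
    for i in range(N - 1):   # open boundary: nearest-neighbour ZZ couplings
        H -= kron_all([Z if j in (i, i + 1) else I2 for j in range(N)])
    for i in range(N):       # transverse field
        H -= h * kron_all([X if j == i else I2 for j in range(N)])
    return H

def gibbs(H, beta):
    w, V = np.linalg.eigh(H)
    e = np.exp(-beta * (w - w.min()))   # shift spectrum for stability
    rho = (V * e) @ V.T
    return rho / np.trace(rho)

rho_beta = gibbs(tfim(3, 1.0), beta=1.0)
```

At β = 0 this reduces to the maximally mixed state I/8, and at large β it approaches the ground-state projector.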

Scaling Behavior
As shown in fig. 3, the scaling exponents for KL and classical infidelity for both classes of NQS ansatze are approximately −1, regardless of the mixedness of the target states. This is because we are learning the classical probability distributions of the measurement outcomes, a task known to have sample complexity Θ(ϵ−1) for a classical error ϵ quantified by the KL or classical infidelity [56].
Regarding the energy, POVM-NQS and NDO show qualitatively different behaviors. The scaling exponent for POVM-NQS remains around −2 irrespective of the mixedness of the target state, suggesting that the method does not improve the asymptotic quantum shot complexity over naive statistical averaging or classical shadows. On the other hand, NDO exhibits a scaling exponent of approximately −1 for slightly mixed states, although this exponent gradually deteriorates to −2 for highly mixed states. This suggests that NDO has an advantage over POVM-NQS when the target state is only slightly mixed, but the advantage disappears when the mixedness increases significantly. As NDO ansatze can naturally represent pure states, this observation is consistent with the behavior of pure-state reconstruction, which was also recently demonstrated numerically to have an exponent of −1 (see Ref. [41]). We further note that NDO tends to be more accurate than POVM-NQS in terms of reconstruction error for the same sample size. Still, its classical optimization process is often harder to converge, leading to a higher classical overhead. This might be related to variance problems arising from zeroes in the density matrix, similar to what was recently found for variational dynamics in Ref. [44].
In appendix D, we conduct the same study for a molecular ground state (LiH) subject to depolarization and find a picture consistent with the one presented here.

Theoretical Analysis
The previous section discussed the asymptotic quantum shot complexity for both the KL and the classical infidelity. Now, our focus shifts to the error scaling of the energy and the quantum infidelity, with the intention of providing a theoretical explanation for the observed behavior. To this end, we consider a simple phenomenological model of the errors occurring in NQS reconstructions. After the training procedure, we assume that the NQS does not perfectly encode the target state, but carries a small error denoted by δ. We then derive asymptotic expressions identifying how this error affects the various error metrics. The findings are summarized in table 1. For POVM-NQS, which directly encodes the probability distribution of POVM outcomes, we make the assumption that q_θ(σ) = p(σ) + δ∆(σ) + O(δ²), (10) where Σ_σ ∆(σ) = 0. This implies that the total variation distance TV, defined as Σ_σ |p(σ) − q_θ(σ)|/2, is of order δ. Building upon the theory of learning classical distributions, we know that the sample complexity scales as TV^−2 ∼ δ^−2 [56]. Since the energy can be expressed as an expectation over the POVM distribution, Σ_σ p(σ)H_σ with H_σ = Σ_{σ′} tr(P_{σ′}H)(T^{−1})_{σ′σ} [30], the error in energy is also of order δ. As a result, the energy error exhibits an exponent of −2.
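The identity behind the energy estimator, E = Σ_σ p(σ)H_σ with H_σ = Σ_{σ′} tr(P_{σ′}H)(T^{−1})_{σ′σ}, can be checked numerically. The single-qubit Pauli-4 construction and the toy Hamiltonian below are illustrative choices; the snippet is self-contained:

```python
import numpy as np

# Energy as a classical expectation over the POVM outcome distribution.
ket0 = np.array([1, 0], dtype=complex)
ketp = np.array([1, 1], dtype=complex) / np.sqrt(2)
keti = np.array([1, 1j], dtype=complex) / np.sqrt(2)
P = [np.outer(k, k.conj()) / 3 for k in (ket0, ketp, keti)]
P.append(np.eye(2) - sum(P))

Hm = np.array([[1.0, 0.3], [0.3, -1.0]])               # toy observable
rho = np.array([[0.6, 0.1], [0.1, 0.4]], dtype=complex)

T = np.real([[np.trace(Pi @ Pj) for Pj in P] for Pi in P])
h_vec = np.real([np.trace(Ps @ Hm) for Ps in P])       # tr(P_{s'} H)
H_s = np.linalg.solve(T, h_vec)                        # T^{-1} h (T symmetric)
p = np.real([np.trace(rho @ Ps) for Ps in P])          # p(s) = tr(rho P_s)

energy_povm = p @ H_s                                  # sum_s p(s) H_s
energy_exact = np.real(np.trace(rho @ Hm))
```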
For NDO, we instead assume ρ_θ = ρ + δ∆ + O(δ²), as in appendix B. When the target is a pure eigenstate ρ = |ψ⟩⟨ψ| of H, the leading-order perturbation compatible with a physical, normalized ansatz takes the form ∆ = |∆⟩⟨ψ| + |ψ⟩⟨∆|, with Re⟨∆|ψ⟩ = 0 imposed by normalization. Then tr(∆H) = 2Re⟨∆|H|ψ⟩ = 2E Re⟨∆|ψ⟩ = 0, since |ψ⟩ is an eigenstate of H with energy E. Thus the error in energy is of order δ² when ρ is a pure eigenstate, and of order δ otherwise. This distinction arises from the cancellation of the terms of order δ when the state is pure and an eigenstate of the Hamiltonian. Such a cancellation does not occur for mixed states or generic observables, leading to the observed scaling behavior: the energy has an exponent of −1 for pure states, deteriorating to −2 as the mixedness increases. Furthermore, in appendix B, we show that the KL is of order δ² and establish a general proposition bounding the trace distance by the square root of the KL. This finding is consistent with the exponent of −1 for the KL.
The behavior of the quantum infidelity presents a more complex scenario. According to the theory of quantum state tomography [13,14,15], the infidelity should exhibit an exponent of −1, which agrees with the NDO simulations when the mixedness is small or large. However, in the regime of intermediate mixedness, we observe a degradation of the infidelity exponent, forming a so-called valley. In appendix C, we attribute this phenomenon to the misalignment between KL and infidelity, and we reproduce a qualitatively consistent valley with a random reconstruction error ∆ in numerical simulations. The remaining quantitative discrepancies might arise from complicated interactions between the NN structure, the training heuristics, and the properties of the target states.
For POVM-NQS, we observe that simply switching from I(ρ, ρ_θ) to I(ρ_θ, ρ) significantly alters the behavior, indicating the presence of several negative eigenvalues in ρ_θ. This suggests that the observed behavior of the quantum infidelity is primarily caused by the unphysical nature of the POVM-NQS reconstruction, and it is presented here only for completeness. We note that the same reason causes the anomalous behavior of the quantum infidelity in the top-right panel of fig. 2.

Conclusions
In this paper, we systematically study the sample complexity of NQS mixed-state reconstruction and compare different NQS encodings, namely NDO and POVM-NQS. To achieve accurate reconstruction, we introduce a strategy, based on Control Variates, to systematically suppress the noise introduced by mini-batching. We provide theoretical arguments and numerical evidence that this strategy leads to significantly better accuracy of reconstruction algorithms and has no trade-offs. Even though we only discuss the case of mixed-state reconstruction, it can be applied to any NQS-based scheme. We also open-source a high-quality implementation in the quantum state reconstruction (QSR) driver of NetKet [51,52].
We then present extensive numerical simulations for the finite-temperature TFIM, a prototypical example of the realistic scenarios that experimentalists encounter in quantum simulation experiments on NISQ devices. We find that NDO offers a quadratic advantage over POVM-NQS and classical shadows in the asymptotic sample complexity when the state is pure or almost pure. This advantage deteriorates and eventually vanishes as the target state becomes more mixed. POVM-NQS, on the other hand, treats states of varying mixedness on an equal footing and exhibits no such advantage, regardless of the state's mixedness. Therefore, NDO is the more efficient tool for reconstructing slightly mixed states.
Our results establish the asymptotic sample complexity as an important performance indicator for designing NQS architectures and showcase the advantages of enforcing physical constraints at the level of the NN architecture. We also provide a first comparison of the performance of the NDO and POVM-NQS encodings for mixed states, which had not been investigated before and might be of interest for developing variational methods to simulate finite-temperature and/or open quantum systems.
Finally, we note that in this work we focus on the error dependence of the sample complexity; the system-size dependence for general mixed states using different ansatze remains to be elucidated. A more systematic study of this issue is computationally challenging, as we remark in appendix A, and we leave it for future work.

A Numerical Details
Here we list the numerical details of our studies. The NDO used in the simulations of an N-qubit system is a restricted Boltzmann machine with one layer of N hidden neurons and N ancillas [57]. The POVM-NQS is an autoregressive dense NN with 2 layers of 10 neurons [30]. All training is conducted via the Adam optimizer [58] with learning rate 10^−3, batch size 100, a maximal iteration number of 10^5, and CV update frequency T = 50. The training is terminated when the loss value stops decreasing for 2000 iterations. All code is implemented with NetKet [51,52,53] and JAX [59].
We note that the numerical study of the NDO scaling for mixed states is computationally more challenging than the pure-state case of, e.g., Ref. [41]. This is because for pure states, measurements of all the Pauli strings in the Hamiltonian suffice to determine the state, while for mixed states they do not. Intuitively, when a pure-state ansatz is trained to reproduce the probability distributions of all Hamiltonian terms, it yields an energy approximating the true energy, which is minimal. Then, by the variational principle, the state also approximates the ground state. In contrast, for mixed states, one has to measure an informationally complete set of bases (e.g., all the Pauli bases) to minimize the reconstruction error. This exponentially growing basis-set size makes numerical simulations for larger systems very challenging. Nevertheless, our theoretical analysis is independent of the system size and agrees with numerical simulations on small systems. A relevant open question is the effect of a truncated basis set on the reconstruction accuracy, which is relevant for experimental implementations.

B Theoretical Details
For NDO, we assume ρ_θ = ρ + δ∆ + O(δ²), and we omit the O(δ²) terms in the derivations of this appendix. Expanding to second order, the KL divergence of the induced outcome distributions reads KL(p∥q_θ) = (δ²/2) Σ_σ ∆p(σ)²/p(σ) + O(δ³), where ∆p(σ) denotes the first-order change of the outcome probability; hence the KL is of order δ². For two n-qubit states ρ and ρ′, let TD = ∥ρ − ρ′∥_1/2 be the trace distance and use σ_i ∈ {I, X, Y, Z} to denote the Pauli operators. Define the KL over all Pauli bases as KL = Σ^3_{i1,...,in=0} KL(p_{i1,...,in} ∥ p′_{i1,...,in}), where p and p′ denote the Bernoulli probability distributions of the ±1 outcomes given by the corresponding Pauli measurements on the two states. Then we have TD ≤ 2^{n−1} √(2 KL). To show this, note that the Pauli string operators form a complete basis of the space of Hermitian matrices. Therefore, we can decompose ρ as ρ = 2^{−n} Σ_{i1,...,in} c_{i1,...,in} σ_{i1} ⊗ ··· ⊗ σ_{in}. The coefficients are c_{i1,...,in} = tr(ρ σ_{i1} ⊗ ··· ⊗ σ_{in}) = p_{i1,...,in}(+1) − p_{i1,...,in}(−1), where p_{i1,...,in} is the Bernoulli distribution given by ρ measured in the basis σ_{i1} ⊗ ··· ⊗ σ_{in}. We use the same notation with primes to denote the corresponding quantities of ρ′. Since each Pauli string has trace norm 2^n, the trace distance can be bounded by the differences in coefficients as TD ≤ (1/2) Σ_{i1,...,in} |c_{i1,...,in} − c′_{i1,...,in}|. On the other hand, each difference in coefficients can be bounded as |c − c′| ≤ 2 TV(p, p′) ≤ √(2 KL(p∥p′)), where the last inequality follows from Pinsker's inequality [60]. Summing over the 4^n Pauli strings and using the mean inequality, we arrive at TD ≤ (1/2) √(4^n · 2 KL) = 2^{n−1} √(2 KL). This proposition can also serve as a guarantee for training quantum density estimators in the context of quantum federated learning [61,62].
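The key step above is Pinsker's inequality, TV(p, q) ≤ √(KL(p∥q)/2), which can be checked numerically for the Bernoulli distributions appearing in the proposition:

```python
import numpy as np

# Numerical check of Pinsker's inequality TV(p, q) <= sqrt(KL(p||q) / 2)
# on random Bernoulli distributions.
rng = np.random.default_rng(3)

def tv(p, q):
    return 0.5 * np.abs(p - q).sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

checks = []
for _ in range(100):
    p = rng.dirichlet(np.ones(2))   # random Bernoulli distribution
    q = rng.dirichlet(np.ones(2))
    checks.append(tv(p, q) <= np.sqrt(kl(p, q) / 2) + 1e-12)
```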

C The Valley Phenomenon
Here we aim to provide an explanation of the valley phenomenon observed in NDO: the exponent of the infidelity is −1 for β → 0, ∞, while it decreases toward −2 for intermediate β.
We start from our phenomenological model and try to find a relationship between KL and infidelity. Since the behavior of the KL is well understood, such a relationship gives us insight into how the infidelity behaves. We consider the error matrix ∆ to be drawn randomly as ∆ = AA†/tr(AA†), where each entry of A follows the complex standard Gaussian distribution. We then calculate the perturbed state ρ_θ = ρ + δ∆ and normalize it again by dividing by its trace.
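This random error model can be sketched directly; the dimension, δ, and the choice of a maximally mixed target below are illustrative (any target state works):

```python
import numpy as np

# Random error model: Delta = A A^dag / tr(A A^dag) with complex Gaussian A,
# followed by perturbation and renormalization of the target state.
rng = np.random.default_rng(5)
d, delta = 8, 1e-3

A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
Delta = A @ A.conj().T              # positive semidefinite by construction
Delta /= np.trace(Delta).real      # unit trace

rho = np.eye(d) / d                 # illustrative target state
rho_pert = rho + delta * Delta
rho_pert /= np.trace(rho_pert).real
```

Because ∆ is positive semidefinite with unit trace, the perturbed and renormalized ρ_θ remains a valid density matrix for any δ ≥ 0.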
For a given β and the corresponding target state ρ, we randomly generate 100 such perturbed states for different choices of δ ∈ [10^−3.5, 10^−2.5], and calculate the KL and the infidelity I against the target state. We find that the resulting (KL, I) pairs fall on a straight line on a log-log scale, with r-squared greater than 0.99. The slope α = α(β) depends on β and gives an effective power-law relationship between KL and I: I ∝ KL^α. This means that the way we quantify the reconstruction error impacts the scaling exponents we observe. In particular, such a misalignment between KL and infidelity leads to a β-dependent factor of α(β) in the exponents. Now we assume that the KL has an exponent of −1, which is theoretically and numerically validated. Hence the sample complexity is proportional to KL^−1 ∝ I^{−1/α(β)}. In fig. 4, we plot the simulated exponents −1/α(β) in blue, with the standard deviation indicated by the shaded region. We find a valley pattern that is qualitatively consistent with what we observe in the NDO simulations, confirming our theoretical explanation. The remaining quantitative differences might be a complicated result of the NN design, generalization, training heuristics, and the properties of the target states.

D Depolarized Molecular Ground States
In this appendix, we study the sample-complexity scaling behavior for the LiH ground state subjected to depolarization noise. This scenario emerges in digital simulation or VQEs, where the quantum gates are imperfect and introduce noise that can be modeled as depolarization. We apply the parity transformation to map the fermionic Hamiltonian onto a 4-qubit Hamiltonian that can be implemented on quantum computers (details of the transformation are found in the appendices of [27,41]). We take its ground state |ψ⟩ and simulate the depolarized states ρ_p = (1 − p)|ψ⟩⟨ψ| + pI/2^4 over p ∈ [0, 1] as the target states. In fig. 5, we choose three depolarization intensities, p = 0, 0.01, 0.1, representing the low, medium, and high depolarization regimes, and plot the corresponding results.
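The depolarized target states are straightforward to construct. In this sketch a random 4-qubit state stands in for the LiH ground state (which would require building the transformed molecular Hamiltonian); the purity interpolates between 1 at p = 0 and 1/2^N at p = 1:

```python
import numpy as np

# Depolarized state rho_p = (1 - p)|psi><psi| + p I / 2^N.
rng = np.random.default_rng(4)
N = 4
psi = rng.normal(size=2**N) + 1j * rng.normal(size=2**N)
psi /= np.linalg.norm(psi)          # normalized stand-in for |psi_LiH>

def depolarize(psi, p):
    d = len(psi)
    return (1 - p) * np.outer(psi, psi.conj()) + p * np.eye(d) / d

purity_pure = np.real(np.trace(depolarize(psi, 0.0) @ depolarize(psi, 0.0)))
purity_mixed = np.real(np.trace(depolarize(psi, 1.0) @ depolarize(psi, 1.0)))
```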
We observe that the behavior of POVM-NQS on LiH approximately matches what was seen on the TFIM, while the scaling exponents of the NDO are generally worse than those for the TFIM by about 0.5. This might arise from the specific NN architecture and training heuristics used here. Intuitively, depolarization noise has less structure that can be exploited by NNs than thermal states, leading to slightly worse scaling in the simulations. Nevertheless, an advantage of NDO over POVM-NQS can still be observed at small mixedness, while it disappears when decoherence leads to very mixed states, giving a picture consistent with the TFIM.

Figure 1 :
Figure 1: NQS reconstruction of the ground state of the 3-qubit 1D open-boundary transverse field Ising model using NDO with different batch sizes B. KL divergence, energy error, and infidelity are used as performance metrics. Dashed and solid lines represent the results of training with and without the control variates method. The dotted line marks the 1/√B scaling. All data points are averaged over 100 random instances.


Figure 2 :
Figure 2: Sample complexity dependence on the reconstruction error for the 3-qubit 1D open-boundary transverse field Ising model at different inverse temperatures β = 10, 1, 0.1, using POVM-NQS (top) and NDO (bottom). KL divergence, energy error, infidelity, and classical infidelity are used as metrics for the reconstruction error. Dashed lines represent the log-log linear regression results, with the slopes and r² values indicated in the legend. All data points are averaged over 100 random instances.

Figure 3 :
Figure 3: The scaling exponents of the reconstruction error for the 3-qubit 1D open-boundary transverse field Ising model at different inverse temperatures β ∈ [10^−1, 10^1] with POVM-NQS (top) and NDO (bottom). KL divergence, energy error, infidelity, and classical infidelity are used as metrics for the reconstruction error. The error bars are given by the linear regression.

Figure 4 :
Figure 4: The valley phenomenon in scaling exponents of infidelity for different β.The exponents observed in NDO simulations are plotted in orange, while the ones given by random perturbations are plotted in blue.Error bars and the shaded region indicate the standard deviation over 100 random instances.

Table 1 :
Scaling exponents for KL, classical infidelity, infidelity, and energy error for the TFIM at different β with different methods. "Valley" denotes an exponent of −1 for β → 0, ∞ that decreases toward −2 in the intermediate regime.