Synergy between deep neural networks and the variational Monte Carlo method for small 4 He N clusters

We introduce a neural network-based approach for modeling wave functions that satisfy Bose-Einstein statistics. Applying this model to small 4 He N clusters (with N ranging from 2 to 14 atoms), we accurately predict ground state energies, pair density functions, and two-body contact parameters C ( N ) 2 related to weak unitarity. The results obtained via the variational Monte Carlo method exhibit remarkable agreement with previous studies us-ing the diffusion Monte Carlo method, which is considered exact within its statistical uncertainties. This indicates the effectiveness of our neural network approach for investigating many-body systems governed by Bose-Einstein statistics.

We introduce a neural network-based approach for modeling wave functions that satisfy Bose-Einstein statistics.Applying this model to small 4 He N clusters (with N ranging from 2 to 14 atoms), we accurately predict ground state energies, pair density functions, and two-body contact parameters C (N ) 2 related to weak unitarity.The results obtained via the variational Monte Carlo method exhibit remarkable agreement with previous studies using the diffusion Monte Carlo method, which is considered exact within its statistical uncertainties.This indicates the effectiveness of our neural network approach for investigating many-body systems governed by Bose-Einstein statistics.
The field of artificial intelligence (AI) has seen significant advancements in recent years, making it a promising tool for addressing complex problems in quantum many-body physics [1][2][3][4][5].Studies have demonstrated the effectiveness of AI in a wide range of applications [6], leading to increasing interest in exploring its potential for understanding nature as a whole.This paper builds upon this progress by proposing a neural network-based approach for modeling a wave function that satisfies Bose-Einstein statistics and applying it to the study of small clusters of 4 He N with N = {2, . . .14} atoms using the variational Monte Carlo (VMC) method.
Exact solutions of the Schrödinger equation are known for only a limited number of systems.In this context, quantum Monte Carlo methods have become a powerful tool for understanding quantum many-body systems.In recent years, machine learning, specifically neural networks, has gained traction as a method for studying fermionic systems, after the seminal work of Carleo and Troyer [7].Although there have been some efforts [8,9], the application of machine learning to the analysis of bosonic systems has generally received less attention.The present work William Freitas: wfsilva@ifi.unicamp.braims to address this gap.Representing trial functions with neural networks and training them through an unsupervised VMC method yields results comparable to those obtained through the diffusion Monte Carlo (DMC) method, highlighting the potential of machine learning for studying Bose-Einstein systems.The diffusion Monte Carlo method, a projective method, requires configurations drawn from a trial function that is not orthogonal to the wavefunction of the system's ground state.Even when the available trial function is of low quality, this method can still yield very good results for the ground state energy of bosonic systems.However, in such cases, estimates of other noncommuting properties with the system's Hamiltonian may be biased due to the need for extrapolation in the estimation process.An alternative approach is to use a neural network-based representation of the ground state trial function, which can provide excellent energies and most likely better estimates of many properties in agreement with experimental results.This study of 4 He N clusters is not only a proof of concept, but also a topic that continues to be of recent interest [10][11][12][13][14][15][16][17][18][19].

Methods
The stationary states of small 4 He N clusters can be modeled by assuming that they are described by the Hamiltonian where the kinetic energy term depends on the atomic mass m, and V , the interatomic interaction, on the relative distance between atoms, r ij = |r i − r j |.The variational energies of the clusters were computed using the HFD-He interatomic potential, formulated by Aziz and collaborators [20].This potential is widely employed in the study of 4 He N clusters [10,21] due to its accurate modeling of the interactions between atoms in the Hamiltonian of Eq. 1. Exclusively to gain further insights into the optimization process, the Lennard-Jones  (LJ) potential was also utilized at this stage.The HFD-He potential offers a significantly higher level of accuracy for the system under investigation compared to the LJ potential.The minimum of the HFD-He potential is slightly shifted to large interatomic separations and is marginally deeper in comparison to the LJ interaction for helium atoms.A comparison of these potentials is presented in Fig. 1.Among the very accurate potentials available to describe helium atom interactions, one of the motivations for choosing the HFD-He potential is the availability of existing literature results for the clusters we have analyzed.
The neural network architecture was built in search of efficiency.Thus, translational invariance and the symmetrical character of the wave function were encoded, since the Hamiltonian commutes with the translation operator and the wave function must obey Bose-Einstein statistics.The encoded neural network quantum state was introduced to model the ground state wave function (2) where R = {r i | i = 1, ..., N } is a set of space coordinates associated with the N particles that form the many-body system.The first term of this trial function is a product of two-body terms of the Jastrow form with a pseudopotential of the McMillan form [22], which captures short-range correlations.It is multiplied by a sum of eight functions χ α weighted by variational parameters ω α .This is an accurate function that includes Feynman and Cohen backflow [23] correlations that cause an atom position to be dependent on the whole system configuration.In fact, it goes beyond the conventional manner [24] in which backflow correlations are included.It replaces a single function by a set of functions that includes backflow.
The simplicity of the trial function in Eq. 2, with only the variational parameters ω α explicitly stated, may obscure a limitation of the variational method.This limitation arises from the presence of a substan-tial number of parameters embedded within the functions χ α , requiring optimization, which makes this process cumbersome.In the following sections, we will delve into strategies for overcoming these challenges and demonstrate how deep neural networks can be employed to effectively represent trial functions for the study of many-body bosonic systems.
Accurately handling atomic configurations is vital for an appropriate representation of the trial function.The information pertaining to each atom is presented as a vector where {r i/ } is the set of all space points other than the r i one, q i = r i − r cm is the particle coordinate relative to the center of mass r cm and q i = |r i − r cm |.
The particle r i is referenced as the main particle.Any one of the four layers ℓ of the neural network has as input a single-atom stream h ℓ i and a two-atoms stream h ℓ ij ; for ℓ = 0, the streams are h 0 i = (q i , q i ) and h 0 ij = (r ij , r ij ).The average of both streams are employed to compute the intermediate single-atom stream vector f ℓ i given by Each stream in layers with ℓ > 0 are connected through a linear operation followed by a non-linear one.Information from the previous layer is also transmitted in the form of residual connections when both streams have the same shape.These steps can be summarized as The weights V ℓ and W ℓ , as well as the biases b ℓ and c ℓ of the neural network, are variational parameters to be optimized.The final layer single-particle stream outputs h L i of each particle are reshaped into a matrix of elements h αν used to construct the orbitals (7) where each main particle is correlated to the centre of mass to bind the system and to give the correct behavior asymptotically when r i → ∞.The decaying rates a αν are also variational parameters.The functions ϕ α remain invariant under exchanges of pairs of atoms that do not include the i-th atom.This is due to the fact that the input features of the neural networks do not depend on the order of coordinates in {r i/ }.Finally, symmetric functions χ α are formed by the product where particle exchanges leave χ α invariant due to the commutativity of the product, which allows rearrangement of the terms in any order.The foundational architecture of our approach was inspired by FermiNet [25], particularly in adopting a two-stream structure.However, there are several key distinctions between the two.Firstly, our input features f 0 i and intermediate stream vectors f ℓ i are specialized for spin s = 0 particles.Additionally, our single-particle input h 0 i bounds the i-th particle coordinates to the cluster's center of mass.This specialization arises from our treatment of the entire helium atom as a single particle, eliminating the Bohr-Oppenheimer approximation.This distinction is also evident in the intermediate stream vectors, where we employ subtraction, as outlined in Eq. 4, instead of concatenating the mean of single-particle vectors h ℓ i .Furthermore, our approach diverges significantly, starting from the h L i output single-particle stream onward, to address systems made from bosons.Nonetheless, it's worth noting that the linear combination of functions χ α in Eq. 2 is equivalent to the multi-determinant expansion introduced in the fermionic function [2], establishing a final connection between the two Ansatz.
There is currently a growing focus on extending the neural network Ansatz to periodic systems [5,26].Notably, Pescia and colleagues [5] have investigated 4 He systems in one and two dimensions.Although both Eq. 3 and their trial function ensure that the input features remain invariant with respect to the symmetries of the problem, the approaches are distinct.In their work, periodic boundary conditions are implemented, while for the clusters, global translation invariance is considered.Additionally, the pooling operation, which confers permutation invariance, resembles the procedure outlined in Eq. 7. It's worth noting that their wave function construction is independent of system size, which proves advantageous once the network is trained.In contrast, our trial function is size-dependent, necessitating multiple training sessions that, nevertheless, also afford greater flexibility for the neural network to learn.
Unsupervised training of the neural network depends on the sampling of the probability distribution The expectation value of the Hamiltonian E is estimated by approximating the integral dR p(R)E L (R) by the average values of the local energy E L (R) = HΨ B DNN (R)/Ψ B DNN (R) over the sampled configurations [27].
The set θ of all variational parameters is optimized using a second-order gradient-based method that accounts for correlations between pairs of variational parameters through derivatives of energy E and density probability p [2] ∇ The optimization process updates the parameters repeatedly according to F −1 ∇ θ E, where F −1 is the inverse of the Fisher information matrix, which correlates pairs of parameters.
The extensive parameter space inherent in neural networks can be handled in different ways.Particularly, the update of neural network parameters is a crucial step in the deep learning process.The iterative solver framework [28] is a powerful approach that can incorporate different methods to facilitate efficient parameter updates.In this work, we focus on employing one of these methods that may potentially be utilized within the iterative solver framework, namely the Kronecker-factored approximate curvature (K-FAC) method [29].This method hinges on two critical principles.Firstly, it approximates the Fisher information matrix as block diagonal.This insight, rooted in the domain of neural networks, recognizes that for functions represented by neural networks, elements F ij of the Fisher matrix effectively become zero if the network parameters Θ i and Θ j pertain to different layers.This approximation resonates with the hierarchical structure of neural networks, where parameters across distinct layers tend to exhibit a high degree of independence.
Secondly, to augment computational efficiency, the method employs Kronecker factorization to approximate the inverse of each of these blocks, including slight modifications to extend its applicability to unnormalized probability densities [2].This technique offers a highly effective approach for managing the large-scale matrices inherent in neural network-based functions, thereby rendering the optimization process computationally manageable.
Collectively, these considerations underpin the methodology, facilitating the optimization of trial functions based on neural networks for intricate systems.In our specific case, this approach leads to a significant reduction in computational complexity, replacing the inversion of matrices with dimensions on the order of 10 5 with the inversion of ten matrices, each with dimensions of about 10 2 .
In the optimization process, the selection of hyperparameters plays a crucial role in ensuring stability and efficiency.Many of these parameters can be extracted from the FermiNet approach [25].The typical parameters for the simulations are provided in Table 1.The step size of proposed movements is not included, since it is dynamically adjusted during the simulations.It is worth noting that the width of the layers can be reduced for smaller clusters, particularly for the single-atom stream width.It is worthwhile to discuss why the results were obtained without a parameterization of the Jastrow factor in Eq. 2, as is typically done in variational calculations for systems composed of helium atoms.The main motivation for this choice was to enhance the efficiency of the optimization process.Preliminary tests, where this factor was parameterized, showed an unstable optimization process due to the intense character of the helium short-range interatomic interaction.While the HFD-He potential does not diverge at the origin, it does exhibit considerably high values for small pair separations.In finite-time runs, no sampled configuration will feature a pair at distances smaller than approximately 1 Å.However, eventually sampled configurations may involve a pair separation where the potential still has a substantial energy, and a local energy that significantly deviates from the average value.In principle, the Laplacian of the wave function should perfectly cancel out this value.However, the proposed trial wave function is only an approximation for the description of the ground state, and this perfect cancellation does not occur.Consequently, these imprecise estimations can significantly impede the optimization process by pushing variational parameters away from their optimal values and rendering the optimization process excessively slow.
The absence of the parameterization of the Jastrow factor in a trial function represented by a neural network for a system composed of helium atoms stands in contrast to approaches used in systems governed by Coulombic interactions [2,31,32].The latter architectures prove especially effective in handling electron-electron correlations, scenarios where the Coulomb potential exhibits relatively predictable and manageable behavior.
These findings were validated through a simulation employing a Lennard-Jones potential with parameters ε = 10.22K and σ = 2.556 Å for a 6-atom cluster.In this scenario, the Jastrow factor in Eq. 2, incorporating a McMillan pseudopotential with an r −5 dependency, ensures an asymptotically correct solution to the Schrödinger equation as r → 0. The performance of the neural network with this Jastrow factor choice is depicted in the inset of Fig. 2, illustrating the total energies throughout the optimization process as a function of the performed iterations.The uncertainties associated with the energy are omitted due to significant variances in the estimated values during the initial iterations, making it difficult to observe the energy's progression as the optimization advances.
Following the optimization, a standard VMC simulation with fixed variational parameters yielded a total binding energy of -1.8007(1) K, a value notably higher than that obtained with the HFD-He potential.This is the only result presented for the LJ potential, since its inclusion in this work was solely for the purpose of investigating the optimization stability.
The optimization of the parameters of the DNN Ansatz, Eq. 2, is performed to minimize the energy.Initially, the learning process was focused on determining parameters that yielded reasonable energy values.Subsequently, without an explicit directive, the neural network learned to reduce the variance, leading to more precise results as a consequence.This behavior can be seen in Fig. 2.More specifically, each optimization iteration consisted in obtaining a set of sampled configurations by applying the Metropolis algorithm followed by computing the energy and its gradients, the Fisher matrix, and updating the parameters to conclude the iteration.Typically, the number of configurations, i.e. the batch size, in each step was 2 13 .After reaching a converged energy with a small variance, the optimization process was stopped.Then, a standard variational simulation was performed to get the final results and their associated variances.
At the beginning of the optimization process, large fluctuations in the total energy of the system often obscure the evolution of this quantity.To provide a clearer view of its progression during the optimization process, the uncertainties of the values shown in Fig. 2 are not displayed, as in the case of the LJ system.
Moreover, during the optimization, it was observed that this process evolved in a less than optimal manner, exhibiting slight instabilities after obtaining parameters that reasonably described the cluster energy.To address this challenge, we incorporated a hard sphere (HS) potential into the interatomic interaction given by the HFD-He potential during the course of the optimization.This addition addresses the issue that the r −5 dependence in the Jastrow factor of Ψ B DNN , Eq. 2, does not properly account for, in terms of behavior at short pair distances.The HS potential prevents atoms from getting too close to each other, overcoming the limitations of the Jastrow factor at small distances, which can lead to lowprobability configurations.
The specific chosen value (1.8 Å) of the HS radius is not critical and did not produce any significant changes in the energy estimate.In fact, the weak unitarity satisfied by the helium clusters, makes the addition of an HS potential irrelevant, which will be discussed later.The optimization process with an infinite barrier at the HS radius is also shown in Fig. 2. The results indicate that the inadequate functional dependence of the pseudopotential in the Jastrow factor is a source of instability during the optimization process.The inclusion of the HS potential was able to lower the variances.Therefore, for all optimizations performed (and only in this stage), the HS potential was added to the HFD-He interatomic interaction.

Results
The reported final cluster energies were obtained in a standard VMC calculation with the interatomic interaction V given exclusively by the HFD-He potential while keeping the Ψ B DNN parameters fixed.These results are presented in Table 2. Comparisons of the variational results with those from the DMC show that within statistical uncertainty they are in excellent agreement.This is remarkable because the DMC results are in principle exact in the Monte Carlo sense.For most of the clusters, the variational results are within one statistical standard deviation from the DMC results.
A three-dimensional plot of the two-body wave function is displayed in Fig. 3, where one of the atoms is at the origin and the probability amplitudes are shown for points (x, y) in the plane z = 0.The prob-  ability amplitude exhibits a peak at an interatomic separation of approximately 3.8 Å.The bond length, determined by the average value of the pair distance ⟨r ij ⟩, is 36.6±0.6 Å.However, small pair separation is not unexpected, since the interatomic interaction used does not include relativistic and quantum electrodynamics contributions [33].The 4 He 2 dimer is known to exist mostly in the quantum tunneling regime in a quantum halo state [34].The plot demonstrates that even though radial symmetry was not explicitly imposed, the neural network was able to learn it.The pair density function ρ N (r) for a cluster of N atoms is estimated by the operator The integration of ρ N (r) in the whole space gives ρ N (r)r 2 dr = N (N −1)/2.The pair density function is shown in Fig. 4.
Helium clusters are bound by the attractive van der Waals interaction, typically characterized by a (−C 6 /r 6 ) tail.This phenomenon arises due to the zero-point fluctuations of atomic dipole moments, leading to short-range correlations.In the study of these clusters, attention has recently been directed to the universal character of the short-range correlations [10,15] by adapting the Tan relations [35][36][37] to clusters [38][39][40].Within the framework of weak unitarity, the concept of contact that emerges from these relations serves to quantify the density of closely interacting pairs in strongly interacting systems, offering insights into their behavior at short distances.
While strong universality is not expected in helium clusters due to their characteristic distances (with an average pair distance of approximately 5 Å and a van der Waals length of about 5.4 Å [15]), it is well established [22] that the behavior of helium systems is primarily influenced by the closest pair of atoms.Nonetheless, provided there exists a strongly interacting model at short distances, the N -body wave function can still be factorized into the product of a universal two-body function and a state-dependent function [15].This remains valid, because in close proximity, a correlated pair is minimally influenced by the surrounding particles.Consequently, its two-body wave function should exhibit consistency regardless of the system's size or state.This scenario is called weak universality [15].
The r-independent two-body contact C as a function of the radial distance, are displayed in Fig. 5.As shown, there is a clear collapse of the pair density functions for all cluster sizes.These results indicate that even the largest clusters, with up to 14 atoms, exhibit a weak universality.This is consistent with the coalescence of N-body atoms in a Bosonic system, as previously predicted [39].The pair-atom contact C (N ) 2+1 [10] will be discussed in a separate publication.

Final comments
The use of neural network-based representation of the trial function for a bosonic system yielded results for core properties of 4 He N clusters that are in agreement  shown by colors from light to dark.
with those obtained using the exact diffusion Monte Carlo method.
The implemented approach for the deep neural network does not rely on external information.The chosen representation of the trial function, by explicitly satisfying the Bose-Einstein statistics, avoids machine time spent considering non-physical solutions.Additionally, by satisfying global translation symmetry, the representation eliminates the need to consider infinite degenerate coordinates that represent the same physical situation.These features of the representation allow for an efficient optimization process, able to obtain results with modest computational resources.
Our results also demonstrated the importance, in the optimization process, of two-body correlation factors with the appropriate properties at the asymptotic limit r → 0. The weak universality of 4 He N clusters enabled the inclusion of an HS potential in the atomic interaction, which stabilized the optimization process by eliminating configurations with very low probability, where two atoms are too close together.Previous literature [10,41,42] has presented various proposals for the two-body correlation factors of 4 He N clusters.The optimization process, due to its features at the limit r → 0, could be valuable in determining the correlation factor that most accurately describes the system.An improved functional form of the two-body correlation factor in Ψ B DNN may remove the need for a cut-off in the interatomic interaction during optimization.An alternative and simpler approach that could also improve optimization is to use a soft sphere potential instead of an HS potential, which would allow the wave function to smoothly approach zero at short distance separations between atoms.
The weak universality of 4 He N clusters is shown through the collapse, up to the maxima, of the pair density functions normalized by their respective contacts C (N ) 2 . These results were obtained by using a deep neural network to represent the trial function Ψ B DNN .This approach is validated both by previous findings [10,15] and the specific analysis performed in this work.
One way to expand on this research is to study larger clusters and use the liquid drop model to extract the contact C (2) 2 for bulk liquid 4 He.Another avenue for investigation could be to look into the effects of impurities in the clusters.

Figure 1 :
Figure 1: Potential energy V as a function of the pair separation r for the interatomic potentials Lennard-Jones (red) and HFD-He (blue).

Figure 2 :
Figure2: The total energy, without associated uncertainties, is presented to elucidate its evolution as a function of the optimization step for a cluster of N = 6 atoms employing the HFD-He potential (depicted as blue dots).The influence of incorporating a hard sphere (HS) potential is represented by the orange dots (refer to the text for details).The inset illustrates the optimization process for the Lennard-Jones system.

Figure 3 :
Figure 3: The wave function of the helium dimer is shown, with light surface colors emphasizing regions of large probability amplitude in an arbitrary scale.The space coordinates are given in terms of rm = 2.963nm, where the potential is minimum.
a fit parameter adjusted for values of r up to the maximum value of ρ N (r) in the minimization of the integral (ρ N (r) − C (N ) 2 ρ 2 (r)) 2 dr.The results of the pair density functions for 4 He N clusters, ranging from N = 2 to N = 14 atoms, normalized by contacts C (N ) 2

Figure 5 :
Figure 5: Pair density functions of 4 HeN clusters with N = 2 to N = 14 atoms normalized with the appropriate contact C (N ) 2

Table 1 :
[30]cal hyperparameter values employed in the simulations.An open-source version of the code used in this work to optimize the trial functions is available[30].

Table 2 :
[21]tic ⟨T ⟩ and total ground state ⟨E⟩ energies obtained with the DNN ansatz in units of Kelvin are presented for a4He cluster of size N .The fourth column shows DMC results from the literature[21].