Automatic design of Hamiltonians

We formulate an optimisation problem of finding optimal Hamiltonians by analogy with the variational principle. Given a variational ansatz for a Hamiltonian, we construct a loss function as a weighted sum of relevant Hamiltonian properties, thereby specifying the search query. Using the fractional quantum Hall effect as a test system, we illustrate how the framework can be used to determine the generating Hamiltonian of a finite-size model wavefunction (Moore-Read Pfaffian and Read-Rezayi states) or to find optimal parameters for an experiment. We also discuss how the search for approximate generating Hamiltonians may be used to find simpler and more realistic models implementing a given exotic phase of matter via experimentally accessible interaction terms.


I. INTRODUCTION
In quantum physics one often starts by postulating a Hamiltonian of the system and studying its properties. Inventing a simple, minimal Hamiltonian capable of capturing a certain effect is an art and an important step towards the design and control of quantum systems.
Nature (including experiment) often operates with Hamiltonians that include a mix of such simple terms. Navigating this "Hamiltonian space" efficiently is crucial for designing a (numerical) experiment that best exhibits the physics we are after. In addition, because of the often non-trivial interplay of simple model terms, by combining them we may also find completely new physical effects, as exemplified by the recent discovery of the many-body-localised phase [1][2][3], found when disorder, hopping and interaction terms are mixed in an appropriate proportion.
A large number of numerical methods, such as exact diagonalisation (ED), density-matrix renormalisation group (DMRG), quantum Monte Carlo (QMC) and others, have been developed for solving quantum systems, so that when faced with a particular Hamiltonian we in most cases have a numerical tool that can help us solve it and, eventually, compare two Hamiltonians according to the relevant property. In this situation one may wonder whether we could systematically look for the Hamiltonian that maximises that property.
In this work we present a numerical framework that automates the search for Hamiltonians that are best in some user-defined sense. Examples of search queries are endless and may include questions like: "What is the most stable Hamiltonian with certain quasiparticle statistics?" or "What is the minimal approximation for a generating Hamiltonian of a certain wavefunction?" or "How do I combine the limited number of experimentally accessible Hamiltonian terms such that the many-body ground state is in the desired universality class?" The three applications we will illustrate are finding parent Hamiltonians for model states, "extrapolating" the model states to larger system sizes and identifying the optimal conditions for an experiment.
A special case, the problem of finding the generating Hamiltonian given an eigenstate, has attracted interest in the literature recently [4][5][6][7]. Given a wavefunction, and having chosen a set of admissible Hamiltonian operator terms {Ô^b_m}, one can non-iteratively construct a unique local Hermitian operator for which the original wavefunction is an eigenvector [4][5][6], provided that all the necessary operators are available. A more experimentally relevant approach [8] requires the measurement of sufficiently many locally-supported operators in an eigenstate or a Gibbs state but does not need access to the full wavefunction.
Numerical optimisation was used in Ref. [7] to show that for ground states of lattice Hamiltonians described by quantum field theories with emergent Lorentz invariance the parent Hamiltonian may be found iteratively, and in Ref. [9] to find an effective Hamiltonian that best reproduces a given spectrum or entanglement spectrum. The problem of finding a density matrix that fits a given reference density matrix [10][11][12] has been considered for lattice spin models and named the Quantum Boltzmann Machine. It may be used to approximate the Hamiltonian that generates a given density matrix.
Our approach is new in that it optimises a weighted combination of several terms as opposed to using only one of them. Additional terms in the loss function allow us to look for a Hamiltonian that is not only in the same phase of matter as the reference state but also has other favourable properties such as an energy gap or entanglement entropy. We can work with any continuous parametrisation and find a solution even when the set of available Hamiltonian terms is limited.
The freedom to combine several terms in the loss function allows for an essentially infinite number of possible applications. One of them, discussed in Section IV C, will help design quantum experiments that profit the most from the available numerical simulations and realistic Hamiltonian models. To our knowledge, this way of experiment/Hamiltonian design has not been discussed in the literature.
We describe our framework in Section II, briefly introduce fractional quantum Hall effect, our test system, in Section III and present the applications in Section IV.

II. THE FRAMEWORK

A. Hamiltonian parametrisation
We assume the existence of an ansatz for the Hamiltonian that continuously maps a parameter vector from a convex subregion of R^M into a valid Hamiltonian operator.
One way to achieve this is to linearly combine M "basis" operators {Ô^b_m} with real coefficients {γ_m}:

H({γ}) = Σ_{m=1}^{M} γ_m Ô^b_m.   (1)

This form is useful if we would like to consider Hamiltonians made of a certain type of terms (say, only translationally invariant Hamiltonians with nearest-neighbour couplings). In the experimental context the operators {Ô^b_m} may be all the operators that can be implemented in the particular experiment.
On the other hand as discussed in Sec. IV C the Hamiltonian parametrisation may be given by a theoretical model that constructs an effective Hamiltonian given values of experimentally relevant parameters.
In any case we assume some pre-defined general structure of the Hamiltonian such that the number of parameters grows only polynomially with the system size, not exponentially. In principle, the approach described below may also be applied when every matrix element in the exponentially large Hilbert space is considered a parameter, but we do not expect it to be practically useful for such parametrisations.
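As a minimal sketch of the linear parametrisation above, assuming toy 2 × 2 basis operators (Pauli matrices standing in for the problem-specific Hermitian terms {Ô^b_m}):

```python
import numpy as np

# Sketch of the linear ansatz H({gamma}) = sum_m gamma_m * O_m.
# The basis operators here are toy Pauli matrices, not the actual
# operator set of any particular application.
def hamiltonian(gamma, basis_ops):
    """Map a parameter vector gamma onto a Hamiltonian matrix."""
    return sum(g * op for g, op in zip(gamma, basis_ops))

basis = [
    np.array([[1.0, 0.0], [0.0, -1.0]]),  # sigma_z
    np.array([[0.0, 1.0], [1.0, 0.0]]),   # sigma_x
]
H = hamiltonian([0.5, 0.3], basis)
```

Since every basis operator is Hermitian and the coefficients are real, any parameter vector maps onto a valid (Hermitian) Hamiltonian automatically.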

B. Set of observables and numerical "back-end"
A particular application is defined by a set of Hamiltonian properties we are interested in. Eventually, we will be searching the space of Hamiltonians with the goal of finding the point that extremises a loss function LF made of a weighted sum of several observables. Below we define the observables used in this work; there are no restrictions on what they may be, other than that an available numerical tool must be capable of evaluating the observable for a fixed Hamiltonian.
Relative energy variance is defined for a given reference wavefunction |ψ_r⟩ and a Hamiltonian H_O as

(σ^rel_E)^2 = [⟨ψ_r|H_O^2|ψ_r⟩ − ⟨ψ_r|H_O|ψ_r⟩^2] / ⟨ψ_r|H_O|ψ_r⟩^2.   (2)

If |ψ_r⟩ is an eigenstate of H_O this quantity is zero. Otherwise its magnitude can be used as a measure of how close |ψ_r⟩ is to being an eigenstate of H_O. We will quote (σ^rel_E)^2 in our results; in the optimisation, however, we use a rescaled form of this quantity which ensures that the minimisation of the relative energy variance is not achieved by an increase in the ground-state energy.
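As a sketch, the relative energy variance can be evaluated directly from a Hamiltonian matrix and a reference state; the 3-level diagonal Hamiltonian below is a toy stand-in:

```python
import numpy as np

# Relative energy variance of a reference state psi with respect to H:
# it vanishes if and only if psi is an eigenstate of H.
def relative_energy_variance(H, psi):
    e = np.vdot(psi, H @ psi).real            # <H>
    e2 = np.vdot(psi, H @ (H @ psi)).real     # <H^2>
    return (e2 - e ** 2) / e ** 2

H = np.diag([1.0, 2.0, 3.0])                  # toy Hamiltonian
eigstate = np.array([0.0, 1.0, 0.0])          # exact eigenstate of H
mixed = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
```

For the exact eigenstate the variance is zero to machine precision, while the superposition of two eigenstates yields a strictly positive value.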
We use two measures of similarity between two wavefunctions. One is simply the absolute value of their overlap, or scalar product: |⟨ψ|ψ_r⟩|. Another measure can be defined if we view the two wavefunctions as probability distributions in the Hilbert space. The distance between such distributions can be defined as their relative entropy, also known as the Kullback-Leibler (KL) divergence:

D_KL(ψ_r ∥ ψ) = Σ_i |α_i|^2 ln(|α_i|^2 / |β_i|^2),   (3)

where α_i and β_i are the expansion coefficients of the two wavefunctions in a basis |b_i⟩ spanning the Hilbert space: |ψ_r⟩ = Σ_i α_i |b_i⟩ and |ψ⟩ = Σ_i β_i |b_i⟩. If the two wavefunctions are identical, their overlap is one and the KL divergence is 0. We note that the KL divergence depends only on the absolute values of the wavefunction coefficients and not on their signs; thus, its zero value is a necessary but not sufficient condition for two wavefunctions to be identical.
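A sketch of the KL divergence between two normalised wavefunctions; the second example illustrates the sign-blindness noted above (states differing only by signs of the coefficients give zero divergence):

```python
import numpy as np

# KL divergence of the probability distributions |alpha_i|^2 and |beta_i|^2
# obtained from two normalised wavefunctions expanded in the same basis.
def kl_divergence(psi_r, psi, eps=1e-30):
    p = np.abs(psi_r) ** 2
    q = np.abs(psi) ** 2
    mask = p > eps                 # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

a = np.array([1.0, 1.0]) / np.sqrt(2.0)
b = np.array([1.0, -1.0]) / np.sqrt(2.0)   # same amplitudes, flipped sign
```

Here kl_divergence(a, b) is zero even though the two states are orthogonal, so a vanishing KL divergence alone does not certify identity.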
In case one has access to the full finite-temperature density matrix of the system, the difference between the reference and any other density matrix may be measured by their relative entropy [10,12,13].
Further observables that will be used include: the ground-state energy E_0, the energy gap Δ_N for a particular system size N, the energy gap extrapolated to the thermodynamic limit Δ_extr, and the quantum number of a symmetry generator evaluated in the ground-state wavefunction.
In general, for any observable T({γ}) a term proportional to (T({γ}) − T_ref)^2 may be added to the loss function to select solutions where the value of the observable is close to the desired value T_ref.
The loss function LF will be a weighted sum of the chosen observables, each evaluated either for a particular finite system size N (t_N^k) or as a system-size-independent property (T^s):

LF = Σ_N G_N Σ_k w_k t_N^k + Σ_s w_s T^s,   (4)

where G_N are constant coefficients used to prioritise certain system sizes if needed (in most cases we set G_N proportional to the Hilbert-space dimension) and the weights w_k and w_s are hyperparameters defining the importance of every observable. The terms included in the loss function and their weights fully specify the question we are answering by minimising it. A regularisation term proportional to

Σ_m (γ_m − γ_m^ref)^2,   (5)

penalising deviations from a reference parameter set {γ_m^ref}, may be added to keep the solution close to a meaningful reference Hamiltonian. We can also define the desired search region in the parameter space by adding to the loss function a term that grows quickly outside of it. This may depend on the application and is especially useful when {γ_m} are experimentally tuneable parameters for which a natural range of allowed values exists.
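A minimal sketch of assembling such a weighted loss with a quadratic regulariser; the observable names, weights and reference parameters below are hypothetical:

```python
import numpy as np

# Loss = sum_N G_N * sum_k w_k * t_N^k + w_reg * sum_m (gamma_m - gamma_ref_m)^2.
def loss(observables, G, w, gamma, gamma_ref, w_reg):
    data_term = sum(G[N] * sum(w[k] * t for k, t in obs.items())
                    for N, obs in observables.items())
    reg_term = w_reg * float(np.sum((np.asarray(gamma)
                                     - np.asarray(gamma_ref)) ** 2))
    return data_term + reg_term

# Hypothetical observables for one system size N = 8: an energy variance
# to minimise and an overlap entering with a negative weight (to maximise).
obs = {8: {"variance": 0.1, "overlap": 0.9}}
val = loss(obs, G={8: 1.0}, w={"variance": 10.0, "overlap": -1.0},
           gamma=[1.0, 0.2], gamma_ref=[1.0, 0.0], w_reg=0.5)
```

Quantities to be maximised (overlaps, gaps) enter the minimised loss with negative weights; the regulariser pulls the parameters towards the reference Hamiltonian.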
In this work we use exact diagonalisation to evaluate the observables of a given Hamiltonian. Any other numerical tool suitable for a particular application may be used instead. We note that exact diagonalisation is the most computationally intensive such method, so switching to approximate methods can only make the computation more feasible.

C. Optimisation problem
We formulate the optimisation problem as follows: find the set {γ_m} from a convex subregion of R^M that minimises the loss function defined in Eq. 4.
If possible, the loss function should be designed such that the optimisation problem is convex and has a unique solution. A prominent feature of non-convex problems is that multiple solutions may exist and finding the global minimum may be very hard. Although this may appear problematic from an abstract mathematical point of view, even in this case we can obtain practically useful physical results by following several simple strategies. The situation is analogous to the original variational principle, which remains useful despite the fact that the optimisation problem might not be convex for some choices of the Hamiltonian and ansatz wavefunctions.
First, a non-convex function may be convex on a subdomain. A simple example is the cosine, which has many minima on (−∞, +∞) but only one on [0, 2π]. Adding a regularisation term (Eq. 5) to the loss function that penalises deviations from a certain reference Hamiltonian is a way to profit from this property when a meaningful reference Hamiltonian exists. In the general case one can always use the trivial zero as a reference, turning the search into a search for the simplest, minimal model. In the experimental context there is often a well-defined preferred region of experimental parameters, and a point in that region can be used as a reference Hamiltonian. In these cases the problem formulation becomes: "What is the minimal deformation of the reference Hamiltonian that optimises the properties of interest?"
In presence of a symmetry there may exist several points in the parameter space that would produce equally good solutions. Such symmetry should be explicitly broken s.t. the loss function prefers only one of the symmetry-related Hamiltonians.
Constructing the loss function out of multiple physical variables is another way to increase the chance of a unique solution. Suppose there are several Hamiltonians that have high overlap with a certain wavefunction. It is less likely that they will also yield identical ground state energy and energy variance.
Various further techniques have been developed in the field of non-convex optimisation. For example, one can start the optimisation from several different points and pick the lowest-loss solution should the converged results differ. Evaluating the Hessian matrix may be used to escape a saddle point. If stuck in a deep local minimum, one can add some "temperature" over a few parameter updates, which may help escape from it.
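The multi-start strategy can be sketched with plain gradient descent on a toy double-well loss (all numbers illustrative):

```python
# Multi-start: run a local optimiser from several starting points and keep
# the lowest converged minimum. The toy loss (x^2 - 1)^2 + 0.1*x has two
# local minima near x = +/-1; the global one sits at negative x.
def toy_loss(x):
    return (x * x - 1.0) ** 2 + 0.1 * x

def toy_grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * toy_grad(x)
    return x

minima = [descend(x0) for x0 in (-1.5, -0.5, 0.5, 1.5)]
best = min(minima, key=toy_loss)
```

Starts in the right-hand basin converge to the shallower local minimum; picking the lowest-loss result among all runs recovers the global one.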
Finally, for some applications finding any solution may be of great value despite the fact the found solution might not be unique.

D. Optimisation algorithm
In this work we use the non-linear conjugate gradient descent (CGD) algorithm [14][15][16] to solve the optimisation problem set up above. Other optimisation schemes may be used as appropriate for the particular application.
Conjugate gradient descent is an iterative first-order method in which the iterations start from an initial guess for the solution {γ^0_m}. In the absence of any prior information such a guess can be generated randomly. Next, the gradient of the loss function w.r.t. all of the {γ_m} is calculated. Where possible, the partial derivatives of the loss function should be calculated analytically. If that is not possible, the gradient can be estimated numerically by evaluating the loss function at small displacements from the current point. In this worst (latter) case the computational effort of the step grows linearly with the dimension M of the parameter space.
Once the gradient direction is calculated, a step is made in the opposite direction. The size of the step is chosen s.t. it minimises the loss function the most, which we implement using golden-section line search [17]. Further updates move in the "conjugate" direction, which admixes the history of previous updates to the direction of the current gradient (see Sec. IV C for an illustration). In most of this work we use the Hestenes-Stiefel formula [18] for calculating the conjugate direction, whereas the Polak-Ribière scheme [19] was used in Section IV C. We reset the step direction to pure steepest descent every M steps.
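The loop described above can be sketched as follows, with a central-difference gradient, a golden-section line search for the step size, the Hestenes-Stiefel update and a periodic reset; the narrow quadratic valley used as the loss is a toy stand-in for Eq. 4:

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    # Central-difference gradient: 2M loss evaluations for M parameters.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def golden_section(f, a, b, tol=1e-8):
    # Golden-section search for the minimiser of f on [a, b].
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - phi * (b - a), a + phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return 0.5 * (a + b)

def cg_minimise(f, x0, iters=100):
    x = np.array(x0, dtype=float)
    g = num_grad(f, x)
    d = -g
    for it in range(iters):
        step = golden_section(lambda s: f(x + s * d), 0.0, 1.0)
        x = x + step * d
        g_new = num_grad(f, x)
        if np.linalg.norm(g_new) < 1e-6:
            break
        if (it + 1) % len(x) == 0:
            beta = 0.0                  # periodic reset to steepest descent
        else:
            y = g_new - g               # Hestenes-Stiefel formula
            beta = (g_new @ y) / (d @ y)
        d = -g_new + beta * d
        g = g_new
    return x

# A narrow valley: the classic case where conjugate directions
# outperform plain steepest descent.
xmin = cg_minimise(lambda x: x[0] ** 2 + 100.0 * x[1] ** 2, [1.0, 1.0])
```

Restricting the line search to [0, 1] is an assumption of this sketch; in general the bracket should be chosen from the problem's natural scales.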

III. TEST SYSTEM: FRACTIONAL QUANTUM HALL EFFECT
In this work we illustrate a few applications of the described framework using fractional quantum Hall states and Hamiltonians, which we briefly introduce here.
The fractional quantum Hall effect [20,21] is due to strongly-correlated many-electron states formed in a 2D electron gas subject to a perpendicular magnetic field. If the field is strong, the resulting Landau levels are well separated and the non-trivial physics can be approximately described by the Coulomb interaction within the topmost partially filled Landau level. Thus, ignoring dimensional constants, the Hamiltonian in real space may be written simply as

H = Σ_{i<j} 1/|r_i − r_j|.   (6)

To avoid having boundaries that can lead to severe finite-size effects, one often works on a spherical shell enclosing a magnetic monopole of strength 2S [22], where the solution of the one-particle Schrödinger equation is given by the generalised spherical harmonics Y_Slm [21]. The angular momentum projection m ∈ [−l, l] is the only "free" quantum number for a fixed system size and Landau level index. Thus a many-body fermionic state can be thought of as a one-dimensional chain of zeros and ones corresponding to the presence or absence of an electron in a certain "orbital". The Coulomb interaction corresponds to a (momentum-conserving) pairwise interaction between all the filled orbitals. A very useful way of parametrising any momentum-preserving interaction in this setting is given by the Haldane pseudopotentials V_m [22,23]:

H = Σ_m V_m Σ_{i<j} P̂^(2b)_m(i, j),   (7)

where the operators P̂^(2b)_m project each pair of electrons into a state with a fixed total angular momentum m. The set of numbers V_m thus defines an interaction, and the pseudopotentials corresponding to the Coulomb interaction are well known [21]. For the spin-polarized systems we are concerned with in this work, only odd values of m in the sum (7) are allowed due to fermionic statistics. Note that in addition to the 2-body terms one may consider 3-body terms that project triples of electrons into a state of fixed total angular momentum.
For optimisation purposes we may use the pseudopotentials V_m as our variational parameters γ_m and the projectors P̂^(2b)_m as the set of allowed operators Ô^b_m. To stabilise the optimisation we would like to exclude the trivial transformations of the Hamiltonian: adding to it, or multiplying it by, an overall constant. Such transformations do not change the nature of the many-body ground state. In most cases we avoid this problem by freezing the value of one of the pseudopotentials to 1. In cases when this might introduce an unphysical bias we instead fix the sum of the absolute values of all pseudopotentials.
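The scale freedom excluded above can be illustrated with toy diagonal projectors (not the actual two-body angular-momentum projectors on the sphere): rescaling all V_m leaves the ground state unchanged, which is why one coefficient can be frozen.

```python
import numpy as np

# Toy pseudopotential-style Hamiltonian H(V) = sum_m V_m * P_m with
# orthogonal diagonal projectors standing in for the two-body projectors.
P = [np.diag([1.0, 0.0, 0.0]),
     np.diag([0.0, 1.0, 0.0]),
     np.diag([0.0, 0.0, 1.0])]

def H(V):
    return sum(v * p for v, p in zip(V, P))

def ground_state(h):
    vals, vecs = np.linalg.eigh(h)    # eigenvalues in ascending order
    return vecs[:, 0]                 # eigenvector of the lowest eigenvalue

g1 = ground_state(H([1.0, 2.0, 3.0]))
g2 = ground_state(H([2.0, 4.0, 6.0]))  # same V up to an overall factor
```

The two ground states coincide up to a phase, confirming that an overall scale carries no physical information about the ground state.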
Spherical geometry is used throughout this work. The valid many-body states on the sphere are eigenvectors of the total angular momentum L and the requirement that the state is spatially uniform corresponds to the condition L = 0. A term w L L 2 may be added to the loss function to enforce that the ground state of the Hamiltonian has L = 0.

IV. APPLICATIONS
A. Finding (approximate) generating Hamiltonians from an eigenstate / finding simpler, more realistic ways to implement a given phase of matter

Given a reference wavefunction |ψ_r⟩ we would like to find the best approximation to its generating Hamiltonian, expanded as a linear combination of a given set of Hermitian operators {Ô^b}: H = Σ_m γ_m Ô^b_m. The generating Hamiltonian must approximate the reference wavefunction with its ground state |ψ_o⟩. Besides finding actual parent Hamiltonians, this formulation can also be used to find simpler realistic Hamiltonians that implement the same phase of matter as the true parent Hamiltonian but involve only the chosen set of terms {Ô^b} that can be implemented in a particular experimental setting or are preferred on other grounds.
We define an approximation to the generating Hamiltonian as the Hamiltonian that minimises the loss function from Eq. 4, where the largest weights are given to the overlap ⟨ψ_o|ψ_r⟩, the KL divergence and the energy variance. Besides overlap and KL divergence, other terms suitable for a particular case may be used to encourage the proximity of the final solution to the desired phase. In situations where the solution is not unique (as may be the case for non-local Hamiltonians) we take the solution closest to a given reference interaction as the answer.
By construction, the Hamiltonian found approximates the desired model wavefunction with its ground state, as opposed to a state in the middle of the spectrum. Besides, since we are looking for an approximate generating Hamiltonian, there will always be at least one solution and we do not risk finding a symmetry operator instead of the Hamiltonian. These features represent the major distinction between our framework and the covariance-matrix-based methods described in the literature recently [4][5][6]. On the other hand, we make no assumptions about the relation between the parent Hamiltonian and the reduced density matrix, as opposed to Ref. [7]. The generality of our method comes at the price of a potentially higher computational cost, and it may be reasonable to attempt one of the methods described in Refs. [4][5][6][7] first, should the system in question satisfy the respective assumptions. Moreover, the results of a covariance-matrix-based method [4][5][6] may be used as the starting point for the iterations in our approach, although we did not do so here. Finally, as we demonstrate here, our method performs well in situations when not all Hamiltonian terms needed for exact state generation are available.

Moore-Read Pfaffian state

The Moore-Read Pfaffian state [24] (MR Pfaffian) is one of the candidates to explain the ν = 5/2 FQHE. It describes a gapped topological strongly-correlated phase of electrons with quasiparticle excitations governed by non-Abelian braiding statistics. The MR Pfaffian state can be exactly generated as the ground state of the 3-body interaction in which only the single 3-body pseudopotential corresponding to the closest approach of 3 electrons, V^(3b)_{M=3}, is non-zero. In this section we test whether we can establish this fact automatically by means of optimisation.
Let the 12-electron MR Pfaffian wavefunction on the sphere be the reference wavefunction for which the generating Hamiltonian is to be determined. For this system m ∈ [−10.5, 10.5] and the Hilbert space dimension is 16'660. We consider the set {Ô^b} that includes all 2-body and the eleven lowest 3-body pseudopotentials. Even-M 2-body and M = 0, 1, 2, 4 3-body pseudopotentials are excluded by fermionic statistics for spin-polarized systems. The 3-body pseudopotentials corresponding to M = 9, 11, 12, 13, 14 are multi-valued and we parametrise each of them with 2 diagonal entries of a 2 × 2 matrix. In total we thus have 11 2-body and 16 3-body free parameters, and the dimension of the parameter space in which we search for the generating Hamiltonian is 27. The loss function LF was mainly composed of the overlap, KL-divergence, regularisation and energy-variance terms. The pseudopotentials are initialised with random numbers uniformly distributed between 0 and 1, which means we input no prior knowledge of the expected form of the resulting Hamiltonian.
The convergence towards the correct solution is shown in Fig. 1. After 45 steps all the "unnecessary" pseudopotentials are below 10^−2 while the value of V^(3b)_{M=3} remains finite, recovering the known generating Hamiltonian.

We have demonstrated that in cases when the chosen set {Ô^b} includes all the operators necessary for exactly generating a wavefunction we are able to find the exact generating Hamiltonian. In the remainder of this Section we consider a more realistic problem formulation: find the Hamiltonian that generates the best approximation to the desired wavefunction given that the accessible operator basis {Ô^b} does not include all the terms needed to generate that wavefunction exactly.
To illustrate this scenario we ask what minimal 2-body Hamiltonian best approximates the MR Pfaffian with its ground state. The loss function includes contributions for system sizes between 6 and N_e^{max,tr} electrons (with N_e^{max,tr} between 10 and 16). To fix the energy scale we fix V_1 = 1. The optimisation results are summarised in Table I. We find that all the pseudopotentials V_m converge to near-zero values for m > 3. The value of the V_3 pseudopotential converges to values between 2.5 and 3, depending on N_e^{max,tr}. Note that the possibility of approximately generating the MR Pfaffian using only the two 2-body pseudopotentials V_1 and V_3 has been discussed in the literature (most recently in [25]), where the ratio of the two pseudopotentials was found to extrapolate to 3 as the system size increases to infinity. Thus our results are consistent with prior literature, whereas the overlaps we find are slightly higher than those in Ref. [25] (see Table I).

Read-Rezayi state
The Read-Rezayi state [26] (RR) is a possible description of the ν = 13/5 FQHE (its particle-hole conjugate (cRR), of the ν = 12/5 FQHE). It is a gapped topological phase with quasiparticle excitations governed by non-Abelian braiding statistics complex enough to implement universal quantum computing [27]. The RR state can be exactly generated as the ground state of the 4-body Hamiltonian with only the contact 4-body interaction turned on. Here we again attempt to approximate the generating Hamiltonian of the RR state with only 2-body Hamiltonian terms. While looking for the minimal generating model we find that all V_m for m > 7 are irrelevant and converge to zero. The results for the optimal values of the remaining 3 pseudopotentials are presented in Table II.
The possibility of producing an approximation to the RR state as the ground state of a two-body Hamiltonian has been discussed in the literature recently [28,29]. In Ref. [29] an extensive parameter scan of the 3 lowest pseudopotentials (V_1, V_3, V_5) was performed, and in Ref. [28] an analytic mean-field calculation. It was found that an approximation to the RR state can be generated using the lowest 3 pseudopotentials such that V_1 : V_3 : V_5 = 6 : 3 : 1 in the thermodynamic limit. The values we find are in agreement, which is another test of our procedure.
B. Extrapolating ground states to higher system sizes

Another application comes as a natural extension of, and test for, finding the generating Hamiltonians. In general, the Hamiltonian we find is approximate.
Intuitively, we expect that if we used system sizes of up to N_e^{max,tr} electrons to learn the Hamiltonian, then a good-approximation Hamiltonian would produce ground states in the same phase of matter also for higher system sizes.
To quantify this we calculate the overlap between the true model wavefunction (|ψ_r⟩) and the ground state of the learned optimal Hamiltonian (|ψ_o⟩) at system sizes beyond those used in training (N_e^{test} > N_e^{max,tr}). A Hamiltonian giving high overlaps has high "predictive power", and thus the optimisation scheme can be used to "extrapolate" the ground states in a certain phase of matter to higher system sizes.
Here we investigate how successful this program may be in the (more general) case when the Hamiltonian ansatz does not include the terms needed to generate the desired wavefunction exactly. Throughout this section we use the 2nd-Landau-level Coulomb interaction as the reference, and thus we are finding the minimal deformation of the Coulomb interaction that reproduces the model wavefunctions.
The loss function is mainly designed to optimise the overlaps and the energy variances of the "training" system sizes (6 to N_e^{max,tr} for the MR Pfaffian). An additional importance factor on the order of 1000 is assigned to the largest training system N_e^{max,tr}. As a result the optimisation mainly seeks to improve the overlap and energy variance for the largest training system.
We expect the predictive power of the model to increase with the maximum size of the wavefunctions used in the learning phase. This is the trend we observe for the MR Pfaffian (Table III) and RR (Table IV) states: the overlap and the energy variance improve as the maximum training system size increases. Finite-size effects may disturb this tendency for small system sizes, as can be seen in the example of the N_e^{max,tr} = 12 MR Pfaffian state. The Hamiltonian we learn can only reproduce the correlations present in the training systems. Should all the training systems be too small to fully accommodate a certain phase, the learned Hamiltonian will fail to approximate the ground states at higher system sizes where all the correlations are fully manifested. This is the situation we observe for the RR state (see the first column of Table IV): if only RR wavefunctions for 4, 6 and 8 holes (N_φ = 12, 17 and 22) are used, the learned optimal Hamiltonian can only poorly approximate the RR state at higher sizes. Thus, if extrapolation to higher system sizes is the goal, one must train on system sizes beyond the characteristic correlation length of the phase.
Overall we conclude that the framework can be used to construct a good approximation of a wavefunction at a higher system size by "learning" the generating Hamiltonian on a smaller system and diagonalising it for a larger one. An important prerequisite is that the training system is large enough to have developed all the correlations characteristic of the underlying phase of matter.

C. Finding optimal experimental conditions
Here we illustrate another important application of the framework that can be useful for both theory and experiment.
Suppose we have a quantum system for which we can control a few (experimental) parameters. In addition, assume there is a theoretical model that translates these parameters into a Hamiltonian. How do we choose the values of these parameters such that the system has the best possible value of a certain observable of interest?
As an example, let us consider the fractional quantum Hall effect at filling factor ν = 12/5. The two experimental parameters will be the finite width of the quantum well in which the quasi-2D electron gas resides, parametrised by w, the width of an equivalent infinitely deep quantum well (details on this parametrisation can be found in the appendix of Ref. [30]), and the strength of Landau level mixing κ, which is directly controlled by the strength of the applied magnetic field: κ ∝ 1/√B. As the model translating these two experimental parameters into a Hamiltonian we use the continuous version of the Landau level mixing (LLM) model derived in Refs. [31][32][33]. In particular, we use the pseudopotential corrections presented in Ref. [33] (and used in Refs. [30,34]) for the five values of the finite width (w = 0, 1, 2, 3, 4) and interpolate the pseudopotential corrections by a 4th-order polynomial in w. In a nutshell, the model [33] provides the values of the 2- and 3-body pseudopotentials as a function of the effective width and LLM strength: V_m = V_m(w, κ). Now, instead of varying the pseudopotentials as in the previous sections, we vary the effective width and LLM strength, which are mapped to a Hamiltonian H = H(w, κ) through the continuous version of the model [33].
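The interpolation step can be sketched as follows; the tabulated correction values below are made up for illustration and are not the actual corrections of the LLM model [33]:

```python
import numpy as np

# Pseudopotential corrections tabulated at the five widths w = 0..4 are
# turned into a continuous function of w by a degree-4 polynomial, which
# passes exactly through the five tabulated points.
widths = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
corrections = np.array([0.10, 0.08, 0.05, 0.03, 0.02])  # hypothetical values
coeffs = np.polyfit(widths, corrections, deg=4)

def correction(w):
    return float(np.polyval(coeffs, w))
```

With five data points a degree-4 fit is an exact interpolation, so the continuous V_m(w) reproduces the tabulated values and smoothly connects them in between.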
The loss function is defined as a weighted combination of the neutral gap extrapolated to infinite system size (weight −100), the overlap with the model non-Abelian particle-hole conjugate of the Read-Rezayi state (weight −3.7) and the total angular momentum quantum number L measured in the ground state (weight 2). So the question we are asking is: "What are the experimental parameters w, κ most suitable for observing the most stable (high neutral gap) non-Abelian physics (high overlap with RR)?" Including L in the optimisation ensures that we are looking for spatially uniform states with L = 0, which is a basic property of all valid FQHE states.
To visualise the search in this 2-parameter space we first calculate the relevant observables (for system sizes of 4, 6, 8 and 10 electrons) on a 2D grid of 10 × 11 = 121 points. The optimisation landscape defined by the loss function is shown in Fig. 2, where we observe the optimum around (w, κ) = (3, 0.2). The plots of each individual observable (L, extrapolated gap and RR overlap) can be found in Fig. 3 of the Supplementary Materials (Sec. VII).
The overlap (the overall loss function) has a maximum (minimum) along a narrow stripe around κ ≈ 0.2. This creates a long valley, the generic situation which originally motivated the development of the conjugate gradient descent algorithm. By taking into account the gradient of the previous step, conjugate gradient descent is able to move along the valley, reaching the optimal point on the 5th step. Simple gradient descent, on the other hand, keeps bouncing between the sides of the valley (see the bottom panel of Fig. 2) and does not reach the optimal point even after 24 steps.
In this 2-parameter example the number of loss-function evaluations (118) was comparable to that of the grid search (121). However, the optimal point was found to machine precision, whereas the grid search would only determine it up to the grid step. The real advantage comes in situations when more parameters need to be optimised, where the cost of conjugate gradient descent scales linearly with the parameter-space dimension (through the gradient estimation) as opposed to the exponential scaling of the grid search.
Note that in this example we have limited the range of the parameters κ and w to the region where the model is expected to be valid, [0, 1] × [0, 3], by setting the loss function to predefined unfavourable values outside this region, which drives the optimisation procedure away from it. In the same manner, in a general setting, the parameter values may be limited such that we find the best solution within the experimentally accessible region of the parameter space.
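Such a box constraint can be sketched as follows, assuming the validity region κ ∈ [0, 1], w ∈ [0, 3] quoted above (the penalty value is an arbitrary large constant):

```python
# Outside the model's validity region the loss is replaced by a large
# unfavourable value, steering the optimiser back into the region.
def constrained_loss(kappa, w, raw_loss, penalty=1.0e6):
    if not (0.0 <= kappa <= 1.0 and 0.0 <= w <= 3.0):
        return penalty
    return raw_loss(kappa, w)

inside = constrained_loss(0.5, 1.0, lambda k, w: k + w)    # normal evaluation
outside = constrained_loss(2.0, 1.0, lambda k, w: k + w)   # penalised
```

A smoothly growing penalty (rather than a hard constant) is friendlier to gradient-based optimisers, but the hard version conveys the idea.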

V. CONCLUSIONS
We have presented a general optimisation framework for finding optimal Hamiltonians, given a variational ansatz for the Hamiltonian and a numerical tool able to extract the quantity of interest from it. Combining multiple properties in the loss function allows for a virtually infinite number of possible applications, four of which are discussed here: finding parent Hamiltonians for model wavefunctions; searching for simpler, experimentally accessible Hamiltonians implementing a given phase of matter; extrapolating model wavefunctions to higher system sizes; and finding optimal experimental parameters.
We expect our approach may be useful both in theoretical studies, for discovering new special Hamiltonians, and in experiment-oriented numerical studies aiming to find models best suited to describing an experiment or to design an experiment s.t. it is optimal in a given way. Validation of the Hamiltonians implemented in quantum hardware may potentially also be performed within this framework.

VI. ACKNOWLEDGEMENTS
The simulations presented in this article were performed on computational resources managed and supported by Princeton's Institute for Computational Science & Engineering and OIT Research Computing. KP was supported by the Swiss National Science Foundation through the Early Postdoc.Mobility grant P2EZP2 172168.

[Supplementary figure: optimal pseudopotentials used in Table III (top panel) and in Table IV (bottom panel) of the main text.]