High-accuracy Hamiltonian learning via delocalized quantum state evolutions

Learning the unknown Hamiltonian governing the dynamics of a quantum many-body system is a challenging task. In this manuscript, we propose a possible strategy based on repeated measurements on a single time-dependent state. We prove that the accuracy of the learning process is maximized for states that are delocalized in the Hamiltonian eigenbasis. This implies that delocalization is a quantum resource for Hamiltonian learning, which can be exploited to select optimal initial states for learning algorithms. We investigate the error scaling of our reconstruction with respect to the number of measurements, and we provide examples of our learning algorithm on simulated quantum systems.


Introduction
Thanks to the enormous progress in manufacturing and controlling quantum devices made out of an increasingly large number of qubits, we are now entering the era of Noisy Intermediate-Scale Quantum technology [1]. Relevant progress has been made in controlling quantum degrees of freedom on different platforms [2][3][4]. However, the true Hamiltonian governing the dynamics of these systems is often, at least partially, unknown. In this context, the major challenge is then to infer a realistic Hamiltonian model of the quantum system that can match the experimental data, guided by physical intuition. By querying the device (assumed a black box), one can measure the time evolution of several observables in order to learn the system Hamiltonian. This process, known as Hamiltonian learning, has been fundamental over the years for the validation of theoretical models, where the observation of the system evolution aims to characterize the unknown parameters of the model and establish its plausibility, and is now attracting the attention of the scientific community due to its relevance for quantum technologies and quantum computation. Applications of Hamiltonian learning range from the verification of the performances of quantum devices [29][30][31][32][33], to quantum error correction [34], to the design, characterization and calibration of quantum devices [26,27,34].

Figure 1: The larger the capability of the state ρ(t) to explore the space of states S, the larger the amount of information obtained through the learning process and, consequently, the smaller the uncertainty in reconstructing the Hamiltonian.
The performance of a Hamiltonian learning algorithm is determined by the scaling of the relative uncertainty of the reconstructed Hamiltonian as a function of the computational effort required, which, in many relevant situations, is given by the number N_S of experiments, or shots. Each shot begins with a state preparation and ends with a measurement in the computational basis. To fully reconstruct a quantum Hamiltonian, its action on a basis of the Hilbert space must be known. As a result, a number of initial states that is exponential in the system size is needed for a full process tomography, leading to an exponential overhead in the number of shots required for learning.
When the Hamiltonian is a linear combination of local interactions, it has been proven that a polynomial number of experiments is sufficient to fully reconstruct its couplings [35]. Significant examples in this direction are the reconstruction of the system Hamiltonian from short-time evolutions [25] and the exploitation of symmetries of the unknown Hamiltonian [26]. In these works, locality is a resource for the learning process.
In this paper, we focus on properties of the state (rather than the Hamiltonian) and we propose a novel algorithm that requires measurements of a single time-dependent quantum state. The main idea is to take advantage of quantum superposition. As depicted in Figure 1, indeed, some states are able to explore a significant portion of the space of states S during their time evolution and therefore encode enough information to completely reconstruct the Hamiltonian generating their dynamics. Using analytical arguments and numerical examples, we show that the accuracy of the proposed method is related to the delocalization of the initial state over the Hamiltonian eigenstates. To this aim, we prove an analytical relationship between the inverse participation ratio (IPR) [36][37][38] of the initial state and the information matrix that measures the amount of information acquired in the learning process. Equally weighted superpositions of the Hamiltonian eigenstates explore a large sample of the space of states and provide the maximum amount of information about the system Hamiltonian. In other words, delocalization is a resource for Hamiltonian learning. This opens a new perspective on the application of quantum information theory to the study of out-of-equilibrium quantum systems [36,39]. As a proof of concept, we apply our method to learn the Hamiltonian of systems of a few superconducting qubits, highlighting its relevance to gate-based quantum computation. In this setting, we exploit state tomography to define a simple algorithm that clarifies the relationship between delocalization and learning. Full state tomography requires an amount of resources exponential in the system size; however, scaling to large system sizes is beyond the scope of this manuscript.

Hamiltonian learning algorithm
To fully represent the information content of the system state, we define a basis B = {O_α} for the space of Hermitian operators, orthonormal with respect to the Hilbert-Schmidt product (A, B) = Tr(AB). In this basis, the system density matrix ρ(t) can be expanded as

ρ(t) = Σ_α r_α(t) O_α,

where the r_α(t) = Tr(O_α ρ(t)) are the components of ρ(t) over B and are the expectation values of the observables O_α over the state ρ(t). If we measured the coefficients r_α(t_n) at a collection of N_T times {t_n} ≡ {0, δt, . . . , (N_T − 1)δt}, we would have a time-dependent state tomography. Repeating each measurement N_M times, the uncertainty on each observable is σ(r_α(t_n)) = √(Tr(O_α² ρ(t_n))/N_M). In superconducting quantum processing units, measurements are always performed in the computational basis, i.e., in the basis of simultaneous eigenstates of the single-qubit operators σ_i^z. Measuring observables that do not commute with σ_i^z, such as σ_i^{x,y}, can be done by first rotating the system state so that σ_i^{x,y} → σ_i^z. This means that we need to measure in 3 bases per qubit in order to have full information about the system state. As a consequence, in a system of N_q qubits the number of experiment shots needed for a full state tomography at all times scales as N_S = 3^{N_q} N_T N_M, exponential in the number of qubits.

In general, the unknown Hamiltonian is a Hermitian operator that can be expanded in the (exponentially large in the system size) basis B. However, realistic Hamiltonians often have symmetries and/or locality constraints that can be identified by resorting to first-principle theoretical models [40]. Hence, we can write the unknown Hamiltonian as the span of a set of local Hermitian and traceless operators B_L = {L_i} that represent the relevant interactions between the constituents of the system: H = Σ_i h_i L_i. Remarkably, the locality constraint implies that the number of L_i in B_L is at most polynomial in the system size [28].
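As an illustration of the expansion above, the following Python sketch builds the orthonormal Pauli basis for N_q qubits and converts between a density matrix and its components r_α = Tr(O_α ρ). The function names are ours, introduced only for illustration:

```python
import itertools
import numpy as np

# Single-qubit Pauli matrices
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_basis(n_qubits):
    """Orthonormal Hermitian basis O_alpha = P_alpha / sqrt(2^Nq),
    so that Tr(O_a O_b) = delta_ab (Hilbert-Schmidt product)."""
    ops = []
    for labels in itertools.product("IXYZ", repeat=n_qubits):
        P = np.array([[1.0 + 0j]])
        for l in labels:
            P = np.kron(P, PAULIS[l])
        ops.append(P / np.sqrt(2 ** n_qubits))
    return ops

def tomography_components(rho, basis):
    """r_alpha = Tr(O_alpha rho): components of the state over the basis."""
    return np.array([np.trace(O @ rho).real for O in basis])

def state_from_components(r, basis):
    """Invert the expansion: rho = sum_alpha r_alpha O_alpha."""
    return sum(r_a * O for r_a, O in zip(r, basis))
```

Since the basis is orthonormal, expanding and re-summing reproduces any Hermitian ρ exactly.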
We want to exploit the information extracted from the state evolution via a full tomography to learn the couplings h_i of the system Hamiltonian H. Assuming that the system evolves unitarily according to the Liouville-von Neumann equation ρ̇ = −i[H, ρ] (here and in the following ħ = 1), the system state at a time t_{n+1} is related to ρ(t_n) via the equation

ρ(t_{n+1}) − ρ(t_n) + i[H, ρ(t_n)] δt = R_n,    (1)

where R_n = ρ̈(t*) δt²/2 is the remainder of the Taylor expansion at first order of ρ(t_{n+1}), for some t* ∈ [t_n, t_{n+1}].
In an ideal experiment, δt could be made arbitrarily small and the uncertainty on the states ρ(t_n) would vanish as well. As a consequence, the optimal couplings h_i^(opt) are the ones that minimize the Frobenius norm of the left-hand side of Eq. (1) at each time; summing over times and dividing by δt, this amounts to minimizing the cost function

f(h) = Σ_n ‖ (ρ(t_{n+1}) − ρ(t_n))/δt + i[Σ_i h_i L_i, ρ(t_n)] ‖²,    (2)

where ‖·‖ denotes the Frobenius norm. The optimal couplings are such that the gradient of f(h) is null, i.e., they solve the linear system Σ_j V_ij h_j^(opt) = B_i, with

V_ij = Σ_n Tr( i[L_i, ρ(t_n)] i[L_j, ρ(t_n)] ),    B_i = −Σ_n Tr( i[L_i, ρ(t_n)] (ρ(t_{n+1}) − ρ(t_n))/δt ).    (3)

We refer to the matrix V_ij as the TQCM. When the TQCM is invertible these couplings can be written as

h_i^(opt) = Σ_j (V⁻¹)_ij B_j.    (4)

Eq. (4) can serve our goal if the kernel of V_ij is empty; otherwise, the experimental data are insufficient to specify the system Hamiltonian, meaning that different Hamiltonians can produce the same observables.
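The least-squares step can be sketched numerically. The snippet below is a minimal illustration, assuming the stationarity conditions take the linear form Σ_j V_ij h_j = B_i with commutator expressions as in our reading of the minimization above; the function name and signature are ours:

```python
import numpy as np

def commutator(A, B):
    return A @ B - B @ A

def learn_couplings(states, dt, L_ops):
    """Least-squares estimate of the couplings h from state snapshots.

    Assumes (our reading of the minimization):
      V_ij = sum_n Tr( i[L_i, rho_n] i[L_j, rho_n] )   (TQCM)
      B_i  = -sum_n Tr( i[L_i, rho_n] (rho_{n+1} - rho_n)/dt )
    and solves V h = B.
    """
    l = len(L_ops)
    V = np.zeros((l, l))
    B = np.zeros(l)
    for n in range(len(states) - 1):
        rho = states[n]
        drho = (states[n + 1] - rho) / dt
        C = [1j * commutator(L, rho) for L in L_ops]  # Hermitian operators i[L_i, rho]
        for i in range(l):
            B[i] -= np.trace(C[i] @ drho).real
            for j in range(l):
                V[i, j] += np.trace(C[i] @ C[j]).real
    return np.linalg.solve(V, B), V
```

Given exact snapshots of a two-level evolution generated, for instance, by H = 0.7 X + 0.3 Z, solving V h = B recovers the couplings up to an O(δt) systematic error from the Taylor remainder.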

Uncertainty estimation
In real experiments, the expectation values r_α(t_n) are affected by statistical uncertainties. Moreover, the time interval δt between measurements cannot be reduced to zero, and the remainder R_n has to be considered as a systematic source of error for the estimated derivatives (ρ(t_{n+1}) − ρ(t_n))/δt. These contributions determine a total uncertainty δB_i on the vector B_i. As a consequence, the Hamiltonian couplings are known up to an uncertainty δh_i, and any Hamiltonian H' = Σ_i h'_i L_i with |h'_i − h_i^(opt)| < δh_i is compatible with the dynamics observed through the measurement of the local expectation values r_α(t_n). Remarkably, since we perform tomography even at the initial time, state preparation errors do not directly affect the uncertainty on the reconstructed Hamiltonian.
As we show in Appendix A, both the statistical and the systematic part of the uncertainty δB_i can be upper-bounded by a function that does not depend on the specific evolution of the system, but only on the Hamiltonian norm, the number of measurement repetitions N_M, the number of time steps N_T and the number of qubits N_q. Relating the uncertainty δB_i to the uncertainty δh_i through Eq. (4),

δh_i = Σ_j (V⁻¹)_ij δB_j,    (5)

and considering the relationship between the vector norm of the exact couplings h and the operator norm ‖H‖_op, we can extend this bound to the relative error on the reconstructed Hamiltonian:

‖δh‖₂ / ‖h‖₂ ≤ 4 l N_T ‖L‖²_op ‖V⁻¹‖_F ( √(2^{N_q}/N_M) + δt ‖H‖_op ),    (6)

where l is the number of couplings of the Hamiltonian and, without loss of generality, we suppose that ‖L_i‖_op = ‖L‖_op for all i. Remarkably, this reconstruction error is not affected by the uncertainty on the estimated derivatives, inversely proportional to the time step, which decreases the accuracy of learning protocols based on short-time evolution [25]. This feature can represent a significant advantage in characterizing devices with a high temporal resolution. In Eq. (6), the upper bound on the relative error depends on the system evolution only through the eigenvalues of the TQCM. In particular, the larger these eigenvalues, the larger the amount of information about the system Hamiltonian that we gain by observing the state evolution. Conversely, when some eigenvalue of the TQCM goes to zero and the matrix is not invertible, the uncertainty diverges, signaling that the information acquired during the experiment is not sufficient for Hamiltonian learning. We can clearly see the statistical meaning of the TQCM: Eq. (6) is analogous to a Cramér-Rao bound on the error of the estimated Hamiltonian [27,41], where the TQCM takes the role of an information matrix.
While the spectrum of V_ij estimates the total amount of information about the Hamiltonian that we have gained by observing the state during its evolution, at any fixed time t_n the spectrum of the single-time covariance matrix

V_{ij,n} = Tr( i[L_i, ρ(t_n)] i[L_j, ρ(t_n)] ),    (7)

with V_ij = Σ_n V_{ij,n}, estimates the amount of information about the Hamiltonian gained by observing the short-time evolution around t_n. Both V_ij and V_{ij,n} depend on the initial state preparation and on the consequent evolution. The preparation of the initial state is therefore crucial to the success of the algorithm: optimal initial states are those maximizing the amount of information gained by observing the evolution, hence minimizing the uncertainty on the reconstructed Hamiltonian.

Information and inverse participation ratio
Schrödinger evolution often delocalizes states in the Hilbert space. In this section, we show that the more the state samples the Hilbert space, the more information we gain about the system Hamiltonian.
As shown in Eq. (6), the largest contribution to the relative error comes from the minimum eigenvalue of the TQCM. The TQCM is the sum over the different time steps of the covariance matrices V_{ij,n}, which, following the definition of Eq. (7), are positive semi-definite. The evolution of highly delocalized states generates very different covariance matrices at different times. In this case, we speculate that the eigenvalues of the TQCM increase exponentially during an initial transient, corresponding to the time spent by the state ρ(t) in exploring the space of states before returning close to its previous orbit. The larger the sample of the space of states explored by ρ(t) during its evolution, the larger the amount of information on the system Hamiltonian gained by observing the evolution of ρ(t).
Quantitatively, we show that a good estimator of the information obtained in the learning process is the IPR of the initial state in the Hamiltonian eigenbasis [36][37][38]. If the system Hamiltonian is H = Σ_α E_α |α⟩⟨α| and the initial state of the system is |ψ⟩ = Σ_α a_α |α⟩, the IPR is defined as IPR(ψ, H) = Σ_α |a_α|⁴. Therefore, the IPR measures the spreading of the initial state over the Hamiltonian eigenstates: the lower the IPR, the more the initial state spreads out. This is also an estimate of the capability of the state to sample the Hilbert space uniformly during the evolution, in analogy with the ergodic hypothesis. Indeed, the time average of an observable A during the evolution is Ā = Tr(ρ̄A), where ρ̄ = Σ_α |a_α|² |α⟩⟨α| is the dephased state. As a consequence, when the IPR is minimum (IPR_min = 1/2^{N_q}) all the populations of ρ̄ are equal and the time average becomes an average over the energy eigenstates with equal weights.
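When the Hamiltonian is known, as in the numerical examples below, the IPR can be evaluated directly. A minimal sketch (the helper name is ours):

```python
import numpy as np

def ipr(psi, H):
    """Inverse participation ratio of |psi> in the eigenbasis of H:
    IPR = sum_alpha |a_alpha|^4, with a_alpha = <alpha|psi>."""
    _, eigvecs = np.linalg.eigh(H)   # columns are the eigenstates |alpha>
    a = eigvecs.conj().T @ psi       # amplitudes in the energy eigenbasis
    return float(np.sum(np.abs(a) ** 4))
```

An energy eigenstate gives IPR = 1, while an equal-weight superposition of d eigenstates gives the minimum value 1/d.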
The relationship between the uncertainty on the reconstructed Hamiltonian and the IPR will be clear in the numerics that follow. However, a simple general argument is given here, showing that states with a small IPR are associated with large eigenvalues of the TQCM and, due to Eq. (6), with a small error on the reconstructed Hamiltonian.
Note that in order to calculate the IPR, one needs to know the Hamiltonian eigenstates |α⟩. This may sound odd, as the Hamiltonian itself is in general unknown. However, in our examples we apply our method to learn "known" Hamiltonians, so we can explicitly calculate the IPR. In practical applications this is not the case, but one could resort to adaptive learning approaches as in Ref. [27]: starting from a state minimizing the IPR over the eigenstates of a guessed Hamiltonian, one can iteratively obtain better estimates of the target Hamiltonian.
We consider ρ(t) as a pure state ρ(t) = |ψ(t)⟩⟨ψ(t)|, where |ψ(0)⟩ = Σ_α a_α |α⟩ and the |α⟩ are the eigenstates of the unknown system Hamiltonian. Let A_i = Σ_j u_ij L_j be the operators associated with the normalized eigenvectors u_i of the TQCM. The eigenvalues ω_i of the TQCM can then be written as

ω_i = Σ_n Tr( i[A_i, ρ(t_n)] i[A_i, ρ(t_n)] ) = 2 Σ_n [ ⟨ψ(t_n)|A_i²|ψ(t_n)⟩ − ⟨ψ(t_n)|A_i|ψ(t_n)⟩² ].    (8)

If the time step δt is sufficiently small, this sum can be approximated by an integral:

ω_i ≈ (2/δt) ∫₀^{N_T δt} dt [ ⟨ψ(t)|A_i²|ψ(t)⟩ − ⟨ψ(t)|A_i|ψ(t)⟩² ].    (9)

Considering that |ψ(t)⟩ = Σ_α a_α e^{−iE_α t}|α⟩, these eigenvalues can be written as integrals of correlation functions oscillating at the Bohr frequencies E_α − E_β. At this point we impose a non-degeneracy and non-resonance condition on the system Hamiltonian H: E_α − E_β = E_γ − E_δ only if α = γ and β = δ, or if α = β and γ = δ. This condition is automatically satisfied by any Hamiltonian that is not fine-tuned, so we can assume its validity for the real unknown Hamiltonian. In this way, after a time transient T_e, which is the equilibration time of the system [36], the oscillating terms average out and the eigenvalues satisfy

ω_i ≳ 2 N_T [ Tr(ρ̄A_i²) − Tr²(ρ̄A_i) − Tr(ρ̄²A_i²) ].    (10)

After the equilibration time these eigenvalues grow linearly in the number of time steps, with a coefficient [Tr(ρ̄A_i²) − (Tr²(ρ̄A_i) + Tr(ρ̄²A_i²))] that corresponds to a measure of the variance of A_i in the dephased state ρ̄. The positive contribution to this variance comes from the term Tr(ρ̄A_i²), while the negative contributions come from Tr²(ρ̄A_i) and Tr(ρ̄²A_i²). When the IPR of the system state is near its minimum, the dephased state is well approximated by the totally mixed state and, since the L_i's are traceless, the term Tr²(ρ̄A_i) vanishes. The remaining negative contribution also decreases in magnitude with the IPR, which can be written as Tr(ρ̄²). We can conclude that minimizing the IPR is a good strategy to generate large eigenvalues ω_i of the TQCM.
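The variance-like growth coefficient discussed above can be evaluated directly from the dephased state. The sketch below follows the conventions of this section, including an overall factor 2 that belongs to our normalization of the TQCM eigenvalues; the helper names are ours:

```python
import numpy as np

def dephased_state(psi, H):
    """rho_bar = sum_alpha |a_alpha|^2 |alpha><alpha|, the long-time
    average of |psi(t)><psi(t)| under a non-resonant Hamiltonian."""
    _, W = np.linalg.eigh(H)
    p = np.abs(W.conj().T @ psi) ** 2      # populations |a_alpha|^2
    return (W * p) @ W.conj().T            # W diag(p) W^dagger

def growth_coefficient(psi, H, A):
    """Long-time growth rate per time step of a TQCM eigenvalue for the
    operator A: 2 [Tr(rb A^2) - Tr(rb A)^2 - Tr(rb^2 A^2)]."""
    rb = dephased_state(psi, H)
    t1 = np.trace(rb @ A @ A).real
    t2 = np.trace(rb @ A).real ** 2
    t3 = np.trace(rb @ rb @ A @ A).real
    return 2 * (t1 - t2 - t3)
```

For a minimal-IPR state of a d-level system and a traceless A, the dephased state is 𝟙/d and the coefficient reduces to 2 Tr(A²)(1/d)(1 − 1/d), which is manifestly positive.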
The optimal initial states, minimizing the IPR and therefore optimizing the learning process, are thus

|ψ_opt⟩ = (1/√(2^{N_q})) Σ_α e^{iφ_α} |α⟩,

where the φ_α's are arbitrary phases that we can fix to zero. Due to the beneficial effect of delocalization, one could wonder whether mixed states can also be considered a resource for learning algorithms. However, since the TQCM in Eq. (3) is related to the trace of ρ², we expect states with smaller purity to have a smaller TQCM and to determine a worse accuracy.
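Building such an equal-weight superposition is immediate once an eigenbasis is available; a sketch with the phases φ_α fixed to zero (the helper name is ours):

```python
import numpy as np

def optimal_initial_state(H):
    """Equal-weight superposition of the eigenstates of H, with all
    phases set to zero; it attains the minimum IPR = 1/dim."""
    _, eigvecs = np.linalg.eigh(H)
    d = H.shape[0]
    return eigvecs.sum(axis=1) / np.sqrt(d)
```

Since the eigenvector columns are orthonormal, the resulting state is normalized and its amplitudes in the energy eigenbasis are all 1/√d.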

Simulations
We have applied our Hamiltonian learning method to some few-qubit problems, in order to verify our predictions about the error scaling in Eq. (6) and the relationship between the TQCM and the IPR in Eq. (10). After focusing on a simple two-qubit problem, we discuss a three-qubit model with random couplings, and we use Qiskit [42] to show how to apply our method to a real quantum processor.
In order to apply our learning procedure, we first choose a Hamiltonian H and generate the evolution of the expectation values of the basis elements {O_α} by numerically integrating the Liouville equation for a set of initial states, corresponding to different initial configurations of the experiment with different IPRs. The effect of the statistical error is simulated by adding a uniform random noise of amplitude 1/√(2^{N_q} N_M) to each expectation value. Then, given the simulated expectation values r_α(t_n), we apply our method to find the optimal Hamiltonian H^(opt) = Σ_i h_i^(opt) L_i. The success of the learning procedure can be checked by comparing H with H^(opt). In particular, we analyze the relative error ε and the TQCM for each initial state and for different numbers of time steps N_T, corresponding to different total observation times. We also estimate the relationship between the IPR and the information gained in the experiment, measured by the eigenvalues of the TQCM. Finally, we calculate the optimal initial state |ψ_opt⟩ for each Hamiltonian and exploit it for an optimal learning.
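The simulated protocol can be sketched end to end for a single qubit. Everything below is illustrative: the couplings 0.7 and 0.3 are placeholders rather than values from our examples, while the noise amplitude 1/√(2^{N_q} N_M) follows the prescription above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pauli operators and a placeholder model Hamiltonian H = hx*X + hz*Z
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
L_ops = [X, Z]
h_true = np.array([0.7, 0.3])
H = sum(h * L for h, L in zip(h_true, L_ops))

dt, N_T, N_M = 0.01, 300, 1000
E, W = np.linalg.eigh(H)
psi0 = np.array([1.0, 0.0], dtype=complex)  # not an eigenstate of H

def comm(A, B):
    return A @ B - B @ A

# Exact evolution plus uniform "shot" noise of amplitude 1/sqrt(2^Nq N_M)
amp = 1.0 / np.sqrt(2 * N_M)
states = []
for n in range(N_T):
    U = W @ np.diag(np.exp(-1j * E * n * dt)) @ W.conj().T
    psi = U @ psi0
    rho = np.outer(psi, psi.conj())
    noise = amp * rng.uniform(-1, 1, size=(2, 2))
    states.append(rho + (noise + noise.T) / 2)  # keep the snapshot Hermitian

# Least-squares reconstruction: solve V h = B
l = len(L_ops)
V, B = np.zeros((l, l)), np.zeros(l)
for n in range(N_T - 1):
    drho = (states[n + 1] - states[n]) / dt
    C = [1j * comm(L, states[n]) for L in L_ops]
    for i in range(l):
        B[i] -= np.trace(C[i] @ drho).real
        for j in range(l):
            V[i, j] += np.trace(C[i] @ C[j]).real
h_opt = np.linalg.solve(V, B)
```

Despite the injected noise, the reconstructed couplings h_opt stay close to h_true, since the statistical fluctuations largely average out over the N_T time steps.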

Cross-resonance gate
We focus on a quantum system representing a two-qubit device governed by the typical Hamiltonian of a cross-resonance gate [40,[43][44][45]. The implementation of this gate, consisting of two transmon qubits coupled by a bus resonator, is fundamental to realize the CNOT gate for universal quantum computation. The Hamiltonian that we want to reconstruct is a cross-resonance Hamiltonian whose couplings are taken from Ref. [46], with energies expressed in MHz. The optimal Hamiltonian is then sought in the span of the same set of local operators; this choice is justified by first-principles studies [45]. The Hamiltonian learning algorithm is performed with a time step δt = 0.01, N_M = 1000 measurement repetitions and a total number of shots N_S = 3 × 10⁶, for four initial states with different IPR. The reconstructed Hamiltonian couplings for each initial state (with its IPR) are shown in Table 1. We observe that when the initial state has a small IPR, the accuracy is maximized.
In Fig. 2(a) we show the behavior of the relative reconstruction error as a function of the number of experiment shots N_S. N_S is increased by increasing the number of time steps N_T and, consequently, the observation time. Different curves represent states with different IPR. The numerical simulations confirm our predictions: after an initial transient, the relative error decreases exponentially. In Fig. 2(b), we show the behavior of the Frobenius norm of the inverse TQCM times the number of time steps, as a function of the number of experiment shots. We observe a correlation between the exponential decrease of the reconstruction error and the exponential decrease of this quantity, which saturates to a value proportional to the IPR. This is consistent with our theoretical predictions.
The relationship between IPR and information is shown in Fig. 2(c), where, for a set of random initial states, we represent the IPR and the Frobenius norm of the inverse TQCM at the final time. We observe that these quantities are positively correlated for small values of the inverse TQCM, confirming the predictions of Eq. (10): to improve the performance of the learning algorithm, we have to prepare initial states with a small IPR. In particular, the best possible performance corresponds to the minimum IPR, which, for a 2-qubit system, is IPR_min = 1/4.

Random 2-body Hamiltonian
Here we test the learning algorithm on a system evolving with a random Hamiltonian H = Σ_i h_i L_i, where h_i ∈ [−5, 5] and the L_i are all the two-spin interactions acting on a three-spin system, represented by tensor products of two Pauli operators and the identity operator. The Hamiltonian learning algorithm is performed with a time step δt = 0.01, N_M = 1000 measurement repetitions and a maximum number of 370 time steps. The relative error of the reconstruction and the behaviour of the TQCM are shown in Figure 3, Panels (a) and (b). In Figure 3, Panel (c), we show the IPR and the Frobenius norm of the inverse TQCM for a collection of random states at large final time. The validity of the theoretical predictions about the optimality of low-IPR states is particularly evident in Panels (b) and (c), where the statistical and systematic contributions to the uncertainty are not considered.
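The operator basis used here can be enumerated programmatically. A sketch (the construction is standard; the random couplings drawn below are merely illustrative):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
P = {"I": np.eye(2, dtype=complex),
     "X": np.array([[0, 1], [1, 0]], dtype=complex),
     "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
     "Z": np.array([[1, 0], [0, -1]], dtype=complex)}

def two_body_basis(n_sites=3):
    """All two-spin interactions on n_sites spins: tensor products with
    exactly two non-identity Pauli factors."""
    ops = []
    for labels in itertools.product("IXYZ", repeat=n_sites):
        if sum(l != "I" for l in labels) == 2:
            M = np.array([[1.0 + 0j]])
            for l in labels:
                M = np.kron(M, P[l])
            ops.append(M)
    return ops

L_ops = two_body_basis()
h = rng.uniform(-5, 5, size=len(L_ops))          # random couplings in [-5, 5]
H = sum(hi * Li for hi, Li in zip(h, L_ops))     # random 2-body Hamiltonian
```

For three spins there are 3 pairs of sites and 9 Pauli pairs per site pair, hence 27 interaction operators, each Hermitian and traceless.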

IBM Q FakeAthens processor
We test the learning algorithm on a simulated quantum processor, the FakeAthens processor, using Qiskit [42]. Our approach, from state preparation to the final measurements, can be identically extended to any quantum processor. The present simulator describes a two-qubit system and takes into account errors in state preparation and in measurements. We remark that, due to preparation errors, the starting state is not a pure state; nevertheless, our method is applicable. We execute a time-dependent unitary gate that is obtained through the cross-resonance mechanism illustrated in our first example, hence we look for a parent Hamiltonian spanned by the same operators. Since in this case we do not know a priori the real system Hamiltonian, in Figure 4, Panels (a) and (b), we show, for increasing total observation time, corresponding to an increasing number of shots, the Hamiltonian couplings learned with two different initial states: the state |↓↓⟩ and the Bell state (|↑↑⟩ + |↓↓⟩)/√2. Looking at Panel (c) of Figure 4, we can see that a smaller uncertainty, and therefore a more reliable Hamiltonian reconstruction, is obtained when the starting state is the Bell state, which has the smaller IPR [Panel (b)].

Conclusions and outlook
We have introduced a Hamiltonian learning algorithm whose global accuracy depends on the IPR of the starting state in the Hamiltonian eigenbasis. The reconstruction error decays exponentially at short times and equilibrates to a value proportional to the IPR. Our results establish a direct connection between state delocalization and Hamiltonian learning. We conclude that delocalization can be exploited to drastically improve the efficiency of Hamiltonian learning algorithms. Moreover, since the TQCM can be interpreted as an information matrix on the space of states, the relationship between the time evolution of the TQCM and the IPR of the state opens a new perspective on the information-geometric approach to the investigation of many-body quantum systems [47][48][49][50]: the equilibration and ergodicity of a closed quantum system [39] can be reflected in its out-of-equilibrium geometry [51]. Remarkably, our method is designed to reconstruct the system Hamiltonian from a generic quantum state, either pure or mixed. This feature is appealing for the application to real devices, where the preparation of the initial state can be affected by SPAM noise. In this regard, a natural extension of the methods described in this paper is the design of Lindbladian learning algorithms to deal with open quantum systems.
After the initial time transient, the error on the learning is negligible and our results show an excellent agreement between the true Hamiltonian used to generate the dynamics and our optimal reconstructed Hamiltonian. The price to pay is to perform a complete tomography of the state at different times, obtained by measuring all the observables in {O_α}. The number of these observables increases exponentially with the system size, rendering our approach too demanding for large systems. This great effort in collecting expectation values is partially rewarded by an exponentially decreasing relative error [Eq. (6)]. In perspective, one could perform a partial tomography and only measure the expectation values of a polynomial set of observables that optimizes the accuracy. These observables, as well as optimal initial states, could be selected through adaptive learning strategies, in analogy with Ref. [27], where the role of the TQCM is taken by the Fisher information matrix.
We thank G. Acanfora, A. Dutt, R. Fazio, A. Mezzacapo, and A. Russomanno for useful discussions and support. This work has been funded by project code PIR01_00011 "IBiSCo", PON 2014-2020, for all three entities (INFN, UNINA and CNR).

A Details of uncertainty estimation
In this appendix, we illustrate the details of the derivation of the bound in Eq. (6).
The uncertainty on B_i receives, at leading order, a statistical contribution from the fluctuations δρ(t_n) = Σ_α δr_α(t_n) O_α of the measured state and a systematic contribution from the Taylor remainder R_n:

δB_i ≤ Σ_n [ | Tr( i[L_i, δρ(t_n)] ρ̇(t_n) ) | + (1/δt) | Tr( i[L_i, ρ(t_n)] R_n ) | ].

We want to find an upper bound on this uncertainty that does not depend on the states {ρ(t_m)}.
The first term in the previous equation contains the statistical uncertainty. In a spin system we can choose the basis operators as normalized tensor products of Pauli operators, hence O_α² = 𝟙/2^{N_q} and σ(r_α(t_n)) = 1/√(2^{N_q} N_M). Since {O_α} is an orthonormal basis, it follows that

‖δρ(t_n)‖ = √( Σ_α δr_α(t_n)² ) ≤ √( 2^{N_q} / N_M ).

Approximating the difference quotient with the derivative, the factor multiplying the statistical fluctuation can be bounded through the Liouville-von Neumann equation:

‖ρ̇(t_n)‖ = ‖[H, ρ(t_n)]‖ ≤ 2 ‖H‖_op,

where H is the system Hamiltonian. For the systematic part, the Taylor remainder satisfies

‖R_n‖ / δt ≤ (δt/2) ‖ρ̈(t*)‖ = (δt/2) ‖[H, [H, ρ(t*)]]‖ ≤ 2 δt ‖H‖²_op.

Replacing this estimate of the statistical uncertainty and the Taylor remainder, and using the Cauchy-Schwarz inequality together with ‖i[L_i, X]‖ ≤ 2 ‖L_i‖_op ‖X‖, we obtain

δB_i ≤ 4 N_T ‖L‖_op ‖H‖_op ( √(2^{N_q}/N_M) + δt ‖H‖_op ).

Now, we want to derive a bound on the relative uncertainty on the couplings. From Eq. (4), δh_i = Σ_j (V⁻¹)_ij δB_j, so ‖δh‖₂ ≤ ‖V⁻¹‖_F ‖δB‖₂ ≤ √l ‖V⁻¹‖_F max_i δB_i. Taking into account the triangular inequality and the relationship between p-norms, and defining l as the number of Hamiltonian couplings, we can write

‖H‖_op ≤ Σ_i |h_i| ‖L_i‖_op ≤ √l ‖L‖_op ‖h‖₂,

from which we finally obtain

‖δh‖₂ / ‖h‖₂ ≤ 4 l N_T ‖L‖²_op ‖V⁻¹‖_F ( √(2^{N_q}/N_M) + δt ‖H‖_op ),

which is the bound of Eq. (6).