Heisenberg-limited metrology with perturbing interactions

We show that it is possible to perform Heisenberg-limited metrology on GHZ-like states, in the presence of generic spatially local, possibly strong interactions during the measurement process. An explicit protocol, which relies on single-qubit measurements and feedback based on polynomial-time classical computation, achieves the Heisenberg limit. In one dimension, matrix product state methods can be used to perform this classical calculation, while in higher dimensions the cluster expansion underlies the efficient calculations. The latter approach is based on an efficient classical sampling algorithm for short-time quantum dynamics, which may be of independent interest.


Introduction
In classical physics, an ideal detector and noise-free experiment can perfectly measure the state of a system.In quantum physics, this is fundamentally not possible: due to Heisenberg's uncertainty principle, we cannot deterministically measure the state of a spin-1/2 system without knowing the axis along which its spin S is aligned.The rich field of quantum metrology has been developed over the previous decades in order to understand both the fundamental limits on how many measurements are needed to measure some parameter to a given accuracy in quantum mechanics, as well as to develop protocols that can achieve such fundamental limits: see [1,2,3] for reviews.
As we will review below in more detail, it is well-established that using unentangled states of N spins, after measuring M times some unknown parameter ω, the uncertainty δω ≳ (M N ) −1/2 .This is called the standard quantum limit, and it arises from the classical central limit theorem on the measurement outcomes of unentangled qubits.However, using a cleverly chosen entangled quantum state, one can improve this sensitivity: δω ≳ M −1/2 N −1 .Such scaling is called the Heisenberg limit.Thus, quantum entanglement can enhance the sensitivity of our measuring apparatus.
Unfortunately, an entangled state of N qubits is extraordinarily fragile to perturbations, so it is unlikely that such a state can plausibly ever be built in experiment.It is therefore of critical importance to understand to what extent the Heisenberg limit is robust to perturbations.This paper will address from one particular perspective, in which an experimentalist is handed a highly entangled state capable of achieving the Heisenberg limit, yet in which the measurement procedure itself is imperfect: the qubits interact with themselves in addition to sensing the external parameter ω.Similar problems were studied in [4,5,6,7,8,9,10,11,12,13,14].We will show that without proper "error correction" for these interactions, the experimentalist will lose any advantage to using the entangled quantum state.At the same time, we prove that there is a classically computable protocol involving measurement and feedback, which enables the experimentalist to achieve Heisenberg-limited scaling.Our result proves a conjecture from this earlier literature (see e.g.[5]) that Heisenberg-limited metrology is robust in the presence of known unitary perturbations.
The remainder of the introductory section overviews our results in a more thorough fashion, with explicit formulas outlining our setup and main results, together with some background into the field of metrology.The remainder of the paper rigorously demonstrates our claims.

Heisenberg-limited metrology in an ideal world
Consider sensing a parameter ω (often a magnetic field along a specified axis) using N qubits (i.e.spins- 1  2 ), whose Pauli matrices are denoted by X j , Y j , Z j where j ∈ Λ := {1, • • • , N }.The spins start in some initial state ρ in , evolve under the Hamiltonian for some time t, and are then measured.The goal is to use the measurement outcomes to learn the parameter ω.
One repeats this procedure M times: preparing the same initial state ρ in , and running the same protocol each time.After this, one uses the collective information about all the measurement outcomes to deduce an optimal estimate ω est .
If the initial state is unentangled, the best precision (δω) 2 := E[(ω est − ω) 2 ] one can achieve is given by the standard quantum limit (SQL) ( Here E[• • • ] denotes an average over all the possible measurement outcomes in each of the M trials.The N −1/2 scaling arises because the N spins undergo independent dynamics, and the measurement outcomes are uncorrelated.Here and below, we refer to reviews [1,2,3] for further introduction to the subject of metrology.While this result also scales as M −1/2 because each trial is independent and classical post-processing of the noisy measurement outcomes is constrained by the central limit theorem of probability, this M scaling is unavoidable so long as one must repeat the experiment M times using the same N qubits.The main question then becomes: can we improve the N scaling of ( 2)?
The answer is yes.If we begin with an entangled initial state -the Greenberger-Horne-Zeilinger (GHZ) state ρ in = |GHZ⟩ ⟨GHZ|, where it is possible to parametrically reduce δω at large N .To see how, observe that if we evolve for time t using H ideal ω , this state evolves to gaining an extensive phase difference between the two parts in (3).Then by measuring the expectation value of observable X := j X j : ⟨X⟩ id := ⟨ψ id | X |ψ id ⟩ = cos(2N ωt), (5) one achieves the Heisenberg limit (HL) which scales as ∼ N −1 ≪ N −1/2 , which is a dramatic advantage over SQL.
To derive (6), observe that for large M , See Fig. 1(a) for an illustration of the first equality.Here we have used (5), and the fact that the variation for Pauli-like (involutory) observable X obeys (∆X) Beyond its theoretical desirability, this entanglement-enhanced metrology is efficient to implement (in the absence of quantum noise or errors!): the global operator X can be measured simply by making simultaneous projective single-qubit measurements in the local X-basis.The needed ⟨X⟩ is just the parity ±1 of the product of the N measurement outcomes, averaged over M trials.
Notice that in the discussion above, we cannot fully determine the rotation angle θ = 2N ωt using the algorithm as stated.The reason is that we cannot tell apart θ and θ + 2π.Happily, the problem is easy to correct.We will assume, throughout this paper, that the "integer part" Figure 1: (a) Given measurement of observable O with precision δ ⟨O⟩ = ∆O/ √ M , it leads to a precision δω determined by the slope ∂ω ⟨O⟩.See (7) and (8) for the case of O = X in the ideal GHZ metrology.(b) A sketch of qubit interactions.In this example, the interaction graph G is the 2d square lattice, so the distance between two qubits d(•, •) is the Manhattan distance.Two spatially local terms in V , VS and V S ′ , are shown in the two shaded regions, each coupling four qubits.They both contribute to the local strength defined in (13) at qubit j.One can check that they have diameters diam(S) = 3 and diam(S ′ ) = 2.
of ω is known, such that We learn ω ′ by, for example, evolving first for an extremely short time t, such that we can deduce the first digit in ω (assuming we have an order of magnitude estimate for its value!).Then, we double the time, such that we can get a slightly more accurate estimate of ω; iterating this process many times, we can obtain as accurate of an estimate as we want, until we hit the HL (at which point the best thing to do is take M → ∞).See [15,16,17] for precise statements on this procedure; in particular, the precision is of Heisenberg scaling in terms of the overall resources.

The problem of interest
In this paper, we study the robustness of the above GHZ metrology to known perturbations.In particular, we assume the Hamiltonian acting on the spins is which contains unwanted local interaction V among the spins of local strength J.To be precise, the N spins sit on the vertices of a graph G, connected by edges that define a distance function d(•, •) on the graph.We assume each vertex connects to at most K other vertices by edges.Then we assume the interaction is where V S acts nontrivially on set S of qubits, diam(S) = max i,j∈S d(i, j) is the diameter of set S, and ℓ is the interaction range.The local strength is defined by which quantifies how strong the interaction on a single qubit can be.Here ∥•∥ denotes the operator norm, i.e. the largest singular value of the operator.As an example of the above formal definitions, V can be local interactions on a constant-dimensional lattice as sketched in Fig. 1(b), where each V S = P a P P can expanded by the Pauli strings contained in the local set S, with coefficients a P = O(J). 1 Note that J could depend on how the total operator V is decomposed into local terms V S ; since our results apply to any such decomposition, the reader can just choose a convenient decomposition, although in practice the results will be stronger for a decomposition that achieves the smallest possible J.We in general do not require J to be small compared to ω, although requiring so indeed gives a longer time scale for achieving HL, as we will see.
An explicit example of a Hamiltonian that is within the purview of the theory we describe below would be a transverse-field Ising model in a strong magnetic field: In an experiment, such a situation in (11) may arise when there are spin-spin interactions that one cannot perfectly cancel (e.g. by dynamical decoupling [18]). 2 If one wants to measure the magnetic field in a very local spatial region, one needs to spatially move the N spins closer together, and V may no longer be negligible.If V is neglected altogether, it could in principle become the dominant noise in an experiment.
Besides changing the Hamiltonian, we also address more general initial states |ψ⟩ in beyond the ideal GHZ state.We evolve the system under (11) by a time t independent of N , and ask whether measuring the final state can achieve HL, in terms of its N -scaling δω ∝ N −1 , given the exact knowledge of V .We will assume V is known exactly.The HL is impossible to achieve if each local V S has uncertainty Ω(1/N ).For example, suppose one pretends that there is no interaction, but there is actually a weak interaction: . When evolving the GHZ state, V contributes a Θ(1) phase difference between the two parts in (4), which leads to Θ(1/N ) biased error for ω that shares the same precision of the prior knowledge (10) and cannot be reduced by increasing M .Therefore, an unknown V can be just as dangerous as Z-type errors for metrology (the rate of which cannot exceed 1/N [19]).Nevertheless, our results could be useful so long as V is a well-calibrated interaction.

Overview of our results
The first question one might ask is whether or not the same protocol sketched in (5) would apply.
Unfortunately, the answer is no: the naive protocol ( 5) is extremely sensitive to perturbation V , even if V is an on-site field: see Appendix A. Some intuition for this result is that an unknown V is no better than decohering noise, which immediately leads to the SQL [20,21].Evidently, more work will be required to achieve the Heisenberg limit in the presence of V .In this subsection, we both outline the rest of the paper, and explain in less technical terms our main results along with the intuition for why they hold.We focus on the ideal GHZ initial state for this section.

Robustness of the "phase difference" to perturbations
In the ideal case, the extensive phase difference (4) comes from the extensive difference in Z polarization of the two parts: See the green curves in Fig. 2(a), where |α⟩ in represents |α • • • α⟩ (or its generalized version, see Section 2).With interactions, the two parts are evolved at time t to |ϕ 0 ω ⟩ and |ϕ 1 ω ⟩ that are no longer the maximally polarized product states.Nevertheless, the extensive Z difference persists for short time: for some O(1) constant c in , as shown by the distance between the two red curves in Fig. 2(a).The reason is simple: for each qubit, it needs a nonvanishing time Ω(1/J) to change its local state (polarization) significantly, because it only couples to O(1) other qubits nearby on the interaction graph.More precisely, this can be derived from where ρ i = tr i c (ρ) (c means complement) is the local density matrix of qubit i, and H on i contains the finite number of terms in H ω that act nontrivially on i; other terms do not contribute due to the cyclic property of the partial trace.(17) implies that the two parts keep gaining an extensive phase difference for short time.More precisely, when tuning ω, the phase difference changes accordingly by an extensive O(N ) sensitivity.In this process, since |ϕ α ω ⟩ (α = 0, 1) also rotate in the physical Hilbert space (which is absent in the ideal case), it is desirable that such rotation is subdominant comparing to the change of phase.We show this from the fact that |ϕ α ω ⟩ is short-time evolution from a product state and thus short-range correlated.The intuition is that a local Hamiltonian like H ω maps a product state (more generally, a short-range correlated state) to an orthogonal one with amplitude bounded by In Fig. 2(a), this is manifested by the O( √ N ) width of the wave-packets: for any α = 0, 1.
Gathering the two ingredients (17) and (19), we prove in Section 2 (which contains precise statements) that HL is robust for t < c in /J, because the extensive phase difference (4) is welldefined in the presence of interaction and can be measured in principle to estimate ω.

Measurement and feedback to achieve the Heisenberg limit
Now that we know there is a coherent phase difference between the two halves of the initial GHZ state even in the presence of interactions, we must now develop a protocol to measure the state and this relative phase difference accumulated.The key challenge is that if we do not measure in the right basis, we will accidentally measure which of the two |ϕ 0,1 ω ⟩ we are in (analogous to a Z error destroying all entanglement in GHZ).Hence, we must now develop efficient protocols to measure the phase difference over all N qubits, without causing such a generalized Z error, in order to achieve HL for t < c in /J.Such protocols are desirable: the x-basis measurement in the ideal case no longer works as shown in Appendix A, 3 and one does not want to implement a protocol that uses an exponential (in system size N ) amount of classical/quantum resources.
If one can engineer generic Hamiltonian evolution in the quantum experiment, we show in Appendix B that one can effectively4 reverse the evolution by H ω , after which an x-basis measurement like the ideal case leads to HL.However, such quantum control is typically demanding in experiment.
Therefore, we focus on another protocol based on local operations and classical communication (LOCC).As depicted in Fig. 2(b), the qubits are measured one-by-one, where the measurement basis depends on previous measurement outcomes (i.e., classical communication) and is determined by classical computation.During the measurement protocol, we assume the qubits undergo no Hamiltonian evolution or decoherence.In practice, this condition holds if measurement and classical computation is much faster than Hamiltonian evolution etc, or if one deliberately separates the qubits after the H ω evolution to well-isolated ones, awaiting for measurement.
Such an LOCC measurement protocol for pure-state metrology was previously proposed in [22], where they specified an algorithm to find the measurement basis (using previous measurement Figure 2: Sketch of our main results.(a) An illustration for robustness of HL, using distribution PZ in Z polarization.We assume the initial state (green curves) is like |GHZ⟩, where the two parts |0⟩ in and |1⟩ in have an extensive ∼ N difference in Z, and each of them has small ∼ √ N fluctuation.We prove these properties hold after evolving for time t < cin/J (red curves), so that the two parts robustly gain an extensive phase difference leading to HL.If J ≪ ω, this robustness extends to any N -independent time t.(b) A sketch of the LOCC measurement protocol that achieves HL for t < c M /J.Although classical computation is needed to determine the local measurement basis like |ex 1 x 2 ⟩, we prove the overhead is poly (N ) for many situations.In the final step to extract the paramter ω, the average parity ⟨P⟩ of the measurement string x is compared to two quantities f, ⟨P⟩ ′ from classical computation, using (67).(c) A sketch of the poly (N )-time classical sampling algorithm at short time t < c+/J (under certain conditions), which also serves as an ingrediant for proving efficiency of the LOCC measurement protocol in (b).To compute a local density matrix ρx n after measuring n qubits (so that one can determine the measurement probability of qubit n + 1), we prove that it only depends on dynamics within distance m ∼ log N of the given vertex.In the figure m = 2. Constraining computation to the radius-m ball (shown in orange) naively leads to a quasi-polynomial exp[(log N ) d ] complexity for d-dimensional systems; we use the cluster expansion to get poly (N ) for any d, and even for any bounded degree graph.Note that measurements in the figure occur one-by-one in a spatially continuous way: This is only for simplicity of drawing and not required in general.
outcomes) that achieves the best possible precision.We adapt this algorithm to our problem starting from Section 3, which achieves HL due to results described in Section 1.3.1.However, [22] did not study the classical complexity of the algorithm, and in general one might expect it to be exponentially difficult to find this ideal basis due to the many-body Hilbert space involved.However, we will prove that the classical computation of this efficient basis can be done using polynomial classical resources and runtime.This is usually denoted as poly (N ) (i.e.bounded by c q N q for some constants q and c q ).

Efficient classical sampling of short time dynamics
If the qubits are arranged in one dimension, the computation is done efficiently using matrix product state (MPS) representations for the two parts |ϕ α ω ⟩ of the state, because they are both short-time evolution from a product state.This is detailed in Section 3.For higher dimensions or general graph G, MPS techniques do not work, and we need to develop another method.The classical computation in the LOCC algorithm turns out to be merely the same as another task of its own interest, namely classically sampling LOCC measurements on |ϕ 0 ω ⟩.More precisely, consider evolving an initial product state using a local Hamiltonian like (11) for time t, after which the system is measured in the computational basis to output a classical string x with probability p x .It is crucial to understand whether this quantum experiment can be classically simulated in polynomial time, i.e., whether a classical computer can also sample the string x efficiently.It is believed (based on conjectures in theoretical computer science) that a depth-3 quantum circuit in 2d, which can be viewed as a time-dependent Hamiltonian evolution with t = Θ(1), already becomes exponentially hard to classically sample in the worst case [23].However, it is unclear whether this quantum advantage holds for any constant t independent of N .Moreover, generalizing computational-basis measurement, it is sometimes desirable to perform LOCC measurement to try to do measurement-based quantum computation (MBQC) on the final state [24].Is this adaptive case even harder to classically simulate?Is it capable to do universal quantum computation like MBQC on the cluster state [25]?
To answer these questions, in Section 4 we prove that for t < c + /J with a constant c + , the final state can be sampled by a poly (N ) classical algorithm.Therefore it is a computationally simple state, and does not enable universal MBQC.Technically, we achieved this by generalizing the cluster expansion method developed in [26,27], and rely on one assumption that the measurement basis is not so close to that of the initial state.The idea is shown in Fig. 2(c): the local state on an unmeasured qubit is easy to simulate, because it only depends on dynamics (and measurement) in a neighboring region of size ∼ log N .A rough physical picture for this is as follows.After evolution in time t = 0.1/J, each pair of neighboring qubits only establishes ∼ 0.1 correlation.Suppose qubits 1, 2 and 2, 3 are such pairs, but 1, 3 does not share correlation.Although measuring 2 may help to create correlation between 1 and 3, the amount is 0.1 2 = 0.01.Repeating this argument implies that faraway qubits have correlation exponentially small in distance, even after some local measurements are done.
Based on the complexity result in Section 4, we return to the metrology problem in Section 5, and prove (without assumptions on the measurement basis) that the LOCC protocol achieving HL involves poly (N ) classical computation, for general graphs and t < c M /J for some constant c M .

Robustness against weak perturbations
The above results hold even if the interaction is strong J ≫ ω.If it is actually weak J ≪ ω, we show in Section 6 simply by energy conservation that HL is robust well beyond the time window Jt < c in in Section 1.3.1.However, for such long times, we no longer have guarantees on the efficiency of the measurement protocol in general.We also discuss prethermalization theory [28] that plays a role in preserving Z polarization against small perturbations.

Further background on metrology: quantum Fisher information
The rest of the introduction provides further background knowledge, along with context, for our work.In this subsection, we summarize further useful facts about quantum metrology which are well-established in the literature.The first question we ask is whether the state (15) could achieve HL, without restriction on the kinds of measurements made (or classical algorithm used to process them).This question is equivalent to asking for the scaling of the quantum Fisher information (QFI) of |ψ ω ⟩ with respect to ω: For a pure state, QFI is defined as [29,30,31] (see e.g.Eq.( 4) in [32]) The first expression in (20) is intuitive: QFI simply measures how fast |ψ ω ⟩ rotates in the physical Hilbert space when tuning ω, where the unphysical global phase does not contribute.On the other hand, the second expression in (20) can be understood as some connected correlation function (see (42) below).QFI bounds the estimation precision by the quantum Cramér-Rao bound (QCRB) In the M ≫ 1 regime that we focus here, (21) is saturable [30,33] by projective measurements in the eigenbasis of followed by classical post-processing the data using maximum likelihood estimation.See [34] for modification of QCRB in the case M = 1.
In general, to achieve HL it is necessary to have F(|ψ ω ⟩) ∝ N 2 , which we prove in Corollary 2.3 and Proposition 2.5 for the problem described in Section 1.2.However, bounding QFI is not the only way to show robustness of HL.Indeed the most direct result we will establish is (as summarized above), based on proving that the extensive phase difference between the two halves of the GHZ state is robust to arbitrary perturbations (at short t).While this fact does imply the desired scaling of QFI, we will argue that the constraints of locality at short time scales in quantum mechanics ensure more than simply good QFI: they also ensure an efficient measurement protocol to achieve the HL in metrology.In particular, it is quite undesirable to do an eigenbasis measurement of (22), because: (i ) it requires evolving the many-body state |ψ ω ⟩ which a priori could be exponentially hard in N , (ii ) the naive algorithm of just measuring L ω requires knowledge of ω anyway, which is precisely what we want to learn, and (iii ) measuring L ω is not realistic in any experiment with N ≳ 4. In fact, property (ii ) means the measurement protocol does only local quantum estimation [33].
Ultimately, the QCRB can be saturated (or at least saturated up to a constant factor) by more than one protocol.The goal of our work is to find efficient ones with global quantum estimation, like the ideal one in Section 1.1, which involves only local measurements and applies to all ω in range I ω ′ (10).

Previous work on robustness of metrology
Special cases of our problem described in Section 1.2 have been considered in the literature.[4,5] show that HL is robust if V is on-site.In particular, [4] conjectures the robustness holds for more general interactions, which is proven in this paper.[35,6,7,8,9,10,11,12,13,14] consider specific models, where the chosen interaction V sometimes enhances the precision or robustness of metrology.For general unwanted interactions, it is proposed to reduce the interaction strength by dynamical decoupling [36,37,38], where the qubits are actively operated by control pulses.Indeed, quantum control is shown to have advantages in the setting of estimating multiple parameters [39], especially learning a many-body Hamiltonian [40,41].In this respect, our result may be surprising: HL for single-parameter estimation is actually robust even without quantum control during sensing.The price to pay is the nontrivial (but provably efficient) LOCC measurement procedure after sensing, in order to accurately estimate the parameter.
We assume the system is isolated from the environment throughout the paper.If there is decoherence from coupling to the environment, HL is not robust anymore and reduces to SQL in general [20,21], even if the qubits couple to the environment independently.Intuitively, this comes from the fragility of the global many-body entanglement of the GHZ state.In certain cases, HL can be restored by active quantum error correction (QEC) [42,43,44,45,46,47].However, it is usually assumed that the QEC operation is much faster than the decoherence rate.Our results may shed light on the actual QEC time scale needed, especially for decoherence that are correlated among qubits [48,49,50,51,52].

Robustness of the Heisenberg limit
Having summarized our strategy, we can now begin by deriving our first key result: the Heisenberg limit is robust if Jt is smaller than a constant even in the presence of strong perturbations.Note that this result does not rule out the possibility of HL robustness on longer time scales (see e.g.Section 6).Still, as we will see, this more limited result will be adequate for our purposes.

Generalized initial states
We allow for any initial state of the form where the two parts |α⟩ in (α = 0, 1) are orthonormal and satisfy the following two conditions: 1.They have extensive Z polarization difference: Here α = 0, 1, O is any operator, and c in > 0 is a constant.
2. They are short-range correlated: for any subsets A, B ⊂ Λ, and operators in them with 2, and ξ is the correlation length.If the graph G is not finite-dimensional, we assume the correlation length is relatively short The GHZ state is thus the extreme case c in = 1 and ξ = 0. We expect these conditions are satisfied by many more states of physical interest: e.g.equal superposition of Z 2 symmetry broken states, or "rotated" version of GHZ state like where the two parts have negligible ∼ exp(−N ) overlap and can be massaged to form (23).

Robustness of the extensive phase difference
We first show that before some O(1) time, the two parts in (27) keep gaining an extensive phase difference when varying ω, just like the ideal case.
Theorem 2.1.If the initial state satisfies (24) and (25), then As a result, the whole state satisfies (dropping the unphysical global phase) where the function f (ω) : and is an increasing function for t < c in /J with slope proportional to N : We will provide the proof shortly.Without QFI, (30) is already transparent on why HL is achievable for constant time t < c in /J, even for global quantum estimation: the two states |ϕ 0 ω ⟩ and |ϕ 1 ω ⟩ just gain opposite phases that are proportional to N when tuning ω.Suppose the function f is known, then for t < c in /J up to a negligible error O(N −1/2 ), a protocol achieves HL as long as it measures the relative phase5 between the two states with outcome f E (a number), because one can then solve f (ω est ) = f E to get an estimate ω est of the true ω.As we will show, the function f will be efficiently computable by a classical computer: one does not need to compute the value f (ω) for every ω; it suffices if for any given ω ∈ I ω ′ , the classical algorithm outputs f (ω) with good accuracy.The reason is that one can easily perform a classical binary search algorithm when comparing to the quantum experimental value f E : Proposition 2.2.Suppose there is an oracle such that for each given Then based on the quantum experimental result f E , calling the oracle O(log M ) times gives an estimate ω est for the true ω with HL.
Proof.After M measurements, the precision As a result, since the oracle has similar precision (32), it suffices to find an ω est such that the oracle output f O (ω est ) is within distance O(1/ √ M ) to the experimental value f E .This would achieve HL due to (31).
To find ω est , we use the following binary search algorithm.If one is lucky that ; otherwise call the oracle at the middle point ω ′ + π/(8N t) of the candidate interval, and repeat the above process.The length of the candidate interval shrinks exponentially with the number of steps, so that after O(log M ) calls of the oracle, one is guaranteed to get an ω est with HL (7).

Corollary 2.3. If the initial state satisfies
which scales as ∼ N 2 so long as t < c in /J.Note that we assume this short time inequality when stating (33).

Proof of Theorem 2.1
We first sketch the idea, focusing on (28) since (30) will follow by a simple integration over ω.The c ′ term in (28) is an unimportant global phase.The c ω term, on the other hand, is the crucial relative phase between the two parts α = 0, 1. Intuitively, an extensive Z polarization leads to an extensive relative phase gained by tuning ω, so we will first show the initial extensive ⟨Z⟩ difference (24) persists in short time.This is summarized in Lemma 2.4, which does not rely on the condition on correlation (25).Due to the presence of V , the two parts do not simply gain phases, but also rotate in the physical Hilbert space (where states differing by an overall phase are identified).Nevertheless, since the two parts are initially short-range correlated (25), the state rotated by a single term in the local Hamiltonian is orthogonal to each other.As a simple example, Therefore, the norm squared of the rotated part of the state scales linearly with N .This leads to the last term in (28), which is subdominant ∼ √ N comparing to the relative phase ∼ N .We note that similar locality estimates are also used to prove that even if the Hamiltonian is interacting, SQL cannot be surpassed by separable initial states [53].
We now show that the extensive phase difference c ω term in (28) comes from condition (24) alone.
Proof.Taking derivative on the matrix e −itHω with respect to ω, we have Using (24), In general, the time-averaged operator Z is close to Z in short time: where we have used (13) and ∥[A, B]∥ ≤ 2 ∥A∥ ∥B∥.This establishes (35) due to (38).
Proof of Theorem 2.1.Since we have proven the extensive relative phase (35), to prove (28) it remains to show the extra rotation in the physical Hilbert space contributes O( √ N ) in (28): where the function g(Jt) is independent of N , and t 2 comes from dimensional analysis due to two ω-derivatives on the left hand side.We have projected out the part in which would contribute to the c ω term in (28).One can verify that combining (35) and ( 40) yields (28).The reason is as follows.One can always decompose where (40) implies that ) and thus corresponds to the second term of O( √ N ) in (28).For the first term of ( 28), the c ′ term just corresponds to a global phase, and is unimportant.Taking the inner product between (41) and ⟨ϕ α ω |, the c ω term is further extracted by subtracting the two choices of α, which leads to (29) from (35).
To show (40), we first rewrite its left hand side using ( 36), (37) and In some sense, the final sum quantifies the total correlation shared by spin i with all other spins, in the final state |ϕ α ω ⟩.Intuitively, since in finite time i can only build correlation effectively with a finite set of neighboring spins, we expect i.e., bounded by some function independent of N , which leads to (40).More rigorously, since V is local, we have Lieb-Robinson bound [54]: There exists an operator where ∥Z i,r (t)∥ ≤ 1, and c LR , µ, v > 0 are constants. 7The bound (44) holds similarly for the timeaveraged operator Z i (t), because one can bound the approximation error of the integral in (36) by a triangle inequality like the first line of (39).Then using the initial condition (25), connected correlation for any pair i, j is bounded by an exponential-decaying function following [56] (see also Theorem 5.9 in [55]).Now sum (45) over j.The result is independent of N for finite-dimensional lattices, because the exponential decay in distance d(i, j) overcomes the polynomial number of js at the given distance.For general graphs with bounded degree K, the number of spins at a given distance to i is upper bounded by K(K − 1) d(i,j)−1 , so the decay in (45) still wins due to (26), and the fact that µ can be chosen arbitrarily large in (44) (by adjusting c LR and v accordingly) [55].This establishes (43) and further (40) from (42).
and the first inequality of (31) simply comes from (28).The second inequality of (31) comes from the first line of (38), which implies c ω ≤ tN because Due to this bound on the slope, the range of

An alternative viewpoint: correlation cannot decay in O(1) time
Theorem 2.1 above is one central result of this paper: it shows the robustness of the relative phase for GHZ-like states, proves HL for QFI in (33), and sets foundations for the latter sections on efficient protocols.As an interlude, this subsection is devoted to an alternative argument for robustness of metrology, at least at the level of QFI.The intuition is that the super-sensitivity of |GHZ⟩ with respect to the Z field stems from its large ZZ correlation/fluctuation. Mathematically, one can verify that QFI (20) is just proportional to ZZ connected correlation function (see (47)) when V = 0. Then for robustness, we basically need to show such long-range correlation cannot be killed in O(1) time evolution.Formalizing this idea, we have: with c ′ in > 0, then QFI defined in (20) satisfies which is HL for t < c ′ in /(4J).
The GHZ state corresponds to the case c ′ in = 1.We do not need the conditions (24) and (25) here, although it might be possible to derive (47) from those (possibly under physical assumptions that off-diagonal elements like ⟨0|Z|1⟩ in are subdominant).Beyond GHZ metrology, Proposition 2.5 thus also applies to metrology using spin-squeezed states [57].However, we leave as future work to design efficient measurement protocols for such states.
Proof.Using (36) in a similar way as (42), QFI in (20) becomes Here in the second line we have used Z = Z + (Z − Z).In the third line, we have used (47), (39) leads to (48).
3 LOCC protocol in 1d: sampling matrix product states

LOCC protocol
For our situation where the system remains in a pure state, [22] shows that QCRB can always be saturated by a measurement protocol built of local operation and classical communication (LOCC), where the spins {1, 2, • • • , N } are measured one after another, adaptively.Namely, first measure spin 1 in some chosen basis, and then measure spin 2 in a basis depending on the previous measurement outcome x 1 ∈ {0, 1}, and so on.After measuring all the spins, one obtains a binary string x := x 1 • • • x N , from which one tries to decode the parameter ω.Note that spins with nearby labels do not need to be close spatially.The measurement is described by the set of projectors E x , or equivalently the set of product states |E x ⟩: where determined by the previous measurement outcomes Like in measurement-based quantum computation, such classical communication is necessary in general [22]; see also [58].Note that we use lowercase e for operator/state on a single qubit, while uppercase E for projector on multiple qubits.For t < c in /J, combining Corollary 2.3 with [22] implies the existence of a LOCC protocol that achieves HL, at least for local estimation.However, although [22] describes a procedure to obtain the local basis E x defining the protocol based on the quantum state, it is not guaranteed that this procedure is easy to implement in practice.More precisely, we need to show that the local basis is classically computable in poly (N ) runtime (which implies space complexity is also poly (N )).From now on, we show that there is indeed a LOCC protocol of global estimation that (i ) achieves HL for certain class of GHZ-like initial states with t < c in /J, and (ii ) is provably efficient to implement.In this section, after showing the procedure to determine the local basis (50a), we focus on 1d where the complexity result is relatively easy to establish, and applies to more general initial states, using matrix product states (MPSs).In the next two sections, we use sampling complexity results to deal with general interaction graphs.

Finding the LOCC measurement basis
In this subsection, we adapt the procedure in [22] to our case of GHZ metrology with global estimation.Namely, we show how to determine the local measurement basis E x from knowledge of |ψ ω ′ ⟩ where ω ′ is known.Observe that the probability distribution Since the relative phase between ϕ 0 ω ′ and ϕ 1 ω ′ only contributes to the second term, the intuition is to make the second term large enough, and check that is also the case for any ω ∈ I ω ′ .Indeed, if the measurement basis is not chosen carefully, the second term would become exponentially small, leading to SQL as shown in Proposition A.1.Strictly speaking, the second term has amplitude and we demand a criterion for each: 1. We want the amplitude to be as large as possible: In order to saturate Cauchy-Schwarz inequality we desire E x |ϕ 1 ω ′ ⟩ only differ by a phase, or equivalently, 2. The phase where the + (−) sign is for even (odd) string x, i.e., determined by the parity.This requires These two conditions combine to Note that the phase (54) is chosen deliberately such that (56) is satisfied by the standard x-basis measurement in the unperturbed case V = 0, where a known field ω ′ (9) would accumulate a phase θ = π/2 modular π.In general, the measurement basis satisfying (56) can be determined by the following procedure that generalizes [22].1. Calculate "reduced matrices" on spin 1: Here subscript ∅ means no previous substring.
This is possible because two traceless Hermitian matrices can be simultaneously zero-diagonalized [22]: there is always a basis such that one matrix is proportional to Pauli X, while the other is linear combination of X and Y .Moreover, Im ⟨e 0 | M ∅ |e 0 ⟩ is chosen to be positive.This completely determines e x1 , as long as M ∅ and M ∅ are nonvanishing.

Calculate
We are using ⟨e x1 |M|e x1 ⟩ to denote an operator acting only on spins 2, . . ., N .

Repeat steps 3 and 4 for spins
One can verify that condition (56) is satisfied in the final step.

Matrix product state approximations
In general, the states |ϕ α ω ′ ⟩ and therefore the matrices M, M cannot be computed exactly.Instead, we need approximations of them that (i ) can be computed efficiently, and (ii ) induces negligible error for quantum metrology.This will lead to approximation E ap x of the local basis, along which the quantum measurement is actually done.In this section, we focus on the easier case of one dimension, where we use MPS to approximate states.
An MPS on a chain of qubits where D is called the bond dimension.Pictorially {A (j) α } are tensors with three legs: one physical leg and two bond legs, and the MPS is represented by contracting the bond legs of nearestneighbors.See Fig. 3(a).Although any state can be expressed as an MPS, the bond dimension is typically exponentially large in N .In contrast, MPSs of physical interest are those whose bond dimension is upper bounded by poly (N ), which contains only a polynomial number of parameters.Such an MPS is easy to manipulate, because many properties of it can be computed classically in polynomial time by contracting the tensors.For example, given a string operator O = O 1 ⊗O 2 ⊗• • • , its matrix element between two MPSs is represented by sandwiching the operator by the tensors of the two MPSs (one is conjugated).As shown in Fig. 3(b), one can contract this tensor from left to right, using only poly (N ) memory and runtime if bond dimensions are bounded by poly (N ).Our first result shows that given our assumptions in the previous section, such an efficient MPS must exist.

Proposition 3.2. Assuming (25) holds for |α⟩ in in one dimension, there is an approximation
and |Φ α ω ′ ⟩ is an MPS with poly (N ) bond dimension.Proof.In 1d, any |α⟩ in with exponential decay of correlation (25) can be approximated by an MPS |α⟩ IN with bond dimension poly (N ), with a small error [60]: On the other hand, there exists a quantum circuit unitary U ′ (which can be found using the HHKL algorithm [59]) shown in Fig. 3(c) that approximates the evolution unitary e −itH ω ′ to error where U ′ consists of three layers of local gates.Each gate is acting on at most L = O(log(tN/ϵ)) neighboring spins, and is simply e ∓itH ω ′ ,local , where H ω ′ ,local only contains terms in H ω ′ that are supported inside the given region.Only the second layer is backward time evolution with + sign on the exponent.We choose ϵ = 1/ √ 2N and focus on constant time, so that L = O(log N ). 8e then define |Φ α ω ′ ⟩ = U ′ |α⟩ IN , which satisfies (60) from ( 61) and (62).Since |α⟩ IN is an MPS with poly (N ) bond dimension, the output |Φ α ω ′ ⟩ from acting quantum circuit U ′ on it remains to be such an MPS, because U ′ increases the bond dimension at most by a multiplicative factor poly (N ).
To see this, one can combine each block of L qubits to one qudit of dimension 2 L = poly (N ); the matrices A (j) for |α⟩ IN are simply multiplied within each block, which does not change the bond dimension.On the other hand, the quantum circuit U ′ is simply two layers of nearest-neighbor 2-local gates on the qudits (with a layer of 1-local gates in between), which can multiply at most a factor of poly (N ) to the bond dimension, because the quantum circuit can be expressed as a matrix product operator of poly (N ) bond dimension [61].One can also see this fact in Fig. 3(d) by identifying the MPS structure of the final state.
Based on the MPS approximation, the measurement basis is not exactly the one determined by (56), but an approximate version E ap x that satisfies This basis is again determined by Procedure 3.1; the only difference is that the matrices are substituted by This procedure becomes efficient to implement, as we will show.

Efficient metrology protocol with Heisenberg limit
With the developments above, we propose the following metrology protocol sketched in Fig. 2(b).

Evolve under
H ω for time t.

Measure {E ap
x } determined by Procedure 3.1 with MPS-approximated matrices (64), record the parity ±1 of the string

In this step, do not calculate E ap
x for all x because there are exponentially many of them.Instead, just calculate the one chosen by the measurement trajectory.

Repeat steps 1-3 for 1 ≪ M ≪ N times to extract an estimate of the average parity
5. Classically simulate with error bounded by 1/ √ N .Note that this quantity does not depend on the unknown ω.

Match the experimental result of ⟨P⟩ − ⟨P⟩
′ with − sin[f (ω)] to read out the parameter ω, using binary search in Proposition 2.2, where the function f (ω) is classically computed for any given ω with error bounded by 1/ √ N .
Clearly, this protocol is to globally estimate any ω ∈ I ω ′ , and involves poly (N ) quantum resources.We show that for short-time dynamics, it achieves HL and involves only poly (N ) classical computation.Theorem 3.4.For t < c in /J in 1d, Procedure 3.3 (0) gives an unbiased estimation of ω at large N : and satisfies HL.Furthermore, the classical computation involved has poly (N ) runtime.Specifically, there is a poly (N ) classical algorithm for calculating each of the following up to error O(1/ √ N ): (1) f (ω) for any ω ∈ I ω ′ .(2) The local basis by Procedure 3.1 for a given trajectory x.
Proof.We prove the claims (0-3) one by one.(0) ⟨P⟩ is the expectation of observable P = x (−1) x E ap x in the final state ψ ω with ∥P∥ = 1.(30) together with (60) implies so By explicit calculation, Here the first line follows from ( 51) and ( 68).In the third line, we have used the MPS version of (56).To get the last line, we used x E ap x = 1.Thus ( 67) is established by combining ( 69) and ( 70).
Since M ≪ N , the experimental error 1/ √ M for ⟨P⟩ is dominant in (67).As a result, we can ignore the 1/ √ N errors from the final term, or the computation of ⟨P⟩ ′ and f (ω).The HL is therefore satisfied, because the experimental ⟨P⟩ is compared to a function of ω that has ∝ N slope according to (31), using the binary search algorithm in Proposition 2.2.
(1) From ( 30) and (36), where we approximate the integral by a sum in the last line with step-sizes ∆ ω , ∆ t to be determined.71) is poly (N ) / (∆ ω ∆ t ), because there are O(1/ (N ∆ ω ∆ t )) number of summation terms.Finally, since each summation term varies with both ω ′′ and s by a bounded slope O(N 2 ) (e.g. To summarize, f (ω) can be computed with error O(N −1/2 ) in poly (N ) time.
(2) Steps 2 and 4 in Procedure 3.1 are just calculations for 2-by-2 matrices, so it suffices to show poly (N ) complexity for steps 1 and 3, where in general one wants to compute and M ap x1•••xn defined in a similar way.Since M ap is the difference of (the density matrices of) two MPSs ( 64), ( 72) can be computed by contracting tensors as shown in Fig. 3(b), which takes poly (N ) classical runtime due to poly (N ) bond dimension.This contraction algorithm works for M ap similarly.An alternative way to see this is depicted in Fig. 3(e): each time a qubit gets projected, the remaining qubits keep in an MPS with bond dimensions unchanged, so that one can easily iterate this process.
(3) Since the next measurement basis is always easily calculable as shown in (2), sampling string x with probability p ′ x is classically easy, because any adaptive single-site measurement (e.g.measurement-based quantum computation) on MPS with poly (N ) bond dimension is classically simulable [62,63].The idea is that one can sample the classical distribution p ′ x by outputing the bits x 1 , x 2 , • • • one by one, where x 1 , for example, is generated by a random variable 0, 1 with probabilities ⟨e ap 0 ⟩ Ψ ω ′ , ⟨e ap 1 ⟩ Ψ ω ′ accordingly.These two probabilities are equal to due to the MPS version of ( 51) and ( 64), which are easy to calculate from contracting tensors of the MPS Φ 0 ω ′ described in (2).After sampling N strings x, one gets an estimate for ⟨P⟩ ′ with error ∼ 1/ √ N .This in total takes poly (N ) classical resources.

Sampling of real time evolution
The MPS techniques in the previous section do not generalize to higher dimensions.For example, we used a poly (N ) classical algorithm that samples LOCC measurement on MPS; while 2d tensor network states like the cluster state are hard to sample classically, because they have the potential to perform universal quantum computation.To make progress, in this section we restrict the state to be obtainable by short-time evolution with a local Hamiltonian from a product state.For such states, we give a poly (N ) sampling algorithm under certain conditions.This section can be read independently of the rest of the paper, which will subsequently apply the results of this section to develop an efficient LOCC metrology protocol in a general interaction graph G.
We consider a time-dependent Hamiltonian where each H a is a local Pauli string with a time-dependent coefficient λ a J a (s) satisfying |J a (s)| ≤ J, and |λ a | ≤ 1. 10 We further require the whole Hamiltonian has F Fourier components {ω ν }, i.e., where ω ν does not depend on a.We say two Pauli strings H a and H b (a ̸ = b) overlap if there is a qubit on which they both act nontrivially.Let d be the maximal number of H b s that an H a overlaps with, and k be the maximal number of qubits an H a acts on (i.e., H is k-local).
The system starts from initial product state |0 • • • 0⟩, evolves under H(s) for time t, and is then measured by LOCC in local basis E x introduced in (50a).This process generates a binary string where T means time-ordering.We will show that, under certain conditions, p x is easy to sample on a classical computer if Jt is smaller than a constant.In particular, the frequencies ω ν s can be arbitrarily large comparing to J. We focus on qubit systems, but the results generalize naturally to higher-dimensional local degree-of-freedom, i.e., qudits.In Section 4.5, we show how the results about the time-dependent case (73) can be modified to the time-independent case (11) with an extra local field ωZ, which we no longer assume needs to be small.

Assumption on the local measurement basis
Our rigorous results will hold for the following situation.Suppose the local measurement basis always have a finite overlap with the local initial state |0⟩: for some constant 0 < c m ≤ 1/ √ 2 (subscript m means measurement).The upper bound 1/ √ 2 has to be there in order to satisfy (76) Therefore condition (76) merely says the measurement basis is far from the computational basis (that contains the initial state).This is satisfied by, for example, measurement in the x-basis.
Intuitively, to do useful quantum information processing in short time, one typically wants to measure in a way like (76), instead of in the computational basis {|0⟩ , |1⟩}.The reason is that after short time evolution, the qubit is still close to its initial state |0⟩, so the computational-basis measurement would just output a boring 0 with high probability.Indeed, we will see in the next section that for robust metrology, one always measures in a basis satisfying (76), like the ideal case with x-basis measurement.

Polynomial-time classical sampling of short-time evolved product states
Theorem 4.1.For any LOCC measurement satisfying (76), there is a constant time such that for all t < t * , the measurement string x can be classically sampled in poly (N ) runtime, where the output probability distribution p cl x (cl means "classical") is close to the true one p x in total variation distance: As will be shown in the proof, it is crucial that the whole Hamiltonian has a bounded number F of frequencies; if each local term has its own set of F frequencies that do not agree with those of other terms, one can adjust the proof to achieve a quasi-polynomial O(N log log N ) runtime algorithm.
Proof.The proof closely follows [64], where we showed the imaginary-time version of Theorem 4.1, namely sampling high-temperature thermal states is easy (see also [65]).
As the starting point, sampling the whole string x is reduced to the problem of computing conditional probabilities (marginals) p(x n+1 |x n ) for measurement outcome x n+1 to appear after a previous substring More precisely, the sampling algorithm works as follows: First choose x 1 ∈ {0, 1} from probability p(x 1 ), then choose x 2 based on marginal p(x 2 |x 1 ), then choose x 3 based on p(x 3 |x 1 x 2 ), and so on.Lemma 4 in [64] guarantees that to achieve (79) with the right hand side being N 1−α , it suffices to get an approximation p cl (x n+1 = 0|x n ) with error (p cl (1|x n ) will be just 1 − p cl (0|x n )) For each marginal, we invoke the following Lemma that we prove later.
Lemma 4.2.If t < t * in (78) and (76) holds, the marginal is given by where γ m is bounded: This implies the series (81) converges absolutely.Moreover, γ m can be computed in time is a constant; recall the definitions of d and k below (74).
This Lemma implies a polynomial algorithm for computing p cl (0|x n ) in (80), by truncating the sum to m ≤ M = η log N , where η > α/ log(t * /t).Combined with the sampling-to-computing reduction above, we get the desired sampling algorithm, with runtime O(N 1+cη ) to output one string x.

Proof of Lemma 4.2 using cluster expansion, and its corollary
We need an operator formalism for the proof. 11Namely, each operator O is viewed as a vector |O) in an "operator Hilbert space".For example, the initial density matrix is a direct-product vector in the operator space.Define the inner product between operators which induces the operator 2-norm We rewrite the (forward-in-time) Heisenberg evolution as One can verify that the Liouvillian L(s) is anti-Hermitian in terms of inner product (85).As a result, the final density matrix |ϕ⟩ ⟨ϕ|, as an operator |ϕ), is where iL(s) plays the role of a Hermitian Hamiltonian.
Proof of Lemma 4.2.Using (88), the conditional probability satisfies where the "partition function" Note that we extract the term |⟨0|e x1•••xn0 ⟩| 2 in (89) such that the right hand side vanishes for β = 0, and will become the m ≥ 1 expansion in (81).In (90), E n+1 is the superoperator that multiplies e ′ n+1 to the left of an operator - is a direct-product operator.Thanks to our assumption (76), the denominator in (90) is nonzero.It does not matter that the denominator is exponentially small in n, as we will see.Now compare to Lemma 5 in [64].The conclusions of the lemmas are almost the same, where the value of t * (78) has an extra factor 2c −2k m J −1 comparing to β * in [64], and an extra F appears in (83) for the runtime.On the other hand, the starting point, (89) and (90), are in a similar form to Eqs. (A2) and (A3) in [64].The differences are (1 ) (90) is real-time evolution viewing iL as the Hamiltonian, not imaginary-time e −βH ; (2 ) (90) is a ratio of matrix elements (in the operator space), not a trace; (3 ) evolution in (90) is time-dependent.We show that these differences can be incorporated into the proof of Lemma 5 in [64], which leads exactly to the extra factors in Lemma 4.2.We encourage the reader to get familiar with the proof in [64] first, which relies on the technique of cluster expansion developed in [26,27].
Difference ( 1) is straightforward to tackle, because the proof in [64] does not rely on the fact that the inverse-temperature β is real: it can be replaced by it here.Below we show how to deal with (2 ) and ( 3).
The key quantity to be bounded is the cluster derivative of Z, the simplest case κ = 0 being where is the m-th permutation group, and Due to time-ordering, the integral in (92) is on a simplex T m , whose volume is Vol(T m ) = t m /m!.Since the states |0) , |x n ) are both product states, the matrix element in (92) can be restricted in the support region Supp(V) ⊂ Λ of V: the numerator and denominator factorize (e.g. ( ) so that contributions outside the region Supp(V) cancel.We then bound the restricted numerator and denominator as follows.The numerator is bounded by Cauchy-Schwarz inequality: ) where we used the superoperator norm bounded by Hölder inequality ∥AB∥ 2 ≤ ∥A∥ ∥B∥ 2 : (95) and n V denoting the number of measured qubits in Supp(V).The denominator is bounded by (76): Putting ingredients together, (92) is bounded by Here in the second line, we maximize the expression at the largest Comparing (97) to the bound β m in Eq. (A14) of [64], we see difference ( 2) just invokes the advertised extra factor 2c −2k m J.Here 2 comes from taking commutator in (95), c −2k m comes from the denominator in (92), and J is just fixing the energy unit.(82) then follows from the proof in [64].
To bound the runtime, observe that difference (3 ) only modifies the first step to calculate a given cluster derivative D V log Z.In [64], the first step is to compute Z K (originating from the K-th order β-expansion of e −βH ), where Z := a∈V z a H a with z a being some integers, and K ≤ m is a positive integer.Here, due to time dependence, the first step is to compute where Z(s) := a∈V z a J a (s)H a .Thanks to (74), Z(s) only has F frequencies: so that Since the integral inside the bracket can be done analytically, the overhead incurred by timedependence is merely that instead of a single product of matrices Z K , one needs to compute F K ≤ F m such products. 12Each product Z ν1 • • • Z ν K is no harder to compute than Z K : one simply multiplies the K sparse matrices one by one.( 83) then follows from [64] where one adds an extra factor F m to the runtime.This finishes the proof of Lemma 4.2.
In the next Section, we will also consider the following quantity similar to the form of p(0|x n ) in (89), where Although the operator |1, 0) is not a quantum state so that p(0|x n ) is not interpreted as a marginal probability, Lemma 4.2 generalizes to yield a cluster expansion for p(0|x n ) because of the following: In the above proof, we only need two properties of the initial operator (84): its direct-product structure, and its nonvanishing local overlap with the previous measurement basis |x n ).These properties also hold for |1, 0), because |⟨0|e where γ m has the same bound (82) for some constants c cor , c ′ cor > 0, where O i , O j act on i and j respectively with ∥O i ∥ , ∥O j ∥ ≤ 1.
It is well-known that one can perform universal measurement-based quantum computation (MBQC) on the 2d cluster state, which can be prepared from an initial product state by Θ(1)-time Hamiltonian evolution.Theorem 4.4 suggests that the evolution time has to be Ω(1) in order for the final state to be a universal MBQC resource.This constraint also holds for large-distance teleportation using measurements [66], because teleportation is equivalent to building long-range correlation [58].It is an interesting question whether the condition (76) can be removed, which may demand deeper understanding of the cluster expansion itself.

Eliminating local fields via the interaction picture
Results in this section generalize to the case (11) where the Hamiltonian H ω is time-independent with a local field ωZ.For example, sampling is easy as long as Jt is smaller than a constant, even if ω is much larger than the local strength J of V .The reason is as follows.In the interaction picture, where T is anti-time ordering.Acting on the initial state |0 • • • 0⟩, the e −iωtZ part becomes trivial, so the evolution is just generated by a time-dependent Hamiltonian H(s) = V (t − s).Since interaction V has finite range and Z is integer-valued, V (s) has finite number of frequencies each being an integer multiple of ω.Thus the situation reduces to the case (73) with (74), and we have the following:

LOCC protocol in higher dimensions
In Section 3, we presented a provably efficient LOCC measurement protocol for Heisenberg-limited metrology in 1d, using the fact that MPS with small bond dimension is easy to sample from.In this section, we establish similar results for general graphs, using the sampling result in Section 4.
In contrast to 1d, here we deal with a more restricted class of initial states, since correlation decay (25) no longer guarantees easiness of sampling. 13For simplicity, we assume the initial state is just the ideal GHZ state.The following results can be generalized to other initial states, for example, sufficiently short time evolution on |GHZ⟩, or rotated version of GHZ states.The metrology protocol is similar to Procedure 3.3.The only difference is how to approximate E x determined by (56).In 1d, we used MPS approximations to get E ap x ; here in the next subsection, we use cluster expansions developed in the previous section (which also applies to 1d as a special case).Note that although one still has a tensor network representation for ψ ω ′ in higher dimensions [59], it is not manifestly easy to sample so is not used here.

Finding the next LOCC measurement basis via cluster expansion: assuming no previous errors
We want to compute an approximation  (76), then the next one can be computed efficiently with small error.We will analyze the accumulated errors, as well as justify (76), in the next subsection.
Observe that the key quantity to compute in Procedure 3.1 is the 2-by-2 matrix 14 where ρ 0 x1•••xn for example is the normalized reduced density matrix of n + 1 after measuring the previous qubits 1, • • • , n, in the time-evolved all-zero (instead of GHZ) state ϕ 0 ω ′ .One of its matrix elements is given by where p α (0|x n ) is just the marginal probability in (81) that is shown to obey the cluster expansion!Note that (81) is exactly the case α = 0, and is straightforwardly generalized to the case α = 1 where one just changes the initial state to be all-one.Furthermore, since the proof of Lemma 4. M x1•••xn can be computed using cluster expansion in a similar way: Explicit calculation similar to (108) yields where  (76).More explicitly, we use the following procedure based on cluster expansion that mimics Procedure 3.1.We focus on M, and M is treated analogously. 14From now on we normalize the matrix such that Mx is a normalization factor (which holds for both α = 0, 1 because the previous basis are chosen to zero-diagonalize M old x 1 ••• such as (58)).This normalization makes the new matrix Mx 1 •••xn to be of order 1 (instead of exponentially small for M old

Compute the reduced density matrices ρ α
∅ on spin 1 in the state ϕ α ω ′ (whose initial state is all-α instead of GHZ).Using the cluster expansion algorithm above, a polynomial-time computation outputs an approximation ρ α,ap ∅ (here subscript ∅ means no previous substring measured) with 1/poly (N ) error for all of its matrix elements.This yields which approximates M ∅ with 1/poly (N ) error.Find an approximation M ap ∅ for M ∅ analogously.x based on the cluster expansion algorithm.There exists positive constants c M , c m (here subscript M stands for "metrology") independent of N such that the following holds.If t < c M /J, the local basis elements are far from the computational basis

Find an orthonormal basis
so that Procedure 5.1 only needs poly (N )-time classical computation.Furthermore, the result E ap x is close to E x in the sense that Proof.Let ϵ = 1/poly (N ) be the desired accuracy on the right hand side of (116).We have shown in the previous subsection that (116) holds for the first step n = 0 ((115) also holds trivially), because there is no previously computed basis.We prove for the later measurements n = 1, 2 , the basis cannot be close to the computational basis.This argument can be shown explicitly by Here in the first line, O(Jt) comes from the m ≥ 1 expansion of ( 81) and does not depend on n, N .
To get the second line, we used the induction hypothesis (115) for the previous measurements that enables a polynomial-time computation for ρ where γ ap m is the coefficient of the cluster expansion for (a matrix element of) are O(1) constants, kmϵ comes from the fact that there are at most km local projections involved for a cluster of size m, with error ϵ for each projection according to our induction hypothesis.Then the total error of the conditioned local density matrix is bounded by where c ′′ * is a constant independent of N .Then if Jt < 1/2 max(c * , c ′′ * ), which we ensure by choosing  116).This finishes the inductive proof.

Main result
Proposition 5.2 implies the metrology protocol given in Procedure 3.3 is efficient and achieves HL.Therefore, we have an explicit protocol to achieve HL metrology in the presence of local interactions for arbitrary interaction graphs.Proof.We show the facts established in the 1d Theorem 3.4 also apply here.
(0) (67) holds, so that the estimation of ω is unbiased at large N .Following the 1d proof of (67), here we only need to show that the error of local basis (116) induces O(1/ √ N ) error for any observable like P. 15 We choose the right hand side of (116) to scale as N −3/2 .With this bound and a similar argument of the sampling-to-computing reduction around (80), we know the measured probability distribution p x is close to the ideal one (with E x ) with total variational distance O(1/ √ N ).This error bound also holds for observables, and is thus absorbed in the last term in (67).
The classical computation of the following quantities are poly (N ): (1) f (ω) for any ω ∈ I ω ′ : The proof of part (1 ) of Theorem 3.4 still applies, as it only relies on a Lieb-Robinson bound, which holds for general graphs [55].
(2) The local basis by Procedure 5.1 for a given trajectory x: This is guaranteed by Proposition 5.2.

Robustness against small perturbations
In previous sections, we did not require weak interaction If (121) holds, the Z field is dominant in H ω , so (24) implies the two parts ϕ 0 ω and ϕ 1 ω have a large energy difference with respect to H ω .By energy conservation, the two parts keep an extensive difference in Z polarization forever.Formalizing this idea, we show that the HL is robust to much longer time scales, compared to Jt < c in in previous sections (see Theorem 2.1).Substituting ( 35) by ( 125), the rest of the proof of Theorem 6.1 follows verbatim to establish Theorem 6.1.In particular, the errors O( √ N ) in (28) and O(1/ √ N ) in (30) come from the fact that g(Jt) in (43) does not scale with N , which holds for any constant t = O(N 0 ).Theorem 6.1 implies HL for QFI following Corollary 2.3.In terms of achieving HL by the LOCC measurement protocol, 1d results in Section 3 continue to hold here for any constant t independent of N , because the two parts of the state remain to be MPS with poly (N ) bond dimensions.However, efficiency of the protocol in higher dimensions is not guaranteed if Jt ≳ 1.
In the proof of Theorem 6.1, we used bound (124) that simply comes from energy conservation.If the system is finite-dimensional and (121) holds, there is actually another mechanism called prethermalization [28,67,68,69] that yields a similar bound (127) below.In a nutshell, although the U(1) rotational symmetry along the z direction is explicitly broken by the interaction V , prethermalization theory proves that there is an approximately conserved U(1)-charge (which is a dressed version of Z) on time scales t ≪ t pre , where the prethermalization time (after which the system thermalizes with no conserved U(1)-charge) t pre := J −1 exp(c pre ω/J), (126) which is exponentially large in ω/J ≫ 1.Here c pre is a constant independent of N, ω, J. Physically, prethermalization roughly says that an initial eigenstate of Z cannot decay into other charge sectors in short time.This is because the other charge sectors have at least ∆E = ω energy difference in time-dependent perturbation theory of J/ω, so at any finite order the decay channel is off-resonant and ineffective.To overcome this energy gap by gaining energy from the perturbation V , one has to go to non-perturbatively large order k * ∼ ω/J, where the local decay rate is exponentially small in k * .This leads to the large prethermalization time (126) for H ω .For more general class of models with a many-body gap, this dynamical stability under perturbation is made precise in [70], albeit with a slightly weaker bound on t pre .
Applying prethermalization to the metrology setting, we have e isHω Ze −isHω − Z ≤ c Z pre N J/ω, ∀s ≤ t pre , (127) from combining Theorem 3.1 and 3.2 in [28], where the constant c Z pre is determined by geometry of the interaction graph.If c Z pre ≤ 2, (127) would be a stronger bound than (124) in the prethermalization time window, which further gives a tighter bound than (123) on the slope of f (ω).
In higher dimensions, it is interesting to ask whether an evolved product state is easy to sample before prethermalization time t pre .This might be tackled by combining the Schrieffer-Wolff transformation involved in prethermalization [70] and the cluster expansion techniques in Section 4. If such a polynomial-time classical sampler exists, we expect the LOCC measurement protocol would be efficient to implement in higher dimensions, for t < t pre . is a product of almost local operators, because X j,ω = R † ω X j R ω evolved by unitary R ω is mostly supported in a finite set of spins near site j.If one measures observable X ω in the final state |ψ ω ⟩, then it is equivalent to measure X in the unperturbed protocol.In practice, one can apply unitary R ω to |ψ ω ⟩ to effectively time-reverse the dynamics, and then measure X by projective measurement to the x-basis.
However, the above measurement does only local estimation because X ω depends on the unknown parameter ω.For global estimation, instead, we propose to measure observable X ω ′ determined by prior knowledge ω ′ (9).By bounding the difference between X ω ′ and X ω , we show such measurement achieves HL for sufficiently small Jt. and V ′ (s) is defined similarly with ω replaced by ω ′ .In the second and third lines of (139), we have used the Duhamel identity.(136) then follows from (10).
The time window (138) is O(J −1 ) for almost all ω ∈ I ω ′ except if ω is very close to ω ′ ±π/(4N t); the latter case can be avoided by slightly adjusting N or t.After measuring g(ω) := ⟨ψ ω | X ω ′ |ψ ω ⟩ quantumly, one cannot compare it to the ideal function cos(2N ωt) to read out ω due to the error (136).Instead, one needs to classically calculate g(ω) as a function of ω to compare the quantum result with.In general dimensions, there are algorithms with runtime being quasi-polynomial in N (poly (N ) for ≤ 2d) [83,84,85,86] for such classical simulation, known as the quantum mean value problem, if Jt = O(log N ).To conclude, the X ω ′ measurement protocol satisfies HL and requires (quasi-)polynomial classical computation to implement.

Figure 3 :
Figure 3: (a) Tensor representation of MPS |A⟩ for N = 5.(b) Tensor representation for ⟨B|O|A⟩, where O = O1 ⊗ O2 ⊗ • • • is a string operator.It can be contracted along the spatial direction in poly (N ) memory and runtime, if the bond dimensions of the two MPSs are poly (N ).More precisely, at the n-th step one has contracted the tensors on the leftmost n qubits to a matrix C (n) of dimension poly (N ); C (n) is then contracted with tensors A (n+1) , On+1, B (n+1) to yield C (n+1) , which costs poly (N ) overhead.When finding the measurement basis for the LOCC protocol, we need e.g.O = ex 1 ⊗ • • • ⊗ ex 1 •••xn ⊗ On+1 to calculate Mx 1 •••xn .(c) 3-layer circuit unitary U ′ thatapproximates e −itH ω ′ with error ϵ.Each gate acts on at most L = O(log(tN/ϵ)) neighboring spins, so U ′ maps MPS to MPS preserving the condition that the bond dimension is poly (N ).Adapted from[59].(d) Acting a 2-layer unitary circuit on an MPS, the bond dimension is increased at most by a factor of the local physical dimension.This can be seen by viewing the shaded region as the new matrix for the final state, whose legs are shown by red.(e) A sketch of why MPS is easy to sample: Whenever a qubit is measured, that qubit can be contracted and the neighboring matrix gets updated.The state is still an MPS on one fewer qubit, with the bond dimension unchanged.

( 71 )
reduces to expectation of local Z j in finite time evolution from |α⟩ in .According to Lieb-Robinson bound(44), to simulate ⟨e isH ω ′′ Z j e −isH ω ′′ ⟩ 0 with error O(N −1/2 ),9 one can truncate the dynamics in a region of size O(log N ) around j. Furthermore, |α⟩ in can be replaced by MPS |α⟩ IN , because the error (61) contributes O 1/ √ N to (71).It takes poly (N ) classical runtime to contract the matrices to get the initial density matrix of |α⟩ IN on the O(log N )-size region.Evolving the state and taking expectation of Z j invokes another poly (N ) overhead.So the total complexity for computing the last line of ( for |e x1•••xn−10 ⟩ and |e x1•••xn−11 ⟩ that comprise a local orthonormal basis.Due to the same reason, (76) also implies

Theorem 4 . 5 .
Suppose product state |0 • • • 0⟩ is evolved under Hamiltonian (11) for time t, after which LOCC measurement satisfying (76) is performed.There exists a constant c + > 0 (independent of N ) determined by geometry and c m , such that for t < c + /J, there is a poly (N ) classical sampling algorithm achieving(79).

Theorem 5 . 3 .
Suppose |GHZ⟩ is evolved under Hamiltonian(11) for time t.For t < c M /J where c M is defined in Proposition 5.2, the LOCC measurement protocol described in Procedure 3.3 (with the basis determined by Procedure 5.1) achieves HL and requires only poly (N ) classical computation.

Proposition 5.2. Suppose
(11)p x1 ⟩ for spin 1 s.t.We want to show that Procedure 5.1 is accurate and efficient, in the sense that the measurement basis automatically satisfy (76) enabling the cluster expansion algorithm, and that the errors accumulate in a mild way.|GHZ⟩ is evolved under Hamiltonian(11)for time t.Procedure 3.1 defines a LOCC measurement basis E x for metrology, while Procedure 5.1 yields an approximate basis E ap as an approximation of M x1 .Compute M ap x1 that approximates M x1 similarly.4. Find an orthonormal basis {|e ap x1x2 ⟩ : x 2 = 0, 1} for spin 2 s.t.