A rapidly mixing Markov chain from any gapped quantum many-body system

We consider the computational task of sampling a bit string $x$ from a distribution $\pi(x)=|\langle x|\psi\rangle|^2$, where $\psi$ is the unique ground state of a local Hamiltonian $H$. Our main result describes a direct link between the inverse spectral gap of $H$ and the mixing time of an associated continuous-time Markov chain with steady state $\pi$. The Markov chain can be implemented efficiently whenever ratios of ground state amplitudes $\langle y|\psi\rangle/\langle x|\psi\rangle$ are efficiently computable, the spectral gap of $H$ is at least inverse polynomial in the system size, and the starting state of the chain satisfies a mild technical condition that can be efficiently checked. This extends a previously known relationship between sign-problem free Hamiltonians and Markov chains. The tool which enables this generalization is the so-called fixed-node Hamiltonian construction, previously used in Quantum Monte Carlo simulations to address the fermionic sign problem. We implement the proposed sampling algorithm numerically and use it to sample from the ground state of the Haldane-Shastry Hamiltonian with up to 56 qubits. We observe empirically that our Markov chain based on the fixed-node Hamiltonian mixes more rapidly than the standard Metropolis-Hastings Markov chain.

The distribution π can be viewed as the zero-temperature quantum analogue of a Boltzmann distribution describing classical spins with few-spin interactions.
To accomplish this goal one may construct a suitable quantum-to-classical mapping that converts a quantum Hamiltonian with ground state ψ into a classical Markov chain M with steady distribution π(x) = |⟨x|ψ⟩|^2. A possible choice for M is a Metropolis-Hastings (MH) Markov chain with local updates. In a simple, typical setting, each step of the MH chain flips a randomly chosen bit (or a subset of bits) of x to propose a candidate next state y. The proposed state y is accepted with probability min{1, π(y)/π(x)} to ensure the detailed balance condition. As was shown in Ref. [5], the mixing time of the MH chain can be upper bounded as in Eq. (2), where the Õ notation hides certain logarithmic factors and s is a sensitivity parameter defined as

s = max_{x≠y} |⟨y|H|x⟩⟨y|ψ⟩/⟨x|ψ⟩|.

For the widely studied family of sign-problem free Hamiltonians the sensitivity parameter can be bounded as [5]

s ≤ 2∥H∥ (if H is sign-problem free).   (3)

For a local Hamiltonian of the type we consider we have ∥H∥ = O(poly(n)), and so Eqs. (2,3) give a polynomial upper bound on the mixing time of the MH chain. Since each step of the chain makes use of a ratio π(y)/π(x), this constitutes an efficient reduction from sampling to computing (ratios of) probabilities. In other words, we obtain a positive answer to Q2 for sign-problem free Hamiltonians. If we further specialize to frustration-free and sign-problem free Hamiltonians then the requisite ratios of probabilities can be computed efficiently [7] and we get a partial answer to Q1 as well.
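As an illustration, the MH update rule above can be sketched in a few lines. The vector `psi` below is a hypothetical positive ground state on 3 bits (not derived from any particular Hamiltonian), standing in for the amplitude computation subroutine:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # toy system: 3 bits, 8 basis states

# Hypothetical positive ground-state vector (sign-problem free case),
# normalized so that pi(x) = psi(x)^2 is a probability distribution.
psi = rng.random(2**n) + 0.1
psi /= np.linalg.norm(psi)
pi = psi**2

def amplitude_ratio(x, y):
    """Stand-in for the amplitude computation subroutine: <y|psi>/<x|psi>."""
    return psi[y] / psi[x]

def mh_step(x):
    # Propose flipping one uniformly random bit of x.
    y = x ^ (1 << rng.integers(n))
    # Accept with probability min{1, pi(y)/pi(x)} = min{1, ratio^2}.
    if rng.random() < min(1.0, amplitude_ratio(x, y) ** 2):
        return y
    return x

# Run the chain and compare the empirical distribution with pi.
counts = np.zeros(2**n)
x = 0
for _ in range(200000):
    x = mh_step(x)
    counts[x] += 1
empirical = counts / counts.sum()
tv = 0.5 * np.abs(empirical - pi).sum()
print(f"total variation distance after 2*10^5 steps: {tv:.4f}")
```

Each step touches π only through the ratio π(y)/π(x), which is why sampling reduces to computing ratios of probabilities.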
Unfortunately, for more general local Hamiltonians, those that may not be sign-problem free, it is unknown whether the sensitivity s admits a polynomial upper bound in terms of n and 1/γ, and significant differences may thwart this approach altogether. In the sign-problem free case the Perron-Frobenius theorem implies that the ground state amplitudes ⟨x|ψ⟩ are nonnegative in the computational basis. For more general local Hamiltonians the entries of the ground state wavefunction may have nontrivial relative phases, but the MH chain ignores this information. Indeed, this chain depends only on the ratios of probabilities π(y)/π(x). For these reasons it appears unlikely that the MH chain can be shown to mix rapidly for general local Hamiltonians with an inverse polynomial spectral gap.
To proceed, we use a quantum-to-classical mapping based on a proposal from Ref. [8] which was originally introduced to circumvent the fermionic sign problem in Quantum Monte Carlo simulations [9,10]. This construction maps any k-local n-qubit Hamiltonian H with real matrix elements, unique ground state ψ, and a spectral gap γ to a new n-qubit Hamiltonian F, defined in Eq. (12), such that: (i) F and H have the same ground state ψ, (ii) the spectral gap of F is at least γ, and (iii) the Hamiltonian F is sign-problem free, modulo a simple basis change; see Section 1.1 for details. Following Ref. [8] we refer to F as a fixed-node Hamiltonian. Importantly, matrix elements of F are efficiently computable given an efficient subroutine for computing the ratio of amplitudes x, y ⟼ ⟨y|ψ⟩/⟨x|ψ⟩.
We refer to the map x, y ⟼ ⟨y|ψ⟩/⟨x|ψ⟩ as the amplitude computation subroutine. We are thus led to consider natural variants of the questions Q1, Q2 where probabilities are replaced by amplitudes. This allows us to exploit the information encoded in the amplitudes' relative phase, which is an important feature of the ground state. We note that the model where a quantum state can be accessed via the amplitude computation subroutine has been recently studied in [11]. Since the fixed-node Hamiltonian is sign-problem free, we can sample the ground state distribution π(x) using the MH chain and upper bound the mixing time of the chain using Eqs. (2,3). Unfortunately this is not useful because, as we shall see below, the norm of the fixed-node Hamiltonian can be unbounded.
Instead, we introduce a quantum-to-classical mapping that yields a continuous-time Markov chain rather than a regular discrete-time chain. Recall that a continuous-time Markov chain (CTMC) with a state space S defines a family of probability distributions π_t(x), where t ≥ 0 is the evolution time and x ∈ S is the state reached at time t. The time evolution of π_t(x) is governed by a differential equation

d/dt π_t(x) = Σ_{y∈S} ⟨x|G|y⟩ π_t(y),   (4)

where G is a generator matrix. Rows and columns of G are labeled by elements of S. A matrix element ⟨x|G|y⟩ with x ≠ y can be viewed as the rate of transitions from y to x. Accordingly, all off-diagonal elements of G must be non-negative. The normalization condition Σ_{x∈S} π_t(x) = 1 is satisfied for all t ≥ 0 as long as each column of G sums to zero. A solution of Eq. (4) has the form π_t(x) = ⟨x|e^{Gt}|x_in⟩, where x_in ∈ S is the starting state at time t = 0 and e^{Gt} denotes the matrix exponential.
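The defining conditions on G (non-negative off-diagonal rates, zero column sums) are exactly what makes e^{Gt} stochastic. A minimal numerical sketch, with an arbitrary random generator on a toy 6-element state space:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
d = 6  # toy state space S = {0, ..., 5}

# Build a random valid generator: non-negative off-diagonal rates,
# each column summing to zero.
G = rng.random((d, d))
np.fill_diagonal(G, 0.0)
G -= np.diag(G.sum(axis=0))  # set <x|G|x> = -sum_{y != x} <y|G|x>

for t in [0.1, 1.0, 10.0]:
    P = expm(G * t)                          # matrix of transition probabilities
    assert np.all(P >= -1e-12)               # entries are non-negative
    assert np.allclose(P.sum(axis=0), 1.0)   # each column sums to one
print("e^{Gt} is a stochastic matrix for all tested t")
```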
Our CTMC is defined by a generator G which is a suitably rescaled version of the fixed-node Hamiltonian F associated with H. By design, it has a steady distribution π(x) = lim_{t→∞} π_t(x). It is given by

⟨x|G|y⟩ = max{0, −⟨x|H|y⟩⟨x|ψ⟩/⟨y|ψ⟩}   (5)

for x ≠ y. Here and below we assume for simplicity that

⟨x|ψ⟩ ≠ 0 for all x ∈ {0,1}^n.   (6)

Note that Eq. (5) also determines the diagonal matrix elements of G, since each column of G sums to zero (due to the normalization condition). Our main results are as follows.
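For intuition, the sketch below builds G from Eq. (5) for a small random real symmetric matrix standing in for H (an arbitrary toy example; locality plays no role here) and checks that π(x) = ⟨x|ψ⟩^2 is its steady state:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
d = 8  # toy Hilbert space dimension

# Random real symmetric stand-in for H, with (generically) a unique ground state
H = rng.standard_normal((d, d))
H = (H + H.T) / 2
evals, evecs = np.linalg.eigh(H)
psi = evecs[:, 0]          # ground state; real entries, generically all nonzero
pi = psi**2                # target distribution pi(x) = <x|psi>^2

# Off-diagonal rates from Eq. (5): <x|G|y> = max{0, -<x|H|y> psi(x)/psi(y)}
G = np.maximum(0.0, -H * np.outer(psi, 1.0 / psi))
np.fill_diagonal(G, 0.0)
G -= np.diag(G.sum(axis=0))  # diagonal fixed by the zero-column-sum condition

assert np.allclose(G @ pi, 0.0, atol=1e-8)    # pi is a steady state
# Running the chain long enough from a deterministic start recovers pi
gap = abs(np.sort(np.linalg.eigvals(G).real)[-2])  # spectral gap of the chain
p_t = expm(G * (30.0 / gap)) @ np.eye(d)[0]
assert np.allclose(p_t, pi, atol=1e-5)
print("steady state of the fixed-node chain matches |<x|psi>|^2")
```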
Theorem 1 (Rapid mixing). Let H be a k-local n-qubit Hamiltonian with real matrix elements in the standard basis, unique ground state ψ, and a spectral gap γ. Then a continuous-time Markov chain with the state space {0,1}^n and a generator matrix G defined in Eq. (5) has a unique steady distribution π(x) = |⟨x|ψ⟩|^2 and obeys

∥π_t − π∥_{t.v.} ≤ (1/2) π(x_in)^{−1/2} e^{−γt}   (7)

for any t ≥ 0 and any starting state x_in ∈ {0,1}^n. Here π_t(x) = ⟨x|e^{Gt}|x_in⟩ is the distribution achieved by the Markov chain at a time t.
As we show below (Lemma 1), the restriction to Hamiltonians H with real matrix elements is not essential and can be avoided by adding one ancillary qubit. Theorem 1 shows that we may approximately sample from π by running the continuous-time Markov chain for a total time T ∼ γ^{−1} log(π(x_in)^{−1}). However it is not immediately clear how to simulate this process using resources polynomial in T, because of the significant caveat that the norm of G may be large. This may lead to many transitions of the chain occurring in a very short time, and it prevents us from approximating the continuous-time chain by a discrete-time one obtained by naively discretizing the interval [0, T].
Our saving grace is that we are able to establish a mild upper bound on the mean number of transitions of the chain G within a given interval, when the starting state is sampled from the steady distribution. This allows us to directly simulate the Markov chain of Theorem 1 using a truncated version of the well-known Gillespie's algorithm [12], in which we impose an upper limit on the total number of transitions of the chain. In this way we obtain the following result.
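The simulation procedure can be sketched as follows; this is a minimal rendition of Gillespie's algorithm with a flip budget standing in for the truncation, checked on a toy two-state chain with arbitrary illustrative rates:

```python
import numpy as np

rng = np.random.default_rng(3)

def gillespie(G, x_in, t, max_flips=10**6):
    """Sample the CTMC state at time t, starting from x_in at time 0.

    Returns (final_state, number_of_flips). The flip budget stands in for
    the truncation used later in the proof of Theorem 2."""
    x, tau, flips = x_in, 0.0, 0
    while True:
        rate = -G[x, x]                      # total exit rate from state x
        if rate <= 0:                        # absorbing state: stay forever
            return x, flips
        tau += rng.exponential(1.0 / rate)   # holding time is exponential
        if tau >= t:
            return x, flips
        if flips >= max_flips:
            raise RuntimeError("flip budget exceeded")
        probs = np.maximum(G[:, x], 0.0)     # jump rates out of x
        probs[x] = 0.0
        x = rng.choice(len(probs), p=probs / rate)
        flips += 1

# Two-state chain: rate 1 for 0 -> 1 and rate 2 for 1 -> 0,
# so the stationary distribution is (2/3, 1/3).
G = np.array([[-1.0, 2.0],
              [1.0, -2.0]])
samples = [gillespie(G, 0, 5.0)[0] for _ in range(10000)]
freq1 = np.mean(samples)
print(f"empirical Pr(state 1) = {freq1:.3f} (exact: 1/3)")
```

The cost of one run is proportional to the number of flips, which motivates the flip-count bound established below.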
Theorem 2 (Ground state sampling). Let π_* = min_x π(x). There exists a classical randomized algorithm that takes as input a precision ϵ > 0 and a starting state x_in ∈ {0,1}^n, makes at most

O(ϵ^{−1} γ^{−1} n^{2k} ∥H∥ log(ϵ^{−1} π_*^{−1/2}))

calls to the amplitude computation subroutine, and either outputs a bit string y ∈ {0,1}^n or declares an error. Let S_ϵ ⊆ {0,1}^n be the set of starting states x_in for which the algorithm declares an error with probability at most ϵ/4. The set S_ϵ is nonempty. Moreover, if the algorithm is run with starting state x_in ∈ S_ϵ and does not declare an error, then its output y is sampled from a distribution ϵ-close to π.
The aforementioned caveat that G has large matrix elements is also directly related to the additional requirement above that we are provided with a good starting state x_in. Strictly speaking, Theorem 2 falls short of giving an efficient reduction from sampling to computing amplitudes of the ground state, since it requires this extra input. However, a good starting state x_in can at least be verified using polynomial resources (and the amplitude computation subroutine): given x ∈ {0,1}^n and ϵ > 0 we can decide with high probability whether or not x ∈ S_ϵ by running the above algorithm O(ϵ^{−2}) times with starting state x, and using the results to compute an estimate of the probability that the algorithm declares an error.
In Section 1 we describe the quantum-to-classical mapping in detail and we prove Theorems 1 and 2. Then in Section 2 we demonstrate our algorithm for a concrete example (the Haldane-Shastry spin chain) where the amplitudes of the ground state are efficiently computable, and we compare our approach with the Metropolis-Hastings algorithm.
1 Quantum-to-classical mapping

In this section we prove Theorems 1 and 2.
In the following, for any matrix M with real eigenvalues we write λ_i(M) for the i-th smallest eigenvalue of M. If M is an n-qubit operator then i = 1, 2, …, 2^n. As described in the introduction, we shall consider an n-qubit, k-local Hamiltonian with a unique ground state |ψ⟩ and we are interested in sampling from the distribution π(x) = |⟨x|ψ⟩|^2. We shall assume Eq. (6) holds, i.e., |⟨x|ψ⟩| > 0 for all x ∈ {0,1}^n.

The continuous-time Markov chain
We first establish the following lemma, which shows that we may restrict our attention to Hamiltonians with real matrix elements in the standard basis.

Lemma 1 (Reduction to real Hamiltonians).
Let H be a local Hamiltonian with unique ground state ψ and spectral gap γ > 0, satisfying Eq. (6). There is an O(n^k)-sparse (n+1)-qubit Hamiltonian H_R with unique ground state |ϕ⟩ = Re(|ψ⟩)|0⟩ + Im(|ψ⟩)|1⟩ and spectral gap at least min{1, γ}. The j-th nonzero entry of H_R in a given row can be computed using one call to the amplitude computation subroutine and efficient classical computation.
Proof. Suppose H is an n-qubit, k-local Hamiltonian with a unique ground state ψ such that ⟨x|ψ⟩ ≠ 0 for all x. Let us fix the global phase of ψ such that ⟨0^n|ψ⟩ ∈ R. We may write

H = A + iB

for real matrices A, B which have at most O(n^k) nonzero entries in each row; here A = A^T is symmetric and B = −B^T is antisymmetric since H is Hermitian. Now adjoin one ancilla qubit and consider the Hamiltonian

H′ = A ⊗ I + B ⊗ (−iY),

where Y is the Pauli matrix acting on the ancilla. Note that H′ is a real symmetric matrix. Let us suppose that |ϕ⟩ is an eigenvector of H with eigenvalue λ. Write |ϕ⟩ = |a⟩ + i|b⟩ where a, b are real vectors, and consider the (n+1)-qubit states

|ϕ_0⟩ = |a⟩|0⟩ + |b⟩|1⟩,   |ϕ_1⟩ = −|b⟩|0⟩ + |a⟩|1⟩.

Observe that |ϕ_0⟩ is orthogonal to |ϕ_1⟩ and that each of these states is an eigenstate of H′ with eigenvalue λ. By letting |ϕ⟩ range over all 2^n eigenvectors of H we obtain a complete set of 2·2^n eigenvectors for H′. In particular, the Hamiltonian H′ has the same spectral gap as H but has a groundspace spanned by two eigenstates ψ_0, ψ_1.
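The doubling construction can be sanity-checked numerically. The sketch below uses a random dense Hermitian matrix as a stand-in for H and takes H′ = A⊗I + B⊗(−iY), one concrete choice consistent with the properties stated in the proof:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
# Random complex Hermitian H = A + iB (A symmetric, B antisymmetric, both real)
H = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = (H + H.conj().T) / 2
A, B = H.real, H.imag

I2 = np.eye(2)
minus_iY = np.array([[0.0, -1.0], [1.0, 0.0]])   # -iY in the |0>,|1> basis
Hp = np.kron(A, I2) + np.kron(B, minus_iY)       # real symmetric H'

assert np.allclose(Hp, Hp.T)
# The spectrum of H' is the spectrum of H with every eigenvalue doubled
ev_H = np.linalg.eigvalsh(H)
ev_Hp = np.linalg.eigvalsh(Hp)
assert np.allclose(ev_Hp, np.repeat(ev_H, 2))
# One of the two ground states of H' is Re(psi)|0> + Im(psi)|1>
evals, evecs = np.linalg.eigh(H)
psi = evecs[:, 0]
phi0 = np.kron(psi.real, [1, 0]) + np.kron(psi.imag, [0, 1])
assert np.allclose(Hp @ phi0, ev_H[0] * phi0)
print("H' is real symmetric with the doubled spectrum of H")
```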
In order to get rid of this degeneracy while preserving the spectral gap, we add a suitable positive semidefinite term to H′. For each x ∈ {0,1}^n write the complex phase of ⟨x|ψ⟩ as

⟨x|ψ⟩/|⟨x|ψ⟩| = cos(θ_x) + i sin(θ_x),   θ_x ∈ R.

Then consider a Hamiltonian

H′′ = H′ + Σ_{x∈{0,1}^n} |x⟩⟨x| ⊗ |v_x⟩⟨v_x|,   |v_x⟩ = −sin(θ_x)|0⟩ + cos(θ_x)|1⟩.

Note that H′′ has O(n^k) nonzero entries in each row and each one can be computed efficiently using one call to the amplitude computation subroutine. Now observe that

ψ_0 = Σ_x |⟨x|ψ⟩| (cos(θ_x)|x⟩|0⟩ + sin(θ_x)|x⟩|1⟩),   ψ_1 = Σ_x |⟨x|ψ⟩| (−sin(θ_x)|x⟩|0⟩ + cos(θ_x)|x⟩|1⟩),

where we used the definition of θ_x. Therefore the added term annihilates ψ_0 and acts as the identity on ψ_1. From this we see that ψ_1 is an eigenvector of H′′ with eigenvalue λ_1(H) + 1. Since H′′ ≥ H′, all other eigenvalues of H′′ are at least λ_2(H). Therefore H′′ has unique ground state ψ_0 and its spectral gap is at least as large as min{1, γ}.

Proof of Theorem 1. Our main technical tool is the so-called effective fixed-node Hamiltonian proposed by Ceperley et al [8]. It can be viewed as a method of "curing" the sign problem in Quantum Monte Carlo simulations. The method is applicable whenever amplitudes of the ground state can be efficiently computed.
As discussed above, in light of Lemma 1 we shall assume without loss of generality that H has real matrix elements and a unique ground state with real entries in the standard basis. Define sets

S_+ = {(x, y) : x ≠ y and ⟨x|H|y⟩⟨x|ψ⟩⟨y|ψ⟩ > 0},   (10)
S_− = {(x, y) : x ≠ y and ⟨x|H|y⟩⟨x|ψ⟩⟨y|ψ⟩ ≤ 0}.   (11)

Here x, y ∈ {0,1}^n are basis states. Define a fixed-node Hamiltonian F with matrix elements

⟨x|F|y⟩ = ⟨x|H|y⟩ if (x, y) ∈ S_−,
⟨x|F|y⟩ = 0 if (x, y) ∈ S_+,   (12)
⟨x|F|x⟩ = ⟨x|H|x⟩ + Σ_{y : (x,y)∈S_+} ⟨x|H|y⟩⟨y|ψ⟩/⟨x|ψ⟩.

Note that F is stoquastic (sign-problem free) modulo a change of basis |x⟩ → sign(⟨x|ψ⟩)|x⟩.
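Eq. (12) is easy to check on a small example. The sketch below builds F from a random real symmetric stand-in for H and verifies numerically the claims of Lemma 2 below (same ground state and energy, no smaller spectral gap):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 8
H = rng.standard_normal((d, d))
H = (H + H.T) / 2
evals, evecs = np.linalg.eigh(H)
psi = evecs[:, 0]  # ground state; generically all entries nonzero

# S+ = pairs x != y with <x|H|y> psi(x) psi(y) > 0 (the sign-violating entries)
sign_violating = (H * np.outer(psi, psi) > 0) & ~np.eye(d, dtype=bool)

# Fixed-node Hamiltonian, Eq. (12): zero out the sign-violating off-diagonal
# entries and move them onto the diagonal as a potential term.
F = np.where(sign_violating, 0.0, H)
for x in range(d):
    F[x, x] = H[x, x] + sum(H[x, y] * psi[y] / psi[x]
                            for y in range(d) if sign_violating[x, y])

# Lemma 2: same ground state and energy, and gap(F) >= gap(H)
assert np.allclose(F @ psi, evals[0] * psi)
evF = np.linalg.eigvalsh(F)
assert abs(evF[0] - evals[0]) < 1e-8
assert evF[1] - evF[0] >= (evals[1] - evals[0]) - 1e-8
print("fixed-node F shares the ground state of H and has gap >= gap(H)")
```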
The following lemma is largely based on Ref. [8].
Lemma 2 (Fixed-node Hamiltonian).The Hamiltonians F and H have the same unique ground state ψ and the same ground energy.The spectral gap of F is at least as large as the one of H.
Proof. First we claim that

F|ψ⟩ = H|ψ⟩.   (13)

Indeed, for any basis state x,

⟨x|F|ψ⟩ = ⟨x|H|x⟩⟨x|ψ⟩ + Σ_{y:(x,y)∈S_−} ⟨x|H|y⟩⟨y|ψ⟩ + Σ_{y:(x,y)∈S_+} ⟨x|H|y⟩⟨y|ψ⟩ = ⟨x|H|ψ⟩,

which proves Eq. (13). Next we claim that

⟨ϕ|F|ϕ⟩ ≥ ⟨ϕ|H|ϕ⟩ for any state ϕ.   (14)

Indeed,

⟨ϕ|F − H|ϕ⟩ = Σ_{(x,y)∈S_+} ⟨x|H|y⟩ [ (⟨y|ψ⟩/⟨x|ψ⟩)⟨x|ϕ⟩^2 − ⟨x|ϕ⟩⟨y|ϕ⟩ ],

where the sum runs over ordered pairs. Using the definition of S_+ one gets ⟨x|H|y⟩ = s(x, y)|⟨x|H|y⟩|, where s(x, y) = sign(⟨x|H|y⟩) = sign(⟨x|ψ⟩⟨y|ψ⟩). This is equivalent to

⟨ϕ|F − H|ϕ⟩ = (1/2) Σ_{(x,y)∈S_+} |⟨x|H|y⟩| ( |⟨y|ψ⟩/⟨x|ψ⟩|^{1/2}⟨x|ϕ⟩ − s(x, y)|⟨x|ψ⟩/⟨y|ψ⟩|^{1/2}⟨y|ϕ⟩ )^2.

In particular, ⟨ϕ|F − H|ϕ⟩ ≥ 0. We claim that the smallest and second-smallest eigenvalues of H and F satisfy

λ_1(F) = λ_1(H) and λ_2(F) ≥ λ_2(H).   (15)

Indeed, from Eq. (13) one infers λ_1(F) ≤ ⟨ψ|F|ψ⟩ = ⟨ψ|H|ψ⟩ = λ_1(H). Suppose ϕ is a ground state of F. From Eq. (14) one gets λ_1(F) = ⟨ϕ|F|ϕ⟩ ≥ ⟨ϕ|H|ϕ⟩ ≥ λ_1(H). Thus λ_1(H) = λ_1(F) and ψ is a ground state of both H and F. Let ϕ be an eigenvector of F orthogonal to ψ such that F|ϕ⟩ = λ_2(F)|ϕ⟩. Since H has the unique ground state ψ, one has λ_2(H) ≤ ⟨ϕ|H|ϕ⟩ ≤ ⟨ϕ|F|ϕ⟩ = λ_2(F). Here the second inequality uses Eq. (14). Thus the spectral gap of F is at least as large as the one of H.

The next step is to convert the fixed-node Hamiltonian F to a Markov chain with the state space {0,1}^n such that the ground state distribution π is the unique steady state of the chain. One technical difficulty that prevents us from applying the standard quantum-to-classical mapping commonly used in Quantum Monte Carlo simulations (see e.g. Section 8 of Ref. [6]) is that the norm of F may be unbounded. Indeed, a diagonal matrix element ⟨x|F|x⟩ depends on quantities ⟨x|H|z⟩⟨z|ψ⟩/⟨x|ψ⟩ which can be arbitrarily large even if the original Hamiltonian H has a bounded norm. Instead, we shall convert F to a generator matrix describing a continuous-time Markov chain. We will see that the latter can be simulated efficiently using the well-known Gillespie's algorithm [12]. In the proof of Theorem 2 below we establish that the average number of iterations in Gillespie's algorithm can be bounded by a quantity that depends only on the spectral gap of F and a certain "off-diagonal norm" of F which is at most n^k∥H∥ even if diagonal matrix elements of F are unbounded.
Define a continuous-time Markov chain with the state space {0,1}^n as a family of stochastic matrices of the form e^{Gt}, where we use the matrix exponential, t ≥ 0 is the evolution time, and G is a generator matrix of size 2^n × 2^n. By definition, the probability that the chain evolved for time t makes a transition from a state x to a state y is given by ⟨y|e^{Gt}|x⟩. Equivalently, the probability of a transition from x to y ∈ {0,1}^n \ {x} between time t and t + dt in the limit dt → 0 is given by ⟨y|G|x⟩dt. A valid generator matrix G must have non-negative off-diagonal matrix elements and each column of G must sum to zero. This ensures that e^{Gt} is a stochastic matrix for any t ≥ 0. A probability distribution η is a steady state of the chain iff

Σ_x ⟨y|G|x⟩η(x) = 0

for any y. Equivalently, Σ_x ⟨y|e^{Gt}|x⟩η(x) = η(y) for all y and all t ≥ 0.
Let F be the fixed-node Hamiltonian constructed above. Define a generator matrix G such that

⟨y|G|x⟩ = −⟨y|(F − λ_1(F)I)|x⟩⟨y|ψ⟩/⟨x|ψ⟩   (17)

for all states x, y. Using Eq. (12) we see that G can equivalently be expressed as in Eq. (5). Note that λ_1(F) can be efficiently computed by making O(n^k) calls to the amplitude computation subroutine. Indeed, Lemma 2 implies that λ_1(F) = λ_1(H). Furthermore,

λ_1(H) = ⟨x|H|ψ⟩/⟨x|ψ⟩ = Σ_y ⟨x|H|y⟩⟨y|ψ⟩/⟨x|ψ⟩

for any basis state x. It remains to note that each row of H has at most O(n^k) non-zeros.
The following lemma shows that G is a valid generator matrix for a continuous-time Markov chain with the unique steady state π.

Lemma 3 (Steady state). The matrix G generates a continuous-time Markov chain with the unique steady state π. All eigenvalues of G are real; the largest eigenvalue is zero and is non-degenerate, and all other eigenvalues are at most −γ_F, where γ_F = λ_2(F) − λ_1(F) is the spectral gap of F.

Proof. Eq. (17) shows that G is related to −(F − λ_1(F)I) by a similarity transformation, which proves the spectral claims. Let us check that e^{Gt} is a stochastic matrix. It follows directly from the definitions that G has non-negative off-diagonal elements. Thus all matrix elements of e^{Gt} are non-negative. Any column of G sums to zero since

Σ_y ⟨y|G|x⟩ = −⟨ψ|(F − λ_1(F)I)|x⟩/⟨x|ψ⟩ = 0.

Here we noted that ⟨ψ|F = λ_1(F)⟨ψ| due to Lemma 2. This implies that any column of e^{Gt} sums to one, that is, e^{Gt} is a stochastic matrix. Finally,

Σ_x ⟨y|G|x⟩π(x) = −⟨y|ψ⟩⟨y|(F − λ_1(F)I)|ψ⟩ = 0

for any state y. Thus π is a steady state. This is the unique steady state since the zero eigenvalue of G is non-degenerate.
To complete the proof of Theorem 1 we now establish the bound Eq. (7).
Our proof strategy closely follows Ref. [13], see Proposition 3 thereof. First we claim the Markov chain e^{Gt} obeys the detailed balance condition

π(z)⟨y|e^{Gt}|z⟩ = π(y)⟨z|e^{Gt}|y⟩   (18)

for all states y, z. Indeed, let D be a diagonal matrix such that ⟨z|D|z⟩ = ⟨z|ψ⟩ for all z.
We have

M ≡ D^{−1}GD = −(F − λ_1(F)I).

Note that M is a symmetric matrix since the fixed-node Hamiltonian F is symmetric. The identity e^{Gt} = De^{Mt}D^{−1} gives

π(z)⟨y|e^{Gt}|z⟩ = ⟨y|ψ⟩⟨z|ψ⟩⟨y|e^{Mt}|z⟩.

Clearly this expression is symmetric under the exchange of y and z, which proves Eq. (18).
Using Cauchy-Schwarz one gets

4∥π_t − π∥_{t.v.}^2 = ( Σ_y |π_t(y) − π(y)| )^2 ≤ Σ_y π(y)^{−1}(π_t(y) − π(y))^2.   (19)

The detailed balance condition Eq. (18) gives

π_t(y) = ⟨y|e^{Gt}|x_in⟩ = π(y)π(x_in)^{−1}⟨x_in|e^{Gt}|y⟩   (20)

for all y. Combining Eqs. (19,20) gives

4∥π_t − π∥_{t.v.}^2 ≤ π(x_in)^{−1}⟨x_in|e^{2Gt}|x_in⟩ − 1 = π(x_in)^{−1}⟨x_in|e^{2Mt}|x_in⟩ − 1.   (21)

Here we used an identity e^{2Gt} = De^{2Mt}D^{−1}. Consider an eigenvalue decomposition

M = Σ_{i=1}^{2^n} λ_i(M)|ϕ_i⟩⟨ϕ_i|

such that λ_i(M) is the i-th smallest eigenvalue of M and ϕ_i is the corresponding eigenvector. We have ⟨ϕ_i|ϕ_j⟩ = δ_{i,j}. Since the matrices M and G are related by a similarity transformation, Lemma 3 implies that λ_{2^n}(M) = 0 is the largest eigenvalue of M and λ_i(M) ≤ −γ_F for all i < 2^n. Furthermore, Lemma 2 implies M|ψ⟩ = 0, that is, the only zero eigenvector of M is ϕ_{2^n} = ψ. Thus we can write e^{2Mt} = |ψ⟩⟨ψ| + R, where

R = Σ_{i<2^n} e^{2λ_i(M)t}|ϕ_i⟩⟨ϕ_i|

obeys ∥R∥ ≤ e^{−2γ_F t}. Substituting e^{2Mt} = |ψ⟩⟨ψ| + R into Eq. (21) gives

4∥π_t − π∥_{t.v.}^2 ≤ π(x_in)^{−1}(π(x_in) + ⟨x_in|R|x_in⟩) − 1 ≤ π(x_in)^{−1}e^{−2γ_F t}.

It remains to note that γ_F ≥ γ, see Lemma 2.

Sampling algorithm
Let x_in ∈ {0,1}^n be the starting state. We would like to sample x from the distribution π_t(x) = ⟨x|e^{Gt}|x_in⟩. To this end we use Gillespie's algorithm [12]. The algorithm takes as input the starting state x_in and an evolution time t ≥ 0, and returns a sample x from the distribution π_t(x).
For completeness, we include a proof sketch for the correctness of Gillespie's algorithm.
Proof sketch. The following derivations are based on Ref. [14]. Let P_{y,x}(τ) = Pr(ξ(τ) = y | ξ(0) = x) = ⟨y|P(τ)|x⟩ denote the probability of being in state y at time τ given that the initial state is ξ(0) = x. We claim that P(τ) satisfies the differential equation P′(τ) = GP(τ). Let h > 0, where we think of h as a small quantity tending to zero. Let m(x, h) be a random variable equal to the number of transitions that occur in the time interval [0, h] starting with initial state x. Since the time ∆τ at which the first transition occurs is exponentially distributed with rate |⟨x|G|x⟩|,

Pr(m(x, h) = 0) = Pr(∆τ > h) = e^{−|⟨x|G|x⟩|h} = 1 − |⟨x|G|x⟩|h + O(h^2).   (23)

Next we note that

Pr(m(x, h) ≥ 2) ≤ max_y Pr(∆τ_x + ∆τ_y ≤ h) = O(h^2),   (24)

where ∆τ_x, ∆τ_y are exponentially distributed random variables with rates |⟨x|G|x⟩|, |⟨y|G|y⟩| respectively. Eqs. (23,24) imply that for every x ∈ {0,1}^n and every y ≠ x,

Pr(ξ(h) = y | ξ(0) = x) = ⟨y|G|x⟩h + O(h^2).

For every τ ≥ 0 and x, y ∈ {0,1}^n, we arrive at

P_{y,x}(τ + h) = Σ_z P_{y,z}(h)P_{z,x}(τ) = (1 − |⟨y|G|y⟩|h)P_{y,x}(τ) + h Σ_{z≠y} ⟨y|G|z⟩P_{z,x}(τ) + O(h^2).

Hence, we conclude that P′(τ) = GP(τ). Solving this differential equation, known as Kolmogorov's forward equation, we obtain the solution P(s) = e^{Gs}. This establishes the desired equivalence Pr(ξ(s) = y | ξ(0) = x_in) = ⟨y|e^{Gs}|x_in⟩.

Let m(x_in, t) be the number of flips performed by the algorithm, that is, the number of times the function ξ(s) changes its value. One can easily check that the total number of calls to the amplitude computation subroutine made at lines 5 and 11 is at most O(n^k)·m(x_in, t). Thus the algorithm is efficient as long as the number of flips is not too large. Our key observation is the following.

Lemma 4 (Average number of flips). Let F_od be the off-diagonal part of the fixed-node Hamiltonian, obtained from F by setting to zero all diagonal matrix elements. Suppose the starting state x_in is sampled from the steady distribution π. Then the average number of flips performed by Gillespie's algorithm is

E[m(x_in, t)] = −t⟨ψ|F_od|ψ⟩.   (25)

We note that ⟨ψ|F_od|ψ⟩ ≤ 0 by definition of F, see Eq. (12). Eq. (25) is closely related to a known estimator for the off-diagonal part of the Hamiltonian in many-body simulations based on the continuous-time worldline quantum Monte Carlo method [15] (see, e.g., Eq. (19) of [16]).
Proof. Let ρ(ξ) be the probability distribution over paths ξ : [0, t] → {0,1}^n generated by the algorithm with the starting state x_in sampled from π(x_in). Since π is the steady distribution of the considered Markov chain, one infers that Gillespie's algorithm terminated at any intermediate time s samples the steady distribution, that is,

Pr_{ξ∼ρ}[ξ(s) = x] = π(x)   (26)

for any fixed s ∈ [0, t]. Now let us consider the probability P_{x,s}(ξ)ds for the path ξ with a fixed value ξ(s) = x to contain a flip between time s and s + ds in the limit ds → 0. This probability is P_{x,s}(ξ)ds = Σ_{y≠x} ⟨y|G|x⟩ds. Thus the average number of flips between time s and s + ds is given by

Σ_x π(x) Σ_{y≠x} ⟨y|G|x⟩ ds = −Σ_x Σ_{y≠x} ⟨x|ψ⟩⟨y|F|x⟩⟨y|ψ⟩ ds = −⟨ψ|F_od|ψ⟩ ds.   (27)

Here we used Eq. (26) and the definitions of F and G, see Eqs. (12,17). Integrating Eq. (27) over s ∈ [0, t] completes the proof.
By definition of F , see Eq. ( 12), one has Here we noted that each row of H contains at most n k non-zero matrix elements and each matrix element has magnitude at most ∥H∥.Thus the average number of flips performed by Gillespie's algorithm is at most tn k ∥H∥, assuming that the starting state x in is sampled from the steady distribution π(x in ).
Proof of Theorem 2. We shall use a truncated version of Gillespie's algorithm which terminates whenever the condition τ ≥ t at line 8 is satisfied or the number of flips exceeds a cutoff value 4ϵ^{−1}tn^k∥H∥. In the latter case the truncated algorithm declares an error. By Markov's inequality, there exists at least one starting state x*_in such that the algorithm errs with probability at most ϵ/4. The truncated Gillespie's algorithm makes at most O(tϵ^{−1}n^{2k}∥H∥) calls to the amplitude computation subroutine since each flip requires O(n^k) amplitude computations. Let us write π̃_t(x) for the probability distribution sampled by the truncated algorithm, with some starting state x_in, and conditioned on the event that no error occurs. Let E be the event m(x_in, t) ≥ 4ϵ^{−1}tn^k∥H∥ and let P(E, x_in, t) be the probability of this event, i.e., the probability that the truncated Gillespie algorithm outputs an error. Then

π̃_t(x) = (π_t(x) − P(E, x_in, t)η_t(x)) / (1 − P(E, x_in, t)),

where η_t(x) is the probability that Gillespie's algorithm outputs x conditioned on event E. From the above we get

∥π̃_t − π_t∥_{t.v.} ≤ P(E, x_in, t) / (1 − P(E, x_in, t)).

For the special starting state x*_in we see that the distribution sampled by the truncated Gillespie algorithm is ϵ/2-close to the distribution π_t sampled by the Gillespie algorithm without truncation.
Finally, we use Theorem 1 to choose the evolution time t in Gillespie's algorithm large enough such that the distribution π_t(x) is ϵ/2-close to the steady distribution π(x) in total variation distance. In particular, we conclude that the distributions π_t(x) = ⟨x|e^{Gt}|x_in⟩ and π(x) are ϵ/2-close in the total variation distance for all t ≥ t_{ϵ/2}, where

t_{ϵ/2} = γ^{−1} ln(ϵ^{−1} π_*^{−1/2}).

This concludes the proof of Theorem 2.

Application to the Haldane-Shastry model
In this section we further investigate the properties of the continuous-time Markov chain (CTMC) based on the fixed-node Hamiltonian. Here we focus on a specific example of a quantum spin system where the amplitudes of the ground state can be computed exactly and efficiently. We implement the CTMC in software and use it to approximately sample from the resulting probability distribution. The example we consider is a system of L ≥ 2 qubits interacting according to the Haldane-Shastry Hamiltonian

H = Σ_{1≤i<j≤L} d_{ij}^{−2} (X_iX_j + Y_iY_j + Z_iZ_j),   d_{ij} = (L/π) sin(π(i−j)/L),   (29)

as defined in, e.g., Refs. [17] and [18]. This Hamiltonian describes a spin chain with periodic boundary conditions (i.e., a ring) and long-range two-qubit interactions that depend on the shortest distance between the qubits on the ring. The Hamiltonian Eq. (29) has a unique ground state ψ with the following properties [17,18]. For every x ∈ {0,1}^L such that |x| = L/2,

⟨x|ψ⟩ ∝ ∏_{k=1}^L (−1)^{(k−1)x_k} ∏_{1≤j<k≤L : x_j=x_k=1} sin^2(π(j−k)/L),   (30)

where ∏_{k=1}^L (−1)^{(k−1)x_k} determines the relative sign, and for every x ∈ {0,1}^L such that |x| ≠ L/2, ⟨x|ψ⟩ = 0. It is also known [17] that the spectral gap of H scales inversely with system size L, and in particular satisfies

γ ≥ c/L   (31)

for some constant c. We chose this example because the entries of the ground state Eq. (30) are efficiently computable, and because we are not aware of alternative algorithms with a provable polynomial runtime for sampling from the probability distribution π(x) = |⟨x|ψ⟩|^2. For example, in Appendix A we show that the state ψ is not a fermionic Gaussian state and therefore cannot be directly sampled using a naive application of free-fermion based methods. One can easily check that the Haldane-Shastry Hamiltonian H has a sign problem (that is, H is not stoquastic). Indeed, each two-qubit term in H has a positive matrix element between the basis states 10 and 01. Moreover, in Appendix B we use techniques of Ref.
[19] to show that the sign problem in H cannot be "cured" by a local change of basis, that is, a Hamiltonian U†HU has a sign problem for any product unitary U = U_1 ⊗ U_2 ⊗ ⋯ ⊗ U_L. This suggests that standard Quantum Monte Carlo methods applicable to sign-problem free Hamiltonians cannot be directly used to sample the ground state of H. It should be noted that the Haldane-Shastry Hamiltonian admits a 2D generalization with efficiently computable ground state amplitudes [20]. We expect that our sampling algorithm can be applied to this generalized model as well.
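For small L the product form Eq. (30) can be checked against exact diagonalization. The sketch below uses couplings proportional to 1/sin^2(π(i−j)/L), dropping the overall constant in Eq. (29), which leaves the ground state unchanged:

```python
import numpy as np
from itertools import combinations

L = 6  # small Haldane-Shastry ring; Hilbert space dimension 2^L = 64

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def two_site(P, i, j):
    """Operator acting as P on qubits i and j (0-indexed), identity elsewhere."""
    out = np.array([[1.0 + 0j]])
    for site in range(L):
        out = np.kron(out, P if site in (i, j) else np.eye(2))
    return out

# Couplings ~ 1/sin^2(pi (i-j)/L); the overall constant is irrelevant here
H = np.zeros((2**L, 2**L), dtype=complex)
for i, j in combinations(range(L), 2):
    J = 1.0 / np.sin(np.pi * (i - j) / L) ** 2
    H += J * (two_site(X, i, j) + two_site(Y, i, j) + two_site(Z, i, j))
H = H.real  # every term above is a real matrix

# Product-form ground state amplitudes, Eq. (30)
psi = np.zeros(2**L)
for idx in range(2**L):
    bits = [(idx >> (L - 1 - k)) & 1 for k in range(L)]  # qubit 0 = leftmost
    if sum(bits) != L // 2:
        continue  # amplitude vanishes unless exactly half the spins are up
    pos = [k for k, b in enumerate(bits) if b]
    sign = (-1) ** sum(pos)  # equals prod_k (-1)^{(k-1) x_k} up to a global sign
    mag = np.prod([np.sin(np.pi * (a - b) / L) ** 2
                   for a, b in combinations(pos, 2)])
    psi[idx] = sign * mag
psi /= np.linalg.norm(psi)

E = psi @ H @ psi
assert np.allclose(H @ psi, E * psi, atol=1e-9)     # psi is an eigenvector
assert np.isclose(E, np.linalg.eigvalsh(H).min())   # ... with minimal energy
print(f"L = {L}: product-form state is the ground state, energy {E:.6f}")
```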
The first step in our method for sampling from the probability distribution π(x) is to construct the associated fixed-node Hamiltonian defined by Eq. (12). Note that all matrix elements of H and all entries of |ψ⟩ are real numbers, so in this example there is no need to preprocess the Hamiltonian using Lemma 1. The associated CTMC is generated by the matrix G from Eq. (17). The mixing time of this CTMC scales inversely with the spectral gap γ_F of F (see Eq. (22)). We have shown that γ_F ≥ γ holds in general, which, combined with Eq. (31), implies that the mixing time is upper bounded as O(L). Figure 1 shows the spectral gaps γ and γ_F for the Haldane-Shastry Hamiltonian and small values of L, computed using exact numerical diagonalization. We find empirically that γ_F has a milder scaling with system size L than γ, and we therefore expect the CTMC to mix more rapidly than the rigorous bounds suggest.
We implement Gillespie's algorithm using the fixed-node construction to approximately sample from the distribution π(x) = |⟨x|ψ⟩|^2, x ∈ {0,1}^L. The starting state is chosen randomly from Supp(|ψ⟩) = {x ∈ {0,1}^L : ⟨x|ψ⟩ ≠ 0}. By our rigorous mixing time upper bound and numerically determined spectral gaps, the Markov chain is expected to converge rapidly. To assess convergence numerically in practice, we use the vanilla R̂ statistic discussed in [21] to select an appropriate "burn-in" period for our chain. After allowing the CTMC to (approximately) converge, we are then able to estimate physical quantities in the ground state ψ. These can be compared with exact formulas that are available for the Haldane-Shastry ground state. For example, it is known from [17] that for every 1 ≤ i < j ≤ L, the two-point ZZ correlation ⟨ψ|Z_iZ_j|ψ⟩ admits an exact closed-form expression, Eq. (32). For every d ∈ {1, …, L−1}, we consider a two-point correlator which is averaged over all pairs of spins at a distance d on the ring. In particular, we let

M_d = L^{−1} Σ_{i=1}^L Z_iZ_{i+d},

where the site index i + d is taken modulo L. Then the ground state satisfies ⟨ψ|M_d|ψ⟩ = ⟨ψ|Z_iZ_{i+d}|ψ⟩ for any i, and the exact value is therefore given by Eq. (32). We have tested our implementation of the CTMC by computing the expected value of M_d for d = 1, 5, 10. Since these observables are diagonal in the computational basis, they can be straightforwardly estimated from the output {ξ(τ) : 0 ≤ τ ≤ T} of Gillespie's algorithm. Here T is the final time that determines the stopping condition of Gillespie's algorithm (see line 8). The time T should not be confused with the number of flips, which is a random variable. In particular, let f : {0,1}^L → R be such that f(z) = ⟨z|M_d|z⟩. Letting τ_0 ≤ T be an initial "burn-in" time used to equilibrate the CTMC, we compute an approximation to E_{x∼π}[f(x)] = ⟨ψ|M_d|ψ⟩ using the estimator

(T − τ_0)^{−1} ∫_{τ_0}^T f(ξ(τ)) dτ

(note that this integral can equivalently be expressed as a finite sum since ξ is piecewise constant). The data from our CTMC is shown in Fig.
2 and compared with the exact value computed using Eq. (32). See Appendix C for a description of the analysis used to estimate the error bars in this plot.
Next we compare our CTMC with the standard Metropolis-Hastings Markov chain for sampling from the ground state probability distribution. Fig. 3 shows the nearest-neighbor two-point correlator M_1 estimated using the MH method. To use a Metropolis-Hastings Markov chain to sample from the ground state distribution π, we define the proposal distribution by viewing H as the unweighted adjacency matrix of a transition graph with the state space {x ∈ {0,1}^L : |x| = L/2}. When the chain is in state x, it proposes to move to a neighbour y of x chosen uniformly at random. More explicitly, for every x, y ∈ {0,1}^L such that |x| = |y| = L/2 and x and y differ in exactly two bits, the chain proposes to transition from x to y with probability Q(y|x) = (L/2)^{−2} = 4/L^2. The acceptance probabilities A(y|x) = min{1, π(y)Q(x|y)/(π(x)Q(y|x))} are defined so that Q and A together satisfy the detailed balance condition. The fixed-node MH chain is defined by selecting a different proposal distribution, namely the one induced by viewing F as an unweighted adjacency matrix. More precisely, for a fixed state x as above the proposal distribution Q(y|x) is uniform on the set of states y with |y| = L/2 such that ⟨ψ|x⟩⟨x|H|y⟩⟨y|ψ⟩ ≤ 0.
We note that both MH chains are irreducible and aperiodic, and therefore the distribution obtained by running the chain with any initial state converges to the limiting distribution π as the number of steps grows. To enable a comparison between the Metropolis-Hastings Markov chain and the CTMC we have tried to assess the computational cost of generating independent samples using each method. For the MH chain, this is determined by the autocorrelation time as measured in the number of steps of the (discrete-time) Markov chain. For the CTMC, we can also compute the autocorrelation time, but the computational cost of running the chain for a given interval of time (using Gillespie's algorithm) is determined by the number of transitions (i.e. flips) during the interval. For this reason we choose to normalize the autocorrelation time of the CTMC so that, roughly speaking, it is measured in units of transitions rather than time, see Appendix C for details. We note that the normalized autocorrelation times in Fig. 4 (a) are significantly lower than the ones for the Metropolis-Hastings algorithm reported in Fig. 4 (b). Intriguingly, we also observe in Fig. 4 (b) that the Metropolis-Hastings algorithm using the fixed-node Hamiltonian F has shorter autocorrelation times for each number of qubits and each of the three observables. In other words, using the fixed-node Hamiltonian to generate the proposal distribution appears to have a marked advantage over the naive strategy based on the Hamiltonian itself. This is despite the fact that the nonzero matrix elements of F are a subset of those of H. It appears that the information about the sign structure of ψ that determines which entries of F are set to zero may help the MH chain converge more quickly.
To investigate this further, we looked at how the autocorrelation times behave when we use a corrupted ground state distribution π̃ as opposed to the true ground state distribution π. For some noise strength κ > 0, we construct a perturbed distribution π̃ by defining a perturbed state ψ̃ and setting π̃(x) = |⟨x|ψ̃⟩|^2 for every x ∈ {0,1}^L, where N(µ, σ) is a Gaussian random variable with mean µ and standard deviation σ; π̃ is normalized so that Σ_{x∈{0,1}^L} π̃(x) = 1. Fig. 5 shows that as the total variation distance between π̃ and π increases, the differences between the autocorrelation times based on H (the Haldane-Shastry Hamiltonian) and F (defined by Eqs. (10,11,12) with ψ replaced by ψ̃) disappear, as expected.

Proof. Suppose D acts non-trivially on a single qubit j. Then D is a linear combination of the identity I and Z_j = −iγ_{2j−1}γ_{2j}. Thus D ∼ e^{αγ_{2j−1}γ_{2j}} for some complex number α. In the general case D is a product of operators as above. Thus we can write D ∼ e^Γ, where Γ is some operator quadratic in γ_1, ..., γ_{2n}. It is well known that matrix exponentials e^Γ with a quadratic fermionic operator Γ map free fermion states to free fermion states up to normalization [25].
Assume that |ψ_4⟩ ∼ |ϕ⟩ for some free fermion state |ϕ⟩. Then Eq. (39) gives for some real-valued functions θ(x^A, x^B) and f_j(x^A_j, x^B). Note that ⟨x|ψ_n⟩ ≠ 0 whenever |x| = n/2. Thus f_j(x^A_j, x^B) ≠ 0 for all x^A and x^B as above. From Eq. (41) one gets where |ψ_4⟩ is defined in Eq. (36) and D_j are diagonal invertible single-qubit operators. Using Fact 3 again one infers that |ψ_4⟩ is free. However this is a contradiction since we have already proved that |ψ_4⟩ is not free. Thus |ψ_n⟩ is not proportional to a free state.

B Sign problem in Haldane-Shastry Hamiltonian
Recall that an n-qubit Hamiltonian H is called stoquastic (sign-problem free) if H has real matrix elements in the standard basis and ⟨x|H|y⟩ ≤ 0 for all x ≠ y. Consider a Hamiltonian

H = Σ_{1≤i<j≤n} J_{i,j} (X_i X_j + Y_i Y_j + Z_i Z_j),  (43)

where J_{i,j} > 0 are arbitrary coefficients. This includes the Haldane-Shastry Hamiltonian Eq. (29) as a special case. Choose any basis vectors x, y ∈ {0,1}^n that differ only on two qubits i, j such that x_i x_j = 10, y_i y_j = 01, and x_ℓ = y_ℓ for all ℓ ∉ {i,j}. A simple calculation gives ⟨x|H|y⟩ = 2J_{i,j} > 0, that is, H is not stoquastic. Suppose there exist single-qubit unitary operators U_1, U_2, ..., U_n such that

H_U = (U_1 ⊗ U_2 ⊗ ··· ⊗ U_n) H (U_1 ⊗ U_2 ⊗ ··· ⊗ U_n)^†

is stoquastic. The question of whether Hamiltonians of the form Eq. (43) can be made stoquastic by a local change of basis has been studied by Klassen and Terhal [19]. Lemma 22 of Ref. [19] implies that without loss of generality the unitaries U_j can be chosen from a finite subgroup of the unitary group (known as the Clifford group). Hence the number of distinct unitaries U_j is upper bounded by a constant independent of n. Thus for a sufficiently large number of qubits n, there will be at least one pair of qubits i < j such that U_i = U_j. Note that U_i ⊗ U_j commutes with X ⊗ X + Y ⊗ Y + Z ⊗ Z if U_i = U_j. Thus we can write

H_U = J_{i,j} (X_i X_j + Y_i Y_j + Z_i Z_j) + H_else,

where H_else is a sum of operators that act non-trivially on at most one of the qubits i, j. Now the same calculation as above shows that ⟨x|H_U|y⟩ = 2J_{i,j} > 0, that is, H_U is not stoquastic.
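The key matrix element above is easy to verify numerically. The following sketch checks that the two-qubit Heisenberg interaction has the positive off-diagonal element ⟨10|(X⊗X + Y⊗Y + Z⊗Z)|01⟩ = 2, which is what makes H non-stoquastic as written.

```python
import numpy as np

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Two-qubit Heisenberg interaction X⊗X + Y⊗Y + Z⊗Z
heis = np.kron(X, X) + np.kron(Y, Y) + np.kron(Z, Z)

# Standard basis ordering |00>, |01>, |10>, |11>
ket01 = np.array([0, 1, 0, 0])
ket10 = np.array([0, 0, 1, 0])

# <10| heis |01> = 2 > 0, so the interaction is not stoquastic
elem = ket10 @ heis @ ket01
print(elem.real)  # 2.0
```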

C Details of numerical implementation
In this appendix we describe how error bars are computed in Fig. 2. We also describe the definition of the normalized autocorrelation time reported in Fig. 4. Let {ξ(τ) : τ ≥ 0} denote the stochastic process induced by running Gillespie's algorithm on the HS fixed-node Hamiltonian, where the initial state ξ(0) ∼ π, the true ground state distribution. The following derivation is based on Ref. [26]. Let f : {0,1}^L → R be a function and suppose our goal is to estimate the mean of f with respect to the steady distribution π. Let T > 0 be large and h > 0 be small. We can estimate µ = E_{x∼π}[f(x)] using the estimator

µ̂ = (1/⌊T/h⌋) Σ_{j=1}^{⌊T/h⌋} f(ξ(jh)).
Note that in the limit h → 0 this can be represented as an integral (cf. Eq. (34)). For the purposes of estimating error bars it will be convenient to use the discretized representation, however. Since ξ(0) ∼ π, we have ξ(τ) ∼ π for every τ ≥ 0. Let σ_f^2 = Var(f(ξ(0))) = Var(f(ξ(τ))) for every τ ≥ 0. Recall that the Pearson correlation coefficient of two random variables A and B is defined by cor(A, B) = cov(A, B)/(σ_A σ_B), where σ_A^2 and σ_B^2 are the variances of A and B respectively. Then the variance of the estimator µ̂ is

Var(µ̂) = (σ_f^2/⌊T/h⌋^2) Σ_{i=1}^{⌊T/h⌋} Σ_{j=1}^{⌊T/h⌋} cor(f(ξ(ih)), f(ξ(jh))).
We assume that for every i ∈ {1, ..., ⌊T/h⌋}, each correlation term in

τ_f = Σ_{j=1}^{⌊T/h⌋} cor(f(ξ(ih)), f(ξ(jh)))

depends only on |i − j| and is otherwise independent of i. Thus τ_f is the integrated autocorrelation time with respect to f. We use the emcee library [27] to obtain an estimate τ̂_f of τ_f. We further estimate σ̂^2 using τ̂_f and the sample variance of {f(ξ(1h)), f(ξ(2h)), ..., f(ξ(⌊T/h⌋h))}. Notice that τ̂_f h estimates the autocorrelation time of the CTMC. Let r denote the total number of transitions it took for the CTMC to reach time T. Then r/T gives the average number of transitions needed for the CTMC to advance time by one unit. Thus we infer that it takes an average of τ̂_f h · r/T transitions for the CTMC to advance time by τ̂_f h units. In Figure 4 we plot the normalized autocorrelation time τ̂_f h · r/T against the number of qubits.
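The quantities above can be sketched in a few lines of numpy. Note that the paper's estimates use the emcee library's adaptive estimator; the fixed-window estimator below is a simplified illustration of the same quantity, and the transition-count normalization τ̂_f · h · (r/T) follows the definition given here.

```python
import numpy as np

def integrated_autocorr(f_vals, window=None):
    """Rough estimate of the integrated autocorrelation time of an equally
    spaced series f(xi(h)), f(xi(2h)), ...  This is a simplified fixed-window
    version of the estimator; the paper uses emcee's adaptive estimator."""
    f = np.asarray(f_vals, dtype=float)
    n = len(f)
    f = f - f.mean()
    var = np.dot(f, f) / n
    if window is None:
        window = n // 10
    rho = [np.dot(f[:-j], f[j:]) / (n * var) for j in range(1, window)]
    return 1.0 + 2.0 * sum(rho)

def normalized_autocorr_time(f_vals, h, n_transitions, T):
    """Convert the autocorrelation time (in sampling-interval units h) into
    units of CTMC transitions: tau_f * h * (r / T), where r is the number of
    transitions observed during the total simulation time T."""
    return integrated_autocorr(f_vals) * h * (n_transitions / T)
```

For an uncorrelated series the integrated autocorrelation time is close to 1, which provides a quick sanity check of the estimator.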

Lemma 3 (Generator matrix). The generator matrix G has real non-positive eigenvalues. Its largest and second-largest eigenvalues are 0 and −γ_F respectively, where γ_F is the spectral gap of the fixed-node Hamiltonian F. The matrix exponential e^{Gt} is a stochastic matrix for any t ≥ 0. The distribution π is the unique steady state of e^{Gt}.

Proof. Let D be a diagonal matrix such that ⟨x|D|x⟩ = ⟨x|ψ⟩. By definition, G = D(λ_1(F)I − F)D^{−1}. Thus the eigenvalues of G coincide with those of λ_1(F)I − F. This proves the first and second claims of the lemma.
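The claims of Lemma 3 can be sanity-checked numerically. The sketch below uses a hypothetical 3×3 stoquastic matrix as a stand-in for the fixed-node Hamiltonian F (not the actual Haldane-Shastry construction) and adopts the column convention in which G_{yx} is the rate of the jump x → y, so the columns of G sum to zero.

```python
import numpy as np

# Toy symmetric stand-in for F: stoquastic, so by Perron-Frobenius its
# ground state psi is entrywise positive and D = diag(psi) is invertible.
F = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -0.5],
              [ 0.0, -0.5,  1.5]])
evals, evecs = np.linalg.eigh(F)
lam1, psi = evals[0], evecs[:, 0]    # smallest eigenvalue and ground state
psi = psi * np.sign(psi[0])          # fix the global sign so psi > 0

D = np.diag(psi)
G = D @ (lam1 * np.eye(3) - F) @ np.linalg.inv(D)
pi = psi ** 2                        # steady state pi(x) = <x|psi>^2

# Columns of G sum to zero, so e^{Gt} is column-stochastic for t >= 0 ...
print(np.allclose(G.sum(axis=0), 0.0))   # True

# ... and pi is stationary: e^{Gt} pi = pi.  Compute e^{Gt} at t = 2 by
# conjugating the spectral decomposition of lam1*I - F with D.
w, V = np.linalg.eigh(lam1 * np.eye(3) - F)
P = D @ V @ np.diag(np.exp(2.0 * w)) @ V.T @ np.linalg.inv(D)
print(np.allclose(P @ pi, pi))           # True
```

The off-diagonal entries of G are −ψ(x)F_{xy}/ψ(y) ≥ 0 precisely because F is stoquastic and ψ is positive; for the actual fixed-node Hamiltonian the analogous sign condition is built in by construction.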

Figure 1: (a) The inverse of the spectral gap of the fixed-node Hamiltonian F increases more slowly than that of the Haldane-Shastry Hamiltonian for small values of the number of qubits L (computed using exact numerical diagonalization). (b) A linear fit for L ≥ 4 on a log-log plot suggests a scaling of γ_F ∼ L^{−0.7}.

Figure 2: Two-point correlation functions ⟨ψ|Z_i Z_{i+d}|ψ⟩ with d = 1, 5, 10 estimated using the continuous-time Markov chain and compared with the exact formula Eq. (32). Here the CTMC was run for a total time T = 10^6 and an initial "burn-in" time τ_0 = 100 was used for equilibration. The estimator µ̂ is computed according to Eq. (34) and the estimate of the standard deviation σ̂ is constructed as described in Appendix C.

Figure 3: Nearest-neighbor two-point correlator computed using a Metropolis-Hastings Markov chain with 5·10^6 steps, of which 10^6 were used for equilibration. In (a) the proposal distribution was determined by H, and in (b) the proposal distribution was determined by F.

Figure 4: (a) Autocorrelation time of observables estimated using the CTMC, measured in units of the number of transitions. (b) Autocorrelation time of observables estimated using Metropolis-Hastings with proposal distributions generated by either the Haldane-Shastry Hamiltonian H or the associated fixed-node Hamiltonian F.

Figure 5: Autocorrelation times of Metropolis-Hastings chains defined using corrupted ground state distributions. Each datapoint is computed using a length-10^6 chain with a 1000-sample burn-in period, all for L = 20. When the noise strength κ is small, applying the fixed-node transformation reduces autocorrelation times for each of the chosen observables. As the noise strength increases (as the corrupted distributions become dominated by noise), the differences between the autocorrelation times of F and H vanish.
(40), which contradicts Eq. (36). This proves the lemma for n = 4. Consider an integer n = 4m ≥ 4. Partition the set of n qubits as [n] = AB, where A = {1, m+1, 2m+1, 3m+1} and B is the complement of A. Given bit strings x^A ∈ {0,1}^{|A|} and x^B ∈ {0,1}^{|B|}, let x^A x^B ∈ {0,1}^n be the string whose projections onto A and B coincide with x^A and x^B respectively. Suppose |x^A| = 2 and |x^B| = n/2 − 2. Then Eq. (35) implies

Fact 2 ([24]). Suppose a four-qubit state |ϕ⟩ has support only on even-weight basis vectors. Then |ϕ⟩ is a free fermion state if and only if

Fact 3. Suppose |ϕ⟩ is an n-qubit free fermion state and D is a tensor product of diagonal single-qubit operators. Then D|ϕ⟩ is proportional to a free fermion state.