Rapid mixing of path integral Monte Carlo for 1D stoquastic Hamiltonians

Path integral quantum Monte Carlo (PIMC) is a method for estimating thermal equilibrium properties of stoquastic quantum spin systems by sampling from a classical Gibbs distribution using Markov chain Monte Carlo. The PIMC method has been widely used to study the physics of materials and for simulated quantum annealing, but these successful applications are rarely accompanied by formal proofs that the Markov chains underlying PIMC rapidly converge to the desired equilibrium distribution. In this work we analyze the mixing time of PIMC for 1D stoquastic Hamiltonians, including disordered transverse Ising models (TIM) with long-range algebraically decaying interactions as well as disordered XY spin chains with nearest-neighbor interactions. By bounding the convergence time to the equilibrium distribution we rigorously justify the use of PIMC to approximate partition functions and expectations of observables for these models at inverse temperatures that scale at most logarithmically with the number of qubits. The mixing time analysis is based on the canonical paths method applied to the single-site Metropolis Markov chain for the Gibbs distribution of 2D classical spin models with couplings related to the interactions in the quantum Hamiltonian. Since the system has strongly nonisotropic couplings that grow with system size, it does not fall into the known cases where 2D classical spin models are known to mix rapidly.


Introduction
The application of Markov chain Monte Carlo methods to thermal states of a limited class of quantum systems was first proposed by Suzuki, Miyashita and Kuroda 40 years ago [43]. Their method applied to Hamiltonians with real and nonpositive off-diagonal matrix elements in a particular basis, a property that was called "being free of the sign problem" for many years until the term "stoquastic" was eventually adopted [9] to emphasize a connection with nonnegative matrices. Since then there have been a growing number of works on rigorous classical simulations for various stoquastic systems [12,8,16,10]. Many quantum systems of physical and computational interest are stoquastic, including spinless particles moving on arbitrary graphs with position dependent interactions, as well as generalized transverse Ising models, which are particularly notable for their use in quantum annealing [31,20,1].

Statement of results.
A general n-qubit 1D stoquastic Hamiltonian with nearest-neighbor interactions has the form, where each H j,j+1 acts non-trivially on (at most) the sites j, j + 1 (with n + 1 identified with 1) and each H j,j+1 ≤ 1. Define the "computational basis" {|1 , |−1 } for each qubit to be the eigenstates of the σ z operator. We assume H is stoquastic in the computational basis [9,12], which means that in this basis each H j,j+1 is a real symmetric matrix with nonpositive offdiagonal entries, z|H j,j+1 |z ≤ 0 for all z, z ∈ {−1, 1} n with z = z .
For a given inverse temperature β we consider the quantum partition function Z β and the quantum Gibbs state ρ β , and also the expectation value of observables Q in the thermal state, Q β := tr[Qρ β ]. For the special class of systems called 1D generalized transverse Ising models (TIM) we go beyond (1) and allow systems with long-range interactions that decay sufficiently quickly (faster than inverse square) with distance, K zz jk σ z j σ z k , Γ := min j=1,...,n Γ j > 0, K z j ∈ [−1, 1] , and |K zz jk | ≤ |i − j| −(2+ξ) for ξ > 0.
The form (5) simplifies both the presentation of the PIMC method and our proof that the mixing time of PIMC for these models is at most poly(n, e β , Γ −1 ). We also show the mixing time of PIMC is at most poly(n, e β , Γ −1 ) for generalized TIM with σ x σ x and σ y σ y interactions, K zz j , K z j ∈ [−1, 1] , K xx j ≥ 0 , K yy j ∈ [−K xx j , K xx j ] and Γ := min j=1,...,n Γ j > 0.
The mixing time bounds can be used to construct an FPRAS (fully polynomial-time randomized approximation scheme) for the partition function of models of the form (5) and (6) and inverse temperatures β = O(log n). In addition to the partition function there is a similar FPRAS for the expectation value of observables that are sparse and row-computable (in the computational basis) in the thermal state ρ β for β = O(log n). For general 1D stoquastic Hamiltonians (1) we prove that the PIMC mixing time is at most quasi-polynomial in the system size, which yields a quasi-polynomial time approximation algorithm for the partition function and for the approximation of observables.
Theorem 1. There is an algorithm which takes as input an inverse temperature β and an n-qubit 1D stoquastic Hamiltonian H and outputs an estimateZ β of the quantum partition function Z β . If also given an observable Q 0 that is sparse and row-computable in the computational basis, the algorithm outputs an estimateQ of its expectation value in the state ρ β .
Relation to previous work. A variety of quantum Monte Carlo methods have been devised for estimating equilibrium properties of stoquastic Hamiltonians, which can be broadly divided into diffusion methods [21,40,26] and path integral methods [43,42,36,2,39]. Diffusion Monte Carlo methods use Hamiltonian matrix elements to define transition rates for a substochastic random process, along with birth and death population dynamics or importance sampling to control statistical fluctuations. Examples of proven efficient convergence for diffusion Monte Carlo methods include simulating frustration-free stoquastic adiabatic computation [12] and simulating the "go with the winners" method [27]. Disadvantages of diffusion methods include quantum amplitude vs quantum probability obstructions for some variants [26,13], as well as the need for importance sampling with trial wave functions that may in general not be available [8].
PIMC methods, in contrast, are based on Markov chains defined on an enlarged state space consisting of many coupled copies of a high-temperature classical system along a so-called "imaginary-time" direction [43]. PIMC has been used to give an FPRAS for the ferromagnetic case of TIM [8] on general interaction graphs by a reduction to a classic result on sampling the Gibbs distribution of classical ferromagnetic Ising models [28]. Also, the Glauber dynamics for an infinite dimensional variant of PIMC has been shown to have optimal temporal mixing for ferromagnetic TIM at high temperature [14] and for ferromagnetic TIM on regular trees [35].
Here "ferromagnetic" means of the form in (5) but with each K z j = 0 and each K zz j ≤ 0, i.e. lower energy is achieved by aligning adjacent spins. Since such interactions cannot be frustrated, these models are considered easier to analyze. Disadvantages of PIMC methods include the significant overhead created by adding the imaginary-time direction, and also the possibility of slow mixing due to topological obstructions [3,23]. The general presence of obstructions to rapid mixing for the standard versions of both diffusion and path integral methods also motivates the continuing development of new algorithms for simulating stoquastic Hamiltonians [2,10].
To our knowledge, the simulations in this work are the first provably efficient Monte Carlo methods for a large class of generalized TIM systems, which differ from ferromagnetic TIM by allowing couplings of both signs between neighboring spins, as well as local fields of both signs. A generalized TIM in 2D or higher dimensions can encode NP-complete constraint satisfaction problems, which is the basis for the use of these models in quantum annealing [31,20,1]. Besides their relative simplicity from the standpoint of experimental implementation, generalized TIM on interaction graphs of degree 3 are also known to be universal for stoquastic quantum annealing [11,17]. The application of Monte Carlo methods to stoquastic quantum annealing is called simulated quantum annealing, and a significant open question is whether some variant of simulated quantum annealing can efficiently simulate stoquastic quantum annealing. This open question has motivated several recent proposals for non-stoquastic quantum annealing [15,25,46,41]. Some recent progress has been made by analyzing specific models for which both quantum annealing and simulated quantum annealing can be exponentially faster than classical simulated annealing [19,38,32,37,6,7,16,30].
Another class of simulation methods for 1D quantum systems are those based on matrix product states [45]. These methods do not require the Hamiltonian to be stoquastic and have been applied in rigorous classical simulations of systems with limited entanglement [22,33,4]. Their runtime scales exponentially with spectral gap, while our runtime bound for PIMC scales exponentially with the inverse temperature, which is qualitatively similar. These kinds of simulations are not expected to obsolete PIMC in general, however, as the latter is used in practice to simulate highly entangled stoquastic systems. Furthermore, the version of PIMC that we consider is closely related to the practical implementations of PIMC, and so this work can be seen as a rigorous analysis of a heuristic approximation algorithm that was originally developed and applied with great success by the physics community.
Organization of the remaining sections. Section 2 reviews the PIMC method and sketches the proof of our mixing time analysis. Section 3.1 defines the Suzuki-Trotter approximation to the partition function which is used to map 1D stoquastic Hamiltonians onto 2D classical spin systems. Section 3.2 and Section 3.3 describe the use of samples from the Gibbs distribution of the 2D classical system to approximate quantum observables and the quantum partition function. Section 3.4 proves a concentration theorem for PIMC that justifies the analysis of a restricted state space in later sections. Section 4.1 reviews the canonical paths method that we use to analyze the PIMC mixing time. This method is then applied to generalized TIM in Section 4.3, generalized TIM with XX interactions in Section 4.4, and general 1D stoquastic Hamiltonians in Section 4.5.

Overview
In the PIMC method the Hamiltonian (1) is related by Suzuki's quantum-to-classical mapping [42,44] to a system of {±1} classical spins on a 2D lattice Λ = {1, . . . , L} × {1, . . . , n}. The new site index {1, . . . , L} is known in physics as the "imaginary-time" direction, and it is often useful to think of the set of spin configurations Ω := {−1, 1} n×L as consisting of "time slices" along the L direction, i.e. for z ∈ Ω we write z := (z 1 , . . . , z L ) with each z i ∈ {−1, 1} n . The mapping introduces a local energy function E β : Ω → R on the classical spin configurations in such a way that an estimate of the classical partition function Z := z∈Ω e −βE β (z) can be used to estimate the quantum partition function Z, and so that samples from the Gibbs distribution π(z) := e −βE β (z) /Z can be used to estimate expectation values in the quantum Gibbs state. For the nearest-neighbor version of the transverse Ising models in (5) the energy function E β is where z i,j denotes the spin in configuration z at the site (i, j) ∈ Λ, and J i := 1 2 log coth βΓi L (see Section 3.1). The couplings along the imaginary-time direction are ferromagnetic, but the couplings along the spatial direction can have varying signs as well as local fields. Our choices of parameters are such that L = poly(n), and so the general case considers J i with magnitude up to O(log n). Therefore the couplings of the 2D model are highly anisotropic, as in figure 1. space imaginary-time  x 3,4 x 3,5 x 4,5 x 4,4 x 4,3 x 4,2 x 4,1 x 5,1 x 5,2 x 5,3 x 5,4 x 5,5 x 6,5 x 6,4 x 6,3 x 6,2 x 6,1 space n imaginary-time L The PIMC method proceeds by attempting to sample from π and approximate Z using the Markov chain Monte Carlo method with single-site Metropolis transition probabilities. With this in mind, the primary question is to determine the mixing time τ for this Markov chain, which is the minimum number of steps of this Markov chain needed to sample from a distribution that is close to π. We will use the method of canonical paths that was originally introduced by Jerrum and Sinclair (reviewed in [29,34]) in order to bound the mixing time. The idea of the method is that for every pair of states x, y ∈ Ω the Markov chain P needs to route an amount of "traffic" equal to π(x)π(y) along some path γ x,y which connects x to y using the transitions of the Markov chain. Define the congestion across a particular transition e = (z, z ) to be the ratio of the total traffic through it (i.e. x,y:e∈γx,y π(x)π(y)) to the amount of probability flow across it at equilibrium, which equals π(z)P (z, z ). The mixing time can then be upper bounded in terms of the maximum congestion through any transition, using standard arguments which we review in Section 4.1.
Our problem now reduces to finding good sets of paths between every pair of configurations x, y ∈ Ω. We use a set of paths inspired by those used to analyze the (unweighted) random walk on the boolean hypercube [29]. Specifically we will replace the entries of x with entries of y one at a time. A crucial requirement is that the energies of the intermediate states do not get too large, as this would create a large congestion. For the model (9) the paths first update the classical spins corresponding to the site of qubit 1 in consecutive imaginary-time order, and then proceed to do the same for the sites of qubits 2 through n. For each qubit j ∈ {1, . . . , n} the line of sites {(1, j), (2, j), . . . , (L, j)} is called the worldline of the j-th qubit, and so these canonical paths have the form of updating each qubit worldline consecutively, always finishing one worldline before starting the next.
An example of an intermediate step along this path is depicted in Fig. 2. Notice that intermediate steps have only introduced a constant number of broken bonds between spins that are neighbors in the imaginary time direction, each contributing an amount of energy that is O(β −1 log n), and at most L broken bonds between spins that are neighbors in the space direction, each contributing energy O(1/L). The relatively low energy of the intermediate configurations along the path is the key to the formal proof of rapid mixing for PIMC applied to 1D generalized TIM in Section 4.3. For 1D generalized TIM with long-range interactions, the energy of the intermediate configuration z will include contributions from spins that are arbitrarily far away from the cut. Using the assumption on the rate of decay of the magnitude of these interactions, the total contribution to the energy is upper bounded by where the bound is proven by viewing the summation as a right Riemann sum of the corresponding integral, which yields an upper bound because the integrand is a monotonically decreasing function. Since we regard ξ as a constant this contribution to the energy is O(1). For Hamiltonians of the form (6) the 2-local off-diagonal terms give rise to a classical energy function with interactions involving four spins at a time, The h i,j terms (which are implicitly defined by the Gibbs distribution (22)) correspond to strong ferromagnetic interactions between neighboring pairs of spins in neighboring time slices, and violating them increases the energy by an amount that is O(log n). If z i,j = z i+1,j and z i,j+1 = z i+1,j+1 then we say there is a 2-local jump between time slice i and time slice i + 1, because the resulting energy contribution from the 4-body interaction will depend on the magnitude of the 2-local off-diagonal quantum terms i.e. K xx j and K yy j . The canonical paths we have described so far will lead to intermediate configurations that potentially create or destroy a non-constant number of 2-local jumps across the domain wall pictured in figure 2. If K xx j is larger than K x j , then configurations x and y which contain a nonconstant number of 2-local jumps will result in the canonical paths described so far overloading some edges with large congestion. To overcome this we include additional path segments along the canonical paths that remove 2-local jumps from the worldlines near the domain wall in figure 2. This greatly increases the number of paths passing through each edge, but we are still able to obtain a poly(n, e β ) upper bound on the congestion by noticing that most of these paths have endpoints with very small stationary probabilities.
For Hamiltonians of the form (1) it is possible that the off-diagonal terms in (1) do not allow for the PIMC Markov chain with single-site transitions to be ergodic. In this situation some configurations z ∈ Ω will have π(z) = 0, which creates a new obstacle to the canonical paths we have described so far. To restore ergodicity under single-site updates we add a fictitious transverse field to the Hamiltonian with a weak coupling that goes to zero with system size: with Γ ≤ δ mult /n so that the subsequent estimate for the partition function will not be further than the allowed error tolerance away from its intended value.
Once the fictitious field is added it once again makes sense to attempt to use the same canonical paths that were applied to systems of the form (6). But now the fact that the off-diagonal matrix elements of the Hamiltonian are no longer symmetric under the global operation of flipping all spins means that removing a single (1-local or 2-local) jump from a given worldline can decrease the stationary probability of that configuration by a factor scaling exponentially in the number of jumps remaining in that worldline. To overcome this problem we show in Section 3.4 that typical configurations distributed according to π will have at most O(β log n) jumps in any particular worldline. This means that the worst-case intermediate configurations just described may have a stationary probability reduced by n O(β log n) , and this factor bottlenecks the mixing time estimate. Using these facts we describe canonical paths in Section 4.5 within a restricted state space Ω * in which every configuration has O(β log n) jumps per worldline, and by either restricting the PIMC Markov chain to Ω * or applying the leaky random walk conductance bound [16] we obtain a bound on the mixing time within the subset Ω * that establishes theorem 1.

Path integral Monte Carlo
Sections 3.1 through 3.3 rigorously present the underpinnings of the standard PIMC method: the quantum-to-classical mapping, and its application to approximating expectation values of observables as well as the partition function. Section 3.4 proves a general concentration of measure bound which shows that the number of jumps along any worldline in PIMC is O(β log n) with high probability. Note that this result applies to general stoquastic Hamiltonians (not just the 1D case). This result is used to analyze a restricted state space in Section 4.5. .

Quantum-to-classical mapping
The first step of the quantum-to-classical mapping is to split H as in (1) The fact that lim L→∞ Z β,L = Z β is the content of the 1875 Lie product formula, while for finite L the Suzuki-Trotter approximation is related to the quantum partition function (3) by for some δ that is O H 3 /L 3 . This error bound follows from two facts about the matrix exponential: a variant of the usual Suzuki-Trotter expansion, and a continuity bound. Both appear in [8] for transverse Ising models, and the proof for three commuting layers A, B, C is similar and so we omit the details. Before considering the general case of H as in (1) it is useful to consider the simplified case of (12) for generalized TIM of the form (5) by taking A = −β diag(H), B = −β (H − diag(H)), and C = 0 so that the Suzuki-Trotter approximation reduces to For transverse Ising models, the next step is to expand the Suzuki-Trotter approximation (14) to the partition function by inserting several complete sets of states, where in the last step we have used the fact that A = −β diag(H) is diagonal in the computational basis and defined A(z i ) := z i |A|z i . The Gibbs distribution of the classical 2D spin system is defined by equating e −βE β (z1,...,z L ) with the summand of (17). The closed form (9) is obtained applying the identity e ωσ x = cosh(ω)I + sinh(ω)σ x to For a general 1D stoquastic Hamiltonian of the form in (1) we will apply (12) with Expanding (12) with 2L complete sets of states, where periodic boundary conditions z 2L+1 := z 1 are taken in (20). At this point the essential difference from the transverse Ising case is clear: the off-diagonal part of the Hamiltonian is split into two commuting layers, these layers alternate along the imaginary-time direction, and this causes the coupling between classical spins at neighboring time slices to alternate as well.
For ease of notation define B i to be 2B if i is odd, and 2C if i is even, and rescale L → L/2 to obtain a distribution π from (21),

Approximating observables
Let O be an observable for which we would like to estimate O β := tr[Oρ β ]. This expectation can be expressed as the derivative of a partition function by defining and using the fact that The next proposition shows that O β can be approximated by a finite difference.
satisfies the bound Proof. We use first Taylor's theorem, then the bound tr AB ≤ A B 1 to obtain: Dividing both sides of (27) by ζZ β (0) we have Then In the last step we have used the assumption that δ ≤ 1/21 to simplify the numerical constants. A similar argument shows that O β − O β ≥ −2 √ δ O , thus completing the proof.
By (25), (13), and the triangle inequality, the expectation O β can be approximated by where Z β,L (ζ) is a Suzuki-Trotter approximation to (23). If O is diagonal use the form in Section 3.1 with A → A + ζO, Evaluating the derivative at ζ = 0, where the final expression follows by the symmetry of π under cyclic permutations of the time slices. For off-diagonal O, Evaluating the derivative at ζ = 0, where the last step follows by symmetry of π under even cyclic permutations of the time slices.
Since the summand of (42) is efficiently computable and has bounded magnitude we can estimate it using a standard Monte Carlo procedure (see lemma 3) once one has an efficient meanns of obtaining independent samples from π.

Approximating partition functions
In Section 4 we will prove that for any tolerance δ > 0 the PIMC method can efficiently output an element of Ω sampled from a distributionπ which is within |π − π| ≤ δ. Here we explain how these samples can be used to approximate the partition function with an application of the standard telescoping product technique [28,5]. We express the partition function Z(β) as a product of terms involving the partition function at higher temperatures, where the temperature schedule β 0 < β 1 < . . . < β k with β 0 = 0 and β k = β is chosen so that each ratio in the telescoping product is bounded. Defining w βi (z) := Z(β i )π βi (z), note that At infinite temperature the quantum partition function is Z(0) = 2 n . Further, a uniform schedule with k = O(β H log(β H )) ensures that Note that in infinite temperature limit β = 0 the classical effective systems (9) and (10) have infinite coupling strength between spins in the imaginary-time direction, and so the Markov chain we analyze in Section 4 will not be ergodic. This is ununsual from the perspective of classical spin systems, but it happens in this case because of the temperature dependence of the couplings. To deal with this the first expectation w β1 /w β0 π β 0 in the telescoping product can be computed using the fact that π β0 is equivalent to the uniform distribution on the subset Ω classical ⊆ Ω consisting of configurations (z 1 , . . . , z L ) which have z i = z j for all i, j = 1, . . . , L.
Finally, in order to bound the number of samples needed to compute the expectation values in the telescoping product we will make use of the Hoeffding bound.

Restricting the PIMC state space
In this section we justify restricting PIMC to a subset of the state space that limits the number of jumps in any particular worldline to be O(β log n) (this restricted state space will be used for the mixing time analysis in Section 4.5). For a configuration z = (z 1 , ..., z L ) the number of jumps in the worldline of the j-th qubit can be expressed in terms of the Pauli operator X j , The strategy is to bound the k-th moment of the random variable d j with respect to the QMC stationary distribution π and use the fact that These moments satisfy where A, B i are defined in Section 3.1. To merge the two products in the expression above into a simpler trace expression, we want to show the following inequality for any neighboring time slices z i and z i+1 , where we assume the local terms are normalized so that for some Γ > 0 If z i |X j |z i+1 = 0 then (50) is satisfied. Otherwise if z i |X j |z i+1 = 1 then the numerator on the RHS of (50) satisfies To upper bound the denominator of the RHS of (50) , the fact that B i is a commuting Hamiltonian implies where in the last step we have used H j ≤ 1, β L ≤ ln(2) and the convexity of e x . Using ln(2)/2 ≥ 1/4 we have established (50).
To apply this result to (49) we will use the permutation symmetry of the i 1 , ..., i k variables to restrict to a sum over ordered k-tuples (i 1 , ..., i k ) which satisfy Defining t r = (i r − i r−1 )β/L (as well as t 1 = r 1 β/L, t k = (L − i k )β/L) this expression can be seen as the trace of a product of operators, where M ti := e Beven/L e A/L e B odd /L e A/L e Beven/L tiβ/L −2 and L i ,R ti can each denote one of several products of operators depending on t i , which is determined by where the X j operators disrupt the sequence, The purpose of this decomposition is that each M ti is a PSD operator raised to an positive integer power, and the rest of the Matrices L ti , R ti , X j are all either PSD operators or are explicit products of 2,3, or 4 PSD operators. Now the traces in (54) can be bounded using classic inequality due to Ky Fan [18,Theorem 1]. Let P 1 ...P N satisfy P i 0 for each i, and let σ(P i ) be a diagonal matrix with the eigenvalues of P i along the diagonal in descending order. Then tr [P 1 ...P N ] ≤ tr [σ(P 1 )...σ(P N )] For our purposes define the notation σ(R ti ) to be the product of diagonal matrices of eigenvalues of the operators that make up R ti e.g. if R ti = e A/L e Beven/L then σ(R ti ) = σ e A/L σ e Beven/L . Furthermore we may take B even , B odd 0 without loss of generality (by shifting the overall Hamiltonian by a multiple of the identity), so that 1 σ(R ti ), σ(L ti ) for each t i . Therefore each trace in (54) satisfies = tr e Beven/L e A/L e B odd /L e A/L e Beven/L L−2k where the fact that 1 σ(R ti ), σ(L ti ) is used to go from (55) to (56). Now for k L we have Z β,L ≈ Z β,L−2k , so this implies that E π d k j ≤ k!β k . Therefore taking k = c log n for some c > 0 and using k! ≤ (k/e) k yields 4 Proofs of rapid mixing

Markov chains, mixing times, and canonical paths
Let P be the transition matrix of a reversible Markov chain defined on a state space Ω with stationary distribution π. Let z be the state of the chain at time t = 0, and define P t (z, ·) to be the distribution of the state of the chain at time t. The variation distance of the chain at time t starting from the initial state z at time t = 0 is Define τ ( ) to be the worst-case time needed to be within variation distance of the stationary distribution, Define the mixing time to be τ mix := τ (1/4), and note that τ ( ) = τ mix log −1 [34].
To bound the mixing time of P we use the method of canonical paths [29]. A path from x ∈ Ω to y ∈ Ω is a sequence γ x,y = (v 1 , . . . , v k ) of states with v 1 = x, v k = y and P (v i , v i + 1) > 0 for i = 1, . . . , k − 1. A set P = {γ x,y } containing a path that connects every pair of states is called a set of canonical paths. If = max γ∈P |γ| is the maximum length of any path in P and π min := min z∈Ω π(z), then the mixing time τ mix is O(R ln π −1 min ), where the congestion R is defined as In order to compute the congestion we will use a standard technique called encoding. For an edge e = (v, v ), let P(e) be the set of paths in P which pass through e. An encoding is an assignment of an injective function η e : P(e) → Ω to every edge e ∈ E(Ω). More generally, it can be useful to consider injective encodings η e : P(e) → Ω × Θ that map at most G = |Θ| paths through the e to each state in Ω [28]. Let η e (γ x,y ) = (η e (x, y), θ e (x, y)), and suppose there is some M such that π(x)π (y) ≤ M π (v) π (η e (x, y)) ∀x, y ∈ Ω, then the congestion satisfies where P min := min (z,z )∈E(Ω) P (z, z ), (62) follows from property (60), and (63) follows from the injectivity property of the encoding.

PIMC transition probabilities
To sample from the classical Gibbs distribution (22) the PIMC method uses a Markov chain which chooses a site in the classical lattice Λ uniformly at random and proposes to flip the bit at that site to make a transition z → z with an acceptance probability P (z, z ) given by the Metropolis rule [34], As long as the state space is a connected graph, the transition matrix P with transitions satisfying (64) will converge to the stationary distribution π.
For the sake of lower bounding the minimum transition probability P min from the previous section it therefore suffices to lower bound π(z )/π(z) where z and z differ only on a single site. Suppose z differs from z at position (i, j), then after canceling common factors in (22) we have Depending on the off-diagonal matrix elements in B i , the numerator of (65) can potentially be zero. This means that the state space Ω may not be connected by single bit flips, because the stationary distribution does not have support on all of Ω. This can be avoided in general by adding a 1-local transverse field field with sufficiently small magnitude Γ to avoid disturbing the resulting approximation. With this we can lower bound the ratio (65) using the fact that B i is a nonnegative matrix, which implies The denominator of (65) is upper bounded by 1, and (66) implies that the numerator is at least (βΓ/L) 2 , therefore

Transverse Ising models
In this section we will describe a set of canonical paths which will be used to show rapid mixing of PIMC applied to Hamiltonians of the form (5). Let x, y ∈ Ω be spin configurations with x = (x 1,1 , . . . , x L,1 , . . . , x 1,n . . . , x L,n ) and y = (y 1,1 , . . . , y L,1 , . . . , y 1,n . . . , y L,n ). It will be convenient to choose an ordering on the space [L] × [n]. We order bits first by the second coordinate and then the first, so that (r, l) (s, m) if l ≤ m or if r ≤ s and l = m.
The canonical path γ x,y from x to y consists of simply replacing each bit of x with the corresponding bit of y following the above ordering. This can be thought of as n steps, each of which consists of L substeps that correspond to transitions of the Markov chain. The r-th substep of the m-th step consists of flipping the bit in the r-th Trotter slice of the m-th qubit if x r,m = y r,m or leaving it alone if x r,m = y r,m . In the former case, this contributes an edge (v, v ) ∈ E(Ω) with v = (y 1,1 , . . . , y r−1,m , x r,m , x r+1,m , . . . , x L,n ) (70) v = (y 1,1 , . . . , y r−1,m , y r,m , x r+1,m . . . , x L,n ). (71) In the latter case (i.e. x r,m = y r,m ), this step does not add an edge to the path. Thus the number of edges in γ x,y equals the number of positions on which x and y disagree, which is always ≤ nL. In Section 4.1 we defined an encoding of a path to be an injective map η e from the set of paths through e to the state space Ω. The encoding we'll use for this path is where · to denote elementwise multiplication. To see that x, y are uniquely determined by v together with η = η (v,v ) (γ x,y ), observe that v · η = x · y, which specifies the bits that are flipped along the path γ x,y . Moreover, the edge (v, v ) specifies the location (r, m) of the bit being flipped in the transition from v to v . Let (v · η) ≤(r,m) denote the vector which equals v · η = x · y on the coordinates in [≤ (r, m)] and equals 1 on the coordinates in [> (r, m)]; define (v · η) >(r,m) analogously. Then we can recover x by calculating v · (v · η) ≤(r,m) , and can recover y by calculating v · (v · η) >(r,m) . For any subset of the lattice sites A ⊆ Λ, define the restriction E A β of the classical energy function E β by restricting the sum in (9) to sites (i, j) ∈ A, so that for all z ∈ Ω we have Now we compute where We also need where and B Y is defined similarly. The value of M needed to satisfy (60) for this encoding can now be determined by To upper bound the exponent in the worst-case define J := max m=1,...,n J m , then To upper bound the sum we use the assumption K zz jk ≤ |j − k| −(2+ξ) and regard it as a lower Riemann sum for an integral, djdk |j − k| −(2+ξ) = 1 − 2 ξ+1 n −ξ + (n − 1) −ξ ξ 2 + ξ , and since ξ > 0 it suffices in (63) to take M = O e 8J+4β . If the 1D quantum system has periodic boundary conditions connecting qubit 1 to qubit n, then the same analysis would show that M = O e 8(J+β) suffices for (60). Therefore by (61)-(63) the congestion R satisfies the factor of 2nL comes from the chain being lazy and the choice of random single-site updates. Note that max (z,z ) (π(z /π(z)) is O(L 2 Γ −2 ) from (67) and also := max x,y |γ x,y | = nL and ln π −1 min = ln(Z · e nLJ+nβ ) ≤ nL(J + β + ln 2), using Z ≤ 2 nL . In Section 3.1 we show that L = poly(n, δ −1 mult ) suffices to approximate the partition function with the desired precision. Therefore the mixing time bounds in Section 4.1 imply that the PIMC method generates samples as was assumed in Section 3.3, which completes the proof of theorem 1 for models of the form (5).

Polynomial time mixing in theorem 1
In this section we will prove rapid mixing of PIMC for Hamiltonians of the form (6) by describing paths between arbitrary states x, y ∈ Ω. It will be convenient to represent states in the form z = [z 1 , . . . ,z n ] withz j := (z 1,j , . . . , z L,j ) denoting the spin values along the j-th worldline in configuration z ∈ Ω. We say that a "double jump" occurs in configuration z at position (i, j) if (1 + z i,j z i+1,j ) (1 + z i,j+1 z i+1,j+1 ) = 0. For any consecutive pair of worldlines j, j + 1, define the number of double jumps d(z j ,z j+1 ), For any configuration z and worldline j we will define a path segment to a new configuration [z 1 , . . . ,z j−2 ,z j−1 ,z j ,z j+1 ,z j+2 , . . . ,z n ] with all of the double jumps removed from worldline j, d(z j−1 ,z j ) = d(z j ,z j+1 ) = 0 (80) The double prime notation indicates that the j-th worldline does not contain any double jumps, while the single prime indicates a worldline that shares no jumps with its double primed neighboring worldline.
There are many possible paths to choose from for removing double jumps in the i-th worldline. Due to the form of the Hamitonian (6), changing the placement of a double jump along the imaginary-time direction has no effect on the energy of the effective classical system, except possibly for the part which arises from the diagonal part of the Hamiltonian. Similar reasoning holds for the ordering of the single and double jumps; changing the ordering can only change the effective classical energy through the diagonal part of the Hamiltonian. Similarly, changing the imaginary-time position of the double jumps inx i−1 ,x i will have no effect on the contributions to the effective classical energy that comes from double jumps occuring inx i−2 ,x i−1 . Finally, note that changing the position of a double jump along the imaginary-time direction can only affect the classical energy through the diagonal part of the Hamiltonian, reducing the probability of the configuration by at most e −4β (the factor of 4 is due to the potential change from the diagonal part of the Hamiltonian when 3 worldlines are changed arbitrarily).
From the above considerations it follows that the imaginary-time position of the double jumps in a particular worldline can be changed freely with the stationary weight of the resulting configuration decreased by at most a factor of e −4β (the factor of 4 bounds the potential change from the diagonal part of the Hamiltonian when 3 worldlines are changed arbitrarily). Furthermore, since double jumps arising from the term σ x i σ x i+1 are simply bit-flips, two such jumps will cancel out ("annihilate") if they coincide. This allows for paths that remove these double jumps in pairs simply by moving them around, with the additional guarantee that this will not lower the stationary weight by more than e −4β . Therefore the path from [z 1 , . . . ,z n ] to [z 1 , . . . ,z j−2 ,z j−1 ,z j ,z j+1 ,z j+2 , . . . ,z n ] is defined to remove double jumps in pairs by shifting the position of the double jump at the least imaginary-time position along the imaginary-time direction until it encounters the next double jump and the pair is annihilated. The canonical path from x to y proceeds through the worldlines in order 1, . . . , n, prepping each pair of consecutive worldlines to remove double jumps, then updating the j-th worldline fromx j toȳ j . The full update of the first worldline proceeds as, The purpose of this series of steps (each of which consists of many single-site updates) is to avoid passing through intermediate configurations with too many additional double jumps occuring across the 1D domain wall formed near the worldlines which are being updated. The worst case intermediate configurations along the paths will have the form, z = [ȳ 1 , . . . ,ȳ j−2 ,ȳ j−1 , (y 1,j , . . . , y r,j , x r+1,j , . . . , x L,j ),x j+1xj+2 , . . . ,x n ]. (81) This system of paths is highly degenerate (a particular configuration such as (81) will appear in many paths). To bound the congestion we will use an encoding function η (z,z ) : P(z, z ) → Ω×Θ. If η (z,z ) (γ xy ) = (η, θ), then . , x r,j , y r+1,j , . . . , y L,j ),ȳ j+1 ,ȳ j+2 , . . . ,ȳ n ].
and θ contains all of the information needed to reconstructx j−1 ,x j ,x j+1 ,ȳ j−1 ,ȳ j ,ȳ j+1 from x j−1 ,x j ,x j+1 ,ȳ j−1 ,ȳ j ,ȳ j+1 . This lost information must specify the location of all of the jumps that have been removed from these 6 worldlines. This means that the encoding sends exponentially paths to each state, and so to obtain a polynomial time mixing bound we have to slightly generalize (63) to account for the fact that the stationary probability on the endpoints of most of these paths is sufficiently small. For any v ∈ Ω define d j (v) := d(v j−1 ,v j ) + d(v j ,v j+1 ) (which is just the total number of double jumps in the j-th worldline of v). Let q be the number of additional double jumps in the j-th worldlines of x and y which are not present in the j-th worldlines of z and η, Define P q z,z to be the subset of canonical paths passing through (z, z ) with endpoints x and y that satisfy (83). With these definitions the congestion (59) becomes q is the number of ways to restore q jumps to the undetermined worldlines. Now suppose we find M (q) such that π(x)π(y) ≤ M (q) · π(z)π(η), for all γ xy ∈ P q z,z , then this implies To satisfy (85) we need M (q) to satisfy where (90) to (91) follows because the the terms in the classical energy function which correspond to off-diagonal parts of the Hamiltonian involving qubits j − 2 and j + 2 are unaffected by the potentially changed spin values in the primed j − 1 and j + 1 worldlines. The factor of e 4β comes from upper bounding the change in the classical energy function due to the diagonal terms of the Hamiltonian between qubits j − 2, j − 1 and j + 1, j + 2. The effect of the diagonal part of the Hamiltonian on the ratio expression remaining in (91) can be bounded by a factor of e 8β . The additional single jumps introduced near the domain wall can increase the ratio by at most L 12 (using (67), and noting that the worst case occurs when 2 new single jumps are introduced in each of 3 worldlines, in each of z and η). Each of the double jumps in the numerator which are not present in the denominator decreases the ratio by at least (2β/L). Therefore we may take M (q) = e 12β (2β) q L −q and (88) becomes Using the fact that P −1 min is poly(n, β, Γ −1 ) from (67), and the fact that the maximum length of the canonical paths used in this section is O(n 2 L 2 ), this shows that τ mix is poly(n, e β , Γ −1 ) as required to complete the proof of theorem 1.
In Section 3.4 we show that Z * = z∈Ω * e −βE β (z) satisfies |Z − Z * | ≤ δZ. The PIMC dynamics can be restricted to Ω * by rejecting moves which would leave Ω * , or alternatively the canonical paths in this section bound the mixing time within Ω * by the leaky random walk conductance bound in [16]. To show rapid mixing of PIMC within Ω * we assign a canonical path γ xy to every x, y ∈ Ω * . The simple case for these paths occurs when x, y are in a subset Ω * inner ⊆ Ω * which only allows half as many jumps per worldline as Ω * does, Ω * inner = {z = [z 1 , . . . , z n ] ∈ Ω : |{i : z i,j = z i+1,j }| ≤ B for j = 1, . . . , n}.
If x, y ∈ Ω * inner then we use the same paths which were used to show rapid mixing for TIM in Section 4.3. Various intermediate points v along this path γ x,y may leave Ω * inner but will remain in Ω * since v = (y 1,1 , . . . , y L,1 , . . . , y L−r−1,m , x L−r,m , x L−r+1,m . . . , x 1,n , . . . , x L,n ), may have as many as B jumps in the first r time slices ofȳ m and B jumps in the L − r latter time slices ofx m , sov m will have at most 2B jumps in total and so v ∈ Ω * . For points x, y ∈ Ω * which are not in Ω * inner , we must not produce intermediate points with too many jumps in any worldline. To accomplish this the paths will have a "clean up" step similar to the one which was used in the previous section, to reduce the number of 2-local jumps in a worldline. Consider an intermediate point of the path which has so far updated the worldlines of the first m qubits (which could be m = 0), v = [ȳ 1 , . . . ,ȳ m ,x m+1 , . . . ,x n ]. If x m+1 has more than B double jumps then we remove these in imaginary-time order. Since we can no longer "annihilate" these transitions in pairs as was done in the previous section, we instead dismantle them into 1-local jumps (relying on the fictitious transverse field) and then annihilate those against one another. The effect of this process on the stationary weight of the configuration will be analyzed after we have given the encoding function for these paths. The net result of this process results is a worldlinex m+1 with fewer than B jumps. Similarly letȳ m+1 be defined fromȳ m+1 with all but the first B jumps removed. Next we updatex m+1 toȳ m+1 in the usual way, and sincex m+1 andȳ m+1 each have fewer than B jumps, the intermediate configurations along this path will have less than 2B jumps in every worldline and therefore remain in Ω * .
The encoding η : P (z,z ) → Ω * × Θ for these paths is again defined by (82). This time the maximum degeneracy of the encoding is 4L B , since there are at most 4L positions to which at most B 2-local jumps can be restored in the undetermined worldlines. Unfortunately, this time we cannot have a inequality analogous to (85), because the additional double jumps present in x and y can completely change the bit-values in wordlines j − 1 and j + 1. Unlike the case in the previous section these altered spin values can reduce the classical energy drastically because of the h terms acting on worldlines j − 2, j − 1 and j + 1, j + 2. The stationary weight of z and η can be lowered from x and y by a factor which scales exponentially in the number of double jumps that x and y have in worldlines j − 2, j − 1 and j + 1, j + 2. Therefore, M ≥ π(x)π(y) π(z)π(η) The fictitious transverse field has magnitude δ mult /n, so the congestion satisfies ≤ P −1 min · e 12β · poly(n) · δ −1 mult 8βΓ log n · 4L 4βΓ log n , from which it follows that the mixing time τ mix is O(2 β(log nδ −1 mult ) 2 ) and we obtain the overall PIMC runtime as claimed in theorem 1.