Simpler (classical) and faster (quantum) algorithms for Gibbs partition functions

We present classical and quantum algorithms for approximating partition functions of classical Hamiltonians at a given temperature. Our work has two main contributions: first, we modify the classical algorithm of \v{S}tefankovi\v{c}, Vempala and Vigoda (\emph{J.~ACM}, 56(3), 2009) to improve its sample complexity; second, we quantize this new algorithm, improving upon the previously fastest quantum algorithm for this problem, due to Harrow and Wei (SODA 2020). The conventional approach to estimating partition functions requires approximating the means of Gibbs distributions at a set of inverse temperatures that form the so-called cooling schedule. The length of the cooling schedule directly affects the complexity of the algorithm. Combining our improved version of the algorithm of \v{S}tefankovi\v{c}, Vempala and Vigoda with the paired-product estimator of Huber (\emph{Ann.\ Appl.\ Probab.}, 25(2),~2015), our new quantum algorithm uses a shorter cooling schedule than previously known. This length matches the optimal length conjectured by \v{S}tefankovi\v{c}, Vempala and Vigoda. The quantum algorithm also achieves a quadratic advantage in the number of required quantum samples compared to the number of random samples drawn by the best classical algorithm, and its computational complexity has quadratically better dependence on the spectral gap of the Markov chains used to produce the quantum samples.


Introduction
The Markov chain Monte Carlo (MCMC) method is a fundamental computational tool in statistics. It is a strategy for sampling from certain high-dimensional probability distributions, which can be used to estimate properties of systems that would be difficult to study otherwise. In the last two decades, MCMC methods have found numerous applications in Bayesian inference [10,8], counting problems [23], volume estimation of convex bodies [7], approximation of the permanent [16], estimation of thermodynamic properties of systems [2], and in finance for simulating the performance and volatility of portfolios [9]. A major algorithmic task in the area of MCMC methods is the approximation of partition functions. In physics, the partition function describes the statistical properties of a physical system at a fixed inverse temperature, but a wide range of problems can be naturally cast as questions about partition functions. In machine learning, for example, the partition function appears in the definition of probabilistic graphical models such as Markov random fields (e.g., restricted Boltzmann machines [13]), and has been extensively studied [5,19,14,18]. Given their many applications, it is important to design efficient algorithms for computing partition functions, and to explore the potential of faster techniques to compute them.
We study the advantages that quantum computers have over classical computers for the computation of partition functions. It is well known that quantum algorithms provide quadratic speedups over classical algorithms for a variety of tasks related to MCMC, such as amplitude amplification and estimation [3] and spectral gap amplification of Markov Chains [24]. Even though a general quantum speedup for MCMC methods has not been found, fast quantum algorithms for partition functions using simulated annealing have been proposed [28,21,12]. Here we study classical and quantum algorithms for computing partition functions; our main contribution is a unification of two classical algorithms, leading to a quantum algorithm that improves upon the best known (quantum or classical) approaches. Before we describe our main results, we first introduce some basic notation and the problem of computing Gibbs partition functions.

Estimating partition functions
Let Ω be a finite set and H : Ω → {0, . . . , n} be a function called the Hamiltonian.¹ The Gibbs distribution is defined as

    μ_β(x) := e^{−βH(x)} / Z(β),

where β is the inverse temperature and Z(β) := Σ_{x∈Ω} e^{−βH(x)} is the partition function.
Given β_max and ε > 0, we aim to find Ẑ such that (1 − ε) · Z(β_max) ≤ Ẑ ≤ (1 + ε) · Z(β_max) using the least number of samples from μ_β for various choices of β. An approach for this task is based on the following telescoping product:

    Z(β_max) = Z(β_0) · ∏_{i=0}^{ℓ−1} Z(β_{i+1}) / Z(β_i),    (1)

where 0 = β_0 < β_1 < · · · < β_ℓ = β_max is a sequence of inverse temperatures called a cooling schedule. Eq. (1) approximates Z(β_max) if each factor Z(β_{i+1})/Z(β_i) is estimated with sufficient precision. We fix β_max = ∞ in the rest of the introduction for simplicity. Two approaches for this approximation were introduced in Refs. [23,14]. The algorithm of Štefankovič et al. [23] computes Z(∞) in two steps: first, it produces a cooling schedule consisting of O(√(ln |Ω|) · ln n · ln ln |Ω|) inverse temperatures. Defining X_i = e^{(β_i − β_{i+1})H(x)}, we have E_{x∼μ_{β_i}}[X_i] = Z(β_{i+1})/Z(β_i). The partition function can then be approximated by sampling many x ∼ μ_{β_i}, empirically estimating the factors at subsequent temperatures, and taking their product. This is called the product estimator. The resulting algorithm for partition function estimation has sample complexity 10^{10} · ln |Ω| · ε^{−2} · (ln n + ln ln |Ω|)^5. We call this algorithm SVV, for Štefankovič, Vempala and Vigoda.
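As a toy illustration of Eq. (1) and the product-estimator factors, the following sketch computes Z(β) exactly for a small illustrative Hamiltonian (Hamming weight on 4 bits, our own choice, not from the paper), checks the telescoping identity, and estimates one factor by sampling:

```python
import math
import random

# Illustrative Hamiltonian: H(x) = Hamming weight of a 4-bit string,
# so Omega = {0,1}^4 and H : Omega -> {0,...,4}.
states = list(range(2 ** 4))
H = lambda x: bin(x).count("1")
Z = lambda b: sum(math.exp(-b * H(x)) for x in states)

# Telescoping product (Eq. 1): Z(b_max) = Z(b_0) * prod_i Z(b_{i+1})/Z(b_i).
schedule = [0.0, 0.5, 1.0, 2.0]          # b_0 = 0 < ... < b_max = 2
telescoped = Z(schedule[0])
for b0, b1 in zip(schedule, schedule[1:]):
    telescoped *= Z(b1) / Z(b0)
assert abs(telescoped - Z(2.0)) < 1e-9

# One factor estimated by sampling: E_{x ~ mu_{b0}}[e^{(b0-b1)H(x)}] = Z(b1)/Z(b0).
random.seed(0)
b0, b1 = 0.5, 1.0
xs = random.choices(states, weights=[math.exp(-b0 * H(x)) for x in states], k=50_000)
est = sum(math.exp((b0 - b1) * H(x)) for x in xs) / len(xs)
assert abs(est - Z(b1) / Z(b0)) < 0.01
```

Exhaustive summation over Ω is only feasible for this toy example; the algorithms in the paper replace it by sampling from μ_β.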
SVV contains many subtle technical points that lead to some of the polylogarithmic factors and a large prefactor in its running time. Huber [14] and Kolmogorov [18] eliminated these hurdles using the so-called Tootsie Pop Algorithm (TPA) [15]. This algorithm uses a Poisson point process to build a cooling schedule with ∼ ln |Ω| temperatures. The product estimator would be inefficient in this context, as it would pick up a quadratic overhead in the schedule length. To address this, Huber introduced the paired-product estimator. Given a cooling schedule, define d_{i,i+1} = (β_{i+1} − β_i)/2, and random variables V_i, W_i:

    V_i = e^{−d_{i,i+1} H(x_i)},    W_i = e^{d_{i,i+1} H(x_{i+1})},

¹ Hamiltonians are usually defined as real-valued functions. In this paper we make the simplifying assumption that the range of the Hamiltonian is contained in the set of nonnegative integers. This assumption is common in the literature and can often be relaxed; see Kolmogorov [18] for a discussion.
where x_i ∼ μ_{β_i} and x_{i+1} ∼ μ_{β_{i+1}}. The telescoping product in Eq. (1) can be rewritten as

    Z(β_ℓ) / Z(β_0) = ∏_{i=0}^{ℓ−1} E[V_i] / E[W_i].    (2)

The algorithm estimates the expectations E[V_i] and E[W_i], then takes their ratio. This, remarkably, leads to a better estimator: the overall complexity of TPA (using the improved analysis given by Kolmogorov [18]) is O(ln |Ω| · ε^{−2} · ln n).

Results
We give a quantum algorithm that estimates partition functions by combining elements of SVV and TPA. Assuming the ability to query an oracle that provides coherent encodings of the Gibbs distribution at arbitrary inverse temperature (qsamples), we show that the quantum algorithm requires fewer qsamples compared to the number of samples drawn by the best classical algorithm. This assumption is not restrictive, as qsamples can be generated from a classical Markov chain that has the corresponding Gibbs distribution as its limiting distribution; the total running time of the algorithm (classical and quantum) must also account for the time, known as the mixing time, taken to generate these samples. Quantum algorithms for approximating partition functions [28,21,12] do not perform the classical steps of the MCMC algorithm. Instead, they utilize unitary quantum walks [24] that are related to the classical Markov chain by a suitable embedding as the product of two reflection operators. Such quantum walks are constructed so that a natural quantum encoding of the Gibbs distribution is an eigenvector, with eigenvalue one, of the quantum walk unitary. This leads to a quadratic advantage of the quantum algorithm over its classical counterpart, in terms of the mixing time.

Classical contribution
We propose a classical algorithm for the approximation of partition functions that simplifies [23], and almost matches the sample complexity of TPA, the best classical algorithm [18,14]. It uses the paired-product estimator within SVV: this allows for a shorter cooling schedule and a simpler estimator.
This estimator becomes simpler (motivating the "simpler (classical)" in the paper title) because the approximation of the partition function ratios in the cooling schedule algorithm does not depend on temperatures outside the interval on which these ratios are estimated. This removes the nested search in the SVV cooling schedule algorithm, a significant simplification. Moreover, unlike TPA, where these ratios are estimated using a Poisson point process, our algorithm is easier to quantize.
Ref. [23] conjectured that an optimal adaptive cooling schedule has length Θ(√(ln |Ω| · ln n)) and proved a lower bound of Ω(√(ln |Ω|)).² They give an algorithm that produces a schedule of length O(√(ln |Ω|) · ln n · ln ln |Ω|). Our algorithm, albeit quantum, produces a schedule of length O(√(ln |Ω| · ln n)), while achieving the same approximation precision.

Quantum contribution
Our main contribution is a quantum algorithm that improves the best known sample complexity and overall running time. In the quantum setting, the samples are coherent encodings of the Gibbs distribution (qsamples):

    |μ_β⟩ := Σ_{x∈Ω} √(μ_β(x)) |x⟩.

The quantum algorithm is based on SVV, but uses the paired-product estimator of [14]. We give a procedure, based on binary search, to obtain a cooling schedule of length O(√(ln |Ω| · ln n)), using the fact that we can efficiently compute inner products of qsamples of the form |⟨μ_{β_i}|μ_{β_{i+1}}⟩|. The length of this cooling schedule matches the optimal length conjectured in [23].

A technical tool
To estimate the telescoping product (2), we give a quantum algorithm for the estimation of the expected value of random variables with bounded relative variance. Computing expectations of random variables is a fundamental task in statistics, applied probability, and related disciplines. Our contribution in this context is a simpler and faster quantum algorithm for computing the expectation of a random variable, given quantum samples. Recall that, in order to compute the Gibbs partition function, one needs to estimate the ratios in the telescoping product (2). To do this, we give a simple quantum algorithm to estimate the expected value of random variables with bounded relative variance. In particular, for a distribution D and a random variable V with relative variance bounded by B, we describe a quantum algorithm that uses O(B) copies of |ψ_D⟩ = Σ_x √(D(x)) |x⟩, and O(√B/ε) reflections about |ψ_D⟩, to obtain with high probability an ε-relative approximation of E[V]; additionally, the algorithm restores one copy of |ψ_D⟩. The proposed algorithm is based on the work of Montanaro [21] and improves upon its scaling from O(B/ε) to O(√B/ε). Our algorithm has essentially the same scaling as the algorithm of Hamoudi and Magniez [11], but the analysis of our algorithm is simpler.
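For contrast, a purely classical baseline for the same task (mean estimation under a relative-variance bound B) can be sketched with median-of-means, which needs roughly O(B/ε²) samples; the quantum algorithm in the text improves the dependence on ε and B. Everything below (the distribution, sample counts, and helper names) is illustrative and is not the paper's algorithm:

```python
import random
import statistics

# Median-of-means estimator: split the samples into groups, average each
# group, and return the median of the group averages.
def median_of_means(draw, n_samples, n_groups=10):
    group = max(1, n_samples // n_groups)
    means = [statistics.fmean(draw() for _ in range(group))
             for _ in range(n_groups)]
    return statistics.median(means)

random.seed(0)
# Toy nonnegative random variable: V ~ Exp(1), so E[V] = 1, E[V^2] = 2,
# and the relative variance is B = E[V^2]/E[V]^2 = 2.
draw = lambda: random.expovariate(1.0)
est = median_of_means(draw, 20_000)
assert abs(est - 1.0) < 0.05
```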

Applications
Many computational problems can be encoded into evaluations of the partition function and we briefly sketch out two applications of our algorithm. Our work also applies to other problems, such as counting independent sets, matchings and Bayesian inference as discussed in [12,21,23]; we refer the interested reader to these works. As discussed earlier, our classical and quantum improvements over [12,23] directly imply better algorithms for estimating partition functions of these Hamiltonians.

Ferromagnetic Ising Model
The ferromagnetic Ising model on a graph G = (V, E) can be defined by the Hamiltonian that counts disagreeing edges,³

    H(x) = |{(u, v) ∈ E : x_u ≠ x_v}|,    x ∈ {0, 1}^V.

The associated Gibbs distribution can be sampled using a Markov chain called Glauber dynamics. Convergence of Glauber dynamics is well studied, and several criteria for rapid mixing are known. Montanaro [21] quotes a result of Mossel and Sly [22] proving that the chain mixes in time O(|V| ln |V|) on general graphs of bounded degree for a sufficiently small value of β. As the mixing time upper-bounds the inverse spectral gap (see [22, Eq. 6]), it follows that for |V| = n, our classical algorithm achieves complexity Õ(n² · ε^{−2}). While there may exist algorithms that perform significantly better, this matches (up to polylogarithmic factors) the best classical upper bound that we are aware of [18]. The quantum algorithm, in comparison, achieves complexity Õ(n^{3/2} · ε^{−1}). Apart from being simpler than the algorithm derived in [12], our approach improves the complexity by polylogarithmic factors.
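A minimal sketch of single-site Glauber dynamics for this model; the graph, the spin encoding, and the disagreement-counting Hamiltonian are illustrative assumptions, not the paper's exact setup:

```python
import math
import random

# One step of single-site Glauber dynamics for the Ising Hamiltonian
# H(x) = #{(u,v) in E : x_u != x_v}: pick a uniformly random site and
# resample its spin from the conditional Gibbs distribution.
def glauber_step(x, adj, beta):
    v = random.randrange(len(x))
    # Local energy of site v in each candidate spin = number of
    # disagreeing neighbours.
    disagree = {s: sum(1 for u in adj[v] if x[u] != s) for s in (0, 1)}
    p1 = math.exp(-beta * disagree[1])
    p1 /= p1 + math.exp(-beta * disagree[0])
    x[v] = 1 if random.random() < p1 else 0
    return x

random.seed(1)
adj = {0: [1], 1: [0, 2], 2: [1]}      # a path on 3 vertices (toy graph)
x = [0, 1, 0]
for _ in range(1000):
    glauber_step(x, adj, beta=0.5)
assert set(x) <= {0, 1}
```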

k-Colorings
The k-state Potts model on a graph G = (V, E) has the Hamiltonian counting monochromatic edges,

    H(x) = |{(u, v) ∈ E : x_u = x_v}|,    x ∈ {1, . . . , k}^V,

with the corresponding partition function Z(β) = Σ_x e^{−βH(x)}. The partition function at β = ∞ therefore gives the number of proper k-colorings of G. Vigoda [26] gave a Markov chain with mixing time O(nk log n) whenever k > (11/6) · d, where d is the maximum degree of G. Our quantum algorithm has Õ(n^{3/2} · ε^{−1}) complexity for this problem. Table 1 summarizes the sample complexity of the algorithms presented in this paper and compares them to existing results in the literature.
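The claim that Z(β) approaches the number of proper k-colorings as β → ∞ can be checked by brute force on a small graph; the monochromatic-edge Hamiltonian below is an assumption consistent with the text, and the triangle graph is an illustrative choice:

```python
import math
from itertools import product

edges = [(0, 1), (1, 2), (0, 2)]        # a triangle
k, n_vertices = 3, 3

# Potts Hamiltonian: number of monochromatic edges.
def H(x):
    return sum(1 for u, v in edges if x[u] == x[v])

def Z(beta):
    return sum(math.exp(-beta * H(x))
               for x in product(range(k), repeat=n_vertices))

# Proper colorings are exactly the zero-energy states.
proper = sum(1 for x in product(range(k), repeat=n_vertices) if H(x) == 0)
assert proper == 6                      # 3! proper 3-colorings of a triangle
assert abs(Z(50.0) - proper) < 1e-6     # large beta approximates beta = infinity
```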

Classical algorithm
We first introduce our notation and describe the improved version of SVV, which is the basis for the quantum algorithm.
Let H : Ω → {0, . . . , n} be a classical Hamiltonian. For fixed β_min, β_max, we let Q = Z(β_max)/Z(β_min). We often use the shorthand q = ln |Ω| and assume that ln n ≥ 1, ln q ≥ 1, |Ω| ≥ ln n as in [23].⁵ We assume that the partition function is non-zero and that the Hamiltonian has at least one ground state (i.e., a state with zero energy). This implies that |Ω| ≥ Z(β) = Σ_{x∈Ω} e^{−βH(x)} ≥ 1 for all β ∈ [0, ∞). Often in this paper we assume that β_max ≤ q; this is not restrictive because after a cooling schedule reaches the inverse temperature q = ln |Ω|, any larger inverse temperature can be reached in a single step.⁶

Montanaro's algorithm [21] uses classical SVV to generate a cooling schedule, and a quantum routine to estimate the ratios given a cooling schedule; here we report only the sample complexity of the quantum routine, and for the total complexity one should add the complexity of classical SVV schedule generation. The stated complexity of our classical algorithm holds for ε < (ln |Ω| + ln n)^{−1}; the stated complexity of our quantum algorithm holds for ε < ((ln |Ω| + ln ln n)/(ln |Ω| + ln n))². For a more precise statement of the complexities we refer the reader to Theorems 2.13 and 3.6.

Product Estimator
We aim to estimate Q = Z(β_max)/Z(β_min) given access to samples from the Gibbs distribution at arbitrary inverse temperatures. A naive estimator of Q can be constructed as follows:
• sample x ∼ μ_{β_min} from the Gibbs distribution at β_min;
• output X = e^{(β_min − β_max)H(x)}.
Then E[X] = Z(β_max)/Z(β_min) = Q.
With a slight overload of terminology, we will call S[X] := E[X²]/E[X]² the relative variance.⁷ For the naive estimator it is equal to:

    S[X] = Z(β_min) · Z(2β_max − β_min) / Z(β_max)².    (3)

This is usually prohibitively large. To address this, Valleau and Card introduced the product estimator in Ref. [25] as a low-relative-variance estimator of Q. It uses a cooling schedule of inverse temperatures β_min = β_0 < β_1 < · · · < β_ℓ = β_max. For every step 0 ≤ i < ℓ, define

    X_i = e^{(β_i − β_{i+1})H(x_i)},    x_i ∼ μ_{β_i},

and let X be the product random variable defined as X = ∏_{i=0}^{ℓ−1} X_i. The random variables X_i, X_j are independent for i ≠ j, and it follows that E[X] = ∏_{i=0}^{ℓ−1} E[X_i] = Q. The relative variance of X_i for each i ∈ {0, . . . , ℓ − 1} is:

    S[X_i] = Z(β_i) · Z(2β_{i+1} − β_i) / Z(β_{i+1})².

The product estimator reduces the relative variance S[X] in (3). However, the dependence of its sample complexity on the schedule length is quadratic: to estimate Q with relative error ε, each factor must be estimated with relative error roughly ε/ℓ.
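The reduction in relative variance can be computed exactly for a toy Hamiltonian (Hamming weight on 6 bits, chosen only because Z(β) then has a closed form), using the formulas above:

```python
import math

# Toy model: H = Hamming weight on n = 6 bits, so Z(b) = (1 + e^{-b})^n.
n = 6
def Z(beta):
    return (1 + math.exp(-beta)) ** n

b_min, b_max = 0.0, 4.0
# Naive estimator: S[X] = Z(b_min) Z(2 b_max - b_min) / Z(b_max)^2.
S_naive = Z(b_min) * Z(2 * b_max - b_min) / Z(b_max) ** 2

# Product estimator over a uniform schedule: S[X] = prod_i S[X_i].
schedule = [0.0, 1.0, 2.0, 3.0, 4.0]
S_prod = 1.0
for b0, b1 in zip(schedule, schedule[1:]):
    S_prod *= Z(b0) * Z(2 * b1 - b0) / Z(b1) ** 2

assert S_naive > 1.0
assert S_prod < S_naive   # the schedule reduces the relative variance
```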

Paired-product Estimator
Our algorithm replaces the product estimator with the paired-product estimator, introduced by Huber in [14]. Given a cooling schedule β_0 < · · · < β_ℓ, define the midpoints β_{i,i+1} := (β_i + β_{i+1})/2 and the random variables

    V_i = e^{(β_i − β_{i,i+1})H(x_i)},    W_i = e^{(β_{i+1} − β_{i,i+1})H(x_{i+1})},

where x_i ∼ μ_{β_i}; observe that x_i is used for V_i and x_{i+1} for W_i.
The expectations of V_i and W_i are:

    E[V_i] = Z(β_{i,i+1}) / Z(β_i),    E[W_i] = Z(β_{i,i+1}) / Z(β_{i+1}),

so that Q can be expressed as:

    Q = ∏_{i=0}^{ℓ−1} E[V_i] / E[W_i].

The advantage of using the pairs V_i and W_i becomes evident from their relative variances:

    S[V_i] = S[W_i] = Z(β_i) · Z(β_{i+1}) / Z(β_{i,i+1})².
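These identities can be verified numerically on a toy Hamiltonian (Hamming weight on 5 bits, an illustrative choice small enough to enumerate):

```python
import math
import random

n_bits = 5
states = list(range(2 ** n_bits))
H = lambda x: bin(x).count("1")
Z = lambda b: sum(math.exp(-b * H(x)) for x in states)

random.seed(2)
b0, b1 = 0.5, 1.5
mid = (b0 + b1) / 2
m = 40_000

# Draw samples from the two Gibbs distributions (exact sampling is
# feasible here because Omega is tiny).
xs0 = random.choices(states, weights=[math.exp(-b0 * H(x)) for x in states], k=m)
xs1 = random.choices(states, weights=[math.exp(-b1 * H(x)) for x in states], k=m)

V = [math.exp((b0 - mid) * H(x)) for x in xs0]   # E[V_i] = Z(mid)/Z(b0)
W = [math.exp((b1 - mid) * H(x)) for x in xs1]   # E[W_i] = Z(mid)/Z(b1)
ratio = (sum(V) / m) / (sum(W) / m)              # estimates Z(b1)/Z(b0)
assert abs(ratio - Z(b1) / Z(b0)) < 0.02
```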

Dyer and Frieze's bound on product estimation
A key technical ingredient in our classical algorithm is a result of Dyer and Frieze [6] that bounds the number of samples needed to estimate the expectation of a product random variable within a given relative error. We use this to estimate the expectation of the product of the random variables V_i and W_i.
Theorem 2.1 (Dyer and Frieze [6], informal). Let X = ∏_{i=0}^{ℓ−1} X_i be a product of independent random variables with S[X_i] = O(1) for all i. Then, using O(ℓ/ε²) samples of each X_i (and thus O(ℓ²/ε²) samples in total), we can obtain X̂ that satisfies

    (1 − ε) · E[X] ≤ X̂ ≤ (1 + ε) · E[X]

with constant probability.

Cooling schedule
We give a short cooling schedule such that the resulting relative variances of the random variables V_i and W_i are bounded from above by a constant. This allows us to apply Theorem 2.1. To do this, we use a result due to Štefankovič et al. [23, Lemma 4.3], establishing the existence of a short cooling schedule that satisfies:

    f(β_i) + f(β_{i+1}) − 2 f((β_i + β_{i+1})/2) ≤ 2,    (10)

where f(β) = ln Z(β); note that the left-hand side equals ln S[V_i]. It is important that f(β) is a strictly decreasing convex function, which is the case if the Hamiltonian is non-negative.

Theorem 2.2 (Perfectly-balanced schedule length; analogous to [23]). There exists a sequence β_0 < · · · < β_ℓ with β_0 = β_min and β_ℓ = β_max satisfying the condition in (10) with equality and having length bounded from above by the quantity in Eq. (11).

The proof of Theorem 2.2 can be found in Appendix A.
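Although the general proof is deferred to Appendix A, a perfectly-balanced schedule is easy to construct numerically for a toy model in which f(β) = ln Z(β) has a closed form, by solving the equality case of (10) for β_{i+1} with bisection. All parameters below (the Hamiltonian, β_max, iteration counts) are illustrative:

```python
import math

# Toy model: H = Hamming weight on 20 bits, so f(b) = ln Z(b) = n ln(1 + e^{-b}).
n = 20
f = lambda b: n * math.log(1 + math.exp(-b))
g = lambda x, y: f(x) + f(y) - 2 * f((x + y) / 2)   # g = ln S[V_i]

def next_temp(b, b_max=50.0):
    if g(b, b_max) <= 2:                # we can jump directly to the end
        return b_max
    lo, hi = b, b_max
    for _ in range(80):                 # bisection: g(b, .) is increasing
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(b, mid) < 2 else (lo, mid)
    return lo

schedule = [0.0]
while schedule[-1] < 50.0:
    schedule.append(next_temp(schedule[-1]))

# Every step has relative variance at most e^2 (i.e., g <= 2).
assert all(g(a, b) <= 2 + 1e-6 for a, b in zip(schedule, schedule[1:]))
```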

Corollary 2.3 (Upper bound on perfectly-balanced schedule length). The length ℓ of a perfectly-balanced schedule, as in Theorem 2.2, satisfies ℓ ≤ √(q ln n).
Proof. By Theorem 2.2, the length of the schedule satisfies Eq. (11). We can further upper bound the quantity in Eq. (11) as follows: observe that f(·) = ln Z(·), so we have

    f(β_min) − f(β_max) ≤ ln |Ω| = q,

which holds because |Ω| ≥ Z(β) ≥ 1 for all β ≥ 0. Moreover, we have −f′(β_min) ≤ n, which follows from f′(β) = −E_{x∼μ_β}[H(x)] and the assumption that the Hamiltonian takes values in {0, . . . , n}. Combining these bounds with Eq. (11) yields the claim.

Theorem 2.2 establishes existence of a perfectly-balanced cooling schedule. Determining temperatures satisfying Eq. (10) with equality is challenging; we instead work with well-balanced cooling schedules, which satisfy:

    c_1 ≤ S[V_i] = Z(β_i) · Z(β_{i+1}) / Z(β_{i,i+1})² ≤ c_2    (12)

for suitably chosen constants c_1, c_2. The upper bound c_2 ensures that the relative variances S[V_i] and S[W_i] are not too large, while the lower bound c_1 ensures that the temperatures increase rapidly enough, so that the schedule is short. We will also call a cooling schedule that satisfies:

    S[V_i] ≤ c_2    (13)

a c_2-slowly-varying cooling schedule. We describe an iterative algorithm that, given a cooling schedule ending at β_i, finds β_{i+1} so that the condition in Eq. (12) is satisfied. To achieve this, we estimate the ratios Z(β_i)/Z(β_{i,i+1}) and Z(β_{i+1})/Z(β_{i,i+1}) and use their product as the estimate for the relative variances S[V_i] and S[W_i]. Obtaining highly accurate estimates of these ratios is difficult: indeed, if one could compute Z(β_i)/Z(β_{i+1}) with high accuracy, then one could follow a cooling schedule, estimate the ratios Z(β_i)/Z(β_{i+1}) for all i ∈ {0, . . . , ℓ}, multiply them, and solve the original problem of estimating Z(β_ℓ). Fortunately, to decide if the well-balanced condition holds, it suffices to have "crude" estimates obtained by taking fewer samples. To prove this, we use several results from Ref. [23]. The following corollary bounds the number of large temperature increases in any cooling schedule by comparing it to a perfectly-balanced cooling schedule.

Corollary 2.4 (Analogous to [23, Corollary 4.4]). Let β_min = γ_0 < · · · < γ_m = β_max be an arbitrary cooling schedule such that:

    f(γ_j) + f(γ_{j+1}) − 2 f((γ_j + γ_{j+1})/2) ≥ 2  for all j.    (14)

Then m is at most the length of the perfectly-balanced schedule of Theorem 2.2.
Proof. Theorem 2.2 upper-bounds the length of a perfectly-balanced cooling schedule, which has relative variances equal to e². We show that a cooling schedule γ_0 < · · · < γ_m with relative variances at least e² cannot be longer than the upper bound derived in Theorem 2.2. This requires showing that increasing the relative variance can only make the schedule shorter. Define:

    g(x, y) := f(x) + f(y) − 2 f((x + y)/2).

The function g(x, y) is decreasing in x for x < y, due to strict convexity of f and x < (x + y)/2. Similarly one can show that g(x, y) is increasing in y for x < y. We now show by induction that β_j ≤ γ_j for every j. This is trivial for j = 0. By assumption g(β_j, β_{j+1}) = 2 because the β schedule is perfectly-balanced, and g(γ_j, γ_{j+1}) ≥ 2 from Eq. (14). Furthermore, β_j ≤ γ_j by the induction hypothesis. Assume to the contrary that β_{j+1} > γ_{j+1}. In this case, the monotonicity of g (in x and y individually) implies that g(β_j, β_{j+1}) > g(γ_j, γ_{j+1}). But this is a contradiction, since g(β_j, β_{j+1}) = 2 and g(γ_j, γ_{j+1}) ≥ 2. This concludes the induction step β_{j+1} ≤ γ_{j+1}.
Theorem 2.2, Corollary 2.3 and Corollary 2.4 together upper-bound the length of any well-balanced cooling schedule with c_1 = e² by √(q ln n).
The quantity Pr X∼µ β [H(X) ∈ I] is also called the weight of I at inverse temperature β.
Lemma 2.7 shows that if we find some interval I = [b, c] that is both sufficiently narrow and h-heavy for two nearby temperatures β_1 and β_2, we can use that interval to estimate the ratio Z(β_2)/Z(β_1).

Lemma 2.7. Let I = [b, c] ⊆ {0, . . . , n}, δ ∈ (0, 1] and h ∈ (0, 1). Suppose that I is h-heavy for β_1, β_2 > 0 satisfying the closeness condition in Eq. (15). For k ∈ {1, 2}, let X_k ∼ μ_{β_k} and let Y_k be the indicator function for the event [H(X_k) ∈ I]. Let s = (8/h) · ln(1/δ) and let U_k be the average of s independent samples from Y_k. Let Ẑ be defined as in Eq. (16). Then, with probability at least 1 − 4δ, Ẑ approximates Z(β_2)/Z(β_1) up to a relative error of 4e.

Lemma 2.7 provides a way to compute Z(β_i)/Z(β_{i,i+1}) and Z(β_{i+1})/Z(β_{i,i+1}): find an interval which is h-heavy for both β_i and β_{i+1}, and use Eq. (16) to estimate the two ratios up to a relative error of 4e. This is possible thanks to Lemma 2.6, ensuring that the midpoint β_{i,i+1} between β_i and β_{i+1} satisfies the assumptions of Lemma 2.7. To generate a short cooling schedule, given some β_i we need to determine some β_{i+1} such that: (i) we have an interval I which is h-heavy for both β_i and β_{i+1}; and (ii) the well-balanced condition Eq. (12) holds. The following lemma is used to determine if a given interval is h-heavy at a given temperature.
Proof. Observe that there exists an interval in P \ F that is (kh)-heavy with kh = 1/|P|: if this were not the case, then every interval in the partition P would have weight < 1/|P| (since intervals in F also have weight < 1/|P|) and the overall weight would be < 1, a contradiction. Furthermore, k ≥ 4, so there is at least one 4h-heavy interval in P \ F. We now apply the union bound to the following event: every 4h-heavy interval receives at least 2hs samples, and every interval that is not h-heavy receives fewer than 2hs samples. This event occurs with probability at least 1 − |P| · δ, and conditioned on this event, the interval returned by the algorithm (i.e., the one that received the largest number of samples) is h-heavy.
We use the following special partition P in our schedule generation algorithm; it provides suitable candidate intervals I from which one can select an h-heavy interval, and its construction ends by returning the set of intervals P. The last ingredient is the binary search described in Algorithm 1.
Proof. See Algorithm 2.

Algorithm 1: Binary search.
input : Monotone predicate P,⁹ interval [a, b] such that P(a) = true, precision α.

Algorithm 2: Classical schedule generation procedure. The partition P is generated according to Lemma 2.10.
input : Initial temperature β_0, partition P, largest temperature β_max, probability δ.
output: Set of temperatures β_1, . . . , β_k = β_max.
Set h ← 1/(8|P|) throughout the algorithm and subroutine calls;
while β_k < β_max do
    Apply the procedure of Corollary 2.9 with β = β_k, h = 1/(8|P|), and F as the set of forbidden intervals; let its return value be the interval I = {b, . . . , c};
    Let L be the largest inverse temperature allowed by the closeness condition (15) for the interval I;
    Set L* ← BinarySearch(IsHeavy(I, ·), [β_k, L], 1/(2n));
    Set β* ← BinarySearch(well-balanced predicate at β_k, [β_k, L*], 1/(2n)); if β* = L* < L, add I to F;
    Set β_{k+1} ← β*, k ← k + 1;
end
return β_1, . . . , β_k, β_max

⁹ A monotone predicate P is a Boolean function defined on a totally ordered set with the following property: if P(x) = true, then P(y) = true for all y ≤ x in the domain, i.e., P is monotone decreasing.
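Algorithm 1 is standard bisection over a monotone decreasing predicate; the following is a minimal sketch (illustrative, not the paper's exact pseudocode):

```python
# Binary search over a monotone decreasing predicate pred on [a, b],
# assuming pred(a) is true. Returns a point x with pred(x) true that is
# within alpha of the largest point where pred holds.
def binary_search(pred, a, b, alpha):
    assert pred(a), "the predicate must hold at the left endpoint"
    if pred(b):
        return b                    # the predicate holds on all of [a, b]
    while b - a > alpha:
        mid = (a + b) / 2
        if pred(mid):
            a = mid                 # the threshold is to the right of mid
        else:
            b = mid                 # the threshold is to the left of mid
    return a

# Example: pred(x) = (x <= 2.7) is monotone decreasing on [0, 10].
x = binary_search(lambda t: t <= 2.7, 0.0, 10.0, 1e-6)
assert abs(x - 2.7) < 1e-6
```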
We prove correctness of this algorithm and analyze its complexity. Fix h := 1/(8|P|). By Lemma 2.10, we have |P| ≤ 4√(q ln n). Using Corollary 2.9 in the first line of the "while" loop, we can always find an interval I that is h-heavy for β_k and does not belong to the set of forbidden intervals F, provided that none of the forbidden intervals in F is 8h = (1/|P|)-heavy at β_k. This will be shown below. The interval I depends on the current inverse temperature β_k. For now, we neglect the failure probability of the algorithm for determining I in Corollary 2.9, as well as the failure probabilities of the two binary searches in the third and fourth lines of the "while" loop, and account for these at the end of the proof by a union bound argument. We will also show that the two predicates used inside the binary searches in the third and fourth lines are monotone and are satisfied at the left ends of their respective search intervals, as required for Algorithm 1.
To bound the total number of temperatures in the final schedule, we analyze three mutually exclusive cases that can arise in each iteration: (1) β * = L * = L, (2) β * = L * < L, (3) β * < L * . Observe that in cases (2) and (3) the binary searches return inverse temperatures that are not equal to the right endpoint of their respective search intervals. Therefore, if these inverse temperatures are increased by the precision α = 1/(2n), the predicates are no longer satisfied. This allows us to bound the number of steps in cases (2) and (3).

Case 1
In case (1), we move by setting L* = L and β* = L*. These moves are called "long moves" in [23] because the inverse temperature is increased by the maximum value allowed by the requirement (15) in Lemma 2.7. To bound the number of long moves, we use the following lemma.
Lemma 2.12. The number of "long moves" in Algorithm 2, where we set β* = L* = L, is at most 6√(q ln n).
We defer the proof of this lemma to Appendix B as it is rather long and technical.

Case 2
We move by setting L* < L and β* = L*, and show that when such a move takes place, the interval I is not 8h = (1/|P|)-heavy for any β ≥ L*. This ensures that we can correctly apply Corollary 2.9 in all subsequent iterations, which are executed with a set F of forbidden intervals that now includes I and only consider inverse temperatures β ≥ L*. The value L* is obtained by applying BinarySearch(IsHeavy(I, ·) = true, [β_k, L], 1/(2n)). Clearly, it holds that IsHeavy(I, β_k) = true. Thus, the predicate used in the first binary search of Algorithm 2 is satisfied at the left endpoint β_k of the search interval. This predicate is monotone, which is implied by Lemma 2.6. The fact that L* < L means that the value returned by binary search is not equal to the right end of the search interval. Therefore, by the properties of the binary search procedure, we have IsHeavy(I, L*) = true and IsHeavy(I, ρ) = false for some ρ with L* < ρ ≤ L* + 1/(2n). By Lemma 2.8, I is 4h-heavy for L* but is not 4h-heavy anymore for ρ.
In the chain of inequalities above, we used the following facts: for the first inequality, Z(β) ≥ Z(ρ) because Z(·) is non-increasing and β ≥ ρ − 1/(2n); for the second inequality we used H(x) ≤ n; for the third inequality, we used the fact that the weight of I at ρ is at most 4h (recall Definition 2.5).
It remains to prove that I is not 8h-heavy for any β > ρ. Assume to the contrary that such a β exists. This would automatically imply that I is 4h-heavy at β. Combined with Lemma 2.6, this would imply that I is 4h-heavy for all temperatures inside the interval [L*, β]. But this contradicts the fact that I is not 4h-heavy for ρ, since ρ ∈ [L*, β].
We can therefore apply Corollary 2.9 in subsequent iterations where I is forbidden. Because I is added to the set F of forbidden intervals, an iteration in which we set β* = L* with L* < L can take place at most once per interval I. Hence, the number of these iterations is at most |P| ≤ 4√(q ln n).

Case 3
In case (3), we move by setting β* < L*. Notice that, by Lemma 2.7, the predicate used in the second binary search of Algorithm 2 is satisfied at the left endpoint β_k of the search interval. Moreover, this predicate is monotone, which follows from the discussion in the proof of Corollary 2.4. Hence, binary search with precision 1/(2n) determines a value λ such that there exists ρ with λ < ρ ≤ λ + 1/(2n) satisfying Eq. (17); furthermore, the predicate is false at ρ. It remains to lower bound the LHS of Eq. (17). In order to bound the ratio, first observe that Z(λ) ≥ Z(ρ). Then, observe that for every β ≥ 0 and ε > 0, the following inequality holds:

    Z(β) ≤ e^{εn} · Z(β + ε),    (18)

which follows from H(x) ≤ n. It then follows that Z((β_k + λ)/2) ≤ Z((β_k + ρ)/2) · e^{1/4}, because (β_k + ρ)/2 − (β_k + λ)/2 ≤ 1/(4n). The LHS of Eq. (17) can then be lower bounded as in condition (19), where the first inequality uses Z((β_k + λ)/2) ≤ Z((β_k + ρ)/2) · e^{1/4} and the second inequality uses Eq. (18).
We now conclude the proof. We showed above that, in the iterations for which β* < L*, we select inverse temperatures that satisfy condition (19). The number of such inverse temperatures is at most √(q ln n) by Corollary 2.4. We have also shown that the number of iterations in which we set β* = L* = L is at most 6√(q ln n), using Lemma 2.12, and that the number of iterations in which we set β* = L* < L is at most 4√(q ln n). Therefore, the length of the computed schedule is at most 6√(q ln n) + 4√(q ln n) + √(q ln n) = 11√(q ln n).
In each step we perform binary search (twice) with precision 1/(2n) over a domain that is contained in [0, β_max]; the total number of binary search iterations per step is at most log(4nβ_max) ≤ 8(ln β_max + ln n). Each binary search iteration requires at most 2s samples, where s = (8/h) · ln(1/δ′) = 64|P| ln(1/δ′) is given by Lemma 2.7. To ensure that the algorithm is successful with probability at least 1 − δ, we apply a union bound over T = 88√(q ln n)(ln β_max + ln n) subroutine calls, and we choose δ′ = δ · T^{−1} for each subroutine.
We obtain a schedule of length 11√(q ln n) by taking at most 2sT ≤ 5·10⁴ · q ln² n · (ln q + ln n) · ln(1/δ′) samples from the Gibbs distribution μ_β. Substituting the value of δ′ and simplifying using the assumptions discussed at the beginning of Section 2, plus the assumption ln n ≥ 5 + ln(ln q + ln n) + ln ln n, the total number of samples is at most 5·10⁴ · q ln² n · (ln q + ln n)² · ln(1/δ).

Quantum algorithm

Cooling schedule, quantumly
Our quantum algorithm is much simpler than the classical algorithm. For the schedule generation, the relative variances S[V_i] and S[W_i] can be estimated with quantum amplitude estimation, thanks to a simple relationship between these variances and the overlap of the qsamples |μ_{β_i}⟩ and |μ_{β_{i+1}}⟩.
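The relationship in question follows directly from the definition of the qsamples: ⟨μ_β|μ_{β′}⟩ = Z((β + β′)/2)/√(Z(β) Z(β′)), whose square is exactly 1/S[V_i]. A quick numerical confirmation, using an illustrative closed-form Z:

```python
import math

# Toy model: H = Hamming weight on 8 bits, so Z(b) = (1 + e^{-b})^8.
n = 8
Z = lambda b: (1 + math.exp(-b)) ** n

b0, b1 = 0.3, 0.9
mid = (b0 + b1) / 2

# Overlap of the qsamples, computed classically:
# <mu_b0 | mu_b1> = sum_x sqrt(mu_b0(x) mu_b1(x)) = Z(mid)/sqrt(Z(b0) Z(b1)).
overlap = Z(mid) / math.sqrt(Z(b0) * Z(b1))

# Relative variance of V_i (and W_i) from Section 2.2.
S_V = Z(b0) * Z(b1) / Z(mid) ** 2

assert abs(overlap ** 2 - 1 / S_V) < 1e-12   # overlap^2 = 1/S[V_i]
```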
Let |φ⟩ and |ψ⟩ be two arbitrary quantum states. Define the projectors P_φ = |φ⟩⟨φ| and P_φ^⊥ = I − P_φ for the state |φ⟩, and similarly define P_ψ.

Fact 3.3. Let |φ⟩, |ψ⟩ be quantum states. Assume that the transition probability a = |⟨φ|ψ⟩|² is bounded from below by some constant. Then, starting with |φ⟩, we can prepare |ψ⟩ with probability 1 − η by performing at most 2k + 1 = O(ln(1/η)) alternating projective measurements {P_ψ, P_ψ^⊥} and {P_φ, P_φ^⊥}.

Proof. The probability that we fail to prepare |ψ⟩ after performing 2k + 1 measurements is given by

    (1 − a) · (a² + (1 − a)²)^k.

This result is easily established by observing that everything can be analyzed in the two-dimensional space spanned by P_ψ|φ⟩ and P_ψ^⊥|φ⟩. Consider the two bases {|φ⟩, |φ^⊥⟩} and {|ψ⟩, |ψ^⊥⟩} for this subspace, where ⟨φ|φ^⊥⟩ = 0 and ⟨ψ|ψ^⊥⟩ = 0. The factor (1 − a) corresponds to the transition |φ⟩ → |ψ^⊥⟩, i.e., the event that the first measurement fails to prepare |ψ⟩. The term a² corresponds to the transition |ψ^⊥⟩ → |φ^⊥⟩ → |ψ^⊥⟩, and the term (1 − a)² corresponds to the transition |ψ^⊥⟩ → |φ⟩ → |ψ^⊥⟩, i.e., the event that the P_φ measurement followed by the P_ψ measurement fails to prepare |ψ⟩. It is clear that we can make the failure probability smaller than η by choosing k = O(ln(1/η)), provided that a is bounded from below by a constant.
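The failure probability derived in the proof can be sanity-checked by simulating the measurement sequence in the two-dimensional subspace, using only the transition probabilities above (a classical Monte Carlo sketch, not a quantum implementation):

```python
import random

# Simulate the repeated-measurement "jump" from |phi> to |psi>.
# Returns True iff |psi> is never prepared after 2k + 1 measurements.
def jump_fails(a, k):
    # First P_psi measurement on |phi>: succeed with probability a.
    if random.random() < a:
        return False
    for _ in range(k):                     # k rounds of P_phi then P_psi
        in_phi = random.random() < 1 - a   # |psi_perp> -> |phi> w.p. 1 - a
        stay = 1 - a if in_phi else a      # prob. of failing P_psi again
        if random.random() >= stay:
            return False                   # |psi> prepared
    return True

random.seed(3)
a, k, trials = 0.4, 5, 200_000
emp = sum(jump_fails(a, k) for _ in range(trials)) / trials
exact = (1 - a) * (a ** 2 + (1 - a) ** 2) ** k
assert abs(emp - exact) < 0.005
```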
We now describe a quantum algorithm to obtain a schedule that matches Theorem 2.2. The difference between the classical and the quantum setting is mainly due to the classical estimator in Eq. (8), which is inaccurate compared to the quantum estimator. The quantum algorithm is more efficient than the classical algorithm, and improves the schedule length quadratically in ln n.

Algorithm 3: Quantum schedule generation procedure.
input : Initial temperature β_0, largest temperature β_max, probability δ, constant c_2.
output: Set of inverse temperatures β_0, . . . , β_k = β_max.
Set k ← 0;
while β_k < β_max do
    Define the function f_o(β) := |⟨μ_{β_k}|μ_β⟩|², where the overlap is evaluated using Corollary 3.2 with additive error 0.005;
    Use binary search (Algorithm 1) with the monotone predicate f_o(β) ≥ 1/c_2, over the interval [β_k, q] with precision 1/(2n), to determine β_{k+1}; set k ← k + 1;
end
return β_0, . . . , β_k, β_max
Proof. We first bound the number of iterations of the algorithm. Assume for now that all subroutines are successful; we apply a union bound at the end. Notice that |⟨μ_{β_k}|μ_{β_k}⟩|² = 1, so the predicate used in the binary search is satisfied at the left endpoint of the interval [β_k, q].
If the binary search (as given in Alg. 1) returns $q$, we are done. Otherwise, it determines $\lambda, \rho$ satisfying the stopping conditions of the search, which show that the schedule is 15-slowly-varying. It can be shown that $Z(\lambda) \ge Z(\rho)$, and that $Z\big(\tfrac{\beta_k+\lambda}{2}\big) \le Z\big(\tfrac{\beta_k+\rho}{2}\big)\,e^{1/4}$. Using these facts, we see that the cooling schedule obtained by setting $\beta_{k+1} = \lambda$ in each iteration satisfies the condition in Corollary 2.4, which coincides with the bound in (22). This implies that the length of the resulting cooling schedule is bounded from above by the length of a perfectly balanced cooling schedule, which in turn is bounded from above by $\sqrt{q\ln n}$, as established in Corollary 2.3. In each step we perform binary search with precision $1/(2n)$ over a domain contained in $[0, \beta_{\max}]$, which implies that the number of binary search iterations per step is at most $\log(2n\beta_{\max}) \le 2(\ln\beta_{\max} + \ln n)$. The total number of binary search iterations over all steps is therefore at most $2\sqrt{q\ln n}\,(\ln\beta_{\max} + \ln n)$. Each binary search iteration invokes the amplitude estimation subroutine of Corollary 3.2 with additive error $\varepsilon$ set to the constant 0.005. To ensure that the entire algorithm succeeds with probability at least $1-\delta$, we choose the maximum failure probability $\eta = \delta/\big(4\sqrt{q\ln n}\,(\ln\beta_{\max} + \ln n)\big)$ for the amplitude estimation subroutine. By the union bound, the probability that at least one amplitude estimation subroutine fails is $\le \delta/2$. Each amplitude estimation requires $O(\ln(1/\eta))$ reflections since $\varepsilon$ is a constant. Hence, the total number of reflections for all binary searches is $O\big(\sqrt{q\ln n}\,(\ln\beta_{\max} + \ln n)\ln(1/\eta)\big)$, which, in terms of $\delta$ and using the assumption $\beta_{\max} \le q$ (see the proof of Theorem 2.11), is $O\big(\sqrt{q\ln n}\,(\ln q + \ln n)(\ln q + \ln\ln n + \ln(1/\delta))\big)$.
It remains to bound the time it takes to iteratively prepare the states $|\mu_{\beta_0}\rangle, \ldots, |\mu_{\beta_k}\rangle$ in the schedule. The overlap between all adjacent qsamples is large since $|\langle\mu_{\beta_k}|\mu_{\beta_{k+1}}\rangle|^2 \ge \tfrac{1}{15}$. Therefore, we can use the method described in Fact 3.3 to "jump" from $|\mu_{\beta_k}\rangle$ to $|\mu_{\beta_{k+1}}\rangle$. The necessary projective measurements can be implemented with the reflections around the Gibbs qsamples. Since there are at most $\sqrt{q\ln n}$ stages, we want the failure probability of any jump to be smaller than $(\delta/2)/\sqrt{q\ln n}$ to ensure that the probability that any jump fails is $\le \delta/2$. The number of reflections around the Gibbs qsamples is $O\big(\sqrt{q\ln n}\,(\ln q + \ln n)(\ln q + \ln\ln n + \ln(1/\delta))\big)$.
Before we prove the main theorem in this section, we need the following result about the sample complexity of computing functions of random variables, given access to coherent qsamples of distributions. The proof of this theorem is deferred to Appendix C; the algorithm it describes invokes the controlled reflection $R_D$ at most $O\big(\sqrt{B}/\varepsilon \cdot \ln(B/\varepsilon)^{1.5} \cdot \ln(\ln(B/\varepsilon)/\eta)\big)$ many times.
The theorem above allows us to estimate the expectations $\mathbb{E}[W_k]$ and $\mathbb{E}[V_k]$ by setting $D$ to be the Gibbs distribution $\mu_{\beta_k}$ and $f$ to be $\exp(-\beta_{k,k+1} H(x_k))$ for the random variable $V_k$, or $\exp(\beta_{k,k+1} H(x_{k+1}))$ for the random variable $W_k$, as defined in Section 2.2. We can use $B \le 15$ because the cooling schedule $\beta_0, \ldots, \beta_\ell$ generated by the algorithm in Theorem 3.4 is 15-slowly-varying. The error parameter $\varepsilon$ and the failure probability $\eta$ will have to be of the order of $1/\ell$.
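The exact definitions of $V_k$ and $W_k$ appear in Section 2.2 and are not reproduced here; the following classical sketch assumes the paired-product form of Huber's estimator with half-step $\beta_{k,k+1} = (\beta_{k+1}-\beta_k)/2$, for which $\mathbb{E}[V_k]/\mathbb{E}[W_k] = Z(\beta_{k+1})/Z(\beta_k)$ and the per-stage ratios telescope to $Q$. Everything below (the toy Hamiltonian, the schedule, the sample count) is illustrative.

```python
import math, random

# Toy Hamiltonian: H(x) = Hamming weight of a 6-bit string x.
n = 6
Omega = list(range(2 ** n))
H = [bin(x).count("1") for x in Omega]

def Z(beta):
    return sum(math.exp(-beta * h) for h in H)

def gibbs_samples(beta, m):
    # exact sampling from the toy Gibbs distribution (stand-in for MCMC)
    w = [math.exp(-beta * h) for h in H]
    return random.choices(Omega, weights=w, k=m)

random.seed(7)
betas = [0.0, 0.4, 0.8, 1.2]  # a toy slowly varying schedule
m = 20000                     # classical samples per stage
est = 1.0
for bk, bk1 in zip(betas, betas[1:]):
    d = (bk1 - bk) / 2  # the half-step beta_{k,k+1}
    V = sum(math.exp(-d * H[x]) for x in gibbs_samples(bk, m)) / m
    W = sum(math.exp(+d * H[x]) for x in gibbs_samples(bk1, m)) / m
    est *= V / W  # E[V_k] / E[W_k] = Z(beta_{k+1}) / Z(beta_k)

Q = Z(betas[-1]) / Z(betas[0])
assert abs(est / Q - 1) < 0.05
```

Because the schedule is slowly varying, each $V_k$ and $W_k$ has small relative variance, which is what keeps the bound $B \le 15$ and the sample count per stage modest.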

Quantum algorithm
Theorem 3.6. Let $n \ge 1$, $\delta, \varepsilon, \eta \in (0,1)$ and $0 \le \beta_{\min} < \beta_{\max}$. Let $H : \Omega \to [0, n]$ be a classical Hamiltonian. Let $Q = Z(\beta_{\max})/Z(\beta_{\min})$ and $q = \ln Q$. There exists a quantum algorithm that uses $\tilde{O}(q \cdot \varepsilon^{-1})$ applications of reflection operators around Gibbs states (at different inverse temperatures) and, with probability at least $4/5$, approximates $Q$ up to relative error $\varepsilon$.

Proof. The proof is very similar to the classical proof of Theorem 2.13. The main differences are that both the cooling schedule and the mean values are estimated using quantum techniques. Overall, the structure of the quantum algorithm is: 1. Compute a slowly varying cooling schedule of length $\ell$ as described in Theorem 3.4. We argue the correctness of the algorithm first.
Step (1) is clear. Theorem 3.4 guarantees the following: with probability $\ge 9/10$, the algorithm generates a sequence of $\ell = \sqrt{q\ln n}$ inverse temperatures $\beta_1, \ldots, \beta_\ell$ satisfying the slowly-varying condition. One can use Theorem 3.5, which uses $O(\ln\ell)$ samples of each $|\mu_{\beta_i}\rangle$ and invokes the reflection $R_{\beta_i}$ $\tilde{O}(\ell/\varepsilon)$ many times, and with probability $\ge 1 - 1/(20\ell)$ produces estimates $\widehat{W}_i, \widehat{V}_i$. We use ratios of the lower and upper bounds to bound the ratio $\widehat{W}_i/\widehat{V}_i$ from below and above, and employ the union bound to obtain a bound on the success probability. We obtain that the resulting bounds hold with probability at least $1 - 1/(10\ell)$ for all $i$. This in turn implies that the overall guarantee holds with probability at least $1 - 1/10 - 1/10 \ge 4/5$ (the first $1/10$ comes from the union bound over the $1 - 1/(10\ell)$ events in Eq. (23), and the second $1/10$ because Theorem 3.4 was assumed to fail with probability $\le 1/10$).
is the desired ratio. We obtain the claimed bound, where the first inequality used $(1+x)^t \ge 1 + xt$ (for $x \ge -1$ and $t \ge 2$), and the second and third inequalities used Eq. (24). It remains to analyze the complexity of this quantum algorithm. First, Theorem 3.4 uses $O(\ell \cdot (\ln q + \ln n) \cdot \ln\ell)$ invocations of a reflection around a Gibbs state (note that we fix $\delta = O(1)$ when invoking this theorem). Next, Theorem 3.5 uses $O(\ell\ln\ell)$ copies of the Gibbs states and invokes the reflection operator around Gibbs states (at different inverse temperatures) $\tilde{O}(\ell^2/\varepsilon)$ many times (we specify the poly-logarithmic factors below). We can use Fact 3.3 and arguments similar to those at the end of the proof of Theorem 3.4 to analyze the complexity of preparing these copies. Observe that we cannot directly prepare a qsample $|\mu_{\beta_k}\rangle$ for an arbitrary $k$, but have to move successively through the sequence $|\mu_{\beta_0}\rangle, \ldots, |\mu_{\beta_k}\rangle$. In total, we need to make at most $O(\ell^2\ln\ell)$ transitions between Gibbs qsamples of adjacent stages by performing reflections around these qsamples. To use the union bound, we need the failure probability of any transition to be of the order $1/(\ell^2\ln\ell)$, which can be accomplished with at most $O(\ln\ell)$ reflections per transition. Therefore, $O(\ell^2 \cdot (\ln\ell)^2)$ reflections are necessary in total to prepare all the copies. Overall, using $\ell = \sqrt{q\ln n}$, our algorithm uses $O(\ell^2 \cdot (\ln\ell)^2) = O(q\ln n \cdot (\ln q + \ln n)^2)$ walk steps to prepare these Gibbs states and $O\big(q/\varepsilon \cdot \ln n \cdot \ln(q/\varepsilon \cdot \ln n)^2\big)$ invocations of the reflection around Gibbs states. The total number of quantum walk steps is $O\big(q/\varepsilon \cdot \ln n \cdot \ln(q/\varepsilon \cdot \ln n)^2 + q\ln n \cdot (\ln q + \ln n)^2\big)$.
Finally, in order to translate the complexity stated in Theorem 3.6 into a running time bound, observe that each invocation of the reflection operator in the theorem involves $O(1/\sqrt{\Delta})$ quantum walk steps on top of the sample complexity above; this is a well-known result from [24,20]. Corollary 3.7. In the setting of Theorem 3.6, there exists a quantum algorithm that makes $\tilde{O}(q \cdot \varepsilon^{-1} \cdot \Delta^{-1/2})$ steps of the quantum walk operator (where the spectral gap of each underlying Markov chain is bounded from below by $\Delta$) and, with probability at least $4/5$, approximates $Q$ up to relative error $\varepsilon$.

Conclusion
First, we have improved a well-known classical algorithm for estimating partition functions. Second, we have shown how to quantize this improved algorithm, thereby obtaining a quadratic speed-up with respect to the estimation precision and to the spectral gaps of the underlying Markov chains. In particular, we have obtained the best quantum algorithm for estimating partition functions, improving upon the state-of-the-art algorithm due to Harrow and Wei [12].

Open questions.
We conclude with a few concrete open questions. (1) Can we fully quantize the classical algorithms of [14] and Kolmogorov [18], in the process removing the large prefactors in our algorithms? (2) Does there exist a classical algorithm for computing partition functions with schedule length $O(\sqrt{\ln|\Omega|\ln n})$, matching the conjecture of [23]? (3) Can we prove any lower bound, quantum or classical, for computing partition functions in the standard model of computation? (4) Can one remove the assumption that SVV or TPA applies only to non-negative Hamiltonians?

B Bounding number of iterations in classical schedule generation
Here we prove Lemma 2.12 (restated below for convenience), which bounds the number of "long moves" in case 1 of the proof of Theorem 2.11.
Lemma B.1. The number of "long moves" in Algorithm 2, where we set β * = L * = L, is at most 6 √ q ln n.
Proof. We follow the same scheme as the proof of [23,Lemma 5.14], with a few modifications to account for the differences between our Algorithm 2 and the schedule generation algorithm in [23]. Because of these modifications, we are able to obtain a tighter upper bound.
Recall that $I \subseteq \{0,\ldots,n\}$ and $I$ is a contiguous interval, so the set of possible interval widths is a subset of $\{0,\ldots,n\}$. We first consider long moves that happen with interval width at least 1; as discussed at the end of this proof, there can be at most one long move with interval width 0, so a bound on the number of long moves with interval width in $\{1,\ldots,n\}$ immediately implies a bound on the total number of long moves. Let $x_k$ be the number of times that we perform a long move with an interval of width $k$, for $k = 1,\ldots,n$. Let $k_{\max} := \max\{k : x_k > 0\}$, i.e., the maximum interval width for which we perform at least one long move.
Let us consider all the long moves performed with an interval width belonging to the set $\{k_{\min},\ldots,k_{\max}\}$, where we arbitrarily fix a choice $k_{\min} \in \{1,\ldots,k_{\max}\}$. We want to derive a lower bound on the inverse temperature $\beta$ reached after $x_{k_{\min}} + x_{k_{\min}+1} + \cdots + x_{k_{\max}} = t+1$ such long moves. Since $\beta$ only increases in the course of the algorithm, a lower bound after $t$ moves is clearly also valid after $t+1$ moves. In particular, since we want to obtain the tightest possible lower bound, we consider $\beta$ after performing $y_k$ moves with interval width $k$, where $y_k$ is defined for $k \in \{k_{\min},\ldots,k_{\max}\}$ so as to undercount the number of moves at interval width $k_{\max}$ by exactly 1. Since for interval width $k$ the inverse temperature $\beta$ increases by exactly $1/k$ during a long move, after $t$ long moves with move counts $y_{k_{\min}},\ldots,y_{k_{\max}}$ we obtain the lower bound (33). We claim that in addition, $\beta$ must satisfy (34). We now prove this claim. First, notice that by construction of $P$, since we choose the width $w$ of an interval $I = \{b,\ldots,b+w=c\}$ to be $w = b/\sqrt{q}$, we must have (35). Next, notice that in order for a long move to happen starting from inverse temperature $\beta$ with an interval $I$ of width $w$, there must exist some inverse temperature $\beta' > \beta$ at which $I$ is $h$-heavy (more specifically, $I$ must be $h$-heavy at $\beta' = \beta + 1/w$, otherwise the long move cannot take place). The weight of $I$ at $\beta'$ is $\tfrac{1}{Z(\beta')}\sum_{x:H(x)\in I} e^{-\beta' H(x)}$. Therefore we must have the stated chain of inequalities, where for the second inequality we used $\beta' > \beta$, and for the last inequality we used $Z(\beta') \ge 1$, which is true by assumption. We then obtain a further chain of inequalities using the facts that the width of $I$ is between $k_{\min}$ and $k_{\max}$, and Eq. (35) (which together imply $H(x) \ge k_{\min}\sqrt{q}$ for the states considered in the summation). Taking the logarithm on both sides yields a bound which immediately implies (34).
Combining (33) and (34), we obtain (36). Now notice that $\sum_{k_{\min}=1}^{k_{\max}} \sum_{k=k_{\min}}^{k_{\max}} \frac{y_k}{k} = \sum_{k=1}^{k_{\max}} y_k$, because each term $\frac{y_k}{k}$ appears exactly $k$ times in the double summation. Therefore, taking (36) and summing over $k_{\min} = 1,\ldots,k_{\max}$ on both sides, we obtain (37), where the last inequality exploits the fact that $k_{\max} \le n$ and the well-known inequality $\sum_{i=1}^n 1/i \le 1 + \ln n$. Using the choice $h = \frac{1}{8|P|}$ in Algorithm 2 and Lemma 2.10, we can write $\ln\frac{1}{h} = \ln(32\sqrt{q}\ln n) \le 2\ln q\ln\ln n$. Finally, using (37) and $1 + \ln n \le 2\ln n$, we obtain: $\sum_{k=1}^{k_{\max}} y_k \le (2\ln n)\,\frac{q + 2\ln q\ln\ln n}{\sqrt{q}} \le 4\sqrt{q}\ln n$.
The LHS of the last inequality is, by definition, equal to the number of long moves performed with interval width in $\{1,\ldots,n\}$, minus one. Recall that in Algorithm 2 we can also have long moves with intervals of width 0, but there can be at most one such move, because then we set $L = q \ge \beta_{\max}$. Hence, the total number of long moves in the algorithm is at most $4\sqrt{q}\ln n + 2 \le 6\sqrt{q}\ln n$.
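The two purely combinatorial facts used in the final step, the double-summation identity and the harmonic-sum bound, are easy to verify numerically; the snippet below (an illustrative check, with arbitrary random move counts $y_k$) does so.

```python
import math, random

random.seed(1)
kmax = 17
y = [random.randint(0, 5) for _ in range(kmax + 1)]  # y[0] unused; y[k] for k >= 1

# Each term y_k / k appears exactly k times in the double summation,
# once for every choice of k_min in {1, ..., k}.
double_sum = sum(sum(y[k] / k for k in range(kmin, kmax + 1))
                 for kmin in range(1, kmax + 1))
assert math.isclose(double_sum, sum(y[1:]))

# The harmonic bound used in the last inequality: sum_{i=1}^n 1/i <= 1 + ln n.
for n in (1, 10, 1000):
    assert sum(1 / i for i in range(1, n + 1)) <= 1 + math.log(n)
```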

C Proof of non-destructive amplitude estimation
We first introduce some definitions and notation which we use throughout.
Let $|\psi_D\rangle$ denote the coherent encoding of the distribution $D$, i.e., $|\psi_D\rangle := \sum_{x\in\Omega}\sqrt{D(x)}\,|x\rangle$, and let $R_D$ denote the reflection around $|\psi_D\rangle$, i.e., $R_D := 2|\psi_D\rangle\langle\psi_D| - I$. The results in this section are stated as general subroutines. Note that when we apply these techniques to the task of estimating partition functions, the probability distribution $D$ corresponds to the Gibbs distribution $\mu_{\beta_i}$ at $\beta_i$, and the function $f$ corresponds to the function $\exp(-\beta H(\cdot))$. The goal of this section is to present a quantum algorithm for estimating $\mu$ with relative error $\varepsilon$, given access to copies of the coherent encoding $|\psi_D\rangle$ and the reflection operator $R_D$. We assume that the relative variance $\mathbb{E}_D[f^2]/\mu^2$ is bounded from above by $B > 1$. Naturally, the goal is to estimate $\mu$ using as few copies of $|\psi_D\rangle$ and invocations of the reflection operator $R_D$ as possible. Naively, a quantum algorithm could simply measure multiple copies of $|\psi_D\rangle$ to obtain $O(B/\varepsilon^2)$ samples $x$ according to $D$, and estimate $\mu$ classically via Chebyshev's inequality. However, quantumly one can obtain a quadratic improvement in $1/\varepsilon$, which we prove in this section. The theorem above builds upon and improves the results due to [21, Algorithm 4 and Theorem 6]. Most importantly, the complexity of our algorithm grows only with $\sqrt{B}/\varepsilon$, whereas that of the algorithm in [21] grows with $B/\varepsilon$. Note that [11] already proved that a scaling with $\sqrt{B}/\varepsilon$ is possible and referred to this result as the quantum Chebyshev inequality. Our approach provides a different algorithm, with a simpler proof, that achieves essentially the same running time. We also note that [12] used the algorithm in [21] as a subroutine for estimating partition functions.
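The classical baseline mentioned above can be made concrete: with $m = O(B/\varepsilon^2)$ samples, Chebyshev's inequality gives relative error $\varepsilon$ with constant probability, and taking a median over $O(\ln(1/\eta))$ independent batches boosts the confidence to $1-\eta$. The following is a minimal sketch with an arbitrary toy distribution; the constants and the helper name are illustrative, not from the paper.

```python
import math, random, statistics

random.seed(11)

def classical_mean_estimate(draw, B, eps, eta):
    # Chebyshev: m = O(B / eps^2) i.i.d. samples give relative error eps
    # with constant probability; a median over O(ln(1/eta)) batches
    # boosts the success probability to 1 - eta.
    m = math.ceil(4 * B / eps ** 2)
    batches = 2 * math.ceil(math.log(1 / eta)) + 1
    means = [sum(draw() for _ in range(m)) / m for _ in range(batches)]
    return statistics.median(means)

# Toy nonnegative variable f(x), x ~ D, with relative variance E[f^2]/mu^2 <= B.
draw = lambda: random.choice([0.5, 1.0, 2.0])
mu = (0.5 + 1.0 + 2.0) / 3
B = ((0.25 + 1.0 + 4.0) / 3) / mu ** 2

est = classical_mean_estimate(draw, B, eps=0.05, eta=0.01)
assert abs(est / mu - 1) <= 0.05
```

The quantum algorithm of this section replaces the $O(B/\varepsilon^2)$ sample cost with $\tilde{O}(\sqrt{B}/\varepsilon)$ reflections.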
In order to prove this theorem, we will need a few results which we state first. We begin with non-destructive amplitude estimation, a variant of amplitude estimation in which the initial resource state is not destroyed by the quantum algorithm. Theorem C.3 (Non-destructive amplitude estimation [12, Theorem 6]). Let $|\psi\rangle$ be an arbitrary quantum state and $P$ an arbitrary projector. Let $R_\psi = 2|\psi\rangle\langle\psi| - I$. For every $t > 0$, there is a quantum algorithm $\mathcal{A}$ that starts in the initial state $|\psi\rangle$ and with probability at least $1-\eta$ outputs an estimate $\hat{p}$ of $p = \langle\psi|P|\psi\rangle$. Additionally, $\mathcal{A}$ restores $|\psi\rangle$ with probability $1-\eta$. Overall, $\mathcal{A}$ invokes the controlled reflection $R_\psi$ $O(t\cdot\ln(1/\eta))$ many times.
An immediate corollary of this theorem is the following. For every $t > 0$, there is a quantum algorithm $\mathcal{A}$ that, given access to a single copy of $|\psi_D\rangle$ and the controlled reflection $R_D$, with probability $\ge 1-\eta$ outputs an estimate $\hat{\mu}$ of $\mu$. Additionally, $\mathcal{A}$ restores the state $|\psi_D\rangle$ with probability $1-\eta$. Overall, $\mathcal{A}$ invokes the controlled reflection $R_D$ $O(t\cdot\ln(1/\eta))$ many times.
The algorithm in the corollary above is straightforward: first transform $|\psi_D\rangle$ into $|\psi_{D,f}\rangle = \sum_{x\in\Omega}\sqrt{D(x)}\,|x\rangle\big(\sqrt{1-f(x)}\,|0\rangle + \sqrt{f(x)}\,|1\rangle\big)$ by performing a controlled rotation on an additional qubit. For the projector $P = I \otimes |1\rangle\langle 1|$, observe that $\langle\psi_{D,f}|P|\psi_{D,f}\rangle = \sum_{x\in\Omega} D(x)f(x) = \mu$. So the algorithm can simply estimate $\mu$ using the non-destructive amplitude estimation of Theorem C.3. The reflection around $|\psi_{D,f}\rangle$ can be realized with the help of the reflection $R_D$ and the controlled qubit rotation.
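The identity $\langle\psi_{D,f}|(I\otimes|1\rangle\langle 1|)|\psi_{D,f}\rangle = \mu$ is easy to check numerically for a small toy distribution; the snippet below (illustrative only, assuming $f(x)\in[0,1]$ as the controlled rotation requires) builds the amplitudes explicitly.

```python
import math

# A toy distribution D and a bounded function f on a 4-element domain.
D = [0.1, 0.2, 0.3, 0.4]
f = [0.0, 0.25, 0.5, 1.0]  # the controlled rotation needs f(x) in [0, 1]

# Amplitudes of |psi_{D,f}> = sum_x sqrt(D(x)) |x> (sqrt(1-f(x))|0> + sqrt(f(x))|1>)
amp = {}
for x, (d, fx) in enumerate(zip(D, f)):
    amp[(x, 0)] = math.sqrt(d) * math.sqrt(1 - fx)
    amp[(x, 1)] = math.sqrt(d) * math.sqrt(fx)

# The state is normalized ...
assert math.isclose(sum(a * a for a in amp.values()), 1.0)

# ... and <psi|P|psi> with P = I (x) |1><1| recovers mu = sum_x D(x) f(x).
p = sum(a * a for (x, b), a in amp.items() if b == 1)
mu = sum(d * fx for d, fx in zip(D, f))
assert math.isclose(p, mu)
```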
The lemma below provides an important subroutine required for proving Theorem C.2; it makes use of the non-destructive amplitude estimation routine. Our proof closely follows [21, Algorithm 2 and Lemma 4], but we choose a different way of partitioning $\Omega$, which allows us to reduce the complexity of the algorithm. Additionally, $\mathcal{A}$ restores the initial state $|\psi_D\rangle$ with probability $\ge 1-\eta$ and invokes the controlled reflection $R_D$ $O\big(\sqrt{B}/\varepsilon\cdot\ln(B/\varepsilon)^{1.5}\cdot\ln(\ln(B/\varepsilon)/\eta)\big)$ many times.
Proof. Let $k$ be a parameter to be determined later and consider the sets $\Omega_\ell$, for $\ell \in \{0,\ldots,k\}$, which partition $\Omega$ according to the dyadic magnitude of $f$, together with the top level $\Omega_{k+1}$, on which $f(x) \ge 2^k$. We write the expectation $\mu$ as a sum over these sets, as in Eq. (39). The second term in (39) can be bounded from above by $\sum_{x\in\Omega_{k+1}} D(x)f(x) \le 2^{-k}\sum_{x\in\Omega_{k+1}} D(x)f(x)^2 \le 2^{-k}\,\mathbb{E}_D[f^2]$, where we used $2^k \le f(x)$ for $x \in \Omega_{k+1}$. We can "ignore" this term provided that $k$ is sufficiently large, so that its contribution to $\mu$ becomes negligible.
Let us now focus on the double sum on the left-hand side of Eq. (39), which can be understood as a weighted sum of $k+1$ means. For each $\ell \in \{0,\ldots,k\}$, we can estimate the mean $\mu_\ell = \sum_{x\in\Omega_\ell} D(x)f(x)\cdot 2^{-\ell}$ with the help of Corollary C.4. To this end, we define the modified functions $f_\ell$ by setting $f_\ell(x) = f(x)\cdot 2^{-\ell}$ if $x\in\Omega_\ell$ and $0$ otherwise. Let $\hat{\mu}_\ell$ be the estimates returned by amplitude estimation when applied to the probability distribution $D$ and the functions $f_\ell$. Our final estimate $\hat{\mu}$ of $\mu$ will then be $\hat{\mu} = \sum_{\ell=0}^{k} 2^\ell\cdot\hat{\mu}_\ell$. We now analyze the resulting estimation error. Let $t > 0$ be a parameter determined later and let $\eta' = \eta/(k+1)$. We use Corollary C.4 (with parameters $t, \eta'$) to obtain estimates $\hat{\mu}_\ell$. Note that the algorithm in Corollary C.4 succeeds with probability $1-\eta'$, but since we invoke this corollary $k+1$ times, once for each $\ell\in\{0,\ldots,k\}$, by the union bound the success
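The level decomposition and the tail bound can be checked numerically. The snippet below assumes the dyadic thresholds $\Omega_0 = \{x : f(x) < 1\}$ and $\Omega_\ell = \{x : 2^{\ell-1} \le f(x) < 2^\ell\}$ (one plausible choice, consistent with the rescaling by $2^{-\ell}$, but an assumption on our part), and verifies that the rescaled level means recombine to $\mu$ while the discarded top level contributes at most $\mathbb{E}[f^2]/2^k$.

```python
import math, random

random.seed(3)
# Toy distribution D and a nonnegative function f with some mass above 2^k.
N = 200
w = [random.random() for _ in range(N)]
s = sum(w)
D = [wi / s for wi in w]
f = [random.expovariate(0.7) for _ in range(N)]

mu = sum(d * fx for d, fx in zip(D, f))
E2 = sum(d * fx * fx for d, fx in zip(D, f))
k = 2  # few levels, so the top level Omega_{k+1} is typically nonempty

def level(fx):
    # Omega_0: f < 1; Omega_ell: 2^(ell-1) <= f < 2^ell; Omega_{k+1}: f >= 2^k
    return 0 if fx < 1 else min(k + 1, 1 + int(math.log2(fx)))

# mu_ell = sum_{x in Omega_ell} D(x) f(x) 2^(-ell); rescaled values lie in [0, 1)
mus = [0.0] * (k + 2)
for d, fx in zip(D, f):
    ell = level(fx)
    mus[ell] += d * fx * 2 ** (-ell)

head = sum(2 ** ell * mus[ell] for ell in range(k + 1))
tail = 2 ** (k + 1) * mus[k + 1]
assert math.isclose(head + tail, mu)
# Discarding the top level costs little: f(x) >= 2^k on Omega_{k+1},
# hence the tail is at most E[f^2] / 2^k.
assert tail <= E2 / 2 ** k + 1e-12
```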