A (simple) classical algorithm for estimating Betti numbers

We describe a simple algorithm for estimating the k -th normalized Betti number of a simplicial complex over n elements using the path integral Monte Carlo method. For a general simplicial complex, the running time of our algo-rithm is n O (cid:16) 1 √ γ log 1 ε (cid:17) with γ measuring the spectral gap of the combinatorial Laplacian and ε ∈ (0 , 1) the additive precision. In the case of a clique complex, the running time of our algorithm improves to ( n/λ max ) O


Introduction
An abstract simplicial complex K is a family of sets or faces that is downward closed under the subset relation.A canonical example of a simplicial complex is the clique complex of a graph, where the sets correspond to the cliques in the graph.
There is a recent surge of interest in using simplicial complexes to model higher-order relations in data sets -a technique often referred to as "topological data analysis" [6].In these studies, an object of particular interest is the homology or Betti number of a simplicial complex, which characterizes the number of high-dimensional holes in the complex.Unfortunately, efficiently computing the Betti number seems like a challenging task.Indeed, it was recently shown that even multiplicatively approximating the Betti number of a simplicial complex or clique complex is hard even for quantum computers 1 .
Thus the next natural question is whether we can additively approximate the Betti number.More formally, given a parameter ε ∈ (0, 1), can we efficiently output an estimate νk of the k-th (normalized) Betti number of the complex satisfying where d k denotes the number of k-faces in the complex.To the best of our knowledge, Elek [11] was the first to study this question.In [11], it was proved that if the complex has constant degree, that is, every 0-face (vertex) of the complex is contained in a constant number of 1-faces (edges), then there is an algorithm whose running time depends only on the parameter ε.The constant degree assumption however implies that the complex has constant dimension as well (that is, it contains no k-faces for k ∈ ω(1)), and thus all Betti numbers β k for non-constant k are zero.
The problem was later reconsidered by Lloyd, Garnerone, and Zanardi [19], who proposed a quantum algorithm for estimating the Betti number 2 .Their algorithm combines quantum techniques such as Hamiltonian simulation and quantum phase estimation.Assuming that we can efficiently sample a uniformly random k-face from the complex, the algorithm outputs an ε-additive approximation of β k in time poly(n, 1/γ, 1/ε), where n = d 0 denotes the number of 0-faces in the complex, and γ is the (normalized) spectral gap of the k-th combinatorial Laplacian.As current classical algorithms for calculating Betti numbers seem to run in time poly(d k ) [13], which can be poly(n k ), this suggests an exponential speedup of quantum algorithms over classical ones for this problem.This explains the surge of interest in the quantum algorithm, and in particular in its application to clique complexes, which have a concise poly(n)-size description [1,2,4,7,17,20,23].

Contributions:
In this work, we describe a simple classical algorithm for approximating Betti numbers.We refer to Theorem 1.1 for a formal statement about the complexity of our algorithm, here we discuss its implications.Our algorithm provides a natural benchmark for the aforementioned quantum algorithms.Similarly to these quantum algorithms, our algorithm is efficient if the gap and the precision are "large".However, while the quantum algorithms can afford a gap γ and precision ε as small as 1/ poly(n), our algorithm requires these to be constant for general complexes.This is similar to the dequantization results from [14].In the case of clique complexes, we can go further.For example, if k ∈ Ω(n) then we can afford precision ε = 1/ poly(n) (if the gap is constant) or gap γ = Ω(1/ log 2 n) (if the precision is constant).Overall, this further pins down the region where we can expect a quantum advantage for the problem of estimating Betti numbers.
In a nutshell, we base our algorithm on a random variable whose expectation is close to β k /d k and whose variance is (sufficiently) small.Crucially, we show that we can efficiently generate samples from this random variable.Standard concentration bounds can be used to bound the required number of samples and hence establish the complexity of our algorithm.More precisely, the algorithm is based on the technique of path integral Monte Carlo [3], akin to the Ulam-von Neumann algorithm for solving linear systems [12].Our result is formally stated below.By Γ(∆ k ) we denote the spectral gap of the combinatorial Laplacian ∆ k , which is equal to its smallest nonzero eigenvalue.Theorem 1.1.Let ∆ k denote the k-th combinatorial Laplacian of the complex.Assume that in time poly(n) we can (i) draw a k-face uniformly at random, and (ii) check whether a set is in the complex.Given an estimate λ ∈ Θ(λ max (∆ k )) and a lower bound γ such that ∆ k has spectral gap Γ(∆ k ) ≥ γ λ, there exists a classical algorithm that, for any ε > 0, outputs with high probability an estimate νk = 2 More correctly, [19] and follow-up works output an approximation βk = β k ± εd k of the actual Betti number, rather than the normalized version that we consider.This however requires a (multiplicative) estimate of d k .While such an estimate can be obtained efficiently if e.g. the complex is "dense" (i.e., d k / n k+1 = 1/ poly(n)), we avoid this restriction.

Complexes
Complexity is poly(n) if quantum algorithm of [19] general γ, ε ∈ Ω(1/ poly(n)) this work general γ, ε ∈ Ω(1) Table 1: Comparison of the parameter settings of quantum and classical algorithms for the Betti number estimation problem under which their running time is polynomial.

and of a clique complex in time
The algorithm has space complexity poly(n, 1/γ, log(1/ε)).
For general complexes, our algorithm improves upon the aforementioned classical al- being the maximum up-degree3 over all k-faces, we can simply set λ = n if k ∈ Ω(n) or if we know that δ k ∈ Ω(n).In such case, the algorithm for clique complexes runs in time 2 . This is polynomial if either γ ∈ Ω(1) and ε = 1/ poly(n), or γ ∈ Ω(1/ log 2 n) and ε ∈ Ω(1).The algorithm provides a classical counterpart to the aforementioned line of quantum algorithms for estimating Betti numbers which, under similar assumptions, have a runtime scaling as poly(n, 1/γ, 1/ε).We summarize these findings in Table 1.
The complexity of our algorithm for general simplicial complexes can alternatively be obtained using the singular value transformation (SVT) algorithm4 from Gharibian and Le Gall [14].The main difference is that we use a path integral Monte Carlo approach for computing matrix powers, instead of computing them explicitly as in [14,Lemma 3].This approach provides us with an exponential improvement in the space complexity since the SVT algorithm has space complexity n The more significant benefit is that we get an improved algorithm for clique complexes, which is the main case of interest for the aforementioned quantum algorithms.We show that the k-th combinatorial Laplacian is n-sparse for clique complexes, as compared to general complexes which are O(kn)-sparse.This implies that it is closer to a diagonally dominant matrix, and we can exploit this for designing better algorithms using the path integral Monte Carlo technique.
Open problems A natural question is to what extent we can improve our results.The most stringent barrier seems to come from [5,Theorem 6], who proved that Betti number estimation for general complexes is DQC1-hard when ε, γ = 1/ poly(n), where DQC1 is a complexity class that is expected to be hard to simulate classically5 .This safeguards a quantum speedup for the case of general complexes, yet it leaves open the case of clique complexes.Our work shows that we can get additional leverage for clique complexes.We leave it as our main open question whether the classical complexity for clique complexes can be improved to poly(n, 1/γ, 1/ε).
The task of estimating persistent Betti numbers has been getting an increasing amount of attention, see e.g.[17,21,24].It seems like our method can be used for solving this problem too (if we have membership query access to both complexes K and L with K ⊆ L), but we leave the formal proof of this for future work.However, note that we lose the speedup in the clique complex case, because for the persistent Laplacian it is no longer true that it can only have less than n nonzero entries per row (see our Lemma 4.3).For example, if K only consists of n isolated vertices, but L is the clique complex corresponding to the complete graph, then for any k the k-th persistent Laplacian of the pair (K, L) has Ω(nk) nonzero entries in each row.
A final open question, as was already mentioned in earlier works [4], is characterizing which complexes admit a large spectral gap.The advantage of our algorithm, as well as the aforementioned quantum algorithms, hinges on this assumption.We note that [4] discussed the complete k-partite graph K(m, k) as an example where our spectral gap assumption on the combinatorial Laplacian holds.K(m, k) consists of k clusters, where each cluster contains m vertices, and two vertices are adjacent iff they are in different clusters.The (k − 1)-th combinatorial Laplacian of the clique complex defined by K(m, k) has spectral gap m and the (k − 1)-th Betti number of the complex is (m − 1) k (see [4,Proposition 1 & 2]).Observe that for K(n/k, k), the spectral gap is n/k, and the normalized spectral gap is γ = 1/k.Thus, for this kind of complexes our algorithm runs in polynomial time if for example ε ∈ Ω(1) and k ∈ O log 2 (n) .

Related work.
Upon completion of this work, we noticed that a similar classical path integral Monte Carlo algorithm for estimating Betti numbers was proposed in [4].The authors use a Trotterization approach to implement an imaginary time evolution of the combinatorial Laplacian, and use a more complex distribution over paths to minimize the variance of the Monte Carlo estimator.These techniques seem more flexible than ours, and might eventually lead to a better algorithm.However, unless some additional conditions are imposed, the current runtime of their algorithm still shows an exponential dependency on both k and 1/ε, which our algorithm avoids.
As a reviewer has pointed out, a similar usage of the path integral Monte Carlo method to estimate elements of a matrix power is present in [8] (in particular, see their Lemma 2.5).

Homology and Laplacians
In this work, we consider a simplicial complex K over the set [n], which corresponds to a downward closed set system over [n], where [n] = {1, . . ., n}.We denote by We orient the faces by ordering the vertices in increasing order (that is, we assume i ℓ < i ℓ+1 ).Once we have defined an orientation of the faces, we associate to each face a basis vector From the boundary operator we can define the combinatorial Laplacian ∆ k is a symmetric, positive semidefinite matrix of size d k × d k .Next we define notions of degree and neighborhood of a face.Definition 2.1.In a simplicial complex, the up-degree of a k-face i is the number of (k + 1)-faces that contain i.It is denoted as The maximum up-degree among all the k-faces is denoted as Definition 2.2 (Down-up and up-down neighbors).Let i, j ∈ C k be two k-faces of a simplicial complex K. i and j are said to be down-up neighbors if their symmetric difference |i△j| = 2. Additionally, if i ∪ j is a (k + 1)-face of K, then i and j are also said to be up-down neighbors.
The following lemma from [15] uses these notions to characterize the entries of ∆ k .
Lemma 2.3 (Restatement of Laplacian Matrix Theorem, [15,Theorem 3.3.4]).Let K be a finite oriented simplicial complex, k be an integer with 0 < k ≤ dim(K), and {1, 2, . . ., Then we have the following: , and i and j are down-up neighbors but they are not up-down neighbors.
• (∆ k ) ij = 0 otherwise (i.e. if i ̸ = j, and either i and j are up-down neighbors or they are not down-up neighbors).
The following lemma gathers some useful facts about ∆ k that will be used in our proofs.
Lemma 2.4.Let us consider a simplicial complex K with ∆ k being its combinatorial Laplacian and δ k being the maximum up-degree among all k-faces of K. Then the following results hold: Proof.The second and third bullets follow from Lemma 2.3.The second inequality of the first bullet follows from [10, Proposition 6.2], who prove that For proving the first inequality of the first bullet, we write the largest eigenvalue using the Rayleigh quotient: We can lower bound this by taking a particular x: the one that is all zero except for a position where it is one, and the latter position corresponds to a k-face with up-degree δ k .For this vector, ∂ † k+1 x contains δ k ones and the other elements are zero (because it has up-degree δ k ).And it is also true that ∂ k x contains k + 1 ones and the other elements are zero (because every k-face contains k + 1 many (k − 1)-faces).This concludes that Now let us define the central objects of our interest, namely the Betti numbers of a simplicial complex.
Definition 2.5 (Betti number).Let ∆ k be the k-th combinatorial Laplacian of a simplicial complex K with ∂ k denoting the boundary operator.The k-th Betti number β k equals the dimension of the k-th homology group: thus it is equal to the number of zero eigenvalues of ∆ k .
We will be particularly interested in the Betti numbers of clique complexes.The clique complex K associated to a graph G = (V, E), with vertices V = [n] and (undirected) edges E, is the family of subsets of V that are cliques in G, that is, a subset A ⊆ V is in K if and only if A induces a clique in G.

Hoeffding's inequality
The following concentration bound will be crucial in our proofs.Lemma 2.6 (Hoeffding's Inequality [18], see also [9]).Let X 1 , . . ., X p be independent random variables such that a ≤ X i ≤ b and let X = 1 p p i=1 X i .Then, for all δ > 0, 3 Algorithm for general simplicial complexes Consider a general simplicial complex K over [n], with d k being the number of its k-faces, ∆ k its k-th combinatorial Laplacian, and with k-th Betti number β k .We wish to obtain an estimate νk that satisfies νk = β k /d k ± ε for some parameter ε ∈ (0, 1).In this section, we make the following assumptions: 1.In time polynomial in n, (a) we can check whether a set is in the complex, and (b) we can sample a k-face from the simplicial complex K uniformly at random7 .
2. We have estimates λ and γ on the largest eigenvalue and the spectral gap of ∆ k , respectively, satisfying for some constant c > 08 .

Now consider the matrix
which, as we discuss below, satisfies 0 ⪯ H ⪯ I. From Lemma 2.4, we know that 0 ≤ (∆ k ) ii ≤ n and ∆ k has O(nk) nonzero off-diagonal entries in every row, each of absolute magnitude 1.This implies that By construction, the k-th combinatorial Laplacian ∆ k is positive semidefinite, hence all eigenvalues are non-negative, β k of them are equal to 0, the second smallest distinct eigenvalue is λ 2 (∆ k ), and the maximum eigenvalue is λ max (∆ k ).Thus, by linearity, the eigenvalues of H lie between 0 and 1, β k of them equal to 1, and all other eigenvalues lie below The following lemma shows how to relate the trace of H r , for sufficiently large r, to the Betti number Proof.On the one hand, we have that On the other hand, we have that where we used that (1 − γ) r ≤ ε for r ≥ 1 γ log 1 ε .
Using this observation, we can obtain a 2ε-additive estimate of β k /d k from an ε-additive estimate of Tr (H r ) /d k .To obtain the latter we use another observation, that holds not only for large r as above, but for a general nonnegative z-th power of H.
is sampled uniformly at random and z exactly in time O(n 2z ).Indeed this is the approach in [14].Here we use another approach based on the path integral Monte Carlo method, which has two advantages.First, it improves the space complexity from n O(z) , as in [14], to O(nz).Second, it will lead to a faster algorithm for clique complexes (see next section).
Let us denote the sign of H i,j by (−1) s(i,j) , with s(i, j) ∈ {0, 1}.We can rewrite X (i) z as follows: ).By our choice of normalization, we can interpret |H j,i |/∥H •,i ∥ 1 =: P (i, j) as a transition probability from face i to face j.We can then say that where the path (j 0 , j 1 , . . ., j z ) is drawn with probability P (j 0 , j 1 ) . . .P (j z−1 , j z ) from the resulting Markov chain with transition matrix P .Moreover, if we choose the initial k-face j 0 ∈ [d k ] uniformly at random, then This gives us an unbiased estimator Y z for the normalized trace of H z .Moreover, as proven in the following lemma, we can sample Y z efficiently.
Lemma 3.3.We can sample from Y z , as defined above, in time z • poly(n).
Proof.We can evaluate Y z by sampling z steps of the Markov chain over k-faces.The initial k-face j 0 is drawn uniformly at random.By our assumptions, we can do this in time poly(n).Subsequent steps are sampled as follows.
Let j i be the current k-face.First we learn the up-degree d up j i , and hence (∆ k ) j i j i .We do this by, for all potential up-neighbors (obtained by adding one element to the face), querying whether they are in the complex.This takes n − k − 1 queries, and hence time poly(n).Then we learn all down-up neighbors by querying all O(n 2 ) subsets with symmetric difference 2. This again takes time poly(n).By Lemma 2.3 we can now derive all O(n 2 ) nonzero entries of the j i -th row H •,j i , and hence sample j i+1 according to the probability P (j i+1 , j i ) = |H j i+1 ,j i |/∥H •,j i ∥ 1 .This yields time poly(n) per step of the Markov chain, and so z • poly(n) time overall.
This leads to the following algorithm, which has time complexity O(n 4z /δ 2 ).
Input: Query and sample access to complex K, integer k, parameters λ and z, precision parameter δ ∈ (0, 1).Output: Estimate est k,z such that est k,z = Tr(H z )/d k ± δ with high probability.Sample z steps (j 0 , j 1 , . . ., j z ) of the Markov chain P with initial face j 0 .

Improvement using Chebyshev polynomials
We can slightly improve this result by approximating H r with a polynomial of degree roughly √ r using Chebyshev polynomials, and then estimating the monomials using Algorithm 1.We use the following approximation lemma.where T i (x) is the i'th Chebyshev polynomial (of the first kind), and α Proof.We prove the lemma using induction on i.The Chebyshev polynomials can be defined via the following recurrence where T 0 (x) = 1, T 1 (x) = x.This immediately shows that the bounds c (i) ℓ ≤ 2 2i hold for i = 0 and i = 1.Now assume i > 1 and that for all i ′ < i and ℓ ≤ i ′ , we have c Then from the recursion T i (x) = 2xT i−1 (x) − T i−2 (x), we obtain c Now we can describe an efficient algorithm that, for any ε > 0, outputs an additive ε-estimate of β k /d k with high probability.The correctness and complexity of the algorithm are proven in Theorem 3.7.To bound the time complexity, recall that the time complexity of Algorithm 1 in Line 3 is O(n 4ℓ /δ 2 ) ∈ n O(d) /ε 2 .Summing over the d + 1 loops, and using the expression for d, this yields a total time complexity that is n This completes the proof of the first item of Theorem 1.1.

Algorithm for clique complexes
The complexity of our path integral Monte Carlo algorithm is dominated by the sample complexity that follows from Hoeffding's inequality (Lemma 2.6), which we bound using the fact that |Y z | ≤ ∥H∥ z 1 and ∥H∥ 1 = poly(n).Here we prove a tighter bound on ∥H∥ 1 for the special case of clique complexes, and exploit this to improve the algorithm.
We will use the following characterization of the off-diagonal elements of the combinatorial Laplacian ∆ k .Lemma 4.1 (Follows from Lemma 2.3).Let ∆ k denote the k-th combinatorial Laplacian of a simplicial complex K. Then (∆ k ) ij for i ̸ = j is nonzero if and only if the corresponding two k-faces are down-up neighbors but not up-down neighbors.
The following claim is going to be useful for the proof of the next lemma.Claim 4.2.In a clique complex, every k-face has at most n − k − 1 down-up neighbors that are not its up-down neighbors.
Proof.Since we are in the clique complex case, a k-face is exactly a (k + 1)-clique, so we will use the two expressions interchangeably.We will prove a slightly stronger statement: every vertex that is not in a (k + 1)-clique C can appear in at most one down-up neighbor of C that is not its up-down neighbor.Let us consider a (k + 1)-clique C and suppose there is a vertex v among the n − k − 1 vertices that are not in C such that v belongs to two distinct down-up neighbors C 1 and C 2 of C. That is, suppose there are two distinct vertices u 1 , u 2 ∈ C such that C 1 = C \{u 1 }∪{v} and C 2 = C \{u 2 }∪{v} are (k+1)-cliques.We show that C 1 and C 2 are up-down neighbors of C. Indeed, v must be adjacent to every vertex of C: from C 1 ∈ K it is adjacent to all vertices in C other than the vertex u 1 , and from C 2 ∈ K it is adjacent all vertices except for u 2 .So C ∪ {v} forms a (k + 2)-clique and hence C 1 and C 2 are up-down neighbors of C.This section's main observation is the following.Proof.As a consequence of Claim 4.2, every k-face has at most n−k−1 down-up neighbors that are not its up-down neighbors.Following Lemma 4.1, these elements correspond exactly to the nonzero off-diagonal entries in ∆ k .Adding the diagonal element (∆ k ) ii , we obtain that the total number of nonzero entries in a row of the k-th combinatorial Laplacian is at most n − k.
To improve this bound, notice that if a vertex v is adjacent to all the vertices of a k-face (i.e, we have an up-neighboring (k + 1)-face), then v cannot be in any down-up neighbor that is not an up-down neighbor as well.Thus, using Lemma 4.1 again, we can say that every up-neighbor "cancels" the corresponding down-up neighbors in ∆ k .
Hence, if the k-face corresponding to the i-th row of the combinatorial Laplacian has up-degree d up i , then the number of nonzero entries in the i-th row of ∆ k is at most n − k − d up i .
From this, we get the following corollary.Proof.Recall that H = I − ∆ k / λ.Thus, where for the first inequality we used the fact that |(∆ k ) ij | is either 1 or 0 if i ̸ = j, and by Lemma 4.1 it is 1 at most n times in every row or column.In the second inequality we used the fact that 0 ≤ (∆ k ) ii ≤ n.Combining, we have the result.• 1 δ 2 .This directly propagates to Algorithm 2, improving its time complexity from The poly(n) term comes from the time required for sampling (Lemma 3.3).This completes the proof of the second item of Theorem 1.1.

Lemma 3 . 4 .
For any δ > 0 and integer z ≥ 0, we can obtain a δ-additive estimate of E[Y z ] = Tr(H z )/d k by taking the average of O(n 4z )/δ 2 many independent samples of Y z .

For r ≥ 1 γ log 2 ε 1 γ log 1 ϵ
, we know from Lemma 3.1 that Tr(H r )/d k = β k /d k ± ε/2.Hence, setting δ = ε/2 and z = r in the algorithm above we get an ε-additive estimate of β k /d k .The algorithm requires p = O(n 4r /δ 2 ) = n O samples of Y r , each of which can be obtained in time r • poly(n) by Lemma 3.3.The overall time complexity of Algorithm 1 is hence n O 1 γ log 1 ε .

Lemma 3 . 5 (
Follows from[22, Theorem 3.3]).For any δ > 0 and d ≥ 2r log(2/δ), the monomial x r can be approximated by a polynomial p r,d (x) of degreed such that |p r,d (x) − x r | ≤ δ for all x ∈ [−1, 1].Now we bound the size of the monomial coefficients in p r,d (x), as these will govern the precision with which we need to estimate the trace of each monomial.Following[22,  Chapter 3], the polynomial p r,d (x) is obtained by first approximating x r in the Chebyshev basis byp r,d (x) = α i T i (x)Accepted in Quantum 2023-12-04, click title to verify.Published under CC-BY 4.0.

≤ 1 Lemma 3 . 6 .
i)/2 /2 r if i has the same parity as r, and α (r) i = 0 otherwise.We then obtain the desired coefficients of p r,d in the monomial basis by expressing each of the Chebyshev polynomials in the monomial basis.Concretely, if T i (x) = i ℓ=0 c this lemma yields the upper bound|b ℓ | ≤ (d + 1) • 2 • 2 2d ≤ 2 3d.For all i ∈ N and ℓ ≤ i, we have c(i) ℓ ≤ 2 2i .

Lemma 4 . 3 .
The k-th combinatorial Laplacian of a clique complex has at most n−k −d up i nonzero entries in every row, that is,|{j : (∆ k ) ij ̸ = 0}| ≤ n − k − d up i ∀i ∈ d kwhere d up i is the up-degree of the k-face corresponding to the i-th row of the combinatorial Laplacian.