Optimizing sparse fermionic Hamiltonians

We consider the problem of approximating the ground state energy of a fermionic Hamiltonian using a Gaussian state. In sharp contrast to the dense case, we prove that strictly $q$-local sparse fermionic Hamiltonians have a constant Gaussian approximation ratio; the result holds for any connectivity and interaction strengths. Sparsity means that each fermion participates in a bounded number of interactions, and strictly $q$-local means that each term involves exactly $q$ fermionic (Majorana) operators. We extend our proof to give a constant Gaussian approximation ratio for sparse fermionic Hamiltonians with both quartic and quadratic terms. With additional work, we also prove a constant Gaussian approximation ratio for the so-called sparse SYK model with strictly $4$-local interactions (sparse SYK-$4$ model). In each setting we show that the Gaussian state can be efficiently determined. Finally, we prove that the $O(n^{-1/2})$ Gaussian approximation ratio for the normal (dense) SYK-$4$ model extends to SYK-$q$ for even $q>4$, with an approximation ratio of $O(n^{1/2 - q/4})$. Our results identify non-sparseness as the prime reason that the SYK-$4$ model can fail to have a constant approximation ratio.

Approximating the ground state energy of a local Hamiltonian is a central problem in both physics and computer science. In computer science it plays a key role in complexity theory [3], while in physics ground states capture the behaviour of systems at low energy. Two common families of Hamiltonians of interest are those defined on collections of qubits and those acting on fermionic degrees of freedom. Fermionic Hamiltonians model various physical systems, such as electrons in condensed matter and quantum chemistry, which are prime targets for quantum simulation. Fermions also define a model of quantum computation, equivalent to the one based on qubits [4]. Despite its practical and conceptual relevance, the general problem of approximating fermionic ground state energies is currently less well understood than its qubit counterpart.
Some rigorous progress in studying this problem, both for qubits and for fermions, was made from the perspective of optimization. In this subfield of computer science, one of the central tasks is efficiently finding problem solutions that are provably close to optimal [5]. The closeness is usually quantified by an approximation ratio, i.e. the ratio between the value attained by an algorithm and the optimal value for a given problem. For the classical equivalent of ground state energy finding, Constraint Satisfaction Problems (CSPs), such approximation ratios have been extensively studied [6].
For quantum Hamiltonians, an interesting question is how well the ground state energy can be approximated using "classical" or "mean-field" states. For qubit Hamiltonians the natural choice of classical states are product states, while for fermionic Hamiltonians they are Gaussian states. Gaussian states play a prominent role in fermionic optimization problems using the mean-field Hartree-Fock method, see e.g. [7], or dynamical mean-field theory via solving impurity problems [8], or the simulation of free fermionic computation [9,10].
Formal guarantees on approximation ratios characterize numerical simulation methods using classical states and outline their limitations compared to quantum computing. For qubit Hamiltonians, it was first proved by Lieb [11] (see [12] for a simplified proof) that there always exists a product state which approximates the ground state energy of a traceless 2-local qubit Hamiltonian by a factor of 1/9. Many more results on approximating ground state energies of many-body systems by product states can be found in [13,14,15,16,17,18]. In [12] it was shown, through the Goemans-Williamson method, that for a 2-local traceless qubit Hamiltonian a product state can always be efficiently found with approximation ratio $O(1/\log n)$, where $n$ is the number of qubits. Ref. [12] also considered fermionic Hamiltonians with quadratic ($q = 2$) and quartic ($q = 4$) fermionic terms. They left as an open question whether all 4-local fermionic Hamiltonians have a constant approximation ratio with respect to Gaussian states (a Gaussian approximation ratio).
A surprising counterexample was recently presented in Refs. [1,2]: the family of SYK-4 models (Sachdev-Ye-Kitaev models with quartic fermionic interactions, see Definition 2). It was shown that, with high probability, SYK-4 Hamiltonians admit a Gaussian approximation ratio no better than $O(1/\sqrt{n})$, where $n$ is the number of fermionic modes. Contrasted with Refs. [11,14], this result means that qubit and fermionic ground states differ strongly in their approximability by classical states. Moreover, this opens up the question of which fermionic Hamiltonians do allow finite Gaussian approximation ratios. This is the question that we aim to answer here. We do this by considering sparse Hamiltonians, i.e. Hamiltonians where each fermionic mode participates in a bounded number of interactions. Sparsity holds for many physically relevant Hamiltonians, such as the Fermi-Hubbard model. It also holds for exotic Hamiltonians, such as those determined by constant-degree expander hypergraphs; notably, it does not hold for the SYK model. Sparsity of interactions has been considered in the classical CSP literature. It was shown in [19] that the MaxQP problem has an efficient constant approximation ratio algorithm on graphs of bounded chromatic number, in particular graphs with bounded degree. We show that a similar assumption of sparsity is enough to guarantee constant Gaussian approximation ratios for 4-local and strictly q-local Hamiltonians. Moreover, we show that a constant Gaussian approximation ratio can be achieved for the sparse SYK-4 model [20] (which has a logarithmically growing interaction participation and is thus not sparse by our definition). Finally, we consider in more detail the optimal approximation ratio for the dense SYK-q model for $q > 4$ (thus extending the work of [2]). We show that the shortfall of Gaussian states is even more pronounced in this setting.
To avoid confusion, we note that instead of the ground state energy, existing works often consider approximating the maximal eigenvalue of the Hamiltonian, $\lambda_{\max}(H)$. These two optimization problems are equivalent if the family of Hamiltonians considered is invariant under a change of sign (e.g. traceless q-local Hamiltonians). For mathematical convenience and consistency with the literature, in the rest of the text we will also formulate our results in terms of approximating $\lambda_{\max}(H)$.

Preliminaries
Before surveying our results, we introduce the basic setup of fermionic Hamiltonians and q-locality. This subsection also defines the SYK-q model and spells out the previous result of a vanishing Gaussian approximation ratio for SYK-4.
Figure 1: [...] Definition 15). The next step is to match all Majorana operators, i.e., split the vertices into disjoint pairs, each connected by an edge (see panel (c)). We separately match the support of each term in one targeted set of terms (the color highlighted in (b) and (c)). The remaining vertices are matched in such a way that no two vertices connected by an edge belong to the same term. The Gaussian state is then created from the resulting matching, with only terms from the targeted set contributing to the energy. By optimizing the choice of the targeted set, a finite approximation ratio can be guaranteed.

We consider a system of 2n traceless Majorana fermion operators $c_i$, $i = 1, \ldots, 2n$, with $c_i^2 = I$, $c_i^\dagger = c_i$, forming a Clifford algebra, i.e., $\{c_j, c_k\} = 2\delta_{j,k} I$, and representing $n$ fermionic modes. We denote as $I$ an ordered subset $I = \{i_1, i_2, \ldots, i_q\} \subseteq [2n] \equiv \{1, \ldots, 2n\}$, where $i_1 < i_2 < \ldots < i_q$ with $q$ even. We denote by

$C_I = i^{q/2}\, c_{i_1} c_{i_2} \cdots c_{i_q}$   (1)

the Hermitian Majorana monomial, and one can verify that $C_I^2 = I$. We can think about a subset $I$ as corresponding to a term or interaction in a Hamiltonian. Indeed, it is natural to impose some form of locality:
Definition 1 (q-local fermionic Hamiltonian). Let H be a fermionic Hamiltonian on 2n Majorana operators. We say that H is q-local if H is a sum of Hermitian traceless terms $C_I$ of weight at most q, i.e. each term is proportional to a product of at most q operators $c_i$. H is said to be strictly q-local when all terms have weight exactly q.
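The algebraic properties used above (anticommutation, Hermiticity, $C_I^2 = I$, tracelessness) can be checked numerically on a small system. The following is a minimal sketch, assuming a Jordan-Wigner matrix representation of the Majorana operators and the convention $C_I = i^{q/2} c_{i_1} \cdots c_{i_q}$; the helper names `majoranas` and `monomial` are illustrative and not part of the original text.

```python
import numpy as np
from functools import reduce

# Pauli matrices for the (assumed) Jordan-Wigner representation
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def majoranas(n):
    """Return the 2n Majorana matrices c_1, ..., c_2n acting on n modes."""
    ops = []
    for j in range(n):
        for P in (X, Y):
            # Z string on modes before j, X or Y on mode j, identity afterwards
            ops.append(reduce(np.kron, [Z] * j + [P] + [I2] * (n - j - 1)))
    return ops

def monomial(c, I):
    """C_I = i^(q/2) c_{i_1} ... c_{i_q} for a sorted, 1-based index tuple I."""
    q = len(I)
    return (1j) ** (q // 2) * reduce(np.matmul, [c[i - 1] for i in I])

c = majoranas(3)                          # 2n = 6 Majorana operators
C = monomial(c, (1, 3, 4, 6))             # one strictly 4-local term
dim = C.shape[0]
print(np.allclose(C, C.conj().T))         # Hermitian
print(np.allclose(C @ C, np.eye(dim)))    # squares to the identity
print(abs(np.trace(C)) < 1e-12)           # traceless
```

The matrices have dimension $2^n$, so this check is only practical for a handful of modes; it is meant as a sanity test of the conventions rather than a simulation tool.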
A local traceless fermionic Hamiltonian $H = \sum_{I \in \mathcal{I}} J_I C_I$ is thus characterized by an interaction set $\mathcal{I}$ and the coefficients $J_I \in \mathbb{R}$. The maximum eigenvalue of H is denoted by $\lambda_{\max}(H) := \max_\rho \mathrm{Tr}(H\rho)$. Sometimes we will refer to a collection of sets $I$ denoted as $\mathcal{I} = \{I_1, I_2, \ldots\}$. The support of $\mathcal{I}$ is defined as $\mathrm{Sup}(\mathcal{I}) = \cup_i I_i$, and $\mathcal{I}' \subseteq \mathcal{I}$ implies that the sets in $\mathcal{I}'$ are also sets in $\mathcal{I}$.

Definition 2 (SYK-q Model). A q-local (with q even) SYK model on 2n Majoranas is defined as a family of Hamiltonians

$H = \binom{2n}{q}^{-1/2} \sum_{I \subseteq [2n],\, |I| = q} J_I C_I,$

where each $J_I$ is a Gaussian random variable (i.e., with zero mean and unit variance) and each $C_I$ is the product of the q distinct Majorana operators as in Eq. (1). We normalize the model in expectation, i.e., $2^{-n}\, \mathbb{E}\, \mathrm{Tr}(H^2) = 1$.

In [1] it was shown that with high probability (over the draw of the $J_I$s) for the SYK-4 model, one has

$\max_{\rho\ \mathrm{Gaussian}} \mathrm{Tr}(H\rho) = O(1).$
Thus, to provide a counterexample to a constant Gaussian approximation ratio, one needs to prove a lower bound on $\lambda_{\max}(H)$ for the SYK-4 model that holds with high probability. This was done in [2]:

Theorem 3 [2]. There is a poly(n)-time quantum algorithm that, given any SYK-4 Hamiltonian H, returns a quantum state ρ. With probability $1 - \exp(-\Omega(n))$ (over the draw of the $J_I$s), this state ρ has $\mathrm{Tr}(H\rho) = \Omega(\sqrt{n})$.

Sparse fermionic Hamiltonians
Key to our work is the notion of a sparse Hamiltonian.

Definition 4. Let H be a local traceless fermionic Hamiltonian of 2n Majorana operators. We say that H is k-sparse, for an integer k, if no Majorana operator $c_i$ occurs in more than k terms of the Hamiltonian.
Using graph-theoretic terminology, one may say that the interactions in a k-sparse Hamiltonian form a hypergraph of bounded degree k. This condition allows us to efficiently find Gaussian states with constant approximation ratio. We have the following theorem, which is the main result of our work:

Theorem 5. Let H be a traceless fermionic Hamiltonian on 2n Majorana operators with maximal eigenvalue $\lambda_{\max}(H)$. If H is k-sparse and strictly q-local and $n > (q^2 - 1)k$, a Gaussian state ρ can be efficiently constructed such that

$\frac{\mathrm{Tr}(H\rho)}{\lambda_{\max}(H)} \geq \frac{1}{Q}, \qquad Q = q(q-1)(k-1)^2 + q(k-1) + 2.$

The proof of this theorem is given in Section 5; its basic idea is explained in Figure 1.
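To make the sparsity condition concrete, the sketch below computes the smallest k for which a given interaction set is k-sparse, together with the n-independent parameter $Q = q(q-1)(k-1)^2 + q(k-1) + 2$ that counts the diffuse subsets used in the proof of Theorem 5. The function names and the ring-shaped example Hamiltonian are hypothetical illustrations, not taken from the text.

```python
from collections import Counter

def sparsity(terms):
    """Smallest k such that no Majorana index occurs in more than k terms."""
    counts = Counter(i for I in terms for i in I)
    return max(counts.values()) if counts else 0

def Q_bound(q, k):
    """Q = q(q-1)(k-1)^2 + q(k-1) + 2, the number of diffuse subsets."""
    return q * (q - 1) * (k - 1) ** 2 + q * (k - 1) + 2

# Hypothetical example: a ring of strictly 4-local terms on 2n = 12 Majoranas,
# where consecutive terms overlap on two Majorana operators.
terms = [tuple(sorted(((2 * j + d) % 12) + 1 for d in range(4))) for j in range(6)]
k = sparsity(terms)
Q = Q_bound(4, k)
print(k, Q, 1 / Q)   # sparsity, number of subsets, and the resulting 1/Q scale
```

Since Q grows quadratically in k and q, the guaranteed ratio deteriorates quickly with the interaction degree even though it stays independent of n.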
We note that this proof only holds for Hamiltonians with terms of exactly weight q. Typical physical Hamiltonians, however, have quadratic terms (kinetic energy of the electrons) and quartic terms (potential energy due to Coulomb interaction). Fortunately, we can also show that in the $q = 4$ case we can include $q = 2$ terms. For this we use a trick from [12] to lift such a 4-local Hamiltonian to a strictly 4-local Hamiltonian. This trick makes the Hamiltonian non-sparse. However, we show in Section 6 that, in this special case, we can circumvent the non-sparseness of the Hamiltonian and achieve a constant Gaussian approximation ratio.

The sparse q = 4 SYK model
In view of Theorem 5 it is worth revisiting the lack of a constant Gaussian approximation for the SYK model.
The SYK-q model in Definition 2 is extremely non-sparse, in the sense that every Majorana operator occurs together with all other Majorana operators. This makes the SYK model somewhat unphysical, and several sparse versions of the model have been considered [20,21]. Such sparse models intend to produce the same (low energy) physics, while being easier to simulate on both quantum and classical computers (see sections III and V in [20]). The sparse SYK model is generated by including terms by a Bernoulli trial with a certain probability p tuned such that the expected sparsity is bounded:

$H = \frac{1}{\sqrt{2kn}} \sum_{I \subseteq [2n],\, |I| = 4} X_I J_I C_I,$   (5)

where the $X_I$ are i.i.d. Bernoulli random variables with $p = \Pr(X_I = 1)$ [...] and the $J_I$ are i.i.d. Gaussian random variables with mean 0 and variance 1. Unlike the full SYK model with $\binom{2n}{4}$ terms in H, the sparse SYK model has a number of terms $\sim n$ in expectation. Note that the SSYK-4 model is only k-sparse in expectation, and with high probability there is a Majorana operator with degree $\Omega\left(\frac{\log n}{\log\log n}\right)$ (the degree distribution follows that of an Erdős-Rényi hypergraph; see Theorem 3.4 in [22] for a proof of the statement for Erdős-Rényi graphs, the hypergraph version follows by the same logic). This means that Theorem 5 does not directly apply. However, one can show, through a truncation argument, that almost all instantiations of SSYK-4 can be sparsified, giving rise to a constant approximation ratio result that holds with high probability.

Theorem 8. Let H be a SSYK-4 Hamiltonian in Eq. (5) with expected degree $k = O(1)$, such that $n > 120(k + 1)$. With probability at least $1 - 4\exp\left(-\frac{e^{-16(k+1)} k^3}{64(8k+7)}\, n\right)$, a Gaussian state ρ can be efficiently constructed such that the ratio $\mathrm{Tr}(H\rho)/\lambda_{\max}(H)$ is bounded from below by a constant depending only on k [...].

Thus we arrive at the surprising conclusion that the SSYK-4 model has a constant Gaussian approximation ratio, while the dense SYK-4 model does not, even though SSYK-4 has similar physical properties to SYK-4.
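The Bernoulli construction of the sparse SYK-4 interaction set can be sketched as follows. The inclusion probability $p = k / \binom{2n-1}{3}$ used here is an assumption: it is the value for which each Majorana operator participates in k terms in expectation, and the text above does not confirm that exact form. The overall normalization of H is omitted, and the function names are illustrative.

```python
import itertools
import math
import random
from collections import Counter

def sample_ssyk4(n, k, seed=0):
    """Sample {I: J_I} for a sparse SYK-4 instance on 2n Majoranas.

    Assumption: p = k / C(2n-1, 3), so the expected degree of each Majorana is k.
    """
    rng = random.Random(seed)
    p = k / math.comb(2 * n - 1, 3)
    terms = {}
    for I in itertools.combinations(range(1, 2 * n + 1), 4):
        if rng.random() < p:              # Bernoulli variable X_I
            terms[I] = rng.gauss(0.0, 1.0)  # Gaussian coupling J_I
    return terms

def degrees(terms):
    """Number of sampled terms in which each Majorana index participates."""
    return Counter(i for I in terms for i in I)

terms = sample_ssyk4(n=8, k=3, seed=1)
deg = degrees(terms)
print(len(terms), max(deg.values(), default=0))
```

Under this assumed p the expected number of terms is $kn/2$, linear in n, while the maximum degree of a sample can exceed k; this is the reason a truncation step is needed before a bounded-degree argument applies.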

Higher-q SYK models
We investigate what Gaussian approximation ratios can be achieved for the dense SYK model of even weight q > 4, as this was left as an open question in [2].We establish an upper bound on the largest Gaussian expectation value of SYK-q, which behaves rather dramatically for q > 4. We prove the following Lemma employing a method similar to the one used in [1].

Lemma 9. Let H be the dense SYK-q Hamiltonian (with even $q \geq 4$ and $q = O(1)$). With probability at least $1 - \exp(-\Omega(n))$ over the draw of SYK-q Hamiltonians, the expectation value of every Gaussian state ρ is bounded, more precisely:

$\mathrm{Tr}(H\rho) = O(n^{1 - q/4}).$

This Lemma is proved in Section 8. Our second result establishes a lower bound on the largest eigenvalue for SYK-q, essentially generalizing what was established in [2] for $q = 4$. We prove the following Lemma (its proof can be found in Section 8):

Lemma 10. Let H be the dense SYK-q Hamiltonian with even $q \geq 4$ (and $q = O(1)$). With probability at least $1 - \exp(-\Omega(n))$ over the draw of SYK-q Hamiltonians, $\lambda_{\max}(H) = \Omega(\sqrt{n})$.
As an immediate consequence of the previous results, we see that the Gaussian approximation ratio of the dense SYK-q model can be no better than $O(n^{1/2 - q/4})$:

Theorem 11. Let H be the dense SYK-q Hamiltonian (with even $q \geq 4$ and $q = O(1)$). With probability at least $1 - \exp(-\Omega(n))$ over the draw of SYK-q Hamiltonians, we have

$\frac{\max_{\rho\ \mathrm{Gaussian}} \mathrm{Tr}(H\rho)}{\lambda_{\max}(H)} = O(n^{1/2 - q/4}).$

Proof. Theorem 11 follows from combining Lemma 9 and Lemma 10 and applying the union bound.

Discussion
The goal of this section is to place our results in a broader context and mention a few open questions.
First, let us discuss the relation between this work and fermion-to-qubit mapping methods. As was shown in [4], one can map a sparse O(1)-local fermionic Hamiltonian onto a sparse O(1)-local qubit Hamiltonian (BK-superfast encoding). However, for this mapping one needs to enforce parity checks which are in general nonlocal; therefore, we cannot obtain our Theorem 5 in this way. There is also an additional obstacle: using the BK-superfast encoding, an approximating product state for the qubit Hamiltonian does not necessarily map back to a Gaussian fermionic state.
Ref. [4] also showed that one can map a general local fermionic Hamiltonian (like an SYK model) onto a qubit Hamiltonian with terms which are $O(\log n)$-local. Such a qubit Hamiltonian is generally not expected to have a constant approximation ratio by a product state due to its n-dependent locality. In fact, one can easily prove that a dense model like the SYK model can only be mapped onto a qubit Hamiltonian which is $\Omega(\log n)$-local. We give the argument in Appendix A.
These observations suggest that approximation ratios by classical states such as Gaussian states or product states are likely to be affected by sparsity in the case of fermions, which is consistent with our new results.
Another question raised by our work and that of [2] and [15] is whether studying fermionic Hamiltonians can lead to new insights into the possibility of a quantum PCP theorem [23]. In this context it is important to mention that, besides the lower bound in Theorem 3, Ref. [2] also determined an upper bound on $\lambda_{\max}$ of the SYK-4 model, showing that with high probability $\lambda_{\max} = \Theta(\sqrt{n})$. This shows that the SYK-4 model is extremely frustrated: the maximal average expected energy per term, the energy density, is only $\Theta(n^{-3/2})$. In contrast, our results for the sparse SYK model (see Lemma 23) show that the maximal average expected energy per term is $\Omega(1)$, which is the more 'natural' physical scaling. A simple fermionic toy model in which the maximal average energy per term decreases is a model in which an extensive set of Majorana operators is mutually anti-commuting, see Lemma 29 in Appendix A. The presence of many such fully anticommuting sets in the SYK model can be seen as one of the intuitive reasons why the maximal energy density achieved is so low.
For k-local qubit Hamiltonians researchers have looked at the hardness of approximating the maximal energy density with constant error ϵ: showing that this problem is QMA-complete would prove the quantum PCP theorem. For dense (non-sparse) k-local qubit Hamiltonians, it was proved in [15] (Theorem 13) that there is a polynomial-time classical algorithm to approximate the maximal energy density, using product state approximations. Ref. [24] generalized this result and formulated an efficient classical algorithm which approximately estimates the free energy of a 2-local dense qubit Hamiltonian.
One can similarly ask the question of approximating the maximal energy density for dense q-local fermionic Hamiltonians.Observe that the question is moot if the maximal energy density decreases as a function of n (as in the SYK model), since for large enough n (depending on ϵ) the classical algorithm could always output 0 and make an error less than ϵ.However, other dense O(1)-local fermionic Hamiltonians could exist for which this question is nontrivial and not already covered by the dense qubit case.
There are further open directions that are more practically oriented. One of these is achieving finite approximation ratios for at least some classes of non-sparse fermionic Hamiltonians (e.g., quantum chemistry or lattice systems with long-range Coulomb interactions). Furthermore, in most applications, one is interested in obtaining approximation ratios as close to 1 as possible. Although for most systems of interest one cannot expect ratios that are ϵ-close to 1, our theoretical lower bounds could still be vastly improved. For instance, for sparse SYK with $k = 10$, the guaranteed ratio is only $\simeq 5 \times 10^{-6}$ (cf. Theorem 8). This can be contrasted to the Hartree-Fock applications to quantum chemistry systems, which usually achieve approximation ratios of > 0.9. Improving our results to derive more realistic lower bounds could be of great value; some possible approaches are as follows.
One option is to extend the interaction subsets targeted by the constructed Gaussian state beyond the diffuse subsets considered here.If the overlapping interactions in the problem Hamiltonian are not prone to frustration, including them in the targeted set may dramatically increase the approximation ratio.The proof of Theorem 6 (Section 6) is a special case of this approach, with the constructed Gaussian state targeting multiple overlapping terms at the same time.
Another option for improvement is to minimize the contribution from frustration terms instead of avoiding frustration altogether. This could both improve the eventual approximation ratio, by targeting a larger pool of interactions, and help mitigate the issue of non-sparsity. An example of this approach is the proof of Theorem 8 (Section 7), where the contributions from the non-sparse part of the Hamiltonian are shown to be small compared to the energy achieved by the Gaussian state.
As a third option, one can modify the basis of fermionic modes so that non-sparsity and frustration in the Hamiltonian are minimized.In the simplest case of q = 2, such a basis rotation can always turn all interactions into a diffuse set (simply by diagonalizing the Hamiltonian).A similar improvement may be possible for some classes of q-local Hamiltonians with q ≥ 4.
Developing these and other directions for efficient Gaussian ground state approximation is an interesting possibility for future research.
Finally, it would be interesting to provide a nonrandom family of fermionic Hamiltonians without a constant approximation ratio with respect to Gaussian states.

Background on Gaussian states
In this section, we first provide some background and definitions that will be used throughout the remainder of this text.

Gaussian states
We define the class of fermionic Gaussian states, which are ground states and thermal states of noninteracting, quadratic (q = 2), fermionic Hamiltonians, and give some of their useful properties.
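We first note that any transformation by a real orthogonal matrix $R \in SO(2n)$, i.e., $c_i \to \tilde{c}_i = \sum_j R_{ij} c_j$, preserves the anticommutation relations of the Majorana operators. A fermionic Gaussian state ρ is an exponential of a quadratic form in the Majorana operators [...], where $(\beta_{ij})_{i,j=1}^{2n}$ is a real anti-symmetric matrix and the normalization is such that $\mathrm{Tr}\,\rho = 1$.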
Fermionic Gaussian states have a number of useful properties, which we list here for future use.
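1. The matrix β can be block-diagonalized by a real orthogonal matrix $R \in SO(2n)$ such that $R \beta R^T = \bigoplus_{j=1}^{n} \begin{pmatrix} 0 & b_j \\ -b_j & 0 \end{pmatrix}$ with $b_j \geq 0$. Therefore, ρ can be brought to the following standard form

$\rho = \frac{1}{2^n} \prod_{j=1}^{n} \left(I + i \lambda_j \tilde{c}_{2j-1} \tilde{c}_{2j}\right),$   (12)

where $\tilde{c}_i = \sum_j R_{ij} c_j$ and $\lambda_j = \tanh(2b_j) \in [-1, +1]$.
2. Each fermionic Gaussian state can be associated with a $2n \times 2n$ correlation matrix Γ, with elements $\Gamma_{ij} = \frac{i}{2} \mathrm{Tr}\left(\rho\, [c_i, c_j]\right)$.
3. Γ is a real anti-symmetric matrix and hence there is a real orthogonal matrix $R \in SO(2n)$ such that $R \Gamma R^T = \bigoplus_{j=1}^{n} \begin{pmatrix} 0 & \lambda_j \\ -\lambda_j & 0 \end{pmatrix}$, where the $\lambda_j$ are in Eq. (12).
4. The Pfaffian of a $2k \times 2k$ anti-symmetric matrix A is defined as

$\mathrm{Pf}(A) = \frac{1}{2^k k!} \sum_{\sigma \in S_{2k}} \mathrm{sgn}(\sigma) \prod_{j=1}^{k} A_{\sigma(2j-1), \sigma(2j)}.$

Alternatively, we can see the Pfaffian as a sum over perfect matchings in a graph of 2k vertices where an edge $(i < j)$ has weight $A_{ij}$ and each matching contributes the product of these weights, with the sign of the corresponding permutation, to the sum. For a Gaussian state with correlation matrix Γ, one has for even $|I|$:

$\mathrm{Tr}(\rho\, C_I) = \mathrm{Pf}(\Gamma_I),$

where $\Gamma_I$ is the $|I| \times |I|$ submatrix of Γ restricted to rows and columns in the ordered set I.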
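The Pfaffian can be evaluated by the standard expansion along the first row, which is equivalent to the signed sum over perfect matchings described above. The sketch below is a plain recursive implementation intended for small matrices only (its cost grows like $(2k-1)!!$); the function name is illustrative. It checks the known identity $\mathrm{Pf}(A)^2 = \det(A)$ on a random anti-symmetric matrix.

```python
import numpy as np

def pfaffian(A):
    """Pfaffian of an anti-symmetric matrix via expansion along the first row."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    if m == 0:
        return 1.0
    if m % 2 == 1:
        return 0.0
    if m == 2:
        return float(A[0, 1])
    total = 0.0
    for j in range(1, m):
        if A[0, j] == 0.0:
            continue
        # Remove rows/columns 0 and j: vertex 0 is matched with vertex j
        keep = [r for r in range(m) if r not in (0, j)]
        sign = 1.0 if j % 2 == 1 else -1.0
        total += sign * A[0, j] * pfaffian(A[np.ix_(keep, keep)])
    return total

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
A = B - B.T                                   # random anti-symmetric matrix
print(np.isclose(pfaffian(A) ** 2, np.linalg.det(A)))
```

For a $4 \times 4$ matrix the recursion reproduces the three matchings $A_{12}A_{34} - A_{13}A_{24} + A_{14}A_{23}$, which is the case relevant for quartic terms.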
A special class of pure Gaussian states is given by a perfect matching M of Majorana operators. Such a matching M is specified by n disjoint pairs $(m_1, m_2)$ with $m_1 < m_2$. For each pair we have a coefficient $\lambda_{(m_1, m_2)} = \pm 1$, together forming the n-dimensional vector $\vec\lambda$. The states in this class are of the form

$\rho(M, \vec\lambda) = \frac{1}{2^n} \prod_{(m_1, m_2) \in M} \left(I + i \lambda_{(m_1, m_2)} c_{m_1} c_{m_2}\right).$

It is useful to introduce a notion of consistency between this class of Gaussian states specified by a matching M and an interaction subset I.

Definition 13. An (even) interaction subset $I \subseteq [2n]$ and a perfect matching M on [2n] are called consistent if M contains a perfect matching of the elements of I. Given a set of interactions $\mathcal{I}$, we say that M is consistent (resp. inconsistent) with $\mathcal{I}$ if M is consistent (resp. inconsistent) with each interaction in $\mathcal{I}$.
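Definition 13 translates directly into a short check: M contains a perfect matching of I exactly when every element of I is paired by M with another element of I. The following sketch assumes M is given as a list of index pairs covering [2n]; the function name is illustrative.

```python
def is_consistent(I, M):
    """True if the perfect matching M contains a perfect matching of I.

    I: iterable of Majorana indices; M: iterable of pairs covering [2n].
    """
    partner = {}
    for a, b in M:
        partner[a] = b
        partner[b] = a
    S = set(I)
    # Every element of I must be matched to another element of I
    return all(partner[i] in S for i in S)

M = [(1, 2), (3, 4), (5, 6), (7, 8)]
print(is_consistent((1, 2, 5, 6), M))   # both pairs of M lie inside I
print(is_consistent((1, 2, 3, 5), M))   # 3 and 5 are matched outside I
```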
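The following Lemma is straightforward.

Lemma 14. Consider a matching M and an interaction $I = \{i_1, i_2, \ldots, i_q\}$.
1. If M is consistent with I, then $\mathrm{Tr}\left(C_I\, \rho(M, \vec\lambda)\right) = \mathrm{sign}(\pi) \prod_{j=1}^{q/2} \lambda_{(i_{\pi(2j-1)},\, i_{\pi(2j)})}$, where π is the permutation that reorders the elements of I so that the pairs matched by M are adjacent.
2. If M is inconsistent with I, then $\mathrm{Tr}\left(C_I\, \rho(M, \vec\lambda)\right) = 0$.

Proof. In order for the trace to be nonzero, one needs to exactly match the Majorana operators in $C_I$ with some in the expansion of $\rho(M, \vec\lambda)$, since $\mathrm{Tr}(C_{I'}) = 0$ for any non-empty subset $I'$. If M is inconsistent, there is no term in the expansion of ρ which precisely matches $C_I$, so the expectation vanishes. If M is consistent, we have

$\mathrm{Tr}\left(C_I\, \rho(M, \vec\lambda)\right) = \mathrm{sign}(\pi) \prod_{j=1}^{q/2} \lambda_{(i_{\pi(2j-1)},\, i_{\pi(2j)})}.$

Here we have used that one can first reorder $C_I$ such that the pairs in the perfect matching are adjacent, i.e. $C_I = \mathrm{sign}(\pi)\, i^{q/2}\, c_{i_{\pi(1)}} c_{i_{\pi(2)}} \cdots c_{i_{\pi(q)}}$; then one can commute through each pair to its matching pair in ρ and use $(c_i c_j)^2 = -I$, $i^q = (-1)^{q/2}$ and $\mathrm{Tr}(I) = 2^n$.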
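Lemma 14 can be checked numerically on a small system. The sketch below assumes the matching-state form $\rho(M, \vec\lambda) = 2^{-n} \prod_{(m_1,m_2) \in M} (I + i\lambda_{(m_1,m_2)} c_{m_1} c_{m_2})$, the monomial convention $C_I = i^{q/2} c_{i_1} \cdots c_{i_q}$, and a Jordan-Wigner representation of the Majoranas; all three are assumptions of this illustration, and the helper names are hypothetical.

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def majoranas(n):
    """2n Majorana matrices on n modes (assumed Jordan-Wigner representation)."""
    return [reduce(np.kron, [Z] * j + [P] + [I2] * (n - j - 1))
            for j in range(n) for P in (X, Y)]

def monomial(c, I):
    """C_I = i^(q/2) c_{i_1} ... c_{i_q} for a sorted, 1-based index tuple I."""
    return (1j) ** (len(I) // 2) * reduce(np.matmul, [c[i - 1] for i in I])

def matching_state(c, M, lam):
    """rho(M, lam) = 2^-n prod_(a,b) (I + i lam_ab c_a c_b), pairs given with a < b."""
    dim = c[0].shape[0]
    rho = np.eye(dim, dtype=complex) / dim
    for (a, b), l in zip(M, lam):
        rho = rho @ (np.eye(dim) + 1j * l * c[a - 1] @ c[b - 1])
    return rho

def expectation(c, I, M, lam):
    """Tr(C_I rho(M, lam)), real for Hermitian C_I."""
    return np.trace(monomial(c, I) @ matching_state(c, M, lam)).real

c = majoranas(3)
M = [(1, 2), (3, 4), (5, 6)]
lam = [1, -1, -1]
print(expectation(c, (1, 2, 5, 6), M, lam))   # consistent: product of two lambdas
print(expectation(c, (1, 2, 3, 5), M, lam))   # inconsistent: vanishes
```

A matching whose pairs interleave inside I, such as $M = \{(1,3), (2,4), (5,6)\}$ with $I = \{1,2,3,4\}$, additionally picks up the sign of the reordering permutation.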

Approximation ratios for sparse fermionic Hamiltonians
In this section we prove Theorem 5. We begin by setting up needed definitions and stating several technical Lemmas (which are proved in the Appendices).
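The key auxiliary notion in the proof of Theorem 5 is that of a diffuse subset of Hamiltonian terms. Intuitively, the terms in a diffuse subset are well separated from each other while covering only a limited part of the system. This idea is formalized as follows:

Definition 15. Consider a set of q-local interactions $\mathcal{I}$ on 2n Majorana operators. A subset of these interactions $\mathcal{I}' \subset \mathcal{I}$ is diffuse with respect to $\mathcal{I}$ if the following three conditions apply: [...]

Lemma 16. Let $\mathcal{I}$ be the interaction set of a k-sparse strictly q-local Hamiltonian on [2n]. The set $\mathcal{I}$ can be split into Q disjoint subsets, $\mathcal{I} = \bigcup_{\alpha=1}^{Q} \mathcal{I}_\alpha$, each of which is diffuse with respect to $\mathcal{I}$. The parameter Q is given as $Q = q(q-1)(k-1)^2 + q(k-1) + 2$ and does not depend on n. The construction of this splitting can be done efficiently, in time poly(n).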
Lemma 16 is a special case of Lemma 19, which is proven in Appendix B. The proof relies on a combinatorial argument on a graph that takes Hamiltonian terms as vertices and connects them with an edge if the pair violates conditions 1 or 2 of Definition 15. By the sparsity assumption, this graph has an efficiently constructable coloring with a bounded number of colors, from which the split $\mathcal{I} = \bigcup_{\alpha=1}^{Q} \mathcal{I}_\alpha$ can be constructed.
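The coloring step can be illustrated by a greedy coloring of a conflict graph on the Hamiltonian terms, which uses at most (maximum degree + 1) colors. In the sketch below the conflict rule is a simplified stand-in: two terms are taken to conflict when they share a Majorana operator. This rule is an assumption of the illustration and is not the full set of conditions in Definition 15; the function names are hypothetical.

```python
import itertools

def shares_majorana(I1, I2):
    """Stand-in conflict rule (assumption): the two terms overlap."""
    return bool(set(I1) & set(I2))

def greedy_split(terms, conflict=shares_majorana):
    """Split distinct terms into color classes with no conflicting pair inside a class."""
    color = {}
    for t in terms:
        used = {color[s] for s in color if conflict(s, t)}
        # Smallest color not used by an already-colored conflicting term
        color[t] = next(c for c in itertools.count() if c not in used)
    num = max(color.values(), default=-1) + 1
    return [[t for t in terms if color[t] == a] for a in range(num)]

# Hypothetical example: ring of overlapping 4-local terms on 12 Majoranas
terms = [tuple(sorted(((2 * j + d) % 12) + 1 for d in range(4))) for j in range(6)]
classes = greedy_split(terms)
print(len(classes), classes)
```

Because each term conflicts with a number of other terms bounded in terms of q and k only, the number of color classes produced this way does not grow with n.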
The usefulness of diffuse sets comes from Lemma 20; see its proof in Appendix C. Here we state its corollary, relevant to proving Theorem 5:

Lemma 17. Let the interaction set $\mathcal{I}'$ be diffuse w.r.t. $\mathcal{I} \supset \mathcal{I}'$ ($\mathcal{I}'$ and $\mathcal{I}$ are strictly q-local and k-sparse). If $n > (q^2 - 1)k$, one can efficiently construct a matching M of the set [2n] that is consistent with each interaction in $\mathcal{I}'$ and inconsistent with each interaction in $\mathcal{I} \setminus \mathcal{I}'$.

With the matchings introduced above, one can construct useful Gaussian states. The tool to do so is given by the following statement:

Lemma 18. Let $H = \sum_{I \in \mathcal{I}} J_I C_I$ be strictly q-local and $\mathcal{I}'$ be a diffuse subset of $\mathcal{I}$. Let M be a matching of [2n] as guaranteed by Lemma 17. One can efficiently construct a Gaussian state $\rho_{\mathcal{I}'}$ with the property:

$\mathrm{Tr}(H \rho_{\mathcal{I}'}) = \sum_{I \in \mathcal{I}'} |J_I|.$

Lemma 18 is a specific case of a slightly more general Lemma 21, which is stated and proven in Appendix D. We denote $J(\mathcal{I}') := \sum_{I \in \mathcal{I}'} |J_I|$. As shown below, Theorem 5 can be proven by constructing a diffuse $\mathcal{I}' \subset \mathcal{I}$ and a corresponding Gaussian state $\rho_{\mathcal{I}'}$ with large enough $\mathrm{Tr}(H\rho_{\mathcal{I}'}) = J(\mathcal{I}')$.

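Theorem (Repetition of Theorem 5). Let H be a traceless fermionic Hamiltonian on 2n Majoranas with maximal eigenvalue $\lambda_{\max}(H)$. If H is k-sparse and strictly q-local and $n > (q^2 - 1)k$, a Gaussian state ρ can be efficiently constructed, such that

$\frac{\mathrm{Tr}(H\rho)}{\lambda_{\max}(H)} \geq \frac{1}{Q}, \qquad Q = q(q-1)(k-1)^2 + q(k-1) + 2.$

Proof. For a Hamiltonian $H = \sum_{I \in \mathcal{I}} J_I C_I$, we construct the splitting of $\mathcal{I}$ into diffuse subsets $\mathcal{I} = \cup_\alpha \mathcal{I}_\alpha$ as guaranteed by Lemma 16. Next, find $\alpha = \mathrm{argmax}_{\alpha'} J(\mathcal{I}_{\alpha'})$; since Q in Lemma 16 is constant, α can be found efficiently. Next, use Lemma 17 to construct a matching $M(\mathcal{I}_\alpha)$ (the condition $n > (q^2 - 1)k$ is satisfied by the assumptions of Theorem 5). Since $\mathcal{I}_\alpha$ is diffuse with respect to $\mathcal{I}$, the Gaussian state $\rho_{\mathcal{I}_\alpha}$ can be efficiently constructed from $M(\mathcal{I}_\alpha)$ via Lemma 18.

Using $\mathrm{Tr}(H\rho_{\mathcal{I}_\alpha}) = J(\mathcal{I}_\alpha)$ (recall $J(\mathcal{I}') = \sum_{I \in \mathcal{I}'} |J_I|$), the following inequality can be obtained for the resulting approximation ratio:

$\frac{\mathrm{Tr}(H\rho_{\mathcal{I}_\alpha})}{\lambda_{\max}(H)} \geq \frac{J(\mathcal{I}_\alpha)}{\sum_{I \in \mathcal{I}} |J_I|} \geq \frac{1}{Q}.$   (23)

For the first inequality, note that $\lambda_{\max}(H) \leq \sum_{I \in \mathcal{I}} |J_I|\, \|C_I\| = \sum_{I \in \mathcal{I}} |J_I|$. The second inequality comes from a pigeonhole-type argument: if $J(\mathcal{I}_{\alpha'}) < \frac{1}{Q} \sum_{I \in \mathcal{I}} |J_I|$ held for every α′, then summing over the Q disjoint subsets would give $\sum_{I \in \mathcal{I}} |J_I| < \sum_{I \in \mathcal{I}} |J_I|$, a contradiction. Eq. (23) concludes the proof, as it asserts the approximation ratio bound claimed in the Theorem.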
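The selection step of the proof, picking the subset with the largest total coupling strength, is a plain argmax over the Q subsets. The sketch below assumes that $J(\mathcal{I}_\alpha)$ denotes the sum of $|J_I|$ over the terms of the subset and uses a made-up set of couplings; it only illustrates the pigeonhole bound that the best subset carries at least a 1/Q fraction of the total weight.

```python
def best_subset(subsets, J):
    """Return (alpha, J(I_alpha)) maximizing the summed |J_I| over the given subsets."""
    weights = [sum(abs(J[I]) for I in S) for S in subsets]
    alpha = max(range(len(subsets)), key=lambda a: weights[a])
    return alpha, weights[alpha]

# Hypothetical couplings and a hypothetical split into three disjoint subsets
J = {(1, 2, 3, 4): 0.5, (5, 6, 7, 8): -1.5, (3, 4, 5, 6): 2.5, (1, 2, 7, 8): -0.25}
subsets = [[(1, 2, 3, 4), (5, 6, 7, 8)], [(3, 4, 5, 6)], [(1, 2, 7, 8)]]

alpha, weight = best_subset(subsets, J)
total = sum(abs(v) for v in J.values())   # upper bound on lambda_max(H)
print(alpha, weight, weight / total >= 1 / len(subsets))
```

The ratio `weight / total` is a lower bound on the approximation ratio only if a Gaussian state attaining energy `weight` exists, which is what the diffuseness of the chosen subset is used for.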

Sparse Hamiltonians with terms of weight 2 and 4
In this section we prove Theorem 6. We will again need to use the concept of diffuse subsets in Definition 15. The proof of Theorem 6 is similar in its basic idea to that of Theorem 5. The main obstacle in this case is the presence of terms of different weight, which does not allow one to use Lemmas 16-18 directly. This can be resolved by a slightly more elaborate construction and by applying the more general Lemmas 19-21, which are proved in the Appendices and from which Lemmas 16-18 directly follow as special cases.

Lemma 19 (Generalization of Lemma 16). Let $\mathcal{I}$ be the interaction set of a k-sparse q-local Hamiltonian on the set of Majorana fermions [2n]. The set $\mathcal{I}$ can be split into $(qQ)/2$ disjoint, strictly $2q'$-local subsets $\mathcal{I}^{(2q')}_\alpha$ ($q' = 1, \ldots, q/2$; $\alpha = 1, \ldots, Q$), each of which is diffuse with respect to $\mathcal{I}$:

$\mathcal{I} = \bigcup_{q'=1}^{q/2} \bigcup_{\alpha=1}^{Q} \mathcal{I}^{(2q')}_\alpha.$

The parameter $Q = q(q-1)(k-1)^2 + q(k-1) + 2$ does not grow with n. The construction of this splitting can be done efficiently, in time poly(n).
Lemma 20 (Generalization of Lemma 17). Let a strictly $q'$-local $\mathcal{I}'$ be diffuse w.r.t. a q-local k-sparse $\mathcal{I}$ on [2n], such that $n > (q^2 - 1)k$. One can efficiently construct a matching M of [2n] that is consistent with $\mathcal{I}'$ and inconsistent with all interactions $I \in \mathcal{I} \setminus \mathcal{I}'$ such that (1) $|I| \geq q'$ or (2) $I \not\subset \mathrm{Sup}(\mathcal{I}')$.
Lemma 21 (Generalization of Lemma 18). Let $H = \sum_{I \in \mathcal{I}} J_I C_I$ [...]. If M is consistent with $\mathcal{I}'$ and inconsistent with $\mathcal{I} \setminus \mathcal{I}'$, one can efficiently construct a Gaussian state $\rho_{\mathcal{I}'}$ with the property:

$\mathrm{Tr}(H \rho_{\mathcal{I}'}) = \sum_{I \in \mathcal{I}'} |J_I|.$

In Lemma 21, we use $n'$ instead of $n$ to avoid confusion, as it will also be used for $n' \neq n$. The Lemmas above are proven in Appendices B-D. With these in hand, we are ready to proceed with the proof of Theorem 6.

Theorem (Repetition of Theorem 6). Let H be a traceless fermionic Hamiltonian on [2n] with maximal eigenvalue $\lambda_{\max}(H)$. If H is k-sparse with terms of weight 2 and 4 and $2n > 15k$, a Gaussian state ρ can be efficiently constructed, such that [...]

Proof. We make use of the construction in Ref. [12] which relates a Hamiltonian with terms of weight 2 and 4 on a set of fermionic modes [2n], that is, [...] to a strictly 4-local Hamiltonian $\tilde H$ on an extended set of fermions [2n + 2]: [...] (28). [...] $\tilde H$ can also be written as: [...] The relation between $H$ and $\tilde H$ is via the following property:

Lemma 22 (Lemma 6 of [12]). For $H$ and $\tilde H$ introduced above, $\lambda_{\max}(H) = \lambda_{\max}(\tilde H)$. Moreover, for any Gaussian state $\tilde\rho$ of 2n + 2 Majorana modes, one can efficiently compute a Gaussian state ρ of 2n Majorana modes s.t. $\mathrm{Tr}(H\rho) \geq \mathrm{Tr}(\tilde H \tilde\rho)$.
Although strictly 4-local, the Hamiltonian $\tilde H$ is no longer sparse, since the operators $c_{2n+1}$ and $c_{2n+2}$ participate in $|\mathcal{I}^{(2)}|$ terms (which is generally $O(n)$). This prevents a direct application of Lemma 16 to $\tilde H$. We resolve the issue as follows.
Similarly to the proof of Theorem 5, we start by splitting each set of the original interactions $\mathcal{I}^{(2,4)}$ in H into subsets diffuse w.r.t. $\mathcal{I}^{(2)} \cup \mathcal{I}^{(4)}$: $\mathcal{I}^{(2,4)} = \bigcup_\alpha \mathcal{I}^{(2,4)}_\alpha$. Each of the two splittings exists and can be done efficiently, as guaranteed by Lemma 16 (since the original H is sparse). Since $\mathcal{I}^{(2)} \cup \mathcal{I}^{(4)}$ is k-sparse and 4-local, we can bound [...] In what follows, we will use the splittings $\mathcal{I}^{(2,4)}_\alpha$ to construct two Gaussian states $\tilde\rho(\mathcal{I}^{(2)}_\alpha)$ and $\tilde\rho(\mathcal{I}^{(4)}_\alpha)$ [...] With these Gaussian states, we will then show that the Gaussian state $\tilde\rho(\mathcal{I}^{(q)}_\alpha)$ [...] is efficiently constructable and yields the desired approximation ratio for $\tilde H$. We will then apply Lemma 22 and extend the statement to the original Hamiltonian H, thus finishing the proof.
Following the outline above, we now move to construct the Gaussian state $\tilde\rho(\mathcal{I}^{(2)}_\alpha)$. [...] $\mathcal{I}^{(2)}_\alpha$ is 2-local and diffuse w.r.t. $\mathcal{I}^{(2)} \cup \mathcal{I}^{(4)}$, which is 4-local. Since $2n > 15k$ by the assumptions of Theorem 6, we can apply Lemma 20 with $q = 4$ to construct a matching $M(\mathcal{I}^{(2)}_\alpha)$ [...] Since $\mathcal{I}^{(2)}_\alpha$ is 2-local, Lemma 20 also implies that the matching $M(\mathcal{I}^{(2)}_\alpha)$ [...]. This implies the following expression (using Eq. (28) for $\tilde H$): [...] By choosing σ to be the +1 eigenstate projector of the operator $-i c_{2n+1} c_{2n+2}$, we arrive at the desired outcome: [...] The constructed Gaussian state $\tilde\rho$ we will denote as $\tilde\rho(\mathcal{I}^{(2)}_\alpha)$.

Next, for $\mathcal{I}^{(4)}$, we construct the Gaussian states $\tilde\rho(\mathcal{I}^{(4)}_\alpha)$ [...]

Figure caption: [...] shown in green. Vertices $i_1$, $i_2$ are chosen not to belong to the same term in $\mathcal{I}^{(2)}$, ensuring no accidental consistency with a term in $\tilde H$. Note the special status of the term from $\mathcal{I}^{(2)}$ that is a subset of the $\mathcal{I}^{(4)}_\alpha$ term. From the perspective of $\tilde H$, it is not consistent with $M(\mathcal{I}^{(4)}_\alpha)$ although it coincides with an edge from $M(\mathcal{I}^{(4)}_\alpha)$. This is due to the intentional absence of the edge $(2n + 1, 2n + 2)$ in $M(\mathcal{I}^{(4)}_\alpha)$.

[...] $\mathcal{I}^{(4)}_\alpha$ but may be consistent with some terms in $\mathcal{I}^{(2)}$ (as those $I$ do not obey the $|I| \geq q' = 4$ condition). At the same time, we aim to achieve $\mathrm{Tr}(\tilde H \tilde\rho(\mathcal{I}^{(4)}_\alpha))$ [...], which excludes contributions from $\mathcal{I}^{(2)}$. Thus we cannot extend $M(\mathcal{I}^{(4)}_\alpha)$ [...] Instead, we will create a matching of [2n + 2] using a reduced version of $M(\mathcal{I}^{(4)}_\alpha)$ which inherits its beneficial properties, and then complete the matching by making it inconsistent with $\tilde{\mathcal{I}}^{(2)}$, eliminating the difficulty described above.
To enable this, we find and mark an edge (i α ).This implies that as a twofermion interaction, {i 1 , i 2 } is guaranteed not to belong to I (2) .The latter statement is the key property of the marked edge (i 1 , i 2 ) that we will employ momentarily.
The Gaussian state claimed in Theorem 6 is to be chosen among the states ρ(I (2,4) α ) whose existence we've proven above.We make the choice by identifying the highest energy in the respective Gaussian state: (q, α) = argmax (q,α) J (q) α .As we showed, the respective Gaussian state ρ(I (q) α ) can be efficiently constructed and the following is guaranteed: Here we used that λ max ( H) With the state ρ(I (q) α ) on [2n + 2] fermions at hand, we finalize the proof by an application of Lemma 22.This relates λ max (H) to λ max ( H) and allows us to efficiently construct the Gaussian state ρ(I with the desired property: where Proof.In what follows we will omit the normalization 1/ √ 2kn in Eq. ( 5), of course this normalization is irrelevant for lowerbounding the Gaussian approximation ratio.We split the Hamiltonian H (k ′ ) is k ′ -sparse and the residual Hamiltonian h (k ′ ) contains the rest of H.The term sets are denoted as follows: To define such a split, we use the following deterministic algorithm.For every given Majorana, we list the interactions I ∈ I which involve that Majorana using a lexicographical order for the words I = {i 1 , i 2 , i 3 , i 4 }.For each Majorana where such a list is longer than k ′ , we mark all elements except for the first k ′ .All terms of H which were marked this way at least once, we include into h (k ′ ) .The rest of the terms enter H (k ′ ) , which by this construction is k ′ -sparse.To continue the proof we need a pair of Lemmas.The first lower bounds the total interaction strength of the SSYK-4 Hamiltonian: Lemma 23.With probability at least 1 − 2e − kn 32 , we have This statement is proven in Appendix E, by splitting the problem into upper bounding |I| separately from I |J I |, and then applying the Chernoff bound for both.
The second lemma, Lemma 24, shows that the total interaction strength of the residual Hamiltonian $h^{(k')}$ is bounded from above with high probability; it is proven in Appendix E. The key technical difficulty is bounding the random variable $|\bar{\mathcal{I}}^{(k')}|$, which does not reduce to a sum of independent variables, so a simple Chernoff bound cannot be applied. Instead, we apply an exponential version of the Efron-Stein inequality [25].
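The deterministic split of $H$ into a $k'$-sparse part and a residual part, as described in this proof, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `split_sparse` and `max_degree` are hypothetical, interactions are assumed to be given as 4-tuples of Majorana indices, and ties are resolved purely by the lexicographic order of the sorted index tuples.

```python
def split_sparse(interactions, k_prime):
    """Split interactions into a k'-sparse part and a residual part.

    For every Majorana index, the interactions containing it are listed in
    lexicographic order; all but the first k' of them are marked. Marked
    interactions form the residual part, the rest form the sparse part.
    """
    # Lexicographic order of the words I = (i1, i2, i3, i4).
    ordered = sorted(tuple(sorted(I)) for I in interactions)

    # For each Majorana, the ordered list of interactions that involve it.
    per_majorana = {}
    for I in ordered:
        for i in I:
            per_majorana.setdefault(i, []).append(I)

    # Mark everything beyond the first k' entries of each list.
    marked = set()
    for listed in per_majorana.values():
        marked.update(listed[k_prime:])

    sparse_part = [I for I in ordered if I not in marked]
    residual = [I for I in ordered if I in marked]
    return sparse_part, residual


def max_degree(interactions):
    """Largest number of interactions that any single Majorana takes part in."""
    degree = {}
    for I in interactions:
        for i in I:
            degree[i] = degree.get(i, 0) + 1
    return max(degree.values(), default=0)
```

By construction every Majorana keeps at most $k'$ unmarked interactions, so `max_degree(sparse_part) <= k_prime` always holds, which is the $k'$-sparsity claimed for $H^{(k')}$.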
To build a Gaussian state with finite approximation ratio, we apply the construction of Theorem 5 to $H^{(k')}$, which is $k'$-sparse and strictly 4-local. If $n$ is large enough (i.e. $n > (q^2 - 1)k'$ for $q = 4$), this state ρ is guaranteed to yield energy Tr(H 23) in the proof of Theorem 5). At the same time, with high probability |Tr(h 23 and 24). The resulting approximation ratio is then: Crucially, the second term decays exponentially with $k'$ and the first term only algebraically (note here the definition of $Q'$). We now fix $k' = 8(k + 1)$, consistent with the requirement $k' \ge e^2 k + 1$ of Lemma 24. In this case as a function of $k$ is always smaller than $\frac{1}{2Q'}$. This allows us to bound the right hand side of Eq. (42) as $\frac{1}{2Q'}$, and substituting $k' = 8(k+1)$ we obtain the bound claimed in the Theorem: The earlier assumed condition $n > (q^2 - 1)k'$ for $q = 4$ and $k' = 8(k+1)$ translates into $n > 120(k + 1)$. Given the conditions of Lemmas 23 and 24, the bound in Eq. (43) holds with the probability:

8 Upper bound on Gaussian approximation ratio for SYK-q Hamiltonians

8.1 Gaussian upper bound for SYK-q models

We consider the expectation value of a SYK-$q$ Hamiltonian $H$ with respect to fermionic Gaussian states and we obtain an upper bound on its expectation value, with high probability over the random couplings $J_I$.
Lemma (Repetition of Lemma 9).Let H denote a Hamiltonian drawn from the q-local SYK Hamiltonians (with q ≥ 4 even and q = O(1)), i.e. the coupling strengths J I are drawn according to their distribution.With probability at least 1 − exp(−Ω(n)), H has the property that, for any fermionic Gaussian state ρ Tr(Hρ) ≤ (q−1)!! 2 1/2−q/4 q 1/2+q/2 × log[q/ log(3/2)] (2n) 1−q/4 .(45) Proof.We first use Wick's theorem on the expectation of a product of Majorana operators w.r.t. a fermionic Gaussian state ρ characterized by a correlation matrix Γ, see Eq. (15).Note that the correlation matrix Γ i<j can be viewed as a real d := (2n 2 −n)-dimensional vector.We note that i<j Let M (I) be a perfect matching of the indices in I (|I| even), there are (q − 1)!! such matchings.We have Here we have assumed that for each matching M (I); i 1 (M ) < i 2 (M ), i 3 (M ) < i 4 (M ), . .., i q−1 (M ) < i q (M ), i.e. any sign arising from getting the expression to this form is absorbed in sign M (I) ).
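The counting of perfect matchings that enters the Wick expansion can be checked with a small Python sketch. It is an illustration rather than part of the proof. The helper names are hypothetical, and the sign of a matching is assumed to follow the standard Pfaffian convention: it is the sign of the permutation obtained by listing the matched pairs in order, with $i_1 < i_2$ inside each pair. Any overall phase coming from the convention of Eq. (15) is ignored.

```python
def perfect_matchings(indices):
    """Yield all perfect matchings of an even-sized list of indices."""
    indices = list(indices)
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]
        for sub in perfect_matchings(remaining):
            yield [(first, partner)] + sub


def matching_sign(matching):
    """Sign of the permutation (i1, i2, i3, i4, ...) read off the pairs."""
    flat = [v for pair in matching for v in pair]
    inversions = sum(
        1
        for a in range(len(flat))
        for b in range(a + 1, len(flat))
        if flat[a] > flat[b]
    )
    return -1 if inversions % 2 else 1


def double_factorial(m):
    """m!! = m (m-2) (m-4) ... for a positive integer m."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def pfaffian(gamma, indices):
    """Signed sum over matchings of products of entries gamma[i][j], i < j."""
    total = 0
    for matching in perfect_matchings(sorted(indices)):
        term = matching_sign(matching)
        for i, j in matching:
            term *= gamma[i][j]
        total += term
    return total
```

For an index set of size $q$ the generator produces exactly $(q-1)!!$ matchings, and for $q = 4$ the signed sum reduces to the familiar combination $\Gamma_{12}\Gamma_{34} - \Gamma_{13}\Gamma_{24} + \Gamma_{14}\Gamma_{23}$.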
The expectation of H in Eq. ( 2) w.r.t.fermionic Gaussian states ρ can be written as: sign M (I) sign M (I) We note that we can view Tr(Hρ) as a sum of (q −1)!! terms, one for each matching M of some subset of indices I, i.e.Tr(Hρ) = M Tr(H M ρ) where M ) .We have defined the q/2-way, d × d × . . .× d, tensor J(M, I), whose entries are equal to either zero (when the indices coincide or are not ordered properly) or to a standard Gaussian random variable.Each J I appears only once in Tr(H M ρ) and therefore all entries of J(M, I) are statistically independent.We note that sign(M ) does not depend on which (ordered) subset I one chooses.To bound each term Tr(H M ρ), with high probability, we invoke the following Lemma: Lemma 25. (Theorem 1 in [26].)Let A be a random K-way tensor ∈ R d1×d2×...×d K and w i be vectors ∈ R di and If we have for each fixed unit vector w i /∥w i ∥ (i ∈ {1, . . ., K}): , with probability at least 1 − δ.
To apply the Lemma, note that the vectors w i correspond to Γ i<j viewing i < j as a single index and we can use their norm ∥Γ∥ ≤ n1/2 .In addition, for each entry in the tensor we have E exp t J(M, I) k1,...,k q/2 ≤ exp t 2 /2 (for t ≥ 0) as the entry is zero or a Gaussian variable with variance 1 and mean zero.Using Chernoff's bound and the fact that all entries of J(M, I) are statistically independent, we conclude that for any set of real vectors w 1 , . . ., w q/2 one has Pr Therefore, for each term H M we can apply Lemma 25 and, using K = q/2 and σ = 1, obtain with probability at least 1 − δ.Then we can first bound max ρ Gaussian where we have used that 2n q ≥ (2n/q) q .We can now combine the upper bound in Eq. (50) and Eq.(51).Applying the union bound, we have with probability at least 1 − (q − 1)!! δ, that max ρ Gaussian Tr(Hρ) ≤ (q − 1)!! 2 1−q/2 q q+1 × (2n) 2−q/2 − (2n) 1−q/2 log q/ log(3/2) + 2 3−q/2 q q (2n) −q/2 log 2δ −1 Therefore, we can take δ = exp − Ω(n) such that, asymptotically, we have (assuming q = O(1)): Tr(Hρ) ≤ (q − 1)!! 2 1/2−q/4 q 1/2+q/2 × log[q/ log(3/2)] (2n) 1−q/4 , (53) with probability at least 1 − δ.Note that in deriving this upper bound we only use the norm of the correlation matrix Γ, hence this upper bound is not necessarily achievable by a Gaussian state as the constraint Γ T Γ ≤ I imposes more conditions on Γ than just an upper bound on its norm.

Maximum eigenvalue lower bound for q-local SYK Hamiltonians
To show that fermionic Gaussian states cannot achieve a constant approximation ratio for $q \ge 4$ SYK models, we derive a lower bound on the maximum eigenvalue of the Hamiltonians $H$ in Eq. (2):
Lemma (Repetition of Lemma 10). For the class of $q$-local SYK Hamiltonians (with even $q \ge 4$) in Eq. (2), $\lambda_{\max}(H) = \Omega(\sqrt{n})$ with probability at least $1 - \exp(-\Omega(n))$ over the draw of Hamiltonians.
The remainder of this section will be devoted to proving this Lemma. The techniques used are similar to those used in Section 6 of Ref. [2]. Throughout this section, we use $C$ to denote a quantity that is constant in $n$, or is bounded from above and below by constants in $n$; its value will generally differ from appearance to appearance. Importantly, $C$ can contain factors of $q$ (note that $q = O(1)$).
We start by obtaining a lower bound on the maximum eigenvalue of a so-called 2-colored SYK model and will use this to prove Lemma 10.The Hamiltonian of such a 2-colored SYK model is slightly different from the standard SYK model Hamiltonian in Eq. (2).We divide the 2n Majorana operators into two subsets, with sizes n 1 and n 2 (n 2 ≤ n 1 ), and denote the operators in the first set by ϕ 1 , . . ., ϕ n1 and the ones in the second set by χ 1 , . . ., χ n2 .The Hamiltonian is now given by 1 : where Here ϕ S the product of q − 1 of the ϕ Majorana operators in subset S, and J S,j are independent Gaussian random variables.The subset S labels an ordered subset of q − 1 Majorana operators (note that these are different from the subsets I defined before that correspond to ordered subsets of q Majorana operators).We note that the (Hermitian) τ j operators do not necessarily obey {τ j , τ k } = 2δ jk I, but instead satisfy E({τ j , τ k }) = −i q−2 δ jk I.

Lemma 26.
Let $\{\phi_i\}_{i=1}^{n_1}$ and $\{\chi_i\}_{i=1}^{n_2}$ be $n_1 + n_2$ Majorana operators. For the class of $q$-local 2-colored SYK Hamiltonians (with even $q \ge 4$) in Eq. (54) defined in terms of these Majorana operators, the maximum eigenvalue $\lambda_{\max}(H)$ of the Hamiltonian is lower bounded by $C\sqrt{n}$ (with $C$ a constant) with probability at least $1 - \exp(-\Omega(n))$ over the draw of Hamiltonians.
Proof.We introduce a new set of Majorana operators (again of size n 2 ) σ 1 , . . ., σ n2 (which do obey {σ j , σ k } = 2δ jk I) and we define the quadratic Hamiltonian H ′ : This quadratic Hamiltonian H ′ is optimized by the fermionic Gaussian state ρ 0 = The idea is now to construct a new state ρ θ obtained from ρ 0 by applying a unitary transformation to ρ 0 , and to find a lower bound for the expectation value of H (2)  w.r.t.ρ θ .(57) The expectation value of H (2) w.r.t.ρ θ is: Using the BCH expansion of H θ and Tr(H (2) ρ 0 ) = 0 , we obtain: where we have used the triangle inequality and ∥•∥ denotes the spectral norm.To lower bound Tr(H (2) ρ θ ), one now has to (i) lower bound θ Tr([ζ, H (2) ]ρ 0 ) and (ii) upper bound θ 2 ∥ [ζ, [ζ, H (2) ]] ∥.This proof technique is similar in spirit to the proof in [27], although their proof is for qubit Hamiltonians with boundeddegree interactions.
First, we find a lower bound for θ Tr([ζ, H (2) ]ρ 0 ) which holds with high probability: Tr([ where we have used that Tr([τ j σ j , τ k χ k ]ρ 0 ) is non-zero only for j = k, and the definition of τ j .The quantity Tr([ζ, H (2) ]ρ 0 ) is thus a chi-squared random variable (up to normalization factors and potentially a sign) with n 2 n1 q−1 degrees of freedom and its expectation value is given by: where we have used that E J S,j 2 = 1.We note that in order to obtain a positive first-order contribution to Tr(H (2) ρ θ ), one should take θ positive for q/2 even, and one should take θ negative for q/2 odd.Since Tr([ζ, H (2) ]ρ 0 ) is a chi-squared random variable with n 2 n1 q−1 degrees of freedom, the following tail bounds can be obtained [28]: for q/2 even, and ) for q/2 odd.The random variable Tr([ζ, H (2) ]ρ 0 ) is thus equal to 2 √ n 2 (−1) q/2 in expectation and the probability that -for any even q ≥ 4 -its norm is smaller than half the norm of this expectation is at most exponentially small in the system size.
In order to upper bound where the final sum over S, S ′ , S ′′ is over all (all sums over S, S ′ , S ′′ will implicitly have this constraint from now on).The nested commutator in this expression simplifies as follows (note that the product of i 3q/2−2 and the nested commutator is Hermitian): where (ϕ K σ k σ l χ j ) H denotes a Hermitian version of ϕ K σ k σ l χ j (i.e., ϕ K σ k σ l χ j up to potential integer powers of i) and K := (S△S ′ △S ′′ ) ∪ (S ∩ S ′ ∩ S ′′ ) (note that |K| is odd).We therefore have: where we have defined We now wish to find an upper bound on the expected value of the spectral norm of [ζ, [ζ, H (2) ]].And in addition, we would like to show that the spectral norm exceeds twice the value of this upper bound with probability that is at most exponentially small in the system size.To establish this, we will have to show the following: for even k proportional to the system size and for some α.Eq. ( 68) implies two things: First, since (2) ]] ∥ ≤ α (i.e., α is the upper bound on the expected value of the spectral norm).Second, applying Markov's inequality to the random variable ∥ [ζ, [ζ, H (2) ]] ∥ and using Eq. ( 68) yields with α ′ ≥ α.So taking α ′ = 2α and k equal to the system size 2n (= 2n 2 +n 1 ) yields the desired result of the probability of the spectral norm exceeding twice the value of the upper bound being at most exponentially small in the system size.
For convenience, we define A := [ζ, [ζ, H (2) ]].Since A is Hermitian (by direct calculation), the spectrum of A 2 is non-negative and therefore we have ∥A∥ k = λ max (A 2 ) k/2 ≤ Tr(A k ) (for even k).Using Eq. (66), we express A as C S⊆[2n2+n1] Q S C S for convenience, where C is a non-negative constant, Q S are real random variables, and C S denotes a Hermitian (even) Majorana monomial.In addition, we define the random variable (which is obtained by replacing Majorana monomials in A with 1) If we now assume that both hold for some even k and some constant α (note that the first condition will automatically be satisfied since {J S,j } is a collection of independent standard Gaussian random variables), then for even k we can establish where the first inequality is again Jensen's inequality and we have also used that E Tr(A k ) is real (since A is Hermitian) and that Re Tr C S1 ...C Sk is always at most 2 n2+n1/2 (note that Tr C S1 ...C Sk equals 2 n2+n1/2 up to integer powers of i, but imaginary contributions vanish in the sum).This establishes Eq. ( 68), and thereby the desired result.Therefore, what is left is to show that the second condition in Eq. ( 71) is satisfied.
From this point onward, we shall take n 1 and n 2 proportional to n, where 2n = 2n 2 + n 1 denotes the total number of Majorana operators.We now show that the second condition in Eq. ( 71) is satisfied for k = 2n and α = C √ n.In order to do so, we show that (where the factor of 2 n2+n1/2 is absorbed in C 2n ).To that end, we thus need to find an upper bound on the (2n)th moment of the random variable A(1) in Eq. ( 70).
In Appendix F, we derive this upper bound and indeed show that which is the desired result.Combining Eq. (59), Eqs.(62),(63) and Eq. ( 73), we conclude that there exists a θ = O(1) such that Tr(H (2) with probability at least 1 What is left is to show that this result also holds for the standard SYK Hamiltonian.This translation from 2-colored SYK Hamiltonian to standard SYK Hamiltonian is given in Lemma 27 below, and its proof is given in Appendix G.
This also concludes the proof of Lemma 10, i.e., that $\lambda_{\max}(H) = \Omega(\sqrt{n})$ with probability at least $1 - \exp(-\Omega(n))$ over the draw of standard SYK Hamiltonians.

A Extensive sets of all anti-commuting terms
One can easily prove that when one maps a dense, non-sparse, fermionic model such as the SYK model onto a qubit Hamiltonian, the locality of the resulting Hamiltonian has to grow as some function of n, due to the following Lemma: Lemma 28.Any set of all-mutually anti-commuting Pauli strings {Q i } m−1 i=0 , each of weight at most k, on n qubits has cardinality m bounded as assuming that k(k − 1) < n.
Proof.Take Q 0 of weight at most k and let m − 1 Paulis Q i anticommute with it.We can represent each Pauli string as a 2n-bit string y, say Q 0 = y x y z where the Hamming weight |y x | ≤ k, |y z | ≤ k.Any other Q i in the set has to anti-commute with Q 0 on the support of the string y.First, note that the set of strings of length at most 2k which have symplectic inner product equal to 1 (so anti-commute) to a given string of length 2k is at most 2 2k−1 .Now we pick the largest subset M 1 of the set of elements Q 1 , . . .Q m−1 such that all elements in the subset act identically on the support of Q 0 , i.e. are represented by the same string of length at most 2k while differing beyond the support of Q 0 .Let the cardinality of this set be as the largest set should at least be a fraction 1/2 2k−1 of the total.So now we consider this set M 1 and their action on the remaining n−k qubits (outside the support of Q 0 ), where these elements all have to anti-commute.
In addition, each element has Pauli weight at most k − 1 (as we had to overlap with at least one Pauli with Q 0 ).We then reapply this argument on this set, leading to a new set M 2 with |M 2 | = m 2 ≥ m1−1 2 2(k−1)−1 acting on n − 2k qubits and having weight k − 2 etc.We can reiterate this process l times so that the remaining weight of the set of Pauli strings M l has k − l = 1.This implies that M l can contain at most 3 elements since they all need to anti-commute on a single qubit (assuming that n − kl > 0 or n − k(k − 1) > 0).So we have The SYK-4 model contains large (of size n) sets of mutually anti-commuting terms.An example is the set of all terms which only overlap on one fixed Majorana.Lemma 28 then shows that any fermion-to-qubit mapping (an encoding possibly using more qubits) will require the weight of some of the resulting Pauli terms to grow as a function of n.Note that the actual mapping by Bravyi and Kitaev [4] with k = O(log n) shows that the upper bound in Eq. (75) is not completely tight.
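The trade-off between the size of a mutually anti-commuting set and the Pauli weight can be illustrated with a short Python sketch. The Jordan-Wigner strings used here are an assumption made for illustration (the text refers to the Bravyi-Kitaev mapping instead), and the function names are hypothetical. Two Pauli strings anti-commute exactly when they differ on an odd number of sites where both act non-trivially, which is the symplectic inner product used in the proof above.

```python
def anticommute(p, q):
    """True if the Pauli strings p and q (e.g. 'XIZ', 'ZYI') anti-commute."""
    # Count sites where both strings act non-trivially with different Paulis.
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def weight(p):
    """Number of non-identity sites of a Pauli string."""
    return sum(1 for a in p if a != "I")


def jordan_wigner_majoranas(n):
    """The 2n strings Z...Z X I...I and Z...Z Y I...I on n qubits."""
    strings = []
    for j in range(n):
        for op in "XY":
            strings.append("Z" * j + op + "I" * (n - j - 1))
    return strings
```

In this example all $2n$ strings mutually anti-commute, and the largest weight equals $n$, in line with the statement that a large anti-commuting set forces the weight to grow with $n$.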
Another straightforward observation on the energy scaling of a model where all terms anti-commute is that $\lambda_{\max}$ does not necessarily scale with the number of terms, as captured by the following Lemma. This is the maximal eigenvalue that can be reached since one can map each $c_I$ onto a single Majorana operator $c_{i(I)}$, as these sets form identical algebras. Then we can use the normalization of $\beta_I$ to view $\sum_I \beta_I c_{i(I)} = \tilde{c}_1$ as a single Majorana operator $\tilde{c}_1$ (this is an example of the transformation in Eq. (9)). A single Majorana $\tilde{c}_1$ has spectrum $\pm 1$ and hence the (hugely degenerate) spectrum of $H$ is simply $\pm\sqrt{\sum_I J_I^2}$.
Thus, if all $J_I$ are of similar strength, we observe that the overall maximal energy scales as $\sqrt{|\mathcal{I}|}$ rather than $|\mathcal{I}|$.
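The spectrum of a Hamiltonian built from mutually anti-commuting terms can be verified numerically on a small example. The following pure-Python sketch is an illustration under stated assumptions: the four anti-commuting operators are taken to be the Jordan-Wigner Majoranas on two qubits, and the helper names are hypothetical. It checks that $H^2 = (\sum_I J_I^2)\,\mathbb{1}$, which forces all eigenvalues of $H$ to be $\pm\sqrt{\sum_I J_I^2}$.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

# Four mutually anti-commuting Hermitian operators that square to identity.
MAJORANAS = [kron(X, I2), kron(Y, I2), kron(Z, X), kron(Z, Y)]


def hamiltonian(couplings):
    """H = sum_i J_i c_i for the four operators above."""
    return [[sum(J * c[r][s] for J, c in zip(couplings, MAJORANAS))
             for s in range(4)] for r in range(4)]


def squares_to_multiple_of_identity(couplings, tol=1e-9):
    """Check H^2 = (sum_i J_i^2) * identity, entry by entry."""
    H = hamiltonian(couplings)
    H2 = matmul(H, H)
    target = sum(J * J for J in couplings)
    return all(abs(H2[r][s] - (target if r == s else 0)) < tol
               for r in range(4) for s in range(4))
```

With $m$ couplings of comparable magnitude $J$, the largest eigenvalue is therefore of order $J\sqrt{m}$, not $Jm$.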

B Splitting sparse Hamiltonians into diffuse interaction sets
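Lemma (Repetition of Lemma 19). Let $\mathcal{I}$ be the interaction set of a $k$-sparse $q$-local Hamiltonian on the set of fermions $[2n]$. The set $\mathcal{I}$ can be split into $(qQ)/2$ disjoint, strictly $2q'$-local subsets $\mathcal{I}^{(2q')}_\alpha$ ($q' = 1, \dots, q/2$; $\alpha = 1, \dots, Q$), each of which is diffuse with respect to $\mathcal{I}$: $\mathcal{I} = \bigcup_{q'=1}^{q/2} \bigcup_{\alpha=1}^{Q} \mathcal{I}^{(2q')}_\alpha$. The parameter $Q = q(q-1)(k-1)^2 + q(k-1) + 2$ does not grow with $n$. The construction of this splitting can be done efficiently, in time poly($n$).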
Proof. Consider a graph $G$ with vertices corresponding to interaction sets $I \in \mathcal{I}$, where two interaction sets $I_1, I_2$ are connected with an edge if either 1. they share at least one Majorana operator or 2. $I_1$ and $I_2$ both share Majorana operators with another set $I_3 \in \mathcal{I}$. The graph $G$ is $Q'$-sparse with $Q' = q(q-1)(k-1)^2 + q(k-1)$. Here $q(k-1)$ is the maximal number of interactions $I_2$ directly sharing a Majorana fermion with any given interaction $I_1$, and $q(q-1)(k-1)^2$ is the maximal number of interactions satisfying condition 2. Since a $Q'$-sparse graph is vertex-colorable by at most $(Q'+1)$ colors [29], we can split $\mathcal{I}$ into $(Q'+1)$ subsets $\mathcal{I}_\alpha$, s.t. any two interactions $I_1, I_2$ from a set $\mathcal{I}_\alpha$ are not connected by an edge in $G$. By definition of $G$, this amounts to sets $\mathcal{I}_\alpha$ satisfying the first two conditions of Definition 15. A greedy algorithm can be used to color the vertices of $G$ with at most $(Q'+1)$ colors, so the sets $\mathcal{I}_\alpha$ can be constructed efficiently.
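The construction of the graph $G$ and its greedy coloring can be sketched as follows. This is a minimal Python illustration, not the authors' implementation. The function names are hypothetical, interactions are assumed to be given as tuples of Majorana indices, and the quadratic-time loops are chosen for readability rather than efficiency.

```python
def conflict_graph(interactions):
    """Adjacency of G: vertices are interaction indices.

    Two interactions are adjacent if (1) they share a Majorana index, or
    (2) both share a Majorana index with some third interaction.
    """
    sets = [frozenset(I) for I in interactions]
    n = len(sets)
    shares = [[a != b and bool(sets[a] & sets[b]) for b in range(n)]
              for a in range(n)]
    adj = {a: set() for a in range(n)}
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            via_third = any(shares[a][c] and shares[c][b] for c in range(n))
            if shares[a][b] or via_third:
                adj[a].add(b)
    return adj


def greedy_coloring(adj):
    """Assign to each vertex the smallest color unused by its neighbours."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

A vertex of degree $d$ never receives a color larger than $d$, so the greedy procedure uses at most (maximum degree + 1) colors; each color class then plays the role of one set $\mathcal{I}_\alpha$.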
Each interaction set I α can contain terms of different weight.For each value of α we define strictly 2q ′ -local sets I (2q ′ ) α (for q ′ = 1, .., q/2) by restricting to the strictly 2q ′ -local part of I α .This gives a splitting of I into efficiently constructable subsets I (2q ′ ) α : where all sets I (2q ′ ) α satisfy conditions 1 and 2 in Definition 15.The rest of the proof is concerned with the third condition of a diffuse set in Definition 15, for all sets I (2q ′ ) α .This means ensuring that for all values of α and q ′ , the support size |Sup(I )| is smaller than 2n q q+1 .Fix q ′ and consider sets I Consider the case where |Sup(I does not hold for at least one value of α, which we set to be α = Q ′ + 1 without loss of generality.
Let us prove that the violation |Sup(I simultaneously.The first scenario is excluded since I This can be further bounded as |Sup(I )| < 2nq/(q + 1).Thus we have shown that for a given q ′ , the condition 3 of Definition 15 -indeed cannot be violated by more than one I (2q ′ ) α .Consider all q ′ for which there exists a violation |Sup(I for any q, this violation can be fixed by splitting and ⌈|I )| ≤ 2nq/(q + 1).We conclude the construction by modifying the set for the considered q ′ : we redefine , and introduce one extra interaction set . The proof can now be finalized.Performing the above procedure for all q ′ where a violation was present, and completing the {I α=Q ′ +2 = ∅, we arrive at the splitting where are diffuse (satisfying all three conditions of Definition 15) with respect to I for all q ′ and α.The construction of I (2q ′ ) α is efficient, because each step can be implemented in time poly(n).

C Majorana matchings from diffuse interaction sets
Lemma (Repetition of Lemma 20). Let a strictly $q'$-local $\mathcal{I}'$ be diffuse w.r.t. a $q$-local $k$-sparse $\mathcal{I}$ on $[2n]$, such that $n > (q^2 - 1)k$. One can efficiently construct a matching $M$ of $[2n]$ that is consistent with $\mathcal{I}'$ and inconsistent with all interactions $I \in \mathcal{I} \setminus \mathcal{I}'$ such that (1) $|I| \ge q'$ or (2) $I \not\subset \mathrm{Sup}(\mathcal{I}')$.
Proof. We first note that for $I \in \mathcal{I} \setminus \mathcal{I}'$ the condition $|I| \ge q'$ implies $I \not\subset \mathrm{Sup}(\mathcal{I}')$. Indeed, there are two possible options for $I \in \mathcal{I} \setminus \mathcal{I}'$ such that $I \subset \mathrm{Sup}(\mathcal{I}')$. The first option is that $I$ is a strict subset of a single interaction from $\mathcal{I}'$. However, this is not possible given $|I| \ge q'$, because $\mathcal{I}'$ is strictly $q'$-local. The second option is for $I$ to share Majorana modes with two or more interactions in $\mathcal{I}'$. This is ruled out because $\mathcal{I}'$ is diffuse with respect to $\mathcal{I}$ (cf. Condition 2 in Definition 15). The above implies that it is sufficient to construct a matching $M$ that is consistent with $\mathcal{I}'$ and inconsistent with $\{I \in \mathcal{I} \setminus \mathcal{I}' \mid I \not\subset \mathrm{Sup}(\mathcal{I}')\}$.
We construct M in two steps.First we construct a matching M ′ of Sup(I ′ ) (note |Sup(I ′ )| is always even).Next, we construct a matching M ′′ of the remaining Majorana modes [2n]\Sup(I ′ ).The desired matching of [2n] is the union M = M ′ ∪ M ′′ .
To construct $M'$, we match the vertices of each $I \in \mathcal{I}'$ among themselves in an arbitrary way. This matching is always possible, since $\mathcal{I}'$ is diffuse and thus different interactions from $\mathcal{I}'$ do not overlap. The matching $M'$ constructed this way (and therefore also $M = M' \cup M''$) is explicitly consistent with all $I \in \mathcal{I}'$.
To construct a matching M ′′ of [2n]\Sup(I ′ ), we aim to ensure that no (m 1 , m 2 ) ∈ M ′′ is a subset of any interaction in I.For this, consider a 'permitted edge' graph P with vertices [2n]\Sup(I ′ ), and edges inserted between every pair (i 1 , i 2 ) unless they belong to the same interaction in I.We aim to construct M ′′ as a perfect matching of P. Note that since I is q-local and k-sparse, the graph P has degree bounded from below as q+1 .Therefore, since n > (q 2 − 1)k by assumption, the degree of the vertices in P is lower bounded as Given this lower bound, we apply Dirac's theorem [30], which yields an efficiently constructable Hamiltonian cycle in the graph P. Matching M ′′ is then obtained by pairing the sequential vertices in this cycle, making it a perfect matching of P. By definition of P, M ′′ is guaranteed to contain at least one outgoing edge from every interaction in {I ∈ I\I ′ |I ̸ ⊂ Sup(I ′ )}.This makes M = M ′ ∪ M ′′ inconsistent with {I ∈ I\I ′ |I ̸ ⊂ Sup(I ′ )}, as desired.
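The 'permitted edge' graph $P$ and the final pairing step can be sketched in Python. This is an illustrative sketch only: the function names are hypothetical, and the search for the Hamiltonian cycle itself (guaranteed by Dirac's theorem when every vertex has degree at least half the number of vertices) is not implemented here; the cycle is assumed to be supplied.

```python
def permitted_graph(vertices, interactions):
    """Graph P: join two vertices unless they lie in a common interaction."""
    vertices = list(vertices)
    forbidden = set()
    for I in interactions:
        members = sorted(I)
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                forbidden.add((members[a], members[b]))
    adj = {v: set() for v in vertices}
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            u, w = sorted((vertices[a], vertices[b]))
            if (u, w) not in forbidden:
                adj[u].add(w)
                adj[w].add(u)
    return adj


def satisfies_dirac(adj):
    """Dirac's condition: at least 3 vertices, each of degree >= |V| / 2."""
    n = len(adj)
    return n >= 3 and all(2 * len(nb) >= n for nb in adj.values())


def matching_from_cycle(cycle, adj):
    """Pair sequential vertices of an even-length cycle of P."""
    if len(cycle) % 2:
        raise ValueError("cycle must have even length")
    pairs = [(cycle[i], cycle[i + 1]) for i in range(0, len(cycle), 2)]
    if any(w not in adj[u] for u, w in pairs):
        raise ValueError("cycle uses an edge that is not in P")
    return pairs
```

Every pair returned by `matching_from_cycle` is an edge of $P$, so no matched pair lies inside a single interaction of $\mathcal{I}$, which is the property required of $M''$.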
Lemma 17, which is used in the proof of Theorem 5, is a special case of Lemma 20. To obtain Lemma 17, one sets $q' = q$ and considers strictly $q$-local $\mathcal{I}$ instead of simply $q$-local. In this case all terms in $\mathcal{I} \setminus \mathcal{I}'$ satisfy the first condition of the Lemma, and therefore the constructed $M$ is inconsistent with the entirety of $\mathcal{I} \setminus \mathcal{I}'$.

D Matchings and Gaussian states
Lemma (Repetition of Lemma 21). Let $H = \sum_{I \in \mathcal{I}} J_I C_I$ on $[2n']$ be $q$-local and $\mathcal{I}'$ be a diffuse subset of $\mathcal{I}$. Consider a matching $M$ of $[2n']$. If $M$ is consistent with $\mathcal{I}'$ and inconsistent with $\mathcal{I} \setminus \mathcal{I}'$, one can efficiently construct a Gaussian state $\rho(\mathcal{I}')$ with the property: $\mathrm{Tr}\big(H \rho(\mathcal{I}')\big) = \sum_{I \in \mathcal{I}'} |J_I|$. (82)
Proof. For the given matching $M$, consider its associated pure Gaussian state $\rho(M, \vec{\lambda})$ of the form given in Eq. (83). Lemma 14 implies that the contribution to $\mathrm{Tr}(H\rho(M, \vec{\lambda}))$ from the inconsistent interactions $\mathcal{I} \setminus \mathcal{I}'$ vanishes, and that the contributions from $\mathcal{I}'$ are those given in Eq. (84). The proof is completed by choosing an appropriate value for $\vec{\lambda}$. Since $\mathcal{I}'$ is diffuse, by Condition 1 of Definition 15, distinct interactions from $\mathcal{I}'$ do not share Majorana fermions. This means that the values $\lambda_{(m_1,m_2)}$ for different $I$ in Eq. (84) can be chosen independently. In particular, by picking appropriate $\lambda_{(m_1,m_2)} = \pm 1$, one can eliminate the sign of $J_I\,\mathrm{sign}(\pi)$ and achieve a contribution $|J_I|$ for each $I \in \mathcal{I}'$. Note that this procedure can be done efficiently, as it is simply a matter of choosing at most $n$ values $\pm 1$ by checking the sign of at most $|\mathcal{I}'|$ terms. Denoting the thus chosen $\rho(M, \vec{\lambda})$ as $\rho(\mathcal{I}')$, this yields Eq. (82).
A special case of Lemma 21 is Lemma 18 used in the proof of Theorem 5.

E Concentration bounds for sparse SYK-4
Here we derive the concentration bounds for the SSYK-4 Hamiltonian that were used in the proof of Theorem 8 (Section 7).We first prove an auxiliary Lemma that will be used later in this Section, allowing to separate the statistics of interaction selection and interaction strength: Lemma 30.For a ∈ [D], let X a be i.i.d.Bernoulli random variables X a ∼ Bern(p) and J a i.i.d.Gaussian random variables J a ∼ N(0, 1).Then for any integer d ∈ [D] Proof.To prove Eq. ( 85), first show It follows that This ends the proof of Eq. ( 85).In the same vein, one derives Eq. ( 86).Namely, we first have (cf.Eq. ( 87)): Similarly to Eq. (88), one obtains Eq. (86) from Eq. (89): We proceed with the proof of Lemmas 23 and 24, which were used in Section 7 to prove Theorem 8.

Lemma
X I is drawn from a Bernoulli distribution with probability p = kD −1 , i.e.X I ∼ Bern kD −1 .The second set is J I for all I ∈ I, distributed normally J I ∼ N(0, 1).We introduce auxiliary variables J a for a ∈ [⌈kn/4⌉] and J a ∼ N(0, 1).Then by Lemma 30: We can bound the first term using the Chernoff bound for sums of Bernoulli random variables.Substituting On the other hand, standard concentration properties of Gaussian random variables imply, see Lemma 31 at the end of this Appendix, Since exp − kn 4 (1 − log 2) ≤ e −kn/32 , the bound in Eq. ( 93) yields Lemma (Repetition of Lemma 24).If k ′ ≥ e 2 k + 1, we have with probability at least Proof.The random variable I∈ Ī(k ′ ) |J I | is a function of random variables X I ∼ Bern kD −1 for I ⊂ [2n], |I| = 4 and J I ∼ N(0, 1) for all I ∈ I.We introduce auxiliary random variables J ′ a ∼ N(0, 1) for a ∈ [K] where By Lemma 30, one can upperbound We now proceed with upper bounding , which is a random variable that counts the number of interactions in I involving a given Majorana c i .Since X I ∼ Bern kD −1 , k i follows the binomial distribution Bin(D, kD −1 ) (note however that different k i and k j are not necessarily independent).Given the construction of h (k ′ ) , it is clear that | Ī(k ′ ) | can be bounded by the 'excess degree' summed over all Majoranas.Concretely, using the Majorana degree function k i we define a random variable which has the immediate property Here we used the indicator function I ki>k ′ = 1 when k i > k ′ and 0 otherwise.Given Eq. (101 and thus it suffices to bound the former.We begin by calculating its mean: where we used linearity of E(.) 
and the permutation symmetry of the SSYK ensemble.Hence we now need to calculate E[(k 1 − k ′ )I k1>k ′ ] for a single Majorana (w.l.o.g.c 1 ).Since the associated degree k 1 ∼ Bin(D, kD −1 ), we calculate directly (denoting p = kD −1 ): The following identity holds [31]: where β w (y, z − y) is the regularized incomplete beta function.For integer y, z > y it is defined as We now aim to apply the Efron-Stein inequality [25] to bound deviations from the mean E(Z).For this, we introduce an additional set of independent random variables {X ′ I } such that X ′ I ∼ Bern kD −1 .This allows to define auxiliary functions where for a single interaction I only, the variable X I is replaced by X ′ I .Using the indicator function I Z>Z ′ I , a further auxiliary function V = V ({X I }) can be defined: where the averaging is performed over the additional random variables {X ′ I } alone.An exponential version of the Efron-Stein inequality (Theorem 2 of [25]) states for all θ > 0 and λ ∈ (0, θ −1 ): To employ Eq. ( 109), we have to bound E exp λV θ .First we upper bound V ({X I }) as a function.For all interactions I we claim, independent of {X I } and {X ′ I }: To show this, we will go through four possible cases: (X I , X ′ I ) = (0, 0), (1, 0), (0, 1), or (1,1).If X I = X ′ I , the left hand side of Eq. (110) vanishes, reproducing Eq. (110) for the cases (X I , X ′ I ) = (0, 0) and (1, 1).For (X I , X ′ I ) = (0, 1), Z is smaller than Z ′ I , because replacing X I = 0 by X ′ I = 1 cannot decrease the excess degree for any Majorana (cf.definition of k i and 2nZ = 2n i=1 (k i − k ′ )I ki>k ′ ).Due to the factor in this case is zero, in agreement with Eq. ( 110).The last case is (X I , X ′ I ) = (1, 0).As any interaction I only involves 4 fermions, the reduction of total excess degree 2n(Z X I =1 − Z X I =0 ) is at most equal to 4, independent of the rest of the variables {X I }.Therefore (Z − Z ′ I ) 2 I Z>Z ′ I for (X I , X ′ I ) = (1, 0) is at most equal to 4 n 2 , proving Eq. 
( 110).From Eq. (110) it follows that , which we can use to bound V ({X I }).From the definition stated in Eq. ( 108) we get: Since X I ∼ Bin(1, kD −1 ), we have We further assume a constraint λ < n 2 θ 4 , which implies the inequality exp( 4λ θn 2 ) − 1 < 8λ θn 2 .This allows to further bound E exp λV θ : We now assume an additional constraint λ < 1 2θ , which strengthens the condition λ < θ −1 of Eq. ( 109).With this constraint, using Eq.(113) in Eq. (109), we obtain: This inequality is true regardless of θ and λ, insofar both numbers are positive and satisfy the constraints we introduced: 4λ For a valid θ to exist, it's necessary and sufficient that λ belongs to the interval (0, n 2 √ 2 ).For such λ, Eq. ( 114) holds, and combined with a Markov inequality it implies for any t > 0: We next choose the value of λ ∈ (0, n 2 √ 2 ) that optimizes the right hand side.If t 2 √ 2k < 1, this is achieved with λ = tn 16k .This yields the result We choose t = k 4 2(k ′ −1) e −k ′ , which automatically ensures the desired condition t 2 √ 2k < 1 because of the constraint k ′ > e 2 k + 1 that we assumed in the Lemma statement.We obtain: 2π(k ′ −1) e −k ′ (Eq.( 106)) and 2π + 1 2 < 2, we arrive at an upper bound for the probability To bound Note that our bound for 119) is always greater than our bound for ).This allows us to conclude the proof of the Lemma, as Eqs.(99) and (119) imply:

P
Proof.For J ∼ N(0, 1), we have The Chernoff bound then implies: Evaluating the two expressions at λ = 1 2 and λ = 1 respectively and using basic inequalities for the resulting constants, we obtain the two bounds claimed in the Lemma.

F Moment bound for dense SYK-q
In this Appendix, we establish the moment bound $\mathbb{E}\big[A(1)^{2n}\big] \le (C\sqrt{n})^{2n}$, where $A(1)$ is defined in Eq. (70) and the function $f$ entering this expression is defined in Eq. (67). We classify the terms in the sum defining $A(1)$ (Eq. (124)) into five classes whose total contributions to the sum are denoted by $D_0$, $D_1$, $D_2$, $D_3$ and $D_4$. $D_0$ comprises all terms for which the three $J$'s are distinct. We shall therefore call the $D_0$ contribution the diagonal-free contribution. $D_1$ comprises all terms for which the three $J$'s are equal. $D_2$, $D_3$ and $D_4$ comprise all terms for which exactly two out of three $J$'s are equal.
Taking f into account, and thereby the terms that actually appear in A(1), we conclude that the terms appearing in each class D 0 , D 1 , D 2 , D 3 and D 4 correspond to the index sets given in Table 1.An illustration of examples of the index sets (S, j), (S ′ , k) and (S ′′ , l) associated with these different classes of contributions to A(1) is given in Figure 4.
Table 1: The index sets associated with each class of terms, and the index sets associated with each class of terms that appear in the expression for A(1) (i.e., taking f into account).
To upper bound the $(2n)$th moment of $A(1)$, we upper bound the $r$th moments (for even $r \le 16 \cdot 2n$) of $D_0$, $D_1$, $D_2$, $D_3$, $D_4$ separately. In particular, if $\mathbb{E}\,(D_i)^r \le (C\sqrt{n})^r$ for $i = 0, 1, \dots, 4$ and all even $r \le 16 \cdot 2n$, then $\mathbb{E}\, A(1)^{2n} \le (C\sqrt{n})^{2n}$. Indeed, through the multinomial expansion and successive application of the Cauchy-Schwarz inequality, the former bounds give an upper bound on the $(2n)$th moment of $A(1)$: where we have used that the multinomial coefficient can be upper bounded by $C^{2n}$ and that the number of 5-tuples of non-negative integers whose sum equals $2n$ is upper bounded by $Cn^4$ (which is smaller than $C^{2n}$ for some constant $C$). Although the $r$th moments of, e.g., $D_0$ clearly only have to be bounded for even $r \le 2 \cdot 2n$, we bound, for the sake of clarity, the $r$th moments for even $r \le 16 \cdot 2n$ for all $D_i$'s.
We first deal with the case of $D_0$, since the fact that this contribution is diagonal-free allows one to employ a decoupling technique. Afterwards, we will consider the $D_1$, $D_2$, $D_3$ and $D_4$ contributions. First, we state the following lemma, which will be useful throughout this appendix.
Lemma 32. Let $P$ and $P'$ be two polynomials of centered Gaussian random variables (i.e., the monomials are formed by products of elements from a sequence of independent centered Gaussian random variables, and each variable is allowed to appear in a monomial multiple times) with non-negative coefficients. Then, for any even $r$, $\mathbb{E}\, P^r \le \mathbb{E}\,(P + P')^r$.
Proof. We have $\mathbb{E}\,(P + P')^r = \sum_{k=0}^{r} \binom{r}{k}\, \mathbb{E}\, P^{r-k} (P')^k$, and $\mathbb{E}\, P^{r-k} (P')^k$ is non-negative (for any integers $r$, $k$) since $P$ and $P'$ have non-negative coefficients and all moments of centered Gaussian random variables are non-negative.
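As a sanity check of the monotonicity that Lemma 32 is used for in this appendix (adding a polynomial with non-negative coefficients cannot decrease an even moment), the following minimal sketch computes moments exactly for a small example. The polynomials `P` and `P_prime`, the dictionary representation and all helper names are assumptions made for illustration; they do not appear in the paper.

```python
from math import prod

def double_factorial(m):
    """Return m!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def gaussian_moment(p):
    """E[J^p] for a standard Gaussian J: (p-1)!! for even p, 0 for odd p."""
    return double_factorial(p - 1) if p % 2 == 0 else 0

def poly_add(a, b):
    """Add two polynomials stored as {exponent tuple: coefficient}."""
    out = dict(a)
    for e, c in b.items():
        out[e] = out.get(e, 0) + c
    return out

def poly_mul(a, b):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ea, ca in a.items():
        for eb, cb in b.items():
            e = tuple(x + y for x, y in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return out

def poly_pow(a, r):
    """Raise a polynomial to the non-negative integer power r."""
    nvars = len(next(iter(a)))
    out = {(0,) * nvars: 1}
    for _ in range(r):
        out = poly_mul(out, a)
    return out

def expectation(a):
    """Exact expectation over independent standard Gaussian variables."""
    return sum(c * prod(gaussian_moment(p) for p in e) for e, c in a.items())

# Hypothetical example in three variables J1, J2, J3 (non-negative coefficients).
P = {(1, 1, 0): 1, (2, 0, 1): 2}        # J1*J2 + 2*J1^2*J3
P_prime = {(0, 1, 1): 3, (1, 0, 1): 1}  # 3*J2*J3 + J1*J3

for r in (2, 4, 6):
    lhs = expectation(poly_pow(P, r))
    rhs = expectation(poly_pow(poly_add(P, P_prime), r))
    print(r, lhs, rhs, lhs <= rhs)
```

The check is exact because all coefficients are integers and Gaussian moments are given by double factorials, so no sampling error enters.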

F.1 Upper bound for moments of D 0 (diagonal-free contribution)
We start by noting that the function $f$ takes on values 0 or 1, depending on the index sets $S, S', S'', j, k, l$ labeling the Majorana operators. We consider replacing $f$ in each term of $D_0$ (Eq. (124)) with $\delta_{a,b}\delta_{c,d}$, where either or We denote this modified sum as $D_{0,\delta\delta}$. By inspection, the index sets for which $f$ is non-zero all correspond to a non-zero contribution for $\delta_{a,b}\delta_{c,d}$. Note that those index sets for which $\delta_{a,b}\delta_{c,d}$ is non-zero also include index sets for which $f$ is zero. Hence, the terms associated with non-zero $\delta_{a,b}\delta_{c,d}$ (for the two options listed above) are a superset of the terms that correspond to non-zero values of $f$. Therefore, by Lemma 32, the upper bounds on even moments of $D_0$ can be obtained by upper bounding the even moments of $D_{0,\delta\delta}$. We will denote the part of the sum $D_{0,\delta\delta}$ corresponding to option 1 as $D_{0,\min}$: where the sum is over indices such that $(S, j) \neq (S', k) \neq (S'', l) \neq (S, j)$ (by definition of $D_0$) and such that $(S' \cup k) \cap (S'' \cup l)$ and $(S'' \cup l) \cap (S \cup j)$ differ by at least one element. Any bound for all even moments of $D_{0,\min}$ also holds for $D_{0,\delta\delta} - D_{0,\min}$, which corresponds to option 2, due to the symmetry $(S, j) \leftrightarrow (S'', l)$ between the two options. An upper bound on all even moments of $D_{0,\delta\delta}$ (and, by implication, $D_0$) then follows from binomial expansion and application of the Cauchy-Schwarz inequality, similarly to Eq. (126). Thus it only remains to prove $\mathbb{E}\,|D_{0,\min}|^r < (C\sqrt{n})^r$ for all even $r$.
To upper bound the even moments of $D_{0,\min}$, we are going to employ a decoupling technique. To that end, we will study the even moments of a related decoupled quantity. This decoupled quantity is defined as $D_{0,\min}$ but with the standard Gaussian random variables $J_{S,j}$, $J_{S',k}$ and $J_{S'',l}$ (selected from a single sequence of standard $J_1, \dots, J_m$ is again a standard Gaussian random variable. We now obtain the following expression for $D^{\mathrm{decoupled}}_{0,\min}$: $K^{(1)}_{x,p_1} K^{(2)}_{y,p_2} K^{(3)}_{x,p_3;y,p_4}$. (131)
The sum over all free indices gives an extra total factor of $n^{3q/2-2}$, which partially cancels against $n^{3q/2-1}$ in Eq. (129). Importantly, we note that now the random variables $K^{(1)}_{x,p_1}$ and $K^{(1)}_{x',p_1}$ are independent for $x \neq x'$ (and equivalently for $K^{(3)}_{x,p_3;y,p_4}$). We will apply Lemma 33 from [32] separately to each contribution to $D^{\mathrm{decoupled}}_{0,\min}$ in Eq. (131) (with a contribution corresponding to one combination of $p_i$'s).
Lemma 33 (Theorem 1 in [32]).Let Y ∈ R N ×...×N be a d-dimensional matrix and define: where {K  i Ps : with each x ∈ R.

See e.g. Theorem 2.1 in [33].
The fact that this decoupling inequality only holds for diagonal-free polynomials is exactly the reason for differentiating between the diagonal-free contribution $D_0$ and the diagonal contributions $D_1$, $D_2$, $D_3$, $D_4$ to $A(1)$. For $\sum_{x,y=1}^{2n} K^{(1)}_{x,p_1} K^{(2)}_{y,p_2} K^{(3)}_{x,p_3;y,p_4}$ in Eq. (131), we see that $d = 3$ and hence the possible partitions $P$ are $\{1, 2, 3\}$, $\{1\}\{2, 3\}$, $\{2\}\{1, 3\}$, $\{1, 2\}\{3\}$, $\{1\}\{2\}\{3\}$. The associated $\|Y\|_P$ values can be calculated straightforwardly and are given in Table 2. Using Table 2 and Lemma 33, we find the following upper bound on $\mathbb{E}\,\big|\sum_{x,y=1}^{2n} K^{(1)}_{x,p_1} K^{(2)}_{y,p_2} K^{(3)}_{x,p_3;y,p_4}\big|^r$ (for all even $r$): Note that $D^{\mathrm{decoupled}}_{0,\min}$ in Eq. (131) consists of $q^4$ (with $q = O(1)$) contributions, each corresponding to a given combination of $p_i$'s. We can again use the multinomial expansion and successive application of the Cauchy-Schwarz inequality (together with the fact that the multinomial coefficients can be upper bounded by $C^r$ and that the number of $q^4$-tuples of non-negative integers whose sum equals $r$ is upper bounded by $C^r$ for some constant $C$) to conclude that the upper bounds of $(C\sqrt{n})^r$ for $r$th moments (for all even $r$) of these contributions imply an upper bound of $(C\sqrt{n})^r$ for $r$th moments (for all even $r$) of $D^{\mathrm{decoupled}}_{0,\min}$. We now employ the decoupling inequality from the above remark to obtain From the arguments given previously, this implies the desired bound $\mathbb{E}\,|D_0|^r \le (C\sqrt{n})^r$ for all even $r$, in particular for $r \le 16 \cdot 2n$.
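The list of partitions of $[3]$ used above (and the analogous lists for $d = 1, 2$ needed later) can be generated mechanically. The following is a minimal sketch; the function name `set_partitions` is an assumption for illustration and is not taken from the paper.

```python
def set_partitions(elements):
    """Yield all partitions of a list into non-empty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partial

# For each d, list every partition P of [d] together with its number of parts |P|.
for d in (1, 2, 3):
    for blocks in set_partitions(list(range(1, d + 1))):
        print(d, len(blocks), blocks)
```

For $d = 3$ this yields the five partitions listed in the text, with one partition of size 1, three of size 2 and one of size 3.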
Table 2: The different partitions $P$ of $[3]$ into non-empty parts, with the associated number of parts $|P|$, and the associated $\|Y\|_P$ for $\sum_{x,y=1}^{2n} K^{(1)}_{x,p_1} K^{(2)}_{y,p_2} K^{(3)}_{x,p_3;y,p_4}$ in Eq. (131). $\|Y\|_P$ for the first four partitions can be straightforwardly evaluated by applying Eq. (133) to Eq. (128), and the fifth $\|Y\|_P$ can be evaluated by additional application of the Cauchy-Schwarz inequality.

In the previous section we used a decoupling inequality to upper bound the $r$th moments (for even $r \le 16 \cdot 2n$) of $D_0$. These decoupling inequalities hold for (Gaussian) polynomials for which each Gaussian monomial is a product of distinct Gaussian random variables, i.e., diagonal-free polynomials. This holds, by definition, for the $D_0$ contribution to $A(1)$, but not for the contributions $D_1$, $D_2$, $D_3$ and $D_4$. For that reason, we cannot make use of the same decoupling inequality for the $D_1$, $D_2$, $D_3$ and $D_4$ contributions, and we have to resort to other methods to bound their $r$th moments (for even $r \le 16 \cdot 2n$).
• The $D_1$ contribution can be written as: The $r$th moment (with $r$ even) of $D_1$ can be upper bounded as follows: where we have used that $\mathbb{E}\big(\sum_{i=1}^{m} K_i\big)^r = \sum_{k_1 + \dots + k_m = r} \frac{r!}{k_1! \cdots k_m!}\, \mathbb{E}(K_1)^{k_1} \cdots \mathbb{E}(K_m)^{k_m}$ (for $K_1, \dots, K_m$ independent random variables), the fact that $(S, j)$ can take on $2n\binom{2n}{q-1}$ values and the fact that, for even $p$, the $p$th moment of a standard Gaussian random variable is equal to $(p-1)!!$ ($\le p^{p/2}$). For even $r$, we therefore conclude that
• The $D_2$ and $D_3$ contributions are equivalent and can be written as: The $r$th moment (with $r$ even) can be written as follows: We define $g := \sum_{j, S, S' \text{ s.t. } S \neq S'} J_{S,j}^2\, J_{S',j}$, (139) for which $\mathbb{E}(g) = 0$. We note that $g$ is a homogeneous polynomial of degree 3 in standard Gaussian random variables. To upper bound the moments of $g$, and thereby the moments of $D_2$ and $D_3$, we use the following result from [34]. This result is an extension of Lemma 33 from [32] to the setting where diagonal terms are allowed to appear in the polynomial. The extension also includes inhomogeneous polynomials, although in the current setting we are considering only homogeneous polynomials.
Lemma 34 (Theorem 1.3 in [34]). Let $K := K_1, \dots, K_N$ denote a sequence of $N$ independent standard Gaussian random variables and $g : \mathbb{R}^N \to \mathbb{R}$ a polynomial of degree $D$. Then, for all $r \ge 2$: $\mathbb{E}\,|g(K) - \mathbb{E}\, g(K)|^r$ For $g$ in Eq. (139), we have that $N = n\binom{n}{q-1}$, since the sequence of Gaussian random variables corresponds to $\{J_{S,j}\}$. To find an upper bound for the $r$th moment of $g$ using Eq. (143) In Table 3, we give the values of $\|\mathbb{E}\, D^d g\|_P$ for all partitions $P([d])$ for $d = 1, 2, 3$. $\|\mathbb{E}\, D^d g\|_P$ for $d = 1$ can be straightforwardly evaluated using Eq. (133) and for $d = 2$ can be trivially evaluated by using $\mathbb{E}\, D^2 g = 0$. For $d = 3$, $\|\mathbb{E}\, D^d g\|_P$ can be upper bounded using Eq. (133), and the triangle and Cauchy-Schwarz inequalities (for illustration purposes, we provide an example of the derivation of this upper bound for $P = \{1, 2\}\{3\}$ below). Combining the upper bounds for $\|\mathbb{E}\, D^d g\|_P$ in Table 3 with the factor of $r^{|P|/2}$ ($\le Cn^{|P|/2}$) in Eq. (140) and the normalization factor in Eq. (138), we find, using $\mathbb{E}(g) = 0$, that indeed $\mathbb{E}\,|D_2|^r, \mathbb{E}\,|D_3|^r \le (C\sqrt{n})^r$ for all even $r \le 16 \cdot 2n$.
• The $D_4$ contribution can be written as: We note that the main difference with the $D_2$ and $D_3$ contributions is that, for $D_4$, the sum runs over the double index $j, k$ (instead of the single index $j$) and over a restricted set of pairs $S, S'$ (instead of a free sum over sets $S, S'$). To bound the moments of $D_4$, we will employ a similar method as for $D_2$ and $D_3$. The $r$th moment (with $r$ even) can be upper bounded as follows (where we drop the '$|S \cap S'|$ is odd' constraint using Lemma 32 and denote the collection of subsets $S'$ such that $0 < |S \cap S'| < q - 1$ by $\sigma(S)$): We note that $|\sigma(S)|$ can be upper bounded and lower bounded by $Cn^{q-2}$ (for some constants $C$). We define h := j,k,S, S ′ ∈σ(S) for which $\mathbb{E}(h) = 0$. We note that $h$ is a homogeneous polynomial of degree 3 in standard Gaussian random variables. To upper bound the moments of $h$, and thus the moments of D J S ′ ,p (S,j) $\implies \mathbb{E}\, Dh \le \big(Cn^{q-1}\big)_{(S,j)}$, (149) where the sum over $p$ runs from 0 to $n$ and we have used the bounds on $|\sigma(S)|$. Note that this is a pointwise upper bound on the entries of the vector $\mathbb{E}\, Dh$, which will be enough to bound the corresponding norm. Combining the upper bounds for $\|\mathbb{E}\, D^d h\|_P$ in Table 4 with the factor of $r^{|P|/2}$ ($\le Cn^{|P|/2}$) in Eq. (140) and the normalization factor in Eq. (147), we find, using $\mathbb{E}(h) = 0$, that indeed $\mathbb{E}\,|D_4|^r \le (C\sqrt{n})^r$ for all even $r \le 16 \cdot 2n$.
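Two elementary facts are used in the bound for $D_1$ above: the multinomial expansion of moments of a sum of independent random variables, and the Gaussian moment formula $(p-1)!! \le p^{p/2}$ for even $p$. The following minimal sketch checks both for small parameters, using the fact that a sum of $m$ independent standard Gaussians is Gaussian with variance $m$. The helper names are assumptions for illustration only.

```python
from math import factorial

def double_factorial(m):
    """Return m!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def gaussian_moment(p):
    """E[K^p] for a standard Gaussian K: (p-1)!! for even p, 0 for odd p."""
    return double_factorial(p - 1) if p % 2 == 0 else 0

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def moment_of_sum(m, r):
    """E[(K_1 + ... + K_m)^r] via the multinomial expansion."""
    total = 0
    for ks in compositions(r, m):
        term = factorial(r)
        for k in ks:
            term //= factorial(k)       # multinomial coefficient r!/(k_1!...k_m!)
        for k in ks:
            term *= gaussian_moment(k)  # product of individual moments
        total += term
    return total

# The sum of m standard Gaussians is N(0, m), so its r-th moment is m^(r/2) (r-1)!!.
for m in (2, 3, 4):
    for r in (2, 4, 6):
        assert moment_of_sum(m, r) == m ** (r // 2) * double_factorial(r - 1)

# The bound (p-1)!! <= p^(p/2) for even p.
for p in range(2, 41, 2):
    assert gaussian_moment(p) <= p ** (p // 2)
```

The expansion is evaluated by brute force over all exponent tuples, which is only practical for small $m$ and $r$; it is meant as a check of the identity, not as a way to evaluate the bounds in the text.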
In conclusion, we have shown that the rth moments (for even r ≤ 16

G Two-colored SYK to standard SYK
In this Appendix, we give the proof of Lemma 27.

Figure 1: Illustrating the key idea of the proof of Theorem 5. An example of a strictly 4-local Hamiltonian is given in (a), with vertices and faces representing Majorana operators and their interactions. The Hamiltonian is split into sets of terms (different colors in (b)) that are well separated from each other inside each set (so-called diffuse sets, see Definition 15). The next step is to match all Majorana operators, i.e., split the vertices into disjoint pairs, each connected by an edge (see panel (c)). We separately match the support of each term in one targeted set of terms (the color highlighted in (b) and (c)). The remaining vertices are matched in such a way that no two vertices connected by an edge belong to the same term. The Gaussian state is then created from the resulting matching, with only terms from the targeted set contributing to the energy. By optimizing the choice of the targeted set, a finite approximation ratio can be guaranteed.
α ) on [2n + 2] with good properties relative to H, that is,

( 4 )
α ) in a different way.First we use Lemma 20 to construct a matching M (I (4) α ) of [2n].This matching is guaranteed to be consistent with I (4) α .However, since I (2) is 2-local and I (4) is

Figure 2: Demonstration of the method in the proof of Theorem 6. (a) Matching $M(I^{(2)}_\alpha)$ for $I^{(2)}_\alpha$, here comprised of a single term (shown in green). To ensure consistency with $I^{(2)}_\alpha$ in $H$, $M(I^{(2)}_\alpha)$ perfectly matches these terms and the pair $(2n + 1, 2n + 2)$. The rest of the vertices are matched so that each pair does not belong to the same term in I\I

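Lemma 29. Let $H = \sum_{I \in \mathcal{I}} J_I C_I$ where the $\{C_I\}$ are a set of mutually anti-commuting Majorana operators on $[2n]$ (each $C_I$ has even support). Then $\lambda_{\max}(H) = \sqrt{\sum_I J_I^2}$.
Proof. We have $H = \sum_I J_I C_I = \sqrt{\sum_I J_I^2}\, \sum_I \beta_I C_I$ with $\sum_I \beta_I^2 = 1$. Take the state $\rho = \frac{1}{2^n}\big(I + \sum_I \beta_I C_I\big)$ and thus $\mathrm{Tr}(H\rho) = \sum_I J_I \beta_I = \sqrt{\sum_I J_I^2}$.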
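The statement of Lemma 29 can be checked numerically on a small instance. The sketch below is an illustration under stated assumptions: it uses a Jordan-Wigner representation of four Majorana operators on two qubits, three hypothetical couplings `J`, and the three even-weight operators $i c_1 c_2$, $i c_2 c_3$, $i c_1 c_3$, which mutually anti-commute. None of these specific choices are taken from the paper.

```python
from math import sqrt

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def identity(dim):
    return [[1 if i == j else 0 for j in range(dim)] for i in range(dim)]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = identity(2)
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

# Jordan-Wigner Majorana operators for n = 2 modes (only three of the four are needed).
c1, c2, c3 = kron(X, I2), kron(Y, I2), kron(Z, X)

# Three Hermitian, even-weight, mutually anti-commuting operators C_I.
C = [scale(1j, matmul(c1, c2)), scale(1j, matmul(c2, c3)), scale(1j, matmul(c1, c3))]
J = [0.3, -1.2, 0.5]  # hypothetical couplings

norm = sqrt(sum(j * j for j in J))
beta = [j / norm for j in J]

H = scale(0, identity(4))
B = scale(0, identity(4))
for j, b, c in zip(J, beta, C):
    H = add(H, scale(j, c))
    B = add(B, scale(b, c))

# H^2 equals (sum of J^2) times the identity, so the eigenvalues of H are +/- norm.
h_squared_error = max_abs_diff(matmul(H, H), scale(norm ** 2, identity(4)))

# The state rho = (I + sum_I beta_I C_I) / 2^n from the proof, with n = 2.
rho = scale(1 / 4, add(identity(4), B))
energy = trace(matmul(H, rho)).real
print(h_squared_error, energy, norm)
```

Since $H^2$ is proportional to the identity, no eigenvalue solver is needed: the spectrum is $\pm\sqrt{\sum_I J_I^2}$, and the printed energy of $\rho$ should agree with the positive value up to floating-point error.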

$I^{(2q')}_{Q'+1}$ and $I^{(2q')}_{\beta}$ are both strictly $2q'$-local and the second scenario is excluded because $I^{(2q')}_{Q'+1}$ satisfies condition 2 of Definition 15. From these two facts it follows that each interaction in $I^{(2q')}_{\beta}$ must involve at least one Majorana from [2n]\Sup(I

Figure 3: Example of the construction from the proof of Lemma 20. (a) The $q$-local set of interactions $I$ ($q = 4$). Highlighted in green is the diffuse and strictly $q'$-local set $I'$ ($q' = 4$), in red are the interactions in $\mathrm{Sup}(I')$ with weight less than $q'$, and in grey are the remaining interactions in $I$. The goal is to create a matching $M$ consistent with green-colored terms, inconsistent with grey-colored terms, and with no guaranteed relation to the red-colored terms. (b) Matching $M'$ on $\mathrm{Sup}(I')$, consistent with $I'$ by construction. Ensuring inconsistency with all red-colored terms is in general impossible. For example, consider the three overlapping red-colored terms at the top center. (c) Completing $M = M' \cup M''$ with a matching $M''$ on $[2n] \setminus \mathrm{Sup}(I')$, ensuring inconsistency with all grey-colored terms. For this, the vertices are matched only if they belong to different interactions.
(Repetition of Lemma 23). Let interactions $\mathcal{I}$ and interaction strengths $\{J_I\}$ be those of the SSYK-4 model with average degree $k$. With probability at least $1 - 2e^{-kn/32}$ we have $\sum_{I \in \mathcal{I}} |J_I| \ge kn/8$. (91)
Proof. The random variable $\sum_{I \in \mathcal{I}} |J_I|$ is a function of two sets of random variables. The first set is $X_I \in \{0, 1\}$ for all possible 4-Majorana interactions $I \subset [2n]$, $|I| = 4$, indicating the presence of $I$ in $\mathcal{I}$. Denoting
Using the Stirling bound $x! \ge \sqrt{2\pi}\, x^{x+1/2} e^{-x}$, $x \in \mathbb{N}$, one bounds $\beta_w(y, z - y)$ as: β w (y, z − y) < w(z − 1
p = kD −1 and using Eqs. (104), (105) in Eq. (103) for $k' > e^2 k + 1$ we obtain
we use the concentration properties of Gaussian random variables (see Lemma 31 at the end of this Appendix). Using K = ⌊ 4k 2 √ k ′ −1 e −k ′ n⌋ in Lemma 31.1:

class associated index sets
of terms associated index sets of terms in A(1) j=1 are d independent sequences of N standard Gaussian random variables.Then for any integer k ≥ 2:

F.2 Upper bound for moments of D 1 , D 2 , D 3 and D 4

$r^{|P|/2}\, \big\|\mathbb{E}\, D^d g(K)\big\|_P \Big)^r$, (140) where $P$ are partitions of $[d]$ into non-empty parts, and $\|Y\|_P$ (with $Y$ a $d$-way tensor) is defined in Eq. (133). $D^d g(K)$ denotes the $d$th derivative of $g(K)$, which corresponds to a $d$-way tensor with entries equal to $\big(D^d g(K)\big)_{i_1, \dots, i_d} = \frac{\partial^d g(K)}{\partial K_{i_1} \cdots \partial K_{i_d}}$. For $d = D$, $D^d g(K)$ is constant.

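(140), we first calculate $D^d g$ for $d = 1, 2, 3$. Then, for each $d$, we upper bound $\|\mathbb{E}\, D^d g\|_P$ for all partitions $P$ of $[d]$. We will show that for all $d$ and associated partitions $P([d])$, $\|\mathbb{E}\, D^d g\|_P$ can be upper bounded in such a way that $\mathbb{E}\,|D_2|^r, \mathbb{E}\,|D_3|^r \le (C\sqrt{n})^r$ for all even $2 \le r \le 16 \cdot 2n$. Finally, the 0th moment also (trivially) satisfies this upper bound, hence it holds for all even $r \le 16 \cdot 2n$. The derivatives of $g$ are equal to:
$$Dg = \Big( \sum_{S' : S' \neq S} J_{S',j}^2 + 2 J_{S,j} \sum_{S' : S' \neq S} J_{S',j} \Big)_{(S,j)} \implies \mathbb{E}\, Dg = \Big( \binom{n}{q-1} \Big)_{(S,j)},$$
$$D^2 g = \left( \begin{cases} 2 \sum_{S' : S' \neq S} J_{S',j}, & \text{if } (S, j) = (T, k) \\ 2 (J_{T,j} + J_{S,j}), & \text{if } S \neq T \text{ and } j = k \\ 0, & \text{if } j \neq k \end{cases} \right)_{(S,j),(T,k)} \implies \mathbb{E}\, D^2 g = (0)_{(S,j),(T,k)},$$
$$D^3 g = \left( \begin{cases} 2, & \text{if } (S = T \neq U \text{ or } S \neq T = U \text{ or } S = U \neq T) \text{ and } j = k = l \\ 0, & \text{if } (S = T = U \text{ or } S \neq T \neq U) \text{ and } j = k = l \\ 0, & \text{if } j, k, l \text{ are not all equal} \end{cases} \right)_{(S,j),(T,k),(U,l)} \implies \mathbb{E}\, D^3 g = D^3 g.$$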
· 2n) of $D_0$, $D_1$, $D_2$, $D_3$ and $D_4$ can be upper bounded by $(C\sqrt{n})^r$, and hence, by Eq. (126), the $(2n)$th moment of $A(1)$ can be upper bounded by $(C\sqrt{n})^{2n}$. Thereby, we have also established that the second condition in Eq. (71) is satisfied.

Theorem 6. Let $H$ be a traceless fermionic Hamiltonian on $2n$ Majorana operators with maximal eigenvalue $\lambda_{\max}(H)$. If $H$ is $k$-sparse with terms of weight 2 and 4 and $2n > 15k$, a Gaussian state $\rho$ can be efficiently constructed, such that

1. For all distinct $I_1, I_2 \in I'$, $I_1$ and $I_2$ do not share any Majorana operators, i.e. $I_1 \cap I_2 = \emptyset$.
2. For all distinct $I_1, I_2 \in I'$, there exists no $I_3 \in I$ which shares Majorana operators with both $I_1$ and $I_2$ (if $I_3 \cap I_1 \neq \emptyset$ then $I_3 \cap I_2 = \emptyset$ and vice versa).
3. The size of the support of $I'$, i.e. $|\mathrm{Sup}(I')|$, is smaller than $\frac{2qn}{q+1}$.
In the setting of Theorem 5, diffuse sets of terms appear naturally due to the following Lemma. Consider a $k$-sparse strictly $q$-local fermionic Hamiltonian $H$ on $2n$ Majoranas. The interaction set $I$ of $H$ can be split into $Q$ disjoint subsets $I_\alpha$ ($\alpha \in [Q]$) all of which are diffuse with respect to $I$ such that 2) , (48)
/∥w 1 ∥, . . ., w K /∥w K ∥ (with $w_i \in \mathbb{R}^{d_i}$) can be bounded as follows:
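The three conditions defining a diffuse set can be checked mechanically for a given candidate subset. The following is a minimal sketch: the function name `is_diffuse`, the 0-based Majorana labels, the example interactions, and the reading of the threshold in condition 3 as $2qn/(q+1)$ are all assumptions made for illustration.

```python
def is_diffuse(subset, all_terms, q, n):
    """Check conditions 1-3 for `subset` with respect to `all_terms`.

    Each term is an iterable of Majorana labels drawn from range(2 * n).
    """
    subset = [frozenset(t) for t in subset]
    all_terms = [frozenset(t) for t in all_terms]

    # Condition 1: distinct terms of the subset have disjoint supports.
    for a in range(len(subset)):
        for b in range(a + 1, len(subset)):
            if subset[a] & subset[b]:
                return False

    # Condition 2: no term of the full interaction set overlaps two
    # distinct terms of the subset.
    for term in all_terms:
        touched = sum(1 for s in subset if term & s)
        if touched >= 2:
            return False

    # Condition 3 (assumed reading): |Sup(subset)| < 2 q n / (q + 1).
    support = set().union(*subset) if subset else set()
    return len(support) * (q + 1) < 2 * q * n

# Hypothetical strictly 4-local interaction set on 2n = 12 Majoranas.
terms = [(0, 1, 2, 3), (3, 4, 5, 6), (6, 7, 8, 9), (8, 9, 10, 11)]

print(is_diffuse([(0, 1, 2, 3), (8, 9, 10, 11)], terms, q=4, n=6))
print(is_diffuse([(0, 1, 2, 3), (6, 7, 8, 9)], terms, q=4, n=6))
```

In the second call the term `(3, 4, 5, 6)` overlaps both chosen terms, so condition 2 fails even though the two chosen terms are disjoint.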

Table 3: The different partitions $P$ of $[3]$ into non-empty parts, with the associated number of parts $|P|$, and (the upper bounds for) the associated $\|\mathbb{E}\, D^d g\|_P$ for $g$ in Eq. (139).
4, we again use Lemma 34 from [34]. We use Eq. (140) to find an upper bound for the $r$th moment of $h$. We first calculate $D^d h$ for $d = 1, 2, 3$. Then, for each $d$, we upper bound $\|\mathbb{E}\, D^d h\|_P$ for all partitions $P$ of $[d]$. Thereby, we show that for all $d$ and associated partitions $P([d])$, $\|\mathbb{E}\, D^d h\|_P$ can be upper bounded such that $\mathbb{E}\,|D_4|^r \le (C\sqrt{n})^r$ for all even $2 \le r \le 16 \cdot 2n$. The 0th moment trivially satisfies this bound, and therefore it holds for all even $r \le 16 \cdot 2n$.

Table 4: The different partitions $P$ of $[3]$ into non-empty parts, with the associated number of parts $|P|$, and (the upper bounds for) the associated $\|\mathbb{E}\, D^d h\|_P$ for $h$ in Eq. (148).