A diagrammatic approach to variational quantum ansatz construction

Variational quantum eigensolvers (VQEs) are a promising class of quantum algorithms for preparing approximate ground states in near-term quantum devices. Minimizing the error in such an approximation requires designing ansatzes using physical considerations that target the studied system. One such consideration is size-extensivity, meaning that the ground state quantum correlations are to be compactly represented in the ansatz. On digital quantum computers, however, the size-extensive ansatzes usually require expansion via Trotter-Suzuki methods. These introduce additional costs and errors to the approximation. In this work, we present a diagrammatic scheme for the digital VQE ansatzes, which is size-extensive but does not rely on Trotterization. We start by designing a family of digital ansatzes that explore the entire Hilbert space with the minimum number of free parameters. We then demonstrate how one may compress an arbitrary digital ansatz, by enforcing symmetry constraints of the target system, or by using them as parent ansatzes for a hierarchy of increasingly long but increasingly accurate sub-ansatzes. We apply a perturbative analysis and develop a diagrammatic formalism that ensures the size-extensivity of generated hierarchies. We test our methods on a short spin chain, finding good convergence to the ground state in the paramagnetic and the ferromagnetic phase of the transverse-field Ising model.


Introduction
Despite promises of exponential speedups, quantum algorithms require optimization to achieve an advantage over their classical counterparts on state-of-the-art supercomputers for problems of interest. This is the case both in the Noisy Intermediate-Scale Quantum era [1], where coherence times in quantum devices permit only the shortest experiments, and in first-generation fault-tolerant devices, where a single non-Clifford rotation requires thousands of additional qubits and hundreds of error-correcting cycles [2]. In the field of digital quantum simulation, the variational quantum eigensolver (VQE) [3] has emerged as a competitive class of algorithms for generating approximate ground states of quantum systems, due to its relatively low circuit length. These algorithms parametrize a quantum circuit with a small number of classical control variables, which may be tuned to minimize the energy of the state produced by the circuit, given a target Hamiltonian. As the manifold of obtainable states for a given VQE will only ever be an exponentially small region of the larger Hilbert space, optimizing VQE design is critical to obtaining good approximations of the system's ground state [4,5,6]. This has spurred much recent work on optimizing VQEs based on the unitary coupled cluster expansion [4,5,7] or on the quantum approximate optimization algorithm [8,9]. The efficiency of coupled cluster methods is based on the principle of size-extensivity. This means that the ansatz systematically accounts for ground state correlations, as ensured in perturbative language by the linked-cluster theorem [10]. However, to be realized as a quantum circuit, size-extensive ansatzes typically require expansion via Trotter-Suzuki-based methods [11,12]. At low circuit depth, these expansions introduce significant errors. Alleviating this issue would help to ensure the efficiency of the VQE algorithm.
In this work, we develop a Trotterization-free diagrammatic method to generate size-extensive VQEs. We start by designing a class of VQE ansatzes, based on the stabilizer formalism in quantum error correction, which provably tightly span the entire Hilbert space of $N_q$ qubits. We then demonstrate how one may compress an arbitrary variational ansatz to account for symmetries of a target Hamiltonian. We further show how to construct a hierarchy of ansatz generators, allowing one to trade between circuit length and accuracy in a practical manner by choosing only those generators that contribute well to solving the problem. We motivate the construction of one particular such hierarchy from a general perturbative analysis of weakly coupled target Hamiltonians, for which we develop a simple-to-use diagrammatic formalism. We find that our geometrically tight stabilizer ansatz may be compressed to a practical size using this perturbative scheme. The analogue of the linked-cluster theorem for such compressed digital ansatzes is stated and proven, ensuring the size-extensivity of the construction. We also propose some possible modifications to our perturbative scheme to account for circuit depth and locality. We compare the performance of these constructions on simulations of the transverse-field Ising model in three different physical regimes (weak-coupling, strong-coupling, and critical). We find that strictly following the perturbative approach is beneficial in the weak-coupling regime, but restricting the ansatz to lowest order gives better convergence in the strong-coupling regime, even though such ansatzes are seemingly less informed about the strong-coupling physics.

Variational quantum eigensolvers
A variational quantum eigensolver (VQE) is an algorithm executed on a quantum register that aims to approximate the minimum eigenvalue $E_0$ of a target Hamiltonian $H$ on $\mathbb{C}^{2^{N_q}}$ by finding low-energy states $|\psi\rangle \in \mathbb{C}^{2^{N_q}}$ variationally. To be precise, this algorithm minimizes $\langle\psi|H|\psi\rangle$ over a variational ansatz:
Definition 1. A variational ansatz on $N_p$ parameters corresponds to a pair $(U, |\vec{0}\rangle)$, where $U$ is a smooth map from the parameter space $\vec{\theta} \in \mathbb{R}^{N_p}$ to the unitary operator $U(\vec{\theta})$ on $\mathbb{C}^{2^{N_q}}$, and $|\vec{0}\rangle \in \mathbb{C}^{2^{N_q}}$ is the starting state, which is acted on to generate the variational state $|\psi(\vec{\theta})\rangle = U(\vec{\theta})|\vec{0}\rangle$, with variational energy $E(\vec{\theta}) = \langle\psi(\vec{\theta})|H|\psi(\vec{\theta})\rangle$.
As a brief example, let us define the following toy two-qubit variational ansatz:
Example 2. The 3-parameter YYX variational ansatz $(U_{YYX}, |00\rangle)$ is defined on two qubits $\{Q_1, Q_2\}$, with the starting state $|00\rangle$ in the computational ($Z$) basis, and
$$U_{YYX}(\theta_1, \theta_2, \theta_3) := e^{i\theta_3 Y_1 X_2}\, e^{i\theta_2 Y_2}\, e^{i\theta_1 Y_1}. \tag{1}$$
A quantum circuit that implements this toy ansatz is given in Fig. 1, using standard methods [13] to decompose the two-qubit $e^{i\theta_3 Y_1 X_2}$ term into single-qubit rotations and CNOT gates.
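The toy ansatz of Eq. (1) is small enough to evaluate directly as matrices. The following is a minimal NumPy sketch, not the circuit decomposition of [13]; the helper names `pauli_rotation` and `yyx_state`, and the convention that $Q_1$ is the left tensor factor, are assumptions made for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def pauli_rotation(theta, pauli):
    """Return exp(i*theta*P) for an operator P that squares to the identity."""
    dim = pauli.shape[0]
    return np.cos(theta) * np.eye(dim) + 1j * np.sin(theta) * pauli

def yyx_state(t1, t2, t3):
    """State of Eq. (1) applied to |00>; Q1 is the left tensor factor (assumed)."""
    ket00 = np.array([1, 0, 0, 0], dtype=complex)
    u1 = pauli_rotation(t1, np.kron(Y, I2))   # exp(i t1 Y_1), acts first
    u2 = pauli_rotation(t2, np.kron(I2, Y))   # exp(i t2 Y_2)
    u3 = pauli_rotation(t3, np.kron(Y, X))    # exp(i t3 Y_1 X_2), acts last
    return u3 @ u2 @ u1 @ ket00

psi = yyx_state(0.3, -0.7, 1.1)
print(np.round(psi, 4))
```

Because each generator contains an odd number of $Y$ factors, every rotation matrix here is real, so the amplitudes of `psi` have vanishing imaginary part.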
Figure 1: (above) A circuit implementing the ansatz of Eq. (1), color coded by the variational parameters $\theta_i$. (below) The above circuit in a compressed notation, treating each rotation $U$ as a single gate labeled by the elements of the rotation generators (Eq. (4)) on each qubit.
VQEs are appealing because they reduce the computational complexity of searching the (exponentially large) $N_q$-qubit Hilbert space to the complexity of searching the parameter space (which may be made arbitrarily small). However, this comes at a cost, as none of the states $|\psi(\vec{\theta})\rangle$ may be close (in energy or overlap) to the target ground state. The variance in the energy $\langle\psi|H|\psi\rangle$ of states $|\psi\rangle$ randomly drawn (i.e. with Haar measure) from an $N_q$-qubit Hilbert space is given by
with $\|\cdot\|_F$ the Frobenius norm and $\|\cdot\|_S$ the spectral norm. This implies that the probability of a random state having energy close to the ground state energy of $H$ scales as $e^{-2^{N_q}}$, while one expects the volume of space explored by a VQE to grow only as $e^{N_p}$. This, and similar results for derivatives of the energy with respect to variational parameters [6], imply that a random ansatz choice has little to no chance of success in finding ground state energies. Instead, a variational ansatz should be designed to cover as much of the $N_q$-qubit Hilbert space as possible, in a way that maximises the chance of finding low-energy states (or states that overlap well with the true ground state). A full VQE protocol must also concern itself with optimizing the minimization procedure, especially to avoid getting stuck in local minima or barren plateaus [6]. One should further take care to make the resulting quantum circuit as hardware-efficient [14,15] as possible. Hardware efficiency is an active field of research and depends on the physical implementation of the quantum computer, and recent work has gone into optimizing the minimization procedure of a VQE [16,5], including the choice of cost function to minimize (e.g. to target excited states [17,18]). In this work, we focus instead on studying the variational ansatzes themselves. We first focus on constructing 'geometrically efficient' variational ansatzes. Then we tailor these to target specific Hamiltonians based on a perturbative approach. This generic approach complements previous work on ansatz design targeting specific (classically hard) problems of interest in e.g. optimization [8] and quantum chemistry [4].
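To pin down a working definition of 'fundamentally digital' quantum ansatzes, we will use the following conditions (similar to those stated in [6,16,5,19]). A product ansatz on $N_u$ unitaries takes the form
$$U(\vec{\theta}) = \prod_{i=1}^{N_u} U_i(\theta_{n_i}), \tag{3}$$
where each $U_i$ has a generator $T_i$:
$$U_i(\theta) = e^{i\theta T_i}. \tag{4}$$
If $n_i > n_j$ whenever $i > j$, we call the ansatz ordered, and if each generator is a Pauli operator, $T_i \in \mathbb{P}^{N_q} := \{I, X, Y, Z\}^{\otimes N_q}$, we call the ansatz a Pauli-type ansatz.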
We take the product in Eq. (3) from right to left (i.e. $U_1(\theta_{n_1})$ acts first on the state $|\vec{0}\rangle$). As we allow $n_i = n_j$ when $i \neq j$, we may have strictly more unitaries than parameters: $N_u \geq N_p$. In the rest of the text, we will refer to Pauli-type ansatzes as fundamentally digital, since Pauli rotations can be directly implemented in a quantum circuit via the techniques of [13]. When used in a VQE, Pauli-type ansatzes also have the advantage that some derivatives of the variational energy may be obtained 'for free' [19].
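To make the bookkeeping of units, generators and shared parameters concrete, here is a small sketch of a Pauli-type product ansatz acting on a state vector. The data layout (a list of `(pauli_label, parameter_index)` pairs, first entry acting first) and the names `pauli_operator` and `product_ansatz_state` are hypothetical choices for illustration, not notation from the text. The starting state is assumed to be the all-zero computational basis state.

```python
import numpy as np
from functools import reduce

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_operator(label):
    """Tensor product of single-qubit Paulis; the leftmost letter is qubit 1 (assumed)."""
    return reduce(np.kron, [PAULIS[c] for c in label])

def product_ansatz_state(units, thetas):
    """Apply prod_i exp(i * theta_{n_i} * T_i) to |0...0>, with units[0] acting first."""
    n_qubits = len(units[0][0])
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0
    for label, n_i in units:
        t = thetas[n_i]
        # exp(i t P)|s> = cos(t)|s> + i sin(t) P|s>, valid because P @ P = identity
        state = np.cos(t) * state + 1j * np.sin(t) * (pauli_operator(label) @ state)
    return state

# Four units sharing three parameters: N_u = 4 >= N_p = 3.
units = [("YI", 0), ("IY", 1), ("YX", 2), ("XY", 2)]
thetas = [0.4, -0.2, 0.9]
psi = product_ansatz_state(units, thetas)
print(np.round(psi, 4))
```

Applying each rotation through the identity $e^{i\theta P} = \cos\theta + i \sin\theta\, P$ avoids a general matrix exponential; this relies only on Pauli operators squaring to the identity.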

Variational manifolds
Although tailoring a VQE to a Hamiltonian is essential for its success [6], interesting statements may be made about the variational ansatz prior to fixing such a target, by focusing on the manifold of states it explores.
We note that, despite being a 'manifold generated by unitary rotations', $\mathcal{M}(U, |\vec{0}\rangle)$ does not have the structure of a Lie group. This is because we only apply $U$ once to create the variational state; a state $U(\vec{\theta})U(\vec{\theta}')|\vec{0}\rangle$ may not correspond to any state $U(\vec{\theta}'')|\vec{0}\rangle$ (and most often will not). If $U$ is a product ansatz, one can define a Lie group $\mathcal{L}(U) \subset \mathrm{U}(2^{N_q})$ from the set of generators $T_i$. The manifold $\mathcal{L}(U)|\vec{0}\rangle$ then contains $\mathcal{M}(U, |\vec{0}\rangle)$ as a submanifold, though it is almost always larger. Indeed, when $e^{i\theta T_i}$ defines a universal gate set, $\mathcal{L}(U) = \mathrm{U}(2^{N_q})$ and $\mathcal{L}(U)|\vec{0}\rangle$ is the entire set of $N_q$-qubit states, which is not terribly informative about the structure of $\mathcal{M}(U, |\vec{0}\rangle)$.
As a rough guide, the bigger the variational manifold the better; simply adding more manifold to an ansatz can never shift it further from the target ground state. However, measuring the size of a variational manifold is made somewhat difficult by dimensionality concerns. The (real) dimension $D_{\mathcal{M}(U,|\vec{0}\rangle)}$ of $\mathcal{M}(U, |\vec{0}\rangle)$ is at most $N_p$, but it may not achieve this upper bound, and $\mathcal{M}(U, |\vec{0}\rangle)$ may contain boundary regions of lower dimension. (Curiously, the minimal subspace of $\mathbb{C}^{2^{N_q}}$ containing $\mathcal{M}(U, |\vec{0}\rangle)$ may be of much higher dimension than $N_p$.) As $\mathcal{M}(U, |\vec{0}\rangle)$ inherits a metric from $\mathbb{C}^{2^{N_q}}$, one can use this to define a Borel measure $d|\psi\rangle$, and thus define the area of the manifold: When the map $\vec{\theta} \to |\psi(\vec{\theta})\rangle$ is invertible on some range of parameters, its Jacobian $J$ is full-rank, and the manifold area may be calculated as However, when evaluating this integral one must take care to avoid double-counting points $\vec{\theta} \neq \vec{\theta}'$ when $|\psi(\vec{\theta})\rangle = |\psi(\vec{\theta}')\rangle$.
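The statement that the manifold dimension is at most $N_p$, and may fall below it, can be probed numerically through the rank of the Jacobian of $\vec{\theta} \to |\psi(\vec{\theta})\rangle$. The sketch below does this for the YYX toy ansatz of Example 2 using central finite differences. The function names, the step size `h` and the rank tolerance `tol` are illustrative assumptions, and a finite-difference rank is only a numerical indication, not a proof.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def rot(theta, pauli):
    """exp(i*theta*P) for an operator P that squares to the identity."""
    return np.cos(theta) * np.eye(pauli.shape[0]) + 1j * np.sin(theta) * pauli

def yyx_state(theta):
    """YYX ansatz state of Eq. (1); Q1 is the left tensor factor (assumed)."""
    t1, t2, t3 = theta
    ket00 = np.array([1, 0, 0, 0], dtype=complex)
    return rot(t3, np.kron(Y, X)) @ rot(t2, np.kron(I2, Y)) @ rot(t1, np.kron(Y, I2)) @ ket00

def tangent_rank(state_fn, theta, h=1e-6, tol=1e-6):
    """Rank of the real Jacobian d|psi>/d(theta), estimated by central differences."""
    theta = np.asarray(theta, dtype=float)
    columns = []
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = h
        d = (state_fn(theta + step) - state_fn(theta - step)) / (2 * h)
        columns.append(np.concatenate([d.real, d.imag]))  # embed C^4 into R^8
    return np.linalg.matrix_rank(np.column_stack(columns), tol=tol)

generic_rank = tangent_rank(yyx_state, [0.3, 0.2, 1.1])
special_rank = tangent_rank(yyx_state, [0.3, np.pi / 4, 1.1])
print(generic_rank, special_rank)
```

At the generic point the three tangent vectors are independent, while at $\theta_2 = \pi/4$ the tangent vectors of $\theta_1$ and $\theta_3$ become parallel, which illustrates how lower-dimensional regions can appear inside the manifold.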

Stabilizer ansatzes
Clearly the largest space that can be spanned by any variational ansatz is the entire Hilbert space. The minimal number of (real) parameters required to achieve this spanning is $2(2^{N_q} - 1)$, and it is an interesting question whether this may be provably achieved. In this section we answer this question in the affirmative, constructing a class of ansatzes from sequential layers of $n$-qubit stabilizer groups [20] for $n = 1, \ldots, N_q$ (defined in App. A). Although such a construction has impractically large overhead, one may use it as a base to generate tractable variational ansatzes with the methods developed in Sec. 4 and Sec. 5.

Definition 7. A stabilizer ansatz $(U, |\vec{0}\rangle)$ on $N_q$ qubits is constructed by choosing for each $n = 1, \ldots, N_q$:
The definition above allows for any choice of the $[n-1, n-1]$ stabilizer groups $S^{(n)}$, including ones with non-commuting elements between different $S^{(n)}$. However, we use the following prototypical example throughout the rest of this text.
Example 8. The quantum combinatorial ansatz, or QCA, is a stabilizer ansatz with
A compressed circuit for the quantum combinatorial ansatz on 3 qubits is given in Fig. 2.
Theorem 9. A stabilizer ansatz $(U, |\vec{0}\rangle)$ spans the entire Hilbert space of $N_q$-qubit states with the minimal number of parameters.
Proof. That the number of parameters is minimal may be immediately calculated, We then prove that the ansatz spans the entire Hilbert space by induction. The stabilizer group $S^{(n)}$ gives a basis $|p\rangle$ for the $(n-1)$-qubit Hilbert space. Then, as [R This sends the state $|p\rangle|s_n\rangle$ to the state where the angles $\theta^{n}_{p,j}$ are given by the following linear transformation: This is the Hadamard-Walsh transformation, which is invertible, so the $\theta^{n}_{p,j}$ can now be treated as independent parameters. On the other hand, our choice of $R^{(n)}_j$ explicitly takes the starting state $|s_n\rangle$ on qubit $n$ to any state on the Bloch sphere. This implies that if we have the ability to create an arbitrary $(n-1)$-qubit state $|\Psi^{(n-1)}\rangle$, then $U^{(n)}|\Psi^{(n-1)}\rangle|s_n\rangle$ may be tuned to achieve any state of the form which describes an arbitrary $n$-qubit state. This then completes the proof of coverage by induction, as $U^{(1)}|s_1\rangle$ covers the entire Bloch sphere.
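The invertibility of the Hadamard-Walsh transformation used in the proof can be checked directly. The sketch below builds the standard Sylvester-type matrix and verifies that it is its own inverse up to a factor $2^m$. The exact form and normalization of the transformation in the proof are not reproduced here, so the function name `walsh_hadamard` and the mapping between two hypothetical angle vectors are assumptions for illustration only.

```python
import numpy as np

def walsh_hadamard(m):
    """Sylvester construction of the 2^m x 2^m Walsh-Hadamard matrix (entries +1/-1)."""
    h = np.array([[1]])
    for _ in range(m):
        h = np.block([[h, h], [h, -h]])
    return h

H = walsh_hadamard(3)
H_inv = H / H.shape[0]          # H @ H = 2^m * identity, so the inverse is H / 2^m

# Round trip between two hypothetical sets of angles related by the transform.
rng = np.random.default_rng(0)
independent_angles = rng.uniform(-np.pi, np.pi, size=H.shape[0])
transformed = H @ independent_angles
recovered = H_inv @ transformed
print(np.allclose(recovered, independent_angles))
```

Since the map is a bijection on $\mathbb{R}^{2^m}$, either set of angles may be treated as the independent one, which is the property the proof relies on.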

Children ansatzes and their construction
The cost of implementing a product VQE grows polynomially in both the number of units N u (as this dictates the circuit size) and the number of parameters N p (as this dictates the size of the optimization problem). Thus, an ansatz that covers the entire Hilbert space is too expensive to be of use; one must use it to construct child ansatzes of a manageable size.

Definition 10.
A product ansatz $(U', |\vec{0}\rangle)$ is a child ansatz of a parent product ansatz $(U, |\vec{0}\rangle)$ when each unit $U'_i$ of $U'$ also appears in $U$.
This definition is operational rather than fundamental; the variational manifold of a child ansatz is not necessarily a submanifold of the parent ansatz' variational manifold. However, one expects that these children ansatzes will still inherit some properties of the parent. In particular, we expect that a parent ansatz that spans as large a part of the Hilbert space as possible will lead to children ansatzes that are similarly large.

Ansatz compression and hierarchical construction
An obvious method to construct a child ansatz from a parent is to simply get rid of individual units or parameters:
Definition 11. Given a product ansatz $(\prod_j U_j(\theta_{n_j}), |\vec{0}\rangle)$, one may remove a parameter $\theta_{n_i}$ to obtain the child ansatz $(\prod_{n_j \neq n_i} U_j(\theta_{n_j}), |\vec{0}\rangle)$, or fix a parameter $\theta_{n_i} = c\,\theta_{n_j}$ with $c \in \mathbb{R}$ to obtain the child ansatz $(\prod_l U'_l(\theta_{m_l}), |\vec{0}\rangle)$, where $m_i = n_j$, $m_l = n_l$ for $l \neq i$, and $T'_l = cT_l$ whenever $n_l = n_i$.
Parameter fixing may be considered strictly more general than unit removal, as fixing $\theta_{n_i} = 0 \cdot \theta_{n_j}$ produces the same variational manifold as removing $\theta_{n_i}$. However, unit removal reduces both $N_u$ and $N_p$, while parameter fixing does not reduce the resulting circuit length.
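The two operations of Definition 11 can be sketched on a list-based description of a Pauli-type product ansatz. The representation below (units as `(pauli_label, parameter_index, scale)` triples, with `scale` multiplying the generator) and the function names are hypothetical. The example checks the remark above: fixing with $c = 0$ yields the same state as removal, while keeping the unit in the circuit.

```python
import numpy as np
from functools import reduce

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def ansatz_state(units, thetas):
    """Apply exp(i * scale * theta_{n} * P) for each unit in order to |0...0>."""
    state = np.zeros(2 ** len(units[0][0]), dtype=complex)
    state[0] = 1.0
    for label, n, scale in units:
        p = reduce(np.kron, [PAULIS[c] for c in label])
        angle = scale * thetas[n]
        state = np.cos(angle) * state + 1j * np.sin(angle) * (p @ state)
    return state

def remove_parameter(units, n_i):
    """Drop every unit controlled by parameter n_i (reduces N_u and N_p)."""
    return [u for u in units if u[1] != n_i]

def fix_parameter(units, n_i, n_j, c):
    """Impose theta_{n_i} = c * theta_{n_j}: rebind the unit and rescale its generator."""
    return [(lab, n_j, s * c) if n == n_i else (lab, n, s) for lab, n, s in units]

parent = [("YI", 0, 1.0), ("IY", 1, 1.0), ("YX", 2, 1.0)]
thetas = [0.3, -0.7, 1.1]
removed = remove_parameter(parent, 2)
fixed_to_zero = fix_parameter(parent, 2, 0, 0.0)
print(len(removed), len(fixed_to_zero))
```

Here `thetas` keeps its original length after removal; the unused entry is simply never read, which keeps parameter indices stable across parent and child.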
Alternatively, one may construct child ansatzes using a bottom-up approach: Given a product ansatz $(\prod_j U_j(\theta_{n_j}), |\vec{0}\rangle)$, one may construct a priority
Figure 2: A circuit for the QCA on 3 qubits. For simplicity, we label each circuit element $U_i(\vec{\theta})$ by the tensor factors of its generating Pauli operator $T_i$ ($=: R^{(n)} S$ in Eq. (9)) on each qubit. For example, the label XXX corresponds to the rotation $e^{i\theta^{3}_{XX,0} XXX}$. This compression may be expanded as shown in Fig. 1 using the methods of [13]. For $N_q$ qubits, the QCA contains $2(2^{N_q} - 1)$ gates and is proven to cover the entire Hilbert space (Theorem 9). In a practical application, the QCA is to be reduced to polynomial size via the hierarchical approach outlined in Sec. 5. Note that the order of gate multiplication in the QCA does not imply the order of gate importance in the hierarchical reduction scheme of Sec. 5. For instance, consider an application of the displayed QCA circuit to the open transverse-field Ising chain (Sec. 6). In this case, the two gates preferred in the reduction are those generated by the Paulis XYI and IXY, followed by the one generated by XIY (cf.
The two methods described above may be combined if desired. Subsequent generations of ansatzes will trade off a lower implementation cost against a smaller variational manifold. We now focus on methods to optimize this balance. We first demonstrate how one may use unit removal and parameter fixing to force a large VQE to respect symmetry constraints on the system. Following this, we take a rigorous perturbative approach to construct priority lists for a given target Hamiltonian.

Compression over symmetries
One may often restrict the ground state of a system by symmetries of the Hamiltonian; that is, operators $S$ that commute with $H$. When this is true, all eigenstates of $H$ may be chosen to be eigenstates of $S$. This is particularly relevant in electronic systems where the particle number $\sum_i Z_i$ or parity $\prod_i Z_i$ is conserved. The symmetry is enforced on all states in a variational ansatz $(U, |\vec{0}\rangle)$ when $|\vec{0}\rangle$ is an eigenstate of $S$, and $[U(\vec{\theta}), S] = 0$ for all choices of the parameters $\vec{\theta}$. This in turn requires for an ordered product ansatz
When a symmetry is not respected by a variational ansatz, one may choose to either remove or fix the offending terms (see [21] for an alternative approach). Removal of generators that do not respect a given symmetry is simplest, but may be too restrictive for our purposes. One may fix an ordered product ansatz to obey a symmetry that is broken by a set of commuting generators $\{T_{M_0}, T_{M_0+1}, \ldots, T_{M_1}\}$. To do this, one needs to solve the system of linear equations and fix $c_n\theta_n = c_m\theta_n$ for $N \leq n, m \leq M$. This requires fixing all parameters between $N$ and $M$, which in turn might require rearranging the original ansatz to place specific units next to each other.
A very simple symmetry to enforce in a problem is the (antiunitary) complex conjugation operator $K$, $Ki = -iK$. (This symmetry is respected whenever the Hamiltonian is purely real.) As we have defined our generators $T_i$ with an imaginary unit, $U_i = e^{i\theta_{n_i} T_i}$ commutes with $K$ when $T_i$ anti-commutes with $K$. (For example, for a single qubit, the rotation $e^{i\theta Y}$ rotates between the real eigenstates of the real $X$ and $Z$ Pauli operators.)
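For Pauli-type ansatzes, both checks discussed above reduce to simple counting on Pauli labels. A rotation $e^{i\theta T}$ commutes with a Pauli symmetry $S$ exactly when $T$ commutes with $S$, and $T$ anti-commutes with $K$ exactly when it is purely imaginary, i.e. contains an odd number of $Y$ factors. The sketch below implements these two tests; the function names are hypothetical, and unit removal (rather than parameter fixing) is the only compression shown.

```python
from itertools import product

def paulis_commute(a, b):
    """Two Pauli strings commute iff they differ (both non-identity) on an even number of qubits."""
    clashes = sum(1 for p, q in zip(a, b) if p != "I" and q != "I" and p != q)
    return clashes % 2 == 0

def anticommutes_with_conjugation(label):
    """A Pauli string is purely imaginary (T* = -T) iff it contains an odd number of Y."""
    return label.count("Y") % 2 == 1

def compress_by_removal(generators, symmetry):
    """Keep only generators whose rotations commute with a Pauli symmetry operator."""
    return [g for g in generators if paulis_commute(g, symmetry)]

two_qubit_paulis = ["".join(p) for p in product("IXYZ", repeat=2)]

# Generators whose rotations exp(i*theta*T) are real matrices (respect K).
real_rotation_generators = [p for p in two_qubit_paulis if anticommutes_with_conjugation(p)]
print(real_rotation_generators)

# Generators respecting the two-qubit parity operator ZZ (identity excluded).
parity_preserving = compress_by_removal(two_qubit_paulis[1:], "ZZ")
print(parity_preserving)
```

The three generators $Y_1$, $Y_2$ and $Y_1X_2$ of the YYX toy ansatz all appear in `real_rotation_generators`, in line with its rotations being real.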

Example 13. The YYX toy ansatz is the compression of the QCA stabilizer ansatz for two qubits over K. It thus spans the entire Hilbert space of 2-qubit states with real coefficients (which matches the calculation of its variational area).
In App. B, we give another example of a symmetry-compressed Pauli-type ansatz: the fermionic unitary coupled cluster ansatz.

Size-extensivity of a variational ansatz
To show beyond-classical performance, we desire our variational quantum algorithms to be able to produce strongly entangled states, inaccessible to a classical computer. For this, we would like the VQE ansatz to represent quantum correlations in a maximally compact manner. To achieve this, we are guided by the idea of size-extensivity. The notion of size-extensivity has its origins in strongly-correlated physics, and is formalized there by the linked cluster theorem [10]. The rough notion is that: (1) if a computation treats two uncoupled systems together, it should converge to the same solution as when it treats them independently, and (2) the only complexity one should be adding to the solution of coupled systems is that which is minimally demanded. Formalizing this idea requires somewhat heavy machinery; we give a formal definition later in the text (Def. 23) and now put forward the following (weaker) statement as an informal definition.

Definition 14. (informal) Consider a variational ansatz $(U, |\vec{0}\rangle)$ for a Hamiltonian $H$ on a system $S$, and an arbitrary (disjoint) partition
if H other is reduced to 0. In (18), each U ( θ i ) acts only on system S i (i.e. the coefficients of any part of the ansatz U that acts outside of S i are set to 0).
In the language of Def. 14, the stronger statement of Def. 23 is needed to treat the case where {S i } together form a connected system, but some pairs (S k , S l ) are mutually separated (e.g. because of spatial locality). It appears that in this case, a variational ansatz is efficient if it tends to introduce more correlations between less separated subsystem pairs (S k , S l ). However, this heuristic needs to be re-stated more rigorously. In Def. 23, we provide such a rigorous formulation and apply it in an explicit construction of size-extensive ansatzes.

Perturbative construction for digital size-extensive ansatzes
We now propose a perturbative approach for the construction of digital size-extensive ansatzes. We formulate it in terms of a gate hierarchy list $(U_1, \ldots)$ derived from a large parent ansatz $(U, |\vec{0}\rangle)$. To decide on the hierarchy list, we split the system Hamiltonian $H$ into the non-interacting part $H_0$ and the coupling $JV$ ($\|H_0\|, \|V\| \sim 1$):
$$H = H_0 + JV.$$
To allow for analytical treatment we consider the weak coupling limit, $J \ll 1$. In this limit, the overlap between the true ground state $|E_0\rangle$ and unperturbed excited states $|E^0_j\rangle$ is exponentially small in the number of applications of $V$ required to couple $|E^0_j\rangle$ to the unperturbed ground state $|E^0_0\rangle$. We may rewrite the non-interacting part $H_0$ via a unitary transformation as which ties each $|E^0_j\rangle$ to a computational basis state $|\vec{s}\rangle$ If we can further tie each state $|\vec{s}\rangle$ to one or a few variational units $U_i(\theta_i)$, we can construct a hierarchy list of these $U_i(\theta_i)$ based on the approximate magnitude of $|\langle\vec{s}|E_0\rangle|$. The resulting hierarchy list is to be used in the VQE procedure for the original, potentially strongly coupled Hamiltonian $H$ ($J = O(1)$).
Performing this construction in a size-extensive way runs into a challenge which we call 'back-action'. Namely, the action of any unit $U_i(\theta_i)$ on the state $\prod_{j<i} U_j(\theta_j)|\vec{0}\rangle$ may be very different from the action of $U_i(\theta_i)$ on the starting state. In particular, one could imagine this action generating an undesired term in the variational wavefunction which must be cancelled by later rotations. As we will show, one can deal with this back-action while retaining size-extensivity. To achieve this, in the rest of this section we will expand the target equality, assuming that $|\psi(\vec{\theta})\rangle$ is given by a digital (i.e., Pauli-type) ansatz. We will do so in terms of a Pauli decomposition of the perturbation and then we will equate terms based on the order of their polynomial dependence on each $J_i$. On the left-hand side (Sec. 5.1), we will use a Dyson expansion, with a convenient diagrammatic representation. On the right-hand side (Sec. 5.2) we will use a Taylor expansion of the exponential operators. We will show that a single condition (Def. 21) on the parent ansatz is sufficient to automatically cancel all undesired back-action. Then, we will show that an additional condition (Def. 24) causes the back-action terms to precisely cancel out any need for entangling circuits between disconnected regions (Theorem 26). This ensures the desired feature of size-extensivity, thus providing the digital quantum version of the linked-cluster theorem [10]. The QCA ansatz of Example 8 will be seen to satisfy the above conditions, and therefore gives rise to a hierarchy of size-extensive digital ansatzes.
Our perturbative approach can be thought of as a digital unitary relative of the Kirkwood-Thomas expansion [22,23]. Note also that, as we intend to optimize the parameters $\vec{\theta}$ as part of the VQE, we will approximate these only to leading order in the interaction strength $J$. This makes our method potentially applicable even in the strongly correlated regime where perturbation theory breaks down.

Diagrammatic expansion of the ground state
To expand the left-hand side of Eq. (22), let us use vector notation $\vec{J}$ for the coupling terms $J_i$ (and $\vec{V}$ for the operators $V_i$). Then, let us introduce some notation that simplifies the following expressions: We wish to use this expression for both vectors of numbers (e.g. $\vec{J}$) and vectors of operators (e.g. $\vec{V}$).
In the latter case we must take care of ordering; as before, we assume that the product runs right-to-left. As Pauli operators either commute or anticommute, rearranging these products simply requires one to keep track of minus signs. This may be assisted by the following definition and a relative sign Then, as Pauli operators map computational basis states to computational basis states, $\vec{V}^{\cdot\vec{k}}|\vec{0}\rangle$ is an eigenstate of $H_0$, with energy Let us now expand the ground state as a Taylor series in $\vec{J}$: Following a standard Dyson expansion (details in App. C), we observe that
Lemma 16. The vectors $|\Psi_{\vec{k}}\rangle$ take the form where $C_{\vec{k}}$ is a real number.
1 We take the natural numbers $\mathbb{N}$ to include 0.
To find the values of the coefficients $C_{\vec{k}}$, we first develop a perturbative expansion for a ground state $|\tilde{E}_0\rangle$ with a special normalization condition $\langle\vec{0}|\tilde{E}_0\rangle = 1$, The states $|\Psi_{\vec{k}}\rangle$ then satisfy (see App. C): where $\tilde{C}_{\vec{k}}$ is a real number. In particular, if $\vec{\delta}_\beta$ is the unit vector with a 1 in the $\beta$ index, $\tilde{C}_{\vec{k}} = \delta_{\vec{k},\vec{0}}$ if $\vec{s}(\vec{k}) = \vec{0}$, and is otherwise given by the recursive relation where $\vec{k}' < \vec{k}$ if $k'_\beta \leq k_\beta$ for all $\beta$ and $\vec{k}' \neq \vec{k}$.
To find the coefficients $C_{\vec{k}}$ of the normalized ground state, one may then expand the expression $|E_0\rangle = \langle\tilde{E}_0|\tilde{E}_0\rangle^{-1/2}|\tilde{E}_0\rangle$ in powers of $\vec{J}$, which allows one to express $C_{\vec{k}}$ in terms of the $\tilde{C}_{\vec{k}}$ obtained from (32).
We note here that we have no guarantee that the normalization constant $N = \langle\tilde{E}_0|\tilde{E}_0\rangle^{-1/2}$ behaves regularly in the thermodynamic limit $N_q \to \infty$. This is a standard breakdown of perturbation theory for the wavefunction. However, when this occurs our approach to VQE construction is still possible, and may indeed still be practical. At the stage of estimating the variational parameters $\vec{\theta}$, we will use the $\tilde{C}_{\vec{k}}$ coefficients, since they behave regularly and are more practical to calculate. As $\vec{\theta}$ will be optimized later on the quantum device, the estimation itself need not be exact.
The size-extensivity of our approach relies on an important relationship between $C_{\vec{k}}$ terms that are the combination of disconnected pieces. To formalize this notion of connectedness, we introduce some terminology: The set of qubits on which at least one activated coupling $V_i$ acts non-trivially is called the support of $\vec{k}$.
Then the connectedness of the contribution $C_{\vec{k}}$ is defined as follows: such that the respective supports of $\vec{k}_A$ and $\vec{k}_B$ do not share any qubits. This implies, but is not equivalent to, the following statement: The disconnected contributions $C_{\vec{k}}$ obey the following special property (proven in App. D).
This idea of connectedness of contributions may be described in a graphical representation of the product of operators $\vec{V}^{\cdot\vec{k}}$:
Definition 20. Let $\vec{V}$ define the order of a decomposition of the perturbation $\vec{J} \cdot \vec{V}$ to a non-interacting Hamiltonian $H_0$. A perturbative diagram for a vector $\vec{k}$ is a bipartite graph with one circular vertex for each qubit, and $k_\beta$ square vertices for each interaction $V_\beta$. We draw edges between each square vertex and the qubits that the corresponding $V_\beta$ term acts non-trivially on, and color the edge to qubit $i$ blue,
Each circular vertex is then coloured black or white if it is connected to by an odd or even number of coloured edges, respectively.
A contribution $C_{\vec{k}}$ is connected if all square vertices in the perturbative diagram are connected$^2$. In Fig. 3, we show some examples of connected and disconnected perturbative diagrams. Diagrams also allow one to read off $\vec{s}(\vec{k})$ ($s_i(\vec{k}) = 0$ when the corresponding vertex is white), and $\Gamma(\vec{k}) \bmod 2$ (being the number of red lines modulo 2). (The rest of $\Gamma(\vec{k})$ depends on the order in which the operations $V_i$ are applied, which is not captured in the perturbative diagrams.)
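Whether a contribution is connected depends only on which couplings are activated and on which qubits they touch, so it can be tested with a small graph search. In the sketch below each interaction $V_\beta$ is represented by the tuple of qubits it acts on non-trivially, and $\vec{k}$ by a list of non-negative integers. This encoding, the function names, and the convention that $\vec{k} = \vec{0}$ counts as connected are assumptions for illustration. The edge colouring of the diagrams is not modelled.

```python
def support(interactions, k):
    """Qubits touched by at least one activated coupling (k_beta > 0)."""
    return {q for qubits, k_beta in zip(interactions, k) if k_beta > 0 for q in qubits}

def is_connected(interactions, k):
    """True if all activated square vertices lie in one connected component."""
    active = [set(qubits) for qubits, k_beta in zip(interactions, k) if k_beta > 0]
    if not active:
        return True  # convention assumed here for the empty diagram
    component = set(active[0])
    remaining = active[1:]
    grew = True
    while grew:
        grew = False
        for qubits in list(remaining):
            if component & qubits:      # shares a qubit with the component
                component |= qubits
                remaining.remove(qubits)
                grew = True
    return not remaining

# Nearest-neighbour couplings on an open 4-qubit chain (qubits labelled 0..3).
chain = [(0, 1), (1, 2), (2, 3)]
print(is_connected(chain, [1, 1, 0]))   # couplings on (0,1) and (1,2) share qubit 1
print(is_connected(chain, [1, 0, 1]))   # couplings on (0,1) and (2,3) share no qubit
```

Repeating a coupling ($k_\beta > 1$) adds square vertices attached to the same qubits, so it never changes connectedness in this encoding.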

Taylor expansion of the variational ansatz
We now consider the expansion of the right-hand side of Eq. (22). In keeping with the previous subsection, we wish to do this in terms of the individual perturbations $J_i$. Let us expand each coefficient $\theta_i$ in a power series over all interaction terms $J_i$ where the shorthand vector power notation was defined in Eq. (24). This may be substituted into the variational ansatz $(U, |\vec{0}\rangle)$ where we added the brackets to emphasize the ordering of the product over $i$. Now, we take the Taylor series of the exponentials in Eq. (37), obtaining We will eventually wish to rearrange this product to identify all terms that share the same power of each $J_i$, that is, those that share the same $\vec{J}^{\cdot\vec{k}}$. This requires first expanding our product over sums to a sum over products (pulling the sum over integers $g$ in front of the products over $\vec{k}$ and $i$). Each term in the resulting sum will have a unique product of powers of the different $\theta$ To put (39) in a simpler form, we define: which allows us to rewrite the sum as
2 The circular vertices, corresponding to qubits, need not be connected, as a connected contribution need not act on all qubits.

Equating ansatz and perturbative terms
Our plan is now to solve for $\theta^{(\vec{k})}_i$ by comparing $|\psi(\vec{\theta})\rangle$ from Eq. (43) to the perturbative series for $|\Psi(\vec{J})\rangle$ from Eq. (28). We will equate the contributions coming from different orders of perturbation theory (PT), and those proportional to the same computational basis state. (The vectors $\vec{K}(f)$ and $\vec{N}(f)$ allow us to identify which terms need to be equated.) This will result in equations that are linear in the coefficients $C_{\vec{k}}$ and $\Theta(f)$. Due to the structure of $\Theta(f)$, these equations will be highly nonlinear in $\theta^{(\vec{k})}_i$. However, under certain conditions (Def. 21 and Def. 24), we find that these equations for $\theta^{(\vec{k})}_i$ may be solved iteratively, and that many coefficients will vanish exactly. This will yield a class of ansatzes which are also size-extensive, the technical definition of which we give in Def. 23. For such ansatzes, we will have a guarantee that a relatively compact circuit is capable of reproducing the perturbative series for $|\Psi(\vec{J})\rangle$ up to a given PT order $k$. These circuits will have a relatively small (polynomial in $N_q$ at fixed PT order $k$) number of free parameters when used as a VQE, as this coincides with the number of leading-order connected diagrams up to order $k$.
Equating the action of the Taylor-expanded $U(\vec{\theta})$ (Eq. (43)) on the starting state $|\vec{0}\rangle$ to the expansion of the ground state $|E_0\rangle$ (Eq. (28)) and separating in orders of $\vec{J}$ yields the form This may be further separated by taking the inner product with different computational basis states to give the equations Eqs. (46) contain what we call the back-action terms.
These are undesirable; if one fixes the $\theta^{(\vec{k})}_i$ values one at a time, then any non-zero term appearing in Eqs. (46) will need to be cancelled out by fixing some other $\theta^{(\vec{k})}_j$ at a later point. However, these terms may be avoided for a large class of parent ansatzes: Note that a generating ansatz requires at least sufficient parameters to span the entire Hilbert space; however, it remains unclear whether a generating ansatz does span the entire Hilbert space. Instead, we are interested in generating ansatzes here as they avoid undesired back-action
Proof. Eq. (45) may be rewritten as We then use this equation to fix the left-hand side, being an equation of free $\Theta(f)$ terms. If this is done in ascending order in $|\vec{k}|$, one can check that all $\Theta(f)$ terms on the right-hand side at each $\vec{k}$ will have been fixed previously, implying that this fixing is well-defined. Then, one notes that for any odd $m$, which implies that contributions from linear combinations of the fixed components will never appear in Eq. (46).
The above implies that the (strictly real) term $C_{\vec{k}}$ from each perturbative diagram contributes only to $\theta^{(\vec{k})}_{\vec{s}(\vec{k}),a(\vec{k})}$. Then, by definition, we have and as Pauli operators are either entirely real or entirely imaginary, this extends to any computational basis state $|\vec{s}\rangle$ This implies that for any function $f$ such that $f_{\vec{s},a}(\vec{k}) = 0$, unless $\vec{s} = \vec{s}(\vec{k})$, $a = a(\vec{k})$ we have and so the right-hand side of Eq.

18).
A Pauli-type ansatz satisfying this definition will satisfy Def. 14 whenever the perturbative expansion above converges. To see this, note that when the perturbative expansion converges the solution to Eqs. 45 will provide the ground state exactly. Then, consider a Hamiltonian that does not couple two systems S i and S j , and a term T s,a in our ansatz that does couple S i and S j . One can see that whenever s = s( k), a = a( k) for some k that k will be disconnected, and so θ s,a = 0 at all orders of k by Def. 23.
We now have the machinery to present a condition for our ansatz to be size-extensive that just relates the ansatz terms T i to the perturbation terms V i .

Definition 24. A generating Pauli-type ansatz is matched to a perturbation JV if
whenever ( k, f ) and ( k , f ) act non-trivially on disconnected parts of the system.

Theorem 26. A perturbative hierarchy constructed from a Pauli-type ansatz via Eqs. 45, that is matched to a perturbation JV , is size-extensive.
Proof -By Lemma 19, we have that C k = C k A C k B . Inserting Eq. (45), we find As disconnected parts of k, either k A,i = 0 or k B,i = 0 for any i, implying f A ( k ) = 0 or f B ( k ) = 0 for all k in the above sum. From this we may write (54) Combining this with the definition of a matched ansatz obtains It remains to check that all f : in which case the right-hand side of Eq. (47) cancels, giving the required result. This may be seen by induc- for disconnected k AB with | k AB | < K, and thus Θ( f AB ) = 0. This result can be seen as the digital quantum cousin of the linked-cluster theorem [10].

The perturbative construction
Following the above, we can construct a hierarchy of the $T_{\vec s,a}$ by estimating the corresponding value of $\theta_{\vec s,a}$ and placing them in order. We do not need to know the precise values of $\theta_{\vec s,a}$, as these will be optimized as part of the VQE. Instead, we plan to estimate only the largest contributions to each $\theta_{\vec s,a}$. Under the assumption that $J_i \ll h_n$ for all interaction terms $i$ and all qubits $n$, we expect the largest contributions to come from those (connected) $C_{\vec k}$ with the smallest possible $|\vec k|$. This may be read off immediately from the perturbative diagrams themselves.

Definition 27. A connected perturbative diagram $D$ for a vector $\vec k$ is a sub-leading diagram to a diagram $D'$ for a vector $\vec k'$ if:
• $D$ and $D'$ have identically coloured vertices (implying $\vec s(\vec k) = \vec s(\vec k')$).
• $D$ and $D'$ have the same number of red edges modulo 2 (implying $a(\vec k) = a(\vec k')$).

A diagram $D$ is leading if it is not a sub-leading diagram to any $D'$.
Note that multiple leading diagrams may exist for a single parameter $\theta_{\vec s,a}$. We now wish to construct a perturbative hierarchy by drawing all leading diagrams with $|\vec k| < K$ interaction vertices (for some sufficiently large $K$), and then ordering the corresponding $T_{\vec s,a}$ by the leading-order contributions to $\theta_{\vec s,a}$ that we obtain via Eq. (47). However, this calculation requires the normalized coefficients $C_{\vec k}$, which in turn require computing the perturbative series for the normalization constant $\mathcal{N}$. To avoid this cumbersome normalization procedure, and also to simplify Eq. (47), we suggest approximating $\theta_{\vec s,a}$ by the estimate $\tilde\theta_{\vec s,a}$ of Eq. (57). We expect that typically $\tilde\theta_{\vec s,a} < \tilde\theta_{\vec r,b} \leftrightarrow \theta_{\vec s,a} < \theta_{\vec r,b}$, which implies that this approximation should preserve the perturbative hierarchy. We now have all the machinery required to define our perturbative hierarchy.

Definition 28. Let {T s,a } be the generators for a matched, generating variational ansatz for a Hamiltonian
and if $\tilde\theta_{\vec s,a} = \tilde\theta_{\vec r,b}$, we choose the ordering of $T_{\vec s,a}$ and $T_{\vec r,b}$ at random.
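The explicit calculation of the $\tilde\theta_{\vec s,a}$ variables is quite time-consuming. As a shortcut, we note that $\tilde\theta^{(\vec k)}_{\vec s,a}$ scales as $\vec J^{\cdot\vec k}$, which, when $J_i \ll 1$, typically dominates any combinatorial terms. To formalize this, let us define $J_{\vec s,a} = \sum_{\text{leading } \vec k,\ \vec s(\vec k)=\vec s,\ a(\vec k)=a} \vec J^{\cdot\vec k}$, and we suggest saving on calculation by assuming $\theta_{\vec s,a} < \theta_{\vec r,b}$ when $J_{\vec s,a} < J_{\vec r,b}$.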

Application: transverse-field Ising model
In this section, we demonstrate the construction of a variational hierarchy and study the resulting VQE performance on a target system. As a simple target example, we take the 1-dimensional transverse-field Ising model (TFIM): $H = -h\sum_{i=1}^{N_q} Z_i + J\sum_{i=1}^{N_q-1} X_i X_{i+1}$. This system is a well-known prototype for condensed matter systems, being a non-interacting set of spins at $J = 0$, an Ising chain at $h = 0$, and demonstrating a quantum phase transition at $h = J$. For our example, we consider the $h \gg J > 0$ regime, and construct a perturbative hierarchy around $J = 0$, using the QCA as a parent ansatz. The non-interacting ground state may be immediately identified as the computational basis state $|\vec 0\rangle$ with energy $-hN_q$, which we use as the starting state of our ansatz. Non-interacting excited states $|\vec s\rangle$ have energy $(2|\vec s| - N_q)h$.
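The non-interacting spectrum quoted above can be checked numerically with a minimal sketch. It assumes an open chain and the sign convention $H = -h\sum_i Z_i + J\sum_i X_i X_{i+1}$ (chosen so that $|\vec 0\rangle$ has energy $-hN_q$ at $J = 0$); the helper names `embed` and `tfim` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def embed(site_ops, n):
    """Kronecker product with the given single-qubit operators on an n-qubit chain."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, site_ops.get(q, I2))
    return out

def tfim(n, h, J):
    """Open-chain TFIM, H = -h sum_i Z_i + J sum_i X_i X_{i+1} (assumed sign convention)."""
    H = np.zeros((2**n, 2**n))
    for q in range(n):
        H -= h * embed({q: Z}, n)
    for q in range(n - 1):
        H += J * embed({q: X, q + 1: X}, n)
    return H

n, h = 4, 1.0
diag = np.diag(tfim(n, h, 0.0))  # H is diagonal at J = 0
e_zero = diag[0]                 # energy of |0000>
# A basis state with |s| flipped spins should have energy (2|s| - N_q) h.
energies_ok = all(
    np.isclose(diag[idx], (2 * bin(idx).count("1") - n) * h) for idx in range(2**n)
)
e_ground_weak = np.linalg.eigvalsh(tfim(n, h, 0.15))[0]
```

At weak coupling the exact ground state energy lies slightly below $-hN_q$, which is the correction that the perturbative hierarchy is designed to capture.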

Example perturbative construction on four sites
To demonstrate the application of the methods developed in Sec. 5 in detail, we now construct the full perturbative hierarchy on a small chain ($N_q = 4$). This system has three perturbation terms, which we label $V_i = X_i X_{i+1}$ for $i = 1, 2, 3$. These perturbations preserve the antiunitary complex conjugation symmetry $K$, and the unitary global parity symmetry $Z_1 Z_2 Z_3 Z_4$. This reduces the required variational manifold dimension from $2^5 - 2 = 30$ to $2^3 - 1 = 7$ (both symmetries halve the Hilbert space dimension, but complex conjugation makes the phase equivalence redundant). In the QCA, this corresponds to removing all imaginary rotations (of the form $e^{i\theta X \ldots X}$), and all generators with an odd number of non-trivial terms. This removal will be automatic in the perturbative construction, as removed terms will never appear in the hierarchy, so we need only note the symmetries in case we 'run out' of terms to add to the variational ansatz$^3$. We label the seven remaining generators $T_1, \ldots, T_7$. For convenience in this small system, we will drop the stabilizer notation of Sec. 3, and write the QCA as $\prod_{j=1}^{7} \exp(i\theta_j T_j)$. (For example, in the notation of Sec. 3 we would have written $\theta_6$ as $\theta^{4}_{XII,1}$.) To construct the perturbative hierarchy, we proceed by drawing all lowest-order diagrams, and calculating the corresponding $\tilde C_{\vec k}$ contributions. In Fig. 4, we list the seven lowest-order connected diagrams in the system. This gives us the following:
1. 3 contributions at order $J$ (to $T_1$, $T_2$, and $T_3$).
2. 2 contributions at order $J^2$ (to $T_4$ and $T_5$).
3. 1 contribution at order $J^3$ (to $T_6$).
4. 1 contribution at order $J^4$ (to $T_7$).
This may then be used as an initial guess for the ordering in the perturbative hierarchy. Importantly, although $\vec k = (1, 0, 1)$ is an order-$J^2$ term satisfying $\langle\vec 0| V^{\cdot\vec k} T_7 |\vec 0\rangle \neq 0$, the corresponding diagram is disconnected (Fig. 4, bottom-left). This implies that its contribution to $\theta_7$ will be cancelled out by the contributions of $(1, 0, 0)$ and $(0, 0, 1)$ (Theorem 26), and the diagram need not be considered in our construction, as we will confirm shortly. We further note that higher-order diagrams exist, e.g. that corresponding to $\vec k = (0, 1, 2)$ (Fig. 4, bottom-right). Although these have a non-zero contribution to the actual value of the variational angles (in this case $\theta_2$), this contribution is at a higher order in $J$, so we expect it not to affect the order of the hierarchy.
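The counting of lowest-order connected diagrams on four sites can be reproduced by brute force. The sketch below is an illustration under stated assumptions, not the authors' procedure: it treats $\vec k$ as connected when its active bonds form one contiguous block of the chain, and takes $\vec s(\vec k)$ to mark the sites acted on by an odd number of $X$ operators. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

N_Q = 4              # qubits
N_BONDS = N_Q - 1    # perturbation terms V_i = X_i X_{i+1}

def flipped_sites(k):
    """s(k): site j is flipped iff an odd number of X operators act on it."""
    s = []
    for j in range(N_Q):
        left = k[j - 1] if j > 0 else 0
        right = k[j] if j < N_BONDS else 0
        s.append((left + right) % 2)
    return tuple(s)

def is_connected(k):
    """Assumption: on a chain, k is connected iff its active bonds are contiguous."""
    active = [i for i, ki in enumerate(k) if ki > 0]
    return bool(active) and active[-1] - active[0] + 1 == len(active)

def leading_orders(max_order):
    """Lowest-order connected k for every non-trivial flip pattern s."""
    best = {}
    for k in product(range(max_order + 1), repeat=N_BONDS):
        order = sum(k)
        if order == 0 or order > max_order or not is_connected(k):
            continue
        s = flipped_sites(k)
        if any(s) and (s not in best or order < best[s][0]):
            best[s] = (order, k)
    return best

best = leading_orders(4)
counts = Counter(order for order, _ in best.values())
```

Under these assumptions the enumeration finds seven flip patterns, with the fully flipped pattern first reached by a connected diagram at fourth order, since $\vec k = (1, 0, 1)$ is rejected as disconnected.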
We now check the above ordering of the perturbative hierarchy by explicit calculation of the lowest-order contributions to $\tilde\theta_j$. Applying Eq. (32) recursively, the lowest-order connected contributions can be found (noting $S_{\vec k,\vec k'} = 1$ as all $V_i$ commute). One may then calculate in turn the lowest-order approximation for the variational parameters via Eq. (57) (noting here that $\Gamma(\vec k) = 1$ for all $\vec k$ in this system).
We see that ordering terms by $J_{\vec s,a}$ reproduces the full perturbative hierarchy whenever $J < 2h$. We also note that the order-$J^2$ contribution to $\tilde\theta_7$ from $\vec k = (1, 0, 1)$ is cancelled, as required by Theorem 26. We also note that the magnitudes of $\tilde\theta_i$ are systematically smaller than the magnitudes of the corresponding perturbative terms $\vec J^{\cdot\vec k}\tilde C_{\vec k}$. This suggests that the back-action terms in QCA may have a systematic positive effect on VQE convergence.

Low-order construction for a large chain
Following the analysis of the four-site example, we expect little to no deviation between parameters of the same order in a larger chain. Indeed, all first-, second- and third-order leading diagrams are identical up to translation along the chain (Fig. 5). As the onsite and interaction strengths are uniform along the chain, this implies that the coefficients for all such diagrams are likewise equal (to lowest order). At fourth order, two separate types of diagrams exist. One corresponds to $\vec k = (1, 2, 1)$ in the four-site model, and gives the same parameter estimate ($\tilde\theta_{\vec s,a} = -J^4/(512h^4)$) to the QCA generators of the form $\{Y_i X_{i+1} X_{i+2} X_{i+3}\}$. The other was not present in the four-site model (as it requires 5 qubits); it contributes a parameter estimate of $\tilde\theta_{\vec s,a} = J^4/(128h^4)$ to QCA generators of the form $\{Y_i X_{i+4}\}$, placing these generators earlier in the perturbative hierarchy. The resulting ansatz thus needs only $5N_q - 13$ generators to reproduce the ground state with errors of order $(J/h)^5$. To obtain this level of accuracy with a classical calculation, one would in theory need to sum over all $(N_q - 1)^4$ combinations of individual perturbations. However, as clever grouping of terms (e.g. via tensor network contractions or similar) should reduce the time-cost of such a summation far below such numbers, this argument does not lead to an immediate guarantee of a quantum speedup for VQEs of this form.
Figure 5: The leading connected diagrams to fourth order on the transverse-field Ising model. Each diagram should be repeated across the entire $N_q$-qubit chain; the total number of copies of each diagram that will appear is written in the right-hand column. Diagrams are labelled by the generator $T_{\vec s,a}$ that they contribute to.
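The generator count $5N_q - 13$ follows from counting the translated copies of each leading diagram type on an open chain. The short sketch below makes this explicit; the qubit spans assigned to each diagram type are read off from the generator forms quoted above (for instance, $Y_i X_{i+4}$ spans five qubits), and the function name is hypothetical.

```python
def n_generators_to_fourth_order(n_q):
    """Count translated copies of each leading diagram type on an open n_q-qubit chain."""
    # Number of qubits spanned by each diagram type:
    # order 1, order 2, order 3, and the two distinct order-4 types.
    spans = [2, 3, 4, 4, 5]
    return sum(max(n_q - span + 1, 0) for span in spans)

counts_by_size = {n_q: n_generators_to_fourth_order(n_q) for n_q in range(4, 13)}
```

For the four-site chain the five-qubit diagram has no copies, which is why only seven generators appear there.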

Alternative hierarchies and circuit ordering
Although perturbation theory is a natural choice for developing variational hierarchies, it is not necessarily the only starting point. In the presence of strong interactions (where perturbation theory breaks down), other generator properties may provide better insight into how important they are for obtaining the ground state. Below, we study the following natural constructions of a priority list, all of which use QCA as a parent ansatz:
• pertQCA: The perturbative hierarchy from Def. 28, using QCA as the parent variational ansatz.
• 2-locQCA: A low-weight variant of pertQCA, obtained by only allowing 2-local generators (those acting non-trivially on up to 2 qubits). When more generators are desired than in the final priority list, we loop over it repeatedly.
• locQCA: A geometrically local variant of pertQCA, obtained by only allowing generators acting on nearest neighbour pairs of qubits (and again looping over the priority list if required). This is equivalent to allowing only the generators which are dictated by the first-order perturbation theory, allowing for a generalization to an arbitrary Hamiltonian.
We have so far not discussed the ordering of the units within the ansatz circuit. Two natural choices present themselves: taking the order in which the gates appear in the priority list, and taking the order in which the gates appear in the parent ansatz. However, this is only well-defined when the priority list is inherited from a parent ansatz without repetition. For the above hierarchies that require looping, we only study the former choice, and denote by an asterisk results where the latter ordering is used.
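The looping rule and the two circuit orderings can be illustrated with a short sketch. The generator labels and the function names `take_generators` and `reorder_by_parent` are hypothetical, chosen for illustration only.

```python
from itertools import cycle, islice

def take_generators(priority_list, n_params):
    """First n_params entries of the priority list, looping over it if it is too short."""
    return list(islice(cycle(priority_list), n_params))

def reorder_by_parent(selected, parent_order):
    """Reorder the selected generators to follow the parent ansatz.

    This is only well-defined when no generator is repeated.
    """
    if len(set(selected)) != len(selected):
        raise ValueError("parent ordering is only well-defined without repetition")
    position = {g: i for i, g in enumerate(parent_order)}
    return sorted(selected, key=position.__getitem__)

# Hypothetical labels for a 3-qubit toy example.
parent = ["Y1X2", "Y1X3", "Y2X3"]    # order of the units in the parent ansatz
priority = ["Y1X2", "Y2X3", "Y1X3"]  # order in the priority list

in_priority_order = take_generators(priority, 3)
in_parent_order = reorder_by_parent(in_priority_order, parent)
looped = take_generators(priority[:2], 5)  # a short list is cycled
```

Calling `reorder_by_parent` on `looped` raises an error, mirroring the restriction that the parent ordering is only available for hierarchies without repetition.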

VQE performance
We now test the performance of our variational hierarchies in different parameter regimes of the transverse-field Ising model on $N_q = 8$ sites. (Code to perform this investigation can be found at https://github.com/tarrlikh/QSA.) We take as a performance metric the relative energy error $\epsilon$ of $E_{\mathrm{VQE}}$, the energy of the converged VQE, with respect to the exact ground state energy, and plot this as we increase the number $N_p$ of parameters in the hierarchy. The hierarchy gives a natural strategy to perform the optimization: at each $N_p$, the optimized values of the previous $N_p - 1$ parameters are used as a starting guess for their new values (whilst the new parameter is initialized to 0). This approach converges much faster than re-starting each new simulation at the original value, as found previously in [5].
To focus on the performance of the ansatzes themselves, we do not include the effects of sampling noise or any experimental noise in our simulations. We first investigate the weak-coupling regime where perturbation theory holds ($J/h = 0.15$). In Fig. 6, we plot the convergence of the relative energy error $\epsilon$ as the first 30 terms from all studied hierarchies are added consecutively. At each subsequent point we reoptimize all parameters using the SLSQP algorithm, starting from the local minimum found at the previous point. We observe that all hierarchies achieve good convergence, with the exception of revQCA, and that both variants of pertQCA achieve over an order of magnitude improvement over other ansatzes after 30 terms are added. We further observe that re-ordering the gates to follow the parent ansatz (pertQCA*) is preferable, leading to another order of magnitude improvement. We are unsure of the precise reason for this improvement, but suggest it may be attributed to the relatively large area of the variational manifold inherited from the parent ansatz, that may be lost under re-ordering. The discontinuities in the plot for pertQCA, pertQCA*, and 2-locQCA correspond to the points where all gates up to a certain perturbation theory order have been included. This makes sense, as our theory predicts these points should correspond to the error decreasing from $O(J^n)$ to $O(J^{n+1})$.
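The warm-start strategy described above can be sketched on a toy problem. The example below is a minimal illustration, not the authors' implementation: it assumes a 3-qubit open chain with $H = -h\sum_i Z_i + J\sum_i X_i X_{i+1}$, a hand-picked list of three real-rotation generators, and an arbitrary choice of the order in which the exponentials act on $|\vec 0\rangle$. It uses `scipy.optimize.minimize` with `method="SLSQP"`.

```python
import numpy as np
from scipy.optimize import minimize

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for o in ops:
        out = np.kron(out, o)
    return out

n, h, J = 3, 1.0, 0.15
H = -h * sum(kron_all([Z if q == i else I2 for q in range(n)]) for i in range(n))
H = H + J * sum(kron_all([X if q in (i, i + 1) else I2 for q in range(n)])
                for i in range(n - 1))

# Assumed priority list: two first-order generators, then one second-order generator.
generators = [kron_all([Y, X, I2]), kron_all([I2, Y, X]), kron_all([Y, I2, X])]

def energy(thetas):
    """Energy of prod_j exp(i theta_j T_j)|0>, using exp(i t T) = cos(t) + i sin(t) T."""
    psi = np.zeros(2**n, dtype=complex)
    psi[0] = 1.0
    for theta, T in zip(thetas, generators):
        psi = np.cos(theta) * psi + 1j * np.sin(theta) * (T @ psi)
    return float(np.real(np.vdot(psi, H @ psi)))

e_exact = np.linalg.eigvalsh(H)[0]
errors, thetas = [], np.zeros(0)
for n_p in range(1, len(generators) + 1):
    x0 = np.append(thetas, 0.0)  # warm start: previous optimum, new parameter at 0
    res = minimize(energy, x0, method="SLSQP")
    thetas = res.x
    errors.append(abs((res.fun - e_exact) / e_exact))
```

Because each new parameter starts at zero, every optimization begins at the energy reached in the previous step, so the relative error is expected to decrease as generators are added.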
We next investigate VQE convergence in the strongly correlated regime ($J/h = 6$). We observe that all hierarchies perform worse here than previously. We attribute this to the strongly-coupled ground state being further from the starting state than the weakly-coupled ground state. Note, however, that one can obtain one of the two degenerate ground states at $h = 0$ from $|\vec 0\rangle$ via the rotation of Eq. (64), which is achievable after the first $N_q - 1 = 7$ terms of all considered hierarchies. This suggests that in all cases, the first order of the hierarchy is used to prepare this state, from which later orders perturb. Then, as perturbation theory around the strongly correlated ground state is significantly different to the perturbation theory around the non-interacting ground state, the generators we have chosen may not be optimal for this perturbation. This also explains the good performance of locQCA over the other hierarchies: by repeating local operators it ensures that it will obtain the lower orders (in $h/J$) of the true ground state.
We finally investigate the performance of our hierarchies in the critical regime ($J/h = 1$), where a transition between the strongly-correlated and weakly-correlated phases occurs in the thermodynamic limit. We observe that the relative error obtained by all ansatzes is the worst here, and that locQCA and pertQCA* behave similarly, obtaining up to an order of magnitude improvement over 2-locQCA and pertQCA. This loss of accuracy is not surprising, as we do not have a relatively cheap way of accessing any states perturbatively coupled to the ground state in the same manner as Eq. (64).

Conclusion
In this work, we have developed a diagrammatic framework for size-extensive variational quantum ansatzes, which avoids the use of Trotter-Suzuki approximation methods. We have described a large class of Pauli-generated product ansatzes demonstrably capable of spanning the entire Hilbert space with the minimum number of parameters necessary. We have demonstrated means by which one can compress ansatzes such as the above to a practical size, by a perturbative treatment of the target system, and by taking into account any symmetries that exist. To ensure the size-extensivity of the construction, we have stated and proven the digital quantum version of the linked-cluster theorem. We have tested variants of the resulting ansatzes on the transverse-field Ising model, finding that their performance in various regimes matches our expectations based on their means of construction. We observe that ansatzes that fully match the perturbation theory give a benefit in the weak coupling regime as expected. However, in the strong-coupling regime, focusing on the locality of the ansatz at the expense of perturbation theory considerations appears to be preferred.
As is well known in the field, the performance of any VQE ansatz is system dependent. Ansatzes that are derived from perturbative physical principles can be expected to perform best when perturbation theory converges well. By contrast, those founded on adiabatic principles (e.g. the variational Hamiltonian ansatz [24]) can be expected to perform best on systems with a large gap. As these two conditions are often correlated (e.g. a gap closing often corresponds to a phase transition and a breakdown of perturbation theory), a fair comparison of ansatzes based on these two principles (and with any other ansatzes) would require an extensive numerical study. This is an obvious target for future research.
We have avoided in the above any discussion of a quantum speedup for the VQEs that we have constructed in this work. To the best of our knowledge, demonstrating such a speedup remains an open and difficult question for any class of VQEs. Informally, to demonstrate a quantum speedup, one needs to obtain an estimate of the true ground state energy $E$ for an $N_q$-qubit system, within an error $\epsilon$, in time polynomial in $N_q$. This also needs to be achieved in a class of $N_q$-qubit systems for which no similar estimation is possible classically. The circuit length in a variational hierarchy grows polynomially in the number of parameters $N_p$, so it would be sufficient to show that the error $\epsilon(N_p, N_q)$ scales polynomially in $N_p$ and $N_q$. One also needs to consider the time cost of measuring the energy (which grows polynomially in $N_q$) and the time cost of optimization (which grows polynomially in $N_p$). Our results appear to show this behavior; we observe what appears to be exponential decay in $N_p$ for all three systems studied. (Note that the measurement and optimization requirements imply that the time cost to extract these energies from the device will still be at best polynomial.) However, 1D spin chains such as the transverse-field Ising model are well accessible by classical methods, and polynomial-time algorithms are known for any weakly-coupled 2-local spin system [23], so we do not expect a quantum speedup in this case. Finding target systems for which a speedup may be demonstrable, and further optimizing hierarchy construction to show this, are obvious targets for future research.

A Background
Definition 29. The state of an $N_q$-qubit quantum register is represented by a norm-1 vector in the Hilbert space $\mathcal{H} = \mathbb{C}^{2^{N_q}}$, under the association $|\psi\rangle \in \mathcal{H} \equiv e^{i\phi}|\psi\rangle$ for $\phi \in \mathbb{R}$.
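Definition 30. The Pauli basis on $N_q$ qubits is defined as $\mathbb{P}^{N_q} := \{I, X, Y, Z\}^{\otimes N_q}$, where $I, X, Y, Z$ are the $2 \times 2$ matrices on $\mathbb{C}^2$:
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$
and $\otimes$ is the Kronecker tensor product.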
$\mathbb{P}^{N_q}$ has the following nice properties:
1. $P^2 = 1$ for all $P \in \mathbb{P}^{N_q}$.
2. Any two elements $P, Q \in \mathbb{P}^{N_q}$ either commute or anticommute.
3. Each $P \in \mathbb{P}^{N_q}$ with $P \neq 1$ has only two eigenvalues, $\pm 1$, and the dimension of the corresponding eigenspaces is precisely $2^{N_q-1}$ (i.e. each $P$ divides $\mathbb{C}^{2^{N_q}}$ in two).
4. This division by two may be further continued: given $P, Q \neq 1$ such that $[P, Q] = 0$, $P$ and $Q$ divide the Hilbert space into 4 eigenspaces (labeled by combinations of their eigenvalues).
5. To generalize, one can form a $[N_q, k]$ stabilizer group $S$, generated by $k$ Hermitian, commuting, non-generating elements of $\mathbb{P}^{N_q}$ (up to a complex phase); this diagonalizes $\mathbb{C}^{2^{N_q}}$ into $2^k$ unique eigensectors of dimension $2^{N_q-k}$. When $N_q = k$, these sectors contain single eigenstates, which we call stabilizer states [20].
The Pauli basis is a basis for the set of $2^{N_q} \times 2^{N_q}$ complex-valued matrices (hence the name); it is also a basis for the set of Hermitian matrices if one chooses real coefficients. However, it is not a group under matrix multiplication, as the single-qubit Pauli matrices pick up a factor of $i$ on multiplication: $XY = iZ \notin \mathbb{P}$. The closure of the Pauli basis is the Pauli group $\Pi^{N_q} = \{\pm 1, \pm i\} \times \mathbb{P}^{N_q}$; this is four times as large, and no longer has the basis properties of $\mathbb{P}^{N_q}$. The Pauli basis inherits a form of multiplication from $\Pi^{N_q}$: $P \cdot Q = R \in \mathbb{P}^{N_q}$ if $PQ = e^{i\phi} R \in \Pi^{N_q}$. However, under this multiplication $\mathbb{P}^{N_q}$ becomes a commutative group, which sacrifices key information about its operator structure. Based on the second point in the above list, we may make the following useful definition:
Definition 31. The relative sign of $P, Q \in \mathbb{P}^{N_q}$, $s_{P,Q} \in \{-1, 1\}$, is defined such that $PQ + s_{P,Q}QP = 0$. We further define the markers $\delta_{P,Q} = (1 + s_{P,Q})/2$, $\bar\delta_{P,Q} = (1 - s_{P,Q})/2 = 1 - \delta_{P,Q}$.
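The listed properties of the Pauli basis, and the relative sign of Definition 31, can be verified numerically on two qubits. This is a minimal sketch; the helper names `pauli` and `relative_sign` are hypothetical.

```python
import itertools
import numpy as np

PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]]),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli(label):
    """Matrix of a Pauli string such as 'XZ' (leftmost character is the first qubit)."""
    out = np.array([[1.0 + 0j]])
    for ch in label:
        out = np.kron(out, PAULIS[ch])
    return out

def relative_sign(p, q):
    """s_{P,Q} with PQ + s QP = 0: -1 if P and Q commute, +1 if they anticommute."""
    P, Q = pauli(p), pauli(q)
    if np.allclose(P @ Q, Q @ P):
        return -1
    assert np.allclose(P @ Q, -Q @ P)  # Pauli strings never do anything else
    return 1

labels = ["".join(chars) for chars in itertools.product("IXYZ", repeat=2)]
squares_ok = all(np.allclose(pauli(l) @ pauli(l), np.eye(4)) for l in labels)
signs = {(p, q): relative_sign(p, q) for p in labels for q in labels}
# Every non-identity string splits the 4-dimensional space into two halves.
half_split = all(
    sorted(np.round(np.linalg.eigvalsh(pauli(l))).astype(int).tolist()) == [-1, -1, 1, 1]
    for l in labels if l != "II"
)
xy_is_iz = np.allclose(pauli("X") @ pauli("Y"), 1j * pauli("Z"))
```

With this sign convention the marker $\delta_{P,Q} = (1 + s_{P,Q})/2$ equals 1 exactly when $P$ and $Q$ anticommute.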
This allows us to write the following useful identity: Unfortunately this does not extend to the commutation of two such exponentials; one has instead by the application of the Baker-Campbell-Hausdorff formula where the operator T ( θ) is a sum of n-th order cluster operators T (n) ( θ) between filled states i and empty states j of the non-interacting problem. (71) The choice of T ( θ) − T † ( θ) is made to respect K (as creation and annihilation operators are real). One typically takes only a few T (n) (usually up to n = 2), and Trotterizes the resulting expression in terms of individual excitations to implement on a quantum computer, in which case it becomes a product ansatz.ĉ † j andĉ j are the fermionic creation and annihilation operators for the jth orbital. These are not themselves Pauli operators, but they may be combined to make Majorana operators which are elements of P Nq (up to a possible sign).
(One can show this immediately upon choosing a mapping from fermions to qubits.) The fermionic number operator, N = jĉ † jĉ j , is equivalent to The secondorder cluster operator is slightly more complicated; one must take all terms of the form with i 1 = i 2 (j 1 = j 2 ) operators for empty (filled) states, and i a i + b i = 1 mod 2 (terms where i a i +b i = 0 mod 2 do not commute with K). Then, to conserve Γ, one must fix (One can confirm that all operators being fixed commute here, as required.) This procedure may be continued as needed to obtain higher-order cluster operators.
One might try to use the tools developed above to check whether the Trotterized UCC ansatz tightly spans the reduced Hilbert space. The number of parameters in the full UCC does match precisely the dimension of a real Hilbert space with $\eta$ particles in $N_q$ orbitals. Moreover, as the Trotterized UCC Jacobian is full-rank at $\vec\theta = 0$, we strongly suspect that it spans this Hilbert space. However, we did not find a definitive proof of this. In particular, Trotterized UCC is not a stabilizer ansatz, and we have not found an obvious construction of a stabilizer ansatz from UCC.

C Multivariate Dyson series
To prove the statement of Lemma 16, we need to analyze the multi-parameter expansion (28) of the ground state |E 0 , as a perturbative solution to the corresponding eigenvalue equation It proves to be convenient to first find an unnormalized solution |Ẽ 0 whose expansion states |Ψ k (cf. (30) ) obey a special condition: The properly normalized ground state |E 0 is then to be obtained as |E 0 = N |Ẽ 0 , for N = ( Ẽ 0 |Ẽ 0 ) −1/2 .
To find |Ψ k , one can use the Dyson series-like approach. For this, one rewrites (75) as: for E (0) 0 being the unperturbed ground state energy, and quantity ∆ defined as follows: Eq. (77) can be rewritten as: where the action of the inverse operator (E  (76)). Using expansion (30) and the form of perturbation JV = J · V , one recovers from (79) a set of equations on |Ψ k for all k = 0: for δ β the unit vector with the β component equal to 1. Note, that the action of G 0 here is again welldefined, since it acts on a state which has a zero overlap with |Ψ 0 (cf. (81) and (76)). Now, with (80), we expressed each state |Ψ k in terms of states |Ψ k which belong to lower PT orders: | k | < | k|. Using (80) and the unperturbed ground state |Ψ 0 = | 0 , one can obtain all the states |Ψ k up to any desired order.
Given the states |Ψ k , one can also find the expression for the normalization N , as a multi-parameter series: The expansion states |Ψ k of the normalised ground state |E 0 are then given by: With this scheme for finding the expansion states |Ψ k , we're ready to prove Lemma 16. To do so, first we will use (80) and prove the validity of the expression (31), together with the recursive relation (32). Then, using (83), we will extend our proof also to the states |Ψ k , recovering the statement of Lemma 16.
Proof -We start with a proof of the relation (31) for the states |Ψ k , by induction in PT order | k|. We first note that for | k| = 0, we have a single state |Ψ k= 0 = | 0 that clearly satisfies (31) -this will be the base of our induction. Next, we have to prove (31) for |Ψ k with an arbitrary k, assuming the validity of (31) for all |Ψ k s.t. | k | < | k|. To do so, let us express |Ψ k using (80) and show that the different terms that are present on the r.h.s. are proportional to the state V · k | 0 with a real coefficient. The terms of the type G 0 V β |Ψ k− δ β , assuming expression (31) for |Ψ k− δ β , can be rewritten as: The other contributions to the r.h.s. of (80) are of the form G 0 ∆ ( k ) |Ψ k , such that k + k = k. The factor ∆ ( k ) here can be rewritten using the assumption of induction: where we introduced the shorthand notation ∆ Re k for the real coefficient β S δ β , k−δ βC ( k − δ β ) . With this observation about ∆ k and the assumption of induction at hand, the following manipulation can be performed: where we used the condition k + k = k. Combining (85) and (90), we see that the expression (80) indeed implies the form (31) of |Ψ k , with a real coefficient C k which is given by the formula (32). Before extending this result to the coefficient states |Ψ k of the normalized ground state |E 0 = N |Ẽ 0 , we will need to make an aside and prove the following property of the coefficients N k : for a real coefficient N Re k . First, one can observe that an analogous property holds for the coefficients Z k of Z ≡ Ẽ 0 |Ẽ 0 = N −2 : with a real coefficient Z Re k defined as in this derivation, we used (29) for states |Ψ k . Now, observe that Z 0 = 1, which means that the norm N = Z −1/2 = (1 + ) −1/2 can be expressed as a Taylor series in = k =0 J · k Z k , which is a quantity of order O(J). 
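Expanding the terms of such a Taylor series, one observes that the coefficients $N_{\vec k}$ are given in terms of products of coefficients $Z_{\vec k}$ such that the combined perturbation theory order $\vec k$ is conserved; for example, a product $Z_{\vec k_1} Z_{\vec k_2}$ will contribute to $N_{\vec k_1+\vec k_2}$. This allows one to obtain the property (91) from (95) term by term. For instance, $Z_{\vec k_1} Z_{\vec k_2}$ is proportional to $\langle\vec 0| V^{\cdot(\vec k_1+\vec k_2)} |\vec 0\rangle$ with a real coefficient: $Z_{\vec k_1} Z_{\vec k_2} = \delta_{\vec s(\vec k_1),\vec 0}\,\delta_{\vec s(\vec k_2),\vec 0}\, S_{\vec k_1,\vec k_2}\, Z^{\mathrm{Re}}_{\vec k_1} Z^{\mathrm{Re}}_{\vec k_2} \langle\vec 0| V^{\cdot(\vec k_1+\vec k_2)} |\vec 0\rangle$.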
This statement can be directly extended to any product of multiple Z k 's, recovering (91), as desired.
To prove expression (29), we simply use the property (91) and (31) for |Ψ k , in the formula (83): This concludes our proof of Lemma 16.

D Separability of disconnected contributions
In what follows, we prove Lemma 19.
Proof -Consider a disconnected contribution |Ψ k = C k V · k | 0 to the ground state |E 0 of the Hamiltonian H = H 0 + J · V , with a corresponding splitting k = k A + k B . The two sets of couplings that are activated, respectively, in k A and k B , we will denote A and B. We also introduce two non-intersecting sets of qubits, Q A and Q B , such that they include, respectively, the supports of k A and k B , and their union Q A ∪ Q B constitutes the whole set of qubits.
Let us consider an auxiliary Hamiltonian $H'$, which is equal to $H$ with a constraint $J_i = 0$ for all couplings $V_i$ which are not in $A \cup B$. In the PT series (103) for the ground state $|E'_0\rangle$ of such an auxiliary Hamiltonian, the terms $C'_{\vec k}$ are equal to the corresponding terms $C_{\vec k}$ in the full series (28), namely those where no couplings $V_i$ are activated besides those in $A \cup B$. In particular, (103) still contains the disconnected contribution of interest, $C'_{\vec k} = C_{\vec k}$. On the other hand, $H'$ is a sum of two independent Hamiltonians, defined on subsystems $Q_A$ and $Q_B$: $H' = H_A + H_B$. This implies that the ground state $|E'_0\rangle$ will be a tensor product of the ground states of $H_A$ and $H_B$, $|E'_0\rangle = |E_0^A\rangle \otimes |E_0^B\rangle$. In turn, the subsystem ground states $|E_0^A\rangle$ and $|E_0^B\rangle$ can themselves be written as PT series in couplings restricted to $A$ and $B$, separately, whose terms, again, are identical to those in the full series (28), with only couplings from $A$ ($B$) activated. Combining (103), (107), (108) and (109), for our term of interest $C_{\vec k}$ we obtain the desired relation: $C_{\vec k} = C_{\vec k_A} C_{\vec k_B}$.
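The factorization argument can be checked numerically on the four-site chain by switching off the middle coupling. This is a sketch under stated assumptions: it uses the open-chain TFIM with the sign convention $H = -h\sum_i Z_i + \sum_i J_i X_i X_{i+1}$ and arbitrary illustrative coupling values; the helper names are hypothetical.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def embed(site_ops, n):
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, site_ops.get(q, I2))
    return out

def tfim_chain(n, h, couplings):
    """H = -h sum_i Z_i + sum_i J_i X_i X_{i+1} with bond-dependent couplings J_i."""
    H = np.zeros((2**n, 2**n))
    for q in range(n):
        H -= h * embed({q: Z}, n)
    for q, J in enumerate(couplings):
        H += J * embed({q: X, q + 1: X}, n)
    return H

def ground_state(H):
    _, vecs = np.linalg.eigh(H)
    return vecs[:, 0]

# Auxiliary Hamiltonian H': the middle coupling is set to zero, so A = {bond 1}, B = {bond 3}.
g_full = ground_state(tfim_chain(4, 1.0, [0.3, 0.0, 0.2]))
g_a = ground_state(tfim_chain(2, 1.0, [0.3]))
g_b = ground_state(tfim_chain(2, 1.0, [0.2]))

overlap = abs(np.dot(np.kron(g_a, g_b), g_full))
# Amplitude on |1111> relative to |0000> versus the product of subsystem ratios.
ratio_full = g_full[15] / g_full[0]
ratio_product = (g_a[3] / g_a[0]) * (g_b[3] / g_b[0])
```

The amplitude on $|1111\rangle$ is the one generated at lowest order by the disconnected vector $\vec k = (1, 0, 1)$, so the matching ratios illustrate the separability of that contribution.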

E Convergence speed of classical optimization of QCA
In this appendix we show the convergence rate of our classical optimization of QCA in terms of the number of function evaluations for Fig. 6, Fig. 7 and Fig. 8 (Fig. 9, Fig. 10 and Fig. 11 respectively). We have not performed any metaparameter tuning for this optimization, which would likely improve these numbers significantly. The optimization here was performed in the absence of realistic conditions on quantum hardware (in particular in the absence of sampling noise); any further optimization of convergence times would need to take this into account in order to make a realistic comparison to other ansatzes.