Efficient variational contraction of two-dimensional tensor networks with a non-trivial unit cell

Tensor network states provide an efficient class of states that faithfully capture strongly correlated quantum models and systems in classical statistical mechanics. While tensor networks can now be seen as becoming standard tools in the description of such complex many-body systems, close to optimal variational principles based on such states are less obvious to come by. In this work, we generalize a recently proposed variational uniform matrix product state algorithm for capturing one-dimensional quantum lattices in the thermodynamic limit, to the study of regular two-dimensional tensor networks with a non-trivial unit cell. A key property of the algorithm is a computational effort that scales linearly rather than exponentially in the size of the unit cell. We demonstrate the performance of our approach on the computation of the classical partition functions of the antiferromagnetic Ising model and interacting dimers on the square lattice, as well as of a quantum doped resonating valence bond state.


Introduction
Tensor network methods are increasingly becoming a standard tool for studying the physics of strongly correlated systems, both from the perspective of a theoretical and mathematical understanding of many-body effects and as a versatile toolbox for numerical simulations [1][2][3][4][5][6]. In the context of one-dimensional quantum physics, matrix product states (MPS) have been identified, in fact provably so, as a parametrization of the low-energy states of gapped local Hamiltonians. MPS-based algorithms allow for efficient simulations of static and dynamic properties of spin chains in various facets [7] and allow for a precise numerical analysis of symmetry-protected topological order in one dimension [8][9][10]. Two-dimensional quantum systems can be simulated using instances of projected entangled-pair states (PEPS). Indeed, the PEPS toolbox is increasingly capturing ground-state [11][12][13][14], dynamic [15][16][17], spectral [18,19], and finite-temperature [20,21] properties, as well as the simulation of open-system dynamics [22] of quantum spins or electrons in two dimensions. In the field of classical statistical mechanics, tensor networks provide a natural way of simulating e.g. critical and/or frustrated systems in two [23] and three [24,25] dimensions, which are notoriously difficult for standard sampling methods. (Contact: A. Nietner, anietner@physik.fu-berlin.de; B. Vanhecke, bavhecke.vanhecke@ugent.be.)
When applying tensor networks to two- and three-dimensional systems, both quantum and classical, the computational bottleneck always consists of the contraction of a two-dimensional tensor network. In the easiest case, one considers a translation-invariant tensor network on a regular lattice in the thermodynamic limit, i.e. the network consists of a single tensor that is repeated over an infinite lattice. In the past, different algorithms have been proposed for this fundamental task, which can be subdivided into three main approaches: (i) Real-space renormalization-group methods [26][27][28][29][30][31] rely on coarse-graining transformations on the level of the tensors that make up the network, such that global properties of the tensor network can be efficiently computed. (ii) In corner transfer matrix methods [32][33][34][35][36][37], part of the network is represented by effective corner tensors which are obtained by iteratively growing the environment and truncating the tensors. (iii) Boundary methods aim to approximate the fixed point of the row-to-row transfer matrix as an MPS, such that this MPS represents an effective representation of half of the network; different algorithms can be used to find the fixed point, such as the density-matrix renormalization group [1,23], the time-evolving block decimation [38,39], the tensor ring decomposition [40,41] or the VUMPS algorithm [42][43][44].
The former two of these boundary algorithms rely on power methods to find a fixed point for the contraction of an infinite tensor network, whereas the latter two iterate local self-consistency relations. In particular, variational MPS tangent-space methods such as VUMPS can exploit more advanced solvers for the leading eigenvector of the transfer matrix. This property leads to a significant speed-up of the VUMPS algorithm as compared to power methods when critical or close-to-critical tensor networks are considered [42,45].
In many applications, the relevant two-dimensional tensor network cannot be chosen to be translation invariant, but rather consists of a larger unit cell of different tensors that is repeated over the infinite lattice. In other scenarios, the tensor network itself is translation invariant but the lattice symmetry is spontaneously broken. In both cases, an algorithm with uniform tensors cannot be used for the contraction. Whereas corner transfer matrix approaches have been extended to the case of larger unit cells [12,46], the variational boundary-MPS methods have not been formulated in this more general setting. In this work, we show that this generalization of the VUMPS algorithm is, in fact, a very natural one and leads to an algorithm with a complexity that scales linearly with the size of the non-trivial unit cell.

Set-up
In this section we review the basic ingredients that allow for the contraction of translation-invariant two-dimensional tensor networks using the VUMPS algorithm [42], setting the stage for the multi-site version in the next section.

Two-dimensional tensor networks
Two-dimensional tensor networks are most naturally obtained in the context of two-dimensional statistical mechanics, where they appear as a representation of the partition function of lattice spin models with local interactions. Indeed, suppose we have a system of spins s_i = ±1 on a regular, say square, lattice with nearest-neighbour interactions,

H(s) = -J \sum_{<i,j>} s_i s_j . (1)

The partition function for this model is given by

Z = \sum_{{s}} e^{-βH(s)} . (5)

We can now write this partition function as a tensor network by placing a local Boltzmann weight, represented by the symmetric matrix t with entries t_{s_1,s_2} = e^{-βH(s_1,s_2)}, on each link of the lattice, and placing a δ tensor on each site; splitting each link weight as t = q^2, one factor q is absorbed into each of the two adjacent site tensors, which yields the four-leg site tensor M. A local expectation value of an observable O with values O_j, j = ±1, at site i is given by changing one tensor in this tensor network, namely replacing the δ tensor at site i by the defect tensor N in which the weights O_j are inserted on the diagonal. Similarly, one can represent generic n-point functions by placing n defect tensors of the form of Eq. (8) at the corresponding sites in the partition function Eq. (5). Two-dimensional tensor networks also show up as the norm or as local expectation values of two-dimensional PEPS. In that case, the elementary four-leg tensor M is given as the sandwich of the PEPS tensor and its conjugate, where the thick leg on the left-hand side corresponds to the tensor product of the two thin legs of the respective side of the PEPS tensor.
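To make this construction concrete, the following is a minimal numpy sketch (our own illustration, not code from the paper) that builds the four-leg site tensor for the ferromagnetic Ising model, assuming the index convention M[u, r, d, l], and checks it against a brute-force partition function on a small periodic lattice:

```python
import numpy as np
from itertools import product
from scipy.linalg import sqrtm

beta, J = 0.4, 1.0

# Link Boltzmann weight t_{s1,s2} = exp(beta*J*s1*s2); spins +1/-1 map to indices 0/1.
s = np.array([1.0, -1.0])
t = np.exp(beta * J * np.outer(s, s))
q = np.real(sqrtm(t))              # q @ q == t (t is positive definite)

# Four-leg site tensor M[u, r, d, l]: delta tensor with one factor q on each leg.
delta = np.zeros((2, 2, 2, 2))
for i in range(2):
    delta[i, i, i, i] = 1.0
M = np.einsum('ijkl,iu,jr,kd,lm->urdm', delta, q, q, q, q)

# Brute-force partition function on an L x L periodic lattice.
L = 3
Z_exact = 0.0
for conf in product([1, -1], repeat=L * L):
    c = np.array(conf).reshape(L, L)
    E = -J * (np.sum(c * np.roll(c, 1, axis=0)) + np.sum(c * np.roll(c, 1, axis=1)))
    Z_exact += np.exp(-beta * E)

# Contract the same lattice as a tensor network: build the row-to-row transfer
# matrix of three M tensors (periodic horizontal bonds) and trace its L-th power.
T = np.einsum('iadc,jbea,kcfb->ijkdef', M, M, M).reshape(8, 8)
Z_tn = np.trace(np.linalg.matrix_power(T, L))
print(Z_tn, Z_exact)
```

For this small lattice the exact sum and the tensor-network contraction agree to machine precision, which confirms the splitting of the link weights.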
To contract two-dimensional tensor networks it is natural to use the fact that topological two-dimensional systems (trivial or non-trivial) are in general believed to admit a gapped boundary [47,48]. That is, there exists a many-body state vector |ψ⟩ with exponentially decaying correlations such that

M |ψ⟩ = λ |ψ⟩ , (10)

with M the row-to-row transfer operator of the network and λ its leading eigenvalue. Note that, similar to Ref. [47], we use the notion of a boundary interchangeably for the boundary of the lattice as well as for the boundary state living on it. If one is able to efficiently compute and manipulate this boundary, one can use the fixed-point equation (10) to reduce the contraction of the two-dimensional system to the contraction of a one-dimensional system.

Matrix product states in the thermodynamic limit
Matrix product states (MPS) are a class of states that can be efficiently contracted and that provably capture the local properties of gapped boundaries arbitrarily well [7,[49][50][51][52]]. Therefore, MPS are a powerful tool for approximating boundaries of two-dimensional tensor networks in the thermodynamic limit in an efficient way. An MPS is fully determined by a three-leg tensor A of shape (χ, d, χ), where χ is called the bond dimension and d the physical dimension of the MPS. The boundary represented by this tensor can be written in the thermodynamic limit as the translation-invariant state |ψ(A)⟩ obtained by contracting infinitely many copies of A along their virtual legs. The representation of a state by a three-leg tensor is not unique due to the gauge degree of freedom of introducing an identity in the form of a pair of matrices X · X^{-1} on a bond. This gauge freedom can be exploited to choose a canonical form for the tensors. In the so-called mixed canonical gauge one fixes a reference site i; all tensors to the left of i are represented in the left canonical gauge A_L, and all tensors to the right of i are represented by the right canonical gauge A_R, where we have introduced a center-site tensor A_C satisfying

A_C = A_L C = C A_R , (13)

with C = (C_{α,β}) the matrix that gauge transforms A_L to A_R and whose singular values are the Schmidt values of the state corresponding to the cut through that bond. The left and right canonical tensors are defined as the tensors representing the same state as A with the additional isometry properties

\sum_s (A_L^s)^† A_L^s = 1 , (14)

and

\sum_s A_R^s (A_R^s)^† = 1 , (15)

respectively. In other words, the tensors A_L and A_R are isometries^1. The set of MPS tensors for a fixed bond dimension χ modulo gauge equivalence defines the manifold of MPS M_χ. Through the map |ψ(·)⟩ that embeds the tensors into the many-body Hilbert space, one obtains a metric on M_χ and a notion of a tangent-space projector that projects states from the Hilbert space onto the tangent space of the manifold [44,53], cf. Section 2.4.
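The left canonical gauge can be illustrated with a small numpy sketch (our own code, with its own naming conventions): the standard iterated-QR orthonormalization finds A_L from a generic tensor A, after which the isometry condition (14) can be verified directly.

```python
import numpy as np

def left_orthonormalize(A, tol=1e-12, maxiter=2000):
    """Iterated QR: find a left-isometric A_L and a matrix C with C A ~ A_L C."""
    chi, d, _ = A.shape
    C = np.eye(chi)
    for _ in range(maxiter):
        AC = np.tensordot(C, A, axes=(1, 0)).reshape(chi * d, chi)
        A_L, C_new = np.linalg.qr(AC)
        # fix the sign ambiguity of the QR decomposition
        sgn = np.sign(np.diag(C_new))
        A_L, C_new = A_L * sgn, sgn[:, None] * C_new
        C_new /= np.linalg.norm(C_new)
        if np.linalg.norm(C_new - C) < tol:
            return A_L.reshape(chi, d, chi), C_new
        C = C_new
    return A_L.reshape(chi, d, chi), C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 4))
A_L, C = left_orthonormalize(A)
# Isometry condition (14): sum_s A_L[s]^T A_L[s] = identity.
gram = np.einsum('asb,asc->bc', A_L, A_L)
```

Every QR step already produces a left isometry; the iteration only serves to make the pair (A_L, C) consistent with the original tensor A.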

Approximating center tensors
Motivated by the VUMPS algorithm, it is convenient to have a means of finding an MPS approximation to a given pair of tensors A_C and C. In principle, an MPS is fully defined via A_L or A_R alone. However, one can also define an MPS via A_C and C, where A_L and A_R can be found by using Eq. (13) and inverting the matrix C. Note, however, that not every pair consisting of a three- and a two-leg tensor exactly represents an MPS, as solving Eq. (13) by inverting C does not necessarily give rise to a left or right isometric tensor. Given two tensors A_C and C corresponding to an MPS, one can find the isometric tensors A_L and A_R that realize Eq. (13) by a singular-value decomposition (or, likewise, by a polar decomposition) [42]: with the singular-value decompositions A_C C^† = U_L Σ_L V_L^† (grouping the physical leg of A_C with its left virtual leg) and C^† A_C = U_R Σ_R V_R^† (grouping it with the right virtual leg), one sets

A_L = U_L V_L^† , (17)

and similarly for A_R,

A_R = U_R V_R^† . (19)

Obtaining isometries A_L and A_R from some pair A_C and C (not necessarily explicitly representing an MPS) gives rise to the gauging errors

ε_L = ‖A_C − A_L C‖ and ε_R = ‖A_C − C A_R‖ .

^1 Throughout this paper we have the convention that MPS tensors with their physical leg pointing upwards are the complex-conjugated versions of those with their legs pointing downwards.
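This minimization can be sketched in a few lines (a common polar-decomposition variant with our own conventions, not the paper's code); the check constructs an exact pair A_C = A_L C and confirms that the isometry is recovered with vanishing gauging error.

```python
import numpy as np
from scipy.linalg import polar

def min_ac_c(A_C, C):
    """Isometric A_L, A_R closest to A_C = A_L C and A_C = C A_R, plus gauging errors."""
    chi, d, _ = A_C.shape
    U_l, _ = polar(A_C.reshape(chi * d, chi) @ C.conj().T)   # orthonormal columns
    A_L = U_l.reshape(chi, d, chi)
    U_r, _ = polar(C.conj().T @ A_C.reshape(chi, d * chi))   # orthonormal rows
    A_R = U_r.reshape(chi, d, chi)
    eps_L = np.linalg.norm(A_C - np.einsum('asb,bc->asc', A_L, C))
    eps_R = np.linalg.norm(A_C - np.einsum('ab,bsc->asc', C, A_R))
    return A_L, A_R, eps_L, eps_R

# Consistency check: if A_C is exactly A_L0 C0 for an isometry A_L0, it is recovered.
rng = np.random.default_rng(1)
chi, d = 4, 2
Q, _ = np.linalg.qr(rng.standard_normal((chi * d, chi)))
A_L0, C0 = Q.reshape(chi, d, chi), rng.standard_normal((chi, chi))
A_C = np.einsum('asb,bc->asc', A_L0, C0)
A_L, A_R, eps_L, eps_R = min_ac_c(A_C, C0)
```

The unitary polar factor of A_C C^† is exactly U_L V_L^† from the SVD, so the two prescriptions in the text coincide.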

Variational optimization for uniform matrix product states (VUMPS)
The VUMPS algorithm [42] is a fixed-point iteration method for finding the boundary of a two-dimensional tensor network. The desired fixed-point equation is obtained by expressing the eigenvalue equation for the boundary state, Eq. (10), in terms of MPS as

M |ψ(A)⟩ ≈ λ |ψ(A)⟩ . (22)

The approximation sign in Eq. (22) is due to the MPS approximation of the boundary, and signifies that we aim at finding an MPS that approximates this equation in an optimal way. Here, we chose the following optimality condition: we interpret the set of MPS with a given bond dimension χ as a variational manifold M_χ. Next, we observe that the MPS on the left-hand side of Eq. (22) has bond dimension χD, with D the bond dimension of the MPO, whereas the MPS on the right-hand side has bond dimension χ. Then, optimality with respect to the variational manifold M_χ implies that the left-hand side, variationally truncated to bond dimension χ, should equal the right-hand side. This optimality condition can be reformulated as saying that we look for the MPS for which the error made in Eq. (22) is orthogonal (in Hilbert space) to the tangent space of the manifold. Put differently, the tangent-space projector of the manifold applied to the equation should vanish to guarantee the optimal solution within the variational manifold.
The tangent-space projector P_A at A (projecting any state in the many-body Hilbert space onto its overlap with the tangent space at the state parametrized by A) can be written graphically in terms of the canonical tensors [43], where diverging normalization factors N counteract the non-normalized tensors M. Imposing that this projector annihilates the residual of Eq. (22), one finds [42] that this is equivalent to Eq. (13) together with the effective eigenvalue equations

H_{A_C}(A_C) = λ_{A_C} A_C and H_C(C) = λ_C C ,

where H_{A_C} and H_C denote the effective local operators obtained by contracting the MPO tensor with the left and right environments. Moreover, because the gauge transformation C transforming A_L and A_R into each other is unique up to a factor, the center tensors obtained from these two equations must be mutually consistent, A_C ∝ A_L C. This leads to the VUMPS equations. We can use this framework to iteratively approximate the desired fixed point. Starting from a random MPS and solving these equations once for A_C and C, respectively, we can find a new MPS satisfying Eq. (14) and (15) that approximates this pair of tensors with a gauging error max{ε_L, ε_R}. This update implicitly defines a non-linear map on the MPS manifold. Iterating the procedure, we will eventually converge to the fixed point of this map, represented by a pair A_C and C with vanishing gauging error. This fixed point then corresponds to the optimal approximation of the boundary of the tensor network within the MPS manifold. The gauging errors ε_L and ε_R correspond to the norm of the gradient of ⟨ψ(A)|M|ψ(A)⟩ with respect to A. Similarly to DMRG and other contraction methods, it is possible to get stuck in a local minimum, which in practice, however, is rarely the case.

Contracting tensor networks with non-trivial unit cell
In this section we show how the above method can be extended to the case of a non-trivial unit cell, with the computational effort of the algorithm growing only linearly rather than exponentially in the unit-cell size.

MPS with non-trivial unit cell
In order to deal with tensor networks with a non-trivial unit cell it is convenient to first introduce MPS with a non-trivial unit cell. An MPS with a non-trivial unit cell of size n is given by the data of n tensors A_i, i = 1, …, n, of shape (χ, d, χ), or equivalently by a single four-leg tensor A of shape (χ, d, χ, n), where the A_i are repeated cyclically in order to construct the MPS. Here, the blue index corresponds to i = 1, …, n, and the "+" matrix is the raising operator mapping the i-th basis vector to the (i+1)-th basis vector modulo n. Note that throughout this work the raising operator will be used such that it either acts from left to right or from top to bottom; its dimension will be clear from the context. Moreover, if it is clear from the context that we are dealing with a non-trivial unit-cell MPS, we may drop the dependency on i (the blue leg) for the sake of a clearer picture. Note that the state in Eq. (29) is a superposition of all unit-cell shifts of the MPS with a non-trivial unit cell. However, by fixing the x-channel to a specific value at any point (e.g. via an additional leg at one of the delta tensors) one can fix a specific MPS with a non-trivial unit cell.
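The encoding of the unit-cell position in an additional leg contracted with the raising operator can be mimicked with plain block matrices (our own toy illustration, not the paper's code): the tensor A_i occupies block (i, i+1 mod n), and a product over one unit cell reproduces the product of the individual tensors in its (0, 0) block.

```python
import numpy as np

rng = np.random.default_rng(2)
n, chi, d = 3, 3, 2
As = [rng.standard_normal((chi, d, chi)) for _ in range(n)]

def big(s):
    """Blocked matrix for physical index s: A_i in block (i, i+1 mod n),
    i.e. the tensor A_i combined with the '+' raising operator on the position leg."""
    B = np.zeros((n * chi, n * chi))
    for i in range(n):
        j = (i + 1) % n
        B[i * chi:(i + 1) * chi, j * chi:(j + 1) * chi] = As[i][:, s, :]
    return B

s_string = [0, 1, 1]   # one unit cell of physical indices
prod_big = np.linalg.multi_dot([big(s) for s in s_string])
prod_cell = np.linalg.multi_dot([As[i][:, s_string[i], :] for i in range(n)])
# The (0, 0) block of the blocked product equals A_1[s1] A_2[s2] A_3[s3].
```

Starting the product from another block reproduces the other unit-cell shifts, which is exactly the superposition structure described above.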
Similarly to standard MPS, we can bring |ψ(A)⟩ to a canonical form defined by tensors A_C and C (now both having an additional leg encoding the position in the unit cell), where again the left and right canonical tensors A_L and A_R fulfill (14) and (15) at each i = 1, …, n separately. A multi-site MPS can therefore be represented in mixed canonical form. The formulas for finding A_L and A_R from a given A_C and C, analogous to Eq. (17) and (19), will turn out to be very convenient for generalizing the VUMPS algorithm. It is straightforward to see that they generalize site-by-site within the unit cell, yielding Eq. (33) and Eq. (35).

Non-trivial unit cell VUMPS
Let us now generalize the VUMPS algorithm to tensor networks with a non-trivial unit cell. As before, we assume a tensor network on a square lattice, now with a unit cell of shape (n_x, n_y), with D denoting the bond dimension of the tensor network. We can write this as the consecutive application of n_y MPOs M_{n_y} ⋯ M_1, each with a substructure of a unit cell of length n_x. Again, we assume the existence of a gapped boundary of the tensor network that we approximate by an MPS |ψ(A)⟩ with a non-trivial unit cell. Due to the non-trivial unit cell, |ψ(A)⟩ gets mapped onto itself only after the consecutive application of the n_y MPOs. As this argument is independent of the cyclic order of the n_y MPOs, there exist n_y boundaries |ψ(A_1)⟩, …, |ψ(A_{n_y})⟩ for which the fixed-point equation (36) holds, where we have dropped the dependencies of the tensors on the x and y coordinates for the sake of simplicity (as with the substructure of a non-trivial unit-cell MPS, we can encode the y dependency of the tensors in an additional leg of the MPS and MPO tensors, which, in the case of the y coordinate, we will encode by a red coloured leg in the following, cf. Appendix B.1). It is easy to see that λ_y is constant in y: if M_{y+n_y} ⋯ M_y |ψ⟩ = λ_y |ψ⟩ then it must hold that M_y M_{y+n_y} ⋯ M_{y+1} (M_y |ψ⟩) = λ_y (M_y |ψ⟩). More generally, the full spectrum of M_{σ(y+n_y)} ⋯ M_{σ(y)} is the same for all cyclic permutations σ. One way of solving (36) is to block the MPOs together and thereby reduce the problem to a tensor network with a trivial unit cell, to which one can apply the VUMPS algorithm from the previous section. However, this comes at the cost of increasing the bond dimension of the MPO to D^{n_y} (the blocking along the x direction can be avoided with a linear overhead, as shown in Ref. [42]). Let us therefore introduce a simple trick to solve (36) with only a linear instead of an exponential overhead in n_y.
The general idea is as follows. Suppose we are given n square matrices X_1, …, X_n of dimension k × k such that their cyclic product X_{y+n} ⋯ X_y is diagonalizable, and we are looking for n vectors v_y satisfying the coupled equations

X_y v_y = γ_y v_{y+1} , (37)

with cyclic index y (in the graphical version of this equation, "−" denotes the inverse of the "+" raising operator, the black dots are δ tensors, and the dots indicate the product over all n matrices). Here the γ_y are scalars with γ_{y+n} ⋯ γ_y = λ, where λ is the respective eigenvalue of the blocked matrices; each v_y is then an eigenvector of the correspondingly cycled product,

X_{y+n} ⋯ X_y v_y = λ v_y , (38)

which is straightforward to check by substituting (37) into it. Eq. (37) can be solved either by blocking the n matrices into a single matrix, or by the following trick. Absorbing the factors γ_y into the norms of the vectors v_{y+1}, we obtain a single eigenvector equation for a vector ṽ living on the tensor-product space of the two legs of v (the k-dimensional leg and the position leg),

X̃ ṽ = μ ṽ , with X̃ = \sum_y X_y ⊗ |y+1⟩⟨y| , (40)

where μ^n = λ and the blocks ṽ_y are proportional to the v_y, with proportionality factors built from the γ_y (setting γ_0 = 1 and with γ_y, y ≥ 1, as before).

Eq. (40) is an eigenvector equation solving for all v_y at the same time. Assuming an iterative solver, this corresponds to increasing the complexity from O(k) to O(nk), i.e. a linear overhead in the scaling parameter n. While this gain is immaterial for matrices, as blocking also increases the complexity only linearly, it is crucial for MPOs due to the non-trivial bond dimension. Finally, the crucial observation is that we can re-obtain the sought-after v_y from the solution ṽ simply by normalizing the respective blocks ṽ_y (cf. Eq. (41)).
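The trick can be checked directly for plain matrices; the following is a toy numpy illustration with our own conventions (elementwise positive matrices are used so that the Perron eigenvalue of the blocked operator is real and positive).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 3, 5
Xs = [rng.random((k, k)) for _ in range(n)]           # elementwise positive

# Blocked operator Xt = sum_y X_y (x) |y+1><y| on the tensor-product space.
Xt = np.zeros((n * k, n * k))
for y in range(n):
    yp = (y + 1) % n
    Xt[yp * k:(yp + 1) * k, y * k:(y + 1) * k] = Xs[y]

w, V = np.linalg.eig(Xt)
# The top of the blocked spectrum contains all n-th roots of lambda; pick the
# positive real one (the Perron eigenvalue of the non-negative matrix Xt).
cand = np.where(np.abs(w) > np.abs(w).max() * (1 - 1e-9))[0]
idx = cand[np.argmax(w[cand].real)]
mu, vt = w[idx].real, V[:, idx].real

# Recover the v_y by normalizing the blocks of the solution.
vs = [vt[y * k:(y + 1) * k] for y in range(n)]
vs = [v / np.linalg.norm(v) for v in vs]

# Coupled equations X_y v_y = gamma_y v_{y+1}, with prod_y gamma_y = lambda = mu^n.
gammas = [vs[(y + 1) % n] @ (Xs[y] @ vs[y]) for y in range(n)]
lam = np.max(np.abs(np.linalg.eigvals(np.linalg.multi_dot(Xs[::-1]))))
```

Note that the blocked spectrum is n-fold degenerate in modulus (the n-th roots of λ), which is why the positive real root has to be selected explicitly.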
Let us now do the same on the level of MPOs. We rewrite Eq. (36) in the form of Eq. (38), giving Eq. (42), where we make the y dependence of the tensors explicit for clarity; this is the analogue of (38) on the level of tensor networks. Hence, we can use the same arguments as before to obtain the tensor-network analogue of (40), Eq. (43). We start from Eq. (43) and enforce the tangent-space projectors corresponding to the states at each y to annihilate the right-hand side. This corresponds to the application of the generalized tangent-space projector as in Appendix B.2. Combining this with the gauge constraints Eq. (30) and the fact that the n_y MPS with a fixed x- and y-index are injective, we obtain the equations (44) and (45) for A_C^{y+1} and C^{y+1}, respectively. A more detailed derivation of these fixed-point equations can be found in Appendix B. The variational optimality can be easily understood as shown in Appendix A.2.

The algorithm
We have obtained the n_y coupled local equations (44) and (45) for the A_C and C tensors which, together with Eq. (33) and (35), implicitly define a flow through the coupled MPS manifolds of the n_y boundaries. This flow can be integrated just as in Section 2.4: we solve the local equations (44) and (45) for an initial set of MPS, update all boundaries according to the obtained solutions for A_C and C using Eq. (33) and (35), derive the new local equations from the current set of boundaries, and iterate this procedure until convergence is reached.
Let us now phrase this in explicit algorithmic form. We can compute the left and right channels in (44) and (45) explicitly from the leading left and right eigenvectors of the corresponding transfer matrices, Eqs. (46) and (47), where the dependence of the tensors on x and y is drawn explicitly for the sake of clarity and the dots imply the matrix product over the n_x transfer matrices contained in one unit cell of the y-th row. To this end, we could also use (40) for the consecutive application of the transfer matrices. However, Eq. (46) and Eq. (47) are already optimal in complexity, because here we are dealing with matrices rather than MPOs; on the other hand, for sufficiently small gaps γ_i of the transfer matrices (cf. Eq. (39)), Eq. (40) leads to more stability. Moreover, using an iterative solver we can exploit the tensor-network structure to reduce the memory allocation. One thus obtains a computational effort of O(D²χ³ + D⁴χ²) compared to O(D²χ⁴), which can further be reduced by parallelization. Note that while we need to solve these equations for all y, we can fix the x value and obtain the remaining environment tensors by applying the respective transfer matrices (followed by a normalization).
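For a single uniform boundary, computing such an environment as the leading eigenvector of the MPS-MPO-MPS channel can be sketched with a simple power iteration (our own toy code, not the paper's; index conventions A[left, phys, right] and M[left, ket, bra, right] are assumptions, and positive entries are used so that the leading eigenvalue is positive):

```python
import numpy as np

rng = np.random.default_rng(4)
chi, d, D = 3, 2, 2
A = rng.random((chi, d, chi))        # boundary-MPS tensor (real, so bra = ket)
M = rng.random((D, d, d, D))         # MPO tensor of one network row

def channel(L):
    """Apply the MPS-MPO-MPS transfer channel to a left environment L[ket, mpo, bra]."""
    return np.einsum('acb,asx,csty,btz->xyz', L, A, M, A)

# Power iteration for the leading eigenpair of the channel.
L = rng.random((chi, D, chi))
lam = 0.0
for _ in range(2000):
    L_new = channel(L)
    lam_new = np.linalg.norm(L_new)
    L_new /= lam_new
    if np.linalg.norm(L_new - L) < 1e-13 and abs(lam_new - lam) < 1e-13:
        L, lam = L_new, lam_new
        break
    L, lam = L_new, lam_new
```

In the actual algorithm one would replace the power iteration by a Krylov eigensolver and never materialize the channel as a matrix, which is precisely where the memory and complexity gains quoted above come from.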
We can now use the (x, y)-dependent environments (L_{x,y})_{x,y} and (R_{x,y})_{x,y} to define the local fixed-point maps h_{A_C} and h_C corresponding to (44) and (45), Eqs. (48) and (49), which we use to iteratively solve for the tensors A_C and C. Let us recapitulate the algorithm. To start, we initialize a set of n_y MPS with a unit cell of size n_x, each of these MPS corresponding to a boundary. Then we derive the local equations h_{A_C} (cf. (48)) and h_C (cf. (49)) using (46) and (47). Solving h_{A_C} and h_C, we obtain new tensors A_C and C for all x and y. Using these, we update the boundaries via (33) and (35), obtaining a gauging error: use either (33) or (35) to get a set of left (right) canonical tensors from the A_C and C together with the respective gauging errors, and then find the right (left) canonical representation of these tensors as in Algorithm 2 of Ref. [44]. The left and right canonical tensors define the boundaries for the next iteration. Then start again from the beginning, until the gauging error is smaller than a desired threshold ε. The algorithm is shown with more structure in Algorithms 1 and 2. Note that instead of the gauging error one might as well use the singular values of the C tensors as convergence criterion. Finally, instead of the parallel update just defined, one could as well use a sequential update in the x direction (cf. Appendix C).

Algorithm 1: Explicit terms of the local Hamiltonians in the parallel implementation.
  (L_{x,y})_{x,y} ←− compute left eigenvectors according to (46)
  (R_{x,y})_{x,y} ←− compute right eigenvectors according to (47)

Test cases and benchmarks

Classical anti-ferromagnetic Ising model
To test the multi-site VUMPS algorithm, we investigate the classical anti-ferromagnetic Ising model on a square lattice with a magnetic field. This is given by the partition function

Z = \sum_{{s}} exp( -β ( J \sum_{<i,j>} s_i s_j - h \sum_i s_i ) ) , J > 0 , (50)

where <i, j> denotes the sum over nearest neighbours on the square lattice. We can write this partition function as a two-dimensional tensor network on the square lattice built from a single four-leg tensor M, exactly as in the set-up section. Although the partition function of the Ising anti-ferromagnet can be represented by a single tensor M, it actually has a non-trivial unit cell. This can easily be seen as follows. The partition function of the anti-ferromagnet equals that of the Ising ferromagnet after a sub-lattice rotation flipping every second spin of the lattice. This transformation has a 2 × 2 unit cell. Also, this sub-lattice rotation would map a homogeneous magnetic field to a staggered magnetic field, highlighting this unit cell. This actually reflects the fact that the anti-ferromagnetic partition function has a vanishing norm when restricted to odd lattices; hence, the thermodynamic limit is only well defined when restricting to even lattices. This behaviour can be nicely observed using our algorithm. So in a sense, the anti-ferromagnet is a ferromagnet with a non-trivial unit cell.
Fixing J = 1 we can map out the physics of Eq. (50) in the plane spanned by β ≥ 0 and h ∈ R. For h = 0 we can compute the magnetization within the unit cell and compare to exact results: as for h = 0 the anti-ferromagnet equals the ferromagnet with every second site flipped in the z basis, we expect a staggered magnetization in the unit cell with the same strength as for the ferromagnet. In Fig. 1 we show our results, obtained with Algorithm 4 for bond dimension χ = 20. We compare the absolute value of the magnetization of the first site in the unit cell, (x, y) = (0, 0), with the exact results. Moreover, we compare to the half distance between consecutive sites in the unit cell, |m(0, 0) − m(1, 0)|/2, where m(x, y) is the magnetization at the unit-cell site (x, y). We find perfect agreement for all quantities up to machine precision.
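The exact reference curve follows from Onsager's solution of the ferromagnet (Yang's magnetization formula) combined with the sub-lattice rotation argument above; a small sketch:

```python
import numpy as np

def staggered_magnetization(beta, J=1.0):
    """Exact spontaneous magnetization of the 2D square-lattice Ising ferromagnet
    (Onsager/Yang), m = (1 - sinh(2 beta J)^-4)^(1/8) for beta > beta_c, else 0.
    By the sub-lattice rotation, this equals the staggered magnetization of the
    anti-ferromagnet at h = 0."""
    x = np.sinh(2.0 * beta * J) ** (-4.0)
    return (1.0 - x) ** 0.125 if x < 1.0 else 0.0

beta_c = np.log(1.0 + np.sqrt(2.0)) / 2.0    # ~ 0.440687
```

The eighth-root singularity makes the onset of the order parameter extremely steep, which is why the comparison near β_c is a sensitive test of the boundary MPS.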
We have investigated the phase transition around β_c ≈ 0.44068 in more detail in Fig. 2, computing |m(0, 0)| for various bond dimensions χ = 12, 16, 20, 28, 36, 44. Only very close to the critical point do our results start to deviate from the exact values, depending on the bond dimension, reflecting the fact that low-χ MPS cannot describe critical states accurately. To further illustrate our results, in Fig. 4 we show the magnetization in the unit cell as a function of β for h = 0. One can nicely see the emergence of the staggered magnetization through the phase transition.
Next, we have investigated the transition from a staggered magnetized phase to an ordered magnetized phase at β = 0.45 along h ∈ [0, 2].

Interacting dimers on the square lattice
As a second benchmark, we consider a model of interacting dimers on the square lattice [54,55]. This model starts from the well-known dimer-counting problem, which was solved exactly in the early sixties [56][57][58]. This problem can be extended to a statistical mechanics model, where the allowed configurations are determined by all dimer coverings and we associate an energy to each dimer configuration. A natural choice for the energy E_c of a given configuration c is the number of parallel dimers,

E_c = - \sum_p N_p(c) ,

where the sum is over all plaquettes and N_p(c) is either one or zero (depending on whether there are two parallel dimers on plaquette p in the configuration c). We include a minus sign in order to favour parallel dimer configurations. The partition function is obtained by summing the Boltzmann weights over all configurations,

Z = \sum_c e^{-βE_c} ,

with β = 1/T the inverse temperature. This model has been shown [54,55] to exhibit rotational and translational symmetry breaking in a so-called columnar phase, quantified by an order parameter D built from the link occupations M_l(c), where M_l(c) is one if there is a dimer on the link l and zero otherwise, and V and H denote all vertical resp. horizontal links. On the other hand, in the high-temperature limit (T → ∞) the partition function reduces to an unweighted sum over all allowed dimer configurations; the model is then exactly solvable [56][57][58] and exhibits critical correlations [59] without any spatial symmetry breaking. A phase transition occurs between these two phases around temperature T_c ≈ 0.65 [55,60].
We can represent the partition function of the interacting dimer model by a tensor network with a bipartite sub-lattice structure: we place tensors A and B on the two sub-lattices, with link indices 0 and 1 denoting the absence or presence of a dimer, and a matrix t on the links carrying the interaction weights. This construction is seen to give the correct partition function by realizing that (i) a '1' on a link denotes the presence of a dimer in that configuration, (ii) the tensors A and B guarantee that exactly one dimer is attached to each site, and (iii) the e^{β/2} factors in the t matrix introduce the correct Boltzmann weight when there are two parallel dimers on a plaquette.
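The one-dimer-per-site constraint of point (ii) can be illustrated and checked in a few lines (our own sketch, with the interaction weights t omitted, i.e. the unweighted T → ∞ limit): contracting such tensors over an open 2 × 2 patch reproduces the two dimer coverings of that patch.

```python
import numpy as np
from itertools import product

# Site tensor with link indices (up, right, down, left) in {0, 1}:
# the entry is 1 iff exactly one adjacent link carries a dimer.
A = np.zeros((2, 2, 2, 2))
for u, r, d, l in product(range(2), repeat=4):
    if u + r + d + l == 1:
        A[u, r, d, l] = 1.0

# Open 2 x 2 patch, outer legs fixed to 0 (no dimer sticking out). Internal links:
# a = top horizontal, b = bottom horizontal, c = left vertical, d = right vertical.
Z = np.einsum('ac,da,cb,db->',
              A[0, :, :, 0],   # top-left site:     (right=a, down=c)
              A[0, 0, :, :],   # top-right site:    (down=d, left=a)
              A[:, :, 0, 0],   # bottom-left site:  (up=c, right=b)
              A[:, 0, 0, :])   # bottom-right site: (up=d, left=b)
# Z counts the dimer coverings: two horizontal dimers or two vertical dimers.
```

In the full model the sub-lattice structure of A and B together with the t matrices distributes the plaquette weights over the links, which is what forces the two-by-two unit cell used below.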
Since we have a tensor network with a two-by-two unit cell and this model breaks translational symmetry at low temperatures, we apply the multi-site VUMPS algorithm. In Fig. 7 we plot the probabilities of finding a dimer on the different links of the lattice, showing columnar order at low temperatures and a uniform distribution in the high-temperature phase. In Fig. 6 we show the behaviour of the order parameter D for different values of the bond dimension, showing an increasingly critical form as the bond dimension increases. To determine the critical point we compare the temperature of the inflection point against the value of the order parameter at the inflection point for different bond dimensions; we see that the temperature of the inflection point relates linearly to the value of the order parameter at that temperature. In the inset of Fig. 6 we provide an extrapolation of the critical point, T_c = 0.6523 ± 0.0001, which agrees with and improves upon known values in the literature.

Doped RVB state
For our third benchmark we start from the resonating valence bond (RVB) state on the square lattice, which can be represented as a translation-invariant PEPS built from a local tensor A^s_{u,r,d,l} with explicit SU(2) invariance [61]. Following the framework of symmetric tensor networks, we can label the non-zero blocks of the tensor with SU(2) irreducible representations; for the nearest-neighbour RVB only four blocks are non-zero. This state is known [61] to be in a critical (2+0)-dimensional Coulomb phase and has power-law decaying dimer-dimer correlations. As a result, its contraction typically requires a large bond dimension for accurate results. We now modify the RVB state by introducing holes on one of the two sub-lattices. This 'doping' is performed by introducing a second PEPS tensor B, whose non-zero blocks, parametrized by three coefficients λ_1, λ_2, λ_3, are chosen such that the state is rotation invariant; the PEPS is then built up from the tensors A and B on the two sub-lattices. This state is similar to previous doping constructions, where a doping of the RVB state by unpaired spins [62] and fermionic holes [63] has been implemented in a translation-invariant way. In contrast to the latter, however, we do not invoke any fermionic degrees of freedom, so we effectively dope the state with bosonic holes on every second site. Since this PEPS is non-translation-invariant by construction, we use the multi-site VUMPS algorithm for contracting it and computing observables. As an illustration, we fix the three parameters to be equal (λ_1 = λ_2 = λ_3 = λ) and compute the hole density (per site) as a function of the parameter λ. Because the undoped RVB state is critical, a large bond dimension for the boundaries is needed, and we work with explicitly SU(2)-invariant tensors in each step of the VUMPS algorithm. In Fig. 8 we plot the result, showing a slow onset that follows a power law (as illustrated by the log-log plot in the inset), followed by a slow saturation towards 1 as λ grows, as is to be expected.

Conclusion and outlook
In this work, we have presented a generalization of the VUMPS algorithm for contracting two-dimensional tensor networks to the case of non-trivial unit cells. The algorithm inherits the salient features of variational uniform MPS methods, including optimality guarantees and high rates of convergence, while the computational effort scales only linearly with the size of the unit cell. We have benchmarked the contraction method on the square-lattice anti-ferromagnetic Ising model, the phase transition in the square-lattice interacting dimer model, and on the doped RVB PEPS. We expect this algorithm to be important for the simulation of quantum spin systems in two dimensions with PEPS, wherever larger unit cells are required for the representation of the ground state, as well as for the simulation of classical two-dimensional systems where a unit cell appears either in the representation of the partition function or due to symmetry breaking. We hope that the emulation technique for larger unit cells presented in this work inspires further generalizations of known tensor network methods to larger unit cells.

A Derivation of the generalized fixed point equations
In this appendix, we give an alternative derivation of the optimality criterion of the VUMPS algorithm and thereby generalize it to non-trivial unit cells. In Section A.1 we generalize the reasoning from Ref. [42] to non-hermitian transfer operators. In Section A.2 we further generalize to non-trivial unit cells. An alternative derivation of the fixed point equations for a non-trivial unit cell, based on the emulation of non-trivial unit cell tensor networks by trivial unit cell tensor networks and on the single-site VUMPS algorithm with non-hermitian transfer operators, is given in Appendix B.

A.1 VUMPS for non-hermitian transfer matrices
The VUMPS algorithm [42] is a scheme to solve the fixed point equation Eq. (68) by iterating over local tensor equations. Here A is the tensor parametrizing the state vector |ψ(A)⟩, M is the MPO for which we wish to find an approximation of the boundary, λ is the corresponding maximal eigenvalue, and P_A is the projector onto the tangent space of the MPS manifold at the point parametrized by A as in Eq. (23). Eq. (68) can be shown to characterize the variational optimum approximating the boundary of the tensor network parametrized by M over the MPS manifold at any given fixed bond dimension, provided that the MPO transfer matrix defined through M is hermitian [42]. In the following, we show that this holds true also if the MPO transfer matrix defined through M ceases to be hermitian but is gapped, in the sense that the absolute values of the eigenvalues of the MPO transfer matrix still sustain a gap that is stable in the asymptotic limit. This situation arises, for instance, for the classical partition function of a non-critical Ising model or for the double-layer sandwich of the toric code PEPS. For the contraction of the tensor network, we are interested in its boundary, which is the maximal eigenvector of the MPO transfer matrix. More precisely, the boundary |φ⟩ is the eigenvector corresponding to the largest eigenvalue of M in absolute value, Eq. (69). Because the MPO is gapped in the above sense, the boundary will feature exponentially decaying correlations. The exponentially decaying correlations of the one-dimensional boundary, however, imply that we can faithfully approximate it locally by an MPS. More precisely, the reduced density matrix of a many-body state with exponential correlations on segments of length k is ε-close to that of an MPS with bond dimension scaling as O(k/ε³) [49][50][51][52]. So in this sense, being interested only in k-local properties, we can write the state explicitly as an MPS with an error smaller than ε.
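The role of the gap can be made tangible with a toy numerical experiment. The sketch below (in Python with NumPy; the matrix, its size, and its engineered spectrum are invented stand-ins for an MPO transfer matrix, not the paper's construction) shows that plain power iteration recovers the boundary, i.e. the leading eigenvector, of a gapped non-hermitian matrix:

```python
import numpy as np

# Dense stand-in for a gapped, non-hermitian transfer matrix: the leading
# diagonal entry (2.0) is well separated from the rest (in [0, 1]), so the
# largest eigenvalue in absolute value stays isolated despite the
# non-symmetric perturbation.
rng = np.random.default_rng(0)
n = 50
M = np.diag(np.r_[2.0, rng.uniform(0.0, 1.0, n - 1)])
M += 0.01 * rng.standard_normal((n, n))  # breaks hermiticity

# Power iteration: each application of M suppresses the components
# orthogonal to the boundary eigenvector at a rate set by the gap.
v = rng.standard_normal(n)
for _ in range(300):
    v = M @ v
    lam = np.linalg.norm(v)
    v /= lam

# Cross-check against a dense eigensolver.
w = np.linalg.eigvals(M)
assert np.isclose(lam, np.max(np.abs(w)), rtol=1e-6)
```

The convergence rate is governed by the ratio of the two largest eigenvalue magnitudes, which is precisely the role played by the asymptotic gap in the argument above.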
Hence, for large enough k and at a sufficiently large bond dimension χ, such that the corresponding (k, ε) approximation error is below machine precision, we are practically unable to computationally distinguish the MPS |ψ(A)⟩ from the true boundary state vector |φ⟩. Hence, we argue that we can apply the fixed point equation Eq. (69) to |ψ(A)⟩, where the approximation error due to the application of M might increase from ε to some ε′ > ε, though the increase will be controlled given that M is gapped. This can be understood as follows. The state vector |ψ(A)⟩ is (k, ε)-close to |φ⟩. Additionally, the application of M exponentially suppresses the overlap orthogonal to its maximal eigenstate. Hence, the normalized M|ψ(A)⟩ is closer to |φ⟩ than |ψ(A)⟩ is. Therefore, the normalized M|ψ(A)⟩ will be (k′, ε/2)-close to |φ⟩, up to boundary effects (at the boundary of the local patch of size k) decaying exponentially in the depth into the region, leading to a modified ε′/2, and hence it is at most (k′, ε′)-far from |ψ(A)⟩. Hence, if k and χ are chosen sufficiently large, the resulting ε′ > 0 will still be below machine precision, while k′ will still be sufficiently large. As we are working over the manifold of MPS, it is natural to use the tangent space projector in order to reduce the fixed point Eq. (70) from a many-body Hilbert space equation to a finite-dimensional equation. The resulting equation still resolves the overlap of M|ψ(A)⟩ with the full manifold of MPS, whilst only neglecting states that cannot be captured by the manifold of MPS. Hence, we obtain Eq. (68). Similarly as before, one might argue that the projector is not a local operator and hence one might not be able to apply the (k, ε)-closeness. However, we can construct a similar argument as for the MPO transfer matrix applied to the MPS in Eq. (70). For each term in the sum of the projector (cf. Eq. (23)) applied to the MPS, we find that essentially only a finite patch contributes, because the gap in the MPS-MPO-MPS transfer matrix exponentially suppresses the contribution of the tail of the transfer matrix channel.
It is interesting to note that the above reasoning could be generalized even further to critical systems, at the price of a polynomial rather than a linear dependence on the bond dimension [52]. Also, for critical systems, the application of M does not exponentially suppress the overlap orthogonal to the boundary, but only in a weaker sense. To conclude, we find that the optimality criterion that the VUMPS algorithm optimizes can also be understood for non-hermitian gapped MPOs.

A.2 Non-trivial unit cells: A variational argument
As explained in Section 3.2, in order to find the boundary of a tensor network with a non-trivial unit cell we need to find a |ψ_1⟩ and λ that satisfy M_n ⋯ M_2 M_1 |ψ_1⟩ = λ|ψ_1⟩, where the M_i are MPOs or MPO-like objects defining the tensor network. We do this by assuming that |ψ_1⟩ is well captured by an MPS |ψ(A_1)⟩, which may be motivated through arguments similar to those in Appendix A.1. As explained before, this is equivalent to looking for a set of MPSs |ψ(A_i)⟩ and scalars γ_i satisfying M_i|ψ(A_i)⟩ ≈ γ_i|ψ(A_{i+1})⟩, Eq. (71), with ∏_i γ_i = λ. We use the approximate sign in order to emphasize that |ψ(A_{i+1})⟩ approximates M_i|ψ(A_i)⟩ in some sort of optimal way. In this text we choose the meaning of this optimality to be variational in the sense of the fidelity per site in the thermodynamic limit. In particular, the fidelity per site between |ψ(A_{i+1})⟩ and M_i|ψ(A_i)⟩ should be maximal with respect to A_{i+1}. Having thus translated the problem to variational MPS [44], we can formulate a first naive algorithm to find the boundary: start with some initial MPS |ψ(A_1^(0))⟩ and consecutively apply all the M_i, each application followed by variationally approximating the new state, maximizing the fidelity per site, by an MPS of some given bond dimension. Repeat this procedure until convergence is reached. Clearly, this power method will eventually converge to the fixed point characterized by Eq. (71).
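This cyclic power method can be illustrated with a hedged toy sketch, in which dense matrices stand in for the MPOs M_i and plain normalization replaces the variational MPS truncation (all sizes and spectra below are invented for the example). The normalization factors γ_i collected in one sweep multiply to the leading eigenvalue λ of the full product:

```python
import numpy as np

# Dense stand-ins for the unit-cell MPOs M_1, ..., M_n: each has one
# dominant direction, so the product M_n ... M_1 is gapped.
rng = np.random.default_rng(1)
n, n_layers = 40, 3
Ms = [np.diag(np.r_[3.0, rng.uniform(0.0, 1.0, n - 1)])
      + 0.02 * rng.standard_normal((n, n)) for _ in range(n_layers)]

# Cyclic power method: v_{i+1} ∝ M_i v_i, with gamma_i the normalization
# of each intermediate vector.
v = rng.standard_normal(n)
for sweep in range(200):
    gammas = []
    for M in Ms:
        v = M @ v
        g = np.linalg.norm(v)
        v /= g
        gammas.append(g)
lam = np.prod(gammas)

# At the fixed point, v is the boundary of layer 1: M_3 M_2 M_1 v = lam * v.
residual = np.linalg.norm(Ms[2] @ (Ms[1] @ (Ms[0] @ v)) - lam * v)
assert residual < 1e-8
```

In the algorithm of the main text, the plain normalization step above is replaced by the variational, fidelity-per-site-optimal MPS approximation at fixed bond dimension.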
Similarly to the derivation of the VUMPS algorithm [42], we find the optimality criterion of maximizing the fidelity per site to be equivalent to the vanishing of the tangent space projection of the fixed point equation. In particular, for the variationally optimal solution A_{i+1} to Eq. (71) it must hold that P_{A_{i+1}} M_i|ψ(A_i)⟩ = γ_i|ψ(A_{i+1})⟩, where P_A represents the MPS-tangent space projector at the state parametrized by A. In this particular case the tangent space criterion is equivalent to maximizing the fidelity per site. Note, however, that the tangent space criterion need not imply a variational principle in general. Instead of the just-described power method, however, we go one step further: we write down all the fixed point equations Eq. (71) at once and iterate them as one coupled fixed point problem. Note that the same chain of reasoning, starting from the variational power method and concluding with the VUMPS fixed point iteration, can be made for the standard single-site VUMPS algorithm. However, for a Hermitian operator H one may also derive the VUMPS equations directly from the more common global variational principle. This follows from the fact that the VUMPS condition can be represented as ∂_Ā f(A, Ā) = 0, which implies ∂_A f(A, Ā) = 0. Hence, the total derivative of f is zero at the VUMPS fixed point. We can generalize this to two-layered transfer matrices T_1 T_2 with the special structure that they compose to a Hermitian transfer matrix, T_1 T_2 = T†T. The global projected fixed point equations then correspond to ∂_{A_1} f = ∂_{A_2} f = 0. Moreover, due to the Hermiticity of T†T, the function f is real, such that this also implies ∂_{Ā_1} f = ∂_{Ā_2} f = 0. Hence, the multi-site VUMPS equations in the special case of a Hermitian two-layered transfer matrix directly correspond to a global variational principle, just as for single-site VUMPS.
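The display equation defining f is lost in this version of the text; a plausible reconstruction of the two-layer cost function, consistent with the surrounding discussion (the precise normalization shown is our assumption), is

```latex
% Two-layer cost function for a Hermitian composite transfer matrix
% T_1 T_2 = T^\dagger T; holomorphic in A_i, anti-holomorphic in \bar A_i.
f(A_1,\bar A_1,A_2,\bar A_2)
  = \frac{\langle\psi(\bar A_2)|T|\psi(A_1)\rangle\,
          \langle\psi(\bar A_1)|T^{\dagger}|\psi(A_2)\rangle}
         {\langle\psi(\bar A_1)|\psi(A_1)\rangle\,
          \langle\psi(\bar A_2)|\psi(A_2)\rangle}.
```

Evaluated at Ā_i equal to the complex conjugate of A_i, this is |⟨ψ(A_2)|T|ψ(A_1)⟩|² divided by the norms, hence real and non-negative, which is exactly the property used in the text to pass from ∂_{Ā_i} f = 0 to ∂_{A_i} f = 0.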
For any number of layers greater than two, this argument cannot be repeated. We illustrate this here with three layers. Consider the analogous cost function f, defined in Eq. (76), where H is chosen Hermitian such that the total transfer matrix T†HT is Hermitian. The corresponding fixed point equations are still given by ∂_{Ā_1} f = 0, ∂_{Ā_2} f = 0 and ∂_{Ā_3} f = 0. However, neither is f guaranteed to be real, nor are the partial derivatives implied by their adjoint counterparts. The fixed point equations thus do not follow from a variational principle. In fact, while the fixed point equations proposed in this work characterize the actual solution to the boundary equation, there is no corresponding global variational principle whose extremum corresponds to the solution of the three boundary equations simultaneously.
An underlying global variational principle can be very helpful in stabilising the corresponding method: if the iterative VUMPS method on a Hermitian transfer matrix fails to converge, one can resort to a gradient-based optimisation scheme on f and still find the fixed point (disregarding local minima). For two-dimensional tensor networks with a unit cell that is both wider and taller than two, we know of no variational principle that can be used efficiently. Hence, in these cases the only available methods are the one proposed here together with TEBD and CTMRG, none of which is guaranteed to converge.

B Tensor networks with non-trivial unit cell
In this appendix we formally generalize the VUMPS method to tensor networks that admit a non-trivial unit cell. Instead of arguing from a power method as in Appendix A.2, we will use the emulation of non-trivial unit cell tensor networks by trivial unit cell tensor networks. It will be easy to see that the cost scales only linearly, rather than exponentially, in the unit cell size. In this work, we assume that the lattice on which the tensor network is defined has the same regularity within the unit cell as on the outer level connecting those unit cells, both having a rectangular structure.

B.1 Emulating tensor networks with a non-trivial unit cell by tensor networks with a trivial unit cell
In order to apply the VUMPS methodology to tensor networks with a non-trivial unit cell, we are going to emulate the latter by a tensor network with a trivial unit cell. However, instead of achieving this by merely blocking the tensors within one unit cell, which would lead to an exponential overhead in the complexity, we define a new tensor network with a richer structure, leading only to a linear overhead.
To start with, we observe that the data of a two-dimensional tensor network on a rectangular lattice is defined by a four-leg tensor for every position in the unit cell. In particular, this data can be modeled by a single five-leg tensor, where the fifth leg carries as a basis the sites of the unit cell. Explicitly, the basis of the fifth leg is given by Z_{n_x} × Z_{n_y}, and we can split this leg into two legs with bases Z_{n_x} and Z_{n_y}, respectively. This corresponds to making the x and y dependence of the tensors explicit. Graphically, the defining data is given by the tensor in Eq. (78). This new tensor network is the superposition of all unit cell shifts of the original tensor network. Similarly, the red and blue bonds can be seen as defect lines in the two-dimensional tensor network corresponding to the symmetry breaking. The original tensor network is then emulated via symmetry breaking into specific unit cell channels and working therein. Moreover, it is easy to see that the new tensor network still carries the information about the unit cell: working on a torus of size m_x × m_y, the tensor network will vanish unless m_x = k_x n_x and m_y = k_y n_y. In order to efficiently parametrize the boundary of such tensor networks we introduce a generalization of MPS. That is, we generalize the tensor network structure of MPS to a structure related to Eq. (78), cf. the tensor in Eq. (80). This is equivalent to a list of MPSs, each with a non-trivial unit cell. The substructure of the tensor network of an MPS constructed from tensors as in Eq. (80) is designed such that we are actually emulating non-trivial unit cell tensor networks when the MPS is placed on the boundary of tensor networks as in Eq. (78). Clearly, by blocking the indices we re-obtain an ordinary MPS, however at bond dimension n_x n_y χ and physical dimension n_x n_y D.
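A minimal sketch of this construction (shapes, names, and leg ordering are ours; in the paper the tensor is defined graphically in Eq. (78)) builds the position-resolved emulating tensor explicitly. The unit-cell label enters on one side of each bond and leaves shifted by one site, so the resulting uniform network superposes all unit-cell shifts:

```python
import numpy as np

# A[x, y] are the four-leg bulk tensors of an (nx, ny) unit cell, here
# random with uniform bond dimension D and leg order (u, r, d, l).
nx, ny, D = 2, 3, 2
rng = np.random.default_rng(2)
A = rng.standard_normal((nx, ny, D, D, D, D))

# Emulating tensor: each virtual leg is enlarged by a position label,
# (x_in, x_out) horizontally and (y_in, y_out) vertically; the label is
# passed on shifted by one site, x -> x + 1 mod nx (and likewise for y).
# Leg order: (u, r, d, l, x_in, x_out, y_in, y_out).
T = np.zeros((D, D, D, D, nx, nx, ny, ny))
for x in range(nx):
    for y in range(ny):
        T[:, :, :, :, x, (x + 1) % nx, y, (y + 1) % ny] = A[x, y]

# Linear overhead: the single emulating tensor has nx*ny*D**4 nonzero
# entries, versus exponential growth in nx, ny from naively blocking the
# whole unit cell into one tensor.
assert np.count_nonzero(T) == nx * ny * D**4
assert np.allclose(T[:, :, :, :, 0, 1, 0, 1], A[0, 0])
```

The delta structure on the position labels is what the text calls the defect lines; fixing the labels to a definite configuration (symmetry breaking into one channel) recovers the original non-trivial unit cell network.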
For MPS as in Eq. (80) we can define a gauge in a similar way as in Section 3.1, where the injectivity of the MPS holds in each of the n_x n_y channels encoding the unit cell separately. Also, the state in Eq. (81) is not normalized in the usual sense. Rather, the MPS corresponds to n_y MPSs with a non-trivial unit cell in the x-direction, each of which is normalized. Similarly, the MPS in Eq. (81) is such that normalization holds in each of the n_x n_y channels separately. In what follows, we will always take normalized to mean normalized in each unit cell channel (x, y) separately. In order to emulate non-trivial unit cell tensor networks via tensors as in Eq. (80), the boundary MPS must satisfy the fixed point equation Eq. (82). Using the relation between the left and right gauges and the fact that the y-legs are all connected through delta tensors only, we find that Eq. (82) is equivalent to Eq. (42). Note that if the MPO in Eq. (82) is not normalized such that it maps the y-th normalized boundary MPS to the normalized (y+1)-st boundary MPS, we must allow the boundary MPS to be unnormalized within the unit cell channels in order to absorb the differing scalar factors from the MPOs in each y channel, similar to Eq. (40).

B.2 Geometry of MPS with non-trivial unit cell tensors
As explained in the previous section, we work not on the usual manifold of MPS but on the manifold M_{n_x,n_y,χ} of MPS with non-trivial unit cell tensors. Therefore, let us re-derive the geometric picture of MPS on the new manifold M_{n_x,n_y,χ}, similarly to Ref. [44].
Tangent vectors are parametrized as in Eq. (83), where n indicates the site on which the defect tensor B is placed. Due to the normalization of the MPS manifold, the tangent space T_A M_{n_x,n_y,χ} at a point |ψ(A)⟩ must be orthogonal in Hilbert space to the state parametrized by that point. This, together with the gauge degree of freedom in the MPS, eliminates χ² n_x n_y degrees of freedom from the original χ² d n_x n_y degrees of freedom. This can be addressed directly by requiring that the tensor B in Eq. (83) satisfies Eq. (84), and likewise for the same diagram flipped about the horizontal axis. In particular, B must have support on the null space of A_L only. In order to parametrize such tensors B, we can define a tensor V_L, similar to Ref. [44], to be an isometry onto the null space of A_L at each (x, y). Hence, V_L must satisfy V̄_L A_L = 0 (85) and V̄_L V_L = 1. (86) As the null space of each of the n_x n_y tensors A_L is χ(d − 1)-dimensional, the right bond of V_L at each (x, y) has dimension χ(d − 1). Finally, we can write the parametrization of B using n_x n_y matrices X_{x,y} of dimensions χ(d − 1) × χ, fixing the free degrees of freedom of B. Similarly as in Ref. [44], we can rewrite the projector onto the null space, where the diverging N and N̄ denote the normalizations counteracting the non-normalized transfer matrices as before. It is worth noticing that, due to the substructure in the tensor network spanned by the x- and y-legs respectively, the only non-vanishing elements on the left-hand side are exactly those where all x-index configurations (respectively y-index configurations) match. This means that we are free to absorb all open x-legs (y-legs) on the left-hand side into one delta tensor with one open leg, whilst dropping the delta tensor on the right-hand side that splits the x-leg (y-leg) into three legs, respectively two legs.
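The construction of V_L can be sketched for a single unit cell channel (x, y): treating the left-orthonormal MPS tensor A_L as a (χd) × χ isometry, the null-space isometry follows from a full SVD (the sizes below are invented for illustration):

```python
import numpy as np

chi, d = 4, 3
rng = np.random.default_rng(3)

# A left-orthonormal MPS tensor A_L, viewed as a (chi*d) x chi isometry
# (A_L^dagger A_L = 1), here generated from a QR decomposition.
A_L, _ = np.linalg.qr(rng.standard_normal((chi * d, chi)))

# Full SVD: the first chi left singular vectors span the range of A_L;
# the remaining chi*(d-1) columns span its orthogonal complement.
U, _, _ = np.linalg.svd(A_L, full_matrices=True)
V_L = U[:, chi:]  # (chi*d) x (chi*(d-1)) null-space isometry

# The defining properties, Eqs. (85) and (86).
assert np.allclose(V_L.conj().T @ A_L, 0)
assert np.allclose(V_L.conj().T @ V_L, np.eye(chi * (d - 1)))

# A tangent tensor B = V_L X, for any chi*(d-1) x chi matrix X, is then
# automatically orthogonal to A_L, as required for the tangent space.
X = rng.standard_normal((chi * (d - 1), chi))
B = V_L @ X
assert np.allclose(A_L.conj().T @ B, 0)
```

In the multi-site setting this is repeated independently for each of the n_x n_y channels, yielding one X_{x,y} per channel.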
In other words: the tensor network structure involving the delta tensors on the left-hand side is automatically fulfilled by the tensor network on the right-hand side (independent of the M, C and A_{R/L} tensors). Using the tensors A_C and C, it is easy to see that the optimality criterion takes the form of Eq. (93). As Eq. (92) is the gauge condition for the generalized MPS, which is unique up to a phase, we find that A_C ∝ A_C′ and C ∝ C′, which are the generalized VUMPS equations.

C Sequential implementation
The algorithm defined in the main text updates all tensors in parallel. Similarly, one could define a sequential update method. In particular, instead of updating the whole unit cell at once, one could iterate through the x-channel in Eqs. (48) and (49) as follows. To start, we initialize a set of n_y MPSs with a unit cell of size n_x, each of these MPSs corresponding to a boundary. Then, for each x in the MPS unit cell, we derive the local equations h_{A_C} (cf. (48)) and h_C (cf. (49)) using (46) and (47) at x (where h_C should actually be derived for both x and x − 1). Solving h_{A_C} and h_C, we obtain new tensors A_C at x for all y, and C at x and x − 1 for all y. Using these, we update the boundaries using (33) and (35), obtaining a gauging error. We then start over from the new set of boundaries at x + 1 until one sweep through the unit cell is completed, and repeat the sweep from the beginning until the gauging error (or the difference of the singular values of C between iterations) is smaller than a desired threshold ε. The algorithm is shown in more detail in Algorithms 3 and 4.
The difference between the sequential and parallel update for local Hamiltonian quantum systems, analogous to this work with n_y = 1 and a hermitian MPO, is discussed in Ref. [42] (cf. Figure 1 therein). They find the two algorithms to perform comparably. While the parallel update has a smaller computational overhead per iteration, because the environment terms need to be computed only once, the sequential algorithm seems to find shortcuts in the iteration trajectory, resulting in fewer iterations until convergence. This can be understood nicely using the picture of Appendix B: while the parallel update corresponds to the minimization with respect to the full tangent space projector Eq. (89), the sequential update breaks these updates into optimizations with respect to hyperplanes within the tangent space (as the x-channel in the projector is fixed), with the consecutive updates in the x-direction being coupled. This means that the sequential update has more freedom to find different iteration trajectories, which, however, are in general not guaranteed to be shorter. It is easy to see that a solution of one algorithm is also a solution of the other.
While the understanding carries over, the findings cannot be applied straightforwardly to this work. Here we are dealing with MPOs rather than local Hamiltonians. In turn, the environment equations are as complex as the leading term in complexity, the computation of A_C. Therefore, the environment terms are a leading term in complexity, such that the sequential implementation scales quadratically in n_x. Note: the sequential algorithm in Ref. [42] scales quadratically in n_x too, but only in a sub-leading term. For n_x sufficiently larger than the physical dimension, however, the parallel update can be expected to perform better than the sequential update in their case. Obviously, in our case it still holds that both algorithms converge to the same solutions.

Algorithm (sequential multi-site VUMPS sweep):

  (|ψ(A_y)⟩)_y ← initialize MPS array
  (L_{x,y})_{x,y}, (R_{x,y})_{x,y} ← update environments from {(|ψ(A_y)⟩)_y, M} calling environment_terms
  ε_gauge ← 1; (ε_{gauge,y})_y ← (1)_y
  while ε_gauge > ε_prec do
      for x ∈ {0, ..., n_x} do
          (A_C^{x,y})_y ← solve h_{A_C} using an iterative solver calling local_apply_h_a_c at x
          (C_R^{x,y})_y ← solve h_{C_R} using an iterative solver calling local_apply_h_c at x
          (C_L^{x,y})_y ← solve h_{C_L} using an iterative solver calling local_apply_h_c at x − 1
          for y ∈ {0, ..., n_y} do
              |ψ(A_y)⟩, ε_{trunc,y} ← {A_C^{x,y}, C_R^{x,y}, C_L^{x,y}} following (33) and (35)
          ε_gauge ← max_y {ε_{gauge,y}}
          (L_{x,y})_{x,y}, (R_{x,y})_{x,y} ← update environments from {(|ψ(A_y)⟩)_y, M} calling environment_terms

Accepted in Quantum 2020-08-03, click title to verify. Published under CC-BY 4.0.