NoRA: A Tensor Network Ansatz for Volume-Law Entangled Equilibrium States of Highly Connected Hamiltonians

Motivated by the ground state structure of quantum models with all-to-all interactions such as mean-field quantum spin glass models and the Sachdev-Ye-Kitaev (SYK) model, we propose a tensor network architecture which can accomodate volume law entanglement and a large ground state degeneracy. We call this architecture the non-local renormalization ansatz (NoRA) because it can be viewed as a generalization of MERA, DMERA, and branching MERA networks with the constraints of spatial locality removed. We argue that the architecture is potentially expressive enough to capture the entanglement and complexity of the ground space of the SYK model, thus making it a suitable variational ansatz, but we leave a detailed study of SYK to future work. We further explore the architecture in the special case in which the tensors are random Clifford gates. Here the architecture can be viewed as the encoding map of a random stabilizer code. We introduce a family of codes inspired by the SYK model which can be chosen to have constant rate and linear distance at the cost of some high weight stabilizers. We also comment on potential similarities between this code family and the approximate code formed from the SYK ground space.


Introduction
Tensor networks are a powerful tool in the study of geometrically local quantum systems which have proven particularly useful for one-dimensional systems [1].In quantum many-body physics, they first appeared in the guise of "finitely-correlated states" [2] and were later understood to underlie the functioning of a powerful numerical technique, the density matrix renormalization group (DMRG), which gave unprecedented access to ground states of 1d Hamiltonians [3].It was understood that DMRG worked because the ground states of interest had limited entanglement and could be effectively compressed to a much smaller space parameterized by so-called matrix product states, a simple kind of 1d tensor network.The use of these tools has since broadened, and there is now a large family of tensor network architectures that are used for both analytical and numerical purposes, both with classical computers and, potentially, quantum computers, with approaches including [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18].
In contrast, such network representations have not been much explored for mean-field quantum models which are characterized by all-to-all interactions amongst their degrees of freedom.This is presumably because ground states of such models are expected to be volume-law entangled (e.g.[19,20]), and such a high degree of entanglement is costly to represent using existing tensor networks.In this paper, we address this problem by proposing a class of tensor networks which have the potential to represent the highly entangled ground states of meanfield models.
The networks we consider can be viewed as generalizations of MERA, DMERA, and branching MERA networks where the requirement of spatial locality is removed [5,[21][22][23].As we show below, such networks can accommodate volume law entanglement as is expected for ground states of meanfield models.However, without the imposition of additional structure it is not possible to efficiently contract these networks on a classical computer.Nevertheless, they provide a number of conceptual advantages and can still form the basis for variational quantum algorithms, e.g.[24,25].
We are particularly motivated to consider these networks in light of the physics of the Sachdev-Ye-Kitaev (SYK) model [26][27][28][29][30].This is a model of all-to-all interacting fermions with a number of unusual features, including an extensive ground state degeneracy and a power-law temperature dependence of the heat capacity at low temperature.Moreover, these curious low energy properties are related to the existence of a dual description in terms of a low-dimensional theory of gravity known as Jackiw-Teitelboim (JT) gravity [27].It is desirable to better understand the emergence of this gravitational physics, especially for a fixed realization of the couplings, in both the SYK model and beyond.Following earlier ideas relating tensor networks and holography, a small sampling of which is [31][32][33][34][35][36][37][38][39], a tensor network model of SYK may also provide useful information about the emergence of the bulk.
Informed by these properties, we consider a class of networks which can encode an extensively degenerate space of highly entangled ground states.Figure 1 illustrates the network architecture, dubbed the non-local renormalization ansatz (NoRA), which should be viewed as a quantum circuit ansatz for the ground space of a suitable class of Hamiltonians.To justify this architecture as a potential model of SYK, we estimate its entanglement and circuit complexity and find qualitative agreement with SYK expectations.In addition to constructing the ground space, the network also provides a skeleton on which we can build a model of excitations [40].For an appropriate choice of parameters this model can exhibit a power-law temperature dependence of the thermodynamic entropy (and therefore the heat capacity).These features are the key desiderata underlying our construction, and are discussed in detail in Section 2.
A natural next step would be to explore the NoRA network as a variational ansatz for SYK.This is complicated by two issues: we need to generalize the network structure to fermionic degrees of freedom, and we need to find a way to efficiently contract the network (or use a quantum computer).Given this extra complexity, we have elected to first explore the architecture in a simpler setting where the elementary gates are not variationally chosen but instead are taken to be random Clifford gates.This enables us to study the network properties using the stabilizer formalism [41] without needing to explicitly contract the network.Moreover, this setting yields a class of stabilizer codes in which the logical space is identified with the ground state degrees of freedom and the network represents an encoding circuit for the code.We study the stabilizer weights and distance of the resulting codes as a function of the layer depth D and the total system size N (see Figure 1).We find that the network can produce good quantum codes [42], meaning code families where the distance and number of logical qudits are both proportional to the number of physical qudits.However, these codes are not low-density parity check (LDPC) codes [43,44] since some of the stabilizers are high weight.We hypothesize that by further fine-tuning the gates, our network architecture could also yield encoding circuits for the recently discovered classes of good quantum LDPC codes [45][46][47][48][49][50].
The rest of this paper is organized as follows: In Section 2 we describe the architecture in detail and discuss its key properties.In Section 3 we define a family of random Clifford networks based on our architecture and discuss their interpretation as encoding circuits for stabilizer quantum error correcting codes.In Section 4 we report a numerical study of several different realizations of the architecture falling within the stabilizer code ansatz.We describe in detail how the distance and stabilizer weights of the resulting codes depend on the model parameters.In Section 5 we discuss a particular thermodynamic limit which is inspired by the structure of SYK.We compare the entanglement and complexity to expectations from holographic calculations and comment on the code properties.Finally, in Section 6 we give an outlook and discuss ongoing and future work.

Network Architecture
Throughout this section we work with general qudits of local dimension d.We first describe the general structure of the class of NoRA networks we consider, then we specialize to a particular network structure inspired by scaling and renormalization group (RG) considerations.We analyze both the entanglement and complexity of the scalingadapted ground state network and discuss an extension to describe excited states.In particular, we show that a natural choice of energy scales in a toy model Hamiltonian can give rise to a powerlaw temperature dependence of the thermodynamic entropy and heat capacity.

General Structure
The NoRA network is defined by L layers as in Fig. 1, where we refer to the bottom qudits as ground state qudits and the other qudits as excited state or thermal qudits.When we set the thermal qudits to some fixed product state, |0⟩, we obtain the ground state network as in Fig. 1.This nomenclature is chosen because we can view the network as a variational ansatz for the ground space of a mean-field model.From this point of view, the ground state qudits parameterize a space of states that would be identified with the degenerate ground space of the concrete model of interest.
One way to think about the network is as a "finegraining" circuit moving upwards from the bottom ground state qudits.This is the inverse of a conventional RG transformation since we are adding degrees of freedom.We start with k of these ground state qudits.Then at each layer ℓ we add ∆n ℓ thermal qudits in the fixed state |0⟩ ⊗∆n ℓ and apply a depth D quantum circuit to all the qudits in that layer.This circuit could also be generalized to be time evolution with a suitably normalized all-toall Hamiltonian for a constant time (proportional to D).The next layer takes all the qudits from the previous layer and adds more thermal qudits to generate the hierarchical structure in Fig. 1.The total number of qudits at layer ℓ is denoted n ℓ and given by (1) Figure 1: Basic architecture of the proposed NoRA tensor network ansatz.A code word |ψ code ⟩ consisting of n 0 ≡ k (logical) "ground-state" qudits is embedded as |Ψ phys ⟩ in the (physical) ground space of the d N -dimensional manybody Hilbert space by the means of L layers of some given depth D quantum circuits.For each layer 1 ≤ ℓ ≤ L the circuit D ℓ acts on the n ℓ−1 qudit output from the previous layer and an additional ∆n ℓ new ancillary "thermal" qudits initialized in state |0⟩.We stress that the layer circuits D ℓ do not have to respect locality structure depicted by the 1d arrangement of qudit lines.
The total number of qudits is therefore (2)

Scaling Specialization
As is, we have described a fairly general architecture.Motivated by scaling and renormalization group considerations, we will primarily consider the special case where n ℓ ∼ k + r ℓ , so that the number of thermal qudits is increasing exponentially with each layer up from the bottom.Viewing the top layer as the UV or microscopic degrees of freedom and the bottom layer as the IR or emergent degrees of freedom, moving from the UV to IR (top to bottom) mimics a renormalization group transformation where we remove some fraction of the thermal degrees of freedom at each step.Indeed, borrowing the language of MERA and DMERA and viewing the circuit from top to bottom, the individual layers are like disentanglers that leave behind some decoupled degrees of freedom, the thermal qudits added at that layer.In this scheme, we choose the number of qudits at layer ℓ to be implying that the number of new thermal qudits for each layer must be For the case of r = 2, which we primarily consider in this work, this simplifies to approximately ∆n ℓ = 2 ℓ−1 for all layers ℓ.

Entanglement and Complexity
We next discuss the entanglement and complexity of the RG-inspired network.There are O(N ) nontrivial bonds in the circuit, of which N bonds connect to the same constant-depth circuit in the last layer.It is therefore straightforward to establish that the network has the potential to encode volume law entanglement for sub-regions of a "typical" UV state.We also explicitly demonstrate that this is achievable within the Clifford model discussed below in Sections 3, 4, and 5.
Turning to the complexity, we take the number of gates in the network as an estimate of the circuit complexity of the UV state, although in general this is only an upper bound.For a layer ℓ with n ℓ total qubits in it, we apply D rounds of ⌊n ℓ /q⌋ q-qudit gates, so the number of gates of layer ℓ is gates at layer ℓ = D • ⌊n ℓ /q⌋. ( Summing this result over all layers and assuming that q divides n ℓ without remainder gives a total number of gates equal to In sections 4 and 5 we will cast this result into simpler leading-order expressions that correspond to the respective types of ground space scaling being considered.

Extension to Excited States
Let us conclude this section by extending the ground state network we have so far discussed to the case of excited states.As we have repeatedly emphasized, the discussion so far is general and does not consider a particular physical Hamiltonian.We are simply trying to match certain qualitative features of the entanglement and complexity expected for mean-field models.A structure similar to what we will consider here was recently studied for non-interacting fermions and advocated for as a general approach to approximating thermal states [40].
The idea is to introduce a toy Hamiltonian for which the above network is an exact ground state for any choice of state on the k ground state qudits.In other words, the toy Hamiltonian has an exactly degenerate ground space.The Hamiltonian is constructed in a standard way by introducing projectors P = |0⟩⟨0| for each thermal qudit and defining corresponding projectors acting on the UV qudits by conjugating these elementary projectors with the network circuit.Let Pi denote the projector for thermal qudit i conjugated by the network circuit.The toy Hamiltonian is where J i are a set of free parameters that determine the energy scale associated with each thermal qudit.Note that -just like the circuit it encodes -this Hamiltonian is highly non-local and not necessarily few-body, thus limiting the potential for physical interpretation.The setup is described in more detail in appendix A. Again motivated by RG considerations, in which the energy scale of excitations decreases by a fixed factor after every RG step (top to bottom), we take the J i to be equal within a layer and to depend on the layer index ℓ as In this way, the UV energy scale is Λ and the energy of excitations decreases exponentially with the layer index decreasing towards the IR.The free parameter γ controls the rate of decrease.As computed in appendix A, the entropy for the Gibbs ensemble associated to said toy Hamiltonian describing our tensor network ansatz (and for general scaling of J ℓ ) is (9) where we defined a probability, S(p ℓ ) is the classical binary entropy function, Note that in the case of qubits (d = 2), p ℓ coincides with the ordinary Fermi-Dirac distribution, in which case ⟨N − k⟩ is analogous to a sum of occupation numbers.
Plugging in (8) and going to the low-temperature regime (relative to the energy scale Λ), ( 9) can be approximated in the continuum limit as with N = k+r L and α = log(r).This together with the specific example depicted in figure 2 confirms that in this limit the entropy does obey a power law.By choosing the parameters α and γ suitable, one could even match the precise low-temperature behavior of the SYK heat capacity C V (which is proportional to T ) due to dS = C V T dT : Both match almost exactly for our choice of parameters and small T /Λ, confirming the existence of a scaling law.The same is also true for other choices of γ (the only significant free parameter), as seen in figure 14.

Summary
Starting from the general architecture in Figure 1, we introduced the RG-inspired network in which the number of qudits at layer ℓ is k + r ℓ .In the special case where k = 0, i.e. a non-degenerate ground space, the number of qudits decreases by a factor from one layer to the next into the IR.This decrease is analogous to a block decimation RG procedure applied to a quantum state.The case of k ̸ = 0 describes a generalization of such an RG procedure.The entanglement entropy of the physical states produced by the RG-inspired network can be volume-law, as expected for mean-field models.We also showed that the ground state network can be extended to provide a model of thermal excitations in which the thermodynamic heat capacity has a power-law temperature dependence at low temperature.These general features are all chosen to match characteristics of the SYK model, which also features a nearly degenerate space of highly entangled ground states and a power-law heat capacity at low temperature.

Clifford Ansatz
Having laid out the scaling-inspired architecture in the previous section and shown that it can capture some expected features of mean-field models, especially the SYK model, we now consider a concrete version of the network built from Clifford gates.We would also like to use the network as a variational ansatz to study physical mean-field models, but for the reasons outlined in the introduction, in this paper we focus on the Clifford model as an example where we can also classically simulate the network properties.A review of the Clifford group and how it can be implemented is provided in appendix C. If the circuits in Figure 1 are composed of Clifford gates, then the network can be interpreted as an encoding circuit for a stabilizer quantum error correcting code [41].The ground state qudits then correspond to the logical qudits of the code.We focus in particular on the distance of the code and the weight of the stabilizers, as they provide a good heuristic for probing the entanglement structure and give us a glimpse at the network's potential as an error-correcting code.The purpose of this section is to review this error correction interpretation and setup the subsequent calculations in Sections 4 and 5.

The Clifford Group
Let us briefly recall the motivation for Clifford circuits.In general, simulating quantum circuits on a classical computer architecture becomes difficult with increasing number of qudits due to the exponential scaling of the Hilbert space dimension with the number of qudits.However, we can still compute certain quantities efficiently on classical computers by restricting ourselves to a subgroup of the full unitary group that only scales linearly in the number of qudits [51,52].This group is called the Clifford group and is defined as the subgroup of unitary operators that map Pauli strings to Pauli strings [51].Elements of the Clifford group can then be represented as Clifford circuits, which are circuits composed of successive (elementary) Clifford gates acting on a bounded number of qudits at a time.An example of such a Clifford circuit is depicted in Figure 3 in the context of random scrambling.
The Clifford group has a variety of applications in quantum information.For example, in the context of generating random states, the Clifford group is useful because it forms a k-design of the Haar measure of random unitaries.This means that quantities averaged over random choices of gates/states only start to differ between Clifford and Haar in probability moments higher than k, where we have k = 1 for all possible qudit dimensions d, k = 2 for all d that are powers of primes, and k = 3 when said base prime is 2. [53,54].
The reason the Clifford group for N qudits with local dimension d can be efficiently simulated resides in the fact that it is a projective representation of the symplectic group Sp(2N, F d ), where 2N is the vector space dimension the group acts on and F d is the (unique) finite field with d1 elements.As mentioned before, the space of Pauli strings therefore scales linearly, with operators mapping between them being represented (up to a phase) by 2N × 2N symplectic matrices over F d .Sampling Cliffords therefore can be achieved by sampling symplectic matrices, for which efficient algorithms exist [55].A more detailed description of this framework is provided in Appendix C.

Random Layer Circuits
To define a precise model based on our architecture, we have to make an explicit choice for the depth D circuits D ℓ that are applied at each layer ℓ.Inspired by SYK, our approach is to apply n ℓ /q randomly sampled Clifford gates2 to randomly chosen non-intersecting sets of q qudits for each sublayer 1 ≤ m ≤ D of the total layer circuit.Such a Clifford circuit is depicted in figure 3 for q = 2. Heuristically, this ansatz can be interpreted as a Trotterization of the SYK Hamiltonian, although with qudits instead of Majorana fermions 3 .
With that we can then view the resulting net- An example of a layer circuit acting on 10 qudits with q = 2 and depth D = 3.Each unmarked gate represents a randomly sampled Clifford element acting on two qudits, while the gates π m for m = 1, 2, 3 are random permutations of the qudits.While such a circuit does not exhibit a causal cone, this non-locality of interactions is expected from mean-field models.
work as an encoding circuit for a quantum stabilizer code.The k ground state qudits are the logical qudits and the N UV qudits are the physical qudits.We now briefly review stabilizer codes and the important notion of distance, which captures aspects of the entanglement structure discussed above in Section 2.

Stabilizer Codes
A [[N, k, δ]] stabilizer code that encodes k logical qudits into N physical qudits with distance δ is defined in terms of a stabilizer group S, which is an abelian subgroup of the (generalized) Pauli group P d (N ) i.e. the group generated by all possible Nelement tensor products of ordinary Pauli operators (d = 2) or their higher-dimensional counterparts (d > 2), which are defined in appendix C.1 [41].The stabilizer group must therefore be generated by N − k independent and commuting elements of P d (N ).A code word then is a state vector |ψ⟩ ∈ C d N that satisfies s |ψ⟩ = |ψ⟩ for all s ∈ S. The space spanned by all possible code words is called the code space and has dimension d k due to the rank of the group being N − k.The operators mapping logical states to other logical states are called logical operators and must therefore commute with all elements of the stabilizer group and hence form the centralizer of the stabilizer group in in P d (N ).

Decoupling & Code Distance
The code distance is a measure of how robust the code is to errors on the physical qubits.Determining the distance for a stabilizer code is in general a computationally intensive problem due to the potential for complex patterns of entanglement.We use an adversarial approach, which is based on analyzing the mutual information between all possible subsystems A of the physical qudits and some external reference R which is maximally entangled with the code space.A depiction of the setup can be found in figure 4.
Because R is maximally entangled with the code space, it is effectively tracking the encoded information.Therefore the question is how much of the system does an adversary need access to in order to be correlated with R and thus have (at least partial) access to the encoded information.This correlation can be detected using the aforementioned mutual information (16), which becomes non-zero in such a case.The code distance δ is the biggest integer such that all regions A with |A| < δ have Implementing this approach as an algorithm is time-consuming though, since iterating through all possible choices for A is combinatorically intensive.A way to simplify the procedure at the cost of only getting an upper bound approximation for the code distance is by randomly sampling choices for A and determining the largest one which has vanishing mutual information.This Monte Carlo approach is the method we use.

Stabilizer Weights
It is also interesting to ask about the weights of the stabilizers.The weight of a Pauli string is defined as the number of elements of P d (1) in the tensor product representation of the operator that are not proportional to the identity operator I. Since P d (1) contains d 2 elements, there are d 2 − 1 such nontrivial operators.If the stabilizer group has a generating set containing only Pauli strings of bounded weight, then we say the code has constant weight.The code space can always be obtained as the ground space of a Hamiltonian built from a generating set of the stabilizer group, and if the code has constant weight then there is such a Hamiltonian which contains only terms acting on a bounded number of qudits at a time.

Summary
Here we reviewed the notion of a stabilizer code and defined the random Clifford gate version of our architecture.In the following two sections, 4 and 5, we consider random stabilizer codes built from random Clifford layers inserted in the RG-inspired architecture (Figure 1 and Section 2).We investigate the distance and stabilizer weights both numerically and via analytic arguments.We verify that these codes can be highly entangled, for example, with a distance proportional to N .We also study the distribution of stabilizer weights and show that some stabilizers do have high weight proportional to N .As such, they are not constant weight codes in general.

A Numerical Study
We now present a (non-exhaustive) numerical analysis of the NoRA tensor network using the Clifford stabilizer formalism discussed previously.Our primary focus is the scaling of the (relative) code distance with N , and how it differs between having the space of ground states scale with L and having it fixed.
The stabilizer simulation used to generate the following data was written in Python 3.10.4using Numpy 1.21.6 (linear algebra) [56] and Galois 0.1.1 (finite field arithmetic) [57], and is based on the projective symplectic representation discussed in appendix C. The algorithm used to randomly sample symplectic matrices for Clifford operators is based on [55], but was generalized to work for any choice of qudit dimension d that is a power of an odd prime.The complete code can be found at https://github.com/vbettaque/qstab.
The datasets used in this paper were generated using a 2021 MacBook Pro with M1 Pro processor and 16 GB RAM, and can be shared upon request.If the computation involved random sampling, an average of 1000 samples is displayed together with the error on the mean 4 .In general we also chose a qudit dimension of d = 3, a growth rate of r = 2 per layer, and a (naive) layer circuit growth rate of q = 2.

Fixed Ground Space Size
We begin our analysis with the case where the size of the ground space is fixed.The other case, where the size of the ground space grows with L, more closely resembles SYK, but the fixed size case is also interesting as a starting point and for the codes it produces.In such cases, the rate k/N of the code approaches zero exponentially fast with the total layer number.However, the complexity still increases exponentially in L according to suggesting that distances scaling with N should be achievable.The vanishing rate is also not inher-ently problematic as this is also true for other errorcorrecting codes like the [[A 2 , 2, A]] toric code [58].

Entanglement Entropy
The first part of our analysis deals with directly confirming the expected volume-law entanglement of the average NoRA-prepared stabilizer state.
For any state to have that property, the (von-Neumann) entanglement entropy S(A) associated to a random subregion A of the state should scale with the size |A| of the region, at least for the right parameters and as long as one has |A| < N/2.Entanglement Entropy (S(A))

Code Distance
We now turn to determining how the average code distance depends on the layer-circuit depth D. This is of interest to us since for error correction we want to choose D to be as small as possible to reduce the circuit complexity, while still having δ as large as possible (i.e.saturated ) on average.Looking at figure 6, that seems to be the case for D sat = 2, 3, 4, depending on the tolerated margin of error between δ and δ max .The case of D sat = 1 not coming close to maximizing the distance coincides with the lack of volume-law entanglement in figure 5 for the same depth, albeit for different choices of L. Due to the approximately exponential trend of the data, the average distance is already close to its saturated maximum of δ max = 64.43 ± 0.09 if the layer circuits have a depth of D = 3. Larger depths therefore provide little to no improvement.However, the maximum average distance is still several standard deviations less than the theoretical maximum provided by the quantum singleton bound δ qsb , though we expect them to be somewhat closer (but not necessarily equal) at large D and L.
In general one would assume that D sat depends on L as well as all other parameters.However, we can argue that D sat is largely independent of L and should only strongly depend on d, q and r.As is shown in appendix B, the weight of an operator string increases on average by a factor of g Dsat (where g ≈ q • (d 2 − 1)/d 2 only depends strongly on d and q) when subjected to a single q-local Clifford with depth D sat .And since the distance of a typical stabilizer code is expected to scale like its operator weight, we can adopt the same arguments here 5 .This means that D sat ≈ log g (n ℓ /w ℓ−1 ) should estimate the minimum depth needed for both weight and distance saturation.
In the case of NoRA with k fixed we have n ℓ /n ℓ−1 = (k + r ℓ )/(k + r ℓ−1 ) ≈ r for r ℓ ≫ k, meaning the system increases by a roughly constant factor at each layer.Given that we start in a steady state, i.e. with a Pauli string with close to maximum weight (w ℓ−1 ≈ n ℓ−1 ), then for the string in the subsequent layer to also have maximum weight we require that w ℓ ≈ n ℓ ≈ g Dsat • n ℓ−1 and hence which does not depend strongly on L (or ℓ).For d = 3 and q = r = 2 this gives D sat ≈ 1.2 which rounds up to 2, the observed lower bound for the occurrence of volume-law scaling.Additional numerical evidence for this heuristic with regard to different choices of L is also provided by figure 9 in section 4.1.3.Overall we therefore can have D sat fixed for different sizes of the tensor network without expecting a significant impact on the relative code distance δ/N .Figure 6 also shows that the tensor network can (on average) achieve distances that are quite close to the theoretical maximum: This maximum is assumed if the quantum singleton bound N −k ≥ 2(δ−1) becomes an equality.Reaching this saturation limit (or at least coming close to it) requires states with volume law entanglement, thus verifying our previous expectations.It would be interesting to understand how close the average code distance comes to δ qsb as a function of L and D. However the computing time scales exponentially with N and therefore double-exponentially with L, making it more difficult to gather data for larger system sizes.But for now our results do indeed suggest a possible approximate distance saturation with more layers, as shown in figure 7 for one specific example 6 .We say approximate because (18) implies that for reasonable choices of D we expect the system to reach a steady state after a certain number of layers, meaning that for subsequent layers the scrambling rate of the finite-depth circuit and the rate of new thermal qudits form an equilibrium and thus keep the relative distance constant.Depending on the choice of parameters, this equilibrium does not necessarily have to coincide with δ qsb .However, we expect this to be the case for unreasonably large scrambling rates g D ≫ r due to the network dynamics being dominated by the upper-most finite-depth circuit D L .Finally we consider how the code distance δ scales when the number of logical qudits k is increased while keeping the number of layers L fixed.
Doing so provides another heuristic as to whether the tensor network exhibits volume-law entanglement or not, since we expect a linear decrease of the entanglement entropy and therefore distance with increasing k in that case.As shown in figure 8 this seems to be indeed the case on average and for our choice of parameters.

Stabilizer Weights
Besides the average code distance of our tensor network ansatz, it is also interesting to consider the weight distribution of the (naive) stabilizer basis describing the code.For the purpose of performing error correction, having a low-weight code (meaning a code with a low weight generating set for the stabilizer group) is desirable since it means the syndrome can be obtained by measuring lowweight operators.If many of the stabilizers are high weight, it might be infeasible to measure the entire syndrome before too many errors accumulate.Moreover, the commuting projector Hamiltonian whose ground space coincides with the code space is only local (few-body) if the code is lowweight.
To analyze the whole stabilizer basis, we first consider how the tensor network affects the weight of a single Pauli string with unit weight.The nonidentity operator is here at the beginning of the string, which means that it is acted on non-trivially by all layers of the circuit.The resulting averaged weight evolution is depicted for different choices of D in figure 9 in terms of its relative difference to the expected maximum weight which is given at each layer ℓ by What can be seen is that for all choices of D > 1 the weight differences reach an equilibrium 7 , barely changing for later layers.The same circuit depth therefore always approximately produces the same relative weight, regardless of the actual number of layers L. We already used this argument in section 4.1.2to argue that the minimal depth D to achieve a good distance δ does not depend on L because distance and weight usually have correlating behaviors.However, when considering a complete set of stabilizer states, not all of them are acted on nontrivially immediately since they might correspond to thermal qudits introduced only in later layers.Take for example the number of thermal qudits introduced in the final layer of a NoRA circuit, which experience the least scrambling of all degrees of freedom but make up r L−1 of the total k+r L qudits.For k = r = 2 this is close to 50%.Unlike an arbitrary random circuit with equivalent total depth, where all stabilizer basis elements experience the same amount of scrambling and are therefore expected to have equally high weights, we predict the stabilizer weights of a NoRA circuit to obey a multimodal distribution with dominant peaks around both 1 and w sat L .This is indeed approximately the case as seen from the specific example depicted in figure 10.It also shows that with increased circuit depth D more and more basis weights approach saturation, as expected.
Note though that for D = 1 all stabilizers fail to come anywhere close to maximum weight.This aligns with the predictions that are coming up in section 5.1, where we suggest that some sort of phase transition should occur in the relative stabilizer weight distribution (and hence distance) when  Already for a circuit depth of D = 2 does the relative difference seemingly converge towards a constant value, with larger depths resulting in even faster convergence and smaller differences.This shows that the relative weight (and therefore the ideal circuit depth to maximize the code distance) does only depend strongly on d, q and r, not L.
going from the regime of g D < r to g D > r, and considering large L. In the former case we expect the average stabilizer weight to be small to negligible compared to the total size, while in the latter case we predict complete weight saturation for all elements.For the specific example in figure 10 we assumed g = q • (d 2 − 1)/d 2 = 16/9 < 2 (as shown in appendix B) and r = 2, meaning that we should have g D < r for D = 1 and g D > r for D > 1.
And since the relative weights for D = 1 are comparatively small, this indicates that this transition does indeed take place.In the future we intend to explore this behavior in more detail by looking at other examples in the parameter space.
Comparing the weight analysis with our results from the previous section we can therefore conclude that the stabilizer bases with the lowest weights and highest distances are achieved when choosing D = 3 as the layer circuit depth.Choosing D = 1 could also be beneficial though at a significant cost of distance.Either way, the relative number of high-weight stabilizers is significant, thought we expect there to be potential for further reducing the weights, as will be explained in section 6.2.

Enabling Ground Space Scaling
To model a situation like SYK where the number of ground state qudits is proportional to the total number of qudits, and where we have a thermodynamic limit where both numbers go to infinity, we want to take L ∼ log r k, although this is not an integer in general.So consider as a simple model the case where k = r a and L = a + b for two integers a and b.Then the total number of qudits is and the ratio between ground state qudits and the total number of qudits is therefore which is independent of a.The limit a → ∞ can thus be viewed as a thermodynamic limit in which N and k diverge but with a fixed finite ratio.By varying b we can then adjust the relative number of ground state qudits.Many of our previous arguments that assume the number of layers to be fixed therefore apply here as well and will not be repeated.Of primary interest to us is therefore how our code's distance and weight scale with N = r a + r a+b where a increases and b is fixed.In addition to the previously made choices for d, q and r, we also assume D = 3 and b = 1 for the following examples.

Code Distance
Before considering explicit simulations, we can again use the quantum singleton bound to find an upper bound for the expected relative code distances.This bound turns out to be which unlike the fixed case is necessarily dependent on N , although only weakly at large N .For large system sizes the relative distance therefore approaches the fixed value of 1/3 for b = 1 (and r = 2).Comparing this to numerical approximations of the average distance and its trend as shown (in orange) in figure 11, we can see that both trends might coincide in that very limit, or at least come close.

Stabilizer Weights
As seen in figure 12 the weight distributions for the SYK-like NoRA model don't differ significantly from the case of a fixed ground space.The only significant difference lies in the origin of the distributions: In the case of a scaling ground space we extracted the weights from circuits with different choices for a, while in the fixed case we depicted the weights at each layer of a single circuit.That both cases nevertheless produce similar figures is due to the fact that our tensor network ansatz exhibits self-similarity.It remains to be shown that this trend occurs for different choices of g and r and continues as expected for larger a and b.We are also interested in exploring the phase transition at hand in the limit of large L.Those are things we intend to explore in future work.

Summary
Through extensive numerical simulations, we verified that the stabilizer codes obtained from the random Clifford layers indeed can have volumelaw entanglement, and relative distance approaching a non-zero constant in the thermodynamic limit N → ∞.This corresponds to distance proportional to N , which in turn is also a heuristic for the presence of volume-law entanglement.The relative distance depends on the model parameters, especially the depth D, with the result coming close to the relative singleton bound in both the fixed k case and the k ∝ N case as D is increased.We also found a broad distribution of stabilizer weights, with a few high weight stabilizers coming from the near-IR thermal qudits and a larger number of low weight stabilizes coming from the near-UV thermal qudits.Our architecture with random layers is therefor capable of producing a family of codes indexed by N with non-vanishing relative distance and rate at the cost of having some high weight stabilizers (although significantly fewer than in a fully random code).

Analysis of the SYK-Inspired Code
We now consider in more detail the properties of the SYK inspired code with k = r a and L = a+b for two integers a and b.Recall that the total number of qudits is and the ratio between ground state qudits and the total number of qudits (i.e. the rate) is therefore which is independent of a.The thermodynamic limit a → ∞ gives a family of codes with non-zero rate.We already established in Section 4 that this code can be highly entangled.It is also interesting to consider its complexity.In this case, the complexity sum (6) can be rewritten as This leading N log N scaling with the total number of degrees of freedom can be compared to holographic complexity conjectures applied to JT gravity [59]; one also gets N log N by studying, for example, the volume (length) of the wormhole dual to the thermofield double state with temperature of order 1/N .The key point is that the throat of the wormhole is long, of order log N , at this temperature.Hence, the circuit complexity of our SYKinspired encoding also resembles that obtain from holographic models dual to SYK.
For the estimates discussed below, we continue to assume that the layers are composed of random 2-qudit Clifford gates applied to random pairs of qudits.We caution that this is certainly not correct for the actual SYK model: the gates must act on fermionic degrees of freedom and will not be Clifford (or the fermionic analogue of Clifford) generically.Here we continue to focus on the Clifford case for ease of analysis and for its interpretation in terms of an exact quantum error correcting code.Below we comment briefly on the potential similarities and differences with the actual SYK model.

Distance Estimate and Stabilizer Weights
We know the rate of our SYK-inspired code.To estimate the distance, we need to understand how logical operators grow as they pass from the IR to the UV.Let us assume that a typical operator grows in size by a factor of g D after passing through one layer (i.e.being conjugated by that layer unitary), up to a maximum size set by the total number of qudits.A way to estimate g when the layer unitary is a random Clifford circuit can be found in appendix B. At the same time, the number of qudits is also growing, going from k + r ℓ−1 to k +r ℓ .The distance depends on whether the size of operators grows faster or slower than the number of qudits.Note that we saw already a manifestation of this competition in the discussion in Section 4; here we explain in more detail the issues.
From a given random circuit layer, we expect operators to grow by a factor of g D provided they are not close to maximum weight.If they are close to maximum weight, then they will grow by a reduced factor.We must compare this operator growth to the rate of qudit increase.The ratio R ℓ between the number of qudits in successive layers is which monotonically increases with ℓ.As logical operators evolve from layer to layer into the UV, the relative weight of the operator either increases or decreases depending on whether g D > R ℓ or g D < R ℓ .The dynamics of this process, iterated over all L layers, gives an estimate for the size of non-trivial logical operators.

Warmup: Small Fixed k
To illustrate the key competition, consider first the case in which k is small and fixed.In this case, the ratio R ℓ → r as ℓ increases, so most of the evolution corresponds to a fixed ratio of r.In terms of the parameters above, we can achieve this regime by taking b large at fixed a. Suppose g D > r.Then operator growth is the fastest process and logical operators will reach saturation.In this case, we expect the distance to be linear in N .It will not exactly saturate the singleton bound, but it may come close for large D. Now suppose g D < r.In this case, we are adding qudits faster than operators can grow, so the logical operators are ultimately supported on a dilute fraction of all the sites.Indeed, the size of a typical logical operator will be g DL , whereas the total number of qudits is N = r L (1 + r −b ) ≈ r L .Expressed in terms of N , the size of a typical logical operator is where c = ln g D ln r < 1.Hence, we expect a distance that scales as a sublinear function of N .

SYK-Like Scaling
Now we turn to the case where k = r a is large and b is fixed.Here, when ℓ is small, the R ℓ ratio is close to one and the number of qudits is barely increasing from layer to layer.In this regime, operator growth is completely dominant.In contrast, at the most UV layer, where ℓ = L = a + b, the ratio is Suppose g D > R L .Then operator growth always dominates over qudit growth.However, because the initial number of qudits (the ground state qudits) is large, we still have to compare the total operator size, g DL , to the total number of qudits, N = r L (1 + r −b ).We see again that if g D > r, then this naive estimate gives an operator weight larger than N , meaning that the operator growth actually saturated at something proportional to N .If g D < r, then we are again in the situation where g DL ∼ N c .Suppose g D < R L .Then there will be some layer ℓ * such that operator growth and qudit growth switch dominance as ℓ increases through ℓ * .We may approximately determine this crossover scale from noting that this ℓ * is not typically an integer.In the thermodynamic limit a → ∞, we must have ℓ * = a + b * for some constant b * since the ratio R ℓ is essentially unity until r ℓ is comparable to k.Now between ℓ = 1 and ℓ = ℓ * , logical operators will grow faster than the number of qudits.Assuming they don't reach saturation, they will grow by roughly a factor of g Dℓ * .By contrast, the number of qudits at layer ℓ * is so the ratio of operator size to number of qudits is This ratio vanishes as a → ∞ since we are assuming that g D < R L and R L < r.Hence, g Dℓ * ∼ (n ℓ * ) c as above.
There are a fixed number of layers from ℓ * to L since b and b * are fixed as a → ∞.Therefore operators and the number of qudits grow by an additional factor independent of N from ℓ * to L. Hence, the scaling of g DL with N is the same as the scaling of g Dℓ * with n ℓ * , that is g DL ∼ N c .

Stabilizer Weights
We expect that the stabilizer weights will display a similar pattern as in Figure 10.In particular, a non-zero fraction of all the stabilizers will have constant weight.These arise from the UV most layer.Then as we descend in the network towards the IR, there are fewer stabilizers but of increasing weight.In particular, there are at least a few stabilizers of very high weight, similar to the weight of logical operators.

Comparison to SYK
We now compare features of the SYK-inspired code to those of the actual SYK model.To be precise, we will compare a particular realization of the SYK Hamiltonian (with q = 4), H SYK , with a particular realization of the toy code Hamiltonian, H code , for the SYK-inspired code (see Section 2).(It is also interesting to consider supersymmetric generalizations [60].)• Low-temperature thermodynamics: The SYK model has a low temperature heat capacity proportional to temperature T .Similarly, the parameters of H code can be chosen so that its low temperature heat capacity is proportional to T .
• Fine-tained spectrum: The fine-grained energy spectrum of H SYK is random-matrixlike [61,62].The fine-grained energy spectrum of H code is not random-matrix-like because H code is a commuting projector Hamiltonian.
• Entanglement: Both models feature Hamiltonian eigenstates with volume-law entanglement.The entanglement spectrum will, however, be quite different between the two kinds of states.In particular, eigenstates of H code , being stabilizer states, have a flat entanglement spectrum.
• Complexity: We only have estimates here.Using the duality to JT gravity and holographic complexity/geometry conjectures, the circuit complexity of the SYK approximate ground states is estimated to be O(N ln N ).We have an explicit estimate (and upper bound) of O(N ln N ) for circuit complexity of the ground space of H code .
The many similarities between H SYK and H code are the basis for our conjecture that the architecture in Figure 1 has the potential to describe the physics of the SYK model once the tensors in the network have been adapted to a particular SYK instance, for example, using a variational approach.However, there are also crucial differences between the two.Two that stand out are the different scalings of the weights of Hamiltonian terms with system size and the exact versus approximate nature of the ground state degeneracy.The fine-grained energy spectrum is also very different in the two cases.Thus, it will be informative in the future to explore our network architecture as a variational ansatz for the SYK ground space.

SYK Ground Space as an Approximate Code
Here we want to comment on another possibility raised by the similarities above.For H code , we have seen explicitly that the ground space can be viewed as an error correcting code with constant relative distance and constant rate (provided D is big enough).In particular, it is an exact stabilizer code.This naturally raises the possibility that the approximate ground space of the SYK model could have interesting properties as an approximate quantum error correction code. 8hus we consider a code defined by the full approximate ground space of some particular H SYK realization.By construction this code has a constant rate as N → ∞ which is given by ground state entropy density s 0 .This code is not a stabilizer code, but it does have a sort of "low weight" definition via the SYK Hamiltonian.
What is not immediately clear is the distance of this code.Moreover, since the code is approximate, we must specify precisely what we mean by the distance.We will defer a full discussion to a future work, but here let us note that if the architecture in Figure 1 does indeed provide a good approximation to the ground space of the SYK realization, then the same kind of scaling analysis discussed above for the random Clifford code would also provide an estimate for the operator size of logical operators.
In this case, it would be important to understand the analog of r and g D in the SYK case.As one approach, we could fix r = 2 and then adjust the layer circuits so that we get a good approximation to the ground space.The parameter g D would then be determined by the properties of these circuit.A simple random operator growth model may be too crude to capture the detailed physics, but continuing with this estimate for now, if the resulting g D were greater than r, then we have logical operators of weight proportional to N and potentially distance proportional to N .Alternatively, if g D < r, then the distance could be some power of N , N c .
It would be interesting to understand which of two cases is realized; this should be related to the spectrum of the scaling dimensions in the theory since these are related to the mixing properties of the scaling superoperator [22].Given the relatively low scaling dimension of the fermion operators, it may be that one is effectively in the g D < r regime.

Summary
We gave analytical estimates of the distance for a family of SYK-inspired codes in the thermodynamic limit of many qudits.This code family shares a number of similarities with known properties of the actual SYK model, although there are crucial differences as well.Viewing the approximate ground space of SYK as an approximate quantum code, the analysis of the SYK-inspired model suggests that the actual SYK ground space code, which has constant rate as N → ∞, could have a distance N c for some constant 0 < c ≤ 1.

Generalizations of the Basic Architecture
We presented one simple architecture (Figure 1) which was motivated by the entropy and complexity of mean-field quantum models, especially the SYK model.The particular scaling-inspired ansatz with k = r a and N = r a + r a+b is one instance of that architecture, but one could well imagine other choices.
Moreover, inspired by branching MERA [23] and s-sourcery [7], one can consider other architectures in which the added thermal degrees of freedom are not just in a product state.As a basic example, consider the following structure.Take the encoding circuits for two [[n, k, δ]] codes and mix their physical qubits using an additional depth D quantum circuit.The result is a code on n ′ = 2n qubits with k ′ = 2k.Hence, the rate is the same, k ′ /n ′ = k/n.The distance will also increase by a factor with some probability, as will the weights of the stabilizers.Starting with a root [[n 0 , k 0 , δ 0 ]] code, L layers of this construction produces a code with parameters [[2 L n 0 , 2 L k 0 , δ]] with δ some L-dependent distance.
In the above construction, the rate of the final code is determined by the rate of the root code.We could also vary the rate by introducing additional product qubits in each iteration of the process (analogous to the thermal qubits above) or by combining codes with different rates at each iteration.
In all these constructions, the distance is also expected to grow exponentially with L, the number of layers.However, the weight of the checks will also generically grow when we use random depth-D circuits.From this perspective, the challenge of producing a good quantum LPDC code is the challenge of keeping the weights of checks low while keeping the distance high.This clearly requires tuning of the layer circuits, likely made possible by the addition of some structure to the problem.In light of the recent rapid progress in the area of good quantum LDPC codes, it would be interesting to understand if our architecture can capture these recently discovered codes.

Further Weight Reductions
One direction we intend to explore in the future is finding alternative bases of given generated stabilizer code that minimize the overall weights.In the exact case, such a basis is unique and given by the reduced-row echelon form (RREF) of the stabilizer matrix i.e. the matrix with all stabilizer basis elements as row vectors.The RREF for a general matrix is defined in terms of the following rules: 1.All rows consisting of only zeroes are at the bottom.
2. The leading entry (i.e. the left-most non-zero entry) of every non-zero row is a 1, and to the right of the leading entry of every row above.
3. Each column containing a leading 1 has zeros in all its other entries.
In the case of a stabilizer matrix the first rule can be ignored since the matrix has maximum rank.The remaining two requirements can be easily met by applying a Gaussian elimination algorithm.An example of a possible resulting RREF is given by From this it should be intuitively clear that the RREF maximizes the number of zero-valued elements in the matrix, thus minimizing the weights of the stabilizer basis elements.
Another possible method of weight reduction could be finding a set of stabilizers that approximately replicates our model, but whose basis has low weights.A potential way to achieve this is to use so-called perturbative gadgets [64].Considering the Hamiltonian representation (7) of a given stabilizer code [[n, k, δ]], each term in the sum acts on a number of qudits equivalent to its weight.Let w be the largest weight of all terms in the Hamiltonian, then we speak of a w-local Hamiltonian.Using wth-order perturbation theory it can then be shown that there must exist a 2-local Hamiltonian that approximately has the same ground space i.e. the desired quantum error-correcting code.Said Hamiltonian is called the gadget Hamiltonian and its construction involves introducing w • (n − k) ancillary qudits, where n−k is the number of terms in the original Hamiltonian.The approximate ground space Hamiltonian can then be recovered by blockdiagonalizing the gadget Hamiltonian and only considering the entries with unit eigenvalue in the ancillary space.It should be noted that the terms of this resulting Hamiltonian might not necessarily commute anymore, but we expect the ground state degeneracy and code properties to be unaffected.In future work we intend to explore both approaches for weight reduction in the context of our tensor network ansatz and compare them to the baseline considerations made in this paper.

Towards a Closer Link With SYK
It is also interesting to move towards closer contact with SYK.The first step is to develop a fermionic analogue of our architecture.Then, because the network is not efficiently contractible in general on a classical computer, it is interesting to pursue a quantum simulation strategy where we treat our architectecture as a variational ansatz.The variational parameters would be the gates within each circuit layer as well as the discrete data of the network, e.g. the number of qubits at each layer.It would also be important to understand and adapt the construction to some of the details of SYK, e.g. the specific expected form of the ground state degeneracy.
A related setting where we should be able to carry out classical simulations is the SYK model for q = 2, i.e. a non-interacting fermion model with random all-to-all hoppings.In this case, we can efficiently simulate the network using non-interacting fermion machinery.There is no ground state degeneracy for the random hopping model, so k = 0 in this case, but one could still test other properties of the network.We are currently exploring this direction.
In the spirit of generalizing to fermionic models, it is also interesting to consider fermionic generalizations of the Clifford formalism, e.g. the subgroup of the full set of fermionic unitaries that maps strings of fermion operators to other strings of fermion operators.By developing methods to sample these transformations and to compute entropies of subsets of the fermions, one would be able to repeat the studies in this work in the language of fermionic codes [65].This is also a work in progress.

NoRA as an Ansatz for Approximating Mean-Field Ground States
In addition to the SYK model, there are numerous other mean-field models whose ground states we might try to model with a NoRA network.One large class consists of quantum spin glass models.Here we explain with an example why NoRA is a plausible ansatz for this class of models.
For concreteness, consider a quantum transverse field Sherrington-Kirkpatrick (TFSK) The couplings J ij are Gaussian random variables with zero mean and variance 1/N .This model is known to have a quantum phase transition between a non-glassy state at large g and a glassy state at small g.At g = ∞, the ground state is simply a product state σ x eigenstates, while at g = 0 the model reduces to the classical Sherrington-Kirkpatrick spin glass.At g = ∞, the ground state is a product state and hence trivially a NORA network with zero layers.The energy gap between the ground state and the first excited state is non-vanishing at large N .At large but finite g, the model is still in the nonglassy phase and the gap between the ground state and the first excited state remains non-vanishing at large N .Because of the non-vanishing gap, we can presumably approximately prepare the ground state at large but finite g using an adiabatic evolution for a constant time proportional to 1/g (since the gap is proportional to g).This is analogous to one layer of the NORA network.Hence, it is plausibly the case that a NoRA with one layer could capture the ground state of the TFSK using a NoRA network (in fact, one which requires just one layer as opposed to ln N layers).
Similarly, at g = 0 the model reduces to the classical Sherrington-Kirkpatrick spin glass and the ground state is a product state.Deep in the glassy phase at small but non-zero g, the ground state is also caricatured by a product state and it is also plausible that a single-layer NoRA can capture the ground state.
Finally, at the critical point separating the two phases, it is less clear whether NoRA is suitable or not, but there is still a scaling symmetry present in the model and so NoRA wth the scaling ansatz and k = 0 is still a reasonable candidate for describing the ground state wavefunction.
We emphasize that these are plausibility arguments.We hope to carry out a more systematic study of this direction in future work.These plausibility arguments can also be adapted to a variety of other mean-field models.For example, other models with gapless critical points or critical phases may have a NORA-like description which requires the full layer structure with ln N layers.We have also conjectured that general quantum LDPC codes may have a NORA-like description and in such cases presumably the full layer structure is also needed, i.e. a finite depth circuit, even without geometric locality constraints, would not be suf-ficient to capture the ground space.These are all interesting directions for future work, especially the question of what determines the required number of layers in a NORA network.

Connections With Holographic Models
Finally, let us comment further on the connection to low-dimensional models of quantum gravity.We already made use of these connections as part of the motivation for our ansatz, in particular, we checked the complexity of our network against holographic estimates of complexity.
The basic point is that our architecture mimics the structure of two-dimensional Jackiw-Teitelboim (JT) gravity in AdS2.The analogue of the area formula for black hole entropy is the statement that the entropy of black hole is equal to the value of a scalar field called the dilaton evaluated at the event horizon (technically, at the bifurcation point), see e.g.[66].This in turn leads to a low-dimensional version of the Ryu-Takayanagi formula for entanglement (for a review see [67]), also in terms of the dilaton field.
After solving the equations of motion, one finds a solution in which the metric is and the dilaton is The number of degrees of freedom at a scale determined by z is proportional to where ϕ 0 is some background value.Comparing to our network, we interpret ln z as analogous to L − ℓ, which increases from zero as we go from the UV (top of Fig. 1) to the IR (bottom of Fig. 1) .The number of qudits at "layer z" would be k + (N − k)e − ln z .Then, since ϕ 0 + ϕ(z) also has the we could interpret ϕ 0 4G N as analogous to k and ϕ 1 4G N as analogous to N − k.It is also instructive to compare the tensor network model we just described with a less structured model.Let us step back and start by just supposing that the tensor network is a model of the spatial geometry.For two or more spatial dimensions in the bulk, this can give an interesting network structure.For example, the HaPPY code and its associated network yields a discretization of the two-dimensional hyperbolic disk.However, in one dimension, there is no interesting geometry and the analogue of the HaPPY network would be a simple one-dimensional network, essentially a matrix product state.We could identify the direction along the network with ln z and let the bond dimension of the network depend on z such that ln χ(z) ∝ ϕ 0 + ϕ(z).But apart from these choices, the model is unstructured and nothing yet has been said about the structure of the tensors in the matrix product state.Within this framing, we can view the NoRA network model as a refinement of the matrix product state model in which we endow the tensors with the layer structure of the NoRA network as in Figure 13.
The coefficients J i can be arbitrarily chosen and determine the energy scales of the system, but since they are necessarily positive-definite, this do not affect the space of ground states i.e. the space of valid physical qudit states.Excitations away from a ground states then correspond to errors being present in the state, which is because of the one-to-one relation between projectors and stabilizers.

A.2 General Thermodynamic Quantities
Using the Hamiltonian derived in the previous section, we can now compute the associated Gibbs state and some of its properties, including the entropy.First, it is straightforward to show that where in the last line we used a generalization of the binomial theorem and the fact that the projection operators commute by definition.Computing the partition function Z using the final expression in (46) can be done in the following way: x (47) Note that in the second line we used the definition (42) for the projection operators, which implies that tr[ given that none of the indices i a coincide.Going from the penultimate line to the last one we then again applied the generalized binomial theorem.
where n ℓ is the number of stabilizer basis elements with the same associated energy level: with 1 ≤ ℓ ≤ L and r L = N − k.It is easy to see that this distribution therefore does indeed satisfy The other assumption we are making is that the distribution of energies J ℓ increases exponentially with increasing ℓ, giving it the form of for some UV energy scale Λ > 0 and rate of increase γ > 0. This is an artificial but reasonable choice because we want the circuit to obey renormalization invariance while going from the IR to UV limit in the same was as MERA networks generally do.

A.3.1 Moving to the Continuum Limit
To determine the scaling of the entropy close to the zero temperature (i.e.β → ∞) limit, it is useful to consider the continuum limit of (55) in addition to the other assumptions we made.The stabilizer difference ∆n ℓ therefore becomes the stabilizer density where α > 0 can be chosen arbitrarily 9 and ρ 0 is fixed by the density having to satisfy Because the distribution of the energy levels (57) can be left untouched when moving to the continuum limit, the stabilizer entropy can be naively approximated as with p(ℓ) being of the same form as p ℓ in (53), but now considered as a continuous function of ℓ.But to make the upcoming calculations easier, we perform a change of variables, integrating over J = J(ℓ) instead of ℓ.To do that, we first note that from (57) it follows that and hence This also allows us to express the stabilizer density as a function dependent on J: Finally, the continuous entropy as an integral over J is Note that the lower integration bound acts as an effective IR cutoff for the integral.This is necessary for us to be able to make the following approximations..

A.3.2 Low-Temperature Limit
Computing the integral in (64) is in general hard, but since we are only interested in the limit of small T /J (or equivalently large βJ), we can approximate the binary entropy S(p(J)) that occurs in the integral as which is straightforward to prove.To realize this limit it is necessary to choose the right parameters since it follows from (57 and hence βΛ • e −γL ≫ 1 ⇐⇒ γL ≪ log(βΛ).( 67) Plugging (65) into (64) and noting that ⟨N − k⟩ = i p i = 0 in that limit then leaves us with an expression that can be further simplified using a change of variables: Let's consider the trailing integral.Up to the integration bounds it is the same as the gamma function Γ(α/γ + 1), whose integrand is positive everywhere.We can therefore get an upper bound for S cont (that we also expect to be approximately saturated for certain domains of βΛ) by substituting the "incomplete" gamma function with the proper one.Thus we have which only scales with (βΛ) −α/γ = (T /Λ) α/γ , indicating that the entropy could indeed follow a power law, at least for certain low-temperature regimes.To show how well both continuous approximations hold up against the discrete stabilizer entropy with equivalent parameters (N − k = r L , α = log(r)), we display both in logarithmic plots over log(T /Λ) and with different choices of γ, which is the only significant free parameter.These plots are depicted in figure 14 and indeed confirm that our low-temperature approximations are good at predicting aspects of the actual entropy, including its power-law growth.In the first two figures it can be seen that our continuous approximation from (68) matches almost exactly with the discrete stabilizer entropy for γ ≪ 1 and small T /Λ.Even though the second figure shows less behavior than the first one, we expect that it will behave similarly for even lower relative temperatures.While the last two approximations with γ ≥ 0 also receive their primary contribution from the polynomial term, it is more apparent that they don't completely align with the actual data anymore.Especially in the last figure where γ = 2 the trend of the stabilizer entropy is not strictly polynomial anymore.Still, each figure has at least a regime where its growth is either exactly polynomial or follows a polynomial trend that aligns with our theoretical predictions up to a total constant factor.

B Estimating the Layer Growth Factor
Given a generic string of n generalized Pauli operators with local dimension d and initial weight w 0 ≪ n, we can estimate the relative weight growth g the string experiences from one layer of n/q random Cliffords being applied to random disjoint substrings of length q.The weight w k at the kth layer can therefore be estimated as

B.1 Single Layer
To find g it is helpful to look at a single substring of length q being scrambled by a single random Clifford.
In that case, as long as a substring's weight w k−1 (q) is not zero, we can expect its weight in the next layer to be on average regardless of how the initial string looked 10 .To extend this argument to the whole Pauli string we can therefore distinguish between two extreme cases: • All w k−1 non-trivial Pauli operators are contained in as few substrings as possible, namely ⌈w k−1 /q⌉.Since each such substring will on average have the weight (71) after the Clifford layer, we find that • Each (non-trivial) substring contains exactly one non-trivial Pauli operator, meaning that in the next layer we have w k−1 substrings each having the average weight (71).The total weight is therefore Hence we can provide approximate upper and lower bounds for g by However, for our purposes we will always have w k−1 ≪ n (see next section), which makes it more likely for g to be closer to the upper bound.Hence we can assume that

B.2 Multiple Layers
Usually the scrambling circuits will be composed of more than one layer of random q-party Clifford gates.Therefore, given that we start with w 0 ≪ n and keep g fixed to be (75), what is the approximate maximum depth D for which w D = g D • w 0 gives a good estimate for the total operator weight at the end?Due to our previous arguments, we can expect the approximation to not hold anymore by the time w D is of order n since then the case of (72) will dominate.In this case we say that the weight is saturated, and we can estimate the order of magnitude of the saturation depth D sat by requiring that g Dsat • w 0 ⪅ n, leading to: A tighter bound can also be achieved by using log q instead of log g .Both options are shown for a specific simulated example in Figure 15.

C Phase Space Formalism C.1 Weyl Representation
Given a Hilbert space H of prime dimension d > 2 11 , we choose a basis {|0⟩ , |1⟩ , . . ., |d − 1⟩} with its states being labeled by the elements of the associated finite (Galois) field GF(d) ≡ F d12 .One can then introduce clock and shift operators Z, X which act on the basis states according to [68] 10 Remember that the generalized Pauli group of dimension d has d 2 different elements, up to phases.Of those only the identity has zero weight. 11The case of d = 2 is excluded here since our choice of representation requires the existence of a 2-element in the group such that 1  2 ≡ d+1 2 is also a group element, which is not true for d = 2 (i.e. the field cannot have characteristic 2).Depicted are also the estimates of the effect growth factor g (75) and the saturation depth(s) D sat (76) for which it can be considered to hold.While (76) indeed provides a good maximum circuit depth for which the data and our estimate for g = 16/9 approximately coincide, a tighter bound can be achieved by instead using q = 2 as the base of the logarithm.
where p, q, k ∈ F d and χ(k) = e 2πik/d .Note that addition and multiplication happens over F d and is thus mod d.This is also respected by our choice for χ(k) since χ(k + d) = χ(k) even for addition without modulo.
We are now able to define the so-called Weyl operators for a single qudit, which provide a generalisation of the Pauli operators on a qubit: Extending this definition to n qudits is as easy as tensoring n copies of (78), which we write as w(v) = w(p 1 , q 1 , . . ., p n , q n ) = w(p 1 , q 1 ) ⊗ . . .⊗ w(p n , q n ).( 79) Each Weyl operator is therefore uniquely represented by an element v of a 2n-dimensional vector space V over the field F d .Using the commutation relations of Z p and X q that arise from their definition in (77), it also follows that where ⟨•, •⟩ is the symplectic product on V , which obeys ⟨v, w⟩ = − ⟨w, v⟩ and can be expressed as a matrix product: ⟨v, w⟩ = v T Jw, J = 0 1 Because of that the Weyl operators form a projective representation of the associated vector space V equipped with a symplectic product.It is also noteworthy that (80) implies that two Weyl operators w(v), w(w) commute if and only if the corresponding symplectic product ⟨v, w⟩ vanishes.
Another useful identity which we will use later is the fact that only the identity I n = w(0) has a non-vanishing trace: tr[w(v)] = d n δ v,0 . (82) This is trivial to show for X q but requires using the fact that the Kronecker delta can be written as to prove it for Z p as well.

C.2 The Clifford Group
The Clifford group is a subset of the unitary group which maps Weyl operators to other Weyl operators (up to a factor): for some c : V → C and S : V → V .Because S therefore has to be compatible with (80), it is easy to see that it has to be linear and preserve the symplectic product: In matrix representation, one can also equivalently state this property as S T JS = J.Such a function is called symplectic.The set of all symplectic functions for a given vector space V forms the so-called symplectic group Sp(2n, F d ) 13 .In general, the structure of the Clifford group is completely determined by the following statements: for some phase ϕ.
3. Up to a phase, any Clifford operator is of the form for a suitable a ∈ V and symplectic S.
A proof of these statements can be found in [68].Note that this also fixes the factor from (84) to be c(v) = χ(⟨a, Sv⟩).

C.3 Stabilizer States and Codes
As mentioned before, a vanishing symplectic product ⟨v, w⟩ is equivalent to a vanishing commutator [w(v), w(w)].One can therefore construct a set containing only commuting Weyl operators by choosing M to be a subspace of V satisfying ⟨m i , m j ⟩ = 0 ∀ m i , m j ∈ M (90) Such a subspace is called isotropic and it is easy to see that it also forms a group under vector addition since the symplectic product is bilinear.The cardinality of isotropic subspaces can range between 0 and d n as there are at most n elements with mutually vanishing symplectic product in a 2n-dimensional symplectic basis (see footnote 13 for the reason).We will refer to M having maximal cardinality as maximally isotropic.
In general it is convenient to write the basis elements of an isotropic subspace as a k × 2n (or 2n × k) matrix over F d , where k = log d (M ) is the size of the basis.In the literature this is called the stabilizer matrix, although there it is often written in terms of the actual Pauli/Weyl operators and not their symplectic representation.
Isotropy of M allows one to (at least partially) diagonalize the Weyl operators contained in w(M ), even completely if M is maximally isotropic.In the latter case it is therefore possible to define a unique quantum state |M, v⟩ in terms of the elements in w(M ) acting on it as stabilizers: The vector v ∈ V therefore determines the phase differences between the eigenstates assocated with w(M ).
A state satisfying (91) is called a stabilizer state and can be written as It is easy to show that (92) is a projection operator and has unit trace by applying (82) and using the fact that M is a group and thus satisfies M + m = M for all m ∈ M .In fact, even for a non-maximally isotropic subspace M would (92) still be a projector (up to normalization), but not a quantum state anymore.In this more general case we have with tr[Π(M, v)] = d n |M | .All states in the subspace which Π(M, v) projects onto therefore satisfy (91), meaning that they form a code space.We can therefore identify this case as being a stabilizer code since it satisfies the definition in section 3.3.Even though finding stabilizer codes therefore just amounts to making a choice for M and v, it does not ensure that the resulting code is good in the sense that its Hamming distance might be small or does not scale well.

C.4 Entanglement Entropy of Stabilizer States
Thanks to the structure of the symplectic product (81) and the multi-particle Weyl operators defined in (79), one can easily take the partial trace of (92) over a desired subsystem B by writing v = v A ⊕ v B (same for m) and w(m) = w(m A ) ⊗ w(m B ) for all m ∈ M and applying (82) to the latter term in the tensor product.The resulting reduced state is then where we made use of the fact that n = n A + n B and identified (93), but this time in terms of v A and This is possible since the definition of M A ensures that it is again a group (although not necessarily maximally isotropic) 14 .
The fact that even after tracing out a subsystem the resulting reduced state is still proportional to a projection operator makes computing the entanglement entropy straightforward.While it is possible to just directly evaluate the Von-Neumann entropy S(A) = tr[ρ A log d (ρ A )], a more elegant and insightful approach can be made by instead considering the Rényi entropies which have the property that where S(A) = S (1) (A) = lim n→0 S (n) (A) reproduces the ordinary Von-Neumann entropy.What makes the Rényi entropies interesting here is that they satisfy S (n) (A) = log d (rank ρ A ) for all n > 0 if the state being considered has a flat entanglement spectrum i.e. it is proportional to a projection operator 15 .Since this is the case for the reduced stabilizer state we can use the fact that rank ρ A = d 14 Naively computing MA using (95) is not efficient as such an algorithm would have O(d n ) runtime.A runtime that is polynomial in the system size can be achieved by instead permuting the sites that are to be traced out to the front the stabilizer matrix and then computing its reduced row echolon form.The basis vectors b = bA ⊕ bB for which bB ̸ = 0 are then removed and for the remaining elements only bA is being considered. 15The proof is straightforward: Let ρA = α • ΠA, then S (n) (A) =

Figure 2 :
Figure 2: Logarithmic scaling of the exact Gibbs entropy S stab associated to H, and the low-temperature approximation S approx for L = 20, r = 2, k = 1, d = 2, Λ = 1 and γ = 0.4.Both match almost exactly for our choice of parameters and small T /Λ, confirming the existence of a scaling law.The same is also true for other choices of γ (the only significant free parameter), as seen in figure14.

Figure 4 :
Figure 4: Circuit representation of state in which the code space is maximally entangled with a reference R.Here U M N is a unitary that takes states of the form |ψ anc ⟩ M ⊗ |ψ code ⟩ N and maps |ψ code ⟩ to the code space of the chosen stabilizer code.|ψ anc ⟩ is the all 0 state of the ancillary qubits.If a region A has zero mutual information with R, then it has no access to the encoded information.The code distance δ is the biggest integer such that all regions

Figure 5
depicts (approximately) that behavior for k = 2, L = 6 and D > 1.The case of D = 1 is also shown, but the corresponding average entropy fails to scale linearly with |A| at all.

Figure 5
Figure 5  also shows that once the size of A increases beyond N/2, the entropy shrinks again.This is due to the complementary subregion A now being smaller than A, and both regions sharing the same entanglement spectrum.Overall the entanglement entropy S(A) therefore follows a symmetric Page curve, as expected.

Figure 5 :
Figure 5: Average entanglement entropy S(A) with regard to the subregion size |A|, corresponding to a NoRAprepared stabilizer state (k = 2, L = 6) with different layer circuit depths D. Volume-law scaling of S(A) for small |A| is already possible when D = 2, but bigger subregions (up to a size of N/2 = 33) also exhibit a volume scaling when D is larger.Once |A| is larger than N/2, the entanglement entropy decreases again for any choice of D, forming a symmetric Page curve.

Figure 6 :
Figure6: The average code distance δ with regard to the circuit depth D of the NoRA network with a fixed ground space (k = 2, L = 7).Due to the approximately exponential trend of the data, the average distance is already close to its saturated maximum of δ max = 64.43 ± 0.09 if the layer circuits have a depth of D = 3. Larger depths therefore provide little to no improvement.However, the maximum average distance is still several standard deviations less than the theoretical maximum provided by the quantum singleton bound δ qsb , though we expect them to be somewhat closer (but not necessarily equal) at large D and L.

5 Figure 7 :
Figure 7: Average relative code distance δ/N with regard to the system size N = 2 + 2 L and its inverse 1/N , with D = 4.Even though the depicted data trends are only approximate, they do suggest approximate convergence of the relative distance towards the (relative) quantum singleton bound δ qsb /N = 0.5 for larger N .

Figure 8 :
Figure 8: Change of the code distance with regard to increasing ground space size k, and keeping the number of layers fixed to be L = 6 with circuit depth D = 3.The approximately linear decrease of the distance is indicative of the tensor network being able to create volume-law entanglement.

Figure 9 :
Figure 9: (Logarithmic) relative difference between the actual weights of a single Pauli string and their expected maximum of N • (d 2 − 1)/d 2 for the tensor network ansatz with a variable number of layers L and depths D (k = 2).Already for a circuit depth of D = 2 does the relative difference seemingly converge towards a constant value, with larger depths resulting in even faster convergence and smaller differences.This shows that the relative weight (and therefore the ideal circuit depth to maximize the code distance) does only depend strongly on d, q and r, not L.

Figure 10 :
Figure 10: Relative stabilizer weight distributions at each layer of the tensor network ansatz with L = 6 total layers.The green distributions correspond to circuit depth D = 1, the orange ones to D = 2 and the blue ones to D = 3.The dashed lines indicate where the expected maximum weight averages w max ℓ are located for each layer.Note that as expected the distribution corresponding to D = 1 does not converge towards saturation due to the rate r of new qudits being added outweighing the scrambling rate g D of the circuits at each layer.

Figure 11 :
Figure11: Average relative code distance δ/N with regard to the inverse system size 1/N for the tensor network ansatz with ground space scaling.Both the quantum singleton bound (in grey) and the approximate trend of the generated data (in orange) could coincide in the limit of N → ∞ (a → ∞), where we have δ/N = 1/3.However more data is needed to be able to prove this.Overall, better relative distances can only be achieved by increasing b at the cost of reducing the rate.

Figure 12 :
Figure 12: Relative stabilizer weight distributions of the tensor network ansatz with scaling ground space and different choices of a.The green distributions correspond to circuit depth D = 1, the orange ones to D = 2 and the blue ones to D = 3.These distribution are not dissimilar to what we already encountered in 10 for the individual layers of a single circuit, highlighting the self-similarity of our ansatz.

Figure 13 :
Figure13: Schematic of a matrix product state as simple toy model of a spatial slice of a 1 + 1d holographic geometry.The orange upward lines can be viewed as bulk degrees of freedom and are analogous to the |0⟩ input states in NoRA (see Figure1).The shrinking number of blue lines as one proceeds into the bulk provides a schematic version of the decrease of degrees of freedom discussed just above.The NoRA network can be viewed as a refinement of this simple matrix product state model in which the tensors of the matrix product state have additional substructure.Here the orientation of the diagram is rotated 90 degrees relative to Figure1.

Figure 14 :
Figure 14: Logarithmic scaling of exact stabilizer entropies S stab and their continuous approximations S cont (with and without the gamma function correction) for L = 20, N − k = r L , k = 1, d = 2, r = 2, α = log(r), Λ = 1 and γ ∈ {0.1, 0.4, 1, 3}.In the first two figures it can be seen that our continuous approximation from (68) matches almost exactly with the discrete stabilizer entropy for γ ≪ 1 and small T /Λ.Even though the second figure shows less behavior than the first one, we expect that it will behave similarly for even lower relative temperatures.While the last two approximations with γ ≥ 0 also receive their primary contribution from the polynomial term, it is more apparent that they don't completely align with the actual data anymore.Especially in the last figure where γ = 2 the trend of the stabilizer entropy is not strictly polynomial anymore.Still, each figure has at least a regime where its growth is either exactly polynomial or follows a polynomial trend that aligns with our theoretical predictions up to a total constant factor.

9 DFigure 15 :
Figure 15: Averaged relative weight growth w D /w D−1 of a single Pauli string (d = 3, n = 128, w 0 = 1) subjected to a random 2-local Clifford with increasing circuit depth D (1000 repetitions).Depicted are also the estimates of the effect growth factor g (75) and the saturation depth(s) D sat (76) for which it can be considered to hold.While (76) indeed provides a good maximum circuit depth for which the data and our estimate for g = 16/9 approximately coincide, a tighter bound can be achieved by instead using q = 2 as the base of the logarithm.

1 .
For any symplectic S there is a unitary operator µ(S) satisfyingµ(S)w(v)µ(S) † = w(Sv) ∀ v ∈ V. (86)2.µ(S) is a projective representation of the symplectic group, meaning µ(S)µ(T ) = e iϕ µ(ST ) (87) n A |M A | to show that S(A) = n A − log d |M A |. (98)If the number of basis vectors kA = log d |M A | is known, then computing S(A) = n A −k A is straightforwardand numerically stable16 .