Classical shadows based on locally-entangled measurements

We study classical shadows protocols based on randomized measurements in $n$-qubit entangled bases, generalizing the random Pauli measurement protocol ($n = 1$). We show that entangled measurements ($n\geq 2$) enable nontrivial and potentially advantageous trade-offs in the sample complexity of learning Pauli expectation values. This is sharply illustrated by shadows based on two-qubit Bell measurements: the scaling of sample complexity with Pauli weight $k$ improves quadratically (from $\sim 3^k$ down to $\sim 3^{k/2}$) for many operators, while others become impossible to learn. Tuning the amount of entanglement in the measurement bases defines a family of protocols that interpolate between Pauli and Bell shadows, retaining some of the benefits of both. For large $n$, we show that randomized measurements in $n$-qubit GHZ bases further improve the best scaling to $\sim (3/2)^k$, albeit on an increasingly restricted set of operators. Despite their simplicity and lower hardware requirements, these protocols can match or outperform recently-introduced "shallow shadows" in some practically-relevant Pauli estimation tasks.


Introduction
Classical shadows are a powerful method to learn many properties of unknown quantum states with a relatively low number of measurements [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]. This is an important task in light of the advent of programmable quantum simulators capable of preparing increasingly complex quantum states, whose experimental characterization and classical description may be challenging [18,19,20,21]. Classical shadows are based on randomized measurements [2,22,23]: the unknown state of interest is measured in a large number of different bases, randomly chosen from a suitable ensemble, and the resulting classical data is stored and processed to predict properties of the state.
Different choices for the ensemble of random unitary rotations yield different flavors of classical shadows, each of which may be best suited to the prediction of different properties [1]. Arguably the most practically-relevant example is the random Pauli ensemble, where each qubit is measured in a randomly-chosen X, Y or Z basis; this requires only single-qubit random rotations on the hardware and is well suited to learning, e.g., the expectation values of $k$-local operators. Another important example is the random Clifford ensemble, where the basis is randomized by a global Clifford operation; this allows efficient estimation of fidelities and low-rank operators. Intermediate schemes dubbed shallow shadows have been recently introduced [24,25,26,27]. These randomize the basis by means of variable-depth circuits, thus interpolating between random Pauli and random Clifford measurements.
In this work, we introduce a family of protocols that interpolates between locally- and globally-random measurements in a different way, by tuning the size of the subsystems over which the measurement bases are allowed to be entangled. The protocols are hardware-efficient, requiring only few-body entanglement, and the classical post-processing is likewise simple. Nonetheless, they can outperform random Pauli shadows and even shallow shadows in some Pauli estimation tasks of practical interest, making them a useful addition to the randomized measurement toolbox.

Review
In general, given an ensemble of random unitaries, the classical shadows protocol is as follows. A quantum state of interest $\rho$ on $N$ qubits is transformed under a unitary $U$ drawn from the ensemble. It is then measured in the computational basis, yielding a bitstring $b$. The pairs $\{(U, b)\}$ of basis choice and measurement outcome represent classical data that can be used to efficiently construct a compressed description of the state. Namely, one builds "snapshots" $\sigma = U^\dagger |b\rangle\langle b| U$, which in expectation are related to the state of interest $\rho$ by a channel $\mathcal{M}$, called the shadow channel: $\mathbb{E}[\sigma] = \mathcal{M}(\rho)$. From this, one defines "inverted snapshots" $\hat\rho = \mathcal{M}^{-1}(\sigma)$, which by construction yield $\rho$ in expectation [1]. The set of inverted snapshots $\{\hat\rho_i\}_{i=1}^K$ (obtained over $K$ iterations of the quantum experiment) is a classical shadow of $\rho$ of size $K$.
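The practical usefulness of classical shadows depends on the sample complexity of various estimation tasks, i.e., how many experimental shots are needed in order to predict a given property of $\rho$ to a fixed (additive) accuracy $\epsilon$ with high probability. For predicting $M$ linear functions (i.e., expectation values of a set of operators $\{O_i\}_{i=1}^M$), this is bounded above by [1]
$$K = O\left( \frac{\ln(M)}{\epsilon^2}\, \max_{i} \|O_i\|_{\rm sh}^2 \right), \tag{1}$$
where $\|\cdot\|_{\rm sh}$ is a norm determined by the measurement protocol, called the shadow norm. For a Pauli operator $P$ in an $N$-qubit system, assuming Pauli invariance of the measurement ensemble [11], the shadow norm is given by
$$\|P\|_{\rm sh}^2 = 2^{-N}\, \mathrm{Tr}\left[ P\, \mathcal{M}^{-1}(P) \right] = \lambda_P^{-1}, \qquad \mathcal{M}(P) = \lambda_P P, \tag{2}$$
independent of the state $\rho$. Here the inverse map $\mathcal{M}^{-1}$ appears due to the construction of the inverted snapshots $\hat\rho = \mathcal{M}^{-1}(\sigma)$. For random Pauli measurements, the shadow channel factors into single-qubit depolarizing channels, $\mathcal{M} = \mathcal{E}_1^{\otimes N}$, with $\mathcal{E}_1(\sigma^\alpha) = \lambda_\alpha \sigma^\alpha$ for $\alpha \in \{0, x, y, z\}$ and eigenvalues
$$\lambda_\circ = 1, \qquad \lambda_\bullet = 1/3, \tag{3}$$
where $\circ$ denotes the identity ($\alpha = 0$) and $\bullet$ stands for any of the traceless Pauli matrices $\alpha \in \{x, y, z\}$. It follows that
$$\|P\|_{\rm sh}^2 = 3^k, \tag{4}$$
where $k$ is the weight of operator $P$, i.e., the number of qubits on which it acts nontrivially [1].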
A simple way to understand this scaling is that there are three possible basis choices (X, Y or Z) per site, sampled randomly in each experimental run. Only some basis choices are useful towards the estimation of a Pauli expectation value $\langle P\rangle$. In particular, only measurement bases that correctly match all nontrivial Pauli matrices in $P$ contribute to its estimation (in the language of Ref. [28], we say such measurements "hit" $P$). As only one in $3^k$ bases "hits" $P$, estimating $\langle P\rangle$ to accuracy $\epsilon$ requires of order $3^k/\epsilon^2$ iterations of the experiment.
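The counting argument can be made concrete with a small Monte Carlo sketch in Python. It is an illustration under stated assumptions, not the estimator code of Ref. [1]: it assumes a hypothetical product test state (one Bloch vector per qubit) so that each measured qubit can be sampled independently, and the function name `estimate_pauli_random_pauli` is ours. A shot whose random bases hit $P$ contributes $\prod_i 3 s_i$ (with $s_i = \pm 1$ the outcome on each site of the support), and any other shot contributes $0$; the factor 3 is the inverse of the depolarizing eigenvalue $1/3$.

```python
import numpy as np

AXES = {"X": 0, "Y": 1, "Z": 2}

def estimate_pauli_random_pauli(bloch, pauli, shots, rng):
    """Random-Pauli-shadow estimate of <P> on a product state (illustrative sketch).

    bloch: (n, 3) array of Bloch vectors (r_x, r_y, r_z); a hypothetical test state.
    pauli: string over {"I", "X", "Y", "Z"}, one letter per qubit.
    Returns (estimate of <P>, observed fraction of shots whose bases hit P).
    """
    n = len(pauli)
    support = [i for i, p in enumerate(pauli) if p != "I"]
    total, hits = 0.0, 0
    for _ in range(shots):
        bases = rng.integers(0, 3, size=n)  # uniformly random X/Y/Z basis on every qubit
        if any(bases[i] != AXES[pauli[i]] for i in support):
            continue  # no "hit": this snapshot contributes 0 to the estimate
        hits += 1
        value = 1.0
        for i in support:
            r = bloch[i, AXES[pauli[i]]]
            outcome = 1 if rng.random() < (1 + r) / 2 else -1  # Born rule for a +-1 outcome
            value *= 3 * outcome  # 3 = 1 / lambda_bullet undoes the single-qubit depolarization
        total += value
    return total / shots, hits / shots

state = np.array([[0.0, 0.0, 0.8], [0.6, 0.0, 0.0], [0.0, 0.0, 1.0]])
est, hit_rate = estimate_pauli_random_pauli(state, "ZXI", 100_000, np.random.default_rng(7))
```

For this weight-2 operator the hit rate should approach $3^{-2} = 1/9$, and the estimate should approach $0.8 \times 0.6 = 0.48$: in expectation the $3^k$ factor exactly compensates the $3^{-k}$ hit probability, while the variance grows as $3^k$.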
It is known on information-theoretic grounds [1] that the scaling of $3^k$ is optimal in general.
However, it is possible to improve performance on certain Pauli operators at the expense of others. For example, shallow shadows [24,25] were shown to achieve a scaling of $\sim k 2^k$ (when tuned to a $k$-dependent optimal depth) for Pauli operators with contiguous support in one dimension [27], while typically performing worse on operators with a sparse support. Such trade-offs may be worthwhile in many cases, given the importance of geometric locality in many-body physics. The protocols we introduce in this work feature a similar trade-off, with very favorable performance on a physically-relevant class of operators obtained at the expense of the learnability of other operators.

Bell shadows

Protocol
We begin by introducing a variant of the random Pauli measurement protocol based on two-qubit measurements in the Bell basis. The protocol first requires choosing a grouping of the qubits into pairs (we assume the number of qubits $N$ is even). For concreteness, we take this pairing to be geometrically local on an underlying lattice, and thus refer to it as a dimer covering of the lattice; however, geometric locality is not necessary. The protocol then involves the following steps, illustrated in Fig. 1(a): (i) "locally scramble" the state, $\rho \to U\rho U^\dagger$ with $U = \bigotimes_{i=1}^N u_i$, each $u_i$ being a random single-qubit Clifford gate; (ii) measure each pair of qubits in the Bell basis; (iii) build the classical shadow according to the standard prescription [1].
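In Fig. 1(a), the Bell measurement [step (ii)] is compiled as a CZ gate followed by Hadamard gates and computational-basis measurements (i.e., X-basis measurements after the CZ). CZ is the controlled-Z gate, $\mathrm{CZ} = \mathrm{diag}(1, 1, 1, -1)$; it is a Clifford operation that maps the Pauli-X basis to the basis stabilized by $\pm X_1 Z_2$, $\pm Z_1 X_2$. This basis, up to local unitary transformations (which may be absorbed into the local scrambling step), is equivalent to the standard Bell basis stabilized by $\pm X_1 X_2$, $\pm Z_1 Z_2$, i.e., the basis vectors $\{|\Phi_\alpha\rangle : \alpha = 0, x, y, z\}$, with
$$|\Phi_\alpha\rangle = (\mathbb{1} \otimes \sigma^\alpha)\,|\Phi_0\rangle, \qquad |\Phi_0\rangle = \frac{|00\rangle + |11\rangle}{\sqrt 2}. \tag{5}$$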
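The Clifford identities used in this paragraph are easy to check numerically. The numpy sketch below verifies that conjugation by CZ maps $X_1 \to X_1 Z_2$ and $X_2 \to Z_1 X_2$, and that the four states $|\Phi_\alpha\rangle = (\mathbb{1}\otimes\sigma^\alpha)|\Phi_0\rangle$ form an orthonormal basis of joint eigenstates of $X_1X_2$ and $Z_1Z_2$. Placing $\sigma^\alpha$ on the second qubit is a convention we assume here; the helper names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)

def conjugate_by_cz(op):
    """Return CZ op CZ (CZ is Hermitian and is its own inverse)."""
    return CZ @ op @ CZ

# Bell basis in the assumed convention |Phi_alpha> = (1 (x) sigma^alpha)|Phi_0>.
PHI0 = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
BELL = [np.kron(I2, s) @ PHI0 for s in (I2, X, Y, Z)]

def stabilizer_signs(state):
    """Eigenvalues of X1X2 and Z1Z2 on a joint eigenstate, rounded to +-1."""
    xx = np.vdot(state, np.kron(X, X) @ state).real
    zz = np.vdot(state, np.kron(Z, Z) @ state).real
    return int(np.rint(xx)), int(np.rint(zz))
```

Evaluating `stabilizer_signs` on the four states gives the four distinct sign pairs $(+,+)$, $(+,-)$, $(-,-)$, $(-,+)$ for $\alpha = 0, x, y, z$, which is what identifies them as the Bell basis.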

Sample complexity
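The sample complexity of learning the expectation value of a Pauli operator $P$ via this protocol is determined by the shadow norm $\|P\|_{\rm sh}^2$. This can be analyzed in analogy with the random Pauli measurement protocol [1], by noting that the shadow channel factors into a product of two-qubit channels, $\mathcal{M} = \mathcal{E}_2^{\otimes N/2}$. Each two-qubit channel $\mathcal{E}_2$ acts on a single dimer as
$$\mathcal{E}_2(\rho) = \int du\, dv \sum_{\gamma} \langle \Phi_\gamma | (u\otimes v)\,\rho\,(u\otimes v)^\dagger | \Phi_\gamma\rangle\; (u\otimes v)^\dagger |\Phi_\gamma\rangle\langle\Phi_\gamma| (u\otimes v).$$
Here both $du$ and $dv$ are the Haar measure¹ over $U(2)$. The channel is diagonal in the Pauli basis owing to local scrambling [24,25], $\mathcal{E}_2(\sigma^\alpha\otimes\sigma^\beta) = \lambda_{\alpha\beta}\,\sigma^\alpha\otimes\sigma^\beta$. The eigenvalues are given by
$$\lambda_{\alpha\beta} = \frac14 \int du\,dv \sum_\gamma \langle\Phi_\gamma|\, u\sigma^\alpha u^\dagger \otimes v\sigma^\beta v^\dagger \,|\Phi_\gamma\rangle^2 = \int du\,dv\, \langle\Phi_0|\, u\sigma^\alpha u^\dagger \otimes v\sigma^\beta v^\dagger \,|\Phi_0\rangle^2. \tag{6}$$
Note that the four possible Bell state outcomes $\{|\Phi_\gamma\rangle : \gamma = 0, x, y, z\}$ all yield the same contribution to Eq. (6) due to unitary invariance of the Haar measure, hence a factor of 4 that cancels the prefactor $1/4$.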
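Next, using the fact that $\langle\Phi_0|\, B\otimes C\,|\Phi_0\rangle = \frac12 \mathrm{Tr}(BC^T)$ for any single-qubit operators $B$, $C$, we have
$$\lambda_{\alpha\beta} = \frac14 \int du\,dv \left\{ \mathrm{Tr}\left[ \sigma^\alpha\, (u^\dagger \bar v)\, (\sigma^\beta)^T\, (u^\dagger\bar v)^\dagger \right] \right\}^2, \tag{7}$$
where $\bar v$ denotes the complex conjugate of $v$. We see that one of the random rotations is redundant: since the integrand depends on $u$, $v$ only through the product $w = u^\dagger \bar v$, we may change integration variables from $du\,dv$ to $du\,dw$ (using the invariance of the Haar measure under complex conjugation and left multiplication) and integrate over $w$; the result is independent of $u$, making it redundant. This illustrates an advantageous property of the protocol: it is enough to scramble only one qubit per dimer. Physically, this is a consequence of gate teleportation across the Bell pair:
$$(u\otimes \mathbb{1})\,|\Phi_0\rangle = (\mathbb{1}\otimes u^T)\,|\Phi_0\rangle.$$
From Eq. (7), it is straightforward to derive the results
$$\lambda_{\circ\circ} = 1, \qquad \lambda_{\circ\bullet} = \lambda_{\bullet\circ} = 0, \qquad \lambda_{\bullet\bullet} = 1/3$$
(like in Eq. (3), $\circ$ stands for the identity, $\alpha = 0$, while $\bullet$ stands for a traceless Pauli matrix, $\alpha \in \{x, y, z\}$). These eigenvalues fully determine the shadow norm of any Pauli operator $P$:
$$\|P\|_{\rm sh}^2 = \prod_{(i,j)} \lambda_{\alpha_i\alpha_j}^{-1},$$
where the pair $(i, j)$ ranges over dimers in the system and the indices $\alpha_i$ are defined by $P = \bigotimes_{i=1}^N \sigma_i^{\alpha_i}$. The presence of null eigenvalues in $\mathcal{M}$ immediately shows that the ensemble is not tomographically complete. For convenience, let us introduce the following definition: a Pauli operator $P$ is compatible with the dimer covering if its support intersects each dimer in either 0 or 2 sites. Clearly, Pauli operators that are incompatible with the dimer covering are not learnable, as sketched in Fig. 1(b), since they feature at least one vanishing eigenvalue, $\lambda_{\circ\bullet} = 0$ or $\lambda_{\bullet\circ} = 0$. Nonetheless, for the Pauli operators that are compatible with the dimer covering, the shadow norm is remarkably low:
$$\|P\|_{\rm sh}^2 = 3^{k/2}, \tag{10}$$
where $k$ is the weight of $P$. There are $10^{N/2}$ compatible Pauli operators out of a total of $4^N$ (each of the $N/2$ dimers may host $\sigma^0\otimes\sigma^0$ or $\sigma^\alpha\otimes\sigma^\beta$ with $\alpha, \beta \in \{x, y, z\}$, for a total of 10 options). This exponentially vanishing fraction, $(\sqrt{10}/4)^N \simeq 0.79^N$, is the price to pay for the improved sample complexity.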
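The eigenvalues in Eq. (6) can be cross-checked numerically. The sketch below replaces the Haar integral by the single-qubit Clifford twirl, which gives the same average here because the integrand is quadratic in each of $u$, $v$ and the Clifford group is a unitary 2-design; under the twirl a traceless Pauli is mapped to $\pm X$, $\pm Y$ or $\pm Z$ with equal probability, and the sign drops out after squaring. The function names are ours, and the shadow norm follows the product formula over dimers given above.

```python
import itertools
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
PHI0 = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

def scrambled_images(label):
    """Paulis reachable from `label` by a uniformly random single-qubit Clifford (signs dropped)."""
    return ["I"] if label == "I" else ["X", "Y", "Z"]

def bell_eigenvalue(a, b):
    """lambda_ab = E <Phi_0| u s^a u^dag (x) v s^b v^dag |Phi_0>^2 under local Clifford twirls."""
    values = [
        np.vdot(PHI0, np.kron(PAULI[a2], PAULI[b2]) @ PHI0).real ** 2
        for a2, b2 in itertools.product(scrambled_images(a), scrambled_images(b))
    ]
    return float(np.mean(values))

def bell_shadow_norm_sq(pauli, dimers):
    """Squared shadow norm as a product of 1/lambda over dimers; infinite if any lambda vanishes."""
    norm = 1.0
    for i, j in dimers:
        lam = bell_eigenvalue(pauli[i], pauli[j])
        if lam < 1e-12:
            return float("inf")  # the operator cuts this dimer: not learnable
        norm /= lam
    return norm
```

For example, `bell_shadow_norm_sq("XZYY", [(0, 1), (2, 3)])` gives $9 = 3^{4/2}$ (compatible, weight 4), while `"XIYY"` cuts the first dimer and returns infinity.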
In analogy with the random Pauli case, we may understand the performance of Bell shadows with a simple basis-counting argument. For each qubit pair, each run of the experiment measures 3 out of the 9 two-qubit operators $\{\sigma^\alpha \otimes \sigma^\beta : \alpha, \beta = x, y, z\}$: e.g., the experimentalist may explicitly measure $XX$ and $YZ$, but that also implicitly measures $ZY = XX \cdot YZ$.² Thus we measure 3 out of the 9 weight-two operators on the two qubits. In all, the probability that the measurement "hits" a given $k$-qubit Pauli operator $P$ (compatible with the dimer covering) is $3/9 = 1/3$ per pair of qubits on which $P$ acts nontrivially, i.e., $3^{-k/2}$ overall. We also see why operators that are incompatible with the dimer covering are not learnable: any triplet of mutually commuting weight-2 operators has the structure $\{X \otimes\sigma^\alpha, Y\otimes\sigma^\beta, Z\otimes\sigma^\gamma\}$, with $(\alpha\beta\gamma)$ a permutation of $(xyz)$; in other words, each operator in the triplet must have a different Pauli matrix on the first qubit. The same clearly goes for the second qubit. Thus weight-1 operators such as $IX$ or $YI$ necessarily anticommute with two of the operators in the triplet, and therefore we never learn about their value from Bell measurements.
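The structural claims in this counting argument can be verified by brute force over the nine weight-2 operators on one dimer. This sketch (names are illustrative) uses the rule that two Pauli strings commute iff they anticommute on an even number of sites, enumerates all mutually commuting triplets, and checks that each has distinct Pauli letters on each qubit and that every weight-1 operator anticommutes with exactly two of its members. Each weight-2 operator appears in 2 of the 6 triplets, so if each triplet is equally likely (a simplifying assumption here), a given operator is hit with probability $1/3$ per dimer.

```python
from itertools import combinations, product

def commute(p, q):
    """Pauli strings commute iff they anticommute on an even number of sites."""
    clashes = sum(1 for a, b in zip(p, q) if "I" not in (a, b) and a != b)
    return clashes % 2 == 0

WEIGHT_TWO = ["".join(t) for t in product("XYZ", repeat=2)]  # the 9 operators s^a (x) s^b
WEIGHT_ONE = ["IX", "IY", "IZ", "XI", "YI", "ZI"]

# Every set of three mutually commuting weight-2 operators on one dimer.
TRIPLETS = [
    t for t in combinations(WEIGHT_TWO, 3)
    if all(commute(p, q) for p, q in combinations(t, 2))
]
```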

Some use cases
While tomographically incomplete, the random Bell measurement ensemble is very powerful for learning Pauli operators compatible with the chosen dimer covering.This includes many cases of interest for condensed matter physics and quantum information science; we list some examples below.
String operators. In a 1D lattice, any Pauli operator $P$ whose support is made of $k$ consecutive sites, with $k$ even, is learnable. Operators of this form are interesting in condensed matter physics as they include string order parameters for symmetry-protected topological (SPT) phases [29]. To learn all operators of this form with a given length $k$, regardless of endpoint location, one must sample two dimer coverings of the lattice (pairing even and odd bonds, respectively; see Fig. 2(a)). Following Eq. (1), the overall sample complexity of learning all $M = N \cdot 3^k$ operators of this form ($N$ possible endpoint locations, $3^k$ sequences of Pauli matrices inside the support) with accuracy $\epsilon$ is at most $2 \ln(M/2)\, 3^{k/2} \epsilon^{-2}$. Here we have split the $M$ operators into two sets of size $M/2$ based on the parity of their endpoint location, which determines the choice of dimer covering, and applied Eq. (1) to each set separately. The corresponding scaling for random Pauli measurements is $\ln(M)\, 3^k \epsilon^{-2}$, larger by a factor of $\simeq 3^{k/2}/2$.
Plaquette operators. Products of Pauli matrices around a plaquette in a two-dimensional lattice may also be learnable by choosing suitable dimer coverings. Stabilizers of topological codes [30] typically take this form, as do "ring exchange" terms [31] in lattice Hamiltonians. As an example, all hexagonal plaquette operators of a honeycomb qubit lattice may be learned by sampling two dimer coverings (Fig. 2(b)). The prefactor to the sample complexity is $2 \cdot 3^3 = 54$ (i.e., the squared shadow norm $3^{k/2}$ with $k = 6$, for each of the 2 dimer coverings). For comparison, with random Pauli measurements the corresponding prefactor is $3^6 = 729$, over an order of magnitude larger.
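The comparisons above are simple arithmetic; the following back-of-the-envelope sketch reproduces them with the unspecified constants of Eq. (1) dropped. The chain size, string length and accuracy in the usage example are arbitrary illustrative values, not numbers from the text.

```python
import math

def string_bounds(num_sites, k, eps):
    """Leading-order shot counts (constants dropped) for all length-k strings on a chain.

    Bell shadows: two dimer coverings, each learning M/2 operators with ||P||^2 = 3^(k/2).
    Random Pauli shadows: a single protocol learning all M operators with ||P||^2 = 3^k.
    """
    m = num_sites * 3**k
    bell = 2 * math.log(m / 2) * 3 ** (k / 2) / eps**2
    pauli = math.log(m) * 3**k / eps**2
    return bell, pauli

# Honeycomb plaquettes (weight 6): two Bell dimer coverings vs. random Pauli measurements.
PLAQUETTE_BELL = 2 * 3 ** (6 // 2)
PLAQUETTE_PAULI = 3**6

bell, pauli = string_bounds(100, 8, 0.1)  # hypothetical example: N = 100 sites, k = 8
```

For $k = 8$ the ratio `pauli / bell` is within about 6% of $3^{4}/2 = 40.5$, the residual coming from $\ln(M)/\ln(M/2)$.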
Multi-point functions. Learnable operators need not be geometrically local; in particular, multi-point correlation functions of $k$-body Hamiltonian terms (with $k$ even) may be learnable. These quantities play an important role in condensed matter physics, both in and out of equilibrium; for example, two-point functions may capture long-range order or describe the propagation of a single quasiparticle from one place to another, while four-point functions may describe scattering between two quasiparticles, with implications for pairing instabilities or transport [32,33,34,35]. As a simple example, consider a local Hamiltonian in a $d$-dimensional lattice, $H = \sum_{\langle i,j\rangle} h_{(i,j)}$, where $h_{(i,j)} = \sum_{\alpha,\beta=x,y,z} J^{\alpha\beta}_{i,j}\, \sigma^\alpha_i \sigma^\beta_j$ is the energy density operator at bond $b = (i, j)$, and $J^{\alpha\beta}_{i,j}$ are a priori unknown couplings.
For a general locally-scrambled measurement ensemble [24,36], the eigenvalues of the shadow channel are fully specified by the entanglement feature [37,38] $\{P_A\}$ of the measurement basis: a vector of length $2^N$, with entries labelled by subsystems $A \subseteq \{1, \dots, N\}$, defined by
$$P_A = 2^{-N} \sum_{\psi} \mathrm{Tr}\left(\rho_A^2\right), \qquad \rho_A = \mathrm{Tr}_{\bar A}\, |\psi\rangle\langle\psi|, \tag{11}$$
where $|\psi\rangle$ ranges over states in the measurement basis (we assume the measurements are projective and thus described by a basis of pure states). In words, Eq. (11) describes the purity of each subsystem $A$ averaged over basis states. As shown in Ref. [11]⁴, the shadow channel eigenvalues are given by
$$\lambda_A = 3^{-|A|} \sum_{B \subseteq A} (-1)^{|A|-|B|}\, 2^{|B|}\, P_B \tag{12}$$
(here $\lambda_A$ refers to any Pauli operator with support $A$). In the present (two-qubit) case, the entanglement feature contains only one nontrivial parameter: the average purity of a single qubit, which we will denote by $e^{-S_2^a}$.⁵ In practice this parameter can be tuned by varying the two-qubit gates in the protocol, Fig. 1(a), from CZ to CPhase($\phi$); the gate angle $\phi$ is related to the entropy $S_2^a$ via $e^{-S_2^a} = \cos^4(\phi/4) + \sin^4(\phi/4)$, which achieves the extremal values at $\phi = 0$ ($S_2^a = 0$, disentangled measurements) and $\phi = \pi$ ($S_2^a = \ln 2$, Bell measurements).
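The entanglement-feature recipe can be sketched directly in numpy. Here we assume the eigenvalue formula in the explicit form $\lambda_A = 3^{-|A|}\sum_{B\subseteq A}(-1)^{|A|-|B|}\,2^{|B|}P_B$, which follows from Möbius inversion of $\sum_{Q:\,\mathrm{supp}\,Q \subseteq A} \langle\psi|Q|\psi\rangle^2 = 2^{|A|}\,\mathrm{Tr}\,\rho_A^2$ combined with local scrambling; it reproduces $\lambda_\bullet = 1/3$ for product bases and the Bell values $0$ and $1/3$. The helper `cphase_basis` is hypothetical: it builds the basis measured by a CPhase($\phi$) gate followed by X-basis measurements.

```python
import itertools
import numpy as np

def reduced_purity(psi, subsystem, n):
    """Tr(rho_A^2) of the pure n-qubit state psi, for A = subsystem (sorted tuple of qubits)."""
    if not subsystem:
        return 1.0
    rest = [q for q in range(n) if q not in subsystem]
    # Rows index the qubits of A, columns the rest, so that rho_A = m m^dagger.
    m = np.transpose(psi.reshape([2] * n), list(subsystem) + rest).reshape(2 ** len(subsystem), -1)
    rho_a = m @ m.conj().T
    return float(np.trace(rho_a @ rho_a).real)

def entanglement_feature(basis, n):
    """P_A for every subsystem A, averaged over the (pure) basis states, as in Eq. (11)."""
    subsets = [s for r in range(n + 1) for s in itertools.combinations(range(n), r)]
    return {A: float(np.mean([reduced_purity(psi, A, n) for psi in basis])) for A in subsets}

def shadow_eigenvalue(feature, support):
    """Assumed form: 3^{-|A|} * sum over B in A of (-1)^{|A|-|B|} 2^{|B|} P_B."""
    total = 0.0
    for r in range(len(support) + 1):
        for sub in itertools.combinations(support, r):
            total += (-1) ** (len(support) - r) * 2**r * feature[sub]
    return total / 3 ** len(support)

def cphase_basis(phi):
    """Hypothetical helper: states CPhase(phi)^dagger |s1 s2>_X measured by CPhase + X readout."""
    plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
    minus = np.array([1, -1], dtype=complex) / np.sqrt(2)
    cphase = np.diag([1, 1, 1, np.exp(1j * phi)])
    return [cphase.conj().T @ np.kron(a, b) for a in (plus, minus) for b in (plus, minus)]
```

At $\phi = \pi$ (CZ) this gives $\lambda_{\{0\}} = 0$ and $\lambda_{\{0,1\}} = 1/3$, at $\phi = 0$ the product-basis values $1/3$ and $1/9$, and for intermediate $\phi$ the single-qubit entry of the feature matches $\cos^4(\phi/4) + \sin^4(\phi/4)$.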
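It is convenient to introduce a "deformation parameter" $\delta = \ln(2) - S_2^a$, so that $\delta = 0$ recovers Bell shadows. Then, from Eq. (12), the nontrivial eigenvalues of the two-qubit shadow channel are
$$\lambda_{\circ\bullet} = \lambda_{\bullet\circ} = \frac{e^{\delta} - 1}{3} \simeq \frac{\delta}{3}, \qquad \lambda_{\bullet\bullet} = \frac{5 - 2e^{\delta}}{9} \simeq \frac13 - \frac{2\delta}{9}, \tag{13}$$
where the $\simeq$ holds to leading order in $\delta \to 0$.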
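Thus for $\delta > 0$ the spectrum of $\mathcal{M}$ becomes strictly positive, and the ensemble becomes tomographically complete. The shadow norm of a Pauli operator $P$ with support $A$ that cuts $c_A$ dimers (i.e., intersects them on exactly one site) is given by
$$\begin{aligned} \|P\|_{\rm sh}^2 &= \left(\frac{3}{e^\delta - 1}\right)^{c_A} \left(\frac{9}{5 - 2e^\delta}\right)^{(|A| - c_A)/2} \\ &\simeq \left(\frac{3}{\delta}\right)^{c_A} 3^{(|A| - c_A)/2}, \end{aligned} \tag{14}$$
where the second line is up to subleading corrections in small $\delta$. We recover Eq. (10) for $\delta \to 0$: operators that are compatible with the dimer covering ($c_A = 0$) have squared shadow norm $3^{|A|/2}$, while the others ($c_A > 0$) are not learnable.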
For the practically-relevant case of string operators in 1D, the result Eq. (14) is illustrated in Fig. 3. For small $\delta$ (highly entangled measurement basis), the asymptotic scaling in large $k$ is close to $3^{k/2}$, but odd-$k$ operators (which break a dimer) are much more costly to learn. Increasing $\delta$ alleviates the even-odd discrepancy at the expense of a slightly worse asymptotic scaling in $k$. In particular, for $\delta = \ln(11/8) \simeq 0.318$, the two-qubit eigenvalues become $1/8$ (dimers cut on one site) and $1/4$ (dimers covered on both sites), so that strings starting at the edge of a dimer, as in Fig. 3(a), have $\|P\|_{\rm sh}^2 = 4^{k \bmod 2} \cdot 2^k$. This is notable as it beats the performance of optimal-depth shallow shadows ($\sim k 2^k$) [27] for modest values $k \gtrsim 4$ (including odd $k$) that are relevant to near-term applications. It is especially remarkable given the much simpler protocol; see Sec. 5.2 for further discussion of the relationship between these results.
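A short sketch that evaluates Eq. (14) for contiguous 1D strings, using the exact eigenvalues $(e^\delta - 1)/3$ for cut dimers and $(5 - 2e^\delta)/9$ for fully covered dimers. We assume dimers $(0,1), (2,3), \dots$ and a string starting at site `start`; these conventions and the function names are ours, chosen to mirror the setup of Fig. 3(a).

```python
import math

def two_qubit_eigenvalues(delta):
    """Exact eigenvalues vs. delta = ln 2 - S_2^a: (cut dimer, fully covered dimer)."""
    lam_cut = math.expm1(delta) / 3             # (e^delta - 1) / 3
    lam_full = (5 - 2 * math.exp(delta)) / 9    # (5 - 2 e^delta) / 9
    return lam_cut, lam_full

def string_norm_sq(k, delta, start=0):
    """||P||_sh^2 for a string on sites start..start+k-1, with dimers (0,1), (2,3), ..."""
    lam_cut, lam_full = two_qubit_eigenvalues(delta)
    sites = set(range(start, start + k))
    norm = 1.0
    for left in range(0, start + k + 1, 2):
        overlap = len(sites & {left, left + 1})
        if overlap == 1:
            if lam_cut == 0:
                return math.inf  # Bell limit: a cut dimer makes P unlearnable
            norm /= lam_cut
        elif overlap == 2:
            norm /= lam_full
    return norm
```

At $\delta = \ln(11/8)$ this returns $4^{k \bmod 2}\cdot 2^k$ for every $k$; at $\delta = \ln 2$ it returns the random-Pauli value $3^k$; and at $\delta = 0$ it returns $3^{k/2}$ for even $k$ and infinity for odd $k$.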
⁵ … "average" $S_2 = -\mathbb{E}_\rho \ln \mathrm{Tr}(\rho^2)$. The two differ by the order of averaging and taking the logarithm. We have $S_2^a \le S_2$ by convexity (Jensen's inequality applied to $-\ln$).
Finally, it is interesting to relax the condition of a fully-contiguous support to allow for a density of "holes" (identity matrices) in the operator. Specifically, we consider the ensemble of Pauli operators supported inside a segment of $\ell$ sites in a 1D chain, in which each of the $\ell$ sites has a probability $\rho \in [0, 1]$ of hosting a traceless Pauli (i.e., the average weight is $\rho\ell$). For a segment aligned with the dimer covering, each of its $\ell/2$ dimers hosts two traceless Paulis with probability $\rho^2$ and exactly one with probability $2\rho(1-\rho)$, so the sample complexity of learning a typical operator⁶ from this ensemble is given by
$$e^{\mathbb{E}\ln\|P\|_{\rm sh}^2} = \left[ \left(\frac{9}{5 - 2e^\delta}\right)^{\rho^2} \left(\frac{3}{e^\delta - 1}\right)^{2\rho(1-\rho)} \right]^{\ell/2}. \tag{16}$$
This is to be compared with the analogous result for random Pauli shadows, $3^{\rho\ell}$. Entanglement in the measurement basis is found to be beneficial only above a threshold Pauli density, $\rho > \rho^*(\delta)$, that depends on the amount of entanglement in the basis (parametrized by $\delta = \ln(2) - S_2^a$). We find that $\rho^*(\delta)$ approaches 1 for $\delta \to 0$ (Bell shadows) and $1/2$ for $\delta \to \ln(2)$ (Pauli shadows). Thus, in particular, for any density $\rho > 1/2$, there exist entangled bases that are advantageous over product bases for this task. We can also compare Eq. (16) to the analogous result for shallow shadows at depth $\log(\ell)$, which is $2^\ell$ [27]⁷; numerical inspection shows that locally-entangled shadows remain advantageous over shallow shadows above a threshold density $\rho^* \gtrsim 0.945$. Thus the results in Fig. 3, about 1D string operators with contiguous support, are qualitatively robust to the insertion of sufficiently sparse holes in the support.
⁶ Here, for the "typical" value of a random variable $x$, we use its geometric mean $e^{\mathbb{E}\ln(x)}$. This quantity is less affected by rare fluctuations of $x$ compared to the mean $\mathbb{E}(x)$. This is important in our case since the shadow norm $\|P\|_{\rm sh}^2$ fluctuates across many orders of magnitude.
⁷ For finite $\rho$, typical operators contain holes whose size is at most $O(\log\ell)$: the probability of a hole of length $x$ goes as $\sim (1-\rho)^x$, so the probability of having no such hole is roughly $\sim [1 - (1-\rho)^x]^\ell$, which becomes large when $x \sim \log\ell$. In the limit of large $\ell$, at depth $O(\log\ell)$ such holes are filled and the interior of the operator is nearly equilibrated, giving the same result we would get for an operator with contiguous support ($\rho = 1$).
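The two density thresholds discussed above can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the typical-norm formula: an aligned, even-length segment, independent sites, and exact eigenvalues $(e^\delta-1)/3$ and $(5-2e^\delta)/9$. The closed form in `threshold_vs_pauli` is our own rearrangement of the inequality "typical norm $< 3^{\rho\ell}$", namely $\rho^*(\delta) = 2\ln(3\lambda_{\rm cut})/\ln(\lambda_{\rm cut}^2/\lambda_{\rm full})$, and the grid search in `beats_shallow` is one hypothetical way to carry out the numerical comparison with the shallow-shadow value $2^\ell$.

```python
import math

def eigenvalues(delta):
    """(lambda_cut, lambda_full) for a two-qubit basis with deformation parameter delta."""
    return math.expm1(delta) / 3, (5 - 2 * math.exp(delta)) / 9

def log_typical_norm_per_site(rho, delta):
    """(1/ell) * ln of the geometric-mean squared shadow norm for hole density 1 - rho."""
    lam_cut, lam_full = eigenvalues(delta)
    per_dimer = rho**2 * math.log(1 / lam_full) + 2 * rho * (1 - rho) * math.log(1 / lam_cut)
    return per_dimer / 2

def threshold_vs_pauli(delta):
    """Density above which the entangled basis beats random Pauli shadows (0 < delta < ln 2)."""
    lam_cut, lam_full = eigenvalues(delta)
    return 2 * math.log(3 * lam_cut) / math.log(lam_cut**2 / lam_full)

def beats_shallow(rho, deltas):
    """True if some delta on the grid gives a typical norm below 2^ell."""
    return any(log_typical_norm_per_site(rho, d) < math.log(2) for d in deltas)

DELTA_GRID = [i * 1e-3 for i in range(1, 693)]  # 0 < delta < ln 2 ~ 0.6931
```

With this grid, `beats_shallow` is true at $\rho = 0.95$ and false at $\rho = 0.94$, consistent with the threshold $\approx 0.945$ quoted above, while `threshold_vs_pauli` tends to $1$ as $\delta \to 0$ and to $1/2$ as $\delta \to \ln 2$.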

Beyond two-qubit measurements
It is straightforward to generalize the previous discussion to measurements that factor into $n$-qubit bases, with $n > 2$. For $n = 3$, this requires picking a trimer covering and a choice of measurements on each trimer. It is easy to show⁸ that the optimal protocol (optimized for the learnability of compatible operators, defined here as having either 0 or 3 non-identity operators per trimer) is given by measurements in locally-scrambled GHZ bases, i.e., the 8 orthonormal states stabilized by
$$\pm X_1 X_2 X_3, \qquad \pm Z_1 Z_2, \qquad \pm Z_2 Z_3.$$
While the optimization is less straightforward for $n > 3$, we show next that the $n$-qubit GHZ basis is optimal among stabilizer measurements (i.e., measurements of commuting Pauli operators) at large $n$. The relevant eigenvalue for compatible operators, adapting Eq. (12), is
$$\lambda_{[n]} = 3^{-n} \sum_{A \subseteq [n]} (-1)^{n-|A|}\, 2^{|A|}\, P_A,$$
where $[n] \equiv \{1, \dots, n\}$. The entanglement feature of an $n$-qubit GHZ state is $P_A = 1/2$ for all subsystems $A$ except $A = \emptyset$ and $A = [n]$, where $P_\emptyset = P_{[n]} = 1$. Thus the shadow norm of Pauli operators that are compatible with the partition is $\|P\|_{\rm sh}^2 = (f_n)^k$, with $k$ the Pauli weight and
$$f_n = \lambda_{[n]}^{-1/n} = 3\left(\frac{2}{2^n + (-1)^n + 1}\right)^{1/n}. \tag{20}$$
Let us unpack this result:
• For $n = 1$ we recover random Pauli shadows, $f_1 = 3$;
• For $n = 2$ we recover Bell shadows, $f_2 = \sqrt 3 \simeq 1.732$;
• For $n = 3$ we find $f_3 = 3/2^{2/3} \simeq 1.890$, larger than $f_2$ and thus less efficient than Bell shadows when both are applicable, but potentially useful to learn quantities such as multi-point functions of 3-body operators;
• For $n \geq 4$, $f_n$ decreases monotonically and asymptotes to $3/2$, meaning a scaling of shadow norms as $\|P\|_{\rm sh}^2 \to (3/2)^k$ for compatible operators in the $n \gg 1$ limit.
⁸ Eq. (12) for any pure measurement basis gives $\lambda_{\bullet\bullet\bullet} = (7 - 6\mathcal{P})/27$, where $\mathcal{P}$ is the average purity of a single qubit in the measurement basis (averaged over sites and states in the basis). This is maximized for $\mathcal{P} = 1/2$, which is achieved by the GHZ basis.
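As a consistency check of Eq. (20), the sketch below evaluates the block eigenvalue $\lambda_{[n]}$ by summing over subsystems with the GHZ entanglement feature ($P_A = 1/2$ except $P_\emptyset = P_{[n]} = 1$) and compares $\lambda_{[n]}^{-1/n}$ with the closed form $f_n = 3\,[2/(2^n + (-1)^n + 1)]^{1/n}$. The explicit alternating-sum form of the eigenvalue is the one we assume for Eq. (12); function names are illustrative.

```python
import itertools

def ghz_purity(subset, n):
    """Average purity of `subset` in an n-qubit GHZ basis: 1 for the empty and full sets, else 1/2."""
    return 1.0 if len(subset) in (0, n) else 0.5

def lambda_full_block(n):
    """lambda_[n] = 3^{-n} * sum over A in [n] of (-1)^{n-|A|} 2^{|A|} P_A (GHZ feature)."""
    total = 0.0
    for r in range(n + 1):
        for subset in itertools.combinations(range(n), r):
            total += (-1) ** (n - r) * 2**r * ghz_purity(subset, n)
    return total / 3**n

def f_closed_form(n):
    """Per-qubit base of the squared shadow norm, f_n = 3 * (2 / (2^n + (-1)^n + 1))^(1/n)."""
    return 3 * (2 / (2**n + (-1) ** n + 1)) ** (1 / n)
```

The two agree for small $n$, give $f_2 = \sqrt3$ and $f_3 \simeq 1.890$, decrease monotonically from $n = 3$ on, and approach $3/2$ slowly (about $1.505$ at $n = 200$).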
It is easy to see that the latter scaling of $(3/2)^k$ is optimal for stabilizer measurements, i.e., when each run of the experiment measures $n$ independent, commuting Pauli operators $\{g_i\}_{i=1}^n$. In effect this corresponds to measuring $2^n$ stabilizers: $\{\prod_i g_i^{s_i} : s \in \{0, 1\}^n\}$. Thus each basis choice "hits" at most $2^n$ out of the $3^n$ maximum-weight Pauli operators on each $n$-qubit set. Since local scrambling makes every maximum-weight operator equally likely to be hit, the probability of "hitting" a given compatible operator is $\leq [(2/3)^n]^{k/n} = (2/3)^k$, hence the bound on the shadow norm $\|P\|_{\rm sh}^2 \geq (3/2)^k$ for all stabilizer measurement bases.
By the above reasoning, minimizing the sample complexity of learning "compatible" operators corresponds to choosing a stabilizer group in which the largest possible number of elements have full support. The distribution of weights across elements of a stabilizer group is a well-studied object with applications in coding theory, known as the Shor-Laflamme distribution and characterizable algebraically by weight enumerator polynomials [40,41]. The GHZ state is known to maximize the number of fully-supported elements in its stabilizer group [42], meaning that it is indeed optimal for our task.
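The link between full-support stabilizers and the shadow norm can be checked by brute force. In the sketch below, Pauli strings are stored as binary $(x, z)$ vectors with phases dropped, the GHZ stabilizer group is generated from $X^{\otimes n}$ and $Z_iZ_{i+1}$ (a standard generating set), and fully supported elements are counted. The count matches $(2^n + (-1)^n + 1)/2 = 3^n\lambda_{[n]}$, i.e., the hit probability $\lambda_{[n]}$ is the fraction of the $3^n$ maximum-weight operators contained in the group; a product basis, for contrast, contains only one. The helper names are ours.

```python
from itertools import product

def group_elements(generators, n):
    """All products of the generators as (x-bits, z-bits) tuples; overall phases are dropped."""
    elements = []
    for choice in product((0, 1), repeat=len(generators)):
        x, z = [0] * n, [0] * n
        for use, (gx, gz) in zip(choice, generators):
            if use:
                x = [a ^ b for a, b in zip(x, gx)]
                z = [a ^ b for a, b in zip(z, gz)]
        elements.append((tuple(x), tuple(z)))
    return elements

def ghz_generators(n):
    """X^{(x)n} together with Z_i Z_{i+1} for neighboring qubits."""
    gens = [((1,) * n, (0,) * n)]
    for i in range(n - 1):
        z = [0] * n
        z[i] = z[i + 1] = 1
        gens.append(((0,) * n, tuple(z)))
    return gens

def product_generators(n):
    """Z_i on every qubit: stabilizers of a computational-basis (product) measurement."""
    return [((0,) * n, tuple(1 if j == i else 0 for j in range(n))) for i in range(n)]

def full_weight_count(elements, n):
    """Number of group elements acting nontrivially on all n qubits."""
    return sum(all(x[i] or z[i] for i in range(n)) for x, z in elements)
```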
The improved scaling of sample complexity in these protocols with larger $n$, however, comes with some trade-offs. For one, the set of "compatible" operators whose shadow norm obeys the scaling in Eq. (20) gets more constrained with increasing $n$ (the set comprises $[3^n + 1]^{N/n}$ operators). Secondly, preparing the GHZ measurement basis on $n$ qubits requires either circuit depth linear in $n$, which makes the method less scalable on noisy hardware and limits $n$ to modest finite values, or the introduction of extensively many auxiliary qubits. The performance of these schemes under realistic constraints is an interesting direction for future work.

Summary
We have introduced classical shadows protocols based on randomized measurements that feature entanglement over finite-sized subsystems.These protocols are NISQ-friendly, requiring only local shallow circuits; further, the classical postprocessing steps are straightforward, on par with the standard classical shadows protocols based on random Pauli or Clifford measurements [1].
We have shown that locally-entangled measurements can lead to substantial improvements in sample complexity for some Pauli estimation tasks. As a paradigmatic example, we have focused on two-qubit Bell measurements, Sec. 2. These achieve an improved scaling of sample complexity $\propto 3^{k/2}$ with Pauli weight $k$ for many operators, while failing to learn others. Such trade-offs are unavoidable since the scaling $\sim 3^k$ of random Pauli shadows is optimal in general under a fixed set of local measurements [1] (i.e., it is not possible to do better for all Pauli operators); nonetheless, the trade-off can be advantageous for tasks of interest in quantum many-body physics such as the estimation of string operators or multi-point functions of local operators.
We have further shown (Sec. 3) that general two-qubit entangled bases make it possible to interpolate between Pauli and Bell measurements. This gives a family of tomographically-complete protocols that retain a favorable scaling of sample complexity in many cases, Eq. (15). Finally, when allowing entanglement between $n > 2$ qubits (Sec. 4), we have found that the optimal basis is given by $n$-qubit GHZ states, which enable the estimation of certain "compatible" Pauli operators with even more advantageous scaling of the shadow norm, $\|P\|_{\rm sh}^2 \sim (3/2)^k$ for large $n$, Eq. (20). Our results are summarized in Table 1.

Shallow shadows
Recent works have focused on leveraging locality, an important factor in many NISQ architectures, to develop variations of classical shadows with certain practical advantages. In particular, shallow shadows [24,25] implement the basis randomization step via circuits of variable depth and have been recently shown to give a significantly improved sample complexity (relative to random Pauli measurements) for learning expectation values of geometrically local Pauli operators, among other advantageous properties [24,25,26,27].
Bell shadows (and other $n = 2$ protocols) may be seen as a depth-1 instance of shallow shadows, featuring a single layer of two-qubit entangling gates. However, this appears to raise a puzzle in relation to the shadow norm of Pauli string operators in 1D. While shallow shadows were shown to achieve an optimal scaling of sample complexity $\sim k 2^k$ [27], Bell shadows manage to achieve the improved scaling $\sim 3^{k/2}$ for some of the same operators. How can a depth-1 instance outperform the optimal-depth version of the protocol?
The answer lies in the fact that Bell shadows feature a partly deterministic measurement sequence: the state is locally scrambled, then evolved by a deterministic circuit (a single layer of entangling CZ or CPhase($\phi$) gates) and measured in a deterministic local basis (X). Shallow shadows, on the contrary, are based on fully-random circuits [25,26], or feature a local scrambling step right before the final single-qubit measurements [24,36]. Exploiting this final local-scrambling step, Ref. [27] maps the shadow norm to properties of the operator weight distribution under the twirling circuit, and as a consequence derives the optimal scaling⁹ $\sim k 2^k$. Bell shadows, by avoiding the local scrambling step before the final single-qubit measurements, evade this result and are thus able to improve the scaling to $\sim 3^{k/2}$ in the best case.
Another advantage of locally-entangled shadows over generic shallow shadows is that the inversion of M is effortless (on par with the random Pauli and random Clifford protocols [1]), and does not require tensor network algorithms [24,25].
This straightforwardly unlocks applications to higher-dimensional systems. Moreover, locally-entangled shadows may also be more advantageous for learning $k$-local, but geometrically non-local, Pauli operators (as long as these are compatible with the partition of the system into $n$-qubit sets); see the discussion on multi-point correlators in Sec. 2.3.
At the same time, generic shallow shadows retain some significant advantages. First of all, shallow shadows can efficiently learn many-body fidelity (at depth $t \sim \log(N)$), akin to random Clifford measurements [25]. This is beyond the scope of locally-entangled shadows. Within Pauli estimation, generic shallow shadows may be more efficient for estimating expectation values of local Pauli operators in 1D with a finite density of "holes" (identity operators) between their endpoints, depending on the density of holes; see the discussion at the end of Sec. 3.

Learning via locally-entangled measurements
Various other state-learning protocols that make use of Bell, GHZ, or other locally-entangled measurements have been studied. Refs. [43,44,45] focus on including Bell and GHZ measurements as part of a measurement optimization algorithm for learning a given set of operators, which is complementary to our focus on randomized measurements.
Ref. [46], while written prior to the introduction of classical shadows, has several analogies with our discussion of Bell shadows, Sec. 2. The approach relies on introducing one auxiliary qubit $\vec\tau_i$ per system qubit $\vec\sigma_i$; each of the auxiliary qubits is in an initial state $\xi = (I + \mathbf{n}\cdot\vec\tau)/2$ with $\mathbf{n} = (1, 1, 1)/\sqrt 3$. By measuring the two qubits $\vec\sigma_i$, $\vec\tau_i$ in the standard Bell basis, one simultaneously learns the expectation values of all three commuting operators $\sigma_i^\alpha \otimes \tau_i^\alpha$, for all $\alpha = x, y, z$ and each site $i$.¹⁰ We thus learn the expectation value of $\tilde P = \bigotimes_i \sigma_i^{\alpha_i} \otimes \tau_i^{\alpha_i}$:
$$\langle \tilde P \rangle = \mathrm{Tr}\left[ \left(\rho \otimes \xi^{\otimes N}\right) \tilde P \right] = 3^{-k/2} \langle P \rangle,$$
where $P = \bigotimes_i \sigma_i^{\alpha_i}$ is a Pauli operator on the system qubits, and $k$ is its weight. Learning $\langle P\rangle$ with additive error $\epsilon$ requires learning $\langle \tilde P\rangle$ with additive error $3^{-k/2}\epsilon$; thus the sample complexity scales as $3^k \epsilon^{-2}$.
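The key identity, that each auxiliary qubit in $\xi$ contributes a factor $\mathrm{Tr}(\xi\,\tau^\alpha) = n_\alpha = 1/\sqrt3$, can be checked numerically on a random system state. In this sketch the qubit ordering (all system qubits first, then all auxiliary qubits) is our choice; it only permutes tensor factors and does not change the expectation value. Function names are illustrative.

```python
from functools import reduce
import numpy as np

SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # y
    np.array([[1, 0], [0, -1]], dtype=complex),    # z
]
N_VEC = np.ones(3) / np.sqrt(3)
XI = 0.5 * (np.eye(2) + sum(c * s for c, s in zip(N_VEC, SIGMA)))  # auxiliary-qubit state

def kron_all(ops):
    """Tensor product of a list of matrices, left to right."""
    return reduce(np.kron, ops)

def doubled_expectation(rho_sys, alphas):
    """<P~> for P~ = (x)_i sigma^{alpha_i} (x) tau^{alpha_i}, evaluated on rho (x) xi^{(x)k}.

    Ordering: all system qubits first, then all auxiliary qubits.
    """
    p_sys = kron_all([SIGMA[a] for a in alphas])
    joint_state = np.kron(rho_sys, kron_all([XI] * len(alphas)))
    return np.trace(joint_state @ np.kron(p_sys, p_sys)).real
```

For a two-qubit system and $P = X\otimes Z$, `doubled_expectation(rho, [0, 2])` equals $\langle P\rangle/3 = 3^{-k/2}\langle P\rangle$ with $k = 2$, which is the shrinkage that costs a factor $3^k$ in samples.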
The relationship with Bell shadows becomes apparent if one views the above protocol as learning a Pauli operator $\tilde P$ of weight $2k$ on a two-leg ladder. The dimer covering is given by pairing qubits $\vec\sigma_i$ and $\vec\tau_i$ on each rung, and $\tilde P$ is manifestly compatible with the covering, giving the expected sample complexity $\sim 3^{|\tilde P|/2} = 3^k$. Thus the protocol of Ref. [46] may be seen as a "derandomized" version of Bell shadows with a specific geometry, dimer covering, and subset of initial states.

Future directions
The advantages and limitations of locally-entangled shadows both follow from using a structured, (partially) non-random circuit. This observation points to interesting directions for future research, based on leveraging structure in a problem of interest to design tailored, highly-optimized shadow (or shadow-like) state-learning protocols. Several such proposals have been put forth, e.g., with the goal of optimizing energy estimates for molecular Hamiltonians [3,28,48,49,50,51]. However, these proposals typically deal with the optimization of local basis choices within a Pauli measurement framework. Our work points to the possibility of substantial gains from using entangled measurement bases, even in the simplest and most practically accessible case of two-qubit entanglement. It would be interesting to identify more complex many-body entanglement structures optimized for learning different classes of properties of quantum states, such as entropies or other nonlinear functionals.
Another interesting direction is to bring locally-entangled shadows within reach of analog quantum simulators, by adapting the approaches of Refs. [52,53]. In lieu of random unitary gates, these approaches make use of fixed Hamiltonian dynamics and the stochastic nature of quantum measurements to supply the randomization needed for classical shadows without digital control [54,55,56,57,58,59,60]. The "patched quench" scenario of Ref. [52] in particular appears as a natural setup for optimized locally-entangled shadows. The task of finely tuning the amount of entanglement in the measurement basis for these protocols is nontrivial, but may be achievable depending on the nature of the analog simulator dynamics.

Figure 1: Schematic of the Bell shadows protocol. (a) Unknown state $\rho$ is locally scrambled with random single-qubit Clifford gates (colored squares), then pairs of neighboring qubits are measured in the Bell basis (shown as a sequence of CZ, Hadamard and computational basis measurements). (b) Bell measurements define a dimer covering of the lattice (ellipses). An operator is compatible with the covering if its support intersects each dimer on 0 or 2 sites (green), and incompatible otherwise (red). Bell shadows only learn compatible operators.

Figure 2: Use cases of Bell shadows, Sec. 2.3. The graphical conventions are as in Fig. 1(b): blue ellipses denote dimers whose qubits are measured in a Bell basis, green operators are compatible with the given dimer covering and thus learnable, red operators are incompatible and not learnable. (a) Pauli string operators in 1D chains. All operators of even length are learnable by sampling two distinct dimer coverings of the chain (left and right). (b) Plaquette operators of a honeycomb system (e.g., color code stabilizers). The dimer covering in the picture is compatible with two thirds of all hexagonal plaquettes; the remaining ones may be learned by translating the dimer covering by one lattice vector. (c) Multi-point functions of two-body operators.

² As an example, let us imagine measuring $XX = +1$ and $YZ = -1$; this also informs us of the eigenvalue of $ZY = XX \cdot YZ$: in this case, $(+1) \cdot (-1) = -1$. More generally, measuring any number of commuting Pauli operators $\{P_i\}$ is equivalent to measuring the entire stabilizer group they generate.

Figure 3: Pauli shadow norm from two-qubit measurements in bases with variable entanglement. (a) Setup: qubits (circles) in a 1D chain are grouped into pairs (blue ellipses) and measured in a given two-qubit basis. The Pauli operator's support (full circles) has length $k$. Operators with odd $k$ break a dimer. (b) $\|P\|_{\rm sh}^2$ as a function of $k$ for different values of the basis entanglement $S_2^a = \ln(2) - \delta$. Also shown are the scalings $3^{k/2}$ (dashed line), $2^k$ (dotted line) and $3^k$ (dot-dashed line).
Bell shadows can efficiently estimate $p$-point functions $\langle h_{b_1} h_{b_2} \cdots h_{b_p} \rangle$ of the energy density (as long as all bonds $b_1, \dots, b_p$ match the dimer covering; see Fig. 2(c)). Since each such product has weight up to $2p$, the sample complexity scales with $p$ as $\sim 3^p$, compared to $\sim 9^p$ for random Pauli measurements.

Table 1: Summary of results for various locally-entangled classical shadows protocols.