Continuous groups of transversal gates for quantum error correcting codes from finite clock reference frames

Following the introduction of the task of reference frame error correction [1], we show how, by using reference frame alignment with clocks, one can add a continuous Abelian group of transversal logical gates to any error-correcting code. With this we further explore a way of circumventing the no-go theorem of Eastin and Knill, which states that if local errors are correctable, the group of transversal gates must be of finite order. We are able to do this by introducing a small error on the decoding procedure that decreases with the dimension of the frames used. Furthermore, we show that there is a direct relationship between how small this error can be and how accurate quantum clocks can be: the more accurate the clock, the smaller the error; and the no-go theorem would be violated if time could be measured perfectly in quantum mechanics. The asymptotic scaling of the error is studied under a number of scenarios of reference frames and error models. The scheme is also extended to errors at unknown locations, and we show how to achieve this by simple majority voting related error correction schemes on the reference frames. In the Outlook, we discuss our results in relation to the AdS/CFT correspondence and the Page-Wootters mechanism.


Introduction and Overview of Results
In order to build a functional universal quantum computer, full fault-tolerance must be achieved. The idea behind fault-tolerance is that the errors that occur at particular points during the computation do not propagate or amplify along the whole computation to the point of being uncorrectable. Due to fundamental physical constraints such as no-cloning, achieving this is a notoriously challenging task, with a number of different requirements on how to prepare, manipulate, and protect the quantum states with error-correcting codes. One of the most desirable features of the codes used in fault-tolerant computation is the ability to apply logical gates transversally, which one can implement while still being able to correct for local errors.
The framework for error correction is based on considering three spaces: a logical space $\mathcal{H}_L$, a physical space $\mathcal{H}_P$ and a code space $\mathcal{H}_{Co} \subseteq \mathcal{H}_P$.$^1$ Logical states $\rho_L$ containing quantum information are encoded via an encoding map $\mathcal{E} : \mathcal{B}(\mathcal{H}_L) \to \mathcal{B}(\mathcal{H}_{Co})$ onto the code space, which is a subspace of some larger physical space where errors, represented via error maps $\{\mathcal{E}_j\}_j : \mathcal{B}(\mathcal{H}_{Co}) \to \mathcal{B}(\mathcal{H}_P)$, can occur. Decoding maps $\{\mathcal{D}_j\}_j : \mathcal{B}(\mathcal{H}_P) \to \mathcal{B}(\mathcal{H}_L)$ can then retrieve the information while correcting for errors, outputting the logical state $\rho_L$. That is,

$$\mathcal{D}_j \circ \mathcal{E}_j \circ \mathcal{E}(\rho_L) = \rho_L \qquad (1)$$

for all $j$ and for all states $\rho_L \in \mathcal{S}(\mathcal{H}_L)$. Depending on the error model, the index $j$ indicating which error occurred may or may not be known. If it is unknown, the decoding map $\mathcal{D}_j$ cannot depend on $j$. We say that a logical gate $V_L$ can be applied transversally if, for any state $\rho$, the encoder $\mathcal{E}$ is such that

$$\mathcal{E}(V_L \rho V_L^\dagger) = V_{Co}^{\otimes K}\, \mathcal{E}(\rho)\, V_{Co}^{\dagger \otimes K}, \qquad (2)$$

where the tensor product structure "$\otimes K$" represents the division of the code into different subsystems or "blocks" in which errors can be independently corrected. This condition means that the action of the encoding map commutes with the action of the logical gate $V_L$, which is represented by $V_{Co}^{\otimes K}$ in the physical space. An interesting case to consider is that of codes $\mathcal{E}$ and groups $G$ for which all group elements (indexed by $g$) can be applied transversally using a unitary group representation $U_L(g)$,

$$\mathcal{E}(U_L(g) \rho U_L^\dagger(g)) = U_{Co}(g)^{\otimes K}\, \mathcal{E}(\rho)\, U_{Co}(g)^{\dagger \otimes K}. \qquad (3)$$

We will refer to codes whose encoding has this property as covariant codes. There are a number of results that restrict the existence of such codes, in particular for stabilizer codes [2][3][4][5][6]. Most notably, the no-go theorem of Eastin and Knill [7] states that in any finite-dimensional code (not necessarily stabilizer) in which local errors can be corrected, the groups of logical gates that can be applied transversally must be finite. This thus excludes dense sets of logical gates, as well as any continuous Lie subgroup of U(d).

$^1$ Some authors use the convention of not considering the code space explicitly, in which case one sets $\mathcal{H}_{Co} = \mathcal{H}_P$.
Since this no-go result imposes a fundamental limitation on the possible transversal gates, it is interesting to find ways of circumventing it, via schemes that do not satisfy all of the assumptions. A number of alternatives have been thoroughly explored, including the protocols of magic state distillation [8], and other more specific schemes (see for instance [9][10][11][12][13][14]), many of which propose a relaxation of the transversality condition in some fault-tolerant way.
Recently, a new kind of circumvention was put forward in [1], where they show examples of codes with physical spaces of infinite dimension that allow for the transversal implementation of Lie groups. The need for infinite dimensions (and seemingly infinite energy too, as we will soon discuss) limits their practical relevance, but the idea motivates the following question: do there exist large (but finite) covariant codes in which approximate error correction can be performed? In other words, can we circumvent the no-go results by allowing for small errors that decrease with the size of the code?
Here we explore this question using the notion of reference frames and clocks. We construct imperfect codes in which Abelian U(1) groups can be implemented transversally. To that aim, we use a simple finite dimensional version of the encoding map from [1], which shows that perfect covariant codes are possible provided one has access to a perfect reference frame. A perfect reference frame [15] is defined as a quantum system that encodes, without error, information about a particular group element. That is, given a group representation $U(g)$ such that $U(0) = \mathbb{1}$, the state $|\psi\rangle$ is a perfect reference frame iff

$$\langle\psi| U(g) |\psi\rangle = \delta_{0,g} \quad \forall\, g, \qquad (4)$$

where $\delta_{0,g}$ is a Kronecker delta for finite groups and a Dirac delta in the case of Lie groups. Hence, each point of the orbit $|\psi(g)\rangle = U(g)|\psi\rangle$ is orthogonal to all the others (and thus perfectly distinguishable). This connects with the work by Pauli on quantum clocks in the case of an Abelian U(1) group. If the group $\{U(g)\}_g$ is a one-parameter compact Lie group generated by solving the Schrödinger equation, namely $U(g) = e^{-ig\hat{H}}$ for some Hamiltonian $\hat{H}$, where $g = t$ is the time transcribed, then the constraint Eq. (4) is analogous to requiring the existence of a perfect time operator $\hat{t}$. Indeed, defining

$$\hat{t} = \int_G dg\; g\, |\psi(g)\rangle\langle\psi(g)|, \qquad (5)$$

where the integral is over the Haar measure, one finds $\hat{t}\,|\psi(g)\rangle = g\,|\psi(g)\rangle$ for all $g \in G$. Pauli [16,17] already concluded that such time operators require quantum systems that, while mathematically well defined, cannot exist as they require infinite energy.$^2$ These are known as Idealised clocks.
However, this does not rule out the existence of approximate time operators $\hat{t}$ and initial clock states $|\psi\rangle$ which serve as a reference frame for the observable $t$ [19][20][21], either in the form of a stopwatch [22] or ticking clock [23][24][25]. Here we explore how certain finite-sized reference frames (which we call "clocks") can be used to build imperfect covariant codes, and we give upper bounds to the errors induced by their finite size. We use the construction of Quasi-Ideal clocks [21] and Salecker-Wigner-Peres (SWP) clocks [19,20] to provide simple encoding and decoding protocols based on the task of reference frame alignment. We show, using Quasi-Ideal clock states entangled over $L$ subsystems, that the worst-case entanglement fidelity $f_{worst}$ (defined in Section 2.2) between the input $\rho_L$ and the decoded output $\mathcal{D}_j(\mathcal{E}_j(\mathcal{E}_{cov}(\rho_L)))$ in our protocol satisfies, up to log factors, $1 - f_{worst} \le O(1/(L d_C)^2)$ (a lower bound on the fidelity), where $L$ is the number of entangled Quasi-Ideal clocks and $d_C$ is their dimension. In [26] the generic bound $1 - f_{worst} \ge O(1/(L d_C)^2)$ (an upper bound on the fidelity) is derived for all covariant encoding maps generated by isometries. A direct consequence of the combination of our results with those of [26], adapted to our setting, is that all error correcting codes $\{\mathcal{E}, \{\mathcal{E}_j, \mathcal{D}_j\}_j\}$ can be made covariant w.r.t. the U(1) groups considered here with an optimal fidelity $f_{worst}$ (largest possible value) satisfying, up to log factors,

$$1 - f_{worst} = O\big(1/(L d_C)^2\big) \qquad (6)$$

for large $L d_C$.
In order to study the role of different resources in our protocols, we define t-incoherent clock states $\rho_C$ as those for which there exists $g \in G$ such that their group-evolved initial state, $U_C(g)\rho_C U_C^\dagger(g)$, commutes with the projective measurements used in our protocol to measure them.$^3$ As we later explain, these states require minimal coherent resources to be created in comparison with other clock states. We find that all codes which use t-incoherent clock states satisfy $1 - f_{worst} \ge O(1/(L d_C))$ (an upper bound on the fidelity). We then consider the more commonly used SWP clocks (see for instance [19,20,27,28,29,30] and references therein), which belong to this class, and show that they yield an error of order $1 - f_{worst} = O(1/(L d_C))$, hence saturating the bound for t-incoherent clocks. However, we also show that while coherent clock states are necessary to achieve an error scaling better than $O(1/(L d_C))$, they are not sufficient, since finite dimensional analogues of coherent states can only achieve the same scaling as the SWP clocks.
The results discussed so far consider codes in which the error model on the clock is just erasure at known locations. However, what if one cannot discern the location at which the error occurred? Our final result is to prove that one can also correct local unknown phase errors which occur at an unknown location in the clocks. For this case, we are able to achieve 1 − f worst ≤ O(1/(Ld C )) up to log factors.

Covariant codes based on reference frames
We now outline the generic error correction scheme we consider, which is based on a generalisation of the reference frame-based scheme in [1]. Let $\{\mathcal{E}, \{\mathcal{E}_j, \mathcal{D}_j\}_j\}$ be any perfect error correcting code satisfying Eq. (1) (not necessarily covariant). Now, let $\rho_F^{(M)}$ be a reference frame for a group isomorphic to $\{U_L(g)\}_g$ and $\{U_{Co}(g)\}_g$, with unitary group representation $\{U_F^{\otimes M}(g)\}_g$. We define the following for all $g \in G$:

$$\mathcal{E}_g(\cdot) := U_{Co}(g)^{\otimes K}\, \mathcal{E}\big(U_L^\dagger(g)\,(\cdot)\,U_L(g)\big)\, U_{Co}(g)^{\dagger \otimes K}. \qquad (7)$$
The covariant code is then

$$\mathcal{E}_{cov}(\cdot) := \int_G dg\; \mathcal{E}_g(\cdot) \otimes U_F(g)^{\otimes M} \rho_F^{(M)} U_F(g)^{\dagger \otimes M}, \qquad (8)$$

where $dg$ is the uniform or Haar measure over the group.$^4$ Note that $\mathcal{E}_{cov}$ is from $\mathcal{H}_L$ to $\mathcal{H}_P \otimes \mathcal{H}_F$ while usually the encoding map takes logical states to states in the physical space. As such, it is convenient to think of the reference frames as an extension of the code space to $\mathcal{H}'_{Co} := \mathcal{H}_{Co} \otimes \mathcal{H}_F$. Moreover, the channel $\mathcal{E}_{cov}$ is now covariant in the sense of Eq. (3) if the symmetry group on the r.h.s. is defined in the extended physical space, namely $U_{Co}(g)^{\otimes K} \otimes U_F(g)^{\otimes M}$, so that $U_L(g)$ can be implemented transversally. This way, the notion of covariance defined in Eq. (3) is now over $\mathcal{H}'_{Co}$ rather than $\mathcal{H}_{Co}$ alone. We note, however, that unlike the code subspace, none of the logical information is encoded directly into the frames $F$, and they merely serve as a reference for $g$, for which both the group $\{U_F^{\otimes M}(g)\}_g$ and the initial frame state $\rho_F^{(M)}$ can be chosen at our convenience to optimise our protocol. The purpose of the error maps $\mathcal{E}_j$ is that they take information from the code space and "mix" it with the rest of the physical space, resulting in errors. On the other hand, the reference frames play the following role in the encoding map $\mathcal{E}_{cov}$: the logical information is still encoded into the code space, but now the information about which transversal gate is applied to it is encoded in the reference frames. Hence, without knowledge from the reference frames during the decoding, it would not be possible to discern which gates had been applied. If the reference frames are damaged via an error (which is a reasonable assumption since they are now part of the extended code space), this will inhibit the ability to decode correctly, resulting in a worse decoding of the logical information.
The protection of the information to be encoded relies on the channel $\mathcal{E}$, which is arbitrary. On the other hand, the protection for the reference frame in the encoding of Eq. (8) will come merely from having a few copies of them, which may or may not be entangled. In the case of no entanglement, $\rho_F$, the states are only classically correlated due to the twirling over $G$ in Eq. (8), and the scheme thus bears strong analogies with a classical repetition code.
The encoding-error-decoding procedure consists of the following steps: 1) An arbitrary state ρ L is encoded as E cov (ρ L ).
2) Errors occur: The error maps considered here are now of the form E j,q = E j ⊗ ξ q , with ξ q acting on the reference frames and the index q -as with j -may or may not be known depending on the error model. This may include the loss of up to M − 1 frames, or in the case of Theorem 4, an unknown phase error at an unknown location.
3) Frames are measured: In the case of erasure errors at known locations of the reference frames, only the $N$ non-erased clocks are measured. The remaining $N$ frames are measured projectively in some basis, and after tracing out the reference frames, we write the resulting state as in Eq. (9), where $\hat{E}_g$ represents the error term and $g$ is chosen based on the measurement outcomes, the initial clock states and the error maps $\xi_q$. If the reference frames are good, the error $\hat{E}_g$ will be small since the measurements will be able to distinguish approximately the group elements. Indeed, in the case of perfect reference frames, Eq. (4), or equivalently the Idealised clock, Eq. (5), the optimal choice of $g$ leads to no error, $\hat{E}_g = 0$.
4) Finally, we apply the decoding map $\hat{\mathcal{D}}_{j,g}$ defined in Eq. (10), where $\mathcal{D}_j(\mathcal{E}_j(\cdot)) = \mathcal{E}^{-1}(\cdot)$, with $\mathcal{E}^{-1}$ being the inverse channel for the encoder $\mathcal{E}$ (that is, the channel that undoes the encoding when no errors occur). This achieves Eq. (11), where $\hat{E}_g$ is the final error term.

The circuit diagram is as follows.
This scheme is closely related to the task of reference frame alignment, in which the group of "transversal gates" corresponds to the unknown rotation that happens between two parties, here the encoder and decoder. This is such that the channel connecting them is effectively a decoherence channel. The unknown rotation is inferred by measuring the frame (see [15,31] for further details). Reference frame alignment can in fact be thought of as the particular case where the encoding, error and decoding are identity channels. We discuss this connection further in Supplementary material I, where we show that the error probability using the Quasi-Ideal clock is $(\ln d_C)^3/d_C^2$, a quadratic improvement over the SWP clock, which only achieves $\propto 1/d_C$ as shown in [31].
Another question of relevance is what happens if the original encoding E already had a group of transversal gates {V L , V Co } satisfying Eq. (2). If [V L , U L (g)] = 0 and [V Co , U Co (g)] = 0 for all g ∈ G then these gates are also transversal for E cov .
Moreover, while these gates are not transversal for E cov in general, the gates which can be viewed as merely a change of basis on the original gates, are -up to a small error E g -pairs of transversal gates on E cov after the reference frames have been measured and a value g has been discerned in step 3); which is performed before decoding. See Supplementary material A for details.

Finite-sized clocks
We now detail the unitary group representations to which our results apply. For the logical and physical spaces, we consider all compact representations of the Abelian U(1) group. These can be written in the form $U_L(t) = e^{-it\hat{H}_L}$ and $U_{Co}^{\otimes K}(t) = e^{-it\hat{H}_{Co}}$ respectively, where, using the shorthand $S \in \{L, Co\}$, the generators are $\hat{H}_S = \sum_{n=0}^{d_S-1} \omega h_{S,n} |n\rangle\langle n|_S$ for a $d_S$ dimensional space, for some frequency $\omega > 0$. They have fixed gaps but arbitrary degeneracy, so that $h_{S,n} \in \{0, 1, 2, \ldots, \Delta h_S\}$ with $\mathbb{N} \ni \Delta h_S \le d_S$. As we will see, the performance improves as the range $\Delta h_{Co}$ decreases, so in fact the best choice is a trivial generator $\hat{H}_{Co} \propto \mathbb{1}$, for which we can set $\Delta h_{Co} = 0$.
As motivated by our discussion of Pauli's findings, in this context one can think of the group elements $U_F(g)$ as providing the dynamics of quantum clocks $\rho_F$, where $g = t$ is time. We will therefore refer to the reference frame as a "clock" and use the labels $F$ and $C$ interchangeably in this context. The clock is chosen as the compact non-degenerate representation, namely $U_C(t) = e^{-it\hat{H}_C}$ with $\hat{H}_C = \sum_{r=0}^{d_C-1} \omega r\, |r\rangle\langle r|_C$, where $\{|r\rangle_C\}$ is an arbitrary orthonormal basis. Physically, energy gaps which are all integer multiples of $\omega$ are required to create degeneracies between the systems $L$, $Co$, $C$. Technically, it means that the group representations on $L$, $Co$, $C$ are compact, and that the irreps of $U_C(g)$ contain those of $U_L(g)$ and $U_{Co}(g)$. The compactness allows replacement of the integration range in Eq. (8) with $[0, T_0]$, where $T_0 = 2\pi/\omega$ is the recurrence time, and the measure becomes $dg = dt/T_0$.
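The role of the recurrence time can be checked numerically. The following is a minimal sketch, not part of the original construction: it averages the phase $e^{-i\omega m t}$ over one period $[0, T_0]$ with the uniform measure $dt/T_0$, approximated by an equally spaced sum. The function name `twirl_average` and the number of sample points are illustrative assumptions. For integer level differences $m$ the average vanishes unless $m = 0$, which is why integrating over a single period is enough.

```python
import cmath
import math


def twirl_average(m: int, omega: float = 1.0, steps: int = 64) -> complex:
    """Average exp(-i*omega*m*t) over one recurrence period T0 = 2*pi/omega.

    Uses an equally spaced sum as a stand-in for the measure dt/T0.
    `m` plays the role of an integer difference between energy levels
    (in units of omega); `steps` is an arbitrary illustrative choice and
    must exceed |m| for the discrete sum to reproduce the integral.
    """
    T0 = 2 * math.pi / omega
    total = 0j
    for j in range(steps):
        t = j * T0 / steps
        total += cmath.exp(-1j * omega * m * t)
    return total / steps


if __name__ == "__main__":
    for m in (0, 1, 3):
        print(m, abs(twirl_average(m)))
```

Because all gaps are integer multiples of $\omega$, every phase returns to itself after $T_0$, so nothing is gained by integrating over a longer range.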
The main clocks we use in Eq. (8) are based upon the Quasi-Ideal clocks [21] and the SWP clocks [20]. They both share the same Hamiltonian $\hat{H}_C$. The definition of the SWP clock states is simple: they are any one of the pure states of the Fourier-transformed eigenbasis of $\hat{H}_C$, namely $\rho_{SWP} = |\theta_k\rangle\langle\theta_k|$ with

$$|\theta_k\rangle = \frac{1}{\sqrt{d_C}} \sum_{r=0}^{d_C-1} e^{-i 2\pi r k/d_C}\, |r\rangle_C, \qquad (12)$$

where $|\theta_k\rangle = |\theta_{k \bmod d_C}\rangle$, $k \in \mathbb{Z}$; this basis is referred to as the time basis. On the other hand, the Quasi-Ideal clock states $\rho_{QI} = |\Psi(k_1^0)\rangle\langle\Psi(k_1^0)|$ are defined as a coherent superposition of SWP clocks,

$$|\Psi(k_1^0)\rangle = \sum_{k \in S_{d_C}(k_1^0)} A\, e^{-\frac{\pi}{\sigma^2}(k - k_1^0)^2}\, e^{i 2\pi n_0 (k - k_1^0)/d_C}\, |\theta_k\rangle,$$

where $A$ is a normalization constant, $S_{d_C}(k_1^0)$ is the set of $d_C$ consecutive integers centred about $k_1^0 \in \mathbb{R}$, and $\omega n_0 \in (0, \omega d_C)$ is approximately the mean energy of the clock. $k_1^0$ can be thought of as an approximate initial time marked by the clock. The parameters $n_0$, $k_1^0$ and $\sigma \in (0, d_C)$ can be tuned to our convenience. Both the Quasi-Ideal and SWP clocks are usually associated with the same time operator $\hat{t}_C$, Eq. (13), which is built from the projectors $|\theta_k\rangle\langle\theta_k|$ onto the "time" basis. When measuring individual clocks, we will do so in this time basis. When the clocks are entangled, we will use a time operator of this form but with the projectors $|\theta_k\rangle\langle\theta_k|$ replaced with non-local projectors over the multiple entangled clocks, as explained later.
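As an illustration of why the Fourier-transformed basis is called the time basis, the sketch below builds the states $|\theta_k\rangle$ in the energy eigenbasis and checks that evolving for one "tick" $T_0/d_C$ maps $|\theta_k\rangle$ to $|\theta_{k+1}\rangle$. It assumes the discrete Fourier convention $|\theta_k\rangle = d_C^{-1/2}\sum_r e^{-i2\pi rk/d_C}|r\rangle$ together with $U_C(t) = e^{-it\hat{H}_C}$; the helper names are hypothetical and chosen only for this example.

```python
import cmath
import math


def time_basis_state(k: int, d: int) -> list:
    """Energy-basis amplitudes of |theta_k> (assumed DFT sign convention)."""
    return [cmath.exp(-2j * math.pi * r * k / d) / math.sqrt(d) for r in range(d)]


def inner(a: list, b: list) -> complex:
    """Inner product <a|b> of two amplitude lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def evolve(state: list, t: float, omega: float = 1.0) -> list:
    """Apply U_C(t) = exp(-i t H_C) with H_C = sum_r omega * r |r><r|."""
    return [cmath.exp(-1j * omega * r * t) * amp for r, amp in enumerate(state)]


if __name__ == "__main__":
    d, omega = 8, 1.0
    T0 = 2 * math.pi / omega
    shifted = evolve(time_basis_state(2, d), T0 / d, omega)
    # Overlap with |theta_3> has modulus 1: one tick advances the clock by one step.
    print(abs(inner(time_basis_state(3, d), shifted)))
```

At times that are not integer multiples of $T_0/d_C$ the evolved SWP state is no longer a time-basis state, which is where the finite-size errors discussed in the text originate.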
For a detailed description of the measurement protocols, see Supplementary material B. The Quasi-Ideal clock states have been shown to yield a good performance in the context of quantum control [21] and measurement of time [25]. On the other hand, in [21,25] the SWP clocks appeared to be suboptimal. For the task at hand, we will prove that this difference of performance still occurs.
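The qualitative difference between the two clocks can be seen in a small numerical sketch of the time-basis measurement statistics. The code below is an illustration under stated assumptions, not the protocol of Supplementary material B: it assumes the Fourier convention $|\theta_k\rangle = d_C^{-1/2}\sum_r e^{-i2\pi rk/d_C}|r\rangle$, a Gaussian Quasi-Ideal state with the illustrative (non-optimal) width $\sigma = \sqrt{d_C}$ and $n_0 = (d_C - 1)/2$, and it evolves each clock for a half-integer number of ticks before measuring. All function names are hypothetical.

```python
import cmath
import math


def swp_state(d: int, k: int = 0) -> list:
    """Energy-basis amplitudes of the SWP state |theta_k> (assumed convention)."""
    return [cmath.exp(-2j * math.pi * r * k / d) / math.sqrt(d) for r in range(d)]


def quasi_ideal_state(d: int, sigma: float, k0: float = 0.0) -> list:
    """Gaussian superposition of time-basis states, normalised numerically."""
    n0 = (d - 1) / 2  # illustrative choice: mean energy in the middle of the spectrum
    start = math.ceil(k0 - d / 2)  # d consecutive integers centred about k0
    amps = [0j] * d
    for k in range(start, start + d):
        weight = math.exp(-math.pi * (k - k0) ** 2 / sigma ** 2)
        weight *= cmath.exp(2j * math.pi * n0 * (k - k0) / d)
        for r in range(d):
            amps[r] += weight * cmath.exp(-2j * math.pi * r * k / d) / math.sqrt(d)
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]


def outcome_probabilities(state: list, tau: float) -> list:
    """Probabilities |<theta_k| U_C(tau * T0 / d) |state>|^2 for k = 0..d-1."""
    d = len(state)
    probs = []
    for k in range(d):
        amp = sum(state[r] * cmath.exp(2j * math.pi * r * (k - tau) / d) for r in range(d))
        probs.append(abs(amp / math.sqrt(d)) ** 2)
    return probs


def tail_probability(probs: list, centre: float, window: float) -> float:
    """Total probability of outcomes further than `window` ticks (circularly) from `centre`."""
    d = len(probs)
    total = 0.0
    for k, p in enumerate(probs):
        x = (k - centre) % d
        if min(x, d - x) > window:
            total += p
    return total


if __name__ == "__main__":
    d, tau, window = 32, 4.5, 8
    swp = tail_probability(outcome_probabilities(swp_state(d), tau), tau, window)
    qi = tail_probability(outcome_probabilities(quasi_ideal_state(d, math.sqrt(d)), tau), tau, window)
    print("SWP tail:", swp, "Quasi-Ideal tail:", qi)
```

In this toy setting the SWP outcome distribution has slowly decaying tails at half-tick times, while the Gaussian state keeps its outcomes concentrated around the elapsed time. This only illustrates the measurement statistics; the fidelity bounds themselves are the subject of the theorems.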

Entanglement fidelity
The figure of merit we will use for quantifying the performance of codes which cannot decode perfectly is the worst-case entanglement fidelity. Given a sequence of encoding, error and decoding, which we label as the channel $\mathcal{K} : \mathcal{H}_L \to \mathcal{H}_L$, the worst-case entanglement fidelity $f_{worst}(\mathcal{K})$ is defined through an optimization over all bipartite states on which the map $\mathcal{K} \otimes \mathcal{I}$ acts, where $\mathcal{I}$ is an identity channel acting on a copy of $\mathcal{H}_L$.$^5$ Having a perfect code, such as those satisfying Eq. (1), is equivalent to $f_{worst}(\mathcal{K}) = 1$, which is achieved with perfect reference frames or Idealised clocks. For a justification of why this is a good measure of approximate codes, see for instance [32,33]. The channel $\mathcal{K}$ that we consider here is for the error correction codes $\{\mathcal{E}_{cov}, \{\mathcal{E}_{j,q}, \hat{\mathcal{D}}_{j,q}\}_{j,q}\}$ as described in Section 2, in which we use clocks that are not idealised.

Results
For simplicity, we will only state our bounds to leading order in $d_C$, $d_P$, $d_L$, $L$. Here $d_P \ge d_{Co}$ is the dimension of the physical space and $L$ is introduced later. Except for Theorem 4, explicit bounds (not just the leading order terms) can be found in the supplementary material. Furthermore, in all theorems and corollaries in which the Quasi-Ideal clock is involved, the width $\sigma$ scales logarithmically with $d_C$. The exact value can be found in the proofs.
Our first result covers the case in which at least one of the clocks is left untouched by erasure errors.
with error channels $\{\mathcal{E}_{j,q} = \mathcal{E}_j \otimes \xi_q\}_{j,q}$, where $\{\xi_q\}_q$ are erasure at known locations of at most $M - 1$ clocks and $\{\mathcal{E}_j\}_j$ are the error channels for the code $\mathcal{E}$. Then for all $M \in \mathbb{N}^+$, and for all error correcting codes $\{\mathcal{E}, \{\mathcal{E}_j, \mathcal{D}_j\}_j\}$, there exist decoding channels $\hat{\mathcal{D}}_{j,q}$ for $\mathcal{E}_{cov}$ such that the worst-case entanglement fidelity of the covariant encoding satisfies Eq. (16).

For the proof, we need to write the encoding-error-decoding scheme in the form described in supplementary material B when we have a single clock unaffected by errors. After working out the decoding protocol explicitly, we find a bound that, together with the results on the entanglement fidelity of supplementary material C, allows us to derive the Theorem in supplementary material D, after choosing σ = ln 3 (d C ) for the single clock we measure.

$^5$ Note that it is sufficient to take an ancillary space of size $d_L$.
In order to study the role that the different resources play in the current scheme, consider the following definitions. We call a clock state $\rho_{inc} \in \mathcal{S}(\mathcal{H}_C)$, with time operator $\hat{t}_C$ and unitary group representation $\{U_C(t)\}_{t \in \mathbb{R}}$, t-incoherent if there exists a state in its periodic orbit which commutes with the time operator, namely if there exists $t_0 \in \mathbb{R}$ such that

$$\big[\hat{t}_C,\; U_C(t_0)\, \rho_{inc}\, U_C^\dagger(t_0)\big] = 0,$$

where $\hat{t}_C$ is the operator used in step 3) to measure the clock.$^6$ Conversely, clock states which are not t-incoherent are called t-coherent.
Observe that if one can construct the code $\mathcal{E}$ and decoders $\mathcal{D}_j$, then for a given initial clock state $\rho_C^{(M)}$, in order to construct the covariant code $\mathcal{E}_{cov}$ in Eq. (8), one needs to be able to apply the transversal gates to the clock and code $\mathcal{E}$, and create classical correlations (a separable state) between them. For the decoders $\hat{\mathcal{D}}_j$, the only additional resource required to apply them is the ability to measure the clocks projectively in the time basis. The ability to perform such operations is always required for the construction of any error correction code $\{\mathcal{E}_{cov}, \{\mathcal{E}_{j,q}, \hat{\mathcal{D}}_{j,q}\}_{j,q}\}$. In the case of t-incoherent clocks, the initial clock state $\rho_C^{(M)}$ can then easily be constructed by simply measuring any state on the clock Hilbert space and applying the appropriate transversal gate. However, in the case that the initial clock state $\rho_C^{(M)}$ is t-coherent (such as in the case of Quasi-Ideal clock states in Theorem 1), one cannot construct it with any of the above-mentioned operations, since they do not allow for the creation of the necessary coherence (that is, with respect to the basis of $\hat{t}_C$). Therefore, from a resource-theoretic standpoint, it is interesting to characterise how good codes can be if they only use t-incoherent clocks:

Theorem 2. Consider the covariant code $\mathcal{E}_{cov}$ in Eq. (8), with one $d_C$ dimensional t-incoherent clock $\rho_{inc}$, with time operator $\hat{t}_C$ defined in Eq. (13). Allow for no errors on the clock. Then, to leading order in $d_C$, the error $1 - f_{worst}$ is at least $C/d_C$, where $C > 0$ is independent of $d_C$. Moreover, there exists a scheme using the Salecker-Wigner-Peres clock, $\rho_{inc} = \rho_{SWP}$, that achieves, to leading order, an error $1 - f_{worst}$ of at most $C^*/d_C$, where $C^* \ge C$ is also independent of $d_C$.

$^6$ Note that in the case of time operators which are not projective POVMs, t-incoherent clocks can still be defined by replacing $\hat{t}_C$ by the 1st moment operator of the POVM. We do not need to consider such generalisations here, however.
The full proof is shown in supplementary material E and it goes as follows: we write explicitly the state of the code after the clock has been measured, and we apply an arbitrary decoder. Then we write the entanglement fidelity explicitly and show that the error term decays at best linearly with the dimension of the clock. The performance achieved by the SWP clock is calculated by looking at the scaling of the errors in the decoding scheme of Section 2.
The setup in the above Theorem is the same as in Theorem 1 in the special case that M = 1 and one exchanges the Quasi-Ideal clock for a t-incoherent one. So by comparing Eqs. (16) and (19) we see that there is essentially a quadratic improvement in the scaling w.r.t. the clock dimension d C . This demonstrates the necessity of t-coherent clock states to achieve the improved scaling in our protocols. Furthermore, in all our protocols, the final state of the clock after applying a decoderD j is t-incoherent and thus in protocols in which the initial clock state was t-coherent, the coherence was "used up" in the process.
Interestingly, we have observed the same effective quadratic advantage by using the Quasi-Ideal clock rather than the SWP clock for the related task of reference frame alignment (See Section 2 and Supplementary material I) and ticking clocks [25].
While t-coherent clock states are necessary to achieve the improved scaling, they are clearly not sufficient. A case in point are clock states which are incoherent in the energy basis. Indeed, by inverting Eq. (12) one observes that they are t-coherent yet since they do not evolve in time are useless for this task.
Changing the width $\sigma$ of the Quasi-Ideal clock state allows one to understand these differences better. The uncertainties in the energy and time bases, denoted $\Delta E$ and $\Delta t$ respectively, satisfy $\Delta E \Delta t = 1/2$ to leading order in $d_C$ for all Quasi-Ideal clock states. They are thus approximately minimum uncertainty states. By changing the value of $\sigma/\sqrt{d_C}$ from small to large, one can achieve the limiting cases of a time eigenstate, $\Delta E \gg \Delta t$, and an energy eigenstate, $\Delta E \ll \Delta t$. Eq. (16) in Theorem 1 corresponds to the optimal value of σ = ln 3 (d C ), which corresponds to states which are time squeezed ($\Delta E \ge \Delta t$), but not by too much: the states do not become (t-incoherent) time eigenstates. On the other hand, Quasi-Ideal clock states which are energy squeezed ($\Delta E \le \Delta t$) yield an entanglement fidelity $f_{worst}$ which scales with $d_C$ even worse than t-incoherent states. In between, one finds the non-squeezed ($\Delta E = \Delta t$) Quasi-Ideal states, which have the same $d_C^{-1}$ scaling as the t-incoherent SWP clock.
The latter Quasi-Ideal clock states behave analogously to "classical" coherent states -their expectation values in the time and energy bases oscillate like simple harmonic oscillators, and they minimize the Heisenberg uncertainty with equal uncertainty in each basis. However, the analogy ends here. The time squeezed Quasi-Ideal clock states remain time squeezed under the application of U C (t) for all t ∈ R, while squeezed coherent states in one quadrature (e.g. position) become squeezed in the complementary quadrature (e.g. momentum) under evolution of its Hamiltonian -broadening in the initial quadrature basis. Intuitively, this is expected to be an important difference between squeezed coherent states and squeezed Quasi-Ideal clock states, at least regarding the present task. This is because good decoding maps would require the states to be squeezed during the entire periodic orbit in the basis in which they are measured in step 3) of the decoding protocol.
We also find a significant improvement to the entanglement fidelity when one has access to a larger number of clocks. To achieve it, we embed a large entangled clock within the Hilbert space of L ∈ N + smaller ones, effectively creating a clock of dimension d(L) := L(d C − 1) + 1.
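The effective dimension $d(L) = L(d_C - 1) + 1$ can be understood as the number of distinct eigenvalues of the total generator of $L$ identical clocks, whose levels are the integers $0, \ldots, d_C - 1$ in units of $\omega$. The following sketch checks this count by brute-force enumeration. The function names are illustrative assumptions, and the enumeration is only practical for small $L$ and $d_C$.

```python
from itertools import product


def distinct_total_energies(L: int, d_C: int) -> list:
    """Distinct eigenvalues (in units of omega) of the sum of L copies of H_C.

    Each clock contributes an integer level in {0, ..., d_C - 1}; the total
    energy of a product eigenstate is the sum of the individual levels.
    """
    return sorted({sum(levels) for levels in product(range(d_C), repeat=L)})


def effective_dimension(L: int, d_C: int) -> int:
    """Dimension d(L) = L * (d_C - 1) + 1 of the embedded entangled clock."""
    return L * (d_C - 1) + 1


if __name__ == "__main__":
    L, d_C = 3, 4
    energies = distinct_total_energies(L, d_C)
    print(energies, len(energies), effective_dimension(L, d_C))
```

Picking one eigenvector per distinct total energy gives a non-degenerate ladder of $d(L)$ equally spaced levels, which is the structure a single larger clock of that dimension would have.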

Theorem 3. Given a covariant error correction code
is used, there exists another covariant error correction code with the same error channels [...]

The embedding needed for this theorem leads naturally to an entangled discrete Fourier transform basis on the Hilbert space of $L$ clocks, where $|E_{n,1}\rangle$ are a non-degenerate set of $L$ eigenvectors of the generator $\sum_{i=1}^{L} \hat{H}_C$ $^7$ (see supplementary material F for details).
When this embedding is applied to the SWP clock and Quasi-Ideal clock, it gives rise to an $L$-site SWP Entangled clock state, $\rho_{SWPE,L} = |\theta_k(L)\rangle\langle\theta_k(L)|$, and an $L$-site Quasi-Ideal Entangled clock, $\rho_{QIE,L} = |\Psi_L(k_1^0)\rangle\langle\Psi_L(k_1^0)|$, with the new time operator $\hat{t}_C(L)$. The combination of Theorem 3 with Theorems 1, 2 yields an important corollary: [...] Similarly, consider the setup in Theorem 2, but with a $d_C$ dimensional t-incoherent clock state $\rho_{inc,L}$, with a time operator $\hat{t}_C(L)$, rather than a SWP clock with time operator $\hat{t}_C$. The worst-case entanglement fidelity of the covariant encoding now satisfies Eq. (24), where $C$ is independent of $d_C$, $L$; and equality is obtained for the $L$-site SWP Entangled clock $\rho_{SWPE,L}$.
If one compares the scaling with the number of copies L in Eqs. (23),(24) one finds effectively a quadratic advantage in the case of the t-coherent states. The difference in scaling between the two cases is the same as that found in metrology when comparing the classical shot noise scaling with the quantum Heisenberg scaling [34].
The bound Eq. (23) in Corollary 1 effectively saturates the upper bound derived in [26], which is proven to hold for all covariant error correction codes generated by isometries. Applied to our constructions, it takes the form of Eq. (25). For the details about how this inequality follows from the results of [26], see Supplementary material G.$^8$ If we keep the code parameters $\Delta h_{Co}$, $\Delta h_L$, $d_P$, $d_L$ fixed and scale up the clock size and the number of clocks, the combination of the lower and upper bounds, Eqs. (23) and (25), proves that our construction achieves an optimal scaling with both the dimension of the clock $d_C$ and the number of them $L$, up to the logarithmic factors. Furthermore, this proves that the bound is tight for all choices of [...]

So far we have only considered erasure errors at known locations on the clock. However, what if one cannot detect where the error occurred without damaging the code? This is the case in the most elementary error correcting codes. If one has many clocks and the error occurs with an approximately uniform error probability distribution over the clock locations, one simple approach would be to choose one of the clocks, erase the other clocks, and perform error correction with this clock. If the probability of the error occurring on this clock is low, then this would work well on average. However, this approach is wasteful since it does not use the encoded information in the other clocks and requires a low probability of error on a particular clock. Now we will investigate how well we can recover in the case of unknown phase errors at unknown locations, with a scheme which also works well for a small number of clocks. Consider the case in which one clock (or an entangled block of $L$ clocks in the sense of Theorem 3) whose location is unknown has a random phase applied to it (i.e. an unknown group element $U_C^{\otimes L}(t_{ph})$, $t_{ph} \in \mathbb{R}$).
We call this a 2-unknown phase error and denote it ξ ph,q (t ph ), where q ∈ N and t ph ∈ R are the unknown site location and phase respectively. The following result shows that such errors are correctable up to an arbitrarily small error.
are the error channels for the code $\mathcal{E}$, and $\xi_{ph,q}$ is a 2-unknown phase error acting on one of the three $L$-site Quasi-Ideal Entangled clocks, $\rho_{QIE,L}$. Then for all $L \in \mathbb{N}^+$, $q \in \{1, 2, 3\}$, $t_{ph} \in \mathbb{R}$ and error correcting codes $\{\mathcal{E}, \{\mathcal{E}_j, \mathcal{D}_j\}_j\}$, there exists a decoding channel $\hat{\mathcal{D}}_j$ for $\mathcal{E}_{cov}$, which is independent of the unknown block $q$ and phase $t_{ph}$, such that the bound of Eq. (27) holds.

This result extends trivially to the more general case in which one has $M$ blocks (rather than 3) of $L$-site Quasi-Ideal Entangled clocks in the covariant code and has erasure errors at up to $M - 3$ known locations, and the 2-unknown phase error on one of the 3 blocks of the remaining copies. See supplementary material H for proof.

Figure 1: Illustration of 3 clocks superimposed with a 2-unknown phase error. All clocks are set to have the same initial time, $k_1^0 = k_2^0 = k_3^0$ (red glowing hand). The three measurement outcomes of the three clocks (black hands) attain similar values with high probability. The yellow shaded region represents possible values of $k_\alpha$ for this example, where $k_\alpha = g (d_C - 1)/(2\pi T_0)$, and $g$ is described in step 3) of the decoding protocol in Section 2. An unknown phase error occurs on an unknown clock (for the purpose of illustration, this is the 2nd clock). It has the effect of shifting the initial time of the 2nd clock by an unknown amount $k_{ph}$. The apparent elapsed times are ∆t1 = 2π(k1 − k 0 [...]; however, due to the 2-unknown phase error, $\Delta t_2$ will give the incorrect prediction. Nevertheless, in this example, the phase error is small and no correction is needed.
To gain an intuitive picture of how the protocol works, it is best to consider the case L = 1 (the general L case is then a direct consequence of Theorem 3). For L = 1 the protocol is similar to our previous ones: we measure the 3 clocks locally in the time basis, and based upon the information from the 3 measurement outcomes, we calculate a time g = t [step 3)] and apply a corresponding decoding map D̄ j,t [step 4)] on the physical space. Due to the classical correlations between the code and clocks, the outcomes of the 3 clocks are correlated. This is such that, if there were no 2-unknown phase errors, the clocks would all indicate approximately the same "elapsed time" and one could correct the code up to an error of order Eq. (27) with the knowledge of only one of the 3 measurement outcomes. However, when a phase error occurs in the q th clock, its initial time k 0 q shifts by an unknown phase, making it an unknown variable, and the elapsed time reported by the measurement of the q th clock is thus incorrect. Since the value of q is also unknown, one cannot simply ignore the corresponding measurement outcome, so our protocol is to sort the three measured elapsed times in ascending order and apply a unitary corresponding to the elapsed time which is neither the smallest nor the largest of the three. There are two possibilities: 1) there was no phase error on this clock, in which case the selected elapsed time is approximately correct. In this case, the phase error could have been large. 2) the phase error occurred on this clock. In this case, the phase error must have been small, since the (incorrect) measured elapsed time is upper and lower bounded by that of the other two clocks (which must both be correct, since there is at most one 2-unknown phase error). This corresponds to the case of Fig. 1.
This processing of the measurement outcomes is very closely related to majority voting used in a classical repetition code. Here the main difference is that the outcomes of the clocks are not binary and are unlikely to agree exactly.
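The selection rule described above (keep the elapsed time that is neither the smallest nor the largest) can be illustrated with a minimal classical sketch. It assumes the three elapsed times have already been converted to real numbers with no wrap-around modulo d C, which the actual protocol would have to handle; the function name `middle_elapsed_time` is hypothetical and not taken from the paper.

```python
def middle_elapsed_time(readings):
    """Return the reading that is neither the smallest nor the largest of three.

    Assumption: readings are real-valued elapsed times that do not wrap
    around modulo the clock dimension.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three clock readings")
    return sorted(readings)[1]


# Two unaffected clocks and one clock hit by an arbitrary phase error.
honest = (10.2, 9.7)
for corrupted in (-500.0, 9.9, 10.0, 1e6):
    chosen = middle_elapsed_time([honest[0], corrupted, honest[1]])
    # Whatever the corrupted value is, the chosen reading lies between
    # the two unaffected readings.
    assert min(honest) <= chosen <= max(honest)
```

The loop shows the two cases of the argument: a large error is discarded because it becomes the smallest or largest value, while a small error may be selected but is then bracketed by the two correct readings.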
Note how our protocol does not rely on any assumptions about the probability distributions over blocks q = 1, 2, 3 and phases t ph ∈ R over which the 2-unknown phase error occurs. However, if one does assume that the probability of the 2-unknown phase error ξ ph,q (t ph ) is small and allows for an arbitrary number of 2-unknown phase errors to occur in an independent and identically distributed manner, then the above protocol will also work with high fidelity, since the probability of phase errors occurring on two or three clocks will be very small.
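To make the i.i.d. statement concrete, the following sketch computes the probability that two or more of the three clocks are affected when each clock independently suffers a phase error with probability p. The per-clock probability p and the function name are illustrative assumptions, not quantities defined in the paper.

```python
from math import comb


def prob_at_least_two_errors(p, n_clocks=3):
    """Probability that at least two of n_clocks independent clocks are hit.

    Assumption: each clock is hit independently with the same probability p.
    """
    return sum(
        comb(n_clocks, m) * p**m * (1 - p) ** (n_clocks - m)
        for m in range(2, n_clocks + 1)
    )


# For three clocks this is 3 p^2 (1 - p) + p^3, i.e. of order p^2 for small p.
p = 0.01
print(prob_at_least_two_errors(p))
```

For three clocks the failure probability is therefore second order in p, whereas a scheme relying on a single clock fails at first order.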
Finally, what about infinite dimensional covariant codes of finite energy? We leave the motivation to the Outlook (Section 4) and state our conclusions here. One can embed our finite dimensional clock into an infinite dimensional space, and express d C in terms of the mean energy of the clock, denoted Ĥ C . For all the clocks considered here, this corresponds to the mapping (up to additive higher order corrections in Ĥ C , for the case of Quasi-Ideal clocks). Our theorems and corollary, when framed in this context, provide bounds as a function of energy for how well the errors on covariant codes can be corrected. In this context, due to the infinite dimensions of the physical space, the Eastin-Knill no-go theorem does not apply, so in principle one could have perfect covariant error correcting codes. However, as we have seen in Section 2, due to the insights of Pauli, covariant error correcting codes based on reference frames can be decoded without errors (i.e. f worst = 1) iff they use Idealised clocks, which necessitate infinite energy. We conjecture that this is true in general: specifically, that all error correcting codes which are covariant w.r.t. any faithful representation of a non-trivial Lie group necessitate infinite energy. This would be an extension of the Eastin-Knill no-go theorem, since all finite dimensional systems have finite energy but not vice versa.

Outlook
Given any error correcting code, we have seen how to construct simple classes of approximate codes in which compact U(1) groups can be implemented transversally. We study a specific scheme using Quasi-Ideal clocks [21] that saturates optimal bounds, explore the performance of alternative schemes, and also extend to an error model beyond errors at known locations. The present codes are based on the task of reference frame alignment, which has been studied quite extensively [15] and also formalized within the resource theory of asymmetry [35,36], which we hope will be of further use in the context of error correction [1,26]. Furthermore, since the clocks need to be measured projectively in our protocols, a preferred basis naturally arises, and necessary conditions w.r.t. this basis for a quadratic advantage are identified. An obvious extension of our results is to groups beyond U(1). This requires the construction of more involved reference frames, such as finite-sized quantum "gyroscopes" (for the case of SU(2)). We hope that our techniques may serve as a starting point for generalising the results found here in this direction, perhaps together with the construction of approximate frames for general groups from [31].
An important question is to understand and characterize the sort of error models and implementations for which the present scheme is adequate. We have mostly explored the case of erasure errors at known locations, and a type of dephasing errors on the clocks at unknown locations. While these are quite natural error models, it may well be that with more involved schemes of codes involving clocks, other types of errors can be dealt with.
Also, the results of [26] show tight bounds that cannot be overcome only for erasure errors at known locations, and it would be interesting to know if their bounds are also tight for other error models. We suspect that the error scaling in Theorem 4 is optimal, even though it is only as good as that of the incoherent clock states when the error model was that of erasure at known locations. Intuitively, the limiting factor in this case is that the 3 blocks of clocks are only classically correlated (separable states). While one can easily construct entangled counterparts, it is not clear how this can help when the errors occur over the unknown blocks.
One key point to be determined is whether the error bounds shown here, even if they are essentially optimal, are small enough to be useful in practice for some type of architecture. In the case of Theorem 1 and Corollary 1, explicit bounds on the entanglement fidelity f worst for finite d C have been derived in the supplementary material, which can be evaluated numerically for experimentally feasible parameters. Were this the case, we would hope that the reference frames used here might be useful in some near-term applications of quantum technologies in which error correction has started playing a role, such as quantum metrology [37][38][39]. In particular, given Eq. (25), the Quasi-Ideal clock states appear to be almost optimally distinguishable along a whole U (1) orbit. This suggests that a setting like that of Theorem 4 could be of use in the construction of a metrological scheme aiming to determine unknown parameters (such as the time t of the operator U C (t)).
Covariant infinite dimensional error correcting codes are also of interest. Perhaps the most prominent example is the hypothesized AdS/CFT duality between quantum gravity in asymptotically AdS space (known as the bulk) and a conformal field theory on the boundary, where the theories on the bulk and boundary are related via quantum error correcting codes [40]. Specifically, the bulk constitutes the logical space while the boundary is the physical/code space. Global symmetries in the bulk should correspond to symmetries on the boundary which conserve the local structure of the theory, in a similar spirit to how transversal gates act globally in the logical space while locally in the physical space [c.f. Eq. (3)]. It is argued that variants of the Eastin-Knill no-go theorem have important interpretations in AdS/CFT: all global symmetries in the bulk (both continuous and discrete) are ruled out, since their existence would be in contradiction with the structure of the correspondence 9 [41,42]. A related issue is time dynamics, which is given by a U(1) symmetry with representations e −itĤ blk and e −itĤ bdy for Hamiltonians Ĥ blk , Ĥ bdy on the bulk and boundary respectively. Often Ĥ blk , Ĥ bdy are quasi-local, and thus the corresponding U(1) group action also has to preserve this local structure. An approximate U(1) covariant code with these properties for finite dimensional Hamiltonians has been developed in [43] using techniques from [44]. By choosing the number of code blocks to be one, i.e. K = 1, our protocols allow for such bulk-boundary time dynamics covariance for arbitrary (e.g. quasi-local) finite dimensional Hamiltonians on H L and H Co . An open question is whether further work may allow for clocks with interacting quasi-local Hamiltonians too.
Since both theories of the duality are infinite-dimensional but of finite energy locally, further quantitative variants of the Eastin-Knill no-go theorem for infinite dimensional physical spaces seem more appropriate (for instance, by taking into account the "average energy density" of the code space). Our results suggest that given sufficient energy, the preservation of local symmetries in the boundary is at least approximately possible.
9 Local CFT operators acting on a spatial region R in the boundary can only access information in a limited region in the bulk via entanglement wedge reconstruction. This is incompatible with the existence of charged localised objects in the bulk.
Finally, it is worth noting that the extension of a physical Hilbert space by including clocks has been proposed in a different context before. The aim of the Wheeler-DeWitt equation is to describe a time-static theory of quantum gravity in which locally one finds dynamics [45]. This motivated the Page-Wootters mechanism for describing how a quantum clock can allow for time evolution to follow from a static universe using Idealised clocks [46]. In supplementary material J we show that when the logical space has the trivial group representation of the U(1) symmetry, our formalism is an example of an approximate Page-Wootters mechanism, where the approximation comes from using non-Idealised clocks.
Note on Related Work: Around the time this work was being developed, it was realized by the authors that a complementary approach to characterizing how well the Eastin-Knill no-go theorem can be circumvented by allowing for a small error was being developed independently by the authors of [26]. We thank them for their community spirit.

A Compatibility with the transversal gates of E
In our schemes we start with a code E(·) which is in principle arbitrary. We can ask what happens if this code already has a group of transversal or covariant gates (possibly of finite order as granted by [7]). Does that covariance remain after the reference frame is added? Let {V L , V Co , V F } be an element of a representation of the group of transversal gates of E on the logical, physical and reference frame spaces. Thus: where ⊗K represents the tensor product structure of the physical space of E. Does the same symmetry hold for E cov ? Let us write whereas we have, on the other hand It is clear that Eqs. (30) and (31) hold for all g ∈ G and ii) V F acts trivially on the clocks. Otherwise, the covariance/transversality is lost. While the latter can be obtained by choosing V F to act trivially on the clocks, the former seems much more restrictive. However, there is a sense in which we might still have transversal gates in the code, if we are able to set the representation U Co (g) ⊗K on the physical space to the trivial case. After measuring the clocks and obtaining some outcome g , one can apply the gates V ⊗K P ⊗ V F . Then, applying the decoder, the resulting state is If [V L , U L (g)] = 0 we may just apply U L (g) and end up with the desired state to which the gate has been applied transversally. Otherwise, what has happened is that we have applied the gate V L U † L (g ). We have knowledge of g , and so to end up with V L ρ L V † L we may just apply the unitary V L U L (g )V † L . This assumes that one can apply the logical gate both at the logical and the physical level.
This discussion gives a further example, together with the errors of the main results, of why it is advantageous to choose the generator Ĥ Co to be trivial, Ĥ Co ∝ 1 P , if possible.

B General encoding-decoding error with finite clocks
Here we explain the form of the encoding-error-decoding protocol for all the schemes of the present work. We refer to this discussion repeatedly in the different proofs. We will here assume that the M clocks are of product form. We will see later that this construction is general enough for our purposes even when the clocks are entangled, by considering further divisions of the local sites considered here. We define the dimensions of the input and output of the error correcting code E to be d L , d P . The dimension of the ith clock is d i . If all clocks have the same dimension we use the notation d C := d 1 = d 2 = . . .. If we do not write the range of a summation, it will be over the full range.
Let us first calculate the integral over t by writing the unitary operators U C (t), U L (t), U Co (t) in their eigenbasis: and thus, e −iωt(hCo,q−h Co,q +hL,n−h L,n ) E q,q ,n,n (ρ L ) |q q | P , with E q,q ,n,n (ρ L ) := P q| E ( L n | ρ L |n L |n n|) |q P .
For simplicity, let us define Q := h Co,q − h Co,q + h L,n − h L,n . Therefore where {|r i } is the eigenbasis of the generator Ĥ C of the i-th clock, δ (·,·) is the Kronecker delta function and Since by assumption the last N + 1, . . . , M clocks may be lost due to erasure errors, we can trace them out: So by comparing Eqs. (46), (41), we observe that after tracing out the additional clocks, the resultant channel is of the same form as the original channel, when evaluated for the remaining number of clocks. We now assume that an error due to the environment occurs. This means that we apply a CPTP error map E j : H P → H P corresponding to an error j. We thus denote We now measure the remaining clocks, performing projective measurements in the "time" basis on the mth clock. Let us drop the m subindex for simplicity of notation. The state after the outcome k := k 1 , k 2 , . . . , k N is: Thus, after postselecting on a particular outcome represented with the vector k, we obtain the following state on the Hilbert space of the output of E j E(·) An outcome labeled with k happens with a probability given by Now, let us define the following function.
where for simplicity we will assume that all the clock dimensions are equal d 1 = d 2 = . . . = d N =: d C . This definition has the following properties: 1) Since the integrand has period T 0 , F Q ( k) is independent of t 0 and we can set it to any preferred real number to help perform the calculations.
2) We can perform the change of variable t = t + a, a = −l 2π ωdC , l ∈ Z, to show where the map acts element-wise on vectors and (mod. d ) denotes modular d arithmetic.
3) By inspection, we see that F Q is invariant under any pairwise interchanges where k q and k r are the qth and rth vector elements of k.

4) By Eq. (57), it follows that F Q encodes the measurement outcome probabilities, This has important consequences when property 2) is taken into account. For example, in the case of one clock, it implies that all measurement outcomes are equally likely, so

5) By making the substitution
Thus using property 5), we can write ρ k P in Eq. 52 as It is convenient to introduce an arbitrary phase k α ∈ R which for our purposes has to be defined modulo d C .
It will depend on k, the measurement outcomes. We will choose it later depending on the particular covariant error correcting code set-up and protocol. We can now write In order to proceed with the decoding, let us define the function p(Q, k) as: where p(Q, k) is a complex number for which we expect |p(Q, k)| ≪ 1 (how small will determine the size of the error of the particular scheme). The phase k α determines the group elements we should apply in the decoding procedure, as per Eq. (73).
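As a numerical illustration of the single-clock consequence of property 4), the sketch below checks that the time-basis outcome probabilities of a clock state, averaged over one period, are uniform. It rests on stated assumptions rather than on the paper's exact definitions: the clock generator is taken to have equally spaced levels r = 0, ..., d − 1 in units of ω, the time basis is taken to be the discrete Fourier transform of the energy basis (the sign convention is a guess and does not affect the result), and the integral over one period is replaced by an average over d equally spaced times, which is exact for this spectrum. The function name is hypothetical.

```python
import cmath
import math
import random


def time_outcome_probabilities(psi):
    """Period-averaged probabilities of the d time-basis outcomes for state psi.

    Assumptions: energies are r * omega for r = 0..d-1, and the time states
    are the discrete Fourier transform of the energy eigenstates.
    """
    d = len(psi)
    probs = []
    for k in range(d):
        total = 0.0
        for j in range(d):  # sample times t_j = j * T0 / d, so omega * t_j = 2*pi*j/d
            amp = sum(
                psi[r]
                * cmath.exp(2j * math.pi * r * k / d)   # overlap with time state k
                * cmath.exp(-2j * math.pi * r * j / d)  # evolution phase exp(-i*omega*r*t_j)
                for r in range(d)
            ) / math.sqrt(d)
            total += abs(amp) ** 2
        probs.append(total / d)
    return probs


# Random normalised clock state of dimension 5.
random.seed(0)
raw = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
norm = math.sqrt(sum(abs(a) ** 2 for a in raw))
psi = [a / norm for a in raw]

probs = time_outcome_probabilities(psi)
assert all(abs(p - 1 / 5) < 1e-12 for p in probs)
```

The averaging removes all energy-basis coherences of the clock state, and a state diagonal in the energy basis has equal overlap with every time state, which is why each outcome has probability 1/d regardless of the initial state.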
Since by assumption the group action only acts non-trivially in the physical space, U ⊗K P (t α ) = U Co (t α )⊕1 P/Co , the decoder may take the form where E −1 (·) = D j (E j (·)) is the inverse channel for the encoder E and D j the decoding map of E.
With it we see that if we apply the decoder to the state ρ k P , we obtain where we defineÊ Since each of the outcomes k occurred with probability F 0 ( k), the final decoded state, averaged over all measurement outcomes, is of the form In Appendix C we show how a bound on the entanglement fidelity of the code follows from this expression. The last fact that then needs to be shown is the form of the RHS of Eq. (75) for particular cases of clocks.

C Calculation of the entanglement fidelity
We here give the bounds on the entanglement fidelity which will be used to prove the main results. Again, it is defined as where the minimization is over all the pure bipartite states |φ . Motivated by the discussion in Appendix B leading to Eq. (79), let us start with the assumption that after the encoding, error and decoding steps, the map has the following form when applied to a state ρ.
A result from [32] allows us to lower bound the entanglement fidelity in terms ofÊ(ρ L ). If for all pure states on a single system on H din the following holds then we have that f worst (K) ≥ 1 − 3 2 . Given that φ|Ê(|φ φ|) |φ ≤ ||Ê(|φ φ|)|| 1 , we can choose = max |φ φ| ||Ê(|φ φ|)|| 1 , where the optimization is over all pure states, thus obtaining The next step is to give an upper bound to the 1-norm ofÊ that holds for any pure state. First, by the triangle inequality As seen in Appendix B, in all of the cases considered here, the operatorÊ k is defined asÊ k =D j,tα Ê k wherē D j,tα is the decoder, and where p(Q, k) is in principle constrained by complete positivity and trace preservation (as defined in Eq. (78)). By contractivity of CPTP maps, ||Ê k || 1 ≤ ||Ê k || 1 . Now, we can write where the third line follows from the inequality [47], with 1 = 1/p + 1/r and choosing p = 1, r = ∞, with B = |q q| and A the rest. The fourth line follows from the definition of the 1 − 1 norm for CPTP maps, which is ||E j || 1−1 = sup X ||Ej (X)||1 ||X||1 , where the optimization is over operators on the Hilbert space of the input of E j . By contractivity, we have that ||E j || 1−1 ≤ 1. The last step is to bound the left-over 1-norm. By using ||A|| 1 ≤ √ d in ||A|| 2 we can bound the 2 norm instead, as |n n| n| ρ L |n p(Q, k) since ρ L = |φ φ|. Generically in our examples the optimization is over the range Q ∈ {−(∆h L + ∆h Co ), . . . , ∆h L + ∆h Co } as defined in Sec. 2.1, but if the representation U † Co (t) is trivial (that is, the generator is the identity) then Q ∈ {−∆h L , ∆h L }, which yields the best performance. Finally we have which holds for any |φ φ|. Since outcome k occurs with probability p k = F 0 ( k), we have For particular sets of clocks and schemes, we will give bounds for F 0 ( k) max Q |p(Q, k)|, and then use Eq. (94) to estimate the entanglement fidelity.
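For completeness, the norm inequality ||A|| 1 ≤ √(d in) ||A|| 2 used above follows from the Cauchy-Schwarz inequality applied to the singular values of A. The following is a short sketch, writing s_i for the singular values of an operator A on a d in-dimensional space (the symbol s_i is introduced here for illustration only):

```latex
\|A\|_1 \;=\; \sum_{i=1}^{d_{\mathrm{in}}} s_i \cdot 1
\;\le\; \Big(\sum_{i=1}^{d_{\mathrm{in}}} s_i^2\Big)^{1/2}
        \Big(\sum_{i=1}^{d_{\mathrm{in}}} 1\Big)^{1/2}
\;=\; \sqrt{d_{\mathrm{in}}}\,\|A\|_2 .
```

Equality holds when all singular values are equal, so the factor √(d in) cannot be improved in general.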

D Proof of Theorem 1, a bound for a single remaining clock
Following the discussion of Appendices B and C, here we provide the proof of the bound on the quantity that we use to bound the entanglement fidelity for the case in which, after erasure errors, only a single clock is ensured not to be erased. For simplicity of notation, let us use the shorthand d = d C . The first step is to notice that the discussion of Appendices B and C shows we only have to compute a bound on max as defined in Eq. (75). Then, any upper bound on |p(Q, k)| ∀ Q, k will yield a lower bound on the worst-case entanglement fidelity as per Eq. (94). We now compute this ratio explicitly for a single Quasi-Ideal clock, which we do following the notation of Appendix B. Let us recall that In the case of this proof, we let k α = k 1 − k 0 1 . After measuring the single clock, and right before applying the decoding, the state is that of Eq. (72), that is Using property 2) of F Q ( k) (see Eq. (60)), we observe that the first line of Eq. (98) is invariant under the change of variable Thus the situation is the same for all measurement outcomes k 1 . As such we will only need to concern ourselves with one of the measurement outcomes k 1 , which we choose for convenience to be k 1 = k M a := max{S d1 (k 0 1 )}, where the set S d (k 0 1 ), described in the main text, is defined to be for k 0 1 ∈ R. Thus k M a = k 0 1 + d/2 if k 0 1 + d/2 is integer (we will assume this is the case here, but one can find analogous results for when it is half integer). 10 We are now interested in bounding the overlap This can be bounded with the results in Theorem 8.1 on page 151 of [21]. The theorem tells us that for all where with The error term ε c := |||ε || 2 satisfies
10 Recall that k 0 1 is the value at which the Gaussian amplitude of the initial Quasi-Ideal clock state is centred.
where ε total = 2πAd 2σ and A is defined in Eq. (124), and α 0 ∈ (0, 1] is defined in [21] to be: Definition 1. (Distance of the mean energy from the edge of the spectrum) We define the parameter α 0 ∈ (0, 1] as a measure of how close n 0 ∈ (0, d − 1) is to the edge of the energy spectrum, namely The maximum value α 0 = 1 is obtained for n 0 = (d − 1)/2 when the mean energy is at the mid point of the energy spectrum, while α 0 → 0 as n 0 approaches the edge values 0 or d − 1. Furthermore, where on the r.h.s. of the inequality in Eq. (109) we have assumed σ ≥ 1 and d = 2, 3, 4, . . . (tighter bounds can be found in Section E.A.2 of [21]).
We can now get back to estimating the overlap Eq. (100). We find that for all t ∈ [0, T 0 ], Thus when ρ C,1 = |ψ nor (k 0 1 ) ψ nor (k 0 1 )|, and t ∈ [0, T 0 ], Thus using ω = 2π/T 0 and the change of variable x = t/T 0 − 1/2, and setting t 0 = 0 such that integral is over t ∈ [0, T 0 ], we find Recalling that F 0 ( k) = 1/d and that, following Section E.1.1 of [21] we have ε A is specified later in Eq. (131), we can write where in the last line we have used k α = k 1 − k 0 1 = d/2 in order to get rid of the phase. Finally, we need to find bounds for both ε 2 and ε A . Let us start with ε 2 , defined as where ε 1 is defined in Eq. (118). By inspection, we have where ε c is reproduced in Eq. (104). Hence going back to the definition of 2 in Eq. (128), we can then conclude that |ε 2 | ≤ 2 (dε total + (d + 1)ε step + ε nor ) + (dε total + (d + 1)ε step + ε nor ) To bound the error term ε A , we use the results from Eq. (477-483) from Section E.1.1 on page 84 of [21]. The bound is Thus, the leading contribution to ε A comes fromε 2 , and the leading contribution to ε 2 comes from the first term of ε total and the last of ε nor . It can be seen in Eq. (105) that the first term of the sum only decays as √ σde − πσ 2 4 α 2 0 . Thus, we can writẽ Going back to the definition of Eq. (75), this gives the following bound on max from which, given the discussion of Appendix C, the bound on the entanglement fidelity follows.
To finalise the proof, we need to pick an optimal value of σ. Choosing σ = ln 3 (d), we obtain the best scaling of the leading error term in d, which gives from which, given the discussion of Appendix C, the bound on the entanglement fidelity follows. The choice σ = ln 3 (d) corresponds to a time-squeezed Quasi-Ideal clock state (i.e. ∆t < ∆E, where ∆t and ∆E are the standard deviations in the time and energy bases as described in the main text).

E Proof of bound for clocks diagonal in the time eigenbasis
We here give a proof of Theorem 2, which is a bound on the entanglement fidelity of clocks which are, at some point in their periodic orbit, diagonal in the basis in which they are measured: the time eigenbasis {|θ k }, conjugate to the energy eigenbasis. That is, we assume that there exists t 0 ∈ R such that the initial clock state for some probability amplitudes { |A k | } dC−1 k=0 . Therefore, Unlike in the previous case of an optimal clock, the above equation only depends on the difference r 1 − r 1 rather than r 1 , r 1 individually (c.f. the Quasi-Ideal clock case). Before proceeding, we need to write the covariant encoding channel in a way which reflects the form of Eq. (138). Note that Eq. (18) in the theorem has the property for all t ∈ R. This is easily provable via a change of variable and is a consequence of the fact that the integrand is periodic and we are integrating over one period. Hence via the change x = t − t 0 and setting t = t 0 , we find The above equation shows how to write the covariant encoding channel in terms of the clock state ρ̃ C,1 which is diagonal in the time basis. Since our results are stated in terms of the entanglement fidelity, which is independent of any particular input, the change in inputs from ρ L to ρ̃ L in the above equation is irrelevant and we will thus ignore this difference. The only relevant difference is thus that the encoding map E t is shifted to E t+t0 . This small difference can easily be accounted for at the decoding stage without extra complication.
From Eq. (35), we see that changing E t for E t+t0 is equivalent to changing E q,q ,n,n to e −iωt0(hCo,q−h Co,q +hL,n−h L,n ) E q,q ,n,n or equivalently, changing E j q,q ,n,n tõ E j q,q ,n,n := e −iωt0(hCo,q−h Co,q +hL,n−h L,n ) E j q,q ,n,n Hence we can use Eq. 52 by making the replacements E j q,q ,n,n (defined in Eq. (47)) withẼ j q,q ,n,n (·) and ρ C,1 withρ C,1 , followed by plugging in Eq. (139). Hence recalling the short hand notation Q := h Co,q − h Co,q + h L,n − h L,n , we find Now applying to Eq. (146), we have As all the outcomes are equally likely, we have that p k = 1/d C . Hence where we have defined It is convenient to write this as where |Q|Ô q,q ,n,n (t).
Observe that if we can apply the map to ρ P (t k−k1 ), we have perfect error correction, i.e.
as such we define the error term Let us assume we apply an arbitrary decoder D to ρ k P . The fidelity with an arbitrary initial pure state |ψ ψ| is which shows that the Salecker-Wigner-Peres clock with |A k | 2 = δ k,k is the optimal choice.
Crucially, the term in the optimization ψ| D δ (t k−k1 ) |ψ now only depends on parameters of the encoding E and decoding D independently of the clock and its dimension d C . Now, let us assume there exists a sequence of encoding-decoding schemes such that for some fixed d P , If this were true, one could achieve an arbitrarily large entanglement fidelity for all pure states without increasing the size of the clock. This contradicts the existing no-go results [7,26], so there must exist a constant C, such that for all D andδ(t k−k1 ) with fixed d P which can be chosen such that the inequality is saturated for some D andδ(t k−k1 ). Hence, we have that On the other hand, since the Salecker-Wigner-Peres clock saturates the inequalities (162) and (163) we can apply the decoder for some C * ≥ C, as the only dependence with the dimension of the clock is in the term 1 dCδ (t k−k1 ).

F Proof of Theorem 3
We just need to show that with L d C -dimensional clocks we can construct a clock state with the same properties as a single clock of dimension L(d C −1)+1. Let us take the Hamiltonian of L non-interacting clocks of dimension d C , This defines a subspace with dimension d(L) and with a Hamiltonian with equally-spaced energy levels with gap ω. Now given any single clock state . This large dimensional clock state will now be a superposition of product states (the energy eigenstates) and will thus be entangled. Thus, in any scheme that uses a clock with dimension d C , such that it achieves a fidelity bounded by f (d C ) (either from above or below), one can now use this d(L) dimensional clock to achieve a fidelity bounded by f (d(L)) (again, either from above or below). This way, we can use a large number of clocks to vastly improve the performance of the codes.

G The converse bound from [26]
Here we show how to adapt the bound from [26] to our setting. To understand why the bound cannot be straightforwardly applied to our setting, first recall our definition of the covariant code E cov (·). For all t ∈ R, and a given encoding E(·) and group representations U L (t), U Co (t) ⊗K , we define The covariant encoding that we use throughout our work is then the CPTP map: This construction is by definition not an isometry, and in general the output E cov (·) is a mixed state even for pure inputs. However, the theorem of [26] assumes that the encoding map is an isometry, and as such it does not directly apply to our results. Nevertheless, since in our setup we assume that the encoding of Eq. (171) is covariant, it is possible to construct a Stinespring dilation with an isometry to which the bound of [26] can be directly applied. We can construct the Stinespring dilation with an isometry directly for Eq. (171).
However, since in our set-up M − N clocks are lost due to erasure errors, and the erased clocks are inaccessible, we can trace out M −N clocks and work with the resultant effective encoding map instead. Since we also assumed no correlations between the erased and non-erased clocks, this leads to the effective channel Finally, since in some instances the generator of the unitary group on the clock can be exchanged for an effective generator w.l.o.g. (notably in Theorem 3), we will assume in Eq. (172) that the unitary representation of the compact U(1) group on the clock, U C (t) ⊗N , takes on the generic form U C (t) ⊗N = e −itHC and specialise it later in this section to specific cases.
We can now proceed with the isometry. Notice that E(·) can be dilated to an isometry V L→CoA defined such that the representation of the symmetry group acts trivially on system A. Also, the state ρ (N ) C may not be a pure state, but it can also be dilated to |Ψ CC Ψ CC | (N ) , such that the symmetry group again acts trivially on system C̄. We thus define where now we have The only reason why Ē cov (·) is not yet an isometry is the twirling over the group, for which we are also able to define a dilation with the help of the Choi-Jamiolkowski isomorphism. First, let {|k L } be the eigenbasis of L with Ĥ L = k ωh L,k |k k| L , and let |Φ LL = k |k L ⊗ |k L be an unnormalized maximally entangled state between the logical space L and a copy L̄. Moreover, let us define Ĥ L̄ = k −ωh L,k |k k| L̄ such that Ĥ L ⊗ 1 L̄ + 1 L ⊗ Ĥ L̄ |Φ LL = 0. The Choi-Jamiolkowski representation of the channel Ē cov is (we now omit writing the trivial representations such as 1 A , 1 ⊗N C for simplicity of notation) where in the second line we have used the properties of the maximally entangled state, and we define U CoCL (t) := U Co (t) ⊗K ⊗ U C (t) ⊗N ⊗ U L̄ (t) = e −it(Ĥ Co ⊕H C ⊕Ĥ L̄ ) . Thus, we can write Ē cov in terms of the projectors onto the degenerate eigenspaces of Ĥ Co ⊕ H C ⊕ Ĥ L̄ as Since the operators Ĥ Co , H C , Ĥ L̄ act on different subsystems, we can decompose Π x CoCL as where h CoC,l are the eigenvalues of Ĥ Co ⊕ H C and h L,k are the eigenvalues of Ĥ L̄ . Furthermore we can write Π k L̄ = |k k| L̄ . If we write the corresponding projector in L as Π k L = |k k| L , using the definition of |Φ LL Φ LL | we have Eq. (179). Now we can define an extra system B on a Hilbert space spanned by orthonormal basis vectors {|x B }, and such that Ĥ B = x∈{h CoC,l −h L,k } (x)|x x| B , so that every single energy difference h CoC,l − h L,k appears in the sum only once (note that by assumption this set is finite). With this, we can define the following isometry from the logical space labeled by L to BCoACC.
This is indeed an isometry since W † L→BCoACC W L→BCoACC = 1 L . It is covariant, in the sense that by construction
W L→BCoACC e −itĤ L = (e −it(Ĥ Co ⊕H C ⊕Ĥ B ) ⊗ 1 AC ) W L→BCoACC . (181)
Moreover, it can be easily computed that and subsequently so W L→BCoACC is a covariant Stinespring dilation of channel tr C N +1 ...C M [E cov (·)], which we can see as an encoding isometry for which the error is i) first the loss to the environment of systems BAC, and then ii) an erasure error channel for E cov . The result of [26] states that for these isometries, the entanglement fidelity achieved by any decoding scheme is bounded by where ∆h L is the range of values of the set {h L,k } ofĤ L , N is the number of subsystems in the encoding that can be erased independently by the error model, and where ∆h loss is the largest energy difference in the Hamiltonians of all the subsystems that are lost to the environment. We have that BAC are lost and since A,C have trivial generators we just have to look atĤ B , so that ∆h loss = ∆h B . The eigenvalues ofĤ B are of the form h CoC,l − h L,k , so that the range ofĤ B is ∆h B = ∆h Co + ∆h C (the terms with h L,k always contribute negatively and hence ∆h L does not enter here).
We will now specialise the bound to the first case considered in Corollary 1. Here we have erasure of up to M − 1 blocks of L entangled clocks, so N = 1. Furthermore, the effective clock generator on the remaining block of L entangled clocks is H C = H clock (recall Eq. (169) for an expression for H clock ). This effective clock system has ∆h C = Ld C . Therefore ∆h loss = ∆h Co + Ld C . To compute the number N of independently erasable subsystems that enters the bound (not to be confused with the number of remaining clock blocks, N = 1, above), first notice that BAC counts as a single system, as it is always lost to the environment. Moreover, as mentioned above, the M − 1 clocks that are lost to the environment do not appear in the decoding procedure at all and, as detailed above, are such that effectively they were not there in the first place. Finally, the error channel on P is later corrected by its own decoding map D j (·) and is independent of whether we make the code covariant in the first place, and the block of L entangled clocks that is left (which we have taken to be the whole C system) does not get erased, so we can also count these two as a single subsystem (which gets erased with probability 0). Hence we have N = 2. Putting everything together, we finally obtain The appearance of ∆h Co as an additive contribution to the effective clock dimension Ld C in the upper bound on the fidelity is to be expected, since in our setup one could have chosen the encoding map E to be a clock whose decoding map D j measures this clock in the same way that the decoding map D j,q measures the clocks on C. This scenario would help the encoding channel E cov increase its decoding fidelity, since its effective clock dimension would be ∆h Co + Ld C . Conversely, if one considers encoding and decoding channels E, D j which do not help the encoding channel E cov reduce its decoding errors at all, then the value of ∆h Co should not be expected to play a role in the decoding fidelity f worst . Finally, comparing Eq. (185) with the lower bound of Eq. (23), we see that, up to logarithmic factors, the L entangled Quasi-Ideal clocks achieve the optimal error scaling with both their dimension d C and the number of clocks L.

H Proof of Theorem 4
In this section, we will prove Theorem 4. We will prove it only for the special case that the three blocks consist of one clock each, i.e. L = 1. Once Theorem 4 is proven for this specialised case, the result for larger L is a direct consequence of Theorem 3. Again for simplicity of notation, we will label d = d C .
For simplicity, we will start by assuming, as for a single clock, that d is even (d odd follows analogously), and that σ satisfies The first goal will be to work out an explicit expression for F Q ( k) evaluated for the case of three Quasi-Ideal clocks in which a 2-unknown phase error is applied to one of them. To indicate this difference (i.e. that an unknown phase has now been applied), we add a tilde to F . Specifically, we have and r = 1, 2, 3 denotes the clock to which the phase is applied. The variables r and t ph are assumed to be unknown. Applying Theorem 8.1 (Eq. 71) in [21] we find where we have used the Quasi-Ideal clock states ρ C,q = |ψ nor (k 0 q ) ψ nor (k 0 q )| q and defined In the last line of Eq. (193) we have defined where ε c is defined in Theorem 8.1 (Eq. 71) in [21]. Thus from Eqs. (193) and (58), it follows that if the clocks ρ C,q in question are Quasi-Ideal clocks, then so we see that a 2-unknown phase error, up to a small error ε c , simply re-maps the initial time of one of the clocks to an unknown value. Given the relation Eq. (198), our first task will be to find an explicit expression for the integral (58). We do this in the following section.

H.1 An Expression for the Integral in F Q
We start by deriving a general expression for θ k |ρ C,q (t)|θ k for k ∈ S d (k 0 q ), q ∈ {1, 2, 3} for Quasi-Ideal clock states. For this we need to consider the overlaps Let t q be defined via the relation Therefore, Now consider Eq. (58) with measurement outcomes k 1 ≤ k 2 ≤ k 3 , with clocks that are identical other than their starting times, denoted k 0 1 , k 0 2 , k 0 3 respectively. We consider this special case w.l.o.g., since the other cases can be reconstructed later using property 3) in Section B. Thus using Theorem 9.1 in [21] we have Aψ nor (k 0 2 ;k 2 (t, where in the last line we performed the change of variable x = t/T 0 , used ω = 2π/T 0 , and defined From Eq. (211) we have Using the triangle inequality and the identity |R + C| 2 = R 2 + ε where |ε| ≤ |C|(2R + |C|) for R ≥ 0, C ∈ C, we can bound the ε I1 term, where ε c (T 0 , d) is an upper bound to |ε c (t) 2 evaluated at t = T 0 . This quantity (which already appeared in Appendix D) is given in Theorem 8.1 of [21] and satisfies where α 0 ∈ (0, 1] is defined in Def. 2 in [21] and explained in Eq. (106). Using Eq. 483 in [21] and Eq. (216), we conclude Dividing the integral in Eq. (215) into subintervals and substituting for ψ nor we find Thus taking into account (219), we conclude (233) In order to simplify the equations further, we will set k 1 to the smallest measurement outcome possible, i.e. k 1 = k 0 1 − d/2 + 1. We can easily generate the other cases by employing Eq. (60), which we postpone to Section H.4. Substituting into Eq. (230) gives us where in the penultimate line we have used Eqs. (248) and (249). Analogously, we can bound ε R 3 (m, p). We find where we have introduced the constraint Thus computing the integral in Eq. (245) and combining the epsilons into a single term, we have where in the last line we have multiplied by 1 = e 2πiQ and defined the quantities Note that if inequalities (249) and (261) are both satisfied, then Whenever inequality (249) and/or inequality (261) is not satisfied, we can bound the integral by the largest value of the integrand, which is exponentially small in (d/σ) 2 . In this case since the point at which the Gaussian takes on its maximum value lies outside of the integration region. However, it turns out that (m 2 + p 2 − mp)/d 2 is uniformly bounded away from zero in this case, and thus the first term in (265) is exponentially small in (d/σ) 2 . We will prove this result, since it will provide a more useful expression.
We start by defining three functions for x ∈ R With the identification m = (∆k 0 1 − ∆k 0 2 + m)/d and p = (∆k 0 1 − ∆k 0 3 + p)/d, we see that the constraints in Eqs. (249), (261) are the conditions C 3L,p (m) ≤ 0 and C 3R,p (m) ≥ 0 respectively, while F 3,p (m) is the exponent in Eq. (265), which we aim to prove is uniformly bounded away from zero whenever the conditions C 3L,p (m) ≤ 0 and C 3R,p (m) ≥ 0 are not both satisfied, i.e. when C 3L,p (m) > 0 and/or C 3R,p (m) < 0. (272) Given the range of p = 0, . . . , d − 1, it is easy to verify that the two inequalities in Eq. (272) cannot hold simultaneously, since C 3L,p (m) > 0 implies C 3R,p (m) ≥ 0 and C 3R,p (m) < 0 implies C 3L,p (m) ≤ 0. Therefore, it suffices to consider Eq. (272) with the "or" case only. It is useful to think of these functions as functions of m which are parametrized by p, as the chosen notation suggests. Furthermore, d 2 /dx 2 F 3,p (x) = 2, and thus F 3,p (x) is a convex function for all p. As such, we can lower bound F 3,p by any of its tangent lines.
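The tangent-line argument can be checked numerically. The following is a minimal sketch, assuming that F 3,p has the quadratic form x^2 + p^2 − xp suggested by the exponent (m 2 + p 2 − mp)/d 2 quoted above; the explicit definition is not reproduced here, so this form, and the helper names `F`, `tangent` and `min_gap`, are assumptions made for illustration only.

```python
import numpy as np

def F(x, p):
    # ASSUMED form of F_{3,p}(x), read off from the exponent (m^2 + p^2 - m p)/d^2.
    # Its second derivative in x equals 2, matching the convexity statement.
    return x**2 + p**2 - x * p

def tangent(x, p, x0):
    # Tangent line to F(., p) at the point x0; the slope is dF/dx = 2*x0 - p.
    slope = 2.0 * x0 - p
    return F(x0, p) + slope * (x - x0)

def min_gap(p, x0, xs):
    # Smallest value of F - tangent on the grid xs.
    # For this quadratic the gap equals (x - x0)^2, so it is never negative.
    return float(np.min(F(xs, p) - tangent(xs, p, x0)))

xs = np.linspace(-3.0, 3.0, 601)
print(min_gap(0.5, 0.25, xs))
```

Under this assumed form the gap between the function and its tangent at x0 is exactly (x − x0)^2, which is why any tangent line gives a valid lower bound and the bound is tight only at the point of tangency.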
First consider the case that C 3L,p (m) > 0 holds. We will proceed to lower bound F 3,p (x) by the tangent line It follows that from which we find It is convenient to demand B p ≤ 0. We thus set so that B p ≤ 0 for all p = 0, . . . , d − 1. To find A p , we need to solve the equation giving Therefore, putting it all together we have for all m, if C 3L,p (m) > 0 holds. The R.H.S. is a convex function in p, thus calculating its stationary point, it can be lower bounded by if C 3L,p (m) > 0 holds. Performing the analogous procedure for the case that C 3R,p (m) < 0 holds, and defining we find Thus summarising the two cases, we see that if the statement "Inequalities (249), (261) are both satisfied" is false. Thus defining ε F by the equation we have that if "Inequalities (249), (261) are both satisfied" is false. Hence, finally, taking into account Eq. (265) and Eqs.

H.5 Bounding Ê 1
Now that we have, from the previous section, an expression for F Q for all measurement outcomes of the 3 clocks, the next task is to bound Ê 1 , defined in Eq. (72) to be In all cases, the optimization is over Q ∈ {−(∆h Co + ∆h L ), . . . , (∆h Co + ∆h L )}.
In order to proceed further, we need to specify k α for this protocol. If there were no phase errors, it turns out that defining it to be equal to any of the three quantities k 1 := −d/2 + 1 + l, k 2 := −d/2 + 1 + l + m, k 3 := −d/2 + 1 + l + p would suffice. Note how doing so would only require information from one clock measurement; the other two clock results would be redundant information. Intuitively, this is because with high probability, all three clocks will have measurement outcomes corresponding to approximately the same elapsed time (if there were no phase errors). For the case of no phase error, it is convenient to prove that the protocol works up to the specified error when k α is chosen to be "any angle 15 in-between the three angles k 1 , k 2 , k 3 " when k 1 , k 2 , k 3 belong to a to-be-specified domain, and "any angle otherwise". Later, in Section H.6, we will show how such a result is sufficient in the case of the unknown phase error. Specifically, we define where γ l,m,p are functions indexed by l, m, p = 0, . . . , d − 1 of the form Due to the modular arithmetic, depending on which sector l, m, p belong to, the domain of γ l,m,p changes. Specifically, for R ∈ [1, d/4] we define the sets (343) Observe that l, (l + m) (mod. d) , (l + p) (mod. d) ∈ Dom(γ l,m,p ) (mod. d) always holds. In particular, the function γ m,p (α) covers up to (but not more than) all angles 0, 1, . . . , d−1 which are between the three angles l, l+m, l+p whenever (m, p) ∈ S 11 (R) ∪ S 31 (R) ∪ S 13 (R) ∪ S 33 (R) and l ∈ I d . See Fig. 2. where |ε T | = max k 1 |ε T +ε T |/2 is upper bounded by the r.h.s. of Eq.  where we have used that for (m, p) ∈ S 11 (R), Dom(γ l,m,p ) = l, . . . , l + max{m, p}.

15 Throughout this proof, we call "angle" any real number which only needs to be specified modulo d. These angles can be converted to radians by multiplying them by 2π/d.
Thus we conclude that for all α l,m,p ∈ Dom(γ l,m,p ) and for all (m, p) ∈ S 12 (R), l ∈ I d ∆F (d; l, m, p) ≤ π 10 3 where we have used that for (m, p) ∈ S 33 (R), CoDom(γ l,m,p (α l,m,p ) − l − d) = min{m, p} − d, . . . , 0 and −R ≤ min{m, p} − d, giving us |γ l,m,p (α l,m,p ) − l − d| ≤ R. Thus we conclude that for all α l,m,p ∈ Dom(γ l,m,p ) and for all (m, p) ∈ S 33 (R), l ∈ I d ∆F (d; l, m, p) ≤ π 10 3 We therefore conclude that the following is true for all x, y ∈ {1, 3}: For all α l,m,p ∈ Dom(γ l,m,p ) and for all (m, p) ∈ S xy (R), l ∈ I d , From the definition of the sets x, y ∈ {1, 3} with m, p ∈ S xy (R) we see that they all have cardinality R 2 . Thus from Eq. (371), it follows that for all α l,m,p ∈ Dom(γ l,m,p ) and for all (m, p) ∈ S 11 (R)∪S 13 Similarly, one finds that Eq.
Hence, combining Eqs. (373) and (380), and plugging into Eq. (347) we find for all α l,m,p ∈ Dom(γ l,m,p ) and for all (m, p) ∈ S tot , l ∈ I d where, to find the higher order terms, we have used in the last equality where c 1 , c 2 are positive constants independent of d C and σ. Let us now sketch how to derive Eq. (408). First one writes out the characteristic function | ψ| U F (t) |ψ |, as a finite sum of terms corresponding to the elements of the eigenbasis {|φ k }. This is where in going from the second to the third line we have used the Poisson summation formula (see for instance Corollary C.0.2 from [21]), and in the third to the fourth we have kept the m = 0 term and bounded the size of the others. With that, we obtain σ 2 ) (415) with C > 0 independent of d C and σ. Thus, to leading order, it gives a probability of decoherence of The best choice for the width of the clock is σ = (ln d C ) 3 2 , which gives a scaling of which (up to the logarithmic factor) is quadratically smaller than p P SW . This is essentially the same advantage as the one obtained in covariant error correction as shown in the main text.
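The step of keeping the m = 0 term after Poisson summation and bounding the remainder can be illustrated on the simplest Gaussian lattice sum. The sketch below is not the characteristic function of the derivation above; it uses the standard identity sum_n exp(−π n^2 / s) = sqrt(s) · sum_m exp(−π s m^2), with the parameter `s` playing the role of a squared width (an assumption made for illustration), and shows that the m ≠ 0 terms are exponentially small in s.

```python
import math

def gaussian_lattice_sum(s, n_max=200):
    # Direct sum over the integer lattice of a Gaussian with squared width s.
    return sum(math.exp(-math.pi * n * n / s) for n in range(-n_max, n_max + 1))

def poisson_dual_sum(s, m_max=200):
    # The same quantity after Poisson summation: a sum over the dual lattice.
    return math.sqrt(s) * sum(
        math.exp(-math.pi * s * m * m) for m in range(-m_max, m_max + 1)
    )

def tail_after_m0(s):
    # What is discarded when only the m = 0 term, sqrt(s), is kept.
    return gaussian_lattice_sum(s) - math.sqrt(s)

for s in (1.0, 4.0, 9.0):
    print(s, tail_after_m0(s), 2.0 * math.sqrt(s) * math.exp(-math.pi * s))
```

The printed tail is dominated by the m = ±1 contribution, 2·sqrt(s)·exp(−π s), so the approximation error of the m = 0 term decays exponentially as the Gaussian is made wider.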

J Evolution without evolution: connection with the Page-Wootters mechanism
In this section we briefly recap the Page-Wootters mechanism (Section III, "Evolution in a Stationary Universe", in [46]) and show how our formalism and the results in this paper fit into that picture. These authors consider a bipartite setup consisting of a clock and a system S of interest. The system and clock are considered as a closed system evolving under a Hamiltonian of the form

$$\hat{H} = \hat{H}_S \otimes 1_{\mathrm{clock}} + 1_S \otimes \hat{H}_{\mathrm{clock}}. \tag{419}$$
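A Hamiltonian of this non-interacting form generates a unitary that factorises into independent evolutions of the system and the clock. The following minimal sketch checks this numerically for small random Hermitian matrices; the helper names `kron_sum` and `evolve`, the dimensions, and the random generators are illustrative assumptions rather than objects defined in this paper.

```python
import numpy as np

def kron_sum(h_s, h_c):
    # H = H_S (x) 1_clock + 1_S (x) H_clock
    return np.kron(h_s, np.eye(h_c.shape[0])) + np.kron(np.eye(h_s.shape[0]), h_c)

def evolve(h, t):
    # exp(-i t H) for Hermitian H, computed via its eigendecomposition.
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 3))
b = rng.normal(size=(4, 4))
h_s, h_c = a + a.T, b + b.T  # random real symmetric (hence Hermitian) generators

lhs = evolve(kron_sum(h_s, h_c), 0.7)
rhs = np.kron(evolve(h_s, 0.7), evolve(h_c, 0.7))
print(np.allclose(lhs, rhs))
```

The factorisation holds because the two terms of the Hamiltonian act on different tensor factors and therefore commute.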
Their idea is to show that there is a global state $\rho_{S\,\mathrm{clock}}$ which does not evolve in time, i.e. $[\rho_{S\,\mathrm{clock}}, \hat{H}] = 0$, yet the dynamics of the system, after conditioning on the clock being in the state $|\psi(\tau)\rangle_{\mathrm{clock}} = e^{-i\tau \hat{H}_{\mathrm{clock}}} |\psi(0)\rangle_{\mathrm{clock}}$ at time $\tau$, is given by the free evolution of the system S according to its own Hamiltonian $\hat{H}_S$. Namely, $\mathrm{tr}_{\mathrm{clock}}[\hat{P}_{\mathrm{clock}}(\tau)\, \rho_{S\,\mathrm{clock}}\, \hat{P}_{\mathrm{clock}}(\tau)] = e^{-i\tau \hat{H}_S}\, \mathrm{tr}_{\mathrm{clock}}[\rho_{S\,\mathrm{clock}}]\, e^{i\tau \hat{H}_S}$, where $\hat{P}_{\mathrm{clock}}(\tau) := 1_S \otimes |\psi(\tau)\rangle\langle\psi(\tau)|_{\mathrm{clock}}$ is the projector onto the clock state at time $\tau$. So, in the above sense, even though the global state $\rho_{S\,\mathrm{clock}}$ is stationary, the state of the system conditioned on the clock being at a particular time evolves according to the Schrödinger equation for its own Hamiltonian $\hat{H}_S$.
Before seeing how this is connected to our work, a few remarks are pertinent. Firstly, time is assumed to be continuous, i.e. if one wishes Eq. (421) to hold for all τ in some finite subinterval of the real line, then one needs to use an Idealised clock (these are discussed in Section 1 in the main text). These, however, are unphysical in the sense of requiring infinite energy. To the best knowledge of the authors, while many aspects of the Page-Wootters mechanism have been explored (see [48][49][50][51] and references therein), it remains an open question as to how well finite dimensional clocks (or infinite dimensional clocks with finite energy) can realize this. It is also noteworthy that in our setup, the clock and system are only classically correlated, and thus our results demonstrate that quantum entanglement between the clock and system S is not necessary to fulfil the Page-Wootters mechanism. We now find the Hamiltonian and time-invariant state which realise the Page-Wootters mechanism up to a quantifiable error incurred due to using physical clocks.
Firstly, we can identify the physical space with the system in the Page-Wootters model and the clocks in our setup with those in the Page-Wootters model. The total Hamiltonian on the system and the clock will be the Kronecker sum of the generators of the physical space Lie group and those of the clock's Lie group, namely (c.f. Eq. (419))Ĥ Secondly, we use our covariant encoding channel E cov (·) in Eq. (8) to realise the stationary state ρ Sclock of the Page-Wootters mechanism ρ Sclock = E cov (ρ L ), for any ρ L ∈ S (H L ) and for the choice of trivial representation on the Logical space, namely U L (t) = 1 L for all t ∈ R. To see how this works, note that 1) the unitary group representation on the physical and clock space has the Hamiltonian in Eq. (422) as its generator, and 2) that the encoding map is covariant. Specifically where in the last line, we have used U L (t) = 1 L for all t ∈ R. Hence differentiating w.r.t. t on both sides of Eq. (424), we achieve the Page-Wootters condition, Thirdly, our clock state must, at least approximately, satisfy Eq. (421). We have already calculated the state of E cov (ρ L ) when the clocks were projected onto the time basis {|θ k θ k |} dC−1 k=0 in Section B. Now, we want to perform a similar calculation but for when the clock is projected onto the state of the clock at time τ . In the case of one Quasi-Ideal clock, this corresponds to a covariant positive-operator valued measure whose elements, up to a vanishing error in the large d C /σ and σ limit, are {P QI (τ ) := U C (τ )|ψ nor (k 0 1 ) ψ nor (k 0 1 )|U C (τ ) † } τ ∈[0,T0] . Proceeding analogously to Section B, but this time not applying the error map E j , we find ρ P (τ ) := tr C P QI (τ )E cov (ρ L )P QI (τ ) =U ⊗K † F Q (τ ) F 0 (τ ) e 2πiτ Q/T0 − 1 E q,q ,n,n (·)|q q |, with E q,q ,n,n given by Eq. (36) and F Q (τ ) given by where |ψ nor (k 0 ) is the Quasi-Ideal clock described previously in the manuscript. Note that Eq. (426) is analogous to Eq. (72) but with a trivial error map E j = I (no clock errors),ρ L = U L (t α )ρ L U † L (t α ) = ρ L (trivial logical group representation), and a time τ rather than the discrete times t α := k α T0 dC (since we are not projecting onto the time basis anymore). One can bound Eqs. (429) and (427) analogously to the calculations in Sections D and C. We will not perform such calculations here for brevity, but one finds that F Q (τ )/F 0 (τ ) ≈ e −2πiτ Q/T0 , where the approximation becomes exact in the large d C limit. So one can see that the quantity in square brackets in Eq. (427) vanishes in said limit. Hence for large d C , from Eq. (426) where the approximate equality becomes exact in the large d C limit. This is an approximate Page-Wootters condition (Eq. (420)).
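The approximate condition can be illustrated with a small numerical toy model. The sketch below does not use the Quasi-Ideal clock or the channel E cov of this paper. Instead it assumes a system and a clock with integer-spaced energy levels (in units of ω, with T0 = 2π), takes the clock to be a uniform superposition of its energy eigenstates, and builds the stationary state by averaging the joint evolution over one period, which is equivalent to removing coherences between different total energies. All names (`page_wootters_error`, `d_clock`, `d_sys`) are hypothetical.

```python
import numpy as np

def page_wootters_error(d_clock, d_sys=3, k=1, seed=0):
    """Return (norm of [rho, H], conditioning error) for a toy clock of dimension d_clock."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=d_sys) + 1j * rng.normal(size=d_sys)
    v /= np.linalg.norm(v)
    rho_s = np.outer(v, v.conj())                    # random pure system state

    psi = np.ones(d_clock) / np.sqrt(d_clock)        # toy clock: uniform superposition
    rho = np.kron(rho_s, np.outer(psi, psi.conj()))

    # Total energy of each product basis state |a>|n> is a + n (integer spectrum).
    e_tot = np.add.outer(np.arange(d_sys), np.arange(d_clock)).ravel()
    # Averaging over one period keeps only matrix elements between equal total energies.
    rho_stat = rho * (e_tot[:, None] == e_tot[None, :])

    h_tot = np.diag(e_tot.astype(float))
    comm = np.linalg.norm(rho_stat @ h_tot - h_tot @ rho_stat)

    # Condition on the clock state at time tau = 2*pi*k/d_clock.
    tau = 2.0 * np.pi * k / d_clock
    clock_tau = np.exp(-1j * np.arange(d_clock) * tau) * psi
    bra = np.kron(np.eye(d_sys), clock_tau.conj()[None, :])
    cond = bra @ rho_stat @ bra.conj().T
    cond /= np.trace(cond).real

    # Compare with the freely evolved system state.
    u_s = np.diag(np.exp(-1j * np.arange(d_sys) * tau))
    err = np.linalg.norm(cond - u_s @ rho_s @ u_s.conj().T)
    return comm, err

for d in (8, 64, 512):
    print(d, page_wootters_error(d))
```

In this toy model the global state commutes with the total Hamiltonian exactly, while the conditioned system state differs from the freely evolved one by coherence factors of the form 1 − |a − b|/d, so the deviation falls off inversely with the clock dimension. The clocks analysed in this paper are different, and no claim is made that this scaling carries over to them.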