Proposal for room-temperature quantum repeaters with nitrogen-vacancy centers and optomechanics

We propose a quantum repeater architecture that can operate under ambient conditions. Our proposal builds on recent progress towards non-cryogenic spin-photon interfaces based on nitrogen-vacancy centers, which have excellent spin coherence times even at room temperature, and on optomechanics, which makes it possible to avoid phonon-related decoherence and allows the emitted photons to be in the telecom band. We apply the photon number decomposition method to quantify the fidelity and the efficiency of entanglement established between two remote electron spins. We describe how the entanglement can be stored in nuclear spins and extended to long distances via quasi-deterministic entanglement swapping operations involving the electron and nuclear spins. We furthermore propose schemes to achieve high-fidelity readout of the spin states at room temperature using the spin-optomechanics interface. Our work shows that long-distance quantum networks made of solid-state components that operate at room temperature are within reach of current technological capabilities.


Jia-Wei Ji: quantum.jiawei.ji@gmail.com
Christoph Simon: christoph.simon@gmail.com

Introduction
The successful implementation of global quantum networks would have many applications, such as secure communication [1], blind quantum computing [2], and private database queries [3], ultimately leading to a "quantum internet" [4][5][6] of networked quantum computers and other quantum devices. This will require photons for establishing long-distance connections, as well as stationary qubits for storing and processing the quantum information. In particular, since quantum information cannot be amplified, quantum repeaters are likely to be required [5,7,8]. Most current approaches to such quantum networks require either vacuum equipment and optical trapping or cryogenic cooling [7,9-16], which adds significantly to the difficulty of scaling up such architectures. There is notable recent work towards quantum networks with room-temperature atomic ensembles [17][18][19][20][21], but it is also of interest to investigate solid-state approaches, which might ultimately be the most advantageous in terms of scalability.
Nitrogen-vacancy (NV) centers have millisecond-long electron spin coherence times even at room temperature [22][23][24][25], making them excellent candidates for stationary qubits in quantum networks [12,13,26]. So far, NV-based room-temperature quantum information processors have been proposed based on the spin-chain model, where the interactions between electron spin qubits are mediated by a nuclear spin chain [27], or based on strongly interacting fluorine nuclear spins [28]. It is intriguing to ask whether photonic links can be implemented for NV centers at room temperature. Unfortunately, the phonon-induced broadening of the optical transition poses a serious challenge to using NV centers for generating spin-photon entanglement at room temperature [29]. An alternative approach that could overcome this problem is quantum optomechanics [30], where the effective spin-photon coupling is mediated by an ultra-low-loss mechanical resonator [31,32] to bypass the direct spin-photon interface. It was shown theoretically that this approach allows the emission of highly indistinguishable photons [33] at room temperature, which suggests that high-fidelity entanglement creation should be possible as well. Further, this interface allows the wavelength of the emitted photons to be chosen freely. Thus, one could have the emission in the telecom band, which is ideal for connecting distant NV centers through optical fibers.
Nuclear spins in diamond have even longer coherence times at room temperature than the electron spins, exceeding a second [22]. Therefore, these nuclear spins can be used as quantum memories to store the entanglement at ambient conditions [34], similar to what is being done at cryogenic temperatures [35]. Electron and nuclear spin qubits can be coupled via hyperfine interactions [22,27,36].
Based on the above line of thought, we here propose a room-temperature quantum repeater architecture based on NV centers and optomechanics. In our proposal the entanglement between two distant NV electron spins is established via photons following the Barrett-Kok scheme [26,37,38]. We apply the photon number decomposition method [39] to quantify and analyze the entanglement generation efficiency and fidelity. Mapping of the electron spin entanglement onto nuclear spins is achieved by performing CNOT gates and electron spin readout through the spin-optomechanics interface. Finally, entanglement swapping is done using the same gate operations assisted by the readout of electron spin and nuclear spin states. The quasi-deterministic gate operations allow us to distribute the entanglement in a nesting-level-free manner, which outperforms conventional nested repeater protocols. This paper is organized as follows. In Sec. 2, we introduce the quantum repeater architecture, including the spin-optomechanics interface, as well as entanglement generation, entanglement storage in nuclear spins, and entanglement swapping. The NV electron spin readout at room temperature is discussed in Sec. 3. Sec. 4 discusses the repeater rate and fidelity. Sec. 5 gives more details on the implementation. We conclude and provide an outlook in Sec. 6.

Quantum repeater architecture
The diagram in Fig. 1(a) illustrates the basic steps and components for building a room-temperature quantum repeater architecture based on spin-optomechanics systems. A typical quantum repeater features two basic ingredients: entanglement generation between two remote memories, and entanglement swapping between two local memories to propagate the entanglement further [4,5]. Our physical system also has these two components, and they can operate at room temperature. One crucial component of our proposal is the spin-optomechanics interface, which was first proposed by R. Ghobadi et al. [33]. Moreover, our proposal features two kinds of qubits: the NV electron spins serve as communication qubits, and the nuclear spins serve as memory qubits for storing the entanglement because they have long coherence times even at room temperature [22,34]. At cryogenic temperature, experimental realizations of such diamond-based nuclear-spin memories have already been demonstrated [34,35].
This section is dedicated to the basic structure and components of our proposed architecture. We start with an introduction to the spin-optomechanics interface [33], and then quantify the efficiency and fidelity of entanglement generation between two remote nodes based on the recently developed photon number decomposition method [39]. Then we discuss entanglement storage and swapping under ambient conditions. The application of the spin-optomechanics interface to electron spin state readout at room temperature, which serves as a crucial ingredient in the proposed architecture, is discussed in the next section.

Spin-optomechanics interface
The schematic of the spin-optomechanics interface is shown in Fig. 1(b). There are three main components in the system: the NV electron spin, the mechanical oscillator (SiN membrane), and the high-finesse optical cavity. The NV electron spin is coupled to the mechanical oscillator via a magnetic tip that is attached to the oscillator; a magnetic field gradient is required to produce the strong spin-mechanics coupling rate λ [33]. The red-detuned control laser is used to induce the optomechanical coupling rate g. The NV electron spin must be tuned to be resonant with the red-detuned control laser so that a single spin excitation is converted into a single photon emitted at the cavity frequency via the mechanical oscillator. However, when the control laser is red-detuned from the cavity, it also starts to cool the mechanical oscillator via the phonon sideband. This converts phonons to single photons at the cavity frequency as well, which causes thermal noise that degrades the quality of the single photon from the NV electron spin. In order to reduce this noise, we detune the control laser far from the phonon sideband $\omega_m$. Since the control laser is detuned far from the phonon sideband, it is ineffective at cooling the mechanical oscillator. Hence, we introduce a different laser on resonance with the mechanical oscillator to efficiently cool it [33].

Figure 1: (a) Room-temperature quantum repeater architecture. Here, we show just four nodes and three links to demonstrate the basic logic of the quantum repeater protocol, which proceeds in four steps.
Step 1 is to generate the entanglement between two remote NV electron spins using the spin-optomechanics interface.
Step 2 is the memory mapping, which stores the entanglement between two electron spins into the entanglement between two nuclear spins.
Step 3 is the same as step 1 for generating the entanglement between two remote NV electron spins.
Step 4 is to perform the entanglement swapping that establishes the entanglement only between the first and the last nuclear spins.
(b) Schematic of the spin-optomechanics interface with membrane-in-the-middle design. The optomechanical system consists of a SiN membrane oscillator placed inside the high-finesse cavity. A magnetic tip is attached to this membrane. An NV center in bulk diamond is placed near the tip, such that the oscillator is coupled to the dressed ground states of the NV center. A single telecom photon is produced via the mechanically mediated interaction between the control laser and the dressed NV center, while the cooling laser is on to keep the membrane oscillator near its ground state.
The triplet NV electron spin states $\{|0\rangle, |{-1}\rangle, |{+1}\rangle\}$ are dressed by a microwave source [33], which forms three dressed spin states $\{|0\rangle, |D\rangle, |B\rangle\}$ that are noise-protected from the nuclear-spin bath [40]. Only the bright state $|B\rangle = (|{+1}\rangle + |{-1}\rangle)/\sqrt{2}$ and the dark state $|D\rangle = (|{+1}\rangle - |{-1}\rangle)/\sqrt{2}$ couple to the mechanical oscillator with the rate λ. The states $|{+1}\rangle$ and $|{-1}\rangle$ are two of the triplet ground states of the NV center. The transition frequency between $|B\rangle$ and $|D\rangle$ is $\omega_q$, which is tuned to be the same as the control laser via controlling the Rabi frequency of the microwave dressing source. The detuning δ between the red-detuned control laser $\omega_q$ and the phonon sideband $\omega_m$ is $\delta = \omega_m - \omega_q$. The level diagram of this spin-optomechanics system is shown in Fig. 2(a). Then, the system Hamiltonian is given by ($\hbar = 1$)
$$\hat{H} = \omega_q(\hat{\sigma}_+\hat{\sigma}_- + \hat{a}^\dagger\hat{a}) + \omega_m(\hat{b}^\dagger\hat{b} + \hat{c}^\dagger\hat{c}) + \hat{H}_I, \qquad (1)$$
where $\hat{\sigma}_- = |D\rangle\langle B|$ is the lowering operator for the dressed NV spin states, $\hat{a}$ and $\hat{c}$ are the control cavity mode and cooling cavity mode, respectively, and $\hat{b}$ is the oscillator mode. $\hat{H}_I$ stands for the interaction term, and it takes the following form: where λ is the spin-mechanics coupling strength, g is the control optomechanical coupling rate, and $g_c$ is the cooling optomechanical coupling rate. Under the condition that $\delta \gg \{\lambda, g\}$, and provided the cooling mode significantly reduces the thermal noise from the mechanical oscillator, keeping it near the ground state [33], it is valid to adiabatically eliminate the δ-detuned mechanical phonon mode to obtain the effective coupling between the dressed spin state and a cavity photon [33,41].
The cooling mode can also be ignored as it cools the mechanical oscillator, converting phonons to photons that are emitted at a different frequency than the desired single photon from the NV spin. The effective coupling rate is λg/δ as indicated by the blue arrow in Fig. 2(b). After adiabatic elimination and the rotating-wave approximation ($\delta \ll \omega_q, \omega_m$), the simplified Hamiltonian is given by Eq. (3) [33,42], where $\Omega = \lambda g/\delta$ is the effective coupling strength between the cavity photon and the NV bright state.

Figure 2: (a) The level diagram illustrates the coupling between the excited dressed NV electron spin state and the mechanical phonon with the rate λ, and the coupling between the mechanical phonon and the cavity photon with rate g. Coupled states are denoted as $|\mathrm{spin}, \mathrm{mechanics}, \mathrm{cavity}\rangle$. A single photon is generated via the indirect coupling between the spin and cavity mode through the oscillator, and is then released by the cavity at the rate κ, leaving the whole system in $|D00\rangle$. The dressed spin state has dephasing rate $\gamma_s^*$, and the mechanical oscillator is dissipatively driven by the environment with the rate $\gamma_m n_{th}$. (b) The schematic of the four-level spin-cavity system after the adiabatic elimination of the oscillator mode. The effective coupling strength between the cavity and the NV spin is λg/δ. This effective spin-cavity system has five effective decoherence rates: the pure spin dephasing rate $\gamma_s^*$, the mechanically-induced thermal decay and excitation rates $\gamma_1$ and $\gamma_2$ for the spin, and the effective decay rate $\kappa_1$ and mechanically-induced thermal excitation rate $\kappa_2$ for the cavity mode.
Although this system is a three-level system containing two coupled ground states of the NV spin $\{|D\rangle, |B\rangle\}$ and the cavity mode, it is convenient to include the uncoupled ground state $|0\rangle$ in the system for the later analysis. From now on, we call this system a four-level system. Then, the corresponding effective master equation is given by Eq. (4) [42], where $\kappa_1 = \kappa + g^2\gamma_m(n_{th} + 1)/\delta^2$ is the effective cavity decay rate with original cavity decay rate κ, $\kappa_2 = g^2 n_{th}\gamma_m/\delta^2$ is the mechanically-induced thermal excitation rate for the cavity photon with the oscillator damping rate $\gamma_m$ and the average phonon number $n_{th}$ determined by the environment temperature, $\gamma_s^*$ is the pure spin dephasing rate, and $\gamma_1 = \lambda^2\gamma_m(n_{th} + 1)/\delta^2$ and $\gamma_2 = \lambda^2 n_{th}\gamma_m/\delta^2$ are the mechanically-induced thermal decay and excitation rates for the NV spin state, respectively. Here $D[\hat{A}]\hat{\rho} = \hat{A}\hat{\rho}\hat{A}^\dagger - \hat{A}^\dagger\hat{A}\hat{\rho}/2 - \hat{\rho}\hat{A}^\dagger\hat{A}/2$. The inherent NV spin flip-flop rate is ignored because it is much smaller than the pure spin dephasing rate $\gamma_s^*$ even at ambient temperature [23].
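As a quick numerical illustration of the effective rates defined above, the following sketch evaluates Ω, $\kappa_2$ and $\gamma_2$ for the parameter values quoted in the caption of Fig. 3. It relies on assumptions that are not stated in the text: the mechanical damping rate is taken as $\gamma_m = \omega_m/Q_m$, the thermal occupation is taken in the high-temperature limit $n_{th} \approx k_B T/(\hbar\omega_m)$ (so that $n_{th}\gamma_m \approx k_B T/(\hbar Q_m)$), and room temperature is set to T = 300 K. The function name effective_rates is purely illustrative.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J / K


def effective_rates(lam, g, delta, q_m, temperature=300.0):
    """Return (Omega, kappa_2, gamma_2) in rad/s for the effective spin-cavity model.

    Assumption: n_th >> 1 and gamma_m = omega_m / Q_m, so that
    n_th * gamma_m ~ k_B T / (hbar Q_m), independent of omega_m.
    """
    omega = lam * g / delta                      # effective coupling, Omega = lambda g / delta
    heating = K_B * temperature / (HBAR * q_m)   # n_th * gamma_m in the high-temperature limit
    kappa_2 = (g / delta) ** 2 * heating         # thermal excitation rate of the cavity
    gamma_2 = (lam / delta) ** 2 * heating       # thermal excitation rate of the spin
    return omega, kappa_2, gamma_2


two_pi = 2 * math.pi
omega, kappa_2, gamma_2 = effective_rates(
    lam=two_pi * 100e3, g=two_pi * 100e3, delta=two_pi * 1e6, q_m=3e9
)
kappa_1 = 2 * omega  # choice kappa_1 = 2 Omega quoted in the caption of Fig. 3

print("Omega / 2pi [Hz]:", omega / two_pi)
print("kappa_2 / kappa_1:", kappa_2 / kappa_1)
print("gamma_2 / kappa_1:", gamma_2 / kappa_1)
```

Under these assumptions the sketch gives Ω/2π = 10 kHz and thermal rates of about $10^{-3}\kappa_1$, in line with the values listed in the caption of Fig. 3.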

Entanglement generation
Step 1 in Fig. 1 is to generate entanglement between two remote NV electron spins at room temperature. This can be achieved using the protocol described in Sec. 2.1. Photons with high indistinguishability, brightness and purity can be produced using this spin-optomechanics interface at room temperature [33]. Each of the two spin-optomechanical interfaces can be modeled as described in the previous section.
If the initial state of the NV center is prepared as $(|B\rangle + |0\rangle)/\sqrt{2}$, a single photon would be released from $|B0\rangle$ at the cavity frequency via the effective coupling between $|B0\rangle$ and $|D1\rangle$. Therefore, a spin-photon entangled state $(|D1\rangle + |00\rangle)/\sqrt{2}$ is created. Then, after interfering the photonic modes from each interface at a beam splitter, detection of a single photon projects the two spins into an entangled state. Here, we propose to use the spin-time-bin protocol (the Barrett-Kok scheme) to generate the entanglement between two distant nodes, which is much more robust against some important errors such as photon loss, detector loss and cavity parameter mismatch compared with the single-photon detection scheme [37,38]. In this protocol, two rounds of single-photon detection are required. After the first round, we flip the spin states $|D\rangle$, $|0\rangle$ of both systems and re-excite $|D\rangle$ to $|B\rangle$. The detection of two consecutive single photons (one in each round) will then project the joint state of the quantum systems onto a Bell state. Depending on which detectors click in these two rounds, we obtain one of two Bell states $|\psi_\pm\rangle = (|D0\rangle \pm |0D\rangle)/\sqrt{2}$ with a 50% total probability.
Due to the existing mechanically-induced cavity emission at room temperature, the initial state of the cavity is not perfectly the vacuum state. A more precise initial state can be obtained by solving for the steady state of the cavity mode with only the optomechanical coupling g turned on. Then, the initial state of the cavity is given by [42] where $\kappa_1 \gg \kappa_2$. The mechanically-induced initial thermal occupation $\kappa_2/(\kappa_1 - \kappa_2)$ is quite small, which is estimated to be around 0.1% using the parameters in Fig. 3. Since this thermal occupation is so small, and it does not affect the quantum system dynamics significantly, we can treat its contributions classically by modelling it as dark counts to simplify the calculations [39]. This dark count rate is given by $D_{th} = \kappa_1\kappa_2/(\kappa_1 - \kappa_2)$. Therefore, we start with the initial state of the system: Under this approximation, the mechanically-induced thermal excitation rate in the cavity mode can be set to 0 in Eq. (4), i.e., $\kappa_2 = 0$. In this way, the total number of quantum states to simulate is reduced. Now, in order to quantify the entanglement fidelity and efficiency, we follow the photon number decomposition method developed in [39] to compute the time dynamics. The basic idea of this method is to decompose the master equation dynamics into evolution conditioned on single photon detection, which can be done by rewriting the master equation of the whole system (in this case two distant spin-optomechanical systems) as follows:
$$\dot{\hat{\rho}} = \mathcal{L}_0\hat{\rho} + \sum_{i=1}^{2}\mathcal{S}_i\hat{\rho},$$
where $\mathcal{L}_0 = \mathcal{L} - \sum_{i=1}^{2}\mathcal{S}_i$ with $\mathcal{L}$ being the Liouville superoperator that contains all the dynamics of this composite system, and $\mathcal{S}_i\hat{\rho} = \hat{d}_i\hat{\rho}\hat{d}_i^\dagger$ is the collapse superoperator of the source field $\hat{d}_i$ at the $i$th single-photon detector [39]. As can be seen, if no photon is detected within a given detection time window $t_f$, then the system evolves only subject to $\mathcal{L}_0$, but if a photon is detected during this time window, then we apply the collapse superoperator to the system.
Moreover, as the final state of the system depends on the detected photon count, we would obtain a set of different states, which we call conditional states.
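To make the size of the thermal contribution concrete, the following sketch evaluates the initial thermal occupation $\kappa_2/(\kappa_1 - \kappa_2)$ and the equivalent dark count rate $D_{th}$ from the two formulas in the text. The input $\kappa_2 = 10^{-3}\kappa_1$ is an assumption: it follows from $\gamma_2 = 10^{-3}\kappa_1$ in the caption of Fig. 3 only because λ = g there, which makes $\kappa_2 = \gamma_2$. The function name thermal_dark_counts is hypothetical.

```python
import math


def thermal_dark_counts(kappa_1, kappa_2):
    """Return (thermal occupation, dark count rate D_th) for the cavity mode.

    occupation = kappa_2 / (kappa_1 - kappa_2)
    D_th       = kappa_1 * kappa_2 / (kappa_1 - kappa_2)
    """
    if not 0 <= kappa_2 < kappa_1:
        raise ValueError("requires 0 <= kappa_2 < kappa_1")
    occupation = kappa_2 / (kappa_1 - kappa_2)
    return occupation, kappa_1 * occupation


kappa_1 = 2 * math.pi * 20e3   # effective cavity decay rate from the caption of Fig. 3
kappa_2 = 1.0e-3 * kappa_1     # assumption: kappa_2 = gamma_2 because lambda = g

occupation, d_th = thermal_dark_counts(kappa_1, kappa_2)
print("thermal occupation:", occupation)
print("D_th [counts/s]:", d_th)
```

With these inputs the occupation is about 0.1% and $D_{th}$ is on the order of 100 counts per second, matching the estimates quoted in the text.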
Figure 3: Entanglement generation fidelity $F_{gen}$ and efficiency $\eta_{gen}/\eta_t^2$ for a single link as a function of protocol time $t_f$. The mechanically-induced initial thermal noise in the cavity is modeled as dynamical dark counts as described in the text, while the detector dark count rate is set to 10 Hz [43]. The detection time window for each time bin $T_d$ is set to be equal to half the total detection time window: $t_f = 2T_d$. Due to the loss in the channel, it is difficult to see the efficiency curve so it is divided by the factor $\eta_t^2 = \exp(-L_0/L_{att})$, where $L_0 = 100$ km is the length of the link, and $L_{att} = 22$ km is the fiber attenuation distance of telecom photons. The peak value of the fidelity curve $F_{gen}$ is around 97%. All parameters are chosen to be the same for both spin-optomechanics systems and similar to those in Ref. [33], where the parameters are optimized for achieving high indistinguishability and single-photon purity: $\lambda = g = 2\pi \times 100$ kHz, $\delta = 2\pi \times 1$ MHz, $Q_m = 3 \times 10^9$, $\kappa_1 = 2\Omega = 2\pi \times 20$ kHz, $\gamma_s = 0.01\kappa_1$ [23], and $\gamma_1 = \gamma_2 = 1.0 \times 10^{-3}\kappa_1$.

In the Barrett-Kok scheme, the relevant conditional states are those with detected photon counts $n = \{n_e, n_l\}$ = {(1, 0), (1, 0)}, {(1, 0), (0, 1)}, {(0, 1), (1, 0)}, {(0, 1), (0, 1)}, where $n_e$ and $n_l$ stand for the photon counts in the early and late detection time windows, and each can take two possible outcomes (1, 0), (0, 1), which correspond to a click in the left detector and the right detector as shown in Fig. 1. Thus, the entanglement generation efficiency and the entanglement generation fidelity can be defined in the following way: where n stands for the detected photon count as mentioned above, and we use $|\psi_+\rangle$ when n = {(1, 0), (1, 0)}, {(0, 1), (0, 1)}; otherwise we use $|\psi_-\rangle$. Further, due to dark counts (both from detectors and the initial thermal occupation as mentioned above), zero or single-photon conditioned states would give spurious photon counts.
This imperfection is also taken into account when estimating the entanglement generation fidelity and efficiency, which is discussed in more detail in [39]. Fig. 3 shows the entanglement generation fidelity $F_{gen}$ and efficiency $\eta_{gen}/\eta_t^2$ curves for the effective spin-cavity system described by Eq. (4) over the total detection time window $t_f$ for a link of 100 km. $T_d$ is the detection time window for each time bin, which is set to be half the total detection time window $t_f$. The loss in the channel degrades the entanglement efficiency in proportion to the square of the transmission rate, i.e., $\eta_t^2 = \exp(-L_0/L_{att})$, which makes the efficiency curve difficult to see, so it is divided by this factor. We assume a dark count rate of 10 Hz, which is predicted to be achievable for photons in the telecom band using up-conversion single photon detectors (USPDs) in the free-running regime [43] (which do not require cryogenic cooling). After taking the loss in the channel into account, this detector dark count rate is comparable to the rate $D_{th} \sim 100$ Hz. This type of detector is also predicted to have low afterpulsing probability [43], making afterpulsing negligible in estimating entanglement fidelity and efficiency. For the detection efficiency, we consider 45% [43], which is later used in the readout fidelity estimates and the repeater rate calculations. Fig. 3 shows that the efficiency degrades gradually after it reaches the maximum due to the thermally induced flip-flop effect between the bright and dark states. Under the influence of the flip-flop effect, both systems continue to emit photons, so that the probability of detecting only two photons vanishes when the detection time $t_f$ goes to infinity. Likewise, the fidelity decreases after it reaches the maximum, and it starts with fairly low values due to the small signal-to-noise ratio in the beginning.
If we choose to terminate the measurement at a proper time, $\kappa_1 t_f \sim 10$, then the fidelity approaches 97% at room temperature.
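The channel-loss factor used to rescale the efficiency curve, and the comparison between the thermal and detector dark counts, can be checked with a few lines. This is only a sketch under assumptions: $\eta_t = \exp(-L_0/(2L_{att}))$ is inferred from the expression for $\eta_t^2$ in the text, the thermal counts reaching a detector are taken to scale with a single factor of $\eta_t$, $D_{th}$ is set to the rounded value of 100 Hz, and the 45% detection efficiency is left out. The function name channel_transmission is illustrative.

```python
import math


def channel_transmission(link_km, l_att_km=22.0):
    """Transmission eta_t such that eta_t**2 = exp(-L0 / L_att)."""
    return math.exp(-link_km / (2.0 * l_att_km))


link_km = 100.0          # link length L0 from the text
d_th = 100.0             # thermal dark count rate at the cavity, rounded value from the text
detector_dark = 10.0     # detector dark count rate from the text [Hz]

eta_t = channel_transmission(link_km)
eta_t_sq = eta_t ** 2
thermal_at_detector = d_th * eta_t  # assumption: one factor of eta_t per emitter arm

print("eta_t^2:", eta_t_sq)
print("thermal counts at detector [Hz]:", thermal_at_detector)
print("detector dark counts [Hz]:", detector_dark)
```

For a 100 km link the factor $\eta_t^2$ is close to 1%, and the attenuated thermal counts come out at roughly 10 Hz, the same order as the assumed detector dark count rate.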
One can obtain approximate analytical expressions for the entanglement fidelity and efficiency by following the methods developed in [44,45]. In the incoherent regime ($2\Omega \leq \kappa + 2\gamma_s^* + 2\Gamma_{th}$), we can model this four-level system as a three-level system with the effective emission rate by adiabatically eliminating the spin-photon coherence [33,42]: where $\Gamma_{th} = \lambda g n_{th}\gamma_m/\delta^2$ is the thermal-induced noise. By applying the photon number decomposition method to this spin-optomechanics system [39], we get the entanglement generation efficiency in the Barrett-Kok scheme: where R is the effective emission rate for each system, and $\eta_t$ is the transmission rate in the channel. This is proportional to the product of the two total emission intensities from the two emitters. However, for the room-temperature case where the cavity starts with a small thermal occupation, a more precise expression of the efficiency is given by taking the dark counts into consideration as discussed in [39]. The entanglement generation fidelity $F_{BK}$ is then given by [39] where $C(t_f)$ takes the following form where $R_{tot} = R + 2\gamma_s$ is the spectral width of the emitted photons for both systems. This fidelity equation is the upper bound for the cryogenic temperature case when there is only optical dephasing. For the room-temperature case, one needs to take into account the mechanically-induced thermal contribution in the cavity and the mechanically-induced spin flip-flop effect, which makes the precise analytical fidelity expression very difficult to obtain.

Entanglement mapping
After the successful entanglement generation, we need to store the entanglement between two remote NV electron spins in nuclear spins by performing memory swapping between an electron spin and a nuclear spin at both ends of the link, as indicated by two yellow arrows in Fig. 1. This operation is achieved by performing a $\mathrm{C}_n\mathrm{NOT}_e$ gate between the electron and nuclear spins plus the measurement of the state of the electron spin.
Assuming that $|\psi_+\rangle$ is obtained in step 1, since the quantum systems are in the dressed basis $\{|B\rangle, |D\rangle, |0\rangle\}$, we need to bring them back to the original basis $\{|{+1}\rangle, |{-1}\rangle, |0\rangle\}$ by turning off the microwave source adiabatically. Then, $|D\rangle$ returns to $|{-1}\rangle$ and $|0\rangle$ remains the same. Here, we denote $\{|{-1}\rangle, |0\rangle\}$ as $\{|\uparrow\rangle_e, |\downarrow\rangle_e\}$ for the electron spin. Then, we prepare the nuclear spin in the superposition of the spin-up and spin-down states by applying a π/2 RF pulse to the nuclear spin that is initially polarized to the spin-down state via the combination of optical, microwave, and RF fields as discussed in [46]. There are several options for nuclear spins in diamond such as $^{14}$N [47] and $^{15}$N [48]. Here, we use $^{13}$C as the nuclear spin in an isotopically purified sample, which has the nuclear spin I = 1/2 [22,23,49]. The state is then given by where $|\Downarrow\rangle_n$ and $|\Uparrow\rangle_n$ correspond to $m_I = -1/2$ and $m_I = +1/2$, respectively. Now, a $\mathrm{C}_n\mathrm{NOT}_e$ gate can be performed between the electron and nuclear spins using the hyperfine interaction between them. Fig. 4 shows the hyperfine structure for performing two-qubit gates between the electron spin and the nuclear spin and one-qubit gates on each of them individually. The electron-nuclear spin Hamiltonian is given by with the zero-field splitting $\Delta_0 = 2.87$ GHz, the electronic spin gyromagnetic ratio $\mu_e = -2.8$ MHz/Gauss, the nuclear spin gyromagnetic ratio $\mu_n = 1.07$ kHz/Gauss, the external magnetic field B applied along the symmetry axis of the NV, and the hyperfine coupling A, which ranges from tens of kHz to 100 MHz for a $^{13}$C nuclear spin [22,50,51]. The $\mathrm{C}_n\mathrm{NOT}_e$ gate can be implemented by a Ramsey sequence on the electron spin at room temperature, where the free precession time is chosen to be t = π/A with a magnetic field of several hundred Gauss [22,46,47]. The efficient realization of the CNOT gate with a fidelity of 99.2% at ambient conditions has been demonstrated using composite pulses and an optimized control method [52] as well as the dynamical decoupling technique [53][54][55]. The dynamical decoupling technique is also important in the entanglement generation, where the electron spin can be decoupled from the nuclear spin bath to have a millisecond-long coherence time at room temperature [24,55]. However, in our entanglement generation step the NV electron spin is in dressed states under a far-detuned microwave source, which itself is already robust against the nuclear-bath-induced noise [40,56].

Figure 4: The NV center with a $^{13}$C can be modeled as a four-level system. Nuclear spin sublevels $|\Uparrow\rangle_n$ and $|\Downarrow\rangle_n$ are addressed by RF radiation with Rabi frequency $\Omega_{RF}$. The electronic spin sublevels are driven via a microwave field $\Omega_{MW}$, but when the electron spin is $|\downarrow\rangle_e$, the microwave field has a relative detuning given by the hyperfine interaction A.
Two $\mathrm{C}_n\mathrm{NOT}_e$ gates on both ends of the link lead to a four-qubit entangled state. A projective measurement of the electron spin state in the Z basis is therefore required to complete the entanglement storage, which projects this four-qubit entangled state onto an entangled state of the nuclear spins. Typically, fluorescence detection can be used to determine the state of the electron spin after the projective measurement at low temperature, around 4 K, with good fidelity [57], which enables the cryogenic-temperature entanglement storage in nuclear spins [35,49]. Unfortunately, at room temperature the intensities of the electronic spin-up and spin-down states differ only by roughly a factor of 2, because the phonon-induced broadening greatly diminishes the resolution of these two Zeeman states [47]. Thus, the past decade has seen a great deal of experimental effort put into solving this problem [46,[58][59][60]. In Sec. 3, we propose two electron spin readout schemes based on the spin-optomechanics system.
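The logic of the mapping step can be checked with an idealized state-vector sketch: two electrons in $|\psi_+\rangle$, two nuclear spins in equal superpositions, a $\mathrm{C}_n\mathrm{NOT}_e$ gate at each node, and a Z-basis readout of both electrons. The sketch assumes perfect gates and readout, and the bit labels (which physical level is called 0 or 1, and which nuclear state acts as the control) are illustrative conventions rather than the ones used in the paper. The function name map_to_nuclear_spins is hypothetical.

```python
import math
from itertools import product


def map_to_nuclear_spins(electron_outcome):
    """Return (probability, nuclear state) after both C_nNOT_e gates and electron readout.

    Basis states are bit tuples (e1, e2, n1, n2); the returned nuclear state is a
    dict {(n1, n2): amplitude}. Ideal gates and measurements are assumed.
    """
    s = 1.0 / math.sqrt(2.0)
    # Electrons in |psi+> = (|10> + |01>)/sqrt(2); each nuclear spin in (|0> + |1>)/sqrt(2).
    state = {
        (e1, e2, n1, n2): s * s * s
        for e1, e2, n1, n2 in product((0, 1), repeat=4)
        if e1 != e2
    }
    # C_nNOT_e at each node: flip the electron bit when the local nuclear bit is 1.
    gated = {(e1 ^ n1, e2 ^ n2, n1, n2): amp for (e1, e2, n1, n2), amp in state.items()}
    # Project onto the measured electron bits and renormalise the nuclear state.
    kept = {
        (n1, n2): amp
        for (e1, e2, n1, n2), amp in gated.items()
        if (e1, e2) == tuple(electron_outcome)
    }
    probability = sum(abs(amp) ** 2 for amp in kept.values())
    norm = math.sqrt(probability)
    return probability, {bits: amp / norm for bits, amp in kept.items()}


for outcome in product((0, 1), repeat=2):
    probability, nuclear_state = map_to_nuclear_spins(outcome)
    print(outcome, probability, nuclear_state)
```

In this idealized picture every electron readout outcome occurs with equal probability and leaves the two nuclear spins in a Bell state; which Bell state is obtained depends on the outcome, which is why the electron readout result has to be recorded.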

Entanglement swapping
After mapping the entanglement to the nuclear spins, the electron spins are free and we can use them again to generate entanglement between the electron spins $i$ and $i + 1$. This is done in step 3 as illustrated in Fig. 1. Then, the entanglement swapping is achieved as follows: a $\mathrm{C}_n\mathrm{NOT}_e$ gate is applied at each endpoint of this link, giving us an entangled state of these six spins. By performing measurements on the electron spins in the Z basis, one obtains an entangled state of four nuclear spins. Depending on the measurement outcomes, one gets different entangled states. Here, we assume that we get the following four-qubit entangled state: In order to complete the entanglement swapping, i.e. to only entangle nuclear spins $i - 1$ and $i + 2$, one still needs to disentangle the two nuclear spins $i$ and $i + 1$ in between. This can be done by measuring them in the X basis but unfortunately, one cannot optically read out the nuclear spin directly. However, it turns out that the nearby electron spins can be used to indirectly read out the nuclear spin state [47,61]. The basic idea is as follows: first, a Hadamard gate is performed on the nuclear spins $i$ and $i + 1$ individually by applying a π/2 RF pulse to make $|\Uparrow\rangle_n \to (|\Downarrow\rangle_n + |\Uparrow\rangle_n)/\sqrt{2}$ and $|\Downarrow\rangle_n \to (|\Downarrow\rangle_n - |\Uparrow\rangle_n)/\sqrt{2}$. Second, the nearby electron spin is initialized to $|\uparrow\rangle_e$, and we again perform a $\mathrm{C}_n\mathrm{NOT}_e$ gate, mapping the nuclear spin state to the electron spin state. Therefore, the readout of the nuclear spin can be achieved by performing measurements in the Z basis on the electron spin, followed by the readout of the measurement outcome, which is discussed in detail in Sec. 3. The post-measurement state is given by where the final state depends on the outcomes of the electron spin readout. Therefore, nuclear spins $i - 1$ and $i + 2$ are entangled as indicated by the long red wavy line in Fig. 1(a).
As we can see, the entanglement swapping process is in fact equivalent to the entanglement mapping process plus the readout of two nuclear spins.

The electron spin readout
Applying previously proposed readout methods to our system is quite challenging since they require extra techniques and apparatus such as using nuclear spin ancillae, spin-to-charge conversion [58] and photoelectrical imaging [60] to achieve a high-fidelity readout of electron spin at room temperature. Hence, we propose to read out the electron spin state at room temperature using the spin-optomechanics interface. In this section, two intensity-based readout schemes are proposed to distinguish the electron spin state at room temperature.

Readout scheme using periodic driving pulses
In the readout scenario, the aim is to distinguish the states $|0\rangle$ and $|D\rangle$. The intuitive idea is to perform a π pulse on the transition between $|B\rangle$ and $|D\rangle$, which will excite the state $|D\rangle$ to $|B\rangle$ while keeping the state $|0\rangle$ unchanged. Then the state $|B\rangle$ will decay back to $|D\rangle$ according to the process described in Fig. 2(a) and will emit a single photon. By measuring a single photon, we can determine whether the spin was initially in the state $|D\rangle$ or $|0\rangle$. However, measuring a single photon may not be the optimal way to distinguish these two spin states due to the photon loss in the channel and the dark counts in detectors. Therefore, we provide two extended readout schemes, the periodic driving scheme and the continuous driving scheme, to achieve high-fidelity readout of NV electron spin states.
In the periodic driving scheme, periodic pulses are used to drive a cycling transition between the states $|B\rangle$ and $|D\rangle$. Assuming a perfect MW π pulse is applied to the state $|D\rangle$, it is excited to the state $|B\rangle$ and then returns to the state $|D\rangle$ with a single photon emitted. Then we repeat this process. In the adiabatic elimination regime, the total Hamiltonian is given by where $\hat{H}_{eff}$ is given by Eq. (3), $g_d$ is the coupling strength for the driving pulse, and $f(t)$ is a periodic delta function with the form $\delta(t - nT_p)$, where the period $T_p$ is the inverse of the decay rate R. The simulation results are shown in Fig. 5(a). The solid red and dot-dashed purple curves are the cavity photon population and the NV spin population, respectively, when the NV spin is initially in the state $|D\rangle$, and the dashed red and purple lines are the cases where the initial NV spin state is $|0\rangle$. We can define the brightness (intensity) $\beta_i$ as the average number of emitted photons, with i = D or 0 representing the initial NV spin states $|D\rangle$ and $|0\rangle$, respectively, where $\langle\hat{a}^\dagger(t)\hat{a}(t)\rangle_i$ is the corresponding average cavity photon number. A single photon is emitted within a period shown as the gray shade in Fig. 5(a).
To estimate the readout fidelity, we consider the measurement repeated $N$ times, with each repetition independent of the others. The number of photons detected within the total measurement time $N T_p$ then follows a binomial distribution, and the probability of detecting $n$ photons is $P_{N,n,p} = \binom{N}{n} p^n (1 - p)^{N-n}$, where $p_i = \eta\beta_i$ is the probability of detecting a single photon within the detection time window, and $\eta$ is the total efficiency with which an emitted photon can be detected. One can plot $P_{N,n,p}$ for $\beta_D$ and $\beta_0$ and find the intersection point [42]. The intersection point is the threshold that decides the measurement result: if the number of detected photons exceeds the threshold, the photons most likely come from the emitter and the NV spin state is therefore assigned to $|D\rangle$; if the number of detected photons is below the threshold, the NV state is assigned to $|0\rangle$ because these photons most likely originate from thermal noise. A detailed discussion is given in the supplementary material [42].
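The threshold rule for the binomial model can be sketched numerically. The following Python snippet is a minimal illustration, not the calculation used for the figures: the detection efficiency `ETA = 0.3`, the equal priors on the two spin states, and the convention that counts at or above the threshold are assigned to $|D\rangle$ are all assumptions made here, and the function names are hypothetical. The brightness values are the ones quoted for the periodic driving scheme.

```python
from math import comb


def binomial_pmf(N: int, n: int, p: float) -> float:
    """Probability of detecting n photons in N independent driving periods."""
    return comb(N, n) * p**n * (1.0 - p) ** (N - n)


def binomial_threshold_fidelity(N: int, p_bright: float, p_dark: float):
    """Return (threshold, fidelity) assuming equal priors on |D> and |0>.

    The threshold is the smallest photon number at which the |D>
    distribution is at least as large as the |0> distribution.
    """
    if not 0.0 < p_dark < p_bright < 1.0:
        raise ValueError("require 0 < p_dark < p_bright < 1")
    bright = [binomial_pmf(N, n, p_bright) for n in range(N + 1)]
    dark = [binomial_pmf(N, n, p_dark) for n in range(N + 1)]
    n_t = next(n for n in range(N + 1) if bright[n] >= dark[n])
    p_plus_D = sum(bright[n_t:])   # |D> assigned correctly: counts >= n_t
    p_minus_0 = sum(dark[:n_t])    # |0> assigned correctly: counts <  n_t
    return n_t, 0.5 * (p_plus_D + p_minus_0)


ETA = 0.3                      # assumed detection efficiency (illustrative)
BETA_D, BETA_0 = 0.929, 0.034  # brightness values quoted in the text
n_t, fidelity = binomial_threshold_fidelity(100, ETA * BETA_D, ETA * BETA_0)
print(f"threshold = {n_t}, infidelity = {1.0 - fidelity:.2e}")
```

Because the likelihood ratio of two binomial distributions with the same $N$ is monotonic in $n$, the two curves cross only once, so a single integer threshold is sufficient.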

Readout scheme using continuous driving pulses
The continuous driving scheme employs a continuous-wave (CW) laser to drive the bright and the dark spin states. Similarly, the Hamiltonian in this case is given by Eq. (17). Under this Hamiltonian, the cavity mode eventually reaches a non-zero equilibrium state, as shown in Fig. 5(b). To calculate the readout fidelity, we assume that the detection is a Poisson process, where the probability of detecting $n$ photons is $P(n, \lambda) = \lambda^n e^{-\lambda}/n!$. Here $\lambda$ is the average photon count within the total detection time $T_0$, given by $\lambda_i = \eta\kappa \int_{t_0}^{t_0+T_0} dt\, \langle \hat{a}^\dagger(t)\hat{a}(t) \rangle_i$ with $i = D$ or $0$ corresponding to the initial states $|D\rangle$ or $|0\rangle$, respectively. As in the periodic driving scheme, the intersection point of the two probability distributions gives the threshold; a detailed discussion can be found in the supplementary material [42].
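For the Poisson model the crossing point has a closed form, which the following Python sketch uses. The mean dark count `LAM_0 = 0.7` is an arbitrary illustrative value, and the equal priors and the "counts at or above threshold mean $|D\rangle$" convention are assumptions made here; only the ratio of the two mean counts is taken from the steady-state cavity populations quoted for the continuous driving scheme (0.202 and 0.014).

```python
from math import ceil, exp, factorial, log


def poisson_pmf(n: int, lam: float) -> float:
    """Probability of detecting n photons for mean count lam."""
    return lam**n * exp(-lam) / factorial(n)


def poisson_threshold_fidelity(lam_bright: float, lam_dark: float):
    """Return (threshold, fidelity) assuming equal priors on |D> and |0>."""
    if not 0.0 < lam_dark < lam_bright:
        raise ValueError("require 0 < lam_dark < lam_bright")
    # The bright pmf is >= the dark pmf exactly when
    # n >= (lam_bright - lam_dark) / ln(lam_bright / lam_dark).
    n_t = ceil((lam_bright - lam_dark) / log(lam_bright / lam_dark))
    p_plus_D = 1.0 - sum(poisson_pmf(n, lam_bright) for n in range(n_t))
    p_minus_0 = sum(poisson_pmf(n, lam_dark) for n in range(n_t))
    return n_t, 0.5 * (p_plus_D + p_minus_0)


LAM_0 = 0.7                      # assumed mean count for |0> (illustrative)
LAM_D = LAM_0 * 0.202 / 0.014    # same readout time, ratio of cavity populations
n_t, fidelity = poisson_threshold_fidelity(LAM_D, LAM_0)
print(f"threshold = {n_t}, infidelity = {1.0 - fidelity:.2e}")
```

Since both mean counts grow linearly with the readout time, longer readout moves the threshold to larger integers, which is one way to see why the infidelity curve has kinks where the threshold changes.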
Instead of showing the readout fidelity, we show the readout infidelity (1 − F) of these two schemes in Fig. 6, which demonstrates more clearly how well our readout schemes work. The dark count rate of the detectors is taken to be 10 Hz [43], which is negligible because the average number of dark counts within a ms time period is on the order of $10^{-3}$, much smaller than the average number of emitted photons during the whole readout process. Also, the afterpulsing probability can be suppressed to below 1% [43], which makes it negligible as well. Comparing the two schemes, the continuous driving scheme requires more time to reach the same infidelity because its signal-to-noise ratio is lower than that of the periodic driving scheme in the present parameter regime. To achieve high-fidelity readout (> 99%), the readout time is typically on the ms timescale for both of our schemes, even with detectors of rather low efficiency. A high-fidelity readout can be achieved on a shorter timescale with higher-efficiency detectors, which are, however, challenging to realize for telecom-wavelength photons [62,64] at non-cryogenic temperatures.

Figure 6: The relation between the readout infidelity (1 − F) and the total readout time with the parameters used in Fig. 5. For the periodic driving scheme (plotted as purple squares), $\beta_D = 0.929$, $\beta_0 = 0.034$, and the driving period (plotted as red triangles) is T = 0.02 ms; for the continuous driving scheme, $\langle \hat{a}^\dagger(t)\hat{a}(t) \rangle_D = 0.202$ and $\langle \hat{a}^\dagger(t)\hat{a}(t) \rangle_0 = 0.014$. The solid, dashed, and dash-dotted lines correspond to the total detection efficiency $\eta$ = 0.05, 0.1, and 0.5, respectively [43,62,63]. The time axis is the total readout time $N T_p$, where $N$ is the total pulse number in the periodic driving scheme. The discontinuity of the first derivative shown on the curves is due to the change of the threshold (because the threshold is always an integer).
In comparison to other proposed methods [46, 58-60, 65, 66], which also demonstrate high-fidelity readout of the electron spin in NV centers on the ms timescale, these two readout schemes predict comparable performance without requiring extra elements in our setup. Thus, in our proposal for building a room-temperature quantum network, these readout schemes based on the spin-optomechanics system are more natural candidates than other room-temperature readout methods.

Entanglement generation rates and overall fidelities
We use a "two-round" repeater protocol. During the first round, the entanglement is generated between electron spins in every other elementary link and then is mapped to corresponding nuclear spins, which also sets those electron spins free. For the remaining links, the entanglement is generated in the second round, followed by the entanglement swapping that distributes entanglement between the first and last nuclear spins. Although entanglement generation between the electron spins is probabilistic, the failure of such an attempt does not disturb the entanglement stored in the nuclear spins if the dynamical decoupling is being applied during the entanglement generation [54,55,67,68]. This means that the second round of the entanglement generation process can be repeated many times until success while not affecting the stored entanglement. However, this is true only when the decoherence of nuclear spins is negligible, which is discussed in more detail below. Hence, our two-round repeater protocol makes the widely-used nested repeater structure no longer necessary [7,10,11].
Considering an even number of links $m$, each of length $L_0$, the total entanglement distribution time is determined by the following quantities: $f(m/2)$ is the factor giving the average number of attempts required to successfully establish entanglement in all $m/2$ links, $p_0$ is the entanglement generation probability, $L$ is the total distance, $c = 2 \times 10^8$ m s$^{-1}$ is the speed of light in optical fiber, and $T_{mp}$ and $T_{sw}$ are the total entanglement mapping time and the total entanglement swapping time, respectively. Both of these times are made up of the CNOT gate time plus the measurement time, as discussed in Sec. 2.3 and Sec. 2.4. The numerical results in the supplementary material [42] show that $f(x) = 0.64 \log_2(x) + 0.83$ is a good approximation, and one approximately recovers the well-known 3/2 factor by setting $x = 2$. In contrast to the nested repeater approach [11], where the average entanglement distribution time depends linearly on the number of links, we here have a logarithmic dependence. Intuitively, the scaling improvement of the two-round protocol comes from the fact that there is no hierarchy of entanglement swapping, in which a higher-level swapping can only start once the lower level has succeeded. Therefore, the main remaining task is to generate entanglement successfully in all of these links in parallel, which is calculated to have a logarithmic dependence on $m/2$. This scheme could significantly enhance the entanglement distribution rate for quantum networks with many more links, e.g., for networked quantum computing [69]. Fig. 7(a) shows the repeater rates as a function of distance for four different numbers of links and for direct transmission. With 45% detection efficiency, our protocol yields 10 Hz with 8 links at 800 km.
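As a quick numerical illustration of the logarithmic scaling, the following Python sketch evaluates the fitted factor $f(x) = 0.64\log_2(x) + 0.83$ for the link numbers considered here. The function name `attempt_factor` is introduced only for this illustration.

```python
from math import log2


def attempt_factor(x: float) -> float:
    """Fitted factor f(x) = 0.64*log2(x) + 0.83 for x links generated in parallel."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return 0.64 * log2(x) + 0.83


# In the two-round protocol, each round generates entanglement in m/2 links.
for m in (4, 6, 8, 10):
    print(f"m = {m:2d} links: f(m/2) = {attempt_factor(m / 2):.2f}")
```

Doubling the number of links adds a constant 0.64 to the factor instead of doubling it, which is the origin of the logarithmic dependence of the distribution time on the number of links.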
This rate is comparable to cryogenic schemes, such as the rare-earth ion-based scheme [11] and the microwave cat qubit-based scheme [10], and it outperforms the well-known DLCZ protocol for laser-cooling based systems [7], which gives a rate of less than 1 Hz at 800 km. However, if the detection efficiency is significantly lower, e.g. 10% [63], multiplexing with about 15 multiplexed channels would be needed to achieve similar rates. The whole repeater protocol consists of three parts described in Sec. 2. However, instead of considering the fidelity of each part separately, here we consider an overall fidelity built from three quantities: $F_{\rm gen}$ is the fidelity of entanglement generation given in Fig. 3, which needs to be established over $m$ elementary links; $F_{\rm mp}$ is the fidelity of an entanglement mapping operation as described in Sec. 2.3; and $F_{\rm nro}$ is the readout fidelity of the nuclear spin. The overall fidelity expression is only valid in the high-fidelity regime. The fidelity of entanglement swapping includes the fidelity of entanglement mapping plus the readout of two nuclear spins. Therefore, in total we need to generate entanglement for $m$ links and perform $m$ entanglement mapping operations to obtain a chain of nuclear spins, followed by the readout of $m − 1$ nuclear spins to achieve the final entangled state between the first and the last nuclear spins. The nuclear spin readout can be achieved by mapping its state to the electron spin and applying the readout methods discussed in Sec. 3.

Figure 7: (a) … [43]. (b) Fidelity plots with respect to the total distance with a detection efficiency of 45%. The CNOT gate fidelity is taken to be 99.2% [52]. The electron spin readout fidelity is taken to be 99.9% based on Fig. 6. At 800 km, the overall fidelity for four links drops below 60%, which is due to detector dark counts.

Fig. 7(b) shows the overall fidelities with respect to the total distance for this quantum network with a detection efficiency of 45%.
At 800 km, the overall fidelities are still fairly high, except for the case of 4 links where the overall fidelity drops below 60% due to the comparatively large effect of detector dark counts when the transmission loss for the comparatively long elementary links is taken into account.
For an eight-link repeater with 45% detection efficiency, the rate is far above 10 Hz at the crossover point (around 450 km), as shown in Fig. 7(a). The corresponding time scale is well within the coherence time of the nuclear spins, which can be longer than a second [22], so decoherence is negligible in this case. This is also true for the four-link, six-link, and ten-link cases. Thus, Fig. 7(b) is a valid approximation of the overall fidelities in this regime. For repeaters with much lower detection efficiencies, e.g. 10%, the rates are significantly lower, so the decoherence of the nuclear spins would seriously degrade the final fidelities. In this case, we can use multiplexing to enhance the rates (about 15 multiplexed channels are needed), which makes the decoherence of the nuclear spins negligible.
In addition, our eight-link repeater yields the final fidelity of around 74% at the cross-over point (around 450 km) with 45% detection efficiency, and the six-link repeater yields around 80% final fidelity at the cross-over point (around 470 km) with 45% detection efficiency. These fidelities are comparable to the DLCZ protocol for laser-cooling based systems with 75% for eight links [7], and cryogenic schemes such as the rare-earth ion-based scheme with around 80% for eight links and the microwave cat qubit-based approach with around 60% for eight links [10]. The overall entanglement fidelity could be further improved using entanglement purification protocols [49,70,71], which would make this quantum network architecture fault-tolerant.

Implementation
The spin-optomechanics setup proposed in Ref. [33] is mainly composed of a high-Q cavity patterned with a SiN membrane of ultrahigh Qf (quality × frequency) product, to which a small magnetic tip is attached. This hybrid device allows a single NV electron spin to be effectively coupled to photons inside the cavity, emitting a single photon with high purity and indistinguishability at room temperature. However, because the SiN membrane serves as a part of the optical cavity in this design, the cavity finesse is limited to the order of $10^4$. The other key requirement for this system to work well is a low decay rate, $\kappa \sim 10^4$ Hz, in the optical cavity. These two key factors constrain the length of the cavity to be around 0.6 m [33]. Here, we propose a new design for this spin-optomechanics interface that uses the membrane-in-the-middle geometry to greatly reduce the cavity length. With this membrane-in-the-middle design, one could significantly reduce the cavity length by using a high-finesse cavity, since the finesse scales as $F = \pi c/(L\kappa)$, where $\kappa$ is the cavity damping rate. As previously estimated, the cavity length is around $L = 60$ cm with finesse $F = 12000$. With the new design it might be possible to reduce this to around $L = 0.6$ cm if a finesse of order $10^6$ can be achieved; see, e.g., Ref. [72].
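The length estimate follows directly from the finesse relation $F = \pi c/(L\kappa)$ at fixed $\kappa$. The short Python sketch below works through the arithmetic; the specific target finesse of $1.2 \times 10^6$ is an assumed value chosen here to represent "of order $10^6$", the vacuum speed of light is used, and $\kappa$ is treated as an angular rate.

```python
from math import pi

C_VACUUM = 2.998e8  # speed of light in vacuum, m/s


def cavity_decay_rate(length_m: float, finesse: float) -> float:
    """Cavity damping rate kappa (rad/s) from F = pi*c/(L*kappa)."""
    return pi * C_VACUUM / (length_m * finesse)


def cavity_length(finesse: float, kappa: float) -> float:
    """Cavity length (m) that gives damping rate kappa at the given finesse."""
    return pi * C_VACUUM / (finesse * kappa)


kappa = cavity_decay_rate(0.6, 12000)          # original design: L = 60 cm
new_length = cavity_length(1.2e6, kappa)       # assumed finesse of 1.2e6
print(f"kappa/2pi = {kappa / (2 * pi):.3g} Hz")
print(f"new cavity length = {new_length * 100:.2f} cm")
```

At fixed $\kappa$ the product $L F$ is constant, so raising the finesse by a factor of 100 shortens the cavity by the same factor.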
The spin-optomechanics interface shown in Fig. 1(b) illustrates our envisioned spin-optomechanical transducer. A SiN membrane is placed between the node and the anti-node of the cavity modes (of both the cooling mode and the control mode) such that the optomechanical coupling is still linear rather than quadratic, as it is in many other membrane-in-the-middle experiments [73][74][75]. The membrane-in-the-middle design allows us to use a membrane with a thickness much smaller than the light wavelength, which reduces potential optical losses such as absorption and scattering because of the significantly smaller overlap between the membrane and the optical field [73]. Similar to the previous proposal, a red-detuned control laser is used to drive the cavity for single photon extraction, with its detuning set equal to the transition energy between the dressed spin states, $\omega_q$. A second red-detuned laser, with detuning equal to the phonon sideband $\omega_m$, is used to cool the oscillator from room temperature, which is also possible to achieve in this proposed device [32,76].
Moreover, the spin-mechanics coupling is achieved by a magnetic tip that is attached to the bottom of the SiN membrane, with an NV center in bulk diamond placed nearby, as shown in Fig. 1(a). The required strong spin-mechanics coupling ($\lambda \sim 10^5$ Hz) can be realized by a magnetic field gradient of $10^7$ T/m with a SiN membrane of ∼pg effective mass [33]. This SiN membrane also needs to have an ultra-low damping rate $\gamma_m$, which is discussed in [31,33]. As the magnetic tip is attached to the SiN membrane, the quality factor of the membrane may be degraded. This could be compensated by further improving the initial quality factor of the membrane without the tip, which is feasible because the limit of the quality factor has not yet been reached. With the combination of the methods in [31] and [32], one can get quality factors as high as $10^{10}$, which gives some room to improve on our current Q factor of $\sim 10^9$.

Conclusions and outlook
We presented a room-temperature quantum network architecture based on NV centers in diamond and a spin-optomechanical interface. We showed that high-fidelity entanglement between electron spins can be generated between two distant nodes under realistic conditions. Nuclear spins associated with the NV centers can be utilized as quantum memories. We showed that the spin-optomechanical interface also offers the possibility to read out electron spins at room temperature with high fidelity on ms timescales. Furthermore, we proposed an entanglement distribution protocol where the average distribution time shows logarithmic scaling with the number of links, as opposed to linear scaling in conventional nested protocols. A membrane-in-the-middle design may allow the dimensions of the spin-optomechanics interface to be reduced to the sub-cm range, thus improving its potential for integration and scalability.
We have here focused on room-temperature quantum repeaters as a medium-term goal, but the proposed approach also holds promise for the implementation of distributed quantum computing [69,77], extending photonic approaches to quantum information processing in diamond [78,79] beyond cryogenic temperatures. Nuclear spins in diamond offer the possibility to implement quantum error correction codes [49, 80-82], which, when integrated into our present approach, may enable fault-tolerant quantum communication and quantum computation under ambient conditions.

S2: Adiabatic elimination
When $\delta \gg \lambda, g$, one can adiabatically eliminate the oscillator either by following the method of Ref. [84] to obtain the Heisenberg-Langevin equations for the cavity mode $\hat{a}$ and the NV spin $\hat{\sigma}_-$ after the elimination of $\hat{b}$, or by setting $\dot{\hat{b}} = 0$ and obtaining $\hat{b}$ in terms of $\hat{a}$ and $\hat{\sigma}_-$. Here, we follow the second way. Under the conditions $\delta \gg \gamma_m/2$ and $\gamma_m \ll 1$, which hold in this system, the resulting expression for $\hat{b}$ can be well approximated by ignoring decay-related terms and keeping only the coherent parts, which gives Eq. (27). Substituting this into the Hamiltonian (Eq. (22)), we obtain the effective Hamiltonian after the adiabatic elimination, Eq. (28), where $\Omega = \lambda g/\delta$ is the effective interaction between the cavity mode and the NV electron spin. In order to get the effective master equation, we also need to compute the decoherence terms related to the oscillator mode $\hat{b}$. Using Eq. (27), the thermal relaxation Lindbladian $(n_{th} + 1)\gamma_m D[\hat{b}]\rho$ can be rewritten in terms of $\hat{a}$ and $\hat{\sigma}_-$. In the resulting expression, the off-diagonal terms correspond to the incoherent interaction between the cavity mode and the spin and to the thermally induced cross-decoherence between these two modes, which can be ignored if $\delta \gg n_{th}\gamma_m$. This is satisfied in our system even at ambient conditions. The same is true for the thermal excitation Lindbladian $n_{th}\gamma_m D[\hat{b}^\dagger]\rho$. Therefore, the effective master equation is given by Eq. (31), where $\kappa_1 = \kappa + g^2\gamma_m(n_{th} + 1)/\delta^2$ is the effective cavity decay rate, $\kappa_2 = g^2 n_{th}\gamma_m/\delta^2$ is the mechanically induced thermal excitation rate for the cavity mode, and $\gamma_1 = \lambda^2\gamma_m(n_{th} + 1)/\delta^2$ and $\gamma_2 = \lambda^2 n_{th}\gamma_m/\delta^2$ are the mechanically induced thermal flip-flop rates for the spin.
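The effective rates defined above are simple algebraic functions of the bare parameters, which the following Python sketch evaluates. All numerical values are illustrative assumptions chosen only to satisfy $\delta \gg \lambda, g$ and $\delta \gg n_{th}\gamma_m$; they are not the parameter set used for the simulations, and the function name is hypothetical.

```python
def effective_rates(kappa, g, lam, delta, gamma_m, n_th):
    """Effective coupling and rates after adiabatic elimination of the oscillator."""
    if delta <= 10 * max(lam, g) * 0.999:
        raise ValueError("adiabatic elimination assumes delta >> lam, g")
    scale = gamma_m / delta**2
    return {
        "Omega": lam * g / delta,                        # effective spin-cavity coupling
        "kappa_1": kappa + g**2 * scale * (n_th + 1),    # effective cavity decay rate
        "kappa_2": g**2 * scale * n_th,                  # thermal excitation of the cavity
        "gamma_1": lam**2 * scale * (n_th + 1),          # thermal spin flip-flop rate
        "gamma_2": lam**2 * scale * n_th,                # thermal spin flip-flop rate
    }


# Illustrative values only (all in Hz); lam = g as in the effective-emission analysis.
rates = effective_rates(kappa=1e4, g=1e5, lam=1e5, delta=1e6,
                        gamma_m=1e-3, n_th=6e6)
for name, value in rates.items():
    print(f"{name:8s} = {value:.4g}")
```

With these assumed numbers the mechanically induced rates are about two orders of magnitude below $\kappa$, so the thermal contribution acts as a small correction to the cavity decay.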

S3: Effective emission rate
Under the condition $\lambda = g$, the effective Hamiltonian shown in Eq. (28) can be rewritten in the rotating frame of the spin frequency $\lambda^2/\delta$. Together with the effective master equation shown in Eq. (31), we obtain a set of optical Bloch equations for the cavity photon population, the NV spin population, and the coherence between them. Since we are mainly interested in the single-photon regime, the term $\langle \hat{a}^\dagger\hat{a}\hat{\sigma}_z \rangle$ can be simplified to $-\langle \hat{a}^\dagger\hat{a} \rangle$, which simplifies these optical Bloch equations. In the incoherent regime, the cross terms that are responsible for the Rabi oscillation, i.e., $\langle \hat{a}^\dagger\hat{\sigma}_- \rangle$ and $\langle \hat{a}\hat{\sigma}_+ \rangle$, can be eliminated [85], resulting in rate equations in which $R$ is the effective decay rate describing the population transfer between the cavity photon and the NV spin. Moreover, given that $n_{th} \gg 1$ at room temperature, the effective decay rate $R$ can be written in a more compact form in terms of $\Gamma_{th} = \lambda^2 n_{th}\gamma_m/\delta^2 = \lambda g n_{th}\gamma_m/\delta^2$, the thermal noise for the NV electron spin.

S4: Initial state of the cavity
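The initial state can be obtained by solving for the steady state of the cavity mode with only the optomechanical coupling $g$ turned on. Thus, we set $\Omega = 0$ and obtain the steady-state condition for the cavity mode, in which the decay at rate $\kappa_1$ balances the thermal excitation at rate $\kappa_2$. Solving this condition, we get the average occupation number of the cavity mode: $\bar{n}_c = \langle \hat{a}^\dagger\hat{a} \rangle = \kappa_2/(\kappa_1 - \kappa_2)$. As this occupation is very small, $\bar{n}_c \approx 10^{-3}$, it is valid to truncate the Hilbert space at $|1\rangle$. Hence, the initial state of the cavity is specified by $\bar{n}_c$ within the subspace spanned by $|0\rangle$ and $|1\rangle$.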
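The steady-state occupation can be checked with a few lines of Python. The sketch assumes the standard rate equation for a mode with loss rate $\kappa_1$ and thermal excitation rate $\kappa_2$, $d\langle \hat{a}^\dagger\hat{a} \rangle/dt = -\kappa_1\langle \hat{a}^\dagger\hat{a} \rangle + \kappa_2(\langle \hat{a}^\dagger\hat{a} \rangle + 1)$, and further assumes that the truncated initial state is diagonal with populations $1 - \bar{n}_c$ and $\bar{n}_c$ in $|0\rangle$ and $|1\rangle$. The rate values are illustrative, chosen to give $\bar{n}_c \approx 10^{-3}$.

```python
def occupation_rate(n: float, kappa_1: float, kappa_2: float) -> float:
    """Assumed rate equation: d<n>/dt = -kappa_1*n + kappa_2*(n + 1)."""
    return -kappa_1 * n + kappa_2 * (n + 1.0)


def steady_state_occupation(kappa_1: float, kappa_2: float) -> float:
    """Steady-state cavity occupation n_c = kappa_2 / (kappa_1 - kappa_2)."""
    if kappa_1 <= kappa_2:
        raise ValueError("no steady state unless kappa_1 > kappa_2")
    return kappa_2 / (kappa_1 - kappa_2)


KAPPA_1, KAPPA_2 = 1.0e4, 10.0   # illustrative rates in Hz (assumptions)
n_c = steady_state_occupation(KAPPA_1, KAPPA_2)

# Assumed diagonal initial state truncated at one photon.
populations = {"|0>": 1.0 - n_c, "|1>": n_c}
print(f"n_c = {n_c:.3e}, populations = {populations}")
```

The truncation is only reasonable because $\bar{n}_c \ll 1$; the probability of two or more photons, which scales as $\bar{n}_c^2$, is neglected.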

S5: Photon counting statistics
Our goal is to distinguish the spin states $|D\rangle$ and $|0\rangle$. Let us denote the conditional probabilities of measurement outcome ± given that the initial state of the system is $|i\rangle$, with $i \in \{D, 0\}$, as $P(\pm|i) = p^{\pm}_i$. The total probability of outcome ± is then given by $p^{\pm} = p_D p^{\pm}_D + p_0 p^{\pm}_0$, where $p_i$ is the total probability of the system being in state $i$. The conditional fidelity is then defined as the conditional probability $P(D|+)$ ($P(0|-)$) of having state $D$ ($0$) given outcome + (−). This is given by Bayes' theorem: $F^{+} \equiv P(D|+) = p_D p^{+}_D/p^{+}$ and $F^{-} \equiv P(0|-) = p_0 p^{-}_0/p^{-}$. We can then define the total fidelity as the weighted average $F = (p^{+}F^{+} + p^{-}F^{-})/p_\eta$, where $p_\eta = p^{+} + p^{-}$ is the total probability of having a measurement outcome. In the case that $p_D = p_0 = 1/2$ and $p_\eta = 1$, the fidelity reduces to the average of the conditional probabilities, $F = (p^{+}_D + p^{-}_0)/2$. The most widely used approach for spin readout is to use a cycling transition, which involves the emission and detection of a large number of photons. The photon-counting histogram shows the probability distribution of the number of photons detected and has two traces: one for photons emitted from the emitter and the other for the thermal noise contribution (non-zero cavity photon number when the spin is in the state $|0\rangle$). The cross-over point of the two traces corresponds to the photon number threshold, above which we can be confident that the photons come from the emitter, thus determining that the spin state is $|D\rangle$; otherwise, the spin state is $|0\rangle$, meaning that the photons most likely come from the thermal noise.
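The fidelity definitions above can be written out step by step. The Python sketch below assumes that every run yields an outcome ($p_\eta = 1$), so that $p^{-}_D = 1 - p^{+}_D$ and $p^{+}_0 = 1 - p^{-}_0$; the function name and the example probabilities are illustrative.

```python
def readout_fidelity(p_plus_D: float, p_minus_0: float,
                     p_D: float = 0.5, p_0: float = 0.5) -> float:
    """Total fidelity F = (p+ F+ + p- F-) / (p+ + p-) via Bayes' theorem.

    Assumes every run gives an outcome, so p-_D = 1 - p+_D and p+_0 = 1 - p-_0.
    """
    p_minus_D = 1.0 - p_plus_D
    p_plus_0 = 1.0 - p_minus_0
    p_plus = p_D * p_plus_D + p_0 * p_plus_0       # total probability of outcome +
    p_minus = p_D * p_minus_D + p_0 * p_minus_0    # total probability of outcome -
    if p_plus == 0.0 or p_minus == 0.0:
        raise ValueError("both outcomes must occur with non-zero probability")
    F_plus = p_D * p_plus_D / p_plus               # P(D | +)
    F_minus = p_0 * p_minus_0 / p_minus            # P(0 | -)
    return (p_plus * F_plus + p_minus * F_minus) / (p_plus + p_minus)


# With equal priors the result is the average of p+_D and p-_0.
print(readout_fidelity(0.99, 0.97))
```

The weighted average collapses to $p_D p^{+}_D + p_0 p^{-}_0$, which makes explicit that the total fidelity is the overall probability of assigning the correct state.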
Here we show the photon-counting histograms for the pulsed driving scheme and the continuous driving scheme in Fig. 8. For the pulsed driving scheme, the photon-counting histogram is described by a binomial distribution $P_{N,n,p} = \binom{N}{n} p^n (1 - p)^{N-n}$, where $p = \eta\beta$, $\eta$ is the total efficiency with which an emitted photon can be detected, and $\beta$ is the brightness of the cavity photon. For the parameters used in Fig. 5, $\beta_D = 0.929$ and $\beta_0 = 0.034$ for the initial spin states $|D\rangle$ and $|0\rangle$, respectively. We plot the photon-counting histogram in Fig. 8(a) for a total pulse number of 100 (so the corresponding total readout time is 2 ms). The blue solid line and the yellow solid line show the probability distribution with respect to the detected photon number when the spin is in state $|D\rangle$ and $|0\rangle$, respectively. The threshold is thus determined by the corresponding number of photons at the intersection of the two lines, and it is $n_t = 9$ in this case. The readout fidelity is given by Eq. (40), and the estimated fidelity is then 0.99999. For the continuous driving scheme, we plot the photon-counting histogram for the corresponding Poisson distribution, shown in Fig. 8(b). In this case, the probability distribution of detecting $n$ photons is $P(n, \lambda) = \lambda^n e^{-\lambda}/n!$, where $\lambda$ is the average number of photons detected and is proportional to the readout time. For the parameters we used in Fig. 6, $\lambda_D/\lambda_0 = 14.43$, where $\lambda_D$ and $\lambda_0$ are for the case of spin state $|D\rangle$ and $|0\rangle$, respectively. This gives two probability distributions that intersect at a photon number of 4. This means that the threshold is 4, and the readout fidelity is 0.997 using Eq. (40).

S6: f (x) Derivation
Here we provide the derivation of $f(x)$ used in Sec. 4. For $x$ elementary links, we define the average number of attempts required to independently generate entanglement in all $x$ links as $\langle n_{\max,x} \rangle = f(x)/p_0$, where $p_0$ is the entanglement generation probability. For a single link, the probability that entanglement generation first succeeds at the $n$-th attempt is given by $P(n) = p_0(1 - p_0)^{n-1}$. Thus the joint probability of successful entanglement generation for all $x$ links with attempts $n_1, n_2, ..., n_x$ is $P_j(n_1, n_2, ..., n_x) = \prod_{k=1}^{x} P(n_k)$. The probability distribution function (PDF) of $n_{\max,x}$ is $P(n_{\max,x}) = \sum_{k=1}^{x} P_j(n_k = n_{\max,x},\, n_{\neq k} < n_{\max,x}) + \sum_{k=1,\,l=2}^{l,\,x} P_j(n_{k,l} = n_{\max,x},\, n_{\neq k \neq l} < n_{\max,x}) + ... + P_j(n_1 = n_2 = ... = n_x = n_{\max,x})$.
One can check that the function $f(2^k)$ increases almost linearly with $k$, and the regression result gives $f(2^k) = 0.64k + 0.83$.
Therefore, we obtain the following empirical expression for $f(x)$ by replacing $2^k$ with $x$ and $k$ with $\log_2(x)$ in Eq. (45): $f(x) = 0.64 \log_2(x) + 0.83$.
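The regression can be cross-checked numerically. The Python sketch below assumes that the $x$ links are attempted in parallel with independent, geometrically distributed success times, and uses the standard tail-sum identity $\langle n_{\max,x} \rangle = \sum_{n \geq 0} [1 - (1 - (1 - p_0)^n)^x]$ for the expected maximum. The helper names and the value $p_0 = 0.01$ are illustrative choices, not values taken from the analysis above.

```python
from math import log2


def exact_attempt_factor(x: int, p0: float, tol: float = 1e-12) -> float:
    """f(x) = p0 * E[max of x independent geometric attempt counts]."""
    if x < 1 or not 0.0 < p0 < 1.0:
        raise ValueError("require x >= 1 and 0 < p0 < 1")
    q = 1.0 - p0
    expected_max = 0.0
    q_pow = 1.0  # holds q**n, starting from n = 0
    while True:
        # P(max > n) = 1 - P(all x links have succeeded within n attempts)
        term = 1.0 - (1.0 - q_pow) ** x
        expected_max += term
        if term < tol:
            break
        q_pow *= q
    return p0 * expected_max


def fitted_attempt_factor(x: float) -> float:
    """Empirical fit f(x) = 0.64*log2(x) + 0.83."""
    return 0.64 * log2(x) + 0.83


P0 = 0.01  # illustrative entanglement generation probability
for k in range(1, 5):
    x = 2**k
    print(f"x = {x:2d}: exact = {exact_attempt_factor(x, P0):.3f}, "
          f"fit = {fitted_attempt_factor(x):.3f}")
```

For small $p_0$ the exact factor approaches the harmonic number $1 + 1/2 + ... + 1/x$, which grows logarithmically with $x$ and gives 3/2 for $x = 2$, consistent with the fitted form.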