Policies for elementary links in a quantum network

Distributing entanglement over long distances is one of the central tasks in quantum networks. An important problem, especially for near-term quantum networks, is to develop optimal entanglement distribution protocols that take into account the limitations of current and near-term hardware, such as quantum memories with limited coherence time. We address this problem by initiating the study of quantum network protocols for entanglement distribution using the theory of decision processes, such that optimal protocols (referred to as policies in the context of decision processes) can be found using dynamic programming or reinforcement learning algorithms. As a first step, in this work we focus exclusively on the elementary link level. We start by defining a quantum decision process for elementary links, along with figures of merit for evaluating policies. We then provide two algorithms for determining policies, one of which we prove to be optimal (with respect to fidelity and success probability) among all policies. Then we show that the previously-studied memory-cutoff protocol can be phrased as a policy within our decision process framework, allowing us to obtain several new fundamental results about it. The conceptual developments and results of this work pave the way for the systematic study of the fundamental limitations of near-term quantum networks, and the requirements for physically realizing them.


Introduction
The quantum internet [1][2][3][4][5] is one of the frontiers of quantum information science. It has the potential to revolutionize the way we communicate and perform other tasks, and it will allow for tasks that are not possible using the current, classical internet alone, such as quantum teleportation [6][7][8], quantum key distribution [9][10][11][12], quantum clock synchronization [13][14][15][16], distributed quantum computation [17], and distributed quantum metrology and sensing [18][19][20][21][22][23]. The backbone of a quantum internet is entanglement distributed globally in order to allow for such novel applications to be performed over long distances. Consequently, long-range entanglement distribution is one of the main problems in quantum networks.
Most of the aforementioned applications are beyond the reach of current and near-term quantum technologies. Consequently, we are currently in the era of so-called near-term quantum networks, which are characterized by the following elements [4]: a small number of nodes; imperfect sources of entanglement; non-deterministic elementary link generation and entanglement swapping; imperfect measurements and gate operations; quantum memories with short coherence times; and no (or limited) entanglement distillation/error correction. The most prominent applications of these near-term quantum networks are quantum teleportation and quantum key distribution. In fact, several experiments have already realized these applications.

Figure 1 (caption): (Left) We associate every quantum network with a graph G = (V, E), in which the vertices V represent the network nodes and the edges E represent quantum channels, which we refer to as elementary links. We associate a source station to the elementary link corresponding to e ∈ E, which distributes entangled states to the nodes belonging to e. (Right) In order to analyze an elementary link with respect to time, we define a quantum decision process; see the beginning of Section 3, and Appendix C, for details.
What are the requirements for physically realizing near-term quantum networks? More generally, what are the limitations of near-term quantum networks? Although several software tools for simulating quantum networks have been released in order to probe these questions [44][45][46][47][48][49][50], it is of interest to develop a formal and systematic theoretical framework for entanglement distribution protocols in near-term quantum networks that can allow us to address these questions in full generality. Such a theoretical framework, which is currently lacking (see Appendix A for a review of prior theoretical work on quantum networks), should incorporate both the limitations of near-term quantum technologies and be general enough to allow for optimization of protocol parameters. It should also allow us to answer the following basic questions for arbitrary protocols: (1) What is the quantum state of the network? (2) What is the fidelity of the quantum state of the network with respect to a given target state? (3) What protocol is optimal with respect to fidelity (or some other figure of merit)? The purpose of this work is to answer these questions at the level of elementary links in a quantum network, as a first step towards the development of a general framework for practical quantum network protocols. We do so by introducing a general framework for analyzing elementary links in a quantum network based on quantum decision processes; see Figure 1. In a decision process [51], an agent interacts with its environment through actions, and it receives rewards from the environment based on these actions. The goal of the agent is to devise a policy (a sequence of actions) that maximizes its expected total reward.
To see why decision processes are a natural way to describe protocols in a quantum network, consider a very simple example. Consider three nodes labeled A, B, and C. Quantum channels connect A to C and B to C, and the goal is to create entanglement between A and B. The usual protocol to achieve this goal is to first create entanglement between A and C and between B and C, and then to perform entanglement swapping at C. However, because entanglement creation is typically non-deterministic with near-term quantum technologies, it is possible that, e.g., the entanglement between A and C is created first. With near-term quantum memories, storing this entangled state for too long while entanglement between B and C is being created can lead to too much decoherence, rendering the final, swapped state between A and B useless. A decision must therefore be made to either wait (i.e., keep the entanglement between A and C in quantum memory) or to discard the entanglement shared by A and C and create the entanglement again. The framework of decision processes provides us with the language and mathematical tools needed to address this problem for arbitrary networks, not just a network of three nodes.
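This wait-or-discard trade-off can be made concrete with a toy Monte Carlo simulation. Everything here is an illustrative assumption rather than a quantity from this work: the success probability p, the per-step fidelity-decay factor, the product fidelity-proxy for the swapped state, and the helper swap_fidelity_sim itself.

```python
import random

def swap_fidelity_sim(p=0.3, decay=0.8, cutoff=None, trials=20000, seed=7):
    """Average fidelity-proxy of the swapped A-B state.

    Each time step, every inactive link (A-C and B-C) is attempted with
    success probability p. A stored link's fidelity-proxy decays by a
    factor `decay` per waiting step; with a finite `cutoff`, a link stored
    for more than `cutoff` steps is discarded and re-requested.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        age = {"AC": None, "BC": None}   # None = inactive, else steps in memory
        while True:
            for link in age:             # attempt generation for inactive links
                if age[link] is None and rng.random() < p:
                    age[link] = 0
            if age["AC"] is not None and age["BC"] is not None:
                # Swap at C: take the product of the two decays as a proxy.
                total += decay ** age["AC"] * decay ** age["BC"]
                break
            for link in age:             # stored links age; maybe discard
                if age[link] is not None:
                    age[link] += 1
                    if cutoff is not None and age[link] > cutoff:
                        age[link] = None
    return total / trials

f_wait = swap_fidelity_sim(cutoff=None)  # always wait for the other link
f_cut = swap_fidelity_sim(cutoff=2)      # discard after 2 waiting steps
```

With these made-up numbers, discarding stale entanglement after a short cutoff yields a higher average fidelity-proxy for the swapped A-B state than waiting indefinitely, at the cost of a longer average waiting time.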
In addition to being natural, one of the other advantages of the approach taken in this work is that optimal protocols can be discovered using reinforcement learning algorithms. This is due to the fact that decision processes form the theoretical foundation for reinforcement learning [52] and artificial intelligence [53]. (See [54] for related work on machine learning for quantum communication.) Another advantage of our approach is that, even though reinforcement learning techniques cannot always be applied efficiently to large-scale problems, decision processes provide us with a systematic framework for combining optimal small-scale protocols in order to create large-scale protocols; see [55] for similar ideas. The framework introduced in this work can also be extended to allow for a systematic consideration of agents with local and global knowledge of the network, as well as agents that are independent and/or cooperate with each other. These extensions are interesting directions for future work, and they will lead to a more complete theory of practical quantum network protocols. This work represents the starting point towards this ultimate goal.

The entanglement distribution task
In the left panel of Figure 1, we illustrate an arbitrary quantum network using its corresponding hypergraph G = (V, E). Specifically, G corresponds to the physical layout of the network, which we assume to be fixed. The nodes of the network are associated to the vertices V of the graph, and quantum channels physically connecting the nodes in the network are represented in the graph by gray edges, belonging to the set E, connecting the corresponding vertices. We refer to these edges as elementary links. The quantum channels are used to distribute entangled states to the nodes of an elementary link. For two-node elementary links, the associated quantum channels are used to distribute bipartite entangled states; for elementary links with three or more nodes, the associated quantum channels are used to distribute multipartite entangled states. When an entangled state is distributed successfully along an elementary link, we color the corresponding edge red (in the case of two-node elementary links) or blue (in the case of elementary links with three or more nodes), and we refer to the elementary link as an active elementary link. The task of entanglement distribution is to use the active elementary links to create virtual links. A virtual link corresponds to an entangled state shared by nodes that are not physically connected. Entanglement distribution protocols can be described in terms of graph transformations, as done in [56,57], which take the graph G and transform it into a target graph G target , whose edges contain a subset of the elementary links in G along with a desired set of virtual links. The quantum states corresponding to the elementary and virtual links must be close (in terms of, e.g., fidelity) to a target quantum state. Typically, in the bipartite case, the target is a Bell state, while in the multipartite case the target is a GHZ state.
A basic example of an entanglement distribution protocol, and the one that we consider in this work, consists of first generating elementary links, and then performing joining protocols, such as entanglement swapping, to create virtual links. If the active elementary links obtained after the first step, along with their fidelities to the target states, do not allow for the target graph to be created (for example, some of the required elementary link attempts might have failed), then it makes sense to retry the elementary link generation for the ones that failed. For the ones that succeeded, it might make sense to keep the quantum states in memory rather than discard the states, request new ones, and risk some of these new attempts failing. This sequence of decisions at every time step defines a policy. Some questions then naturally arise:

1. What is the quantum state of every elementary link, as a function of time, under a given policy?

2. What is the fidelity of this quantum state with respect to a given target state, and what other figures of merit should be used to evaluate a policy?

3. What is the (optimal) sequence of actions (i.e., the optimal policy) that should be performed for every elementary link, as a function of time? We address this question in Section 3.4.
These are the main questions that we address in this work, and we do so using the theory of quantum decision processes.

Quantum decision process for elementary links
Let us now illustrate how the three questions posed at the end of the previous section can be answered using quantum decision processes. The basic idea is illustrated in Figure 1, and we present the formal definition of the quantum decision process in Appendix C. Let G = (V, E) be the graph corresponding to the elementary links of a quantum network. To every edge e ∈ E of the graph we associate an independent agent. The agent should be thought of as a collection of (classical) devices located at the nodes corresponding to the edge, which can communicate with each other and thus operate as a single entity. The environment associated with the agent is the collection of quantum systems distributed to the corresponding nodes by a source station. Now, at t = 0, an attempt is made to generate entanglement along an elementary link corresponding to e ∈ E. This means that the source station associated with e prepares a multipartite entangled state and sends the corresponding quantum systems to the nodes of e. There are two key elements of this elementary link generation process (see Appendix B for details):

• The elementary link generation success probability p_e ∈ [0, 1], which is a function of the transmissivity of the transmission medium (in the case of photonic implementations) and parameters that quantify imperfections in the local gates and measurement devices.
• Depending on success or failure of elementary link generation, the nodes of e can be in one of two quantum states, by definition: ρ_e^0 in the case of success, and τ_e^∅ in the case of failure. In the case of success, the quantum state is held in quantum memories at the nodes. Then, the quantum state after m time steps in the local quantum memories is given by

ρ_e(m) := N_e^{∘m}(ρ_e^0),  (3.1)

where N_e^{∘m} = N_e ∘ N_e ∘ · · · ∘ N_e (m times), and N_e describes the noise processes of the local quantum memories; see Appendix B for details.
If the elementary link generation succeeds at time t = 0, then at time step t = 1 the agent might decide to keep the quantum state currently in memory; if it fails, then at time step t = 1 the agent might decide to perform the elementary link generation again. In general, then, at every time step t ≥ 1, the agent associated with e ∈ E can perform two actions: "wait" (i.e., keep the entangled state currently in quantum memory), or "request" (discard the entangled state currently in quantum memory and request a new one from the source). This choice of action can be random, so we define action random variables A_e(t) taking two values: 0 for "wait" and 1 for "request". Based on the agent's choice, the distribution of the quantum systems from the source station to the nodes probabilistically succeeds or fails (we expand on this in Appendix B). We define elementary link status random variables X_e(t) to indicate the outcomes: 0 for failure and 1 for success. If X_e(t) = 0, then the elementary link is considered inactive, and it is considered active if X_e(t) = 1. The history of the agent is then defined as the sequence H_e(t) := (X_e(1), A_e(1), X_e(2), A_e(2), . . . , A_e(t − 1), X_e(t)) for all t ≥ 1, with H_e(1) = X_e(1). Every realization of the history is a sequence of the form h_t = (x_1, a_1, x_2, a_2, . . . , a_{t−1}, x_t), with x_j ∈ {0, 1} for all 1 ≤ j ≤ t and a_j ∈ {0, 1} for all 1 ≤ j ≤ t − 1. Note that {0, 1}^{2t−1} is the set of all histories up to time t. We then think of X_e(j) and A_e(j) as functions on histories, such that X_e(j)(h_t) = x_j and A_e(j)(h_t) = a_j. Now, because the actions of waiting and requesting can be random, we define the random variable M_e(t) to be the amount of time the quantum state of the elementary link corresponding to e is held in memory. It satisfies the recursion relation

M_e(t) = X_e(t)(1 − A_e(t − 1))(M_e(t − 1) + 1) − (1 − X_e(t)),

where M_e(0) ≡ −1 and A_e(0) ≡ 1.
Intuitively, the quantity M e (t) is the number of consecutive time steps up to the t th time step that the action "wait" is performed since the most recent "request" action. The value M e (t) = −1 can be thought of as the resting state of the quantum memory, when it is not loaded.
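As a sketch, the recursion for M_e(t) described above can be implemented in a few lines of Python. The helper update_memory_time is hypothetical (not from this work); the action convention follows the text: 1 = "request", 0 = "wait".

```python
def update_memory_time(m_prev: int, a_prev: int, x: int) -> int:
    """Update the memory time M_e(t) from M_e(t-1), A_e(t-1), and X_e(t).

    Conventions from the text: m = -1 is the resting (unloaded) memory;
    a request that succeeds resets m to 0; each "wait" with an active
    link increments m by 1.
    """
    if x == 0:           # link inactive: memory rests
        return -1
    if a_prev == 1:      # a fresh request succeeded at this step
        return 0
    return m_prev + 1    # waited while the link was active


# Example trajectory: request fails, request succeeds, wait, wait, request fails
m = -1  # M_e(0) = -1, A_e(0) = 1 by convention
history = [(1, 0), (1, 1), (0, 1), (0, 1), (1, 0)]  # (A_e(t-1), X_e(t)) pairs
trace = []
for a_prev, x in history:
    m = update_memory_time(m, a_prev, x)
    trace.append(m)
# trace == [-1, 0, 1, 2, -1]
```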
The agent's policy, i.e., the policy of the elementary link corresponding to the edge e ∈ E, is a sequence of the form π_e = (d_1, d_2, . . . ), where the decision functions d_t are defined as

d_t(h_t) := Pr[A_e(t) = 1 | H_e(t) = h_t],  h_t ∈ {0, 1}^{2t−1}.

In other words, the decision functions give us the probability that the agent takes a particular action, given the history of actions and statuses.
Let {π_e : e ∈ E} denote a collection of policies for all of the elementary links of the network. Then, the quantum state of the network at time steps t ≥ 1 is

⊗_{e∈E} σ_e^{π_e}(t),  (3.4)

where σ_e^{π_e}(t) is a classical-quantum state for the elementary link corresponding to e ∈ E, which has the form

σ_e^{π_e}(t) = Σ_{h_t ∈ {0,1}^{2t−1}} Pr[H_e(t) = h_t]_{π_e} |h_t⟩⟨h_t|_{H_t} ⊗ σ_e(t|h_t).  (3.5)

This classical-quantum state captures both the history of the agent-environment interaction corresponding to the elementary link (in the classical register H_t), as well as the quantum state of the nodes belonging to the elementary link conditioned on a particular history. We present explicit expressions for Pr[H_e(t) = h_t]_{π_e} and σ_e(t|h_t) in Section 3.1 below.
The tensor product structure in (3.4) holds because, by assumption, all of the agents corresponding to the elementary links are independent. This means that all of the agents have knowledge only of the status of their own elementary link. We stick to this setting throughout this work. However, we can use quantum decision processes to develop quantum network protocols in which the agents can cooperate, so that they have knowledge of the network in a certain local neighborhood of their elementary link. The quantum state of the network would then not have the simple tensor product structure as in (3.4). Furthermore, in the case of independent agents, we can use the quantum decision processes for the elementary links as building blocks for quantum decision processes for groups of elementary links, leading to more sophisticated quantum network protocols. We leave these investigations as interesting directions for future work.

Quantum state of an elementary link
We now state one of the main results of this work, which is an explicit expression for the probabilities Pr[H_e(t) = h_t]_π and the quantum states σ_e(t|h_t) of the elementary link corresponding to the edge e in a quantum network undergoing a policy π = (d_1, d_2, . . . , d_t, . . . ). This result gives us the answer to the first question posed at the end of Section 2. In the expressions of Theorem 3.1,

N_e^{req}(t) := Σ_{j=0}^{t−1} A_e(j)  and  N_e^{succ}(t) := Σ_{j=1}^{t} A_e(j − 1) X_e(j)

are the number of elementary link requests and the number of successful elementary link requests, respectively, up to time t, with A_e(0) ≡ 1.
The expected quantum state of the elementary link corresponding to an edge e ∈ E undergoing the policy π is defined as the state obtained by tracing out the classical history register in (3.5):

σ_e^π(t) := Σ_{h_t ∈ {0,1}^{2t−1}} Pr[H_e(t) = h_t]_π σ_e(t|h_t).  (3.10)

Using Theorem 3.1, we immediately obtain the following result:

σ_e^π(t) = Pr[X_e(t) = 0]_π τ_e^∅ + Σ_m Pr[X_e(t) = 1, M_e(t) = m]_π ρ_e(m),  (3.11)

where the sum is with respect to all possible values of the memory time, which in general depends on the policy π.

Figures of merit
Consider an edge e ∈ E in the graph G = (V, E) corresponding to the elementary links of a quantum network, and let π be a policy for the elementary link corresponding to e. Having determined the quantum state of the elementary link corresponding to e, let us now consider the following figures of merit to evaluate the policy π.
• The probability that an elementary link is active at time t ≥ 1, i.e., Pr[X_e(t) = 1]_π = E[X_e(t)]_π. Due to the latter equality, we also refer to this quantity as the expected elementary link status.
• The expected fidelity of the quantum state of the elementary link with respect to a target quantum state at time t ≥ 1, i.e., E[F̃_e(t)]_π, where

F̃_e(t) := X_e(t) f_e(M_e(t)),  (3.12)

and

f_e(m) := ⟨ψ|ρ_e(m)|ψ⟩ = ⟨ψ|N_e^{∘m}(ρ_e^0)|ψ⟩  (3.13)

denotes the fidelity of the state ρ_e(m) with respect to a pure target state ψ = |ψ⟩⟨ψ|.
A related quantity is

F̄_e(t) := E[F̃_e(t)]_π / Pr[X_e(t) = 1]_π,

which can be thought of as the expected fidelity of the quantum state of the elementary link given that the elementary link is active.
We discuss other figures of merit of interest in Appendix C.6.
The expected status and the expected fidelity of an elementary link can both be expressed in a simple manner in terms of the classical-quantum state of the elementary link.
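These figures of merit can be estimated for any policy by Monte Carlo simulation. The following sketch is illustrative only: simulate_link is a hypothetical helper, and the success probability, decay model f(m) = 0.9^m, and horizon are made-up numbers, not quantities from this work.

```python
import random

def simulate_link(policy, p, f, T, trials=20000, seed=0):
    """Monte Carlo estimates of Pr[X(T)=1] and E[F~(T)] = E[X(T) f(M(T))]
    at the final time T, for a deterministic policy(t, x, m) -> action
    (1 = request, 0 = wait). f(m) is the fidelity after m steps in memory."""
    rng = random.Random(seed)
    sx = sf = 0.0
    for _ in range(trials):
        a, m, x = 1, -1, 0            # A_e(0) = 1, M_e(0) = -1 by convention
        for t in range(1, T + 1):
            if a == 1:                # request: fresh attempt
                x = 1 if rng.random() < p else 0
                m = 0 if x == 1 else -1
            else:                     # wait: keep state, memory ages if active
                m = m + 1 if x == 1 else -1
            a = policy(t, x, m)
        sx += x
        sf += f(m) if x == 1 else 0.0
    return sx / trials, sf / trials

# "Request until active, then wait" with decay f(m) = 0.9**m and p = 0.5
px, ef = simulate_link(lambda t, x, m: 0 if x == 1 else 1,
                       p=0.5, f=lambda m: 0.9 ** m, T=4)
```

For this policy, the estimate px is close to 1 − (1 − p_e)^4 = 0.9375, matching the closed form quoted for the always-wait policy, and ef is close to the exact sum Σ_k (1 − p)^{k−1} p (0.9)^{4−k} ≈ 0.742.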

Examples of policies
Let us examine the figures of merit defined above using two simple policies.
First, consider the policy consisting of the action "request" at every time step before the elementary link becomes active, and the action "wait" at every time step after the elementary link becomes active. This policy is defined by d_t(h_t) = 1 − x_t for all t ≥ 1 and every history h_t = (x_1, a_1, . . . , x_t) ∈ {0, 1}^{2t−1}. This policy achieves the highest value of E[X_e(t)]_π for all t ≥ 1. In fact, this policy is simply the t⋆ = ∞ memory-cutoff policy, which we investigate in Section 4. We show in that section that E[X_e(t)]_π = 1 − (1 − p_e)^t for all t ≥ 1. Of course, this highest value of E[X_e(t)]_π comes at the cost of a lower fidelity, because each "wait" action decreases the fidelity of the quantum state stored in memory; we see this explicitly in Section 4.2.
Another policy is one in which the action "request" is taken at every time step, i.e., d_t(h_t) = 1 for all t ≥ 1 and every history h_t ∈ {0, 1}^{2t−1}. (This is the t⋆ = 0 memory-cutoff policy; see Section 4.) In this case, the quantity E[F_e(t)]_π is maximized, because E[F_e(t)]_π = f_e(0) for every time step t ≥ 1, which is the highest that can be obtained (without entanglement distillation). This highest value of the fidelity comes at the cost of a lower success probability, because the probability that the elementary link is active stays at p_e for all times with respect to this policy, i.e., Pr[X_e(t) = 1]_π = p_e for all t ≥ 1 if at every time step the agent requests a link.
The two policies considered above illustrate the trade-off between the expected status E[X e (t)] π and the expected fidelity E[F e (t)] π of an elementary link. The quantity E[ F e (t)] π , with F e (t) defined in (3.12), incorporates this trade-off, as it can be thought of intuitively as the product of the status and fidelity of an elementary link. Let us therefore now turn to finding policies that maximize the quantity E[ F e (t)] π as a function of time.
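The trade-off between the two extreme policies can be tabulated directly from the closed-form expressions quoted above. This is a sketch with illustrative numbers: p_e = 0.6 and an assumed exponential decay model f_e(m) = 0.9^m, neither of which is prescribed by this work.

```python
p = 0.6          # illustrative success probability p_e
decay = 0.9      # illustrative per-step memory decay, f_e(m) = decay**m
T = 10

# t* = infinity: E[X(t)] = 1 - (1-p)**t. The link first activates at step k
# with probability (1-p)**(k-1) * p and then ages for t - k steps, so
def always_wait(t):
    status = 1 - (1 - p) ** t
    fid = sum((1 - p) ** (k - 1) * p * decay ** (t - k) for k in range(1, t + 1))
    return status, fid

# t* = 0: request every step, so status p and expected fidelity p * f(0).
def always_request(t):
    return p, p * 1.0

sw, fw = always_wait(T)     # high status, fidelity degraded by memory aging
sr, fr = always_request(T)  # low status, per-active-link fidelity f(0) = 1
```

With these numbers, sw > sr (waiting wins on status) while the per-active-link fidelity fw / sw is below fr / sr = f(0) (requesting wins on fidelity), which is exactly the trade-off described above.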

Policy optimization
We would like to understand the highest value that E[ F e (t)] π can take as a function of policies π and times t ≥ 1, given a particular entanglement generation success probability p e (and thus, given particular values of the physical parameters that comprise the success probability). With this, we can answer the third question posed at the end of Section 2, and more broadly, we can begin to understand the limits of practical, near-term quantum networks.

Backward recursion
Optimization of E[F̃_e(t)]_π with respect to policies π for a given elementary link is given by the following recursive procedure.

Theorem 3.4 (Optimal policy for an elementary link). Let G = (V, E) be the graph corresponding to the elementary links of a quantum network, let e ∈ E, and let ψ ≡ |ψ⟩⟨ψ| be a pure target state. Then, for all T ≥ 1, the maximum expected fidelity max_π E[F̃_e(T)]_π is given by the backward recursion in (3.20), with y = N_e^{succ}(T + 1)(h_T, a_T, 1) and x = N_e^{req}(T + 1)(h_T, a_T, 1). Furthermore, the optimal policy is deterministic and given by (3.21).

Observe that in the recursive procedure presented in Theorem 3.4, we must first determine the optimal action at the final time step and then determine the optimal actions at the previous time steps in turn. For this reason, the recursive procedure is often known as backward recursion. Note that this recursive procedure is exponentially slow in the final time, because the number of histories grows exponentially with time: the number of histories up to time t is |{0, 1}^{2t−1}| = 2^{2t−1}. For this reason, it is useful to have efficient methods for estimating the maximum fidelity of an elementary link. One such method is forward recursion.
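The backward-recursion idea can be sketched as a small dynamic program. One simplification relative to the full procedure: because the immediate reward and the transition probabilities in this sketch depend on the history only through the pair (X_e(t), M_e(t)), it recurses over that pair rather than over all 2^{2t−1} histories. The helper optimal_expected_fidelity is hypothetical and does not reproduce equations (3.20)-(3.21).

```python
from functools import lru_cache

def optimal_expected_fidelity(p, f, T):
    """Backward recursion for max over policies of E[F~(T)] = E[X(T) f(M(T))].

    value(t, x, m) is the best achievable terminal expected fidelity given
    status x and memory time m at time t; actions are 1 = request, 0 = wait.
    """
    @lru_cache(maxsize=None)
    def value(t, x, m):
        if t == T:
            return f(m) if x == 1 else 0.0
        # request: succeed w.p. p -> (1, 0), fail -> (0, -1)
        v_request = p * value(t + 1, 1, 0) + (1 - p) * value(t + 1, 0, -1)
        # wait: state kept; memory ages if the link is active
        v_wait = value(t + 1, x, m + 1) if x == 1 else value(t + 1, 0, -1)
        return max(v_request, v_wait)

    # At t = 1 the first request has just been resolved (A_e(0) = 1).
    return p * value(1, 1, 0) + (1 - p) * value(1, 0, -1)
```

For example, with p = 0.5 and f(m) = 0.9^m, the optimum for T = 2 is 0.5 · 0.9 + 0.5 · 0.5 = 0.70, achieved by waiting after an early success and re-requesting after a failure.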

Forward recursion
Instead of starting from the final time step and finding the optimal actions by going backwards, we could instead find optimal actions by going forwards, i.e., by selecting the action such that the immediate expected fidelity is maximized. Such a "forward recursion" approach is more natural from the perspective of a real-world learning agent, who has to make decisions in real time and does not necessarily have complete knowledge of the environment in order to perform the backward recursion algorithm. However, the forward recursion algorithm will not necessarily lead to a globally optimal policy. In fact, the globally optimal policy can be obtained using the backward recursion algorithm, which is the result of Theorem 3.4. Nevertheless, it is worthwhile to briefly discuss the forward recursion algorithm because many reinforcement learning algorithms are based on it, and they give efficiently computable lower bounds on the maximum expected fidelity of an elementary link.
The forward recursion policy, given in (3.22), selects at each time step the action that maximizes the immediate expected fidelity, for all t ≥ 1 and all h_t ∈ {0, 1}^{2t−1}, where p_e is the success probability for the elementary link corresponding to e.
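The greedy choice just described can be sketched as follows: at each step, compare the immediate expected fidelity of requesting, p_e f_e(0), with that of waiting, f_e(m + 1) for an active link. The helper greedy_action and the numbers are illustrative assumptions, not the paper's equation (3.22).

```python
def greedy_action(x, m, p, f):
    """Forward-recursion (greedy) choice: pick the action whose immediate
    expected fidelity at the next step is larger (1 = request, 0 = wait)."""
    expected_if_request = p * f(0)               # fresh attempt succeeds w.p. p
    expected_if_wait = f(m + 1) if x == 1 else 0.0  # stored state ages one step
    return 1 if expected_if_request > expected_if_wait else 0

# With f(m) = 0.9**m and p = 0.5: wait while 0.9**(m+1) >= 0.5, i.e. m <= 5
acts = [greedy_action(1, m, 0.5, lambda mm: 0.9 ** mm) for m in range(8)]
# acts == [0, 0, 0, 0, 0, 0, 1, 1]
```

Note that, under this decay model, the greedy rule behaves like a memory-cutoff policy: it waits until the stored fidelity drops below p_e f_e(0), then requests.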

The memory-cutoff policy
In Section 3, we defined a quantum decision process for elementary links in a quantum network. We derived expressions for the quantum state of an elementary link under arbitrary policies, defined figures of merit to evaluate policies (in particular, the expected fidelity), and presented algorithms for determining optimal policies. Let us now consider an explicit example of a policy.
A natural policy to consider, and one that has been considered extensively previously [58][59][60][61][62][63][64][65][66][67][68], is the following deterministic policy. An elementary link is requested at every time step until it becomes active, and once it is active it is held in quantum memory for some pre-specified number t⋆ of time steps (usually called the memory cutoff, and not necessarily equal to the memory coherence time), after which the quantum state of the elementary link is discarded and requested again. The cutoff t⋆ can be any value in the set N_0 ∪ {∞}, where N_0 = {0, 1, 2, . . . }. There are two extreme cases of this policy: when t⋆ = 0, a request is made at every time step regardless of whether the previous request succeeded; if t⋆ = ∞, then an elementary link request is made at every time step until the elementary link becomes active, and once it becomes active the corresponding entangled state remains in memory indefinitely; no further request is ever made. In this section, we provide a complete analysis of this policy for all values of t⋆ ∈ N_0 ∪ {∞} using the general developments in Section 3. Details of the analysis can be found in Appendix D.
Throughout this section, we consider an arbitrary elementary link in a quantum network, specified by the edge e ∈ E in the graph G = (V, E) corresponding to the elementary links of the network.
Definition 4.1 (Memory-cutoff policy). For a cutoff t⋆ ∈ N_0 ∪ {∞}, the t⋆ memory-cutoff policy is the deterministic policy whose decision functions, for all t ≥ 1 and every history h_t ∈ {0, 1}^{2t−1}, are given by the following rule: request (d_t(h_t) = 1) if the elementary link is inactive or if its quantum state has been held in memory for t⋆ time steps; otherwise, wait (d_t(h_t) = 0).

Expected quantum state
Recall from (3.11) that the expected quantum state of an elementary link in a quantum network undergoing a policy π is given by the probabilities Pr[X_e(t) = 1, M_e(t) = m]_π and Pr[X_e(t) = 1]_π. We now provide analytic expressions for these probabilities, which we denote by Pr[X_e(t) = 1, M_e(t) = m]_{t⋆} and Pr[X_e(t) = 1]_{t⋆}, in the case of the memory-cutoff policy for all possible values of the cutoff t⋆. We consider both the short-term (t < ∞) and the long-term (t → ∞) behavior of these probabilities.

Short-term behavior
In the short term, we obtain the following result for the joint probability distribution of the elementary link status X e (t) and memory time M e (t) random variables.

Theorem 4.2.
Let p e ∈ [0, 1] be the success probability for an elementary link in a quantum network, as defined in Section 3.1, and let t ≥ 1.
For t⋆ = ∞ and m ∈ {−1, 0, 1, . . . , t − 1},

Pr[M_e(t) = m]_∞ = (1 − p_e)^t if m = −1, and Pr[X_e(t) = 1, M_e(t) = m]_∞ = p_e (1 − p_e)^{t−m−1} if m ∈ {0, 1, . . . , t − 1}.

From Theorem 4.2, we immediately obtain an expression for the probability that an elementary link is active at all times t ≥ 1 (Corollary 4.3). The oscillatory behavior of this probability shown in Figure 2 is due to the nature of the memory-cutoff policy, which requires that the elementary link be discarded every t⋆ time steps. Indeed, the period of the oscillations is t⋆, which is apparent for short times and large values of t⋆. In the long term, however, we see that the amplitude of the oscillations decreases, and the expected elementary link status reaches a steady state, whose value we state below in Theorem 4.4.
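The short-term behavior, including the approach to a steady state, can be checked with a small Monte Carlo simulation of the memory-cutoff policy. The helper cutoff_status_probability and its parameters are illustrative assumptions; the discard-after-t⋆-steps convention follows the description in this section.

```python
import random

def cutoff_status_probability(p, t_star, T, trials=40000, seed=1):
    """Monte Carlo estimate of Pr[X(t)=1] for t = 1..T under the t*
    memory-cutoff policy (illustrative simulation, not a closed form)."""
    rng = random.Random(seed)
    counts = [0] * T
    for _ in range(trials):
        x, m = 0, -1
        request = True               # A_e(0) = 1
        for t in range(T):
            if request:              # fresh attempt
                x = 1 if rng.random() < p else 0
                m = 0 if x else -1
            else:                    # wait: memory ages
                m += 1
            counts[t] += x
            # next action: request if inactive or cutoff reached
            request = (x == 0) or (m >= t_star)
    return [c / trials for c in counts]

probs = cutoff_status_probability(p=0.8, t_star=3, T=12)
```

For t⋆ = 0 the estimate stays at p_e for every t, and for t⋆ = ∞ (approximated by a very large cutoff) it follows 1 − (1 − p_e)^t, matching the two extreme policies discussed earlier.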

Long-term (steady-state) behavior
Let us now consider the t → ∞, or long-term, behavior of an elementary link undergoing the t⋆ memory-cutoff policy. The steady-state value of the expected elementary link status is stated in Theorem 4.4, from which (4.12) follows.

Expected fidelity
If an elementary link in a quantum network undergoes the t⋆ memory-cutoff policy, and it has success probability p_e, then from Theorem 4.2 and Corollary 4.3 we immediately obtain expressions (4.13) and (4.14) for the expected fidelity of the elementary link for all t ≥ 1, where in (4.13) and (4.14) the expression for Pr[M_e(t) = m, X_e(t) = 1]_{t⋆} for t > t⋆ + 1 is given in (4.5), and the expression for Pr[X_e(t) = 1]_{t⋆} for t > t⋆ + 1 is given in (4.9).
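A similar Monte Carlo sketch estimates the expected fidelity E[F̃_e(T)] and the conditional fidelity E[F̃_e(T)] / Pr[X_e(T) = 1] under the memory-cutoff policy. The helper cutoff_expected_fidelity is hypothetical, and the decay model f(m) = 0.9^m is an illustrative assumption; the paper instead gives closed forms.

```python
import random

def cutoff_expected_fidelity(p, t_star, f, T, trials=40000, seed=2):
    """Monte Carlo estimates of E[F~(T)] = E[X(T) f(M(T))] and of the
    conditional fidelity E[F~(T)] / Pr[X(T)=1] under the t* memory-cutoff
    policy (illustrative simulation)."""
    rng = random.Random(seed)
    s_f = s_x = 0.0
    for _ in range(trials):
        x, m, request = 0, -1, True      # A_e(0) = 1
        for _ in range(T):
            if request:                  # fresh attempt
                x = 1 if rng.random() < p else 0
                m = 0 if x else -1
            else:                        # wait: memory ages
                m += 1
            request = (x == 0) or (m >= t_star)
        s_x += x
        s_f += f(m) if x == 1 else 0.0
    return s_f / trials, (s_f / s_x if s_x else 0.0)

ef, cond = cutoff_expected_fidelity(p=0.8, t_star=3, f=lambda m: 0.9 ** m, T=12)
```

For t⋆ = 0 the conditional fidelity is exactly f_e(0) and the expected fidelity is p_e f_e(0), consistent with the discussion of the extreme policies in Section 3.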
In the limit t → ∞, using Theorem 4.4, we obtain the following:

Summary and outlook
Understanding the capabilities and limitations of near-term quantum networks is an important problem, whose solutions will help drive the physical realization of small-scale quantum networks, and eventually lead to the realization of a global-scale quantum internet. Before such developments can be made, we must first have a common language and theoretical framework for analyzing quantum network protocols. This work provides the first steps towards such a general theoretical framework for practical quantum network protocols. We make use of the concept of a decision process to model protocols for elementary links. This formulation is natural based on the structure of near-term quantum network protocols, and it allows for optimal policies to be obtained using dynamic programming algorithms. Section 3 constitutes the main conceptual and technical contributions of this work. It lays out a quantum decision process for elementary links in a quantum network. The framework allows us to model protocols for an elementary link with respect to time in terms of the actions of an agent that can either request an entangled state from a source or keep the one it currently has in its quantum memory. The sequence of actions of the agent over time defines a policy, which is synonymous with the protocol. After formulating the quantum decision process for an elementary link and defining figures of merit for evaluating policies, we considered three examples of policies: the backward recursion policy in (3.21), the forward recursion policy in (3.22), and the memory-cutoff policy in Definition 4.1. We proved that the backward recursion policy is optimal among all policies (Theorem 3.4).
In Section 4, we considered the memory-cutoff policy, and we applied the results of Section 3 in order to obtain closed-form expressions for the expected quantum state of an elementary link for an arbitrary value of the cutoff, for both short times and long times. We also obtained closed-form expressions for the figures of merit defined in Section 3.2.
We expect the results of this work to be useful as a building block for large-scale quantum network protocols. For example, the policies for elementary links considered in this work can be used as an underlying policy layer on top of which routing protocols can be applied in order to obtain an overall (in general non-optimal) policy for generating end-to-end entanglement in a network. Furthermore, because our results apply to elementary links consisting of an arbitrary number of nodes and to any noise model for the quantum memories, they can be applied to protocols that go beyond bipartite entanglement distribution, namely to protocols for distributing multipartite entanglement. We also expect our results to be useful in the analysis of entanglement distribution using all-photonic quantum repeaters [69], and in the analysis of entanglement distribution using satellite-based quantum networks [70][71][72][73][74], in which an elementary link can easily be on the order of 1000 km [38] while still having a high fidelity. Initial applications of the results of this work to satellite-based elementary links can be found in [75,Chapter 7].
This work also opens up several other interesting directions for future work. Of immediate interest is to go beyond the elementary link level by incorporating entanglement distillation and swapping into the decision process developed here, which would allow for the analysis of more sophisticated quantum network protocols, and it would build on prior works [54,[76][77][78][79] that examine quantum network protocols beyond the elementary link level. Such an extension would involve multiple cooperating agents, in contrast to the independent agents considered in this work, and can in principle be formulated for an arbitrary network topology. A simple, but relevant example of a network topology, which has also been considered recently, is the starshaped network used for the so-called "quantum entanglement switch" [80][81][82][83]. As we might expect, these extra elements of entanglement distillation and swapping will make analytic analysis (as done in this work) intractable. This is when reinforcement learning algorithms are expected to be helpful for finding optimal policies. The beginnings of some of these future developments can be found in [75,Appendix D].
Acknowledgments

I dedicate this work to the memory of Jonathan P. Dowling. This work, and the other works that I was fortunate to co-author with Jon, would not have been possible without his constant encouragement and his enthusiasm for the quantum internet.
The plots in this work were made using the Python package matplotlib [84].
Financial support was provided by the National Science Foundation and the National Science and Engineering Research Council of Canada Postgraduate Scholarship.

Appendix A Related work
Prior theoretical work on quantum networks can essentially be split into two types. The first type of work is information-theoretic [85][86][87][88][89][90][91], with the focus being on obtaining (or placing bounds on) the ultimate limits of communication in a quantum network, without taking device imperfections (such as quantum memories with limited coherence times and nondeterministic gate operations) explicitly into account. Consequently, this type of work does not always provide a realistic analysis for near-term quantum networks. The second type of theoretical work on quantum networks [58-65, 69, 92-110] (see also [111][112][113] and the references therein) focuses on calculating communication rates under more realistic assumptions on the devices, sometimes with specific physical architectures and different types of noise- and loss-mitigation techniques, such as entanglement distillation and quantum error correction, taken into account. Typically, these works have focused primarily on the topology of a linear chain of nodes. However, more recent work [66,78,79,[114][115][116][117][118][119][120][121][122][123] has begun to focus on arbitrary topologies, with routing protocols taken into account in some cases [56,57,89,90,[124][125][126][127]. The techniques used in these works are often varied, and sometimes different terminology and mathematical tools are used. One of the aims of this work is to provide the starting point for a unified theoretical framework for practical quantum network protocols that can, in principle, incorporate and generalize the developments in the aforementioned works of the second type, and that can be applied to arbitrary topologies and physical architectures.
Policy-based approaches to quantum network protocols, as considered in this work, have been explored previously in [62,76,128,129] (see also [113]), where terms such as "rule-set" or "schedule" have been used instead of "policy". In [62], the authors consider different control protocols for elementary links in a quantum network based on different configurations of the sources and heralding stations, and the impact they have on end-to-end entanglement distribution rates. In [128], the authors look at protocols for end-to-end entanglement distribution along a chain of quantum repeaters and simulate different scheduling protocols for entanglement distillation along elementary links. Similarly, in [76], the authors use finite state machines to analyze the different layers of an end-to-end entanglement distribution protocol in quantum networks, such as entanglement distillation and entanglement swapping. Finally, in [129], the authors use an approach based on rule-sets to determine end-to-end entanglement distribution rates and fidelities of the end-to-end pairs along a chain of quantum repeaters. One of the goals of this work is to explicitly formalize the approaches taken in the aforementioned works within the context of decision processes, because this allows us to systematically study different policies and calculate quantities that are relevant for quantum networks, such as entanglement distribution rates and fidelities of the quantum states of the links.
This work is complementary to prior work that uses Markov chains to analyze waiting times and entanglement distribution rates for a chain of quantum repeaters [65,[130][131][132]; we also refer to the work on entanglement switches in [80][81][82][83], which uses both discrete-time and continuous-time Markov chains. This work is also complementary to prior work that analyzes the quantum state in a quantum repeater chain with noisy quantum memories [133][134][135][136][137].
In [54], the authors use reinforcement learning to discover protocols for quantum teleportation, entanglement distillation, and end-to-end bipartite entanglement distribution along a chain of quantum repeaters. While the work in [54] is largely numerical, this work focuses on formally developing the mathematical tools needed to perform reinforcement learning of entanglement distribution protocols in general quantum networks. The development of the mathematical tools is essential when an agent acts in a quantum-mechanical environment, because it is important to understand how the agent's actions affect the quantum state of the environment. Furthermore, we expect that the protocols learned in [54], particularly those for entanglement distillation and entanglement swapping, could be incorporated as subroutines within the mathematical framework of decision processes developed in this work, so that large-scale quantum network protocols (going beyond the elementary link level) can be discovered using reinforcement learning.
This work is also related to the work in [118], in which the authors develop a link-layer protocol for generating elementary links in a quantum network, and they perform simulations of entanglement distribution using a discrete-event simulator under various scenarios. The effect of different scheduling strategies is also considered. The protocols in [118] consider actions in a more fine-grained manner than what we consider in this work. In particular, the steps required for heralding (namely, the communication signals for the results of the heralding) are explicitly taken into account. These steps can be incorporated within the framework developed here; all that has to be done is to appropriately define the transition maps (defined below) in order to accommodate the additional actions. We can similarly incorporate other classical discrete-valued properties of an elementary link into the elementary link status random variable X_e(t) if needed.
The approach to policy optimization taken in this work is similar to the approach in [55], in the sense that both approaches make use of the principle of dynamic programming. While in [55] the focus is on obtaining end-to-end bipartite entanglement in a chain of quantum repeaters, the goal here is simply to examine elementary links and to determine the optimal sequence of actions that should be performed in order to maximize both the fidelity of an elementary link and the probability that an elementary link is active at any given time. Other recent work on optimization of quantum network protocols can be found in [121,122].

Appendix B Elementary link generation
After transmission from the source to the nodes, the nodes execute a heralding procedure, which is an LOCC protocol that confirms whether all of the nodes received their quantum systems. If the heralding procedure succeeds, then the nodes store their quantum systems in a quantum memory. Mathematically, the heralding procedure can be described by a quantum instrument, and we let p_e denote the overall probability of success of the transmission from the source and of the heralding procedure. Once the heralding procedure succeeds, the nodes store their quantum systems in their local quantum memories, and we describe the decoherence of the quantum memories by quantum channels N_{e,i} acting on each quantum system A^{v_i}_e of the elementary link corresponding to e, i ∈ {1, 2, . . . , k}. The decoherence channel is applied at every time step in which the quantum system is in memory. The overall quantum channel acting on all of the quantum systems in the elementary link is then N_{e,1} ⊗ N_{e,2} ⊗ · · · ⊗ N_{e,k}.

Appendix C Quantum decision process

The primary mathematical concept used in this work is that of a Markov decision process.
In particular, we consider a quantum generalization of a Markov decision process given in [138] (see also [139,140]), called a quantum partially observable Markov decision process. For brevity, we use the term quantum decision process throughout this work. In a quantum decision process, the agent's action at each time step results in a transformation of the quantum state of the environment, and the agent receives both partial (classical) information about the new quantum state of the environment along with a reward. In the context of elementary links in a quantum network, the elements of the quantum decision process that we formally define below are shown in Figure 1. For a more general definition of a quantum decision process, we refer to [75,Chapter 3].
Definition C.1 (Quantum decision process for elementary links). Let G = (V, E) be the graph corresponding to the elementary links of a quantum network, and let e ∈ E. As shown in Figure 1, we define a quantum decision process for e by defining the agent for e to be collectively the nodes belonging to e, and we define its environment to be the quantum systems distributed by the source station to the nodes of e. Then, the other elements of the quantum decision process are defined as follows.
• We denote the quantum systems of the environment collectively by E^e, and we let E^e_t denote these quantum systems at time t ≥ 0. We drop the superscript and simply write E_t when the edge e is understood from the context or unimportant in the context being considered. The quantum state of the environment at time t = 0 is the source state ρ^S_e ≡ ρ^S_{E^e_0}.
• We let X = {0, 1} be the set of elementary link statuses, which tell us whether or not the elementary link is active at a particular time. From this, we define elementary link status random variables X_e(t) for all t ≥ 1 as follows:
  - X_e(t) = 0: elementary link is inactive (transmission and heralding not successful);
  - X_e(t) = 1: elementary link is active (transmission and heralding successful).
• We let A = {0, 1} be the set of possible actions of the agent, and we define corresponding action random variables A_e(t) for all t ≥ 1 as follows:
  - A_e(t) = 0: wait/keep the entangled state;
  - A_e(t) = 1: discard the entangled state and request a new entangled state.
• The set of all histories up to time t ≥ 1 is (X × A)^{t−1} × X, so that every history h_t = (x_1, a_1, . . . , a_{t−1}, x_t) is an element of {0, 1}^{2t−1}.
• Given a pure target state ψ_target = |ψ_target⟩⟨ψ_target|, the reward at time t ≥ 1 is specified by functions R_e(t) : {0, 1}^{2t+1} × {0, 1} → ℝ, defined for all h_{t+1} ∈ {0, 1}^{2t+1}; the reward is based on the fidelity of the quantum state of the elementary link with respect to ψ_target.

Given an elementary link specified by an edge e ∈ E and a policy π = (d_1, d_2, . . . , d_t, . . . ) for the corresponding agent, the agent-environment interaction, as depicted in Figure 1, consists of a sequence of actions and responses of the agent and environment, respectively. This back-and-forth between the agent and the environment falls into the general paradigm of agent-environment interactions considered previously in [141,142], and more generally it falls within the theoretical framework of quantum causal networks (also referred to as quantum combs and quantum games) [143][144][145][146]; see Figure 3.
In terms of the quantum causal network shown in Figure 3, the actions of the agent are given by the decision channels, which are defined for every history h_t = (x_1, a_1, . . . , a_{t−1}, x_t) on the registers H_t ≡ X_1 A_1 · · · A_{t−1} X_t, where X_j and A_j denote classical registers that store the elementary link status and the action, respectively, at time j.
The changes in the quantum state of the environment, based on the actions of the agent, are given by the environment response channels E_t → E_{t+1}, defined in (C.17) for arbitrary states ρ_{E_0}, ω_{H_t}, σ_{A_t}, ρ_{E_t}.
In general, the classical-quantum state in (3.5) can be written explicitly using the definitions of the decision and environment response channels, from which it is straightforward to obtain an expression for σ^π_e(t). Combining this expression with the basic rules of probability, it follows that the transition probabilities in (C.25) are given by

Pr[X_e(t + 1) = x_{t+1} | H_e(t) = h_t, A_e(t) = a_t] = Tr[T^{x_t, a_t, x_{t+1}}_e(σ_e(t|h_t))].   (C.26)
Using this, and the definition of the transition maps, we have the following values for the transition probabilities for all t ≥ 1 and for every history h_t = (x_1, a_1, . . . , a_{t−1}, x_t) ∈ {0, 1}^{2t−1}:

Pr[X_e(t + 1) = x_{t+1} | H_e(t) = h_t, A_e(t) = a_t] =
  p_e,              if a_t = 1 and x_{t+1} = 1,
  1 − p_e,          if a_t = 1 and x_{t+1} = 0,
  δ_{x_t, x_{t+1}}, if a_t = 0,

where p_e is the success probability defined in (B.7). Observe that the transition probabilities are time independent. Furthermore, the status of an elementary link at time t + 1 depends only on the status and action at time t, not on the entire history of statuses and actions. This reflects the fact that the transition maps also depend only on the status and action at the previous time step.
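These case values can be checked with a few lines of Python. The sketch below is our own illustration (the function name and encoding are not from the paper): it encodes the rule that a request succeeds with probability p_e and that waiting leaves the link status unchanged.

```python
def transition_prob(x_next, x, a, p):
    """Pr[X_e(t+1) = x_next | X_e(t) = x, A_e(t) = a].

    x, x_next: link status (0 = inactive, 1 = active);
    a: action (0 = wait, 1 = request); p: success probability p_e.
    """
    if a == 1:
        # request: the new link is active with probability p, regardless of x
        return p if x_next == 1 else 1.0 - p
    # wait: the status is unchanged (an active link stays active,
    # an inactive link stays inactive)
    return 1.0 if x_next == x else 0.0
```

As expected, the probabilities are time independent and depend only on the pair (x_t, a_t), not on the full history.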
For every edge e ∈ E in the graph G = (V, E) corresponding to the elementary links of a quantum network, and for every T-step policy π for the elementary link corresponding to e, the expected reward at time T is

Σ_{h_{T+1} : x_{T+1} = 1} Tr[ψ_target σ^π_e(T + 1; h_{T+1})]   (C.32)
  = Tr[(|1⟩⟨1|_{X_{T+1}} ⊗ ψ_target) σ^π_e(T + 1)].   (C.33)
In other words, the expected reward at time T is the expected fidelity of the elementary link. Finally, the memory time random variable M_e(t) defined in (3.2) can be expressed in closed form, as in (C.35), where we recall the definitions of ρ^0_e and τ^∅_e from (B.6).
Now, for t ≥ 2, we use (C.21). Based on the definition of the transition maps, for every time step j > 1 in which the action "wait" (i.e., A_e(j) = 0) is performed and the elementary link is active (i.e., X_e(j) = 1), the elementary link stays active at time step j + 1. The memory time must then be incremented by one, which is consistent with the definition of the memory time M_e(t) given in (3.2), and the quantum state of the elementary link goes from ρ_e(M_e(t)) to ρ_e(M_e(t) + 1). If instead the elementary link is active at time j and the action "request" is performed (i.e., A_e(j) = 1), then the quantum state of the elementary link is discarded and is replaced either by the state ρ^0_e (if X_e(j + 1) = 1) with probability p_e or by the state τ^∅_e (if X_e(j + 1) = 0) with probability 1 − p_e. In the former case, the memory time must be reset to zero, consistent with (3.2), and in the latter case, the memory time is −1, also consistent with (3.2).
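The bookkeeping described in this paragraph can be sketched in Python (our own illustration; the function and variable names are not from the paper). Given a fixed sequence of actions and request outcomes, it tracks the memory time: −1 when the link is inactive, 0 for a freshly generated link, and incremented by one on every "wait" while the link is active.

```python
def memory_times(actions, outcomes):
    """Track the memory time M_e(j) for j = 1, 2, ....

    actions  : actions A_e(1), A_e(2), ... (0 = wait, 1 = request); the
               initial request A_e(0) = 1 is implicit.
    outcomes : request outcomes (True = success), consumed one per request.
    """
    outcomes = iter(outcomes)
    m = 0 if next(outcomes) else -1   # status at time 1, from the initial request
    times = [m]
    for a in actions:
        if a == 0 and m >= 0:
            m += 1                    # wait on an active link: memory time + 1
        elif a == 1:
            m = 0 if next(outcomes) else -1   # request: 0 on success, -1 on failure
        # a == 0 with m == -1: waiting on an inactive link leaves it inactive
        times.append(m)
    return times
```

For example, two waits, a successful request, and a failed request produce the memory-time trajectory 0, 1, 2, 0, −1, matching the case analysis above.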
Furthermore, by definition of the transition maps, every time the action "request" is performed, we obtain a factor of p_e (if the request succeeds) or 1 − p_e (if the request fails). If the action "wait" is performed, then we obtain no additional multiplicative factors. The quantity N^succ_e(t − 1) is, by definition, equal to the number of requests that succeeded in t − 1 time steps. Therefore, overall, we obtain a factor p_e^{N^succ_e(t−1)} at the (t − 1)st time step for the number of successful requests. The number of failed requests in t − 1 time steps is given by N^req_e(t − 1) − N^succ_e(t − 1), so that we obtain an overall factor of (1 − p_e)^{N^req_e(t−1) − N^succ_e(t−1)} at the (t − 1)st time step for the failed requests. Also, the memory time at the (t − 1)st time step is M_e(t − 1)(h^t_{t−1}), and because the quantum state is then either ρ_e(M_e(t − 1)(h^t_{t−1})) or τ^∅_e, we obtain the claimed expression, as required. This completes the proof.

C.2 Proof of Corollary 3.2
Using the result of Theorem 3.1, the expected quantum state of the elementary link at time t ≥ 1 is given by a sum of the states ρ_e(m) weighted by the probabilities Pr[X_e(t) = 1, M_e(t) = m]_π, as in (C.54). To obtain the last equality there, we rearranged the sum with respect to the set {h_t ∈ {0, 1}^{2t−1} : X_e(t)(h_t) = 1} so that the sum is with respect to the possible values of the memory time m, which in general depend on the policy π. This completes the proof.

C.3 Proof of Theorem 3.3
To see the first equality in (3.15), observe that the expression on its right-hand side is equal to Pr[X_e(t) = 1]_π by definition of the random variable X_e(t). The second equality in (3.15) holds because X_e(t) is a binary/Bernoulli random variable.
To see (3.16), we first use the definition of expectation to write E[M_e(t)]_π as a sum with respect to all possible values of the random variable M_e(t), which depend on the policy π. Then, we apply Theorem 3.1, and the result follows because the sum with respect to the set {h_t ∈ {0, 1}^{2t−1} : X_e(t)(h_t) = 1} can be rearranged into a sum with respect to the possible values of the memory time M_e(t) when the elementary link is active. This completes the proof.

C.4 Proof of Theorem 3.4
We start with a lemma.
Lemma C.2. Let G = (V, E) be the graph corresponding to the elementary links of a quantum network, let e ∈ E, let ψ ≡ |ψ⟩⟨ψ| be a pure target state, and let π = (d_1, d_2, . . . , d_T) be a T-step policy with T ≥ 1. Then, the expected fidelity E[F̂_e(T + 1)]_π can be evaluated via the backward recursion in (C.62) and (C.63).

Remark C.3. The functions v^π_t that we have defined in the statement of the lemma can be thought of as analogous to action-value functions in classical Markov decision processes; see, e.g., [51,52]. Also, observe that (C.62) and (C.63) specify a backward recursion algorithm for evaluating a given policy. The algorithm proceeds by first evaluating the function v^π_{T+1}, then proceeding backwards, calculating v^π_t for all T ≥ t ≥ 2, in order to finally obtain E[F̂_e(T + 1)]_π.
Proof of Lemma C.2. Using (C.32) and (C.21), we write the expected fidelity as a sum over histories. From this expression, we can separate out the sum with respect to x_1, a_1 ∈ {0, 1}, which defines the function v^π_2(x_1, a_1). Then, separating the sum with respect to x_2, a_2 ∈ {0, 1} in the resulting expression, and proceeding in this manner, we define the functions v^π_t(h_{t−1}, a_{t−1}) for all 2 ≤ t ≤ T. Now, it follows from Theorem 3.1 that v^π_{T+1}(h_T, a_T) has the form in (C.63), with y = N^succ_e(T + 1)(h_T, a_T, 1) and x = N^req_e(T + 1)(h_T, a_T, 1). Therefore, the recursion in (C.62) holds, and unwinding it reproduces the expected fidelity. This completes the proof.
Remark C.4. There is an advantage to using the backward recursion algorithm (as presented in Lemma C.2) to evaluate a policy, rather than simply using the definition of the expected fidelity in (C.32) or (C.33). This advantage comes from the fact that the function v^π_{T+1} defined in (C.63) is independent of the policy π; it depends only on the elements of the environment and on the horizon time. Therefore, for a given elementary link specified by the edge e, and a given horizon time T, the function values v^π_{T+1}(h_T, a_T) ≡ v_{T+1}(h_T, a_T) can be computed once and need never be computed again. Then, given a T-step policy π, the backward recursion algorithm can be used to quickly evaluate the expected fidelity.
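The backward recursion of Lemma C.2 can be illustrated with a small Python sketch (ours, not from the paper). To keep it self-contained we use the simplest reward, namely 1 when the link is active at time T + 1 (i.e., we evaluate Pr[X_e(T + 1) = 1]_π rather than a fidelity-weighted reward), together with the transition probabilities derived above.

```python
def transition_prob(x_next, x, a, p):
    # Pr[X(t+1) = x_next | X(t) = x, A(t) = a]: a request (a = 1) succeeds
    # with probability p; waiting (a = 0) leaves the status unchanged.
    if a == 1:
        return p if x_next == 1 else 1.0 - p
    return 1.0 if x_next == x else 0.0

def evaluate_policy(policy, T, p):
    """Pr[X(T+1) = 1] under `policy`, via backward recursion over histories.

    policy(t, h) -> action in {0, 1}, with h = (x_1, a_1, ..., x_t).
    The recursion is exponential in T; it is meant for small horizons only.
    """
    def v(t, h):
        x, a = h[-1], policy(t, h)
        if t == T:   # v_{T+1}: expected terminal reward given (h_T, a_T)
            return sum(transition_prob(x1, x, a, p) * x1 for x1 in (0, 1))
        return sum(transition_prob(x1, x, a, p) * v(t + 1, h + (a, x1))
                   for x1 in (0, 1))
    # the initial request A_e(0) = 1 generates the status at time 1
    return sum(transition_prob(x1, 0, 1, p) * v(1, (x1,)) for x1 in (0, 1))
```

As a sanity check: for the policy that requests exactly when the link is inactive, the link is inactive at time T + 1 only if all T + 1 requests fail, so the routine returns 1 − (1 − p_e)^{T+1}.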
Proof of Theorem 3.4. We start with the backward recursion algorithm given in Lemma C.2. Let π = (d_1, d_2, . . . , d_T) denote an arbitrary T-step policy, and let us define π^(t) := (d_t, d_{t+1}, . . . , d_T) to be the "slices" of π from time t onwards. By observing that v^π_t depends only on the policy from time t onwards, i.e., on π^(t), we find that the maximization of the expected fidelity over all policies π reduces to a nested maximization over the slices, beginning with v^{π^(2)}_2(x_1, a_1) in (C.80). Then, using (C.62), we can perform the maximization for all 2 ≤ t ≤ T, h_{t−1} ∈ {0, 1}^{2t−3}, and a_{t−1} ∈ {0, 1}. By defining functions that do not depend on π (see Remark C.4), we see that the optimization problem reduces to the one in (C.84) and (C.85), for all 2 ≤ t ≤ T, h_{t−1} ∈ {0, 1}^{2t−3}, and a_{t−1} ∈ {0, 1}, with M_{t;a_t}^{A_t} = |a_t⟩⟨a_t|. Now, observe that the objective functions in (C.84) and (C.85) can be rewritten so that the maximization over decision functions becomes a pointwise maximization over actions a_t ∈ {0, 1}, as in (C.94). The optimal action at the t-th time step for the history h_t ∈ {0, 1}^{2t−1} is then given by the value a_t that achieves the maximum in (C.94), which gives us the result in (3.21). This completes the proof.

C.5 Proof of Theorem 3.5
Let t ≥ 1, and consider a policy π up to time t. With respect to this policy, the classical-quantum state of the elementary link is given by (3.5). Now, for an arbitrary decision function d_t corresponding to the decision at time t, we obtain the classical-quantum state of the elementary link at time t + 1 in (C.97). The relevant objective function contains the terms ⟨ψ|T^{x_t, a_t, 1}_e(σ̃^π_e(t; h_t))|ψ⟩ |a_t⟩⟨a_t| in (C.101), which gives an optimization problem of the form in (C.91), with an associated optimal decision function. So the task is to determine which of the two quantities, x_t f_e(M_e(t)(h_t) + 1) and p_e f_e(0), is higher, where p_e is the success probability. If the elementary link is not active at time t, meaning that x_t = 0, then requesting a link, i.e., selecting a_t = 1, gives a higher value than selecting a_t = 0 (because the latter leads to a value of zero for the objective function in (C.104) for all p_e > 0). On the other hand, if the elementary link is active at time t, then the task is to compare f_e(M_e(t)(h_t) + 1) and p_e f_e(0) for every history h_t ∈ {0, 1}^{2t−1}. Which of these two quantities is higher (and thus which action is taken) depends on the success probability p_e ∈ (0, 1), the noise model of the quantum memory, and the target pure state ψ. We conclude that the decision function d^FR defined in this way is optimal, which completes the proof.
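The resulting decision rule is simple enough to state in code. The sketch below is our illustration: the memory fidelity function is a stand-in (an assumed exponential-decay model), not a model taken from the paper.

```python
import math

def fidelity_ranked_action(x, m, p, f):
    """Action from the comparison above: if the link is inactive (x = 0),
    always request; if it is active with memory time m, keep it only when
    its aged fidelity f(m + 1) still beats the expected value p * f(0) of
    a fresh request (p is the success probability p_e)."""
    if x == 0:
        return 1                      # inactive: request a new link
    return 0 if f(m + 1) >= p * f(0) else 1

# Assumed (illustrative) memory model: fidelity decays exponentially
# with the number of time steps spent in memory.
decay = lambda m: 0.5 + 0.5 * math.exp(-m / 10)
```

With this illustrative model and p_e = 0.6, the rule keeps the link up to memory time 15 and requests a fresh link from memory time 16 onwards, showing how the threshold depends jointly on p_e and the memory noise model.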
The success rate is simply the ratio of the number of successful transmissions when a request is made to the total number of requests made within time t. We let A_e(0) ≡ 1.

Appendix D The memory-cutoff policy

Next, let us consider what the histories h_t look like through a particular example. Consider an elementary link with cutoff t* = 3, and let us consider the status of the elementary link up to time t = 10. Given that each elementary link request succeeds with probability p_e and fails with probability 1 − p_e, in Table 1 we write down the probability for each sequence of elementary link statuses according to the formula in (3.8). Note that we only include those histories that have non-zero probability (indeed, some sequences h_t = (x_1, a_1, . . . , a_{t−1}, x_t) ∈ {0, 1}^{2t−1} have zero probability with respect to the memory-cutoff policy). We also include in the table the memory times M_e(t), which are calculated using the formula in (C.35). Since the memory-cutoff policy is deterministic, it suffices to keep track only of the elementary link statuses (x_1, . . . , x_t) and not the action values, because the action values are given deterministically by the elementary link statuses.

For the elementary link status sequences, we define two quantities that are helpful for obtaining analytic formulas for the figures of merit defined in Section 3.2. The first quantity, Y^{t*}_e(t), is the number of full blocks of ones (having length t* + 1) in elementary link status sequences up to time t − 1. The values that Y^{t*}_e(t) can take are 0, 1, . . . , ⌊(t − 1)/(t* + 1)⌋ if t* < ∞, and 0 if t* = ∞. The second quantity, Z^{t*}_e(t), is the number of trailing ones in elementary link status sequences up to time t. The values that Z^{t*}_e(t) can take are 0, 1, . . . , t* + 1 if t* < ∞, and 0, 1, . . . , t if t* = ∞.
Using the random variables Y^{t*}_e(t) and Z^{t*}_e(t), along with the general formula in (3.8), we obtain the following formula for the probability of histories with non-zero probability.
Proposition D.1. For every time t ≥ 1, cutoff t* ∈ N_0, success probability p_e ∈ [0, 1], and history h_t = (x_1, a_1, x_2, a_2, . . . , a_{t−1}, x_t) ∈ {0, 1}^{2t−1} with non-zero probability,

Pr[h_t]_{t*} = p_e^{Y^{t*}_e(t)(h_t) + 1 − δ_{Z^{t*}_e(t)(h_t), 0}} (1 − p_e)^{t − Y^{t*}_e(t)(h_t)(t* + 1) − Z^{t*}_e(t)(h_t)},   (D.15)

where Y^{t*}_e(t)(h_t) is defined to be the number of full blocks of ones of length t* + 1 up to time t − 1 in the sequence (x_1, x_2, . . . , x_t) of elementary link statuses, and Z^{t*}_e(t)(h_t) is defined to be the number of trailing ones in the sequence (x_1, x_2, . . . , x_t). For t* = ∞,

Pr[h_t]_∞ = p_e^{1 − δ_{Z^∞_e(t)(h_t), 0}} (1 − p_e)^{t − Z^∞_e(t)(h_t)}.   (D.16)

Proof. The result in (D.15) follows immediately from the formula in (3.8) by observing that N^succ_e(t)(h_t) = Y^{t*}_e(t)(h_t) + 1 − δ_{Z^{t*}_e(t)(h_t), 0}, and that the number of failed requests is the number of zeros in the status sequence, namely t − Y^{t*}_e(t)(h_t)(t* + 1) − Z^{t*}_e(t)(h_t). For t* = ∞, we always only have trailing ones in the elementary link status sequences, so that Y^∞_e(t)(h_t) = 0 for all t ≥ 1 and every history h_t. The result in (D.16) then follows.
Next, let us count the number of elementary link status sequences with non-zero probability. Using Table 1 as a guide, we obtain the following.
Lemma D.2. For every time t ≥ 1 and every cutoff t* ∈ N_0 ∪ {∞}, let Ξ(t; t*) denote the set of elementary link status sequences for the t* memory-cutoff policy that have non-zero probability. Then, for t* ∈ N_0, the number of elements in the set Ξ(t; t*) is

|Ξ(t; t*)| = \sum_{x \geq 0} \binom{t-1-xt^*}{x} \mathbb{1}_{t-1-x(t^*+1) \geq 0} + \sum_{k=1}^{\min(t, t^*+1)} \sum_{x \geq 0} \binom{t-k-xt^*}{x} \mathbb{1}_{t-k-x(t^*+1) \geq 0}.   (D.17)

For t* = ∞, |Ξ(t; ∞)| = 1 + t.
Proof. We start by counting the number of elementary link status sequences when the number of trailing ones is equal to zero, i.e., when k ≡ Z^{t*}_e(t)(h_t) = 0. If we also let the number x ≡ Y^{t*}_e(t)(h_t) of full blocks of ones in time t − 1 be equal to one, then there are t* + 1 ones and t − t* − 2 zeros up to time t − 1. The total number of elementary link status sequences is then equal to the number of ways that the single block of ones can be moved around in the elementary link status sequence up to time t − 1. This quantity is equivalent to the number of permutations of t − 1 − t* objects with t − t* − 2 of them being identical (these are the zeros), which is given by \binom{t-1-t^*}{1} = t − 1 − t*. This gives the x = 1, k = 0 term in the sum in (D.17) (the x = 0, k = 0 term is the all-zeros sequence, of which there is exactly one). If we stick to k = 0 but now consider more than one full block of ones in time t − 1 (i.e., let x ≡ Y^{t*}_e(t)(h_t) ≥ 1), then the number of elementary link status sequences is given by a similar argument as before: it is equal to the number of ways of permuting t − 1 − xt* objects, with x of them being identical (the blocks of ones) and the remaining t − 1 − x(t* + 1) objects also identical (the zeros), i.e., \binom{t-1-xt^*}{x}. The total number of elementary link status sequences with zero trailing ones is therefore

\sum_{x \geq 0} \binom{t-1-xt^*}{x} \mathbb{1}_{t-1-x(t^*+1) \geq 0}.   (D.20)

Let us now consider the case k ≡ Z^{t*}_e(t)(h_t) > 0. Then, the number of time slots in which full blocks of ones can be shuffled around is t − k. If there are x blocks of ones in time t − k, then by the same arguments as before, the number of such elementary link status sequences is given by the number of ways of permuting t − k − xt* objects, with x of them being identical (the full blocks of ones) and the remaining t − k − x(t* + 1) of them also identical (these are the zeros up to time t − k). In other words, the number of elementary link status sequences with k > 0 and x ≥ 0 is

\binom{t-k-xt^*}{x} \mathbb{1}_{t-k-x(t^*+1) \geq 0}.   (D.21)

We must include the indicator function \mathbb{1}_{t-k-x(t^*+1) \geq 0} in order to ensure that the binomial coefficient makes sense.
This also means that, depending on the time t, not all values of k between 0 and t* + 1 can be considered in the total number of elementary link status sequences (simply because it might not be possible to fit all possible numbers of trailing ones and full blocks of ones within that amount of time). By combining (D.20) and (D.21), we obtain the desired result.
In the case t* = ∞, because there are never any full blocks of ones and only trailing ones, we have t elementary link status sequences, each containing k trailing ones, where 1 ≤ k ≤ t. We also have the elementary link status sequence consisting of all zeros, giving a total of t + 1 elementary link status sequences.
Remark D.3. Note that when t* = 0, we get

|Ξ(t; 0)| = 2^t.   (D.25)

In other words, when t* = 0, all t-bit strings are valid elementary link status sequences.
For t ≤ t* + 1, no full blocks of ones in time t − 1 are possible, so we get |Ξ(t; t*)| = 1 + t. This coincides with the result for t* = ∞, because when t* = ∞ the condition t ≤ t* + 1 is satisfied for all t ≥ 1.
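Both counting results in this remark (2^t sequences for t* = 0, and 1 + t for t ≤ t* + 1, in particular for t* = ∞) can be cross-checked by brute force. The sketch below is ours, not from the paper: it enumerates the status sequences with non-zero probability by following the memory-cutoff policy and branching on the outcome of every request.

```python
import math

def status_sequences(t, cutoff):
    """All length-t status sequences (x_1, ..., x_t) with non-zero
    probability under the memory-cutoff policy: request whenever the link
    is inactive or has been held in memory for `cutoff` time steps
    (cutoff = math.inf gives the t* = infinity policy)."""
    seqs = []
    def extend(seq, m):               # m = memory time, -1 if link inactive
        if len(seq) == t:
            seqs.append(tuple(seq))
            return
        if m == -1 or m >= cutoff:    # request: branch on the two outcomes
            extend(seq + [1], 0)      # success: fresh link, memory time 0
            extend(seq + [0], -1)     # failure: link inactive
        else:                         # wait: link stays active, memory ages
            extend(seq + [1], m + 1)
    extend([], -1)                    # A_e(0) = 1: a request precedes time 1
    return seqs
```

For example, len(status_sequences(t, 0)) equals 2^t, and len(status_sequences(t, math.inf)) equals t + 1, in agreement with this remark.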

D.1 Proof of Theorem 4.2
We start with the proof of the claimed expressions for Pr[M_e(t) = m, X_e(t) = 1]_{t*}. The number of elementary link status sequences with memory time m (i.e., with k = m + 1 trailing ones) and x full blocks of ones is as given by (D.21), and the probability of each such elementary link status sequence is p_e^{x+1} (1 − p_e)^{t − (m+1) − x(t*+1)}. By summing with respect to all 0 ≤ x ≤ ⌊(t − 1)/(t* + 1)⌋, we obtain the desired result.
We now prove the claimed expressions for Pr[M_e(t) = m, X_e(t) = 0]_{t*}. For finite t*, when t ≤ t* + 1, there is only one elementary link status sequence ending with a zero, namely the sequence consisting of all zeros, which has probability (1 − p_e)^t. Furthermore, since the value of the memory time for this sequence is equal to −1, only the case M_e(t) = −1 has non-zero probability. When t > t* + 1, we can again have non-zero probability only for M_e(t) = −1. In this case, because every elementary link status sequence has to end with a zero, we must have Z^{t*}_e(t) = 0. Therefore, using (D.15), along with (D.20), we obtain the desired result.
For t* = ∞, only the elementary link status sequence consisting of all zeros ends with a zero, and in this case we have M_e(t) = −1. The result then follows.
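As a cross-check on Theorem 4.2, the joint distribution of (M_e(t), X_e(t)) can be computed by brute-force enumeration for small t. The sketch below is ours (not from the paper); it follows the memory-cutoff policy and accumulates the probability of every branch.

```python
def memory_distribution(t, cutoff, p):
    """Joint distribution Pr[M_e(t) = m, X_e(t) = x] under the memory-cutoff
    policy, by exhaustive enumeration (illustration for small t only)."""
    dist = {}
    def extend(n, m, prob):
        # n = current time; m = memory time (-1 if the link is inactive)
        if n == t:
            x = 0 if m == -1 else 1
            dist[(m, x)] = dist.get((m, x), 0.0) + prob
            return
        if m == -1 or m >= cutoff:        # request: success / failure branches
            extend(n + 1, 0, prob * p)
            extend(n + 1, -1, prob * (1 - p))
        else:                             # wait: link stays active, memory ages
            extend(n + 1, m + 1, prob)
    extend(0, -1, 1.0)                    # A_e(0) = 1: initial request
    return dist
```

For t ≤ t* + 1, the only way for the link to be inactive at time t is for all t requests to fail, so the routine assigns probability (1 − p_e)^t to (M_e(t), X_e(t)) = (−1, 0), in agreement with the proof above.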

D.2 Proof of Theorem 4.4
We start by proving (4.12). Since we consider the limit t → ∞, it suffices to consider the expression for Pr[X_e(t) = 1]_{t*} in (4.9) for t > t* + 1. Also due to the t → ∞ limit, we can disregard the indicator function in (4.9); the resulting expression is (D.30). Next, we take the binomial expansion of (1 − p_e)^{t − k − (t*+1)x} and substitute it into (D.30). Now, for brevity, let a ≡ t − k, and let us focus on the inner sum. Expanding its binomial coefficients, we can express the result in terms of Stirling numbers of the second kind. For i < ℓ, the corresponding Stirling numbers vanish, and the Stirling number with both arguments equal to ℓ is equal to 1. Altogether, we find that the inner sum is independent of a = t − k for all ℓ ≥ 0. Substituting this result into (D.33), and taking the t → ∞ limit, we obtain (4.12), as required.
The proof of (4.10) is very similar to the proof of (4.12). Using the result of Theorem 4.2, we consider the limit t → ∞ of the expression for Pr[X_e(t) = 1, M_e(t) = m]_{t*}. Using the binomial expansion of (1 − p_e)^{t − (m+1) − x(t*+1)}, exactly as in the proof of (4.12), we obtain (4.10), as required. The proof of (4.11) is similar.

D.3 Other figures of merit
We now consider the figures of merit defined in Section C.6 in the context of the memory-cutoff policy.

D.3.1 Waiting time
Let us consider the expected waiting time for an elementary link in a quantum network undergoing the memory-cutoff policy. As a check, let us observe the following:
• If t_req = 0, then because Pr[M_e(1) = −1, X_e(1) = 0]_{t*} = 1 − p_e for all t* ∈ N_0 (see Theorem 4.2), we obtain E[W_e(0)]_{t*} = 1/p_e, as expected. We get the same result for t* = ∞.
• If t* = 0, then we get Pr[M_e(t_req + 1) = −1, X_e(t_req + 1) = 0]_0 = 1 − p_e for all t_req ≥ 0 (see Theorem 4.2), which means that E[W_e(t_req)]_0 = 1/p_e for all t_req ≥ 0. This makes sense, because in the t* = 0 memory-cutoff policy the quantum state of the elementary link is never held in memory.
Proof. Using (C.108), we obtain the expression in (D.50). Taking the limit t_req → ∞ then gives (D.57). See Figure 4 for plots of the expected waiting time, given by (D.50), as a function of the request time t_req for various values of t*. As long as t* is strictly greater than zero, the waiting time is strictly less than 1/p_e, despite the oscillatory behavior for small values of t_req. In the limit t_req → ∞, we see that the waiting time is monotonically decreasing with increasing t*, which is also apparent from (D.57).

D.3.2 Success rate
Let us now consider the expected success rate for an elementary link undergoing the memory-cutoff policy, with the formulas for E[S_e(t)]_{t*} (for t ≤ t* + 1 and for t > t* + 1) stated in Theorem D.5.

Proof. We start with the observation that, for every history h_t, the number of successful requests can be written in terms of the number Y^{t*}_e(t)(h_t) of blocks of ones of length t* + 1 and the number Z^{t*}_e(t)(h_t) of trailing ones in the elementary link status sequence corresponding to h_t as

N^succ_e(t)(h_t) = Y^{t*}_e(t)(h_t) + 1 − δ_{Z^{t*}_e(t)(h_t), 0}.   (D.60)

Similarly, the total number of failed requests is equal to the number of zeros in the status sequence, namely t − Y^{t*}_e(t)(h_t)(t* + 1) − Z^{t*}_e(t)(h_t). Therefore, for t ≤ t* + 1, we obtain the desired result, where the last equality follows by a change of summation variable. For t > t* + 1, we use (D.63) again, keeping in mind this time that the number of trailing ones can be equal to zero, to obtain the desired result.

See Figure 5 for a plot of the expected success rate E[S_e(t)]_{t*} as a function of time for various values of the cutoff t*. We find that the rate has essentially the shape of a decaying square wave, which is clearer for larger values of the cutoff. In particular, the "plateaus" in the curves have a period of t* + 1 time steps. Consider the values of these plateaus. The largest plateau can be found by considering the case t* = ∞, because in this case the condition t ≤ t* + 1 is satisfied for all t ≥ 1, and it is when this condition holds that the largest plateau occurs; its value follows from Theorem D.5 with t* = ∞, for all p_e ∈ (0, 1). In the case t* ∈ N_0, as we see in Figure 5, the plateau values can be expressed in terms of the hypergeometric function 2F1(a, b, c, z) for all x ≥ 0. Then, using the fact that lim_{x→∞} 2F1(1, 1, 2 + x, 1 − p_e) = 1 [148], we conclude that the plateaus approach the value of p_e, i.e.,

lim_{t→∞} E[S_e(t)]_{t*} = p_e,   t* ∈ N_0.   (D.72)