Quantum Network Discrimination

Discrimination between objects, in particular quantum states, is one of the most fundamental tasks in (quantum) information theory. Recent years have seen significant progress towards extending the framework to point-to-point quantum channels. However, with technological progress the focus of the field is shifting to more complex structures: quantum networks. In contrast to channels, networks allow for intermediate access points where information can be received, processed and reintroduced into the network. In this work we study the discrimination of quantum networks and its fundamental limitations. In particular, when multiple uses of the network are at hand, the roster of available strategies becomes increasingly complex. The simplest quantum network that captures the structure of the problem is given by a quantum superchannel. We discuss the available classes of strategies when considering $n$ copies of a superchannel and give fundamental bounds on the asymptotically achievable rates in an asymmetric discrimination setting. Furthermore, we discuss achievability, symmetric network discrimination, the strong converse exponent, generalization to arbitrary quantum networks and finally an application to an active version of the quantum illumination problem.


Introduction
Hypothesis testing not only allows us to investigate the usually unavoidable error occurring when discriminating between two possible quantum states or channels. The framework has also proven useful in giving bounds, determining properties and proving operational interpretations of quantities such as the capacity of a quantum channel [1,2], the entanglement in a quantum state [3] and many more [4,5]. In the case of discriminating two quantum states, a wide body of literature is available determining the optimal single-copy and asymptotic errors in several different scenarios, see e.g. [6,7]. In particular, it is known that when several copies of a quantum state are available, measuring each copy individually does not lead to the optimal error; rather, one needs to use a joint measurement.
The case of discriminating between two quantum channels is much more complex because one can also choose the input of the channels in order to facilitate the discrimination. In particular, when multiple copies or uses are available, this additional freedom allows one to pick the inputs adaptively based on earlier outputs of the channel. Consequently, determining the asymptotically optimal error in this setting has long remained an open problem. Recently, a series of publications has finally led to significant progress on the problem, in particular in the asymmetric asymptotic Stein's setting [8,9,10]. While it was known that in the classical setting adaptive strategies and even jointly distributed channel inputs do not lead to any advantage [11], general converse bounds in the quantum setting were only recently shown in [8], which in particular made it possible to extend the classical result to classical-quantum channels, see also [12]. Subsequent work then showed that in the Stein's setting the bounds from [8] are indeed optimal also for general channels [9] and that adaptive strategies do not outperform parallel strategies with entangled inputs [10]. However, entangled inputs are generally necessary and give an advantage over product state inputs [10]. A close investigation of the power of adaptive strategies in several settings followed in [13].
Both quantum states and channels already play an important role when implementing quantum technologies. However, with advances in experimental research and implementations it becomes increasingly relevant to investigate the properties of more complex structures such as quantum networks. These networks allow for even more ways of interacting. In particular, one could at some point receive an intermediate output of the network, process it and then reintroduce it to the remainder of the network. Relevant examples of such networks on different size scales include quantum circuits, distributed computational resources or even a quantum internet. These additional possibilities make the problem even more complex than the channel case. One can start with product or entangled states, use individual or joint measurements and process intermediate access points with individual or joint quantum channels. Additionally, all the different tools can be chosen adaptively, and while channels can be used either in parallel or successively, one can run the first part of a quantum network, get a state from an access point and run it through an entirely different copy of the network before reintroducing it to the next part of the first network. All these possibilities lead to a wide range of available strategies one has to consider when searching for the optimal error rates.
In this work we classify the possible strategies and give bounds on the optimal error when discriminating between networks. We focus on the asymptotic setting and here particularly the asymmetric Stein's setting. For all classes we give converse bounds and discuss their optimality. As evident from the above description the problem is extremely complex, which is why for much of this work we discuss the results for the special case of networks with exactly one intermediate access point. This subclass is also known as superchannels since it can be understood as transforming quantum channels, as input to the access point, into quantum channels. Afterwards we extend the results to general networks and settings beyond Stein's Lemma.
The remainder of this paper is organized as follows. First we discuss the general notation and definitions in Section 2, including definitions of quantum networks and superchannels in Section 2.3. As a warm-up we then briefly discuss one-shot discrimination of superchannels in Section 3. In Section 4 we discuss the primary technical tool of this investigation: amortized divergences for superchannels. The main part of this work is Section 5, where we discuss different classes of strategies for discriminating multiple copies of superchannels with focus on converse bounds in the asymptotic Stein's setting. In Section 6 we discuss several examples, such as classical channels and some channels with particular parameterizations. Next, we discuss other asymptotic settings such as the symmetric Chernoff setting and the strong converse exponent in Section 7. In Section 8 we discuss the generalization to quantum networks with an arbitrary number of access points. Finally we discuss applications including an active variant of quantum illumination in Section 9 and conclude in Section 10.

Preliminaries
Here we introduce our notation and give the relevant definitions needed later. In particular, Section 2.2 introduces the required entropic quantities and Section 2.3 discusses definitions of quantum superchannels and networks.

Setup
Throughout, quantum systems are denoted by capital letters $A$, $B$, etc. and have finite dimensions $|A|$, $|B|$, etc., respectively. Linear operators acting on system $A$ are denoted by $L_A \in \mathcal{L}(A)$ and positive semi-definite operators by $P_A \in \mathcal{P}(A)$. Quantum states of system $A$ are denoted by $\rho_A \in \mathcal{S}(A)$ and pure quantum states by $\Psi_A \in \mathcal{V}(A)$. Quantum channels are completely positive and trace-preserving maps from $\mathcal{L}(A)$ to $\mathcal{L}(B)$ and denoted by $\mathcal{N}_{A\to B} \in \mathcal{Q}(A \to B)$. The Choi state of a quantum channel $\mathcal{N}_{A\to B}$ is a standard concept in quantum information and is defined as $\mathcal{N}_{A\to B}(\Phi_{RA})$, where $\Phi_{RA}$ is the maximally entangled state. Classical systems are denoted by $X$, $Y$, and $Z$ and have finite dimensions $|X|$, $|Y|$, and $|Z|$, respectively. We will drop the indices when denoting quantum states or channels whenever we deem them clear from context. For $p \ge 1$ the Schatten norms are defined for $L_A \in \mathcal{L}(A)$ as
$$\|L_A\|_p := \left(\mathrm{Tr}\left[|L_A|^p\right]\right)^{1/p}, \qquad |L_A| := \sqrt{L_A^\dagger L_A}.$$

Quantum networks and superchannels
The main objects of study in this work are quantum networks, which can be seen as maps taking channels as input and outputting a quantum channel. Such networks have previously been considered in various contexts in quantum information theory [25,26,27,28,29]. We aim to determine the optimal discrimination errors and therefore assume a single experimenter who has access to the entire network, hence leaving settings with several distributed experimenters for future work. In [28] it was shown that every quantum network as considered in this work can be described by a sequence of $k$ quantum channels. Such a sequence takes the shape of a comb and is therefore called a $k$-comb, which has $k-1$ access points; compare also Figure 1. We often use the convenient notation $\Theta^k \equiv (\mathcal{N}_i)^k$ to denote $k$-combs. Sometimes we call the individual channels $\mathcal{N}_i$ the components of $\Theta^k$. Let $\mathcal{Q}(C \to D)$ be the set of quantum channels from $C$ to $D$; then a $k$-comb acts on $k-1$ channels $(\mathcal{A}_i)^{k-1}$, one for each access point, and outputs a channel in $\mathcal{Q}(C \to D)$. Whenever the resulting channel acts on a state we denote that by $\Theta^k((\mathcal{A}_i)^{k-1})(\rho)$. Sometimes we will need parts of $k$-combs and we denote by $\Theta^{k,m}$ the $m$-comb given by the first $m$ components of $\Theta^k$. Generally, quantum networks constructed as described here naturally preserve complete positivity and trace preservation, i.e. if we input quantum channels the output is again a quantum channel. In this work we will mostly consider 2-combs, also known as superchannels [30], as they capture the important features of our problem. For simplicity we will usually denote 2-combs as $\Theta \equiv \Theta^2$ and we will often use the decomposition $(\mathcal{E}_{C\to AS}, \mathcal{D}_{BS\to D})$. Generally we will drop subscripts when the systems are clear from the context. If one takes e.g. a quantum circuit and writes it in the above form, it is often not a priori clear whether a certain gate should belong to the channel $\mathcal{E}$ or $\mathcal{D}$, meaning many different networks can describe the same circuit. Here, we assume that a description of the network is given.
Nevertheless, for many examples it will be useful to make this ambiguity explicit by introducing a side-channel with which we parametrize a superchannel as $(\mathcal{E}_{C\to AS}, \mathcal{S}_{S\to S'}, \mathcal{D}_{BS'\to D})$. For example depictions of superchannels see Figures 2 and 8. It is however worth pointing out that one can limit the possible descriptions without losing generality. First notice that we can always extend $\mathcal{E}$ to an isometry $V_{\mathcal{E}}$ and extend $\mathcal{D}$ to also trace out the additional system. This does not change the implemented superchannel [29]. An analogous statement for general networks had previously been observed in [31,32]. Furthermore, it was shown in [33, Theorem 2] that all parameterizations $(V_{\mathcal{E}}, \mathcal{D})$ with minimal system size $|S|$ are unique up to the choice of a unitary.
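As a purely classical toy illustration of the decomposition $(\mathcal{E}_{C\to AS}, \mathcal{D}_{BS\to D})$, the following sketch composes row-stochastic matrices as $\Theta(\mathcal{N}) = \mathcal{D} \circ (\mathcal{N} \otimes \mathrm{id}_S) \circ \mathcal{E}$. The function name `apply_superchannel`, the nested-list encoding and the example channels are assumptions made here for illustration and are not taken from the text; in the quantum case the stochastic matrices would be replaced by completely positive trace-preserving maps.

```python
def apply_superchannel(enc, dec, chan):
    """Classical analogue of a 2-comb acting on a channel (illustrative only).

    enc[c][a][s] = P(a, s | c)   pre-processing  E: C -> A S
    chan[a][b]   = P(b | a)      input channel   N: A -> B
    dec[b][s][d] = P(d | b, s)   post-processing D: B S -> D
    Returns out[c][d] = P(d | c), the resulting channel C -> D.
    """
    n_c, n_a, n_s = len(enc), len(enc[0]), len(enc[0][0])
    n_b, n_d = len(chan[0]), len(dec[0][0])
    out = [[0.0] * n_d for _ in range(n_c)]
    for c in range(n_c):
        for a in range(n_a):
            for s in range(n_s):
                for b in range(n_b):
                    # Probability of the path c -> (a, s) -> b; the memory s
                    # bypasses the channel N untouched.
                    weight = enc[c][a][s] * chan[a][b]
                    for d in range(n_d):
                        out[c][d] += weight * dec[b][s][d]
    return out


# Hypothetical example: E copies the input bit into the memory S and also
# sends it through N; D outputs the XOR of N's output and the memory bit.
enc = [[[1.0 if (a == c and s == c) else 0.0 for s in range(2)]
        for a in range(2)] for c in range(2)]
dec = [[[1.0 if d == (b ^ s) else 0.0 for d in range(2)]
        for s in range(2)] for b in range(2)]

flip = 0.25  # assumed bit-flip probability of the input channel N
bsc = [[1 - flip, flip], [flip, 1 - flip]]

out = apply_superchannel(enc, dec, bsc)
print(out)
```

In this toy example the resulting channel reports whether the inserted channel flipped the bit, so both of its rows equal the flip statistics of the inserted channel.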

Single-copy superchannel discrimination
As a warm-up to the topic of this work, we will in this section discuss the single-copy problem of quantum network discrimination. For related discussions and more background see also [34,28,35]. It is well known that in symmetric state discrimination the optimal one-shot error is related to the trace distance [36,37,38]. Here we want to consider the case of two quantum superchannels. That is, an experimenter has access to one use of a superchannel, not knowing which of the two available options $\Theta_1$ or $\Theta_2$ it is. In order to decide which is the case, the most general approach is to feed an arbitrary state $\rho_{CR}$ into the superchannel, which itself is applied to a channel $\mathcal{N}_{AR\to BR}$, resulting in an output state $\rho^1_{DR}$ if the superchannel was $\Theta_1$ and $\rho^2_{DR}$ if the superchannel was $\Theta_2$. To the output state one can apply a measurement to determine the superchannel. Without loss of generality we can assume the use of a binary measurement $Q = \{Q_1, Q_2 = 1 - Q_1\}$. As usual in the hypothesis testing setting, one is left with two possibilities of erroneously determining the result: the Type-1 and the Type-2 error, given by $\alpha(S) = \mathrm{Tr}(Q_2 \rho^1_{DR})$ and $\beta(S) = \mathrm{Tr}(Q_1 \rho^2_{DR})$, respectively. Throughout this manuscript $S$ will denote the chosen strategy, in this case the set $S = \{\rho, \mathcal{N}, Q\}$.
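The most common setting in the single-copy case is to minimize the average error. For a fixed channel $\mathcal{N}_{AR\to BR}$ and equal prior probabilities, it follows directly from the channel case that the probability of error is given by
$$p_e(\mathcal{N}) = \frac{1}{2}\left(1 - \frac{1}{2}\left\|(\Theta_1\otimes\mathrm{id}_R)(\mathcal{N}) - (\Theta_2\otimes\mathrm{id}_R)(\mathcal{N})\right\|_\diamond\right),$$
where we used the usual diamond norm for quantum channels. Of course we are also allowed to optimize over the channel $\mathcal{N}$, leading to the optimal one-shot error probability
$$p_e = \frac{1}{2}\left(1 - \frac{1}{2}\sup_{\mathcal{N}}\left\|(\Theta_1\otimes\mathrm{id}_R)(\mathcal{N}) - (\Theta_2\otimes\mathrm{id}_R)(\mathcal{N})\right\|_\diamond\right),$$
which motivates the definition of a diamond norm equivalent on superchannels,
$$\|\Theta_1 - \Theta_2\|_\diamond := \sup_{\mathcal{N}}\left\|(\Theta_1\otimes\mathrm{id}_R)(\mathcal{N}) - (\Theta_2\otimes\mathrm{id}_R)(\mathcal{N})\right\|_\diamond.$$
Note that a priori the optimization is over channels with an arbitrarily large reference system $R$. For this reason we are also free to omit the additional reference system usually attached to the state, which does not pass through the channel $\mathcal{N}$, as the latter already includes channels of the form $\mathcal{N} \otimes \mathrm{id}$, a fact that will later reemerge when discussing superchannel entropies. As a simple example we can consider replacer superchannels $\Theta_{\mathcal{R}}$ that act as $\Theta_{\mathcal{R}}(\mathcal{N}) = \mathcal{R}$ for every $\mathcal{N}$ and a particular fixed channel $\mathcal{R}$. Those superchannels generalize the commonly considered replacer channels and their implementation is shown in Figure 2. We easily get that for two replacer superchannels $\Theta_{\mathcal{R}_1}$ and $\Theta_{\mathcal{R}_2}$ we have $\|\Theta_{\mathcal{R}_1} - \Theta_{\mathcal{R}_2}\|_\diamond = \|\mathcal{R}_1 - \mathcal{R}_2\|_\diamond$. Replacer superchannels will be considered in more detail in Section 6.3. Now, the same can easily be done for two arbitrary quantum networks. The main difference is that when defining the diamond norm for $k$-combs one has to optimize over $(k-1)$-combs instead of the quantum channel $\mathcal{N}$ (or alternatively one could optimize over $k-1$ separate quantum channels).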
In contrast to the symmetric case, one is often interested in minimizing one error, say the Type-2 error, while only bounding the Type-1 error to be below a certain constant $\epsilon$. Similarly to the discussion above for the diamond norm, this motivates us to define a superchannel hypothesis testing relative entropy that extends the state and channel case [1,39] as follows,
$$D_H^{\epsilon}(\Theta_1\|\Theta_2) := -\log \inf_{S}\left\{\beta(S) : \alpha(S) \le \epsilon\right\},$$
where we again optimize over strategies $S = \{\rho, \mathcal{N}, Q\}$. As before, a version for networks is defined similarly. As we will see in the remainder of this work, the picture becomes much more complicated if we allow for multiple uses of the quantum superchannel or network since they can be combined in a variety of different configurations. To proceed, we will first introduce the relevant entropic distance measures on superchannels in the next section.

Figure 2: Depiction of superchannels: a) A general superchannel (or 2-comb) $\Theta$ acting on a channel $\mathcal{N}$. b) Any superchannel $\Theta$ can be understood as two quantum channels $\mathcal{E}_\Theta$, $\mathcal{D}_\Theta$ connected by an auxiliary system $S$. c) Implementation of a replacer superchannel: For any input $\mathcal{N}$, we get $\Theta(\mathcal{N}) = \mathcal{R}$. We fix $\tau$ to be a maximally mixed state.
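For replacer superchannels the one-shot problem reduces to discriminating the two fixed channels that replace the input. The following minimal sketch treats the special case where both replaced channels are classical (row-stochastic matrices) and the priors are equal. It assumes, for this illustration only, that the diamond distance of two classical channels is the largest $\ell_1$ distance between corresponding output distributions; the function names and the numerical values are hypothetical.

```python
def l1_distance(p, q):
    """l1 distance between two probability vectors."""
    return sum(abs(x - y) for x, y in zip(p, q))


def classical_diamond_distance(R1, R2):
    """Largest l1 distance between output distributions over all input symbols.

    R1[x] and R2[x] are the output distributions for input symbol x.
    """
    return max(l1_distance(r1, r2) for r1, r2 in zip(R1, R2))


def one_shot_error(R1, R2):
    """Optimal average error for equal priors: (1/2) * (1 - (1/2) * distance)."""
    return 0.5 * (1.0 - 0.5 * classical_diamond_distance(R1, R2))


# Hypothetical classical replacer channels: they differ only on input 0.
R1 = [[0.9, 0.1], [0.5, 0.5]]
R2 = [[0.6, 0.4], [0.5, 0.5]]

p_err = one_shot_error(R1, R2)
print(p_err)
```

Whatever channel is inserted at the access point, it is discarded by a replacer superchannel, so in this toy case the best the experimenter can do is pick the input symbol on which the two replaced channels differ most.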
We want to define distance measures for superchannels based on generalized divergences. We will later see that they are operationally meaningful in terms of superchannel discrimination. As a build-up, we will first discuss generalized divergences for states and channels. The generalization to arbitrary networks is discussed later in Section 8.

... for quantum states
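We say that a function $D : \mathcal{S}(A) \times \mathcal{S}(A) \to \mathbb{R} \cup \{+\infty\}$ is a generalized divergence [40,41] if for arbitrary Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$, arbitrary states $\rho_A, \sigma_A \in \mathcal{S}(A)$, and an arbitrary channel $\mathcal{N}_{A\to B} \in \mathcal{Q}(A \to B)$, the following data-processing inequality holds
$$D(\rho_A\|\sigma_A) \ge D(\mathcal{N}_{A\to B}(\rho_A)\|\mathcal{N}_{A\to B}(\sigma_A)).$$
From this inequality, we find in particular that for all states $\rho_A, \sigma_A \in \mathcal{S}(A)$, $\omega_R \in \mathcal{S}(R)$, the following identity holds [21]
$$D(\rho_A \otimes \omega_R\|\sigma_A \otimes \omega_R) = D(\rho_A\|\sigma_A),$$
and that for an arbitrary isometric channel $\mathcal{U}_{A\to B} \in \mathcal{Q}(A \to B)$, we have that [21]
$$D(\mathcal{U}_{A\to B}(\rho_A)\|\mathcal{U}_{A\to B}(\sigma_A)) = D(\rho_A\|\sigma_A).$$
We call a generalized divergence faithful if the inequality $D(\rho_A\|\rho_A) \le 0$ holds for an arbitrary state $\rho_A \in \mathcal{S}(A)$, and strongly faithful if for arbitrary states $\rho_A, \sigma_A \in \mathcal{S}(A)$ we have $D(\rho_A\|\sigma_A) = 0$ if and only if $\rho_A = \sigma_A$. Moreover, a generalized divergence is sub-additive with respect to tensor-product states if for all $\rho_A, \sigma_A \in \mathcal{S}(A)$ and all $\omega_B, \tau_B \in \mathcal{S}(B)$ we have
$$D(\rho_A \otimes \omega_B\|\sigma_A \otimes \tau_B) \le D(\rho_A\|\sigma_A) + D(\omega_B\|\tau_B),$$
and simply additive if this holds with equality. Examples of generalized divergences of interest are in particular the quantum relative entropy, the Petz-Rényi divergences, the sandwiched Rényi divergences, or the Chernoff distance, as defined in Section 2.2.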
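The properties above can be checked numerically in the commuting (classical) special case, where states become probability vectors, channels become row-stochastic matrices and the relative entropy becomes the Kullback-Leibler divergence. The sketch below is only an illustration under that assumption; the helper names `kl`, `push` and `tensor` and all numerical values are hypothetical.

```python
import math


def kl(p, q):
    """Classical relative entropy D(p||q) in bits (infinite on support mismatch)."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y == 0:
                return math.inf
            total += x * math.log2(x / y)
    return total


def push(p, W):
    """Apply a classical channel W (W[i][j] = P(j | i)) to a distribution p."""
    return [sum(p[i] * W[i][j] for i in range(len(p))) for j in range(len(W[0]))]


def tensor(p, r):
    """Product distribution of p and r."""
    return [x * y for x in p for y in r]


p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]
W = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]  # hypothetical channel, 3 -> 2 symbols
w = [0.6, 0.4]
t = [0.3, 0.7]

before = kl(p, q)
after = kl(push(p, W), push(q, W))        # data processing: after <= before
same_ref = kl(tensor(p, w), tensor(q, w))  # appending the same state changes nothing
joint = kl(tensor(p, w), tensor(q, t))     # additivity on product states
print(before, after, same_ref, joint)
```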

... for quantum channels
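Based on the generalized divergences for states, one can now define divergences for quantum channels. First, we have the generalized channel divergence for two quantum channels $\mathcal{N}, \mathcal{M} \in \mathcal{Q}(A \to B)$,
$$D(\mathcal{N}\|\mathcal{M}) := \sup_{\rho} D(\mathcal{N}_{A\to B}(\rho_{AR})\|\mathcal{M}_{A\to B}(\rho_{AR})),$$
where the optimization is over states $\rho \in \mathcal{S}(AR)$. Since the above quantity is not generally additive [10] it is natural to define a regularized channel divergence as follows,
$$D^{\infty}(\mathcal{N}\|\mathcal{M}) := \lim_{n\to\infty}\frac{1}{n} D(\mathcal{N}^{\otimes n}\|\mathcal{M}^{\otimes n}).$$
Finally, in [8] the amortized channel divergence was introduced as
$$D^{A}(\mathcal{N}\|\mathcal{M}) := \sup_{\rho_{AR},\sigma_{AR}}\left[D(\mathcal{N}_{A\to B}(\rho_{AR})\|\mathcal{M}_{A\to B}(\sigma_{AR})) - D(\rho_{AR}\|\sigma_{AR})\right].$$
All of these definitions have an operational interpretation in terms of an associated channel discrimination task. Notably, in [10] it was proven that in the case of the relative entropy one gets
$$D^{\infty}(\mathcal{N}\|\mathcal{M}) = D^{A}(\mathcal{N}\|\mathcal{M}),$$
and an analogous result was later proven for the sandwiched Rényi relative entropy [42]. We will need some properties, especially of the amortized relative entropy. First we will summarize some results obtained in [8] in the following lemma.
• if $D$ is strongly faithful, then its associated amortized channel divergence is also strongly faithful, i.e. $D^{A}(\mathcal{N}\|\mathcal{M}) = 0$ if and only if $\mathcal{N} = \mathcal{M}$,
• the associated amortized channel divergence is stable under adding an identity, i.e. $D^{A}(\mathrm{id}_R \otimes \mathcal{N}\|\mathrm{id}_R \otimes \mathcal{M}) = D^{A}(\mathcal{N}\|\mathcal{M})$.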
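To build some intuition for amortization, the following sketch looks at the purely classical special case, where channels are row-stochastic matrices and no reference system is needed. There, by the chain rule for the classical relative entropy, the amortized gap $D(\mathcal{N}(p)\|\mathcal{M}(q)) - D(p\|q)$ never exceeds the largest relative entropy between corresponding output distributions, so amortization gives no gain. The helper names, the random sampling and the example channels are assumptions of this illustration.

```python
import math
import random


def kl(p, q):
    """Classical relative entropy D(p||q) in bits (infinite on support mismatch)."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y == 0:
                return math.inf
            total += x * math.log2(x / y)
    return total


def push(p, W):
    """Apply a classical channel W (W[i][j] = P(j | i)) to a distribution p."""
    return [sum(p[i] * W[i][j] for i in range(len(p))) for j in range(len(W[0]))]


def channel_divergence(N, M):
    """Classical channel divergence: best single input symbol."""
    return max(kl(n_row, m_row) for n_row, m_row in zip(N, M))


def amortized_gap(N, M, p, q):
    """Output distinguishability minus the distinguishability invested at the input."""
    return kl(push(p, N), push(q, M)) - kl(p, q)


def random_dist(k, rng):
    """Random full-support distribution on k symbols."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical pair of classical channels on one bit.
N = [[0.9, 0.1], [0.3, 0.7]]
M = [[0.6, 0.4], [0.5, 0.5]]

rng = random.Random(0)
d_channel = channel_divergence(N, M)
worst_gap = max(
    amortized_gap(N, M, random_dist(2, rng), random_dist(2, rng))
    for _ in range(2000)
)
# Identical point-mass inputs recover the unamortized value exactly.
point_mass_gap = amortized_gap(N, M, [1.0, 0.0], [1.0, 0.0])
print(d_channel, worst_gap, point_mass_gap)
```

The random search is only a sanity check and not an optimization over all input pairs; for genuinely quantum channels the amortized and unamortized quantities can differ.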
We now want to state some more properties of the above quantities that will become relevant later.
Proof. This follows from the following chain of arguments, where the first, third and fourth equality are by definition, the second equality by adding a zero, the first inequality by splitting the supremum and the second inequality by widening the set of states which we optimize over. □

Lemma 4.3. With the definitions as above, for any additive divergence the following holds,

Proof. First note that the ≥ direction simply follows by convenient choice of the state subject to optimization and additivity.

Let us now consider two superchannels $\Theta_1$ and $\Theta_2$ for which we would like to define similar measures. To save on writing indices, superchannels will always take a channel from $A$ to $B$ and transform it into a channel from $C$ to $D$. We begin with a generalization of the channel divergence, which is a special case of [9, Definition 1].

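Definition 4.4. For two superchannels $\Theta_1$ and $\Theta_2$ and a generalized divergence $D$, we define the generalized superchannel divergence as
$$D(\Theta_1\|\Theta_2) := \sup_{\mathcal{N} \in \mathcal{Q}(AR\to BR)} D\big((\Theta_1 \otimes \mathrm{id}_R)(\mathcal{N})\,\big\|\,(\Theta_2 \otimes \mathrm{id}_R)(\mathcal{N})\big),$$
where the right-hand side is the generalized channel divergence of the two resulting channels. By $\Theta_1 \otimes \mathrm{id}_R$ we mean that the superchannel does not act on the system $R$; however, we optimize over a channel $\mathcal{N}_{AR\to BR}$ that does. It was shown that the above definition obeys data processing in the sense that it is monotone under transformations that transform superchannels into general quantum networks [9, Theorem 4].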
The above also gives a natural way to define a regularized superchannel divergence as
$$D^{\infty}(\Theta_1\|\Theta_2) := \lim_{n\to\infty}\frac{1}{n} D(\Theta_1^{\otimes n}\|\Theta_2^{\otimes n}),$$
where the optimization is effectively over $n$-party states and channels. However, one could also define intermediate versions where only the states are $n$-partite and the channels are of product form, or vice versa, where the channels are $n$-partite and the states are of product form. We will later see that these definitions are indeed of operational significance in certain superchannel discrimination scenarios. Similarly, we can also define different extensions of the amortized channel divergences depending on whether one wants to amortize the action of the involved states, channels or both.

Definition 4.5. For two superchannels $\Theta_1$ and $\Theta_2$ with decomposition $\{\mathcal{E}_{\Theta_i}, \mathcal{D}_{\Theta_i}\}$ and a generalized divergence $D$, we define the state-amortized, channel-amortized, amortized and fully-amortized superchannel divergences $D^{sA}$, $D^{cA}$, $D^{A}$ and $D^{A*}$, respectively, where the infimum is over arbitrary quantum channels $\mathcal{F}_{C\to R'}$ with $R'$ being an additional reference system.
A rough intuition behind the different quantities is that $D^{sA}$ effectively handles a channel problem and amortizes its input state, $D^{cA}$ amortizes the distinguishability of the channel input but not the state, $D^{A}$ amortizes both channel and state, and finally $D^{A*}$ makes full use of the superchannel structure, amortizing both inputs in terms of channel distinguishability. Before we move on to show their operational meanings in the following sections, we will explore the relation between the different quantities. The next lemma states the intuitive observation that additional amortization can only increase the superchannel divergences.

Lemma 4.6. For a faithful divergence $D$ we have,

Proof. We go through the inequalities from right to left. Starting from $D^{A*}(\Theta_1\|\Theta_2)$ we get to $D^{A}(\Theta_1\|\Theta_2)$ by setting the additional channels to the identity and using data processing to remove the channel $\mathcal{F}$ in the second term. We then get to $D^{sA}(\Theta_1\|\Theta_2)$ by setting $\mathcal{N} = \mathcal{M}$, and then to get $D(\Theta_1\|\Theta_2)$ we use Lemma 4.1. Similarly, we get from $D^{A}(\Theta_1\|\Theta_2)$ to $D^{cA}(\Theta_1\|\Theta_2)$ by using Lemma 4.1 and then to $D(\Theta_1\|\Theta_2)$.

In the next section we will discuss the applications of these quantities in asymptotic quantum superchannel discrimination.

Asymptotic bounds for superchannel discrimination
In this section we will discuss the different strategies that are possible when one allows for multiple uses of the superchannel and what rates can be achieved asymptotically. We will for now focus on the asymptotic asymmetric discrimination setting, also known as Stein's setting; however, the core results will be so-called meta-converses which we prove in terms of generalized divergences. These will later, in Section 7, allow us to apply the results also to other asymptotic discrimination settings. The essential setup is very similar to channel discrimination as described e.g. in [8], but we have to account for the additional freedom provided by the access point and in combining superchannels.
Any strategy, independent of how we combine the superchannels, will ultimately result in either an output state $\rho_1$ if the superchannel was $\Theta_1$ or $\rho_2$ if the superchannel was $\Theta_2$. We measure that output with a binary POVM $Q = \{Q_1, Q_2 = 1 - Q_1\}$, resulting in the usual type I error $\alpha_n(S) = \mathrm{Tr}(Q_2 \rho_1)$ and type II error $\beta_n(S) = \mathrm{Tr}((1 - Q_2)\rho_2)$, where $S$ is the choice of strategy from the set of allowed strategies including the choice of measurement.
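For asymmetric hypothesis testing, we minimize the type II error probability under the constraint that the type I error probability does not exceed a constant $\epsilon \in (0, 1)$. We are then interested in characterizing the non-asymptotic quantity
$$\zeta_n(\epsilon) := \sup_{S}\left\{-\frac{1}{n}\log\beta_n(S) : \alpha_n(S) \le \epsilon\right\},$$
where the supremum is over the set of allowed strategies $S$, as well as the asymptotic quantities
$$\underline{\zeta}(\epsilon) := \liminf_{n\to\infty}\zeta_n(\epsilon), \qquad \overline{\zeta}(\epsilon) := \limsup_{n\to\infty}\zeta_n(\epsilon).$$
If the two limits coincide we call it simply $\zeta(\epsilon)$. Often we will also consider the case where additionally $\epsilon$ goes to zero, $\lim_{\epsilon\to 0}\zeta(\epsilon)$. We will now start our investigation with the simplest possible strategy and then gradually move to more general strategies in the remainder of this section.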

Product strategy
This is the least powerful strategy that we will consider: One fixes $n$ copies of an input state $\rho_{CR}$ and of a channel $\mathcal{N}_{AR\to BR}$ from $AR$ to $BR$, one pair for each copy of the superchannel, as depicted in Figure 3a). Finally the resulting output state is measured by a binary measurement $Q = \{Q_1, Q_2\}$ trying to determine which superchannel was used. It follows that the set of strategies to optimize over is given by the triple $S = \{\rho, \mathcal{N}, Q\}$. The product structure essentially reduces the problem to state discrimination by fixing $\rho$ and $\mathcal{N}$ and discriminating the resulting output states, which are again product states. Therefore the relative entropy gives the optimal rate of discrimination by Stein's lemma for quantum states. Optimizing over all $\rho_{CR}$ and $\mathcal{N}_{AR\to BR}$ shows that for these simple product strategies the optimal error rate is given by the superchannel divergence $D(\Theta_1\|\Theta_2)$. Of course this strategy is extremely restricted and one can easily come up with more general discrimination strategies. It is however useful in setting a baseline, as every more general strategy naturally performs at least as well as the best product strategy. In the following, we will gradually move towards more complex strategies allowing for entangled states, many-party channels and intermediate adaptively chosen operations.
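Once the input state and the channel are fixed, a product strategy is an i.i.d. state discrimination problem. The sketch below illustrates the resulting Stein behaviour in the simplest classical case, where the two output states are Bernoulli distributions: it computes the exact optimal type II error of the randomized Neyman-Pearson test at type I error exactly $\epsilon$ and compares the rate with the relative entropy. The function names and the parameter values are hypothetical, and the code assumes $0 < b < a < 1$ so that the likelihood ratio is increasing in the number of ones.

```python
import math


def binom_pmf(n, k, a):
    """Binomial probability mass, evaluated via logarithms for stability."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(a) + (n - k) * math.log(1 - a))
    return math.exp(log_p)


def relative_entropy(a, b):
    """D(Bernoulli(a) || Bernoulli(b)) in bits."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))


def binary_entropy(x):
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def optimal_type2_rate(a, b, n, eps):
    """Return -(1/n) log2 of the optimal type II error at type I error eps.

    Hypothesis 1: n i.i.d. Bernoulli(a) bits; hypothesis 2: Bernoulli(b).
    Requires 0 < b < a < 1. The acceptance region for hypothesis 1 is filled
    from the largest likelihood ratio (k = n) downwards, with randomization
    on the boundary outcome.
    """
    need = 1.0 - eps  # probability mass under hypothesis 1 still to be accepted
    beta = 0.0
    for k in range(n, -1, -1):
        p_k, q_k = binom_pmf(n, k, a), binom_pmf(n, k, b)
        if p_k <= need:
            beta += q_k
            need -= p_k
        else:
            beta += q_k * need / p_k  # randomized boundary outcome
            break
    return -math.log2(beta) / n


a, b, eps = 0.5, 0.1, 0.1
d = relative_entropy(a, b)
r100 = optimal_type2_rate(a, b, 100, eps)
r400 = optimal_type2_rate(a, b, 400, eps)
print(d, r100, r400)
```

For this choice of parameters the finite-$n$ rates lie below the relative entropy and move towards it as $n$ grows; the approach is slow because of the second-order correction of order $1/\sqrt{n}$.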

Parallel strategy with product channels and successive adaptive strategies
Parallel strategies are more general than the previous product strategy as we allow for joint inputs. Here we start by considering a joint entangled input state to all superchannels but for now keep the limitation to product channels. This approach is already more complicated to analyze; nevertheless it is similar in spirit. One fixes a channel $\mathcal{N}_{AR\to BR}$ for the superchannel to act on. This results in a new channel $(\Theta_i \otimes \mathrm{id}_R)(\mathcal{N})$ and the problem is now equivalent to attempting channel discrimination with a parallel strategy. In this case we know that the optimal rate is the regularized channel relative entropy, and by optimizing over all channels $\mathcal{N}_{AR\to BR}$ we get the optimal rate for superchannel discrimination in the form of the state-regularized channel relative entropy.

Parallel strategies with product channels are, as in the channel case, a special case of certain adaptive strategies. We will later see that while these particular adaptive strategies, as depicted in Figure 4, are the most general in the channel case, they are not for superchannels. One can construct different adaptive strategies than the ones discussed in this section and we therefore name this particular class successive adaptive strategies. As before one can simply fix a channel $\mathcal{N}_{AR\to BR}$ and consider the task at hand a problem of channel discrimination, which gives the amortized relative entropy as optimal error rate. Optimizing over the channels $\mathcal{N}_{AR\to BR}$ gives that the state-amortized relative entropy is the optimal rate achievable with successive adaptive strategies when discriminating superchannels.

It follows from the non-additivity of the channel relative entropy that parallel strategies with product channels can be strictly more powerful than the product strategies in the previous section. Another observation that can be made regarding the amortized quantity is that, similar to the channel version, one can apply the chain rule from [10] to get an upper bound.
In our case we obtain an upper bound in terms of a variant of $D^{\infty}$ without an additional reference system, following the notation in [10]. Optimizing over the channel $\mathcal{N}$ then gives the corresponding bound for superchannels.

Figure 3: Different parallel strategies for n = 3: a) Product strategy. Here both the input state and the intermediate channel are of product form. b) Parallel strategy with product channels. This strategy allows for an entangled input state. c) Parallel strategy with product states. This strategy uses only product states but allows for a joint quantum channel $\mathcal{N}$ (as indicated by the connecting line). d) Parallel strategy: This is the most general parallel strategy, allowing for both an entangled input state and a joint quantum channel $\mathcal{N}$.
Since we know that parallel strategies with product channels are a special case of successive adaptive strategies, we immediately have that the two rates coincide, implying that successive adaptive strategies do not provide an advantage over parallel strategies with product channels. Note that for all results in this section we had to take the limit of $\epsilon \to 0$ as we do not know whether the given rates are strong converse rates. We remark that via the reduction to a channel problem one can easily get strong converse bounds in terms of the max-relative entropy [8] or the amortized geometric Rényi divergence [43], see also Remark 5.4 for more details.
The results in this section are all based on optimizing over product channels. Clearly, we are not making full use of the additional possibilities provided by a superchannel and in the next section we will discuss what happens when we lift this restriction.

Figure 4: The successive adaptive strategy for n = 3.

Parallel strategy with multi-party channels
One gets a first hint that the structure of superchannel discrimination is much richer than that of channel discrimination by considering a parallel strategy for which one allows the use of an n-party channel instead of limiting ourselves to n separate channels, see Figure 3d). It is not clear whether these strategies can be cast as a successive adaptive strategy and therefore one could expect that they provide a strictly more powerful class of strategies for superchannel discrimination.
One of the main ingredients for most of our converse proofs will be the following observation from the original proof of Stein's lemma for quantum states [44], see also [8, Proposition 16]. Let $p$ and $q$ be the binary probability distributions resulting from measuring the final output state of a given discrimination strategy, then
$$D(p\|q) \ge -h_2(\alpha_n) - (1 - \alpha_n)\log\beta_n,$$
where $D(p\|q)$ is the classical relative entropy for two binary probability distributions and $h_2$ denotes the binary entropy. By rearranging the above equation, it follows that
$$-\frac{1}{n}\log\beta_n \le \frac{1}{1 - \alpha_n}\left(\frac{1}{n}D(p\|q) + \frac{h_2(\alpha_n)}{n}\right).$$
From this and data processing of the relative entropy we get by standard arguments that a weak converse in the full parallel setting is given by the regularized superchannel relative entropy.

One can further envision an intermediate strategy with interchanged focus on the optimization, namely one allows for arbitrary channels but restricts the input states to product states, see Figure 3c). It should be noted that also for this simpler strategy it is not clear how to cast it as a successive adaptive strategy. In this case we can use the same approach and get the channel-regularized relative entropy as weak converse rate.

We are left with showing that these rates are also achievable. The main ingredient in this proof will be the following result from [45,46]:
$$D_H^{\epsilon}(\rho^{\otimes n}\|\sigma^{\otimes n}) = n D(\rho\|\sigma) + \sqrt{n V(\rho\|\sigma)}\,\Phi^{-1}(\epsilon) + O(\log n),$$
where $V(\rho\|\sigma) := \mathrm{Tr}(\rho(\log\rho - \log\sigma)^2) - D(\rho\|\sigma)^2$ is the quantum information variance and $\Phi$ is the cumulative normal distribution.
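The binary relative entropy inequality behind the converse can be checked directly. The sketch below assumes the convention $p = (1-\alpha, \alpha)$ and $q = (\beta, 1-\beta)$ for the two decision distributions, with logarithms to base 2, and evaluates the slack of $D(p\|q) \ge -h_2(\alpha) - (1-\alpha)\log\beta$ on a grid. The function names are hypothetical.

```python
import math


def h2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def binary_relative_entropy(alpha, beta):
    """D(p||q) for p = (1 - alpha, alpha) and q = (beta, 1 - beta)."""
    total = 0.0
    for x, y in ((1 - alpha, beta), (alpha, 1 - beta)):
        if x > 0:
            total += x * math.log2(x / y)
    return total


def slack(alpha, beta):
    """Left-hand side minus right-hand side of the inequality."""
    lower_bound = -h2(alpha) - (1 - alpha) * math.log2(beta)
    return binary_relative_entropy(alpha, beta) - lower_bound


def type2_exponent_bound(alpha, beta):
    """Rearranged form: an upper bound on -log2(beta)."""
    return (binary_relative_entropy(alpha, beta) + h2(alpha)) / (1 - alpha)


grid = [i / 50 for i in range(1, 50)]
min_slack = min(slack(a, b) for a in grid for b in grid)
rearranged_ok = all(
    -math.log2(b) <= type2_exponent_bound(a, b) + 1e-9
    for a in grid for b in grid
)
print(min_slack, rearranged_ok)
```

Analytically the slack equals $-\alpha\log(1-\beta)$, which is non-negative, so the grid check is only a sanity test of the convention used.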
Lemma 5.1. For two superchannels $\Theta_1$ and $\Theta_2$, considering fully parallel strategies, the regularized superchannel relative entropy is an achievable rate.

Figure 5: The nested adaptive strategy for n = 3.
Proof. Let us say we have $n := km$ copies of the superchannels. We prepare $k$ copies of an arbitrary input state on $C^m R$ and of a channel $\mathcal{N}$. We now send each state through $m$ parallel superchannels acting on $\mathcal{N}$, leaving us with $k$ copies of the resulting output state. By Equation (56), dividing by $n$ and taking the limit $k \to \infty$ (and therefore $n \to \infty$) gives the relative entropy of a single output state, normalized by $m$, as an achievable rate. Finally, we can choose $m$ arbitrarily large and pick the optimal $\rho$ and $\mathcal{N}$. Noticing that the strategy used here is a special case of the fully general parallel strategy, the statement of the lemma follows. □
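The blocking argument can be written out as follows. This is only a sketch: it assumes that Equation (56) refers to the second-order expansion of the hypothesis testing relative entropy from [45,46], and the symbol $\omega_i$ for the output state of one block of $m$ parallel superchannels is notation introduced here for illustration.

```latex
% Assumed notation: \omega_i := \big(\Theta_i^{\otimes m} \otimes \mathrm{id}_R\big)(\mathcal{N})(\rho), \; i \in \{1,2\}, \; n = km.
\begin{align}
  \sup_{S}\Big\{-\tfrac{1}{n}\log\beta_n(S) : \alpha_n(S) \le \epsilon\Big\}
    &\ge \frac{1}{km}\, D_H^{\epsilon}\big(\omega_1^{\otimes k} \,\big\|\, \omega_2^{\otimes k}\big) \\
    &= \frac{1}{km}\Big[ k\, D(\omega_1\|\omega_2)
        + \sqrt{k\, V(\omega_1\|\omega_2)}\;\Phi^{-1}(\epsilon) + O(\log k) \Big] \\
    &\xrightarrow{\;k\to\infty\;} \frac{1}{m}\, D(\omega_1\|\omega_2).
\end{align}
% Taking the supremum over \rho, \mathcal{N} and then m \to \infty yields the regularized quantity.
```

The first inequality holds because the blocked strategy is one particular parallel strategy; the second-order and logarithmic terms vanish after division by $km$ since they grow sublinearly in $k$.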
In summary, the converse and achievability results coincide, giving the superchannel Stein's lemma for fully parallel strategies. We can get an equivalent result for the parallel strategies with product states.

Lemma 5.2. For two superchannels $\Theta_1$ and $\Theta_2$, considering parallel strategies restricted to product states, the channel-regularized relative entropy is an achievable rate.

Proof. The proof for the parallel strategy with product states works exactly as that of the fully general parallel strategy. □
This gives the corresponding Stein's lemma for parallel strategies with product states. We have therefore successfully determined the optimal rates for a Stein's lemma for superchannels using different parallel strategies. Finally, strong converse rates in terms of different divergences follow with similar proof alterations as mentioned for product strategies and we refer to Remark 5.4 for details.

Nested adaptive strategies
As mentioned in the previous section, parallel strategies allowing for $n$-party channels do not seem to be a special case of successive adaptive strategies. However, one can cast them as a different class of adaptive strategies which we call nested adaptive strategies, see Figure 5. For a visualization of how to embed parallel strategies into nested adaptive ones see also Appendix C. In this section, we will discuss this class and give a meta-converse bound in terms of the amortized superchannel divergence. Also note that parallel strategies with product channels are of course a special case of nested adaptive strategies (because general parallel strategies are), but successive adaptive strategies do not seem to be in this class. This can be seen as furthering the intuition that superchannels should ultimately be seen as a function on channels rather than states. Additionally, since successive adaptive strategies do not outperform parallel strategies with product channels, which in turn are a special case of nested adaptive strategies, the latter are always at least as good as the successive adaptive ones.

We begin by formalizing the class of nested adaptive strategies. The basic structure can be taken from Figure 5. The strategy now consists of an input state $\rho_{C_1R_1}$, $2n - 1$ adaptively chosen channels $\mathcal{A}_i$ and a measurement $Q$. Therefore, we optimize over all strategies $S = \{\rho, (\mathcal{A}_i)^{2n-1}, Q\}$. We will describe the strategy in an iterative fashion.
To start, we choose an input state ρ C1R1 and define a sequence of channels and now every following channel in the sequence is defined via Now, the outcome of an n-step nested adaptive strategy can simply be given as N n (ρ C1R1 ) (or M n (ρ C1R1 )). Afterwards we measure these resulting states with the measurement Q and obtain the classical binary distributions p or q. Following the idea of the meta-converse in [8], we can now get the following result.
Theorem 5.3 (Meta-converse for nested adaptive strategies). For two superchannels Θ 1 and Θ 2 , we have for any n-round nested adaptive strategy, Proof. We begin by fixing an arbitrary strategy S from all possible nested adaptive protocols for the discrimination of the superchannels Θ 1 and Θ 2 and let p and q denote the final decision probabilities. Now, consider the following chain of arguments, The first inequality follows from data processing under the final measurement Q. The second inequality follows by taking a supremum over all ρ, the third because amortization can only increase channel divergences. The first two equalities are straightforward and the third follows by making the following substitutions, With this we continue as follows, where the first inequality follows by optimizing, the first equality by definition and the final inequality because the amortized channel divergence obeys data-processing under superchannels.
In summary, we showed that Now, we can simply apply the same procedure another n − 1 times and we get that which concludes the proof.

⊓ ⊔
It follows from Equation (52) that the amortized superchannel relative entropy is a weak converse for a superchannel Stein's Lemma considering all nested adaptive strategies, Knowing that the amortized superchannel relative entropy is a valid converse raises the question whether nested adaptive strategies can perform better than parallel strategies. Since the latter are a subset of the former, we know already that Proving the inequality in the opposite direction would show that nested adaptive strategies do not outperform parallel strategies. In Appendix A, we show that under a technical assumption, namely an asymptotic equipartition property for the smooth channel max-relative entropy, Equation (71) indeed becomes an equality. We conjecture that the assumption is true in general and therefore that parallel strategies are as powerful as nested adaptive strategies. The argument in Appendix A relies on a novel chain rule for relative entropy under superchannels. Note that if the two quantities are indeed the same, we immediately learn that D A (Θ 1 ∥Θ 2 ) is also an achievable rate for nested adaptive strategies (and parallel ones). An alternative route to proving achievability directly could be to generalize the achievability proof for channels in [9,Theorem 6]. The idea there is to use an appropriate number of channel uses within the resource theory of asymmetric distinguishability to recursively prepare the optimal input states in the amortized channel divergence.
The careful reader might notice that earlier in this manuscript we defined the quantity D cA (Θ 1 ∥Θ 2 ) that is not currently associated to any strategy. The interest in it stems mostly from the observation that under the same technical assumption as before one can show that as shown in Appendix A. For a complete picture it would be desirable to find a class of adaptive strategies for which D cA (Θ 1 ∥Θ 2 ) is a converse rate and that includes parallel strategies with product states. We conjecture that the inequality in Equation (73) holds and will indeed turn out to be an equality.

Figure 7: Every adaptive strategy can be depicted by deconstructing the superchannel Θ into its components E Θ , D Θ . While the first and last channel are fixed, the remaining channels can be of either type, depicted by ? Θ , only restricted by 1) each channel has to appear a total of n times, 2) each channel E Θ has to appear before the D Θ which belongs to the same Θ. For simplicity we omit most of the systems S; they naturally connect the E Θ , D Θ belonging to the same Θ.

Remark 5.4.
Note that for all weak converses presented in this section, one could replace Equation (52) with and we get a strong converse bound based on the max-relative entropy instead of the relative entropy with Similarly we can get a strong converse bound in terms of the geometric Rényi divergence via [43] leading to In the channel case, and therefore for the successive adaptive strategies for superchannels, we have that the amortization collapses for both the amortized channel max-relative entropy [8] and the amortized channel geometric Rényi divergence [43]. We leave it open whether this also holds in more general superchannel scenarios.

Fully general adaptive strategies
With the nested adaptive strategies in the previous section we have seen a class of strategies that is unique to the superchannel case in the sense that there is no comparable set of strategies in the case of quantum channels. We have also learned that these strategies are at least as powerful as the most general parallel strategies. One might at first be tempted to think that this exhausts the possibilities of superchannel discrimination. However, one also quickly notices that looking at successive and nested adaptive strategies, neither of the two classes seems to be included in the other one and it could in principle be useful to employ a mixture of the two strategies, e.g. use n/2 rounds of successive, followed by n/2 rounds of nested adaptive superchannel uses. But that is not all: All strategies discussed so far place superchannels as a whole either before, after or within another superchannel. However, in principle there is no a priori reason to limit the use of superchannels in this way. As an example for a strategy that does not follow this rule we will discuss what we call a braided adaptive strategy as depicted in Figure 6. In this section we will finally discuss a converse bound on quantum superchannel discrimination that provably also holds for the most general class of strategies. Now, how can we ensure a description that covers all possible strategies of a superchannel? The key to this question is to take apart the superchannel into its components, meaning instead of discussing the superchannel Θ we will state the strategy in terms of the underlying channels {E Θ , D Θ }. We now allow for any adaptive strategy using the channels in any order that doesn't violate the superchannel structure, i.e. the channel E Θ of a particular use of Θ has to be used before the corresponding channel D Θ . It should also be noted that the adaptively chosen intermediate channels A i are not allowed to act on the S systems as this would give a forbidden advantage.
The resulting structure is depicted in Figure 7.
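The ordering constraint just described can be made concrete with a small enumeration. The following is only a sketch under stated assumptions: the n uses of the superchannel are treated as labeled copies, the adaptive maps A i between fragments are ignored, and the function names and the labels "successive", "nested" and "other" are hypothetical choices for this illustration. In particular, an interleaved pattern such as E 1 E 2 D 1 D 2 falls under "other" here; whether this is exactly the braided strategy of Figure 6 is an assumption.

```python
from collections import Counter
from itertools import permutations


def valid_orderings(n):
    """Orderings of the 2n fragments in which each E_k precedes its own D_k."""
    fragments = [(kind, k) for kind in ("E", "D") for k in range(1, n + 1)]
    orderings = []
    for order in permutations(fragments):
        position = {fragment: index for index, fragment in enumerate(order)}
        if all(position[("E", k)] < position[("D", k)] for k in range(1, n + 1)):
            orderings.append(order)
    return orderings


def classify(order):
    """Hypothetical labels for an ordering: 'successive', 'nested' or 'other'."""
    n = len(order) // 2
    kinds = [kind for kind, _ in order]
    copies = [k for _, k in order]
    # Successive: every copy is completed (E_k directly followed by D_k)
    # before the next copy starts.
    if all(
        order[2 * i] == ("E", copies[2 * i])
        and order[2 * i + 1] == ("D", copies[2 * i])
        for i in range(n)
    ):
        return "successive"
    # Nested: all E fragments come first and the D fragments close them
    # in reverse order, like matching brackets.
    if kinds[:n] == ["E"] * n and copies[n:] == copies[:n][::-1]:
        return "nested"
    return "other"


if __name__ == "__main__":
    for n in (1, 2, 3):
        orders = valid_orderings(n)
        print(n, len(orders), dict(Counter(classify(o) for o in orders)))
```

Every valid ordering necessarily starts with some E fragment and ends with some D fragment, which matches the fixed first and last channels in Figure 7. The number of orderings outside the successive and nested patterns grows quickly with n, which is one way to see why a converse covering all strategies needs the fragment-level description.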
Our goal will now be to find a converse bound that includes all possible strategies. As described above we will view the strategies following Figure 7: We start with an input state ρ 0 CR to which we apply E Θi followed by an adaptively chosen map A 1 and continuing by alternating applications of superchannel fragments and adaptive operations, finally ending in a last application of D Θi . We will now show that for any such strategy we can give a meta-converse. As for the nested strategies, the set of possible strategies is given by , Q}, with the difference being that components of the superchannels can appear in any order between the adaptively chosen channels A i .
Theorem 5.5 (Meta-converse for arbitrary strategies). For any adaptive superchannel discrimination strategy, we have
Proof. As usual we begin with the divergence between the two possible output probability distributions, where the first line is data processing and explicitly writing out the discrimination strategy, the second a property of amortization and the equality follows by explicitly writing the superchannel that belongs to the final D Θi assuming that A j is the last adaptive map before the corresponding E Θi . The sequence A 2n−1 • · · · • A j+1 and the sequence A j • · · · • A 1 may both include components of other copies of the superchannel Θ i . For ease of notation, we define where i ∈ {1, 2} depending on whether the superchannel fragments are from Θ 1 or Θ 2 , i.e. whether they appear in the first or second argument. Let F be the channel that achieves the infimum in D A * (Θ 1 ∥Θ 2 ). We now continue the derivation with Eq. (80) where the first equality is by definition, the second simply adding a zero, the first inequality follows by optimizing over the channels in the first two terms and the third equality is the definition of the amortized superchannel divergence. The final equality follows by definition and the final inequality is removing A 2n−1 via data processing.
Note that the final amortized channel divergence is taken between quantum channels resulting from a discrimination strategy that corresponds to the initial one after removing the final superchannel. Now, observe that the remaining amortized channel divergence has, after merging A j+1 • F • A j into a single channel and appropriate relabeling, the same form as the initial divergence in Equation (79), but for a discrimination strategy based on n − 1 applications of the superchannel. Iterating the same procedure as above to remove all n superchannels results in which concludes the proof. ⊓ ⊔ Now, with the same technique as described in the previous sections, we get a weak converse bound for a Stein's Lemma with arbitrary strategies, Note that of course for certain strategies one can also mix the previously used proof strategies to get convex combinations of amortized superchannel divergences, e.g. in the aforementioned case where one applies n/2 rounds of successive, followed by n/2 rounds of nested adaptive superchannel uses. On the other hand, the braided adaptive strategy from Figure 6 gives an example where, based on our proof technique, one seems to necessarily always end up using D A * .
These observations lead to a host of interesting questions that demand further investigation. Most importantly, is D A * achievable or can one find a tighter converse bound? If it is optimal, are there cases where D A * > D A , which would then imply that adaptive strategies are strictly more powerful than parallel strategies? For example, does there exist an example where a braided adaptive strategy is strictly better than all nested adaptive strategies?
Finally we remark that similar to the previous sections we can get strong converse bounds in terms of the amortized max-relative entropy or the amortized geometric Renyi-divergence using the same technique as in Remark 5.4. It is however again unclear whether all involved amortized quantities collapse when using max-relative entropy or the geometric Renyi-divergence.

Summary of results for superchannels

Strategy | Converse bound (Stein) | Achievable? | Figure
Product | D(Θ 1 ∥Θ 2 ) | ✓ | Fig. 3 a
Parallel with product channels | D s∞ (Θ 1 ∥Θ 2 ) * | ✓ | Fig. 3 b
Parallel with product states

Table 1: List of investigated strategies for superchannel discrimination along with the converse bounds provided in this work for the Stein's setting. Rates marked with * are equal, therefore choosing the more general class doesn't provide any advantage. Rates with © are equal given an assumption discussed in Appendix A and, if it is true, D A is also achievable. The third column states whether the given converse bound is optimal, i.e. it can be achieved by a particular strategy.

Examples
In this section, we will discuss several examples of superchannels and discuss how the bounds in the previous section simplify for these special cases. Examples include classical superchannels, superchannels with trivial side-channel, environment parametrized and side-channel parametrized superchannels.

Classical superchannels
In this section, we consider the problem of distinguishing classical superchannels. In the case of classical channels it was shown by Hayashi that adaptive strategies do not improve the discrimination error rate [11]. We will see here that the same holds for the Stein's Lemma for classical superchannels. To the best of our knowledge this has not been investigated previously. From the definition of the superchannel we can take a classical superchannel as a pair of conditional probability distributions θ ≡ {e(a, s|c), d(d|b, s)} that transform a channel n(b|a) as Since everything is classical, we are also free to make copies of every accessible system. Equivalently, we can give the experimenter access to the systems directly. That is, all variables except s which is an internal variable of the superchannel. As a result either of the following two outputs is available to the experimenter: Therefore, the relative entropy of two classical superchannels becomes , r ′ a, c, r)).
We will now show that for classical superchannels the amortized superchannel relative entropies always collapse to the superchannel relative entropy.
and therefore the product strategy is optimal for asymptotic asymmetric superchannel discrimination for classical superchannels, in particular also adaptive strategies do not provide any advantage. Proof. By Lemma 4.6 it suffices to show that D A * (θ 1 ∥θ 2 ) ≤ D(θ 1 ∥θ 2 ). In the following we abbreviate the variables for readability, but they should be clear from context. First, note that for two classical channels the amortized channel relative entropy always collapses to the channel relative entropy, therefore where the third equality is by definition, the first inequality by bounding with a joint supremum and bounding the infimum by the concrete choice of e θ1 , the fourth equality can be checked by direct calculation and the second inequality follows from data-processing. The statement of the lemma follows directly from here.

⊓ ⊔
We conclude that simple product strategies are optimal when discriminating between classical superchannels. The case of classical channels was extended to classical-quantum channels in [8,12] and it would be interesting to see if the same results hold for classical-quantum superchannels.
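To make the action of a classical superchannel θ ≡ {e(a, s|c), d(d|b, s)} on a channel n(b|a) more tangible, the following minimal sketch composes the three conditional distributions by summing over the internal variables a, b and s. The explicit composition formula m(x|c) = Σ e(a, s|c) n(b|a) d(x|b, s) is an assumption read off from the structure described in this section, with no reference systems, and the array layout and function name are hypothetical.

```python
def apply_classical_superchannel(e, d, n):
    """Compose pre-processing e, channel n and post-processing d.

    Assumed index conventions (hypothetical, chosen for this sketch):
      e[c][a][s] = e(a, s | c)
      n[a][b]    = n(b | a)
      d[b][s][x] = d(x | b, s)
    Returns m with m[c][x] = sum over a, s, b of e * n * d.
    """
    num_c, num_a, num_s = len(e), len(e[0]), len(e[0][0])
    num_b, num_x = len(n[0]), len(d[0][0])
    m = [[0.0] * num_x for _ in range(num_c)]
    for c in range(num_c):
        for a in range(num_a):
            for s in range(num_s):
                for b in range(num_b):
                    for x in range(num_x):
                        m[c][x] += e[c][a][s] * n[a][b] * d[b][s][x]
    return m


# Example on bits: e copies the input c into both a and the memory s,
# and d outputs the XOR of the channel output b and the memory s.
e = [[[1.0 if (a == c and s == c) else 0.0 for s in range(2)]
      for a in range(2)] for c in range(2)]
d = [[[1.0 if x == (b ^ s) else 0.0 for x in range(2)]
      for s in range(2)] for b in range(2)]

identity = [[1.0, 0.0], [0.0, 1.0]]
bit_flip = [[0.0, 1.0], [1.0, 0.0]]

if __name__ == "__main__":
    print(apply_classical_superchannel(e, d, identity))
    print(apply_classical_superchannel(e, d, bit_flip))
```

In this toy example the memory variable s is essential: the output reveals whether the inserted channel flipped the bit, which d could not decide without the copy of c stored in s.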

The role of the side-channel
This section is meant to give an intuition that the allowed communication between the two parts of the superchannel is the crucial difference to channel discrimination. That is, we will show that if the side-channel via S is trivial in both superchannels, i.e. |S| = 1, the discrimination problem drastically simplifies as it essentially reduces to discriminating two pairs of quantum channels, see Figure 8 a).
First, we will give a meta-converse for any strategy that is different from the ones presented before. Lemma 6.2. For two superchannels Θ 1 and Θ 2 we have for any strategy S, Proof. This is easily shown by using the channel amortization technique in [8], step-by-step removing either E Θi or D Θi depending on what is next in line.

⊓ ⊔
This bound has some disadvantages in the superchannel case. In particular, it is dependent on the chosen decomposition of the superchannel. This is of course not desirable and we can easily see that in general the new bound in Equation (91) is far from optimal because the bound does not take into account that the S system is inaccessible: Consider two superchannels Θ i for which the right hand side of Equation (91) is finite and that can be decomposed into {E Θi , D Θi }. Now, we construct the following channels, Ê It is now easy to see that the superchannels Θ̂ i constructed from {Ê Θi , D̂ Θi } are equivalent to the respective Θ i and should not be easier to distinguish. Nevertheless, even though D A (E Θ1 ∥E Θ2 ) is assumed to be finite, D A (Ê Θ1 ∥Ê Θ2 ) is infinite, rendering the bound in Equation (91) useless. This simple calculation also provides evidence that a good bound has to be based on the superchannel as a whole. Now, if one only considers superchannels with trivial communication system S we get that the above bound is indeed optimal for any sufficiently strong discrimination strategy. Lemma 6.3. For two superchannels Θ 1 and Θ 2 with trivial system S we have in the Stein's setting, where * can be any strategy that is at least as powerful as the successive adaptive strategy.
Proof. The converse follows directly from Lemma 6.2. The achievability follows since the successive adaptive strategy can, given a trivial system S, be separated into two adaptive channel discrimination strategies for the respective channel pairs as follows. As initial input state we choose a product state ρ CR ⊗ ρ BR ′ and the adaptive operations within the superchannel are simply used to swap the system into place. The adaptive operations in between superchannels are used for arbitrary operations in product form A 1 ⊗ A 2 , leading to which gives the beginning of the tensor product of two adaptive channel discrimination strategies. Iterating swap operations and adaptively chosen product channels gives the full strategy. The result then follows from the Stein's Lemma for quantum channels.

⊓ ⊔
One should however remember that this is usually not true when S is not trivial as seen by the example above. As a remark, the same also holds for general k-combs and in terms of channel discrimination means that if an experimenter has access to n uses of each channel in the pair the optimal discrimination rate is the sum of the individual discrimination rates.
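The additivity of rates for a pair of channels can be illustrated numerically in the purely classical case. The sketch below assumes classical channels given as row-stochastic matrices and uses the fact that the relative entropy of two classical channels is the maximum over input symbols of the relative entropy of the output distributions; by the result of Hayashi cited above, adaptive strategies do not improve on this rate. The matrices and all names are hypothetical illustration values, not taken from the paper.

```python
import math


def kl(p, q):
    """Relative entropy D(p || q) in nats for two finite distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def channel_rate(n1, n2):
    """Classical channel relative entropy: best single input symbol."""
    return max(kl(row1, row2) for row1, row2 in zip(n1, n2))


# Hypothetical fragments of two superchannels with trivial system S:
# (e1, d1) under the first hypothesis, (e2, d2) under the second.
e1 = [[0.9, 0.1], [0.5, 0.5]]
e2 = [[0.5, 0.5], [0.5, 0.5]]
d1 = [[0.7, 0.3], [0.2, 0.8]]
d2 = [[0.6, 0.4], [0.5, 0.5]]

rate_e = channel_rate(e1, e2)
rate_d = channel_rate(d1, d2)
total_rate = rate_e + rate_d

if __name__ == "__main__":
    print("rate for the E pair:", rate_e)
    print("rate for the D pair:", rate_d)
    print("combined rate:", total_rate)
```

The combined value is simply the sum of the two single-pair rates, each attained by feeding the most informative input symbol into the respective fragment.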

Replacer superchannels
For channels we have replacer channels which essentially reduce channel problems to state problems. We can similarly define replacer superchannels which should also reduce the problem by one level. A replacer superchannel Θ i takes any channel N and outputs a fixed channel N i , i.e. Θ i (N) = N i . These replacer superchannels are simply constructed by taking the input state, applying N i and outputting the result, while at the same time feeding a dummy state into N and immediately tracing out the result. Here, it is important that all replacer superchannels use the same dummy state, so as not to add any additional distinguishability, and we will therefore usually choose it to be the maximally mixed state. For a depiction of a replacer superchannel see Figure 2. Note that replacer superchannels are a special case of the side-channel seizable superchannels discussed in the next section, but we state the following lemma for completeness. Lemma 6.4. For two replacer superchannels we have, Proof. The upper bounds D(Θ 1 ∥Θ 2 ) ≤ D (N 1 ∥N 2 ) and D A * (Θ 1 ∥Θ 2 ) ≤ D A (N 1 ∥N 2 ) are a special case of those for side-channel parametrized superchannels shown in Lemma 6.8.
We can now furthermore see that D(Θ 1 ∥Θ 2 ) ≥ D (N 1 ∥N 2 ) and D sA (Θ 1 ∥Θ 2 ) ≥ D A (N 1 ∥N 2 ) by choosing all channels in the suprema in the quantities as identity channels. The inequalities then follow directly from the structure of the replacer superchannels, which concludes the proof.
⊓ ⊔ It follows that two replacer superchannels can be discriminated with the same rate as the fixed channels they output. For any strategy that's at least as powerful as the successive adaptive strategy the rate is given by the amortized channel relative entropy which is the optimal channel discrimination rate. Product strategies are naturally not sufficient to reach that rate and result in a discrimination rate according to the channel relative entropy.

Environment and side-channel parametrized superchannels
An often considered class of channels are the so called environment parametrized channels. Two channels N 1 , N 2 are called (jointly) environment parametrized if they differ only in the choice of an auxiliary state on an additional environment system, i.e.
where ω i E is the parameterizing state. A natural extension of the concept is to define an environment parametrized superchannel as one with both defining channels being environment parametrized by a joint environment state ω i E1E2 , see Figure 8 b). Interestingly it is unclear how to give a good upper bound on D A * for these channels, nevertheless we can give convenient converse bounds via the following meta-converse. Resolving this tension is directly related to the achievability of D A * . Lemma 6.5. For two jointly environment parametrized superchannels Θ 1 and Θ 2 we have for any strategy S and sub-additive divergence D, Proof. For the proof we simply note that since the superchannels are, besides the parameterizing environment state, the same, we can regard any discrimination strategy as a source preparing either (ω 1 E1E2 ) ⊗n or (ω 2 E1E2 ) ⊗n followed by a fixed identical channel. Removing that channel via data-processing and using sub-additivity leads to the desired bound.
⊓ ⊔ Similar to the channel case [8], we call two jointly environment parametrized superchannels environment seizable if there additionally exists a superchannel Ψ such that where is the replacer channel that always outputs ω i E1E2 . In this case we can easily get achievability results, e.g. for the Stein's Lemma setting. Lemma 6.6. For two jointly environment seizable superchannels Θ 1 and Θ 2 we have where * can be any strategy that is at least as powerful as the product strategy, which includes every strategy discussed in this work.
Proof. The proof follows easily, as an achieving strategy is simply to apply the superchannel Ψ to every Θ i and then discriminate the output states.

⊓ ⊔
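The sub-additivity step used in the proof of Lemma 6.5 can be checked numerically in a classical toy case. The sketch below assumes that the parameterizing environment states are classical distributions (hypothetical values w1 and w2) and verifies that the relative entropy between n-fold product distributions equals n times the single-copy relative entropy; for the relative entropy, sub-additivity thus holds with equality on product inputs.

```python
import math
from itertools import product


def kl(p, q):
    """Relative entropy D(p || q) in nats for two finite distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def tensor_power(p, n):
    """Distribution of n independent samples from p, over all length-n strings."""
    return [
        math.prod(p[x] for x in string)
        for string in product(range(len(p)), repeat=n)
    ]


# Hypothetical classical stand-ins for the environment states.
w1 = [0.6, 0.3, 0.1]
w2 = [0.2, 0.5, 0.3]

if __name__ == "__main__":
    single = kl(w1, w2)
    for n in range(1, 5):
        joint = kl(tensor_power(w1, n), tensor_power(w2, n))
        print(n, joint, n * single)
```

Both strings of length n are enumerated in the same order for w1 and w2, so the two product distributions are aligned entry by entry.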
This reduces the discrimination of jointly environment-seizable superchannels with product, any parallel or adaptive strategies to that of quantum states. It also means that none of these strategies is more powerful than the simple product strategy. Now, we discuss a different class of superchannels with a similar intuition: superchannels parametrized by a particular choice of channel. Definition 6.7. We call two superchannels Θ 1 and Θ 2 jointly side-channel parametrized if their action can be decomposed as For a visualization see Figure 8 c). Note that the previously discussed replacer superchannels are a special case of side-channel parametrized superchannels. We now get the following bound. Lemma 6.8. For two jointly side-channel parametrized superchannels Θ 1 and Θ 2 we have Proof. The first line follows easily by using data-processing. In the second line, the first two inequalities follow by Lemma 4.6. For the third, let ρ * CR , σ * CR be the optimizing states in D A (Θ 1 (N)• N∥(Θ 2 )(M) •M) for the chosen superchannels. We then observe, where the first inequality is by bounding the infimum by choosing E, the second by how we chose the states, the third uses data-processing and the fourth the chain rule from [10] separating the side-channels from the rest. ⊓ ⊔ Similar to the environment seizable channels defined in [8], we can identify a restricted class for which the above inequalities are tight. Definition 6.9. We call two superchannels Θ 1 and Θ 2 side-channel seizable if they are jointly side-channel parametrized and additionally there exists a superchannel Ψ such that for i ∈ {1, 2} and some channel N.
For these superchannels we have the following. Lemma 6.10. For two jointly side-channel seizable superchannels Θ 1 and Θ 2 we have Proof. The proof follows easily because both the channel relative entropy and the amortized channel relative entropy are monotone under superchannels.

⊓ ⊔
We can conclude that for side-channel seizable superchannels the task of asymptotic asymmetric discrimination reduces to channel discrimination with the same rates as discriminating the parameterizing channels S 1 and S 2 , which is a direct generalization of the earlier result for replacer superchannels.
Finally it should be remarked that the decomposition in Equation 101 is not unique, similar to the discussion in Section 2.3. In particular, for any such superchannel one can always find a different decomposition with the same structure where the channels S i are isometries which are perfectly discriminable by a finite number of rounds, i.e. any converse bound via Lemma 6.8 would become infinite. Therefore one has to find the right decomposition in order to get the best converse bound. In the case of side-channel seizable superchannels that decomposition follows from the construction and is given by one that matches the achievable rate.

Beyond Stein's Lemma
So far we have mostly considered the asymptotic asymmetric discrimination setting, a.k.a. Stein's setting. However, since we have stated many previous results in terms of generalized divergences they also allow us to state bounds on other settings, specifically the strong converse exponent and the symmetric setting. The results below can mostly be found by applying the meta-converses from the previous sections to these scenarios using techniques that can be found e.g. in [8]. We therefore keep the discussion short. Also note that the examples in Section 6 often only use few properties of the relative entropy, e.g. its chain rule, and hence many of them can easily be transferred to the settings described in this section if the used quantities fulfill the same properties.

Symmetric Discrimination - Chernoff's bound
Symmetric hypothesis testing describes an alternative scenario in which we aim to simultaneously minimize the two possible errors. This is sometimes also described as the Bayesian setting of hypothesis testing. Given an a priori probability p ∈ (0, 1) that the first superchannel Θ 1 is selected, the non-asymptotic symmetric error exponent is defined as Given that the expression above involves an optimization over all final measurements Q, we can employ a well known result relating optimal error probability to trace distance, see also Section 3, to conclude that where ρ and τ are the possible output states of the chosen strategy depending on whether the superchannel was Θ 1 or Θ 2 , respectively. We are then interested in the asymptotic symmetric error exponent Thanks to the meta-converses in Section 5, we can easily get converse bounds on the superchannel Chernoff bound as well.
Theorem 7.1. For two superchannels Θ 1 and Θ 2 , we have depending on the choice of strategy S, the following bounds: Proof. The results follow all similarly by first taking the following inequality from [8], and then either using the meta-converse for the corresponding strategy or, for the regularized quantities, simply optimizing over all possible channels and input states.

⊓ ⊔
In the case of the product strategy we can furthermore obtain the asymptotic result by reduction to a state discrimination problem as previously described for the Stein's setting in Section 5.1. From this we get for all the other strategies asymptotic statements of the following form: in the case of fully general strategies and similarly for all the others. Note that also in the channel setting no tighter bounds are known.
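The relation between the optimal symmetric error and the trace distance used above has a simple classical counterpart, which the following sketch illustrates. It assumes that the two possible outputs are classical distributions (hypothetical values), computes the optimal Bayes error once directly and once through the 1-norm, and evaluates the classical Chernoff exponent by a grid search over s; the grid resolution and all names are assumptions of this illustration.

```python
import math
from itertools import product


def bayes_error(prior, p, q):
    """Optimal error when guessing between p (prior) and q (1 - prior)."""
    return sum(min(prior * a, (1 - prior) * b) for a, b in zip(p, q))


def bayes_error_via_l1(prior, p, q):
    """Same quantity, written as (1 - || prior*p - (1-prior)*q ||_1) / 2."""
    l1 = sum(abs(prior * a - (1 - prior) * b) for a, b in zip(p, q))
    return 0.5 * (1.0 - l1)


def chernoff_exponent(p, q, steps=1000):
    """-log of the minimum over s in [0, 1] of sum_x p(x)^s q(x)^(1-s)."""
    best = min(
        sum((a ** s) * (b ** (1 - s)) for a, b in zip(p, q) if a > 0 and b > 0)
        for s in (i / steps for i in range(steps + 1))
    )
    return -math.log(best)


def tensor_power(p, n):
    """Distribution of n independent samples from p."""
    return [math.prod(p[x] for x in xs) for xs in product(range(len(p)), repeat=n)]


# Hypothetical output distributions under the two hypotheses.
p = [0.7, 0.2, 0.1]
q = [0.3, 0.3, 0.4]

if __name__ == "__main__":
    c = chernoff_exponent(p, q)
    for n in (1, 2, 4, 6):
        err = bayes_error(0.5, tensor_power(p, n), tensor_power(q, n))
        print(n, err, math.exp(-n * c))
```

Since min(a, b) ≤ a^s b^(1−s) for every s in [0, 1], the n-copy error is bounded by exp(−n C) with C the computed exponent, which is the classical form of the Chernoff bound.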

Strong converse exponent - Han-Kobayashi
The strong converse exponent is a refinement of the asymmetric hypothesis testing quantity discussed above. For r > 0, we are interested in characterizing the non-asymptotic quantity as well as the asymptotic quantities The interpretation is that the type II error probability is constrained to tend to zero exponentially fast at a rate r > 0, but then if r is too large, the type I error probability will necessarily tend to one exponentially fast, and we are interested in the exact rate of exponential convergence. Note that this strong converse exponent is only non-trivial if r is sufficiently large. Based on the results in Section 5 and the techniques from [8] we get the following result.
Proof. The results follow all similarly by first taking the following inequality from [8], and then either using the meta-converse of the corresponding strategy or, for the regularized quantities, simply optimizing over all possible channels and input states.

⊓ ⊔
For more on the state and channel case of the strong converse exponent we refer to [47,42].

Remark 7.3.
A different refinement is the error exponent, or Hoeffding's bound, in the sense that the type II error probability is constrained to decrease exponentially with exponent r > 0. We are then interested in characterizing the error exponent of the type I error probability under this constraint. That is, we are interested in characterizing the non-asymptotic quantity Note that this error exponent is non-trivial only if r is not too large. Already in the channel case this scenario is much more difficult to handle and bounds are only known in some special cases. We leave the investigation of this case for quantum superchannels and quantum networks for future research. For the state case we refer to [48] and for the channel case to [8].

Quantum Networks
We have so far focused on superchannels, which are instances of networks with exactly one access point. In this section, we generalize the results to general quantum networks. Most of the tools used in this section are generalizations of the superchannel case and we therefore keep the presentation short. First we have to define the generalized divergences for networks. We start with the rather simple case of the generalized quantum network divergence. For two quantum networks, represented by k-combs Θ k 1 and Θ k 2 , we have [9, Definition 1] where (A i ) k−1 = (A 1 , . . . , A k−1 ) are the k−1 channels the k-comb acts on. From here a regularized divergence is defined in the usual way, And lastly, we need to define the amortized quantum network divergences, which generalize the amortized superchannel divergence and the fully-amortized superchannel divergence.
Here Ω k−1 1 is a (k − 1)-comb that takes the role of the channel F in the superchannel case. Note that just as in the case of superchannels, one can define several intermediate versions of the regularized and amortized quantum network divergences, e.g. one where only the input state or certain channels are amortized. The definitions follow similarly as in the superchannel case and we omit them here for brevity.
Of course one now needs to classify the possible discrimination strategies in order to investigate their error performance. The previous discussion on superchannel discrimination should convince the reader that the number of possible strategies, with potentially different ability to discriminate, is too large to discuss them here one-by-one. We will instead limit the discussion to a few notable strategies, but remark that all more restricted strategies can be investigated similarly by considering suitable extensions of the superchannel setting.
The first example is again the simple product strategy. By reduction to the state discrimination case one easily gets, Furthermore, one can check that the proof strategy for parallel discrimination of superchannels in Section 5.3 also extends to the general network case. This leads to the following result in the Stein's setting for fully parallel strategies, The main result in this section is the following generalization of Theorem 5.5 to general quantum networks. Theorem 8.1 (Meta-converse for arbitrary strategies). For two k-combs Θ k 1 and Θ k 2 and any adaptive network discrimination strategy we have Proof. The proof is a direct extension of that of Theorem 5.5 considering k-combs while always assuming that when removing the currently last comb all access points might act on channel sequences which include components of previous combs. Iteratively removing all the combs leads to the desired result.

⊓ ⊔
This covers the most general setting and allows us to extend the discrimination results from the superchannels case to networks, where we get fundamental converse bounds on the asymptotic distinguishability of two quantum networks.

Theorem 8.2.
For two k-combs Θ k 1 and Θ k 2 and any adaptive network discrimination strategy we have in the Stein's setting in the symmetric Chernoff setting, and for the strong converse exponent, Proof. This follows analogously to Equation (85), Theorem 7.1 and Theorem 7.2, using the meta-converse in Theorem 8.1 applied to the appropriate generalized divergence. ⊓ ⊔ Combining the above results implies that for the best possible (fully general) strategy, we have Of course the crucial question is now for which networks those inequalities become equalities. A simple example we can discuss here is that of classical networks for which the amortized network relative entropy always collapses.

Lemma 8.3.
For two classical networks $\theta^k_1$ and $\theta^k_2$, we have

Proof. The proof is similar to that of Theorem 6.1.

□
It follows that for two classical networks product strategies are optimal for discrimination in the Stein's setting. The bounds presented here give fundamental limits on the ability to discriminate between two quantum networks. However, it is also clear that many important questions in the general setting remain unanswered and invite further research.

Applications
Quantum channel discrimination has found many applications [11,39,49]. It often makes sense to generalize these applications to networks, for example when we want to allow for additional interaction in a given protocol. In the following we discuss a generalization of the well-known quantum illumination problem, in which we allow for a relay station on the object we wish to detect that can aid detection by altering the signal it receives.

Active quantum illumination
Quantum illumination [49] describes the task of detecting the presence or absence of a certain object, making use of quantum tools to enhance detection. There are usually two possible scenarios to consider; both start with shining a light source at the suspected location of an object. In the first case, one assumes that either the object is missing and the light can pass unhindered, or the object is present and the light is blocked. Suppose the default noise experienced by the light probe $\rho$ is given by a channel $\mathcal{N}$. Then the receiver behind the object will either receive $\mathcal{N}(\rho)$ if the probe passes or the fixed state $\tau$ if the probe is blocked. As $\rho$ can be subject to optimization, the resulting task is equivalent to discriminating the channels $\mathcal{N}$ and $\mathcal{R}_\tau$, with the latter being the associated replacer channel. In the second case, the position of the detector is changed such that a missing object is represented by receiving $\tau$ while the presence of the object gives the reflection of the probe, i.e. $\mathcal{N}(\rho)$; the roles of the null and alternative hypotheses are therefore interchanged.
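As a purely classical toy analogue of the first scenario, one can replace the channel by a stochastic matrix and $\tau$ by a fixed probability distribution. The sketch below assumes that the figure of merit is the single-use relative entropy between the channel output and $\tau$; the example channel, the distribution `tau` and the helper names are hypothetical and are not taken from [39] or [49]. Since the relative entropy is convex in its first argument, it suffices in this classical setting to search over deterministic probes.

```python
import math

def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in bits; infinite if supp(p) is not inside supp(q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

def best_probe(channel, tau):
    """Return (input index, value) maximising D(N(.|x) || tau) over deterministic inputs x."""
    scores = [relative_entropy(row, tau) for row in channel]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical example: row x is the output distribution N(.|x) of the noisy channel.
channel = [[0.9, 0.1], [0.2, 0.8]]
# Hypothetical fixed output received when the probe is blocked.
tau = [0.5, 0.5]

probe, exponent = best_probe(channel, tau)
print(probe, exponent)
```

For these illustrative numbers the first input is the better probe, since its output distribution is further from `tau` in relative entropy.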
In [39] it was shown that, at least in the first case, adaptive strategies do not give any advantage in many scenarios, namely when one is interested in the error rates according to Stein's Lemma or the strong converse scenario. See also [8] for a proof based on amortized channel divergences. In the quantum illumination task described above, the object is considered passive in the sense that it can only absorb or alternatively reflect the probe. Here we propose active quantum illumination, in which we allow the object to actively aid the discrimination task by receiving the probe state, acting on it and sending it on. As one might typically use quantum illumination to detect an object in space, we might already have a relay station on that object capable of using quantum optics to aid its observation. In full generality, the task now becomes to discriminate between the superchannel output $\Theta_1(\mathcal{N})(\rho)$ and a fixed state $\tau$, i.e. the replacer superchannel $\Theta_{\mathcal{R}_\tau}$ that always outputs the replacer channel $\mathcal{R}_\tau$.
While we leave the general investigation of the problem to future work, we show here that in the case of the superchannel Stein's setting the simple product strategy is optimal, giving an extension of the results in [39].

Lemma 9.1. Let $\Theta_1$ be an arbitrary superchannel and $\Theta_{\mathcal{R}_\tau}$ the replacer superchannel that always outputs the replacer channel $\mathcal{R}_\tau$. We have,

Proof. Based on the results from the previous sections the only thing left to show is that

We will do so here,

where the first inequality follows from Lemma B.1, which we will prove in Appendix B, the first equality follows by definition of $\Theta_{\mathcal{R}_\tau}$, and the second inequality is data-processing. The subscript $R$ denotes restriction to the $R$ system, i.e. the result of tracing out all other systems. For the second equality notice that $\mathcal{N}(\rho)_R = \operatorname{Tr}_D \Theta_1(\mathcal{N})(\rho)$, and by direct calculation one can confirm the rule using $\log(A \otimes B) = \log A \otimes \mathbb{1} + \mathbb{1} \otimes \log B$, see also [8, Lemma 38]. The final equality is by definition. This concludes the proof. □
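In the commuting (classical) case, the rule for the logarithm of a tensor product used in the proof reduces to the statement that the logarithm of a product distribution splits into a sum, which in turn gives additivity of the relative entropy on product distributions. The following minimal sketch checks this numerically. The distributions and function names are hypothetical, and the check only covers the diagonal case, so it is an illustration rather than a substitute for [8, Lemma 38].

```python
import math
from itertools import product

def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in nats, assuming supp(p) lies inside supp(q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def tensor(p, q):
    """Product distribution p (x) q, i.e. the diagonal of the Kronecker product."""
    return [a * b for a, b in product(p, q)]

# Hypothetical example distributions on a two-level and a three-level system.
p1, q1 = [0.7, 0.3], [0.4, 0.6]
p2, q2 = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]

# log(q1 (x) q2) = log q1 (x) 1 + 1 (x) log q2 entrywise, hence additivity.
joint = relative_entropy(tensor(p1, p2), tensor(q1, q2))
separate = relative_entropy(p1, q1) + relative_entropy(p2, q2)
print(joint, separate)
```

The two printed values agree up to floating-point error.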

Conclusions
In this manuscript we have extended the framework of quantum hypothesis testing to the setting of discriminating two quantum networks when many uses are available. The additional structure provided by networks makes this a rich and enticing problem and we have discussed a host of potential discrimination strategies. For each strategy we provided a converse bound, giving a fundamental limit to their performance. In most cases we also discussed achievability of those converse bounds, determining the optimal asymptotic error behavior.
Nevertheless, many open problems remain. First and foremost, the achievability for the fully general class of strategies. Most interestingly, could there be some class of strategies that can outperform the nested adaptive strategies? It seems that this is indeed a possibility, based on our converse bounds. That would imply that there are particular strategies that outperform parallel strategies which would be a departure from the results in the channel discrimination setting. It is however also entirely possible that one can find a sharper converse bound that improves on the result presented in this work.
Secondly, proving the asymptotic equipartition property for the smooth max-relative entropy as discussed in Appendix A. This would allow us to simplify the relationship between several classes of strategies, show directly that our bound for nested adaptive strategies in the Stein's setting is achievable and therefore optimal, and also be valuable in its own right.
Additionally, there are many other open problems, for example a better understanding of the symmetric hypothesis testing setting, even for the case of channels. In the network setting in particular, it would also be interesting to look at more limited classes of strategies, e.g. where different parts of the networks can only be acted on by locally separated parties, which is commonly known as distributed hypothesis testing. In the opposite direction, one could also look at more general settings including non-causal strategies and explore whether the tools developed in this work also apply there. It has previously been shown that non-causal strategies can outperform causal ones in certain settings [50]. Finally, considering the numerous applications of state and channel discrimination, it should be worthwhile to further investigate those of network discrimination, for example as a tool to determine bounds on the communication rates over different network architectures.

A AEPs and chain rules
The aim of this section is to develop tools that allow us to better compare amortized and regularized superchannel relative entropies. We will show, given that a certain technical assumption is true, that the regularized superchannel relative entropy defined in the main text is equal to the amortized superchannel relative entropy. The path will lead us via generalizing the chain rule for quantum channels proven in [10]. To this end, we will mostly work with the max-relative entropy and its smoothed version,

where $B^\epsilon(\rho) = \{\tilde{\rho} \in S_\leq(A) : P(\tilde{\rho}, \rho) \leq \epsilon\}$ is the $\epsilon$-ball around $\rho$ with respect to the purified distance $P$ and $S_\leq(A)$ denotes the set of subnormalized states. An important property of the smooth max-relative entropy is that it asymptotically converges to the relative entropy as given by the asymptotic equipartition property (AEP) [51,45], i.e.
We will also need the smooth max-channel relative entropy,

where the infimum is over all quantum channels $\tilde{\mathcal{N}}$ with $\frac{1}{2}\|\tilde{\mathcal{N}} - \mathcal{N}\|_\diamond \leq \epsilon$. Its regularization is given by,

Unfortunately, it is unknown whether this quantity has a similar asymptotic behavior as its state equivalent. We do know that the following holds,

This follows e.g. from [9,8]. Here, however, we want to assume that the quantity behaves nicely and we make the following technical assumption when needed,

Some brief remarks here: First, the regularization of the relative entropy is necessary [10] and, second, the smoothing is necessary, as we know that without smoothing the right inequality in Equation (154) becomes an equality [8]. It is currently unclear whether the supremum over $\epsilon$ is actually needed, but the stated version is sufficient for our application. The equality in Equation (155) was first conjectured by Andreas Winter [52] and formally discussed in [53]. Proving Equation (155) would also be useful beyond the questions discussed here and in [53], e.g. to solve an open problem in the resource theory of asymmetric distinguishability of quantum channels [9, Section VI.6]. Work towards proving the conjecture can also be found in [54].
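A classical toy computation may help to illustrate why smoothing matters already at the level of states: the unsmoothed max-relative entropy is additive under tensor powers, so its rate stays at the single-copy value and in general does not approach the relative entropy. The sketch below assumes the standard classical form of the max-relative entropy, the logarithm of the largest ratio $p_i/q_i$; the example distributions and helper names are hypothetical, and no smoothing is implemented.

```python
import math

def d_max(p, q):
    """Classical max-relative entropy in bits: log2 of the smallest c with p <= c*q entrywise."""
    ratios = []
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        ratios.append(pi / qi)
    return math.log2(max(ratios))

def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in bits, assuming supp(p) lies inside supp(q)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def tensor_power(p, n):
    """n-fold product distribution of p with itself."""
    out = [1.0]
    for _ in range(n):
        out = [a * b for a in out for b in p]
    return out

# Hypothetical example distributions.
p, q = [0.7, 0.3], [0.4, 0.6]
n = 4

# Rate of the unsmoothed quantity on n copies; it equals the single-copy value.
rate = d_max(tensor_power(p, n), tensor_power(q, n)) / n
print(rate, d_max(p, q), relative_entropy(p, q))
```

The first two printed numbers coincide and are strictly larger than the third, so any convergence to the relative entropy has to come from the smoothing.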
With these preliminaries out of the way, we can now turn to the chain rules. In [10], Fang et al. proved a new chain rule for the relative entropy under quantum channels, which led to the realization that the amortized channel relative entropy is equal to the regularized channel relative entropy. We will now generalize this result to superchannels by following the same approach. For that we first prove a chain rule for the max-relative entropy.
To continue, we also have a $\tau \in B^\epsilon(\mathcal{D}_1 \circ \mathcal{N}^{\epsilon_c} \circ \mathcal{E}_1(\nu^{\epsilon_s})^{\otimes m})$, such that

where the first and third inequalities follow by the triangle inequality, the second and fourth by definition and the fifth along the lines of [10]. We also used in the fourth inequality that $P(\rho, \sigma) \leq \|\rho - \sigma\|_1$, see e.g. [51]. This concludes the proof.

□
We can now make use of the asymptotic equipartition property described in Equation (151) to get the following chain rule.
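Corollary A.2. With the definitions as above, we have,

Proof. To prove the above inequality, we use the result from Theorem A.1 and reassign $\mathcal{D}_i \to \mathcal{D}_i^{\otimes n}$, $\mathcal{E}_i \to \mathcal{E}_i^{\otimes n}$, $\mathcal{N} \to \mathcal{N}^{\otimes n}$, $\mathcal{M} \to \mathcal{M}^{\otimes n}$, $\rho \to \rho^{\otimes n}$ and $\sigma \to \sigma^{\otimes n}$. Using the AEP in Equation (151) and taking the appropriate limits, similar to [10], we get the desired result.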
□

While this is interesting in its own right, we want to combine the result with our AEP assumption.
and therefore

Proof. The first statement is a direct application of the assumption. The second follows because $D^\infty(\mathcal{N}\|\mathcal{M}) = D^A(\mathcal{N}\|\mathcal{M})$, reordering and then taking the supremum over all channels and states.

□
We now want to point out that by simplifying the above proof strategy, we can also get the following results.
As described in Section 5.4, what we are currently missing to make Equation (165) an equality is a discrimination strategy that includes parallel strategies with product states as a special case and has $D^{cA}(\Theta_1\|\Theta_2)$ as a valid converse bound.
Finally, we remark that the chain rule in Corollary A.2 can be extended to k-combs following a similar derivation, resulting in additional terms for all pairs of channels the combs act on. Investigating the applications for the discrimination of general quantum networks appears to be a promising future direction.

B Other ways to amortize?
In this section we will briefly comment on yet another potentially different way of amortizing superchannel divergences. Based on the definition of the state-amortized superchannel divergence and a possible intuition that it covers only amortization at the input state of the superchannel, one might be tempted to define the following amortized quantity, which conveys the intuition that the input states and the input channels are amortized simultaneously.
Indeed, using techniques established in the previous sections, one can show a meta-converse for any strategy $S$, including the fully general ones, and one can also see that the bound is tight for classical channels and when the second superchannel always outputs a fixed state (see Lemma 9.1 and its proof). However, we also have the following result, which states that bounds in terms of $D^{\tilde{A}}(\Theta_1\|\Theta_2)$ are never better than those we gave in the main text of this work.
Lemma B.1. For two superchannels $\Theta_1$ and $\Theta_2$, we have

Proof. We begin by fixing $\mathcal{N}$ and $\mathcal{M}$ as the channels that achieve the corresponding supremum in $D^{A*}(\Theta_1\|\Theta_2)$ as defined in

where the first equality is by definition, the inequality by restricting the supremum to output states of $\mathcal{M}$ and $\mathcal{N}$ and the final equality by adding a zero. We now fix $\rho$ and $\sigma$ as the optimizing states in

where the first equality follows by the choice of $\rho$ and $\sigma$, the second inequality by taking an infimum and the second and third equality by definition of the corresponding amortized quantity. This concludes the proof.

□
Interestingly, it remains open whether the two quantities are actually different. Currently we only know of rather particular examples, in which they turn out to be the same. Should they indeed be different, that would further our understanding that superchannel divergences should be based on channel divergences. If they are the same, we would get a possibly more convenient way of working with $D^{A*}$.

C Bonus content: parallel ⊆ adaptive
In the main text we have used many times the idea that parallel strategies are a special case of certain adaptive strategies. This can be verified by picking a particular adaptive strategy that implements the parallel strategy. For the convenience of the reader, we depict the transformation of a fully parallel strategy into a nested adaptive strategy in an animation in Figure 9. The first frame gives the fully parallel strategy for $n = 3$. Running the animation by using the arrows below shows how to reorganize the different elements in order to get a special case of the nested adaptive strategies as depicted in Figure 5. Generally speaking, crossing lines in the final frame can be seen as an operation that swaps the position of two quantum systems. To get the general adaptive operations, we simply allow for general quantum channels acting on all available systems instead of the swap operations.
Running the animation requires opening this document in a modern PDF viewer, e.g. Adobe Acrobat.
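The inclusion of parallel strategies in adaptive ones can also be sketched in a classical toy model, which is much simpler than the network setting of Figure 9 and is only meant as an illustration. Here a channel is a stochastic matrix, an adaptive strategy chooses each input as a function of the previous outputs, and a parallel strategy fixes all inputs in advance. All names and numbers below are hypothetical. An adaptive strategy that ignores the feedback reproduces the parallel one.

```python
from itertools import product

def adaptive_output_distribution(channel, choosers, n_outputs):
    """Joint output distribution when input i is chosen as choosers[i](previous outputs)."""
    dist = {}
    for ys in product(range(n_outputs), repeat=len(choosers)):
        prob = 1.0
        for i, choose in enumerate(choosers):
            x = choose(ys[:i])           # next input may depend on the outputs seen so far
            prob *= channel[x][ys[i]]
        dist[ys] = prob
    return dist

def parallel_output_distribution(channel, inputs, n_outputs):
    """Joint output distribution when all inputs are fixed before any output is seen."""
    dist = {}
    for ys in product(range(n_outputs), repeat=len(inputs)):
        prob = 1.0
        for x, y in zip(inputs, ys):
            prob *= channel[x][y]
        dist[ys] = prob
    return dist

# Hypothetical binary channel: row x is the output distribution for input x.
channel = [[0.9, 0.1], [0.2, 0.8]]
inputs = (0, 1, 1)                       # n = 3 uses, as in the first animation frame

# Adaptive strategy that discards all feedback and always plays the fixed input.
choosers = [lambda past, x=x: x for x in inputs]

parallel = parallel_output_distribution(channel, inputs, 2)
adaptive = adaptive_output_distribution(channel, choosers, 2)
print(parallel == adaptive)
```

Replacing the constant choosers by functions that actually inspect `past` gives the genuinely adaptive strategies of this toy model.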