Operational applications of the diamond norm and related measures in quantifying the non-physicality of quantum maps

Although quantum channels underlie the dynamics of quantum states, maps which are not physical channels -- that is, not completely positive -- can often be encountered in settings such as entanglement detection, non-Markovian quantum dynamics, or error mitigation. We introduce an operational approach to the quantitative study of the non-physicality of linear maps based on different ways to approximate a given linear map with quantum channels. Our first measure directly quantifies the cost of simulating a given map using physically implementable quantum channels, shifting the difficulty in simulating unphysical dynamics onto the task of simulating linear combinations of quantum states. Our second measure benchmarks the quantitative advantages that a non-completely-positive map can provide in discrimination-based quantum games. Notably, we show that for any trace-preserving map, the quantities both reduce to a fundamental distance measure: the diamond norm, thus endowing this norm with new operational meanings in the characterisation of linear maps. We discuss applications of our results to structural physical approximations of positive maps, quantification of non-Markovianity, and bounding the cost of error mitigation.


Introduction
It is one of the fundamental properties of quantum mechanics that the evolution of quantum states is described by linear maps which are completely positive and trace preserving (CPTP), stemming from the unitary dynamics enforced on a larger Hilbert space [1]. However, in several different settings of practical importance, various applications of quantum dynamics which are not CPTP can be encountered. This motivates the study of such transformations, and in particular a precise understanding of how they can be compared with and approximated by physical quantum channels.
One important application of non-CPTP maps is in entanglement detection, where positive but not completely positive maps can serve as entanglement witnesses [2]. A bipartite state ρ is entangled if and only if there exists a positive map Φ such that id ⊗ Φ(ρ) is no longer a positive operator, and therefore such a map can reveal the correlations of ρ. This approach has constituted one of the most important ways of detecting entanglement [3,4], but its experimental implementation encounters an obstacle: how to realise the action of an unphysical linear map in practice? This question prompted the introduction of structural physical approximations (SPA) of non-CPTP maps [5], which aim to enable the physical evaluation of general maps by designing suitable approximations in terms of quantum channels and using them to infer properties of the original map [6][7][8].
Another setting in which non-CPTP maps are encountered is that of non-Markovian quantum dynamics or, generally, in the reduced dynamics of correlated systems. Specifically, when an open quantum system shares some initial correlations with its environment, the evolution of the composite system-environment state can correspond to a non-CPTP map when looking only at the dynamics of the reduced state of the system [9][10][11][12]. Although the physical interpretation of this is a matter of debate and alternative ways to understand such dynamics have been proposed [13][14][15], it can nevertheless be useful to study such non-CPTP evolutions directly to gain an understanding of reduced dynamics of open quantum systems.
Even broader types of unphysical quantum dynamics can be found in the areas of quantum error correction and error mitigation [16][17][18]. This is because, in a broad sense, both of these settings are concerned with the following problem: if an unknown system has undergone a noisy evolution as ρ → Θ(ρ), how can we reconstruct the original state as closely as possible, that is, how can we implement a map Φ such that Φ ∘ Θ(ρ) ≈ ρ? Such inverse operations typically cease to be valid quantum channels, and so it becomes necessary to devise approaches to implement them in practice with the use of physical operations.
In this work, we introduce a general quantitative framework for the characterisation of such unphysical maps by approximating them with quantum channels. We then explicitly give the considered measures operational meaning by connecting them with the performance of practical tasks, including the cost of simulating a given map with quantum channels. Notably, we show that all of the considered measures reduce to the same quantity when the given linear map is trace preserving: they all equal the diamond norm [19,20], a fundamental computational tool that serves as a measure of quantum channel distance and finds many uses in the practical characterisation of quantum processes [21]. This endows the diamond norm with new meanings in the operational tasks that we consider, and furthermore allows a number of new connections to be established. On the one hand, many known results in the quantification of the diamond norm can be carried over to the setting of our work, and on the other hand, we can use our characterisation to provide new insight into the computation and applications of the diamond norm.
Our approach is based on the notion of robustness measures [22]: inspired by recent applications of such quantities in the study of general resource theories of channels [23][24][25][26][27][28][29][30][31][32][33], we use them to quantify the amount of noise needed to turn a given map into a quantum channel. Such measures allow for several different generalisations to the setting of linear maps, motivating us to study and compare these definitions. The robustness-based approaches can be understood as different ways of designing optimal decompositions of linear maps in terms of quantum channels, and so they generalise the standard structural physical approximations [5]. We express the measures as semidefinite programs and establish various relations and bounds between them.
We apply our first measure in the task of simulating the action of an unphysical map with valid channels, accomplished by allowing the use of ancillary systems which can consist of linear combinations of quantum states. Such an approach allows us to reduce the problem of simulating the dynamics of quantum systems to the much simpler case of simulating the use of a non-positive Hermitian operator. Assessing the difficulty of this procedure then reduces to quantifying how much the given operator deviates from being a valid quantum state and, employing the trace norm as a natural quantifier of such 'non-quantumness', we show that the optimal cost of simulating a non-CPTP map in this way is given exactly by the value of the robustness measure.
Furthermore, answering the question of whether any unphysical map can provide measurable operational advantages over quantum channels, we show this to be the case in the setting of discrimination-based quantum games, establishing our second robustness measure as the exact quantifier of this advantage.
Our results also generalise and shed light on the very recent findings of Ref. [33], which considered a similar framework for approximating trace-preserving maps using a robustness- and quasiprobability-based approach. In particular, we show that the measure considered in [33] is actually an alternative expression for the diamond norm of a map, rather than a new quantity.
The paper is structured as follows. In Sec. 2, we introduce the notions of robustness measures and show how they can be applied to non-CPTP linear maps. We establish precise connections with the diamond norm in Sec. 3. We then proceed to show that the robustness measures, and hence the diamond norm, play a crucial role in quantifying the cost of simulating linear maps (Sec. 4) as well as in understanding the advantages a non-CPTP map could provide in input-output quantum games (Sec. 5). We proceed to establish a number of bounds for the measures in Sec. 6. Finally, we discuss the applications of our approach, comparisons with other methods, and explicitly show how the measures can be evaluated for some representative examples in Sec. 7.

Robustness of non-CP maps
Let A and B denote two finite-dimensional quantum systems of dimension d_A and d_B, respectively. We will use L(A) to denote the set of all linear operators, H(A) to denote the set of all Hermitian operators, and D(A) to denote all density operators acting on the Hilbert space of system A. We use ⟨X, Y⟩ = Tr(X†Y) for the Hilbert-Schmidt inner product.
Among all linear maps from L(A) to L(B), we will be primarily concerned with Hermiticity-preserving maps H(A, B), which are defined as maps such that Φ(X) ∈ H(B) ∀X ∈ H(A). A map is called positive if Φ(X) ≥ 0 ∀X ≥ 0 (w.r.t. the positive semidefinite cone), completely positive (CP) if id_A ⊗ Φ is positive, trace preserving if Tr Φ(X) = Tr X ∀X, and trace non-increasing if Tr Φ(X) ≤ Tr X ∀X ≥ 0. To each map Φ ∈ H(A, B) we will associate the Choi operator J_Φ := id_A ⊗ Φ(|Ω⟩⟨Ω|), where |Ω⟩ = Σ_i |i⟩|i⟩ is the unnormalised maximally entangled state (see e.g. [21]). Let CPTNI(A, B) denote the set of completely positive and trace-non-increasing maps in H(A, B), and analogously CPTP(A, B) the set of completely positive and trace-preserving maps. For simplicity of notation, we will often simply write CPTP for CPTP(A, B) (and analogously for other sets) when the spaces in consideration are not relevant.
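As a concrete illustration of these definitions, the following minimal sketch builds the Choi operator of the qubit transposition map T(X) = X^T, a standard example of a positive but not completely positive map. The choice of the transposition map, the helper names (`choi`, `partial_trace_B`, `expectation`) and the index convention J_Φ = Σ_{i,j} |i⟩⟨j| ⊗ Φ(|i⟩⟨j|) are assumptions made here for illustration only.

```python
# Illustrative sketch (assumed example): Choi operator of the qubit transposition map.

def choi(phi, d):
    """Return J_Phi = sum_{i,j} |i><j| (x) Phi(|i><j|) as a d^2 x d^2 nested list."""
    J = [[0.0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            # Matrix unit |i><j|
            E = [[1.0 if (r, c) == (i, j) else 0.0 for c in range(d)] for r in range(d)]
            out = phi(E)
            for k in range(d):
                for l in range(d):
                    J[i * d + k][j * d + l] = out[k][l]
    return J

def partial_trace_B(J, d):
    """Tr_B J: trace out the second (output) tensor factor."""
    return [[sum(J[i * d + k][j * d + k] for k in range(d)) for j in range(d)]
            for i in range(d)]

def expectation(J, v):
    """<v|J|v> for a real vector v."""
    n = len(v)
    return sum(v[r] * J[r][c] * v[c] for r in range(n) for c in range(n))

def transpose_map(X):
    """T(X) = X^T."""
    return [list(row) for row in zip(*X)]

d = 2
J_T = choi(transpose_map, d)             # equals the swap operator
reduced = partial_trace_B(J_T, d)        # identity <=> trace preserving
singlet = [0.0, 2 ** -0.5, -(2 ** -0.5), 0.0]
witness = expectation(J_T, singlet)      # negative value => J_T is not PSD
```

Under these assumptions, `reduced` is the identity, so T is trace preserving, while `witness` is negative, which certifies that J_T is not positive semidefinite and hence that T is not completely positive.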
In order to quantify how much a given map deviates from the set of CPTP maps, we will employ the concept of robustness measures [22]. It will be insightful to first review how such measures are defined for quantum states. Given a convex set of interest F ⊆ D, commonly chosen to be the set of free states in a given resource theory, one asks: how much noise from a set N ⊆ D has to be added to a state ρ in order to make it a free state? This has the intuitive interpretation of measuring how robust the resources contained in the state ρ are with respect to noise from the set N. Specifically, we write
\[ r_{F}^{N}(\rho) := \min \left\{ \lambda \geq 0 \;:\; \rho + \lambda \sigma \in (1+\lambda) F,\ \sigma \in N \right\}. \tag{1} \]
The most common choices of the noise set N are: N = D, in which case we obtain the so-called generalised robustness, equivalently given by
\[ \min \left\{ \lambda \geq 0 \;:\; \rho \leq (1+\lambda) \sigma,\ \sigma \in F \right\}, \tag{2} \]
and the choice N = F, which corresponds to the standard robustness r_F. The latter quantity is directly related to the so-called base norm ‖ρ‖_F of the set F, which can be alternatively understood as an optimisation of quasiprobability distributions over the set F:
\[ \begin{aligned} \|\rho\|_{F} &:= 2 r_{F}(\rho) + 1 \\ &= \min \Big\{ \sum_i |c_i| \;:\; \rho = \sum_i c_i \sigma_i,\ \sigma_i \in F \Big\} \\ &= \min \left\{ \lambda_+ + \lambda_- \;:\; \rho = \lambda_+ \sigma_+ - \lambda_- \sigma_-,\ \sigma_\pm \in F,\ \lambda_\pm \geq 0 \right\}, \end{aligned} \tag{3} \]
where the third line is a simple consequence of the convexity of F. The definitions straightforwardly extend to unnormalised operators X: defining
\[ r_{F}(X) := \min \left\{ \lambda \geq 0 \;:\; X + \lambda \sigma \in (\operatorname{Tr} X + \lambda) F,\ \sigma \in F \right\}, \tag{4} \]
it is important to notice that the trace of X will come into play, and the base norm will equal ‖X‖_F = 2r_F(X) + Tr X. The case of interest to us will be where the set of free states F contains all physical quantum states, F = D, in which case the different notions of the robustness are equal and one has
\[ \|X\|_{D} = 2 r_{D}(X) + \operatorname{Tr} X = \|X\|_1, \tag{5} \]
that is, the base norm is precisely the trace norm (Schatten 1-norm) ‖·‖_1.
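The relation between the trace norm and the robustness for F = D can be checked on a small example. The sketch below assumes that, for a Hermitian X, the robustness with respect to all density operators equals the trace of the negative part, r(X) = Tr X_−, which is what the identity ‖X‖_1 = 2r(X) + Tr X amounts to. The function names and the restriction to 2×2 matrices are illustrative choices, not part of the text.

```python
import math

def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the Hermitian matrix [[a, b], [conj(b), d]] with a, d real."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return mean - radius, mean + radius

def robustness_and_trace_norm(a, b, d):
    """Return (Tr X_-, ||X||_1) for X = [[a, b], [conj(b), d]]."""
    evs = eigenvalues_2x2(a, b, d)
    negative_part_trace = sum(-e for e in evs if e < 0)   # assumed to equal r(X) for F = D
    trace_norm = sum(abs(e) for e in evs)
    return negative_part_trace, trace_norm

# Unit-trace Hermitian operator X = diag(3/2, -1/2): not a valid state.
r_diag, norm_diag = robustness_and_trace_norm(1.5, 0.0, -0.5)

# A non-diagonal unit-trace example, X = [[1, 1], [1, 0]].
r_off, norm_off = robustness_and_trace_norm(1.0, 1.0, 0.0)
```

In both cases the computed trace norm agrees with 2r(X) + Tr X, with Tr X = 1.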

Robustness of linear maps.
A generalisation of these concepts to the case of linear maps can be done in several different ways. Firstly, one has to note that it does not suffice to consider trace-preserving maps in the definitions of these measures. This follows since any linear combination of CPTP maps necessarily satisfies Tr_B J_Φ ∝ 1_A, which means that Tr Φ(ρ) takes the same value for any input state ρ. Therefore, any Hermiticity-preserving map whose reduced Choi matrix is not proportional to the identity operator cannot be represented as a linear combination Φ = Σ_i λ_i Λ_i with λ_i ∈ R and Λ_i ∈ CPTP. To circumvent this, we will employ the set of completely positive and trace-non-increasing maps, which can be understood as probabilistic implementations of quantum channels. Importantly, robustness-based definitions which were all equal in the case of states might not be equal any more. We therefore need to explicitly consider three different types of the robustness w.r.t. the sets CPTP or CPTNI:
\[ R(\Phi) := \min \left\{ \lambda \geq 0 \;:\; \Phi = (1+\lambda) \Lambda_+ - \lambda \Lambda_-,\ \Lambda_\pm \in \mathrm{CPTNI} \right\}, \tag{6} \]
\[ R'(\Phi) := \min \left\{ \lambda \geq 0 \;:\; (1+\lambda) \Lambda - \Phi \in \mathrm{CP},\ \Lambda \in \mathrm{CPTP} \right\}, \tag{7} \]
\[ R''(\Phi) := \min \left\{ \lambda \geq 0 \;:\; \Phi + \lambda \Lambda \in \mathrm{CP},\ \Lambda \in \mathrm{CPTP} \right\}, \tag{8} \]
as well as a generalised notion of a base norm with respect to the set of completely positive and trace-non-increasing maps:
\[ \|\Phi\|_{\blacklozenge} := \min \left\{ \lambda_+ + \lambda_- \;:\; \Phi = \lambda_+ \Lambda_+ - \lambda_- \Lambda_-,\ \Lambda_\pm \in \mathrm{CPTNI},\ \lambda_\pm \geq 0 \right\}. \tag{9} \]
In the expressions for R′ and R″, we made use of the fact that one can, without loss of generality, restrict the optimisation to CPTP maps; this follows since for any Λ ∈ CPTNI, that is, any completely positive Λ such that Tr_B J_Λ ≤ 1_A, we can define a map Λ′ through J_Λ′ := J_Λ + (1_A − Tr_B J_Λ) ⊗ σ for an arbitrary fixed state σ ∈ D(B), so that Λ′ ∈ CPTP and achieves the same value of the objective function. We note that closely related definitions were recently also considered in Ref. [33] for the case of trace-preserving maps. All of the quantities above are well-defined and take a finite value for any map Φ ∈ H(A, B), as we shall see explicitly by establishing general upper bounds in Sec. 6. The robustness R(Φ) can be seen to be an upper bound for all other quantities: any feasible decomposition of Φ in Eq. (6) gives feasible solutions for Eq. (7), (8), and for the base norm in Eq. (9). It is a priori unclear whether one can find general conditions under which the inequalities between the different measures are tight.
We shall shortly see that equality indeed holds for all trace-preserving linear maps.
All of the introduced quantities can be computed as semidefinite programs, which follows since the constraints for a map to be CPTNI (or CPTP) are linear matrix inequalities. This means that the measures can be evaluated efficiently (in the dimensions of the map) using numerical software. The equivalent dual forms of the problems, which can also provide some insight into the differences between the different definitions of the robustness measures, will be reported shortly in Sec. 6.

Relation with the diamond norm
For any Hermiticity-preserving map Φ, the diamond norm (completely bounded trace norm) is defined as [19,21]
\[ \|\Phi\|_{\diamond} = \max_{\rho \in D(A \otimes A)} \left\| \mathrm{id}_A \otimes \Phi(\rho) \right\|_1 , \]
where, in a slight abuse of notation, we use D(A ⊗ A) to denote the states acting on a bipartite Hilbert space composed of the space A and another space isomorphic thereto.
The diamond norm finds use as a fundamental measure of distance between quantum channels, mirroring the operational role of the trace distance in measuring distances between quantum states [19,21,34,35]. It is one of the most widely employed figures of merit in comparing quantum channels and benchmarking channel manipulation protocols. Its quantification and characterisation is therefore crucial to an effective understanding of the properties of quantum processes. Close connections between the diamond norm and the base norm in the space of quantum channels can be inferred already from the operational similarity that the diamond norm bears to the trace norm, the latter being the natural base norm in the space of quantum states. Here we aim to clarify the details of such connections and to explicitly relate the diamond norm with the robustness measures.
We will first introduce the following lemma, which establishes a useful formulation of the diamond norm for Hermiticity-preserving maps. The result is closely related to a more general approach for generalised quantum channels considered previously by Jenčová [36], and can be alternatively deduced from Lem. 4 and Thm. 2 of [36].

Lemma 1.
For any Hermiticity-preserving map Φ, it holds that Proof. Let Φ ♦ denote the quantity in (12). We first notice that the constraint Tr value. We thus have Taking the Lagrange dual of the above (see Appendix A) gives where in the second line, by continuity, we restricted our attention to the set of full-rank states D >0 (A) without loss of generality, and in the third line we made the change of variables W → The fact that this equals the diamond norm of Φ can be deduced from the results of Ref. [37] already; for completeness, we will show this explicitly. Recalling that |Ω Ω| with |Ω being the unnormalised maximally entangled state, and using the fact that Φ is only acting on one of the subsystems, we can write where we used that any pure state ψ ∈ D(A ⊗ A) can be written as for a suitable choice of ρ ∈ D(A), with |ψ constituting the canonical purification of ρ.
Compared with the semidefinite programs for the diamond norm of general linear maps originally derived in Refs. [37,38], the form of the diamond norm presented in Lemma 1 already constitutes a major simplification, both at a conceptual level, allowing for a restatement of the problem in terms of optimising over decompositions of the form J_Φ = M_+ − M_−, and computationally, as the number of optimisation variables is reduced.
As an immediate consequence of the above result, we can use the characterisation of the diamond norm in Eq. (11) to construct valid feasible solutions for the base norm and robustness measures in Eqs. (6)- (9), and vice versa.

Corollary 2.
For any Hermiticity-preserving map Φ, it holds that where λ min and λ max denote, respectively, the smallest and the largest eigenvalues.
Proof. Any decomposition for the diamond norm of the form we get the stated bound. The case of R follows analogously.
Equality between the different quantities can be shown for all trace-preserving maps, directly relating the diamond norm with our considered measures.

Theorem 3. For any map Φ ∈ H(A, B) which is trace preserving or, more generally, proportional to a trace-preserving map in the sense that Tr_B J_Φ ∝ 1_A, it holds that For trace-preserving maps Φ, it additionally holds that

Proof. From the fact that Tr_B J_Φ = t 1_A for some t ∈ R, it is easy to see that every decomposition of the form This implies that we can equivalently write which is precisely Eq. (18). Notice then that any such decomposition gives a valid feasible solution for ‖Φ‖◆, together with Cor. 2 yielding equality between the two norms.
When Φ is trace preserving (t = 1), we can write The equality ‖Φ‖♦ = 2R′(Φ) + 1 = 2R″(Φ) + 1 then follows: on the one hand, any decomposition of the form in Eq. (22) gives a feasible decomposition for R′ and R″ in Eqs. (7)-(8), and on the other hand, any decomposition for the robustness measures is necessarily of the form in Eq. (22). Equality with the robustness R(Φ) follows by noting again that any feasible decomposition in Eq. (22) gives a feasible decomposition for R(Φ), and on the other hand using the relation R(Φ) ≥ R′(Φ) which holds by definition.
Remark. The expression in Eq. (18) is valid also in the case of trace-annihilating maps (Tr_B J_Φ = 0), and thus in the case of computing the distance ‖Λ − Λ′‖♦ between two quantum channels. A simplified expression for this problem appeared previously in [37] and was explicitly expressed as a robustness-type measure in [26].
We note that the quantity ‖·‖◆, applied to trace-preserving maps, was recently considered in Refs. [33] and [39]. It was not noticed in these works that this is simply the diamond norm, and hence many results shown in [33] (e.g. the multiplicativity with respect to tensor product, unitary invariance, bounds with the trace norm ‖J_Φ‖_1, monotonicity under the action of superchannels, and some explicit expressions) follow directly from known properties of the diamond norm [20,37,40,41].
We will later see that this equivalence does not extend to maps which are not trace preserving (or proportional thereto), and indeed we can have ‖Φ‖◆ = 2‖Φ‖♦ in the extreme case.
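To make the trace-preserving case concrete, the following worked example evaluates the diamond norm of the transposition map T on a d-dimensional system (d ≥ 2) through a decomposition of its Choi operator into two channels. The example and its notation (F for the swap operator, P_s and P_a for the projectors onto the symmetric and antisymmetric subspaces) are assumptions introduced here for illustration. It uses only the triangle inequality, the fact that channels have unit diamond norm, and the known lower bound ‖Φ‖♦ ≥ ‖J_Φ‖_1 / d_A.

```latex
% Illustrative worked example (assumed, not from the source): transposition map T.
\begin{align*}
J_T &= \sum_{i,j} |i\rangle\langle j| \otimes |j\rangle\langle i| = F = P_s - P_a, \\
\operatorname{Tr}_B P_s &= \tfrac{d+1}{2}\,\mathbb{1}_A, \qquad
\operatorname{Tr}_B P_a = \tfrac{d-1}{2}\,\mathbb{1}_A, \\
J_T &= \tfrac{d+1}{2}\, J_{\Lambda_+} - \tfrac{d-1}{2}\, J_{\Lambda_-},
\qquad J_{\Lambda_+} := \tfrac{2}{d+1} P_s, \quad J_{\Lambda_-} := \tfrac{2}{d-1} P_a
\quad (\Lambda_\pm \in \mathrm{CPTP}), \\
\|T\|_\diamond &\le \tfrac{d+1}{2}\,\|\Lambda_+\|_\diamond + \tfrac{d-1}{2}\,\|\Lambda_-\|_\diamond = d, \\
\|T\|_\diamond &\ge \tfrac{1}{d}\,\|J_T\|_1 = \tfrac{d^2}{d} = d .
\end{align*}
% Hence \|T\|_\diamond = d, and the relation \|\Phi\|_\diamond = 2R(\Phi) + 1
% for trace-preserving maps gives R(T) = (d-1)/2.
```

For a qubit this gives ‖T‖♦ = 2 and R(T) = 1/2.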

Quantifying simulation cost
Since the quantum dynamics which can be realised in practice are restricted to completely positive maps, a relevant question then becomes: how can one simulate the action of a non-CPTP map on a quantum state when only CPTP maps are available to us?
A similar question was recently asked in Ref. [33], where the authors applied quasiprobability sampling methods [17,30,42] to the desired operation Φ. We take a different approach here and instead allow for the use of an ancillary system X, which can be an affine combination of quantum states, in order to simulate the action of the map Φ as a CPTP map Λ acting jointly on the input quantum state and the ancilla X. The "non-physicality" of the given map Φ is then pushed into the system X, allowing for the overall transformation Λ to be a valid quantum channel.
The motivation for this approach is that the task of simulating the action of the non-CPTP map Φ is effectively replaced with the simulation of a unit-trace Hermitian operator X, which could be significantly easier to realise in practice, especially since we will see that the dimension of the ancilla can be taken to be arbitrarily small. Standard quasiprobability-based approaches such as the ones employed in [17,30,33] aim to estimate the expectation value Tr[Φ(ρ)A], where Φ is a non-CPTP map and A an observable, by decomposing the given map as Φ = Σ_i λ_i Λ_i with λ_i ∈ R and Λ_i ∈ CPTP (or CPTNI). The expectation value Tr[Φ(ρ)A] is then estimated by evaluating Tr[Λ_i(ρ)A] and appropriately sampling from the output distributions with probabilities determined by the coefficients λ_i [17,42]. In practice, this means that we have to repeatedly realise each operation Λ_i, which requires the implementation of a different quantum circuit for each operation. Consider, on the other hand, a situation in which the dynamics is fixed as some map Λ ∈ CPTNI, and we only need to vary the input states. This can be achieved by writing Φ(·) = Λ(· ⊗ X), where we can write any Hermitian operator in a quasiprobability representation as X = Σ_i µ_i ρ_i. The task of sampling from the output distribution is then reduced to feeding the different states ρ_i into the circuit which realises the fixed operation Λ, thus greatly simplifying the implementation.
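The sampling step described above can be sketched as follows. This is a minimal illustration of generic quasiprobability sampling, not the specific protocol of any cited reference: the function names are hypothetical, and for simplicity each branch returns its exact expectation value v_i = Tr[Λ(ρ ⊗ ρ_i)A] rather than a single-shot measurement outcome.

```python
import random

def exact_value(mus, values):
    """Target quantity sum_i mu_i * v_i, i.e. Tr[Phi(rho) A] when X = sum_i mu_i rho_i."""
    return sum(m * v for m, v in zip(mus, values))

def sampled_estimate(mus, values, shots, rng):
    """Unbiased Monte Carlo estimate of sum_i mu_i * v_i.

    Branch i is drawn with probability |mu_i| / gamma and its value is
    reweighted by gamma * sign(mu_i), where gamma = sum_i |mu_i|.
    For X = mu_+ w_+ - mu_- w_- with orthogonal states w_+/-, gamma = ||X||_1.
    """
    gamma = sum(abs(m) for m in mus)
    weights = [abs(m) / gamma for m in mus]
    total = 0.0
    for _ in range(shots):
        i = rng.choices(range(len(mus)), weights=weights)[0]
        sign = 1.0 if mus[i] >= 0 else -1.0
        total += gamma * sign * values[i]
    return total / shots

# Hypothetical numbers: X = 1.5 w_+ - 0.5 w_-, with branch expectation values 0.8 and 0.2.
mus = [1.5, -0.5]
values = [0.8, 0.2]
target = exact_value(mus, values)
estimate = sampled_estimate(mus, values, 20000, random.Random(0))
```

The factor `gamma` rescales every sample, so the statistical error of the estimate grows with the amount of negativity in the decomposition. This is the sense in which the trace norm of X acts as a sampling overhead.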
As mentioned in Sec. 2, a natural quantifier of how much a given operator X ∈ H(A) with Tr(X) = 1 deviates from the set of all quantum states is the trace norm ‖X‖_1. Indeed, this quantity can be given an explicit interpretation in terms of the optimal cost of a quasiprobability-based estimation of the expectation value of X [42]. We then define the simulation cost of a map as the minimal amount of such "non-physicality" of X needed to simulate the action of the map:
\[ S(\Phi) := \min \left\{ \|X\|_1 \;:\; \Phi(\cdot) = \Lambda(\cdot \otimes X),\ \Lambda \in \mathrm{CPTNI},\ X = X^\dagger,\ \operatorname{Tr} X = 1 \right\}. \]
We then have the following.

Theorem 4. For any map Φ ∈ H(A, B), it holds that
\[ S(\Phi) = 2R(\Phi) + 1. \]
In the case of a trace-preserving Φ, we have in particular that
\[ S(\Phi) = \|\Phi\|_{\diamond} \]
and an optimal Λ for the simulation can be chosen to satisfy Λ ∈ CPTP.
Proof. Let Λ ± ∈ CPTNI(A, B) be maps that achieve an optimal decomposition for Φ such that where ω ± are orthogonal quantum states and Tr[X] = µ + − µ − = 1. We do not impose any additional conditions on the size of the ancillary system A , meaning that its Hilbert space can be chosen to be an arbitrary space of dimension at least 2. Defining the projector onto the positive part of X as P + , we then consider the map defined by the action on a basis |i 1 j 1 | ⊗ |i 2 j 2 | ∈ L(A ⊗ A ) as follows: It is easy to check that Λ(ρ ⊗ X) = Φ(ρ). Now, we will show that as long as the condition is satisfied, then Λ is also CPTNI. This can be seen by observing first that (27) gives which implies thatP is a valid POVM element. Note that we can rewrite (26) as where T P (·) := Tr [P ·]. Since Λ + , Λ − , and TP , T 1−P are all completely positive, Λ is also completely positive. Since (27) is always satisfied when an operator X with X 1 = µ + + µ − = 1 + 2R(Φ) achieves the desired implementation. The converse part can be proven by extending an argument in Ref. [23] to our setting. Suppose a non-quantum resource X = µ + ω + − µ − ω − and CPTNI map Λ realise the simulation of Φ, i.e. Φ(·) = Λ (· ⊗ X). Also, define Λ + (·) := Λ(· ⊗ ω + ). Then, by linearity of Λ, we get Since Λ + and Λ − := Λ(· ⊗ ω − ) are CPTNI maps, this is a valid linear decomposition of Φ into two CPTNI maps, providing an upper bound for its robustness as R(Φ) ≤ µ − = µ + − 1. This gives the desired lower bound for the simulation cost as µ + + µ − ≥ 1 + 2R(Φ).
An interesting quantitative equivalence emerges between our approach and the method of Ref. [33]. In that work, the authors showed that the minimal overhead required to employ quasiprobability-based simulation techniques [17,42] to estimate Tr[Φ(ρ)A] for a trace-preserving map Φ scales with the norm ‖Φ‖◆ (see also the discussion in Sec. 7.2). Since we know from Thm. 3 that ‖Φ‖◆ = ‖Φ‖♦ = 2R(Φ) + 1 holds for any trace-preserving map, the quantitative cost of the simulation scheme is actually the same as our method, despite the seemingly different approaches employed. In fact, our Thm. 4 shows that it is sufficient to consider decompositions of Φ as Φ(·) = µ_+ Λ(· ⊗ ω_+) − µ_− Λ(· ⊗ ω_−), where Λ and X = µ_+ ω_+ − µ_− ω_− are as constructed in our protocol. This means that, despite the significant practical simplification obtained by fixing the dynamics of the simulator as Λ and optimising over the quasiprobability representations of X instead, our simulation method does not sacrifice any performance, and the optimal sampling overhead cost of the more direct approach of [33] cannot be any better. We note that Theorem 4 gives a general way of reducing the task of simulating the action of a linear map Φ to simulating an affine combination of states in the form of the operator X. This could provide methods for the simulation of dynamics even beyond quasiprobability-based approaches like the one discussed above, although the specifics of this will depend on the given simulation method.

State injection and resource simulation.
The setting considered here is closely related to state injection methods which generalise quantum teleportation [43] and find use e.g. in the resource theories of entanglement [44][45][46][47][48], stabiliser-state quantum computation [49,50], and coherence [23,51]. In such tasks, a resourceful state φ (such as a maximally entangled singlet) is used to simulate the action of an arbitrary quantum channel Θ as Θ(·) = Γ(· ⊗ φ), where now Γ is a free operation (such as a protocol consisting of local operations and classical communication only). In this sense, our result can be thought of as the cost of channel simulation in the resource theory of "nonphysicality" beyond quantum mechanics, with the operator X acting as a resource. There are many potential ways to interpret such a result: for instance, unit-trace Hermitian operators which are not necessarily positive semidefinite have found use as so-called pseudo-states in [52], where they were used to study correlations beyond quantum mechanics, and as so-called pseudo-density matrices in [53], where they were used to put spatial and temporal correlations on equal footing. Being able to use a Hermitian system X could then be interpreted as having access to such extended sets of correlations. We leave a precise investigation of the connections between the operational setting employed here and resource theories of correlations for future work.

Amortised simulation.
A related setting that we can consider is that of amortised simulation [23,54], in which the non-quantum resource X is not consumed completely, but instead we can recover some of it in the form of another resource Y which can be reused. Precisely, we define Clearly, S_A(Φ) ≤ S(Φ) as we can just take X to be optimal for S and Y to be the trivial system 1. One could expect amortisation to lead to a strictly smaller cost of simulating a given map. However, we can show that this is not the case: amortisation cannot improve the simulation cost of any trace-preserving map.

Corollary 5.
For any trace-preserving map Φ ∈ H(A, B), it holds that

Proof. Let Λ be the optimal map such that Λ( Noting that this can be alternatively understood as a simulation protocol for the trace-preserving map Φ(·) ⊗ Y, Thm. 4 tells us that any such protocol satisfies where we used the multiplicativity of the diamond norm and the fact that ‖Y‖♦ = ‖Y‖_1, where we treat Y as a preparation channel with a trivial input space. From this we have that S_A(Φ) ≥ S(Φ), which concludes the proof.

Quantifying advantages in quantum games
The study of general linear maps in a resource-theoretic setting motivates the question: is there a well-defined operational task in which having access to any non-CPTP map could provide practical advantages over all quantum channels? In order to give an instance of such a task, we consider the setting of input-output games, inspired by the work of Ref. [55] and studied in the context of dynamical quantum resources in [27,28]. The setting is as follows: Alice prepares a state chosen randomly from the ensemble {p_i, σ_i}_i and sends the state through the map Φ ∈ H(A, B) to Bob, who then measures with a POVM {M_j}_j. The players are then awarded a score based on a reward function characterised by the coefficients {w_ij}_{i,j} ∈ R, and their goal is to maximise the average payoff, given by
\[ P(\Phi, G) = \sum_{i,j} p_i \, w_{ij} \operatorname{Tr}\!\left[ M_j \, \Phi(\sigma_i) \right], \]
by a suitable choice of the states and measurements. The tuple G = ({p_i, σ_i}, {M_j}, {w_ij}) then defines the input-output game G. We stress that, although the payoff P(Φ, G) might lose its physical meaning as a discrimination task when Φ is an arbitrary linear map, already for a positive trace-preserving map Φ we have that every output Φ(σ_i) is indeed a valid density matrix and thus the measurement at the output constitutes a well-defined state discrimination task.
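The payoff of a game can be evaluated directly from its defining data. The sketch below assumes the payoff takes the form P(Φ, G) = Σ_{i,j} p_i w_ij Tr[M_j Φ(σ_i)] and, to stay self-contained, restricts to diagonal qubit operators stored as length-2 lists. The specific game and the two maps are hypothetical examples; the second map acts as diag(a, b) ↦ diag(a, −b) and is not positive.

```python
def payoff(phi, probs, states, povm, weights):
    """P(Phi, G) = sum_{i,j} p_i * w_ij * Tr[M_j Phi(sigma_i)] for diagonal operators."""
    total = 0.0
    for p, sigma, w_row in zip(probs, states, weights):
        out = phi(sigma)
        for M, w in zip(povm, w_row):
            # Trace of a product of diagonal operators = sum of entrywise products.
            total += p * w * sum(m * o for m, o in zip(M, out))
    return total

def identity_channel(diag):
    return list(diag)

def sign_flip_map(diag):
    # Hermiticity preserving but not positive: flips the sign of the |1><1| component.
    return [diag[0], -diag[1]]

# A simple discrimination game: guess which of two orthogonal states was sent.
probs = [0.5, 0.5]
states = [[1.0, 0.0], [0.0, 1.0]]        # sigma_0 = |0><0|, sigma_1 = |1><1|
povm = [[1.0, 0.0], [0.0, 1.0]]          # M_0 = |0><0|, M_1 = |1><1|
weights = [[1.0, 0.0], [0.0, 1.0]]       # reward 1 for a correct guess

payoff_identity = payoff(identity_channel, probs, states, povm, weights)
payoff_sign_flip = payoff(sign_flip_map, probs, states, povm, weights)
```

Since all weights and POVM elements here are non-negative, every completely positive map achieves a non-negative payoff in this game, whereas the non-positive map produces a negative contribution Tr[M_1 Φ(σ_1)] = −1 that no longer corresponds to a probability.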
We are then interested in quantifying the best possible advantage that a given map Φ could provide over CPTP maps. Such an optimisation is unbounded without any further constraints, so we will consider games for which any completely positive map Γ achieves a non-negative payoff value; this can always be ensured by suitably shifting the payoff function for a given game. We then have the following.

Theorem 6. For any map Φ ∈ H(A, B), it holds that
where the maximisation is over all input-output games G such that P (Γ, G) ≥ 0 ∀Γ ∈ CP.
In the case of a trace-preserving Φ, we have in particular that and it suffices to optimise over games such that P (Λ, G) ≥ 0 ∀Λ ∈ CPTP.
On the other hand, by strong Lagrange duality (see App. A) we can write We can then make the following observations. Firstly, since the set of separable states in D(A ⊗ B) has a non-empty interior [56], any Hermitian operator X can be written as X = n i=1 x i σ i ⊗ η i for some σ i ∈ D(A), η i ∈ D(B), x i ∈ R, and n ∈ N. Then, choose the optimal W in Eq. (43) and where p i = 1/n for i ≤ n and p n+1 = 0, and the coefficients w i are defined by w i = x i n j η j ∞ . By the Choi-Jamiołkowski isomorphism and the linearity of Φ, we then have for Φ that with G defined by the above choices of {p i , σ i }, {M i }, and {w i }. Noticing that W ≥ 0 ⇒ P (Γ, G ) ≥ 0 ∀Γ ∈ CP, this finally gives where the second inequality follows since P (Φ, G ) = W, J Φ = R (Φ) + 1 holds by assumption while P (Λ, G ) = W, J Λ ≤ R (Λ) + 1 holds for any map Λ by definition, and the last equality follows since R (Λ) = 0 for any Λ ∈ CPTP. positive and negative parts as Proof. Consider · first. Using the expression we see that any such decomposition provides a feasible solution for Φ , since ω ± constitute valid Choi operators of maps Ω ± ∈ CPTNI(A, B). The first inequality thus follows. The second inequality is a consequence of the bound Φ ≥ Φ ♦ from Cor. 2 and the fact that 1 d A J Φ 1 is known to lower bound the diamond norm (see e.g. [41,57]). It can also be explicitly seen by noting that any decomposition of the form J Φ = λ + J Λ+ − λ − J Λ− with Λ ± ∈ CPTNI can provide a decomposition for the trace norm by rescaling each J Λ± by its trace; specifically, and using the fact that Tr The case of the robustness measures R , R follows analogously, where we now use the fact that Tr X + = min µ X ≤ µρ, ρ ∈ D and Tr X − = min µ X + µρ ≥ 0, ρ ∈ D for any Hermitian X. For the robustness R, take λ to be the greater of Tr J Φ+ − 1 and Tr J Φ− , and write Since each J Φ ± λ ∈ CPTNI, this provides a valid feasible solution for R. 
On the other hand, R(Φ) ≥ max{R′(Φ), R″(Φ)} by definition, from which the lower bound follows.
Both the upper and the lower bounds in Prop. 7 can be tight, as was shown already for the diamond norm [40]. However, better upper bounds can be obtained as follows.

Proposition 8. For any Hermiticity-preserving map Φ ∈ H(A, B), it holds that
We note that the bound for the diamond norm, which we stated above for completeness, appeared previously in [41].
Proof. The bounds for ‖·‖♦, ‖·‖◆, R′ and R″ follow simply by using J_Φ = J_{Φ+} − J_{Φ−} as feasible solutions in the definitions.
For the robustness R, take λ to be the greater of λ max (Tr B J Φ+ ) − 1 and λ max (Tr B J Φ− ), and write Since this is a feasible solution for R, we get R(Φ) ≤ λ.
As for lower bounds, we will first need to establish dual expressions for the considered measures. The following proposition is an application of standard convex duality arguments, and we include details in Appendix A for completeness.

Proposition 9. For any Φ ∈ H(A, B), the following dual expressions hold.
We can then obtain lower bounds by employing the dual optimisation problems. The bound for the diamond norm is well known [20], but we find it insightful to rederive it using this approach¹.

Proposition 10. For any Φ ∈ H(A, B) and any input state ρ ∈ D(A), let Φ(ρ)_± denote the positive/negative part of the output operator Φ(ρ). Then

Proof. Consider the diamond norm first. The main idea is to restrict the optimisation in the dual expression of ‖·‖_♦ in (53) to operators of the form W = ρ ⊗ Z for some operator Z. Then we have where the second line follows by the Choi-Jamiołkowski isomorphism. Taking Z ∈ {1, −1}, we get the lower bound In the case of ‖·‖, we use feasible solutions of the form

An immediate consequence is that for any completely positive map Φ, it holds that since the operators J_Φ and Tr_B J_Φ are both positive semidefinite. However, the lower bounds allow us to show explicitly that the equality ‖Φ‖ = ‖Φ‖_♦ is no longer true for maps which are neither CP nor trace preserving, and in fact the extreme disparity of ‖Φ‖ = 2‖Φ‖_♦ (cf. Cor. 2) can be achieved. Consider for instance the case when Decomposing 1| into its positive and negative parts, the bound of Prop. 8 gives ‖Φ‖_♦ ≤ 1. However, the best upper bound we get for ‖Φ‖ is 2, and it is indeed tight: we have Φ(|0⟩⟨0|) = |0⟩⟨0| and Φ(|1⟩⟨1|) = −|1⟩⟨1|, and so Prop. 10 gives ‖Φ‖ ≥ 2. A similar argument can be used to show that R(Φ) = 1, which in particular implies that 2R(Φ) All of the bounds that we established in this section can be tight, as we shall demonstrate in what follows.

Positive maps and structural physical approximation
Positive maps constitute a fundamental way to detect and characterise quantum entanglement [2][3][4]. One of the most studied approaches to implementing such maps in practice is the structural physical approximation (SPA) [5,6], which aims to approximate a given positive map Φ with a physical quantum channel by considering decompositions of the form Φ + ςD, where D is the completely depolarising channel with Choi operator J_D = 1/d_B. Such approximations have found use both in understanding the properties of positive maps [7,58] and in realising them in experiments [6,8,59].
Intuitively, the robustness measures can then be understood as different approaches to defining an optimised SPA to the map Φ, by allowing channels other than the depolarising map to be used in the decomposition (cf. [33]). We will now discuss the similarities and differences between the approaches by studying two representative examples of positive maps.

Transposition map. Consider first the transposition map T ∈ H(A, A). Letting SPA(T) denote the minimal amount ς needed for (T + ςD)/(1 + ς) to be a quantum channel, it can be easily verified that SPA(T) = d_A. However, by making a more suitable choice of a channel in the optimisation, our robustness measures construct an approximation of the form (T + λΛ)/(1 + λ), where λ = (d_A − 1)/2 already suffices to ensure that this is a valid physical channel. From this we see that R(T) = (d_A − 1)/2 and hence ‖T‖ = d_A. Quantitatively, the advantage gained by allowing arbitrary channels in such decompositions can therefore be significant.
To understand why a better approximation can be obtained, let us take a closer look at the optimal decomposition for this map. Our generalised approach can take into consideration the fact that the Choi operator of the transposition map, J T (the swap operator), already has a non-trivial positive part, which means that there is no need to act on that part of the space. More specifically, a better approximation is obtained simply by defining the map J Λ = ∈ CPTP and mixing as Structurally, this is not too different from the SPA -the only maps involved in the combination are the depolarising channel and the transposition map itself, even if the optimal approximation is not simply a convex mixture of the two. Indeed, we could define an optimised structural physical approximation which allows for such decompositions to be used: with the expression valid for any map such that λ min (J Φ ) < λ max (J Φ ) = d −1 B . This can be used to give a general bound to the robustness measures.
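The values quoted for the transposition map can be checked numerically. The following is a minimal sketch rather than the authors' own code. It assumes the unnormalised Choi convention J_Φ = Σ_ij |i⟩⟨j| ⊗ Φ(|i⟩⟨j|), under which J_D = 1/d_B, and it assumes that the mixing channel Λ has as its Choi operator the normalised projector onto the antisymmetric subspace, J_Λ = (1 − J_T)/(d_A − 1). The helper names `choi`, `partial_trace_B` and `min_eig` are illustrative. The sketch only verifies that the two decompositions are feasible and that slightly smaller weights fail for these particular channels; it does not prove optimality over all channels.

```python
import numpy as np

def choi(channel, d):
    """Unnormalised Choi operator J = sum_ij |i><j| (x) channel(|i><j|)."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            J += np.kron(E, channel(E))
    return J

def partial_trace_B(J, d):
    """Trace out the output system B of an operator on A (x) B."""
    return np.einsum("abcb->ac", J.reshape(d, d, d, d))

def min_eig(J):
    """Smallest eigenvalue of the Hermitian part of J."""
    return float(np.linalg.eigvalsh((J + J.conj().T) / 2).min())

d = 3
J_T = choi(lambda X: X.T, d)            # transposition: the swap operator
J_D = np.eye(d * d) / d                 # completely depolarising channel
# Assumption: mixing channel proportional to the antisymmetric projector.
J_L = (np.eye(d * d) - J_T) / (d - 1)

spa_weight = d                          # SPA(T) = d_A
rob_weight = (d - 1) / 2                # weight quoted for R(T)

# Mixing with the depolarising channel: weight d_A is needed.
spa_ok = min_eig(J_T + spa_weight * J_D) > -1e-9
spa_smaller_fails = min_eig(J_T + (spa_weight - 0.1) * J_D) < 0

# Mixing with the assumed channel: weight (d_A - 1)/2 already suffices.
lam_is_channel = (min_eig(J_L) > -1e-9
                  and np.allclose(partial_trace_B(J_L, d), np.eye(d)))
rob_ok = min_eig(J_T + rob_weight * J_L) > -1e-9
rob_smaller_fails = min_eig(J_T + (rob_weight - 0.1) * J_L) < 0
```

With this choice the mixture J_T + λ J_Λ at λ = (d_A − 1)/2 is the projector onto the symmetric subspace, which is why no weight is spent on the part of the swap operator that is already positive.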

Proposition 11. For any trace-preserving map
In the case of the transpose, it holds that SPA′(T) = R(T) = (d_A − 1)/2, so we know that an optimal approximation of the transposition map can be realised with only the depolarising channel, as long as one considers the optimised approach of Eq. (60). However, this is not the case for general maps, and the advantages offered by the generalised robustness approach can provide new insight into optimal approximations of maps, as we shall see in the following.

Choi map. The Choi map C ∈ H(A, A) with d_A = 3 is an example of an indecomposable positive map, and is defined by [60] C(X) :=
where X_ij denote the matrix elements of X in a chosen basis. A numerical evaluation shows that the optimal decompositions for C give SPA(C) = 3/2 and SPA′(C) = 2/3. With the robustness, an improved choice can be obtained by choosing Λ = id and mixing as J_C + (1/6) J_id ≥ 0, yielding R(C) = 1/6. Consequently, mixing with more general maps can not only provide quantitative improvements, but also identify ways of implementing non-CPTP maps which are impossible to find with the standard structural physical approximations.
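The numbers SPA(C) = 3/2 and the feasibility of J_C + (1/6) J_id ≥ 0 can be reproduced with a few lines of NumPy. This is a hedged sketch: since the explicit matrix form of C is not reproduced here, the code assumes a common trace-preserving normalisation of the Choi map, C(X)_ii = (X_ii + X_{i+1,i+1})/2 and C(X)_ij = −X_ij/2 for i ≠ j (indices modulo 3). That form, and the helper names, are assumptions of this example. It also assumes the unnormalised Choi convention, so that J_D = 1/3.

```python
import numpy as np

def choi(channel, d):
    """Unnormalised Choi operator J = sum_ij |i><j| (x) channel(|i><j|)."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            J += np.kron(E, channel(E))
    return J

def min_eig(J):
    """Smallest eigenvalue of the Hermitian part of J."""
    return float(np.linalg.eigvalsh((J + J.conj().T) / 2).min())

def choi_map(X):
    """Assumed trace-preserving form of the Choi map on 3x3 matrices."""
    d = X.shape[0]
    diag = np.array([X[i, i] + X[(i + 1) % d, (i + 1) % d] for i in range(d)])
    off_diag = X - np.diag(np.diag(X))
    return (np.diag(diag) - off_diag) / 2

d = 3
J_C = choi(choi_map, d)
J_id = choi(lambda X: X, d)

# Depolarising mixing: smallest weight s with J_C + s * 1/d >= 0.
spa_C = -d * min_eig(J_C)

# Mixing with the identity channel at weight 1/6, and slightly below it.
id_mix_ok = min_eig(J_C + J_id / 6) > -1e-9
id_mix_smaller_fails = min_eig(J_C + (1 / 6 - 0.01) * J_id) < 0
```

Under these assumptions `spa_C` evaluates to 3/2, and the identity channel needs a weight nine times smaller than the depolarising channel to restore complete positivity.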
An interesting difference between the SPA- and robustness-based approaches is that the optimal SPA of the Choi map is a measure-and-prepare (entanglement-breaking) channel [7], while the map obtained in the robustness-based approach is not (as can be verified with the PPT criterion). Since measure-and-prepare channels enjoy an easy implementation in practical settings, it would be an interesting extension of our approach to consider the extent of a quantitative advantage that can be maintained while requiring that the optimal CPTP approximation be entanglement breaking.
We also note that another approach to realising positive maps was studied in Ref. [61] by using multiple copies of the input state, where a related SPA-based approximation was also considered. An extension of the methods of our work to this framework could provide additional insight into the implementability of positive maps.

Inverse quantum channels
A fundamentally important case of a non-CPTP map encountered in many settings is the inverse linear map of a bijective quantum channel, that is, a map such that Λ^{-1} ∘ Λ = Λ ∘ Λ^{-1} = id.² Note that such an inverse is not guaranteed to exist for a general channel, and even when it does, it will not form a valid quantum channel unless Λ is a unitary map. However, many important cases of quantum dynamics are indeed invertible, allowing us to study their inverses in the formalism of our work.
Non-Markovianity. One setting in which channel inverses play a role is the study of non-Markovianity. Among the different ways to define Markovian evolution, a common way is to say that a time-dependent evolution governed by the channel Λ_{t,0} is Markovian if it behaves as a physical map over any time interval [t, t + δt]. Mathematically, any Λ_{t,0} satisfying this condition is said to be CP-divisible [62][63][64], which can be formalised by the statement that for all times t and s ≤ t we can write Λ_{t,0} = Ξ_{t,s} ∘ Λ_{s,0}, where the propagator Ξ_{t,s} is a CPTP map. For more general channels, the decomposition Λ_{t,0} = Ξ_{t,s} ∘ Λ_{s,0} results in some Ξ_{t,s} that is non-CPTP, indicating that Markovian dynamics break down after some time point s.
Observe that, provided Λ_{t,0} is invertible for all t, we can take Ξ_{t,s} = Λ_{t,0} ∘ Λ_{s,0}^{-1}. Therefore, the non-physicality of Λ_{t,0} ∘ Λ_{s,0}^{-1} serves as an indicator of non-Markovianity and, since this map is trace preserving for any trace-preserving Λ, the diamond norm ‖Λ_{t,0} ∘ Λ_{s,0}^{-1}‖_♦ can be used as a quantitative measure of non-Markovianity over the time interval [s, t]. This is similar to the original approach of Ref. [62], where a quantifier based on the trace norm of the Choi operator was employed; the advantage of our definition is the ability to interpret this quantity operationally.
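As an illustration of this quantifier, the sketch below evaluates it for a toy qubit dephasing evolution in which the off-diagonal elements of the state are multiplied by a decoherence function f(t). The function `f`, its parameters, and the helper names are hypothetical choices made for this example only. For such a map the propagator Λ_{t,0} ∘ Λ_{s,0}^{-1} rescales the coherences by r = f(t)/f(s) and can be written as ((1 + r)/2) id + ((1 − r)/2) Z·Z. In this special case the lower bound ‖J‖_1/d_A on the diamond norm is matched by the upper bound obtained from that decomposition, so the code computes the diamond norm through the trace norm of the Choi operator without any semidefinite programming.

```python
import numpy as np

def f(t):
    """Hypothetical decoherence function: positive (hence invertible), non-monotonic."""
    return np.exp(-0.2 * t) * (1 + 0.5 * np.cos(2 * t)) / 1.5

def propagator_choi(t, s):
    """Unnormalised Choi operator of the intermediate map from time s to time t.

    The map keeps the diagonal of a qubit state and multiplies the
    off-diagonal elements by r = f(t) / f(s).
    """
    r = f(t) / f(s)
    J = np.zeros((4, 4))
    J[0, 0] = J[3, 3] = 1.0     # |00><00| and |11><11|
    J[0, 3] = J[3, 0] = r       # |00><11| and |11><00|
    return J

def non_markovianity(t, s):
    """Trace norm of the Choi operator divided by d_A = 2.

    For this Pauli-diagonal family the value coincides with the diamond
    norm of the propagator, and equals max(1, |r|).
    """
    J = propagator_choi(t, s)
    return float(np.abs(np.linalg.eigvalsh(J)).sum() / 2)

markovian_interval = non_markovianity(1.0, 0.0)    # coherence decays: value 1
revival_interval = non_markovianity(3.0, 1.5)      # coherence revives: value > 1
```

A value of 1 signals a physical propagator on the chosen interval, while any excess above 1 measures how strongly the coherence revival violates CP-divisibility.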
Specifically, we observe that quantum mechanics is ultimately a Markovian theory: if we had knowledge of all relevant objects, then all quantum dynamics could be described by Markovian unitary dynamics. That is, any information from the past that is relevant to the future must pass through the present, and hence the optimal prediction of future observational statistics ultimately depends only on the present state of reality. Non-Markovianity is an artefact of not tracking all relevant information in the present. In our context, this arises as our mathematical characterisation of the candidate channel, Λ_{s,0}, does not track the state of the environment. The operational relevance of ‖Λ_{t,0} ∘ Λ_{s,0}^{-1}‖_♦ then becomes more evident. Notably, in Sec. 4 we presented a systematic means of simulating any unphysical map Ξ_{t,s} by introducing an ancillary system X. Here, we may think of this as building a Markovian model for Ξ_{t,s} by introducing X = Σ_i µ_i ρ_i as an "artificial environment". The feeding in of different states ρ_i depending on X then represents a means by which non-Markovian behaviour on the system is realised. While this construction does not immediately look physical (as it allows affine mixtures of quantum states), it can be simulated by a classical computer with sufficient resource overhead. The resource cost of doing so, ‖Ξ_{t,s}‖_♦, thus represents a bound on the information processing capabilities of the environment that enable said non-Markovian behaviour to emerge.
There are multiple approaches for extending this to a time-independent measure of non-Markovianity of Λ. One could, for example, take the supremum of the measure ‖Λ_{t,0} ∘ Λ_{s,0}^{-1}‖_♦ over all t and s. This would then characterise how much extra information processing we need beyond tracking the state of the system at time s to simulate dynamics over the time interval [s, t]. We may also follow an approach based on Ref. [62] and define I_♦(Λ) := ∫_0^∞ g_{♦,t}(Λ) dt, where g_{♦,t} can be understood as the right-hand derivative of the diamond norm of the dynamics at time t. The quantity I_♦(Λ) therefore represents the total amount of non-Markovianity in this evolution. A suitable normalisation of this quantity can allow for the comparison of the strength of non-Markovianity in different settings [62,64]. We leave a careful consideration of these possibilities to future work.
Error mitigation. Another application for the study of channel inverses is error mitigation. This setting considers the scenario where one is tasked with computing expectation values of the type Tr[U(ρ)A] for an input state ρ, ideal gate U, and observable A, while operations are followed by a noise channel Θ. A leading approach to this problem, called probabilistic error cancellation [17,65], is to counteract the noise with the inverse map Θ^{-1}, so that Tr[U(ρ)A] = Tr[Θ ∘ Θ^{-1} ∘ U(ρ) A]. By decomposing Θ^{-1} into a quasiprobability distribution over a convex subset of channels P = {Λ_i} such that Λ_i ∘ U would be implementable on a (fictitious) noiseless device, standard quasiprobability sampling arguments allow us to construct an unbiased estimator for Tr[U(ρ)A] using only operations implementable on a noisy device. The optimal overhead cost of such a procedure scales as γ_P(Θ)^2, where [17,30]

γ_P(Θ) = min { Σ_i |c_i| : Θ^{-1} = Σ_i c_i Λ_i, Λ_i ∈ P }. (64)

The specific choice of P can be made depending on not only the physical setting in consideration, but also on one's precise motivations. On the one hand, a set with a finite number of operations (e.g., Clifford gates) turns Eq. (64) into a linear program [17,65], making the overhead cost easily computable while sacrificing the expressibility of devices. On the other hand, choosing a larger set with an infinite number of implementable operations takes into account a larger expressibility [30], but makes the computation of Eq. (64) hard in general. Here, to accommodate computability and expressibility at the same time, we take another approach considered in Refs. [33,66]: we choose P to be all physical quantum channels. We notice that the norm ‖·‖ provides the cost of error mitigation in this setting as γ_CPTP(Θ) = ‖Θ^{-1}‖ = ‖Θ^{-1}‖_♦, which can be efficiently computed by semidefinite programming.
Although this choice of P might seem too permissive, the lower bound obtained through this approach can actually match known achievability results (upper bounds) [33], showing new optimality results and even improving on the specialised characterisation of Ref. [30] in some cases. Of note is the fact that, since any inverse map Θ^{-1} of a quantum channel Θ is trace preserving, our Thm. 3 shows a new application of the diamond norm in bounding the cost of error mitigation: it always holds that γ_P(Θ) ≥ ‖Θ^{-1}‖_♦, regardless of the choice of P. In some cases, such as when experiencing the leakage or loss of some qubits during computation, the noisy evolution can actually correspond to a map which is not trace preserving. Although many previous approaches did not take this into consideration, our methods explicitly extend to such maps, allowing one to understand the simulation of non-trace-preserving linear maps through Thm. 4. Related settings which our methods can characterise include the so-called linear quantum error correction [67], which aims to correct errors of systems undergoing general, non-CPTP dynamics Θ, as well as error mitigation for non-Markovian noise [68], where the mitigation cost can be related to a measure of non-Markovianity. In such cases, our approach can thus help understand the implementation of not only the inverse maps, but also the dynamics themselves.
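The quasiprobability picture of probabilistic error cancellation can be made concrete for single-qubit depolarising noise. The sketch below is an illustration under stated assumptions, not a reproduction of the cited constructions: it takes Θ(ρ) = (1 − p)ρ + p Tr(ρ) 1/2, writes Θ^{-1} as a signed combination of conjugations by the Pauli operators, and compares the resulting cost Σ_i |q_i| with the quantity ‖J_{Θ^{-1}}‖_1/d_A, which lower bounds the diamond norm. All function and variable names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def depolarise(rho, p):
    """Assumed noise model: (1 - p) rho + p Tr(rho) 1/2."""
    return (1 - p) * rho + p * np.trace(rho) * I2 / 2

def inverse_quasiprobs(p):
    """Signed weights q_i with inverse(rho) = sum_i q_i P_i rho P_i."""
    b = -p / (4 * (1 - p))
    return np.array([1 - 3 * b, b, b, b])

def apply_quasi(rho, q):
    """Apply the signed Pauli mixture to an operator."""
    return sum(qi * P @ rho @ P.conj().T for qi, P in zip(q, PAULIS))

def choi(channel, d=2):
    """Unnormalised Choi operator J = sum_ij |i><j| (x) channel(|i><j|)."""
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            J += np.kron(E, channel(E))
    return J

p = 0.1
q = inverse_quasiprobs(p)
gamma = float(np.abs(q).sum())          # cost of this decomposition

# Lower bound on the diamond norm of the inverse: trace norm of Choi / d_A.
J_inv = choi(lambda A: apply_quasi(A, q))
lower_bound = float(np.abs(np.linalg.eigvalsh(J_inv)).sum() / 2)

# Sanity check: the signed mixture undoes the noise on a test state.
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]], dtype=complex)
recovered = apply_quasi(depolarise(rho, p), q)
```

Here `gamma` and `lower_bound` agree, so for this noise model the Pauli decomposition already attains the diamond-norm bound. In a sampling implementation one would draw the Pauli P_i with probability |q_i|/γ and weight each outcome by γ·sign(q_i), which is the origin of the γ² sampling overhead mentioned in the text.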

Computing the measures
To showcase the application of our methods and evaluate the measures for some representative examples, we will consider the inverse maps of several fundamental types of noisy quantum evolutions: depolarising, amplitude damping, dephasing, and qubit leakage channels. The expressions for the first two appeared in Ref. [33], which we rederive using the methods and results of this work. We also find for the first three that the optimal decomposition into Λ_± for the norm ‖Θ^{-1}‖ (Eq. (9)) can be taken as convex mixtures of unitaries and state preparations. Thus, ‖Θ^{-1}‖ also serves as the optimal cost γ_P(Θ^{-1}) with a smaller set P as considered in Ref. [30], indicating that the capability to implement all CPTNI maps does not provide any advantage over that of implementing unitaries and state preparations only. Note that the inverses of trace-preserving maps are trace preserving, and so in such cases the equality ‖Φ‖ = ‖Φ‖_♦ = 2R(Φ) + 1 holds by Thm. 3, which means that it will suffice to evaluate any one of the measures.
Depolarising noise. The depolarising channel, given by D_p(X) := (1 − p) X + p Tr(X) 1/d_A for some noise parameter p ∈ [0, 1), has the inverse D_p^{-1}(X) = X/(1 − p) − (p/(1 − p)) Tr(X) 1/d_A. This gives

Finally, the equality ‖J_{∆_p^{-1}}‖_1 = ‖S‖_1 is obtained by noticing that J_{∆_p^{-1}} = Σ_{i,j} (S)_{ij} |ii⟩⟨jj|, which has the same eigenvalues as S.
The eigenvalues of S can be readily obtained due to the fact that it is a circulant matrix [70, 2.2.P10], allowing for a straightforward computation of the trace norm ‖S‖_1 and altogether giving For the qubit dephasing channel with p ∈ [0, 1/2), we recover Since each eigenvector |s_m⟩ of S in (70) corresponds to the application of Z^m, Λ_± in (72) are realised as probabilistic applications of the generalised phase unitaries.
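The circulant structure can be exploited directly in a numerical check. The sketch below is restricted to the qubit case and assumes the parametrisation ∆_p(X) = (1 − p) X + p Z X Z, which is consistent with the stated range p ∈ [0, 1/2) but is an assumption of this example, as are the helper names. Under this assumption the inverse rescales coherences by c = 1/(1 − 2p), so S has first row (1, c). The eigenvalues of a circulant matrix are the discrete Fourier transform of its first row (up to ordering), and since circulant matrices are normal, the trace norm is the sum of the absolute values of those eigenvalues.

```python
import numpy as np

def circulant(first_row):
    """Circulant matrix whose k-th row is the first row cyclically shifted by k."""
    row = np.asarray(first_row, dtype=float)
    return np.array([np.roll(row, k) for k in range(len(row))])

def circulant_trace_norm(first_row):
    """Trace norm of a circulant matrix via the FFT of its first row."""
    return float(np.abs(np.fft.fft(first_row)).sum())

def dephasing_inverse_cost(p):
    """||S||_1 / d_A for the inverse of the assumed qubit dephasing channel."""
    c = 1.0 / (1.0 - 2.0 * p)
    return circulant_trace_norm([1.0, c]) / 2

p = 0.2
S = circulant([1.0, 1.0 / (1.0 - 2.0 * p)])
cost = dephasing_inverse_cost(p)

# Cross-check 1: direct diagonalisation of S.
cost_direct = float(np.abs(np.linalg.eigvalsh(S)).sum() / 2)

# Cross-check 2: signed weights of the identity and Z-conjugation in the inverse.
quasi = np.array([(1 - p) / (1 - 2 * p), -p / (1 - 2 * p)])
cost_quasi = float(np.abs(quasi).sum())
```

All three values coincide with 1/(1 − 2p) under the assumed parametrisation, illustrating how the trace norm of S and a decomposition into phase unitaries lead to the same cost.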

Amplitude damping noise. The qubit amplitude damping channel
we have Proposition 10 thus gives A matching upper bound can be obtained by explicitly computing J A −1 γ (see e.g. [17,30]) and using the upper bound in Prop. 8.
The above shows a rather general method of obtaining lower bounds for linear maps which are inverses of other linear maps, without having to explicitly compute the full inverse map. Indeed, this can be extended to maps which only approximately invert a given channel, which is useful, for instance, when dealing with non-invertible maps, or when aiming to reduce the cost of implementing a given map by only requiring that it approximately mitigates the error.
Proof. We use Prop. 10 to get that The third line follows by the triangle inequality, and the last line is a consequence of the assumption that Φ • Φ(ρ) − ρ 1 ≤ ε for all ρ ∈ D(A), since we can write any Z = µ + ρ + − µ − ρ − for some The case of the other measures is analogous: using the variational form of the function Tr Z + (and similarly Tr Z − ) we can obtain where we used the Cauchy-Schwarz inequality. Using these bounds in Prop. 10 yields the stated result.
Leakage error. Consider the qubit leakage error ℒ_p(·) = L_p · L_p^† where L_p := |0⟩⟨0| + √(1 − p) |1⟩⟨1|. This represents a situation where the excited state is lost with probability p, and this stochastic nature is reflected in the fact that ℒ_p is not trace preserving. The inverse of the leakage error is given by ℒ_p^{-1}(·) = L_p^{-1} · L_p^{-1}. Since this is a completely positive map, Eq.

Discussion
We introduced a comprehensive quantitative approach to the study of non-completely-positive linear maps, focusing in particular on the task of approximating and simulating them with valid quantum channels. To this end, we considered several quantifiers which generalise measures employed in the study of quantum resources, namely variants of the robustness and base norm measures. We showed that they satisfy very close relations with the diamond norm, and in particular are exactly equal to it for any trace-preserving linear map. Since such trace-preserving maps are the most commonly encountered examples of dynamics beyond physical quantum channels, this allowed us to establish fruitful interrelations between the quantities, and discover new applications of the fundamentally important quantity that is the diamond norm. We developed in particular two operational connections. Firstly, we introduced a method of simulating general linear maps with quantum channels, shifting the difficulty of realising non-quantum dynamics onto the structurally simpler task of implementing linear combinations of quantum states. We showed that our robustness measure exactly quantifies the cost of realising such schemes in terms of the required state-based resources. Secondly, we showed that another variant of the robustness finds use as an exact quantifier of the performance advantage that a general linear map can enable over quantum channels in a class of state discrimination games. We introduced a number of useful bounds and explicitly employed them to demonstrate the computability of the measures for some representative examples. Finally, we showed how our measures can find use in the quantitative characterisation of several practically relevant settings, namely, structural approximations of positive maps, non-Markovianity quantification, and tightly bounding the cost of probabilistic error mitigation.
Although we focused on the application of our framework to Hermiticity-preserving maps, we note that more general linear maps can be treated in a similar way. The simplest way to approach this is to decompose any linear map Φ into its Hermiticity-preserving and skew-Hermiticity-preserving parts, that is, write Φ = Φ_H + iΦ_SH where the constituent maps are defined through J_{Φ_H} := (J_Φ + J_Φ^†)/2 and J_{Φ_SH} := (J_Φ − J_Φ^†)/(2i). The maps Φ_H and Φ_SH are then explicitly Hermiticity-preserving, and our arguments can be applied to them directly. A similar approach was employed in [72] [72] are also optimal for the robustness-based quantities.
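The decomposition Φ = Φ_H + iΦ_SH is straightforward to carry out at the level of Choi operators. The following minimal sketch uses a randomly generated Choi matrix as a stand-in for a general linear map; the function name `hermitian_parts` and the random test matrix are illustrative assumptions.

```python
import numpy as np

def hermitian_parts(J):
    """Split a Choi operator as J = J_H + 1j * J_SH with J_H and J_SH Hermitian."""
    J_dag = J.conj().T
    J_H = (J + J_dag) / 2
    J_SH = (J - J_dag) / 2j
    return J_H, J_SH

# Stand-in for the Choi operator of a general (non-Hermiticity-preserving) qubit map.
rng = np.random.default_rng(0)
d = 2
J = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))

J_H, J_SH = hermitian_parts(J)

both_hermitian = (np.allclose(J_H, J_H.conj().T)
                  and np.allclose(J_SH, J_SH.conj().T))
reconstructs = np.allclose(J_H + 1j * J_SH, J)
```

Because both parts have Hermitian Choi operators, each corresponds to a Hermiticity-preserving map to which the measures of this work apply.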
We also note that the diamond norm has been applied as a measure of specific properties of quantum channels, such as their ability to detect coherence [73]. Connections between our methods and such approaches could be fruitful to explore.
A major outstanding issue is to understand how the framework of this work can be extended to non-linear maps, which could allow for the characterisation and more efficient approximation of important unphysical dynamics such as quantum cloners. This question was already asked in the earliest works concerned with approximating non-CPTP maps with quantum channels [5], but it still remains a considerable challenge to devise approaches which could apply to general non-linear transformations.