State retrieval beyond Bayes’ retrodiction

In the context of irreversible dynamics, the meaning of the reverse of a physical evolution can be quite ambiguous. It is a standard choice to define the reverse process using Bayes' theorem, but, in general, this is not optimal with respect to the relative entropy of recovery. In this work we explore whether it is possible to characterise an optimal reverse map building from the concept of state retrieval maps. In doing so, we propose a set of principles that state retrieval maps should satisfy. We find that the Bayes inspired reverse is just one case in a whole class of possible choices, which can be optimised to give a map retrieving the initial state more precisely than the Bayes rule. Our analysis has the advantage of naturally extending to the quantum regime. In fact, we find a class of reverse transformations containing the Petz recovery map as a particular case, corroborating its interpretation as a quantum analogue of the Bayes retrieval. Finally, we present numerical evidence showing that by adding a single extra principle one can isolate for classical dynamics the usual reverse process derived from Bayes' theorem.


Introduction
Reversible transformations of a physical system are bijective mappings between inputs and outputs. They are called reversible because a well-defined notion of reverse operation exists: it inverts the direction of the element-wise mapping, sending the space of outputs back to the space of inputs. Reversible quantum channels are unitary channels, while reversible classical stochastic processes are permutations.
Whenever the bijectivity between the space of inputs and outputs is lost, the standard definition of reverse operation no longer applies and one is forced to define a notion of generalised reversion.
Jacopo Surace: jacopo.surace@icfo.eu, Matteo Scandi: matteo.scandi@icfo.eu

To this end, an illuminating approach is to adopt a statistician's perspective and associate reverse processes with the process of retrodiction. It has been shown in [1][2][3][4][5] that the common method for defining a generalised reverse map is analogous to the operation of retrodiction based on Bayes' theorem. In particular, considering the left-stochastic matrix Φ as the conditional probability ϕ(i|j) = Φ_{i,j} of obtaining the microstate i from the microstate j, the Bayes inspired reverse map Φ̃_B is defined in coordinates as:

$$(\tilde{\Phi}_B)_{i,j} = \frac{\pi_i}{(\Phi(\pi))_j}\,\Phi_{j,i}, \tag{1}$$

where π is a fiducial state that is perfectly retrieved, called the prior in Bayesian inference. Even though the choice of this specific reverse map can be thoroughly justified in the context of classical Bayesian inference [6,7], as we will see, it is just one of many different reasonable reverse maps. Furthermore, the notorious difficulty of extending the Bayes rule to quantum systems [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22], together with the partial arbitrariness of this choice, makes the characterisation of quantum reverse maps even more questionable. In the quantum scenario, what can arguably be called the standard reverse map is the Petz recovery map [23]. This map was introduced in relation to its properties in the context of the data processing inequality [24][25][26][27] but, again, it was shown in [3,4] that the Petz map can be regarded as one of the possible extensions of the Bayes inspired reverse map to the quantum context, and that it is a fundamental tool in deriving fluctuation theorems [3,4].
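The coordinate definition of the Bayes inspired reverse can be made concrete with a minimal sketch. The 2×2 matrix and the prior below are illustrative values chosen here, not taken from the paper, and the helper names are hypothetical; the convention assumed is the one stated in the text, Φ_{i,j} = ϕ(i|j), so that columns sum to one.

```python
# Minimal sketch: Bayes inspired reverse of a left-stochastic matrix.
# Convention assumed from the text: Phi[i][j] = phi(i|j), columns sum to one.

def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def bayes_reverse(Phi, prior):
    """(Phi_B)[i][j] = prior[i] * Phi[j][i] / (Phi prior)[j]."""
    out = matvec(Phi, prior)
    n = len(prior)
    return [[prior[i] * Phi[j][i] / out[j] for j in range(n)] for i in range(n)]

Phi = [[0.9, 0.3],
       [0.1, 0.7]]      # illustrative values (assumption)
prior = [0.5, 0.5]      # illustrative prior (assumption)

Phi_B = bayes_reverse(Phi, prior)

# The reverse map is again left stochastic ...
column_sums = [sum(Phi_B[i][j] for i in range(2)) for j in range(2)]
# ... and the round trip leaves the prior untouched.
retrieved_prior = matvec(matmul(Phi_B, Phi), prior)
```

In this example `column_sums` is a vector of ones and `retrieved_prior` coincides with `prior` up to rounding, which is the sense in which the prior is "perfectly retrieved".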
In the following we tackle the problem of the arbitrariness in the choice of a generalised reverse map, introducing a definition of the class of state retrieval maps based on a set of physical desiderata. We will differentiate between state retrieval maps and reverse (or retrodiction) maps, considering reverse maps as state retrieval maps with the additional property of being involutive. We show that by choosing a maximisation principle we can single out a unique optimal map that outperforms the Bayes retrodiction in the task of state retrieval. The advantage of this construction is that, being based on a set of physical principles, it can be naturally extended to the quantum regime, partly overcoming the difficulty in directly extending the Bayes rule. Even in the quantum case, using an analogous maximisation principle, we show that in the considered example the optimal map outperforms the Petz map.
Finally, we present numerical evidence suggesting that by adding a desideratum, namely that reversing an evolution should be involutive (i.e., the reverse of the reverse is the forward map), it is possible to single out a unique map coinciding with the Bayes inspired retrieval map for classical stochastic maps.

Rationale
Before entering the technical details, it is important to explain the intuition that guides our construction. The main aim of this work is to define a physical process that can recover the initial conditions of a dissipative dynamics Φ as accurately as possible. For unitaries there is an unambiguous choice given by Φ⁻¹, but this is not well defined for general processes, as the inverse of a dissipative evolution is unphysical. For this reason, we put forward a construction of the main desiderata for a generic inverse Φ̃.
First, it should be noticed that any physically realisable process with domain equal to codomain has at least one fixed point. This gives us the freedom to encode any additional information on the initial conditions into the fixed point of Φ̃Φ, i.e., one can always choose without loss of generality a state π that will be perfectly recovered.
Moreover, we impose that the statistics of Φ̃Φ are as time symmetric as possible. This requirement takes the form of a detailed balance condition on the transition rates, which is the canonical method of enforcing time symmetry in dissipative dynamics. It is an easy exercise to prove that the only evolution which is detailed balanced with respect to every state is the identity map, which in this case corresponds to the undesirable Φ̃ = Φ⁻¹. Hence, in order not to lose generality, we require detailed balance with respect to a single state. It is a standard result that if a map satisfies detailed balance with respect to a state, then this state is also a fixed point of the evolution. Choosing any state other than the π defined above would therefore introduce some extra information about the initial conditions, namely that this additional state should be perfectly retrieved. For this reason, we require the same state π to be the one with respect to which Φ̃Φ is time symmetric.
The last requirement is geometrical in nature. It should be noticed that any rotation in the image of Φ̃Φ can only decrease the quality of the retrieval in a trivial manner: in fact, any rotation of the image can be undone by simply rotating back at the end of the protocol. For this reason, without loss of generality one can assume that the image of Φ̃Φ has the same orientation as the original space of states. This intuitive argument is mathematically encoded by principle 5.
We show that the family of possible retrieval maps Φ̃ satisfying the principles above is actually a convex set. To benchmark our construction, then, we also verify that the most common choice of state retrieval, i.e., the one coming from Bayes' retrodiction, is indeed contained in the set that we defined. Still it should be kept in mind that this work is not primarily interested in reconstructing the Bayes inversion (a topic covered in Sec. 5), but rather in exploring the intuitive definition of retrieval maps.
Lastly, the principle we choose to single out the optimal map is also geometrical. In particular, we start from the consideration that any physical map compresses the space of states. This implies the existence of some states outside of the image of Φ̃Φ that simply cannot be retrieved. Following this intuition, we assess the quality of a retrieval map Φ̃ by how big the volume of the image of Φ̃Φ is, or, dually, by how small the volume of inaccessible states is. Then, the optimal map should be the one maximising this volume (see Fig. 4 for an illustrative example).
The discussion above leaves open the question of how to measure volumes in the phase space. Our choice is to look at the determinant of Φ̃Φ. This is motivated by the following two reasons: first, since Φ̃Φ is a linear map, a standard result from linear algebra tells us that it rescales Euclidean volumes by the absolute value of its determinant, so our choice aligns with the canonical treatment. Secondly, the determinant can be efficiently optimised through convex optimisation. This last property is particularly desirable when taking into consideration applications to concrete physical problems.
Characterisation of state retrieval

General requirements
In order to characterise which maps can be useful as state retrieval, we put forward some minimal desiderata that they should satisfy.
Suppose one wants to revert a map Φ given some information on the initial conditions of the system encoded by a fiducial state π, called the prior. The first principle we define is that the state retrieval should be physically implementable, which mathematically corresponds to:
1. The state retrieval map Φ̃ is described by a left stochastic matrix;
Notice that this requirement prevents one from setting Φ̃ = Φ⁻¹: in fact, for dissipative evolutions Φ⁻¹ is not a stochastic matrix, so it cannot be physically realised, as it would send general states to something that is not a probability vector. Still, for reversible transformations (i.e., for permutations) Φ⁻¹ is indeed a stochastic map which perfectly recovers the input of Φ. Since this choice is obviously optimal, we also require that:
2. If the map Φ⁻¹ exists and it is left stochastic, the state retrieval Φ̃ coincides with it, that is, whenever it is possible, we should set Φ̃ = Φ⁻¹.
Notice that since both Φ and Φ̃ are stochastic, this also holds for their composition Φ̃Φ. This implies that the composite map has at least one probability vector associated to the eigenvalue one, corresponding to a state that is perfectly recovered by the retrieval map. Thanks to this fact, we can encode the information about the initial conditions in it, that is, the prior state π should correspond to an eigenvector of the composite evolution with eigenvalue one. Hence, the third requirement is:
3. The prior state is one of the perfectly retrieved states: Φ̃(Φ(π)) = π.
Finally, we do not only require that π is an equilibrium state of the dynamics Φ̃Φ, but also that Φ̃Φ is detailed balanced with respect to it. This can be expressed in coordinates as

$$(\tilde{\Phi}\Phi)_{i,j}\,\pi_j = (\tilde{\Phi}\Phi)_{j,i}\,\pi_i, \tag{2}$$

and it corresponds to the requirement of time symmetric dynamics in the associated Markov chain. This request can be interpreted as follows: since Φ̃Φ corresponds to an evolution forth-and-back, its statistics should not distinguish between the two directions of time. By this we mean that the probability of measuring the microstate i at the beginning and evolving to j should be the same as the one of first measuring j and ending up in the i-th state. Unfortunately, we cannot impose such a strong requirement for all states, as it would lead to the unphysical Φ⁻¹. For this reason, we limit ourselves to imposing time symmetry in the rates of the dynamics with respect to the prior state, as expressed in Eq. (2):
4. The evolution Φ̃Φ satisfies detailed balance with respect to π.
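The contrast between principles 1 and 2 can be illustrated with a small sketch: the inverse of a dissipative stochastic matrix has negative entries, whereas the inverse of a permutation is again stochastic. The matrices and helper names below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: the inverse of a dissipative stochastic matrix is not
# stochastic, while the inverse of a permutation is.

def inverse_2x2(M):
    """Closed-form inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def is_left_stochastic(M, tol=1e-12):
    """Non-negative entries and every column summing to one."""
    nonneg = all(x >= -tol for row in M for x in row)
    cols_ok = all(abs(sum(M[i][j] for i in range(len(M))) - 1.0) < tol
                  for j in range(len(M[0])))
    return nonneg and cols_ok

dissipative = [[0.9, 0.3],
               [0.1, 0.7]]   # illustrative values (assumption)
permutation = [[0.0, 1.0],
               [1.0, 0.0]]

inv_dissipative = inverse_2x2(dissipative)   # has negative off-diagonal entries
inv_permutation = inverse_2x2(permutation)   # equals the permutation itself

dissipative_inverse_ok = is_left_stochastic(inv_dissipative)
permutation_inverse_ok = is_left_stochastic(inv_permutation)
```

Here the columns of `inv_dissipative` still sum to one, but its off-diagonal entries are negative, so it would map some probability vectors outside the simplex.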
In order to explore which maps can be considered as possible candidates for a state retrieval, it is first useful to introduce a particular parametrisation of stochastic maps. This is the subject of the next section.
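As a hedged numerical illustration of principles 3 and 4, the sketch below measures the largest violation of the detailed balance condition for a round-trip map built with the Bayes inspired reverse, and compares it with the forward map alone. The 3×3 matrix, the prior and the function names are assumptions made for this example.

```python
# Minimal sketch: detailed balance of the round trip (Bayes reverse) o Phi
# with respect to the prior, checked entry by entry.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def bayes_reverse(Phi, prior):
    """(Phi_B)[i][j] = prior[i] * Phi[j][i] / (Phi prior)[j]."""
    out = matvec(Phi, prior)
    n = len(prior)
    return [[prior[i] * Phi[j][i] / out[j] for j in range(n)] for i in range(n)]

def detailed_balance_gap(M, pi):
    """Largest violation of M[i][j] * pi[j] == M[j][i] * pi[i]."""
    n = len(pi)
    return max(abs(M[i][j] * pi[j] - M[j][i] * pi[i])
               for i in range(n) for j in range(n))

Phi = [[0.6, 0.1, 0.3],
       [0.3, 0.7, 0.2],
       [0.1, 0.2, 0.5]]      # columns sum to one; illustrative values
prior = [0.2, 0.3, 0.5]      # illustrative prior

round_trip = matmul(bayes_reverse(Phi, prior), Phi)

gap_round_trip = detailed_balance_gap(round_trip, prior)  # numerically zero
gap_forward = detailed_balance_gap(Phi, prior)            # generically non-zero
fixed_point = matvec(round_trip, prior)                   # equals the prior
```

The forward map alone is not detailed balanced with respect to this prior, while the round trip is, and the prior is a fixed point of the round trip as the standard result quoted in the Rationale requires.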

Parametrisation of stochastic maps with a given transition
Consider the family of stochastic maps Ψ with fixed transition Ψ(π) = σ, where both π and σ are probability vectors with strictly positive entries 1, and of the same dimension. These maps can be rewritten as:

$$\Psi = \Lambda_\Psi\, J_\pi^{-1}, \tag{3}$$

where J_π is a diagonal matrix with entries (J_π)_{i,i} := (π)_i, and Λ_Ψ is implicitly defined by the equation Λ_Ψ := Ψ J_π. This matrix satisfies the following two conditions. From the request that Ψ is stochastic one can deduce that:

$$(\Lambda_\Psi)_{i,j} \geq 0, \qquad \sum_i (\Lambda_\Psi)_{i,j} = \pi_j. \tag{4}$$

Moreover, since the transition Ψ(π) = σ is specified, Λ_Ψ also satisfies:

$$\sum_j (\Lambda_\Psi)_{i,j} = \sigma_i. \tag{5}$$

This means that any stochastic map Ψ with fixed transition Ψ(π) = σ is uniquely identified by an element Λ_Ψ of U(σ, π), the space of matrices with non-negative entries whose column vectors sum to σ and whose row vectors sum to π. Interestingly, U(σ, π) is a convex polytope with a finite number of vertices, denoted by V^(k)_{σ|π} [28] and indexed by k. Moreover, since the matrix transpose exchanges the roles of Eq. (4) and Eq. (5), the vertices of U(σ, π) and those of U(π, σ) are in a one-to-one correspondence through the transformation

$$\left(V^{(k)}_{\sigma|\pi}\right)^T = V^{(k)}_{\pi|\sigma}.$$

Putting everything together, we can then parametrise the matrix Ψ as:

$$\Psi = \sum_k \lambda_k\, V^{(k)}_{\sigma|\pi}\, J_\pi^{-1},$$

where {λ_k} are non-negative coefficients summing up to one. We note that the convex polytope U(σ, π) is not in general a simplex. Thus, an arbitrary element inside it can be parametrised by more than one convex combination of the vertices. Nevertheless, this parametrisation gives a way to uniquely identify a map through a coefficients vector {λ^(Ψ)} and the ordered pair of states (σ, π) of the fixed transition.

1 As a standard approach, in the classical case we are not going to consider probability vectors with zero entries, and in the quantum case we are not going to consider rank-deficient density matrices. These are special cases that in general inference studies are treated separately, taking care of the possible zeros appearing in the denominators. In the field of Bayesian inference, for example, techniques used to deal with these scenarios are often referred to as techniques to solve the zero frequency problem. Moreover, if we assume that all the states are defined on a space of fixed dimension, since zero frequency vectors are always ε-close to a full rank one, one could also argue that given any finite precision in the experiment it is impossible to certify them. Thus, without loss of generality this pathological case can be neglected.
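The two marginal conditions on Λ_Ψ can be checked directly. The sketch below, with an illustrative 3×3 stochastic matrix and prior (assumptions, as are the function names), builds Λ_Ψ = Ψ J_π, computes its two marginals, and reconstructs Ψ from it.

```python
# Minimal sketch: the matrix Lambda_Psi = Psi J_pi and its two marginals.

def lam_from_map(Psi, pi):
    """Lambda[i][j] = Psi[i][j] * pi[j] (right multiplication by diag(pi))."""
    return [[Psi[i][j] * pi[j] for j in range(len(pi))] for i in range(len(Psi))]

def map_from_lam(Lam, pi):
    """Inverse step: Psi = Lambda J_pi^{-1}."""
    return [[Lam[i][j] / pi[j] for j in range(len(pi))] for i in range(len(Lam))]

Psi = [[0.6, 0.1, 0.3],
       [0.3, 0.7, 0.2],
       [0.1, 0.2, 0.5]]      # columns sum to one; illustrative values
pi = [0.2, 0.3, 0.5]         # illustrative prior
n = len(pi)

sigma = [sum(Psi[i][j] * pi[j] for j in range(n)) for i in range(n)]  # Psi(pi)
Lam = lam_from_map(Psi, pi)

# Summing over the first index returns pi (stochasticity of Psi) ...
sum_over_i = [sum(Lam[i][j] for i in range(n)) for j in range(n)]
# ... and summing over the second index returns sigma (fixed transition).
sum_over_j = [sum(Lam[i][j] for j in range(n)) for i in range(n)]

Psi_back = map_from_lam(Lam, pi)
```

Read as a joint distribution over (output, input) pairs, Λ_Ψ has π and σ as its two marginals, which is why transposition swaps the roles of the two states.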

Parametrisation of state retrieval maps
The parametrisation just presented can be used to easily enumerate all the possible retrieval maps. First, it should be noticed that the transformation Φ maps the prior state π into Φ(π), meaning that it can be characterised by the vector of scalar coefficients {λ^(Φ)_k}:

$$\Phi = \sum_k \lambda^{(\Phi)}_k\, V^{(k)}_{\Phi\pi|\pi}\, J_\pi^{-1},$$

where V^(k)_{Φπ|π} are vertices of U(Φπ, π). In the same spirit, since requirements (1, 3) impose that the retrieval map Φ̃ is a left stochastic matrix with the fixed transition Φ̃(Φ(π)) = π, one can parametrise it as

$$\tilde{\Phi} = \sum_k \lambda^{(\tilde{\Phi})}_k\, V^{(k)}_{\pi|\Phi\pi}\, J_{\Phi\pi}^{-1},$$

where, in this case, V^(k)_{π|Φπ} are the vertices of U(π, Φπ). Thanks to the relation between U(Φπ, π) and U(π, Φπ), the vertices in the two cases are connected by the transposition (V^(k)_{Φπ|π})^T = V^(k)_{π|Φπ}. For this reason we can focus solely on the coefficients vector, and associate to each state retrieval a transformation R that maps the coefficients vector {λ^(Φ)_k} into {λ^(Φ̃)_k}. In the following sections we explore two possibilities for R, one associated with the Bayes inspired reverse, the other with what we call the optimal state retrieval.

Bayes inspired reverse
The Bayes inspired reverse defined in Eq. (1) satisfies the desiderata (1-4), so it is a legitimate state retrieval. Moreover, it is a surprising fact that it corresponds to a particularly simple transformation of the coefficient vector. Indeed, Eq. (1) reads in matrix form:

$$\tilde{\Phi}_B = J_\pi\, \Phi^T J_{\Phi\pi}^{-1} = \sum_k \lambda^{(\Phi)}_k \left(V^{(k)}_{\Phi\pi|\pi}\right)^T J_{\Phi\pi}^{-1} = \sum_k \lambda^{(\Phi)}_k\, V^{(k)}_{\pi|\Phi\pi}\, J_{\Phi\pi}^{-1}. \tag{10}$$

Hence, in this case R corresponds to the identity transformation, λ^(Φ̃_B)_i = λ^(Φ)_i.

2 To be more precise, a state retrieval map which sends Φ to Φ̃ is defined on the quotient space of U(Φπ, π), whose elements are equivalence classes of coefficients [{λ^(Φ)_k}], defined by the relation that two points are part of the same equivalence class if they induce the same map in U(Φπ, π) (and similarly for the image space U(π, Φπ)). When passing to the original space of probability distributions {λ^(Φ)_k}, the relation between state retrieval maps and the corresponding R is no longer one-to-one in general, but rather one-to-many. In particular, R should satisfy the implicit request of having a well-defined projection on the quotient space, namely, the corresponding state retrieval.
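The statement that the Bayes inspired reverse leaves the coefficients untouched amounts to saying that its Λ matrix is the transpose of the one of Φ. A minimal sketch of this check follows, assuming an illustrative 3×3 map and prior; the variable names are hypothetical.

```python
# Minimal sketch: the Bayes inspired reverse is obtained by transposing
# Lambda_Phi = Phi J_pi and dividing by the evolved prior sigma = Phi(pi).

Phi = [[0.6, 0.1, 0.3],
       [0.3, 0.7, 0.2],
       [0.1, 0.2, 0.5]]      # columns sum to one; illustrative values
prior = [0.2, 0.3, 0.5]      # illustrative prior
n = len(prior)

sigma = [sum(Phi[i][j] * prior[j] for j in range(n)) for i in range(n)]

# Lambda_Phi and its transpose
Lam_Phi = [[Phi[i][j] * prior[j] for j in range(n)] for i in range(n)]
Lam_T = [[Lam_Phi[j][i] for j in range(n)] for i in range(n)]

# Retrieval map built from the transposed Lambda: Lam_T J_sigma^{-1}
Phi_from_transpose = [[Lam_T[i][j] / sigma[j] for j in range(n)] for i in range(n)]

# Coordinate definition of the Bayes inspired reverse
Phi_bayes = [[prior[i] * Phi[j][i] / sigma[j] for j in range(n)] for i in range(n)]

max_difference = max(abs(Phi_from_transpose[i][j] - Phi_bayes[i][j])
                     for i in range(n) for j in range(n))
```

The two constructions agree entry by entry, so any vertex decomposition of Λ_Φ yields, after transposing each vertex, a decomposition of the Bayes inspired reverse with the same coefficients.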

Optimal state retrieval
Principles (1-4) do not select a unique retrieval map, but rather a whole family of transformations. After specifying one more requirement, we provide a maximisation principle that singles out a unique optimal state retrieval map Φ̃_O. To this end, consider a stochastic map from a space into itself. These types of maps are contracting: the volume of their image cannot exceed the one of their domain. The composite transformation Φ̃Φ falls into this class. Intuitively, it can be argued that the optimal state retrieval should maximise the volume of the image of Φ̃Φ.
Similar considerations lead us to impose one more requirement on Φ̃. Notice, in fact, that any negative or complex eigenvalue in the spectrum of Φ̃Φ corresponds to a reflection or a rotation of the domain, which would increase the statistical distance between a state and its evolved version. For this reason, we impose the principle:
5. The map Φ̃ is a state retrieval map if all the eigenvalues of Φ̃Φ are non-negative.
It should be noticed that the Bayes inspired reverse still falls in this class of transformations. In fact, one can rewrite Φ̃_B Φ as:

$$\tilde{\Phi}_B \Phi = J_\pi \left[\Phi^T J_{\Phi\pi}^{-1}\, \Phi\right].$$

Both the matrix in the square parenthesis and J_π are positive semidefinite. The product of two positive semidefinite matrices has a real, non-negative spectrum, so the Bayes inspired reverse satisfies principle (5).
Despite the fact that this requirement might appear to be a strong restriction on the class of possible maps, it is still not sufficient to single out a unique transformation. For this reason, we define the optimal retrieval map by the following:
Principle. The optimal retrieval map is defined to be the Φ̃_O that maximises the determinant of Φ̃_O Φ under the constraints (1-5).

Figure 1: Relative entropy between a distribution and its evolution forwards and backwards. In the first two plots we consider probability vectors ρ = [ρ, 1 − ρ] of length two, while in the third plot we consider probability vectors ρ = [ρ1, ρ2, 1 − ρ1 − ρ2] of length three. In all of the plots the map Φ and the prior distribution π are chosen at random. As can be seen, the optimal map Φ̃_O outperforms the Bayes retrodiction Φ̃_B in retrieving the original distribution in the whole space.

In this case, then, the transformation R assigns to {λ^(Φ)_k} the coefficients {λ^(Φ̃_O)_k} of the map maximising the determinant. In section 3 we provide an efficient algorithm to construct R, which also proves the uniqueness of the solution. Before passing to that, we provide in the next section an analytic justification of the principle just presented.
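The positivity argument for the Bayes inspired reverse can be verified numerically without an eigenvalue solver. The sketch below is an illustration under assumed values: it symmetrises the round trip by the similarity transformation J_π^{-1/2} (·) J_π^{1/2} and checks that the result equals a Gram matrix BᵀB with B = J_{Φπ}^{-1/2} Φ J_π^{1/2}, which is positive semidefinite by construction. The matrix, the prior and the name `B` are choices made for this example.

```python
import math

# Minimal sketch: the Bayes round trip is similar to a Gram matrix B^T B,
# hence its eigenvalues are real and non-negative.

Phi = [[0.6, 0.1, 0.3],
       [0.3, 0.7, 0.2],
       [0.1, 0.2, 0.5]]      # columns sum to one; illustrative values
prior = [0.2, 0.3, 0.5]      # illustrative prior
n = len(prior)

sigma = [sum(Phi[i][j] * prior[j] for j in range(n)) for i in range(n)]
Phi_B = [[prior[i] * Phi[j][i] / sigma[j] for j in range(n)] for i in range(n)]
round_trip = [[sum(Phi_B[i][k] * Phi[k][j] for k in range(n))
               for j in range(n)] for i in range(n)]

# Similarity transform: Gamma = J_pi^{-1/2} (round trip) J_pi^{1/2}
Gamma = [[round_trip[i][j] * math.sqrt(prior[j] / prior[i])
          for j in range(n)] for i in range(n)]

# B = J_sigma^{-1/2} Phi J_pi^{1/2} and its Gram matrix B^T B
B = [[Phi[i][j] * math.sqrt(prior[j] / sigma[i]) for j in range(n)] for i in range(n)]
Gram = [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

max_difference = max(abs(Gamma[i][j] - Gram[i][j]) for i in range(n) for j in range(n))
asymmetry = max(abs(Gamma[i][j] - Gamma[j][i]) for i in range(n) for j in range(n))
```

Since a similarity transformation preserves the spectrum, the eigenvalues of the round trip coincide with those of the Gram matrix and are therefore non-negative.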

Quality of the retrieval
Beyond the intuitive necessity of having the image of the retrieval map as big as possible, the principle of the determinant maximisation can be justified more rigorously. We present here some arguments explaining why the optimal retrieval map should be the one that maximises the determinant. Consider, as a first example, the average relative entropy between the original distribution and the one evolved forward and back:

$$\left\langle D\big(\rho \,\|\, \tilde{\Phi}\Phi(\rho)\big) \right\rangle_{\rho \in S},$$

where we indicate by S the space of states. Thanks to the properties of the relative entropy, this average is always non-negative, while it is zero if and only if Φ̃Φ(ρ) = ρ for every ρ, implying that Φ̃Φ ≡ I. We show in Appendix A that for invertible Φ̃Φ (notice that non-invertible maps form a set of measure zero, as they are not stable under arbitrarily small perturbations) this quantity satisfies a chain of inequalities involving the determinant of Φ̃Φ, a numerical constant K independent of Φ̃, and the average Shannon entropy ⟨S(ρ)⟩. This chain of inequalities gives an idea about why maximising the determinant also minimises the average relative entropy between the initial state and the retrieved one. A more precise argument follows from the observation that in order to optimise the quality of the retrieval we have to make Φ̃Φ as similar as possible to the identity transformation. Since both Φ̃Φ and I are positive semidefinite matrices, the relative entropy between the two is well defined and takes the form:

$$D\big(I \,\|\, \tilde{\Phi}\Phi\big) = \mathrm{Tr}\big[\, I \,(\log I - \log \tilde{\Phi}\Phi)\big] = -\mathrm{Tr}\big[\log \tilde{\Phi}\Phi\big] = -\log\det\big(\tilde{\Phi}\Phi\big),$$

where we used the well known matrix identity Tr[log A] = log det A. Minimising this relative entropy is then equivalent to the maximisation of the determinant of Φ̃Φ. This argument gives a theoretical foundation to the optimisation principle stated in the previous section. Moreover, we also show in Appendix B that the determinant bounds the ability of retrieving any state close to the prior. The corresponding bound is stated for an arbitrary perturbation δρ of the prior state such that |δρ| ≪ 1, and holds up to order O(|δρ|²).
Finally, using similar arguments, we are also able to prove a further inequality (Appendix B), through which the determinant can also be used to bound the maximum rate at which any two states become indistinguishable (the quantity in Eq. (19)). This is a well known quantifier of how much information is lost during the evolution Φ̃Φ [29].
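The identity Tr[log A] = log det A used above can be spelled out in one line. The following is a short worked derivation, assuming that A = Φ̃Φ is diagonalisable with strictly positive eigenvalues a_i (an assumption consistent with principle (5) plus invertibility):

```latex
\begin{align}
  \mathrm{Tr}[\log A]
    &= \sum_i \log a_i
     = \log \prod_i a_i
     = \log \det A , \\
  D(I \,\|\, A)
    &= \mathrm{Tr}\big[\, I\,(\log I - \log A)\big]
     = -\mathrm{Tr}[\log A]
     = -\log\det A \;\geq\; 0 .
\end{align}
```

The last inequality uses the fact that a stochastic matrix has all eigenvalues of modulus at most one, so det A ≤ 1, with equality only when every a_i equals one.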

Optimal State Retrieval
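We propose here an efficient algorithm to solve the maximisation of the determinant of Φ̃Φ by reducing it to the problem of analytic centering. This can be expressed as follows: take a symmetric matrix G[x] linearly dependent on some real scalars {x_i} from a convex set A. The analytic centering problem corresponds to the minimisation:

$$\min_{x \in A}\; \big(-\log\det G[x]\big) \quad \text{subject to } G[x] > 0. \tag{20}$$

This kind of problem can be efficiently solved on a computer [30,31]. Moreover, assuming that the set of x for which G[x] > 0 is non-empty, and that the functional we are minimising in Eq. (20) is strictly convex, the solution is unique. We can now prove the reduction. First, it should be noticed that Φ̃Φ is not symmetric in general, so the algorithm for the analytic centering cannot be directly applied. Define then the matrix:

$$\Gamma[\lambda^{(\tilde{\Phi})}] := J_\pi^{-1/2}\, \tilde{\Phi}\Phi\, J_\pi^{1/2}.$$

It should be noticed that principle (4) can be rewritten in matrix form as:

$$\tilde{\Phi}\Phi\, J_\pi = J_\pi\, (\tilde{\Phi}\Phi)^T, \tag{23}$$

from which it follows that Γ[λ^(Φ̃)] is symmetric. Indeed, the following holds:

$$\Gamma^T = J_\pi^{1/2}\, (\tilde{\Phi}\Phi)^T J_\pi^{-1/2} = J_\pi^{-1/2}\, \tilde{\Phi}\Phi\, J_\pi^{1/2} = \Gamma.$$

Moreover, thanks to the properties of the determinant we also have that:

$$\det \Gamma[\lambda^{(\tilde{\Phi})}] = \det\big(\tilde{\Phi}\Phi\big).$$

In fact, since Γ[λ^(Φ̃)] and Φ̃Φ are related by a similarity transformation, they actually share the same spectrum.
This implies that the following optimisations are equivalent:

$$\max_{\tilde{\Phi}}\, \det\big(\tilde{\Phi}\Phi\big) \;\Longleftrightarrow\; \max_{\lambda^{(\tilde{\Phi})}}\, \det \Gamma[\lambda^{(\tilde{\Phi})}] \;\Longleftrightarrow\; \min_{\lambda^{(\tilde{\Phi})}}\, \big(-\log\det \Gamma[\lambda^{(\tilde{\Phi})}]\big).$$

The last problem is the analytic centering for Γ[λ^(Φ̃)], which can be solved efficiently by means of convex optimisation. This concludes the reduction.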
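For two-level systems the optimisation can be illustrated without a convex solver. The sketch below is not the analytic centering algorithm of the text: it is a brute-force grid scan, under the assumption of an illustrative 2×2 map and prior, over the one-parameter family of matrices in U(π, Φπ), labelled here by their top-left entry `a` (a hypothetical parametrisation introduced for this example). In two dimensions every stochastic matrix is detailed balanced with respect to its stationary state, and the eigenvalues of the round trip are 1 and its determinant, so principles (4) and (5) reduce to requiring a non-negative determinant.

```python
# Minimal sketch: brute-force maximisation of det(retrieval o Phi) for a
# 2x2 example, as a stand-in for the convex optimisation described above.

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def retrieval_from_a(a, prior, sigma):
    """Element of U(prior, sigma) with top-left entry a, divided by sigma.

    The matrix Lam has column sums sigma and row sums prior, so that
    Lam J_sigma^{-1} is left stochastic and maps sigma back to prior.
    """
    Lam = [[a, prior[0] - a],
           [sigma[0] - a, sigma[1] - prior[0] + a]]
    return [[Lam[i][j] / sigma[j] for j in range(2)] for i in range(2)]

Phi = [[0.9, 0.3],
       [0.1, 0.7]]      # illustrative values (assumption)
prior = [0.5, 0.5]      # illustrative prior (assumption)
sigma = [Phi[i][0] * prior[0] + Phi[i][1] * prior[1] for i in range(2)]

# Range of a for which all entries of Lam are non-negative
a_min = max(0.0, prior[0] - sigma[1])
a_max = min(prior[0], sigma[0])

steps = 400
best_a, best_det = None, -1.0
for k in range(steps + 1):
    t = k / steps
    a = (1.0 - t) * a_min + t * a_max
    d = det2(retrieval_from_a(a, prior, sigma)) * det2(Phi)  # det of round trip
    if d >= 0.0 and d > best_det:   # principle (5): non-negative spectrum
        best_a, best_det = a, d

# The Bayes inspired reverse corresponds to a = Phi[0][0] * prior[0]
a_bayes = Phi[0][0] * prior[0]
det_bayes = det2(retrieval_from_a(a_bayes, prior, sigma)) * det2(Phi)
```

For these values the scan selects the endpoint `a = a_max`, a vertex of the polytope, with a round-trip determinant of 0.5 against 0.375 for the Bayes inspired reverse. In higher dimensions a grid scan is impractical and a dedicated convex optimisation routine, as referenced in the text, would be needed.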
From the implementation of this algorithm, we obtained numerical evidence that the state retrieval so defined outperforms the Bayes inspired reverse not only on average, but at the single state level. To illustrate this, in Figure 1 we plot the relative entropy of recovery D(ρ ‖ Φ̃Φ(ρ)) for every state in the domain. The results presented corroborate the intuition that the retrieval map obtained by maximising the determinant of Φ̃Φ is indeed better than the usual approach in the literature, i.e., Bayesian retrodiction.
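A state-by-state comparison of this kind can be sketched for a single 2×2 instance. The values below are illustrative and do not reproduce the random maps of Figure 1; the determinant-maximising candidate is taken at the polytope vertex with the largest top-left entry, which assumes det Φ > 0 for the chosen example.

```python
import math

# Minimal sketch: relative entropy of recovery, Bayes reverse versus a
# determinant-maximising candidate, on a grid of two-level distributions.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

Phi = [[0.9, 0.3],
       [0.1, 0.7]]      # illustrative values (assumption)
prior = [0.5, 0.5]      # illustrative prior (assumption)
sigma = matvec(Phi, prior)

# Bayes inspired reverse
Phi_B = [[prior[i] * Phi[j][i] / sigma[j] for j in range(2)] for i in range(2)]

# Candidate optimal retrieval: vertex of U(prior, sigma) (assumes det Phi > 0)
a = min(prior[0], sigma[0])
Lam = [[a, prior[0] - a], [sigma[0] - a, sigma[1] - prior[0] + a]]
Phi_O = [[Lam[i][j] / sigma[j] for j in range(2)] for i in range(2)]

round_B = matmul(Phi_B, Phi)
round_O = matmul(Phi_O, Phi)

grid = [k / 100 for k in range(1, 100)]
D_bayes = [relative_entropy([r, 1 - r], matvec(round_B, [r, 1 - r])) for r in grid]
D_optimal = [relative_entropy([r, 1 - r], matvec(round_O, [r, 1 - r])) for r in grid]
```

In this instance both round trips mix the input with the prior, and the candidate optimal map mixes less, so `D_optimal` does not exceed `D_bayes` at any grid point.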

Quantum retrieval map
The problem of identifying a state retrieval map for quantum dynamics is more subtle than its classical counterpart. The Bayes reversion, whose derivation depends on the existence of the joint probability of different observables, has notoriously proven difficult to extend to the quantum regime (see section 5 for a short review). For this reason, a reconstruction of a state retrieval map from physical principles is particularly suited to extend the concept of state recovery from the classical regime to quantum dynamics.
Consider a completely positive and trace preserving (CPTP) map Φ. The basic principles we require a retrieval map to satisfy are the following:
1. The state retrieval is a CPTP map;
2. If the map Φ is unitary, the state retrieval transformation is given by Φ̃ := Φ⁻¹;
3. The prior state should be perfectly retrieved, i.e., Φ̃(Φ(π)) = π.
These three principles already suffice to give a parametrisation of the recovery maps analogous to the one in Eq. (3).

Parametrisation of CPTP maps with a given transition
Given a CPTP map Ψ with a fixed transition Ψ(π) = σ we can decompose it as:

$$\Psi = \Lambda_\Psi\, J_\pi^{-1},$$

where J_π is a completely positive generalisation of the multiplication by π, defined as J_π(ρ) := √π ρ √π, and Λ_Ψ is given by Λ_Ψ = Ψ J_π. Since both Ψ and J_π are CP, Λ_Ψ is CP as well. Moreover, since Ψ is trace preserving it follows that:

$$\Lambda_\Psi^\dagger(\mathbb{1}) = \pi, \tag{31}$$

because the trace preserving condition is equivalent to Ψ†(1) = 1, and J_π is self-adjoint with J_π(1) = π. From the fixed transition it also follows that:

$$\Lambda_\Psi(\mathbb{1}) = \Psi(\pi) = \sigma. \tag{32}$$

In this way, similarly to what happens for the classical case, a quantum channel is uniquely identified by a map Λ_Ψ ∈ U_Q(σ, π), the space of CP transformations that map the identity to σ, and whose adjoint maps the identity to π. This set is convex. Its extreme points can be characterised by a condition on their Kraus operators [32,33]. Differently from the classical case, though, the set U_Q(σ, π) contains a non-trivial symmetry (that is, not reducible to a relabeling): consider the two unitary maps U_π and V_σ, defined by U_π[ρ] := U ρ U†, and satisfying U_π[π] = π (and analogously for V_σ, with V_σ[σ] = σ). Then Eq. (31) and Eq. (32) are invariant under the transformation:

$$\Lambda_\Psi \;\longrightarrow\; V_\sigma\, \Lambda_\Psi\, U_\pi. \tag{33}$$

Hence, every Λ_Ψ contained in U_Q(σ, π) is part of an invariant family connected by the unitary transformations defined in Eq. (33).
The space U_Q(σ, π) can also be characterised in terms of the Choi operator C(Λ_Ψ) of the maps Λ_Ψ contained in it. In particular, we show in Appendix D how this naturally translates to a characterisation of U_Q(σ, π) in terms of a marginal problem, leading to a set of linear inequalities that constrain the spectrum of the Choi matrices therein.
In order to extend principle (4) to the quantum regime we generalise its matrix expression (see Eq. (23)) as follows:
4. The channel Φ̃Φ satisfies the equation

$$\tilde{\Phi}\Phi\, J_\pi = J_\pi\, (\tilde{\Phi}\Phi)^\dagger \tag{34}$$

with respect to the prior π.
This expression is equivalent to a weak form of detailed balance for quantum evolutions [29,34]. In particular, it should be noticed that this principle coincides with the usual version of detailed balance for classical evolutions, as can also be understood from the fact that for commuting states the superoperator J_π reduces to the classical multiplication by π.
Finally, the last requirement can be translated to:
5. All the eigenvalues of the channel Φ̃Φ are non-negative.
It should be noticed that the spectrum of a CP-map is the same as the one of the corresponding vectorised version [35].

Petz' map
We can now proceed to define a map analogous to the Bayes inspired reverse for quantum systems. First, it is clear from Eq. (31) and Eq. (32) that for any generic Λ_Ψ in U_Q(π, Φπ), (Λ_Ψ)† ∈ U_Q(Φπ, π), so there is a one to one correspondence between the two sets, given by the adjoint transformation. Moreover, the CPTP map Φ can be written as:

$$\Phi = \Lambda_\Phi\, J_\pi^{-1},$$

where Λ_Φ ∈ U_Q(Φπ, π). By inspecting Eq. (10), one can see that for classical systems the Bayes retrodiction is obtained by choosing Λ_Φ̃ := (Λ_Φ)^T. In complete analogy we define:

$$\tilde{\Phi}_P := (\Lambda_\Phi)^\dagger\, J_{\Phi\pi}^{-1} = J_\pi\, \Phi^\dagger\, J_{\Phi\pi}^{-1}, \tag{36}$$

where on the right hand side one can read the definition of the Petz recovery map, commonly used as a quantum extension of the Bayes rule [1][2][3][4]36]. This argument gives yet another derivation justifying this identification.
It is easy to show that the Petz recovery map satisfies all the desiderata of a state retrieval map. In particular one notices that the Petz recovery map satisfies principles (4) and (5) by rewriting the round trip as:

$$\tilde{\Phi}_P \Phi = J_\pi \left[\Phi^\dagger J_{\Phi\pi}^{-1}\, \Phi\right],$$

and by using similar arguments as the ones for the classical case.

Figure 2: Trace distance between a distribution and its evolution forwards and backwards for Φ = ∆_η for a qubit, using as prior distribution π = 1/2. The states are parametrised as ρ = (1 + xσx + yσy)/2, corresponding to the disk at the equator of the Bloch sphere. It can be seen how the optimal state retrieval map outperforms the Petz recovery on all states. It should be pointed out that the relative entropy presents the same feature, but the trace distance makes the plot more understandable.
This discussion shows that not only can the approach presented here be useful to clarify the basic requirements for a quantum state retrieval map, but it can also help in highlighting the correspondence between the classical and the quantum scenario.
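The Petz recovery map can be evaluated explicitly in a simple case. The sketch below uses a qubit amplitude damping channel with a diagonal prior, an example chosen here for convenience and not discussed in the text; with this choice Φ(π) is diagonal, so the matrix square roots reduce to entry-wise powers, and all matrices are real, so the adjoint of a Kraus operator reduces to its transpose.

```python
import math

# Minimal sketch: Petz map rho -> sqrt(pi) Phi^dagger(s^{-1/2} rho s^{-1/2}) sqrt(pi),
# with s = Phi(pi), for amplitude damping with a diagonal prior (assumption).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

gamma = 0.3   # damping strength, illustrative value
kraus = [[[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]],
         [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]

def channel(rho):
    """Phi(rho) = sum_k K rho K^T (real Kraus operators)."""
    terms = [matmul(matmul(K, rho), transpose(K)) for K in kraus]
    return add(terms[0], terms[1])

def adjoint(X):
    """Phi^dagger(X) = sum_k K^T X K."""
    terms = [matmul(matmul(transpose(K), X), K) for K in kraus]
    return add(terms[0], terms[1])

def diag_power(D, p):
    """Entry-wise power of a diagonal 2x2 matrix."""
    return [[D[0][0] ** p, 0.0], [0.0, D[1][1] ** p]]

prior = [[0.7, 0.0], [0.0, 0.3]]   # diagonal prior, illustrative values
sigma = channel(prior)             # diagonal for this channel and prior

def petz(rho):
    s = diag_power(sigma, -0.5)
    r = diag_power(prior, 0.5)
    inner = adjoint(matmul(matmul(s, rho), s))
    return matmul(matmul(r, inner), r)

rho = [[0.6, 0.2], [0.2, 0.4]]     # generic test state
retrieved_prior = petz(sigma)      # principle 3: should equal the prior
recovered = petz(rho)
trace_recovered = recovered[0][0] + recovered[1][1]
```

The output `retrieved_prior` coincides with `prior` and the trace of `recovered` is one, in agreement with principles (1) and (3). For non-diagonal priors a general matrix square root would be required, which this sketch deliberately avoids.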

Optimal state retrieval: case studies
In complete analogy with the classical case we define the optimal retrieval map to be the one satisfying the following
Principle. The optimal retrieval map is defined to be the Φ̃_O that maximises the determinant of Φ̃_O Φ under the constraints (1-5).
The use of the volume as a significant quantity in the study of quantum channels has been already explored in relevant works such as [37,38]. It is not immediately clear how one could devise a parametrisation to explore the whole space U_Q(σ, π). Moreover, the symmetry expressed by Eq. (33) makes designing a maximisation algorithm more involved. For this reason, we limit ourselves here to the treatment of analytically solvable cases.
In particular, consider the depolarising channel given by:

$$\Delta_\eta(\rho) = (1 - \eta)\,\rho + \eta\, \mathrm{Tr}[\rho]\, \frac{\mathbb{1}}{d},$$

where η is a scalar parameter in [0, 1 + (d² − 1)⁻¹] and d is the dimension of the quantum system in consideration.
Choosing 1/d to be the prior state, all the calculations can be carried out analytically.

Figure 3: Trace distance between a distribution and its evolution forwards and backwards for the evolution Φ specified in Eq. (46) and states of the form ρ = ((1 + xσx + yσy)/2) ⊗ γ_β and prior state π = γ_β ⊗ γ_β.

First, we compute the Petz recovery map in this case. The prior state is invariant under the transformation, ∆_η(1/d) = 1/d, so the only superoperator needed is J_{1/d}. This is simply given by J_{1/d} = I/d, where we used a different notation for the identity superoperator I and the state 1/d. Finally, we can compute the adjoint of the depolarising channel from the series of equations:

$$\mathrm{Tr}\big[A\, \Delta_\eta(B)\big] = (1 - \eta)\, \mathrm{Tr}[A B] + \frac{\eta}{d}\, \mathrm{Tr}[A]\, \mathrm{Tr}[B] = \mathrm{Tr}\big[\Delta_\eta(A)\, B\big],$$

implying that ∆_η† = ∆_η. Hence, by using the definition in Eq. (36) we obtain that the Petz recovery map for the depolarising channel is given by:

$$\tilde{\Phi}_P = J_{1/d}\, \Delta_\eta^\dagger\, J_{1/d}^{-1} = \Delta_\eta,$$

that is, by the depolarising channel itself. This was somehow expected, for, as already observed in [4], the Bayes inspired reverse channel computed considering as prior a fixed point of the channel is the channel itself.
We can now pass to compute the optimal state retrieval. There are two remarks that need to be made beforehand: first, it should be noticed that the constraint in Eq. (34) is satisfied at the level of the map itself, that is

$$\Delta_\eta\, J_{1/d} = J_{1/d}\, \Delta_\eta^\dagger.$$

Moreover, the spectrum of ∆_η is real and positive, as can be understood by decomposing it on any basis of the Hermitian operators. These two observations together imply that

$$\tilde{\Phi}_O = I.$$

In fact, this map always maximises the determinant of Φ̃_O Φ, since any other CPTP map will contract the volume of the phase space. Usually, though, it is ruled out by the requirements imposed by principles (4) and (5). The generality of these considerations directly leads to the following:
Theorem. Whenever a transformation Φ has positive spectrum and is detailed balanced with respect to the prior state (meaning that Φ J_π = J_π Φ†), the optimal state retrieval is given by the identity map.
The theorem, means that in this case the optimal strategy is to leave the system unperturbed. It should be noticed that under the same assumptions the Petz recovery map is given by the map itself,Φ P = Φ, so that applying it leads to a further deterioration of the information on the initial state. This shows how our definition of optimal retrieval is more suited in the task of recovering a state after a transformation. The difference in performance between the Petz recovery map and the optimal state retrieval is shown in Fig. 2.
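The comparison between the two strategies for the depolarising channel can be sketched on the Bloch disk. The code below assumes the qubit parametrisation ∆_η(ρ) = (1 − η)ρ + η·1/2, under which the channel simply rescales the Bloch vector by (1 − η), and uses the fact that for qubits the trace distance equals half the Euclidean distance between Bloch vectors. The value of η and the grid are illustrative choices.

```python
import math

# Minimal sketch: trace distance after the round trip for the qubit
# depolarising channel, Petz recovery (channel applied twice) versus
# the identity retrieval (channel applied once).

def depolarise_bloch(r, eta):
    """Action on the Bloch vector, assuming Delta(rho) = (1-eta) rho + eta 1/2."""
    return [(1.0 - eta) * c for c in r]

def trace_distance_bloch(r, s):
    """Trace distance between two qubit states given their Bloch vectors."""
    return 0.5 * math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))

eta = 0.4   # illustrative value

# States on the equatorial disk, rho = (1 + x sigma_x + y sigma_y) / 2
states = [[x / 10, y / 10, 0.0]
          for x in range(-10, 11) for y in range(-10, 11)
          if x * x + y * y <= 100]

# Petz recovery equals the channel itself, so the round trip applies it twice
distance_petz = [trace_distance_bloch(r, depolarise_bloch(depolarise_bloch(r, eta), eta))
                 for r in states]
# Identity retrieval: the round trip is the channel applied once
distance_identity = [trace_distance_bloch(r, depolarise_bloch(r, eta))
                     for r in states]
```

For every state on the disk the identity retrieval gives a trace distance of |r|η/2, while the Petz round trip gives |r|(1 − (1 − η)²)/2, which is larger for 0 < η < 1.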
A crucial simplification in the study of the depolarising channel is that it is a unital channel, so that one can use $1/d$ as a prior state, leading to $J_{1/d} = \mathcal{I}/d$. In the following we show that one can obtain analytical insights even in the case of non-unital maps. In Fig. 3 we compare the performance of the optimal map and the Petz map for the two-qubit channel defined by: where $\theta_\lambda$ is the thermalising channel defined by: and $\gamma_\beta = e^{-\beta H}/\mathrm{Tr}[e^{-\beta H}]$ is the Gibbs state associated to the Hamiltonian $H := |1\rangle\langle 1|$. Then, a simple calculation shows that the Petz map coincides with the original channel, i.e., $\tilde{\Phi}_P = \Phi$. On the other hand, the map maximising the determinant (under the constraints of principles (1-5)) is given by $\tilde{\Phi}_O = \mathrm{SWAP}$, the swap operator.
Finally, in Figure 4 we highlight part of the rationale for the criteria characterising the optimal retrieval map. In order to do so we study the emblematic example of a map Φ obtained from the composition of a translation and a compression in the Bloch sphere. From panel (b) it is evident how the optimal retrieval map corresponds to the map that minimises the compression of the domain while recovering the desired prior π.

Quality of the retrieval
As we did for stochastic maps, we present here some analytical arguments suggesting that optimising the determinant indeed leads to a better quality of retrieval.
First, it should be noticed that for quantum channels Eq. (17) applies without modifications, so the same arguments presented above in this regard can also be applied to quantum dynamics.
The generalisation of Eq. (18-19), instead, needs a bit more care. First, we introduce the following contrast function: This quantity is positive, zero if and only if $\rho \equiv \sigma$, and can be regarded as akin to the Kullback-Leibler relative entropy. It is connected to the super-operator $J_\rho$ thanks to the following expansion for close-by states: […] The proof of these inequalities is completely analogous to the one for the classical case and is presented in Appendix B. The main difference with the classical case is that here we cannot consider the Umegaki relative entropy, unless we demand a stronger version of principle (4), but this seems unnecessary for the situation at hand (see Appendix B for more details).

Bayes reversion from physical principles
Figure 4 (caption, partially recovered): […] With the red arrow we highlight the specific transition $\pi = 1/2 \to \Phi(\pi)$. In panel (b) we plot the action on the Bloch sphere of the optimal ($\tilde{\Phi}_O$) and Petz ($\tilde{\Phi}_P$) retrieval maps in purple and green, respectively. In both cases the chosen prior is $\pi = 1/2$. With the red arrow we highlight how both $\tilde{\Phi}_O$ and $\tilde{\Phi}_B$ map the state $\Phi(\pi)$ to $\tilde{\Phi}_O(\Phi(\pi)) = \tilde{\Phi}_B(\Phi(\pi)) = \pi$. The Bloch sphere is compressed to a much smaller image by the action of the Petz map compared with the optimal retrieval map. Panel (b) helps visualise part of the rationale for the criteria characterising the optimal retrieval map. In this case, where the map $\Phi$ is simply a translation composed with a compression, the optimal retrieval map is also the composition of a compression and a translation specified as follows: the translation is the one recovering the desired prior $\pi$ (the red arrow in the picture) and the compression is the minimal one making $\tilde{\Phi}_O$ physical (i.e., so that the image of $\tilde{\Phi}_O$ is contained in the Bloch sphere). In panel (c) we plot the action on the Bloch sphere of the forth-and-back maps $\tilde{\Phi}_O\Phi$ and $\tilde{\Phi}_P\Phi$ (with the same colour scheme as before). The prior $\pi$ is the fixed point of the forth-and-back maps, and the different magnitude in the compression of the Bloch sphere through $\tilde{\Phi}_O\Phi$ and $\tilde{\Phi}_P\Phi$ is evident. The choice of the prior $\pi = 1/2$ makes $\tilde{\Phi}\Phi$ unital. This allows us to use the parametrisation given in [39] to explore the whole space of possible state retrieval maps.

The classic derivation of the Bayes inspired reverse channel comes directly from fundamental theorems of probability theory. In fact, since the intersection of two sets A and B is commutative, we have $P(A \cap B) = P(B \cap A)$, so by using the rule of conditional probability (or the axiom of conditional probability following de Finetti [40]) one easily obtains Bayes' theorem. This derivation heavily relies on the notion of
commutativity for the operation of composing probabilities, which is unavailable when trying to extend the construction of a reverse channel from classical to quantum probabilities. In fact, the non-commutative structure at the basis of quantum theory makes the assignment of a compound probability for a general pair of quantum events problematic. In order to obtain a quantum extension of Bayes inspired reversion a different approach is needed, and many attempts already exist. Among the most recent ones we mention two. The first obtains the classical Bayes inspired reverse from entropy maximisation methods; an overview of this topic is given in [16]. This approach has been extended to the quantum case in [11,13,17,18]. Here the Bayes inspired reverse is mainly treated as a tool for inference problems and its physical relevance is somewhat set aside.
The second recent and promising approach starts from a definition of the Bayes inspired reverse in the language of category theory. Thanks to its generality, this approach extends naturally to the quantum scenario, as shown in [10], where the authors characterise the Bayes inspired reverse in terms of commuting diagrams and show its meaning both in classical and quantum probability. Similar approaches can be found in [19][20][21][22]41].
In this section, motivated by the results presented so far, we explore the possibility of reconstructing the Bayes inspired reverse starting from a few physical principles. If this were possible, the extension from classical to quantum probability would follow naturally, as was shown in the previous section.
The five requirements presented thus far only identify a family of state retrieval maps which includes the Bayes inspired reverse as a particular case. We can then try to add a further requirement to see if this singles out the Bayes inspired reverse map in the classical case. A particularly natural choice is the following:

6. The reversion procedure is involutive, that is $\tilde{\tilde{\Phi}} = \Phi$.
As we argued in the introduction, we call the state retrieval maps that satisfy this principle reverse maps. Principle (6) implies that $R^2 = I$, which heavily constrains the freedom in the choice of the reversion procedure R. In the next section we present some evidence that allows us to conjecture that requirement (6) is strong enough to single out the identity transformation (corresponding to the Bayes inspired reverse), at least in the case in which R is linear and solely depends on the unordered pair of states of the fixed transition.
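That the Bayes inspired reverse itself satisfies principle (6) can be illustrated with a short numerical sketch. It assumes the usual coordinate form of the classical Bayes inspired reverse, $(\tilde{\Phi}_B)_{j,i} = \Phi_{i,j}\,\pi_j / (\Phi\pi)_i$, and it assumes that the reverse of the reverse is computed with respect to the evolved prior $\Phi\pi$. The matrix `phi`, the vector `prior` and the function name `bayes_reverse` are hypothetical choices made for illustration.

```python
import numpy as np

def bayes_reverse(phi, prior):
    """Bayes inspired reverse of a left-stochastic matrix (columns sum to 1).

    Entry [j, i] equals phi[i, j] * prior[j] / (phi @ prior)[i].
    """
    posterior = phi @ prior
    return (phi * prior[None, :]).T / posterior[None, :]

# Hypothetical left-stochastic map and prior, chosen only for illustration.
phi = np.array([[0.90, 0.2, 0.1],
                [0.05, 0.7, 0.3],
                [0.05, 0.1, 0.6]])
prior = np.array([0.5, 0.3, 0.2])

reverse = bayes_reverse(phi, prior)
# Reverse of the reverse, taken with respect to the evolved prior phi @ prior.
double_reverse = bayes_reverse(reverse, phi @ prior)

retrieved_prior = reverse @ (phi @ prior)
```

In this example `double_reverse` reproduces `phi` and `retrieved_prior` reproduces `prior`, in line with principles (3) and (6).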

Characterisation of R
As shown in Section 2.4, Bayes' reversion corresponds to choosing the transformation R to be the identity on the space of coefficients. We are thus interested in knowing if principles (1-6) are enough to ensure that R = I, at least in the case in which R is a matrix.

Figure 5 (caption, partially recovered): […] $\{V^{(i)}_{\Phi\pi|\pi}\}_i$ of U(Φπ, π) using the algorithm of Jurkat and Ryser [28]. From the vertices we can compute the matrices $X_{i,j}$ and $Y_{i,j}$ and check if they are PSD. In the left and central plots of figure (a) we use an orange square to denote a PSD matrix and a white square to denote a matrix that is not PSD. We note that only the matrices $\{X_{i,i}\}_i$ and $\{Y_{i,i}\}_i$ are PSD; thus, in this case, R is not allowed to be any permutation different from the identity. In the rightmost plot of figure […]

We order the vertices of U(Φπ, π) in the following way: any vertex that corresponds to a permutation is moved to the beginning of the list $\{V^{(i)}_{\Phi\pi|\pi}\}_{i=1,\dots,n}$. Say there are $\ell$ of those. Then, we have the following

Observation. R is the direct sum of the identity matrix acting on the first $\ell$ sites and a permutation matrix with cycles of maximal length 2 acting on sites $\ell + 1, \dots, n$.

Proof. Thanks to the structure of U(Φπ, π), one can interpret the coefficients $\{\lambda^{(\Phi)}_k\}$ as a probability vector. Thus R must map probability distributions into probability distributions, meaning that R is a stochastic matrix. Moreover, principle (6) implies $R^2 = I$, meaning that R is invertible and coincides with its inverse. Since the only stochastic matrices whose inverse is also stochastic are permutations, R must be a permutation, and the involutive principle further implies that its cycles have length at most 2. We can now focus on the action of R on the first $\ell$ indices. From principle (2) we know that permutations must be mapped into their inverse, that is $U \to U^T$. Thanks to the relation between the vertices of U(Φπ, π) and U(π, Φπ), this corresponds to R acting as the identity on the first $\ell$ elements of $\{\lambda^{(\Phi)}_k\}$.

Since R is a stochastic matrix, it is sufficient to study its action on the vertices of the simplex of the probability vectors $\{\lambda^{(\Phi)}_k\}$. In particular we need to check if there is any permutation of two vertices of this simplex that is admissible other than the identity.
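The combinatorial step in the proof (a permutation matrix with $R^2 = I$ can only contain cycles of length 1 or 2) can be illustrated by brute force for a small dimension. The sketch below is only an illustration for $n = 4$ and does not use the vertex structure of U(Φπ, π); the function names are hypothetical.

```python
import itertools
import numpy as np

def permutation_matrix(perm):
    """Matrix whose column k has its single 1 in row perm[k]."""
    n = len(perm)
    mat = np.zeros((n, n), dtype=int)
    mat[np.array(perm), np.arange(n)] = 1
    return mat

def cycle_lengths(perm):
    """Lengths of the cycles of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, k = 0, start
        while k not in seen:
            seen.add(k)
            k = perm[k]
            length += 1
        lengths.append(length)
    return lengths

n = 4
identity = np.eye(n, dtype=int)
# Keep only the permutations whose matrix squares to the identity.
involutions = [
    p for p in itertools.permutations(range(n))
    if np.array_equal(permutation_matrix(p) @ permutation_matrix(p), identity)
]
max_cycle = max(max(cycle_lengths(p)) for p in involutions)
```

For $n = 4$ the enumeration finds the identity, the six transpositions and the three double transpositions, and none of them has a cycle longer than 2.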
We focus on the action of R on single vertices. Consider in particular the case in which $\Phi := V^{(i)}_{\Phi\pi|\pi}$. From principles (4) and (5) the following matrix is positive semidefinite: At the same time, due to principle (6), if the vertex $V^{(i)}_{\Phi\pi|\pi}$ […] (4) and (5), then also implies that the matrix is positive semidefinite. Since the number of vertices is finite, it is easy to verify explicitly for which sets of indices the matrices in Eq. (51) and Eq. (52) are positive semidefinite. We verified this for many families of stochastic maps and found that the only admissible R is the identity, meaning that principle (6) seems to be enough to single out the Bayes reversion (see Figure 5). Despite this promising result, an analytical proof of this fact is still missing. In particular, we lack a characterisation of the properties of the vertices for generic U(Φπ, π). To the best of our knowledge, for an arbitrary pair (Φπ, π), it is not even possible to know the precise number of vertices of the set U(Φπ, π) without first explicitly constructing them using the algorithm of Jurkat and Ryser [28].

Conclusions
In the present work, we addressed the problem of finding an optimal strategy for the retrieval of a state after the evolution induced by a physical map. We assumed to have a full characterisation of the physical map on the system (given for classical systems in terms of a left stochastic matrix, and for quantum systems in terms of a CPTP map) and we wanted to find a physical transformation ascribable to some reverse transformation.
To this end, we postulated five physically motivated principles that all retrieval maps should satisfy: (1) they are physical; (2) on invertible maps they give the inverse; (3) they perfectly retrieve a fiducial state $\pi$; (4) the transformation $\tilde{\Phi}\Phi$ mapping forward and backwards is detailed balanced with respect to $\pi$; and (5) the eigenvalues of $\tilde{\Phi}\Phi$ are positive. We showed that both the Bayes inspired reverse, in the classical case, and the Petz recovery map, in the quantum one, satisfy all these principles.
After giving a parametrisation of the maps compatible with the requirements above, we defined a retrieval to be a transformation R associating to the pair Φ and π a state retrieval mapΦ. In this context, the map R corresponding to the Bayes inspired reverse and the Petz recovery takes a particularly simple form: namely, it corresponds to the identity on the coefficients parametrising the possible retrieval maps.
At this point, we proposed a maximisation principle to define the optimal state retrieval. This seems to outperform the Bayes inspired reverse, or the Petz recovery, both on average and at the level of the single state. We complement the numerical evidence supporting this fact with analytical intuitions about why this is the case.
Finally, in the last section of the paper, we investigated the possibility of singling out the Bayes inspired reverse among the possible state retrievals by adding a further principle. We propose as a candidate the following: (6) the retrieval of the retrieval is the original map. This principle is motivated by interpreting state retrieval as a generalisation of time inversion. Although we were not able to prove that this is enough to isolate the Bayes inspired reverse, we have strong numerical evidence supporting the claim.
Apart from settling the question of whether principle (6) is enough to isolate Bayes' reversion, there are a number of subtleties in the quantum regime that we did not explore. Primarily, there is some arbitrariness in the choice of $J_\pi$: our choice was motivated by the fact that both $J_\pi$ and its inverse are CP [42]. Unfortunately, different choices of $J_\pi$ impose inequivalent characterisations of the detailed balance in principle (4) [29,34]. For this reason, it will be interesting to study what role this choice has in the definition of reverse maps [43]. Moreover, since the concepts of retrodiction and reverse processes increasingly seem to play a fundamental role in thermodynamics [3,4,44,45], it would be interesting to study the role of the complete family of state retrievals. Finally, it is not directly clear how one could extend the algorithm for the classical scenario to the quantum case. These questions need a treatment of their own and are therefore left for future research.

Figure 6: Relative entropy between a distribution and its evolution forwards and backwards (axes: $D(\rho \,\|\, \tilde{\Phi}\Phi(\rho))$ against $\rho$). We consider probability vectors $\rho = [\rho, 1 - \rho]$. As can be seen, choosing the prior $\pi$ as the fixed point of $\tilde{\Phi}\Phi$, the optimal map $\tilde{\Phi}_O$ coincides with $\tilde{\Phi}_{are}$ and both outperform the Bayes retrodiction $\tilde{\Phi}_B$ in retrieving the original distribution in the whole space.
Again specialising to the relative entropy for classical distributions finally gives Eq. (18).
It is important to keep in mind that these computations only hold in the quantum case for the contrast function in Eq. (62). Still, if principle (4) is modified to require that $\tilde{\Phi}\Phi$ satisfies the canonical definition of detailed balance (i.e., the one given, for example, in [48]), all the steps can be generalised to any quantum contrast function. This extension is straightforward, but involves a number of technical details outside the scope of the present publication. For this reason, we defer its treatment to subsequent works [43].

C Comparison with minimisation of the average relative entropy
In Section 3 we used the relative entropy of recovery to evaluate and compare the quality of the retrieval of single states for the optimal and the Bayes inspired retrieval maps. Given a specific state $\rho$, a map $\Phi$ and a retrieval map $\tilde{\Phi}$, the quality of the retrieved state $\tilde{\Phi}\Phi(\rho)$ is higher the smaller its relative entropy of retrieval $D(\rho \,\|\, \tilde{\Phi}\Phi(\rho))$ is. In the examples of Figure 1 the optimal retrieval map outperforms the Bayes inspired retrieval map. A natural question in this context is how the optimal map compares with the retrieval map $\tilde{\Phi}_{are}$ obtained by directly minimising the average relative entropy over every input state. In Figure 6 and Figure 7 we consider the same example of Figure 1 and compute $\tilde{\Phi}_{are}$ as the stochastic map that minimises the relative entropy of recovery on average over every input state: Note that $\tilde{\Phi}_{are}$ does not depend on the choice of the prior $\pi$, but only on the map $\Phi$. In Figure 6 we see that, choosing the prior of the optimal retrieval map as the fixed point of $\tilde{\Phi}_{are}\Phi$, the optimal map $\tilde{\Phi}_O$ coincides with $\tilde{\Phi}_{are}$.

Figure 7: Relative entropy between a distribution and its evolution forwards and backwards. We consider probability vectors $\rho = [\rho, 1 - \rho]$. As can be seen, choosing as prior $\pi$ a point different from the fixed point of $\tilde{\Phi}_{are}\Phi$, the effect of the optimal map $\tilde{\Phi}_O$ differs from that of $\tilde{\Phi}_{are}$.
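The minimisation defining $\tilde{\Phi}_{are}$ can be sketched numerically for a two-level classical system. The example below is an illustration only and does not reproduce the map of Figure 1: the matrix `phi`, the uniform grid of input states used for the average, and the brute-force grid search over $2 \times 2$ left-stochastic matrices are all assumptions made here.

```python
import numpy as np

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q); infinite if q vanishes where p > 0."""
    mask = p > 0
    if np.any(q[mask] <= 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def average_recovery_entropy(retrieval, phi, states):
    """Average of D(rho || retrieval phi rho) over the given input states."""
    return float(np.mean([relative_entropy(s, retrieval @ phi @ s) for s in states]))

# Hypothetical left-stochastic map (columns sum to 1).
phi = np.array([[0.8, 0.3],
                [0.2, 0.7]])
# Assumed averaging measure: a uniform grid of states rho = [x, 1 - x].
states = [np.array([x, 1 - x]) for x in np.linspace(0.05, 0.95, 19)]
grid = np.linspace(0.0, 1.0, 41)

best_value, best_map = np.inf, None
for a in grid:
    for b in grid:
        candidate = np.array([[a, b], [1 - a, 1 - b]])   # left-stochastic by construction
        value = average_recovery_entropy(candidate, phi, states)
        if value < best_value:
            best_value, best_map = value, candidate

# Reference value: doing nothing after the channel (identity retrieval).
identity_value = average_recovery_entropy(np.eye(2), phi, states)
```

No prior enters this search, which matches the remark that $\tilde{\Phi}_{are}$ depends only on $\Phi$. A finer grid or a constrained optimiser would be needed for quantitative comparisons.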

D Constraints on the Choi state of CPTP maps with a given transition
In this section we show how to formulate the constraints in Eq. (31) and Eq. (32) in terms of the Choi state of $\Lambda_\Psi$. Consider the maximally entangled state $|\Omega\rangle := \frac{1}{\sqrt{d}} \sum_{i=1}^{d} |i\rangle_A \otimes |i\rangle_B$. The unnormalised Choi state of the map $\Psi$ is defined by the formula $C_\Psi := d\,(1_A \otimes \Psi)[|\Omega\rangle\langle\Omega|]$. Then, the application of $\Psi$ to a state $\rho$ can be equivalently expressed as $\Psi[\rho] = \mathrm{Tr}_A[(\rho^T \otimes 1)\, C_\Psi]$. Moreover, it also holds that $\Psi^\dagger[\rho] = (\mathrm{Tr}_B[(1 \otimes \rho)\, C_\Psi])^T$. Finally, the Choi-Jamiołkowski isomorphism states that a map $\Psi$ is completely positive if and only if the Choi state $C_\Psi$ is positive semidefinite. We can now characterise $U_Q(\sigma, \pi)$ in terms of the corresponding Choi states. Consider a map $\Lambda \in U_Q(\sigma, \pi)$. Eq. (31) translates to: Moreover, it also follows that Eq. (32) translates to:

$$\mathrm{Tr}_A[C_\Lambda] = \mathrm{Tr}_A[(1 \otimes 1)\, C_\Lambda] = \Lambda[1] = \sigma. \tag{84}$$
Since $C_\Lambda$ is positive semidefinite and $\mathrm{Tr}[C_\Lambda] = 1$, the set $U_Q(\sigma, \pi)$, thanks to the Choi-Jamiołkowski isomorphism, is isomorphic to the set of all the bipartite quantum states $\rho_{AB}$ compatible with the two marginals $\rho_A = \sigma$ and $\rho_B = \pi^T$. This identification allows one to constrain the spectrum of the Choi states in $U_Q(\sigma, \pi)$. In fact, one can construct a system of linear inequalities depending on the spectrum of $\rho_A$ and $\rho_B$ to constrain the spectrum of $\rho_{AB}$ [49,50]. Moreover, similarly to the classical case, one can use the spectrum of $\rho_{AB}$ to associate to a set of scalars a map in $U_Q(\sigma, \pi)$. Differently from the classical case, though, this association is not unique: in fact, the symmetry in Eq. (33) preserves the spectrum of the Choi matrix, so we can only associate to each set of scalars a unique equivalence class, but not a unique map.
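The Choi-state identities used in this appendix can be checked numerically on a small example. The sketch below assumes the computational-basis convention $C_\Psi = \sum_{i,j} |i\rangle\langle j| \otimes \Psi(|i\rangle\langle j|)$, which equals $d\,(1_A \otimes \Psi)[|\Omega\rangle\langle\Omega|]$ for the maximally entangled state in that basis. The qubit amplitude-damping channel, its parameter `g` and the helper names are hypothetical choices for illustration.

```python
import numpy as np

def choi(kraus, d):
    """Unnormalised Choi matrix sum_ij |i><j| (x) Psi(|i><j|) of a Kraus channel."""
    c = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            e = np.zeros((d, d), dtype=complex)
            e[i, j] = 1.0
            c += np.kron(e, sum(k @ e @ k.conj().T for k in kraus))
    return c

def partial_trace(m, d, traced):
    """Trace out subsystem 'A' (first factor) or 'B' (second factor)."""
    t = m.reshape(d, d, d, d)               # indices (a, b, a', b')
    return np.einsum('abac->bc', t) if traced == 'A' else np.einsum('abcb->ac', t)

d, g = 2, 0.4                               # g: hypothetical damping strength
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
         np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
c_psi = choi(kraus, d)

# Random density matrix with a fixed seed, used as a test input.
rng = np.random.default_rng(0)
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho = rho / np.trace(rho)

direct = sum(k @ rho @ k.conj().T for k in kraus)             # Psi[rho]
direct_adjoint = sum(k.conj().T @ rho @ k for k in kraus)     # Psi^dagger[rho]
via_choi = partial_trace(np.kron(rho.T, np.eye(d)) @ c_psi, d, 'A')
via_choi_adjoint = partial_trace(np.kron(np.eye(d), rho) @ c_psi, d, 'B').T

min_choi_eigenvalue = np.linalg.eigvalsh(c_psi).min()
```

In this example both expressions agree with the direct Kraus evaluation, and the Choi matrix has no negative eigenvalues, as expected for a completely positive map.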