Information recoverability of noisy quantum states

Extracting classical information from quantum systems is an essential step of many quantum algorithms. However, this information could be corrupted as the systems are prone to quantum noises, and its distortion under quantum dynamics has not been adequately investigated. In this work, we introduce a systematic framework to study how well we can retrieve information from noisy quantum states. Given a noisy quantum channel, we fully characterize the range of recoverable classical information. This condition allows a natural measure quantifying the information recoverability of a channel. Moreover, we resolve the minimum information retrieving cost, which, along with the corresponding optimal protocol, is efficiently computable by semidefinite programming. As applications, we establish the limits on the information retrieving cost for practical quantum noises and employ the corresponding protocols to mitigate errors in ground state energy estimation. Our work gives the first full characterization of information recoverability of noisy quantum states from the recoverable range to the recovering cost, revealing the ultimate limit of probabilistic error cancellation.


Background
The dynamics of a closed quantum system is described by a unitary evolution [1]. However, quantum systems are rarely closed in practice, as they unavoidably interact with the environment. Quantum channels, stemming from the unitary dynamics in a larger Hilbert space, are considered the proper mathematical formalism for describing the evolution of general quantum systems [2]. Quantum channels represent the quantum information manipulation processes and are essential to quantum computation [2][3][4][5][6][7][8].
A central subroutine of quantum computation is to extract classical information from a quantum system. The expectation value of some chosen observable, also known as shadow information, characterizes physical properties of the quantum system, making estimating expectation values the main goal of many quantum algorithms [2,9], such as variational quantum eigensolver (VQE) [10]. The importance of expectation values has motivated the study of shadow tomography and related problems [11][12][13]. However, inevitable noises modeled as quantum channels can corrupt the shadow information as they undesirably change the state of the quantum system, preventing us from estimating the expectation value accurately.
It is natural to ask how a quantum channel $\mathcal{N}$ affects the information stored in quantum states. The minimum fidelity [14][15][16] between an initial state $|\psi\rangle\langle\psi|$ and the final state $\mathcal{N}(|\psi\rangle\langle\psi|)$ is often used to answer this question. But this method is not appropriate for quantifying how well a quantum channel preserves shadow information. Take the completely phase damping channel $\mathcal{N}_{\rm PD}(\rho) = Z\rho Z$ as an example. While $F_{\min}(\mathcal{N}_{\rm PD}) = 0$, the expectation value of a diagonal observable (e.g., the observable $Z$) is unaffected. Similarly, other fidelity-based measures such as average fidelity and entanglement fidelity [14,17] are not suitable for describing shadow information preservation either. Other measures of information preservation are the various channel capacities (e.g., classical capacity [18][19][20]), where each channel capacity quantifies a quantum channel's utility to transfer information for a certain purpose. However, these capacities are derived in asymptotic settings with multiple uses of the channel, and thus are, to some extent, not proper guides for practical tasks with finite quantum resources. Naturally, the following key questions then arise:
1. What is a proper way to quantify the level of shadow information preservation by a quantum channel?
2. How to quantify the cost of retrieving a certain piece of shadow information given it is preserved?

Contributions
To address these two questions in an operational way, we introduce a framework for retrieving shadow information from a noisy state. Both practically relevant and theoretically interesting, this framework combines an efficient way to manipulate quantum states with general quantum operations. Note that similar protocols are employed in predicting properties of quantum states [13,21] and mitigating quantum errors [22][23][24][25][26]. In particular, we suppose that multiple copies of a noisy state $\mathcal{N}(\rho)$ are available. For each copy, we apply a quantum channel randomly sampled from an ensemble of channels $\{\mathcal{D}_j\}$ and then measure the observable of interest, as shown in Fig. 1. We say that a channel $\mathcal{N}$ preserves the classical information inquired by an observable $O$ if we can recover this information, that is, if there exist a set of quantum channels $\{\mathcal{D}_j\}$ and a set of real numbers $\{c_j\}$ such that
$$\mathrm{Tr}[\rho O] = \sum_j c_j\, \mathrm{Tr}[\mathcal{D}_j \circ \mathcal{N}(\rho)\, O]. \tag{1}$$
We establish a necessary and sufficient condition for the preservation of the desired shadow information in terms of a relation between $O$ and $\mathcal{N}$ that should be satisfied. This condition motivates a measure called shadow destructivity, which characterizes the extent to which a quantum channel destroys shadow information.
We define a measure called retrieving cost to quantify the minimum cost of recovering Tr[ρO], which also quantifies how well the channel preserves the shadow information queried by a specific observable. This measure, along with the concrete retrieving protocol, is efficiently computable with respect to the system dimension via semidefinite programming [27]. We analytically obtain the values of this measure and the retrieving protocols for generalized amplitude damping (GAD) channels and depolarizing channels. Our retrieving costs set ultimate limits on the cost required for shadow retrieving, and the corresponding protocols outperform existing probabilistic error cancellation (PEC) methods [22][23][24][25][26]. Specifically, we employ our method to estimate the ground state energies of several molecules with VQE. The results show that the gap of sampling overhead between our method and a conventional PEC method gets larger as the size of the quantum system grows, implying potential applications of our method in implementing near-term algorithms.

Preliminaries
Before introducing our results, we set the notation and define several quantities that will be used throughout this paper. We use symbols such as $\mathcal{H}_A$ and $\mathcal{H}_B$ to denote finite-dimensional Hilbert spaces associated with systems $A$ and $B$, respectively. We use $d_A$ to denote the dimension of the system $A$. The sets of linear and Hermitian operators acting on $\mathcal{H}_A$ are denoted by $\mathcal{L}_A$ and $\mathcal{L}^H_A$, respectively.
In this work, we focus on linear maps having the same input and output dimension. A linear map $\mathcal{N}_{A\to A'}$ transforms linear operators in $\mathcal{L}_A$ to linear operators in $\mathcal{L}_{A'}$, where $\mathcal{H}_{A'}$ is isomorphic to $\mathcal{H}_A$. We call a linear map $\mathcal{N}_{A\to A'}$ a quantum channel if it is completely positive and trace-preserving (CPTP). By saying $\mathcal{N}$ is completely positive (CP), we mean $\mathrm{id}_R \otimes \mathcal{N}$ is a positive map with a reference system $R$ of arbitrary dimension. By saying $\mathcal{N}$ is trace-preserving (TP), we mean $\mathrm{Tr}[\mathcal{N}(X)] = \mathrm{Tr}[X]$ for all $X \in \mathcal{L}_A$. For any linear map $\mathcal{N}_{A\to A'}$, its Choi-Jamiołkowski matrix is given by $J_{\mathcal{N}} \equiv \sum_{i,j=0}^{d_A-1} |i\rangle\langle j| \otimes \mathcal{N}(|i\rangle\langle j|)$, where $\{|i\rangle\}_{i=0}^{d_A-1}$ is an orthonormal basis in $\mathcal{H}_A$.
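The definitions above can be checked numerically. The following is a minimal sketch, assuming the channel is given by square Kraus operators and using the unnormalized Choi-Jamiołkowski convention with the input system as the first tensor factor; the helper names `choi_matrix`, `is_cp`, and `is_tp` are illustrative and not part of the original work.

```python
import numpy as np

def choi_matrix(kraus):
    """Unnormalized Choi matrix J = sum_{i,j} |i><j| (x) N(|i><j|)."""
    d = kraus[0].shape[1]
    J = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            out = sum(K @ E @ K.conj().T for K in kraus)  # N(|i><j|)
            J += np.kron(E, out)
    return J

def is_cp(J, tol=1e-9):
    """A map is CP iff its Choi matrix is positive semidefinite."""
    return bool(np.linalg.eigvalsh((J + J.conj().T) / 2).min() > -tol)

def is_tp(J, d, tol=1e-9):
    """A map is TP iff tracing out the output factor of J gives the identity."""
    traced = np.trace(J.reshape(d, d, d, d), axis1=1, axis2=3)
    return bool(np.allclose(traced, np.eye(d), atol=tol))

# Example channel (an assumption for illustration): rho -> (rho + X rho X) / 2
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
kraus_example = [I2 / np.sqrt(2), X / np.sqrt(2)]
J_example = choi_matrix(kraus_example)
print(is_cp(J_example), is_tp(J_example, 2))
```

With this convention, a trace-scaling map would instead give a multiple of the identity in the partial-trace check.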

A Necessary and Sufficient Condition
We study the preservation of shadow information by considering an operational task: recovering the expectation value $\mathrm{Tr}[\rho O]$ of an observable $O$ from multiple copies of a corrupted state $\mathcal{N}(\rho)$, where $\mathcal{N}$ is the noisy channel of interest and $\rho$ is an unknown state. A usual way [25] is to simulate a Hermitian-preserving and trace-preserving (HPTP) map $\mathcal{D}$, which we call a retriever in this work, such that $\mathcal{D} \circ \mathcal{N} = \mathrm{id}$, where $\mathrm{id}$ is the identity map. While this approach is sufficient for recovering the target information, its requirement on the retriever $\mathcal{D}$ is generally not necessary, as we are only concerned with the expectation value of a specific observable. Hence, the only necessary requirement for a retriever $\mathcal{D}$ is
$$\mathrm{Tr}[\mathcal{D} \circ \mathcal{N}(\rho)\, O] = \mathrm{Tr}[\rho O] \tag{2}$$
for any state $\rho$ and a fixed observable $O$. Our framework extends the set of potential retrievers from HPTP maps to Hermitian-preserving and trace-scaling (HPTS) maps, because HPTS maps are physically simulatable in the way prescribed by Eq. (1), as shown in Lemma 6. We say a linear map $\mathcal{N}$ is trace-scaling (TS) if $\mathrm{Tr}_{A'}[J_{\mathcal{N}}] = pI_A$ for some real number $p$, and HPTP maps are special cases of HPTS maps with $p = 1$.
Here, given a quantum channel $\mathcal{N}$ and an observable $O$, we give the only condition that a retriever $\mathcal{D}$ needs to satisfy so that Eq. (2) holds for an arbitrary state:
$$\mathcal{N}^\dagger \circ \mathcal{D}^\dagger(O) = O, \tag{3}$$
where $\mathcal{N}^\dagger$ and $\mathcal{D}^\dagger$ are the adjoint maps of $\mathcal{N}$ and $\mathcal{D}$, respectively. This condition is a direct implication of Lemma 1. Theorem 2 then states that an HPTS retriever satisfying Eq. (3) exists if and only if $O \in \mathcal{N}^\dagger(\mathcal{L}^H_A)$. The proof for the necessity part is rather straightforward, while that for the sufficiency part is nontrivial. To prove the "if" part, we denote by $Q$ a Hermitian operator that evolves into the observable $O$ under the map $\mathcal{N}^\dagger$, i.e., $\mathcal{N}^\dagger(Q) = O$. Then, it is sufficient to construct an HPTS map $\mathcal{D}$ such that $\mathcal{D}^\dagger(O) = Q$. According to Lemma S1, we can instead construct an HP and unit-scaling map $\mathcal{D}^\dagger$, which guarantees that the map $\mathcal{D}$ is HPTS. By saying that $\mathcal{D}^\dagger$ is unit-scaling, we mean that $\mathcal{D}^\dagger$ scales the identity operator. The main difficulty of this construction is to ensure that $\mathcal{D}^\dagger$ is unit-scaling. To overcome this difficulty, we split all the observables into three categories by their rank and trace.
We first consider observables that do not have full rank and are not traceless, i.e., $\mathrm{Tr}[O] \neq 0$. For these observables, we exploit their kernel so that the constructed $\mathcal{D}^\dagger$ is unit-scaling. With some manipulation, the other two cases can be reduced to this one. The detailed proof is given below.
Proof For the "only if" part, suppose such a map $\mathcal{D}$ exists for the given channel $\mathcal{N}$ and non-zero observable $O$. Note that $\mathcal{D}^\dagger$ is also a Hermitian-preserving map since its Choi operator is Hermitian, as we have noted in Remark 1. Then, we can let $Q = \mathcal{D}^\dagger(O)$, which is Hermitian and satisfies $\mathcal{N}^\dagger(Q) = \mathcal{N}^\dagger \circ \mathcal{D}^\dagger(O) = O$, so that $O \in \mathcal{N}^\dagger(\mathcal{L}^H_A)$. For the "if" part, suppose $O \in \mathcal{N}^\dagger(\mathcal{L}^H_A)$, which means that there exists a Hermitian operator $Q$ such that $\mathcal{N}^\dagger(Q) = O$. It is sufficient to construct an HPTS map $\mathcal{D}$ satisfying $\mathcal{D}^\dagger(O) = Q$. By Lemma S1, it is equivalent to construct a Hermitian-preserving and unit-scaling map $\mathcal{D}^\dagger$ satisfying the above requirement. Denoting the rank of the observable $O$ by $k$, the trace of $O$ by $t$, and the dimension of the Hilbert space $\mathcal{H}_A$ by $d$, we complete the proof by constructing a Hermitian-preserving and unit-scaling map $\mathcal{D}^\dagger$ satisfying $\mathcal{D}^\dagger(O) = Q$ in each of the following three cases.
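• Case 1: $k < d$ and $t \neq 0$. Let $P$ be the projection on the support of $O$ and $P^\perp$ be the projection on the kernel of $O$. We can construct a map $\mathcal{D}^\dagger$ from $P$, $P^\perp$, and $Q$. Clearly, $\mathcal{D}^\dagger$ is an HP map. Note that $\mathcal{D}^\dagger(O) = Q$ and that $\mathcal{D}^\dagger(I)$ is proportional to $I$, which implies that $\mathcal{D}^\dagger$ is also unit-scaling and $\mathcal{D}^\dagger(O) = Q$.
• Case 2: $k < d$ and $t = 0$. Since $t = 0$, $O$ has at least two distinct eigenvalues. Let $O = \sum_j \lambda_j |\psi_j\rangle\langle\psi_j|$ be the spectral decomposition of $O$, where each $\lambda_j \neq 0$, and define $\tilde{O} \equiv O - \lambda_0 |\psi_0\rangle\langle\psi_0|$ and $\tilde{Q} \equiv Q + \Delta Q$, where $\Delta Q$ is a Hermitian operator that we will fix later. Let $\tilde{k}$ denote the rank of $\tilde{O}$ and $\tilde{t}$ the trace of $\tilde{O}$. Following the definition of $\tilde{O}$, we have $\tilde{k} = k - 1$ and $\tilde{t} = -\lambda_0$. As $\tilde{k} < d$ and $\tilde{t} \neq 0$, we can construct, by Case 1, a Hermitian-preserving and unit-scaling map $\tilde{\mathcal{D}}^\dagger$, where $\tilde{P}$ denotes the projection on the support of $\tilde{O}$ and $\tilde{P}^\perp$ the projection on the kernel of $\tilde{O}$. By the definition of $\tilde{O}$, we have $\tilde{P} O \tilde{P} = \sum_{j>0} \lambda_j |\psi_j\rangle\langle\psi_j|$ and $\tilde{P}^\perp O \tilde{P}^\perp = \lambda_0 |\psi_0\rangle\langle\psi_0|$, and thus $\mathrm{Tr}[\tilde{P} O \tilde{P}] = -\lambda_0$ and $\mathrm{Tr}[\tilde{P}^\perp O \tilde{P}^\perp] = \lambda_0$. Hence, $\Delta Q$ can be chosen such that $\tilde{\mathcal{D}}^\dagger(O) = Q$.
• Case 3: $k = d$. We first consider the case where $O = cI$ for some real coefficient $c$. As $Q$ can be any Hermitian operator satisfying $\mathcal{N}^\dagger(Q) = O$, and we know that $\mathcal{N}^\dagger$ is unital, we can let $Q = O$ so that taking $\mathcal{D}^\dagger$ to be the identity map $\mathrm{id}$ completes the proof. Now, consider the case where $O \neq cI$ for any real coefficient $c$, which implies that $O$ must have at least two distinct eigenvalues. Let $O = \sum_j \lambda_j |\psi_j\rangle\langle\psi_j|$ be the spectral decomposition of $O$, where each $\lambda_j \neq 0$ and $\lambda_0$ is the smallest eigenvalue. Then, we define $\tilde{O} \equiv O - \lambda_0 I$ and $\tilde{Q} \equiv Q + \Delta Q$, where $\Delta Q$ is a Hermitian operator that we will fix later. Let $\tilde{k}$ denote the rank of $\tilde{O}$ and $\tilde{t}$ the trace of $\tilde{O}$. Following the definition of $\tilde{O}$, we have $\tilde{k} < d$ and $\tilde{t} = t - d \cdot \lambda_0 > 0$. Hence, by Case 1, we can construct a Hermitian-preserving and unit-scaling map $\tilde{\mathcal{D}}^\dagger$, where $\tilde{P}$ denotes the projection on the support of $\tilde{O}$ and $\tilde{P}^\perp$ the projection on the kernel of $\tilde{O}$, and $\Delta Q$ can again be chosen such that $\tilde{\mathcal{D}}^\dagger(O) = Q$.
Since any observable $O$ can be categorized into one of the three cases above, we conclude that given a quantum channel $\mathcal{N}$, if $O \in \mathrm{Image}(\mathcal{N}^\dagger)$, then there exists a Hermitian-preserving and unit-scaling map $\mathcal{D}^\dagger$ such that $\mathcal{D}^\dagger(O) = Q$ for some $Q$ satisfying $\mathcal{N}^\dagger(Q) = O$. As the adjoint of a Hermitian-preserving and unit-scaling map is HPTS, we can always find an HPTS retriever $\mathcal{D}$ so that $\mathcal{N}^\dagger \circ \mathcal{D}^\dagger(O) = O$.
As depicted in Fig. 2, Theorem 2 implies that the channel $\mathcal{N}$ preserves the shadow information or, equivalently, the shadow information is retrievable, if and only if the corresponding observable $O$ is in the image of $\mathcal{L}^H_A$ under the adjoint map $\mathcal{N}^\dagger$. A physical interpretation of this theorem can be given within the Heisenberg picture, where quantum states are constant while observables evolve with time [3]. From this perspective, Theorem 2 says that $\mathrm{Tr}[\rho O]$ can be recovered if and only if there is an observable $Q$ evolving into $O$ under the backward dynamics prescribed by the given $\mathcal{N}$. This theorem also manifests the ultimate limitation of a quantum channel on preserving the shadow information queried by some observables.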
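To make Case 1 concrete, the sketch below builds one Hermitian-preserving, unit-scaling map with the required properties. The specific form $\mathcal{D}^\dagger(A) = \frac{\mathrm{Tr}[PAP]}{t} Q + \frac{\mathrm{Tr}[P^\perp A P^\perp]}{d-k}\left(I - \frac{k}{t} Q\right)$ is an assumption chosen to satisfy the properties stated in the proof, not necessarily the construction used by the authors. The dephasing-type noise with parameter `q` and the observable $|{+}\rangle\langle{+}|$ are likewise illustrative choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

q = 0.1  # hypothetical noise strength

def noise_adjoint(A):
    """Adjoint of N(rho) = (1-q) rho + q Z rho Z (this map is self-adjoint)."""
    return (1 - q) * A + q * Z @ A @ Z

# Observable with rank k = 1 < d = 2 and trace t = 1 != 0 (Case 1).
O = (I2 + X) / 2
# A Hermitian Q with N^dagger(Q) = O, since N^dagger(X) = (1 - 2q) X.
Q = (I2 + X / (1 - 2 * q)) / 2

d = O.shape[0]
k = int(np.linalg.matrix_rank(O))
t = np.trace(O).real

# Projections on the support and on the kernel of O.
evals, evecs = np.linalg.eigh(O)
support = evecs[:, np.abs(evals) > 1e-12]
P = support @ support.conj().T
P_perp = np.eye(d) - P

def retriever_adjoint(A):
    """Assumed Case-1 construction: maps O to Q and I to I."""
    on_support = np.trace(P @ A @ P) / t
    on_kernel = np.trace(P_perp @ A @ P_perp) / (d - k)
    return on_support * Q + on_kernel * (I2 - (k / t) * Q)

print(np.allclose(retriever_adjoint(O), Q))
print(np.allclose(retriever_adjoint(I2), I2))
print(np.allclose(noise_adjoint(retriever_adjoint(O)), O))
```

The kernel term is what restores the identity: without it, the map would send $I$ to $(k/t)Q$, which is not a multiple of $I$ in general.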

Quantifying the Level of Shadow Information Preservation
Following Theorem 2, for a quantum channel $\mathcal{N}_{A\to A'}$, we define the dimension of $\mathcal{N}^\dagger(\mathcal{L}^H_A)$ as the channel's effective shadow dimension $d_s(\mathcal{N})$, which quantifies how much shadow information is preserved by $\mathcal{N}$. To compute the effective shadow dimension, we note that the set of all Hermitian operators $\mathcal{L}^H_A$ can be viewed as a vector space over the field of real numbers, and the set of all linear operators $\mathcal{L}_A$ is a vector space over the field of complex numbers. Both vector spaces are $d^2$-dimensional. Note that a basis for $\mathcal{L}^H_A$ is also a basis for $\mathcal{L}_A$, which implies that $\dim(\mathcal{N}^\dagger(\mathcal{L}^H_A)) = \dim(\mathcal{N}^\dagger(\mathcal{L}_A))$. Then, by the properties of the adjoint [28], the image of $\mathcal{N}^\dagger$ is the space orthogonal to the null space of $\mathcal{N}$. Hence, the dimension of $\mathcal{N}^\dagger(\mathcal{L}_A)$ equals the rank of a matrix of $\mathcal{N}$ as a linear map. A matrix of $\mathcal{N}$ as a linear map can be obtained from the channel's Kraus representation. Given Kraus operators $\{K_k\}$ of $\mathcal{N}$, one such matrix is $M_{\mathcal{N}} = \sum_k K_k \otimes K_k^*$, where $K_k^*$ is the complex conjugate of $K_k$.
Based on the effective shadow dimension, we define a measure called shadow destructivity:
$$\zeta(\mathcal{N}) \equiv \log_2 \frac{d^2}{d_s(\mathcal{N})},$$
where $d$ is the dimension of the Hilbert space that the input linear operators in $\mathcal{L}_A$ act on. The shadow destructivity quantifies a quantum channel's capability to destroy shadow information and has some meaningful properties:
1. (Faithfulness) $\zeta(\mathcal{N}) = 0$ if and only if all the shadow information is recoverable. We have shown that $\zeta(\mathcal{N}) = 0$ if and only if all the shadow information is recoverable. In addition, the shadow destructivity is non-negative by its definition. Therefore, the shadow destructivity is faithful.
2. (Additivity) Proposition 4 (Additivity of shadow destructivity). The shadow destructivity is additive with respect to tensor product, i.e., $\zeta(\mathcal{M} \otimes \mathcal{N}) = \zeta(\mathcal{M}) + \zeta(\mathcal{N})$, where $\mathcal{M}_{A\to A'}$ and $\mathcal{N}_{B\to B'}$ are two quantum channels.
Proof We have shown that the effective shadow dimension of a channel equals the rank of its matrix as a linear map, that is, $d_s(\mathcal{N}) = \mathrm{rank}(M_{\mathcal{N}})$. By this property and the multiplicativity of the matrix rank under tensor products, we have $d_s(\mathcal{M} \otimes \mathcal{N}) = d_s(\mathcal{M})\, d_s(\mathcal{N})$, and hence $\zeta(\mathcal{M} \otimes \mathcal{N}) = \log_2 \frac{d_A^2 d_B^2}{d_s(\mathcal{M})\, d_s(\mathcal{N})} = \zeta(\mathcal{M}) + \zeta(\mathcal{N})$.
3. (Data processing inequality) The shadow destructivity cannot decrease when a quantum state is sent through more channels, that is, $\zeta(\mathcal{N} \circ \mathcal{M}) \geq \max(\zeta(\mathcal{N}), \zeta(\mathcal{M}))$. This property implies that we can only lose shadow information by sending quantum states through quantum channels.
As the shadow destructivity derived from Theorem 2 can be computed from the matrix rank of $M_{\mathcal{N}}$, Theorem 2 endows the matrix rank of $M_{\mathcal{N}}$ with an operational meaning in shadow information recoverability, where a higher rank indicates that more shadow information is recoverable. It is well known that a matrix is invertible if and only if it has full rank. Since full rank corresponds to the total recoverability of shadow information, invertibility of a channel is equivalent to its total shadow information recoverability.
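The rank-based computation and the additivity and data-processing properties can be illustrated numerically. This is a minimal sketch, assuming the matrix representation $M_{\mathcal{N}} = \sum_k K_k \otimes K_k^*$ and a base-2 logarithm; the function names and the two dephasing channels used as examples are illustrative assumptions.

```python
import numpy as np

def transfer_matrix(kraus):
    """Matrix of the channel as a linear map: sum_k K_k (x) conj(K_k)."""
    return sum(np.kron(K, K.conj()) for K in kraus)

def shadow_dimension(kraus, tol=1e-9):
    return int(np.linalg.matrix_rank(transfer_matrix(kraus), tol=tol))

def shadow_destructivity(kraus):
    d = kraus[0].shape[0]
    return float(np.log2(d**2 / shadow_dimension(kraus)))

# Completely dephasing channels in the Z basis and in the X basis.
ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)
z_dephasing = [ket0 @ ket0.conj().T, ket1 @ ket1.conj().T]
x_dephasing = [plus @ plus.conj().T, minus @ minus.conj().T]

# Tensor product and composition, built at the level of Kraus operators.
tensor = [np.kron(A, B) for A in z_dephasing for B in z_dephasing]
composed = [A @ B for A in x_dephasing for B in z_dephasing]

zeta_z = shadow_destructivity(z_dephasing)
zeta_x = shadow_destructivity(x_dephasing)
zeta_tensor = shadow_destructivity(tensor)
zeta_composed = shadow_destructivity(composed)
print(zeta_z, zeta_tensor, zeta_composed)
```

In this example the composed channel keeps only the identity component, so its destructivity is at least as large as that of either factor.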

Quantifying the Cost of Retrieving Shadow Information
By saying that a quantum channel preserves some shadow information, we mean this information can be retrieved after a state is corrupted by this channel. Though the effective shadow dimension of a quantum channel captures its overall capability of preserving shadow information, a piece of shadow information being preserved does not mean that retrieving it is cost-free, and different pieces of information can have different retrieving cost. In the following part, we derive a measure called retrieving cost from the number of copies of the corrupted state required to estimate the desired shadow information within an acceptable accuracy.
Note that the retriever $\mathcal{D}$ is an HPTS map but not necessarily CPTP, i.e., not necessarily a quantum channel. While such a map $\mathcal{D}$ is not physically implementable, we can simulate its action by decomposing it as a linear combination of CPTP maps, $\mathcal{D} = \sum_j c_j \mathcal{D}_j$, and utilizing Eq. (1).
Proof We first prove that a linear combination of quantum channels is HPTS. The Choi-Jamiołkowski matrix of such a linear combination $\mathcal{D} = \sum_j c_j \mathcal{D}_j$ is $J_{\mathcal{D}} = \sum_j c_j J_{\mathcal{D}_j}$, which is Hermitian and satisfies $\mathrm{Tr}_{A'}[J_{\mathcal{D}}] = \left(\sum_j c_j\right) I_A$; hence, $\mathcal{D}$ is HPTS.
Now let $\mathcal{D}$ be an HPTS map whose decomposition is $\sum_j c_j \mathcal{D}_j$. With this decomposition, we can simulate its action on the expectation value by the probabilistic sampling method prescribed by Eq. (1). Specifically, in the $s$-th round of a total of $S$ rounds of sampling, we first sample a quantum channel $\mathcal{D}^{(s)}$, with coefficient $c^{(s)}$, from $\{\mathcal{D}_j\}$ with probabilities $\{|c_j|/\gamma\}$, where $\gamma = \sum_j |c_j|$, and apply it to a copy of the corrupted state to obtain $\mathcal{D}^{(s)} \circ \mathcal{N}(\rho)$. Then, we measure this state in the eigenbasis $\{|\psi_j\rangle\langle\psi_j|\}_j$ of the given observable $O = \sum_j o_j |\psi_j\rangle\langle\psi_j|$, where $o_j \in [-1, 1]$, and obtain the measurement value $o^{(s)}$. After $S$ rounds of sampling, we attain an estimation for the expectation value $\mathrm{Tr}[\rho O]$ as
$$\xi = \frac{\gamma}{S} \sum_{s=1}^{S} \mathrm{sgn}(c^{(s)})\, o^{(s)}.$$
By Hoeffding's inequality [30], the number of rounds required to obtain the estimation within an error $\varepsilon$ with a probability no less than $1 - \delta$ is
$$S \geq \frac{2\gamma^2 \ln(2/\delta)}{\varepsilon^2}. \tag{28}$$
It can be seen that the required number of copies $S$ of the corrupted state is directly related to $\gamma$, the sum of absolute values of all $c_j$. Hence, $\gamma$ can be used to characterize the cost of simulating the HPTS map $\mathcal{D}$. It is desirable to find a decomposition of $\mathcal{D}$ making $\gamma$ as small as possible, and we denote the minimum possible value of $\gamma$ as $\gamma_{\min}(\mathcal{D})$. In Ref. [25], the authors define the logarithm of $\gamma_{\min}(\mathcal{D})$ as the physical implementability of an HPTP map $\mathcal{D}$.
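The sampling procedure can be simulated classically for a small example. The sketch below is an illustration under stated assumptions: the noise is a hypothetical dephasing-type channel with strength `q`, the retriever is its exact inverse written as a combination of the identity and the $Z$ conjugation, and the number of rounds follows the Hoeffding bound $S \geq 2\gamma^2 \ln(2/\delta)/\varepsilon^2$ for outcomes in $[-\gamma, \gamma]$.

```python
import math
import numpy as np

def hoeffding_rounds(gamma, eps, delta):
    """Rounds sufficient for error eps with probability at least 1 - delta."""
    return math.ceil(2 * gamma**2 * math.log(2 / delta) / eps**2)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
q = 0.1  # hypothetical noise strength

def noise(r):
    return (1 - q) * r + q * Z @ r @ Z

# Retriever D = c_0 * id + c_1 * Z(.)Z with D o N = id.
channels = [lambda r: r, lambda r: Z @ r @ Z]
coeffs = np.array([(1 - q) / (1 - 2 * q), -q / (1 - 2 * q)])
gamma = float(np.abs(coeffs).sum())

rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|, Tr[rho X] = 1
S = hoeffding_rounds(gamma, eps=0.05, delta=0.01)

rng = np.random.default_rng(7)
picks = rng.choice(len(coeffs), size=S, p=np.abs(coeffs) / gamma)
# Expectation of X after each candidate channel acts on the noisy state.
means = np.array([np.trace(D(noise(rho)) @ X).real for D in channels])
prob_plus = (1 + means[picks]) / 2            # probability of outcome +1
outcomes = np.where(rng.random(S) < prob_plus, 1.0, -1.0)
estimate = gamma * np.mean(np.sign(coeffs[picks]) * outcomes)
print(S, estimate)
```

The estimate fluctuates around the noiseless value $\mathrm{Tr}[\rho X] = 1$, even though the noisy state alone would give $1 - 2q$.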
Naturally, given a channel $\mathcal{N}$, we define the retrieving cost with respect to an observable $O$ to be
$$\gamma_O(\mathcal{N}) \equiv \min\left\{\gamma_{\min}(\mathcal{D}) : \mathcal{D} \text{ is HPTS},\ \mathcal{N}^\dagger \circ \mathcal{D}^\dagger(O) = O\right\},$$
which is $\gamma_{\min}(\mathcal{D})$ minimized over all $\mathcal{D}$ that can retrieve the shadow information queried by $O$ from the noisy channel $\mathcal{N}$. Notably, Lemma 6 states that any HPTS map can be decomposed as a linear combination of two quantum channels (Eq. (26)), and we show in the Supplemental Material that the minimum retrieving cost can be achieved by such a decomposition. This is true because all channels with non-negative coefficients can be grouped into a single channel with a non-negative coefficient, and we can do the same thing for all channels with negative coefficients. The existence of such an optimal decomposition with two CPTP maps leads to an efficient way to compute this measure through a semidefinite program (SDP), Eq. (30), formulated in terms of the linear maps' Choi-Jamiołkowski matrices, where $J_{\mathrm{id}}$ is the Choi-Jamiołkowski matrix of an identity channel. Eq. (30c) corresponds to the condition that $\mathcal{D}_1$ and $\mathcal{D}_2$ are CP, and Eq. (30d) requires them to be TP. Eq. (30f) corresponds to the minimum requirement for a valid retriever as given in Eq. (3). Note that we can establish approximate versions of Eq. (30) by relaxing the requirement in Eq. (30f) such that $\mathcal{N}^\dagger \circ \mathcal{D}^\dagger(O)$ is close to $O$ within a certain tolerance. This relaxation provides a remedy for cases where a perfect retriever does not exist, as well as a trade-off between the retrieving cost and the estimation precision.
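For orientation, one way to write a program of this type is sketched below. This is an assumed formulation consistent with the description above, not a reproduction of Eq. (30): $J_1$ and $J_2$ stand for the Choi-Jamiołkowski matrices of the unnormalized maps $c_1\mathcal{D}_1$ and $c_2\mathcal{D}_2$, the retriever is written as $\mathcal{D} = c_1\mathcal{D}_1 - c_2\mathcal{D}_2$ with $c_1, c_2 \geq 0$, and the Choi convention places the input system first.

```latex
\begin{aligned}
\gamma_O(\mathcal{N}) = \min\ & c_1 + c_2 \\
\text{s.t.}\ & J_1 \succeq 0, \quad J_2 \succeq 0, \\
& \mathrm{Tr}_{A'}[J_1] = c_1 I_A, \quad \mathrm{Tr}_{A'}[J_2] = c_2 I_A, \\
& \mathcal{N}^\dagger\!\left( \left( \mathrm{Tr}_{A'}\!\left[ (I_A \otimes O)\,(J_1 - J_2) \right] \right)^{T} \right) = O .
\end{aligned}
```

The last constraint uses the identity $\mathcal{D}^\dagger(O) = \left(\mathrm{Tr}_{A'}[(I_A \otimes O) J_{\mathcal{D}}]\right)^{T}$, which is linear in $J_{\mathcal{D}}$, so the whole program is a semidefinite program in the variables $J_1$, $J_2$, $c_1$, and $c_2$.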

Case Study and Applications in Error Mitigation
To intuitively understand the effective shadow dimension and the shadow destructivity, we study two non-invertible channels as examples: $\mathcal{N}_1(\cdot) = \frac{1}{2} I(\cdot)I + \frac{1}{2} X(\cdot)X$ and $\mathcal{N}_2(\cdot) = \frac{1}{2} I(\cdot)I + \frac{1}{4} X(\cdot)X + \frac{1}{4} Y(\cdot)Y$. For a single-qubit state $\rho$ with off-diagonal elements $b \pm ci$, the outputs are $\mathcal{N}_1(\rho) = \begin{pmatrix} 1/2 & b \\ b & 1/2 \end{pmatrix}$ and $\mathcal{N}_2(\rho) = \begin{pmatrix} 1/2 & (b + ci)/2 \\ (b - ci)/2 & 1/2 \end{pmatrix}$. While both channels are non-invertible, it is intuitive to see that channel $\mathcal{N}_2$ preserves more information than channel $\mathcal{N}_1$ since the imaginary part of the off-diagonal elements is preserved. This intuition coincides with the effective shadow dimension of these two channels, $d_s(\mathcal{N}_2) = 3 > d_s(\mathcal{N}_1) = 2$, which implies that $\mathcal{N}_2$ preserves one more shadow dimension than $\mathcal{N}_1$. The shadow destructivity of these channels, $\zeta(\mathcal{N}_1) = \log_2(4/2) = 1$ and $\zeta(\mathcal{N}_2) = \log_2(4/3) \approx 0.415$, has similar implications.
Now recall the retrieving cost, which quantifies the resources required to recover a certain piece of shadow information. The corresponding optimal retrieving protocol can be considered as a method to mitigate errors, endowing the retrieving cost with a practical meaning in quantum error mitigation [22-26, 31-44].
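The numbers quoted for these two channels can be reproduced with a short computation in the Pauli basis. The following is a minimal sketch; the helper names are illustrative, and retrievability is tested by checking whether the observable lies in the image of the adjoint map, in line with Theorem 2.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def adjoint(kraus, A):
    """Heisenberg-picture action N^dagger(A) = sum_k K_k^dagger A K_k."""
    return sum(K.conj().T @ A @ K for K in kraus)

def adjoint_in_pauli_basis(kraus):
    """Real 4x4 matrix with entries Tr[P_a N^dagger(P_b)] / 2."""
    return np.array([[np.trace(Pa @ adjoint(kraus, Pb)).real / 2
                      for Pb in PAULIS] for Pa in PAULIS])

def shadow_dimension(kraus):
    return int(np.linalg.matrix_rank(adjoint_in_pauli_basis(kraus), tol=1e-9))

def is_retrievable(kraus, O):
    """True if O is in the image of N^dagger (least-squares residual is zero)."""
    T = adjoint_in_pauli_basis(kraus)
    o = np.array([np.trace(P @ O).real / 2 for P in PAULIS])
    q, *_ = np.linalg.lstsq(T, o, rcond=None)
    return bool(np.allclose(T @ q, o, atol=1e-9))

kraus_n1 = [I2 / np.sqrt(2), X / np.sqrt(2)]
kraus_n2 = [I2 / np.sqrt(2), X / 2, Y / 2]

for name, kraus in (("N1", kraus_n1), ("N2", kraus_n2)):
    ds = shadow_dimension(kraus)
    zeta = np.log2(4 / ds)
    flags = [is_retrievable(kraus, P) for P in (X, Y, Z)]
    print(name, ds, round(float(zeta), 3), flags)
```

For both channels the expectation value of $Z$ is lost, while $\mathcal{N}_2$ additionally keeps the $Y$ component that $\mathcal{N}_1$ erases.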
In error mitigation, PEC methods are promising for suppressing noise in quantum circuits run on near-term quantum computers and return unbiased estimates of expectation values [22][23][24][25]. Briefly speaking, the idea of PEC is to decompose the inverse (HPTP) of a noisy channel $\mathcal{N}$ into a linear combination of physically implementable channels (CPTP), i.e., $\mathcal{N}^{-1} = \sum_j a_j \mathcal{A}_j$, with real coefficients $a_j$ and CPTP maps $\mathcal{A}_j$, where $\sum_j a_j = 1$. The inverse map $\mathcal{N}^{-1}$ is implemented by sampling the channels $\mathcal{A}_j$ according to a probability distribution $p(j) = |a_j|/\gamma_{\rm con}$, where $\gamma_{\rm con} = \sum_j |a_j|$. In this way, the unbiased estimation of the expectation value of an observable $O$ with respect to a state $\rho$ is obtained by manipulating multiple copies of the noisy state $\mathcal{N}(\rho)$:
$$\mathrm{Tr}[\rho O] = \sum_j a_j\, \mathrm{Tr}[\mathcal{A}_j \circ \mathcal{N}(\rho)\, O] = \gamma_{\rm con} \sum_j p(j)\, \mathrm{sgn}(a_j)\, \mathrm{Tr}[\mathcal{A}_j \circ \mathcal{N}(\rho)\, O].$$
The quantity $\gamma_{\rm con}$ is known as the overhead (also called retrieving cost in the setting of our work) because it captures how many copies of the noisy state are needed to ensure the estimation within the acceptable precision. Therefore, $\gamma_{\rm con}$ is usually used to evaluate the efficiency of an error mitigation protocol. Our proposed method can be considered as an approach to mitigating errors, where the retrieving cost corresponds to the overhead of an error mitigation protocol. For a given noisy channel $\mathcal{N}$ and an observable $O$ of interest, the SDP in Eq. (30) can give us a retriever $\mathcal{D}$ and its decomposition $\mathcal{D} = c_1 \mathcal{D}_1 + c_2 \mathcal{D}_2$ with the optimized retrieving cost $\gamma_{\rm pro} = |c_1| + |c_2|$. The expectation of the estimation made by our method is
$$c_1\, \mathrm{Tr}[\mathcal{D}_1 \circ \mathcal{N}(\rho)\, O] + c_2\, \mathrm{Tr}[\mathcal{D}_2 \circ \mathcal{N}(\rho)\, O] = \mathrm{Tr}[\mathcal{D} \circ \mathcal{N}(\rho)\, O] = \mathrm{Tr}[\rho O]. \tag{34}$$
Note that in our method, the retriever $\mathcal{D}$ is not necessarily the inverse of the noisy channel, i.e., $\mathcal{D} \neq \mathcal{N}^{-1}$ in general.
In the following, we compare the retrieving cost between our proposed protocol γ pro and the conventional PEC method γ con (see, e.g., [22][23][24][25][26]), with the assumption that full knowledge of the noise is accessible and the noise is modeled as a channel just before the measurement.
As a concrete example, consider retrieving shadow information $\mathrm{Tr}[\rho X]$ from a state $\rho$ corrupted by a GAD channel [45], where $X$ is the Pauli $X$ operator. The cost of our method is $\gamma_{\rm pro} = \frac{1}{\sqrt{1-\epsilon}}$, while the cost of the conventional method is $\gamma_{\rm con} = \frac{|1-2p|\epsilon + 1}{1-\epsilon}$ [25], where $\epsilon$ is the damping factor and $p$ is the temperature indicator associated with the GAD channel. It is easy to see that $\gamma_{\rm pro} < \gamma_{\rm con}$ for any $0 < \epsilon < 1$, as shown in Fig. 3. Detailed protocols are provided in the Supplemental Material.
Now we apply our protocol to ground state energy estimation with VQE. VQE [10] is a promising near-term algorithm for estimating the ground state energy of a Hamiltonian $H$. It aims to minimize the cost function $E = \mathrm{Tr}[|\psi(\theta)\rangle\langle\psi(\theta)| H]$ by tuning parameters $\theta$ in a parameterized quantum circuit used to prepare the ansatz state $|\psi(\theta)\rangle$. In practice, $H$ is decomposed into a linear combination of tensor products of Pauli operators $\{O_j\}$, i.e., $H = \sum_j h_j O_j$, where each $h_j$ is a real coefficient. The core subroutine of VQE is estimating the expectation value of each term, $\mathrm{Tr}[\rho O_j]$.
We use Eq. (28) to estimate the numbers of sampling rounds required by VQE to estimate the ground state energies of given molecules when there is depolarizing noise [2] on each qubit. Table 1 shows the comparison between our protocol and the conventional protocol [25]. The retrieving cost of our proposed method is $\gamma_{\rm pro} = \frac{1}{1-\epsilon}$ (see Supplemental Material), while that of the conventional method is $\gamma_{\rm con} = \frac{1 + (1 - 2/d^2)\epsilon}{1-\epsilon}$, where $\epsilon$ is the noise level and $d$ is the dimension of the system. It is clear that the number of sampling rounds of our method is smaller than that of the conventional method.
Besides the advantage of having a lower cost, our method has a larger range of applicability. The conventional method implements the retriever as the inverse of the noisy channel, and therefore only works for the cases where the noise is invertible. In contrast, our method, employing an observable-adaptive strategy, also works for some cases where the noise is non-invertible. As long as the observable of interest is in the image of the noise's adjoint map, our method can provide an optimal protocol for recovering the corresponding shadow information.
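The gap between the two costs can be tabulated directly from the closed-form expressions. The sketch below is a small illustration under stated assumptions: the noise level `eps = 0.1` and the target precision are arbitrary choices, the sampling rounds use the Hoeffding-type bound $S \geq 2\gamma^2 \ln(2/\delta)/\varepsilon^2$, and the figures printed are not the entries of Table 1.

```python
import math

def gamma_pro(eps):
    """Retrieving cost of the proposed protocol for depolarizing noise."""
    return 1.0 / (1.0 - eps)

def gamma_con(eps, d):
    """Cost of the conventional inverse-channel protocol in dimension d."""
    return (1.0 + (1.0 - 2.0 / d**2) * eps) / (1.0 - eps)

def sampling_rounds(gamma, err, delta):
    return math.ceil(2 * gamma**2 * math.log(2 / delta) / err**2)

eps = 0.1  # illustrative noise level
for d in (2, 4, 8):
    rounds_pro = sampling_rounds(gamma_pro(eps), err=0.01, delta=0.05)
    rounds_con = sampling_rounds(gamma_con(eps, d), err=0.01, delta=0.05)
    print(d, round(gamma_pro(eps), 4), round(gamma_con(eps, d), 4),
          rounds_pro, rounds_con)
```

Since the number of rounds scales with $\gamma^2$, the ratio of sampling rounds is $(\gamma_{\rm con}/\gamma_{\rm pro})^2 = (1 + (1 - 2/d^2)\epsilon)^2$, which grows with the dimension $d$.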

Conclusions
In this work, we establish two measures from an operational perspective to quantify a quantum channel's influence on shadow information encoded in quantum states, answering the questions posed at the beginning. Our work delivers a systematic framework of quantifying the information preservativity and destructivity of a quantum channel. Meanwhile it establishes an optimal way of extracting noiseless classical information from noisy states, which could be useful for near-term quantum information processing tasks.
While the shadow destructivity and the retrieving cost capture different aspects of shadow information recoverability, a single unified measure that captures both aspects may be worth further studies. For future work, it would be interesting to see how the techniques presented in this work can be combined with quantum error correction [46][47][48] or quantum error mitigation [37,[41][42][43][44]. We also expect that our ideas could be applied to near-term quantum tasks or applications on noisy quantum devices [49].

Adjoint of a Trace-scaling Map
Lemma S1 The adjoint of a trace-scaling map is unit-scaling, and vice versa. By calling a linear map N unit-scaling, we mean it scales the identity operator, i.e., N (I) = pI for some real number p.
Proof Suppose that $\mathcal{N}$ is a trace-scaling map such that $\mathrm{Tr}[\mathcal{N}(\cdot)] = p\, \mathrm{Tr}[\cdot]$ for some real number $p$. If $p \neq 0$, we can define a TP map $\mathcal{M} \equiv \mathcal{N}/p$. It is known that the adjoint of a TP map is unital [50], i.e., it preserves the identity operator, and vice versa. Thus, $\mathcal{M}^\dagger$, the adjoint of $\mathcal{M}$, is a unital map. Then, $\mathcal{N}^\dagger = p\mathcal{M}^\dagger$, the adjoint of the trace-scaling map $\mathcal{N}$, is a unit-scaling map. On the other hand, if $p = 0$, then $\mathcal{N} + \mathrm{id}$ is TP and thus $\mathcal{N}^\dagger + \mathrm{id}$ is unital, making $\mathcal{N}^\dagger$ a unit-scaling map. Hence, the adjoint of a trace-scaling map is unit-scaling. Now suppose that $\mathcal{N}$ is a unit-scaling map such that $\mathcal{N}(I) = pI$ for some real number $p$. If $p \neq 0$, then $\mathcal{M} \equiv \mathcal{N}/p$ is a unital map. Thus, we know its adjoint $\mathcal{M}^\dagger$ is TP, making $\mathcal{N}^\dagger = p\mathcal{M}^\dagger$ trace-scaling. On the other hand, if $p = 0$, then $\mathcal{N} + \mathrm{id}$ is unital and hence its adjoint $\mathcal{N}^\dagger + \mathrm{id}$ is TP, again making $\mathcal{N}^\dagger$ trace-scaling. Hence, the adjoint of a unit-scaling map is trace-scaling.
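Lemma S1 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the trace-scaling map is built as $1.5\,\mathrm{id} - 0.2\,\mathcal{A}$, where $\mathcal{A}$ is an amplitude damping channel with a hypothetical damping parameter `g`, so that the scaling factor is $p = 1.3$.

```python
import numpy as np

g = 0.3  # hypothetical damping parameter
kraus_id = [np.eye(2, dtype=complex)]
kraus_ad = [np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
            np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)]
# D = 1.5 * id - 0.2 * (amplitude damping), a Hermitian-preserving TS map.
terms = [(1.5, kraus_id), (-0.2, kraus_ad)]

def apply_map(A):
    return sum(c * sum(K @ A @ K.conj().T for K in ks) for c, ks in terms)

def apply_adjoint(A):
    return sum(c * sum(K.conj().T @ A @ K for K in ks) for c, ks in terms)

p = sum(c for c, _ in terms)
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

trace_ratio = np.trace(apply_map(rho)).real / np.trace(rho).real
unit_image = apply_adjoint(np.eye(2, dtype=complex))
print(round(trace_ratio, 6))
print(np.allclose(unit_image, p * np.eye(2)))
```

The same factor $p$ appears on both sides: the map scales the trace by $p$, and its adjoint sends $I$ to $pI$.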

Retrieving Cost
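Recall that, given a quantum channel $\mathcal{N}$, the shadow retrieving cost with respect to an observable $O$ is defined as
$$\gamma_O(\mathcal{N}) \equiv \min\left\{\gamma_{\min}(\mathcal{D}) : \mathcal{D} \text{ is HPTS},\ \mathcal{N}^\dagger \circ \mathcal{D}^\dagger(O) = O\right\},$$
where $\gamma_{\min}(\mathcal{D})$ is the minimum cost for simulating $\mathcal{D}$. Here, we give a proof that the minimum cost of simulating an HPTS map can be achieved by a decomposition with two CPTP maps. The same thing has been proved for HPTP maps in Ref. [25], and the proof here is similar.
Proof Lemma 6 implies that $\gamma_{\min}(\mathcal{D})$ is finite, as $\mathcal{D}$ can be decomposed into a linear combination of two CPTP maps. Suppose $\gamma_{\min}(\mathcal{D})$ is achieved by $\mathcal{D} = \sum_{j=1}^{T} c_j \mathcal{D}_j$, where $T \geq 3$, so that $\gamma_{\min}(\mathcal{D}) = \sum_{j=1}^{T} |c_j|$. Then, it is always possible to construct two CPTP maps $\mathcal{D}_1, \mathcal{D}_2$ so that $\mathcal{D} = c_1 \mathcal{D}_1 + c_2 \mathcal{D}_2$ and $|c_1| + |c_2| = \gamma_{\min}(\mathcal{D})$. In particular, let $c_1 = \sum_{j: c_j \geq 0} c_j$, $\mathcal{D}_1 = \sum_{j: c_j \geq 0} c_j / c_1 \cdot \mathcal{D}_j$, $c_2 = \sum_{j: c_j < 0} c_j$, and $\mathcal{D}_2 = \sum_{j: c_j < 0} c_j / c_2 \cdot \mathcal{D}_j$. One can check that $\mathcal{D}_1$ and $\mathcal{D}_2$ are CPTP and that $\mathcal{D} = c_1 \mathcal{D}_1 + c_2 \mathcal{D}_2$ with $|c_1| + |c_2| = \sum_{j=1}^{T} |c_j| = \gamma_{\min}(\mathcal{D})$.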
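The grouping step in this proof can be verified on a concrete decomposition. The sketch below is an illustration under stated assumptions: the four channels are the single-qubit Pauli conjugations, the coefficients are arbitrary, and the example contains both signs (if one group were empty, the corresponding coefficient would be zero and that channel would simply be dropped).

```python
import numpy as np

def choi_unitary(U):
    """Choi matrix of rho -> U rho U^dagger, input system first."""
    d = U.shape[0]
    v = sum(np.kron(np.eye(d)[:, i], U[:, i]) for i in range(d))
    return np.outer(v, v.conj())

def partial_trace_output(J, d):
    return np.trace(J.reshape(d, d, d, d), axis1=1, axis2=3)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

coeffs = np.array([0.9, 0.4, -0.2, -0.1])      # illustrative coefficients
chois = [choi_unitary(U) for U in (I2, X, Y, Z)]

pos, neg = coeffs >= 0, coeffs < 0
c1, c2 = coeffs[pos].sum(), coeffs[neg].sum()
# Convex mixtures of the channels with non-negative / negative coefficients.
J1 = sum(c * J for c, J, m in zip(coeffs, chois, pos) if m) / c1
J2 = sum(c * J for c, J, m in zip(coeffs, chois, neg) if m) / c2
J_total = sum(c * J for c, J in zip(coeffs, chois))

print(np.allclose(c1 * J1 + c2 * J2, J_total))
print(abs(c1) + abs(c2), np.abs(coeffs).sum())
```

Because each group is a convex combination of channels, $\mathcal{D}_1$ and $\mathcal{D}_2$ remain CPTP, and the cost $|c_1| + |c_2|$ equals the original sum of absolute coefficients.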

Dual SDP for Retrieving Cost
For a quantum channel N A→A , the primal SDP characterization of its retrieving cost with respect to an observable O is The Lagrange function is where M , N , K are the dual variables. The last term can be expanded as The corresponding Lagrange dual function is For J D1 ≥ 0 and J D2 ≥ 0, it must hold that Tr[M ] ≤ 1, Tr[N ] ≤ 1, Thus, we arrive at the following dual SDP:

Retrieving Cost for GAD channel
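A GAD channel can be used to describe the energy dissipation to the environment at finite temperature [45]. It is one of the realistic sources of noise in superconducting quantum computing. For single-qubit cases, a GAD channel can be characterized by the following Kraus operators:
$$K_0 = \sqrt{p}\begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\epsilon} \end{pmatrix},\quad K_1 = \sqrt{p}\begin{pmatrix} 0 & \sqrt{\epsilon} \\ 0 & 0 \end{pmatrix},\quad K_2 = \sqrt{1-p}\begin{pmatrix} \sqrt{1-\epsilon} & 0 \\ 0 & 1 \end{pmatrix},\quad K_3 = \sqrt{1-p}\begin{pmatrix} 0 & 0 \\ \sqrt{\epsilon} & 0 \end{pmatrix},$$
where $\epsilon$ is the damping factor and $p$ is the indicator of the temperature of the environment. A quantum state $\rho$ after going through the GAD channel is given by $\mathcal{N}(\rho) = \sum_{i=0}^{3} K_i \rho K_i^\dagger$. Note that the amplitude damping channel is a special case of the GAD channel when $p = 1$. A single-qubit state $\rho = \begin{pmatrix} \rho_{00} & \rho_{01} \\ \rho_{10} & \rho_{11} \end{pmatrix}$ after going through the GAD channel is
$$\mathcal{N}(\rho) = \begin{pmatrix} (1 - (1-p)\epsilon)\rho_{00} + p\epsilon\rho_{11} & \sqrt{1-\epsilon}\,\rho_{01} \\ \sqrt{1-\epsilon}\,\rho_{10} & (1-p)\epsilon\rho_{00} + (1 - p\epsilon)\rho_{11} \end{pmatrix},$$
where $\epsilon$ and $p$ are noise parameters.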
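The effect of the GAD channel on the observables $X$ and $Y$ can be checked in the Heisenberg picture. The sketch below assumes the standard textbook Kraus operators of the GAD channel and illustrative parameter values; it verifies that $\mathcal{N}^\dagger(O) = \sqrt{1-\epsilon}\, O$ for $O \in \{X, Y\}$, which is the scaling that a retrieving cost of $1/\sqrt{1-\epsilon}$ compensates.

```python
import numpy as np

def gad_kraus(eps, p):
    """Standard GAD Kraus operators; p = 1 gives amplitude damping."""
    return [
        np.sqrt(p) * np.array([[1, 0], [0, np.sqrt(1 - eps)]], dtype=complex),
        np.sqrt(p) * np.array([[0, np.sqrt(eps)], [0, 0]], dtype=complex),
        np.sqrt(1 - p) * np.array([[np.sqrt(1 - eps), 0], [0, 1]], dtype=complex),
        np.sqrt(1 - p) * np.array([[0, 0], [np.sqrt(eps), 0]], dtype=complex),
    ]

def heisenberg(kraus, O):
    """Adjoint action N^dagger(O) = sum_k K_k^dagger O K_k."""
    return sum(K.conj().T @ O @ K for K in kraus)

def schroedinger(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

eps, p = 0.2, 0.7  # illustrative noise parameters
kraus = gad_kraus(eps, p)
scale = np.sqrt(1 - eps)

print(np.allclose(heisenberg(kraus, X), scale * X))
print(np.allclose(heisenberg(kraus, Y), scale * Y))
print(1 / scale)  # the retrieving cost quoted for X and Y
```

Rescaling the measured expectation value by $1/\sqrt{1-\epsilon}$ therefore recovers $\mathrm{Tr}[\rho X]$ and $\mathrm{Tr}[\rho Y]$ without inverting the full channel.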

Proposition S3
Proof First, we are going to prove $\gamma_O(\mathcal{N}) \leq \frac{1}{\sqrt{1-\epsilon}}$ using SDP (S3). We show that the retriever $\mathcal{D}$ above is a feasible solution with a cost of $\frac{1}{\sqrt{1-\epsilon}}$. Specifically, feasibility follows from the fact that $O^2 = I$ for $O \in \{X, Y\}$ and from a direct calculation using Eq. (S21). Here, we have proven that the retriever $\mathcal{D}$ is a feasible solution to retrieve information with cost $\frac{1}{\sqrt{1-\epsilon}}$. For the matching lower bound, we write $O_{ji} \equiv \langle j|O|i\rangle$ and use the fact that $O \in \{X, Y\}$ only has non-zero elements on the anti-diagonal, together with Eq. (S21). From the proof, we also know that the above retriever $\mathcal{D}$ is optimal. Moreover, since the Choi matrices of $\mathcal{D}_1$ and $\mathcal{D}_2$ are given already, it is straightforward to derive the corresponding Kraus operators. It is interesting to note that there is a connection between the retrieving cost and the spectral properties of the noisy channel and the observable of interest. For $O \in \{X, Y\}$, the GAD channel satisfies $\mathcal{N}^\dagger(O) = \sqrt{1-\epsilon}\, O$, so the noise scales the expectation value by $\sqrt{1-\epsilon}$. Thus, we need a retriever to scale the expectation value back, and such a retriever corresponds to a retrieving cost of $1/\sqrt{1-\epsilon}$, which aligns with the above proposition. The same phenomena are observed for mixed Pauli noise with Pauli observables (see Proposition S5).
Therefore, and prove that they form a feasible solution to the dual SDP.
Therefore, . From the proof, we also know that the above retriever D is optimal. Moreover, since the Choi matrix has been given, the corresponding Kraus operators of the retriever can be easily derived.

Retrieving Cost for Mixed Pauli Channel
The mixed Pauli channel is a common noise model in quantum computers. For single-qubit cases, a quantum state corrupted by mixed Pauli noise becomes
$$\mathcal{N}(\rho) = p_i \rho + p_x X\rho X + p_y Y\rho Y + p_z Z\rho Z,$$
where $p_i, p_x, p_y, p_z$ are the corresponding probabilities with $p_i + p_x + p_y + p_z = 1$ and $0 \leq p \leq 1$ for each $p \in \{p_i, p_x, p_y, p_z\}$. For general $n$-qubit cases, the noisy state is
$$\mathcal{N}(\rho) = \sum_{\sigma} p_\sigma\, \sigma\rho\sigma,$$
where the sum is over all the $n$-qubit Pauli operators $\{\sigma\}$, and $\{p_\sigma\}$ are the corresponding probabilities with $\sum_\sigma p_\sigma = 1$ and $0 \leq p_\sigma \leq 1$. Note that the depolarizing channel is a special case of the mixed Pauli channels, where $p_x = p_y = p_z$. For an arbitrary state $\rho$, after the mixed Pauli channel, it becomes $\rho' = \mathcal{N}(\rho) = \sum_{\sigma_+} p_{\sigma_+} \sigma_+ \rho \sigma_+ + \sum_{\sigma_-} p_{\sigma_-} \sigma_- \rho \sigma_-$. Then, we have
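How a mixed Pauli channel acts on a Pauli observable can be checked directly. The sketch below is an illustration under an explicit assumption: $\sigma_+$ and $\sigma_-$ are read as the Pauli operators that commute and anticommute with the observable, respectively, which makes the observable an eigenoperator of $\mathcal{N}^\dagger$ with eigenvalue $\sum_{\sigma_+} p_{\sigma_+} - \sum_{\sigma_-} p_{\sigma_-}$. The probabilities are arbitrary example values.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def mixed_pauli_adjoint(probs, O):
    """Heisenberg-picture action of rho -> sum_s p_s s rho s."""
    return sum(p * P.conj().T @ O @ P for p, P in zip(probs, PAULIS))

def scaling_factor(probs, O):
    """Sum of p over commuting Paulis minus sum over anticommuting ones."""
    commutes = [np.allclose(P @ O, O @ P) for P in PAULIS]
    return sum(p if c else -p for p, c in zip(probs, commutes))

probs = [0.7, 0.1, 0.15, 0.05]  # illustrative (p_i, p_x, p_y, p_z)
factor = scaling_factor(probs, X)
print(factor)
print(np.allclose(mixed_pauli_adjoint(probs, X), factor * X))
```

When this factor is non-zero, rescaling the measured value by its inverse recovers the noiseless expectation; when it vanishes, the observable is outside the image of $\mathcal{N}^\dagger$ and its expectation value cannot be retrieved.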

Proposition S5
where $O^T_{ji} \equiv \langle j|O^T|i\rangle$. This implies that