Entanglement-Free Parameter Estimation of Generalized Pauli Channels

We propose a parameter estimation protocol for generalized Pauli channels acting on $d$-dimensional Hilbert space. The salient features of the proposed method include product probe states and measurements, the number of measurement configurations linear in $d$, minimal post-processing, and the scaling of the mean square error comparable to that of the entanglement-based parameter estimation scheme for generalized Pauli channels. We also show that while measuring generalized Pauli operators the errors caused by the Pauli noise can be modeled as measurement errors. This makes it possible to utilize the measurement error mitigation framework to mitigate the errors caused by the generalized Pauli channels. We use this result to mitigate noise on the probe states and recover the scaling of the noiseless probes, except with a noise strength-dependent constant factor. This method of modeling Pauli channel as measurement noise can also be of independent interest in other NISQ tasks, e.g., state tomography problems, variational quantum algorithms, and other channel estimation problems where Pauli measurements have the central role.


Introduction
The second quantum revolution has introduced a wide range of new quantum technologies. Quantum states and channels play a central role in the efficient and successful implementation of all of these technologies. It is desirable to design our systems of interest to be as close to ideal behaviour as possible. However, environmental effects and nonidealities in designed components inevitably and irrecoverably introduce noise in these systems. A general method to model this noise in system components is through quantum channels. Ideally, one would aim for system components to be noiseless and error free, i.e., the involved channels are identity channels. However, it is almost impossible to design noiseless system components. The next best scenario is to have complete knowledge of the noise present in the system, that is, to know all the ways in which noise can corrupt the system and lead it to deviate from the intended behaviour. Having complete knowledge of the noise present in system components allows one to efficiently minimize the errors introduced by the noise [1][2][3].
Quantum process tomography is the method to identify an unknown quantum dynamical process [4][5][6]. The general method of process tomography is to prepare probe states in different initial states, let them evolve through the quantum process of interest, and then measure the output states with different measurement settings [7,8]. A measurement configuration is the specific combination of the initial probe state and the measurement setting, i.e., changing the initial state of the probe or the measurement setting gives a new measurement configuration. In general, quantum process tomography is a resource-intensive and experimentally demanding process; standard quantum process tomography of a general quantum channel on a $d$-dimensional Hilbert space requires $d^4$ measurement configurations. This stringent requirement of a large number of measurement configurations can be relaxed either by operating on a larger Hilbert space (entangled-probe schemes) or by making reasonable assumptions on the channel structure based on prior knowledge [9][10][11]. Examples of the latter strategy include the assumption of rank deficiency [12] or modeling the unknown channel as a member of a parametric class of channels and then estimating the unknown parameters [13][14][15].
Examples of such parametric classes of channels include Pauli qubit channels and their higher-dimensional generalizations, including discrete Weyl channels (DWCs) [16,17]. The study of Pauli channels and their generalizations is well motivated by several important properties of this class. For example, it is known that every unital qubit channel is similar to a Pauli qubit channel [18]. Furthermore, several physically important classes of quantum channels are special cases of Pauli channels. Examples include depolarizing, dephasing, bit-flip, and two-Pauli channels. Moreover, any noise model on a multiqubit system can be modeled as having the form of a Pauli channel [19,20]. In recent times, some practical methods have been introduced that effectively approximate any noise model as a Pauli channel [20][21][22][23][24], e.g., by twirling via Pauli operators. Unfortunately, some of the above motivations no longer remain true for the higher-dimensional generalizations of Pauli channels [25]. Regardless, generalizations of Pauli channels remain an important and interesting topic of study in the theory of quantum information processing.
Due to their practical relevance and versatility, several researchers have studied the general and specific variants of Pauli channels to devise different strategies for estimating their parameters [13,20,26-35]. Of particular interest to us is the entanglement-assisted optimal parameter estimation (OPE) protocol presented in [26], which is optimal in the sense of the Cramér-Rao bound, provides the best scaling of the mean square error (MSE) in the number of channel uses, requires only a single measurement configuration, and deals with the most general case of the generalized Pauli channels without any further assumptions. An experimental realization of this protocol for qubit Pauli channels was given in [36]. However, experimental realization of this (and other entanglement-assisted) protocols becomes extremely challenging in the higher-dimensional cases due to the difficulties involved in generating, maintaining, and processing higher-dimensional entangled states [37,38].
In this paper, we present a protocol for the parameter estimation of DWCs, which can also be applied to the other generalizations of Pauli channels. The proposed protocol, called the direct parameter estimation of Pauli channels (DPEPC), is based solely on separable states but provides the same scaling of the MSE as a function of channel uses as that of the OPE, up to a multiplicative factor. Unfortunately, DPEPC requires more than a single measurement configuration. However, extensive numerical examples suggest that the required number of measurement configurations scales linearly with the dimension of the Hilbert space. Additionally, we show that in a system with Pauli measurements, errors caused by a Pauli channel can be efficiently modeled as measurement errors. Then, the framework of measurement error mitigation can successfully mitigate these errors. We provide numerical examples of this error mitigation by introducing additional depolarizing noise on the probe states and then mitigating its effects by the aforementioned technique. If the noise strength is known, this procedure recovers the original scaling of both DPEPC and OPE, except with another noise strength-dependent multiplicative factor.
The remainder of this paper is organized as follows. In Section 2, we set the notation and preliminaries. Sections 3 and 4 provide the protocol and numerical examples of DPEPC for the DWCs, respectively. In Section 5, we provide the conclusions and future outlook.

Notations and Preliminaries
A DWC is a qudit generalization of qubit Pauli channels. The DWC acts on a quantum state $\rho$ as
$$\mathcal{N}_{\mathrm{dwc}}(\rho) = \sum_{n=0}^{d-1} \sum_{m=0}^{d-1} p_{n,m}\, W_{n,m}\, \rho\, W_{n,m}^{\dagger}, \tag{1}$$
where
$$W_{n,m} = \sum_{k=0}^{d-1} \omega^{kn}\, |k\rangle\langle k+m \ (\mathrm{mod}\ d)|, \qquad \omega = e^{2\pi i/d}, \tag{2}$$
are the discrete Weyl operators and $p_{n,m}$ are the elements of a probability vector. For simplicity, we will also utilize a single-index notation for the discrete Weyl operators and the elements of the probability vector of (1), where $V_k = W_{n,m}$ and $q_k = p_{n,m}$, with $k = n + md$. There exists an index-based relation between a Weyl operator $W_{a,b}$ and the eigenvectors of another Weyl operator $W_{n,m}$. The relationship was first presented by the authors in [39] and is formally given in Lemma 1 of the current manuscript. Due to the repeated appearance of the index relation $ma - nb \bmod d$, we define it as $f(k; n, m)$, where it is understood that $k$ will first be decompressed to the double-index notation $(a, b)$ to calculate $ma - nb \bmod d$. In particular, $f(k; n, m) = 0$ if and only if $W_{a,b}$ and $W_{n,m}$ commute. We denote the orthonormal eigenbasis of $W_{n,m}$ by $B_{n,m}$. We also define $Q_d = \{0, 1, \cdots, d-1\}$.
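The index relation can be checked numerically. The following is a minimal sketch, assuming the convention $W_{n,m} = \sum_k \omega^{kn} |k\rangle\langle k+m \bmod d|$ for the Weyl operators; the function names `weyl`, `unpack`, `f`, and `check_index_relation` are illustrative and not part of the original manuscript. It verifies that $W_{n,m} W_{a,b} = \omega^{f(k;n,m)} W_{a,b} W_{n,m}$ for every pair of operators, so that $f(k;n,m) = 0$ coincides with commutation.

```python
import numpy as np

def weyl(n, m, d):
    """Assumed convention: W_{n,m} = sum_k omega^(k*n) |k><(k+m) mod d|."""
    omega = np.exp(2j * np.pi / d)
    W = np.zeros((d, d), dtype=complex)
    for k in range(d):
        W[k, (k + m) % d] = omega ** (k * n)
    return W

def unpack(k, d):
    """Single index k = n + m*d back to the double index (n, m)."""
    return k % d, k // d

def f(k, n, m, d):
    """Index relation m*a - n*b mod d, where (a, b) = unpack(k, d)."""
    a, b = unpack(k, d)
    return (m * a - n * b) % d

def check_index_relation(d):
    """Check W_{n,m} W_{a,b} = omega^f W_{a,b} W_{n,m} for all pairs."""
    omega = np.exp(2j * np.pi / d)
    for n in range(d):
        for m in range(d):
            Wnm = weyl(n, m, d)
            for k in range(d * d):
                Wab = weyl(*unpack(k, d), d)
                lhs = Wnm @ Wab
                rhs = omega ** f(k, n, m, d) * (Wab @ Wnm)
                if not np.allclose(lhs, rhs):
                    return False
    return True

print(check_index_relation(3))
```

Under a different sign or ordering convention for the Weyl operators, the sign of the exponent changes but the commutation criterion $f = 0$ is unaffected.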

Direct Parameter Estimation of Pauli Channels
In this section, we outline our protocol for the parameter estimation of Pauli channels.
The key idea is the equivalence of DWCs with classical symmetric channels under certain conditions [39]. By estimating the transition probabilities of emulated classical symmetric channels, we are able to reconstruct the full parameter set of the underlying DWCs. We also explore the quantum error mitigation for mitigating errors caused by noise in the probe states.

Proposed Protocol
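A DWC acts as a classical symmetric channel when the inputs to the channel are the elements of $B_{n,m}$, and the measurement at the output is a projective measurement in $B_{n,m}$. Then, the transition probabilities of the effective classical channels are given by the following lemma.

Lemma 1 [39]. Let an element $|i_{n,m}\rangle$ of $B_{n,m}$ be the input to a DWC, and let the channel output be measured in $B_{n,m}$. Then, the probability of obtaining the outcome $|(i+\ell)_{n,m}\rangle$, $\ell \in Q_d$, is
$$\lambda^{n,m}_{\ell} = \sum_{k \,:\, f(k;n,m) = \ell} q_k, \tag{3}$$
where $q_k$ are the parameters of the DWC.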
In the context of the simulated classical channel, $\lambda^{n,m}_{\ell}$ is the probability of observing the output state $|(i+\ell)_{n,m}\rangle$ when the input state to the channel was $|i_{n,m}\rangle$. Due to the orthogonality of the elements of $B_{n,m}$, it is possible to obtain a direct estimate of $\lambda^{n,m}_{\ell}$, $\ell \in Q_d$, by utilizing Lemma 1. Additionally, due to the independence of $\lambda^{n,m}_{\ell}$ from the index $i$ of the input state, the estimates of $\lambda^{n,m}_{\ell}$ for all $\ell$ are obtained simultaneously. That is, for any chosen $|i_{n,m}\rangle$ from $B_{n,m}$ and for any $\ell$, the estimate of $\lambda^{n,m}_{\ell}$ is simply the fraction of times $|(i+\ell)_{n,m}\rangle$ is measured at the channel output. Therefore, one experiment configuration (fixed input and projective measurement in $B_{n,m}$) is sufficient to estimate the complete set of $d$ transition probabilities $\lambda^{n,m}_{\ell}$ for a fixed $W_{n,m}$.
For a fixed $B_{n,m}$, (3) provides a set of $d$ simultaneous equations, which can be written in the matrix form $A_{n,m} x = b_{n,m}$, where $b_{n,m}$ (resp. $x$) is the $d \times 1$ (resp. $d^2 \times 1$) vector with $\lambda^{n,m}_{\ell}$ (resp. $p_{a,b}$) as its elements and $A_{n,m}$ is a $d \times d^2$ matrix with entries defined as
$$[A_{n,m}]_{\ell,k} = I_{\ell}\big(f(k; n, m)\big), \tag{4}$$
where $I_j(i)$ is the indicator function defined as
$$I_j(i) = \begin{cases} 1, & i = j, \\ 0, & \text{otherwise}. \end{cases} \tag{5}$$
Once we obtain the estimates of the elements of $b_{n,m}$, we can attempt to solve the set of equations $A_{n,m} x = b_{n,m}$ to obtain the channel parameters contained in $x$. However, in order to solve $A_{n,m} x = b_{n,m}$ for a unique $x$, we need the rank of $A_{n,m}$ to be $d^2$, which is impossible for our $d \times d^2$ matrix $A_{n,m}$. Since the summation (3) partitions the elements $q_k$ into $d$ disjoint sets of $d$ elements each, such that the elements in each set contribute to a particular $\lambda^{n,m}_{\ell}$, the rows of $A_{n,m}$ are linearly independent. Thus, $A_{n,m}$ has rank $d$ for any $W_{n,m}$ that has $d$ distinct eigenvalues. We can solve this problem of having fewer simultaneous equations than unknowns by obtaining more equations for different values of $n$ and $m$. That is, we invoke Lemma 1 for $K$ different values of $n$ and $m$ to obtain $Kd$ equations in the matrix form
$$\begin{bmatrix} A_{n_1,m_1} \\ A_{n_2,m_2} \\ \vdots \\ A_{n_K,m_K} \end{bmatrix} x = \begin{bmatrix} b_{n_1,m_1} \\ b_{n_2,m_2} \\ \vdots \\ b_{n_K,m_K} \end{bmatrix}. \tag{6}$$
We denote the matrix on the left-hand side of (6) by $A^d_K$, where the superscript denotes the dimension of the Hilbert space on which the channel operates and the subscript denotes the total number of non-commuting $W_{n,m}$ for which Lemma 1 was invoked. The set of corresponding indices of the Weyl operators utilized in generating $A^d_K$ is denoted by $W_{\mathrm{idx}}$.
One would then hope that the system (6) with $K = d$ would have a unique solution. However, we show that the matrix $A^d_d$ is still rank deficient for any $d$. First, note that all the elements of the row obtained by summing all rows of any $A_{n_k,m_k}$ will be 1. Then, one can obtain any row of any $A_{n_k,m_k}$ by simply subtracting all other rows of $A_{n_k,m_k}$ from the row containing all 1's. Consequently, every block after the first contributes at most $d - 1$ new linearly independent rows, and the rank of $A^d_K$ is at most $K(d-1) + 1$. Therefore, $A^d_d$, despite being of dimension $d^2 \times d^2$, is still rank deficient, and the minimum $K$ such that $A^d_K$ has rank $d^2$ for any $d$ is at least $d + 1$. In the following, we call an $A^d_K$ sufficient if it has rank $d^2$. Analytically obtaining the exact value of the smallest $K$ for an arbitrary $d$ such that $A^d_K$ is sufficient is difficult. To overcome this difficulty, we algorithmically obtain $W_{\mathrm{idx}}$. A verbal description of our algorithm is as follows. We first utilize the results from [40] to calculate the total number of distinct eigenvalues of all discrete Weyl operators on $\mathcal{H}_d$. We make a set $\mathcal{W}_d$ of all Weyl operators that have $d$ distinct eigenvalues. Then, we utilize the identity $W_{n,m} W_{p,q} = \omega^{mp-nq}\, W_{p,q} W_{n,m}$ to identify the commutation relations of the operators within $\mathcal{W}_d$. We make subsets of $\mathcal{W}_d$ such that the operators within each subset mutually commute. Finally, we obtain $W_{\mathrm{idx}}$ by choosing one operator from each of the commuting subsets of $\mathcal{W}_d$. We verify that $W_{\mathrm{idx}}$ generates a sufficient $A^d_K$ by constructing the corresponding $A^d_K$ and verifying that it has rank $d^2$.
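The final verification step can be sketched in a few lines. This is a minimal illustration, assuming the block structure in which entry $(\ell, k)$ of $A_{n,m}$ is 1 exactly when $ma - nb \bmod d = \ell$ with $k = a + bd$; the helper names `block`, `stacked`, and `is_sufficient` are hypothetical. It only performs the rank check for a given index set and does not implement the search over commuting subsets.

```python
import numpy as np

def block(n, m, d):
    """d x d^2 binary block A_{n,m}: entry (l, k) is 1 iff f(k; n, m) = l."""
    A = np.zeros((d, d * d), dtype=int)
    for k in range(d * d):
        a, b = k % d, k // d              # single index k = a + b*d
        A[(m * a - n * b) % d, k] = 1
    return A

def stacked(w_idx, d):
    """Stack the blocks of all index pairs in w_idx into one matrix."""
    return np.vstack([block(n, m, d) for n, m in w_idx])

def is_sufficient(w_idx, d):
    """A stacked matrix is sufficient when its rank equals d^2."""
    return bool(np.linalg.matrix_rank(stacked(w_idx, d)) == d * d)

print(is_sufficient([(1, 0), (0, 1)], 2))                  # K = d:     False
print(is_sufficient([(1, 0), (0, 1), (1, 1)], 2))          # K = d + 1: True
print(is_sufficient([(1, 0), (0, 1), (1, 1), (1, 2)], 3))  # K = d + 1: True
```

The first call illustrates the rank deficiency for $K = d$, and the other two illustrate sufficiency of $K = d + 1$ for the primes $d = 2$ and $d = 3$.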
We used this algorithm for $d$ up to 100, which provides the following insights. For any $d$, an $A^d_{d^2-1}$ is always sufficient. That is, for a $d$-dimensional DWC, $d^2 - 1$ measurement configurations are always sufficient to perform the full process tomography. This number can be considerably reduced by utilizing the commutation relations of discrete Weyl operators. We were able to obtain a sufficient $A^d_K$ with $K < 2.5d$ for any $\mathcal{H}_d$ with $d$ as large as 100. Figure 1 shows the required number of measurement configurations $K$ obtained via this algorithm. Furthermore, for any prime $d$, $A^d_{d+1}$ is sufficient. This latter observation is expected to hold beyond the values of $d$ which we numerically checked, since it is not possible to construct a set of more than $d + 1$ noncommuting Weyl operators for a prime $d$ [40]. Therefore, if $A^d_{d^2-1}$ being sufficient for any $d$ is always true, then the sufficiency of $A^d_{d+1}$ for any prime $d$ also holds everywhere. Obtaining a sufficient $A^d_K$ entails constructing the binary matrix $A^d_K$ as well as identifying the indices $(n_k, m_k)$, for $1 \le k \le K$, of the Weyl operators whose eigenstates will be utilized for the DPEPC of the DWC. Once a sufficient $A^d_K$ is found for a given $d$, the DPEPC of a DWC for $N$ channel uses can be performed as follows. Prepare $N/K$ copies of an eigenstate $|s_{n_k,m_k}\rangle$ of $W_{n_k,m_k}$ for every $1 \le k \le K$ and send them through the channel $\mathcal{N}_{\mathrm{dwc}}$. For every $|s_{n_k,m_k}\rangle$ at the input, measure the channel output in $B_{n_k,m_k}$ and record the measurement outcome. The measurement outcomes provide an estimate $\hat{\lambda}^{n_k,m_k}_{\ell}$ for all $\lambda^{n_k,m_k}_{\ell}$. Construct the vector $\hat{b}^d_K$, which is an estimate of the vector on the right-hand side of (6). Finally, obtain the estimates $\hat{p}_{i,j}$ of the channel parameters $p_{i,j}$ by the method of least squares, i.e.,
$$\hat{x} = \Big( \big(A^d_K\big)^T A^d_K \Big)^{-1} \big(A^d_K\big)^T\, \hat{b}^d_K, \tag{7}$$
where $(\cdot)^T$ and $(\cdot)^{-1}$ are the matrix transpose and the matrix inverse operations, respectively. Note that the inverse in (7) depends only on $d$ and the utilized measurement configurations, not on the data.
Thus, after fixing $d$ and $K$, we can precompute
$$B^d_K = \Big( \big(A^d_K\big)^T A^d_K \Big)^{-1} \big(A^d_K\big)^T.$$
All subsequent runs of DPEPC can be completed simply by computing $\hat{x} = B^d_K \hat{b}^d_K$. In this sense, no matrix inversion is needed in DPEPC. Figure 2 depicts the complete DPEPC protocol for DWCs.

Figure 3: OPE protocol of [26]. $N$ copies of the probe state $|\Psi\rangle_{AB}$ are prepared, where $|\Psi\rangle_{AB} \in \mathcal{H}_d \otimes \mathcal{H}_d$ is the two-qudit maximally entangled state. One of the qudits is allowed to evolve under $\mathcal{N}_{\mathrm{dwc}}$, and subsequently a joint measurement in the Bell basis $B_{\mathrm{bell}}$ is performed on both qudits. Finally, the vector of probabilities $\hat{p}_{i,j}$ is estimated from the measurement statistics.
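The estimation step of DPEPC can be simulated classically, since each measurement configuration amounts to sampling from the transition probabilities of Lemma 1. The following is a minimal sketch under that assumption for a qubit DWC; the function names and the example parameter vector `x_true` are hypothetical, and the column ordering follows $k = a + bd$.

```python
import numpy as np

def block(n, m, d):
    """d x d^2 binary block A_{n,m}: entry (l, k) is 1 iff f(k; n, m) = l."""
    A = np.zeros((d, d * d), dtype=int)
    for k in range(d * d):
        a, b = k % d, k // d
        A[(m * a - n * b) % d, k] = 1
    return A

def dpepc_estimate(x_true, w_idx, d, n_uses, rng):
    """Simulate DPEPC with n_uses channel uses split evenly over len(w_idx)
    measurement configurations and return the least-squares estimate."""
    A = np.vstack([block(n, m, d) for n, m in w_idx])
    B = np.linalg.inv(A.T @ A) @ A.T       # depends only on d and w_idx
    shots = n_uses // len(w_idx)           # N/K probes per configuration
    b_hat = []
    for n, m in w_idx:
        lam = block(n, m, d) @ x_true      # transition probabilities
        counts = rng.multinomial(shots, lam)
        b_hat.append(counts / shots)       # relative frequencies
    return B @ np.concatenate(b_hat)

rng = np.random.default_rng(0)
x_true = np.array([0.7, 0.1, 0.15, 0.05])  # illustrative channel parameters
w_idx = [(1, 0), (0, 1), (1, 1)]
x_hat = dpepc_estimate(x_true, w_idx, 2, 300_000, rng)
print(np.round(x_hat, 3))
```

In practice `B` would be computed once per $(d, K)$ and reused, so that each run reduces to a single matrix-vector product.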
It was shown in [40] that the variance in the estimates of the transition probabilities of Lemma 1 scales as $1/N$, which is the same as the scaling of OPE, except with a constant multiplicative factor $K$. We obtain the estimates $\hat{p}_{n,m}$ of the channel parameters $p_{n,m}$ by multiplying the estimates of the transition probabilities with a matrix which is independent of $N$. Therefore, we obtain the same scaling in the estimates $\hat{p}_{n,m}$, i.e., $K/N$. Before moving to the numerical examples and comparison section, we provide an expository example of DPEPC for the $d = 2$ DWC. This example not only serves the purpose of exposition but also highlights the salient features of the DPEPC for DWCs.

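Example 1 (DPEPC for the qubit DWC). We have $K = 3$, $W_{\mathrm{idx}} = \{(1,0), (0,1), (1,1)\}$, and
$$A^2_3 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}.$$
The probe states are
$$|0_{1,0}\rangle = |0\rangle, \qquad |0_{0,1}\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad \text{and} \qquad |0_{1,1}\rangle = \frac{|0\rangle + i|1\rangle}{\sqrt{2}}.$$
The corresponding measurement settings are the projective measurements in $B_{n_i,m_i}$. The estimate $\hat{\lambda}^{n_i,m_i}_{\ell}$ is the relative frequency of outcome $|\ell_{n_i,m_i}\rangle$ when $|0_{n_i,m_i}\rangle$ was input to the channel. Finally, an estimate of the channel parameters is obtained via (7).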
We stress that once a sufficient A d K is constructed for a given d, that A d K can be utilized for all the subsequent DPEPC experiments for all DWCs operating on H d . This also fixes the measurement configurations and the pseudo-inverse of A d K appearing on the right side of (7) for this d. Therefore, the DPEPC for DWCs does not involve experiment design, matrix inversion, or optimization of any kind. The DPEPC protocol for DWCs is then to simply perform measurements in K pre-defined measurement configurations and plug-in the frequencies of measurement results in b d K to directly obtain the channel parameters x.

Quantum Error Mitigation for DPEPC
The protocol proposed in the previous subsection relies on the ability to sufficiently isolate the prepared probe states such that the only noisy evolution they go through is the noisy channel under study. However, this isolation might not be possible in practice. An unintended noisy evolution might occur anywhere from preparation to the final measurement. Such a scenario is shown in Figure 4, where an unintended noise $\mathcal{N}_{\mathrm{un}}$ may corrupt the probe states. In the following, we show that the errors caused by this unintended noise can be mitigated if it is of Pauli form. Specifically, we show that the framework of measurement error mitigation [41] can be utilized to mitigate the errors caused by generalized Pauli channels. Let us first assume that $\mathcal{N}_{\mathrm{un}}$ is also of Pauli form. Then, we have the following convenient result.
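Lemma 2. Any two DWCs commute, i.e., $\mathcal{N}_{\mathrm{un}} \circ \mathcal{N}_{\mathrm{dwc}} = \mathcal{N}_{\mathrm{dwc}} \circ \mathcal{N}_{\mathrm{un}}$.

Proof. Let $p_{n,m}$ and $q_{r,s}$ be the parameters of $\mathcal{N}_{\mathrm{dwc}}$ and $\mathcal{N}_{\mathrm{un}}$, respectively. Then,
$$\mathcal{N}_{\mathrm{un}} \circ \mathcal{N}_{\mathrm{dwc}}(\rho) = \sum_{r,s} q_{r,s}\, W_{r,s} \Big( \sum_{n,m} p_{n,m}\, W_{n,m}\, \rho\, W_{n,m}^{\dagger} \Big) W_{r,s}^{\dagger} \tag{11}$$
$$= \sum_{r,s} \sum_{n,m} q_{r,s}\, p_{n,m}\, W_{r,s} W_{n,m}\, \rho\, W_{n,m}^{\dagger} W_{r,s}^{\dagger} \tag{12}$$
$$= \sum_{n,m} \sum_{r,s} p_{n,m}\, q_{r,s}\, W_{n,m} W_{r,s}\, \rho\, W_{r,s}^{\dagger} W_{n,m}^{\dagger} = \mathcal{N}_{\mathrm{dwc}} \circ \mathcal{N}_{\mathrm{un}}(\rho). \tag{13}$$
Moving from (12) to (13), we changed the order of summation, the order of the product of $q_{r,s}$ and $p_{n,m}$, and also the order of the product of the Weyl operators by utilizing the commutation relation $W_{n,m} W_{r,s} = \omega^{rm-sn}\, W_{r,s} W_{n,m}$. The phase acquired on the left of $\rho$ is cancelled by its complex conjugate acquired on the right.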
The commutation of these two noisy channels allows us to model noise anywhere in the protocol by a single noisy process $\mathcal{N}_{\mathrm{un}}$ as long as its overall form is that of a Pauli channel. Furthermore, for ease of analysis, we can move $\mathcal{N}_{\mathrm{un}}$ to any point in the protocol before the measurement. However, it makes more sense to assume that $\mathcal{N}_{\mathrm{un}}$ acts only on the probe states before they leave Alice's laboratory. That is because all noisy evolution after leaving Alice's laboratory and before being measured by Bob is actually the noisy channel between Alice and Bob.
Then, Alice can execute the DPEPC locally in her laboratory to estimate the parameters of N un and send this information classically to Bob, who can utilize the measurement error mitigation framework as described below.
Errors caused by a faulty measurement device are termed measurement errors and are characterized by a column stochastic matrix $\Gamma$ [41]. Let us assume that we apply a projective measurement characterized by a set of projectors $\{\Pi_i\}_i$ on a quantum state $\rho$. The ideal probabilities of the measurement outcomes are given by a probability vector $P_{\mathrm{ideal}}$ whose $i$th element is $p_i = \mathrm{tr}(\Pi_i \rho)$. On the other hand, the probabilities of the measurement outcomes from a noisy measurement device characterized by $\Gamma$ are given by the probability vector $P_{\mathrm{noisy}} = \Gamma P_{\mathrm{ideal}}$. If the noise in the measurement device is known, i.e., if $\Gamma$ is known, an estimate of the ideal probabilities of the measurement outcomes can be obtained from the noisy measurement results by $P_{\mathrm{ideal}} = \Gamma^{-1} P_{\mathrm{noisy}}$ [41].
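As a small illustration of the inversion step, the sketch below applies a column stochastic matrix to an ideal distribution and recovers it again. The numerical values of `Gamma` and `p_ideal` are made up for illustration only.

```python
import numpy as np

# Hypothetical column-stochastic measurement-error matrix (columns sum to 1).
Gamma = np.array([[0.95, 0.10],
                  [0.05, 0.90]])

p_ideal = np.array([0.7, 0.3])        # ideal outcome probabilities
p_noisy = Gamma @ p_ideal             # what the faulty device reports

# Solving the linear system is equivalent to applying the inverse of Gamma
# and avoids forming the inverse explicitly.
p_recovered = np.linalg.solve(Gamma, p_noisy)

print(p_noisy)
print(p_recovered)
```

With finite statistics, `p_noisy` is itself an estimate, so the recovered vector can leave the probability simplex; this is the situation handled later by the correction step.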
By virtue of Lemma 2, we can assume $\mathcal{N}_{\mathrm{un}}$ to act just before the measurement. Let $\rho$ be the state before $\mathcal{N}_{\mathrm{un}}$ and the final measurement be in $B_{n,m}$. We can decompose $\rho$ in $B_{n,m}$ as
$$\rho = \sum_{i,j} \alpha_{i,j}\, |i_{n,m}\rangle\langle j_{n,m}|, \tag{14}$$
where $\alpha_{i,i}$ is the probability of obtaining the measurement outcome corresponding to $|i_{n,m}\rangle$ when measuring $\rho$. We can write the state after $\mathcal{N}_{\mathrm{un}}$ as
$$\mathcal{N}(\rho) = \sum_{i} \alpha_{i,i}\, \mathcal{N}\big(|i_{n,m}\rangle\langle i_{n,m}|\big) + \sum_{i \neq j} \alpha_{i,j}\, \mathcal{N}\big(|i_{n,m}\rangle\langle j_{n,m}|\big), \tag{15}$$
where we have dropped the subscript of $\mathcal{N}_{\mathrm{un}}$ for simplicity. The nondiagonal part of $\rho$ in the basis $B_{n,m}$, i.e., $\sum_{i \neq j} \alpha_{i,j}\, |i_{n,m}\rangle\langle j_{n,m}|$, remains nondiagonal after the application of $\mathcal{N}_{\mathrm{un}}$ and does not contribute to the final measurement in $B_{n,m}$ [39]. Therefore, we only need to consider the $\mathcal{N}(|i_{n,m}\rangle\langle i_{n,m}|)$ terms. Furthermore, the effect of a DWC on the eigenstates of any $W_{n,m}$ followed by a measurement in $B_{n,m}$ can be modeled by a classical symmetric channel [39], which in turn is characterized by a doubly stochastic matrix. Therefore,
$$\langle i_{n,m}|\, \mathcal{N}(\rho)\, |i_{n,m}\rangle = \beta_{i,i}, \tag{16}$$
where $\beta = \Lambda_{n,m} \alpha$, with $\alpha$, $\beta$, and $\Lambda_{n,m}$ being the vector of $\alpha_{i,i}$'s, the vector of $\beta_{i,i}$'s, and the doubly stochastic matrix characterizing the effect of the classical symmetric channel induced on $B_{n,m}$, respectively. In the case of a noiseless measurement device but with channel noise $\mathcal{N}_{\mathrm{un}}$ of Pauli form present, we record the measurement probabilities $\beta$. However, if $\mathcal{N}_{\mathrm{un}}$ is known, we can simply estimate the ideal probabilities as $\alpha = \Lambda_{n,m}^{-1} \beta$. In the case of a noisy measurement device as well as channel noise $\mathcal{N}_{\mathrm{un}}$ of Pauli form, we record the measurement probabilities $\gamma = \Gamma \beta = \Gamma \Lambda_{n,m} \alpha$. Since the product of a column (left) stochastic matrix and a doubly stochastic matrix is another column stochastic matrix, we are still operating in the framework of measurement errors, and can perform the error mitigation just as easily by inverting the product matrix. Therefore, we can mitigate the errors caused by the noisy measurement device as well as by Pauli noise in the system in a unified manner.
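The unified treatment can be sketched numerically. The example below assumes that the classical symmetric channel maps input $i$ to output $i + \ell \bmod d$ with probability $\lambda_{\ell}$, so that the induced matrix is circulant; the helper name `symmetric_channel_matrix` and all numerical values are illustrative.

```python
import numpy as np

def symmetric_channel_matrix(lam):
    """Circulant doubly stochastic matrix with entry (j, i) = lam[(j - i) mod d],
    i.e., the probability of outcome j given input i."""
    d = len(lam)
    return np.array([[lam[(j - i) % d] for i in range(d)] for j in range(d)])

lam = np.array([0.8, 0.15, 0.05])          # transition probabilities of N_un
Lam = symmetric_channel_matrix(lam)

Gamma = np.array([[0.90, 0.05, 0.00],      # hypothetical column-stochastic
                  [0.10, 0.90, 0.10],      # measurement-error matrix
                  [0.00, 0.05, 0.90]])

alpha = np.array([0.5, 0.3, 0.2])          # ideal outcome probabilities
observed = Gamma @ Lam @ alpha             # Pauli noise, then device noise

combined = Gamma @ Lam                     # still column stochastic
recovered = np.linalg.solve(combined, observed)

print(combined.sum(axis=0))
print(recovered)
```

Because the combined matrix is again column stochastic, a single linear solve handles both noise sources at once.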
Before moving to the numerical examples section, we remark that the only assumptions we made are the channel noise to be of Pauli form and the final measurement to be in the eigenbasis of some Pauli operator. These assumptions are not too demanding given the general nature of Pauli channels and the importance of Pauli measurements. Examples include the current protocol, quantum state tomography tasks [42,43], variational quantum algorithms [44], and other quantum information processing tasks [45], where Pauli measurements have the central role. Therefore, this modeling of Pauli noise in the framework of measurement errors and measurement error mitigation can be of independent interest beyond the protocol at hand.

Numerical Examples
In this section, we provide numerical examples of DPEPC and compare its performance with the entanglement-based optimal parameter estimation (OPE) method of [26] shown in Figure 3. The channel parameters were the eigenvalues of the $d^2 \times d^2$ exponential correlation matrix [46]
$$\Phi = \frac{1}{d^2} \Big[ \gamma^{|i-j|} \Big]_{i,j = 1, \cdots, d^2}, \qquad \gamma \in [0, 1]. \tag{17}$$
We recall that $\gamma = 0$ in (17) gives the completely depolarizing (highly noisy) channel, and $\gamma = 1$ gives an ideal (noiseless) DWC. Furthermore, increasing $\gamma$ makes the channel parameters more ordered in terms of majorization, giving less noisy channels [40,46]. Also, we assume the unintended channel to be the depolarizing channel parameterized by a real parameter $\kappa$,
$$\mathcal{N}_{\mathrm{un}}(\rho) = (1 - \kappa)\, \rho + \kappa\, \frac{I}{d}, \tag{18}$$
where $\kappa = 0$ and $\kappa = 1$ correspond to the noiseless and the fully depolarizing channels, respectively.
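A possible way to generate such channel parameters is sketched below. It assumes that the exponential correlation matrix has entries $\gamma^{|i-j|}$ and that its eigenvalues are normalized to sum to one; the normalization, the function name `dwc_parameters`, and the (unspecified) assignment of eigenvalues to the indices $(n, m)$ are assumptions made for illustration.

```python
import numpy as np

def dwc_parameters(gamma, d):
    """Eigenvalues of the d^2 x d^2 matrix [gamma^|i-j|], normalized to a
    probability vector (normalization is an assumption of this sketch)."""
    D = d * d
    idx = np.arange(D)
    phi = float(gamma) ** np.abs(idx[:, None] - idx[None, :])
    eig = np.linalg.eigvalsh(phi)          # symmetric matrix: real spectrum
    eig = np.clip(eig, 0.0, None)          # remove tiny negative round-off
    return eig / eig.sum()

p_noisy = dwc_parameters(0.0, 3)   # identity matrix: uniform parameters
p_mid = dwc_parameters(0.7, 3)
p_ideal = dwc_parameters(1.0, 3)   # all-ones matrix: one nonzero parameter

print(np.round(p_mid, 4))
```

The two limiting cases reproduce the behaviour stated in the text: $\gamma = 0$ gives the uniform parameter vector of the completely depolarizing channel, and $\gamma = 1$ gives a single parameter equal to one.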
The performance metrics we use in our numerical examples are the variance and the mean square error (MSE) of the estimates. A natural performance metric for process tomography, channel estimation, and channel distinguishing problems is the diamond norm distance. However, we provide the results in the main text in terms of the variance and the MSE for the following reasons. 1) We are dealing with a parametric class of channels where the channel structure is fixed. Then, the problem essentially boils down to a parameter estimation problem. In these problems, the variance and the MSE are more natural performance metrics. 2) Together, the MSE and the variance of the estimates provide more information. For example, we can easily observe a bias in our estimates if the MSE and the variance are not equal. For interested readers, we also provide all our numerical results in terms of the diamond norm distance in the Appendix of this paper. Figure 5 shows the performance comparison of DPEPC and OPE of DWCs for $d = 5, 6, 7,$ and $8$, and $\gamma = 0.7$ with different noise strengths $\kappa$. We plot the variance/MSE against the number of channel uses $N$. Blue (resp. red) solid lines show the variance of DPEPC (resp. OPE), which coincides with the MSE in the noiseless ($\kappa = 0$) case. These values of the variance and the MSE are summed over all parameters $p_{n,m}$ of the channels. Note that the two variance lines are parallel, depicting the same $1/N$ scaling of the variance as a function of the number of channel uses $N$. The separation between the two lines is the multiplicative factor in the scaling and is a function of $K$, the number of measurement configurations we need to uniquely identify all channel parameters. This separation also shows the tradeoff between entanglement-assisted and entanglement-free schemes. By avoiding the use of entanglement in our scheme for the sake of experimental feasibility, we need to utilize more experimental configurations and perform our experiment more times (by a constant factor, independent of $N$) to obtain the same performance.
The performance of OPE for different values of $d$ looks very similar in Figure 5. This seems counterintuitive but can be easily explained as follows. The measurement outcomes in the OPE follow a multinomial distribution where the probability of obtaining measurement outcome $i$ is $p_i$. Let $X_i$ be the random variable characterizing the number of times event $i$ is observed in $N$ trials. Then, the variance is $\mathrm{Var}\{X_i\} = N p_i (1 - p_i)$. Since we use the maximum likelihood estimator, i.e., $\hat{p}_i = X_i/N$, its variance is
$$\mathrm{Var}\{\hat{p}_i\} = \frac{p_i (1 - p_i)}{N}.$$
Summed over all outcomes, the total variance is $\big(1 - \sum_i p_i^2\big)/N$, which is at most $1/N$ regardless of $d$. The total variance of DPEPC can be calculated as $\mathrm{tr}(\Sigma_x)$, where the covariance matrix
$$\Sigma_x = B^d_K\, \Sigma_b\, \big(B^d_K\big)^T,$$
with $\Sigma_b$ denoting the covariance matrix of $\hat{b}^d_K$. We can see the aforementioned effect in the variance of DPEPC when $K$ is the same, e.g., for $d = 6$ and $8$. On the other hand, an increase ($d = 5$, $K = 6$ to $d = 6$, $K = 12$) or decrease ($d = 6$, $K = 12$ to $d = 7$, $K = 8$) in $K$ affects the performance as expected.
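Both total variances can be evaluated in closed form for small examples. The sketch below assumes that the $K$ measurement configurations are statistically independent, each with $N/K$ multinomial samples, so that the covariance of the frequency vector is block diagonal; the function names are hypothetical.

```python
import numpy as np

def block(n, m, d):
    """d x d^2 binary block A_{n,m}: entry (l, k) is 1 iff f(k; n, m) = l."""
    A = np.zeros((d, d * d), dtype=int)
    for k in range(d * d):
        a, b = k % d, k // d
        A[(m * a - n * b) % d, k] = 1
    return A

def ope_total_variance(p, n_uses):
    """Sum over outcomes of p_i (1 - p_i) / N for the multinomial ML estimator."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)) / n_uses)

def dpepc_total_variance(x, w_idx, d, n_uses):
    """Trace of B cov_b B^T with a block-diagonal multinomial covariance."""
    x = np.asarray(x, dtype=float)
    K = len(w_idx)
    shots = n_uses / K
    A = np.vstack([block(n, m, d) for n, m in w_idx])
    B = np.linalg.inv(A.T @ A) @ A.T
    cov_b = np.zeros((K * d, K * d))
    for i, (n, m) in enumerate(w_idx):
        lam = block(n, m, d) @ x
        cov = (np.diag(lam) - np.outer(lam, lam)) / shots
        cov_b[i * d:(i + 1) * d, i * d:(i + 1) * d] = cov
    return float(np.trace(B @ cov_b @ B.T))

x = np.full(4, 0.25)                       # uniform qubit DWC parameters
w_idx = [(1, 0), (0, 1), (1, 1)]
v_ope = ope_total_variance(x, 1200)
v_dpepc = dpepc_total_variance(x, w_idx, 2, 1200)
print(v_ope, v_dpepc, v_dpepc / v_ope)
```

For this particular uniform qubit example the ratio of the two total variances equals $K = 3$; for other parameter vectors the ratio need not be exactly $K$.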
Dashed lines in Figure 5 show the performance of DPEPC and OPE when the initial probe states are subject to depolarizing noise of strength $\kappa$. It can be seen that the stronger the noise, the earlier the MSE departs from the variance of the estimators and becomes independent of $N$. It is natural to think that the noisier the original DWC whose description we are trying to obtain, the higher the tolerance to the depolarizing noise. For example, if the original DWC is completely depolarizing, the MSE will improve with increasing $N$ indefinitely, regardless of the depolarizing noise strength $\kappa$. Figure 6 confirms this intuition, where we plot the MSE of DPEPC as a function of $\gamma$ and $\kappa$. We recall that $\gamma = 0$ gives a completely depolarizing channel and $\gamma = 1$ gives a noiseless channel. From the figure, it is clear that the noise on the probes has a minimal effect on the MSE performance for smaller ($\lesssim 0.2$) values of $\gamma$. On the other hand, the closer the channel under study is to the ideal one, the more it is affected by the unintended noise of strength $\kappa$.
The effect of depolarizing noise on the initial probe states can be interpreted as a noise strength-dependent saturation point on the number of channel uses $N$, such that increasing $N$ does not improve the estimation MSE beyond the saturation point. This saturation behaviour is typical of a biased estimator, i.e.,
$$\mathrm{MSE} = \mathrm{Var} + \nu^2,$$
where $\nu \neq 0$ is the bias in the estimates. If the strength of the noise on the initial probe states is known, it is possible to avoid the saturation behaviour seen in Figures 5 and 6 by utilizing the measurement error mitigation framework. For the depolarizing channel, we can achieve this by simply setting
$$\tilde{p}_{n,m} = \frac{\hat{p}_{n,m} - \kappa/d^2}{1 - \kappa}, \tag{19}$$
where $\tilde{p}_{n,m}$ is the new (bias mitigated) estimate of $p_{n,m}$. Note that since the depolarizing channel is a special case of the Pauli channel, (19) is a special case of the measurement error mitigation by matrix inversion, which was simplified due to the high symmetry of the depolarizing channel. Figure 7 shows the effect of error mitigation in the DPEPC estimates of a DWC in $d = 27$ with $\gamma = 0.7$, and $\kappa = 0.1$ and $0.9$. It can be noted that the error mitigation introduces a $\kappa$-dependent multiplicative factor in the scaling against the number of channel uses $N$. This is because for large $\kappa$, the contribution from the uniform distribution dominates in (18), reducing the information about the original distribution in the measurement outcomes. Since we are utilizing maximum likelihood estimates, it is possible to obtain incorrect channel parameters, i.e., negative elements and a parameter sum not equal to one. The effect of these incorrect parameter ranges is enhanced by the error mitigation. In such cases, we set the negative parameter values to 0 and normalize the error mitigated distribution, i.e., we project the obtained vector onto the probability simplex. We call this process correction and plot the MSE performance of error mitigation both with and without correction. It can be seen that this correction significantly improves the MSE performance of the bias mitigated estimates.
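The bias mitigation and the correction step can be sketched as follows. The sketch assumes the depolarizing noise maps the parameter vector $p$ to $(1-\kappa)\,p + \kappa/d^2$, and implements the correction exactly as described in the text (clipping negative entries and renormalizing); the function names and numerical values are illustrative.

```python
import numpy as np

def mitigate_depolarizing(p_hat, kappa, d):
    """Undo the mixing with the uniform distribution; requires kappa < 1."""
    p_hat = np.asarray(p_hat, dtype=float)
    return (p_hat - kappa / d**2) / (1.0 - kappa)

def correct(p):
    """Set negative entries to zero and renormalize to unit sum."""
    p = np.clip(np.asarray(p, dtype=float), 0.0, None)
    return p / p.sum()

p_true = np.array([0.7, 0.1, 0.15, 0.05])
kappa, d = 0.3, 2
p_biased = (1.0 - kappa) * p_true + kappa / d**2   # what noisy probes yield
p_mitigated = mitigate_depolarizing(p_biased, kappa, d)

print(p_mitigated)
print(correct([-0.1, 0.6, 0.5]))
```

The division by $1 - \kappa$ amplifies statistical fluctuations in the estimates, which is one way to see where a $\kappa$-dependent factor in the error scaling can come from.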
Finally, the main ingredient of our DPEPC for DWCs is Lemma 1, which has been generalized to the generalized Pauli channels in [47]. From discussion in [47], it is straightforward to generalize the DPEPC for other generalizations of Pauli channels.

Conclusions
We have presented a process tomography/parameter estimation scheme for DWCs, which can be extended to the other definitions of generalized Pauli channels. The proposed method operates with separable probe states, yet provides the same MSE scaling against the number of channel uses as that of the entanglement-based parameter estimation scheme. Numerical examples show that the number of measurement configurations $K$ scales linearly with the dimension of the Hilbert space. We also showed that the framework of measurement error mitigation can be useful in systems with Pauli noise and Pauli measurements. In particular, we exemplified the depolarizing noise on the probe states of both DPEPC and OPE and mitigated the consequent errors by utilizing measurement error mitigation. Future directions may include analytical results on $K$, the number of measurement configurations required in $\mathcal{H}_d$. Furthermore, the DPEPC and the OPE are clearly the extreme points in terms of utilizing entanglement in the parameterized channel estimation task. It needs to be investigated whether there exist intermediate schemes where limited entanglement may be utilized to access the tradeoff between the entanglement and the number of channel uses/required number of experimental configurations. Another possible future direction is to utilize DPEPC as a first step in identifying an unknown channel and then to use the results of DPEPC in a second step to completely identify the unknown channel. This direction can be particularly interesting due to the low resource requirements and fast converging behaviour of DPEPC.

A Diamond Norm Distance Performance
The diamond norm distance is a natural distance between two quantum channels. For the sake of completeness and for interested readers, here we reproduce all numerical examples figures of main text with diamond norm distance as the performance metric.
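The diamond norm distance between two quantum channels $\mathcal{N}_1$ and $\mathcal{N}_2$ is given by [23,48]
$$\| \mathcal{N}_1 - \mathcal{N}_2 \|_{\diamond} = \max_{\rho} \big\| (\mathcal{I}_d \otimes \mathcal{N}_1)(\rho) - (\mathcal{I}_d \otimes \mathcal{N}_2)(\rho) \big\|_1, \tag{S1}$$
where $\mathcal{I}_d$ is the identity map on $\mathcal{H}_d$, and $\|X\|_1 = \mathrm{tr}\sqrt{X^{\dagger} X}$.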
The diamond norm distance naturally captures the notion of distinguishability of two quantum channels [49]. The diamond norm distance can be formulated as a semidefinite convex program and thus can be computed efficiently in the problem size [50,51]. Despite its efficiency of computation as a convex program, it is difficult to employ semidefinite programming in the present manuscript since we have examples where $d$ is as large as 27. Consequently, the problem size of the optimization in (S1) is $d^2 = 729$, which is not easy to solve on a personal computer. Furthermore, for obtaining good quality numerical examples, averaging over several samples at each point is required. For these reasons, utilizing convex programming for providing the numerical examples of this paper is difficult.
Fortunately, for the case of Pauli channels, an exact analytical expression for the diamond norm distance can be obtained [49,52]. Let $\mathcal{N}_1$ and $\mathcal{N}_2$ be two Pauli channels of arbitrary finite dimension with parameter sets $\{p_{n,m}\}$ and $\{q_{n,m}\}$; then [49,52]
$$\| \mathcal{N}_1 - \mathcal{N}_2 \|_{\diamond} = \sum_{n,m} | p_{n,m} - q_{n,m} |. \tag{S2}$$
We use (S2) to calculate the diamond norm distance between the actual and estimated Pauli channels. We replicate all the numerical example figures from the main text, i.e., Figs. 5, 6, and 7, as Figs. S5, S6, and S7, respectively. These supplementary figures provide the same qualitative insights as the main text numerical examples, except for Figure S5.
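Since the closed form reduces to an $\ell_1$ distance between parameter vectors, it can be evaluated directly without any optimization. A minimal sketch follows, assuming the unnormalized convention without a factor of $1/2$; the function name `pauli_diamond_distance` and the example vectors are illustrative.

```python
def pauli_diamond_distance(p, q):
    """l1 distance between the parameter vectors of two Pauli channels
    acting on the same dimension (no 1/2 normalization assumed)."""
    if len(p) != len(q):
        raise ValueError("parameter vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))

p_actual = [0.7, 0.1, 0.15, 0.05]      # illustrative qubit DWC parameters
p_uniform = [0.25, 0.25, 0.25, 0.25]   # completely depolarizing channel

print(pauli_diamond_distance(p_actual, p_uniform))
```

The cost is linear in the number of parameters $d^2$, in contrast to the semidefinite program needed for general channels.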
In Figure S5, we note that an increase in $d$ makes a slightly more perceptible difference as compared to what we noticed in Figure 5. This is because (S2) is actually the $\ell_1$ norm distance between the actual and the estimated distribution and is of order $\approx \sum_i \sqrt{p_i (1 - p_i)/N}$ for maximum likelihood estimates of the distribution $\{p_i\}$ [53]. The term $\sum_i \sqrt{p_i (1 - p_i)} = 4.07, 4.93, 5.79,$ and $6.63$ for $d = 5, 6, 7,$ and $8$, respectively, for the considered examples with $\gamma = 0.7$. For uniform sampling from the probability simplex, this term is $\approx 4.32, 5.22, 6.12,$ and $7.02$ for $d = 5, 6, 7,$ and $8$, respectively. Due to this slightly larger increase in these values with increasing $d$, the dependence on $d$ in Figure S5 is slightly more perceptible than in the variance in Figure 5 of the main text.