Projected Least-Squares Quantum Process Tomography

We propose and investigate a new method of quantum process tomography (QPT) which we call projected least squares (PLS). In short, PLS consists of ﬁrst computing the least-squares estimator of the Choi matrix of an unknown channel, and subsequently projecting it onto the convex set of Choi matrices. We consider four experimental setups including direct QPT with Pauli eigenvectors as input and Pauli measurements, and ancilla-assisted QPT with mutually unbiased bases (MUB) measurements. In each case, we provide a closed form solution for the least-squares estimator of the Choi matrix. We propose a novel, two-step method for projecting these estimators onto the set of matrices representing physical quantum channels, and a fast numerical implementation in the form of the hyperplane intersection projection algorithm. We provide rigorous, non-asymptotic concentration bounds, sampling complexities and conﬁdence regions for the Frobenius and trace-norm error of the estimators. For the Frobenius error, the bounds are linear in the rank of the Choi matrix, and for low ranks, they improve the error rates of the least squares estimator by a factor d 2 , where d is the system dimension. We illustrate the method with numerical experiments involving channels on systems with up to 7 qubits, and ﬁnd that PLS has highly competitive accuracy and computational tractability.


Introduction
Quantum process tomography (QPT) -the task of estimating an unknown quantum transformation completely from measurement data -is a powerful, but resource-intensive, tool for quantum technology. It has been applied in a variety of experimental contexts [1,2,3,4,5,6]. However, the ongoing growth in size of quantum hardware, such as quantum circuits and simulators, motivates the design of QPT techniques that use relevant resources as sparingly as possible.
The earliest QPT proposals [7,8] used linear inversion techniques to reconstruct quantum channels from informationally complete datasets obtained by feeding different (known) input states into the process and performing full quantum state tomography on the resulting outputs. Later, it was realised that by applying the channel-state duality uncovered by Choi [9] and Jamiołkowski [10], one could perform QPT by simply applying the channel to a maximally entangled state on the system and an ancilla, and using quantum state tomography to reconstruct the channel's Choi matrix directly [11,12]. Other statistical methods developed in the context of state estimation have been adapted for QPT, including maximum-likelihood [13,14,15,16] and Bayesian inference [17,18,19,20] techniques. In practical applications, point estimators of the quantum process need to be accompanied by error bars [21], hence the importance of developing confidence regions methodology [22,23,24].
A major challenge in current applications is the experimental and computational cost of estimating high dimensional systems. The 'low-rank' paradigm developed in the context of compressed sensing [25,26,27] partly addresses this challenge by reducing the number of measurements required to estimate channels with a small number of Kraus operators, or equivalently, a Choi matrix that is low-rank. Other types of latent structures, such as matrix product states with small bond dimension, have been exploited in quantum tomography [28,29,30] and more recently extended to quantum state and process tomography in conjunction with machine learning [31,32]. Techniques like randomized benchmarking can reliably certify unitary channels with limited resource consumption and a robustness against state preparation and measurement errors [33,34]. An alternative approach is to estimate the fidelity with respect to a known target process [35,36].
In this work we go back to the traditional 'full-tomography' problem and propose a computationally effective and statistically tractable QPT method, based on an extension of the projected least-squares (PLS) method of state estimation proposed in [37]. The method is particularly effective in estimating low rank channels as a proxy for noisy unitary transformations realised in near-term quantum architectures. We provide theoretical concentration bounds for the Frobenius and trace-norm errors and and present simulation results for channels with up to 7 qubits (this corresponds to a Choi matrix with 2 28 ≈ 2.684 × 10 8 degrees of freedom).
Let us consider an unknown channel C : M (C d ) → M (C d ), and represent this by its Choi matrix Φ = C ⊗ I d (Ω), where Ω is a maximally entangled state on C d ⊗ C d . We gather data by repeatedly measuring either the output of the channel with known inputs (direct QPT), or the Choi matrix as output of C ⊗ I d (ancilla-assisted QPT). The PLS estimatorΦ P LS is obtained by first computing the least squares estimatorΦ LS of Φ based on the measurement data, and then projecting this on the convex space of Choi matrices, the latter being the intersection between the cone of positive matrices CP and the hyperplane T P determined by the linear constraints Tr s (Φ) = 1 d /d. We note that the idea of projecting the least-squares estimator of a Choi matrix onto the intersection of CP and T P has been previously considered in [38]. However, this differs in several respects to our implementation: we consider specific choices of input states and measurement settings for which we provide explicit expressions for the LS estimator, and more significantly, we use a different projection algorithm. These result in a protocol of significantly lower computational complexity for which we are able to provide precise statistical convergence results.
We implement the projection using several methods including Dykstra's algorithm [39] and our own hyperplane intersection projection (HIP). Dykstra's algorithm involves an iteration of alternating projections onto CP and T P with certain adjustments that make sure that the limit is the closest point with respect to the Frobenius distance. While the projection onto T P is fast, the projection onto the cone CP requires matrix diagonalisation and is the slowest subroutine of the algorithm. The HIP algorithm also alternates between those projections, but uses them to approximate CP and directly compute the projection on the intersection between this approximation and T P. This vastly decreases the required number of iterations and leads to a significant speed-up in the computation time, while the statistical errors remain comparable to those of PLS using Dykstra's algorithm. To illustrate the versatility of PLS we estimated a 7-qubit noisy version of the quantum Fourier transform, with Pauli measurements (cf. Section 3.1 for details of the experimental setup). The channel is obtained by performing the quantum Fourier transform and then measuring the first qubit in the z direction with probability 1/4. Figure 1 contains a summary of the results, showing increasing accuracy in trace-norm error, as well as eigenvalues and eigenvectors estimators, with sample size. While sample sizes of order 10 12 are prohibitive for experiments, we note that important features become visible for more moderate sample sizes of around 10 8 . For a rank-one channel, 10 7 would arguably be enough, and is significantly less than the number of measurement settings.
The proposed methods are accompanied by mathematically rigorous performance guarantees. They are summarised in Theorem 1 and Corollary 1 which establish concentration bounds for estimators resulting from four possible experimental scenarios detailed in Section 3. In particular, we show that the number of samples required to reconstruct a rank r Choi matrix to accuracy with respect the the Frobenius distance is rd 2 / 2 (for trace-norm distance it becomes r 2 d 2 / 2 ). This shows that that PLS is particularly well-suited for estimating approximately unitary channels (r ≈ 1), for which it achieves error reduction by a factor d 2 compared to the ordinary LS estimator. In Theorem 2 we provide a recipe for constructing confidence regions without any prior assumption about the Kraus rank r. Roadmap Section 2 summarises important background, in particular the LS estimator. Section 3 describes the 4 experimental procedures we analyse. Each procedure concerns either direct or ancilla-assisted QPT and involves either Pauli measurements or mutually unbiased bases measurements. Explicit expressions for each LS estimator are also provided. Section 4 introduces the PLS estimator and its different implementations. Section 5 contains the main theoretical results described above. Section 6 details the numerical algorithms and present numerical results comparing different projection methods, the dependence of errors on rank, and that of the computation time on dimension. Detailed proofs of concentration bounds for LS and PLS estimators can be found in the appendices.

Background
In quantum mechanics, the state of a d-dimensional quantum system is represented by a density matrix, i.e. a positive operator ρ ∈ M (C d ) of trace one (Tr(ρ) = 1). This state characterizes all properties and correlations of the underlying quantum system, but is not directly accessible. One can gain information about it by measuring the system. A measurement with a finite number of outcomes, say {1, 2, . . . , s}, is described by a positive operator valued measure (POVM). This is a set of positive operators {M i } s i=1 on C d , such that i M i = 1 d , where 1 d is the identity operator. The measurement outcome is random and the underlying probability distribution p is given by Born's rule: In quantum state tomography (QST) one aims to estimate an unknown state ρ by measuring a (large) ensemble of independent state copies prepared in the state ρ. Among various measurement and estimation techniques described in the literature [40], the projected least squares (PLS) method [41,37] is particularly suitable for estimating states which are well approximated by low rank operators. In a nutshell, PLS combines an initial 'quick and dirty' least squares estimation step with a 'noise reduction' step where the unphysical estimator is projected onto the space of quantum states. For low-rank states, the projection step reduces the estimation error by a factor of order d, and PLS was shown to achieve optimal error rates when the measurements are chosen from a sufficiently generic ensemble (2-design).
In this paper we address the related problem of quantum process tomography (QPT). Rather than estimating a quantum state, we want to estimate a physical evolution, such as that implemented by a noisy quantum circuit. Such a transformation is described by a quantum channel, i.e. a completely positive, trace preserving map In QPT we assume that the channel is unknown, and we would like to estimate it by probing many identical realisations of C with known input states and performing measurements on the outputs.
A particularly useful representation of C is the Choi matrix [9]. It arises from maximally entangling the input intended for C with an equally-sized quantum memory, also called an ancilla.
is a maximally entangled state on two d-dimensional quantum systems. Here, {|q } denote the computational basis vectors in C d . By applying the channel C to the system and leaving the quantum memory untouched, we obtain the Choi matrix of the channel Here, I d denotes the identity operation on the quantum memory (do nothing). The Choi matrix describes a bipartite quantum state which satisfies the constraint Tr s (Φ) = 1 d /d, where Tr s/a denotes the partial trace over the system or ancilla respectively. Conversely, any positive operator satisfying the above constraint is the Choi matrix of a quantum channel. The action of the quantum channel C on a state ρ can be expressed in terms of its Choi matrix as where denotes the transpose. Quantum channel and Choi matrix are two mathematically equivalent descriptions of the underlying process. Moving from one to the other allows us to recast QPT as an instance of QST. Meaningful distance measures between channels, say C 1 and C 2 , can also be expressed in terms of differences between the Choi matrices Φ 1 and Φ 2 . This includes the squared Frobenius distance as well as trace distance The latter is related to the diamond distance between the underlying channels -arguably one of the strongest and most widely used distance measures for quantum channels, see e.g. [42,Proposition 50]. We restrict our attention to independent and sequential uses of the channel. This is an actual restriction, as there are powerful techniques for QST [43,44,45], as well as quantum enhanced metrology [46,47], which require the parallel application of the channel to different parts of a global quantum system. We do, however, investigate both ancilla-free and ancilla-assisted strategies.
In the first case the system is measured after an appropriate input state ρ has been passed through the channel C. In this case the probabilities associated to an output measurement with where the second equality uses Eq. (1). This procedure is illustrated in Figure 4, Section 3.2.
In the ancilla-assisted case, we input one half of a maximally entangled state Ω into the unknown channel, the other half is left unchanged in a quantum memory. This procedure is illustrated in Figure 3, Section 3.1 and produces an output quantum state described by the Choi matrix Φ. Subsequently, we perform quantum measurements. The outcome probabilities associated with A common feature of both setups described above is the existence of a positive linear transformation which maps the Choi matrix Φ into the probability vector of the measurement outcomes. That is, p = A(Φ) ∈ C s , where s counts the number of measurement outcomes. This follows from the fact that the measured states depend linearly on the channel which, in turn, is in a linear, one-to-one correspondence with its Choi matrix. After performing several independent preparation-measurement rounds, the data is collected as a vector of frequencies f ∈ R s , so that for for large sample sizes f ≈ p. The resulting estimation problem can be recast as linear regression where is the statistical noise due to finite sample size. The simplest estimator for Φ is the least squares (LS) estimator:Φ where A † is the adjoint of A. The minimum is taken over all matrices τ on C d ⊗C d with compatible dimension, and the loss function · is the 2 -norm.
The solution to Problem (2) is easy to compute and produces a self-adjoint matrix with unit trace. However,Φ LS is typically not a Choi matrix of a channel. A quantum channel must be trace-preserving and completely positive; properties which provide additional constraints on the estimators of the Choi matrix. To obtain estimators representing physical quantum channels, we propose to project the LS estimators onto the set of matrices that satisfy the given constraints. By Choi's theorem [9], the complete positivity condition for the quantum channel is equivalent to the positive semidefiniteness of the d 2 × d 2 Choi matrix. The convex set of positive-semidefinite matrices forms a pointed cone in the space of complex square matrices. We call this cone CP, because it contains matrices that represent completely positive maps. The trace-preserving property of the quantum channel is equivalent to a partial trace condition for the Choi matrix: Tr s Φ = 1 d /d. This provides a further affine constraint involving d 2 elements. Hence, the trace-preserving matrices lie on a flat plane of dimension d 4 − d 2 . We call this set T P.
The intersection of CP and T P forms the set of matrices that represent physical quantum channels. Accordingly, we call this intersection CPT P and refer to Figure 2 for a geometric illustration.
For concreteness, we focus on reconstructing physical evolutions of multi-qubit systems. That is, the unknown channel C acts on a quantum system comprised of k qubits, i.e. d = 2 k . This will allow us to discuss different scenarios depending on the choice of measurement and input states. We will focus on two types of measurement setups: single-qubit Pauli measurements and (global) mutually unbiased basis measurements. The different choices of experimental strategies (ancilafree or ancilla-assisted) and measurement setups (single-qubit Pauli or mutually unbiased bases) produce four scenarios which are discussed in detail in the following section. For each scenario, we give a closed form solution to Eq. (2). ). The set of trace preserving maps forms a hyperplane with codimension d 2 , whose intersection with the CP cone is the set of Choi matrices. We also show a representation of the projection of the LS estimator onto the set of quantum states, givingΦCP 1, followed by the projection onto CPT P, including mixing with 3 Proposed experimental procedures and least-squares estimators 3.1 Scenario 1: ancilla-assisted QPT with single-qubit Pauli measurements We first propose an ancilla-assisted protocol. The experimenter prepares many independent copies of the maximally entangled state Ω and acts with the quantum channel on the right subsystem consisting of k qubits, before performing state tomography on the collection of output states which are instances of the Choi matrix Φ (Figure 3).
Two schemes for generating measurement settings come to mind: a 'fixed' one and a 'random' one. In the fixed scheme, we cycle through all 3 2k possible measurement settings a total of ν times. There, sample size N transforms to ν3 2k cycles. In the random scheme, we choose measurement settings uniformly at random, and write ν = N/3 2k for the mean number of times each measurement setting is used. In the random scheme ν may be non-integer and strictly smaller than one.
In both cases, the frequency f s o is equal to the number of times the outcome o has occurred when measuring in setting s, divided by ν.
These frequencies are unbiased estimators of, and tend to in the limit of large N , the probabilities of observing outcome o when measuring with setting s, given by Born's rule: The formula of the least-squares estimator can be adapted from [37] which deals with QST from single-qubit Pauli measurements: Here, the |o i , s i are single-qubit eigenstates of a Pauli matrix, and 1 2 is the 2-dimensional identity operator.

Scenario 2: Direct QPT with single-qubit Pauli measurements
Compared to preparing a large number of maximally entangled states of 2k qubits, individual k-qubit (pure) product states are a much more tractable proposition. Our second proposed experimental setup, illustrated in Figure 4, takes this into account. The experimenter prepares a state from the set of pure product states {P a p } where a ∈ {x, y, z} k and p ∈ {0, 1} k , and denotes the transpose. This is achieved by initialising the qubits in the product state |0 = ⊗ k i=1 |0 and applying a unitary U ai pi to qubit i such that The quantum channel C is applied to the prepared state producing the output state C(P a p ). Using another product unitary U 2 as described in the previous section, the experimenter performs a Pauli measurement with measurement setting b ∈ {x, y, z} k , to obtain an outcome q ∈ {0, 1} k . Similarly to scenario 1, either we repeat the procedure a fixed number of times ν for each choice of indices a, b, p, so that the sample size is N = ν3 2k 2 k , or we choose the indices at random for each of the N samples and set ν = N/(3 2k 2 k ) for the mean number of times each setting is used. The q-th frequency entry {f ab pq } q counts the number of times outcome q is observed having measured the state C(P a p ) with setting b, divided by ν. These frequencies are unbiased estimators of the probabilities p ab pq = Tr(C(P a p )P b q ) and converge to them in the limit of large N .

Proposition 1. For direct QPT with single-qubit Pauli inputs and single-qubit Pauli measurements, the least-squares estimator for the Choi matrix Φ of a k-qubit quantum channel takes the following form:Φ
We refer to Appendix 8.1 for a proof.

Scenario 3: Ancilla-assisted QPT with mutually unbiased basis measurements
The set-up is similar to that of ancilla-assisted QPT with single-qubit Pauli measurements, though we now use the POVM consisting of a maximal set of mutually unbiased bases (MUB) [48,49] to measure instances of the Choi matrix. Such measurements require entangling gates to implement, but they also have a rich, global structure that turns out to produce sampling complexities that are essentially optimal [37]. A MUB POVM on 2k qubits consists of Each vector (range) |v k belongs to one of d 2 +1 orthonormal bases, and any pair belonging to different bases satisfies the unbiasedness condition | v i |v j | 2 = d −2 . The following 'near isotropy' property is shared with a larger class of 2-design POVMs, see e.g. [50,Lemma 8], and plays an important role in proving our results: for all operators A with compatible dimensions. Our proofs rely mainly on the near isotropy property and can therefore be readily generalized to other (approximate) 2-design constructions. Concrete examples are symmetric informationally complete (SIC) POVMs [51], complete sets of stabiliser states [52] and local random circuits of depth (order) k [53,54,55].
Let us now return to the actual protocol. The system-ancilla input state Ω is passed through the channel C ⊗ I d and produces a system characterized by the Choi matrix Φ as output. This system is then measured with a MUB POVM as described above. The frequency f i indicates the number of times outcome i is observed, divided by the total number of measurements N . These frequencies are unbiased estimators of, and tend to in the limit of large N , the probabilities As with single-qubit Pauli measurements, the expression of the least squares estimator is found in

Scenario 4: Direct QPT with mutually unbiased basis measurements
Our final experimental method is similar to direct QPT with single-qubit Pauli measurements, in that we do not use an ancillary system. However, we use a different POVM and a different set of input states. As input states we use the transposes of one dimensional projections |w k w k | onto vectors of a set of d + 1 MUBs. On the respective outputs, we measure a MUB POVM whose elements |v l v l | may be different from those of the input. As with the previous scenario, similar results can be obtained with other 2-designs.
Here, frequencies f k l indicate the number of times outcome l ∈ {1, . . . , m} is observed having measured the state C(|w k w k | ), divided by the (mean) number of repetitions ν = N m . These frequencies are unbiased estimators of the probabilities with M l = 1 d+1 |v l v l | and converge to them in the limit of large N .

Proposition 2. For direct QPT with (transposed) MUB inputs (|w i w i |)
and MUB measurements d m |v l v l |, the least-squares estimator for the Choi matrix Φ of a k-qubit quantum channel takes the following form: We refer to Appendix 8.2 for a proof.
We conclude this section by emphasizing that all estimators found from (2) are linear and unbiased, given that the frequencies observed are unbiased estimators of true probabilities. Hence those estimators discussed so far,Φ LS , are unbiased estimators of the underlying Choi matrix Φ.

Methods of projection
We turn our attention now to the task of finding our final estimator,Φ P LS ∈ CPT P, from the least-squares estimatorΦ LS . We do this in the following way (see Figure 2 for an illustration of the geometry): 1. ProjectΦ LS onto the set of CP matrices of trace one (the quantum states) -we call this set CP1 and the resulting estimatorΦ CP 1 .
For reference, the projection of a matrix M onto a non-empty closed convex set S with respect to a norm · α is defined by In order for our theoretical results to hold, the above two procedures need not constitute true projections. For our purposes an adequate 'projection' would be to find points in CP1 and CPT P respectively that satisfy the following two key properties: where Φ is the Choi matrix we are trying to estimate and here, and henceforth, · ∞ indicates the operator norm. While a direct, Frobenius projection ofΦ LS onto CPT P (e.g. using Dykstra's algorithm) may seem less involved than the above procedure, our method turns out to be more amenable to deriving a priori error bounds that are capable of resolving latent structure in the form of low rank. In addition, the advantage of working with more relaxed definitions is that we obtain 'projections' that are easier to compute numerically, and better-behaved in practice. For the rest of the paper we focus on the two-step method, but some results also hold for the direct projection method (see Appendix 12). We give the general description of steps 1 and 2 and point to the numerics Section 6 for the detailed algorithms.
Step 1. The projection onto CP1 is implemented by an eigenvalue thresholding algorithm (cf. Section 6.1 for the details and the proof of property (13)). This step constitutes a true projection.
Step 2. FindingΦ P LS fromΦ CP 1 is the problem of projecting a matrix onto the intersection of two convex sets (CP and T P) and is the most involved part of the PLS algorithm. We sketch 3 possible, iterative procedures, including our preferred HIP method, and refer to Section 6.2 for detailed algorithms and the proof of property (14). The key idea behind the contraction property ( (14)) is contained in the following lemma whose proof can be found a number of textbooks, e.g. [56], Prop 4.16.

Lemma 1. Let S be a closed, non-empty, convex subset of
be the projection onto S with respect to the Frobenius distance. Then the projection of a matrix A onto S is closer to every point B in S than A, i.e.
Using Lemma 1, one may construct a 'projection' of a matrix X onto CPT P, satisfying property (14), by repeatedly projecting X onto a set of convex sets containing CPT P. One such choice is to project X alternately onto T P and CP. We refer to this as alternating projections (AP). AP can be made into a true projection onto CPT P by adding a correction term orthogonal to CP after each CP projection -this is Dykstra's algorithm [39]. In this work, we define our own fast, generalised algorithm which makes use of Lemma 1 -called hyperplane intersection projection (HIP), a discussion of which is deferred to Section 6.2. Again, this does not constitute a true projection but satisfies property (14). We find this algorithm to be significantly faster than AP or Dykstra's algorithm, with continued convergence at large iterations (see Section 6.3.).
Whichever of the above iterative methods we chose, after a number of iterations we need to ensure thatΦ P LS exactly satisfies the conditions to be an element of CPT P. For this, we mix the current estimator with the maximally mixed state. Concretely, after near-convergence of the projection algorithm, we have an estimatorΦ P LS in T P but which may have very small negative eigenvalues. We call the largest magnitude of these λ min . For a system of dimension d, we thus then take our final estimator to beΦ where p is the solution to (1 − p)λ min + p/d 2 = 0. In this way, we ensure the estimator is positive-semidefinite while simultaneously preserving the partial trace condition. Mixing with the maximally mixed state adds an error proportional to λ min , hence this additional error can be made arbitrarily small by stopping the algorithm when λ min is sufficiently small in magnitude. The convergence rate of the algorithm is hence defined by how quickly λ min falls below the required threshold.
Closed form expressions for projections onto CP, T P and CP1 are required for the implementation of any protocol discussed in this work. These are provided and proven in Appendix 13.

Error bounds and sampling complexities
In this section we provide rigorous error bounds on the PLS estimator in the four experimental scenarios described in Section 3. We first state our results in terms of concetration bounds for the squared Frobenius norm distance Φ − Φ 2 2 and the trace norm (L 1 ) distance Φ − Φ 1 , and then discuss how such results can be used to construct confidence regions. A discussion of different channel distance measures, and how they compare to each other, can be found in [57], see also [58,59,60]. Theorem 1. Let C be a k-qubit channel and assume that its Choi matrix Φ has rank r. The squared Frobenius norm error of the estimatorsΦ P LS derived from the LS estimatorsΦ LS in scenarios 3.1, 3.2, 3.3, and 3.4 satisfy the following error bounds: for ∈ (0, 1), and, respectively, Here, g(k) is a function that depends on the number of qubits k and the type of measurement scenario: Scenario 3.4.

(18)
Theorem 1 shows that the PLS estimators can be brought arbitrarily close to the true Choi matrix, provided that we perform sufficiently many measurements. In particular, the Frobenius square error rates scale as O(r/ (N g(k))) up to logarithmic factors, and we note that scenarios 3.1 and 3.2 exhibit error bounds which are larger by a factor (3/2) 2k than those of scenarios 3.3 and 3.4. This is consistent with the fact that MUB measurements are more 'informative' than local Pauli measurements that necessarily factorize in to tensor products [61,62,63]. Another interesting observation is that the error rates for ancilla-assisted versus ancilla-free settings are the same in the case of Pauli measurements and differ only by a factor 2 in the case of MUB measurements.
The next corollary provides the sampling complexities derived from the above the theorem.
To achieve Frobenius accuracy one requires a sample size of Up to logarithmic factors, this results in sampling complexity These observations showcase that the PLS method for QPT yields sampling complexities which increase linearly for the Frobenius error (quadratically for norm-one error) with the rank of the channel, so that low rank channels have lower errors than full rank ones. This behavior is very similar to compressed sensing estimators that are designed to exploit (approximate) low rank [25,26,27]. In contrast, the LS estimator shows weak dependence on rank and for low rank states, its Frobenius error is O(d 2 ) larger that that of PLS, as shown in Corollary 2 in Appendix 9.5 .
Although the concentration bounds characterise the PLS error behaviour in terms of rank, it is unlikely that an experimenter knows the rank of the Choi matrix of the quantum process under investigation. Therefore, the bounds cannot be used to construct confidence regions unless the rank is known. We now state a more general result which allows us to define confidence regions without prior knowledge of the rank of Φ, but rather in terms of properties of the estimatorΦ CP 1 .
To state the result, we first formalise a notion of being close to rank r: Theorem 2 below provides confidence balls for the PLS estimator for both the Frobenius and the trace-norm distance, in terms of computable properties of the intermediary estimatorΦ CP 1 .

Theorem 2.
Let Φ be the Choi matrix of a channel andΦ P LS our estimator, generated with any 'projections' satisfying Properties (13) and (14). Suppose thatΦ CP 1 is δ-almost rank r for some (r, δ). Then, with probability (at least) 1 − , The trace distance error Φ P LS − Φ 1 is instead bounded by In both cases, g(k) has been defined in (18).
The theorem can be applied by choosing the pair (r, δ) which provides the tightest bound and constitutes the confidence ball. All the results of this section follow from Theorem 4 -proven in appendix 11.
We finally note that the constants given in Theorems 1 and 2 and Corollary 1 are likely to the pessimistic and the true error probabilities may be significantly smaller. However, the advantage is that the bounds hold exactly for any channel and any N . It is certainly possible to get tighter asymptotic bounds for large N .

Numerical experiments
In this section we outline the implementation and results of several computer simulations to illustrate the proposed methods.
All experiments presented in this section were run on a quad-core Intel Core i3-8100 with 16 GB memory (computation times given in Figure 8). Mainly because of memory requirements, the experiments on 7 qubits in Figure 1 were done on 64 cores of Intel Xeon E7-8890 with 512 GB memory, needing around three days for each.
The code is available on GitHub [64]. The repository also contains up-to-date information on the future article on the theoretical underpinnings and heuristics behind HIP [65].

Thresholded projection on trace-one CP operators
As described in Section 4, the first step of the PLS method is the projection ofΦ LS onto the space of quantum states CP1. This is implemented using the following algorithm which describes a general thresholded projection process.  5: if λ i ≤ τ then 6: µ i ← 0 7: else 8: if µ i ≥ 1 then 10:

12:
else 13: find j such that 14: The algorithm is separated into two parts. The first, up to and including line 8, is a thresholded projection onto CP, while the second (lines 9 to 17) ensures unit trace and completes the thresholded projection onto CP1. Note that performing the first part of the algorithm while setting τ = 0 produces the direct Frobenius projection onto CP (see appendix 13.2 for the proof). We choose, for projection onto CP 1, τ = −λ min (Φ LS ) (the sign-flipped, least eigenvalue ofΦ LS ), though we note that the output satisfies Property (13) regardless of the value of τ (Lemma 2 in the appendix 10).
For numeric implementations, we use the full eigenvalue decomposition (EVD) but acknowledge that iterative, randomised implementations or the use of partial EVDs can have a lower theoretical complexity with lower memory requirements -especially in the estimation of high-dimensional, low-rank processes.

Hyperplane intersection projection
We now move on to discuss the numerical implementation of the second step in the PLS projection, that ofΦ CP 1 onto CPT P. In order to solve this problem, we have developed a fast, generalised 'projection' algorithm to find a point in the intersection of an affine space and a convex set satisfying Property (14). Here we give a high-level description of the hyperplane intersection projection and prove that it satisfies Property (14) (cf. Lemma 3 in appendix 10).
We know how to efficiently project (in Euclidean distance) onto CP and T P, though the projection proj CP on CP is more costly. (Closed form expressions for these projections are given in appendix 13, and we do not provide explicit algorithms for them, given their simplicity.) In the hyperplane intersection projection (HIP) algorithm, we switch between the AP regime (projecting alternately onto CP and T P) and HIP mode, based on a criterion discussed below. In HIP mode, we keep a list of half-spaces that contain CP -defined by the estimator after each iteration -and project on the intersection of a large subset of these half-spaces and T P. The key idea that allows efficient computations is that it is easy to do the following two things: project onto the intersection of hyperplanes, and check that the projection onto the intersection of hyperplanes is the same as the projection onto the intersection of half-spaces. We select a large subset of half-spaces that ensures that this equality holds, before projecting onto the intersection of T P and the associated set of hyperplanes. See algorithms 2 and 3 for pseudocode respectively showing construction of the set of hyperplanes, and the HIP algorithm. w_active ← empty list 3: for w ∈ w_list do 4: The following test can be checked by looking at whether some coefficients are all non-negative.

5:
if the projection of Φ on the intersection of T P and the half-spaces of w_active and w is equal to the projection of Φ on the intersection of T P and the corresponding hyperplanes then 6: Append w to w_active 7: Φ new ← the projection of Φ on the intersection of T P and the half-spaces in w_active 8: return w_active, Φ new Whatever the criterion for switching between AP and HIP mode, Property 14 is always satisfied. An easy choice that works well in practice is to take a small fixed number of steps (say six) in AP mode, and a bigger fixed number of steps (say thirty) in HIP mode. The criterion we use in the experiments is more convoluted, and can be read in the implementation [64].

Comparison of projection algorithms
Here, we compare the performance of several variants of the HIP algorithm discussed in Section 6.2, with AP and Dykstra's algorithm.

13:
Add w as first element of w_active 14: w_active, Φ ← HIP_inner(w_active, Φ) 15: if SwitchCondition_toAP then 16: mode ← AP 17: return Φ of any of the subsequent algorithms discussed, before mixing with the depolarizing channel, the L 1 distance Φ P LS − Φ 1 is the same as that of Φ CP 1 − Φ 1 , up to a relative change of five percent. Hence the important question regarding the choice of algorithm is which converges the fastest?
We test convergence times for 'projections' ofΦ LS for a 5-qubit quantum Fourier transform, with 5 different algorithmic implementations: AP, Dykstra, HIPswitch, OneHIP (HIP without any memory of past hyperplanes or switches to AP), and PureHIP (no switch to AP). (HIPswitch, OneHIP and PureHIP are variations on Algorithm 3, and we provide their explicit algorithmic implementations in Appendix 14.) We do so primarily to test the convergence time of our main algorithm HIPswitch compared to that of AP and Dykstra. Additionally, by turning on parts of HIPswitch in turn (OneHIP, PureHIP) we explore the efficacy of the algorithm. Note also in Figure  5, that we display the results of a dual-approach to the projection ofΦ CP 1 onto CPT P; we defer a discussion of this to Section 6.7. In Figure 5, we see that: • Dykstra's algorithm and AP are almost indistinguishable, and do not converge in reasonable time.
• The simplified version oneHIP of HIP, without memory of past hyperplanes or AP, does a little better, but also fails to converge rapidly.
• The simplified version of HIP 'pureHIP', in which we always stay in HIP mode, improves linearly in this setting.
• The main algorithm HIPswitch converges faster than pureHIP. The flat low points correspond to the AP mode. At the end, we might have hit the quadratic convergence regime. HIPswitch and pureHIP attain this faster on easier problems (lower dimension, higher rank) for convergence.
In all subsequent experiments discussed in this article, we use the algorithm HIPswitch (Algorithm 3). We stop when the effect of adding the depolarizing channel changes the trace-norm distance by at most 2 × 10 −3 , corresponding to a least eigenvalue of 10 −7 for 5 qubits, i.e. −7 on Figure 5. We now look at how the performance of our QPT method depends on the number of measurements made. We repeat ten times the estimation of the QFT on 5 qubits, for sample sizes of N = 3e5, 1e6, 3e6, 1e7, 3e7, 1e8. We can see on Figure 6 that:

Effect of sample size
• unsurprisingly, the least-squares estimator error scales as N −1/2 .
• The least-squares estimator always has almost the same error for the same sample size: each set of ten points is indistinguishable on the figure.
• Our final estimator has two possible regimes. It depends on the rank of the first projection Φ CP 1 . For a given rank, there is visible but low variation of the loss.
• Most importantly, the final estimator also scales as N −1/2 .
• For this rank-one channel, the final estimator divides the loss of the least-squares estimator by d 2 /3 or d 2 /5. That is, for 5 qubits, we improve by a factor of 300. In this section, we estimate sums of orthogonal unitary channels, with the ranks 1, 2, 4, 8, 16, 32, and equal eigenvalues. We include a comparison of the performance of the projected estimator when using the two methods of initial projection onto CP1: our default thresholded projection, and Frobenius projection, using the same data. We test for sample sizes N = 1e6, 1e7, 1e8. Each experiment is reproduced ten times. In Figure 7, we see that:

Effect of rank
• the loss is slightly less than linear in the rank, until it is close to the maximum of 2.
• The thresholded projection is much better than the standard projection, unless the loss is already close to the maximum of 2, in which case it is worse.
• The effect of getting the wrong rank with the thresholded projection decreases with the rank.

Effect of dimension
We now test the estimation of the Quantum Fourier Transform using Pauli measurements on one to six qubits, and for mutually unbiased measurements in dimensions 3, 7, 11, 17, 31, 67. To get comparable losses, we copy the scaling of Theorem 1, and set the respective sample sizes as 10 × 9 k for k qubits, and 100 × d 2 , equivalent to 100 × 4 k , for MUBs. In Figure 8, we show the mean trace norm loss and time requirements over ten experiments. For the latter, we show the time needed for one loop of algorithm 3, and the time needed for the whole estimation. (We do not include the time taken for the generation of the least-squares estimator, as it is included in the data generation.) We can see that: • the scaling of Theorem 1 seems comparable to what we get in practice.
• The time for each loop does not depend on the setting. Slightly more loops are usually needed for MUBs than for Pauli.
• We see that the scaling in time gets steeper: possibly for low dimension, bigger arrays are computed faster per entry. The most time-consuming operation is the diagonalisation in proj CP , and its implementation will dictate the complexity. Going from five to six qubits makes loops forty times slower, and the whole computation ninety times slower, hence more loops are required. • Experiments on six qubits can be done in half an hour on a personal computer (plus 15 minutes to generate the data); on 5 qubits in less than twenty seconds.
The memory requirements are at the very minimum of d 4 , the size of the Choi matrix. With Algorithm 3, we need to keep one such matrix in memory for each hyperplane, which can multiply the needs by 20. However, the limiting factor for the simulations in the Pauli setting is the data generation, where a careful reasonably fast implementation requires 24 k complex entries.

An alternative optimisation procedure
We thank an anonymous referee for pointing out that the dual optimisation problem for finding the projection onto CPT P may be more well-behaved than the primal one. Starting from that point, it is possible to devise a simpler projection algorithm that yields the Euclidean projection, is wellrooted in optimisation theory, and has speeds comparable to HIP -based on the few experiments we tested it on. In this section we outline the method and present a preliminary result but we postpone a more in-depth analysis and comparison to HIP for the future publication [65].
The direct projection problem (without projection ofΦ LS onto CP1) may be written as follows: The constraint Φ ∈ CP can be made explicit as Φ ≥ 0, that is, a set of linear inequality constraints, and the constraint Φ ∈ T P can be made explicit as Tr s (Φ) = 1 d 1 d , that is, a set of linear equality constraints. Hence, we can look at the following dual problem. We relax the trace-preserving constraint and form the Lagrangian: with ν ∈ Herm n×n our dual variable. Let the solution to this relaxed problem be: The associated dual function is and the optimal value of ν is found by solving the dual problem: The totally mixed state Φ = 1 d 2 1 d 2 strictly satisfies the inequalities, hence Slater's conditions [66,67] hold, and the solution of the primal problem (25) is the same as the one yielded by the dual problem: Φ rel (ν opt ) =Φ P LS . This is beneficial as the solution to the relaxed problem is easy to compute: for a function f which we do not need to make explicit, so that This allows simple computation of the Lagrangian. We can also check that the gradient is ∇q(ν) = Tr s (Φ(ν)) − 1 d 1 d . This dual problem is well-behaved: to start with, there are only d 2 real parameters, instead of d 4 . Moreover, the function is much smoother: the gradient is Lipschitz, so that we can use fast optimisation algorithms. We used BFGS ( [68,69,70,71]), since it is available in scipy, but options such as Nesterov's accelerated gradient [72] could be considered.
Some simple simulations show that the speeds obtained using this algorithm are comparable with HIP ( Figure 5), with again most of the time spent in proj CP . We need slightly more loops than for HIP in the five qubit case, as in Figure 5, but there is likely to be room for improvement. Notice that the line corresponding to the dual+BFGS algorithm stops around 10 −8 . This is because this implementation of the optimisation algorithm stops for lack of precision at that point.
We can also give an interesting interpretation of the dual variable ν: when we use alternate projections, we project on CP, then correct by projecting on T P, and so on. By contrast, in the dual optimisation, we look for an element in the orthogonal space to T P to be added toΦ LS , so that the projection on proj CP belongs to T P, too.

Conclusions and outlook
In this work we proposed and investigated a computationally efficient and statistically tractable quantum process tomography method called projected least squares (PLS), inspired by previous work on quantum state tomography [37]. The estimator is constructed by first computing the LS estimator and then projecting this on the space of physical Choi matrices. We propose this projection be carried out in two steps: a projection onto the set of quantum states followed by a projection onto the set of Choi matrices. Additionally, we proposed a fast numerical implementation of this projection -hyperplane intersection projection -which accelerates an alternating projections procedure by employing fast projections onto the intersection of several tangent spaces and the hyperplane of linear constraints.
We studied four experimental scenarios for data generation. We consider the most experimentally feasible to be ancilla-free tomography with single-qubit Pauli measurements (scenario 3.2), while the most statistically efficient is the ancilla-assisted setup with mutually unbiased bases measurements (scenario 3.3). For each scenario we provided explicit expressions for the LS estimator (Section 3), and non-asymptotic concentration bounds for the PLS estimator with respect to the Frobenius and trace norm error of the associated Choi matrix, cf. Theorem 1.
Via the two-step projection procedure, all proposed methods are able to exploit latent structure in the form of low rank. Low rank channels are particularly interesting as a class of transformations modelling noisy implementations of unitary channels, which are relevant for quantum technology and quantum computing. Our sampling complexity bounds for the Frobenius error depend linearly on the rank r of the channel and are smaller than those of the LS estimator by a factor r/d 2 . This qualitative behaviour is confirmed by numerical simulations which show that the PLS errors are significantly lower than those of LS for low rank channels.
We carried out several numerical studies involving 5 to 7 qubit channels, including a comparison between the two projection methods, a study on the error dependence on rank, and an analysis of how the computational time scales with system dimension. The code is available on GitHub [64]. The repository also contains up-to-date information on the future article on the theoretical underpinnings and heuristics behind HIP [65]. Following a suggestion by an anonymous referee, we implemented an alternative projection algorithm based on the dual optimisation problem, which shows a similar behaviour to HIP and will be analysed in more detail in [65].
We leave as a topic of future investigations the extension of these results to quantum instruments described in terms of non trace-preserving quantum operations. In a different direction, we would like to further investigate the efficacy and possible improvement of the confidence regions prescribed by Theorem 2. Research should also be carried out to find improvements to the algorithmic implementation of the projections ofΦ LS onto CPT P to decrease computational complexity and memory demands. These could arise from "matrix-free" representations of the least-squares estimator and from exploiting the low rank of many high-dimensional processes of interest [73].
Acknowledgements: RK would like to thank Ingo Roth for inspiring discussions during the early stages of this project. For the figures we used the colour-blind-safe colour scheme from [74].
In this case we are sampling from the 3 2k distributions corresponding to each setting (a, b).
Consider now scenario 3.1. Observing the joint outcome o = (q, p) when measuring the Choi matrix with setting s = (a, b) is equivalent to first measuring the ancilla qubits with setting a, recording the outcome q, and then measuring the conditional system state with setting b and recording outcome p. The probability of observing outcome q when measuring with setting a is always 1 d , since the marginal state of the ancilla is always the maximally mixed state. The deferred measurement principle in quantum information theory [75] implies that the this set up is equivalent to making a measurement on the marginal ancilla state before acting with C ⊗I d on the joint system. Defining Ω a q as the marginal system state conditional on the measurement on the ancilla side of the maximally entangled state Ω, we will now show that the probabilities in scenario 1 (3) can be expressed as: and the associated projection operators transform as The maximally entangled input state |ω in the standard basis is: Therefore, after measuring the ancilla qubits with setting a on the joint state |ω and obtaining outcome q, the projected state is Therefore, the conditional state of the system after making the above measurement on the ancilla qubits isŪ a |q . We conclude, then, that But |q q| = P q = P q , so: Therefore the probabilities obtained making measurements in scenario 1 are: which coincides with equation (38) corresponding to the alternative scenario 2, where the input indices q were chosen randomly with uniform probability.

Proof of Proposition 2
Proposition 2. For direct QPT with (transposed) MUB inputs (|w i w i |) and MUB measurements d m |v l v l |, the least-squares estimator for the Choi matrix Φ of a k-qubit quantum channel takes the following form: Proof. We apply the relation to express the probability distribution for a give input state |w k w k | and measurement with POVM elements M l = d m |v l v l | as where P l = |v l v l |, Q k = |w k w k |. The linear map from Choi matrices to the collection of probability distributions is then It's adjoint is The general form of the LS estimator, assuming A † A is invertible, is given bŷ where f := {p k l : l, k = 1, . . . , m} is the vector of empirical frequencies. To find a more explicit expression and prove the validity of equation (50), we analyse A † A in more detail. For this we use the relations characteristic for 2-designs Tr(XQ k )Q k = X + Tr(X)1 d We now show that A † A has 3 eigevalues with eigenspaces consisting of linear spans of the following 4 types of operators: 1. Let Λ 1 = X ⊗ Y with Tr(X) = Tr(Y ) = 0. In this case where in the last step we used that k Q k = (d + 1)1 d .

A similar equality holds for Λ
Writing P l =P l + 1 d /d with Tr(P l ) = 0, and Q k =Q k + 1 d /d with Tr(Q k ) = 0, we now have Finally, by rearranging terms we get

Concentration bounds for the LS estimators
In order to prove Theorem 1, we first find error bounds for the least-squares estimators for each of the scenarios 3.1-3.4 in Frobenius norm distance, in sections 9.1-9.4. For this we will be applying a matrix generalisation of the classical scalar Bernstein inequality, see Ref. [76].

Theorem 3. Consider a sequence of N independent, Hermitian, random matrices
Then, for t ≥ 0: for a 2k-qubit joint system-ancilla system, where d = 2 k , N is the total number of measurements, Φ LS is the least-squares estimator for the true state Φ, and τ ∈ [0, 1].
Proof. We construct this bound by borrowing the result for the 'Pauli basis measurements' case given in [37] and adapting it for a larger system. The proof is an application of the matrix Bernstein Inequality and we sketch it here, though the reader is directed to [37] for details.
We can write the LS estimator (4) in the following way: where, for every s ∈ {x, y, z} 2k , the X s i are independent instances of the random matrices: with probability distribution over o ∈ {0, 1} 2k We can then write If we define the matrices: equation (57) becomes Thus we can apply the matrix Bernstein inequality to bound Φ LS − Φ ∞ . We have that: and Using the following inequality: and Jensen's inequality [77], we can solve (60) and (61) to find: Applying the matrix Bernstein inequality then gives us equation (53) , provided that τ is not too large.
We now convert this bound into one containing the Frobenius norm.
Proof. Taking (66) as our starting point, we can make use of the following inequality involving the Frobenius norm and operator norm of a Hermitian matrix: Let A ∈ Herm n×n be an arbitrary Hermitian matrix, then Φ LS − Φ is clearly Hermitian since it is constructed as a sum over a set of random Hermitian matrices. If Φ LS − Φ ∞ ≤ τ with some probability, it must be that Φ LS − Φ 2 2 ≤ d 2 τ 2 with at least that probability, by equation (65). Therefore, setting δ 2 = d 2 τ 2 we arrive at (64). This is valid for δ 2 ∈ [0, d 2 ], so is too in the region [0, 1] that we are interested in.

Scenario 3.2 Proposition 5. The LS estimator in scenario 3.2 satisfies the concentration bound
where N is the total number of measurements,Φ LS is the least-squares estimator for the true state Φ, and τ ∈ [0, 1].
Proof. Using Proposition 1 we can express the least squares estimator in scenario 3.2 as a sum of random matrices, similarly to scenario 3.1.
where s = 3 2k · 2 k , and a, b ∈ {x, y, z} k , q ∈ {0, 1} k . Here, X a,b q,i are independent instances of the random matrices: where A a,b q,i = 1 N (X a,b q,i − E(X a,b q,i )) and we can apply the concentration bound from Theorem 3 similarly to Proposition 3. Since X a,b q takes values in the same set as the random matrix X s in Proposition 3 (scenario 3.1), we obtain We now consider the variance Taking into account that s = d3 2k we arrive at the same upper bound expression as found in scenario 3.1

Scenario 3.3
Proposition 7. The operator norm error of the least-squares estimator in scenario 3.3 satisfies the following bound: This is a direct application of the result for 2-design POVMs in [37] to a system of dimension of d 2 . It serves no purpose to repeat the proof -another application of the matrix-Bernstein inequality -here, and the reader is directed to [37] for an exposition.

Proposition 8.
Proof. As with scenario 3.1, we use the relation (65) to derive the extra factor of 1 d 2 in the exponential, and arrive at (69).

Scenario 3.4
Proposition 9. The operator norm error of the least-squares estimator in scenario 3.4 satisfies the following bound: Proof. We return to another application of the concentration inequality (52).
For each k ∈ [m] we define the random matrices with probability distribution over l ∈ {1, ..., m} We denote by X k i the independent samples of X k corresponding to the measurement outcomes for input state Q k , with i = 1, ..., N/m. We then write the LS estimator (46) in the following way: assuming again that the N measurements are distributed uniformly among the different input states. Then, we can write the least-squares estimator operator-norm distance as i.e. the operator norm of a sum over independent centred random matrices: The upper bound, R, for the operator norm of each contribution to the sum (see equation (52)) can be found in the following way: And the variance term: where the inequality is with respect to Loewner ordering. Now Given that (46) by equation (34). We now use the fact that the P l and Q k form spherical 2-designs. The near-isotropy of such 2-designs ensures the following for arbitrary selfadjoint operators A, B: This can be extended linearly to all selfadjoint Φ ∈ M (C d ) ⊗2 such that: Thus: Substituting our upper bound of σ 2 into the matrix Berstein inequality yields the claim, provided that τ does not become too large.

Proposition 10. The following bound holds
Proof. As with scenario (3.1), we use the relation (65) to arrive at (83).

Summary of LS concentration bounds
The LS error bounds derived in sections 9.1,9.2, 9.3 and 9.4 can be summarised as follows Corollary 2.
where: 10 Key properties for two-step projection In this section, we prove that the implementations of our projection procedures for computinĝ Φ P LS satisfy the two properties (13) and (14).
We start with the following lemma relating the operator norm distances of Φ LS and Φ CP 1 from Φ, that is, ensuring Property (13).

Lemma 2.
For any non-negative threshold τ that is less than Φ LS − Φ ∞ , the first thresholded CP1 estimator Φ τ CP 1 defined in Algorithm 1 satisfies: In particular, if τ is zero (projection on CP) or if τ = −λ min (Φ LS ) (our main implementation), the condition is satisfied.
Proof. We write λ i for the eigenvalues ofΦ LS , as in Algorithm 1. First, since Φ is positive semi-definite, all negative eigenvalues ofΦ LS have smaller absolute value than Φ LS − Φ ∞ . In particular τ = −λ min (Φ LS ) satisfies the conditions of the lemma.
The operatorsΦ τ CP 1 andΦ LS are diagonal in the same basis. By the triangle inequality, we just have to prove that no eigenvalue gets changed by more than Φ LS − Φ ∞ .
If the condition on line 9 is met, then all eigenvalues between λ min (Φ LS ) and τ are set to zero, and all the other eigenvalues µ i are set to (λ i + τ − x 0 ) + . So that the small eigenvalues change by at most τ ∨ |λ min (Φ LS )| ≤ Φ LS − Φ ∞ , and the big eigenvalues change by at most τ ∨ |τ − x 0 |. We must then bound |τ − x 0 |. First notice that x 0 ≥ 0 since µ i ≥ 1 and (µ i − x 0 ) + = 1. Our worst case is then if x 0 > 2τ . But in that case, all final eigenvalues can be seen as ( On the other hand, if the condition on line 9 is not met, all eigenvalues bigger than the critical eigenvalue λ j are increased by τ , all smaller eigenvalues are set to zero, and the final eigenvalue j satisfies 0 ≤ µ j ≤ λ j + τ . So that we only have to prove that 0 ≤ λ j ≤ τ . Since the trace ofΦ LS is one, the left inequality on line 14 ensures that λ j > 0. On the other hand, if λ j > τ , then the right inequality on line 14 would ensure that the the condition on line 9 was met. The contradiction ends the proof. We now prove that the second 'projection' does not increase the Frobenius distance, that is, Property (14).

Lemma 3.
Assume thatΦ P LS is, given the inputΦ CP 1 , the output of either Dykstra's algorithm at convergence, AP algorithm for any tolerance, or hyperplane intersection projections algorithm with any switch conditions for any tolerance. Then Proof. By Lemma 1, we merely have to prove that all our 'projections' can be seen as a succession of usual projections in Frobenius norm on convex sets that contain CPT P. Dykstra's algorithm directly outputs the Frobenius projection on CPT P. AP are a succession of Frobenius projections on CP, and on T P, both of which are convex set that include CPT P.
Hyperplane intersection projections in AP mode is just AP. In HIP mode, we project on the intersection of: T P, and a set of half-spaces that contain CP. All those sets are convex sets that contain CPT P, hence their intersection also is.

Proof of Theorem 1
We now return to the proof of Theorem 1 and use the concentration bounds (84) for the LS estimators in the 4 scenarios to obtain bounds for PLS.
We will prove a stronger version, Theorem 4 which also covers the proof of Theorem 2.
To state it, we recall the notion of being close to rank r: a state ρ is δ-almost rank r if there is a rank r state ρ (r) such that ρ − ρ (r) ∞ ≤ δ.
(89) Theorem 4. Let Φ be the Choi matrix of a channel andΦ P LS our estimator, generated with any 'projections' satisfying Properties (13) and (14). Then, with probability at least 1 − η, the bounds (90), (91) and (92) below hold simultaneously true, for all r and corresponding values of δ. Assume that either Φ orΦ CP 1 is δ-almost rank r. With probability at least 1 − η: If it was Φ that was δ-almost rank r, then: If it wasΦ CP 1 that was δ-almost rank r, then: As a remark, Theorem 4 reduces to Theorem 1 when Φ is δ-almost rank r, with δ = 0, i.e. it is exactly rank r. The formulation in Theorem 4 can be seen by taking to equal the right hand side of the equations, solving for η and inverting the probabilities.
First, Property (13) relates the L ∞ norms of the errors ofΦ LS andΦ CP 1 : The next step is to relate the Frobenius norm of the distance betweenΦ CP 1 and Φ to their operator norm distance. Lemma 4. Assume either thatΦ CP 1 or Φ is δ-almost rank r. Then

Proof.
A key argument is presented in [37] for the following inequality relating the 1-norm to the operator-norm of two trace-1 positive-semidefinite matrices ρ 1 (of definite rank r) and ρ 2 : Thus we have: There exists the following inequality relating the Frobenius, 1-and operator-norms: Let A ∈ Herm n×n be an arbitrary Hermitian matrix, then which we can make use of sinceΦ CP 1 − Φ is the difference of two quantum states, and hence must be Hermitian. Using this, we can say: By Property 14 we may transfer this result toΦ P LS : We can now pull these results together to prove Theorem 4.
Using Equation (84), and setting the right-hand-side equal to η, we get τ = 8 ln(d 2 /η) 3N g(k) . We thus obtain, with probability at least 1 − η: where, to remind the reader: Scenarios 3.1, 3. Recall that Equation (84) is valid if τ ∈ [0, 1]. But if τ is more than one, then the right-hand side of Equation 90 is more than two, hence always true. The expression is thus always valid.
12 Error bounds and sampling complexities for the direct projection method where: Proof. Using Lemma 1, one can take the bound defined in (85) and simply apply it toΦ P LS to arrive at equation (105).
Inverting Theorem 5 we prove following: Corollary 3. To achieve the following accuracy: one requires a sample size of The sampling complexities for each method of data collection, with direct projection, are then: Equation (16) is to be contrasted with (105). Clearly, the bound (16) is stricter than that of equation (105) when r < d 2 32 . Therefore, the first method of projectingΦ CP 1 onto CPT P improves upon the error bound when the channel acts on more than 5 qubits, and for certain ranks in states of 3 to 5 qubits, while for 1 and 2 qubits the direct projection bound is tighter. Most importantly, the bound for the two steps projection method captures the qualitative dependence on the rank of the channels, which is confirmed by the numerical simulations described in Section 6.5.

Solution to the TP Projection
The projection, with respect to the Frobenius norm, of a matrix X onto the set of matrices representing trace-preserving maps is the solution to the following optimisation problem: