Overhead for simulating a non-local channel with local channels by quasiprobability sampling

1Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan. 2Center for Quantum Information and Quantum Biology, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Japan. 3JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan. 4Center for Emergent Matter Science, RIKEN, Wako Saitama 351-0198, Japan January 26, 2021


Kosuke Mitarai: mitarai@qc.ee.es.osaka-u.ac.jp

Introduction
We now have a programmable quantum device whose dynamics cannot be simulated by a classical computer within its runtime [1]. However, the capability of such devices is rather limited because they lack quantum error correction. They are frequently referred to as noisy intermediate-scale quantum (NISQ) devices [2]. In recent years there has been a substantial amount of research effort to develop useful applications of NISQ devices [3][4][5][6][7][8][9]. The weakness of NISQ devices is that the number of qubits, the gate fidelities, and the qubit connectivity are limited. The fidelities are especially low for two-qubit entangling gates. One approach to circumvent such limitations is to use so-called variational quantum algorithms. These algorithms employ parametrized quantum circuits and optimize the parameters to perform a given task. In such algorithms, we frequently construct the largest circuit a device allows, so as to maximize the advantage of using a quantum device.
This approach is promising because it can in principle exploit the largest circuits a device allows. Even so, such algorithms could be improved further if the required resources could be reduced. For example, reducing the number of qubits or two-qubit gates needed to obtain the output of a given quantum circuit would widen the range of circuits usable in variational algorithms. To this end, a few approaches have been proposed. One is to decompose a large circuit into smaller ones by "cutting" it using a tomography-like method [10]. In Ref. [11], we presented a method to "cut" a certain non-local gate by decomposing it into a linear combination of local operations. These approaches share the property that the overhead of the decomposition scales exponentially with the number of cuts performed. In this context, the overhead is defined as the number of circuit runs required to achieve a desired output accuracy.
These approaches can also be understood as techniques for performing a quasiprobability decomposition of quantum channels. Quasiprobability distributions, defined by a set of complex numbers $\{q_i\}$ satisfying $\sum_i q_i = 1$, have recently found a wide range of applications in quantum computing. Examples include error mitigation for NISQ devices [12,13] and classical simulation of near-Clifford quantum circuits [14][15][16][17][18]. In particular, Refs. [12,13,17,18] considered a quasiprobability-based simulation of quantum channels. Suppose a quantum channel $\Phi$ can be decomposed as $\Phi = \sum_i c_i \Phi_i$, where each $\Phi_i$ is a channel and each $c_i$ is a complex coefficient. Then $\Phi$ can be simulated by sampling $\Phi_i$ with probability proportional to $|c_i|$ and accounting for the phase of $c_i$ by classical post-processing. The overhead of simulating $\Phi$ with this decomposition is quantified by $\sum_i |c_i|$. If we perform several such decompositions, the overhead is quantified by the product of the individual values of $\sum_i |c_i|$. The overhead is therefore exponential in the number of decompositions performed. Refs. [12,13] developed techniques to build the inverses of noise channels from an experimentally available set of quantum gates. For classical simulation, Refs. [17,18] considered a quasiprobability decomposition of a non-Clifford channel into Clifford ones. In this context, the decomposition of Ref. [10] can be viewed as a quasiprobability decomposition of the identity channel into measurement and state-preparation channels. Likewise, that of Ref. [11] is a quasiprobability decomposition of a non-local unitary channel into local ones.
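The sampling rule described above can be illustrated with a toy estimator. This is a minimal sketch under stated assumptions. The hypothetical function `quasi_estimate` receives the exact output of each constituent channel $\Phi_i$ as a number, whereas an experiment would measure each one shot by shot. The coefficients are arbitrary toy values.

```python
import random

def quasi_estimate(coeffs, values, n_samples, seed=0):
    """Monte-Carlo estimate of sum_i c_i * values[i] by quasiprobability sampling.

    coeffs : coefficients c_i of Phi = sum_i c_i Phi_i (real or complex)
    values : toy stand-in for the output of each channel Phi_i (hypothetical)
    Returns (estimate, gamma), where gamma = sum_i |c_i| is the sampling overhead.
    """
    rng = random.Random(seed)
    gamma = sum(abs(c) for c in coeffs)
    probs = [abs(c) / gamma for c in coeffs]   # pick Phi_i with probability |c_i| / gamma
    indices = range(len(coeffs))
    total = 0.0
    for _ in range(n_samples):
        i = rng.choices(indices, weights=probs)[0]
        phase = coeffs[i] / abs(coeffs[i])     # c_i / |c_i|, applied in post-processing
        total += gamma * phase * values[i]     # rescale so the estimator is unbiased
    return total / n_samples, gamma


# Toy decomposition c = (1.5, -0.5): sum_i c_i = 1, but gamma = 2.
est, gamma = quasi_estimate([1.5, -0.5], [0.2, 0.8], n_samples=20000)
print(gamma, est)  # exact target: 1.5 * 0.2 - 0.5 * 0.8 = -0.1
```

The estimator is unbiased, but each sample is scaled by $\gamma = \sum_i |c_i|$, so its variance grows as $\gamma^2$. Chaining several decompositions multiplies the individual values of $\gamma$.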
In this work, we first define a quantity that we call the channel robustness of non-locality, by analogy with the robustness of magic introduced in Ref. [15]. It quantifies the minimal overhead achievable when a non-local channel is simulated quasiprobabilistically by local channels. This quantity is difficult to calculate in general. Nevertheless, we give an analytic upper bound for general two-qubit unitary channels by constructing an explicit decomposition, which generalizes the technique developed in Ref. [11]. Our previous work [11] only considered non-local gates of the form $e^{i\theta A_1\otimes A_2}$ for Hermitian operators satisfying $A_1^2 = I$ and $A_2^2 = I$. In contrast, the decomposition developed in this work cuts a general two-qubit gate in a single step, which leads to a substantially reduced overhead. Besides the reduced cost, the decomposition is derived more constructively than before, which we believe will inform further optimization of this approach. Lower bounds on the defined robustness are also of theoretical interest, since they can characterize the quantumness of a non-local channel. In this work, however, we focus on upper bounds obtained from explicit decompositions, because these enable us to actually simulate a non-local channel by local channels. This work develops a theoretical framework for resource reduction suitable for first-generation quantum devices.
Decomposition of non-local channels into local channels

Notation
We write a density matrix $\rho$ as $|\rho\rangle\rangle$ to stress that $\rho$ can also be seen as a vector. A bold-font symbol denotes the quantum channel corresponding to the gate-like operation written with the same symbol in normal font. For example, a unitary channel $\mathbf{U}$ acts on a state $|\rho\rangle\rangle$ as $\mathbf{U}|\rho\rangle\rangle = |U\rho U^\dagger\rangle\rangle$, where $U$ is a unitary matrix. The inner product between two operators $|A\rangle\rangle$ and $|B\rangle\rangle$ is defined as $\langle\langle A|B\rangle\rangle = \mathrm{Tr}(A^\dagger B)$.
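The following sketch makes the vector picture concrete. It assumes the column-stacking convention for $|\rho\rangle\rangle$, because the text does not fix a convention, and it represents matrices as nested Python lists. The code checks two things: a unitary channel acts on $|\rho\rangle\rangle$ as the matrix $\overline{U}\otimes U$ in this convention, and the vector inner product reproduces $\mathrm{Tr}(A^\dagger B)$. All helper names are illustrative.

```python
# Sketch: density matrices as vectors |rho>> (column-stacking convention, an assumption).

def vec(m):
    """|m>>: stack the columns of a square matrix given as a list of rows."""
    d = len(m)
    return [m[i][j] for j in range(d) for i in range(d)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def unitary_superop(u):
    """Matrix of the channel U on |rho>>: vec(U rho U^dag) = (conj(U) kron U) vec(rho)."""
    return kron([[complex(x).conjugate() for x in row] for row in u], u)

def inner(a, b):
    """<<A|B>> as a vector inner product; equals Tr(A^dag B)."""
    return sum(complex(x).conjugate() * y for x, y in zip(vec(a), vec(b)))

s = 2 ** -0.5
U = [[s, 1j * s], [1j * s, s]]                  # an example single-qubit unitary
rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]    # an example density matrix

# U|rho>> computed as a matrix-vector product ...
lhs = [sum(m * v for m, v in zip(row, vec(rho))) for row in unitary_superop(U)]
# ... agrees with |U rho U^dag>>.
rhs = vec(matmul(matmul(U, rho), dagger(U)))
```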

Channel robustness of non-locality
In standard experimental platforms, including superconducting qubits and trapped ions, certain single-qubit operations are usually regarded as much easier than two-qubit entangling operations. These are arbitrary single-qubit rotations and single-qubit projective measurements along any axis. A rotation is characterized by an axis $\mathbf{n} = (n_1, n_2, n_3)$ and an angle $\theta$ as $R(\mathbf{n},\theta) = \exp[-i\theta(\sum_\alpha n_\alpha\sigma_\alpha)]$. Experimentally, a projective measurement along $\mathbf{n}$ is realized by a single-qubit rotation that maps $\mathbf{n}$ to the z-axis, followed by a projective measurement along the z-axis. The quantum channel $\mathbf{M}(\mathbf{n})$ corresponding to the projective measurement is a probabilistic map. When applied to a state $|\rho\rangle\rangle$, it returns $\boldsymbol{\Pi}(\pm\mathbf{n})|\rho\rangle\rangle/p_\pm$ with probability $p_\pm = \mathrm{Tr}[\Pi(\pm\mathbf{n})\rho]$. Here $\Pi(\pm\mathbf{n})$ is the projector onto the eigenstate of $\pm\sum_\alpha n_\alpha\sigma_\alpha$ with eigenvalue $+1$.
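Here is a minimal sketch of these single-qubit primitives, assuming a real unit vector $\mathbf{n}$ and matrices stored as nested lists. Because $(\sum_\alpha n_\alpha\sigma_\alpha)^2 = I$, the exponential in $R(\mathbf{n},\theta)$ reduces to $\cos\theta\, I - i\sin\theta\sum_\alpha n_\alpha\sigma_\alpha$. The projector is $\Pi(\pm\mathbf{n}) = (I \pm \sum_\alpha n_\alpha\sigma_\alpha)/2$. The helper names are hypothetical.

```python
import math

I2 = [[1, 0], [0, 1]]
PAULI = [I2, [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]  # sigma_0..sigma_3

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def n_dot_sigma(n):
    """sum_alpha n_alpha sigma_alpha for a real unit 3-vector n."""
    return [[sum(n[a] * PAULI[a + 1][i][j] for a in range(3)) for j in range(2)]
            for i in range(2)]

def rotation(n, theta):
    """R(n, theta) = exp[-i theta n.sigma] = cos(theta) I - i sin(theta) n.sigma."""
    ns = n_dot_sigma(n)
    return [[math.cos(theta) * I2[i][j] - 1j * math.sin(theta) * ns[i][j] for j in range(2)]
            for i in range(2)]

def projector(n, sign=+1):
    """Pi(sign * n) = (I + sign * n.sigma) / 2."""
    ns = n_dot_sigma(n)
    return [[(I2[i][j] + sign * ns[i][j]) / 2 for j in range(2)] for i in range(2)]
```

For instance, `rotation((1, 0, 0), math.pi / 2)` equals $-i\sigma_1$. The Pauli channel $\boldsymbol{\sigma}_1$ is therefore itself a rotation up to a global phase.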
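To implement $\boldsymbol{\Pi}(\mathbf{n})$ itself, we can define a probabilistic map $\tilde{\boldsymbol{\Pi}}(\mathbf{n})$. It takes a state $|\rho\rangle\rangle$ to $\boldsymbol{\Pi}(\mathbf{n})|\rho\rangle\rangle/p_+$ with probability $p_+$ and to $|0\rangle\rangle$ with probability $p_-$, where $|0\rangle\rangle$ corresponds to the zero matrix. The map to $|0\rangle\rangle$ simply means ignoring the case in which the measurement yields $-1$. However, discarding the $-1$ case is inefficient, especially when we also want to perform $\boldsymbol{\Pi}(-\mathbf{n})$. To resolve this issue, we define a probabilistic map $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)$. It takes a state $|\rho\rangle\rangle$ to $c_+\boldsymbol{\Pi}(\mathbf{n})|\rho\rangle\rangle/p_+$ with probability $p_+$ and to $c_-\boldsymbol{\Pi}(-\mathbf{n})|\rho\rangle\rangle/p_-$ with probability $p_-$. Here $c_\pm$ are complex numbers with $|c_\pm| \le 1$. We define the expected value of a random vector $|\sigma\rangle\rangle$ that becomes $|\sigma_i\rangle\rangle$ with probability $p_i$ as $\mathbb{E}[|\sigma\rangle\rangle] := \sum_i p_i|\sigma_i\rangle\rangle$. Observe that the following holds for any state $\rho$:
$$\mathbb{E}\left[\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)|\rho\rangle\rangle\right] = c_+\boldsymbol{\Pi}(\mathbf{n})|\rho\rangle\rangle + c_-\boldsymbol{\Pi}(-\mathbf{n})|\rho\rangle\rangle.$$
In this sense, we write $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-) = c_+\boldsymbol{\Pi}(\mathbf{n}) + c_-\boldsymbol{\Pi}(-\mathbf{n})$, and we use notation of this kind from now on. $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)$ covers both of the cases mentioned earlier. To apply only $\boldsymbol{\Pi}(\mathbf{n})$, we set $c_- = 0$; we can also apply both $\boldsymbol{\Pi}(\pm\mathbf{n})$ simultaneously with different coefficients. We restrict $|c_\pm| \le 1$ to ensure that $|\mathrm{Tr}[\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)\rho]| \le 1$ for any state $\rho$ and any realization of $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)$. This prevents any decomposition overhead from arising at this stage.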
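The probabilistic map $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)$ can be simulated classically on a single-qubit density matrix to check its expected value. The sketch is illustrative only. `pi_tilde` and `empirical_mean` are hypothetical helpers, the measurement outcome is drawn with a pseudo-random generator, and the example state, axis, and coefficients are arbitrary.

```python
import random

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]  # sigma_1..sigma_3

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def projector(n, sign):
    """Pi(sign * n) = (I + sign * n.sigma) / 2 for a real unit vector n."""
    return [[((i == j) + sign * sum(n[a] * PAULI[a][i][j] for a in range(3))) / 2
             for j in range(2)] for i in range(2)]

def pi_tilde(rho, n, c_plus, c_minus, rng):
    """One realization of Pi~(n, c+, c-): measure along n, renormalize, attach c_+ or c_-."""
    p_plus = sum(matmul(projector(n, +1), rho)[i][i] for i in range(2)).real
    if rng.random() < p_plus:
        sign, c, p = +1, c_plus, p_plus          # outcome +1 with probability p_+
    else:
        sign, c, p = -1, c_minus, 1.0 - p_plus   # outcome -1 with probability p_-
    P = projector(n, sign)
    post = matmul(matmul(P, rho), P)             # Pi rho Pi (projectors are Hermitian)
    return [[c * x / p for x in row] for row in post]

def empirical_mean(rho, n, c_plus, c_minus, shots, seed=1):
    """Average of many independent realizations of Pi~(n, c+, c-) applied to rho."""
    rng = random.Random(seed)
    acc = [[0j, 0j], [0j, 0j]]
    for _ in range(shots):
        out = pi_tilde(rho, n, c_plus, c_minus, rng)
        for i in range(2):
            for j in range(2):
                acc[i][j] += out[i][j] / shots
    return acc

rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]
n = (0.6, 0.0, 0.8)
P, Q = projector(n, +1), projector(n, -1)
# Target for c_+ = 1, c_- = -1: Pi(n) rho Pi(n) - Pi(-n) rho Pi(-n)
exact = [[matmul(matmul(P, rho), P)[i][j] - matmul(matmul(Q, rho), Q)[i][j]
          for j in range(2)] for i in range(2)]
mean = empirical_mean(rho, n, 1, -1, shots=20000)
```

With $c_+ = 1$ and $c_- = -1$, the average of many realizations approaches $\boldsymbol{\Pi}(\mathbf{n})|\rho\rangle\rangle - \boldsymbol{\Pi}(-\mathbf{n})|\rho\rangle\rangle$. Each individual realization has a trace of magnitude $|c_\pm| \le 1$.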
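With the above considerations, the local operations available in practice, which we denote by $\mathbf{L}_i$, are those that can be written as an arbitrary product of $\mathbf{R}(\mathbf{n},\theta)$ and $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)$, together with their tensor products. We denote the set of such $\mathbf{L}_i$ by $\mathcal{L}$. The most general form of decomposition that we aim to build for a given non-local quantum channel $\Phi$ is
$$\Phi = \sum_i c_i\mathbf{L}_i, \tag{2}$$
where $c_i \in \mathbb{C}$ and $\mathbf{L}_i \in \mathcal{L}$. Given such a decomposition, $\Phi$ can be "simulated" in a Monte-Carlo manner by sampling $\mathbf{L}_i$ with probability proportional to $|c_i|$. More concretely, write $W(\Phi) = \sum_i |c_i|$ for the decomposition at hand. We then define a probabilistic map $\tilde{\Phi}$ that becomes $(c_i/|c_i|)\mathbf{L}_i$ with probability $p_i = |c_i|/W(\Phi)$. It follows that
$$\mathbb{E}\left[W(\Phi)\tilde{\Phi}|\rho\rangle\rangle\right] = \sum_i c_i\mathbf{L}_i|\rho\rangle\rangle = \Phi|\rho\rangle\rangle,$$
which shows that $W(\Phi)\tilde{\Phi}$ reproduces $\Phi$ on average when executed many times. This algorithm involves only local operations and classical communication (LOCC). Note, however, that the protocol is not a simple probabilistic mixture of LOCC, because it multiplies each channel $\mathbf{L}_i$ by the complex phase $c_i/|c_i|$. Let us now consider the overhead associated with the decomposition. In many cases, the output of a quantum system evolved by a channel $\Phi$ is the expectation value of an observable $O$, which can be written as $\langle\langle O|\Phi|\rho\rangle\rangle$.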
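Let $o_{\max}$ denote the largest absolute value among the eigenvalues of $O$, so that every single-shot outcome of measuring $O$ lies in $[-o_{\max}, o_{\max}]$. When $\Phi$ is applied directly, Hoeffding's inequality shows that $N \ge 2o_{\max}^2\ln(2/\delta)/\epsilon^2$ samples suffice to estimate $\langle\langle O|\Phi|\rho\rangle\rangle$ within additive error $\epsilon$ with probability at least $1-\delta$. More samples are needed for the same accuracy if one simulates $\Phi$ by $\tilde{\Phi}$. In that case the estimator is $W(\Phi)\frac{1}{N}\sum_{s=1}^{N} o_s$, where $o_s$ is a sample drawn from $\tilde{\Phi}|\rho\rangle\rangle$ with a single realization of $\tilde{\Phi}$. The map $\tilde{\Phi}$ involves several layers of randomness. It stochastically applies $\mathbf{L}_i$ with probability $p_i$, and $\mathbf{L}_i$ is itself a stochastic map involving $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, c_+, c_-)$. In the end, however, any realization of $\tilde{\Phi}$ is a tensor product of single-qubit operations that either does not increase the magnitude of the trace of $\rho$ or maps the state to $|0\rangle\rangle$. The absolute value of a sample $o_s$ obtained by measuring $O$ on $\tilde{\Phi}|\rho\rangle\rangle$ is therefore also bounded by $o_{\max}$, and each term $W(\Phi)o_s$ has magnitude at most $W(\Phi)o_{\max}$. Applying Hoeffding's bound again, $N \ge 2W(\Phi)^2o_{\max}^2\ln(2/\delta)/\epsilon^2$ samples suffice. Hence $W(\Phi)^2$ is the overhead of the decomposition.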
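The $W(\Phi)^2$ scaling can be read off directly from Hoeffding's inequality. The sketch below assumes i.i.d. samples with absolute value at most $W(\Phi)\,o_{\max}$. The function name `hoeffding_samples` is hypothetical, and the value $W = 3$ is only an example.

```python
import math

def hoeffding_samples(o_max, eps, delta, w=1.0):
    """Sample count N with P(|mean - expectation| >= eps) <= delta.

    Each sample is assumed to lie in [-w * o_max, w * o_max]; w = 1 for direct
    measurement of O and w = W(Phi) for the quasiprobability estimator.
    Hoeffding: 2 * exp(-N * eps**2 / (2 * a**2)) <= delta with a = w * o_max.
    """
    a = w * o_max
    return math.ceil(2 * a * a * math.log(2 / delta) / eps ** 2)

direct = hoeffding_samples(1.0, 0.01, 0.05)
simulated = hoeffding_samples(1.0, 0.01, 0.05, w=3.0)  # example value W(Phi) = 3
print(direct, simulated, simulated / direct)           # ratio is about W**2 = 9
```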
The above discussion leads us to define the following quantity, which we call the channel robustness of non-locality:
$$W(\Phi) = \min\left\{\sum_i|c_i| \;\middle|\; \Phi = \sum_i c_i\mathbf{L}_i,\ \mathbf{L}_i\in\mathcal{L}\right\}. \tag{4}$$
$W(\Phi)$ quantifies the minimal cost of simulating a non-local channel $\Phi$ by probabilistically applying local, experimentally feasible operations. $W(\Phi)$ is submultiplicative, i.e., $W(\Phi_2\Phi_1) \le W(\Phi_2)W(\Phi_1)$, as proved in the Appendix. This lets us upper-bound the overhead of decomposing a chain of quantum channels as $W(\Phi_K\cdots\Phi_2\Phi_1) \le \prod_{k=1}^{K}W(\Phi_k)$. If we replace $\mathcal{L}$ by some other set of available operations, Eq. (4) quantifies the overhead of the decomposition in that case. For example, the overhead of the identity-channel decomposition of Ref. [10] is obtained by taking the available operations to be measure-and-prepare channels. Another example is the decomposition of non-Clifford circuits into stabilizer-preserving channels considered in Refs. [17,18]. The cost of a family of error mitigation techniques called probabilistic error cancellation [12,13] is also related to this quantity. It is obtained by replacing the target channel $\Phi$ with the inverse of a noise channel.
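Since $\mathcal{L}$ consists of operations with continuous parameters, we can also define $W(\Phi)$ using an integral instead of a discrete sum. Formally,
$$W(\Phi) = \min\left\{\int d\lambda\,|c(\lambda)| \;\middle|\; \Phi = \int d\lambda\, c(\lambda)\mathbf{L}_\lambda,\ \mathbf{L}_\lambda\in\mathcal{L}\right\}, \tag{5}$$
where $\lambda$ denotes the continuous parameters that specify an element $\mathbf{L}_\lambda$ of $\mathcal{L}$.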
The calculation of W (Φ) for a general channel Φ is challenging as it involves a complex minimization procedure. Nevertheless, in the next section, we give an upper bound of W (Φ) for a general two-qubit unitary channel Φ by explicitly constructing a decomposition using a complete but not overcomplete basis in L.

Upper bound for two-qubit unitary channel
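It is well known [19,20] that the non-local part of any two-qubit gate can be written as
$$U = \exp\left(i\sum_{\alpha=1}^{3}\theta_\alpha\,\sigma_\alpha\otimes\sigma_\alpha\right). \tag{6}$$
In other words, any two-qubit gate equals such a $U$ up to single-qubit unitaries applied before and after it. Here $\sigma_0$ is the $2\times 2$ identity operator, and $\sigma_1$, $\sigma_2$, and $\sigma_3$ are the Pauli $x$, $y$, and $z$ operators, respectively. Each $\theta_\alpha$ is a real parameter. Since the three operators $\sigma_\alpha\otimes\sigma_\alpha$ mutually commute, this leads to the following expression for $U$:
$$U = \sum_{\alpha=0}^{3}u_\alpha\,\sigma_\alpha\otimes\sigma_\alpha, \tag{7}$$
where each $u_\alpha$ is a complex coefficient determined by $\{\theta_\alpha\}$. Unitarity implies $\sum_\alpha|u_\alpha|^2 = 1$. First, we expand the action of the channel $\mathbf{U}$ for the unitary in Eq. (7), using $|\sigma_\beta\rangle\rangle$ as a single-qubit basis. We write $|\rho\rangle\rangle = \frac{1}{4}\sum_{\beta,\beta'}r_{\beta\beta'}|\sigma_\beta\otimes\sigma_{\beta'}\rangle\rangle$ with $r_{\beta\beta'} = \mathrm{Tr}[(\sigma_\beta\otimes\sigma_{\beta'})\rho]$. This gives
$$\mathbf{U}|\rho\rangle\rangle = \frac{1}{4}\sum_{\alpha,\alpha'}u_\alpha u_{\alpha'}^*\sum_{\beta,\beta'}r_{\beta\beta'}\,|\sigma_\alpha\sigma_\beta\sigma_{\alpha'}\rangle\rangle\otimes|\sigma_\alpha\sigma_{\beta'}\sigma_{\alpha'}\rangle\rangle.$$
Suppose we can construct a single-qubit channel $\mathbf{U}_{\alpha\alpha'}$ such that $\mathbf{U}_{\alpha\alpha'}|\rho\rangle\rangle = |\sigma_\alpha\rho\sigma_{\alpha'}\rangle\rangle$ for any $\rho$. The above expression then shows that
$$\mathbf{U}|\rho\rangle\rangle = \sum_{\alpha,\alpha'}u_\alpha u_{\alpha'}^*\,\mathbf{U}_{\alpha\alpha'}^{\otimes 2}|\rho\rangle\rangle.$$
Therefore, $\mathbf{U} = \sum_{\alpha,\alpha'}u_\alpha u_{\alpha'}^*\mathbf{U}_{\alpha\alpha'}^{\otimes 2}$. We now construct $\mathbf{U}_{\alpha\alpha'}$ from available single-qubit operations. Observe that
$$\sigma_\alpha\rho\sigma_{\alpha'} = \frac{1}{2}(\sigma_\alpha\rho\sigma_{\alpha'} + \sigma_{\alpha'}\rho\sigma_\alpha) + \frac{1}{2}(\sigma_\alpha\rho\sigma_{\alpha'} - \sigma_{\alpha'}\rho\sigma_\alpha).$$
We define the following operators $A_{\alpha\alpha',\pm}$ and $B_{\alpha\alpha',\pm}$, which can be implemented through single-qubit operations:
$$A_{\alpha\alpha',\pm} = \frac{1}{2}(\sigma_\alpha \pm \sigma_{\alpha'}), \qquad B_{\alpha\alpha',\pm} = \frac{1}{2}(\sigma_\alpha \pm i\sigma_{\alpha'}).$$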
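The text leaves the coefficients $u_\alpha$ implicit. They can be obtained by expanding the product of the three commuting factors $\cos\theta_\alpha\,I + i\sin\theta_\alpha\,\sigma_\alpha\otimes\sigma_\alpha$ and using $(\sigma_1\otimes\sigma_1)(\sigma_2\otimes\sigma_2) = -\sigma_3\otimes\sigma_3$ and its cyclic versions. With $c_\alpha = \cos\theta_\alpha$ and $s_\alpha = \sin\theta_\alpha$, the result is $u_0 = c_1c_2c_3 + is_1s_2s_3$, $u_1 = c_1s_2s_3 + is_1c_2c_3$, $u_2 = s_1c_2s_3 + ic_1s_2c_3$, and $u_3 = s_1s_2c_3 + ic_1c_2s_3$. These closed forms are our own derivation, not quoted from Refs. [19,20]. The sketch below, with hypothetical helper names, checks them numerically against the direct product for one arbitrary choice of $\{\theta_\alpha\}$.

```python
import math

PAULI = [[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

def kron(a, b):
    """Kronecker product of two 2x2 matrices."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

SS = [kron(p, p) for p in PAULI]  # sigma_alpha (x) sigma_alpha for alpha = 0..3

def u_coeffs(t1, t2, t3):
    """u_alpha such that exp(i sum_a t_a SS[a]) = sum_alpha u_alpha SS[alpha]."""
    c1, c2, c3 = math.cos(t1), math.cos(t2), math.cos(t3)
    s1, s2, s3 = math.sin(t1), math.sin(t2), math.sin(t3)
    return [c1 * c2 * c3 + 1j * s1 * s2 * s3,
            c1 * s2 * s3 + 1j * s1 * c2 * c3,
            s1 * c2 * s3 + 1j * c1 * s2 * c3,
            s1 * s2 * c3 + 1j * c1 * c2 * s3]

def exp_factor(t, a):
    """exp(i t SS[a]) = cos(t) I + i sin(t) SS[a], because SS[a]^2 = I."""
    return [[math.cos(t) * (i == j) + 1j * math.sin(t) * SS[a][i][j] for j in range(4)]
            for i in range(4)]

theta = (0.3, -0.7, 1.1)
direct = matmul(matmul(exp_factor(theta[0], 1), exp_factor(theta[1], 2)),
                exp_factor(theta[2], 3))
u = u_coeffs(*theta)
via_u = [[sum(u[a] * SS[a][i][j] for a in range(4)) for j in range(4)] for i in range(4)]
```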
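Building on $A_{\alpha\alpha',\pm}$ and $B_{\alpha\alpha',\pm}$, we further define the channels
$$\mathbf{A}_{\alpha\alpha'} = \mathbf{A}_{\alpha\alpha',+} - \mathbf{A}_{\alpha\alpha',-}, \qquad \mathbf{B}_{\alpha\alpha'} = \mathbf{B}_{\alpha\alpha',+} - \mathbf{B}_{\alpha\alpha',-},$$
where, following our notation, $\mathbf{A}_{\alpha\alpha',\pm}|\rho\rangle\rangle = |A_{\alpha\alpha',\pm}\rho A_{\alpha\alpha',\pm}^\dagger\rangle\rangle$ and $\mathbf{B}_{\alpha\alpha',\pm}|\rho\rangle\rangle = |B_{\alpha\alpha',\pm}\rho B_{\alpha\alpha',\pm}^\dagger\rangle\rangle$. Simple algebra shows that
$$\mathbf{A}_{\alpha\alpha'}|\rho\rangle\rangle = \frac{1}{2}|\sigma_\alpha\rho\sigma_{\alpha'} + \sigma_{\alpha'}\rho\sigma_\alpha\rangle\rangle, \qquad \mathbf{B}_{\alpha\alpha'}|\rho\rangle\rangle = -\frac{i}{2}|\sigma_\alpha\rho\sigma_{\alpha'} - \sigma_{\alpha'}\rho\sigma_\alpha\rangle\rangle.$$
Therefore, $\mathbf{U}_{\alpha\alpha'}$ can be written as
$$\mathbf{U}_{\alpha\alpha'} = \mathbf{A}_{\alpha\alpha'} + i\mathbf{B}_{\alpha\alpha'}.$$
This decomposition of $\mathbf{U}_{\alpha\alpha'}$ leads to the following decomposition of $\mathbf{U}$:
$$\mathbf{U} = \sum_{\alpha,\alpha'}u_\alpha u_{\alpha'}^*\left(\mathbf{A}_{\alpha\alpha'} + i\mathbf{B}_{\alpha\alpha'}\right)^{\otimes 2}.$$
The channels satisfy the symmetries $\mathbf{A}_{\alpha\alpha'} = \mathbf{A}_{\alpha'\alpha}$ and $\mathbf{B}_{\alpha\alpha'} = -\mathbf{B}_{\alpha'\alpha}$. In addition, $\mathbf{A}_{\alpha\alpha} = \boldsymbol{\sigma}_\alpha$ and $\mathbf{B}_{\alpha\alpha} = 0$. Using these relations, we rewrite the expression for later convenience as
$$\mathbf{U} = \sum_{\alpha}|u_\alpha|^2\boldsymbol{\sigma}_\alpha^{\otimes 2} + \sum_{\alpha<\alpha'}\Big[(u_\alpha u_{\alpha'}^* + u_{\alpha'}u_\alpha^*)\left(\mathbf{A}_{\alpha\alpha'}^{\otimes 2} - \mathbf{B}_{\alpha\alpha'}^{\otimes 2}\right) + i(u_\alpha u_{\alpha'}^* - u_{\alpha'}u_\alpha^*)\left(\mathbf{A}_{\alpha\alpha'}\otimes\mathbf{B}_{\alpha\alpha'} + \mathbf{B}_{\alpha\alpha'}\otimes\mathbf{A}_{\alpha\alpha'}\right)\Big]. \tag{19}$$
To bound $W(\mathbf{U})$, we must rewrite Eq. (19) in the form of Eq. (2). The first term of the decomposition consists of the channels $\boldsymbol{\sigma}_\alpha$, which are trivially in $\mathcal{L}$: for $\alpha \ge 1$, $\boldsymbol{\sigma}_\alpha$ equals $\mathbf{R}(\mathbf{n},\pi/2)$ with $n_\beta = \delta_{\alpha\beta}$. Next, consider $\mathbf{A}_{\alpha\alpha'}$. By symmetry, it suffices to consider $\alpha < \alpha'$. When $\alpha = 0$, $A_{\alpha\alpha',\pm}$ becomes the projector $\Pi(\pm\mathbf{n})$ with $n_\beta = \delta_{\beta\alpha'}$. Hence $\mathbf{A}_{0\alpha'}$ takes the form $\tilde{\boldsymbol{\Pi}}(\mathbf{n}, 1, -1)$, so $\mathbf{A}_{0\alpha'} \in \mathcal{L}$. When $\alpha \neq 0$, $A_{\alpha\alpha',\pm}$ is proportional to a single-qubit rotation that swaps the $\alpha$-axis and the $\alpha'$-axis. More concretely, $2\mathbf{A}_{\alpha\alpha',\pm} \in \mathcal{L}$ for $\alpha \neq 0$ and $\alpha < \alpha'$. For $\mathbf{B}_{\alpha\alpha'}$ with $\alpha = 0$, $B_{\alpha\alpha',\pm}$ is proportional to a single-qubit rotation around the $\alpha'$-axis, so $2\mathbf{B}_{0\alpha',\pm} \in \mathcal{L}$ as in the previous case. For $\alpha \neq 0$, $B_{\alpha\alpha',\pm}$ can be implemented by a projector followed by a flip. For example,
$$B_{12,\pm} = \frac{1}{2}(\sigma_1 \pm i\sigma_2) = \sigma_1\,\frac{1}{2}(\sigma_0 \mp \sigma_3) = \sigma_1\,\Pi(\mp\mathbf{e}_3),$$
where $\mathbf{e}_3 = (0,0,1)$. It follows that the channel $\mathbf{B}_{\alpha\alpha'}$ in this case is a product of $\tilde{\boldsymbol{\Pi}}$ and $\boldsymbol{\sigma}_\alpha$, so $\mathbf{B}_{\alpha\alpha'} \in \mathcal{L}$ for $\alpha \neq 0$ and $\alpha < \alpha'$.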
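The "simple algebra" behind $\mathbf{U}_{\alpha\alpha'} = \mathbf{A}_{\alpha\alpha'} + i\mathbf{B}_{\alpha\alpha'}$ can be checked numerically. The sketch below, with hypothetical helper names, applies both sides to a random $2\times 2$ operator for all 16 pairs $(\alpha,\alpha')$. Both sides are linear, so agreement on a generic operator is a strong consistency check, though not a proof. The last two lines also build both sides of the example $B_{12,+} = \sigma_1\Pi(-\mathbf{e}_3)$.

```python
import random

PAULI = [[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def comb(a, b, sa, sb):
    """sa * a + sb * b for 2x2 matrices."""
    return [[sa * a[i][j] + sb * b[i][j] for j in range(2)] for i in range(2)]

def sandwich(k, x):
    """k x k^dagger."""
    return matmul(matmul(k, x), dagger(k))

def channel_A(al, alp, x):
    """A_{al alp} x = A_+ x A_+^dag - A_- x A_-^dag with A_pm = (s_al +- s_alp) / 2."""
    plus = comb(PAULI[al], PAULI[alp], 0.5, 0.5)
    minus = comb(PAULI[al], PAULI[alp], 0.5, -0.5)
    return comb(sandwich(plus, x), sandwich(minus, x), 1, -1)

def channel_B(al, alp, x):
    """B_{al alp} x = B_+ x B_+^dag - B_- x B_-^dag with B_pm = (s_al +- i s_alp) / 2."""
    plus = comb(PAULI[al], PAULI[alp], 0.5, 0.5j)
    minus = comb(PAULI[al], PAULI[alp], 0.5, -0.5j)
    return comb(sandwich(plus, x), sandwich(minus, x), 1, -1)

rng = random.Random(7)
x = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
max_err = 0.0
for al in range(4):
    for alp in range(4):
        target = matmul(matmul(PAULI[al], x), PAULI[alp])                  # s_al x s_alp
        recon = comb(channel_A(al, alp, x), channel_B(al, alp, x), 1, 1j)  # (A + iB) x
        err = max(abs(target[i][j] - recon[i][j]) for i in range(2) for j in range(2))
        max_err = max(max_err, err)

b12_plus = comb(PAULI[1], PAULI[2], 0.5, 0.5j)        # B_{12,+}
flip_after_proj = matmul(PAULI[1], [[0, 0], [0, 1]])  # sigma_1 Pi(-e_3)
```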
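Combining the above properties, we can calculate $\sum_i|c_i|$ for the decomposition in Eq. (19), which gives
$$W(\mathbf{U}) \le \sum_i|c_i| = 1 + 2\sum_{\alpha<\alpha'}\left(|u_\alpha u_{\alpha'}^* + u_{\alpha'}u_\alpha^*| + |u_\alpha u_{\alpha'}^* - u_{\alpha'}u_\alpha^*|\right). \tag{20}$$
This is an upper bound on $W(\mathbf{U})$. To obtain it, we used $\sum_\alpha|u_\alpha|^2 = 1$ and the fact that each $\mathbf{A}_{\alpha\alpha'}$ and $\mathbf{B}_{\alpha\alpha'}$ with $\alpha<\alpha'$ contributes a cost of 1. The operations used in the proposed decomposition are $\boldsymbol{\sigma}_\alpha$ ($\alpha\in\{0,1,2,3\}$) together with $\mathbf{A}_{\alpha\alpha'}$ and $\mathbf{B}_{\alpha\alpha'}$ for $\alpha<\alpha'$. These are 16 linearly independent single-qubit channels, so they form a complete basis of the space of single-qubit superoperators. Consequently, the coefficients of the decomposition, and hence the resulting cost, are uniquely determined as long as this basis is used.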
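Evaluating the bound of Eq. (20) takes only a few lines. The sketch below recomputes $u_\alpha$ from $\{\theta_\alpha\}$ using closed forms obtained by expanding the exponential; these are our own derivation and are stated here as an assumption. It then sums the absolute values of the coefficients. `w_upper_bound` is a hypothetical name.

```python
import math

def u_coeffs(t1, t2, t3):
    """u_alpha of U = sum_alpha u_alpha s_alpha (x) s_alpha (closed forms derived by expansion)."""
    c1, c2, c3 = math.cos(t1), math.cos(t2), math.cos(t3)
    s1, s2, s3 = math.sin(t1), math.sin(t2), math.sin(t3)
    return [c1 * c2 * c3 + 1j * s1 * s2 * s3,
            c1 * s2 * s3 + 1j * s1 * c2 * c3,
            s1 * c2 * s3 + 1j * c1 * s2 * c3,
            s1 * s2 * c3 + 1j * c1 * c2 * s3]

def w_upper_bound(t1, t2, t3):
    """sum_i |c_i| of the decomposition in Eq. (19), i.e. the right-hand side of Eq. (20)."""
    u = u_coeffs(t1, t2, t3)
    total = 1.0  # diagonal terms: sum_alpha |u_alpha|^2 = 1
    for a in range(4):
        for b in range(a + 1, 4):
            z = u[a] * u[b].conjugate()
            total += 2 * (abs(z + z.conjugate()) + abs(z - z.conjugate()))
    return total

q = math.pi / 4
print(w_upper_bound(q, 0, 0))                              # CNOT-equivalent point: about 3
print(w_upper_bound(q, q, q))                              # SWAP: about 7
print(w_upper_bound(q, 0.202 * math.pi, 0.136 * math.pi))  # about 8.87
```

The printed values agree with the numbers quoted in the following subsection: 3 at $A_1$, 7 at $A_3$, and about 8.87 near the reported maximum.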

Behaviour of $W(\mathbf{U})$
Here, we numerically investigate the behavior of $W(\mathbf{U})$ as given by Eq. (20). Following Ref. [20], we restrict $\{\theta_\alpha\}$ to a domain in which no two points are locally equivalent. That is, the two-qubit unitary represented by a point $(\theta_1, \theta_2, \theta_3)$ cannot be transformed by single-qubit unitaries into the unitary of another point in the domain. Fig. 1 depicts such a domain of $\{\theta_\alpha\}$. There are exceptional local equivalences in the domain: every point on $A_1A_2A_3$ and $OA_2A_3$ is locally equivalent to $A_1A_2A_3$ and $OA_2A_3$, respectively. Since $W(\mathbf{U})$ is symmetric under the reflection of $\theta_x$, we only investigate the tetrahedron $OA_1A_2A_3$. Fig. 2 shows the behavior of $W(\mathbf{U})$ on the surfaces and edges of the domain. We found numerically that $W(\mathbf{U})$ is maximized at $(\theta_1,\theta_2,\theta_3) \approx (\pi/4, 0.202\pi, 0.136\pi)$, which lies on the surface $A_1A_2A_3$, with a value of approximately 8.87. The behavior of $W(\mathbf{U})$ appears unrelated to other measures such as the entangling power of $U$ [19,21]. For example, the point $A_1$ corresponds to controlled-$\sigma_\alpha$ gates, which can produce the maximal amount of entanglement, yet it has $W(\mathbf{U}) = 3$. By contrast, the point $A_3$, which corresponds to the swap gate, has $W(\mathbf{U}) = 7$. Although we believe the decomposition given in this work is close to optimal, this counter-intuitive result might be caused by its non-optimality.
Let $V$ be a sequence of gates consisting of alternating layers of single-qubit and two-qubit gates. Any quantum circuit can be written in this form. We write $V = D_LS_L\cdots D_2S_2D_1S_1$, where the $D_i$ and $S_i$ are two-qubit and single-qubit gates, respectively. Here, $\sigma^{(a)}_\alpha$ denotes a Pauli matrix acting on the $a$-th qubit. Focusing on the $i$-th two-qubit gate, we can express the expectation value of an observable $O$ at the end of the circuit as …, where … This decomposition also allows us to perform a "virtual" two-qubit gate on a quantum circuit: in $V_{i,\alpha_i}$, the $i$-th two-qubit gate of $V$ is replaced by $\sigma_{\alpha_i}$, which is a tensor product of local operations. We can do this with the following algorithm. Assume that $O$ is written as $O = \sum_j c_jP_j$, where each $P_j$ is a tensor product of Pauli operators. Under this assumption, we can evaluate $\langle 0|V^\dagger$ … It is trivial that $G(D_i)$ is always smaller than $W(D_i)$. Therefore, if we can measure $\langle\psi_{i,\alpha_i}|\psi_{k,i,\alpha_i}\rangle$, it is always better to use this approach. In a classical simulation, for example, $\langle\psi_{i,\alpha_i}|\psi_{k,i,\alpha_i}\rangle$ is easy to calculate. This is not the case for a quantum computer, in particular for a NISQ device. Measuring the overlap $\langle\psi_{i,\alpha_i}|\psi_{k,i,\alpha_i}\rangle$, including its phase, is a demanding task. One way to perform this task is to use a controlled-$V_{i,\alpha_i}$, as mentioned in, e.g., Refs. [16,22], but this is unlikely to be implementable on a NISQ device due to its complexity. Avoiding such complex operations has been the original motivation of this work and of our previous works [11,23]. Note that the well-known swap test [24,25] cannot be applied to this task, since it can only evaluate $|\langle\psi_{i,\alpha_i}|\psi_{k,i,\alpha_i}\rangle|^2$. Investigating the relation between $W(D_i)$ and $G(D_i)$ is left for future work.

Comparison with the previous work
In the previous work [11], we proposed a decomposition of gates of the form $e^{i\theta A_1\otimes A_2}$ for Hermitian operators $A_1$ and $A_2$ satisfying $A_1^2 = I$ and $A_2^2 = I$. This is a special case of the present work, recovered by setting $u_0 = \cos\theta$ and $u_\alpha = i\sin\theta$ for one chosen $\alpha\in\{1,2,3\}$. The cost overhead of this special case is therefore $1 + 2|u_0u_\alpha^* - u_\alpha u_0^*| = 1 + 2|\sin 2\theta|$, which attains its maximum of 3 at $\theta = \pi/4$. To decompose a general two-qubit gate of the form $\exp\left[i\sum_{\alpha=1}^{3}\theta_\alpha\sigma_\alpha\otimes\sigma_\alpha\right]$ with that technique, we would decompose each factor $\exp[i\theta_\alpha\sigma_\alpha\otimes\sigma_\alpha]$ separately. The overhead is then the product $\prod_{\alpha=1}^{3}(1 + 2|\sin 2\theta_\alpha|)$, which reaches its maximum $3^3 = 27$ at $\theta_\alpha = \pi/4$ for all $\alpha$. In contrast, $W(\mathbf{U})$ as given by Eq. (20), which quantifies the overhead of the present approach, equals 7 at this point, a substantial improvement.
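The comparison can be reproduced numerically. In this sketch (hypothetical names), `w_previous` applies the single-gate cost $1 + 2|\sin 2\theta_\alpha|$ to each of the three factors, and `w_upper_bound` evaluates Eq. (20) using closed-form $u_\alpha$ obtained by expanding the exponential, which is our own derivation. When only one $\theta_\alpha$ is non-zero, the two agree, as expected for a special case.

```python
import math

def u_coeffs(t1, t2, t3):
    """u_alpha of U = sum_alpha u_alpha s_alpha (x) s_alpha (closed forms derived by expansion)."""
    c1, c2, c3 = math.cos(t1), math.cos(t2), math.cos(t3)
    s1, s2, s3 = math.sin(t1), math.sin(t2), math.sin(t3)
    return [c1 * c2 * c3 + 1j * s1 * s2 * s3,
            c1 * s2 * s3 + 1j * s1 * c2 * c3,
            s1 * c2 * s3 + 1j * c1 * s2 * c3,
            s1 * s2 * c3 + 1j * c1 * c2 * s3]

def w_upper_bound(t1, t2, t3):
    """Single-step cost: the right-hand side of Eq. (20)."""
    u = u_coeffs(t1, t2, t3)
    total = 1.0
    for a in range(4):
        for b in range(a + 1, 4):
            z = u[a] * u[b].conjugate()
            total += 2 * (abs(z + z.conjugate()) + abs(z - z.conjugate()))
    return total

def w_previous(t1, t2, t3):
    """Cutting exp(i t_a s_a (x) s_a) one factor at a time, each costing 1 + 2|sin 2 t_a|."""
    return math.prod(1 + 2 * abs(math.sin(2 * t)) for t in (t1, t2, t3))

q = math.pi / 4
print(w_previous(q, q, q), w_upper_bound(q, q, q))  # about 27 versus about 7
```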
While we believe that the decomposition given in this work is close to optimal, there may be better decompositions with a smaller $W(\mathbf{U})$. Finding an optimal decomposition will require some form of numerical search. In the context of classical simulation of near-Clifford circuits, Ref. [16] performed such a search. Optimizing the decomposition considered here will be more complicated than in that work, because the set of available operations is infinite and parametrized by continuous parameters, as can be seen from Eq. (5). If the proposed decomposition is not optimal, we believe it can still serve as a good starting point for such an optimization, which we leave as future work.

Conclusion
We have introduced a quantity called the channel robustness of non-locality. It quantifies the minimal overhead required for decomposing non-local channels into local ones with a quasiprobability-based method. Calculating this quantity for general non-local channels is difficult because it requires a complicated optimization. Nevertheless, we have established an upper bound for a general two-qubit unitary channel. The upper bound is obtained by constructively deriving an explicit decomposition, and its overhead is substantially lower than that of the previous work [11]. We believe the present decomposition is close to optimal, but a better decomposition of a general two-qubit channel might exist, which we leave as possible future work. In this formalism, an experimentally challenging channel is decomposed into a linear combination of experimentally easy channels, so the decomposition can readily be performed on a quantum device.

Appendix A: Submultiplicativity of $W(\Phi)$
Lemma 1. Let $\Phi_1$ and $\Phi_2$ be any quantum channels, and let $\Phi_{21} = \Phi_2\Phi_1$. Then $W(\Phi_{21}) \le W(\Phi_2)W(\Phi_1)$.
Proof. Let
$$\Phi_\mu = \sum_i c_{\mu i}\mathbf{L}_{\mu i} \quad (\mu = 1, 2) \tag{23}$$
be decompositions with $\mathbf{L}_{\mu i}\in\mathcal{L}$ and $\sum_i|c_{\mu i}| = W(\Phi_\mu)$. Then $\Phi_{21}$ can be decomposed as
$$\Phi_{21} = \sum_{i,j}c_{2i}c_{1j}\,\mathbf{L}_{2i}\mathbf{L}_{1j}.$$
Because $\mathbf{L}_{2i}\mathbf{L}_{1j}\in\mathcal{L}$, this is a decomposition of $\Phi_{21}$ in the form of Eq. (2). Therefore,
$$W(\Phi_{21}) \le \sum_{i,j}|c_{2i}c_{1j}| = W(\Phi_2)W(\Phi_1).$$