Cutting multi-control quantum gates with ZX calculus

Circuit cutting, the decomposition of a quantum circuit into independent partitions, has become a promising avenue towards experiments with larger quantum circuits in the noisy intermediate-scale quantum (NISQ) era. While previous work focused on cutting qubit wires or two-qubit gates, in this work we introduce a method for cutting multi-controlled Z gates. We construct a decomposition and prove the upper bound $\mathcal{O}(6^{2K})$ on the associated sampling overhead, where $K$ is the number of cuts in the circuit. This bound is independent of the number of control qubits and can be further reduced to $\mathcal{O}(4.5^{2K})$ for the special case of CCZ gates. Furthermore, we evaluate our proposal on IBM hardware and experimentally show noise resilience due to the strong reduction of CNOT gates in the cut circuits.


Introduction
Quantum computing [1] in the noisy intermediate-scale quantum (NISQ) era is limited by the strong impact of noise and the small number of available qubits [2]. As a result, current hardware is far from being able to execute quantum algorithms with provable quantum advantage, such as Shor's [3] or Grover's [4] algorithm. To overcome these hardware restrictions to some extent, circuit-cutting techniques have recently attracted a lot of attention. When scaling up problem instances for the search of empirical quantum advantage in quantum machine learning tasks or for the quantum approximate optimization algorithm (QAOA) [5], these methods are expected to become relevant tools and will likely be an inherent part of near-term quantum software frameworks [6].
Consider a quantum circuit consisting of two partitions, only connected by a few wires or two-qubit gates. By decomposing the identity channel, Peng et al. [7] introduced wire cutting, where a qubit line is cut along the direction of time. This method was subsequently further investigated [8,9,10,11] with respect to the interplay between circuit cutting and noise [12,13,14,15], automatic allocation of classical and quantum computational resources [16,17] and compilation [18]. Similarly, Mitarai and Fujii [19,20,21] proposed gate cutting, the direct decomposition of a unitary channel corresponding to a two-qubit gate. As in probabilistic error mitigation [22,23,24,21], the variance of the estimator for the quantity to be measured increases [25] when a circuit is cut. As a consequence, all variants of circuit cutting come with a constant κ characterizing the sampling overhead: a factor of O(κ^2) more samples is required to estimate the decomposed circuit to the same accuracy as the original one. If K cuts are performed, the sampling overhead increases to O(κ^{2K}). Of course, the exponential overhead is in line with our expectation of classical hardness for simulation of general quantum circuits. Piveteau et al. [26] showed that this overhead can be significantly reduced when a common decomposition of several Bell states with subsequent gate teleportation is performed. Ref. [26] also provides optimal decompositions for several important two-qubit gate types based on the robustness of entanglement measure [27]. A strong reduction of sampling overhead can also be achieved for joint cutting of wires, as shown by Lowe et al. [28].

Christian Ufrecht: christian.ufrecht@iis.fraunhofer.de

Figure 1: Consider the quantum circuit shown on the left. It is a circuit with five qubits represented by the horizontal lines, also referred to as wires. Time evolution of the initial state is illustrated by the white and grey boxes, where time flows from the left to the right. They represent quantum gates, that is, unitary transformations on subsets of the qubits. The circuit contains a single gate V that connects the two partitions A and B, which are otherwise independent. After decomposition of the quantum channel corresponding to V, the circuit disintegrates into a weighted sum over independent circuit pairs, which contain quantum gates or measurements denoted by $\mathcal{F}^A_i$ and $\mathcal{F}^B_i$. Here, we employ the calligraphic notation otherwise reserved for superoperators to indicate that the gates or projectors are determined from the superoperator decomposition. The result of the original circuit can be restored by sequentially evaluating each of the circuits on possibly smaller quantum devices. Note that the equality has to be understood on the superoperator level rather than on the gate level.
In this work, we provide explicit decompositions of multi-controlled Z (MCZ) gates. We show the upper bound κ = 6, which, remarkably, is independent of the number of control qubits. For a CCZ gate, we find the smaller value κ = 4.5.
We conclude this article with experimental results on the IBM Q system Ehningen discussed in Sec. 5. We observe a strong reduction of the CNOT-gate count in the cut circuits and, therefore, resilience to noise. As a particular application of MCZ-gate cutting, we anticipate its use in the alternating operator ansatz [29,30], a variant of the QAOA [5] algorithm for constrained optimization. Another promising application is the simulation of MCZ gates connecting qubits far apart on the hardware graph, as done for two-qubit gates in Refs. [20,31].

Cutting a quantum gate
Consider a quantum circuit where the qubits are grouped into two partitions A and B, only connected by one possibly multi-qubit gate V as shown in Fig. 1. Furthermore, assume the factorizing initial state $\rho = \rho^A \otimes \rho^B$ and a factorizing observable $O = O^A \otimes O^B$, where the superscript labels the partition. These assumptions are, for example, satisfied when all qubits are initialized to |0⟩ and a Pauli string is measured. Circuit cutting, also referred to as circuit decomposition, circuit fragmentation or circuit knitting, is the task of finding a decomposition of the unitary channel $\mathcal{V}$ corresponding to the gate V so that

$$\mathcal{V} = \sum_i a_i \mathcal{F}_i, \qquad (1)$$

where

$$\mathcal{F}_i = \mathcal{F}^A_i \otimes \mathcal{F}^B_i$$

are local channels on the partitions A and B as shown in Fig. 1 and the coefficients $a_i$ are real. This decomposition allows us to determine the expectation value of the observable as

$$\langle O \rangle = \sum_i a_i \langle O^A \rangle_i \langle O^B \rangle_i. \qquad (4)$$

In Eq. (4) we denote by ⟨.⟩_i the expectation value with respect to the state on partition A and B, evolved by $\mathcal{F}^A_i$ and $\mathcal{F}^B_i$, respectively, that is

$$\langle O^\alpha \rangle_i = \mathrm{tr}\left[O^\alpha \mathcal{F}^\alpha_i(\rho^\alpha)\right]$$

for α = A, B. If the $\mathcal{F}^\alpha_i$ are unitary channels or measurement operations themselves, each term in the sum in Eq. (4) can be evaluated on a quantum computer. In the case of a more general quantum circuit, when all gates connecting the two partitions are cut, the partitions become independent and the quantum circuits can be evaluated sequentially on a smaller device. We emphasize that the term circuit cutting as used here deals with decompositions of quantum circuits at the level of superoperators, rather than at the level of unitaries as in Ref. [32].

The output of a single experimental shot on a quantum computer is typically a bitstring. The expectation value of an observable is then obtained by a classical post-processing function on the bitstrings of multiple runs. Modeling the outcome of each experimental run as i.i.d. random variables, the number of samples required to achieve a given additive error is determined via the variance of the post-processing function. When a circuit is cut, the variance of the modified estimator for ⟨O⟩ via Eq. (4) increases. Consequently, more experimental runs are required to estimate the result of the original circuit to the same given additive error. More specifically, this sampling overhead is exponential in the number of cuts. The parameter

$$\kappa = \sum_i |a_i| \qquad (6)$$

then quantifies the sampling overhead O(κ^2) [25]. For completeness, this scaling behavior is re-derived in Appendix E. Eq. (1) is therefore optimal if the $\mathcal{F}_i$ are chosen such that the 1-norm of the vector containing the coefficients $a_i$ is minimal. For K cuts the sampling overhead is O(κ^{2K}); however, joint cutting of multiple gates or wires leads to much smaller bounds for some gate types [26,28]. Tab. 1 summarizes recent findings for cutting of CZ gates and wire cutting and compares the sampling complexities to the main results of the present work. In this work we cut multi-qubit controlled Z gates and provide upper bounds for κ. As we will explore in the next section, ZX calculus is particularly suited for this task.
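To make the structure of Eq. (1) concrete, the following sketch checks a six-term decomposition of the CZ channel numerically. It is an illustration under stated assumptions, not the decomposition of Eq. (26): the term list `CZ_TERMS` was worked out for this example, and the helper names (`unitary`, `SIGNED_MEAS`, `apply_local`, `apply_cut_cz`) are hypothetical. The signed measurement operation plays the role of a one-qubit map of the form ρ → P_0 ρ P_0 − P_1 ρ P_1, and the 1-norm of the coefficients gives κ = 3, the value quoted for CZ gates.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.diag([1, -1]).astype(complex)
S = np.diag([1, 1j])
P0 = np.diag([1, 0]).astype(complex)
P1 = np.diag([0, 1]).astype(complex)

# A local operation is stored as a list of (sign, Kraus operator) pairs and
# acts as rho -> sum_k sign_k K_k rho K_k^dagger.
def unitary(u):
    return [(1.0, u)]

# Signed measurement: rho -> P0 rho P0 - P1 rho P1 (not a physical channel).
SIGNED_MEAS = [(1.0, P0), (-1.0, P1)]

# Each entry: (coefficient a_i, operation on partition A, operation on B).
CZ_TERMS = [
    (0.5, unitary(S), unitary(S)),
    (0.5, unitary(S.conj().T), unitary(S.conj().T)),
    (0.5, SIGNED_MEAS, unitary(I2)),
    (-0.5, SIGNED_MEAS, unitary(Z)),
    (0.5, unitary(I2), SIGNED_MEAS),
    (-0.5, unitary(Z), SIGNED_MEAS),
]

def apply_local(op_a, op_b, rho):
    """Apply the tensor product of two local operations to a two-qubit state."""
    out = np.zeros_like(rho)
    for sign_a, k_a in op_a:
        for sign_b, k_b in op_b:
            k = np.kron(k_a, k_b)
            out += sign_a * sign_b * (k @ rho @ k.conj().T)
    return out

def apply_cut_cz(rho):
    """Weighted sum over local operations, mirroring the form of Eq. (1)."""
    return sum(a * apply_local(f_a, f_b, rho) for a, f_a, f_b in CZ_TERMS)

kappa = sum(abs(a) for a, _, _ in CZ_TERMS)

# Compare against the exact CZ channel on a random two-qubit density matrix.
rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = m @ m.conj().T
rho /= np.trace(rho)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
exact = CZ @ rho @ CZ.conj().T
print(np.allclose(apply_cut_cz(rho), exact), kappa)
```

Storing every local operation as signed Kraus pairs keeps unitary and measurement terms on the same footing, which is convenient when only a numerical check is needed.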

ZX-calculus for circuit cutting
ZX calculus [33,34] is a tensor-network representation for quantum circuits that, together with powerful transformation rules, allows diagrammatic reasoning. Since ZX calculus has been reviewed elsewhere [35], we will only introduce those diagram types and transformation rules necessary for this article. The basic diagrams are Z-spiders with m wires on one side, n wires on the other side and phase α, defined as

$$|0\rangle^{\otimes n}\langle 0|^{\otimes m} + e^{i\alpha}\,|1\rangle^{\otimes n}\langle 1|^{\otimes m},$$

and X-spiders, defined as

$$|+\rangle^{\otimes n}\langle +|^{\otimes m} + e^{i\alpha}\,|-\rangle^{\otimes n}\langle -|^{\otimes m},$$

where |±⟩ = (|0⟩ ± |1⟩)/√2. For α = 0, the phase label inside the spider is commonly omitted. A third tensor, a so-called H-box, is defined as

$$\sum (-1)^{i_1 \cdots i_m j_1 \cdots j_n}\,|j_1 \ldots j_n\rangle\langle i_1 \ldots i_m|,$$

where the sum runs over all i_1, ..., i_m, j_1, ..., j_n ∈ {0, 1}. Spiders and H-boxes are therefore maps from m-qubit to (un-normalized) n-qubit states, signified by the number of wires ending at the right and the left of the diagrams. When representing ZX diagrams as matrices, we will implicitly assume the computational basis. Then, for example, an H-box is a matrix filled with ones but with a minus one in the lower right corner. H-boxes can be viewed as generalized Hadamard gates, since the H-box with one wire on each side equals √2 times the Hadamard matrix. As apparent from the definitions, spiders and H-boxes correspond to tensors that are symmetric in all indices. Any quantum circuit can be represented as a ZX diagram; the reverse statement, however, is incorrect since a ZX diagram does not necessarily correspond to a unitary matrix. To obtain the Hermitian conjugate of a ZX diagram, we move all wires ending at the left to the right and those ending at the right to the left and replace all angles with their negative values. In the following we show how to use ZX calculus to decompose the unitary channel corresponding to an MCZ gate into a sum over unitary and measurement channels. Decomposing circuits into simpler parts using ZX calculus has been done before in Refs. [36,37].
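The matrix form of these generators can be written down directly. The sketch below builds Z-spiders, X-spiders and H-boxes as NumPy arrays in the computational basis and checks two statements from the text: the one-in, one-out H-box is √2 times the Hadamard matrix, and the entries of a zero-input H-box are the diagonal of the MCZ matrix. The function names (`z_spider`, `x_spider`, `h_box`, `mcz`) and the choice of rows for outputs and columns for inputs are conventions of this example only.

```python
from functools import reduce
import numpy as np

HADAMARD = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def kron_power(op, k):
    """k-fold tensor power of op; returns the 1x1 identity for k = 0."""
    return reduce(np.kron, [op] * k, np.eye(1))

def z_spider(m, n, alpha=0.0):
    """|0..0><0..0| + e^{i alpha} |1..1><1..1| with m inputs and n outputs."""
    mat = np.zeros((2**n, 2**m), dtype=complex)
    mat[0, 0] = 1.0
    mat[-1, -1] += np.exp(1j * alpha)
    return mat

def x_spider(m, n, alpha=0.0):
    """Same as the Z-spider, but expressed in the |+>, |-> basis."""
    return kron_power(HADAMARD, n) @ z_spider(m, n, alpha) @ kron_power(HADAMARD, m)

def h_box(m, n):
    """All-ones matrix with a minus one in the lower right corner."""
    mat = np.ones((2**n, 2**m))
    mat[-1, -1] = -1.0
    return mat

def mcz(order):
    """MCZ = diag(1, ..., 1, -1), read off from a zero-input H-box."""
    return np.diag(h_box(0, order)[:, 0])

print(np.allclose(h_box(1, 1), np.sqrt(2) * HADAMARD))
print(np.diag(mcz(3)))
```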
An MCZ gate enjoys a simple representation [38], Eq. (11), in terms of an H-box with zero wires on the left, as shown in Appendix A. In matrix representation, MCZ = diag(1, ..., 1, −1), and we will refer to the number of qubits involved in the MCZ gate as the order of the gate. Also note that there is no difference between the control and target qubits in an MCZ gate. The decomposition constructed in this article is based on the H-box fusion rule, Eq. (12), proved in Appendix B for completeness. Consequently, with Eq. (12) applied to Eq. (11), the unitary channel $\mathcal{E}_{MCZ}(\rho) = MCZ\,\rho\,MCZ^\dagger$ corresponding to the MCZ gate of order n + m, acting on an arbitrary density matrix ρ, takes a form in which the two partitions (upper m qubits, lower n qubits) are only connected by the tensor Q shaded in grey, the un-normalized Choi operator of a Hadamard gate. Next, we remove the two remaining connections between the partitions by a rank-one decomposition of Q in terms of vectors w^(i) and u^(i) factorizing over the two partitions, Eq. (15). In Eq. (15) we introduced the diagrammatic notation for a two-dimensional vector v, where v_j is the j-th component of v. Therefore, the H-box fusion rule in Eq. (13) reduces the cutting of an MCZ gate to the decomposition of a four-by-four matrix. However, only those vectors w^(i) and u^(i) are useful whose contraction with the H-box results in channels that can be evaluated on a quantum computer. In Appendix C we prove the identities Eq. (17), valid for θ ∈ [0, 2π), and Eq. (18), where P_{1...1} denotes the projector on |1...1⟩. The identities Eq. (17) and Eq. (18) suggest choosing w^(i) and u^(i) from the vectors appearing in these identities for different values of θ. This choice guarantees a decomposition consisting of unitary controlled Z-rotation gates and projectors, which can be evaluated by mid-circuit measurements. Next, we expand Q in the basis spanned by the Pauli matrices and then substitute their spectral representation and, additionally, the identity matrix in terms of the eigenvectors of the Pauli Y matrix, which results in Eq. (19).
The choice of this expansion basis leads to κ = 3 for the normalized version of Q, which can be shown to be optimal with the help of the robustness of entanglement measure [26,27]. Unfortunately, Eq. (19) contains the state |+⟩, which is not covered by Eq. (17) and Eq. (18), and its contraction with an H-box does not result in a valid quantum circuit. We could simulate this operation by an MCZ gate on an ancilla qubit initialized in |+⟩ and postselected on the |+⟩ state [39], however, at the cost of one ancilla qubit per partition and cut. We circumvent this issue by substituting X = |+i⟩⟨+i| + |−i⟩⟨−i| − 2|−⟩⟨−| instead, where |±i⟩ are the eigenvectors of the Pauli Y matrix. By this substitution we lose optimality for the decomposition of Q. But note that it is unknown whether an optimal decomposition of Q translates into an optimal decomposition of the MCZ gate. While we show only upper bounds, we observe in Sec. 4 that, for a CZ gate, the general decomposition of an MCZ gate reduces to the optimal one. With these considerations in mind and defining Θ = {−π/2, 0, π/2, π}, we find the decomposition of Q stated in Eq. (23), where α_θ = −1 for θ = π and α_θ = 1 otherwise. This result leads to a general decomposition of the MCZ gate by substituting Eq. (23) into Eq. (13) and then contracting the vectors with the H-boxes with the help of Eq. (17) and Eq. (18).

Before we state the resulting decomposition explicitly, we rewrite the superoperator $\mathcal{P}_{1\ldots1}$ corresponding to the projector P_{1...1} in Appendix D, with the result stated in Eq. (24). Here, the operation $\mathcal{Z}$ abbreviates the sum over the unitary channels corresponding to all combinations of one-qubit identities and Z gates, that is, for an n-qubit state ρ,

$$\mathcal{Z}(\rho) = \frac{1}{2^n} \sum_{s \in \{0,1\}^n} Z^{s_1} \otimes \cdots \otimes Z^{s_n}\, \rho\, Z^{s_1} \otimes \cdots \otimes Z^{s_n}.$$

The second operation is $\mathcal{P}(\rho) = \sum_l \xi_l P_l \rho P_l$, where P_l are the projectors on all elements of the computational basis and ξ_l = −1 for l = (1, ..., 1) as well as ξ_l = 1 otherwise. We show in Appendix E how to relate $\mathcal{P}$ to circuits with intermediate measurements. Furthermore, we prove that circuits containing either $\mathcal{P}$ or $\mathcal{P}_{1\ldots1}$ have the same sampling complexity, but introducing $\mathcal{Z}$ leads to gate cancellations in the final decomposition. The consequence is a reduction of κ in the final result.
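The two operations can be checked numerically on a small example. The sketch below assumes the normalization 1/2^n for the sum over Z strings (the prefactor mentioned in Sec. 4) and verifies two facts that follow from the definitions alone: averaging over all Z strings removes the off-diagonal elements of ρ, and the difference between this average and the signed projection equals twice the projection onto |1...1⟩. Whether Eq. (24) is written in exactly this form is an assumption of the example; the function names are hypothetical.

```python
import itertools
from functools import reduce
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def z_average(rho):
    """Average of U rho U over all 2^n tensor products U of identities and Z gates."""
    n = rho.shape[0].bit_length() - 1
    out = np.zeros_like(rho)
    for bits in itertools.product([0, 1], repeat=n):
        u = reduce(np.kron, [Z if b else I2 for b in bits])
        out += u @ rho @ u
    return out / 2**n

def signed_projection(rho):
    """sum_l xi_l P_l rho P_l with xi = -1 for the all-one string, +1 otherwise."""
    xi = np.ones(rho.shape[0])
    xi[-1] = -1.0
    return np.diag(xi * np.diag(rho))

def project_all_ones(rho):
    """P_{1...1} rho P_{1...1}."""
    out = np.zeros_like(rho)
    out[-1, -1] = rho[-1, -1]
    return out

rng = np.random.default_rng(1)
m = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho = m @ m.conj().T
rho /= np.trace(rho)

dephased = z_average(rho)
print(np.allclose(dephased, np.diag(np.diag(rho))))
print(np.allclose((dephased - signed_projection(rho)) / 2, project_all_ones(rho)))
```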
We now state the main result of this article, the decomposition of an MCZ gate, Eq. (26). Here, S = R_z(π/2) denotes an S gate, and the operation $\mathcal{Z}$ is represented by its own symbol in circuit form. The decomposition shown here is for an MCZ gate of order five and a cut after the third qubit line counting from the top. We emphasize that, since the derivation was independent of the order of the MCZ gate and the cutting location, the structure of the decomposition in Eq. (26) remains unchanged for a cut at any position and arbitrary order of the MCZ gate. Eq. (26) has to be read in terms of channels. To evaluate a quantum circuit containing an MCZ gate, we have to successively replace the gate by the gates and operations shown in Eq. (26).

Sampling overhead
In this section we investigate the sampling overhead associated with Eq. (26) for different orders of the MCZ gate. The sampling overhead O(κ^2) is characterized by κ = Σ_i |a_i| as stated in Eq. (6). We re-derive this statement in detail in Appendix E. We also consider operations such as $\mathcal{P}$ and show that, with regard to sampling overhead, this operation can be treated in the same manner as a unitary channel. With this insight, we are now in the position to determine κ for Eq. (26). Recalling the pre-factor 1/2^n for the 2^n unitary circuits summarized by $\mathcal{Z}$ in Eq. (27), we find κ = 6 for a general MCZ gate. Remarkably, this bound holds independently of the order of the gate. In the right bracket of Eq. (26), the MCZ_j gate is the identity gate for j = 0 and $\mathcal{Z}$ also contains the identity gate, but the prefactors have a different sign. Consequently, summing these gates slightly reduces κ. This reduction becomes more pronounced the smaller the order of the MCZ gate is, but κ quickly approaches 6 for large order. Indeed, for a CZ gate, the right bracket vanishes and we find κ = 3, which is known to be optimal [26]. Similarly, a cut separating one qubit line from a general MCZ gate results in κ = 5. For a CCZ gate, we find κ = 4.5. Due to the importance of CCZ gates in quantum computing theory, its decomposition is stated explicitly. The sampling complexities derived above are summarized in Tab. 2 together with known results on the optimality of the decomposition.
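As a quick numerical illustration of these values, the following sketch tabulates the factor κ^{2K} for the four values of κ quoted in this section and a few numbers of cuts K. The dictionary labels and the function name are choices made for this example; the numbers are the leading factor of the O(κ^{2K}) scaling, not exact shot counts.

```python
def sampling_overhead(kappa: float, cuts: int) -> float:
    """Leading factor kappa^(2K) of the sampling overhead for K cuts."""
    return kappa ** (2 * cuts)

# Values of kappa quoted in the text.
KAPPAS = {
    "CZ": 3.0,
    "CCZ": 4.5,
    "MCZ, one qubit line separated": 5.0,
    "MCZ, general bound": 6.0,
}

for name, kappa in KAPPAS.items():
    factors = [sampling_overhead(kappa, k) for k in (1, 2, 3)]
    print(f"{name}: {factors}")
```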

Experiments on IBM Q
In this section, we show experimental results obtained on the ibmq_ehningen system. We will find a strong reduction of noise impact for the proposed cutting scheme.
We first run numerical simulations to validate the proposed MCZ cutting scheme. To this end, we generate a set of random 3-, 4- and 5-qubit circuits with one MCZ gate at the center of the circuit and two otherwise independent partitions. For exemplary circuits and a more detailed description of the circuit generation, see Supplemental Material. In all experiments, we measure the Z ⊗ ... ⊗ Z Pauli string. For the cut circuits, we allocate N_i = N|a_i|/(2κ) samples to each circuit of the circuit pair labeled by i, where N = 4κ^2/ϵ^2 bounds the standard deviation by ϵ, see Eq. (68) in Appendix E.2. For an uncut circuit, we allocate all N samples for evaluation. We first calculate N = 1.44 × 10^6 and N = 1.44 × 10^8 via Eq. (68), the number of repetitions required to bound the standard deviation of expectation values of the cut circuit to within ϵ = 0.01 and ϵ = 0.001, respectively. For a fixed number of qubits we generate 5 random circuits and sample each circuit N times using an ideal simulator. Repeating this process 20 times generates one hundred data points for each value of ϵ.
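The shot numbers quoted above follow directly from N = 4κ^2/ϵ^2 with κ = 6. The sketch below reproduces them and shows the allocation N_i = N|a_i|/(2κ) for a list of coefficients. The coefficient list `example_coeffs` is a hypothetical placeholder with κ = 3 and is not taken from Eq. (26); the function names are likewise illustrative.

```python
import math

def total_shots(kappa, eps):
    """N = 4 kappa^2 / eps^2, the total number of runs used in the text."""
    return 4 * kappa**2 / eps**2

def shots_per_circuit(coeffs, eps):
    """N_i = N |a_i| / (2 kappa) for each circuit of the circuit pair i."""
    kappa = sum(abs(a) for a in coeffs)
    n_total = total_shots(kappa, eps)
    return [n_total * abs(a) / (2 * kappa) for a in coeffs]

print(total_shots(6, 0.01))    # about 1.44e6
print(total_shots(6, 0.001))   # about 1.44e8

# Hypothetical coefficients, for illustration only (kappa = 3).
example_coeffs = [0.5, 0.5, 0.5, -0.5, 0.5, -0.5]
allocation = shots_per_circuit(example_coeffs, 0.01)
# Two circuits (partition A and B) are run per pair, hence the factor 2.
print(math.isclose(2 * sum(allocation), total_shots(3, 0.01)))
```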
We now compare the distributions of the expectation-value differences between the full (that is, uncut) and cut circuits for different values of ϵ. We run simulations for 3, 4 and 5 qubits and find almost no difference in the standard deviations of the distributions. This behavior is expected since κ is independent of the order of the MCZ gate. As an example, Fig. 2 shows the distribution for random 5-qubit circuits.
In the figure, the white boxes are the interquartile ranges and the bars denote the 5% and 95% quantiles. Fig. 2(1) shows the distribution of the expectation values for the original uncut circuits. As expected, the distribution becomes broader when we instead evaluate the cut circuit in Fig. 2(2). Its standard deviation is 0.0027, which is smaller than the chosen value of ϵ = 0.01. The same effect is observed for ϵ = 0.001, shown in the plots in Fig. 2(3) and Fig. 2(4). Consequently, a tighter version of Eq. (68) to bound the variance might exist.

To further validate the proposed method on real quantum hardware and analyze the impact of noise, we performed MCZ gate-cutting experiments on the ibmq_ehningen [40] device. The ibmq_ehningen device is superconducting-qubit hardware with a 27-qubit ibmq_falcon processor. The native gate set on this hardware consists of Z-rotation, Pauli-X, √X, identity, and CNOT gates. For the experiments, we again generated a random circuit with one free parameter that was scanned in the experiments. The maximum number of shots per job on the ibmq_ehningen device is 10^5 and the same number was used for each data point; see Supplemental Material for the explicit circuits and a more detailed discussion of the experimental setup. Fig. 4 shows the result for a three-qubit circuit containing a CCZ gate. In the figure, the results obtained by cutting the CCZ gate (green circles) lie slightly closer to the exact curve (blue line) than the results from the uncut circuit (orange triangles). Next, we repeated the experiment with a five-qubit circuit containing an MCZ gate of order five. The result is shown in Fig. 5.

From the figure, it is clear that the influence of noise is extreme when executing the transpiled version of the full circuit with an MCZ gate of order five. The resulting expectation values were essentially random output centered around zero. However, the expectation value estimated using the proposed method falls much closer to the expectation values obtained by ideal simulation. This resilience to noise can be attributed to the strong reduction of the CNOT-gate count in the cut circuits. While, asymptotically, an MCZ gate of order n can be synthesized with O(n) CNOT gates with one auxiliary qubit [41], for small n the number of required CNOT gates increases quickly. Consequently, cutting, for example, two qubits from an MCZ gate of order five reduces the maximum number of CNOT gates in a circuit to those required to synthesize a double-controlled S gate. This reduction becomes even more pronounced after compiling to topologically restricted hardware. As a consequence, we observe a strong reduction of noise impact. The maximum number of CNOT gates present in the different circuits is shown in Tab. 3. The number of CNOT gates might vary based on the transpilation method used. However, we chose to proceed with the best transpiler offered by the hardware manufacturer to validate our results. As most of the work in the literature is conducted via a simulator, we extended our 5-qubit MCZ experiment shown above to a noisy simulator. The results are shown in Fig. 3. The proposed method still outperforms the full circuit execution even though the noise model provided by the manufacturer does not accurately capture the noise level exhibited by the real device.

Conclusion
In this work, we proposed an approach for cutting multi-controlled Z gates by means of ZX calculus based on the H-box fusion rule. We derived the upper bound κ = 6 on the sampling overhead that is independent of the order of the gate. We validated the results on IBM hardware and found strong noise reduction due to the reduced number of CNOT gates in the cut circuit. We anticipate the generalization of our method to multi-controlled rotation gates and extension to multi-qubit rotations. The optimality of the decompositions constructed in this work at present remains an open question.

and b′ running over {0, 1}, j ∈ {0, 1}^n and i ∈ {0, 1}^m, which is equivalent to the left-hand side of Eq. (12).
In the last step we recalled that π_i, π_j ∈ {0, 1}. Alternatively, the H-box fusion rule can be derived by the Schmidt decomposition of the vector (1, ..., −1)^T representing the diagonal elements of the unitary corresponding to the MCZ gate.

C H-box identities
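In this appendix, we prove Eq. (17) and Eq. (18). The contraction of a single-qubit vector w = (w_0, w_1)^T with an H-box is readily calculated by matrix-vector multiplication: the resulting tensor has the entry w_0 + w_1 for all index values except for the all-one index, where the entry is w_0 − w_1. Consequently, making use of Eq. (33), we find the diagonal operator

$$\mathrm{diag}(w_0 + w_1, \ldots, w_0 + w_1, w_0 - w_1). \qquad (42)$$

Therefore, Eq. (42) is proportional to a unitary for |w_0 + w_1| = |w_0 − w_1| and to a projector on the state |1...1⟩ for w_0 = −w_1. In terms of spiders, these two conditions are satisfied either for the X-spider state w = |+⟩ + e^{iθ}|−⟩ = (1 + e^{iθ}, 1 − e^{iθ})^T/√2 or for the Z-spider state with phase π, w = (1, −1)^T. The former results in Eq. (17) and the latter in Eq. (18).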
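The two cases can be verified numerically. The sketch below contracts a two-component vector with one leg of an H-box and compares the result with a multi-controlled phase gate and with the projector on |1...1⟩. It assumes that the vectors in question are w = |+⟩ + e^{iθ}|−⟩ and w = (1, −1)^T; the function names and the proportionality constants √2 and 2 are specific to this example and its un-normalized convention.

```python
import numpy as np

def contract_with_h_box(w, n):
    """Contract the vector w with one leg of an H-box that has n + 1 legs.

    The remaining n legs are returned as a diagonal (2^n x 2^n) operator.
    """
    h = np.ones((2, 2**n), dtype=complex)   # H-box entries (-1)^(a * j_1...j_n)
    h[1, -1] = -1.0
    return np.diag(np.asarray(w, dtype=complex) @ h)

def x_state(theta):
    """Un-normalized X-spider state |+> + e^{i theta} |->."""
    phase = np.exp(1j * theta)
    return np.array([1 + phase, 1 - phase]) / np.sqrt(2)

n, theta = 3, 0.7

# Case 1: proportional to a multi-controlled phase gate diag(1, ..., 1, e^{i theta}).
controlled_phase = np.eye(2**n, dtype=complex)
controlled_phase[-1, -1] = np.exp(1j * theta)
print(np.allclose(contract_with_h_box(x_state(theta), n), np.sqrt(2) * controlled_phase))

# Case 2: w = (1, -1) gives twice the projector on |1...1>.
projector = np.zeros((2**n, 2**n), dtype=complex)
projector[-1, -1] = 1.0
print(np.allclose(contract_with_h_box([1, -1], n), 2 * projector))
```

For θ = ±π/2 the first case yields a multi-controlled S or S† gate, for θ = π an MCZ gate and for θ = 0 the identity.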

E Derivation of sample-complexity overhead
This appendix investigates in more detail the sampling overhead associated with evaluating a cut circuit, closely following Refs. [24,7]. In Appendix E.1 we prove the statements on sampling overhead made in the main text for the case where, in each experimental run, a circuit from the decomposition is sampled from a probability distribution. Subsequently, in Appendix E.2 we bound the variance by pre-estimating the expectation values of the partitioned circuits.

E.1 Sample complexity for circuit sampling
Consider an n-qubit quantum circuit. Assume the initial state |0⟩^{⊗n} and consider the post-processing function f: {0, 1}^n → [−1, 1] on the measured bitstring s. Furthermore, assume that f(s) can be efficiently calculated classically. These definitions describe the computational model of Refs. [42,7]. The post-processing function gives rise to an observable

$$O = \sum_s f(s) P_s,$$

where P_s is the projector on the computational basis state corresponding to bitstring s. Conversely, for instance, if the observable is a Pauli string, it can be written in the above form by local diagonalization and viewing the unitary diagonalization matrix as part of the circuit. The goal of the quantum computation is to approximate ⟨O⟩ to additive error ϵ with high probability, for which we will define statistical estimators in the following.
Cutting a gate amounts to replacing its superoperator by a decomposition as in Eq. (1). We first consider the case where all $\mathcal{F}_i$ in the decomposition correspond to unitary gates. With this substitution, we find

$$\langle O \rangle = \sum_i a_i \langle O \rangle_i = \sum_i a_i \sum_s f(s)\, p(s|i), \qquad (50)$$

where ⟨.⟩_i denotes the expectation value with respect to the state evolved by circuit i, containing $\mathcal{F}_i$. Moreover, p(s|i) is the probability to measure bitstring s given circuit i. To evaluate the result on a quantum device, we sample i at each experimental run according to the probability distribution

$$p(i) = \frac{|a_i|}{\kappa}. \qquad (51)$$

We therefore define a random variable I that takes values i with probability p(i) and the estimator [25]

$$\hat f = \kappa\, \mathrm{sign}(a_I)\, f(S_I), \qquad (52)$$

where the random variable S_{I=i} models the bitstring outcomes of the circuit i. This estimator is unbiased since

$$E[\hat f] = \sum_i p(i) \sum_s p(s|i)\, \kappa\, \mathrm{sign}(a_i)\, f(s) = \sum_i a_i \sum_s p(s|i)\, f(s) = \langle O \rangle. \qquad (53)\text{-}(56)$$

Here, we substituted Eq. (51) into Eq. (54). The final result is obtained by comparing Eq. (55) to Eq. (50). The estimator for N shots is given by the sample mean of Eq. (52). Thus, to estimate ⟨O⟩ to additive error ϵ with probability 1 − δ, Hoeffding's inequality provides the required number of experimental repetitions as

$$N = \frac{2\kappa^2}{\epsilon^2} \ln\frac{2}{\delta}, \qquad (57)$$

where we used that |f̂| ≤ κ. The number of samples needed for the original circuit is obtained for κ = 1 in Eq. (57), from which we infer the sampling overhead O(κ^2).
It is straightforward to generalize this derivation to K cuts. In this case, the expectation value of the observable is obtained as

$$\langle O \rangle = \sum_{i_1, \ldots, i_K} a_{i_1} \cdots a_{i_K} \langle O \rangle_{i_1 \ldots i_K} \qquad (58)$$

and the estimator for K cuts becomes

$$\hat f = \kappa^K \prod_{k=1}^{K} \mathrm{sign}(a_{I_k})\, f(S_{I_1 \ldots I_K}), \qquad (59)$$

where we defined the independent random variables I_1, ..., I_K that determine the specific circuit to run. To show equality between the expectation value of Eq. (59) and Eq. (58), we follow Eq. (53) to Eq. (56) and make use of the independence of I_1, ..., I_K. Since |f̂| ≤ κ^K, Hoeffding's inequality applied to the sample mean of Eq. (59) provides the bound O(κ^{2K}) on the sampling overhead. If the observable factorizes over the partitions A and B of the original circuit, the post-processing function factorizes as well, that is, f(s) = f_A(s_A) f_B(s_B), where s_A and s_B are the bitstring results on partition A and B. If the initial state factorizes and all gates connecting the two partitions are cut, the partitioned circuits can be evaluated on independent quantum computers or sequentially on the same device.
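The sampling procedure of this subsection can be mimicked classically. The sketch below draws a circuit index with probability |a_i|/κ, draws a ±1 outcome whose mean is a prescribed per-circuit expectation value, and averages κ sign(a_i) f(s) over all shots. The coefficients, the per-circuit expectation values and the restriction to ±1-valued outcomes are hypothetical stand-ins for a real quantum device; the shot count is the standard Hoeffding bound for a variable bounded by κ.

```python
import numpy as np

def hoeffding_shots(kappa, eps, delta):
    """Shots so that |sample mean - mean| < eps with probability >= 1 - delta,
    for an estimator taking values in [-kappa, kappa]."""
    return int(np.ceil(2 * kappa**2 * np.log(2 / delta) / eps**2))

def sample_cut_estimator(coeffs, circuit_means, shots, rng):
    """Sample mean of kappa * sign(a_I) * f(S_I), with I drawn from |a_i| / kappa."""
    a = np.asarray(coeffs, dtype=float)
    e = np.asarray(circuit_means, dtype=float)
    kappa = np.abs(a).sum()
    idx = rng.choice(len(a), size=shots, p=np.abs(a) / kappa)
    # Stand-in for running circuit i: outcome +1 with probability (1 + e_i) / 2.
    outcomes = np.where(rng.random(shots) < (1 + e[idx]) / 2, 1.0, -1.0)
    return np.mean(kappa * np.sign(a[idx]) * outcomes)

# Hypothetical decomposition with kappa = 3 and made-up circuit expectation values.
coeffs = [0.5, 0.5, 0.5, -0.5, 0.5, -0.5]
circuit_means = [0.8, -0.3, 0.1, 0.6, -0.7, 0.2]
exact = float(np.dot(coeffs, circuit_means))

eps, delta = 0.05, 0.01
shots = hoeffding_shots(3.0, eps, delta)
rng = np.random.default_rng(7)
estimate = sample_cut_estimator(coeffs, circuit_means, shots, rng)
print(shots, exact, estimate)
```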
We now turn to a quantum circuit that contains projectors. Consider the map $\mathcal{M}$ consisting of a complete set of projectors P_l and weights ξ_l with |ξ_l| ≤ 1,

$$\mathcal{M}(\rho) = \sum_l \xi_l P_l \rho P_l, \qquad (60)$$

for state ρ. This map is neither positive, since ξ_l can be smaller than zero, nor trace preserving, since the state in Eq. (60) is not normalized after the projection and multiplied by ξ_l. Note that both $\mathcal{P}_{1\ldots1}$ and $\mathcal{P}$ of Eq. (24) are instances of $\mathcal{M}$. For $\mathcal{P}_{1\ldots1}$ we set ξ_l = 1 only for l = (1, ..., 1) and zero otherwise. On the other hand, for $\mathcal{P}$ we have ξ_l = −1 for l = (1, ..., 1) and ξ_l = 1 otherwise. Even though $\mathcal{M}$ does not correspond to a physical time evolution, we can nevertheless estimate

$$\chi = \mathrm{tr}\left[O\, \mathcal{U}(\mathcal{M}(\rho))\right] \qquad (61)$$

by repeated use of a quantum computer. In Eq. (61), the density matrix ρ is the state right before the projectors and $\mathcal{U}$ is the unitary channel corresponding to the unitary evolution U afterwards until the final measurement. To estimate χ on a quantum computer, we perform intermediate measurements and sample according to the estimator

$$\hat f = \xi_L\, f(S), \qquad (62)$$

where the random variable S describes the bitstring outputs of the final measurements as before and the random variable L models the outcomes of the intermediate measurements. According to Eq. (62), for $\mathcal{P}$ we have to add −f(S) to the sample mean for f̂ if we found the all-one state in the intermediate measurement and f(S) otherwise. This estimator is unbiased since

$$E[\hat f] = \sum_{s,l} p(s,l)\, \xi_l\, f(s) \qquad (63)$$
$$= \sum_{s,l} \mathrm{tr}(P_s U P_l \rho P_l U^\dagger)\, \xi_l\, f(s) \qquad (64)$$
$$= \mathrm{tr}\left[O\, \mathcal{U}(\mathcal{M}(\rho))\right] = \chi.$$

The first line, Eq. (63), is the definition of the expectation value and in Eq. (64) we substituted p(s, l) = tr(P_s U P_l ρ P_l U†) [1]. In the final step, we identified O = Σ_s f(s) P_s. The number of samples required to determine χ to given additive error is again determined by Hoeffding's inequality applied to the sample mean of Eq. (62). Since |ξ_l| ≤ 1 by definition, Eq. (57) remains unchanged when some of the circuits contain projectors.
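A small classical simulation illustrates why weighting each shot with ξ_L f(S) reproduces χ. The sketch below uses a random two-qubit state, a random unitary, the parity of the final bitstring as f and the weights of the signed projection (−1 for the all-one intermediate outcome). All names are hypothetical, and the joint distribution p(s, l) = ρ_ll |U_sl|^2 is sampled classically in place of a device with mid-circuit measurements.

```python
import numpy as np

def exact_chi(rho, u, f, xi):
    """chi = tr[O U M(rho) U^dagger] with O = diag(f), M(rho) = sum_l xi_l P_l rho P_l."""
    m_rho = np.diag(xi * np.diag(rho))
    return float(np.real(np.trace(np.diag(f) @ u @ m_rho @ u.conj().T)))

def sampled_chi(rho, u, f, xi, shots, rng):
    """Sample mean of xi_L * f(S), with p(s, l) = rho_ll * |U_sl|^2."""
    dim = rho.shape[0]
    p_l = np.real(np.diag(rho))
    p_l = p_l / p_l.sum()
    l = rng.choice(dim, size=shots, p=p_l)        # intermediate outcomes
    cond = np.abs(u) ** 2                          # cond[s, l] = p(s | l)
    cdf = np.cumsum(cond[:, l], axis=0)            # shape (dim, shots)
    s = np.minimum((rng.random(shots) > cdf).sum(axis=0), dim - 1)
    return float(np.mean(xi[l] * f[s]))

rng = np.random.default_rng(3)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = g @ g.conj().T
rho /= np.trace(rho)
u, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))

f = np.array([1.0, -1.0, -1.0, 1.0])    # parity of the final bitstring (Z x Z)
xi = np.array([1.0, 1.0, 1.0, -1.0])    # -1 only for the all-one outcome

chi = exact_chi(rho, u, f, xi)
estimate = sampled_chi(rho, u, f, xi, 200_000, rng)
print(chi, estimate)
```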

E.2 Sampling complexity by pre-estimating circuits
In Sec. 5, we show experimental results for a circuit that disintegrates into two independent partitions A and B. After cutting the single gate that connects the two partitions, we have to evaluate

$$\langle O \rangle = \sum_i a_i \langle O^A \rangle_i \langle O^B \rangle_i. \qquad (67)$$

Rather than sampling a circuit pair i for each experimental run as discussed in Appendix E.1, we pre-estimate the expectation values ⟨O^A⟩_i and ⟨O^B⟩_i for all the circuits first and subsequently restore the result of the original circuit with the help of Eq. (67). In the following, it is shown that the standard deviation of the estimator for Eq. (67) can be bounded by ϵ for a total number of experimental runs

$$N = \frac{4\kappa^2}{\epsilon^2}, \qquad (68)$$

where we allocate

$$N_i = \frac{N |a_i|}{2\kappa}$$

samples to each circuit of circuit pair i to determine the expectation values ⟨O^A⟩_i and ⟨O^B⟩_i. Note that 2 Σ_i N_i = N. Next, we define the estimator

$$\hat f = \sum_i a_i\, \hat f^A_i\, \hat f^B_i,$$

where $\hat f^A_i$ and $\hat f^B_i$ denote the sample means estimating ⟨O^A⟩_i and ⟨O^B⟩_i. Since $\hat f^A_i$ and $\hat f^B_i$ are independent, this estimator is unbiased. The variance of $\hat f$ is calculated in Eq. (71) to Eq. (73).

CCZ gate and an MCZ gate of order 5, respectively. The circuits are shown in Fig. 8 and Fig. 9. Subsequently, we optimized the single-qubit rotation angles until the difference of the expectation values of a Pauli Z-string Z ⊗ ... ⊗ Z with respect to the circuit with and without the MCZ gate was maximal. Finally, we added one free parameter to the rotation angles of three single-qubit rotation gates, which were scanned in the experiments. The parameter is denoted by psi in the circuits shown in Fig. 8 and Fig. 9.
(2) We used the highest level of optimization offered by the Qiskit API after transpiling the circuits to the native gate set and the hardware graph of the ibmq_ehningen device. The highest level of optimization first searches for a layout that maps all two-qubit gates onto the connectivity of the hardware graph, taking the qubit readout errors and gate fidelities into account. Then the circuit is unrolled to the native gate set. Finally, optimizations in the form of commutative gate cancellation and re-synthesis of two-qubit unitary blocks are performed.
(3) 18 equally spaced data points in the interval [0, 2π] were chosen for scanning the free parameter.
(4) We used 10^5 shots per data point, the maximum number of shots per job allowed by the ibmq_ehningen device.
(5) All experiments were run with maximum error mitigation offered by ibmq wherever possible. We used the TREX option for readout-error mitigation [44] offered by Qiskit.
(6) The ibmq_ehningen device can be characterized by the following parameters [40]: It is a 27-qubit ibmq_falcon processor, whose connectivity graph is shown in Fig. 10. The noise parameters at the time of our experiments were the following. Decoherence times: average values T_1 = 160 µs and T_2 = 150 µs with large fluctuations between the qubits. Single-qubit errors are of the order of 10^{−4} and CNOT-gate errors of the order of 0.009.

Figure 5: Expectation value estimation of a parameterized circuit containing a 5-qubit MCZ gate on the ibmq_ehningen device. The green circles, sampled from the cut circuits, still qualitatively match the exact result (blue line). The signal sampled from the uncut circuit (orange triangles) is completely degraded due to the strong influence of noise.

Figure 9: Circuit for the cutting of an MCZ gate of order 5.

Table 1: Overview of recent findings on CZ and wire cutting. We contrast the sampling overhead for K cuts associated with different methods and compare them to the main results of the present work.
overhead of different methods in Sec. 2, we prove the main result of this article, the general decomposition of an MCZ gate, in Sec. 3 with the help of ZX calculus. In Sec. 4, we evaluate the sampling complexity associated with the decomposition of an MCZ gate.

Table 2: Sampling overhead for different orders and cutting positions for an MCZ gate. In addition, we state known results on the optimality of the decomposition.

Table 3: Number of CNOT gates present in the transpiled version of the circuits shown in the Supplementary Material (first column). After circuit cutting, the maximum number of CNOT gates contained in one of the cut circuits reduces considerably (second column).
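In Eq. (71) we first factorized the variance, which is valid for independent random variables. Eq. (73) makes use of the bounds $E[(\hat f^B_i)^2] \le 1$ and $|E(\hat f^A_i)| \le 1$ since $|\hat f^A_i| \le 1$ and $|\hat f^B_i| \le 1$. Finally, $\mathrm{Var}(\hat f^\alpha_i) \le 1/N_i$ for N_i independent samples. Consequently, with N_i = N|a_i|/(2κ),

$$\mathrm{Var}(\hat f) \le \sum_i a_i^2\, \frac{2}{N_i} = \frac{4\kappa^2}{N} = \epsilon^2.$$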