Faster Born probability estimation via gate merging and frame optimisation

Outcome probability estimation via classical methods is an important task for validating quantum computing devices. Outcome probabilities of any quantum circuit can be estimated using Monte Carlo sampling, where the amount of negativity present in the circuit frame representation quantifies the overhead on the number of samples required to achieve a certain precision. In this paper, we propose two classical sub-routines: circuit gate merging and frame optimisation, which optimise the circuit representation to reduce the sampling overhead. We show that the runtimes of both sub-routines scale polynomially in circuit size and gate depth. Our methods are applicable to general circuits, regardless of generating gate sets, qudit dimensions and the chosen frame representations for the circuit components. We numerically demonstrate that our methods provide improved scaling in the negativity overhead for all tested cases of random circuits with Clifford+$T$ and Haar-random gates, and that the performance of our methods compares favourably with prior quasi-probability simulators as the number of non-Clifford gates increases.

The question of efficient probability estimation has recently received vast attention due to the ongoing rapid development of quantum devices aiming to supersede classical capabilities (e.g. [25][26][27][28]). Aided by powerful error mitigation techniques [29][30][31][32][33], noisy intermediate-scale quantum (NISQ) [34] devices aim to deliver computational advantages, therefore fast and accurate outcome probability estimation is a necessity for quantitative benchmarking of the devices [33,35,36]. For example, Google's recent experimental realisation of a quantum speed-up [26] relies on classical estimation methods to predict statistical features of the outcome probabilities.
It is expected that exact classical simulation of arbitrary quantum systems is inefficient, as the resource overhead exponentially grows with the size of the system. Nevertheless, there are restricted classes of quantum circuits for which exact classical simulation is possible [37]. The most notable example is given by circuits composed only with stabilizer states and gates in the Clifford group, which can be efficiently simulated classically via the Gottesman-Knill theorem [38].
Probability estimation methods are varied and aim to explore the efficiency of circuit sampling or simulation beyond the regime of quantum cir-cuits that admit tractable classical representations. The estimation methods we mention in our work can be classified under one of two leading approaches [39,40]. The first involves stabilizer rank-based simulators [41][42][43][44][45][46][47], which rely on approximating the circuit components by stabilizer operators. Every state or operation is assigned an exact or approximate stabilizer rank [43] indicating the number of stabilizer operators required to perform an exact or approximate decomposition of that component. If a circuit component is non-classical, its stabilizer rank grows large thus inducing an exponential runtime cost for the estimation of the outcome probability. Algorithms based on stabilizer decompositions have been very successful in estimating outcome probabilities of circuits dominated by Clifford gates and supplemented by a few types of magic states [39,41,42]. These Clifford simulators are generalised from pure to noisy settings by the recently proposed density-operator stabilizer-rank simulator [40]. Furthermore, computing the stabilizer rank of arbitrary gates appears to be an intractable problem in the general case, so recent improvements on computing stabilizer rank bounds for specific non-Clifford states enhance runtimes significantly [48].
The other family of estimation methods relies on quasi-probabilistic representations of circuit components [36,40,[49][50][51][52] and such methods are in principle directly applicable to any quantum circuit without the need for state decompositions, in particular circuits with induced noise. They are based on the notion of a frame representation for the components of the circuit [53,54]. Specifically, all components are represented by quasi-probability distributions in a certain frame and sampling on these distributions can be performed. Since any state or gate admits such a representation, quasi-probability simulators naturally apply to arbitrary circuits with noise. Many such frames have been studied [51,[53][54][55][56][57][58], and the runtime depends on the total negativity that is present in the circuit representation [49]. A notable frame simulator is the dyadic frame simulator [40] which relies on operator decompositions into stabilizer dyads |L R|, where |L and |R are pure stabilizer states. This method assigns dyadic negativity to non-classical elements, which quantifies the extent to which the operator's optimal linear decomposition into stabilizer dyads departs from a convex combination. The dyadic simulator is a state-of-the-art quasi-probability frame simulator for qubits, as demonstrated by its low runtime scaling O(4 0.228t ) with t non-Clifford gates [40]. However, optimising the decomposition of an operator in dyads is computationally challenging.
Stabilizer rank simulators generally offer two advantages over estimation methods based on frame representations. Firstly, they can be used for sampling the circuit output probabilities, which can be viewed as a stronger notion of simulation than probability estimation. Frame representation methods produce probability estimates with additive precision, which does not suffice for sampling [17,42]. Secondly, the stabilizer-rank algorithms developed in [40][41][42] are quadratically faster as they achieve a scaling of O(2 0.228t ) in the asymptotic limit. However, specialised simulators (e.g. [41,42]) suffer from additional polynomial runtime factors, which tend to be more significant compared to the exponential runtime for the experimentally relevant case of big circuits with a low number of non-Clifford elements. Recently, an algorithm of additive precision [39] has also been shown to asymptotically outperform the methods of [41,42], at least in certain parameter regimes.
In this paper, we focus on quasi-probability estimation methods based on frame representations and look for a way to improve the performance of outcome probability estimation. Recently, there has been a proposal of a Monte Carlo sampling algorithm which allows for quasiprobability estimation of circuits that contain a bounded amount of negativity in their representation [49]. For classes of circuits in which negativity grows only polynomially in the number of input states, this estimation algorithm is efficient. The negativity of the circuit therefore indicates the hardness of the probability sampling problem. Although the negativity scales exponentially with the number of non-Clifford gates, the scaling factors hugely depend on the frame choice. Until now, however, the same fixed representation has been applied on every circuit component and the flexibility on reducing negativity has been limited.
Our aim is to explore the extent to which varying the frame representations of the components in a given circuit can lead to a reduction in the total circuit negativity. To this end, we propose a pre-processing routine for any general quantum circuit, which aims at reducing the negativity overhead required for probability estimation. Our proposed routine consists of two distinct subroutines: 1. Circuit gate merging: We introduce the idea of merging gates together into new n-qudit gates for fixed n in the context of reducing sampling overhead. This sub-routine reduces the negativity of the entire circuit and is independent of the estimation method used.
We demonstrate numerically that the average negativity reduction over a random ensemble of circuits is greater as the number of non-Clifford elements, e.g., T gates, increases and is comparable to recent asymptotic negativity bounds [40,59,60]. Our routine does not depend on the specifics of the circuit gate set and can therefore be used in cases of gates which are hard to decompose, e.g., Haar-random gates.

Frame optimisation:
We introduce the idea of using different frames to represent the input and output phase spaces of the gates in the circuit. This is inspired by work in continuous variables [61], but our approach is novel in the context of discrete quasiprobability sampling methods.
We argue that this sub-routine compliments gate merging as an additional source of negativity reduction when merging is no longer efficient. We then demonstrate numerically that instances of Clifford+T circuits and circuits with Haar-random gates admit significant negativity reductions by introducing additional frames in the circuit representation.
We note that a polynomial runtime for these classical sub-routines with respect to the circuit size should be guaranteed to effectively reduce the overall runtime of the sampling method. As proof of principle, we provide explicit algorithms in the main text that ensure this condition for each subroutine. This paper is organised as follows. In Section 2, we review the frame representation and the estimation algorithm using quasi-probability repre-sentations of a given quantum circuit. In Section 3, we outline our results within the context of the current state of quasi-probability simulator research. In Section 4 and 5 we describe the two sub-routines in more detail, before providing a summary in Section 6.

Frame representation of quantum circuits
We first give a brief overview on classical circuit sampling based on the method of frame representation. Suppose that an N -qudit quantum circuit C is composed of the initial state preparation ρ, sequential quantum gates U 1 , U 2 , . . . , U L and the measurement effect E.
The outcome probability of the quantum circuit can be estimated by describing the quantum state ρ as quasiprobability distributions over phase space points λ ∈ Z 2N d and the quantum operations U i as the transition matrices of the distributions. More specifically, a phase space can be constructed from a frame defined as a set of operators F := {F (λ)} and its dual G := {G(λ)} [53,54], such that any operator O is expressed as (1) For a given frame, the outcome probability can be expressed in terms of the representation as In the case where 1) ρ and E are products of local initial states and measurement effects, 2) W ρ (λ 0 ) and W E (λ L ) are classical probability distributions, and 3) W U l (λ l |λ l−1 ), for = 1, . . . , L, are classical conditional probability distributions for all l, efficient classical simulation is possible, where the sampling runtime scales polynomially with N and L [62]. The simulation is performed by sampling the trajectories of (λ 0 , . . . , λ L ) from the initial distribution P (λ 0 ) = W ρ (λ 0 ) and the transition matrix at each step P l (λ l |λ l−1 ) = W U l (λ l |λ l−1 ), which leads to the probability estimate,p C = W E (λ L ). Taking an average over M probability estimates converges to the Born probability as M increases.

Overhead of classical simulation
Non-classicality in the quantum process is represented by negativities in W ρ (λ) or W U (λ |λ), which gives rise to quasi-probabilities. In general, W ρ (λ) and W U (λ |λ) consist of real components that can attain negative values, while satisfying the normalisation conditions, Despite the presence of negativities in the distributions and update matrices, Monte Carlo methods can still be used with adjustments as introduced by Pashayan et al. [49] in order to perform probability sampling. This can be done by for the initial state preparation and taking the transition matrix of P l (λ l |λ l−1 ) = |W U l (λ l |λ l−1 )|/ λ l |W U l (λ l |λ l−1 )| for the quantum gate, while keep track of the signs. In this case, the probability estimate is modified tô where we have defined In order to converge to the Born probability, one can similarly take the average of increasingly many probability estimates sampled over trajectories (λ 0 , . . . , λ L ) using distributions P (λ 0 ) and P l (λ l |λ l−1 ). This directly relates the total amount of circuit negativity to the computational overhead: the larger the negativity in the circuit, the more samples required for an accurate estimation. Observation 1 (Pashayan et al. [49]). The outcome probability p C of the quantum circuit C can be estimated byp C from the number of samples with at least probability 1 − δ of having error less than . Here, is the (maximum) circuit negativity.
As is clear by Eq. (11), the negativity of the circuit acts as an overhead for the convergence time of the sampling algorithm, therefore it is desirable to reduce it before executing the sampling by considering different frame choices.

Main results
In this work, we develop a pre-processing routine to reduce the negativity of the circuit, which in turn reduces the number of samples required to estimate the outcome probability of the circuit. Our routine is applicable to any general circuit consisting of a product input state and product measurement, but independently of the input state dimension, the gate set (e.g. Clifford unitaries or Haar-random gates) and adaptive operations based on intermediate measurement outcomes.

Frame parametrisation
The central focus of our work is to consider frame parametrisations that are allowed to vary across the circuit. It is clear from the definitions that the circuit negativity of a given circuit in Eq. (12) depends on the choice of the frame F and its dual G. Note that F and G are uniquely defined by each other for phase space dimension equal to d 2 , where d is the qudit dimension. We therefore make the dependence clear by labelling the representation functions by G: where we used different frames, G and G , for the input and output wires respectively in the definition of W G |G U . In order to ensure that the number of frames does not grow exponentially with the number of qudits N , we restrict to product frames that are constructed as tensor products of single qudit frames. This allows us to parametrise each single qudit phase space separately, rather than the entire N -qudit phase space. Therefore, we reserve the label G for denoting single qudit frames and the boldface symbol G for denoting a set of single qudit frames. The negativity of each circuit component can now be expressed as where G, G contain elements from the complete set of frames required to represent the circuit. In practice, each circuit component is parametrised only via the frames that correspond to its input and output wires. For example, in Fig.1(b), when parametrising the first gate in the sequence, we can simply consider G as the set {G 1 , G 2 } and G as the set {G 3 , G 4 }. If only a unique frame representation G is used for all circuit components, then G = G = {G} in the expressions above and the label can be dropped, simplifying to the notation of the previous section. The total negativity of the parametrised circuit can now be expressed as a function of the circuit frame set G: We note that Observation 1 still holds by replacing N C with the more general form N C (G). Our main objective is to study the reduction of this circuit negativity by tuning G.

Examples of frame parametrisations
While our results are general and applicable to any family of parametrised frames, in this work we provide two examples of explicit, product frame parametrisations: (i) parametrised Wigner frames and (ii) rotated Pauli frames. Parametrised Wigner frames employ the conventional phase space of the discrete Wigner function [54,55]. Let us define the discrete displacement operator for a d-dimensional system as where χ(q) = e i(2π/d)q . For a qubit system (d = 2), this takes the form D(p, q) = i pq Z p X q . It can be generalised to an N -qudit system as where λ : } using the following reference operators: where we introduced the parametrisation function g(λ). Note that the following relation holds: so the parametrisation function g(λ) : Z 2N d → C\{0} can be fully characterised by the reference operator G 0 . In order to impose that W G ρ (λ) is real-valued and that λ W G ρ (λ) = 1, we need the additional conditions g * (ω) = g(−ω) and g(0) = 1, which are equivalent to G † 0 = G 0 and Tr[G 0 ] = 1 respectively. By taking g(λ) = 1 for all λ, the conventional discrete Wigner function [55] is recovered. One can calculate the quasi-probability distributions of circuit elements via Eq. (13)-(15) using the defined frame and dual frame. In odd dimensions, the parametrised Wigner frame is a good choice for Clifford dominated circuits as Clifford gates do not possess any negativity in the conventional Wigner distribution. Therefore, g(λ) = 1 is already optimal for most circuit elements when considered in isolation and constitutes an obvious starting point for frame optimisation.
In the qubit case, it is known that the Hadamard and CNOT gates have non-zero negativity even in the conventional Wigner distri-bution [63], which motivates us to introduce the next frame parametrisation, valid only for qubits: the rotated Pauli frames.
Rotated Pauli frames are based on the Bloch decomposition of a quantum operator. Consider the set of displacement operators for a single qubit {D(λ)} as defined in Eq. (20) for λ ∈ Z 2 2 = {(0, 0), (0, 1), (1, 0), (1, 1)}. The usual Bloch vector for a single-qubit state ρ can be written as and this defines a valid quasi-probability distribution with frame { 1 2 D(λ)}. We can define a new frame by applying a rotation to the space of the Bloch vector. Let us consider a rotational angle vector θ := (θ X , θ Y , θ Z ) and a corresponding rotation operator R(θ) where R(θ X ) := e −iθX/2 and similarly for Y, Z. Applying this to the Bloch vector in Eq. (25) results in a set of rotated displacement operators, parametrised by θ: Then, we define the frame F = {F (λ)} and its dual frame G = {G(λ)} as which provide a parametrised frame representation for a qubit. This can be generalised to an N -qubit system via with θ = (θ 1 , θ 2 , . . . , θ N ), where θ i is the rotational angle vector for the i-th qubit. The rotated Pauli frames possess the desired property that all stabilizer states and Clifford gates have zero negativity in the conventional Bloch frame representation with θ = (0, 0, 0). Thus, when a given qubit circuit is dominated by Clifford gates, it can be advantageous to employ the rotated Pauli frame.

Pre-processing routine for negativity reduction
The central idea of our pre-processing routine for negativity reduction can now be expressed by the following lower bounds on gate negativity.

Theorem 1.
For two consecutive gates U and V , the following bounds on negativity hold: where G and G are frame sets that represent gates U, V and U V .
Proof. The first inequality holds since G is one specific choice of the optimisation variable set G . The second inequality is due to Observation 2 in the next section.
Theorem 1 motivates us to introduce two subroutines applicable to any quasi-probability estimation algorithm with runtime cost determined by the circuit negativity. The second inequality in Theorem 1 suggests that merging two gates into one is generally advantageous in minimising the total negativity. This leads to the first pre-processing sub-routine, gate merging. The inequality is independent of the specific frame parametrisation and can be directly extended to an arbitrary number of gates. The trade-off is that the merged gate may be of a larger size. For example, if U and V are 2qudit gates sharing one wire between them, gate V U will be a 3-qudit gate. The dimension of the merged gate increases exponentially as the number of qudits involved becomes larger, hence one should truncate the maximum number of qudits acted on by the merged gates, which we define as the spatial parameter n.
The first inequality in Theorem 1 states that, unless the frames between two gates in sequence are already optimal, we can always reduce the total negativity of the two gates by optimising the frames they share. This leads to the second sub-routine, frame optimisation. The optimisation can be directly generalised to a circuit block B containing a sequence of frames G by simultaneously optimising all the frames in the block, min G N B (G). The temporal parameter is the number of frames to be optimised in one optimisation cycle. The optimisation takes place iteratively in the sense that every optimisation cycle optimises the frames within a block, taking as an initial state the optimised frames obtained from the previous cycle. This ensures that negativity cannot increase above its initial value, no matter how many optimisation cycles occur.
Given fixed values for the truncation parameters n, , we show in the following two sections that the total runtime τ of our routine is polynomial in the number of circuit components, In general, larger n or give larger negativity reduction at the cost of additional classical computation.
We note that gate merging yields lower negativity than any frame optimisation between the gates. However, fixing n < N prevents us from merging gates indefinitely, so frame optimisation can then be used for further negativity reduction.

Algorithm 1 Outcome Probability Estimation with Merging and Optimisation
Input: An N -qudit quantum circuit C with a product input state ρ = ρ 1 ⊗ · · · ⊗ ρ N , the list of gates U = {U 1 , ..., U L }, and the product measurement operator E = E 1 ⊗· · ·⊗E N ; the spatial parameter n; the temporal parameter ; the desired accuracy . 1: Run gate merging (Sub-routine 1) with the input gate sequence and n and return the merged gate sequence {V 1 , ..., V L } with L ≤ L consisting of gates acting on at most n qudits. 2: Run frame optimisation (Sub-routine 2) with the merged circuit and and return the optimised frame sequence G opt . 3: Run a sampling algorithm to achieve the input accuracy according to Eq. (11) using the quasi-probability representations of the merged circuit obtained with the optimised frame sequence G opt . Output: p est , the estimated outcome probability.
We present an algorithm for Born probability estimation, including our complete preprocessing routine and sampling, in Algorithm 1 and illustrate its implementation on a toy circuit in Fig. 1. In the following two sections, we discuss in more detail how the two sub-routines, gate merging and frame optimisation, can be implemented. For clarity, we focus on qubit circuits and on the frame parametrisations introduced in the previous section, although our methods are general.

Gate merging
The central idea of our first sub-routine, gate merging, is that the sampling cost of a merged circuit block consisting of multiple quantum gates is in general lower than sequential sampling of each gate. More precisely, this can be summarised as the following observation: . . , U k } be a sequence of quantum gates. The negativity of the merged gate U = U k . . . U 2 U 1 is always less or equal to the product of the individual negativities, i.e., for any frame set G assigned to the gate sequence.
Proof. It is sufficient to prove the statement for two gates U and V . By noting that the quasiprobability of the merged gate is expressed as the negativity of the gate can be bounded as We then apply this argument iteratively to any sequence of quantum gates {U 1 , U 2 , . . . , U k } to obtain Eq. (35), which completes the proof.
Such a negativity reduction can be exemplified by considering the Toffoli gate, which can be optimally decomposed into four T gates [64] along with Clifford gates and Pauli measurements. We compare the negativity of the Toffoli gate itself and its decomposed gate sequence using the Pauli frame, where the negativity only comes from non-Clifford gates. One can readily observe that the Toffoli gate negativity N Pauli Toffoli = 2 is lower than the total negativity of the decomposed gate se- The idea of reducing the negativity of quantum gates by merging (Eq. (35)) can be compared to the submultiplicativity of magic state negativity characterised by the robustness measure (R), which obeys R(ρ 1 ⊗ ρ 2 ) ≤ R(ρ 1 )R(ρ 2 ) [59]. In particular, the robustness of the T state is equivalent to the negativity of the T gate from the sampling cost viewpoint, as one T gate can be "gadgetised" via Cliffords and a single T state [41]. In Ref. [59], the asymptotic negativity per single T gate is lim t→∞ R |T ⊗t 1/t ≈ 2 0.272 which provides a lower bound on their sampling runtime Ω(4 0.272t ).
In order to compare this with the gate merging method, we consider an n-qubit block consisting of Clifford+T gates (see Fig. 2(f) for an example with n = 5). This can be compared to considering n T states in the robustness measure, having the same number of qubits (i.e., the size of Hilbert space) in the block to evaluate the negativity. Figs. 2(a-d) show the distribution of the negativity of 1000 random n-qubit blocks consisting of 100 Clifford gates and t T gates. We observe that the negativity per T gate after merging the gate sequences in a random n-qubit block can be occasionally lower than the robustness measure of n T states [59]. We also note that the negativity reduction works efficiently when the number of T gate in the block, t, increases. For example, when n = 5 and t = 15, 95% of the randomly chosen merged blocks yield negativity per T state lower than the robustness measure. We also plot in Fig. 2(e) the average negativity per T gate versus t, demonstrating that it is decreasing, which implies that our appoach can prove efficient when the structure of the gate block considered becomes more complicated.
The main advantage of our approach is that it is not limited to a particular type of gate set, e.g. Clifford+T circuits, but can be directly applied to any types of quantum gates. The aforementioned approaches using stabilizer rank, robustness and generalised robustness rely on the gadgetisation of a non-Clifford gate using magic states. Therefore, evaluating the classical overhead should be preceded by finding an optimal Clifford gadget with minimum resource of magic. On the other hand, gate merging does not have such a limitation, so it can be useful when the efficient decomposition of a quantum circuit into non-stabilizer states and Clifford gates is non- Figure 2: Histograms of 1000 random Clifford+T circuits with N = 5 consisting of 100 1-qubit and 2-qubit Clifford gates, supplemented by t T gates and merged using spatial parameter n = 5. The leftmost (blue) solid line with the gray region depict the average and standard deviation of each histogram. The brown and green solid lines (from right to left) represent the higher averages of the corresponding histograms for n = 3 and 4 respectively. Vertical dashed lines provide some state-of-the art scalings, more specifically from left to right: O(2 0.228t ) of the Bravyi-Gosset algorithm from [41] based on the stabilizer rank, O(4 0.228t ) of the dyadic frame simulator from [40] and the lower bound Ω(4 0.272t ) based on the robustness of magic from [59]. As t increases, we observe a higher frequency of circuits with log negativity squared per T gate lower than the robustness lower bound: (a) 71%, (b) 81%, (c) 89%, (d) 95%. trivial. We also highlight that merging gates reduces the negativity independently of the choice of frames.
We now describe the gate merging method for a generic N -qubit quantum circuit with L gates. This can be done by grouping the quantum circuit into n-qubit blocks (see Fig. 1(a)→(b)), then Observation 2 guarantees that the negativity of each block is reduced after merging the gate sequences in it. There are various ways of grouping the circuit into n-qubit blocks, but we introduce the iterative Sub-routine 1 for concreteness. The broad idea of the sub-routine is to iteratively connect any yet unmerged (disjoint) gates. All gates remain in the set U disj until they either finally act on n qubits or cannot connect to other gates anymore, when they are move to the output set U merged .

Sub-routine 1 Gate merging
Input: List of gates U = {U 1 , ..., U L } in qudit quantum circuit C and spatial parameter n. 1: Define list of merged gates U merged ← {}, and list of disjoint gates U disj ← {} 2: for U i ∈ U do 3: Set target gate U target ← U i 4: for V ∈ U disj do 5: if U target shares a wire with V then

8:
Add U target to U disj .
9: for U i ∈ U merged do 10: Set target gate U target ← U i 11: for V ∈ U disj do 12: if rank(U target V ) ≤ d n then U target ← U target V .

13:
Add U target to U disj .
At every step, a target gate U target , the algorithm searches through the disjoint gates to find the next one that is connected to U target . We therefore require to search less than L gates for every U target , while the cost of merging two gates (i.e., multiplying) is O(2 2n ), a constant as we fix n < N . So the full gate merging sub-routine scales as O(2 2n L 2 ). The computational cost to compute the transition matrix W G U for n-qubit unitary U and its negativity also exponentially scales with n as there are O(2 2n ) possible phase space points for a n-qubit system.
As we can observe from the scaling, the limiting factor of gate merging is the spatial parameter n, which stems from the exponential growth of the dimension of Hilbert space by increasing the number of qubits. We find numerically that a practical choice for the spatial parameter n is n ≤ 5. As this is a fundamental property of a quantum system, a similar issue arises in the robustness measure as evaluating the robustness of R(|T ⊗n ) and finding its optimal decomposition among O(2 n 2 ) stabilizer states is in general a challenging task for a large n [59].
Due to the computational need to truncate the spatial parameter n < N , a question arises of whether there exist new methods of manipulating the circuit frames and further reducing the total negativity, after gate merging is completed. We provide a positive answer to this question in the following section, where we describe our second sub-routine, frame optimisation.

Frame optimisation
Frame optimisation aims to reduce the total circuit negativity by optimally choosing frames for different circuit components. As we discussed in Section 3.1, we can introduce specific frame parametrisations, such as parametrised Wigner frames or rotated Pauli frames, and iteratively choose the frames throughout the circuit.

Sub-routine 2 Frame optimisation
Input: Quantum circuit C and temporal parameter . 1: Determine the total number of frames, |G opt |, in the circuit C. Choose a subset G (i) target ⊂ G opt with at most frames.

6:
Find a circuit block B containing the frames in G (i) target .

8:
Update the corresponding frames in G opt with G (i) target . Output: G opt .
In principle, the best strategy in terms of achieving the highest negativity reduction would be to carry out global optimisation over all circuit frames, requiring that the number of parameters to be optimised should scale with the number of qubits N and circuit length L. In this work, we show that a local optimisation, with only a fixed number of parameters, is sufficient to achieve considerable negativity reduction and scales only linearly in N and L. This optimisation sub-routine is implemented by dividing the circuit into blocks containing at most frames to be optimised, for a fixed temporal parameter .
To perform the frame optimisation on a quantum circuit C consisting of an input state ρ, a gate sequence {U 1 , ..., U L } and a measurement effect E, we need to start from an initial frame parametrisation. We denote this parametrisation as G opt = {G 1 , . . . , G |G opt | }, where |G opt | is the number of frames to be optimised. The procedure is outline in Sub-routine 2 and explained here. We take a subset G (1) target ⊂ G opt with up to frames (either sequentially or randomly) and create the block B of circuit components which are attached to those frames. Keeping all other frames in the block B fixed with the corresponding frames in G opt , we want to minimise the total negativity of the block N B over all possible choices for G target , so that the minimum occurs at G (1) target allowing us to update the corresponding frames in G opt , which is the end of the first cycle in our frame optimisation. We repeat this process c times by choosing another set of frames as the new G (i) target , i = 1, . . . , c. The number of optimisation cycles c can be chosen arbitrarily, for example it can be chosen as c ≥ |G opt |/ , with the aim of optimising all frames in the circuit at least once. The order in which frames are optimised can also be chosen arbitrarily and can potentially result in a different overall negativity reduction.
We demonstrate the local frame optimisation method with an example. Let us consider the initial part of a simple general circuit depicted in Fig. 3 for the case of n = 2 and = 2. To perform the (i)-th optimisation cycle we consider G (i) target = {G 1 , G 2 }, we consider the corresponding block B 1 = {ρ 1 , ρ 2 , U 1 , U 2 }, which is a set of all circuit components connected to the frames in G (i) target . Then, the explicit optimisation we perform is (39) where N X (G X ) is the negativity of component X as a function of G X with all other frames Figure 3: Example of how to form a block when G target is given in the case of n = 2 and = 2. Only relevant frames are shown. When G target = {G 1 , G 2 }, the corresponding block B 1 , which contains all circuit elements connected to the frames in G target , is fixed to the corresponding ones in G opt . As an additional example, we could have considered the set G Fig. 3. Then, the block negativity we optimise is Note that at each optimisation step, previously optimised frames in G opt are used in the next optimisation cycle. This ensures that the negativity never increases compared to the initial frame choice {G 1 , . . . , G |G opt | } between optimisation cycles.
The presented local optimisation method is efficient in the number of circuit components. Consider an N -qubit circuit of length L where each of L gates acts on at most n qubits. Then, there are at most N + nL different frames to be optimised. Since is fixed, each optimisation cycle takes a constant amount of time O (1). Therefore, the frame optimisation of the entire circuit scales as (N + nL) × O(1) = O(N, L). Note that the exact value depends on truncation parameters n and as well as the specifics of the circuit and its frame parametrisation. Fig. 4 shows the performance of the frame optimisation for a circuit with N = 6 and L = 15 consisting of 2-qubit Haar-random unitaries, which are in general difficult to be simulated with stabiliser-based simulators because they do not admit efficient decompositions. In Fig. 4(a), we use rotated Pauli frames as our frame parametrisation and initialised each frame in the circuit to the set of standard qubit Pauli operators. In Figure 4: Plots showing negativity reduction of a circuit consisting of 2-qubit Haar-random gates with N = 6 and L = 15 after each frame optimisation cycle with different spatial and temporal parameters, n and . The optimisation is carried out sequentially from the first frame to the last frame. Optimisation is performed via the basinhopping algorithm as introduced in [65]. (a) Results after frame optimisation with rotated Pauli frames. The reference frame is the standard Pauli operators. The total negativity continuously decreases as we optimise more frames. (b) Results after frame optimisation with parametrised Wigner frames. The reference frame is the conventional phase-space operators for the Wigner function. The most of negativity reduction occurs near the initial states and the measurements. Fig. 4(b), we choose parametrised Wigner frames as our frame parametrisation and initialise each frame in the circuit to the set of conventional phase-space operators corresponding to g(λ) = 1 (see Sec. 3.2). We can observe that the largest negativity reduction comes from gate merging with higher n, but the frame optimisation also achieves a significant negativity reduction. In general, larger results in lower negativity after optimisation of all frames with fixed n. In the case of parametrised Wigner frames, together with gate merging, we could considerably decrease the initial log-negativity from ∼27.3 to ∼8.9 with truncation parameters n = 4 and Wigner, unmerged Wigner, merged Optimised, merged Figure 5: Histograms of the deviation of estimated probability p est from actual outcome probability p sim (as calculated by Qiskit [66]) for 500 circuits consisting of 2qubit Haar-random gates with N = 3, L = 8 and = 1. The number of samples taken for each circuit is 10 6 , which took around 10 seconds on a standard computer. The plot shown is truncated at |p est − p sim | = 4.0 to demonstrate the advantage of our routines clearly. The advantage is amplified as N and L increase. = 5, which means that we need ∼ 2 2×18 times less samples to reach a given accuracy for probability estimation.
We demonstrate the practical significance of our routine, by sampling 500 circuits consisting of Haar-random gates, with the results presented in Fig. 5. Unmerged circuits represented entirely by Wigner frames do not show any signs of convergence to the actual probability distribution. Merged circuits clearly converge a lot better, especially when their frame representation is optimised.

Conclusion
We introduce two classical sub-routines, gate merging and frame parametrisation, which reduce the total negativity in the quasi-probability representation of a quantum circuit, hence leading to sampling overhead reduction. We emphasise that our methods are very general; they are not restricted to specific choices of frames or frame parametrisations, and can be applicable to any circuit independently of generating gate sets or the purity and dimension of its input qudits. Both sub-routines are efficient in the sense that the runtime scales polynomially in the circuit size N and number of gates L.
We numerically demonstrate that both methods improve the exponential scaling of the circuit negativity by testing them on Clifford+T circuits and circuits with Haar-random gates. Specifically, gate merging is shown to compete on average with the quasi-probability simulators based on dyadic frames and the robustness of magic. Frame optimisation can further compliment gate merging in reducing negativity, when merging gates in the circuits is no longer practical due to the growing size of the gates.
A clear direction for our work is to improve the classical optimisation performed for the frame representation. Our parametrisation resembles variational techniques used in near-term quantum algorithms [67], although our cost function, circuit negativity, is calculated classically. Our optimisation could therefore potentially benefit by research on variational techniques, such as identifying "good" circuit-inspired ansatze for initialising frames or investigating barren plateaus in order to improve optimisation convergence. Such methods could shed light on what families of circuits are hardest to sample from using quasi-probability techniques.
One can also investigate the possibility of performing frame optimisation analytically, at least for particular classes of quantum circuits. Additional assumptions will likely be required for the circuit structure, but finding optimal frames analytically would eliminate the hidden constant runtime costs of "black-box" classical algorithms currently employed for the optimisation. For example, it would be particularly useful to investigate the existence of a finite set of frames as a function of circuit components resulting in minimum negativity for Clifford+T circuits.
Controlled Quantum Dynamics funded by the EPSRC (EP/L016524/1). DJ is supported by the Royal Society and a University Academic Fellowship. MSK acknowledges the Samsung GRC grant.

Code availability
The code that implements the sub-routines and supports the presented numerical findings can be accessed at NK's GitHub repository: https://github.com/nkoukou/ parameterised_negativity.