Pulse-efficient quantum machine learning

Quantum machine learning algorithms based on parameterized quantum circuits are promising candidates for near-term quantum advantage. Although these algorithms are compatible with the current generation of quantum processors, device noise limits their performance, for example by inducing an exponential flattening of loss landscapes. Error suppression schemes such as dynamical decoupling and Pauli twirling alleviate this issue by reducing noise at the hardware level. A recent addition to this toolbox of techniques is pulse-efficient transpilation, which reduces circuit schedule duration by exploiting hardware-native cross-resonance interaction. In this work, we investigate the impact of pulse-efficient circuits on near-term algorithms for quantum machine learning. We report results for two standard experiments: binary classification on a synthetic dataset with quantum neural networks and handwritten digit recognition with quantum kernel estimation. In both cases, we find that pulse-efficient transpilation vastly reduces average circuit durations and, as a result, significantly improves classification accuracy. We conclude by applying pulse-efficient transpilation to the Hamiltonian Variational Ansatz and show that it delays the onset of noise-induced barren plateaus.


Introduction
Quantum machine learning (QML) is a nascent area of research that has seen rapid development over the last decade [1,2,3,4]. Initial works in the field focused on developing quantum versions of existing classical algorithms, thereby achieving asymptotically faster runtimes [5,6,7]. However, these algorithms are beyond the capabilities of current quantum hardware, as they require the execution of rather deep quantum circuits and often rely on specific assumptions, such as quantum memories [8], to overcome bottlenecks in data loading and readout. More recently, the fast technological progress and wide availability of noisy quantum processors [9,10] motivated the emergence of a second generation of QML algorithms based on parameterized quantum circuits (PQCs) [11,12,13,14,15,16,17,18]. In this alternative QML paradigm, quantum computers may function as co-processors working in tandem with classical computers. Because the circuit Ansätze can be shallow in depth, PQC-based algorithms are in principle compatible with the existing generation of quantum devices.
An important obstacle to the viability of QML in the near term is the presence of hardware noise [19]. Coherent errors often simply shift the position of minima in the loss landscape, in which case they can be trained away [20]. In contrast, errors arising from incoherent noise have more adverse effects that hinder trainability and performance. As an example, incoherent errors cause the loss function of a large family of Ansätze to vanish exponentially with increasing number of layers, a phenomenon known as noise-induced barren plateaus (NIBP) [21]. A similar effect occurs in kernel-based methods, where noise can lead to an exponential concentration of the kernel values [22].
A common approach to mitigating the effects of device noise is to use protocols that estimate improved expectation values through classical post-processing, a procedure known as error mitigation [23,24,25,26,27,28]. While schemes such as zero-noise extrapolation [23,24] and virtual distillation [25] can significantly enhance the performance of noisy processors [29], they also introduce additional experimental or computational overhead and can be ineffective in preventing noise-induced cost concentration [30,22,31,32]. A complementary strategy is to apply schemes that suppress noise at the hardware level, such as dynamical decoupling [33,34] and Pauli twirling [35]. A recently introduced tool for hardware-level error suppression is pulse-efficient (PE) transpilation of cross-resonance gates [36,37]. The core idea of this technique is to decompose two-qubit gates into hardware-native ones, such as echoed cross-resonance pulses on superconducting architectures based on fixed-frequency qubits. The echoes are then exposed to the transpiler, which removes redundant single-qubit rotations. The resulting circuits often have significantly shorter schedule durations, thereby mitigating errors introduced by finite coherence times. Crucially, PE transpilation requires no additional overhead or calibration. Refs. [36,38] applied this technique to combinatorial optimization tasks and demonstrated significant improvements over conventional CNOT-based transpilation. Related ideas were also explored in Refs. [39,40,41], which directly optimized pulse parameters to address variational problems. However, a comprehensive study of the impact of PE transpilation on the performance of paradigmatic QML algorithms powered by parameterized quantum circuits was not available until now.
In this work, we demonstrate the application of PE transpilation to three paradigmatic QML tasks. To highlight the general applicability and versatility of our method, we conduct our experiments across three different IBM Quantum [42] backends using qiskit [43]. We begin by training Quantum Neural Networks (QNNs) on a synthetic dataset and observe that PE transpilation significantly improves the resulting training loss and classification accuracy. Afterwards, we apply PE transpilation to a quantum kernel circuit that we use to classify all 10 digits of the commonly-used MNIST dataset. When compared to CNOT-based transpilation, PE transpilation allows us to significantly extend the width of the kernel circuits and achieve a classification accuracy of ≈ 90%. Finally, we explicitly compute the effect of PE transpilation on NIBP. In particular, we study how the loss function of the Hamiltonian Variational Ansatz [44] evolves for increasing number of layers and find that PE transpilation slows down the onset of the NIBP.

Application to quantum neural networks
QNNs are among the leading variational algorithms for QML [45,46,11,47,48]. We begin by studying the impact of PE transpilation on a binary classification task with the QNN shown in Fig. 1 (see App. A for a description of the underlying mechanism which enables our specific PE method). We consider an architecture similar to the one reported in Ref. [17], but restrict entangling operations to neighboring pairs of qubits in order to avoid prohibitively large circuit depths. A forward pass of the QNN starts with two layers of the feature map proposed in Ref. [14] (Fig. 1(b)). A single layer applies Hadamard and R_Z gates on all qubits, where the R_Z rotation angle on qubit i is 2x_i, twice the corresponding feature value. This is followed by R_ZZ operations on neighboring pairs of qubits, with rotation angles x_ij = 2(π − x_i)(π − x_j) that depend on products of the features. We then apply a variational ansatz that consists of parameterized R_Y gates applied on every qubit, followed by CNOTs on neighboring qubits, and a final set of parameterized R_Y gates (Fig. 1(a)). The angles θ_i of the R_Y operations are the training parameters that are optimized classically to fit a given target function. Finally, we measure the expectation value of the parity of the output bit strings, where P = ⊗_i Z_i is the parity operator and Z_i is the standard Pauli Z operator acting on the i-th qubit. We associate even parity with class 0 and odd parity with class 1.
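The encoding and readout described above can be illustrated with a small statevector sketch. The snippet below is a minimal numpy simulation of a single feature-map layer followed by the parity readout; it is not the hardware implementation used in the experiments, and the helper names are our own.

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def kron_all(ops):
    return reduce(np.kron, ops)

def rz(theta):
    # single-qubit Z rotation exp(-i theta Z / 2)
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def rzz(theta, i, j, n):
    # two-qubit ZZ rotation exp(-i theta Z_i Z_j / 2) on an n-qubit register;
    # Z_i Z_j squares to the identity, so the exponential has a closed form
    zz = kron_all([Z if q in (i, j) else I2 for q in range(n)])
    return np.cos(theta / 2) * np.eye(2 ** n) - 1j * np.sin(theta / 2) * zz

def feature_map_layer(x):
    # H on all qubits, R_Z(2 x_i) on qubit i, then R_ZZ(2(pi-x_i)(pi-x_j)) on neighbors
    n = len(x)
    U = kron_all([rz(2 * xi) for xi in x]) @ kron_all([H] * n)
    for i in range(n - 1):
        U = rzz(2 * (np.pi - x[i]) * (np.pi - x[i + 1]), i, i + 1, n) @ U
    return U

def parity_expectation(state):
    # <P> with P = Z ⊗ ... ⊗ Z; its sign separates even from odd bitstrings
    n = int(np.log2(len(state)))
    return float(np.real(state.conj() @ kron_all([Z] * n) @ state))

x = np.array([0.3, 0.7, 0.1])
psi = feature_map_layer(x)[:, 0]        # layer applied to |000>
p = parity_expectation(psi)
```

A full forward pass would apply two such layers followed by the variational R_Y/CNOT block before the parity measurement.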
We benchmark the QNN performance on a synthetic two-class dataset with standard and PE transpilation. To ensure the QNN can distinguish between the two classes with high accuracy, the dataset is generated via the QNN model itself by fixing the trainable parameters ⃗θ_s in such a way that the separation of the classes is maximised. The training procedure is then carried out starting from a new, random initialisation of the parameters, and should ideally recover the set used to generate the data. More formally, we uniformly sample 600 feature vectors ⃗x ∈ [0, 1)^n and compute their parity m(⃗x, ⃗θ) through noiseless simulations. Using the L-BFGS-B optimizer [49], we search for QNN parameters ⃗θ_s that maximize the average absolute parity over the sampled feature vectors. Out of this set of feature vectors, we further select the 50 samples for each class with the largest absolute parity expectation value. Fig. 2(a) shows an example of the resulting dataset for 2 qubits alongside the probability of observing class 0, p(y = 0, ⃗x, ⃗θ_s). We observe that the decision boundary of the QNN correctly separates the two classes.
We train the QNNs on ibmq_jakarta using a cross-entropy loss function and 50 iterations of Spall's SPSA stochastic gradient descent algorithm [50] with an automated calibration phase [44] of 50 iterations. Moreover, we apply readout error mitigation [51] and use 100 samples both for training and testing the networks. We show example training curves for n = 3 qubits in Fig. 2(b), which converge almost immediately after the initial calibration stage. In Fig. 3(a,b), we compare the training loss and testing classification accuracy of pulse-efficient and regular QNNs with n = 2 to 5 qubits. While the performance of the standard QNN deteriorates rapidly after n = 2, the PE QNN remains competitive with the noiseless simulation throughout the whole range of n. We attribute this improvement to a reduction in incoherent error due to the shorter schedule duration of PE circuits (Fig. 3(c)). More specifically, as we show in Fig. 1(c), PE transpilation significantly shortens the duration of R_ZZ gates [36] and hence of the feature map portion of the circuit.
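As an illustration of the optimizer used above, the following is a minimal numpy sketch of SPSA, which estimates the gradient from only two loss evaluations per iteration regardless of the number of parameters. The gain schedules and the toy quadratic loss are illustrative choices, not the calibrated values used in our experiments.

```python
import numpy as np

def spsa_minimize(loss, theta0, n_iter=100, a=0.1, c=0.1, seed=0):
    """Minimal SPSA: simultaneous random perturbation of all parameters."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(n_iter):
        ak = a / (k + 1) ** 0.602          # standard asymptotic gain exponents
        ck = c / (k + 1) ** 0.101
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # central-difference gradient estimate along the random direction
        # (for ±1 entries, dividing by delta equals multiplying by delta)
        g_hat = (loss(theta + ck * delta) - loss(theta - ck * delta)) / (2 * ck) * delta
        theta = theta - ak * g_hat
    return theta

# toy quadratic standing in for the cross-entropy loss of the QNN
loss = lambda t: float(np.sum((t - 1.0) ** 2))
theta_opt = spsa_minimize(loss, np.zeros(4))
```

In the experiments, `loss` would be the hardware-estimated cross-entropy, so each SPSA iteration costs two circuit-batch evaluations.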

Application to quantum kernels
We now turn to investigating the impact of PE transpilation on fidelity-based quantum kernel classification. This class of algorithms uses a quantum feature map to compute a similarity measure between input data points [15,14,52,18,53]. The resulting Gram matrix K is then fed to a classical kernel method, such as a support vector machine [54], to predict the corresponding labels ⃗y.
For this experiment we use the same feature map as in the QNN case presented above, but increase the depth to 4 in order to achieve higher classification accuracy. Following the approach outlined in Ref. [14], we estimate the kernel function for all pairs of training data ⃗x_i, ⃗x_j using 8192 shots. Specifically, we prepare the state U†_FM(⃗x_i) U_FM(⃗x_j) |0⟩^⊗n and then measure all qubits in the Z basis. The kernel entry K(⃗x_i, ⃗x_j) then corresponds to the frequency of the all-zero bitstring 0^n. Having repeated this process for all the training data, we feed the resulting kernel matrix to a conventional support vector machine implemented with scikit-learn [55]. For this classification task, we choose the MNIST dataset, a popular real-world database of handwritten digits [56], and use 10 training and testing samples for each of the ten digits. The kernel chosen for this experiment uses a number of input features equal to the number of qubits. In the case of the MNIST dataset, the resolution of the images exceeds the number of qubits we use. We therefore reduce the number of features through a truncated singular value decomposition, a standard dimensionality reduction procedure.
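The kernel estimation step can be prototyped classically. Below is a minimal numpy sketch assuming ideal statevectors: each Gram-matrix entry is the squared overlap |⟨φ(x_i)|φ(x_j)⟩|², which on hardware is estimated as the frequency of the all-zero bitstring. The single-qubit `encode` function is a placeholder, not the feature map used in our experiments.

```python
import numpy as np

def fidelity_kernel(states):
    """Gram matrix K_ij = |<phi(x_i)|phi(x_j)>|^2 from a list of statevectors."""
    S = np.asarray(states)            # shape (m, 2**n)
    return np.abs(S.conj() @ S.T) ** 2

def encode(x):
    # placeholder one-qubit encoding R_X(x)|0> (hypothetical, for illustration)
    return np.array([np.cos(x / 2), -1j * np.sin(x / 2)])

X = [0.1, 0.5, 2.0]
K = fidelity_kernel([encode(x) for x in X])
# K can now be passed to a classical SVM, e.g. sklearn.svm.SVC(kernel="precomputed")
```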
Due to the sparsity of the kernel circuits, the qubits experience large idle times that lead to error accumulation [57]. To mitigate this source of noise, we combine PE transpilation with a dynamical decoupling protocol. Whenever qubit i has an idle time T_idle larger than twice the single-qubit gate time, we apply a dynamical decoupling sequence τ/2 − X_p − τ − X_m − τ/2 with delay time τ = T_idle − 2T_X, where X_p/m are positive/negative π pulses around the x-axis and T_X is their duration.
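The scheduling constraint above can be sketched as a small helper that splits an idle window into the τ/2 − X_p − τ − X_m − τ/2 pattern. In this sketch we normalize the delays so the echoed sequence exactly fills the idle period; the function name and time units are illustrative, and real backends impose additional timing granularity constraints.

```python
def dd_delays(t_idle, t_pulse):
    """Delays for a tau/2 - X - tau - X - tau/2 sequence that exactly fills
    an idle window of duration t_idle, given pi-pulse duration t_pulse."""
    if t_idle <= 2 * t_pulse:
        return None                   # window too short for two pi pulses: leave idle
    tau = (t_idle - 2 * t_pulse) / 2  # total delay budget split as tau/2, tau, tau/2
    return (tau / 2, tau, tau / 2)

delays = dd_delays(1000, 100)         # e.g. times in ns
```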
We run the kernel circuits on linearly connected subsets of qubits on ibmq_montreal with PE and regular transpilation, along with noiseless simulations. Fig. 4(a) shows the testing classification accuracy as a function of the number of qubits for all three methods. Focusing first on the simulated curve, the classification accuracy increases monotonically with the number of qubits. This occurs because the number of training features grows with the number of qubits, which makes it easier to distinguish different digits. Turning to the device runs, the performance of the regular circuit remains very close to the ideal curve up to 5 qubits, after which it degrades rapidly and stays below 80%. This sharp turning point coincides with the average circuit duration becoming comparable with the average device T_1, see Fig. 4(c). In contrast, the PE circuit durations are always well below the coherence limit of the device, thereby yielding classification accuracies that reach 90% and closely track the simulated values. To further quantify the performance of the device runs, in Fig. 4(b) we show the normalized mean square error of the experimental kernel matrices K_exp with respect to the simulated matrix K_sim, NMSE = ∥K_exp − K_sim∥² / ∥K_sim∥². The error curves show a similar trend to the average circuit durations, with the error of the regular circuits increasing much faster than that of the PE circuits. In Fig. 4(d-f) we show training kernel matrices with n = 9 qubits for all three methods. The PE kernel matrix is close to the simulated matrix and has an approximately block-diagonal structure, indicating that the feature map is capable of separating the digits with high accuracy. In contrast, the regular transpilation matrix is mostly devoid of structure, which results in significantly lower classification accuracy. Moreover, its matrix elements are close to 0, signaling that the underlying bitstring distribution is extremely noisy. In App. B we perform additional experiments to estimate how much of the performance boost we observe can be attributed to PE transpilation versus dynamical decoupling, and find that PE transpilation is the main driver of the improvement.
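For completeness, the kernel-quality metric can be computed as follows. We assume a Frobenius-norm normalization, which matches the qualitative trends described above; other normalization conventions exist.

```python
import numpy as np

def kernel_nmse(k_exp, k_sim):
    """Normalized mean square error ||K_exp - K_sim||_F^2 / ||K_sim||_F^2."""
    k_exp = np.asarray(k_exp, dtype=float)
    k_sim = np.asarray(k_sim, dtype=float)
    return float(np.sum((k_exp - k_sim) ** 2) / np.sum(k_sim ** 2))

# sanity checks: identical kernels give 0; an all-zero noisy kernel gives 1
```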

Impact on noise-induced barren plateaus
In our last experiment, we investigate the impact of PE transpilation on NIBP. We implement the Hamiltonian Variational Ansatz for the transverse-field Ising model as considered in Refs. [58] and [21]. We consider a linearly connected chain of spins, such that a single layer of the ansatz is given by U(⃗β, ⃗γ) = ∏_i exp(−i γ_i σ^i_x) ∏_i exp(−i β_{i,i+1} σ^i_z σ^{i+1}_z), where the σ^i_α are the conventional Pauli matrices acting on the i-th qubit. To study the onset of NIBP, we study the behavior of a local observable with increasing number of qubits n and ansatz layers L. Following [21], we measure the local parity of the first two qubits, O = Z_0 Z_1, along with its derivative with respect to the last β_{i,i+1} parameter. Further, we set the number of layers to increase linearly with the number of qubits, L = 2(n − 1), and perform our runs on ibmq_montreal.
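A single ansatz layer can be written down explicitly in a small statevector sketch. The snippet below builds one layer for a short chain with random parameters and evaluates O = Z_0 Z_1; the parameter conventions are our reading of the ansatz described above, and the helper names are our own.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def site_op(single, site, n):
    return reduce(np.kron, [single if q == site else I2 for q in range(n)])

def hva_layer(betas, gammas, n):
    # exp(-i beta_{i,i+1} Z_i Z_{i+1}) on the chain bonds, then exp(-i gamma_i X_i);
    # both generators square to the identity, giving closed-form exponentials
    U = np.eye(2 ** n, dtype=complex)
    for i, b in enumerate(betas):
        zz = site_op(Z, i, n) @ site_op(Z, i + 1, n)
        U = (np.cos(b) * np.eye(2 ** n) - 1j * np.sin(b) * zz) @ U
    for i, g in enumerate(gammas):
        U = (np.cos(g) * np.eye(2 ** n) - 1j * np.sin(g) * site_op(X, i, n)) @ U
    return U

n = 4
rng = np.random.default_rng(1)
U = hva_layer(rng.uniform(0, np.pi, n - 1), rng.uniform(0, np.pi, n), n)
psi = U[:, 0]                                  # layer applied to |0...0>
obs = np.real(psi.conj() @ site_op(Z, 0, n) @ site_op(Z, 1, n) @ psi)
```

Averaging such observables over many random parameter sets (and many layers) is how the loss concentration in Fig. 5 is probed, albeit on hardware rather than in a noiseless statevector simulation.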
In Figure 5 we show the cost function and its partial derivative with respect to the last β_{i,i+1} parameter, averaged over 100 random parameter sets. In the case of noiseless simulations, both curves appear to decay slowly and polynomially with increasing number of qubits. On the other hand, the device runs show a noticeable exponential decay starting at around n = 8 qubits, which we attribute to the onset of NIBP. However, both the average loss function and derivative of the PE circuits are consistently above those of the regular circuits. Although PE transpilation does not remove NIBP, it exhibits more favorable scaling, which would allow going to larger depths or numbers of qubits when compared with regular transpilation.

Discussion
In this work, we studied the impact of PE transpilation on the performance of near-term QML algorithms. We began by performing binary classification of a synthetic dataset with a QNN, where we found that PE circuits achieved significantly higher classification accuracy and lower training loss. Secondly, we used quantum kernel estimation to classify a real-world dataset of handwritten digits. Combining PE transpilation with dynamical decoupling allowed us to accurately estimate kernels up to 9 qubits and achieve 90% classification accuracy, whilst regular circuits remained below 80%. Lastly, we studied the onset of NIBP on a commonly-used ansatz for quantum chemistry. Our results show that PE transpilation slows down the onset of NIBP, which allows executing variational quantum algorithms at higher numbers of qubits when compared with regular transpilation.
Our results highlight a key advantage of PE transpilation, namely that it introduces no additional overhead or calibrations and is compatible with most qiskit-based programs with minimal modifications. Furthermore, we observe that it consistently improves circuit performance across different models and devices. This makes our approach particularly appealing for applications and use cases that rely on remote device access and control. We expect that these improvements extend to most protocols featuring parameterized R_ZX(θ) gates: these natively appear in a broad class of quantum algorithms, such as Hamiltonian simulation schemes [59,60], with potential applications to optimization [61] and sampling problems [62,63], and unitary coupled cluster circuits in quantum chemistry.
While finalizing this work, we became aware of a recent preprint [64] that also applies PE transpilation to PQCs designed for quantum chemistry and optimization tasks.

A Pulse-efficient cross-resonance circuits
In this appendix we briefly review pulse-efficient transpilation for cross-resonance-based hardware (typically coupled fixed-frequency transmons). The cross-resonance interaction arises by driving a control qubit at the target qubit's frequency. Within the two-level approximation, the resulting time-independent Hamiltonian reads H_CR = (Z ⊗ B)/2 + (I ⊗ C)/2, where B = ω_ZI I + ω_ZX X + ω_ZY Y, C = ω_IX X + ω_IY Y + ω_IZ Z, and I, X, Y, Z are Pauli matrices [65,66,67]. By using echoed cross-resonance pulses with rotary tones [68,69], it is possible to isolate the ZX interaction and thus implement the unitary R_ZX(θ) = exp(−i θ ZX/2) with good accuracy. To first approximation, the rotation angle of this conditional rotation is given by θ = t_CR ω_ZX(A), where t_CR is the pulse duration and ω_ZX(A) is a non-linear interaction term that depends on the amplitude A of the cross-resonance pulse. IBM Quantum backends leverage the cross-resonance interaction to implement CNOT gates constructed with echoed rotations R_ZX(π/2) = CR(π/4) X CR(−π/4) X.
Here, CR(φ) denotes the non-echoed cross-resonance pulses, typically shaped as flat-top Gaussians. Together with a complete set of one-qubit gates, this CNOT gate is then used as a primitive to synthesize arbitrary two-qubit gates. However, it is possible to implement arbitrary rotations R_ZX(θ) by appropriately scaling the cross-resonance pulses [36]. The core idea of PE transpilation is to leverage this native parametric gate to decrease circuit duration and achieve higher fidelities. The scheme works as follows. Using Cartan's decomposition, we first rewrite a CNOT-transpiled circuit in terms of parameterized, non-echoed R_ZX(θ) rotations. Then, we expand the R_ZX(θ) gates and expose their echoed implementation CR(θ/2) X CR(−θ/2) X. A final transpilation pass removes redundant single-qubit rotations, leaving at most one single-qubit rotation between non-echoed R_ZX pulses. Although non-linear behaviour can result in coherent over- or under-rotations, this method achieves significant reductions in circuit duration for certain gates (such as the R_ZZ(θ) interaction used in this work) compared to conventional CNOT-based transpilation.
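The echo identity used by the transpilation pass can be verified directly in numpy. Here we idealize the CR pulse as a pure ZX rotation, CR(φ) = exp(−i φ ZX/2); real pulses also contain the spurious IX, IY, ZI, and ZY terms discussed above, which the echo is designed to cancel.

```python
import numpy as np

X1 = np.array([[0.0, 1.0], [1.0, 0.0]])
ZX = np.kron(np.diag([1.0, -1.0]), X1)     # Z on control, X on target
X_CTRL = np.kron(X1, np.eye(2))            # echo pi pulse on the control qubit

def r_zx(theta):
    # exp(-i theta ZX / 2); ZX squares to the identity, so the exponential closes
    return np.cos(theta / 2) * np.eye(4) - 1j * np.sin(theta / 2) * ZX

def cr(phi):
    # idealized non-echoed cross-resonance pulse: a pure ZX rotation by phi
    return r_zx(phi)

theta = 1.234
echoed = cr(theta / 2) @ X_CTRL @ cr(-theta / 2) @ X_CTRL
# conjugating ZX by X on the control flips its sign, so the two half rotations add
```

Conjugating the second CR pulse by the control-qubit X pulses flips the sign of the ZX generator, so the two half-angle rotations add up to R_ZX(θ); in the same idealization, the unwanted static terms would instead average out.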

B Pulse-efficient kernel classification without dynamical decoupling
In Section 3, we showed kernel estimation results for circuits with PE transpilation and dynamical decoupling. A natural follow-up question is how much of the performance improvement can be attributed to each of the two error suppression strategies. To address this question, we run a smaller set of experiments on ibmq_guadalupe to classify the digits 0, 7, and 9. We execute the kernels with PE transpilation with and without dynamical decoupling and show the resulting classification accuracy and NMSE in Fig. 6. Although dynamical decoupling has a sizeable effect, we conclude that PE transpilation is the primary driver of the performance improvement over regular circuits.

Figure 1: (a) Quantum neural network architecture for binary classification. A forward pass of the network begins with an encoding stage that maps feature vectors ⃗x to a quantum state through a parameterized feature map U_FM(⃗x). The second stage of the network consists of a variational form with parameterized R_Y gates (whose angles are optimized with a classical routine) and CNOT gates applied on neighboring qubits. Finally, we measure the parity of the qubits in the Z basis, with the fraction of even (odd) bitstrings representing the probability of class 0 (1). (b) A single layer of the parameterized feature map U_FM(⃗x). Hadamard gates are applied on every qubit, followed by R_ZZ rotations on every pair of neighboring qubits (highlighted in blue). (c) Pulse schedules that implement an R_ZZ(0.5) gate on ibmq_guadalupe through a conventional CNOT-based approach (top panel) and pulse-efficient transpilation (bottom panel). The section highlighted in red in the top panel corresponds to the pulses that implement a single CNOT gate through cross-resonance. Pulse-efficient transpilation significantly decreases the schedule duration, resulting in higher circuit fidelities.

Figure 2: Training a quantum neural network on ibmq_jakarta. (a) Example of a synthetic two-class dataset generated from the QNN shown in Fig. 1 with n = 2 features. The heat map represents the probability assigned to class 0 by the QNN with the weights set to those used to generate the binary dataset. The red dashed lines correspond to the decision boundaries p(y = 0) = p(y = 1) = 0.5 of the model. (b) Convergence of the training loss on the n = 4 qubit dataset after 120 iterations of the SPSA algorithm. The blue, orange, and green curves show the training loss of the simulated, pulse-efficient, and regular quantum neural networks, respectively.

Figure 3: Comparison of the performance of quantum neural networks with pulse-efficient and regular transpilation. (a) Training loss after 120 iterations of the SPSA algorithm. (b) Classification accuracy on the test set. (c) Average schedule duration of the QNN circuits. The pulse-efficient circuits have significantly lower schedule durations, which improves circuit fidelity and classification accuracy.

Figure 4: Impact of pulse-efficient transpilation on quantum kernel classification run on ibmq_montreal. (a) Classification accuracy on the test dataset for noiseless simulations, pulse-efficient circuits, and regular circuits. (b) Normalized mean square error of the training kernel matrix compared to the simulated kernel matrix. (c) Average circuit duration for pulse-efficient and regular circuits compared to the average T_1. In the bottom panel we show the training kernel matrices at n = 9 qubits obtained with (d) noiseless simulations, (e) pulse-efficient circuits, and (f) regular circuits.

Figure 5: Mitigation of noise-induced barren plateaus through pulse-efficient transpilation. We consider the Hamiltonian Variational Ansatz with a number of layers that increases linearly with the number of qubits. By sampling over 100 sets of random parameters we compute the average (a) loss function and (b) partial derivative with respect to the last β_{i,i+1} parameter.

Figure 6: Impact of pulse-efficient transpilation on quantum kernel classification run on ibmq_guadalupe.