# Gate set tomography is not just hyperaccurate, it’s a different way of thinking

This is a Perspective on "Gate Set Tomography" by Erik Nielsen, John King Gamble, Kenneth Rudinger, Travis Scholten, Kevin Young, and Robin Blume-Kohout, published in Quantum 5, 557 (2021).

By Gregory A. L. White (School of Physics, The University of Melbourne, Parkville, VIC 3010, Australia).

## Characterising quantum gates

When a quantum computation breaks down, where — and how — did the error occur? We often take for granted the diversity of intricate noise in quantum information processors (QIPs). A multitude of metrics exist that summarise performance to varying levels of crudeness or refinement.
To a certain extent, though, asking exactly what happened and exactly where it happened can miss the point on what is even physically observable.

For the most part, protocols to measure noise tend not to be injective; extraneous dynamics with wildly different physical origins can produce the same infidelities. Moreover, different protocols weight different types of error differently. A reported randomised benchmarking figure, for example, may indicate excellent gates but in truth hide more nefarious noise types when a full algorithm is run [1]. To obtain a more complete picture for control diagnostics, one typically considers the most general description of two-time quantum dynamics: the completely positive, trace-preserving (CPTP) map, which maps an initial density matrix at some point in time to a final density matrix at another. Historically, the procedure to determine these maps experimentally has been quantum process tomography (QPT). QPT involves repeatedly applying some gate to a series of different basis inputs, measuring in different bases, and then finding the model of the channel that best explains the data. But, as we will see, naive estimation in this manner can be fraught: state preparation and measurement (SPAM) errors can generate an incorrect and inconsistent evaluation of the device.

In a study recently published in Quantum [2], Nielsen et al. present an in-depth breakdown of a technique called gate set tomography (GST), sparing no expense. GST addresses many of the shortcomings of QPT, establishing an ab initio device characterisation. It then goes above and beyond, bringing to the fore conceptual ideas about the class and composition of circuits; a fresh exploration into subtle frame-of-reference matters that arise; experimental design and best-practice estimation procedures; as well as a more candid discussion of the historical background, including which approaches did and did not work. The philosophy behind GST is simple and has been around for several years: expand the error model to include all components of the experiment. With this, prior calibrations are not relied on. All experimenter-chosen interventions are collected into a “gate set”: the control operations, the preparable states, and the measurement effects — positioned to be estimated from a series of carefully designed experiments.

Consider the problem of QPT. An experimenter has a noisy quantum channel $\Lambda$ that they wish to estimate in the form of a CPTP map. As a linear operator, this is completely determined by input-output relations on a complete basis. For example, on a typical device, one might have capacity to prepare a single state $\rho_0:=|0\rangle\!\langle 0|$. Applying each of the four gates $\{I, H, SH, X\}$ on individual runs generates an informationally complete (IC) set of preparation states $\mathcal{S}$. That is, these collectively span the single-qubit Hilbert-Schmidt space. Following the application of $\Lambda$, the experimenter may measure in an IC basis, which usually comprises a basis-change gate followed by a projective measurement in the $Z$-basis. Let us take this to be a measurement in $X$, $Y$, and $Z$, giving the POVM $\mathcal{J} = \frac{1}{3}\{|+\rangle\!\langle + |, |-\rangle\!\langle - |, |i+\rangle\!\langle i\!+\!|, |i-\rangle\!\langle i\!-\!|, |0\rangle\!\langle 0 |, |1\rangle\!\langle 1 |\}$. By vectorising our states $|\rho_i\rangle\!\rangle\in \mathcal{S}$ and measurement effects $\langle\!\langle\Pi_j|\in\mathcal{J}$ in some basis, employing the superoperator representation [3] $R_{\Lambda}$, we end up with the following system of linear equations:

\begin{split}
\text{Tr}[\Pi_j\Lambda(\rho_i)] &= m_{ij}\\
\rightarrow\langle\!\langle \Pi_j|R_\Lambda |\rho_i\rangle\!\rangle &= m_{ij},
\end{split}

for a set of measurement outcomes $\{m_{ij}\}$. With $\mathcal{S}$ and $\mathcal{J}$ known, the system displayed above can be (pseudo)inverted to find an estimate for $\Lambda$. But in reality, to what extent can we say that $\mathcal{S}$ and $\mathcal{J}$ are known? Measurement tends to be the single largest error component in NISQ-era devices. Thus, we are likely solving the wrong set of equations. QPT carries an assumption of known SPAM which cannot be justified in a practical setting. At best, the error on the measurement may be independent of the POVM effect (such as a depolarising channel), in which case it will simply be falsely attributed to $\Lambda$. At worst, if the noise is basis-dependent, then attempting to solve the system of equations above can lead to something unphysical, or wildly incorrect [4].
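The linear-inversion step can be sketched numerically. The following is a minimal toy example, not a production QPT routine: $\Lambda$ is taken to be a small coherent over-rotation about $X$ (an arbitrary choice for illustration), the probabilities $m_{ij}$ are computed exactly, and SPAM is assumed perfect, which is precisely the assumption GST removes.

```python
import numpy as np

# Normalised Pauli basis, orthonormal in the Hilbert-Schmidt inner product
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [P / np.sqrt(2) for P in (I2, X, Y, Z)]

def vec(op):
    """Vectorise a Hermitian operator: components Tr[P_k op] in the Pauli basis."""
    return np.array([np.trace(P @ op).real for P in paulis])

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
kets = {
    "0": ket0, "1": ket1,
    "+": (ket0 + ket1) / np.sqrt(2), "-": (ket0 - ket1) / np.sqrt(2),
    "i+": (ket0 + 1j * ket1) / np.sqrt(2), "i-": (ket0 - 1j * ket1) / np.sqrt(2),
}
proj = {k: np.outer(v, v.conj()) for k, v in kets.items()}

# IC preparation states, and the six-outcome weighted POVM J from the text
preps = [proj[k] for k in ("0", "1", "+", "i+")]
effects = [proj[k] / 3 for k in ("+", "-", "i+", "i-", "0", "1")]

# Toy channel Lambda: coherent over-rotation about X (assumption for illustration)
theta = 0.1
U = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def channel(rho):
    return U @ rho @ U.conj().T

# Ideal probabilities m_ij = Tr[Pi_j Lambda(rho_i)]: no shot noise, no SPAM error
S = np.column_stack([vec(r) for r in preps])   # columns |rho_i>>
J = np.vstack([vec(E) for E in effects])       # rows <<Pi_j|
M = np.array([[np.trace(E @ channel(r)).real for r in preps] for E in effects])

# Linear inversion: solve <<Pi_j| R |rho_i>> = m_ij for the superoperator R
R_est = np.linalg.pinv(J) @ M @ np.linalg.pinv(S)
```

With perfect SPAM knowledge and exact probabilities, `R_est` recovers the true superoperator exactly; perturbing `preps` or `effects` without updating `S` and `J` reproduces the miscalibration failure described above.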

Gate set tomography eliminates this need for prior calibration: nothing is taken as known a priori.

The object under scrutiny includes the ability to prepare and read out a state. A gate set $\mathcal{G}$ is defined as the collection of: the native state preparation (above, in the ideal case, $|0\rangle\! \langle 0 |$), a native measurement operation (continuing, $\{|0\rangle\!\langle 0|, |1\rangle\!\langle 1|\}$), and a basic set of control operations $\mathcal{C} = \{G_1,G_2,\cdots,G_N\}$ which contains the do-nothing gate. From $\mathcal{C}$, two extra sets are defined: the preparation fiducials $\mathcal{F}_p$ and the measurement fiducials $\mathcal{F}_m$. These fiducial sets comprise short sequences of gates $\{f_i^{(p/m)}\}$ selected from $\mathcal{C}$ such that $\{f_i^{(p)}|\rho_0\rangle\!\rangle\}$ and $\{\langle\!\langle\Pi_j|f_i^{(m)}\}$ are informationally complete. Including these covers the possibility of error in the SPAM basis changes. The experiments generated by this linear form of GST thus take the form $\langle\!\langle\Pi_l|f_k^{(m)}G_jf_i^{(p)}|\rho_0\rangle\!\rangle$. Informational completeness of the fiducials, together with measurement of $\langle\!\langle \Pi_k|f_j^{(m)}f_i^{(p)}|\rho_0\rangle\!\rangle$, ensures that every parameter of every part of the gate set is in principle measurable and determinable via linear inversion. This estimate may be improved through standard tomographic post-processing, such as physically-constrained maximum likelihood estimation. However, as we are about to see, there is no single solution to this problem.
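The combinatorial structure of this experiment list is easy to sketch. The gate and fiducial labels below are illustrative placeholders (not any particular software's syntax), with the empty string standing in for the do-nothing fiducial:

```python
# Sketch of the linear-GST experiment list: every gate sandwiched by every
# preparation/measurement fiducial pair, plus the SPAM-only circuits.
gates = ["Gi", "Gx", "Gy", "Gz"]        # basic control set C, with idle Gi
fid_prep = ["", "Gx", "Gy", "GxGx"]     # preparation fiducials F_p
fid_meas = ["", "Gx", "Gy"]             # measurement fiducials F_m

# <<Pi| f_m G_j f_p |rho_0>> for every combination
gst_circuits = [fp + g + fm for fm in fid_meas for g in gates for fp in fid_prep]
# <<Pi| f_m f_p |rho_0>>, which pin down the fiducials themselves
spam_circuits = [fp + fm for fm in fid_meas for fp in fid_prep]

print(len(gst_circuits), len(spam_circuits))  # 48 and 12 circuits
```

The quadratic fiducial overhead is the price of informational completeness without calibration: each of the four gates appears in all twelve fiducial contexts.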

## Gauge freedoms and quantum reference frames

We may only meaningfully update an experimental model with access to extra observations. Born’s rule gives us this connection between the quantum and classical world. Namely, all quantum mechanical observations come in the form of measurement frequencies from an underlying probability distribution:

p_x = \langle\!\langle\Pi_x| G_N\cdots G_2 G_1 |\rho_0\rangle\!\rangle.

One might think that collecting sufficient data would eventually fix a unique picture of the dynamics; however, there is a surprising consequence to lifting SPAM vectors from their privileged position. When taken as known, these SPAM vectors implicitly set a reference frame. This frame then becomes unfixed as the vectors are permitted to vary with the remainder of the control operations. Observe that if we select an invertible superoperator $B$ and perform the transformations:

\label{gauge-transformation}
\begin{split}
|\rho_0\rangle\!\rangle &\mapsto B|\rho_0\rangle\!\rangle\\
\langle\!\langle\Pi_j| &\mapsto \langle\!\langle\Pi_j|B^{-1}\\
G_i &\mapsto BG_i B^{-1}
\end{split}

then $p_x$ is left unchanged. But $p_x$ is our only connection to the underlying quantum description. This means that the likelihood function does not have a maximum that is stationary in all directions, but rather a continuous ridge tracing across unobservable parameter values. Once a gate set is found, through either linear inversion or a search for the global maximum of the likelihood function, a choice has to be made about the gauge. This choice is, in some sense, arbitrary, but some gauges are more desirable than others. If the world is put on a tilt, we’d like it to be a tilt closest to what we are familiar with. This means choosing a gauge respecting usual notions of physicality (complete positivity, trace preservation) so that we can use the usual information-theoretic tools to inspect the gate set’s properties. It also means finding the gauge that leaves the gate set closest to its target, in a procedure called gauge optimisation. Partly this is for convenience, but partly it is due to an uncomfortable feature of this freedom: many of the metrics we’re familiar with, and that are regularly reported — such as gate fidelity or diamond distance — are gauge variant [5,6]. Selecting a gauge as close as possible to the target gate set therefore minimises the illusion of errors created by the gauge transformation. Even subjecting the ideal target gate set to some unitary gauge transformation will seemingly generate arbitrarily large coherent errors, even though nothing has physically happened.
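The cancellation behind this invariance is purely algebraic, so it can be checked with arbitrary numerical stand-ins. In this sketch the "preparation", "effect", and "gates" are random arrays of the right shapes, not a physical gate set; the point is only that the $B$'s cancel pairwise in the Born-rule product:

```python
import numpy as np

rng = np.random.default_rng(0)
d2 = 4  # Hilbert-Schmidt dimension for a single qubit

# Random stand-ins for a vectorised preparation, effect, and gate sequence
rho0 = rng.standard_normal(d2)
effect = rng.standard_normal(d2)
gates = [rng.standard_normal((d2, d2)) for _ in range(3)]

def born_prob(effect, gate_seq, rho):
    """p_x = <<Pi_x| G_N ... G_1 |rho_0>> (here just a real inner product)."""
    v = rho
    for G in gate_seq:
        v = G @ v
    return effect @ v

# Any invertible B defines a gauge transformation of the whole gate set
B = np.eye(d2) + 0.3 * rng.standard_normal((d2, d2))
Binv = np.linalg.inv(B)

p = born_prob(effect, gates, rho0)
p_gauged = born_prob(effect @ Binv, [B @ G @ Binv for G in gates], B @ rho0)
# p and p_gauged agree to machine precision: the gauge is unobservable
```

Every factor $B^{-1}B$ between adjacent gates collapses to the identity, leaving the observable probability untouched for any invertible $B$, not just unitary ones.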

The tools heretofore employed by the quantum information community have usually been defined with respect to a privileged reference frame. These can vary quite substantially over a gauge orbit. Properties such as gate eigenvalues, however, are invariant under unitary gauge transformations, which act as similarity transformations. Thus, a gauge transformation cannot hide, for example, the effects of stochastic error. The study of gauge-invariant metrics is worthy of closer attention in the future as we come to demand further stringency in our benchmarking protocols: both in the sense of finding robust metrics to represent meaningful intended information, and in the sense of proselytising their importance to the wider quantum information community [5]. The elevation of preparations and measurement effects to the same level of ‘intervention’ as gates — the former, a mapping from the trivial Hilbert space to states, and the latter a mapping from states to the trivial Hilbert space — represents a steady but significant shift in both quantum computing, and more broadly quantum information science. More work is needed to reflect this in an analytical setting.

## Heisenberg scaling

The first paradigmatic feature of GST, as discussed, is that it is self-calibrating to allow for the possibility of unknown SPAM errors. The second is that it can be carefully designed to amplify gate errors for more accurate estimation. On the road to fault-tolerant quantum computing, this is important. Quantum error correction thresholds are typically stated in terms of the diamond distance — the worst-case error rate — whereby coherent errors of order $\eta$ make $\mathcal{O}(\eta)$ contributions to the measure. In contrast, randomised benchmarking suppresses coherent gate errors: their contribution to the reported error rate is $\mathcal{O}(\eta^2)$, which is instead dominated mostly by stochastic noise [7]. Randomised benchmarking cannot tell the whole story about device quality.

Heisenberg-type scaling refers to estimation whose sampling precision improves like $1/N$, rather than the usual $1/\sqrt{N}$. This is not a native part of GST’s conception, but rather central to an extension called long-sequence GST. The basic premise behind its hyperaccuracy is as follows. Suppose a gate $G$ coherently over-rotates by some angle $\theta$. In a single application of the gate, this term shows up linearly, and can be estimated to a precision limited by shot noise, $\epsilon = \mathcal{O}(1/\sqrt{N})$. For precisions of even $10^{-3}$, this becomes unwieldy. Instead, however, we may consider characterisation of $G^L$, for some integer number $L$ of repetitions. Now, the over-rotation may be by up to $L\theta$, measured to within $\epsilon$, so $\theta$ is measured to within a precision of $\epsilon/L$. Here, $L$ is usually chosen to take a range of values, so as not to be misled in the case where $\theta \approx 2\pi/L_0$ for a single $L_0$. Of course, repeating a gate does not necessarily compound its error. A gate on a $d$-dimensional system has $\mathcal{O}(d^4)$ free parameters; these cannot all be simultaneously magnified. Tilt errors, for example, can cancel themselves out after a small number of repeated applications. A way to address this is to compose gates together in different configurations such that the different parameters are structurally amplified.

In the same way that fiducial operations sandwiched each gate to guarantee informational completeness in linear GST, several gate applications are combined into a short circuit called a germ $g$. The set of germs is chosen carefully so that every free parameter of every gate is amplified by at least one germ. A set of germs satisfying this property is termed amplificationally complete. The collection of experiments designated by GST now guarantees not only that all measurable parameters are exposed, but that all gate parameters are amplified for precise estimation. The end result is that if $N'$ total experiments are conducted, the precision of the GST estimate of the repeated gates in the gate set scales as $1/N'$, rather than $1/\sqrt{N'}$, permitting far more accurate estimation of device components.

## Putting it all together

Thus, suppose an experimenter wishes to estimate the exact matrix representation of the $X$ gate on their QIP. They pull together a set of other control operations from their experimentally accessible cabinet — say, $Y(\pi/2)$, $X(\pi/2)$, $H$, and $Z$. They generate the preparation fiducial sets, the measurement fiducial sets, and the collection of germs (or, more accurately, the software performs this step). Next, they go away to their QIP and run the experiments — circuits given by all combinations of preparation fiducials, germs, and measurement fiducials. Then, they repeat this for logarithmically spaced values of $L$: 1, 2, 4, 8, and 16 repetitions of each germ. When it comes time to post-process, the highly non-convex optimisation problem of finding the gate set most consistent with the data (of which there are infinitely many!) is solved, optionally constraining the gate set to be CPTP. Finally, a gauge optimisation problem is posed. Without sacrificing any consistency with the observed data, the gauge is varied until the gate set is as close as possible to the target gate set. And voilà! A self-consistent, calibration-free, hyperaccurate picture of your interventions on a carefully curated quantum system.

Of course, laboratory observations are frequencies rather than probabilities, and consequently it is important to recognise that the outcome of GST is not a reconstruction as such, but rather an estimate. An extensive focus of [2] (and the body of work behind it) is quantifying the uncertainty of this estimate, with a foundation steeped in the authors’ oeuvre. This is nuanced. There are two broad possibilities for where an estimate may go wrong: 1) finite sampling prevents a “true” observation of the gate set, and so there is some statistical wiggle room around each outcome; 2) the model itself may be insufficient, in which case it could never describe all of the dynamics, even with an infinite number of shots. The former is addressed by a region estimator, a generalisation of confidence intervals, which might say ‘these are the other gate sets that are statistically plausible given the data, and here are their properties’. The latter is a more foundational issue. Model violation can occur for a variety of reasons: the statistics of the system may drift over time, such that no one gate set can explain the circuits observed over different intervals. Alternatively, non-Markovian dynamics, where the system-environment interaction is strong, can lead to a loss of CP-divisibility, whereupon the standard operational formalism of open dynamics breaks down. This is difficult to predict in advance. But post-procedure, GST comes equipped with a way to quantify model violation, which, if high, indicates that other techniques need to be employed [8,9,10,11,12]. Let $k$ be the difference between the total number of circuit outcomes and the number of model parameters. If the data came from a purely Markovian source, then from finite sampling alone one would expect twice the log-likelihood ratio between the best-fit gate set and a maximal model to be distributed like $\chi^2_k$. This hypothesis can be tested by computing how many standard deviations away from the mean the actual statistic is found.
For large model violation, it may no longer be possible to operationally characterise gates on a system, motivating the need to move beyond this.
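The test statistic reduces to a one-line calculation, since $\chi^2_k$ has mean $k$ and standard deviation $\sqrt{2k}$. The circuit and parameter counts below are hypothetical numbers chosen purely for illustration:

```python
import numpy as np

def n_sigma(two_delta_loglik, n_outcomes, n_params):
    """Standard deviations by which the fit exceeds the Markovian expectation.

    Under a Markovian model, twice the log-likelihood ratio between the
    best-fit gate set and a maximal model is approximately chi^2_k
    distributed, where k = n_outcomes - n_params; chi^2_k has mean k and
    standard deviation sqrt(2k).
    """
    k = n_outcomes - n_params
    return (two_delta_loglik - k) / np.sqrt(2 * k)

# Hypothetical: 2000 independent circuit outcomes, a 240-parameter gate-set
# model, and an observed statistic of 1900 -> roughly 2.4 sigma of violation
print(f"{n_sigma(1900.0, 2000, 240):.2f}")
```

A value of a few sigma hints at mild model violation; values in the tens or hundreds indicate drift, context dependence, or non-Markovianity that no single gate set can absorb.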

## Conclusions and outlook

Laboratories the world over have adopted the tenets of GST for characterising their equipment — not least because of the simple and sophisticated $\texttt{pyGSTi}$ software package available [13]. The work presented by Nielsen et al. is simultaneously an accessible introduction and a rigorous deep-dive into the nooks and crannies of GST. The remarkably frank historical detours are pertinent not only to methodology, but form a window into what it means to do science. The group has clearly spent a long time honing their philosophy about model components, and offer a fresh perspective on quantum information processing. I invite the reader to explore the complete piece. As a tool for benchmarking current and next-generation QIPs, gate set tomography is an attentively and comprehensively crafted package waiting to be used for certification of a fully fledged quantum computer with all its components below the fault-tolerance threshold [14]. But in its elevation of a gate set as more fundamental than its individual components, there is more than meets the eye.

### ► References

[1] Timothy Proctor, Kenneth Rudinger, Kevin Young, Erik Nielsen, and Robin Blume-Kohout, Measuring the Capabilities of Quantum Computers, arXiv:2008.11294 (2020).
arXiv:2008.11294

[2] Erik Nielsen, John King Gamble, Kenneth Rudinger, Travis Scholten, Kevin Young, and Robin Blume-Kohout, Gate Set Tomography, Quantum 5, 557 (2021), arXiv:2009.07301.
https://doi.org/10.22331/q-2021-10-05-557
arXiv:2009.07301

[3] Alexei Gilchrist, Daniel R. Terno, and Christopher J. Wood, Vectorization of quantum operations and its use, arXiv:0911.2539 (2009).
arXiv:0911.2539

[4] Daniel Greenbaum, Introduction to Quantum Gate Set Tomography, arXiv:1509.02921 (2015).
arXiv:1509.02921

[5] Junan Lin, Brandon Buonacorsi, Raymond Laflamme, and Joel J. Wallman, On the freedom in representing quantum operations, New Journal of Physics 21, 023006 (2019), arXiv:1810.05631.
https://doi.org/10.1088/1367-2630/ab075a
arXiv:1810.05631

[6] Timothy Proctor, Kenneth Rudinger, Kevin Young, Mohan Sarovar, and Robin Blume-Kohout, What Randomized Benchmarking Actually Measures, Physical Review Letters 119, 130502 (2017), arXiv:1702.01853.
https://doi.org/10.1103/PhysRevLett.119.130502
arXiv:1702.01853

[7] Yuval R. Sanders, Joel J. Wallman, and Barry C. Sanders, Bounding quantum gate error rate based on reported average fidelity, New Journal of Physics 18, 012002 (2016), arXiv:1501.04932.
https://doi.org/10.1088/1367-2630/18/1/012002
arXiv:1501.04932

[8] G. A. L. White, C. D. Hill, F. A. Pollock, L. C. L. Hollenberg, and K. Modi, Demonstration of non-Markovian process characterisation and control on a quantum processor, Nature Communications 11, 6301 (2020), arXiv:2004.14018.
https://doi.org/10.1038/s41467-020-20113-3
arXiv:2004.14018

[9] G. A. L. White, F. A. Pollock, L. C. L. Hollenberg, K. Modi, and C. D. Hill, Non-Markovian Quantum Process Tomography, arXiv:2106.11722 (2021).
arXiv:2106.11722

[10] Kenneth Rudinger, Timothy Proctor, Dylan Langharst, Mohan Sarovar, Kevin Young, and Robin Blume-Kohout, Probing Context-Dependent Errors in Quantum Processors, Physical Review X 9, 021045 (2019), arXiv:1810.05651.
https://doi.org/10.1103/PhysRevX.9.021045
arXiv:1810.05651

[11] Mohan Sarovar, Timothy Proctor, Kenneth Rudinger, Kevin Young, Erik Nielsen, and Robin Blume-Kohout, Detecting crosstalk errors in quantum information processors, Quantum 4, 321 (2020), arXiv:1908.09855.
https://doi.org/10.22331/q-2020-09-11-321
arXiv:1908.09855

[12] Timothy Proctor, Melissa Revelle, Erik Nielsen, Kenneth Rudinger, Daniel Lobser, Peter Maunz, Robin Blume-Kohout, and Kevin Young, Detecting and tracking drift in quantum information processors, Nature Communications 11, 5396 (2020), arXiv:1907.13608.
https://doi.org/10.1038/s41467-020-19074-4
arXiv:1907.13608

[13] Erik Nielsen, Kenneth Rudinger, Timothy Proctor, Antonio Russo, Kevin Young, and Robin Blume-Kohout, Probing quantum processor performance with pyGSTi, Quantum Science and Technology 5, 044002 (2020), arXiv:2002.12476.
https://doi.org/10.1088/2058-9565/ab8aa4
arXiv:2002.12476

[14] Robin Blume-Kohout, John King Gamble, Erik Nielsen, Kenneth Rudinger, Jonathan Mizrahi, Kevin Fortier, and Peter Maunz, Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography, Nature Communications 8, 14485 (2017), arXiv:1605.07674.
https://doi.org/10.1038/ncomms14485
arXiv:1605.07674
