MQT Bench: Benchmarking Software and Design Automation Tools for Quantum Computing

Quantum software tools for a wide variety of design tasks on and across different levels of abstraction are crucial in order to eventually realize useful quantum applications. This requires practical and relevant benchmarks for new software tools to be empirically evaluated and compared to the current state of the art. Although benchmarks for specific design tasks are commonly available, the demand for an overarching cross-level benchmark suite has not yet been fully met and there is no mutual consolidation in how quantum software tools are evaluated thus far. In this work, we propose the MQT Bench benchmark suite (as part of the Munich Quantum Toolkit, MQT) based on four core traits: (1) cross-level support for different abstraction levels, (2) accessibility via an easy-to-use web interface (https://www.cda.cit.tum.de/mqtbench) and a Python package, (3) provision of a broad selection of benchmarks to facilitate generalizability, as well as (4) extendability to future algorithms, gate-sets, and hardware architectures. By comprising more than 70,000 benchmark circuits ranging from 2 to 130 qubits on four abstraction levels, MQT Bench presents a first step towards benchmarking different abstraction levels with a single benchmark suite to increase comparability, reproducibility, and transparency.


Introduction
Quantum computing has gained a lot of attention due to its promising applications and the advances in quantum computing hardware. Still, designing quantum algorithms often means manually implementing quantum circuits on the gate level, similar to assembly language in classical computing. Hence, there is an urgent need to enhance the workflow of designing and testing quantum algorithms. Without software tools that aid in the design of quantum algorithms, the underlying hardware may not be efficiently utilized. Thus, researchers and developers have already proposed tools for various design tasks, such as quantum circuit simulation [1]-[13], compilation [14]-[34], or verification [35]-[50]. This has already led to comprehensive quantum circuit design flows realized through toolkits such as IBM's Qiskit [51], Google's Cirq [52], or Rigetti's Forest [53], in addition to numerous tools and methods developed by other researchers and engineers in the field.
Generally, these software tools operate on and across different levels of abstraction and all tackle computationally hard problems. As a result, they have to compromise between resource demands and result quality. Usually, when a new software tool is proposed, its performance is empirically evaluated. The question of how to evaluate the performance of a software tool is very challenging and not yet fully answered, especially for tools solving NP-complete problems. The most common approach is to run certain problem instances, so-called benchmarks, and compare the performance of the newly proposed method against state-of-the-art methods regarding a certain characteristic, e.g., run-time or solution quality.
Currently, this benchmarking is conducted using a wide variety of different benchmark suites, each with a specific focus. While some benchmark suites, e.g., [54], [55], provide benchmarks on higher abstraction levels, other benchmark suites focus on lower abstraction levels, e.g., [56], [57]. Although the intended target levels are properly covered, the demand for a cross-level benchmark suite is not fully met yet. As another consequence, there is no mutual consolidation yet on which benchmarks to use for empirical evaluations, leading to lower comparability, reproducibility, and transparency.
In this paper, we propose MQT Bench, the benchmark suite from the Munich Quantum Toolkit (MQT), which explicitly aims to address those drawbacks. It aims at providing a first step towards benchmarking the whole quantum software stack with a single benchmark suite by offering the same benchmark algorithms on different levels of abstraction. To realize such a benchmark suite, several challenges have to be tackled:
• Distinct requirements on different levels: Benchmarks have to fulfill certain requirements depending on the abstraction level, e.g., only gates of a device's native gate-set are allowed.
• Accessibility: To foster adoption of the benchmark suite, it needs to be as easy to use as possible.
• Generalizability: A comprehensive set of benchmark algorithms has to be provided to cover as many use cases as possible.
• Extendability: Future algorithms, gate-sets, and architectures should be easy to integrate.
MQT Bench has been developed with these challenges in mind and is based on four core traits: cross-level benchmarking, accessibility, a broad algorithm selection, and extendability. By this, MQT Bench aims to improve the comparability, reproducibility, and transparency of empirical evaluations for the whole quantum software stack. The rest of this work is structured as follows: In Section 2, we review the necessary basics of the quantum computing compilation flow and the respective software stack to keep this work self-contained. Afterwards, Section 3 reviews the state of the art on how this quantum software stack is currently evaluated and benchmarked, which motivates the benchmark suite proposed in this paper. Based on that, Section 4 introduces the resulting benchmark suite and its core traits, before the generated benchmarks are evaluated in Section 5. Section 6 concludes this work.

Background
To keep this paper self-contained, this section gives a brief overview of the quantum circuit compilation flow with its different abstraction levels and the respective quantum software stack. The levels mentioned in this section are inspired by the structure proposed by the openQASM 3.0 specification [58] and can be found similarly in many other compilation flows as well.

Quantum Circuit Compilation Flow
Similar to the classical domain, executing a conceptual quantum algorithm on an actual device requires compiling it to a representation that adheres to all constraints imposed by the hardware. Usually, quantum algorithms are initially developed and tested on a hardware-agnostic level. Generally, this is described as a quantum circuit that consists of high-level building blocks without any restrictions to a certain gate-set or hardware architecture and is defined as the algorithmic level.
The first step in realizing a conceptual quantum algorithm on the algorithmic level for a particular problem is to synthesize the high-level building blocks and optimize the resulting representation independently of the actual target architecture. In analogy to a classical compiler, this involves tasks such as constant propagation and folding, gate modifier evaluation, synthesis, loop unrolling, and gate simplification. The resulting representation is defined as the target-independent level.
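To illustrate what a gate simplification pass on the target-independent level may look like, the following is a minimal sketch in Python. It assumes a hypothetical circuit representation as a list of (name, qubits) tuples and only cancels identical self-inverse gates that directly follow each other in that list. Neither the representation nor the function name is taken from any particular compiler; actual compilers apply far more elaborate rules.

```python
# Gates that are their own inverse: applying the same gate twice in a row
# on the same qubits amounts to the identity.
SELF_INVERSE = {"h", "x", "y", "z", "cx", "cz", "swap"}


def cancel_adjacent_inverses(gates):
    """Remove pairs of identical self-inverse gates that directly follow each other.

    `gates` is a list of (name, qubits) tuples, e.g. ("cx", (0, 1)).
    This representation is an assumption made for illustration only.
    """
    simplified = []
    for gate in gates:
        name, _qubits = gate
        if simplified and simplified[-1] == gate and name in SELF_INVERSE:
            # The previous gate is undone by the current one: drop both.
            simplified.pop()
        else:
            simplified.append(gate)
    return simplified


circuit = [("h", (0,)), ("h", (0,)), ("cx", (0, 1))]
print(cancel_adjacent_inverses(circuit))
```

Because cancelled pairs are popped from the end of the output list, nested pairs such as X, H, H, X on the same qubit are removed in a single sweep.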

Example 2. VQAs are hybrid quantum-classical algorithms, where the parameters of the quantum ansatz are iteratively updated by a classical optimizer analogous to conventional gradient-based optimization. Consider again the circuit from Fig. 1a. Assuming that these parameters have been determined, e.g., θ_i = π for i = 0, ..., 5, they are now propagated and the resulting quantum circuit is shown in Fig. 1b.
Today's quantum devices impose several constraints on the circuits that may be executed on them. Thus, a target-dependent compilation phase is necessary. For one, devices only provide a particular set of native gates, typically consisting of an entangling gate (like the CX gate) and some family of single-qubit gates. Therefore, the gates of a circuit need to be translated to this native gate-set, which is typically followed by an optimization pass that aims to reduce the introduced overhead. This representation level is defined as the target-dependent native gates level.
Example 3. Different quantum computer realizations support different native gate-sets. In our example, we consider the ibmq_manila device as the target device that natively supports I, X, √X, R_z, and CX gates. Consequently, the R_y gates in Fig. 1b have to be converted using only these native gates. In this case, they are substituted by a sequence of R_z (denoted as • with a phase of −π) and X gates as shown in Fig. 1c.
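The substitution described in Example 3 can be checked numerically. The following is a minimal sketch, assuming the standard matrix definitions of R_y, R_z, and X, and assuming that the native sequence consists of one R_z(−π) followed by one X gate (the exact sequence of Fig. 1c is not reproduced here). All helper names are illustrative.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rz(theta):
    """Standard R_z rotation: diag(exp(-i*theta/2), exp(+i*theta/2))."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def ry(theta):
    """Standard R_y rotation."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


X = [[0, 1], [1, 0]]


def equal_up_to_global_phase(a, b, tol=1e-9):
    """Check whether a = exp(i*phi) * b for some real phi."""
    for i in range(2):
        for j in range(2):
            if abs(b[i][j]) > tol:
                # Fix the candidate phase using the first non-zero entry of b.
                phase = a[i][j] / b[i][j]
                if abs(abs(phase) - 1) > tol:
                    return False
                return all(abs(a[r][c] - phase * b[r][c]) < tol
                           for r in range(2) for c in range(2))
    return False


# Circuit order "R_z(-pi), then X" corresponds to the matrix product X * R_z(-pi).
native_sequence = matmul(X, rz(-math.pi))
print(equal_up_to_global_phase(native_sequence, ry(math.pi)))
```

Under these assumptions, the two matrices differ only by a global phase factor, which has no observable effect on measurement outcomes.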
In addition to the limited gate-set, today's devices (at least those based on superconducting qubits) only feature limited connectivity between their qubits. Consequently, the circuit's logical qubits need to be mapped to the targeted device's physical qubits, so that multi-qubit gates are only applied to qubits directly connected on the device. Since such a mapping can rarely be determined in a static fashion, i.e., globally for the whole circuit, the mapping has to change dynamically throughout the circuit, which is frequently referred to as mapping or routing. Again, this is followed by a round of optimization in order to minimize the overhead caused by the compilation. This representation level is defined as the target-dependent mapped level.
Example 4. Consider again the scenario from Example 3. The architecture of the ibmq_manila device is shown in Fig. 1d and defines between which qubits a two-qubit operation can be performed. Since the circuit shown in Fig. 1c contains CX gates operating between all combinations of qubits, there is no mapping directly matching the target architecture's layout. As a consequence, a non-trivial mapping followed by a round of optimization leads to the resulting circuit shown in Fig. 1e. This is also the reason for the different sequence of CX gates compared to the previous example.
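The observation that no static mapping exists in such a case can be reproduced with a small brute-force search. The following is a minimal sketch, assuming a linear coupling map of five qubits as a stand-in for the architecture in Fig. 1d and a three-qubit circuit with CX gates between all pairs of qubits. The function names and the data layout are illustrative only; actual mapping tools use heuristics or exact methods instead of enumeration.

```python
from itertools import permutations


def violations(cx_gates, coupling, layout):
    """Return all CX gates whose mapped qubits are not directly connected.

    `layout[q]` is the physical qubit assigned to logical qubit q.
    """
    edges = {frozenset(edge) for edge in coupling}
    return [(c, t) for c, t in cx_gates
            if frozenset((layout[c], layout[t])) not in edges]


def find_static_layout(cx_gates, coupling, num_logical, num_physical):
    """Try every assignment of logical to physical qubits (brute force)."""
    for layout in permutations(range(num_physical), num_logical):
        if not violations(cx_gates, coupling, layout):
            return layout
    return None


# Assumed linear architecture: 0 - 1 - 2 - 3 - 4.
LINE_5 = [(0, 1), (1, 2), (2, 3), (3, 4)]

all_pairs = [(0, 1), (0, 2), (1, 2)]  # CX gates between all qubit pairs
chain = [(0, 1), (1, 2)]              # CX gates along a chain only

print(find_static_layout(all_pairs, LINE_5, 3, 5))
print(find_static_layout(chain, LINE_5, 3, 5))
```

For the all-pairs circuit, no static layout exists on a line, so the mapping has to change during the circuit (e.g., by inserting SWAP gates). The chain circuit can be placed directly.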
At this point, the circuit is ready to be sent to the quantum computer's backend for execution. From there, the individual gates are scheduled, linked to a particular calibration, and the resulting circuit is eventually passed to the target machine code generator. The resulting binaries are then forwarded to the execution engine to orchestrate the quantum computation on the actual device.

Quantum Software Stack
Quantum software is needed on all levels reviewed above in order to aid designers in eventually realizing useful quantum applications. In the following, we review three of the core design tasks for software tools.
Today, quantum computers are a scarce resource with limited availability and capability. Until more devices become available that are larger and less prone to errors, classical means to simulate quantum circuits (also known as quantum circuit simulators) are essential for fostering the development of quantum applications. Additionally, and in contrast to actual quantum computing devices, quantum circuit simulators allow for detailed insights into the quantum states throughout the circuit execution since the state's amplitudes are explicitly tracked during the simulation. On actual hardware, information can only be extracted in the form of measurements from the final quantum state, each of which yields a classical bit string according to the distribution described by the state's amplitudes. Such simulators can be used on the highest possible level of abstraction, since they can be designed in a way that does not require restrictions on the circuit's gate-set or the qubits' connectivity, e.g., as proposed in [1]-[13]. Furthermore, they might also be used on the lowest possible level in order to perform noise-aware simulations to estimate how a circuit is likely to perform on an actual device, e.g., as proposed in [59]-[62]. Either way, on both abstraction levels, quantum circuit simulation is non-trivial since corresponding representations of states and operations, in general, require exponential space or runtime, which calls for powerful software tools.
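Both points, the exponential space requirement and the measurement distribution given by the amplitudes, can be made concrete with a short sketch. It assumes a dense state vector with 16 bytes per complex amplitude (two 64-bit floats); other simulation techniques, such as those based on decision diagrams or tensor networks, use different representations. The function names are illustrative.

```python
import math


def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed for a dense state vector with 2**n complex amplitudes."""
    return (2 ** num_qubits) * bytes_per_amplitude


def measurement_distribution(amplitudes):
    """Map each basis state with non-zero amplitude to its probability |a|^2."""
    num_qubits = (len(amplitudes) - 1).bit_length()
    return {format(index, f"0{num_qubits}b"): abs(amplitude) ** 2
            for index, amplitude in enumerate(amplitudes)
            if abs(amplitude) > 0}


# Three-qubit GHZ state: equal superposition of |000> and |111>.
ghz = [0.0] * 8
ghz[0] = ghz[7] = 1 / math.sqrt(2)

print(measurement_distribution(ghz))
print(statevector_bytes(30) / 2 ** 30, "GiB for 30 qubits")
```

Under this assumption, 30 qubits already require 16 GiB, and every additional qubit doubles the requirement.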
Due to the immense complexity of the tasks involved in quantum circuit compilation (as illustrated in Section 2.1), e.g., mapping being NP-complete [63], manually conducting compilation often is not an option. Consequently, efficient software tools are needed across all levels of abstraction of the aforementioned flow in order to realize a conceptual algorithm on an actual device. Thus, various compilation tools have been proposed, e.g., [14]-[34].
During compilation, an algorithm's description is considerably altered and transformed. Naturally, it is of utmost importance to ensure that the originally intended functionality is preserved through all levels of abstraction. This procedure is called verification. While the underlying principle of comparing the transformations represented by different quantum circuits is conceptually simple, the exponential size of the underlying matrices makes this problem challenging; it has been shown to be QMA-complete [64]. Due to the increasing complexity of today's compilation flows, tools for verifying their results become increasingly important. Examples of verification tools have been proposed in [35]-[50].
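The underlying principle of comparing the transformations of two circuits can be sketched for a very small instance. The example below is a naive, illustrative check that builds the full matrices explicitly: it verifies that a CX gate conjugated by Hadamard gates on both qubits realizes the CX gate with control and target exchanged. It assumes that global phases are irrelevant. Since the matrices have 2^n × 2^n entries, this approach does not scale, which is precisely why dedicated verification tools are needed.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def equivalent(u, v, tol=1e-9):
    """Check whether u * v^dagger is the identity up to a global phase."""
    product = matmul(u, dagger(v))
    phase = product[0][0]
    if abs(abs(phase) - 1) > tol:
        return False
    n = len(product)
    return all(abs(product[i][j] - (phase if i == j else 0)) < tol
               for i in range(n) for j in range(n))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
HH = kron(H, H)
CX_01 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control 0, target 1
CX_10 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control 1, target 0

conjugated = matmul(matmul(HH, CX_01), HH)
print(equivalent(conjugated, CX_10))
print(equivalent(CX_01, CX_10))
```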
These design tasks demonstrate the wide variety of software tools operating on and across different levels of abstraction, leading to comprehensive quantum circuit design flows realized through toolkits such as IBM's Qiskit [51], Google's Cirq [52], or Rigetti's Forest [53]. At the same time, all these tasks have in common that they are of immense complexity. Therefore, a multitude of techniques have been proposed for each task, each with its own trade-off between resource demand and quality of the result. It is key in the development of scientific methods to empirically evaluate and compare their performances on practical, relevant benchmarks.

Benchmarking
The benchmark suite proposed in this paper is not the first and certainly not the last collection of quantum circuit benchmarks. In this section, we review some of the existing suites and their respective foci. Afterwards, we discuss how MQT Bench further complements this.

Current State of the Art
Providing a comprehensive set of benchmarks to satisfy the needs across all levels of the quantum circuit compilation flow is a challenging task. Several approaches to provide benchmark suites for some of the levels within the quantum circuit compilation flow have already been proposed and a non-exhaustive overview is given in the following:
Application-Oriented Performance Benchmarks for Quantum Computing [54]: Starting with an example located on the algorithmic level, this benchmark suite focuses on high-level descriptions. At the time of writing, it provides 13 benchmark algorithms which can be generated with a variable number of qubits via Jupyter notebooks using multiple software compilation flows and are classified into four categories of complexity. The resulting circuits are available as high-level Python objects without any further compilation steps.
SupermarQ [55]: Similarly, SupermarQ also focuses on the algorithmic level by providing benchmarks with an adjustable qubit range for eight algorithms as high-level Python objects via a Python package. Additionally, six feature vectors are proposed to describe the characteristics of the benchmarks.
QASMbench [56]: This benchmark suite focuses on the target-independent level. It offers numerous quantum circuits with a wide but fixed range of both the number of qubits and depth. It classifies the benchmarks into three categories of sizes and three categories of algorithm classes. All circuits are available in an intermediate representation according to the openQASM 2.0 specification [65].
RevLib [57]: This benchmark suite provides a wide variety of reversible circuits in an intermediate representation format specified by the authors. Classical functions are frequently embedded into reversible circuits in order to use them in quantum algorithms, e.g., oracle functions or Boolean building blocks such as the modular exponentiation in Shor's algorithm [66]. In addition, decomposed versions of this benchmark suite have found widespread use in evaluating the performance of mapping tools, e.g., in [32]-[34]. Since reversible circuits merely form a subclass of quantum circuits and do not employ quantum mechanical effects such as superposition and entanglement, they do not serve as adequate benchmarks for the whole stack.

Motivation
All of the mentioned benchmark suites have in common that they each target a specific abstraction level within the quantum circuit compilation flow. Although the respectively intended target level is well covered, the demand for a cross-level benchmark suite is not fully met yet. Additionally, so far, there is no mutual consolidation on which benchmarks to use for empirical evaluations of software tools. This results in lower comparability, reproducibility, and transparency of results and may even cause confusion because benchmarks believed to realize the same particular task are quite frequently realized in a completely different fashion, e.g., a benchmark called "grover" could realize Grover's algorithm with any particular oracle. Consequently, the demand for a benchmark suite that spans the whole stack of abstraction levels constitutes the main motivation of this contribution. In order to realize such a benchmark suite, several challenges have to be tackled:
• Distinct requirements for benchmark circuits on different levels: The closer a level is to the actual computing hardware, the more requirements have to be fulfilled, e.g., only gates from the device's native gate-set are allowed and the connectivity of the hardware architecture must be considered.
• Accessibility: In order to facilitate its adoption, the benchmark suite needs to be as easy to use as possible. While most benchmark suites already provide open-source access, running and adapting the code to the user's needs, e.g., obtaining a selection of relevant benchmarks, frequently is tedious or impossible, and a solution that keeps this hurdle low is desirable.
• Generalizability of a software tool: This can only be guaranteed if it is evaluated on a broad set of benchmarks, relating to several characteristics, e.g., the number of qubits, circuit depth, or the application domain.
• Extendability: In addition to that, quantum computing is rapidly advancing and the extendability of the benchmark suite to future algorithms, gate-sets, and architectures becomes even more important.
[Figure: applicability of the benchmark suites Appl.-orient. Perf. Benchm. [54], SupermarQ [55], QASMbench [56], RevLib [57], and MQT Bench on the target-independent, target-dependent native gates, and target-dependent mapped levels; legend: directly applicable, additional effort needed.]
MQT Bench has been developed with these challenges in mind and aims to be a first step towards covering the whole quantum software stack with a benchmark suite based on the following four core traits (which are described in detail in Section 4):
1. Cross-level benchmarking: All benchmarks are provided on the four abstraction levels reviewed in Section 2.1.
2. Accessibility: A website is provided (https://www.cda.cit.tum.de/mqtbench) to simplify the usage of MQT Bench as much as possible. Additionally, all benchmarks can also be generated on-demand using our Python package. While most users' needs should be covered by this, we also give access to the open-source repository on GitHub (https://github.com/cda-tum/mqt-bench).
3. Algorithm Selection: A broad selection of different benchmarks with parameterizable characteristics is provided, ranging from building blocks, e.g., QFT, to applications, e.g., Grover's algorithm.
4. Extendability: MQT Bench is easily extendable with respect to available benchmarks, native gate-sets, and hardware architectures.
Benchmarking quantum software tools and compilation flows with the same benchmarks aims to aid comparability, reproducibility, and transparency of empirical evaluations. How this leads to more consistent testing is shown in the following example. Fig. 2

MQT Bench
In this section, we describe in more detail how each of the four core traits mentioned above is implemented.

Cross-Level Benchmarking
As exemplarily illustrated in Section 2, software development in quantum computing takes place on various levels. While there arguably are numerous possible levels, MQT Bench focuses on four as a balanced measure between specificity and generalizability, inspired by the structure proposed in the openQASM 3.0 specification [58]. Providing these benchmarks on all of the four abstraction levels is the main contribution of this work. More precisely, the following levels are covered:
1. Algorithmic level: In this format, all kinds of quantum gates may be used and subsumed into high-level building blocks. Loops are possible and the circuit structure might be described by constants, variables, and mathematical expressions. This level provides benchmarks on the most generic level and specific values are assigned to its variables, e.g., the number of iterations of a loop. The number of qubits can be specified.
2. Target-independent level: Here, loop unrolling, constant folding and propagation are conducted. Also, naive gate simplification is enforced. Again, the number of qubits can be adapted through the website and the used compiler can be selected.
3. Target-dependent native gates level: On this level, a particular native gate-set is chosen and the circuit is transpiled and optimized accordingly. Here, the native gate-set, the compiler used, and its settings can be specified.
4. Target-dependent mapped level: Finally, a dedicated architecture layout is chosen and the circuit is mapped to the targeted device such that it satisfies all its connectivity constraints and becomes executable on the device. Similar to the previous level, the used compiler and its settings are selectable, in addition to the target device.
All relevant information of a generated benchmark is denoted in short in the file name and in detail within the file itself.
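The options that can be selected per level, and the idea of encoding them in the file name, can be summarized in a small data structure. The following is a minimal sketch only: the class, the short level names, the option values, the file extension, and the naming scheme are all hypothetical and are not taken from the MQT Bench implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Options required per level, following the level descriptions above.
# The short level names are hypothetical.
REQUIRED_OPTIONS = {
    "alg": [],
    "indep": ["compiler"],
    "nativegates": ["compiler", "compiler_settings", "gate_set"],
    "mapped": ["compiler", "compiler_settings", "device"],
}


@dataclass(frozen=True)
class BenchmarkSpec:
    """Hypothetical description of one benchmark instance."""
    algorithm: str
    num_qubits: int
    level: str
    compiler: Optional[str] = None
    compiler_settings: Optional[str] = None
    gate_set: Optional[str] = None
    device: Optional[str] = None

    def missing_options(self):
        """List the options that the chosen level needs but that are unset."""
        return [name for name in REQUIRED_OPTIONS[self.level]
                if getattr(self, name) is None]

    def file_name(self):
        """Encode all set options in a short file name (illustrative scheme)."""
        parts = [self.algorithm, self.level, self.gate_set, self.device,
                 self.compiler, self.compiler_settings, str(self.num_qubits)]
        return "_".join(part for part in parts if part) + ".qasm"


spec = BenchmarkSpec("ghz", 5, "mapped", compiler="qiskit",
                     compiler_settings="opt2", device="ibmq_manila")
print(spec.missing_options(), spec.file_name())
```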

Accessibility
MQT Bench strives to meet the needs of many. To accomplish this goal, it has to be as user-friendly as possible. On the one hand, this involves the benchmark library itself. All of the high-level, algorithmic benchmark scripts and routines to generate representations for the individual levels are publicly available on GitHub (https://github.com/cda-tum/mqt-bench). This allows for complete transparency on how the respective circuits have been generated. Additionally, a Python package is provided such that each benchmark can be generated on-demand.
On the other hand, and most importantly, this involves the way in which users interface with the benchmark suite. While most benchmark suites are provided as an accumulation of benchmark files, which makes it hard to extract the desired set of benchmarks, MQT Bench includes an easy-to-use, no-coding-required web interface (https://www.cda.cit.tum.de/mqtbench) that provides the means to filter the large number of pre-generated benchmarks according to the specific needs of the user. This web interface and its underlying server software are also part of our Python package, such that every user can utilize the same interface locally without the need to access our publicly available server. A screenshot of the website's user interface and its configuration options is shown in Fig. 3.

Algorithm Selection
To provide a broad spectrum of different benchmarks, MQT Bench comprises most of today's de-facto standard quantum algorithms. This includes building blocks, e.g., Quantum Fourier Transform (QFT) and the Greenberger-Horne-Zeilinger state preparation (GHZ), up to higher-level algorithms, e.g., Grover's [67] and Shor's [66] algorithm. At the time of writing, MQT Bench comprises the following benchmarks:
• Amplitude Estimation (AE)
Additionally, MQT Bench provides several application benchmarks specifically targeting variational quantum algorithms, since those algorithms are especially promising in the NISQ-era [68]. To this end, we use the classification provided by IBM Qiskit's application levels: Optimization, Machine Learning, Finance, and Nature:

Extendability
Quantum computing is a rapidly evolving area of research and, especially in recent years, numerous promising algorithms and applications have emerged. Any benchmark suite has to be extendable and must continuously evolve in order to remain relevant. MQT Bench is designed to be easily extendable in a multitude of ways:
• Algorithms: New applications and algorithms can be easily integrated into MQT Bench by providing the corresponding algorithmic level description. The generation of all other levels and options (such as the native gate-sets, architectures, and compilers with their settings) is automatically handled by the proposed library.
• Native gate-sets: So far, native gate-sets of superconducting quantum computers from IBM, Rigetti, and Oxford Quantum Circuits are provided, in addition to the gate-sets of IonQ's and Quantinuum's ion trap-based quantum computers. In the future, additional gate-sets (such as those of Google's or AQT's quantum computers) can be realized by adopting the necessary compilation routines in the proposed library.
• Hardware architectures: All major players currently rely on a modular platform for the architecture of their devices in order to further scale their quantum computers. It is straightforward to extend MQT Bench's list of available architectures to accommodate future architectures.
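One common way to obtain this kind of extendability for algorithms is a registry in which each new algorithm only contributes a generator for its algorithmic level description. The following is a minimal sketch of that pattern. It is not the mechanism used by the MQT Bench library; the registry, the decorator, and the gate-list representation are hypothetical.

```python
# Hypothetical registry mapping algorithm names to generator functions.
ALGORITHMS = {}


def register(name):
    """Decorator that adds an algorithmic-level generator to the registry."""
    def decorator(generator):
        ALGORITHMS[name] = generator
        return generator
    return decorator


@register("ghz")
def ghz(num_qubits):
    """GHZ state preparation as a list of (gate name, qubits) tuples."""
    gates = [("h", (0,))]
    gates += [("cx", (q, q + 1)) for q in range(num_qubits - 1)]
    return gates


def generate(name, num_qubits):
    """Create the algorithmic-level description of a registered algorithm."""
    if name not in ALGORITHMS:
        raise KeyError(f"unknown algorithm: {name}")
    return ALGORITHMS[name](num_qubits)


print(generate("ghz", 3))
```

In such a design, the lower levels would be derived from the returned description by level-specific compilation passes, so that adding an algorithm requires no changes to those passes.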
Although the four levels MQT Bench is based on cover today's main use cases, the number of abstraction levels most likely is going to increase in the future. While the mapped level has been considered the last one before a circuit is sent to the quantum computer for execution, pulse-level programming is envisioned to give even more control to software developers as proposed in [69]. On the other end of the spectrum, higher-level abstractions and programming languages will be required to foster the adoption of quantum technology. Currently, a programmatic description of the benchmarks is needed on the algorithmic level. An even higher level, from which this programmatic description is derived automatically, is also envisioned. QUARK [70], a framework for quantum computing application benchmarking consisting of four levels where the lowest level considers the whole compilation flow discussed in Section 2.1, constitutes one step in this direction.

Evaluation
At the time of writing, MQT Bench (v1.0.0) considers two compilers, five native gate-sets, and seven devices ranging from 8 to 127 qubits. In this section, we provide an overview of several characteristics and statistics of the resulting pre-generated benchmarks. To this end, the following metrics are illustrated in Fig. 4:
Number of qubits: A first broad overview is given in Fig. 4a, which shows the relative frequency of the generated benchmarks per number of qubits and per compiler. The relative frequency of benchmarks is decreasing with an increasing number of qubits due to fewer devices being available for higher qubit numbers and the maximal generation time for a benchmark file being exceeded more frequently. Furthermore, the benchmark generation using the TKET compiler generally took more time than with Qiskit, leading to a smaller number of generated benchmarks, especially on the target-dependent mapped level.
Distribution of target-dependent mapped level benchmarks: The distribution of the target device for the mapped level is shown in Fig. 4b, with the number of qubits denoted in brackets. As expected, the larger the device, the more benchmarks are created for it. On the target-dependent native gates level, there is an equal distribution with respect to the number of benchmarks created for each of the five native gate-sets.
Distribution of benchmark characteristics: In Fig. 4c, six different characteristics with values between 0 and 1 are evaluated for all pre-generated benchmarks. Five of those characteristics (Program Communication, Critical Depth, Entanglement Ratio, Liveness, and Parallelism) have been proposed in [55]. Furthermore, the percentage of multi-qubit gates is evaluated.
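As an illustration of such a characteristic, the share of multi-qubit gates can be computed directly from a gate list. The sketch below assumes a hypothetical circuit representation as a list of (name, qubits) tuples. It additionally computes the circuit depth, which is not one of the six characteristics in Fig. 4c but is a common companion metric. The definitions of the five characteristics from [55] are not reproduced here.

```python
def multi_qubit_gate_fraction(gates):
    """Fraction of gates acting on more than one qubit (a value in [0, 1])."""
    if not gates:
        return 0.0
    multi = sum(1 for _name, qubits in gates if len(qubits) > 1)
    return multi / len(gates)


def circuit_depth(gates):
    """Number of layers when every gate is scheduled as early as possible."""
    layer_of_qubit = {}
    for _name, qubits in gates:
        # A gate starts one layer after the latest gate on any of its qubits.
        layer = 1 + max((layer_of_qubit.get(q, 0) for q in qubits), default=0)
        for q in qubits:
            layer_of_qubit[q] = layer
    return max(layer_of_qubit.values(), default=0)


# Three-qubit GHZ preparation: one H gate followed by a chain of CX gates.
ghz = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2))]
print(multi_qubit_gate_fraction(ghz), circuit_depth(ghz))
```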

Conclusion
In this work, we proposed MQT Bench, a quantum circuit benchmark suite (as part of the Munich Quantum Toolkit, MQT) comprising different algorithms, compilers, native gate-sets, and target devices, resulting in more than 70,000 benchmark circuits ranging from 2 to 130 qubits on four abstraction levels. To keep this large number of benchmarks manageable, we provide an easy-to-use web interface (https://www.cda.cit.tum.de/mqtbench) allowing users to filter the benchmarks according to their needs. MQT Bench is also provided as a Python package (including the server software to start the web interface locally), such that each of the benchmarks can be easily generated on-demand. Furthermore, we give access to the open-source repository on GitHub (https://github.com/cda-tum/mqt-bench). By this, MQT Bench presents a first step towards providing a single benchmark suite for the whole quantum software stack, facilitating empirical evaluations of quantum software tools that are comparable, reproducible, and transparent.