Qibolab: an open-source hybrid quantum operating system

We present Qibolab, an open-source software library for quantum hardware control integrated with the Qibo quantum computing middleware framework. Qibolab provides the software layer required to automatically execute circuit-based algorithms on custom self-hosted quantum hardware platforms. We introduce a set of objects designed to provide programmatic access to quantum control through pulse-oriented drivers for instruments, transpilers and optimization algorithms. Qibolab enables experimentalists and developers to delegate all complex aspects of hardware implementation to the library so they can standardize the deployment of quantum computing algorithms in an extensible hardware-agnostic way, using superconducting qubits as the first officially supported quantum technology. We first describe the status of all components of the library, then we show examples of control setup for superconducting qubit platforms. Finally, we present successful application results related to circuit-based algorithms.


Introduction
A successful deployment of quantum computing algorithms requires quantum hardware and middleware software dedicated to instrument control for specific quantum platform technologies.
The goal of middleware is to provide standardized software tools which abstract heterogeneous software interfaces from high-level applications, ranging from quantum computing algorithms based on the quantum circuit paradigm down to low-level driver instructions dedicated to a specific experimental setup, including its instruments. A proper implementation of middleware software accelerates research from theory to experiments by reducing the amount of effort and expertise required to operate a quantum platform and develop novel quantum algorithms.
Nowadays, the major challenges of middleware, as a research accelerator, include the need for standard code procedures for quantum control algorithms, calibration and characterization, all extensively tested and reviewed. This software should be designed in such a way that it can be reused by similar experiments in multiple research laboratories dedicated to quantum hardware design and fabrication. Therefore, one of the expected positive side effects of the development of middleware is the generation of a database of algorithms and procedures built and maintained by a large research community. Similar cases can be found in other research fields, such as data analysis tools [1] and Monte-Carlo event generators [2] for high-energy physics, and artificial intelligence [3].
Since the beginning of 2020, despite the growing interest in quantum computing and the recent developments in quantum hardware platforms, we have observed the lack of a standard open-source middleware framework dedicated to self-hosted quantum platforms. There are software libraries dedicated to quantum computing such as Cirq [4] and TensorFlow Quantum [5] from Google, Qiskit [6] from IBM and PyQuil [7] from Rigetti, among others. However, many of these software libraries have been promoted just to grant users access to freeware and/or commercial cloud-based platforms; hence no full-stack open-source library for quantum algorithms, from simulation to quantum hardware control, was available. Moreover, specialized quantum hardware solutions such as QCodes [31], PyCQed [32] or Labber [33] offer too rigid a structure to seamlessly incorporate all the other essential features that a full-stack solution requires. Therefore, we started developing Qibo [34][35][36][37], an open-source middleware framework for quantum computing, by establishing an international collaboration network involving laboratories in universities and research institutions located in Europe, Asia and America.
In this manuscript we present for the first time Qibolab [38], a software library which unlocks Qibo's potential to execute quantum algorithms on self-hosted quantum hardware platforms. We provide a dedicated application programming interface (API) for quantum circuit design, qubit calibration, instrument control through arbitrary pulses, driver operations including sweepers, and transpilation into a given platform topology using its native gates. A successful implementation of Qibo will deliver to the research community a first prototype of an extensible, quantum hardware-agnostic, open-source hybrid quantum operating system, fully tested and benchmarked on superconducting platforms.
The paper is organized as follows. In Sec. 2 we describe the project status, design and modules. Then, in Sec. 3 we present a detailed overview of the Qibolab library for version 0.1.0. In Sec. 4 we show examples of applications involving superconducting qubit platforms. Finally, in Sec. 5 we draw our conclusions and discuss future development directions.

Project overview and specification
In this section we summarize the status of Qibo in release 0.2.0 by describing the software design and the latest features implemented in its modules and tools, including simulation, hardware control and calibration. The aim of this section is to provide an updated high-level overview of the project, following up on the previous releases documented in Refs. [34] and [35].
For an in-depth technical description of the Qibolab library and its software features, we invite the reader to proceed to Secs. 3 and 4.

Software design
In Fig. 1 we schematically show Qibo's layout. The framework is divided into two blocks: the language API and the backend implementations for execution on various classical or quantum hardware.
The API contains a set of high-level interfaces for fast prototyping of quantum computing algorithms based on circuit and adiabatic paradigms adopting Python as programming language.
The quantum circuit API implements primitives for exact quantum state manipulation, circuit model initialization with single- and two-qubit gates, as well as more complex operations such as Toffoli gates and gate fusion. This API also includes an exhaustive interface to perform final state measurements through shots. Furthermore, dedicated functions are available for noisy quantum simulation on classical hardware. The user can build custom noise models through channels such as Kraus channel operators [39], a multi-qubit noise channel that applies Pauli operators with given probabilities, an n-qubit depolarizing quantum error channel, single-qubit thermal relaxation error channels, or readout and single-qubit reset channels. Error mitigation techniques for quantum circuits are also available with the following algorithms: Zero Noise Extrapolation (ZNE) [40], Clifford data regression (CDR) [41], randomized readout [42] and Variable Noise CDR (vnCDR) [41].
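To illustrate the idea behind ZNE, the following is a minimal sketch in plain Python, independent of Qibo's own implementation. It assumes that the expectation value of an observable has been measured at a few amplified noise levels and extrapolates it to the zero-noise limit by evaluating the interpolating polynomial at zero (Richardson-type extrapolation). The function name and the sample data are hypothetical.

```python
def extrapolate_to_zero_noise(noise_levels, expectation_values):
    """Evaluate at x = 0 the polynomial interpolating the measured points."""
    if len(noise_levels) != len(expectation_values):
        raise ValueError("need one expectation value per noise level")
    estimate = 0.0
    for i, (x_i, y_i) in enumerate(zip(noise_levels, expectation_values)):
        # Lagrange basis polynomial of point i, evaluated at x = 0
        weight = 1.0
        for j, x_j in enumerate(noise_levels):
            if j != i:
                weight *= (0.0 - x_j) / (x_i - x_j)
        estimate += weight * y_i
    return estimate


# Hypothetical data: the noise is scaled by factors 1, 2 and 3 and the
# expectation value decays quadratically from its ideal value of 1.
levels = [1.0, 2.0, 3.0]
values = [1.0 - 0.1 * x - 0.02 * x**2 for x in levels]
mitigated = extrapolate_to_zero_noise(levels, values)
```

With three noise levels the extrapolation is exact for any quadratic noise dependence; on real data the result is only an estimate whose quality depends on how well the chosen polynomial describes the noise scaling.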
Our annealing module API provides algorithms for time evolution of quantum states, symbolic and numeric matrix-based Hamiltonian allocation and adiabatic evolution [51]. In order to accelerate the initialization of Hamiltonians, Qibo provides a database of pre-coded models including the Heisenberg XXZ, the non-interacting Pauli-X/Y/Z, the transverse field Ising model (TFIM) and the max cut Hamiltonian.
In [34], practical examples illustrating the implementation of the aforementioned features of Qibo are provided. Additional examples are provided in the Qibo documentation [52].
From the implementation point of view, Qibo provides multiple execution backends which are responsible for the conversion and execution of the primitives on different hardware. Each backend inherits from an abstract interface which determines the set of methods that must be implemented in order to execute the primitives of the language API.
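The backend pattern described above can be sketched with Python's abc module. This is an illustrative toy, not Qibo's actual abstract backend: the class names, the method names and the restriction to X gates on basis states are all assumptions made to keep the example short.

```python
from abc import ABC, abstractmethod


class AbstractBackend(ABC):
    """Hypothetical minimal interface; Qibo's real abstract backend is richer."""

    @abstractmethod
    def zero_state(self, nqubits):
        """Return the backend-specific representation of |0...0>."""

    @abstractmethod
    def apply_gate(self, gate_name, qubit, state):
        """Return the state obtained by applying a gate to one qubit."""

    def execute(self, gates, nqubits):
        # Shared logic: every concrete backend reuses this loop unchanged.
        state = self.zero_state(nqubits)
        for name, qubit in gates:
            state = self.apply_gate(name, qubit, state)
        return state


class ToyBasisStateBackend(AbstractBackend):
    """Tracks one computational basis state, so only X gates are supported."""

    def zero_state(self, nqubits):
        return [0] * nqubits

    def apply_gate(self, gate_name, qubit, state):
        if gate_name != "X":
            raise NotImplementedError(gate_name)
        flipped = list(state)
        flipped[qubit] ^= 1
        return flipped


backend = ToyBasisStateBackend()
final_state = backend.execute([("X", 0), ("X", 2), ("X", 0)], nqubits=3)
```

The high-level code only calls execute, so swapping one backend for another (a simulator for a hardware backend, for instance) does not require changes to the circuit definition.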
At the current stage, we support simulation backends on classical hardware and, through Qibolab, the same high-level code can be executed directly on quantum hardware. In practical terms, we can consider Qibolab as an actual hardware backend, once a specific platform is selected by the user. Furthermore, this modularity opens the possibility to create further tools which rely on Qibo and its backends, for example tools for quantum chemistry, multi-qubit calibration routines, benchmarking, machine learning inspired algorithms and others, as well as to add further backends for simulation or hardware execution.

Classical quantum simulation
Simulation is a crucial part of quantum computing research, particularly in the current Noisy Intermediate-Scale Quantum (NISQ) [53] era, where exact results from simulation can be used for validating algorithms or implementing error mitigation routines.
In Qibo, both gate-based and adiabatic quantum computation paradigms can be simulated on classical hardware. Thanks to its modularity, quantum algorithms can be deployed on three different simulation backends, which are designed to meet specific needs, as represented in Fig. 1. In this section we summarize the advantages and limitations of each backend currently available in Qibo: numpy, tensorflow and qibojit. We also highlight which backend is best suited depending on the application.
The numpy backend is based on NumPy's primitives [54], as explained in more detail in [36]. It is a lightweight backend, which supports single-threaded CPU simulations with a moderate performance. This setup is usually recommended for circuits up to 20 qubits. The importance of this backend lies in its broad compatibility with many classical system architectures, including arm64, which makes it a safe and stable choice, especially in development contexts, e.g. in laboratories where quantum platforms are being installed and tested.
The second backend is based on TensorFlow [3] primitives. Similarly to the numpy backend, it can be used for tackling problems involving a limited number of qubits, although it allows quantum simulation on multi-threaded CPUs and on a single GPU. The tensorflow backend inherits TensorFlow's optimization routines, including state-of-the-art gradient-based optimizers. This feature is particularly useful in the context of Quantum Machine Learning (QML), where automatic differentiation routines can be exploited for training hybrid quantum-classical machine learning models [48].
Within the optimization module of Qibo, we have implemented a function that uses the automatic differentiation provided by TensorFlow to execute gradient-based optimization strategies, namely qibo.optimizers.sgd. This function can be customized according to the developer's needs on top of the features offered by TensorFlow itself. Executing this function requires the tensorflow backend.
Typically, in Machine Learning, gradients are calculated through the Back-Propagation (BP) algorithm [55], which requires saving copies of matrices and vectors during the process. Since TensorFlow uses this method, the tensorflow backend requires copies of the state vector during simulation, which increases memory consumption and reduces performance.
The third backend is qibojit [35], a high-performance simulation backend which combines Just-In-Time (JIT) compilation with the definition of custom operators for state vector manipulation. Here, the action of quantum gates is optimized by considering matrix properties like sparsity and symmetries, and by avoiding the allocation of new copies of matrices and vectors, which are instead modified in place. The qibojit structure is shown in Fig. 2, which illustrates the specific implementation adopted for CPU and GPU(s) environments.
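The in-place update strategy can be illustrated with a short, unoptimized sketch in plain Python. It is not the qibojit kernel: there is no JIT compilation here, and the qubit ordering (qubit 0 as the most significant bit of the basis-state index) is an assumption of this example. The point is only that a single-qubit gate touches pairs of amplitudes and can overwrite them without allocating a second state vector.

```python
import math


def apply_single_qubit_gate_inplace(state, gate, target, nqubits):
    """Apply a 2x2 gate to `target` by overwriting amplitude pairs."""
    stride = 1 << (nqubits - 1 - target)
    (g00, g01), (g10, g11) = gate
    for index in range(len(state)):
        if index & stride:
            continue  # each pair is visited once, from its "target = 0" member
        a0, a1 = state[index], state[index | stride]
        state[index] = g00 * a0 + g01 * a1
        state[index | stride] = g10 * a0 + g11 * a1


s = 1 / math.sqrt(2)
hadamard = ((s, s), (s, -s))
state = [1 + 0j, 0j, 0j, 0j]  # two qubits initialized in |00>
apply_single_qubit_gate_inplace(state, hadamard, target=0, nqubits=2)
# state now holds (|00> + |10>) / sqrt(2)
```

Only two temporary amplitudes are held at any time, so the memory footprint stays at one state vector regardless of the circuit depth.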
Multi-threading CPU, GPU and multi-GPU configurations are supported by qibojit. Simulations on CPUs are based on NumPy tensors and accelerated with Numba [56], while for GPU and multi-GPU executions we adopt CuPy [57]. For GPUs, two different acceleration strategies are implemented. First, we exploit CuPy's RawKernel method, thanks to which we can write custom CUDA kernels in C++ and seamlessly import them in Python. The second accelerated simulator is implemented using primitives from NVIDIA cuQuantum [58]. The qibojit backend is the suggested choice for simulating systems with a large number of qubits. In Section 3.1 of [35], we have conducted benchmarking tests using Qibo on different classical hardware, focusing on significant quantum circuits like Quantum Fourier Transform [59] and Bernstein-Vazirani [60], as the number of qubits increases. These benchmarks also include comparisons of Qibo's performance with other public quantum computing libraries.
Recognizing the importance of simulation, even as technology advances and the quality of quantum devices improves, we plan to improve Qibo from a simulation perspective. With this in mind, we are working on the development of new backends, supporting multi-node distribution of state vector simulation and, by changing the simulation method completely, the construction of a tensor network [61][62][63][64][65] backend.

Quantum hardware support
In the previous section we have shown how Qibo can be used for quantum circuit simulation. Although simulation is a useful tool for testing and profiling quantum algorithms, we are still mainly interested in deploying such algorithms on quantum processors to show the advantages of this technology [66].
Quantum computers can be implemented using several quantum systems, including superconducting circuits [67], trapped ions [68] or neutral atoms [69], among others. In this paper, we focus on superconducting devices, but Qibolab provides an extensible abstraction library to accommodate other quantum technologies; the only precondition is that the experimental setup be composed of instruments that communicate with each other and with the QPUs. As described broadly in Sect. 3.1, it is possible to mirror any experimental configuration by inheriting from the Platform class and deploying the suitable Instrument and Qubit methods, specifying all connections through the Channel class. Transmons [70], weakly anharmonic oscillators made using Josephson junctions [71], are one possible implementation of qubits through superconducting devices. To perform measurements, transmons are dispersively coupled to superconducting resonators which are in turn coupled to a microwave transmission line.
Gates are implemented by coupling qubits through microwave drive and flux [70] lines that carry control pulses with precise amplitude and duration.
As shown in Fig. 1, Qibolab includes all the necessary components to construct a backend for the deployment of quantum algorithms on self-hosted quantum processing units (QPUs). The addition of such a backend is facilitated by Qibo's modular layout [36], which enables users to create custom backends with minimum effort. For the particular case of a hardware backend, this feature allows us to focus only on low-level components.
Qibolab provides an API to define Pulse objects, able to perform low-level manipulations such as executing a specific sequence of microwave pulses, similarly to other libraries [72][73][74][75]. Through this interface it is possible to easily code both experiments and calibration protocols. Such an abstraction is quite practical given that instruments may have different definitions for specific waveforms.
Another key element listed in Fig. 1 is the presence of drivers to control and interface Qibo with different instruments. To generate the appropriate microwave pulses needed to perform quantum gates, a common approach is to use Arbitrary Waveform Generators (AWG), digital-to-analogue converters (DAC) and analogue-to-digital converters (ADC), which are nowadays available through Field Programmable Gate Arrays (FPGAs). All these devices usually provide libraries or packages to control them, e.g., Qblox Instruments [76], Qcodes [31] and LabOneQ [74]. Despite such heterogeneity, Qibolab defines a common interface to properly expose the package methods required to control QPUs.
Finally, Qibolab takes care of all the necessary operations to prepare the execution of quantum circuits on a fully characterized device. Among these, there is a transpilation step of circuits to the native gates supported by the quantum processor and a compilation step to convert these gates to pulses. Sect. 3.3 presents a more precise description of the transpilation step. For a more detailed description of the Qibolab backend, we invite the reader to check Sect. 3.1.
[Figure caption, beginning truncated: ...ling a QPU.] Qibolab is running on a host computer, which communicates, typically via a network protocol, with the control electronics used for pulse generation. These electronics are connected to the QPU via different channels: the readout and feedback channels in a closed loop for measuring the qubit, the drive channel for applying gates and, for flux-tunable qubits, the flux channels for tuning their frequency.

Hardware characterization
While the API provided by Qibolab enables full control over the electronics interacting with the qubits, this alone is not sufficient for operating a quantum computer. This is because the accurate fine-tuning and calibration of control waveform parameters are crucial requirements for quantum hardware to work successfully [77].
Within the Qibo environment, Qibocal [78,79] offers the necessary tools for calibrating, characterizing, and validating QPUs through a collection of platform and instrument agnostic experiments, or routines.
With Qibocal, it is possible to deploy various calibration and characterization protocols and generate a comprehensive HTML report summarizing the results. Alongside the report, Qibocal also produces the new platform configuration containing the fine-tuned parameters found.
When executing multiple experiments, these parameters can be updated at runtime, allowing for complex routines with real-time feedback. This feature unlocks Qibocal's potential to perform automated hardware calibration, which will be presented in a future manuscript in preparation [86].

Quantum computing drivers
Qibolab provides a unified framework for controlling the different electronics that are needed to operate a quantum computer. To achieve this, we provide software abstractions and patterns that can be followed by a laboratory in order to operate their self-hosted devices. As a use case, we support drivers for multiple commercial instruments, which we use to showcase the library and provide benchmarks in Sec. 4. In the following sections we describe the software abstractions and supported drivers in more detail.

Software abstractions
Qibolab provides two main interface objects: the Pulse object for defining arbitrary pulses to be played on qubits, and the Platform which is used to execute these pulses on a specific QPU.
Pulses constitute the building blocks of programs that are executed on quantum hardware. They can be used to read the state of a qubit, drive it to change its state, or flux a qubit to change its resonance frequency and probe two-qubit interactions. Qibolab provides pulse objects for each of these operation modes, and each Pulse object holds information about the amplitude, frequency, phase, start and duration of the pulse, which are required for the generation of physical pulses. We also provide the functionality to generate waveforms of different shapes, such as Rectangular, Gaussian or DRAG [87].
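As an illustration of the waveform shapes mentioned above, the following sketch generates sampled Gaussian and DRAG envelopes in plain Python. It does not use the Qibolab Pulse API; the function names and the parametrization (a width given by num_samples / rel_sigma and a DRAG coefficient beta) are assumptions of this example. The DRAG quadrature component is taken proportional to the time derivative of the Gaussian in-phase component.

```python
import math


def gaussian_envelope(num_samples, amplitude, rel_sigma):
    """Gaussian envelope centred on the pulse, sigma = num_samples / rel_sigma."""
    center = (num_samples - 1) / 2
    sigma = num_samples / rel_sigma
    return [
        amplitude * math.exp(-((n - center) ** 2) / (2 * sigma**2))
        for n in range(num_samples)
    ]


def drag_envelopes(num_samples, amplitude, rel_sigma, beta):
    """Gaussian in-phase part plus a derivative-shaped quadrature part."""
    center = (num_samples - 1) / 2
    sigma = num_samples / rel_sigma
    in_phase = gaussian_envelope(num_samples, amplitude, rel_sigma)
    quadrature = [
        beta * (-(n - center) / sigma**2) * i_n
        for n, i_n in enumerate(in_phase)
    ]
    return in_phase, quadrature


# Hypothetical parameters: 41 samples, peak amplitude 0.5
i_wave, q_wave = drag_envelopes(num_samples=41, amplitude=0.5, rel_sigma=5, beta=0.2)
```

The in-phase envelope peaks at the pulse centre, where the quadrature envelope crosses zero; the quadrature envelope is antisymmetric around that point.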
Real experiments involve playing multiple pulses on different qubits. In Qibolab, pulses can be aggregated in a PulseSequence. The Pulse API provides flexibility in scheduling such sequences by specifying when each individual pulse starts in time and allowing overlapping pulses, which are essential for features such as readout multiplexing [88].
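The scheduling model described above (explicit start times, overlaps allowed) can be sketched with a few lines of plain Python. ToyPulse and ToyPulseSequence are hypothetical stand-ins rather than the Qibolab classes, and the channel names and durations are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPulse:
    """Hypothetical stand-in for a pulse; times are in nanoseconds."""
    start: int
    duration: int
    channel: str

    @property
    def finish(self):
        return self.start + self.duration


class ToyPulseSequence(list):
    """A list of pulses with two scheduling helpers."""

    @property
    def duration(self):
        # Total length of the sequence: the latest finishing time.
        return max((pulse.finish for pulse in self), default=0)

    def overlapping_pairs(self):
        """Return pairs of pulses that overlap in time on the same channel."""
        pairs = []
        for i, a in enumerate(self):
            for b in self[i + 1:]:
                same_channel = a.channel == b.channel
                if same_channel and a.start < b.finish and b.start < a.finish:
                    pairs.append((a, b))
        return pairs


sequence = ToyPulseSequence([
    ToyPulse(start=0, duration=40, channel="drive_q0"),
    ToyPulse(start=40, duration=2000, channel="readout_line"),  # qubit 0
    ToyPulse(start=40, duration=2000, channel="readout_line"),  # qubit 1
])
```

Here the two readout pulses deliberately overlap on the shared readout line, which is the situation that multiplexed readout relies on.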
Abstract sequences of pulses defined using the Pulse API can be deployed on hardware using a Platform. This core Qibolab object is used to orchestrate the different instruments for qubit control. Each Platform instance corresponds to a specific quantum chip controlled by a specific set of instruments. It allows users to execute a single sequence, a batch of sequences, or perform a sweep, in which one or more pulse parameters are updated in real time within the control instrument. Real-time sweeps and batched execution of sequences significantly speed up qubit calibration and characterization procedures.
A Platform comprises different objects, as shown in Fig. 4. Qubit objects are representations of the physical qubits. They contain information about physical parameters associated with a qubit that are measured during calibration and characterization [89,90], such as the coherence times T_1 and T_2, or the parameters of pulses and sequences needed for single-qubit native gates. Similarly, QubitPair objects contain information about the neighboring pairs of qubits in a chip and the corresponding two-qubit native gates. The topology of the chip is extracted from the available pairs and is used by the transpiler presented in Sec. 3.3.
Platform holds a collection of Instrument objects which contain the low-level drivers for operating the laboratory equipment. The abstract Instrument class contains the methods one needs to implement when interfacing Qibolab to the libraries provided by the instrument's manufacturers, so that the instrument can be used as part of a larger instrument setup compatible with all functionalities provided by the Qibo framework. Controller is a subclass used by instruments that have arbitrary waveform generators and can play and acquire pulses. Qibolab provides pre-coded driver implementations for several commercial qubit control instruments, as described in Sec. 3.2.
Finally, Channel represents a connection from qubits to instruments. Through the Port object it also implements an interface for controlling instrument parameters. This connection is essential for playing pulses from the instrument port that targets the desired qubit. It also provides a qubit-centric interface for setting instrument parameters, which is useful in calibration routines.
To operate a real QPU, one needs to create a Platform that mirrors the channel and instrument configuration of the lab, following the example shown in Fig. 3. The procedure is outlined in the following steps:
1. instantiate Instrument objects for all instruments in the lab setup;
2. create Channel objects for all connections between instruments and qubits, and map them to the corresponding instrument ports. Auxiliary instruments such as local oscillators can also be mapped to a Channel;
[...] in the Platform generation, while dynamic parameters are loaded as external data.
More details on how a custom Platform can be written for a specific lab setup can be found in the online documentation [91].
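The wiring of instruments, channels and qubits into a platform can be sketched as follows. This is a toy model in plain Python, not the Qibolab API: all class names, the port_for helper, the network address, and the channel and port labels are hypothetical, and real platforms carry many more parameters.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInstrument:
    name: str
    address: str


@dataclass
class ToyChannel:
    name: str
    instrument: ToyInstrument
    port: str


@dataclass
class ToyQubit:
    name: int
    drive: ToyChannel
    readout: ToyChannel


@dataclass
class ToyPlatform:
    name: str
    qubits: dict = field(default_factory=dict)
    instruments: dict = field(default_factory=dict)

    def port_for(self, qubit, line):
        """Resolve which instrument port serves a given line of a qubit."""
        channel = getattr(self.qubits[qubit], line)
        return channel.instrument.name, channel.port


def create_platform():
    # Instantiate the instruments of the (hypothetical) lab setup.
    controller = ToyInstrument("controller", "192.168.0.2")
    # Create channels and map them to instrument ports.
    drive = ToyChannel("L3-15", controller, "o1")
    readout = ToyChannel("L3-25", controller, "o2")
    # Assemble the qubit and the platform from the objects above.
    qubit = ToyQubit(0, drive=drive, readout=readout)
    return ToyPlatform("toy", qubits={0: qubit},
                       instruments={"controller": controller})


platform = create_platform()
```

Keeping the qubit-to-port mapping in one place means that a pulse addressed to a qubit line can be routed to the right instrument output without the user knowing the cabling.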
If the parameters of an existing platform are updated, for example through a calibration routine, it is possible to dump the new parameters on disk using serialization methods [92]. Parameters are uploaded to the respective devices using their specific API, which is abstracted by the Platform interface.
Executing a program on the created Platform is also a multi-step process. Users can write their programs using the Qibolab Pulse API or the Qibo Circuit API. The former is commonly used for low-level applications such as qubit calibration, while the latter is needed for executing quantum algorithms. Execution of circuits involves additional steps. First, they are transpiled (see Sec. 3.3) to new circuits that respect the QPU connectivity and native gates. Second, native gates are compiled to pulses following a set of rules which are held in the Compiler object of the Qibolab backend.
Once a PulseSequence is available, either directly or from compilation of a circuit, it can be deployed using a Platform. The Platform will send each pulse to the appropriate instrument ports and acquire the feedback associated with measurements. The Qibolab port is an internal abstraction that bridges the gap between the compiled Qibolab PulseSequence and the specific input and output formats defined by each device. Results are returned to the user according to the specified format. Available formats are classified shots (0 or 1), integrated and demodulated voltage signals, or raw waveform signals. All formats can be obtained as single shots or averaged. More details on the different result formats can be found in the online documentation [93].
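The relation between the three result formats can be illustrated with a small signal-processing sketch in plain Python. It is not the code run by any of the supported instruments: the function names, the synthetic trace, the sampling rate and the calibrated IQ centroids are all assumptions. A raw waveform is demodulated at the intermediate frequency and integrated into a single IQ point, which is then classified as 0 or 1 according to the nearest centroid.

```python
import cmath
import math


def integrate_iq(raw_waveform, intermediate_frequency, sampling_rate):
    """Demodulate a raw readout trace and integrate it to one IQ point."""
    total = 0j
    for n, sample in enumerate(raw_waveform):
        phase = 2 * math.pi * intermediate_frequency * n / sampling_rate
        total += sample * cmath.exp(-1j * phase)
    return total / len(raw_waveform)


def classify(iq_point, iq_state0, iq_state1):
    """Assign 0 or 1 according to the closest calibrated IQ centroid."""
    return 0 if abs(iq_point - iq_state0) <= abs(iq_point - iq_state1) else 1


# Synthetic raw trace: 100 samples at 1 GSa/s of a 100 MHz tone
# with unit amplitude and phase pi/3 (an integer number of periods).
trace = [math.cos(2 * math.pi * 0.1 * n + math.pi / 3) for n in range(100)]
iq = integrate_iq(trace, intermediate_frequency=100e6, sampling_rate=1e9)
shot = classify(iq, iq_state0=0.5 + 0j,
                iq_state1=0.5 * cmath.exp(1j * math.pi / 3))
```

For this trace the integrated point has magnitude 0.5 and phase pi/3, so it coincides with the second centroid. Averaged results correspond to taking the mean of such IQ points (or of the classified shots) over repetitions.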

Supported drivers
Version 0.1.0 of the Qibolab package provides extensive support for various devices used in quantum hardware control. Specifically, it supports devices developed by Qblox [94], Quantum Machines [95] and Zurich Instruments [96], as well as RFSoC (Radio Frequency System on Chip) FPGAs (Field Programmable Gate Arrays) supported by the Qick project [97] and by Qibosoq [98]. Each of these devices has distinct requirements and operational methods, which call for careful handling to ensure seamless control through a unified interface. In more detail:
Qblox the Qblox Instruments cluster [99] where we tested Qibolab is composed of several modular devices controlled as one. The Qblox Cluster is a scalable 19-inch rack instrument that can be configured with a combination of up to 20 modules that can control and read out qubits over a wide frequency range (up to 18.5 GHz). Our setup to control 5 superconducting flux-tunable qubits without coupler-mediated interactions is composed of:
QRM-RF two Qubit Readout Modules [100] with one input channel and one output channel in the radio-frequency regime and 2 digital markers. The module provides all necessary capabilities for qubit readout without external up or down conversion for signals in the range of 2-18.5 GHz.
QCM-RF three Qubit Control Modules [101] with two drive channels per module dedicated to qubit control using parametrized pulses, which allow the user to control up to 5 qubits.
QCM two QCM modules [102] to control the DC voltage applied to the flux channels of the qubits and generate the flux pulses needed to implement two-qubit gates. The dynamic output range of the DACs (digital-to-analog converters) of these modules is 5 Vpp (peak-to-peak voltage, the difference between the highest and the lowest voltage values in an AC signal), with a 1 GSPS sampling rate.
The synchronization of the signals between the modules is performed by the Qblox Cluster using SYNQ [103] protocols. The high-level interface for the devices comes from the Qblox Instruments [76] and Qcodes [104] Python-based libraries, and the low-level communication with the sequencers is made using assembly code (Q1ASM) [105]. This setup allows control of 5+ flux-tunable superconducting qubits.
Quantum Machines Qibolab has been tested in controlling a cluster of nine OPX+ controllers [106], which communicate with all-to-all connectivity to support fast feedback operations between any pair of controllers. The synchronization and clock distribution are handled by OPT devices. Each OPX+ controller has ten analogue output ports, ten digital output ports and two input ports, making the cluster capable of controlling 25+ flux-tunable capacitively coupled qubits.
The main disadvantage of our OPX+ controllers, compared to other instruments used in this work, is that the IQ mixing and upconversion are not handled internally and that the intermediate frequency bandwidth (400 MHz) and the output voltage (0.5 V) are small. Due to these limitations, additional external instruments including local oscillators, mixers and sometimes amplifiers are needed to successfully drive and flux qubits. The Qibolab driver controls the whole cluster as a single instrument using the QUA library [75]. This library exposes many low-level operations to Python via an intuitive but rich set of commands, which extends beyond simple pulse scheduling and includes conditional logic, loops and complex mathematical operations.
Zurich Instruments the Zurich Instruments cluster where we tested Qibolab is composed of several modular devices controlled as one.
SHFQC a single SHFQC [107], which can control the drive and readout of up to 6 superconducting qubits connected to the same readout probe. The IQ mixing and upconversion are taken care of internally by using a proprietary, cleaner signal upconversion and downconversion scheme [108] with an instantaneous bandwidth around 1.2 GHz without the need for calibration. It also provides an output voltage of 2 Vpp.
HDAWG two HDAWGs [109], each providing up to 8 DC-coupled single-ended analogue output channels to control the flux pulses required to interact with qubits and couplers, with up to 5 Vpp output voltage.
PQSC a single PQSC [110], to synchronize the previous devices via the low-latency, real-time communication link ZSync. The PQSC comes with 18 ZSync ports to distribute the system clock and synchronize the instruments. Furthermore, the links provide a bidirectional data interface to send qubit readout results to the PQSC for central processing and send trigger signals to the slave instruments for feedback.
The high-level interface for the devices comes from the Python-based LabOneQ library [74]. This setup allows control of 5+ flux-tunable superconducting qubits with tunable coupler-mediated interactions.
RFSoCs the RFSoCs supported by Qibolab currently include the RFSoC4x2 [111], the ZCU111 [112], and the ZCU216 [113] manufactured by Xilinx. These FPGAs offer direct RF synthesis capability up to ≈ 9.8 GHz. This simplifies the experimental setup by eliminating the need for additional local oscillators and IQ mixers. To interact with the Qick firmware, the driver relies on a server that runs on board, called Qibosoq.
Both the Qibosoq server and the Qick firmware are open source, reducing costs for setting up a new laboratory. However, it is important to note that these boards have limitations in terms of the number of qubits they can control, can be challenging to synchronize in multi-board setups and, in general, the software supports fewer features than for other devices.
In addition to the devices responsible for synthesizing pulses to control the qubits and acquiring signals for measurements, a comprehensive quantum control system relies on additional devices. Among these, local oscillators play a crucial role in up and down converting microwave signals for some of our devices and in pumping the traveling-wave parametric amplifiers (TWPAs). Integrating local oscillators within the same framework is essential since they need to be calibrated and turned on and off during the control process. Qibolab facilitates seamless integration of these devices and includes drivers for Erasynth and Rohde&Schwarz local oscillators in version 0.1.0.
An outline of the supported instruments is presented in Table [...]. [...] drivers included with Qibolab version 0.1.0. It is important to note that while some limitations and missing features are currently present, they are not necessarily inherent to the devices themselves and will be addressed in future versions of Qibolab.
The following is a description of the features presented in Table 2.

Arbitrary pulse sequences the capability of executing arbitrary pulse sequences defined in Qibolab, which is a fundamental requirement of a driver. This feature is not related to the execution of pulses with arbitrary waveform shapes.
Arbitrary waveforms the capability of executing pulse waveforms of arbitrary shape. For drivers that do not support this feature, rectangular, Gaussian and DRAG waveforms can still be synthesized.
Multiplexed readout the capability of playing and acquiring multiple multiplexed pulses through the same line. It is particularly useful for multi-qubit chips where the readout line is commonly shared among multiple qubits.
Hardware classification the capability of doing single shot measurement classification during the execution of a pulse sequence.
Fast reset the capability of actively resetting the state of a qubit to zero after a measurement. This feature requires hardware classification and enables faster executions of repeated pulse sequences.
Device simulation the possibility of simulating in advance the pulses to be executed, without directly using quantum hardware.

RTS frequency RTS (Real Time Sweeper) refers to the capability of executing a pulse sequence multiple times with different values of, in this case, the frequency of a pulse. This feature facilitates faster qubit characterization and experiments.
RTS amplitude real-time sweeping of the amplitude of a pulse.
RTS duration real-time sweeping of the duration of a pulse.
RTS start real-time sweeping of the start time of a pulse.
RTS relative phase real-time sweeping of the relative phase of a pulse.

RTS 2D the capability of combining two RTS scans on different parameters.
Sequence unrolling the capability of unrolling several smaller subsequences into a longer single sequence as in loop unrolling.It aims to decrease the overall time spent on compilation and communication steps by reducing its amount from once every subsequence to once every unrolled sequence.
Hardware averaging the capability of repeating the same experiment multiple times and obtain, directly from the device, averaged results.

Singleshot (No Averaging
) the capability of obtaining from the devices all the non-averaged results.
Integrated acquisition the capability of acquiring complex signals [114] with "in-phase" and "quadrature" (IQ) components demodulated and integrated for the measuring time.
Classified acquisition the capability of performing 0-1 state classification after the integrated acquisition.
Raw waveform acquisition the capability of acquiring non-integrated IQ waveform values.
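The acquisition modes listed above can be illustrated with a small, hardware-free sketch. It is not Qibolab code: the sampling rate, intermediate frequency, IQ centres and noise level are all assumed values, and the function names are hypothetical. A raw trace is demodulated and integrated into one IQ point per shot (integrated acquisition), each point is assigned to the nearest calibrated centre (classified acquisition), and the shots are finally averaged (hardware averaging).

```python
import cmath
import math
import random

SAMPLE_RATE = 1e9  # samples per second (assumed)
IF_FREQ = 50e6     # intermediate frequency of the readout tone in Hz (assumed)
N_SAMPLES = 1000   # 1 us window, i.e. an integer number of IF periods


def raw_waveform(iq_point):
    """Real-valued trace carrying the complex amplitude `iq_point` (raw acquisition)."""
    return [
        (iq_point * cmath.exp(2j * math.pi * IF_FREQ * n / SAMPLE_RATE)).real
        for n in range(N_SAMPLES)
    ]


def integrate(trace):
    """Demodulate at IF_FREQ and integrate over the window (integrated acquisition)."""
    acc = sum(
        value * cmath.exp(-2j * math.pi * IF_FREQ * n / SAMPLE_RATE)
        for n, value in enumerate(trace)
    )
    return 2 * acc / len(trace)


def classify(iq, centre0, centre1):
    """Assign the shot to the closest calibrated centre (classified acquisition)."""
    return 0 if abs(iq - centre0) <= abs(iq - centre1) else 1


rng = random.Random(0)
centres = {0: 1.0 + 0.0j, 1: -0.5 + 0.8j}  # hypothetical calibrated IQ centres
prepared = [0, 1, 1, 0, 1, 0, 0, 1]

# Single-shot results: one integrated IQ point per repetition.
shots = []
for state in prepared:
    noisy = centres[state] + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))
    shots.append(integrate(raw_waveform(noisy)))

classified = [classify(iq, centres[0], centres[1]) for iq in shots]
averaged = sum(shots) / len(shots)  # what hardware averaging would return

print(classified)
print(averaged)
```

Because the window spans an integer number of IF periods, the integration recovers the complex amplitude of the tone up to floating-point error, which is why a single complex number per shot is sufficient for classification.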

Transpiler
Logical quantum circuits for quantum algorithms are hardware agnostic. Usually an all-to-all qubit connectivity is assumed, while most current hardware only allows the execution of two-qubit gates on a restricted subset of qubit pairs. Moreover, quantum devices are restricted to executing a subset of gates, referred to as native [115]. This means that, in order to execute circuits on a real quantum chip, they must be transformed into an equivalent, hardware-specific circuit. The transformation of the circuit is carried out by the transpiler in two key steps: connectivity matching [116] and native gates decomposition [117].
In order to execute a gate between two qubits that are not directly connected, SWAP gates [118] are required. This procedure is called routing. Since two-qubit gates are a large source of noise on NISQ devices, this procedure generates an overall noisier circuit. Therefore, the goal of an efficient routing algorithm is to minimize the number of SWAP gates introduced. An important step to ease the connectivity problem is finding an optimal initial mapping between logical and physical qubits. This step is called placement. The native gates decomposition in the transpiling procedure is performed by the unroller. An optimal decomposition uses the smallest number of two-qubit native gates. It is also possible to reduce the number of gates of the resulting circuit by exploiting commutation relations [119], KAK decomposition [120] or machine learning techniques [121].
Qibolab implements a built-in transpiler with customizable options for each step. The main algorithms that can be used at each transpiler step are reported below with a short description. The initial placement can be found with one of the following procedures:
• Trivial: logical-physical qubit mapping is an identity.
• Custom: custom logical-physical qubit mapping.
• Random greedy: the best mapping is found within a set of random layouts based on a greedy policy.
• Subgraph isomorphism: the initial mapping is the one that guarantees the execution of most gates at the beginning of the circuit without introducing any SWAP.
• Reverse traversal: this technique uses one or more reverse routing passes to find an optimal mapping by starting from a trivial layout [122].
The routing problem can be solved with the following algorithms:
• Shortest paths: when unconnected logical qubits have to interact, they are moved on the chip along the shortest path connecting them. When multiple shortest paths are present, the one that also matches the largest number of the following two-qubit gates is chosen.
For the two-qubit native gates it is possible to use CZ and/or iSWAP. When both CZ and iSWAP gates are available, the chosen decomposition is the one that minimizes the use of two-qubit gates.
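As a concrete instance of a native gates decomposition, the sketch below checks numerically two textbook identities: a CNOT equals a CZ conjugated by Hadamards on the target, and a SWAP equals three alternating CNOTs. It is a minimal illustration rather than the Qibolab unroller; the helper names are hypothetical and the first qubit is assumed to be the most significant bit.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a, b):
    """Kronecker product, with `a` acting on the most significant qubit."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


S = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[S, S], [S, -S]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

# CNOT with control on qubit 0: Hadamards on the target around one CZ.
cnot_01 = matmul(kron(I2, H), matmul(CZ, kron(I2, H)))
# CZ is symmetric, so moving the Hadamards swaps the roles of control and target.
cnot_10 = matmul(kron(H, I2), matmul(CZ, kron(H, I2)))
# A SWAP costs three CNOTs, hence three CZ gates on a CZ-native chip.
swap = matmul(cnot_01, matmul(cnot_10, cnot_01))

print(close(cnot_01, CNOT), close(swap, SWAP))
```

The second identity is the reason routing is expensive: every inserted SWAP adds three two-qubit native gates to the circuit.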
The benchmarking of a full transpiling pipeline can be complex, as the results may vary across chip architectures and a trade-off between performance and execution time needs to be taken into account. We refer to the specific literature [123] for the general benchmarking problem and focus on a more specific use case. Fig. 5 reports the performance of the routing pass algorithms implemented in Qibolab on a five-qubit chip with a star connectivity. In this kind of chip, a central qubit is connected to all the remaining four qubits. The algorithm performance has been evaluated as the CNOT overhead, that is, the number of CNOT gates in the routed circuit divided by the number of CNOT gates in the original circuit. The performance has been tested on five-qubit circuits composed of 10, 20 and 100 CNOT gates, taking an average over 50 random circuits. Moreover, we have tested the algorithms on a structured circuit: the five-qubit QFT. The SABRE algorithm has been tested with and without the lookahead. The results have been compared with a transpiler designed for that connectivity (star transpiler) that swaps the central qubit in the chip based on the successive gate. All the routing algorithms have been tested starting from an initial trivial layout, except for the star transpiler, which has a built-in placer; this explains the better performance of this algorithm on short circuits. Fig. 5 shows that SABRE with a lookahead reaches the best performance on the star connectivity chip. The execution time in this simple case is not significant, as all algorithms run in a fraction of a second even for the longer circuits. However, other tests have shown that SABRE scales better than shortest paths, as the number of possible shortest paths increases drastically with the number of qubits in highly connected chips.
In summary, the Qibolab transpiler shows good performance in making abstract quantum circuits executable on small NISQ devices. In the future we aim at developing new efficient and scalable algorithms for the next generation of quantum chips with a high number of qubits.
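The CNOT overhead metric used above can be made concrete with a deliberately naive router for the star connectivity. This is a sketch under stated assumptions, not any of the Qibolab routing passes: the function names are hypothetical, physical qubit 2 is assumed to be the centre, the initial layout is trivial, and every SWAP is counted as three CNOTs.

```python
import random


def route_on_star(cnots, n_qubits=5, centre=2):
    """Count the SWAPs inserted by a naive router on a star-shaped chip."""
    phys_of = list(range(n_qubits))  # phys_of[logical] = physical (trivial layout)
    swaps = 0
    for a, b in cnots:
        if centre not in (phys_of[a], phys_of[b]):
            # Neither qubit sits on the centre: swap logical qubit `a` with the
            # logical qubit currently occupying the centre (one physical SWAP).
            occupant = phys_of.index(centre)
            phys_of[a], phys_of[occupant] = phys_of[occupant], phys_of[a]
            swaps += 1
        # The gate now involves the centre and can be executed directly.
    return swaps


def cnot_overhead(cnots, swaps):
    """CNOTs in the routed circuit divided by CNOTs in the original circuit."""
    return (len(cnots) + 3 * swaps) / len(cnots)


example = [(0, 1), (0, 3), (2, 4), (1, 4)]
example_swaps = route_on_star(example)
print(example_swaps, cnot_overhead(example, example_swaps))

# Average over 50 random circuits of 20 CNOTs, mimicking the benchmark setup.
rng = random.Random(1)
overheads = []
for _ in range(50):
    circuit = [tuple(rng.sample(range(5), 2)) for _ in range(20)]
    overheads.append(cnot_overhead(circuit, route_on_star(circuit)))
mean_overhead = sum(overheads) / len(overheads)
print(mean_overhead)
```

In this toy router the overhead is bounded by 4, reached when every gate needs a SWAP; routers with a lookahead reduce it by choosing which qubit to move based on the gates that follow.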

Cross-platform benchmark
In this section, we present the results of a speed benchmark conducted using Qibolab. The benchmark involved various experiments deployed on the different control devices currently supported by the drivers implemented in Qibolab.
By utilizing Qibolab, assessing the performance and efficiency of each control device becomes a straightforward process, since all devices are exposed through the same interface. The results obtained not only offer valuable insights into the speed of the different instruments, providing data that can help researchers and developers make informed decisions, but also demonstrate the comprehensive support these devices receive within Qibolab.
The experiments chosen for this benchmark represent the minimal set of routines required for the calibration of a single qubit. They also offer a view of the different execution modes supported by Qibolab: in particular, the Single shot classification experiment executes fixed pulse sequences, while the Spectroscopies perform different sweeps over pulse parameters.
In Fig. 6 we present a comparison of execution times for different qubit calibration routines executed using different electronics. Additional details for each routine are provided in Appendix 6.2. The black bar in this plot provides the ideal time required for each routine, which, in most cases, is calculated as
$$T_{\rm ideal} = n_{\rm shots} \sum_i \left( T_{{\rm sequence},i} + T_{\rm relaxation} \right),$$
where $T_{{\rm sequence},i}$ is the duration of the whole pulse sequence in the i-th point of the sweep, $T_{\rm relaxation}$ the time we wait for the qubit to relax to its ground state between experiments, $n_{\rm shots}$ the number of shots in each experiment and the sum runs over all points in the sweep. The ideal time denotes how long the qubit is really used during an experiment and provides the baseline for our benchmark. Real executions, shown with a different color for each instrument setup, are longer than ideal, due to overhead coming from compilation and communication with the instruments. After profiling the code, we observe that the overhead coming from the Qibolab backend, $T_{\rm qibo}$, is negligible compared to that of the control instruments, $T_{\rm inst}$. Therefore, we can approximate the real execution time as
$$T_{\rm real} \approx T_{\rm ideal} + T_{\rm inst}.$$
A decisive factor for the performance of routines that involve sweeps is whether the sweeps run in real time on the processors embedded in the control electronics or on the host computer. The latter approach requires a greater number of communication steps between control electronics and host, and typically the programs need to be recompiled multiple times, resulting in significant overhead. This can be seen in the Ramsey detuned and standard Randomized Benchmarking (RB) experiments, for which real-time sweepers have not been implemented yet, resulting in a significant overhead over the ideal time. Randomized Benchmarking experiments, unlike the rest of the routines used here, involve playing multiple random sequences instead of sweeping parameters, and their performance is expected to increase when sequence unrolling is implemented.
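The ideal-time baseline can be evaluated with a few lines of code. The sketch below assumes, following the definitions given above, that the ideal time is the number of shots multiplied by the sum over sweep points of sequence duration plus relaxation time. The 2 µs sequence duration and the 50 ms overhead per upload are purely hypothetical numbers, and the second function is only a toy model of the instrument overhead, not a measured quantity.

```python
def ideal_time(sequence_durations, relaxation_time, nshots):
    """Time the qubit is actually used: shots x sum over sweep points."""
    return nshots * sum(d + relaxation_time for d in sequence_durations)


def estimated_real_time(t_ideal, n_uploads, overhead_per_upload):
    """Toy model: each upload to the instrument adds a fixed overhead."""
    return t_ideal + n_uploads * overhead_per_upload


# 20-point spectroscopy with 4096 shots and 5 us relaxation time, as in the
# benchmark; the 2 us sequence duration is an assumed value.
durations = [2e-6] * 20
t_ideal = ideal_time(durations, relaxation_time=5e-6, nshots=4096)

# Real-time sweep: the whole sweep is uploaded once.
t_real_time_sweep = estimated_real_time(t_ideal, 1, 50e-3)
# Host-side sweep: one upload (and compilation) per sweep point.
t_host_sweep = estimated_real_time(t_ideal, len(durations), 50e-3)

print(t_ideal, t_real_time_sweep, t_host_sweep)
```

Even with these made-up numbers, the host-side sweep is dominated by the per-point overhead rather than by the time spent on the qubit, which is the qualitative behaviour discussed in the text.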
The second point affecting performance is the communication with the host computer. This usually involves two steps: the actual communication via network (ethernet) and a compilation step happening on the instrument side. We observe that RFSoC boards controlled using Qibosoq have an advantage in this respect, particularly in the Ramsey detuned and Single shot classification experiments, where real-time sweepers are not used. This advantage may be due to the simplicity of our RFSoC configuration, which consists of a single board, in contrast to the other systems, which are part of clusters with more controllers. More investigation is needed to confirm this point. As expected, the rest of the electronics behave similarly in all performed benchmarks.
In Fig. 7 we demonstrate how the execution time of different sweeps scales with the number of points used in the sweep. As above, we see that RFSoC is faster for short sweeps, due to smaller communication and compilation overhead; however, the difference diminishes beyond 100 points. Other instruments show similar behavior in most cases. Qick does not support real-time sweeping of readout frequency and pulse length, therefore these sweepers are slower when compared to other instruments for more than 100 points. Real-time sweepers are used in all cases presented in this plot, except in Circuits. We are currently implementing sequence unrolling methods, which will allow executing batches of circuits, reducing the communication overhead and thus improving runtime.
The code used for all benchmarks presented in this section is provided in a public repository [124].

Standard randomized benchmarking
A commonly used technique for assessing the accuracy of single-qubit gate implementations is standard randomized benchmarking (RB) [81][82][83][84][85] with the Clifford group (see, e.g., Ref. [80] for a review). The RB protocol performs random sequences of Clifford unitaries of different lengths on a single qubit. Every sequence is concluded with the unitary gate that restores the initial state before measuring the qubit. In the absence of imperfections, the measurement is thus expected to be classified as 0 (the initial state) with probability 1, independent of the sequence or its length. Single-qubit gate fidelities are defined as functions of the decay parameter of this average survival probability with the sequence length.
RB allows us to holistically test the entire software stack together with the quantum hardware. We define the RB protocol with the Circuit API of Qibo, using U3, RX, RY and RZ gates. Executing these circuits with the Qibolab backend involves transpilation to native gates and compilation to PulseSequence objects that are then executed by a Platform. An example of an RB experiment on a 5-qubit IQM chip controlled using Qibolab's Zurich Instruments drivers is depicted in Fig. 8.
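The link between the decay of the survival probability and the gate error can be sketched as follows. The example assumes the standard single-exponential model P(m) = A p^m + B with a fixed offset B = 0.5 (no state preparation and measurement errors), uses noiseless synthetic data with a hypothetical decay parameter p = 0.99, and converts p to an average error per Clifford with r = (1 - p)(d - 1)/d for d = 2. It is not the fitting routine used for Fig. 8.

```python
import math


def fit_decay(lengths, survival, offset=0.5):
    """Least-squares fit of log(P - offset) = log(A) + m log(p); returns (A, p)."""
    ys = [math.log(s - offset) for s in survival]
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ys)) / sum(
        (x - mean_x) ** 2 for x in lengths
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


lengths = [1, 5, 10, 20, 50, 100]
p_true = 0.99  # hypothetical decay parameter
survival = [0.5 * p_true**m + 0.5 for m in lengths]  # synthetic, noiseless data

amplitude, p_fit = fit_decay(lengths, survival)
error_per_clifford = (1 - p_fit) * (2 - 1) / 2  # d = 2 for a single qubit

print(amplitude, p_fit, error_per_clifford)
```

With real data the offset is usually left as a free parameter and the fit is nonlinear; the log-linear form is used here only to keep the example dependency-free.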

CHSH Experiment
A quantum software solution should allow for the development and deployment of quantum experiments at different levels of abstraction and complexity. In order to showcase this, we prepare an experiment to measure the CHSH inequality [125] between two qubits by building the circuit using three distinct methods allowed by Qibo and Qibolab. Namely, one can build experiments by directly accessing the arbitrary pulse sequences of Qibolab, use the native gate interactions through Qibo, or use logical gate operations and rely on the transpiler for the decomposition. In addition, we incorporate a layer of Readout Error Mitigation [42] that is executed on hardware before the experiment takes place. This type of control is only possible with a framework aware of all layers of abstraction, such as Qibo.
The CHSH inequality was originally conceived in order to disprove a local hidden-variable description of quantum mechanics and used to prove Bell's theorem [126]. The protocol consists of preparing a maximally entangled two-qubit state and performing a simultaneous measurement on both qubits, with two possible measurement settings. Crucially, a qubit can be measured in two perpendicular bases (e.g. X, Z), and the measurement settings of both qubits have a relative angle θ. Then, the combination of the resulting expectation values should satisfy S ≤ 2 if a local hidden-variable theory of quantum mechanics holds, but can reach up to 2√2 otherwise. In this context, the CHSH experiment is used as validation of both the specifications of the machine and the control electronics. Far from disproving local realism, we use this procedure to verify that the control of our chip is precise enough to violate the classical bound. We show in Fig. 9 the results of a CHSH experiment on two connected qubits of a 5-qubit QuantWare chip controlled using Qibolab's Qblox drivers for different angles of the measurement setting. The bare CHSH values barely cross the classical bound. However, when readout error mitigation is applied, the values confidently cross above 2. We infer from this figure that the most destructive source of error was in the readout of the measurement pulse, rather than in the control of the two-qubit gate pulse.
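The ideal, noise-free behaviour of the CHSH value as a function of the relative angle θ can be reproduced with a short statevector calculation. The sketch assumes the singlet state (|01⟩ − |10⟩)/√2, measurement axes in the X-Z plane, and one common sign convention for combining the four correlators; the convention used for the experiment may differ, and the function names are hypothetical.

```python
import math

# (|01> - |10>)/sqrt(2), with the first qubit as the most significant bit.
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]


def observable(angle):
    """cos(angle) Z + sin(angle) X: a measurement axis in the X-Z plane."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [s, -c]]


def correlator(angle_a, angle_b, state=SINGLET):
    """Expectation value <state| A (x) B |state> for real amplitudes."""
    a, b = observable(angle_a), observable(angle_b)
    total = 0.0
    for row in range(4):
        for col in range(4):
            element = a[row >> 1][col >> 1] * b[row & 1][col & 1]
            total += state[row] * element * state[col]
    return total


def chsh(theta):
    """S = E(a0,b0) - E(a0,b1) + E(a1,b0) + E(a1,b1) for relative angle theta."""
    a0, a1 = 0.0, math.pi / 2
    b0, b1 = theta, theta + math.pi / 2
    return (
        correlator(a0, b0)
        - correlator(a0, b1)
        + correlator(a1, b0)
        + correlator(a1, b1)
    )


for k in range(9):
    theta = k * math.pi / 8
    print(f"theta = {theta:.3f}  |S| = {abs(chsh(theta)):.3f}")
```

For the singlet state each correlator equals −cos(a − b), so |S| sits exactly at the classical bound for θ = 0 and reaches 2√2 at θ = π/4.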

Full-stack quantum machine learning
Developing QML algorithms [127][128][129] is particularly challenging in the NISQ era [53]. Noise and long execution times are two of the most limiting problems when deploying a Variational Quantum Algorithm [130,131] on a real quantum device. In this context, it is relevant to study how the different levels of computation, from the high-level coding of the algorithm to the low-level deployment on the real qubits, impact the results obtained on simple regression or classification tasks. For this reason, Qibo has become a well-suited environment to study both hybrid [48, 132, 133, 134] and full-stack [135] QML algorithms.
We define a QML model by building a Variational Quantum Circuit (VQC), in whose rotational gates we encode the u-quark Parton Distribution Function (PDF) data taken from the NNPDF4.0 [136] PDF grid. In particular, we consider as input data N_data = 50 values of the momentum fraction x, sampled logarithmically from the range [0, 1]. We use the model presented in [132], in which the embedding of the x values is implemented following a data reuploading ansatz [137].
The optimization strategy is then implemented by minimizing a target Mean-Squared Error loss function with respect to the model's parameters. We select a hardware-compatible Adam [138] optimizer, in which we calculate the derivatives of the circuit using the Parameter Shift Rule [129,135,139]. Once the optimized parameter vector θ_best is obtained, we inject it into the circuit and repeat the predictions N_runs = 50 times. With the mean and the standard deviation σ of these evaluations we calculate our final estimates and their errors. Finally, we quantify the accuracy of the model by computing the following test statistic:
$$\mathrm{MSE} = \frac{1}{N_{\rm data}} \sum_{j=1}^{N_{\rm data}} \left( y_{j,{\rm est}} - y_{j,{\rm target}} \right)^2, \qquad (4)$$
where y_{j,target} is the target PDF value provided by NNPDF4.0 and y_{j,est} is the mean of the N_runs predicted values for a fixed data point x_j.
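The Parameter Shift Rule mentioned above can be checked on the smallest possible example. The sketch assumes a single RY(θ) rotation applied to |0⟩ and a Z measurement, for which the expectation value is cos θ; it evaluates the expectation exactly instead of estimating it from shots, and it uses the usual mean-squared-error definition. Function names are hypothetical and the model is far simpler than the data reuploading circuit used in this work.

```python
import math


def expectation_z(theta):
    """<Z> after RY(theta) on |0>, from the statevector amplitudes."""
    amp0, amp1 = math.cos(theta / 2), math.sin(theta / 2)
    return amp0**2 - amp1**2


def parameter_shift(f, theta, shift=math.pi / 2):
    """Exact derivative of f for a rotation gate generated by a Pauli operator."""
    return (f(theta + shift) - f(theta - shift)) / 2


def mse(estimates, targets):
    """Mean squared difference between estimates and targets."""
    return sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(targets)


theta = 0.3
gradient = parameter_shift(expectation_z, theta)
print(gradient, -math.sin(theta))  # the two values agree

print(mse([1.0, 2.0], [1.0, 4.0]))
```

Unlike a finite-difference estimate, the shifted evaluations are two ordinary circuit executions, which is what makes the rule compatible with hardware.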
We perform an initial training in exact simulation using Qibo, followed by N_epochs = 60 stochastic Adam descent iterations on a single superconducting qubit controlled by RFSoC via Qibosoq [98]. After completing the training, the parameters corresponding to the epoch in which we recorded the lowest loss function value are chosen as the final ones. Each prediction during the gradient descent is obtained by executing the circuit N_nshots = 500 times, and we set the learning rate to η = 0.1. Adam's parameters are set to β_1 = 0.85, β_2 = 0.99 and ε = 10^{-8}.
In Fig. 10 we show the results obtained after repeating the PDF predictions N_runs = 50 times for each data point: the solid orange line is drawn using the means of the predictions {y_{j,est}}_{j=1}^{N_data}, while the two confidence intervals are obtained from one and two standard deviations from the means. The test statistic value presented in Eq. (4) and calculated with our predictions is MSE = 0.0021. These results show that the entire ecosystem can be used to successfully fit the target function even without any error mitigation technique.
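The training loop described above can be sketched on a toy problem. The example reuses the hyperparameters quoted in the text (60 epochs, η = 0.1, β_1 = 0.85, β_2 = 0.99, ε = 10^{-8}) and keeps the parameters of the epoch with the lowest loss. Everything else is an assumption made for illustration: the model is a single RY(θ) rotation with ⟨Z⟩ = cos θ, the target value 0.2 and the starting point are arbitrary, and expectation values are exact rather than estimated from 500 shots.

```python
import math


def expectation_z(theta):
    """<Z> after RY(theta) on |0>."""
    return math.cos(theta)


def loss(theta, target):
    return (expectation_z(theta) - target) ** 2


def loss_gradient(theta, target):
    """Chain rule, with the circuit derivative from the parameter shift rule."""
    shifted = expectation_z(theta + math.pi / 2) - expectation_z(theta - math.pi / 2)
    return 2 * (expectation_z(theta) - target) * shifted / 2


def adam(theta, target, epochs=60, lr=0.1, beta1=0.85, beta2=0.99, eps=1e-8):
    """Adam descent that returns the parameters of the lowest-loss epoch."""
    m = v = 0.0
    best_theta, best_loss = theta, loss(theta, target)
    for t in range(1, epochs + 1):
        g = loss_gradient(theta, target)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1**t)            # bias corrections
        v_hat = v / (1 - beta2**t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
        current = loss(theta, target)
        if current < best_loss:
            best_theta, best_loss = theta, current
    return best_theta, best_loss


initial_theta, target = 0.1, 0.2  # arbitrary starting point and target value
initial_loss = loss(initial_theta, target)
best_theta, best_loss = adam(initial_theta, target)
print(initial_loss, best_theta, best_loss)
```

Selecting the best epoch rather than the last one matters more on hardware, where shot noise makes the loss fluctuate from one iteration to the next.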

Outlook
In this paper we extend the Qibo quantum computing middleware framework by introducing Qibolab, an open-source software library for quantum hardware control. Qibo is designed as a full-stack software framework which provides primitives to define circuit-based quantum algorithms through custom backends, i.e. dedicated plugin software libraries which deploy algorithms on specific hardware. The release of Qibolab unlocks Qibo's potential to execute quantum algorithms on hardware platforms and therefore gives research institutions and laboratories the possibility to operate self-hosted quantum hardware platforms easily.
We have described the current status of the project structure with the major features implemented in release 0.1.0. The software abstractions, supported drivers and transpiler are at a stage that allows applications related to cross-platform control instrument performance benchmarks through arbitrary pulse control, and physics experiments based on the quantum circuit representation.
Furthermore, we have successfully demonstrated three practical cases in which Qibolab could be useful for quantum technology research: randomized benchmarking, validation algorithms for qubit entanglement (CHSH experiment) and quantum machine learning applications. Therefore, circuit-based models available in Qibo can be deployed seamlessly on quantum hardware through Qibolab.
In future releases of Qibolab, we plan to extend its capabilities by interfacing new drivers from more commercial and open-source control system vendors. Thanks to the design of the library, we have the possibility to adapt and scale the API for new electronics, including large-scale systems for real-time acquisition and error correction. In this paper we have focused on superconducting chips due to their availability in the labs of our affiliated institutions; however, we plan to extend Qibolab to other quantum technologies such as trapped ions, neutral atoms and photonics, among others. In fact, there are multiple software similarities among these technologies, e.g. for trapped ions we can already define in Qibolab a custom platform which allocates the relevant pulse sequence to modulate optical lasers with its native gates representation for unitary gate preparation. We plan to have access to this and other quantum hardware technologies in the coming years through research collaborations and to extend Qibolab accordingly. Finally, we believe that with the inclusion of Qibolab, Qibo has grown into a powerful tool for the quantum computing community, reducing the software development effort for researchers in simulation, hardware calibration and operation.

Cross-platform benchmark
In this section we provide some more details on the experiments performed for the performance benchmark presented in Sec. 4.1. A more detailed description of these routines is given in [66,67,77]. All these experiments were repeated for 4096 shots. For spectroscopies, a relaxation time of 5 µs was used, while for the other experiments it was set at 300 µs. Relaxation time is the waiting time between consecutive shots to let the qubit relax back to the ground state before the next shot is started.
Resonator spectroscopy: consists of a single-tone spectroscopy where a pulse is sent through the readout line and acquired through the feedback line. The frequency of the pulse is swept in a specific range, in our case probing 20 or 100 different frequencies. In the calibration of a 3D (2D) resonator, the amplitudes acquired present a positive (negative) peak at the resonance frequency of the resonator.
Qubit spectroscopy: consists of a two-tone spectroscopy where a first pulse is sent to the drive line and a measurement (a readout pulse and an acquisition) is performed right after. The frequency of the drive pulse is swept in a specific range. In the example used for the benchmark, 300 frequencies were analyzed. As in the resonator spectroscopy, the amplitude acquired presents a peak at a specific frequency that, in this case, will be used as the drive pulse frequency.
Rabi amplitude: first, a drive pulse at the frequency identified with qubit spectroscopy is sent through the drive line, and a measurement is performed right after. The amplitude of the first pulse is swept in a range composed of, in this case, 75 points. This experiment is used to calibrate the amplitude of the pi-pulse (Pauli-X gate), which rotates the qubit from the |0⟩ state to |1⟩.
Ramsey detuned: a first pulse is sent through the drive line. Then, after a delay, a new drive pulse is sent with a delay-dependent phase and finally a measurement is performed. The delay between the two drive pulses, and therefore the phase, is swept. This experiment is used to fine-tune the drive pulse frequency.
T1 experiment: the qubit is excited using a calibrated pi-pulse, then measured after a variable time. The characteristic decay shown by this experiment is used to measure the relaxation time T1 of the qubit.
T2 experiment: this experiment is almost identical to the Ramsey detuned experiment, but no additional phase is introduced in the second drive pulse. This allows computing the characteristic dephasing time T2.
Single shot classification: the qubit is first measured in the initial |0⟩ state, and then excited and measured in the |1⟩ state. The results are used to calibrate the classification between measured states.
Standard RB: first, a certain number (iterations) of circuits composed of Clifford gates is randomly generated. These circuits are executed and an average fidelity is computed. Then, new circuits are generated with increased depth and the procedure is repeated. The fidelity is expected to decrease exponentially with the number of gates per circuit, leading to an estimate of the average error per gate.
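As an illustration of how one of these routines turns a sweep into a calibrated parameter, the sketch below mimics the Rabi amplitude experiment on synthetic data. It assumes an idealized resonant response in which the rotation angle is proportional to the drive amplitude, a hypothetical pi-pulse amplitude of 0.4 (arbitrary units), a 75-point sweep between 0 and 0.6, and it takes the pi-pulse amplitude as the sweep point with the largest excited-state population. Real calibration routines fit the full oscillation instead.

```python
import math


def excited_population(amplitude, pi_amplitude):
    """Ideal Rabi response: full population transfer at `pi_amplitude`."""
    return math.sin(math.pi * amplitude / (2 * pi_amplitude)) ** 2


N_POINTS = 75            # number of sweep points, as in the benchmark
TRUE_PI_AMPLITUDE = 0.4  # hypothetical value, arbitrary units

amplitudes = [0.6 * k / (N_POINTS - 1) for k in range(N_POINTS)]
populations = [excited_population(a, TRUE_PI_AMPLITUDE) for a in amplitudes]

# Pick the sweep point where the qubit is most likely found in |1>.
best = max(range(N_POINTS), key=populations.__getitem__)
pi_amplitude_estimate = amplitudes[best]
step = amplitudes[1] - amplitudes[0]

print(pi_amplitude_estimate, step)
```

The estimate is limited by the sweep resolution, which is one reason why fast real-time sweeps with many points are valuable during calibration.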

Figure 1: Schematic overview of Qibo software components, including backends and tools, for release 0.2.0.

Figure 2: Schematic description of the qibojit backend features.

Figure 3: Basic setup of a self-hosted QPU. The host computer running Qibolab communicates with the different electronics used to control a QPU.

Figure 6: Execution time of different qubit calibration routines on various electronics. On the left side we show the absolute times in seconds for each experiment. The ideal time (black bar) shows the minimum time the qubit needs to be affected in each experiment. On the right side we calculate the ratio between actual execution time and ideal time. Real-time sweepers are used, if supported by the control device, in all cases except the Ramsey detuned and Standard RB experiments.

Figure 7: Scaling of execution time as a function of the number of points in a sweep. Bottom plots show the ratio between real execution on different instruments and minimum ideal time. Real-time sweepers are used in all cases, except the last Circuits plot where we use the standard RB experiment to generate a given number of random circuits to execute.

Figure 9: Results of bare (yellow) and mitigated (red) CHSH values from an experiment on two qubits of a 5-qubit QuantWare chip controlled via Qibolab's Qblox drivers. Readout error mitigation significantly enhances the results of the CHSH inequality, bringing it past the classical bound (blue line). The initial entangled state prepared for this experiment is (|01⟩ − |10⟩)/√2. The significant improvement produced by readout error mitigation hints that readout error dominates in the deterioration of the experimental results.

Figure 10: Estimates of N_data = 50 points of the u-quark PDF using the 1-qubit device controlled by the RFSoC. The target values (black line) are compared with the estimates obtained with the qubit. The solid orange line and the confidence intervals are calculated by repeating the estimations N_runs = 50 times with the trained model and then calculating means and standard deviation of the mean of the N_runs predictions. In particular, the two confidence intervals are computed using 1σ and 2σ errors.

Table 1: Outline of the supported devices, along with firmware/software version currently supported.
a See Appendix 6.1.
b RTS on the frequency of readout pulses not supported.

Table 2: Features or limitations of the main drivers supported by Qibolab 0.1.0.

The Qibolab unroller recursively applies a set of hardcoded gate decompositions in order to translate any gate into single- and two-qubit native gates. Single-qubit gates are translated into U3, RX, RZ, X and Z gates. It is possible to fuse multiple single-qubit gates acting on the same qubit into a single U3 gate.

Table 3 shows the firmware version of each Zurich Instruments device used in this work.

Table 3: Zurich FPGA internal controller software and HDL revision.