Transfer learning in hybrid classical-quantum neural networks

We extend the concept of transfer learning, widely applied in modern machine learning algorithms, to the emerging context of hybrid neural networks composed of classical and quantum elements. We propose different implementations of hybrid transfer learning, but we focus mainly on the paradigm in which a pre-trained classical network is modified and augmented by a final variational quantum circuit. This approach is particularly attractive in the current era of intermediate-scale quantum technology since it allows one to optimally pre-process high-dimensional data (e.g., images) with any state-of-the-art classical network and to embed a select set of highly informative features into a quantum processor. We present several proof-of-concept examples of the convenient application of quantum transfer learning for image recognition and quantum state classification. We use the cross-platform software library PennyLane to experimentally test a high-resolution image classifier with two different quantum computers, respectively provided by IBM and Rigetti.


I. INTRODUCTION
Transfer learning is a typical example of an artificial intelligence technique that has been originally inspired by biological intelligence. It originates from the simple observation that the knowledge acquired in a specific context can be transferred to a different area. For example, when we learn a second language we do not start from scratch, but we make use of our previous linguistic knowledge. Sometimes transfer learning is the only way to approach complex cognitive tasks, e.g., before learning quantum mechanics it is advisable to first study linear algebra. This general idea has been successfully applied also to design artificial neural networks [1][2][3]. It has been shown [4,5] that in many situations, instead of training a full network from scratch, it is more efficient to start from a pre-trained deep network and then optimize only some of the final layers for a particular task and dataset of interest (see Fig. 1).
The aim of this work is to investigate the potential of the transfer learning paradigm in the context of quantum machine learning [6][7][8]. We focus on hybrid models [9][10][11], i.e., the scenario in which quantum variational circuits [11][12][13][14][15][16] and classical neural networks can be jointly trained to accomplish hard computational tasks. In this setting, in addition to the standard classical-to-classical (CC) transfer learning strategy in which some pre-acquired knowledge is transferred between classical networks, three new variants of transfer learning naturally emerge: classical to quantum (CQ), quantum to classical (QC) and quantum to quantum (QQ).
In the current era of Noisy Intermediate-Scale Quantum (NISQ) devices [17], CQ transfer learning is particularly appealing since it opens the possibility to classically pre-process large input samples (e.g., high-resolution images) with any state-of-the-art deep neural network and to subsequently manipulate few but highly informative features with a variational quantum circuit. This scheme is quite convenient since it makes use of the power of quantum computers, combined with the successful and well-tested methods of classical machine learning. On the other hand, QC and QQ transfer learning might also be very interesting approaches, especially once large quantum computers become available. In this case, fixed quantum circuits might be pre-trained as generic quantum feature extractors, mimicking well-known classical models which are often used as pre-trained blocks: e.g., AlexNet [18], ResNet [19], Inception [20], VGGNet [21], etc. (for image processing), or ULMFiT [22], Transformer [23], BERT [24], etc. (for natural language processing). In summary, such classical state-of-the-art deep networks can either be used in CC and CQ transfer learning or replaced by quantum circuits in the QC and QQ variants of the same technique.
Up to now, the transfer learning approach has been largely unexplored in the quantum domain with the exception of a few interesting applications, for example, in modeling many-body quantum systems [25][26][27], in the connection of a classical autoencoder to a quantum Boltzmann machine [28] and in the initialization of variational quantum networks [29]. With the present work we aim at developing a more general and systematic theory, specifically tailored to the emerging paradigms of variational quantum circuits and hybrid neural networks.
For all the models theoretically proposed in this work, proof-of-principle examples of practical implementations are presented and numerically simulated. Moreover, we also experimentally tested one of our models on physical quantum processors (ibmqx4 by IBM and Aspen-4-4Q-A by Rigetti), demonstrating for the first time the successful classification of high-resolution images with a quantum computer.

II. HYBRID CLASSICAL-QUANTUM NETWORKS
Before presenting the main ideas of this work, we begin by reviewing basic concepts of hybrid networks and introduce some notation.

A. Classical neural networks
A very successful model in classical machine learning is that of deep feed-forward neural networks [30]. The elementary block of a deep network is called a layer and maps input vectors of $n_0$ real elements to output vectors of $n_1$ real elements. Its typical structure consists of an affine operation followed by a non-linear function $\phi$ applied element-wise,

$$L_{n_0 \to n_1}: \mathbf{x} \to \mathbf{y} = \phi(W\mathbf{x} + \mathbf{b}). \tag{1}$$

Here, the subscript $n_0 \to n_1$ indicates the number of input and output variables, $\mathbf{x}$ and $\mathbf{y}$ are the input and output vectors, $W$ is an $n_1 \times n_0$ matrix and $\mathbf{b}$ is a constant vector of $n_1$ elements. The elements of $W$ and $\mathbf{b}$ are arbitrary real parameters (respectively known as weights and biases) which are supposed to be trained, i.e., optimized for a particular task. The nonlinear function $\phi$ is quite arbitrary but common choices are the hyperbolic tangent or the rectified linear unit defined as ReLU(x) = max(0, x). A classical deep neural network is the concatenation of many layers, in which the output of the first is the input of the second and so on:

$$C = L_{n_{d-1} \to n_d} \circ \dots \circ L_{n_1 \to n_2} \circ L_{n_0 \to n_1}, \tag{2}$$

where different layers have different weights. Characteristic hyper-parameters of a deep network are its depth $d$ (number of layers) and the number of features (number of variables) for each layer, i.e., the sequence of integers $n_0, n_1, \dots, n_{d-1}$.
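The layer and network structure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`layer`, `compose`) and all numerical weights are hypothetical and chosen so that the result can be checked by hand.

```python
import math

def layer(W, b, phi):
    """Return the map x -> phi(W x + b), with phi applied element-wise (cf. Eq. (1))."""
    def apply(x):
        return [phi(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
                for row, b_i in zip(W, b)]
    return apply

def compose(*layers):
    """Concatenate layers; the first argument is applied first (cf. Eq. (2))."""
    def network(x):
        for f in layers:
            x = f(x)
        return x
    return network

def relu(z):
    return max(0.0, z)

# Hypothetical weights for a network of depth d = 2 with n_0 = 2, n_1 = 3, n_2 = 1.
L_2_to_3 = layer(W=[[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]],
                 b=[0.0, -1.0, 0.5], phi=relu)
L_3_to_1 = layer(W=[[1.0, 1.0, -1.0]], b=[0.1], phi=math.tanh)
C = compose(L_2_to_3, L_3_to_1)

# Hidden features are [1.0, 0.5, 0.5]; the output is tanh(1.0 + 0.5 - 0.5 + 0.1).
y = C([2.0, 1.0])
```

Training would amount to adjusting the entries of `W` and `b` in each layer; the sketch only shows the forward map.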

B. Variational quantum circuits
One of the possible quantum generalizations of feed-forward neural networks can be given in terms of variational quantum circuits [9][10][11][12][13][14][15][16]. Following the analogy with the classical case, one can define a quantum layer as a unitary operation which can be physically realized by a low-depth variational circuit acting on the input state $|x\rangle$ of $n_q$ quantum subsystems (e.g., qubits or continuous variable modes) and producing the output state $|y\rangle$:

$$L: |x\rangle \to |y\rangle = U(\mathbf{w})|x\rangle, \tag{3}$$

where $\mathbf{w}$ is an array of classical variational parameters. Examples of quantum layers could be: a sequence of single-qubit rotations followed by a fixed sequence of entangling gates [13,15] or, for the case of optical modes, some active and passive Gaussian operations followed by single-mode non-Gaussian gates [16]. Notice that, differently from a classical layer, a quantum layer preserves the Hilbert-space dimension of the input states. This fact is due to the fundamental unitary nature of quantum mechanics and, as discussed at the end of this section, should be taken into account when designing quantum networks. A variational quantum circuit of depth $q$ is a concatenation of many quantum layers, corresponding to the product of many unitaries parametrized by different weights:

$$Q = L_q \circ \dots \circ L_2 \circ L_1. \tag{4}$$

In order to inject classical data in a quantum network we need to embed a real vector $\mathbf{x}$ into a quantum state $|x\rangle$. This can also be done by a variational embedding layer depending on $\mathbf{x}$ and applied to some reference state (e.g., the vacuum or ground state),

$$\mathcal{E}: \mathbf{x} \to |x\rangle = E(\mathbf{x})|0\rangle. \tag{5}$$

Typical examples are single-qubit rotations or single-mode displacements parametrized by $\mathbf{x}$. Notice that, differently from $L$, the embedding layer $\mathcal{E}$ is a map from a classical vector space to a quantum Hilbert space. Conversely, the extraction of a classical output vector $\mathbf{y}$ from the quantum circuit can be obtained by measuring the expectation values of $n_q$ local observables $\hat{\mathbf{y}} = [\hat{y}_1, \hat{y}_2, \dots, \hat{y}_{n_q}]$. We can define this process as a measurement layer, mapping a quantum state to a classical vector:

$$\mathcal{M}: |x\rangle \to \mathbf{y} = \langle x|\hat{\mathbf{y}}|x\rangle. \tag{6}$$

Globally, the full quantum network including the initial embedding layer and the final measurement can be written as

$$\mathcal{Q} = \mathcal{M} \circ Q \circ \mathcal{E}. \tag{7}$$

The full network is a map from a classical vector space to a classical vector space depending on classical weights.
Therefore, even though it may contain a quantum computation hidden in the quantum circuit, if considered from a global point of view, $\mathcal{Q}$ is simply a black box analogous to the classical deep network defined in Eq. (2). However, especially when dealing with real NISQ devices, there are technical limitations and physical constraints which should be taken into account: while in the classical feed-forward network of Eq. (2) we have complete freedom in the choice of the number of features for each layer, in the quantum network of Eq. (7) all these numbers are often linked to the size of the physical system. For example, even if not strictly necessary, typical variational embedding layers encode each classical element of $\mathbf{x}$ into a single subsystem and so, in many practical situations, one has:

$$\#\,\text{inputs} = \#\,\text{subsystems} = \#\,\text{outputs}. \tag{8}$$
This common constraint of a variational quantum network could be overcome by:
1. adding ancillary subsystems and discarding/measuring some of them in the middle of the circuit;
2. engineering more complex embedding and measuring layers;
3. adding pre-processing and post-processing classical layers.
In this work, mainly because of its technical simplicity, we choose the third option and we formalize it through the notion of dressed quantum circuits introduced in the next subsection.
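As a worked illustration of the bare network of Eq. (7), consider the simplest possible case of a single qubit. The gate choices below are assumptions made only for this illustration: the embedding and the single variational layer are both rotations about the y axis, with the convention $R_y(\theta) = e^{-i\theta Y/2}$, and the measured observable is the Pauli $Z$ operator.

```latex
\begin{align}
  E(x)\,|0\rangle &= R_y(x)\,|0\rangle
    = \cos\tfrac{x}{2}\,|0\rangle + \sin\tfrac{x}{2}\,|1\rangle, \\
  U(w)\,E(x)\,|0\rangle &= R_y(w)\,R_y(x)\,|0\rangle = R_y(x+w)\,|0\rangle, \\
  y &= \langle 0|\,R_y(x+w)^{\dagger}\, Z\, R_y(x+w)\,|0\rangle
     = \cos^2\tfrac{x+w}{2} - \sin^2\tfrac{x+w}{2}
     = \cos(x+w).
\end{align}
```

In this toy case the whole network reduces to the classical function $x \to \cos(x + w)$ with one real input, one subsystem, one real output and one trainable weight, which makes both the black-box character of the network and the input/subsystem/output constraint explicit.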

C. Dressed quantum circuits
In order to apply transfer learning at the classical-quantum interface, we need to connect classical neural networks to quantum variational circuits. Since in general the size of the classical and quantum networks can be very different, it is convenient to use a more flexible model of quantum circuits.
Let us consider the variational circuit defined in Eq. (7) and based on $n_q$ subsystems. With the aim of adding some basic pre-processing and post-processing of the input and output data we place a classical layer at the beginning and at the end of the quantum network, obtaining what we might call a dressed quantum circuit:

$$\tilde{\mathcal{Q}} = L_{n_q \to n_{\mathrm{out}}} \circ \mathcal{Q} \circ L_{n_{\mathrm{in}} \to n_q}, \tag{9}$$

where $L_{n \to n'}$ is given in Eq. (1) and $\mathcal{Q}$ is the associated bare quantum circuit defined in Eq. (7). Differently from a complex hybrid network in which the computation is shared between cooperating classical and quantum processors, in this case the main computation is performed by the quantum circuit $\mathcal{Q}$, while the classical layers are mainly responsible for the data embedding and readout. A similar hybrid model was studied in [31], but applied to a generative quantum Helmholtz machine.
We can say that from a hardware point of view a dressed quantum circuit is almost equivalent to a bare one. On the other hand, it has two important advantages:
1. the two classical layers can be trained to optimally perform the embedding of the input data and the post-processing of the measurement results;
2. the numbers of input and output variables are independent of the number of subsystems, allowing for flexible connections to other classical or quantum networks.
Even if our main motivation for introducing the notion of dressed quantum circuits is a smoother implementation of transfer learning schemes, this is also a quite powerful machine learning model in itself and constitutes a non-trivial contribution of this work. In the Examples section, a dressed quantum circuit is successfully applied to the classification of a non-linear benchmark dataset (2D spirals).
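The shape bookkeeping of a dressed quantum circuit can be sketched as follows. This is a minimal illustration, not the model used in the examples: the bare circuit is replaced by a hypothetical, non-entangling stand-in (`toy_bare_circuit`) whose expectation values have the closed form cos(x_k + sum of the weights acting on qubit k), and all weights are random placeholders. The function names and the sizes 512, 4 and 2 in the usage line are assumptions chosen for illustration.

```python
import math
import random

def classical_layer(n_in, n_out, phi, rng):
    """Randomly initialised classical layer x -> phi(W x + b), as in Eq. (1)."""
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(-1, 1) for _ in range(n_out)]
    return lambda x: [phi(sum(w * xi for w, xi in zip(row, x)) + bi)
                      for row, bi in zip(W, b)]

def toy_bare_circuit(weights):
    """Stand-in for a bare quantum network (an assumption, not a real device).

    Qubit k is embedded with R_y(x_k), rotated by R_y(w) once per layer and
    measured in Z. Without entangling gates this gives cos(x_k + sum_l w_lk).
    """
    n_q = len(weights[0])
    def circuit(x):
        assert len(x) == n_q  # number of inputs = subsystems = outputs
        return [math.cos(x[k] + sum(layer[k] for layer in weights))
                for k in range(n_q)]
    return circuit

def dressed_circuit(n_in, n_q, n_out, depth, seed=0):
    """Classical layer -> bare circuit -> classical layer."""
    rng = random.Random(seed)
    pre = classical_layer(n_in, n_q, math.tanh, rng)
    weights = [[rng.uniform(-math.pi, math.pi) for _ in range(n_q)]
               for _ in range(depth)]
    bare = toy_bare_circuit(weights)
    post = classical_layer(n_q, n_out, lambda z: z, rng)  # linear readout
    return lambda x: post(bare(pre(x)))

# 512 input features, 4 qubits, 2 outputs: the sizes are decoupled from n_q.
model = dressed_circuit(n_in=512, n_q=4, n_out=2, depth=6)
y = model([0.01 * i for i in range(512)])
```

The point of the sketch is only that the input and output dimensions of the dressed model are set by the classical layers, while the quantum part always sees exactly `n_q` values.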

III. TRANSFER LEARNING
In this section we discuss the main topic of this work, i.e., the idea of transferring some pre-acquired "knowledge" between two networks, say from network A to network B, where each of them could be either classical or quantum.
As discussed in the previous section, if considered as a black box, the global structure of a quantum variational circuit is similar to that of a classical network (see Eqs. (7), (9) and (2)). For this reason, we are going to define the transfer learning scheme in terms of two generic networks A and B, independently from their classical or quantum physical nature.
Generic transfer learning scheme (see Fig. 1):
1. Take a network A that has been pre-trained on a dataset $D_A$ and for a given task $T_A$.
2. Remove some of the final layers. In this way, the resulting truncated network A' can be used as a feature extractor.
3. Connect a new trainable network B at the end of the pre-trained network A'.
4. Keep the weights of A' constant, and train the final block B with a new dataset $D_B$ and/or for a new task $T_B$ of interest.

Following the common convention used in classical machine learning [1][2][3][4][5], all situations in which there is a change of dataset $D_B \neq D_A$ and/or a change of the final task $T_B \neq T_A$ can be identified as transfer learning methods. The general intuition behind this training approach is that, even if A has been optimized for a specific problem, it can still act as a convenient feature extractor also for a different problem. This trick is improved by truncating the final layers of A (step 2), since the final activations of a network are usually more tuned to the specific problem, while intermediate features are more generic and so more suitable for transfer learning. In our hybrid setting, the fact that the networks A and B can be either classical or quantum gives rise to a rich variety of hybrid transfer learning models summarized in Table I. For the reader familiar with quantum communication theory, this kind of classification might look similar to that of hybrid channels in which information can be exchanged between quantum and classical systems.
Here however there is a fundamental difference: what is actually transferred in this case is not raw information but some more structured and organized learned representations. We expect that the problem of transferring structured knowledge between systems governed by different physical laws (classical/quantum) could stimulate many interesting foundational and philosophical questions. The aim of the present work is however much more pragmatic and consists of studying practical applications of this idea.
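The generic scheme, a frozen feature extractor followed by a trainable final block, can be sketched in a purely classical toy setting. Everything specific here is an assumption made for illustration: the weights of the extractor are random placeholders standing in for a pre-trained network, the new block B is a single logistic unit, the dataset is synthetic, and the names `A_prime`, `B` and `D_B` are hypothetical.

```python
import math
import random

rng = random.Random(0)

# A "pre-trained" network already truncated to a feature extractor A'.
# The weights are random placeholders; in practice they come from training on D_A.
A_weights = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(8)]

def A_prime(x):
    """Frozen feature extractor: 2 inputs -> 8 features."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in A_weights]

# New trainable block B: a single logistic unit acting on the features.
B = {"w": [0.0] * 8, "b": 0.0}

def predict(x):
    f = A_prime(x)
    z = sum(w * fi for w, fi in zip(B["w"], f)) + B["b"]
    return 1.0 / (1.0 + math.exp(-z)), f

# Synthetic new dataset D_B: the label is 1 when the first coordinate is larger.
D_B = [(x, 1.0 if x[0] > x[1] else 0.0)
       for x in ([rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200))]

def mean_loss():
    total = 0.0
    for x, t in D_B:
        p, _ = predict(x)
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(D_B)

# Train only B with plain stochastic gradient descent; A_weights are never updated.
frozen_copy = [row[:] for row in A_weights]
loss_before = mean_loss()
for epoch in range(50):
    for x, t in D_B:
        p, f = predict(x)
        grad = p - t  # derivative of the cross entropy with respect to z
        B["w"] = [w - 0.1 * grad * fi for w, fi in zip(B["w"], f)]
        B["b"] -= 0.1 * grad
loss_after = mean_loss()
```

In the hybrid variants discussed in this work, either `A_prime`, the block `B`, or both would be replaced by variational quantum circuits, while the training loop keeps the same structure.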
A. Classical to quantum transfer learning
As discussed in the introduction, the CQ transfer learning approach is perhaps the most appealing one in the current technological era of NISQ devices. Indeed, today we are in a situation in which intermediate-scale quantum computers are approaching the quantum supremacy milestone [32,33] and, at the same time, we have at our disposal the very successful and well-tested tools of classical deep learning. The latter are universally recognized as the best-performing machine learning algorithms, especially for image and text processing.
In this classical field, transfer learning is already a very common approach, thanks to the large zoo of pre-trained deep networks which are publicly available [34]. CQ transfer learning consists of using exactly those classical pre-trained models as feature extractors and then post-processing such features on a quantum computer; for example by using them as input variables for the dressed quantum circuit model introduced in Eq. (9). This hybrid approach is very convenient for processing high-resolution images since, in this configuration, a quantum computer is applied only to a fairly limited number of abstract features, which is much more feasible compared to embedding millions of raw pixels in a quantum system. We note that alternative approaches for dealing with large images have also been recently proposed [28][35][36][37].
We applied our model for the task of image classification in several numerical examples and we also tested the algorithm with two real quantum computers provided by IBM and Rigetti. All the details about the technical implementation and the associated results are reported in the next Section, Examples 2 and 3.

B. Quantum to classical transfer learning
By switching the roles of the classical and quantum networks, one can also obtain the QC variant of transfer learning. In this case a pre-trained quantum system behaves as a kind of feature extractor, i.e., a device performing a (potentially classically intractable) computation resulting in an output vector of numerical values associated to the input. As a second step, a classical network is used to further process the extracted features for the specific problem of interest. This scheme can be very useful in two important situations: (i) if the dataset consists of quantum states (e.g., in a state classification problem), (ii) if we have at our disposal a very good quantum computer which outperforms current classical feature extractors at some task.
For case (i), one can imagine a situation in which a single instance of a variational quantum circuit is first pre-trained and then used as a kind of multipurpose measurement device. Indeed one could make many different experimental analyses by simply letting input quantum systems pass through the same fixed circuit and applying different classical machine learning algorithms to the associated measured variables.
For case (ii) instead, one can envisage a multi-party scenario in which many classical clients B can independently send samples of their specific datasets to a common quantum server A which is pre-trained to extract generic features by performing a fixed quantum computation. Server A can send back the resulting features to the classical clients B, which can now locally train their specific machine learning models on pre-processed data.
Given the current status of quantum technology, case (ii) is likely beyond a near-term implementation. On the other hand, case (i) could already represent a realistic scenario with current technology.
In Example 4 of the next Section, we present a proof-of-concept example in which a pre-trained quantum network introduced in Ref. [16] is combined with a classical post-processing network for solving a quantum state classification problem.

C. Quantum to quantum transfer learning
The last possibility is the QQ transfer learning scheme, where the same technique is applied in a fully quantum mechanical fashion. In this case a quantum network A is pre-trained for a generic task and dataset. Successively, some of the final quantum layers are removed, and replaced by a trainable quantum network B which will be optimized for a specific problem. The main difference from the previous cases is that, since the process is fully quantum without intermediate measurements, features are implicitly transferred in the form of a quantum state, allowing for coherent superpositions.
The main motivation for applying a QQ transfer learning scheme is to reduce the total training time: instead of training a large variational quantum circuit, it is more efficient to initialize it with some pre-trained weights and then optimize only a couple of final layers. From a physical point of view, such optimization of the final layers could be interpreted as a change of the measurement basis which is tuned to the specific problem of interest.
If compared with classical computers, current NISQ devices are not only noisy and small: they are also relatively slow. Training a quantum circuit might take a long time since it requires taking many measurement shots (i.e., performing a large number of actual quantum experiments) for each optimization step (e.g., for computing the gradient). Therefore any approach which can reduce the total training time, as for example the QQ transfer learning scheme, could be very helpful.
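A rough estimate of this cost can be written down explicitly. The sketch below assumes a parameter-shift-style gradient, in which every trainable parameter requires two expectation values per input sample; this gradient rule is an assumption, since the text does not fix a particular method, and the function name and all numbers are hypothetical.

```python
def circuit_runs_per_step(n_trainable, shots, batch_size=1, evals_per_param=2):
    """Rough count of hardware circuit executions for one gradient step.

    Assumes (not stated in the text) that each trainable parameter needs
    `evals_per_param` expectation values, each estimated from `shots`
    repetitions, for every sample in the batch.
    """
    return n_trainable * evals_per_param * shots * batch_size

# Hypothetical circuit: 4 layers with 10 parameters each, 1000 shots, batch of 8.
full = circuit_runs_per_step(n_trainable=4 * 10, shots=1000, batch_size=8)

# With a QQ transfer learning scheme only the last layer is trained.
transfer = circuit_runs_per_step(n_trainable=1 * 10, shots=1000, batch_size=8)
```

Under these assumptions the number of executions scales linearly with the number of trainable parameters, so freezing three of the four layers reduces the per-step cost by a factor of four (640000 versus 160000 runs).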
In Example 5 of the next Section, we trained a quantum state classifier by following a QQ transfer learning approach.

Example 1 - A 2D classifier based on a dressed quantum circuit
This first example demonstrates the dressed quantum circuit model introduced in Eq. (9). We consider a typical benchmark dataset consisting of two classes of points (blue and red) organized in two concentric spirals as shown in Fig. 2. Each point is characterized by two real coordinates and we assume that we have at our disposal a quantum processor of 4 qubits. Since we have two real coordinates as input and two real variables as output (one-hot encoding the blue and red classes), we use the following model of a dressed quantum circuit:

$$\tilde{\mathcal{Q}} = L_{4 \to 2} \circ \mathcal{Q} \circ L_{2 \to 4}, \tag{10}$$

where $L_{2 \to 4}$ represents a classical layer having the structure of Eq. (1) with $\phi = \tanh$, $\mathcal{Q}$ is a (bare) variational quantum circuit, and $L_{4 \to 2}$ is a linear classical layer without activation, i.e., with $\phi(y) = y$. The structure of the variational circuit is $\mathcal{Q} = \mathcal{M} \circ Q \circ \mathcal{E}$ as in Eq. (7). The chosen embedding map (Eq. (11)) prepares each qubit in a balanced superposition of $|0\rangle$ and $|1\rangle$ by means of the single-qubit Hadamard gate $H$, and then performs a rotation around the y axis of the Bloch sphere parametrized by a classical vector $\mathbf{x}$. The trainable circuit is composed of 5 variational layers, $Q = L_5 \circ L_4 \circ \dots \circ L_1$, where each layer (Eq. (12)) contains an entangling unitary operation $K$ made of three controlled NOT gates (Eq. (13)). Finally, the measurement layer is simply given by the expectation value of the $Z = \mathrm{diag}(1, -1)$ Pauli matrix, locally estimated for each qubit:

$$\mathcal{M}: |x\rangle \to \mathbf{y} = [\langle x|Z_1|x\rangle, \dots, \langle x|Z_4|x\rangle]. \tag{14}$$

Given an input point of coordinates $\mathbf{x} = (x_1, x_2)$, the classification is done according to argmax($\mathbf{y}$), where $\mathbf{y} = (y_1, y_2)$ is the output of the dressed quantum circuit (10).
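A bare 4-qubit circuit of this kind can be simulated with a small state-vector routine in plain Python. The sketch below is an illustration under explicit assumptions and should not be read as the exact circuit of this example: the scaling of the embedding angle (x_k times pi/2), the use of R_y rotations as trainable gates, and the arrangement of the three controlled NOT gates on the pairs (0,1), (2,3), (1,2) are all choices made here for concreteness. Only real amplitudes are needed because the Hadamard, R_y and CNOT gates are real matrices.

```python
import math

N_QUBITS = 4

def apply_1q(state, gate, k):
    """Apply a real 2x2 gate to qubit k (qubit 0 is the leftmost bit)."""
    bit = 1 << (N_QUBITS - 1 - k)
    new = state[:]
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[i | bit] = gate[1][0] * a + gate[1][1] * b
    return new

def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    cb = 1 << (N_QUBITS - 1 - control)
    tb = 1 << (N_QUBITS - 1 - target)
    return [state[i ^ tb] if i & cb else state[i] for i in range(len(state))]

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

HADAMARD = [[2 ** -0.5, 2 ** -0.5], [2 ** -0.5, -(2 ** -0.5)]]

def bare_circuit(x, weights):
    """Embedding, variational layers and measurement; returns <Z_k> for each qubit."""
    state = [1.0] + [0.0] * (2 ** N_QUBITS - 1)              # |0000>
    for k in range(N_QUBITS):                                 # embedding layer
        state = apply_1q(state, HADAMARD, k)
        state = apply_1q(state, ry(x[k] * math.pi / 2), k)    # assumed angle scaling
    for layer in weights:                                     # variational layers
        for k in range(N_QUBITS):
            state = apply_1q(state, ry(layer[k]), k)          # assumed R_y rotations
        for c, t in [(0, 1), (2, 3), (1, 2)]:                 # assumed CNOT arrangement
            state = apply_cnot(state, c, t)
    return [sum((-1.0 if i & (1 << (N_QUBITS - 1 - k)) else 1.0) * a * a
                for i, a in enumerate(state))
            for k in range(N_QUBITS)]                         # measurement layer
```

With these conventions an input x_k = -1 leaves qubit k in |0> and x_k = +1 sends it to |1>, so the tanh activation of the preceding classical layer maps naturally onto the embedding range. For example, `bare_circuit([1.0] * 4, [[0.0] * 4])` prepares |1111>, the three CNOT gates turn it into |1010>, and the measured values are approximately [-1, 1, -1, 1].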
For training and testing the model, the dataset has been divided into 2000 training points (pale-colored in Fig. 2) and 200 test points (sharp-colored in Fig. 2). As typical in classification problems, the cross entropy (implicitly preceded by a LogSoftMax layer) was used as a loss function and minimized via the Adam optimizer [38]. A total number of 1000 training iterations were performed, each of them with a batch size of 10 input samples. The numerical simulation was done through the PennyLane software platform [39].
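The loss and the decision rule used here can be made explicit with a short sketch. It assumes the usual definition of the cross entropy for a single sample with an integer class label, applied to the log-softmax of the raw network outputs; the function names are hypothetical.

```python
import math

def log_softmax(y):
    m = max(y)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in y))
    return [v - log_norm for v in y]

def cross_entropy(y, label):
    """Negative log-probability assigned to the correct class."""
    return -log_softmax(y)[label]

def predict(y):
    """Class index with the largest output, i.e. argmax(y)."""
    return max(range(len(y)), key=lambda k: y[k])
```

For two equal outputs the loss is log 2, the value expected from an uninformed binary classifier, and it decreases as the output associated with the correct class grows relative to the other one.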
The results are reported in Fig. 2, where the dressed quantum network is also compared with an entirely classical counterpart in which the quantum circuit is replaced by a classical layer, i.e., $C = L_{4 \to 2} \circ L_{4 \to 4} \circ L_{2 \to 4}$. The corresponding accuracy, i.e., the fraction of test points correctly classified, is 0.97 for the dressed quantum circuit and 0.85 for the classical network.
The presented results suggest that a dressed quantum circuit is a very flexible quantum machine learning model which is capable of classifying highly non-linear datasets. We would like to remark that the classical counterpart has been presented just as a qualitative benchmark: even if in this particular example the quantum model outperforms the classical one, any general and rigorous comparison would require a much more complex and detailed analysis which is beyond the aim of this work.

Example 2 - CQ transfer learning for image classification (ants / bees)
In this second example we apply the classical-to-quantum transfer learning scheme for solving an image classification problem. We first numerically trained and tested the model, using PennyLane with the PyTorch [40] interface. Subsequently, we also ran it on two real quantum devices provided by IBM and Rigetti. To our knowledge, this is the first time that high-resolution images have been successfully classified by a quantum computer.
Our example is a quantum model inspired by the official PyTorch tutorial on classical transfer learning [41]. The model can be defined in terms of the general CQ scheme proposed in Section III and represented in Fig. 1, with the following specific settings:
$D_A$ = ImageNet: a public image dataset with 1000 classes [42].
The bare variational circuit is essentially the same as the one used in the previous example (see Eqs. (11,12,13,14)), with the only difference that in this case the quantum depth is set to 6. The cross entropy is used as a loss function and minimized via the Adam optimizer [38]. We trained the variational parameters of the model for 30 epochs over the training dataset, with a batch size of 4 and an initial learning rate of η = 0.0004, which was successively reduced by a factor of 0.1 every 10 epochs. After each epoch, the model was validated with respect to the test dataset, obtaining a maximum accuracy of 0.967. A visual representation of a random batch of images sampled from the test dataset and the corresponding predictions is given in Fig. 3.
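The learning-rate schedule described above can be written as a step decay. The sketch below assumes that the reduction is applied exactly at epochs 10 and 20 (counting from 0), which is how a standard step scheduler such as PyTorch's `StepLR` behaves; the function name is hypothetical.

```python
def step_lr(epoch, initial_lr=0.0004, gamma=0.1, step_size=10):
    """Learning rate at a given (0-based) epoch for a step-decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)

# Learning rate used in each of the 30 training epochs.
schedule = [step_lr(e) for e in range(30)]
```

Under this assumption the first ten epochs use a learning rate of 4e-4, the next ten use 4e-5, and the last ten use 4e-6.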
We also tested the model (with the same pre-trained parameters) on two different real quantum computers: the ibmqx4 processor by IBM and the Aspen-4-4Q-A processor by Rigetti (see Fig. 4). The corresponding classification accuracies, evaluated on the same test dataset, are reported in Table II. Our results demonstrate the promising potential of the CQ transfer learning scheme applied to current NISQ devices, especially in the context of high-resolution image processing.
FIG. 4. Two high-resolution images sampled from the dataset $D_B$ and experimentally classified with two different quantum processors: ibmqx4 by IBM (first line) and Aspen-4-4Q-A by Rigetti (second line). In both cases the same pre-trained classical network (ResNet18 by Microsoft [19]) was used to pre-process the input image, extracting 512 highly informative features. The rest of the computation was performed by a trainable variational quantum circuit "dressed" by two classical encoding and decoding layers, as described in Eq. (9).

Example 3 - CQ transfer learning for image classification (CIFAR)
We now apply the same CQ transfer learning scheme of the previous example but with a different dataset $D_B$. Instead of classifying images of ants and bees, we use the standard CIFAR-10 dataset [43] restricted to the classes of cats and dogs. We then repeat the training and testing phases with the CIFAR-10 dataset restricted to the classes of planes and cars (see Fig. 5).
We remark that, in both cases, the feature extractor ResNet18 is pre-trained on ImageNet. Despite CIFAR-10 and ImageNet being quite different datasets (they also have very different resolutions), the CQ transfer learning method achieves nonetheless relatively good results. The hyper-parameters used for all the different datasets (including the previous example) are summarized in Table III. The corresponding test accuracies are also reported in Table III, while some of the predictions for random samples of the restricted CIFAR datasets are visualized in Fig. 5.

Example 4 - QC transfer learning for quantum state classification
Quantum to classical (QC) transfer learning consists of using a pre-trained quantum circuit as a feature extractor and post-processing its output variables with a classical neural network. In this case only the final classical part will be trained for the specific problem of interest.
The starting point of our example is the pre-trained continuous-variable quantum network presented in Ref. [16], Section IV.D, Experiment C. The original aim of this network was to encode 7 different 4 × 4 images, representing the (L,O,T,I,S,J,Z) tetrominos (popularized by the video game Tetris [44]), in the Fock basis of two-mode quantum states. The expected input of the quantum network is one of seven combinations of two-mode coherent states $|\phi_j\rangle$ (Eq. (15)), where the parameter α = 1.4 is a fixed constant. In Ref. [16] the network was successfully trained to generate an optimal unitary operation $|\varphi_j\rangle = U|\phi_j\rangle$, such that the probability of finding i photons in the first mode and j photons in the second mode is proportional to the amplitude of the image pixel (i, j). More precisely, the network was trained to reproduce the tetromino images after projecting the quantum state on the subspace of up to 3 photons (see Fig. 6).
FIG. 6. Images of the 7 different tetrominos encoded in the photon number probability distribution of two optical modes, after projecting on the subspace of up to 3 photons. These images are extracted from Fig. 10 of Ref. [16].
For the purposes of our example, we now assume that the previous 7 input states (15) are subject to random Gaussian displacements in phase space (Eq. (16)), i.e., each state $|\phi_j\rangle$ is replaced by $\hat{D}(\delta\alpha_1, \delta\alpha_2)|\phi_j\rangle$, where $\hat{D}(\alpha_1, \alpha_2)$ is a two-mode displacement operator [45], the values of the complex displacements $\delta\alpha_1$ and $\delta\alpha_2$ are sampled from a symmetric Gaussian distribution with zero mean and quadrature variance $N \geq 0$, and $j \in \{1, 2, 3, 4, 5, 6, 7\}$ is the label associated with the input states (15). The noise is similar to a Gaussian additive channel [45]; however, to simplify the numerical simulation, here we assume that the unknown displacements remain constant during the estimation of expectation values. Physically, this situation might represent a slow phase-space drift of the input light mode.
We also assume that, differently from the original image encoding problem studied in Ref. [16], our new task is to classify the noisy input states. In other words, the network should take the states defined in (16) as inputs, and should ideally produce the correct label $j \in \{1, 2, 3, 4, 5, 6, 7\}$ as output. In order to tackle this problem, we apply a QC transfer learning approach: we pre-process our random input states with the quantum network of Ref. [16] and we consider the corresponding images as features which we are going to post-process with a classical layer to predict the state label j. In simple terms, the QC transfer learning method allows us to convert a quantum state classification problem into an image classification problem.
Also in this case we can summarize the transfer learning scheme according to the notation introduced in Section III and represented in Fig. 1.
We again used the Adam optimizer [38] to minimize a cross-entropy loss function associated with our classification problem. For each optimization step we sampled independent random displacements with variance N = 0.6 that we applied to a batch of 7 states defined in Eq. (15). We optimized the model over 1000 training batches with a learning rate of η = 0.01, obtaining a classification accuracy of 0.803. The numerical simulation was performed with the Strawberry Fields software platform [46], combined with the TensorFlow [47] optimization back-end.
A summary of the hyper-parameters and of the corresponding accuracy is given in Table IV. Finally, the predictions for a sample of 7 noisy states are graphically visualized in Fig. 7, where the features extracted by the pre-trained quantum network A are represented as 4 × 4 gray scale images. The features of Fig. 7 are quite different from the original tetromino images shown in Fig. 6. This is due to the truncation of network A and to the presence of input noise. However, as long as the images of Fig. 7 are distinguishable, this is not a relevant issue since the final classical layer is still able to correctly classify the input states with high accuracy.
We also studied how the accuracy depends on the quantum and classical depths. Since the original pre-trained network A has 25 quantum layers, for the truncated network A' we can choose a quantum depth q within the interval 0-25. For the classical network B we consider the cases of 1, 2 and 3 layers, corresponding to the models $L_{16 \to 7}$, $L_{16 \to 7} \circ L_{16 \to 16}$ and $L_{16 \to 7} \circ L_{16 \to 16} \circ L_{16 \to 16}$, respectively.
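To give a sense of the size of the three classical networks B, one can count their trainable parameters. The count below assumes that every classical layer has the structure of Eq. (1), so that a layer with $n$ inputs and $n'$ outputs carries $n n' + n'$ weights and biases; the 16 inputs are the pixels of the 4 × 4 feature images and the 7 outputs are the class labels.

```latex
\begin{align}
  L_{16\to 7} &: \quad 16\cdot 7 + 7 = 119, \\
  L_{16\to 7}\circ L_{16\to 16} &: \quad 119 + (16\cdot 16 + 16) = 119 + 272 = 391, \\
  L_{16\to 7}\circ L_{16\to 16}\circ L_{16\to 16} &: \quad 391 + 272 = 663.
\end{align}
```

Under this assumption even the deepest classical block has only a few hundred parameters, so the post-processing stage remains small in all three cases.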
The results are shown in Fig. 8. By direct inspection we can see that increasing the classical depth is helpful but it saturates the accuracy already after two layers. On the other hand, it is evident that the quantum depth has an optimal value around q = 15 while for larger values the accuracy is reduced. This is a paradigmatic phenomenon well known in classical transfer learning: better features are usually extracted after removing some of the final layers of A. Notice that because of the quantum nature of the system, the quantum state produced by the truncated variational circuit could be entangled and/or not aligned with the measurement basis. So the numerical evidence that the truncation of a quantum network does not always reduce the quality of the measured features, but it can actually be a convenient strategy for transfer learning, is a notable result.

Example 5 - QQ transfer learning for quantum state classification
Finally, our last example is a proof-of-principle demonstration of QQ transfer learning. In this case we train an optical network A to classify a particular dataset $D_A$ of Gaussian and non-Gaussian quantum states. Subsequently, we use it as a pre-trained block for a dataset $D_B$ consisting of Gaussian and non-Gaussian states which are different from those of $D_A$. The pre-trained block is followed by some quantum variational layers that will be trained to classify the quantum states of $D_B$. Before presenting our model we need to define a continuous-variable single-mode variational layer, the analog of Eq. (12). We follow the general structure proposed in Ref. [16] (Eq. (17)), where R is a phase space rotation, S is a squeezing operation, D is a displacement and Φ is a cubic phase gate. All operations depend on variational parameters and, for sufficiently many layer applications, the model can generate any single-mode unitary operation. Moreover, by simply removing the last non-Gaussian gate Φ from (17), we obtain a Gaussian layer which can generate all Gaussian unitary operations.
We can express the QQ transfer learning model of this example according to the notation introduced in Section III and represented in Fig. 1:
B = Single-mode variational quantum circuit of depth q, followed by an on/off threshold detector.
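The on/off threshold detector that terminates circuit B can be illustrated with a short sketch. It assumes an ideal, lossless detector acting on a single-mode pure state given by its amplitudes in a truncated photon-number basis; the function name and the truncation are hypothetical choices made for illustration, and the mapping from detector outcomes to class labels is not specified here.

```python
def on_off_probabilities(fock_amplitudes):
    """Return (p_off, p_on) for an ideal on/off threshold detector.

    fock_amplitudes[n] is the amplitude of the n-photon component of a
    single-mode pure state (truncated at some cutoff, an assumption of
    this sketch). The detector stays 'off' only for the vacuum component
    and clicks ('on') for any non-zero photon number.
    """
    norm = sum(abs(c) ** 2 for c in fock_amplitudes)
    if norm == 0:
        raise ValueError("state has zero norm")
    p_off = abs(fock_amplitudes[0]) ** 2 / norm  # renormalize after truncation
    return p_off, 1.0 - p_off


if __name__ == "__main__":
    print(on_off_probabilities([1, 0, 0]))  # vacuum state
    print(on_off_probabilities([0, 1j]))    # single-photon state
    print(on_off_probabilities([1, 1]))     # equal superposition of 0 and 1 photons
```

The detector therefore yields a single binary outcome per shot, which is why it suits a two-class problem.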
A summary of the hyper-parameters used for defining and training this QQ model is reported in Table V, together with the associated accuracy. In Fig. 9 we plot the loss function (cross entropy) of our quantum variational classifier with respect to the number of training iterations. We compare the results obtained with and without the pre-trained layer A (i.e., with and without transfer learning), for a fixed total depth of 3 or 4 layers. It is clear that the QQ transfer learning approach offers a strong advantage in terms of training efficiency.
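For concreteness, the cross-entropy loss plotted against the training iterations can be sketched as follows. This is a minimal illustration, assuming a binary classification task in which label 1 is associated with a detector click; the function name, the averaging over samples and the clipping constant are assumptions and are not taken from the experiments reported here.

```python
import math


def binary_cross_entropy(p_on, labels, eps=1e-12):
    """Mean cross entropy between predicted click probabilities and 0/1 labels.

    p_on[i]   : predicted probability that sample i produces a click
    labels[i] : 1 if sample i belongs to the class associated with a click
                (an assumed convention), 0 otherwise
    """
    if len(p_on) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(p_on, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so that the logarithms stay finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    # An uninformative classifier (p = 1/2 for every sample) gives ln 2.
    print(binary_cross_entropy([0.5, 0.5], [1, 0]), math.log(2))
```

A value close to ln 2 corresponds to random guessing on a balanced binary dataset, which gives a natural reference level when comparing training curves with and without the pre-trained layer.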
For a sufficiently long training time, however, the network optimized from scratch achieves the same or better results than the network with a fixed initial layer A. This effect is also well known in the classical setting and it is not surprising: the network trained from scratch is in principle more powerful by construction, because it has more variational parameters. However, there are many practical situations in which the training resources are limited (especially when dealing with real NISQ devices) or in which the dataset $D_B$ is experimentally much more expensive than $D_A$. In all such practically constrained situations, QQ transfer learning could represent a very convenient strategy.

V. CONCLUSIONS
We have outlined a framework of transfer learning which is applicable to hybrid computational models where variational quantum circuits can be connected to classical neural networks. With respect to the well-studied classical scenario, several new and promising opportunities naturally emerge in hybrid systems, such as the possibility of transferring some pre-acquired knowledge at the classical-quantum interface (CQ and QC transfer learning) or between two quantum networks (QQ transfer learning). As an additional contribution, we have also introduced the notion of "dressed quantum circuits", i.e., variational quantum circuits augmented with two trainable classical layers which improve and simplify the data encoding and decoding phases.
Each theoretical idea proposed in this work is supported by a proof-of-concept example, numerically demonstrating the validity of our models for practical applications such as image recognition or quantum state classification. Particular focus has been dedicated to the CQ transfer learning scheme because of its promising potential with currently available quantum computers. Specifically, we have used the CQ transfer learning method to successfully classify high-resolution images with two real quantum processors (by IBM and Rigetti).
From our theoretical and experimental analysis, we can conclude that transfer learning is a promising approach, which can already achieve performance competitive with classical algorithms despite the early stage of current quantum technology. In the hybrid classical-quantum scenario considered in this work, transfer learning could be a key tool to help observe evidence of a quantum advantage in the near future.