Qubit-efficient encoding schemes for binary optimisation problems

We propose and analyze a set of variational quantum algorithms for solving quadratic unconstrained binary optimisation problems where a problem of $n_c$ classical variables can be implemented on only $\mathcal O(\log n_c)$ qubits. The underlying encoding scheme allows for a systematic increase in the correlations among the classical variables captured by a variational quantum state by progressively increasing the number of qubits involved. We first examine the simplest limit where all correlations are neglected, i.e. where the quantum state can only describe statistically independent classical variables. We apply this minimal encoding to find approximate solutions of a general problem instance comprising 64 classical variables using 7 qubits. Next, we show how two-body correlations between the classical variables can be incorporated in the variational quantum state and how they can improve the quality of the approximate solutions. We give an example by solving a 42-variable Max-Cut problem using only 8 qubits, where we exploit the specific topology of the problem. We analyze whether these cases can be optimized efficiently given the limited resources available on state-of-the-art quantum platforms. Lastly, we present the general framework for extending the expressibility of the probability distribution to any multi-body correlations.


Introduction
In recent years, important experimental breakthroughs have propelled quantum computing into one of the most thriving fields of research [3,8,44,48], with the long-term goal of building universal quantum computers capable of running algorithms with provable quantum speed-up [19,45]. As the first generations of quantum hardware, referred to as noisy intermediate-scale quantum (NISQ) devices [40], do not yet fulfill the technical requirements to implement error-corrected universal quantum computing, increasing efforts are dedicated to designing near-term algorithms capable of performing computational tasks with imperfect and limited quantum resources [4,31]. Amongst the most promising paradigms are the variational quantum algorithms (VQA) [13,22,33,35,38]. In these algorithms, a parameterized quantum circuit is optimized using classical computing resources to generate a quantum state that represents an accurate approximate solution of the problem at hand. While a formal proof of any quantum advantage these algorithms might bring has yet to be found [43], applications of NISQ devices to real-world problems are already being explored in chemistry [32] and quadratic unconstrained binary optimisation (QUBO) problems [18].
The QUBO model is an NP-hard combinatorial problem that consists of minimizing a cost function of the form $C(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$, where $\mathbf{x} \in \{0,1\}^{n_c}$ is a vector of $n_c$ classical binary variables and $A$ is a real and symmetric matrix. VQAs such as the quantum approximate optimization algorithm (QAOA) [1,5,13-15,21,36,37,41,49] and hardware-efficient [6,42] approaches have been applied to find approximate solutions to QUBO problems. The QAOA in particular guarantees a lower bound on the quality of its solutions for sparse QUBO instances using shallow circuits, and its solutions converge monotonically towards the exact one in the limit of infinitely deep circuits, where quantum adiabatic computing is recovered [13,51]. Recent experiments have, however, highlighted the challenges of implementing the QAOA on problem graphs that differ from the native connectivity of the quantum hardware as the system size increases [21]. Hardware-efficient approaches, on the other hand, are motivated by the simplicity of their implementations but do not guarantee a lower bound on the quality of the solutions. They typically employ series of gates native to the quantum platform and are unconstrained by the topology of the QUBO problem. However, depending on the implementation, these hardware-efficient approaches can be plagued by exponentially large barren plateaus in their optimization landscapes as the number of qubits increases [34]. Beyond these algorithmic difficulties, the engineering overhead of scaling up quantum hardware also currently limits the size of computational tasks to toy models. Previously proposed quantum algorithms for optimization problems have used a number of qubits equal to the number of classical variables and were therefore limited to problem sizes involving only a few tens of variables [1,5,6,21,36,37,41,42,49].
This is not representative of real-world optimization problems, where the number of classical variables $n_c$ involved can be on the order of $10^4$.
In this work, we tackle this problem by proposing an encoding scheme for QUBO models with $n_c$ variables that can be implemented on $\mathcal O(\log n_c)$ qubits. We devise a strategy using $n_a$ ancilla qubits and $n_r$ register qubits to divide the QUBO problem into $2^{n_r}$ subsystems of $n_a$ classical variables, requiring a total of $n_q = n_a + n_r$ qubits. This approach allows for a simultaneous search through each subsystem by exploiting the intrinsic parallelism offered by quantum devices. In this context, the resulting variational quantum state, which encodes a probability distribution over all classical solutions, is capable of capturing any $n_a$-body correlations between a number of QUBO variables that scales exponentially with $n_r$. This heuristic approach allows for a systematic increase in the correlations that can be captured in the probability distribution by progressively increasing the number of qubits. As an example, in the limit where each subsystem is composed of only a single classical variable, i.e. all correlations between classical variables are neglected, $n_r = \log_2(n_c)$, and optimization problems of size $n_c \sim 10^4$ could be tackled on quantum hardware with no more than 15 qubits. We emphasize that this limiting case of $n_a = 1$ can be efficiently simulated classically and therefore should not provide any quantum speed-up. At the other end of the spectrum, the most resource-intensive limit of our encoding scheme is reached when $n_q = n_a = n_c$ and recovers the traditional approaches that are classically intractable and possibly offer quantum advantages. This encoding scheme provides a systematic way to traverse between these limits, thus allowing one to balance capturing selected amounts of correlations against the hardware capabilities of modern-day devices. With the expected capabilities of upcoming NISQ devices, this scheme paves the way to exploring the boundaries of classical intractability for real-world problem sizes.
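As a quick sanity check of this scaling argument, the qubit count across the whole $n_a$ spectrum can be sketched with a small counting helper (the function name is ours and the sketch assumes $n_a$ divides $n_c$):

```python
import math

def qubits_required(n_c, n_a):
    """Total qubits n_q = n_a + n_r needed to split n_c classical variables
    into 2**n_r subsystems of n_a variables each (assumes n_a divides n_c).

    n_a = 1 recovers the minimal encoding (n_q = log2(n_c) + 1);
    n_a = n_c recovers the complete encoding (n_q = n_c)."""
    n_r = math.ceil(math.log2(n_c // n_a)) if n_c > n_a else 0
    return n_a + n_r

# The two limits discussed in the text:
print(qubits_required(64, 1))       # minimal encoding for n_c = 64
print(qubits_required(16384, 1))    # n_c ~ 10^4 fits in 15 qubits
print(qubits_required(64, 64))      # complete encoding: one qubit per variable
```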
In what follows, we introduce the general idea of our systematic encoding scheme and numerically demonstrate how the limit of $n_a = 1$ is able to solve QUBO problems while significantly reducing the number of qubits required. From there, we make a first step towards more expressive encodings by considering protocols that capture different subsets of two-body correlations and explore whether they can be optimized efficiently. We demonstrate numerically how a selective encoding scheme can be applied to the Max-Cut problem and show that exploiting the topology of a specific problem to select an efficient subset of correlations leads to better solutions. All protocols proposed in this manuscript are in line with the limitations of current state-of-the-art quantum platforms.

QUBO model and the complete encoding scheme
The QUBO model is an NP-hard combinatorial problem that consists of minimizing a cost function of the form $C(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$, where $\mathbf{x} \in \{0,1\}^{n_c}$ is a vector of $n_c$ classical binary variables and $A$ is a real and symmetric matrix. This model is of particular interest due to its relationship with other optimization problems such as the Max-Cut, portfolio optimization and facility allocation problems [23,25,30]. Existing metaheuristic approaches such as TABU search, genetic algorithms, and simulated annealing are capable of finding suitable solutions to problems consisting of $n_c \sim 10^4$ classical variables [16,17].
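For concreteness, the cost function $C(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ and its exhaustive minimization over all $2^{n_c}$ bitstrings can be sketched in a few lines (illustrative only, and feasible only for tiny $n_c$; the helper names are ours):

```python
import itertools
import numpy as np

def qubo_cost(x, A):
    """Cost C(x) = x^T A x for a binary vector x and symmetric matrix A."""
    x = np.asarray(x)
    return float(x @ A @ x)

def brute_force_minimum(A):
    """Exhaustively minimize C(x) over all 2**n_c bitstrings (tiny n_c only)."""
    n_c = A.shape[0]
    best_x, best_c = None, np.inf
    for bits in itertools.product([0, 1], repeat=n_c):
        c = qubo_cost(bits, A)
        if c < best_c:
            best_x, best_c = np.array(bits), c
    return best_x, best_c

# Random symmetric instance with entries in [-1, 1], as in the text.
rng = np.random.default_rng(0)
M = rng.uniform(-1, 1, size=(6, 6))
A = (M + M.T) / 2
x_opt, c_opt = brute_force_minimum(A)
```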
In recent implementations of VQAs applied to solving QUBO problems, each binary variable in $\mathbf{x}$ is represented by a single qubit, i.e. $n_q = n_c$; a mapping which we will refer to as the complete encoding. The resulting quantum state is parameterized by a set of angles $\vec\theta$ with the general form

$$|\psi_{cp}(\vec\theta)\rangle = \hat U_{cp}(\vec\theta)\,|\psi_0\rangle = \sum_{\mathbf{x}} \alpha_{\mathbf{x}}(\vec\theta)\,|\mathbf{x}\rangle, \qquad (1)$$

where $\hat U_{cp}(\vec\theta)$ is the unitary evolution implemented on the quantum platform, $\{|\mathbf{x}\rangle,\ \mathbf{x} \in \{0,1\}^{n_q}\}$ is the complete computational basis spanned by the $n_q$ qubits, and $|\psi_0\rangle$ is a given input state. By associating a classical solution $\mathbf{x}$ with a basis state $|\mathbf{x}\rangle$, the state $|\psi_{cp}(\vec\theta)\rangle$ encodes all possible classical solutions in a linear superposition. This unique property of quantum mechanics opens the possibility for multiple classical solutions to be tested simultaneously, and this intrinsic parallelism is a strong motivator for developing quantum algorithms for classical problems.
In the case where $\hat U_{cp}(\vec\theta)$ is a universal quantum circuit, all $\alpha_{\mathbf{x}}$ in Eq. (1) can in principle be independent (up to the normalization condition). Consequently, this quantum state is able to capture all possible correlations between the classical variables and exhibits expressive power that is beyond classical computation [10,11,24]. The goal from here would be to efficiently navigate the exponentially large Hilbert space and reach the basis state(s) which represent the exact or approximate solution(s) to the QUBO problem.

Figure 1: In the minimal encoding, each of the $2^{n_r}$ basis states $|\phi_i\rangle$ is used to represent a single classical variable $x_i$ (vertex). In the $n$-body (two-body) encoding scheme, groups of $n$ (two) classical variables are formed and each basis state represents a unique encoded group. In the complete encoding, each basis state represents an entire graph.
In this complete encoding scheme, the QUBO model can be mapped onto an Ising Hamiltonian by substituting each binary variable $x_i$ with the operator $(\hat 1 - \hat\sigma_z^{(i)})/2$,

$$\hat H_{Ising} = \frac{1}{4}\sum_{i,j} A_{ij}\,\big(\hat 1 - \hat\sigma_z^{(i)}\big)\big(\hat 1 - \hat\sigma_z^{(j)}\big), \qquad (2)$$

where $\hat\sigma_z^{(i)}$ is the $z$ Pauli matrix acting on qubit $i$ and $A_{ij}$ are the elements of the matrix $A$. The ground state of $\hat H_{Ising}$ is a basis state $|\mathbf{x}\rangle$ that corresponds to an exact solution $\mathbf{x}$ of the QUBO problem defined by $A$. For general instances, $\hat H_{Ising}$ represents a system of interacting spins where all two-body interactions may be present.
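A quick way to sanity-check this mapping is to build the diagonal of $\hat H_{Ising}$ in the computational basis and verify that each entry equals the QUBO cost of the corresponding bitstring. A minimal sketch (the convention of mapping the first variable to the most significant bit is our assumption):

```python
import numpy as np

def ising_diagonal(A):
    """Diagonal of H_Ising in the computational basis, from the
    substitution x_i -> (1 - sigma_z^(i)) / 2.  With |0> <-> sigma_z = +1
    (x_i = 0) and |1> <-> sigma_z = -1 (x_i = 1), the diagonal entry for
    basis state |x> is exactly C(x) = x^T A x."""
    n_c = A.shape[0]
    diag = np.empty(2 ** n_c)
    for k in range(2 ** n_c):
        # First variable on the most significant bit (assumed convention).
        x = np.array([(k >> (n_c - 1 - i)) & 1 for i in range(n_c)])
        diag[k] = x @ A @ x
    return diag

A = np.array([[ 1.0, -2.0],
              [-2.0,  1.0]])
diag = ising_diagonal(A)   # energies of |00>, |01>, |10>, |11>
ground = np.argmin(diag)   # index of the basis state solving this QUBO
```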
A variational algorithm can then be implemented to find a suitable solution by using the ansatz $\hat U_{cp}(\vec\theta)$ to produce trial states and finding the set of parameters $\vec\theta$ that minimizes the cost function

$$C_{cp}(\vec\theta) = \langle\psi_{cp}(\vec\theta)|\,\hat H_{Ising}\,|\psi_{cp}(\vec\theta)\rangle. \qquad (3)$$

Here, Eq. (3) is a linear function of expectation values with a number of terms polynomial in the number of qubits.
Existing variational ansatzes for optimization problems can be divided into two distinct groups: approaches which require the Hamiltonian $\hat H_{Ising}$ to be implemented on the quantum hardware, and those which utilize only native gates unconstrained by the specific problem. Approaches such as the QAOA, as implemented in Refs. [1,5,21,36,37,41,49], fall into the first category and benefit from being able to exploit some extent of adiabatic computing to search the Hilbert space [2]. In principle, the produced variational state is guaranteed to converge towards the exact solution for infinitely long quantum evolution $\hat U_{cp}(\vec\theta)$. These approaches, however, can be difficult to implement for generic QUBO problems. Approaches in the second category have been implemented in Refs. [6,42] and are designed to circumvent the technical challenges of implementing $\hat H_{Ising}$. However, these approaches do not guarantee the existence of an efficient path to the optimal solution and can become exponentially hard to optimize as the system size increases. While the ansatz proposed in this work belongs to the latter category, there should be no fundamental restrictions on devising circuit structures tailored toward a specific QUBO problem within the proposed encoding schemes.

Minimal encoding
While complete encoding schemes allow all many-body correlations between classical variables to be captured, the number of qubits required limits their application to small system sizes, with unfavorable prospects for scaling up. In what follows, we propose an encoding scheme which sacrifices this ability to capture correlations but allows problem sizes to scale exponentially with the number of qubits. We refer to this mapping as the minimal encoding.

Expressibility of the minimal encoding
The minimal encoding scheme considered here requires one ancilla qubit ($n_a = 1$) and $n_r = \log_2 n_c$ register qubits, for a total of $n_q = \log_2 n_c + 1$ qubits. The parametrized quantum state can be expressed as

$$|\psi_1(\vec\theta)\rangle = \sum_{i=1}^{n_c} \beta_i(\vec\theta)\,\big[a_i(\vec\theta)\,|0\rangle_a + b_i(\vec\theta)\,|1\rangle_a\big] \otimes |\phi_i\rangle_r, \qquad (4)$$

where the states $\{|\phi_i\rangle_r\}$ ($\{|0\rangle_a, |1\rangle_a\}$) are computational basis states of the register (ancilla) qubits, with $|a_i|^2 + |b_i|^2 = 1$. The premise is to define a one-to-one correspondence between each of the $n_c$ classical variables $x_i$ in $\mathbf{x}$ and a unique basis state $|\phi_i\rangle_r$, as depicted in Fig. 1(c). The probability of the $i$th classical variable having the value 1 or 0 is given by $\Pr(x_i = 1) = |b_i|^2$ and $\Pr(x_i = 0) = 1 - |b_i|^2 = |a_i|^2$, respectively. The coefficients $\beta_i(\vec\theta)$ capture the likelihood of measuring each register state $|\phi_i\rangle_r$ and thus the corresponding state of the ancilla qubit.

Figure 2: Hardware-efficient variational ansatz. The initial quantum state $|\psi_0\rangle$ is produced by the first layer of Hadamard gates. Each subsequent layer $1 \le l \le L$ is composed of a series of CNOT gates and single-qubit rotations $R_y(\theta_i)$ (denoted Y), where the $L \times n_q$ variational parameters are grouped in $\vec\theta$. A single evaluation of the cost function requires $n_{meas}$ measurements in the computational basis, and one optimization process requires $n_{eval}$ of these evaluations.

As an example, encoding the probability distribution over all solutions $\mathbf{x}$ of dimension $n_c = 4$ requires $n_r = 2$. One can then define the mapping $|\phi_1\rangle_r \equiv |00\rangle_r$, $|\phi_2\rangle_r \equiv |01\rangle_r$, $|\phi_3\rangle_r \equiv |10\rangle_r$ and $|\phi_4\rangle_r \equiv |11\rangle_r$. In doing so, the quantum state representing the unit probability of sampling $\mathbf{x} = (1, 0, 0, 1)$ would read

$$|\psi_1\rangle = \tfrac{1}{2}\big(|1\rangle_a|00\rangle_r + |0\rangle_a|01\rangle_r + |0\rangle_a|10\rangle_r + |1\rangle_a|11\rangle_r\big).$$

A similar encoding strategy has been utilized in the context of image compression [27].
The limitation of this compact mapping is that it can only encode distribution functions of statistically independent classical variables, i.e. the probability of obtaining a particular classical solution $\mathbf{x}$ from the state is given by $\Pr(\mathbf{x}) = \prod_{i=1}^{n_c} \Pr(x_i)$. This comes as no surprise, as the quantum state uses only $n_c$ coefficients to encode a probability distribution over $2^{n_c}$ solutions. As a consequence, it is always possible to efficiently capture the resulting distribution functions using classical approaches, as we discuss below in more detail. Despite these limitations, we examine this limiting case closely as it captures the core elements of the general encoding strategy.
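To make the bookkeeping explicit, the following sketch extracts the probabilities $\Pr(x_i = 1) = |b_i|^2$ from a simulated statevector and evaluates the product-form distribution (the ancilla-first amplitude ordering is an assumed convention, not fixed by the text):

```python
import numpy as np

def minimal_encoding_probs(psi, n_r):
    """Extract Pr(x_i = 1) = |b_i|^2 from a statevector of n_r + 1 qubits.

    Assumed convention: the ancilla is the first qubit, so the amplitude
    of |s>_a |phi_i>_r sits at index s * 2**n_r + i."""
    n_c = 2 ** n_r
    p0 = np.abs(psi[:n_c]) ** 2   # ancilla |0>, register |phi_i>
    p1 = np.abs(psi[n_c:]) ** 2   # ancilla |1>, register |phi_i>
    return p1 / (p0 + p1)

def solution_probability(x, p1):
    """Pr(x) = prod_i Pr(x_i) for statistically independent variables."""
    x, p1 = np.asarray(x), np.asarray(p1)
    return float(np.prod(np.where(x == 1, p1, 1.0 - p1)))

# The n_c = 4 example state from the text, encoding x = (1, 0, 0, 1):
psi = np.zeros(8)
psi[[4, 1, 2, 7]] = 0.5          # |1,00>, |0,01>, |0,10>, |1,11>
probs = minimal_encoding_probs(psi, n_r=2)   # -> [1, 0, 0, 1]
```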

Cost function to minimize
As with standard VQAs, we define a cost function to be minimized over the set of parameters $\vec\theta$. Given that $|\psi_1(\vec\theta)\rangle$ represents a distribution function over statistically independent classical variables, it adopts the form

$$C_1(\vec\theta) = \sum_{i \ne j} A_{ij}\,\frac{\langle\hat P_i^1\rangle_{\vec\theta}\,\langle\hat P_j^1\rangle_{\vec\theta}}{\langle\hat P_i\rangle_{\vec\theta}\,\langle\hat P_j\rangle_{\vec\theta}} + \sum_i A_{ii}\,\frac{\langle\hat P_i^1\rangle_{\vec\theta}}{\langle\hat P_i\rangle_{\vec\theta}}, \qquad (5)$$

where $\hat P_i = |\phi_i\rangle\langle\phi_i|_r$ ($\hat P_i^1 = |1\rangle\langle 1|_a \otimes \hat P_i$) are the projectors onto the register basis state $|\phi_i\rangle_r$ independent of the ancilla state (with the ancilla in $|1\rangle_a$). The expectation values can be expressed as $\langle\hat P_i\rangle_{\vec\theta} = |\beta_i|^2$ and $\langle\hat P_i^1\rangle_{\vec\theta} = |\beta_i b_i|^2$, so that $\langle\hat P_i^1\rangle_{\vec\theta}/\langle\hat P_i\rangle_{\vec\theta} = |b_i|^2 = \Pr(x_i = 1)$. The highly entangled quantum state that minimizes Eq. (5) adopts the form $|\psi\rangle = \sum_i \beta_i\,|\sigma_i\rangle_a \otimes |\phi_i\rangle_r$ with $\sigma_i \in \{0, 1\}$ and corresponds unambiguously to the exact solution $\mathbf{x} = [\sigma_1, \ldots, \sigma_{n_c}]$ that minimizes the QUBO problem defined by the matrix $A$. This point is crucial, as it ensures that finding the global minimum of Eq. (5) leads to the exact classical solution of the QUBO problem. Another important aspect of $C_1(\vec\theta)$ is that it only depends on the set of norms $\{|b_i|^2\}$. As a consequence, partial tomography performed by a series of measurements solely in the computational basis is sufficient for its estimation. Finally, the cost function $C_1(\vec\theta)$ in Eq. (5) cannot be reduced to a linear function of expectation values, and therefore the QUBO model in the minimal encoding scheme cannot be described with a suitable Hamiltonian. A detailed derivation of Eq. (5) is presented in Appendix A.
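As a sketch of this partial tomography, the cost function can be estimated from raw computational-basis counts alone, since only the ratios $\langle\hat P_i^1\rangle/\langle\hat P_i\rangle$ are needed (the bitstring format with the ancilla bit first is an assumed convention, as is the function name):

```python
import numpy as np

def estimate_c1(counts, A):
    """Estimate C_1 from computational-basis measurement counts.

    `counts` maps bitstrings 'a r_1...r_nr' (ancilla bit first -- an
    assumed convention) to occurrences.  No phase information, and hence
    no measurement in any other basis, is required."""
    n_c = A.shape[0]
    n1 = np.zeros(n_c)   # shots with register |phi_i> and ancilla |1>
    nt = np.zeros(n_c)   # shots with register |phi_i>, any ancilla value
    for bitstring, m in counts.items():
        a, i = int(bitstring[0], 2), int(bitstring[1:], 2)
        nt[i] += m
        n1[i] += m * a
    # Pr(x_i = 1) per register state; unobserved states default to 1/2.
    p = np.divide(n1, nt, out=np.full(n_c, 0.5), where=nt > 0)
    off = A - np.diag(np.diag(A))
    return float(p @ off @ p + np.diag(A) @ p)
```

For instance, counts that always find the ancilla in $|1\rangle$ give $\Pr(x_i = 1) = 1$ for every variable, so the estimate reduces to the cost of the all-ones bitstring.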

Variational protocol to solve randomly generated QUBO models
The quantum state $|\psi_1(\vec\theta)\rangle = \hat U_1(\vec\theta)|\psi_0\rangle$ is produced by a parameterized unitary evolution $\hat U_1(\vec\theta)$ applied to an initial product state $|\psi_0\rangle \sim (|0\rangle_a + |1\rangle_a) \otimes \sum_{i=1}^{n_c} |\phi_i\rangle_r$. We consider a hardware-efficient circuit as our ansatz $\hat U_1(\vec\theta)$, in the form depicted in Fig. 2. This circuit starts with a layer of Hadamard gates applied to all qubits, initially in $|00\ldots0\rangle$, to produce $|\psi_0\rangle$. It then follows with an alternating sequence of nearest-neighbor CNOT gates and single-qubit $R_y(\theta_i)$ rotations; each successive application of CNOT gates and $R_y(\theta_i)$ rotations makes up a single layer. This choice of ansatz represents the simplest case where qubits are arranged in a linear topology with nearest-neighbor couplings. It also produces states with only real-valued coefficients, which efficiently restricts the Hilbert space since the cost function in Eq. (5) does not depend on any phases.

The optimization procedure is standard: we first randomly choose a starting point for the variational parameters $\vec\theta_{ini}$ from a uniform distribution and measure the output quantum state $|\psi(\vec\theta_{ini})\rangle$ in the computational basis. This quantum evolution is repeated $n_{meas}$ times to estimate $C_1(\vec\theta_{ini})$. The results are fed to a classical optimizer which updates the parameters $\vec\theta_{old} \to \vec\theta_{new}$. The parameters are updated $n_{eval}$ times until convergence or until a set of termination criteria is met; the resulting parameters are denoted $\vec\theta_{opt}$. Approximate classical solutions $\mathbf{x}$ are then sampled from the final quantum state $|\psi_1(\vec\theta_{opt})\rangle$.

In Fig. 3, we show the average optimized cost function as a function of circuit depth for 3 QUBO instances of different sizes, $n_c = 8$, 32 and 64, using $n_q = 4$, 6 and 7 qubits respectively¹. In each instance, the elements of $A$ were randomly drawn from a uniform distribution ranging from $-1$ to 1. COBYLA was chosen as the classical minimizer to update the variational parameters as it was found to give the best results for the least number of cost function evaluations [39]. The effects of a noisy circuit are shown in Appendix C, where we compare the performance of a noise-free optimization for an $n_c = 32$ matrix to one with a simplified noise model applied. In Fig. 4(a)-(c), we compare the infinite-measurement limit to simulated values obtained using $n_{meas} \sim 1$-$15 \times 10^3$, at specific circuit depths for each of the different problem sizes. Our findings show that increasing the number of measurements reduces the likelihood of the optimizer terminating in a local minimum caused by fluctuations in the cost function. It also allows for finer tuning of the optimal parameters due to the increased precision when estimating $C_1(\vec\theta)$, resulting in an increase in $n_{eval}$.

From each of the optimized states of Fig. 4(a)-(c), 10 classical solutions $\mathbf{x}$ were drawn and distributed according to their normalized cost function value $\tilde C_{\mathbf{x}} = (C_{\mathbf{x}} - C_{min})/(C_{max} - C_{min})$; their normalized cumulative sum is shown in Fig. 4(d)-(f). The resulting histogram $y(\tilde C_{\mathbf{x}})$ (y-axis) represents the fraction of the solutions drawn that have a cost function value greater than $\tilde C_{\mathbf{x}}$ (x-axis). As an example, a value $y(\tilde C_{\mathbf{x}} = 0.2) = 0.3$ means that 30% of solutions drawn have a cost function value $\tilde C_{\mathbf{x}} > 0.2$. The better the solutions $\{\mathbf{x}\}$ obtained, the more sharply the histogram peaks at $\tilde C_{\mathbf{x}} = 0$. The different coloured lines stand for different numbers of measurements $n_{meas}$ and are compared to randomly drawn solutions, represented by the dotted black curves. The results show that the minimal encoding scheme was able to produce a significant portion of its solutions within 20% of the optimal cost function value for $n_c = 8$, 32, and a majority of the solutions produced for the $n_c = 64$ case were within 30% of the optimal cost function value. The numerical results also suggest that an increase in resources such as $n_{meas}$, $n_{eval}$ and depth $L$ is required to maintain comparable accuracy as the problem size increases.

¹ We note that the expressive power of $|\psi_1(\vec\theta)\rangle$ can be fully captured within the complete encoding scheme by using only a single layer of $R_y(\theta_i)$ rotations applied to each qubit. Studying the minimal encoding scheme is therefore akin to examining the amount of resources required to map the simplest quantum circuit of $n_c$ qubits onto an exponentially narrower circuit.
In Appendix C, we investigate the robustness of the minimal encoding to experimental imperfections such as finite gate fidelities, coupling to the environment and readout errors. We further compare its performance to the more standard QAOA protocol in Appendix D. Following the recent state-of-the-art QAOA experiments for a fully connected problem in Ref. [21], implementing the QAOA for $n_c = 8$ variables would require 8 qubits and 612 gates for $p = 3$. In comparison, the minimal encoding with a hardware-efficient ansatz requires only 4 qubits and 42 gates for $L = 6$, and was able to achieve an improvement in performance over the QAOA. While the noise levels encountered in current state-of-the-art experiments affect all quantum optimization algorithms proposed so far, we show that, compared to the QAOA, our efficient encoding is a step in the right direction, drastically reducing the resources required to solve larger-scale problems.

Classical Simulatability
As previously mentioned, the exponential decrease in the number of qubits offered by the minimal encoding also limits its advantage over classical methods. The probability distribution over statistically independent variables captured by the minimal encoding can be efficiently represented using only $n_c$ continuous variables $\{w_i\}$. Each variable $w_i$ replaces $\langle\hat P_i^1\rangle_{\vec\theta}/\langle\hat P_i\rangle_{\vec\theta}$ in Eq. (5), resulting in a non-convex quadratic optimization problem with continuous variables which can be solved using quadratic programming techniques [50]. This is in contrast to the number of parameters, $N_p = L \times n_q$, required in variational quantum circuits. From our numerical experiments shown in Fig. 3, satisfactory results were obtained using $N_p = (L = 4) \times (n_q = 4) = 16 > n_c = 8$ and $N_p = (L = 16) \times (n_q = 6) = 96 > n_c = 32$, therefore suggesting classical approaches as more efficient routes. In the following section, we describe methods for going beyond the limiting case of the minimal encoding, where more sophisticated probability distributions can be captured by the quantum state through the use of additional ancilla qubits, therefore providing an opening for possible quantum advantages.
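A minimal classical surrogate along these lines replaces each ratio by a continuous variable $w_i \in [0, 1]$ and minimizes the resulting quadratic form directly. The sketch below uses plain projected gradient descent rather than a dedicated quadratic-programming solver (all names and hyperparameters are ours):

```python
import numpy as np

def classical_surrogate(A, steps=2000, lr=0.01, seed=0):
    """Minimize the minimal-encoding cost over n_c continuous variables
    w_i in [0, 1], each playing the role of <P^1_i>/<P_i> = Pr(x_i = 1),
    via projected gradient descent; then round to a binary solution."""
    n_c = A.shape[0]
    off = A - np.diag(np.diag(A))
    diag = np.diag(A)
    rng = np.random.default_rng(seed)
    w = rng.uniform(0, 1, n_c)
    for _ in range(steps):
        grad = 2 * off @ w + diag          # gradient of w^T off w + diag . w
        w = np.clip(w - lr * grad, 0.0, 1.0)
    x = (w > 0.5).astype(int)
    return x, float(x @ A @ x)

A = np.array([[1.0, -2.0], [-2.0, 1.0]])
x, c = classical_surrogate(A)   # for this tiny instance: x = [1, 1], cost -2
```

Being non-convex, this only finds local minima in general; the point of the comparison in the text is that it needs just $n_c$ parameters, fewer than the $L \times n_q$ circuit angles.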

Two-body correlations
In this section, we show how two-body correlations between the classical variables of the QUBO problem can be introduced into the probability distribution captured by the quantum state. These correlations refer to the conditional probability of one of the variables taking on a certain value given the value of another variable when sampling the classical solution from the probability distribution. We then describe how the particular topology of the different QUBO instances can influence the subset of correlated pairs to be encoded. Specifically, when applied to a Max-Cut problem, we find that encoding only the correlations between pairs of variables that are connected within the graph leads to an improvement in the classical solutions obtained when compared to the minimal encoding approach.

General encoding scheme
We propose a general form of the quantum state that allows for the encoding of two-body correlations:

$$|\psi_2(\vec\theta)\rangle = \sum_{(i,j)} \beta_{ij}(\vec\theta)\,\big[a_{ij}|00\rangle_a + b_{ij}|10\rangle_a + c_{ij}|01\rangle_a + d_{ij}|11\rangle_a\big] \otimes |\phi_{ij}\rangle_r, \qquad (6)$$

where the register (ancilla) subspace now comprises $n_r = \log_2(n_{pair})$ ($n_a = 2$) qubits, with $n_{pair}$ being the number of two-body correlations encoded and $|a_{ij}|^2 + |b_{ij}|^2 + |c_{ij}|^2 + |d_{ij}|^2 = 1$. Similar to the minimal encoding scheme, each basis state $|\phi_{ij}\rangle_r$ of the register space acts as a pointer. However, this pointer now points to the index of a pair of classical variables $(x_i, x_j)$, as depicted in Fig. 1(c). The associated two-qubit ancilla state encodes the bare probability for all pair values, e.g. $\Pr(x_i = 0, x_j = 0) = |a_{ij}|^2$, $\Pr(x_i = 1, x_j = 0) = |b_{ij}|^2$ and so on. This encoding allows one to produce probability distributions that capture correlations beyond statistically independent variables. A similar encoding strategy has been considered to address the issue of limited connectivity in quantum annealing platforms, allowing all-to-all connectivity to be simulated from only local interactions [29]. The form of Eq. (6) is general enough to allow correlations to be captured between either all pairs of variables or only a subset of these pairs. In certain cases, one might be able to infer a preferred subset of pairs to encode based on the specific topology of the problem, allowing for an important reduction in the number of qubits required. In what follows, we highlight this point by comparing two general cases of frequently encountered QUBO models.

Selective subsets for sparse matrices
In QUBO instances where $A$ is sparse, one might naturally expect that the most important correlations are those between the pairs corresponding to non-zero elements in $A$. One seminal instance of sparse QUBO models is the $d$-regular Max-Cut problem with $d \ll n_c$. Each vertex of the corresponding graph is represented by a classical variable in $\mathbf{x}$, as depicted in Fig. 1(b), and each edge by a non-zero off-diagonal element in $A$. The resulting matrix $A$ has $d$ unit entries per row and column, and diagonal elements $A_{ii} = -d$. By selectively encoding only the $n_{pair} = n_c \times d/2$ pairs between non-zero elements in $A$ (i.e. the edges), $n_q = \log_2(n_c \times d) + 1$ qubits are required, which is only $\log_2(d)$ qubits more than the minimal encoding scheme.
Illustrating with an example, encoding the 12 edges of the 3-regular graph with $n_c = 8$ shown in Fig. 1(b) would require $n_r = 4$ register qubits. The pair $(x_1, x_2)$ could be mapped onto the basis state $|\phi_{12}\rangle_r \equiv |0000\rangle_r$, the pair $(x_1, x_7)$ onto $|\phi_{17}\rangle_r \equiv |0001\rangle_r$, and so on until each edge is associated with a unique basis state. In later sections, we apply this selective encoding method to solve a 3-regular Max-Cut problem with $n_c = 42$ variables using $n_q = 8$ qubits, allowing us to surpass the performance of the minimal encoding scheme.
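The qubit-counting arguments for sparse and dense instances can be condensed into two small helpers (the names are ours; ceilings are taken since register sizes must be integers):

```python
import math

def qubits_selective(n_c, d):
    """Qubits for selectively encoding the n_c * d / 2 edges of a
    d-regular graph: ceil(log2(n_pair)) register qubits + 2 ancillas."""
    n_pair = n_c * d // 2
    return math.ceil(math.log2(n_pair)) + 2

def qubits_all_pairs(n_c):
    """Qubits for encoding all n_c (n_c - 1) / 2 pairs of a dense instance."""
    n_pair = n_c * (n_c - 1) // 2
    return math.ceil(math.log2(n_pair)) + 2

print(qubits_selective(42, 3))   # 63 edges of the 3-regular Max-Cut instance
print(qubits_all_pairs(42))      # all 861 pairs of the same instance
```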

Encoding all possible pairings for dense matrices
For more extreme instances where $A$ is dense, such as the randomly generated QUBO models used in the previous section, selecting a specific subset of two-body correlations becomes completely arbitrary. The only unbiased approach then involves encoding all possible $n_{pair} = n_c(n_c - 1)/2$ pairs of classical variables, requiring the maximal number of qubits $n_q = \log_2[n_c(n_c - 1)] + 1$. Using this method to encode the 28 edges of the fully connected graph shown in Fig. 1(a) would require $n_r = 5$ register qubits. The mapping would proceed in a similar fashion as before, with the pair $(x_1, x_2)$ associated to $|\phi_{12}\rangle_r \equiv |00000\rangle_r$, $(x_1, x_3)$ to $|\phi_{13}\rangle_r \equiv |00001\rangle_r$, and so on. Despite the "unbiased" choice of pairing the variables, capturing all possible two-body correlations for general dense QUBO problems is typically not an efficient use of quantum resources, as we shall observe later.

Averaging the probabilities and defining the cost function
Interpreting the quantum state $|\psi_2\rangle$ in Eq. (6) as a distribution function $\Pr(\mathbf{x})$ over the ensemble of classical solutions $\mathbf{x}$ is not as straightforward as in the minimal encoding case. To better understand this statement, let us first consider the limit where the ensemble of pairs $\{(i,j)\}$ encoded corresponds to the set of edges of a 1-regular graph, also known as a perfect matching in graph theory and highlighted in Fig. 1(b). In this case, each variable $x_i$ is paired with a single other variable $x_j$ and the probability to sample a solution $\mathbf{x}$ is uniquely defined as $\Pr(\mathbf{x}) = \prod_{(i,j)} \Pr(x_i, x_j)$. However, in the more general scenario where at least one variable is included in more than one pair, the probability of sampling a solution $\mathbf{x}$ is no longer uniquely defined. For example, in the limit where all pairs are encoded, there are $N_{pm}(n_c) = (n_c - 1)!!$ ways of calculating $\Pr(\mathbf{x})$, with possibly vastly different results, where $N_{pm}(n_c)$ is the number of perfect matchings in a fully connected graph.
In order to define a cost function in the form of Eq. (5) that is well-behaved despite the non-uniqueness of $\Pr(\mathbf{x})$, we define averaged probabilities $\bar P^{i,j}_{\sigma_i,\sigma_j}$ of sampling $x_i = \sigma_i$ and $x_j = \sigma_j$, where $\sigma = \{0, 1\}$, that take into account the multiple ways of calculating $\Pr(\mathbf{x})$. Doing so, we obtain the averaged probability of sampling $(x_i, x_j) = (1, 1)$ from

$$\bar P^{i,j}_{1,1} = R_{ij}(G)\,|d_{ij}|^2 + \sum_{(i,k),(j,l)} R_{ijkl}(G)\,\big(|b_{ik}|^2 + |d_{ik}|^2\big)\big(|b_{jl}|^2 + |d_{jl}|^2\big), \qquad (7)$$

where the sum runs over encoded pairs $(i,k)$ and $(j,l)$ with $k, l \notin \{i, j\}$ (written here with $i$ and $j$ as the first member of each pair; when a variable appears as the second member, $|b|^2$ is replaced by $|c|^2$), and $b_{ij}$, $c_{ij}$ and $d_{ij}$ are the amplitudes of the ancilla states given in Eq. (6). Here, $R_{ij}(G)$ is the ratio of the number of perfect matchings after subtracting the vertices $v_i$ and $v_j$ from the graph $G$ to the total number of perfect matchings in $G$; similarly, $R_{ijkl}(G)$ describes the same ratio but with the 4 vertices $\{v_i, v_j, v_k, v_l\}$ removed instead. The graph $G$ is built by mapping each classical variable to a vertex and each pair encoded in $|\psi_2\rangle$ to an edge. Expressions similar to Eq. (7) for $\bar P^{i,j}_{0,0}$, $\bar P^{i,j}_{0,1}$ and $\bar P^{i,j}_{1,0}$ are derived in Appendix A. Using the same approach, one can also derive the averaged probability of sampling $x_i = 1$, leading to

$$\bar P^{i}_{1} = \sum_{j} R_{ij}(G)\,\big(|b_{ij}|^2 + |d_{ij}|^2\big), \qquad (8)$$

where $b_{ij}$ is also defined in Eq. (6). The cost function to minimize then takes the form

$$C_2(\vec\theta) = \sum_{i \ne j} A_{ij}\,\bar P^{i,j}_{1,1} + \sum_i A_{ii}\,\bar P^{i}_{1}. \qquad (9)$$

In the limit where all possible pairs are encoded, the ratios are $R_{ij}(G) = (n_c - 3)!!/(n_c - 1)!! = 1/(n_c - 1)$ and $R_{ijkl}(G) = R_{ij}(G)/(n_c - 3)$. However, in the case where only a subset of pairs is encoded, $R_{ijkl}(G)$ depends on the vertices $\{i, j, k, l\}$ and is NP-hard to evaluate. One thus needs to resort to approximate ratios; our numerical experiments suggest that taking $R_{ij}(G) = 1/d$ and $R_{ijkl}(G) = R_{ij}(G)/(d - 2)$ for a $d$-regular graph leads to adequate behaviour of the probabilities.
The key properties of Eq. (9) are similar to those of Eq. (5) in that (i) its global minimum corresponds unambiguously to the solution $\mathbf{x}$ that minimizes the QUBO problem, (ii) it can be estimated by a series of measurements solely in the computational basis, and (iii) it cannot be cast as a linear function of expectation values.

Sampling the classical solution from the quantum state
The form of |ψ 2 ( θ opt ) provides some flexibility in how solutions can be sampled from it. In the following, we describe a sampling protocol that fully exploits the encoded correlations and we show a simple example in Fig. 5.
The procedure is as follows.
1. Select the pair $(i, j)$ with the most definite mean probabilities, i.e. the pair for which $\bar P^{i,j}_{\sigma_i,\sigma_j}$ of sampling $x_i = \sigma_i$ and $x_j = \sigma_j$ is closest to unity. As an example, if the probability to sample $(x_1, x_2) = (1, 1)$, $\bar P^{1,2}_{1,1} = 0.9$, is the largest of all mean probabilities, we select the pair $(1, 2)$.

2. Sample values $(\sigma_i, \sigma_j)$ for $(x_i, x_j)$ according to the mean probabilities of the selected pair, and condition the mean probabilities of the remaining unassigned variables on the values just assigned.

3. Repeat the steps from (1) until all variables have been assigned a value.
Conceptually, this method allows for a finite propagation of correlations along the graph $G$ during the sampling. As an example, consider the case where correlations in the pairs $(x_i, x_k)$ and $(x_k, x_l)$ are explicitly encoded in $|\psi_2\rangle$, but the pair of variables $(x_i, x_l)$ is not. Using this sampling technique, the probability of sampling $x_l = \{0, 1\}$ changes conditionally on the sampled value of $x_i$, thereby inducing correlations. We stress that these induced correlations are not captured in the optimization process, but only during sampling.
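The sampling protocol and the conditional propagation it induces can be sketched as follows (the `pair_probs` input format and all names are hypothetical, and inconsistent pairs are handled crudely for illustration):

```python
import numpy as np

def greedy_sample(pair_probs, n_c, rng=None):
    """Sketch of the sampling protocol: repeatedly pick the encoded pair
    whose most likely joint outcome is closest to certainty, sample and
    fix those variables, then condition the remaining pair probabilities
    on the values already assigned.

    `pair_probs` maps (i, j) -> 2x2 array with entry [s, t] giving
    Pr(x_i = s, x_j = t)  (hypothetical input format)."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.full(n_c, -1)                               # -1 marks unassigned
    probs = {p: m.copy() for p, m in pair_probs.items()}
    while (x == -1).any() and probs:
        # 1. pair with the most definite joint probability
        (i, j), m = max(probs.items(), key=lambda kv: kv[1].max())
        # 2. sample the joint outcome (s, t) for (x_i, x_j)
        k = rng.choice(4, p=m.flatten() / m.sum())
        x[i], x[j] = divmod(k, 2)
        del probs[(i, j)]
        # 3. condition remaining pairs on the newly assigned values
        for (a, b), q in list(probs.items()):
            if x[a] != -1:
                q[1 - x[a], :] = 0.0
            if x[b] != -1:
                q[:, 1 - x[b]] = 0.0
            if q.sum() == 0:                           # inconsistent pair
                del probs[(a, b)]
    x[x == -1] = rng.integers(0, 2, (x == -1).sum())   # fill any leftovers
    return x
```

For instance, if the pair $(x_0, x_1)$ deterministically yields $(1, 1)$ and the pair $(x_1, x_2)$ encodes $x_1 = x_2$, the conditioning step forces $x_2 = 1$ even though $(x_0, x_2)$ was never encoded.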

Application to randomly generated QUBO instances versus a d-regular Max-Cut
In this section, we present the results obtained after optimizing quantum states of the form of Eq. (6) using the cost function $C_2(\vec\theta)$ for two different instances of the QUBO model: a 3-regular Max-Cut problem with $n_c = 42$ and a randomly generated matrix $A$ with $n_c = 8$.

Selective encoding for a 3-regular Max-Cut problem
To demonstrate the effectiveness of capturing correlations, we apply our encoding scheme with $n_a = 2$ to a randomly generated 3-regular Max-Cut problem with $n_c = 42$ vertices and 63 edges. In this example, selective encoding was used to encode only the correlations between classical variables that are connected by an edge, requiring $n_q = \lceil \log_2(63) \rceil + 2 = 8$ qubits. By contrast, encoding all of the 861 possible pairs would require 12 qubits.
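The qubit counts quoted above can be checked directly; the snippet below assumes, as in the text, $\lceil \log_2(\text{number of pairs}) \rceil$ register qubits plus $n_a = 2$ ancilla qubits (helper name hypothetical):

```python
from math import ceil, log2

def n_qubits(n_pairs, n_ancilla=2):
    """Register qubits index the encoded pairs; the ancillas store the pair state."""
    return ceil(log2(n_pairs)) + n_ancilla

print(n_qubits(63))   # selective encoding: the 63 edges of the 3-regular graph -> 8
print(n_qubits(861))  # all C(42, 2) = 861 pairs -> 12
```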
Using the same hardware-efficient circuit shown in Fig. 2, we apply the optimization protocol described in Sec. 3.3. In Fig. 6 (a), we show the final cost function $C_2(\vec{\theta}_{\rm opt})$ as a function of the circuit depth $L$ in the limit $n_{\rm meas} \to \infty$, and compare to optimization results for the same problem using the minimal encoding scheme ($n_a = 1$). Panel (b) shows the differences in the optimization process between the $n_a = 1$ and $n_a = 2$ encoding schemes for $L = 6$, both for $n_{\rm meas} \to \infty$ and $n_{\rm meas} = 10^4$. While $C_1$ and $C_2$ are depicted in the same figure to compare their respective performance, we stress that they are different quantities: despite comparable values, they may lead to substantial differences in the quality of the sampled solutions. This discrepancy is further accentuated by the fundamentally different sampling protocols.
The distribution of solutions drawn from $|\psi_2\rangle$ shows a substantial improvement in quality over the solutions obtained from $|\psi_1\rangle$, as depicted in Fig. 6 (c). Importantly, as we show in Appendix C, this improvement over the minimal encoding is preserved and even amplified in the presence of experimental imperfections. Intuitively, this enhanced robustness to noise could be the result of the information redundancy introduced when encoding two-body correlations, a characteristic reminiscent of the general idea of error correction. The use of selective encoding has thus allowed us to produce better-quality solutions through a combination of encoding only the subset of two-body correlations expected to be the most relevant and reducing the complexity of the cost function $C_2(\vec{\theta})$.

Encoding all pairs for randomly generated QUBO instances
We conclude this section by revisiting the matrix $A$ with $n_c = 8$, whose elements are drawn from a continuous uniform probability distribution. In this instance, all 28 possible pairings between the 8 classical variables are encoded, requiring a total of $n_q = \lceil \log_2(28) \rceil + 2 = 7$ qubits.
The results are shown in Fig. 6 (d)-(f) and compared to the results previously obtained with the minimal encoding scheme. Most importantly, panel (f) shows that solutions sampled from the statistically independent distribution function encoded in $|\psi_1\rangle$ are of better quality than those sampled from $|\psi_2\rangle$. These results strongly suggest that encoding all pairs is not an efficient use of quantum resources: it can lead to poorer performance during optimization as well as poorer-quality solutions obtained from the final state.
Intuitively, this efficiency loss can be attributed to the use of a much larger Hilbert space to encode highly redundant, and possibly contradictory, information about classical correlations. This suggests that a balance must be struck between the subset of pairs to be encoded and the resources required to do so; a more detailed discussion of the general case for any choice of $n_a$ is presented in the following section.

Generalization to multi-body correlations
Now that we have described in detail a framework to make the first step beyond statistically independent classical variables and encode two-body correlations, generalizing the idea to any set of $n_a$-body correlations is straightforward. Consider a variational quantum state of the form of Eq. (10), where the ancilla state $|\phi_i(\vec{\theta})\rangle_a$ is composed of $n_a$ qubits and is associated with a register state $|\Phi_i\rangle_r$ that points to a specified group $i$ of $n_a$ classical variables. In light of the previous section, whether $|\psi_a(\vec{\theta})\rangle$ can be efficiently optimized to solve a QUBO problem strongly depends on the choice of the encoded groups of $n_a$ classical variables.

One of the simplest mapping strategies consists of encoding a selected set of $n_c/n_a$ independent groups of $n_a$ variables, i.e. where no variable is part of more than one group. The number of qubits needed for this,
$$N_{\rm ind}(n_a) = \lceil \log_2(n_c/n_a) \rceil + n_a, \qquad (11)$$
increases monotonically until the complete encoding threshold where $n_a = n_c$. In this strategy, there is a one-to-one correspondence between each of the $n_c/n_a$ subgroups of $n_a$ classical variables and a unique basis state of the $n_r = \lceil \log_2(n_c/n_a) \rceil$ register qubits. The quantum state $|\phi_i(\vec{\theta})\rangle_a$ of the $n_a$ ancilla qubits associated with the $i$th subgroup encodes a distribution function that can capture all correlations among the variables of this subgroup. The optimization protocol can be interpreted as partitioning the QUBO problem into subgroups and simultaneously solving each of them using the complete encoding. This choice of mapping is arbitrary, as there is no fixed structure dictating how the variables should be grouped. However, the minimal use of quantum resources might make it a desirable choice in certain situations.

Another strategy would be to encode all $\binom{n_c}{n_a} = \frac{n_c!}{n_a!(n_c - n_a)!}$ groups of $n_a$ variables, which is the generalization of encoding all possible pairs for $n_a = 2$. This requires
$$N_{\rm all}(n_a) = \left\lceil \log_2 \binom{n_c}{n_a} \right\rceil + n_a \qquad (12)$$
qubits, which is a non-monotonic function of $n_a$ and can substantially exceed the total number of qubits required for the complete encoding, showing an inefficient use of quantum resources.

In between these two extremes lie multiple mapping options, and whether any of these encoding schemes can efficiently exploit the dominant correlations within a specific family of QUBO models is of great interest. For example, one could imagine encoding an ensemble of $(d+1)$-body correlations that follows the specific topology of a $d$-regular Max-Cut problem. In this case, each classical variable within the $d$-regular graph forms a group of $d + 1$ elements. Encoding all of those $n_c$ groups into a quantum state would require
$$N_{\rm reg}(n_a) = \lceil \log_2(n_c) \rceil + n_a + 1 \qquad (13)$$
qubits, where $d = n_a$. For $n_a \to (n_c - 1)$, i.e. a fully connected graph, the number of qubits exceeds the threshold $n_q = n_c$ by $\lceil \log_2(n_c) \rceil$.
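Eqs. (11)-(13) are straightforward to tabulate; the sketch below (function names hypothetical) illustrates in particular the non-monotonicity of $N_{\rm all}$ and how it can exceed the complete-encoding threshold $n_q = n_c$:

```python
from math import ceil, log2, comb

def N_ind(n_c, n_a):
    """Eq. (11): n_c/n_a disjoint groups of n_a variables each."""
    return ceil(log2(n_c // n_a)) + n_a

def N_all(n_c, n_a):
    """Eq. (12): all C(n_c, n_a) possible groups of n_a variables."""
    return ceil(log2(comb(n_c, n_a))) + n_a

def N_reg(n_c, n_a):
    """Eq. (13): one (d+1)-body group per vertex of a d-regular graph, d = n_a."""
    return ceil(log2(n_c)) + n_a + 1

n_c = 64
# N_all rises, peaks well above n_c = 64 near n_a = n_c/2, then falls back.
print([N_all(n_c, n_a) for n_a in (2, 8, 32, 63, 64)])
```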
To investigate the resources required to reproduce these probability distributions classically, we consider the simplest encoding strategy described in Eq. (11), where the classical variables are grouped into distinct subgroups. In this scenario, the cost function to minimize is a direct generalization of Eq. (5), i.e. it can still be cast as a quadratic optimization problem over continuous variables. This time, however, capturing all correlations encoded in the quantum state would require $2^{n_a}$ classical variables for each of the $2^{n_r}$ subsystems, leading to a total of $2^{n_q}$ variables, as expected. In contrast, by using a variational quantum circuit, the number of variational parameters involved during optimization remains $N_p = L \times n_q$. While $L$ is expected to scale with the number of qubits involved, the exact nature of this scaling is still an open question. Anything subexponential, which can be expected from previous analysis in the context of random quantum circuits [7], could lead to quantum advantages. One indication favouring this subexponential depth can be seen in the context of random circuits, where an ensemble of random unitaries with approximate $t$-design properties can be produced with polynomial circuit depth. Because a logarithmic compression in the number of qubits is unlikely to bring about any computational advantage (cf. Section 3.4), the crossover between the minimal encoding and the complete encoding, where the compression in the number of qubits is polynomial, is of great interest and needs to be studied further.

Conclusion
In this work, we have proposed and analysed a systematic encoding scheme for variational quantum algorithms that allows one to capture an increasing amount of correlations between classical variables in optimization problems. We first detailed the implementation of the minimal encoding scheme, using only $n_q = \lceil \log_2(n_c) \rceil + 1$ qubits to solve a QUBO model of size $n_c$. This significant reduction in qubits allowed us to tackle randomly generated problem instances of size $n_c = 8$, 32 and 64 using only $n_q = 4$, 6 and 7 qubits respectively. Our numerical optimization was able to find high-quality solutions using resources compatible with NISQ devices, despite the inability to capture any correlations between the classical variables. The use of a hardware-efficient parameterized circuit allowed us to reduce the number of gates required during implementation: implementing QAOA following the state-of-the-art experiment performed in Ref. [21] would require 612 gates for $p = 3$ when applied to an $n_c = 8$ variable problem, whereas superior solutions can be obtained using our minimal encoding scheme with as few as 42 gates and $n_q = 4$ qubits.
We also detailed encoding protocols that allow for two-body correlations to be captured between the classical variables. The number of qubits required scales logarithmically with the number of pairs encoded and we showed that exploiting the topology of the QUBO instance is essential for efficient optimization of the quantum state. By applying the two-body correlation encoding to a Max-Cut problem of 42 vertices, we were able to obtain better performance compared to the minimal encoding scheme.
The focus of this work was primarily on the encoding schemes outlined in the main text; it was not intended as a thorough investigation of the most efficient optimization protocols, and we believe that the results presented can still be improved upon substantially. One possible area for exploration is finding an ansatz that results in a smoother cost-function landscape with shallower circuits. Better-adapted classical optimization methods may also bring significant improvements, as a considerable fraction of optimization runs was found to get stuck in local minima [28,46]. Improvements on that front may also substantially decrease the number of measurements required to reach solutions of comparable quality. Further avenues to explore are whether generalizations to larger $n_a$-body correlations can be efficiently optimized and whether alternative ways of capturing correlations for dense problem instances can be found. Most importantly, we wish to investigate the intermediate encoding schemes beyond the limit of classical intractability, where quantum algorithms may outperform classical approaches.

A Derivation of the cost functions
In all of the encoding schemes outlined in the main text, the quantum state $|\psi(\vec{\theta})\rangle$ captures a probability distribution over all $2^{n_c}$ classical solutions. In this context, we generalize the QUBO cost function, $C(\vec{x}) = \vec{x}^{\,T} A \vec{x}$, as a sum over all possible solutions weighted by their respective probability to be sampled, i.e.
In what follows, we present in more detail the steps that lead to Eqs. (5) and (9) of the main text and provide further discussion of their properties.

A.1 Minimal encoding
In the minimal encoding, the state $|\psi_1(\vec{\theta})\rangle$ describes statistically independent classical variables, for which the probability of sampling $\vec{x}$ factorizes as $\Pr(\vec{x}) = \prod_i \Pr(x_i)$. In this case, each marginal probability can be expressed in terms of the quantum-state amplitudes as $\Pr(x_i = 1) = |b_i(\vec{\theta})|^2$ (Eq. (16)). By substituting these results into Eq. (14), one obtains the cost function in terms of the amplitudes $b_i(\vec{\theta})$. The final form presented in Eq. (5) of the main text is obtained by expressing the probabilities $|b_i(\vec{\theta})|^2 = \langle \hat{P}^1_i \rangle_{\vec{\theta}} / \langle \hat{P}_i \rangle_{\vec{\theta}}$ in terms of the projectors $\hat{P}_i$ and $\hat{P}^1_i$ (defined in the main text).
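The factorized form has a simple consequence: for independent Bernoulli variables, $\mathbb{E}[x_i x_j] = p_i p_j$ for $i \neq j$ and $\mathbb{E}[x_i^2] = p_i$, so the expected QUBO cost is a quadratic function of the marginals alone. A numerical sanity check of this factorization (illustrative, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n_c = 6
A = rng.uniform(-1, 1, (n_c, n_c))
A = (A + A.T) / 2                  # symmetric QUBO matrix
p = rng.uniform(0, 1, n_c)         # marginals Pr(x_i = 1) = |b_i|^2

# Closed form for independent Bernoulli variables:
# E[x^T A x] = sum_{i != j} A_ij p_i p_j + sum_i A_ii p_i
closed = p @ A @ p - np.sum(np.diag(A) * p**2) + np.sum(np.diag(A) * p)

# Monte Carlo estimate of the same expectation
samples = (rng.uniform(size=(200_000, n_c)) < p).astype(float)
mc = np.mean(np.einsum('si,ij,sj->s', samples, A, samples))
print(abs(closed - mc))  # small (statistical error only)
```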

A.2 Two-body correlations
In the case where the variational quantum state $|\psi_2(\vec{\theta})\rangle$ encodes a given set of two-body correlations, evaluating Eq. (14) is not as straightforward as in the minimal encoding. This is due to the multiple ways of evaluating the probability $\Pr(\vec{x})$ of sampling a solution $\vec{x}$, each of which can produce very different results. More precisely, for $\vec{x} = (\sigma_1, \sigma_2, \ldots, \sigma_{n_c})$, the probability can be factorized over pairs of variables, where $\Pr(x_i = \sigma_i | x_j = \sigma_j)$ represents the conditional probability to sample $x_i = \sigma_i$ given $x_j = \sigma_j$, and where the ensemble $\{(i, j)\}$ represents a set of independent encoded pairs in which no variable is repeated, i.e. a perfect matching. Consequently, there are as many ways to evaluate $\Pr(\vec{x})$ as there are perfect matchings $N_{\rm pm}(G)$ in the graph $G$ corresponding to the encoded pairs in $|\psi_2(\vec{\theta})\rangle$.
To evaluate Eq. (14), we average over all possible ways of evaluating $\Pr(\vec{x})$, denoted by $\{\Pr(\vec{x})\}$, and define the mean probabilities $\bar{P}^{i,j}_{\sigma_i,\sigma_j}$ for $i \neq j$. The mean probability to sample a single variable $x_i = 1$, $\bar{P}^i_1$, is given by the same definition with $i = j$. There are two distinct scenarios that one can encounter while averaging over all perfect matchings corresponding to $x_i = x_j = 1$ in $G$. The first is when the perfect matching contains an edge connecting $x_i$ and $x_j$. There are $N_{\rm pm}(G_{ij})$ such instances, where $G_{ij}$ is the graph obtained by removing the two vertices $i$ and $j$. For each of these instances, the conditional probability $\Pr(x_i = 1 | x_j = 1) = |d_{ij}(\vec{\theta})|^2$ is directly encoded in the quantum state (see Eq. (6) of the main text). The second scenario occurs when the perfect matching does not include an edge connecting the vertices $i$ and $j$ to each other, but instead to other vertices $k$ and $l$. These cases appear within a subset of $N_{\rm pm}(G_{ijkl})$ perfect-matching instances, where $G_{ijkl}$ is the graph obtained by removing the vertices $i$, $j$, $k$ and $l$. In these scenarios, the conditional probability $\Pr(x_i = 1 | x_j = 1)$ is not directly encoded in the quantum state and has to be inferred from the encoded conditionals, where $\Pr(x_k = 0, 1 | x_i = 1)$ is the conditional probability of having $x_k = 0$ or $x_k = 1$ given $x_i = 1$.
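For small graphs, the combinatorial quantities above can be enumerated by brute force. The sketch below (illustrative; function name hypothetical) lists all perfect matchings of a graph given as an edge list, which is how $N_{\rm pm}(G)$ and $N_{\rm pm}(G_{ij})$ enter the averaging:

```python
def perfect_matchings(vertices, edges):
    """Enumerate all perfect matchings of the graph (vertices, edges)."""
    vertices = sorted(vertices)
    if not vertices:
        return [[]]
    v = vertices[0]
    matchings = []
    # Match the smallest remaining vertex along each of its edges, then recurse
    # on the graph with both matched vertices removed.
    for (a, b) in edges:
        if v in (a, b):
            rest = [u for u in vertices if u not in (a, b)]
            sub_edges = [e for e in edges if a not in e and b not in e]
            for m in perfect_matchings(rest, sub_edges):
                matchings.append([(a, b)] + m)
    return matchings

# 4-cycle: two perfect matchings, {(0,1),(2,3)} and {(1,2),(3,0)}
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
ms = perfect_matchings(range(4), cycle4)
print(len(ms))  # 2
# N_pm(G_ij): removing vertices 0 and 1 leaves the single edge (2, 3)
ms_sub = perfect_matchings([2, 3], [(2, 3)])
print(len(ms_sub))  # 1
```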
Considering these contributions, we obtain the mean conditional probabilities given in Eq. (19) for $i \neq j$, together with the corresponding expression for $i = j$.
The cost function in Eq. (14) thus adopts the final form of Eq. (9) of the main text. This averaging ensures a well-behaved cost function: the quantum state that minimizes it yields unit probability of sampling the exact solution that minimizes the QUBO problem. The drawback of this method is the partial "washing out" of the encoded correlations, as can be seen from the first term (second scenario) in Eq. (19), which adopts the form of two statistically independent variables.
Following the steps outlined above, the corresponding averaged probabilities for the remaining configurations of $(\sigma_i, \sigma_j)$ can also be derived.

B The cost function landscape
The cost functions $C_1(\vec{\theta})$ and $C_2(\vec{\theta})$, described by Eq. (5) and Eq. (9) respectively, are nonlinear functions of expectation values, in contrast to the linear cost function $C_{\rm cp}(\vec{\theta})$ of the complete encoding. These differences are depicted in Fig. 7, where $C_{\rm cp}(\vec{\theta})$, $C_1(\vec{\theta})$ and $C_2(\vec{\theta})$ are plotted as a function of a single parameter $\theta_i$ with all other rotation angles fixed at random values. The $A$ matrix used in Fig. 7 (a)-(c) is the same randomly generated $n_c = 8$ matrix used in Section 3.3 of the main text, while the $A$ matrix describing the $n_c = 42$ 3-regular Max-Cut problem of Section 4.1.1 was used in panel (d). The circuit used to obtain the landscape of $C_{\rm cp}(\theta_i)$ consists of a single layer of $R_Y(\theta)$ rotations applied in parallel to all qubits. This circuit was chosen as it consists of only single-qubit rotations with no entangling gates. The resulting quantum state can therefore only describe probability distributions of statistically independent classical variables in the complete encoding, and is thus equally expressive as $|\psi_1(\vec{\theta})\rangle$ in the minimal encoding.
For deep circuits and linear cost functions, Ref. [34] predicts the existence of barren plateaus for 2-design quantum circuits $\hat{U}(\vec{\theta})$. Interestingly, the nonlinear forms of $C_1(\vec{\theta})$ and $C_2(\vec{\theta})$ do not fulfil the conditions underlying the proof derived in Ref. [34]. Consequently, we expect that the more constrained condition of a $t$-design quantum circuit with $t > 2$ would be necessary to demonstrate the existence of these barren plateaus. In addition, for cost functions comprising a linear combination of a ${\rm Poly}(n_q)$ number of global observables, Ref. [9] predicts the existence of barren plateaus even for shallow circuits. Despite the fact that each observable considered in this work is a projector, i.e. a global operator, the nonlinearity of $C_1(\vec{\theta})$ and $C_2(\vec{\theta})$ combined with the $\mathcal{O}(2^{n_q})$ number of terms involved also does not fulfil the conditions of the proof in Ref. [9]. A more thorough investigation of barren plateaus for nonlinear cost functions is left for future work.

C Effects of noise
In this section, we investigate the performance of our encoding scheme under the effects of a noise model consisting of thermal relaxation errors, imperfect gate fidelities, and readout errors. Thermal relaxation and decoherence are characterized by the relaxation constants $T_1$ and $T_2$ (distinct from $T_2^*$) respectively. Given a single-qubit density matrix $\rho$, the effects of thermal processes can be simulated by transforming $\rho$ after a time evolution $t$ as
$$\rho = \begin{pmatrix} \rho_{00} & \rho_{01} \\ \rho_{10} & \rho_{11} \end{pmatrix} \rightarrow \begin{pmatrix} 1 - \rho_{11} e^{-t/T_1} & \rho_{01} e^{-t/T_2} \\ \rho_{10} e^{-t/T_2} & \rho_{11} e^{-t/T_1} \end{pmatrix}. \qquad (25)$$
Gate errors are implemented via a depolarization channel that affects each qubit as it undergoes a gate operation. On top of its intended operation, a gate with error $\lambda$ has an additional effect on $\rho$ according to
$$\rho \rightarrow (1 - \lambda)\,\rho + \lambda\,\frac{I}{2},$$
where $I$ is the identity matrix, such that $I/2$ represents the maximally mixed state. Readout error is the probability of obtaining an incorrect value of the qubit during measurement, i.e. reading $|0\rangle$ when the qubit is in the $|1\rangle$ state and vice versa. In experimental quantum platforms, the magnitude of the errors above can differ between qubits; we implement this by assigning each qubit values drawn from a normal distribution characterized by a mean and standard deviation for each type of error.

Figure 8 shows a comparison of the performance of the minimal encoding as a function of circuit depth $L$ for different levels of noise. Each data point is obtained by performing the entire optimization protocol in the presence of all the noise sources described above. The lightest orange dashed line (circle data points) shows the results using noise levels characteristic of existing state-of-the-art hardware [47]. For comparison, we also consider more optimistic values that can be expected in upcoming generations of NISQ devices, shown by the darker lines. The simulation for the triangle data points uses a $2\times$ increase in $\bar{T}$ and an increase in two-qubit gate fidelity from $F_2 = 99\%$ to $F_2 = 99.9\%$ compared to the circle data points. The diamond points were obtained using a $5\times$ increase in $\bar{T}$ from the circle data points and an increase in single- and two-qubit gate fidelities to $F_1 = F_2 = 99.99\%$. Mean readout errors were kept unchanged at 1% for all the noisy simulations. The darkest line (square data points) corresponds to noise-free simulations, i.e. $\bar{T} = \infty$, $F_1 = F_2 = 1$ and perfect readout, but with finite $n_{\rm meas}$.

Figure 8: $T_g$ is the average single-qubit gate time; we used $T_{\rm CNOT}/T_g = 6$ and $T_{\rm meas}/T_g = 30$, where $T_{\rm CNOT}$ and $T_{\rm meas}$ are the average times for performing a CNOT gate and a measurement respectively. $F_1$ and $F_2$ are the gate fidelities for single-qubit and two-qubit operations respectively. A readout error of 1% was used for all curves except the darkest plain line, for which 0% was used. All other parameters are identical to Fig. 3.

These optimistic noise levels can be expected from direct improvements in hardware implementation or from applying additional error-mitigation techniques [12]. In Fig. 9, we reproduce the results for the Max-Cut problem of $n_c = 42$ variables shown in Fig. 6, this time including the noise model introduced above and using $n_{\rm meas} = 5 \times 10^4$ measurements. The comparison with the minimal encoding scheme shows an enhanced resilience to noise for the two-qubit-ancilla encoding. This increased robustness could be attributed to the redundancy in the encoding of correlations, reminiscent of the general ideas behind error-encoding schemes. Such results therefore serve as additional motivation to further investigate higher-ancilla encoding schemes, as they might provide additional protection against experimental imperfections.
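The single-qubit channels of this noise model can be checked directly on a density matrix. The sketch below applies the thermal-relaxation map of Eq. (25) followed by a single-qubit depolarizing channel; the parameter values are illustrative and the standard channel forms are assumed:

```python
import numpy as np

def thermal_relaxation(rho, t, T1, T2):
    """Thermal relaxation toward |0> for time t, as in Eq. (25).

    Assumes trace(rho) = 1, so the ground-state population absorbs the decay.
    """
    out = np.empty_like(rho)
    out[1, 1] = rho[1, 1] * np.exp(-t / T1)   # excited population decays with T1
    out[0, 0] = 1 - out[1, 1]                 # trace preservation
    out[0, 1] = rho[0, 1] * np.exp(-t / T2)   # coherences decay with T2
    out[1, 0] = rho[1, 0] * np.exp(-t / T2)
    return out

def depolarize(rho, lam):
    """Single-qubit depolarizing channel with error rate lam."""
    return (1 - lam) * rho + lam * np.eye(2) / 2

plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
rho = depolarize(thermal_relaxation(plus, t=1.0, T1=100.0, T2=50.0), lam=0.01)
print(np.trace(rho).real)  # 1.0: both channels are trace-preserving
```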

D Comparison with QAOA
In this section, we compare the minimal encoding approach to the Quantum Approximate Optimization Algorithm (QAOA) under the effects of noise. QAOA is a commonly employed technique for solving binary optimization problems on NISQ devices [1,5,13,21,36,37,41,49,51], where each classical variable is represented by a single qubit (complete encoding).
Using QAOA to solve a QUBO problem involves finding the parametrized state that minimizes the expectation value of the Ising Hamiltonian $\hat{H}_{\rm Ising}$ of Eq. (2). One of the main advantages of QAOA is its guaranteed monotonic convergence to the optimal solution as $p \to \infty$. However, the current capabilities of NISQ devices limit $p$ to small values, and its performance has so far been drastically compromised when the interactions in $\hat{H}_{\rm Ising}$ do not match the connectivity of the physical device.
In what follows, we simulate the QAOA protocol on a linear topology where two-qubit operations can only be applied to adjacent qubits (similar to the assumption made throughout the manuscript). Solving a general QUBO problem, where all-to-all interactions can be encountered, therefore requires a network of SWAP gates for all two-qubit $\hat{\sigma}_z^{(i)} \otimes \hat{\sigma}_z^{(j)}$ interactions in $\hat{H}_{\rm Ising}$ to be implemented. One of the most successful experimental implementations of the QAOA protocol to date relied on an efficient decomposition of the $e^{-i\gamma \hat{\sigma}_z^{(i)} \otimes \hat{\sigma}_z^{(j)}} \cdot {\rm SWAP}$ operations into native gates [21], and we simulate the same decomposition with the same gate fidelities and gate times. We note that different platforms require different gate decompositions due to their different native gate sets, and efforts have been devoted to reducing the number of gates required [26].

Figure 10: Noisy and noise-free simulations of QAOA and the minimal encoding scheme with a hardware-efficient ansatz applied to randomly generated $A$ matrices with $n_c = 8$ classical variables. Data points and error bars show the mean and standard deviation of the normalized cost function over multiple matrices (and initial parameters $\vec{\theta}_{\rm ini}$) after optimization for $p = 1$, 2 and 3 ($L = 2$, 4 and 6) for QAOA (minimal encoding). 8 qubits were used in QAOA, while only 4 qubits were required in the minimal encoding scheme. For QAOA, the total fidelity of the $e^{-i\gamma \hat{\sigma}_z^{(i)} \otimes \hat{\sigma}_z^{(j)}} \cdot {\rm SWAP}$ operations is $F = 96.3\%$ with gate time $\bar{T}/T_g \approx 640$, as reported in Ref. [21]. All other parameters are identical to Fig. 8.
To find the best parameters $\vec{\gamma}$ and $\vec{\beta}$, we adopt a commonly used optimization strategy that consists of (1) scanning the two-dimensional parameter space spanned by $(\gamma_1, \beta_1)$ for $p = 1$, (2) fixing $(\gamma_1, \beta_1)$ to their optimal values, and (3) adding one additional layer $p \to p + 1$ and repeating steps (1)-(3) until reaching the desired final depth. We note that techniques to reduce the required size of the search grid for the parameters associated with $p > 1$ have been proposed [51]. During our simulation, the parameter scan was done over a $50 \times 50$ point grid ($\beta \in [0, \pi[$ and $\gamma \in [0, 2\pi[$) with 50 000 measurements per point, well within the capabilities of existing hardware [21]. It is also noteworthy that for general instances of QUBO problems, $\gamma_1$ might not be bounded to the domain above. To ensure that the finite grid resolution was not a limiting factor, the optimal parameters found on the initial $50 \times 50$ point grid were further improved by an additional refined local search.
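As an illustration of step (1), the sketch below performs a coarse $(\gamma_1, \beta_1)$ grid scan for $p = 1$ on a toy 3-qubit Ising Hamiltonian using exact statevector simulation rather than finite measurements; the problem instance, grid size, and all names are hypothetical:

```python
import numpy as np
from itertools import product

n = 3
# Toy Ising couplings J_ij for H = sum_ij J_ij s_i s_j, with s_i = +/-1
J = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): -0.5}

# Diagonal of H_Ising in the computational basis (qubit 0 = most significant bit)
diag = np.zeros(2**n)
for idx, bits in enumerate(product([0, 1], repeat=n)):
    s = [1 - 2 * b for b in bits]
    diag[idx] = sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())

X = np.array([[0, 1], [1, 0]])

def kron_x(i):
    """Pauli X acting on qubit i, identity elsewhere."""
    ops = [np.eye(2)] * n
    ops[i] = X
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

Xs = [kron_x(i) for i in range(n)]

def qaoa_energy(gamma, beta):
    """<H_Ising> for the p = 1 QAOA state with angles (gamma, beta)."""
    psi = np.full(2**n, 1 / np.sqrt(2**n), dtype=complex)  # |+>^n
    psi = np.exp(-1j * gamma * diag) * psi                 # e^{-i gamma H_Ising}
    for Xi in Xs:                                          # e^{-i beta X_i} (commuting)
        psi = np.cos(beta) * psi - 1j * np.sin(beta) * (Xi @ psi)
    return float(np.real(np.conj(psi) @ (diag * psi)))

gammas = np.linspace(0, 2 * np.pi, 25, endpoint=False)
betas = np.linspace(0, np.pi, 25, endpoint=False)
best = min((qaoa_energy(g, b), g, b) for g in gammas for b in betas)
print(best[0])  # negative: the scan finds parameters that lower the energy
```

In the paper's protocol this scan would be repeated layer by layer, with the earlier angles frozen at their optima before each new layer is added.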
In Fig. 10, we show a comparison of the performance of our minimal encoding scheme and the QAOA protocol for multiple randomly generated $A$ matrices of size $n_c = 8$. This is in contrast to problems artificially curated to match the topology of the quantum device, as commonly used in experimental implementations [21,26,36,37,49]. Similar to the simulations shown in Appendix C, we use a noise model that, in addition to the finite gate fidelities, also includes thermal relaxation and readout errors. We emphasize that the parameter search in both the QAOA and the minimal encoding scheme was performed in the presence of the simulated noise. This is also in contrast to some recent experimental QAOA demonstrations, where the optimization is first performed with an ideal simulation and only the optimized circuit is executed on the quantum hardware [21,26,49]. For the minimal encoding, we used 15 starting points of randomly chosen parameters. With each optimization run requiring $n_{\rm eval} \approx 200$ circuit evaluations, this amounts to a total number of evaluations comparable to the $p = 1$ scan over a $50 \times 50$ point search grid.
We see from Fig. 10 that despite the provable monotonic convergence of QAOA with increasing $p$, practical limitations drastically limit its application even to small generic QUBO problems. It is therefore not surprising that our minimal encoding considerably outperforms QAOA, given the important difference in the required resources. For a single layer $p = 1$ with $n_c = 8$, implementing $\hat{U}_H$ via a SWAP network over 8 qubits requires 28 $\hat{\sigma}_z$-SWAP interactions arranged in 8 subsequent layers. Following the gate decomposition used in Ref. [21], our implementation of QAOA required 84 two-qubit gates and 112 single-qubit gates for each application of $\hat{U}_H$. In contrast, using a hardware-efficient ansatz of the form shown in Fig. 2, our minimal encoding requires 4 qubits, 3 CNOT gates arranged in 2 subsequent layers, and 4 parametrized $R_y$ rotations for $L = 1$.