Quantum variational learning for quantum error-correcting codes

Quantum error correction is believed to be a necessity for large-scale fault-tolerant quantum computation. In the past two decades, various constructions of quantum error-correcting codes (QECCs) have been developed, leading to many good code families. However, the majority of these codes are not suitable for near-term quantum devices. Here we present VarQEC, a noise-resilient variational quantum algorithm to search for quantum codes with a hardware-efficient encoding circuit. The cost functions are inspired by the most general and fundamental requirements of a QECC, the Knill-Laflamme conditions. Given the target noise channel (or the target code parameters) and the hardware connectivity graph, we optimize a shallow variational quantum circuit to prepare the basis states of an eligible code. In principle, VarQEC can find quantum codes for any error model, whether additive or non-additive, degenerate or non-degenerate, pure or impure. We have verified its effectiveness by (re)discovering some symmetric and asymmetric codes, e.g., $((n,2^{n-6},3))_2$ for $n$ from 7 to 14. We also found new $((6,2,3))_2$ and $((7,2,3))_2$ codes that are not equivalent to any stabilizer code, and extensive numerical evidence with VarQEC suggests that a $((7,3,3))_2$ code does not exist. Furthermore, we found many new channel-adaptive codes for error models involving nearest-neighbor correlated errors. Our work sheds new light on the understanding of QECC in general, which may also help to enhance near-term device performance with channel-adaptive error-correcting codes.


Introduction
Fault-tolerant quantum computers promise to solve some computational problems much faster than classical machines, such as quantum chemistry simulation [1], prime factorization [2], and solving linear systems of equations [3]. However, quantum information carried by current noisy intermediate-scale quantum (NISQ) systems is highly fragile and can be easily altered by the environment, so the aforementioned tasks remain out of reach.
The most promising technique to maintain coherence and protect the quantum information from noise is quantum error-correcting codes [4][5][6][7][8]. The main idea of quantum error correction is to encode the low-dimensional quantum state in a larger system such that errors occurring during the computation can be corrected due to the physical redundancy. As long as the noise rate p is below a specific threshold, QECCs can correct the error and reduce the error probability from O(p) to higher orders. In recent years, the intrinsic connections between QECCs and other areas of physics, such as quantum gravity [9], have also been noticed.
Knill and Laflamme devised sufficient and necessary conditions (known as the Knill-Laflamme conditions) for quantum error correction [10]. In principle, we can find any QECC as long as we find solutions to the Knill-Laflamme conditions. However, solving these systems of equations is extremely difficult in the general case. Therefore many open problems in this field remain unsolved, e.g., do all degenerate QECCs obey the Hamming bound? Which QECC has the highest error threshold? In practice, researchers usually analyze QECCs under the Pauli framework and have developed various QECC families, such as surface codes [11,12], Calderbank-Shor-Steane (CSS) codes [13,14], stabilizer codes [5], code-word stabilized (CWS) codes [15,16], quantum low-density parity-check codes [17,18].
Up to now, no logical qubit or logical operation with useful fidelity has been realized in experiments, since current gate noise rates are still much larger than required. Very recently, Egan et al. [19] prepared a Bacon-Shor logical qubit with 13 trapped-ion qubits and demonstrated a logical single-qubit Clifford gate. Further, Postler et al. [20] demonstrated a logical T-gate based on the 7-qubit color code. However, the fidelities of these state-of-the-art logical qubits are even lower than those of the physical qubits. From a theoretical perspective, the codes used in those experiments are not device-tailored and may not be optimal for the system. The noise channels on different physical platforms differ significantly [21][22][23]. Symmetric QECC constructions under the Pauli framework cannot be directly adapted to non-Hermitian/non-unitary noise channels. It is highly desirable to design asymmetric or channel-adaptive QECCs with a hardware-efficient encoder. Such device-tailored codes can protect logical information more efficiently.
Besides analytical constructions, researchers have long been trying to find QECCs with computational methods. Refs. [16,24,25,26] designed classical algorithms for finding quantum codes associated with graphs. Ref. [27] used numerical greedy search for finding stabilizer codes. These algorithms, however, cannot find arbitrary codes and are extremely time-consuming. With the popularity of artificial intelligence, researchers also started to design and optimize quantum codes with neural networks [28][29][30][31]. These classical black-box models perform well for certain problems. In this work, we add a new general method to this toolbox. We devise a hybrid quantum-classical algorithm called VarQEC for finding quantum error-correcting codes. The cost functions therein are based on the Knill-Laflamme conditions. We iteratively update the parameters in a variational quantum circuit (VQC) with stochastic gradient descent. If the final cost functions are sufficiently small, we obtain an approximate quantum code whose inaccuracy is bounded. Compared with the classical iterative algorithm introduced in Ref. [32], our method yields the encoding circuit, not merely the encoding isometry. After finding a QECC and its encoder, the decoding operation can be found via various methods like semidefinite programming [33], convex optimization [34], or classical/quantum machine learning [35][36][37].
VarQEC allows for non-Hermitian or non-unitary errors and is surprisingly effective. We numerically verify its effectiveness up to 14 qubits.
Some of the $((6,2,3))_2$ and $((7,2,3))_2$ codes we find are not locally equivalent to any CWS code. Whether a quantum code with parameters $((7,3,3))_2$ exists is an open question; our numerical evidence suggests that it does not. We then apply VarQEC to search for asymmetric codes (which detect more Pauli-X/Y errors than Pauli-Z errors or vice versa) and make new discoveries. Furthermore, we search for channel-adaptive codes for nearest-neighbor collective amplitude damping and nearest-neighbor collective phase-flips, and find eligible new codes with a hardware-efficient encoding circuit for various connectivity graphs. Since VarQEC is capable of finding a QECC with the shallowest possible encoding circuit, it is a promising tool for designing codes with sufficient fidelity that can be tested and implemented on near-term devices. Although only relatively small systems were investigated in this paper, hierarchical concatenation can construct good quantum codes with large code lengths and distances [5,38,39].
The paper is organized as follows. In Sec. 2, we introduce some background of quantum error correction. In Sec. 3, we introduce our cost functions and present propositions to support our definitions. In Sec. 4, we explain the VarQEC algorithm in detail. In Sec. 5, we show quantum codes (re)discovered thereby, including symmetric, asymmetric, and channel-adaptive codes for nearest-neighbor collective amplitude damping and nearest-neighbor collective phase-flips. Sec. 6 discusses the noise resilience feature of VarQEC. Sec. 7 discusses the barren plateaus and the noise-induced barren plateaus in VarQEC optimization. In Sec. 8, we verify our algorithm by an experiment on an IBM quantum device. The conclusions and future directions are summarized and discussed in Sec. 9. The appendices give some proof details, a discussion on overparameterization, an alternative variational ansatz, some non-CWS quantum codes, and a list of the quantum weight enumerators of quantum codes discovered by VarQEC.

Preliminaries
In classical computation and communication, redundancy is added when encoding a message such that the errors can be detected and corrected. Although each bit may flip with some probability, the encoded message can be recovered with high probability. The philosophy behind quantum error correction is the same. We use several low-fidelity physical qudits (e.g., qubits) to encode the logical quantum information redundantly and nonlocally. Then quantum errors can be detected through syndrome measurements and corrected through a unitary operation. A $q$-ary QECC $\mathcal{C}$ is a $K$-dimensional subspace of the $q^n$-dimensional Hilbert space $(\mathbb{C}^q)^{\otimes n}$, where $n$ is the number of physical qudits (referred to as the code length). For qubit systems, $q = 2$ and $\mathcal{C} \subset (\mathbb{C}^2)^{\otimes n}$. When $K = 1$, the code is a fixed quantum state without computational use. Throughout this paper, we only discuss $K \geq 2$.
Knill and Laflamme developed a general theory of quantum error correction. They obtained the sufficient and necessary conditions for an exact QECC [10]: a quantum code with orthonormal basis states $\{|\psi_j\rangle\}$ corrects the error set $\mathcal{E}$ if and only if
$$P_c E_\alpha^\dagger E_\beta P_c = \lambda_{\alpha\beta} P_c$$
holds for all $E_\alpha, E_\beta \in \mathcal{E}$. Here, $P_c = \sum_j |\psi_j\rangle\langle\psi_j|$ is the orthogonal projector onto the code space, and each $\lambda_{\alpha\beta}$ is a complex number. Moreover, we say the quantum code is non-degenerate if the matrix $\lambda_{\alpha\beta}$ has full rank [40]. We can understand these conditions intuitively. When $i \neq j$, $\langle\psi_i|E_\alpha^\dagger E_\beta|\psi_j\rangle = 0$.
This means orthogonal logical states remain orthogonal after the noise channel, so the logical information is not corrupted. When $i = j$, $\langle\psi_j|E_\alpha^\dagger E_\beta|\psi_j\rangle = \lambda_{\alpha\beta}$, with $\lambda_{\alpha\beta}$ being a constant determined only by the error product. This indicates that the projections between subspaces induced by different errors are information-preserving, i.e., the errors have an orthogonal decomposition. Therefore, we can correct the error without knowing or destroying the quantum superposition state.
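The Knill-Laflamme conditions can be checked numerically for small codes. The following is a minimal sketch, not part of VarQEC itself: it uses the three-qubit repetition code as an illustrative example (an assumption of this sketch, not a code discussed in the text) and takes $\lambda_{\alpha\beta} = \mathrm{Tr}(P_c E_\alpha^\dagger E_\beta P_c)/K$. The helper names `kron_all`, `basis_state`, and `kl_violation` are hypothetical.

```python
import itertools
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

def basis_state(bits):
    """Computational basis state for a bit string such as '010'."""
    vec = np.zeros(2 ** len(bits), dtype=complex)
    vec[int(bits, 2)] = 1.0
    return vec

def kl_violation(P, errs):
    """Largest Frobenius norm of P E_a^dag E_b P - lambda_ab P over all pairs,
    with lambda_ab = Tr(P E_a^dag E_b P) / K (assumed convention)."""
    K = np.trace(P).real
    worst = 0.0
    for Ea, Eb in itertools.product(errs, repeat=2):
        M = P @ Ea.conj().T @ Eb @ P
        lam = np.trace(M) / K
        worst = max(worst, np.linalg.norm(M - lam * P))
    return worst

# Illustrative code: span{|000>, |111>} (three-qubit repetition code).
codewords = [basis_state("000"), basis_state("111")]
P_c = sum(np.outer(v, v.conj()) for v in codewords)

identity = kron_all([I2, I2, I2])
bit_flips = [kron_all([X if q == pos else I2 for q in range(3)]) for pos in range(3)]
phase_flip = kron_all([Z, I2, I2])

violation = kl_violation(P_c, [identity] + bit_flips)   # correctable error set
z_violation = kl_violation(P_c, [identity, phase_flip])  # not correctable
```

Here `violation` vanishes because single bit flips map the code space to mutually orthogonal subspaces, whereas `z_violation` is nonzero because a phase flip acts differently on the two basis states.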
The quantum error detection conditions have a similar form: a quantum code with code space projector $P_c$ can detect the error set $\mathcal{E}$ if and only if
$$P_c E_\mu P_c = \lambda_\mu P_c$$
holds for all $E_\mu \in \mathcal{E}$.
In experiments, most quantum errors are uncorrelated single-qudit errors. A natural measure of the capability of a QECC is the number of single-qudit errors that it can detect. This motivated the concept of "code distance": the distance of a QECC is the largest possible integer $d$ such that the code can detect any error non-trivially acting on at most $d - 1$ qudits. Researchers usually denote the code parameters of a $q$-ary QECC with code length $n$, code dimension $K$, and code distance $d$ as $((n, K, d))_q$.
Comparing the Knill-Laflamme conditions and quantum error detection conditions, we know that a distance-$d$ QECC can correct any error set $\mathcal{E} = \{E_\alpha\}$ with each $E_\alpha$ non-trivially acting on at most $\lfloor (d - 1)/2 \rfloor$ qudits.
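For convenience, 2-ary quantum codes are usually constructed and analyzed in the Pauli framework. Consider an $n$-fold Pauli tensor product $O_\alpha = \sigma_1 \otimes \sigma_2 \otimes \cdots \otimes \sigma_n$ with each $\sigma_i \in \{I, X, Y, Z\}$. Denote the number of $X$ factors, $Y$ factors and $Z$ factors in $O_\alpha$ as $\mathrm{wt}_X(O_\alpha)$, $\mathrm{wt}_Y(O_\alpha)$ and $\mathrm{wt}_Z(O_\alpha)$, so that the weight of $O_\alpha$ is $\mathrm{wt}(O_\alpha) = \mathrm{wt}_X(O_\alpha) + \mathrm{wt}_Y(O_\alpha) + \mathrm{wt}_Z(O_\alpha)$. An equivalent definition of the code distance of a QECC with projector $P_c$ is the largest possible integer $d$ such that $P_c O_\alpha P_c = \lambda_\alpha P_c$ holds for all Pauli tensor products $O_\alpha$ with $\mathrm{wt}(O_\alpha) < d$.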
In practical scenarios, Pauli-Z errors are usually more prevalent than Pauli-X and Pauli-Y [41]. Accordingly, we use a parameter $c_Z$ to characterize this noise bias and define the following $c_Z$-effective weight and $c_Z$-effective distance.
Definition 1 ($c_Z$-effective weight and distance) The $c_Z$-effective weight of a Pauli tensor product $O_\alpha$ is
$$\mathrm{wt}_e(O_\alpha, c_Z) = \mathrm{wt}_X(O_\alpha) + \mathrm{wt}_Y(O_\alpha) + c_Z\, \mathrm{wt}_Z(O_\alpha),$$
where $c_Z > 0$. The $c_Z$-effective distance of a quantum code with projector $P_c$ is the largest possible integer $d_e(c_Z)$ such that $P_c O_\alpha P_c = \lambda_\alpha P_c$ holds for all Pauli tensor products $O_\alpha$ with $\mathrm{wt}_e(O_\alpha, c_Z) < d_e(c_Z)$. This definition is a generalization of the concept of "effective distance" introduced in Ref. [42]. An asymmetric code with code parameters $((n, K, d_e(c_Z)))_2$ can correct an arbitrary Pauli error with $c_Z$-effective weight smaller than $d_e(c_Z)/2$, and detect an arbitrary Pauli error with $c_Z$-effective weight smaller than $d_e(c_Z)$. When Pauli-Z errors occur more frequently than Pauli-X/Y errors, $0 < c_Z < 1$; when the relaxation times ($T_1$) are much smaller than the dephasing times ($T_2$), Pauli-X/Y errors occur more frequently and $c_Z > 1$.
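To make the $c_Z$-effective weight concrete, here is a small sketch that computes it for Pauli strings and enumerates the errors a code of given effective distance must detect. It assumes the convention that each X or Y factor counts 1 and each Z factor counts $c_Z$, which is consistent with the remarks above on $0 < c_Z < 1$ and $c_Z > 1$; the function names `effective_weight` and `error_set` are hypothetical.

```python
from itertools import product

def effective_weight(pauli, c_z):
    """c_Z-effective weight of a Pauli string such as 'XIZ'.
    Assumed convention: X and Y factors count 1, Z factors count c_z."""
    n_xy = sum(ch in "XY" for ch in pauli)
    n_z = pauli.count("Z")
    return n_xy + c_z * n_z

def error_set(n, d_eff, c_z):
    """All non-identity n-qubit Pauli strings with effective weight < d_eff."""
    strings = ("".join(p) for p in product("IXYZ", repeat=n))
    return [s for s in strings if 0 < effective_weight(s, c_z) < d_eff]

# With c_Z = 1 the effective weight reduces to the ordinary Pauli weight.
symmetric_errors = error_set(3, 2, 1)   # all single-qubit Paulis on 3 qubits
# With c_Z = 2 a single Z error weighs as much as two X/Y errors.
biased_errors = error_set(2, 3, 2)
```

For example, with $c_Z = 2$ the string `XZ` has effective weight 3 and is therefore excluded from the set of errors a code with effective distance 3 must detect, while `XY` (weight 2) is included.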
Quantum codes with relatively small distances can be concatenated to construct a code with large code length and distance, as illustrated in Fig. 1. Suppose we have an outermost code with parameters $((n_1, K, d_1))_q$, other outer codes with parameters $((n_2, q, d_2))_q$, $((n_3, q, d_3))_q$, ..., $((n_{l-1}, q, d_{l-1}))_q$, and an inner code with parameters $((n_l, q, d_l))_q$. We can construct a large code through several levels of concatenation: the logical data is first encoded using the outermost code, each physical qudit therein is further encoded using the $((n_2, q, d_2))_q$ code, and so forth. The hierarchically concatenated quantum code has parameters $((n_1 n_2 \cdots n_l, K, d))_q$ with $d \geq d_1 d_2 \cdots d_l$. Likewise, we can concatenate asymmetric codes. A distance lower bound is given as follows.
Figure 1: Schematic illustration of quantum code concatenation. After finding quantum codes with encoders $U_1, U_2, U_3, \ldots$, we hierarchically concatenate these encoders to obtain a large-distance quantum code.
Theorem 1 Consider asymmetric outer codes with parameters $((n_1, K, d_e(c_Z) = \delta_1))_2$, $((n_2, 2, d_e(c_Z) = \delta_2))_2$, $((n_3, 2, d_e(c_Z) = \delta_3))_2$, ..., $((n_{l-1}, 2, \delta_{l-1}))_2$, and an inner code with parameters $((n_l, 2, \delta_l))_2$. Concatenating these codes yields a new code with parameters $((n_1 n_2 \cdots n_l, K, d_e(c_Z)))_2$, where
$$d_e(c_Z) \geq \min\{1, c_Z\} \prod_{j=1}^{l} \frac{\delta_j}{\max\{1, c_Z\}}.$$
Proof Assume the concatenated code cannot detect a Pauli tensor product $O_\alpha$. For the outer code, errors occur on at least $\delta_1 / \max\{1, c_Z\}$ qubits. Each of these qubits is connected to a block of the first inner code $((n_2, 2, d_e(c_Z) = \delta_2))_2$, and for every such block, errors occur on at least $\delta_2 / \max\{1, c_Z\}$ qubits. From similar arguments, errors occur on at least $\delta_j / \max\{1, c_Z\}$ qubits in the $j$-th block. The weight of $O_\alpha$ is therefore bounded by $\mathrm{wt}(O_\alpha) \geq \prod_{j=1}^{l} \delta_j / \max\{1, c_Z\}$. Hence, the $c_Z$-effective weight of $O_\alpha$ is at least $\min\{1, c_Z\} \prod_{j=1}^{l} \delta_j / \max\{1, c_Z\}$. Since the concatenated code can detect any Pauli tensor product with $c_Z$-effective weight smaller than this value, the stated bound follows.

Theoretical Basis
Many methods for constructing QECCs rely on the stabilizer formalism, whereas few operate outside the Pauli framework. This work aims to search for quantum codes based on the most fundamental principles, i.e., the Knill-Laflamme conditions and the quantum error detection conditions. A crucial tool in our scheme is the variational quantum circuit, which consists of multiple layers of parameterized quantum gates. The primary ingredient of a variational algorithm is the cost function(s). We define the cost functions of VarQEC as follows.
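Definition 2 (Cost functions) Consider an error set $\mathcal{E} = \{E_\mu\}$ and a length-$n$ quantum code with parameterized orthogonal basis states $\{|\psi_1(\theta)\rangle, \ldots, |\psi_K(\theta)\rangle\}$. We define the $\ell_1$-norm cost function $C^{\ell_1}_{n,K,\mathcal{E}}$ [Eq. (15)] and the $\ell_2$-norm cost function $C^{\ell_2}_{n,K,\mathcal{E}}$ [Eq. (16)]. Clearly, $C^{\ell_1}_{n,K,\mathcal{E}}$ and $C^{\ell_2}_{n,K,\mathcal{E}}$ are always nonnegative and have the same zero-points. When $C^{\ell_1}_{n,K,\mathcal{E}} \leq 1$, $C^{\ell_2}_{n,K,\mathcal{E}} \leq (C^{\ell_1}_{n,K,\mathcal{E}})^2$. When $C^{\ell_1}_{n,K,\mathcal{E}} = 0$, the quantum code can perfectly detect the error set $\mathcal{E}$.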
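As an illustration of how cost functions built on the error detection conditions behave, the sketch below evaluates one plausible form on a known code. The exact expressions and normalization of Eqs. (15) and (16) are not reproduced here; this sketch assumes the cost sums, over all errors, the magnitudes (or squared magnitudes) of the off-diagonal terms $\langle\psi_i|E_\mu|\psi_j\rangle$ and of the deviations of the diagonal terms from their mean. The $((4,4,2))_2$ code used as a test case and the name `kl_costs` are choices of this sketch.

```python
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_matrix(label):
    """Matrix of a Pauli string such as 'IXIZ'."""
    out = np.eye(1, dtype=complex)
    for ch in label:
        out = np.kron(out, PAULI[ch])
    return out

def ket(bits):
    vec = np.zeros(2 ** len(bits), dtype=complex)
    vec[int(bits, 2)] = 1.0
    return vec

def kl_costs(codewords, errors):
    """Assumed l1- and l2-type costs; the paper's normalization may differ."""
    K = len(codewords)
    c1 = c2 = 0.0
    for E in errors:
        G = np.array([[np.vdot(codewords[i], E @ codewords[j])
                       for j in range(K)] for i in range(K)])
        diag = np.diag(G)
        dev = np.abs(diag - diag.mean())   # diagonal terms vs. their mean
        off = np.abs(G - np.diag(diag))    # off-diagonal terms
        c1 += off.sum() + dev.sum()
        c2 += (off ** 2).sum() + (dev ** 2).sum()
    return c1, c2

n = 4
errors = [pauli_matrix("I" * q + p + "I" * (n - q - 1))
          for q in range(n) for p in "XYZ"]

# Basis of the four-qubit distance-2 code (detects any single-qubit Pauli).
pairs = [("0000", "1111"), ("0011", "1100"), ("0101", "1010"), ("0110", "1001")]
code_422 = [(ket(a) + ket(b)) / np.sqrt(2) for a, b in pairs]
c1, c2 = kl_costs(code_422, errors)

# A subspace that fails to detect Z errors gives a strictly positive cost.
bad_c1, bad_c2 = kl_costs([ket("0000"), ket("1111")], errors)
```

Both costs vanish for the distance-2 code, while the span of $|0000\rangle$ and $|1111\rangle$ is penalized because single-qubit Z errors distinguish its two basis states.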
To find symmetric codes with code parameters $((n, K, d))_2$, we use the Pauli error model and choose $\mathcal{E} = \{O_\alpha : \mathrm{wt}(O_\alpha) < d\}$, where the $O_\alpha$ are Pauli tensor products. Likewise, when searching for asymmetric codes with code parameters $((n, K, d_e(c_Z)))_2$, we choose $\mathcal{E}$ to be the set of Pauli tensor products with $c_Z$-effective weight smaller than $d_e(c_Z)$. To find channel-adaptive codes for a general noise channel $\mathcal{N}(\rho) = \sum_\alpha E_\alpha \rho E_\alpha^\dagger$, we choose $\mathcal{E} = \{E_\alpha^\dagger E_\beta : E_\alpha, E_\beta \text{ are Kraus operators of } \mathcal{N}\}$. Note that the error set $\mathcal{E}$ in principle can include non-unitary and non-Hermitian errors. For such errors, we can either twirl them to Pauli errors or simulate them directly by adding ancilla qubits and performing positive-operator valued measures (POVMs).
In practice, due to the inexact realization of an encoding isometry, quantum error correction/detection conditions are not exactly satisfied, and QECCs cannot protect the information from errors perfectly. Nevertheless, QECCs can still detect and correct most errors. Such approximate quantum error correction schemes hold great promise [43,44]. A parameter $\varepsilon$ characterizes the inaccuracy of an approximate code. If a QECC is $\varepsilon$-correctable for a noise channel $\mathcal{N}$, its worst-case entanglement fidelity is greater than $1 - \varepsilon$ with appropriate recovery [45]. Bény et al. proposed an approximate version of the Knill-Laflamme conditions for such approximate codes.
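Proposition 2 Consider an $n$-qubit noise channel $\mathcal{N}(\rho) = \sum_\alpha E_\alpha \rho E_\alpha^\dagger$ and a quantum error-correcting code $\mathcal{C}$ with orthonormal basis states $\{|\psi_j\rangle\}_{j=1}^{K}$. We choose the error product set $\mathcal{E} = \{E_\alpha^\dagger E_\beta \mid E_\alpha, E_\beta \text{ are Kraus operators of } \mathcal{N}\}$. Denote the cost function Eq. (15) of the basis states as $C^{\ell_1}_{n,K,\mathcal{E}}$. Then the code $\mathcal{C}$ is $\varepsilon$-correctable under $\mathcal{N}$, with $\varepsilon$ bounded from above in terms of $K$ and $C^{\ell_1}_{n,K,\mathcal{E}}$.
Proof Let $\lambda_{\alpha\beta} = \sum_j \langle\psi_j|E_\alpha^\dagger E_\beta|\psi_j\rangle / K$. To satisfy Eq. (20), we make the assignment of Eq. (25). According to Lemma 1, the inaccuracy $\varepsilon$ of code $\mathcal{C}$ is then upper bounded in terms of $K$ and $C^{\ell_1}_{n,K,\mathcal{E}}$.
In short, given a noise channel $\mathcal{N}$, as long as we minimize the channel-adaptive cost function to a sufficiently small value, we rigorously find an approximate channel-adaptive code with small inaccuracy. Similar bounds for symmetric or asymmetric quantum codes are given as follows.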
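Proposition 3 Consider an $n$-qubit noise channel $\mathcal{N}(\rho) = \sum_\alpha E_\alpha \rho E_\alpha^\dagger$ where each $E_\alpha$ non-trivially acts on no more than $\lfloor (d - 1)/2 \rfloor$ qubits, and a quantum error-correcting code $\mathcal{C}$ with orthonormal basis states $\{|\psi_j\rangle\}_{j=1}^{K}$. We choose the error set $\mathcal{E} = \{O_\alpha : \mathrm{wt}(O_\alpha) < d\}$, where the $O_\alpha$ are Pauli tensor products. Denote the cost function Eq. (15) of the basis states as $C^{\ell_1}_{n,K,\mathcal{E}}$ and the number of Kraus operators of $\mathcal{N}$ as $m$. Then the code $\mathcal{C}$ is $\varepsilon$-correctable under $\mathcal{N}$, with $\varepsilon$ bounded from above in terms of $C^{\ell_1}_{n,K,\mathcal{E}}$ and $m$.
Proof The proof is given in Appendix A.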
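Proposition 4 Consider an $n$-qubit noise channel $\mathcal{N}(\rho) = \sum_\alpha E_\alpha \rho E_\alpha^\dagger$ with each $E_\alpha$ proportional to a Pauli error with $c_Z$-effective weight smaller than $d_e(c_Z)/2$, and a quantum error-correcting code $\mathcal{C}$ with orthonormal basis states $\{|\psi_j\rangle\}_{j=1}^{K}$. We choose the error set $\mathcal{E}$ to consist of the Pauli tensor products $O_\alpha$ with $c_Z$-effective weight smaller than $d_e(c_Z)$. Denote the cost function Eq. (15) of the basis states as $C^{\ell_1}_{n,K,\mathcal{E}}$ and the number of Kraus operators of $\mathcal{N}$ as $m$. Then the code $\mathcal{C}$ is $\varepsilon$-correctable under $\mathcal{N}$, with $\varepsilon$ bounded from above in terms of $C^{\ell_1}_{n,K,\mathcal{E}}$ and $m$.
Proof The proof is given in Appendix B.
Note that Propositions 3 and 4 give rather loose bounds. The true code inaccuracy, which depends on the particular noise channel, is usually significantly smaller.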

Algorithm
Variational quantum circuits (VQCs) have been widely used in near-term quantum algorithms for various tasks [48,49], such as ground state preparation [50,51], eigenenergy estimation [52,53], quantum data compression [54,55], and quantum circuit compiling [56,57]. Given a pure product state as input, one iteratively updates the circuit parameters based on measurement results, and finally outputs the desired state. In VarQEC, the output states serve as the basis states of a quantum code, and its encoder is given by the quantum circuit. The structure of our algorithm is illustrated in Fig. 2.
Suppose we have a NISQ device with a hardware connectivity graph $G$. The vertices denote qubits, and the edges denote adjacent qubit pairs. One can apply single-qubit rotations to each qubit and two-qubit gates to adjacent qubits. We aim to find a $K$-dimensional QECC that can detect an error set $\mathcal{E} = \{E_\mu\}$, and the encoding circuit should be as shallow as possible.
Before running the algorithm, we design a multilayered VQC which is hardware-efficient for the connectivity graph. Denote the number of VQC layers as $L$, the maximum acceptable number of layers as $L_{\max}$, and the evolution of the VQC as $U(\theta)$, where $\theta$ are the circuit parameters. We start from $L = 1$ and sample the initial $\theta$ randomly. We also carefully select $k = \lceil \log_2 K \rceil$ physical qubits to prepare the logical data. These $k$ qubits should be scattered rather than concentrated, since we want the remaining qubits to be connected to them through very few edges. First, we initialize the selected qubits to one of the $K$ binary strings $|0\rangle, |1\rangle, \ldots, |K - 1\rangle$, and initialize the remaining qubits to $|0\rangle^{\otimes(n-k)}$. These product states span the input code space $\mathrm{span}\{|j\rangle|0\rangle^{\otimes(n-k)} : j = 0, \ldots, K - 1\}$.
The cost functions can be estimated by running specific circuits and doing measurements. To estimate $\langle\psi_j|E_\mu|\psi_j\rangle$, we prepare the initial state $|j - 1\rangle|0\rangle^{\otimes(n-k)}$, evolve the system with the VQC $U(\theta)$, then measure the local observable $E_\mu$. If errors $E_{\mu_1}, E_{\mu_2}, \ldots$ commute, they can be measured simultaneously in a single shot. To estimate $\langle\psi_i|E_\mu|\psi_j\rangle$, we start from $|j - 1\rangle|0\rangle^{\otimes(n-k)}$, then sequentially evolve the system with $U(\theta)$, $E_\mu$, and $U^\dagger(\theta)$, then measure the final state in the computational basis. The measurements are assisted by post-selection: we first measure the $n - k$ auxiliary qubits, and if the result is $|0\rangle^{\otimes(n-k)}$, we measure the remaining $k$ qubits. Denote by $p$ the joint probability of obtaining $|0\rangle^{\otimes(n-k)}$ on the auxiliary qubits and the binary string $|i - 1\rangle$ on the remaining qubits; then $|\langle\psi_i|E_\mu|\psi_j\rangle|^2 = p$. Theoretically, this step with $i = j$ will also yield the magnitude of $\langle\psi_j|E_\mu|\psi_j\rangle$. However, since VarQEC is a NISQ algorithm, we prefer to use a shallower circuit to estimate cost function terms whenever possible.
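The off-diagonal estimation procedure described above can be checked with a small state-vector simulation. This is a sketch under stated assumptions: a Haar-like random unitary obtained from a QR decomposition stands in for the VQC $U(\theta)$, the error is a single Pauli X (so that it is unitary), indices are 0-based rather than the 1-based labels of the text, and the selected qubits are taken to be the most significant ones. The helper `input_state` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 1                      # 3 physical qubits, 1 selected input qubit
dim = 2 ** n

# Stand-in for the VQC: a random unitary from the QR decomposition.
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(A)

def input_state(j):
    """|j>|0...0> with the selected qubits as the most significant bits."""
    vec = np.zeros(dim, dtype=complex)
    vec[j * 2 ** (n - k)] = 1.0
    return vec

psi = [U @ input_state(j) for j in range(2 ** k)]   # code basis states

X = np.array([[0, 1], [1, 0]], dtype=complex)
E = np.kron(X, np.eye(2 ** (n - 1)))                # Pauli X on the first qubit

# Circuit: prepare |0>|00>, apply U, E, U^dagger, measure computational basis.
final = U.conj().T @ E @ U @ input_state(0)
probs = np.abs(final) ** 2

# Joint probability of reading |1> on the selected qubit and |00> on the rest.
p_10 = probs[1 * 2 ** (n - k)]
overlap = np.vdot(psi[1], E @ psi[0])               # <psi_1|E|psi_0>
```

The measured probability `p_10` equals `abs(overlap) ** 2`, which is why the post-selected computational-basis measurement gives access to the magnitude of the off-diagonal term.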
In the above description, we assumed that the error set $\mathcal{E}$ consists only of Pauli errors. If $\mathcal{E}$ includes non-unitary or non-Hermitian terms, they can be handled by adding ancilla qubits or by Pauli twirling. See Sec. 5.3.1 for a detailed example.
The optimization of $\theta$ consists of two stages. The first and main stage is mini-batch learning. After sampling the initial $\theta$, we minimize $C^{\ell_2}_{n,K,\mathcal{E}}$ with mini-batch gradient descent. The schematic is shown in Fig. 2. Within each iteration, we sample a subset $\mathcal{E}_S \subset \mathcal{E}$, estimate the corresponding partial $\ell_2$-norm cost function via measurements, then perform a single gradient descent step with a learning rate $\eta$:
$$\theta \leftarrow \theta - \eta \nabla_\theta C^{\ell_2}_{n,K,\mathcal{E}_S}(\theta).$$
The required number of measurements to estimate $C^{\ell_2}_{n,K,\mathcal{E}_S}(\theta)$ up to additive error $\epsilon$ is of order $O(K^2 |\mathcal{E}_S|^2 / \epsilon^2)$. The gradient can be estimated through finite-differencing or by combining the chain rule and the parameter shift rule [58]. Mini-batch gradient descent allows for a more robust convergence and helps avoid being trapped in a local minimum. We repeat sampling and gradient descent until convergence. We minimize $C^{\ell_2}_{n,K,\mathcal{E}}$ first because it converges much faster than $C^{\ell_1}_{n,K,\mathcal{E}}$. In addition, $C^{\ell_2}_{n,K,\mathcal{E}}$ is differentiable but $C^{\ell_1}_{n,K,\mathcal{E}}$ is not. If the error set $\mathcal{E}$ consists of too many terms, a promising alternative method is to construct "classical shadows" [59] for each basis state $|\psi_j\rangle$, then use the shadows to estimate the cost functions classically. The shadow tomography technique can help us implement large-batch optimization with a smaller measurement overhead.
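The combination of the chain rule, the parameter shift rule, and mini-batch sampling can be illustrated on a toy problem. The sketch below is not the VarQEC cost: it assumes a single qubit prepared as $R_z(b) R_x(a)|0\rangle$ and a stand-in cost $\sum_O \langle O\rangle^2$ over the observables X and Z, with one observable sampled per iteration. All names, the learning rate, and the iteration count are assumptions of this sketch.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(P, theta):
    """Rotation exp(-i * theta * P / 2) for a Pauli matrix P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * P

def expval(O, theta):
    """<psi(theta)|O|psi(theta)> for |psi> = R_z(b) R_x(a) |0>."""
    a, b = theta
    psi = rot(Z, b) @ rot(X, a) @ np.array([1, 0], dtype=complex)
    return np.vdot(psi, O @ psi).real

def batch_cost(theta, batch):
    return sum(expval(O, theta) ** 2 for O in batch)

def batch_grad(theta, batch):
    """Chain rule plus parameter shift: d<O>/dt = (<O>(t+pi/2) - <O>(t-pi/2)) / 2."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[i] = np.pi / 2
        for O in batch:
            d_exp = 0.5 * (expval(O, theta + shift) - expval(O, theta - shift))
            grad[i] += 2 * expval(O, theta) * d_exp
    return grad

observables = [X, Z]
rng = np.random.default_rng(1)
theta = np.array([0.3, 0.4])
eta = 0.2                                  # learning rate (assumed value)
for _ in range(300):
    idx = rng.integers(len(observables))   # sample a mini-batch of size one
    theta = theta - eta * batch_grad(theta, [observables[idx]])

final_cost = batch_cost(theta, observables)
```

The parameter shift rule is exact here because every gate has the form $\exp(-i\theta P/2)$ with $P^2 = I$; on hardware each shifted expectation value would be estimated from measurement shots rather than computed from the state vector.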
After adequate mini-batch learning, if $C^{\ell_2}_{n,K,\mathcal{E}}$ is relatively small (e.g., $C^{\ell_2}_{n,K,\mathcal{E}} < 0.01$), we estimate $C^{\ell_1}_{n,K,\mathcal{E}}$ and fine-tune the parameters $\theta$ with respect to it, since $C^{\ell_1}_{n,K,\mathcal{E}}$ is directly related to the inaccuracy of the code (see Propositions 2, 3, 4). In this work, we use Powell's method [60], a gradient-free optimizer, for fine-tuning. If $C^{\ell_1}_{n,K,\mathcal{E}}$ is smaller than an acceptable cost tolerance $C^{\ell_1}_{\mathrm{tol}}$, we stop the optimization and output the final parameters $\theta_{\mathrm{opt}}$. Throughout this paper, we use a fixed tolerance $C^{\ell_1}_{\mathrm{tol}}$. In the ideal case, we obtain the optimal parameters $\theta_{\mathrm{opt}} = \arg\min_\theta C^{\ell_1}_{n,K,\mathcal{E}}(\theta)$. The output QECC
$$\mathrm{span}\{U(\theta_{\mathrm{opt}})|j\rangle|0\rangle^{\otimes(n-k)} : j = 0, \ldots, K - 1\} \quad (36)$$
is the target approximate quantum code with small inaccuracy. The variational quantum circuit $U(\theta_{\mathrm{opt}})$ serves as the encoding circuit. Further, we can remove redundant gates from the VQC.
If $C^{\ell_1}_{n,K,\mathcal{E}}$ is greater than $C^{\ell_1}_{\mathrm{tol}}$, we increase the circuit depth $L$ and repeat the optimization steps. If $C^{\ell_1}_{n,K,\mathcal{E}}$ is always greater than the tolerance even when $L = L_{\max}$, we fail to find an eligible code. The detailed procedure is illustrated in Algorithm 1.
A natural question arises: can a fixed-depth VQC find any $((n, K))_2$ quantum code? Haug et al. used the quantum Fisher information matrix to assess the expressive power of a VQC with a fixed input state $|0\rangle^{\otimes n}$ [61]. We generalize this notion to multiple inputs to assess the expressive power of a VQC in VarQEC. If a VQC is capable of finding any $((n, K))_2$ quantum code, we say it is overparameterized with respect to code parameters $((n, K))_2$. See Appendix C for more details.
When the VQC $U(\theta)$ is underparameterized for $((n, K))_2$, the set of reachable output codes forms a low-dimensional submanifold of the complex Grassmannian $\mathrm{Gr}(K, 2^n)$. The VarQEC algorithm searches this submanifold for an eligible code. When $U(\theta)$ is overparameterized for $((n, K))_2$, it can explore all relevant directions and the set of reachable output codes is equivalent to $\mathrm{Gr}(K, 2^n)$, i.e., VarQEC is capable of finding an arbitrary $((n, K))_2$ quantum code. The required number of periodic bounded real parameters to overparameterize a VQC is at least $2K(2^n - K)$, since the complex dimension of $\mathrm{Gr}(K, 2^n)$ is $K(2^n - K)$.
Algorithm 1: VarQEC
Input: Error set $\mathcal{E}$, hardware-efficient VQC $U(\theta)$ with $L$ layers, acceptable number of layers $L_{\max}$, acceptable cost tolerance $C^{\ell_1}_{\mathrm{tol}}$.
Output: An approximate quantum code with a hardware-efficient encoder that detects $\mathcal{E}$.
In Ref. [62], Johnson et al. proposed a related algorithm named QVECTOR, which samples a random 2-design unitary and optimizes parameterized encoding and decoding circuits simultaneously to improve the quantum average fidelity. Compared with QVECTOR, VarQEC can find not only channel-adaptive codes but also quantum codes with specific code parameters. VarQEC does not need a deep random circuit, which is a daunting challenge on NISQ devices, to sample a set of input states. We train the encoder without considering the decoder, which makes the optimization less likely to be trapped in a local minimum. The cost functions are estimated by measuring some local observables. We can rigorously obtain an $\varepsilon$-correctable approximate QECC with arbitrarily small $\varepsilon$. The noise models in our methods are also flexible and can be artificially assigned.
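The parameter-count bound for overparameterization, $2K(2^n - K)$ real parameters, translates into a rough lower bound on the number of layers once the parameters per layer are known. The sketch below is only a counting argument: it assumes one real parameter per gate, $2n$ single-qubit rotation parameters per layer, one two-qubit gate per edge of a complete bipartite graph between $k$ selected and $n - k$ unselected qubits, and a final set of $2n$ rotations. These ansatz details and the function names are assumptions of the sketch; the source determines overparameterization through the quantum Fisher information (Appendix C), not by counting.

```python
import math

def min_real_parameters(n, K):
    """Lower bound 2K(2^n - K): twice the complex dimension of Gr(K, 2^n)."""
    return 2 * K * (2 ** n - K)

def min_layers_by_counting(n, K):
    """Smallest L with L * per_layer + final >= bound (counting only).
    Assumed ansatz: 2n rotation parameters plus k(n - k) two-qubit
    parameters per layer, and 2n final rotation parameters."""
    k = math.ceil(math.log2(K))
    per_layer = 2 * n + k * (n - k)
    final = 2 * n
    return max(0, math.ceil((min_real_parameters(n, K) - final) / per_layer))

bound_7_3 = min_real_parameters(7, 3)      # 750
layers_7_3 = min_layers_by_counting(7, 3)  # 31
```

Under these assumptions the count for $((7, 3))_2$ gives 31 layers, which coincides with the depth $L = 31$ quoted for the overparameterized $((7, 3))_2$ search; the count is a necessary condition only and does not by itself guarantee that every code is reachable.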

Symmetric codes
We verify the validity of our algorithm by rediscovering some symmetric codes with well-known code parameters. The cost functions are defined in Eqs. (15), (16). For code parameters $((n, K, d))_2$, the total number of non-trivial Pauli errors $O_\alpha$ to consider is $\sum_{j=1}^{d-1} 3^j \binom{n}{j}$. Without loss of generality, we use the complete bipartite connectivity graph: denote the qubits selected for the input as $\{Q_0, Q_1, \ldots, Q_{k-1}\}$ and the unselected qubits as $\{Q_k, Q_{k+1}, \ldots, Q_{n-1}\}$; the graph consists of $k(n - k)$ edges that connect every selected qubit with every unselected qubit, while qubits in the same set are not directly connected. For such graphs, the initial logical data can spread to each qubit rapidly since the graph diameter is only 2. The variational quantum circuit has alternating layers of single-qubit rotations $R_x$-$R_z$ acting on all qubits and Ising-type interactions $R_{zz}$ acting on adjacent qubits. Denote the number of layers as $L$. The VQC evolution is of the form
$$U(\theta) = U_E(\theta_E)\, U_L(\theta_L) \cdots U_2(\theta_2)\, U_1(\theta_1),$$
where $\theta_l$ and $\theta_E$ are elements in $\theta$, $U_l(\theta_l)$ denotes the $l$-th layer evolution, and $U_E(\theta_E)$ denotes the final $R_x$-$R_z$ rotations (rightmost in the circuit diagram), which are used to search the manifold of locally equivalent quantum codes. Since $R_z$ and $R_{zz}$ gates in the last layer commute, and $R_z$-$R_x$-$R_z$ rotations can realize an arbitrary single-qubit unitary, locally equivalent QECCs can be found by the same VQC. In principle, any $n$-qubit unitary evolution can be realized by this ansatz with a sufficiently large number of layers since $\{R_x, R_z, R_{zz}\}$ is a universal quantum gate set. The connectivity graph and the periodic-structured VQC ansatz for $n = 5$, $k = 2$ are shown in Fig. 3(a,b). In general, with the increase of $L$, the achievable quantum codes form a higher dimensional submanifold of $\mathrm{Gr}(K, 2^n)$, as shown in Fig. 3(c). When the VQC is overparameterized ($L$ is no less than a critical number $L_{\mathrm{crit}}$) for code parameters $((n, K))_2$, VarQEC can explore the whole $\mathrm{Gr}(K, 2^n)$ manifold.
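The layered ansatz described above can be assembled as an explicit matrix for small systems. The following is a sketch, assuming the gate conventions $R_P(\theta) = \exp(-i\theta P/2)$, the order $R_x$ then $R_z$ on each qubit followed by $R_{zz}$ on every edge within a layer, and a particular ordering of the parameter vector; none of these conventions is fixed by the text, and the function names are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(n, ops):
    """n-qubit operator acting with ops[q] on qubit q and identity elsewhere."""
    out = np.eye(1, dtype=complex)
    for q in range(n):
        out = np.kron(out, ops.get(q, I2))
    return out

def rot(P, theta):
    """exp(-i * theta * P / 2) for any operator P with P @ P = identity."""
    return np.cos(theta / 2) * np.eye(P.shape[0]) - 1j * np.sin(theta / 2) * P

def local_rotations(n, ax, az):
    """R_x followed by R_z on every qubit."""
    U = np.eye(2 ** n, dtype=complex)
    for q in range(n):
        U = rot(op_on(n, {q: Z}), az[q]) @ rot(op_on(n, {q: X}), ax[q]) @ U
    return U

def vqc_unitary(n, k, L, theta):
    """L layers on the complete bipartite graph, then final R_x-R_z rotations."""
    edges = [(a, b) for a in range(k) for b in range(k, n)]
    per_layer = 2 * n + len(edges)
    if theta.size != L * per_layer + 2 * n:
        raise ValueError("unexpected number of parameters")
    U = np.eye(2 ** n, dtype=complex)
    pos = 0
    for _ in range(L):
        U = local_rotations(n, theta[pos:pos + n], theta[pos + n:pos + 2 * n]) @ U
        pos += 2 * n
        for a, b in edges:
            U = rot(op_on(n, {a: Z, b: Z}), theta[pos]) @ U
            pos += 1
    return local_rotations(n, theta[pos:pos + n], theta[pos + n:pos + 2 * n]) @ U

n, k, L, K = 5, 2, 2, 4
n_params = L * (2 * n + k * (n - k)) + 2 * n
theta = np.random.default_rng(0).uniform(0, 2 * np.pi, size=n_params)
U = vqc_unitary(n, k, L, theta)

# Basis states of the candidate code: U applied to |j>|000>, j = 0..K-1.
codewords = [U[:, j * 2 ** (n - k)] for j in range(K)]
```

Because the encoder is unitary, the candidate basis states are automatically orthonormal; only the error detection conditions need to be enforced by the cost function.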
An alternative variational circuit for finding additive quantum codes is discussed in Appendix D.
In the following, we verify the local equivalence (LE) between two quantum codes with $K$-dimensional projectors $P_c$ and $P_c'$ by sampling permutations of qubits $\Pi_q$ and numerically minimizing the LE-cost function $C_{\mathrm{LE}}$ over a product of single-qubit unitaries with $3n$ parameters $\phi$. If there exist $\Pi_q$ and $\phi$ such that $C_{\mathrm{LE}} < 1 \times 10^{-10}$, we say $P_c$ and $P_c'$ are (locally) equivalent.
The minimum code length that protects a logical qubit against arbitrary one-qubit errors is $n = 5$. The $((5, 2, 3))_2$ code we rediscover is equivalent to the perfect code devised in Ref. [63]. This code is unique and translationally invariant. The $((5, 6, 2))_2$ code we rediscover is equivalent to the original non-additive CWS code devised in Ref. [64]. For parameters $((6, 2, 3))_2$, we sample different initial VQC parameters $\theta$ and find a large number of non-additive codes that are not mutually equivalent. This is consistent with our observation that an infinite family of non-equivalent $((6, 2, 3))_2$ codes exists. For parameters $((7, 2, 3))_2$, we find non-equivalent quantum codes, some of which are not equivalent to CWS codes. See Appendix E for more discussions on $((6, 2, 3))_2$ and $((7, 2, 3))_2$. The $((8, 8, 3))_2$ code we rediscover is equivalent to an additive code up to permutation of qubits.
It is an open question whether a $((7, 3, 3))_2$ QECC exists. We have not yet found such a code with VarQEC, even when using an overparameterized VQC ($L = 31$) that is capable of finding any $((7, 3))_2$ quantum code and sampling 20000 optimization starting points. This strongly indicates that a quantum code with parameters $((7, 3, 3))_2$ does not exist.

Asymmetric codes
In quantum experiments, the decoherence time of a physical qubit is mainly influenced by two factors: the relaxation time $T_1$ and the dephasing time $T_2$. Relaxation leads to all Pauli errors, whereas dephasing only leads to phase-flips (Pauli-Z errors). Denote the probabilities of X, Y, and Z errors as $p_x$, $p_y$, $p_z$ respectively. Usually, $p_x = p_y \neq p_z$. The asymmetry between X/Y and Z errors motivates the construction of asymmetric QECCs that handle them differently [65,66]. Asymmetric codes are more resource-efficient since they can detect/correct more Pauli-X/Y errors than Pauli-Z errors, or vice versa, as required. Researchers have extended several constructions from symmetric codes to asymmetric codes [66][67][68][69][70][71][72]. Note that the classification of symmetric and asymmetric codes depends on the error detecting/correcting capability instead of the code construction method.
For a system with X/Y-error probabilities $p_x = p_y$ and Z-error probability $p_z$, we set the bias parameter $c_Z = \log p_z / \log p_x$ such that $p_z = p_x^{c_Z}$. In most scenarios, dephasing dominates and phase-flip errors are more prevalent than X/Y errors. Accordingly, $0 < c_Z < 1$. First, we fix $c_Z = 1/2$ (i.e., $p_z \approx p_x^{1/2}$) and apply VarQEC to find $((n, K, d_e(1/2)))_2$ codes that encode one logical qubit ($K = 2$) or one logical qutrit ($K = 3$). The asymmetric codes we discover can detect more Z errors than X/Y errors; specifically, they detect every Pauli error whose $1/2$-effective weight is smaller than $d_e(1/2)$.
We now consider the opposite situation where X/Y errors are more prevalent than Z. In the extreme case, $T_2 \to +\infty$, the only source of decoherence is qubit relaxation. This process at finite temperature is modeled by the generalized amplitude damping channel. Its Kraus representation has operators
$$A_0 = \sqrt{p}\begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}, \quad A_1 = \sqrt{p}\begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix}, \quad A_2 = \sqrt{1-p}\begin{pmatrix} \sqrt{1-\gamma} & 0 \\ 0 & 1 \end{pmatrix}, \quad A_3 = \sqrt{1-p}\begin{pmatrix} 0 & 0 \\ \sqrt{\gamma} & 0 \end{pmatrix},$$
where $\gamma$ is the damping rate and $p$ is a constant determined by the temperature. $A_0$ and $A_2$ introduce Pauli-Z errors of order $O(\gamma)$, while $A_1$ and $A_3$ introduce Pauli-X and -Y errors of order $O(\sqrt{\gamma})$.
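The orders of the Pauli components quoted for the generalized amplitude damping channel can be verified numerically. The sketch below assumes the standard textbook form of the four Kraus operators (with damping rate `gamma` and temperature-dependent constant `p`), checks the completeness relation, and extracts Pauli coefficients via $\mathrm{Tr}(P A)/2$; the function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def gad_kraus(gamma, p):
    """Standard-form Kraus operators of generalized amplitude damping (assumed)."""
    s = np.sqrt
    A0 = s(p) * np.array([[1, 0], [0, s(1 - gamma)]], dtype=complex)
    A1 = s(p) * np.array([[0, s(gamma)], [0, 0]], dtype=complex)
    A2 = s(1 - p) * np.array([[s(1 - gamma), 0], [0, 1]], dtype=complex)
    A3 = s(1 - p) * np.array([[0, 0], [s(gamma), 0]], dtype=complex)
    return [A0, A1, A2, A3]

def pauli_coefficient(P, A):
    """Coefficient of Pauli P in the expansion of a 2x2 operator A."""
    return np.trace(P @ A) / 2

gamma, p = 1e-4, 0.7
kraus = gad_kraus(gamma, p)

# Completeness: sum_i A_i^dagger A_i = identity (trace preservation).
completeness = sum(A.conj().T @ A for A in kraus)

z_part_A0 = abs(pauli_coefficient(Z, kraus[0]))   # scales like gamma
x_part_A1 = abs(pauli_coefficient(X, kraus[1]))   # scales like sqrt(gamma)
```

For small `gamma`, the Z component of `A0` is approximately $\sqrt{p}\,\gamma/4$ while the X component of `A1` is $\sqrt{p\gamma}/2$, in line with the statement that Z-type terms are of order $\gamma$ and X/Y-type terms of order $\sqrt{\gamma}$.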
When $\gamma$ is small, $c_Z = \log p_z / \log p_x \approx 2$. Now we fix $c_Z = 2$ and apply VarQEC to find asymmetric codes with 2-effective distance 3. The codes we rediscover were introduced in Ref. [42]. They can detect more X/Y errors than Z errors, i.e., every Pauli error with 2-effective weight smaller than 3. Furthermore, we find new codes with 2-effective distance 4 with $K = 2$ or $K = 3$, i.e., $((6, 2, d_e(2) = 4))_2$ and $((8, 3, d_e(2) = 4))_2$. Some $((6, 2, d_e(2) = 4))_2$ codes are equivalent to an additive $((6, 2, 3))_2$ code. They can detect every Pauli error with 2-effective weight smaller than 4. For the generalized amplitude damping channel, $((6, 2, d_e(2) = 4))_2$ and $((8, 3, d_e(2) = 4))_2$ can detect up to three $A_1$/$A_3$ errors or one $A_0$/$A_2$ error, and correct one $A_1$/$A_3$ error. Assisted by post-selection [73], these codes hold the promise of achieving a lower logical error rate than codes with $d_e(2) = 3$.

Channel-adaptive codes
In the previous sections, we have only discussed uncorrelated errors, symmetric or asymmetric. This section considers quantum channels with correlated noise. We apply VarQEC to find the corresponding channel-adaptive codes.
Correlated errors are ubiquitous in quantum computing experiments. When two adjacent qubits are not sufficiently separated, the errors occurring on them can be highly correlated [22]. These spatially correlated errors invalidate many well-known quantum codes and dim the hope of fault-tolerant quantum computing. If we ignore the exact connectivity graph and the noise type, we need at least 11 physical qubits to protect one qubit of information from general correlated errors (i.e., the double error-correcting $((11, 2, 5))_2$ code) [74]. Even so, the encoding isometry may not be hardware-efficient. In the following, we investigate two correlated noise channels in detail and introduce channel-adaptive codes discovered by VarQEC.

Nearest-neighbor collective amplitude damping
The first testbed is the nearest-neighbor collective amplitude damping channel. Suppose we have $n$ qubits in a ring, as shown in Fig. 6. Every two neighboring qubits collectively interact with a single environment and exhibit collective dynamics of amplitude damping [75,76]. The corresponding Kraus operators depend on the damping rates $\gamma_{01}$, $\gamma_{02}$, $\gamma_{12}$. For a short decay time $\tau$, $\gamma_{01}$ and $\gamma_{12}$ are of order $O(\tau)$, and $\gamma_{02}$ is of order $O(\tau^2)$. Each error acts on two neighboring qubits $Q_j$-$Q_{j+1}$. To find quantum codes that approximately correct one nearest-neighbor collective amplitude damping error, we expand the Kraus operators with respect to $\tau$ and discard trivial and high-order terms, which leaves the operators $K_0$, $K_1$, $K_2$ of Eq. (55). Each $K_0$ contributes one factor of $\sqrt{\tau}$, and each $K_1$/$K_2$ contributes one factor of $\tau$. Suppose $E_\alpha$ and $E_\beta$ are products of the identity, $K_0$, $K_1$, and $K_2$. The target error set (in VarQEC) consists of terms with total order less than $\tau^{3/2}$.

Note that here, some error products $E_\alpha^\dagger E_\beta$ are non-unitary and non-Hermitian. To compute the cost functions $C^1_{n,K,\mathcal{E}}$ and $C^2_{n,K,\mathcal{E}}$ (Eqs. (15), (16)), we need to estimate $\langle\psi_j|E_\alpha^\dagger E_\beta|\psi_j\rangle$ and $\langle\psi_i|E_\alpha^\dagger E_\beta|\psi_j\rangle$ for various $i, j, \alpha, \beta$ ($i \neq j$). The diagonal term $\langle\psi_j|E_\alpha^\dagger E_\beta|\psi_j\rangle$ is a complex number that can be obtained as follows: we prepare the state $|\psi_j\rangle$ and measure the two Hermitian observables $(E_\alpha^\dagger E_\beta + E_\beta^\dagger E_\alpha)/2$ and $(E_\alpha^\dagger E_\beta - E_\beta^\dagger E_\alpha)/(2i)$. The first expectation value gives the real part of $\langle\psi_j|E_\alpha^\dagger E_\beta|\psi_j\rangle$ and the second expectation value gives its imaginary part.
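The split of a non-Hermitian expectation value into two Hermitian measurements can be illustrated with a short numerical sketch. Here a random complex matrix `a` stands in for an error product $E_\alpha^\dagger E_\beta$ and a random normalized vector stands in for $|\psi_j\rangle$; both are placeholders, not the operators or code states of the paper.

```python
import numpy as np

def hermitian_parts(a):
    """Split a = h_re + 1j * h_im with h_re and h_im both Hermitian."""
    h_re = (a + a.conj().T) / 2
    h_im = (a - a.conj().T) / 2j
    return h_re, h_im

rng = np.random.default_rng(0)
dim = 4

# Placeholder for a non-Hermitian error product E_alpha^dagger E_beta.
a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))

# Placeholder for a code basis state |psi_j>.
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

h_re, h_im = hermitian_parts(a)

# Expectation values of Hermitian observables are real, so each one is
# something a measurement could estimate.
re_part = np.vdot(psi, h_re @ psi).real
im_part = np.vdot(psi, h_im @ psi).real

direct = np.vdot(psi, a @ psi)  # <psi|a|psi> computed directly
print(direct, re_part + 1j * im_part)
```

The two printed numbers agree because $A = H_{\rm re} + i H_{\rm im}$ holds for any square matrix $A$.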
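We estimate $\langle\psi_i|E_\alpha^\dagger E_\beta|\psi_j\rangle$ with $i \neq j$ using POVMs. Specifically, we prepare the state $|\psi_j\rangle$, add an ancilla qubit, and implement the operation

$$|\psi_j\rangle|0\rangle \;\mapsto\; E_\alpha^\dagger E_\beta|\psi_j\rangle|0\rangle + E_{\rm aux}|\psi_j\rangle|1\rangle,$$

where $E_{\rm aux}$ is an auxiliary Kraus operator such that $(E_\alpha^\dagger E_\beta)^\dagger (E_\alpha^\dagger E_\beta) + E_{\rm aux}^\dagger E_{\rm aux} = I$. Then we measure the ancilla qubit in the computational basis and post-select the cases of $|0\rangle$. Measuring the ancilla qubit in the $|0\rangle$ state indicates that the error $E_\alpha^\dagger E_\beta$ has occurred. The corresponding probability is $p_0 = \langle\psi_j|E_\beta^\dagger E_\alpha E_\alpha^\dagger E_\beta|\psi_j\rangle$ and the corresponding state is $E_\alpha^\dagger E_\beta|\psi_j\rangle/\sqrt{p_0}$. For these post-selected states, we apply the inverse of the VQC and perform projective measurements. The conditional probability of obtaining the binary string $|i-1\rangle|0\rangle^{\otimes(n-k)}$ is $p_{ij} = |\langle\psi_i|E_\alpha^\dagger E_\beta|\psi_j\rangle|^2/p_0$. Therefore, we can estimate $|\langle\psi_i|E_\alpha^\dagger E_\beta|\psi_j\rangle|$ by $\sqrt{p_{ij}\,p_0}$.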
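The post-selection estimate can be simulated classically on a small example. The sketch below assumes the ancilla construction takes the form "apply $A$ on the $|0\rangle$ branch and $E_{\rm aux} = \sqrt{I - A^\dagger A}$ on the $|1\rangle$ branch", with a random contraction `a` standing in for $E_\alpha^\dagger E_\beta$. The inverse-VQC measurement is replaced by a direct overlap with $|\psi_i\rangle$, so this is a consistency check rather than a circuit-level implementation.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0, None)  # remove tiny negative rounding errors
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

rng = np.random.default_rng(1)
dim = 4

# Placeholder for E_alpha^dagger E_beta, rescaled so its spectral norm is below 1.
a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
a = a / (1.1 * np.linalg.norm(a, 2))

# Auxiliary Kraus operator completing the two-outcome measurement.
e_aux = psd_sqrt(np.eye(dim) - a.conj().T @ a)

# Two orthonormal placeholder code states |psi_i>, |psi_j>.
q, _ = np.linalg.qr(rng.normal(size=(dim, 2)) + 1j * rng.normal(size=(dim, 2)))
psi_i, psi_j = q[:, 0], q[:, 1]

branch0 = a @ psi_j      # unnormalized state when the ancilla reads 0
branch1 = e_aux @ psi_j  # unnormalized state when the ancilla reads 1

p0 = np.vdot(branch0, branch0).real      # probability of ancilla outcome 0
p1 = np.vdot(branch1, branch1).real      # probability of ancilla outcome 1
post = branch0 / np.sqrt(p0)             # post-selected state
p_ij = abs(np.vdot(psi_i, post)) ** 2    # conditional probability of |psi_i>

estimate = np.sqrt(p_ij * p0)
exact = abs(np.vdot(psi_i, a @ psi_j))
print(estimate, exact)
```

Only the modulus of the off-diagonal element is recovered this way, since $\sqrt{p_{ij}\,p_0}$ is a non-negative real number.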

Nearest-neighbor collective phase-flips
The second noise channel we consider combines nearest-neighbor collective phase-flips with single-qubit errors. The channel consists of two stages. In the first stage, a local depolarizing error with noise rate $p$ occurs on each qubit; in other words, each of the Pauli errors X, Y, Z occurs on each qubit with probability $p/4$. Different local errors act independently. We denote the corresponding global noise channel as $\mathcal{N}_1$. In the second stage, nearest-neighbor collective phase-flip errors $ZZ$ with noise rate $p_{zz}$ occur on adjacent qubit pairs $Q_i$-$Q_j$. We denote the corresponding global noise channel as $\mathcal{N}_2$. The overall process is $\mathcal{N} = \mathcal{N}_2 \circ \mathcal{N}_1$.

Directly applying VarQEC to $\mathcal{N}$ is not resource-efficient since the Kraus representation of $\mathcal{N}$ consists of $O(\exp(n))$ operators. For practical purposes, we apply our algorithm to the following channel instead,

$$\tilde{\mathcal{N}}(\rho) = \Big(1 - \frac{3np}{4} - |E(G)|\,p_{zz}\Big)\rho + \frac{p}{4}\sum_{i}\big(X_i\rho X_i + Y_i\rho Y_i + Z_i\rho Z_i\big) + p_{zz}\sum_{\langle i,j\rangle} Z_i Z_j\,\rho\,Z_i Z_j .$$

The second term sums over all qubits. The last term sums over all adjacent qubit pairs $\langle i,j\rangle$. Its Kraus operators are $\sqrt{1 - 3np/4 - |E(G)|\,p_{zz}}\;I$, $\sqrt{p/4}\,X_i$, $\sqrt{p/4}\,Y_i$, $\sqrt{p/4}\,Z_i$, and $\sqrt{p_{zz}}\,Z_iZ_j$, where qubit-$i$ and qubit-$j$ are adjacent. $\tilde{\mathcal{N}}$ is the first-order approximation of $\mathcal{N}$ with respect to the error parameters $p$ and $p_{zz}$. The Kraus representation of $\tilde{\mathcal{N}}$ consists of only ${\rm poly}(n)$ operators. $\tilde{\mathcal{N}}$ and $\mathcal{N}$ are equivalent in the zero-noise limit $p, p_{zz} \to 0$.

Suppose that for $\tilde{\mathcal{N}}$ we find an $\varepsilon$-correctable approximate code with $\varepsilon \ll 1$, i.e., with an appropriate recovery $\mathcal{R}$ the entanglement fidelity is close to 1. The entanglement fidelity for the original noise channel $\mathcal{N}$ then follows accordingly: the QECC can push the first-order errors down to an extremely small level. To find quantum codes that correct multiple errors, we can choose a higher-order approximation and similarly implement VarQEC. Given a connectivity graph $G$ with edge number $|E(G)|$ and maximum vertex degree $\Delta(G)$, we set $p_{zz} = 0.99/(3n + |E(G)|)$ and $p = 4p_{zz}$ in the following.
For a generic input state $\rho$, the probability of receiving the same state after going through the noise channel is about 0.01. The target error list to detect in VarQEC follows from these Kraus operators. As before, we use the VQC ansatz with alternating layers of single-qubit rotations $R_x$-$R_z$ acting on all qubits and Ising-type interactions $R_{zz}$ acting on adjacent qubits. The circuit depth of a VQC with $L$ layers is of order $O(L\Delta(G))$.
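The parameter choice $p_{zz} = 0.99/(3n + |E(G)|)$, $p = 4p_{zz}$ can be sanity-checked by listing the terms of the first-order channel. The sketch below assumes that channel consists of an identity term, one term per single-qubit Pauli with weight $p/4$, and one $Z_iZ_j$ term per edge with weight $p_{zz}$; the function name and the 7-qubit ring are illustrative choices.

```python
def first_order_terms(n, edges, p, p_zz):
    """Return (label, probability) pairs of the assumed first-order channel."""
    terms = [("I", 1 - 3 * n * p / 4 - len(edges) * p_zz)]
    for q in range(n):
        for pauli in "XYZ":
            terms.append((f"{pauli}{q}", p / 4))
    for i, j in edges:
        terms.append((f"Z{i}Z{j}", p_zz))
    return terms

n = 7
ring_edges = [(i, (i + 1) % n) for i in range(n)]  # illustrative ring graph

p_zz = 0.99 / (3 * n + len(ring_edges))
p = 4 * p_zz

terms = first_order_terms(n, ring_edges, p, p_zz)
identity_weight = terms[0][1]
total_weight = sum(prob for _, prob in terms)

print("number of Kraus operators:", len(terms))  # 1 + 3n + |E(G)|
print("identity weight:", identity_weight)
print("total weight:", total_weight)
```

With $p = 4p_{zz}$ every non-identity term carries the same weight $p_{zz}$, so the identity weight is $1 - (3n + |E(G)|)\,p_{zz} = 0.01$ for any graph, and the number of Kraus operators grows only linearly in $n$ and $|E(G)|$.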
After adequate optimization, we find approximate channel-adaptive codes for $\mathcal{N} = \mathcal{N}_2 \circ \mathcal{N}_1$ with the hardware connectivity graphs shown in Fig. 7. The codes for graphs (a,b,h) are degenerate, and the others are non-degenerate. Six physical qubits suffice to encode one logical qubit, and eight physical qubits suffice to encode two logical qubits.
Note that up to a local unitary transformation, these codes can correct an arbitrary single-qubit error followed by an adjacent U ⊗ U error for any fixed U ∈ U(2) with eigenvalues {−1, 1}.
We investigated the codes for graphs (c,d) in more detail. They have code parameters $((7, 2, 3))_2$. We calculate their quantum weight enumerators [77], which are defined by

$$A(z) = \sum_{j=0}^{n} A_j z^j, \qquad B(z) = \sum_{j=0}^{n} B_j z^j,$$

with coefficients

$$A_j = \frac{1}{K^2}\sum_{{\rm wt}(O)=j} \big|{\rm Tr}(O P)\big|^2, \qquad B_j = \frac{1}{K}\sum_{{\rm wt}(O)=j} {\rm Tr}\big(O P O^\dagger P\big),$$

where $P$ is the projector onto the code space and the sums run over Pauli tensor products $O$ of weight $j$. These two codes are locally equivalent and therefore have the same weight enumerators. Further, we verified that they are locally equivalent to a non-degenerate additive code stabilized by

$$\begin{aligned} g_1 &= X\,I\,Z\,X\,X\,I\,X, \\ g_2 &= Z\,I\,I\,X\,X\,X\,Z, \\ g_3 &= I\,X\,Z\,X\,Z\,Z\,Z, \\ g_4 &= I\,Z\,Z\,I\,Z\,Y\,Z, \\ g_5 &= I\,I\,Y\,X\,Z\,I\,X, \\ g_6 &= I\,I\,I\,Z\,Y\,Y\,X \end{aligned} \qquad (81)$$

up to permutation of qubits. This additive code can correct arbitrary single-qubit errors and 2-qubit collective phase-flips occurring on any qubit pair. According to the quantum Hamming bound, for one logical qubit, no non-degenerate quantum code with code length $n < 7$ can correct arbitrary single-qubit errors as well as 2-qubit collective phase-flips, since

$$K\Big(1 + 3n + \binom{n}{2}\Big) \le 2^n$$

with $K = 2$ only holds when $n \ge 7$.

Two $((7, 2, 3))_2$ stabilizer codes have been investigated in detail. One is the famous Steane code [14] based on the Calderbank-Shor-Steane (CSS) construction. The other is a non-CSS code found by numerical greedy search, called the bare code [27]. Their $A$ enumerators are $A^{\rm Steane}(z) = 1 + 21z^4 + 42z^6$ (Eq. (84)) and $A^{\rm bare}(z) = 1 + 5z^2 + 11z^4 + 47z^6$. QECCs with different weight enumerators are not locally and translationally equivalent. Our code is different from the Steane and the bare $((7, 2, 3))_2$ codes. See Appendix F for more weight enumerators. The Steane and the bare codes cannot correct nearest-neighbor collective phase-flips. Consequently, for the combined channel of nearest-neighbor collective phase-flips with noise rate $p_{zz}$ and single-qubit errors with noise rate $p$, the entanglement fidelity of our code is free of first-order error terms, whereas that of the Steane and the bare codes retains a contribution of first order in $p_{zz}$.
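Two of the counting arguments in this subsection are easy to reproduce. The sketch below first checks the quantum Hamming bound for a non-degenerate code that must distinguish the identity, all single-qubit Pauli errors, and $Z_iZ_j$ on any pair. It then computes the weight distribution of the Steane code's stabilizer group, using the standard fact that for a stabilizer code $A_j$ counts the stabilizer elements of weight $j$. The parity-check rows and all function names are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import comb

def hamming_bound_holds(n, K=2):
    """Non-degenerate bound for {I, single-qubit Paulis, Z_i Z_j on any pair}."""
    n_errors = 1 + 3 * n + comb(n, 2)
    return K * n_errors <= 2 ** n

smallest_n = next(n for n in range(2, 20) if hamming_bound_holds(n))

# Parity checks of the [7,4] Hamming code; the Steane code uses them for
# both its X-type and its Z-type stabilizer generators.
HAMMING_CHECKS = ["1010101", "0110011", "0001111"]

def gf2_span(rows):
    """All GF(2) linear combinations of the given bit-string rows."""
    vectors = []
    for coeffs in product([0, 1], repeat=len(rows)):
        v = [0] * len(rows[0])
        for c, row in zip(coeffs, rows):
            if c:
                v = [(x + int(b)) % 2 for x, b in zip(v, row)]
        vectors.append(tuple(v))
    return vectors

def steane_weight_distribution():
    """Count stabilizer elements X^a Z^b by the size of supp(a) | supp(b)."""
    span = gf2_span(HAMMING_CHECKS)
    counts = Counter()
    for a in span:        # support of the X part
        for b in span:    # support of the Z part
            counts[sum(1 for x, z in zip(a, b) if x or z)] += 1
    return dict(sorted(counts.items()))

steane_a = steane_weight_distribution()
print("smallest n satisfying the bound:", smallest_n)
print("Steane stabilizer weight distribution:", steane_a)
```

The first check returns $n = 7$ because $2(1 + 18 + 15) = 68 > 2^6$ while $2(1 + 21 + 21) = 86 \le 2^7$. The second reproduces $1 + 21z^4 + 42z^6$, whose coefficients sum to the $2^6$ elements of the stabilizer group.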

Noise Resilience
Although the results introduced so far are obtained by numerical simulation, VarQEC is a hybrid quantum-classical algorithm meant to be run on NISQ devices, where quantum gates are inevitably noisy. In this section, we demonstrate that VarQEC is resilient to random gate errors. As long as the error rate $p_{\rm gate}$ is below a reasonable threshold, VarQEC can find an efficient encoding circuit that prepares the correct code. This resilience is essentially analogous to the noise resilience in variational quantum compiling [56].
We start from the simplest noise model, global depolarizing, and introduce the following theorem.
Theorem 5 Suppose the variational quantum circuit in VarQEC is accompanied by global depolarizing noise acting continuously throughout the circuit. If the ideal circuit is capable of finding an eligible quantum code, then after adequate optimization with the noisy circuit, the output parameters $\theta_{\rm opt}$ are still correct.

Proof
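Consider the cost functions in Eqs. (15), (16). Due to the global depolarizing noise, when we run the VQC $U(\theta)$ to prepare a basis state $|\psi_j\rangle$, we instead obtain a mixture of $|\psi_j\rangle\langle\psi_j|$ and the maximally mixed state. Similarly, when we apply $U^\dagger(\theta)E_\mu U(\theta)$ to an initial binary string $|j-1\rangle|0\rangle$ to prepare the output state $|\psi_{j,\mu}\rangle$, we instead obtain a mixture of $|\psi_{j,\mu}\rangle\langle\psi_{j,\mu}|$ and the maximally mixed state. Optimizing the resulting noisy cost function (Eq. (90)) yields the pseudo-optimal parameters $\theta_{\rm opt}$.

Since the ideal variational quantum circuit is capable of finding an eligible quantum code, each term in the cost function Eq. (15) can be minimized to 0 (i.e., $\langle\psi_i|E_\mu|\psi_j\rangle = 0$, $\langle\psi_j|E_\mu|\psi_j\rangle - \langle E_\mu\rangle = 0$). Comparing Eq. (15) and Eq. (90), we conclude that the pseudo-optimal parameters also minimize the ideal cost function. To sum up, VarQEC is perfectly resilient to global depolarizing noise, i.e., it can find the correct encoding circuit in the presence of global depolarizing noise.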
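The mechanism behind this resilience can be seen in a toy example. The sketch below is not the VarQEC cost function: it uses a hypothetical two-qubit circuit, a single traceless observable $Z \otimes I$, and a squared-expectation cost, with global depolarizing noise of strength `q` mixing the ideal state with the maximally mixed state. Since the maximally mixed state contributes nothing to a traceless observable, the noisy cost is a rescaled copy of the ideal one and shares its minimizer.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
OBS = np.kron(Z, I2)  # traceless observable Z on the first qubit

def ideal_state(theta):
    """CNOT (R_y(theta) x I) |00> = cos(theta/2)|00> + sin(theta/2)|11>."""
    ry = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                   [np.sin(theta / 2), np.cos(theta / 2)]])
    psi0 = np.zeros(4)
    psi0[0] = 1.0
    return CNOT @ np.kron(ry, I2) @ psi0

def toy_cost(theta, q):
    """Squared expectation of OBS under global depolarizing noise of strength q."""
    psi = ideal_state(theta)
    rho = (1 - q) * np.outer(psi, psi) + q * np.eye(4) / 4
    return np.trace(rho @ OBS) ** 2

thetas = np.linspace(0, np.pi, 181)
ideal_costs = np.array([toy_cost(t, 0.0) for t in thetas])
noisy_costs = np.array([toy_cost(t, 0.3) for t in thetas])

theta_ideal = thetas[np.argmin(ideal_costs)]
theta_noisy = thetas[np.argmin(noisy_costs)]
print(theta_ideal, theta_noisy)  # same grid point for both
```

Here the noisy cost equals $(1-q)^2$ times the ideal cost at every $\theta$, so the landscape is flattened but its minimum does not move.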
In practical scenarios, circuit noise is more complicated and single-qubit errors dominate. Now we consider a more realistic model. Suppose each 2-qubit $R_{zz}$ gate in the VQC is accompanied by local depolarizing noise and collective phase flips, as illustrated in Fig. 8. Before the ideal unitary $R_{zz}$, the two qubits go through $\mathcal{N}_{DP}^{\otimes 2} \circ \mathcal{N}_{ZZ}$; after the ideal $R_{zz}$, they go through $\mathcal{N}_{ZZ} \circ \mathcal{N}_{DP}^{\otimes 2}$. In the following, for gate error rate $p_{\rm gate}$, we set the error rate of each $\mathcal{N}_{DP}$ to $p_{\rm gate}/2$ and the error rate of each $\mathcal{N}_{ZZ}$ to $p_{\rm gate}/8$.
As before, we use VarQEC to find channel-adaptive codes for the noise channel $\mathcal{N}$ (Eq. (68)) with the hardware connectivity graphs (c,d) shown in Fig. 7. The difference is that this time the VQC is noisy. After optimization, we obtain the pseudo-optimal parameters $\theta_{\rm opt}$. It is interesting to note that if we transfer $\theta_{\rm opt}$ to an ideal VQC, the corresponding cost function $C^1_{n,K,\mathcal{N}}(\theta_{\rm opt})$ can be much smaller than the one we estimated with the noisy VQC. Namely, we find a roughly correct encoder even if we use a noisy VQC in our algorithm. The comparison of cost functions for different gate error rates is given in Fig. 9(a). The cost reduction for both graphs is evident. Two-qubit gate error rates on state-of-the-art NISQ computers are about $10^{-2}$ [79], so one can run our algorithm on current hardware directly.
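Suppose the input state of a quantum circuit is $|\psi_{\rm in}\rangle$ and the target unitary evolution is $U_{\rm ideal}$. The ideal output state is $|\psi_{\rm ideal}\rangle = U_{\rm ideal}|\psi_{\rm in}\rangle$. However, due to quantum gate errors, the output state $\rho_{\rm out}$ is a mixed state. We express $\rho_{\rm out}$ as a summation of three terms,

$$\rho_{\rm out} = \mathcal{N}_{\rm circuit}\big(|\psi_{\rm in}\rangle\langle\psi_{\rm in}|\big) = (1 - \lambda_1 - \lambda_2)\,|\psi_{\rm ideal}\rangle\langle\psi_{\rm ideal}| + \lambda_1 \frac{I}{2^n} + \lambda_2\,\rho_2, \qquad (93)$$

where $\mathcal{N}_{\rm circuit}$ denotes the channel of the noisy quantum circuit, $\lambda_1$ is the smallest eigenvalue of $\rho_{\rm out}$ multiplied by $2^n$, $I/2^n$ is the maximally mixed state, and $\rho_2$ is a density operator orthogonal to $|\psi_{\rm ideal}\rangle$, i.e., $\langle\psi_{\rm ideal}|\rho_2|\psi_{\rm ideal}\rangle = 0$. The weight of the first term is fixed by ${\rm Tr}\,\rho_{\rm out} = 1$. The latter two terms of Eq. (93) are both induced by gate errors, but they have different effects on the noise resilience of our algorithm. The second term is a global white noise; as we analyzed in Theorem 5, it does not affect the optimal parameters. However, the third term $\lambda_2\rho_2$ non-trivially alters the optimization landscape and introduces some local minima. Usually, neither the second term nor the third term is negligible. Nevertheless, we are certain about the trend: as the circuit depth increases, the second term will eventually dominate the third term [80,81].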
For VQCs corresponding to graphs (c,d), we fix the gate error rate at 0.01 and try different numbers of layers with randomly sampled $\theta$. The average values of the $\lambda$ coefficients are shown in Fig. 9(b,c). Each point is averaged over 100 samples. Compared with the one-dimensional ring (graph (c)), vertices in the complete graph (graph (d)) are more tightly connected, so local errors are transformed into global white noise more rapidly. For both graphs, $\lambda_1 \ll \lambda_2$ when $L$ is relatively small and $\lambda_1 \gg \lambda_2$ when $L$ is relatively large. With the decrease of gate error rate and the increase of circuit depth, VarQEC will become more resilient to noise. Additionally, one might also consider estimating cost functions in VarQEC more precisely with error mitigation techniques like virtual distillation [82,83].

Barren Plateaus
The barren plateau (BP) [84,85] and the noise-induced barren plateau (NIBP) [86] are two daunting challenges in variational quantum optimization. In this section, we numerically investigate their effects in the VarQEC algorithm.
The barren plateau is a phenomenon where the gradients vanish exponentially with the increasing number of qubits [84]. It occurs when the VQCs form a unitary 2-design, regardless of whether the VQC is noisy or noiseless. Ref. [85] connected the locality of the cost function and the trainability of the corresponding VQC. If the cost function is local and the circuit depth is of order O(log(n)), the BP does not occur (i.e., the VQC is trainable). However, if the cost function is global or the circuit depth is of order O(poly(n)), a BP occurs in the optimization landscape, and the VQC is untrainable.
In near-term quantum computation, the dominant errors typically act on only a few qubits. Accordingly, the cost functions $C^1_{n,K,\mathcal{E}}$ (Eq. (15)) and $C^2_{n,K,\mathcal{E}}$ (Eq. (16)) are influenced only by local errors $E_\mu$. Therefore, we expect the same conclusion to hold for VarQEC: the BP does not occur when the circuit depth is of order $O(\log(n))$ and occurs when the circuit depth is of order $O({\rm poly}(n))$.
Without loss of generality, here we use the star connectivity graph $S_{n-1}$ ($S_7$ is illustrated in the inset of Fig. 10(b)) and focus on $C^2_{n,2,\mathcal{E}}$ with $\mathcal{E} = \{O_\alpha \mid {\rm wt}(O_\alpha) < 3\}$, i.e., searching for QECCs that encode one logical qubit and correct an arbitrary single-qubit error. Fig. 10(a) plots the partial derivatives of the off-diagonal cost (Eq. (95)) and the diagonal cost with respect to a randomly selected circuit parameter $\theta_j$. When the number of VQC layers is $L = 3$ or $L = \log(n)$, the circuit is trainable. However, when $L = n$, both off-diagonal and diagonal gradients decay exponentially with the increasing number of qubits.
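To give a sense of the size of the detection set $\{O_\alpha \mid {\rm wt}(O_\alpha) < 3\}$ used here, the following sketch enumerates all Pauli strings below a given weight. The function name and the string representation are illustrative; $n = 8$ is chosen to match the star graph $S_7$.

```python
from itertools import combinations, product
from math import comb

def paulis_below_weight(n, d):
    """All n-qubit Pauli strings (as text) with weight strictly less than d."""
    ops = []
    for weight in range(d):
        for support in combinations(range(n), weight):
            for letters in product("XYZ", repeat=weight):
                label = ["I"] * n
                for qubit, letter in zip(support, letters):
                    label[qubit] = letter
                ops.append("".join(label))
    return ops

n = 8  # the star graph S_7 has 8 qubits
errors = paulis_below_weight(n, 3)
expected = 1 + 3 * n + 9 * comb(n, 2)

print(len(errors), expected)
print(errors[:5])
```

The count $1 + 3n + 9\binom{n}{2}$ grows only quadratically with $n$, so the number of cost terms stays polynomial even though the Hilbert space dimension grows exponentially.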
The noise-induced barren plateau refers to a conceptually different phenomenon where cost gradients vanish exponentially with $L$ due to hardware noise accumulation [86]. Consequently, the gradients vanish exponentially with $n$ if $L$ grows linearly with $n$. Unlike the noise-free BP, the NIBP only occurs when the VQC is noisy, regardless of whether the circuits form a unitary 2-design. We again consider the local noise model illustrated in Fig. 8, with system size $n = 8$, number of layers $L = 1, 5, 10, 15, 20, 25, 30$, and noise rate $p = 0, 5 \times 10^{-3}, 0.01, 0.02, 0.03, 0.04, 0.05$. The numerical results for the gradients are shown in Fig. 10(b). With the increase of $L$, the partial derivatives of $C^2_{n,K,\mathcal{E}}$ with respect to a random parameter decay exponentially, and the decay factor is determined by the noise rate. This illustrates that although VarQEC can find a roughly correct encoder after adequate training with a noisy VQC (noise resilience), the required training time grows exponentially with the number of circuit layers.
BPs and NIBPs manifest themselves in VarQEC when the circuit depth gets large. Nevertheless, they are not a major concern in practice. From a practical standpoint, we are more interested in QECCs with a shallow (even constant-depth) encoding circuit. The gradients of cost functions tend to be large when searching for these codes. In addition, there are more and more effective strategies to mitigate BPs, e.g., cost function partitioning and meta-learning [87] as well as optimization guided by classical shadows [88]. These protocols can reasonably be applied to VarQEC.

Experiment on an IBM machine
Now we experimentally demonstrate VarQEC with a real superconducting quantum machine, ibm_quito [89].
The connectivity graph of ibm_quito is shown in Fig. 11(a). Our goal is to find a 4-qubit approximate QECC to correct one amplitude damping error [43] using physical qubits Q 0 , Q 1 , Q 2 , Q 3 .
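The Kraus operators of the amplitude damping channel are

$$A_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix} = I - \frac{1-\sqrt{1-\gamma}}{2}\,(I - Z), \qquad A_1 = \begin{pmatrix} 0 & \sqrt{\gamma} \\ 0 & 0 \end{pmatrix} = \frac{\sqrt{\gamma}}{2}\,(X + iY).$$

Each $(I - Z)$ term contributes a factor of $\gamma$ and each $(X + iY)$ term contributes a factor of $\sqrt{\gamma}$.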
To correct a single amplitude damping error, we only need to consider error products with total order less than $\gamma^{3/2}$. The variational quantum circuit we use is illustrated in Fig. 11(b). When the rotation angle $\theta = \pm\pi/2$, the VQC serves as an exact encoder. Since $Q_2$ and $Q_3$ are not directly connected, the IBM compiler adds 2 additional SWAP gates (each realized by 3 CNOT gates) to implement the CNOT between $Q_2$ and $Q_3$. The hardware-efficient VQC after compiling is shown in Fig. 11(c).

Due to hardware constraints, we slightly modify the VarQEC algorithm and enhance it with quantum error mitigation (EM) as follows. Starting from the initial parameter $\theta = 0.1$, we iteratively apply the VQC to the input states $|0000\rangle$ and $|0010\rangle$, perform quantum state tomography on the output mixed states, and record their density matrices $\rho_1$ and $\rho_2$. Then we classically extract their dominant eigenstates. The cost gradients are estimated by finite differencing (Eq. (100)) with $\delta\theta = 0.05$. In the first stage (first 15 iterations), we minimize $C^2_{n,K,\mathcal{E}}$ with learning rate $\eta = 1$ until $C^2_{n,K,\mathcal{E}} < 0.01$. Then we switch to $C^1_{n,K,\mathcal{E}}$ and minimize it with a smaller learning rate $\eta = 0.05$.

The training curves of the estimated $C^2_{n,K,\mathcal{E}}$ with/without error mitigation and its real value are shown in Fig. 11(d). After adequate training (25 iterations), the parameter $\theta$ converges to about 1.63, slightly greater than the ideal angle $\pi/2$ (indicated by the dashed line in the inset). Nevertheless, this difference is acceptable: the VQC still encodes an approximate amplitude damping code.
We implement a total of 152 quantum circuits for this experiment: 100 for estimating the gradients and 52 for estimating the cost functions.
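The two classical post-processing steps of this experiment, extracting the dominant eigenstate of a tomographed density matrix and estimating a gradient by finite differences, can be sketched as follows. The central-difference form is an assumption, chosen because it matches the reported circuit count (25 iterations, 2 input states, 2 shifted angles give 100 gradient circuits). The density matrix, the toy cost $\cos^2\theta$, and the learning rate 0.5 are placeholders rather than the experimental quantities.

```python
import numpy as np

def dominant_eigenstate(rho):
    """Eigenvector of a Hermitian density matrix with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(rho)  # eigenvalues in ascending order
    return vecs[:, -1]

def central_difference(cost, theta, delta=0.05):
    """Two-point gradient estimate (assumed form of the finite differencing)."""
    return (cost(theta + delta) - cost(theta - delta)) / (2 * delta)

# Error mitigation check: a pure state mixed with white noise.
rng = np.random.default_rng(3)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho_noisy = 0.8 * np.outer(psi, psi.conj()) + 0.2 * np.eye(4) / 4
overlap = abs(np.vdot(psi, dominant_eigenstate(rho_noisy))) ** 2

# Toy one-parameter training loop (not the VarQEC cost function).
def toy_cost(theta):
    return np.cos(theta) ** 2  # minimized at theta = pi/2

theta = 0.1
for _ in range(25):
    theta -= 0.5 * central_difference(toy_cost, theta)

print("overlap with ideal state:", overlap)
print("trained angle:", theta)
```

For white noise the dominant eigenstate coincides with the ideal state up to a phase, which is why this classical step removes that part of the error at no extra quantum cost.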
VarQEC is robust to hardware noise; therefore, it is particularly promising in the NISQ era. A problem worth studying further is how to choose the most resource-efficient variational quantum circuit in VarQEC. There is reason to believe that the optimal VQC ansatz is code-dependent. For example, when the target quantum code is translationally invariant, one may use a VQC with a certain amount of symmetry, where different gates can share the same parameter. If we slightly modify the cost functions, VarQEC can be used for finding some QECC variants like the hybrid quantum-classical codes [90], estimating the zero-error capacity of noisy quantum channels [91], and solving quantum marginal problems [92].
VarQEC can also be directly converted into a classical algorithm. When a NISQ processor is not accessible, one can replace the VQCs with classical variational ansatzes like tensor networks [93][94][95] or neural network quantum states [96], and then similarly implement optimization and search for eligible quantum codes with a classical computer alone. However, the encoding circuits cannot be naturally obtained in this way.
The first inequality uses von Neumann's trace inequality.
Each $E_\alpha$ acts non-trivially on no more than $(d-1)/2$ qubits; therefore, each error product $E_\alpha^\dagger E_\beta$ acts non-trivially on no more than $d-1$ qubits. We expand $E_\alpha^\dagger E_\beta$ in the Pauli basis, where each $O^{\alpha\beta}_\gamma$ is a Pauli tensor product with weight less than $d$ (Eq. (106)). Applying Eq. (103) to the basis states $\{|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_K\rangle\}$ gives Eq. (108). Further, from the completeness relation of the error operators we obtain the completeness relation of the error products.

The $c_Z$-effective weight of each $E_\alpha$ is smaller than $d_e(c_Z)/2$; therefore, each error product $E_\alpha^\dagger E_\beta$ is proportional to a Pauli tensor product $O^{\alpha\beta}$ with $c_Z$-effective weight smaller than $d_e(c_Z)$. Eq. (114) gives the corresponding normalization condition, and the same steps apply to the basis states $\{|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_K\rangle\}$. According to Proposition 2, the code is $\varepsilon$-correctable.

C Parameter Dimension and Overparameterization
The quantum Fisher information matrix (QFIM) is an essential concept in quantum metrology [97][98][99]. In recent years, its applications in NISQ algorithms and quantum machine learning have also been noticed [61,100]. Ref. [61] uses the QFIM to assess the expressive power of a VQC with the fixed input state $|0\rangle^{\otimes n}$. In VarQEC, however, we use $K$ orthogonal input states to find an $((n, K))_2$ quantum code. In this section, we generalize the notion of QFIM to multiple input states to quantify the expressive power of a VQC for preparing an $((n, K))_2$ quantum code. Based on that, we discuss the parameter dimension and the overparameterization of a VQC encoder.

Suppose the VQC encoder has parameters $\theta = (\theta_1, \ldots, \theta_N)$. For a fixed pure input state, the QFIM $\mathcal{F}(\theta)$ is related to the distance in the space of pure quantum states and is expressed through the derivatives $|\partial_l\psi\rangle = \partial|\psi(\theta)\rangle/\partial\theta_l$. In this case, the parameter dimension $D_c$ for a VQC is defined as the number of independent parameters that the VQC can express in the space of output states. Numerical evidence shows that $D_c$ is usually equal to the rank of the QFIM for hardware-efficient VQCs with periodic and non-correlated random parameters $\theta$ [61].

In VarQEC, the inputs are $K$ orthogonal pure states. Denote the projector onto the output space as $P_c$. We relate $\mathcal{F}(\theta)$ to the distance in the space of $K$-dimensional projectors, where the distance (Eq. (126)) is defined via the fidelity between the normalized mixed states of the projectors $P_c(\theta)$ and $P_c(\theta')$. Suppose the projector $P_c(\theta)$ has the eigendecomposition $P_c(\theta) = \sum_{j=1}^{K}|\psi_j\rangle\langle\psi_j|$, and denote the basis of its orthogonal complement as $\{|\psi_j\rangle\}_{j=K+1,K+2,\ldots,2^n}$; the QFIM under our framework is then given by Eq. (128). We remark that the derivation of the QFIM for projectors is the same as for density matrices [98]. Therefore, a similar formula can be used to compute the QFIM of a VQC with mixed inputs/outputs. By sampling random parameters $\theta$ from $[0, 2\pi)^N$ and computing the QFIM, we can estimate the parameter dimension $D_c$ by ${\rm rank}(\mathcal{F}(\theta))$.
The VarQEC algorithm searches a $D_c$-dimensional submanifold of the complex Grassmannian ${\rm Gr}(K, 2^n)$.
Without loss of generality, we consider the connectivity graph shown in Fig. 12.

Figure 12: The bipartite connectivity graph for finding a quantum code with parameters $((7, 3, 3))_2$. Qubits $Q_0$ and $Q_1$ are selected to prepare the logical data.

For $K = 1, 2, 3, 4$, we randomly sample parameters $\theta$ and plot ${\rm rank}(\mathcal{F}(\theta))$ as a function of the number of VQC layers $L$ in Fig. 13. Almost no parameterized gate is redundant when the circuit is underparameterized ($D_c/N \approx 1$). With the increase of $L$, ${\rm rank}(\mathcal{F}(\theta))$ increases approximately linearly until it reaches its maximum $D_c^{\max}$. The maximum parameter dimension for code length $n$ and code dimension $K$ agrees with the fact that the dimension of the complex Grassmannian ${\rm Gr}(K, 2^n)$ is $K(2^n - K)$ [101]. When $D_c = D_c^{\max}$, the VQC can explore the whole ${\rm Gr}(K, 2^n)$ manifold and prepare an arbitrary $((n, K))_2$ quantum code. The number of layers required to saturate the maximum parameter dimension depends on $|E(G)|$, the number of edges of the connectivity graph.
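The rank-based estimate of the parameter dimension can be tried on a very small circuit. The sketch below covers only the single-input case $K = 1$, using the standard pure-state formula $\mathcal{F}_{lm} = 4\,{\rm Re}\big[\langle\partial_l\psi|\partial_m\psi\rangle - \langle\partial_l\psi|\psi\rangle\langle\psi|\partial_m\psi\rangle\big]$ (the prefactor is a convention and does not affect the rank). The two-qubit $R_x$-$R_z$-$R_{zz}$ layer layout, the helper names, and the random seed are illustrative assumptions; the projector-based QFIM of this appendix is not implemented here.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(op, qubit, n):
    """Single-qubit operator acting on `qubit` of an n-qubit register."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, op if q == qubit else I2)
    return out

def rot(gen, theta):
    """exp(-i * theta * gen / 2) for a generator with gen @ gen = identity."""
    return np.cos(theta / 2) * np.eye(len(gen)) - 1j * np.sin(theta / 2) * gen

def state(theta, n, edges, layers):
    """Layers of R_x and R_z on every qubit followed by R_zz on every edge."""
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0
    k = 0
    for _ in range(layers):
        for gate in (X, Z):
            for q in range(n):
                psi = rot(embed(gate, q, n), theta[k]) @ psi
                k += 1
        for a, b in edges:
            psi = rot(embed(Z, a, n) @ embed(Z, b, n), theta[k]) @ psi
            k += 1
    return psi

def qfim(theta, n, edges, layers):
    psi = state(theta, n, edges, layers)
    derivs = []
    for l in range(len(theta)):
        shifted = theta.copy()
        shifted[l] += np.pi
        # Exact identity for these gates: d/dtheta R(theta) = R(theta + pi) / 2.
        derivs.append(0.5 * state(shifted, n, edges, layers))
    d = np.array(derivs)
    gram = d.conj() @ d.T      # <d_l psi | d_m psi>
    proj = d.conj() @ psi      # <d_l psi | psi>
    return 4 * np.real(gram - np.outer(proj, proj.conj()))

n, edges, layers = 2, [(0, 1)], 3
num_params = layers * (2 * n + len(edges))
rng = np.random.default_rng(7)
theta = rng.uniform(0, 2 * np.pi, size=num_params)

fisher = qfim(theta, n, edges, layers)
rank = np.linalg.matrix_rank(fisher, tol=1e-8)
print("parameters:", num_params, "QFIM rank:", rank)
```

With 15 parameters on two qubits the circuit is overparameterized, and the rank saturates at $2^{n+1} - 2 = 6$, the real dimension of the space of two-qubit pure states.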

D A variational quantum encoder for additive codes
The VQC with bipartite connectivity performs well in most cases. However, for some code parameters (e.g., $((10, 4, 4))_2$), it needs many random initializations of $\theta$ to find an eligible code. Here we propose AC-VQC, another variational quantum circuit with all-to-all connectivity, to complement the bipartite ansatz.
The AC-VQC is especially resource-efficient in finding encoding circuits of additive codes. The structure of an AC-VQC is similar to the circuit of the quantum Fourier transform, as shown in Fig. 14. We start from two physical qubits ($Q_0$, $Q_1$) and apply a 2-qubit parameterized unitary operator $U_{01}$ to them. Then, we add another qubit ($Q_2$) and apply 2-qubit parameterized unitary operators $U_{02}$/$U_{12}$ to the new qubit and each of the qubits that already exist ($Q_0$-$Q_2$, $Q_1$-$Q_2$). We repeat these steps until the system size equals $n$. In the end, we apply single-qubit rotations $R_z$ and $R_x$ to all qubits to explore the manifold of locally equivalent codes. The initial $k$ qubits prepare the logical data. The total circuit depth is of order $O(\sum_{j=1}^{n-1} j) = O(n^2)$.

E Non-CWS Quantum Codes with Parameters $((6, 2, 3))_2$, $((7, 2, 3))_2$

The $((5, 2, 3))_2$ code is known to be unique. However, the classification and construction of $((6, 2, 3))_2$ and $((7, 2, 3))_2$ quantum codes are unclear. Some are said to be "non-CWS" since they are not locally equivalent to CWS codes. Here we present a general construction of non-CWS quantum codes based on stabilizer ones.

Theorem 6
If there exists a quantum code $C$ with parameters $((n, 2^k, d))_q$, then there exist degenerate $((n', 2^k, d))_q$ codes $\{C'\}$ with $n' > n$ that are not locally equivalent to a CWS code.
Proof For $n' > n+1$, we can directly obtain non-CWS codes by taking the tensor product with a non-stabilizer state.
For $n' = n+1$, we take the tensor product of the code $C$ with a fixed (stabilizer) state, then apply a non-Clifford entangling unitary operation to one of the original qudits and the additional qudit. This will conjugate the Pauli stabilizers to nonlocal stabilizers. The resulting code is a non-CWS degenerate code with parameters $((n', 2^k, d))_q$.

F.3.2 Nearest-neighbor collective phase-flips
This part lists the quantum weight enumerators of the channel-adaptive codes for the combined noise channel N (Eq. (68)) with hardware connectivity graphs shown in Fig. 7.