Quantum capacity and codes for the bosonic loss-dephasing channel

Bosonic qubits encoded in continuous-variable systems provide a promising alternative to two-level qubits for quantum computation and communication. So far, photon loss has been the dominant source of errors in bosonic qubits, but the significant reduction of photon loss in recent bosonic qubit experiments suggests that dephasing errors should also be considered. However, a detailed understanding of the combined photon loss and dephasing channel is lacking. Here, we show that, unlike its constituent parts, the combined loss-dephasing channel is non-degradable, pointing towards a richer structure of this channel. We provide bounds for the capacity of the loss-dephasing channel and use numerical optimization to find optimal single-mode codes for a wide range of error rates.


Introduction
The presence of noise in quantum systems is one of the main challenges towards realizing useful quantum computation and communication. Indeed, qubit errors severely limit the number of gates that a quantum processor can run or the bandwidth at which a quantum communication channel can operate.
The primary method at our disposal for dealing with noise in quantum systems, apart from improving the underlying hardware, is quantum error correction [1]. This requires encoding the logical information redundantly, making it resistant to the noise in the system. As such, the optimal encoding depends heavily on the noise structure, with the minimal overhead of physical modes to reliably encode a single logical qubit (or the maximal rate of communication) being determined by the quantum capacity of the noise channel [2,3].
Bosonic quantum error correction uses the infinite number of levels in a single harmonic oscillator to encode quantum information redundantly. This provides a hardware-efficient alternative to traditional schemes, which encode logical qubits in arrays of physical qubits. In addition, the use of a single element substantially simplifies the noise model [4][5][6]. The most common noise types in bosonic systems are bosonic excitation loss (photon or phonon loss) and bosonic dephasing. Bosonic loss results from energy exchange with a cold environment, whereas bosonic dephasing is often caused by frequency fluctuations due to dispersive interactions with unaccounted-for degrees of freedom. Another noise mechanism in bosonic systems is thermal excitation noise [7]. However, this error is less prevalent and will not be considered in this work.
Bosonic loss has often been a dominant source of noise in experimental systems. This noise channel is well understood, both in terms of its quantum capacity and the study of specific codes that protect against loss, such as the Gottesman-Kitaev-Preskill (GKP) code [5,6,8,9]. However, recent experiments with superconducting cavities indicate that photon loss rates can be substantially reduced [10]. Meanwhile, various physical processes (e.g., thermal excitations of coupled superconducting qubits [11,12] and stray microwave photons [13]) may lead to dephasing of the superconducting cavities. Hence, we expect that both loss and dephasing errors will become practically relevant errors requiring active quantum error correction. However, since photon number and phase are complementary observables, a tension exists between the ability to correct both error types [14,15]. This suggests a nontrivial error structure, necessitating a thorough study of the joint loss-dephasing channel.
In this work, we study the bosonic loss-dephasing channel from two complementary perspectives: through the lens of quantum information theory and quantum capacity (section 3), and through the study of numerically optimized single-mode codes (section 4). We show that, unlike pure dephasing or pure loss, the combined loss-dephasing channel is not degradable [2]. This suggests that the combined channel has a more complex structure than its constituent parts, complicating the derivation of its quantum capacity [16]. Nonetheless, we explore the theoretical limits of the loss-dephasing channel by providing upper and lower bounds on its quantum capacity. In the second effort, we use a biconvex optimization scheme [17] to find single-mode encodings that maximize the encoding-decoding entanglement fidelity.
Earlier work [18] showed that applying this method to pure loss leads to codes similar to the hexagonal GKP code. Our results indicate that some of the optimal codes for the loss-dephasing channel are close to the numerical codes presented in Ref. [19]. We also find that some optimal codes possess rotational symmetry. In the pure-loss and pure-dephasing channels, the energy constraint in the optimization problem is saturated. In contrast, the energy constraint is not typically saturated for the combined channel. Next, we compare the performance of the optimized codes to other known codes, such as GKP codes and cat codes [4,20]. Finally, we combine the two perspectives by comparing the hashing bound of the numerically optimized codes with the quantum capacity bounds (section 5).

Preliminaries
In this section, we provide definitions for quantum channels following the notation of Ref. [7].
Let H_X be the Hilbert space of a system X, L(H_X) the space of linear operators over H_X [2], and D(H_X) the space of density matrices. A linear map N_{A→B}: L(H_A) → L(H_B) is called a quantum channel from a sender A to a receiver B if it is completely positive and trace-preserving (CPTP), or equivalently, if for any auxiliary system W, the map defined by ρ_A ⊗ τ_W → N(ρ_A) ⊗ τ_W maps density matrices to density matrices. Any quantum channel N_{A→B} can be written in terms of Kraus operators,

N(ρ) = Σ_k K̂_k ρ K̂_k†, with Σ_k K̂_k† K̂_k = Î,

where the K̂_k: H_A → H_B are linear maps. Alternatively, every quantum channel N_{A→B} can be represented using an isometric extension Û: H_A → H_B ⊗ H_E, with E denoting the environment. The channel from A to B is then obtained by tracing out the environment:

N_{A→B}(ρ) = Tr_E(Û ρ Û†).

This isometric extension is called a Stinespring dilation, and it is unique up to an isometry on the environment. The channel obtained by tracing over B is called the complementary channel of N_{A→B} and is denoted by N^c_{A→E}(ρ) = Tr_B(Û ρ Û†). Finally, a channel is said to be degradable (anti-degradable) if the receiver (environment) can simulate the information obtained by the environment (the receiver). Formally, the channel N_{A→B} is degradable if there exists a quantum channel D_{B→E} such that N^c_{A→E} = D_{B→E} ∘ N_{A→B}, and anti-degradable if there exists a quantum channel D_{E→B} such that N_{A→B} = D_{E→B} ∘ N^c_{A→E}, where ∘ denotes channel composition. The property of degradability (anti-degradability) is invariant to the specific choice of Stinespring dilation.

Bosonic dephasing channel
We define the bosonic pure-dephasing channel through its action in the Fock basis:

N_D[γφ](ρ) = Σ_{n,m≥0} e^{−γφ(n−m)²/2} ⟨n|ρ|m⟩ |n⟩⟨m|,

where γφ characterizes the dephasing strength and |n⟩ is the Fock state with n excitations. There are multiple other ways to represent the dephasing channel. It can be described by the Lindblad master equation

dρ/dt = κφ D[n̂]ρ,

where κφ is the dephasing rate (so that γφ = κφ t), D[Â]ρ = ÂρÂ† − ½{Â†Â, ρ} is the Lindbladian with jump operator Â, and n̂ = â†â is the number operator of the bosonic mode with annihilation operator â. The dephasing channel can also be represented as an integral over continuously many Kraus operators, each representing a random phase rotation:

N_D[γφ](ρ) = ∫ dφ (1/√(2πγφ)) e^{−φ²/(2γφ)} e^{iφn̂} ρ e^{−iφn̂}, (6)

or as a sum over a discrete set of Kraus operators:

N_D[γφ](ρ) = Σ_{k=0}^∞ K̂_k ρ K̂_k†, with K̂_k = (γφ^k/k!)^{1/2} n̂^k e^{−γφ n̂²/2}.

These two Kraus representations correspond to different Stinespring dilations. In particular, the discrete representation arises from a conditional displacement evolution [21]:

Û = e^{√γφ n̂ (b̂† − b̂)},

where b̂ is the annihilation operator of the environment mode, so that Û|n⟩_X |0⟩_E = |n⟩_X |√γφ n⟩_E. In this representation, the complementary channel can be written as

N^c_D[γφ](ρ) = Σ_{n≥0} ⟨n|ρ|n⟩ |√γφ n⟩⟨√γφ n|,

with |√γφ n⟩ a coherent state with amplitude √γφ n.
The dephasing channel is degradable, since N^c_D[γφ] = D̃ ∘ N_D[γφ], where D̃(ρ) = Σ_n ⟨n|ρ|n⟩ |√γφ n⟩⟨√γφ n| is a CPTP measure-and-prepare map; this works because dephasing leaves the Fock populations ⟨n|ρ|n⟩ unchanged. The channel N_D is diagonal in the operator basis {|m⟩⟨n| : m, n ≥ 0}. This implies the following covariance property of N_D with respect to any operator Â that is diagonal in the Fock basis:

N_D[γφ](Â ρ Â†) = Â N_D[γφ](ρ) Â†.

In particular, this identity is satisfied for phase-space rotations Â = e^{iθn̂}, θ ∈ ℝ.
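The discrete Kraus representation above can be checked numerically on a truncated Fock space. The following is a minimal sketch (the truncation dimension and Kraus cutoff are our own illustrative choices, not taken from the paper), verifying Kraus completeness and the e^{−γφ(n−m)²/2} damping of a single coherence:

```python
import numpy as np
from math import factorial

gamma_phi = 0.1   # dephasing strength (illustrative value)
N = 12            # Fock-space truncation (our choice)
n = np.arange(N)

# Discrete Kraus operators K_k = sqrt(gamma_phi^k / k!) n^k exp(-gamma_phi n^2 / 2),
# all diagonal in the Fock basis.
def kraus(k):
    return np.diag(np.sqrt(gamma_phi**k / factorial(k)) * n.astype(float)**k
                   * np.exp(-gamma_phi * n**2 / 2))

Ks = [kraus(k) for k in range(60)]   # cutoff chosen large enough for convergence

# Completeness: sum_k K_k^dagger K_k = I on the truncated space.
completeness = sum(K.conj().T @ K for K in Ks)
assert np.allclose(completeness, np.eye(N), atol=1e-10)

# Action on a coherence |3><7|: damped by exp(-gamma_phi (3-7)^2 / 2).
rho = np.zeros((N, N)); rho[3, 7] = 1.0
out = sum(K @ rho @ K.conj().T for K in Ks)
print(out[3, 7], np.exp(-gamma_phi * (3 - 7)**2 / 2))
```

Because all Kraus operators are diagonal, the channel acts as an elementwise damping of coherences, which is what the check confirms.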

Bosonic loss channel
The bosonic loss or photon loss channel N_L[γ] can be defined using the master equation

dρ/dt = κ D[â]ρ,

with loss probability γ = 1 − e^{−κt} ∈ [0, 1]. Alternatively, we can use the Kraus representation:

N_L[γ](ρ) = Σ_{k=0}^∞ K̂_k ρ K̂_k†, with K̂_k = (γ^k/k!)^{1/2} (1−γ)^{n̂/2} â^k.

The Stinespring dilation of N_L[γ] takes the form of a beam-splitter interaction with an environment mode b̂:

Û = e^{θ(â b̂† − â† b̂)}, with γ = sin²θ.

Since Û|α⟩_X |0⟩_E = |√(1−γ) α⟩_X |√γ α⟩_E for all coherent states |α⟩_X, the complementary channel takes on a simple form: N^c_L[γ] = N_L[1−γ]. Consequently, the loss channel is degradable for γ ≤ 1/2 and anti-degradable for γ ≥ 1/2. Finally, using Eq. 15, it can be shown that the bosonic loss channel is also covariant with respect to rotations Û = e^{iθn̂}.
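The Kraus representation of the loss channel can likewise be checked numerically; truncation and parameters below are our own illustrative choices. A Fock state |n₀⟩ should be mapped to a binomial mixture over |n₀ − k⟩ with loss probability γ per photon:

```python
import numpy as np
from math import comb, factorial

gamma = 0.2   # loss probability (illustrative)
N = 10        # Fock-space truncation (our choice)
n = np.arange(N)
a = np.diag(np.sqrt(n[1:].astype(float)), 1)   # truncated annihilation operator

def kraus(k):
    # K_k = sqrt(gamma^k / k!) (1-gamma)^{n/2} a^k
    return np.sqrt(gamma**k / factorial(k)) * np.diag((1 - gamma)**(n / 2)) @ np.linalg.matrix_power(a, k)

Ks = [kraus(k) for k in range(N)]
assert np.allclose(sum(K.conj().T @ K for K in Ks), np.eye(N))

# A Fock state |n0> becomes a binomial mixture over |n0 - k>.
n0 = 6
rho = np.zeros((N, N)); rho[n0, n0] = 1.0
out = sum(K @ rho @ K.conj().T for K in Ks)
probs = np.diag(out).real
print(probs[n0 - 1], comb(n0, 1) * gamma * (1 - gamma)**(n0 - 1))
```

The mean photon number of the output is (1 − γ)n₀, reflecting the expected energy decay.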

Joint loss and dephasing channel
The bosonic loss-dephasing channel N_LD[γ, γφ] arises from combining the previous two noise mechanisms. It can be defined using the following master equation:

dρ/dt = κ D[â]ρ + κφ D[n̂]ρ.

This expression can be greatly simplified by noting that the two Lindbladians commute, so that

N_LD[γ, γφ] = N_D[γφ] ∘ N_L[γ] = N_L[γ] ∘ N_D[γφ].

We can use this identity to form a Stinespring dilation for the combined channel using two environments and the dilations of the pure-loss and pure-dephasing channels. To do so, let â, b̂_l, b̂_d be the annihilation operators corresponding to the modes of the system X, a first environment E_l of the loss channel, and a second environment E_d of the dephasing channel. The isometric extension is

Û = Û_d Û_l, with Û_l = e^{θ(â b̂_l† − â† b̂_l)} and Û_d = e^{√γφ n̂ (b̂_d† − b̂_d)}.

Finally, the loss-dephasing channel inherits covariance with respect to rotations from its constituent channels.
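The commutation of the two noise processes can be verified numerically on a truncated space; the rates and dimension below are illustrative choices of ours. Applying the loss and dephasing channels in either order to a random state should give the same result, since loss preserves the Fock-index difference n − m on which the dephasing damping depends:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
N, gamma, gamma_phi = 8, 0.15, 0.3   # illustrative truncation and rates
n = np.arange(N)
a = np.diag(np.sqrt(n[1:].astype(float)), 1)

loss_K = [np.sqrt(gamma**k / factorial(k)) * np.diag((1 - gamma)**(n / 2)) @ np.linalg.matrix_power(a, k)
          for k in range(N)]
# Dephasing acts as elementwise damping of coherences by exp(-gamma_phi (n-m)^2 / 2).
deph_M = np.exp(-gamma_phi * (n[:, None] - n[None, :])**2 / 2)

apply_loss = lambda r: sum(K @ r @ K.conj().T for K in loss_K)
apply_deph = lambda r: deph_M * r

# Random density matrix.
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
rho = X @ X.conj().T; rho /= rho.trace()

out1 = apply_deph(apply_loss(rho))
out2 = apply_loss(apply_deph(rho))
assert np.allclose(out1, out2)   # the two noise processes commute
```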

Quantum capacity
The quantum capacity Q_N of a channel N is the highest rate at which one can reliably transmit quantum information over many uses of the channel [2]. The quantum capacity is an important metric for evaluating the usability of the channel for quantum communication and storage. It can be shown [2] that Q_N is identical to the regularized maximal coherent information, which is a regularized limit over the coherent information that one can transmit using asymptotically many copies of N:

Q_N = lim_{n→∞} (1/n) max_ρ I_c(ρ, N^{⊗n}), (20)

where I_c(ρ, M) := H(M(ρ)) − H(M^c(ρ)) denotes the coherent information and H(ρ) := −Tr(ρ log₂ ρ) is the von Neumann entropy. Since Eq. 20 contains an optimization over all possible input states for asymptotically many blocks, it is generally difficult to calculate the quantum capacity. However, this calculation is greatly simplified for degradable and anti-degradable channels. The capacity of anti-degradable channels is zero, which can be intuitively understood from the no-cloning theorem. For degradable channels, the regularized maximal coherent information equals the single-shot coherent information maximized over all possible input states ρ [2]:

Q_N = max_ρ I_c(ρ, N).

Moreover, I_c(ρ, N) is then concave in ρ, allowing its calculation using convex optimization methods. In particular, if N is a rotationally covariant degradable channel on a single bosonic mode (such as pure dephasing or pure loss), we have

I_c(ρ, N) = ∫ (dθ/2π) I_c(e^{iθn̂} ρ e^{−iθn̂}, N) ≤ I_c(ρ̄, N), with ρ̄ = ∫ (dθ/2π) e^{iθn̂} ρ e^{−iθn̂},

where the inequality follows from the concavity of I_c(·, N) for degradable channels, and the dephased state ρ̄ is diagonal in the Fock basis. Therefore, to calculate Q_N, it suffices to optimize only over diagonal states (a similar observation was made in Ref. [21]). When the bosonic mode has finite energy, which is of practical interest, we can define the energy-constrained channel capacity [22]:

Q_N(n̄) = lim_{n→∞} (1/n) max_{ρ: Tr(N̂ρ) ≤ n n̄} I_c(ρ, N^{⊗n}),

where N̂ is the total photon number operator and n̄ is the maximum allowed mean occupation number per channel use.
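The single-shot coherent information can be computed directly from a Kraus representation, using the standard fact that the environment entropy H(N^c(ρ)) equals the entropy of the exchange matrix W_ij = Tr(K_i ρ K_j†). A minimal sketch (the 3-level restriction and Kraus cutoff are our own choices):

```python
import numpy as np
from math import factorial

def entropy(M):
    w = np.linalg.eigvalsh(M)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def coherent_info(rho, kraus):
    # I_c(rho, N) = H(N(rho)) - H(N^c(rho)); the environment entropy equals
    # the entropy of the exchange matrix W_ij = Tr(K_i rho K_j^dagger).
    out = sum(K @ rho @ K.conj().T for K in kraus)
    W = np.array([[np.trace(Ki @ rho @ Kj.conj().T) for Kj in kraus] for Ki in kraus])
    return entropy(out) - entropy(W)

# Example: pure dephasing restricted to a 3-level subspace.
gamma_phi, N = 0.1, 3
n = np.arange(N).astype(float)
Ks = [np.diag(np.sqrt(gamma_phi**k / factorial(k)) * n**k * np.exp(-gamma_phi * n**2 / 2))
      for k in range(40)]
Ic = coherent_info(np.eye(N) / N, Ks)
print(Ic)
```

For this diagonal input the output entropy is log₂ 3, while the environment holds nearly parallel coherent states, so the coherent information is positive but strictly below log₂ 3.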

Previous results
As mentioned earlier, pure-loss and pure-dephasing channels are degradable, making their quantum capacities relatively easy to evaluate. Indeed, photon loss, a prominent form of Gaussian noise, is well understood from a quantum information theoretic viewpoint [18,23]. Its quantum capacity is given by [24]

Q_{N_L[γ]} = max(0, log₂((1−γ)/γ)),

and in the energy-constrained case [18,25],

Q_{N_L[γ]}(n̄) = max(0, g((1−γ)n̄) − g(γn̄)),

where g(n̄) = H(τ(n̄)) = (n̄+1) log₂(n̄+1) − n̄ log₂ n̄ is the entropy of the thermal state τ(n̄) with mean occupation number n̄, namely τ(n̄) = Σ_{n=0}^∞ n̄^n/(1+n̄)^{n+1} |n⟩⟨n|. This implies that thermal states, which are diagonal in the Fock basis and have a geometric photon number distribution, achieve the upper bound on coherent information. It was also shown in Ref. [18] that a multi-mode encoding using GKP states [8] on a 2N-dimensional lattice achieves the quantum capacity up to a constant offset of log₂ e for all γ ≤ 1/2 in the limit N → ∞. Recently, the quantum capacity of the pure-dephasing channel was shown by Lami et al. [26] to be equal to

Q_{N_D[γφ]} = D(p‖u),

where D(p‖u) is the relative entropy between the wrapped normal distribution p (with variance γφ) and the uniform distribution u; the explicit evaluation in Ref. [26] involves the Euler function ϕ(q) := ∏_{k=1}^∞ (1 − q^k).
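The energy-constrained pure-loss formula can be verified numerically against the coherent information of a (truncated) thermal state; the truncation and parameters below are our own choices:

```python
import numpy as np
from math import factorial

def entropy(w):
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def g(x):
    # Entropy of a thermal state with mean occupation x.
    return (x + 1) * np.log2(x + 1) - x * np.log2(x)

gamma, nbar, N = 0.1, 2.0, 60      # truncation N is a numerical choice
n = np.arange(N)
a = np.diag(np.sqrt(n[1:].astype(float)), 1)
Ks = [np.sqrt(gamma**k / factorial(k)) * np.diag((1 - gamma)**(n / 2)) @ np.linalg.matrix_power(a, k)
      for k in range(N)]

p = nbar**n / (1 + nbar)**(n + 1); p /= p.sum()   # truncated thermal distribution
rho = np.diag(p)
out = sum(K @ rho @ K.conj().T for K in Ks)
B = np.array([K @ np.diag(np.sqrt(p)) for K in Ks])
W = np.einsum('iab,jab->ij', B, B.conj())         # W_ij = Tr(K_i rho K_j^dagger)
Ic = entropy(np.linalg.eigvalsh(out)) - entropy(np.linalg.eigvalsh(W))
print(Ic, g((1 - gamma) * nbar) - g(gamma * nbar))
```

The two printed numbers agree up to truncation error, illustrating that the thermal input achieves g((1−γ)n̄) − g(γn̄).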
The energy-constrained capacity of the pure-dephasing channel was numerically analyzed by Arqand et al. [21]. In Appendix A, we show numerically that the coherent information of thermal states approximates the energy-constrained capacity of the pure-dephasing channel.

Quantum capacity of the loss-dephasing channel
The quantum capacity of the loss-dephasing channel N_LD[γ, γφ] is difficult to evaluate when both γ, γφ > 0. Indeed, unlike its constituent parts, the combined channel is not degradable. Similar behavior appears in concatenated qubit channels [16,27]. We can nonetheless provide insight into the quantum capacity by deriving upper and lower bounds.

Non-degradability of the loss-dephasing channel
One of the main results of this work is a proof that the loss-dephasing channel is non-degradable:

Theorem. For any 0 < γ ≤ 1 and γφ > 0, the loss-dephasing channel N_LD[γ, γφ] is not degradable. Moreover, it is anti-degradable for γ ≥ 1/2.
Proof. We briefly outline the proof of the theorem. A complete proof based on four lemmas is provided in appendix B.
According to Lemma 1 and Lemma 2, for γ < 1 there is a Stinespring dilation of N_LD[γ, γφ] with a unique degradation map D given by Eq. 42, while for γ = 1 no such map exists. Since all dilations differ by an isometry on the environment, the existence of a degradation channel for one dilation implies its existence for all dilations. We then prove in Lemma 4 that the map D is not a quantum channel for 0 < γ < 1. To prove the second part of the theorem, we rely on the fact that for γ ≥ 1/2, the loss-dephasing channel is a concatenation of an anti-degradable channel N_L[γ] with another channel. Therefore, the information lost to the first environment (E_l) is sufficient to simulate the entire channel, proving its anti-degradability. In more detail, since N_L[γ] is anti-degradable for γ ≥ 1/2, there exists an anti-degradation channel D such that N_L[γ] = D ∘ N^c_L[γ]. Using the Stinespring dilation of Eq. 19, we can write the complementary channel of the combined channel in terms of Û_l and Û_d, where Û_l acts only on X and E_l and Û_d acts only on X and E_d. Combining these expressions, we obtain a channel that reproduces N_LD[γ, γφ] from its complementary channel, proving anti-degradability for γ ≥ 1/2.

Data processing upper bound
Since the loss-dephasing channel can be written as the composition of a loss channel and a dephasing channel, a data processing argument [2] implies that its capacity is at most that of each of its two constituent channels. This yields the upper bound

Q_{N_LD[γ,γφ]} ≤ Q_data processing := min(Q_{N_L[γ]}, Q_{N_D[γφ]}). (27)
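This bound can be evaluated numerically. The sketch below (parameters, grid, and wrapping cutoff are our own choices) uses the unconstrained pure-loss capacity log₂((1−γ)/γ) and evaluates the dephasing capacity D(p‖u) quoted above by direct integration of the wrapped normal distribution:

```python
import numpy as np

gamma, gamma_phi = 0.1, 0.1   # illustrative rates

# Unconstrained pure-loss capacity for gamma < 1/2 [24].
Q_loss = np.log2((1 - gamma) / gamma)

# Pure-dephasing capacity D(p||u) [26]: relative entropy between the wrapped
# normal distribution p (variance gamma_phi) and the uniform distribution.
phi = np.linspace(-np.pi, np.pi, 200000, endpoint=False)
dphi = phi[1] - phi[0]
p = sum(np.exp(-(phi - 2 * np.pi * j)**2 / (2 * gamma_phi))
        for j in range(-5, 6)) / np.sqrt(2 * np.pi * gamma_phi)
Q_deph = float(np.sum(p * np.log2(2 * np.pi * p)) * dphi)

Q_upper = min(Q_loss, Q_deph)
print(Q_upper)
```

For small γφ the wrapping correction is negligible and D(p‖u) reduces to log₂(2π) minus the differential entropy of a Gaussian with variance γφ, which provides a convenient cross-check.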

Single-mode lower bound
The coherent information of any single-mode input state provides a lower bound on the capacity. A good choice for a representative state is a thermal state that saturates the energy bound, since the coherent information of this state equals the capacity in the pure-loss case and approximates the capacity in the pure-dephasing case (see appendix A). More generally, we can optimize the coherent information over all diagonal states:

Q_{N_LD} ≥ Q_diagonal := max_{ρ diagonal, Tr(n̂ρ) ≤ n̄} I_c(ρ, N_LD) ≥ Q_thermal := I_c(τ(n̄), N_LD),

where Q_diagonal and Q_thermal are lower bounds obtained by calculating the one-shot coherent information using the optimal diagonal state and the thermal state τ(n̄), respectively.
Since the loss-dephasing channel is not degradable, the one-shot coherent information is no longer concave, and we are not guaranteed to obtain a single maximum. As a result, a numerical optimization over diagonal states might not converge to a global maximum. Furthermore, while the channel is covariant to rotations, we cannot use a concavity argument to show that a diagonal state achieves the optimal single-mode coherent information. However, as we will show later, Q_diagonal and Q_thermal are tight lower bounds on the capacity for γ ≪ 1 or γφ ≪ 1.
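Q_thermal can be estimated numerically by composing the Kraus representations of the two constituent channels; all truncations, cutoffs, and rates below are our own illustrative choices:

```python
import numpy as np
from math import factorial

def entropy(w):
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

gamma, gamma_phi, nbar, N = 0.1, 0.05, 1.0, 20   # illustrative parameters
n = np.arange(N)
a = np.diag(np.sqrt(n[1:].astype(float)), 1)

loss_K = [np.sqrt(gamma**k / factorial(k)) * np.diag((1 - gamma)**(n / 2)) @ np.linalg.matrix_power(a, k)
          for k in range(N)]
deph_K = [np.diag(np.sqrt(gamma_phi**k / factorial(k)) * n.astype(float)**k * np.exp(-gamma_phi * n**2 / 2))
          for k in range(50)]
Ks = [D @ L for L in loss_K for D in deph_K]       # Kraus operators of N_D ∘ N_L

p = nbar**n / (1 + nbar)**(n + 1); p /= p.sum()    # truncated thermal distribution
rho = np.diag(p)
out = sum(K @ rho @ K.conj().T for K in Ks)
B = np.array([K @ np.diag(np.sqrt(p)) for K in Ks])  # W_ij = Tr(K_i rho K_j^dagger) = Tr(B_i B_j^dagger)
W = np.einsum('iab,jab->ij', B, B.conj())
Q_thermal = entropy(np.linalg.eigvalsh(out)) - entropy(np.linalg.eigvalsh(W))
print(Q_thermal)
```

By the quantum data-processing inequality, the result must lie below the pure-loss value g((1−γ)n̄) − g(γn̄), and it remains positive for these mild rates.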

Comparison of bounds
Here, we compare the previously derived bounds on the quantum capacity of the loss-dephasing channel for several dephasing rates γφ ∈ {0.001, 0.01, 0.1}, mean energies n̄ ∈ {2, 5}, and various loss rates γ ∈ [0.01, 0.2]. The results of this comparison are shown in Fig. 1. The lower bound Q_thermal is almost as tight as the lower bound Q_diagonal obtained by optimizing over diagonal states, except for large γ and γφ, i.e., γ, γφ ≳ 0.1. Both the lower bounds and the upper bounds are tight when the joint channel is dominated by either loss or dephasing, i.e., γ → 0 or γφ → 0. However, the bounds become looser when both γ and γφ are large, as the joint channel deviates further from a degradable channel. Overall, both the lower and upper bounds increase as more energy is allowed for the bosonic mode. Another feature of the data processing upper bound Q_data processing is that, given γφ, Q_data processing first increases as γ decreases and then saturates when γ ≪ γφ. This feature appears because Q_data processing is, by definition, the minimum of the separate pure-loss and pure-dephasing channel capacities (see Eq. 27), which equals the latter (as a constant) when dephasing is dominant.

Numerically optimized error correction codes
Tailored encoding and decoding operations are required to faithfully transmit quantum information over a noisy channel N_{A→B}. We use the entanglement fidelity [2] of the composite encoding-noise-decoding channel E = R ∘ N ∘ S (where S: M_A → A and R: B → M_B, and M_A ≅ M_B are identical message spaces available to A and B, respectively) as the figure of merit characterizing how well the information is preserved through E. The optimal encoding and decoding strategy is thus given by

(S, R)_opt = argmax_{(S,R)} F_{R∘N∘S},

where F_E is the entanglement fidelity of the composite channel, defined as the overlap between a maximally entangled state |Γ⟩ = (1/√dim M_A) Σ_i |i⟩|i⟩ and the state ρ_E obtained after one part of |Γ⟩ is transmitted through E:

F_E = ⟨Γ| ρ_E |Γ⟩, with ρ_E = (I ⊗ E)(|Γ⟩⟨Γ|),

where {|i⟩} is a basis of M_A ≅ M_B. Equivalently, F_E = Σ_i |Tr K̂_i|²/d² for any Kraus representation {K̂_i} of E, with d = dim M_A.
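The two expressions for the entanglement fidelity (Choi-state overlap and Kraus-trace form) can be cross-checked on a randomly generated channel; the dimensions below are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
d, nk = 4, 3   # message dimension and number of Kraus operators (illustrative)

# Random CPTP channel: slice Kraus operators out of a random isometry (QR trick).
G = rng.normal(size=(nk * d, d)) + 1j * rng.normal(size=(nk * d, d))
V, _ = np.linalg.qr(G)                   # V: (nk*d, d), V^dagger V = I_d
Ks = [V[i * d:(i + 1) * d, :] for i in range(nk)]

# Entanglement fidelity via the Choi state: F = <Gamma| (I x E)(|Gamma><Gamma|) |Gamma>.
Gamma = np.eye(d).reshape(d * d) / np.sqrt(d)    # |Gamma> = (1/sqrt(d)) sum_i |i i>
rho_E = sum(np.kron(np.eye(d), K) @ np.outer(Gamma, Gamma.conj()) @ np.kron(np.eye(d), K).conj().T
            for K in Ks)
F_choi = np.real(Gamma.conj() @ rho_E @ Gamma)

# Equivalent closed form: F = (1/d^2) sum_i |Tr K_i|^2.
F_kraus = sum(abs(np.trace(K))**2 for K in Ks) / d**2
assert np.isclose(F_choi, F_kraus)
print(F_choi)
```

The closed form is what makes the fidelity a convenient convex objective in the optimization described below.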
If A is a bosonic system, we can modify this definition to handle energy-constrained encodings by defining

(S, R)^{≤n̄}_opt = argmax_{(S,R): Tr(n̂ S(π)) ≤ n̄} F_{R∘N∘S},

where π is the maximally mixed state on M_A. Optimizing over either of the channels S or R while keeping the other fixed is a convex optimization problem. As such, each subproblem has a global optimum that can be computed efficiently using semidefinite programming [17]. This insight leads to an intuitive biconvex optimization algorithm in which we iteratively find the optimal decoding given an updated encoding, and vice versa. However, the joint problem is biconvex rather than convex. As a result, there might be cases where the algorithm does not converge or where the converged encoding depends on the chosen initial encoding.
Our numerical procedure optimizes the entanglement fidelity, but does not consider other important figures of merit. These include the performance of the encoding and decoding procedures themselves, or the difficulty in their experimental realization. In addition, the constraint on the average photon number fails to take into account the spread of the codes in Fock space. For example, the geometric distribution of GKP codes occupies a much larger number of energy levels than Poissonian-distributed cat codes with the samen. Finally, increasing the entanglement fidelity of a code might not always increase its capacity to transmit information, as we will show in section 5.2.
For further details and results on the pure-loss channel, we refer to section 5 in Ref. [18]. In the next section, we expand on these results by adding dephasing noise. We find that the optimization algorithm consistently converges to unique encodings for various loss and dephasing rates.

Discussion of optimization results
To gain insight into the structure of numerically optimized qubit codes, we perform the optimization for various loss and dephasing rates. Since the loss-dephasing channel is covariant under rotations, rotating a code does not alter its performance. For the pure-dephasing channel, the optimized codes are also covariant under diagonal unitaries. Therefore, we regularize the codes accordingly after the optimization process (see Appendix C.1).
As long as the error rates are sufficiently high and the energy constraint sufficiently low, we consistently converge to the same optimal codes for a given triplet (γ, γφ, n̄). The optimization results are shown in Fig. 2. Each plot represents an optimization result for a specific pair of loss and dephasing rates, that is, a local maximum of the entanglement fidelity. These local maxima are most likely also global maxima, since we observe that optimization runs with different randomly chosen initial codes converge to the same optimal code.
As we allow for higher n̄ or lower γ, γφ, the optimal value of the entanglement fidelity approaches one and the optimization landscape becomes shallower, allowing more local maxima with similar entanglement fidelities to appear. This causes the optimization result to depend on the initial state. Such parameter ranges are not considered and are represented in Fig. 2 by a shaded region.
A key takeaway from these results is that while numerically optimized codes for pure-loss or pure-dephasing channels saturate the mean energy constraint, this is not generally the case. Indeed, for the combined loss-dephasing channel with γ, γφ > 0, the numerically optimized codes have a particular mean energy that does not vary when allowing higher energies.

Figure 2: Wigner plots of the maximally mixed states of the optimal codes. The codes are obtained using a biconvex optimization process for different rates of loss and dephasing under the energy constraint n̄ ≤ 9. The plotted codes are consistently obtained from various randomly chosen initial codes. The shaded region represents a low-error range in which multiple local optima exist with entanglement fidelity approaching unity. The energy constraint is saturated when either the dephasing or loss rate is zero, namely for (a), (f), (o), and (p). The remaining codes have optimal mean energies, which we specify in Appendix C.

Comparison with known codes
Various quantum error-correcting codes have been previously developed to protect against bosonic noise. For a short overview of GKP codes (which protect well against loss) [8], rotation codes [14] and numerical codes [19], see appendix D. Here, we show how these codes compare to the numerically optimized codes from this work.
The optimization results depicted in Fig. 2 demonstrate the interplay between loss and dephasing and indicate the ranges of γ, γφ in which each of the GKP, cat, and numerical codes excels [28]. The optimal codes for pure-loss noise (Figs. 2(o) and 2(p)) closely resemble hexagonal GKP codes, as previously observed by Noh et al. [6]. These optimal codes saturate the energy constraint, suggesting that GKP codes with more photons are better suited for dealing with loss.
In the case of pure-dephasing noise (Figs. 2(a) and 2(f)), we observe that two-legged cat codes and squeezed two-legged cat codes [29] provide optimal protection. Similarly to the GKP codes in the pure-loss case, these codes saturate the energy constraint. This agrees with the known fact that the performance of two-legged cat codes improves as their mean energy increases [20,30-32]. Squeezed two-legged cat codes have also been found to provide limited protection against photon loss in addition to dephasing [29].

Figure 3: Entanglement fidelities F_E versus loss rate γ for various dephasing rates γφ (columns) and energy constraints n̄ (rows). The plots compare the performance of biconvex optimized qubit codes (blue) with optimal members of known qubit encoding families: four-legged cat qubit codes (orange), hexagonal GKP qubit codes (green), and numerical codes (red). In panel (d), the lines corresponding to GKP codes and the optimized codes nearly overlap.
Figs. 2(g) and 2(h) correspond to an encoding in which the code words are two-legged cats with opposite parities and orthogonal orientations in phase space. This encoding differs from the regular two-legged cat code, in which the orientations of the code words are identical. The separation of the code words in phase space offers protection against single-photon loss, in addition to the code's suppression of dephasing errors (a similar observation was made in Ref. [33]). The modified two-legged cat code is also superior to the four-legged cat code [4] in that it also makes use of odd Fock states.
For all codes with nonzero dephasing and loss (i.e., γ ≥ 0.01, γφ ≥ 0.1 or γφ ≥ 0.01, γ ≥ 0.1), we observe that the energy constraint n̄ ≤ 9 is not saturated (see Table 1 in Appendix C). Instead, these codes are characterized by varying optimal mean energies for different error rates. This situation is similar to the family of numerical codes [19], which consists of five different codes categorized by mean occupation number. However, unlike the numerical codes, our optimization process takes the error rates into account, resulting in different optimal codes for different γ, γφ. Of these, (i), (k), (l), and (n) appear to be similar to some of the numerical codes (see Fig. 10 in the Appendix).
We also observe that certain codes have explicit rotational symmetry. For example, Figs. 2(k) and 2(l) are symmetric with respect to rotations by 2π/3. However, these codes are not rotation codes [14]. Indeed, whereas the Fock-state distribution of the code words obeys 2N -modularity, the remainders are not restricted to 0 and N as is the case for rotation codes (see Appendix D.2).

Entanglement fidelity
To evaluate the performance of the numerically optimized qubit codes, we compare their entanglement fidelities to those of the other major code families. Specifically, we use the following codes: hexagonal GKP codes as representatives of GKP codes, four-legged cat codes as representatives of rotation codes, and finally, the five numerical codes presented in Ref. [19].
Within each of the code families, we choose the particular instance of the family that maximizes the entanglement fidelity, i.e.,

S* = argmax_S F_{R*∘N_LD[γ,γφ]∘S},

where S is a four-legged cat, hexagonal GKP, or numerical code qubit encoding with Tr(n̂ S(π)) ≤ n̄, and R* = argmax_R F_{R∘N_LD[γ,γφ]∘S} is the optimal decoding channel for each encoding. The results of this comparison are shown in Fig. 3. In all cases, we find that the numerically optimized codes have the highest entanglement fidelity, as expected. For low dephasing rates (γφ = 0.001), hexagonal GKP codes offer the highest fidelity among the considered code families, while for high dephasing rates (γφ = 0.1) cat codes perform better. The numerical codes offer good results for intermediate dephasing γφ = 0.01 and sufficiently large n̄.

Hashing bound
The bosonic codes derived in the previous section can be used to reliably communicate information over the loss-dephasing channel. In this section, we study how close the communication rate of each code comes to the quantum capacity of the channel Q_N. While the achievable communication rate Q_E of an encoding-noise-decoding channel E is difficult to determine, it is bounded from below by the information-theoretic quantity known as the hashing bound, which we take to be the coherent information of the maximally mixed state, D_E := I_c(π, E) (see also Ref. [2]). From the definition, we see that D_E ≤ log₂ dim H_M. Therefore, to meaningfully compare the communication rates of the codes to the theoretical limit Q_N, we generalize the qubit codes used earlier to qudit codes with dimension d = dim H_M ≥ 2^{Q_N}. Specifically, we use qudit codes obtained through biconvex optimization of the entanglement fidelity, as well as known qudit code families, such as hexagonal GKP qudits (Eq. 48 in the Appendix) and 2d-legged cat qudits (Eq. 50).
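As a consistency check of taking the hashing bound to be the coherent information of the maximally mixed input, the qubit depolarizing channel reduces to the familiar 1 − H(p) hashing rate. A minimal sketch (the error rate is an arbitrary choice of ours):

```python
import numpy as np

def entropy(M):
    w = np.linalg.eigvalsh(M)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

p = 0.1   # depolarizing probability (illustrative)
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Ks = [np.sqrt(1 - 3 * p / 4) * I2, np.sqrt(p / 4) * X, np.sqrt(p / 4) * Y, np.sqrt(p / 4) * Z]

pi = I2 / 2
out = sum(K @ pi @ K.conj().T for K in Ks)
W = np.array([[np.trace(Ki @ pi @ Kj.conj().T) for Kj in Ks] for Ki in Ks])
D_E = entropy(out) - entropy(W)   # hashing bound = coherent information of pi

probs = np.array([1 - 3 * p / 4, p / 4, p / 4, p / 4])
print(D_E, 1 + (probs * np.log2(probs)).sum())
```

Because the Pauli Kraus operators are orthogonal, the exchange matrix W is diagonal with the Pauli error probabilities on the diagonal, which is exactly why the familiar formula emerges.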
In Fig. 4, we plot the optimal hashing bounds

max_S D_{R*∘N_LD[γ,γφ]∘S},

where S ranges over the qudit encodings of the respective code families with 2 ≤ d ≤ 8 and mean energy Tr(n̂ S(π)) ≤ n̄, and R* is the corresponding optimal decoding channel. The results indicate that the biconvex optimized codes and the GKP codes have very similar hashing bounds for all the considered parameter regimes. In addition, the hashing bounds are close to the previously derived lower bounds on the capacity of the loss-dephasing channel (Fig. 1). In some cases, we observe that the hashing bound of the hexagonal GKP code surpasses that of the biconvex optimized code. As previously observed in Ref. [5], optimizing the code for entanglement fidelity, as in our biconvex optimization scheme, does not necessarily imply an optimal hashing bound [35]. For example, introducing noise structure (e.g., biased noise) to E at the cost of slightly reduced entanglement fidelity after decoding the inner bosonic code may result in a more efficient outer code using multiple copies of E. This procedure might lead to a better hashing bound than the one associated with the bosonic code achieving maximal entanglement fidelity.
Finally, the relatively low hashing bound of the cat qudit code is due to the fact that for k = 0, ..., d − 1, the code word |k⟩_cat has support on Fock states equal to 2k modulo 2d (see section D.2 in the Appendix). Therefore, the mean energy of the code is greater than d − 1, limiting the hashing bound to roughly log₂ n̄. This implies that rotation codes are less compressed than other codes and require a higher mean energy to provide the same capacity.

Conclusion
We presented a study of the bosonic loss-dephasing channel from multiple perspectives. We showed that, unlike the pure-loss (for γ ≤ 1/2) and pure-dephasing channels, the loss-dephasing channel is not degradable, complicating the calculation of its capacity. To that end, we provided upper and lower bounds that prove to be tight for realistic values of γ, γφ. Next, we used a biconvex optimization scheme with an energy constraint to find numerically optimized codes for various loss and dephasing rates. We observed that two-legged cat codes are well suited for dephasing errors, while hexagonal GKP codes handle loss errors well. These two edge cases saturate the energy constraint. In contrast, when both loss and dephasing are present, the energy constraint is not saturated, and codes resembling the numerical codes emerge. The optimization procedure reveals a phase space of codes that vary non-smoothly from GKP codes to cat codes. Finally, we connected the two perspectives using the hashing bound and showed that the single-mode biconvex optimized codes give rise to a satisfactory lower bound on the capacity. This implies that the optimized codes can be used for quantum communication over the loss-dephasing channel at a relatively high rate.
The remaining open questions include whether the channel is anti-degradable in a nontrivial error range (not just for γ ≥ 1/2). We conjecture that the channel is not anti-degradable for γ < 1/2 and suggest that this may be proven using methods similar to the ones used here. Furthermore, good analytical bounds on the capacity remain to be found. One may also attempt to prove or disprove that for any γ, γφ > 0, there exists a code with finite energy that maximizes the entanglement fidelity of N_LD[γ, γφ] and, if so, estimate its energy. Finally, this work does not consider the implementation of encoding, error correction, and decoding procedures. However, our results may be a good starting point for finding codes that are both experimentally feasible and perform well under realistic loss-dephasing noise.

A Quantum capacity of the pure-dephasing channel
The energy-constrained quantum capacity of the dephasing channel can be evaluated using a numerical convex optimization procedure. The optimization can be limited to diagonal states due to the covariance of this channel with respect to phase-space rotations. We observe that the photon number distribution of the optimal input state resembles the Poissonian distribution of coherent states in shape, although it may have a different variance (see Fig. 5). Interestingly, for a large range of parameters, the coherent information of thermal states τ(n̄) yields an excellent approximation to the quantum capacity (see Fig. 6).

B Non-degradability of the loss-dephasing channel

The aim of this appendix is to give a rigorous proof of the non-degradability of the loss-dephasing channel (Theorem 3.2.1) through a number of lemmas.

Lemma 1. (…) is the Stinespring dilation of a loss-dephasing channel, and the complementary channel N^c_LD[γ, γ_φ] is given by (…).

Proof. We start by calculating the state U|n00⟩_{XE_lE_d} for any n ≥ 0. Combining the two identities, we find that, for all n ≥ 0, (…). We can now evaluate the isometry on an arbitrary operator basis element |n⟩⟨m|: (…). Tracing over the system X forces k = l, which yields the desired result for the complementary channel. Similarly, tracing over the systems E_d, E_l forces n − m = k − l; using the overlap between the coherent states, (…).

Next, if we assume by contradiction that the loss-dephasing channel is degradable, then the degrading channel must take the form presented in the following lemma.
Proof. We start by considering maximum photon loss, i.e., γ = 1. In this case, the channel N_LD[γ, γ_φ] maps every state to the vacuum state. Since the complementary channel N^c_LD[γ, γ_φ] = I_{X→E_l} ⊗ |0⟩⟨0|_{E_d} is not a constant channel, the degrading map D does not exist. Now assume that γ < 1. Then, for all integers ∆, n ≥ 0, (…). Using the previous lemma, we obtain (…). With the abbreviated notation (…), the previous equation simplifies to a sum Σ_{k=0}^{n} (…). This allows one to invert the expression and obtain X_{n,∆} = Σ_{k=0}^{n} (…) by induction on n. Using the definition of X_{k,∆}, we can rewrite D(|n⟩⟨n + ∆|) as (…). Taking the adjoint of both sides of this equation gives an expression for D(|n + ∆⟩⟨n|), so that Eq. (42) holds, as required.
To prove that the channel is not degradable, we evaluate the degradation super-operator on coherent states and show that it can map them to non-positive operators. We start with the following lemma.

Lemma 3. For γ < 1 and α ∈ ℂ, the map D satisfies (…).

Substituting r = n − t, s = m − t, and q = t − l, we obtain (…).

The last lemma, which concludes the proof, shows that α can be chosen such that D(|α⟩⟨α|) is not a state; hence D is not a quantum channel.

Lemma 4. If 0 < γ < 1 and γ_φ > 0, then D is not a quantum channel.
Proof. It suffices to show that D is not a positive map. Let n ∈ ℕ and α ∈ ℂ be such that (…). Since ⟨0|σ|0⟩ > 0, this shows that D(|α⟩⟨α|) is not positive semidefinite. Therefore, D is not a positive map and, in particular, not a quantum channel.

C Biconvex optimization of QEC codes for the loss-dephasing channel
We use biconvex optimization to find a local maximum of the entanglement fidelity in a complicated optimization landscape. For the process to converge, it must run for many iterations, and the point to which it converges depends on the starting point.
For Fig. 2, we ran the optimization process for around 3000 iterations per plot. This process was repeated ten times with a different randomized starting code in each repetition. We then selected the optimal result for each pair of γ, γ_φ, while verifying that the other repetitions yield similar codes. This consistency criterion indicates that the local optimum might also be a global optimum. The procedure works well when either γ or γ_φ is large. However, when both parameters are small, the landscape becomes shallow, making it hard to converge to a single global maximum and leading to different optimization results for different starting codes. For this reason, we chose not to include the cases where both γ and γ_φ are below 0.1.
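As a toy illustration of this restart-and-select strategy, the sketch below runs a crude stochastic hill climb from ten random starting points on a hypothetical two-peak landscape. It merely stands in for the actual biconvex solver, whose iterations are convex subproblems rather than random steps:

```python
import numpy as np

rng = np.random.default_rng(0)

def fidelity_proxy(x):
    """Toy multimodal objective standing in for the entanglement fidelity:
    a global maximum of 1 near (2, 2) and a local maximum of 0.5 near (-2, -2)."""
    return np.exp(-np.sum((x - 2.0) ** 2)) + 0.5 * np.exp(-np.sum((x + 2.0) ** 2))

def local_ascent(x0, steps=3000, sigma=0.1):
    """Crude stochastic hill climb; accepts a step only if it improves."""
    x, f = x0, fidelity_proxy(x0)
    for _ in range(steps):
        cand = x + rng.normal(scale=sigma, size=x.shape)
        fc = fidelity_proxy(cand)
        if fc > f:
            x, f = cand, fc
    return x, f

# Ten randomized starting points; keep the best and compare the top runs
results = sorted((local_ascent(rng.uniform(-4, 4, size=2)) for _ in range(10)),
                 key=lambda r: -r[1])
best_x, best_f = results[0]
print(round(best_f, 3))
```

Runs that converge to different local maxima mimic the inconsistent results seen when both γ and γ_φ are small.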
As discussed in the main text, the optimal codes do not always saturate the energy constraint (see Table 1). In some of the cases where either γ = 0 or γ_φ = 0, the mean energy n̄ comes close to the constraint but does not saturate it. This is most likely due to numerical limitations of the optimization and the limited number of energy levels used (dim H_X = 22).

C.1 Regularization process for comparison of numerically optimal codes
The loss-dephasing channel is invariant under phase-space rotations, which causes optimization results with different starting points to be similar only up to a rotation. To tackle this, we regularize the codes by concatenating them with e^{iθn̂} · e^{-iθn̂} and imposing the condition that a particular off-diagonal element of the density matrix of the maximally mixed state of the resulting code is positive. Similarly, the pure-loss channel is invariant under rotations, and the same regularization is applied there. The pure-dephasing channel, however, is also invariant under diagonal unitaries. In that case, we therefore regularize the code by concatenating it with e^{iD̂} · e^{-iD̂}, where D̂ is a diagonal matrix with real entries. These entries are chosen such that a single off-diagonal element in each row of the density matrix of the maximally mixed state becomes positive. This results in codes such as those plotted in Figs. 2(a) and 2(f).
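A minimal sketch of the rotation regularization, assuming the code is given as a set of Fock-basis vectors; the designated element (m, n) = (0, 1) is an illustrative convention:

```python
import numpy as np

def regularize_rotation(codewords, m=0, n=1):
    """Rotate a bosonic code so that element (m, n) of its maximally mixed
    state becomes real and positive (an illustrative fixing convention).

    codewords: array of shape (d, dim) of Fock-basis code vectors.
    Conjugating by exp(-i*theta*nhat) multiplies rho[m, n] by
    exp(i*theta*(n - m)), so theta = -arg(rho[m, n]) / (n - m) works.
    """
    d, dim = codewords.shape
    rho = sum(np.outer(c, c.conj()) for c in codewords) / d
    theta = -np.angle(rho[m, n]) / (n - m)
    phases = np.exp(-1j * theta * np.arange(dim))
    return codewords * phases[None, :]

# Toy check on a randomly generated (hypothetical) two-word code
rng = np.random.default_rng(1)
c0 = rng.normal(size=8) + 1j * rng.normal(size=8)
c1 = rng.normal(size=8) + 1j * rng.normal(size=8)
code = np.array([c0 / np.linalg.norm(c0), c1 / np.linalg.norm(c1)])
reg = regularize_rotation(code)
rho = sum(np.outer(c, c.conj()) for c in reg) / 2
print(rho[0, 1])  # real and positive up to numerical error
```

Since the rotation only attaches a phase e^{-iθk} to each Fock component, it preserves the code's error-correction properties under rotation-covariant noise.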

C.2 Optimization results for the pure-loss channel with low energy constraints
Hexagonal GKP codes emerge as optimal codes for the pure-loss channel if the energy constraint is chosen sufficiently high. However, if the energy constraint is low (e.g., n̄ = 6 or n̄ = 7) and the loss rate is sufficiently high (e.g., γ = 0.2), the optimization converges to a code with five-fold rotation symmetry (see Fig. 7). A possible explanation is that, as with the GKP code, a phase-space tiling protects against random shifts. However, due to the energy constraint, the tiling is not shift-invariant and distorts away from the origin. A hyperbolic plane, which can be tiled by pentagons, may be a good model for this phenomenon, consistent with the five-fold symmetry we observe.

D Overview of bosonic error correction codes

D.1 GKP codes

GKP codes [8] are lattice codes that are particularly well suited for handling phase-space displacements and bosonic loss noise [5]. GKP codes can be either single-mode (as used in this work) or defined over N modes. In the N-mode case, the code words can be seen as superpositions of infinitely many N-mode coherent states centered around the points of a non-degenerate lattice in the 2N-dimensional phase space. As such, they have infinite energy. A common way to obtain a finite-energy version of the code is to multiply the coherent states by weights from a Gaussian envelope, with narrower envelopes corresponding to a lower mean energy of the code.
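The envelope construction can be sketched numerically. The following is only a caricature (it sums coherent states with equal phases on a square lattice, ignoring the phases of the lattice displacements; the spacing and envelope widths are illustrative), but it reproduces the stated trade-off between envelope width and mean energy:

```python
import numpy as np

def coherent_vec(alpha, dim):
    """Fock-basis coefficients of the coherent state |alpha> (truncated)."""
    c = np.zeros(dim, dtype=complex)
    c[0] = np.exp(-abs(alpha) ** 2 / 2)
    for k in range(1, dim):
        c[k] = c[k - 1] * alpha / np.sqrt(k)
    return c

def gkp_envelope_state(delta, spacing, dim=60, cutoff=4):
    """Equal-phase sum of coherent states on a square lattice, damped by the
    Gaussian envelope exp(-delta^2 |alpha|^2). A larger delta means a
    narrower envelope in phase space."""
    psi = np.zeros(dim, dtype=complex)
    for m in range(-cutoff, cutoff + 1):
        for n in range(-cutoff, cutoff + 1):
            alpha = spacing * (m + 1j * n)
            psi += np.exp(-delta ** 2 * abs(alpha) ** 2) * coherent_vec(alpha, dim)
    return psi / np.linalg.norm(psi)

nbar = lambda psi: float(np.sum(np.arange(len(psi)) * np.abs(psi) ** 2))
spacing = np.sqrt(2 * np.pi)  # illustrative square-lattice spacing
wide, narrow = gkp_envelope_state(0.3, spacing), gkp_envelope_state(0.5, spacing)
print(nbar(wide), nbar(narrow))  # narrower envelope -> lower mean energy
```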
In particular, for the hexagonal GKP qudits G_{d,∆} (with |µ⟩_{G_{d,∆}} = |µ⟩^∆_{V,d}) referred to in the main text, the lattice is given by Eq. (48). GKP codes can be seen as stabilizer codes generated by 2N commuting displacements D̂_{dv_1}, D̂_{v_2}, …, D̂_{v_{2N}}. Therefore, not every lattice can be used to define a valid GKP code. In the single-mode case, for example, this requirement translates to |det(V)| = 2π, so that a GKP code with a square lattice will have v_1 = √(2π)(1, 0)ᵀ and v_2 = √(2π)(0, 1)ᵀ. The condition on V for N > 1 is more involved and is discussed, for example, in Section 2.4.3 of Ref. [7]. The performance of a GKP code depends on the lattice that defines it: lattices with a higher sphere-packing ratio (the ratio between the volume of the largest 2N-sphere inscribed in the Voronoi cell and the volume of the cell itself) perform better. This means, for example, that a single-mode GKP code defined on a hexagonal lattice will outperform one defined on a square lattice (see Section 4 of Ref. [18]). Both codes are presented in Fig. 8.
Ref. [18] also shows that GKP codes perform well under bosonic loss in two crucial ways. First, infinite-energy multi-mode GKP codes achieve a quantum communication rate that is offset by at most log_2 e from the capacity of the channel in the limit where the number of modes goes to infinity. This holds for any loss rate γ between 0 and 1/2. Second, the biconvex optimization process described earlier usually converges to a state resembling a finite-energy hexagonal GKP state when only loss is present. However, as discussed in Appendix C, for certain energy constraints it can also converge consistently to other codes. For example, a code with five-fold symmetry can emerge, probably due to the requirement of tiling a bounded region of phase space.

D.2 Rotation codes
A single-mode bosonic qudit encoding with code words |k⟩_C, k = 0, …, d − 1, is called a rotation code [14] with symmetry N if the code words are invariant under the rotation R̂_N = e^{2iπn̂/N} and if the rotation R̂_{dN} = e^{2iπn̂/(dN)} acts as a logical Ẑ on the qudit. Cat codes and binomial codes are examples of rotation codes (Fig. 9). Ref. [14] provides a scheme for performing encoding, decoding, and gates on such codes based on controlled-rotation gates and phase measurements.
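A minimal numerical check of these definitions for the two-legged cat qubit (d = 2, N = 1; the truncation and the value of α are illustrative choices):

```python
import numpy as np

def coherent_vec(alpha, dim):
    """Fock-basis coefficients of the coherent state |alpha> (truncated)."""
    c = np.zeros(dim, dtype=complex)
    c[0] = np.exp(-abs(alpha) ** 2 / 2)
    for k in range(1, dim):
        c[k] = c[k - 1] * alpha / np.sqrt(k)
    return c

dim, alpha, d, N = 40, 2.0, 2, 1    # two-legged cat qubit: d = 2, symmetry N = 1
plus = coherent_vec(alpha, dim) + coherent_vec(-alpha, dim)    # even cat, |0_C>
minus = coherent_vec(alpha, dim) - coherent_vec(-alpha, dim)   # odd cat,  |1_C>
plus, minus = plus / np.linalg.norm(plus), minus / np.linalg.norm(minus)

n = np.arange(dim)
rot = lambda order: np.exp(2j * np.pi * n / order)  # diagonal of exp(2i*pi*nhat/order)

# R_N (the identity for N = 1) leaves the code words invariant, while
# R_dN = exp(i*pi*nhat) acts as a logical Z: +1 on |0_C> and -1 on |1_C>.
z0 = np.vdot(plus, rot(d * N) * plus)
z1 = np.vdot(minus, rot(d * N) * minus)
print(z0.real, z1.real)  # ~ +1 and -1
```

The even and odd cats have disjoint Fock support (even vs. odd photon numbers), which is what makes e^{iπn̂} act as a logical Ẑ and the code words exactly orthogonal.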
The logical code words of a 2d-legged cat qudit, as referred to in the main text, are defined by (…), where |α⟩ is a coherent state with α ≠ 0. Since the code word |k⟩_C of a rotation code is supported on Fock states equal to kN modulo dN, rotation codes provide protection (or at least detection) against up to N photon loss events. If the code is also a number-phase code, that is, if the phases of the code words in the conjugate basis are well localized (modulo a rotation), then the phases of the code words are also well separated. This property protects the code against dephasing errors. Number-phase codes are analogous to GKP codes: the former use photon number-phase duality to define the code, whereas the latter rely on position-momentum duality. In our numerical optimizations, we found that for pure dephasing (γ = 0) we converge to N = 1 rotation codes with code words that resemble squeezed coherent states. However, when adding loss, we sometimes converge to codes that are invariant under rotations but are not rotation codes; indeed, the condition that the logical Ẑ operator be a rotation may be too restrictive. Examples of such codes include GKP codes and the numerically optimized codes plotted in Fig. 7 and in Figs. 2(k) and 2(l). More precisely, a qudit code C has rotation symmetry N (as opposed to being a rotation code) if its maximally mixed state ρ̂_C = (1/d) Σ_{k=0}^{d−1} |k⟩_C⟨k|_C commutes with R̂_N. In that case, the code words can be chosen to lie in the rotation-invariant subspaces, as the following argument shows.

Proof. The operators ρ̂_C and R̂_N commute, and both are diagonalizable. Therefore, they are simultaneously diagonalizable, and we can find eigenstates of ρ̂_C that are also eigenstates of R̂_N. Since the eigenvalues of ρ̂_C are 1/d, …, 1/d, 0, …, we obtain d eigenstates |k⟩_C of ρ̂_C with eigenvalue 1/d. Because the states |k⟩_C are also eigenstates of R̂_N, each lies in V_{s_k} = span{|l⟩ | l ≡ s_k (mod N)} for some integer s_k. However, since ρ̂_C overlaps with |l_i⟩ for all i = 0, …, d − 1, and the l_i are all different modulo N, the s_k's must all be different. Therefore, |k⟩_C can be chosen to lie in V_{l_k}.

D.3 Numerical codes
Numerical codes, first introduced in Ref. [19], emerge from a numerical optimization scheme different from the one used here. For example, these codes include the √17 code. The cost function of the optimization process depends on the coefficients of the quantum error correction matrix, which is constructed from a finite set of noise operators. In the original article [19] and in Ref. [5], the noise operators were taken to be {Î, â, â²}. The codes are obtained from local minima of the cost function with a penalty on the average energy. Five codes were identified in this manner; we plot the Wigner distributions of their maximally mixed states in Fig. 10. Note that a different set of noise operators would result in different numerically optimal codes. In particular, the errors are all given the same weight, which is not necessarily desirable when considering specific error rates. For example, if the loss rate is low, we should assign a larger weight to Î and â; if the loss rate is high, we should assign a larger weight to â and â², and perhaps even consider higher powers of â.
Since the quantum error correction matrix consists of elements of the form ⟨µ|N̂_i†N̂_j|ν⟩ with N̂_i = â^i, i = 0, 1, 2 in our case, numerical codes are expected to perform well under loss (of up to two photons) and dephasing (whose first nontrivial Kraus operator is ∼ n̂). Indeed, there is a large parameter range for which our biconvex optimization scheme outputs codes similar to numerical codes (e.g., Fig. 2(f)).
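The quantum error correction matrix and a Knill-Laflamme deviation cost can be sketched as follows, using the one-photon-loss binomial code and the reduced noise set {Î, â} for illustration (the actual cost function and codes of Ref. [19] differ in detail):

```python
import numpy as np

dim = 8
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)  # truncated annihilation operator

# One-photon-loss binomial code (an illustrative code, not one of Ref. [19]):
# |0_C> = (|0> + |4>)/sqrt(2), |1_C> = |2>
mu0 = np.zeros(dim); mu0[0] = mu0[4] = 1 / np.sqrt(2)
mu1 = np.zeros(dim); mu1[2] = 1.0
code = [mu0, mu1]

def qec_matrix(code, ops):
    """M[(i,mu),(j,nu)] = <mu| N_i^dag N_j |nu>, the QEC matrix."""
    d, K = len(code), len(ops)
    M = np.zeros((K * d, K * d), dtype=complex)
    for i, Ni in enumerate(ops):
        for j, Nj in enumerate(ops):
            for mu, cm in enumerate(code):
                for nu, cn in enumerate(code):
                    M[i * d + mu, j * d + nu] = cm.conj() @ Ni.conj().T @ Nj @ cn
    return M

def kl_deviation(M, d):
    """Knill-Laflamme: each d x d block must be proportional to the identity."""
    dev, K = 0.0, M.shape[0] // d
    for i in range(K):
        for j in range(K):
            block = M[i * d:(i + 1) * d, j * d:(j + 1) * d]
            dev += np.linalg.norm(block - (np.trace(block) / d) * np.eye(d)) ** 2
    return dev

ops = [np.eye(dim), a]   # single-photon-loss noise set {I, a}
ops2 = ops + [a @ a]     # adding a^2 breaks the KL conditions for this code
print(kl_deviation(qec_matrix(code, ops), d=2),
      kl_deviation(qec_matrix(code, ops2), d=2))
```

The first deviation vanishes (the binomial code corrects a single loss exactly), while the second does not, illustrating how the choice of noise set changes which codes minimize the cost.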