Quantum-optimal information encoding using noisy passive linear optics

The amount of information that a noisy channel can transmit has been one of the primary subjects of interest in information theory. In this work we consider a practically motivated family of optical quantum channels that can be implemented without an external energy source. We optimize the Holevo information over procedures that encode information in the attenuations and phase shifts applied by these channels on a resource state of finite energy. It is shown that for any given input state and environment temperature, the maximum Holevo information can be achieved by an encoding procedure that uniformly distributes the channel's phase-shift parameter. Moreover, for large families of input states, any maximizing encoding scheme has a finite number of channel attenuation values, simplifying the codewords to a finite number of rings around the origin in the output phase space. The above results and numerical evidence suggest that this property holds for all resource states. Our results are directly applicable to the quantum reading of an optical memory in the presence of environmental thermal noise.


1 Introduction
For a classical channel between a sender (Alice) and receiver (Bob), the rate at which information can be noiselessly transmitted from Alice to Bob for a given probability distribution of the input symbols equals the Shannon mutual information between the input and output [1]. Maximizing the mutual information over all input probability distributions gives the capacity of the channel, which can be achieved using encoding and decoding over long codes. Quantum-mechanically, the analog of an input distribution is an ensemble of quantum states indexed by Alice's input alphabet, and the Holevo information of the ensemble [2] replaces the mutual information of the classical case. If Alice encodes her messages in suitable sequences of states and Bob can make measurements on all received output states at once, communication at a rate equal to the Holevo information is possible with arbitrarily small error probability [3,4]. Optimizing over the input probability distribution thus gives the ultimate capacity of encoding information in the given set of quantum states.
Optical quantum channels are an important class of quantum channels for which the Holevo information has been very well studied in the context of different tasks, such as communication [5], optical memory reading [6,7,8,9,10,11], algorithmic cooling [12], as well as pure-loss quantum optical channels [13].

Figure 1: The input "resource" state ρ is mixed with an "environment" in a thermal state γ_T by a beamsplitter B_η of transmittance η and then undergoes a phase-rotation operation R_θ by angle θ, giving an output codeword state ρ_T(η, θ).

Given its many applications, we focus on optical quantum channels that separate out the sources of energy, thus rendering energy a resource in performing the task at hand. This notion of energy as a resource has been studied in [14] for thermodynamic tasks.
In what follows, we consider encoding information by optimally applying passive linear-optical operations to a given finite-energy resource state ρ. Such operations, which we call thermal channels (illustrated in Fig. 1), are optical quantum channels that can be constructed simply by mixing the given input state with the environment at a certain temperature, followed by a phase-shift operation. The average energy of every state in such output ensembles is bounded above. Such peak energy constraints have physical meaning as the greatest amount of energy that can be tolerated by a given device or channel. In addition, technological limitations may impose such constraints: for example, it is very difficult to generate Fock states of large occupation number. Even for coherent-state sources, such constraints can be severe in some applications. For example, a satellite-based laser communication system may be highly constrained in its energy output by payload limitations and energy scarcity in space. Note that a peak energy constraint on an ensemble is more restrictive than an overall average energy constraint, which has been the subject of earlier work. Indeed, the capacity-achieving ensembles for communication on a large class of bosonic Gaussian channels are circular Gaussian distributions of coherent states in phase space [5,13,15], which are clearly not peak-constrained.
In addition, we note that by adopting the thermal channel model we obtain a finite-temperature generalization of existing results that assumed a zero-temperature (i.e. vacuum) environment. These include past studies of tasks such as the quantum reading of optical memory [10,11] (illustrated in Fig. 2) and communication over the pure-loss channel [13].
In this work, we investigate the information capacity of such a thermal encoding protocol as quantified by its Holevo information. First, we show that given an arbitrary input state and an arbitrary environment temperature, an encoding that maximizes the Holevo information is characterized by independent distributions of the attenuation coefficient and the phase-shift value, with the latter being uniformly distributed; we call such encodings circularly symmetric encodings. We then derive necessary and sufficient information-theoretic conditions that characterize the optimal distribution of the attenuation coefficient for such circularly symmetric encodings. For the case of a coherent-state resource at zero temperature and a thermal-state resource at any temperature, we show analytically that the optimal encoding involves only a finite number of attenuation values. In addition, numerical results based on the aforementioned information-theoretic optimality conditions show that this property also holds in various cases where the output states are displaced thermal states. Based on this combination of analytical and numerical evidence, we conjecture that circularly symmetric encodings with a finite number of attenuation values maximize the Holevo information for any resource state.
This paper is organized as follows. In Section 2, we lay down the formal definition of a thermal channel and then present the general properties of an optimal encoding. In Section 3, we focus on the case of a zero-temperature environment and obtain optimal encodings when the resource is a coherent state or a displaced thermal state. In Section 4, we discuss a nonzero-temperature environment with a coherent-state or thermal-state resource. We close with Section 5, which briefly discusses the implications of our results. The technical proofs are relegated to the appendices.
2 Thermal encodings: definitions and optimization

2.1 Thermal operations
In this section, we first define thermal channels, whose output states constitute the potential codewords for the channel. Consider a given input resource state with energy E < ∞, represented by a density operator ρ on Hilbert space H_A, and an environment in a thermal state at temperature T, represented by the density operator γ_T = e^(−βH)/Tr(e^(−βH)) on Hilbert space H_E, for the Hamiltonian H = â†_E â_E (where â†_E and â_E are the creation and annihilation operators on H_E, respectively) and inverse temperature β = 1/(kT). Where it is convenient to express a thermal state in terms of its mean photon number N = ⟨â†_E â_E⟩, we write ρ_th(N) = Σ_{n=0}^∞ N^n/(N+1)^(n+1) |n⟩⟨n|. We consider only phase-rotation operations and mixing operations with a thermal state. These operations do not require an external energy source and have been studied in [14,12]. The thermal channel consists of exactly these two operations on the resource state: an operation mixing ρ and γ_T, followed by a noiseless phase-rotation operation.
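As a quick numerical illustration of ρ_th(N), its Fock-basis diagonal can be tabulated and checked against the intended mean photon number; a minimal sketch (the function name and truncation are our own choices):

```python
import numpy as np

def thermal_diag(N, dim):
    """Fock-basis diagonal of a thermal state with mean photon number N,
    truncated to `dim` levels: p_n = N^n / (N + 1)^(n + 1)."""
    n = np.arange(dim)
    return N**n / (N + 1.0)**(n + 1)

p = thermal_diag(2.0, 300)
total = float(p.sum())                       # ~1 (truncation error negligible here)
mean_n = float(np.sum(np.arange(300) * p))   # recovers the mean photon number, ~2
```

The truncation error decays geometrically as (N/(N+1))^dim, so a few hundred levels suffice for moderate N.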
Definition 1 (Thermal channel). A thermal channel with attenuation η ∈ [0, 1], phase rotation θ ∈ [−π, π), and environment temperature T ≥ 0 is a quantum channel T^(T)_{η,θ} whose action on a given input state ρ ∈ H_A is defined as

T^(T)_{η,θ}(ρ) = R_θ Tr_E[B_η (ρ ⊗ γ_T) B†_η] R†_θ,

where B_η is a beamsplitter operation with transmissivity η = cos²(φ_η) and R_θ = e^{iθ â†_A â_A} is a phase rotation by θ on H_A. For a circuit representation, see Fig. 1.
Given a thermal channel with temperature T, attenuation η, and phase θ, its output is called a codeword and is denoted by ρ_T(η, θ) = T^(T)_{η,θ}(ρ), or simply ρ(η, θ) when the temperature T is clear from the context. A thermal encoding is defined by a cumulative distribution function F(η, θ) over the attenuation η and phase θ of the thermal channel, which generates the ensemble of codewords {ρ(η, θ), dF(η, θ)}_{η,θ}. For a thermal encoding F with environment temperature T acting on resource state ρ, the amount of classical information that can be encoded onto the output quantum state is given by the Holevo information [2,3,4]

χ_te[F] = S(ρ_ave) − ∫ dF(η, θ) S(ρ_T(η, θ)),

where S is the von Neumann entropy and ρ_ave = ∫ dF(η, θ) ρ_T(η, θ) is the averaged state of the ensemble. All information quantities throughout this paper are in bits, hence all log functions are base 2. Given a resource state ρ and temperature T, our task is to find the thermal encoding F* that maximizes the Holevo information χ_te[F]. We note that χ_te is an adaptation of the thermal information capacity proposed in [16], which quantifies the optimal Holevo information over distributions of passive thermal operations.
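Since the ring states and average states encountered below are diagonal in the Fock basis (Lemma 1), the Holevo information reduces to a difference of Shannon entropies of the eigenvalue vectors. A minimal numerical sketch, with illustrative function names of our own:

```python
import numpy as np

def entropy_bits(p):
    """Shannon / von Neumann entropy in bits of a probability vector (spectrum)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def holevo_bits(weights, spectra):
    """Holevo information chi = S(rho_ave) - sum_i q_i S(rho_i) for an ensemble
    of states that are all diagonal in the same basis."""
    avg = sum(q * np.asarray(s, dtype=float) for q, s in zip(weights, spectra))
    return entropy_bits(avg) - sum(q * entropy_bits(s)
                                   for q, s in zip(weights, spectra))

# Two orthogonal pure states with equal weights carry exactly 1 bit.
chi = holevo_bits([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

For mutually diagonal ensembles this is exact; general (non-commuting) ensembles would require full density-matrix diagonalization instead.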

2.2 Characterizations of optimal thermal encodings
In this section, we state the properties of thermal encodings F that maximize the information capacity χ_te[F] of the thermal channel, given a resource state ρ and environment temperature T. Interestingly, it turns out that these properties share many features with the optimal encodings for classical additive white Gaussian noise (AWGN) channels where the signals satisfy a peak energy constraint E (see the discussion in Section 5). Before we proceed to state these properties, we define circularly symmetric encodings, which play an important role in the optimality conditions.
Definition 2 (Circularly symmetric encoding). A circularly symmetric encoding is a thermal encoding in which the phase θ is distributed uniformly over [−π, π), independently of the attenuation η.

A ring state ρ(η) = ∫ dθ/2π ρ(η, θ) is a mixture of codewords {ρ(η, θ)}_θ in which the phase θ is distributed uniformly around the origin of phase space, thus forming a ring. Hence we can write the average output state as a mixture of ring states, ρ_ave = ∫ dF(η) ρ(η). Both ρ(η) and ρ_ave are diagonal in the Fock basis, with n-th diagonal entries P_n[ρ(η)] = ⟨n|ρ(η)|n⟩ and P_n[ρ_ave] = ∫ dF(η) P_n[ρ(η)], respectively. For a proof, see Lemma 1. This family of encodings is central to the first optimality condition.
Proposition 1. Given a resource state ρ and channel temperature T, for any encoding F there exists a circularly symmetric encoding F′ such that χ_te[F′] ≥ χ_te[F].

This implies that for any optimal encoding F* = arg max_F χ_te[F] there exists a circularly symmetric encoding F′ that attains the same optimal capacity (i.e. χ_te[F′] = χ_te[F*]). This property is mentioned and shown in a plot by Guha et al. [9] comparing the capacities of different quantum reading settings, but was missing a formal proof. We complete this missing piece with a proof in Appendix A.1. Thanks to Proposition 1, we can restrict our search for optimal encodings to circularly symmetric encodings, which are completely characterized by their cumulative distribution function (cdf) over the attenuation coefficient η alone. To avoid notational complexity, we will hereafter use F(η) to also denote the cdf of η alone when dealing with circularly symmetric encodings.
We now discuss the second property of the optimal thermal encoding: a necessary and sufficient condition for an optimal encoding F*. We first note that the Holevo information of a circularly symmetric encoding F can be written as an average over η of a function that quantifies the information content of the ring at η. Expressing χ_te in terms of the relative entropy S(·||·),

χ_te[F] = ∫ dF(η) S(ρ(η, 0)||ρ_ave),

where we have used Fubini's theorem (see, e.g., Chapter 2.3 of [17]) to swap the sum and integral in S(ρ_ave), together with the fact that both the entropy S(ρ(η, θ)) and the relative entropy S(ρ(η, θ)||ρ_ave) are independent of the phase θ (the latter because ρ_ave is diagonal in the Fock basis), allowing us to put ρ(η, 0) in place of ρ(η, θ). Now, define the marginal information density of a circularly symmetric encoding F at η as

i[η, F] := S(ρ(η, 0)||ρ_ave).

Since the relative entropy is always non-negative, the marginal information density i[η, F] is also non-negative, and it serves as an information measure of the ring ρ(η) given the circularly symmetric encoding F. We note that the marginal information density also plays an integral part in the results on the capacity of the classical amplitude-constrained additive white-Gaussian-noise (AWGN) channel by Smith [18] and the direct-detection photon channel by Shamai [19], particularly in the properties of the optimal input alphabet distribution. The set of attenuation coefficients η at which F is increasing is of particular importance for this second property. This set has been called the points of increase (POI) (e.g. [20,18,21]) or the support (e.g. [22,23]) of the encoding in the literature, depending on whether the encoding is defined as a cumulative distribution function or a probability distribution over the attenuation coefficients.

Definition 3. Given a probability distribution on [0, 1] with cdf G(η) := Pr[x ≤ η], the support of the distribution is defined as the smallest closed set S ⊆ [0, 1] such that Pr[S] = 1. A point η ∈ S is also called a point of increase of the cdf G. We call the set I_G of points of increase (equivalently, the support of the probability distribution) the POI of G.

Now we are ready to state our second property.

Proposition 2.
For any given resource state ρ, a circularly symmetric encoding F*(η) is uniquely optimal if and only if (i) i[η, F*] ≤ χ_te[F*] for all η ∈ [0, 1], and (ii) i[η, F*] = χ_te[F*] for all η in the POI I_{F*}. The complete proof is provided in Appendix A.3. This property establishes a necessary and sufficient condition for a circularly symmetric encoding F* to achieve the optimal Holevo information sup_F χ_te[F]. In particular, the first condition says that the Holevo information χ_te[F*] is an upper bound on the marginal information i[η, F*] regardless of the value of η, while the second says that at any point η at which F*(η) is increasing (whether continuously or discontinuously), the marginal information i[η, F*] equals the Holevo information. Because an encoding is a cumulative distribution function, it may be more intuitive to interpret this as a characterization of the support of the probability distribution dF* induced by the optimal encoding F*. Supported by the analytical and numerical results we present in later sections for some large classes of Gaussian resource states, we conjecture that the Holevo-optimal distribution F* over attenuation η has finite support for any resource state ρ, i.e., that the corresponding POI I_{F*} has finite cardinality.
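For Fock-diagonal codewords, the identity χ_te[F] = ∫ dF(η) i[η, F] can be checked numerically, since each relative entropy reduces to a classical Kullback-Leibler divergence between photon-number distributions. A sketch with a three-ring thermal-state example (the parameter values are illustrative only, not an optimal encoding):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def kl_bits(p, q):
    """Kullback-Leibler divergence D(p||q) in bits."""
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

def thermal_diag(N, dim=400):
    n = np.arange(dim)
    return N**n / (N + 1.0)**(n + 1)

# Thermal resource n_res = 3 in an n_env = 0.5 environment; three attenuation values.
etas = [0.0, 0.5, 1.0]
weights = np.array([0.3, 0.3, 0.4])
spectra = [thermal_diag(eta * 3.0 + (1 - eta) * 0.5) for eta in etas]
avg = sum(w * s for w, s in zip(weights, spectra))

chi = H(avg) - sum(w * H(s) for w, s in zip(weights, spectra))
chi_from_density = sum(w * kl_bits(s, avg) for w, s in zip(weights, spectra))
# The two expressions agree, illustrating chi_te[F] = integral dF(eta) i[eta, F].
```

The agreement is exact up to truncation, since Σ_i q_i D(p_i||p_ave) = H(p_ave) − Σ_i q_i H(p_i) for any classical ensemble.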

Conjecture 1.
For any resource state ρ and temperature T ≥ 0, the set I_{F*} of points of increase of the attenuation-coefficient distribution F*(η) associated with the optimal circularly symmetric encoding has finite cardinality.
Conjecture 1 states that, given an optimal circularly symmetric encoding, the average output state of the thermal channel is a finite mixture of ring states, which can be nicely visualized as discrete rings in its phase-space representation, as presented in Fig. 3. We show this property rigorously for a coherent-state resource at zero temperature and a thermal-state resource at any temperature, as stated in Proposition 3 and Proposition 4 respectively and discussed in their corresponding sections. Numerical evidence for more general cases, where the channel output is a displaced thermal state, is also given in the following sections.

3 Zero-temperature environment
In this section we discuss finding the optimal encoding when the environment has temperature T = 0, which corresponds to the environment state being the vacuum state γ_0 = |0⟩⟨0|. Given a resource state ρ with energy E_max, the codeword ρ(η, θ) has average energy ηE_max, since the γ_0 environment has zero energy. Therefore, the average state ρ_ave has energy at most E_max. As such, the Holevo information of any encoding and any resource state is bounded above by the quantity [5,15]

g(E_max) := (E_max + 1) log2(E_max + 1) − E_max log2 E_max,    (7)

the von Neumann entropy of a thermal state with mean photon number E_max. We use this universal capacity upper bound for the T = 0 environment to benchmark the encoding capacity of our optimal schemes; see Appendix D for more details.
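The bound g(E_max) is elementary to evaluate; as a sanity check, it agrees with the entropy computed directly from the truncated thermal photon-number distribution (a sketch; the truncation is our choice):

```python
import numpy as np

def g(E):
    """g(E) = (E + 1) log2(E + 1) - E log2(E), in bits; g(0) = 0."""
    if E == 0:
        return 0.0
    return float((E + 1) * np.log2(E + 1) - E * np.log2(E))

# Cross-check against the entropy of a (truncated) thermal distribution.
E = 2.0
n = np.arange(400)
p = E**n / (E + 1.0)**(n + 1)
S_direct = float(-np.sum(p * np.log2(p)))  # should match g(2.0)
```

For instance g(1) evaluates to exactly 2 bits, the entropy of a one-photon-mean thermal state.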

3.1 Coherent state resource
For a coherent-state resource ρ = |α⟩⟨α| at T = 0, the codewords are coherent states ρ(η, θ) = |√η α e^{iθ}⟩⟨√η α e^{iθ}|, and the corresponding ring state ρ(η) has photon-number distribution P_n[ρ(η)] = P_n(η|α|²), where P_n(λ) = e^{−λ} λ^n/n! is a Poisson probability distribution over outcome n with mean λ.
The finiteness property of the optimal circularly-symmetric encoding stated in Conjecture 1 holds for coherent states at T = 0, as shown in Appendix B.

Proposition 3. For any coherent-state resource ρ = |α⟩⟨α| and T = 0, the attenuation coefficient η of the optimal circularly-symmetric encoding takes a finite number of values in [0, 1].
We plot this finite optimal distribution for resource state energies |α_max|² ≤ 20 in Fig. 4(a), using numerical optimization to find the optimal encoding scheme, and provide ring-shaped phase-space representations of these optimal encodings at some fixed values of |α_max|² in Fig. 3. As the reader may notice in the figure, for any given amplitude |α_max| of the input coherent state, the support of the optimal encoding over attenuation η is finite. Fig. 4(a) also shows that the optimal encoding always includes an outermost ring at |α| = |α_max| and that the probability mass of this outermost ring is non-increasing as |α_max| increases. For small |α_max|, it is optimal to encode purely in phase, i.e. using states that form just one ring, with Holevo information

χ_1ring = − Σ_{n=0}^∞ P_n(|α_max|²) log2 P_n(|α_max|²),

which is the Shannon entropy of a Poisson-distributed random variable with values n ∈ N and mean |α_max|². This is in line with the findings of [9], where this ring encoding was shown to be close to the upper bound g(E_max) for E_max ≪ 1 (see Fig. 4(b)).
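The single-ring Holevo information, the Shannon entropy of the ring's Poisson photon-number distribution, can be evaluated as follows (an illustrative sketch; the function name and truncation are our own):

```python
import math
import numpy as np

def single_ring_chi(E, nmax=300):
    """Holevo information (bits) of one ring of coherent states with |alpha|^2 = E:
    the Shannon entropy of a Poisson distribution with mean E."""
    if E == 0:
        return 0.0
    # Log-space Poisson pmf for numerical stability.
    logp = np.array([n * math.log(E) - E - math.lgamma(n + 1) for n in range(nmax)])
    p = np.exp(logp)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```

Consistent with the discussion above, for small E this single-ring value stays below the bound g(E) while tracking it closely.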
The threshold at which this single-ring encoding becomes suboptimal can be obtained by solving for the |α_max| at which adding a vacuum state to the mixture of coherent states begins to increase the information content of the total state, since we know that the optimal encoding is finite over the channel attenuation. That is, we seek the point at which mixing the vacuum state ρ_0 = |0⟩⟨0| into the ring state ρ_max at amplitude α_max stops decreasing the Holevo information. Interestingly, the solution occurs when the entropy of the encoded states is proportional to their energy, giving |α_max|² = 1.25034 (see Appendix C). It can be observed further in Fig. 4(a) that as |α_max|² slightly exceeds 1.25034, the optimal encoding includes an additional codeword |0⟩⟨0|. As |α_max| increases further, this vacuum codeword gains probability mass until it transforms into a second ring, while the diameter of the first ring continues to increase. More and more rings are introduced in this way as E_max increases (see Fig. 3). We also plot the encoding capacity of the single-ring encoding and the uniform "flat" encoding (see Appendix D) in Fig. 4(b).
For large |α_max|², the flat encoding is not far from the optimal capacity derived here.

Figure 4: (a) For an optimal circularly symmetric encoding F* given a coherent-state resource ρ = |√E_max⟩⟨√E_max| in a T = 0 environment, there is a finite number of points of increase (POI) I_{F*} = {η_j}_j (see Definition 3). The optimal encoding F* at a given E_max is specified in the plot by the intersection(s) of the vertical line at E_max with the colored lines. The y-axis value of each colored point corresponds to a codeword with energy |α_j|² = η_j E_max for attenuation η_j in the POI of F*, with its density dF*(η_j) color-coded. For example, given energy E_max ∼ 3.5, there are two points of increase η_0 > η_1 in an optimal encoding F*. They correspond to codewords with energies |α_0|² = η_0 E_max ∼ 3.5 and |α_1|² = η_1 E_max ∼ 0.24, with densities dF*(η_0) ∼ 0.87 and dF*(η_1) ∼ 0.13, respectively. (b) Thermal encoding capacity against resource state energy E_max for the capacity upper bound (7) and a coherent resource state with: the optimal encoding F*, the one-ring encoding (L_1ring), and the flat encoding (L_flat). The one-ring encoding is optimal when E_max < 1.25034, and the flat encoding is almost optimal when E_max is large (see Appendix D.2.1 and Appendix D.2.2). (c) The information capacities against resource energy E_max using homodyne [18], heterodyne [21], and photon counting [19], in comparison with the optimal capacity χ_te. (d) The channel capacity for a displaced squeezed state resource approximates the capacity upper bound (7), which is achieved by the non-Gaussian state (14) (see Section 3.3). The inset plot shows that their relative difference is at most ∼1 percent for resource energy E_max ≤ 20.
The Wigner function of the average state ρ_ave = ∫ dF*(η) ρ(η), where F* is the optimal encoding and ρ(η) is a ring state for resource |α_max⟩, is plotted in Fig. 10.

Figure 5: Capacity for a mixed state resource with fixed energy E_0 and mean photon number n_res at T = 0. For each line, the total energy of the resource is fixed at E_0. The left end of each line (with n_res = 0) corresponds to a pure coherent state resource, while the right end is a thermal state resource with mean photon number n_res = E_0. For each E_0, the capacity χ decreases as the resource mean photon number n_res increases.
For the flat distribution, the capacity is the von Neumann entropy of the average state with encoding F(η) = η. For a derivation and further discussion, see Appendix D.2.2.

3.2 Mixed state resource
So far we have discussed only pure resource states in this section; we now consider a mixed Gaussian state as a resource, namely the displaced thermal state ρ_dts(n_res, α) = D(α) ρ_th(n_res) D†(α), where D is the displacement operator and ρ_th(N) is a thermal state with mean photon number N. We numerically optimize the capacities over encodings for displaced thermal state resources with different mean photon numbers n_res and fixed total energy E_0, obtaining Fig. 5. All of the optimal encodings obtained from this numerical optimization, which satisfy the optimality conditions in Proposition 2, have a finite POI, supporting Conjecture 1.
Note that when n_res = 0, we have a pure coherent state resource ρ_dts(0, α) = |α⟩⟨α|, and when n_res = E_0, a thermal state with zero amplitude ρ_dts(n_res, 0) = ρ_th(n_res). A thermal state with zero amplitude can still be used to transmit information as long as its energy E_0 = n_res is greater than zero. However, for a fixed energy E_0, a coherent state resource outperforms a thermal resource state of the same energy. Hence, we can transmit more information as the resource state gets closer to a pure coherent state, as shown in Fig. 5.

3.3 Nearly-optimal squeezed-state resource
Now we return to the capacity upper bound g(E_max) in (7) for resource state energy E_max. It is shown in [9,10] that the optimal resource state achieving this bound is the pure non-Gaussian state ρ = |ϕ⟩⟨ϕ|, where

|ϕ⟩ = Σ_{n=0}^∞ √(E_max^n/(E_max + 1)^(n+1)) |n⟩.    (14)

This state is optimal because the resource state |ϕ⟩ together with an encoding with a uniform phase distribution gives χ_te = g(E_max), which is the maximum capacity for an average-energy constrained channel without any additional constraint on the encoding. This capacity (in the average-energy constrained case) can be achieved using coherent state codewords with a Gaussian distribution encoding (see Appendix D.1).
The state (14) has very high fidelity with respect to a displaced phase-squeezed state even for large E_max ∼ 20. This motivates us to examine the maximum χ_te achievable using a ring of displaced phase-squeezed states. We plot the capacity for a displaced squeezed state in Fig. 4(d), which shows that it performs almost as well as (14). The displacement is chosen to match the quadrature expectation value of (14), while the squeezing is set such that the energy of the state is fixed at E_max. The Wigner function and photon-number distribution of the average state using the displaced phase-squeezed state and the optimal state for E_max = 3 are plotted in Fig. 9 of Appendix H.

4 Non-zero temperature environment
In this section we consider the case T > 0, i.e. the environment is in a thermal state γ_T = ρ_th(n_env) with mean photon number n_env > 0. We first discuss the encoding capacity when the resource state is a thermal state and show that the optimal encoding satisfies Conjecture 1. We then proceed to discuss numerical results for a coherent state resource |α⟩.

4.1 Thermal state resource
When the resource state ρ is a thermal state ρ_th(n_res) with mean photon number n_res, the output codewords are also thermal states, of the form ρ_th(n_η), where n_η = η n_res + (1 − η) n_env mixes the resource and environment mean photon numbers. Therefore, for a given encoding F the output average state is a mixture of thermal states ρ_ave = ∫ dF(η) ρ_th(n_η). Since thermal states are invariant under phase shifts, it is useless to encode any information in the phase θ. While this is not ideal for maximizing information transfer, it also means that the optimal decoding measurement is simply a measurement of the photon number, which is readily made in practice. In this scenario, the cardinality of the distribution support corresponding to an optimal encoding is finite, as proven in Appendix E.

Proposition 4. For a thermal-state resource ρ = ρ_th(n_res) and any T ≥ 0, the attenuation coefficient η of the optimal circularly-symmetric encoding takes a finite number of values in [0, 1].
The special case when the resource state has mean photon number n_res = 0 (i.e. a vacuum state) reduces to the scenario in Section 3.2, since the codeword corresponding to the transmittance value η here equals the codeword corresponding to transmittance 1 − η in Section 3.2 with ρ = ρ_th(n_env) (and the displacement set to zero). The optimal encoding in this case is plotted in Fig. 6(a). We find that for n_env < 8.67754, the optimal encoding has exactly two codewords: the transmitted vacuum resource state (η = 1) and the environment thermal state (η = 0). In Appendix F we obtain an analytical form of the Holevo information; this quantity tends to 1 as n_env → ∞. As n_env increases beyond 8.67754, it can be seen in Fig. 6(a) that adding more codewords is necessary to achieve the thermal encoding capacity (black line in Fig. 6(b)). Thus, interestingly, it is possible to encode information using a vacuum state resource whenever the environmental temperature is non-zero.
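The two-codeword regime for a vacuum resource can be probed numerically by mixing the vacuum codeword with the environment thermal codeword and grid-searching the mixing weight. This is an illustrative sketch (grid resolution and truncation are our choices), not the analytical treatment of Appendix F:

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def thermal_diag(N, dim):
    # Log-space evaluation to stay stable for large N.
    n = np.arange(dim)
    return np.exp(n * np.log(N) - (n + 1) * np.log(N + 1.0))

def two_codeword_chi(n_env, dim=2000):
    """Best Holevo information (bits) over the weight of a two-codeword
    ensemble {vacuum, thermal(n_env)} -- a coarse grid search, for illustration."""
    vac = np.zeros(dim)
    vac[0] = 1.0
    th = thermal_diag(n_env, dim)
    best = 0.0
    for q in np.linspace(0.01, 0.99, 99):
        chi = H(q * vac + (1 - q) * th) - (1 - q) * H(th)  # vacuum is pure
        best = max(best, chi)
    return best

c5 = two_codeword_chi(5.0)
c50 = two_codeword_chi(50.0)
```

Consistent with the text, the optimized value stays below 1 bit (two codewords can never exceed log2 2) and grows toward 1 as n_env increases.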

4.2 Coherent state resource
Finally, we consider the capacity of a coherent state resource |α_max⟩ (resource energy E_max = |α_max|²) when the environment is in a thermal state with mean photon number n_env > 0. The numerically optimized thermal encoding capacity is shown in Fig. 7. For a fixed n_env, a larger E_max gives a higher capacity. However, when we vary E_max, a lower n_env gives a higher capacity at large E_max, whereas a larger n_env gives a higher capacity at small E_max (see Fig. 7 inset), akin to the phenomenon seen in the previous subsection for a vacuum-state resource. In fact, when n_env ≫ 1 and α_max = 0, we can approach χ_te = 1 using a two-codeword distribution (Appendix F).
We now look into a communication scenario over a lossy channel which mixes a low-energy (E_max ≪ 1) coherent state resource at a fixed attenuation coefficient η_ch. In this scenario we have a single-ring circularly symmetric encoding with displaced thermal state codewords

ρ(θ) = D(√η_ch α_max e^{iθ}) ρ_th(n_ch) D†(√η_ch α_max e^{iθ}),

where ρ_th(n_ch) is a thermal state with mean photon number n_ch = (1 − η_ch) n_env; we call n_ch the codeword thermal photon number. The thermal encoding Holevo information is therefore given by

χ_te = S(ρ_ave) − g(n_ch),    (18)

since the codeword entropy g(n_ch) is equal for all phases θ. For E_max ≪ 1, Fig. 8 shows that by using a coherent state resource we can almost reach the upper bound of the lossy-channel capacity [5, supplementary information, eqn. (12)], given by

C = g(η_ch E_max + n_ch) − g(n_ch).    (19)

In fact, for E_max ≪ 1, χ_te is well approximated by

χ_te ≈ η_ch E_max log2((1 + n_ch)/n_ch),    (20)

which is derived in Appendix G and shown as the dotted line in Fig. 8. However, the dotted lines go above the solid lines for large E_max, indicating that (20) is not a good approximation of (18) as E_max increases. Interestingly, as n_ch increases, indicating more thermal noise (see the blue and black lines in Fig. 8), (20) becomes a better approximation of (19), with the two lines close to each other even up to values E_max > 1. We also found that this phenomenon occurs as η_ch decreases.
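These quantities are easy to evaluate numerically. We take the thermal lossy-channel bound to have the standard literature form g(η_ch E_max + n_ch) − g(n_ch) (an assumption based on the cited reference); its small-E_max slope then reproduces the finite per-photon rate quoted below:

```python
import numpy as np

def g(x):
    """Thermal-state entropy in bits."""
    return 0.0 if x == 0 else float((x + 1) * np.log2(x + 1) - x * np.log2(x))

def lossy_capacity(E, eta_ch, n_env):
    """Assumed standard thermal lossy-channel bound:
    g(eta*E + n_ch) - g(n_ch), with n_ch = (1 - eta) * n_env."""
    n_ch = (1 - eta_ch) * n_env
    return g(eta_ch * E + n_ch) - g(n_ch)

# Illustrative parameters (our own choice).
eta_ch, n_env = 0.7, 1.5
n_ch = (1 - eta_ch) * n_env
E = 1e-6
slope = lossy_capacity(E, eta_ch, n_env) / E          # numerical small-E slope
per_photon = eta_ch * np.log2((1 + n_ch) / n_ch)      # finite per-photon rate
```

The two numbers agree to high accuracy, reflecting that the linearization of the bound in E_max has exactly the per-photon coefficient η_ch log2((1 + n_ch)/n_ch).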
Approximation (20) implies that the information capacity per unit photon tends to the value lim_{E_max→0} χ_te[E_max]/E_max = η_ch log2((1 + n_ch)/n_ch), which is finite because n_ch > 0. Interestingly, a result by Guha and Shapiro [11] shows that a coherent state resource with binary phase-shift-keying encoding in a vacuum environment (n_ch → 0) has χ_te[E]/E → ∞ as E → 0, allowing transmission of unlimited bits per photon when the average photon number of the coherent state is small.

5 Discussion
In this work, we studied the encoding of information by modulating the phase and transmittance applied by passive linear optical channels to a peak-energy constrained resource state. The thermal channel family, which has been used to model various tasks such as optical memory reading, communication, and algorithmic cooling, is simple to implement and requires no external energy source. We showed that the maximum amount of information that can be encoded by such channels applied to a given resource state can always be achieved by an encoding scheme that uniformly distributes the phase introduced by the channel (Proposition 1). Among such encodings, the unique optimal encoding is characterized by information-theoretic conditions on how the thermal channel transmittance is distributed (Proposition 2).
We conjectured that all optimal circularly symmetric encoding schemes involve only a finite number of values of the channel transmittance. This conjecture is supported by proofs that the property holds for a coherent-state resource in a zero-temperature environment (Proposition 3) and for a thermal-state resource interacting with an environment at any temperature (Proposition 4). We also supported this conjecture with numerical evidence, based on the aforementioned encoding optimality conditions, for cases where the channel output is a displaced thermal state. An intriguing open question is whether there is a fundamental reason why the optimal circularly-symmetric encoding has a discrete attenuation distribution.
Interestingly, Proposition 1 finds an analog in the optimality of such encodings for the classical peak-constrained two-dimensional AWGN channel [21], while the properties of the optimal distribution of the attenuation coefficient in Proposition 2 are similar to the seminal result of Smith [20,18] on the capacity of the real-valued AWGN channel under a peak energy constraint. These similarities extend to more recent work on n-dimensional AWGN channel capacity under a peak energy constraint (see [22,23] and references therein). In the quantum domain, our work opens the question of the ultimate capacity of bosonic Gaussian channels under a peak energy constraint on the input ensemble, i.e., the problem of optimizing the Holevo information over such ensembles generated by thermal encodings and beyond.

Acknowledgement
This work is supported by the Singapore Ministry of Education Tier 2 Grant MOE-T2EP50221-0005, the Singapore Ministry of Education Tier 1 Grants RG146/20 and RG77/22, Grant No. FQXi-RFP-1809 (The Role of Quantum Effects in Simplifying Quantum Agents) from the Foundational Questions Institute and Fetzer Franklin Fund (a donor-advised fund of Silicon Valley Community Foundation), and the National Research Foundation, Singapore, and Agency for Science, Technology and Research (A*STAR) under its QEP2.0 programme (NRF2021-QEP2-02-P06). R.N. thanks Saikat Guha for useful discussions on quantum reading capacity. V.N. acknowledges support from the Lee Kuan Yew Endowment Fund (Postdoctoral Fellowship). A.T. acknowledges support from a CQT PhD scholarship.
[23] Semih Yagli, Alex Dytso, H. Vincent Poor, and Shlomo Shamai (Shitz). "An upper bound on the number of mass points in the capacity achieving distribution for the amplitude constrained additive Gaussian channel". 2019 IEEE International Symposium on Information Theory (ISIT) (2019).
A Proofs of general properties of optimal thermal encoding

A.1 Proof of Proposition 1

Consider a thermal encoding F and its corresponding codewords {ρ(η, θ)}_{η,θ}. The Holevo information of this encoding F is given by

χ_tc[F] = S(ρ_ave) − ∫ dF(η, θ) S(ρ(η, θ)),   (21)

where S is the von Neumann entropy and ρ_ave = ∫ dF(η, θ) ρ(η, θ) is the average density operator with respect to F. First we show that any state that is an average of codewords over a uniformly distributed phase is diagonal in the Fock basis.
Lemma 1. For any state ρ = ∫ dF(x) ρ_x with pure ρ_x's, its average over a uniformly distributed phase, ∫ (dϕ/2π) R(ϕ) ρ R†(ϕ), is diagonal in the Fock basis.

Proof. This can be shown by writing each pure state in the Fock basis and noting that the uniform phase average eliminates all off-diagonal terms.

Now we are ready for the proof of Proposition 1. Namely, we show that for an arbitrary thermal encoding F, there is a circularly symmetric encoding F̃ giving an average state ρ̃_ave = ∫ dF(η) ∫ (dϕ/2π) ρ(η, ϕ) with at least the same entropy. The first term of (21) is upper bounded by the entropy of ρ̃_ave, where (1) holds because the phase rotation R is a unitary operation and thus does not change the entropy of a state; (2) follows from the concavity of the von Neumann entropy; (3) from defining ρ(η, ϕ) = R(ϕ)ρ(η, θ)R†(ϕ), since the state ∫ (dϕ/2π) ρ(η, ϕ) is diagonal in the Fock basis by Lemma 1 and hence independent of the phase θ; and (4) from marginalizing dF(η, θ) over the phase θ.
Now we look at the second term of (21). In the case that every codeword ρ(η, θ) is a pure state, the second term is zero, and therefore χ_tc[F̃] ≥ χ_tc[F] holds because S(ρ̃_ave) ≥ S(ρ_ave). In the case of general codewords ρ(η, θ), the second term of (21) equals ∫ dF̃(η, θ) S(ρ(η, θ)) because S(ρ(η, θ)) is independent of the phase θ. Hence we have constructed a circularly symmetric encoding F̃ such that χ_tc[F̃] ≥ χ_tc[F] for an arbitrary encoding F.
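As an illustration of Lemma 1 (our own numerical sketch, not part of the proof), the following snippet averages a coherent state over a uniform phase grid in a truncated Fock space and checks that the result is diagonal in the Fock basis with Poisson diagonal weights:

```python
import numpy as np
from math import factorial

# Truncated Fock-space check of Lemma 1 (illustration): averaging a pure state
# over a uniformly distributed phase yields a Fock-diagonal state. We use a
# coherent state |alpha> as the pure resource.
dim, alpha = 20, 1.3
n = np.arange(dim)
amps = np.exp(-abs(alpha)**2 / 2) * alpha**n / np.sqrt([float(factorial(k)) for k in n])
rho = np.outer(amps, amps.conj())

# R(phi) = diag(e^{i n phi}); with M > dim uniform phases, the discrete
# average equals the continuous phase average exactly.
M = 64
rho_avg = np.zeros((dim, dim), dtype=complex)
for phi in 2 * np.pi * np.arange(M) / M:
    R = np.diag(np.exp(1j * n * phi))
    rho_avg += R @ rho @ R.conj().T / M

off_diag = rho_avg - np.diag(np.diag(rho_avg))
print(np.max(np.abs(off_diag)) < 1e-12)                     # True: Fock-diagonal
print(np.allclose(np.diag(rho_avg).real, np.abs(amps)**2))  # True: Poisson diagonal
```

The discrete average with M > dim phases annihilates every off-diagonal element ⟨m|ρ|n⟩ with m ≠ n, since the phases sum to zero unless m − n is a multiple of M.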

A.2 Levy metric
For completeness, in this section we define the Levy metric, which we use as a metric on the space of encodings (i.e., cumulative distribution functions) in order to prove some of the properties of an optimal circularly symmetric encoding. Here we specifically define the metric on F, the space of encodings (i.e., cumulative distribution functions over the interval [0, 1]). The Levy metric d_Le : F × F → R is defined as

d_Le(F, G) = max_{a∈R} d(c_F(a), c_G(a)),

where c_F(a) and c_G(a) are the points at which the graphs of F and G intersect the diagonal line x + y = a (on an (x, y)-coordinate plane), and d(·, ·) : R² × R² → R is the Euclidean distance in R²; that is, we maximize the distance between the two points on the graphs of F and G intersecting a common diagonal, over all a ∈ R (for a discussion of its geometric interpretation, see Chapter 2 of [20]). We verify that d_Le is indeed a metric: d(r, s) = d(s, r) for all r, s ∈ R²; d((x, F(x)), (x′, G(x′))) = 0 for a maximizing x, x′ iff F(x) = G(x) for all x; and the triangle inequality holds, where the maximization in the second line is over x, x′, z, z′ such that the corresponding graph points lie on a common diagonal. The last line is obtained by noting that x′ = z must hold since G(x′) + x′ = G(z) + z, and, for any three points in R² on a line, the distance between c_F and c_H cannot exceed the sum of the distances between c_F and c_G and between c_G and c_H.
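As a concrete numerical illustration (a sketch of ours, not from the text), the equivalent and more common "infimum" form of the Levy metric, d_Le(F, G) = inf{ε > 0 : F(x − ε) − ε ≤ G(x) ≤ F(x + ε) + ε for all x}, which matches the geometric description above up to a constant factor, can be evaluated by bisection:

```python
import numpy as np

# Levy distance between two CDFs via the standard "inf" characterization,
# evaluated numerically by bisection over eps on a fine grid of x values.
def levy_distance(F, G, lo=-1.0, hi=2.0, grid=4000, tol=1e-4):
    xs = np.linspace(lo, hi, grid)
    def ok(eps):
        return (np.all(F(xs - eps) - eps <= G(xs) + 1e-12)
                and np.all(G(xs) <= F(xs + eps) + eps + 1e-12))
    a, b = 0.0, 1.0          # eps = 1 always satisfies the sandwich condition
    while b - a > tol:
        m = (a + b) / 2
        if ok(m):
            b = m
        else:
            a = m
    return b

# Example: uniform encoding F(x) = x on [0, 1] vs. a point mass at x = 1/2.
F = lambda x: np.clip(x, 0.0, 1.0)
G = lambda x: (x >= 0.5).astype(float)
d = levy_distance(F, G)
print(round(d, 2))   # 0.25
```

For the uniform encoding against a point mass at 1/2 the distance converges to 1/4, which one can confirm by hand from the sandwich condition.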

A.3 Proof of proposition 2
We give a proof of Proposition 2 analogous to the optimization argument given by Smith [18] for optimal AWGN channel encodings. First, we show that the thermal channel Holevo information χ_tc is concave (Lemma 3), weakly differentiable (Lemma 4), and continuous (Lemma 6) over the space F of circularly symmetric encodings equipped with the Levy metric (for the definition, see Appendix A.2); we then argue that F is convex and compact. Since χ_tc and F satisfy these properties, we may use the optimization theorem below to obtain conditions for an optimal circularly symmetric encoding.
Lemma 2 (Optimization theorem). Consider a mapping f from a compact and convex topological space Ω to the real numbers R. If f is continuous, concave, and weakly differentiable, then there exists an x* ∈ Ω such that f(x*) = sup_{x∈Ω} f(x), and it is necessary and sufficient for such an x* that the weak derivative of f at x* in the direction of x, defined by

f′_{x*}(x) = lim_{ϵ→0⁺} [f((1 − ϵ)x* + ϵx) − f(x*)]/ϵ

for ϵ > 0, is non-positive for all x ∈ Ω. Moreover, if f is strictly concave, then such an x* is unique.
A proof of this optimization theorem can be found in [20, p. 15]. After establishing the conditions required to use Lemma 2, we will later conclude the proof by showing that F* is optimal iff (26a) and (26b) hold. Note that all encodings F in F distribute the channel phase uniformly and independently of the distribution over the channel attenuation; therefore the optimization of χ_tc[F] is only over encodings of the attenuation, i.e., the space F of circularly symmetric encodings is the space of cumulative distribution functions over the interval [0, 1]. The space F is convex because the encoding F_λ = λF₁ + (1 − λ)F₂ is clearly in F for any λ ∈ [0, 1] and F₁, F₂ ∈ F. The compactness of F follows from Smith's compactness proof for the space of cumulative distribution functions over an interval [−a, a] (for some real a) in the Levy metric [20], where he shows that this space is totally bounded, from which compactness follows. Smith's proof of total boundedness for encodings over [−a, a] applies to the space of encodings over any bounded and connected interval of R, and thus works for F.

Lemma 3. The thermal channel Holevo information χ_tc is a strictly concave function of encodings in F.
Proof. For λ ∈ [0, 1] and an arbitrary pair of encodings F₁ and F₂, let F_λ = λF₁ + (1 − λ)F₂, and let ρ_ave,j = ∫ dF_j(η, θ) ρ(η, θ) for j ∈ {1, 2}. Because the integral ∫ dF ⟨k|ρ|k⟩ is finite for any F and any vector |k⟩ from the basis {|k⟩}_k, we have ρ_ave,λ = λρ_ave,1 + (1 − λ)ρ_ave,2. Hence, by the strict concavity of the von Neumann entropy, we obtain S(ρ_ave,λ) ≥ λS(ρ_ave,1) + (1 − λ)S(ρ_ave,2), where equality holds iff ρ_ave,1 = ρ_ave,2, which happens iff F₁ = F₂. As for the second term of χ_tc[F_λ], since the entropy S(ρ(η, θ)) is integrable with respect to any encoding, we have ∫ dF_λ(η, θ) S(ρ(η, θ)) = λ ∫ dF₁(η, θ) S(ρ(η, θ)) + (1 − λ) ∫ dF₂(η, θ) S(ρ(η, θ)). Applying equations (29) and (30) to (27) yields the required concavity condition.

Now we show the weak differentiability of χ_tc.
Lemma 4 (Weak differentiability of χ_tc). The thermal channel Holevo information χ_tc is weakly differentiable. In particular, given any pair of encodings F₀ and F, the weak derivative of χ_tc at F₀ in the direction of F, taken along F_ϵ = (1 − ϵ)F₀ + ϵF for ϵ > 0, is

χ′_{F₀}[F] = ∫ dF(η) i[η, F₀] − χ_tc[F₀],   (32)

where we used the expression of the Holevo information in terms of the marginal information density as in (5) to obtain the second equality, and note that lim_{ϵ→0} i[η, F_ϵ] = i[η, F₀] in the last line. Noting that ∫ dF₀(η) i[η, F₀] = χ_tc[F₀] by (36), it remains for us to show that the limit term equals 0. We do this by again using (36). By L'Hôpital's rule, we evaluate the limit of each term in the sum; substituting the results back into the sum gives 0, concluding the proof of (32).
Lemma 5 (Continuity of ρ(η)). The thermal channel output ρ(η) is trace-norm continuous in η, and hence weakly continuous.

Proof. We will show that lim_{η′→η} ||ρ(η) − ρ(η′)||₁ = 0, where || · ||₁ is the trace norm, which implies that ⟨φ′|ρ(η)|φ⟩ is a continuous function of η for any pair of states |φ′⟩ and |φ⟩. First, we consider an amplifier channel A_G with gain G = (1 − η)n_env + 1 and an attenuator channel L_η̃ with attenuation η̃ = η/G, defined by unitaries U and V over the input system and environment system, respectively. The unitaries U and V are defined by their actions (in the Heisenberg picture) on the annihilation operator â of the input system [25, 26], where â_env and I_env are the annihilation operator and identity operator of the environment, respectively. Similarly, for attenuation η′ we define G′ = (1 − η′)n_env + 1 and η̃′ = η′/G′. Using the one-mode channel decomposition result in [26], we may write the channel N_η^{(n_env)}, which mixes an input state with attenuation η with a thermal environment of mean photon number n_env, as a composition of an attenuator and an amplifier channel. Therefore we may write the thermal channel output state as ρ(η) = A_G ∘ L_η̃(ρ) (and similarly for attenuation η′), so we get a bound in which we used the triangle inequality and the data processing inequality to obtain steps (1) and (2), respectively. Because ρ has energy E, we can use the energy-constrained diamond norm [27, 28] on the amplifier and attenuator channels in the last line, defined as ||N − M||_{⋄E} = sup_ρ ||(N − M) ⊗ I_C(ρ)||₁, where I_C is the identity channel on some ancilla system C and the supremum is taken over all states ρ on the product of the input and ancilla systems with energy constraint E on the input system. Thus we obtain an upper bound: the attenuator channel L_η̃ scales the energy constraint by η̃ for the first term, and for the second inequality we use [15, Lemma 10.17] to bound the diamond norms by the energy-constrained Bures distance β_E, which can be expressed as in [29, 30], where ⌊x⌋ and {x} are the integer and fractional parts of x, respectively,
and, as η′ → η, we have η̃′ → η̃ and G′ → G, which implies µ → 1 and ν → 1. Hence both β_{Eη̃}(A_G, A_{G′}) and β_E(L_η̃, L_η̃′) tend to 0 as η′ → η, implying that lim_{η′→η} ||ρ(η) − ρ(η′)||₁ = 0; as a consequence, ρ(η) is weakly continuous.

Lemma 6. Given a resource state ρ with energy E < ∞, the thermal channel Holevo information χ_tc is continuous over the space F of circularly symmetric encodings equipped with the Levy metric d_Le.
We emphasize that this continuity condition partially rests on our assumption that the resource state has finite energy E_max, as indicated at the beginning of Section 2.
Proof. First we show that if a sequence of encodings {F_n}_{n∈N} ⊆ F converges to some F ∈ F under the Levy metric, then the sequence of states {ρ_{F_n}}_{n∈N} weakly converges to ρ_F, where ρ_F = ∫ dF(η) ρ(η). Now, in order to show that the second term of χ_tc is continuous over F, we first note that S(ρ(η)) is continuous and bounded over η ∈ [0, 1] (as the thermal channel does not increase energy, so [15, Lemma 11.8] applies) and that convergence of a sequence of encodings {F_n}_{n∈N} ⊆ F in the Levy metric to F ∈ F implies weak convergence of the distributions (again, because F is a compact metric space). Therefore, lim_{n→∞} F_n = F in the Levy metric implies that lim_{n→∞} ∫ dF_n(η) S(ρ(η)) = ∫ dF(η) S(ρ(η)).
Hence the limit of the thermal channel Holevo information along a sequence of encodings {F_n}_{n∈N} converging to F in the Levy metric is χ_tc[F], which implies that χ_tc is continuous over F. We have now shown that χ_tc is strictly concave, weakly differentiable, and continuous over F, and that the space F of circularly symmetric encodings is convex and compact. Therefore, χ_tc and F satisfy all the conditions of the optimization theorem (Lemma 2) guaranteeing the existence of a unique optimal encoding F*, i.e., χ_tc[F*] = sup_{F∈F} χ_tc[F] with χ′_{F*}[F] ≤ 0 for all F ∈ F. In other words, the Holevo information is non-increasing in any direction F at F*. We will use this fact to show that F* is optimal iff (26a) and (26b) hold for F*.
First we show that if (26a) and (26b) hold for an encoding F*, then F* must satisfy χ′_{F*}[F] ≤ 0 for all F. This follows by noting that if (26a) holds, then ∫ dF(η) i[η, F*] ≤ χ_tc[F*] for any encoding F; hence χ′_{F*}[F] ≤ 0 for all F by Lemma 4. Now recall the set of points of increase I (see Definition 3). By noting that ∫_I dF*(η) = 1, we have χ′_{F*}[F*] = 0, showing that equality is achieved for the encoding F*. For the converse, we show that both (26a) and (26b) must hold for any optimal encoding F* by deriving a contradiction when either fails. If (26a) fails, there exists an encoding F with χ′_{F*}[F] > 0, which contradicts the necessary and sufficient weak-derivative condition for an optimal F* that χ′_{F*}[F] ≤ 0 for all F. Similarly for (26b): suppose that there exists a set C ⊆ I of points of increase at which i[η, F*] < χ_tc[F*]. For the remaining points of increase η ∈ I \ C, averaging gives the contradiction χ_tc[F*] < χ_tc[F*]. Hence both (26a) and (26b) must hold for any optimal encoding F*, concluding the proof of Proposition 2.
B Proof of finite points-of-increase for coherent state resource at temperature T = 0

In this appendix, we give a proof of Proposition 3. It closely follows Shamai's proof of the finiteness of the set of points-of-increase for the optimal photon channel [19], which uses the Bolzano-Weierstrass theorem and the identity theorem for analytic functions to show that an infinite set of points-of-increase implies that the marginal information density i[λ, F*] equals the mutual information I(F*) for all photon channel intensities λ ≥ 0; Shamai then shows that this leads to a contradiction. We adopt the same reasoning to show that an infinite set of points-of-increase implies i[η, F*] = χ_tc[F*] for all positive reals η, which leads to a contradiction. Because ρ(η, 0) is a pure state, its entropy is 0, and the marginal information density contains only the average-state term. So, to establish the contradiction when the set of points-of-increase is assumed to be infinite, we first show that i[·, F*] is analytic on the positive reals.
Proof. The marginal information density for the direct detection channel is analytic for all real x > 0, as argued in [19]. Setting x = η|α|², we can write i[η, F*] in terms of −H(Poi(η|α|²)), the negative of the Shannon entropy of a Poisson random variable with mean η|α|². It is shown in [31] that

H(Poi(η|α|²)) = η|α|²(1 − log η|α|²) + E_{Poi(η|α|²)}[log N!],

where N is a Poisson random variable with mean η|α|². The first term η|α|²(1 − log η|α|²) is clearly analytic, therefore we only need to show that E_{Poi(η|α|²)}[log N!] is analytic for all η > 0.
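The entropy decomposition above can be checked numerically; the following sketch (ours, for illustration only) verifies H(Poi(λ)) = λ(1 − log λ) + E[log N!] at λ = 2:

```python
import numpy as np
from math import lgamma, log

# Check the Poisson entropy decomposition H(Poi(lam)) = lam*(1 - log lam) + E[log N!],
# which reduces analyticity of the marginal information density to that of E[log N!].
lam, nmax = 2.0, 200
n = np.arange(nmax)
logfact = np.array([lgamma(k + 1) for k in n])   # log n!
logp = -lam + n * log(lam) - logfact             # log of the Poisson pmf
p = np.exp(logp)

H = -np.sum(p * logp)                # Shannon entropy (nats), direct sum
E_logfact = np.sum(p * logfact)      # E[log N!]
print(np.isclose(H, lam * (1 - log(lam)) + E_logfact))   # True
```

Working with logp directly avoids 0 · log 0 issues in the tail of the truncated sum.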
For the rest of the proof, we will show that the second term of (49) is analytic in the set C_δ for all δ > 0 by showing that the sum converges uniformly in any closed disk in C_δ. Its analyticity over R_{>0} then follows because it is analytic over every C_δ with δ > 0 and because R_{>0} ⊆ ∪_{δ>0} C_δ.
Let us now extend P_n[ρ(η)] to all η ∈ C_δ for some small δ > 0, an open subset of C containing all real numbers in (δ, ∞). Writing η = x + iy, we can upper bound the modulus of the expected value of log N! for the extended Poisson distribution function, where the last inequality holds because |y| < δ < x for any η = x + iy ∈ C_δ. Now we have log n! ≤ log nⁿ = n log n ≤ n(n − 1); therefore, by considering a closed disk D_r(z₀) = {z ∈ C_δ : |z − z₀| ≤ r} with z₀ ∈ C_δ and r > 0, and with ξ the maximum of the real part of any z ∈ D_r(z₀), we get an upper bound. Using a standard theorem of complex analysis (see e.g. [32, Theorem 3.1.8]), this implies that E_{Poi(η|α|²)}[log N!] is analytic in C_δ, because each term in the sum is analytic.
To show uniform convergence of E_{Poi(η|α|²)}[log N!] in D_r(z₀), we use the Weierstrass M-test (see e.g. [32, Theorem 3.1.7]), which states that a series Σ_n f_n converges uniformly on a set A if there are real numbers {M_n}_n such that Σ_n M_n < ∞ and |f_n(z)| ≤ M_n for all n and all z ∈ A. Let ξ = max{Re(z) : z ∈ D_r(z₀)} (i.e., the largest real part of complex numbers in the disk D_r(z₀)). By (52) we know that |P_n[ρ(η)]| log n! ≤ 2P_n[ρ(x)] log n!, and for sufficiently large N we have P_n[ρ(x)] ≤ P_n[ρ(ξ)] for all n > N; these give suitable majorants M_n. Hence, since the second term of (49) converges uniformly in any closed disk D_r(z₀) ⊆ C_δ and each of its terms P_n[ρ(η)] log n! is analytic in C, it is analytic in C_δ. Consequently, the restriction of i[·, F*] to (δ, ∞) is analytic for any δ > 0, concluding the proof.
Note that the transmissivity can physically only take values in [0, 1]; nevertheless, because i[η, F*] is analytic on (0, ∞), a contradiction follows from assuming that there are infinitely many points-of-increase. First we consider a lower bound on i[η, F*], where steps (1) and (2) follow from bounding the corresponding sums. The second term in (53) is an expectation of log N!, where N is a Poisson random variable with mean η|α|². By Stirling's approximation for the logarithm of the factorial we have the lower bound log n! ≥ Ω(n log n), so E_{Poi(z)}[log N!] = Ω(z log z), as the Poisson expectation is E_{Poi(z)}[N] = z. Applying this lower bound to (53) implies that i[η, F*] grows arbitrarily large as η → ∞. Hence, for any resource energy |α|² and any optimal thermal encoding F*, an infinite set of points-of-increase leads to a contradiction, as the Holevo information χ_tc[F*] = i[η, F*] would go to infinity as η → ∞; the cardinality of the set of points-of-increase must therefore be finite.

C Threshold when one ring is no longer optimal
Here, for the ring state at amplitude α_max and the vacuum state ρ₀ = |0⟩⟨0|, we solve for the α that satisfies the threshold condition, which can be written more explicitly in terms of a limit.

D Bounds on the capacity for coherent state resource at T = 0 and average energy constraint

A lot of work has been done on deriving bounds for the capacities of classical channels. For example, see [33, 34, 35] for bounds on AWGN channels and [35] for quadrature modulations and measurements. Lapidoth derives bounds for a discrete-time Poisson channel [36] and for intensity modulation with additive Gaussian noise [37]. There are also bounds on the number of rings for the n-dimensional AWGN channel by Dytso et al. [22, 23]. In this section, we present upper and lower bounds on the Holevo information χ for a coherent state resource |α_max⟩ with environment temperature T = 0.

D.1 Upper bounds
If we relax the constraint by allowing any encoding F whose average energy (i.e., the energy of ρ_ave) is bounded by E_max = |α_max|², instead of bounding the energy of the individual codewords ρ(η, θ), we may use the result of Giovannetti et al. [5] to obtain an upper bound on the Holevo information χ. For the average-energy-constrained channel, χ is attained by a Gaussian-distributed encoding and is given in the supplemental material of [5] as

χ_GDE(α_max) = (E_max + 1) log(E_max + 1) − E_max log E_max.
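The bound χ_GDE(E_max) is g(E_max), the von Neumann entropy of a thermal state with mean photon number E_max; a quick numerical sketch (ours) checks the closed form against a direct Fock-basis computation:

```python
import numpy as np

# chi_GDE(E) = (E+1)log(E+1) - E log E equals the entropy of a thermal state
# with mean photon number E; compare the closed form to a direct sum over the
# geometric photon number distribution p_n = E^n / (E+1)^(n+1).
def g(E):
    return (E + 1) * np.log(E + 1) - E * np.log(E)

def thermal_entropy(E, nmax=2000):
    n = np.arange(nmax)
    p = (E / (E + 1))**n / (E + 1)
    return -np.sum(p * np.log(p))

E = 3.0
print(np.isclose(g(E), thermal_entropy(E)))   # True
```

All quantities are in nats; divide by log 2 for bits.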

D.2 Lower bounds
While the classical mutual information gives a lower bound on its quantum analogue χ, lower bounds can also be obtained from a single-ring encoding and from an encoding corresponding to a "flat" distribution.

D.2.1 Single-ring encoding
We can obtain a lower bound by considering a distribution with just one ring at α_max. For E_max = |α_max|², this bound is given by the von Neumann entropy of the ring state at amplitude α_max, i.e., a uniformly random phase-shifted mixture of |α_max⟩.
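Since the ring state is Fock-diagonal with Poisson weights (Lemma 1), L_1ring equals the Shannon entropy of Poi(E_max). A small numerical sketch of ours:

```python
import numpy as np
from math import lgamma, log

# Single-ring lower bound: entropy (in nats) of the Poisson photon number
# distribution of the ring state with energy E_max = |alpha_max|^2.
def L_1ring(E_max, nmax=300):
    n = np.arange(nmax)
    logp = -E_max + n * log(E_max) - np.array([lgamma(k + 1) for k in n])
    p = np.exp(logp)
    return -np.sum(p * logp)

print(round(L_1ring(1.0), 4))   # 1.3048 nats for E_max = 1
```

The truncation at nmax = 300 is far in the Poisson tail for moderate E_max, so the sum is effectively exact.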

D.2.2 Flat encoding
Consider an encoding that uniformly distributes the attenuation, i.e., F(η) = η. This distribution has (by construction) a flat Wigner function in the middle which drops off at the edge (Figure 11). Curiously, the photon number distribution is also rather flat at low photon numbers when the energy E_max is large enough (see Figure 11). This encoding has the Fock state distribution

P_n = ∫₀¹ dη e^{−ηE_max} (ηE_max)ⁿ / n!.   (71)

The lower bound (which we denote L_flat) can then be computed by taking the entropy of this distribution. This bound becomes more informative than L_1ring as E_max increases. These bounds and the numerically computed capacity are plotted in Figure 4(b). The information per unit energy diverges as the input energy E_max tends to zero. Now consider the Poisson parameter of P_n[α_max(η)], which is ηE_max, and denote by X a random variable for this parameter. Such random variables have been studied in the classical context of discrete-time Poisson channels, where the communication channel consists of coherent state intensity modulation followed by direct detection at the receiver [36]. Proposition 11 of [36] gives a lower bound on the entropy of such a channel in terms of h[X], the differential entropy of X, and E[X], its mean. Using this result with the flat distribution F(η) = η for 0 ≤ η ≤ 1, the random variable X is uniformly distributed with F(x) = x/E_max, so that h[X] = log E_max and E[X] = E_max/2. This gives the lower bound L_flat for the flat encoding.

E Proof of finite points-of-increase for thermal state resource

In this appendix we give a proof of Proposition 4. We employ an argument similar to the coherent-state case of Appendix B: we show that the marginal information density for the thermal state resource is analytic for all positive reals, then use the identity theorem for analytic functions along with the Bolzano-Weierstrass theorem to obtain i[η, F*] = χ_tc[F*] for all positive real η from assuming that
the set of points-of-increase is infinite. We again show that this leads to a contradiction, concluding that the set of points-of-increase cannot be infinite. Here we assume that both the mean photon number of the input state, n_res, and the mean photon number of the environment, n_env, are strictly larger than zero. We also assume that n_res ≠ n_env, because otherwise ρ(η) = ρ_ave for all η and no information can be encoded, as reflected by χ_tc[F] = ∫ dF(η) S(ρ(η)||ρ_ave) = 0 for any choice of encoding F. For simplicity, the above assumptions will be taken into account in the proofs below, but we will argue later that the same technique can still be used when either n_res or n_env is zero (i.e., vacuum).
Noting that the S(ρ(η, 0)) term in the marginal information density (4) is the entropy of a thermal state with mean photon number n_η = ηn_res + (1 − η)n_env, we can rewrite it as in (75). We will now show that i[·, F*] is analytic over all positive reals.
Before proving Lemma 8, we first need the following technical lemma.
Lemma 9. Let Q_k(z) = z^k/(z + 1)^{k+1}, n_max = max{n_res, n_env}, and n_min = min{n_res, n_env}. Then the following inequality holds for all circularly symmetric encodings F and all k > n_max:

−log P_k[F] ≤ −log Q_k(n_min) = log(n_min + 1) + k log((n_min + 1)/n_min).
Proof. We can write P_k[F] = ∫ dF(η) Q_k(n_η), since the photon number distribution of the thermal output state is P_k[ρ(η)] = n_η^k/(n_η + 1)^{k+1} = Q_k(n_η). By solving dQ_k(z)/dz = 0 for z, we find that Q_k achieves its maximum at z = k. Moreover, one can observe from the (k − z) factor in the derivative that Q_k is monotonically increasing for z < k and monotonically decreasing for z > k. Thus, for k > n_max we have Q_k(n_η) ≥ Q_k(n_min) for all n_η ∈ [n_min, n_max], so P_k[F] ≥ Q_k(n_min), and the inequality follows because log is a monotonically increasing function.
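The monotonicity claims about Q_k can be checked numerically; this small sketch (ours) confirms the maximum at z = k and the monotone behavior on either side:

```python
import numpy as np

# Q_k(z) = z^k / (z+1)^(k+1) peaks at z = k, increasing below and decreasing above.
def Q(k, z):
    return z**k / (z + 1)**(k + 1)

k = 5
zs = np.linspace(0.01, 20, 20000)
vals = Q(k, zs)
z_star = zs[np.argmax(vals)]
print(abs(z_star - k) < 1e-2)               # True: maximum near z = k
print(np.all(np.diff(vals[zs < k]) > 0))    # True: increasing for z < k
print(np.all(np.diff(vals[zs > k]) < 0))    # True: decreasing for z > k
```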
Now we are set to show the analyticity of the marginal information density.
The second term is clearly analytic, as it is a finite composition of logarithms, multiplications, and additions over z ∈ R_{>0}; hence it remains to show that h[·, F*] is analytic. For this, it suffices to show that for any ε > 0 there exists M ∈ N such that |h_l[η, F*]| < ε for all l > M, since each partial sum is a composition of additions, multiplications, and logarithms, and hence analytic.
Using Lemma 9 we have an upper bound, and for any N ∈ N a partial-series identity holds for all k ∈ N. Setting N = n_max and plugging this into (78), we get the bound

Hence we get
Now consider a disk D_r(z₀) and, within it, the complex number ξ maximizing the relevant bound for k > N. This can be arranged by setting N sufficiently large, N > max{n_{ℜ(η)+δ} + 1 : η ∈ D_r(z₀)}. So for all η ∈ D_r(z₀) we can find an integer M larger than such an N and larger than n_max to obtain the bound. Because the right-hand side goes to 0 as M → ∞, for any ε > 0 we can find a sufficiently large M such that |h_l[η, F*]| < ε for all l > M and all η ∈ D_r(z₀). As this holds for any closed disk in C_δ for any δ > 0, h[·, F*] is analytic over all positive reals R_{>0}.
Note that the analyticity of the marginal information density still holds even if one of n_res, n_env is zero, by the following argument. Without loss of generality, assume that n_env = 0 (i.e., a vacuum environment) and suppose that the optimal encoding is F*; then n_η can only take values in [0, n_res]. In the above proof this leads to a pathological case where n_min = n_env = 0. However, we can still find a suitable z > 0 for all k > n_res, because Q_k is monotonically increasing there and it cannot be the case that all of the weight of F* sits at η = 0 (or at any other single η ∈ [0, 1], for that matter), where the only codeword is the vacuum state of the environment, as this leads to zero capacity. One can see this by noting that dF(0) = 1 implies χ_tc = S(ρ_ave) − ∫ dF(η) S(ρ(η)) = S(ρ(0)) − S(ρ(0)) = 0, because the phase-shift operation of the channel leaves the output state invariant for a thermal state input. This means that P_k[F*] must take values in the interval (0, Q_k(n_res)), and therefore for any F* we can always find some z > 0 such that Q_k(z) < P_k[F*] for all k > n_res. Hence, we can still use the technique in the proofs of Lemma 8 and Lemma 9 to upper bound the h_N function and obtain the same analyticity result. Now, as i[·, F*] is analytic over the positive reals by Lemma 8, the identity theorem for analytic functions and the Bolzano-Weierstrass theorem imply that an infinite set of points of increase over [0, 1] would give i[η, F*] = χ_tc[F*] for all positive real η. We now show that this leads to a contradiction, and conclude that an infinite set of points-of-increase is not possible.
First, consider the case n_res < n_env. Writing n_η = n_env + η(n_res − n_env), the argument of the logarithm in the −(n_η + 1) log(n_η + 1) + n_η log n_η term of i[η, F*] in (75) becomes negative for sufficiently large η. This implies that i[η, F*] = χ_tc[F*] is not real for sufficiently large η, which is not possible. On the other hand, for the case n_res > n_env the limit follows because lim_{η→∞} n_η = ∞ and lim_{x→∞} x log((1 + x)/x) = 1, and because lim_{η→∞} P_n[ρ(η)] = 0 for each n ∈ N, implying that lim_{η→∞} −Σ_n P_n[ρ(η)] log P_n[F*] = 0. As an infinite set of points-of-increase over [0, 1] leads to a contradiction in either case, there can only be finitely many points-of-increase.
F Capacity of two-codeword encoding with a vacuum resource at T > 0

G Lossy channel with low-energy coherent state resource
Here, we consider a coherent state resource |α⟩ which passes through a lossy channel, i.e., one with environment mean photon number n_env > 0 and a fixed attenuation η_ch. We will derive an analytic expression for an approximation of the Holevo information (20) in terms of the attenuation η_ch and the thermal photon number n_ch = (1 − η_ch)n_env, for a single-ring circularly symmetric encoding, given a small energy E = |α|² ≪ 1.
Note that this is equivalent to a scenario where the resource state is a thermal state with mean photon number n_ch displaced by α, and the environment is a thermal state with the same mean photon number n_ch. We can disregard the phase θ and simply write (17) accordingly, because the entropy of ρ(η_ch, θ) is equal to the entropy of ρ_th(n_ch): the codeword ρ(η_ch, θ) is a displaced thermal state, i.e., a unitary displacement of ρ_th(n_ch), so its eigenvalue distribution is the thermal distribution given in (92). The photon number distribution of the averaged state is

P_n[ρ_ave] = ⟨n|D(√η_ch α) ρ_th(n_ch) D†(√η_ch α)|n⟩.   (93)

Hence we may write the Holevo information of the lossy channel in terms of these distributions. When |α| is small, we can approximate the displacement operator to first order in α, allowing us to obtain the approximate photon number probabilities (96) of the average output state. Finally, substituting Eq. (92) for P_n into the right-hand side, and after some simplification, we arrive at (98), which is precisely the approximation χ̃_tc[E] in (20). The same result is obtained for the optimal resource state (14), restated below for convenience. One way to see this is by considering the displaced squeezed state D(α)S(r)|0⟩, where D is the displacement operator and S is the squeezing operator, with α = √E and r = −(√2 − 1)E, which is a good approximation of the optimal transmitter codeword for small E.
Transmitting this (now Gaussian) state through the noisy channel, we obtain a squeezed Gaussian state at the output with amplitude √(η_ch E), thermal photon expectation n_ch, and squeezing factor −η_ch(√2 − 1)E/(1 + 2n_ch). The entropy of each codeword is the same as the entropy of a thermal state with n_ch photons. We also find that the photon number distribution of the averaged state is given by (96) with α replaced by √E, so we arrive at the same result (98) for the Holevo quantity. We leave the derivations as an exercise for the interested reader. Note that although the state |ϕ⟩ is a resource state that maximizes χ when T = 0 (see Section 3.3), it is not known whether it maximizes χ in a T > 0 environment, nor when the channel is lossy. The fact established above, that χ for the two resource states |ϕ⟩ and |α⟩ coincides at E ≪ 1, could nevertheless be a good starting point for future work on the thermal channel capacity at T > 0.
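The claim that each codeword (a displaced thermal state) has the entropy of ρ_th(n_ch), while its photon number distribution differs from the thermal one, can be verified in a truncated Fock space. The sketch below is ours; the dimension, n_ch, and displacement β are arbitrary illustration values:

```python
import numpy as np

# A displaced thermal state D(beta) rho_th D(beta)^dag has the same entropy as
# rho_th (displacement is unitary) but a different photon number distribution.
dim, n_ch, beta = 40, 0.5, 0.4
n = np.arange(dim)
rho_th = np.diag((n_ch / (n_ch + 1))**n / (n_ch + 1))   # thermal state

a = np.diag(np.sqrt(n[1:].astype(float)), k=1)          # annihilation operator
H_gen = -1j * (beta * a.conj().T - np.conj(beta) * a)   # Hermitian generator
w, V = np.linalg.eigh(H_gen)
D = V @ np.diag(np.exp(1j * w)) @ V.conj().T            # D(beta) = exp(iH), unitary

rho = D @ rho_th @ D.conj().T                           # displaced thermal state

def entropy(r):
    ev = np.linalg.eigvalsh(r)
    ev = ev[ev > 1e-14]
    return -np.sum(ev * np.log(ev))

print(np.isclose(entropy(rho), entropy(rho_th), atol=1e-6))   # True: same spectrum
print(np.allclose(np.diag(rho).real, np.diag(rho_th)))        # False: different P_n
```

Building the displacement from the eigendecomposition of its Hermitian generator keeps the truncated operator exactly unitary, so the spectrum (and hence entropy) is preserved to machine precision.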

G.1 Upper bound on the lossy channel capacity
Here we derive an upper bound on χ for the lossy channel scenario that coincides with (20). If the encoding is a Gaussian distribution over coherent states with average energy constraint E, this Gaussian distribution is optimal for a channel with no peak power constraint [5]. At the output of the thermal channel, the averaged state is then a Gaussian state with mean photon number n_ave = η_ch E + n_ch, and the codewords have mean photon number n_ch. If we define δ_n = P_n[ρ_ave] − P_n[ρ(η_ch)], which is small when E is small, we can approximate S(ρ_ave) accordingly.

We plot the cross-section (at p = 0) of the Wigner function W(x, p) for some of the average states from the main text in Fig. 10 (top). When |α_max| is less than 1.25034, the optimal encoding uses one ring at η = 1. When |α_max| exceeds 1.25034 (but only up to a certain point), the optimal encoding adds a small weight on the vacuum state. As |α_max| increases further, the optimal encoding includes more rings and tends to become flatter. Figure 10 (bottom) shows the corresponding photon number distributions of the average states. The flat distribution (Figure 11), however, is not optimal: it performs worse than the 3-ring distribution.

Figure 1: Thermal channel circuit representation.Input "resource" state ρ is mixed with an "environment" in a thermal state γ T by a beamsplitter B η of transmittance η and then undergoes a θ phase-rotation operation R θ , giving an output codeword state ρ T (η, θ).

Figure 2: Thermal channel memory cell reading.Each memory cell is a thermal channel with temperature T parameterized by attenuation η j and phase θ j , transforming an input resource state ρ to an output state to be measured by the receiver.

Figure 3: Phase-space visualization of optimal circularly symmetric encoding with a coherent state resource at different energies in a zero-temperature environment. These are phase-space representations of some instances of optimal encodings plotted in Fig. 4(a),(b). The radius of each red circle indicates the energy of its corresponding ring state, determined by the points-of-increase, while the blue circles indicate the attenuated and phase-shifted coherent state codewords in each ring state mixture. The coherent state codewords are centered on the circumference of each red circle, with the light blue rings providing a depiction of the ring state corresponding to each point of increase. (a) Optimal encoding for input energy |α_max|² ∼ 1.1 with capacity χ_te ∼ 2 and one ring with energy ∼ 1.1. (b) Optimal encoding for input energy |α_max|² ∼ 3.5 with capacity χ_te ∼ 3 and two rings with energies ∼ 0.24 and ∼ 3.5. (c) Optimal encoding for input energy |α_max|² ∼ 9.2 with capacity χ_te ∼ 4 and three rings with energies ∼ 0.3, ∼ 2.7, and ∼ 9.2.
in Appendix H for multiple values of |α_max|. As |α_max| gets larger, one can observe from Fig. 10 and Fig. 11 that the Wigner function of the average state of the optimal encoding tends to a flat distribution. For a given |α_max|, the one-ring encoding capacity L_1ring and the flat-distribution capacity L_flat (corresponding to a uniform probability distribution on the disk with η ∈ [0, 1] and θ ∈ [0, 2π]; see also [9]) are clearly lower bounds on the thermal encoding capacity. These bounds are visualized in Fig. 4(b), showing that the former is tighter for small |α_max| and the latter for larger |α_max|. As our codewords are pure states, the one-ring capacity is the von Neumann entropy of the ring state at |α_max|, given by (9). See Appendix D.2.1 for a simple derivation.

Figure 6: Encoding using a vacuum state resource at temperature T > 0 (i.e., environment mean photon number n_env > 0). (a) Optimal encoding using a vacuum state resource with n_env > 0. The shading indicates the probability p_j of the j-th codeword. For n_env < 8.67754, the two-codeword encoding consisting of the vacuum resource and the thermal environment is optimal (Appendix F). When n_env increases beyond 8.67754 (vertical line), the optimal encoding includes a third codeword. (b) Capacity and bounds for the vacuum state resource as a function of n_env. The mutual information from a two-codeword encoding (dotted red line) is optimal when n_env < 8.67754, coinciding with the numerically computed capacity (black line).

Figure 7: Thermal-encoding capacity for a coherent state resource |α_max⟩ in a T > 0 environment. Capacity varies with the environment mean photon number n_env (see Section 4.2).

Figure 9: (left) Wigner function of the averaged state for the optimal state in Eq. (14) and for a displaced squeezed state resource at E = 3. Each displaced squeezed state codeword has 5.40 dB of squeezing and an amplitude of 1.60. (right) Photon number distribution of the average state. The optimal state's Fock state population follows a geometric distribution by construction. The fidelity between the optimal resource and the displaced squeezed state is 0.996.

Figure 11: Wigner function (left) and photon number distribution (right) of the flat distribution, with the coherent state |3⟩ as the resource state. This distribution is not optimal and does not perform as well as the 3-ring distribution in Figure 10.
Here we use the continuity of the von Neumann entropy S over G_E, the set of states with energy at most E_max, shown in [15, Lemma 11.8], to establish the continuity of χ_tc over F in four steps:
1. Convergence of a sequence of encodings {F_n}_{n∈N} ⊆ F in the Levy metric to some F ∈ F implies that the corresponding sequence of average states {ρ_{F_n}}_{n∈N} ⊆ G_E weakly converges to ρ_F ∈ G_E.
2. Weak convergence (for a detailed discussion see [15, Section 11.1]) of {ρ_{F_n}}_{n∈N} to ρ_F implies that lim_{n→∞} S(ρ_{F_n}) = S(ρ_F), because S is continuous (by [15, Lemma 11.8]). Therefore S(ρ_F) is continuous over F ∈ F.
3. The second term of χ_tc, ∫ dF′(η) S(ρ(η)), is continuous over F′ ∈ F, i.e., it converges whenever lim_{n→∞} F_n = F in the Levy metric.
4. Putting the previous steps together, the Holevo information is continuous over F ∈ F.