Heat-Bath Algorithmic Cooling with optimal thermalization strategies

Heat-Bath Algorithmic Cooling is a set of techniques for producing highly pure quantum systems by utilizing a surrounding heat-bath and unitary interactions. These techniques originally used the thermal environment only to fully thermalize ancillas at the environment temperature. Here we extend HBAC protocols by optimizing over the thermalization strategy. We find, for any $d$-dimensional system in an arbitrary initial state, provably optimal cooling protocols with surprisingly simple structure and exponential convergence to the ground state. Compared to the standard ones, these schemes can use fewer or no ancillas and exploit memory effects to enhance cooling. We verify that the optimal protocols are robusts to various deviations from the ideal scenario. For a single target qubit, the optimal protocol can be well approximated with a Jaynes-Cummings interaction between the system and a single thermal bosonic mode for a wide range of environmental temperatures. This admits an experimental implementation close to the setup of a micromaser, with a performance competitive with leading proposals in the literature. The proposed protocol provides an experimental setup that illustrates how non-Markovianity can be harnessed to improve cooling. On the technical side we 1. introduce a new class of states called maximally active states and discuss their thermodynamic significance in terms of optimal unitary control, 2. introduce a new set of thermodynamic processes, called $\beta$-permutations, whose access is sufficient to simulate a generic thermalization process, 3. show how to use abstract toolbox developed within the resource theory approach to thermodynamics to perform challenging optimizations, while combining it with open quantum system dynamics tools to approximate optimal solutions within physically realistic setups.

Here we use powerful techniques, developed within the resource theory approach to thermodynamics [7], to greatly extend an important class of cooling algorithms known as Heat-Bath Algorithmic Cooling (HBAC) [3,4]. The goal of these is to maximize the purity of a target system S in a given number of cooling rounds. Each round of the algorithm starts with a unitary applied to the target together with several auxiliary systems A initialized in a thermal state, with the aim of pumping entropy away from the target. Next, the auxiliary systems are re-thermalized through coupling with a heat-bath, before the entire process is repeated in the next round. The asymptotically optimal protocol of this form (in terms of the purity reached in infinitely many rounds) is the Partner Pairing Algorithm (PPA), introduced in [8], whose asymptotic performance has been recently derived for a single target qubit starting in a maximally mixed state [9,10].
Here, we generalize HBAC in the following sense: instead of using the thermal environment only to refresh the auxiliary systems through a complete thermalization, we allow strategies involving an incomplete thermalization of system and ancillas. In particular, we optimize the protocol over every 'dephasing thermalization', that is any quantum map on SA which 1. leaves the thermal state on SA fixed and 2. dephases the input state in the energy eigenbasis. This generalizes the 'rethermalization' used in previous protocols, and thus extends HBAC to a larger set that will be called Extended Heat-Bath Algorithmic Cooling (xHBAC). For example, the recent protocol introduced in Ref. [11], which proposes the use of the heat bath to implement a non-local thermalization 'state-reset' (SR) process related to the Nuclear Overhauser Effect [12], can already be seen as a (non-optimal) protocol within our extended family of xHBAC.
For every finite dimensional target state in an arbitrary initial state, we optimize the cooling performance over every xHBAC for any given number of rounds. We give an analytical form for the optimal cooling operations, uncovering their elegant structure, and show that the ground state population goes to 1 exponentially fast in the number of rounds. This opens up a new avenue in the experimental realization of algorithmic cooling schemes, one requiring control over fewer ancillas, but better control of the interaction with the environment.
Our results suggest that xHBAC schemes provide new cooling protocols in practically relevant settings. We show that, for a single qubit target and no ancillas, the optimal protocol in xHBAC can be approximated by coupling the qubit by a Jaynes-Cummings (JC) interaction to a single bosonic thermal mode, itself weakly coupled to the external bath. The memory effects present in the JC interaction are crucial in approximating the theoretical optimal cooling. The performance of the ideal cooling protocol is robust to certain kinds of noise and imperfections, and it outperforms HBAC schemes with a small number of auxiliary qubits. This makes it, in our opinion, the most promising proposal for an experimental demonstration of cooling through xHBAC, as well as an experimental demonstration that non-Markovianity can be harnessed to improve cooling.
1 Results and discussion 1

.1 A general cooling theorem
A general xHBAC will consist of a number of rounds and manipulate two types of systems, the target system S to be purified and the auxiliary systems A. Furthermore, we will assume we can access a thermal environment at inverse temperature β = (kT ) −1 , with k Boltzmann's constant. We denote by H S = d−1 i=0 E i |i i|, E 0 ≤ · · · ≤ E d−1 , the Hamiltonian of S and by ρ (k) S the state after round k. Also, we denote by H A the Hamiltonian of the auxiliary systems, initially in state ρ A , and by τ X = e −βH X /Tr[e −βH X ] the thermal state on X = S, A. A protocol is made of k rounds, each allowing for (Fig. 1  2. Dephasing thermalization. Any Λ (k) applied to SA, where Λ (k) is any quantum map such that i) Λ (k) (τ S ⊗ τ A ) = τ S ⊗ τ A (thermal fixed point); ii) If |E , |E are states of distinct energy on SA, E|Λ (k) (ρ SA )|E = 0 (dephasing).
At the end of the round, a refreshing stage returns the auxiliary systems A to their original state ρ A . We denote the set of protocols whose rounds have this general form by P ρ A . Typically ρ A = τ A , and this operation is simply a specific kind of dephasing thermalization. However, more generally, ρ A may also be the output of some previous cooling algorithm, subsequently used as an auxiliary system. Note that A is distinguished by the rest of the environment in that it is under complete unitary control.
Some comments about xHBAC protocols are in order. First, if we skipped the dephasing thermalizations and ρ A = τ A , we would get back to a standard HBAC scheme. xHBAC protocols include thermalization of subsets of energy levels, as in the mentioned SR protocol of Ref. [11]. But they are by no means limited to these strategies. Second, for every k we wish to find a protocol maximizing the ground state population p S |0 over all sequences of k-round operations. Note that often the literature on cooling is restricted to finding asymptotically optimal protocols (e.g., the PPA [8]), but here we find optimal k rounds protocols for every k. Finally, we note in passing that within this work we will not make use of auxiliary 'scratch qubits' [9,10], i.e. S itself is the target system to be cooled.
To give the analytical form of the optimal protocols we need to introduce two notions. First, that of maximally active states. Given a state ρ with Hamiltonian H, the maximally active stateρ is formed by diagonalizing ρ in the energy eigenbasis and ordering the eigenvalues in increasing order with respect to energy. That is,ρ is the most energetic state in the unitary orbit of ρ.
Second, we define the thermal polytope in the following way. Given an initial state ρ with populations in the energy eigenbasis p, it is the set of populations p of Λ(ρ), with Λ an arbitrary dephasing thermalization. Without loss of generality, we assume that every energy eigenspace has been diagonalized by energy preserving unitaries. The thermal polytope is convex and has a finite number of extremal points (Lemma 12, [13]). It is particularly useful in thermodynamics to classify out of equilibrium distributions according to their β-order [14]. Given a vector p and a correspondent Hamiltonian with energy levels E i , the β−order of p is the permutation of π of 0, 1, 2, . . . such that the Gibbsrescaled population is sorted in non-increasing order: Crucially, for any dimension d, we provide an explicit construction of a set of dephasing thermalizations, denoted Λ (π,α) and called β-permutations, able to map any given p to every extremal point of its thermal polytope (Methods, Sec. 2.1 and Fig. 2). π and α are two vectors of integers, each ranging over all permutations of {0, ..., (d−1)(r −1)}, if r is the dimension of A. Their significance is the following. If we are given an initial state p with β-order π, the state q α := Λ (π,α) (p) is the 'optimal' state among all those with β-order α. More precisely, there is no other state in the thermal polytope of p which has β-order α and can be transformed into q α by dephasing thermalizations. This is in analogy with the role of permutations under doubly stochastic maps. In fact, in the limit β → +∞, β-permutations are permutations and the β-ordering is the standard sorting.
A particularly important β-permutation, denoted by β opt , is the one that maximizes the ground state population of S among all dephasing thermalizations and, furthermore, achieves the largest partial sums S |i . β opt is constructed as follows: β opt = Λ (π,α) with π an ordering of energy levels of SA such that, if q This identifies an optimal dephasing thermalization. But what is the optimal unitary control that we need to apply? The following lemma answers this question: Lemma 1. Given a state ρ with population p, the thermal polytope of the maximally active stateρ contains the thermal polytope of U ρU † for every unitary U .
The proof can be found in the Methods, Sec. 2.3. With these concepts in place, we can give the optimal cooling protocol for any given set of auxiliary systems in A. Theorem 1. Assume S is a d-level system with E d−1 > E 0 and β > 0. Without loss generality (by Figure 2: An optimal cooling round on a d = 3 system. Map the initial state after round k−1 to its maximally active state by a unitary U (continuous black arrow). The thermal polytope of the maximally active state then describes all possible final states achievable by arbitrary thermalizations. β-permutations map to the extremal points of these polytope (dotted red arrows), and β opt achieves the one closest to the ground state (dashdot red arrow). The distance to the ground state decreases exponentially fast in the number k of rounds.
an initial diagonalizing unitary) we can take the initial state of S to be diagonal in the energy eigenbasis with p Then, for a given auxiliary state ρ A , the optimal cooling protocol in P ρ A is such that in each round k: 1. The unitary mapping ρ (k−1) S ⊗ρ A to the corresponding maximally active state is applied to SA.
2. β opt is applied to SA.
The optimal protocol achieves p (k) 0 → 1 at least exponentially fast in k, even with no ancilla.
The intuition behind this protocol is simple (see Fig. 2). In a single round, the unitary maximizes the amount of energy in SA, in accordance to Lemma 1. Next, the optimal β-permutation is applied. For the proof, see Methods (Sec. 2.4).
In analyzing the performance of the protocols P ρ A it is important to keep in mind the cost of preparing and controlling the auxiliary systems A, especially if ρ A = τ A . Remarkably, however, the optimal xH-BAC protocol that uses no auxiliary systems A still has p (k) 0 → 1 exponentially in k. Furthermore, such protocol has the further advantage that the cooling operations do not change with the round k. Also note that the initial state-dependent unitary can always be replaced by a complete thermalization while maintaining the same cooling scaling, so that no prior knowledge of ρ (0) S is needed to unlock a strong cooling performance. Finally, in the Methods section 2.5, we construct for any d an explicit protocol that not only uses no ancillas, and can be realized by a sequence of two level interactions: first, a unitary swaps the populations of the ground and most excited states; then, a sequence of two-level dephasing thermalizations between energies (1, 0) is performed, each maximising the net population transfer i → i−1 (these are β-swaps, discussed in more detail in the next section). Every d−1 repetitions, the performance takes an elegant form: (2)

The qubit case
To show how the general results can be used in a specific case, let us analyze in detail the single qubit case with no auxiliary systems A (denoted P ∅ ) and energy gap E. The only nontrivial β-permutation is Λ β = Λ (π,α) with π = {1, 0} and α = {0, 1}, which induces transition probabilities 0|Λ β (|1 1|)|0 = 1, 1|Λ β (|0 0|)|1 = e −βE . This is the β-swap introduced in [13], which can be realized by the dephasing thermalization Since the unitary mapping the state to its correspondent maximally active state is the Pauli X unitary, Theorem 1 reads as follows: Corollary 1. Assume E > 0, β > 0. Without loss of generality (by making use of an initial diagonalizing unitary), we can take the initial state of the system to be diagonal in the energy basis with p 1 . The optimal cooling protocol in P ∅ is such that in each round k: The population of the ground state after round k is: and p (k) 0 → 1 as k → ∞. The performance can be obtained by direct computation, or by setting d = 2 in Eq. (2). Note the simple structure of the optimal protocol, in particular the fact that the same operation is applied iteratively at each round. Furthermore, note that β-swaps cannot be realized by Markovian interactions with the thermal environment. In fact, if we optimized over Markovian interactions only, the optimal protocol with no auxiliary systems would be the trivial thermalization at the environment temperature ρ S → τ S . Then, at best we would achieve a ground state population 1/(1 + e −βE ). The proof is given in the Methods (Sec. 2.6). This shows that memory effects can be used to great advantage in cooling scenarios. In particular, differently from standard HBAC protocols, our protocol achieves exponential convergence to the ground state. The 'price' we pay is the need for greater control over the thermalization steps; however, we need no auxiliary systems in the unitary stage and the protocol iterates the same transformation at every round, which simplifies the implementation.
We presented an optimal protocol for qubits, but how is it realized by an explicit interaction and environment? Here we answer this question. It is known that an infinite dimensional environment is necessary to increase the purity of the target system to 1 in the absence of initial system-environment correlations [15,16]; however, we do not need a complex environment: it was shown in [13] that a single bosonic mode in a thermal state τ B = 1 − e −βE ∞ n=0 e −nβE |n n| suffices to implement the required β-swap. Specifically, the unitary needed is (|0, n 1, n − 1| + |1, n − 1 0, n|) .
(5) What is perhaps more remarkable is that, as we prove, we do not need to rethermalize (or otherwise refresh) the thermal mode at every round of the protocol. The same mode can be reused in each round, in spite of the correlation build-up and the back-reaction, with no re-thermalisation needed. This provides a simplified and explicit version of Corollary 1 (proof in Methods, Sec. 2.7): Theorem 2. Under the assumptions of Corollary 1, the optimal protocol in P ∅ has the initial state ρ (0) S ⊗ τ B and is such that in each round k: SB , the state of system-bath after round k − 1.

Robustness to imperfections
Theorem 2 holds even beyond some of the (standard) idealizations we made: 1. If the bosonic mode gap does not perfectly match the system gap, U β SB still realises the cooling of Eq. (4), with the caveat that some work flows at each β-swap.
2. We can consider a typical imperfection, i.e. a small anharmonicity in the ladder of B. In the Meth- , the protocol of Theorem 2 performs very closely to the ideal scenario of Eq. (4) even for moderate anharmonicity (within 0.005% for βE=1, τ = 0.05).
Furthermore, up to now we considered an idealized scenario in which one can perform the required β-swap exactly. In a real experiment, however, one will only realize an approximation of it (we will see an explicit model in the next section). Consider a noise model in which, instead of the de-excitation probability 1 required by the β-swap, one can only realize dephasing thermalizations Λ (k) with an induced de-excitation probability 0|Λ (k) (|1 1|)|0 ≤ 1 − . Denote this set by P ∅ . Define an -noisy β-swap as the dephasing thermalization Λ β with 0|Λ β (|1 1|)|0 = 1 − (when = 0 this is the β-swap). Then one can derive the following noise-robust version of Theorem 1: Under the assumptions of Corollary 1 and given ≤ 1 1+e βE +e 2βE , the optimal nontrivial cooling protocol in P ∅ is such that in each round k: The population of the ground state after round k is: In words: even in the presence of (moderate) noise, the optimal strategy is to perform the best approximation to the ideal protocol; hence, the simple structure of the optimal protocols is robust to imperfections. Theorem 3 is proved in Section 2.9 by a tedious but straightforward optimization. There, we also prove another form of 'robustness': if we optimize over the larger set of thermalization models known as thermal operations [17], which include protocols where some amount of superposition in the energy basis survives the thermalization step, Theorem 3 holds unchanged. Surprisingly, the extra coherent control in the thermalization phase is not necessary for optimal cooling.  at step k for different environment temperatures for: ideal optimal cooling protocol from Theorem 1 (blue), upper and lower bounds on a JC realization of this protocol (red, see Sec. 3.1 for details on how to calculate them), SRΓ protocol from [11] run with 1 ancilla qubit (green), PPA protocol from [8] run with 2 ancilla qubits (purple). The initial state for all protocols is thermal.

An experimental proposal
We can now provide an explicit experimental proposal. More details about the calculations and simulations involved can be found in Sec. 3.1 -3.2.
The unitary U β SB of Eq. (5) can be realized exactly with an intensity-dependent Jaynes-Cummings model [18,19], However, a perhaps more promising avenue is to approximate the β-swap steps with a resonant Jaynes-Cummings (JC) coupling with a thermal bosonic mode, . Assuming good control of the interaction time s, we can numerically optimize s to realize an -noisy β-swap with the small . If the thermal mode is reset at each round, we can compute the performance of this implementation using Theorem 3.
In Fig. 3 we compare this protocol with leading proposals in the literature, for an initially thermal target. The JC protocol outperforms the PPA with 2 ancillas [8] (even with some limitations in the timing accuracy and the total available waiting time), with the exclusion of the very high temperature regime; when βE is not too small, it performs comparably (if potentially slightly worse) to the ideal version of the non-local thermalization scheme with 1 ancilla proposed in [11]. The optimal β-swap protocol outperforms all these protocols, but requires an intensity-dependent JC model whose implementation we leave as an open question.
These considerations assume that at every round the bosonic mode is reset to the thermal state. However we show, using a standard master equation (Eq. (51) in Sec. 3.2), that the reset can be substituted by a more Figure 4: The optimal protocol for cooling can be approximated by one in which a population inversion (Pauli X) is applied to the qubit before it enters the cavity, where it interacts with a resonant mode initially at temperature β. If the interaction parameters are appropriately chosen, such that a small is achieved, the outgoing qubit has a much lower temperature.
realistic slow rethermalization of the single mode with an external environment. Since reasonably high cooling is achieved after 2 rounds, this suggests the following implementation: a stream of slowly fired atoms passes through two identical cavities resonant with the qubits we are trying to cool, each supporting a single mode initially thermal; at the entrance of each cavity we perform a Pauli X operation, and let each qubit interact with the cavity mode for a chosen time (see Fig. 4 for a schematic description). Numerics show that if re-thermalization of the cavity mode is sufficiently quick compared to the firing rate, the cooling performance settles to a constant as many atoms are cooled (see Fig. 5).
This is, to our knowledge, the most appealing setting to experimentally implement the protocol and is, in fact, highly reminiscent of a micromaser [20,21]. This device consists of a cavity with a harmonic oscillator in an initially thermal state, and it works provided we are able to keep this oscillator out of thermal equilibrium. The firing of an excited atom through a cavity in the micromaser can be seen as a single instance of our proposed imperfect implementation of the optimal cooling protocol. However, the figures of merit in each case are different: in the micromaser, we need very pure atoms in order to excite the cavity efficiently, while in the cooling protocol the aim is to obtain these very pure atoms in the first place. Nevertheless, this suggests that experimental settings where the micromaser has been shown to be possible might be good platforms in which to test our algorithm. To our knowledge, this currently includes both cavity QED [20,21] and solid-state settings [22].
In conclusion, often HBAC techniques made the implicit assumption that the best way of exploiting the external environment to pump entropy away from the system is to thermalize the auxiliary ancillas. Here we show how optimizing over the thermalization strategy provides new cooling protocols breaking previously established limits, in particular allowing exponential convergence to the ground state in the ideal scenario.

β-permutations and the thermal polytope
Here we discuss how to construct β-permutations and how they characterize the thermal polytope. The first key definition is a generalization of the concept of majorization. Recall that, given two d − 1-dimensional probabilities p and p , p majorizes p , denoted p p , if where x ↓ denotes the vector x arranged in descending order. ρ ρ is defined as the same relation among the corresponding eigenvalues. Then define Definition 1 (Thermo-majorization [14]). Given a state ρ with Hamiltonian H = The thermo-majorization curve of ρ is formed by: 1. Applying a permutation π of {0, ..., d − 1} to both p and E such that p π(i) e βE π(i) is in non-increasing order.
We refer to the vector π with elements π(i) as the β-order of p.  linearly to form a concave curve -the thermo-majorization curve of p.

Plotting the ordered 'elbow' points
Given two probability distributions p and p associated with the same energy levels, we say that p thermo-majorizes p if the thermo-majorization curve of p is never below that of p . We denote this by p th p . Also, if ρ and ρ are states with population vectors p and p , the notation ρ th ρ denotes p th p .
Note that th becomes in the infinite temperature limit β → 0. One can show that p th p is equivalent to the existence of a Gibbs stochastic matrix (a stochastic matrix G with Gg = g, if g i ∝ e −βEi ) such that Gp = p [23]. Furthermore, a dephasing thermalization Λ acts on the population vector as a Gibbs-stochastic matrix G j|i = j|Λ(|i i|)|j ; conversely, every Gibbs-stochastic matrix can be realized by dephasing thermalizations. As such, the thermal polytope of p coincides with the set of probability distributions p such that p th p . That this is a convex polytope follows from the fact that the set of Gibbs-stochastic matrices also is. To characterize the thermal polytope of p explicitly, we construct a set of maps, called β-permutations, that take p to each one of its extremal points.
Then G is constructed as follows.
Row 0 : If e −βE α(0) < e −βE π(0) , set: else, let k 0 be the smallest integer such that: k0 j=0 e −βE π(j) ≥ e −βE α(0) . and set: else, let k m be the smallest integer such that: and set: In the next section, we prove the important fact that, if π is the β-order of p, the set {P (π,α) p} α , for α varying over all possible β-orders, includes all extremal points of the thermal polytope of p (Lemma 2).

Proof of properties of β-permutations
In the following we show that the β-permutations, defined Section 1.1, convert p into any of the extremal points of the thermal polytope p α , while being independent of the specific form of p (the set of β-permutation that need to be applied only depends on the β-order of p). This is in complete analogy with permutations and their role within the theory of doubly-stochastic matrices as determined by Birkhoff's theorem [24] (in fact, β-permutations become permutations in the infinite temperature limit).
Specifically, we will prove the following facts: 1. P (π,α) is a Gibbs-stochastic map.
2. Given p with β-order π, p α := P (π,α) p has β-order α and is such that there is no q with β-order α such that p th q th p α (Lemma 2).
3. The set of p α for varying α includes all the extremal points of the themal polytope of p (the latter is equivalently defined as all q such that p th q).

Proof of claim 1
Recall that we reordered the rows of P (π,α) according to α and the columns according to π: . . . . . .
Hence, the condition of Gibbs-stochasticity can be rewritten as For simplicity, we repeat here the algorithm for constructing P (π,α) . Row 0 is populated as follows: if e −βE α(0) < e −βE π(0) then Set: else Let k 0 be the smallest integer such that: Set: end if Let us first check that Eqs. (8) and (10) are fulfilled in each clause of row m = 0. In the first clause, it is clear that G 00 ≥ 0. Gibbs preservation follows as: For the second clause, it is again true that we have 0 ≤ G 0j ≤ 1, for all j (if G 0k0 were greater than 1, then this would contradict the definition of k 0 in Eq. (14)). We then note that: so Gibbs preservation is satisfied.
With row zero in place, row m (m ∈ {1, . . . d − 1}) is populated as follows: else Let k m be the smallest integer such that: Set:

end if
We now check that Eq. (8) and Eq. (10) are fulfilled in each of the clauses for each row in the matrix. For the first clause 0 ≤ G mkm−1 ≤ 1 follows from: where in the last inequality we used the definition of k m−1 . Finally, note that: and hence Eq. (10) holds. For the second clause, the definitions of k m−1 and k m ensure that both 0 ≤ G mkm−1 ≤ 1 and 0 ≤ G mkm ≤ 1 hold. We now show Gibbs preservation. First note that: where we have used Eq. (24) and Eq. (26) in the last line. Consider now Note that there exists an integer r such that 0 ≤ r ≤ m − 1 and: Using these expressions in Eq. (28), we see that: and hence the Gibbs distribution is preserved. Finally, let us show that the matrix constructed is stochastic. Using the expressions for G ikm−1 given above, the fact that k i = k r for all r ≤ i ≤ m − 1 and the definition of k m−1 , Together with Eq. (24) and the already proved fact that G ij ∈ [0, 1] for all i, j ∈ {0, . . . , d − 1}, we get that G is stochastic.

Proof of claim 2
This can be stated as the following Lemma:

Lemma 2.
If p has β-order π and p α := P (π,α) p, then p α has β-order α and is 'maximal' in the sense that there is no q with β-order α such that p th q th p α .
We now construct the proof. β-permutations have a simple geometrical description in terms of thermomajorization curves. In row zero, we want to maximize the population of E α(0) subject to the thermo-majorization constraints. To do this, we compare e −βE α(0) and e −βE π(0) . If e −βE α(0) < e −βE π(0) , then we cannot move all of the population in E π(0) to E α(0) without violating thermo-majorization and must instead move only a fraction of it, as given by Eq. (12). This case is illustrated in Fig. 7a. On the other hand, if e −βE α(0) ≥ e −βE π(0) , then we can move all of the population in E π(0) to E α(0) and set G 00 = 1. We then try to move population from E π(1) to E α(0) and repeat this process until we reach an energy level whose population we cannot move into E α(0) without violating thermo-majorization. This energy level is defined through Eq. (14) and given by E π(k0) . From E π(k0) we can only move a fraction of the population into E α(0) , given by Eq. (16). When we reach this point, we have moved as much population as possible into E α(0) . This case is illustrated in Fig. 7b.
In the mth row we are determining the population of E α(m) after the transformation. At this point in the construction, all of the population from energy level E π(j) for 0 ≤ j ≤ k m−1 − 1 has been transferred already into the set of energies E α(i) m−1 i=0 . We thus have G mj = 0 for 0 ≤ j ≤ k m−1 − 1. We now check to see how much of the remaining population in E π(km−1) we can move to E α(m) subject to the thermo-majorization constraints. If we can only move a fraction of it, we follow the 'if' clause in the above algorithm and determine G mkm−1 to be given by Eq. (20). This is illustrated in Fig. 8a. Alternatively, if we can move all of it, we follow the 'else' clause and G mkm−1 is given by Eq. (24). The rest of the construction follows a similar line of argument to the first row. We move all of the population from the energy levels after E π(km−1) to E α(m) until we reach an energy level E π(km) , where this is not possible due to thermo-majorization. This is illustrated in Fig. 8b.
Let c p : [0, Z S ] → [0, 1] be the function such that c p (x) is the height of the thermo-majorisation curve of p at x. The action of the β-permutation matrix described above is such that the thermo-majorization curve of p α = P (π,α) p is constructed as follows. For i ∈ {0, . . . , d − 1}, denoting by (x α i , y α i ) the points that are piecewise linearly connected to give the thermo-majorization curve of p α , one has: 2. Define p α i := y α α(i) − y α α(i−1) , with y α(−1) := 0. By construction, there is no q with β-order α such that p th q th p α .

Proof of claim 3
That all extremal points of the thermal polytope associated to p have the above form is stated and proved in Lemma 12 of Ref. [13].

Proof of Lemma 1: optimal unitary control
First we recall the notion of maximally active state associated to a given ρ, defined as:

Definition 2 (Maximally active state). Let ρ be the state of a system with associated Hamiltonian
are the eigenvalues of ρ arranged in ascending order.
This corresponds to the state with the highest energy along the whole unitary orbit of ρ, i.e.ρ = arg max U Tr[U ρU † H]. We can now prove Lemma 1 (note that we use facts from Section 2.1 in this proof):  Proof. The claim is equivalent to provingρ th U ρU † for every unitary U . We will first show that for any permutation π of {1, ..., d}: where λ ↑ denotes the vector of eigenvalues ofρ arranged in ascending order. To show it, we will construct a sequence of d − 1 Gibbs-stochastic matrices converting λ ↑ into πλ ↑ , passing through the intermediate states q (1) , . . . , q (d−1) , with q (d−1) = πλ ↑ and q (0) = λ ↑ . We start with q (0) . We wish to move the probability λ ↑ 0 associated with energy level E 0 to energy level E π(0) . To do this, we transpose in sequence the populations of energy levels 0 ↔ 1, 1 ↔ 2, . . . , π(0) − 1 ↔ π(0). Each of these is possible under Gibbs-stochastic matrices, because λ ↑ 0 ≤ λ ↑ j+1 , ∀j and E j ≤ E j+1 . In fact, they are achieved with the matrices: where I \(j,j+1) denotes the identity on all levels different from (j, j + 1) and Hence it is always possible to move population λ ↑ 0 to energy level E π(0) by setting q (1) = G (π(0)−1,π(0)) . . . G (1,2) G (0,1) q (0) . q (1) coincides with πλ ↑ on element π(0). Now, if we truncate element π(0) from q (1) , we obtain a vector satisfying Reasoning as before, we find a second sequence of Gibbs-stochastic matrices acting on the truncated vector, that transposes the populations in adjacent positions and moves λ ↑ 1 to the energy level E π (1) . Applied to q (1) , this sequence give a state q (2) which coincides with πλ ↑ in elements π(0), π (1). It should be clear that this construction can be repeated on every intermediate q (m) , each time applying it to the distribution in which we ignore the energy levels E π(0) , ..., E π(m−1) , since these have the correct occupation probability. This provides a sequence of states culminating in q (d−1) = πλ ↑ as required.
Having shown that Eq. (29) holds, we now argue forρ th U ρU † for all unitaries U . Given U , letp = diag U ρU † (remembering to first diagonalize the degenerate energy subspaces if necessary) and we want to prove λ ↑ thp . Letp ↑ be the state formed by arranging the elements ofp in ascending order. By Eq. (29), we have thatp ↑ thp . In addition, by the Schur-Horn theorem, we have that λ ↑ p ↑ . Combining this with the fact that both λ ↑ andp ↑ are ordered in terms of increasing occupation probability and have the same β-order, it follows that: Hence λ ↑ thp ↑ thp , which implies the claim.

Proof of Theorem 1
First, let us focus on optimality in a single round. That the initial unitary can be taken to be the one mapping ρ (k−1) S ⊗ ρ A to the corresponding maximally active state follows immediately from Lemma 1. Next, we perform the dephasing thermalization that maximizes the population of the ground state of S within the thermal polytope. If π k is the β-order of the state after the unitary, this can be chosen to be the β-permutation Λ (π k ,α) with α given in Eq. (1). That this is an ordering that maximizes the ground state population of S follows from the concavity of thermo-majorization curves. Furthermore, it maximizes are the populations of ρ (k) S arranged in descending order. This follows from Lemma 2 and the fact that the β-permutation of Eq. (1) is the one that maximises the x-axis coordinates of the elbow points of the output thermo-majorization curve.
We now formally show that the concatenation of such rounds forms an optimal protocol in P ρ A . Suppose that at the beginning of round k we have one of two diagonal states ρ will represent the trajectory followed by the state when we apply the claimed optimal protocol, whereasρ (k) S will be the trajectory followed by a generic protocol (hence, ρ Our goal is to show that: for all choices of ρ A , V (k) and Λ (k) . Here U (k) m.a. denotes the unitary creating the maximally active state on SA, V (k) is an arbitrary unitary and Λ (k) a dephasing thermalization. This implies not only that our protocol maximizes the ground state population in a given round but also that deviating from it can only have an adverse effect on the achievable population in subsequent rounds.
To see that Eq. (30) holds, first note that as ρ , then: This gives: The first line follows from Eq. (31) by comparison of the thermomajorization curves, since maximally active states have the same β-orders. The second line follows from Lemma 1. Now, by definition of Λ (π k ,α) and Eq. (32),

A qudit protocol with β-swaps and no ancillas
The β-permutations affecting only two energy levels (i, j) at once are called β-swaps [13] and have the form where I \(i,j) is the identity on every level l = i, j. We now define the following Gibbs-stochastic matrix, which is a many-level generalization of the matrix A 6 from Ref. [25]: where ∆ ij = E i − E j .G can be generated as a sequence of β-swaps:G = d−2 i=0 β i+1,i . Now consider the protocol in which, at each round, we perform the following steps: 1. A unitary transformation U is performed, that swaps the population of the ground state and the most excited state.

Proof of trivial Markovian cooling for d = 2
Markovian dephasing thermalizations are defined as those dephasing thermalizations that can be written as the solution of a master equation with a (possibly time-dependent) generator in Lindblad form. Here we show that the lowest achievable temperature by Markovian dephasing thermalizations without ancillary systems is limited to that of the environment. As discussed in Sec. 2.1 of the Methods, dephasing thermalizations act on the population vector p as Gibbsstochastic matrices, in this case 2 × 2. Using the results of Ref. [26], a direct computation shows that the action on p of Markovian dephasing thermalizations can be written as with β 01 given by Eq. (33). Let (p, 1 − p), (q, 1 − q) (s, 1 − s) be the energy distribution at the start, after the unitary, and after the Markovian dephasing thermalization, respectively. A direct calculation shows that the thermalization relates s and q by Since p and q are related by a unitary, we have that p(1 − p) ≤ q(1 − q). Assume that the system is initially hotter than the bath, p < 1/(1 + e −βE ). By direct inspection the maximum is found at λ = 1/(1 + e −βE ), where one achieves s = 1/(1 + e −βE ), the thermal ground state population at temperature β. The same would be true if one optimized over all Markovian thermal operations (using again the results of Ref. [26] and the proof in Sec. 2.9).

Proof of Theorem 2
We prove this theorem via a direct calculation of the ground state probability after a repeated application of the unitaries. Note that the state ρ (k) S at the beginning of round k + 1 of the optimal protocol is incoherent for every k ≥ 1. As such, the system-bath state after round k will have the general form: for some occupation probabilities p The relation between the occupations at step k and those at step k + 1 is (see Fig. 9) Solving these equations recursively, we find the populations at step k as a function of the initial occupation probabilities (at k = 0): As before, we denote by p . Figure 9: The circulation of populations induced by each round (Pauli X followed by U β SB ) of the optimal protocol. Arrows indicate complete transfer of population. The picture gives an intuitive understanding of the cooling mechanism of the optimal protocol.
We now use the relation p where t (0) k = 1 − e −βE e −kβE denotes the occupation of the kth energy level of the bosonic mode at the beginning of the protocol. This finally leads to (36) Direct substitution shows that this expression is identical to that derived in Theorem 1 for the optimal protocol.

Robustness of Theorem 2 to anharmonicity
We model an anharmonicity in the oscillator as a correction to the energy gap between the energy levels n and n + 1 (which we now label as E an n ) where τ 1. To analize its effect, notice that the ground state from Eq. (36) is and that this does not rely on any particular form for the populations t e −βEn peaks at less than 0.005%. This shows that these protocols are robust to realistic anharmonicities.

Proof of Theorem 3 and its extension to thermal operations
We give a proof of a more general result than Theorem 3, optimizing the thermalization step over the set of thermal operation (TO) T on a qubit, which strictly include all dephasing thermalizations.

Preliminaries
First we review the definition of thermal operations [17,27], which are channels acting on a quantum system S in state ρ S and with Hamiltonian H S as: where β = 1/(kT ) is the inverse temperature of the surrounding heat-bath, H B is an arbitrary Hamiltonian and U is an energy preserving unitary that satisfies [U, H S + H B ] = 0. Let S be a qubit with Hamiltonian H S = E|1 1|. Given ρ S , in addition to the occupation probabilities p S = (p, 1 − p) T , let ρ 01 = 0|ρ S |1 denote the off-diagonal term. Similarly, for a second state σ S , let q S = ( 0|σ S |0 , 1|σ S |1 ) T and σ 01 = 0|σ S |1 . Then any thermal operation on a qubit can be described by: 1. The action on the population (given by a Gibbs-stochastic matrix G).
2. The action on the off-diagonal term.
Taken together this gives: where the expression for c follows from the constraint |c| ≤ √ G 00 G 11 [28,29] which is a consequence of the complete positivity of T . One can also apply a phase e iθ to ρ 01 by a unitary, so that in general c ∈ R, but since this will be irrelevant in the following considerations, without loss of generality we set θ = 0 and take c ≥ 0 (one can always reversibly transform to any θ = 0 by an energy-preserving unitary, which is a thermal operation). Hence, for our purposes a qubit thermal operation T is defined by the two parameters:

v [T ] = (λ, c) .
(42) With this notation, the β-swap introduced in the main text is a thermal operation T β that removes all quantum coherences and has G = G β-swap where 0). Dephasing thermalizations are the subset of thermal operations with c = 0.

Proof of an extended version of Theorem 3
Define the set of -noisy thermal operations as the set of thermal Operation T such that Let T ∅ denote the set of cooling protocols using no ancilla in which at each round k 2. An -noisy thermal operation T (k) is applied to S.

A unitary V (k) is applied to S
Then, Theorem 4. Under the assumptions of Corollary 1 and given ≤ 1 1+e βE +e 2βE , the optimal nontrivial cooling protocol in T ∅ is such that in each round k: 1. The Pauli X unitary is applied to S.

Any -noisy thermal operation T with v(T ) = (1 − , c) is applied to S.
The population of the ground state after round k is: where Z = 1 + e −βE and p (k) Before we prove this theorem, let us discuss the consequences. Note that a particular choice for the optimal cooling protocol is T with v(T ) = (1 − , 0). This is just the -noisy β-swap defined in the main text. Since T ∅ ⊂ P ∅ , this means that the above theorem implies Theorem 3 of the main text as an immediate corollary. Also note, that setting = 0, we obtain as a simple consequence a direct proof of Corollary 1 that does not goes through Theorem 1.
Furthermore, note it does not matter what is the choice for c in the -noisy thermal operation: it could even vary from round to round. The Jaynes-Cummings implementation described in the main text is an -noisy thermal operation. On the population it acts as an -noisy β-swap, but it has c = 0. However, the above theorem shows that this does not make any difference, since having control on the coherent part of the evolution does not provide any advantage and the performance is independent by the choice of c at each step. In other words, as claimed in the main text, we can safely ignore the coherent evolution and focus on the induced population dynamics, at least for small enough.
Proof. The most general protocol can be written as where U (i) (·) = U (i) (·)U (i) † and V (i) (·) = V (i) (·)V (i) † denote general unitaries applied to S in round i and T (i) is a thermal operation on S applied at round i.
1 . This can be done without loss of generality, since any protocol in P starts with an arbitrary unitary U (1) .
We want to maximize the ground state population over all choices of U (i) , T (i) and V (i) . As we perform an arbitrary unitary at the beginning of every round, we can assume without loss of generality that the state of S at the start of round k + 1 is diagonal in the energy eigenbasis: and that p (k) ≥ 1 2 (here we drop the subscript 0 to simplify the notation). In round k + 1 of the protocol, we have: and our goal is to maximize p (k+1) . As ρ . Among all thermal operations associated to a fixed Gibbs-stochastic matrix G with Gq (k) := s (k) (with q (k) and s (k) defined through the above matrices), optimality of the protocol imposes that we choose T (k) to be a thermal operation that maximizes the absolute value of b (k) -i.e. it preserves the maximum possible amount of coherence. This is achieved by the thermal operation that maximizes the parameter c for given λ, i.e., from What remains to be done is to perform an optimization over all possible Gibbs-stochastic matrices G, parametrized by λ ∈ [0, λ max ] as in Eq. (41), where λ max ≥ 1 − 1 1+e βE +e 2βE (which corresponds to ≤ 1 1+e βE +e 2βE ). For each G, the relation between s (k) , b (k) and q (k) and a (k) is Finally, using the unitarity of U (k) , we can relate q (k) and a (k) to p (k) via:

Jaynes-Cummings model without refreshing the thermal mode
In the previous result the mode is always reset back to the thermal state after every step. Theorem 2 shows that when using U β SB the same mode can be used repeatedly. Does this still hold (at least approximately) when the interaction is via the Jaynes-Cummings model?
We explore this question via a numerical simulation of a modified algorithm where now, at each round k, instead of rethermalizing the mode completely, the bosonic mode is partially re-thermalized via a dissipation process. This is done with a standard master equation, which models the evolution of state ρ B of the cavity mode due to the interaction with an external thermal field [30]: Here A is the rate of loss of cavity photons (controlling the strength of the re-thermalization) and n = 1 e βE −1 is the average number of reservoir quanta with energy E.
We compare the cooling achieved after k rounds in the case of full reset at each round (Eq. (6) with = 1−G 0|1 (s)), with the cooling achieved for various finite re-thermalization times. In each case, we fix βE = 1 and a particular interaction time between the qubit and the cavity mode,s = gt = 98.92. Note that the same procedure can be applied for any s, or even taking s to be a random variable to simulate imperfections in the timing. The results are shown in Fig. 11. We find that, unlike in the case of implementing U β SB , one has to reset the mode back to the thermal state as much as possible in order for the algorithm to work efficiently with a fixed interaction time.
Reasonably high cooling can be achieved by interrupting the protocol after 2 rounds. The above findings suggest the experimental setup discussed in the main text, where atoms are slowly fired inside two identical cavities resonant with the qubit transitions we are cooling. The protocol on each atom then consists of: 1. First Pauli X applied.
2. First JC interaction applied, for some times.
3. Second Pauli X applied. t th =∞ Figure 11: Cooling achieved by the Pauli/Jaynes-Cummings protocols as a function of the number of rounds, when the mode is re-thermalized for various times t th . The infinite time corresponds to the mode being completely re-thermalized at every step (light blue curve at the top), while t th = 0 corresponds to no re-thermalization at all (red curve with wide oscillations). We see that re-thermalization is needed to achieve the greatest cooling. For this figure, the parameters are βE = 1, A = 1, g = 1 and the Jaynes-Cummings interaction is turned on for a period of timet = 98.92.
4. Second JC interaction applied, for the same times.
5. The cavity modes undergo re-thermalization for a finite time according to Eq. (51).
While in the first step the JC interaction achieves a de-excitation probability of G 0|1 (s) given by Eq. (49), every subsequent atom interacts with only partially re-thermalized cavities, for which Eq. (49) does not hold. Thus, the protocol may not achieve the same cooling on every atom. Nevertheless, one may expect that, by firing the atoms slowly enough, the re-thermalization of the cavities due to losses will be sufficient to make the cooling performance almost constant. This intuition is confirmed in Fig. 5. We take the atoms to be initially in a thermal state with βE = 1. We then plot the final ground state population achieved by each atom passing through the two cavities, as a function of the number of atoms already cooled. The various curves represent different choices for the ratio A/r between the strength of the re-thermalization and the rate at which the atoms are fired (with the caveat that we assume r small enough so that two atoms are never present at the same time in a cavity). When A/r = ∞, the single mode has time to re-thermalize perfectly, the performance is the same for each atom and given by Eq. (6) with k = 2 and = 1 − G 0|1 (98.92). In realistic scenarios, however, we see that the incomplete thermalization negatively impacts upon the performance. However, we also see that the performance stabilizes to a constant after a small number of atoms are fired, so for a high enough ratio A/r cooling of any number of atoms is possible. This may be understood as the creation of a steady state in the cavity field.