Thermodynamics of Quantum Causal Models: An Inclusive, Hamiltonian Approach

Operational quantum stochastic thermodynamics is a recently proposed theory to study the thermodynamics of open systems based on the rigorous notion of a quantum stochastic process or quantum causal model. There, a stochastic trajectory is defined solely in terms of experimentally accessible measurement results, which serve as the basis to define the corresponding thermodynamic quantities. In contrast to this observer-dependent point of view, a 'black box', which evolves unitarily and can simulate a quantum causal model, is constructed here. The quantum thermodynamics of this large isolated system can then be studied using widely accepted arguments from statistical mechanics. It is shown that the resulting definitions of internal energy, heat, work, and entropy have a natural extension to the trajectory level. Their canonical choice coincides with the definitions proposed in operational quantum stochastic thermodynamics, thereby providing strong support in favour of that novel framework. However, a few remaining ambiguities in the definition of stochastic work and heat are also discovered, and in light of these findings some other proposals are reconsidered. Finally, it is demonstrated that the first and second law hold for an even wider range of scenarios than previously thought, covering a large class of quantum causal models based solely on a single assumption about the initial system-bath state.


Introduction
The success of the classical framework of stochastic thermodynamics is undeniable. It pushes the validity of the laws of thermodynamics far beyond their original scope, it allows one to consistently describe the thermodynamics of small, fluctuating, out-of-equilibrium systems, even along a single trajectory, and many of its predictions have been verified experimentally [1][2][3][4][5][6].
In contrast, how to describe the thermodynamics of small quantum systems along a single 'trajectory' has remained a subject of debate for 20 years. The main reason is the measurement backaction of an external observer, who manipulates a small quantum system and thereby changes the process. This implies that any theory of quantum stochastic thermodynamics should be able to consistently treat the measurement backaction and is necessarily different from its classical counterpart [7]. In the past, many different approaches have been put forward, often differing in their predictions and lacking either an experimentally feasible way to verify them or the ability to describe quantum effects. Recently, based on a rigorous notion of a quantum stochastic process or quantum causal model [8][9][10][11][12][13][14][15][16], an 'operational' approach to quantum stochastic thermodynamics was constructed [17][18][19]. It puts the experimenter in the foreground by explicitly including all external interventions (state preparation, measurements, feedback operations, etc.) in the description. A 'stochastic trajectory' is defined solely in terms of experimentally available (classical) measurement results, on which the corresponding thermodynamic quantities are built. The formalism is free from many restrictive assumptions used previously (e.g., perfect measurements, continuous measurements, detailed control over the bath degrees of freedom, no feedback control, use of ambiguous notions for time-reversed trajectories, etc.) and can be readily applied to analyse a multitude of experiments including Refs. [20,21].
Nevertheless, the definitions used in Refs. [17][18][19] were derived from an observer-dependent point of view, involving quantum measurement theory, subjective choices of the 'Heisenberg cut', and certain classicality assumptions. To circumvent the use of any such elements, this paper rederives the framework of operational quantum stochastic thermodynamics based on an inclusive, Hamiltonian ('autonomous') approach. By using only arguments from nonequilibrium statistical mechanics of isolated systems, we provide a solid and independent justification for the definitions of Refs. [17][18][19]. To the best of the authors' knowledge, this distinguishes the operational approach from other proposals in quantum stochastic thermodynamics. The idea to model everything autonomously is not novel and has been used very successfully to understand the physics of Maxwell's demon [22][23][24][25][26][27][28][29] or the thermodynamics of various forms of information processing [30][31][32]. Of particular inspiration in our context is the approach by Deffner and Jarzynski [31], hence it is worthwhile to distinguish our approach from it. First and most importantly, they did not explicitly connect their autonomous approach to an observer-dependent point of view to obtain the corresponding thermodynamic definitions at the trajectory level. Second, and more a matter of technicalities, their approach was classical, used certain weak coupling assumptions, and treated information and entropy differently by excluding correlations, which turn out to be crucial for our purposes. By overcoming all these assumptions, we not only justify the framework of Refs. [17][18][19], but also provide a general and promising tool to study the emergence of thermodynamic quantities at the trajectory level without making explicit use of quantum measurements. This opens up a novel possibility to derive the laws of thermodynamics at the trajectory level even beyond the present considerations.
A short summary together with an outline of the paper reads as follows. First, in Sec. 2 we briefly review the essentials of a quantum causal model or quantum stochastic process as far as they are needed for the following. Afterwards, in Sec. 3, we carefully construct the corresponding autonomous model, whose dynamical equivalence to a quantum causal model is proven in Sec. 4. The central part of this paper is Sec. 5. There, we study the thermodynamics of our autonomous model using arguments from statistical mechanics and we demonstrate that it naturally induces definitions at the trajectory level in accordance with operational quantum stochastic thermodynamics (apart from one minor and typically negligible difference). Furthermore, we also discover that our autonomous approach leaves room for some remaining ambiguities in the definition of stochastic heat and work. In light of this freedom, we show that the definition of stochastic work in the 'two-point projective measurement scheme' [33,34] provides one possible consistent choice within our autonomous approach. However, further arguments show that it is only valid for isolated, but not for open systems (in which we are primarily interested here). On the other hand, the concept of "quantum heat", at least as originally introduced in Ref. [35], does not have any theoretical foundation within our autonomous approach. Finally, the paper ends with some additional noteworthy remarks in Sec. 6.

Quantum causal models
Although there are some differences in the detailed mathematical description of a quantum causal model or quantum stochastic process [8][9][10][11][12][13][14][15][16], the common idea is that the primary entity in an experiment is the control operation or intervention performed on the system, not the state (i.e., density operator) of the system itself. By shifting one level higher from states to operations, a quantum causal model can be represented by a multi-linear map from the set of interventions (applied at different times) to a final output state. While quite abstract at first sight, this offers many conceptual advantages, for instance, to optimize quantum circuits [36,37], to rigorously define quantum non-Markovianity [38] or classicality [39,40] in quantum processes, as well as to design multi-time resource theories [41]. If the system is classical, the approach reduces to classical causal modeling [42], which allows one to go beyond the standard description of a classical stochastic process, which is based only on passive and perfect observations. We here closely follow Refs. [15,16], which have a clear interpretation in terms of stochastic trajectories, see also Ref. [43]. We note that we do not attempt to review the approach in its full generality.
To begin with, we briefly review the essentials of quantum operations, instruments and interventions [44,45]. At any time, any such operation is described by a completely positive map $\mathcal{A}(r)$, where $r$ denotes the measurement result associated with this intervention. This could be the result of a standard projective measurement or of a more general measurement [46,47]. Its action on the density operator $\rho_S$ of the system is denoted as $\tilde\rho_S(r) = \mathcal{A}(r)\rho_S$, where we use a tilde to denote a non-normalized state. The probability to obtain outcome $r$ is encoded in its trace, $p(r) = \mathrm{tr}_S\{\tilde\rho_S(r)\}$. A set of completely positive maps $\{\mathcal{A}(r)\}$ forms an instrument if its average effect $\mathcal{A} \equiv \sum_r \mathcal{A}(r)$ is a completely positive and trace-preserving map. The map $\mathcal{A}$ can be written in the familiar operator-sum representation
$$\mathcal{A}\rho_S = \sum_\alpha K_\alpha \rho_S K_\alpha^\dagger, \qquad \sum_\alpha K_\alpha^\dagger K_\alpha = 1_S,$$
where it is also known as a Kraus map. Here, $1_S$ denotes the identity in the system Hilbert space.
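As a concrete illustration of an instrument, the following minimal sketch builds a two-outcome unsharp measurement on a qubit and evaluates the non-normalized states and the outcome probabilities $p(r)$. The error probability `EPS`, the Kraus operators and the qubit state are hypothetical choices made for this example only; they are not taken from the references.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def apply_operation(kraus_op, rho):
    """Non-normalized state A(r) rho = K_r rho K_r^dagger (one Kraus operator per outcome)."""
    return matmul(matmul(kraus_op, rho), dagger(kraus_op))

EPS = 0.1  # hypothetical error probability of the unsharp measurement
KRAUS = {
    0: [[math.sqrt(1 - EPS), 0.0], [0.0, math.sqrt(EPS)]],
    1: [[math.sqrt(EPS), 0.0], [0.0, math.sqrt(1 - EPS)]],
}

rho_s = [[0.8, 0.3], [0.3, 0.2]]  # illustrative qubit density matrix

# Non-normalized conditional states and their traces p(r).
unnormalized = {r: apply_operation(k, rho_s) for r, k in KRAUS.items()}
probabilities = {r: trace(state).real for r, state in unnormalized.items()}

# Average effect A = sum_r A(r): trace preserving, coherences are damped.
average_state = [[unnormalized[0][i][j] + unnormalized[1][i][j]
                  for j in range(2)] for i in range(2)]

print(probabilities)
print(trace(average_state).real)
```

The probabilities sum to one because the two Kraus operators satisfy the completeness relation, while the off-diagonal elements of the averaged state shrink, which is the backaction of the measurement.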
By generalizing to multiple times, we now allow that the external agent interrupts the time evolution of the system at arbitrary times $t_n > \dots > t_0$ by arbitrary interventions characterized by an instrument $\{\mathcal{A}_k(r_k)\}$. Here, the subscripts indicate the time $t_k$ at which the intervention happens. Note that at each time $t_k$ we can choose a different instrument with, possibly, a different set of measurement results associated with it. Given a sequence of measurement results, denoted by $\mathbf{r}_n = (r_0, \dots, r_n)$, the non-normalized state of the system at a time $t > t_n$ can be formally written as
$$\tilde\rho_S(t,\mathbf{r}_n) = \mathfrak{T}[\mathcal{A}_n(r_n),\dots,\mathcal{A}_0(r_0)]. \tag{1}$$
Here, the so-called 'process tensor' [15,16,38] $\mathfrak{T}$ represents a multi-linear map from the set of control operations to the final system state. Note that we only indicated the dependence on the control operations $\mathcal{A}_k(r_k)$ because those are the objects we assume to be controllable in an experiment. In contrast, for instance, we did not indicate the dependence on the initial system state, which is assumed to be arbitrary but fixed [albeit it can be manipulated via $\mathcal{A}_0(r_0)$]. In particular, in case of system-environment correlations the process tensor does not depend linearly on the initial system state. Furthermore, we remark that the process tensor can be tomographically reconstructed by measuring the final output state many times in response to the chosen set of control operations; hence, it is an experimentally well-defined object. Finally, the probability to get the results $\mathbf{r}_n$ is given by $p(\mathbf{r}_n) = \mathrm{tr}_S\{\tilde\rho_S(t,\mathbf{r}_n)\}$.
Microscopically, the process tensor arises from the following picture. Let
$$H_{SB}(\lambda_t) = H_S(\lambda_t) + V_{SB} + H_B \tag{2}$$
denote an arbitrary system-bath Hamiltonian, where $H_S$ ($H_B$) is the bare system (bath) Hamiltonian and $V_{SB}$ their mutual interaction. Furthermore, in view of the thermodynamic framework considered later on, we already introduced some time-dependent driving protocol $\lambda_t$ (e.g., an electric or magnetic field), which can change the energies of the system. For the present section, however, this is of minor relevance. The global unitary time evolution from $t$ to $t'$ is described by the superoperator
$$\mathcal{U}_{t',t}\,\rho_{SB}(t) \equiv U_{SB}(t',t)\,\rho_{SB}(t)\,U_{SB}^\dagger(t',t), \tag{3}$$
where $U_{SB}(t',t) = \mathcal{T}_+\exp\left[-i\int_t^{t'} ds\, H_{SB}(\lambda_s)/\hbar\right]$ with the time-ordering operator $\mathcal{T}_+$. Then, the process tensor can be microscopically expressed as
$$\mathfrak{T}[\mathcal{A}_n(r_n),\dots,\mathcal{A}_0(r_0)] = \mathrm{tr}_B\left\{\mathcal{U}_{t,t_n}\mathcal{A}_n(r_n)\cdots\mathcal{U}_{t_1,t_0}\mathcal{A}_0(r_0)\,\rho_{SB}(t_0^-)\right\}. \tag{4}$$
Here, $\rho_{SB}(t_0^-)$ denotes the global system-bath state, which can be arbitrary at the moment, prior to the first intervention, which happens at time $t_0$. Note that we use in general the notation $t^\pm$ to denote a point in time just before or after $t$. Furthermore, the multi-linearity of the process tensor is evident from Eq. (4). Finally, remember that each $\mathcal{A}_k(r_k) \equiv \mathcal{A}_k(r_k)\otimes\mathcal{I}_B$ acts nontrivially only on the system (we suppress any identity operations $\mathcal{I}$ as well as many tensor products in the notation).
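The microscopic construction of the process tensor can be sketched numerically for a toy model: one system qubit, one bath qubit, a fixed partial-swap unitary standing in for the system-bath evolution between interventions, and projective measurements on the system as interventions. All of these choices (the angle `THETA`, the partial swap, the initial states) are assumptions made for illustration and do not come from the text.

```python
import itertools
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    """Kronecker product; the first factor is the system, the second the bath."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def conjugate_by(op, rho):
    return matmul(matmul(op, rho), dagger(op))

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def trace_out_bath(rho_sb):
    """Partial trace over the second (bath) qubit of a 4x4 matrix."""
    return [[sum(rho_sb[2 * s + b][2 * t + b] for b in range(2))
             for t in range(2)] for s in range(2)]

THETA = 0.4  # hypothetical coupling angle
COS_T, SIN_T = math.cos(THETA), math.sin(THETA)
# Partial swap exp(-i THETA SWAP) as a stand-in for the system-bath unitary.
U_SB = [
    [COS_T - 1j * SIN_T, 0, 0, 0],
    [0, COS_T, -1j * SIN_T, 0],
    [0, -1j * SIN_T, COS_T, 0],
    [0, 0, 0, COS_T - 1j * SIN_T],
]
IDENTITY = [[1, 0], [0, 1]]
PROJECTORS = {0: [[1, 0], [0, 0]], 1: [[0, 0], [0, 1]]}

def process_tensor(system_kraus_ops, rho_sb):
    """tr_B{U A_n ... U A_0 rho_SB} for interventions with one Kraus operator each."""
    state = rho_sb
    for kraus_op in system_kraus_ops:
        state = conjugate_by(kron(kraus_op, IDENTITY), state)  # intervention on S only
        state = conjugate_by(U_SB, state)                      # joint unitary evolution
    return trace_out_bath(state)

rho_sb_initial = kron([[0.5, 0.5], [0.5, 0.5]], [[0.7, 0.0], [0.0, 0.3]])

probabilities = {
    results: trace(process_tensor([PROJECTORS[r] for r in results], rho_sb_initial)).real
    for results in itertools.product(range(2), repeat=2)
}
print(probabilities)
```

Each entry of `probabilities` is the trace of a non-normalized output state for one sequence $(r_0, r_1)$. Summing over all sequences gives one, since the averaged interventions are trace preserving.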
It turns out [10][11][12][13][14][15][16] that the framework can be generalized even further. Remember that $\mathfrak{T}$ is a process tensor acting multi-linearly on a sequence of interventions. Equivalently, the process tensor can be seen as an object that acts on the tensor product of spaces $\mathcal{L}(\mathcal{H}_S\otimes\mathcal{H}_S)$, where $\mathcal{L}(\mathcal{H}_S\otimes\mathcal{H}_S)$ denotes the vector space of linear maps acting on $\mathcal{H}_S\otimes\mathcal{H}_S$ with the system Hilbert space $\mathcal{H}_S$. Thus, if we denote by $\mathcal{A}_{n:0}(\mathbf{r}_n) \equiv \mathcal{A}_n(r_n)\otimes\cdots\otimes\mathcal{A}_0(r_0)$ an element of that space, we can write Eq. (1) in short as $\tilde\rho_S(t,\mathbf{r}_n) = \mathfrak{T}\mathcal{A}_{n:0}(\mathbf{r}_n)$. Now, due to linearity, it is possible to consider any sequence of control operations $\mathcal{A}_{n:0}$, not only those that are decorrelated as $\mathcal{A}_{n:0}(\mathbf{r}_n)$ is. This happens, for instance, when one considers the average effect of classical feedback control, where $\mathcal{A}_k(r_k) = \mathcal{A}_k(r_k|\mathbf{r}_{k-1})$ depends on previous measurement results. Note that the driving protocol $\lambda_t = \lambda_t(\mathbf{r}_{k-1})$ is also allowed to depend on previous measurement results. This generality captures any conceivable feedback scenario, but for notational simplicity we suppress the possible dependence on $\mathbf{r}_{k-1}$ most of the time. Furthermore, it is even possible to consider quantum correlated operations. This goes beyond classical feedback control and can result in interventions that can no longer be written as a completely positive map at a single time (the overall process tensor nevertheless preserves complete positivity). It will become clear from the exposition below that we can also include this in our autonomous framework, but for ease of presentation we refrain from discussing the most general scenario with all its details. Finally, within the framework of quantum causal models it is even possible to consider interventions $\mathcal{A}_k(r_k)$ where the input and output spaces are different (for instance, by adding or discarding ancillas to or from the system), or space-like separated interventions, which happen at different laboratories. Again, we find that the benefits of the greater generality do not outweigh the drawback of a more cumbersome presentation here.

Autonomous model
In this section we construct the autonomous model, which simulates a quantum stochastic process of the form (1) if it is finally subjected to an appropriate measurement giving result $\mathbf{r}_n$. That this is in principle possible is not new, see Refs. [10,15,48]. Our discussion is, however, less abstract and more 'physics'-oriented by explicitly specifying Hamiltonians. This is needed later on to formulate a theory of thermodynamics. We will guide our construction along the experimental setup sketched in Fig. 1. We proceed in two steps.

Figure 1: A system S is in contact with a bath, which, in view of the thermodynamic framework considered later on, is sketched as a heat bath with initial temperature T. A preparation apparatus P sequentially produces ancillas A(k), k = 0, 1, ..., which interact with the system when they enter the shaded grey area and thereby implement a control operation. Afterwards, these ancillas are detected, giving rise to a measurement outcome $r_k$, which is stored in a memory M. As indicated by the feedback loop, the external agent can decide to change, e.g., the state of each ancilla (sketched with different colors) or the Hamiltonian of the system or the system-ancilla interaction via the protocol $\lambda_t$ (not explicitly sketched) conditioned on all previous outcomes.
First, we only consider the unconditional or unmeasured dynamics. This means that the external agent only deterministically implements control operations $\mathcal{A}_k$ at time $t_k$ described by completely positive and trace-preserving maps, which do not depend on any measurement result $r_k$. Pictorially speaking, we ignore the right-hand side of Fig. 1 (the detector, the memory, and the feedback loop). Then, the main insight to get an autonomous Hamiltonian model for this situation rests on the unitary dilation theorem, first proven by Stinespring [49] (see also Refs. [44,45]). It states that any control operation can be written as the reduced dynamics of a unitary interaction with an external ancilla system:
$$\mathcal{A}_k\rho_S = \mathrm{tr}_{A(k)}\left\{U_{SA(k)}\,\rho_S\otimes\rho_{A(k)}\,U_{SA(k)}^\dagger\right\}. \tag{5}$$
Here, $U_{SA(k)}$ denotes the unitary operator resulting from the system-ancilla interaction and $\rho_{A(k)}$ the initial state of the kth ancilla, which was prepared in a preparation apparatus P. Note that the unitary and the initial state are allowed to depend on $k$ such that, in general, $\mathcal{A}_k \neq \mathcal{A}_\ell$ for $k \neq \ell$. The Hamiltonian associated with this 'unconditional' setup therefore reads
$$H_{SBPA}(\lambda_t) = H_{SB}(\lambda_t) + H_{PA}(\lambda_t) + H_{SA}(\lambda_t). \tag{6}$$
In detail, it consists of the following parts:

A. System-bath part $H_{SB}(\lambda_t)$. This is the same as in Eq. (2), describing the system, bath and their interaction ignoring any external influence.

B. Ancilla preparation $H_{PA}(\lambda_t)$. In this part the different ancillas are produced by implementing a unitary $U_{PA(k)}$ prior to the interaction of ancilla A(k) with the system. By fixing a suitable initial state $\rho_P(t_0^-)$ of the preparation apparatus, we can, due to Eq. (5) and by choosing an appropriate $U_{PA(k)}$, implement any operation we want on the ancilla. Hence, we can prepare any ancilla state we like [50]. Due to this, the initial state of the ancillas $\rho_A(t_0^-)$ can be in principle arbitrary, albeit in any experiment there are certain restrictions imposed on the preparation of the initial ancilla states, see, e.g., Refs. [20,21]. Note that we use A to denote the totality of all ancillas A(0), A(1), ..., A(n) and that n can be an arbitrarily large number.

C. System-ancilla part $H_{SA}(\lambda_t)$. This Hamiltonian reads in detail
$$H_{SA}(\lambda_t) = \sum_{k=0}^{n}\left[H_{A(k)} + V_{SA(k)}(\lambda_t)\right] \tag{7}$$
and describes the bare Hamiltonian $H_{A(k)}$ of each ancilla A(k) as well as its interaction $V_{SA(k)}(\lambda_t)$ with the system. Each $H_{A(k)}$ can be different and in principle even time-dependent, albeit this is typically not the case and therefore, we omitted it for notational simplicity. In contrast, the time-dependence of $V_{SA(k)}(\lambda_t)$, which can again be different for each A(k), is crucial. Later on in Sec. 4 we will design it in such a way that it implements the unitary $U_{SA(k)}$ in Eq. (5). At the moment, however, we are more relaxed and only assume that $V_{SA(k)}(\lambda_t)$ is zero outside the 'interaction zone' with the system (the shaded grey area in Fig. 1). In particular, it is zero when the ancilla gets prepared in P or measured afterwards (see below).

D. Work reservoir $\lambda_t$. We still allow for an external time-dependent field $\lambda_t$, which is responsible for, e.g., changing the system Hamiltonian $H_S(\lambda_t)$ or switching on and off the system-ancilla interactions $V_{SA(k)}(\lambda_t)$. This means that we model the driving, which will be later on identified with the work supplied to the setup, semi-classically. While this is not fully autonomous (in the sense of a completely time-independent model), the resulting dynamics are nevertheless unitary. Note that the ideal limit needed to generate a time-dependent Hamiltonian out of a time-independent one is understood [31]. As the purpose of this paper is not to understand the detailed autonomous modeling of work reservoirs, we stick throughout to this semi-classical picture for ease of presentation.
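The dilation statement of Eq. (5) can be checked on a small example. The sketch below assumes, purely for illustration, a qubit system, a qubit ancilla prepared in the state $|0\rangle$, and a CNOT gate as the system-ancilla unitary; tracing out the ancilla then reproduces the fully dephasing Kraus map $\rho_S \mapsto \sum_r P_r\rho_S P_r$. None of these specific choices is taken from the text.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    """Kronecker product; the first factor is the system, the second the ancilla."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def conjugate_by(op, rho):
    return matmul(matmul(op, rho), dagger(op))

def trace_out_ancilla(rho_sa):
    """Partial trace over the second (ancilla) qubit of a 4x4 matrix."""
    return [[sum(rho_sa[2 * s + a][2 * t + a] for a in range(2))
             for t in range(2)] for s in range(2)]

# Hypothetical system-ancilla unitary: CNOT with the system as control.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

rho_s = [[0.8, 0.3], [0.3, 0.2]]   # illustrative system state
rho_a = [[1.0, 0.0], [0.0, 0.0]]   # ancilla prepared in |0><0|

# Left-hand side of the dilation: reduced dynamics of the unitary interaction.
dilated = trace_out_ancilla(conjugate_by(CNOT, kron(rho_s, rho_a)))

# Right-hand side: the same map in operator-sum form with projectors P_0, P_1.
projectors = ([[1, 0], [0, 0]], [[0, 0], [0, 1]])
terms = [conjugate_by(p, rho_s) for p in projectors]
kraus_form = [[terms[0][i][j] + terms[1][i][j] for j in range(2)] for i in range(2)]

print(dilated)
print(kraus_form)
```

Replacing the ancilla state or the unitary changes the resulting map, which is how different control operations $\mathcal{A}_k$ are obtained for different $k$.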
We remark that the setup specified so far is identical to the framework of repeated interactions or collisional models as considered in Refs. [19][51][52][53]. Next, we want to explicitly include measurements and conditioning in the description. Here, the key mathematical ingredient to autonomously model the observer is an extension of Eq. (5). In fact, every possible intervention $\mathcal{A}_k(r_k)$ can be implemented as [45,54]
$$\mathcal{A}_k(r_k)\rho_S = \mathrm{tr}_{A(k)}\left\{P(r_k)\,U_{SA(k)}\,\rho_S\otimes\rho_{A(k)}\,U_{SA(k)}^\dagger\right\}, \tag{8}$$
where $P(r_k)$ is some orthogonal resolution of the identity in the ancilla Hilbert space, $\sum_{r_k}P(r_k) = 1_{A(k)}$. Note that the average effect of the intervention (8) is described by Eq. (5), i.e., $\sum_{r_k}\mathcal{A}_k(r_k) = \mathcal{A}_k$. To implement Eq. (8), we need additional degrees of freedom. They will turn out to describe an idealized classical memory responsible for performing the measurement of the ancilla and for storing the measurement result $r_k$. Finally, we also need to implement the feedback loop as sketched in Fig. 1 in an autonomous way, but this does not need any additional physical degrees of freedom. Thus, the Hamiltonian (6) is generalized to
$$H_{tot}(\lambda_t) = H_M(\lambda_t) + V_{AM}(\lambda_t) + \sum_{\mathbf{r}_n}H_{SBPA}(\lambda_t,\mathbf{r}_n)|\mathbf{r}_n\rangle\langle\mathbf{r}_n|. \tag{9}$$
We now study its terms again separately in detail.
E. Memory part $H_M(\lambda_t)$. Following the tradition of the thermodynamics of computation [55], we split the memory into informational degrees of freedom (IDF) $I$ and non-informational degrees of freedom (NIDF) $N$, which are here responsible for dephasing the IDF (see also Ref. [31]). Strictly speaking, the NIDF are not necessary for the following, but we keep them as they simplify the algebra and argumentation in some places and, in particular, including them seems more realistic from a physical perspective. Thus, the Hamiltonian of the memory is split as
$$H_M(\lambda_t) = H_I + V_{IN}(\lambda_t) + H_N. \tag{10}$$
The Hilbert space of the IDF is spanned by the vectors $|\mathbf{r}_n\rangle = |r_n\rangle\otimes\cdots\otimes|r_0\rangle$ encoding the measurement results. As customarily done, we assume that these states are energetically degenerate, i.e., $H_I \sim 1_I$. Furthermore, the IDF are initially in a standard reference state $|1\rangle$ decorrelated from the rest. We assume that the NIDF act like a pure dephasing bath such that the information stored in $I$ is classical, meaning that, after tracing out the NIDF, the IDF are only classically correlated with the rest:
$$\rho_{SBPAI}(t) = \sum_{\mathbf{r}_n}\tilde\rho_{SBPA}(t,\mathbf{r}_n)\otimes|\mathbf{r}_n\rangle\langle\mathbf{r}_n|_I. \tag{11}$$
The dephasing can be implemented in various ways and, in principle, does not entail any energetic cost. An explicit example works as follows$^1$: let $r_k \in \{1,\dots,d(k)\}$ label the in total $d(k)$ different measurement results at time $t_k$. Then, let $H_N$ describe a set of $n$ noninteracting and energetically degenerate entities $N(k)$, which are prepared in a maximally mixed state of dimension $d(k)$, $\rho_{N(k)} = 1_{N(k)}/d(k)$ (note that a maximally mixed state is identical to a Gibbs state for degenerate energies). Then, let $V_{IN}(\lambda_t)$ implement a short unitary evolution between $I(k)$ and $N(k)$, which happens right after the kth measurement and has the form
$$U_{IN(k)} = \sum_{r_k}|r_k\rangle\langle r_k|_{I(k)}\otimes\sum_{s}|s\oplus r_k\rangle\langle s|_{N(k)},$$
where $\oplus$ denotes addition modulo $d(k)$. Due to the degeneracy it is obvious that $[U_{IN(k)}, H_I + H_N] = 0$ and thus, the unitary has no energetic cost. Furthermore, straightforward algebra shows that
$$\mathrm{tr}_{N(k)}\left\{U_{IN(k)}\,\rho_{I(k)}\otimes\rho_{N(k)}\,U_{IN(k)}^\dagger\right\} = \sum_{r_k}|r_k\rangle\langle r_k|\rho_{I(k)}|r_k\rangle\langle r_k|. \tag{12}$$
Thus, we implemented a dephasing operation at zero energetic cost, as desired.

F. Ancilla-memory part $V_{AM}(\lambda_t)$. This part is responsible for the actual measurement of the ancilla by correlating its state with the IDF, i.e., $V_{AM}(\lambda_t) = \sum_k V_{AI(k)}(\lambda_t)$ generates unitaries $U_{AI(k)}$ such that for any $\rho'_{A(k)}$ [we use a primed notation to distinguish it from the state $\rho_{A(k)}$ appearing in Eqs. (5) and (8)]
$$U_{AI(k)}\,\rho'_{A(k)}\otimes|1\rangle\langle 1|_{I(k)}\,U_{AI(k)}^\dagger = \sum_{r_k,r_k'}P(r_k)\rho'_{A(k)}P(r_k')\otimes|r_k\rangle\langle r_k'|_{I(k)}. \tag{13}$$
Thus, if we measure the IDF in state $|r_k\rangle$, the conditional state of the ancilla is $P(r_k)\rho'_{A(k)}P(r_k)$, which eventually gives rise to Eq. (8). Note that the time-dependence of $V_{AI(k)}(\lambda_t)$ is such that the measurement happens after the interaction between the system and the kth ancilla as implemented by Eq. (7), but before the dephasing operation (12).

G. Conditional (feedback) part. So far, the external agent can implement arbitrary control operations $\mathcal{A}_k(r_k)$ at an arbitrary set of discrete times $t_k$. However, in the most general case, the external agent is also allowed to use the available information in the memory to condition the future dynamics after time $t > t_k$ on the so far available measurement results $\mathbf{r}_k$. This is implemented by the last part of Eq. (9), $\sum_{\mathbf{r}_n}H_{SBPA}(\lambda_t,\mathbf{r}_n)|\mathbf{r}_n\rangle\langle\mathbf{r}_n|$, which applies a different 'unconditional' Hamiltonian (6) depending on the state of the memory $|\mathbf{r}_n\rangle\langle\mathbf{r}_n|$. In fact, due to Eq. (11) the evolution from time $t_k^+$ to $t_{k+1}^-$ is given by
$$\rho_{SBPAI}(t_{k+1}^-) = \sum_{\mathbf{r}_k}U_{SBPA}(\mathbf{r}_k)\,\tilde\rho_{SBPA}(t_k^+,\mathbf{r}_k)\,U_{SBPA}^\dagger(\mathbf{r}_k)\otimes|\mathbf{r}_k\rangle\langle\mathbf{r}_k|,$$
where $U_{SBPA}(\mathbf{r}_k)$ is the unitary generated by $H_{SBPA}(\lambda_t,\mathbf{r}_k)$ during that time interval. Here, we excluded the results $r_\ell$ for $\ell > k$ because we naturally assume that for $t < t_\ell$ the Hamiltonian $H_{SBPA}(\lambda_t,\mathbf{r}_n) = H_{SBPA}(\lambda_t,\mathbf{r}_{\ell-1})$ depends only on the so far obtained measurement results. To conclude, for each measurement trajectory $\mathbf{r}_k$ we can apply a different Hamiltonian affecting any possible part of Eq. (6), hence allowing full control over the system and the ancillas. If we do not perform feedback, then $H_{SBPA}(\lambda_t,\mathbf{r}_n) = H_{SBPA}(\lambda_t)$ for all $\mathbf{r}_n$. Note that we could even change the time of the measurements during the experiment by conditioning the memory Hamiltonian $H_M(\lambda_t)$ on previous measurement results too. For ease of presentation we refrained from writing down the most general case. Finally, we remark that the present construction can be seen as a general form of coherent feedback control [56][57][58]. It was already used to study the thermodynamics of feedback control in Refs. [22,52].
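The zero-cost dephasing of the memory described in Point E can be illustrated with a small numerical sketch. It assumes one particular realization, a controlled cyclic shift between a three-level IDF register and a three-level NIDF entity prepared in the maximally mixed state; this unitary is an illustrative choice and not necessarily the one intended in the text. Since both Hamiltonians are taken to be proportional to the identity, any unitary commutes with them, so only the dephasing property is checked.

```python
import math

D = 3  # hypothetical number of measurement results stored in one register

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    """Kronecker product; the first factor is the IDF, the second the NIDF."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def controlled_shift(d):
    """U = sum_r |r><r|_I (x) S^r with the cyclic shift S|n> = |n+1 mod d>."""
    u = [[0.0] * (d * d) for _ in range(d * d)]
    for r in range(d):
        for n in range(d):
            u[r * d + (n + r) % d][r * d + n] = 1.0
    return u

def trace_out_nidf(rho, d):
    return [[sum(rho[r * d + m][q * d + m] for m in range(d))
             for q in range(d)] for r in range(d)]

def trace_out_idf(rho, d):
    return [[sum(rho[r * d + m][r * d + q] for r in range(d))
             for q in range(d)] for m in range(d)]

# IDF in a pure superposition (maximal coherence between the results).
amplitudes = [math.sqrt(0.5), math.sqrt(0.3), math.sqrt(0.2)]
rho_i = [[a * b for b in amplitudes] for a in amplitudes]
# NIDF in the maximally mixed state, i.e. a Gibbs state for degenerate energies.
rho_n = [[1.0 / D if i == j else 0.0 for j in range(D)] for i in range(D)]

u_in = controlled_shift(D)
joint = matmul(matmul(u_in, kron(rho_i, rho_n)), dagger(u_in))
rho_i_after = trace_out_nidf(joint, D)
rho_n_after = trace_out_idf(joint, D)

print(rho_i_after)
```

The populations of the register are unchanged while all coherences vanish, and the NIDF entity stays maximally mixed.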
We repeat that the temporal order of the dynamics is essential (see also Fig. 2): the preparation happens before the actual control operation (the system-ancilla interaction), which happens before the measurement of the ancilla, which happens before the final dephasing of the memory. Apart from this order, the time-dependence of all interactions is so far arbitrary.
Finally, the time evolution is fully fixed by specifying the global initial state, which reads
$$\rho_{tot}(t_0^-) = \rho_{SB}(t_0^-)\otimes\rho_P(t_0^-)\otimes\rho_A(t_0^-)\otimes\rho_M(t_0^-). \tag{16}$$
Here, $\rho_{SB}(t_0^-)$ is an arbitrary initial system-bath state, which we will need to restrict in Sec. 5, $\rho_P(t_0^-)$ is a suitably chosen initial state of the preparation apparatus, $\rho_A(t_0^-)$ is an arbitrary initial ancilla state, and finally, the initial state of the memory is chosen as $\rho_M(t_0^-) = |\mathbf{1}_n\rangle\langle\mathbf{1}_n|_I\otimes\rho_N(t_0^-)$ with a suitable initial state for the NIDF as discussed above.

Figure 2: Temporal order of the operations in one time step (preparation, control operation, measurement, dephasing). We remark that there is some freedom in how to fix the time $t_k$ when the intervention 'happens'. Here, it is indicated as the time when the system-ancilla interaction takes place, which is well-defined in the limit where this interaction is instantaneous as assumed in Sec. 4. Note that we excluded the feedback loop from Fig. 1 for a simplified graphical presentation only.

Dynamical equivalence with a quantum causal model
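We now show that our autonomous model captures the dynamics of a quantum causal model as described in Sec. 2. For that purpose we need to implement the control operations instantaneously. Ideally, this requires that the interaction between the system and the kth ancilla can be written as
$$V_{SA(k)}(\lambda_t) = \delta(t - t_k)\,v_{SA(k)} \tag{17}$$
with a time-independent operator $v_{SA(k)}$, where $\delta(t - t_k)$ denotes the Dirac delta function. This implements an instantaneous unitary evolution $U_{SA(k)} = \exp[-i v_{SA(k)}/\hbar]$ at time $t_k$. Starting from the initial state (16), the time evolution of the global state can be iteratively constructed via
$$\rho_{tot}(t_{k+1}^-) = \mathcal{U}^{(k)}_{SB}\,\mathcal{U}^{(k)}_{deph}\,\mathcal{U}^{(k)}_{meas}\,\mathcal{U}^{(k)}_{ctrl}\,\mathcal{U}^{(k)}_{prep}\,\rho_{tot}(t_k^-), \tag{18}$$
where $\mathcal{U}^{(k)}_{SB}$ describes the system-bath evolution from $t_k$ to $t_{k+1}$. Here, $\mathcal{U}^{(k)}_{prep}$, $\mathcal{U}^{(k)}_{ctrl}$, $\mathcal{U}^{(k)}_{meas}$ and $\mathcal{U}^{(k)}_{deph}$ denote the operations resulting from the preparation of the kth ancilla, its interaction with the system, its measurement, and the final dephasing of the memory, respectively (see also Fig. 2). While their temporal order is important, it is not necessary that $\mathcal{U}^{(k)}_{prep}$, $\mathcal{U}^{(k)}_{meas}$ and $\mathcal{U}^{(k)}_{deph}$ happen instantaneously before or after the control operation $\mathcal{U}^{(k)}_{ctrl}$ since they commute with $\mathcal{U}^{(k)}_{SB}$. In fact, in an actual experiment delays are unavoidable and preparations and measurements can take a finite time [20,21].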
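After tracing out the NIDF as well as all ancillas that no longer participate in the interaction, which we denote by $A_{out}$, we write Eq. (18) as
$$\rho_{SBPA(k)I}(t_{k+1}^-) = \mathcal{U}^{(k)}_{SB}\,\mathcal{D}_k\,\mathcal{U}^{(k)}_{meas}\,\mathcal{U}^{(k)}_{ctrl}\,\mathcal{U}^{(k)}_{prep}\,\rho_{SBPA(k)I}(t_k^-). \tag{19}$$
Notice that we have replaced $\mathcal{U}^{(k)}_{deph}$ by the dephasing map (12), here denoted $\mathcal{D}_k$, and due to Eq. (11) we have
$$\rho_{SBPA(k)I}(t_k^-) = \sum_{\mathbf{r}_{k-1}}\tilde\rho_{SBPA(k)}(t_k^-,\mathbf{r}_{k-1})\otimes|1,\mathbf{r}_{k-1}\rangle\langle 1,\mathbf{r}_{k-1}|. \tag{20}$$
Here, $|1,\mathbf{r}_{k-1}\rangle$ describes the state of the IDF before the measurement, where the kth register is still set to its standard state '1'. Furthermore, we assumed that only ancilla A(k) is participating in the kth interaction; in principle, more general scenarios are conceivable.$^2$ Now, we use the preparation apparatus P to prepare any ancilla state $\rho_{A(k)}$ we like via $\mathcal{U}^{(k)}_{prep}$. Due to Eq. (9) this preparation procedure is allowed to depend on the previous measurement results $\mathbf{r}_{k-1}$, which we typically suppress in the notation. After tracing out P, we get
$$\mathrm{tr}_P\left\{\mathcal{U}^{(k)}_{prep}\,\rho_{SBPA(k)I}(t_k^-)\right\} = \sum_{\mathbf{r}_{k-1}}\tilde\rho_{SB}(t_k^-,\mathbf{r}_{k-1})\otimes\rho_{A(k)}\otimes|1,\mathbf{r}_{k-1}\rangle\langle 1,\mathbf{r}_{k-1}|. \tag{21}$$
Next, due to Eq. (13) the action of the ancilla measurement reads explicitly
$$\mathcal{U}^{(k)}_{meas}\,\mathcal{U}^{(k)}_{ctrl}\sum_{\mathbf{r}_{k-1}}\tilde\rho_{SB}(t_k^-,\mathbf{r}_{k-1})\otimes\rho_{A(k)}\otimes|1,\mathbf{r}_{k-1}\rangle\langle 1,\mathbf{r}_{k-1}|$$
$$= \sum_{\mathbf{r}_{k-1}}\sum_{r_k,r_k'}P(r_k)\left\{\mathcal{U}^{(k)}_{ctrl}\left[\tilde\rho_{SB}(t_k^-,\mathbf{r}_{k-1})\otimes\rho_{A(k)}\right]\right\}P(r_k')\otimes|r_k,\mathbf{r}_{k-1}\rangle\langle r_k',\mathbf{r}_{k-1}|, \tag{22}$$
where $\mathcal{U}^{(k)}_{ctrl}\rho \equiv U_{SA(k)}\rho U_{SA(k)}^\dagger$ can also be conditioned on all previous measurement results due to Eq. (9). The second line of Eq. (22) describes a giant Schrödinger cat state with respect to the different superpositions of the measurement results $r_k$. This cat is killed by the dephasing operation:
$$\mathcal{D}_k\,\mathcal{U}^{(k)}_{meas}\,\mathcal{U}^{(k)}_{ctrl}\,\mathrm{tr}_P\left\{\mathcal{U}^{(k)}_{prep}\,\rho_{SBPA(k)I}(t_k^-)\right\} = \sum_{\mathbf{r}_k}\mathcal{P}(r_k)\,\mathcal{U}^{(k)}_{ctrl}\left[\tilde\rho_{SB}(t_k^-,\mathbf{r}_{k-1})\otimes\rho_{A(k)}\right]\otimes|\mathbf{r}_k\rangle\langle\mathbf{r}_k|, \tag{23}$$
where we introduced the superoperator $\mathcal{P}(r_k)\rho_{A(k)} \equiv P(r_k)\rho_{A(k)}P(r_k)$ corresponding to the measurement result $r_k$. Equation (23) describes the state of the system, the bath, the kth ancilla, and the IDF of our autonomous black box model at the kth time step. To verify its equivalence with a quantum causal model, we imagine an external 'super-observer' (who has engineered the black box), who reads out the IDF by performing a projective measurement. If the super-observer finds the results $\mathbf{r}_k$, the (non-normalized) conditional state of the bath, system and kth ancilla of the black box is, according to Eq. (23),
$$\tilde\rho_{SBA(k)}(t_k^+,\mathbf{r}_k) = \mathcal{P}(r_k)\,\mathcal{U}^{(k)}_{ctrl}\left[\tilde\rho_{SB}(t_k^-,\mathbf{r}_{k-1})\otimes\rho_{A(k)}\right].$$
After tracing out the bath and the ancilla and using Eq. (8), we are left with
$$\tilde\rho_S(t_k^+,\mathbf{r}_k) = \mathrm{tr}_B\left\{\mathcal{A}_k(r_k)\,\tilde\rho_{SB}(t_k^-,\mathbf{r}_{k-1})\right\}.$$
If we iterate this, we arrive at Eq. (4). This shows that our autonomous setup conditioned on obtaining the measurement results $\mathbf{r}_k$ simulates any quantum causal model as introduced in Sec. 2.

$^2$ For instance, the present framework also allows one to 'recycle' an old ancilla and to let it interact again with the system. This could implement a quantum correlated operation as mentioned at the end of Sec. 2. For ease of presentation we refrain from discussing the most general scenario with all its details.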

Thermodynamic equivalence with the operational framework
In this central section we derive thermodynamic definitions at the 'unmeasured' level for our autonomous black box model (Sec. 5.2) and show that they naturally imply corresponding thermodynamic definitions at the trajectory level, which coincide with the definitions of Refs. [17][18][19] apart from one minor exception (Sec. 5.3). However, we also discuss possible ambiguities at the stochastic level (Sec. 5.4) and reconsider two other choices in the literature in light of our findings (Sec. 5.5). We start with some agreements though.

Agreements
The observer-dependent thermodynamic framework of Refs. [17][18][19] was derived under certain idealized assumptions, which we summarize here:

I. Initial state. The global form of the initial state (16) remains, but we assume that the initial system-bath state is described by a Gibbs ensemble denoted by $\pi$, i.e., $\rho_{SB}(t_0^-) = \pi_{SB}(\lambda_0) \equiv e^{-\beta H_{SB}(\lambda_0)}/Z_{SB}(\lambda_0)$. Note that this is in general a correlated state. We also assume that the NIDF are initially described by a Gibbs state as specified in Point E above. They are initially decorrelated from the rest.
II. Classical and fast memory. The IDF are treated as an ideal classical memory. This implies that the IDF quickly dephase and, for all practically relevant times, are only classically correlated with the system and the ancillas, see Eq. (11). Furthermore, as already specified in Point E above, the IDF are energetically degenerate and the dephasing operation is implemented without energetic cost. Finally, the measurement of the ancilla modeled by the interaction $V_{AI}(\lambda_t)$ is idealized to be infinitely fast, i.e., of the form
$$V_{AI(k)}(\lambda_t) = \delta(t - t_k')\,v_{AI(k)}$$
with a time-independent operator $v_{AI(k)}$, where $t_k' > t_k$ denotes some time after the system-ancilla interaction.
III. Preparation apparatus. In principle, the preparation of the ancillas can have a thermodynamic cost. However, the goal of the repeated interaction framework is to include ancillas in an arbitrary nonequilibrium state into a consistent thermodynamic framework, regardless of how they were prepared [51][52][53]. Consequently, also Refs. [17][18][19] ignored the preparation costs of the ancillas. In our context, it suffices to point out that, at least in principle, it is possible that the preparation has zero thermodynamic cost (for instance, by implementing the preparation reversibly). Since we are not interested in practical realization of our autonomous model, but rather in the theoretical foundations of quantum stochastic thermodynamics, we neglect in the following any discussion about the thermodynamic cost of the preparation and simply assume that it provides us with the desired ancillas.
Finally, in this section we do not assume that the system-ancilla interaction V SA(k) (λ t ) happens instantaneously, but it can instead take a finite time as also considered in Refs. [19,52]. In this sense we are more general here than in Sec. 4. Indeed, we discuss at the end that an instantaneous, delta-like interaction causes a subtle difference in the thermodynamic description.

Thermodynamics at the unmeasured level
Our autonomous setup describes one big 'supersystem' SAI, which consists of the system S, the ancillas A and the IDF I, and which we label for the moment as X = SAI. It is connected to two heat baths: first, the bath B in direct contact with the system S and second, the NIDF N responsible for dephasing the memory. The overall setup can therefore be recast in form of the
The following results are based on two recent advances in strong coupling thermodynamics. First, we use the quantum version [59] of the 'Hamiltonian of mean force' framework [60] (see also Refs. [61][62][63][64][65] for related research in this direction). Then, we combine it with the framework of Refs. [66,67] to take into account the initially decorrelated dephasing bath. A detailed calculation of how to combine the two frameworks can be found in the Supplement of Ref. [19]; we therefore present only its essential elements here.
We start with the conventional definition of mechanical work, which quantifies the global change in internal energy, i.e., $W(t) = \mathrm{tr}\{H_{\rm tot}(\lambda_t)\rho_{\rm tot}(t)\} - \mathrm{tr}\{H_{\rm tot}(\lambda_0)\rho_{\rm tot}(t_0)\}$. Furthermore, by construction the interaction $V_{XN}(\lambda_t)$ caused by the dephasing bath does not have any overall work cost, see Point E above. Therefore, we can identify the total work with the work done on the supersystem X. It can be expressed as an integral over the instantaneous supplied power:
$$W(t) = \int_{t_0}^{t} ds\, \mathrm{tr}_X\left\{\frac{\partial H_X(\lambda_s)}{\partial s}\rho_X(s)\right\}. \tag{26}$$
Note that, whenever it is clear from context, we will suppress the subscript on the trace operation in the following. Next, we turn to the internal energy. To define it, we need the concept of the Hamiltonian of mean force, which is defined via the reduced equilibrium state of a global canonical Gibbs state. Specifically, with respect to an arbitrary system X coupled to the bath B we define
$$\pi_X^*(\lambda_t) \equiv \mathrm{tr}_B\{\pi_{XB}(\lambda_t)\} \equiv \frac{e^{-\beta H_X^*(\lambda_t)}}{Z_X^*(\lambda_t)}, \qquad Z_X^*(\lambda_t) \equiv \frac{Z_{XB}(\lambda_t)}{Z_B}. \tag{27}$$
This implicitly defines the Hamiltonian of mean force $H_X^*$. Note that $\pi_X^* \neq \pi_X$ in general. In addition, $H_X^*$ depends on the inverse temperature $\beta$ and the control parameter $\lambda_t$. Classically, it can be seen as an effective free energy landscape for the system, which is different from the bare energy $H_X$ due to the strong system-bath coupling. For readers unfamiliar with the framework of strong coupling thermodynamics, it might be easier to follow the rest of the paper by replacing the Hamiltonian of mean force $H_X^*$ with the standard Hamiltonian $H_X$, which amounts to assuming a weakly coupled heat bath. In fact, the main contribution of this paper is to provide a recipe to deduce trajectory-dependent thermodynamic definitions from an autonomous picture without explicit measurements. Which thermodynamic definitions one starts with at the unmeasured level is of rather minor relevance here. 
We only choose the strong coupling approach for the sake of generality to make clear that the resulting framework of operational quantum stochastic thermodynamics does not rely on the commonly used weak coupling or Markovian approximations.
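To make the notion of a Hamiltonian of mean force more concrete, the following minimal sketch computes it numerically from the reduced state of a global Gibbs state, using the common normalization in which the effective partition function is the ratio of the global and the bare bath partition functions. The two-qubit model (one qubit for X, one for the "bath" B), the coupling operator, and all parameter values are hypothetical choices made only for illustration; they are not taken from the setup of this paper.

```python
import numpy as np

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

def gibbs(H, beta):
    """Return the Gibbs state exp(-beta*H)/Z and the partition function Z."""
    evals, evecs = np.linalg.eigh(H)
    weights = np.exp(-beta * evals)
    Z = weights.sum()
    rho = (evecs * (weights / Z)) @ evecs.conj().T
    return rho, Z

def trace_out_bath(rho_xb, dim_x, dim_b):
    """Partial trace over the second (bath) tensor factor."""
    return np.trace(rho_xb.reshape(dim_x, dim_b, dim_x, dim_b), axis1=1, axis2=3)

def hamiltonian_of_mean_force(H_x, H_b, V_xb, beta):
    """H* defined by tr_B{pi_XB} = exp(-beta H*)/Z* with Z* = Z_XB / Z_B."""
    dim_x, dim_b = H_x.shape[0], H_b.shape[0]
    H_xb = np.kron(H_x, np.eye(dim_b)) + np.kron(np.eye(dim_x), H_b) + V_xb
    pi_xb, Z_xb = gibbs(H_xb, beta)
    _, Z_b = gibbs(H_b, beta)
    pi_star = trace_out_bath(pi_xb, dim_x, dim_b)
    Z_star = Z_xb / Z_b
    p, vecs = np.linalg.eigh(pi_star)
    # H* = -(1/beta) ln(Z* pi*), evaluated in the eigenbasis of pi*
    return (vecs * (-np.log(Z_star * p) / beta)) @ vecs.conj().T

# Hypothetical toy parameters (illustration only).
beta = 1.0
H_x = 0.5 * 1.0 * sz
H_b = 0.5 * 0.8 * sz

H_star_weak = hamiltonian_of_mean_force(H_x, H_b, 0.0 * np.kron(sx, sx), beta)
H_star_strong = hamiltonian_of_mean_force(H_x, H_b, 0.7 * np.kron(sx, sx), beta)
```

For vanishing coupling the sketch returns the bare Hamiltonian, whereas for finite coupling the effective Hamiltonian differs from it; this is the sense in which replacing the Hamiltonian of mean force by the bare Hamiltonian corresponds to a weakly coupled bath.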
We now define the internal energy of X as
$$\begin{aligned} U_X(t) \equiv{}& \mathrm{tr}\{[H_X^*(\lambda_t) + \beta\partial_\beta H_X^*(\lambda_t)]\rho_X(t)\} \\ &+ \mathrm{tr}\{V_{XN}(\lambda_t)\rho_{XN}(t)\}, \end{aligned} \tag{28}$$
where $\partial_\beta$ denotes a partial derivative with respect to the inverse temperature. The first line coincides with the standard definition within the Hamiltonian of mean force framework [59,60] and describes deviations from the weak coupling definition given by $\mathrm{tr}\{H_X\rho_X(t)\}$. The second line in Eq. (28) needs to be added to take into account the initially decoupled second bath [19,66]. However, this expression can be simplified since the interaction $V_{XN}(\lambda_t) = V_{IN}(\lambda_t)$ responsible for the dephasing of the IDF is expected to act only very shortly after each measurement and hence, for practically all times we can set $\mathrm{tr}\{V_{XN}(\lambda_t)\rho_{XN}(t)\} = 0$.$^{3}$ Hence,
$$U_X(t) = \mathrm{tr}\{[H_X^*(\lambda_t) + \beta\partial_\beta H_X^*(\lambda_t)]\rho_X(t)\}. \tag{29}$$
Because we now have a definition for work and internal energy, this automatically fixes the heat via the first law
$$\Delta U_X(t) = W(t) + Q(t), \tag{30}$$
where $\Delta U_X(t) \equiv U_X(t) - U_X(t_0)$. Let us now turn to the second law. First, we define the thermodynamic entropy of the supersystem X
$$S_X(t) \equiv S_{\rm vN}[\rho_X(t)] + \beta^2\mathrm{tr}\{[\partial_\beta H_X^*(\lambda_t)]\rho_X(t)\}. \tag{31}$$
Here, $S_{\rm vN}(\rho) \equiv -\mathrm{tr}\{\rho\ln\rho\}$ denotes the von Neumann entropy and the second term is again a strong coupling correction [59,60]. Then, the second law of nonequilibrium thermodynamics states that the entropy production $\Sigma$ is always non-negative, which can be expressed as ($k_B \equiv 1$)
$$\Sigma(t) \equiv \beta[W(t) - \Delta F_X(t)] \ge 0. \tag{32}$$
Here, we defined the nonequilibrium free energy
$$F_X(t) \equiv U_X(t) - \frac{S_X(t)}{\beta} = \mathrm{tr}\{[H_X^*(\lambda_t) + \beta^{-1}\ln\rho_X(t)]\rho_X(t)\}. \tag{33}$$
It differs from the conventional weak coupling definition solely by the replacement of $H_X$ with $H_X^*$. The positivity of entropy production follows from monotonicity of relative entropy [68,69] since where $D[\rho\|\sigma] \equiv \mathrm{tr}\{\rho(\ln\rho - \ln\sigma)\}$ denotes the quantum relative entropy. Showing the equivalence of Eqs. (32) and (34) is tedious, but follows only standard steps, see the Supplement of Ref. [19]. We now investigate the definitions above in detail by making extensive use of Eq. (11). First, the work (26) originates from the three time-dependent terms $H_S(\lambda_t)$, $V_{SA}(\lambda_t)$, and $V_{AI}(\lambda_t)$. 
The first two contributions can be written as The third contribution due to $V_{AI}(\lambda_t)$ can be simplified by noting that the ancilla and IDF are isolated during the measurement such that we simply have to add up the changes in the ancilla energies (remember that the IDF are energetically degenerate). Thus, let $\rho_{A(k)}(\mathbf{r}_{k-1})$ denote the state of the kth ancilla after the interaction with the system but before the measurement (which can depend on $\mathbf{r}_{k-1}$) and let $\rho_{A(k)}(\mathbf{r}_k)$ denote its state after the measurement conditioned on finding the IDF in state $|\mathbf{r}_k\rangle$. Then, if we split the work $W_{AI}(t) = \sum_k W_{AI(k)}(t)$ into its contributions due to the kth control step, we find that This equation is derived in detail in Sec. 5.4. Next, we turn to the internal energy and first notice that the Hamiltonian of mean force can be simplified to This follows from the facts that the IDF are energetically degenerate and that the measurement of the ancilla happens after the interaction with the system. That is, at any given time the kth ancilla is either in contact with the system [and then $V_{A(k)I}(\lambda_t) = 0$] or not, in which case $H_{A(k)} + V_{A(k)I}(\lambda_t)$ commutes with the rest of the Hamiltonian. Since there is also at most one ancilla in contact with the system at a given time (say again the kth ancilla), we can also conclude that $H_{SA}^*(\lambda_t, \mathbf{r}_n) = H_{SA(k)}^*(\lambda_t, \mathbf{r}_n) + \sum_{i\neq k} H_{A(i)}$. In the case of a causal model as considered in Secs. 2 and 4 (described by an instantaneous system-ancilla interaction) we can even set $H_{SA}^*(\lambda_t, \mathbf{r}_n) = H_S^*(\lambda_t, \mathbf{r}_n) + \sum_k H_{A(k)}$. The splitting (38) together with Eq. (11) implies for the internal energy [denoting $H_{SA}^* = H_{SA}^*(\lambda_t, \mathbf{r}_n)$ for simplicity] Similarly to the term $V_{XN}(\lambda_t)$ in Eq. (28), we have also here neglected the interaction term $V_{AI}(\lambda_t)$: it describes a very fast process, whose temporal resolution is unimportant for us, i.e., for most times $V_{AI}(\lambda_t) = 0$, see Point II above in Sec. 5.1. 
The energetic change due to the measurement is nevertheless fully captured by the work (37). Finally, we look at the definition of entropy, Eq. (31). Due to Eqs. (11) and (38) this can be written as Similarly, the nonequilibrium free energy (33) becomes
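The statement that the entropy production can be written both in terms of work and nonequilibrium free energy and in terms of relative entropies can be checked numerically in a stripped-down setting. The sketch below assumes a single bath only (no dephasing bath N, no ancillas or memory), an initial global Gibbs state, and a sudden quench of the control parameter followed by unitary evolution. Under these assumptions a short calculation gives $\beta(W - \Delta F_X) = D[\rho_{XB}(t)\|\pi_{XB}(\lambda_t)] - D[\rho_X(t)\|\pi_X^*(\lambda_t)]$, which is non-negative by monotonicity of relative entropy under the partial trace. The two-qubit model and all parameter values are hypothetical.

```python
import numpy as np

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
OMEGA_B, G = 0.8, 0.7   # hypothetical bath splitting and coupling strength

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigenbasis."""
    w, v = np.linalg.eigh(A)
    return (v * f(w)) @ v.conj().T

def gibbs(H, beta):
    """Return the Gibbs state of H and its partition function."""
    rho = herm_fun(H, lambda w: np.exp(-beta * w))
    Z = np.trace(rho).real
    return rho / Z, Z

def ptrace_bath(rho):
    """Trace out the bath qubit (second tensor factor) of a two-qubit state."""
    return np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

def rel_entropy(rho, sigma):
    """D[rho||sigma] = tr{rho (ln rho - ln sigma)} for full-rank states."""
    return np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real

def total_hamiltonian(lam):
    """System qubit with splitting lam, bath qubit, and an x-x coupling."""
    return (0.5 * lam * np.kron(sz, I2) + 0.5 * OMEGA_B * np.kron(I2, sz)
            + G * np.kron(sx, sx))

def free_energy(rho_x, pi_star, Z_star, beta):
    """F_X = tr{H* rho} + tr{rho ln rho}/beta = (-ln Z* + D[rho||pi*]) / beta."""
    return (-np.log(Z_star) + rel_entropy(rho_x, pi_star)) / beta

beta, lam0, lam1, t = 1.0, 1.0, 2.5, 1.3
_, Z_b = gibbs(0.5 * OMEGA_B * sz, beta)
H0, H1 = total_hamiltonian(lam0), total_hamiltonian(lam1)
pi0, Z0 = gibbs(H0, beta)
pi1, Z1 = gibbs(H1, beta)

# Sudden quench lam0 -> lam1 at the initial time, then unitary evolution under H1.
U = herm_fun(H1, lambda w: np.exp(-1j * w * t))
rho_t = U @ pi0 @ U.conj().T

# All work is done during the quench; afterwards the total energy is conserved.
work = np.trace((H1 - H0) @ pi0).real
F_initial = free_energy(ptrace_bath(pi0), ptrace_bath(pi0), Z0 / Z_b, beta)
F_final = free_energy(ptrace_bath(rho_t), ptrace_bath(pi1), Z1 / Z_b, beta)
sigma = beta * (work - (F_final - F_initial))

sigma_from_relative_entropies = (rel_entropy(rho_t, pi1)
                                 - rel_entropy(ptrace_bath(rho_t), ptrace_bath(pi1)))
```

Both expressions for the entropy production agree in this toy model, and the result is non-negative although the system-bath coupling is not weak.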

Conditional thermodynamics: the canonical choice
Let us repeat our philosophy so far: We started with a quantum causal model and constructed an autonomous model, which simulates it. The unitary dilation theorem (5) as well as its extension (8) to nondeterministic interventions naturally forced us to introduce a stream of ancillas and a classical memory into the picture. Then, we studied the thermodynamics of the isolated autonomous model by combining recently developed tools in strong coupling thermodynamics [19,59,60,66,67] and simplified the resulting expression as much as possible. Now, we imagine the same situation as in Sec. 4 where an external super-observer measures the memory and obtains outcome $\mathbf{r}_n$. What is the internal energy and system entropy as well as the work supplied and the heat flow conditioned on this outcome? Above, we already wrote down all thermodynamic quantities in a suggestive way as an ensemble average over $\mathbf{r}_n$ via
$$X(t) = \sum_{\mathbf{r}_n} p(\mathbf{r}_n)\, x(\mathbf{r}_n, t), \tag{42}$$
where X is a placeholder for W, U, Q and S. Therefore, to get the right thermodynamic quantity $X(t)$ on average, $x(\mathbf{r}_n, t)$ represents its stochastic counterpart (denoted by a lowercase letter as is customary in stochastic thermodynamics). For instance, the stochastic work at the trajectory level follows from Eqs. (35), (36) and (37) as Likewise, the internal energy (39) and heat (30) at the trajectory level become where $w(t, \mathbf{r}_n) = w_S(t, \mathbf{r}_n) + w_{SA}(t, \mathbf{r}_n) + w_{AI}(t, \mathbf{r}_n)$. Finally, the entropy and nonequilibrium free energy follow from Eqs. (40) and (41): These quantities, which were derived from an inclusive, Hamiltonian approach, can now be compared with the proposed definitions in Refs. [17,19] (Ref. [18] deals with the classical counterpart). To compare them, one has to keep in mind that the definitions in Ref. [17] were proposed for the weak coupling regime. This implies $H_X^* = H_X$ and in particular $\partial_\beta H_X^* = 0$. Furthermore, the ancillas were called 'units' in Refs. [17,19]. 
The kth unit was denoted by U (k) and the entire string of units was denoted U (n) instead of A.
Apart from one minor exception, all definitions coincide. Therefore, the question raised in Ref. [17] "whether there exist good a priori arguments" (in contrast to the many a posteriori justifications given in Refs. [17][18][19]) to justify the definitions used in operational quantum stochastic thermodynamics can unequivocally be answered with "Yes!" The exception concerns Eq. (45), which was previously interpreted as a heat exchange of the ancilla during the control operation, see, e.g., Eq. (31) in Ref. [17] or Eq. (13) in the Supplement of Ref. [19]. Within our autonomous approach we now recognize it as a work cost, see also below for more details. Interestingly, somewhat anticipating this case, this term was already excluded from the second law in Refs. [17,19]. Therefore, no major conclusion has to be changed apart from relabeling one term as work instead of heat. In fact, this term is typically of minor relevance as it vanishes, for instance, if the ancillas are energetically neutral or if the final measurement of them happens in their energy eigenbasis as in Refs. [20,21].

Ambiguities in stochastic work and heat
We first catch up on the promised derivation of Eq. (37) by focusing on the measurement of the kth ancilla. During that measurement, described by the interaction Hamiltonian $V_{AI(k)}(\lambda_t)$, the ancilla A(k) and the IDF are isolated. The change in their internal energy is therefore identical to the work supplied to them, i.e., Here, we used Eq. (11) and that the IDF are energetically degenerate such that we only have to track the change in expectation value of $H_{A(k)}$. Remember that $\rho_{A(k)} = \rho_{A(k)}(\mathbf{r}_{k-1})$ denotes the state of the kth ancilla after the interaction with the system, which can depend on $\mathbf{r}_{k-1}$. Next, we use Eq. (13) and take the trace over the IDF to infer that Notice that $P(r_k)\rho_{A(k)}P(r_k) = P(r_k)\rho_{A(k)}(\mathbf{r}_{k-1})P(r_k)$ is a non-normalized state and its norm is the probability $p(r_k|\mathbf{r}_{k-1})$ to obtain result $r_k$ given the previous results $\mathbf{r}_{k-1}$. Thus, by writing
$$\rho_{A(k)}(\mathbf{r}_k) \equiv \frac{P(r_k)\rho_{A(k)}(\mathbf{r}_{k-1})P(r_k)}{p(r_k|\mathbf{r}_{k-1})}$$
for the normalized state of the kth ancilla after the measurement conditioned on $\mathbf{r}_k$, we obtain Here, we also used the elementary rules of probability theory $p(\mathbf{r}_k) = p(r_k|\mathbf{r}_{k-1})p(\mathbf{r}_{k-1})$ and $\sum_{r_k} p(\mathbf{r}_k) = p(\mathbf{r}_{k-1})$. This concludes the derivation of Eq. (37). Consequently, the stochastic work (45) was identified with the term following $p(\mathbf{r}_k)$ in Eq. (37) and the heat (47) is indirectly defined via the first law.
We are now in a position to see the origin of the ambiguity in assigning heat and work at the trajectory level. Imagine we start with Eq. (50) again, but we express it as where $H_{SA} = H_S(\lambda_t) + H_A$ is the Hamiltonian of the system and all ancillas. This is possible since the operation $U^{(k)}_{\rm meas}$ acts non-trivially only on the kth ancilla and the IDF and hence, the expectation value remains unchanged when including additional degrees of freedom. If we then follow the same steps as above, we end up with Since this expression is still correct, it allows us to confirm by comparison with Eq. (50) that the average work injected into the system or the remaining ancillas is zero as expected. However, if we now follow the strategy $X(t) = \sum_{\mathbf{r}_n} p(\mathbf{r}_n)x(\mathbf{r}_n, t)$ to identify the stochastic work, we obtain the definition Now, the stochastic work injected into the system or the remaining ancillas is not zero since our state of knowledge about those entities changes when receiving the measurement result $r_k$. Hence, if we sum this over all measurements k, we do not get back Eq. (45). Consequently, via the first law we also get a different expression for the stochastic heat (47). Note that this ambiguity of assigning stochastic heat and work only happens during the measurement step of the ancilla, i.e., Eqs. (43) and (44) remain unchanged, and it also does not affect the definitions of state functions such as stochastic internal energy or entropy.
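The ambiguity can be illustrated with a minimal numerical sketch for a single measurement of one ancilla. It compares the canonical assignment, which only tracks the change of the ancilla energy, with the alternative assignment, which tracks the change of the joint system-ancilla energy. The two-qubit state, the measurement basis, and the energy gaps are hypothetical choices; the only structural assumptions are that the system and ancilla are correlated and do not interact at the time of the measurement.

```python
import numpy as np

omega_s, omega_a = 1.0, 0.6                 # hypothetical energy gaps
H_s = np.diag([0.0, omega_s])
H_a = np.diag([0.0, omega_a])
I2 = np.eye(2)
H_sa = np.kron(H_s, I2) + np.kron(I2, H_a)  # no system-ancilla interaction

# Correlated system-ancilla state before the measurement (hypothetical choice).
theta = 0.4
psi = (np.cos(theta) * np.kron([1.0, 0.0], [1.0, 0.0])
       + np.sin(theta) * np.kron([0.0, 1.0], [0.0, 1.0]))
rho_sa = np.outer(psi, psi)

def ptrace_system(rho):
    """Reduced ancilla state: trace out the first (system) tensor factor."""
    return np.trace(rho.reshape(2, 2, 2, 2), axis1=0, axis2=2)

# Projective measurement of the ancilla in a rotated (non-energy) basis.
phi = 0.9
basis = [np.array([np.cos(phi), np.sin(phi)]),
         np.array([-np.sin(phi), np.cos(phi)])]

rho_a = ptrace_system(rho_sa)
probs, w_canonical, w_alternative = [], [], []
for b in basis:
    P = np.kron(I2, np.outer(b, b))         # projector acting on the ancilla only
    unnormalized = P @ rho_sa @ P
    p = np.trace(unnormalized).real         # probability of this outcome
    rho_sa_r = unnormalized / p             # conditional state given the outcome
    probs.append(p)
    # Canonical choice: change of the ancilla energy only.
    w_canonical.append(np.trace(H_a @ (ptrace_system(rho_sa_r) - rho_a)).real)
    # Alternative choice: change of the joint system-ancilla energy.
    w_alternative.append(np.trace(H_sa @ (rho_sa_r - rho_sa)).real)

probs = np.array(probs)
w_canonical = np.array(w_canonical)
w_alternative = np.array(w_alternative)
avg_canonical = probs @ w_canonical
avg_alternative = probs @ w_alternative
```

In this sketch both assignments yield the same average work, but they differ outcome by outcome because conditioning on the ancilla result also updates the state of knowledge about the correlated system.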

Comparison with other choices in the literature
Together with the section above we are now in a position to reconsider other choices in the literature. In particular, the question of how to thermodynamically describe a projective measurement of a quantum system has gained a lot of attention. For that particular class of interventions it is actually superfluous to consider the stream of ancillas and one could directly look at an interaction between the system and the kth IDF to implement a projective measurement as described in Point F of Sec. 3. On the other hand, nothing will change in our conclusions if we keep the ancilla but simply assume that it is energetically degenerate, i.e., $H_A \sim 1_A$ for the rest of this section.
We start with the two-point projective measurement scheme [33,34], which is a theoretically successful approach to derive quantum fluctuation theorems. In this scheme, one considers an isolated system subjected to two projective measurements of the energy at the beginning and at the end of the protocol. The difference in the measurement outcomes is interpreted as the stochastic work in this framework. This stochastic work includes two terms. One term is due to changing the system Hamiltonian $H_S(\lambda_t)$ in time, which is fully captured by Eq. (43). The other term interprets the change in energy caused by updating our state of knowledge due to the final projective measurement as work, which corresponds to the alternative choice (55). Adding these two contributions results in Now, we specialize to the two-point projective measurement scheme, where $\mathbf{r}_n = (E_0, E_1)$ only denotes the two results of the initial and final projective measurement. The corresponding eigenstates of the initial and final Hamiltonian are denoted as $|E_0\rangle$ and $|E_1\rangle$, and we identify $\rho_S(E_0, E_1) = |E_1\rangle\langle E_1|$, while $\rho_S(E_0) = U_S(t_1, t_0)|E_0\rangle\langle E_0|U_S^\dagger(t_1, t_0)$ denotes the unitarily evolved system state prior to the final measurement. Since the system is assumed to be isolated here, it follows that Therefore, Eq. (57) reproduces the work statistics of the two-point projective measurement approach. Does this imply that Eq. (55) is the natural choice instead of Eq. (45)? Notice that in this paper we were mainly interested in an open system coupled to a heat bath. Now, suppose we were to follow the ideology of the two-point projective measurement approach and consider the following example. At some initial time $t_0$ we have prepared a two-level system with energy gap $\Omega$ in its excited state, $\rho_S(t_0) = |e\rangle\langle e|$, which then evolves in time while being in contact with a heat bath (which, for the sake of simplicity, is considered to be an ideal weakly coupled Markovian heat bath here). 
Then, we perform at time $t_1 > t_0$ a measurement of its energy and find it in the ground state $|g\rangle$. If we do not drive the system ($\lambda_t$ = constant), its change in internal energy is simply
$$\Delta u = \mathrm{tr}\{H_S(|g\rangle\langle g| - |e\rangle\langle e|)\} = -\Omega. \tag{58}$$
Clearly, a natural interpretation of this situation would suggest identifying $\Delta u$ with the heat exchanged with the bath, which induced at some unknown time $t \in (t_0, t_1)$ a jump from the excited to the ground state. Instead, the two-point projective measurement approach would identify parts of $\Delta u$ as work, namely the part of the energy change caused by a change of its state from $\rho_S(t_1)$ (the state prior to the measurement at $t_1$) to $|g\rangle\langle g|$ (the post-measurement state), cf. Eq. (55). For open quantum systems, the two-point projective measurement approach therefore does not reproduce our classical intuition about heat exchanges induced by stochastic transitions from one state to another, which are revealed by updating our state of knowledge. In fact, one can show that the canonical choice of Sec. 5.3 reduces to the conventional definitions used in classical stochastic thermodynamics [2,4,5] when considering ideal continuous measurements of an open classical system [17]. Furthermore, note that Eq. (57) excludes the energetic cost of the first measurement yielding result $E_0$. However, if the system was in weak contact with a heat bath prior to the measurement and isolated only afterwards, Eq. (57) is the correct stochastic work if one adopts the convention that Eq. (45) is the correct choice for open quantum systems. An opposite interpretation to the two-point projective measurement approach was suggested in Ref. [35], where the change in energy of an isolated system due to a projective measurement of an arbitrary observable was identified as heat. This heat does not appear in any second law and it was called "quantum heat". While we see that our canonical choice in Sec. 
5.3 allows one to identify parts of the changes in energy due to a projective measurement as heat, on average it predicts that any change in energy due to a measurement is due to work, which follows from Eq. (54). This average, derived within our inclusive, Hamiltonian approach, agrees with the two-point projective measurement approach and coincides with the "switching work" known from the repeated interaction framework [70], see also Ref. [52]. Therefore, the concept of "quantum heat", at least as originally introduced in Ref. [35], does not have any theoretical foundation within our autonomous approach.
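For readers who want to see the two-point projective measurement scheme at work, the following minimal sketch builds its work statistics for an isolated, driven qubit that starts in a Gibbs state, and checks the Jarzynski equality, which holds for any unitary evolution in this scheme. The Hamiltonian, the linear driving protocol, and the piecewise-constant time discretization are hypothetical choices made for illustration only.

```python
import numpy as np

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

def hamiltonian(lam):
    """Hypothetical driven qubit: H_S(lam) = 0.5*sz + lam*sx."""
    return 0.5 * sz + lam * sx

def propagator(lam_values, dt):
    """Ordered product of short-time propagators for a piecewise-constant protocol."""
    U = np.eye(2, dtype=complex)
    for lam in lam_values:
        w, v = np.linalg.eigh(hamiltonian(lam))
        U = (v * np.exp(-1j * w * dt)) @ v.conj().T @ U
    return U

beta = 1.0
lam_values = np.linspace(0.0, 1.5, 200)
U = propagator(lam_values, dt=0.01)

# Spectra and eigenbases of the initial and final Hamiltonian.
E0, V0 = np.linalg.eigh(hamiltonian(lam_values[0]))
E1, V1 = np.linalg.eigh(hamiltonian(lam_values[-1]))

p_initial = np.exp(-beta * E0) / np.exp(-beta * E0).sum()   # first measurement
p_transition = np.abs(V1.conj().T @ U @ V0) ** 2            # [n, m] = |<E1_n|U|E0_m>|^2

# Work values w = E1_n - E0_m with joint probability p(E0_m) p(E1_n | E0_m).
work = E1[:, None] - E0[None, :]
joint = p_transition * p_initial[None, :]

jarzynski_lhs = (joint * np.exp(-beta * work)).sum()
jarzynski_rhs = np.exp(-beta * E1).sum() / np.exp(-beta * E0).sum()
mean_work = (joint * work).sum()
delta_F = -np.log(jarzynski_rhs) / beta
```

The exponential average of the work equals the ratio of the final and initial partition functions, and the mean work is bounded from below by the equilibrium free energy difference. Note that this check concerns an isolated system only; it does not address the open-system example discussed above.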

Final remarks
The main message of this paper is a very positive one. After 20 years of debate, the present paper shows that there exists a straightforward way to derive the definitions of quantum stochastic thermodynamics by starting from unambiguous notions at the unmeasured level. A certain amount of freedom in defining heat and work at the stochastic level remains, but additional arguments can be invoked in favour of one or the other. In particular, the most consistent choice might depend on whether the considered system is open or isolated. That this can give rise to different thermodynamic definitions should not be too surprising as the same is true in classical thermodynamics. Furthermore, the resulting definitions turn out to be surprisingly simple and mostly follow from what has long been known at the unmeasured level if one correctly takes into account the measurement results $\mathbf{r}_n$. This basically means that one has to replace $\rho_S(t)$ by the correct state of knowledge $\rho_S(t, \mathbf{r}_n)$ to compute, e.g., the stochastic work or internal energy.
Furthermore, it cannot be overemphasized that the operational framework of quantum stochastic thermodynamics equips a large class of quantum causal models with a consistent thermodynamic interpretation, even along a single trajectory. The main physical assumption is an initially equilibrated system-bath state; the remaining assumptions listed at the end of Sec. 2 are of rather minor relevance for current practical purposes in quantum thermodynamics. In particular, the present paper shows that the strong coupling definitions even hold in the case of real-time feedback control, which could not be established in Ref. [19]. Thus, operational quantum stochastic thermodynamics opens up the possibility to analyse the thermodynamics of almost every quantum experiment, even beyond average quantities, and its thermodynamic consistency is guaranteed by virtue of the results reported here.
There is one caveat, however, which is not linked to the framework of operational quantum stochastic thermodynamics per se but rather to the limit in which a quantum causal model or quantum stochastic process is defined. As long as the system-ancilla interaction is not instantaneous, a clear advantage of operational quantum stochastic thermodynamics is that it allows one to define thermodynamic quantities, even along a single trajectory, solely in terms of experimentally available information. Everything can be computed based on knowledge of the conditional system-ancilla state $\rho_{SA}(t, \mathbf{r}_n)$ given a trajectory of measurement results $\mathbf{r}_n$. In this sense, the theory is fully 'operational'. But, quite ironically, this is no longer true in the peculiar limit where the system-ancilla interaction $V_{SA(k)}(\lambda_t)$ is idealized as a delta-peak [see Eq. (17)]. This implements a unitary $U^{(k)}_{\rm ctrl}$ on the system-ancilla space, whose energetic change is work. But if the system-bath coupling $V_{SB}$ is not negligible, the work $W_{SA}(t_k)$ invested in the kth control operation becomes
$$W_{SA}(t_k) = \mathrm{tr}\{H_{\rm tot}(\lambda_k)(\mathcal{U}^{(k)}_{\rm ctrl} - \mathcal{I})\rho_{\rm tot}(t_k)\}. \tag{59}$$
This shows that one eventually has to evaluate the term $\mathrm{tr}_{SBA}\{V_{SB}(\mathcal{U}^{(k)}_{\rm ctrl} - \mathcal{I})\rho_{SBA}(t_k)\}$, which requires explicit knowledge about the bath degrees of freedom, whereas for any smooth, non-singular time-dependence of $V_{SA(k)}(\lambda_t)$ this is never necessary, see Eq. (44). Thus, beyond the weak coupling regime, the strict limit of a quantum causal model makes the operational approach no longer fully operational. However, at least for typical open quantum systems linearly coupled to a quadratic bath, Eq. (59) can still be computed efficiently using reaction coordinate master equations as explicitly demonstrated in, e.g., Refs. [27,71,72].
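The caveat can be made tangible with a small numerical sketch of an instantaneous system-ancilla kick. It splits the energy change caused by the kick into a part that only involves the bare system and ancilla Hamiltonians and a part that involves the system-bath coupling. The three-qubit model (system, one bath qubit, one ancilla), the form of the kick, the thermal ancilla state, and all parameter values are hypothetical and serve only to show that the coupling contribution is generally nonzero when the coupling is not weak.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

def kron3(a, b, c):
    """Tensor product in the order system (x) bath (x) ancilla."""
    return np.kron(np.kron(a, b), c)

def gibbs(H, beta):
    """Gibbs state exp(-beta*H)/Z of a Hermitian matrix H."""
    w, v = np.linalg.eigh(H)
    p = np.exp(-beta * w)
    return (v * (p / p.sum())) @ v.conj().T

def kick_work_terms(g, chi=0.6, beta=1.0):
    """Energy changes caused by a delta-like system-ancilla kick (toy model)."""
    H_s, H_b, H_a = 0.5 * sz, 0.4 * sz, 0.3 * sz
    H_sb = np.kron(H_s, I2) + np.kron(I2, H_b) + g * np.kron(sx, sx)
    # Correlated system-bath Gibbs state times an ancilla at a different temperature.
    rho = np.kron(gibbs(H_sb, beta), gibbs(H_a, 2.0))
    # Kick U = exp(-i*chi*G) with G = sy (x) 1 (x) sy; closed form since G @ G = 1.
    G = kron3(sy, I2, sy)
    U = np.cos(chi) * np.eye(8) - 1j * np.sin(chi) * G
    delta = U @ rho @ U.conj().T - rho
    local = np.trace((kron3(H_s, I2, I2) + kron3(I2, I2, H_a)) @ delta).real
    coupling = np.trace(g * kron3(sx, sx, I2) @ delta).real
    bath = np.trace(kron3(I2, H_b, I2) @ delta).real
    H_tot = (kron3(H_s, I2, I2) + kron3(I2, H_b, I2)
             + g * kron3(sx, sx, I2) + kron3(I2, I2, H_a))
    total = np.trace(H_tot @ delta).real
    return {"total": total, "local": local, "coupling": coupling, "bath": bath}

strong = kick_work_terms(g=0.7)
weak = kick_work_terms(g=0.0)
```

In this toy model the bare bath energy is unaffected by the kick, but the coupling energy changes by a finite amount whenever the coupling strength is nonzero, so the total work cannot be obtained from the reduced system-ancilla state alone.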