Probability in many-worlds theories

We consider how to define a natural probability distribution over worlds within a simple class of deterministic many-worlds theories. This can help us understand the typical properties of worlds within such states, and hence explain the empirical success of quantum theory within a many-worlds framework. We give three reasonable axioms which lead to the Born rule in the case of quantum theory, and also yield natural results in other cases, including a many-worlds variant of classical stochastic dynamics.


Introduction
Despite the amazing empirical success of quantum theory, its implications for the nature of reality remain controversial. From a realist perspective, the key issue is to replace the measurement postulates of textbook quantum theory with a more objective and well-defined structure. Supplementing the theory with hidden variables [1,2], or including spontaneous collapse laws [3,4,5] are two possible approaches. However, arguably the simplest approach, initially proposed by Everett [6], is to drop the measurement postulates altogether. The theory then simply describes a vector in Hilbert space undergoing unitary evolution. Considering the kinds of unitary interaction that constitute measurements, we find that they can be described by a branching of the initial state into a superposition of 'worlds' each containing a different measurement result. This is the many-worlds interpretation of quantum theory [6,7]. Observers in each world will see a distinct result, hence on a qualitative level this is consistent with our experiences. However, a key challenge is to recover the probabilistic predictions of quantum theory within this deterministic many-worlds setting. In particular, we want to explain why the observed relative frequencies of quantum results in our world, for a huge variety of different scenarios, are very close to those predicted by the Born rule [8] (i.e. relative frequency ≈ $|\text{quantum amplitude}|^2$).
The approach we pursue here is to argue for a 'natural' way to pick one world at random from a many-worlds state, i.e. a natural probability distribution over worlds. We can then consider which worlds are typical with respect to this probability distribution. If worlds containing relative frequencies in quantum experiments consistent with the Born rule are typical, this would offer an explanation for our observations: that we are living in a typical world, or are a typical instance of ourself. A probability distribution over worlds can also be interpreted as a measure, allowing one to discuss properties of 'most' or 'almost all' worlds. The theory predicts that atypical worlds also exist, in which other instances of ourself have seen strange results, and perhaps do not believe quantum theory. However, this seems comparable to the situation in standard quantum theory, where we explain the observed measurement results by noting that they are typical with respect to the Born rule, and concede that it is possible we could have obtained very strange results instead, and perhaps arrived at a different theory.
Anthony J. Short: tony.short@bristol.ac.uk
We identify three reasonable axioms, not specific to quantum theory, which we would expect a natural probability distribution over worlds to obey. In the case of quantum theory, we find that the unique probability distribution obeying those axioms is that given by the Born rule. If the axioms we require for naturalness are convincing, this could therefore explain the empirical success of the Born rule.
As the probabilistic axioms are intended to be general in nature, we consider a simple class of many-worlds theories which includes other possibilities in addition to quantum theory. We consider four specific models within this framework, and highlight any differences in the resulting probability distributions, as well as one case in which a natural probability distribution obeying our axioms does not exist.
In order to focus solely on the issue of probability, rather than on precisely how to decompose the state into different worlds (often known as the preferred basis problem), we consider a toy theory in which the set of worlds is given. Intuitively, we can think of these worlds as states which could be understood classically on a macroscopic scale (e.g. tables and pointers have well-defined locations, and observers have well-defined experiences). These states are important in relating the content of the theory to our experiences, and are related to the locality of physical interactions and decoherence [9].
Previous approaches to understanding probability in many-worlds include Everett's original work [6], which also sought to define a natural measure over worlds in order to consider a typical observer, although Everett's assumptions (that the measure depends only on the corresponding amplitude, and is additive when several orthogonal states are relabelled as a single state) are not completely intuitive. Typicality is also considered in a very different approach based on matter density [10]. Vaidman discusses many different approaches to deriving the Born rule in [11]. He has proposed the 'measure of existence of a world' [12] and notes that deriving the Born rule requires additional assumptions, which may be based on symmetries [13] or other natural properties [11]. Zurek [14,15] uses envariance to derive the Born rule (which involves symmetries when the state is decomposed into a system and environment). An alternative approach is based on relative frequencies for infinitely repeated experiments [6,16], but this is challenging to relate to finite experiments. More recently, an approach based on decision theory was proposed by Deutsch and developed by Wallace and others [17,18,19,20,21]. Several of the technical steps in our proofs are similar to those in this decision-theoretic approach. However, this approach has led to criticisms [22,23], and because of its focus on the future rather than the past, it seems that a decision-theoretic picture alone cannot explain why we find ourselves in a world in which the historical record of quantum experiments agrees with the Born rule. Very recently, Saunders has proposed an approach [24] based on dividing the state into equal-amplitude branches based on the decoherence structure and following similar counting arguments to those in statistical mechanics.

A simple class of many-worlds theories
Consider a Hilbert space with a countably infinite orthonormal basis of possible 'worlds' $|n\rangle$ labelled by non-negative integers, $n \in \{0, 1, 2, \ldots\}$. The space may be over the real or complex numbers depending on the particular theory under consideration. A many-worlds state is a vector $|v\rangle$ in this space. We will refer to $v_n = \langle n|v\rangle$ as the amplitude of world $n$. For simplicity, we restrict ourselves to linear many-worlds theories for which the evolution over a finite interval is given by a linear operator $T$, such that
$$|v'\rangle = T|v\rangle.$$
We will denote the matrix elements of $T$ in the basis of worlds by $T_{ij} = \langle i|T|j\rangle$. Our aim will be to derive a natural probability distribution $p_n$ over worlds in the state $|v\rangle$. Similarly, we will denote the probability distribution over worlds in $|v'\rangle$ by $p'_n$. Different theories will be characterised by giving the set of allowed state vectors and transformations in the theory (with the requirement that the allowed transformations must map allowed states into allowed states). The cases we consider here are:
1. Quantum many-worlds theory. The allowed states are all complex vectors satisfying $\langle v|v\rangle = 1$, and the allowed transformations are all unitary operators (for which $T T^\dagger = T^\dagger T = I$, where $I$ is the identity transformation). We will show that in this case $p_n = |v_n|^2$.
2. Unnormalised quantum many-worlds theory. An alternative version of the quantum many-worlds theory with unnormalised states. The allowed states are all complex vectors satisfying $0 < \langle v|v\rangle < \infty$, and the allowed transformations are all unitary operators. We will show that in this case $p_n = |v_n|^2 / \sum_k |v_k|^2$. Note that in this case the probability $p_n$ corresponding to a given world depends on all amplitudes, and not only on $v_n$.
3. Stochastic many-worlds theory. This represents a many-worlds version of a classical probabilistic world. The allowed states are those with real amplitudes satisfying $v_n \ge 0 \ \forall n$ and $\sum_n v_n = 1$, and the allowed transformations are those satisfying $T_{ij} \ge 0 \ \forall i, j$ and $\sum_i T_{ij} = 1 \ \forall j$. Note that although $v_n$ and $T_{ij}$ obey the same mathematical properties as probability distributions and stochastic maps respectively, we are not assuming that they represent probabilities. However, we will show that in this case $p_n = v_n$.
4. Discrete many-worlds theory. A many-worlds theory in which there are an integer number of copies of each world, with the dynamics transforming each world in the original state into a finite number of new worlds. The allowed states are all real vectors for which $v_n$ is a non-negative integer for each $n$ and $\sum_n v_n < \infty$, and the allowed dynamics are those for which $T_{ij}$ is a non-negative integer for all $i, j$ and $\sum_i T_{ij} < \infty \ \forall j$. We will show that in this theory there is no natural probability distribution $p_n$ obeying our axioms.
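The probability rules announced for the first three theories can be written out directly. The following minimal Python sketch is an illustration only: the function names are hypothetical, and states are truncated to finite lists of amplitudes (an assumption, since the theories allow countably many worlds).

```python
from math import isclose


def born_rule(amplitudes):
    """Quantum many-worlds theory: p_n = |v_n|^2 for a normalised state."""
    norm_sq = sum(abs(a) ** 2 for a in amplitudes)
    if not isclose(norm_sq, 1.0, rel_tol=1e-9):
        raise ValueError("state must satisfy <v|v> = 1")
    return [abs(a) ** 2 for a in amplitudes]


def unnormalised_born_rule(amplitudes):
    """Unnormalised quantum theory: p_n = |v_n|^2 / sum_k |v_k|^2."""
    norm_sq = sum(abs(a) ** 2 for a in amplitudes)
    if norm_sq <= 0:
        raise ValueError("state must satisfy 0 < <v|v>")
    return [abs(a) ** 2 / norm_sq for a in amplitudes]


def stochastic_rule(amplitudes):
    """Stochastic many-worlds theory: p_n = v_n."""
    if any(a < 0 for a in amplitudes):
        raise ValueError("amplitudes must be non-negative")
    if not isclose(sum(amplitudes), 1.0, rel_tol=1e-9):
        raise ValueError("amplitudes must sum to 1")
    return list(amplitudes)


# Hypothetical example states (not taken from the paper).
p_quantum = born_rule([0.6, 0.8j])
p_unnormalised = unnormalised_born_rule([3, 4j])
p_stochastic = stochastic_rule([0.25, 0.75])
```

In the unnormalised case every probability depends on the whole state through the normalising sum, which is why rescaling `[0.6, 0.8j]` to `[3, 4j]` leaves the distribution unchanged.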

Natural axioms for a probability measure
For each of the theories considered above, we will investigate whether one can define a probability distribution over worlds $p_n$ obeying the 'reasonable' axioms given below.
1. Present state dependence - $p_n$ depends only on the present state $|v\rangle$, and not on how that state was generated.
2. Weak connection with amplitudes - $v_n = 0$ implies that $p_n = 0$. Hence only worlds with non-zero amplitudes are considered 'real' components of the many-worlds state.
3. Weak connection with transformations - If the set of worlds can be partitioned in such a way that a transformation $T$ acts separately on each part, then $T$ will preserve the total probability of each part. This captures the intuition that probability cannot 'flow' between worlds that are uncoupled by the dynamics. Expressing this more formally, if there exists a partition of the non-negative integers into subsets $S_k$ such that $T_{ij} = 0$ whenever $i$ and $j$ are in different subsets, then $\sum_{n \in S_k} p'_n = \sum_{n \in S_k} p_n$ for all $k$, where $p'_n$ denotes the probabilities for the transformed state $T|v\rangle$.
Of these axioms, 1 and 2 both seem relatively uncontroversial, and are usually assumed without comment in approaches to probability in many-worlds. Axiom 3 is more subtle and powerful, but also seems a natural property in the context of transformations. We will discuss these axioms further at the end of the paper. However, we will first show how they can be used to derive probability measures for the theories given in section 2.
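To make axiom 3 concrete, the sketch below checks it numerically for one candidate rule. It is an illustration under stated assumptions: the helper names are hypothetical, the state is truncated to three worlds, and the candidate rule $p_n = |v_n|^2$ is simply assumed here rather than derived.

```python
from math import sqrt, isclose


def apply(T, v):
    """Return T|v> for a square matrix T (list of rows) and amplitude list v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def acts_separately(T, parts):
    """True if T_ij = 0 whenever worlds i and j lie in different parts."""
    part_of = {n: k for k, part in enumerate(parts) for n in part}
    size = len(T)
    return all(T[i][j] == 0 or part_of[i] == part_of[j]
               for i in range(size) for j in range(size))


def part_totals(p, parts):
    """Total probability assigned to each part of the partition."""
    return [sum(p[n] for n in part) for part in parts]


def born(v):
    """Candidate probability rule p_n = |v_n|^2 (assumed for this check)."""
    return [abs(a) ** 2 for a in v]


# A unitary that rotates worlds {0, 1} among themselves and only adds a
# phase to world 2, so it acts separately on the parts {0, 1} and {2}.
c = 1 / sqrt(2)
T = [[c, -c, 0],
     [c, c, 0],
     [0, 0, 1j]]
parts = [{0, 1}, {2}]

v = [0.6, 0.0, 0.8]
v_new = apply(T, v)

before = part_totals(born(v), parts)
after = part_totals(born(v_new), parts)
```

Within the part {0, 1} the individual probabilities change, but the totals in `before` and `after` agree up to floating-point rounding, which is all that axiom 3 demands.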

Deriving probability measures
For ease of understanding, in this section we describe informally how axioms 1-3 allow us to derive probability rules for the many-worlds theories we consider, highlighting the key steps in the proof. Detailed formal proofs of all of the results can be found in appendix A. We first prove two helpful lemmas which we use in deriving the probability rules, and which apply to bounded states (those which contain only a finite number of worlds with non-zero amplitude).
L1 Equal amplitudes give equal probabilities. For bounded states in all of the theories we consider, if two worlds have equal amplitude then they have the same probability.
To prove this, we first consider a transformation which swaps two worlds, one of which has zero amplitude and the other has non-zero amplitude. Due to axiom 3, the total probability of the two swapped worlds must stay the same, and due to axiom 2 the worlds with zero amplitude must have zero probability. Hence the probabilities are also swapped.
The same argument doesn't apply if we swap two worlds with non-zero amplitudes directly, as probabilities could move between the worlds. However, by performing a sequence of three swaps, each of which involves one world with zero amplitude, we can swap any two worlds and their corresponding amplitudes and probabilities whilst leaving the remaining worlds unchanged. If the amplitudes of the two swapped worlds were initially the same, then the final state will be the same as the initial one, but with the probabilities swapped. From axiom 1 it then follows that the probabilities for these two worlds must be the same.
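The three-swap construction can be traced on a small example. The sketch below is illustrative only: the function names and the particular order of swaps are assumptions (one valid choice), and a state is represented as a finite list of amplitudes with a spare zero-amplitude world.

```python
def swap(state, a, b):
    """Return a copy of the state with worlds a and b exchanged."""
    new_state = list(state)
    new_state[a], new_state[b] = new_state[b], new_state[a]
    return new_state


def three_swap(state, n, m, spare):
    """Exchange worlds n and m using only swaps that involve a zero-amplitude world."""
    if state[spare] != 0:
        raise ValueError("the spare world must start with zero amplitude")
    for a, b in [(n, spare), (n, m), (spare, m)]:
        # Each individual swap must move a zero amplitude, so that axioms 2
        # and 3 together force the probabilities to be swapped as well.
        if state[a] != 0 and state[b] != 0:
            raise ValueError("swap would exchange two non-zero amplitudes")
        state = swap(state, a, b)
    return state


# Worlds 0 and 1 carry amplitude, world 2 is the empty 'working space'.
result = three_swap([0.6, 0.8, 0], 0, 1, 2)
```

Here `result` is `[0.8, 0.6, 0]`: the two amplitudes have been exchanged and the spare world is empty again. When the two amplitudes are equal, the final state coincides with the initial one, which is the situation in which axiom 1 is invoked.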
L2 Larger amplitudes cannot lead to smaller probabilities. For bounded states in all the theories we consider except discrete many-worlds theory, if one world has a larger amplitude than another, its probability must be at least as large.
To prove this, we perform a transformation which branches the higher amplitude world into two final worlds, one of which has the same amplitude as the smaller amplitude world. From axiom 3 the sum of the probabilities in the two branches must be equal to the probability of the initial higher amplitude world. However, one of the branches has the same probability as the smaller amplitude world, hence the probability of the higher amplitude world must be at least as large.
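A branching transformation of this kind can be written down explicitly in the quantum case. The sketch below is one possible construction, not necessarily the one used in the formal proof: the function names are hypothetical, the 2x2 unitary acts on the higher amplitude world and an initially empty world, and the phase convention for the second branch is an arbitrary choice.

```python
from math import sqrt, isclose


def branching_unitary(v_l, v_k):
    """2x2 unitary on (|l>, |N>) that maps the amplitude v_l of |l> to v_k."""
    if abs(v_l) <= abs(v_k):
        raise ValueError("requires |v_l| > |v_k|")
    alpha = complex(v_k) / complex(v_l)
    beta = sqrt(1 - abs(alpha) ** 2)  # real and positive by construction
    return [[alpha, -beta],
            [beta, alpha.conjugate()]]


def branch(v_l, v_k):
    """Amplitudes of |l> and |N> after branching, with |N> initially empty."""
    U = branching_unitary(v_l, v_k)
    return U[0][0] * v_l, U[1][0] * v_l


# Hypothetical example: world l has amplitude 0.8i, world k has amplitude 0.6.
v_l, v_k = 0.8j, 0.6
new_l, new_N = branch(v_l, v_k)

# Squared amplitudes of the two branches add up to that of the original world.
branch_weight = abs(new_l) ** 2 + abs(new_N) ** 2
```

After the transformation world l carries the same amplitude as world k, so L1 applies to that pair, while the remaining weight sits in the new world N.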
Next, we use these results to derive probability rules for our many-worlds theories. We begin by considering a particularly simple class of quantum many-worlds states in which the amplitudes are square roots of rational numbers, $|v\rangle = \sum_{n=0}^{N-1} \sqrt{m_n/M}\, |n\rangle$, where the $m_n$ are positive integers summing to $M$.
Following a branching strategy similar to Deutsch and Wallace [17,20], we perform a unitary transformation in which each world $|n\rangle$ evolves independently of the others into $m_n$ new worlds with equal amplitude. Each world in the final state has the same amplitude $1/\sqrt{M}$ and thus equal probability $1/M$ from L1 above. Applying axiom 3 to the transformation tells us that the probability in each set of branches is conserved, hence we find that the probability of world $|n\rangle$ in the initial state must be $m_n/M$, which is the standard quantum result.
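The counting in this argument can be checked with exact rational arithmetic. The sketch below is illustrative: the function names are hypothetical, and the state is specified by a list of integers `m` whose n-th entry is the numerator $m_n$ of the squared amplitude $m_n/M$, with $M$ the sum of the entries.

```python
from fractions import Fraction


def branch_equally(m):
    """List the final worlds as (parent world, squared amplitude) pairs."""
    M = sum(m)
    branches = []
    for n, m_n in enumerate(m):
        if m_n == 0:
            continue  # zero-amplitude worlds have zero probability (axiom 2)
        # World n has squared amplitude m_n / M and splits into m_n equal parts.
        squared_amplitude = Fraction(m_n, M) / m_n
        branches.extend((n, squared_amplitude) for _ in range(m_n))
    return branches


def initial_probabilities(m):
    """Probabilities of the initial worlds implied by L1 and axiom 3."""
    branches = branch_equally(m)
    if len({weight for _, weight in branches}) != 1:
        raise ValueError("final worlds do not all have equal amplitude")
    # L1: equal amplitudes give equal probabilities, which must sum to one.
    p_final = Fraction(1, len(branches))
    # Axiom 3: each initial world keeps the total probability of its branches.
    return [sum(p_final for parent, _ in branches if parent == n)
            for n in range(len(m))]


# Hypothetical example: squared amplitudes 1/6, 2/6 and 3/6.
probabilities = initial_probabilities([1, 2, 3])
```

For this example the six final worlds each receive probability 1/6, and grouping them by parent gives 1/6, 1/3 and 1/2 for the initial worlds.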
To extend this result to arbitrary initial states in quantum many-worlds theory, we must perform three additional steps. Firstly, if the initial state contains an infinite number of worlds with non-zero amplitudes, we begin by transforming it into a superposition of finitely many worlds in order to create some 'working space' in which to apply the above results. For example, when considering the probability $p_k$ we can perform a unitary transformation that merges all of the worlds with labels greater than $k$ into a single world, without affecting the other worlds (and hence not changing $p_k$ due to axiom 3). For an arbitrary positive integer $M$, we can then write the state as
$$|v\rangle = \sum_{n=0}^{N-1} e^{i\phi_n} \sqrt{\frac{m_n + \epsilon_n}{M}}\, |n\rangle,$$
where $m_n$ is the largest integer less than or equal to $M|v_n|^2$, and $0 \le \epsilon_n < 1$. Next, we eliminate the phase factors $e^{i\phi_n}$ by applying a unitary which acts separately on each world, and therefore does not affect the probabilities. Finally, we perform a branching unitary, in which each initial world $|n\rangle$ is transformed into a superposition of $m_n$ new worlds with equal amplitude, and one new world with smaller amplitude. The final state contains approximately $M$ worlds with equal amplitude $1/\sqrt{M}$ and at most $N$ worlds with smaller amplitude. When $M \gg N$, the smaller amplitude worlds are almost irrelevant (due to L2 above), and the probability associated with world $n$ in the initial state is
$$p_n \approx \frac{m_n}{M} \approx |v_n|^2.$$
By considering arbitrarily large $M$, this argument can be made exact, giving the standard quantum probability rule $p_k = |v_k|^2$.
The derivations of the probability rules in unnormalised quantum theory and stochastic many-worlds theory are very similar. In the former case, the main difference is that the final state contains approximately $MX$ worlds with equal amplitude, where $X = \sum_m |v_m|^2$. This gives $p_n \approx \frac{m_n}{MX} \approx \frac{|v_n|^2}{X}$, and yields the expected quantum probability rule for this case, $p_n = |v_n|^2 / \sum_m |v_m|^2$.
In the case of stochastic many-worlds theory, the steps are identical to those for quantum many-worlds theory, except without phase factors, square roots, and absolute-values-squared.
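The limiting argument for quantum many-worlds theory can be made quantitative with simple bounds. The sketch below reflects one reading of the informal argument rather than the formal proof: if the final state contains $K$ worlds of amplitude $1/\sqrt{M}$, each with probability $q$, and at most $N$ smaller worlds, each with probability at most $q$ by L2, then $1/(K+N) \le q \le 1/K$, and each initial probability lies between $m_n/(K+N)$ and $(m_n+1)/K$. The function name and example values are hypothetical.

```python
from math import floor


def probability_bounds(weights, M):
    """Bounds on each p_n for a bounded normalised state with |v_n|^2 = weights[n]."""
    N = len(weights)
    m = [floor(M * w) for w in weights]
    K = sum(m)  # number of final worlds with amplitude exactly 1/sqrt(M)
    if K == 0:
        raise ValueError("M is too small: no equal-amplitude worlds are produced")
    lower = [m_n / (K + N) for m_n in m]
    upper = [(m_n + 1) / K for m_n in m]
    return list(zip(lower, upper))


# Hypothetical example with irrational-looking weights 1/3 and 2/3.
weights = [1 / 3, 2 / 3]
coarse = probability_bounds(weights, 10)
fine = probability_bounds(weights, 10 ** 6)
```

As `M` grows the interval around each weight shrinks, so the only value compatible with every `M` is $|v_n|^2$.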
It is also important to note that the probability rules derived so far satisfy the axioms for all allowed states and transformations, and not only the ones considered when constructing the proof. For example, if a unitary transformation is block-diagonal in quantum theory, it commutes with the projectors onto each block, and hence conserves the total probability of each block.
Finally, to see that no probability rule obeying our axioms is possible for discrete many-worlds theory, consider a transformation on the state $|0\rangle + |1\rangle$ which takes $|1\rangle \to |1\rangle + |2\rangle$ whilst leaving all other worlds unchanged. Applying L1 to the initial state and using axiom 3 gives $p'_0 = \frac{1}{2}$, while applying L1 to the final state $|0\rangle + |1\rangle + |2\rangle$ gives $p'_0 = \frac{1}{3}$. As this leads to a contradiction, no probability rule obeying the axioms exists (footnote 1).
Footnote 1: A similar argument can be used to rule out a 'naive branch counting' strategy in quantum many-worlds theory, in which each world with non-zero amplitude is assigned equal probability [25].
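The contradiction for discrete many-worlds theory can be reproduced in a few lines. The sketch below is illustrative, with hypothetical helper names and the state truncated to three worlds; the matrix `T` encodes the transformation taking world 1 to worlds 1 and 2 while leaving worlds 0 and 2 unchanged.

```python
from fractions import Fraction


def apply(T, v):
    """Return T|v> for a square integer matrix T and integer amplitude list v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def equal_split(state):
    """L1 for a state whose non-zero amplitudes are all equal (zero elsewhere by axiom 2)."""
    support = [n for n, amplitude in enumerate(state) if amplitude != 0]
    if len({state[n] for n in support}) != 1:
        raise ValueError("non-zero amplitudes must all be equal")
    return [Fraction(1, len(support)) if amplitude != 0 else Fraction(0)
            for amplitude in state]


# Column j of T lists the worlds produced from world j.
T = [[1, 0, 0],
     [0, 1, 0],
     [0, 1, 1]]

initial = [1, 1, 0]
final = apply(T, initial)

# T acts separately on {0}, so axiom 3 carries this value over to the final state.
p0_via_axiom_3 = equal_split(initial)[0]
# L1 applied directly to the final state gives a different value.
p0_via_lemma_1 = equal_split(final)[0]
```

The two values, 1/2 and 1/3, are both supposed to be the probability of world 0 in the same final state, which is the contradiction described above.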

Discussion
In this section we present some additional discussion about the axioms, theories beyond quantum theory, and decoherence in our approach.

Axioms
1. Present state dependence - This is a simplifying axiom, and incorporates the fact that the state at a given time should be sufficient to make any substantive claims about it, including the typical properties of worlds within it. Arguably if historical information is important, it should form part of the state, and our framework should be extended. Furthermore, this axiom allows probabilities to be assigned to worlds in an arbitrary initial state, without needing to know how that state was generated.
2. Weak connection with amplitudes - This seems the most compelling requirement. Without this, one could simply assert that the state is irrelevant, and $p_0 = 1$ in all cases.
3. Weak connection with transformations - This is the most complicated of the three assumptions, but it is hard to see a weaker way of incorporating a dependence on the dynamics of the theory. Without this, we could assign an arbitrary probability distribution over the worlds appearing in every state (for example we could always assign probability 1 to the world with non-zero amplitude having the lowest numerical label). Within quantum theory, this also fits nicely with the continuous time picture in which $T = e^{-iHt}$ for some Hamiltonian $H$, as $T$ will act separately on each part of a partition if $H$ does. Note that in terms of the proofs, we only need this axiom to apply to a specific set of unitaries involving branching, swapping, or merging of worlds.
An alternative to axiom 3, which is a stronger assumption but offers a nice conceptual picture, is:
3'. Weak connection with transformations - For every state $|v\rangle$ and transformation $T$ in the theory, there exists a conditional probability distribution $P_{i|j}$ such that
$$p'_i = \sum_j P_{i|j}\, p_j,$$
satisfying $P_{i|j} = 0$ whenever $T_{ij} = 0$. This ensures that probability can only 'flow' between states which are linked by the dynamics.
The existence of a conditional probability distribution $P_{i|j}$ for each transformation of the state supports the idea that living within an evolving many-worlds state could feel like undergoing a stochastic evolution.
There is also a nice symmetry between the claim that $T_{ij} = 0 \implies P_{i|j} = 0$ and $v_n = 0 \implies p_n = 0$. Axiom 3' is strictly stronger than the original axiom 3 as it implies it but is not implied by it. In particular, when the worlds can be partitioned into subsets on which $T$ acts separately, the conditional probability distribution $P_{i|j}$ can only redistribute probability within these subsets, and hence the total probability of each subset is preserved. However, one can imagine a trivial theory with only one allowed state $|v\rangle = |1\rangle + |2\rangle$, and one allowed transformation $T = |1\rangle\langle 2| + |2\rangle\langle 1|$. Under axiom 3 we could assign an arbitrary probability $p_1$ for this state, but with axiom 3' we must have $p_1 = p_2 = \frac{1}{2}$. Because axiom 3' is stronger than axiom 3, we can derive all of the same results as before (and Lemma 1 now follows directly by permuting the two worlds in question). However, a more subtle point is whether the probability rules derived previously actually satisfy axiom 3' in all cases. For example, for all states $|v\rangle$ and unitary transformations $T$ in quantum many-worlds theory with $p_n = |v_n|^2$, can we always find a conditional probability distribution $P_{i|j}$ satisfying the requirements of axiom 3'? This is by no means trivial, but it is a consequence of results in [26,27,28] (in particular the Flow or Schrödinger Theory presented in [27]) that this is indeed the case. In general, for a given $|v\rangle$ and $T$ within quantum theory, there may be many different conditional probability distributions $P_{i|j}$ satisfying axiom 3'. If conceptual importance is to be given to this quantity, it may therefore be useful to consider adding additional axioms which specify it uniquely, or properties which are independent of this choice. Similar results will apply for unnormalised quantum theory, and in the case of stochastic many-worlds theory we can simply take $P_{i|j} = T_{ij}$.
Overall, it is interesting that one can overlay a stochastic evolution onto a many-worlds state with natural properties, and that doing so is helpful in deriving the Born rule.
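The requirements of axiom 3' can be checked mechanically for small examples. The sketch below is an illustration under an explicit assumption: the condition linking the probabilities before and after the transformation is taken to be $p'_i = \sum_j P_{i|j}\, p_j$, with each column of $P$ a probability distribution. The function name, tolerances and example matrices are hypothetical.

```python
from math import isclose


def satisfies_axiom_3_prime(P, T, p, p_new, tol=1e-9):
    """Check that P is a valid conditional distribution linking p to p_new under T."""
    size = len(p)
    for j in range(size):
        column = [P[i][j] for i in range(size)]
        # Each column of P must be a probability distribution over i.
        if any(x < -tol for x in column):
            return False
        if not isclose(sum(column), 1.0, abs_tol=tol):
            return False
    for i in range(size):
        for j in range(size):
            # Probability may only flow where the dynamics couples the worlds.
            if T[i][j] == 0 and abs(P[i][j]) > tol:
                return False
    for i in range(size):
        # P must map the old probabilities onto the new ones.
        if not isclose(sum(P[i][j] * p[j] for j in range(size)), p_new[i], abs_tol=tol):
            return False
    return True


# Stochastic many-worlds theory: take P = T and p_n = v_n.
T_stochastic = [[0.5, 0.0],
                [0.5, 1.0]]
v = [0.4, 0.6]
v_new = [0.2, 0.8]  # T_stochastic applied to v
stochastic_ok = satisfies_axiom_3_prime(T_stochastic, T_stochastic, v, v_new)

# Trivial theory with a single swap: the state is unchanged, so p_new = p,
# and the zero pattern of T forces P to be the swap itself.
T_swap = [[0, 1],
          [1, 0]]
uniform_ok = satisfies_axiom_3_prime(T_swap, T_swap, [0.5, 0.5], [0.5, 0.5])
skewed_ok = satisfies_axiom_3_prime(T_swap, T_swap, [0.3, 0.7], [0.3, 0.7])
```

In the swap example only the uniform assignment passes, matching the claim that axiom 3' forces $p_1 = p_2 = \frac{1}{2}$ there.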

Non-quantum many-worlds theories
Recovering a natural probability rule in the case of stochastic many-worlds theory shows that this approach is not specific to quantum theory. Furthermore, this theory may also be interesting in its own right. In particular, it is difficult to make sense of theories involving objective probabilities at a fundamental level (footnote 2). This result suggests an interesting alternative possibility of treating objective probabilities as amplitudes in a many-worlds state. The fact that we find ourselves in a world which is typical with respect to the objective probabilities, and that these are helpful in subjective decision making, would then be a consequence of the natural probability distribution over worlds matching the amplitudes.
Footnote 2: Frequentism only gives a definite prediction for infinitely many trials, which is irrelevant for practical situations, and the principal principle [29] seems somewhat unsatisfying. Kent's incompressible bit string provides another alternative approach [22].
For discrete many-worlds theory there is no natural probability distribution over worlds obeying our axioms. Two alternatives for understanding probabilities within this theory are to violate axiom 1 or axiom 3. In the former case, we could assume some initial probability distribution, and then update this probability distribution during transformations according to
$$p'_n = \sum_m \left( \frac{T_{nm}}{\sum_k T_{km}} \right) p_m. \qquad (6)$$
This would lead to the same natural probability distribution as a stochastic many-worlds theory for which the transition matrix elements $\tilde{T}_{nm}$ are given by the bracketed expression in (6). This would lead to a natural flow of probability between worlds (obeying axiom 3), but the same state could yield many different probability distributions according to how it was generated. In the latter case, we could instead drop axiom 3, and take
$$p_n = \frac{v_n}{\sum_m v_m}.$$
This is a function of only the current state (obeying axiom 1), but the probability of a world can change even when the transformation acts on it like the identity.

Decoherence
Note that this derivation of the Born rule does not rely on decoherence, and indeed employs transformations such as permutations of worlds which would be essentially impossible to achieve in practice. However, such transformations are possible in principle, and it is helpful to use the full strength of the theory to generate constraints upon possible probability rules. The two particular instances where this is helpful are in proving Lemma 1, and when compressing states with support over all worlds in order to generate 'working space'. An alternative for the former is to take Lemma 1 directly as an additional axiom, perhaps motivated by symmetry, but the internal structure of the two worlds may look very different, and it seems quite strong to assume this is irrelevant in determining $p_n$. The latter could possibly be eliminated by adding a continuity axiom, but then one would have to choose a particular distance measure on states. Decoherence also plays a key role in explaining 'collapse' in realistic situations, as it becomes practically impossible to re-interfere macroscopically distinct states.

Conclusions
If reality has a deterministic many-worlds structure, as in the Everett interpretation, then there are no objective probabilities in the theory. In this case, how should we understand the fact that we are living in a world in which the relative frequencies of outcomes in past quantum experiments are very close to the probabilities predicted by the Born rule? Although it is consistent to say that this is a mere coincidence (as a world with such results must exist), it would be good to give a deeper explanation of this fact.
The approach pursued here is to establish a natural way of picking a world at random from the many-worlds state, and then to observe that with very high probability such randomly chosen worlds have the property that they agree with the Born rule, i.e. that this is a typical property of the worlds. The key is to motivate such a natural way of picking worlds at random. One apparently natural way (at least for states containing a finite number of worlds) is to simply assign equal weight to each world with non-zero amplitude, but this ignores some of the mathematical structure in the state, and it violates one of our reasonable axioms (axiom 3). Other problems with this strategy have been discussed in [20].
It seems impossible to give a completely compelling set of requirements which a probability distribution over worlds must obey, but we have defined three natural axioms which are sufficient to recover the Born rule in the context of quantum theory, and also give an appealing result for classical stochastic theory. These axioms are that probabilities: only depend on the current state, are zero for worlds which appear with zero amplitude, and cannot flow between sets of worlds which are uncoupled by the dynamics. It would be interesting to explore alternative possible requirements, and would also be good to reconsider issues relating to the choice of 'world' basis in this context.
Are results such as these sufficient to explain the empirical success of the Born rule? An interesting perspective is to consider a universe in which the Everett interpretation is true, described by a single unitarily evolving quantum state. Under reasonable conditions, such a state could be described by a superposition of branching worlds, many of which contain structures which look like people. What would it be like to live as one of those people in such a universe? If it is like our own experiences then this supports the many-worlds interpretation. If it is very different then this would rule it out. The strangest situation would be if we cannot in principle say what it is like, given that we know the correct mathematical theory describing the universe.

A Detailed proof of probabilities
We begin by proving two useful lemmas which apply to probability distributions obeying our axioms for the many-worlds theories we consider. These lemmas focus on bounded states, which have a finite number of non-zero amplitudes (such that $v_n = 0$ for $n \ge N$), but the final theorems will apply to any state.
We first show that for any theory in which all permutations are allowed transformations (which includes all of the theories in this paper), any two worlds with the same amplitude have the same probability.

Lemma 1 (equal amplitudes). Consider a theory in which all permutations of worlds are allowed transformations ($T$ is a permutation of worlds if $T_{ij} = \delta_{i,\pi[j]}$, where $\pi$ is a bijection of the non-negative integers), and a state $|v\rangle = \sum_{n=0}^{N-1} v_n |n\rangle$ for which $v_n = v_m$. Then $p_n = p_m$.
Proof. Denote by $T_{n \leftrightarrow m}$ the transformation which swaps worlds $n$ and $m$ and leaves all other worlds unchanged ($T_{n \leftrightarrow m} = |n\rangle\langle m| + |m\rangle\langle n| + \sum_{k \notin \{n,m\}} |k\rangle\langle k|$). Hence in this case $T_{n \leftrightarrow m}|v\rangle = |v\rangle$. Directly applying axiom 3 to this transformation is not sufficient, as this does not tell us how probabilities change inside the $n, m$ subspace. Instead, we note that $v_N = 0$, and that $T_{n \leftrightarrow m} = T_{N \leftrightarrow m}\, T_{n \leftrightarrow m}\, T_{n \leftrightarrow N}$, where the rightmost transformation is applied first. For the first transformation $T_{n \leftrightarrow N}$, consider a partition of worlds in which $n$ and $N$ are in one part, and each other world is in its own part. Due to axiom 3, we know that $p'_n + p'_N = p_n + p_N$, and that $p'_k = p_k$ for all $k \notin \{n, N\}$. Furthermore, from axiom 2 and the fact that $v_N = v'_n = 0$, we know that $p_N = p'_n = 0$. Hence $p'_N = p_n$ and $T_{n \leftrightarrow N}$ permutes the corresponding probabilities. Following a similar logic for the two subsequent transformations $T_{n \leftrightarrow m}$ and $T_{N \leftrightarrow m}$ (each of which permutes two amplitudes, one of which is zero), we find that the probabilities after the full sequence $T_{n \leftrightarrow N}$, $T_{n \leftrightarrow m}$, $T_{N \leftrightarrow m}$, which we again denote by $p'_n$, are the permutation of the original probabilities by $T_{n \leftrightarrow m}$, hence $p'_n = p_m$. However, due to axiom 1 and the fact that the initial state is the same as the final state in this case, $p'_n = p_n$. Hence $p_n = p_m$.
Our second lemma shows that, in all except the discrete many-worlds theory, if one world has a larger amplitude than another it cannot have a smaller probability. This is useful in deriving the probability rule for cases where the amplitudes are not related to rational numbers in a convenient way.
Lemma 2 (larger amplitudes cannot lead to smaller probabilities). Consider a state $|v\rangle = \sum_{n=0}^{N-1} v_n |n\rangle$ in quantum many-worlds theory, unnormalised quantum many-worlds theory, or stochastic many-worlds theory for which $|v_l| > |v_k|$. Then $p_l \ge p_k$.
Proof. For quantum many-worlds theory, or unnormalised quantum many-worlds theory, consider the unitary evolution which acts in the $\{|l\rangle, |N\rangle\}$ subspace as
$$T|l\rangle = \frac{v_k}{v_l}|l\rangle + \sqrt{1 - \left|\frac{v_k}{v_l}\right|^2}\,|N\rangle, \qquad T|N\rangle = -\sqrt{1 - \left|\frac{v_k}{v_l}\right|^2}\,|l\rangle + \frac{v_k^*}{v_l^*}|N\rangle,$$
and satisfies $T|m\rangle = |m\rangle$ if $m \notin \{l, N\}$. For stochastic many-worlds theory consider a stochastic evolution which acts on $|l\rangle$ as
$$T|l\rangle = \frac{v_k}{v_l}|l\rangle + \left(1 - \frac{v_k}{v_l}\right)|N\rangle,$$
and satisfies $T|m\rangle = |m\rangle$ if $m \neq l$. In both cases this means that for $|v'\rangle = T|v\rangle$, $v'_l = v_k$ and hence $p'_l = p'_k$ from Lemma 1. From axiom 3 we have that $p'_N + p'_l = p_N + p_l$ and $p'_k = p_k$, and from axiom 2 we have $p_N = 0$. Hence
$$p_l = p'_N + p'_l \ge p'_l = p'_k = p_k.$$
Using the above lemmas and the probability rule axioms, we now derive the appropriate probability rules for our many-worlds theories (or in the case of discrete many-worlds theory show that this is impossible). To illustrate the key idea, we first derive the probability rule for quantum many-worlds theory in the case in which all amplitudes are square roots of rational numbers, using a similar branching strategy to Deutsch and Wallace [17,20] (although without any decision-theoretic component).
$|N + nM + l\rangle$.
As this is a superposition of worlds with equal amplitude it follows from Lemma 1 that the probability assigned to each world in this state must be identical. Hence $p'_m = \frac{1}{M}$ for each world in this state. From axiom 3 and equation (13) it then follows that $p_n = \frac{m_n}{M}$.
Next, we use Lemma 2 to extend this result to the general case of arbitrary states.
Theorem 2 (general quantum probability rule). For an arbitrary state $|v\rangle = \sum_n v_n |n\rangle$ in quantum many-worlds theory, $p_n = |v_n|^2$.
Proof. We first transform the initial state (which may have infinitely many non-zero components) into a bounded state to create some working space. Consider a particular probability $p_k$. In order to determine this, we first perform a unitary $T$ with the property that
$$T|v\rangle = \sum_{n=0}^{k} v_n |n\rangle + \sqrt{\sum_{n>k} |v_n|^2}\; |k+1\rangle,$$
which acts separately on each of the worlds 0 to $k$, and collectively on the rest (merging them into a single world). This gives a bounded state containing $N = k + 2$ worlds with the property that $p'_k = p_k$ from axiom 3. Next, we pick a large positive integer $M$, and write the transformed state $|v'\rangle$ as
$$|v'\rangle = \sum_{n=0}^{N-1} \frac{m_n + \epsilon_n}{M}\, |n\rangle \qquad (27)$$
in which $m_n = \lfloor M v'_n \rfloor$ and $\epsilon_n = M v'_n - \lfloor M v'_n \rfloor$, where $\lfloor x \rfloor$ is the floor function. Hence each $m_n$ is an integer and $0 \le \epsilon_n < 1$. We then perform a transformation which branches each world in the original state into $m_n$ or $m_n + 1$ worlds in the final state (depending on whether $\epsilon_n = 0$) via a stochastic transformation acting on the partition of worlds $S_k$ as in theorem 1,