Efficient qubit phase estimation using adaptive measurements

Estimating correctly the quantum phase of a physical system is a central problem in quantum parameter estimation theory due to its wide range of applications from quantum metrology to cryptography. Ideally, the optimal quantum estimator is given by the so-called quantum Cram\'er-Rao bound, so any measurement strategy aims to obtain estimations as close as possible to it. However, more often than not, the current state-of-the-art methods to estimate quantum phases fail to reach this bound as they rely on maximum likelihood estimators of non-identifiable likelihood functions. In this work we thoroughly review various schemes for estimating the phase of a qubit, identifying the underlying problem which prohibits these methods to reach the quantum Cram\'er-Rao bound, and propose a new adaptive scheme based on covariant measurements to circumvent this problem. Our findings are carefully checked by Monte Carlo simulations, showing that the method we propose is both mathematically and experimentally more realistic and more efficient than the methods currently available.


Introduction
The aim of statistical estimation theory in classical systems is to estimate a probability distribution based on a series of observations. More precisely, given a statistical model P = {p(x | θ) | θ ∈ Θ ⊂ R n } and a series of observations {X 1 , . . . , X N } generated by a particular distribution density p(x | θ), the aim is to design an estimatorθ(X 1 , , . . . , X N ) which is as close as possible to the parameter θ, measured in terms of a cost function, usually the Mean Square Error (MSE), betweenθ and θ. A fairly remarkable result, the celebrated Cramér-Rao bound (CRB), states that the variance of an unbiased estimator, is at least as high as the inverse of the Fisher information of p(x | θ). The estimators that saturate this bound are called efficient. An example of an efficient estimator is the maximum likelihood, that saturates the CRB when the number of measurements tends to infinity.
This classical result has, however, some subtleties, namely: for the maximum likelihood estimator to be consistent (i.e. the estimator converges to the parameter when the number of outcomes tends to infinity), the likelihood function must be identifiable (i.e. the likelihood has a unique global maximum), Θ must be a compact set, and all probability distributions in the statistical model must have the same support. Statistical models satisfying the previous conditions are called regular models.
While the actual role that the process of measurement plays in extracting information from a classicaly system is not particularly relevant, the situation is completely different in a quantum system, making the process of parameter estimation mathematically more involved. Suppose ρ is an unknown density operator acting on a Hilbert space H, that represents the state of the system. Let us assume that ρ belongs to a particular subset of parametrized states The parametrized states ρ θ are the result of the evolution of an initial probe state ρ by a trace-preserving dynamical process dependent upon a parameter θ. Note that different initial conditions give rise to different subsets of parametrized states. Quantum parameter estimation aims to produce the best estimation for θ under the premise that the system is prepared in a state of the parametric family S. Notice that, unlike its classical counterpart, quantum estimation consists of two parts: the choice of the measurements to perform on the system, and then the processing of their outcomes through an estimatorθ. Note that measured data is a set of random outcomes with a probability distribution depending on the parameter, and an estimator is a random variable whose outcomes estimates the parameter. Thus, the pair measurement-estimator forms a strategy of estimation in the quantum case one can play around with seeking for an optimal estimation strategy. Indeed, each choice of measurement defines a classical statistical model, whose lowest possible CRB maximises the Fisher information. Thus the maximum of all possible Fisher informations, over the space of measurements allowed by the postulates of Quantum Mechanics, is called the Quantum Fisher Information (QFI), and its inverse, the Quantum Cramér-Rao Bound (QCRB). In other words, the QCRB is the minimum over all possible MSE of any possible estimation strategy allowed by Quantum Mechanics. If the Fisher information for a particular choice of measurement M is equal to the QFI, this measurement is called optimal. One way to find an optimal measurement is to use the set of operators {M (j)} to represent the measurement, where each M (j) corresponds to a projector onto each of the eigenspaces of the symmetric logarithmic derivative (SLD) L(θ) [5], defined by ∂ρ θ ∂θ = 1 2 (L(θ)ρ θ + ρ θ L(θ)) .
It turns out that, in general, this measurement is so-called locally optimal. This means that the optimal measurement depends on the actual value of the parameter we want to estimate and the classical Fisher information given by that measurement is a local maximum in the parameter space that characterizes the measurement. Thus estimating the parameter using a locally optimal measurement is rather impractical. Two approaches have been developed to overcome this problem. The first one is based on adaptive estimation schemes which updates a guess for a locally optimal measurement [1,13]. The second method seeks a set of initial conditions that do not depend on the unknown parameter [7,33]. Both methods are based on the maximum likelihood estimator (MLE) which implies that in order to obtain it, the aforementioned set of regularity conditions must be met [6,11,20]. We will see that the MLE derived from optimal measurements may fail to satisfy these conditions and thus these two approaches fail to attain the QCRB.
The problem of non-identifiable likelihood functions often appears when estimating the quantum phase of a system, a particular problem with a wide range of applications from quantum metrology to cryptography, as many problems can be recast into a quantum phase estimation [10,24,27]. In this framework, when trying to use locally optimal measurements [12,14,19,31,33], it may happen that adaptive schemes fail to reach the QCRB [28] due to the fact that regularity conditions are not met [13]. A paradigmatic example of nonidentifiablity is provided when trying to perform qubit phase estimation which are, in general, unable to reach the QCRB [1,7]. The main goal of the present work is to introduce a new adaptive scheme for quantum phase estimation which saturates the QCRB independently of the initial condition that generates S. We will first show that the Adaptive Quantum State Estimation method (AQSE), a general method for parameter estimation [35], does not achieve the QCRB when applied to a qubit phase estimation. The reason why AQSE does not converges is that the statistical model, built with the measurement that maximizes the Fisher information (the locally optimal measurement), is not identifiable. We will further show that the non-identifiability problem is solved by taking a sample of measures from a covariant measurement. The covariant measurement is identifiable and is chosen to minimize the MSE for one measure. With the covariant sample, we can then construct a confidence interval where the underlying statistical model is now regular. Then we will apply the AQSE method inside the confidence interval. The paper is organized as follows. We start by giving a brief review of quantum estimation theory with special emphasis in covariant estimation techniques for periodic parameter estimation (section 2). Then, we review different estimation strategies for the phase of a qubit and discuss their weakness and strengths, in particular we discuss the effects of nonidentifiable probabilistic models in the estimation error (section 3). Once we understand the problems with different estimation strategies, we present a two-step estimation scheme. Numerical simulations suggest that this scheme is able to reach the QCRB (section 4). We end the paper with a summary of the main results and our conclusions (section 5).

Quantum parameter estimation
Recall that, given a set of independent observations {x 1 , . . . , x N } from a random variable X with probability density p(x | θ), the likelihood function is defined as From here, we can derive the maximum likelihood estimator for θ, whose mean square error obeys the CRB [20] where the MSE is defined by and F (θ; X) is the Fisher information: Intuitively, the Fisher information quantifies how much information carries a sample about the unknown parameter. The ultimate aim in classical parameter estimation is to find the estimator that achieves the Cramér-Rao bound. If the statistical model {p(x | θ) | θ ∈ Θ} for a random variable X is regular the maximum likelihood estimator produced by a sequence of N independent and identically distributed of X can attain the Cramér-Rao bound asymptotically for large N [11,20].
In the quantum case, we must first describe the process of measuring the system. This is better achieved by using the concept of Positive Operator-Valued Measures (POVMs). A POVM with outcomes on a set X is a family of bounded positive operators acting over the Hilbert space of the system H such that M (X ) = I. When the state of the system is given by the density operator ρ θ , a POVM M specifies the conditional probability of obtaining the event B by Born's rule Thus, in quantum parameter estimation, the Fisher information, given by Eq. (6), is a function of the POVMs and, henceforth, we will denote it as F (θ; M ). This implies that, according to Eq. (4), maximising F (θ; M ) over the set of POVMs gives the lowest CRB. This results into the so-called Quantum Fisher Information (QFI), which we will denote as F Q (θ), and its lowest bound is so-called the Quantum Cramér-Rao Bound (QCRB) [15,16,18]. For the particular case of a scalar parameter estimation, the QFI has the following form [5,15,16] where L(θ) is the symmetric logarithmic derivative, also called quantum score, defined by Eq.
(2). When the system is in a pure state ρ θ = |ψ θ ψ θ |, the quantum score is easy to calculate [29], obtaining L(θ) = 2 ∂ρ θ ∂θ , so that the QFI becomes: Notice that the set of POVMs {M θ (j)} constructed as the projections of the quantum score L(θ) do depend on the parameter θ one is trying to infer [5]. A natural way around this is to introduce adaptive schemes to estimate θ. One approach relies on adaptive quantum estimation schemes based on locally optimal POVMs that could, in principle, asymptotically construct the optimal POVM without knowing the parameter beforehand [2,13,19,25]. Nevertheless, a set of precise mild regularity conditions for each statistical model involved in the method are required. For instance, in the adaptive quantum state estimation (AQSE) method [13,25], it is necessary to assume regular statistical models for every measurement to guarantee a consistent and efficient estimator. However, for estimation problems on which the quantum parameter is periodic, as it is the case for phase estimation, the likelihood functions produced by locally optimal POVMs are not identifiable [3,7]. Consequently, there is no mathematical reason that ensures the saturation of the CRB. The second approach searches specific initial conditions, for which the optimal POVM does not depend on the unknown value of θ [7,33]. Hence, in principle, performing an extensive independent sequence of this POVM, along with the maximum likelihood estimator, it is possible to achieve the quantum Cramér-Rao bound. Nonetheless, for periodic parameter estimation, this method often produces non-identifiable likelihood functions, and as a consequence, the CRB is not attained.
We will show now that, for the case of quantum phase estimation, we can solve the problem of non-identifiable likelihood functions by using the formalism of covariant estimation, which considers the symmetries of the system and can saturate the QCRB under specific initial conditions [16].

Covariant estimation
The following discussion is based on [16]. In quantum parameter estimation, each ρ θ ∈ S obtains its parametric dependency through a physical transformation U θ . Usually, the set of {U θ } θ∈Θ forms a group, which induces an action over S. When the family S is invariant under the conjugation of U θ , for any θ ∈ Θ, we say that the problem of estimation involves a symmetry. The problem of quantum parameter estimation involving symmetries is called covariant. In the following, we summarise the approach of covariant estimation. Let G be a locally compact Lie group acting transitively over the parametric set Θ. Thus for any base point θ 0 ∈ Θ, we have where K is the stabilizer subgroup of θ 0 . According to Wigner's theorem [34], the Hilbert space H of the system has a unitary G-representation U = {U g } g∈G . Hence, the probe states can be transformed by the conjugation where U g is an element of the G-unitary representation of H. Specifically, we say that we have a quantum covariant estimation problem whenever In this framework, a POVM M on the system H taking values in Θ is called covariant with respect the G-unitary representation {U g }, if for every event B and g ∈ G, it satisfies that where Bg −1 = {θ | θ = gθ , θ ∈ B}. Note that this condition is equivalent to We will denote as M (Θ) the set of POVMs with outcomes in Θ. Similarly, we will denote as as M (Θ, U ) the set of the covariant POVMs with respect the G-unitary representation. The class of covariant POVMs takes advantage of group symmetries. Specifically, when the outcomes of the measurement are considered as the estimates, and the cost function As a result, we can find optimal measurements on M (Θ, U ) in the Bayesian approach, by taking the average of c(θ,θ) over a prior probability measure. The following theorem ensures this assertion.
The previous result restricts the search for optimal measurements to the class M (Θ, U ). This fact is particular useful as, according to the following theorem, the covariant POVMs can be characterised as follows.
Theorem 2 ([16]). Let P 0 be a Hermitian positive operator on H, commuting with the operators {U g } g∈K and satisfying where g(θ) ∈ G is any representative element of the equivalence class θ ∈ Θ. Then, a POVM M (dθ) with outcomes in Θ ∼ = G/K is covariant if and only if has the form A particular case, which is relevant for quantum phase estimation, is when the stability group is the trivial one, that is when K = {e} with e the identity element. Here, we have that Θ ∼ = G, that is, the parametric space Θ is on one-to-one correspondence to the elements of the group G. This implies, in turn, that E θ,M c(θ, θ) is constant for all θ ∈ Θ and, as consequence of theorem (1), the optimization of the expectation of the invariant cost function can be restricted to the set of covariant POVMs. Notice that for quantum phase estimation, there is a little caveat: here the one-parameter symmetry group G isomorphic to Θ = [0, 2π) but this is not a compact set. In practice, we can consider it to be compact, assuming that the unknown parameter is an interior point of Θ. Thus, we can analyze the CRB for any covariant POVM. We proceed to discuss in more detail how these ideas apply to quantum phase estimation for qubit states.

Phase estimation strategies in a qubit
Let S = {ρ θ ; θ ∈ Θ = [0, 2π)} be a parametric family of density operators, on a 2-dimensional Hilbert space H. In this case, each state ρ θ ∈ S represents the state of a qubit. Here, the states ρ θ obtain their parametric dependency applying an arbitrary unitary transformation U θ over a probe state ρ on H, that is An arbitrary unitary transformation (up to a global phase) on a 2-dimensional Hilbert space can be written using the generators of the Lie algebra su(2) and has the following form [23,26] where n = n xx + n yŷ + n zẑ ∈ R 3 is a unit vector and σ = σ 1x + σ 2ŷ + σ 3ẑ , with σ i , for i = 1, 2, 3, the Pauli matrices. The estimation problem in this case consists in finding the best strategy to estimate the phase θ ∈ Θ of the exponential operator appearing in Eq. (20).
To analyse the QCRB in this case it is better to work using the Bloch sphere representation for a qubit state. The probe state can be written as and the transformation U θ can be seen as the rotation of the probe Bloch vector a by the angle θ around the axis n. We will denote the transformed vector as a(θ), so the transformed qubit state ρ θ is solely determined by a(θ). One can show that, for this problem, the quantum score takes the following form [7]: where This implies that, according to Eq. (8), the quantum Fisher information becomes Thus, if the probe qubit is a pure state (i.e., a = 1) and the Bloch vector a is orthogonal to the rotation axis n, then Eq. (24) achieves their maximal value F max Q = 1. We will refer to this as the optimal initial conditions. Let us proceed to discuss the different estimation strategies for the phase θ.

Locally Optimal POVMs
From the expression of the quantum score, given by Eq. (22), it is straightforward obtain that its projection operators are given by: (25) Hence, the family of POVMs is M g = {M g (0), M g (1)}, with outcomes in the set X = {0, 1}, yield the following classical Fisher information .
In other words, when the probe state is prepared in the optimal initial conditions, the POVM M g is optimal, independent of θ ∈ Θ. Therefore, any POVM M g is optimal.
At this point, one may think that a POVM M g produces an optimal estimation strategy in the sense that if we were to perform a sequence of N independent measurements using the maximum likelihood estimator, it should be possible to saturate the QCRB in the asymptotic regime. This is, however, false: the POVM M yields a non-identifiable likelihood function, and thus the maximum likelihood estimator is not consistent. Indeed, when a sequence of N identically quantum systems with Hilbert space H is prepared in the same state ρ θ , the composite system is described by the N -tensor product H ⊗ · · · ⊗ H N -times and its state is described by an N -fold tensor product state ρ θ ⊗ · · · ⊗ ρ θ N − times . Then, the likelihood function produced by the outcomes where m is the number of 0's in x, and By looking for the MLE of θ, denotedθ MLE , we obtain: which returns two values in Θ = [0, 2π), indicating the the likelihood is non-identifiable. Thus, the Fisher information alone is not enough to characterize the error of this estimation strategy. The previous discussion also can be found in [7]. Now we consider the case of POVMs with k ≥ 3 outcomes. Any element of a POVM M = {M (i)} with outcomes in a set X = {1, 2, ..., k} can be written as When a qubit is in a state ρ θ , the probability of measuring the outcome i is A necessary and sufficient condition for identifiability is that, for any parameters θ 1 , θ 2 ∈ Θ, the set of equations p(i | θ 1 ) = p(i | θ 2 ), i = 1, ..., k admits only one solution in the parametric space Θ [21].
The problem of identifiability could be avoided by considering POVMs with several outcomes because it is easier to satisfy the requirements for identifiability given above. However, a POVM that produces an identifiable likelihood is not necessarily optimal. Outside the optimal initial conditions, there is no measurement M that does not depend on θ and that saturates the QCRB [1]. In [7] it is shown a family of identifiable POVMs with more than 2 outcomes, however, except for the optimal initial condition, this family is not locally optimal. The results in [1,7] do not forbid the existence of an identifiable and locally optimal POVM with more than 2 outcomes, but we could not find one. Note also that for a small number of measurements, the locally optimal POVM does not necessarily saturate the QCRB. Nevertheless it is possible to built a POVM that, for one measurement, produces an identifiable model that minimizes the mean square error. We will review now how to build this POVM by using the covariant approach [16]. Once we show how this POVM is built, we proceed with our proposal to achieve the QCRB for qubit phase estimation.

Covariant phase estimation for qubits
The following discussion is based on [17], where M * is deduced for any shift parameter.
As we have previously mentioned, phase estimation estimation is covariant with trivial stabilizer group since the group G = [0, 2π) equipped with addition modulo 2π acts over itself. Then, Θ is isomorphic to G.
where P o is a positive operator satisfying Eq. (17). The generator of the unitary representation is the operator J := n · σ 2 , which has a spectral decomposition where have used that ± 1 where p mn = m|P o |n . Thus, any covariant POVM is characterized by the real numbers 0 ≤ p nm ≤ 1. Then, by Eq. (7) Note that the measure (35) is a 2π-periodic probability distribution, so it is natural to consider the moments of the random variable e iφ instead. The first moment for a circular distribution p(φ) is defined as so from here we can estimate the phase as φ = Arg E e iφ . To quantify the correct dispersion of the estimates we use the Holevo variance [16] V H where M ∈ M (Θ, U ) and µ = E e iθ . If one has a biased estimator, then µ = E cos(θ − θ) [3]. A circular estimatorθ is unbiased if e iθ ∝ E e iθ . For narrowly peaked and symmetric distributions around θ, V H M (θ) ∼ MSE M (θ). As a result, Holevo's variance is lower bounded by .
As the quantum Fisher information reaches its largest values for the families of pure states, let us restrict our attention only to pure quantum probes. Moreover, when the probe is a pure state, one can find the covariant POVM that minimizes the Holevo variance using the spectrum of J for any shift parameter [16,17]. Here, we adapt the proof of [16] to the problem of phase estimation.
where P m = |m m| is the associated projector to the eigenvalue m of J. Then, for any Proof. In Dirac notation, ρ = |ψ ψ| so that Eq. (38) takes the following form Since |p mn | ≤ 1, where the equality is achieved if and only if p mn = p * mn . Setting k = 1 yields the assertion .
Besides, if m|ψ is a constant for all m ∈ Spec(J), the measurement M * has information about θ equal to the quantum Fisher information [17]. For qubits, the previous condition is equivalent to have an initial condition a · n = 0.
In this context, we now prove that the measurement M * maximizes the Fisher information over the set of covariant POVMs. Then, for any M ∈ M (Θ, U ), where M * (dθ) is defined according to Eq. (38).
To find an explicit epxression of p(dθ|θ), we first notice that so that Eq. (38) can be rewritten as follows Then, the measurement M * yields the probability density function As M is a POVM with an infinity number of outcomes, it is reasonable to get an identifiable statistical model independent of θ. To illustrate this fact, we perform a numerical simulation, generating random numbers with distribution (49). A particular likelihood function produced by (49) is shown in Fig. 1.  Moreover, according to Eq. (49), the Fisher information given M * reads: When a · n = 0, F Q = 1 and therefore F (θ; M * ) = F max Q = 1. This is not surprising since, under this optimal initial condition, | 1 2 |ψ | = | − 1 2 |ψ |. Consequently, assuming this optimal setup, if the maximum likelihood is used for N independent copies of the system, one can attain the QCRB asymptotically for large N . Nevertheless, under the optimal initial condition, there are other covariant POVMs that can attain the QCRB. To see this, let us characterize all covariant measurements for quantum phase estimation of qubits.
Every positive bounded operator P o can be expressed in the Bloch representation, that is, When a qubit is in the state ρ θ , the outcomes for the POVM M (dθ) follows the probability density From this result, one can calculate the Fisher information associated with the covariant POVM M . In particular, the case d = ±( n × a) is of interest, since the classical Fisher information reads Let us briefly summarise the results we have found thus far. Firstly, we have shown that estimation strategies based on locally optimal POVM with two outcomes cannot achieve the QCRB since the corresponding likelihood is non-identifiable. POVMs with more than two outcomes may be identifiable but we could not find the POVMs that are locally optimal (globally optimal POVMs do not exist [1]). Secondly, we have seen that for continuous and periodic outcomes the covariant POVM solves the identifiability problem and minimizes the MSE for one measurement, however, it can only achieve the QCRB (in the limit of a large number of measurements) with optimal initial conditions a · n = 0. Hence, to the best of our knowledge, it does not exist a set of independent and identical measurements that achieve the QCRB.
It turns out that, under the optimal set of initial conditions, the covariant POVM M * has been previously investigated and it is usually called canonical phase measurement [3,17,22] which, according to Eq. (40), takes the following form: Although the optimal covariant POVM is hard to realize experimentally, it can be well approximated by POVMs with large number of elements or adaptive measurements [7,22,30]. In particular, the canonical phase measurement can be implemented using adaptive measurements with quantum feedback [22], and the POVM M * can be well approximated using a POVM with k ≥ 8 that discretizes Eq. (49).

Entangled measurement
It is actually possible to achieve the QCRB for a probe not orthogonal to the rotation axis if we relax the condition of identical and independent set of measures. Let us indeed see it is possible to attain the QCRB for entangled measurements for any initial condition.
Following [17], we estimate θ ∈ [0, 2π) for the family of states ρ ⊗N θ in the Hilbert space H ⊗N with ρ θ = U θ ρU † θ and ρ being a probe pure qubit. Note that ρ ⊗N θ can be written as that is, the family of states ρ ⊗N θ in H ⊗N is covariant with respect to the unitary representation e −iθJ (N ) . Therefore, as in the case of one probe, the POVM that minimizes the Holevo variance is expressed in terms of the spectrum of J (N ) . Therefore, the measurement M (N ) * that minimizes the Holevo variance has the expression where, P λ is the projection operator of J (N ) associated to the eigenvalue λ.
The difference here, in contrast to the single measurement case, is that here the spectrum of J (N ) is degenerate. Thus

Adaptive state quantum estimation (AQSE)
As we mentioned in the introduction, unlike its classical counterpart, quantum parameter estimation yields estimators which depend on the parameter one wants to estimate. A way to tackle this problem is using an adaptive quantum estimations scheme [13,25,35]. It works as follows: suppose one has a set of optimal POVMs {M g } g∈Θ . One begins with an arbitrary initial guess g 0 . Then one applies the optimal measurement at g 0 , M g 0 . Assuming the data x 1 is observed, one applies the MLE to the likelihood function L 1 (θ | x 1 ; g 0 ) = p(x 1 | θ; g 0 ) to obtain an estimateθ 1 (x 1 ) = g 1 . This process is then repeated iterative. For n ≥ 2, one applies the POVM M g n−1 , with g n−1 =θ n −1 (x 1 , ..., x n−1 )), obtaining the outcome x n . Hereθ n−1 is the estimation from the previous step using the outcomes x 1 , ..., x n−1 . The likelihood function for the nth step is where x i is the data observed at step i. Applying the MLE one obtains the nth guess g n =θ n (x 1 , ..., x n ). Even though this method relaxes the condition of identical measurements, this does not guarantee that the resulting classical statistical model is regular thus impacting its performance [8].
To illustrate this let us revisit the problem of non-identifiability for the POVM M g previously discussed. To fix ideas we take θ = 2 and consider N = 64 measurements. In panel A of Fig 2 we show the result of the likelihood function for g = 1.5 using Eq. (29). The likelihood is indeed non-identifiable and presents two global maxima in the interval [0, 2π). Panel B in the same figure shows the results of using AQSE starting with the initial value g 0 = 1.5. In this case we can see that there is only one global maxima but at the incorrect value of θ. This is the result of having an original non-identifiable model, whose wrong maximum has been enhanced due the particular random trajectory in the parameter space generated by AQSE. Thus, even though AQSE is able to lift a possible degeneracy of the global maxima there is no a priori way to control on how to enhance the correct maximum. With a modest amount of foresight, it seems fairly natural to expect that if we knew Here we have considered g = 1.5 and the number of ones was 32 out of the possible N = 64. For clarity the figure has been rescaled so that the maxima equal to unity. Panel (B) shows the log-likelihood function for the AQSE scheme given by Eq. (57) using 64 adaptive steps. The existence of trajectories in the parameter space enhancing the incorrect maximum explains why the AQSE does not reach the QCRB. In both cases, the initial condition has been taken to be a · n = 0. 5. in which interval the parameter lies and restricted the likelihood over that interval -so as to ensure that the restricted likelihood is identifiable-then the adaptive scheme will converge to the QCRB fast. To analyse this we have compared the behaviour of AQSE in two different scenarios: unrestricted and restricted likelihoods. In both cases we have taken θ = π. For the unrestricted scheme the parameter space is Θ = [0, 2π), while for the restricted one we take θ ∈ Θ = [ π 2 , 3π 2 ), so that the likelihood is identifiable. In Fig. 3 we show the result of these two cases for two extremal initial conditions a · n = 0.5 (panel A) and a · n = 0 (panel B), corresponding to the lowest and largest value of F Q , respectively. In both cases the numerical method was done using a bootstrap simulations with 10000 repetitions. From these results we can observe that in the unrestricted case the lack of identifiability directly affects the scaling of the Holevo variance as a function of the number of adaptive steps. Furthermore, numerical results show that AQSE does not saturate the QCRB. On the other hand, in the restricted case, we find that the MLE saturates the QCRB around 128 measurements for both initial conditions. Thus we conclude that a possible deficient performance of AQSE is due to the non-identifiability for θ in the likelihood functions. Finally, in Fig. 3, we have also compared the performance of AQSE with the optimal Quantum Cramér-Rao Bound  In all cases we used 10000 sequences for each number of probes. As we can see the restricted AQSE performs much better than the AQSE. We have also included the strategy based on the optimal covariant POVM M * as a benchmark. Notice that for both initial conditions the AQSE does not reach the QCRB, while for optimal initial condition the sequence of covariant POVMs is able to attain the QCRB (see panel B).
covariant inference. In this case, the sequence of M * measurements has a better scaling for a small number of measurements (less than 4). Then, we can also conclude that the optimal covariant measurement is the best independent strategy in a regime of small N . This is expected because the covariant measurement is built to minimize the MSE for one measurement.
In the next section, we propose an estimation scheme that uses the covariant estimation and the AQSE method to avoid the problem of non-identifiable likelihood functions and thus is able to saturate the QCRB for any initial condition.

Adaptive estimation scheme with confidence intervals
From the previous discussion it seems clear that, conditioned of knowing in which interval the actual phase lies in and restricting the likelihood to that interval so it becomes identifiable, the AQSE method converges efficiently to the QCRB. With this in mind, we propose the following two-step scheme which allows us to initially guess the interval, thus making the iterative scheme a regular problem. Our scheme consists of two main steps. In the first step one uses a sequence of independent optimal covariant measurements to produce a confidence interval (CI) using the maximum likelihood estimateθ MLE . The CI gives a range of plausible values where the unknown parameter most likely lies in. In the second step, one applies the AQSE method restricted to the confidence interval. Furthermore, one is able to update the center of the CI by the MLE of each previous step. The idea of using CIs to improve the error in estimations is exemplified, in a different context, in [4]. Let us assume that we have N copies of the state system ρ, so that each measurement is performed consecutively over each copy. Let us denote the result of each measurement as x i . In the first part of our scheme we apply N 1 covariant measurements M 1 given by Eq. (40), that is and then we construct the MLÊ to obtain the estimationθ MLE (x 1 , ..., x N 1 ). The CI is then given by [9]: where c is the appropriate critical value in the standard normal distribution (e.g. 1.96 for 95% of confidence or 2.58 for 99% of confidence) and F is the Fisher information of M * given by Eq. (50). To determine the minimum sample size N 1 in order to get a confidence interval of size 2E, we use the CRB to write the lower bound In the second step of our scheme we then use the remainder of the copies, that is N 2 = N − N 1 , to perform measurements using the POVM All in all, our two-step method is given by: with Θ = CI θ MLE (x 1 , ..., x k ) . Eq. (63) is the main result of this work. Before discussing some numerical results of our newly introduced method it is important to discuss the possible sources of errors associated to it. Suppose that we set a confidence level 0 ≤ C l ≤ 1 for a given marginal error E. Then we can write that where the two types of errors ∆ 1 and ∆ 2 correspond to the error associated to the AQSE method when the CI correctly includes the value of the actual parameters, and when it does not, respectively. Let us call the latter intervals bad CIs. Further, let us take E < π 2 , so that likelihood functions used in the AQSE method are identifiable. Thus, the error ∆ 1 when the CI includes the value of the parameter is lowered bounded by To characterize ∆ 2 , first we note that there are two types of bad CIs. To see this, let θ A be the parameter's value and θ B = θ A + π. Then, the likelihood function produced by AQSE has two local maxima, one around θ A , and the other around θ B . The first type of bad CIs corresponds to those that include the second maximum around θ B . If one applies AQSE restricted to these CIs then θ MLE tends to θ B . The second type of bad CIs appear when the interval does not include θ B . In this case, the maximum likelihood estimation produced by AQSE can tend to the point in the interval closest to either θ A or to θ B . When the estimation is closest to θ A , by updating the interval's center with the posterior estimates, one can, in principle, obtain a confidence interval that captures the parameter's value. This explain why we obtain better results updating the center of the confidence interval.
To obtain a bad interval of type one, it is necessary that most of the data obtained from the covariant inference be numbers close to θ B (the likelihood function has a global maximum around θ B ). From Eq. (49), for a covariant sample of size N , it follows that the probability of obtaining a likelihood function with the global maximum around θ B is Thus, for sufficiently large N , the above probability can be neglected. In this way, for a confidence level close to 1, and a sufficient small marginal error E, one expects that the number of bad confidence intervals disappear as the number of adaptive steps increases.
To illustrate this fact, in Table 1, we show the number of bad intervals as a function of the adaptive steps in the AQSE part produced by a bootstrap simulation of 10000 repetitions.
Here we have chosen a marginal error of π 4 , with a confidence level of 0.99. In this case, we need to perform 11 and 22 covariant measurements for the initial conditions a · n = 0 and a · n = 0.5, respectively, to get the desired CI. For these values, the number of type one bad confidence intervals is negligible. Setting = E in (66), the probabilities of obtaining them are 2.3 × 10 −10 and 2.3 × 10 −18 for these two initial conditions, respectively. Note that, the distribution for the estimates can be approximated as a mixture of 3 normal AQSE Steps # bad CIs ( a · n = 0) # bad CIs ( a · n = 0. 5)  0  111  125  4  50  56  8  19  20  16  8  1  32  6  1  48 2 0 distributions, N (θ, , where M is the sequence of AQSE and E is the marginal error. This implies that the lower bound for the Holevo variance, when the center of the CI is not updated, is given by The second term on the right-hand side of Eq. (67) dominates when N 2 → ∞. This has a very simple interpretation: if the parameter is outside the confidence interval, making more steps in the AQSE method does not diminish the error. Unfortunately, we are not able to provide a lower bound to the Holevo variance in the case for which the center of the CI is also updated. Nevertheless, as we will see, by setting a confidence level close to 1, a sufficiently small marginal error E, and by updating the center of the CIs, the Holevo variance approximates the QCRB in a small number of steps.
To assess the performance of our method we have performed a Monte Carlo simulations by setting E = π 4 and C l = .99. Moreover, we have compared our numerical results with the ideal case (C l = 1) and an estimation strategy for which the center of the CIs is not updated. The results are summarised in Figures 4 and 5 for initial conditions a · n = 0 and a · n = 0.5. In Fig. 4 we have considered the initial condition a · n = 0, so that F (θ; M * ) = 1, and fixed N 1 = 11. The vertical dashed line at N 1 = 11 separates the point between covariant and AQSE inferences. Notice that in the covariant part, for N ≤ 11, both strategies corresponding to either updating or not the CIs centers yields the same result. However, for N > 11, if the center of the CI is not updated, so that according to Eq. (67) it second term dominates, the estimation error grows quickly as the number of measurements increases. If, however, the center of the CIs is updated then the estimation error tends to the QCRB. Similarly in Fig. 5, for which we have chosen N 1 = 22, a · n = 0.5 and F (θ; M * ) = 0.75, we can observe the same qualitative behaviour, showing that our estimation strategy tends to the QCRB as the number of steps increases for any set of initial conditions. We conclude by showing in Fig. 6 the performance of all the methods we have discussed  Figure 4: Plot showing Holevo's variance as a number of probes for our scheme. The optimal strategy (orange rhomboid markers), in which we update the centers of the CIs, tends to the QCRB as the number of probes increases. The latter bound corresponds to take a confidence level of 100%. We also show the alternative strategy for which the centers of the CIs are not updated (yellow pentagon markers). This result is lowered bounded by the solid black line, whose formula is given by Eq. (67).
We have chosen the initial condition a · n = 0. Here the vertical blue line at N = 11 marks the point separating the covariant inference performance from the AQSE inference to our two-step scheme.
throughout this work considering the initial conditions a · n = 0.5 and a · n = 0, which correspond to the smallest and largest quantum Fisher information, respectively. Obviously the optimal performance for the different estimation strategies is independent of θ, since the Holevo variance is invariant under translation modulo 2π over the parametric space. For all the strategies, the behavior is qualitatively the same for any initial condition. As expected, the entangled strategy is the best one and reaches the QCRB. The second best strategy Covariant Cramér-Rao Bound N=22 Figure 5: Plot showing Holevo's variance as a function of the number of probes N for our two-step adaptive scheme. The optimal strategy (orange rhomboid markers), in which we update the centers of the CIs, tends to the QCRB as the number of probes increases. This result is lowered bounded by the dashed black line, which is the QCRB corrected using Eq. (67) for a confidence level of 100%. We also show the alternative strategy for which the centers of the CIs are not updated (yellow pentagon markers). This result is lowered bounded by the solid black line, whose formula is given by Eq. (67).
In this case, we have chosen the initial condition a · n = 0.5. Here the vertical blue line at N = 22 marks the point separating the covariant inference performance from the AQSE inference to our twostep scheme. Finally, the horizontal dashed red line corresponds to the Cramér-Rao bound when only covariant measurements are used instead.
is the restricted ASQE, but this is unrealistic as it assumes that one knows beforehand an interval where the MLE is regular and includes the parameter. Our method, which we emphasize is the more realistic both mathematically and experimentally, is the third best strategy, as it reduces the error of the estimates improving the performance of AQSE and approximating the estimate to the QCRB.  Figure 6: Holevo variance vs the number of probes for different strategies of estimation. In order to calculate the Holevo variance for the schemes based on the maximum likelihood estimator. A sequence of 10000 measurements were simulated in each point. The curve for the entangled measurement was analytically calculated. The curve for the proposed scheme shows the performance with a 99%confidence intervals. The restricted AQSE method assume that the parametric space is an interval of length π that includes the real value of θ. The best strategy is the entangled measurement, followed by the restricted AQSE. These two strategies are unrealistic to implement. The third best strategy is our proposal. The y-axis is in log scale.

Summary and Conclusions
In this work we have thoroughly examined several strategies for quantum phase estimation. Their common denominator relies on maximizing likelihood density functions which are generally non-identifiable and/or non-optimal. This implies that the QCRB cannot be attained. We have developed a two-step strategy that circumvents this problem. Our method relies first on covariant measurements to identify a confidence interval within which the actual parameter is most likely to be, and then to apply an adaptive technique restricted to that interval. When compared with the current existing methods, ours is mathematically more robust, as it reaches the QCRB for any initial condition, and experimentally is more realistic, since neither an entangled measurement nor a priori information of the parameter's value are needed. Finally, we believe that our scheme can be generalized to any system so long as one can use a set of measurements to construct confidence intervals. However, based on the results presented here and the current body of work, a number of questions remain open. First of all, it is not clear to us whether there exists a better strategy for which the subset of covariant measurements is intertwined with the subset of adaptive ones so as to identify the CI much faster. Secondly, the generalization to this work to multiparameter estimation does not seem straightforward, since although it is indeed possible to construct the covariant POVMs [16] for this case, little is known about non-identifiability for multiparameter likelihood functions. These, and other issues, will be addressed in forthcoming works.

Acknowledgments
We thank Laboratorio Universitario de Cómputo de Alto Rendimiento (LUCAR) of IIMAS-UNAM for their service on information processing. This research was supported by the Grant No. UNAM-DGAPA-PAPIIT IG100518 and IG101421, and Doctoral scholarship CONACYT 334231. I.P.C. acknowledges funding support from the London Mathematical Laboratory where he is an External Fellow.

A Code implementation
Our numerical results have been implemented using R language. An R library with the various methods discussed here can be publicly found in the repository [32]. To reproduce the numerical points for covariant strategy, AQSE and restricted-AQSE presented in Fig.  6, use the function: Hvar_Scheme(theta_real, par_space, n_boost, num_prob, n,a, strategy), set the value of θ, theta_real from 0 to 2π, set par_space as the vector c(0, 2*pi), set the size of bootstrap n_boost=1000, vary the desired number of probes num_prob, set the initial unitary vectors for the probe and the axis of rotation n a, and vary the index strategy from 1 to 3. The index 1 represents the estimation strategy for independent covariant POVMs, 2 represents the AQSE strategy, 3 represents the restricted-AQSE strategy. To reproduce the points for our proposed estimation scheme use the function ECI_Hvar(theta_real, par_space, n_boost, num_prob, n,a, C_lev, Margin_Err), where you have to set the confidence level C_lev and marginal error Margin_Err to the desired values. Finally, to calculate the curve for the entangled estimation strategy use the function Ent_Hvar(theta_real, par_space, n,a, num_prob).