Quantum key distribution rates from semidefinite programming

Computing the key rate in quantum key distribution (QKD) protocols is a long-standing challenge. Analytical methods are limited to a handful of protocols with highly symmetric measurement bases. Numerical methods can handle arbitrary measurement bases, but either use the min-entropy, which gives a loose lower bound to the von Neumann entropy, or rely on cumbersome dedicated algorithms. Based on a recently discovered semidefinite programming (SDP) hierarchy converging to the conditional von Neumann entropy, used for computing the asymptotic key rates in the device independent case, we introduce an SDP hierarchy that converges to the asymptotic secret key rate in the case of characterised devices. The resulting algorithm is efficient, easy to implement and easy to use. We illustrate its performance by recovering known bounds on the key rate and extending high-dimensional QKD protocols to previously intractable cases. We also use it to reanalyse experimental data to demonstrate how higher key rates can be achieved when the full statistics are taken into account.


Introduction
In the long history of QKD (see [1][2][3][4] for reviews), secret key rates have mostly been calculated analytically. Consequently, we only know their values for protocols with highly symmetric measurement bases, such as BB84 [5], the six-state protocol [6], or their generalisations to mutually unbiased bases (MUBs) in higher dimensions [7,8]. In order to tackle arbitrary measurement bases, for a long time the only approach available was to use the min-entropy to lower bound the von Neumann entropy [9]. This trick has been applied successfully to device independent QKD [10,11] and QKD with characterised devices [12]. It leads to a simple SDP, at the cost of delivering suboptimal key rates. Numerical methods that go beyond the min-entropy approximation have been developed to tackle this issue [13,14]; unfortunately, none of them output optimal key rates. In Refs. [15,16] numerical methods that do provide optimal key rates are proposed. We'll compare them with our method in the Conclusion.
Recently, Brown et al. [17] presented a new approach for the accurate computation of asymptotic secret key rates in device independent QKD. It is based on a converging SDP hierarchy of lower bounds on the conditional von Neumann entropy. This hierarchy relies, in turn, on the NPA hierarchy [18] for characterising quantum correlations. As a consequence, the Brown et al. hierarchy has a very high complexity, being capable of handling only cases with few measurement settings.
In this work we adapt the SDP hierarchy proposed in Ref. [17] to the case of QKD with characterised devices, enabling likewise the computation of a sequence of lower bounds converging to the asymptotic key rate. In contrast with the device independent case, we don't need to use the NPA hierarchy, which makes the method much more efficient. Remarkably, the complexity of this new method is essentially independent of the number of measurement settings. Instead, the most relevant parameter is the quantum state dimension, and we can handle cases up to local dimension 8.
We illustrate the power of the technique on two families of protocols. First, for protocols that use d + 1 mutually unbiased bases (MUB protocols), we recover the analytical key rate in the known cases, and extend the protocol to use the full experimental data in any dimension (including dimensions that are not primes or prime powers, in which case we use d + 1 bases that are only approximately unbiased). Second, we propose and analyse a new protocol that is tailored to situations where one can produce high-dimensional entanglement, but can only measure superpositions of consecutive basis elements.
In order to handle real experimental data, we show how to modify the algorithm to minimise the conditional entropy over a confidence region in parameter space, and propose a new Monte Carlo method to compute such a confidence region via Bayesian parameter estimation. We illustrate the method by reanalysing data from two experiments.
The paper is organised as follows. In Section 2 we recap necessary results from Ref. [17]. In Section 3 we present and prove the main result, the SDP hierarchy for bounding QKD rates. In Section 4 we apply it to recover known bounds on the key rate and compute new ones. In Section 5 we show how to modify the SDP to handle experimental data, and use that to bound key rates from two previous experiments.

Variational lower bounds for the conditional entropy
In this section, we review necessary results from Ref. [17] and establish the notation to be used from now on.
In quantum key distribution, the asymptotic key rate is lower bounded by the Devetak-Winter rate [20] K = H(A|E) − H(A|B), where H(·|·) is the quantum conditional entropy. H(A|B) only depends on the statistics measured by Alice and Bob in the key basis, and as such is straightforward to compute. The interesting problem is H(A|E), as we have to minimise it over all states ρ_ABE shared by Alice, Bob, and Eve compatible with the statistics measured by Alice and Bob. To do that, we express the conditional entropy in terms of the (unnormalised) relative entropy: H(A|E) = D(ρ_{ÃE} || 1 ⊗ ρ_E), where D(ρ||σ) = tr(ρ(log_2 ρ − log_2 σ)), ρ_{ÃE} is the classical-quantum state given by ρ_{ÃE} = Σ_a |a⟩⟨a| ⊗ tr_AB[(A^a_0 ⊗ 1)ρ_ABE], and {A^a_0}_{a=0}^{n−1} is the basis used by Alice to generate the secret key.
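As noted above, H(A|B) is straightforward to compute from the key-basis statistics. A minimal sketch in Python (the helper and the example distributions are ours, purely for illustration):

```python
import numpy as np

def cond_entropy(p_ab):
    """H(A|B) = H(AB) - H(B) for a joint outcome distribution p(a, b)."""
    p_ab = np.asarray(p_ab, dtype=float)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))  # Shannon entropy
    return h(p_ab) - h(p_ab.sum(axis=0))                 # marginalising over a gives p(b)

# perfectly correlated outcomes: Bob's outcome determines Alice's, so H(A|B) = 0
print(cond_entropy(np.eye(4) / 4))
# independent uniform outcomes: H(A|B) = log2(4) = 2
print(cond_entropy(np.ones((4, 4)) / 16))
```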
As shown in Ref. [17], one can get a convergent sequence of SDPs for the relative entropy by using a Gauss-Radau quadrature [21]. The Gauss-Radau quadrature is defined for every positive integer m as a vector of m nodes t and a vector of m weights w such that ∫_0^1 g(t) dt = Σ_{i=1}^m w_i g(t_i) for all polynomials g of degree at most 2m − 2. In addition we have that w_i > 0, Σ_i w_i = 1, w_m = 1/m², t_i ∈ (0, 1], and t_m = 1. A simple algorithm for computing w and t is described in Ref. [21]. As shown in Ref. [17], applying the Gauss-Radau quadrature to an integral representation of the logarithm yields a variational upper bound for the quantum relative entropy, valid when supp(ρ) ∩ ker(σ) = {0}. If we then define the variables accordingly, the objectives and equality constraints of (6) and (7) match, modulo a complex conjugation of the Z^a_i that we can freely perform, since they are unconstrained. It remains to show that the existence of a state ρ_ABE and complex matrices Z^a_i is equivalent to the existence of positive semidefinite matrices Γ^1_{a,i} and Γ^2_{a,i} related via Equations (10). To see the forward direction, first note that the map Ξ is completely positive: its Choi matrix is manifestly positive semidefinite. Define then the matrices γ^g_{a,i} with s^1_0 = s^2_0 = 1 and s^1_1 = (s^2_1)† = Z^a_i. The matrices γ^g_{a,i} are positive semidefinite, and so are Γ^1_{a,i} and Γ^2_{a,i}, as Γ^g_{a,i} = (1 ⊗ Ξ)(γ^g_{a,i}) (16) and Ξ is completely positive. This completes the proof of the forward direction.

To see the converse direction, consider the eigendecomposition of σ. In the following, we assume that Eve's Hilbert space is H_E := span{|i⟩}_{i=0}^{2d²−1}, and regard the state shared by Alice, Bob and Eve as ρ_ABE = |ψ⟩⟨ψ|. Eve's reduced density matrix therefore has support on the space spanned by just the first d² basis states of H_E. The "extra space" given by span{|i⟩}_{i=d²}^{2d²−1} will be needed at the end of the proof. Now, define the matrix W. With the state ρ_ABE defined above, the identity Ξ(M) = W M W† can be seen to hold.
Define then the matrices of Equations (18) and (19). Here W⁺ denotes the pseudo-inverse of W, and 1 the projector onto the support of σ.
The matrices on the left-hand side of Equations (18) and (19) are positive semidefinite. We are now ready to build Eve's operators Z^a_i ∈ B(H_E), where √· denotes the unique positive semidefinite square root. In the expression above, the block decomposition of the matrix reflects the division of Eve's Hilbert space H_E into the orthogonal subspaces span{|i⟩}_{i=0}^{d²−1} and span{|i⟩}_{i=d²}^{2d²−1}. With this definition, it is easy to see that Equations (10) are satisfied. This completes the proof of the converse direction.
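For reference, the Gauss-Radau nodes and weights used above can be computed with the standard modified-Jacobi-matrix construction. This is one way to implement the simple algorithm of Ref. [21]; the sketch below is ours, not the authors' code:

```python
import numpy as np

def gauss_radau(m):
    """Nodes t and weights w of the Gauss-Radau quadrature on [0, 1] with
    fixed node t_m = 1, via a modified Jacobi matrix (a sketch)."""
    # three-term recurrence coefficients for shifted Legendre polynomials on [0, 1]
    alpha = np.full(m, 0.5)
    beta = np.array([k**2 / (4.0 * (4.0 * k**2 - 1.0)) for k in range(1, m)])
    J = np.diag(alpha) + np.diag(np.sqrt(beta), 1) + np.diag(np.sqrt(beta), -1)
    if m > 1:
        # modify the last diagonal entry so that 1 becomes an eigenvalue (node)
        Jm = J[:m - 1, :m - 1]
        e = np.zeros(m - 1); e[-1] = 1.0
        delta = np.linalg.solve(Jm - np.eye(m - 1), e)
        J[m - 1, m - 1] = 1.0 + beta[-1] * delta[-1]
    else:
        J[0, 0] = 1.0
    t, V = np.linalg.eigh(J)
    w = V[0, :]**2   # total mass of the uniform weight on [0, 1] is 1
    return t, w
```

One can check directly that the output satisfies the properties listed in the text: the weights are positive and sum to one, t_m = 1, w_m = 1/m², and polynomials of degree up to 2m − 2 are integrated exactly.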
Unlike the case for device independent QKD, for fixed m this is not a hierarchy, but a single SDP. We thus have a converging hierarchy for the conditional entropy depending on a single parameter m.
In order to obtain a reliable solution for an SDP numerically, it is crucial that it satisfies strict feasibility, that is, that there exists a feasible solution for which all eigenvalues in the positive semidefiniteness constraints are strictly positive. For SDP (7) this is the case iff there exists a full rank state σ compatible with the measurement results. If this is not the case then one must reformulate the SDP with facial reduction to make it strictly feasible [23]. This is of little relevance in practice, as in any real experiment the state will be full rank. For completeness, however, we give a proof of this characterisation of strict feasibility and show how to perform facial reduction in Appendix A.
The SDP (7) takes a long time to run in higher dimensions, so it is important to look for techniques to optimise it. We have explored the use of symmetrisation [24]: if the objective of an SDP is invariant under some transformation applied to its variables and this transformation doesn't affect whether the constraints are satisfied, then this transformation is a symmetry of the SDP and can be used to eliminate redundant variables, which can dramatically improve the running time.
A simple and powerful symmetrisation applies when A^a_0 and E_k are real matrices for all a, k: in this case we can choose all our variables to be real, which has a dramatic impact on performance, mainly due to the poor support current solvers have for complex variables. Another useful symmetrisation applies when the E_k satisfy a certain permutation symmetry, which allows us to reduce the number of variables ζ^a_i, η^a_i, θ^a_i by at most a factor of d, which again can dramatically improve performance. This is proven and explored in detail in Appendix B. The focus of our paper, however, is using SDP (7) to calculate key rates using the full experimental data available, in which case the symmetrisation seldom applies.

Numerical results
In order to illustrate the performance of our technique, we ran SDP (7) for various QKD protocols. We solved the SDP with MOSEK [27] via YALMIP [28] on MATLAB. In order to model the effect of noise, we assume that Alice and Bob share an isotropic state ρ_v = v|φ⁺⟩⟨φ⁺| + (1 − v)1/d², where |φ⁺⟩ = (1/√d) Σ_i |ii⟩ and v ∈ [0, 1] is the visibility.

MUB protocol
The first protocol we consider is the d + 1 MUBs protocol from Ref. [8], where Alice and Bob measure a complete set of MUBs for some prime d, with Bob's MUBs being the transpose of Alice's. They estimate the probabilities of their outcomes being equal or differing by some integer, and use only this data to compute the key rate. Both these limitations, that the dimension must be prime and that the full data is not used, came from the proof technique and do not apply here. Therefore we extend the protocol to use all available data, and furthermore allow the dimension d to be any integer. When d is a prime power one can use the well-known Wootters-Fields construction [29] to generate the MUBs. For other dimensions it is widely believed that no set of d + 1 MUBs exists [30], so we numerically generate bases that are only approximately unbiased via a simple gradient descent: a set of n approximate MUBs is obtained by minimising the measure of unbiasedness from Ref. [31] over a collection of unitary matrices {U^(r)}_{r=1}^n. Note that its value is zero for a set of MUBs. The key basis is the computational basis, that is, A^a_0 = |a⟩⟨a|, and the POVM elements used in the SDP are built from the projectors Π^i_k onto the ith vector of the kth MUB, with k going from 0 to d and i, j going from 0 to d − 1.
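For the prime case the construction is short enough to sketch. The code below is our own illustration for odd primes (d = 2 needs a small modification); it builds the d + 1 bases and lets one verify mutual unbiasedness numerically:

```python
import numpy as np

def mubs(d):
    """d+1 mutually unbiased bases for an odd prime d: the computational basis
    plus the d bases with vectors |psi_i^k> = d^{-1/2} sum_a w^(k a^2 + i a) |a>,
    where w = exp(2 pi i / d).  Columns of each array are the basis vectors.
    (A standard construction, equivalent to Wootters-Fields for odd primes.)"""
    w = np.exp(2j * np.pi / d)
    a = np.arange(d)
    bases = [np.eye(d, dtype=complex)]
    for k in range(d):
        # reduce exponents mod d to keep the powers numerically tame
        B = np.array([w**((k * a * a + i * a) % d) for i in range(d)]).T / np.sqrt(d)
        bases.append(B)
    return bases
```

For any two distinct bases in the output, all squared overlaps equal 1/d, which is exactly the quantity the gradient descent above drives towards for non-prime-power dimensions.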
In order to obtain an analytical formula to compare against our numerical results, note that when we actually have d + 1 MUBs, that is, when d is a prime or a prime power, these measurements are tomographically complete for the isotropic state [32], so the key rate must be equal to the tomographic rate. That is the rate one obtains when Alice and Bob make a tomographically complete set of measurements, and as such it is the best possible rate for a given quantum state. When our d + 1 bases are not exactly mutually unbiased this no longer holds, so for other dimensions the tomographic rate is only an upper bound on the key rate.
For the isotropic state the tomographic rate K_iso is given in Equation (24) [33,34], where h(·) is the binary entropy. Note that Equation (24) coincides with the key rate computed in Ref. [8] for the isotropic state when d is prime. We did the calculation for d = 2, 4, 6, 8, setting m = 8. As we can see in Figure 1, our technique closely matches the exact key rate for d = 2, 4, 8, whereas for d = 6 it is slightly below the upper bound. This indicates that the problem of the existence of MUBs, although mathematically fascinating, is irrelevant for the practical implementation of this QKD protocol.

Subspaces protocol
When the experimental setup suffers from a high amount of noise, it is advantageous to do a filtration step before running the QKD protocol. That's the idea behind the subspace protocol proposed in Ref. [12]. In it the Hilbert space of dimension d is partitioned into d/k subspaces of dimension k. Alice and Bob first check whether they obtained an outcome belonging to the same subspace of the Hilbert space; if they haven't, they discard the round. Otherwise, they proceed with the protocol, with the state conditioned on belonging to the subspace they are in, which is in general less noisy than the state they started with. The main weakness of the original security analysis is that it approximates the von Neumann entropy by the min-entropy, which leads to a loose lower bound on the secret key rate. Here we can fix this problem. We further note that the subspace technique is very flexible with respect to which protocol is used in the subspaces. Here we use the d + 1 MUBs protocol from Section 4.1.
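The filtration step can be checked directly on the isotropic noise model: projecting onto a subspace and renormalising yields again an isotropic state, with boosted visibility v′ = v/(v + (1 − v)k/d). This closed form is our own small calculation, used here only as a sanity check:

```python
import numpy as np

def iso(d, v):
    """Isotropic state of local dimension d with visibility v."""
    phi = np.zeros(d * d)
    phi[::d + 1] = 1 / np.sqrt(d)          # |phi+> = sum_i |ii> / sqrt(d)
    return v * np.outer(phi, phi) + (1 - v) * np.eye(d * d) / d**2

d, k, v = 8, 2, 0.7
P = np.zeros((d, d))
P[:k, :k] = np.eye(k)                      # projector onto the first subspace
filtered = np.kron(P, P) @ iso(d, v) @ np.kron(P, P)
p_same = np.trace(filtered)                # probability of passing the filter
idx = [i * d + j for i in range(k) for j in range(k)]
conditional = filtered[np.ix_(idx, idx)] / p_same
v_new = v / (v + (1 - v) * k / d)          # boosted visibility (our formula)
```

With these parameters the conditional state coincides with iso(k, v_new), and v_new > v, confirming that the filtered state is less noisy.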
For the isotropic state, and when k is a prime or a prime power, the key rate is given by p K_iso, where p is the probability that Alice and Bob obtain outcomes in the same subspace, and K_iso is given by Equation (24), evaluated for the conditional state inside the subspace.
We ran SDP (7) for d = 8 and k = 2, 4, 8. Note that the MUB protocol is run in each subspace, so A^a_0 and E_k are simply those of the MUB protocol, but with the size of the subspace k playing the role of the dimension d. The measurement that is implemented physically is the direct sum of the measurements in the subspaces, but that does not appear explicitly in the key rate calculation. The results are shown in Figure 2. One can see that the numerical results closely match the analytical formula. For the sake of comparison we also show the much inferior key rates that are obtained when using the min-entropy technique.

Overlapping bases protocol
For some experimental setups it is not feasible to measure high-dimensional MUBs, so we consider a simpler protocol where only superpositions of nearest neighbours have to be measured. Alice and Bob use the computational basis as the key basis, A^a_0 = |a⟩⟨a|, and measure POVM elements built from the vectors v^i_k, which are superpositions of the neighbouring basis vectors |i⟩ and |i + 1⟩; here d must be even, k goes from 0 to 4, and i, j go from 0 to d − 1. Note that here we are not measuring only the probabilities that the outcomes are equal or differ by some constant; instead we are taking into account the full experimental data. Several of the projectors we are measuring are linearly dependent, however, so one should be careful to select a linearly independent subset for the data analysis.
This protocol is especially appropriate for energy-time entanglement setups, in which the times of arrival of photons are collected into time bins. In such setups it is infeasible to measure MUBs, but the superposition of neighbouring bins is accessible with the use of a Franson interferometer (see Refs. [35][36][37] for experiments using such setups).
We ran the SDP (7) for d = 4, 6, 8. The results are shown in Figure 3. We compare them with key rates obtained with the subspace protocol from Section 4.2, as that protocol also needs only superpositions between nearest neighbours. We see that for low to moderate amounts of noise the overlap protocol outperforms it.

Dealing with experimental data
In order to calculate the conditional entropy with SDP (7), we need the exact probabilities of each measurement outcome. Those are not available in real experiments, as it's fundamentally impossible to measure probabilities, even in the absence of errors. Instead, one measures relative frequencies, from which one deduces that the probabilities lie within some region with some level of confidence. Note that a naïve identification of relative frequencies with probabilities typically leads to probabilities that are not compatible with any quantum state (a problem commonly referred to as a quantum state with negative eigenvalues). Computing instead the confidence region compatible with the relative frequencies automatically deals with this issue. Accordingly, we need to modify SDP (7) to not use exact probabilities, but instead minimise the conditional entropy over the confidence region.
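A toy illustration of the problem (our own, with made-up numbers): treating plausible qubit frequencies as exact probabilities and inverting them linearly can produce a "state" with a negative eigenvalue:

```python
import numpy as np

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# relative frequencies giving Bloch components (0.8, 0.8, 0.8); each one is a
# perfectly plausible outcome of a finite-sample X, Y or Z measurement,
# but together they violate |r| <= 1
r = np.array([0.8, 0.8, 0.8])
rho = (np.eye(2) + r[0] * X + r[1] * Y + r[2] * Z) / 2  # linear inversion
min_eig = np.linalg.eigvalsh(rho).min()                 # negative: not a state
```

Here min_eig = (1 − 0.8√3)/2 < 0, so no density matrix reproduces these frequencies exactly, even though each of them individually looks innocuous.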
This leads us to the thorny issue of how to compute the confidence region in the first place. The method we choose is Bayesian parameter estimation [39][40][41], as it naturally provides a confidence region in the form of the high-density posterior. An exact computation is only feasible for small amounts of data in small dimensions [42,43], and provides a confidence region with a very complex description, so we have to use approximate methods. The only method we found in the literature is based on using particle filters [44,45]. Its complexity scales exponentially with the dimension and is therefore unsuitable for our purposes. We propose thus a new method.
The method we use is based on approximating the likelihood function by a Gaussian. This will be a good approximation whenever the trials are independent and the amount of data is large. It results in a confidence region consisting of the intersection of an ellipsoid with the set of probabilities that can be produced by quantum states. The crucial advantage of this confidence region is that it can be described by semidefinite constraints, and as such be incorporated in our SDP. More formally, the confidence region is given by the set of p such that p = tr(Eσ) for some quantum state σ and ⟨f − p, Σ⁻¹(f − p)⟩ ≤ χ², where f are the measured frequencies, p are the probabilities, Σ is the covariance matrix of the Gaussian, χ is a parameter that determines the size of the ellipsoid, and E is a vector whose components are the POVM elements. The algorithm to determine Σ and χ is described in Appendix C.
The modified SDP is then given by

Examples
As an example, we calculated the key rate for the overlap protocol with the data obtained in an experiment reported in Refs. [37,46]. In this experiment time-bin entanglement is produced and distributed over a 10.2 km free-space channel over Vienna. The discretisation used here for the key rate calculation was used in Ref. [46] to certify genuinely high-dimensional entanglement. We used the data for d = 4, and obtained a key rate of 0.4038. The measurements used were a linearly independent subset of the real projectors E_{0,i,j}, E_{1,i,j}, E_{3,i,j}. For comparison, we also computed the key rate with the subspace protocol with d = 4 and k = 2, also using only the real projectors, and obtained 0.3868.
As another example, we calculated the key rate for the MUB protocol with the data obtained by measuring the transverse position-momentum degree of freedom of photons using tailored macropixel bases, for dimension d = 3. This data was originally presented in Ref. [47] in the context of entanglement certification. Using the full data we obtained a key rate of 1.3310. For comparison, we also computed the key rate with the original MUB protocol from Ref. [8], which doesn't use the full data, and got 1.3553.
Surprisingly, the key rate was lower when using the full data. This is because in both cases we are using a flat prior over the parameter space, but the parameter space is much larger when using the full data, which dilutes the effect of Bayesian conditioning. The number of counts in the data set was roughly 600 per setting, which is too small to overcome this effect. If we artificially multiply the number of counts by 10 (which doesn't change the relative frequencies), we get the expected result: key rate 1.4303 with the full data, and 1.4021 with the original protocol. This shows that when using the full data we need to make sure we have enough counts to have a good estimate of all the parameters.

Conclusion
We have developed an SDP hierarchy for calculating asymptotic key rates in quantum key distribution with characterised devices. The algorithm is efficient, easy to implement, and straightforward to use for different protocols. We have also shown how to adapt it to minimise the key rate over the confidence region compatible with the measured statistics, which is necessary to handle real experimental data. Our numerical results show that it closely recovers the known analytical key rates.
We would like to highlight the main advantages of our approach over that of Refs. [15,16]. First, our method formulates the problem as an SDP, and thus can use standard solvers, while in Refs. [15,16] the problem of calculating the key rate is cast as a general convex optimisation problem, which requires custom algorithms. Also, we believe our method is easier to set up and modify for an arbitrary protocol: the user only needs to formulate the measurements used for key generation and for estimating the conditional entropy. The methods of Refs. [15,16], on the other hand, require a deeper understanding of the underlying machinery, including the definitions of certain linear maps needed to properly define the cost function of each considered protocol.
The fact that our method casts the problem as an SDP allows one to obtain an analytical lower bound on the key rate, if desired. One solves the dual problem, approximates the numerical solution by rational numbers, and perturbs the rational solution to make it satisfy the SDP constraints exactly. This is done for example in Appendix C of [48]. In this way one gets around the finite tolerances of numerical SDP solvers and possible floating-point errors.
Directions for future research include adapting the method to protocols that use different security assumptions, developed to overcome limitations of standard QKD. Examples are measurement-device-independent QKD [49], twin-field QKD [50], and decoy-state protocols [51].
Another important step is to compute the rate for finite keys, which is necessary for real implementations. This can be done using the Entropy Accumulation Theorem [52] as done in [53].

Code availability
All the code and data necessary to reproduce the results of this paper are available at https://github.com/araujoms/qkd.

A Strict feasibility and facial reduction

Proof. To see the if direction, assume that such a state exists. Then let ζ^a_i = 0 and η^a_i = θ^a_i = 1. The eigenvalues of Γ^1_{a,i} and Γ^2_{a,i} are then the eigenvalues of σ, which are by assumption strictly positive, together with the eigenvalues of 1.
To see the converse direction, let |v⟩ be an eigenvector of σ with eigenvalue 0. Since Γ^1_{a,i} ≥ 0, it can be written as G†G for some matrix G. Since ⟨0|⟨v|G†G|0⟩|v⟩ = ‖G|0⟩|v⟩‖², we have that G|0⟩|v⟩ = 0, and therefore Γ^1_{a,i}|0⟩|v⟩ = 0.
In order to find out whether a full rank σ compatible with the measurement results exists, it is enough to maximise the minimum eigenvalue of σ, which is a simple SDP. Note that this SDP is always strictly feasible (provided a σ compatible with the measurement results exists in the first place), because one can set λ = −1, and then all eigenvalues will be at least 1.
Suppose that σ cannot be made full rank. Note that because of the constraints Γ^1_{a,i} ≥ 0 and Γ^2_{a,i} ≥ 0 the matrices ζ^a_i must have the same support as σ. Furthermore, the matrices θ^a_i and η^a_i can be chosen to have the same support, as non-zero elements outside it can never decrease the value of the objective. Therefore, to make the SDP strictly feasible we can simply reformulate it explicitly inside the support of σ. Let then {|v_i⟩}_{i=0}^{k−1} be an orthonormal basis for the support of σ, and V = (|v_0⟩, . . . , |v_{k−1}⟩) an isometry. We have that σ = V ωV† for a k × k quantum state ω, and similarly for the variables ζ^a_i, η^a_i, θ^a_i. Writing then SDP (7) in terms of the new variables, we obtain SDP (31).

A.1 Example
Suppose Alice and Bob share a state of local dimension 2, and are estimating the probability of getting equal outcomes in the Z and X bases. Suppose then that the probabilities they estimate are f_0 = 1 and f_1 = x, for some x ∈ (0, 1). Then necessarily we have that σ|01⟩ = σ|10⟩ = 0, so σ is supported on the 2-dimensional subspace spanned by |φ⁺⟩ and |φ⁻⟩. The isometry is then V = (|φ⁺⟩, |φ⁻⟩), and with it we can solve the reduced SDP (31). It turns out that in this case facial reduction made little practical difference: the numerical solver could also handle the problem without the reduction; the reduced problem was simply much faster. Suppose now that x = 1, so the only possible solution is σ = |φ⁺⟩⟨φ⁺|, and the isometry is V = |φ⁺⟩. In this case the solver cannot handle the original problem: it suffers from numerical instabilities and returns an incorrect answer. The reduced problem was solved correctly without difficulties.

B Symmetrisation
In this appendix we explore in more detail the symmetrisation techniques mentioned in Section 3.

B.1 Complex conjugation
The simplest symmetrisation applies when A^a_0 and E_k are real matrices for all a, k. Then complex conjugation of all variables leaves the objective invariant, and maps satisfied constraints to satisfied constraints, as tr(σ*) = tr(σ), tr(E_k σ*) = tr(E^T_k σ), and Γ^g_{a,i}* ≥ 0 iff Γ^g_{a,i} ≥ 0. This implies that whenever σ, {ζ^a_i, η^a_i, θ^a_i}_{a,i} is a feasible solution to the SDP, σ*, {ζ^a_i*, η^a_i*, θ^a_i*}_{a,i} will also be a feasible solution with the same value, and the same will be true for any convex combination of the two. Using then the equally weighted convex combination, we see that we can choose all variables to be real.
Even if the E_k are not real, this symmetrisation also applies if for every k there exists a k′ such that E^T_k = E_{k′} and f_k = f_{k′}. This holds, for example, for the MUB protocol from Section 4.1 with the probabilities of the isotropic state. It will certainly not hold for real experimental data (or even simulated data), so we refrained from using it in our calculations.

B.2 Permutation
Assume now that A^a_0 = |a⟩⟨a|, that is, we are using the computational basis as the key basis. Then for any permutation π and any unitary V the objective will be invariant under the mapping (36), where U_π is the permutation matrix such that U_π|a⟩ = |π(a)⟩ for all a. This mapping also doesn't affect the satisfiability of the constraints of the SDP, except for the constraint tr(E_k σ) = f_k, which is mapped onto tr(E_k (U_π† ⊗ V†) σ (U_π ⊗ V)) = f_k. Therefore, the mapping (36) is a symmetry of the SDP if for every k the transformed operator (U_π ⊗ V) E_k (U_π† ⊗ V†) is again some E_{k′} with f_{k′} = f_k.

C Computing the confidence region

The posterior distribution of the parameters is given by Bayes' rule, p(θ|D) ∝ p(D|θ)p(θ). The resulting estimate for the parameters is the expected value θ̂ := ∫ θ p(θ|D) dθ, and the 1 − α credible region (the Bayesian version of a confidence region) is defined as the high density posterior, that is, the set of θ such that p(θ|D) ≥ γ, where γ is defined as the supremum of γ′ such that ∫_{p(θ|D)≥γ′} p(θ|D) dθ ≥ 1 − α. The required integrals are in general very difficult to compute. We propose here to compute them by approximating the likelihood function p(D|θ) by a Gaussian, restricting the prior p(θ) to be either flat or Gaussian, and then applying Monte Carlo methods. The reason for doing so is that the product of two Gaussians is again a Gaussian, and so the posterior distribution will be the intersection of a Gaussian with the space of valid θ, call it Θ (in our case the set of probabilities that can be produced by a quantum state). This makes the credible region easy to describe: if the Gaussian of the posterior has mean µ and covariance matrix Σ, then the credible region will be the set of θ such that θ ∈ Θ and ⟨θ − µ, Σ⁻¹(θ − µ)⟩ ≤ χ² (45) for some χ. The main advantage of doing so is that inequality (45) is an SDP constraint, so if the condition θ ∈ Θ also is, we can bound quantities of interest by maximising or minimising them over the credible region.
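The Gaussian-times-Gaussian step can be made explicit: for a Gaussian prior and a Gaussian likelihood, the posterior is Gaussian with precision matrices adding (a standard identity; the helper below is our illustration, not the paper's code):

```python
import numpy as np

def combine(mu_prior, cov_prior, mu_lik, cov_lik):
    """Mean and covariance of the Gaussian posterior obtained from a Gaussian
    prior and a Gaussian likelihood: precisions (inverse covariances) add."""
    p1, p2 = np.linalg.inv(cov_prior), np.linalg.inv(cov_lik)
    cov = np.linalg.inv(p1 + p2)
    mu = cov @ (p1 @ mu_prior + p2 @ mu_lik)
    return mu, cov
```

A flat prior corresponds to the limit of an infinitely wide cov_prior, in which case the posterior Gaussian is simply the likelihood Gaussian restricted to Θ.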
To compute the credible region, we start with an initial guess χ_0 = Q⁻¹(κ/2, 0, 1 − α), where Q is the regularised incomplete gamma function and κ is the number of parameters we are estimating. We then compute integral (43) via Monte Carlo: we sample θ from p(θ|D) n times and count how many times k it respects inequality (45). Our estimate of the integral is then k/n. We then increase or decrease χ via binary search until our estimate is close enough to 1 − α.
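The procedure can be sketched as follows; here rejection sampling is assumed to be viable, and `in_theta` is a hypothetical membership test for Θ (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_chi(mu, cov, in_theta, alpha, n=20000):
    """Monte Carlo binary search for chi: find the ellipsoid radius whose
    posterior mass, over samples restricted to Theta, is close to 1-alpha."""
    cov_inv = np.linalg.inv(cov)
    samples = []
    while len(samples) < n:                 # rejection sampling from p(theta|D)
        batch = rng.multivariate_normal(mu, cov, size=n)
        samples.extend(t for t in batch if in_theta(t))
    samples = np.array(samples[:n])
    d2 = np.einsum('ij,jk,ik->i', samples - mu, cov_inv, samples - mu)
    lo, hi = 0.0, np.sqrt(d2.max()) + 1.0
    for _ in range(60):                     # binary search on the radius chi
        chi = (lo + hi) / 2
        if np.mean(d2 <= chi**2) < 1 - alpha:
            lo = chi                        # too little mass: grow the ellipsoid
        else:
            hi = chi
    return chi
```

As a sanity check, when Θ is the whole space the result must reproduce the usual χ² quantile of the Gaussian.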
To sample θ from p(θ|D), there are two methods: the easiest one is rejection sampling: sample θ from G(θ), which is very easy to do, and check whether θ ∈ Θ. If yes, accept the sample, otherwise reject. It might be the case that the probability of getting a sample θ ∈ Θ is too low for this method to be viable, usually when µ ∈ Θ. In this case we need to sample θ with the Metropolis algorithm.
Up to this point, the discussion of Bayesian parameter estimation has been quite generic. In order to apply it to our problem we need to specify the likelihood function and the Gaussian that approximates it.
If one assumes the random variables are independent and identically distributed (i.i.d.), the likelihood function is given simply by the multinomial distribution. Let's say one performed a set of n measurements with k_i outcomes each, obtaining counts N := {N^w_i}_{w,i}. We approximate the resulting multinomial likelihood by the Gaussian with multinomial mean and covariances.
The mean µ := {µ(w|i)} is given by µ(w|i) = N^w_i/N_i, where we omit the relative frequency of the last outcome of each measurement in order to avoid degeneracy, and N_i := Σ_{w=0}^{k_i−1} N^w_i. The covariance matrix Σ is block diagonal, with one block (diag(µ_i) − µ_i µ_i^T)/N_i per measurement, where µ_i := {µ(w|i)}_{w=0}^{k_i−2}. If any of the counts N^w_i is equal to zero, this leads to a singular covariance matrix, which cannot be used in the Gaussian. A simple trick to get around this is to change such counts to one for the calculation of the covariance matrix. This makes Σ no longer singular, and still provides a decent approximation to the multinomial distribution, provided the number of counts is high enough for the Gaussian to be a sensible approximation.
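Putting this together, a sketch in Python (our own; `counts` is one array of counts per measurement setting, and the zero-count fix is the one described above):

```python
import numpy as np

def gaussian_approximation(counts):
    """Mean and block-diagonal covariance of the Gaussian approximating the
    multinomial likelihood; the last outcome of each setting is dropped to
    avoid degeneracy, and zero counts are replaced by one in the covariance."""
    mus, blocks = [], []
    for N in counts:                      # one array of counts per setting
        N = np.asarray(N, dtype=float)
        mus.append(N[:-1] / N.sum())      # relative frequencies, last dropped
        Nc = np.where(N == 0, 1.0, N)     # zero-count fix for the covariance
        m = Nc[:-1] / Nc.sum()
        blocks.append((np.diag(m) - np.outer(m, m)) / Nc.sum())
    mu = np.concatenate(mus)
    Sigma = np.zeros((len(mu), len(mu)))
    i = 0
    for B in blocks:                      # assemble the block-diagonal Sigma
        k = B.shape[0]
        Sigma[i:i + k, i:i + k] = B
        i += k
    return mu, Sigma
```

For a single setting with counts (50, 30, 20), for instance, this returns µ = (0.5, 0.3) and the familiar multinomial covariance divided by the number of trials.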
Without the i.i.d. assumption, the covariance matrix is no longer a simple function of the mean, but needs to be estimated separately.