Neural Network Approach to the Simulation of Entangled States with One Bit of Communication

Bell's theorem states that Local Hidden Variables (LHVs) cannot fully explain the statistics of measurements on some entangled quantum states. It is natural to ask how much supplementary classical communication would be needed to simulate them. We study two long-standing open questions in this field with neural network simulations and other tools. First, we present evidence that all projective measurements on partially entangled pure two-qubit states require only one bit of communication. We quantify the statistical distance between the exact quantum behaviour and the behaviour produced by the trained network, or by a semianalytical model inspired by it. Second, while it is known on general grounds (and obvious) that one bit of communication cannot reproduce all bipartite quantum correlations, explicit examples have proved elusive. Our search failed to find one for several bipartite Bell scenarios with up to 5 inputs and 4 outputs, highlighting the power of one bit of communication in reproducing quantum correlations.


Introduction
Quantum Mechanics is famous for having randomness inherent in its predictions. Einstein, Podolsky and Rosen argued that this makes quantum mechanics incomplete, and suggested the existence of underlying Local Hidden Variables (LHV) [1]. While this view was disproved by Bell's theorem [2,3], it has nevertheless proved fruitful to approach quantum correlations, without committing to an ontology of the quantum world, by asking which resources one would use to simulate them. Though insufficient, LHV provide an intuitive starting point; the question then becomes: which additional resources, on top of the LHV, are needed to simulate quantum correlations? Some works have considered nonlocal boxes as supplementary resources [4][5][6]: while appealing for their intrinsic no-signaling feature, these hypothetical resources are as counterintuitive as entanglement itself, if not more. Classical communication, on the other hand, is a resource that we use on a daily basis and of which we have therefore developed an intuitive understanding. Because we are thinking in terms of simulations and not of ontology, we are not impaired by the very problematic fact that communication would have to be instantaneous if taken as the real underlying physical mechanism.
Peter Sidajaya: peter.sidajaya@u.nus.edu
Therefore, we are interested in the question of how much classical communication must supplement LHV to simulate the behaviour of a quantum state. For the maximally entangled state of two qubits, after some partial results [7][8][9], Toner and Bacon provided a definitive solution by describing a protocol that simulates the statistics of all projective measurements using only one bit of communication, a setting we refer to as LHV+1 [10]. Subsequently, Degorre and coworkers used a different approach and found another protocol which also requires only one bit of communication [11]. The case of non-maximally entangled pure states proved harder. By invoking the Toner-Bacon model, two bits of communication are certainly sufficient [10], and it was recently proved that two bits are also enough for POVM measurements [12]. Meanwhile, Brunner and coworkers proved that one PR-box is not enough for projective measurements [5]. But the simulation of those states in LHV+1 remained open. Only recently, Renner and Quintino reported an LHV+1 protocol that exactly simulates weakly entangled pure states [13]. Our neural network will provide evidence that projective measurements on all two-qubit states can be very closely approximated in LHV+1.
The LHV+1 problem could, in principle, be approached systematically, since the behaviours that can be obtained with those resources are contained in a polytope. However, the size of this polytope grows very quickly with the number of inputs and outputs: as of today, after some initial works [14,15], the largest LHV+1 polytope to be completely characterized has three measurements per party and binary outcomes, and no quantum violation is found [16]. Addressing the problem for higher-dimensional systems has also been challenging. Some results have been found for the average amount of communication [17][18][19]. As for the minimum amount, Brassard et al. showed that for n pairs of Bell states, the amount of communication necessary must grow as 2^n [7]. This clearly shows that one bit must fail to suffice at some point, but the question of which states can or cannot be simulated by it remains open. Finally, in the finite-output scenario, Vértesi and Bene showed that a pair of maximally entangled four-dimensional quantum systems cannot be simulated with only one bit of communication, by presenting a scenario involving an infinite number of measurements [20]. Here, we will try to answer whether there is a finite scenario where one bit of communication fails to simulate a quantum correlation.
In recent years, there have also been increasing attempts to study quantum correlations with machine learning. Many of them reveal the great potential neural networks have in tackling the complexities of detecting nonlocality and entanglement [21][22][23][24][25][26]. The choice of tackling the LHV+1 problem with machine learning is prompted by the fact that there is no compact parametrisation of LHVs, nor of the dependence of the bit of communication on the parameters of the problem. Thus, we are looking for a solution to a problem whose variables are themselves poorly specified. Moreover, similar to an LHV model, everything inside a neural network has definite values. Thus, it seems natural to devise a machine learning tool, specifically an artificial neural network (ANN), to act as an LHV model.
This work is separated into two sections. In Section 2, we study the simulability of the correlations of entangled states with classical resources and one bit of communication using a neural network. We also present a semianalytical protocol which approximates the behaviour of partially entangled two-qubit states with one bit of communication, and we study the errors of our protocol. In Section 3, we try to find a quantum behaviour in dimensions higher than two qubits that could not be simulated by a single bit of communication.
2 Simulating Two-qubit Entangled States using Machine Learning

Using Neural Networks to generate protocols
Inspired by the use of a neural network as an oracle of locality [24], we approached the problem using an artificial neural network. The network takes in measurement settings ⃗a and ⃗b as inputs and outputs an LHV+1 bit probability distribution, enforced by an architecture that imposes the suitable locality constraints, which we will discuss below. The output distribution is then compared against the target distribution using a suitable error function, such as the Kullback-Leibler divergence.
The Local Hidden Variables (LHV) are described by a random variable λ shared between both parties. For a finite communication model to work, λ needs to be a random variable of infinite length [17]. Besides that, λ can be of any form. The analytical model of Toner and Bacon uses a pair of uniformly distributed Bloch vectors, while that of Renner and Quintino [13] uses a biased distribution on the Bloch sphere. The neural network of [24] simulates a model where the LHV is a single real number, distributed normally or uniformly. In theory, the choice is ultimately redundant, because the different LHV models can be made equivalent by some transformation. In practice, however, the neural network will perform differently, since it can only process a certain amount of complexity in the model. From trial and error, we settled on Toner and Bacon's uniformly distributed vector pair as the LHV model in our neural network.
A probability distribution P(A, B) is local if it can be written as

P(A, B | ⃗a, ⃗b) = ∫ dλ ρ(λ) P_A(A | ⃗a, λ) P_B(B | ⃗b, λ).    (1)

The network approximates a local distribution by the Monte Carlo method as

P(A, B | ⃗a, ⃗b) ≈ (1/N) Σ_{i=1}^{N} P_A(A | ⃗a, λ_i) P_B(B | ⃗b, λ_i),    (2)

where N is a sufficiently large number (≥ 1000). In the network, Alice and Bob are represented as a series of hidden layers. Each of the parties takes in their inputs according to the locality constraint and outputs their own local probability distribution. The activation functions used in the hidden layers are standard, such as the rectified linear unit (ReLU) and the softmax function used to normalise the probabilities. The forward propagation is done N times using varying values of λ_i sampled from the chosen probability distribution. Thereafter we take the average of the probabilities over the N runs to get the probability distribution as expressed in equation (2).
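The Monte Carlo averaging step can be illustrated with a short sketch. This is plain numpy, not the actual trained network: `alice_prob` and `bob_prob` are hypothetical stand-ins for the trained subnetworks, each returning P(+1 | setting, λ).

```python
import numpy as np

def sample_lhv(n, rng):
    """Sample n LHV values, each a pair of uniformly distributed Bloch vectors."""
    v = rng.normal(size=(n, 2, 3))          # Gaussian samples -> uniform on the sphere
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def local_distribution(a, b, alice_prob, bob_prob, n=1000, rng=None):
    """Monte Carlo estimate of P(A,B|a,b) ~ (1/N) sum_i P_A(A|a,lam_i) P_B(B|b,lam_i)."""
    rng = rng or np.random.default_rng(0)
    lam = sample_lhv(n, rng)
    pa = np.array([alice_prob(a, l) for l in lam])   # P(A = +1 | a, lam_i)
    pb = np.array([bob_prob(b, l) for l in lam])     # P(B = +1 | b, lam_i)
    # average the product of the two local binary distributions over the LHV samples
    P = np.empty((2, 2))
    for A, qa in enumerate((pa, 1 - pa)):
        for B, qb in enumerate((pb, 1 - pb)):
            P[A, B] = np.mean(qa * qb)
    return P   # row: Alice's outcome (index 0 = +1), column: Bob's
```

With the placeholder responses replaced by trained subnetworks, this reproduces the averaging step described above.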
To move from LHV to LHV+1, we notice that sending one bit of communication is equivalent to giving Alice the power to choose between one out of two local strategies. The recipe looks as follows: • Alice and Bob pre-agree on two local strategies P_{L,1} and P_{L,2}, as well as on the λ to be used in each round. It seems to us that all previous works in LHV+1 assumed the same λ for both strategies, but of course there is no need to impose such a constraint.
• Upon receiving her input ⃗a, Alice decides which of the two strategies should be used for that round, taking also λ into account. Some of the previous LHV+1 models used a deterministic decision rule, but there is no reason to impose that: Alice's decision could be stochastic. She informs Bob of her choice with one bit of communication c, and Bob consequently keeps his outcome for the chosen strategy.
Thus, given a randomly sampled LHV λ_i, the LHV+1 model is described by

P(A, B | ⃗a, ⃗b, λ_i) = P(c = +1 | ⃗a, λ_i) P_{L,1}(A, B | ⃗a, ⃗b, λ_i) + P(c = −1 | ⃗a, λ_i) P_{L,2}(A, B | ⃗a, ⃗b, λ_i),    (3)

where we labeled c = +1 (respectively c = −1) the value of the bit of communication when Alice decides for strategy 1 (resp. 2). The complete model now consists of two local networks and one communication network. The communication network consists of a series of layers whose inputs are the same as Alice's; it outputs a number between 0 and 1 through a sigmoid activation function, representing P(c | ⃗a, λ_i), which is then used to make a convex mixture of the two local strategies for the particular inputs and LHV. The final network architecture can be seen in Fig. 1. This approach of using a neural network to generate local strategies was originally used in a network setting [24]. In that work, the network was used to verify nonlocality by looking for transitions in the behaviours of distributions when mixed with noise. When a state is mixed with noise, it lies within a local set up to a certain noise threshold; reducing the amount of noise in the state allows for the identification of sharp transitions in the network's error, indicating when the state exits the local set. Here, instead of such an oracle, we will use the network to generate a protocol to simulate the quantum state by analysing its outputs.
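The convex mixture performed for each LHV sample can be sketched as follows, with hypothetical placeholder callables standing in for the three subnetworks:

```python
import numpy as np

def lhv_plus_one_round(a, b, lam, comm_prob, strategy1, strategy2):
    """LHV+1 distribution for one LHV sample lam.

    comm_prob(a, lam)      -> P(c = +1 | a, lam), the communication network's output
    strategy1/2(a, b, lam) -> 2x2 local distribution P_{L,1/2}(A, B | a, b, lam)
    """
    p = comm_prob(a, lam)   # probability that Alice chooses strategy 1
    return p * strategy1(a, b, lam) + (1 - p) * strategy2(a, b, lam)
```

Averaging this quantity over many sampled λ, as in the previous subsection, gives the full LHV+1 behaviour.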

Simulating Two-qubit States
For a two-qubit scenario, the joint measurements can be defined by two vectors on the Bloch sphere, i.e. ⃗a, ⃗b ∈ S². Here, we are only considering projective measurements with binary outputs. Thus, the projectors are defined by

Π_{±|⃗a} = (1 ± ⃗a · ⃗σ)/2,

and similarly for Bob. The behaviour is the set

{P(A, B | ⃗a, ⃗b) : ⃗a, ⃗b ∈ S²},

i.e., the set of correlations for all possible measurement directions.
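For concreteness, these projectors and the resulting quantum distribution can be computed directly; a short numpy check, here instantiated for the singlet:

```python
import numpy as np

PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]])          # sigma_x, sigma_y, sigma_z

def projector(a, sign):
    """Pi_{sign|a} = (1 + sign * a.sigma) / 2 for a unit Bloch vector a."""
    return (np.eye(2) + sign * np.einsum('i,ijk->jk', a, PAULI)) / 2

def behaviour(rho, a, b):
    """P(A, B | a, b) for joint projective measurements along a and b."""
    P = np.empty((2, 2))
    for i, sa in enumerate((+1, -1)):
        for j, sb in enumerate((+1, -1)):
            M = np.kron(projector(a, sa), projector(b, sb))
            P[i, j] = np.real(np.trace(rho @ M))
    return P

# singlet |Psi-> = (|01> - |10>) / sqrt(2)
psi = np.array([0, 1, -1, 0]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
```

Scanning `behaviour(rho, a, b)` over measurement directions generates the target distributions the network is trained on.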

Maximally Entangled State
The maximally entangled state case has been solved analytically by Toner and Bacon [10]. Thus, we used this state as a test bed for our machine learning approach by training the machine to simulate the distribution of the maximally entangled state |Ψ−⟩.
A snapshot of the behaviour of the trained model can be seen in Fig. 5 in Appendix B. These figures, along with others which can be generated with the code, are very similar to the figures in the paper of Toner and Bacon, with theirs being plotted on a sphere and ours being a projection. The major difference is that, in our plots, Alice's output differs depending on the bit of communication. By comparing our guesses and the plots generated by the neural network, we deduced the following behaviours of the parties in the neural network: Maximally entangled state protocol:

Bob outputs

B = sgn(⃗b · (⃗λ_1 + c ⃗λ_2)),

where sgn(x) = x/|x| is the sign function.
The protocol bears much resemblance to Toner and Bacon's original protocol, the only difference being the output of Alice, which is simply −sgn(⃗a · ⃗λ_1) in the original protocol.
We can further check that this model reproduces the correct correlations by checking the expected marginals of the maximally entangled state:

⟨A⟩ = 0,  ⟨B⟩ = 0,  ⟨AB⟩ = −⃗a · ⃗b.

While the first two equations can be proven analytically for our functions using the same method used by Toner and Bacon, proving the third equality is more difficult. However, numerical integration shows that the third equation also holds up to arbitrary numerical precision, depending on the accuracy set for the numerical integration, for uniformly sampled combinations of ⃗a and ⃗b.
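These marginals can also be checked by direct Monte Carlo simulation. The sketch below implements the original Toner-Bacon protocol (with Alice's original output −sgn(⃗a·⃗λ_1), not our network's variant) and estimates ⟨AB⟩:

```python
import numpy as np

def toner_bacon_round(a, b, lam1, lam2):
    """One round of the original Toner-Bacon protocol for the singlet."""
    c = np.sign(a @ lam1) * np.sign(a @ lam2)   # the communicated bit
    A = -np.sign(a @ lam1)                       # Alice's output (original version)
    B = np.sign(b @ (lam1 + c * lam2))           # Bob's output
    return A, B

def correlator(a, b, n=100_000, seed=0):
    """Monte Carlo estimate of <AB> over uniformly sampled LHV pairs."""
    rng = np.random.default_rng(seed)
    lam = rng.normal(size=(n, 2, 3))
    lam /= np.linalg.norm(lam, axis=-1, keepdims=True)
    return np.mean([A * B for A, B in (toner_bacon_round(a, b, l1, l2)
                                       for l1, l2 in lam)])
```

For ⃗a = ⃗b the protocol gives AB = −1 in every single round, matching −⃗a·⃗b exactly; for other settings the estimate converges to −⃗a·⃗b as n grows.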

Non-maximally Entangled States
We now apply the same method to the non-maximally entangled two-qubit states. Without loss of generality, any pure two-qubit state can be written in the form

|ψ(α)⟩ = cos(α) |01⟩ − sin(α) |10⟩,  α ∈ [0, π/4],

using a suitable choice of bases. The state is maximally entangled when α = π/4 and separable when α = 0. We trained the network to simulate the distribution of |ψ(α)⟩ for several values of α ∈ [0, π/4]. A selection of the resulting protocols is shown in Fig. 6.
The errors of the models for these states are similar to the one for the maximally entangled state (see Fig. 2). This does not necessarily mean that they simulate the states exactly rather than simply approximating them. If the behaviour were actually nonlocal, we should expect a transition in the error when we mix the state with noise, signifying the exit of the state from the local+1 bit set. However, we observe no clear transition when noise is added to the state, only a shallow gradient, suggesting that the state is still inside the local+1 bit set. While encouraging, this does not constitute a proof, and we would still need to write an analytical protocol. Unlike in the case of the maximally entangled state, the models we obtained for the non-maximally entangled states are more complex, and our attempt to infer a protocol begins by looking at figures similar to Fig. 6 in Appendix B.
We start from the parties' outputs. The outputs of Alice are of the form

A_i = −sgn(⃗λ_{ai} · ⃗a + b_{ai}),  i ∈ {1, 2},

where ⃗λ_{a1} = u_{a1} ⃗λ_1 + ⃗λ_2 + v_{a1} ẑ decides the hemisphere direction and b_{a1} = w_{a1} + x_{a1} ⃗λ_1 · ẑ + y_{a1} ⃗λ_2 · ẑ decides the size of the hemisphere; similarly for A_2 and for Bob's outputs. Using numerical algorithms, we can approximately obtain the relevant coefficients, laid out in Table 4 in Appendix A for the different states. This guess comes from intuition. There are three points to note. First, Fig. 6 shows that the outputs of the parties remain hemispherical, meaning that they are still generated from a sgn function. Thus, we obtain this expression by first fitting the normal direction of the plane in the hemispherical model to the neural network output. By looking at the fitted hemispherical model, one can see that the general movement of the normal direction is a weighted sum of ⃗λ_1 and ⃗λ_2; thus we have the parameter u. Second, the expected values ⟨A⟩, ⟨B⟩, ⟨AB⟩ for the non-maximally entangled states are biased in the ẑ direction. More specifically, they are

⟨A⟩ = cos(2α) a_z,  ⟨B⟩ = −cos(2α) b_z,  ⟨AB⟩ = −a_z b_z − sin(2α)(a_x b_x + a_y b_y).

Thus, we add a parameter that allows biases along ẑ, in the form of the parameter v. Third, since ⟨A⟩ and ⟨B⟩ are nonzero, the hemispheres cannot divide the sphere perfectly in two, and we must allow a bias inside the sign function in the form of b, which then gives the parameters w, x, and y. Note that, since the parameters come from what is essentially a regression, not all of them are needed. In fact, by looking at Table 4, many of these parameters are essentially zero for many of the states and parties. Moreover, several of the parameters (b and v) should reduce to zero for the maximally entangled state in order to recover the Toner and Bacon model. The fact that v is quite far from zero for the maximally entangled state shows that much of the error in fitting this model might come from this parameter.
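The biased expectation values ⟨A⟩ = cos 2α a_z, ⟨B⟩ = −cos 2α b_z and ⟨AB⟩ = −a_z b_z − sin 2α (a_x b_x + a_y b_y) can be verified directly from the state (a small numpy check; the value of α and the measurement directions below are arbitrary):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def state(alpha):
    """|psi(alpha)> = cos(alpha)|01> - sin(alpha)|10> as a vector in C^4."""
    v = np.zeros(4, dtype=complex)
    v[1] = np.cos(alpha)    # coefficient of |01>
    v[2] = -np.sin(alpha)   # coefficient of |10>
    return v

def expvals(alpha, a, b):
    """Return <A>, <B>, <AB> for measurement directions a and b."""
    psi = state(alpha)
    oa = a[0] * SX + a[1] * SY + a[2] * SZ
    ob = b[0] * SX + b[1] * SY + b[2] * SZ
    ev = lambda M: float(np.real(psi.conj() @ M @ psi))
    return ev(np.kron(oa, I2)), ev(np.kron(I2, ob)), ev(np.kron(oa, ob))
```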
So far the expression is still integrable. However, things start to become complicated when we move to the bit of communication. To start, notice (Fig. 6) that the neural network simulations converge to a solution in which the bit of communication is not deterministic for some inputs¹. However, guessing such a stochastic communication function proved impossible for us: hence, we tried to force a simpler deterministic model by reducing the communication part of the neural network to a single hidden layer. Albeit having a worse error (see Fig. 2), we managed to obtain a closed expression for the output of this new, simplified, network. Its output can be seen in Fig. 7 in Appendix B.
The (simplified) bit of communication is given by a clipped sum of four products of biased Heaviside step functions of ⃗a, ⃗λ_1 and ⃗λ_2 (the construction is detailed below), where Θ(x) = (1 + sgn(x))/2 is the Heaviside step function. Again, the relevant coefficients obtained using numerical methods are listed in Table 4 in Appendix A.
The intuition behind this seemingly arbitrary choice of function comes from the fact that it must reduce to the Toner and Bacon model for the maximally entangled state. When we look at Fig. 7 in Appendix B, we can see that the form is similar to the usual product of two sgn functions in the Toner and Bacon model. However, in the original Toner and Bacon model, there is always a pair of antipodal points each bordered by two +1 regions on opposite sides and two −1 regions on opposite sides, forming a quadripoint. These points correspond to the unique vectors ±⃗n which are orthogonal to both ⃗λ_1 and ⃗λ_2. This pair of quadripoints ceases to exist in the models for non-maximally entangled states; thus there is an 'overflow', or an imbalance, in the +1 and −1 regions. In order to mimic this, we first notice that sgn(x) = Θ(x) − Θ(−x).
Next, we split each of the two sgn functions into these two terms and multiply them, obtaining four terms in total. Then, we add a bias term b_c in each of the terms to mimic the imbalance, or 'overflow', in the different regions. In general, we could set the eight biases in the Heaviside step functions to be independent of each other, but we decided against that, to prevent overfitting an already complex model to the point of unreadability. We placed the biases in this configuration to respect the antipodal symmetry of the figures (the transformation ⃗a → −⃗a interchanges the first and second terms, and similarly the third and fourth) and to oppose the positive terms symmetrically with the negative terms. The final form of the bias itself is found through regression and guessing. Finally, we normalise the function back with the clip function.
We call this set of functions, which we guess to be the functions of the neural network, the semianalytical model. Since the results we have presented are numerical in nature, this protocol is not an exact protocol, but simply an approximation. In theory, we could analytically integrate these functions and try to match the numerical parameters to the expectation values of the quantum behaviour. However, the communication function is simply too complex for us to perform this integration. Thus, in the next section, we will benchmark the performance of this model, along with that of the neural network, with more statistical measures to get an intuition of the 'closeness' of these approximations.
It is worth noting that the model suggested by our neural network is very different from that of Renner and Quintino [13]. Notably, in their model one of the LHVs is distributed with a bias, while we have set in our code that the LHVs are two uniformly distributed Bloch vectors. The fact that different LHV+1 models exist is not surprising: already for the simulation of the maximally entangled state, the model of Degorre et al. [11] differs from that of Toner-Bacon [10].

Statistical Analysis of the Simulations
After presenting our protocols, we can now consider their performance, both for the neural network protocol itself and for the semianalytical protocol we distilled from it. These LHV+1 protocols are not exact protocols, but approximations, and we can describe their closeness to the quantum behaviour by providing statistical error values. To get a better intuition on the error values, let us consider a hypothesis testing scenario [27]. Suppose that we have an unknown sample of length n generated by the same measurement performed on n identical systems. Suppose also that we know that the systems are either all actual quantum systems (P_Q) or all our LHV+1 models (P_LHV+1), but we do not know which. Let us take P_LHV+1 as the null hypothesis. Let a be the probability of a Type I error (mistakenly rejecting a true null hypothesis). In our case, a Type I error would correspond to our machine learning model successfully spoofing as a quantum system. For any decision-making procedure, the probability of a Type I error is lower bounded by a ≥ e^{−n D_KL(P_Q || P_LHV+1)}. Thus, in order to have 95% confidence in rejecting a sample from the LHV+1 model, we would need a sample size of

n ≥ ln(20) / D_KL(P_Q || P_LHV+1).
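The resulting sample-size bound can be packaged into a small helper (the distributions below are illustrative, not taken from our models):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in nats, for distributions on the same finite support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def min_sample_size(p_quantum, p_model, confidence=0.95):
    """Smallest n with e^{-n D_KL(P_Q || P_model)} <= 1 - confidence."""
    d = kl_divergence(p_quantum, p_model)
    return int(np.ceil(-np.log(1 - confidence) / d))
```

The closer the model's distribution is to the quantum one, the larger the sample needed to tell them apart.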
The sample size n needed to distinguish the probability distributions differs with the measurement settings, some being more difficult to distinguish than others. The performance of our LHV+1 models (both the machine learning models and our semianalytical approximations) over the measurement settings is given in Fig. 3. It can be seen that, going from the neural network's protocols to our semianalytical approximations, the Kullback-Leibler divergence increases by about two orders of magnitude. This is due to the limitations of the numerical methods used to obtain the optimal parameters, and to the fact that we were bound to miss some details of the behaviour of the network when we translated it into analytical expressions.
Our semianalytical protocols require, on average, hundreds of measurements before they can be distinguished from real quantum behaviours, disregarding other noise sources present in an actual quantum system. Even better, for the neural networks themselves, it would take upwards of 10^4 samples to distinguish them from an actual quantum system.
As previously mentioned, ideally one might try to see whether the semianalytical protocol, when integrated analytically to give the full behaviour, can be made into an exact protocol with the correct parameters. However, as the communication function is very difficult to integrate analytically, this approach might not work. On the other hand, considering that an exact protocol can already simulate some two-qubit states, these pieces of evidence suggest that all two-qubit states can be simulated with just a single bit of communication. Ultimately, however, the question of exactly simulating partially entangled states with one bit of communication remains open.

3 Searching for a Bell violation of the one-bit of communication polytope
Since two-qubit states can be simulated up to a very good precision, we now consider a different question: can we find an explicit quantum behaviour that cannot be simulated with one bit of communication? We move to higher-dimensional systems and try to find a Bell-like inequality for the communication polytope.
As far as we know, no violation of a Bell-like inequality for the one-bit of communication polytope has ever been described. For the rest of the section, let L be the local set, Q the quantum set, and C the one-bit of communication set. In other words, we are interested in points inside Q that lie outside C. Let also A (B) be the output set of Alice (Bob) and X (Y) her (his) input set. We will describe scenarios with the (|X|, |Y|, |A|, |B|) notation. Similar to L, C is a convex polytope. However, unlike L, it does not lie inside the no-signalling space NS. Bacon and Toner described the complete one-bit polytope for the (2,2,2,2) scenario. They also considered the (3,3,2,2) scenario, but only for joint observables [14]. Maxwell and Chitambar later expanded these results to the (3,2,2,2) scenario [15]. Finally, the latest results are given by Cruzeiro and Gisin, who characterised the (3,3,2,2) communication polytope [16]. In all these works, no violation of C by a quantum correlation is found. Therefore, we need to go to higher dimensions. More specifically, we consider scenarios with 3 outputs and beyond, where Alice has more than 2 inputs (when Alice has only two inputs, she can trivially send her input to Bob and every NS behaviour can be simulated).

Description of the polytope
The number of extremal points of C is far larger than that of L. Indeed, the number of local deterministic strategies is |A|^|X| |B|^|Y|, while the number of deterministic strategies that can be performed with a single bit is |A|^|X| 2^|X| |B|^{2|Y|} by counting; removing duplicate behaviours reduces this number somewhat [16]. C is the convex polytope formed by these vertices. In practice, we can only generate polytopes of up to around 2 × 10^7 points due to memory limitations. Since the number of extremal points of C is much larger than for L, we can quickly discard the possibility of performing full facet enumeration. Hence, we had to resort to other methods for our search.
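The raw counts (before duplicate removal) are easy to compute, under the assumption that a one-bit deterministic strategy consists of Alice's output function a(x), her message function c(x), and Bob's output function b(y, c):

```python
def n_local(nx, ny, na, nb):
    """Deterministic local strategies: |A|^|X| * |B|^|Y|."""
    return na**nx * nb**ny

def n_one_bit(nx, ny, na, nb):
    """Raw one-bit strategies (duplicates not yet removed):
    Alice: na^nx output functions times 2^nx message functions;
    Bob: nb^(2*ny) output functions of (y, c)."""
    return na**nx * 2**nx * nb**(2 * ny)
```

Duplicate behaviours still have to be removed from these raw counts before building the polytope [16].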

Random sampling quantum behaviours in higher dimensions
We first tried to sample points from Q by measuring the maximally entangled two-qutrit and two-ququart states with measurements sampled uniformly in the Haar measure, before using linear programming to solve the membership problem for C. However, this method proved ineffective: we did not manage to find any behaviour lying outside C, and a significant fraction even lies inside L. The statistics of this method can be seen in Table 1.

Using non-signalling points
The next method we used was to take points in NS (P_NS) and mix them with noise (P_noise) to obtain

P_w = w P_NS + (1 − w) P_noise

for some correlation weight 0 ≤ w ≤ 1. (In Table 2, each of the eight smaller boxes corresponds to the output table for a particular combination of inputs, with Alice's input indexing the vertical dimension and Bob's the horizontal. In each box, the 4 × 4 table corresponds to the outputs of Alice and Bob, Alice's along the vertical and Bob's along the horizontal. Note that an entry of 1 there means 1/4, so that each of the boxes sums up to 1, as required.) We do this in order to find the threshold weights at which the correlation exits the sets Q and C. If we find a P_NS which exits Q at a higher correlation weight w_Q than the corresponding one for C, w_C, all behaviours with w_C < w < w_Q would be behaviours in Q that are unsimulatable with one bit of communication. Thus, we will focus on finding the smallest value of the gap (w_C − w_Q), where a negative value of the gap would signify a violation. A graphical illustration can be seen in Fig. 4.
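The membership problem for C reduces to a linear-programming feasibility check: a behaviour lies in C iff it is a convex mixture of the polytope's vertices. A scipy sketch, where `vertices` is an assumed array with one flattened extremal behaviour per row, illustrated here on a toy square rather than an actual communication polytope:

```python
import numpy as np
from scipy.optimize import linprog

def in_polytope(p, vertices):
    """Decide whether behaviour p (flattened) is a convex mixture of vertices.

    vertices: shape (n_vertices, dim), one extremal behaviour per row.
    Feasibility LP: find mu >= 0 with sum(mu) = 1 and vertices.T @ mu = p.
    """
    n = vertices.shape[0]
    A_eq = np.vstack([vertices.T, np.ones((1, n))])
    b_eq = np.concatenate([np.asarray(p, float), [1.0]])
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method='highs')
    return res.success

# toy example: the unit square in 2D
square = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
```

A bisection on w, calling this check on P_w, then yields the threshold weight w_C for a given P_NS and P_noise.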
The membership problem for Q is solved using the NPA hierarchy [28] at level 2, which we ran using QETLAB [29]. Note that the NPA hierarchy gives only an outer approximation of Q. As before, the membership problem for C is solved using linear programming.
Unfortunately, choosing a suitable P_NS proved to be a challenge. The extremal points of the NS space have only been characterised for binary inputs or binary outputs [30,31]. Here, we mostly used nonlocal points which are locally unbiased (for all inputs, all local outputs are equiprobable) and maximally correlated (for all input combinations, there is a perfect correlation between Alice's and Bob's outputs: for a particular output of Alice, the output of Bob is guaranteed, and vice versa). While we tried other no-signalling points, this particular class of points gave us the smallest gap (w_C − w_Q) in all scenarios. Similarly, there are numerous choices for P_noise, but we found that white noise gives the smallest gap in most scenarios.
The maximum quantum bound of this inequality can be obtained with the maximally entangled two-ququart state, with suitable measurements found through heuristic numerical methods. Equivalently, with the same measurements, this state is able to realise the correlation w_Q P_NS + (1 − w_Q) P_noise, where P_NS is the correlation in Table 2 and w_Q = 2/3 as previously mentioned. This explicit behaviour meets the NPA bound to numerical precision, and thus we can conclude that the maximum correlation weight w_Q = 2/3 and the quantum bound of 3/4 for the inequality are exact in this case.
This point represents our closest attempt at finding a violation with this method. The number of extremal points of C in the (4,2,4,4) scenario is around 1 × 10^6, and it is still possible to go one input higher, to (4,3,4,4) or (5,2,4,4). A violation might exist there, but our heuristic search proved unfruitful. In the end, contrary to the prepare-and-measure scenario [12], it remains an open problem to find a bipartite quantum behaviour that is provably unsimulatable with one bit of communication².

Conclusion
In this work, we tried to further the works that have been done on characterising the communication cost of quantum behaviours. We tried to obtain a protocol to simulate partially entangled two-qubit states using a neural network, and we presented

² One might hope that the prepare-and-measure (P&M) example [12] could be an entry point; but it turns out not to be. Indeed, consider the corresponding entanglement-based scenario, in which one steers probabilistically the P&M behaviours P(b | u = (x, a), y) from some bipartite correlations P(a, b | x, y). However, since the P&M examples only use one qubit and a projective measurement by Bob, the bipartite behaviour can be simulated with at most one trit [13], and probably one bit, as the first part of our paper suggests.
a semianalytical LHV+1 protocol based on the protocols of the neural networks. While these protocols only approximate the quantum behaviours, on average one needs hundreds of measurement data for the semianalytical protocols, and tens of thousands for the neural network protocols, in order to distinguish them from the quantum behaviour. We also tried to find quantum behaviours in higher dimensions that could not be simulated with one bit of communication. While we were able to find a Bell-like inequality that has the same maximum value in Q and C, we were unable to find a violation.
From this work and all the previous works on the topic, it can be seen that evaluating the capabilities of entangled quantum states in terms of communication cost is very difficult. While we are confident that a behaviour that cannot be simulated with a single bit could probably be found, extending the work to more bits and more states would probably be too difficult, barring revolutionary new techniques. On the other hand, given our result that numerical protocols closely approximating the two-qubit entangled states can be found, the task of exactly simulating partially entangled two-qubit states using one bit of communication is probably possible, and a fully analytical protocol could plausibly be found in the near future.
Note added in proof: while the review of this paper was being finalised, Márton et al. showed how to find quantum correlations outside C [32] by using parallel repetition. The Bell scenarios for which such examples have been found have many more inputs and/or outputs than those studied here.

B Figures
In the following figures, we show the output of the neural networks in figures similar to those found in [10], but in polar coordinates (in θ and ϕ). These figures show how Alice and Bob would respond in a given round for a certain fixed pair of LHV. That is, the two "Alice" plots are the plots of the function P(A_i = +1 | ⃗a, λ = λ_fixed) as a function of ⃗a, for i ∈ {1, 2}. Similarly for the "Communication" plot (P(c = +1 | ⃗a, λ = λ_fixed)) and the "Bob" plots (P(B_i = +1 | ⃗b, λ = λ_fixed)). Thus, the axes are the inputs of Alice or Bob, depending on the plot. The pink and cyan dots are the positions of the LHV, which are uniformly distributed throughout the spheres, but are fixed in these figures to show the shape of the functions.

Figure 1: The architecture of the Artificial Neural Network (ANN). The model consists of two local distributions and a communication network. In each distribution, the two parties are constrained by locality by routing the input accordingly. The communication network outputs a value between 0 and 1, representing the probability of Alice sending a certain bit to Bob. The output for a particular round is then simply the convex combination of the two local distributions.
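The combination rule in the caption can be sketched as follows. This is a minimal numpy illustration, not the actual trained model: the callables passed in (`p_comm`, `p_alice`, `p_bob_c0`, `p_bob_c1`) are hypothetical placeholders for the subnetworks.

```python
import numpy as np

def model_behaviour(p_comm, p_alice, p_bob_c0, p_bob_c1, a, b, lam):
    """P(A, B | a, b, lam) as a convex combination of two local
    distributions, weighted by the probability of the communicated bit.

    p_comm(a, lam)   -> probability that Alice sends c = 1
    p_alice(a, lam)  -> P(A = +1 | a, lam)  (Alice's output is local)
    p_bob_c*(b, lam) -> P(B = +1 | b, lam, c) for c = 0 and c = 1
    """
    w = p_comm(a, lam)                       # P(c = 1 | a, lam)
    pA = p_alice(a, lam)
    joint = np.zeros((2, 2))                 # rows: A in {+1,-1}; cols: B in {+1,-1}
    for wc, pB in [(1 - w, p_bob_c0(b, lam)),  # local distribution for c = 0
                   (w, p_bob_c1(b, lam))]:     # local distribution for c = 1
        for iA, qA in enumerate([pA, 1 - pA]):
            for iB, qB in enumerate([pB, 1 - pB]):
                joint[iA, iB] += wc * qA * qB
    return joint
```

For instance, with constant placeholder functions the returned 2×2 joint distribution is normalised and its A-marginal equals `p_alice`, since only Bob's response depends on the communicated bit.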

Figure 2: The relative error between the neural network models' behaviours and the quantum behaviours. The blue dots are the original model described in the text, while the red crosses are the simplified model. The grey shaded region is the region in which an LHV+1 model is known [13].

Figure 3: Violin plots for the neural network (blue) and the semianalytical protocol we presented (red), showing the following values: (a) the Kullback-Leibler divergence between our protocols and the quantum behaviours; (b) the Total Variation Distance between our protocols and the quantum behaviours; (c) the minimum sample size needed to have at least 95% confidence in distinguishing the two behaviours, as described in the hypothesis testing scenario. In all three, the violin shapes illustrate the distributions of the values over the different projective measurements on the two-qubit state.
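The three quantities in the caption can be computed for a pair of finite behaviours as in the sketch below. The sample-size estimate uses the asymptotic Chernoff-Stein scaling (type-II error decaying as exp(-n·D)); whether this matches the paper's exact hypothesis-testing criterion is an assumption of this sketch.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) in nats; terms with p == 0 contribute 0.
    Assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def total_variation(p, q):
    """Total Variation Distance between two distributions."""
    return 0.5 * float(np.sum(np.abs(np.asarray(p, float) - np.asarray(q, float))))

def min_samples_stein(p, q, confidence=0.95):
    """Sample size at which a hypothesis test distinguishes q from p with
    the given confidence, under the Chernoff-Stein scaling
    error ~ exp(-n * D(p || q))."""
    return int(np.ceil(-np.log(1.0 - confidence) / kl_divergence(p, q)))
```

Applied to the flattened probability vectors of the protocol's behaviour and the quantum behaviour, these give one point of each violin plot.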

Figure 4: wQ is the threshold correlation weight for the quantum set Q, while wC is the threshold correlation weight for the one-bit communication set C. Thus, wC < wQ would imply a violation and would give a quantum behaviour that could not be simulated by a single bit of communication.

Figure 5: The protocol generated by the ANN for |ψ(π/4)⟩. Here the axes parametrise the Bloch sphere in polar coordinates: in the "Alice" and "Communication" plots they show the direction of ⃗a, and in the "Bob" plots the direction of ⃗b. The colour is the output of the A, B, c functions of the neural network; we give an analytical expression of these functions in Section 2.2.1. While the lines are not smooth, the protocol is more or less of the Toner-Bacon form.

Figure 6: The protocol generated by the ANN for |ψ(5π/32)⟩. Notice that the transitions between the red and blue regions are now less sharp than those of the maximally entangled state; this means that the bit of communication sent is nondeterministic in those areas. This model is trained starting from the |ψ(π/4)⟩ model shown above, and thus has similar features.

Figure 7: The protocol generated by the ANN for |ψ(5π/32)⟩, but with the caveat that the bit of communication is now of a simpler form. Notice that the bit of communication sent is now deterministic again. The forms of the parties' outputs are slightly different from the previous figure (Fig. 6) because they are trained from different starting models.

Table 1: The statistics for the random sampling approach. The columns give the scenario (|X|, |Y|, |A|, |B|), the number of points sampled, and the proportion of points in L. None of the points sampled fall outside C.

Table 2: The table itself can also be interpreted as a Bell inequality by taking the terms in the table as the coefficients of the correlation terms and adding all of them when normalised.

Table 3: The smallest gap (wC − wQ) in each scenario we studied.