Contextuality in entanglement-assisted one-shot classical communication

We consider the problem of entanglement-assisted one-shot classical communication. In the zero-error regime, entanglement can increase the one-shot zero-error capacity of a family of classical channels following the strategy of Cubitt et al., Phys. Rev. Lett. 104, 230503 (2010). This strategy uses the Kochen-Specker theorem which is applicable only to projective measurements. As such, in the regime of noisy states and/or measurements, this strategy cannot increase the capacity. To accommodate generically noisy situations, we examine the one-shot success probability of sending a fixed number of classical messages. We show that preparation contextuality powers the quantum advantage in this task, increasing the one-shot success probability beyond its classical maximum. Our treatment extends beyond Cubitt et al. and includes, for example, the experimentally implemented protocol of Prevedel et al., Phys. Rev. Lett. 106, 110505 (2011). We then show a mapping between this communication task and a corresponding nonlocal game. This mapping generalizes the connection with pseudotelepathy games previously noted in the zero-error case. Finally, after motivating a constraint we term context-independent guessing, we show that contextuality witnessed by noise-robust noncontextuality inequalities obtained in R. Kunjwal, Quantum 4, 219 (2020), is sufficient for enhancing the one-shot success probability. This provides an operational meaning to these inequalities and the associated hypergraph invariant, the weighted max-predictability, introduced in R. Kunjwal, Quantum 3, 184 (2019). Our results show that the task of entanglement-assisted one-shot classical communication provides a fertile ground to study the interplay of the Kochen-Specker theorem, Spekkens contextuality, and Bell nonlocality.


Introduction
The problem of identifying the resources responsible for a quantum advantage over classical strategies in quantum information and computation is key to unlocking the potential of quantum technologies. Often, such resources are taken to be theory-dependent features like entanglement, coherence, incompatibility, or perhaps the exponential scaling of Hilbert space dimension with the number of quantum systems at hand. The nonclassicality witnessed by Bell violations [1,2] makes it possible to identify a source of quantum advantage that can be assessed in a theory-independent fashion, relying only on empirical data rather than internal features of the theory that generated the data. In contrast to the case of Bell nonlocality, Kochen-Specker (KS) contextuality [3], a notion of nonclassicality mathematically similar to Bell nonlocality, hasn't been as widely adopted as a theory-independent witness of quantum advantage. This is despite the existence of theoretical results on its relevance for quantum information and computation [4][5][6][7]. One reason for this is that it isn't robust to noise, unlike Bell nonlocality, making its experimental testability a matter of controversy [8,9]. Recently, much work has been devoted to making contextuality a notion of nonclassicality that relies on empirical data without making assumptions about the representation of measurements (concerning, in particular, their sharpness [10][11][12]) in the theory generating the data [13,14] (footnote 1). This noise-robust notion of contextuality due to Spekkens [13] has been shown to underlie several quantum information tasks such as parity-oblivious multiplexing, quantum random access codes, state discrimination, communication complexity, anomalous weak values, and state-dependent cloning [19][20][21][22][23][24][25].
These applications of Spekkens contextuality, though, have no counterpart in terms of KS-contextuality, leaving a gap in our understanding of how advantages from KS-contextuality can be turned into noise-robust advantages premised on Spekkens contextuality. This is in line with the spirit of Ref. [17], where the first noise-robust noncontextuality inequality, inspired by the Kochen-Specker theorem, was derived. Since the approach to noise-robust noncontextuality inequalities generalizes the KS paradigm by removing restrictions like projective measurements [17], it behooves us to ask if, and in what precise form, the advantages that derive from KS-contextuality persist when one considers noise-robust contextuality à la Spekkens [13,26]. In this paper, we take the first steps in this research program using tools from previously proposed hypergraph frameworks [10,27].
We consider the problem of one-shot classical communication where it has been shown that, assisted by entanglement, KS-contextuality provides an increase in the one-shot zero-error capacity of classical channels based on the KS theorem [3,6]. We study a relaxation of this problem to one of enhancing the one-shot success probability of sending a fixed number of classical messages assisted by entanglement. Previous work [28,29] has studied the one-shot success probability in the case of classical channels (unrelated to the KS theorem [3]) which do not admit an enhancement of zero-error channel capacity à la Cubitt et al. [6]. In contrast, we here study the one-shot success probability for the general case that includes, in particular, channels which do admit an enhancement of the one-shot zero-error channel capacity. A schematic of the task of entanglement-assisted one-shot classical communication is outlined in Fig. 1.
Footnote 1: Experimental testability of generalized contextuality (in particular, the need for tomographic completeness in order to verify operational equivalences) has been addressed in several recent papers [14][15][16] and we refer the interested reader to these for a discussion of such issues and how they are handled in the framework. Note, however, that if one assumes a quantum description of any experiment (as opposed to a general probabilistic theory, or GPT, description), then verification of operational equivalences via tomography is straightforward as there is no ambiguity about the dimension of a well-characterized system's Hilbert space, hence about the minimum number of tomographically complete preparations/measurements needed to establish operational equivalences. This move of assuming a quantum description is the natural one when considering applications of contextuality in quantum information as opposed to its foundational implications for quantum theory (which necessitates a broader framework like GPTs). On the other hand, even if one does assume a quantum description, Kochen-Specker contextuality cannot do justice to the case where this description involves non-projective measurements for reasons that have been extensively discussed elsewhere [10,13,17,18].
Our results can be summarized as follows:
• We show that preparation noncontextuality [13] relative to Bob's share of the system characterizes the classical upper bound on the task of enhancing the one-shot success probability of a classical channel assisted by entanglement (footnote 2). Hence, preparation contextuality drives the quantum advantage in the general task.
• We then show a mapping between the one-shot communication task and a corresponding nonlocal game: preparation contextuality powers an advantage in the first task if and only if Bell nonlocality powers an advantage in the second (footnote 3). This generalizes the connection between entanglement-assisted one-shot zero-error capacity and nonlocal (pseudotelepathy) games noted in Ref. [6].
• We motivate a constraint on the communication task that we term context-independent guessing in a situation where the receiver (Bob) has no knowledge of (or doesn't trust) the exact channel probabilities but knows only (or trusts only) the channel hypergraph. We then prove that, for some classical channels (including the one studied in Ref. [6]), the contextuality witnessed by a hypergraph invariant, the weighted max-predictability, implies an enhancement of the one-shot success probability in the communication task. This makes direct connection with the formalism of Ref. [27], where weighted max-predictability provides an upper bound on the strength of source-measurement correlations under the assumption of noncontextuality. We thus provide an operational meaning to the violation of noise-robust noncontextuality inequalities in Ref. [27]: namely, such violations power the enhancement of one-shot success probability of classical communication assisted by entanglement.
Footnote 2: This does not require that the channel be based on the KS theorem in the sense of Cubitt et al. [6].
Footnote 3: Note that while the first task, by definition, requires communication via a classical channel (hence timelike separation, where the channel output to the receiver is within the future lightcone of the channel input from the sender), the second task, by definition, forbids all communication by requiring spacelike separation, where the sender and receiver are causally disconnected during each run of the nonlocal game. However, in both situations, timelike or spacelike separation, the shared common-cause resource doesn't result in signalling, i.e., we assume that this resource is described by a non-signalling theory (e.g., shared entanglement in quantum theory) [30].
The structure of the paper is as follows: In Section 2, we define some preliminary notions from the theory of one-shot classical communication as well as contextuality. In Section 3, we describe the general protocol for entanglement-assisted one-shot classical communication that provides a unified description of protocols such as those of Refs. [6] and [28]. In Section 4, we discuss the resources that play a role in the quantum advantage in the communication task, including preparation contextuality and Bell nonlocality. In Section 5, we examine the connection between the role of preparation contextuality in our communication task and the role of Bell nonlocality in a corresponding nonlocal game, proving some general relationships between them. In Section 6, we look at the problem of one-shot communication of a single bit through classical channels with complete confusability graphs, in particular the classical channel of Ref. [28]. In Section 7, we study the case of classical channels based on the KS theorem à la Ref. [6]. We conclude with a discussion in Section 8, mentioning some open problems and opportunities for future work.

Classical Channels
Consider a discrete and memoryless classical channel $N$ (e.g., Fig. 2). Let $X$ denote the set of input symbols of $N$ and $Y$ denote the set of output symbols so that $\{N(y|x)\}_{x \in X, y \in Y}$ denotes the channel probabilities satisfying: $N(y|x) \geq 0$ for all $x \in X$, $y \in Y$ and $\sum_{y \in Y} N(y|x) = 1$ for all $x \in X$. Further, we denote by $Y_x \subseteq Y$ the set of output symbols that have a non-zero probability of occurrence when the input symbol is $x \in X$, i.e., the support of $x$, given by $Y_x \equiv \{y \in Y \mid N(y|x) > 0\}$ for $x \in X$. Similarly, $X_y \subseteq X$ denotes the set of input symbols that yield a non-zero probability of occurrence for the output symbol $y \in Y$, i.e., the support of $y$, given by $X_y \equiv \{x \in X \mid N(y|x) > 0\}$ for $y \in Y$. To the classical channel $N$, we associate the channel hypergraph $H(N)$: vertices of $H(N)$ denote the input symbols $x \in X$ and hyperedges denote the output symbols $y \in Y$, such that each hyperedge representing $y \in Y$ contains the input symbols in $X_y$. Any two input symbols $x, x' \in X$ are said to be confusable when they share a hyperedge in $H(N)$, i.e., $Y_x \cap Y_{x'} \neq \emptyset$. The confusability graph $G(N)$ of the channel is given by the orthogonality graph of $H(N)$, i.e., its vertices are given by $X$ and any two vertices in $X$ are connected by an edge if and only if they are confusable.
Given the classical channel $N$, Alice and Bob choose an encoding of the messages (say, $[q] \equiv \{m\}_{m=1}^{q}$) that Alice (the sender) wants to send to Bob (the receiver) through the channel. An encoding is a collection of mutually disjoint subsets of $X$. More concretely, an encoding is a collection $\{X^{(m)}\}_{m=1}^{q}$ of subsets $X^{(m)} \subseteq X$, one for each message $m \in [q]$, with $X^{(m)} \cap X^{(m')} = \emptyset$ for all $m \neq m'$.
A zero-error code is a set of input symbols $X_0 \subseteq X$ that are mutually non-confusable, i.e., no two symbols in this set can map to the same output symbol when fed into the channel $N$. Hence, an encoding $\{X^{(m)}\}_{m=1}^{q}$ of the messages in $[q]$ is said to admit a zero-error code if and only if there exists a set consisting of one input symbol $x^{(m)} \in X^{(m)}$ for each $m \in [q]$ whose elements are mutually non-confusable. The one-shot zero-error capacity of a classical channel is the number of messages that can be sent without error with one use of the channel, i.e., the cardinality of the largest zero-error code it admits. This is given by the independence number $\alpha(G(N))$ of $G(N)$, namely, the cardinality of the largest set of vertices that share no edge in $G(N)$. Note that $N$ does not admit a nontrivial zero-error code (i.e., with $q \geq 2$) if and only if $G(N)$ is a complete graph, i.e., $\alpha(G(N)) = 1$. Further, for any encoding $\{X^{(m)}\}_{m=1}^{q}$ that does admit a zero-error code using $N$, we necessarily have $q \leq \alpha(G(N))$. We illustrate the above notions in Fig. 2 with a simple example of a classical channel that was studied in Ref. [28].
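The notions of support, confusability and one-shot zero-error capacity can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the references: the dictionary layout `N[x][y]` for $N(y|x)$, the function names, and the three-input toy channel are all assumptions made for the example, and the independence number is found by brute force, which is only sensible for very small channels.

```python
from itertools import combinations

def supports(N):
    """Return (Y_x, X_y): the support of every input and of every output."""
    Yx = {x: {y for y, p in row.items() if p > 0} for x, row in N.items()}
    Xy = {}
    for x, ys in Yx.items():
        for y in ys:
            Xy.setdefault(y, set()).add(x)
    return Yx, Xy

def confusability_edges(N):
    """Edges of G(N): pairs of inputs whose supports intersect."""
    Yx, _ = supports(N)
    return {frozenset((a, b)) for a, b in combinations(N, 2) if Yx[a] & Yx[b]}

def independence_number(N):
    """Brute-force alpha(G(N)): size of the largest set of mutually non-confusable inputs."""
    edges = confusability_edges(N)
    inputs = list(N)
    for size in range(len(inputs), 0, -1):
        for cand in combinations(inputs, size):
            if all(frozenset(pair) not in edges for pair in combinations(cand, 2)):
                return size
    return 0

# Hypothetical toy channel (not from the paper): N[x][y] = N(y|x).
toy = {
    "a": {"u": 0.5, "v": 0.5},
    "b": {"v": 0.5, "w": 0.5},
    "c": {"w": 0.5, "z": 0.5},
}

print(sorted(sorted(e) for e in confusability_edges(toy)))  # [['a', 'b'], ['b', 'c']]
print(independence_number(toy))  # 2, e.g. the zero-error code {a, c}
```

In this toy channel the inputs a and c share no output, so they form a zero-error code of size two, while b is confusable with both.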
In this paper we will consider the one-shot success probability for sending messages in a given encoding $\{X^{(m)}\}_{m=1}^{q}$ that does not admit a zero-error code, although the classical channel $N$ may admit (smaller) encodings with zero-error codes, i.e., it may be that $\alpha(G(N)) > 1$. Of particular interest in this paper is a family of classical channels that we term Kochen-Specker (KS) channels. A KS channel is defined as any classical channel whose channel hypergraph satisfies the property of KS-uncolourability [27,31], i.e., it is impossible to assign a $\{0, 1\}$-valuation to the vertices such that the assignments in each hyperedge add up to 1 (footnote 6). A KS-uncolourable hypergraph is said to admit a KS set if it is possible to associate its vertices to projectors on a Hilbert space and hyperedges to projector-valued measures (PVMs), i.e., the projectors in any hyperedge are mutually orthogonal and sum up to the identity. Any set of such projectors for a KS-uncolourable hypergraph is called a KS set. We will consider two paradigmatic examples of KS channels, drawing upon Refs. [6,28], only one of which [6] admits a KS set.
Footnote 4: Hence, given any input symbol $x^{(m)} \in X^{(m)}$, the message $m$ can be uniquely inferred from $x^{(m)}$ since $X^{(m)} \cap X^{(m')} = \emptyset$ for all $m' \neq m$. In the interest of efficiency of encoding, we shall only consider encodings where each $X^{(m)}$ ($m \in [q]$) is a clique in $G(N)$.
Footnote 6: What this means in terms of channel confusability is that it is impossible to pick a set of vertices with one vertex from each hyperedge such that all the vertices in the set are mutually non-confusable.
Fig. 2: Here $X = \{00, 01, 10, 11\}$ and $Y = \{(1, 0), (1, 1), (2, 0), (2, 1), (P, 0), (P, 1)\}$. The support of input 01, for example, is given by $Y_{01} = \{(1, 0), (2, 1), (P, 1)\} \subseteq Y$ and the support of output $(P, 0)$, for example, is given by $X_{(P,0)} = \{00, 11\} \subseteq X$. The channel probabilities are given by $N(y|x) = 1/3$ for all $y \in Y_x$ and 0 otherwise for all $x \in X$. The one-bit encoding of $m \in \mathrm{Msg} = \{0, 1\}$ is given by $X^{(0)} = \{00, 01\} \subseteq X$ and $X^{(1)} = \{10, 11\} \subseteq X$. This encoding $\{X^{(0)}, X^{(1)}\}$ does not admit a zero-error code. In fact, a zero-error code doesn't exist for this channel hypergraph since it does not admit even a pair of inputs that are mutually non-confusable, i.e., do not share an edge.
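The properties quoted for the channel of Fig. 2 can be checked mechanically. The sketch below assumes that the support of an input $x = x_1 x_2$ is $\{(1, x_1), (2, x_2), (P, x_1 \oplus x_2)\}$; this rule is an inference from the two supports given in the caption, not something the caption states in general. The helper names are hypothetical, and KS-uncolourability is tested by brute force over all $\{0,1\}$-valuations.

```python
from fractions import Fraction
from itertools import combinations, product

inputs = ["00", "01", "10", "11"]

def support(x):
    # Assumed rule (inferred from the caption): first bit, second bit, parity.
    a, b = int(x[0]), int(x[1])
    return {("1", a), ("2", b), ("P", a ^ b)}

# Channel probabilities N(y|x) = 1/3 on the support, 0 elsewhere.
N = {x: {y: Fraction(1, 3) for y in support(x)} for x in inputs}

# Hyperedges of H(N): one per output y, containing the inputs in X_y.
hyperedges = {}
for x in inputs:
    for y in N[x]:
        hyperedges.setdefault(y, set()).add(x)

# G(N) is complete iff every pair of inputs shares at least one output.
complete = all(set(N[a]) & set(N[b]) for a, b in combinations(inputs, 2))

def ks_colourable(vertices, edges):
    """True if some {0,1}-valuation sums to exactly 1 on every hyperedge."""
    for bits in product((0, 1), repeat=len(vertices)):
        val = dict(zip(vertices, bits))
        if all(sum(val[v] for v in e) == 1 for e in edges):
            return True
    return False

print(complete)                                          # True
print(ks_colourable(inputs, list(hyperedges.values())))  # False
```

Under the assumed support rule, every pair of inputs shares an output, so no zero-error code with two messages exists, and no valuation satisfies all six hyperedges, so the hypergraph is KS-uncolourable.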

Contextuality
We will be interested in the twin notions of preparation and measurement noncontextuality following Spekkens [13]. In a general operational theory, a preparation procedure consists of a source setting, $S$, that prepares an ensemble of possible preparations indexed by source outcome $s \in V_S$, each with probability $p(s|S)$. We denote the ensemble for source setting $S$ by $\{(p(s|S), [s|S])\}_{s \in V_S}$. Formally, we refer to the (abstract) device implementing this preparation procedure as a multisource. A measurement procedure consists of a measurement setting $M$ that yields one of the possible outcomes indexed by $m \in V_M$ with probability $p(m|M, S, s)$ when a system prepared according to $[s|S]$ is input to the measurement device. Formally, we refer to the (abstract) device implementing this measurement procedure as a multimeter. Together, the combination of a source setting $S$ and a measurement setting $M$ yields a conditional joint probability distribution given by $p(m, s|M, S) = p(m|M, S, s)\,p(s|S)$. (See Fig. 3.) Two source settings $S$ and $S'$ are said to be operationally equivalent if $\sum_{s \in V_S} p(m, s|M, S) = \sum_{s' \in V_{S'}} p(m, s'|M, S')$ for all $[m|M]$. (1) Similarly, two measurement events $[m|M]$ and $[m'|M']$ are said to be operationally equivalent if $p(m, s|M, S) = p(m', s|M', S)$ for all $[s|S]$, and we denote this by $[m|M] \simeq [m'|M']$.
In keeping with the original definition of generalized noncontextuality [13], and its further development in subsequent work [10], the notion of operational equivalence-for preparations and for measurements-is evaluated relative to all possible measurement and preparation events (respectively) in the operational theory governing the experiment of Fig. 3.
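An ontological model of the operational theory consists of ontic states $\lambda \in \Lambda$ that are sampled by a preparation $[s|S]$ according to some probability distribution $\mu(\lambda|S, s)$, so that $\mu(\lambda, s|S) = \mu(\lambda|S, s)\,p(s|S)$. Any measurement device responds to the input of an ontic state $\lambda$ according to some probability, $\xi(m|M, \lambda)$, called a response function. The ontological model reproduces the operational statistics as follows: $p(m, s|M, S) = \sum_{\lambda \in \Lambda} \xi(m|M, \lambda)\,\mu(\lambda, s|S)$. The assumption of preparation noncontextuality entails the following implication: if the source settings $S$ and $S'$ are operationally equivalent, then $\mu(\lambda|S) = \mu(\lambda|S')$ for all $\lambda \in \Lambda$, where $\mu(\lambda|S) \equiv \sum_{s \in V_S} \mu(\lambda, s|S)$. The assumption of measurement noncontextuality entails the following implication: if $[m|M] \simeq [m'|M']$, then $\xi(m|M, \lambda) = \xi(m'|M', \lambda)$ for all $\lambda \in \Lambda$. A failure of the joint assumption of preparation and measurement noncontextuality is then said to be a demonstration of contextuality (footnote 7).

3 General protocol for one-shot classical communication assisted by a common-cause resource

3.1 One-shot success probability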
We want to consider the situation where Alice and Bob have access to a shared common-cause resource such as quantum entanglement (but possibly also more general nonclassical common-cause resources [30,33]) which they can use to enhance the one-shot success probability of sending messages through $N$. This situation of entanglement-assisted one-shot classical communication has been studied previously [6,28,29]. We will below take the nonclassical common-cause resource to be quantum entanglement for ease of presentation, but the ideas extend in a straightforward way to post-quantum theories, which correspond, in general, to some convex subset of the set of no-signalling correlations.
The task at hand is the following: Alice and Bob share a classical channel $N$ together with some bipartite quantum system in an entangled state $\rho_{AB}$. Alice wants to send messages from the set $\mathrm{Msg}$ according to a probability distribution $\{p(m)\}_{m \in \mathrm{Msg}}$. To do this for some encoding $\{X^{(m)}\}_{m \in \mathrm{Msg}}$ defined on $N$, Alice implements a POVM $M_m$, with outcomes $x \in X^{(m)}$, on her part, $\rho_A$, of the shared state $\rho_{AB}$, given by $\rho_A = \mathrm{Tr}_B\,\rho_{AB}$. She inputs the outcome $x$ into the classical channel, which yields output $y \in Y$ with probability $N(y|x)$. Using the output $y$ and his part of $\rho_{AB}$, Bob needs to figure out a strategy that will let him infer Alice's choice of measurement (hence the message $m$) with the maximum success probability, i.e., for every set of $\{m, x, y\}$ we want to maximize the probability $p(m' = m|y, m, x)$ that Bob's guess for the message, denoted $m'$, agrees with the message Alice sent.
On receiving the channel output $y$, Bob implements some measurements, say $\{M_v \equiv \{E^{(v)}_z\}_{z \in O}\}_{v \in V}$, on his part of the shared quantum system according to some probability distribution, say $\{p(v|y)\}_{v \in V}$, that depends on $y$. The measurement outcome $z \in O$ occurs with probability $p(z|v, m, x) = \mathrm{Tr}(E^{(v)}_z \rho^{(m)}_x)$, where $\rho^{(m)}_x$ is the state on Bob's side that Alice "steers" to when she obtains outcome $x$ for measurement $m$ with probability $p(x|m)$. Overall, Bob implements the effective POVM $M_y \equiv \sum_v p(v|y) M_v$, given by the set of POVM elements $\{\sum_v p(v|y) E^{(v)}_z\}_{z \in O}$. Bob's guess for the message, $m'$, will then be a function of $z, y$, i.e., $m' = g(z, y)$, where $g$ is a function from $O \times Y$ to $\mathrm{Msg}$. We have a successful decoding when $m' = m$. This effectively defines the overall measurement that Bob implements on receiving $y$, with outcomes $m' \in \mathrm{Msg}$. The overall success probability is thus given by $S = \sum_{m \in \mathrm{Msg}} p(m) \sum_{x \in X^{(m)}} p(x|m) \sum_{y \in Y} N(y|x)\, p(m' = m|y, m, x)$. In situations where Bob can do this perfectly (the zero-error regime), i.e., $S = 1$, we obtain an exact simulation of a noiseless classical channel $\mathrm{Id}$ with the same input and output alphabet $\mathrm{Msg}$ with channel probabilities $\mathrm{Id}(m'|m) = \delta_{m,m'}$ by using the noisy classical channel $N$ and (potentially) some shared common-cause resource. A schematic of the protocol is provided in Fig. 1.
Footnote 7: The assumption of noncontextuality is an instance of the Leibnizian idea of the ontological identity of operational indiscernibles [32]: operationally equivalent experimental procedures admit ontologically equivalent representations under this assumption.
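As a rough illustration of how the success probability is assembled, the sketch below averages the probability of a correct guess over the message prior, Alice's outcome distribution, and the channel. It assumes that $S$ takes exactly this averaged form; the function name, the argument layout, and the two-symbol noiseless channel are all hypothetical choices made for the example.

```python
from fractions import Fraction

def success_probability(p_m, p_x_given_m, N, p_correct):
    """Average of p(m' = m | y, m, x) over p(m), p(x|m) and N(y|x).

    p_m[m]            : prior probability of message m
    p_x_given_m[m][x] : probability that Alice's measurement for m yields x
    N[x][y]           : channel probability N(y|x)
    p_correct(y, m, x): probability that Bob's guess equals m
    """
    total = Fraction(0)
    for m, pm in p_m.items():
        for x, px in p_x_given_m[m].items():
            for y, pyx in N[x].items():
                total += pm * px * pyx * p_correct(y, m, x)
    return total

# Hypothetical noiseless channel on two symbols, one symbol per message.
half = Fraction(1, 2)
p_m = {0: half, 1: half}
p_x_given_m = {0: {"a": Fraction(1)}, 1: {"b": Fraction(1)}}
N = {"a": {"a": Fraction(1)}, "b": {"b": Fraction(1)}}

print(success_probability(p_m, p_x_given_m, N, lambda y, m, x: 1))     # 1
print(success_probability(p_m, p_x_given_m, N, lambda y, m, x: half))  # 1/2
```

The first call corresponds to the zero-error regime, where Bob always decodes correctly; the second corresponds to Bob ignoring everything and guessing uniformly at random.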

KS channels that admit KS sets: Cubitt et al. strategy
For some KS channels admitting KS sets (e.g., Fig. 4), the Cubitt et al. strategy [6] corresponds to a particular choice of measurements for Alice and Bob within the general protocol of Section 3.1. As shown in Ref. [6], such channels admit an enhancement of their one-shot zero-error classical capacity in the presence of shared entanglement, i.e., their one-shot success probability is $S = 1$ for some $q > \alpha(G(N))$. We will discuss these channels in more detail in Section 7.

KS channels that do not admit KS sets
In general, a KS channel may not admit any KS set and in that case the strategy of Cubitt et al. [6] does not apply. An example of a KS channel that does not admit a KS set was studied for its one-shot success probability by Prevedel et al. [28]. This example fits within the general protocol described in Section 3.1 and will be discussed in Section 6.
4 Quantum advantage in one-shot classical communication

Classical one-shot success probability
Classically, the only information that Bob has about Alice's measurement and its outcome, i.e., $m$ and $x$, is mediated by the output of the channel, $y$. We therefore have $p(m'|y, m, x) = p(m'|y)$, i.e., $m'$ is conditionally independent of $m$ and $x$, given $y$. Shared randomness does not help because it only amounts to a convex mixture of deterministic classical strategies (indexed by, say, $c \in C$) according to some probability distribution $\{p(c)\}_{c \in C}$ and no such convex mixture can do better than the best deterministic classical strategy. The classical one-shot success probability is therefore given by $S_{\rm Cl} = \sum_{m \in \mathrm{Msg}} p(m) \sum_{x \in X^{(m)}} p(x|m) \sum_{y \in Y} N(y|x)\, p(m' = m|y)$, with a tight upper bound, $S^{\max}_{\rm Cl}$, attained by the best deterministic classical strategy, so that $S_{\rm Cl} \leq S^{\max}_{\rm Cl}$.
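For a small channel, the best deterministic classical strategy can be found by exhaustive search. The sketch below does this for the channel and one-bit encoding of Fig. 2 with a uniform prior. It assumes that a deterministic strategy consists of Alice sending a fixed symbol $x_m \in X^{(m)}$ for each message and Bob guessing from $y$ alone, and that the supports follow the rule inferred from the caption of Fig. 2; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def support(x):
    # Assumed rule (inferred from the Fig. 2 caption): first bit, second bit, parity.
    a, b = int(x[0]), int(x[1])
    return [("1", a), ("2", b), ("P", a ^ b)]

N = {x: {y: Fraction(1, 3) for y in support(x)} for x in ("00", "01", "10", "11")}
encoding = {0: ["00", "01"], 1: ["10", "11"]}
p_m = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def best_classical(N, encoding, p_m):
    """Maximise over deterministic encoders; the decoder is optimised in closed form."""
    outputs = {y for row in N.values() for y in row}
    msgs = list(encoding)
    best = Fraction(0)
    for choice in product(*(encoding[m] for m in msgs)):
        # For a fixed encoder x_m, the best guess on seeing y is the message
        # maximising p(m) N(y|x_m), contributing that maximum to the total.
        s = sum(
            max(p_m[m] * N[x].get(y, Fraction(0)) for m, x in zip(msgs, choice))
            for y in outputs
        )
        best = max(best, s)
    return best

print(best_classical(N, encoding, p_m))  # 5/6
```

Under these assumptions every pair of code symbols shares exactly one of the six outputs, which is why one sixth of the probability mass is unavoidably lost.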

Preparation contextuality drives the quantum advantage
In the protocol we have described, the following operational equivalences hold: $\sum_{x \in X^{(m_1)}} p(x|m_1)\,\rho^{(m_1)}_x = \sum_{x \in X^{(m_2)}} p(x|m_2)\,\rho^{(m_2)}_x \equiv \rho_B$ for all pairs of distinct messages $m_1, m_2 \in \mathrm{Msg}$ (footnote 8). This follows from the fact that the common-cause correlations shared between Alice and Bob must be nonsignalling: Alice's choice of POVM $M_m$ encoding the message $m$ ($m \in \mathrm{Msg}$) steers Bob's system to the ensemble of states $\{(p(x|m), \rho^{(m)}_x)\}_{x \in X^{(m)}}$; however, on coarse-graining, the reduced state on Bob's side, $\rho_B = \mathrm{Tr}_A\,\rho_{AB}$, is the same for all choices of $m$, and thus the common-cause correlations cannot be used by Bob to infer $m$.
Preparation noncontextuality then entails that $p(\lambda|m_1) = p(\lambda|m_2) \equiv p(\lambda)$ for all $\lambda \in \Lambda$ and all $m_1, m_2 \in \mathrm{Msg}$, where $\Lambda$ is the ontic state space of Bob's system. Given the prior distribution $\{p(m)\}_{m \in \mathrm{Msg}}$ and the channel probabilities $\{N(y|x)\}_{y,x}$, we obtain, under the assumption of preparation noncontextuality for Bob's system, the following expression for the one-shot success probability and the preparation noncontextual upper bound on it: $S_{\rm PNC} = \sum_{\lambda \in \Lambda} p(\lambda) \sum_{m \in \mathrm{Msg}} p(m) \sum_{x \in X^{(m)}} p(x|m, \lambda) \sum_{y \in Y} N(y|x)\, p(m' = m|y, \lambda) \leq S^{\max}_{\rm PNC}$.
Footnote 8: Although we are using quantum notation here, operationally, this prepare-and-measure setup on Bob's system (see Fig. 3) can be viewed as a multisource with settings $S = m \in \mathrm{Msg}$ and outcomes $s = x \in X$ that occur with probability $p(s|S) = p(x|m)$. The operational equivalences can then be expressed as in Eq. 1: for every measurement event $[m|M]$, the statistics obtained after summing over the source outcomes $x$ are the same for all source settings. The multimeter has settings $M$ and outcomes $m$ that range over the set of all measurement events in the operational theory. Note, however, that as long as we assume the shared common-cause resource is non-signalling (as is the case with entangled states in quantum theory), we do not need to explicitly verify these operational equivalences by varying over all measurement events: these equivalences are implied by the non-signalling nature of the shared common-cause resource.
To see how this comes about, note that the one-shot success probability, when expressed in terms of an ontological model for Bob's system, requires that $p(m'|y, m, x) = \sum_\lambda p(m'|y, \lambda)\,p(\lambda|m, x)$. We can then write a joint probability distribution $p(x, \lambda|m) = p(\lambda|m, x)\,p(x|m)$, which can be rewritten as $p(x, \lambda|m) = p(x|m, \lambda)\,p(\lambda|m)$. Recalling that preparation noncontextuality requires $p(\lambda|m) = p(\lambda)$ for all $m$, we obtain the expression for $S_{\rm PNC}$. Hence, we have: $S^{\max}_{\rm PNC} = S^{\max}_{\rm Cl}$.
We now argue that the upper bound $S^{\max}_{\rm PNC}$ can be saturated by a preparation noncontextual ontological model. Such a model (achieving $S = S^{\max}_{\rm PNC}$) has $p(\lambda) = \delta_{\lambda,\lambda_{\max}}$ for some $\lambda_{\max} \in \Lambda$. Thus, $p(\lambda|m) = \sum_x p(x|m)\,p(\lambda|m, x) = \delta_{\lambda,\lambda_{\max}}$ implies that $p(\lambda|m, x) = \delta_{\lambda,\lambda_{\max}}$ for all $m, x$. We must then have $p(m'|y, m, x) = \sum_\lambda p(m'|y, \lambda)\,p(\lambda|m, x) = \sum_\lambda p(m'|y, \lambda)\,\delta_{\lambda,\lambda_{\max}} = p(m'|y, \lambda_{\max})$, i.e., the statistics of $m'$ given $y$ do not change in response to variations in $m$ and $x$ but are directly determined by the ontic state that is deterministically sampled by every preparation procedure. In fact, the response functions cannot even deviate from the best deterministic classical strategy. This preparation noncontextual ontological model, therefore, trivially reproduces the operational equivalence required on Bob's system, i.e., the operational equivalence between all the coarse-grainings of preparation ensembles induced by Alice's measurements. It also achieves $S = S^{\max}_{\rm PNC}$ by fixing the response functions on Bob's side to mimic the best deterministic classical strategy (footnote 10).
Hence, we have that $S \leq S^{\max}_{\rm Cl}$ is a preparation noncontextuality inequality and any quantum advantage in this communication task witnesses preparation contextuality.
Footnote 10: Hence, the ontological model can only simulate an operational theory that has just one equivalence class of preparations and, furthermore, associates outcomes to its measurements deterministically. As such, in the presence of other empirical facts that an operational theory might present (such as the simple fact that Bob's system can be prepared in operationally inequivalent ways), this preparation noncontextual model will fail to reproduce predictions of the theory that go beyond the required operational equivalence between preparation procedures. Generically, therefore, any non-trivial (at least in the sense of admitting operationally inequivalent preparation procedures) preparation noncontextual ontological model will only achieve $S < S^{\max}_{\rm PNC}$.

The shared state must violate a Bell inequality under the local measurements used in the communication protocol
In a quantum implementation of the communication protocol, shared entanglement between Alice and Bob is crucial for there to be an advantage over the classical one-shot success probability. However, entanglement alone is not enough: the entanglement must be such that it enables a Bell inequality violation relative to the local measurements that Alice and Bob implement. To see this, consider a nonlocal game that uses the same resources (shared entanglement and local measurements) as the communication protocol but under spacelike separation (hence no classical channel): Alice and Bob implement their local measurements labelled by $m$ and $y$, respectively, and obtain their respective outcomes $x$ and $m'$ with a joint probability $p(x, m'|m, y)$. Suppose the joint statistics thus collected admits a locally causal model, i.e., $p(x, m'|m, y) = \sum_{\omega \in \Omega} p(x|m, \omega)\,p(m'|y, \omega)\,p(\omega)$, (13) where $\omega$ denotes the shared ontic state sampled from the ontic state space $\Omega$ of the bipartite system Alice and Bob share. (Note that this is, in general, different from the ontic state space $\Lambda$ for Bob's system alone.) In such a case, it is straightforward to see that the achievable success probability is no better than the best deterministic classical strategy. Firstly, $p(m'|y, m, x) = p(x, m'|m, y)/p(x|m)$, so that $S = \sum_{m \in \mathrm{Msg}} p(m) \sum_{x \in X^{(m)}} \sum_{y \in Y} N(y|x)\, p(x, m' = m|m, y)$, which can now be expressed as $S = \sum_{\omega \in \Omega} p(\omega) \sum_{m \in \mathrm{Msg}} p(m) \sum_{x \in X^{(m)}} p(x|m, \omega) \sum_{y \in Y} N(y|x)\, p(m' = m|y, \omega) \leq S^{\max}_{\rm Cl}$, since each $\omega$ labels a classical strategy. Hence, Bell nonlocality of the shared entangled state relative to the measurements carried out by Alice and Bob is a necessary condition for a quantum advantage in this communication task (footnote 11).
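The claim that locally causal correlations cannot beat the classical bound can be probed numerically. The sketch below samples random locally causal models of the form in Eq. (13) for the channel and one-bit encoding of Fig. 2 (uniform prior, supports inferred from the caption) and records the largest success probability found. The value 5/6 used for comparison is the classical maximum under those same assumptions, not a number quoted from the text, and the function names and sample sizes are arbitrary choices.

```python
import random

ENCODING = {0: ["00", "01"], 1: ["10", "11"]}

def support(x):
    # Assumed rule (inferred from the Fig. 2 caption): first bit, second bit, parity.
    a, b = int(x[0]), int(x[1])
    return [("1", a), ("2", b), ("P", a ^ b)]

OUTPUTS = sorted({y for xs in ENCODING.values() for x in xs for y in support(x)})

def random_dist(keys, rng):
    """A random probability distribution over the given keys."""
    weights = [rng.random() for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}

def local_model_success(rng, n_omega=4):
    """Success probability of one randomly sampled locally causal model."""
    p_omega = random_dist(list(range(n_omega)), rng)
    s = 0.0
    for _, pw in p_omega.items():
        alice = {m: random_dist(ENCODING[m], rng) for m in ENCODING}  # p(x|m, omega)
        bob = {y: random_dist([0, 1], rng) for y in OUTPUTS}          # p(m'|y, omega)
        for m in ENCODING:
            for x, px in alice[m].items():
                for y in support(x):
                    # p(omega) p(m) p(x|m, omega) N(y|x) p(m' = m|y, omega)
                    s += pw * 0.5 * px * (1.0 / 3.0) * bob[y][m]
    return s

rng = random.Random(0)
largest = max(local_model_success(rng) for _ in range(2000))
print(largest <= 5 / 6 + 1e-12)  # True
```

Random sampling only illustrates the bound and does not prove it; the argument in the text is what establishes that every locally causal model is a mixture of classical strategies.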

Preparation contextuality vis-à-vis Bell nonlocality: the connection with nonlocal games
It is known that any bipartite proof of Bell nonlocality can be turned into a proof of preparation contextuality on each wing of the Bell experiment, i.e., Bell nonlocality implies preparation contextuality on both wings of the Bell experiment. It is easiest to see this in the contrapositive: that is, the existence of a preparation noncontextual ontological model on any wing of the Bell experiment implies the existence of a locally causal model for the Bell experiment. We provide an explicit argument in Appendix A (footnote 12). For the task of enhancing the one-shot success probability of a classical channel, preparation contextuality and Bell nonlocality are even more intimately related than the general situation above. As we just showed in Section 4.3, $S > S^{\max}_{\rm Cl}$ implies Bell nonlocality of the joint statistics $\{p(x, m'|m, y)\}_{x, m', m, y}$. This allows us to state the following proposition:
Proposition 1. For every classical channel $N$ that admits an enhancement of the one-shot success probability driven by preparation contextuality, i.e., $S > S^{\max}_{\rm Cl}$, there exists a nonlocal game which can be won with a better-than-classical success probability by the same entangled state and local measurements which enable an advantage in the communication task. By construction, we also have the converse: an advantage in this nonlocal game would imply an advantage in the communication task.
Footnote 11: Note, however, that, contrary to the counterfactual Bell scenario we just considered, the measurement choice $y$ in the communication protocol of interest is not free and is determined probabilistically by Alice's measurement outcome $x$. That is, the protocol requires a wiring of Alice's system with Bob's using the classical channel, something distinct from a Bell scenario.
Footnote 12: On the other hand, in the simplest possible scenario capable of exhibiting preparation contextuality (with two tomographically complete binary measurements and four preparations), it has been shown that the existence of a preparation noncontextual ontological model is equivalent to the existence of a locally causal model in any bipartite extension of the one-party scenario to a CHSH scenario [34]. Ref. [34] also noted that Barrett was the first to show the implication from preparation noncontextuality to local causality in any bipartite extension of a given one-party scenario. This has also been observed in Ref. [35].
Indeed, if this were not the case (i.e., no such nonlocal game existed) then the enhancement of the one-shot success probability couldn't have been exhibited because the shared correlations between Alice and Bob would then be Bell-local. Hence, the problem of one-shot classical communication assisted by non-signalling correlations characterizes a family of Bell scenarios where a proof of preparation contextuality on one wing implies a proof of Bell nonlocality between the two wings. This provides further insight into the conditions under which preparation contextuality for a single system can be said to imply Bell nonlocality for its appropriate bipartite extensions, in line with previous work where a certain type of preparation contextuality was shown to imply Bell nonlocality relative to a bipartite extension [36] (footnote 13).
The explicit construction of a nonlocal game instantiating Proposition 1, however, would depend on the properties of the channel $N$. We know that at least in the case of the Cubitt et al. protocol under ideal conditions, these nonlocal games correspond to pseudo-telepathy (PT) games inspired by the KS theorem [5,6,39]. In the case of the Prevedel et al. example [28], the associated nonlocal game is essentially the well-known CHSH game [2,40]. It is an open question whether there exists a generic construction of a nonlocal game, instantiating Proposition 1, that always works starting from any channel $N$.
We can, however, provide a fairly general construction of nonlocal games starting from a family of classical channels, thus instantiating Proposition 1. This general construction, in particular, reproduces as special cases the examples studied in Refs. [6,28]. It is inspired by the pseudo-telepathy (PT) game discussed in Ref. [6], allowing, however, the case where the quantum strategy is imperfect and where neither Alice nor Bob might have access to a KS set. We define this mapping from the communication task to a nonlocal game below.
Footnote 13: In the case of bipartite pure entangled states of Schmidt rank greater than two (such as the two-ququart maximally entangled state used in Ref. [6]), it has been shown that preparation contextuality of the reduced state on Bob's system, in conjunction with Alice's ability to remotely steer Bob's system to arbitrary preparation ensembles using entanglement [37], implies that the entangled state can exhibit Bell nonlocality [38]. So, at least in the case of such pure entangled states, the implication from preparation contextuality to Bell nonlocality that we consider in this paper also follows from Ref. [38].
The family of classical channels for which our construction works satisfies two properties for any channel N in the family: first, its channel hypergraph H(N) is k-regular (i.e., every vertex appears in k hyperedges) for some positive integer k, and second, the channel probabilities are entirely fixed by the combinatorial structure of the channel, i.e., $N(y|x) = \frac{1}{|Y_x|}\delta(y \in Y_x)$, where $\delta(a \in A)$ is the indicator function for membership in the set $A$, taking value 1 if $a \in A$ and 0 otherwise. Channels satisfying the first property will be called k-regular channels and those satisfying the second property will be called output-uniform channels. Hence, the classical channels we consider below will be k-regular and output-uniform classical channels. Further, we will assume that Alice's choice of the message to send in a particular run is uniformly random, i.e., $p(m) = \frac{1}{|\mathrm{Msg}|}$ for all m ∈ Msg. All this amounts to the following expression for the one-shot success probability:

The corresponding nonlocal game is specified by the following: Alice receives questions m ∈ Msg and replies with answers x ∈ X; Bob receives questions y ∈ Y and replies with answers $m' \in \mathrm{Msg}$; the conditional joint probability distributions of interest, therefore, are given by $\{p(x, m'|m, y)\}_{x,m',m,y}$; the Referee sends them questions m, y according to the probability distribution $p(m, y) = p(m)p(y) = \frac{1}{|\mathrm{Msg}|}\frac{1}{|Y|}$; in order to win the game, Alice and Bob must produce outputs $x, m'$ (respectively) such that the condition $V(x, m', m, y) = 1$ is satisfied, where

The probability of winning the game is then given by

Note that this mapping relies only on combinatorial properties of N, namely, its channel hypergraph H(N), and is a straightforward generalization of the connection between the one-shot classical communication protocol and pseudo-telepathy games noted in Proposition 3 of Ref. [6]. The connection of the protocol of Ref. [28] with the CHSH game, for example, falls under this generalization.
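The two channel properties defined above can be illustrated with a minimal Python sketch. It builds the output-uniform channel $N(y|x) = \delta(y \in Y_x)/|Y_x|$ from a channel hypergraph and checks k-regularity. The toy hypergraph (a triangle with three two-vertex hyperedges) and all function names are hypothetical choices made for illustration; they are not taken from the channels studied in this paper.

```python
from fractions import Fraction


def outputs_per_input(hyperedges):
    """Return Y_x: for each input x, the set of outputs y with x in X_y."""
    Y = {}
    for y, X_y in hyperedges.items():
        for x in X_y:
            Y.setdefault(x, set()).add(y)
    return Y


def regularity(hyperedges):
    """Return k if every vertex lies in exactly k hyperedges, else None."""
    degrees = {len(ys) for ys in outputs_per_input(hyperedges).values()}
    return degrees.pop() if len(degrees) == 1 else None


def output_uniform_channel(hyperedges):
    """N(y|x) = 1/|Y_x| if y is in Y_x, and 0 otherwise."""
    Y = outputs_per_input(hyperedges)
    return {
        (y, x): (Fraction(1, len(Y[x])) if y in Y[x] else Fraction(0))
        for x in Y
        for y in hyperedges
    }


# Hypothetical toy hypergraph: each output y labels a hyperedge X_y of inputs.
toy = {"y0": {0, 1}, "y1": {1, 2}, "y2": {2, 0}}
k = regularity(toy)                # every vertex lies in 2 hyperedges
N = output_uniform_channel(toy)    # N[(y, x)] is the probability N(y|x)
```

Exact rational arithmetic (`Fraction`) is used so that normalization of each row of the channel can be checked without floating-point tolerance.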
We are now ready to prove the following theorem:

L being the set of Bell-local probability distributions. We use "$p(x, m'|m, y) \in L$" as shorthand for membership of the full probability vector $(p(x, m'|m, y))_{x,m',m,y}$ in the set of Bell-local probability vectors [40].
Theorem 1 gives us a way to characterize a family of classical channels N for which the one-shot success probability can be enhanced by nonsignalling correlations: namely, all k-regular and output-uniform classical channels for which there is a gap between the classical and the nonsignalling value of the nonlocal game they define via the recipe we have just outlined, cf. Eq. (17).
It is worth emphasizing here the physical distinction between the two tasks we have considered in this section: the one-shot communication task and the corresponding nonlocal game. In the communication task, Alice and Bob must necessarily be timelike separated, but in the nonlocal game, they must necessarily be spacelike separated. In the absence of spacelike separation in the communication task, it is inaccurate to state that Bell nonlocality drives the quantum advantage in the task: to be sure, the states and measurements that drive the quantum advantage in the communication task can also drive the quantum advantage in the nonlocal game, but the two tasks correspond to fundamentally different physical situations. Contextuality, for this reason, is the more natural notion of nonclassicality to appeal to as the driver of quantum advantage in the communication task.

6 One-shot success probability of communicating a single bit

The problem of communicating a single bit through a noisy classical channel has been previously studied in Refs. [28,29]. Note that the classical value of the one-shot success probability of communicating a single bit (i.e., one out of two messages) in this problem is strictly less than 1 if and only if the confusability graph of the channel N is a complete graph, i.e., α(G(N)) = 1. Hence, it is only for such channels that the possibility of enhancing their one-shot success probability of sending a single bit using shared entanglement exists: for all other channels (with α(G(N)) ≥ 2), a single bit can always be sent with zero error classically. In the rest of this section, therefore, we will only consider output-uniform channels with a confusability graph that is complete. These channels are a special case of Kochen-Specker (KS) channels, namely, those where it is impossible to pick even a pair of non-confusable vertices from distinct hyperedges (see footnote 15).
Such KS channels obviously do not admit any KS sets since any set of projectors associated with their vertices will necessarily have to be pairwise commuting, i.e., there would be no incompatibility between the projectors. Our general protocol applies to such channels and here we will consider one particular example, the one studied in Ref. [28], as a paradigmatic case and show how it fits within our framework.
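The criterion stated above (a single bit can be sent classically with zero error exactly when α(G(N)) ≥ 2) can be checked by brute force for small channels. The sketch below assumes the usual construction of the confusability graph, in which two inputs are adjacent when they share at least one channel output; the two toy hypergraphs and the function names are hypothetical, and the search is exponential in the number of inputs, so it is only meant for small examples.

```python
from itertools import combinations


def confusability_edges(hyperedges):
    """Edges of G(N): pairs of inputs that share at least one output y."""
    edges = set()
    for X_y in hyperedges.values():
        for a, b in combinations(sorted(X_y), 2):
            edges.add((a, b))
    return edges


def independence_number(vertices, edges):
    """Largest number of pairwise non-confusable inputs (brute force)."""
    vertices = sorted(vertices)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(pair not in edges for pair in combinations(subset, 2)):
                return size
    return 0


def can_send_bit_with_zero_error(hyperedges):
    """True when alpha(G(N)) >= 2, i.e. G(N) is not a complete graph."""
    vertices = set().union(*hyperedges.values())
    return independence_number(vertices, confusability_edges(hyperedges)) >= 2


# Hypothetical toy channels: a triangle (complete confusability graph)
# and a path (inputs 0 and 2 never share an output).
triangle = {"y0": {0, 1}, "y1": {1, 2}, "y2": {2, 0}}
path = {"y0": {0, 1}, "y1": {1, 2}}
```

For the triangle every pair of inputs is confusable, so it falls in the class of channels considered in this section; for the path, inputs 0 and 2 can encode a bit without error.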
The general protocol of Section 3 takes the following

Footnote 15: Recall that a KS channel is defined by a channel hypergraph where it is impossible to pick a set of vertices, one vertex from each hyperedge, such that all the vertices in this set are mutually non-confusable.
Alice carries out one of two possible measurements labelled by m ∈ {0, 1}, their outcomes labelled by $b_2 \in \{0, 1\}$. On obtaining outcome $b_2$ for measurement m, Alice inputs the two-bit string $x = m b_2$ to the channel. Bob possesses one of two possible binary measurements labelled by v ∈ {0, 1} (their outcomes labelled by z ∈ {0, 1}) and must use the output y from the classical channel (which gives him some information about the possible inputs $X_y$) to decide his measurement strategy in order to infer Alice's message m. The full strategy is detailed below: v ∈ {0, 1}, $p(m' = m|z, y) = \delta_{g(z,y),m}$.
Assuming $p(m) = \frac{1}{2}$, the expression for the success probability is given by

We refer to Appendix C for a complete derivation of the above expression. Now, the joint statistics $p(b_2, z|m, v)$ can be interpreted as arising from a Bell-CHSH scenario [2], i.e., a Bell scenario where each party has two possible binary-outcome measurements (m and v in this case), noting that the statistics are non-signalling. We can then define the success probability in such a CHSH game [40] as

so that

We have

And, of course, on allowing arbitrary nonsignalling correlations, a PR-box [40,41] can achieve $S = S_{\mathrm{PR}} = 1$.
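As a rough illustration of the CHSH game referred to above, the following Python sketch enumerates all deterministic local strategies and also evaluates a PR-box. It assumes the standard textbook form of the game (uniformly random inputs s, t and winning condition a XOR b = s AND t); the labelling used in this paper (settings m, v and outcomes $b_2$, z) may differ from this convention by a relabelling, and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def chsh_win_probability(p):
    """Winning probability for behaviour p(a, b, s, t) with uniform inputs."""
    total = Fraction(0)
    for s, t, a, b in product((0, 1), repeat=4):
        if (a ^ b) == (s & t):          # assumed winning condition
            total += Fraction(1, 4) * p(a, b, s, t)
    return total


def classical_value():
    """Maximum over deterministic local strategies a = f[s], b = g[t]."""
    best = Fraction(0)
    for f in product((0, 1), repeat=2):
        for g in product((0, 1), repeat=2):
            def p(a, b, s, t, f=f, g=g):
                return Fraction(int(a == f[s] and b == g[t]))
            best = max(best, chsh_win_probability(p))
    return best


def pr_box(a, b, s, t):
    """PR-box: uniformly random outputs that always satisfy a XOR b = s AND t."""
    return Fraction(1, 2) if (a ^ b) == (s & t) else Fraction(0)
```

Since shared randomness only mixes deterministic strategies, the enumeration gives the full classical value of 3/4, while the PR-box wins with certainty. The quantum (Tsirelson) value, cos²(π/8) ≈ 0.854, lies between the two and is not computed here.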
How is the CHSH game related to the nonlocal game corresponding to the Prevedel et al. protocol that one would obtain following Theorem 1? In Appendix D, we show that, in fact, this nonlocal game is essentially the CHSH game rewritten in such a way that Alice has two inputs while Bob has six.

In this section we will go beyond channels with complete confusability graphs (for which one-shot zero-error communication is impossible) and consider general Kochen-Specker (KS) channels. Of particular interest will be KS channels that admit KS sets: for some of these channels it is possible to achieve an enhancement of the one-shot zero-error capacity using entanglement, e.g., in Ref. [6], Cubitt et al. showed that one can use a classical channel based on Peres's 24-ray two-qubit KS set [42] which also underlies the Peres-Mermin proof of KS-contextuality [43,44].
We will focus on the one-shot success probability that can be achieved using the Cubitt et al. [6] strategy when Bob only assumes the structure of the channel hypergraph, H(N ), and his knowledge of the encoding Alice uses but makes no assumptions about the exact channel probabilities {N (y|x)} x∈X,y∈Y . This could, for example, happen when Alice and Bob trust the channel hypergraph but they do not trust the channel probabilities of the classical channel given by some provider.
Recall that the one-shot success probability following the Cubitt et al. strategy is given by

Bob uses his knowledge of the channel hypergraph, H(N), and the output received from the channel, y, along with any nonsignalling correlations shared with Alice, to make a guess $x' \in X_y$ for Alice's input x. Our assumption that Bob is oblivious of the channel probabilities means that the probability with which Bob makes his guess, $p(x'|y, m, x)$, should be independent of the particular $y \in Y_{x'}$ (the "context" of the guess $x'$) that Bob receives. It can depend on y only through the support of $x'$, via the indicator function $\delta(y \in Y_{x'})$, for any $x' \in X$ (Eq. (30)). We term this condition context-independent guessing (CIG). We will see that the quantum strategy of the Cubitt et al. protocol [6] satisfies this constraint and this fact allows us to invoke the assumption of measurement noncontextuality in addition to preparation noncontextuality in placing a noncontextual upper bound on the one-shot success probability. This will in turn allow us to analyze the Cubitt et al. construction and the critical role of the KS theorem in it in the light of generalized noncontextuality à la Spekkens [13]. More concretely, we will see that a non-trivial upper bound on the classical one-shot success probability in this protocol can be characterized by a hypergraph invariant, the weighted max-predictability [18], following the approach of Ref. [27]. On the other hand, note that classically we have $p(x'|y, x, m) = p(x'|y)$ for all $x \in X^{(m)}$, m ∈ Msg and the CIG constraint of Eq. (30) (in a classical strategy) then requires that $p(x'|y) = p(x')\delta(y \in Y_{x'})$. That is, for any $x' \in X$: $p(x'|y_1, x, m) = p(x'|y_2, x, m) \equiv p(x')$ for all $y_1, y_2 \in Y_{x'}$. Indeed, in a classical strategy for this communication task, any assignment p : X → [0, 1] respecting the context-independence property defines what is usually called a (general) probabilistic model when the channel hypergraph is viewed as a contextuality scenario [27,31].
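The classical form of the CIG constraint, $p(x'|y) = p(x')\delta(y \in Y_{x'})$, can be made concrete with a small Python sketch. It assumes the standard definition of a probabilistic model on a contextuality scenario (an assignment p : X → [0, 1] that sums to 1 on every hyperedge), which is not spelled out in the passage above; the toy hypergraph and the function names are hypothetical.

```python
from fractions import Fraction


def is_probabilistic_model(p, hyperedges):
    """Check p(x) in [0, 1] and normalization of p on every hyperedge X_y."""
    in_range = all(0 <= p[x] <= 1 for x in p)
    normalized = all(sum(p[x] for x in X_y) == 1 for X_y in hyperedges.values())
    return in_range and normalized


def cig_guess(p, hyperedges):
    """Bob's guessing rule p(x'|y) = p(x') * delta(x' in X_y)."""
    return {
        (x, y): (p[x] if x in X_y else Fraction(0))
        for y, X_y in hyperedges.items()
        for x in p
    }


# Hypothetical toy scenario: a triangle of two-vertex hyperedges.
triangle = {"y0": {0, 1}, "y1": {1, 2}, "y2": {2, 0}}
half = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(1, 2)}
deterministic = {0: Fraction(1), 1: Fraction(0), 2: Fraction(0)}

guess = cig_guess(half, triangle)
```

In this toy scenario the uniform assignment is a valid model, whereas the deterministic assignment fails normalization on the hyperedge {1, 2}. The probability that Bob guesses a given $x'$ is the same in every context y that contains it, which is the content of the CIG constraint.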
7.2 One-shot success probability of a KS channel under the CIG constraint: contextuality and quantum advantage

As our working example, we will consider the same classical channel considered by Cubitt et al., cf. Fig. 4. This channel admits a KS set, i.e., a set of projectors on a 4-dimensional quantum system, each projector associated with a vertex in the channel hypergraph such that each hyperedge constitutes a projective measurement.

The one-shot success probability with context-independent guessing
The one-shot success probability is given by

For any two symbols $x, x' \in X$, we quantify the confusability of $x'$ with respect to $x$ via the function

This is the probability that, for input $x$, the channel N yields an output that could also arise from the input $x'$. Obviously, η(x, x) = 1 for all x ∈ X.
where we used the fact that η(x, x) = 1. Defining

we have that $S = S_{\mathrm{perf}} + S_{\mathrm{imperf}}$. Here $S_{\mathrm{perf}}$ denotes the contribution to the success probability from the situation where Bob guesses Alice's input x to the channel exactly (i.e., $x' = x$, and therefore also infers m correctly) and $S_{\mathrm{imperf}}$ denotes the remaining contribution to the success probability from the situation where Bob doesn't guess x correctly (i.e., $x' \neq x$) but nevertheless infers m correctly from $x'$ (i.e., $x' \in X^{(m)}$).
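The confusability function η used above can be computed directly from the channel hypergraph when the channel is output-uniform. The Python sketch below reads the verbal definition (the probability that, for input x, the channel yields an output that could also arise from $x'$) as $\eta(x, x') = \sum_{y \in Y_x \cap Y_{x'}} N(y|x)$, which for an output-uniform channel equals $|Y_x \cap Y_{x'}|/|Y_x|$. This reading is an assumption, chosen to agree with the k-regular expression given later in the paper; the toy hypergraphs and the function name are hypothetical.

```python
from fractions import Fraction


def confusability(hyperedges, x, x_prime):
    """eta(x, x') = |Y_x ∩ Y_x'| / |Y_x| for an output-uniform channel.

    Assumes x appears in at least one hyperedge, so that Y_x is non-empty.
    """
    Y_x = {y for y, X_y in hyperedges.items() if x in X_y}
    Y_xp = {y for y, X_y in hyperedges.items() if x_prime in X_y}
    return Fraction(len(Y_x & Y_xp), len(Y_x))


# Hypothetical toy channels (not the channel of Fig. 4).
triangle = {"y0": {0, 1}, "y1": {1, 2}, "y2": {2, 0}}
path = {"y0": {0, 1}, "y1": {1, 2}}
```

In the triangle every pair of distinct inputs shares exactly one of the two outputs available to each input, so η is the same for all distinct pairs; in the path, inputs 0 and 2 share no output and are therefore not confusable.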
Recalling the source-measurement correlation function Corr studied in Ref. [27], we have $\mathrm{Corr} \equiv \sum_m p(m) \sum_{x \in X^{(m)}} p(x, x' = x|m)$. (38) In the context of Ref. [27], this correlation function captures how predictable the measurements corresponding to the hyperedges m can be made when one varies over corresponding preparation ensembles (also labelled by m) that average to the same mixed state. In the ideal case of projective measurements from Peres' 24-vector KS set, this quantity is 1, since every measurement is perfectly predictable when the input state is picked from its orthonormal basis and the uniform mixture over input states from any such basis is the maximally mixed state. However, in a noncontextual ontological model, this quantity is upper bounded by a hypergraph invariant [27] that will turn out to be relevant for the one-shot success probability in the following subsections.
We then have the following bounds on S:

where

Here $\eta_{\min}$ is the minimum confusability between any two symbols in any message in the encoding. Similarly, $\eta_{\max}$ is the maximum confusability between any two symbols in any message in the encoding. We have $0 < \eta_{\min} \leq \eta_{\max} < 1$.

7.2.2 The assumption of measurement noncontextuality and how any noncontextual strategy satisfies context-independent guessing

In the Cubitt et al. strategy, it is assumed that both Alice and Bob have access to specific sets of measurements carved out of a KS set for the channel of Fig. 4.
In our treatment of the problem, since we want to allow more general choices of measurements, we make two relaxations:

1. Firstly, we do not restrict ourselves to projective measurements on Bob's side, i.e., we allow any set of positive operators (each operator associated to a vertex in the channel hypergraph) that satisfy the requirement that the (solid and dotted) hyperedges in Fig. 4 form complete measurements, and

2. Secondly, although in the Cubitt et al. protocol Alice's measurements are carved out of the same set of positive operators that Bob can implement, and this is clearly the optimal choice of measurements for Alice (yielding S = 1 for sending six messages, quantumly), we allow that, in general, Alice could associate some other measurements with the messages she wants to send even if the outcomes of such measurements do not satisfy the operational equivalences implicit in the channel hypergraph of Fig. 4. That is, the encoding strategy of Alice could use a set of positive operators for her measurements $\{M^{(A)}_m\}_{m=1}^{6}$ that is completely different from the set of positive operators used by Bob for his measurements $\{M^{(B)}_y\}_{y=1}^{18}$ in the decoding strategy.

We mention this to emphasize that our generalization of the Cubitt et al. strategy does not rely on identifying the measurement outcomes of Alice and Bob in the way they are identified in the optimal strategy and our use of the assumption of measurement noncontextuality is restricted to Bob's system, i.e., response functions of Bob's measurements.
In particular, the positive operators that constitute Bob's measurements can, in principle, be reconfigured to define the six measurements $\{M^{(B)}_m\}_{m=1}^{6}$ that it would be optimal for Alice to choose for her encoding measurements $\{M^{(A)}_m\}_{m=1}^{6}$.

In a noncontextual ontological model of Bob's system, the response functions associated with the vertices (labelled by x ∈ X) respect measurement noncontextuality, i.e., for all $x, y_1, y_2$ such that $x \in X_{y_1} \cap X_{y_2} \neq \emptyset$, $\xi(x|y_1, \lambda) = \xi(x|y_2, \lambda) \equiv \xi(x|\lambda)$, ∀λ ∈ Λ. (42) Now, even though Bob may not implement or have access to the six measurements $\{M^{(B)}_m\}_{m=1}^{6}$ that would be optimal for Alice in the protocol, the operational equivalences implicit in the hypergraph of Fig. 4 indicate that the response functions for $M^{(B)}_m$ on Bob's system must, under the assumption of measurement noncontextuality, also satisfy

for all x ∈ X and all m ∈ Msg.
Recalling that

we have that $p(x'|y_1, m, x) = p(x'|y_2, m, x)$ for all $x', y_1, y_2, m, x$ such that $x' \in X_{y_1} \cap X_{y_2}$ (equivalently, $y_1, y_2 \in Y_{x'}$), so that the CIG constraint is satisfied by any noncontextual strategy and the one-shot success probability takes the form of Eq. (34).

Upper bound on the one-shot success probability from preparation and measurement noncontextuality
We now proceed to upper bound the success probability under the assumption of noncontextuality, i.e., preparation and measurement noncontextuality, and obtain

To see how this comes about starting from the expression for the one-shot success probability in Eq. (34), i.e.,

As in the general case of one-shot classical communication assisted by entanglement, given the operational equivalence of ensembles prepared on Bob's side by Alice's measurements $M^{(A)}_m$ and the assumption of preparation noncontextuality, we have that µ(λ|m) = ν(λ) for all m. We then have

Contextuality drives the quantum advantage
We show that the one-shot success probability achievable via any noncontextual strategy is no better than that of the best classical strategy with context-independent guessing. For any extremal classical strategy i ∈ I (I being the set of extremal classical strategies satisfying CIG), the success probability is given by

where $p_B(x'|y, m, x, i) = p_B(x'|y, i) = p_B(x'|i)\delta(x' \in X_y)$, since Bob has no access to any information from Alice besides the shared variable i (denoting the strategy both of them agree to implement) and the channel output y, the latter specifying a confusable set $X_y$ containing Alice's input x (and Bob's guess $x'$). An arbitrary classical strategy can then be represented by a convex mixture of extremal classical strategies according to some probability distribution $\{p(i)\}_{i \in I}$ and the classical success probability is then given by

Using the CIG constraint, we have $p_B(x'|y, i) = p_B(x'|i)\delta(y \in Y_{x'})$, where any extremal classical strategy i specifies a particular extremal probabilistic model on the channel hypergraph (viewed as a contextuality scenario [31]) in Fig. 4(a). This allows us to express the maximal classical success probability satisfying CIG as

We therefore have

Since any classical strategy is a convex mixture of extremal classical strategies, the upper bound $S^{\max}_{\mathrm{Cl(CIG)}}$ can always be achieved by a classical strategy, i.e., there exists an extremal classical strategy $i^* \in I$ such that $S_{\mathrm{Cl(CIG)}}(i^*) = S^{\max}_{\mathrm{Cl(CIG)}}$. Similarly, the upper bound $S^{\max}_{\mathrm{NC}}$ can be saturated by a noncontextual strategy, albeit a very trivial one, following a similar reasoning as at the end of Section 4.2 (except that the extremal response functions here are indeterministic on account of KS-uncolourability). Thus, we have that contextuality also drives the quantum advantage in one-shot classical communication when Alice and Bob trust the channel hypergraph but make no assumptions about the channel probabilities (see footnote 16).
Footnote 16: Note that, under the CIG constraint, we no longer have the exact correspondence with nonlocal games exemplified by Theorem 1. This is because the connection between prepare-and-measure scenarios on Bob's system alone and Bell scenarios where Bob is one of the parties in a two-party Bell experiment breaks down when one imposes, besides preparation noncontextuality, the assumption of measurement noncontextuality on Bob's system (based on the operational equivalences between his local measurement events). This restricts the scope of response functions for Bob's measurements beyond anything required by local causality (under which no restriction on local response functions is imposed). The interested reader may look at a discussion of this point at the end of Section 2.7 in Ref. [10].

7.2.5 Contextuality witnessed by a hypergraph invariant (the weighted max-predictability) is sufficient for a quantum advantage

In this section we point out an explicit connection between contextuality that can be witnessed via the noise-robust noncontextuality inequalities of Ref. [27] and the quantum advantage in the one-shot classical communication task with context-independent guessing. The full technical argument supporting the claims in this section is presented in Appendix E.

In Appendix E, we first show that

is the weighted max-predictability, a hypergraph invariant that was defined in Ref. [10] and studied extensively in Ref. [27], appearing in the upper bounds of noise-robust noncontextuality inequalities proposed in the frameworks of Refs. [10,27]. Here, Γ is the hypergraph defined by the contextuality scenario from which the measurements of Alice and Bob are drawn in the ideal Cubitt et al. strategy [6], i.e., Fig. 4(b) including all the hyperedges (solid and dotted). We then consider the special case $\eta_{\max} = \eta_{\min} = \eta$ and show that firstly, from Eq. (39), we have S = Corr + η(1 − Corr) = η + Corr(1 − η). (53) We then have that

Now,

is an instance of a noise-robust noncontextuality inequality following the approach of Ref. [27], inspired by logical proofs of KS-contextuality [3,17]. Hence, the contextuality witnessed by $\mathrm{Corr} > \beta(\Gamma, \{p(m)\}_m)$ is sufficient for a quantum advantage in this task when $\eta_{\max} = \eta_{\min}$ (see footnote 17). The case Corr = 1 corresponds to the situation studied in Ref. [6]. We have shown that this quantum advantage can persist even when Corr < 1, i.e., in the regime of noisy measurements.

Footnote 17: Note that when $\eta_{\min} = \eta_{\max} \equiv \eta$, all pairs of distinct input symbols are equally confusable, i.e., $\eta(x, x') = \eta$ for all $x \neq x'$. Output-uniform channels with k-regular hypergraphs have $\eta(x, x') = \frac{1}{k}\sum_y \delta(y \in Y_x \cap Y_{x'})$ and this quantity is independent of $x, x'$ ($x \neq x'$) if and only if $|Y_x \cap Y_{x'}|$ is constant for all $x \neq x'$, i.e., the number of hyperedges shared by any two confusable vertices of the channel hypergraph is constant across all pairs of confusable vertices. In the channel hypergraph of Fig. 4, for example, we have $|Y_x \cap Y_{x'}| = 1$ for all confusable pairs of vertices $x, x'$, so the classical channel studied in Ref. [6] (where k = 3 and $N(y|x) = \frac{1}{3}$ for all $y \in Y_x$, x ∈ X) satisfies the condition required for $\mathrm{Corr} > \beta(\Gamma, \{p(m)\}_m)$ to imply a quantum advantage.
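As a small worked example of Eq. (53), the Python sketch below evaluates S = η + Corr(1 − η) for the channel of Ref. [6], for which η = $|Y_x \cap Y_{x'}|/k$ = 1/3 according to the values k = 3 and $|Y_x \cap Y_{x'}| = 1$ quoted in the text. The particular values of Corr below 1 are hypothetical noise levels chosen only for illustration, and the function name is not from the paper.

```python
from fractions import Fraction


def success_probability(corr, eta):
    """Eq. (53): S = Corr + eta * (1 - Corr), valid when eta_min = eta_max = eta."""
    return corr + eta * (1 - corr)


# eta = |Y_x ∩ Y_x'| / k = 1/3 for the channel of Ref. [6] (k = 3, overlap 1).
eta = Fraction(1, 3)

ideal = success_probability(Fraction(1), eta)        # Corr = 1 gives S = 1
noisy = success_probability(Fraction(9, 10), eta)    # hypothetical Corr = 0.9
poor = success_probability(Fraction(1, 2), eta)      # hypothetical Corr = 0.5
```

Because 1 − η > 0, S is strictly increasing in Corr, so any increase in the source-measurement correlation translates directly into a larger one-shot success probability in this special case.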
Another special case concerns the situation where S imperf = 0. In this situation too, the violation of Corr ≤ β(Γ, {p(m)} m ) implies a quantum advantage (cf. Appendix E). Using the set-up in Ref. [6], one can achieve Corr = 1, i.e., zero-error communication.
We have thus provided an instance of an information-theoretic task where the noise-robust signatures of contextuality à la Refs. [17,27] witness a quantum advantage in the task. This provides an operational meaning to noise-robust noncontextuality inequalities of the type in Eq. (55) that were proposed in Ref. [27].

KS basis sets and the ideal Cubitt et al. strategy
Can one use the strategy of Ref. [6] starting from any KS set of vectors (see footnote 18)? The strategy requires not merely a KS set, namely, a set of vectors with orthogonality relations represented by a KS-uncolourable hypergraph, but, in fact, a KS basis set, i.e., a set of disjoint complete orthogonal bases $Z \equiv \{B_m\}_{m=1}^{q}$ such that it is impossible to pick a vector from each basis ensuring that no two are orthogonal. As we have noted, the set of vectors appearing in any KS basis set forms a KS set. However, it is not a priori obvious that, given a KS set, it is always possible to carve it up into a KS basis set $\{B_m\}_{m=1}^{q}$ such that q > α(G(N)) for any channel constructed from it following the prescription of Cubitt et al. [6] mentioned above (see footnote 19). Hence, to answer whether the entanglement-assisted enhancement of the one-shot zero-error capacity achieved by the strategy of Ref. [6] carries through for any KS set, we need to settle the following question: Does every KS set admit a KS basis set of size q > α(O(Γ))? Here Γ is the contextuality scenario corresponding to the KS set and O(Γ) is the orthogonality graph of Γ [10,27]. If the answer is in the affirmative, then the Cubitt et al. strategy can be used starting from arbitrary KS sets. If not, then there must exist a counterexample. Indeed, we can find such a counterexample and, therefore, the Cubitt et al. strategy is not applicable to arbitrary KS sets: it only works for KS sets that admit (disjoint) KS basis sets. Our counterexample comes from the Conway-Kochen 31-vector KS set, the smallest known KS set in dimension d = 3 [45]. We refer to Appendix F for details.

Footnote 18: Recall that the general protocol of Section 3.1 does not rely on the existence of KS sets. It's only the particular strategy of Ref. [6] that makes use of them.
This raises the following important open problem: Given an arbitrary KS set, what are the necessary and sufficient criteria for it to admit a KS basis set?
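For small candidate sets, the defining property of a KS basis set described above can be tested by exhaustive search. The Python sketch below is only an illustration under stated assumptions: vectors are real, given as integer tuples, and each ray is represented by a single canonical tuple (so that disjointness can be checked by tuple equality). The function names and the two-dimensional toy bases are hypothetical, and the search cost grows as $d^q$ for q bases in dimension d.

```python
from itertools import combinations, product


def orthogonal(u, v):
    """Real inner product equal to zero (integer entries, so exact)."""
    return sum(a * b for a, b in zip(u, v)) == 0


def is_orthogonal_basis(basis):
    """Complete orthogonal basis: d pairwise-orthogonal vectors in dimension d."""
    return len(basis) == len(basis[0]) and all(
        orthogonal(u, v) for u, v in combinations(basis, 2)
    )


def is_disjoint(bases):
    """No vector is shared between bases (one canonical tuple per ray assumed)."""
    vectors = [v for basis in bases for v in basis]
    return len(vectors) == len(set(vectors))


def has_nonorthogonal_transversal(bases):
    """True if one vector can be picked per basis with no two orthogonal."""
    for choice in product(*bases):
        if not any(orthogonal(u, v) for u, v in combinations(choice, 2)):
            return True
    return False


def is_ks_basis_set(bases):
    """Disjoint complete orthogonal bases admitting no such transversal."""
    return (
        all(is_orthogonal_basis(b) for b in bases)
        and is_disjoint(bases)
        and not has_nonorthogonal_transversal(bases)
    )


# Hypothetical toy example in d = 2: picking (1, 0) and (1, 1) gives a
# transversal with no orthogonal pair, so this is not a KS basis set.
b1 = ((1, 0), (0, 1))
b2 = ((1, 1), (1, -1))
```

The toy example only exercises the negative case; a positive instance requires an actual KS basis set such as the one used in Ref. [6], whose vectors are not reproduced here.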

Discussion and outlook
We have generalized and unified the protocols of Refs. [6,28] in a broad framework for entanglement-assisted one-shot classical communication that should prove useful for future investigations. Our results bear witness to the role that noise-robust contextuality à la Spekkens [13] plays in this task. Indeed, the problem of entanglement-assisted one-shot classical communication provides a fertile ground to study the rich interplay between the Kochen-Specker theorem [3], Spekkens contextuality [13] and its hypergraph-theoretic formulations [10,27], and nonlocal games.

Footnote 19: In Ref. [6], there is a claim that the existence of KS basis sets is "a corollary of the KS theorem". This is true if a KS basis set is allowed, in general, to contain bases that share vectors. However, for the Cubitt et al. construction to work, the bases in a KS basis set must be disjoint, i.e., no vectors are shared between bases: this is what allows Alice to encode her messages unambiguously in the outcomes of these measurement bases. Further, for an advantage, the number of these disjoint bases in a KS basis set must exceed the independence number of the confusability graph of the channel constructed from the KS basis set. Hence, in our definition of a KS basis set, we explicitly include the disjointness of bases, something Ref. [6] implicitly assumed.

Several open questions and opportunities for future work arise:

1. Does there exist a generic construction of a nonlocal game, instantiating Proposition 1, for any channel N that admits an enhancement of its one-shot success probability? Even extending the family of channels for which such a construction exists beyond the case we have shown, i.e., the family of output-uniform k-regular channels of Theorem 1, would constitute progress in this direction. Furthermore, even within this family of channels, an important question is to characterize those for which the corresponding nonlocal game admits a gap between classical and quantum/nonsignalling correlations. These channels would, in turn, admit an advantage in a corresponding one-shot communication task because of Theorem 1.
2. While the channels considered in Refs. [6,28] are KS channels, it remains an open question whether more general channels (in particular, with KS-colourable channel hypergraphs) exhibit non-trivial advantages in enhancing the one-shot success probability using entanglement. Our general protocol in Section 3 does not specifically rely on the channel being KS-uncolourable. For example, a simple channel corresponding to a statistical proof of KS-contextuality is the one based on the KCBS construction on a qutrit [46]. It consists of 10 vertices, denoted (addition modulo 5, so that i + 1 = 1 for i = 5), with α(G(N)) = 3 (equal to its one-shot zero-error capacity); a natural question then arises: can entanglement be used to enhance the one-shot success probability of sending two bits using this channel? What would be the role of noise-robust contextuality à la Refs. [10,18] in enabling such an enhancement? If not, can any channel with a KS-colourable hypergraph admit enhancement of its one-shot success probability?
3. The Cubitt et al. protocol [6] requires the existence of (disjoint) KS basis sets. Is it possible to modify this protocol to use KS sets which do not admit (disjoint) KS basis sets, e.g., the Conway-Kochen 31-vector KS set? Would such a modification still allow for the possibility of enhancing the one-shot zero-error capacity of a classical channel? Or would it, maybe, only allow for an enhancement of the one-shot success probability following the general protocol we discussed in Section 3? The existence of pseudotelepathy games based on KS sets [5] suggests that a quantum advantage in some corresponding one-shot communication task (following Theorem 1) should be possible. A related question is: what is the simplest scenario that admits enhancement of the one-shot zero-error capacity of a classical channel? Is the example studied in Ref. [6] the simplest one, or is it possible to further reduce, say, the size of the input alphabet or the dimension of the quantum system for which the enhancement is achieved? Of course, insofar as one uses KS sets to achieve this enhancement, this is also related to the smallest possible KS sets: in

5. The connection of the one-shot communication task with preparation contextuality could also be leveraged to obtain bounds on inaccessible information in preparation contextual ontological models of quantum theory, following the ideas recently proposed in Ref. [53].
More generally, the problem of entanglement-assisted one-shot zero-error communication can be viewed as a channel simulation problem, i.e., using a noisy channel to simulate a noiseless channel in a one-shot setting using nonsignalling correlations [54]. The relaxation of it to the case of enhancing the one-shot success probability (which we have studied) can be viewed as using a noisy channel to simulate a less noisy channel using nonsignalling correlations, i.e., noise-attenuation of a classical channel using a nonclassical common-cause resource [33]. We have focussed in this paper on the interplay of this latter channel simulation problem with the contextuality of the system that the receiver (Bob) holds in the communication task. A worthwhile project here is a rigorous resource-theoretic account of this problem to better understand how various resources affect the simulation preorder over classical channels in this noisy setting [55]: whether perhaps the resource of LOSR-entanglement [56] is more appropriate than LOCC-entanglement when viewing the resource aspects of entanglement (and how this affects, for example, the usefulness of Tsirelson boxes vs. Hardy boxes [33,56,57] in this task), how the resource of noise-robust contextuality on one wing of a Bell experiment plays with bipartite nonlocality, and, more abstractly, the usefulness of a common-cause resource in simulating a direct-cause resource (e.g., the fact that entanglement can increase the one-shot zero-error capacity of a classical channel [6]). It would also be interesting to see if the contextuality witnesses we have considered in this paper turn out to be related to some monotones for channel (non-)conversions in a resource theory of channel simulation.
A Preparation noncontextuality on one wing of a bipartite Bell experiment implies local causality

We will use quantum notation below for ease of understanding, but the argument applies to all nonsignalling general probabilistic theories (GPTs).
Consider a general bipartite Bell scenario where Alice's measurement settings are labelled by s, Bob's settings are labelled by t, and their respective outcomes are labelled by a and b. Their joint statistics is, therefore, given by $p(a, b|s, t) = \mathrm{Tr}(E^{(s)}_a \otimes E^{(t)}_b \rho_{AB})$, where $E^{(s)}_a$ and $E^{(t)}_b$ denote the measurement outcomes of Alice and Bob, respectively, and $\rho_{AB}$ is the entangled state shared between Alice and Bob. We now consider the prepare-and-measure experiment on Bob's side (see footnote 20) that this Bell scenario induces: Bob's preparations are steered by Alice's measurements, i.e., every measurement outcome $E^{(s)}_a$ on Alice's side steers Bob's system to (an unnormalized state) $\sigma_{a|s} = \mathrm{Tr}_A(E^{(s)}_a \otimes I \rho_{AB})$. However, no-signalling requires that Bob should not be able to infer Alice's measurement setting s by local interventions on his system alone, so that we have $\sum_a \sigma_{a|s} = \rho_B$ for all measurement settings s that Alice can choose. Each s therefore labels a preparation ensemble $\{p(a|s), \rho_{a|s}\}_a$ on Bob's side such that $\sum_a p(a|s)\rho_{a|s} = \rho_B$ for all s, where $p(a|s) = \mathrm{Tr}_B \sigma_{a|s}$ and $\rho_{a|s} = \sigma_{a|s}/p(a|s)$. Given this operational equivalence between the preparation ensembles on Bob's side, the assumption of preparation noncontextuality entails that any ontological model of Bob's system must satisfy $\sum_a p(a|s)p(\lambda|s, a) = p(\lambda)$ for all s. This can be rewritten as $\sum_a p(a|s, \lambda)p(\lambda|s) = p(\lambda)$ for all s, i.e., $p(\lambda|s) = p(\lambda)$ for all s. We then have, given Bob's measurement outcomes $E^{(t)}_b$,

where the last equality follows from the assumption of preparation noncontextuality. Thus, the existence of a preparation noncontextual ontological model for Bob's system (or for Alice's system, by symmetry) implies the existence of a locally causal ontological model for the bipartite Bell experiment.

Footnote 20: The same argument goes through with the roles of Alice and Bob interchanged.

Note that p(a, b|s, t)

where p(a|s) = Tr(E

and using the no-signalling condition to express the second term of Eq. (19)

The Bell expression then becomes

using Eq. (16).
This means that the following holds: [...] i.e., the one-shot zero-error communication occurs if and only if the corresponding nonlocal game is won with certainty. On the other hand, this also means that [...] so that we finally have [...] $\sum_{x,x' \in X^{(m)}} \xi(x'|m,\lambda)\,\mu(x|m,\lambda)\,\eta(x,x')$, so that [...] and
$$\mathrm{Corr} = \sum_\lambda \mathrm{Corr}(\lambda)\,\nu(\lambda). \tag{80}$$
We now proceed to upper bound $S^{\max}_{\rm NC}$ in terms of a hypergraph invariant: [...] where $\beta(\Gamma, \{p(m)\}_m)$ is the weighted max-predictability [10,27]. Hence, we have [...] Recalling that $\mathrm{Corr} \leq \beta(\Gamma, \{p(m)\}_m)$ is a noise-robust noncontextuality inequality [27], we have that the contextuality witnessed by $\mathrm{Corr} > \beta(\Gamma, \{p(m)\}_m)$ is sufficient for a quantum advantage when $\eta_{\max} = \eta_{\min}$.
Another special case: The sufficiency of $\mathrm{Corr} > \beta(\Gamma, \{p(m)\}_m)$ for a quantum advantage also arises when $S_{\rm imperf} = 0$. Then we have that $S = S_{\rm perf} = \mathrm{Corr}$ and $S^{\max}_{\rm NC} \leq \beta(\Gamma, \{p(m)\}_m)$, so that the violation of $\mathrm{Corr} \leq \beta(\Gamma, \{p(m)\}_m)$ implies the violation of $S \leq S^{\max}_{\rm NC}$. Indeed, in the ideal quantum case considered by Cubitt et al. [6], we see that $\mathrm{Corr} = 1$, maximally violating the noncontextuality inequality and achieving a success probability of 1.

F Not every KS set admits a KS basis set: the Conway-Kochen 31-vector KS set
Consider the simplest known KS set in d = 3 dimensions, namely, the Conway-Kochen 31-vector KS set [45]. The 31 vectors are carved up into 17 complete orthogonal bases (with 3 vectors each) and 20 incomplete orthogonal bases (with 2 vectors each). The orthogonality graph has an independence number of 11 and the only disjoint basis sets of size greater than 11 are those of size 12 and 13. None of these disjoint basis sets forms a KS basis set, hence no quantum advantage over the unassisted one-shot zero-error capacity of 11 can be obtained via the methods of Ref. [6] for this construction. A remaining possibility is that, on adding the missing vectors in the 20 incomplete orthogonal bases to the KS set, the orthogonality relations between the resulting set of 51 vectors will perhaps allow for an advantage. We rule out this possibility as well: after including 20 additional vectors that render all the bases that appear in this KS set complete, we have a contextuality scenario represented by a hypergraph containing 51 vertices carved up into 37 (three-vertex) hyperedges. We check for any additional orthogonality relations arising from the newly introduced 20 vectors and find 4 additional incomplete orthogonal bases. On further completing these 4 bases by adding 4 more vectors, we find that there are, overall, 55 vertices (vectors) carved up into 41 hyperedges (complete orthogonal bases) and there are no additional orthogonality relations. The orthogonality graph associated with this extended contextuality scenario has an independence number of 25. Hence, for an advantage based on the strategy of Ref. [6], there must exist a KS basis set of size q > 25. However, the largest disjoint basis set is still of size 13 and it does not form a KS basis set. Hence, the strategy of Ref. [6] does not provide an advantage even (and especially) when extending the Conway-Kochen 31-vector KS set to complete all incomplete orthogonal bases and include any additional orthogonality relations (leaving no incomplete orthogonal bases). This provides a counterexample that answers the question we posed in the negative: the Cubitt et al. strategy doesn't work for arbitrary KS sets.
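The independence numbers quoted above are properties of the orthogonality graph, whose vertices are the vectors and whose edges join orthogonal pairs. As a minimal sketch of how such a check could be set up, the code below builds the orthogonality graph of a small set of real vectors and finds its independence number by brute force. The five toy vectors are a hypothetical example chosen for illustration, not the Conway-Kochen vectors, and the function names are our own. The exhaustive search is exponential in the number of vertices, so a graph with 31 or 55 vertices would call for a dedicated maximum-independent-set solver instead.

```python
import itertools
import numpy as np

def orthogonality_graph(vectors, tol=1e-9):
    """Edges (i, j), i < j, between pairs of orthogonal vectors (real vectors assumed)."""
    n = len(vectors)
    return {(i, j)
            for i in range(n) for j in range(i + 1, n)
            if abs(np.dot(vectors[i], vectors[j])) < tol}

def independence_number(n, edges):
    """Size of the largest vertex subset containing no edge (brute force)."""
    for size in range(n, 0, -1):
        for subset in itertools.combinations(range(n), size):
            # subset is sorted, so each pair (i, j) below has i < j.
            if not any(pair in edges for pair in itertools.combinations(subset, 2)):
                return size
    return 0

# Hypothetical toy example in d = 3 (NOT the Conway-Kochen set).
toy_vectors = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, -1, 0)]
toy_edges = orthogonality_graph(toy_vectors)
toy_alpha = independence_number(len(toy_vectors), toy_edges)
```

For the toy set, the two complete bases share the vector (0, 0, 1), so an independent set can contain at most one vector from each of the pairs {(1,0,0), (0,1,0)} and {(1,1,0), (1,-1,0)}.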
One might wonder why we bother "completing" the original 31-vector KS set to a 55-vector KS set with no incomplete bases. We do this to rule out the possibility that something akin to the 18-vector KS set in 4 dimensions [58] is happening here: for that KS set, it's not possible to implement the Cubitt et al. protocol, but supplementing it with the remaining set of 6 vectors (out of Peres's 24-vector KS set [42] from which the 18-vector set is drawn) and taking into account the resulting additional orthogonality relations yields Peres's 24-vector KS set, for which the Cubitt et al. protocol works. From our investigation, it is clear that for the 31-vector KS set, such a situation doesn't arise even after "completing" it.
We provide below a list of all the vectors and bases in the original as well as the "completed" KS set for the Conway-Kochen argument, so that the interested reader may verify our claims concerning this KS set. [...] (90)
The orthogonality relations between these vectors are the following (the first entry in each list is the vector with respect to which the remaining vectors in the list are orthogonal): [...]
Taking into account possible extra orthogonality relations not captured by the set of 37 complete bases, it turns out that there are 4 additional incomplete bases in the set of 51 vectors above: