On relating one-way classical and quantum communication complexities

Communication complexity is the amount of communication needed to compute a function when the function's inputs are distributed over multiple parties. In its simplest form, one-way communication complexity, Alice and Bob compute a function $f(x,y)$, where $x$ is given to Alice and $y$ is given to Bob, and only one message from Alice to Bob is allowed. A fundamental question in quantum information is the relationship between one-way quantum and classical communication complexities, i.e., how much shorter can the message be if Alice sends a quantum state instead of a bit string? We make some progress towards this question with the following results. Let $f: \mathcal{X} \times \mathcal{Y} \rightarrow \mathcal{Z} \cup \{\bot\}$ be a partial function and $\mu$ be a distribution with support contained in $f^{-1}(\mathcal{Z})$. Denote $d=|\mathcal{Z}|$. Let $\mathsf{R}^{1,\mu}_\epsilon(f)$ be the classical one-way communication complexity of $f$, $\mathsf{Q}^{1,\mu}_\epsilon(f)$ the quantum one-way communication complexity of $f$, and $\mathsf{Q}^{1,\mu, *}_\epsilon(f)$ the entanglement-assisted quantum one-way communication complexity of $f$, each with distributional error (average error over $\mu$) at most $\epsilon$. We show: 1) If $\mu$ is a product distribution, $\eta>0$ and $0 \leq \epsilon \leq 1-1/d$, then, $$\mathsf{R}^{1,\mu}_{2\epsilon -d\epsilon^2/(d-1)+ \eta}(f) \leq 2\mathsf{Q}^{1,\mu, *}_{\epsilon}(f) + O(\log\log (1/\eta))\enspace.$$ 2) If $\mu$ is a non-product distribution and $\mathcal{Z}=\{ 0,1\}$, then $\forall \epsilon, \eta>0$ such that $\epsilon/\eta + \eta<0.5$, $$\mathsf{R}^{1,\mu}_{3\eta}(f) = O(\mathsf{Q}^{1,\mu}_{{\epsilon}}(f) \cdot \mathsf{CS}(f)/\eta^3)\enspace,$$ where \[\mathsf{CS}(f) = \max_{y} \min_{z\in\{0,1\}} \vert \{x~|~f(x,y)=z\} \vert \enspace.\]


Introduction
Communication complexity concerns itself with characterizing the minimum number of bits or qubits that distributed parties need to exchange in order to accomplish a given task (such as computing a function $f$). Over the years, different models of two-party and multi-party communication [4] have been proposed and studied. We consider only two-party communication models in this paper. Communication complexity models have established striking connections with other areas in theoretical computer science, such as data structures, streaming algorithms, circuit lower bounds, decision tree complexity, VLSI designs, etc.
In the two-way communication model, two parties Alice and Bob receive inputs $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ respectively, and wish to compute a function $f(x, y)$ of their joint inputs by exchanging messages.
Jain and Zhang [9] extended the result of [14] to arbitrary (non-product) distributions $\mu$, as follows. For a function $f : \mathcal{X} \times \mathcal{Y} \rightarrow \{0, 1\}$, another measure that is often very useful in understanding classical one-way communication complexity is the rectangle bound (denoted $\mathsf{rec}(f)$), a.k.a. the corruption bound. The rectangle bound $\mathsf{rec}(f)$ is defined via a distributional version $\mathsf{rec}^{\mu}(f)$. It is a well-studied measure, and $\mathsf{rec}^{1,\mu}(f)$ is well known to form a lower bound on $\mathsf{R}^{1,\mu}(f)$. [9] showed $\mathsf{Q}^{1,\mu}_{1/3}(f) = \Omega(\mathsf{rec}^{1,\mu}(f))$. For a product distribution $\mu$, Jain, Klauck and Nayak [12] showed $\max_{\text{product } \lambda} \mathsf{rec}^{1,\lambda}(f) = \Omega(\mathsf{R}^{1,\mu}(f))$.
However, it remained open whether $\mathsf{R}^{1,\mu}(f)$ and $\mathsf{Q}^{1,\mu}(f)$ (or $\mathsf{Q}^{1,\mu,*}(f)$) are related for a fixed distribution $\mu$. We answer this in the positive and show the following results.
Our results. Note that for entanglement-assisted protocols, there must be a factor of 2 because of superdense coding.

Theorem 1. Let $f: \mathcal{X} \times \mathcal{Y} \rightarrow \mathcal{Z} \cup \{\bot\}$ be a partial function, $\mu$ be a product distribution with support contained in $f^{-1}(\mathcal{Z})$, and $d = |\mathcal{Z}|$. Let $\eta > 0$ and $0 \leq \epsilon \leq 1 - 1/d$. Then, $$\mathsf{R}^{1,\mu}_{2\epsilon - d\epsilon^2/(d-1) + \eta}(f) \leq 2\mathsf{Q}^{1,\mu,*}_{\epsilon}(f) + O(\log\log(1/\eta))\enspace.$$

Additionally, if $\mu$ is a non-product distribution, we show,

Theorem 2. Let $\epsilon, \eta > 0$ be such that $\epsilon/\eta + \eta < 0.5$. Let $f : \mathcal{X} \times \mathcal{Y} \rightarrow \{0, 1, \bot\}$ be a partial function and $\mu$ be a distribution supported on $f^{-1}(\{0,1\})$. Then, $$\mathsf{R}^{1,\mu}_{3\eta}(f) = O(\mathsf{Q}^{1,\mu}_{\epsilon}(f) \cdot \mathsf{CS}(f)/\eta^3)\enspace,$$ where $$\mathsf{CS}(f) = \max_{y} \min_{z\in\{0,1\}} \vert \{x~|~f(x,y)=z\} \vert \enspace.$$

1 A partial function under a product $\mu$ is essentially the same as a total function. Both Theorem 1 and Theorem 2 are proved by converting quantum protocols into classical protocols directly.
The bound provided by Theorem 2 depends on the column sparsity $\mathsf{CS}(f)$. Although $\mathsf{CS}(f)$ can be as large as $|\mathcal{X}|$, giving a bound exponentially worse than the $O(\log(|\mathcal{X}|))$ brute-force protocol, Theorem 2 is useful when $\mathsf{CS}(f)$ is constant. In particular, Theorem 2 can convert the quantum fingerprinting protocol [5] for the EQUALITY function into a classical communication protocol of similar complexity for the worst case, by combining it with Yao's principle [19].
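For concreteness, the kind of classical protocol such a conversion yields for EQUALITY is in the same spirit as the textbook public-coin hashing protocol. The sketch below is a minimal illustration of that classical protocol only (the function name, the parameter `k`, and the fixed seed are our own choices, not from the paper):

```python
import random

def equality_protocol(x: str, y: str, k: int = 20) -> bool:
    """One-way public-coin protocol for EQUALITY on n-bit strings.

    Alice sends k inner-product hash bits of her input; Bob compares
    them with his own hash bits. One-sided error: equal inputs are
    always accepted; unequal inputs are accepted with prob. 2**-k.
    """
    assert len(x) == len(y)
    rng = random.Random(12345)  # shared public coins (fixed for the demo)
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(len(x))]
        alice_bit = sum(ri * int(xi) for ri, xi in zip(r, x)) % 2
        bob_bit = sum(ri * int(yi) for ri, yi in zip(r, y)) % 2
        if alice_bit != bob_bit:
            return False
    return True

print(equality_protocol("101101", "101101"))  # True: equal inputs always pass
print(equality_protocol("101101", "100101"))  # False except with prob. 2**-20
```

The communication is $k$ bits, independent of the input length, mirroring the constant quantum fingerprinting cost.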

Proof overview
For a product distribution $\mu$, we upper bound $\mathsf{R}^{1,\mu}(f)$ by $\mathsf{Q}^{1,\mu,*}(f)$, using ideas from König and Terhal [13] and Jain, Radhakrishnan, and Sen [10, 11]. For an entanglement-assisted quantum one-way communication protocol, let $Q \equiv D E_B$ represent Alice's quantum message $D$ together with Bob's part of the entanglement $E_B$. We first replace Bob's measurement by the pretty good measurement (PGM), with a small loss in the error probability. Then we use an idea of [13] to show that we can "split" Bob's PGM into the PGM for guessing $X$. Since this new $X$-guessing PGM is independent of $Y$, Alice can apply it herself on the register $Q$ (her message and Bob's share of the prior entanglement) and send the measurement outcome $C$ to Bob, who will just output $f(C, Y)$. The classical message that Alice sends is long (in fact it is as long as $X$), but it has low max-information with the input $X$, since (by the monotonicity of max-information) $\mathrm{I}_{\max}(X : C) \leq \mathrm{I}_{\max}(X : Q) \leq 2\log(|D|)$. Note that the second inequality has a factor of 2 due to superdense coding. We then use a compression protocol from [10, 11] to compress $C$ into another short message $C'$ of size $2\log(|D|)$. The same argument works for variants of this result where the two parties do not share entanglement, and/or where the error probability is averaged over a distribution on $x$ and maximized over $y$.
For a non-product distribution $\mu$, we upper bound $\mathsf{R}^{1,\mu}(f)$ by $\mathsf{Q}^{1,\mu}(f)$, using ideas of Huang, Kueng and Preskill [8] and of [10, 11]. For a quantum one-way communication protocol with quantum message $Q$, we first use the idea of [8] to show that there exists a "classical shadow" $C$ of the quantum message $Q$, which allows Bob to estimate $\mathrm{Tr}(E^y_b Q)$ (for any $b \in \{0, 1\}$, where $M^y = \{E^y_0, E^y_1\}$ is Bob's measurement on input $y$). This allows Alice to send the classical shadow $C$ instead of the quantum message $Q$. However, the precision of the classical shadow procedure of [8] depends on $\|E^y_b\|_F^2$, so we need to bound $\|E^y_b\|_F^2$. We show that there exist measurement operators $\tilde{E}^y_b$ such that $\|\tilde{E}^y_b\|_F^2$ is at most the "column sparsity" of the function $f$ and $\mathrm{Tr}(\tilde{E}^y_b Q)$ is "close" to $\mathrm{Tr}(E^y_b Q)$. We again note that the classical shadow has low max-information with the input $X$, since (using the monotonicity of max-information) $\mathrm{I}_{\max}(X : C) \leq \mathrm{I}_{\max}(X : Q) \leq \log(|Q|)$. As before, we use the compression protocol from [10, 11] to compress $C$ into another short message $C'$ of size $\log(|Q|)$.

Organization
In Section 2, we present our notations, definitions and other information theoretic preliminaries. In Section 3, we present the proof of Theorem 1. In Section 4, we present the proof of Theorem 2.

Preliminaries
Quantum information theory. All logarithms are taken to base 2. Consider a finite-dimensional Hilbert space $\mathcal{H}$ endowed with an inner product $\langle \cdot, \cdot \rangle$ (we only consider finite-dimensional Hilbert spaces). A quantum state (or a density matrix of a state) is a positive semi-definite matrix on $\mathcal{H}$ with trace equal to 1. It is called pure if and only if its rank is 1. Let $|\psi\rangle$ be a unit vector on $\mathcal{H}$, that is, $\langle \psi, \psi \rangle = 1$. With some abuse of notation, we use $\psi$ to represent both the state and the density matrix $|\psi\rangle\langle\psi|$ associated with $|\psi\rangle$.
Given a quantum state $\rho$ on $\mathcal{H}$, the support of $\rho$, denoted $\mathrm{supp}(\rho)$, is the subspace of $\mathcal{H}$ spanned by all eigenvectors of $\rho$ with non-zero eigenvalues.
A quantum register $A$ is associated with some Hilbert space $\mathcal{H}_A$. If two registers $A, B$ are associated with the same Hilbert space, we shall represent the relation by $A \equiv B$. For two states $\rho_A, \sigma_B$, we let $\rho_A \equiv \sigma_B$ represent that they are identical as states, just in different registers. The composition of two registers $A$ and $B$, denoted $AB$, is associated with the Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$. For a state $\rho_{AB}$, the partial trace over $A$ is $\mathrm{Tr}_A(\rho_{AB}) = \sum_i (\langle i| \otimes \mathbb{1}_B)\, \rho_{AB}\, (|i\rangle \otimes \mathbb{1}_B)$, where $\{|i\rangle\}_i$ is an orthonormal basis for the Hilbert space $\mathcal{H}_A$. The state $\rho_B \in \mathcal{D}(\mathcal{H}_B)$ obtained in this way is referred to as the marginal state of $\rho_{AB}$. Unless otherwise stated, a missing register in the subscript of a state will represent the partial trace over that register. Given a state $\rho_A$, a canonical purification of $\rho_A$ is a pure state $|\rho\rangle_{AA'}$ such that $\mathrm{Tr}_{A'}(|\rho\rangle\langle\rho|_{AA'}) = \rho_A$. Note that the size (number of qubits) of the canonical purification $|\rho\rangle_{AA'}$ is twice the size of the quantum state $\rho_A$.
A quantum channel $\mathcal{E} : \mathcal{L}(\mathcal{H}_A) \rightarrow \mathcal{L}(\mathcal{H}_B)$ is a completely positive and trace-preserving (CPTP) linear map (mapping states in $\mathcal{D}(\mathcal{H}_A)$ to states in $\mathcal{D}(\mathcal{H}_B)$). The set of all unitary operators on $\mathcal{H}_A$ is denoted $\mathcal{U}(\mathcal{H}_A)$. A projector $\Pi$ is an operator such that $\Pi^2 = \Pi$, i.e., its eigenvalues are either 0 or 1. For a classical random variable $X$, we use $\rho_{XQ} = \sum_x P_X(x) |x\rangle\langle x| \otimes \rho^x_Q$ to denote a cq-state, where the $\rho^x_Q$ are states and $P_X(x)$ is the probability that $X = x$. For a function $Z : \mathcal{X} \rightarrow \mathcal{Z}$, define $\rho_{Z(X)Q} = \sum_x P_X(x) |Z(x)\rangle\langle Z(x)| \otimes \rho^x_Q$. We also use $U_d$ to represent the uniform distribution over $\{0,1\}^d$.
For a quantum state $\rho$ and an integer $t > 0$, $\rho^{\otimes t}$ denotes $t$ tensor copies of $\rho$. We start with the following fundamental information-theoretic quantities. We refer the reader to the excellent sources on quantum information theory [17, 18], where the facts stated below can be found.

Definition 2 (von Neumann entropy). The von Neumann entropy of a quantum state $\rho$ is defined as $S(\rho) = -\mathrm{Tr}(\rho \log \rho)$.

Definition 3 (Relative entropy). Let $\rho, \sigma$ be states with $\mathrm{supp}(\rho) \subset \mathrm{supp}(\sigma)$. The relative entropy between $\rho$ and $\sigma$ is defined as $D(\rho \| \sigma) = \mathrm{Tr}(\rho \log \rho) - \mathrm{Tr}(\rho \log \sigma)$.

Definition 4 (Max-relative entropy [6, 10]). Let $\rho, \sigma$ be states with $\mathrm{supp}(\rho) \subset \mathrm{supp}(\sigma)$. The max-relative entropy between $\rho$ and $\sigma$ is defined as $D_{\max}(\rho \| \sigma) = \min\{\lambda \in \mathbb{R} : \rho \leq 2^{\lambda} \sigma\}$.

Definition 5 (Max-information [6]). For a state $\rho_{AB}$, $\mathrm{I}_{\max}(A : B)_\rho = \inf_{\sigma_B} D_{\max}(\rho_{AB} \| \rho_A \otimes \sigma_B)$. If $\rho$ is a classical state (diagonal in the computational basis), then the infimum above is achieved by a classical state $\sigma_B$.
Definition 6 (Mutual information). Let $\rho_{ABC}$ be a quantum state. We define the following measures. The mutual information: $\mathrm{I}(A : B)_\rho = S(\rho_A) + S(\rho_B) - S(\rho_{AB})$.

Conditional mutual information: $\mathrm{I}(A : B \mid C)_\rho = \mathrm{I}(A : BC)_\rho - \mathrm{I}(A : C)_\rho$.
Proof. For the first inequality, note that $\sum_x p_x \rho^x_A = \rho_A$ and hence, for all $x$, $p_x \rho^x_A \leq \rho_A$. The second inequality follows from the definition of $\mathrm{I}_{\max}$.

Fact 2 (Monotonicity). Let $\rho_{XA}$ be a cq-state ($X$ classical) and $\mathcal{E} : \mathcal{L}(\mathcal{H}_A) \rightarrow \mathcal{L}(\mathcal{H}_B)$ be a CPTP map. Then, $\mathrm{I}_{\max}(X : B)_{(\mathrm{id} \otimes \mathcal{E})(\rho)} \leq \mathrm{I}_{\max}(X : A)_\rho$.
Definition 8 (Guessing probability). Given a cq-state $\rho_{XQ} = \sum_x p_x |x\rangle\langle x| \otimes \rho^x_Q$, we often want to guess $X$ by performing a measurement on the quantum register $Q$. If we do so by a measurement $\mathcal{M}$ with POVM elements $\{E_x\}$, its success probability averaged over $X$ is $\sum_x p_x \mathrm{Tr}(E_x \rho^x_Q)$. We use $p^{\mathrm{opt}}_g(X|Q)_\rho$ to denote the maximum of this probability over all measurements $\mathcal{M}$.
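For two states, the optimal guessing probability of Definition 8 has the well-known Helstrom closed form $p^{\mathrm{opt}}_g = \frac{1}{2}(1 + \|p_0\rho_0 - p_1\rho_1\|_1)$, achieved by projecting onto the positive eigenspace of $p_0\rho_0 - p_1\rho_1$. This standard fact (not stated in the paper) can be checked numerically on a toy qubit ensemble of our own choosing:

```python
import numpy as np

# Toy cq-state: X uniform on {0, 1}; rho^0 = |0><0|, rho^1 = |+><+|.
priors = [0.5, 0.5]
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho1 = np.outer(plus, plus)

def succ_prob(povm, states, priors):
    """Average success probability of a measurement (Definition 8)."""
    return sum(p * np.trace(E @ rho).real
               for p, E, rho in zip(priors, povm, states))

# Helstrom measurement: project onto the positive eigenspace of
# delta = p0*rho0 - p1*rho1 for outcome 0, and its complement for 1.
delta = priors[0] * rho0 - priors[1] * rho1
w, v = np.linalg.eigh(delta)
E0 = sum(np.outer(v[:, i], v[:, i]) for i in range(len(w)) if w[i] > 0)
E1 = np.eye(2) - E0

opt = succ_prob([E0, E1], [rho0, rho1], priors)
helstrom = 0.5 * (1 + np.abs(w).sum())  # (1 + ||p0 rho0 - p1 rho1||_1)/2
print(round(opt, 6), round(helstrom, 6))  # both equal 0.853553
```

Here the two values coincide at $\frac{1}{2}(1 + \frac{1}{\sqrt{2}}) \approx 0.8536$, the optimum for distinguishing $|0\rangle$ from $|+\rangle$ with equal priors.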

Definition 9 (Pretty good measurement (PGM)). For a cq-state $\rho_{XQ} = \sum_x p_x |x\rangle\langle x| \otimes \rho^x_Q$, let $A_x = p_x \rho^x_Q$ and $A = \sum_x A_x$. The pretty good measurement (PGM) is the measurement $\mathcal{M}^{\mathrm{pgm}}_X$ with POVM elements $\{E^{\mathrm{pgm}}_x = A^{-1/2} A_x A^{-1/2}\}$, where the inverse square root is taken on the support of $A$. We denote its success probability by $p^{\mathrm{pgm}}_g(X|Q)_\rho$.
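The PGM's POVM elements take the explicit form $E^{\mathrm{pgm}}_x = A^{-1/2} A_x A^{-1/2}$ with $A_x = p_x \rho^x_Q$ and $A = \sum_x A_x$. A small numpy sketch for the symmetric trine ensemble, a standard example on which the PGM is known to succeed with probability $2/3$ (the ensemble is our illustrative choice, not from the paper):

```python
import numpy as np

def pgm(priors, states):
    """POVM elements E_x = A^{-1/2} A_x A^{-1/2} of the pretty good
    measurement, with the inverse square root taken on supp(A)."""
    A_x = [p * rho for p, rho in zip(priors, states)]
    A = sum(A_x)
    w, v = np.linalg.eigh(A)
    inv_sqrt = sum(np.outer(v[:, i], v[:, i].conj()) / np.sqrt(w[i])
                   for i in range(len(w)) if w[i] > 1e-12)
    return [inv_sqrt @ Ax @ inv_sqrt for Ax in A_x]

# Symmetric trine: three real qubit states at 60 degrees, uniform prior.
kets = [np.array([1.0, 0.0]),
        np.array([0.5, np.sqrt(3) / 2]),
        np.array([0.5, -np.sqrt(3) / 2])]
states = [np.outer(k, k) for k in kets]
priors = [1 / 3] * 3

E = pgm(priors, states)
print(np.allclose(sum(E), np.eye(2)))  # True: the elements form a valid POVM
p_pgm = sum(p * np.trace(Ei @ rho).real
            for p, Ei, rho in zip(priors, E, states))
print(round(p_pgm, 4))  # 0.6667: the PGM is optimal for the trine
```

For the trine, $A = \mathbb{1}/2$, so each $E^{\mathrm{pgm}}_x = \frac{2}{3}|\psi_x\rangle\langle\psi_x|$ and the success probability is exactly $2/3$.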

Fact 6 (Optimality of PGM [16]). For any cq-state $\rho_{XQ}$, $$p^{\mathrm{pgm}}_g(X|Q)_\rho \geq g\big(p^{\mathrm{opt}}_g(X|Q)_\rho\big)\enspace, \quad \text{where} \quad g(x) = 1 - 2(1-x) + \frac{d(1-x)^2}{d-1}$$ and $d$ is the dimension of the register $X$.
Note that $g(x)$ is convex everywhere and increasing on $x \in [1/d, 1]$, and $g(1/d) = 1/d$. Also note that the bound from Fact 6 is better than the optimality bound of Barnum and Knill [2] when the guessing probability is close to $1/d$.
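These properties can be checked numerically. The sketch below assumes the form $g(x) = 1 - 2(1-x) + d(1-x)^2/(d-1)$, which is the expression consistent with the error parameter $2\epsilon - d\epsilon^2/(d-1)$ in Theorem 1 (set $x = 1 - \epsilon$):

```python
import numpy as np

def g(x, d):
    """PGM success as a function of the optimal success probability x,
    assuming g(x) = 1 - 2(1-x) + d(1-x)^2/(d-1): an optimal error eps
    becomes a PGM error of at most 2*eps - d*eps^2/(d-1)."""
    return 1 - 2 * (1 - x) + d * (1 - x) ** 2 / (d - 1)

d = 4
xs = np.linspace(1 / d, 1, 1001)

print(np.isclose(g(1 / d, d), 1 / d))          # g(1/d) = 1/d
print(np.isclose(g(1.0, d), 1.0))              # zero error is preserved
print(np.all(np.diff(g(xs, d)) >= 0))          # increasing on [1/d, 1]
print(np.all(np.diff(g(xs, d), 2) >= -1e-12))  # convex
print(g(0.3, d) > 0.3 ** 2)                    # beats Barnum-Knill x^2 near 1/d
```

The last line illustrates the comparison with the Barnum-Knill bound $p^{\mathrm{pgm}} \geq (p^{\mathrm{opt}})^2$, which is weaker when $p^{\mathrm{opt}}$ is close to $1/d$.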

One-way communication complexity
In this paper we only consider the two-party one-way model of communication. Let $f : \mathcal{X} \times \mathcal{Y} \rightarrow \mathcal{Z} \cup \{\bot\}$ be a partial function, $\mu$ be a distribution on $f^{-1}(\mathcal{Z})$ and $\epsilon \geq 0$. Let $\mu_X$ represent the marginal of $\mu$ on $\mathcal{X}$. In a two-party one-way communication protocol $\mathcal{P}$, Alice, with input $x \in \mathcal{X}$, communicates a message to Bob, with input $y \in \mathcal{Y}$. On receiving Alice's message, Bob produces the output of the protocol, $\mathcal{P}(x, y)$.
In a one-way classical communication protocol, Alice and Bob are allowed to use public and private randomness (independent of the inputs). In a one-way quantum communication protocol, Alice and Bob are allowed to do quantum operations and Alice can send a quantum message (qubits) to Bob. In an entanglement-assisted protocol, Alice and Bob start with a shared pure state (independent of the inputs) and Alice communicates a quantum message to Bob.
Let P represent a one-way communication protocol.

Let $\mathsf{CC}(\mathcal{P})$ be the maximum number of (qu)bits communicated in $\mathcal{P}$.
Definition 11. Let P represent a classical public-coin protocol.
Intuitively, $\mathsf{R}^1(f)$ is the classical communication complexity for the worst-case $(x, y)$. The following result, due to Yao [21], is a very useful fact connecting worst-case and distributional communication complexities.

Fact 7 (Yao's Principle [21]). $\mathsf{R}^1_\epsilon(f) = \max_\mu \mathsf{R}^{1,\mu}_\epsilon(f)$.

Fact 8 (Message compression [10, 11]). Let $\eta > 0$ and let $(X, C)$ be jointly distributed random variables. There is a one-way public-coin protocol (with public coins $R$ and message $M$) at the end of which Bob outputs a random variable $C'$ whose joint distribution with $X$ is $\eta$-close to that of $(X, C)$. The communication from Alice to Bob is at most $\mathrm{I}_{\max}(X : C) + O(\log\log(1/\eta))$.

Fact 9 (Markov's inequality). For any non-negative random variable $X$ and real number $a > 0$, $\Pr(X \geq a) \leq \mathbb{E}[X]/a$.

Fact 10. For a projector $\Pi$, we have $\|\Pi\|_F^2 = \mathrm{Tr}(\Pi) = \mathrm{rank}(\Pi)$.

The following fact follows from Theorem 4 in [8]. It can be shown that, for any fixed Hermitian operator $A$, one can use $T$ values $d(A, x_1), d(A, x_2), \ldots, d(A, x_T)$, where each $x_i \leftarrow M_{\mathrm{STAB}}(\rho)$, to estimate $\mathrm{Tr}(A\rho)$ using the standard median-of-means approach. Using Chebyshev's inequality and the Chernoff bound, we obtain $\Pr_{s \leftarrow S}(|d(A, s) - \mathrm{Tr}(A\rho)| \leq \epsilon) \geq 1 - \delta$.
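The median-of-means aggregation used above can be sketched generically. This toy version estimates a scalar mean from noisy samples; it is not the stabilizer-measurement procedure of [8], and the sample model and parameters are illustrative only:

```python
import random
import statistics

def median_of_means(samples, k):
    """Split samples into k equal batches, average each batch, and
    return the median of the batch means. The median step boosts a
    Chebyshev-quality estimate to exponentially small failure
    probability via a Chernoff bound, as in the text."""
    m = len(samples) // k
    means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(means)

rng = random.Random(0)
true_mean = 0.25
samples = [true_mean + rng.gauss(0, 1.0) for _ in range(6000)]
est = median_of_means(samples, k=30)
print(abs(est - true_mean) < 0.1)  # the estimate lands near the true mean
```

Each batch mean is accurate except with constant probability (Chebyshev); a majority of the 30 batches being accurate forces the median to be accurate, except with probability exponentially small in the number of batches (Chernoff).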

Product distribution proof
Here we restate Theorem 1 and provide its proof.
Proof. The proofs of all the inequalities are very similar. We give a detailed proof of the first and state the differences needed to obtain the other inequalities at the end. Recall that we use the notation $\psi$ to represent both the state and the density matrix $|\psi\rangle\langle\psi|$ associated with $|\psi\rangle$. Let $S^y_z = \{x \mid f(x, y) = z\}$ and $\mathsf{Q}^{1,\mu,*}_\epsilon(f) = a$.
Consider an optimal distributional entanglement-assisted quantum communication strategy. Let the initial state be $|x\rangle_X \otimes |\rho\rangle_{AB}$, where $|\rho\rangle_{AB}$ is the shared entanglement between Alice and Bob (Alice and Bob hold registers $A$, $B$ respectively). Alice applies a unitary $U : \mathcal{H}_X \otimes \mathcal{H}_A \rightarrow \mathcal{H}_X \otimes \mathcal{H}_{A'} \otimes \mathcal{H}_D$ ($U$ is a unitary conditioned on $X = x$) and sends across register $D$ to Bob. Writing $Q \equiv DB$, let the state at this point be $\rho_{XQ} = \sum_x \mu_X(x) |x\rangle\langle x| \otimes \rho^x_Q$. Bob performs a measurement $\mathcal{M}^y$ with POVM elements $\{E^y_z : z \in \mathcal{Z}\}$ on register $Q$, conditioned on $Y = y$, to output $f(x, y)$. Then, $$\sum_{x,y} \mu(x, y) \mathrm{Tr}\big(E^y_{f(x,y)} \rho^x_Q\big) \geq 1 - \epsilon\enspace.$$ This implies, $$\sum_y \mu_Y(y) \sum_{z \in \mathcal{Z}} \mu^y(z) \mathrm{Tr}\big(E^y_z \rho^{y,z}_Q\big) \geq 1 - \epsilon\enspace,$$ where we defined $\mu^y(z) = \sum_{x \in S^y_z} \mu_X(x)$ and $\rho^{y,z}_Q = \frac{1}{\mu^y(z)} \sum_{x \in S^y_z} \mu_X(x) \rho^x_Q$. Note that the $\rho^{y,z}_Q$ are density matrices and $\sum_{z \in \mathcal{Z}} \mu^y(z) = \sum_{z \in \mathcal{Z}} \sum_{x \in S^y_z} \mu_X(x) = \sum_{x \in \mathcal{X}} \mu_X(x) = 1$. We can view $\sum_{z \in \mathcal{Z}} \mu^y(z) \mathrm{Tr}\big(\rho^{y,z}_Q E^y_z\big)$ as the success probability of distinguishing the cq-state $\rho^y_{ZQ} = \sum_{z \in \mathcal{Z}} \mu^y(z) |z\rangle\langle z| \otimes \rho^{y,z}_Q$ with the measurement $\mathcal{M}^y$. We have $$\sum_y \mu_Y(y)\, p^{\mathrm{opt}}_g(Z^y | Q)_{\rho^y} \geq 1 - \epsilon\enspace,$$ where we defined the random variable $Z^y \stackrel{\mathrm{def}}{=} f(X, y)$. Applying the function $g$ of Fact 6 to both sides and using the convexity of $g$, we have $$\sum_y \mu_Y(y)\, p^{\mathrm{pgm}}_g(Z^y | Q)_{\rho^y} \geq g(1 - \epsilon)\enspace.$$ The POVM elements of the PGM for $\rho^y_{ZQ}$ are $E^{\mathrm{pgm},y}_z = A^{-1/2} A^y_z A^{-1/2}$, where $A^y_z = \mu^y(z) \rho^{y,z}_Q$ and $A = \sum_{z \in \mathcal{Z}} A^y_z$. Note that $A^y_z = \sum_{x \in S^y_z} \mu_X(x) \rho^x_Q$, and $A = \sum_x \mu_X(x) \rho^x_Q$ is independent of $y$. Hence, by linearity, $E^{\mathrm{pgm},y}_z = \sum_{x \in S^y_z} E^{\mathrm{pgm}}_x$, where $\{E^{\mathrm{pgm}}_x : x \in \mathcal{X}\}$ are the POVM elements of the PGM that guesses $X$ from $Q$. Therefore, instead of measuring with $\mathcal{M}^{\mathrm{pgm},y}_Z$, we can measure with $\mathcal{M}^{\mathrm{pgm}}_X$, getting a guess $x'$, and then compute $f(x', y)$ as our guess of $Z^y = f(X, y)$. More precisely, define $C \equiv \mathcal{M}^{\mathrm{pgm}}_X(Q)$. Since $C$ is a classical random variable that is independent of $y$, Alice can compute $C$ by herself. Consider the intermediate classical one-way communication protocol in which Alice computes and sends $C = \mathcal{M}^{\mathrm{pgm}}_X(Q)$ to Bob, and Bob predicts $f(x, y)$ with $z = f(C, y)$. The success probability of this intermediate protocol is at least $\sum_y \mu_Y(y)\, p^{\mathrm{pgm}}_g(Z^y | Q)_{\rho^y} \geq g(1 - \epsilon)$, by the two displayed inequalities above.
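The splitting step, that the PGM for guessing $Z^y$ is obtained by summing the $X$-guessing PGM elements over $S^y_z$ because both ensembles share the same normalizer $A$, is a linear-algebra identity. It can be checked numerically on a toy instance (the choice $f(x, y) = x \bmod 2$ for a fixed column $y$, and the random pure states, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_pure(dim):
    """A random pure state as a density matrix."""
    k = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    k /= np.linalg.norm(k)
    return np.outer(k, k.conj())

def inv_sqrt(A, tol=1e-12):
    """A^{-1/2} on the support of the PSD matrix A."""
    w, v = np.linalg.eigh(A)
    return sum(np.outer(v[:, i], v[:, i].conj()) / np.sqrt(w[i])
               for i in range(len(w)) if w[i] > tol)

# Toy instance: X in {0,1,2,3}; for this column y, f(x, y) = x mod 2,
# so S_0 = {0, 2} and S_1 = {1, 3}.
mu = np.array([0.1, 0.2, 0.3, 0.4])
rhos = [rand_pure(3) for _ in range(4)]

A_x = [m * r for m, r in zip(mu, rhos)]
N = inv_sqrt(sum(A_x))            # shared normalizer A^{-1/2}
E_x = [N @ Ax @ N for Ax in A_x]  # X-guessing PGM

# Z-guessing PGM for the grouped ensemble A^y_z = sum_{x in S_z} mu(x) rho_x;
# its normalizer is the same A, so its elements split as claimed.
E_z = [N @ (A_x[0] + A_x[2]) @ N, N @ (A_x[1] + A_x[3]) @ N]
print(np.allclose(E_z[0], E_x[0] + E_x[2]))  # True
print(np.allclose(E_z[1], E_x[1] + E_x[3]))  # True
```

Since $A = \sum_x A_x = \sum_z A^y_z$ for every $y$, conjugation by $A^{-1/2}$ is the same linear map in both cases, and the grouping is exact.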
In this intermediate protocol, the message $C$ that Alice sends is not short; in fact, it has the same length as $X$. However, $C$ has low max-information with $X$: by Fact 2 (monotonicity), $$\mathrm{I}_{\max}(X : C) \leq \mathrm{I}_{\max}(X : Q)_\rho \leq 2\log(|D|) = 2a\enspace.$$ Therefore, using Fact 8, we can compress $C$ and get a classical one-way communication protocol with message $M$ and public coins $R$, such that the communication is at most $2a + O(\log\log(1/\eta))$ and the success probability is at least $g(1 - \epsilon) - \eta$. Therefore $\mathsf{R}^{1,\mu}_{2\epsilon - d\epsilon^2/(d-1) + \eta}(f) \leq 2a + O(\log\log(1/\eta))$, which gives the desired bound.
To obtain the second inequality, we note that Alice and Bob do not have the starting state $\rho_{AB}$, so the register $B$ is empty, and instead we have $\mathrm{I}_{\max}(X : Q)_\rho = \mathrm{I}_{\max}(X : D)_\rho \leq \log(|D|) = a$. The inequality is because of Fact 2. Everything else follows as before.
To obtain the third inequality, we basically condition on every possible $y$, i.e., we change "$\sum_y \mu_Y(y)$" in the proof to "for all $y$" and "$\sum_{x,y} \mu(x, y)$" to "for all $y$, $\sum_x \mu_X(x)$". We also do not need to use the convexity part of Fact 6.
To obtain the fourth inequality, we combine the changes to obtain the second and the third inequalities.

Non-product distribution proof
Here we restate Theorem 2 and provide a proof.

Theorem 4 (Restatement of Theorem 2). Let $\epsilon, \eta > 0$ be such that $\epsilon/\eta + \eta < 0.5$. Let $f : \mathcal{X} \times \mathcal{Y} \rightarrow \{0, 1, \bot\}$ be a partial function and $\mu$ be a distribution supported on $f^{-1}(\{0,1\})$. Then, $$\mathsf{R}^{1,\mu}_{3\eta}(f) = O(\mathsf{Q}^{1,\mu}_{\epsilon}(f) \cdot \mathsf{CS}(f)/\eta^3)\enspace.$$

For all $y$, let $b_y \in \{0, 1\}$ attain the minimum in the definition of $\mathsf{CS}(f)$ for column $y$, and define $\tilde{E}^y$ accordingly, where $\psi^x_Q = |\psi^x_Q\rangle\langle\psi^x_Q|$ denotes Alice's message state on input $x$. In the protocol $\mathcal{P}_1$, the final step (step 4) is: if Bob's estimated value turns out to be less than 0.5, he outputs $1 - b_y$; otherwise he outputs $b_y$.
Let $I(x, y, s)$ be the indicator function such that $I(x, y, s) = 1$ if subsample $s$ results in Bob (with input $y$) estimating $\mathrm{Tr}\big(|\psi^x_Q\rangle\langle\psi^x_Q|\, \tilde{E}^y\big)$ up to additive error $\eta$.
In $\mathcal{P}_1$, the message $S$ (averaged over $x$) that Alice sends is of size $O(T a^2)$. However, $S$ has low max-information with $X$. Using Fact 2, we have $$\mathrm{I}_{\max}(X : S) = \mathrm{I}_{\max}\big(X : (M_{\mathrm{STAB}}(Q))^{\otimes T}\big) \leq \mathrm{I}_{\max}(X : Q^{\otimes T})_\psi \leq T a\enspace.$$

Acknowledgment
The work of RJ was supported in part by the National Research Foundation, Singapore, under Grant NRF2021-QEP2-02-P05; and in part by the Ministry of Education, Singapore, through the Research Centers of Excellence Program. The work of NGB was done while he was a graduate student at the Centre for Quantum Technologies and was supported by the Ministry of Education, Singapore through the Research Centers of Excellence Program. The work of HL was funded by MOST Grant no. 110-2222-E-007-002-MY3.