Merlin-Arthur with efficient quantum Merlin and quantum supremacy for the second level of the Fourier hierarchy

We introduce a simple sub-universal quantum computing model, which we call the Hadamard-classical circuit with one-qubit (HC1Q) model. It consists of a classical reversible circuit sandwiched between two layers of Hadamard gates, and therefore it is in the second level of the Fourier hierarchy. We show that output probability distributions of the HC1Q model cannot be classically efficiently sampled within a multiplicative error unless the polynomial-time hierarchy collapses to the second level. The proof technique is different from those used for previous sub-universal models, such as IQP, Boson Sampling, and DQC1, and therefore the technique itself might be useful for finding other sub-universal models that are hard to classically simulate. We also study the classical verification of quantum computing in the second level of the Fourier hierarchy. To this end, we define a promise problem, which we call the probability distribution distinguishability with maximum norm (PDD-Max). It is a promise problem to decide whether output probability distributions of two quantum circuits are far apart or close. We show that PDD-Max is BQP-complete, but if the two circuits are restricted to some types in the second level of the Fourier hierarchy, such as the HC1Q model or the IQP model, PDD-Max has a Merlin-Arthur system with quantum polynomial-time Merlin and classical probabilistic polynomial-time Arthur.


I. INTRODUCTION
A. Quantum supremacy of the HC1Q model

The Fourier hierarchy [1] is a hierarchy of restricted quantum circuits. The $k$th level of the Fourier hierarchy, $\mathrm{FH}_k$, is the class of quantum circuits with $k$ layers of Hadamard gates, in which all other gates preserve the computational basis. The second level, $\mathrm{FH}_2$, is the most important, because circuits in $\mathrm{FH}_2$ are used in many algorithms, such as Simon's algorithm [2] and Shor's factoring algorithm [3]. The instantaneous quantum polynomial-time (IQP) model [4,5] is also in $\mathrm{FH}_2$:

Definition 1 (IQP) The IQP model on $N$ qubits is the following quantum computing model:
1. The initial state is $|0^N\rangle$.
2. The unitary $H^{\otimes N} D H^{\otimes N}$ is applied. Here $H$ is the Hadamard gate, and $D$ is a quantum circuit consisting only of $Z$-diagonal gates, such as $Z$, $CZ$, $CCZ$, and $e^{i\theta Z}$.
3. All qubits are measured in the computational basis.
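As a small illustration of Definition 1, the sketch below computes the IQP output distribution by brute-force state-vector simulation. This is feasible only for a handful of qubits; the hardness results concern large $N$. The function names and the convention that $D$ is given by a phase function with $D|x\rangle = e^{i\theta(x)}|x\rangle$ are our own assumptions. Any product of $Z$, $CZ$, $CCZ$, and $e^{i\theta Z}$ gates is diagonal and therefore fits this form.

```python
import numpy as np

def hadamard_layer(n):
    """Return the 2^n x 2^n matrix H^{⊗n} (qubit 0 = most significant bit)."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, h)
    return out

def iqp_distribution(n, phase):
    """Output distribution of H^{⊗n} D H^{⊗n} |0^n>.

    D is diagonal: D|x> = exp(i * phase(x)) |x>, where x is the integer that
    labels the basis state (qubit 0 = most significant bit). Brute force only.
    """
    dim = 2 ** n
    hn = hadamard_layer(n)
    diagonal = np.exp(1j * np.array([phase(x) for x in range(dim)], dtype=float))
    state = np.zeros(dim, dtype=complex)
    state[0] = 1.0
    state = hn @ (diagonal * (hn @ state))
    return np.abs(state) ** 2

def cz_on_qubits_0_1(x):
    """Phase pi when qubits 0 and 1 of a 3-qubit register are both 1 (a CZ gate)."""
    return np.pi * (((x >> 2) & 1) & ((x >> 1) & 1))

probs = iqp_distribution(3, cz_on_qubits_0_1)
```

With $D = CZ$ on qubits 0 and 1, the first two qubits come out uniformly random while qubit 2 stays $0$, so `probs` is $1/4$ on indices 0, 2, 4, 6.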
The IQP model is a well-known example of a sub-universal quantum computing model whose output probability distributions cannot be classically efficiently sampled unless the polynomial-time hierarchy collapses. Since a collapse of the polynomial-time hierarchy is considered highly unlikely in computer science, this result is evidence that the IQP model is hard to simulate classically. Other sub-universal models that exhibit similar quantum supremacy are also known, such as the depth-four model [6], the Boson sampling model [7], the DQC1 model [8][9][10][11][12], the Fourier sampling model [13], the conjugated Clifford model [14], and the random circuit model [15].
The first main result of the present paper is to add another simple model in $\mathrm{FH}_2$ to the above list of sub-universal models that exhibit quantum supremacy. We define the Hadamard-classical circuit with one-qubit (HC1Q) model as follows:

Definition 2 (HC1Q) The Hadamard-classical circuit with one-qubit (HC1Q) model on $N$ qubits is the following quantum computing model (see Fig. 1(a)):
1. The initial state is $|0^N\rangle$.

We show that the HC1Q model is universal under postselection. More precisely, we show the following:

Theorem 3 Let $U$ be a polynomial-time uniformly-generated quantum circuit acting on $n$ qubits that consists of $\mathrm{poly}(n)$ Hadamard gates and classical reversible gates, such as $X$, CNOT, and Toffoli. Then from $U$ we can efficiently construct an HC1Q circuit on $N = \mathrm{poly}(n)$ qubits such that postselection on some output qubits generates the state $U|0^n\rangle$.
Since Hadamard and classical gates together are universal, Theorem 3 shows that the HC1Q model is universal under postselection. A proof of the theorem is given in Sec. II. The proof is based on a new idea that differs from those used in previous results [4,6,7,9].
Proofs for the depth-four model [6], the Boson sampling model [7], and the IQP model [4] use gadgets that insert gates in a post hoc way by using postselection. By using the arguments of Refs. [4,7,9], we obtain the following corollary of Theorem 3, which establishes quantum supremacy of the HC1Q model:

Corollary 4 Output probability distributions of the HC1Q model cannot be classically efficiently sampled within a multiplicative error $\epsilon < 1$ unless the polynomial-time hierarchy collapses to the third level.
Here, we say that a probability distribution $\{p_z\}_z$ is classically efficiently sampled within a multiplicative error $\epsilon$ if there exists a classical probabilistic polynomial-time algorithm such that $|p_z - q_z| \le \epsilon p_z$ for all $z$, where $q_z$ is the probability that the algorithm outputs $z$.
The corollary demonstrates an interesting "phase transition" between classical and quantum. To see this, consider the circuit of Fig. 1(b), which is obtained by removing the last $|0\rangle$ qubit of the HC1Q model (Fig. 1(a)). Here, $C'$ is a polynomial-time uniformly-generated classical reversible circuit. The circuit of Fig. 1(b) is trivially classically simulatable, since $C'$ is a permutation on $\{0,1\}^{N-1}$ and therefore $H^{\otimes(N-1)} C' H^{\otimes(N-1)}|0^{N-1}\rangle = |0^{N-1}\rangle$: the uniform superposition is invariant under any permutation of basis states. Our result therefore suggests that adding a single $|0\rangle$ qubit to the trivial circuit of Fig. 1(b) changes its complexity dramatically.
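The contrast can be checked numerically on tiny instances. The sketch below assumes, consistently with the appendix calculation, that the HC1Q circuit is $(H^{\otimes(N-1)}\otimes I)\,C\,(H^{\otimes(N-1)}\otimes I)$ applied to $|0^N\rangle$. It models $C$ and $C'$ as arbitrary permutations of basis-state indices (qubit 0 is the most significant bit); all helper names are hypothetical. The circuit of Fig. 1(b) always outputs $0^{N-1}$, whereas a single CNOT onto the extra qubit already gives a non-trivial distribution.

```python
import numpy as np

def hadamard_layer(n):
    """Return H^{⊗n} as a dense matrix."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    out = np.array([[1.0]])
    for _ in range(n):
        out = np.kron(out, h)
    return out

def apply_permutation(perm, state):
    """Apply the classical reversible map |u> -> |perm[u]>."""
    out = np.zeros_like(state)
    out[np.asarray(perm)] = state
    return out

def fig1b_distribution(perm):
    """Output distribution of H^{⊗n} C' H^{⊗n} |0^n> with n = log2(len(perm))."""
    n = len(perm).bit_length() - 1
    hn = hadamard_layer(n)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    return np.abs(hn @ apply_permutation(perm, hn @ state)) ** 2

def hc1q_distribution(perm):
    """Output distribution of the assumed HC1Q layout (H^{⊗(N-1)} ⊗ I) C (H^{⊗(N-1)} ⊗ I)."""
    n_total = len(perm).bit_length() - 1
    layer = np.kron(hadamard_layer(n_total - 1), np.eye(2))  # no H on the extra qubit
    state = np.zeros(2 ** n_total)
    state[0] = 1.0
    return np.abs(layer @ apply_permutation(perm, layer @ state)) ** 2

rng = np.random.default_rng(0)
trivial = fig1b_distribution(rng.permutation(16))   # random C' on 4 qubits
with_extra_qubit = hc1q_distribution([0, 1, 3, 2])  # C = CNOT from qubit 0 onto qubit 1
```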
The third level collapse of the polynomial-time hierarchy for the depth-four model, the Boson sampling model, the IQP model, and the DQC1 model can be improved to the second level collapse [11,12]. In Sec. III, we show that the same improvement is possible for the HC1Q model: Its proof is omitted since it is similar to that of Theorem 6.
At present, we do not know which sub-universal model is the most promising for experimental realization, but the HC1Q model should be useful for certain experimental setups due to its simple structure.

It is open whether every problem $L$ in BQP has an interactive proof system with a quantum polynomial-time prover and a classical probabilistic polynomial-time verifier.

Definition 9
We say that a problem $L$ has an interactive proof system with a quantum polynomial-time prover if there exists a classical probabilistic polynomial-time verifier such that
• If $x \in L$, then there exists a quantum polynomial-time prover such that the verifier accepts with probability at least 2/3.
• If $x \notin L$, then for any prover the verifier accepts with probability at most 1/3.
(Note that in this definition, the prover is required to run in quantum polynomial time only for yes instances, i.e., when the prover is honest. For no instances, the computational power of a malicious prover is unbounded.)

Answering this open question is important not only for practical applications of cloud quantum computing but also for the foundations of computer science and quantum physics [16].
In fact, several partial solutions to the open problem have been obtained. They are categorized into the following four types:
1. Several verification protocols [17,18] and verifiable blind quantum computing protocols [19][20][21][22][23] demonstrate that if the verifier has a weak quantum ability, such as preparations or measurements of single-qubit quantum states, any BQP problem can be verified with a quantum polynomial-time prover.
2. If multiple entangled quantum polynomial-time provers who do not communicate with each other are allowed, any BQP problem can be verified by a classical polynomial-time verifier [24][25][26].
3. Since BQP is contained in IP [27], a natural approach to the open problem is to restrict the prover of IP to quantum polynomial time when the problem is in BQP. In fact, a step in this direction has recently been taken in Ref. [28]: its authors constructed a new interactive proof system that verifies the value of the trace of operators with a postBQP prover and a classical polynomial-time verifier.
4. It has recently been shown that the classical verification of quantum computing is possible if a certain problem is assumed to be hard for quantum computing [29].
Actually, the answer to the open problem is unconditionally yes for specific BQP problems. For example, it is known that recursive Fourier sampling [30] has an interactive proof system with a quantum polynomial-time prover and a classical polynomial-time verifier who exchange a polynomial number of messages [31]. Furthermore, it has recently been shown that calculating orders of solvable groups has an interactive proof system with a quantum polynomial-time prover and a classical polynomial-time verifier who exchange two or three messages [33]. Finally, it was suggested in Ref. [32] that the problem of deciding, for circuits in $\mathrm{FH}_2$, whether there exist outcomes that occur with high probability has a Merlin-Arthur system with quantum polynomial-time Merlin.

Definition 10
We say that a problem $L$ has a Merlin-Arthur system with a quantum polynomial-time Merlin if $L$ has an interactive proof system with a quantum polynomial-time prover (Merlin) and a classical probabilistic polynomial-time verifier (Arthur) in which only a single message is sent, from the prover to the verifier.
The second main result of the present paper is to introduce another problem in BQP that is classically verifiable. More precisely, we define the following promise problem, which we call Probability Distribution Distinguishability with Maximum Norm (PDD-Max):

Definition 11 (PDD-Max) Given a classical description of two quantum circuits $U_1$ and $U_2$ acting on $N$ qubits that consist of $\mathrm{poly}(N)$ elementary gates, and parameters

We first show that PDD-Max exactly characterizes the power of BQP:

Theorem 12 PDD-Max is BQP-complete (under polynomial-time many-one reductions).

These results demonstrate that if we restrict the circuits of PDD-Max to the form of the HC1Q model, we obtain another example of a classically verifiable problem. They also suggest that such a restriction of PDD-Max is not BQP-hard, since BQP is not believed to be contained in MA [34].
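For intuition only, the yes/no condition of PDD-Max can be evaluated directly when both output distributions are written out explicitly, which of course takes time exponential in $N$. Because Definition 11 is cut off above, the thresholds here are an assumption taken from the proofs in Secs. V and VI: a yes instance has some $z$ with $|p_z - q_z| \ge a$, a no instance has $|p_z - q_z| \le b$ for all $z$, and $a > b$. The function names are ours.

```python
import numpy as np

def max_norm_distance(p, q):
    """max_z |p_z - q_z| for two explicitly given probability vectors."""
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return float(np.max(np.abs(diff)))

def pdd_max_answer(p, q, a, b):
    """'yes' if max_z |p_z - q_z| >= a, 'no' if it is <= b, None if the promise fails.

    The thresholds a > b follow the proofs of Secs. V and VI (an assumption here).
    """
    if not a > b:
        raise ValueError("expected a > b")
    distance = max_norm_distance(p, q)
    if distance >= a:
        return "yes"
    if distance <= b:
        return "no"
    return None
```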

Its proof is given in
The classical verifiability of PDD-Max for restricted circuits does not seem to be directly related to the classical verification of quantum supremacy (for example, verifying the output of the IQP model), but exploring any relation between the two is an important subject for future research.

C. Preliminary
In this paper, we often use the following well-known inequality:

Theorem 14 (Chernoff-Hoeffding bound) Let $X_1, \dots, X_T$ be independent and identically distributed real random variables with $|X_i| \le 1$ for every $i = 1, \dots, T$. Then, for any $\epsilon > 0$,
$$\Pr\left[\left|\frac{1}{T}\sum_{i=1}^{T} X_i - E(X_1)\right| \ge \epsilon\right] \le 2e^{-T\epsilon^2/2}.$$

In the following sections, we give the proofs of the theorems. In the last section, Sec. VII, we provide some discussion.
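The bound is typically used in reverse: given a target precision $\epsilon$ and a failure probability $\delta$, it fixes how many samples suffice, namely $T \ge 2\ln(2/\delta)/\epsilon^2$. The short sketch below (function names are ours) computes this $T$ and forms the empirical mean of a bounded random variable. With $\epsilon = (a-b)/8$ and $\delta = 2e^{-k}$, the same formula gives the choice $T \ge 128k/(a-b)^2$ made in Secs. V and VI.

```python
import math
import random

def hoeffding_sample_size(eps, delta):
    """Smallest integer T with 2 * exp(-T * eps**2 / 2) <= delta, for |X_i| <= 1."""
    return math.ceil(2.0 * math.log(2.0 / delta) / eps ** 2)

def empirical_mean(draw, T, rng):
    """Average of T independent draws of a bounded random variable."""
    return sum(draw(rng) for _ in range(T)) / T

def plus_minus(rng):
    """Example variable: +1 with probability 0.7, -1 otherwise, so E(X_i) = 0.4."""
    return 1.0 if rng.random() < 0.7 else -1.0

T = hoeffding_sample_size(0.1, 0.01)
estimate = empirical_mean(plus_minus, T, random.Random(1))
```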

II. PROOF OF THEOREM 3
In this section, we prove Theorem 3. We are given a unitary operator acting on $n$ qubits, where $u_i$ is the Hadamard gate $H$ or a classical gate for all $i = 1, 2, \dots, t$.
The outline of our proof is as follows. We first construct a polynomial-time non-deterministic algorithm from $U'$. We next define, from this non-deterministic algorithm, a polynomial-time quantum unitary operator $W$ that consists only of classical gates. We finally show that the HC1Q model of Fig. 2

1. The initial state of the register is $(s = 0, z = 0^n)$.
2-a. If $u_i$ is a classical gate (such as $X$, CNOT, or Toffoli) whose action on the computational basis is $z \to g(z)$, update the state of the register as $(s, z) \to (s, g(z))$.

2-b. If $u_i$ is $H$ acting on the $j$th qubit, update the state of the register in the following non-deterministic way:

i.e., $h$ is the number of $H$ gates appearing in $U'$. The above polynomial-time non-deterministic algorithm makes the non-deterministic transition $h$ times, and therefore the algorithm has $2^h$ computational paths. We label each path by an $h$-bit string $y \in \{0,1\}^h$. We write the final state of the register for the path $y \in \{0,1\}^h$ as $(s(y), z(y))$, where $s(y) \in \{0,1\}$ and $z(y) \in \{0,1\}^n$. Then, we obtain

It is easy to see that $W$ can be constructed with only classical gates. Now we show that the state $H^{\otimes n}U'|0^n\rangle = U|0^n\rangle$ can be generated, with postselection, by the HC1Q model of Fig. 2 that uses $W$. In Fig. 2, the state immediately before the postselection is

where $|\psi\rangle$ is a certain $(h+n+1)$-qubit state whose details are irrelevant [35]. After the postselection, the state becomes

Hence, we have shown that the HC1Q model of Fig. 2 with postselection can generate $U|0^n\rangle$ for any unitary $U$ that consists of Hadamard and classical gates, which means that the HC1Q model is universal under postselection.
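The transition rule of step 2-b and the resulting identity are not reproduced above. The sketch below assumes the standard path-sum rule: $H$ on qubit $j$ maps $(s, z)$ to $(s \oplus z_j y, z')$, where $y$ is a non-deterministic bit and $z'$ is $z$ with $z_j$ replaced by $y$. Since $\langle y|H|z_j\rangle = (-1)^{z_j y}/\sqrt{2}$, this gives $U'|0^n\rangle = 2^{-h/2}\sum_{y}(-1)^{s(y)}|z(y)\rangle$. This is our reconstruction, not a quotation. The gate encoding (tuples such as `("H", j)`, `("CNOT", c, t)`, `("TOFFOLI", c1, c2, t)`) and all function names are hypothetical. The sketch enumerates all $2^h$ paths and checks the result against a direct state-vector simulation.

```python
import itertools
import numpy as np

def apply_classical(gate, z):
    """Action g of a classical reversible gate on the bit list z (qubit 0 first)."""
    name, *wires = gate
    z = list(z)
    if name == "X":
        z[wires[0]] ^= 1
    elif name == "CNOT":
        control, target = wires
        z[target] ^= z[control]
    elif name == "TOFFOLI":
        c1, c2, target = wires
        z[target] ^= z[c1] & z[c2]
    else:
        raise ValueError(f"unknown gate {name}")
    return z

def to_index(z):
    return int("".join(map(str, z)), 2)

def path_sum_state(n, gates):
    """U'|0^n> = 2^{-h/2} sum_y (-1)^{s(y)} |z(y)> over all 2^h paths y."""
    h = sum(1 for gate in gates if gate[0] == "H")
    amplitudes = np.zeros(2 ** n)
    for y in itertools.product((0, 1), repeat=h):
        s, z, choices = 0, [0] * n, iter(y)
        for gate in gates:
            if gate[0] == "H":
                j, y_bit = gate[1], next(choices)
                s ^= z[j] & y_bit   # sign (-1)^{z_j y} picked up by H|z_j>
                z[j] = y_bit        # qubit j is replaced by |y>
            else:
                z = apply_classical(gate, z)
        amplitudes[to_index(z)] += (-1) ** s
    return amplitudes / np.sqrt(2 ** h)

def statevector_state(n, gates):
    """Gate-by-gate state-vector simulation, used only for comparison."""
    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for gate in gates:
        new_state = np.zeros_like(state)
        for index, amp in enumerate(state):
            z = [int(c) for c in format(index, f"0{n}b")]
            if gate[0] == "H":
                j = gate[1]
                for y_bit in (0, 1):
                    w = z.copy()
                    w[j] = y_bit
                    new_state[to_index(w)] += hadamard[y_bit, z[j]] * amp
            else:
                new_state[to_index(apply_classical(gate, z))] += amp
        state = new_state
    return state

circuit = [("H", 0), ("H", 1), ("TOFFOLI", 0, 1, 2), ("H", 2), ("CNOT", 2, 0), ("H", 0)]
```

Here `circuit` plays the role of $U'$; applying $H^{\otimes n}$ to the result would give $U|0^n\rangle$ as in the text.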

III. PROOF OF THEOREM 5
It was shown in Refs. [11,12] that the third-level collapse of the polynomial-time hierarchy for most of the sub-universal models (including the depth-four model [6], the Boson sampling model [7], the IQP model [4], and the DQC1 model [9]) can be improved to a second-level collapse. The idea is to use the class NQP [37] instead of postBQP. NQP is a quantum version of NP, and is defined as follows:

The probability $p_\Lambda$ of obtaining $\Lambda$ is

By the definition of NQP, when $x \in A_{\rm no}$. This means that $p_\Lambda > 0$ when $x \in A_{\rm yes}$ and $p_\Lambda = 0$ when $x \in A_{\rm no}$.
Assume that $p_\Lambda$ can be classically efficiently sampled within a multiplicative error $\epsilon < 1$. This means that there exists a classical probabilistic polynomial-time algorithm that outputs 0 or 1 such that

where $q_0$ is the probability that the classical algorithm outputs 0 (accepts). Then, we can show that $A$ is in NP. In fact, if $x \in A_{\rm yes}$ then

According to Definition 16, $A$ is therefore in NP.
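The key step is that a multiplicative error $\epsilon < 1$ cannot turn a zero probability into a nonzero one, or vice versa: $|p - q| \le \epsilon p$ forces $q = 0$ when $p = 0$, and $q \ge (1-\epsilon)p > 0$ when $p > 0$. The sketch below (function names are ours) checks this on random vectors and shows that $\epsilon = 1$ already breaks the property.

```python
import random

def within_multiplicative_error(p, q, eps):
    """True if |p_z - q_z| <= eps * p_z for every outcome z."""
    return all(abs(pz - qz) <= eps * pz for pz, qz in zip(p, q))

def same_support(p, q):
    """True if q_z > 0 exactly when p_z > 0."""
    return all((pz > 0) == (qz > 0) for pz, qz in zip(p, q))

def perturb(p, eps, rng):
    """Multiply each p_z by an independent random factor in [1 - eps, 1 + eps]."""
    return [pz * (1 + rng.uniform(-eps, eps)) for pz in p]
```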

IV. PROOF OF THEOREM 6
In this section, we prove Theorem 6. For simplicity, we assume that the first $k$ qubits are measured, where $k = O(\log N)$. The case of any other $k$ qubits is analogous.
The outline of our proof is as follows. We first construct a probability distribution $\{q_z\}_z$ that can be calculated in classical polynomial time and is within $1/\mathrm{poly}(N)$ of $\{p_z\}_z$ in $L_1$ norm. We then show that $\{q_z\}_z$ can be sampled in classical polynomial time.
Let us consider the circuit of Fig. 1(a). As shown in Appendix B, the probability of obtaining the measurement result $z \in \{0,1\}^k$ for the first $k$ qubits is

The subset $S$ can be obtained in polynomial time in the following way:
2. Repeat the following for all $\alpha \in \{0,1\}^k$.

End.
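The formula for $p_z$ from Appendix B is not reproduced in this section, so the sketch below re-derives one estimator with the properties the text relies on; it may differ in form from the paper's $f$. Assume, as in the appendix calculation, that the circuit is $(H^{\otimes(N-1)}\otimes I)\,C\,(H^{\otimes(N-1)}\otimes I)|0^N\rangle$. Write $C(x0) = (c_1(x), c_2(x), b(x))$ with $c_1(x) \in \{0,1\}^k$, $c_2(x) \in \{0,1\}^{N-1-k}$, and $b(x) \in \{0,1\}$. Summing squared amplitudes over the unmeasured qubits gives $p_z = E_x[f(x)]$ for uniform $x \in \{0,1\}^{N-1}$, with

$$f(x) = 2^{-k}\sum_{\alpha \in S(x)} (-1)^{(c_1(x)\oplus\alpha)\cdot z},\qquad S(x) = \{\alpha \in \{0,1\}^k : \text{the last bit of } C^{-1}(\alpha, c_2(x), b(x)) \text{ is } 0\}.$$

Hence $|S(x)| \le 2^k$ and $|f(x)| \le 1$, and $S(x)$ is found by looping over all $\alpha$. The code models $C$ as an arbitrary permutation and computes $C^{-1}$ by table lookup. It checks $E_x[f] = p_z$ against brute force for $N = 5$, $k = 2$, and forms the Monte Carlo estimate $\tilde p_z$. All names are hypothetical.

```python
import itertools
import random
import numpy as np

def to_int(bits):
    """Big-endian: bits[0] is qubit 0, the most significant bit."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def to_bits(value, n):
    return tuple((int(value) >> (n - 1 - i)) & 1 for i in range(n))

def exact_marginal(perm, n_qubits, k):
    """Brute-force p_z of the first k qubits of (H^{⊗N-1}⊗I) C (H^{⊗N-1}⊗I)|0^N>."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    layer = np.array([[1.0]])
    for _ in range(n_qubits - 1):
        layer = np.kron(layer, h)
    layer = np.kron(layer, np.eye(2))   # the extra qubit gets no Hadamard
    state = np.zeros(2 ** n_qubits)
    state[0] = 1.0
    state = layer @ state
    permuted = np.zeros_like(state)
    permuted[perm] = state              # C|u> = |perm[u]>
    probs = np.abs(layer @ permuted) ** 2
    marginal = {}
    for index, prob in enumerate(probs):
        z = to_bits(index, n_qubits)[:k]
        marginal[z] = marginal.get(z, 0.0) + prob
    return marginal

def f_value(perm, inv_perm, n_qubits, k, x, z):
    """Bounded f(x), |f(x)| <= 1, whose mean over uniform x in {0,1}^(N-1) is p_z."""
    out = to_bits(perm[to_int(x + (0,))], n_qubits)
    c1, c2, b = out[:k], out[k:n_qubits - 1], out[-1]
    total = 0
    for alpha in itertools.product((0, 1), repeat=k):   # build S(x) on the fly
        pre = to_bits(inv_perm[to_int(alpha + c2 + (b,))], n_qubits)
        if pre[-1] == 0:   # (alpha, c2, b) = C(x'0) for some x', so alpha is in S(x)
            parity = sum((ci ^ ai) & zi for ci, ai, zi in zip(c1, alpha, z)) % 2
            total += 1 - 2 * parity
    return total / 2 ** k

def estimate_pz(perm, inv_perm, n_qubits, k, z, T, rng):
    """Monte Carlo estimate (1/T) sum_i f(x_i) with uniformly random x_i."""
    total = 0.0
    for _ in range(T):
        x = tuple(rng.getrandbits(1) for _ in range(n_qubits - 1))
        total += f_value(perm, inv_perm, n_qubits, k, x, z)
    return total / T

N, k = 5, 2
perm = np.random.default_rng(7).permutation(2 ** N)
inv = np.argsort(perm)   # inv[perm[u]] = u
marginal = exact_marginal(perm, N, k)
```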
From the construction, $|S| \le 2^k = \mathrm{poly}(N)$. Therefore, the value of $f(x)$ can be computed exactly in classical polynomial time.

Let us generate random bit strings $x_1, \dots, x_T \in \{0,1\}^{N-1}$. We then calculate
$$\tilde{p}_z \equiv \frac{1}{T}\sum_{i=1}^{T} f(x_i).$$
Note that if we take $X_i = f(x_i)$, then $E(X_i) = p_z$, and therefore, from the Chernoff-Hoeffding bound,
$$\Pr\left[|\tilde{p}_z - p_z| \ge \epsilon\right] \le 2e^{-T\epsilon^2/2}.$$
For any polynomial $r$, let us take $\epsilon = \frac{1}{5 \times 2^{2k} r}$. Given $\{\tilde{p}_z\}_z$, define the probability distribution $\{q_z\}_z$ by

Note that it is well defined, because

which is shown as follows:

It is shown as follows. First,

In this way, we have shown that we can calculate in classical polynomial time the probability distribution $\{q_z\}_z$ that is close to $\{p_z\}_z$. Our final task is to show that $\{q_z\}_z$ can be sampled in classical polynomial time. For simplicity, let us assume that each $q_z$ is represented as an $m$-bit binary fraction:
$$q_z = \sum_{j=1}^{m} a_{z,j} 2^{-j},$$
where $a_{z,j} \in \{0,1\}$ for $j = 1, 2, \dots, m$. (Otherwise, by polynomially increasing $m$, we obtain exponentially good approximations.) The following algorithm samples the probability distribution $\{q_z\}_z$.
where $y < z$ and $y \le z$ refer to the standard dictionary order. (For example, for three bits, 000 < 001 < 010 < 011 < 100 < 101 < 110 < 111.) In summary, we have shown that $\{p_z\}_z$ can be sampled in classical polynomial time within a $1/\mathrm{poly}(N)$ error in $L_1$ norm.
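The sampling step can be sketched as inverse-CDF sampling over the dictionary order. Draw a uniform $m$-bit integer $r$ and output the $z$ with $\sum_{y<z} q_y \le r/2^m < \sum_{y \le z} q_y$. The sketch below stores each $q_z$ as its integer numerator $q_z 2^m$, so all comparisons are exact. The names and the dictionary-of-tuples representation are our assumptions. Sorting tuples of bits in Python reproduces the dictionary order above.

```python
import random

def select_outcome(numerators, r):
    """Return the z with sum_{y<z} n_y <= r < sum_{y<=z} n_y, in dictionary order."""
    cumulative = 0
    for z in sorted(numerators):
        cumulative += numerators[z]
        if r < cumulative:
            return z
    raise ValueError("r must be smaller than the sum of the numerators")

def sample_dyadic(numerators, m, rng):
    """Sample z with probability q_z = numerators[z] / 2**m."""
    if sum(numerators.values()) != 2 ** m:
        raise ValueError("the numerators must sum to 2**m")
    return select_outcome(numerators, rng.getrandbits(m))

# q_00 = 3/8, q_01 = 0, q_10 = 4/8, q_11 = 1/8 (m = 3 bits each).
q = {(0, 0): 3, (0, 1): 0, (1, 0): 4, (1, 1): 1}
draws = [sample_dyadic(q, 3, random.Random(seed)) for seed in range(5)]
```

Each $r \in \{0, \dots, 2^m - 1\}$ maps to exactly one outcome, and outcome $z$ receives exactly $q_z 2^m$ values of $r$, so the sampler is exact and runs in time polynomial in $m$ and the number of outcomes.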

V. PROOF OF THEOREM 12
In this section, we show Theorem 12. Our proof consists of two parts. In the first subsection, we show that PDD-Max is BQP-hard. In the second subsection, we show that PDD-Max is in BQP.

A. BQP-hardness
Let $A = (A_{\rm yes}, A_{\rm no})$ be a promise problem in BQP. This means that there exists a uniform family $\{V_x\}_x$ of polynomial-size quantum circuits such that if $x \in A_{\rm yes}$ then $V_x$ accepts with probability at least $1 - 2^{-r}$, and if $x \in A_{\rm no}$ then $V_x$ accepts with probability at most $2^{-r}$, where $r$ is any polynomial. More precisely, let $V_x$ be the polynomial-size quantum circuit, acting on $n = \mathrm{poly}(|x|)$ qubits, that corresponds to the instance $x$. If we write $V_x|0^n\rangle$ as

with certain $(n-1)$-qubit states $|\phi_0\rangle$ and $|\phi_1\rangle$, we have $\alpha \ge 1 - 2^{-r}$ when $x \in A_{\rm yes}$, and $\alpha \le 2^{-r}$ when $x \in A_{\rm no}$. Let us consider the circuit of Fig. 3, which we call $U_1$. We also define

We now show that deciding whether $x \in A_{\rm yes}$ or $x \in A_{\rm no}$ can be reduced to PDD-Max with $U_1$ and $U_2$, which means that PDD-Max is BQP-hard. In fact, note that

Let $p_z \equiv |\langle z|U_1|0^{n+m+1}\rangle|^2$ be the probability that we obtain $z \in \{0,1\}^{n+m+1}$ when we measure all qubits of $U_1|0^{n+m+1}\rangle$ in the computational basis. When $x \in A_{\rm yes}$,

When $x \in A_{\rm no}$,

Since $U_2 = H^{\otimes(n+m+1)}$, it is obvious that $q_z = \frac{1}{2^{n+m+1}}$ for any $z$. Therefore, when $x \in A_{\rm yes}$,

and when $x \in A_{\rm no}$,

for any $z \in \{0,1\}^{n+m+1}$. In this way, we have shown that PDD-Max is BQP-hard.

B. In BQP
We next show that PDD-Max is in BQP. We show that the following BQP algorithm solves PDD-Max with a $1/\mathrm{poly}(N)$ completeness-soundness gap (i.e., the gap between the acceptance probabilities for yes instances and for no instances is lower-bounded by $1/\mathrm{poly}(N)$):
1. Flip a fair coin $s \in \{0,1\}$. Generate $U_{s+1}|0^N\rangle$, and measure each qubit in the computational basis. Let $z \in \{0,1\}^N$ be the measurement result.
2. Repeat the following for $i = 1, 2, \dots, T$, where $T$ is a polynomial in $N$ specified later.
2-a. Generate $U_1|0^N\rangle$, and measure all qubits in the computational basis. If the result is $z$, set $X_i = 1$; otherwise, set $X_i = 0$.

Calculate $\tilde{p}_z \equiv \frac{1}{T}\sum_{i=1}^{T} X_i$. From the Chernoff-Hoeffding bound, $|\tilde{p}_z - p_z| \le \epsilon$ with probability larger than $1 - 2e^{-T\epsilon^2/2}$.
3. In a similar way, calculate the estimator $\tilde{q}_z$ of $q_z \equiv |\langle z|U_2|0^N\rangle|^2$. From the Chernoff-Hoeffding bound, $|\tilde{q}_z - q_z| \le \epsilon$ with probability larger than $1 - 2e^{-T\epsilon^2/2}$.

The intuitive idea of this algorithm is as follows. By the definition of PDD-Max, if the answer is yes, there exists $z$ such that $|p_z - q_z|$ is large. In this case, the probability of obtaining such a $z$ in step 1 is large, so let us assume that we obtain such a $z$ in step 1. In other words, step 1 is the process of finding a candidate solution. In steps 2 and 3, the probabilities $p_z$ and $q_z$ are estimated by using the Chernoff-Hoeffding bound.
Finally, in step 4, we check whether $|\tilde{p}_z - \tilde{q}_z|$ is indeed large, and accept with high probability since it actually is. If the answer to PDD-Max is no, on the other hand, there is no $z$ such that $|p_z - q_z|$ is large, and therefore in step 4 we do not conclude that $|p_z - q_z|$ is large, except with some small failure probability. (Note that for general $U_1$ and $U_2$, estimating $p_z$ and $q_z$ seems to require BQP power, since it seems necessary to generate $U_1|0^N\rangle$ and $U_2|0^N\rangle$. We will see in the next section that if $U_1$ and $U_2$ are restricted to $\mathrm{FH}_2$, $p_z$ and $q_z$ can be estimated in classical polynomial time, and therefore PDD-Max has a Merlin-Arthur system with quantum polynomial-time Merlin.)

Now let us give a more precise proof that PDD-Max is in BQP. First, let us consider the case when the answer to PDD-Max is yes. If the $z$ obtained in step 1 satisfies $|p_z - q_z| \ge a$, and if $\tilde{p}_z$ and $\tilde{q}_z$ calculated in steps 2 and 3 satisfy $|\tilde{p}_z - p_z| \le \frac{a-b}{8}$ and $|\tilde{q}_z - q_z| \le \frac{a-b}{8}$, we definitely accept in step 4, because

and therefore

The probability $\eta$ that such an event occurs is calculated to be

where we have taken $T \ge \frac{128k}{(a-b)^2}$ and $k$ is any polynomial in $N$. Therefore, the acceptance probability $p_{\rm acc}$ of our protocol is lower-bounded as

Next, let us consider the case when the answer to PDD-Max is no. If $|\tilde{p}_z - p_z| \le \frac{a-b}{8}$ and $|\tilde{q}_z - q_z| \le \frac{a-b}{8}$, which occurs with probability at least $(1 - 2e^{-k})^2$, we definitely reject, because

The rejection probability $p_{\rm rej}$ is therefore lower-bounded as

Therefore, the acceptance probability $p_{\rm acc}$ is upper-bounded as

The completeness-soundness gap is therefore

for sufficiently large $k$, which shows that PDD-Max is in BQP.
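The sketch below is a classical Monte Carlo mock-up of steps 1 to 4. Sampling from an explicitly given distribution stands in for preparing $U_{s+1}|0^N\rangle$ and measuring it. Step 4 is not spelled out above, so its rule is an assumption: accept if and only if $|\tilde p_z - \tilde q_z| \ge (a+b)/2$. This threshold is consistent with the analysis. Estimates within $(a-b)/8$ give $|\tilde p_z - \tilde q_z| \ge (3a+b)/4$ for yes instances and $\le (a+3b)/4$ for no instances, and both bounds lie strictly on the correct side of $(a+b)/2$ when $a > b$. The sample size $T = \lceil 128k/(a-b)^2 \rceil$ follows the text; function names are ours.

```python
import math
import random

def pdd_max_trial(p, q, a, b, k, rng):
    """One run of steps 1-4, with p and q given as explicit lists of probabilities."""
    outcomes = range(len(p))
    s = rng.randrange(2)                                       # step 1: fair coin
    z = rng.choices(outcomes, weights=p if s == 0 else q)[0]   # sample from U_{s+1}
    T = math.ceil(128 * k / (a - b) ** 2)                      # T >= 128k/(a-b)^2
    p_est = rng.choices(outcomes, weights=p, k=T).count(z) / T  # step 2
    q_est = rng.choices(outcomes, weights=q, k=T).count(z) / T  # step 3
    return abs(p_est - q_est) >= (a + b) / 2                   # step 4 (assumed rule)

def acceptance_rate(p, q, a, b, k, runs, seed=0):
    """Fraction of accepting runs, using a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return sum(pdd_max_trial(p, q, a, b, k, rng) for _ in range(runs)) / runs

far_p, far_q = [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]
flat = [0.25] * 4
```

With $a = 0.5$, $b = 0.1$, and $k = 3$, the pair `far_p`, `far_q` is a yes instance and is accepted in essentially every run, while the pair `flat`, `flat` is a no instance and is essentially always rejected.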

VI. PROOF OF THEOREM 13
In this section, we prove Theorem 13. Before giving the proof, let us explain the intuitive idea. In Sec. V B, we have seen that PDD-Max is in BQP for general $U_1$ and $U_2$. In step 1 of the algorithm, a candidate solution, i.e., a $z$ such that $|p_z - q_z| \ge a$, is obtained by quantum computation. In steps 2 and 3, further quantum computation is needed to estimate $p_z$ and $q_z$. In the following, we will see that if $U_1$ and $U_2$ are HC1Q circuits, $p_z$ and $q_z$ can be estimated in classical polynomial time. Therefore, we can construct the following

3. Arthur calculates $\tilde{p}_z$

If the $\tilde{p}_z$ and $\tilde{q}_z$ that Arthur calculates satisfy $|\tilde{p}_z - p_z| \le \frac{a-b}{8}$ and $|\tilde{q}_z - q_z| \le \frac{a-b}{8}$, Arthur definitely accepts, because

Taking the honest prover in step 1, the probability $\eta$ that such an event occurs is calculated to be

where we have taken $T \ge \frac{128k}{(a-b)^2}$ and $k$ is any polynomial in $N$. Therefore, the probability $p_{\rm acc}$ that Arthur accepts in our protocol is lower-bounded as

Next, let us consider the case when the answer to PDD-Max is no. If $|\tilde{p}_z - p_z| \le \frac{a-b}{8}$ and $|\tilde{q}_z - q_z| \le \frac{a-b}{8}$, which occurs with probability at least $(1 - 2e^{-k})^2$, Arthur definitely rejects, because

Therefore, the probability $p_{\rm acc}$ that Arthur accepts in our protocol is upper-bounded as

The completeness-soundness gap is therefore

for sufficiently large $k$, which shows that PDD-Max has a Merlin-Arthur system with quantum polynomial-time Merlin.
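Arthur's side of the protocol can be sketched in the same style. Arthur receives $z$ from Merlin and estimates $p_z$ and $q_z$ as empirical means of bounded classical estimators, such as the HC1Q estimator $f$ of Sec. IV. He then applies the same assumed acceptance rule, $|\tilde p_z - \tilde q_z| \ge (a+b)/2$. The interface `f(x, z)` and the toy "interval" estimator, whose mean is exactly $p_z$, are hypothetical stand-ins. The usage below shows that in a no instance every $z$ a dishonest Merlin might send is rejected, because soundness only needs $|p_z - q_z| \le b$ for all $z$.

```python
import math
import random

def arthur_accepts(z, f1, f2, draw_x, a, b, k, rng):
    """Arthur's classical check of Merlin's message z (a sketch).

    f1(x, z) and f2(x, z) are hypothetical estimators with |f| <= 1 whose means
    over x = draw_x(rng) are p_z and q_z; the threshold (a + b) / 2 is assumed.
    """
    T = math.ceil(128 * k / (a - b) ** 2)
    p_est = sum(f1(draw_x(rng), z) for _ in range(T)) / T
    q_est = sum(f2(draw_x(rng), z) for _ in range(T)) / T
    return abs(p_est - q_est) >= (a + b) / 2

def interval_estimator(p):
    """Toy f(x, z): 1 if x in [0, 1) falls in z's slice of length p[z], else 0."""
    starts = [sum(p[:i]) for i in range(len(p))]
    return lambda x, z: 1.0 if starts[z] <= x < starts[z] + p[z] else 0.0

def uniform_x(rng):
    return rng.random()

far_p = interval_estimator([0.5, 0.5, 0.0, 0.0])
far_q = interval_estimator([0.0, 0.0, 0.5, 0.5])
flat = interval_estimator([0.25] * 4)
```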
Remarks. To conclude this section, we make some remarks. Note that the fact that $p_z$ and $q_z$ can be estimated in classical polynomial time seems to be a special property of only some circuits (such as those in $\mathrm{FH}_2$), and we do not know how to achieve it for general circuits. (In particular, in Sec. VII B, we explain why the technique we use for $\mathrm{FH}_2$ circuits does not work for other circuits, such as $\mathrm{FH}_3$ circuits.) Furthermore, note that what Arthur can do is to estimate $p_z$ within an additive error given $z$: he cannot sample

We would also like to point out that PDD-Max can have a Merlin-Arthur system with quantum polynomial-time Merlin for other circuits outside of $\mathrm{FH}_2$. The essential point of our proof is that the output probabilities $p_z$ and $q_z$ of $U_1$ and $U_2$ can be classically efficiently estimated. This property itself is not restricted to circuits in $\mathrm{FH}_2$, and therefore we believe that our results should hold for many other circuits outside of $\mathrm{FH}_2$.
B. $\mathrm{FH}_3$

$\mathrm{FH}_2$ circuits have a nice structure that allows $p_z$ and $q_z$ to be estimated in classical polynomial time. We do not know how to do the same for other circuits, such as those in $\mathrm{FH}_3$. The reason is as follows. By a calculation similar to that in Appendix C, we can show that the probability $p_z \equiv |\langle z|U|0^N\rangle|^2$ for an $\mathrm{FH}_3$ circuit $U$ satisfies

In order to make $2^{s(N)}\epsilon = O(1/\mathrm{poly}(N))$, $\epsilon$ must be exponentially small, which means that $T$ must be exponentially large. Therefore, we cannot obtain a $1/\mathrm{poly}(N)$-precision estimator of $p_z$ in classical polynomial time. Note that, as long as we use $f$ as a black box, Chernoff-Hoeffding-type bounds are known to be optimal (up to some factors) [39]. This is why our proof does not work for other circuits.
If we calculate its squared norm, which is the probability $p_z$, we obtain Eq. (3).
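Then the state becomes
$$\frac{1}{\sqrt{2^{N-1}}}\sum_{x\in\{0,1\}^{N-1}}|x\rangle|0\rangle.$$
We apply the classical gate $C$ to obtain
$$\frac{1}{\sqrt{2^{N-1}}}\sum_{x\in\{0,1\}^{N-1}}|C(x0)\rangle.$$
We apply the final Hadamard layer $H^{\otimes(N-1)}\otimes I$ to obtain
$$\frac{1}{2^{N-1}}\sum_{x\in\{0,1\}^{N-1}}\sum_{y\in\{0,1\}^{N-1}}(-1)^{\sum_{j=1}^{N-1}C_j(x0)\,y_j}\,|y\rangle|C_N(x0)\rangle,$$
where $C_j(x0)$ is the $j$th bit of $C(x0)$ and $y_j$ is the $j$th bit of $y$. Therefore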