Quantum algorithms and lower bounds for convex optimization

While recent work suggests that quantum computers can speed up the solution of semidefinite programs, little is known about the quantum complexity of more general convex optimization. We present a quantum algorithm that can optimize a convex function over an n-dimensional convex body using Õ(n) queries to oracles that evaluate the objective function and determine membership in the convex body. This represents a quadratic improvement over the best-known classical algorithm. We also study limitations on the power of quantum computers for general convex optimization, showing that it requires Ω̃(√n) evaluation queries and Ω(√n) membership queries.


Introduction
Convex optimization has been a central topic in the study of mathematical optimization, theoretical computer science, and operations research over the last several decades. On the one hand, it has been used to develop numerous algorithmic techniques for problems in combinatorial optimization, machine learning, signal processing, and other areas. On the other hand, it is a major class of optimization problems that admits efficient classical algorithms [5,12]. Approaches to convex optimization include the ellipsoid method [12], interior-point methods [10,17], cutting-plane methods [18,28], and random walks [16,23].
The fastest known classical algorithm for general convex optimization solves an n-dimensional instance using Õ(n^2) queries to oracles for the convex body and the objective function, and runs in time Õ(n^3) [21]. The novel step of [21] is a construction of a separation oracle by a subgradient calculation with O(n) objective function calls and O(n) extra time. It then relies on a reduction from optimization to separation that makes Õ(n) separation oracle calls and runs in time Õ(n^3) [22]. Although it is unclear whether the query complexity of Õ(n^2) is optimal for all possible classical algorithms, it is the best possible result using the above framework. This is because it takes Ω̃(n) queries to compute the (sub)gradient (see Section A.1) and it also requires Ω(n) queries to produce an optimization oracle from a separation oracle (see [25] and [24, Section 10.2.2]).
It is natural to ask whether quantum computers can solve convex optimization problems faster. Recently, there has been significant progress on quantum algorithms for solving a special class of convex optimization problems called semidefinite programs (SDPs). SDPs generalize the better-known linear programs (LPs) by allowing positive semidefinite matrices as variables. For an SDP with n-dimensional, s-sparse input matrices and m constraints, the best known classical algorithm [22] finds a solution in time Õ(m(m^2 + n^ω + mns) polylog(1/ε)), where ω is the exponent of matrix multiplication and ε is the accuracy of the solution. Brandão and Svore gave the first quantum algorithm for SDPs with worst-case running time Õ(√(mn) s^2 (Rr/ε)^32), where R and r upper bound the norms of the optimal primal and dual solutions, respectively [7]. Compared to the aforementioned classical SDP solver [22], this gives a polynomial speedup in m and n. Van Apeldoorn et al. [3] further improved the running time of a quantum SDP solver to Õ(√(mn) s^2 (Rr/ε)^8), which was subsequently improved to Õ((√m + √n (Rr/ε)) s (Rr/ε)^4) [2,6]. The latter result is tight in its dependence on m and n since there is a quantum lower bound of Ω(√m + √n) for constant R, r, s, and ε [7].
However, semidefinite programming is a structured form of convex optimization that does not capture the problem in general. In particular, SDPs are specified by positive semidefinite matrices, and their solution is related to well-understood tasks in quantum computation such as solving linear systems (e.g., [9,13]) and Gibbs sampling (e.g., [2,6]). General convex optimization need not include such structural information, instead only offering the promise that the objective function and constraints are convex. Currently, little is known about whether quantum computers could provide speedups for general convex optimization. Our goal is to shed light on this question.

Convex optimization
We consider the following general minimization problem:

min_{x∈K} f(x), (1.1)

where K ⊆ R^n is a convex set and f: K → R is a convex function. We assume we are given upper and lower bounds on the function values, namely m ≤ min_{x∈K} f(x) ≤ M, and inner and outer bounds on the convex set K, namely

B_2(0, r) ⊆ K ⊆ B_2(0, R), (1.2)

where B_2(x, l) is the ball of radius l in the L_2 norm centered at x ∈ R^n. We ask for a solution x̂ ∈ K with precision ε, in the sense that

f(x̂) ≤ min_{x∈K} f(x) + ε. (1.3)

We consider the very general setting where the convex body K and the convex function f are specified only by oracles. In particular, we have:
• A membership oracle O_K for K, which determines whether a given x ∈ R^n belongs to K;
• An evaluation oracle O_f for f, which outputs f(x) for a given x ∈ K.
Convex optimization has been well-studied in the model of membership and evaluation oracles since this provides a reasonable level of abstraction of K and f , and it helps illuminate the algorithmic relationship between the optimization problem and the relatively simpler task of determining membership [12,21,22]. The efficiency of convex optimization is then measured by the number of queries to the oracles (i.e., the query complexity) and the total number of other elementary gates (i.e., the gate complexity).
It is well known that a general bounded convex optimization problem is equivalent to one with a linear objective function over a different bounded convex set. In particular, if promised that min_{x∈K} f(x) ≤ M, (1.1) is equivalent to the problem

min_{x′∈R, x∈K} x′ such that f(x) ≤ x′ ≤ M. (1.4)

Observe that a membership query to the new convex set can be implemented with one query to the membership oracle for K and one query to the evaluation oracle for f. Thus the ability to optimize a linear function

min_{x∈K} c^T x (1.6)

for any c ∈ R^n and convex set K ⊆ R^n is essentially equivalent to solving a general convex optimization problem. A procedure to solve such a problem for any specified c is known as an optimization oracle. Thus convex optimization reduces to implementing optimization oracles over general convex sets (Lemma 2.1). The related concept of a separation oracle takes as input a point p ∉ K and outputs a hyperplane separating p from K.
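As a minimal classical illustration of this reduction (with placeholder oracles and names of our own choosing, not from the paper), a membership oracle for the lifted set in (1.4) combines one membership query to K with one evaluation query to f:

```python
# Hedged sketch: membership oracle for K' = {(x', x) : x in K, f(x) <= x' <= M},
# built from one query each to `in_K` and `f`. All names are illustrative.

def lifted_membership(in_K, f, M):
    def in_K_prime(x_prime, x):
        # one membership query to K, one evaluation query to f
        return bool(in_K(x)) and f(x) <= x_prime <= M
    return in_K_prime

# Example instance: K = unit L2 ball in R^2, f(x) = x1 + x2, M = 2.
in_K = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
f = lambda x: x[0] + x[1]
member = lifted_membership(in_K, f, 2.0)
```

Minimizing the linear function x′ over K′ then recovers the minimum of f over K, which is the content of Lemma 2.1.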
In the quantum setting, we model oracles by unitary operators instead of classical procedures. In particular, in the quantum model of membership and evaluation oracles, we are promised to have unitaries O_K and O_f such that
• For any x ∈ R^n, O_K|x, 0⟩ = |x, δ[x ∈ K]⟩, where δ[P] is 1 if P is true and 0 if P is false;
• For any x ∈ R^n, O_f|x, 0⟩ = |x, f(x)⟩.
In other words, we allow coherent superpositions of queries to both oracles. If the classical oracles can be implemented by explicit circuits, then the corresponding quantum oracles can be implemented by quantum circuits of about the same size, so the quantum query model provides a useful framework for understanding the quantum complexity of convex optimization.

Contributions
We now describe the main contributions of this paper. Our first main result is a quantum algorithm for optimizing a convex function over a convex body. Specifically, we show the following: Theorem 1.1. There is a quantum algorithm for minimizing a convex function f over a convex set K ⊆ R^n using Õ(n) queries to an evaluation oracle for f and Õ(n) queries to a membership oracle for K. The gate complexity of this algorithm is Õ(n^3).
Recall that the state-of-the-art classical algorithm [21] for general convex optimization with evaluation and membership oracles uses Õ(n^2) queries to each. Thus our algorithm provides a quadratic improvement over the best known classical result. While the query complexity of [21] is not known to be tight, it is the best possible result that can be achieved using subgradient computation to implement a separation oracle, as discussed above.
The proof of Theorem 1.1 follows the aforementioned classical strategy of constructing a separating hyperplane for any given point outside the convex body [21]. We find this hyperplane using a fast quantum algorithm for gradient estimation using Õ(1) evaluation queries, as first proposed by Jordan [15] and later refined by [11] with more rigorous analysis. However, finding a suitable hyperplane in general requires calculating approximate subgradients of convex functions that may not be differentiable, whereas the algorithms in [15] and [11] both require bounded second derivatives or more stringent conditions. To address this issue, we introduce classical randomness into the algorithm to produce a suitable approximate subgradient with Õ(1) evaluation queries, and show how to use such an approximate subgradient in the separation framework to produce a faster quantum algorithm.
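The core of Jordan's gradient-estimation idea can be illustrated by a toy one-dimensional classical simulation (all parameters here, such as the grid size N and the integer slope, are illustrative choices of ours): a single superposed query writes a linear function into the phases of a uniform superposition, and the inverse QFT reads off its slope.

```python
import numpy as np

# Toy 1-D simulation of the QFT-based gradient readout [15] on a linear
# function. Names and parameters (N, slope) are illustrative, not the paper's.

N = 64            # number of grid points (one QFT register)
slope = 5         # gradient of the linear function, in grid units

g = np.arange(N)
# State after one superposed query to a phase oracle for f(g) = slope*g/N:
psi = np.exp(2j * np.pi * slope * g / N) / np.sqrt(N)
# Inverse QFT (numpy's forward DFT uses e^{-2*pi*i*kg/N}, matching QFT-dagger):
amps = np.fft.fft(psi) / np.sqrt(N)
probs = np.abs(amps) ** 2
measured = int(np.argmax(probs))   # the distribution concentrates on `slope`
```

For an exactly linear function the outcome is deterministic; for a function that is merely β-smooth near the evaluation point, the same circuit succeeds with bounded error, which is what the analysis of Algorithm 1 quantifies.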
Our new quantum algorithm for subgradient computation is the source of the quantum speedup of the entire algorithm and establishes a separation in query complexity for the subgradient computation between quantum (Õ(1)) and classical (Ω(n), see Section A.1) algorithms. This subroutine could also be of independent interest, in particular in the study of quantum algorithms based on gradient descent and its variants (e.g., [19,27]).
On the other hand, we also aim to establish corresponding quantum lower bounds to understand the potential for quantum speedups for convex optimization. To this end, we prove:

Theorem 1.2. There exists a convex body K ⊆ R^n, a convex function f on K, and a precision ε > 0, such that a quantum algorithm needs at least Ω(√n) queries to a membership oracle for K and Ω(√n / log n) queries to an evaluation oracle for f to output a point x̂ satisfying (1.3) with high success probability (say, at least 0.8).
We establish the query lower bound on the membership oracle by reductions from search with wildcards [1]. The lower bound on evaluation queries uses a similar reduction, but this only works for an evaluation oracle with low precision. To prove a lower bound on precise evaluation queries, we propose a discretization technique that relates the difficulty of the continuous problem to a corresponding discrete one. This approach might be of independent interest since optimization problems naturally have continuous inputs and outputs, whereas most previous work on quantum lower bounds focuses on discrete inputs. Using this technique, we can simulate one perfectly precise query by one low-precision query at discretized points, thereby establishing the evaluation lower bound as claimed in Theorem 1.2. As a side point, this evaluation lower bound holds even for an unconstrained convex optimization problem on R^n, which might be of independent interest since this setting has also been well-studied classically [5, 24–26].
We summarize our main results in Table 1.

Upper bound
To prove our upper bound result in Theorem 1.1, we use the well-known reduction from general convex optimization to the case of a linear objective function, which simplifies the problem to implementing an optimization oracle using queries to a membership oracle (Lemma 2.1). For the reduction from optimization to membership, we follow the best known classical result in [21], which implements an optimization oracle using Õ(n^2) membership queries and Õ(n^3) arithmetic operations. In [21], the authors first show a reduction from separation oracles to membership oracles that uses Õ(n) queries and then use a result from [22] to implement an optimization oracle using Õ(n) queries to a separation oracle, giving an overall query complexity of Õ(n^2). The reduction from separation to membership involves the calculation of a height function defined by the authors (see Eq. (2.41)), whose evaluation oracle can be implemented in terms of the membership oracle of the original set. A separating hyperplane is determined by computing a subgradient, which already takes Õ(n) queries. In fact, it is not hard to see that any classical algorithm requires Ω̃(n) classical queries (see Section A.1), so this part of the algorithm cannot be improved classically. The possibility of using the quantum Fourier transform to compute the gradient of a function using Õ(1) evaluation queries [11,15] suggests the possibility of replacing the subgradient procedure with a faster quantum algorithm. However, the techniques described in [11,15] require the function in question to have bounded second (or even higher) derivatives, and the height function is only guaranteed to be Lipschitz continuous (Definition 2.9) and in general is not even differentiable.
To compute subgradients of general (non-differentiable) convex functions, we introduce classical randomness (taking inspiration from [21]) and construct a quantum subgradient algorithm that uses Õ(1) queries. Our proof of correctness (Section 2.2) has three main steps: 1. We analyze the average error incurred when computing the gradient using the quantum Fourier transform. Specifically, we show that this approach succeeds if the function has bounded second derivatives in the vicinity of the point where the gradient is to be calculated (see Algorithm 1, Algorithm 2, and Lemma 2.3). Some of our calculations are inspired by [11].
2. We use the technique of mollifier functions (a common tool in functional analysis [14], suggested to us by [20] in the context of [21]) to show that it is sufficient to treat infinitely differentiable functions (the mollified functions) with bounded first derivatives (but possibly large second derivatives). In particular, it is sufficient to output an approximate gradient of the mollified function at a point near the original point where the subgradient is to be calculated (see Lemma 2.4).
3. We prove that convex functions with bounded first derivatives have second derivatives that lie below a certain threshold with high probability for a random point in the vicinity of the original point (Lemma 2.5). Furthermore, we show that a bound on the second derivatives can be chosen so that the smooth gradient calculation techniques work on a sufficiently large fraction of the neighborhood of the original point, ensuring that the final subgradient error is small (see Algorithm 3 and Theorem 2.2).
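The randomization in step 3 can be sketched with a purely classical analogue (not the paper's quantum routine; the parameters r1 and h and the helper name are ours): near a random point in B_∞(x, r1), a Lipschitz convex function has bounded second derivatives with high probability, so a finite-difference gradient computed there serves as an approximate subgradient at x.

```python
import random

# Classical sketch of randomized subgradient estimation. With high probability
# the random point y avoids the (measure-zero) non-smooth set, so central
# differences at y approximate a subgradient at x. r1, h are illustrative.

def approx_subgradient(f, x, r1=1e-3, h=1e-7):
    y = [xi + random.uniform(-r1, r1) for xi in x]   # random nearby point
    grad = []
    for i in range(len(x)):
        yp, ym = list(y), list(y)
        yp[i] += h
        ym[i] -= h
        grad.append((f(yp) - f(ym)) / (2 * h))       # central difference at y
    return grad

random.seed(0)
f = lambda v: abs(v[0]) + abs(v[1])   # convex, non-differentiable at the origin
g = approx_subgradient(f, [1.0, -1.0])
```

The quantum algorithm replaces the n finite differences with a single QFT-based gradient estimate, which is the source of the Õ(1)-query cost.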
The new quantum subgradient algorithm is then used to construct a separation oracle as in [21] (and a similar calculation is carried out in Theorem 2.3). Finally, the reduction from [22] is used to construct the optimization oracle using Õ(n) separation queries. From Lemma 2.1, this shows that the general convex optimization problem can be solved using Õ(n) membership and evaluation queries and Õ(n^3) gates.

Lower bound
We prove our quantum lower bounds on membership and evaluation queries separately before showing how to combine them into a single optimization problem. Both lower bounds work over n-dimensional hypercubes.
In particular, we prove both lower bounds by reductions from search with wildcards [1]. In this problem, we are given an n-bit binary string s and the task is to determine all bits of s using wildcard queries that check the correctness of any subset of the bits of s: more formally, the input in the wildcard model is a pair (T, y) where T ⊆ [n] and y ∈ {0,1}^{|T|}, and the query returns 1 if s|_T = y (here the notation s|_T represents the subset of the bits of s restricted to T). Reference [1] shows that the quantum query complexity of search with wildcards is Ω(√n). For our lower bound on membership queries, we consider a simple objective function, the sum of all coordinates Σ_{i=1}^n x_i. In other words, we take c = 1_n (the all-ones vector) in (1.6). However, the position of the hypercube is unknown, and to solve the optimization problem (formally stated in Definition 3.1), one must use the membership oracle to locate it.
Specifically, the hypercube takes the form ×_{i=1}^n [s_i − 2, s_i + 1] (where × is the Cartesian product) for some offset binary string s ∈ {0,1}^n. In Section 3.1, we prove: • Any query x ∈ R^n to the membership oracle of this problem can be simulated by one query to the search-with-wildcards oracle for s. To achieve this, we divide the n coordinates of x into four sets: T_{x,0} for those in [−2, −1), T_{x,1} for those in (1, 2], T_{x,mid} for those in [−1, 1], and T_{x,out} for the rest. Notice that T_{x,mid} corresponds to the coordinates that are always in the hypercube and T_{x,out} corresponds to the coordinates that are always out of the hypercube; T_{x,0} (resp., T_{x,1}) includes the coordinates that are in the hypercube if and only if s_i = 0 (resp., s_i = 1). We prove in Section 3.1 that a wildcard query with T = T_{x,0} ∪ T_{x,1} can simulate a membership query to x.
• The solution of the sum-of-coordinates optimization problem explicitly gives s, i.e., it solves search with wildcards. This is because this solution must be close to the point (s_1 − 2, ..., s_n − 2), and applying integer rounding would recover s.
These two points establish the reduction of search with wildcards to the optimization problem, and hence establish the Ω(√n) quantum lower bound on membership queries in Theorem 1.2 (see Theorem 3.2).
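The coordinate partition in the first point can be made concrete with a short classical sketch (function and variable names are ours; the wildcard oracle is modeled as a Python callable):

```python
# Simulating one membership query to the hidden hypercube X_i [s_i-2, s_i+1]
# by (at most) one search-with-wildcards query for s. Names are illustrative.

def membership_via_wildcard(x, wildcard):
    T0, T1 = [], []                       # coordinates whose status depends on s_i
    for i, xi in enumerate(x):
        if -1.0 <= xi <= 1.0:
            continue                      # T_mid: inside for either value of s_i
        elif -2.0 <= xi < -1.0:
            T0.append(i)                  # inside iff s_i = 0
        elif 1.0 < xi <= 2.0:
            T1.append(i)                  # inside iff s_i = 1
        else:
            return False                  # T_out: outside regardless of s
    T = T0 + T1
    y = [0] * len(T0) + [1] * len(T1)
    return wildcard(T, y)                 # x is in the hypercube iff s|_T = y

s = [1, 0, 1]                             # hidden string (for the demo only)
wildcard = lambda T, y: all(s[t] == yi for t, yi in zip(T, y))
direct = lambda x: all(s[i] - 2 <= x[i] <= s[i] + 1 for i in range(len(s)))
```

Since the classifier never inspects s directly, each membership query costs one wildcard query, which is the heart of the reduction.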
For our lower bound on evaluation queries, we assume that membership is trivial by fixing the hypercube at C = [0,1]^n. We then consider optimizing the max-norm function f(x) = max_{i∈[n]} |x_i − c_i| for some unknown c ∈ {0,1}^n. Notice that learning c is equivalent to solving the optimization problem; in particular, outputting an x̂ ∈ C satisfying (1.3) with ε = 1/3 would determine the string c. This follows because for all i ∈ [n], we have |x̂_i − c_i| ≤ max_{i∈[n]} |x̂_i − c_i| ≤ 1/3, and c_i must be the integer rounding of x̂_i, i.e., c_i = 0 if x̂_i ∈ [0, 1/2) and c_i = 1 if x̂_i ∈ [1/2, 1]. On the other hand, if we know c, then we know the optimum x = c.
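The integer-rounding recovery of c from an approximate minimizer is trivial but worth making explicit (a small illustrative helper of ours):

```python
# Recover c from an eps = 1/3 approximate minimizer x_hat of max_i |x_i - c_i|
# by rounding each coordinate to the nearer bit. Helper name is ours.

def recover(x_hat):
    return [0 if xi < 0.5 else 1 for xi in x_hat]

c = [1, 0, 1, 1]
x_hat = [0.7, 0.3, 0.9, 0.68]   # within 1/3 of c in every coordinate
```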
We prove an Ω(√n / log n) lower bound on evaluation queries for learning c. Our proof, which appears in Section 3.2, is composed of three steps: 1) We first prove a weaker lower bound with respect to the precision of the evaluation oracle.
Specifically, if f(x) is specified with b bits of precision, then using binary search, a query to f(x) can be simulated by b queries to an oracle that, on input (x, t) for some t ∈ R, returns 1 if f(x) ≤ t and returns 0 otherwise. We further assume without loss of generality that x ∈ [0,1]^n. If x ∉ [0,1]^n, we assign a penalty given by the L_1 distance between x and its projection π(x) onto [0,1]^n; by doing so, f(π(x)) and x fully characterize f(x) (see (3.18)). Therefore, f(x) ∈ [0,1], and specifying f(x) with b bits of precision is equivalent to having precision 2^{−b}.
Similar to the interval-dividing strategy in the proof of the membership lower bound, we prove that one query to such an oracle can be simulated by one query to the search-with-wildcards oracle for s. Furthermore, the solution of the max-norm optimization problem explicitly gives s, i.e., it solves the search-with-wildcards problem. This establishes the reduction to search with wildcards, and hence establishes an Ω(√n/b) lower bound on the number of quantum queries to the evaluation oracle f with precision 2^{−b} (see Lemma 3.1).
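The binary-search simulation in step 1) can be sketched directly (the helper name and test values are ours):

```python
# Recovering a b-bit value f(x) in [0, 1] with b queries to a threshold
# oracle "is f(x) <= t?". One comparison query per recovered bit.

def eval_via_threshold(threshold, b):
    lo, hi = 0.0, 1.0
    for _ in range(b):
        mid = (lo + hi) / 2.0
        if threshold(mid):    # f(x) <= mid ?
            hi = mid
        else:
            lo = mid
    return hi                 # within 2^{-b} of the true value

value = 0.375                 # exactly representable with b = 3 bits
approx = eval_via_threshold(lambda t: value <= t, 3)
```

This is why a precision limit of 2^{−b} on the evaluation oracle translates into a factor-b loss in the query lower bound.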
2) Next, we introduce a technique we call discretization, which effectively simulates queries over an (uncountably) infinite set by queries over a discrete set. This technique might be of independent interest since proving lower bounds on functions with an infinite domain can be challenging.
We observe that the problem of optimizing (1.8) has the following property: if we are given two strings x, x′ ∈ [0,1]^n such that x_1, ..., x_n, 1 − x_1, ..., 1 − x_n and x′_1, ..., x′_n, 1 − x′_1, ..., 1 − x′_n have the same ordering (for instance, x = (0.1, 0.2, 0.7) and x′ = (0.1, 0.3, 0.6) both have the ordering x_1 ≤ x_2 ≤ 1 − x_3 ≤ x_3 ≤ 1 − x_2 ≤ 1 − x_1), then f(x) and f(x′) attain their maximum at the same position in this ordering. In other words, f(x) can be computed given x and f(x′). Thus it suffices to consider all possible ways of ordering 2n numbers, rendering the problem discrete. Without loss of generality, we focus on x′ satisfying {x′_1, ..., x′_n, 1 − x′_1, ..., 1 − x′_n} = {1/(2n+1), ..., 2n/(2n+1)}, and we denote the set of all such x′ by D_n (see also (3.34)). In Lemma 3.4, we prove that one classical (resp., quantum) evaluation query from [0,1]^n can be simulated by one classical evaluation query (resp., two quantum evaluation queries) from D_n using Algorithm 5. To illustrate this, we give a concrete example with n = 3 in Section 3.2.2.
3) Finally, we use discretization to show that one perfectly precise query to f can be simulated by one query to f with precision 1/(5n); in other words, b in step 1) is at most log_2(5n) = O(log n) (see Lemma 3.3). This is because, by discretization, the input domain can be limited to the discrete set D_n. Notice that for any x ∈ D_n, f(x) is an integer multiple of 1/(2n+1); even if f(x) can only be computed with precision 1/(5n), we can round it to the closest integer multiple of 1/(2n+1), which is exactly f(x), since (2n+1)/(5n) < 1/2. As a result, we can precisely compute f(x) for all x ∈ D_n, and thus by discretization we can precisely compute f(x) for all x ∈ [0,1]^n.
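The rounding step in 3) is a one-liner, shown here as an illustrative helper of ours (note the inequality (2n+1)/(5n) < 1/2 requires n ≥ 3):

```python
# On the discrete set D_n, f(x) is an exact integer multiple of 1/(2n+1), so a
# query answered with additive error < 1/(5n) can be rounded to the exact value
# whenever (2n+1)/(5n) < 1/2, i.e. n >= 3. Helper name is ours.

def exact_from_noisy(noisy_value, n):
    step = 1.0 / (2 * n + 1)
    return round(noisy_value / step) * step

n = 5
true_value = 3.0 / (2 * n + 1)            # a multiple of 1/11
noisy = true_value + 0.9 / (5 * n)        # perturbation within precision 1/(5n)
```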
In all, the three steps above establish an Ω(√n / log n) quantum lower bound on evaluation queries to solve the problem in Eq. (1.8) (see Theorem 3.2). In particular, this lower bound is proved for an unconstrained convex optimization problem on R^n, which might be of independent interest.
As a side result, we prove that our quantum lower bound is optimal for the problem in (1.8) (up to poly-logarithmic factors in n), as we can prove a matching Õ(√n) upper bound (Theorem C.1). Therefore, a better quantum lower bound on the number of evaluation queries for convex optimization would require studying an essentially different problem.
Having established lower bounds on both membership and evaluation queries, we combine them to give Theorem 1.2. This is achieved by considering an optimization problem of dimension 2n; the first n coordinates compose the sum-of-coordinates function in Section 3.1, and the last n coordinates compose the max-norm function in Section 3.2. We then concatenate both parts and prove Theorem 1.2 via reductions to the membership and evaluation lower bounds, respectively (see Section 3.3).
In addition, all lower bounds described above can be adapted to a convex body that is contained in the unit hypercube and that contains the discrete set D n to facilitate discretization; we present a "smoothed" hypercube (see Section 3.4) as a specific example.

Open questions
This work leaves several natural open questions for future investigation. In particular: • Can we close the gap for both membership and evaluation queries? Our upper bounds on both oracles in Theorem 1.1 use Õ(n) queries, whereas the lower bounds of Theorem 1.2 are only Ω̃(√n).
• Can we improve the time complexity of our quantum algorithm? The time complexity Õ(n^3) of our current quantum algorithm matches that of the classical state-of-the-art algorithm [21] since our second step, the reduction from optimization to separation, is entirely classical. Is it possible to improve this reduction quantumly?
• What is the quantum complexity of convex optimization with a first-order oracle (i.e., with direct access to the gradient of the objective function)? This model has been widely considered in the classical literature (see for example Ref. [26]).
Organization. Our quantum upper bounds are given in Section 2 and lower bounds are given in Section 3. Appendices present auxiliary lemmas (Section A) and proof details for upper bounds (Section B) and lower bounds (Section C), respectively.
Related independent work. In independent simultaneous work, van Apeldoorn, Gilyén, Gribling, and de Wolf [4] establish a similar upper bound, showing that Õ(n) quantum queries to a membership oracle suffice to optimize a linear function over a convex body (i.e., to implement an optimization oracle). Their proof follows a similar strategy to ours, using a quantum algorithm for evaluating gradients in Õ(1) queries to implement a separation oracle. Those authors also establish quantum lower bounds on the query complexity of convex optimization, showing in particular that Ω(√n) quantum queries to a separation oracle are needed to implement an optimization oracle, implying an Ω(√n) quantum lower bound on the number of membership queries required to optimize a convex function. While Ref. [4] does not explicitly focus on evaluation queries, those authors have pointed out to us that an Ω(√n) lower bound on evaluation queries can be obtained from their lower bound on membership queries, using a careful application of techniques inspired by [21] (although our approach gives a bound with a better Lipschitz parameter).

Upper bound
In this section, we prove: 1. An optimization oracle for a convex set K ⊆ R^n can be implemented using Õ(n) quantum queries to a membership oracle for K, with gate complexity Õ(n^3).
The following lemma shows the equivalence of optimization oracles to a general convex optimization problem.
Lemma 2.1. Suppose a reduction from an optimization oracle to a membership oracle for convex sets requires O(g(n)) queries to the membership oracle. Then the problem of optimizing a convex function over a convex set can be solved using O(g(n)) queries to both the membership oracle and the evaluation oracle.
Proof. The problem min_{x∈K} f(x) reduces to the problem min_{(x′,x)∈K′} x′, where K′ is defined as in (1.4). K′ is the intersection of convex sets and is therefore itself convex. A membership oracle for K′ can be implemented using one query each to the membership oracle for K and the evaluation oracle for f. Since O(g(n)) queries to the membership oracle for K′ are sufficient to optimize any linear function, the result follows.

Theorem 1.1 directly follows from Theorem 2.1 and Lemma 2.1.
Overview. This part of the paper is organized following the plan outlined in Section 1.3.1. Precise definitions of oracles and other relevant terminology appear in Section 2.1. Section 2.2 develops a fast quantum subgradient procedure that can be used in the classical reduction from optimization to membership. This is done in two parts: 1. Section 2.2.1 presents an algorithm based on the quantum Fourier transform that calculates the gradient of a function with bounded second derivatives (i.e., a β-smooth function) with bounded expected one-norm error.
2. Section 2.2.2 uses mollification to restrict the analysis to infinitely differentiable functions without loss of generality, and then uses classical randomness to eliminate the need for bounded second derivatives.
In Section 2.3 we show that the new quantum subgradient algorithm fits into the classical reduction from [21]. Finally, we describe the reduction from optimization to membership in Section 2.4.

Oracle definitions
In this section, we provide precise definitions for the oracles for convex sets and functions that we use in our algorithm and its analysis. We also provide precise definitions of Lipschitz continuity and β-smoothness, which we will require in the rest of the section.
Definition 2.2 (Interior of a convex set). For any δ > 0, the δ-interior of a convex set K is defined as B_2(K, −δ) := {x : B_2(x, δ) ⊆ K}.

Definition 2.4 (Evaluation oracle). When queried with x ∈ R^n and δ > 0, output α such that |α − f(x)| ≤ δ. We use EVAL_δ(f) to denote the time complexity. The classical procedure or quantum unitary representing the oracle is denoted by O_f.

Definition 2.5 (Membership oracle).
When queried with x ∈ R^n and δ > 0, output an assertion that x ∈ B_2(K, δ) or an assertion that x ∉ B_2(K, −δ). The time complexity is denoted by MEM_δ(K). The classical procedure or quantum unitary representing the membership oracle is denoted by O_K.

Definition 2.6 (Separation oracle). When queried with x ∈ R^n and δ > 0, with probability 1 − δ, either
• assert x ∈ B_2(K, δ), or
• output a unit vector ĉ such that ĉ^T x ≤ ĉ^T y + δ for all y ∈ B_2(K, −δ).
The time complexity is denoted by SEP_δ(K).
Definition 2.7 (Optimization oracle). When queried with a unit vector c, find y ∈ R^n such that c^T x ≤ c^T y + δ for all x ∈ B_2(K, −δ), or assert that B_2(K, δ) is empty. The time complexity of the oracle is denoted by OPT_δ(K).
Definition 2.8 (Subgradient). A vector g ∈ R^n is a subgradient of a convex function f at x if f(y) ≥ f(x) + ⟨g, y − x⟩ for all y ∈ R^n. For a differentiable convex function, the gradient is the only subgradient. The set of subgradients of f at x is called the subdifferential at x and denoted by ∂f(x).

Definition 2.9 (L-Lipschitz continuity). A function f is said to be L-Lipschitz continuous (or simply L-Lipschitz) in a set S if for all x ∈ S, ‖g‖_∞ ≤ L for any g ∈ ∂f(x). An immediate consequence of this is that for any x, y ∈ S, |f(x) − f(y)| ≤ L‖x − y‖_1.

Definition 2.10 (β-smoothness). A function f is said to be β-smooth in a set S if for all x ∈ S, the magnitudes of the second derivatives of f in all directions are bounded by β. This also means that the largest magnitude of an eigenvalue of the Hessian ∇²f(x) is at most β. Consequently, for any x, y ∈ S, we have f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (β/2)‖y − x‖_2^2.

Evaluation to subgradient
In this section we present a procedure that, given an evaluation oracle for an L-Lipschitz continuous function f: R^n → R with evaluation error at most ε > 0, a point x ∈ R^n, and an "approximation scale" factor r_1 > 0, computes an approximate subgradient g̃ of f at x. Specifically, g̃ satisfies an approximate subgradient inequality for all q ∈ R^n with error ζ, where E ζ ≤ ξ(r_1, ε) and ξ must monotonically increase with ε as ε^α for some α > 0.

Smooth functions
We first describe how to approximate the gradient of a smooth function. Algorithm 1 and Algorithm 2 use techniques from [15] and [11] to evaluate the gradient of a function with bounded second derivatives in the neighborhood of the evaluation point. At a high level, Algorithm 1 (GradientEstimate) prepares a uniform superposition over a grid of side length l (chosen in terms of ε, n, and β as in (2.6)), queries the evaluation oracle in superposition to write f into the phases, applies the inverse QFT over the grid G to each of the registers, and measures to obtain k_1, k_2, ..., k_n, reporting g̃ = (2L/N)(k_1, k_2, ..., k_n) as the result. To analyze its behavior, we begin with the following lemma showing that Algorithm 1 provides a good estimate of the gradient with bounded failure probability.
Lemma 2.2. Let f: R^n → R be an L-Lipschitz function that is specified by an evaluation oracle with error at most ε, and let f be β-smooth in a sufficiently large neighborhood B_∞(x, ·) of x. Let g̃ be the output of GradientEstimate(f, ε, L, β, x) (from Algorithm 1). Then g̃ is close to ∇f(x) in each coordinate except with bounded failure probability; the precise statement and proof of Lemma 2.2 are deferred to Lemma B.2 in the appendix. Next we analyze Algorithm 2, which uses several calls to Algorithm 1 to provide an estimate of the gradient that is close in expected L_1 distance to the true value.
Data: function f, evaluation error ε, Lipschitz constant L, smoothness parameter β, and point x. For each coordinate i ∈ [n], Algorithm 2 calls GradientEstimate several times; if a majority of the resulting estimates of coordinate i lie in an interval of size 3000√(nβε), it sets g̃_i to be the median of the points in that interval, and otherwise it sets g̃_i = 0.

Lemma 2.3. Let f be a convex, L-Lipschitz continuous function that is specified by an evaluation oracle with error at most ε. Then Algorithm 2 outputs an estimate g̃ satisfying the expected-error bound (2.9) using Õ(1) evaluation queries.
Proof. For each dimension i ∈ [n] and each iteration t ∈ [T], let X_i^t be the indicator random variable for the event that the t-th estimate of coordinate i falls outside the interval described above. From the conditions on the function f, Lemma 2.2 applies to GradientEstimate(f, ε, L, β, x), and thus Pr[X_i^t = 1] < 1/3. Thus, by the Chernoff bound, the probability that a majority of the T estimates of any coordinate fall outside the interval is exponentially small in T, and (2.9) follows. The algorithm makes T = polylog(nβ/ε) calls to a procedure that makes one query to the evaluation oracle. Thus the query complexity is Õ(1). To evaluate the gate complexity, observe that we iterate over n dimensions, using poly(b) = polylog(nβ/ε) gates for the quantum Fourier transform over each. This process is repeated T = polylog(nβ/ε) times. Thus the entire algorithm uses Õ(n) gates.
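The repetition-and-median boosting in the proof above can be sketched with a deterministic classical toy (estimator and helper names are ours): an estimator that is correct on more than a 2/3 fraction of calls is boosted by taking the median of T repetitions.

```python
import statistics

# Median-of-repetitions boosting: if each trial is accurate with probability
# > 2/3, the median of T trials fails with probability exponentially small
# in T (Chernoff bound). Names are illustrative; the "flaky" trial below is
# deterministic so the demo is reproducible.

def boosted_estimate(trial, T):
    return statistics.median(trial() for _ in range(T))

calls = [0]
def flaky_trial():
    calls[0] += 1
    return 100.0 if calls[0] % 3 == 0 else 1.0   # wrong on 1/3 of calls
```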

Extension to non-smooth functions
Now consider a general L-Lipschitz continuous convex function f . We show that any such function is close to a smooth function, and we consider the relationship between the subgradients of the original function and the gradient of its smooth approximation.
For any δ > 0, let m_δ: R^n → R be the mollifier function of width δ, a smooth bump function supported on the ball of radius δ and normalized so that ∫ m_δ = 1. The mollification F_δ of f is obtained by convolving f with the mollifier function, i.e., F_δ := f * m_δ. The mollification of f has several key properties (Proposition 2.1): in particular, F_δ is infinitely differentiable, convex, and L-Lipschitz, and satisfies |F_δ(x) − f(x)| ≤ Lδ. These properties of the mollifier function are well known in functional analysis [14]. For completeness a proof is provided in Lemma A.2.
Proof. For all q ∈ R^n, convexity of F_δ implies F_δ(q) ≥ F_δ(x) + ⟨∇F_δ(x), q − x⟩, so (2.13) follows from Proposition 2.1(iv).
Now consider δ such that Lδ ≤ ε. Then the evaluation oracle with error ε for f is also an evaluation oracle for F_δ with error at most ε + Lδ ≤ 2ε. Thus the given evaluation oracle is also an evaluation oracle for an infinitely differentiable convex function with the same Lipschitz constant, with almost equal error, allowing us to analyze infinitely differentiable functions without loss of generality (as long as we make no claim about the second derivatives). This idea is made precise in Theorem 2.2.
Unfortunately, Lemma 2.3 cannot be directly used to calculate subgradients for F_δ as δ → 0. Note that for the given evaluation oracle for f to also be a ∼ε-evaluation oracle for F_δ, we must have δ ≤ ε. Furthermore, there exist convex functions (such as f(x) = |x|) where if |f(x) − g(x)| ≤ δ and g(x) is β-smooth, then βδ ≥ c for some constant c (see Lemma A.3 in the appendix). Thus using the SmoothQuantumGradient algorithm at x = 0 will give us a one-norm error of 3000 n^{3/2} √(εβ) ≥ 3000 n^{3/2} √c, which is bounded below independently of ε. To avoid this problem, we take inspiration from [21] and introduce classical randomness into the gradient evaluation. In particular, the following lemma shows that for a Lipschitz continuous function, if we sample at random from the neighborhood of any given point, the probability of having large second derivatives is small. Let y ∈_R Y indicate that y is sampled uniformly at random from the set Y. Also, let λ(x) be the largest eigenvalue of the Hessian matrix ∇²f(x) at x. Since the Hessian is positive semidefinite, we have λ(x) ≤ ∆f(x) := Tr(∇²f(x)). Thus the second derivatives of a function are bounded by ∆f(x).
where (2.20) comes from the divergence theorem and η(y) is the area element on the surface ∂B∞(x, r₁). Using Markov's inequality with Lemma 2.5, we have Pr[∆f(y) ≥ pnL/r₁] ≤ 1/p for p > 1. We use this fact to argue that at most points y ∈ B∞(x, r₁), we can use the SmoothQuantumGradient procedure (with a second-derivative bound β₀ = pnL/r₁) and obtain good estimates of the gradient (with error that monotonically decreases with ε). From Lemma 2.3, we see that for SmoothQuantumGradient to be successful at a point y, the second-derivative bound β₀ = pnL/r₁ must hold not only at y, but at every point z ∈ B∞(y, ℓ), where ℓ := 2√(ε/β₀). Thus we wish to upper bound the probability that a point y lies in the ℓ-neighborhood of the set of points with second derivatives greater than β₀. Specifically, we have the following. Thus we have shown that if we choose a point at random in the r₁-neighborhood of the given point x, every point in its ℓ-neighborhood has small second derivatives with high probability. Note the assumption that L ≥ 1 is without loss of generality since otherwise we could simply run the algorithm with L = 1. Now we are ready to show that Algorithm 3 produces a good approximate subgradient.

Theorem 2.2. Let f be a convex, L-Lipschitz function that is specified by an evaluation oracle with error ε < min{1, 8192 r₁³}. Let g̃ = QuantumSubgradient(f, ε, L, x, r₁) (from Algorithm 3). Then for all q ∈ R^n, f(q) ≥ f(x) + ⟨g̃, q − x⟩ − ζ, where E ζ ≤ 15000 n² L ε^{1/3} + nL/(4 r₁).

Membership to separation
In this section we show how the approximate subgradient procedure (Algorithm 3) fits into the reduction from separation to membership presented in [21]. We use the height function h_p : R^n → R defined in [21] as

h_p(x) = max{t ∈ R | x + tp ∈ K}. (2.41)

The height function has the following properties:

Algorithm 4: SeparatingHalfspace(K, p, ρ, δ)
Data: Convex set K such that B₂(0, r) ⊂ K ⊂ B₂(0, R), κ = R/r, δ-precision membership oracle for K, point p.
1 if the membership oracle asserts that p ∈ B₂(K, δ) then
2 Output: p ∈ B₂(K, δ).
If p ∈ B 2 (K, δ) the algorithm is trivially correct. If p / ∈ B 2 (0, R), the algorithm outputs a halfspace that contains B 2 (0, R) (and therefore contains K), and not p.
Since the membership oracle returns a negative response, we have p ∉ B₂(K, −δ); since the error in h_p(x) is at most ε ≥ δ, it follows that p ∉ B₂(K, −ε). We are also given that B₂(0, r) ⊆ K.

Theorem 2.4. Let K ⊂ R^n be a convex set with B₂(0, r) ⊆ K ⊆ B₂(0, R) and κ = R/r for some R > r > 0, and let η > 0 be fixed. Further suppose that R, r, κ ∈ poly(n). Then a separation oracle for K with error η can be implemented using Õ(1) queries to a membership oracle for K and Õ(n) gates.

Separation to optimization
It is known that an optimization oracle for a convex set can be implemented in Õ(n) queries to a separation oracle. Specifically, Theorem 15 of [21] states:

Theorem 2.5 (Separation to Optimization). Let K be a convex set satisfying B₂(0, r) ⊂ K ⊂ B₂(0, R) and let κ = 1/r. For any 0 < ε < 1, with probability 1 − ε, we can compute x ∈ B₂(K, ε) such that cᵀx ≤ min_{x∈K} cᵀx + ε‖c‖₂, using O(n log(nκ/ε)) queries to SEP_η(K), where η = poly(ε/nκ), and Õ(n³) arithmetic operations.

From Theorem 2.5 and Theorem 2.4, we have the following result.

Theorem 2.6 (Membership to Optimization). Let K be a convex set satisfying B₂(0, r) ⊂ K ⊂ B₂(0, R) and let κ = 1/r. For any 0 < ε < 1, with probability 1 − ε, we can compute x ∈ B₂(K, ε) such that cᵀx ≤ min_{x∈K} cᵀx + ε, using Õ(n) queries to a membership oracle for K with error δ, where δ = poly(ε), and Õ(n³) gates.
Proof. Using Theorem 2.4 with η = poly(ε/nκ), each query to the separation oracle requires Õ(1) queries to a membership oracle with error δ = poly(ε). We make Õ(n) separation queries and perform a further Õ(n³) arithmetic operations, so the result follows. Theorem 2.1 follows directly from Theorem 2.6.

Lower bound
In this section, we prove our quantum lower bound on convex optimization (Theorem 1.2). We prove separate lower bounds on membership queries (Section 3.1) and evaluation queries (Section 3.2). We then combine these lower bounds into a single optimization problem in Section 3.3, establishing Theorem 1.2.

Membership queries
In this subsection, we establish a membership query lower bound using a reduction from the search-with-wildcards problem: given a hidden string s ∈ {0,1}^n, a wildcard query specifies a subset T ⊆ [n] together with a string y ∈ {0,1}^{|T|}, and the oracle Q_s(T, y) indicates whether s|_T = y. Theorem 3.1 states that this problem has quantum query complexity Ω(√n). We use Theorem 3.1 to give an Ω(√n) lower bound on membership queries for convex optimization. Specifically, we consider the following sum-of-coordinates optimization problem:

where C_s is the hypercube defined in (3.2) by a Cartesian product (×) of intervals over the coordinates, with position determined by the hidden string s ∈ {0,1}^n. In the sum-of-coordinates optimization problem, the goal is to minimize f(x) := Σ_{i∈[n]} x_i over C_s (3.3). Intuitively, Definition 3.1 concerns an optimization problem on a hypercube where the function is simply the sum of the coordinates, but the position of the hypercube is unknown. Note that the function f in (3.3) is convex and 1-Lipschitz continuous.
We prove the hardness of solving sum-of-coordinates optimization using its membership oracle:

Theorem 3.2. Given an instance of the sum-of-coordinates optimization problem with membership oracle O_{C_s}, it takes Ω(√n) quantum queries to O_{C_s} to output an x̂ ∈ C_s satisfying (3.4) with success probability at least 0.9.
Proof. Assume that we are given an arbitrary string s ∈ {0,1}^n together with the membership oracle O_{C_s} for the sum-of-coordinates optimization problem. We prove that a quantum query to O_{C_s} can be simulated by a quantum query to the oracle O_s in (3.1) for search with wildcards. Consider an arbitrary point x ∈ R^n in the sum-of-coordinates problem. We partition [n] into four sets T_{x,mid}, T_{x,0}, T_{x,1}, T_{x,out}, and denote T_x := T_{x,0} ∪ T_{x,1}, with y^(x) ∈ {0,1}^{|T_x|} defined coordinate-wise as in (3.9). If i₀ ∈ T_{x,0}, then s_{i₀} = 1 while y^(x)_{i₀} = 0 by (3.9). If i₀ ∈ T_{x,1}, we similarly have s_{i₀} = 0 and y^(x)_{i₀} = 1. In both cases, s|_{T_x} ≠ y^(x), so Q_s(T_x, y^(x)) = 0 = O_{C_s}(x). Therefore, we have established that O_{C_s}(x) = Q_s(T_x, y^(x)) if T_{x,out} = ∅, and O_{C_s}(x) = 0 otherwise. In other words, a quantum query to O_{C_s} can be simulated by a quantum query to O_s. We next prove that a solution x̂ of the sum-of-coordinates problem satisfying (3.4) solves the search-with-wildcards problem in Theorem 3.1. Because min_{x∈C_s} f(x) is attained at the corner of C_s with the smallest coordinates, (3.4) pins down f(x̂) in terms of s. On the one hand, for all j ∈ [n] we have x̂_j ≥ s_j − 2 since x̂ ∈ C_s; on the other hand, by (3.10) we have a matching upper bound on each coordinate, so each s_j can be recovered from x̂_j. In all, if we can solve the sum-of-coordinates optimization problem with an x̂ satisfying (3.4), we can solve the search-with-wildcards problem. By Theorem 3.1, the search-with-wildcards problem has quantum query complexity Ω(√n); since a query to the membership oracle O_{C_s} can be simulated by a query to the wildcard oracle O_s, we have established an Ω(√n) quantum lower bound on membership queries to solve the sum-of-coordinates optimization problem.
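The first half of this proof (one membership query simulated by one wildcard query) can be sketched classically. Since the precise body from (3.2) is elided above, the sketch assumes the illustrative hypercube C_s = ×_{i∈[n]} [s_i, s_i + 1]; the partition into T_mid, T_0, T_1, T_out mirrors the one in the proof.

```python
def wildcard_query(s, T, y):
    """Q_s(T, y) = 1 iff the restriction of s to the index set T equals y."""
    return int(all(s[i] == y[i] for i in T))

def membership_via_wildcard(s, x):
    """Simulate the membership oracle O_{C_s} for the (assumed) hypercube
    C_s = prod_i [s_i, s_i + 1] using a single wildcard query."""
    T, y = [], {}
    for i in range(len(s)):
        ok0 = 0.0 <= x[i] <= 1.0      # coordinate consistent with s_i = 0
        ok1 = 1.0 <= x[i] <= 2.0      # coordinate consistent with s_i = 1
        if ok0 and ok1:               # i in T_mid: reveals nothing about s_i
            continue
        if not ok0 and not ok1:       # i in T_out: x cannot lie in C_s
            return 0
        T.append(i)                   # i in T_0 or T_1
        y[i] = 0 if ok0 else 1
    return wildcard_query(s, T, y)
```

For s = (1, 0, 1), the point (1.5, 0.5, 1.2) lies in C_s and the simulation answers 1 with one wildcard query; moving the first coordinate to 0.5 or 3.0 flips the answer to 0.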

Evaluation queries
In this subsection, we establish an evaluation query lower bound by considering the following max-norm optimization problem. In the max-norm optimization problem, the goal is to minimize a function f_c : R^n → R for some c ∈ {0,1}^n, where π : R → [0,1] is defined as π(t) := max{0, min{1, t}} (3.15). Observe that for all x ∈ [0,1]^n, we have f_c(x) = max_{i∈[n]} |x_i − c_i|. Intuitively, Definition 3.2 concerns an optimization problem under the max-norm (i.e., L∞-norm) distance from c for all x in the unit hypercube [0,1]^n; for all x not in the unit hypercube, the objective function pays a penalty of the L¹ distance between x and its projection π(x) onto the unit hypercube. The function f_c is 2-Lipschitz continuous with a unique minimum at x = c; we prove in Lemma C.1 that f_c is convex. We prove the hardness of solving max-norm optimization using its evaluation oracle:

Theorem 3.3. Given an instance of the max-norm optimization problem with an evaluation oracle O_{f_c}, it takes Ω(√n / log n) quantum queries to O_{f_c} to output an x̂ ∈ [0,1]^n satisfying (3.16) with success probability at least 0.9.
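The objective f_c can be written down directly. A minimal sketch, assuming the out-of-hypercube penalty is exactly the L¹ distance to the projection π(x) (as described above):

```python
def proj(t):
    """pi : R -> [0, 1], projection onto the unit interval."""
    return min(1.0, max(0.0, t))

def f_c(c, x):
    """Max-norm distance to c inside [0,1]^n, plus an L1 penalty for the part
    of x outside the hypercube (assumed form of (3.14))."""
    p = [proj(xi) for xi in x]
    penalty = sum(abs(xi - pi) for xi, pi in zip(x, p))
    return max(abs(pi - ci) for pi, ci in zip(p, c)) + penalty
```

The unique minimum is at x = c with value 0; for example, with c = (0, 1), f_c(0.3, 0.8) = max(0.3, 0.2) = 0.3.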
The proof of Theorem 3.3 has two steps. First, we prove a weaker lower bound with respect to the precision of the evaluation oracle: Lemma 3.1. Suppose we are given an instance of the max-norm optimization problem with an evaluation oracle O_{f_c} that has precision 0 < δ < 0.05, i.e., f_c is provided with log₂(1/δ) bits of precision. Then it takes Ω(√n / log(1/δ)) quantum queries to O_{f_c} to output an x̂ ∈ [0,1]^n satisfying (3.17) with success probability at least 0.9.
The second step simulates a perfectly precise query to f_c by a low-precision query: Lemma 3.2. One classical (resp., quantum) query to O_{f_c} with perfect precision can be simulated by one classical query (resp., two quantum queries) to O_{f_c} with precision 1/5n. Theorem 3.3 follows directly from the two lemmas above: by Lemma 3.2, we can assume that the evaluation oracle O_{f_c} has precision 1/5n, so Lemma 3.1 implies that it takes Ω(√n / log 5n) = Ω(√n / log n) quantum queries to O_{f_c} to output an x̂ ∈ [0,1]^n satisfying (3.16) with success probability 0.9.
The proofs of Lemma 3.1 and Lemma 3.2 are given in Section 3.2.1 and Section 3.2.2, respectively.

3.2.1 Ω(√n) quantum lower bound on a low-precision evaluation oracle
Similar to the proof of Theorem 3.2, we also use Theorem 3.1 (the quantum lower bound on search with wildcards) to give a quantum lower bound on the number of evaluation queries required to solve the max-norm optimization problem.
Proof of Lemma 3.1. Assume that we are given an arbitrary string c ∈ {0,1}^n together with the evaluation oracle O_{f_c} for the max-norm optimization problem. To show the lower bound, we reduce the search-with-wildcards problem to the max-norm optimization problem. We first establish that an evaluation query to O_{f_c} can be simulated using wildcard queries on c. Notice that if we query an arbitrary x ∈ R^n, by (3.14) we have f_c(x) = f_c(π(x)) + ‖x − π(x)‖₁, where π(x) := (π(x₁), . . . , π(x_n)). In particular, the difference of f_c(x) and f_c(π(x)) is an explicit function of x that is independent of c. Thus the query O_{f_c}(x) can be simulated using one query to O_{f_c}(π(x)) where π(x) ∈ [0,1]^n. It follows that we can restrict ourselves without loss of generality to implementing evaluation queries for x ∈ [0,1]^n. Now we consider a decision version O_{f_c,dec} of the oracle, which decides whether f_c(x) ≤ t for a threshold t. We partition [n] into four sets T_{x,mid,t}, T_{x,0,t}, T_{x,1,t}, T_{x,out,t}. The strategy here is similar to the proof of Theorem 3.2: T_{x,mid,t} corresponds to the coordinates such that |x_i − c_i| ≤ t regardless of whether c_i = 0 or 1 (and hence c_i does not influence whether or not max_{i∈[n]} |x_i − c_i| ≤ t); T_{x,out,t} corresponds to the coordinates such that |x_i − c_i| > t regardless of whether c_i = 0 or 1 (so max_{i∈[n]} |x_i − c_i| > t provided T_{x,out,t} is nonempty); and T_{x,0,t} (resp., T_{x,1,t}) corresponds to the coordinates such that |x_i − c_i| ≤ t only when c_i = 0 (resp., c_i = 1).
Denote T_{x,t} := T_{x,0,t} ∪ T_{x,1,t} and let y^(x,t) ∈ {0,1}^{|T_{x,t}|} be defined coordinate-wise as in (3.28). Suppose O_{f_c,dec}(x) = 1. In other words, for all i ∈ [n] we have |x_i − c_i| ≤ t, which implies max_{i∈[n]} |x_i − c_i| ≤ t, and thus T_{x,out,t} = ∅ by (3.23) and (3.27). Now consider any i ∈ T_{x,t}. If i ∈ T_{x,0,t}, then x_i ∈ I_{0,t} by (3.24). By (3.20) we have x_i ∈ J_{0,t} and x_i ∉ J_{1,t}, and thus c_i = 0 by (3.29). Similarly, if i ∈ T_{x,1,t}, then we must have c_i = 1. As a result of (3.28), for all i ∈ T_{x,t} we have c_i = y^(x,t)_i; in other words, c|_{T_{x,t}} = y^(x,t) and Q_c(T_{x,t}, y^(x,t)) = 1 = O_{f_c,dec}(x).
On the other hand, if O_{f_c,dec}(x) = 0, there exists an i₀ ∈ [n] such that |x_{i₀} − c_{i₀}| > t. Therefore, we must have i₀ ∉ T_{x,mid,t} since (3.22) implies I_{mid,t} = J_{0,t} ∩ J_{1,t} ⊆ J_{c_{i₀},t}. Next, if i₀ ∈ T_{x,out,t}, then T_{x,out,t} ≠ ∅ and we correctly obtain O_{f_c,dec}(x) = 0. The remaining cases are i₀ ∈ T_{x,0,t} and i₀ ∈ T_{x,1,t}.
If i₀ ∈ T_{x,0,t}, then y^(x,t)_{i₀} = 0 by (3.28). By (3.24) we have x_{i₀} ∈ I_{0,t}, and by (3.20) we have x_{i₀} ∈ J_{0,t} and x_{i₀} ∉ J_{1,t}; therefore, we must have c_{i₀} = 1 by (3.30). As a result, c|_{T_{x,t}} differs from y^(x,t) at i₀. If i₀ ∈ T_{x,1,t}, we similarly have c_{i₀} = 0 and y^(x,t)_{i₀} = 1, so again c|_{T_{x,t}} differs from y^(x,t) at i₀. In either case, Q_c(T_{x,t}, y^(x,t)) = 0 = O_{f_c,dec}(x). We have shown that if we can solve the max-norm optimization problem with an x̂ satisfying (3.17), we can solve the search-with-wildcards problem. By Theorem 3.1, the search-with-wildcards problem has quantum query complexity Ω(√n); since a query to the evaluation oracle O_{f_c} can be simulated by O(log(1/δ)) queries to the wildcard oracle O_c, we have established an Ω(√n / log(1/δ)) quantum lower bound on the number of evaluation queries needed to solve the max-norm optimization problem.

Discretization: simulating perfectly precise queries by low-precision queries
In this subsection we prove Lemma 3.2, which we rephrase more formally as follows. Throughout this subsection, the function f_c in (3.14) is abbreviated as f.

Lemma 3.3. Let Õ_f be an evaluation oracle for f with |Õ_f(x) − f(x)| ≤ 1/5n for all x ∈ [0,1]^n. Then one classical (resp., quantum) query to O_f can be simulated by one classical query (resp., two quantum queries) to Õ_f.
To achieve this, we present an approach that we call discretization. Instead of considering queries on all of [0,1]^n, we only consider a discrete subset D_n ⊆ [0,1]^n defined as D_n := {χ(a, π) | a ∈ {0,1}^n and π ∈ S_n}, (3.34) where S_n is the symmetric group on [n] and χ : {0,1}^n × S_n → [0,1]^n is a fixed injection. Observe that D_n is a subset of [0,1]^n. Since |S_n| = n! and there are 2^n choices for a ∈ {0,1}^n, we have |D_n| = 2^n · n!; for example, |D_2| = 2² · 2! = 8. We denote the restriction of the oracle O_f to D_n by O_f|_{D_n}, i.e., O_f|_{D_n}(x) := O_f(x) for all x ∈ D_n. In fact, this restricted oracle entirely captures the behavior of the unrestricted function.

Lemma 3.4 (Discretization).
A classical (resp., quantum) query to O f can be simulated using one classical query (resp., two quantum queries) to O f |Dn .
Algorithm 5: Simulate one query to O f using one query to O f |Dn .
After running Line 1, Line 2, and Line 3, we have a point x* from the discrete set D_n such that Ord(x) = Ord(x*). Since the two points have the same coordinate ordering, and the values |x*_i − c_i| take only a small number of discrete values, the function value f(x*) essentially reflects the value of f(x); this is made precise in Line 4.
While Algorithm 5 is a classical algorithm for querying O f using a query to O f |Dn , it is straightforward to perform this computation in superposition using standard techniques to obtain a quantum query to O f . However, note that this requires two queries to a quantum oracle for O f |Dn since we must uncompute f (x * ) after computing f (x).
Having the discretization technique at hand, Lemma 3.3 is straightforward.
We run Algorithm 5 to compute f(x) for the queried value of x, except that in Line 3 we take k* = ⌊(2n + 1)(1 − f̃(x*))⌉ (here ⌊a⌉ denotes the integer closest to a).
As a result, k* = (2n + 1)(1 − f(x*)), because the latter is an integer (see Lemma C.3) and the rounding error is less than 1/2. Therefore, due to the correctness of Algorithm 5 established in Section C.2, and noticing that the evaluation oracle is only called in Line 3 (with the replacement described above), we successfully simulate one query to O_f by one query to Õ_f (in fact, to Õ_f|_{D_n}).
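The rounding step can be checked numerically: if the true value lies on the grid {k/(2n+1)} (Lemma C.3) and the oracle error is at most 1/(5n), then the scaled error (2n+1)/(5n) is below 1/2 for n ≥ 3, so rounding recovers the exact value. A small sketch (the grid point and noise level are illustrative):

```python
def recover_exact(f_tilde_value, n):
    """Recover f(x*) exactly from a 1/(5n)-precision estimate, assuming the
    true value lies on the grid {k/(2n+1) : k an integer} (Lemma C.3)."""
    k_star = round((2 * n + 1) * (1.0 - f_tilde_value))
    return 1.0 - k_star / (2 * n + 1)

n = 10
true_value = 7 / (2 * n + 1)            # an arbitrary grid point
noisy_up = true_value + 0.9 / (5 * n)   # near-worst-case oracle error
noisy_down = true_value - 0.9 / (5 * n)
```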
Combining the two lower bounds

We combine the two hard instances into a single optimization problem over R^{2n}: minimize f(x) := f_M(x) + f_{E,c}(x), where f_M(x) := Σ_{i∈[n]} x_i (3.50); since the first term is an explicit function of x, one query to O_f can be simulated by one query to O_{f_{E,c}}. Therefore, approximately minimizing f with success probability 0.9 requires Ω(√n / log n) quantum queries to O_f. In addition, f_M is independent of the coordinates x_{n+1}, . . . , x_{2n} and only depends on the coordinates x_1, . . . , x_n, whereas f_{E,c} is independent of the coordinates x_1, . . . , x_n and only depends on the coordinates x_{n+1}, . . . , x_{2n}. As a result, the oracle O_{C_s×[0,1]^n} reveals no information about c, and O_f reveals no information about s. Since solving the optimization problem reveals both s and c, the lower bounds on query complexity must hold simultaneously.

Smoothed hypercube
As a side point, our quantum lower bound in Theorem 3.4 also holds for a smooth convex body.
Given an n-dimensional hypercube C_{x,l} with center x and edge length l, we consider a smoothed version of it with smooth boundary. The smoothed hypercube satisfies a two-sided containment in which l_n denotes l times the n-dimensional all-ones vector; in other words, it is contained in the original (non-smoothed) hypercube, and it contains the hypercube with the same center but a slightly smaller edge length.

A Auxiliary lemmas

A.1 Classical gradient computation
Here we prove that the classical query complexity of gradient computation is linear in the dimension.
Lemma A.1. Let f be an L-Lipschitz convex function that is specified by an evaluation oracle with precision δ = 1/poly(n). Any (deterministic or randomized) classical algorithm to calculate a subgradient of f with L∞-norm error ε = 1/poly(n) must make Ω̃(n) queries to the evaluation oracle.
Proof. Consider the linear function f(x) = cᵀx where each c_i ∈ [0, 1]. Since each c_i must be determined to precision ε, the problem hides n log(1/ε) bits of information. Furthermore, since the evaluation oracle has precision δ, each query reveals only log(1/δ) bits of information. Thus any classical algorithm must make at least n log(1/ε) / log(1/δ) = Ω̃(n) evaluation queries.
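The counting argument in numbers (an illustrative sketch; the particular choices of ε and δ are arbitrary subject to ε, δ = 1/poly(n)):

```python
import math

def classical_query_lower_bound(n, eps, delta):
    """Information-theoretic bound: the instance hides n*log2(1/eps) bits,
    and each delta-precision query reveals at most log2(1/delta) bits."""
    hidden_bits = n * math.log2(1.0 / eps)
    bits_per_query = math.log2(1.0 / delta)
    return hidden_bits / bits_per_query

# With eps = delta = 1/poly(n), the ratio log(1/eps)/log(1/delta) is a
# constant, so the bound is linear in n up to logarithmic factors.
n = 1000
lb = classical_query_lower_bound(n, eps=1.0 / n**2, delta=1.0 / n**3)
```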

A.2 Mollified functions
The following lemma establishes properties of mollified functions:

Lemma A.2 (Mollifier properties). Let f : R^n → R be an L-Lipschitz convex function with mollification F_δ := f ∗ m_δ. Then (i) F_δ is infinitely differentiable; (ii) F_δ is convex; (iii) F_δ is L-Lipschitz; (iv) |F_δ(x) − f(x)| ≤ Lδ for all x ∈ R^n.

Proof.
(i) Convolution satisfies d(p * q) dx = p * dq dx , so because m δ is infinitely differentiable, F δ is infinitely differentiable.
(ii) For any x, y ∈ R^n and λ ∈ [0, 1], F_δ(λx + (1 − λ)y) = ∫ m_δ(z) f(λ(x − z) + (1 − λ)(y − z)) dz ≤ λ F_δ(x) + (1 − λ) F_δ(y), where the inequality holds by convexity of f and the fact that m_δ ≥ 0. Thus F_δ is convex.
(iii) We have |F_δ(x) − F_δ(y)| = |∫ m_δ(z) (f(x − z) − f(y − z)) dz| ≤ L‖x − y‖. Thus from Definition 2.9, F_δ is L-Lipschitz.
(iv) We have |F_δ(x) − f(x)| = |∫ m_δ(z) (f(x − z) − f(x)) dz| ≤ L ∫ m_δ(z) ‖z‖ dz ≤ Lδ, as claimed.
The following lemma shows that approximating a non-smooth convex function by a β-smooth function forces a trade-off between β and the approximation error, ruling out the possibility of directly applying Lemma B.2 to calculate subgradients. Lemma A.3. There exists a 1-Lipschitz convex function f such that for any β-smooth function g with |f(x) − g(x)| ≤ δ for all x, we have βδ ≥ c, where c is a constant.

B Proof details for upper bound
We give the complete proof of Lemma 2.2 in this section. Given a quantum oracle that computes the function F in the form U_F |x⟩|y⟩ = |x⟩|y ⊕ F(x)⟩, it is well known that querying U_F with the output register prepared in the Fourier basis allows us to implement the phase oracle O_F in one query. This is a common technique used in quantum algorithms known as phase kickback. First, we prove the following lemma: The above shows that we can implement QFT⁻¹_G on a single b-bit register using O(b) gates. Thus there is no significant overhead in gate complexity that results from using QFT_G instead of the usual QFT. Now we prove Lemma 2.2, which is restated below:

Lemma B.2. Let f : R^n → R be an L-Lipschitz function that is specified by an evaluation oracle with error at most ε. Let f be β-smooth in B∞(x₀, 2√(ε/β)), and let g̃ be the output of GradientEstimate(f, ε, L, β, x₀) (from Algorithm 1). Let g = ∇f(x₀). Then, for each i ∈ [n], Pr[|g̃_i − g_i| > 1500√(nεβ)] < 1/3.

Proof. To analyze the GradientEstimate algorithm, let |ψ⟩ denote the ideal state that would result from an exact phase oracle, and let |φ⟩ be the actual state obtained before applying the inverse QFT over G (B.9). From Lemma B.1 we can efficiently apply the inverse QFT over G; from the analysis of phase estimation (see [8]), measuring the ideal state yields an outcome k_i close to N g_i/(2L) with high probability (B.10). The difference in the probabilities of any measurement on |ψ⟩ and |φ⟩ is bounded by the trace distance between the two density matrices. In Algorithm 1, N is chosen such that this trace distance is at most 1/6. Therefore, Pr[|k_i − N g_i/(2L)| > 4] < 1/3. Thus we have |g̃_i − g_i| ≤ 1500√(nεβ) with probability at least 2/3 for each i, and the result follows.
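The core of Algorithm 1 is Jordan's observation that a phase linear in x is converted by the (inverse) QFT into a sharply peaked measurement outcome. A toy numpy simulation of the noiseless one-dimensional case (assumptions: an exact integer phase slope k0 and no oracle error; numpy's fft kernel e^{−2πikx/N} plays the role of the inverse QFT here, as sign conventions differ between references):

```python
import numpy as np

N = 64    # register size (N = 2^b grid points)
k0 = 5    # hidden integer phase slope, playing the role of N*g_i/(2L)

# State after the phase oracle: amplitudes exp(2*pi*i*k0*x/N)/sqrt(N).
x = np.arange(N)
psi = np.exp(2j * np.pi * k0 * x / N) / np.sqrt(N)

# Fourier transform (unitary normalization): probabilities peak at k0.
phi = np.fft.fft(psi) / np.sqrt(N)
probs = np.abs(phi) ** 2
measured = int(np.argmax(probs))
```

With an exact linear phase the outcome distribution is a delta function at k0; the trace-distance argument above controls how much a noisy phase can spread this peak.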
Finally, we prove that the height function h_p can be evaluated with precision ε using O(log(1/ε)) queries to a membership oracle: Lemma B.3. The function h_p(x) can be evaluated for any x ∈ B(0, r/2) with any precision ε ≥ 7κδ using O(log(1/ε)) queries to a membership oracle with error δ. Proof. We denote the intersection of the ray x + tp (t ≥ 0) and the boundary of K by Q, and let H be an (n − 1)-dimensional hyperplane that is tangent to K at Q. Because K is convex, it lies on only one side of H; we let q denote the unit vector at Q that is perpendicular to H and points out of K. Let θ := arccos⟨p, q⟩.
Using binary search with O(log(1/δ)) queries, we can find a point P on the ray x + tp such that P ∉ B(K, −δ) and P ∈ B(K, δ). The total error for t is then at most 2δ/cos θ. Now consider y = x − ∆q for some small ∆ > 0. Then h_p(y) − h_p(x) = ∆/cos θ + o(∆/cos θ) (see Figure 2 for an illustration with n = 2). By Proposition 2.2, h_p(x) is 3κ-Lipschitz for any x ∈ B(0, r/2); therefore, h_p(y) − h_p(x) ≤ 3κ‖y − x‖ = 3κ∆, and hence cos θ ≥ 1/(3κ) for a small enough ∆ > 0. Thus the error in h_p(x) is at most 2δ/cos θ ≤ 6κδ ≤ 7κδ, and the result follows.
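The binary search in this proof can be sketched classically. A minimal illustration, assuming K is the Euclidean unit ball (an exact membership oracle is in particular a valid δ-precision oracle for any δ > 0):

```python
import math

def membership(point):
    """Membership oracle for the Euclidean unit ball."""
    return math.sqrt(sum(t * t for t in point)) <= 1.0

def height(x, p, eps, t_max=3.0):
    """Estimate h_p(x) = max{t >= 0 : x + t*p in K} by binary search,
    using O(log(t_max/eps)) membership queries."""
    lo, hi = 0.0, t_max  # invariant: x + lo*p in K, x + hi*p outside K
    for _ in range(math.ceil(math.log2(t_max / eps))):
        mid = (lo + hi) / 2
        if membership([xi + mid * pi for xi, pi in zip(x, p)]):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = height([0.2, 0.1], [1.0, 0.0], eps=1e-6)
```

For x = (0.2, 0.1) and direction p = e₁, the exact height is √0.99 − 0.2, and the estimate agrees with it to within roughly eps.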

C Proof details for lower bound
In this section, we give proof details for our claims in Section 3.2.

C.1 Convexity of max-norm optimization
In this subsection, we prove:

Lemma C.1. The function f_c in (3.14) is convex on R^n, where c ∈ {0,1}^n and π : R → [0,1] is the projection defined in (3.15).

Proof. For convenience, we define g_i : R^n → R for i ∈ [n] as in (C.3), where the second equality follows from (C.2). From (C.3), it is clear that each g_i is convex. Since the pointwise maximum of convex functions is convex, so is any maximum of the g_i. Moreover, for all i ∈ [n] we define h_{c,i} : R^n → R and verify its convexity in the case c_i = 0; the proof is similar when c_i = 1.
Now we have f_c(x) = max_{i∈[n]} (h_{c,i}(x) + Σ_{j≠i} g_j(x)). Because h_{c,i} and g_j are both convex functions on R^n for all i, j ∈ [n], the function h_{c,i}(x) + Σ_{j≠i} g_j(x) is convex on R^n. Thus f_c is the pointwise maximum of n convex functions and is therefore itself convex.
C.2 Proof of Lemma 3.4

C.2.3 Correctness of Line 4
In this subsection, we prove:

Lemma C.4. The output f(x) in Line 4 is correct.
In each case, the resulting expression is exactly (C.30). Overall, we see that (C.30) is always true when i < k * .
• Suppose i > k * . By (C.29), we have Both cases imply (C.30), so we see this also holds for i > k * .

C.3 Optimality of Theorem 3.3
In this section, we prove that the lower bound in Theorem 3.3 is optimal (up to poly-logarithmic factors in n) for the max-norm optimization problem: Theorem C.1. Let f_c : [0,1]^n → [0,1] be an objective function for the max-norm optimization problem (Definition 3.2). Then there exists a quantum algorithm that outputs an x̂ ∈ [0,1]^n satisfying (3.16) with ε = 1/3 using O(√n log n) quantum queries to O_{f_c}, with success probability at least 0.9.
In other words, the quantum query complexity of the max-norm optimization problem is Θ̃(√n). We prove Theorem C.1 using search with wildcards (Theorem 3.1) as well.