“Proper” Shift Rules for Derivatives of Perturbed-Parametric Quantum Evolutions

Banchi & Crooks (Quantum, 2021; building on an equation by Feynman, 1951) have given methods to estimate derivatives of expectation values depending on a parameter that enters via a “perturbed” quantum evolution $x \mapsto e^{i(xA+B)/\hbar}$. Their methods require modifying not just the parameters of the parameterized quantum circuit (PQC), but also the unitaries that appear in it. Moreover, in the case when the $B$-term is unavoidable, no exact method for the derivative seems to be known: Banchi & Crooks give a method with an approximation error. In this paper, we present a “proper” shift rule for PQCs of this type, i.e., one where only shifting parameters is required, not changing the gates. Moreover, our method is exact (i.e., it gives an “analytic derivative”), and has the same worst-case variance as Banchi-Crooks’. We propose some variants with approximation error. We discuss the theory surrounding proper shift rules, based on Fourier analysis of perturbed-parametric quantum evolutions, resulting in a characterization of the proper shift rules in terms of their Fourier transforms, which in turn leads us to non-existence results of proper shift rules with exponential concentration of the shifts.


Introduction
One paradigm in near-term quantum computing is that of variational quantum algorithms (VQAs): quantum algorithms which contain real-number parameters and which must be trained, i.e., the parameters must be optimized, similarly to classical differentiable programming, e.g., artificial neural networks.
In fault-tolerant quantum computing, evidence has been given (see, e.g., [1] and the references therein) that the concept of quantum programs depending on parameters that are fitted to data may turn out to be an important component in applications of quantum computing, beyond the realm of machine learning and AI.
Pre-fault-tolerance, under the name of Parameterized Quantum Circuits or Variational Quantum Circuits, VQAs are at the heart of proposals for quantum-computer-based simulations of molecules and condensed-matter materials, quantum machine learning and quantum AI, some approaches to quantum-based combinatorial optimization (for example, some uses of the Quantum Approximate Optimization Algorithm), and even applications such as linear-system solving and factoring (e.g., [2,3,4,5,6,7]). But the concept of parameterized quantum evolution for pre-fault-tolerant quantum simulation and computation extends beyond gate-based quantum circuits (e.g., [8,9,10]). In this paper, our interest is in quantum evolutions where the parameters do not enter via simple gates.
As optimization/training algorithms based on estimates of derivatives (e.g., variants of Stochastic Gradient Descent) outperform derivative-free methods in practice in the training of variational quantum algorithms (cf. [11] and the references therein), there is the need to obtain, efficiently, estimates of derivatives with respect to the parameters in a VQA. Starting with seminal works by Li et al. [12] and Mitarai et al. [3], unbiased estimators for derivatives have been obtained efficiently using so-called shift rules: If $f$ denotes the expectation-value function dependent on, w.l.o.g., a single parameter, a shift rule is a relation
$$ f'(x) = \sum_j u_j\, f(x - s_j), \tag{1a} $$
where the $u_j$ (coefficients) and $s_j$ (shifts) are fixed real numbers, and the equation holds for all $x$. This is a convolution of the expectation-value function with a finite-support measure, and by replacing "finite-support" with "finite", this extends to shift rules with a continuum of shifts:
$$ f' = f * \phi, \tag{1b} $$
where $\phi$ is a finite measure on $\mathbb{R}$, and "$*$" denotes convolution. The notion of a shift rule has been extended: e.g., [13] speak of a Stochastic Shift Rule for a method that involves other modifications of the quantum evolution than merely changing the parameters. To avoid ambiguity, we use the term Feasible Proper Shift Rule (feasible PSR) for the $\phi$ in relation (1b).
We add the qualification "feasible" as we also study a couple of notions of approximate PSRs where (1b) holds up to an approximation error, leading to biased estimators.
The present paper studies PSRs for derivatives of the type of quantum evolution that is the subject of [13]: We expect that the parameter $x$ enters in the following form:
$$ x \mapsto e^{i(xA+B)/\hbar}. \tag{2} $$
We refer to (2) as a "perturbed-parametric" unitary, and we speak of a perturbed-parametric expectation-value function if a measurement depends on $x$ via a perturbed-parametric unitary.
Banchi & Crooks [13] propose a stochastic shift rule for estimating derivatives of perturbed-parametric expectation-value functions, building on an equation by Feynman [14,15]. As said above, their rule modifies the underlying quantum evolution itself: To obtain an estimate of the derivative, the unitary (2) must be replaced by a concatenation
$$ e^{i t_3 (x_3 A + B)/\hbar} \cdot e^{i\beta (x_2/\beta \cdot A + B)/\hbar} \cdot e^{i t_1 (x_1 A + B)/\hbar}, \tag{3} $$
for certain values of $t_*$, $x_*$, and a small $\beta > 0$; if the $B$-term in the evolution can be switched off, $\beta \to 0$ effects the unitary $e^{i x_2 A/\hbar}$. For $\beta > 0$, Banchi & Crooks refer to their method as Approximate Stochastic Parameter Shift Rule.
Modifying the underlying quantum evolution has some small technical disadvantages, as it requires a re-calculation of the schedule of quantum-control pulses.
Moreover, Banchi & Crooks's shift rule has the disadvantage that it cannot give an unbiased estimate of the derivative in situations where the $B$-term in the perturbed-parametric unitary (2) is unavoidable: An $O(\beta)$ approximation error results.
In many situations where perturbed-parametric unitaries arise, a certain small bias in the estimate of the derivative may be allowed. Indeed, in pre-fault-tolerant settings, expressions such as (2) are approximations up to control inaccuracies.
But in the case where the $B$-term is unavoidable, Banchi-Crooks's Approximate Stochastic Parameter Shift Rule requires, to achieve a good approximation, a small $\beta > 0$, so the parameter value, of order $1/\beta$, is large. In practice, large parameter settings may not be desirable (e.g., they could result in more cross talk) or even technically possible. Moreover, in some situations where perturbed-parametric quantum evolutions arise, the factor $\beta$ in the middle factor in (3) is a duration of some pulse, and effecting unitaries such as $e^{i\beta (x_2/\beta \cdot A + B)/\hbar}$ involves a narrow frequency band for that pulse (e.g., the resonant frequency of a qubit). In these situations, the Fourier Uncertainty Principle may make it difficult in principle to choose an arbitrarily small $\beta$.

Contributions of this paper
We present a PSR feasible for expectation-value functions where the parameter enters in the form of (2); we call it the Nyquist Shift Rule. Our method is exact, so that estimators without bias can be constructed; one such estimator, that we present below, has the same variance as Banchi & Crooks's Stochastic Parameter Shift Rule.
As the tag "Nyquist" suggests, our PSR is based on Fourier analysis. The connection between PSRs and Fourier spectra was probably first observed in [16], and exploited in [17] and elsewhere; the present paper shows ways to exploit Fourier-analytic properties of perturbed-parametric expectation-value functions. Indeed, our results are based on the observation that the perturbed-parametric expectation-value function is, in finite dimension, band limited: Its Fourier spectrum (i.e., the support of the Fourier transform) is contained in an interval $[-K, +K]$, where $K$ is determined by the difference between the largest and smallest eigenvalues of $A$. For functions like that, the Nyquist-Shannon Sampling Theorem immediately comes to mind, here stated in the form of the Shannon-Whittaker Interpolation Formula: E.g., for functions $f$ with Fourier spectrum contained in $[-1/2, +1/2]$,
$$ f = \sum_{n \in \mathbb{Z}} f(n)\, \operatorname{sinc}(\square - n), \tag{4a} $$
where "$\square$" is a placeholder for the variable, $\operatorname{sinc} := \sin(\pi\square)/(\pi\square)$, and the RHS defines what we refer to as the sum-convolution of $f$ with $\operatorname{sinc}$. From here, obtaining a derivative seems straightforward:
$$ f' = \sum_{n \in \mathbb{Z}} f(n)\, \operatorname{sinc}'(\square - n). \tag{4b} $$
The resulting PSR $\operatorname{sinc}'(\square - n)$ ($n \in \mathbb{Z}$) decays only as $1/|n|$ for $|n| \to \infty$. This can be fixed by picking a sweet-spot point $x$ for which $\operatorname{sinc}'(x - n)$ decays as $1/|n|^2$, namely $x = 1/2$. Restarting with
$$ f'(1/2) = \sum_{n \in \mathbb{Z}} f(n)\, \operatorname{sinc}'(1/2 - n), \tag{4c} $$
and then performing a reflection of $f$ (the details are in §2.4), results in, for every $x \in \mathbb{R}$,
$$ f'(x) = \sum_{n \in \mathbb{Z}} \frac{(-1)^{n+1}}{\pi (n + 1/2)^2}\, f\bigl(x - (n + 1/2)\bigr), \tag{4d} $$
as $\operatorname{sinc}'(1/2 - n) = (-1)^{n-1}/(\pi (n - 1/2)^2)$. Generalizing to other values of $K$ for the interval $[-K, +K]$ containing the Fourier spectrum, one obtains the following family of PSRs, which we call Nyquist Shift Rules:
$$ \phi_K := 2K \sum_{n \in \mathbb{Z}} \frac{(-1)^{n+1}}{\pi (n + 1/2)^2}\, \delta_{(n + 1/2)/(2K)}, \quad \text{where } K > 0; \tag{5} $$
with $\delta_x$ denoting the Dirac point measure at the point $x \in \mathbb{R}$.
While this hand-waving argument captures the starting point of the research presented in this paper, there are a number of problems with it. First of all, the Shannon-Whittaker formula (4a) as presented doesn't work: Plugging in $f = \cos(\pi\square)$ at the point $1/2$ gives infinity on the RHS. Secondly, in (4b), we are taking the derivative under a sum which (even if it converged) doesn't converge absolutely, causing headaches and high blood pressure. However, if we are able to arrive at (4c), then (4d) will follow, and the feasibility of the Nyquist shift rules (5) will be established.
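As a plausibility check of where this argument is heading, the following minimal sketch evaluates a truncated version of the shift-rule sum numerically. It assumes that the Nyquist shift rule has coefficients $2K(-1)^{n+1}/(\pi(n+1/2)^2)$ at the shifts $(n+1/2)/(2K)$, which is what the value of $\operatorname{sinc}'(1/2 - n)$ quoted above and a dilation by $2K$ suggest; the function names, the test function, and the truncation parameter `N` are illustrative choices, not taken from the paper.

```python
import numpy as np

def nyquist_shift_rule(K, N):
    """Shifts and coefficients of the rule, truncated to n = -N, ..., N-1.

    Assumed form: coefficient 2K (-1)^(n+1) / (pi (n+1/2)^2) at shift (n+1/2)/(2K).
    """
    n = np.arange(-N, N)
    shifts = (n + 0.5) / (2.0 * K)
    signs = np.where(n % 2 == 0, -1.0, 1.0)          # (-1)^(n+1)
    coeffs = 2.0 * K * signs / (np.pi * (n + 0.5) ** 2)
    return shifts, coeffs

def shift_rule_derivative(f, x, K, N=100_000):
    """Approximate f'(x) by the convolution sum  sum_n u_n f(x - s_n)."""
    shifts, coeffs = nyquist_shift_rule(K, N)
    return float(np.sum(coeffs * f(x - shifts)))

# Band-limited test function with frequencies 0.2 and 0.45, so K = 1/2 suffices.
def f(x):
    return np.cos(2 * np.pi * 0.2 * x) + 0.5 * np.sin(2 * np.pi * 0.45 * x)

def f_prime(x):
    return (-2 * np.pi * 0.2 * np.sin(2 * np.pi * 0.2 * x)
            + 0.5 * 2 * np.pi * 0.45 * np.cos(2 * np.pi * 0.45 * x))

x0 = 0.37
print(shift_rule_derivative(f, x0, K=0.5), f_prime(x0))
```

The truncation is only there to make the sum finite; since the coefficients decay like $1/n^2$, the truncation error for a bounded function shrinks like $1/N$.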
The contributions one by one. (1) Counting the observation about the Fourier spectra of perturbed-parametric expectation-value functions (see §2.3) as the first contribution of this paper, (2) the second contribution is a rigorous proof that the Nyquist shift rule (5) is feasible for expectation-value functions where the difference between largest and smallest eigenvalues of $A$ is at most $K/2\pi\hbar$; see §2.5.
(3) En passant, in §2.4, we characterize the set of all feasible PSRs (for fixed $K > 0$) in terms of their Fourier transforms, mirroring the characterization via a system of linear equations in [18] but with infinitely many equations. As a first consequence of that characterization, we can show (§2.4.b) that the space of all feasible PSRs has infinite dimension (for each fixed $K$), so you could definitely say that there are many of them.
(4) As a second consequence of the characterization in §2.4, we can prove some non-existence results for particularly nice feasible PSRs (§2.9), e.g., it's a one-liner to see that there is no PSR with compact support that is feasible for the frequency band $[-1/2, +1/2]$. It takes a little more effort to prove that there is no feasible PSR which is exponentially concentrated; indeed, there isn't even a family of exponentially concentrated PSRs that are, in a wide sense, "nearly feasible" (definition in Corollary 2.34): You cannot let the approximation error tend to 0 without blowing up either the tail or the "cost" (see next item) of the PSR.
(5) The "cost" of PSRs: In §2.6.a, in parallel with the results in [18], we show that the (total-variation) norm of a PSR equals the worst-case standard deviation of a single-shot estimator for the convolution of the expectation-value function with the PSR (see Fig. 2 below). Moreover, we prove that, for each $K > 0$, there is no feasible PSR that has smaller norm than our Nyquist shift rule $\phi_K$, i.e., the Nyquist shift rules are optimal; see §2.6. The Stochastic Parameter Shift Rule of Banchi & Crooks [13] has the same worst-case standard deviation (but it isn't a "proper" shift rule).
While from a mathematical point of view, the Nyquist shift rules in (5) are the "right" ones, they require querying the expectation-value function at arbitrarily large parameter settings. Indeed, the expected magnitude of a parameter setting is $\infty$, as the harmonic sum diverges. The usual remedy for this situation in the context of Nyquist-Shannon-Whittaker theory is to truncate the sum, which introduces an approximation error. As this paper's contribution (6) [...]

Table 1: Comparison of methods. The table shows the approximation error ("apx err"), the maximum magnitude of parameter values ("max MoPV"), and the average magnitude of parameter values when simple estimators are used. "Banchi-Crooks" refers to the Approximate Stochastic Parameter Shift Rule in [13], which depends on a small parameter, $\beta > 0$, that determines the query points. The methods presented in this paper are dubbed "Nyquist"; "exact" refers to (5) with the Single-Shot Estimator, Fig. 2 [...], where $c$ is the maximum magnitude of a parameter value.
We summarize the results about various types of proper, improper, and folded shift rules in Table 1.
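The two quantitative statements made above about the exact Nyquist shift rule (finite norm, but infinite expected magnitude of the parameter settings) can be illustrated numerically. The sketch below assumes the rule has absolute coefficients $2K/(\pi(n+1/2)^2)$ at the shifts $(n+1/2)/(2K)$, $n \in \mathbb{Z}$; under that assumption the norm is $2K \cdot \pi^2/\pi = 2\pi K$, by the classical identity $\sum_{n\in\mathbb{Z}} (n+1/2)^{-2} = \pi^2$. The helper name and the truncation levels are hypothetical.

```python
import numpy as np

def truncated_norm_and_mean_shift(K, N):
    """Norm and mean |shift| of the rule truncated to n = -N, ..., N-1 (assumed form)."""
    n = np.arange(-N, N)
    weights = 2.0 * K / (np.pi * (n + 0.5) ** 2)    # absolute values of the coefficients
    magnitudes = np.abs(n + 0.5) / (2.0 * K)        # magnitudes of the shifts
    norm = weights.sum()                            # partial sum of the total-variation norm
    mean_shift = (weights * magnitudes).sum() / norm
    return norm, mean_shift

K = 1.0
for N in (10**2, 10**4, 10**6):
    norm, mean_shift = truncated_norm_and_mean_shift(K, N)
    print(N, norm, mean_shift)   # norm settles near 2*pi*K, mean shift keeps growing
```

The mean shift grows like a harmonic sum, i.e., logarithmically in the truncation level, which is why truncating the rule bounds the parameter magnitudes only at the price of an approximation error.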

Organization of the paper
In the next section, we give a technical, mathematically rigorous overview of the results of this paper, with an emphasis on motivation and going easy on the proofs. The more technical proofs are in Sections 3-8. Section 9 discusses some questions that arise, and points to future work.
Appendices A and B hold math that is well known or easily derived, added for convenience; in Appendix C we prove, for the sake of completeness of the presentation, the characterization of feasible proper shift rules in a more general case than is needed for expectation-value functions. Appendix D has additional graphs from numerical simulations (cf. §2.10).
The author aims for the content of this paper, from this point on, to be fully mathematically unambiguous and rigorous, and welcomes any criticism that points out where this goal has been missed.

Technical overview
This section presents the results of the paper in full mathematical rigor, with an emphasis on motivation: we state the results and discuss their relationships. In terms of proofs, we give a few that help to motivate, deferring the more technical or longer ones to later sections.

Math notations, definitions and preliminary facts
In this paper, a finite measure is a finite (not necessarily positive) regular Borel measure on R; we use the phrase signed measure and complex measure, resp., to emphasize that only real or also complex values are allowed. In this paper, finiteness is implied in the term signed/complex measure.
Complex measures have a canonical decomposition into two signed measures, each of which in turn has a canonical decomposition into two positive measures (Jordan decomposition). The sum of the resulting four (or two) positive measures in the decomposition of a finite measure $\mu$ is called the total-variation measure and denoted by $|\mu|$. The total-variation norm of a finite measure $\mu$ is defined as $\|\mu\| := \int 1 \, d|\mu|$, and satisfies
$$ \|\mu\| = \sup_f \int f \, d\mu, \tag{6} $$
where the supremum ranges over all measurable functions $f$ with $\|f\|_\infty \le 1$. This equation shows that convergence $\mu_n \xrightarrow{n\to\infty} \mu$ in total variation implies convergence $\int f \, d\mu_n \xrightarrow{n\to\infty} \int f \, d\mu$ for every bounded measurable function $f$; we use that implicitly all the time.
Convergence in the total-variation norm of the sum of measures defining the Nyquist shift rule (5) can easily be checked.
For a measurable mapping $\tau \colon \mathbb{R} \to \mathbb{R}$ and a finite measure $\mu$ on $\mathbb{R}$, we denote by $\tau(\mu)$ the image of the measure under $\tau$ (aka push-forward measure). It is defined by $\tau(\mu)(E) = \mu(\tau^{-1}(E))$ for every Borel set $E$, or, equivalently, by $\int f \, d\tau(\mu) = \int f \circ \tau \, d\mu$ for all bounded measurable functions $f$.
We will repeatedly use the following fact, without reference to it: For every measurable $\tau \colon \mathbb{R} \to \mathbb{R}$, the mapping $\mu \mapsto \tau(\mu)$ takes complex measures to complex measures, is linear, and is continuous in the total-variation norm; in particular, if $\mu := \sum_{j \in \mathbb{N}} \alpha_j \mu_j$ is a total-variation-norm convergent series, where the $\mu_*$ are complex measures and the $\alpha_*$ are complex numbers, then $\tau(\mu) = \sum_j \alpha_j \tau(\mu_j)$.
The terms Fourier transform, support, and Fourier spectrum (the support of the Fourier transform) refer to the concepts for tempered distributions; in the case of finite measures, the Fourier transform coincides (up to technicalities) with the Fourier-Stieltjes transform:
$$ \hat\mu(\xi) = \int e^{-2\pi i \xi x} \, d\mu(x); $$
note the $2\pi$ in the exponent. The inverse Fourier transform is denoted by $\check{\square}$. We also use the notations $(\dots)^\wedge$ and $(\dots)^\vee$ for the (inverse) Fourier transform of the function / measure / distribution in the place of "$\dots$".
While the results in this paper can be extended to multi-parameter functions, here, we focus solely on a single parameter. As a consequence, all function spaces will be spaces of functions on the real line. For that reason, when using standard function-space notation, we omit "$(\mathbb{R})$": I.e., $L^1$ (space of absolutely integrable functions on $\mathbb{R}$), $L^2$ (space of square integrable functions on $\mathbb{R}$), $C_0$ (space of continuous functions on $\mathbb{R}$ which vanish towards $\pm\infty$), $C^1$ (space of continuously differentiable functions on $\mathbb{R}$), $C^1_b$ (space of bounded continuously differentiable functions on $\mathbb{R}$), $\mathcal{S}$ (space of Schwartz functions on $\mathbb{R}$), $\mathcal{S}'$ (space of tempered distributions on $\mathbb{R}$), etc.
We use the word smooth to mean continuously differentiable.

Perturbed-parametric evolutions and proper shift rules
The author apologizes to all physicists for choosing the barbaric normalization $h = 1$, i.e., $\hbar := 1/2\pi$, in the following definition.
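Definition 2.1. For Hermitian operators $A$, $B$ on a finite-dimensional Hilbert space, we refer to
$$ x \mapsto e^{2\pi i (xA + B)} $$
as a perturbed-parametric unitary or a perturbed-parametric unitary function. A perturbed-parametric expectation-value function is a function of the following form:
$$ f \colon x \mapsto \operatorname{tr}\bigl( M \, e^{2\pi i (xA + B)} \, \varrho \, e^{-2\pi i (xA + B)} \bigr), $$
where $\varrho$ is a positive operator and $M$ a Hermitian operator, both on the same Hilbert space as $A$, $B$. To avoid trivial border cases, we require that $A$ is not a scalar multiple of the identity operator.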
This definition captures the typical setting in which VQAs are used today:
1. Prepare an $n$-qubit system in an initial state, say $|0^n\rangle$;
2. Subject the system to a time-dependent evolution (possibly depending on other parameters, but not $x$);
3. Subject the system to an evolution with Hamiltonian $-(xA + B)$ for one unit of time;
4. Subject the system to further time-dependent evolution (possibly depending on other parameters, but not $x$);
5. Measure an observable.
Here, the state $\varrho$ would be reached after step #2, and the observable $M$ would be $E^\dagger(M_0)$, where $E$ is the quantum operation resulting from the evolution in step #4 and $M_0$ the observable in step #5.
In parallel with [18], we make the following definitions.
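[...] where "$*$" is convolution: For all $x \in \mathbb{R}$,
$$ (f * \phi)(x) = \int f(x - s) \, d\phi(s). $$
Let's take as an example the symmetric difference quotient, $\phi := (\delta_{-\varepsilon} - \delta_{+\varepsilon})/(2\varepsilon)$ for some $\varepsilon > 0$, so that $(f * \phi)(x) = (f(x + \varepsilon) - f(x - \varepsilon))/(2\varepsilon)$. It is feasible for the space of polynomials of degree at most 2, but it incurs an approximation error (i.e., is not feasible) on any non-constant expectation-value function.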
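The symmetric-difference example can be checked in a few lines. The sketch below treats the symmetric difference quotient as the convolution with $(\delta_{-\varepsilon} - \delta_{+\varepsilon})/(2\varepsilon)$; the step size, the quadratic, and the single-frequency wave (standing in for an expectation-value function) are illustrative assumptions.

```python
import numpy as np

def symmetric_difference(f, x, eps):
    """(f * phi)(x) for phi = (delta_{-eps} - delta_{+eps}) / (2 eps)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

eps, x0 = 0.1, 0.7

# Degree-2 polynomial: the rule reproduces the derivative up to rounding.
quadratic = lambda x: 3 * x**2 - 2 * x + 1
quadratic_error = symmetric_difference(quadratic, x0, eps) - (6 * x0 - 2)

# Single-frequency wave: the rule is biased by the factor sin(a)/a, a = 2 pi xi eps.
xi = 0.3
wave = lambda x: np.sin(2 * np.pi * xi * x)
wave_exact = 2 * np.pi * xi * np.cos(2 * np.pi * xi * x0)
wave_error = symmetric_difference(wave, x0, eps) - wave_exact

print(quadratic_error, wave_error)
```

Shrinking `eps` reduces the bias on the wave but never removes it, which is the sense in which the symmetric difference quotient is not feasible there.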

Function spaces
In this paper, as in [18], we address questions about parameterized quantum evolutions by proving theorems about function spaces, and about membership of expectation-value functions in these spaces. In this section, we define the spaces and present the facts regarding membership of expectation-value functions. For convenient reference, Table 2 gives an overview of all spaces we use.
For convenience, and to avoid trivial border cases, we make the following definition.
Let $\Xi$ be a frequency set. The space of real-valued, bounded, smooth functions with Fourier spectrum contained in $\Xi$ is denoted by $\mathcal{K}_\Xi$, as in [18]: There, convex optimization techniques and computations are explored, based on the fact that this space has finite dimension. The spaces in the present paper, though, are not finite dimensional: For positive real $K$, we denote by $F_K$ the set of smooth, bounded real-valued functions on $\mathbb{R}$ with Fourier spectrum contained in $[-K, +K]$.

From Paley-Wiener theory [19, Theorem IX.12] we know that every tempered distribution with compact Fourier spectrum is indeed a function which is (extendable to a) holomorphic (function) with polynomial growth (in the real part). Hence, the set $F_K$ contains all tempered distributions with two constraints: (1) on the Fourier spectrum, and (2) boundedness. The boundedness is necessary, as otherwise the convolution with finite measures would not be well defined.
We find it more convenient to work with the function space, but the prime examples that we are interested in are the perturbed-parametric expectation-value functions from Def. 2.1. The following proposition gives the connection. (We denote by $\operatorname{eig}_{\min}(\cdot)$, $\operatorname{eig}_{\max}(\cdot)$ the smallest and largest, resp., eigenvalues of an operator.)

Proposition 2.5. With the notations of Def. 2.1, set $K := \operatorname{eig}_{\max}(A) - \operatorname{eig}_{\min}(A)$. The expectation-value function is a bounded analytic function whose Fourier spectrum is contained in $[-K, +K]$, and hence it is a member of $F_K$.
As everybody will surmise, the proof of the proposition, in §3.4, is based on the Lie Product Formula, aka "Trotterization".
The work in [18] is based on the function space $\mathcal{K}_\Xi$, which is justified as every function in that space is an expectation-value function of a variational quantum circuit [18, Prop. 4]. In the present paper, that is not the case: There are functions in $F_K$ that are not perturbed-parametric expectation-value functions. Working with the spaces $F_*$ is nothing but a convenient abstraction, simplifying the reasoning for much of what we are doing in this paper, particularly the non-existence proofs. At some points, a more refined Fourier-analytic abstraction of perturbed-parametric expectation-value (and unitary) functions is more convenient or necessary. We prepare that with the following definition.

Definition 2.6 (Linear decay). For non-negative real constants $c$, $C$, we say that a continuous function $f$ of a real variable decays linearly (or is of linear decay) with decay constants $(c, C)$, if for all $x \in \mathbb{R}$ with $|x| \ge c$ we have $|f(x)| \le C/|x|$.

Defining, as usual, the difference set of a set $\Lambda$ as $\operatorname{Diff} \Lambda := \{\lambda - \lambda' \mid \lambda, \lambda' \in \Lambda\}$, textbook eigenvalue perturbation theory (in finite dimension!) will give us the following. There exists a function [...]

In Section 3, we obtain Prop. 2.7 as a consequence of a corresponding Fourier decomposition for perturbed-parametric unitary functions (Prop. 3.1). Details and the proofs of both propositions are in §§3.2-3.3.
With the notations of Prop. 2.7, as the tempered distributions with support contained in a given set form a subspace, and since both $f$ (by Prop. 2.5) and $f_1$ have Fourier spectra that are contained in $[-K, +K]$ for $K := \max \Xi$, we find that $f_0$ also has a Fourier spectrum that is contained in $[-K, +K]$. This motivates the following definition.

Definition 2.8. For $K > 0$, we denote by $F^{\downarrow}_K$ the vector space of analytic functions that are of linear decay for some choice of decay constants, and whose Fourier spectrum is contained in the interval $[-K, +K]$.
Combining Propositions 2.5 and 2.7, we obtain the following as a direct consequence.
In other words, every perturbed-parametric expectation-value function $f$ can be uniquely decomposed as $f = f_0 + f_1$, where $f_0 \in F^{\downarrow}_{\max \Xi}$ and $f_1$ is a (finite) linear combination of $\sin(2\pi\xi\square)$ and $\cos(2\pi\xi\square)$ with frequencies $\xi \in \Xi$.

The only part of Corollary 2.9 that we have not discussed yet is the directness of the sum (which is equivalent to the uniqueness of the decomposition); we refer to Prop. A.4 in Appendix A.2.
The convenience gain of the Fourier-decomposition theorem lies in the fact that functions of linear decay are square integrable. So the Fourier transform $\hat f_0$ is a function (not an evil tempered distribution), and, thanks to the compact support, the inverse Fourier transform is realized simply by the integral
$$ f_0(x) = \int_{-K}^{+K} \hat f_0(\xi) \, e^{2\pi i \xi x} \, d\xi, \quad \text{where } K := \max \Xi; $$
no tempered-distribution theory is required, not even improper integrals. The consequence is that, on some occasions, $F^{\oplus}_\Xi$ is technically easier to work with in terms of the Fourier transform (e.g., see the discussion following Lemma 2.12 in §2.4.a below). On other occasions, the additional knowledge in $F^{\oplus}_*$ does not seem to translate into reduced technical complexity of the proofs (e.g., the proof of Prop. 2.5 above).

As $F^{\oplus}_\Xi \subsetneq F_{\max \Xi}$, every PSR feasible for $F_{\max \Xi}$ is also feasible for $F^{\oplus}_\Xi$, and it is conceivable that there could be PSRs that are feasible for $F^{\oplus}_\Xi$ but not feasible for $F_{\max \Xi}$. This is not the case. Based on the characterization of PSRs feasible for $F_{\max \Xi}$ in §2.4.a below, we will show the following.

Proposition 2.10. Let $\Xi$ be a frequency set. Every complex PSR feasible for $F^{\oplus}_\Xi$ is also feasible for $F_{\max \Xi}$.
(The proof is in §4.1.) Hence, for every frequency set $\Xi$, the PSRs feasible for the smaller space $F^{\oplus}_\Xi$ of Corollary 2.9 are exactly the same as those feasible for the larger space $F_{\max \Xi}$ used in Prop. 2.5.
In terms of practical use of the PSRs, the structure of $F^{\oplus}_\Xi$ can be exploited: In §2.8 below, we will discuss the concept of Folding, which exploits the linear-decay condition to achieve a shift-ruleish method with an approximation error. The approximation error decays quickly (quadratically) with the magnitudes of the parameter values at which the perturbed-parametric expectation-value function is queried.

Reflection, dilation, and the space of feasible proper shift rules
For a PSR $\phi$ that is feasible for a function $f$, we have $f'(x) = \int f(x - s) \, d\phi(s)$ for all $x$. Choosing $x = 0$ as an anchor point, as it were, we find $f'(0) = \int f \, d\tilde\phi$, where $\tilde\phi := (-\square)(\phi)$ is the image of $\phi$ under the measurable mapping $(-\square) \colon \mathbb{R} \to \mathbb{R} \colon x \mapsto -x$, the reflection on 0. Conversely, any complex measure $\tilde\phi$ satisfying that integral equation (for all $f$) gives rise to a feasible PSR, via $\phi := (-\square)(\tilde\phi)$. This approach leads to the characterization of the set of (finite-support) shift rules via the system of equations in [18, Eqn. (8)].
In the present paper, we choose an anchor point different from 0, simply because it is convenient for the concrete Nyquist Shift Rules that we will work with: The anchor point will be $1/2$. Moreover, as discussed in the introduction (and as is evident in (5)), the Nyquist shift rule takes its simplest form when the Fourier spectrum is contained in $[-1/2, +1/2]$. For these reasons, we will consider measures $\mu$ with the following property:
$$ g'(1/2) = \int g \, d\mu \quad \text{for all } g \in F_{1/2}; \tag{12} $$
the space which $g$ is allowed to be in will be swapped out for $F^{\oplus}_\Xi$ in some places below. In any case, we will speak of a derivative-computing measure.
We start by providing the details about the connection between $\mu$'s and PSRs. First of all, note that if $\phi$ is a PSR for a function space containing a certain function $g$, then
$$ g'(1/2) = \int g(1/2 - s) \, d\phi(s) = \int g \, d\tau(\phi), $$
where $\tau \colon \mathbb{R} \to \mathbb{R} \colon s \mapsto 1/2 - s$ is the reflection on $1/2$. This means: If $\phi$ is a PSR feasible for $F_{1/2}$, then $\mu := \tau(\phi)$ satisfies (12). Dilation then gives feasibility for $F_K$ in the case $K \ne 1/2$. We summarize in the form of the following proposition (proof details in §5.1).

Proposition 2.11. For every $K > 0$, there is a canonical isomorphism between the (real or complex, resp.) affine space of (signed or complex, resp.) measures $\mu$ satisfying (12), and the affine space of (real or complex, resp.) PSRs feasible for $F_K$.

2.4.a The space of PSRs
We now come to the characterization of the set of PSRs feasible for $F_K$, for any $K > 0$, and for $F^{\oplus}_\Xi$ for any frequency set $\Xi$: We mentioned in Prop. 2.10 that these sets are the same, and we will now develop the machinery to prove it. By what we just summarized, the task is to characterize the derivative-computing measures, i.e., the finite measures satisfying (12).
The following lemma is the infinite-support extension of the corresponding statement on finite support [18, Eqn. (8)].

Lemma 2.12 (Fourier-analytic characterization of derivative-computing measures). Let $\mu$ be a finite measure on $\mathbb{R}$. For (12) to hold, it is necessary and sufficient that
$$ \hat\mu(\xi) = -2\pi i \xi \, e^{-i\pi\xi} \quad \text{for all } \xi \in [-1/2, +1/2]. \tag{13} $$

Recall that the Fourier transform of a tempered distribution which arises from integrating against a complex measure coincides with integrating against the Fourier-Stieltjes transform of the measure. As the Fourier-Stieltjes transform of a complex measure is a continuous function, evaluating $\hat\mu$ at individual frequencies $\xi$ is well defined.
Let us consider the necessity of the condition.
Proof that (12) implies (13). For all $\xi \in [-1/2, +1/2]$, the real and imaginary parts of the function $x \mapsto e^{-2\pi i \xi x}$ lie in $F_{1/2}$. Assume that $\mu$ is a complex measure satisfying (12). By the complex linearity of both the derivative at $1/2$, $f \mapsto f'(1/2)$, and of the integral $f \mapsto \int f \, d\mu$, by applying both to the real and imaginary parts of $x \mapsto e^{-2\pi i \xi x}$, we find that
$$ \hat\mu(\xi) = \int e^{-2\pi i \xi x} \, d\mu(x) = \frac{d}{dx} e^{-2\pi i \xi x} \Big|_{x = 1/2} = -2\pi i \xi \, e^{-i\pi\xi}. $$
This proves the condition (13), demonstrating its necessity in Lemma 2.12.
As for the proof of the sufficiency direction, it is based on Fourier Inversion. Performing that rigorously for only $F^{\oplus}_*$ is less technical than the general case as stated in the lemma, which requires tempered-distribution arguments and Paley-Wiener-Schwartz theory. As, in view of Corollary 2.9, the general case is mostly of academic interest, in §4.2, we prove the lemma only for the special case of $F^{\oplus}_*$, and demote the general case, $F_*$, to Appendix C. As the proofs of the versions of Lemma 2.12 are somewhat spread out, we refer to Fig. 1 for a visual guide.

Revisiting §2.3.a, suppose $\Xi \subset [-1/2, 1/2]$ is a frequency set with $\max \Xi = 1/2$. In Lemma 4.1 of §4.1 we will prove that (13) is already implied by requiring that the equation $g'(1/2) = \int g \, d\mu$ in (12) holds only for $g \in F^{\oplus}_\Xi$, instead of for all $g \in F_{1/2}$. This will prove Prop. 2.10 together with the reflection on $1/2$, translation, and dilation techniques laid out in §2.4 above.

2.4.b "Quantitative" view
Using Lemma 2.12, we are now ready to make a "quantitative" statement about the set of measures $\mu$ satisfying (12). At this point, as it were, we "don't know yet" whether a single such $\mu$ exists, but if a single one exists, the set is large. In §4.3, the proof of the proposition proceeds by adding to a single such signed measure infinitely many linearly independent signed measures whose Fourier transform vanishes on $[-1/2, +1/2]$, and invoking Lemma 2.12.

Feasibility of the Nyquist shift rules
It's about time to discuss the feasibility of the Nyquist shift rules $\phi_K$, $K > 0$, from (5), for $F_K$; throughout this section, $\phi_*$ will be as in (5).
As indicated in the introduction, we will not attempt to pursue a strategy based on the Shannon-Whittaker Interpolation Theorem. Instead, we apply Lemma 2.12 of the previous section for a suitable signed measure $\mu$, which we will define in a moment. This $\mu$ will allow us to define $\phi_{1/2}$ via reflection, and dilation will give us all other $\phi_K$, $K > 0$ (Prop. 2.11).

2.5.a The derivative-computing measure
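With the coefficients $\operatorname{sinc}'(1/2 - n) = (-1)^{n-1}/(\pi (n - 1/2)^2)$, $n \in \mathbb{Z}$, we define the following signed measure on $\mathbb{R}$:
$$ \mu := \sum_{n \in \mathbb{Z}} \frac{(-1)^{n-1}}{\pi (n - 1/2)^2} \, \delta_n, \tag{14} $$
where convergence in the total-variation norm can easily be checked. One main technical piece of work in this paper is the following theorem.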
Applying Lemma 2.12 shows that the $\mu$ in (14) satisfies (12), i.e., integrating against it gives the derivative at $1/2$, provided that the integrand is a bounded smooth function with Fourier spectrum contained in $[-1/2, +1/2]$. The proof of the theorem is in §4.4.
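The Fourier-analytic condition can be probed numerically. The sketch below assumes that the derivative-computing measure has atoms at the integers $n$ with weights $\operatorname{sinc}'(1/2 - n) = (-1)^{n-1}/(\pi(n - 1/2)^2)$, and that condition (13) reads $\hat\mu(\xi) = -2\pi i \xi\, e^{-i\pi\xi}$ on $[-1/2, +1/2]$, which is what applying (12) to $x \mapsto e^{-2\pi i \xi x}$ gives. The function name and the truncation level are illustrative.

```python
import numpy as np

def mu_hat(xi, N=200_000):
    """Truncated Fourier-Stieltjes transform  sum_n a_n exp(-2 pi i xi n)  of the assumed mu."""
    n = np.arange(-N + 1, N + 1)                      # symmetric around the anchor point 1/2
    signs = np.where(n % 2 == 0, -1.0, 1.0)           # (-1)^(n-1)
    a = signs / (np.pi * (n - 0.5) ** 2)              # weights sinc'(1/2 - n)
    return np.sum(a * np.exp(-2j * np.pi * xi * n))

for xi in (-0.5, -0.2, 0.0, 0.3, 0.5):
    target = -2j * np.pi * xi * np.exp(-1j * np.pi * xi)   # derivative of exp(-2 pi i xi x) at x = 1/2
    print(xi, abs(mu_hat(xi) - target))
```

The printed deviations are pure truncation error; outside $[-1/2, +1/2]$ the two sides differ, since the left-hand side is 1-periodic in $\xi$ while the right-hand side is not.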

2.5.b From µ to ϕ K
We now state the feasibility of the Nyquist shift rules; the purely technical proofs, based on the reflection and dilation, are in §5.2.

Corollary 2.16. For each fixed real number
With Prop. 2.5, we see that the Nyquist shift rule $\phi_K$, $K > 0$, from (5), when convolved with an expectation-value function $f$ as in Def. 2.1 with $\operatorname{eig}_{\max}(A) - \operatorname{eig}_{\min}(A) \le K$, gives the derivative of $f$.
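The statement can be tried out on a small simulated example. The following sketch builds a perturbed-parametric expectation-value function from random Hermitian matrices (with $h = 1$, i.e., $U(x) = e^{2\pi i (xA + B)}$), sets $K$ to the eigenvalue spread of $A$, and compares a truncated Nyquist sum with a central finite difference. The assumed form of the rule (coefficients $2K(-1)^{n+1}/(\pi(n+1/2)^2)$ at shifts $(n+1/2)/(2K)$), the matrix sizes, the seed, and the truncation level are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(d):
    """Random Hermitian d x d matrix (illustrative test data only)."""
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (X + X.conj().T) / 2

d = 3
A, B, M = random_hermitian(d), random_hermitian(d), random_hermitian(d)
M /= np.linalg.norm(M, 2)                                   # spectral norm 1, so |f| <= 1
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
rho = np.outer(psi, psi.conj()) / np.vdot(psi, psi).real    # pure state with trace 1

def f(x):
    """Expectation value tr(M U rho U^dagger) with U = exp(2 pi i (x A + B))."""
    lam, V = np.linalg.eigh(x * A + B)
    U = (V * np.exp(2j * np.pi * lam)) @ V.conj().T
    return np.trace(M @ U @ rho @ U.conj().T).real

eigs = np.linalg.eigvalsh(A)
K = eigs[-1] - eigs[0]                                      # bandwidth bound from Prop. 2.5

def nyquist_derivative(x, N=5_000):
    """Truncated Nyquist sum  sum_n u_n f(x - s_n)  (assumed form of the rule)."""
    n = np.arange(-N, N)
    shifts = (n + 0.5) / (2 * K)
    coeffs = 2 * K * np.where(n % 2 == 0, -1.0, 1.0) / (np.pi * (n + 0.5) ** 2)
    return sum(c * f(x - s) for c, s in zip(coeffs, shifts))

x0, h = 0.3, 1e-5
finite_difference = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(nyquist_derivative(x0), finite_difference)
```

Only the parameter $x$ is changed between evaluations; the operators $A$, $B$ and the evolution time stay fixed, which is the point of a proper shift rule.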

Cost concept and optimality
So we have PSRs feasible (footnote 17) for $F_K$, $K > 0$. But are there better ones? In this section, in answering that question in the negative, we understand the word "better" in a narrow technical sense to mean smaller norm. The norm of a PSR will turn out to be the worst-case standard deviation of the Single-Shot Estimator for it.
This section explains why we have somewhat emphasized real-valued PSRs: It can be seen that, as the functions in our spaces $F_*$, $F^{\oplus}_*$ are real-valued (footnote 18), the real part of any feasible complex PSR is a feasible PSR with smaller norm (cf. [18, Lemma 16]). Hence, there doesn't seem to be an advantage in allowing complex PSRs.

2.6.a The cost of a PSR
As in [18], we define the cost of a PSR $\phi$ simply as the (total-variation) norm of the measure, $\|\phi\|$. It is elementary that the norm coincides with the operator norm of the linear operator $f \mapsto f * \phi$ on the normed space of bounded, measurable real-valued functions of a real variable, $M_b$, with the supremum norm $\|\square\|_\infty$; for convenient reference, we note two relevant inequalities in a remark (footnote 19).

Remark 2.17. For every real-valued, bounded, measurable function $f$ and every signed measure $\phi$, we have
$$ \Bigl| \int f \, d\phi \Bigr| \le \|\phi\| \cdot \|f\|_\infty, $$
and in particular,
$$ \|f * \phi\|_\infty \le \|\phi\| \cdot \|f\|_\infty. $$
By applications of standard facts about regular measures (and continuous, differentiable, analytic functions), the statements remain true if we replace the qualification "bounded measurable" by "bounded continuous", or "bounded smooth" ($C^1_b$), or "bounded infinitely differentiable", or "bounded analytic".
In addition to the application-side interpretations of the operator norm as the worst-case factor by which noise (e.g., higher-frequency contributions to the input) is amplified when taking the convolution, [18] discusses two more motivations for referring to ∥ϕ∥ as the cost of the PSR ϕ, which make sense only for finite-support PSRs. Here, we discuss the following: The norm of the PSR is the standard deviation of a single-shot estimation algorithm for the convolution, in the worst case over all functions f and points x.

17 Let it be repeated that, due to Prop. 2.10 in §2.3.a, there's no difference between feasibility for F * and for F ⊕ * ; we stick to F * as it is the "simpler" space.
18 So are, importantly, all expectation-value functions.
19 Cf. (6) for the first one.

Algorithm: Single-Shot Estimator
Input: x ∈ R. Plus: Access to shot oracle F
Output:
1 Sample a random point A ∈ R according to the probability measure |ϕ|/∥ϕ∥
2 Invoke the shot oracle for the point

The Single-Shot Estimator. As we want the convenience of talking about function spaces, we introduce a theoretical computer science concept that replaces a "shot", i.e., a single measurement of the observable M from Def. 2.1 in the parameterized quantum state:

Definition 2.18 (Shot oracle). A shot oracle, F , for a function f : R → R takes as input s ∈ R and returns a random number F (s) ∈ R such that 𝔼 F (s) = f (s), with the requirement that runs of the shot oracle (for the same or different inputs) return independent results.

With input x ∈ R, the Single-Shot Estimator in Fig. 2

Proof. We use the notations from Fig. 2. Suppose the Single-Shot Estimator receives x ∈ R as input. By our assumption that F ( □ ) is ±1-valued and since dϕ/d|ϕ| is a function taking ±1-values |ϕ|-almost everywhere, we have E(x)² = ∥ϕ∥², so that 𝔼(E(x)²) = ∥ϕ∥². As advertised in Fig. 2, the Single-Shot Estimator is indeed an unbiased estimator for f * ϕ. For a feasible PSR, the mean is therefore f ′ (x), and the variance is ∥ϕ∥² − f ′ (x)². Making a worst-case assumption on f and x, we take f ′ (x) = 0 (barren plateaus [20,21] raising their flat heads again). In other words, we upper-bound the variance by ∥ϕ∥². This worst-case upper bound leads to the worst-case standard deviation of ∥ϕ∥ for the Single-Shot Estimator.
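For a finite-support PSR, the estimator can be sketched in a few lines of Python. The sketch is an illustration under assumptions that are not fixed by the text above: the convolution convention (f * ϕ)(x) = Σ_s ϕ({s}) f(x − s), a hypothetical ±1-valued shot oracle built from a known function f, and, as the example PSR, the textbook two-term shift rule for single-frequency functions (not the Nyquist shift rule (5)). All function names are made up for this illustration.

```python
import math
import random

# Hypothetical finite-support PSR as (shift, weight) pairs: the textbook
# two-term rule, exact for f(x) = a*sin(x) + b*cos(x) + c.
# This is an illustrative stand-in, NOT the Nyquist shift rule (5).
PSR = [(-math.pi / 2, +0.5), (+math.pi / 2, -0.5)]


def norm(psr):
    """Total-variation norm of a finite-support signed measure."""
    return sum(abs(w) for _, w in psr)


def convolve(f, psr, x):
    """(f * phi)(x) = sum_s phi({s}) * f(x - s)  (assumed convention)."""
    return sum(w * f(x - s) for s, w in psr)


def make_shot_oracle(f, rng):
    """Hypothetical shot oracle: returns +1 or -1 with mean f(s); needs |f| <= 1."""
    def shot(s):
        return 1.0 if rng.random() < (1.0 + f(s)) / 2.0 else -1.0
    return shot


def single_shot_estimate(x, psr, shot, rng):
    """One sample whose expectation is (f * phi)(x)."""
    total = norm(psr)
    # Step 1: sample a shift according to the probability measure |phi| / ||phi||.
    probs = [abs(w) / total for _, w in psr]
    i = rng.choices(range(len(psr)), weights=probs, k=1)[0]
    s, w = psr[i]
    # Step 2: one shot at the shifted point, rescaled by ||phi|| and the sign of phi.
    return total * math.copysign(1.0, w) * shot(x - s)
```

Every returned sample has magnitude ∥ϕ∥ (here 1), so the empirical second moment is ∥ϕ∥², and the empirical variance is ∥ϕ∥² minus the squared mean, in line with the proof above.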

2.6.b Review of weak duality
While much of the convex optimization theory in [18] breaks down when moving from the finite-dimensional vector spaces K Ξ there to our infinite-dimensional spaces F K , and moving from finite-support PSRs to infinite-support ones, the Weak Duality Theorem [18, Prop. 6] goes through letter for letter. We give here a version of it that is tailored to our needs; the proof is in §6.1.

Proposition 2.20 (Weak duality theorem). Let G be a vector space of real-valued, bounded smooth functions, which satisfies
If ϕ is a PSR feasible for G , and f ∈ G with ∥f ∥ ∞ = 1, then Under the antecedent condition, equality holds in (16) if, and only if, inequality (15a) in Remark 2.17 is satisfied with equality for f, ϕ.
The spaces F * and F ⊕ * satisfy the condition on G in Prop. 2.20: By the usual tempered-distribution calculations we have for all tempered distributions f ,

2.6.c Optimality of the PSR
Now we are ready to prove that, in terms of our cost concept, the Nyquist shift rules (5) are optimal for F K .

2.6.d Other optimal feasible PSRs
In Section 9, the question will arise whether we can replace ϕ K as defined in (5) by another optimal feasible PSR -maybe one whose support is more concentrated near 0. Here we note the following consequence of (the proof of) Theorem 2.21 together with Prop. 2.20.
If ϕ is a PSR feasible for F K , then ϕ has smallest norm (i.e., is optimal) if, and only if, ϕ = −ϕ 0 + ϕ 1 where for i = 0, 1, ϕ i is the restriction of |ϕ| to the set This means that for all K > 0, a PSR feasible for F K is optimal for F K if, and only if, its support is contained in the set of numbers of the form ( 1 /2 + m)/2K where m ∈ Z, and the signs alternate in the right way.
The proof of the corollary is sketched in §6.3.

Truncation
As discussed in the introduction, the Nyquist shift rules (5) require applying the perturbed-parametric unitary from Def. 2.1 for arbitrarily large values of the parameter x in (8), which may not be physically desirable, or even possible. In this section, we discuss ways to truncate our PSRs to a bounded set of shifts, at the expense of introducing a (hopefully small) approximation error in the derivative. The next section, §2.8, pursues the same goal using a technique that we call Folding. A third way to keep the values of the parameter small is described and used in the section on numerical simulations, §2.10.

Preparations.
We start by discussing the approximation error.
Definition 2.23 (Near feasibility). For a space of real-valued, bounded, smooth functions G , and ε ≥ 0, let us say that a PSR ϕ is ε-nearly feasible for G , if we have

Feasibility is the same as being 0-nearly feasible, of course. While convenient for our calculations (see the next lemma), the example of the symmetric difference quotients shows the limitations of this definition. The approximation guarantee is well-known: For at least 3-times differentiable functions f , we have (for variable ε) the norm (cost) can readily be computed: (It has been observed before (e.g., [13]) that the standard deviation of estimating based on this PSR is Ω(1/ε).) We see that improvement in approximation error is bought at the cost of increasing the standard deviation of the estimator: As the sampling complexity (i.e., the number of samples needed to reach a given precision) grows quadratically in the standard deviation, cost and benefit are of equal order.
However, according to our Def. 2.23, the symmetric difference quotient is only O(1)-nearly feasible -which erroneously 21 suggests that it is completely useless.
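The trade-off for the symmetric difference quotient can be checked numerically. The following sketch makes its own assumptions: the test function f = sin (so that |f‴| ≤ 1), a hypothetical evaluation grid, and the convolution convention (f * ϕ)(x) = Σ_s ϕ({s}) f(x − s). It compares the worst observed approximation error with the classical Taylor bound ε²/6 · sup|f‴| and computes the norm 1/ε of the two-point measure.

```python
import math


def sdq_psr(eps):
    """Symmetric difference quotient as a finite-support signed measure.

    With (f * phi)(x) = sum_s phi({s}) f(x - s), this yields
    (f(x + eps) - f(x - eps)) / (2 * eps).
    """
    return [(-eps, +1.0 / (2.0 * eps)), (+eps, -1.0 / (2.0 * eps))]


def norm(psr):
    """Total-variation norm, i.e., the cost of the PSR."""
    return sum(abs(w) for _, w in psr)


def convolve(f, psr, x):
    return sum(w * f(x - s) for s, w in psr)


def tradeoff(eps, grid_size=200):
    """Return (worst error on a grid, cost) for f = sin, f' = cos."""
    psr = sdq_psr(eps)
    grid = [2.0 * math.pi * k / grid_size for k in range(grid_size)]
    err = max(abs(convolve(math.sin, psr, x) - math.cos(x)) for x in grid)
    return err, norm(psr)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        err, cost = tradeoff(eps)
        print(f"eps={eps:g}  error={err:.3e}  bound={eps**2 / 6:.3e}  cost={cost:g}")
```

In this example the error shrinks like ε² while the cost, and with it the worst-case standard deviation of the single-shot estimator, grows like 1/ε.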
The following lemma shows how near-feasibility is being used. Its proof explains why we get ∥f ∥ ∞ on the RHS.
Proof. As ϕ is feasible for G , we can replace f ′ = f * ϕ, use the bilinearity of convolution, and then use the inequality (15b) of Remark 2.17: This concludes the proof.
Truncation. Now we can discuss what happens to our ϕ K , K > 0, when we truncate them. In the next proposition, no effort has been made to optimize the constant in front of the 1/N ; the proof is in §7.1.
The PSR ϕ

As discussed in the introduction, usually, a small error in the derivative is acceptable. Setting N := ⌈4K/(επ) + 1/2⌉, Prop. 2.25 shows us that we can push the error below ε by using shifts of magnitudes less than 2/(επ) + 1/(2K).
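A numerical sketch of truncation follows. Since equation (5) is not restated in this section, the code ASSUMES a Nyquist-type rule of the form f ′(x) = Σ_m w_m f(x + s_m) with offsets s_m = (m + 1/2)/(2K) and weights w_m = (2K/π)(−1)^m/(m + 1/2)², m ∈ Z; this form is consistent with the spacing 1/(2K) and the norm 2πK mentioned in this section, but the sign and convolution conventions of (5) may differ. Truncation keeps only −N ≤ m < N. The tail estimate 4K/(πN) used below comes from bounding Σ_{m≥N} 1/(m + 1/2)² by 1/N; it is an estimate for this sketch, not the constant of Prop. 2.25.

```python
import math


def truncated_nyquist_rule(K, N):
    """(offset, weight) pairs of the ASSUMED Nyquist-type rule, -N <= m < N."""
    rule = []
    for m in range(-N, N):
        a = m + 0.5
        sign = 1.0 if m % 2 == 0 else -1.0
        rule.append((a / (2.0 * K), sign * (2.0 * K / math.pi) / (a * a)))
    return rule


def apply_rule(f, rule, x):
    """Derivative estimate sum_m w_m * f(x + s_m)."""
    return sum(w * f(x + s) for s, w in rule)


def tail_bound(K, N):
    """Total weight discarded by the truncation, bounded by 4K / (pi * N)."""
    return 4.0 * K / (math.pi * N)


if __name__ == "__main__":
    K, xi = 1.0, 0.7                      # hypothetical band limit and frequency
    f = lambda t: math.sin(2 * math.pi * xi * t + 0.3)
    df = lambda t: 2 * math.pi * xi * math.cos(2 * math.pi * xi * t + 0.3)
    for N in (5, 50, 500):
        est = apply_rule(f, truncated_nyquist_rule(K, N), 0.2)
        print(N, abs(est - df(0.2)), tail_bound(K, N))
```

The largest offset used is (N − 1/2)/(2K), so the error estimate of order 1/N is paid for with parameter values that grow linearly in N.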

Folding
For truncating a PSR, we have taken an interval and treated specially all shifts that didn't fall into that interval: We simply discarded them. We discarded shifts, not parameter values: With a truncation interval [−∆, ∆], for the derivative at a point x ∈ R, the parameter values at which the expectation-value function is queried to approximate the derivative at For folding a PSR, we can also take an interval, and treat specially either shifts (in shift folding) or parameter values (in parameter folding) that don't fall into that interval. Instead of simply discarding, though, we will replace the shift/parameter value by one that falls into the interval.
It will make sense, in this subsection, to switch our attention from F * to F ⊕ Ξ := K Ξ + F ↓ max Ξ , i.e., we always think of the expectation-value function f as being decomposed into f = f 1 + f 0 , as in Corollary 2.9. We start by clarifying the relationship between this paper's Nyquist shift rule (5) which has infinite support and is feasible for F ⊕ Ξ , and the known PSR which has finite support and is feasible [17] for K Ξ .
For every positive integer L, there exists a finite-support PSR feasible for K {−L,...,L} [17]: the expression in the 2nd line is the first derivative of the modified Dirichlet kernel at (j + 1 /2)/2L. The following remark makes this compatible with the notation in this paper, by dilation.

Remark 2.26 ([17]). Let K, ξ 1 be positive real numbers with the property that L := K/ξ 1 ∈ N, and define The following is a PSR feasible for K Ξ with support contained in 22 [−1/2ξ 1 , +1/2ξ 1 [ and with norm 23 2πK:

For a positive real number p and every x ∈ R we will use the following notations: (Recall the definition of x = y mod pZ, which is: y − x ∈ pZ.) With the notations of the previous remark, the mapping x → x % (1/ξ 1 ) folds its argument into a fundamental region of periodicity of the functions in K Ξ . For a fixed real number p > 0, by shift-folding a PSR with mod-p, we mean taking the image of the measure under the mapping ( □ % p) : R → [−p/2, +p/2[. We will give a general definition of shift folding in Def. 2.28 below; here we are only interested in taking the remainder mod-p. With the notations of Remark 2.26, by shift-folding with mod-p where p = 1/ξ 1 , we compile the measure of every real number onto the corresponding point in the fundamental region [−1/2ξ 1 , +1/2ξ 1 [ of periodicity of the function space K Ξ .

Now we are ready to establish the relationship between the infinite-support Nyquist shift rules (5) feasible for F ⊕ Ξ and known finite-support PSRs feasible for K Ξ : With the notation of Remark 2.26, if we take a Nyquist shift rule and shift-fold with mod-1/ξ 1 , then we obtain the known shift rules feasible [17] and optimal [18,Theorem 18]

The proof in §7.2.a is based on the partial fraction decomposition of 1/sin². This theorem is relevant not only to make the point that, just as the PSRs ψ from Remark 2.26 are the right 24 PSRs for K Ξ , our Nyquist shift rules (5) are, mathematically speaking, the right PSRs for F ⊕ Ξ . The theorem also motivates our general concept of folding, as defined in the next subsection, §2.8.b.
We have taken the approach here to present the special case where Ξ consists of equi-spaced frequencies with no gaps. It should be clear to the attentive reader that Theorem 2.27 applies more generally in the case of frequency sets Ξ that have a common divisor 25 . The frequency set Ξ having a common divisor is equivalent to the functions in K Ξ being periodic (the periods are the reciprocals of the common divisors). In [18] we prove that in the case of frequency sets with gaps, the finite-support shift rule ψ from Remark 2.26 is still feasible and optimal, but there is a wealth of optimal finite-support shift rules, notably some with smaller support.
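The relationship between the infinite-support rule and the finite-support rule can be explored numerically. The sketch below rests on assumptions that are not restated in this section: the Nyquist-type rule is taken in the form f ′(x) = Σ_m w_m f(x + s_m) with s_m = (m + 1/2)/(2K) and w_m = (2K/π)(−1)^m/(m + 1/2)²; the notation x % p is read as the representative of x mod pZ in [−p/2, +p/2[; and the closed-form weights are those obtained from the partial fraction expansion Σ_k 1/(a + k)² = π²/sin²(πa). The values K = 2, ξ 1 = 1/2 are hypothetical and chosen so that all offsets are exact binary fractions.

```python
import math


def nyquist_rule(K, N):
    """(offset, weight) pairs of the ASSUMED Nyquist-type rule, -N <= m < N."""
    rule = []
    for m in range(-N, N):
        a = m + 0.5
        sign = 1.0 if m % 2 == 0 else -1.0
        rule.append((a / (2.0 * K), sign * (2.0 * K / math.pi) / (a * a)))
    return rule


def fold(x, p):
    """Representative of x mod pZ in the half-open interval [-p/2, +p/2)."""
    return x - p * math.floor(x / p + 0.5)


def shift_fold(rule, p):
    """Image of the finite-support measure under x -> fold(x, p)."""
    folded = {}
    for s, w in rule:
        t = fold(s, p)
        folded[t] = folded.get(t, 0.0) + w
    return sorted(folded.items())


def closed_form_weight(j, K, L):
    """Weight at offset (j + 1/2)/(2K) after folding all of Z, via 1/sin^2."""
    sign = 1.0 if j % 2 == 0 else -1.0
    return sign * math.pi * K / (2.0 * L * L * math.sin(math.pi * (j + 0.5) / (2.0 * L)) ** 2)


if __name__ == "__main__":
    K, xi1 = 2.0, 0.5                 # hypothetical values with L = K / xi1 = 4
    L, p = round(K / xi1), 1.0 / xi1
    for t, w in shift_fold(nyquist_rule(K, 8000), p):
        j = round(2 * K * t - 0.5)
        print(f"offset {t:+.3f}  folded {w:+.6f}  closed form {closed_form_weight(j, K, L):+.6f}")
```

Because the functions in K Ξ are p-periodic, applying the folded rule to such a function gives the same value as applying the unfolded, truncated rule; only the queried parameter values change.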

2.8.b General definitions for shift and parameter folding
We treat folding abstractly. Here's the definition. The intuition behind a folding function is that in the decomposition f = f 1 + f 0 from Corollary 2.9, the folding function with p := 1/ξ 1 (notations from Remark 2.26) banks on the pperiodicity of the f 1 -term to recover the known shift rules [17] for f 1 . It ignores the f 0 -term, introducing an approximation error. The hope is that as f 0 decays towards ±∞, the approximation error introduced through the folding can be made small. That hope can be brought to fruition, as the next subsection shows.
The formal definitions are in the following Lemma 2.29: we refer to Item (a) as shift folding, and to Item (b) as parameter folding. Fig. 3 shows the estimator for parameter-folded PSRs that corresponds to the Single-Shot Estimator for unfolded ones, Fig. 2.

Lemma 2.29 (Fundamental folding-lemma). Let Ξ be a frequency set, and p > 0 a real number such that 1/p is a common divisor of Ξ. Let ϕ be a PSR and τ a p-folding.
If ϕ is feasible for K Ξ , then the following hold.
(a) The PSR τ (ϕ) is feasible for K Ξ , i.e., for all f ∈ K Ξ and all x ∈ R we have The proof of Lemma 2.29 is given in §7.2.b, as it illuminates the relationships between the technical aspects of Def. 2.28 and shift/parameter folding.

Algorithm: Simple Folding Estimator
Input : x ∈ R Plus: Access to shot oracle F .

2.8.c An example of parameter folding with quadratic decay in approximation error
There are many ways to choose the folding function. Here we present one of them, for which we can prove its effectiveness in parameter folding using our Fourier-decomposition toolkit. The definition of the folding function is inside the following lemma. (The proof is mere arithmetic and therefore omitted.) We use the %-notations defined in (20) above.

Lemma 2.30.
For positive real numbers p, c satisfying p|c (i.e., c/p ∈ N), define the following function: The function τ p,c is a p-folding function with image ]−c − p, +c + p[.
For the remainder of this section, and in §7.2.c (which contains the proofs), we use the following big-O notation: We write g(c) = O(h(c)) for functions g = g(c), h = h(c) ≥ 0, to indicate the existence of an absolute constant C such that |g(c)| ≤ C · h(c) for all allowed c; if the constant is not absolute but depends on, say, "K", we write O K ( ). We also use the corresponding big-Ω notation.
Corollary 2.9 motivates the conditions in the following proposition, which quantifies the approximation error, operator norm (i.e., cost), and maximum parameter values. Proposition 2.31. Let Ξ be a frequency set, p a positive real number such that 1 /p is a common divisor of Ξ, and c a positive real number with p|c; let τ p,c as in Lemma 2.30. The Nyquist shift rule ϕ K parameter-folded by τ p,c as in Lemma 2.29(b), has the following properties.
(b) The operator norm of the linear operator 26 To parse the expressions involving products of p or c with K (in denominators or under the logarithm), note that cK ≥ pK ≥ 1.
The second inequality follows from the fact that 1/p divides the elements of Ξ one of which is K, and K > 0 holds as Ξ is a frequency set (Def. 2.4).
To summarize, first note that the expression on the RHS in Item (a) in the case "x ∈ [−p, +p]" is O K,C (1/c²), i.e., we have quadratic decay of the approximation error in terms of the largest parameter value queried.
As for the value of p, in the situation of Def. 2.1, a greatest common divisor of the difference set of the spectrum of A would be expected to be known, and a smaller common divisor can be chosen to cover a larger interval in the order-1/c² case of Item (a). The number c is the main quantity to play with: In the interval [−p, +p], the approximation error goes down quadratically in c, while the maximum magnitude of a parameter value increases linearly, and the expected magnitude of a parameter value increases only logarithmically. Note that increasing c might allow decreasing C, as for each fixed f , C decays linearly as c → ∞.
The strict condition "x ∈ [−p, +p]" in Item (a) is merely to make the proof less onerous: The interested reader will, upon inspection of the proof in §7.2.c, realize that the quadratic decay of the approximation error holds if |x| ≤ p + (1 − Ω(1))c, and hence in particular for |x| = o(c). Generally speaking, it should be understood that the approximation error is small when |x| ≪ c + p and grows as |x| approaches c + p. A look at the proof reveals that a more fine-grained analysis would have in the denominator the term c + p − |x| instead of 2Kc, provided that |x| ≤ c + p − Ω K (1); this would lead to decays between 1/c and 1/c² in that region of x's.

Non-existence results
In this section, we briefly discuss a few results of non-existence of the optimist's feasible PSRs, e.g., those with compact support. As non-existence is so frustrating, we won't spend too much time on it; the proofs are mostly only sketched.
Before we start, two things must be emphasized. Firstly, the non-existence results are only for feasible (as opposed to approximate) PSRs: As we already know from the discussions about truncation in §2.7, for every ε > 0 there exists an ε-nearly feasible PSR with compact support (and decent norm/cost). Having said that, note that the size of the support in these examples grows beyond all bounds as ε tends towards 0; we will prove below that this is unavoidable.
Secondly, recalling the discussion in §2.3.a, the reader should be aware: While the results of this section pertain to feasibility of PSRs for F_{1/2}, with proofs making use of all frequencies ξ ∈ [−1/2, +1/2], by Prop. 2.10, non-existence of the described types of PSRs feasible for F ⊕ Ξ is implied for all frequency sets Ξ with max Ξ = 1/2. (And from there, of course, by dilation, for all frequency sets.)

We start with some observations involving slight modifications of our Nyquist shift rules.

2.9.a Other optimal PSRs
An inspection of the proof of Theorem 2.14 in §4.4 shows that, for all K > 0, the Nyquist shift rule ϕ K is the unique complex PSR that is feasible for F K and whose support is contained in the support of ϕ K . Indeed, the function u in (14) is the inverse periodic Fourier transform of the RHS of equation (13) from Lemma 2.12. (The Fourier Inversion Theorem applies as both u and û are absolutely integrable.) In particular, there is no other complex PSR that is optimal for F K , as, by Corollary 2.22, the support of such a PSR would have to be contained in the support of ϕ K .

2.9.b PSRs with compact support
By standard Paley-Wiener theory, a complex measure µ with compact support will have a Fourier transform that is a holomorphic function on a connected domain containing the real line. From standard complex variable function theory we know that two holomorphic functions on a connected domain must be equal if they coincide on a set that contains an accumulation point. This means that Lemma 2.12 restricts the choices for μ̂ to only a single one: so that µ is the tempered distribution "for every Schwartz function, take the derivative at 1/2". But this is a contradiction, as that particular tempered distribution is not a finite measure. 27 Hence, there is no compactly supported complex PSR feasible for F * .

2.9.c Exponential concentration
Now, using nothing but standard tools, we show that complex PSRs with exponential concentration cannot exist.
With a similar argument as in the case of compact support, we will prove the following in §8.1.

We note that for Ξ := [−1/2, +1/2], the space F_{1/2} contains the space G in Theorem 2.33, and hence no exponentially concentrated PSR is feasible for F_{1/2}.

2.9.d Exponentially concentrated nearly feasible PSRs
We can also give impossibility results for complex PSRs with non-zero approximation error.
For this context, the concept of Nearly Feasible from Def. 2.23 is too limiting. Instead we simply take convergence of the approximations of the derivative in the point 0 to the correct value. The proof is sketched in §8.2. Note that the boundedness condition, ∥ϕ j ∥ = O(1) (j → ∞), is needed; otherwise the symmetric difference quotient would be a counterexample.

27 The equality [φ → ∂ 1/2 φ] = ∫ □ dµ for a finite measure µ would imply, for all positive integers k, 2πk = ∫ sin(2πk(x − 1/2)) dµ(x), which would contradict the finiteness of the measure.

Numerical simulation
To make an attempt at understanding the practical utility of the Nyquist shift rule vs Banchi-Crooks's method, the author has designed a small Pluto 28 -notebook containing code in the mathematical computation programming language Julia 29 , which simulates the two methods numerically and allows us to produce colorful pictures.
The Approximate Stochastic Parameter Shift Rule (ASPSR) has been implemented as in [13], accommodating our idiosyncratic choice of ℏ = 1/2π. It is compared to a version of the truncated Nyquist shift rule (STNySR), tailored especially for the comparison to Banchi-Crooks: A positive real number T can be provided by the user which has the effect of limiting to the interval [−T, +T ] all parameter values in queries to the expectation-value function.
As the stochastic properties of the two estimators are identical, the numerical simulation ignores that aspect. It allows us to query
• In the case of the STNySR: Directly the expectation-value function as in (2.1) at a given parameter value x ∈ R;
• In the case of the ASPSR: For given s, x, ε, directly the expectation value

The truncation parameter, T , mentioned above is set to 1/8ε, to ensure that both methods use parameter values in the same interval. The Julia code produces random perturbed-parametric unitary instances as follows: A random matrix M with ±1 eigenvalues; a random positive-semidefinite trace-1 matrix ϱ; a random matrix A with ±1-eigenvalues; a random standard Gaussian Hermitian matrix B.
The Pluto notebook makes it easy to run numerical simulations and to compare the absolute and relative errors of ASPSR and STNySR graphically in plots. While in the following we discuss some typical features and noteworthy behavior based on plots created with the code, the reader is invited to play with the notebook 10 and judge for her or himself.

(2) STNySR is better than ASPSR when there is a sufficient gap between the parameter value x and the cut-off ±1/8ε.
As A has ±1 eigenvalues we have K = 2, and hence the query points of the Nyquist shift rules are 1/4 apart. Sub-figure (c) of Fig. 4 shows break points in the green STNySR-line, which are caused by changes in the set of shifts that are queried. Fig. 5 shows the differences of relative errors between the two methods. Positive values mean STNySR is better. It can be seen (positive green 10th-percentile points) that in about 90% of the random instances, STNySR was better than the ASPSR, at least where the point x at which the derivative is requested is sufficiently far away from the cut-off 1/8ε = 12.5.
As the query points of the Nyquist shift rules are 1/4 apart, if x > 12.25, no query point of STNySR is to the right of x -leading to a noise-only "approximation" of the derivative.
It can be seen (positive blue median points) that for x ≤ 12, i.e., when there are at least 3 query points to the right of x, STNySR gives at least as good an approximation of the derivative as ASPSR, in at least half of the cases. It should be noted that the mean (not plotted) lies above the median, indicating that the advantage is, on average, substantial.
For a considerable region of the parameter, STNySR is roughly at least as good as ASPSR in 99% of the instances (magenta 1st percentile data points hugging zero).

28 plutojl.org
29 julialang.org

Appendix D has plots of the results of more numerical simulations. The reader should understand that the presented numerical results are preliminary, and that more refined and extensive numerical simulations and statistical analysis are necessary for a comprehensive comparison of the methods. In particular, which (proper or not) shift rules are preferable in which parameter regions when the parameter values are constrained to an interval might be a topic of future research.

Fourier analysis of perturbed-parametric unitaries
In this section, we will deal with the Fourier-analytic properties of the functions in Def. 2.1: Perturbed-parametric unitary functions and expectation-value functions. We start by reviewing the (standard) notation we use.

Notations
Section 3 is somewhat demanding in terms of notations, as we have to deal with operator-valued tempered distributions.

3.1.a Tempered distributions
We denote the space of (complex-valued) Schwartz functions on R by S, and the tempered distributions on R by S ′ . We denote the duality between Schwartz functions (on the left) and tempered distributions (on the right) by "⟨ : ⟩"; e.g., for a Borel measure µ of at most polynomial growth we have ⟨φ : µ⟩ := ∫ φ(x) dµ(x), and for a measurable function f of at most polynomial growth we have ⟨φ : f ⟩ := ∫ φ(x) f (x) dx.

As we are using angle-brackets "⟨ : ⟩" for tempered distributions, we revert to parentheses "( | )" for the Hilbert-space inner product; it is linear in the right argument, anti-linear in the left argument.

3.1.b Operator-valued functions
The space of linear operators on a finite-dimensional Hilbert space H is L(H).
In all of Section 3, we denote by ∥□∥ the operator / spectral norm (of operators on a Hilbert space or of square matrices, resp.). If T is a set and F is an operator-valued function defined on T , we let ∥F∥_{∞,T} := sup_{x∈T} ∥F (x)∥; for T = R we omit the subscript T on the norm.
For the concepts of linear decay and square integrability of operator-valued functions we use ∥□∥ (although they are norm independent in finite dimension). Integration of operator-valued functions against complex-valued Schwartz functions and the Fourier transform are defined as usual: element-wise. This means, for example, the Fourier transform F̂ of an operator-valued function F is defined through the condition: For all vectors ϕ, ψ in the underlying Hilbert space,

We will also need a small addition to the "K Ξ " notation: Denoting the underlying Hilbert space by H, for an arbitrary finite non-empty set Λ ⊂ R, the complex vector space of bounded functions with values in L(H) whose Fourier spectrum is contained in Λ is denoted by

We emphasize that Λ is not required to be symmetric.

The Fourier-decomposition theorems
Now we are ready to formulate the Fourier-decomposition theorem.

Proof of Prop. 2.7. We simply expand the definition of the perturbed-parametric expectation-value function in
We set f 1 := tr(M Ũ(□) ϱ Ũ(□)†). This is a bounded function with Fourier spectrum contained in Diff Spec A (e.g., [16,17], or see the proof of Lemma 3.5 below). By standard estimates, we find that each of the remaining terms has linear decay. As sums of functions of linear decay have linear decay, this completes the proof of the proposition.

Proof of Fourier-decomposition for unitaries
Notational conventions for §3.3. * . This section is somewhat hard on the alphabet. For that reason, we adopt the following notation convention: We denote matrices by typewriter-font letters A, B, C, . . . . Moreover, for operators denoted by A, B, U, . . . , when an orthonormal basis has been fixed, we denote the matrix corresponding to the operator by the typewriter version of the letter, e.g., A ↔ A, B ↔ B, . . . . We denote the identity matrix by I.
The proof of Prop. 3.1 is by degenerate (i.e., eigenvalues are not necessarily simple) eigenvalue perturbation theory, as everybody has learned it in their quantum mechanics class. To enable a mathematically rigorous proof, Appendix B reviews the mathematical foundation, Rellich's theorem, based on which it also summarizes textbook perturbation theory in matrix notation (Corollary B.2), for convenient reference.

3.3.a Proof of Prop. 3.1 (a),(b)
This subsection holds the proof of Fourier-decomposition, Prop. 3.1. We make some definitions and we will work with them throughout the subsection.
We use the notations of Def. 2.1, and denote by d the dimension of the underlying Hilbert space. Define U (x) := exp(2πi(xA + B)) for x ∈ R.
We use the following matrix big-O notation: By O ≥R (1/x²) we refer to an unnamed square-matrix-valued function x → M(x) defined on R \ [−R, +R], with the property that (Recall that ∥□∥ is the spectral norm, but the property is independent of the norm, as we are in finite dimension.)

If A, B commute, then the basis can be chosen to consist of eigenvectors of B, so that U = Ũ.
Proof. The remark about the case when A, B commute is trivial.
In the general case, we apply Corollary B.2 from Appendix B, and use the notation therein.
We set R := max(1, 2/r) and calculate, for |x| > R, From here, we treat the terms separately. We find: For the middle factor in ( * ), we find Items (e) and (c) of Corollary B.2 imply that all entries of all Λ [ * ] are real numbers, and hence, for the spectral norm of the matrix exponentials, we find exp(2πiΛ [2] /x) − I = 2πiΛ [2] where in the first line we just use the Taylor remainder bound of the exponential function, whereas in the second line we use the bound on E Λ mentioned above. Now, continuing ( * ), we calculate Collecting terms, we conclude This completes the proof of Lemma 3.2, noting that O ≥1 arises from O ≥R by the continuity of the three summands on R \ {0}.

30 The factor 1/2 ensures that the power series are bounded as |z| → r/2.
We can now derive Prop. 3.1 directly from Lemma 3.2.

3.3.b Proof of Prop. 3.1(c)
We have to show that

3.4.a Operator-valued tempered distributions
We set out to prove Prop. 3.4. The annoyingly technical proof is based on the Lie Product Formula e^{2πi(xA+B)} = lim_{n→∞} (e^{2πixA/n} e^{2πiB/n})^n, for all x ∈ R; we emphasize that the limit is taken pointwise. We will prove convergence of the Fourier transforms of the finite products to the Fourier transform of the RHS. The technical nuisance is that the Fourier transforms of the finite products are linear combinations of Dirac measures, while the infinite product has a continuous, square-integrable component. (We know that from the Fourier-decomposition theorem, Prop. 3.1.) Unfortunately, that means that the convergence can only happen in the sense of tempered distributions. Moreover, as we reason about the unitary-valued function (i.e., not the expectation-value function), we need operator-valued tempered distributions. The author wishes to emphasize that they don't add difficulty, just abstraction.
Operator-valued tempered distributions are, basically, matrices with entries in S ′ ; but while speaking of matrices requires fixing an orthonormal basis of the Hilbert space, we give the equivalent coordinate-free definition: An operator-valued tempered distribution τ is a sesqui-linear 31 mapping H × H → S ′ : (ϕ, ψ) → τ ϕ,ψ .
With this machinery, we can now get to work.
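The pointwise convergence in the Lie Product Formula is easy to observe numerically in the smallest non-trivial case. The following sketch uses a hypothetical single-qubit instance, A = Z and B = b·X with Pauli matrices Z, X (a choice made only for this illustration), for which exp(2πi(zZ + xX)) has a closed form because (zZ + xX)² = (z² + x²)·I. The helper names are made up.

```python
import math


def matmul(P, Q):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_zx(z, x):
    """exp(2*pi*i*(z*Z + x*X)) in closed form: cos(2 pi r) I + i sin(2 pi r) H / r."""
    r = math.hypot(z, x)
    if r == 0.0:
        return [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    c, s = math.cos(2 * math.pi * r), math.sin(2 * math.pi * r)
    return [[c + 1j * s * z / r, 1j * s * x / r],
            [1j * s * x / r, c - 1j * s * z / r]]


def lie_product(x, b, n):
    """U_n(x) = (exp(2 pi i x Z / n) * exp(2 pi i b X / n))^n."""
    step = matmul(exp_zx(x / n, 0.0), exp_zx(0.0, b / n))
    result = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for _ in range(n):
        result = matmul(result, step)
    return result


def product_error(x, b, n):
    """Largest entry-wise deviation of U_n(x) from U(x) = exp(2 pi i (x Z + b X))."""
    U, Un = exp_zx(x, b), lie_product(x, b, n)
    return max(abs(U[i][j] - Un[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, product_error(0.3, 0.2, n))
```

For fixed x the deviation shrinks roughly like 1/n. Each finite product U_n is a trigonometric polynomial in x, so its Fourier transform is a finite combination of Dirac measures, whereas the limit U is not of that form; this is the reason why the convergence of the Fourier transforms has to be understood in the sense of tempered distributions.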

3.4.b Fourier spectra of perturbed-parametric unitary functions
The Fourier spectrum of the expectation-value function (9) relates to that of the unitary function (8) as in the following lemma. We sketch the proof for the sake of completeness.

Sketch of proof.
• Item (a) follows from the fact that tr F ( □ ) is a sum of matrix elements, each with Fourier spectrum contained in SuppF .
• For Item (b), we consider the matrix elements: For ϕ, ψ ∈ H we have, for all x, and the statement follows from standard Fourier analysis: The equation f * =f * (− □ ) holds also for the Fourier transform of tempered distributions f ; hence ⟨φ : f * ⟩ = 0 for all Schwartz functions φ with Supp φ ⊆ R \ (− Suppf ).
• For Item (c), consider the function of n variables we are interested in the Fourier spectrum of the operator-valued function x → Y (x, . . . , x). The matrix elements of Y are linear homogeneous polynomials of degree n in the matrix elements of the F j , j = 1, . . . , n, i.e., they are of the form g 1 · · · · · g n , where Supp g j ⊆ Supp F j , j = 1, . . . , n, and all these supports are compact.
The central mini-result of §3.4 is the following fact. This concludes the proof.

3.4.c Proof of Prop. 3.4
We pull the following lemma out for easier readability. x → e 2πi(xA+B) , and let U n , n ∈ N, as in Lemma 3.5. By the Lie Product Formula, for every fixed x ∈ R, the sequence of operators (U n (x)) n∈N converges to U (x), i.e., the sequence of operator-valued functions (U n ) n converges pointwise to U .
Claim. The sequence converges to U in the tempered-distribution topology. This is a known fact, modulo technicalities arising from the functions being operator-valued; the standard arguments are below, for the sake of completeness.
From the claim, as the Fourier transform is continuous, we find that Û

Proof of the claim. Let φ : R → C be a Schwartz function. We have to show that the sequence of operators (⟨φ : U n ⟩) n converges to ⟨φ : U ⟩. It is sufficient (finite-dimensionality of H) to show convergence for every matrix element, i.e., for every ϕ, ψ ∈ H, we have to show (ϕ | ⟨φ : U n ⟩ψ) similarly for U in place of U n . As we have |(ϕ | U n (x)ψ) φ(x)| ≤ ∥ϕ∥∥ψ∥ · |φ(x)| for all x, and |φ| is integrable, the condition of the dominated convergence theorem is satisfied, and we have (ϕ | ⟨φ : U n ⟩ψ)

The derivative at 1 /2
We continue using the tempered-distributions notations defined in §3.1.
In this section, we discuss the Fourier-analytic characterization of measures computing the derivative at 1/2 of functions in F * and of functions in F ↓ * ; cf. (12). We derive consequences for the space of feasible shift rules, and, last but not least, we'll prove that the Nyquist shift rules are feasible.
We start by discussing the potential consequences of the Fourier-Decomposition theorem in the form of Corollary 2.9, i.e., the "F * vs. F ⊕ * ".

Impact of the Fourier-decomposition theorem
In the next subsection §4.2, we will prove the characterization, in terms of $\hat\mu$, of the finite complex measures µ which satisfy (12): integrating against µ gives the derivative at 1/2 for all functions in $F_{1/2}$ (Lemma 2.12). The current section proves that restricting to the smaller space F ⊕ * of Corollary 2.9 does not yield any additional feasible PSRs, i.e., we prove Prop. 2.10.
Fix a frequency set Ξ ⊂ [− 1 /2, + 1 /2] with max Ξ = 1 /2. As both taking the derivative at 1 /2 and integration against µ are linear, a characterization of derivative-computing measures for F ⊕ Ξ (cf. (12)) must consist of the following two parts: The first part takes care of f 1 in the notation of the Fourier-decomposition theorem, Prop. 2.7. It corresponds to condition (13) in Lemma 2.12, but it is now a condition on only a finite number of frequencies, exactly like for the unperturbed theory in Eqn. (8) of [18] (only that we have chosen the anchor point 1 /2 here, instead of 0 in [18]).
As for part (2.), the hope would be that, even though in the functions in F ↓ 1 /2 all Fourier frequencies in [− 1 /2, + 1 /2] are allowed to occur, the linear decay condition would result in a larger set of feasible PSRs.
Unfortunately, that does not seem to be the case: Not even restricting the space in part (2.) to only translated sinc-functions increases the space of feasible PSRs, as the following lemma shows.
Clearly, (12) implies (weak-12); Lemma 4.1 together with Lemma 2.12 show that the two conditions are in fact equivalent.
That's it, now we verify ( * ) by direct calculation. For every x 0 ∈ R, we have (□ is inverse Fourier transform): This completes the proof of Lemma 4.1.
We can now finish off the proposition in §2.3.a.

Fourier-analytic characterization of derivative-computing measures, case F ⊕ *
In §2.4.a, we have proven one direction of Lemma 2.12. Here, we prove the other direction in the special case described in Corollary 2.9: We replace, in condition (12), the qualification "g ∈ $F_{1/2}$" by "g ∈ F ⊕ Ξ", for arbitrary frequency set Ξ ⊂ [−1/2, +1/2]. As a matter of fact, we will take a condition that is both more general and allows for a lazier proof; here is the statement (cf. Lemma 2.12):
Proposition 4.2. Let µ be a complex measure on R. If (13) holds, then $g'(1/2) = \int g \, d\mu$ holds for all functions $g \in F_{1/2}$ which can be decomposed as $g = g_0 + g_1$ where
• $g_0$ is smooth and square integrable; and
• $g_1$ has finite Fourier spectrum, i.e., there exists a finite Ξ ⊆ [−1/2, +1/2] with Supp $\hat g_1$ ⊆ Ξ.
Proof. Let µ, g, $g_0$, $g_1$, Ξ be as described. There exist complex numbers $b_\xi$, ξ ∈ Ξ, such that
$g = g_0 + \sum_{\xi \in \Xi} b_\xi\, e^{2\pi i \xi \square}$. ( * )
Note that $\hat g_0$ is the usual $L^2$-Fourier transform of $g_0$; in particular, it is an $L^2$-function (not an evil tempered distribution).
By linearity, it suffices to prove the derivative-computing measure equation for each term of ( * ) separately:
For the $L^2$-part: $g_0'(1/2) = \int g_0 \, d\mu$. (A)
For each ξ ∈ Ξ: $2\pi i \xi\, e^{i\pi\xi} = \int e^{2\pi i \xi x} \, d\mu(x)$. (B)
The equations (B) correspond to the condition (13), evaluated at −ξ, so there is nothing to be done.
As noted in §2.4.a, the proof of the general case of Lemma 2.12, i.e., taking derivatives of all F1 /2 -functions, is in Appendix C.

Proof of the Space-of-Feasible-PSRs proposition
We now prove Prop. 2.13: The set of all signed measures µ satisfying (13) is either empty or an infinite-dimensional (real) affine space. If it is empty, there is nothing to prove. Otherwise let µ be such a measure.
For the ν a we take the finite signed measures cos(2πa □ ) sinc 2 (density on Lebesgue measure). We prove the conditions in the following two lemmas, but first we need a remark.
The statement about the Fourier transform of cos(2πa □ ) sinc 2 follows from the Fourier Inversion Theorem (e.g., [24, Theorem 4.11]). Before we prove the lemma, we finish the proof of Prop. 2.13. Let ν a := cos(2π(3a where λ is the Lebesgue measure. By Lemma 4.4, we have and these sets are disjoint from [− 1 /2, + 1 /2]. The proof of Prop. 2.13 is completed by proving Lemma 4.5: the linear independence of the ν a , a ∈ N.
Proof of Lemma 4.5. The functions $\hat\nu_a$, a ∈ N, are linearly independent as their supports are disjoint; see (28). As the inverse Fourier transform is injective and Fourier Inversion holds, the measures $\nu_a$, a ∈ N, are linearly independent, too.
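The support statement for the Fourier transform of cos(2πa □) sinc² can be checked numerically. The sketch below rests on two assumptions that are not taken from the text: sinc is the normalized sinc, sin(πx)/(πx), and the Fourier transform is $\hat f(\xi) = \int f(x) e^{-2\pi i \xi x}\,dx$. Under these assumptions the transform is the sum of two triangles, $\tfrac12[\mathrm{tri}(\xi - a) + \mathrm{tri}(\xi + a)]$ with $\mathrm{tri}(t) = \max(0, 1 - |t|)$, so it vanishes on [−1/2, +1/2] as soon as a ≥ 3/2. The modulation frequency a = 3 is an arbitrary illustrative value.

```python
import numpy as np

def fourier_transform(f, xi, half_width=200.0, dx=0.005):
    """Riemann-sum approximation of int f(x) exp(-2*pi*i*xi*x) dx on [-half_width, half_width]."""
    x = np.arange(-half_width, half_width + dx, dx)
    return np.sum(f(x) * np.exp(-2j * np.pi * xi * x)) * dx

def density(a):
    """Density cos(2*pi*a*x) * sinc(x)^2, with np.sinc the normalized sinc (assumption)."""
    return lambda x: np.cos(2 * np.pi * a * x) * np.sinc(x) ** 2

def tri(t):
    """Triangle function supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))

def expected(a, xi):
    """Closed form under the stated assumptions: two triangles centred at +a and -a."""
    return 0.5 * (tri(xi - a) + tri(xi + a))

if __name__ == "__main__":
    a = 3.0  # illustrative value
    for xi in (0.0, 0.5, 2.5, 3.0, 3.5, 4.5):
        print(xi, fourier_transform(density(a), xi).real, expected(a, xi))
```

The truncation of the integral to a finite window introduces an error of order 1/half_width, which is why the comparison is only approximate.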

Proof of feasibility of the Nyquist shift rule
In this section, we prove Theorem 2.14: The measure µ defined in (14) satisfies the condition (12) of Lemma 2.12. The function u : Z → R in (14) plays a special role. In the next subsection, §4.4.a, we will prove the following.
Here, we are using the Fourier transform on Z: for every absolutely summable sequence v, the result is a 1-periodic continuous function. Completing the proof of the theorem is now a piece of cake.

Lemma 4.7. For all ξ ∈ R we have
Proof. The functional equation (31b) of the dilogarithm gives the equation for all ξ ∉ Z. We verify it by hand for ξ ∈ Z. In that case, on the LHS we have $2 \cdot \mathrm{Li}_2(1) = \pi^2/3$ by (31a) and the Basel Problem (or §I.1 in [25]).
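The integer case can be cross-checked against a classical dilogarithm identity. The sketch below assumes the identity $\mathrm{Li}_2(e^{2\pi i\xi}) + \mathrm{Li}_2(e^{-2\pi i\xi}) = 2\pi^2 B_2(\xi - \lfloor\xi\rfloor)$ with the Bernoulli polynomial $B_2(t) = t^2 - t + 1/6$; this is a standard Fourier-series fact, but it is only our guess at the kind of equation that (31b) provides and need not be its exact form. At ξ ∈ Z the right-hand side is $\pi^2/3$, in agreement with the value $2 \cdot \mathrm{Li}_2(1)$ computed in the proof.

```python
import math

def li2_pair(xi, terms=200_000):
    """Partial sum of Li2(e^{2 pi i xi}) + Li2(e^{-2 pi i xi}) = 2 * sum_n cos(2 pi n xi) / n^2."""
    return 2.0 * sum(math.cos(2 * math.pi * n * xi) / n ** 2 for n in range(1, terms + 1))

def closed_form(xi):
    """ASSUMED closed form: 2 * pi^2 * B_2(frac(xi)), with B_2(t) = t^2 - t + 1/6."""
    t = xi - math.floor(xi)
    return 2 * math.pi ** 2 * (t * t - t + 1.0 / 6.0)

if __name__ == "__main__":
    for xi in (0.0, 0.25, 0.5, 1.0, 1.7):
        print(xi, li2_pair(xi), closed_form(xi))
```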
We define for all ξ ∈ R, (The equation for g 0 follows from the one for g by applying it for 2ξ.) Proof. This is a direct calculation: where in the last step we have used the equation for g 1 above.
We can now complete the proof of the proposition.

Proof of Item (b):
For f ∈ F K2 and x ∈ R, noting that f (K 1 □ /K 2 ) ∈ F K1 , we calculate, using in ( * ) that ϕ is feasible for F K1 : which proves Item (b).
As a direct corollary of the lemma we obtain the isomorphisms required in Prop. 2.11, noting that: (1) taking images of measures is a C-linear operation; (2) reflection and dilation map signed measures to signed measures; (3) reflection is self-inverse, and the inverse of dilation by α is dilation by 1/α. This completes the proof of Prop. 2.11.

Derivation of the Nyquist shift rules
Proof of Corollary 2.15. Applying Theorem 2.14, Lemma 2.12, and Lemma 5.1(a) to the µ defined in (14), we find that the measure ϕ := τ (µ), for τ : x → 1 /2 − x the reflection on 1 /2, is a PSR that is feasible for F1 /2 . We calculate, by linearity and norm-continuity of ν → τ (ν) and Eqn. (7), where in the equation marked with the asterisk we perform the change of variables a := 1 /2 − n. This proves the corollary.
Proof of Corollary 2.16. From the previous corollary, we know that $\phi_{1/2}$ is a PSR that is feasible for $F_{1/2}$. Fix K > 0. Applying Lemma 5.1(b), we find that the measure $2K \cdot (\square/2K)(\phi_{1/2})$ is a PSR feasible for $F_K$. Again invoking linearity, continuity, and Eqn. (7), we calculate and the proof of the corollary is complete.
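To illustrate what a shift rule of this kind computes, the following sketch evaluates a truncated version of the classical band-limited derivative formula $f'(x) = \frac{2K}{\pi} \sum_{a \in 1/2 + \mathbb{Z}} \frac{(-1)^{a - 1/2}}{a^2}\, f\bigl(x + \frac{a}{2K}\bigr)$, valid for bounded functions with Fourier spectrum in [−K, +K]. We assume that the Nyquist shift rule $\phi_K$ has this form up to the sign and reflection conventions of Eqns. (7) and (14), which are not reproduced here; the test function and the truncation level are hypothetical choices.

```python
import math

def nyquist_derivative(f, x, K, n_terms=20_000):
    """Truncated shift-rule estimate of f'(x) for f with frequencies in [-K, K].

    ASSUMED form: weights (2K/pi) * (-1)^k / a^2 at shifts a/(2K), a = k + 1/2.
    """
    total = 0.0
    for k in range(-n_terms, n_terms):
        a = k + 0.5
        sign = 1.0 if k % 2 == 0 else -1.0
        total += sign / (a * a) * f(x + a / (2 * K))
    return 2 * K / math.pi * total

# Hypothetical test function with frequencies 0.7 and 0.2, i.e., in F_K for K = 1.
def f(x):
    return math.sin(2 * math.pi * 0.7 * x) + 0.5 * math.cos(2 * math.pi * 0.2 * x + 0.3)

def f_prime(x):
    return (2 * math.pi * 0.7 * math.cos(2 * math.pi * 0.7 * x)
            - 0.5 * 2 * math.pi * 0.2 * math.sin(2 * math.pi * 0.2 * x + 0.3))

if __name__ == "__main__":
    x0 = 0.4
    print(nyquist_derivative(f, x0, K=1.0), f_prime(x0))
```

Only function values at shifted parameters enter the estimate; the function itself is never modified, which is the defining feature of a "proper" shift rule. The truncation error decays like 1/n_terms because the weights decay like $1/a^2$.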

Optimality
This section has parallels with [18].

The Weak Duality Theorem
Compared to [18,Prop. 6], the version of the Weak Duality Theorem in the current paper has the additional statement about the inequality being an equation.
[as ∥f ∥ ∞ ≤ 1] In the second equation, we have used the condition that The inequality −f ′ (0) ≤ ∥ϕ∥, which we have just proven, is satisfied with equality if and only if the inequality ( * ) holds with equality.

Proof of optimality of the Nyquist shift rules
Here we give the proof of Theorem 2.21.
Proof of Theorem 2.21. In view of the Weak Duality Theorem, Prop. 2.20, we need to present a function Clearly, the function f ⋆ : x → − sin(2πKx) is in F K , and we have −∂f ⋆ (0) = 2πK. To compute ∥ϕ K ∥, first recall the well-known fact (Lemma A.1 in Appendix A.1) that the sum of all reciprocals of the squares of all positive odd integers is π 2 /8.
Regarding ϕ K , we have This establishes the second part of ( * ), and completes the proof of the theorem.
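The two quantities compared in the proof can be evaluated numerically. The sketch below assumes that $\phi_K$ puts weights of absolute value $\frac{2K}{\pi a^2}$ on the shifts $a/(2K)$, $a \in 1/2 + \mathbb{Z}$; this form is our assumption (it is consistent with the value $\|\phi_K\| = 2\pi K$ and with the sum of odd reciprocal squares being $\pi^2/8$, but the displayed formula is not reproduced here). The dual value $-\partial f^\star(0)$ is estimated by a central difference.

```python
import math

def nyquist_total_variation(K, n_terms=100_000):
    """Truncated total variation, ASSUMING weights (2K/pi)/a^2 for a = k + 1/2, k in Z."""
    return 2 * K / math.pi * sum(1.0 / (k + 0.5) ** 2 for k in range(-n_terms, n_terms))

def dual_value(K, h=1e-6):
    """Central-difference estimate of -f*'(0) for f*(x) = -sin(2*pi*K*x)."""
    f_star = lambda x: -math.sin(2 * math.pi * K * x)
    return -(f_star(h) - f_star(-h)) / (2 * h)

if __name__ == "__main__":
    for K in (0.5, 1.0, 3.0):
        print(K, nyquist_total_variation(K), dual_value(K), 2 * math.pi * K)
```

Both values agree with $2\pi K$ up to truncation and discretization error, which is the equality that the Weak Duality Theorem turns into a certificate of optimality.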

The support of optimal proper shift rules
We now sketch the proof of Corollary 2.22. Let K > 0 and assume ϕ is a PSR feasible for F K . Moreover, let f ⋆ := − sin(2πK □ ) be the dual optimal solution established in the proof of Theorem 2.21 above.
As we already know from the proof of Theorem 2.21 that −∂f ⋆ (0) is equal to the norm of the optimal shift rule, by the Weak Duality Theorem applied to ϕ, f ⋆ , equality in the inequality (16) there is sufficient and necessary for ϕ to be optimal.
Proof. The difference between the two measures is Taking the norm, we obtain 4K π ·
Adding the three terms we find that the RHS in ( * * ) is where we have used the notation Θ(·) = O(·) ∩ Ω(·). We split the value into three summands by distinguishing the non-overlapping cases and upper-bound each separately. For the case x − s ≤ −c − p we upper-bound as follows (recall from Theorem 2.21 that ∥ϕ∥ = 2πK): The case c + p ≤ x − s is symmetrical and gives the same bound. For the case −c − p < x − s < +c + p we upper-bound as follows: |s| d|ϕ|(s).
The first integral we upper-bound by |x| · d|ϕ|, so the first summand is upper bounded by |x|.
This concludes the proof of Prop. 2.31.
8 Sketches of proofs of non-existence results
8.1 Impossibility of exponential concentration

8.1.a Background on exponential concentration
We review some technicalities about exponential concentration. The proof of the following lemma is purely technical and given in Appendix A.3 for the sake of completeness. Hence, the only challenging part of the proof is that we do not assume that Ξ contains an accumulation point, i.e., an element ξ ∈ Ξ that is in the closure of Ξ \ {ξ}.
But along with Ξ, its closure $\overline{\Xi}$ also satisfies the condition of the theorem. This can be seen in two ways.
(1) By first extending the proof of Lemma 2.12 to the case where the condition "ξ ∈ [−1/2, +1/2]" of equation (13) is replaced with "ξ ∈ Ξ", and then noting that both sides of equation (13) are continuous functions of ξ, so the equality extends to limit points. (2) Directly from (12), with firstly g := cos(2πξ □), ξ ∈ Ξ, and secondly g := sin(2πξ □), ξ ∈ Ξ, by again noting that in all cases both sides of the equation are continuous functions of ξ.

Shift rules with approximation error
Sketch of proof of Corollary 2.34. For a proof by contradiction, assume that C, r, ϕ * exist. We invoke Alaoglu's theorem: "The dual unit ball is weak-* compact." The space of complex measures is the dual of C 0 , the space of (complex-valued) continuous functions vanishing at ±∞, and the total-variation norm is the operator norm on the dual (e.g., [28, Theorem 6.19]).
As {ϕ j | j ∈ N} is norm bounded, by passing to a sub-sequence of (ϕ j ) j , there exists a complex measure ϕ such that $\phi_j \to \phi$ in the weak-* topology. The proof of the corollary is completed by showing that (a) ϕ is exponentially concentrated, and (b) ϕ is feasible for G. The combination of (a) and (b) contradicts Theorem 2.33.
The arguments for (a,b) are entertaining exercises in calculus and measure theory, and we leave them to the bored reader.

Discussion and outlook
We have given a "proper" shift rule for analytic derivatives of "perturbed-parametric" quantum evolutions. The support of the measure is non-compact, and the tail, i.e., the fraction of the total variation that falls outside a given interval [−T, +T ], T → ∞, decays only linearly, as in a Cauchy-type probability distribution.
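The linear tail decay can be made concrete with a short computation. As an assumption (the displayed formula of the shift rule is not reproduced in this part of the text), we take the total-variation measure of $\phi_K$ to have atoms of mass $\frac{2K}{\pi a^2}$ at the shifts $a/(2K)$, $a \in 1/2 + \mathbb{Z}$, so that the total mass is $2\pi K$. Under this assumption the fraction of mass outside [−T, +T] behaves like $1/(\pi^2 K T)$.

```python
import math

def tail_fraction(K, T):
    """Fraction of the total variation outside [-T, T], under the ASSUMED weights.

    Uses the exact total sum_{a in 1/2+Z} 1/a^2 = pi^2, so only a finite sum is needed.
    """
    k_max = math.floor(2 * K * T - 0.5)  # largest integer k with k + 1/2 <= 2*K*T
    inside = sum(1.0 / (k + 0.5) ** 2 for k in range(-k_max - 1, k_max + 1))
    return 1.0 - inside / math.pi ** 2

if __name__ == "__main__":
    K = 1.0
    for T in (10.0, 100.0, 1000.0):
        frac = tail_fraction(K, T)
        print(T, frac, T * frac, 1.0 / (math.pi ** 2 * K))
```

The product of T and the tail fraction stabilizes near $1/(\pi^2 K)$, i.e., the tail is of order 1/T and not lighter.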
The question arises whether proper shift rules for analytic derivatives of perturbed-parametric quantum evolutions exist that have lighter tails. Given that we have also shown that the space of proper shift rules that give analytic derivatives has infinite dimension, at first sight, that doesn't appear impossible. However, we have also shown that proper shift rules (for perturbed-parametric quantum evolutions) with exponentially decaying tails do not exist. Indeed, the negative result holds for norm-bounded (i.e., worst-case variance bounded) families of proper shift rules that approximate the derivative with an approximation error that tends to 0.
On the positive side, the best result we have is an O(1/T²) approximation error, for parameter values that are restricted to [−T, +T ] (this, by nature, cannot be a "proper" shift rule).
In future research, it might be interesting to either expand the non-existence results, or to give shift rules with faster decaying tails.

APPENDIX
A Miscellaneous math
A.1 Sums of reciprocals of squares of odd integers
The following is a direct consequence of the Basel Problem, $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$.
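Proof. In the following equations, the first and last equations are the Basel Problem:
$\pi^2/6 = \sum_{n=1}^{\infty} \frac{1}{n^2} = \sum_{n \text{ odd}} \frac{1}{n^2} + \sum_{n \text{ even}} \frac{1}{n^2} = \sum_{n \text{ odd}} \frac{1}{n^2} + \frac{1}{4} \sum_{n=1}^{\infty} \frac{1}{n^2} = \sum_{n \text{ odd}} \frac{1}{n^2} + \frac{1}{4} \cdot \pi^2/6.$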
Collecting the non-sum terms on the LHS gives the statement of the lemma.
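The bookkeeping in this proof is easy to confirm numerically. The sketch below compares partial sums over all positive integers and over the odd positive integers with $\pi^2/6$ and $\pi^2/8 = \tfrac34 \cdot \pi^2/6$, respectively; the number of terms is an arbitrary choice.

```python
import math

def sum_reciprocal_squares(terms):
    """Partial sum of 1/n^2 over n = 1, ..., terms."""
    return sum(1.0 / n ** 2 for n in range(1, terms + 1))

def sum_odd_reciprocal_squares(terms):
    """Partial sum of 1/n^2 over the first `terms` odd positive integers."""
    return sum(1.0 / (2 * k + 1) ** 2 for k in range(terms))

if __name__ == "__main__":
    n_terms = 200_000
    print(sum_reciprocal_squares(n_terms), math.pi ** 2 / 6)
    print(sum_odd_reciprocal_squares(n_terms), math.pi ** 2 / 8)
```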
We use it to prove the following fact. If lim |x|→∞ f (x) = 0, then f = 0.
Proof. The lemma is trivial if the frequencies ξ 1 , . . . ξ d have a common divisor, as then the function f is periodic.
In the general case, let f be as described, and set where ∥·∥ stands for any norm. Assume f ≠ 0; this implies c > 0 (as f can be extended to a holomorphic function on the complex plane). We prove that for every T > 0 we have $\|f\|_{\infty,[T,\infty[} \ge c/2$. This implies that $\lim_{|x|\to\infty} f(x) = 0$ does not hold.
Proof. The first statement is a direct consequence of Lemma A.3; the second statement follows by grouping terms.

A.2.a Bounded functions with finite Fourier spectrum
We have used the following well-known fact all the time.

Proposition A.5. Let f be a measurable (complex-valued) function. If f is bounded and has finite Fourier spectrum, then f is a (finite) linear combination of Fourier characters $e^{2\pi i \xi \square}$, ξ ∈ R.
The tempered-distribution fact behind it is the following: If a tempered distribution τ has finite support Ξ ⊂ R, then τ is a linear combination of derivatives $\partial^{\alpha_\xi} \delta_\xi$, ξ ∈ Ξ, with $\alpha \in (\mathbb{Z}_+)^{\Xi}$. For $\tau := \hat f$, that makes f have the form $f(x) = \sum_{\xi \in \Xi} p_\xi(x)\, e^{2\pi i \xi x}$ with polynomials $p_\xi$ of degree $\alpha_\xi$. Elementary considerations about polynomials plus an argument similar to the one in the proof of Lemma A.3 show that such a function is bounded only if $\alpha_\xi = 0$ for all ξ ∈ Ξ.
B Finite-dimensional perturbation theory (for the theoretical computer scientist)
We use the notations laid out in §3.3; in addition, in this section, e is just another letter (and not Euler's number exp(1)).
The mathematical cornerstone of finite-dimensional perturbation theory is Rellich's Theorem [30, Theorem 1]. We give here a much simplified version adapted to our needs.
Based on Theorem B.1 the usual undergrad quantum physics perturbation theory can be executed mathematically rigorously in finite dimension; the result is shown in the following Corollary B.2, in matrix notation (i.e., for the theoretical computer scientist).

Corollary B.2. Let A, B be Hermitian operators on a Hilbert space H of finite dimension d ∈ N. There exists an r > 0 and an orthonormal basis of eigenvectors of A, along with d-by-d matrices A, B, W [k] , k ∈ Z + , and diagonal d-by-d matrices Λ [k] , k ∈ Z + , such that the following holds.

The proof of Corollary B.2 is well known; we sketch it. The matrix-valued function W is defined, for ℓ ′ , ℓ = 1, . . . , e, j ′ = 1, . . . , h ℓ ′ , j = 1, . . . , h ℓ , and all small enough |z|, by W (ℓ ′ ,j ′ ),(ℓ,j) (z) := (w ℓ,j (0) | w ℓ ′ ,j ′ (z)); where (. | .) is the inner product in the Hilbert space H, and the w * ( ) are from Rellich's Theorem B.1. In other words, the columns of W(z) are the orthonormal eigenstates at z, wrt the basis at 0. The diagonal-matrix valued function Λ has on its diagonal the eigenvalue-valued functions λ from Rellich's theorem, in the corresponding ordering. Now, just expand the power series in z on both sides of the equation (A + zB)W(z) = W(z)Λ(z) and collect terms. Hence, there is no need to repeat the proof here.
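The "expand and collect terms" step can be illustrated at first order. Assuming that W [k] and Λ [k] denote the power-series coefficients of W(z) and Λ(z) (the statement of the corollary is not reproduced here), and working in the eigenbasis of A so that W [0] is the identity and Λ [0] = A is diagonal, the terms of order z read $[A, W^{[1]}] + B = \Lambda^{[1]}$. The commutator of a diagonal matrix with any matrix has zero diagonal, so the diagonal of Λ [1] equals the diagonal of B. The sketch below checks this in the non-degenerate case; the matrices are hypothetical illustrative choices.

```python
import numpy as np

# Hypothetical example: A is already diagonal (non-degenerate), B is real symmetric.
A = np.diag([0.0, 1.0, 2.5, 4.0])
B = np.array([[0.3, 0.1, -0.2, 0.0],
              [0.1, -0.5, 0.4, 0.2],
              [-0.2, 0.4, 0.7, -0.1],
              [0.0, 0.2, -0.1, 0.1]])

def eigenvalues(z):
    """Eigenvalues of A + z*B in ascending order (matches the ordering of diag(A) for small |z|)."""
    return np.linalg.eigvalsh(A + z * B)

def numerical_first_order(z=1e-4):
    """Central-difference estimate of the derivative of the eigenvalues at z = 0."""
    return (eigenvalues(z) - eigenvalues(-z)) / (2 * z)

def predicted_first_order():
    """First-order coefficients: diagonal of B in the eigenbasis of A."""
    return np.diag(B).copy()

if __name__ == "__main__":
    print(numerical_first_order())
    print(predicted_first_order())
```

In the degenerate case the same computation requires the eigenbasis of A to be the one singled out by Rellich's theorem, which is why the corollary asserts the existence of a suitable orthonormal basis instead of allowing an arbitrary one.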
C Fourier-analytic characterization of derivative-computing measures, general case
In §2.4.a, we have proven one direction of Lemma 2.12; we now prove the remaining one in full generality: that (13) implies (12). Apart from standard calculations with Fourier(-Stieltjes) transforms and tempered distributions (most of which we are writing out in this section), the proof is based on the technique in the Paley-Wiener-Schwartz theorem that is contained in the following fact.
Lemma C.1 (Paley-Wiener-Schwartz [22,19]). Let τ be a tempered distribution with compact support, and ψ a Schwartz function with compact support such that ψ = 1 on Supp τ. The tempered distribution $\check\tau$ (inverse Fourier transform) is given by integrating against the following function 37 $x \mapsto \langle \psi \cdot e^{2\pi i \square x} : \tau \rangle$, which has at most polynomial growth and is analytic.
In this section we arbitrarily fix a Schwartz function ψ with compact support satisfying Let us pull out some lemmas for easier readability.
37 Note that ψ · e 2πi □ x is a Schwartz function with compact support, for every x ∈ R.
Lemma C.2. Let f be a smooth function of at most polynomial growth which has compact Fourier spectrum. For all x ∈ R we have $f(x) = \langle \psi \cdot e^{2\pi i \square x} : \hat f \rangle$.
Proof. From Lemma C.1 and the Fourier Inversion Theorem for tempered distributions, we know that the RHS defines a function of x, integration against which defines a tempered distribution that is equal to the tempered distribution defined by integrating against f . As both f and (again by Lemma C.1) the function defined by the RHS are continuous, pointwise equality follows.
Lemma C.3. If f is a smooth function of at most polynomial growth, such that $\hat f$ has compact support, then f ′ has at most polynomial growth. Moreover,
Proof. Let f be as stated, and σ be the tempered distribution "integrating against f ". The tempered distribution ∂σ has support contained in Suppσ. Indeed, if φ is a Schwartz function with support in the complement of Suppσ, then so is ∂φ, so that ⟨φ : ∂σ⟩ = −⟨∂φ :σ⟩ = 0.
So Lemma C.1 is applicable: There exists a continuous function g of at most polynomial growth such that ∂σ is given by integrating against g. But for all compactly supported Schwartz functions φ we have $\int \varphi f' = -\int \partial\varphi \, f = -\langle \partial\varphi : \sigma \rangle = \langle \varphi : \partial\sigma \rangle = \int \varphi g$, so that f ′ = g holds pointwise. This proves that f ′ is of at most polynomial growth, and we can apply Lemma C.2 to f ′ , which gives us (a). With (a), the equation (b) is purely a relation between tempered distributions and follows from $(\partial\sigma)^{\wedge} = (2\pi i \square)\,\hat\sigma$.