Double-bracket quantum algorithms for diagonalization

This work proposes double-bracket iterations as a framework for obtaining diagonalizing quantum circuits. Their implementation on a quantum computer consists of interlacing evolutions generated by the input Hamiltonian with diagonal evolutions which can be chosen variationally. No qubit overheads or controlled-unitary operations are needed, but the method is recursive, which makes the circuit depth grow exponentially with the number of recursion steps. To make near-term implementations viable, the proposal includes optimization of the diagonal evolution generators and of the recursion step durations. Thanks to this, numerical examples show that the expressive power of double-bracket iterations suffices to approximate eigenstates of relevant quantum models with few recursion steps. Compared to brute-force optimization of unstructured circuits, double-bracket iterations do not suffer from the same trainability limitations. Moreover, with an implementation cost lower

exceptional cases when it has been possible to find parametrizations that generalize to system sizes larger than a handful of qubits. Recently, a more algorithmic approach based on the classical Lanczos algorithm [4], widely employed for numerical diagonalization, has found application in a quantum device [5]. Its innovation consists in implementing imaginary-time evolutions using accessible unitary operations. Ref. [6] provides another example of experimental preparation of eigenstate approximations for relatively large systems. Similarly to the approach taken here, evolutions under the input Hamiltonian were composed with simple evolutions to achieve eigenstate approximations. As we will see, it is possible to use analytical means to guide such eigenstate preparation, as opposed to optimization. This work will establish that the Głazek-Wilson-Wegner (GWW) flow [7,8,9] can play a conceptual role for quantum computing, namely that it provides a feasible solution to the task of compiling diagonalizing quantum circuits. Applications of the GWW flow in condensed-matter physics have been explored using classical computers and are showcased in the monograph by Kehrein [10]; see Ref. [11] for a concise review. The GWW flow is an example of the non-linear differential equations called double-bracket flows [12,13,14], whose mathematics is covered in the monograph by Helmke and Moore [15].
Having near-term quantum computing applications in mind, we will depart from the GWW flow. Instead, we will consider double-bracket iterations, which can recover continuous flows as a special limit but give flexibility to attempt lowering the implementation cost. It appears difficult to characterize the precise efficacy of variational double-bracket iterations analytically, but numerical simulations show that just a handful of recursion steps can yield surprisingly good approximations of low-energy eigenstates of the quantum Ising model.
The manuscript is structured as follows. Sec. 1 i) defines general double-bracket iterations, ii) expounds on data structures for diagonalization in quantum computing, iii) formulates the specific diagonalization double-bracket iteration ansatz, iv) proves its diagonalizing monotonicity, v) discusses the quantum circuit components needed to implement double-bracket iterations and vi) formalizes the quantum algorithm by means of a recursive transpiling algorithm. More broadly, Sec. 1 demonstrates that in the double-bracket approach it is clear why and how diagonal dominance iteratively increases.
The recursive character of double-bracket iterations leads to an exponential runtime of the quantum algorithms in the number of iteration steps. Sec. 2 explores numerically what can be achieved with a small number of recursion steps and how to reduce their number. It demonstrates that few steps of a diagonalization double-bracket iteration suffice to achieve relevant state preparations. Double-bracket iterations can be approximated by quantum circuits using Hamiltonian simulation [16]. Simulating evolution under input Hamiltonians is being actively explored in experiments [17,18,19], and so the proposed quantum algorithm lends itself to near-term experiments. In particular, no controlled-unitary operations are needed.
Sec. 3 focuses on continuous double-bracket flows; in particular, a variant of a double-bracket iteration is proposed which converges to the GWW flow. The obtained runtime is not efficient, but the presented theory should be relevant for anyone wishing to explore quantum computing applications of double-bracket flows for tasks other than diagonalization.
Sec. 4 discusses double-bracket iterations in the context of other approaches to constructing quantum algorithms. In particular, quantum dynamic programming [20] can reduce the circuit depth of recursive double-bracket iterations, thus potentially strengthening their role for future quantum computing. Finally, Sec. 5 states the specific conclusions established in this work and suggests an outlook for double-bracket flows in scientific quantum computing.

Double-bracket quantum algorithms for diagonalization
This section will propose double-bracket quantum algorithms for diagonalization. They will be derived from double-bracket iterations. While similar to discretizations of double-bracket flows, double-bracket iterations allow for more flexibility. We will explore what happens if the individual evolution steps have relatively long durations. Additionally, we will retain a 'double-bracket' form of rotation generators, which is similar to flows, but will consider it as an ansatz for variational purposes. We will introduce flows only later, in Sec. 3, because they are less relevant for near-term quantum computing.

Definition of double-bracket iterations.
Let Ĥ0 be a Hamiltonian and D0, D1, . . . be a sequence of hermitian diagonal operators. We define the respective diagonalization double-bracket iteration (DBI) by the recursion starting with k = 0 and recursion step

Ĥk+1 = e^{s_k Ŵk} Ĥk e^{−s_k Ŵk} ,   (1)

where Ŵk is given by

Ŵk = [Dk, Ĥk] .   (2)

Here, the step durations s_k can be chosen variationally for the considered task. Sec. 2 will provide numerical examples. Sec. 3 will discuss the relation to continuous double-bracket flows, which are less relevant for quantum computing and can be recovered in the limit of equidistant and infinitesimal step durations s_k. When the bracket (2) involves a diagonal operator then we will say that the DBI is a diagonalization DBI, because we will see that such iterations give rise to circuits well-suited for that task. Note that non-diagonal (but hermitian) Dk could be useful for other purposes.
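As a concrete numerical sketch, one DBI step can be simulated classically. This assumes the recursion Ĥk+1 = e^{s_k Ŵk} Ĥk e^{−s_k Ŵk} with Ŵk = [Dk, Ĥk]; the toy Hamiltonian, the function names and the canonical-like choice of D0 are illustrative, not from the source.

```python
import numpy as np

def expm_antihermitian(W):
    """e^W for anti-hermitian W, via the eigendecomposition of the hermitian iW."""
    evals, Q = np.linalg.eigh(1j * W)
    return Q @ np.diag(np.exp(-1j * evals)) @ Q.conj().T

def dbi_step(H, D, s):
    """One double-bracket rotation: H -> e^{sW} H e^{-sW} with W = [D, H]."""
    W = D @ H - H @ D            # the bracket; anti-hermitian for hermitian D, H
    U = expm_antihermitian(s * W)
    return U @ H @ U.conj().T

def off_diagonal_norm(H):
    """Hilbert-Schmidt norm of the off-diagonal part of H."""
    return np.linalg.norm(H - np.diag(np.diag(H)))

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H0 = (A + A.conj().T) / 2        # toy 4x4 hermitian input
D0 = np.diag(np.diag(H0).real)   # canonical-like choice: the dephased H0
H1 = dbi_step(H0, D0, s=0.02)
```

For this canonical-like D0 and a small step duration, the off-diagonal norm of H1 is strictly smaller than that of H0, while the spectrum is exactly preserved by the unitary conjugation.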
It should be highlighted that, for variational purposes, the resulting ansatz will be parametrized by the parametrization of the diagonal operators Dk. This is different from unstructured parametrizations of circuits because the diagonal evolutions governed by Dk are 'folded' through the brackets (2). As we will see, this form makes it easier to search for the diagonal operators Dk.
The DBIs considered here will vary the diagonal operators Dk and recursion step durations s_k so as to increase the diagonalization rate in each step. Eqs. (1) and (2) allow changing the generators D0, D1, . . ., in the hope of reaching faster a fixed point Ĥ∞ which commutes with any diagonal operator, i.e., is diagonal.
We will say that a sequence of double-bracket rotations, or more generally a sequence of approximations to double-bracket rotations, is a DBI. In Subsec. 1.6, we will discuss how a DBI can be turned into a unitary circuit. As in the title of this work, we will refer to a DBI transpiled into a quantum circuit as a double-bracket quantum algorithm.

Diagonalization in quantum computing.
A Hamiltonian Ĥ0 can be considered as the input to the task of finding a diagonalization transformation.
Whenever specific examples are required, Ĥ0 will be taken to be a Hamiltonian of L qubits, and the D = 2^L dimensional Pauli matrices acting on qubit i will be denoted by X̂i, Ŷi and Ẑi. Throughout, we choose Ẑi to be diagonal. Let us now discuss how one can approach diagonalization in quantum computing.
We will speak of eigenstate-by-eigenstate diagonalization if a unitary is applied to a basis vector, obtaining a particular eigenstate or an approximation thereof. Let us denote the computational basis vectors by |µ⟩ with µ1, . . ., µL ∈ {0, 1} such that Ẑi |µ⟩ = (−1)^{µi} |µ⟩. If a unitary Û diagonalizes Ĥ0 and we apply Û to |µ⟩, we will obtain an eigenstate. More specifically, let Ĥ∞ = Û† Ĥ0 Û be diagonal; then |µ⟩∞ = Û |µ⟩ is an eigenstate with corresponding energy eigenvalue E_µ because

Ĥ0 |µ⟩∞ = Û Ĥ∞ |µ⟩ = E_µ |µ⟩∞ .   (5)

A quantum circuit approximation to the diagonalizing transformation Û above can be viewed as an eigenstate-by-eigenstate diagonalization quantum algorithm which is global because for all states |µ⟩ we use the same circuit. Brute-force optimization of circuits aimed at preparing individual eigenstate approximations has been studied extensively, see Ref. [1] for a review, and could be considered local eigenstate-by-eigenstate diagonalization because the circuit can vary for different target eigenstates.
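The global eigenstate-by-eigenstate picture can be checked in a toy classical simulation. Here `np.linalg.eigh` plays the role of the (generally unknown) diagonalizing unitary Û; the 4x4 Hamiltonian and the basis label are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
H0 = (A + A.T) / 2
energies, U = np.linalg.eigh(H0)  # classical stand-in for a diagonalizing circuit

mu = 2                            # basis label |mu>, here just a column index
basis_state = np.zeros(4)
basis_state[mu] = 1.0
eigstate = U @ basis_state        # |mu>_infinity = U |mu>
```

Applying the same U to every basis vector yields every eigenstate, which is exactly the 'global' property discussed above.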
The aim of this section is to propose that DBIs are well suited for eigenstate-by-eigenstate diagonalization. A priori, the argumentation that will be presented is based on a global cost function. However, when a diagonalization DBI has not advanced far, it may be viewed as performing local eigenstate-by-eigenstate diagonalization for some selected states |µ⟩. This is because even if the global cost function has not been reduced to zero, it can happen that applying the unitary circuit arising from the few-step DBI will give good eigenstate approximations for some states |µ⟩. In general, the performance of local eigenstate-by-eigenstate diagonalization can be witnessed by measuring the energy fluctuation, which vanishes for eigenstates and can be evaluated through experimentally accessible observables [6].
As an alternative, one may consider thermally encoded diagonalization, where one would encode the input Hamiltonian Ĥ0 in a density matrix, e.g. as

ρ(Ĥ0) = (1 + Ĥ0/∥Ĥ0∥) / tr(1 + Ĥ0/∥Ĥ0∥) .   (7)

Again, an approximation to an exact diagonalization transformation Û could be considered. In this case diagonalization effectiveness could be measured by, e.g., evaluating quantum coherence in the computational basis [21]. While thermally encoded diagonalization may appear appealing, it should be noted that i) all observables would carry a lot of trivial noise due to the close proximity to the maximally mixed state, ii) not all of the exponentially many eigenstates are of equal interest and iii) certifying success may be difficult because, e.g., quantum coherence is a function of the entire state. For those reasons this framework will not be considered further in this work, but it goes to show that the quantum data structure for diagonalization may, in principle, be chosen in more than one way.
We discussed two ways in which a quantum computer can be set up for the task of diagonalization. Next, let us discuss two oracles for the quantum computer to query the input Ĥ0. The matrix dimension D = 2^L is expected to be large so, unlike in classical computing, for quantum computation it is more natural to build algorithms querying evolutions {e^{−it Ĥ0}}, t ∈ R, governed by Ĥ0 rather than inputting matrix elements of Ĥ0 into a quantum computer one by one.
The Hamiltonian simulation oracle translates classical knowledge of the values of the couplings in the input Hamiltonian Ĥ0 into evolutions e^{−it Ĥ0}. This should be viewed as being in line with the natural sparsity structure of Hamiltonians which are relevant in physics: the number of couplings will usually scale at most polynomially with the number of qubits L, rather than exponentially. Query access to this evolution oracle can be achieved by Hamiltonian simulation quantum algorithms with an almost linear runtime in the evolution duration [16].
An oblivious evolution oracle allows querying the application of unitary evolutions e^{−it Ĥ0} governed by Ĥ0 to a state for desired durations t. Here one does not need to know the precise couplings in the Hamiltonian, a situation close to the setting of analog quantum simulation where the evolution just 'happens'. If we disregard the knowledge about the couplings in Ĥ0 then the Hamiltonian simulation oracle can be seen as oblivious, simply evolving the state as queried. For readers familiar with QIBO, this is similar to having the freedom to change the 'backend' and being agnostic as to the specific implementation at hand [22].
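The oblivious oracle notion can be sketched as a black box mapping a duration t to e^{−it Ĥ0}, with the caller never inspecting the couplings. The closure-based interface below is an illustrative assumption, not an API from the source.

```python
import numpy as np

def make_oblivious_oracle(H0):
    """Black-box access t -> e^{-i t H0}; callers never inspect H0's couplings."""
    evals, Q = np.linalg.eigh(H0)
    def evolve(t):
        return Q @ np.diag(np.exp(-1j * t * evals)) @ Q.conj().T
    return evolve

H0 = np.array([[1.0, 0.5], [0.5, -1.0]])  # illustrative single-qubit Hamiltonian
oracle = make_oblivious_oracle(H0)
U = oracle(0.3)
```

The oracle returns unitaries and composes as expected: querying durations t1 and t2 in sequence equals one query of duration t1 + t2.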
DBIs reveal how to turn evolutions under the input Hamiltonian e^{−it Ĥ0} into an approximation of eigenstate-by-eigenstate diagonalization; thus double-bracket quantum algorithms work within the oblivious evolution oracle framework. The next subsection will derive analytical conditions for a DBI to tend towards a diagonal fixed point, and after that we will discuss how to implement the arising double-bracket quantum algorithms on quantum computers.

Diagonalization double-bracket iterations
In the convention that all Ẑi operators are diagonal, their products

Ẑµ = Ẑ1^{µ1} · · · ẐL^{µL},  µ1, . . ., µL ∈ {0, 1},   (8)

form a basis of diagonal operators on L qubits. This means that any D = 2^L dimensional diagonal operator D can be expressed as the linear combination

D = Σ_{µ∈{0,1}^L} d_µ Ẑµ ,   (9)

which involves the Hilbert-Schmidt scalar product [23]

⟨Â, B̂⟩ = tr(Â† B̂) .   (10)

The hermitian conjugation in the definition of the Hilbert-Schmidt scalar product is important because we will be using anti-hermitian operators Ŵ = −Ŵ†, so that their induced Hilbert-Schmidt norm is defined through

∥Ŵ∥²_HS = ⟨Ŵ, Ŵ⟩ = −tr(Ŵ²) .   (11)

There is no minus for hermitian operators, e.g., ∥Ĥ0∥²_HS = tr(Ĥ0²). Anti-hermitian operators arise naturally when considering brackets, i.e., commutators, of hermitian operators. For example, we will use brackets of hermitian operators Â, B̂,

Ŵ(Â, B̂) = [Â, B̂] .   (12)

Note that e^{s Ŵ(Â,B̂)} is unitary for any real duration s, which can be proven analogously to, e.g., why e^{−is Ĥ0} is unitary; in fact, i Ĥ0 is anti-hermitian.
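The Ẑµ expansion of a diagonal operator can be verified numerically. The coefficient formula d_µ = ⟨Ẑµ, D⟩/2^L used below is an assumption consistent with the Hilbert-Schmidt scalar product and the orthogonality ⟨Ẑµ, Ẑν⟩ = 2^L δµν.

```python
import numpy as np
from itertools import product

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
L = 2

def z_string(mu):
    """Tensor product Z^mu = Z_1^{mu_1} x ... x Z_L^{mu_L}."""
    out = np.ones((1, 1))
    for bit in mu:
        out = np.kron(out, Z if bit else I2)
    return out

rng = np.random.default_rng(2)
D = np.diag(rng.normal(size=2 ** L))     # an arbitrary diagonal operator
# Coefficients d_mu = <Z^mu, D> / 2^L with <A, B> = tr(A^dag B)
coeffs = {mu: np.trace(z_string(mu).T @ D) / 2 ** L
          for mu in product((0, 1), repeat=L)}
reconstruction = sum(c * z_string(mu) for mu, c in coeffs.items())
```

Summing the weighted Ẑµ strings reproduces the original diagonal operator exactly, confirming that the strings form a basis of the diagonal operators.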
Let ∆(Z)(Ĵ) be a map associating a hermitian and diagonal operator to any hermitian operator Ĵ. We call ∆(Z)(•) a diagonal association, and it is specifically purposed to give rise to brackets

Ŵ(Z)(Ĵ) = [∆(Z)(Ĵ), Ĵ] ,   (13)

which, when exponentiated, generate unitary evolutions. We speak of a map because ∆(Z)(Ĵ) could be the result of some optimization procedure applied to Ĵ in every step of a diagonalization DBI, and thus ∆(Z) could be used to variationally construct D0, D1, . . . on the fly. Let us next see some examples of diagonal associations.
The dephasing channel ∆(•) for L qubits is a diagonal association; it takes Ĵ and returns an operator with the same diagonal matrix elements and off-diagonal ones equal to zero,

∆(Ĵ) = Σ_{µ∈{0,1}^L} ⟨µ| Ĵ |µ⟩ |µ⟩⟨µ| .   (14)

The output of the dephasing channel is again hermitian, so the bracket, which we will call canonical, defined by

Ŵ(can)(Ĵ) = [∆(Ĵ), Ĵ]   (15)

generates unitary operators because it is anti-hermitian, (Ŵ(can))† = −Ŵ(can).
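A minimal numerical sanity check of the dephasing channel and the canonical bracket (the 4x4 test operator is an arbitrary example):

```python
import numpy as np

def dephase(J):
    """Dephasing channel: keep diagonal matrix elements, zero the rest."""
    return np.diag(np.diag(J))

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
J = (A + A.conj().T) / 2
W_can = dephase(J) @ J - J @ dephase(J)  # canonical bracket [Delta(J), J]
```

The bracket of two hermitian operators is anti-hermitian, and its diagonal vanishes, since the diagonal entries of [∆(Ĵ), Ĵ] are differences of identical products.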
The diagonal association can be chosen to be independent of Ĵ. For example, we can set

∆(Z)(Ĵ) = Ẑµ   (16)

by sampling a random µ each time it is queried. Again, such an association allows us to define a bracket as in (13). For any diagonal association ∆(Z)(•) we define the respective DBI by setting Dk = ∆(Z)(Ĥk). So to build a DBI, in every step we could dephase as in (14) or pick a new random operator as in (16). The diagonal association ∆(Z)(Ĵ) could be extended to be a function of the recursion step number k and be built on the fly to serve the purpose of the iteration. For example, for each given Ĥk one can perform rounds of closed-loop optimization varying the parameters of an ansatz for ∆(Z)(Ĥk) and aim to obtain an Ĥk+1 yielding a low cost function value (e.g., being more diagonal). For clarity, let us summarize this theoretical approach by the following abstract algorithm.

Input
K: total number of recursion steps
s: a vector of K recursion step durations
∆(Z): diagonal association
Ĥ0: input Hamiltonian

Return ĤK
In the remainder of this section, we will discuss i) why the above examples of diagonal associations (14) and (16) have diagonalizing tendencies, ii) how to implement Eqs. (1) and (2) on a quantum computer, and iii) what the runtime, i.e. the implementation cost, of running a diagonalization double-bracket quantum algorithm will be.

Conditions for diagonalization using double-bracket iterations
Let us next derive a monotonicity relation for diagonalization DBIs. It is related to a similar relation known for the GWW flow. This preceding result can be thought of as a special case of the discussion below, relevant in the continuous limit of the DBI with the canonical bracket (15), i.e. taking s_k infinitesimally small.
The DBI monotonicity relation will be derived by studying what happens in a single recursion step. Let Ĵ be a hermitian matrix and let

Ĵ(Z)_s = e^{s Ŵ(Z)(Ĵ)} Ĵ e^{−s Ŵ(Z)(Ĵ)}   (18)

be its double-bracket rotation for time s by a bracket (13) which involves a diagonal association ∆(Z)(Ĵ).
To analyze the effect of this double-bracket rotation, it is useful to define

σ(Ĵ) = Ĵ − ∆(Ĵ) ,   (19)

which is the off-diagonal restriction of Ĵ. Here ∆(Ĵ) is the dephasing channel, which is in general different from ∆(Z)(Ĵ). We will see that ∆(Ĵ) will appear nonetheless in the analysis, and thus the diagonal association via the dephasing channel gives rise to the bracket that we call canonical [10].
We will use the squared Hilbert-Schmidt norm ∥σ(Ĵ)∥²_HS as the measure of the diagonalization progress. With this definition we have

∂_s ∥σ(Ĵ(Z)_s)∥²_HS |_{s=0} = −2 ⟨Ŵ(can)(Ĵ), Ŵ(Z)(Ĵ)⟩ ,   (22)

a result which will be proven just below. This formula uncovers a large degree of flexibility in devising diagonalization DBIs because we merely need that

⟨Ŵ(can)(Ĵ), Ŵ(Z)(Ĵ)⟩ > 0 ,   (23)

i.e., the Cauchy-Schwarz angle θ between Ŵ(can)(Ĵ) and Ŵ(Z)(Ĵ) needs to be restricted to [0, π/2). We thus see that in the geometry induced by the Hilbert-Schmidt scalar product, the canonical bracket is special in that it is a choice of double-bracket rotation direction which leads to a step of diagonalization as long as Ŵ(can)(Ĵ) ≠ 0. It may be handy to say that as long as flow step durations s_k are small enough, the dephasing channel (which is related to the continuous GWW flow) gives rise to unconditionally diagonalizing DBIs. On the other hand, according to (23), general diagonal associations give rise to conditionally diagonalizing DBIs, where in each step one may need to check whether the sign should be flipped. If Ŵ(Z) points away from Ŵ(can) then redefining the sign of the diagonal operator, ∆(Z)(Ĵ) → −∆(Z)(Ĵ), will give rise to a bracket which will have a diagonalizing effect. The reader should note that this is generic, as opposed to an unconstrained ansatz optimized by brute force.
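The monotonicity relation can be probed numerically by finite differences. The check below assumes the reading ∂_s∥σ(Ĵ(Z)_s)∥²_HS |_{s=0} = −2⟨Ŵ(can)(Ĵ), Ŵ(Z)(Ĵ)⟩; the random test operators are illustrative.

```python
import numpy as np

def dephase(J):
    return np.diag(np.diag(J))

def sigma(J):
    """Off-diagonal restriction of J."""
    return J - dephase(J)

def hs_inner(A, B):
    """Hilbert-Schmidt scalar product tr(A^dag B) (real for our anti-hermitian W's)."""
    return np.trace(A.conj().T @ B).real

def expm_ah(W):
    """e^W for anti-hermitian W."""
    evals, Q = np.linalg.eigh(1j * W)
    return Q @ np.diag(np.exp(-1j * evals)) @ Q.conj().T

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
J = (A + A.conj().T) / 2
D = np.diag(rng.normal(size=4))             # some diagonal association output
W = D @ J - J @ D                           # variational bracket W_Z
W_can = dephase(J) @ J - J @ dephase(J)     # canonical bracket

def cost(s):
    U = expm_ah(s * W)
    return np.linalg.norm(sigma(U @ J @ U.conj().T)) ** 2

eps = 1e-6
derivative = (cost(eps) - cost(-eps)) / (2 * eps)
```

The symmetric finite difference reproduces −2⟨Ŵ(can), Ŵ(Z)⟩ to high accuracy, so the sign of that scalar product indeed decides whether a short double-bracket rotation decreases the off-diagonal norm.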
In a diagonalization DBI we will have the special case Ĵ = Ĥk. Once the evolution (18) has progressed far enough, to some time s_k, the decrease of the magnitude of the off-diagonal terms turns around into an increase. To proceed further with diagonalization, we set Ĥk+1 = Ĵ(Z)_{s_k} and proceed to a rotation of Ĥk+1 with a new bracket. This assignment of s_k is locally optimal in k, but for a DBI with K > 1 total steps it may not be globally optimal; it is the greedy strategy for optimizing the DBI step durations s = (s_1, . . ., s_K). This is a viable strategy but others may be possible, and the following definition allows us to further conceptualize the task at hand.

Definition 1 (σ-decreasing). A sequence of unitaries V̂1, V̂2, . . . is called σ-decreasing for Ĥ0 if ∥σ(V̂†k Ĥ0 V̂k)∥_HS is strictly decreasing in k.
This definition captures the operational goals of a quantum device: as long as an evolution is effective in lowering the off-diagonal matrix elements, the precision of implementing the DBI step is a secondary matter and 'errors' may potentially be tolerable. We next derive the monotonicity relation (22), which can theoretically aid experimental explorations.

Lemma 2 (σ-decreasing double-bracket rotations).
We have

∥σ(Ĵ(Z)_s)∥²_HS = ∥σ(Ĵ)∥²_HS − 2s ⟨Ŵ(can)(Ĵ), Ŵ(Z)(Ĵ)⟩ + O(s²) .   (26)

As a corollary to this statement, the monotonicity relation (22) follows by dividing both sides by s > 0 and taking the limit s → 0.
Proof. Using the integral form of the remainder in the Taylor expansion of (18) we have

Ĵ(Z)_s = Ĵ + s [Ŵ(Z)(Ĵ), Ĵ] + R̂(s),  ∥R̂(s)∥_HS ∈ O(s²).

We use that the Hilbert-Schmidt norm is induced by the respective scalar product and insert the Taylor expansion, obtaining

∥σ(Ĵ(Z)_s)∥²_HS = ∥σ(Ĵ)∥²_HS + 2s ⟨σ(Ĵ), [Ŵ(Z)(Ĵ), Ĵ]⟩ + O(s²).

Here, for the higher-order terms we made analogous bounds as for the remainder above and collected them into O(s²) with an updated constant. Next we use that

⟨σ(Ĵ), [Ŵ(Z)(Ĵ), Ĵ]⟩ = −⟨∆(Ĵ), [Ŵ(Z)(Ĵ), Ĵ]⟩

and

⟨∆(Ĵ), [Ŵ(Z)(Ĵ), Ĵ]⟩ = ⟨Ŵ(can)(Ĵ), Ŵ(Z)(Ĵ)⟩,

which involves basic linear algebra discussed in Appendix A. Thus

∥σ(Ĵ(Z)_s)∥²_HS = ∥σ(Ĵ)∥²_HS − 2s ⟨Ŵ(can)(Ĵ), Ŵ(Z)(Ĵ)⟩ + O(s²),

where the minus sign appeared due to the Hilbert-Schmidt scalar product definition (10).
This result shows that if the variational bracket is chosen to be the canonical bracket then, for a small enough duration of the double-bracket rotation, we will obtain a σ-decrease.
Corollary 3 (Unconditional monotonicity of GWW DBI). For all steps k in the abstract algorithm 1 with the dephasing association and sufficiently small s_k > 0 we have

∥σ(Ĥk+1)∥_HS ≤ ∥σ(Ĥk)∥_HS .

For general variational brackets that arise from some diagonal association, we can summarize the possibility of obtaining a σ-decreasing sequence for Ĥ0 as follows.
Corollary 4 (Generic diagonalizing DBIs). Let ∆(Z)(•) be a diagonal association. For any input Hamiltonian Ĥ0 and ϵ > 0, there exists a σ-decreasing DBI with an association ∆̃(Z) satisfying, in every step,

∥∆̃(Z)(Ĥk) − ∆(Z)(Ĥk)∥_HS ≤ ϵ   or   ∥∆̃(Z)(Ĥk) + ∆(Z)(Ĥk)∥_HS ≤ ϵ .

This means that once we choose to rotate with brackets involving diagonal operators, it is the i) double-bracket and ii) recursive structures that lead to diagonalizing properties. The choice of the diagonal association influences the performance of the corresponding DBI, however.
Proof. We proceed inductively, constructing the modified association on the fly. If for some k the original bracket vanishes, [∆(Z)(Ĥk), Ĥk] = 0, then we add a diagonal operator Âk such that ∆̃(Z)(Ĥk) = ε_k ∆(Z)(Ĥk) + Âk gives rise to a non-zero bracket which, either for ε_k = +1 or for ε_k = −1, will be collinear with the canonical bracket. Taking s_k sufficiently small ensures a σ-decrease in that step. This procedure can be repeated for all steps. Finally, we note that ∥Âk∥ can be chosen small enough so that (36) holds.
Note that this proof formalizes the claim that the proposed DBIs generically have σ-decreasing properties. If [∆(Z)(Ĥk), Ĥk] ≈ 0 then, instead of sticking to that diagonal operator and modifying it subtly, it may be better to choose a completely different one. In other words, the corollary is concerned with the fact that diagonal associations generically lead to diagonalizing DBIs, but it does not make a statement about the rate of diagonalization. The diagonalization rate is best analyzed for the given input Hamiltonian Ĥ0, because different sequences D0, D1, . . . will perform differently in different settings. The next subsection discusses a special case where D0, D1, . . . are all equal and so the convergence characteristics can be studied analytically.

Brockett-Helmke-Mahony-Moore double-bracket iterations
Moore, Mahony and Helmke [24] proposed Lie-bracket recursions, see also the monograph by Helmke and Moore [15, Ch. 2.3], which are equivalent to the DBIs proposed here but came to the attention of the author only in the last stages of finalizing this manuscript. Eqs. (1) and (2) anticipate that it may be useful to use a different diagonal operator Dk in each recursion step. However, Brockett [25,26] proposed a double-bracket flow involving a single diagonal operator D∗, and Moore, Mahony and Helmke [24] have considered DBIs where each double-bracket rotation involves the same diagonal operator D∗. Thus it appears that Lie-bracket recursions arose from considering numerical diagonalization schemes, similar to this work, which initially considered the GWW flow but concluded that generalizing small-step discretizations to large step durations is worthwhile, now also for quantum computing. In this subsection we will summarize how the mathematical analysis of the Brockett-Helmke-Mahony-Moore (BHMM) DBI [24] provides insights about the performance of quantum computing applications of DBIs.
Firstly, all fixed points Ĥ∞ of the BHMM DBI have the property [Ĥ∞, D∗] = 0. Secondly, assuming that D∗ has a non-degenerate spectrum allows one to show that Ĥ∞ is diagonal. Thirdly, the only stable equilibrium point is the Ĥ∞ which has the same sorting of eigenvalues as D∗. Finally, because we are not switching Dk in every recursion step, the BHMM DBI can be seen analytically to converge. In particular, close to the equilibrium points the convergence is exponential.
This eventual exponential convergence rests on reaching the basin of asymptotic attraction. In the quantum case we are expecting exponentially large matrices and, as we will see in the next subsection, double-bracket quantum algorithms have an exponential implementation cost in the number of recursion steps. What this means is that in quantum computation implementing the initial non-asymptotic steps is key, as otherwise the exponential stability of the diagonal fixed point cannot be salvaged. The variational formula (26) is a first step to systematically understanding how to vary and select Dk in each recursion step.
Furthermore, the analysis of the BHMM DBI highlights the need to avoid spectral degeneracies [24]. In near-term quantum computing, Dk should be simple enough to implement, for example the classical Ising model

Dk = Σ_i B_i Ẑi + Σ_{i<j} J_{i,j} Ẑi Ẑj .

Here the 'magnetic' fields and qubit couplings B_i, J_{i,j} are the parameters of the restricted variational ansatz and, as a rule of thumb, should be chosen so as to avoid spectral degeneracies.
Finally, Moore, Mahony and Helmke [24] provide an idea for computing the scheduling of DBIs. The most basic formula is s∗ = 1/(4 ∥Ĥ0∥_HS ∥D∗∥_HS), which can be generalized to a step-dependent scheduling [24] expressed in norms involving Ĥk. This is a quantitative formula for a 'small-enough' step duration, obtained by bounds based on a Taylor expansion similar to Eq. (26). For near-term quantum computing, it should be noted that Hilbert-Schmidt norms of innocuous operators can be exponentially large and thus this scheduling may be unnecessarily pessimistic.
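The classical-Ising ansatz for Dk and the BHMM step-duration heuristic can be sketched together. The fields and couplings below are arbitrary illustrative values, and the reconstructed ansatz Dk = Σ_i B_i Ẑi + Σ_{i<j} J_{i,j} Ẑi Ẑj is assumed; the Frobenius norm coincides with the Hilbert-Schmidt norm.

```python
import numpy as np

Zmat = np.diag([1.0, -1.0])
I2 = np.eye(2)
L = 3

def z_op(i):
    """Pauli Z acting on qubit i of an L-qubit register."""
    out = np.ones((1, 1))
    for j in range(L):
        out = np.kron(out, Zmat if j == i else I2)
    return out

# Classical-Ising variational ansatz for D* (illustrative fields/couplings)
B = [0.7, -0.3, 1.1]
Jc = {(0, 1): 0.5, (1, 2): -0.9, (0, 2): 0.2}
D_star = sum(B[i] * z_op(i) for i in range(L))
D_star = D_star + sum(Jij * (z_op(i) @ z_op(j)) for (i, j), Jij in Jc.items())

rng = np.random.default_rng(5)
A = rng.normal(size=(2 ** L, 2 ** L))
H0 = (A + A.T) / 2
# BHMM heuristic s* = 1/(4 ||H0||_HS ||D*||_HS)
s_star = 1.0 / (4.0 * np.linalg.norm(H0) * np.linalg.norm(D_star))
```

Since the ansatz is built entirely from Z strings, D_star is diagonal by construction, and the heuristic yields a strictly positive step duration.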
Having said that, together with this double-bracket rotation duration, the BHMM DBI provides an example of an unconditionally diagonalizing DBI and thus facilitates obtaining a feasible solution to the task of compiling a diagonalization unitary. Ref. [27, Thm. 2.3] provides one more, related, perspective which derives conditions for exponential convergence of iterations related to DBIs. Next, let us look at the details of how to implement DBIs on quantum computers.

Double-bracket rotations via group commutators
A DBI is a sequence of double-bracket rotations. We will now discuss how to implement individual double-bracket rotations on a quantum computer. A DBI is a priori only an analytical ansatz for a diagonalization iteration, but we will also see that it is possible to implement iterated double-bracket rotations, thus arriving at double-bracket quantum algorithms.
The group commutator features prominently in the Solovay-Kitaev quantum compiling scheme [28] and is defined by the formula

V̂(GC)(Â, B̂) = e^{−i Â} e^{−i B̂} e^{i Â} e^{i B̂} ,   (37)

which involves two hermitian operators Â, B̂. It yields an evolution step approximately governed by the commutator [29]

e^{−[Â, B̂]} = V̂(GC)(Â, B̂) + Ê(GC) ,   (38)

where ∥Ê(GC)∥ is bounded by a constant times ∥Â∥ ∥B̂∥ (∥Â∥ + ∥B̂∥) in any unitarily invariant norm, e.g., the Hilbert-Schmidt norm. This constant captures the non-commutative character of the operators involved, and a self-contained proof using the technique of local-error bounds [30,16,31] is provided in Appendix A.
Using the group commutator formula, we see that if we have oracle access to evolutions {e^{−it Ĵ}} under an evolution generator Ĵ and to evolutions {e^{−it D}} under its diagonal association D = ∆(Z)(Ĵ), then we obtain an approximation to the double-bracket rotation with

V̂(GC)(√s Ĵ, √s D) = e^{s Ŵ(Z)(Ĵ)} + Ê(GC)(√s Ĵ, √s D) .   (39)

The approximation error can be bounded as ∥Ê(GC)(√s Ĵ, √s D)∥ ∈ O(s^{3/2}). Interestingly, this local error bound of the group commutator admits a further upper bound which is proportional to the norm of the corresponding generating bracket. If we refine this R times and scale the evolution time to r = s/R, then we can converge onto the exact double-bracket rotation according to

lim_{R→∞} ( V̂(GC)(√r Ĵ, √r D) )^R = e^{s Ŵ(Z)(Ĵ)} .   (40)

The group commutator should be viewed as a means of approximately implementing a DBI step using simpler evolution operations. It is important to recognize that the group commutator formula (37) constitutes an elemental component for building more complex quantum algorithms. It involves forward and backward evolution under two different generators. If the evolution time is short enough, it yields an evolution step governed by the commutator of these two operators. Equivalently, we may rephrase this in the language of queries [32,33]: in order to query the forward propagation under the commutator of two operators, we must query their forward and backward evolution interlaced as in Eq. (37).
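The group-commutator approximation can be checked numerically. The check below assumes the form e^{−iÂ} e^{−iB̂} e^{iÂ} e^{iB̂} ≈ e^{−[Â,B̂]}, consistent with Eq. (38); the small random operators are illustrative.

```python
import numpy as np

def expm_h(H, t=1.0):
    """e^{-i t H} for hermitian H."""
    evals, Q = np.linalg.eigh(H)
    return Q @ np.diag(np.exp(-1j * t * evals)) @ Q.conj().T

def expm_ah(W):
    """e^W for anti-hermitian W."""
    evals, Q = np.linalg.eigh(1j * W)
    return Q @ np.diag(np.exp(-1j * evals)) @ Q.conj().T

rng = np.random.default_rng(6)
s = 0.02                                   # small so the commutator term dominates
MA = rng.normal(size=(4, 4))
A = s * (MA + MA.T) / 2
MB = rng.normal(size=(4, 4))
B = s * (MB + MB.T) / 2

# V_GC(A, B) = e^{-iA} e^{-iB} e^{iA} e^{iB}
V_gc = expm_h(A) @ expm_h(B) @ expm_h(A, -1.0) @ expm_h(B, -1.0)
target = expm_ah(-(A @ B - B @ A))         # e^{-[A, B]}
```

For operators of norm ~s, the deviation from e^{−[Â,B̂]} scales like s³, one order below the commutator term itself, so the product is much closer to the target than to the identity.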
Note that the group commutator relies on the capability to invert the evolution. The signs and ordering of these evolution steps decide which sign the resulting commutator in the generator will have. We will use in the following that the redefinition Â = −Ĵ and B̂ = D yields, via Eq. (38), the same type of approximation. In other words, the same approximation is obtained, but using a different order of queries.

Group commutator iterations
This sets the scene for formulating the group commutator iteration (GCI), which is meant as the routine that is going to be easiest to implement on a quantum computer. It arises by prioritizing what really matters in a DBI. Firstly, the diagonal operators D0, D1, . . . of the inline formulation of the DBI around Eqs. (1) and (2) should be simple to implement. This may not be the case for diagonal associations appearing in abstract algorithm 1, because those may include the dephasing channel. Secondly, a DBI is formulated using evolutions under exact brackets. This is useful for analytical derivations, but for implementations one should favor hardware practicality. Finally, the full group commutator is not needed because there is additional structure allowing to reduce the number of queries to the evolution oracles of the input Hamiltonian Ĥ0.
More specifically, firstly we will assume that for each recursion step we have a classical prescription d_µ for the diagonal operator Dk = Σ_{µ∈{0,1}^L} d_µ Ẑµ. When this is a concise description, e.g. a local magnetic field Hamiltonian or the classical Ising model, then this evolution can be efficiently implemented using Hamiltonian simulation or Clifford operations [23]. Secondly, we will not aim to perform a double-bracket rotation exactly but only to approximate it using the group commutator in each recursion step. So, for k = 0, 1, . . . we use the ordering (41) to define a reduced group commutator sequence and obtain the corresponding recursion for Ĥk+1. This can be summarized as follows.
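One GCI recursion step can be sketched numerically: the exact double-bracket rotation e^{sŴ} Ĥ e^{−sŴ}, Ŵ = [Dk, Ĥk], is replaced by a group-commutator unitary built only from forward and backward evolutions under Ĥk and Dk. The √s pulse durations are an assumption consistent with the group-commutator expansion, and the toy Hamiltonian is illustrative.

```python
import numpy as np

def evolve(H, t):
    """Oracle stand-in: e^{-i t H} for hermitian H."""
    evals, Q = np.linalg.eigh(H)
    return Q @ np.diag(np.exp(-1j * t * evals)) @ Q.conj().T

def gci_step(H, D, s):
    """Group-commutator approximation of e^{sW} H e^{-sW}, W = [D, H]."""
    r = np.sqrt(s)
    V = evolve(H, r) @ evolve(D, r) @ evolve(H, -r) @ evolve(D, -r)
    return V @ H @ V.conj().T, V

def off_norm(H):
    return np.linalg.norm(H - np.diag(np.diag(H)))

rng = np.random.default_rng(7)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H0 = (A + A.conj().T) / 2
H0 = H0 / np.linalg.norm(H0, 2)   # normalize so the sqrt(s) pulses stay short
D0 = np.diag(np.diag(H0).real)    # canonical-like diagonal choice
H1, V0 = gci_step(H0, D0, s=0.001)
```

Because the step is an exact unitary conjugation, the spectrum is preserved exactly, while for a short enough step the off-diagonal norm still decreases, i.e. the GCI step is σ-decreasing despite only approximating the DBI rotation.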

Abstract algorithm 2 Group commutator iteration (GCI)
Input
K: total number of recursion steps
s: a vector of K recursion step durations
D: a vector of K hermitian and diagonal operators
Ĥ0: input Hamiltonian

Return ĤK: output Hamiltonian
A GCI can be modified to converge onto a DBI by investing a sufficient runtime overhead, as in Eq. (40). However, the notion of σ-decreasing stresses that for diagonalization it need not be our goal per se to implement a particular DBI; rather, we can remain satisfied with a different iteration, as long as it is σ-decreasing. There may be a gain in diagonalization efficacy from implementing a DBI more closely, but it may not be worth it, at least in the near term, and GCIs should be considered first for hardware applications. Thus the abstract algorithm serves to stress the view that GCIs in their own right are worthwhile for near-term applications.
Above we defined DBIs as having recursion steps that involve an exact double-bracket evolution, while GCIs, strictly speaking, only approximate such an evolution. However, it should cause no confusion to conceptually think of GCIs as essentially DBIs, because every recursion step of a GCI approximates a double-bracket rotation (38). If we use the same sequence of diagonal operators D0, D1, . . . then the arising diagonalization DBI and GCI are naturally corresponding. In other words, for conceptualization one should think in terms of DBIs, while for applications one should consider GCIs.

GCI transpiler for quantum computing
In step k of a DBI we need to make an evolution step governed by Ŵk = [Dk, Ĥk]. We can approximate the double-bracket rotation generated by Ŵk using the group commutator, but for that we need oracle access to evolutions under the k-th iterated Hamiltonian, {e^{it Ĥk}}. Unless we have it, a DBI, or GCI, despite being formulated using unitary operations, is not directly deployable on a quantum device.
For a GCI to be deployable we need to construct the oracle access to {e^{itĤk}} using the basic oblivious evolution oracles of the input Hamiltonian {e^{itĤ0}} and of the diagonal operators {e^{itDk}}. This translation is the role of a transpiler, which is tasked with listing all queries to the oblivious evolution oracles in the order in which they are composed in the GCI. The basic evolutions {e^{itĤ0}} and {e^{itDk}} can be further transpiled into primitive gates of a given device using standard quantum compiling methods.
The key idea here is frame shifting. Let Vk be the complete GCI unitary after k steps, so that Ĥk = V†k Ĥ0 Vk; unitarity then lets us express e^{itĤk} = V†k e^{itĤ0} Vk. Thus we can shift the frame using Vk and query the evolution under the input Hamiltonian for the desired time t. This means that Vk composed with the group commutator recursion step, together with (47), yields the next unitary. The transpiler will output the circuit V_K of VK for the final step K if it recursively keeps track of the list of queries needed to implement V1, V2, . . . If Vk in Eq. (49) is already transpiled into queries to evolution oracles then we call that form 'unfolded', as opposed to (48), which is still implicit. The following transpiling algorithm formalizes this idea.
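To make the bookkeeping concrete, here is a minimal sketch of such a transpiler (my own illustrative Python; it assumes a three-term reduced group commutator step, so the exact operator ordering of the paper's equations is not reproduced verbatim). Queries are symbolic tuples, and the frame shift e^{irĤk} = V†k e^{irĤ0} Vk is unfolded recursively:

```python
import math

def dagger(qs):
    """Adjoint of a query list: reversed order, negated durations."""
    return [q[:-1] + (-q[-1],) for q in reversed(qs)]

def transpile_gci(K, s):
    """Unfold K assumed reduced-group-commutator steps into oracle queries.

    Queries are tuples ("H0", t) or ("D", k, t); the frame-shifted evolution
    e^{ir H_k} is realized as dagger(V_k) + [("H0", r)] + V_k.
    """
    V = []                                       # V_0 = identity
    for k in range(K):
        r = math.sqrt(s[k])
        shifted = dagger(V) + [("H0", r)] + V    # frame-shifted e^{ir H_k}
        V = V + [("D", k, -r)] + shifted + [("D", k, r)]
    return V

V3 = transpile_gci(3, [0.1, 0.05, 0.02])
n_h0 = sum(q[0] == "H0" for q in V3)
print(n_h0)   # 13 input-oracle queries after three steps
```

Reversing the output list, as the quantum algorithm below prescribes, converts it from the Hamiltonian-iterating convention to a circuit acting on states.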

Abstract algorithm 3: T(GCI), the GCI circuit transpiler

Input:
- K: total number of recursion steps
- s: a vector of at least K recursion step durations
- {e^{−itDk}}: at least K oblivious evolution oracles with time t ∈ R and diagonal generators Dk labeled by k = 0, 1, . . .
- {e^{−itĤ0}}: oblivious oracle access to evolution under Ĥ0 for time t ∈ R

Output:
- V_K: an ordered list of queries to the input evolution oracle {e^{−itĤ0}} and the diagonal evolution oracles {e^{−itDk}} whose ordered composition yields the circuit of the unitary VK satisfying ĤK = V†K Ĥ0 VK after K GCI recursion steps

Body: for each of the K GCI recursion steps, set the frame-shifted evolution query list as in Eq. (47).

Below, the quantum algorithm environment describes how to use the above abstract algorithms of a GCI and its transpiling to run a GCI on a quantum computer. For this it is meaningful to make the following consideration: if the reversed circuit is applied to an initial state, then the result approximates an eigenstate as long as the iteration has progressed sufficiently far. It is important to note that we are considering VK applied to a state, so the output of the transpiler above needs to be reversed, because the transpiler works in the convention of iterating the Hamiltonian and not the state. For this we take the list of queries to evolution oracles output by the GCI transpiler, V_K, reverse that list to obtain reverse(V_K), and apply it to the state. Finally, we assume that when combining lists, such as in the assignment of P_{K−1} above, the result is one single list which is automatically 'flattened' by iterating through all elements of the lists in the appropriate order and creating a new list.

Input:
- K: total number of recursion steps
- s: a vector of at least K recursion step durations
- {e^{−itDk}}: at least K oblivious evolution oracles with time t ∈ R and diagonal generators Dk labeled by k = 0, 1, . . .
- {e^{−itĤ0}}: oblivious oracle access to evolution under Ĥ0 for time t ∈ R; its queries are applied to the registers of the input state

Note that this quantum algorithm aligns with three design characteristics, and the presence of each of them is beneficial. It is an iteration, which in this specific case has the advantage that its outputs can be studied along the way. Assuming it is σ-decreasing, its fixed point is useful because it yields diagonalization, and as the iteration progresses an increasingly better approximation is obtained. Finally, the operations employed are natural, namely Hamiltonian and diagonal evolutions suffice to implement the procedure.
Runtime as query complexity of the oblivious evolution oracle. We will assume that the evolutions with the diagonal operators e^{−is_k Dk} can be implemented easily using Clifford operations, and so we will not include them in the runtime analysis. Instead we will count the number of queries to the oblivious evolution oracle of the input Hamiltonian needed to prepare Vk and denote it by N(k). It satisfies the recursion N(k) = 3N(k−1) + 1: the frame-shifted query to the evolution under Ĥ_{k−1} costs 2N(k−1) + 1 queries and composing with V_{k−1} costs another N(k−1). The solution of this recursion, with N(1) = 1 as the starting point, gives N(K) = (3^K − 1)/2 queries to the evolution oracle to perform K GCI recursion steps, so an exponential scaling in the number of steps.
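A few lines suffice to tabulate this growth (assuming a reduced-group-commutator recursion of the form N(k) = 3N(k−1) + 1, consistent with the (3 + o(1))^K scaling quoted later in this section; this is a sketch, not the paper's code):

```python
def n_queries(K):
    """Queries to the input-Hamiltonian oracle after K GCI steps,
    assuming the recursion N(k) = 3 N(k-1) + 1 with N(1) = 1."""
    n = 1
    for _ in range(K - 1):
        n = 3 * n + 1
    return n

for K in (1, 2, 5, 10):
    print(K, n_queries(K), (3**K - 1) // 2)   # recursion matches closed form
```

The closed form (3^K − 1)/2 makes the exponential cost of deep recursions immediate: ten steps already require tens of thousands of input-oracle queries.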
Ref. [20] proposed a way to reduce this runtime to polynomial at the cost of appropriately large qubit overheads.

Dynamical diagonal association and double-bracket recursions
In this section we shall explore a class of DBIs with 'dynamical' diagonal associations. They are dynamical in the sense that, without classical knowledge of an evolution generator Ĵ, it is possible to transform oracle access to the evolutions e^{−isĴ} into oracle access to the evolution under the associated diagonal generator e^{−isΔ^{(Z)}(Ĵ)}. One example of a diagonal association map which can be implemented 'dynamically' on a quantum computer is the dephasing channel. It is also known as pinching and amounts to the restriction of an operator to its diagonal. The dephasing channel is a mixed-unitary quantum channel for any dimension D [34, Eq. (4.113)], but the most important case is that of L qubits, so with D = 2^L. Then the evolution under the pinched generator can be approximated by a product of conjugated evolutions where, irrespective of the order in this product, the error satisfies ‖Ê(Δ)‖ ≤ 8s²‖Ĵ‖², see appendix B. This means that it is possible to perform the evolution under Δ(Ĵ) for time s by interlacing the evolution under Ĵ in steps of s/D with phase flips Ẑµ defined in Eq. (8). These phase flips can be implemented using Clifford operations [23].
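A small numerical check of this pinching trick is given below (an illustrative NumPy sketch for L = 2 qubits; the error bound 8s²‖Ĵ‖² is quoted from the text, while the random generator and step size are my own assumptions):

```python
import numpy as np
from functools import reduce

def U(A, t):
    """e^{-i t A} for Hermitian A."""
    w, v = np.linalg.eigh(A)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

L = 2
Z, I = np.diag([1.0, -1.0]), np.eye(2)
flips = [reduce(np.kron, ops) for ops in
         [(I, I), (Z, I), (I, Z), (Z, Z)]]    # all 2^L phase flips Z_mu
D = len(flips)                                 # D = 2^L = 4

rng = np.random.default_rng(1)
J = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
J = (J + J.conj().T) / 2                       # Hermitian generator J

s = 0.01
approx = np.eye(4, dtype=complex)
for Zm in flips:                               # interlace e^{-i(s/D)J} with flips
    approx = Zm @ U(J, s / D) @ Zm @ approx
exact = U(np.diag(np.diag(J)), s)              # e^{-is Delta(J)}
err = np.linalg.norm(approx - exact, 2)
bound = 8 * s**2 * np.linalg.norm(J, 2)**2
print(err, bound)                              # error sits below the bound
```

The product of conjugated evolutions equals a Trotterized evolution under (1/D)Σµ Ẑµ Ĵ Ẑµ = Δ(Ĵ), which is why the error is second order in s.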
Again, let us rephrase this as an elemental quantum algorithmic component in the language of queries: we obtain query access to evolution governed by the diagonal association given by the pinched generator by repeatedly querying evolution under the input generator and conjugating it by phase flips. This query access is 'dynamical' because it obliviously turns Hamiltonian evolution into evolution under the pinched generator. Note the difference to the above non-'dynamical' case of classically prescribed diagonal operators, where the diagonal association depends only on classical knowledge that informs the choice of D0, D1, . . . Finally, we can generalize this further by requiring only that the diagonal association admits an approximation by a product formula. More specifically, we want there to exist a finite sequence of I + 1 unitaries Q̂i such that the product of conjugated evolutions approximates the diagonal evolution, with the error term satisfying ‖E(Δ^{(Z)})‖ = O(s^p) with p > 0. The dephasing channel is an example of such a diagonal association, with the Q̂i given by appropriate phase flips and I = 2^L, s_1 = s_2 = . . . = s_I = s/D, so that p = 2. Unitary-mixing channels in general admit a similar approximation by a product formula. The product formula approximation facilitates covariance under shifting of the frame, a property needed for performing multiple recursion steps on a quantum device. Let L be a unitary which implements a shifting of the frame and consider any generator Ĵ; then conjugating the product formula by L implements the diagonal association of the frame-shifted generator, and so we automatically get a prescription for the implementation of the diagonal association with the same error.
Covariance under frame shifting allows the recursion to progress. If V1 implements the first recursion step, then we set L = V1 and the frame-shifting covariance formula (55) can be used in the group commutator to implement the next recursion step. Repeating this 'unfolding' results in the hardware prescription for the GCI, which includes a 'dynamical' diagonal association that can be approximated by a product formula (54).
Query complexity for dynamical diagonal associations. The dephasing channel is an instructive example of a 'dynamical' diagonal association covariant under shifting of the frame because it is involved in the canonical bracket. According to the monotonicity relation (24), a double-bracket rotation involving the canonical bracket is σ-decreasing for sufficiently small rotation duration s. This is unconditional, as opposed to taking an arbitrary diagonal operator Dk, which generates a diagonalizing double-bracket rotation conditioned on choosing the sign correctly (either +Dk or −Dk will yield a variational bracket collinear with the canonical bracket). However, while a variational operator Dk may be easy to implement, e.g. by Clifford operations, for 'dynamical' diagonal associations we have an approximation by a product formula (54) through I queries to the evolution oracle of the input Hamiltonian. Thus the queries to e^{±i√s_{k−1} Δ^{(Z)}(Ĥ_{k−1})} dominate, and the resulting query count N_I(K) grows exponentially in K with a base set by I. When the diagonal association map is given by the dephasing channel we have the GWW DBI and I = 2^L, where L is the number of qubits, so the runtime of K GWW DBI steps is exponential in both K and L. Indeed, in the worst case, input matrices can be dense and the full unitary-mixing representation of the dephasing channel is needed to pinch them. As will be discussed below, conjugation by a polynomial number of phase flips suffices for sparse evolution generators, and then N_I(K) = (poly(L))^K.

Efficacy of diagonalization DBIs
The abstract algorithm of the GCI circuit transpiler highlights an important aspect, namely that GCIs result in recursive circuits. Thus as a recursive GCI progresses, each additional recursion step becomes more costly than the previous ones. In order to gain a general understanding of the expressive power of DBIs, in this section we turn our attention to DBIs and abstract away from the intricacies of the group commutator approximations involved in GCIs. More specifically, the aim is to address which eigenstate approximations can be achieved with only a few recursion steps.

Numerical examples of DBIs
Numerical example: Preparing eigenstates. We will begin the assessment of the efficacy of diagonalizing DBIs by exploring the GWW DBI, which is a DBI where Dk = Δ(Ĥk), i.e., at every step we use the diagonal association given by the dephasing channel (14). We then use it to compute the canonical bracket (15) at every step and proceed with the DBI recursion (1) and (2).
This will provide intuition about how many recursion steps may be needed in proof-of-principle explorations. The analysis includes optimization of recursion step durations by a greedy protocol: for every recursion step we find the step duration which results in the maximal reduction of the Hilbert-Schmidt norm of the off-diagonal restriction of the respective Ĥk.
Let us consider the GWW DBI applied to the transverse- and longitudinal-field Ising model. For the numerical calculations we set J_X = 2 and L = 9. Regarding this choice of model: quantum simulations are often explored by means of the simpler transverse-field Ising model, which does not feature the longitudinal on-site field Xj aligned with the Ising interaction Xj Xj+1 and thus involves two rather than three local Hamiltonian terms. However, that model is exactly solvable by a mapping to free fermions [35], while in the presence of the longitudinal field such a solution is not directly available. In other words, including the longitudinal field Xj is not a big experimental complication and may make the obtained results more representative of generic interacting spin systems. Fig. 1a) shows the monotonic decrease of the Hilbert-Schmidt norm of the off-diagonal restriction ‖σ(Ĥk)‖_HS as the GWW DBI progresses. The durations of the flow steps have been optimized to mitigate the recursive character of the proposed implementation, which is encoded on the horizontal axis. Fig. 1b) shows that the diagonal restrictions Δ(Ĥ15) and Δ(Ĥ30) after k = 15 and 30 steps reproduce the energy spectrum of the Hamiltonian obtained by exact diagonalization. More broadly, this suggests that the GWW DBI is an appealing pre-conditioning method, i.e., a unitary transformation increasing the diagonal dominance of an input Hamiltonian. This can be useful for further quantum information processing because other protocols may then have an improved runtime; e.g. the runtime of ground-state preparation using quantum phase estimation depends on the initial overlap with the ground state [36].
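The GWW recursion with greedy step-duration optimization is easy to prototype classically. The sketch below uses a random Hermitian matrix rather than the TLFIM, and the scan grid for the durations is my own assumption, but it reproduces the qualitative monotonic decrease of ‖σ(Ĥk)‖_HS:

```python
import numpy as np

def U(A, t):
    """e^{-i t A} for Hermitian A via eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def off_norm(H):
    """Hilbert-Schmidt norm of the off-diagonal restriction sigma(H)."""
    return np.linalg.norm(H - np.diag(np.diag(H)))

def gww_step(H, s):
    """One GWW DBI step H -> e^{sW} H e^{-sW} with W = [Delta(H), sigma(H)]."""
    W = np.diag(np.diag(H)) @ H - H @ np.diag(np.diag(H))
    V = U(1j * W, s)                     # e^{sW} for anti-Hermitian W
    return V @ H @ V.conj().T

rng = np.random.default_rng(2)
H = rng.normal(size=(16, 16))
H = ((H + H.T) / 2).astype(complex)      # Hermitian stand-in input

norms = [off_norm(H)]
for k in range(20):
    # greedy protocol: pick the duration giving the largest sigma-decrease
    s_best = min(np.geomspace(1e-4, 0.3, 40),
                 key=lambda s: off_norm(gww_step(H, s)))
    H = gww_step(H, s_best)
    norms.append(off_norm(H))

print(norms[0], norms[-1])               # ||sigma(H_k)||_HS shrinks
```

Because the scan includes arbitrarily small durations, each greedy step can only improve (or negligibly change) the off-diagonal norm, mirroring the monotone curves of Fig. 1a).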
Evaluating expected values of the flowed Hamiltonian E^{(k)}_µ = ⟨µ|Ĥk|µ⟩ for a choice of computational basis states |µ⟩ is a basic way of extracting physical information from a quantum device; in an experiment, E^{(k)}_µ can be estimated directly from measurements. This investigation revealed the maximally polarized state |11 . . . 1⟩ to be the candidate of choice for potential explorations using a quantum device. This is because it tends towards a low-energy state, and just k = 5 steps result in a narrow energy fluctuation. Thus, based on this state, a near-term device could probe the low-energy physics of the model.
Finally, the later stages of the iteration result in splitting narrowly positioned energy levels, which has to happen in order to diagonalize the quantum many-body system. Such splitting may be detectable experimentally when the energy levels split beyond the respective range of energy fluctuations, but ultimately this effect may be sensitive to implementation imperfections and thus hard to observe in a near-term quantum device.
In summary, Fig. 1 presents numerical results instructing about possible avenues of exploring diagonalization DBIs in a quantum device. We find that even if the norm of the off-diagonal restriction may appear large, selected states may be close to eigenstates. This means that for a limited number of recursion steps the DBI achieves local rather than global, eigenstate-by-eigenstate diagonalization. In particular, the maximally polarized state |11 . . . 1⟩ is an example of a state that should be considered for exploring DBIs experimentally, because it reaches a very low expected value of energy and a narrow fluctuation range within a small number of steps.

Figure 1: The panels show the measurable energy fluctuations Ξk(µ), which become small for a sufficient number of DBI steps. Just k = 5 steps yield a surprisingly good approximation to a very low-energy eigenstate and thus seem appealing for potential explorations of the GWW DBI in a quantum device. After k = 10 recursion steps the maximally polarized states show big variations both in the expected energy value and in its fluctuation, which should be linked with coupling to other states as these are being reorganized. These variations stem from the GWW DBI being a global diagonalization method aiming to decrease the norm of the off-diagonal restriction ‖σ(Ĥk)‖_HS, so most states lower their energy fluctuation but some may temporarily increase it.

Numerical example:
Step duration optimization landscape. Fig. 2 presents a minimal example of the GWW DBI for the transverse-field Ising model on L = 3 sites and Ising coupling J_X = 1. For this small system we can directly inspect a color-coded visualization of the Hamiltonian matrix, see Fig. 2a).
The initial few steps lead to a rapid decrease of the norm of the off-diagonal terms; however, further down the line even taking very many steps does not fully diagonalize the problem, and the iteration results in block-diagonalization. In such special cases, it is possible to choose a different diagonal association and pursue diagonalization further.
The upper-right inset shows that the Hamiltonian matrix has been put into a block-diagonal form; the 1 × 1 blocks signify eigenstates. This can be better understood by considering the dependence of the off-diagonal norm on the flow step duration s. For the initial two steps, Fig. 2b), we find a rapid decrease of the norms of the off-diagonal restrictions. However, for the later steps, Fig. 2c), the diagonalizing iteration loses efficiency and saturates, which is due to increasingly shallow minima of the off-diagonal norm as a function of the flow step duration. This is a distinctive feature of the method: the iteration aims to decrease the canonical bracket, which may not result in full diagonalization in the presence of spectral degeneracies [13,10]. Full diagonalization can be obtained by a variational search for a diagonal association different from dephasing.
Numerical example: Variational Dk for DBI. Fig. 3 shows the numerical results for flowing the Hamiltonian after optimizing the flow step duration for all generators in a selection that included the canonical bracket and Ŵ^{(µ)} for all possible choices of phase flip operators Ẑµ in Eq. (17). The insets show the full matrices of the initial and flowed Hamiltonians, demonstrating gradually increasing diagonal dominance of the flowed Hamiltonians. Due to the rapidly growing Hilbert space dimension, the visualization of the full Hamiltonian matrix becomes impractical for increasing system sizes, so a state-selective analysis of the type shown in Fig. 1 should be preferred; appendix D presents an analogous plot for L = 7 qubits.

Numerical example:
Full diagonalization. Fig. 4 shows a comparison between the GWW DBI and a DBI with variationally selected brackets. The decrease of the Hilbert-Schmidt norm of the off-diagonal restriction is faster for the DBI with variationally selected brackets. Next, we consider the Anderson model ĤAI, in which the on-site fields B_x ∈ [0, 1.5] are independent and identically distributed according to the uniform distribution.
Here we consider â_x to be the annihilation operator of either a fermionic system, where â_x â†_y + â†_y â_x = δ_{x,y}, or of a bosonic system, where â_x â†_y − â†_y â_x = δ_{x,y}, for all x, y ∈ [L]^{×2}. This Hamiltonian can be written as Ĥ(h) = Σ_{x,y} h_{x,y} â†_x â_y for an appropriate Hermitian matrix h = h† collecting all couplings; e.g., for ĤAI we have h_{x,y} = 1 for nearest-neighbor sites, h_{x,x} = B_x, and zero otherwise. The width of the self-links encodes the approximation to the single-particle eigenvalues.
For both fermions and bosons, these coupling matrices generate the Gaussian representation of the unitary group, and in particular we have a Lie-algebra homomorphism between commutators of the many-body operators Ĥ(h) and commutators of the coupling matrices. Thus, we can consider a Gaussian DBI in the single-particle space acting directly on the coupling matrices h_k, where h_0 is the initial coupling matrix. If we solve the Gaussian DBI then we obtain the full Hamiltonians acting on the Hilbert space by setting Ĥk = Ĥ(h_k). Fig. 5 shows the results of performing a GWW DBI with optimized recursion step durations s_k for a randomly sampled instance of the Anderson Hamiltonian. Panel a) shows the decrease of the norm of the off-diagonal restrictions, and panels b,c,d) show graph representations of the coupling matrices during the iteration. This routine can in principle be performed for much larger systems (even 100 × 100 on a regular laptop), but visualization would become difficult due to clutter. For the modest 5 × 5 lattice size, we see all elements of the diagonalization via a DBI: initially the couplings between nearest-neighbor sites are substantial, and successive double-bracket rotations lower these coupling strengths. In the process, further off-diagonal couplings are generated but carry almost negligible strength.
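In the single-particle space the Gaussian DBI reduces to matrix iterations on the L × L coupling matrix h, as in the following sketch (a 1D chain stand-in for the lattice adjacency, with the field range B_x ∈ [0, 1.5] from the text; scan grid and step count are my own assumptions):

```python
import numpy as np

def off_norm(h):
    """Hilbert-Schmidt norm of the off-diagonal restriction."""
    return np.linalg.norm(h - np.diag(np.diag(h)))

def gww_step(h, s):
    """Gaussian GWW step h -> e^{sw} h e^{-sw} with w = [Delta(h), h]."""
    w = np.diag(np.diag(h)) @ h - h @ np.diag(np.diag(h))
    ew, ev = np.linalg.eigh(1j * w)                  # 1j*w is Hermitian
    V = (ev * np.exp(-1j * s * ew)) @ ev.conj().T    # equals e^{s w}
    return V @ h @ V.conj().T

L = 25                                   # 25 sites, a chain stand-in lattice
rng = np.random.default_rng(3)
h = np.zeros((L, L))
h[np.arange(L - 1), np.arange(1, L)] = 1.0       # nearest-neighbour hopping
h[np.arange(1, L), np.arange(L - 1)] = 1.0
h += np.diag(rng.uniform(0.0, 1.5, size=L))      # on-site fields B_x
h0 = h.copy()

norms = [off_norm(h)]
for k in range(30):
    s_best = min(np.geomspace(1e-4, 0.3, 40),
                 key=lambda s: off_norm(gww_step(h, s)))
    h = gww_step(h, s_best)
    norms.append(off_norm(h))

print(norms[0], norms[-1])               # off-diagonal couplings shrink
# diagonal entries approach the single-particle eigenvalues of h0:
print(np.max(np.abs(np.sort(np.diag(h).real) - np.linalg.eigvalsh(h0))))
```

Since unitary conjugation preserves the spectrum, the diagonal of the flowed coupling matrix converges towards the exact single-particle eigenvalues, which is what the self-link widths in Fig. 5 visualize.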

Heuristic strategies for lowering implementation cost
Fewer recursive repetitions: Repeating GCI steps. The numerical examples presented results for DBIs, but for a hardware implementation the GCI quantum algorithm would be used. Given Dk, one may optimize s_k to give the largest σ-decrease from a single group commutator rotation (37). However, it turns out that it may be more useful to repeat the same group commutator formula twice in order to target more closely a double-bracket rotation of the corresponding DBI.
Fig. 6 illustrates the σ-decrease obtained by targeting approximations of a double-bracket rotation e^{−sŴ}. The model is the TLFIM as in Fig. 1 on L = 9 sites with J_X = 2. The results show that the GWW double-bracket rotation gives a larger σ-decrease than rotations by the group commutator. This means that in a GCI, instead of performing a single group commutator step to obtain Ĥk+1 and proceeding onwards, one should rather obtain Ĥk+1 by several repetitions of the group commutator involving Dk and Ĥk. This should be more efficient because prematurely switching the generator K times will necessitate N(K) = (3 + o(1))^K queries to the evolution under the input Hamiltonian, while repeating the same GCI step K times scales linearly, N(K) = O(K). For this reason, in practice, this heuristic strategy should receive preferential consideration.

Figure 6: Repetitions of a group-commutator rotation step allow to increase the σ-decrease. In most common applications of the group commutator formula (39) a small flow step duration s is considered, because then an evolution governed by a commutator is approximated. However, heuristic group commutator steps for sizeable durations s are examples of σ-decreasing unitaries. The σ-decrease is largest in the limit of repeating many times the group commutator parametrized by a short flow step duration s.
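The advantage of repeating the group commutator can be checked in a few lines (an illustrative sketch with a random Hermitian stand-in, my own sign conventions, and the dephasing choice Dk = Δ(Ĥk)):

```python
import numpy as np

def U(A, t):
    """e^{-i t A} for Hermitian A."""
    w, v = np.linalg.eigh(A)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

def off_norm(H):
    return np.linalg.norm(H - np.diag(np.diag(H)))

rng = np.random.default_rng(4)
H = rng.normal(size=(8, 8))
H = ((H + H.T) / 2).astype(complex)
Dk = np.diag(np.diag(H))                  # dephasing choice D_k = Delta(H_k)
G = 1j * (Dk @ H - H @ Dk)                # i[D_k, H_k] is Hermitian

def gc(s):
    """Group commutator approximating e^{s[D_k, H_k]} (assumed sign choice)."""
    r = np.sqrt(s)
    return U(Dk, -r) @ U(H, r) @ U(Dk, r) @ U(H, -r)

s = 0.05
V_exact = U(G, s)                         # exact double-bracket rotation
print("exact:", off_norm(V_exact @ H @ V_exact.conj().T))
errs = []
for m in (1, 2, 4, 8):                    # m repetitions at duration s/m
    Vm = np.linalg.matrix_power(gc(s / m), m)
    errs.append(np.linalg.norm(Vm - V_exact, 2))
    print(m, off_norm(Vm @ H @ Vm.conj().T))
```

As m grows, the repeated short-duration group commutators approach the exact rotation, matching the limit described in the Figure 6 caption; the generator is kept fixed during all m repetitions, just as in a single DBI step.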
Efficient pinching: Making use of locality. The pinching scheme based on Eq. (53) uses exponentially many phase flips. This may be needed in the worst case, i.e., for dense matrices [37], but is not practically applicable. This number can be reduced if σ(Ĥk) is sparse, i.e., a sum of a small number of Pauli operators. Appendix C shows that restricting the product in Eq. (53) to a smaller number of phase flips selected uniformly at random allows, with high probability, to accurately approximate the evolution under a pinched generator. Local input Hamiltonians are sparse, and for the early stages of DBIs Lieb-Robinson bounds apply, so efficient pinching heuristics can be employed. Note that Hastings conjectured that Lieb-Robinson bounds do not hold for all double-bracket flows [38], so sparsity may also be reduced very quickly in DBIs.
More specifically, App. C provides a bound on how many phase flips suffice to achieve a desired approximation. This depends quadratically on the sparsity S of the given Hamiltonian. For the initial Hamiltonians we have S = 2L for ĤTFIM and S = 3L for ĤTLFIM. In both these cases the sparsity is linear in the number of qubits, S = aL with a > 0, and the bound in App. C ensures an accurate pinching approximation using R = a²L²J²ϵ^{−2} log(2aL/δ) phase flips, where J is the largest coupling strength, ϵ is the required pinching precision, and 1 − δ is the success probability.
In general the sparsity is the number of non-negligible off-diagonal interaction terms and cannot be computed a priori for evolved Hamiltonians beyond using arguments based on Lieb-Robinson bounds. Fig. 1a) shows an example where surprisingly short DBI steps have been locally optimal. By comparison, if one considered a quench evolution with a duration of the order of the step durations in the DBI, then the Lieb-Robinson cone would be much smaller than the total system size.
Regarding runtime, we may additionally consider the intuition provided by the notion of σ-decreasing: accurate pinching is only required for approximating the GWW DBI, but other generators may also decrease the off-diagonal terms. One may choose to explore heuristics wherein pinching uses a number R of phase flips independent of the encountered sparsity. If R depends polynomially on the system size, the runtime would still be exponential in the number of DBI steps K due to the recursion, but if K is kept constant it would be efficient in the system size. Finally, one can experimentally check whether an R independent of the system size is enough to obtain a σ-decreasing DBI.

Classical-quantum hybrids:
Precompiling. The GWW flow has received a wealth of attention from quantum many-body theorists [10,11]. The existing classical algorithms can aid the quantum computation: the first steps of the GWW DBI can be computed numerically, and the classical prescription of the canonical brackets can be implemented directly by Hamiltonian simulation. More specifically, a typical approximation is to restrict Ĥk to be a quantum Ising model with couplings depending on the recursion step k. Each such model can then give rise to a respective bracket Ŵk which can be used for a double-bracket rotation.
For example, quantum-computed dephasing is not needed to run the first step of the GWW DBI for Ising models because we know Δ(ĤTFIM) explicitly. Generalizing this to precompiling, which replaces the DBI by direct implementation of the evolutions generated by Ŵk for durations s_k, is efficient in terms of the sparsity of the DBI Hamiltonians Ĥk and, if viable, it should be preferred over running the full scheme which includes dephasing. On the other hand, once the assistance of the classical computations reaches its limits, the actual quantum computation can be used to extend the diagonalizing iteration further.
Note that for systems with the sizes considered in the numerical examples presented in this work, essentially the entire GWW DBI can be precompiled. For L = 9 the decomposition of any Ĥk or Ŵk involves at most 2^{2L} ≈ 2.6 × 10^5 different Pauli operators. This is the worst-case upper bound on the number of interaction terms that may be necessary to precompile steps of the classically prescribed GWW DBI circuit. Of course, for the first DBI steps that number is much smaller, and it may be a very useful heuristic to perform Hamiltonian simulation for the first few steps of the GWW DBI. When precompiling, one should directly include the heuristic presented above and locally optimize σ-decreasing flow step durations s_1, . . ., s_N. Inspecting the precompiled GWW DBI circuit (65) makes apparent its practicality in the near term: it suffices to use Hamiltonian simulation with the generators −iŴk. Doing this accurately would achieve the same results as presented, e.g., in Fig. 1, and thus the GWW DBI allows to compile approximate diagonalization circuits.
Precompiling will eventually cease to be practical, but further quantum computation starting from the shifted frame may be possible. For some fixed number of extra DBI steps, a σ-decreasing DBI may proceed via quantum computation: starting from the shifted frame, one can consider a variational assignment of the DBI step duration s such that Ĵs = V_s^{(GC)}(Ĵ) is maximally σ-decreasing.

Comments on the necessary number of recursion steps.
The GWW flow acting on an arbitrary product state will yield an approximation to an eigenstate of Ĥ0, and for local Hamiltonians we know certain properties of generic eigenstates. For example, their bipartitions are expected to have extensive entanglement cost, which suggests that correlations should be allowed to span substantial portions of the system [39]. If the GWW flow, as it establishes correlations, is subject to a Lieb-Robinson bound [38,40,41], then taking the total number of flow steps proportional to the system size, K ∼ L, may be necessary.
When exploring qualitative rather than quantitative questions about quantum matter, taking K independent of the system size L may be possible. The eigenstate thermalization hypothesis, see Refs. [42,43,44] for proven instances, suggests that the reductions of eigenstates agree with a thermal density matrix. These, in turn, obey a so-called area law for the mutual information [45], which is related to decaying correlations; i.e., in physically relevant cases, choosing K independent of the system size L may be considered, as long as it is sufficiently large to match the required range of correlations. Many-body localized systems have eigenstates which are lowly entangled, and indeed applications of double-bracket flow methods have been reported for them [46]. Regular quantum statistical systems exhibit a correlation length which decays with the temperature.
In summary, this discussion is qualitative, and for practical implementations one can instead make a case-by-case analysis as in Fig. 1. Having said that, the entanglement properties of eigenstates can qualitatively provide a guideline to identify systems where DBIs will perform particularly well. Many-body localized spin [46] or fermionic [47] systems seem particularly appealing for potential explorations.

Głazek-Wilson-Wegner flow
The continuous Głazek-Wilson-Wegner (GWW) flow is a smooth family of unitary operators Ûℓ parametrized by the total flow duration ℓ ∈ [0, ∞). Given Ûℓ, we define the GWW flowed Hamiltonian Ĥℓ = Û†ℓ Ĥ0 Ûℓ, where Ĥ0 is the input Hamiltonian. As we will discuss below, generically the flowed Hamiltonian Ĥℓ becomes increasingly diagonal as ℓ increases, and so the GWW flow can be used to obtain approximations to a unitary diagonalization transformation. The GWW flow unitaries Ûℓ are defined by the requirement that the flowed Hamiltonian Ĥℓ solves the GWW flow equation [7,8,9,11,10] ∂ℓ Ĥℓ = [Ŵℓ, Ĥℓ], where Ŵℓ = [Δ(Ĥℓ), σ(Ĥℓ)] is the commutator of the diagonal Δ(Ĥℓ) and off-diagonal σ(Ĥℓ) restrictions of the flowed Hamiltonian. The choice (69) will again be referred to as the canonical bracket.
The GWW flow allows to sequentially diagonalize Hamiltonians, which follows from the fact that the squared Hilbert-Schmidt norm of the off-diagonal restriction is non-increasing [10,11]: ∂ℓ ‖σ(Ĥℓ)‖²_HS = −2‖Ŵℓ‖²_HS. This equation states that the magnitude of the off-diagonal restriction of flowed Hamiltonians generically decreases and that the rate of decrease is equal to twice the squared Hilbert-Schmidt norm of the canonical bracket.
Appendix E provides a detailed discussion and self-contained derivation of this key monotonicity feature. Eq. (70) can also be derived similarly to Eq. (24): we may consider Ĵ = Ĥℓ, take the limit of an infinitesimally short double-bracket rotation s → 0, and use that Ŵ^{(can)}(Ĵ) = Ŵℓ. The only difference is that in this case we would be performing the Taylor expansion for the GWW flow. The monotonicity relation (70) is crucial because it shows why the flow is diagonalizing: the magnitude of the off-diagonal terms in the flowed Hamiltonian can stop changing only if the Hilbert-Schmidt norm of Ŵℓ vanishes. In turn this implies Ŵℓ = 0, which by definition (69) happens only if the diagonal restriction commutes with the off-diagonal restriction. This is true if one of these restrictions vanishes, and by (70) it is the off-diagonal restriction that will be flowed away. Alternatively, on some occasions the two restrictions can both be non-trivial, but then they must commute and hence be simultaneously diagonalizable. Thus, in generic situations the fixed point of the GWW flow will be a diagonal form of Ĥ0, but it is also possible that the flow will reach only a block-diagonal form due to, e.g., spectral degeneracies [12,15].
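The monotonicity relation can be verified directly by a finite-difference check (a NumPy sketch on a random Hermitian matrix, assuming the standard GWW flow equation dĤℓ/dℓ = [Ŵℓ, Ĥℓ]; matrix size and step are my own choices):

```python
import numpy as np

rng = np.random.default_rng(5)
H = rng.normal(size=(10, 10))
H = (H + H.T) / 2                            # Hermitian flowed Hamiltonian
Delta = np.diag(np.diag(H))
sigma = H - Delta
W = Delta @ sigma - sigma @ Delta            # canonical bracket [Delta, sigma]

eps = 1e-6                                   # infinitesimal flow step dl
H_eps = H + eps * (W @ H - H @ W)            # dH/dl = [W, H]
lhs = (np.linalg.norm(H_eps - np.diag(np.diag(H_eps)))**2
       - np.linalg.norm(sigma)**2) / eps     # d/dl ||sigma(H_l)||^2_HS
rhs = -2 * np.linalg.norm(W)**2              # -2 ||W_l||^2_HS
print(lhs, rhs)                              # the two sides agree
```

The agreement of the finite difference with −2‖Ŵℓ‖²_HS is exactly Eq. (70) evaluated at a single point of the flow.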

GWW flow limit of the GWW DBI
In this subsection we present claims proved in the appendix, namely that with only small overhead the GWW GCI can be made to converge onto the GWW flow. We next describe the overhead needed to set up a converging GWW GCI.
We begin the formulation of the modified GWW GCI by fixing the desired flow duration ℓ and the total number of recursive steps K. Based on these we set the durations of the recursive steps to be equidistant and take s = ℓ/K.
As above, we will use the approximation to the evolution under the pinched generator, and we define the modified group commutator step with duration parameter r = √s/R, where R is an integer which sets the refinement scale needed to converge towards the GWW flow.
We next define the modified GWW GCI unitaries Vk after k steps recursively, setting V0 = 1 and Ĵ0 = Ĥ0; the iterated Hamiltonian is then given by Ĵk = V†k Ĥ0 Vk. This modified GWW GCI converges to the GWW flow, lim_{K→∞} VK = Ûℓ, which is captured by the following proposition.

For fixed total flow duration ℓ ∈ [0, ∞) and K equidistant flow steps, the refined GWW GCI quantum algorithm converges to the continuous GWW flow according to
The proof is in appendix B. The proposition shows conceptually that despite the GWW flow being non-linear, it can be implemented in the quantum computing framework. Indeed, in this work the GWW flow plays the role of providing the foundation for the GWW DBI by setting Dk = Δ(Ĥk), which for sufficiently short durations s_k is guaranteed to reduce the norm of the off-diagonal restrictions.
Quantitatively, the obtained convergence rate precludes meaningful practical applications. The inverse-square-root dependence on the number of recursion steps stems from the scaling of the error term of the group commutator approximation (39). This can be improved by considering higher-order product formulas [29], but the exponential dependence on ℓ may be a characteristic feature of recursive discretizations, because Prop. 5 in App. B proves the convergence of the GWW GCI to the GWW DBI and a similar scaling appears there too. Thus it appears that continuous double-bracket flows are useful conceptually, while DBIs are the better-suited notion for experimental implementations.

Relation to prior work and open questions

4.1 Towards quantum hardware applications
Regarding variational quantum algorithms. Similar to the design principle of regular variational approaches, running double-bracket quantum algorithms relies directly on primitive gates [1]. Being algorithmic, a diagonalization DBI may sometimes appear less efficient than a circuit ansatz optimized by brute force. This is expected especially for miniature systems of a handful of qubits, because a less restrictive ansatz allows one to 'overfit' and achieve an input-specific compression which is unlikely to generalize to larger systems. Similarly, if a device is very noisy and only miniature circuits involving just a handful of layers of entangling gates are available, then brute-force optimization may produce better results than the more algorithmic DBI approach [48,49,50,51,52].
For larger quantum computations (more qubits and larger circuit depths), diagonalization DBIs will perform better than circuits optimized by brute force. Finding the right circuit is computationally hard [2,3] and in practice barren plateaus are prohibitive [1]. In contrast, a DBI can be set up easily and can provide a feasible solution to the quantum compiling task of approximating an eigenstate. If thousands of layers of entangling gates are available, then DBIs will fare better as soon as brute-force optimization becomes impractical, for the mere reason of being able to use the quantum hardware meaningfully.
An unstructured circuit ansatz optimized by brute force can be used as preconditioning for the more sophisticated DBI approach, which can take over for a second stage of state preparation. It is open whether DBIs, together with short-depth variational circuits, will suffice for advancing material science without resorting to the even more advanced quantum algorithms based on phase estimation. See Ref. [53] for a recent work on optimizing variational circuits using Riemannian gradient-flow methods related to double-bracket flows.
Without preconditioning, quantum algorithms based on phase estimation need an exponential runtime for ground-state preparation [57,36]. For double-bracket flows it is difficult to predict when a target accuracy will be reached. For similar reasons, there is no runtime guarantee for variational double-bracket iterations, and understanding this better is a key theoretical challenge for advancing the approach further. For proof-of-principle applications and benchmarking in material science, DBIs can be effective after just a few recursion steps. This may be indicative of an exponential convergence of the kind which appears for the double-bracket flow of the QR algorithm [12,15].
Regarding DBI parametrization. The variational brackets studied here have been simple, namely products of Pauli-Z operators. Is it more advantageous to use linear combinations of Pauli-Z operators? For both TFIM and TLFIM, ∆(Ĥ0) = Σ_{i=1}^L Ẑi is of the form of such a linear combination, and in Fig. 4 it is outperformed by a single product operator.
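The comparison between the two parametrizations can be mimicked in a small toy experiment. The sketch below is not the paper's Fig. 4 setup; the chain length, couplings, field strength and step grid are all arbitrary choices of ours. It evaluates, for one variational DBI step, the best achievable σ-decrease for a single Pauli-Z product generator versus a sum of single-site Pauli-Z operators, with a grid search over the step duration.

```python
# Toy comparison of two variational diagonal generators for one DBI step.
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2); Z = np.diag([1.0, -1.0]); X = np.array([[0.0, 1.0], [1.0, 0.0]])

def kron_chain(ops):
    out = np.array([[1.0]])
    for o in ops:
        out = np.kron(out, o)
    return out

L = 3
def site(op, i):                        # op acting on site i of an L-qubit chain
    return kron_chain([op if j == i else I2 for j in range(L)])

# TFIM-like Hamiltonian with periodic boundary (couplings chosen arbitrarily)
H0 = sum(site(Z, i) @ site(Z, (i + 1) % L) for i in range(L)) \
   + 0.7 * sum(site(X, i) for i in range(L))

off = lambda A: np.linalg.norm(A - np.diag(np.diag(A)))

def best_decrease(Dgen):                # grid search over step duration s
    W = Dgen @ H0 - H0 @ Dgen
    return min(off(expm(s * W) @ H0 @ expm(s * W).conj().T)
               for s in np.linspace(-0.3, 0.3, 61))

D_prod = kron_chain([Z] * L)            # single Pauli-Z product
D_sum = sum(site(Z, i) for i in range(L))   # linear combination of Z_i
print(best_decrease(D_prod), best_decrease(D_sum), off(H0))
```

Since s = 0 is on the grid, neither candidate can do worse than the input; the printout shows which generator achieves the larger decrease in this particular toy instance.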
The canonical bracket loses efficiency in the presence of degeneracies in Ĥ0, and in general the absence of degeneracies makes it possible to ensure eventual diagonalization [13,15,12]. Are spectral gaps important when aiming for fast heuristic approximations?
Another aspect of interest is choosing the computational basis state which offers the most insight into the physical properties of the system. What needs to happen during the iteration for the fully polarized state |11 . . . 1⟩ to not flow towards the ground state?

Towards new quantum algorithms
Regarding other double-bracket flows. There appear to be further possibilities for creating novel quantum algorithms by building on the various continuous flows developed in the field of dynamical systems. This includes the early work in Ref. [12], which discusses continuous flow equations implementing the QR decomposition and proposes a continuous version of the Golub-Kahan algorithm for computing the singular value decomposition [68]. In Ref. [13] double-bracket flows for diagonalization, linear programming and sorting have been formulated; see also Ref. [69] for matching optimization and Ref. [70] for simulating any finite automaton. Thus, there appears to be a vast literature on analog classical computing which explores how non-linear differential equations can encode various computational tasks [14,15,71,72,73,74,27]. In this work the focus has been solely on the task of preparing eigenstates of quantum many-body systems, but quantizing these ideas can add to the body of explicit quantum algorithms that we know of.
Regarding analog quantum simulation. It is very likely that when repurposing other dynamical equations for quantum computing one will face challenges similar to those discussed in Sec. 3. However, we may ask whether a fully analog implementation of a double-bracket flow is viable. Here we resorted to a discretization, which raises the question whether recursive circuits can be avoided in the digital approach. Positive answers to these questions would likely affect an entire family of, possibly quantum, algorithms. Ref. [20] takes steps towards using input quantum states as quantum instructions for double-bracket quantum algorithms, but there the input system size needs to grow exponentially with the duration of the evolution.

Regarding dephasing as a quantum algorithmic component.
The key physical property needed to embed the GWW flow into the quantum computing framework has been irreversible dephasing. The approach taken here parallels dynamical decoupling [75]. The presented implementation proposal uses the fact that pinching is a mixed-unitary channel, but the preparation complexity has been exponential in the system size if no structure of the problem is promised. Arguably this is natural because of the size of the input matrix, but it would be useful to avoid this preparation inefficiency. Appendix C explores the case of pinching sparse Hamiltonians; however, if the recent conjecture regarding Lieb-Robinson bounds of double-bracket flows is true [38], then the flowed Hamiltonian could quickly become dense. It should be noted that the Brockett double-bracket flow [25,26] does not feature dephasing and yet is also diagonalizing [15].
Regarding quantum algorithms using mixed-unitary channels. It seems worthwhile to consider the question: what use can we make of mixed-unitary maps which involve a polynomial rather than exponential number of unitary conjugations? Here, pinching was a costly circuit component which, broadly speaking, had the role of implementing memory loss. Can polynomial mixed-unitary maps give rise to controllable irreversibility, and are they useful for creating purposeful quantum algorithms? Note that recent experiments have shown that circuits of the type involved in the pinching component (71) have an expressive power which suffices to prepare eigenstates experimentally [6]. One could speculate whether the role of the local gates, assigned in that study by variational optimization, has also been to dephase couplings.
It seems that recursive flows admit a discretization instability. Proposition 5 states convergence of the proposed quantum algorithm towards the continuous GWW flow in the limit of vanishing flow step durations. However, the constants involved in the overall bound depend exponentially rather than polynomially on the total flow duration. An abstraction of this is lemma 10 in appendix A, which suggests that this exponential dependence stems from the deviations of the generator, because it depends on the accuracy of the previous steps. If this exponential dependence is a true instability, then the more adaptive σ-decreasing setting should be considered in practice.

Conclusions and outlook
Double-bracket iterations (DBIs) reveal in what order to compose evolutions under an input Hamiltonian Ĥ0 with diagonal evolutions in order to obtain approximations of its eigenstates. The approach has an analytical character and offers a way to find a feasible solution to a large-scale quantum compiling task which is intractable when relying exclusively on brute-force optimization [2,3].
The essential ingredient is to perform a double-bracket rotation: if D̂0 is a diagonal operator, then for short enough s0 either exp(−s0[D̂0, Ĥ0]) or exp(s0[D̂0, Ĥ0]) will transform Ĥ0 into an Ĥ1 with a decreased magnitude of off-diagonal terms, as expressed by the Hilbert-Schmidt norm. The abstract algorithm 1 summarizes the recursive iteration of this double-bracket rotation ansatz: feeding in another diagonal operator D̂1 transforms Ĥ1 into Ĥ2, which again is more diagonal, and so on. Thus we obtain an ansatz which i) is variational (D̂k and sk can be optimized) and ii) has implementation requirements similar to unstructured approaches, namely using readily available gates and composing them so as to prepare eigenstate approximations.
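The forward-or-backward freedom can be illustrated with a small hedged toy (all names and parameter values are ours): for a generic diagonal D0, scanning a few short durations in both directions exp(∓s0[D0, H0]) finds a σ-decreasing rotation.

```python
# Toy illustration of the sign choice in a variational double-bracket rotation.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
dim = 6
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H0 = (M + M.conj().T) / 2
H0 = H0 / np.linalg.norm(H0, 2)               # normalize for stability
D0 = np.diag(rng.normal(size=dim))            # some (random) diagonal generator

off = lambda A: np.linalg.norm(A - np.diag(np.diag(A)))

W = D0 @ H0 - H0 @ D0                         # bracket [D0, H0], anti-hermitian
offs = []
for sign in (+1, -1):                         # forward and backward evolution
    for s0 in (0.01, 0.05, 0.1):
        U = expm(sign * s0 * W)
        offs.append(off(U @ H0 @ U.conj().T))

print(min(offs), off(H0))                     # best direction/duration found
```

Generically the first-order change of the off-diagonal norm is nonzero, so one of the two directions decreases it; the scan over durations is the discrete analog of optimizing sk.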
In contrast to other variational approaches, though, for double-bracket iterations obtaining an additional diagonalizing step is as simple as picking a new diagonal operator and learning whether to use the induced bracket for forward or backward evolution. The knowledge of how to perform effective double-bracket rotations can be learned using quantum hardware, which can then be used to progressively study the system of interest further with the next recursive step. This is different from quantum phase estimation, which relies on random outcomes of measurements. In terms of implementation requirements, the presented double-bracket approach can be viewed as being positioned between unstructured variational approaches and those based on phase estimation.
The group commutator iteration (GCI) algorithm 2 formalizes the main proposal of this work, namely to implement experimentally iterations which approximate double-bracket rotations by the corresponding group commutator formula. Algorithm 3 transpiles GCIs into a sequence of evolutions governed by the input Hamiltonian and diagonal operators. The involved evolutions can be implemented by Hamiltonian simulation or Clifford operations, and thus double-bracket quantum algorithms consist of operations natural for quantum computing. The proposed quantum algorithm 4 is as simple as applying the evolutions prescribed by the transpiler algorithm 3.
The key ingredient of the GCI algorithm 2 is evolutions governed by the input Hamiltonian Ĥ0. The presented unfolding approach leads to exponentially many queries to the input evolution in the number of recursion steps. However, Ref. [20] shows that in certain cases the circuit depth can be reduced to polynomial via quantum dynamical programming. This comes at the cost of additional overheads, and thus the number of recursion steps must be kept low.
Fig. 1 addresses the question of what eigenstate approximations can be achieved with few recursive DBI steps. The fully polarized state |11 . . . 1⟩ stands out as the candidate of choice for experimental studies because it can be expected to tend to a state representative of the low-energy physics of the model even for larger system sizes. Additionally, Fig. 1 shows that while the overall norm of the off-diagonal restriction of the iterated Hamiltonian may remain substantially non-zero after a dozen DBI steps, some computational basis states do tend towards eigenstates. This is evidenced by experimentally measurable energy fluctuations.
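The energy-fluctuation diagnostic can be made concrete in a hedged numerical sketch (our own toy, not the paper's code): the fluctuation ⟨µ|Hk²|µ⟩ − ⟨µ|Hk|µ⟩² of a computational basis state under the iterated Hamiltonian equals the off-diagonal weight of row µ, is experimentally measurable, and vanishes exactly when the rotated basis state is an eigenstate.

```python
# Energy fluctuation of a basis state as an eigenstate-quality diagnostic.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
dim = 8
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H0 = (M + M.conj().T) / 2
H0 = H0 / np.linalg.norm(H0, 2)

delta = lambda A: np.diag(np.diag(A))
off_norm = lambda A: np.linalg.norm(A - delta(A))

def dbi_step(H, s):                       # one canonical double-bracket rotation
    W = delta(H) @ H - H @ delta(H)
    U = expm(s * W)
    return U @ H @ U.conj().T

def fluctuation(H, mu):                   # <mu|H^2|mu> - <mu|H|mu>^2
    return ((H @ H)[mu, mu] - H[mu, mu] ** 2).real

H = H0
for _ in range(20):                       # a handful of short DBI steps
    H = dbi_step(H, 0.05)

# identity: fluctuation of |mu> = sum over j != mu of |H_{mu,j}|^2
row_weight = np.sum(np.abs(H[0]) ** 2) - np.abs(H[0, 0]) ** 2
print(fluctuation(H, 0), row_weight, off_norm(H) < off_norm(H0))
```

The printed identity shows why vanishing fluctuations certify that a particular basis state has flowed onto an eigenstate even when the global off-diagonal norm is still nonzero.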
This work arose from the existing theory of double-bracket flows, and Sec. 3 supplements it with a demonstration that the Głazek-Wilson-Wegner (GWW) flow can be implemented in the quantum computing framework. For this it is necessary to approximate evolutions governed by the restriction to the diagonal ∆(Ĥk) of iterated evolution generators Ĥk. Generator dephasing is irreversible, and approximating it is costly when implemented by unitary transformations. However, as has also been demonstrated, interventions to reduce the runtime are possible.
While double-bracket quantum algorithms are oblivious in the sense that classical information about the input is not necessary, they are not blind, as the number of queries to e^{−itĤ0} may suffice to learn Ĥ0. An interesting physical behavior ensues in that the Hamiltonian 'guides' its own diagonalization. Double-bracket quantum algorithms use a 'quantum' oracle because instead of querying 'classical' data, e.g., a value of a matrix element, we query an application of a quantum evolution operation to the quantum registers.
A quantum device capable of double-bracket flows for system sizes larger than are practical on a laptop computer will likely resort to the proposed heuristics: i) extending durations of double-bracket rotations, ii) efficient pinching or diagonal associations, and iii) precompiling. The first two will be needed for large systems, but for small numbers of qubits precompiling, proposed in Eq. (65), can include them directly. Thus for proof-of-principle implementations of the double-bracket flow quantum algorithm, precompiling reduces the experimental demands to merely accurate Hamiltonian simulation.

Appendix
This appendix contains five sections and presents details of calculations that extend the discussion from the main text. The overall aim has been to provide a self-contained exposition which includes technical details. This is especially true of the first section A, which focuses on operator relations and inequalities that are basic for working with double-bracket quantum algorithms. It starts with elementary norm upper bounds, then derives a second-order upper bound on the group commutator, and finally presents a solution to an abstract recursion in lemma 10 which impacts bounds on the convergence of flow discretizations. The proofs of these results are technical.
Section B discusses details of the elementary circuit components of the proposed quantum algorithm for the double-bracket flow. Firstly, subsection B.1 proves that the discretized GWW flow defined in Eqs. (1) and (2) converges to the continuous GWW flow. This is a precursor to understanding the quantum algorithm for the GWW flow, whose steps have additional deviations besides those stemming from discretization. We find that the recursive character of the discretized GWW flow results in an exponential sensitivity to the length of the flow, which is different from discretizations of evolutions with an independently prescribed generator, where the error bound has a polynomial dependence on the total duration. Unless this characteristic is an artifact of the proof technique, a quantum device cannot approximate the continuous GWW flow in polynomial runtime. Next, subsection B.2 proves error bounds on the elemental components of the proposed quantum algorithm in relation to the discretization of the GWW flow. Subsection B.3 provides a proof of proposition 5 stated in the main text regarding the limit of the quantum algorithm to the continuous GWW flow.
Section C resorts to large-deviation bounds to prove that a random selection of phase flips leads to accurate pinching, wherein the success probability depends on the number of phase flips versus the sparsity of the Hamiltonian.
Section D presents additional results of numerical simulations. Section E discusses details about the continuous GWW flow. While this reproduces known facts also reviewed elsewhere [11,10], a mathematically inclined reader may find it easier to follow because certain subtleties are highlighted rather than suppressed. In particular, a self-contained constructive proof of the existence of the GWW flow is provided. The motivation for considering that problem is summarized in subsection E.1. Subsection E.2 derives the key monotonicity property (70) of the GWW flow, on which the statements about its diagonalizing properties rest. Finally, subsection E.3 shows that DBI discretizations converge to the continuous flow.

A Helpful technical calculations
As in the main text, for any two linear operators Â, B̂ ∈ L(C^{×D}) the Hilbert-Schmidt norm induced by the Hilbert-Schmidt scalar product ⟨Â, B̂⟩_HS = tr[Â†B̂] will be denoted by ∥Â∥_HS = √⟨Â, Â⟩_HS. We will derive certain results for unitarily invariant norms, which will be denoted without a subscript as ∥Â∥. The operator norm (maximum singular value) is the most relevant example other than the Hilbert-Schmidt norm, which is unitarily invariant too. Lemma 6 (Commutator formulas). For any Â, B̂, Ĉ, D̂ ∈ L(C^{×D}), the difference of commutators is a sum of commutators of differences, [Â, B̂] − [Ĉ, D̂] = [Â − Ĉ, B̂] + [Ĉ, B̂ − D̂]. Moreover, the trace induces cyclicity, tr(Â[B̂, Ĉ]) = tr(Ĉ[Â, B̂]), which in particular yields (78). Proof. A direct use of linearity of commutators gives the first identity. We obtain the cyclicity property by using the cyclicity of the trace in two directions. We obtain (78) by using (77).
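As a quick numerical sanity check of lemma 6, the identities (stated here in our notation in the code comments) can be verified directly:

```python
# Check: [A,B] - [C,D] = [A-C, B] + [C, B-D]  and  tr(A[B,C]) = tr(C[A,B]).
import numpy as np

rng = np.random.default_rng(6)
dim = 5
rand = lambda: rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
A, B, C, Dm = rand(), rand(), rand(), rand()
comm = lambda X, Y: X @ Y - Y @ X

lhs = comm(A, B) - comm(C, Dm)          # difference of commutators
rhs = comm(A - C, B) + comm(C, B - Dm)  # sum of commutators of differences
cyc1 = np.trace(A @ comm(B, C))
cyc2 = np.trace(C @ comm(A, B))
print(np.allclose(lhs, rhs), np.isclose(cyc1, cyc2))
```

Both identities hold for arbitrary (not necessarily hermitian) operators, as the random complex test matrices illustrate.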
Lemma 7 (Bounding canonical brackets). Let Â, B̂ ∈ L(C^{×D}). We have upper bounds on the bracket and on the difference of brackets. Proof. The bounds follow once we use that, via the pinching inequality, in general ∥∆(Â)∥ ≤ ∥Â∥ [76, Eq. (IV.51) on p. 97], which together with the triangle inequality implies that ∥σ(B̂)∥ ≤ 2∥B̂∥. Notice, which is useful generally, that σ(·) is linear. Using (76) and (82) we obtain the claim. Lemma 8 (Diagonal-off-diagonal orthogonality). Let Â, B̂ ∈ L(C^{×D}). We have the Hilbert-Schmidt orthogonality of the diagonal and off-diagonal restrictions, ⟨∆(Â), σ(B̂)⟩_HS = 0. Proof. Let us consider a basis {|k⟩}_{k=1,...,D}; by direct computation we find that an off-diagonal operator has vanishing Hilbert-Schmidt overlap with any diagonal operator. The relations (89) and (77) turn out to be quite crucial. Let Â, B̂, Ĉ, D̂ ∈ L(C^{×D}) be hermitian; then we have the relation (94). Applications of this purely linear-algebraic relation can be found by rephrasing (94) using the Hilbert-Schmidt scalar product. This means that certain projections of off-diagonal restrictions of brackets can be evaluated as Hilbert-Schmidt overlaps of reshuffled brackets. Or, put differently, the nested bracket simplifies to the first-order bracket under the trace. This is not directly valid for higher-order nesting.
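Lemma 8 and the pinching inequality used in lemma 7 admit a direct numerical check: the diagonal and off-diagonal restrictions are Hilbert-Schmidt orthogonal (so the squared norm splits by Pythagoras), and pinching is norm non-increasing.

```python
# Check: <Delta(A), sigma(B)>_HS = 0, the Pythagoras split, and ||Delta(A)|| <= ||A||.
import numpy as np

rng = np.random.default_rng(4)
dim = 6
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
B = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))

delta = lambda M: np.diag(np.diag(M))        # diagonal restriction (pinching)
sigma = lambda M: M - delta(M)               # off-diagonal restriction
hs_inner = lambda X, Y: np.trace(X.conj().T @ Y)
hs = lambda M: np.linalg.norm(M)             # Hilbert-Schmidt norm

print(abs(hs_inner(delta(A), sigma(B))))     # orthogonality: numerically zero
print(np.isclose(hs(A) ** 2, hs(delta(A)) ** 2 + hs(sigma(A)) ** 2))
print(hs(delta(A)) <= hs(A))                 # pinching inequality, HS case
```

For the Hilbert-Schmidt norm the pinching inequality is immediate from the Pythagoras split; the lemma's statement for general unitarily invariant norms (e.g., the operator norm) is the nontrivial part [76].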
In particular, setting all the involved operators to be the Hamiltonian Ĥ, we find a formula which comes into play when analyzing short flow steps generated by the canonical bracket. If we consider a variational bracket with some diagonal operator B̂ = ∆^{(Z)}(Ĥ), then we find that the canonical bracket comes into play nonetheless. This discussion stresses the mathematical structures that lead to the appearance of the canonical bracket when considering such classes of double-bracket flows. Lemma 2 in the main text summarizes the application to double-bracket quantum algorithms.
Lemma 9 (Group commutator). Let Â = −Â†, B̂ = −B̂† ∈ L(C^{×D}) be anti-hermitian and let ∥·∥ be a unitarily invariant norm. We have the stated bound. Let us mention that we will typically use this bound for Â = i√s Ĵ and B̂ = i√s Ĵ′ for some real evolution duration s ∈ R and hermitian Ĵ, Ĵ′. We then get that the deviation from the exact bracket-evolution scales as O(s^{3/2}). Due to a cancellation, this is a reduced power (and thus a higher error for s ≪ 1) compared to, for example, the first-order composition, which thus converges at a faster rate. Proof. Define Q̂ as above. It will be useful to consider the smooth rescaling Â → xÂ for some x ∈ R, giving rise to a smooth matrix-valued function for which the integral form of the remainder in the Taylor expansion theorem applies. In the following we will use that, by unitary invariance of the norm, we get a bound for the remainder. We first check the lowest orders explicitly, and by an explicit calculation we may compute the derivative of Q̂x. By the fundamental theorem of calculus, the triangle inequality and unitary invariance of the operator norm we obtain the desired estimate. Next, we perform a telescoping step pivoting around [Â, B̂]. As an alternative, we may consider using more standard bounds together with the Taylor expansion formula (104). Our strategy will be that e^Â e^B̂ e^{−Â} e^{−B̂} ≈ e^{[Â,B̂]+R̂} ≈ e^{[Â,B̂]} e^{R̂} ≈ e^{[Â,B̂]} (114). We will use a bound which, we may mention, can be proven for any two anti-hermitian Â, B̂ by considering a suitable interpolation in x and using unitary invariance. Similarly, for any anti-hermitian Â, B̂ we get a bound which we prove analogously by defining Q̂(x) = e^{(1−x)Â} e^{(1−x)B̂} e^{x(Â+B̂)}, so that, using the corresponding derivative, we get a simplification of the bound (117) for ∥e^Â e^B̂ − e^{Â+B̂}∥. To proceed with the bound on the group commutator approximation, for notational purposes we introduce shorthand symbols. By using telescoping and the triangle inequality we find the claimed bound. Thus, if both Â and B̂ contribute O(s) in the bounds, then this derivation also yields a third-order bound O(s³).
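The O(s^{3/2}) error of the group commutator approximation can be observed numerically. In the hedged sketch below (our own toy, with arbitrarily chosen test matrices), setting Â = i√s Ĵ and B̂ = i√s Ĵ′ and shrinking s by a factor of 4 should shrink the error ∥e^Â e^B̂ e^{−Â} e^{−B̂} − e^{[Â,B̂]}∥ by roughly 4^{3/2} = 8, up to higher-order corrections.

```python
# Empirical scaling of the group commutator approximation error.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
dim = 5
def herm():                                       # random unit-norm hermitian
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    M = (M + M.conj().T) / 2
    return M / np.linalg.norm(M, 2)

J, Jp = herm(), herm()

def gc_error(s):
    A = 1j * np.sqrt(s) * J
    B = 1j * np.sqrt(s) * Jp
    V = expm(A) @ expm(B) @ expm(-A) @ expm(-B)   # group commutator
    return np.linalg.norm(V - expm(A @ B - B @ A), 2)  # vs e^{[A,B]} = e^{-s[J,J']}

e1, e2 = gc_error(1e-2), gc_error(1e-2 / 4)
print(e1 / e2)                                    # roughly 8 for small s
```

The observed ratio drifts away from 8 once higher-order terms matter, which is why the test below only brackets it loosely.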
Proof. The ansatz is based on the observation above; let us show by induction that it solves the recursion. For k = 0 we have α0 = 0, as it should be. Let us assume the relation (128) holds for some k ≥ 0 and show that it then holds for k + 1. The right-hand side of the recurrence for α_{k+1} reads as required.

B GWW flow as a quantum algorithm

B.1 GWW flow as a GWW DBI limit

We will consider the GWW flow unitary Ûℓ for some ℓ ∈ R and compare it to the unitary obtained after K steps of the GWW DBI. More specifically, we will set the recursion step duration to be s = ℓ/K, and it will be useful to compare to the GWW flow at flow parameters {ℓk = ks}_{k∈{0,1,...,K}}.
As in the main text, we start with k = 0, define the initial unitary, and, to generate the first recursion step approximation, define the corresponding bracket. For any k ∈ {1, . . ., K} we define the recursive step unitary. This allows us to define the iterated Hamiltonian approximation, and with this the approximation to the canonical bracket generating the next step. Proposition 11. We have the following bound. Proof. It will be helpful to freely use certain properties of Ûℓ, or more specifically of Ŵℓ.
By explicit computation and using the GWW equations we obtain the derivative relations. For all ℓ we have the corresponding norm bounds, additionally using (82) in lemma 7 for Â = B̂ = Ĥℓ. Using these relations, we obtain the bound; this holds thanks to ∫₀ˢ dr |s − r| = ∫₀ˢ dr (s − r) = s²/2. Using lemma 7 we obtain a term which can be bounded by the fidelity of the GWW flow approximation in the previous step, and so altogether we arrive at a recursive bound. From lemma 10 we obtain its solution, and we arrive at the proposition statement by using the inequality (1 + x)^r ≤ e^{xr} with x, r > 0.

B.2 Error terms of elemental components
Proposition 12 (Pinching elemental component). For s ∈ R set r = s/D. Then the stated bound holds. As evident from (157), this can be implemented using D applications of the evolution e^{−irĴ} and 2D flips Q̂k or Q̂k†. Proof. We can bound (165) by fixing an ordering of the terms in (14) and sequentially merging the terms together. In the first step we obtain a bound for k ∈ {1, . . ., ⌊D/2⌋} [16]. In each subsequent merging step, the norm used in the upper bound will increase by 4 after using the triangle inequality. In general we have a bound for all I, I′ ⊂ {1, . . ., D}. The sequence of mergings of this type can be kept track of using a tree with {1, . . ., D} as its leaves. When iterating over the tree we have to use the bound (160) at most ⌈log₂(D)⌉ times. For the special case D = 2^L this can be tightened. Lemma 13 (Refined pinching step). Proof. Let R ∈ N > 0. Using lemma 12 and then telescoping inductively, we find that if we set R = ⌈s^{−1/2}⌉ = Ω(s^{−1/2}), then we obtain the desired O(s^{3/2}) bound. Proposition 14 (Group commutator elemental component). Set r = √s/R. Then the stated bound holds. Proof. By telescoping, lemma 9 and proposition 12 yield the claim.
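The pinching channel underlying these elemental components is mixed-unitary. As a hedged illustration of this standard linear-algebra fact (a textbook variant using diagonal "clock" phases rather than the paper's phase-flip construction), averaging conjugations by the D unitaries Q_k = diag(w^{kj}) with w = e^{2πi/D} reproduces ∆ exactly:

```python
# Pinching as a uniform mixture of D diagonal unitary conjugations.
import numpy as np

D = 8
w = np.exp(2j * np.pi / D)
rng = np.random.default_rng(8)
A = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))

avg = np.zeros((D, D), dtype=complex)
for k in range(D):
    Q = np.diag(w ** (k * np.arange(D)))   # clock phases Q_k = diag(w^{k j})
    avg += Q @ A @ Q.conj().T
avg /= D

print(np.allclose(avg, np.diag(np.diag(A))))   # True: exact pinching
```

Each conjugation multiplies the matrix element A_{ij} by w^{k(i−j)}, and the average of these phases vanishes unless i = j, which is why off-diagonal terms are dephased away exactly.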

B.3 Convergence of the quantum algorithm to the GWW flow
When we considered the GWW DBI in subsec. B.1 above, we disregarded the fact that evolution according to the double bracket must be approximated on a quantum computer. Paralleling the notation of the GWW DBI unitaries Ûk, we define the GWW GCI unitaries Vk as follows: V0 = 1 and Ĵ0 = Ĥ0 (177), and for any k ∈ {1, . . ., K} we define the recursive step. This allows us to define the iterated Hamiltonian rotation. Note that we are using the canonical bracket Ŵ^{(can)} exactly for the GWW DBI and approximately (via the group commutator) for the GWW GCI.
Proposition 15 (Continuous flow limit). For fixed flow duration ℓ ∈ [0, ∞), the proposed quantum algorithm and the discretization of the GWW flow converge as stated; moreover, the stated relation to the continuous flow holds. Proof. We begin by splitting the deviation into two terms. Using lemma 7 we bound the first term, and for the second term we can use proposition 14. The resulting recursive bound can be solved using lemma 10. For the second part, we use proposition 11. Let us comment that the convergence to the GWW flow demonstrates the diagonalizing properties of the quantum algorithm, but this convergence is a weak notion due to a peculiar subtlety. In principle, one could attempt to prove that the quantum algorithm implements a σ-decreasing sequence for Ĥ. The proof of this would be harder because along the flow one would need to lower-bound the monotonicity gap in Eq. (34) to ensure that the σ-decreasing property holds. Instead, the above proposition shows convergence to the continuous flow and, by proxy, the diagonalizing properties follow. An additional demonstration of σ-decreasing properties would hint at convergence rates because the monotonicity gaps yield the block-diagonalization rate.

C Approximate pinching in locally interacting qubit systems
For µ, ν ∈ {0, 1}^{×L}, where L is the number of qubits, denote the corresponding operators. These operators form an orthonormal Hilbert-Schmidt basis of linear operators, which allows us to write any Hamiltonian Ĥ in this basis. In this notation, for η^{(1)}, . . ., η^{(R)} ∈ {0, 1}^{×L}, let us define approximate pinching as acting on an input operator Â ∈ L(C^{×D}). If R = D and all η^{(i)}'s are different, then this is simply the pinching channel ∆^{(η)} = ∆.
Proposition 16 (Typically balanced flips). Fix ν ∈ {0, 1}^{×L}. Let η^{(1)}, . . ., η^{(R)} ∈ {0, 1}^{×L} be R choices of phase flips drawn independently from the uniform distribution on {0, 1}^{×L}. The probability of the norm of the approximate pinching exceeding ϵ is bounded as stated. The proof is based on the basic combinatorial observation that, for any fixed ν ∈ {0, 1}^{×L}, the number of µ ∈ {0, 1}^{×L} which have an even overlap |µ ∩ ν| := Σ_{i=1}^L µ_i ν_i with it equals the number of those with an odd overlap. This implies that when drawing µ uniformly at random, the probability of an odd overlap with ν is exactly one half, Pr(|µ ∩ ν| is odd) = 1/2. This also comes into play in other settings; e.g., it is the reason why for certain states fidelity can be estimated independently of the number of qubits [77].
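The random-phase-flip mechanism can be simulated directly. The following hedged sketch (our own simulation, with an arbitrary 4-qubit Hamiltonian) averages conjugations by R uniformly random Pauli-Z phase-flip strings and shows the approximation error to the exact pinching channel shrinking as R grows, roughly like 1/√R as the Hoeffding bound suggests.

```python
# Approximate pinching from uniformly random Pauli-Z phase-flip strings.
import numpy as np

rng = np.random.default_rng(7)
L = 4; D = 2 ** L
bits = (np.arange(D)[:, None] >> np.arange(L)) & 1   # binary string of each j

M = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
H = (M + M.conj().T) / 2

def approx_pinch(H, R):
    out = np.zeros_like(H)
    for _ in range(R):
        eta = rng.integers(0, 2, size=L)             # random flip string
        Q = np.diag((-1.0) ** (bits @ eta))          # diagonal of the Z-string
        out += Q @ H @ Q                             # Q is real, Q† = Q
    return out / R

exact = np.diag(np.diag(H))
errs = [np.linalg.norm(approx_pinch(H, R) - exact, 2) for R in (8, 512)]
print(errs)
```

Every flip conjugation leaves the diagonal untouched, so the approximate channel is exact on ∆(Ĥ) for any R; only the residual off-diagonal part fluctuates and averages away.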

Proof. By an explicit computation we find that
and this implies the claimed bound. The summands are distributed as independent fair coins because Pr((−1)^{|η^{(k)}∩ν|} = 1) = 1/2. Thus, by applying Hoeffding's inequality to R tosses of a fair coin, i.e., a Bernoulli trial using the Rademacher distribution, we obtain the result. Typical balance of sequences of phase flips means that off-diagonal interaction terms in σ(Ĥ) individually become small very fast. Let us define the off-diagonal support set s = {µ, ν ∈ {0, 1}^{×L} s.t. H_{µ,ν} ≠ 0 and ν ≠ 0} and the sparsity S = |s| as its number of entries. Additionally, let us use the max-norm to denote the largest interaction strength, ∥Ĵ∥_max = max_{µ,ν∈{0,1}^{×L}} |J_{µ,ν}|.
Proposition 17 (Approximate pinching). For any Ĵ with sparsity S, pinching ∆(Ĵ) can be approximated in operator norm up to error ϵ > 0 and with failure probability δ > 0 using ∆^{(η)}(Ĵ) with an appropriately chosen number R of independently drawn phase flips. Proof. By the triangle inequality, if the η^{(i)}'s are such that for all µ, ν ∈ s the sequence is sufficiently balanced, i.e., ∥∆^{(η)}(X̂ν)∥ ≤ ε := ϵ/(S∥Ĵ∥_max), then ∥∆(Ĵ) − ∆^{(η)}(Ĵ)∥ ≤ ϵ. The probability of this sufficient condition failing to occur can be bounded using proposition 16. We arrive at the statement of the proposition by using the prescribed dependence of the number of trials R on the approximation error ϵ and failure probability δ.

D Additional plots
This appendix provides examples of figures for parameters other than those in the main text. Firstly, Fig. 7a discusses the analog of Fig. 3 but now for L = 7 sites. It also presents two more plots, one for the TFLIM and one for the TFIM with coupling strength JX = 1. Secondly, Fig. 8 shows eigenstates in analogy to Fig. 1 but now for the TFIM as opposed to the TFLIM, which showcases the difference in the flow when integrability is present. Finally, Fig. 9 shows the analog of Fig. 4 but focuses on the coupling JX = 1 as opposed to JX = 2 in the main text. The numerical code, written in Python, is freely available [78] and includes a class implementing a suite of functionalities for simulating double-bracket flows together with evaluation of observables and visualization. The entire project is accompanied by Jupyter notebooks, one for each figure appearing in this work. The optimization of the flow step durations is performed by either a grid or binary search, and the value yielding the largest σ-decrease is selected. Each of the figures can be produced in a matter of a few minutes. This timing grows significantly for system sizes L > 9, larger than considered here, and such simulations would thus carry extra cost in terms of greenhouse gases [79].

E Notes on the GWW flow

E.1 GWW flow as a system of ordinary differential equations
The system of equations (68) and (69) can be written as a double-bracket differential equation, ∂ℓ Ĥℓ = [[∆(Ĥℓ), Ĥℓ], Ĥℓ]. We can equivalently look at individual matrix elements with i, j = 1, . . ., D. After we expand the matrix elements of the canonical bracket further, we see that the prescribed function is a multivariate polynomial of degree 3.
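A hedged numerical sketch of this equation (using what we take to be the GWW double-bracket equation dH/dℓ = [[∆(H), H], H], with an arbitrary random input and step size of our choosing) is a forward-Euler integration along which the off-diagonal Hilbert-Schmidt norm decreases:

```python
# Forward-Euler integration of the GWW double-bracket flow.
import numpy as np

rng = np.random.default_rng(5)
dim = 6
M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (M + M.conj().T) / 2
H = H / np.linalg.norm(H, 2)              # unit spectral norm for stability

delta = lambda A: np.diag(np.diag(A))
off = lambda A: np.linalg.norm(A - delta(A))

dl = 0.01
norms = [off(H)]
for _ in range(300):
    W = delta(H) @ H - H @ delta(H)       # canonical bracket W_l = [Delta(H), H]
    H = H + dl * (W @ H - H @ W)          # Euler step of dH/dl = [W, H]
    norms.append(off(H))

print(norms[0], norms[-1])                # off-diagonal norm shrinks along the flow
```

Since W is anti-hermitian, the Euler update preserves hermiticity exactly, while unitarity of the induced transformation holds only up to the O(dl²) discretization error discussed in the text.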
Polynomials are locally Lipschitz-continuous, and thus by the Picard-Lindelöf theorem, for an appropriately restricted domain of flow durations ℓ and matrix element values of (Ĥℓ)_{i,j}, there is a unique solution [80]. Within that domain of flow durations Ĥℓ exists and hence Ŵℓ exists. The Dyson-Picard-Lindelöf series is then given by the time-ordered expansion, where ∆n(s) = {r ∈ R^{×n} : 0 ≤ r1 ≤ r2 ≤ . . . ≤ rn ≤ s} is the n-dimensional simplex. This series defines a unitary transformation Ĥ0 → Ĥℓ [41], which we may denote by Ûℓ. Alternatively, we may treat −iŴℓ as an input time-dependent Hamiltonian and write it in the short-hand notation of time-ordered exponentials. Polynomials are not globally Lipschitz-continuous, so the Picard-Lindelöf theorem does not directly yield a unique solution for ℓ → ∞. Ref. [80] stresses that lack of global Lipschitz-continuity can at times have drastic repercussions and exemplifies this with the ordinary differential equation dx(t)/dt = 1 + x(t)², whose solution is x(t) = tan(c + t), where c is a constant set by the initial condition, so that x(−c ± π/2) → ±∞. Thus, the above discussion using basic existence theorems from the theory of dynamical systems substantiates a priori the existence of the GWW flow only for a flow duration restricted by the local Lipschitz-continuity constant.
The monograph by Helmke and Moore [15] provides a proof of the existence of the solution to double-bracket flow equations for all ℓ using a differential-geometric approach.Thus, Eq. (216) is well-defined for all ℓ ∈ R.
Subsec. E.3 proves the existence of the GWW flow for all flow durations ℓ without resorting to the theory of ordinary differential equations, but rather by showing that the GWW DBI converges to the GWW flow. While this necessitates technical calculations, the proof is self-contained and constructive.

E.2 Monotonicity of GWW flow
When it exists, the GWW flow is smooth and so we can derive in the continuous case the precursor to the DBI monotonicity lemma 2 from the main text.
Proposition 18 (Monotonicity of GWW flow). We have ∂ℓ ∥σ(Ĥℓ)∥²_HS = −2∥Ŵℓ∥²_HS ≤ 0. Proof. We begin by studying the derivative of the off-diagonal terms σ(Ĥℓ). The proof will build upon several intermediate results. The exact unitary of the GWW flow Ûℓ will be defined as the limit of composing unitary operations with piece-wise constant generators. These generators will be built recursively from the previous steps. We will show that this sequence converges and that the limit unitary satisfies the GWW flow equation.
The sequence of approximations enumerated by N = 1, 2, . . . will be based on flow parameters ℓ_k^(N) = kℓ2^{−N} with k ∈ {0, 1, . . . , 2^N}. The first approximate flow step will be generated by the canonical bracket of the input Hamiltonian, and for each subsequent step the approximate canonical bracket is built recursively from the Hamiltonian obtained after the preceding step. The GWW flow unitary will be defined as the limit of these compositions. This can be interpreted as a limit of the GWW DBI, only we make explicit the dependence on the number of discretization steps N. Point-wise (i.e., for fixed ℓ) convergence in operator norm is implied by the sequence of GWW flow approximations being a Cauchy sequence, Eq. (257). We will denote the quantity appearing in the recursion of the upper bound by {α_k}_{k∈{0,...,2^N}}, so that the recursion can be written in terms of a = 16∥Ĥ∥² s^(N) and b = 64∥Ĥ∥² (s^(N))². Observing that b = 4a, we obtain from lemma 10 the desired bound. We continue by proving continuity of the unitary Ûℓ, where we define the discretization step sizes for each target flow duration as s = ℓ/N and s′ = (ℓ + ϵ)/N, and Ŵ_k, Ŵ′_k are the approximants to the canonical bracket for Ûℓ and Ûℓ+ϵ, respectively. We will arrive at a bound independent of N which will allow us to take the limit ϵ → 0.
We use appropriate telescoping, which allows us to iteratively relate the bound to previous discretization steps, and then again use proposition 7. For both flows the total duration differs but the starting direction is the same, so Ŵ1 = Ŵ′1, and using lemma 10 we find the bound. Here the first term was the regular Lie derivative of the matrix exponential, and so we need to establish that the deviation from a constant generator vanishes for ϵ → 0. To prove this, let us first notice that Ŵℓ = Ŵ^(N)_1(Ĥℓ) ≡ Ŵ^(N). Proof of proposition 19 (Existence of GWW flow). Summarizing, the limit unitaries Ûℓ defined in Eq. (238) exist and they satisfy the GWW flow equation, which proves the existence of the GWW flow.
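The piecewise-constant construction above can be mimicked numerically. The sketch below is our own illustration under assumed notation: it composes 2^N exponentials of duration ℓ2^{−N}, recomputing the canonical bracket after each step, and checks that successive discretization levels N and N + 1 approach each other, consistent with the Cauchy-sequence argument.

```python
import numpy as np

# Sketch (our own illustration) of the piecewise-constant approximants
# U_l^(N): 2**N steps of duration s = l / 2**N, with the canonical
# bracket W = [Delta(H), sigma(H)] recomputed after each step.  The
# gaps between levels N and N+1 should shrink, consistent with the
# Cauchy-sequence argument for the existence of the GWW flow.

rng = np.random.default_rng(1)

def delta(H):
    return np.diag(np.diag(H))

def bracket(H):
    S = H - delta(H)
    D = delta(H)
    return D @ S - S @ D

def expm_skew(W, s):
    """exp(s W) for anti-Hermitian W, via eigh of the Hermitian -i W."""
    evals, Q = np.linalg.eigh(-1j * W)
    return Q @ np.diag(np.exp(1j * s * evals)) @ Q.conj().T

def flow_unitary(H0, l, N):
    """Compose 2**N steps of duration l / 2**N, refreshing the bracket."""
    s = l / 2 ** N
    H = H0.astype(complex)
    U = np.eye(H0.shape[0], dtype=complex)
    for _ in range(2 ** N):
        V = expm_skew(bracket(H), s)
        U = V @ U
        H = V @ H @ V.conj().T
    return U

A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
H0 = (A + A.conj().T) / 2
l = 0.2

gaps = [np.linalg.norm(flow_unitary(H0, l, N + 1) - flow_unitary(H0, l, N), 2)
        for N in range(2, 6)]
print(gaps)  # successive gaps shrink as the discretization is refined
```

Each doubling of the number of steps roughly halves the gap, as expected for a first-order discretization of the flow.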
The expected values E^(k)_µ can be estimated in the Schrödinger picture by evolving an initial state forward, |µ⟩ → V̂k|µ⟩, and estimating the expectation value of the input Hamiltonian Ĥ0. Fig. 1c) shows examples of initial states |µ⟩ whose expected values of energy stabilize to a constant level after relatively few GWW DBI steps.
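The Heisenberg/Schrödinger equivalence used here can be sketched in a few lines. The following is our own illustration with a random unitary standing in for the DBI circuit V̂k, assuming the convention Ĥk = V̂k†Ĥ0V̂k, which matches the forward evolution of |µ⟩ described above.

```python
import numpy as np

# Sketch of the equivalence used above (our own illustration): the
# diagonal entry (H_k)_{mu,mu} of the rotated Hamiltonian
# H_k = V_k^dag H_0 V_k equals the expectation value of the *input*
# Hamiltonian H_0 in the forward-evolved state V_k |mu>.

rng = np.random.default_rng(2)

d, mu = 8, 3
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H0 = (A + A.conj().T) / 2

# stand-in for the DBI circuit unitary V_k: a random unitary from QR
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

Hk = Q.conj().T @ H0 @ Q            # Heisenberg picture: rotate H_0
mu_state = np.zeros(d, dtype=complex)
mu_state[mu] = 1.0
evolved = Q @ mu_state              # Schrodinger picture: evolve |mu>

E_heisenberg = np.real(Hk[mu, mu])
E_schrodinger = np.real(evolved.conj() @ H0 @ evolved)
print(E_heisenberg, E_schrodinger)  # identical up to rounding
```

This is why no access to Ĥk itself is needed on hardware: only the circuit and measurements of Ĥ0.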

Figure 1 :
Figure 1: GWW DBI aiming to approximate eigenstates of the transverse- and longitudinal-field Ising model ĤTLFIM of L = 9 qubits. a) During the DBI, the decrease of the Hilbert-Schmidt norm of the off-diagonal restriction progresses in several stages. Optimization of the DBI step durations s_k results in some steps yielding larger overall decreases than others, which can be explained by the lifting of degeneracies. b) The energy spectrum of the Hamiltonian overlaid with the sorted entries of the diagonal restrictions ∆(Ĥk) after k = 3, 15 and 30 steps. It hints that later DBI stages affect the states in the middle of the spectrum, where the energy gaps tend to be small. In particular, approximate degeneracies which show up as steps of ∆(Ĥ3) become lifted through further recursion steps, as evidenced by the softened progression of ∆(Ĥ15). c) Expected values of the energy E^(k)_µ = ⟨µ|Ĥk|µ⟩ computed for a selection of computational basis states |µ⟩, which is equivalent to applying the DBI circuit to that state and evaluating the expected value of the input Hamiltonian. The two fully polarized states were selected because they tend towards states at the edges of the spectrum. Additionally, the selection includes four other computational basis states with small energy fluctuations after k = 15 DBI steps. Expected values of the energy E^(k)_µ change mostly during the first few DBI steps and then stabilize to a constant level. The colored areas around E^(k)_µ indicate the energy fluctuations.

Figure 2 :
Figure 2: Example of GWW DBI applied to the transverse-field Ising model on L = 3 sites. a) In each step k = 1, . . . , 15 the canonical bracket (15) generates the evolution and the flow step duration is chosen to give the maximal decrease of the norm of the off-diagonal restriction ∥σ(Ĥk)∥HS. After about 5 steps of a rapid decline, the norm decrease begins to saturate and essentially ceases completely. This is due to the convergence to the block-diagonal form of the Hamiltonian indicated in the upper-right inset. b) For small DBI rotation durations the norm decreases linearly, in accordance with the monotonicity relation (24). Evolution beyond the first minimum can yield another, deeper local minimum, e.g., for k = 2 the second minimum has been chosen over the first. If the DBI rotation duration is chosen wrongly, the norm of the off-diagonal restriction may increase, e.g., for k = 1 most s do not yield a diagonalizing step. c) The monotonicity relation (24) implies that there is at least one negative minimum, but as the iteration progresses it becomes shallow. Additionally, as shown in the inset, the norms of the canonical brackets decay, and so the diagonalization rate slows down because the respective double-bracket rotations have a weakened influence.
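The step-duration optimization described in the caption can be sketched with a simple grid search. The code below is our own minimal illustration (not the paper's implementation): for a random Hermitian matrix it scans s, applies the rotation e^{sŴ}Ĥe^{−sŴ}, and keeps the s giving the smallest off-diagonal norm.

```python
import numpy as np

# Grid-search sketch (our own illustration) for the DBI step duration:
# pick s minimizing ||sigma(exp(s W) H exp(-s W))||_HS, where
# W = [Delta(H), sigma(H)] is the canonical bracket.

rng = np.random.default_rng(3)

def delta(H):
    return np.diag(np.diag(H))

def off_norm(H):
    return np.linalg.norm(H - delta(H))

def rotate(H, s):
    """Conjugate H by exp(s W) with W the canonical bracket of H."""
    S = H - delta(H)
    W = delta(H) @ S - S @ delta(H)        # anti-Hermitian
    evals, Q = np.linalg.eigh(-1j * W)     # W = i M with M Hermitian
    V = Q @ np.diag(np.exp(1j * s * evals)) @ Q.conj().T
    return V @ H @ V.conj().T

A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
H = (A + A.conj().T) / 2

s_grid = np.linspace(0.0, 1.0, 201)
norms = [off_norm(rotate(H, s)) for s in s_grid]
s_best = s_grid[int(np.argmin(norms))]
print(s_best, min(norms), off_norm(H))  # the best step reduces the norm
```

Scanning far enough in s also exposes the deeper secondary minima mentioned in panel b), beyond the first local minimum.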

Figure 3 :
Figure 3: Illustration of variational DBI choosing either the canonical bracket with D̂k = ∆(Ĥk) or D̂k = Ẑµ, whichever yields a larger decrease of ∥σ(Ĥk)∥HS, applied to the TLFIM on L = 5 sites with JX = 2. This example shows that by going beyond the dephasing channel which gives rise to the GWW DBI, it is possible to proceed with the diagonalization, because for k ≥ 6 the product operators yield better diagonalizing rotations.
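The variational selection between the canonical bracket and Z-product generators can be sketched as follows. This is our own illustration on a random Hermitian matrix (not the TLFIM instance of the figure), assuming a simple grid search over step durations for each candidate diagonal generator.

```python
import numpy as np
from itertools import product

# Variational-DBI sketch (our own illustration): at each step compare the
# canonical generator D = Delta(H) against diagonal Pauli-Z product
# operators, and keep whichever rotation exp(s [D, H]) yields the
# smallest off-diagonal norm over a small grid of step durations s.

rng = np.random.default_rng(4)

def delta(H):
    return np.diag(np.diag(H))

def off_norm(H):
    return np.linalg.norm(H - delta(H))

def rotate(H, D, s):
    """Conjugate H by exp(s W) with W = [D, H] for a diagonal D."""
    W = D @ H - H @ D                       # anti-Hermitian
    evals, Q = np.linalg.eigh(-1j * W)      # W = i M with M Hermitian
    V = Q @ np.diag(np.exp(1j * s * evals)) @ Q.conj().T
    return V @ H @ V.conj().T

def z_string_diag(bits, L):
    """Diagonal matrix of the Pauli-Z string selected by bits (1 = Z)."""
    d = np.ones(2 ** L)
    for q, b in enumerate(bits):
        if b:
            d *= np.array([1.0 if (n >> (L - 1 - q)) & 1 == 0 else -1.0
                           for n in range(2 ** L)])
    return np.diag(d)

L = 3
A = rng.normal(size=(2 ** L, 2 ** L)) + 1j * rng.normal(size=(2 ** L, 2 ** L))
H = (A + A.conj().T) / 2
z_diagonals = [z_string_diag(bits, L)
               for bits in product([0, 1], repeat=L) if any(bits)]

s_grid = np.linspace(0.0, 0.5, 51)
initial_norm = off_norm(H)
for step in range(3):
    _, s_best, D_best = min(((off_norm(rotate(H, D, s)), s, D)
                             for D in [delta(H)] + z_diagonals
                             for s in s_grid),
                            key=lambda t: t[0])
    H = rotate(H, D_best, s_best)

final_norm = off_norm(H)
print(initial_norm, final_norm)
```

Because s = 0 is in the grid, each step can never increase the off-diagonal norm, and the canonical choice guarantees a strict decrease for small s by the monotonicity relation.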

Figure 4 :
Figure 4: Comparison of variational and GWW DBIs. a) The Hilbert-Schmidt norm of the off-diagonal restriction decreases more rapidly for the variational DBI. The appearance of the variational bracket Ŵ^(010) has a significant impact, namely it lifts a degeneracy. b) Each line is the expected energy value during the progression of the GWW DBI, evaluated for all 2^L = 8 computational basis initial states. The large energy fluctuation of the states with intermediate expected energy value can be explained by convergence towards a superposition of two degenerate states. c) The degeneracy of the initial computational basis states becomes fully lifted after k = 4 flow steps and the expected energy fluctuation becomes negligible in the course of the DBI.

Figure 5 :
Figure 5: Results of the GWW DBI for the 2d Anderson model. a) Decrease of the Hilbert-Schmidt norm of the off-diagonal restriction ∥σ(h_k)∥HS. b) A representation of the initial coupling matrix h_0 of the input model Ĥ0 = Ĥ(h_0) as a 2d square graph with edge widths encoding the coupling strengths. In accordance with the 2d geometry, there are only nearest-neighbor couplings; additionally we consider random uniform disorder, which is encoded in the width of the self-links. c) Just a handful of DBI steps suffices to nearly decouple the neighbors. d) As the DBI progresses, distant sites become coupled, but the coupling strength is negligible and the system Hamiltonian becomes nearly diagonal, with the sites of the lattice nearly decoupled. The width of the self-links encodes the approximation to the single-particle eigenvalues.
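The single-particle setting of this figure can be reproduced in miniature. The sketch below is our own illustration (the lattice size, hopping strength and disorder width are assumptions, not the figure's parameters): it builds the nearest-neighbor coupling matrix with on-site disorder and runs a few grid-optimized GWW DBI steps directly on the (L·L) × (L·L) matrix.

```python
import numpy as np

# Sketch (our own illustration) of the single-particle 2d Anderson
# setting: nearest-neighbor hopping on an L x L square lattice plus
# uniform random on-site disorder, flowed by the GWW DBI acting
# directly on the (L*L) x (L*L) coupling matrix h.

rng = np.random.default_rng(5)

def build_anderson_2d(L, t=1.0, w=2.0):
    """Coupling matrix h0: hopping t between neighbors, disorder in [-w, w]."""
    n = L * L
    h = np.zeros((n, n))
    for x in range(L):
        for y in range(L):
            i = x * L + y
            h[i, i] = rng.uniform(-w, w)                 # on-site disorder
            if x + 1 < L:                                # neighbor below
                h[i, (x + 1) * L + y] = h[(x + 1) * L + y, i] = t
            if y + 1 < L:                                # neighbor to the right
                h[i, x * L + y + 1] = h[x * L + y + 1, i] = t
    return h

def delta(h):
    return np.diag(np.diag(h))

def dbi_step(h, s_grid):
    """One DBI step with the canonical bracket and a grid-optimized duration."""
    S = h - delta(h)
    W = delta(h) @ S - S @ delta(h)                      # antisymmetric
    evals, Q = np.linalg.eigh(-1j * W)
    best = None
    for s in s_grid:
        V = Q @ np.diag(np.exp(1j * s * evals)) @ Q.conj().T
        hs = np.real(V @ h @ V.conj().T)                 # stays real symmetric
        norm = np.linalg.norm(hs - delta(hs))
        if best is None or norm < best[0]:
            best = (norm, hs)
    return best[1]

h = build_anderson_2d(L=4)
norm0 = np.linalg.norm(h - delta(h))
for _ in range(10):
    h = dbi_step(h, np.linspace(0.0, 0.5, 26))
final_norm = np.linalg.norm(h - delta(h))
print(norm0, final_norm)  # the off-diagonal norm shrinks substantially
```

After the flow, the diagonal of h approximates the single-particle eigenvalues, mirroring the self-link widths in panel d).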

Figure 8 :
Figure 8: Analogous to Fig. 1, GWW flow of computational basis states towards eigenstates, but for the transverse-field Ising model ĤTFIM instead of the TLFIM. a) The decrease of the norm of the off-diagonal restriction progresses smoothly instead of in stages, and a lower value of the norm of the off-diagonal restriction is reached. The degenerate steps in the diagonal restrictions (inset) are less pronounced and thus match the spectrum faster. b) Expected values of the energy tend towards their asymptotic values faster than for the TLFIM. States flowing towards the outer edges of the spectrum exhibit periodic increases of energy fluctuations, which do not appear for the TLFIM and may be due to integrability. This signifies coupling and decoupling to other states and may be necessary to lift degeneracies in general. c) States that reach the smallest energy fluctuations after k = 30 flow steps accelerate their convergence after about k = 10 steps, when the overall norm from panel a) moves away from the slowly decreasing plateau and the oscillations of energy fluctuations begin to appear.

Figure 9 :
Figure 9: Comparison of variational and GWW flows in analogy to Fig. 4 but for JX = 1 instead of 2. a) The Hilbert-Schmidt norm of the off-diagonal restriction decreases more rapidly for the variational double-bracket flow, but sometimes locally optimal choices lead to smaller decreases in the long run (as evidenced by the two σ-decrease curves meeting after two steps). b) The fully polarized states converge very fast when using the GWW flow. c) Again, the degeneracy of the initial computational basis states becomes fully lifted after only a handful of flow steps, and the expected energy fluctuation becomes negligible in the course of the flow.