Two-Unitary Decomposition Algorithm and Open Quantum System Simulation

Simulating general quantum processes that describe realistic interactions of quantum systems following a non-unitary evolution is challenging for conventional quantum computers that directly implement unitary gates. We analyze complexities for promising methods such as the Sz.-Nagy dilation and linear combination of unitaries that can simulate open systems by the probabilistic realization of non-unitary operators, requiring multiple calls to both the encoding and state preparation oracles. We propose a quantum two-unitary decomposition (TUD) algorithm to decompose a $d$-dimensional operator $A$ with non-zero singular values as $A=(U_1+U_2)/2$ using the quantum singular value transformation algorithm, avoiding classically expensive singular value decomposition (SVD) with an $O(d^3)$ overhead in time. The two unitaries can be deterministically implemented, thus requiring only a single call to the state preparation oracle for each. The calls to the encoding oracle can also be reduced significantly at the expense of an acceptable error in measurements. Since the TUD method can be used to implement non-unitary operators as only two unitaries, it also has potential applications in linear algebra and quantum machine learning.


Introduction
In the early 1980s Manin [1] and Feynman [2] independently proposed that in order to circumvent the prohibitive scaling in the simulation of quantum systems, one would need a device that operates according to the principles of quantum mechanics.These considerations gave birth to the field that we now call quantum computing [3].Since then an entire zoo of quantum algorithms for simulating quantum processes has emerged, some of them promise spectacular improvements over their classical alternatives, even exponential speedups [3,4,5,6,7,8].However, most of these prominent quantum algorithms are solely tailored for unitary simulations of isolated quantum phenomena.Since such a quantum device is universal in its computational nature, one may expect that its applicability reaches beyond the standard unitary framework.
In general, quantum systems of interest are rarely isolated as they are exposed to the inevitable influence of their environments.The environment consists of inaccessible degrees of freedom of the system's natural surroundings as well as from the control devices that can become entangled with the system, ultimately leading to a loss of information.Therefore, the dynamics of a general quantum process must be described by a non-unitary, open quantum evolution [3,9,10,11].Understanding open quantum dynamics is indispensable to the study of dissipation and decoherence in quantum systems and therefore, noise in quantum circuits [12,13,14,15,16,17], and is fundamental to a wide variety of phenomena such as thermalization [18,19], non-equilibrium steady states [20,21,22], transport in strongly correlated systems [23,24] as well as applications in quantum biology [25,26,27,28].Moreover, open quantum dynamics can be harnessed to perform universal quantum computation [29,30] and prepare topological [31,32,33] as well as entangled states [34,35,36,37].Simulating the dynamics of open quantum systems is vital to understanding these processes.
Unlike isolated quantum evolution, open quantum evolution is non-unitary and thus presents a challenge for conventional quantum computers that are capable of directly implementing unitary gates.Quantum algorithms such as trotterization of Lindbladians [38,39,40,41] and simulation of Markovian open dynamics from a set of universal channels [42,43] rely on dilation techniques of encoding a non-unitary operator in a larger unitary.Recent advances have lead to promising methods such as the Sz.-Nagy encoding, which provides a minimal dilation [44,26,45,46].Another useful method for realizing a non-unitary operator is by implementing it as a linear combination of unitaries (LCU) [47,48], recently used to simulate Lindblad evolution [49].In Ref. [50], the authors propose implementing an arbitrary operator as a linear combination of four unitaries created by approximate operator exponentiation.A recent work [51] implements the non-unitary operators in the singular basis by calculating the singular value decomposition (SVD) classically, thus incurring significant overhead for large systems.Additionally, non-unitary operators may be implemented via a combination of unitary operations and measurements requiring a feedback loop [52,53], imaginary time evolution algorithms [54,55,56] that need quantum tomography at each step, and even with methods that use the intrinsic dissipation of the quantum computer [57].
In this article, we first analyze prominent techniques such as the Stinespring dilation, the Sz.-Nagy dilation and the LCU method and discuss their implementation of non-unitary operators to simulate open quantum dynamics and calculate expectation values of observables.The Sz.-Nagy and LCU methods implement the dynamical map by the probabilistic realization of each non-unitary Kraus operator individually.We estimate the number of encoding oracle calls, state preparation oracle calls, and auxiliary qubits required for successful implementation of individual operators in the dynamical map.We provide a fully quantum method to estimate the expectation value of an observable to a specified confidence level, improving on previous methods that use Cholesky decomposition, which incurs O(d 3 ) of classical overhead, where d is the dimension.In particular, given an initial state |ψ prepared by the state preparation oracle S and a contraction operator A block encoded as an isometry (unitary) U A , the state A |ψ / √ p where p = ψ| A † A |ψ , 0 < p ≤ 1, can be successfully implemented with probability at least 1 − β using O(1/p log(1/β)) calls to both U A and S. For realistic cases of interest such as the amplitude damping channel, a particular Kraus operator can have p 1 which leads to a large number of calls to both U A and S, which can be especially limiting if the initial state is difficult to prepare.We provide a quantum algorithm to decompose any arbitrary d-dimensional contraction operator A ( A ≤ 1) 1 , with non-zero singular values into two unitaries such that A = (U 1 + U 2 )/2 using the quantum singular value transformation (QSVT) algorithm [58,59,60,5], without explicitly computing the SVD, which has O(d 3 ) of classical overhead.Given a state |ψ prepared by S and a block encoding U A of an arbitrary operator A, our two-unitary decomposition (TUD) algorithm implements Ũ1 |ψ , Ũ2 |ψ which are approximations of U 1 |ψ , U 2 |ψ with probability at least 1 − β using O(log(1/β)) calls to S, and O(1/δ log(1/ ) log(1/β)) calls to U A and U † A , where the singular values σ i of A are such that σ i ∈ [δ, 1 − δ] ∀i.Therefore, instead of implementing A |ψ directly which is probabilistic, the TUD algorithm implements Ũ1 |ψ and Ũ2 |ψ deterministically with each requiring approximately one use of state preparation oracle S or a single shot of |ψ .This is a considerable improvement from the previously needed O(1/p log(1/β)) calls to S. We show that the unitary decomposition can be used to estimate the expectation value of an observable with respect to A |ψ which can be done with Hadamard tests.Since the TUD method can be used to implement non-unitary operators as only two unitaries, it also has potential applications in linear algebra and quantum machine learning [61,62,63].  2 Open Quantum Systems and Kraus Operators The evolution of an isolated system in a mixed quantum state is described by the von Neumann equation where H(t) is the system's Hamiltonian and ρ t := ρ(t) is the density matrix (in = 1 units).The formal solution to Eq. ( 1) is a unitary transformation where U t = T exp −i t 0 H(t )dt and T is the time-ordering operator, simplifying to U t = exp(−iHt) when H is time-independent.
The system and environment together undergo the unitary evolution of an isolated quantum process.To describe the effective dynamics of the system, we trace out the environmental degrees of freedom obtaining the open quantum evolution.Without loss of generality, one can assume that the system is initially decoupled from its environment such that the entire density matrix is ρ 0 = ω E ⊗ ρ S , where ω E , ρ S are the initial density matrices of the environment and system, respectively.The total evolution U t = T exp −i t 0 H Total (t )dt is described by a Hamiltonian of the general form , where H S , H E act only on the system and the environment respectively, while H I is responsible for their interaction.This allows us to describe the effective dynamics of the d-dimensional system in terms of a dynamical map Λ t : B(H S ) → B(H S ) given by where H S is the Hilbert space of the system, Tr E is the partial trace over the Hilbert space of the environment, and m ≤ d 2 .The dynamical map Λ t is a completely positive trace preserving (CPTP) map which describes physically valid transformation from quantum states to quantum states [9].Following the partial trace, one obtains a set of Kraus operators A k ∈ B(H S ), where B(H S ) is the space of bounded linear operators.The Kraus operators are in general time-dependent, non-unitary, and also depend on the initial environmental state.For example, if where |k is a basis for the Hilbert space of the environment.Furthermore, the Kraus representation of the dynamical map is not unique as the partial trace over the environment is independent of the choice of basis.The Kraus operators themselves lack any special structure apart from the following trace-preservation constraint: For notational clarity we shall suppress any explicit dependence on time or other parameters unless stated otherwise: Since a quantum computer only has access to unitary operations and measurements, the implementation of dynamical maps comprised of non-unitary Kraus operators is a challenge.Implementing the transformation in Eq. (3) by implementing the entire unitary U t that describes the system and its environment together quickly becomes infeasible as the environment can often have large or even infinite dimensionality.Therefore it is necessary to develop quantum algorithms that can emulate the effect of environmental influence indirectly.Given Kraus operators A k block encoded in unitaries U k , we estimate the number of resources to implement each non-unitary Kraus operator individually on a quantum computer as well as the entire map given by Eq. (3).In many practical cases, m can be assumed to be O(poly log d) and therefore, the implementation of the map becomes practically feasible.For brevity we restrict further analysis to ρ S = |ψ ψ|, where |ψ is created by the oracle S, with the extension to mixed states being straightforward.Additionally, we analyze the complexity to estimate the expectation value of a Hermitian observable O, such that the estimator is O := 1/N N i=1 x i where x i is the i th measured (randomly sampled) value of the random variable X, where E(X) = O and N is the total number of measurements.In practice N will be selected so that the variance of the estimate (or standard error of the mean) is below a certain pre-chosen threshold V ar( O ) = V ar(X)/N ≤ v, as discussed in more detail in App. C.

Stinespring Dilation
Simulating the dynamical map of Eq. (3) by realizing the entire evolution of the system and its environment together as a unitary is often infeasible.A natural and intuitive first approach is to instead emulate the degrees of freedom of the environment by a set of auxiliary qubits that may be traced out to obtain the desired map.The Stinespring dilation theorem, proposed long before the advent of quantum simulation, offers a way to achieve this 'mimicking' of the environmental degrees of freedom.
Theorem 1 (Stinespring Dilation theorem) Let Λ : B(H) → B(H) be a quantum CPTP map over a finite dimensional Hilbert space H. Then there exists a Hilbert space E, a unitary operator U St over the joint Hilbert space E ⊗ H, and a quantum state ω ∈ B(E), such that where dim(E) ≤ dim(H) 2 , and the representation is unique up to a unitary equivalence.The Kraus operators can be embedded in the first block-column of the Stinespring dilation unitary U St that acts on the system and a set of auxiliary qubits emulating the environment where m ≤ d 2 .We apply the unitary and trace out the auxiliary qubits-analogous to tracing out the environment-to obtain the dynamical map Given the Kraus operators A k , the md dimensional matrix U St may be created by filling in the rest of (m − 1)d columns via the Gram-Schmidt process, incurring a classical overhead of O(m 3 d 3 ).The quantum circuit for realizing the unitary U St in Fig. 1 uses O(log(m)) auxiliary qubits, and may be implemented with O(m 2 d 2 ) one and two-qubit gates in the worst case [3].The Stinespring method implements the entire dynamical map in one go and therefore, one can simply proceed to measure the observable O.The number of shots N is chosen such that V ar( O ) ≤ v.
One way to possibly reduce the number of gates is by completing the Stinespring unitary to minimize the number of gates in its decomposition rather than by a naive application of Gram-Schmidt procedure.While the above analysis does not assume any knowledge about the Kraus operators, the gate decomposition complexity can also improve greatly for some sparse or specially-structured Kraus operators.
Example 1 Let's consider the amplitude damping channel, with Kraus operators The Stinespring unitary may be filled in as follows The channel is implemented with the aid of a single auxiliary qubit initialized in state |0 which is traced out at the end of the evolution.The last two columns of the unitary are easy to obtain by inspection, due to the relative sparsity of the operators, and it is also Example 2 Simulating a Unitary Ensemble: This is inspired by the observation that many noise sources in controllable quantum systems stem from imperfect control, leading to a unitary implementation which can vary from realization to realization, e.g.disordered evolution [64].One example is that of a continuous control pulse used to enact a particular unitary gate, where the pulse itself is prone to statistical fluctuations [65].Moreover, Pauli errors are captured by the same framework.For concreteness, we consider the case of random unitary channels [66] also known as random external fields [67], where the quantum map is described by a discrete convex combination of unitaries U i where q i > 0 are probabilities ( i q i = 1).An illustration of the proposed scheme to enact Eq. ( 8) is shown in Fig. 2, which proceeds in two steps.First, a state preparation unitary S is implemented to prepare with O(log m) auxiliary qubits the state m i=1 √ q i |i .Next, multi-controlled unitaries CU i are applied, conditioned on the state of the auxiliary qubits, i.e.CU i |i |ψ = |i U i |ψ .The state of the original system plus auxiliary qubits is given by where for ease of notation we assume the initial state is pure |ψ , though nothing in the above scheme is modified if it is an arbitrary mixed state.Finally, by tracing out (i.e.leaving unmeasured) the auxiliary qubits, one can see Eq. (9) becomes precisely Eq. (8): At this point, any expectation value can be measured as desired using this output state.

Parallel Simulation of a Dynamical Map
A natural attempt at overcoming the difficulty of creating and implementing the entire Stinespring unitary is to implement each Kraus operator individually in parallel [44], which we will see below can reduce the gate complexity.Since in general the Kraus operator A k is not unitary, it can itself be encoded within a larger unitary U k which can be implemented on a quantum computer.The Kraus operator can then be accessed via measurement of the auxiliary encoding qubits leading to its probabilistic implementation.We define a general block encoding as below.
Definition 1 (Block Encoding [58]) Assuming that A is an n-qubit operator2 , U is a unitary acting on n + qubits, and α ∈ R + is an upper bound on the maximum singular value of A, then we say that U is a (α, , ∆) block-encoding of A if Without loss of generality, we assume that A/α is the top-left block of U where A/α ≤ 1.The trace preservation constraint implies that all Kraus operators are contractions such that A k ≤ 1 ∀k.Therefore they may be embedded with α = 1 for any general block encoding [44].We assume that each A k is block encoded in its respective unitary U k with encoding auxiliary qubits denoted by a (1, , 0) block encoding.The action of each unitary on the state |ψ is given as where where N k is the number of shots or measurements for each term A † k OA k .A completely arbitrary block encoding U k uses O(L 2 d 2 ) one and two qubit gates for its decomposition in the worst case where L = 2 .The total gate complexity is O(mL 2 d 2 ) and total number of additional qubits O(m log(L)) for the m Krauses.Our process of measuring each expectation value only requires the implementation of block encoding U k and the (1, q, 0) block encoding U O of observable O using at most O((L 2 + Q 2 )d 2 ) one and two qubit gates for decomposition, where Q = 2 q .Arbitrary Kraus operators A k and thus arbitrary unitaries require an exponential number (with respect to the number of qubits) of one and two-qubit gates to decompose.In many realistic scenarios, each A k is a sparse matrix describing transitions between s quantum states where s = O(poly log d).In such cases, the number of one and two qubit gates for decomposition reduces to O(L 2 s 2 ).We discuss below that for special oracles like Sz.-Nagy and LCU encodings, this complexity can be reduced significantly.

Sz.-Nagy Dilation
The Sz.-Nagy dilation theorem guarantees a minimal block encoding of any arbitrary d-dimensional contraction A using only one auxiliary qubit.
Theorem 2 (Sz.-Nagy theorem [70,71,44]) For every contraction operator A : B(H) → B(H) ( A ≤ 1)3 there exists a unitary dilation operator U : B(H)⊕B(H) → B(H)⊕B(H) where Hu, Xia and Kais [44] give a quantum algorithm where each Kraus operator A k is encoded in its respective Sz.-Nagy unitary denoted by U SN k which is a (1,1,0) block encoding.The action of each unitary on a state and an auxiliary qubit is defined as Since U k is a 2d dimensional matrix, its decomposition uses O(d 2 ) one and two qubit gates for the worst case.The total space complexity for simulating the full map of m Kraus operators, in the worst case, is therefore O(md 2 ), and total number of additional qubits is O(m).The same results for general block encodings apply here except that the number of gates required for decomposition is O(d 2 ) which is a great reduction as it removes the dependence on L which in general can depend on d.
Hu, Xia and Kais [44] provide an innovative method to calculate the expectation value by converting the observable O to a positive-semidefinite form Õ and performing Cholesky decomposition which can be incorporated into the dilation itself.This decomposition uses at most a classical overhead of O(d 3 ) per Kraus operator.We do not require the additional step of Cholesky decomposition using our method to estimate expectation value  -Nagy circuit becomes efficient to implement.However there are still m circuits to be implemented, which can be done in parallel.The Sz.-Nagy dilation needs only one auxiliary qubit regardless of the dimension of the encoded A k , making it the most economical block-encoding.However, constructing the dilation unitary might not be efficient as implementing the off-diagonal blocks Moreover, we show in Sec 4.2 that the knowledge of the off-diagonal components of the Sz.-Nagy unitaries is equivalent to the knowledge of expressing any d-dimensional Hermitian matrix in two unitaries or a d-dimensional general matrix in four unitaries, instead of the d 2 unitaries that span the basis.

Linear Combination of Unitaries
The Linear Combination of Unitaries (LCU) method allows for a probabilistic implementation of an arbitrary operator in a quantum circuit realized as a linear combination of unitaries [47,59,72].Using the LCU method we can block encode an arbitrary matrix into a larger unitary given that we know its decomposition as a sum of unitaries.Let an arbitrary operator A k be represented as a linear combination of L unitaries where and α ki ≥ 0, ∀k, i without loss of generality.It is assumed that information about each Kraus operator A k is provided in terms of α ki as a list of L numbers and each U ki is assumed to be implemented with at most c one and two qubit gates.Examples of such U ki include Pauli operators or more generally 1-sparse matrices.
Lemma 1 (LCU Lemma [47,59]) Given a set of α ki ≥ 0 and U ki , with U ki U † ki = I, such that A k = L i=1 α ki U ki , there exists a quantum circuit (Fig. 5) that implements where = log L, B k is a state preparation unitary such that The LCU circuit in Fig. 5 described by Eq. ( 17) implements the Kraus operator A k , requiring log L auxiliary qubits with at most Lc number of decomposition gates.The total gate complexity is at most mLc and the total number of qubits at most m log(L) .The LCU method implements the state A k |ψ /(α k √ p k ) with probability The previous discussion and all results about the number of oracle calls to U k and S hold with the modification k .Cleve and Wang in Ref. [49] show that all Kraus operators can be implemented in a single circuit by using an additional set of qubits applying the LCU method to sum over the Kraus operator index k.Their method can implement approximation of a quantum channel comprising of m Kraus operators in a 2 n dimenional space, with a quantum circuit of size O(q 2 m 2 (log(mq/ε)+n) log(1/ε) ), where each Kraus is given in a linear combination of q Paulis.

Two-Unitary Decomposition Algorithm
A non-unitary operator given in a block encoding unitary such as the Sz.-Nagy or LCU unitary has a probabilistic implementation.Each time the implementation fails, the block encoding unitary as well as the initial state preparation oracle must be applied again.In this section, we provide a quantum algorithm to decompose an arbitrary d-dimensional contraction operator A into two unitaries such that A = (U 1 + U 2 )/2 using the quantum singular value transformation (QSVT) algorithm [58,59,60,5].Given a state |ψ prepared by S and a (1, , 0) block encoding U A of an arbitrary operator A, the TUD algorithm implements Ũ1 |ψ , Ũ2 |ψ such that U 1 − Ũ1 , U 2 − Ũ2 ≤ , with probability at least 1−β using O(log(1/β)) calls to S, and O(1/δ log(1/ ) log(1/β)) calls to U A and U † A , where the singular values σ i of A are such that σ i ∈ [δ, 1−δ] ∀i.Any operator can be alternatively realized by implementing the two unitaries separately with each requiring one use of the state preparation oracle S (i.e. a single shot of |ψ ).This is a considerable improvement from the previously needed O(1/p log(1/β)) calls to S. We apply the TUD algorithm to implement non-unitary Kraus operators to simulate the dynamical map and calculate the expectation value of an observable in Sec.4.2.The queries to the block encoding unitaries U k can also be reduced significantly at the expense of an acceptable error in measurements.
We assume access to a (1, , 0) block encoding U A of an arbitrary operator A = ( 0| ⊗ ⊗ 1)U A (|0 ⊗ ⊗ 1) where = log(L).Information about a sparse A may typically be given in terms of its Pauli expansion, A = L i=1 α i P i , where P i are the products of d-Pauli matrices.The oracle can then be created with techniques such as LCU using Ref. [47,59] or Lemma 1 as shown in Sec.3.2.2.Mathematically, any arbitrary operator A can be expressed in two unitaries as A = (U 1 + U 2 )/2 which is guaranteed by the following theorem.
Theorem 3 (Two Unitary Decomposition [73,74]) For every contraction operator A : B(H) → B(H) ( A ≤ 1), there exist two unitary operators U 1 and The final application of the block-encoding oracle is A , which is U A or U † A for an odd or even polynomial of degree n respectively.(b) Circuit C in Eq. ( 21) to obtain the two decomposition unitaries Ũ1 and Ũ2 .
The above theorem relies on performing a SVD to obtain U 1 and U 2 which costs O(d 3 ) on a classical computer and is therefore prohibitively expensive for large quantum systems.We avoid this cost in our algorithm as we do not perform SVD.In essence, our algorithm relies on the insight that if the SVD of Our algorithm implements Ũ1 |ψ , Ũ2 |ψ where Ũ1 = A + i f (A) and Ũ2 = A − i f (A) such that U 1 − Ũ1 , U 2 − Ũ2 ≤ .We create the function f (A) which is an n-degree polynomial approximation of the function f (A) such that f (A) − f (A) ≤ using the QSVT algorithm as shown in Fig. 6 (a).The function f (A) is implemented with n/2 + 1 calls to U A and n/2 calls to U † A such that n is O(1/δ log(1/ )) where we assume that the singular values of A, σ i ∈ [δ, 1 − δ] ∀i, such that δ > 0 is known.We then use the LCU-addition circuit in Fig. 6 (b) to add A and ±i f (A) to obtain Ũ1,2 /2.Using oblivious amplitude amplification (OAA) then boosts the probability of implementation arbitrarily close to one [48,72], finally obtaining Ũ1 and Ũ2 .
The QSVT algorithm can implement the transformation for an odd degree polynomial of σ i .We must use an odd degree polynomial approximation as we require the basis of transformation to be Since the even function √ 1 − σ 2 cannot be approximated by an odd degree polynomial, we implement the polynomial approximation of the odd function sign(σ) √ 1 − σ 2 .Both functions are identical in the relevant domain σ ∈ (0, 1] as singular values are always nonnegative.We present a slightly modified algorithm for the case when δ = 0 in the next section by implementing A as a four-unitary decomposition.
To obtain the odd polynomial approximation of the function we apply the QSVT transformation shown in Fig. 6 (a) where n is an odd number, Π φ = e iφ(2Π−I) , Π = (|0 0|) ⊗ ⊗ I, and the set of phases {φ i } in Eq. ( 18) are efficiently computable [75,76,77].We project it to the block-encoded subspace to obtain and the relative error using a set of 51 angles {φ i }, respectively, computed using the pyqsp open-source repository [78].After the previous step, we have where We add U A and ±iU f (A) by implementing a LCU circuit to obtain Ũ1 and Ũ2 as shown in Fig. 6 (b) obtaining where . The decomposition unitaries Ũ1 and Ũ2 are each flagged by auxiliary qubits |0 |0 ⊗ +1 and |1 |0 ⊗ +1 .Each of the unitaries Ũ1 or Ũ2 can now be implemented deterministically by only using oblivious amplitude amplification once [48,72] for each case.We show in App.B that the probability of successfully implementing Ũ1 , Ũ2 is ∼ 1 − 3 2 /4.We shall neglect the dependence on 2 moving forward by choosing it small enough.A high level description of the TUD algorithm is given in Alg. 1.
We demonstrate our algorithm by decomposing a set of two-dimensional matrices A such that A = U diag(σ, σ)V † , where U and V † are both independently randomized 4 .We decompose A and implement Ũ1 and Ũ2 by running our two-unitary decomposition algorithm for an approximating polynomial or query complexity of n = 51.Figure 7 (c) shows the approximation error U 1 − Ũ1 and Fig. 7 (d) shows the probability of success of implementing Ũ1 with respect to singular values of A. We see that when the singular values of A fall within the range ∼ [0.1, 0.9], the approximation error remains below 10 −2 and the unitaries are deterministically implemented.
The algorithm produces the correct output with n = O(1/δ log(1/ )) uses of the oracle with the promise that singular values σ i ∈ [δ, 1 − δ] ∀i.In practice we control n and can increase it to obtain a better approximation.When given the minimum singular value δ, we can increase the degree of fitting polynomial f (x) in Fig. 7 A using 2 additional auxiliary qubits.O(log(1/β)) queries to the state preparation oracle S. Procedure: 1 Form the QSVT circuit of Eq. ( 18) for U f (A) as shown in Fig. 6 (a) by choosing 2 Implement the addition of U A and ±iU f (A) by using the LCU circuit in Fig. 6 (b) and perform oblivious amplitude amplification once for each Ũ1 , Ũ2 .The operator norm (or max singular value) of the difference between the approximate QSVT unitary Ũ and the ideal unitary U = A ± if (A), plotted vs. the singular value(s) of the input matrix A. We generated 1000 two-dimensional random matrices with both singular values being identical such that σ ∼ U (0, 1).(d) The norm of the block-encoded matrix Ũ after oblivious amplitude amplification, where deviations from 1 correspond to regions of higher error in (c).
The algorithm however fails when σ min = 0 which can happen in many realistic cases.This problem can be addressed if we first shift all the singular values to fall within the region [δ , 1 − δ ] such that δ > δ for our choice of δ , perform the decomposition that only costs ∼ O(1/δ log(1/ )) and undo the shifting.The reduction in circuit depth will come at the cost of an increase in error tolerance.However, a controlled shift of this kind requires knowing the singular vectors in advance which is expensive which we therefore avoid 5 .We show below that, with a slight modification in the formalism and by performing transformations on the eigenvalues, we can apply arbitrary eigenvalue shifts without explicit knowledge of the eigenbasis.

Four-Unitary Decomposition
Algorithm 2: Two Unitary Decomposition for Hermitian Matrix Input : Hermitian matrix H, δ, > 0 with eigenvalues of H are H using 2 additional auxiliary qubits.O(log(1/β)) queries to the state preparation oracle S.

Procedure:
1 Form the QSVT circuit of Eq. ( 18) for U f (A) as shown in Fig. 6 (a) by choosing n = O(1/δ log(1/ )) with modification that f (σ i ) is an even n-degree . 2 Implement the addition of U H and ±iU f (H) by using the LCU circuit in Fig. 6 (b) and perform oblivious amplitude amplification once for each Ũ , Ũ † .
Any arbitrary matrix A can be decomposed such that A = H 1 + iH 2 where H 1 = (A + A † )/2 and H 2 = −i(A − A † )/2 are Hermitian matrices.We show below that if we perform the two-unitary decomposition on each of the Hermitian matrices H 1 and H 2 such that H 1 = (U 1 + U † 1 )/2 and H 2 = (U 2 + U † 2 )/2, we can control the shifting of the eigenvalues and therefore can control the query complexity to the oracle as desired.
A Hermitian matrix H given in a (1, , 0) block encoding U H can be decomposed into two-unitaries such that H = (U + U † )/2, where U = j (λ j + i 1 − λ 2 j ) |λ j λ j | and λ j , |λ j are the eigenvalues and eigenvectors of H.We use a simplified version of our two-unitary decomposition algorithm, Alg. 2, to implement the decomposition unitaries Ũ , Ũ † such that U − Ũ , U † − Ũ † ≤ , with at least probability 1 − β using O(1/δ log(1/ ) log(1/β)) calls to the block encoding oracle U H where the eigenvalues λ j ∈ [−1 + δ, 1 − δ] ∀j.Particularly, when given a Hermitian matrix, the QSVT algorithm will always implement both the even or odd polynomial approximations of any function in the same basis |λ j λ j |, thus removing the restriction of choosing an odd degree polynomial to ensure the unitarity of U .Therefore, we can choose the originally-intended even function f (x) = √ 1 − x 2 to be approximated by an even degree polynomial f (x) created by the circuit in Fig. 6  The operator norm (or max singular value) of the difference between the approximate QSVT unitary Ũ and the ideal unitary U = H ± i √ 1 − H 2 , plotted vs. the eigenvalue(s) of the input matrix H.We generated 1000 two-dimensional random matrices with both eigenvalues being identical such that λ ∼ U (0, 1).(d) The norm of the block-encoded matrix Ũ after oblivious amplitude amplification, where deviations from 1 correspond to regions of higher error in (c).
(a),(b) show the QSVT approximation f (x) for the scalar function f (x) = √ 1 − x 2 and the relative error using a set of 30 angles {φ i }.We can see that there is large error when x → ±1.This error can be avoided by performing the two-unitary decomposition for H/α; H/α = (U + U † )/2 for α > 1 to obtain Ũ such that U − Ũ ≤ .The effect of scaling is to compress all eigenvalues from λ i ∈ [−1, 1] → λ i /α ∈ [−1/α, 1/α] and to avoid the error close to −1, 1.This method thus avoids the problem when a singular value of A is zero by using a different polynomial approximation whose error does not blow up when an eigenvalue is 0. We discuss the dependence of the query complexity on the scaling factor α in App.D. For the example shown in Fig. 8 where n = 30, the scaling α can be chosen such that the relative error shown in Fig. 8 (b) for the approximation is within the acceptable range.One could choose α = 2 in the decomposition of H/α, and obtain U − Ũ ∼ 10 −2 in the scaled domain [−1/2, 1/2] as shown in Fig. 8 (c).

Application: Dynamical Map Simulation
In summary, a Kraus operator A with non-zero singular values can be implemented as two separate unitaries using Alg. 1 or four unitaries if it has a vanishing singular value as shown in Sec 4.1 using Alg. 2. If A is Hermitian, Alg. 2 can be used directly to implement it as two-unitaries.In this subsection we show the application of the TUD algorithm to implement non-unitary Kraus operators to simulate the dynamical map and calculate expectation value of an observable.We assume access to Kraus operators in a general block encoding such as the LCU in Sec.4.2.1, and show the calculation for generalized amplitude damping channel using four-unitary decomposition.If one assumes access to block-encodings specifically of the Sz.-Nagy form, we show in Sec.4.2.2 that the fourunitary decomposition can be obtained without using QSVT.

Given General Encoding
We assume access to the block encoding of m Kraus operators A k such that A k = ( 0| ⊗ ⊗ 1)U A k (|0 ⊗ ⊗ 1) where = log(L), which may be created by techniques such as LCU as shown in Sec.3.2.
We demonstrate our algorithm by taking the generalized amplitude damping channel as an example.The channel describes the effect of dissipation on a single qubit at a finite temperature.Let the temperature be T , with the steady-state probabilities p = e −E 0 /kT /Z and 1−p = e −E 1 /kT /Z, where E 0 and E 1 are the energies of states |0 and |1 respectively, and where the partition function Z = e −E 0 /kT + e −E 1 /kT .The parameter γ controls the probability of decay which in general is a function of time.The system is described by the four Kraus operators We take E 0 = 0, E 1 = 5 GHz, T = 50 mK, which are typical values for superconducting qubits resulting in p = 0.982.Since the Kraus operators A 1 , A 3 have a zero singular value for all values of γ, we use the four unitary decomposition to implement them.Following the discussion in Sec.4.1, each Kraus operator A k can be written as Given H 1k , H 2k in the form of (α, , 0) block encodings U H 1k , U H 2k , we can use the four unitary decomposition to express A k : where the scaling α = 1.61 in order to restrict the input eigenvalues to a sub-region having a low error profile.We chose n = 30 degree polynomial approximation (or equivalently the number of oracle calls) and show that the error in Fig. 9 9 (c-f) respectively.We show that these unitaries can be used to calculate expectation values of observables.
Given that the Kraus operator A k has non-zero singular values, it can be implemented as two separate unitaries Ũ1k , Ũ2k using Alg. 1, which can be used to compute each term A † k OA k separately and then summed to obtain O .Each term in the sum is written as where , and k indexes the Kraus operators of the channel.(c-f) The expected number of calls to both the state preparation and block-encoding oracles needed to obtain a successful run for each Kraus oeprator.To implement each decomposition unitary, the number of queries to U H (either U H1 , U H2 ) and S is shown with blue solid and dashed line respectively using the TUD method.The orange curve shows expected number of queries to both U A and S using any block encoding which is a probabilistic method.
string of Paulis or block encoded in a unitary U O .We show in the App.B that the error in , which comes from the use of OAA and vanishes without the use of OAA.This error can be made arbitrarily small by increasing the degree of polynomial approximation by querying the block encoding U k for O(log(1/ )) times and therefore is ignored as it has a negligible effect in the estimation of number of runs to estimate the expectation of A † k OA k .The detailed procedure of calculation of expectation value and estimation of variance can be found in Appendix C. The condition where N k are the number of shots for each expectation value A † k OA k .The number of runs however, is higher than for what we proposed in probabilistic implementation of Sec.3.2.
If the decomposition of O is unknown, then it can also be further decomposed into two unitaries using the same algorithm: Each term can now be calculated as where each term in the curly bracket is calculated by a Hadamard test, thus requiring four Hadamard tests regardless of the dimension of Hilbert space.If a singular value of a matrix is zero or very close to it, we must simulate the map by decomposing each A k into four unitaries as shown in Sec.4.1.Each expectation value is then similarly calculated where each term is separately computed in parallel.The first four terms in the first curly bracket can be measured directly whereas the rest can be measured via Hadamard tests.We verify the above formula for the example of generalized amplitude damping channel by plotting the expectation value and error in App.B. The variance for the four-unitary case can be similarly derived to the two-unitary decomposition case presented before.

Given Sz.-Nagy Encoding
In the special case where we assume access to a minimal dilation of the Sz.-Nagy form [44], we show that we do not need QSVT to obtain the two-unitary decomposition and can instead obtain it with only one call to the encoding oracle deterministically and without any error.This assumption permits the calculation of expectation values using only a single query to the Sz.-Nagy block-encoding.Each k )/2 are Hermitian matrices.We assume that both the Hermitian operators H 1k , H 2k are encoded in their respective Sz.-Nagy unitaries We aim to obtain the decomposition Let the eigenvalues and eigenvectors of H 1k be λ ki , |λ ki , then we can see that the decomposition unitaries can be written as One can then easily obtain the decomposition unitaries by performing rotations on the Sz.-Nagy encoding qubit as shown in Fig. 10, such that with H being the Hadamard gate.A similar process is done to decompose ) to obtain the decomposition of the arbitrary matrix ).Therefore, given the Sz.-Nagy encoding, we decomposed an arbitrary operator into four unitaries without QSVT, and with only one oracle call and zero error.We can measure each expectation in parallel using Eq. ( 26) by replacing QSVT approximation unitaries Ũik with the exact unitaries U ik .
Encoding the matrix in Sz.-Nagy unitary circumvents the need for QSVT due to its special structure.When given a Hermitian matrix H, the off-diagonal blocks of the Sz.-Nagy unitary already provide access to the transformation f (H) = √ 1 − H 2 , which can be accessed.This special structure is even present in any other general oracle but the basis required to access the off-diagonal block is unknown in general.Additionally, we use the fourunitary decomposition and not the two-unitary decomposition here because Sz.-Nagy dilation of A has When A is Hermitian, the Sz.-Nagy encoding provides the desired transformation as the input and output bases are the same.

Conclusion
The simulation of open quantum systems is challenging due to nature of the dynamical map that describes the evolution.In particular, the dynamical map can be represented in terms of Kraus operators, that, in general, fail to display special operator properties such as unitarity or hermiticity.Therefore, direct simulation on a standard (unitary based) quantum computer is impossible, i.e., to do so generally one must sacrifice some degree of accuracy, efficiency, or success probability.In order to circumvent those shortcomings, one needs to employ indirect methods that use a combination of auxiliary qubits, tracing-out or subsystem measurements.In this work, we analyzed prominent approaches such as the Stinespring dilation, the Sz.-Nagy dilation and the LCU technique.Additionally, we introduced a two-unitary decomposition algorithm as an alternative method for simulating Kraus operators, and hence, the entire dynamical map in parallel.A comparison between the algorithms is summarized in Table 1.All discussed algorithms share a common feature of embedding a non-unitary Kraus operator into a larger unitary matrix.This procedure requires expanding the Hilbert space with extra auxiliary qubits to accommodate the embedding, which is usually referred to as a dilation or block encoding.
We examined previously-introduced algorithms, that are of a probabilistic nature, by providing an estimation for both classical and quantum resources.Moreover, we introduce a quantum method to calculate the expectation value of an observable and compute the variance of its estimator, necessary for understanding the total number of runs of a given algorithm to reach certain precision.We provided the TUD algorithm, which compared to previous methods allowed one to express the non-unitary Kraus operator with non-zero singular values as a sum of two unitaries which can be deterministically implemented.The TUD approach relies on the quantum singular value transformation and avoids expensive O(d 3 ) classical SVD overhead.Since each Kraus is expressed as a combination of two unitaries, one can implement each unitary independently without failure, which uses only a single access to the state preparation oracle.The two-unitary decomposition suffers a singularity in its error profile for matrices with vanishing singular values.We therefore provided a remedy in the form of a four-unitary decomposition, which decomposes a non-Hermitian Kraus operator into a sum of two Hermitian operators, that are then decomposed in two unitaries each.This procedure also allows to arbitrarily scale and shift eigenvalues.Even though in this paper we focused mainly on applications related to the simulation of open quantum systems, the basic algorithms discussed here rely on simulating arbitrary contractions ( A ≤ 1).Thus, one can readily use the TUD algorithm in other settings that require implementing matrices without particular structure, as is often the case with classical input data accessed via an oracle.For example, methods in linear algebra such as the HHL algorithm [61], or in machine learning [62,63] more generally, can potentially benefit from our approach.However, further work is required to identify in which cases the TUD algorithm can provide advantages over existing techniques.The introduced TUD algorithm and reviewed methods are currently state-of-the-art techniques for simulation of open quantum systems.However, we treat this contribution not as the last word, but yet another step towards more efficient ways for simulating these complex systems.We hope that quantum computing can reveal new phenomena related to environment-system interactions that cannot be studied by other means -where the analytical treatment is confined to a handful of small-scale systems, and numerical techniques fail to scale with system's size.In particular, biological and chemical systems that are embedded in a non-trivial environment (a solution, a protein complex, etc.) so far have been treated with various approximation and semi-classical techniques or limited to small-size systems.Once we can tap into full scale OQS simulations, these restrictions can be lifted and a wealth of new physics (e.g.reaction mechanisms) could be potentially explored.

A Total Number of Expected Runs
The number of expected runs to implement each Kraus operator A k and therefore the state We wish to find the minimum value of the total number of expected runs to implement all Kraus operators satisfying the constraint m k=1 p k = 1.We write the cost function as Solving the above we obtain p k = p min = 1/m ∀k.These are the minima as

B Error Estimation B.1 Probability of Success
We calculate the probability of successfully implementing Ũ1 and Ũ2 after OAA.The output of circuit C in the main text Fig. 6 (b) is We only have to use OAA once (which calls the QSVT routine three times) to implement either Ũ1 or Ũ2 , which we denote by Ũ in the calculation below.

B.2 Simulation of Dynamical Map
The aim is to simulate the dynamical map We show the calculation below for each term A † k OA k and thus remove k for convenience.We have a block-encoded operator A and write its ideal decomposition as A = 1 2 (U 1 + U 2 ).Here U 1 = A + if (A) and U 2 = A − if (A).We can expand the expectation as In reality, we might have an error h in the block encoding given by (1, , h) Ã : A − Ã ≤ h.We use our two-unitary decomposition algorithm to implement the function f ( Ã) : f ( Ã) − f ( Ã) ≤ , where f is the polynomial approximation of f .Therefore we implement approximations of U 1k , U 2k which are Ũ1k = Ã + i f ( Ã) + ∆c and Ũ2k = Ã − i f ( Ã) + ∆c, where |∆c| ≤ comes from the error in oblivious amplitude amplification as Ũ1k , Ũ2k are not perfect unitaries.We write U 1k = Ũ1k + ∆A + i∆f + ∆c and U 2k = Ũ2k + ∆A − i∆f + ∆c where ∆A = A − Ã and ∆f = f (A) − f ( Ã).We implement  where we kept the leading order terms in error.
For each k, we have and using the Cauchy-Schwarz inequality, Ãk |ψ ≤ 1 as all operators are contractions.Therefore, we obtain that We can see that when block encoding error h = 0, the only error is due to the oblivious amplitude amplification.To be clear, the error ∆f induced by QSVT itself cancels out leaving only the error from oblivious amplitude amplification (which also originates from the QSVT error).
A similar calculation applies when we have the four-unitary decomposition instead of the two-unitary decomposition shown above.We calculate the expectation value and error for the example of generalized amplitude damping channel shown in the main text.Each term can be evaluated in parallel using the four unitary decomposition as follows When OAA is used, the error , which vanishes without OAA.Figure 11 (a) shows the total expectation value O plotted on the left calculated using the four-unitary decomposition with the initial state |ψ = |1 .The observable O is taken as X, Y, Z denoted by blue, orange and green lines respectively.The error in estimation is shown on the right for an infinite number of measurements which remains below ≈ 10 −2 .Similar calculation is shown for different initial states |+x , |+y in Fig. 11  (b), (c).We plot the error in Fig. 12 and show the calculation of expectation values without performing oblivious amplitude amplification to show that it cancels for each term A † k OA k .

C Measuring Expectation Value: Hadamard Test and Variance
Let the random variable X take the values X = ±1 when the auxiliary qubit is measured in state |0 , |1 respectively.The expected value of X is then and the variance of X is where i denotes the i th measurement.Here X 1 , X 2 are measured N /4 times and X 3 is measured N /2 times which is in proportion to their respective weights in Eq. (59).The variance of the estimator X is

D Quantum Singular Value Transformation
In this section, we briefly summarize the quantum singular value transform and provide details regarding our specific usage of it.Readers seeking more information should consult the references cited herein.
The quantum singular value transform (QSVT) [58] is an algorithm that allows one to apply a polynomial transformation to the singular values of an arbitrary matrix which is block-encoded inside of a larger unitary.The intuition for the structure of this algorithm lies with earlier work on quantum signal processing (QSP) [81] which studied what kinds of unitary transformations are achievable by alternating between a "signal" unitary (which encodes some kind of accessible quantum information) and a single-qubit phase gate with a varying rotation angle (which provides a second control axis).A rich set of polynomial transformations of the information encoded by the signal unitary was obtained, and this behavior was later extended beyond the single-qubit case, where it was shown that the entire spectrum of a block-encoded Hermitian matrix H could be processed similarly [59].
To achieve this, one first doubles the dimension of the Hilbert space by appending an auxiliary qubit, and then transforms the block-encoding into an iterate that takes the form of a product of two reflection operators.By Jordan's Lemma, each 1-dimensional eigenspace of H then defines an SU (2) subspace (hence the name "qubitization") in which one can generate coherent rotations without mixing into any of the other eigen(sub)spaces.With additional access to controlled-phase gates, one can achieve the same kind of "two-axis" control within each eigen(sub)space, and thus effect the same polynomial transformations of quantum signal processing on each eigenvalue, and therefore the matrix itself, in parallel.Lastly, these results were further generalized [58] to arbitrary (even rectangular) matrices, where by carefully considering the left and right singular vector spaces, the same behavior can be derived with even fewer assumptions.The importance of these results is well-motivated by recent observations that many quantum algorithms with no overt similarities and widely differing use-cases can be generalized as instances of QSVT [58,5].
Let us begin by assuming that we have a matrix with SVD form which is block-encoded within a larger unitary U , given by A = ΠU Π.The projector Π projects into the right singular vector space, i.e. img(Π) = i |v i v i |, and img( Π) = i |w i w i | thus projects into the left singular vector space.These projectors then locate A within U , and with access to controlled versions of Π and Π (i.e.rotations conditioned on lying in the span of Π or Π), one can interleave them with U to produce the desired polynomial transformation: P (A) = i P (σ i )|w i v i |.When restricted to the appropriate invariant SU (2) subspace, the actions of U and the controlled rotations have the form where B i denotes the SU (2) subspace of the i-th singular value.The operator [U ] B i is often referred to as the "iterate" or the signal operator, and the operator [e iφ(2Π−I) ] B i is convention, due to the fact that we presume access to block encodings of the more general format where the R-iterate form implicitly holds by the results of [58].If one assumes direct access to block-encodings of the Sz.-Nagy form, one immediately recovers the R-iterate in the computational basis and can take advantage of measuring in the +| • |+ convention, which is more convenient for representing real polynomials.In fact, such an assumption immediately gives us our results without needing QSVT at all (Sec.4.2.2).Other oracles for the block-encoding, such as LCU, require the 0| • |0 convention, which results in complex polynomial outputs that must have their imaginary components removed in the manner of Eq. ( 65) if one desires a real output.In particular, this convention is much less convenient because the polynomial outputs must obey two additional constraints [5,58] |P (±1)| = 1 (67) where P (•) is the implemented polynomial.Although these constraints result in higher approximation error for a given degree, we choose to use this convention because it is compatible with more realistic oracle models such as LCU.
The angles output by pyqsp are in the W x form, and must be converted to the R-form by the following map [5]: The final circuit for the QSVT is shown in Fig. 14.As described in the main text, we must start in the right space and end in the left space to obtain the desired map i 1 − σ 2 i |w i v i |.This necessitates either an odd polynomial approximation to the even function √ 1 − x 2 , or a decomposition of A into Hermitian and anti-Hermitian components so that one resolves the input/output space issue.For reproducibility, the 51 angles obtained via pyqsp for the QSVT of Fig. 7  The complexity for step 1 in the two-unitary decomposition algorithm that involves the use of QSVT to create the polynomial approximation to the matrix-function f (A) = i sign(σ i ) 1 − σ 2 i |w i v i | depends on the number of oracle calls to the block encoding oracle U A .Since each call to the U A increases the degree of the polynomial by one, the complexity by finding the degree n of the approximating scalar polynomial f (x) for the scalar function f (x) = sign(x) √ 1 − x 2 .Let f 1 (x) = sign(x) and f 2 (x) = √ 1 − x 2 , such that f (x) = f 1 (x)f 2 (x).Then, Corollary 6 of the Ref. [82] or Ref. [58,5] show that where f1 (x) is a polynomial of degree n 1 = O(1/δ 1 log(1/ )).We can obtain the condition for the approximation of the function f 2 (x) by a n 2 -degree polynomial f2 (x) over the domain where γ i are the taylor expansion coefficients and B =

D.3 Complexity for Scaling in Four Unitary Decomposition
We chose the even function f (x) = √ 1 − x 2 to be approximated by an even polynomial to avoid a discontinuity at 0 in the error profile.The degree n of approximating polynomial can be found as where we used that x/α ∈ [(−1 + δ)/α, (1 − δ)/α] and B α ≤ ∞ i=n |γ i α i | is an upper bound on the weighted Taylor series coefficients.We find that n ≥ 1/(log α + δ) log(B α / ) for the approximation to hold within the domain of x/α.We see that when we do not know δ or δ = 0, we can chose n ≥ 1/ log(α) log(B α / ) ≥ 1/(log α + δ) log(B α / ) removing the dependence on δ.
However, if we have a matrix that is sufficiently sparse implying only few non-zero eigenvalues, the compression of the eigenvalues will not affect the distribution of eigenvalues much as it is not dense.For this special case, H/α = H where H is another sparse matrix such that its eigenvalues λ i ∈ [−1+δ , 1−δ ] where δ = 1−1/α.The complexity in this case is then O(1/δ log(1/ )).The number of oracle calls n decreases with increasing α for the approximation valid over a smaller domain.However this comes at the cost of the approximation error, as when we scale back we will we have to use α Ũ , α Ũ † in calculation involving H and therefore the error will scale with α.

D.4 Intuition for Two-Unitary Decomposition
Rather than appeal directly to SVD to derive the two-unitary decomposition, one can motivate the approach given that we have QSVT in mind.Since we prefer to work with unitaries, one might ask what kind of processing is needed to produce a unitary from A, using only basic block-matrix arithmetic (such as what is defined in Section 5 of [58]).Multiplication of A by some matrix can indeed produce a unitary via the polar decomposition of A: If A is invertible and if P is the positive-definite square root of A, then we can obtain a unitary U = AP −1 = A( √ A † A) −1 .Performing matrix inversion is costly through QSVT, so we may consider trying addition rather than multiplication: Given that we must always process a block-encoding of A, we may assume that M is some function of A, i.e. that M = f (A) = W f (Σ)V † .Thus we may write = V (Σf (Σ) − f (Σ)Σ)V † = 0 (81) which follows from the fact that Σ is diagonal and thus commutes with f (Σ).We are left with which proves that M = √ I − A 2 is sufficient to obtain a unitary, and avoids the use of matrix inversion.

Figure 1 :
Figure 1: The Stinespring dilation unitary U St with log m auxiliary qubits that are traced out (indicated by downward-pointing arrows) to obtain the Kraus map.

Figure 3 :
Figure 3: A unitary from the set of dilation unitaries {U k } such that k ∈ {1, . . ., m}, where each implements the state A k |ψ / √ p k conditioned on measuring the auxiliary qubit in state |0 ⊗ .

Figure 4 :
Figure 4: A unitary from the set of Sz.-Nagy's dilation unitaries {U SN k } such that k ∈ {1, . . ., m}, where each implements the state A k |ψ / √ p k conditioned on measuring the auxiliary qubit in state |0 .

.
Since each application of the encoding oracle U A increases the degree of the polynomial by one, the required circuit depth such that |f (x) − f (x)| ≤ where x ∈ [−1 + δ, −δ] ∪ [δ, 1 − δ] can be found by setting degree n = O(1/δ log(1/ )), see App.D. Figures 7 (a) and (b) show the QSVT approximation for the scalar function f

Figure 7 :
Figure 7: (a) Approximation of f (x) = sign(x) √ 1 − x 2 using an odd degree-51 polynomial.Real and imaginary components of the approximation are plotted in comparison with the ideal function.(b) Relative percent error between the real and ideal curves of plot (a) as a function of the scalar input.(c)The operator norm (or max singular value) of the difference between the approximate QSVT unitary Ũ and the ideal unitary U = A ± if (A), plotted vs. the singular value(s) of the input matrix A. We generated 1000 two-dimensional random matrices with both singular values being identical such that σ ∼ U (0, 1).(d) The norm of the block-encoded matrix Ũ after oblivious amplitude amplification, where deviations from 1 correspond to regions of higher error in (c).

Figure 8 :
Figure 8: (a) Approximation of √ 1 − x 2 using an even degree-30 polynomial.Real and imaginary components of the approximation are plotted in comparison with the ideal function.(b) Relative percent error between the real and ideal curves of plot (a).as a function of the scalar input.(c)The operator norm (or max singular value) of the difference between the approximate QSVT unitary Ũ and the ideal unitary U = H ± i √ 1 − H 2 , plotted vs. the eigenvalue(s) of the input matrix H.We generated 1000 two-dimensional random matrices with both eigenvalues being identical such that λ ∼ U (0, 1).(d) The norm of the block-encoded matrix Ũ after oblivious amplitude amplification, where deviations from 1 correspond to regions of higher error in (c).

Figure 9 :
Figure 9: Error behavior and query complexity for the four Kraus operators of the generalized amplitude damping channel, as a function of the decay/excitation event probability γ. (a-b) The operator norm (or max singular value) of the difference between the approximate QSVT unitary Ũjk and the ideal unitaryU jk = H jk ± i 1 − H 2 jk , where (a) H 1k = 1 2 (A k + A † k ) , (b) H 2k = i 2 (A k − A † k ), and k indexes the Kraus operators of the channel.(c-f) The expected number of calls to both the state preparation and block-encoding oracles needed to obtain a successful run for each Kraus oeprator.To implement each decomposition unitary, the number of queries to U H (either U H1 , U H2 ) and S is shown with blue solid and dashed line respectively using the TUD method.The orange curve shows expected number of queries to both U A and S using any block encoding which is a probabilistic method.

Figure 11 :
Figure 11: In row (a), we plot on the left the true expectation value of a Pauli O ∈ {X, Y, Z} with respect to the evolution of the state |1 under the generalized amplitude damping channel.On the right, we plot the absolute difference between the true expectation value and the approximate expectation value estimated via the four-unitary decomposition of each Kraus operator A k (γ).Rows (b) and (c) show the same for initial states |+x and |+y , respectively.

Figure 12 :
Figure 12:  In row (a), we plot on the right the absolute difference between the true expectation value and the approximate expectation value estimated via the four-unitary decomposition of each Kraus operator A k (γ) with initial state |1 , but before OAA has been performed.Note the small scale of the y-axis.On the left, we plot the average error over the 10 terms in the four-unitary decomposition of A † OA (see Eq. (26)).We also average this error over the terms from the other three operators of the channel which contribute to Tr(Λ(ρ)O).The scale on the right is 12 orders of magnitude larger, a clear indication that the error must cancel to obtain the scale on the left.Rows (b) and (c) show the same for initial states |+x and |+y , respectively.

Figure 14 :
Figure14: QSVT circuit used for applying real and odd polynomial transformation to the singular values of an arbitrary matrix A, in the convention used by pyqsp.If one desires an even polynomial, the final application of the block-encoding will be U † instead of U .
the auxiliary qubits are measured in state |0 ⊗ .To implement the state A k |ψ / √ p k successfully with at least probability 1 − β, We detail such a calculation and its associated variance in Appendix C.This is repeated to obtain the expectation value of each term A † k OA k in the sum to calculate O .Finally a summation over the m terms is performed to obtain the expectation value of O.The condition Similarly the total number of expected calls to U k and S isE = m k=1 1/ √ p k where E ≥ m 3/2.We can measure the expectation value of the observable O in parallel which is given by O = m k=1 A † k OA k .Each time any of the states A k |ψ / √ p k is successfully implemented, we can proceed to perform a measurement of O.The observable O may be measured directly if one knows the eigenbasis of O, or when O is provided as a linear combination of unitaries (e.g.Pauli strings) which may be measured separately and totaled (see Sect. 3.2.2).More generally if no information is provided, one may instead use the Hadamard test to calculate the expectation value given a block-encoding of O.
m}, where each implements the state A k |ψ / √ p k conditioned on measuring the auxiliary qubit in state |0 ⊗. mentioned in Sec.3.2 which reduces to O(d 2 ) complexity in the worst case given the Sz.-Nagy encodings.Similarly as above, in the case when A k are s-sparse, the number of one and two qubit gates for decomposition reduces to O(s 2 ) and each Sz.
where the |w i and |v i form orthornomal bases.
The state |ψ prepared by a call to the oracle S.
Output : Flagged states |0 |0 ⊗ +1 , |1 |0 ⊗ +1 indicating implementation of Ũ |ψ , Ũ † |ψ with probability at least 1 − β such that (a)and (b) is bounded by 10 −2 .Figure9(c) and (d) show the expected number of oracle calls to implement the state A Ũ2k |ψ , Ũ † 2k |ψ , we see that we only require 93 calls 6 to the block encoding oracles U H 1k , U H 2k and one call to the state preparation oracle S for each which is shown by the solid and dashed blue line in Fig.
k |ψ / √ p k for k ∈ {0, 1, 2, 3} respectively as a function of γ.The expected number of calls to the block encoding U A k of each Kraus operator A k as well as the the state preparation oracle S to prepare |ψ is O(1/p k ) shown by the orange solid line.Instead of implementing A k , if we implement the individual decomposition unitaries Ũ1k |ψ , Ũ † 1k |ψ ,

Table 1 :
Given an initial state |ψ prepared by the state preparation oracle S and the operator A block encoded in U A using auxiliary qubits, the first row shows the complexity of successfully implementing the state A |ψ / √ p with probability at least 1 − β, for general, LCU, and Sz.-Nagy block encodings.The second row shows the complexity of implementing Ũ1 |ψ , Ũ2 |ψ which are approximations of U 1 |ψ , U 2 |ψ such that A = (U 1 + U 2 )/2 with probability at least 1 − β, where the singular values of Putting this back in the expected number of runs, we obtainE min = m 2 .Therefore E ≥ m 2 .When amplitude amplification is used each A k is implemented in O(1/ √ p k ).Carrying out a similar calculation gives E = m i=1 1/ √ p k ≥ m 3/2 with amplitude amplification.
(before using the above conversion) are