Reqomp: Space-constrained Uncomputation for Quantum Circuits

Quantum circuits must run on quantum computers with tight limits on qubit and gate counts. To generate circuits respecting both limits, a promising opportunity is exploiting uncomputation to trade qubits for gates. We present Reqomp, a method to automatically synthesize correct and efficient uncomputation of ancillae while respecting hardware constraints. For a given circuit, Reqomp can offer a wide range of trade-offs between tightly constraining the qubit count and tightly constraining the gate count. Our evaluation demonstrates that Reqomp can significantly reduce the number of required ancilla qubits, by up to 96%. On 80% of our benchmarks, the ancilla qubits required can be reduced by at least 25% while never incurring a gate count increase beyond 28%.


Introduction
Quantum computers will remain tightly resource-constrained for the foreseeable future, both in terms of available qubits and the number of operations applicable before an error occurs. Running quantum programs hence requires compiling them to circuits with a limited qubit and gate count. A promising opportunity to achieve this goal is to exploit the need for uncomputation as an opening to trade qubits for gates.
What is Uncomputation? Just like classical programs, quantum circuits often leverage temporary values, called ancilla variables. Whereas classical programs can discard temporary values whenever convenient, temporary values in quantum circuits must be carefully managed to avoid side effects on other values through entanglement [1, §3]. Uncomputation is the process of preventing such side effects by reverting ancilla variables to state |0⟩ after their last use, thus ensuring that they are disentangled from the remainder of the state. For instance, Fig. 1 shows a circuit implementing CCCCH: the H gate on qubit t with four control qubits o, p, q, and r. It uses three ancilla variables: a holds o • p, b holds o • p • q, and c holds o • p • q • r. We then use this last ancilla c to control the H gate on t, only applying H if all of o, p, q, and r hold state |1⟩. In Fig. 1a, these ancilla variables are not uncomputed, and may result in unexpected interactions if this circuit is used as part of a bigger computation. They must therefore be uncomputed, as shown in Fig. 1b: the operations applied to each of them are reverted at the end of the circuit, ensuring that all ancilla qubits are reset to |0⟩.
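Because the Toffoli (CCX) gates computing a, b and c are qfree, the effect of computing and then uncomputing them can be checked on computational basis states alone. The following Python sketch does this for the ancilla part of a Fig. 1b-style circuit. The gate list, `run` and `check_all_inputs` are our own illustration, and the CH gate on t is omitted because it only reads c.

```python
from itertools import product

# Each CCX is (target, (control1, control2)); it flips the target iff both
# controls hold 1. Gate list modeled on the ancillae a, b, c of Fig. 1.
COMPUTE = [("a", ("o", "p")), ("b", ("a", "q")), ("c", ("b", "r"))]
# CCX is self-inverse, so uncomputation replays the gates in reverse order.
UNCOMPUTE = list(reversed(COMPUTE))


def run(gates, state):
    """Apply multi-controlled X gates to a dict of classical bits."""
    for target, controls in gates:
        if all(state[c] for c in controls):
            state[target] ^= 1
    return state


def check_all_inputs():
    for o, p, q, r in product((0, 1), repeat=4):
        state = {"o": o, "p": p, "q": q, "r": r, "a": 0, "b": 0, "c": 0}
        run(COMPUTE, state)
        # c is what controls H on t: it must equal o AND p AND q AND r.
        assert state["c"] == (o & p & q & r)
        run(UNCOMPUTE, state)
        # All ancillae are back to 0 and the inputs are unchanged.
        assert (state["a"], state["b"], state["c"]) == (0, 0, 0)
        assert (state["o"], state["p"], state["q"], state["r"]) == (o, p, q, r)
    return True


print(check_all_inputs())  # True
```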
Reducing Qubits. After uncomputing an ancilla variable, its qubit can be reused by another ancilla variable, thereby reducing the overall number of qubits used by the circuit. Sometimes, it is even beneficial to uncompute an ancilla variable (too) early, allowing its qubit to be reused at the cost of later recomputing the ancilla variable when it is needed again. Fig. 1b simply uncomputes ancilla variables in the reverse order of their computation, namely c-b-a. As no ancilla qubit can be reused, Fig. 1b requires 8 qubits and 7 gates overall (see Fig. 1d). Fig. 1c shows an alternative implementation of CCCCH leveraging recomputation. It uncomputes ancilla variable a early, making its qubit u 0 free for the computation of another ancilla.
Our main contributions are:
• Well-valued circuit graphs, a graph representation of circuits allowing for accurate value tracking, and a method evolveVertex to modify them (§3);
• Reqomp, a method using well-valued circuit graphs to synthesize and place uncomputation in circuits under space constraints (§4);
• A correctness proof for Reqomp (§5);
• An implementation and evaluation of Reqomp demonstrating that it outperforms previous work (§6).
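The qubit/gate trade-off from the Reducing Qubits discussion can be made concrete by counting. The sketch below replays a schedule of compute/uncompute steps on classical bits, hands out physical ancilla qubits from a free list, and reports (qubits, gates). The late schedule mirrors Fig. 1b; the early schedule (a, b, a†, c, CH, c†, a, b†, a†) is our own illustrative recomputation schedule and is not claimed to be exactly the circuit of Fig. 1c. All names are hypothetical.

```python
from itertools import product

# Controls of the CCX computing each ancilla variable (as in Fig. 1).
DEPS = {"a": ("o", "p"), "b": ("a", "q"), "c": ("b", "r")}
INPUTS = ("o", "p", "q", "r", "t")


def simulate(schedule, inputs):
    """Run (variable, action) steps; action is 'compute', 'uncompute' or 'H'
    (the CH gate on t controlled by the variable). Returns (qubits, gates)."""
    state = dict(inputs)            # currently live variables and their bits
    phys = {}                       # live ancilla -> physical qubit
    free, n_phys, gates = [], 0, 0
    for var, action in schedule:
        gates += 1
        if action == "H":
            assert var in state, f"{var} must be live to control H"
            continue
        assert all(c in state for c in DEPS[var]), f"controls of {var} missing"
        bit = int(all(state[c] for c in DEPS[var]))
        if action == "compute":
            if free:
                phys[var] = free.pop()          # reuse a qubit back in |0>
            else:
                phys[var], n_phys = n_phys, n_phys + 1
            state[var] = bit
        else:  # the same CCX again flips var back to 0
            assert state[var] == bit, f"{var} would not return to 0"
            del state[var]
            free.append(phys.pop(var))
    assert not phys, "some ancilla was never uncomputed"
    return len(INPUTS) + n_phys, gates


def costs(schedule):
    results = {simulate(schedule, dict(zip(INPUTS, bits)))
               for bits in product((0, 1), repeat=len(INPUTS))}
    assert len(results) == 1
    return results.pop()


LATE = [("a", "compute"), ("b", "compute"), ("c", "compute"), ("c", "H"),
        ("c", "uncompute"), ("b", "uncompute"), ("a", "uncompute")]
EARLY = [("a", "compute"), ("b", "compute"), ("a", "uncompute"),
         ("c", "compute"), ("c", "H"), ("c", "uncompute"),
         ("a", "compute"), ("b", "uncompute"), ("a", "uncompute")]

print(costs(LATE), costs(EARLY))  # (8, 7) (7, 9)
```

Under these assumptions, the early schedule saves one qubit at the price of two extra gates: one extra uncomputation and one recomputation of a.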

Background
We now introduce the necessary background on quantum computation.
Quantum States. We write the quantum state φ of a system with qubits p and q as:
$$\varphi = \sum_{j,k \in \{0,1\}} \gamma_{j,k}\, |j\rangle_p \otimes |k\rangle_q \qquad (1)$$
where γ_{j,k} ∈ ℂ and ⊗ is the Kronecker product.
If φ factorizes into (Σ_j γ′_j |j⟩_p) ⊗ (Σ_k γ″_k |k⟩_q), p and q are unentangled; otherwise they are entangled. Whenever convenient, we omit ⊗ and write |j⟩ instead of |j⟩_p. We use latin letters |j⟩ to denote computational basis states from the canonical basis {|0⟩, |1⟩} and greek letters φ to denote arbitrary states.
Gates. A gate applies a unitary operation to a quantum state. Here, we only consider gates with a single target qubit in state φ and potentially multiple control qubits C = {c_1, ..., c_m} in state |j⟩_C for j ∈ {0,1}^m, mapping |j⟩_C ⊗ φ to |j⟩_C ⊗ ϕ, where the mapping from φ to ϕ may depend on the control j. Specifically, only the value of the target qubit may be changed, while control qubits are preserved. Note that this mapping extends naturally to superpositions (i.e., linear combinations as in Eq. (1)) by linearity. Further, because any circuit can be decomposed into single-target gates, not considering multi-target gates is not a fundamental restriction.
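For two qubits, the factorization condition above has a simple computational form: arranging the amplitudes γ_{j,k} of Eq. (1) as a 2×2 matrix, φ is unentangled exactly when that matrix has rank 1, i.e., when γ_{0,0}γ_{1,1} − γ_{0,1}γ_{1,0} = 0. The sketch below (function name ours) checks this for a product state and a Bell state.

```python
def is_product_state(gamma, tol=1e-12):
    """gamma[j][k] is the amplitude of |j>_p |k>_q in a (nonzero) state.
    The state factorizes into a state of p times a state of q iff the
    2x2 amplitude matrix has rank 1, i.e. iff its determinant vanishes."""
    det = gamma[0][0] * gamma[1][1] - gamma[0][1] * gamma[1][0]
    return abs(det) < tol


r = 2 ** -0.5
plus_zero = [[r, 0], [r, 0]]   # (|0> + |1>)_p |0>_q / sqrt(2): unentangled
bell = [[r, 0], [0, r]]        # (|00> + |11>) / sqrt(2): entangled
print(is_product_state(plus_zero), is_product_state(bell))  # True False
```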
A gate is qfree if its mapping can be fully described by operations on computational basis states, i.e., if for control qubits C and target qubit t it is of the form
$$|j\rangle_C \otimes |k\rangle_t \mapsto |j\rangle_C \otimes |F(j,k)\rangle_t$$
for F : {0,1}^m × {0,1} → {0,1}. For example, the NOT gate X, the controlled NOT gate CX, and the Toffoli gate CCX are qfree, while the Hadamard gate H and the controlled Hadamard gate CH are not qfree. Qfree gates are known to be critical for synthesizing uncomputation [1,3,4].
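The qfree condition can be tested mechanically: for every control value j, the 2×2 matrix applied to the target must send each basis state |k⟩ to a single basis state |F(j,k)⟩ with coefficient 1. The sketch below encodes a gate as a function from control bits to that matrix; this encoding and the helper names are our own assumptions.

```python
from itertools import product

I = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
r = 2 ** -0.5
H = ((r, r), (r, -r))


def controlled(u, m):
    """Target matrix as a function of the m control bits: u iff all are 1."""
    return lambda j: u if all(j) else I


def is_qfree(target_op, m, tol=1e-12):
    """Qfree iff every column of every target matrix is a basis vector
    with coefficient exactly 1, i.e. |k> maps to |F(j, k)>."""
    for j in product((0, 1), repeat=m):
        mat = target_op(j)
        for k in (0, 1):
            col = (mat[0][k], mat[1][k])
            if not any(abs(col[f] - 1) < tol and abs(col[1 - f]) < tol
                       for f in (0, 1)):
                return False
    return True


print(is_qfree(controlled(X, 0), 0))  # X: True
print(is_qfree(controlled(X, 2), 2))  # CCX: True
print(is_qfree(controlled(H, 1), 1))  # CH: False
```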
Uncomputation. The task of uncomputation is to revert all ancilla variables in a circuit to their initial state |0⟩, while preserving the effect of the circuit on the other variables. Formally, given a circuit C, we want to synthesize a circuit C̄ which resets ancilla variables to |0⟩ without affecting the remainder of the state:
Definition 2.1 (Correct Uncomputation, [1,3]). C̄ correctly uncomputes the ancillae A in C if whenever
$$[\![C]\!]\big(|0\cdots0\rangle_A \otimes \varphi\big) = \sum_j \gamma_j\, |j\rangle_A \otimes \varphi_j,$$
then
$$[\![\overline{C}]\!]\big(|0\cdots0\rangle_A \otimes \varphi\big) = |0\cdots0\rangle_A \otimes \sum_j \gamma_j\, \varphi_j.$$
Here, ⟦C⟧ denotes the semantics of circuit C acting on a given input state. We refer to [1] for a more thorough introduction to uncomputation.
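Def. 2.1 can be checked numerically on small examples. The sketch below uses a tiny sparse state-vector simulator (our own, not part of Reqomp) on a hypothetical circuit C = CX(q→a), CH(a→t) with ancilla a, and compares ⟦C̄⟧(|0⟩_a ⊗ φ) against |0⟩_a ⊗ Σ_j γ_j φ_j, where C̄ appends one more qfree CX(q→a).

```python
import math

QUBITS = ("q", "a", "t")                  # a is the ancilla
IDX = {name: i for i, name in enumerate(QUBITS)}
s = 1 / math.sqrt(2)
X = ((0, 1), (1, 0))
H = ((s, s), (s, -s))


def apply(state, u, target, controls=()):
    """Apply single-target gate u (u[new][old]) if all controls are 1."""
    out, t = {}, IDX[target]
    for basis, amp in state.items():
        if all(basis[IDX[c]] for c in controls):
            for new in (0, 1):
                if u[new][basis[t]] != 0:
                    nb = basis[:t] + (new,) + basis[t + 1:]
                    out[nb] = out.get(nb, 0) + u[new][basis[t]] * amp
        else:
            out[basis] = out.get(basis, 0) + amp
    return {b: a for b, a in out.items() if abs(a) > 1e-12}


def run(circuit, state):
    for u, target, controls in circuit:
        state = apply(state, u, target, controls)
    return state


def drop_ancilla(state):
    """Compute sum_j gamma_j phi_j: forget the ancilla bit, add amplitudes."""
    out, i = {}, IDX["a"]
    for basis, amp in state.items():
        key = basis[:i] + basis[i + 1:]
        out[key] = out.get(key, 0) + amp
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def correctly_uncomputes(C, C_bar, init):
    expected = drop_ancilla(run(C, init))
    actual = run(C_bar, init)
    if any(b[IDX["a"]] != 0 for b in actual):
        return False                      # ancilla not reset to |0>
    got = drop_ancilla(actual)
    keys = set(expected) | set(got)
    return all(abs(expected.get(k, 0) - got.get(k, 0)) < 1e-9 for k in keys)


C = [(X, "a", ("q",)), (H, "t", ("a",))]
C_bar = C + [(X, "a", ("q",))]           # uncompute a with the same qfree CX
init = {(0, 0, 0): 0.6, (1, 0, 0): 0.8}  # |0>_a with (0.6|0> + 0.8|1>)_q |0>_t
print(correctly_uncomputes(C, C_bar, init))  # True
```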

Circuit Graphs
As discussed in §1, Reqomp does not work directly on circuits, but instead relies on well-valued circuit graphs. In this section, we first intuitively introduce these graphs (§3.1) and the method evolveVertex to manipulate them (§3.2). Finally, we formalize the definition of well-valued circuit graphs and show how evolveVertex preserves their well-valuedness (§3.3).
Circuit graphs were introduced and formalized in Unqomp [1]. We here extend them to precisely track qubit values. We discuss the differences between well-valued circuit graphs and Unqomp circuit graphs in §3.3.

Circuit Graph Intuition
As an example, we consider the circuit depicted in Fig. 2a. This circuit uses an ancilla qubit a to compute some output on qubit t, based on the value of qubit s. The circuit graph G corresponding to this circuit is shown in Fig. 2c.
Vertices and Edges. We first focus on the structure of the circuit graph G. G contains one init vertex per qubit (e.g., s 0.0 for qubit s), and one gate vertex per gate (e.g., s 1.0 for the first X gate on s). It also connects consecutive vertices on the same qubit by a target edge, e.g., s 0.0 → s 1.0. Further, as a 1.0 represents a CX gate controlled by qubit s, the circuit graph G also contains a control edge between the corresponding vertices on s and a: s 1.0 •→ a 1.0. Finally, the circuit graph G also contains anti-dependency edges to enforce correct ordering between otherwise unordered vertices. For example, a 1.0 ⇢ s 0.1 ensures that the second X gate on s (represented by s 0.1) can only be applied after the CX gate targeting a (represented by a 1.0). Anti-dependency edges can be reconstructed from the target and control edges: for any three vertices n, c, d such that c •→ n and c → d, G contains the anti-dependency edge n ⇢ d, as d may only overwrite the value of c once n has used it as a control.
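The reconstruction rule for anti-dependency edges is a one-line set comprehension. The sketch below (names ours, vertices written as strings) applies it to the target and control edges on qubits s and a of Fig. 2c and recovers a 1.0 ⇢ s 0.1.

```python
def anti_dependencies(target_edges, control_edges):
    """target_edges: (u, v) pairs of consecutive vertices on one qubit.
    control_edges: (c, n) pairs meaning vertex c controls vertex n.
    If c controls n and c -> d, then n must execute before d overwrites the
    value of c, which yields the anti-dependency edge n ⇢ d."""
    succ = dict(target_edges)     # each vertex has at most one successor
    return {(n, succ[c]) for c, n in control_edges if c in succ}


targets = [("s0.0", "s1.0"), ("s1.0", "s0.1"), ("a0.0", "a1.0")]
controls = [("s1.0", "a1.0")]
print(anti_dependencies(targets, controls))  # {('a1.0', 's0.1')}
```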

Valid Circuit Graphs.
A circuit graph is valid iff it corresponds to a valid circuit. Most importantly, all valid circuit graphs must be acyclic. Any valid circuit graph G can be converted into a circuit. The resulting circuit has one qubit per init vertex in G. We then pick any total order on the gate vertices of G that is consistent with the partial order induced by its edges, and add gates to the circuit following this total order. [1] showed that all circuits that G can be converted to (depending on the choice of total order) have equivalent semantics. We hence define the semantics of a valid circuit graph G, denoted ⟦G⟧, as the semantics of any circuit it can be converted to.
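Picking a total order consistent with the edges is a topological sort. As a sketch, the code below uses Kahn's algorithm, which also detects cycles (i.e., invalid circuit graphs); the vertex and edge lists are the s and a part of Fig. 2c, and the function name is hypothetical.

```python
from collections import defaultdict, deque


def to_circuit(vertices, edges, inits):
    """Return the gate vertices in an order consistent with all edges
    (Kahn's algorithm); raise ValueError if the graph has a cycle."""
    indeg = {v: 0 for v in vertices}
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        indeg[v] += 1
    ready = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(vertices):
        raise ValueError("cyclic circuit graph: not a valid circuit")
    return [v for v in order if v not in inits]   # one gate per gate vertex


VERTICES = ["s0.0", "s1.0", "s0.1", "a0.0", "a1.0"]
INITS = {"s0.0", "a0.0"}
EDGES = [("s0.0", "s1.0"), ("s1.0", "s0.1"), ("a0.0", "a1.0"),  # target
         ("s1.0", "a1.0"),                                      # control
         ("a1.0", "s0.1")]                                      # anti-dependency
print(to_circuit(VERTICES, EDGES, INITS))  # ['s1.0', 'a1.0', 's0.1']
```

Any other topological order would be equally valid; by the result of [1] quoted above, all of them yield circuits with the same semantics.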
Tracking Values. While the above construction follows Unqomp [1], we additionally introduce a new vertex naming convention to track qubit values. Specifically, each vertex (e.g., s 1.0) is identified by its qubit (here s), its value index (here 1) and its instance index (here 0). The value index is chosen such that, intuitively, two vertices with the same qubit and value index hold the same value, even in the presence of entanglement. The instance index is used to ensure uniqueness of vertex names. For example, as X is self-inverse, the value on qubit s is the same at the very beginning of the circuit (s 0.0) as after applying the two X gates to s (s 0.1). More precisely, if the input state to the circuit is |0⟩_s ⊗ φ, after the two X gates on s have been applied, the final state is |0⟩_s ⊗ φ′ for some φ′. Reflecting this in the circuit graph, vertices s 0.0 and s 0.1 share the same value index 0.
Value Graph g val. To track value indices during circuit graph construction and later during uncomputation, we rely on the value graph g val, shown in Fig. 2b. It records for each qubit and value index the possible value transitions. g val contains one init vertex per qubit but without an instance index, for example s 0 for qubit s. When encountering a new gate, for example the first X gate on qubit s, we pick a fresh value index for this qubit and extend g val. The value graph g val also records which operations can be safely uncomputed. For instance, as X is a qfree gate, the X on s 0 can be uncomputed: applying X on s 1 yields s 0 (note that X is self-inverse). We materialize this with the reverse edge s 1 → s 0, giving the pair of edges s 0 --X--> s 1 and s 1 --X--> s 0. Similarly, the CX gate from a 0 with control s 1 yields a 1 (see Fig. 2b). Note that we do not specify the instance index of s 1: as any vertex on qubit s with value index 1 carries the same value, any of them can be used as a control. As CX is qfree, we also record in g val the reverse edge from a 1 to a 0, with gate CX and control s 1. In contrast, the CCH gate on qubit t cannot be safely uncomputed as it is not qfree [3]. We therefore only record the forward edge from t 0 to t 1.
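A minimal sketch of such a value graph, assuming that every qfree gate involved is self-inverse (true for X, CX and CCX), so that the reverse edge carries the same gate and controls. Our `step` reuses an existing transition with the same gate and controls before allocating a fresh index, which is how the second X on s lands back on s 0. The class, its methods and the CCH controls (a 1 and s 0) are hypothetical.

```python
from collections import defaultdict

# Assumption for this sketch: the qfree gates used are self-inverse, so the
# reverse edge carries the same gate and the same controls.
SELF_INVERSE_QFREE = {"X", "CX", "CCX"}


class ValueGraph:
    def __init__(self):
        # (qubit, value index) -> list of (next value index, gate, controls)
        self.edges = defaultdict(list)
        self.fresh = defaultdict(lambda: 1)   # next unused index per qubit

    def step(self, qubit, vid, gate, ctrls):
        """Record applying `gate` with `ctrls` ((qubit, vid) pairs) to
        `qubit` at value index `vid`; return the resulting value index."""
        for nxt, g, c in self.edges[(qubit, vid)]:
            if (g, c) == (gate, ctrls):
                return nxt            # known transition, e.g. X undoing X
        nxt = self.fresh[qubit]
        self.fresh[qubit] += 1
        self.edges[(qubit, vid)].append((nxt, gate, ctrls))
        if gate in SELF_INVERSE_QFREE:
            self.edges[(qubit, nxt)].append((vid, gate, ctrls))  # reverse edge
        return nxt


g = ValueGraph()
s = g.step("s", 0, "X", ())                      # s0 -> s1
a = g.step("a", 0, "CX", (("s", s),))            # a0 -> a1, controlled by s1
s = g.step("s", s, "X", ())                      # back to s0 via reverse edge
t = g.step("t", 0, "CCH", (("a", 1), ("s", 0)))  # hypothetical controls
print(s, a, t, g.edges[("t", 1)])  # 0 1 1 []
```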

Modifying a Circuit Graph
We now discuss the core operation of Reqomp, the safe extension of a circuit graph in a stepwise manner through function evolveVertex.

evolveVertex.
We show the algorithm for evolveVertex in Fig. 3a. As its name suggests, it is used to evolve vertices, i.e., to bring qubits from one value index to another. It uses the value graph g val as a guide, and iteratively modifies a circuit graph Ḡ. In Fig. 3, Ḡ is a copy of the circuit graph G from Fig. 2c, and its value graph g val is shown in Fig. 2b. Note that Reqomp, and hence also evolveVertex, never modifies the circuit graph G corresponding to its input circuit. Instead, they both work on a new circuit graph Ḡ, built by following G, as we will explain in §4.
Calling evolveVertex. evolveVertex takes three arguments. First, qb is the qubit on which we will insert (un)computation. Second, nVId is the value index to which this qubit will be evolved. The last argument, I, is the set of qubits on which vertices are currently being added; this argument is needed to avoid infinite recursion (see Lin. 2, discussed later). In Fig. 3b, we demonstrate the example call evolveVertex(a, 0, ∅), which uncomputes a single gate on qubit a: it brings qubit a from its current value index 1 back to 0. The argument ∅ indicates that no vertices on any other qubit are in the process of being added to Ḡ.
Building the New Vertex. evolveVertex proceeds as follows. Lin. 3 gets the last vertex last on qubit a, i.e., the final vertex along the chain of target edges on a. Lin. 4 stores its value index in variable oVId. Here last is a 1.0 and oVId is 1. Lin. 5 then checks that nVId, the value index we want to add to the graph (here 0), can be reached in a single gate step. This is the case as a 0 is just one CX gate away from a 1, as evidenced by the edge a 1 --CX, s 1--> a 0 in g val (see Fig. 2b). Lin. 8 then inserts a new vertex in Ḡ on qubit a with value index 0 and gate CX. As there is already a vertex a 0.0 in Ḡ, it picks a new instance index, resulting in vertex a 0.1. Lin. 9 finally links it to its predecessor with a target edge and adds the resulting anti-dependency edge t 1.0 ⇢ a 0.1. This results in the second graph in Fig. 3b (ignoring the red edges s 1.0 •→ a 0.1 ⇢ s 0.1).
Adding Control Edges. To ensure a 0.1 indeed uncomputes a 1.0, we must control a 0.1 with qubits holding the same values as were used to control a 1.0. More precisely, a 0.1 should be controlled by vertices with the same qubit and value index as those controlling a 1.0. As the set ctrls contains exactly those qubits and value indices (see Lin. 6), Lin. 10 simply iterates over all controls c in ctrls. Through a call to the auxiliary function getAvailCtrl, it then gets a (potentially new) vertex c̄ (Lin. 11), which must have the same qubit and value index as c and be available as a control for v (that is, adding the control edge c̄ •→ v and the resulting anti-dependency edges does not create a cycle in Ḡ). Many implementations of getAvailCtrl are possible, each following different heuristics. The only restriction is that any modification to Ḡ must be done through a call to evolveVertex. This and the assertion in Lin. 12 are enough to ensure correctness of evolveVertex, as we will discuss in §3.3.
Choosing a Control. Let us manually follow the implementation of getAvailCtrl. The only control in ctrls is s 1. We first check if an existing vertex in Ḡ with qubit s and value index 1 could be used. This is not the case, as using s 1.0 as control for a 0.1 would result in a cycle, as shown in the second graph in Fig. 3b. We must hence compute a new vertex on qubit s with value index 1. We do so by calling evolveVertex(s, 1, {a}). Note how the set of qubits under construction contains a, as this is a recursive call within the computation of a 0.1. We can finally link s 1.1 to a 0.1 with a control edge, concluding the computation and yielding the third graph in Fig. 3b.
Avoiding Infinite Recursion. The assertion in Lin. 2 ensures that we never call evolveVertex recursively on the same qubit. This avoids infinite recursion where two qubits keep triggering recomputation of each other. To this end, we propagate the set I of qubits currently under construction through getAvailCtrl to potential recursive calls into evolveVertex (see Lin. 11).
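The walkthrough above can be reproduced with a compact Python sketch of evolveVertex and one possible getAvailCtrl (latest available vertex first, else recompute). This is a simplified illustration, not Reqomp's implementation: vertices are (qubit, value index, instance index) triples, g val is reduced to a dict from (qubit, old index, new index) to the required (qubit, value index) controls, gate names are dropped, and the controls of the CCH on t (a 1 and s 0) are our assumption about Fig. 2a.

```python
from collections import defaultdict


class CircuitGraph:
    """Vertices are (qubit, value index, instance index) triples."""

    def __init__(self):
        self.chain = defaultdict(list)   # qubit -> vertices along target edges
        self.ctrls = defaultdict(list)   # vertex -> its control vertices

    def add(self, qubit, vid, controls=()):
        inst = sum(1 for v in self.chain[qubit] if v[1] == vid)
        vertex = (qubit, vid, inst)      # fresh instance index
        self.chain[qubit].append(vertex)
        self.ctrls[vertex] = list(controls)
        return vertex

    def edges(self):
        succ = {}
        for verts in self.chain.values():
            for u, v in zip(verts, verts[1:]):
                succ[u] = v
                yield u, v                       # target edge
        for n, cs in list(self.ctrls.items()):
            for c in cs:
                yield c, n                       # control edge
                if c in succ:
                    yield n, succ[c]             # anti-dependency n ⇢ succ(c)

    def is_acyclic(self):
        adj = defaultdict(list)
        for u, v in self.edges():
            adj[u].append(v)
        color = {}

        def dfs(u):
            color[u] = "grey"
            for v in adj[u]:
                if color.get(v) == "grey" or (v not in color and not dfs(v)):
                    return False
            color[u] = "black"
            return True
        return all(dfs(u) for u in list(adj) if u not in color)


def evolve_vertex(G, g_val, qb, n_vid, busy=frozenset()):
    assert qb not in busy, "recursive evolveVertex on the same qubit"  # Lin. 2
    o_vid = G.chain[qb][-1][1]                   # value index of last vertex
    assert (qb, o_vid, n_vid) in g_val, "not one gate step away"      # Lin. 5
    v = G.add(qb, n_vid)
    for c_qb, c_vid in g_val[(qb, o_vid, n_vid)]:
        c = get_avail_ctrl(G, g_val, c_qb, c_vid, v, busy | {qb})
        G.ctrls[v].append(c)
        assert c[:2] == (c_qb, c_vid) and G.is_acyclic()               # Lin. 12
    return v


def get_avail_ctrl(G, g_val, qb, vid, v, busy):
    for c in reversed(G.chain[qb]):              # latest candidate first
        if c[1] == vid:
            G.ctrls[v].append(c)                 # tentatively add the control
            ok = G.is_acyclic()
            G.ctrls[v].pop()
            if ok:
                return c
    return evolve_vertex(G, g_val, qb, vid, busy)  # recompute the control


G = CircuitGraph()
s00 = G.add("s", 0)
s10 = G.add("s", 1)
s01 = G.add("s", 0)
a00 = G.add("a", 0)
a10 = G.add("a", 1, [s10])
t00 = G.add("t", 0)
t10 = G.add("t", 1, [a10, s01])
g_val = {("s", 0, 1): (), ("s", 1, 0): (),
         ("a", 0, 1): (("s", 1),), ("a", 1, 0): (("s", 1),),
         ("t", 0, 1): (("a", 1), ("s", 0))}      # CCH: forward edge only

a01 = evolve_vertex(G, g_val, "a", 0)
print(a01, G.ctrls[a01], G.chain["s"][-1])
# ('a', 0, 1) [('s', 1, 1)] ('s', 1, 1)
```

As in Fig. 3b, s 1.0 is rejected because it would close the cycle a 0.1 ⇢ s 0.1 •→ t 1.0 ⇢ a 0.1, so the sketch recomputes s 1.1 and leaves qubit s at value index 1.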

Modified Control.
Here the uncomputation of ancilla a (introducing a 0.1 in Ḡ) resulted in the modification of its control qubit s (introducing s 1.1). The circuit C̄ corresponding to Ḡ is hence not a correct uncomputation of C. While ancilla a has been correctly brought back to its initial value 0, qubit s has been modified and does not hold the same value as it would in C. We will show in §4.3 how Reqomp detects and fixes such a value mismatch to ensure correct uncomputation.

Formalizing Value Indices
So far, we relied on an intuitive understanding of well-valued circuit graphs, and used it to build uncomputation for a circuit. Let us now formalize this intuition in Def. 3.1.
Definition 3.1 (Well-valued Circuit Graph). We say a valid circuit graph is well valued iff:
(i) all vertex names are of the form q s.i, where q is the name of the vertex qubit, and s and i are natural numbers;
(ii) there are no duplicate vertices;
(iii) the init vertex on each qubit is named q 0.0, and for any q s.i in G, q s.0 is in G;
(iv) any gate vertex q s.i with i > 0 satisfies one of the following:
(fwd) valIdx(pred(q s.i)) = valIdx(pred(q s.0)), and q s.i and q s.0 have the same gate and the same control vertices (up to their instance indices);
(bwd) if we denote s′ = valIdx(pred(q s.i)), then (i) valIdx(pred(q s′.0)) = s, (ii) q s.i.gate is qfree and equal to q s′.0.gate†, and (iii) q s.i and q s′.0 have the same controls (up to instance indices).
Here, pred(v) is the unique v′ such that v′ → v, and valIdx(v) is the value index of v. We now give some intuition for the last condition (iv). Case (fwd) corresponds to a (forward) computation, for instance s 1.0 and s 1.1 in the last graph in Fig. 3b. Here, case (fwd) ensures that s 0.0 → s 1.0 and s 0.1 → s 1.1 apply the same operation to the same starting state, i.e., in both cases qubit s holds the same value before the operation is applied. This is the case as both s 1.0 and s 1.1 have the same gate X, and their predecessors (s 0.0 and s 0.1) have the same value index 0. Case (bwd) corresponds to a (backward) uncomputation, for instance a 1.0 and a 0.1. Here, case (bwd) ensures that the operations a 0.0 → a 1.0 and a 1.0 → a 0.1 are exact inverses of each other. Specifically, it ensures that (i) a 0.1 and the predecessor of a 1.0 (here a 0.0) have the same value index 0, (ii) the gates of a 1.0 and a 0.1 are inverses of each other, and (iii) their controls (s 1.0 and s 1.1) have the same qubit and value index.
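Condition (iv) can be checked vertex by vertex. The sketch below checks it for every vertex with a non-zero instance index (the vertices that the (fwd) and (bwd) examples above refer to), representing controls as (qubit, value index) pairs so that they are compared "up to instance indices". The data layout is our own, and we assume the qfree gates involved are self-inverse, so gate† is the gate itself.

```python
from collections import namedtuple

# pred is the (qubit, vid, inst) key of the target-edge predecessor (None for
# init vertices); ctrls is a frozenset of (qubit, vid) pairs, i.e. controls
# "up to instance indices".
Vertex = namedtuple("Vertex", "qubit vid inst pred gate ctrls")
SELF_INVERSE_QFREE = {"X", "CX", "CCX"}   # assumption: gate† == gate


def satisfies_iv(vertices):
    by_key = {(v.qubit, v.vid, v.inst): v for v in vertices}
    for v in vertices:
        if v.pred is None or v.inst == 0:
            continue                     # init vertices and q_{s.0} themselves
        s_old = v.pred[1]                # valIdx(pred(q_{s.i}))
        ref = by_key[(v.qubit, v.vid, 0)]      # q_{s.0}
        fwd = (ref.pred is not None and s_old == ref.pred[1]
               and (v.gate, v.ctrls) == (ref.gate, ref.ctrls))
        old = by_key[(v.qubit, s_old, 0)]      # q_{s'.0}
        bwd = (old.pred is not None and old.pred[1] == v.vid
               and v.gate in SELF_INVERSE_QFREE
               and (v.gate, v.ctrls) == (old.gate, old.ctrls))
        if not (fwd or bwd):
            return False
    return True


NO = frozenset()
S1 = frozenset({("s", 1)})
# Last graph of Fig. 3b: s0.0 -> s1.0 -> s0.1 -> s1.1 and a0.0 -> a1.0 -> a0.1.
fig3b = [
    Vertex("s", 0, 0, None, None, NO), Vertex("s", 1, 0, ("s", 0, 0), "X", NO),
    Vertex("s", 0, 1, ("s", 1, 0), "X", NO), Vertex("s", 1, 1, ("s", 0, 1), "X", NO),
    Vertex("a", 0, 0, None, None, NO), Vertex("a", 1, 0, ("a", 0, 0), "CX", S1),
    Vertex("a", 0, 1, ("a", 1, 0), "CX", S1),
]
print(satisfies_iv(fig3b))  # True
```

Here s 1.1 passes via (fwd), while s 0.1 and a 0.1 pass via (bwd); controlling a 0.1 with s 0 instead of s 1 would violate (bwd)(iii).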

Preserving Values.
Any circuit graph satisfying this definition ensures the following: if a qubit is in some basis state at a value index, then whenever the qubit reaches the same value index at a later point in time, it will again be in this same basis state. More formally, for every pair of vertices q s.i and q s.i′, applying all gates G′ between these vertices preserves q in the following sense:
$$\forall j \in \{0,1\},\ \forall \varphi \in \mathcal{H}_{n-1},\ \exists \varphi' \in \mathcal{H}_{n-1}\colon\quad [\![G']\!]\big(|j\rangle_q \otimes \varphi\big) = |j\rangle_q \otimes \varphi' \qquad (2)$$
where H_{n−1} denotes the set of quantum states over n − 1 qubits. As we can write any state as a sum of computational basis states, Eq. (2) allows us to reason about any state. We show in App. C that any well-valued circuit graph ensures Eq. (2) (more precisely, Lem. C.3 implies this).
evolveVertex. In §3.2, we claimed that evolveVertex preserves the well-valuedness of a circuit graph. More precisely, if evolveVertex terminates without error, the resulting circuit graph Ḡ is still well-valued (assuming it was before the call). This is ensured through the two assertions in Lin. 7 and Lin. 12, which we now specify. The first, well_valued_vertex, requires that one of the following conditions is satisfied:
(fwd) there exists i such that qb oVId.i → qb nVId.0 is in Ḡ, and gt and ctrls correspond to the gate and controls of qb nVId.0;
(bwd) there exists i such that qb nVId.i → qb oVId.0 is in Ḡ, gt is qfree and equal to qb oVId.0.gate†, and ctrls correspond to the controls of qb oVId.0.
The second, correct_control, requires that c̄ has the same qubit and value index as c, and that adding the control edge c̄ •→ v (and all resulting anti-dependency edges) does not create a cycle in Ḡ. Taken together, these assertions correspond exactly to the definition of a well-valued graph, and ensure that it stays acyclic. Further, neither of those assertions refers to the value graph g val, and both avoid reliance on the function getAvailCtrl. This allows for a self-contained definition of well-valued graphs and a simpler correctness proof, which we discuss in §5.
Contributions to Circuit Graphs. In the following, we briefly elaborate on the main differences between the circuit graphs introduced in Unqomp and our new notion of a well-valued circuit graph. Unqomp does not use value indices. Instead, it builds uncomputation by adding, to the circuit graph of the input circuit, one uncomputation vertex for each vertex on an ancilla. Correctness of the uncomputation then relies on this one-to-one correspondence between computation and uncomputation vertices in the final circuit graph. This one-to-one correspondence fundamentally does not allow for recomputation, where three or more vertices may correspond to the same value. In contrast, we introduce the notion of value indices, and prove formally (see §5) that they accurately track values in qubits. We further introduce the notion of a value graph and the function evolveVertex, which leverages value indices to build correct computation and uncomputation in a circuit graph.

Reqomp
The previous section presented our notion of well-valued circuit graphs, and how they can be used to insert computation or uncomputation on any qubit. We now take a step back and present the complete Reqomp procedure, and how it leverages well-valued circuit graphs and the evolveVertex function to tackle the problem of uncomputing ancilla variables under space constraints. As mentioned in §1, Reqomp takes as input a quantum circuit and a number of available ancilla qubits. If successful, it returns a quantum circuit where all ancilla variables from the original circuit are uncomputed and all other variables are preserved, using only the number of available ancilla qubits.
Example Circuit. Fig. 4 gives an overview of Reqomp and applies it to an example circuit with five ancilla variables, a, b, c, d, and e, and two non-ancilla variables r and t. We note that while this circuit does not implement a relevant algorithm, it allows showcasing the key features of Reqomp on a simple example.
Reqomp Workflow. Fig. 4 shows the steps performed by Reqomp, which we detail below. First, Reqomp converts the circuit C into a circuit graph G and a value graph g val (see Fig. 4b). Using this representation, Reqomp identifies the dependencies among ancilla variables in the circuit (see §4.1 and Fig. 4c), and uses them to derive an uncomputation strategy respecting the number of available ancilla qubits (see §4.2 and Fig. 4d). Reqomp then applies this strategy to build a new circuit graph Ḡ containing uncomputation (see §4.3 and Fig. 4e). Finally, Reqomp converts the resulting circuit graph Ḡ into a circuit C̄ (see §4.4 and Fig. 4f).

Identifying Ancilla Variables Dependencies
The first step of Reqomp is converting the input circuit into a circuit graph G, as we described in §3.
Using this circuit graph G, Reqomp then identifies ancilla dependencies.
On the circuit graph G from Fig. 4b, Reqomp identifies the vertices of all ancilla variables (highlighted in red) and their dependencies (highlighted in blue), and extracts the ancilla dependencies shown in Fig. 4c. There, each vertex corresponds to an ancilla variable and each solid edge corresponds to a control edge among gate vertices of these respective ancilla variables. We will discuss the dotted edge shortly.
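The solid edges of the ancilla dependency graph can be read off the control edges of G. The sketch below (helper name ours) keeps each control edge whose control and controlled vertex both lie on distinct ancilla qubits. Since Fig. 4b is not reproduced here, we apply it to the control edges of the CCCCH circuit of Fig. 1a instead.

```python
def ancilla_dependencies(control_edges, ancillae):
    """control_edges: (control vertex, controlled vertex) pairs, where a vertex
    is a (qubit, value index, instance index) triple. Returns the set of
    (x, y) with x != y ancillae such that some gate on y is controlled by x."""
    return {(c[0], n[0]) for c, n in control_edges
            if c[0] in ancillae and n[0] in ancillae and c[0] != n[0]}


# Control edges of the (non-uncomputed) CCCCH circuit of Fig. 1a.
edges = [
    (("o", 0, 0), ("a", 1, 0)), (("p", 0, 0), ("a", 1, 0)),
    (("a", 1, 0), ("b", 1, 0)), (("q", 0, 0), ("b", 1, 0)),
    (("b", 1, 0), ("c", 1, 0)), (("r", 0, 0), ("c", 1, 0)),
    (("c", 1, 0), ("t", 1, 0)),
]
print(sorted(ancilla_dependencies(edges, {"a", "b", "c"})))
# [('a', 'b'), ('b', 'c')]
```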

Deriving the Uncomputation Strategy
Based on the ancilla dependencies derived above, Reqomp will derive an uncomputation strategy.

What is an Uncomputation Strategy?
The uncomputation strategy describes in which order the ancillae in the circuit should be computed and uncomputed to satisfy the space constraints given as input, while minimizing the number of gates in the circuit. For instance, Fig. 1 showcases two such strategies. The first one, shown in Fig. 1b, is to compute ancilla a, then b, then c, then uncompute c, then b, and finally a. We write such a strategy as a, b, c, c†, b†, a†, where a denotes "computing ancilla a" and a† denotes "uncomputing ancilla a". The second strategy is shown in Fig. 1c.
Partitioning Ancillae. The first step in deriving such an uncomputation strategy from the ancilla dependencies is to distinguish groups of ancilla variables that depend on each other; in other words, to partition the ancillae according to their dependencies.
Why Partition Ancillae? Reqomp aims to balance ancilla qubits and gates. For two ancillae that do not interact, such a trade-off is easy: we should always uncompute the first ancilla early, making its qubit available for the second one. As the ancillae are independent, the second one does not need the first one, so the early uncomputation does not induce extra gates, i.e., no recomputation is necessary. For instance, in Fig. 4c, ancillae {a, b, c} and {d, e} are independent. Therefore, it is strictly better to uncompute a, b and c before computing d and e, thereby reusing the physical ancilla qubits initially holding a, b and c for d and e. This is in contrast to ancillae that are part of the same partition. For instance, in Fig. 1, we saw that for the 3 linked ancillae a, b, c, uncomputing early yields a different trade-off than uncomputing late.
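Partitioning ancillae according to their dependencies amounts to computing connected components. A minimal union-find sketch follows; the solid edges a→b, b→c and d→e are our assumption about Fig. 4c, consistent with the components {a, b, c} and {d, e} named in the text.

```python
def partition(ancillae, deps):
    """Group ancillae into connected components of the (undirected)
    dependency edges using union-find with path halving."""
    parent = {a: a for a in ancillae}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in deps:
        parent[find(x)] = find(y)
    groups = {}
    for a in ancillae:
        groups.setdefault(find(a), []).append(a)
    return sorted(groups.values())


print(partition(["a", "b", "c", "d", "e"],
                [("a", "b"), ("b", "c"), ("d", "e")]))
# [['a', 'b', 'c'], ['d', 'e']]
```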

Strategy for a Connected Component
Now that we have split the ancilla variables into two components, let us derive the uncomputation strategy for each of them. We first note that within each of the components in Fig. 4c, the ancilla variables exhibit a linear dependency. Formally, we say ancillae a_1, ..., a_n are linearly dependent if all gates targeting a_i for i > 1 are only controlled by a_{i−1} and non-ancillae. This corresponds to a component that forms a simple path. In this case, we can derive an optimal uncomputation strategy (in terms of number of computation/uncomputation steps) using dynamic programming [5]. This yields an optimal strategy for the first component ({a, b, c}) on at most two ancilla qubits, and likewise for the second component ({d, e}), also on at most two ancilla qubits; both are part of the complete strategy in Fig. 4d. If within a component the ancilla variables are not linearly dependent, we abort the current procedure and fall back on an alternative one, Reqomp-Lazy, which we describe in §4.5. We note that we avoid solving the general problem of finding an uncomputation strategy for any ancilla dependency, as it is PSPACE-complete [6].
Combining Strategies. After determining the optimal uncomputation strategy for each connected component, we must combine those strategies to yield our complete strategy. To this end, we determine in which order the components should be processed, ensuring that if an ancilla d transitively depends on ancilla c, c's component is processed before d's component (captured by edges in Fig. 4c). In our case, we must process the component with ancilla variables a, b, c before the one with ancilla variables d, e, as d depends on c through the qubit r. Combining the respective strategies in this order, we finally get the complete uncomputation strategy shown in Fig. 4d. If ordering the connected components is impossible due to cycles, we again fall back to Reqomp-Lazy.
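To illustrate what an optimal strategy for a linear chain looks like, the sketch below searches for the shortest sequence of compute/uncompute steps under a qubit bound. Note that it swaps in a plain breadth-first search over sets of live ancillae instead of the dynamic programming of [5]: it is exponential in n and only meant for tiny chains. Step i computes a_i and step -i uncomputes it. The goal encoding (a_n live at least once, everything reset at the end) is our simplification.

```python
from collections import deque


def chain_strategy(n, max_qubits):
    """Shortest compute/uncompute sequence for linearly dependent ancillae
    a_1 -> ... -> a_n on at most `max_qubits` ancilla qubits.
    Step i computes a_i, step -i uncomputes it; both need a_{i-1} to be live
    (a_1 only needs non-ancillae). Goal: a_n was live at some point and
    every ancilla ends in |0>. Returns None if no strategy exists."""
    start = (frozenset(), False)          # (live ancillae, a_n reached?)
    prev = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        live, reached = state
        if reached and not live:
            steps = []
            while prev[state] is not None:
                state, step = prev[state]
                steps.append(step)
            return steps[::-1]
        for i in range(1, n + 1):
            if i > 1 and i - 1 not in live:
                continue                  # control a_{i-1} is not available
            if i in live:
                nxt, step = live - {i}, -i
            elif len(live) < max_qubits:
                nxt, step = live | {i}, i
            else:
                continue                  # no free ancilla qubit
            new = (nxt, reached or (i == n and step > 0))
            if new not in prev:
                prev[new] = (state, step)
                queue.append(new)
    return None


print(chain_strategy(3, 3))  # [1, 2, 3, -3, -2, -1]
print(chain_strategy(3, 2))  # [1, 2, -1, 3, -3, 1, -2, -1]
```

For a chain a, b, c on two qubits, the search returns a, b, a†, c, c†, a, b†, a†: one qubit is saved at the cost of uncomputing and recomputing a once.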

Applying the Uncomputation Strategy
We showed in the previous section how Reqomp derives the uncomputation strategy for a given circuit. Further, we have shown in §3.2 how we can use evolveVertex to insert computation or uncomputation in a circuit graph. However, there is a gap between the uncomputation strategy and evolveVertex: the former mentions neither non-ancilla variables nor value indices, which are a required argument of evolveVertex. Fig. 5 shows how Reqomp bridges this gap by detailing each step of the strategy into evolution steps on individual qubits and value indices. Those steps can then be applied in order, using the function evolveVertex, which we discussed in §3.2.
Non-Ancilla Variables. We explained above how the uncomputation strategy can be detailed for ancilla variables. Let us now describe how non-ancilla variables are computed. This is done in two places. First, at the end of the last forward stage on an ancilla anc, anything controlled by this ancilla is computed.
More precisely, we want to compute all t s, where t is a non-ancilla variable and s a value index such that the computation of t s is controlled by some anc i, that is, there exists some edge t s′ --gate, ..., anc i--> t s in g val. This is shown in Lin. 28-30, and the required computation steps are again computed and applied through the function evolveVtxUntil. Non-ancilla variables are also computed at the very end of the strategy, in Lin. 33-35. Here, any non-ancilla variable whose final value index in Ḡ differs from its final value index in G is evolved to its final value index in G.
getAvailCtrl. When applying an uncomputation strategy, getAvailCtrl uses the latest vertex in Ḡ with the required qubit and value index that is available as a control. If no such vertex can be found, it recursively calls evolveVertex to evolve the last vertex on that qubit until it reaches the required value index.
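Evolving a qubit "until" a target value index requires a sequence of one-step transitions in g val, each of which becomes one evolveVertex call. A hedged sketch of how such a path could be found: a breadth-first search over value indices, where g val is simplified to {qubit: {value index: [reachable value indices]}}. The function name and this encoding are hypothetical; we do not claim this is how evolveVtxUntil is implemented.

```python
from collections import deque


def value_path(g_val, qubit, start, goal):
    """Shortest sequence of value indices leading `qubit` from `start` to
    `goal` through the value graph; each hop is one evolveVertex call.
    Returns [] if start == goal and None if goal is unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        vid = queue.popleft()
        if vid == goal:
            path = []
            while vid is not None:
                path.append(vid)
                vid = prev[vid]
            return path[::-1][1:]          # drop the starting index
        for nxt in g_val.get(qubit, {}).get(vid, ()):
            if nxt not in prev:
                prev[nxt] = vid
                queue.append(nxt)
    return None


g_val = {"s": {0: [1], 1: [0]},            # qfree X: both directions
         "t": {0: [1], 1: [2]}}            # non-qfree gates: forward only
print(value_path(g_val, "s", 0, 1))        # [1]
print(value_path(g_val, "t", 0, 2))        # [1, 2]
print(value_path(g_val, "t", 2, 0))        # None
```

The last call shows why non-qfree steps matter: without reverse edges, a non-ancilla cannot be moved back to an earlier value index.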

Obtaining the Final Circuit
If the above check succeeded, the final step of the algorithm converts the circuit graph Ḡ to a circuit C̄. During this step, we perform a generic post-processing optimization, previously discussed in [1]. Specifically, it replaces all CCX gates in Ḡ which are later uncomputed by RCCX gates. While RCCX gates introduce an additional phase change, replacing pairs of CCX gates ensures that this phase change is also reverted.
As RCCX gates can be implemented more efficiently than CCX gates (the latter require more T gates), this can lead to a substantial efficiency improvement. This is particularly appealing in our setting, where we encounter many CCX gates, most of which are uncomputed. We note that Unqomp could only apply this optimization to gates it had itself uncomputed, whereas Reqomp can also identify uncomputation that is already in place in the original circuit, by leveraging value indices.
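The sketch below illustrates the pairing idea on a plain gate list rather than on value indices: each CCX is paired with the next identical CCX, provided no gate in between targets its target or controls. Such in-between gates commute with the diagonal relative phase of RCCX, so replacing the pair by RCCX and its inverse (named "RCCXdg" here, a hypothetical label) leaves the overall effect unchanged. The T-counts of 7 (CCX) and 4 (RCCX) are the commonly cited Clifford+T counts and are our assumption; gates without an entry in `T_COUNT` are ignored.

```python
# Assumed T-counts of standard Clifford+T decompositions.
T_COUNT = {"CCX": 7, "RCCX": 4, "RCCXdg": 4}


def pair_ccx(gates):
    """gates: (name, target, controls) in circuit order. Replace each CCX that
    is later undone by an identical CCX with an RCCX / RCCXdg pair."""
    out, used = list(gates), set()
    for i, (name, tgt, ctrls) in enumerate(gates):
        if name != "CCX" or i in used:
            continue
        touched = {tgt, *ctrls}
        for j in range(i + 1, len(gates)):
            name2, tgt2, ctrls2 = gates[j]
            if (name2 == "CCX" and j not in used and tgt2 == tgt
                    and set(ctrls2) == set(ctrls)):
                out[i] = ("RCCX", tgt, ctrls)
                out[j] = ("RCCXdg", tgt2, ctrls2)
                used.update((i, j))
                break
            if tgt2 in touched:
                break          # value modified in between: no safe pairing
    return out


def t_count(gates):
    return sum(T_COUNT.get(name, 0) for name, _, _ in gates)


FIG1B = [("CCX", "a", ("o", "p")), ("CCX", "b", ("a", "q")),
         ("CCX", "c", ("b", "r")), ("CH", "t", ("c",)),
         ("CCX", "c", ("b", "r")), ("CCX", "b", ("a", "q")),
         ("CCX", "a", ("o", "p"))]
paired = pair_ccx(FIG1B)
print([g[0] for g in paired])
print(t_count(FIG1B), t_count(paired))  # 42 24
```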
The updated circuit graph Ḡ is then converted back to a circuit, as described in §3.1. For our example, this results in the circuit C̄ in Fig. 4f. Importantly, the resulting circuit uses the same physical ancilla qubit to hold both a and c, saving one qubit at the cost of an extra uncomputation and recomputation of qubit a. The same physical qubit is also used to hold d, at no extra recomputation cost, as d does not depend directly on a.

Fallback procedure: Reqomp-Lazy
In general, uncomputation according to Def. 2.1 is not always physically possible [1, §6.2].Because we cannot always achieve uncomputation, Reqomp applies heuristics to succeed as frequently as possible.However, we must accept that they may fail in some cases.First, the ancillae within a partition may not be linearly dependent, or the ancillae partitions may have cyclic dependencies.In such cases, Reqomp falls back to the heuristic Reqomp-Lazy, which we will describe shortly.Second, assertions in evolveVertex or assertFullyEvolved may fail, both in Reqomp and Reqomp-Lazy.In such cases, Reqomp returns an error.When this happens, it may indicate that no approach can achieve uncomputation, hinting at a possible implementation mistake or misconception by the programmer.If uncomputation is possible, but no available approach can synthesize it automatically, a programmer can always uncompute manually instead.
Overview. Reqomp-Lazy is inspired by Unqomp [1], but leverages well-valued circuit graphs and evolveVertex. In particular, Reqomp can uncompute and recompute controls for a vertex when they are not directly available. In contrast, Unqomp would have returned an error whenever this happens. We provide an example in Fig. 9 (§6).
Fig. 6 shows the algorithm Reqomp-Lazy. Lin. 44 initializes Ḡ with a copy of G, to be extended by adding vertices that perform uncomputation. Lin. 45 defines U as the set of all vertices to be uncomputed: it contains the first instance of each value index on each ancilla qubit. Then, Lin. 46-49 step through U in reverse topological order and revert all operations on ancillae one step at a time by calling evolveVertex (Lin. 49). Then, analogously to Reqomp, Lin. 50 asserts that all qubits are fully evolved. Finally, as specified in the original version of Unqomp [1, §5.4], Lin. 51 allocates ancillae to the same physical qubits if their lifetimes do not overlap. The resulting circuit graph is then converted to a circuit, using the procedure detailed in §4.4.
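Allocating ancillae with non-overlapping lifetimes to the same physical qubit is an interval-scheduling problem. A minimal greedy sketch follows: process ancillae by start time and reuse any physical qubit whose previous occupant has already ended. Lifetimes are given as inclusive (first step, last step) pairs, an encoding we assume for illustration; the names are hypothetical.

```python
import heapq


def allocate(lifetimes):
    """lifetimes: {ancilla: (first_step, last_step)}, inclusive.
    Returns ({ancilla: physical qubit}, number of physical qubits)."""
    busy = []             # heap of (last_step, physical qubit) still in use
    free = []             # heap of released physical qubits
    assignment, n_phys = {}, 0
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        while busy and busy[0][0] < start:        # lifetime already over
            heapq.heappush(free, heapq.heappop(busy)[1])
        if free:
            phys = heapq.heappop(free)
        else:
            phys, n_phys = n_phys, n_phys + 1
        assignment[name] = phys
        heapq.heappush(busy, (end, phys))
    return assignment, n_phys


print(allocate({"a": (0, 3), "b": (1, 2), "c": (4, 6)}))
# ({'a': 0, 'b': 1, 'c': 0}, 2)
```

Processing intervals by start time, this greedy uses exactly as many qubits as the maximum number of simultaneously live ancillae.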

Custom Control Strategy.
While Reqomp-Lazy reuses evolveVertex, it uses a different implementation of getAvailCtrl. This new implementation aims at using controls that are as early (in terms of target edges) as possible, therefore keeping later controls available for later uncomputations. Specifically, to find a control c s for a vertex v, it finds the earliest vertex in Ḡ on qubit c with value index s that is available for v.
Recall that, in contrast, we used the latest such vertex when using evolveVertex to apply an uncomputation strategy (see §4.3). If no such control vertex can be found, we recursively call evolveVertex to evolve the last vertex on qubit c until it has the value index s, just as we did in §4.3.

Correctness
We prove in App. C that Reqomp synthesizes correct uncomputation. In this section, we provide an intuition of this proof.
Value Index Assertions. The correctness of Reqomp relies on value indices. At the end of the algorithm (Lin. 36 when ApplyingStrategy succeeded, Lin. 50 when Reqomp falls back to Reqomp-Lazy), we assert that the last vertex on all ancilla qubits has value index 0, and that for any non-ancilla qubit, the value index of the last vertex is the same in the original graph and in the synthesized graph. Intuitively, this ensures that ancillae are reset to |0⟩, while other qubits are preserved.
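The final assertion only compares value indices of last vertices, so it is cheap to state in code. A minimal sketch, assuming the final value indices of G and Ḡ are available as dicts (a hypothetical encoding), applied to the situation of Fig. 3b, where uncomputing a left qubit s at value index 1:

```python
def assert_fully_evolved(final_g, final_g_bar, ancillae):
    """final_g / final_g_bar map each qubit to the value index of its last
    vertex in the input graph G and in the synthesized graph Ḡ."""
    for q, vid in final_g_bar.items():
        if q in ancillae:
            assert vid == 0, f"ancilla {q} ends at value index {vid}, not 0"
        else:
            assert vid == final_g[q], f"non-ancilla {q} is not preserved"
    return True


final_g = {"s": 0, "a": 1, "t": 1}          # last vertices in G (Fig. 2c)
after_fig3b = {"s": 1, "a": 0, "t": 1}      # after evolveVertex(a, 0, ∅)
try:
    assert_fully_evolved(final_g, after_fig3b, {"a"})
except AssertionError as err:
    print(err)                              # non-ancilla s is not preserved
print(assert_fully_evolved(final_g, {"s": 0, "a": 0, "t": 1}, {"a"}))  # True
```

The first call fails exactly on the mismatch described for Fig. 3b; once s is evolved back to value index 0, the check passes.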
Correctness hence relies on the precise formal interpretation of value indices. Intuitively, we claim that two vertices on the same qubit with the same value index hold the same value.
Extended Circuits. To formally define this notion, we introduce extended circuits. We conceptually extend a given circuit to allow us to compare the values of all vertices occurring in the circuit. Fig. 7 exemplifies this by extending the example circuit in Fig. 7a, which applies an H gate and two controlled X gates to qubits q and a. The circuit in Fig. 7a yields a state, which we write in a column-by-column format in Fig. 7a (right).
Fig. 7b shows our extension of Fig. 7a, which copies the value of each vertex from the corresponding circuit graph to a fresh qubit. Each copy qubit is named after its corresponding vertex, but underlined; e.g., $\underline{q_{0.0}}$ holds the initial state of q, corresponding to vertex $q_{0.0}$.
Value Index. Intuitively, copy qubits with the same value index and qubit hold the same value. More precisely, if we write the state produced by the extended circuit as a sum of computational basis states, then in each summand (with a non-null coefficient), copy qubits with the same value index and qubit hold the same value. For example, in every summand (i.e., column) of the final state in Fig. 7b, $\underline{a_{0.0}}$ and $\underline{a_{0.1}}$ hold value |0⟩ (see red bracket in Fig. 7).
Similarly, each qubit holds the same value as its last copy qubit. For example, in every summand (i.e., column) of the final state in Fig. 7b, q and $\underline{q_{1.0}}$ hold the same value, either |0⟩ or |1⟩ (see blue bracket in Fig. 7).
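The two facts above can be checked numerically with a small state-vector simulation. The sketch assumes that the circuit of Fig. 7a starts in |0⟩ on both qubits and consists of an H gate on q followed by two CX gates controlled by q and targeting a, which matches the vertex names $q_{1.0}$, $a_{1.0}$, and $a_{0.1}$ used in the text. Each copy qubit is modelled by a CX from the vertex's qubit placed right after the gate that produces that vertex; names such as a0.1_copy stand in for the underlined names and are our own.

```python
import math


def run(circuit, n_qubits):
    """Sparse state-vector simulation from |0...0>; gates are ("h", t) or ("cx", c, t)."""
    state = {(0,) * n_qubits: 1.0}
    for gate in circuit:
        out = {}
        for bits, amp in state.items():
            if gate[0] == "cx":
                _, c, t = gate
                b = list(bits)
                b[t] ^= b[c]
                out[tuple(b)] = out.get(tuple(b), 0.0) + amp
            else:  # Hadamard on qubit t
                t = gate[1]
                for v in (0, 1):
                    b = list(bits)
                    b[t] = v
                    sign = -1.0 if bits[t] == 1 and v == 1 else 1.0
                    out[tuple(b)] = out.get(tuple(b), 0.0) + sign * amp / math.sqrt(2)
        state = {b: a for b, a in out.items() if abs(a) > 1e-12}
    return state


# Original qubits q and a, then one copy qubit per vertex (underlined in Fig. 7b).
NAMES = ["q", "a", "q0.0_copy", "a0.0_copy", "q1.0_copy", "a1.0_copy", "a0.1_copy"]
I = {name: k for k, name in enumerate(NAMES)}
EXTENDED = [
    ("cx", I["q"], I["q0.0_copy"]),  # copy init vertex q0.0
    ("cx", I["a"], I["a0.0_copy"]),  # copy init vertex a0.0
    ("h", I["q"]),                   # H creates vertex q1.0
    ("cx", I["q"], I["q1.0_copy"]),
    ("cx", I["q"], I["a"]),          # first CX creates vertex a1.0
    ("cx", I["a"], I["a1.0_copy"]),
    ("cx", I["q"], I["a"]),          # second CX brings a back to value index 0: a0.1
    ("cx", I["a"], I["a0.1_copy"]),
]

final = run(EXTENDED, len(NAMES))
summands = [dict(zip(NAMES, bits)) for bits in final]
# Same qubit and same value index => same value in every summand.
same_value_index = all(s["a0.0_copy"] == s["a0.1_copy"] for s in summands)
# Each qubit agrees with the copy of its last vertex.
matches_last_copy = all(s["q"] == s["q1.0_copy"] and s["a"] == s["a0.1_copy"] for s in summands)

# Ignoring the copy qubits (summing amplitudes per value of q and a) gives Fig. 7a's state.
reduced = {}
for bits, amp in final.items():
    reduced[bits[:2]] = reduced.get(bits[:2], 0.0) + amp

print(same_value_index, matches_last_copy)  # True True
print(reduced)  # amplitude 1/sqrt(2) on (q, a) = (0, 0) and on (q, a) = (1, 0)
```

Note that $\underline{a_{1.0}}$ differs from $\underline{a_{0.0}}$ in the summand where q is 1, as expected for different value indices.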
In Lem. C.3 (App. C), we formally prove that these two facts hold for any well-valued circuit graph, as defined in Def. 3.1.
We further show in App. C that any circuit graph built with evolveVertex (Fig. 3a) is well-valued.
Final Values in the Extended Graph. The assertion in Lin. 36 (resp. Lin. 50) ensures that in the circuit graph Ḡ built by applying the uncomputation strategy (resp. built by Reqomp-Lazy), the last vertex on every ancilla qubit has value index 0. Hence, those qubits hold the same value as their initial value, i.e., |0⟩. More precisely, consider a circuit graph G with ancilla qubits A and non-ancilla qubits Q, and denote by Ḡ the circuit graph after uncomputation. Then any summand in the final state after applying the extended version of Ḡ is of the form $|0\dots0\rangle_A \otimes |i\rangle_Q \otimes |\dots\rangle_V$, where V denotes all the copy qubits in E(Ḡ).
The assertions in Lin. 36 and Lin. 50 further check that the value indices of non-ancilla qubits match those of their respective last vertices in G.

As we show more formally in App. C, this means the following: if the extended version E(G) of G maps some initial state to a superposition of basis states of the form $|k\rangle_A \otimes |k'\rangle_Q \otimes |v\rangle_V$, then the extended version E(Ḡ) of Ḡ maps the same initial state to a superposition in which every summand has the form $|0\dots0\rangle_A \otimes |k'\rangle_Q \otimes |v'\rangle_{V'}$, where V′ denotes the set of copy qubits in E(Ḡ).
Circuit Graph Semantics. Importantly, the semantics of the unextended circuit follows straightforwardly from the semantics of the extended circuit. In Fig. 7, simply ignoring the rows of the copy qubits added in Fig. 7b yields the correct final state. If we similarly ignore the values of the copy qubits V and V′ in the two expressions above, we recover the correct uncomputation theorem for circuits C and C̄:
$$\llbracket C\rrbracket\big(|0\dots0\rangle_A \otimes \varphi\big) = \sum_{k} |k\rangle_A \otimes \varphi_k \;\implies\; \llbracket \overline{C}\rrbracket\big(|0\dots0\rangle_A \otimes \varphi\big) = |0\dots0\rangle_A \otimes \sum_{k} \varphi_k$$
Multiple Graphs. Note that here we assumed that G and Ḡ have the same effect, as they apply the same gates for the same value indices. Proving this formally requires extra work, done in Lem. C.4 (App. C).
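The correct uncomputation property states that if C maps $|0\dots0\rangle_A \otimes \varphi$ to $\sum_k |k\rangle_A \otimes \varphi_k$, then the uncomputed circuit C̄ maps it to $|0\dots0\rangle_A \otimes \sum_k \varphi_k$. As a small numeric sanity check, the sketch below simulates a two-qubit circuit C (an H gate on a non-ancilla q, then a CX computing the ancilla a) and a hand-written C̄ that appends one more CX; C̄ is our own toy example rather than Reqomp output, and only the input φ = |0⟩ is tested.

```python
import math


def run(circuit, n_qubits):
    """Sparse state-vector simulation from |0...0>; gates are ("h", t) or ("cx", c, t)."""
    state = {(0,) * n_qubits: 1.0}
    for gate in circuit:
        out = {}
        for bits, amp in state.items():
            if gate[0] == "cx":
                _, c, t = gate
                b = list(bits)
                b[t] ^= b[c]
                out[tuple(b)] = out.get(tuple(b), 0.0) + amp
            else:  # Hadamard on qubit t
                t = gate[1]
                for v in (0, 1):
                    b = list(bits)
                    b[t] = v
                    sign = -1.0 if bits[t] == 1 and v == 1 else 1.0
                    out[tuple(b)] = out.get(tuple(b), 0.0) + sign * amp / math.sqrt(2)
        state = {b: a for b, a in out.items() if abs(a) > 1e-12}
    return state


A, Q = 0, 1                       # qubit 0: ancilla a, qubit 1: non-ancilla q
C = [("h", Q), ("cx", Q, A)]      # computes a but leaves it entangled with q
C_BAR = C + [("cx", Q, A)]        # same circuit followed by uncomputation of a


def branches(state):
    """Group amplitudes by ancilla value k: returns {k: {q: amplitude}}."""
    phi = {}
    for (a, q), amp in state.items():
        phi.setdefault(a, {})
        phi[a][q] = phi[a].get(q, 0.0) + amp
    return phi


phi = branches(run(C, 2))          # the phi_k of the premise
expected = {}                      # sum over k of phi_k
for branch in phi.values():
    for q, amp in branch.items():
        expected[q] = expected.get(q, 0.0) + amp
got = branches(run(C_BAR, 2))      # should only contain ancilla value k = 0

print(sorted(phi))  # [0, 1]: a is entangled with q after C
print(sorted(got))  # [0]: a is back to |0> after C_BAR
```

The check compares the non-ancilla part of C̄'s output against $\sum_k \varphi_k$ computed from C's output.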

Evaluation
We have evaluated Reqomp on an existing benchmark to answer the following research questions:
Q1 Circuit Efficiency: Can Reqomp create efficient circuits in terms of number of qubits and gates, while allowing users to trade one for the other?
Q2 Usability: Is Reqomp fast and directly applicable to a wide range of circuits?
Implementation. We implemented Reqomp as a language extension of Qiskit, using Qiskit's built-in AncillaRegister type to mark ancilla variables in the circuit. Like Qiskit, our extension is implemented in Python.

Benchmarks and Baseline
To evaluate Reqomp, we used the benchmark from Unqomp [1]. The first column in Table 1 lists the circuits in our benchmark, separated into "small" and "big" circuits. While the "small" circuits were taken directly from Unqomp, we generated the "big" circuits by re-parametrizing the original circuits. This allows us to demonstrate that Reqomp also performs well on larger circuits.
For completeness, we provide the exact parameters for each circuit in App.D, including the resulting circuit sizes.

Circuits.
To provide an intuition on our benchmark, we explain selected circuits (see [1, §7.1] for details).
IntegerComparator takes a constant parameter n and multiple input qubits encoding a value v, and flips its output qubit if and only if v ≥ n. MCX flips its output qubit if and only if all its input qubits are one. MCRY applies a rotation to its output qubit if and only if all its control qubits are one. PiecewiseLinearR applies a rotation f(x) to its output qubit, where x is the value on its input qubits and f is piecewise linear. PolynomialPauliR works analogously, but for polynomial f. WeightedAdder takes as parameters a list of weights λ_0, …, λ_n and outputs $\sum_i \lambda_i q_i$, where the q_i are the input values.
Selecting a Baseline. We provide a thorough overview of related work in §7. Of the many works discussed there, only four can take circuits as input: Square [2], Quipper [7], ReQWire [4] and Unqomp [1]. Of these, ReQWire can only verify uncomputation in circuits but cannot synthesize it, and we show in §7.1 that, due to various shortcomings, Square is not a viable option for uncomputation. This leaves only Quipper and Unqomp. As [1] showed that Unqomp generally outperforms Quipper, we choose Unqomp as our baseline.
Other works take as input boolean formulas (to be compiled to circuits) [8,9,10], focus on building uncomputation strategies without explaining how to apply them [11,12], or do not compile to circuits [13].
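The informal descriptions of the benchmark circuits given above can be paired with classical reference functions on computational basis inputs, which is one way to test a synthesized circuit against its intended behaviour. The sketch below covers the classical benchmarks IntegerComparator, MCX, and WeightedAdder; the little-endian encoding of v (qubit i holds bit i) and the function names are our assumptions, not the benchmark's interface.

```python
def integer_comparator(v_bits, n):
    """Output flip of IntegerComparator: 1 iff the encoded value v satisfies v >= n."""
    v = sum(bit << i for i, bit in enumerate(v_bits))  # little-endian (assumed)
    return int(v >= n)


def mcx(inputs):
    """MCX flips its output iff all inputs are 1."""
    return int(all(inputs))


def weighted_adder(weights, q_bits):
    """WeightedAdder output: sum of lambda_i * q_i."""
    return sum(w * q for w, q in zip(weights, q_bits))


print(integer_comparator([1, 0, 1], 5), integer_comparator([0, 0, 1], 5))  # 1 0
print(mcx([1, 1, 1]), mcx([1, 0, 1]))                                      # 1 0
print(weighted_adder([1, 2, 3], [1, 0, 1]))                                # 4
```

Circuits such as MCRY, PiecewiseLinearR, and PolynomialPauliR apply rotations and therefore have no purely classical reference of this kind.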

Q1: Circuit Efficiency
We now discuss the efficiency of circuits produced by Reqomp in terms of qubits and gates and compare them to circuits produced by Unqomp [1].

Approach.
For each circuit, we ran Reqomp targeting every possible number of ancilla qubits (parameter nAncillaQubits). For all calls that terminated without error, we then recorded the number of qubits and gates of the resulting circuit (with uncomputation).
We note that Reqomp had to fall back to Reqomp-Lazy for circuits Multiplier and WeightedAdder, as the ancilla dependencies of these circuits are not linear. While Reqomp-Lazy succeeds on these circuits and even outperforms Unqomp, it cannot offer multiple space-time trade-offs.
Table 1: Reqomp results when targeting a specific ancilla qubit reduction compared to Unqomp (e.g., −66.7 indicates a reduction by 66.7%). Gate counts are reported relative to Unqomp (e.g., 70.5 indicates an increase by 70.5%). Columns Max and Min report the results for the most aggressive settings, respectively optimizing only for number of qubits and optimizing only for number of gates. Columns -75%, -50%, and -25% report the gate counts when achieving the respective ancilla qubit reductions. Entries "x" indicate that a given ancilla qubit reduction was not achieved.

Results
Table 1 summarizes our results. Note that gate counts are expressed as a percentage of Unqomp gate counts. Using the maximum number of ancilla qubits (column Min, as this is the minimal reduction) yields better results than Unqomp for 10 circuits, and equivalent results for the remaining 10 circuits. For example, Reqomp saves 5.2% of gates on circuit IntegerComparator, without requiring additional qubits. This is because Reqomp can identify uncomputation already present in the original circuit, allowing it to avoid unnecessary operations when uncomputing or recomputing an ancilla or even a control. Analogous effects occur for PiecewiseLinearR, WeightedAdder, and Multiplier, where the last two are handled by Reqomp-Lazy. More importantly, Table 1 demonstrates that Reqomp can significantly reduce the number of ancilla qubits compared to Unqomp: by up to 96% for two examples, and by at least 25% for 16 out of 20 circuits. This reduction comes at only a moderate cost in gate count, below 28% for qubit reductions of 25%. As most quantum computers are more limited in terms of qubits than gates, these trade-offs are highly favorable. Further, for some examples the reduction in qubits comes at almost no cost in gates: for PiecewiseLinearR, reducing the number of ancilla qubits by 75% only increases the number of gates by 17.6%.
Trade-Offs. To further illustrate the gate count cost incurred by these reductions, Figs. 8a-8b show a more fine-grained visualization of the trade-offs between ancilla qubits and gate count.
Overall, we observe that on all circuits, reducing the number of available ancilla qubits never decreases the gate count of the resulting circuit. However, the rate at which the gate count increases varies among circuits, as discussed next.
For some benchmarks such as PiecewiseLinearR (Figs. 8a-8b) and PolynomialPauliR (Table 1), Reqomp can drastically reduce the number of ancillae at almost no cost in terms of gates.
For other benchmarks such as MCX (Figs. 8a-8b) and MCRY (Table 1), Reqomp can still reduce the number of ancillae substantially, but at a significant cost in terms of gates. In such cases, the appropriate ancilla reduction depends on the available hardware: a programmer with access to Reqomp can then systematically select the right trade-off.
Other circuits fall somewhere between these two categories (Figs. 8a-8b and Table 1): Reqomp can reduce the number of ancilla qubits, at a non-negligible cost in terms of gates.
Very Small Number of Qubits. Fig. 8 further demonstrates that enforcing a very small number of ancillae typically increases the number of applied gates significantly. For instance, MCX with 200 controls can be implemented with only 8 ancilla qubits, but this requires a staggering 21 831 gates, compared to only 3579 when 200 ancillae are used. Overall, we conclude that enforcing a very small number of ancilla qubits is typically not a good approach.
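To make the scale of this cost concrete, the snippet below applies the relative-change convention used in Table 1 (negative values are reductions, positive values are increases) to the MCX numbers above. Both configurations here are Reqomp runs, so the baseline is the 200-ancilla Reqomp circuit rather than Unqomp; the helper name relative_change is our own.

```python
def relative_change(new, baseline):
    """Percentage change of new relative to baseline (negative = reduction)."""
    return 100.0 * (new - baseline) / baseline


# MCX with 200 controls: 8 instead of 200 ancilla qubits, 21 831 instead of 3579 gates.
ancilla_change = relative_change(8, 200)
gate_change = relative_change(21831, 3579)
print(f"ancillae {ancilla_change:+.1f}%, gates {gate_change:+.1f}%")
# ancillae -96.0%, gates +510.0%
```

In other words, the 8-ancilla circuit uses roughly 6.1 times as many gates as the 200-ancilla one.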
Depth. For completeness, Figs. 8c-8d show the trade-off between ancilla qubit reduction and circuit depth. As we do not optimize for circuit depth, reducing the number of ancillae sometimes yields shorter circuits. Still, overall, circuit depth behaves analogously to gate count, generally increasing for reduced ancilla qubit counts, at different rates depending on the circuit.
Interestingly, in some cases, we can reduce the ancilla count at almost no cost in circuit depth, even though there is a cost in gate count.For example, reducing ancillae from 99 to 25 on Adder only increases depth by 31%, even though it increases the gate count by 64%.

Q2: Reqomp Usability
We also investigated the usability of Reqomp, showing that it is both fast and directly applicable to many quantum circuits.

Reqomp Runtime. Our evaluation indicated that Reqomp is fast: it synthesized uncomputation for all circuits in Table 1 within five seconds. Furthermore, running Reqomp typically takes as much time as decomposing the resulting circuit to basic gates using Qiskit's built-in decompose() function. We hence believe that Reqomp can be integrated into the programmer's workflow without incurring a significant slowdown.
Applicability. Recall that even for a circuit where uncomputation is possible in principle, Reqomp may raise an error. We therefore investigated how frequently Reqomp succeeds in practice, comparing it to other tools. We find that Reqomp (with the fallback strategy Reqomp-Lazy) finds a circuit with uncomputation for all input circuits. In contrast, Unqomp can only cover 60% of those circuits directly; we explain shortly how we tweaked Unqomp to also cover the remaining 40%. Furthermore, only 50% of the circuits in our benchmark are purely classical, hence any tool that exclusively supports qfree gates can at most be used on 50% of the examples.
Unqomp Limitations. Unqomp can only handle 60% of the circuits in our evaluation directly, because it cannot accurately handle uncomputation that already occurs in the input circuit. Fig. 9 illustrates this on a circuit applying a C̄X gate (see red box on the left), where the bar over C indicates that the control is inverted. To invert the control, the circuit applies an X gate to p before the CX gate, and another X gate afterwards to restore the value of p. To uncompute a, it is hence necessary to track that after two X gates, p is back to its original value, shown as p_0 in Fig. 9, and therefore that applying a third X gate will bring its value to p_1 again, allowing us to uncompute a.
Value indices allow Reqomp to precisely track those value changes and insert the uncomputation gates (in the green box on the right). In contrast, Unqomp fundamentally cannot allow for recomputation, as its correctness relies on each operation being computed and uncomputed exactly once. It further does not recognize uncomputation or recomputation already present in the original circuit. Therefore, in Fig. 9, Unqomp cannot recognize that the second X gate recovers the original value of p. Even if it did, it could not recompute p_1 to uncompute a.
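The following sketch shows how value indices can capture the three X gates on p in Fig. 9. It is a simplification written for illustration, not Reqomp's implementation: it handles only uncontrolled single-qubit gates, treats only X as qfree, and mirrors the (fwd) and (bwd) rules of Def. C.2 (App. C) by reusing the index reached by a gate already applied from the same index, and by returning to the previous index when a qfree gate undoes the gate that first produced the current one. The class ValueIndexTracker and its fields are hypothetical.

```python
from dataclasses import dataclass, field

# Inverse of each gate treated as qfree in this sketch (assumption: only X here).
QFREE_INVERSE = {"X": "X"}


@dataclass
class ValueIndexTracker:
    """Simplified value-index bookkeeping for uncontrolled single-qubit gates."""
    edges: dict = field(default_factory=dict)    # (qubit, index, gate) -> next index
    origin: dict = field(default_factory=dict)   # (qubit, index) -> (previous index, gate)
    current: dict = field(default_factory=dict)  # qubit -> current value index
    fresh: dict = field(default_factory=dict)    # qubit -> next unused value index

    def apply(self, qubit, gate):
        s = self.current.get(qubit, 0)
        key = (qubit, s, gate)
        if key not in self.edges:
            first = self.origin.get((qubit, s))
            if first is not None and QFREE_INVERSE.get(first[1]) == gate:
                # bwd: the gate undoes the qfree gate that first produced index s
                self.edges[key] = first[0]
            else:
                # a genuinely new value: allocate a fresh index on this qubit
                t = self.fresh.get(qubit, 1)
                self.fresh[qubit] = t + 1
                self.origin[(qubit, t)] = (s, gate)
                self.edges[key] = t
        # fwd: repeating a known gate from a known index reuses its target index
        self.current[qubit] = self.edges[key]
        return self.current[qubit]


fig9 = ValueIndexTracker()
print([fig9.apply("p", "X") for _ in range(3)])             # [1, 0, 1]
other = ValueIndexTracker()
print([other.apply("q", g) for g in ("X", "X", "H")])      # [1, 0, 2]
```

The first run shows that the third X brings p back to value index 1 (p_1), so an earlier vertex with that value index can be matched. In the second run, H is not qfree and therefore produces a new value index 2; this is the circuit q: X X H discussed for getPath in App. B.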
In our evaluation (Table 1), we bypassed this type of issue by defining the red block on the left as a custom gate controlled by p. Unqomp then never decomposes this new gate, assumes it keeps p constant, and places it to uncompute a. Unfortunately, this approach makes Unqomp harder to use, and in some cases makes the resulting circuit less efficient.

Related Work
We now discuss works related to Reqomp.

Square
Even though it cannot synthesize uncomputation code, Square [2] looks very closely related to Reqomp at first sight. Specifically, it presents "a compiler that automatically [places uncomputation] in order to manage the trade-offs in qubit savings and gate costs" [2, §1]. Unfortunately, Square suffers from various shortcomings that prevent a meaningful comparison to Reqomp.
Square Problem Statement. Square takes as input a program defining a qfree circuit (non-qfree gates are not supported). In this program, each function consists of the three blocks Compute (indicating forward computation), Store (indicating computation of outputs), and Uncompute (indicating uncomputation). Square then compiles this program to a circuit by arranging these blocks, possibly repeating blocks when recomputation is helpful.
Square defines three different strategies for interleaving the blocks: Lazy (uncompute as late as possible), Eager (uncompute as early as possible), and finally Square itself, using a custom heuristic. For the example CCCCH in Fig. 1, Lazy would correspond to the 3-qubit strategy shown in Fig. 1b and Eager to the 2-qubit strategy shown in Fig. 1c. We now present the main shortcomings of Square.
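For the CCCCH example, the difference between a lazy and an eager schedule can be checked classically. The two gate lists below are our own schedules written for illustration (the exact circuits of Fig. 1b and Fig. 1c may differ in detail): the lazy one keeps a, b, and c on three ancilla qubits u0, u1, u2, while the eager one frees u0 early and recomputes a later, using two ancilla qubits at the cost of extra Toffoli gates. Because o, p, q, and r are never modified here, the same compute block for a can be reused at any position. The check enumerates all 16 basis inputs, verifies that H would be applied exactly when o, p, q, and r all hold 1, and that all ancillae end in 0; the target t itself is not simulated.

```python
from itertools import product

# Steps: ("ccx", control1, control2, target) or ("ch", control); H acts on t.
LAZY = [  # a, b, c on three ancilla qubits, uncomputed as late as possible
    ("ccx", "o", "p", "u0"), ("ccx", "u0", "q", "u1"), ("ccx", "u1", "r", "u2"),
    ("ch", "u2"),
    ("ccx", "u1", "r", "u2"), ("ccx", "u0", "q", "u1"), ("ccx", "o", "p", "u0"),
]
EAGER = [  # two ancilla qubits: uncompute a early, reuse u0 for c, recompute a later
    ("ccx", "o", "p", "u0"), ("ccx", "u0", "q", "u1"), ("ccx", "o", "p", "u0"),
    ("ccx", "u1", "r", "u0"), ("ch", "u0"), ("ccx", "u1", "r", "u0"),
    ("ccx", "o", "p", "u0"), ("ccx", "u0", "q", "u1"), ("ccx", "o", "p", "u0"),
]


def check(schedule):
    """Return (gate count, ancilla qubit count) after checking all 16 basis inputs."""
    ancillas = sorted({step[-1] for step in schedule
                       if step[0] == "ccx" and step[-1].startswith("u")})
    for o, p, q, r in product((0, 1), repeat=4):
        state = {"o": o, "p": p, "q": q, "r": r, **{u: 0 for u in ancillas}}
        fired = None
        for step in schedule:
            if step[0] == "ccx":
                _, c1, c2, target = step
                state[target] ^= state[c1] & state[c2]
            else:
                fired = state[step[1]]  # H on t is applied iff this control holds 1
        assert fired == (o & p & q & r), "H controlled by the wrong condition"
        assert all(state[u] == 0 for u in ancillas), "ancilla left dirty"
    return len(schedule), len(ancillas)


print(check(LAZY), check(EAGER))  # (7, 3) (9, 2)
```

Saving one ancilla qubit thus costs two extra Toffoli gates in this toy setting.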
Constant Compute/Uncompute Blocks.As mentioned in §1, the gates needed to uncompute an ancilla variable may depend on where this uncomputation occurs in the circuit.It is hence impossible to define fixed Compute and Uncompute blocks to be applied anywhere.
For instance, consider the circuit in Fig. 10a. It uses three ancilla variables a, b, and c to compute the output variable r from the input i. Fig. 10a highlights the Compute and Uncompute blocks Square would consider, namely blocks a, b, and c for computation and blocks a†, b†, and c† for uncomputation. Note how the value of qubit i is changed by block b and restored later by block b†, ensuring that qubit i has the same value for the CX gate in block a† as it had in block a. Now, if we want to save one ancilla qubit by uncomputing ancilla variable a early, we get the circuit shown in Fig. 10b. Here, when uncomputing a for the first time, the value of i has been changed in block b and is not yet restored. To correctly uncompute a in block a†₂ (different from block a†), it is hence necessary to restore i using an X gate before using it as a control to uncompute a. Similarly, block b†₂ must change the value of i again.
Not accounting for the above, Square assumes that no matter its placement, uncomputation code can be kept unchanged. In particular, its eager strategy would use the Compute and Uncompute blocks from Fig. 10a, yielding Fig. 10c. This is clearly incorrect, as this circuit has different semantics than the one in Fig. 10a. For example, for input |0⟩_i |0⟩_t, Fig. 10a produces state |0⟩_i |0⟩_t, while Fig. 10c produces state |0⟩_i |1⟩_t (assuming ancillae are in state |0⟩).
We note that Square does not exclude such patterns; in fact, its little-belle benchmark contains an analogous pattern. (Benchmark little-belle is available at https://github.com/epiqc/Benchmarks/blob/master/bench/square-cirq/synthetic/little_belle.py. We note that different uncomputation strategies do not yield different results on it, as it does not contain gates modifying the output and hence is semantically equivalent to the identity.)
Incomplete Uncomputation. Besides only supporting fixed uncomputation code, Square may also skip uncomputation of some ancilla variables. For some examples evaluated in [2], the implementation of the lazy strategy does not insert any uncomputation code at all, leaving all ancilla variables dirty, while the eager strategy uncomputes all of them. Specifically, we believe that the reported differences between strategies in the Square publication ([2, Tab. III]) on the benchmarks RD53, 6SYM, 2OF5, and ADDER4 are only due to leaving some ancillae dirty: as these benchmarks do not contain nested uncomputation, the order of uncomputation should not make a difference.
Additional Parameters. Finally, the implementation of Square is inconsistent with the system described in [2]. Specifically, using the interface to specify Compute blocks requires providing 7 parameters, and some benchmarks evaluated in [2] also contain Unrecompute and Recompute blocks not mentioned in the publication [2]. Even though the authors provided us with brief explanations of these parameters on request, we could not confidently derive correct parameters for new benchmarks.

Purely Classical Circuits
Most works synthesizing uncomputation cannot handle non-qfree gates [4,7,8,9,14]. It has already been established [1] that using such works on quantum circuits by separating out the qfree subparts typically yields inefficient circuits, and is sometimes even impossible.
In the following, we discuss works which only support qfree gates, and define a custom strategy allowing to trade qubits for gates. We have already discussed Square in §7.1.
Boolean Functions. Revs [8,9] translates irreversible classical functions to reversible circuits. It focuses on optimization possibilities during the translation from boolean functions to reversible circuits, but also offers an uncomputation strategy, though without the option of trading qubits for gates.
Similarly, [10] also translates boolean specifications to reversible circuits. While it introduces another uncomputation heuristic, it also cannot trade qubits for gates.
We expect that both of those strategies could be incorporated into Reqomp, possibly yielding more efficient circuits.
Pebble Games. Multiple works present uncomputation strategies for classical reversible computation, which can be reduced to solving pebble games [12]. Importantly, while pebble games operate on dependency graphs on values, Reqomp operates on quantum circuits. In particular, pebble games assume all values can be uncomputed, which is incorrect for non-qfree gates. Further, a direct translation of circuits to such graphs would ignore repeated values, leading to issues analogous to Fig. 9. Conversely, conflating repeated values can lead to cyclic dependencies, which pebble games do not support.
Knill [5] provides an optimal yet efficient solution for linear dependencies.As most circuits we encounter in practice exhibit linear dependencies, Reqomp uses the same uncomputation strategy.Meuli et al. [11] suggest using a SAT-solver to handle arbitrary dependencies, which may be a possible extension of Reqomp.

Non-Qfree Circuits
We now discuss works offering uncomputation for non-qfree circuits.
Language Level.Quantum languages like Quipper [7] and Q# [13] offer convenience functions to automatically insert uncomputation.However, these functions are often tedious to use, and may insert incorrect uncomputation (see [1, §8] for details).
Silq [3] uses a type system to detect which variables can be safely uncomputed, but does not synthesize this uncomputation.Overall, none of those works can constrain the number of ancillae used.
Circuit Level. We are aware of only two works supporting uncomputation for non-qfree circuits. ReQWire [4] can only verify user-supplied uncomputation (in the case of non-qfree circuits). Unqomp [1] can synthesize uncomputation for quantum circuits, but cannot trade qubits for gates. Further, as discussed in §6, it uses a notion of circuit graphs that does not track qubit values and is therefore unable to directly uncompute many examples that Reqomp can handle.

Conclusion
We introduced Reqomp, a method to synthesize and place efficient uncomputation for quantum circuits with space constraints. Reqomp is proven correct and can easily be integrated into circuit-based quantum languages such as Qiskit. Our evaluation demonstrates that Reqomp is widely applicable and yields a wide range of space-time trade-offs, for instance allowing us to generate tightly space-constrained circuits that use only a few ancilla qubits.


B Algorithms
B.1 Partitioning
Fig. 11 shows the algorithm for partitioning the input graph.

GetPath.
The function getPath used by evolveVertexUntil is shown in Fig. 12. For ancilla variables, it simply returns the shortest path between the two values in the value graph. However, for non-ancilla variables, it forces the computation of intermediate values that may not have been computed yet. This could happen for a circuit such as q: X X H, i.e., three gates applied in sequence to qubit q. Here, the value graph has an edge from q_0 to q_1 (first X), an edge back from q_1 to q_0 (second X), and an edge from q_0 to q_2 (H). Therefore, if we want to compute q_2 from q_0, the shortest path is simply q_0 → q_2. However, as H is not qfree, once q_2 has been computed, it can never be uncomputed again, and therefore we can never compute q_1, which may be needed for some later computation.
To correct this, we introduce q_1 (if it has not already been computed in Ḡ) in the path, giving: q_0 → q_1 → q_0 → q_2.
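A minimal sketch of such a path query, assuming the value graph is given as an adjacency dict: plain breadth-first search yields the shortest path used for ancillae, and, as a hypothetical mechanism for the non-ancilla case, optional waypoints force the path through values that still need to be computed. Deciding which values to force is the job of getPath (Fig. 12); here q1 is simply passed by hand, and the function names are our own.

```python
from collections import deque


def shortest_path(graph, src, dst):
    """Breadth-first shortest path in an unweighted directed graph, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def get_path(graph, src, dst, forced=()):
    """Concatenate shortest paths src -> forced[0] -> ... -> dst."""
    stops = [src, *forced, dst]
    path = [src]
    for a, b in zip(stops, stops[1:]):
        segment = shortest_path(graph, a, b)
        if segment is None:
            return None
        path.extend(segment[1:])  # drop the duplicated start of each segment
    return path


# Value graph of q: X X H  (q0 -X-> q1, q1 -X-> q0, q0 -H-> q2).
G_VAL = {"q0": ["q1", "q2"], "q1": ["q0"]}
print(get_path(G_VAL, "q0", "q2"))                 # ['q0', 'q2']
print(get_path(G_VAL, "q0", "q2", forced=["q1"]))  # ['q0', 'q1', 'q0', 'q2']
```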

Linear Steps
Fig. 13 shows getLinearStrat. It is adapted from [5]: we added the uncLast parameter, which allows us to apply it to ancillae only (that is, we want all qubits to be computed once and then uncomputed, whereas the original algorithm did not uncompute the last qubit in the dependency line).
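The trade-off that getLinearStrat navigates can be illustrated on tiny instances with the reversible pebble game on a line: pebbling node i stands for computing the i-th ancilla of the dependency line, and the pebble limit stands for the number of available ancilla qubits. The sketch below is not Knill's algorithm [5], which computes optimal strategies efficiently; it is an exhaustive breadth-first search over pebble configurations that returns the minimum number of moves. Its unc_last flag mirrors the uncLast parameter: when set, the last node must be pebbled at some point and all pebbles must be removed at the end. The function name and game encoding are our own.

```python
from collections import deque


def linear_pebbling(n, max_pebbles, unc_last):
    """Minimum number of moves in the reversible pebble game on a line of n nodes.

    Node i (0-based) may be pebbled or unpebbled only while node i-1 holds a pebble
    (node 0 is always free). At most max_pebbles pebbles may be placed at once.
    unc_last=False: end with only the last node pebbled (its value is kept).
    unc_last=True: pebble the last node at some point, then end with no pebbles.
    Returns None if the goal is unreachable.
    """
    last = 1 << (n - 1)
    start = (0, False)  # (bitmask of pebbled nodes, last node pebbled so far?)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        mask, reached = queue.popleft()
        done = (mask == 0 and reached) if unc_last else (mask == last)
        if done:
            return dist[(mask, reached)]
        for i in range(n):
            if i > 0 and not mask & (1 << (i - 1)):
                continue  # node i can only change while node i-1 is pebbled
            new_mask = mask ^ (1 << i)
            if bin(new_mask).count("1") > max_pebbles:
                continue  # would exceed the number of available ancilla qubits
            state = (new_mask, reached or bool(new_mask & last))
            if state not in dist:
                dist[state] = dist[(mask, reached)] + 1
                queue.append(state)
    return None


print(linear_pebbling(3, 3, unc_last=False), linear_pebbling(3, 2, unc_last=False))  # 5 None
print(linear_pebbling(3, 3, unc_last=True), linear_pebbling(3, 2, unc_last=True))    # 6 8
```

With one pebble fewer, computing and uncomputing a line of three ancillae costs 8 instead of 6 moves, while keeping the last value pebbled becomes impossible. The search is exponential in n, which is why an efficient method such as Knill's is needed in practice.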

C Formal Correctness Proof
In the following, we provide a formal proof that Reqomp synthesizes correct uncomputation according to Def. 2.1.

C.1 Definitions and Helper Lemmas
We first define what we consider to be a valid circuit graph, following [1]. In a valid circuit graph, we can define for any non-init vertex n its predecessor pred(n) as the only vertex m such that m → n (the target edge from m goes to n). We can also define for any qubit q its last vertex last(q): it is the only vertex on qubit q with no outgoing target edge.
We now recall the well-valued circuit graph definition.

Definition C.2 (Well-valued Circuit Graph). We say a valid circuit graph is well-valued iff:
(i) all vertex names are of the form $q_{s.i}$, where q is the name of the vertex qubit and s and i are natural numbers;
(ii) there are no duplicate vertices;
(iii) the init vertex on each qubit q is named $q_{0.0}$, and for any $q_{s.i}$ in G, $q_{s.0}$ is in G;
(iv) any gate vertex $q_{s.i}$ with s > 0 satisfies one of the following:
(fwd) $\mathrm{valIdx}(\mathrm{pred}(q_{s.i})) = \mathrm{valIdx}(\mathrm{pred}(q_{s.0}))$, and $q_{s.i}$ and $q_{s.0}$ have the same gate and the same control vertices (up to their instance indices);
(bwd) writing $s' = \mathrm{valIdx}(\mathrm{pred}(q_{s.i}))$, we have that (i) $\mathrm{valIdx}(\mathrm{pred}(q_{s'.0})) = s$, (ii) $q_{s.i}.\mathrm{gate}$ is qfree and equal to $q_{s'.0}.\mathrm{gate}^\dagger$, and (iii) $q_{s.i}$ and $q_{s'.0}$ have the same controls (up to instance indices).
Vertices in a well-valued circuit graph are of the shape $q_{s.i}$, where we call s its value index (valIdx in the algorithms) and i its instance index. The instance index i is 0 for the first occurrence of $q_s$ in the graph; otherwise, we only use its value to ensure uniqueness of the vertex names.
Due to the following lemma, it suffices to only consider valid and well-valued circuit graphs:
Lemma C.1 (evolveVertex Correctness). For a valid and well-valued circuit graph G, any number of calls to evolveVertex results in a valid and well-valued circuit graph $\overline{G}$ such that (i) $\{q_{s.0} \in G\}$ is a subset of $\{q_{s.0} \in \overline{G}\}$, and (ii) any $q_{s.0}$ in both G and $\overline{G}$ has the same gate and control vertices (up to instance index) in both graphs.
Proof. By induction on the depth of calls to evolveVertex.
We then define the extended graph E(G) of a circuit graph G. Roughly, we want E(G) to keep a copy of every vertex $q_{s.i}$ in G, saved on a fresh qubit $\underline{q_{s.i}}$. For a graph G with one qubit and two vertices, we show E(G) in Fig. 14.

Definition C.3 (Extended Graph).
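For any circuit graph G = (V, E), we define its extended graph E(G) = (V_e, E_e) as follows:
$$V_e = V \cup \big\{ (\underline{q_{s.i}})_{0.0},\ (\underline{q_{s.i}})_{1.0} \;\big|\; q_{s.i} \in V \big\}$$
$$E_e = E \cup \big\{ (\underline{q_{s.i}})_{0.0} \rightarrow (\underline{q_{s.i}})_{1.0} \;\big|\; q_{s.i} \in V \big\} \cup \big\{ q_{s.i} \bullet\!\!\rightarrow (\underline{q_{s.i}})_{1.0} \;\big|\; q_{s.i} \in V \big\}$$
For each $q_{s.i}$ in V, we have added a new qubit $\underline{q_{s.i}}$, with one init vertex and one gate vertex CX controlled by $q_{s.i}$. In the following, we refer to the set of those added qubits as $\underline{V}$. Note that while $(\underline{q_{s.i}})_{1.0}$ is a vertex, $\underline{q_{s.i}}$ is a qubit.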
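As the extended graph is a valid graph, it corresponds to a circuit, and therefore its semantics $\llbracket E(G)\rrbracket$ is well defined. For a given input state φ to G, this allows us to define the projected coefficients $G_p$ by
$$\llbracket E(G)\rrbracket\big(\varphi \otimes |0\dots0\rangle_{\underline{V}}\big) = \sum_{p : E(G).\mathrm{qbs} \to \{0,1\}} G_p\, |p(E(G).\mathrm{qbs})\rangle,$$
where $p(Q) = (p(q^{(1)}), \dots, p(q^{(n)}))$ for qubits $Q = \{q^{(1)}, \dots, q^{(n)}\}$.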
Using these coefficients, we can prove the following three lemmas. First, the semantics of the circuit graph G can be expressed in terms of its projected coefficients $G_p$:
Lemma C.2 (Projected Coefficients for Graph Semantics). For a circuit graph G, we have:
$$\llbracket G\rrbracket\varphi = \sum_{p : E(G).\mathrm{qbs} \to \{0,1\}} G_p\, |p(G.\mathrm{qbs})\rangle$$
Proof. We can prove this by induction on the number of gates in G.
Second, copies have consistent values. Specifically, for a given qubit q and value index s, all copies $\underline{q_{s.i}}$ hold the same value as $\underline{q_{s.0}}$, and the value of q is the same as the copy of the last vertex on q:
Lemma C.3 (Null Projected Coefficients). For a valid and well-valued circuit graph G = (V, E) and $p : E(G).\mathrm{qbs} \to \{0,1\}$, we have $G_p = 0$ if (i) $p(\underline{q_{s.i}}) \neq p(\underline{q_{s.0}})$ for some $q_{s.i}$, or (ii) $p(q) \neq p(\underline{\mathrm{last}(q)})$ for some qubit q.
Finally, if $G_p \neq 0$, it depends only on the gates used for the first computation of each $q_{s.0}$.

C.2 Main Proof
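Using Lem. C.1-C.4, we can prove the correctness of Reqomp:
Theorem C.1 (Correctness). Let G be a circuit graph built from a circuit with n qubits, of which m are ancilla variables. Without loss of generality, we can assume that those ancilla variables $A = \{a^{(1)}, \dots, a^{(m)}\}$ are the first m qubits of G. Let Reqomp(G, A) = $\overline{G}$. Then, for any state φ of the non-ancilla qubits,
$$\llbracket G\rrbracket\big(|0\dots0\rangle_A \otimes \varphi\big) = \sum_{k\in\{0,1\}^m} |k\rangle_A \otimes \varphi_k \tag{5}$$
implies
$$\llbracket \overline{G}\rrbracket\big(|0\dots0\rangle_A \otimes \varphi\big) = |0\dots0\rangle_A \otimes \sum_{k\in\{0,1\}^m} \varphi_k \tag{6}$$
Note that this is an equivalent rewrite of Def. 2.1.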
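Proof. We first make the values of the non-ancilla qubits explicit, and denote $R = G.\mathrm{qbs} \setminus A$. This allows us to rewrite Eq. (5) as:
$$\llbracket G\rrbracket\big(|0\dots0\rangle_A \otimes \varphi\big) = \sum_{k\in\{0,1\}^m} \sum_{k'\in\{0,1\}^{n-m}} \lambda_{kk'}\, |k\rangle_A \otimes |k'\rangle_R \tag{7}$$
Similarly, for $\overline{G}$ we can write:
$$\llbracket \overline{G}\rrbracket\big(|0\dots0\rangle_A \otimes \varphi\big) = \sum_{k\in\{0,1\}^m} \sum_{k'\in\{0,1\}^{n-m}} \overline{\lambda}_{kk'}\, |k\rangle_A \otimes |k'\rangle_R \tag{8}$$
Note that here we use $\overline{\lambda}$ to refer to a coefficient in $\overline{G}$, and not to the complex conjugate of λ.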
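To prove the theorem, it is hence enough to prove that for all k′, (i) $\overline{\lambda}_{kk'} = 0$ for all $k \neq 0$, and (ii) $\overline{\lambda}_{0k'} = \sum_{k\in\{0,1\}^m} \lambda_{kk'}$. To do so, we first identify Eq. (8) with Lem. C.2. This gives us that:
$$\overline{\lambda}_{kk'} = \sum_{\substack{p : E(\overline{G}).\mathrm{qbs} \to \{0,1\} \\ p(\overline{G}.\mathrm{qbs}) = kk'}} \overline{G}_p \tag{9}$$
The assertion at Lin. 36 in the Reqomp algorithm (Fig. 5) and Lem. C.3 then give that for any ancilla qubit $a^{(i)}$, if $p(a^{(i)}) \neq p(\underline{a^{(i)}_{0.0}})$, then $\overline{G}_p$ is null. As $\underline{a^{(i)}_{0.0}}$ copies the initial state of the ancilla, we then get that if $k \neq 0$, then $\overline{\lambda}_{kk'} = 0$, proving (i).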
To prove (ii), we first note that Eq. (9) holds analogously for G. Here, we denote $V_0 = \{q_{s.0} \in V\}$. We then have for any k′ in $\{0,1\}^{n-m}$:
$$\sum_{k\in\{0,1\}^m} \lambda_{kk'} = \sum_{k\in\{0,1\}^m} \sum_{\substack{p : E(G).\mathrm{qbs} \to \{0,1\} \\ p(G.\mathrm{qbs}) = kk'}} G_p = \sum_{p_0 : V_0 \to \{0,1\}} \sum_{\substack{p : E(G).\mathrm{qbs} \to \{0,1\} \\ p(R) = k',\ p|_{V_0} = p_0}} G_p \tag{12}$$
Using Lem. C.3, we have that for any $p_0 : V_0 \to \{0,1\}$, there is a unique $p_0^+ : E(G).\mathrm{qbs} \to \{0,1\}$ such that $p_0^+|_{V_0} = p_0$ and $G_{p_0^+}$ is not known to be null. We can hence further rewrite Eq. (12):
$$\sum_{k\in\{0,1\}^m} \lambda_{kk'} = \sum_{\substack{p_0 : V_0 \to \{0,1\} \\ p_0^+(R) = k'}} G_{p_0^+}$$

Fig. 1a uses three ancilla variables a, b, c, stored in the respective ancilla qubits u_0, u_1, u_2. The first ancilla a holds o • p,

Figure 2: A circuit C and the corresponding value graph g_val and circuit graph G (graph G and value graph g_val omitted).

Figure 5: Applying the uncomputation strategy. We assume G, Ḡ, and g_val are globally available.

Figure 7: Intuition on the correctness of Reqomp.

Figure 8: (a) Gate counts for selected small circuits. (b) Gate counts for selected big circuits. (c) Circuit depth for selected small circuits. (d) Circuit depth for selected big circuits.

Figure 14: Extended graph example; copy vertices are shown in green.

bridges this gap by showing how Reqomp translates the uncomputation strategy into a series of calls to evolveVertex, which build a new circuit graph Ḡ with uncomputation. Reqomp does not insert the uncomputation directly in G, the circuit graph built from the input circuit C. Instead, it builds a new graph Ḡ from scratch, adding computation and uncomputation on all qubits step by step. Ḡ is initialized in Lin. 16. Initially, it contains one init vertex for each non-ancilla qubit in G. For the circuit graph G shown in Fig. 4b, this results in a graph Ḡ that contains only the init vertices of the non-ancilla qubits. To compute an ancilla to the objective value index chosen above, Reqomp relies on the function evolveVtxUntil, shown in Lin. 38. This function first determines the current value index of the variable var in Ḡ, that is, the value index of the last vertex on qubit var. Using g_val, the function then determines the intermediate computation steps required to bring var from its current value index (from) to the objective one (to). This is simply the shortest path in g_val from var_from to var_to.
A New Circuit Graph. ancilla a. For a computation stage, the first step is to allocate a qubit (Lin. 19) and create a new vertex on this qubit (Lin. 20-21). This new vertex is then linked with a target edge to the last vertex on the same qubit, if it exists (Lin. 22). This is typically the case if the qubit was previously used to compute and uncompute another ancilla. Further, at the very end of each uncomputation stage, the qubit is marked as freed and can therefore be reused for later stages (Lin. 32).
Detailed Steps for Ancilla (Un)computation. Now that the ancilla has been allocated a qubit if necessary, Reqomp computes the detailed computation or uncomputation steps the current stage requires, in Lin. 23-26. First, Reqomp decides on the objective value index for the current stage. For an uncomputation stage, it is 0, as the ancilla should be uncomputed back to its initial value. For a computation stage, the ancilla should be computed until its maximum value index in G. For instance, for ancilla a, this maximum value index is 1.
EvolveVtxUntil.
When we introduced evolveVertex in §3.2, we mentioned that it relies on an auxiliary function getAvailCtrl to get the controls required for a vertex. Any heuristic can be used for this function, as long as it only modifies Ḡ through evolveVertex. Reqomp uses the following heuristic. Suppose we need a control c_s (that is, value index s on qubit c) to control some vertex v. We first find the latest vertex (that is, the one furthest along target edges) with this qubit and value index in Ḡ. If this vertex is available for v, that is, adding a control edge from this vertex to v does not create a cycle in Ḡ, we return it. If this vertex is not available, or if no such vertex exists, we recursively call evolveVertex to build a new vertex on qubit c with value index s from the latest vertex on qubit c.
Asserting Uncomputation is Complete. After the uncomputation strategy has been applied as described above, Lin. 36 performs a final check. It asserts that all variables are fully evolved, either back to their initial value index 0 (for ancilla variables), or to their final value index in G (for non-ancillae). If not, Reqomp falls back to the alternative Reqomp-Lazy strategy (see §4.5).

Table 2: Notational conventions used throughout this work.