Quantum-accelerated constraint programming

Constraint programming (CP) is a paradigm used to model and solve constraint satisfaction and combinatorial optimization problems. In CP, problems are modeled with constraints that describe acceptable solutions and solved with backtracking tree search augmented with logical inference. In this paper, we show how quantum algorithms can accelerate CP, at both the levels of inference and search. Leveraging existing quantum algorithms, we introduce a quantum-accelerated filtering algorithm for the $\texttt{alldifferent}$ global constraint and discuss its applicability to a broader family of global constraints with similar structure. We propose frameworks for the integration of quantum filtering algorithms within both classical and quantum backtracking search schemes, including a novel hybrid classical-quantum backtracking search method. This work suggests that CP is a promising candidate application for early fault-tolerant quantum computers and beyond.


Introduction
Constraint programming (CP) is a paradigm used to model and solve constraint satisfaction and combinatorial optimization problems [1]. In CP, a problem is expressed in terms of a declarative model, identifying variables and constraints, and the model is evaluated using a general-purpose constraint solver, leveraging techniques from a variety of fields including artificial intelligence, computer science, and operations research. CP has successfully been used to approach challenging problems in areas such as scheduling, planning, and vehicle routing [1][2][3]. Given a CP model of a problem, a constraint solver performs a search for values of the variables that satisfy the expressed constraints. The search performed is often systematic, as is the case in a backtracking search [4], although some solvers employ incomplete schemes such as local search [5,6]. In this work, we focus on the former, with an emphasis on backtracking search. While CP bears similarity to other paradigms for modeling and solving combinatorial problems, such as boolean satisfiability (SAT) [7] or integer programming (IP) [8], the technology differentiates itself in a number of key ways. Modeling efforts leverage rich decision variable types and focus on identifying and combining encapsulations of frequently occurring combinatorial substructure. Search effort is reduced through logical inference, a process whereby possible variable-value assignments are ruled out based on the constraints in the model. Each constraint is supported by an inference algorithm that rules out value assignments for the variables in the scope of the constraint. Assignments removed in one constraint often propagate to other constraints, allowing even more inference to be performed [1]. This emphasis on inference and propagation is particularly powerful as it can significantly reduce the fraction of the search space that must be explored, detecting "dead ends" as early as possible. It follows that the success of CP is dependent on the availability of efficient procedures for performing inference to prune infeasible value assignments. The benefits of a reduced search space are severely muted if the time taken to perform inference exceeds the time needed to explore the area pruned by that inference. Indeed, a large body of research exists that is expressly focused on finding increasingly efficient inference algorithms for various constraints [9][10][11][12]. In this paper, we explore the use of quantum computing to accelerate CP, focusing on innovations for both inference and search.
In the context of inference, we explore the use of quantum algorithms for graph problems, especially that for finding maximum matchings in graphs [13], to accelerate classical inference algorithms in CP. The quantum algorithms used heavily exploit Grover's algorithm for unstructured search [14]. In addition to speeding up inference, we argue that the structure of inference in the CP paradigm represents an attractive framework for the deployment of quantum algorithms. The encapsulation of combinatorial substructure within CP models provides an elegant mechanism for carving off portions of complex problems into inference subproblems that can be solved by a quantum co-processor. These smaller subproblems require fewer resources, making them promising candidates for the early faulttolerant quantum computers of the future. With respect to search, we investigate the adaptation of existing quantum tree search algorithms [15,16] to the search performed within CP, and provide preliminary resource estimates for this integration. Our adaptations are focused on incorporating quantum filtering within both classical and quantum backtracking search algorithms.
At a high level, this paper is intended to provide researchers in quantum computing with an introduction to core concepts in CP before detailing how quantum algorithms can be applied to the paradigm. While CP has been recently used as an approach to more efficiently compile quantum circuits [17], this work, along with an earlier version [18], conducts the first investigation into the formal integration of quantum computing into CP. Our initial explorations indicate the potential for symbiosis between the two paradigms: quantum algorithms can accelerate both inference and search in CP, and CP offers an attractive, modular formalism for tackling hard problems that makes it a promising candidate application for early fault-tolerant quantum computers, 1 and beyond. This paper primarily combines and exploits existing quantum algorithms from the literature, and some extensions thereto, to provide quantum algorithms for CP. Our main overall contribution is a detailed examination of how these algorithmic building blocks can be put together and applied to CP. This includes a careful analysis of subtle aspects of 1 By "early fault-tolerant" quantum computers, we mean future devices with error rates small enough, for example, to enable phase estimation, but with a limited number of logical qubits. While we argue that our proposals are suitable for early generations of such devices because their hybrid nature allows for putting smaller parts of the problem on the device, we do not expect that the quantum algorithms we discuss will be successfully implementable using NISQ devices. In other words, our target is quantum devices that are still intermediate-scale but for which noise (at the logical level) is not a significant factor. this application, such as parameter regimes in which the algorithms best classical algorithms, the use of classical data in quantum algorithms, and how to incorporate filtering into backtracking when one or both are quantum. While the details of this analysis are specific to this application, we expect that our approach to the subtler aspects of these algorithms will be useful for designing and analysing the use of these building blocks in other applications.
The primary contributions of this paper are as follows: i. We propose a quantum-acceleratedÕ( |X||V ||E|)-time 2 bounded-error algorithm for domain consistency of the alldifferent constraint, where |X| is the number of variables, |V | is the number of unique domain values, and |E| is the sum of the variables' domain sizes. Our approach follows the main classical algorithm, accelerating the basic subroutines performed at each iteration with quantum analogs. The complexity is dominated by that for finding maximum matchings in bipartite graphs. Long-standing state-of-the-art deterministic and randomized classical algorithms take O( |X||E|) and O(|X| ω−1 |V |) time, respectively, where ω corresponds to the asymptotic cost of classical matrix multiplication; the best upper bound known on ω is 2.373. 3 Our approach, leveraging an existing quantum algorithm, improves over these bounds by factors on the order of |E|/|V | and |X| 2ω−3 |V |/|E|, respectively, up to polylogarithmic terms. 4  ii. We identify a broader family of global constraints, including the global cardinality constraint (gcc) and the inverse constraint, whose domain consistency filtering algorithms can be accelerated with quantum algorithms. As with the alldifferent constraint, the worst-case complexity of the classical domain consistency filtering algorithms for these global constraints is dominated by finding maximum matchings in bipartite graphs.
iii. We detail frameworks for integrating quantum-accelerated inference algorithms within classical and quantum backtracking search schemes. We show that the speedups noted for previously proposed quantum backtracking algorithms can also be leveraged for quantum branch-and-infer search. We also propose partially quantum search schemes that yield speedups for smaller sections of the tree, intended for early, resource-constrained quantum devices. Finally, we provide preliminary resource estimates and discuss the benefits and drawbacks of each search approach.
The organization of the paper is as follows. Section 2 provides background for important concepts in CP and Section 3 summarizes relevant previous work. Section 4 discusses how quantum algorithms can access classical data. Section 5 details a quantum-accelerated algorithm for the alldifferent constraint and discusses its generalization to other global constraints with a similar structure. Section 6 details the integration of our quantumaccelerated filtering algorithms within both classical and quantum tree search schemes. Finally, Section 7 provides concluding remarks and future research directions.

Constraint Programming Background
In this section we provide background on the fundamental concepts of constraint programming (CP). The interested reader is referred to additional sources for a more thorough review of the subject [1,19]. There is a variety of open-source [20][21][22][23] and commercial [3] software available for modeling and solving problems using CP.

Constraint satisfaction problems
CP is a paradigm used for solving constraint satisfaction and optimization problems. A constraint satisfaction problem (CSP) consists of variables X = (x 1 , . . . , x |X| ), with associated domains D = (D 1 , . . . , D |X| ), and constraints C = (C 1 , . . . , C |C| ). The domain is the finite set of values the variable can possibly be assigned. Each constraint C ∈ C acts on a subset of the variables X, known as the scope of the constraint. Let d * i represent the value actually assigned to variable x i . A solution to a CSP is a tuple of assigned values (d * 1 , . . . , d * |X| ) ∈ D 1 × · · · × D |X| such that, for each constraint C ∈ C the values assigned to the variables within its scope satisfy the constraint.
Similarly, a constraint optimization problem (COP) is a CSP with an objective function. The solution to a COP is an assignment of values to variables, from the domains of those variables, such that each constraint is satisfied and the objective function is optimized (either maximized or minimized). The objective function is typically represented by a variable and associated domain. In this work we focus on techniques that can be employed to solve CSPs and COPs within a backtracking tree search framework.

Backtracking search algorithms
Backtracking search algorithms are an important and general class of algorithms used to solve CSPs and can be seen as performing a depth-first traversal of a search tree [4]. The search tree provides a systematic way of investigating different decisions in a divideand-conquer fashion. The root node of the search tree corresponds to the original CSP and subsequent nodes, generated via branching, are increasingly constrained versions of the original CSP. Backtracking algorithms are effective given the means to quickly test whether or not a node in the search tree can lead to a solution; for this reason, backtracking search algorithms are not helpful for unstructured search problems. A node that cannot lead to a solution is called a dead end. In general, it is advantageous to detect dead ends as quickly as possible so that search effort can be directed to more promising areas of the search tree.
In the simplest form of backtracking, often called naive backtracking [4], a branch represents an assignment of a value to an unassigned variable. This can be thought of as adding a constraint to the CSP, i.e., x i = d j for some d j ∈ D i , or, alternatively, as reducing the domain associated with the variable to the assigned value, i.e., D i = {d j }. There are also more sophisticated branching rules than this simple assignment-based branching. For each non-leaf node in the tree, a child is generated for each value in the domain of the variable being branched on. The branching process involves variable-and value-selection heuristics; the former identifies the variable to branch on, and the latter identifies the order in which the branches are explored.
The node resulting from the branch corresponds to the CSP of the parent with an updated domain. A predicate P is then used to test whether the node can lead to a solution or not, returning 1 if the node indicates a solution to the CSP, 0 if the node cannot lead to a solution (i.e., definitely infeasible), and indeterminate ( * ) otherwise. The ability to efficiently determine the value of P is often due to exploiting problem structure.
If a node in the search is a dead end (i.e., P returns 0), the most recent branch is undone (backtracked), a new branch is posted, and the process repeats. If the node represents a solution to the CSP (i.e., P returns 1), the problem has been solved. 5 If it is not clear that the node can lead to a solution (i.e., P returns * ), a new branch is posted, the resulting node is tested with P , and the process repeats.
Thus, there are two core ingredients of backtracking search: i) a means of testing whether a node can lead to a solution or not, and ii) heuristics that determine how to branch. In the next section we detail the branch-and-infer tree search used in CP.

Branch-and-infer search
Search effort in CP is reduced via logical inference. Each constraint in the CSP model is supported by an inference algorithm that performs domain filtering (also known as pruning), a process that removes possible variable-value assignments by proving that they cannot satisfy the constraint, and thus cannot participate in a solution to the CSP. As variables often participate in multiple constraints, value removal by the filtering of one constraint often triggers value removals from variables in neighboring constraints in a process termed propagation. For the purposes of this paper, the term logical inference encapsulates both filtering and propagation. The remainder of this section provides a more formal description of the overall search process.
CP's branch-and-infer search follows the framework of backtracking search; however, the predicate P is extended to a propagation function F that prunes values from the domains of the variables and, through this domain reduction, determines whether a node can lead to a solution or not. The search is specified by two main operators: the propagation function F and a branching operator h.
Let us define where 2 S is the power set of set S, i.e., 2 D is the set of tuples of subsets of the domains. The search proceeds by exploring a tree in which each node is associated with the variables and constraints of the original CSP, and a distinct element of 2 D representing the domains "active" at that node.

Propagation Function
The propagation function takes active domains 6 D and returns filtered domains D and a flag β ∈ {0, 1, * } representing the status of the node (definitely infeasible, definitely feasible, or indeterminate, respectively). We also use F d (D) and F p (D) to denote the filtered domains and flag (predicate value) returned by F , respectively. A description of the propagation function F is given by Algorithm 1.
The propagation function first initializes the node status to indeterminate ( * ). Then, it enters a loop of domain filtering based on each of the constraints using the constraint filtering algorithm Filter C for each constraint C ∈ C in the CSP encoding. The constraint filtering algorithm for a given constraint identifies variable-value assignments that it can prove will not satisfy the constraint based on the structure of the constraint and the current domains. The extent to which a filtering algorithm removes values is dictated by the level of constraint consistency desired; see Section 2.4. The output of Filter C is the filtered domains 7 and a flag indicating whether the constraint is unsatisfiable (0) or potentially satisfiable ( * ).
After processing each constraint, the process repeats. As mentioned previously, domain values pruned based on the information in one constraint propagate to another, yielding the ability to perform more domain reduction. There are two cases for the filtering loop (lines 2 − 14) to terminate: i) when, after calling the filtering algorithm on all of the constraints, the domains are unchanged, or ii) when a filtering algorithm Filter C returns "unsatisfiable" (a value of 0). In the latter case, an infeasible constraint is enough to declare the node a dead end and a backtrack is initiated. For the former case, if all the domains are singletons, flag β is updated to a value of 1 indicating a feasible solution has been found. Otherwise, at the current node, the value of β is left as indeterminate ( * ). The pruned domains are returned and the branching operator, described in the next section, is used to generate a child and continue the search.
The development of efficient filtering algorithms is of utmost importance to the success of CP. Better logical inference can detect dead ends earlier in the search (thereby pruning away many nodes that would otherwise need to be explored) but usually at the cost of being more expensive (i.e., time consuming) at each node that is explored. It is therefore useful to classify the extent to which a particular filtering algorithm removes values in order to balance this tradeoff. This topic forms the core discussion in Section 2.4.

Branching Operator
While some CP searches use naive backtracking (as described above), the paradigm commonly employs alternative branching strategies including (but not limited to) 2-way branching, which creates two children by posting constraints x i = d j and x i = d j [4] for some d j ∈ D i , and domain splitting, which creates two children by posting constraints of the form x i ≤ d j and x i > d j . 8 Generally, a branch in CP adds a unary constraint (not limited to an equality constraint as in naive backtracking) to the CSP. To preserve the completeness of the search, the total set of branching constraints posted from a node must be exhaustive.
We can equivalently view branching in CP as an operation which, given a tuple of domains at a node in the search tree, produces a set of child nodes, each associated with a tuple of domains. In this case, instead of a node being defined by the original CSP and the set of branching constraints posted, a node is instead defined by the original variables X, constraints C, and the tuple of active domains (where the active domains incorporate all domain reductions from previous branching decisions as well as reductions due to filtering at each node). This formulation will be particularly useful for our discussion of quantum backtracking extensions in Section 6.3. Formally, we define the branching operator as taking the parent's tuple of domains as input and returning the c-th child (another tuple of domains) produced by branching. We also define the operator as taking the parent's tuple of domains as input and returning the number of children generated by branching from the parent (as determined by the specific branching strategy used). Classically, such a function is not typically needed explicitly, because the children are explored sequentially; however, for the quantum algorithms we propose in this work it will be necessary. To maintain completeness, we require that is the domain of variable x i in the c-th child. That is to say, any assignment consistent with the domains of the parent is consistent with the domains of at least one child.

Consistency
Concepts of consistency have long played a key role in CP and are fundamentally important to the performance of backtracking search algorithms [24]. In this section, following previous work [19], we consider domain consistency (also known as generalized arc consistency) and range consistency but recognize that there are other relevant notions of consistency. [25,24]. Intuitively, the constraint is domain consistent if, after assigning any variable to any value in its domain, there exists a set of values to assign to the other variables (from their domains) such that the constraint is satisfied. From this, we can define a domain consistent CSP as follows:  Establishing domain consistency for some constraints can be, in the worst case, as intractable as solving the global CSP. For this reason there exist weaker forms of consistency, such as range consistency defined as follows [19]. Range consistency is effectively a relaxation of domain consistency, only checking the feasibility of the constraint with respect to the range of domain values instead of the domain itself. As such, filtering for range consistency is never more costly than domain consistency, though the difference in effort depends on the specific constraint involved. Domain consistency (or another specified level of consistency) for a constraint is achieved via the filtering operator for that constraint. As effort spent filtering domains typically results in fewer nodes in the search tree, much of the effort in CP has been in identifying constraints that promote efficient and effective filtering. These special constraints are known as global constraints.

Global constraints
A global constraint is a constraint acting on a more-than-constant number of variables that represents a commonly recurring combinatorial substructure [19]. The motivation for the use of global constraints is twofold. First, the shorthand of the constraint simplifies the high-level modeling task. Second, while an equivalent constraint relationship may be expressed with a combination of simpler constraints, global constraints can strengthen the performance of solvers by maintaining a more global view of the structure of the problem. This often translates to the ability to perform more domain filtering.
To illustrate this concept, we introduce a concrete example involving the alldifferent global constraint [26]. Definition 2.4 (alldifferent constraint). The constraint alldifferent(x 1 , . . . , x k ) requires that all of the variables in its scope take on different values (i.e., in a solution to the constraint x i = x j for all i < j ∈ {1, . . . , k}).
The alldifferent constraint captures a commonly recurring substructure, namely that a subset of problem variables need to take on different values. This structure often occurs in timetabling and scheduling problems, for example. In addition to packaging this structure nicely for modeling purposes, the global constraint also increases the amount of inference that can be performed.
, and the constraints x 1 = x 2 , x 1 = x 3 , and x 2 = x 3 . Enforcing domain consistency on each constraint does not allow us to perform any inference; each constraint is domain consistent and the pruned variable domains remain the same. It follows that the CSP is domain consistent as posed, even though its unsatisfiability is apparent.
In contrast, consider the impact of using the alldifferent global constraint: , and the constraint alldifferent(x 1 , x 2 , x 3 ). Enforcing domain consistency on the alldifferent constraint would quickly return infeasible, as assigning x 1 = 1 does not permit a set of values for x 2 and x 3 that would satisfy the constraint, nor does x 1 = 2, thus emptying the domain store of x 1 . As such, the constraint (and thus the CSP) is inconsistent.
Evidently, it is beneficial to represent a relationship of difference over a set of variables as an alldifferent constraint. In general, global constraints are proposed for substructures that permit enhanced filtering, not just to facilitate a more concise expression of CSPs. The library of available global constraints is extensive 9 and useful for modeling a wide array of problems [12]. The success of these global constraints, however, is largely tied to the efficiency of their underlying filtering algorithm.
The worst-case complexity of filtering algorithms for domain consistency can be polynomial or exponential (i.e., as intractable as the problem being solved), depending on the constraint. To mitigate this, weakened forms of consistency can be accepted, or the filtering algorithm can simply be terminated prior to achieving domain consistency in favor of branching in the search tree. Much of the research effort in the CP community revolves around the design of algorithms that achieve a given level of consistency with improved worst-case complexity over existing methods (e.g., for scheduling problems with resource constraints [27,28]).

CP modeling and solving: Sudoku
To illustrate CP's branch-and-infer search, consider a CP model for the popular puzzle game Sudoku [29]. An example Sudoku puzzle is visualized in Figure 1. A solution to the puzzle assigns a number ranging 1-9 to each cell such that each row, column, and 3 × 3 sub-grid contains all of the digits from 1 through 9. The decision problem related to solving general Sudoku puzzles on n 2 × n 2 grids is NP-complete [30]. In this example, n = 3.
One option for modeling the problem with CP uses integer decision variables of the form x i,j ∈ {1, . . . , 9}, representing the value placed in the cell of row i, column j. Since it is a satisfiability problem, we do not have an objective function. Let I = {1, . . . , 9}. The CP model includes the variables x i,j ∈ I, ∀i, j ∈ I subject to the following constraints: alldifferent(x i,1 , . . . , x i,9 ), ∀i ∈ I (7) alldifferent(x 1,j , . . . , x 9,j ), ∀j ∈ I (8) Constraint (6) embeds the "clues" of the problem, fixing them to take on the values given in the instance. Constraint (7) ensures that each cell in a given row takes on a different value, and Constraint (8) does the same for columns. Constraint (9) enforces that each 3 × 3 sub-grid must have all different values. We note that, however, this is not the only possible model. For example, we could have used binary decision variables x i,j,k , taking on a value of 1 if value k ∈ {1, . . . , 9} is assigned to the cell in row i, columnn j, and 0 otherwise. Of course, the set of constraints using this variable definition would be different as well. Appendix C provides modeling examples for other problems using CP. With the model in hand, we can use CP's branch-and-infer search to solve the Sudoku instance. The search would start by invoking the propagation function to perform root node filtering over the model constraints. For example, enforcing domain consistency on alldifferent(x 1,1 , x 1,2 , . . . , x 1,9 ) would not change the variable domains (as row 1 is blank). However, enforcing domain consistency on alldifferent(x 2,1 , x 2,2 , . . . , x 2,9 ) would prune a number of values (e.g., the value 3 would be pruned from the domain of x 2,1 ) as the hints in this row permit some inference. After the model achieves the desired level of consistency, the search branches by, for example, assigning a value to a variable (i.e., guessing a value assignment to a cell), before repeating the process.
The CP model described above has 9 × 9 = 81 decision variables, each with an initial domain of size |I| = 9, in the worst case. Each of the alldifferent filtering subproblems, however, only involves nine variables. While representing the entire problem on a quantum chip may be prohibitive, the filtering subproblems for the alldifferent constraints can more readily take advantage of early fault-tolerant quantum hardware.
Within CP's branch-and-infer tree search, an improvement in the efficiency of global constraint filtering allows for domain values to be pruned faster. While this does not change the number of nodes explored before finding a solution, which is usually the dominant factor in the runtime, it does reduce the per-node time and therefore has a significant positive impact on the efficiency of solving problems in practice. This phenomenon has been demonstrated in similar paradigms such as integer programming (IP) where improvements in solving linear programming (LP) relaxations has had an orders-of-magnitude impact on the performance of IP solvers over the past few decades [31].

Related work
In this section we identify work immediately relevant to this paper. In Sections 4, 5, and 6 we present additional background in the context of our results. Similarly titled work by Di Pierro et al. introduced a formal language for expressing quantum computations that they called "quantum constraint programming" [32]; that is, they apply the modeling aspect of CP to quantum computations, whereas here we employ a quantum version of the classical CP approach to solving problems.
This work investigates the quantum acceleration of CP, at the levels of both inference and search. Our explorations make use of two important quantum algorithms: quantum search and phase estimation. For the former, Grover's algorithm famously shows that a target element can be found in an unstructured list of N elements with only O( √ N ) quantum queries, which gives a realizable quadratic speedup over classical black-box search (requiring N queries in the worst case) when the oracle for accessing the list can be queried or implemented efficiently [14]. Phase estimation is a quantum algorithm used to estimate the eigenvalues of a unitary [33].
Quantum algorithms for solving various graph problems have seen considerable progress in recent years. These algorithms are often studied in the quantum query model, where the algorithm accesses the graph through a quantum query operation as we detail in Section 4. Previous work provides lower and upper bounds for the bounded-error quantum query complexity of various graph problems, including connectivity, minimum spanning tree, and single-source shortest path [34][35][36][37][38]. Particularly relevant work has investigated the quantum query complexity of matching problems [13,[38][39][40][41][42], showing speedups over classical query-model algorithms, and in some cases [40,13] explicitly improved time complexities, as discussed further in Section 5.1.2.
Similarly, there have been efforts to develop quantum algorithms for constraint satisfaction and search. Quantum search has been extended to problems within mathematical programming, such as semidefinite programming [43,44] and the acceleration of the simplex method [45]. The latter, in a similar fashion to this work, uses quantum search to accelerate the subroutines of the simplex method (e.g., variable pricing). There also exist recent efforts to use algorithms based on quantum search and phase estimation to speed up tree search methods including branch-and-bound [46] and backtracking search [15,16,47]; we discuss the latter in more detail in Section 6. See in particular [15, Sec. 1.3] for a historical overview.

Quantum resources and data access
In this paper, we propose quantum algorithms and quantum subroutines to solve problems whose instance specification and solution are classical information. Many of these quantum algorithms require quantum access to classical data that enables computation on a superposition of classical data. This section synthesizes the relevant ideas from the literature as background for later sections. For example, suppose that an instance is specified by a function f : W → Y , where Y = {0, 1} r and elements of Y can be viewed as integers with addition modulo |Y | = 2 r . By "quantum access", we mean the ability to call the unitary on an arbitrary state w∈W,y∈Y τ w,y |w |y . We will refer to the first register |w as the query register and the second register |y as the database register. Quantum algorithms are assessed with respect to several different metrics: query complexity, time complexity in the oracle model, and time complexity in the gate model. In the quantum oracle model, we are given access to a unitary such as that expressed in Eq. (10) (the "oracle"). Query complexity counts the number of queries (calls to the oracle) the algorithm uses. Time complexity in the oracle model additionally counts the number of primitive (two-qubit) gates, counting each query as a single gate. In the gate model, there is no oracle; everything must be expressed in terms of primitive gates, including the query unitary. (While the oracle model only exists in theory, it can be extremely useful, especially for proving lower bounds, and in some cases directly leads to improved gate-model time complexities.) The time complexity is usually the number of primitive gates (i.e., the size of the circuit) 10 , but can also be the depth of the circuit if parallelization is permitted.
The quantum time complexities stated in this work are in terms of the depth of quantum circuits consisting of two-qubit gates, but with parallelization only within the parts of the circuits that implement the queries. The classical run-times reported, however, are in a higher-level model that neglects non-dominant logarithmic factors associated with bit-level operations; a lower-level model, such as classical circuits consisting of binary gates, would include such additional factors. To abstract away such implementation details, we consider only the dominant factors when comparing classical and quantum.
There are two main ways of implementing the query in the gate model. The first can be used when the data specified by f can be efficiently and uniformly computed classically, in which case one can use an explicit, efficient quantum circuit computing the function f . For example, in applying Grover's algorithm to a problem in NP, the function f is simply the verifier, which by definition can be efficiently computed classically, and thus also by an efficient quantum circuit. For any classical circuit with t gates on n bits, one can construct a reversible version (and thus suitable for quantum circuit implementation) with just O(t 1+o(1) ) gates on O(n log t) bits. See [49,Ch. 6] for details. For unstructured data, the second method, which we employ in this work, is called quantum random access memory (QRAM) [50].
QRAM is a data structure that allows quantum queries of the form expressed by Eq. (10). Broadly speaking there are two classes of QRAM: i) a speculative, yet plausible, special-purpose physical hardware analogous to classical RAM [51], and ii) circuit-based QRAM [52,53]. In the former, the number of qubits required is linear in the database size, O(|W | log |Y |), and query calls occur in logarithmic time O(log |W | + log |Y |). The latter circuit-based QRAM, on the other hand, is more flexible in terms of quantum resources supporting trade-offs between circuit width (number of qubits) and depth (number of quantum gates). Circuit QRAM can be implemented implicitly using O(log |W | + log |Y |) total qubits with circuit depth O(|W | log |Y |), or explicitly using O(|W | log |Y |) total qubits and O(log |W | + log log |Y |) depth. 11 Because gates can act in parallel, the total number of primitive gates can still be linear in the database size, even with logarithmic depth. 12 Henceforth, by "QRAM" we will mean either the special-purpose hardware or explicit circuit-based variants, with linear space, logarithmic access time (circuit depth, in the latter case), and linear initialization time (but logarithmic update time). In both variants, the contents of the database are stored explicitly in memory. In some cases we will require this memory content to be in a superposition, as further detailed in Section 6.3.
We are primarily interested in quantum query access to a directed or undirected graph G = (V, E), with n = |V | vertices and m = |E| edges. We consider two models for accessing the graph: the adjacency matrix model (i.e., the "matrix" model) and the adjacency list model (i.e., the "list" model).

Adjacency matrix model. Let
In the matrix model, the query is defined by be an array with the neighbors of vertex v. Then, in the list model, we can query The overall initialization time for the graph is thus at most O(m log n), though we will often need to initialize only the relevant portions of the model at a given point in an algorithm, and in many cases this too can be parallelized further.
In this paper, we primarily use the list model due to its superior performance in the application we consider.

Quantum-accelerated global constraint filtering
In this section we detail a quantum-accelerated filtering algorithm for domain consistency of the alldifferent constraint. In Section 5.1.1, we review Régin's classical filtering algorithm for the alldifferent constraint. In Section 5.1.2, we explain how Dörn's quantum algorithm for maximum matching can be used to speed up the costliest part of Régin's algorithm. In Section 5.1.3, we explain how several quantum algorithms can be combined to speed up the less costly parts of Régin's algorithm. In Section 5.2, we explain how Cymer's general filtering framework allows for using quantum maximum matching to filter other global constraints whose domain-consistency algorithms are structurally similar to that for alldifferent.

The alldifferent constraint
The proposed quantum subroutines accelerate the classical algorithm of Régin [9] for filtering the alldifferent constraint. We note that more recent work has investigated techniques for optimizing Régin's algorithm in practice; however, these do not improve upon its worst-case time complexity [54,55].
we only require it for a specific type of memory circuit) may be difficult to realize in practice.

Classical filtering algorithm
The classical filtering algorithm for alldifferent begins by constructing a bipartite variable-value graph [9], as illustrated in Figure 2. The example visualized involves variables and domains but there are other possibilities as well. A domain-consistency filtering algorithm for this constraint seeks to remove values in each domain that cannot participate in a solution to the constraint. For this example, x 3 = d 2 is an assignment that will never be feasible and thus d 2 should be pruned from the domain of x 3 .
Recall the notation from Section 2.1. We define the bipartite variable-value graph as G = (X, V, E), with vertices X ∪ V and edges E. Each edge in the graph represents a variable-value pair. In this case, V = i D i is the set of unique domain values. Such a graph has n = |X| + |V | vertices and m = |E| = i |D i | ≤ |X||V | edges, with |E| ≥ |V |.
Algorithm 2: alldifferent filtering [9] Input: Variables X and domains D Output: False if no solution, otherwise filtered domains With G, the filtering of alldifferent proceeds as detailed in Algorithm 2. The algorithm consists of two primary subroutines: FindMaximumMatching, which finds a matching of maximum size (i.e., one with the most edges) in G, and RemoveEdges, which identifies edges in G that can never participate in a maximum matching. If FindMaxi-mumMatching returns a matching M whose number of edges |M | < |X|, the constraint cannot be satisfied and the algorithm terminates. If a maximum matching exists with |M | = |X|, the algorithm prunes domains based on the output of RemoveEdges. The result is a set of pruned variable domains such that the constraint is domain consistent.
The FindMaximumMatching subroutine bears the brunt of the computational complexity [26]. For our purposes, we only invoke this subroutine when |V | ≥ |X| (as the case |X| > |V | is clearly infeasible). Long-standing previously state-of-the-art classical algorithms for finding maximum matchings run in O(m √ n) time; the algorithm of Hopcroft and Karp (HK) is for bipartite graphs [56], while the algorithm of Micali and Vazirani (MV) applies to general graphs [57,58]. Given any initial matching, these algorithms operate in phases, where each phase of the algorithm looks to find a matching of greater size. The runtime of each phase is O(m), and O( |M |) = O( √ n) phases are required [56], where |M | is the size of the maximum matching. In terms of the variable-value graph properties, since |M | is bounded by |X|, these algorithms take O(|E| |X|) time.
Following the algorithms of HK and MV, a randomized O(n ω )-time algorithm [59,60] was proposed for bipartite graphs, where ω corresponds to the classical asymptotic cost of matrix multiplication; the best upper bound known on ω is approximately 2.373 [61]. In terms of the variable-value graph properties, this algorithm takes O(|X| ω−1 |V |) time [60]. Alt et al. then proposed an O(n 3/2 m/log n) algorithm [62]. Each of these algorithms offer modest improvements over HK for dense graphs.
Finally, very recent work using interior-point methods and dynamic graph algorithms have led to anÕ(m + n 3/2 )-time classical algorithm [63] for finding maximum matchings in bipartite graphs, offering the first significant improvement over HK. The algorithm leverages fast linear system solvers to provide near-linear asymptotic complexity for moderately dense graphs. In terms of the variable-value graph properties, with n = O(|V |), their algorithm runs inÕ(|E| + |V | 3/2 ) time.
In order to remove edges which participate in no maximum matching, and thus cannot satisfy the constraint, RemoveEdges finds strongly connected components (SCCs) in a directed transformation of G using Tarjan's O(n+m) algorithm [64]. While this subroutine is not the dominant contribution to the computational cost of alldifferent filtering, its acceleration can still be valuable in practice.
In the remainder of this section we provide details for the FindMaximumMatching and RemoveEdges subroutines that accelerate the filtering of the alldifferent constraint. For the former, we summarize existing quantum algorithms for finding maximum matchings in graphs, noting the complexity of the state of the art. For the latter, we combine a number of quantum graph algorithms, including an adaptation of work that identified strong connectivity in graphs [35], into a quantum subroutine that improves over the classical algorithm in some cases.

Subroutine: Finding a maximum matching
The essence of a quantum-accelerated filtering algorithm for alldifferent is simple: use a quantum algorithm to solve the maximum matching problem. Recent work proposed a series of algorithms for finding maximum matchings in the quantum query model [40,13]. To the authors' knowledge, and excluding an earlier version of this work [18], these results have never been linked to accelerating global constraint filtering in CP.
In the list model (see Section 4), an initially proposed quantum algorithm is capable of finding maximum matchings in O(n √ m + n log 2 n) time [40], while a second, improved algorithm runs in O(n √ m log 2 n) time [13]. 13 The latter algorithm, proposed by Dörn, improves over both existing deterministic and randomized algorithms for the majority of parameter values, and follows the classical MV algorithm for finding maximum matchings in general graphs [57], but accelerates its primary subroutines with quantum search.
In Dörn's time-complexity result, the first log factor is due to repetitions of quantum search required to produce an overall constant success probability bound 14 for the algorithm [13]. Each individual instance of quantum search may provide an incorrect answer with a constant probability [14], and the overall algorithm uses poly(n) quantum searches. To ensure the probability of the overall algorithm producing an incorrect answer is less than O(1/poly(n)), each instance of quantum search is repeated O(log n) times for a success probability of at least 1 − O(1/poly(n)). The implications of these probabilties are discussed more in the context of backtracking search in Section 6.1. The second log factor is due to the cost of the (QRAM) queries, as discussed previously in Section 4, and other low-level implementation costs.
As A quantum algorithm for maximum bipartite matching with query complexity O(n 3/4 √ m) in the list model is shown in [41], with the same complexity recently obtained for general graphs in [42]. In each case, it remains an open problem to obtain similarly improved time complexities; see, e.g., [42,Sec. 4].
In addition to accelerating the filtering of alldifferent, Dörn's algorithm (and any improved quantum algorithms of the future) for finding maximum matchings [13] can play a crucial role in the acceleration of domain-consistency algorithms for a broader family of global constraints, as discussed in Section 5.2.

Subroutine: Removing edges
If a maximum matching is found such that |M | = |X|, Algorithm 2 proceeds to initiate the RemoveEdges subroutine. The subroutine leverages properties formulated by Berge [65] that describe necessary and sufficient conditions for an edge to be involved in some maximum matching. If an edge does not participate in any maximum matching, the edge can be pruned. Instead of applying Berge's conditions directly, the problem has been previously translated into a search for edges in directed simple paths and strongly connected components (SCCs) in a directed transformation of the graph [9,26]. We describe the main steps of the RemoveEdges subroutine, detailed by Algorithm 3, as follows.
The input to the RemoveEdges subroutine is the variable-value graph G and a matching M . In DirectGraph, the edges in G are directed depending upon whether or not they are in the matching M , producing directed graph G M . Edges in the matching are directed from variables to values ("right-facing") and the remaining edges from values to variables ("left-facing"), as shown in Figure 3.
Using this graph, FindSimplePaths is used to identify all edges reachable from unmatched vertices, i.e., those not involved in the matching M . This is achieved by a simultaneous breadth-first search (BFS) from the unmatched vertices, where any edge traversed is marked as "used". The output of FindSimplePaths is a set E used of used edges, and the time complexity is Θ(|E used |); in the worst case, |E used | = Ω(m), but it is typically much smaller. In the example of Figure 3, edges (d 3 , x 3 ) and (x 3 , d 4 ) will be explored by the BFS and thus marked "used". These are then added to set E used . Definition 5.1 (Strongly connected component (SCC)). A strongly connected component of a directed graph G is a maximal set of vertices S such that for every pair of vertices u, v ∈ S, there is a directed path from u to v and from v to u.
Next, FindSCC is used to find SCCs in G M . Any edge in an SCC is in some maximum matching and cannot be pruned [9]. Tarjan's algorithm can be used to find SCCs with time complexity O(n + m) [26,64]. In the example of Figure 3 To summarize, the classical implementation of RemoveEdges has time complexity O(n + m) from the lower-level subroutines FindSimplePaths, FindSCC, and Identi-fyEdges. In terms of the variable-value graph properties, this is O(|E|), since |E| ≥ |V |. Though FindSimplePaths is already asymptotically optimal in the classical case, as the run-time is linear in the size |E used | of its output, the remaining subroutines can be accelerated using quantum search. Specifically, our proposed quantum algorithm improves the time complexity of RemoveEdges toÕ |E used | + |V ||E| + |E||R| (where R is the set of edges that are removed). In the worst case, and up to logarithmic factors, this matches the classical complexity of O(|E|), but in the best case, as discussed below, a |E|/|V | improvement factor can be obtained. In the remainder of this section, we describe how to achieve this. Our approach uses a quantum analog of Tarjan's algorithm, Q-FindSCC, as detailed in Appendix A.
for all vertices v ∈ X ∪ V . For variable vertices v ∈ X, δ M,v = 1, and so constructing U N M,v is trivial. However, whenever it is called as part of a search, we can simply return the appropriate value classically 15 . For unmatched value vertices v ∈ V , U N M,v = U Nv ; i.e., it is the same as that for the undirected graph G. Quantum SCC finding. Existing work has produced quantum algorithms for determining if a graph is strongly connected [35], noting that an adaptation of the approach can find the SCCs. In Appendix A we describe such an adaptation. We outline a quantum analog of Tarjan's algorithm, Q-FindSCC, which can output the SCCs of a directed graph with time complexity O( √ nm log 2 n), when given quantum access to the graph, as just described (or O( |V ||E| log 2 |V |) in terms of the properties of G). The log factors come from requiring a constant success probability and low-level implementation details (including QRAM queries), as discussed in Section 5.1.2 for the maximum matching algorithms. In the next stage, we will need quantum access to the components S, from a unitary U S ; see Appendix A. This can be done by recording them in QRAM during the execution of Q-FindSCC without changing the time complexity.
Finding edges. Finally, we describe how to remove edges with time complexity O( |E||R|) with Q-IdentifyEdges, where R is the set of edges that are removed. This procedure takes as input the set of unitaries {U Nv } v and the unitary U S . The general idea is to perform a quantum search over the |E| edges in G. The time complexity of the procedure is O( |E||R| log 2 |V |), even in the case when |R| is unknown a priori [66].
Recall that we wish to remove edges that are: i) not in M , ii) not in E used , and iii) not connecting vertices in an SCC. We proceed by searching over the edges incident to variable vertices v ∈ X. For each v and i ∈ [δ v ], we want to remove the incident edge {v, Given quantum access to S, N v and E used , this can be computed in log(|V |) time. Note, as with the matching, QRAM for quantum access to E used can be initialized during the runtime of FindSimplePaths as edges are discovered, without adding to the complexity of RemoveEdges.
If there are r v ≤ δ v edges to be removed from the search over vertex v, the time complexity isÕ(

Generalizing quantum filtering
As detailed in the previous section, filtering for the alldifferent constraint consists of a feasibility check followed by a pruning step to enforce domain consistency. The former is achieved by finding a maximum matching in a bipartite graph representation of the constraint, while the latter uses a combination of breadth-first and depth-first searches to look for SCCs to enable the pruning of edges from the graph. Other global constraints with a similar structure can also be accelerated using quantum subroutines. Indeed, the need to find matchings in graphs is a bottleneck for many global constraint filtering algorithms. The global cardinality constraint (gcc), for example, is another such constraint [11]. The gcc constraint is commonly used in scheduling, rostering, and timetabling problems [11]. Our previous work shows that the domain-consistency algorithm for gcc can be accelerated with quantum algorithms in a fashion similar to that for alldifferent [18]. Beyond alldifferent and gcc, however, there are other global constraints whose domain-consistency algorithms consist of finding maximum matchings and SCCs in bipartite graphs, such as the alldifferent_except_0 constraint. Definition 5.3 (alldifferent_except_0 constraint). Given variables X = (x 1 , . . . , x |X| ), alldifferent_except_0(x 1 , . . . , x |X| ) requires that all of the variables in its scope not assigned a value of 0 to take on different values (i.e., in a solution to the constraint, In this case, the variable-value graph is constructed such that there are additional vertices on the value-side of the graph allowing multiple variables assigned to a value of 0 to participate in a maximum matching and thus satisfy the constraint. This variation of alldifferent is often useful in models that require difference among a subset of the variables that is unknown a priori. The work of Cymer [10,68] provides a powerful mechanism for extending our results beyond alldifferent and gcc. Their initial work details a generic filtering algorithm incorporating the Dulmage-Mendelsohn (DM) canonical decomposition [69] for global constraints whose domain-consistency algorithms consist of finding maximum matchings in bipartite graphs, followed by a subsequent linear-time step [10]. Given a bipartite graph associated with the constraint, their general filtering mechanism has two main steps: finding a maximum matching and computing the DG canonical decomposition. As with Régin's algorithm for alldifferent, their generic algorithm is dominated by the complexity of finding maximum matchings in bipartite graphs, as computing the DM decomposition can be done in linear time [10]. It follows, then, that any global constraint that can be made domain consistent by their algorithm can also be accelerated, with respect to worst-case time complexity, by the quantum algorithm for finding maximum matchings, and the associated data structures, outlined in the previous section of this paper. In addition to alldifferent, gcc, and alldifferent_except_0, Cymer shows that their algorithm can be used for other global constraints including inverse, same, and usedby, as detailed in Appendix B. See [10] for the full list of thirteen global constraints considered.
In subsequent work [68], Cymer proposes a generic filtering mechanism for global constraints whose domain-consistency algorithms are expressed over general graphs, incorporating the Gallai-Edmonds decomposition [70]. Similar to the bipartite case, maximum and optimal-degree matchings play an important role in this algorithm. Given that the quantum algorithm for finding maximum matchings in our approach is applicable to general graphs [13], quantum filtering could accelerate aspects of the filtering for these families of constraints as well. In contrast to Cymer's DM-based algorithm, however, it is unclear as to whether quantum computing would improve on worst-case time complexity since finding matchings in their Gallai-Edmonds decomposition algorithm is not necessarily the dominating complexity term.

Quantum-accelerated branch-and-infer search
Recall that the constraint programming (CP) approach to solving a CSP is to augment backtracking search with logical inference. In the previous section, we showed how to accelerate inference with quantum processing. In this section, we focus on using quantum processing to accelerate branch-and-infer search. In Section 6.1, we detail how the quantum subroutines for inference described above can be integrated into an otherwise entirely classical backtracking search. In Section 6.2, we review existing quantum algorithms for backtracking search and introduce hybrid variants of quantum backtracking search that interpolate between the fully classical and fully quantum cases. Finally, in Section 6.3, we show how these quantum algorithms for backtracking can be applied to CP, including how to integrate quantum filtering into quantum backtracking search algorithms to obtain a fully quantum branch-and-infer search algorithm.

Classical backtracking with quantum-accelerated inference
Integrating quantum filtering algorithms within a classical backtracking search is the most straightforward of the frameworks we propose, as well as potentially the earliest to be implementable on quantum devices with some degree of fault-tolerance since it requires the fewest quantum resources. The high level idea is to use a classical processor to manage the tree search and a quantum co-processor to solve global constraint filtering subproblems. In the context of Algorithm 1, this would involve replacing some (or all) of the Filter C algorithms with quantum analogs (i.e., using quantum alldifferent filtering instead of the classical algorithm).
Recall that our quantum subroutines require quantum access to their inputs. For simplicity, we assume that the contents of the QRAM can be prepared classically using the same low-level encoding before being loaded into the QRAM; in this way, any overhead from classical memory calls is exactly the same regardless of how the filtering is done. Quantum FindMaximumMatching, for example, requires quantum access to the bipartite variable-value graph G, which we supposed is performed using QRAM, as discussed in Section 5.1.2. For the purposes of our classical backtracking approach with quantumaccelerated inference, it suffices to use a single QRAM capable of holding the bipartite variable-value graph for the most memory-intensive alldifferent constraint at the root of the tree (where it is largest). This QRAM can then be initialized from scratch for each filtering subproblem without impacting the asymptotic complexity of the quantum subroutine. 16 We can also imagine a more sophisticated implementation where the QRAM is updated to reflect removed edges as the tree is traversed, instead of re-initializing it from scratch each time. We note both approaches result in the same asymptotic complexity.
Similarly, the subroutine Q-FindSCC requires quantum access to the directed graph G M . As discussed in Section 5.1.3 the QRAM for G (used by quantum FindMaximum-Matching) can be converted to one for G M with negligible overhead. The subroutine Q-IdentifyEdges requires quantum access to the strongly connected components S returned by Q-FindSCC and the used edges E used returned by FindSimplePaths; the QRAM for these can be initialized during the course of these prior subroutines. Again, only one QRAM is needed for each, and can be re-initialized for each iteration of filtering.
Given the probabilistic nature of the quantum algorithms we employ, their integration within the CP search must be done with care. Recall from Section 5.1.2 that each of the quantum subroutines proposed for alldifferent filtering involves a polynomial number of quantum searches, where each instance of quantum search yields an incorrect answer with a probability bounded above by a constant; to ensure the overall success probability of the subroutines is bounded by a constant, each of the quantum searches is repeated O(log n) times. While this repetition yields a bounded-error algorithm for a single instance of filtering, in a tree search the filtering algorithms are called (potentially exponentially) many times, often more than once per node in the search. The repetitions necessary to ensure a constant overall success probability for an exponential number of quantum searches could overwhelm any quantum speedups. With this in mind, we propose two approaches for integrating quantum filtering in a classical tree search: i) an exact method, where quantum subroutines with a certain property can be integrated without sacrificing the completeness of the search and ii) a bounded-error method.

Exact method
Quantum subroutines augmented to return an explicit failure indicator (e.g., a boolean flag that tells us whether or not the algorithm has succeeded) can be used in a tree search algorithm that requires perfect completeness (i.e., an exact tree search) with negligible overhead. To illustrate this, suppose we have such a quantum algorithm and a classical algorithm for the same problem (e.g., finding a maximum matching). Let c(n) = poly(n) be the runtime of the classical algorithm, which always succeeds, and q(n, p) be the runtime of the quantum algorithm with failure rate p, which we are free to choose. 17 Suppose now that we run the quantum algorithm for p = o(1/c(n)); if it fails (as noted by the failure indicator), we then run the classical algorithm. This is a Las Vegas algorithm; it always succeeds, but its runtime varies probabilistically. The expected runtime is then q(n, o(1/c(n))) + p · c(n) = q(n, 1/poly(n)) + o (1). That is, the average runtime is only negligibly more than running the quantum algorithm for failure rate at most inverse polynomial.
Of course, the above result is only beneficial to us when the quantum algorithm offers a speedup over its classical counterpart. While we can often design quantum subroutines that offer such speedups, augmenting them to include the required failure indicator without impacting asymptotic runtimes is not always possible. To illustrate this point, consider the two subroutines in our quantum filtering algorithm for alldifferent, as detailed in Section 5, in turn.
The quantum FindMaximumMatching subroutine can be extended to include the required failure indicator with the same complexity scaling reported in Section 5.1.2. The failure rate of this subroutine, as previously stated, is already polynomially small, and a further reduction in failure rate to a smaller polynomial entails only changing the constant factor absorbed by the asymptotic notation. To construct the required failure indicator, we must check whether or not the subroutine returned a maximum matching. In all cases, the subroutine as formulated returns some set of edges. In linear time, as we explain, we can check whether these edges are a maximum matching and set the failure indicator to "true" if not. If the returned edges are a maximum matching, then in linear time we can construct a minimum vertex cover [71] of the same size. If we follow this constructive procedure and the resulting vertex cover is larger than the matching, then we know it is not maximum and we set the failure indicator to "true". Otherwise, we confidently return the maximum matching and a failure indicator of "false".
On the other hand, we cannot efficiently construct a failure indicator for the quantum RemoveEdges subroutine. It could return an incorrect edge to remove if Q-FindSCC returns an invalid set of SCCs. Unlike verifying a maximum matching, it is not possible to classically check whether the SCCs returned are correct without overwhelming the quantum speedup. As a result, this subroutine cannot be advantageously incorporated into an exact tree search approach.
Our quantum alldifferent filtering algorithm for an exact tree search implementation, then, would consist of both quantum and classical FindMaximumMatching subroutines and a classical RemoveEdges subroutine. A similar analysis can be conducted for the subroutines of other global constraint filtering algorithms.

Bounded-error and heuristic methods
Alternatively, suppose we permit the overall tree search to fail (i.e., not find a solution to the CSP if one exists) with some constant probability, and we wish to use all quantum subroutines available to us to maximize the speedup achieved at each node. Since, in this case, the output of some quantum subroutines (i.e., quantum RemoveEdges) is not efficiently verifiable classically, we would would need an approach that is robust to errors. To achieve a constant success probability for the overall tree search, we can restrict the search to calling our quantum subroutines a polynomial number of times; in this case it suffices to have each subroutine fail with probability at most inverse polynomial, which introduces at most O(log n) overhead, as discussed above. We can also, for example, ensure that the quantum subroutines are invoked at earlier nodes in the tree which typically represent larger subproblems and will stand to benefit from quantum speedups the most. After some predetermined polynomially bounded threshold of calls to the quantum subroutines, the tree search transitions to using classical filtering only.
A final "heuristic mode" approach is to always use the quantum subroutines regardless of the size of the tree without stringent guarantees on the overall success probability. In this case, the effect of subroutine failures on the overall tree search is strongly dependent on the particular tree search and filtering algorithms used. For filtering algorithms in which the only failure mode is not pruning a domain value that could have been pruned, the tree search will remain complete. However, the resulting tree may end up larger than if the filtering succeeded without failure (i.e., pruned all removable values).

Quantum-accelerated backtracking
In this section we detail quantum-accelerated approaches to backtracking search. We begin by reviewing existing quantum backtracking search algorithms from the literature. Then, we present variations of these algorithms that perform quantum tree search over subtrees of the full tree, using fewer quantum resources at the expense of a smaller speedup. In Section 6.3 we will incorporate inference into the backtracking algorithms to obtain quantum branch-and-infer search algorithms.
Following existing work [15], the results in this section assume the existence of a classical backtracking algorithm A that finds a solution to the CSP or determines that none exists. This algorithm implicitly defines a tree T that contains T vertices and has depth L. As before, this classical backtracking algorithm is assumed to traverse the tree with a depthfirst search, where the ordering of each node's children is determined by A. Further, we let T UB and L UB represent (efficiently calculable from the CSP itself) upper bounds on the number of nodes in and depth of T , respectively. Finally, we let T A be the number of nodes actually explored by A in finding a single solution to the CSP (or proving that none exists). We report the complexity of quantum backtracking algorithms as a function of these parameters; the full complexity includes the cost of implementing the per-node procedures, as detailed in Section 6.3.

Background
In quantum tree search, we are given quantum access to operators that locally define a tree (i.e., that specify each node's children) and a predicate that evaluates a node. The goal can be i) to determine if the tree contains a marked node, i.e., one for which the predicate value is 1 (indicating a solution to the CSP); ii) to find a marked node, if one exists; or iii) to find all marked nodes. While the methods described here apply in a general setting, we will be concerned with their use to search the tree defined by a given backtracking procedure, as described in Section 2.2. We discuss implementation of the node operators in the context of CP in Section 6.3.2.
Montanaro [15] (building off of Belovs [72]) gave a quantum algorithm to determine if a search tree T contains a marked node inÕ( √ T UB L UB ) queries to the operators that locally define the tree, and to find such a node inÕ( √ T L 3 ) queries (orÕ( √ T L) when there's a promise of only one marked node). Jarret and Wan (JW) [16] extended and improved the algorithm to find a marked node inÕ( √ T L) queries in general. (Their query complexity is actually tighter when expressed in terms of the effective resistance of the tree, which is always upper bounded by the depth.) Ambainis and Kokainis (AK) [47] gave a tree size estimation algorithm that, together with Montanaro's algorithm, yields an algorithm to find a marked node inÕ( √ T A L 3 ) queries. That is, with a small overhead, AK ensures that the quantum backtracking algorithm explores only as much of the tree as the classical algorithm A would. T A , the actual number of nodes explored by A, can be much smaller than T UB for two reasons. First, the upper bound T UB that we can efficiently calculate before starting the tree search may be much larger than the size T of the tree, especially when A uses inference. (Whether we have such a bound or not is irrelevant when actually finding a marked vertex, because we can try exponentially increasing values T UB , thereby introducing at most a factor logarithmic in T UB .) Second, the classical algorithm A can stop when it finds the first solution, so the number of nodes T A it explores can be much smaller than the total number of nodes T in the tree when it is not required to find all solutions. While AK's and JW's versions have better scaling than Montanaro's with respect to the number of nodes and depth, they involve more complicated procedures that can significantly increase the constant factors and degree of the logarithmic ones; a practical implementation would need to take this into account. Nevertheless, the quantum parts of all three variants are mostly the same, and so improvements in implementation for, say, Montanaro's algorithm would likely also apply to the the extensions.
The foundation of these quantum backtracking algorithms is a quantum walk operator defined as follows (see e.g. [15,Sec. 2]). For each vertex s (with children s ) of the tree with root r and any α > 0, let where the summation s ←s is over all children of s. Define a walk operator that reflects about the subspace perpendicular to |ψ s (α) when s is not marked: where I is the identity operator. Let A be the set of vertices that are an even distance from the root (including the root itself), and B be the set of vertices that are an odd distance from the root. Let The quantum algorithms for tree search are based the spectral properties of the overall walk unitary W B W A (α). 18 Specifically, if there is at least one marked node, then the root |r has non-trivial overlap with a 1-eigenvector of W B W A (α); if there is no marked node, then the root is orthogonal to the 1-eigenspace. These can be distinguished by phase estimation on the root. Montanaro's algorithm for detecting whether a marked node exists is to repeatedly perform quantum phase estimation for the operator W B W A (L), starting with the initial state |r ; the algorithm returns affirmatively if enough of its eigenvalues are 1. 19 Quantum phase estimation of a unitary operator U to precision δ uses O(1/δ) applications of controlled-U and O(log(1/δ)) other gates [33]. To find a marked node, we can do classical descent on the tree, at each stage checking whether the subtree rooted at each child contains a marked vertex and descending on any one that does. This multiplies the runtime by a factor of L. Because we can check whether the output is actually a marked node, we can do the above for T UB equal to increasing powers of 2, so that the overall runtime depends on the actual tree size T , up to polylogarithmic factors. If there is a single marked node, then conditioned on the estimated phase being 1, the node register is a superposition of states each of which has half its amplitude on the root and the other half uniformly distributed over the path from the root to the marked vertex. By measuring the node register and repeating the procedure on the subtree rooted at the output, we can get to the marked node in O(log L) repetitions.
JW's version proceeds similarly, except that phase estimation is run using W B W A (η), whereη is an estimate of the effective resistance of the tree 20 . The procedure to estimatẽ η consists of rounds of phase estimation (of the walk operator using the current estimate) followed by quantum amplitude estimation (of the 1-eigenspace), which has the same resource requirements as the phase estimation. By using the effective resistance, the overlap of the 1-eigenvector with the root is made to be about 1/2. The state of the node register conditioned on the estimated phase being 1 is such that we can get to a marked node in the same way as Montanaro's procedure but in O(log L) repetitions for any number of marked nodes.
The algorithm [47] of Ambainis and Kokainis for quantum tree search works by searching the first τ nodes of the tree T that would be explored by a classical tree-search algorithm, for exponentially increasing values of τ . For each value of τ , it does this by first generating a path u(τ ) = (u 0 , . . . , u l ) of length l ≤ L descending from the root r = u 0 that completely specifies the subtree T τ containing the first τ nodes; the set of nodes of the subtree T τ specified by the path u(τ ) consists of the nodes of the path itself together with the nodes of each subtree rooted at any "earlier" sibling of any node in the path. By earlier sibling of a node s, we mean a node s with the same parent as s (i.e., a sibling) that is explored earlier by the classical algorithm. An example is shown in Fig. 4a, where the path consists of the right-most green colored nodes at each level. Then Montanaro's algorithm is applied to the subtree T τ using a successor function modified according to u(τ ). Recall that this entails performing phase estimation to precision O(1/ √ τ L), including for the final value of τ ≈ T A .
If we use JW's algorithm in place of Montanaro's in AK's algorithm in order to find a marked node within T A , then the overall resource cost isÕ( √ T A L 3 ) calls to the walk operator for the original tree T to get the path u andÕ( √ T A L) calls to the walk operator for T A as specified by the path u. In Section 6.3.2, we explain how to implement both walk operators for backtracking search schemes that integrate inference as in CP, but first we discuss partial quantum search variants of the above algorithms that require fewer quantum resources.

Partially quantum tree search
Resources will be constrained on quantum devices for years, likely decades, to come. Before being able to implement the fully quantum tree search algorithms discussed in Section 6.2.1, it will be possible to implement methods that perform quantum search over smaller subtrees rather than over the full tree. Here, we describe two such methods.
Rennela et al. also considered hybrid classical-quantum tree searches [74]. They were concerned with quantum algorithms bottlenecked by space constraints, and show how in certain cases divide-and-conquer algorithms can achieve genuine quantum speedups even with a number of qubits equal to some small fraction of the problem input size. The essential idea is to apply the quantum algorithm to subtrees at the "bottom" of the search tree when the subproblems are sufficiently small. In constrast, our hybrid algorithm allows us to apply the quantum tree search algorithm over subtrees that together cover the entire search tree, starting at the root. Unlike as with the divide-and-conquer approach, the effectiveness of our hybrid approach does not depend on the internal structure of the tree, but rather just its size and depth. At a high level, our hybrid approach is motivated by the possibility that the limiting quantum resource is not space but accuracy of the computation. 21 Specifically, it may be possible that we can "implement" the operator W B W A (α), in the sense of having enough qubits to run the circuit, but that the noise level is such that phase estimation cannot be reliably performed to the precisionÕ(1/ √ T ) required to search the whole tree.
Chunky quantum tree search. We present now an extension of AK's algorithm with which the entire tree can be explored in arbitrarily sized chunks as illustrated in Figure 4. For simplicity, we assume all chunks are of uniform size χ. Let T τ,τ be the subtree (i.e., the "chunk") of T consisting of the first τ − τ nodes explored after the first τ nodes, together with the union of the nodes on the unique path between the τ nodes and the root. Intuitively, T τ,τ is the difference between T τ and T τ , with minimal edges added to make it connected. As discussed in Section 6.2.1, the subtree T τ can be uniquely specified by a path u(τ ). The subtree T τ,τ +χ can be uniquely specified by two paths: u(τ ) and u(τ + χ). The path u(τ ) can be thought of as defining T τ by specifying a youngest child for each node on the path. The paths u(τ ) and u(τ + χ) can be thought of as defining T τ,τ +χ by additionally specifying an oldest and a youngest child for each node on their shared initial subpath.
Using AK's algorithm and for any τ , we can get the path u(τ ) of length at most L defining the subtree T τ = T 0,τ of the first τ nodes explored in T . Given paths u(τ ) and u(τ + χ), we can apply Montanaro's or JW's algorithm to explore the subtree T τ,τ +χ by modifying the walk operator.
Each chunk T τ,τ +χ has at most χ+L nodes, and so can be explored usingÕ( (χ + L)L) calls to the walk operator for T τ,τ +χ . Overall, the walk operator for T τ,τ +χ usesÕ(L) more gates relative to the walk operator for T , independent of τ and χ. Additionally, the path u(τ + χ) can be found using AK's algorithm usingÕ( (χ + L)L 3 ) calls to the walk operator for T τ,T A .
Assuming χ = Ω(L), the specification and exploration of each chunk useÕ( χL 3 ) calls to a modified walk operator. There are T A /χ chunks, so overall there areÕ( T A L 3 /χ) calls. For a single chunk (χ = T A ), we get the original runtime scaling of √ T A , and for constant-sized chunks χ = O(1), we get the classical T A scaling. Importantly, the per-node cost is that of the largest cost node within each chunk, which interpolates between the fully classical and fully quantum cases.
Bounded-depth quantum tree search. An alternative way of breaking up the tree into subtrees is to limit the quantum search by depth. That is, starting at some node of the full search tree, we perform quantum search on the subtree rooted at that node and containing nodes at most some distance L * away. This can be done by marking each node such that i) the predicate value is 1 or ii) the predicate value is indeterminate and the node is distance L * from the root (of the subtree). The outcome of each depth-L *bounded quantum search is either a satisfying solution if one exists within distance L * of the starting point, or the set of not-definitely-infeasible nodes at distance exactly L * away from the starting point. The simplest case, L * = 1, is essentially Grover search over each node's children, which we can perform directly. Specifically, if we classically know the node whose children we are searching over, we can classically compute its number of children and perform a modified Grover search to find all children c that the predicate marks as   Table 1: Parts of our representation of a search node. For the domains, the range and number of qubits is that of the original (largest) domains D. Each set S (e.g., D i or R k ) is stored as an integer giving its size |S| and a list (s 1 , . . . , s |S| , * , . . .) of its entries followed by null values; the length of the list is the maximum size of the set. The total number of qubits isÕ(Lm). not definitely infeasible. Performing Grover search directly in this way allows us to store much of the state classically, while restricting our attention to children not known to be infeasible after propagation in a cheaper way. More generally, such direct Grover searches are complicated by the generally unknown structure of the tree; see [75,76].

Quantum-accelerated backtracking with inference
In this section we extend the ideas of the previous section to incorporate the propagation of CP. Classically, this involves as many rounds of filtering as needed to reach a fixed point (as exemplified in Algorithm 1). Since, in the quantum case, the propagation must be done in superposition, the number (and sequence) of rounds of filtering must be fixed a priori. Similarly, the circuit for each filtering algorithm must be valid when applied to any corresponding value-variable graph throughout the tree; this means that the effective per-node filtering cost is that of the worst case.

Representation of tree nodes for CP
Before getting to how to implement the walk operators, we will start by specifying one way of representing the nodes of the search tree in memory. Concretely, we will represent each node of the search tree by the tuple  Table 1 contains a more detailed description of how exactly this are represented at a low level, and how many qubits they require.
The main requirements of the representation of the search node are that each node is uniquely identified and that each node contains all the information necessary get to that node's unique parent. In general, different nodes in the search tree can have the same domains, so the domains alone are not enough. The history b of branching decisions alone suffices in this sense, but we include the rest for ease of exposition and efficiency of computation. In a slight abuse of notation, for two tuples A = (A 1 , . . .), B = (B 1 , . . .) of sets (e.g. D or R), we will write the difference as The root of the full search tree has l = 0, b = ( * , . . .), D set to its initial value, and R = (∅, . . .), but we will also consider searches of subtrees rooted at other nodes. The children of a non-leaf node (l, (. . . , b l ,  * , . . .) D, (. . . , R l , ∅, . . .)) (17) are the nodes Recall that F d (D) ∈ 2 D is the filtered domains and F p (D) ∈ { * , 0, 1} indicates the known feasibility of the filtered domains. This definition of the search tree is not unique, and several choices were made for ease of exposition. Different representations of the nodes make different space-time tradeoffs. That is, more consise representations can save space at the cost of requiring more computation. A quantitative balance of these tradeoffs would depend on detailed knowledge of the hardware (including any underlying fault-tolerance scheme), of the choices of the heuristic and propagation functions, and of the class of instances. Our definition also allows in the search tree nodes known to be infeasible (as does [73]), whereas in Montanaro's original formulation such a node was excluded when enumerating the children of its ostensible parent. While including infeasible leaves in this way increases the number of nodes in the tree, it does not increase the number of times the filter operator (analogous to Montanaro's predicate) is called; in Montanaro's formulation, the predicate is called for each child in sequence, so that overall it is called a number of times proportional to the number of nodes in the inclusive tree.

Implementation of walk operators for CP
Both operators W A (α) and W B can be built from the following primitives: the propagation operator and the branching operator U br,A (α) |l, * , D, R even l>0 and U br,B is defined as U br,A for odd l. By keeping track of the removed edges, we ensure that U F is invertible. In other words, while additional ancillas may be needed to compute U F , they can be returned to their initial state by the end. If we have a classical circuit to compute F , then the quantum circuit for U F is approximately the same size. In Section 6.3.3, we discuss how the quantum-accelerated version of the filter operator U F can be integrated into the quantum tree search. Note that because we must perform the propagation in superposition, the cost per node is equal to the maximum over all nodes, rather than the average as in the classical case, where propagation at some nodes may be quicker than at others. This is the same whether we use a reversible version of a classical circuit, or a modification of quantum-accelerated filtering as described in Section 6.3.3. The implementation of the branching operator U br will depend on the exact choice of the NB: For a generalized controlled gate |x |0 → |x U x |f (x) , we use a triangle to indicate the control register(s) |x ; filled and unfilled circles have their usual meaning of binary controls and negated controls, respectively.
(a) Circuit implementing WA(α). The bottom line of l corresponds to the least significant bit indicating the parity of l. The first gate swaps the R 2 l/2 +1 and the corresponding ancilla; the second does the same for b 2 l/2 +1 . The central gates apply −Z on the least-significant bit for predicate value β = * and a global sign flip for predicate value β = 0.
(c) Circuit implementing Uh nu . For each l ∈ [l] there is a gate that negates the first ancilla and addsb l+1 to the second ancilla, conditioned on (b l , b l+1 ) = (b l , * ), with the convention that bl +1 = 0. The first gate does the same but with the simpler condition that b1 = * . The latter half of the gates reset the ancilla. Figure 5: Circuits for walk operator and subroutines thereof. Note that 2 l/2 is obtained from l by setting the least significant bit to zero.
classical heuristic h, but below we give as an example a concrete implementation assuming classical circuits for h nu and h ch (·, c) = h c (·). We will sketch how to implement W A (α) out of these primitives; the implementation of W B is similar but simpler. To start, note that W A (α) acts on several subspaces independently, each spanned by a node at some even level l and its children: |l, (. . . , b l ,  * , . . .) , D, (. . . , R l , ∅, . . .) (22) and Suppose we add an ancilla state |β for β ∈ {0, 1, * } and initialize it to | * . If we apply U F for even l, the children nodes are unchanged, and the parent node is transformed into Let l 0 be the least significant bit of l. Now applying U br,A (−Z l 0 )U −1 br,A reflects around a state similar to the right-hand sides of Eqs. (20) and (21), except with the parent filtered. By controlling this on the predicate value β = * , we ensure that the overall effect is the identity for marked vertices. For predicate value β = 0, we apply a global sign flip (effecting Eq. (15) for an unmarked but childless node). The circuit for this is shown in Fig. 5a. A practical implementation of W A (α) need not use U br,A (α) as a black box as above.
Now we explain how to implement the branching operator U br,A . Assuming classical circuits for h nu and h ch (·, c) = h c (·), we can implement the operators We will also make use of a controlled uniform-superposition-producing operator Details on how to construct U unif can be found in [73,Sec. 4.7]. To implement the branching operator, we use an ancilla register, initialized to 0, to compute the branches produced by the heuristic. (The case in which a node has no children because the predicate is true is accounted for in the circuit for W a (α) outside the subroutine U br,A , by cancelling out the branching operator with its inverse.) Then conditioned on the number of branches h nu and whether l = 0, we produce a state proportional to |0 + √ αh nu |1 on the least significant bit of l. Conditioned on the least significant bit of l being 1, we then prepare a uniform superposition over [h nu ] using U unif . We then uncompute h nu , and apply h c according to the ancilla. The circuit for this is shown in Fig. 5b. A practical implementation of U br,A (α) need not use U hnu , U ch as black boxes as above.
AK's algorithm has two stages. First, it generates the path u specifying the subtree T τ containing the first (by DFS) τ nodes of the full tree T . It does this by repeated evaluations of a tree-size estimating procedure whose quantum part is just phase estimation of the walk operator W B W A (L) on successive nodes of the tree, with the slight modification that no marking is done (e.g., by conditioning the controlled phase in the middle of Fig. 5b on predicate values {0, 1} rather than just 0). In our circuit for W A in Fig. 5a, this means just removing the control on the central −Z gate. Second, the algorithm performs Montanaro's (or JW's) algorithm for detecting or finding a marked node in the subtree T τ using the path u. In order to do that, we just need to modify the walk operator W B W A (α) so that it corresponds to T τ rather than the full tree T . We can do that as follows. Letl + 1 be the length of u. For each node u l on the path, its values of b l and R l for all l ≤ l are the same as those for nodes later in the path. (The values of b l and R l in u l are * and ∅ for l > l.) Letb be the branching history for the last node ul in the path. Define the functionh which gives the number of branching decisions from a node in T τ . (We leave the dependence, viab andl, on the path u implicit, because we know that classically.) Note thath nu depends on more parts of the state than h nu . For a node not on the path, this is the same as for T . For a node on the path, this ensures that the next node on the path (if there is one) is the last branch in T τ . To implement Montanaro's algorithm on the subtree T τ defined by u, it suffices to replace U hnu by We give an example circuit for this is in Fig. 5c. Because l is encoded in b (by the location of the last non-zero entry thereof), the circuit depends only on (b 1 , . . . , bl +1 ) and D, in addition to some ancilla qubits.

Incorporating quantum algorithms for filtering
We now explain how the quantum-accelerated version of the filtering operator U F can be integrated into a fully or partially quantum tree search. There are two main considerations: the QRAM needed as input and the possible error in the output. We willl first address the QRAM. Our quantum algorithms require quantum access to the variable-value graph G = (X, V, E), as specified some active domains D. Here, we will work in the adjacency list model, but the situation is similar in the adjacency matrix model. Let N D,v be the neighbors of vertex v in the variable-value graph implied by D. At the lowest level, each set N D,v is represented as a list of its elements together with a register indicating its size. For each v, there is anÕ( i.e., an explicit circuit QRAM, as discussed in Section 4. At a given tree search node, we can initialize the QRAM by implementing inÕ(m) time, where the second "QRAM" register hasÕ(m) qubits. The contents of the QRAM are then in superposition, but entangled with the node register. The filtering algorithm can then query the graph (perhaps for a superposition over query registers |i ) by calling the operator in Eq. (32). There are two potential sources for further efficiency. First, each filtering algorithm only needs access to the subgraph induced by the variables in the corresponding constraint's scope and their neighbors. Second, as discussed in Section 6.3.1, the part of the node register containing D is exactly the registers |N D,v for variable vertices, so there is no need to copy them into a separate ancilla register. Now we will address the error in the output. In Section 5, we described the quantum filtering algorithm as a mostly classical procedure that utilizes quantum search, each time completely measuring the quantum state before proceeding. However, as described in Appendix D, we can construct a single unitary that performs the filtering without changing the error or runtime. Unlike as when doing classical backtracking, we cannot amortize away the classical failsafe checks, and so there will always be some probability of failure. Our choices then are to only use the accelerated filtering in a polynomial number of places, or to use it everywhere without any assurance that the overall tree search will be successful. We also need to formulate the filtering algorithm such that the final state (when the algorithm is successful) is independent of the intermediate measurement results. Whenever the filtering algorithm succeeds, it outputs the same set of edges to remove. If the filtering is part of a classical procedure, the order of the elements in this set would not matter. However, within the classical algorithm, we need the state |D , R, F p (D) to be uniquely specified at the lowest level; this can be easily achieved, for example, by storing the domains D and the removed edges R as sorted lists.

Conclusions
In this paper, we investigate the use of quantum computing to accelerate constraint programming (CP). We adapt recent work in quantum algorithms for graph problems and propose quantum subroutines to accelerate the filtering of the alldifferent constraint and other global constraints whose domain-consistency filtering problems involve finding a maximum matching in a bipartite graph. We detail frameworks for integrating quantumaccelerated inference algorithms within classical and quantum backtracking search schemes. Our work highlights the potential for mutual benefit between the paradigms of quantum computing and CP.
One avenue for future work is to investigate whether even faster quantum algorithms for constraint filtering can be obtained by leveraging recent advances in quantum query complexity for various graph problems [38,41,42]. Such query complexity improvements suggest that further quantum speedups may be possible in terms of time complexity. Similarly, advances in quantum algorithms more generally may allow speedups to be shown for a wider variety of constraint types. A particularly promising direction for future research is the use of quantum computers to accelerate domain-consistency algorithms posed in terms of a Gallai-Edmonds decomposition [68] as noted in Section 5.2.
where i * is initially set uniformly at random from [δ v ] and reset to the outcome of every successful Grover search. The procedure finds the minimum with constant probability, iñ O( √ δ v ) time, including queries to U Nv , U S , and U index . Q-FindUndiscoveredNeighbor and Q-FindMinIndexNeighborOnStack are called O(n) times (to traverse all n vertices in the DFS). To get a constant error probability for Q-FindSCC, each should be repeated log n times to get their error rates to O(1/n). Overall, the running time isÕ v∈V √ δ v =Õ( √ nm). This assumes that U Nv for all v ∈ V is previously initialized. U S and U index can be initialized inÕ(n) and updated in O(log n).

B Other global constraints
Additional global constraints can be made domain consistent using the Dulmage-Mendelsohn canonical decomposition algorithm of Cymer, and thus can be accelerated by our quantum approach as discussed in Section 5.2. Here we consider several examples; see [10] for a list of thirteen applicable global constraints.
The inverse constraint is often used to model routing problems involving successor and predecessor variables. For example, a CP model of the symmetric traveling salesman problem (TSP) with n vertices can be posed as follows: alldifferent(X) x 1 < y 1 (41) where c ij is the cost of traveling from vertex i to j, x i is the vertex visited immediately after i and y i is the vertex visited immediately before i. Constraint (41) breaks symmetry by preventing reverse tours [79]. A solution that visits the vertices in order (1, 2, 3, 4) would have X = (2, 3, 4, 1) and Y = (4, 1, 2, 3).
Definition B.2 (same constraint). Given tuples of variables X = (x 1 , . . . , x n ) and Y = (y 1 , . . . , y n ), the constraint same(X, Y ) enforces that the multiset of values assigned to the variables of X is identical to the multiset of values assigned to the variables of Y [80].
For example, X = (1, 2, 3) and Y = (2, 1) would satisfy the constraint usedby(X, Y ), whereas X = (1, 2, 3) and Y = (4) would not. The usedby constraint is often useful for models that require an assignment of resources where there may be more of one resource type than another (i.e., nurses and doctors).

C Other CP modeling examples
In this appendix we provide additional examples to illustrate how computationally difficult problems can be modeled in constraint programming (CP). These examples are on top of the Sudoku model presented in Section 2 and the TSP model presented in Appendix B, and utilize only the global constraints formally defined in this paper.

C.1 Rostering problems
CP has been used to model and solve various scheduling problems, such as the nurse rostering problem [81] from operations research. In this problem we are given a team of nurses, i ∈ I; a set of days in the rostering period, d ∈ D; a set of different shifts, k ∈ K = {M, A, E, O} (i.e., morning, afternoon, evening, and off); and a pre-specified rostering requirement, C d,k , specifying the number of nurses required on day d, shift k. We can model this rostering requirement in CP effectively using the global cardinality constraint (gcc), as introduced in Section 5.2.
We define decision variable s i,d ∈ K to represent the shift assigned to nurse i on day d. Our gcc constraint is then formulated as follows:

C.2 Sports tournament scheduling
Another interesting application of CP modeling is for round-robin sports tournament scheduling [82]. In a single round-robin tournament we have a set of n teams, i ∈ I, and a set of n − 1 match slots, t ∈ T . The goal is to design a schedule such that each team plays each other team exactly once. (There are also more complex versions of round-robin tournament scheduling such as bipartite single round robin [82].) To formulate this problem in CP, we introduce decision variable x i,t ∈ I \ {i} whose value indicates the opponent team i faces in match slot t. The constraints can then be expressed as: alldifferent({x i,t : t ∈ T }), ∀i ∈ I (46) x i,t ∈ I \ {i}, ∀i ∈ I, t ∈ T.
Constraint (45) ensures that the team that i plays in slot t plays i in slot t; the flexibility of the CP paradigm allows variables to be indexed by other variables. 22 Constraint (46) dictates that each team has a different opponent in each match slot, and finally Constraint (47) dictates the domain for each of the variables. While this modeling example illustrates the core constraints for round-robin scheduling, real-world models typically capture more complex constraints, such as those surrounding limitations on the number of 'away' games, or the incorporation of travel times.

C.3 Quadratic assignment problems
While the previous examples were both satisfaction problems, CP can be used to model and solve optimization problems as well, including those with quadratic objective functions.
The key idea is that, while each state σ l is dependent on the measurement history e (l) , the final state σ k has a part that is independent of the measurement history, namely f (σ 0 ), and that can therefore be unentangled therefrom. Now, consider our quantum algorithm for filtering. The filtering computes the function f (D) = (D , R, F p (D)), where D = F d (D) and R = D \ D . (See Eqs. (2) and (19).) Note that this is reversible, as supposed above (and we keep track of the removed edges R l for precisely this reason). It consists of classical computations interspersed with quantum searches. Suppose that each search were perfect, in the sense that measurement always yielded a marked state. In that case, the circuit before measurement is exactly as in (50), with uniform probability over the possible outcomes, so if we can do the search, then we can implement U E l . More precisely, using fixed-point amplitude amplification (Thm. 27 of [83]) we can implement U E l such that the final state is -far from ideal in using O(log(1/ )/δ) queries, where δ is the usual leading term of Grover search (e.g., √ N ). If we want the final state of the whole quantum tree search to be within distance O(1) from ideal and the filtering algorithm is run T times with g search each, then we need = O(T g).