Global optimal control of perturbed systems

We propose a new numerical method for the computation of the optimal value function of perturbed control systems and associated globally stabilizing optimal feedback controllers. The method is based on a set oriented discretization of state space in combination with a new algorithm for the computation of shortest paths in weighted directed hypergraphs. Using the concept of a multivalued game, we prove convergence of the scheme as the discretization parameter goes to zero.


Introduction
Global infinite horizon optimal control methods for the solution of general nonlinear stabilization problems are attractive for their flexibility and theoretical properties, because they are applicable to virtually all types of nonlinear dynamics, their optimal value functions can typically be identified as Lyapunov functions and they allow for a rigorous treatment of perturbations in a game theoretical setting. However, these methods have the drawback that their numerical solution requires the discretization of the state space which results in huge numerical problems both in terms of computational cost and in terms of memory requirements. Hence, in order to make these methods applicable to a broader range of systems, advanced numerical techniques are needed in order to reduce the computational effort as much as possible.
A novel approach to such problems was presented in the recent paper [1], where a set oriented numerical method for the approximate computation of the optimal value function of certain nonlinear optimal control problems has been developed. The approach relies on a division of state space into boxes that constitute the nodes of a directed weighted graph, where the weights are constructed from the given cost function. On this graph, standard graph theoretic algorithms for computing shortest paths can directly be applied, yielding an approximate value function which is piecewise constant on the state space. At the same time, for every node in the graph, these algorithms compute the successor node on a shortest path, yielding approximate optimal pseudo-trajectories of the original system. Hence, this method combines a simple and hierarchically implementable discretization technique with efficient graph theoretic algorithms yielding both low memory consumption and a fast solution. For the problem of feedback stabilization the solution from [1], however, is not directly applicable, because the resulting pseudo-trajectories would have to be postprocessed in order to obtain true solutions of the system.
In [2] it was subsequently shown that the approximate optimal value function can in fact be used in order to construct a stabilizing feedback controller. Based on concepts from dynamic programming [3] and Lyapunov based approximate stability analysis [4], a statement about its optimality properties was given and a local a posteriori error estimate derived that enables an adaptive construction of the division of state space. However, due to the fact that the approximate optimal value function is not continuous, the constructed feedback law is in general not robust with respect to perturbations of the system.
In the present paper, we show how to incorporate arbitrary perturbations into the framework sketched above. These perturbations can be either inherently contained in the underlying model, describing, e.g., external disturbances or the effect of unmodelled dynamics, or they could be added on top of the original model to account, e.g., for discretization errors.
Our goal in this paper is to construct a feedback which is robust in the sense that on a certain subset of state space it stabilizes the system regardless of how the perturbation acts. Conceptually, this problem leads to a dynamic game, where the controls and the perturbations are associated to two "players" that try to minimize and to maximize a given cost functional, respectively. We show how the discretization of state space in a natural way leads to a multivalued dynamic game (i.e. a discrete inclusion) and prove convergence of the associated value function when the images of the inclusion shrink to the original single-valued map. From this multivalued game we derive a directed weighted hypergraph that gives a finite state model of the original game. We formulate an adapted version of Dijkstra's algorithm in order to compute the associated approximate value function and prove convergence when the box-diameter of the state space division goes to zero.
It should be noted that the convergence analysis developed in this paper using multivalued dynamics is new also for the discretization of optimal control problems without perturbations in [1]. An interesting side result of our study is that using this technique we are able to keep track of the effects of discontinuities in the approximated optimal value function as induced, e.g., by state space constraints. This allows us to prove not only $L^\infty$ convergence in regions of continuity but also $L^1$ convergence in the whole domain of the optimal value function, provided that the optimal value function is continuous with respect to small changes in the state space constraints.
Compared to other dynamic programming approaches to the stabilization of perturbed nonlinear systems (see, e.g., [5] and the references therein), the main advantages of our method are these general and rigorously provable convergence properties and the low computational cost of our perturbed version of Dijkstra's algorithm, cf. Section 6.1. However, our new algorithm is also advantageous for unperturbed problems when treating the spatial discretization errors as perturbation: as Example (19) illustrates, this approach leads to considerably improved performance on a significantly coarser discretization compared to [2].
The paper is organized as follows. In the ensuing Section 2 we describe the problem formulation and the associated game theoretic interpretation. In Section 3 we introduce the concept of a multivalued game and an enclosure and prove a statement about the convergence of the value function of a sequence of enclosures of a multivalued game. These results are extended to systems with state constraints in Section 4. In Section 5 we show how via the division of state space one obtains a multivalued game from the original system, construct the corresponding hypergraph and introduce an associated shortest path algorithm. Some hints on its implementation, complexity issues as well as two numerical examples are addressed in Section 6. Convergence of the numerical approximation to the optimal value function and the construction of approximately optimal feedback laws are discussed in Sections 7 and 8, respectively.

Problem formulation
We consider the problem of optimally stabilizing the discrete-time perturbed control system $x_{k+1} = f(x_k, u_k, w_k)$, $k = 0, 1, \ldots$, (1) where $f : X \times U \times W \to X$ is continuous, $x_k \in X$ is the state of the system, $u_k \in U$ is the control input and $w_k \in W$ is a perturbation parameter, chosen from sets $X \subset \mathbb{R}^d$, $U \subset \mathbb{R}^m$ and $W \subset \mathbb{R}^\ell$. In addition to the evolution law, we are given a continuous cost function $g : X \times U \to [0, \infty)$ that assigns the cost $g(x_k, u_k)$ to any transition $x_{k+1} = f(x_k, u_k, w_k)$, $w_k \in W$. Our goal is to derive an (optimal) feedback law $u : X \to U$ that stabilizes the system in the sense that for a certain subset $S \subset X$ any trajectory starting in $S$ tends to some prescribed set $O \subset X$, while the worst case accumulated cost is minimized.
Let us be more precise. For a given initial point $x \in X$, a control sequence $u = (u_k)_{k \in \mathbb{N}} \in U^{\mathbb{N}}$ and a perturbation sequence $w = (w_k)_{k \in \mathbb{N}} \in W^{\mathbb{N}}$ yield the trajectory $x(x, u, w) = (x_k(x, u, w))_{k \in \mathbb{N}}$, defined by $x_0 = x$ and $x_{k+1} = f(x_k, u_k, w_k)$, $k = 0, 1, \ldots$, while the associated accumulated cost is given by $J(x, u, w) = \sum_{k=0}^{\infty} g(x_k(x, u, w), u_k)$. In order to formalize the interplay between the control and the perturbation we employ a game theoretic viewpoint which we describe next. The problem formulation actually already describes a game (see, e.g., [6]), where at each step of the iteration (1) two "players" choose a control value $u_k$ and a perturbation value $w_k$, respectively. The goal of the controlling player is to minimize $J$, while the perturbing player tries to maximize this quantity.
We assume that the controlling player has to choose the value $u_k$ first and that the perturbing player has the advantage of knowing $u_k$ when choosing the perturbation value $w_k$. However, the perturbing player is not able to foresee future choices of the controlling one. More formally, we restrict the choice of perturbation sequences $w \in W^{\mathbb{N}}$ to those that result from applying a nonanticipating strategy $\beta : U^{\mathbb{N}} \to W^{\mathbb{N}}$ to a given control sequence $u \in U^{\mathbb{N}}$, i.e. we have $w = \beta(u)$, with $\beta$ satisfying: $u_k = u'_k$ for all $k \le K$ implies $\beta(u)_k = \beta(u')_k$ for all $k \le K$, for any two control sequences $u, u' \in U^{\mathbb{N}}$ and any $K \in \mathbb{N}$. We denote the set of all nonanticipating strategies by $\mathcal{B}$. As mentioned, our goal is to find a feedback law $u : X \to U$ such that with controls $u_k = u(x_k)$, $x_k$ approaches a given set $O \subset X$, regardless of how the perturbation sequence $w$ is chosen. Accordingly, we assume that we know a compact robust forward invariant set $O \subset X$, i.e. for all $x \in O$ there is a control $u \in U$ such that $f(x, u, W) \subset O$. Since we are done with controlling the system once we are on $O$, we assume that $g(x, u) = 0$ for all $x \in O$ and all $u \in U$ and $g(x, u) > 0$ for all $x \notin O$ and all $u \in U$. Further assumptions on $g$ and on the dynamics in a neighborhood of $O$ will be specified later.
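Our construction of the feedback law will be based on the upper value function $V(x) = \sup_{\beta \in \mathcal{B}} \inf_{u \in U^{\mathbb{N}}} J(x, u, \beta(u))$ of the game (1), where the supremum is taken over all nonanticipating strategies $\beta$. It fulfills the optimality principle $V(x) = \inf_{u \in U} \left[ g(x, u) + \sup_{w \in W} V(f(x, u, w)) \right]$.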
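To illustrate the min-max recursion behind the upper value function, the following minimal sketch iterates $V(x) \leftarrow \min_u [g(x,u) + \max_w V(f(x,u,w))]$ on a small finite game. The toy dynamics `f_toy`, the cost `g_toy` and all function names are assumptions made for illustration only; they do not appear in the paper, and the plain fixed point iteration used here is not the algorithm proposed later.

```python
import math

def upper_value_iteration(states, controls, perturbations, f, g, target, max_sweeps=1000):
    """Iterate V(x) <- min_u [ g(x,u) + max_w V(f(x,u,w)) ] on a finite state set.

    V is initialised with 0 on the target and +inf elsewhere and is only ever
    decreased, so after n sweeps V(x) is at most the worst-case cost of reaching
    the target from x in n steps.
    """
    V = {x: (0.0 if x in target else math.inf) for x in states}
    for _ in range(max_sweeps):
        changed = False
        for x in states:
            if x in target:
                continue
            # The controller moves first; the perturbation reacts to the chosen u.
            best = min(
                g(x, u) + max(V[f(x, u, w)] for w in perturbations)
                for u in controls
            )
            if best < V[x]:
                V[x] = best
                changed = True
        if not changed:
            break
    return V

# Hypothetical toy game on the states 0..5 with target {0}: the controller
# moves left by 1 or 2, the perturbation pushes right by 0 or 1.
def f_toy(x, u, w):
    return min(max(x + u + w, 0), 5)

def g_toy(x, u):
    return 0.0 if x == 0 else 1.0

V_toy = upper_value_iteration(range(6), (-1, -2), (0, 1), f_toy, g_toy, target={0})
```

In this toy game the worst case perturbation cancels one step of the controller, so only the control $u = -2$ makes progress and the value of state $x$ equals $x$.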

Multivalued games
As we will see in the next section, our set oriented approach to the discretization of state space of the perturbed control system (1) leads to a finite state multivalued system. For the convergence analysis of this discretization it turns out to be useful to introduce as an intermediate object an infinite state multivalued game defined by a discrete inclusion. This is given by a multivalued map $F : X \times U \times W \rightrightarrows X$, where $X \subset \mathbb{R}^d$ is a closed set, $U \subset \mathbb{R}^m$, $W \subset \mathbb{R}^\ell$ and the images of $F$ are compact sets, together with a cost function $G : X \times X \times U \times W \to [0, \infty)$. In order to simplify our presentation we first assume that $F(x, u, w) \neq \emptyset$ for all $x \in X$, $u \in U$, $w \in W$, which will be relaxed later, cf. Section 4. Further regularity assumptions on these maps will be imposed when needed. Note that we have introduced a second state argument in $G$, which allows us to associate different costs to the trajectories of the associated discrete inclusion. For a given initial state $x \in X$, a given control sequence $u = (u_k)_{k \in \mathbb{N}} \in U^{\mathbb{N}}$ and a given perturbation sequence $w = (w_k)_{k \in \mathbb{N}} \in W^{\mathbb{N}}$, a trajectory of the game is given by any sequence $\mathbf{x} = (x_k)_{k \in \mathbb{N}} \in X^{\mathbb{N}}$ such that $x_0 = x$ and $x_{k+1} \in F(x_k, u_k, w_k)$, $k = 0, 1, \ldots$. We denote by $\mathcal{X}_F(x, u, w)$ the set of all trajectories of $F$ associated to $x$, $u$ and $w$. The accumulated cost is given by $J(\mathbf{x}, u, w) = \sum_{k=0}^{\infty} G(x_k, x_{k+1}, u_k, w_k)$. As in the previous section, we are interested in computing the upper value function $V_{(F,G)}(x) = \sup_{\beta \in \mathcal{B}} \inf_{u \in U^{\mathbb{N}}} \sup_{\mathbf{x} \in \mathcal{X}_F(x, u, \beta(u))} J(\mathbf{x}, u, \beta(u))$ (5) of this game. By standard dynamic programming arguments [7] one sees that this function fulfills the optimality principle $V_{(F,G)}(x) = \inf_{u \in U} \sup_{w \in W} \sup_{x_1 \in F(x, u, w)} \{ G(x, x_1, u, w) + V_{(F,G)}(x_1) \}$. (6) Observe that our original "single valued" game (2)-(3) can be recast in this multivalued setting by defining $F(x, u, w) := \{f(x, u, w)\}$ and $G(x, x_1, u, w) := g(x, u)$.
We will now investigate the relation of the value functions of different multivalued games. For this purpose we first introduce the concept of an enclosure. Definition 1. If $(F_1, G_1)$ and $(F_2, G_2)$ are two multivalued games such that $F_2(x, u, w) \subseteq F_1(x, u, w)$ for all $x$, $u$ and $w$ and $G_2(x, x', u, w) \le G_1(x, x', u, w)$ for all $x$, $x' \in F_2(x, u, w)$ and all $u$ and $w$, then $(F_1, G_1)$ is called an enclosure of $(F_2, G_2)$.
From this definition we immediately obtain the following proposition. Proposition 1. Let the game $(F_1, G_1)$ be an enclosure of the game $(F_2, G_2)$. Then $V_{(F_2, G_2)}(x) \le V_{(F_1, G_1)}(x)$ for all $x \in X$. The next proposition studies the convergence of the value functions $V_{(F_i, G_i)}$ of a sequence of games $(F_i, G_i)$. In this proposition $H$ denotes the Hausdorff distance for compact sets.
and sup x,x 1 ∈X,u∈U,w∈W Assume furthermore that F is upper semi-continuous in x and that G is continuous in x and x 1 , both uniformly in u and w and on compact subsets of X. In addition, we assume that there exists α ∈ K ∞ 1 with and i.e., uniform convergence on compact sets in the domain of V (F,G) .
Proof. Let k * : X N → N be a bounded map. Then from the optimality principle (6) we obtain by induction . Due to the lower bound α on G, for every δ > 0 there exists a time k γ,δ ∈ N such that for each trajectory x ∈ X F (x, u, β(u)) with cost bounded by γ there exists a time k * (x) ≤ k γ,δ such that x k * (x) ∈ B δ (O). We fix ε > 0 and x ∈ K and choose δ > 0 such that V (F,G) (x) ≤ ε for all x ∈ B δ (O) (δ exists because of the continuity of V (F,G) on ∂O). Then, using an ε-optimal perturbation strategy β * ∈ B and an arbitrary u * ∈ U N , from the above optimality principle we obtain Now, fixing β * , for any i ∈ N we can pick an ε-optimal control u * i , yielding In particular, this last expression is bounded by γ and hence the lower bound α for G i implies that there exists a compact set K 1 such that each ε-optimal trajectory (x k ) k ∈ X F i (x, u * i , β * (u * i )) lies in K 1 for all i ∈ N. Now assumption (7) and the upper semicontinuity of F imply that for each ε 1 > 0 there exists an i 0 ∈ N such that for i ≥ i 0 and each such ε-optimal trajectory (8) and the continuity of G imply that we can find for all i ≥ i 1 and all k * ∈ {1, . . . , k γ,δ }. Combining this inequality with the estimates for V (F,G) and V (F i ,G i ) using u * = u * i in the former we obtain for all i ≥ i 1 . Since i 1 depends only on k γ,δ and ε, hence only on the set K and not on the individual x, we obtain the desired uniform convergence.
Remark 1. Note that we have obtained our result under very weak assumptions on F and G using, however, the crucial continuity assumption of V (F,G) on ∂O. This assumption -which is implicit and in general difficult to check directly -can be ensured by the following asymptotic controllability assumption on the dynamics F and the cost function G in a neighborhood of O: Assume that there exists a neighborhood N of O and a KL function 2 η such that for each x ∈ N and each perturbation strategy β ∈ B there exists a control sequence u ∈ U N and a trajectory ( Then, using the construction from [8, Proof of Theorem 5.4], we find a K function Note that condition (9) is weaker than controllability conditions typically employed to ensure continuity in minimum time problems or pursuit-evasion games (cf. e.g. [9, Chapter IV]) because we do not require to be able to steer the system into the "target" set O but only asymptotically to O.
We also emphasize that we only need continuity at the boundary of O and that our optimal value function may be discontinuous elsewhere.

State space constraints
So far we have assumed $F(x, u, w) \neq \emptyset$ for all $x \in X$, $u \in U$, $w \in W$, which guarantees that for each initial value $x$ and each pair of control and perturbation sequences $u$ and $w$ we obtain at least one trajectory $(x_k)_k$ which is defined for all $k \in \mathbb{N}_0$. However, in practice it will often be necessary to relax this assumption.
In order to motivate this relaxation, assume that we are given a multivalued game ( F , G) on a state space X ⊆ R d . In our numerical approach, the state space set X on which we can solve the problem will be a compact set while the state space X of the given problem is often unbounded. In addition, from a modeling point of view it might be desirable to introduce state constraints, e.g., in order to avoid certain critical regions of the state space. In both cases, it will be necessary to restrict the state space of the original problem defining This construction may result in F (x, u, w) = ∅ for certain x ∈ X, u ∈ U , w ∈ W and consequently it may happen that a solution trajectory will only exist for finite time. More precisely, for given F , given u = (u k ) k ∈ U N , given w = (w k ) k ∈ W N and any sequence be the maximal index up to which the sequence x constitutes a trajectory of F . Since a trajectory with k max It is easy to see that Proposition 1 remains valid in this case, while Proposition 2 is more difficult to recover in this setting. The reason lies in the fact that any enclosure will necessarily enlarge the set of possible trajectories, even if we apply the same state space constraints to F and F i . In the presence of state space constraints this means that for any i there may exist a trajectory x i of F i for which all nearby trajectories x of F violate the space constraints. In other words, unless very specific knowledge about the dynamics F is available and used for the construction of the enclosure F i , the enlargement of the dynamics has the implicit effect of relaxing the state space constraints. However, if we assume that the optimal value function is continuous with respect to relaxations of the state space constraints, then we can recover Proposition 2. In order to formalize this relaxation, for ε > 0 we define the space and the related optimal value function V (Fε,G) . Using this notation we can prove the following variant of Proposition 2.
Proposition 3. Consider the state space constrained dynamics F ofF and consider a sequence of enclosures (F i , G i ) of F on X. Let the assumptions of Proposition 2 hold for F and F i , where (7) in the case of F (x, u, w) = ∅ is to be understood as Assume, furthermore, that F is upper semi-continuous in x uniformly in u and w on compact subsets of X and let · p be the usual p-norm for real valued functions on X for some p ∈ {1, . . . , ∞}.
Then for each compact set K ⊂ X for which sup x∈K V (F,G) (x) < ∞ and on which the continuity assumption Proof. The assumptions on F and F i imply that for each ε > 0, each k * ∈ N and each sufficiently large i ∈ N, for each trajectory x i of F i we can find a trajectory x ε of F with x ε k − x k ≤ ε, k = 0, . . . , k * . Hence, up to the time k * the trajectory x ε is also a trajectory of F ε . Thus, replacing F by F ε we can follow the proof of Proposition 2 in order to obtain for all sufficiently large i ∈ N and all x ∈ K. Now (10) implies the assertion.
Remark 2. Basically, the continuity assumption (10) demands that an arbitrarily small relaxation of the state space constraints does not lead to large changes in the optimal value function. If $V_{(F,G)}$ is continuous on $K$ then one can expect (10) to hold for $p = \infty$, while if $V_{(F,G)}$ is discontinuous on $K$ (note that state space restrictions may introduce discontinuities in the optimal value function) then we would only expect (10) to hold with $p < \infty$, because the location of the discontinuity is likely to change when the state constraint changes. We conjecture that (10) holds under mild regularity conditions on the optimal control problem. A formal verification, however, is beyond the scope of this paper.
In any case, we would like to emphasize that our result allows for a rigorous convergence proof of the approximating multivalued game in the presence of discontinuities, a feature which is rarely found in other approximation techniques.

Discretization of the game
In this section we describe the set oriented discretization technique which transforms our problem into a graph theoretic problem. In order to introduce our method, we first recall the corresponding procedure for unperturbed systems developed in [1] before we turn to the general setting.

Discretizing the Unperturbed System
If X is finite and there are no perturbations, then one can use a shortest path algorithm like Dijkstra's method [10], see also the appendix, in order to compute the value function, see, e.g., [7]. In [1] it has been shown how to discretize general optimal control problems with continuous state space such that this approach can be applied. We review this method here in a different formulation that directly carries over to the case of a perturbed control system in the next section.
We consider a single valued control system $f : X \times U \to X$ ($f$ continuous, $X \subset \mathbb{R}^d$ and $U \subset \mathbb{R}^m$ compact, $0 \in X$, $0 \in U$, $f(0, 0) = 0$), together with a continuous cost function $g : X \times U \to [0, \infty)$ with $g(x, u) > 0$ for $x \neq 0$ and $g(0, 0) = 0$. Let $\mathcal{P}$ be a finite partition of $X$, i.e. $\mathcal{P}$ is a finite set of mutually disjoint subsets $P \subset X$ whose union is $X$. Define the map $\pi : X \to \mathcal{P}$, $\pi(x) = P$ for $x \in P$, as well as $\rho : X \rightrightarrows X$, $\rho = \pi^{-1} \circ \pi$ (i.e. to each $x$, $\rho$ associates the set of the partition $\mathcal{P}$ which contains $x$).
Box-enclosure of the system. Consider the multivalued game (which is actually a multivalued control system since there are no perturbations here) $(F, G)$ with $F(x, u) = \rho(f(x, u))$ and $G(x, x_1, u, w) = g(x, u)$.
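The optimality principle (6) in this case reads $V_{(F,G)}(x) = \inf_{u \in U} \{ g(x, u) + \sup_{x_1 \in F(x, u)} V_{(F,G)}(x_1) \}$. (11) Projection onto piecewise constant functions. The right hand side of (11) defines an operator on real valued functions on $X$, the dynamic programming operator $L : \mathbb{R}^X \to \mathbb{R}^X$, $L[v](x) = \inf_{u \in U} \{ g(x, u) + \sup_{x_1 \in F(x, u)} v(x_1) \}$. Note that the optimal value function $V_{(F,G)}$ is, by definition of $L$, a fixed point of $L$, i.e. $L[V_{(F,G)}] = V_{(F,G)}$. Abusing notation, we identify the space $\mathbb{R}^{\mathcal{P}}$ with the subspace of real valued functions on $X$ that are piecewise constant on the elements of the partition $\mathcal{P}$ (in fact, we view $v \in \mathbb{R}^{\mathcal{P}}$ as the function $v \circ \pi \in \mathbb{R}^X$). We define the projection $\varphi : \mathbb{R}^X \to \mathbb{R}^{\mathcal{P}}$, $\varphi[v](x) = \inf_{x' \in \rho(x)} v(x')$, and the corresponding discretized dynamic programming operator $L_{\mathcal{P}} : \mathbb{R}^{\mathcal{P}} \to \mathbb{R}^{\mathcal{P}}$, $L_{\mathcal{P}} = \varphi \circ L$. Explicitly, the discretized operator reads $L_{\mathcal{P}}[v](x) = \inf_{x' \in \rho(x)} \inf_{u \in U} \{ g(x', u) + \sup_{x_1 \in F(x', u)} v(x_1) \} = \inf_{x' \in \rho(x),\, u \in U} \{ g(x', u) + v(f(x', u)) \}$, since $v \in \mathbb{R}^{\mathcal{P}}$ is constant on each element of $\mathcal{P}$, i.e. on each set $F(x', u)$. We define the discretized optimal value function $V_{\mathcal{P}} \in \mathbb{R}^{\mathcal{P}}$ as the unique fixed point of $L_{\mathcal{P}}$ with $V_{\mathcal{P}}(0) = 0$. Then $V_{\mathcal{P}}$ satisfies the optimality principle $V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x),\, u \in U} \{ g(x', u) + V_{\mathcal{P}}(f(x', u)) \}$. (12) Graph theoretic formulation. Note that since $\mathcal{P}$ is finite, $V_{\mathcal{P}}(f(x', u))$ in (12) can only take finitely many values. We can therefore rewrite (12) as $V_{\mathcal{P}}(P) = \min_{P_1 \in \pi(f(\pi^{-1}(P), U))} \{ \inf\{ g(x, u) : (x, u) \in P \times U,\ \pi(f(x, u)) = P_1 \} + V_{\mathcal{P}}(P_1) \}$, (13) where $V_{\mathcal{P}}(P) = V_{\mathcal{P}}(x)$ for any $x \in P \in \mathcal{P}$. If we define the multivalued map (or, equivalently, the directed graph) $\mathcal{F} : \mathcal{P} \rightrightarrows \mathcal{P}$, $\mathcal{F}(P) = \pi(f(\pi^{-1}(P), U))$, $P \in \mathcal{P}$, and the cost function $\mathcal{G}(P, P_1) = \inf\{ g(x, u) : (x, u) \in P \times U,\ \pi(f(x, u)) = P_1 \}$, we can rewrite (13) as $V_{\mathcal{P}}(P) = \min_{P_1 \in \mathcal{F}(P)} \{ \mathcal{G}(P, P_1) + V_{\mathcal{P}}(P_1) \}$.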
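Note that this is the optimality principle of a shortest path problem on the weighted directed graph defined above, so that the discretized value function can be computed by Dijkstra's algorithm.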
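The unperturbed construction can be sketched in a few lines for a one-dimensional state space. The code below is only an illustration under stated assumptions: the state space is taken to be $[0, 1]$ with a uniform partition, the edge weights are approximated by sampling test points and controls (so they are upper bounds on the true infima), and the example map `f_ex` and cost `g_ex` are hypothetical choices, not taken from the paper.

```python
import heapq

def box_of(x, n):
    """Index of the interval of a uniform partition of [0, 1] into n boxes containing x."""
    return min(int(x * n), n - 1)

def build_graph(f, g, n, controls, samples=10):
    """Sampled edge weights: min of g(x, u) over test points x in box p with f(x, u) in box q."""
    weights = {}
    for p in range(n):
        for s in range(samples):
            x = (p + (s + 0.5) / samples) / n  # test point inside box p
            for u in controls:
                y = f(x, u)
                if not 0.0 <= y <= 1.0:
                    continue  # transitions leaving the state space are discarded
                key = (p, box_of(y, n))
                weights[key] = min(weights.get(key, float("inf")), g(x, u))
    return weights

def dijkstra(weights, n, targets):
    """Solve V(P) = min over edges (P, P1) of weight + V(P1), with V = 0 on the target boxes."""
    incoming = {}
    for (p, q), w in weights.items():
        incoming.setdefault(q, []).append((p, w))
    V = [float("inf")] * n
    heap = []
    for t in targets:
        V[t] = 0.0
        heapq.heappush(heap, (0.0, t))
    while heap:
        d, q = heapq.heappop(heap)
        if d > V[q]:
            continue  # stale heap entry
        for p, w in incoming.get(q, []):
            if d + w < V[p]:
                V[p] = d + w
                heapq.heappush(heap, (V[p], p))
    return V

# Hypothetical example data (assumptions for illustration only).
a = 0.8
f_ex = lambda x, u: x + (1 - a) * u * x
g_ex = lambda x, u: (1 - a) * x
controls = [-1.0 + 0.2 * i for i in range(11)]
n = 16
weights = build_graph(f_ex, g_ex, n, controls)
V = dijkstra(weights, n, targets=[0])
```

Because the shortest path search runs backwards from the target boxes, each box is finalized once and the piecewise constant approximation is obtained in a single pass over the graph.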

Discretization of the Perturbed System
Now we want to carry over the discretization procedure from the last section to our game setting. We proceed in a completely analogous way, additionally incorporating the perturbations now. This will ultimately lead to a directed hypergraph (actually a forward hypergraph or F -graph in the terminology of [11]) instead of an ordinary graph for which we formulate the associated shortest path algorithm at the end of the section.
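Box-enclosure of the system. Consider the multivalued game $(F, G)$ with $F(x, u, w) = \rho(f(x, u, w))$ and $G(x, x_1, u, w) = g(x, u)$ (16) (where $f$ and $g$ are the control system and cost function introduced in Section 2). From the optimality principle (6) we obtain $V_{(F,G)}(x) = \inf_{u \in U} \{ g(x, u) + \sup_{x_1 \in F(x, u, W)} V_{(F,G)}(x_1) \}$. Projection onto piecewise constant functions. The dynamic programming operator $L : \mathbb{R}^X \to \mathbb{R}^X$ here reads $L[v](x) = \inf_{u \in U} \{ g(x, u) + \sup_{x_1 \in F(x, u, W)} v(x_1) \}$. Correspondingly, the discretized operator $L_{\mathcal{P}} : \mathbb{R}^{\mathcal{P}} \to \mathbb{R}^{\mathcal{P}}$ is given by $L_{\mathcal{P}}[v](x) = \inf_{x' \in \rho(x),\, u \in U} \{ g(x', u) + \sup_{w \in W} v(f(x', u, w)) \}$, since $v \in \mathbb{R}^{\mathcal{P}}$ is constant on each element of $\mathcal{P}$, i.e. on each set $F(x', u, w)$. We define the discretized optimal value function $V_{\mathcal{P}} \in \mathbb{R}^{\mathcal{P}}$ as the unique fixed point of $L_{\mathcal{P}}$ with $V_{\mathcal{P}}(P) = 0$ for all partition elements $P \in \mathcal{P}$ with $\pi^{-1}(P) \cap O \neq \emptyset$. Then $V_{\mathcal{P}}$ satisfies the optimality principle $V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x),\, u \in U} \{ g(x', u) + \sup_{x_1 \in F(x', u, W)} V_{\mathcal{P}}(x_1) \}$. (17) Graph theoretic formulation. In order to derive the corresponding shortest path algorithm, it is useful to formulate (17) equivalently in terms of an associated graph. To this end note that for any pair $(x, u) \in X \times U$, the set $F(x, u, W) \subset X$ is the union of a finite set of elements from the partition $\mathcal{P}$. In particular, the family $\{ F(x', u, W) : (x', u) \in \rho(x) \times U \}$ of subsets of $X$ is finite for any $x \in X$. Putting this in terms of a corresponding map on $\mathcal{P}$: each partition element $P$ is mapped to a finite family $\{N_i\}_{i=1,\ldots,i(P)}$, $N_i \subset \mathcal{P}$, of subsets of $\mathcal{P}$ under all perturbations. Formally, we have a directed hypergraph $(\mathcal{P}, E)$ with the set $E \subset \mathcal{P} \times 2^{\mathcal{P}}$ of hyperedges given by $E = \{ (P, N) : N = \pi(F(x, u, W)) \text{ for some } (x, u) \in P \times U \}$, or, equivalently, the multivalued map $\mathcal{F} : \mathcal{P} \rightrightarrows 2^{\mathcal{P}}$, $\mathcal{F}(P) = \{ \pi(F(x, u, W)) : (x, u) \in P \times U \}$.
Figure 1: Illustration of the construction of the hypergraph.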
If we define weights on the edges of this hypergraph by $\mathcal{G}(P, N) = \inf\{ g(x, u) : (x, u) \in P \times U,\ \pi(F(x, u, W)) = N \}$, then we can write (17) equivalently as $V_{\mathcal{P}}(P) = \min_{N \in \mathcal{F}(P)} \{ \mathcal{G}(P, N) + \max_{N' \in N} V_{\mathcal{P}}(N') \}$. (18) Dijkstra's method for the perturbed system. We are now going to generalize Dijkstra's algorithm (see the appendix) such that it computes the value function of a weighted directed hypergraph (i.e. the function defined by the optimality principle (18)). Let $(\mathcal{P}, E)$, $E \subset \mathcal{P} \times 2^{\mathcal{P}}$, be a hypergraph with weights $\mathcal{G} : E \to [0, \infty)$. In order to adapt Algorithm 2, we need to modify the relaxing step in lines 7-9, such that the maximization over all perturbations (i.e. over $N' \in N$) in (18) is taken into account. The modified version of lines 7-9 reads: We note that this algorithm bears similarities with the SBT-algorithm in [11]. However, in our case the graph has a special structure (namely, the heads of the hyperedges consist of only a single node, i.e. we have an F-graph as defined in [11]). This yields the subquadratic complexity in the number of nodes as derived above and thus gives an improvement over SBT.
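The idea of the modified relaxing step can be sketched as follows: a hyperedge $(Q, N)$ is only used to update $V(Q)$ once every node of $N$ has been finalized, and since nodes are finalized in order of increasing value, the node finalized last carries the maximum over $N$. The code below is a minimal illustration written under these assumptions; the function name, the data layout of the hyperedges and the toy hypergraph are hypothetical and do not reproduce the authors' listing.

```python
import heapq
import math

def minmax_dijkstra(nodes, hyperedges, targets):
    """Value function of a weighted directed hypergraph.

    hyperedges is a list of triples (source, successors, weight), where
    successors is a set of nodes. The function returned satisfies
    V(P) = min over hyperedges (P, N) of [ weight + max over N' in N of V(N') ],
    with V = 0 on the target nodes.
    """
    V = {p: math.inf for p in nodes}
    containing = {p: [] for p in nodes}  # hyperedges whose successor set contains p
    for idx, (_, successors, _) in enumerate(hyperedges):
        for p in successors:
            containing[p].append(idx)
    done = set()
    heap = []
    for t in targets:
        V[t] = 0.0
        heapq.heappush(heap, (0.0, t))
    while heap:
        d, p = heapq.heappop(heap)
        if p in done:
            continue  # stale heap entry
        done.add(p)
        for idx in containing[p]:
            q, successors, w = hyperedges[idx]
            if q in done:
                continue
            # Relax only if all successors are finalized; p was finalized last,
            # so d = V(p) is the maximum of V over the successor set.
            if all(s in done for s in successors):
                if w + d < V[q]:
                    V[q] = w + d
                    heapq.heappush(heap, (V[q], q))
    return V

# Hypothetical toy hypergraph with target node 0.
edges = [
    (1, {0}, 1.0),
    (2, {0, 1}, 1.0),
    (2, {0, 3}, 0.5),
    (3, {2}, 1.0),
    (3, {1}, 5.0),
]
V_hyper = minmax_dijkstra(range(4), edges, targets=[0])
```

In the toy example the hyperedge from node 2 into $\{0, 3\}$ is cheap but never used, because node 3 can only be reached at a higher value than the alternative through $\{0, 1\}$.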

Implementation
In the numerical realization we always let the state space X be a box in R d and construct a partition P of it by dividing X uniformly into smaller boxes. In fact, we realize this division by repeatedly bisecting the current division (changing the coordinate direction after each bisection). The resulting sequence of partitions can efficiently be stored as a binary tree -see [12] for more details.
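The repeated bisection with alternating coordinate directions can be illustrated by locating the box that contains a given point. This is only a sketch of the idea: the function name and the encoding of the path through the binary tree as a list of bits are assumptions, and the actual data structure of [12] is not reproduced here.

```python
def box_path(point, lower, upper, depth):
    """Locate the box containing `point` after `depth` bisections of the box
    [lower, upper], cycling through the coordinate directions.

    Returns the path in the binary tree (0 = lower half, 1 = upper half)
    together with the corners of the resulting box.
    """
    lo, hi = list(lower), list(upper)
    dim = len(point)
    bits = []
    for level in range(depth):
        axis = level % dim  # change the coordinate direction after each bisection
        mid = 0.5 * (lo[axis] + hi[axis])
        if point[axis] < mid:
            bits.append(0)
            hi[axis] = mid
        else:
            bits.append(1)
            lo[axis] = mid
    return bits, lo, hi

bits, lo, hi = box_path((0.3, 0.9), (0.0, 0.0), (1.0, 1.0), depth=4)
```

With `depth` bisections the partition has $2^{\text{depth}}$ boxes, and the lookup of a single point costs one comparison per level.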
Time and space complexity. The time complexity of the standard Dijkstra algorithm (Algorithm 2 in the appendix) strongly depends on the data structure which is used in order to store the set Q. In particular, the complexity of the operations in lines 5 (extracting the node with minimal V -value) and line 9 (decreasing the V -value and the associated reorganization of the data structure) have a crucial influence. In our implementation we are using a binary heap in order to store Q which leads to a complexity of O((|P| + |E|) log |P|).
In the perturbed case (Algorithm 1), each hyperedge is considered at most N times in line 7, with N being a bound on the cardinality of the hypernodes N . Additionally, we need to perform the check in line 8, which has linear complexity in N . Thus, the overall complexity of the perturbed Dijkstra algorithm is O(|P| log |P| + |E|N (N + log |P|)).
The space requirements grow linearly with the number of partition elements. Since typically the whole state space has to be covered, this number grows exponentially with the dimension of state space (assuming a uniform partitioning). The concrete storage consumption strongly depends on the properties of the underlying control system. While the number of hyperedges is essentially determined by the Lipschitz constant of f , the size of the hypernodes N will crucially be influenced by the size of the perturbation. In the applications that we have in mind in this paper, these numbers are of moderate size.
As a rule of thumb, the main computational effort in our approach goes into the construction of the hypergraph via the mapping of test points -in particular, if the system is given by a short-time integration of a continuous time system. Note that this "sampling" of the system will be required in any method that computes the value function. Typically however, in standard methods like value iteration, certain points are sampled multiple times which leads to a higher computational effort in comparison to our approach.
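The construction of the hypergraph via test points can be sketched as follows for a one-dimensional state space $[0, 1]$ with a uniform partition. For every test point and control, the images under all sampled perturbations determine the set of boxes that is hit, and the weight of the resulting hyperedge is the smallest sampled cost. All names, the sampling grids and the example map `f_pert` are assumptions made for illustration; in particular, sampling only approximates the sets and infima used in the text.

```python
def box_of(x, n):
    """Index of the interval of a uniform partition of [0, 1] into n boxes containing x."""
    return min(int(x * n), n - 1)

def build_hypergraph(f, g, n, controls, perturbations, samples=10):
    """Map each sampled (x, u) to the set of boxes hit under all sampled perturbations.

    Returns a dict {(source box, frozenset of successor boxes): weight}, where the
    weight is the minimum of g(x, u) over all samples producing this hyperedge.
    """
    weights = {}
    for p in range(n):
        for s in range(samples):
            x = (p + (s + 0.5) / samples) / n  # test point inside box p
            for u in controls:
                images = [f(x, u, w) for w in perturbations]
                if any(y < 0.0 or y > 1.0 for y in images):
                    continue  # discard controls that may leave the state space
                successors = frozenset(box_of(y, n) for y in images)
                key = (p, successors)
                weights[key] = min(weights.get(key, float("inf")), g(x, u))
    return weights

# Hypothetical example data (assumptions for illustration only).
a, eps = 0.8, 0.01
f_pert = lambda x, u, w: x + (1 - a) * u * x + w
g_pert = lambda x, u: (1 - a) * x
hyper = build_hypergraph(f_pert, g_pert, 8, [-1.0, 0.0, 1.0], [-eps, 0.0, eps])
```

Each test point is mapped exactly once per control and perturbation sample, which is where the bulk of the computational effort goes when `f` is an expensive time-T map.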

Numerical Examples
A simple 1D system. We start by looking at an additively perturbed version of a simple 1D map from [2]: for some ε > 0 and the fixed parameter a ∈ (0, 1). The cost function is so that (regardless of how the perturbation sequence is chosen) the optimal control policy is to steer to the origin as fast as possible, i.e. to choose u k = −1 for all k. Similarly, the optimal strategy for the "perturbing player" is to slow down the dynamics as much as possible, corresponding to w k = ε for all k. The resulting dynamical system is the affine linear map $x_{k+1} = a x_k + \varepsilon$, which has a fixed point at x = ε/(1 − a), i.e. under worst case conditions (assuming w k = ε for all k) it will be impossible to get any closer than α 0 := ε/(1 − a) to the origin. Correspondingly, we choose a neighborhood O = [0, α] with α > α 0 as our target region. With the exact optimal value function is as shown in Figure 2 for a = 0.8, ε = 0.01 and α = 1.1α 0 . In that figure, we also show the approximate optimal value functions on partitions of 64, 256 and 1024 intervals, respectively. In the construction of the hypergraph, we used an equidistant grid of ten points in each partition interval, in the control space and in the perturbation space.
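As a quick numerical check of the worst case bound, one can iterate the affine map $x \mapsto a x + \varepsilon$ (the dynamics under $u_k = -1$ and $w_k = \varepsilon$) with the parameter values quoted above. The starting point and the number of iterations are arbitrary choices made for this sketch.

```python
a, eps = 0.8, 0.01

alpha0 = eps / (1 - a)   # fixed point of x -> a*x + eps
alpha = 1.1 * alpha0     # right end of the target region O = [0, alpha]

x = 1.0                  # arbitrary starting point
for _ in range(200):
    x = a * x + eps      # worst case dynamics: u_k = -1, w_k = eps

# The distance to the fixed point contracts by the factor a in every step.
gap = abs(x - alpha0)
```

For $a = 0.8$ and $\varepsilon = 0.01$ this gives $\alpha_0 = 0.05$ and $\alpha = 0.055$, and the iterates approach $\alpha_0$ geometrically without ever reaching the origin.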
The inverted pendulum reloaded. As a more challenging test case, we reconsider the problem of designing an optimal globally stabilizing controller for an inverted pendulum on a cart (see [1,2]): The equation models the (planar) motion of an inverted pendulum with mass $m = 2$ on a cart with mass $M = 8$ which moves under an applied horizontal force $u$. The angle $\varphi$ measures the offset angle from the vertical up position. The parameter $m_r = m/(m + M)$ is the mass ratio and $\ell = 0.5$ the distance of the pendulum mass from the pivot. We use $g = 9.8$ for the gravitational constant. The instantaneous cost is $q(\varphi, \dot\varphi, u) = \frac{1}{2}\left(0.1\varphi^2 + 0.05\dot\varphi^2 + 0.01u^2\right)$.
Denoting the evolution operator of the control system (19) for constant control functions $u$ by $\Phi^t(x, u)$, we consider the time-$T$-map $\Phi^T(x, u)$ of this system as our discrete time system with $T = 0.1$. The map $\Phi^T$ is approximated via the classical In [2], a feedback trajectory with initial value (3.1, 0.1) was computed that was based on an approximate optimal value function on a partition of $2^{18}$ boxes (cf. Figure 3 (left)). In contrast to what one might expect, the approximate optimal value function does not actually decrease monotonically along this trajectory (cf. Figure 3 (right)). This effect is due to the fact that the discretization method used in [2] allows for jumps in the trajectories which cannot be reproduced by the real system. The fact that the approximate optimal value function is not always decreasing indicates that the approximation accuracy in this example is just fine enough to allow for stabilization, and in fact, on a coarser partition of $2^{14}$ boxes, the associated feedback no longer stabilizes this initial condition.
We are now going to use the approach developed in this paper in order to design a stabilizing feedback controller on the basis of the coarser partition ($2^{14}$ boxes). To this end, we imagine the perturbation of our system being given as "for a given state $(\varphi, \dot\varphi)$, be prepared to start anywhere in the box that contains $(\varphi, \dot\varphi)$", i.e. we define our game by $F((\varphi, \dot\varphi), u, W) := \Phi^T(B, u)$, where $B \in \mathcal{P}$ is the box in the partition $\mathcal{P}$ under consideration which contains the point $(\varphi, \dot\varphi)$. Note that we do not need to parameterize the points in $\Phi^T(B, u)$ with $w \in W$ for the construction of the hypergraph. Figure 4 shows the approximate upper value function on a partition of $2^{14}$ boxes with target region $O = [-0.1, 0.1]^2$ as well as the trajectory generated by the associated feedback for the initial value (3.1, 0.1). As expected, the approximate value function is decreasing monotonically along this trajectory. Furthermore, despite the fact that we used considerably fewer boxes than for Figure 3, the resulting trajectory is obviously closer to the optimal one because it converges to the origin much faster.

Convergence Analysis
In this section we show that, and in which sense, the approximate optimal value function constructed in the preceding section converges to the true one as the underlying partitions are refined, using the abstract results for multivalued games developed in Sections 3 and 4.
We begin with the following observation on the relation between V P and V (F,G) with F , G from (16).
Proposition 4. Consider the discretized optimal value function $V_{\mathcal{P}}$ and the optimal value function $V_{(F,G)}$ from (5) corresponding to the game (16). If $V_{(F,G)}$ is continuous on $\partial O$, then these functions are related by $V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x)} V_{(F,G)}(x')$. Proof: First note that both functions are nonnegative. From the previous considerations it follows that the functions satisfy the optimality principles $V_{(F,G)}(x) = \inf_{u \in U} \{ g(x, u) + \sup_{x_1 \in F(x, u, W)} V_{(F,G)}(x_1) \}$ (21) and $V_{\mathcal{P}}(x) = \inf_{x' \in \rho(x),\, u \in U} \{ g(x', u) + \sup_{x_1 \in F(x', u, W)} V_{\mathcal{P}}(x_1) \}$. (22) In order to show $\inf_{x' \in \rho(x)} V_{(F,G)}(x') \le V_{\mathcal{P}}(x)$ (23) we number the elements $P_i$ of $\mathcal{P}$ such that $i_2 > i_1$ implies $V_{\mathcal{P}}|_{P_{i_2}} \ge V_{\mathcal{P}}|_{P_{i_1}}$. We first consider those elements $P_i$, $i = 1, \ldots, j$, for which we have $V_{\mathcal{P}}|_{P_i} \equiv 0$ which by our assumptions on $V_{\mathcal{P}}$ and $g(x, u)$ is equivalent to In particular, for any fixed $w$ we find $x_1 \in F(x_0, u_0, w) \cap O$ for which we proceed the same way, which yields $F(x_1, u_1, w) \subset O$ for all $w \in W$. Hence, given a perturbation strategy $\beta(u)$ we find a control sequence $u$ such that $\mathcal{X}_F(x_0, u, \beta(u)) \subset O$ implying $G(x_k, x_{k+1}, u_k, \beta(u)_k) = 0$ and thus inf In fact, what we showed is that $V_{(F,G)}(x) = 0$ for $x \in O$. Since we assumed that $V_{(F,G)}$ is continuous on $\partial O$, we also get inf Now we proceed by induction over $i \ge j + 1$. We pick some $i \ge j + 1$ and assume that the desired inequality (23) holds for $\rho(x) = P_1, \ldots, P_{i-1}$. We fix $x \in X$ with $\rho(x) = P_i$ and an arbitrary $\varepsilon > 0$. Then we pick $x' \in P_i$ such that the infimum over $x'$ in (22) is attained up to $\varepsilon$. Thus we obtain where we have used the induction assumption in the third step as follows: the inequality $g(x', u) > 0$ implies $V_{\mathcal{P}}(x_1) < V_{\mathcal{P}}(x) = V_{\mathcal{P}}|_{P_i}$, furthermore we have $x_1 \in F(x', u, w) = P_{i'}$ for some $i' \in \mathbb{N}$, i.e., $V_{\mathcal{P}}(x_1) = V_{\mathcal{P}}|_{P_{i'}}$. This implies $V_{\mathcal{P}}|_{P_i} > V_{\mathcal{P}}|_{P_{i'}}$ and consequently $i > i'$. Hence by the induction assumption we have $\inf_{x'' \in \rho(x_1)} V_{(F,G)}(x'') \le V_{\mathcal{P}}(x_1)$. Now, since $\varepsilon > 0$ was arbitrary, we obtain (23). The converse inequality $V_{\mathcal{P}}(x) \le \inf_{x' \in \rho(x)} V_{(F,G)}(x')$ follows by a similar induction argument using the fact that (21) always yields a larger value than (22) due to the additional minimization over $x'$ in (22).
Remark 3. Note that in order to obtain the assertion from the preceding proposition, it is sufficient that the union of those partition elements that have nonempty intersection with $O$ forms a neighborhood of $O$. If this is true, one can actually drop the assumption on the continuity of $V_{(F,G)}$ on $\partial O$.
We now consider a sequence of increasingly fine partitions of $X$ and ask under which conditions the corresponding approximate optimal value functions converge to the value function of the game $(f, g)$. In a nested sequence of partitions, each element of a partition is contained in an element of the preceding partition.
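The nesting condition can be made concrete with a minimal sketch. It builds uniform partitions of an interval by repeated bisection and checks that every cell lies in a cell of the preceding partition. The interval $[0, 1]$, the bisection rule and the helper names `uniform_partition` and `is_nested` are illustrative assumptions and are not taken from the method described here.

```python
from fractions import Fraction

def uniform_partition(level, lo=Fraction(0), hi=Fraction(1)):
    """Cells (a, b) obtained by bisecting [lo, hi] `level` times (hypothetical helper)."""
    n = 2 ** level
    width = (hi - lo) / n
    return [(lo + k * width, lo + (k + 1) * width) for k in range(n)]

def is_nested(fine, coarse):
    """True if every cell of `fine` is contained in some cell of `coarse`."""
    return all(
        any(c0 <= a and b <= c1 for (c0, c1) in coarse)
        for (a, b) in fine
    )

# Four refinement levels; exact rational arithmetic avoids rounding at cell boundaries.
partitions = [uniform_partition(i) for i in range(4)]
nested = all(is_nested(partitions[i + 1], partitions[i]) for i in range(3))
# Largest cell width per level, a simple stand-in for the discretization parameter.
diameters = [max(b - a for a, b in p) for p in partitions]
```

Bisection is only one way to obtain a nested sequence; any refinement that subdivides existing cells without moving their boundaries would pass the same check.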
The following theorem states our main convergence result. It shows that we obtain $L^\infty$ convergence on compact sets on which $V_{(f,g)}$ is continuous and, under a mild regularity condition on the set of discontinuities, $L^1$ convergence on every compact set on which $V_{(f,g)}$ is bounded. We first consider problems without state space constraints and address the constrained case in Remark 4, below.
Theorem 1. Assume that $g(x, u)$ is continuous, that $g(x, u) > 0$ for $x \notin O$ and that $V_{(f,g)}$ is continuous on $\partial O$. Then for every compact set $K \subseteq X$ on which $V_{(f,g)}$ is continuous we have
$$\left\| V_{(f,g)}|_{K_i} - V_{\mathcal{P}_i}|_{K_i} \right\|_\infty \to 0 \quad \text{as } i \to \infty,$$
with $K_i$ being the largest subset of $K$ which is a union of partition elements $P \in \mathcal{P}_i$.
If we assume furthermore that the set of discontinuities of $V_{(f,g)}$ has zero Lebesgue measure, then $V_{\mathcal{P}_i}|_K \to V_{(f,g)}|_K$ in $L^1$ as $i \to \infty$ for every compact set $K \subseteq X$ on which $V_{(f,g)}$ is bounded.
Proof. We use Proposition 2 with $(F, G) = (f, g)$ ($f$ interpreted as a set valued map) and Proposition 4. Note that since $F_i(x, u, w) = \rho_i(f(x, u, w))$ and $G_i(x, u, w) = g(x, u)$, the games $(F_i, G_i)$ are enclosures of $(f, g)$ (in fact, since the sequence of partitions is nested, for every $i$, $(F_i, G_i)$ is an enclosure of $(F_{i+1}, G_{i+1})$). Under the assumptions of the theorem, all assumptions of Proposition 2 are satisfied. In particular, by the assumptions on $g$ and since $X$ and $U$ are compact, we know that there exists a function $\alpha \in \mathcal{K}_\infty$ such that for all $i$. Thus, $V_{(F_i,G_i)}$ converges uniformly to $V_{(f,g)}$ on $K$.
In order to show the $L^\infty$ convergence on $K_i$ observe that if $V_{(f,g)}$ is continuous on $K$ then it is also uniformly continuous on $K$ which implies Thus we can use Proposition 4 in order to conclude
In order to show the $L^1$ convergence, observe that the uniform convergence It thus remains to show that $V_{(F_i,G_i)}|_K - V_{\mathcal{P}_i}|_K \to 0$ in $L^1$. Let $D$ be the set of discontinuities of $V_{(f,g)}$ and $D_i = \{P \in \mathcal{P}_i : P \cap D \neq \emptyset\}$. We write Because of $V_{(f,g)} \ge V_{(F_i,G_i)}$, the assumption that $D$ has zero Lebesgue measure and $H(\rho_i(x), \{x\}) \to 0$, we have that $I_{i,1} \to 0$ for $i \to \infty$. Using Proposition 4, the compactness of $K$, and the fact that $V_{(F_i,G_i)}|_K \to V_{(f,g)}|_K$ uniformly, we also obtain that $I_{i,2} \to 0$ as $i \to \infty$, i.e. $V_{(F_i,G_i)}|_K - V_{\mathcal{P}_i}|_K \to 0$ in $L^1$ and thus the assertion of the theorem.

Corollary 1. Under the assumptions of Theorem 1 we have
$$V_{\mathcal{P}_i}(x) \to V_{(f,g)}(x) \quad \text{as } i \to \infty$$
for Lebesgue-almost all $x \in K$, where $K$ is any compact subset of the domain of $V_{(f,g)}$.
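Proof. By standard arguments, there exists a subsequence $(i(j))_j$ such that $V_{\mathcal{P}_{i(j)}}(x) \to V_{(f,g)}(x)$ as $j \to \infty$ for Lebesgue-almost all $x \in K$. Since $(V_{\mathcal{P}_i}(x))_i$ is monotone, convergence of a subsequence implies convergence of the whole sequence to the same limit, and we obtain the assertion.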
Remark 4. Using Proposition 3 instead of Proposition 2 it is easily seen that our convergence results remain valid in case of state space constraints if we assume condition (10) for $F(x, u, w) = \{f(x, u, w)\}$. In this case, the first assertion of Theorem 1 will hold for the $p$-norm from (10) instead of the $\infty$-norm.

Feedback Construction
As usual, we use the approximate optimal value function $V_{\mathcal{P}}$ and the optimality principle (4) in order to construct an approximate optimal feedback. More precisely, for any point $x \in S_0$, $S_0 := \{x \in X : V_{(f,g)}(x) < \infty\}$, we define
$$u_{\mathcal{P}}(x) = \operatorname*{argmin}_{u \in U} \max_{w \in W} \{ g(x, u) + V_{\mathcal{P}}(f(x, u, w)) \}.$$
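The min-max definition of the feedback can be illustrated by a minimal sketch in which $U$ and $W$ are replaced by finite sample sets. Everything in the example is an assumption made for illustration: the scalar dynamics `f`, the running cost `g`, the target interval $[-0.25, 0.25]$, and in particular the piecewise constant function `V_P`, which is a hand-written stand-in and not a value function computed by the method of this paper.

```python
import math

def feedback(x, U, W, f, g, V_P):
    """Return a control from the finite set U that minimizes the worst case over W."""
    def worst_case(u):
        # max over perturbations of running cost plus value at the successor state
        return max(g(x, u) + V_P(f(x, u, w)) for w in W)
    return min(U, key=worst_case)

# Toy data (all hypothetical).
def f(x, u, w):
    return 0.5 * x + u + 0.1 * w          # perturbed scalar dynamics

def g(x, u):
    # zero cost on the assumed target [-0.25, 0.25], positive outside
    return 0.0 if abs(x) <= 0.25 else x * x + u * u

def V_P(x):
    # constant on cells [0.5*k - 0.25, 0.5*k + 0.25), with value |k|
    return float(abs(math.floor(x / 0.5 + 0.5)))

U = [-0.5, 0.0, 0.5]      # sampled control values
W = [-1.0, 0.0, 1.0]      # sampled perturbation values
u_far = feedback(1.0, U, W, f, g, V_P)    # state outside the target
u_near = feedback(0.0, U, W, f, g, V_P)   # state inside the target
```

With these toy data the sketch selects `-0.5` at `x = 1.0`, since this control steers every perturbed successor into the cell around the origin, and `0.0` at `x = 0.0`. Replacing the maximum over a finite `W` by a maximum over the partition cells hit by the perturbed image would be closer in spirit to a set oriented implementation, but that variant is not shown here.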
We can immediately adapt Theorem 3 from [2] in order to obtain a statement about the performance of this feedback. The following result in particular shows that the feedback is robust with respect to arbitrary perturbations of the system. Then there exists a function δ : R → R with lim α→0 δ(α) = 0 such that for all sufficiently small ε, all sufficiently large i, all η ∈ (0, 1), all x 0 ∈ D c (i) and all perturbation sequences (w k ) k ∈ W N , the trajectory generated by g(x j , u P i (x j )), δ(ε/η) + ε .
Proof. We only point out how to suitably modify the proof of Theorem 3 in [2]. First note that according to Theorem 1, V P i converges uniformly to V (f,g) on D. The second observation is that if we choose i 1 ∈ N, i 1 > i 0 such that V (f,g) − V P i (x) ≤ ε/2 for i ≥ i 1 and all x ∈ D c (i 1 ), then {g(x, u) + V P i (f (x, u, w))} = g(x, u P i (x)) + max w∈W V P i (f (x, u P i (x), w)), i.e.
for all x k+1 ∈ f (x k , u P i (x), W ). The rest of the proof of Theorem 3 in [2] remains the same.
Remark 5. A particular application of our result is to robustify the feedback construction from [2] with respect to small perturbations which may be due, e.g., to discretization errors resulting from the numerical computation of the discrete time system from an ordinary differential equation. For this purpose, a particularly convenient way is to consider an "$\varepsilon$-inflated" system related to the original unperturbed system. More precisely, given an unperturbed control system $f : X \times U \to X$, one considers the perturbed system
$$x_{k+1} = f(x_k, u_k) + \varepsilon w_k, \quad k = 0, 1, \ldots,$$
with $w_k \in [-1, 1]^d$ for some (small) $\varepsilon > 0$. In the numerical realization, the sets $F(x, u, W) = f(x, u) + \varepsilon [-1, 1]^d$ are easy to construct using ideas from rigorous discretization, see [13,14].
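As a rough illustration of the $\varepsilon$-inflated image, the following sketch forms the box $f(x, u) + \varepsilon [-1, 1]^d$ for a single point and lists the cells of a uniform grid that this box intersects. The two-dimensional map `f`, the grid width `h` and the helper names `inflated_box` and `cells_hit` are assumptions made for this example. The sketch treats one point only and does not attempt the rigorous covering of images of whole boxes discussed in [13,14].

```python
import itertools
import math

def inflated_box(y, eps):
    """Axis-aligned box y + eps * [-1, 1]^d, returned as (lower, upper) corner lists."""
    return [yi - eps for yi in y], [yi + eps for yi in y]

def cells_hit(lower, upper, h):
    """Index tuples k of grid cells [k*h, (k+1)*h) per axis that meet the closed box."""
    ranges = [
        range(math.floor(lo / h), math.floor(up / h) + 1)
        for lo, up in zip(lower, upper)
    ]
    return set(itertools.product(*ranges))

# Hypothetical unperturbed system in dimension d = 2.
def f(x, u):
    return (x[0] + 0.1 * x[1], x[1] + 0.1 * u)

y = f((1.0, 0.0), 1.0)                       # image point (1.0, 0.1)
lower, upper = inflated_box(y, eps=0.05)
cells = cells_hit(lower, upper, h=0.25)      # cells met by the inflated image
cells_exact = cells_hit(list(y), list(y), h=0.25)   # eps = 0 for comparison
```

In this example the inflation makes the image straddle a cell boundary in the first coordinate, so two cells are hit instead of one. This is the effect that turns single successor nodes into sets of successor nodes.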

A Dijkstra's Method
Let $(\mathcal{P}, E)$ be a finite directed graph with edge weights $g : E \to [0, \infty)$. Let $D \in \mathcal{P}$ be the destination node. The following algorithm [10] computes the length $V(P) \in [0, \infty)$ of the shortest path from $P$ to $D$ for all nodes $P \in \mathcal{P}$.
7    for each $Q \in \mathcal{P}$ with $(Q, P) \in E$
8        if $V(Q) > g(Q, P) + V(P)$ then
9            $V(Q) := g(Q, P) + V(P)$
An important feature of this algorithm is given by the following proposition, which follows immediately from the construction of the algorithm and the fact that the edge weights are nonnegative.
Proposition 5. During the while-loop in lines 4-9 of Algorithm 2 it holds that V (P ) ≥ V (P ) for all P ∈ P\Q.
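For readers who want to experiment, here is a minimal runnable sketch of a shortest-path computation towards a destination node, written in the spirit of the relaxation step in lines 7-9. It is not a transcription of Algorithm 2: the binary heap, the dictionary-based graph representation and the function name `dijkstra_to_destination` are choices made for this example. The returned processing order makes it possible to observe that nodes are finalized with nondecreasing values, which relies on the edge weights being nonnegative.

```python
import heapq
import math

def dijkstra_to_destination(nodes, edges, dest):
    """Shortest path lengths V(P) from every node P to `dest`.

    edges maps (Q, P) to the nonnegative weight g(Q, P) of the edge Q -> P.
    Returns (V, order), where `order` lists nodes in the order they are finalized.
    """
    preds = {P: [] for P in nodes}
    for (Q, P), weight in edges.items():
        preds[P].append((Q, weight))

    V = {P: math.inf for P in nodes}
    V[dest] = 0.0
    heap = [(0.0, dest)]
    done, order = set(), []
    while heap:
        _, P = heapq.heappop(heap)
        if P in done:
            continue                     # outdated heap entry
        done.add(P)
        order.append(P)
        for Q, weight in preds[P]:       # relaxation over incoming edges (Q, P)
            if V[Q] > weight + V[P]:
                V[Q] = weight + V[P]
                heapq.heappush(heap, (V[Q], Q))
    return V, order

# Small example graph with destination "D" (hypothetical data).
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 1.0, ("B", "D"): 2.0, ("A", "D"): 5.0, ("C", "A"): 1.0}
V, order = dijkstra_to_destination(nodes, edges, "D")
finalized_values = [V[P] for P in order]
```

Nodes from which the destination cannot be reached keep the value `math.inf` in this sketch and never appear in `order`.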