An algorithm for event-based optimal feedback control

We present an algorithm for an event based approach to the global optimal control of nonlinear systems with coarsely quantized state measurements. The quantized measurements induce regions of the state space and the events represent the change of the system's state from one quantization region to another. We investigate the theoretical properties of the approach and illustrate its performance by a numerical example.


I. INTRODUCTION
In this paper we consider the problem of optimally controlling a nonlinear control system to a desired target set by means of a state feedback law. We assume that for the evaluation of the feedback law only coarsely quantized measurements are available via suitable events. More precisely, we define certain thresholds for our system, like, for instance, fill levels (0%, 25%, 50%, . . .) for a tank in a multi-tank system. These thresholds induce a partition of the state space into different regions (in the tank example 0%-25%, 25%-50%, . . .) and we assume that only the region containing the initial state and the subsequent crossings of thresholds, the events, are known to the feedback controller.
For sampled data systems, it was observed in [3]-[5] that the set oriented approach to global optimal control problems for perturbed systems developed in [4] is suitable for solving this problem. In this approach, the uncertainties are modelled as perturbations [5] and the perturbed system is interpreted as a set-valued control system. In this paper, we extend the approach from [5] to an event based setting. For an analysis of the difference between sampled data and event based control we refer, e.g., to [1], [2], in which the performance of the two approaches is compared for a first order system. More information on event based control can be found, e.g., in [6], [7] and the references therein.
The basic event based algorithm, which is developed in Section II, already significantly improves upon the results of the sampled data approach. However, this basic algorithm only takes into account the region containing the current state, i.e., the last event. This is a quite conservative approach, because it only uses rather coarse information about the system's state when an event takes place, i.e., when a threshold is crossed. Motivated by conceptually similar methods in the discrete event system literature, see, e.g., [8] and the references therein, and by the promising numerical results for sampled data systems from [5], in Section III we extend the method by including information about past events: we determine the feedback value after the kth event not only depending on the current state region but also on the regions determined from the previous events k − m, . . ., k − 1, leading to a kind of dynamic feedback concept. By considering past events, the uncertainty about the place where the system crosses a threshold is narrowed down and therefore the conservatism is reduced.

This work was supported by the DFG priority program 1305. L. Grüne is with the Mathematical Institute, University of Bayreuth, 95440 Bayreuth, Germany, lars.gruene@uni-bayreuth.de. F. Müller is with the Mathematical Institute, University of Bayreuth, 95440 Bayreuth, Germany, florian.mueller@uni-bayreuth.de.
In Section IV we give a theorem about the relation between the optimization with and without considering past information and in Section V we illustrate the efficiency of our approach with a numerical example.

II. PROBLEM FORMULATION
We consider the discrete-time nonlinear control system

x(k + 1) = f(x(k), u(k)), k = 0, 1, 2, . . . (1)

where f : X × U → X is a continuous map with state space X and set of control values U. The set of all control sequences u = (u(k))_{k∈N} is denoted by U^N and for each initial value x_0 and control sequence u we denote the corresponding trajectory by x(k, x_0, u).
Throughout the paper we interpret (1) as a discrete time model for a continuous time sampled-data system.
The control problem we consider is as follows: given a target set X* ⊂ X, steer the system into X* while minimizing the functional

J(x_0, u) = Σ_{k=0}^{N(x_0,u)−1} c(x(k, x_0, u), u(k)) (2)

over u, where N(x_0, u) denotes the minimal k ≥ 0 such that x(k, x_0, u) ∈ X* holds. Here c : X × U → R is a continuous running cost satisfying min_{u∈U} c(x, u) > 0 for all x ∉ X*. Our goal now is to find a feedback law which approximately solves this problem, assuming, however, that the system's state is not exactly determinable. In order to formalize this uncertainty, we use a partition P of the state space X consisting of finitely many connected subsets P_i ⊂ X with the properties

∪_{P_i∈P} P_i = X and P_i ∩ P_j = ∅ for all P_i, P_j ∈ P with i ≠ j. (3)

In contrast to, e.g., [3], [4] we do not interpret the sets P ∈ P as a discretization which we are able to change according to our demands. Rather, the subsets P_i of this partition model the quantization regions of the state measurements. Here we assume the partition P as given and do not address the question of how to choose good partitions. We assume that our target set X* is a union of such regions, i.e., X* = ∪_{P∈P*} P for some set P* ⊂ P. For the purpose of feedback control, we assume that the region P_i containing the initial value x_0 is known and that each time the state crosses a region boundary an event e is triggered. By e_{i,j} we denote the event which corresponds to the state moving from P_i to P_j. Note that knowing the initial region and the subsequent events is equivalent to knowing the region containing the current state of the system. Hence, we formally define the feedback as a map µ : P → U and obtain the feedback value for a state x as µ(ρ(x)), using the function ρ : X → P defined by ρ(x) := P if x ∈ P.
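For concreteness, the quantization map ρ can be sketched as follows; the uniform grid over a square two-dimensional state space, the name `make_rho` and all numerical values are hypothetical illustrations, not the paper's setup.

```python
def make_rho(x_min, x_max, n):
    """Quantization map rho for a hypothetical uniform n-by-n grid
    partition of the square [x_min, x_max]^2: each state x = (x1, x2)
    is mapped to the index pair (i, j) of the region containing it."""
    h = (x_max - x_min) / n
    def rho(x):
        # clamp so that states on the upper boundary get the last region
        i = min(int((x[0] - x_min) / h), n - 1)
        j = min(int((x[1] - x_min) / h), n - 1)
        return (i, j)
    return rho

rho = make_rho(0.0, 1.0, 4)  # 4 x 4 regions, in the spirit of the grids of Section V
```

Any state now determines exactly one region index, which is all the controller is allowed to observe.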
In order to construct an event based system from (1), for r ∈ N_0 we use the iterates f^r(x, u) for x ∈ X, u ∈ U given by

f^0(x, u) := x, f^{r+1}(x, u) := f(f^r(x, u), u)

and define the following value.
Definition 2.1: For each x ∈ X with x ∈ P_i and each u ∈ U we define the value r(x, u) to be the smallest value r ∈ N satisfying

f^{r(x,u)}(x, u) ∉ P_i.

In other words, if f^{r(x,u)}(x, u) ∈ P_j, then r(x, u) is the time at which the event e_{i,j} is generated.
Formally, we could set r(x, u) = ∞ if f^r(x, u) ∈ P_i for all r ∈ N_0. For the practical implementation, we impose an upper bound R ∈ N_0 for r(x, u) and generate the event e_{i,i} if f^r(x, u) ∈ P_i for all r = 0, . . ., R.
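A minimal sketch of Definition 2.1 together with the practical bound R; the scalar dynamics `f` and the unit-interval quantization `rho` below are hypothetical examples.

```python
def event_time(f, rho, x, u, R=1000):
    """Smallest r >= 1 with f^r(x, u) outside the region of x
    (Definition 2.1).  If no region boundary is crossed within the
    bound R, the event e_{i,i} is generated and R is returned."""
    region = rho(x)
    y = x
    for r in range(1, R + 1):
        y = f(y, u)
        if rho(y) != region:
            return r
    return R

# Hypothetical scalar dynamics and unit-interval quantization:
f = lambda x, u: x + u
rho = lambda x: int(x)
r = event_time(f, rho, 0.0, 0.3)
```

With these choices the state 0.0 under the constant control 0.3 leaves the region [0, 1) after four steps, so r = 4.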
In order to specify the set valued system for our algorithm, we define the set 2^X of all subsets of X and the set of sequences

(2^X)^N := {X = (X(0), X(1), . . .) | X(i) ⊂ X for all i ∈ N}

and use the following concept of choice functions.
Definition 2.2: A component function is a map γ : 2^X × U → X with γ(X, u) ∈ X for all nonempty X ∈ 2^X and all u ∈ U; a choice function is a sequence γ = (γ_0, γ_1, . . .) of component functions. The set of all choice functions γ is denoted by C and the set of all component functions γ by C.

With the components γ of a choice function γ we model the uncertainty of the state x by choosing a perturbed state γ(X, u) ∈ X, depending on the control u, in the region X containing x. The choice function γ then extends this concept to a sequence of regions and controls.
Using the concept of partitions and choice functions we can define an event-based set valued control system by

X(k + 1) = F(X(k), u(k), γ_k) := ρ(f^{r(γ_k, u(k))}(γ_k, u(k))), γ_k = γ_k(X(k), u(k)). (4)

In what follows we will omit the arguments of γ_k in order to simplify the notation. The map F describes all possible transitions of a subset X_i ⊂ X of the state space to regions P ∈ P, parametrized by γ_k. In other words, for each u ∈ U we have the identity

{F(X_i, u, γ(X_i, u)) | γ ∈ C} = {P ∈ P | f^{r(x,u)}(x, u) ∈ P for some x ∈ X_i}.

A trajectory X(k, P_0, u, γ), k ∈ N, of (4) is now a sequence of regions defined by (4) with X(0) = P_0 and depends on the initial set P_0 ∈ P, the control sequence u ∈ U^N and the choice function γ ∈ C. Note that we can express each set valued trajectory X(0) = P_{i_0}, . . ., X(k) = P_{i_k} as a sequence of events

e(1) = e_{i_0,i_1}, . . ., e(k) = e_{i_{k−1},i_k}. (5)

The next object defines the set of regions from which the system (4) can be steered to the target set X* regardless of the choice of γ.
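One simple way to approximate the set valued map F is to evaluate finitely many sample states of a region, each playing the role of one value of the choice function γ; this sampling construction is only an illustration, not the paper's rigorous set oriented computation.

```python
def F_approx(samples, rho, f, u, R=1000):
    """Sampling-based approximation of the set valued map F in (4):
    for each sample state of a region, iterate f under the constant
    control u until the event time (bounded by R) and collect the
    region hit there.  The union over samples approximates the set
    {F(X, u, gamma) | gamma} of possible successor regions."""
    successors = set()
    for x in samples:
        region, y = rho(x), x
        for _ in range(R):
            y = f(y, u)
            if rho(y) != region:
                break
        successors.add(rho(y))
    return successors

# Hypothetical scalar example: all of region [0, 1) moves on to region [1, 2).
succ = F_approx([0.1, 0.5, 0.9], lambda x: int(x), lambda x, u: x + u, 0.3)
```

Note that a finite sample can miss successor regions; a guaranteed enclosure requires the set oriented techniques of [4].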
Definition 2.3: The domain of controllability of X* is defined as

S := {P ∈ P | there exists u ∈ U^N with N(P, u, γ) < ∞ for all γ ∈ C}

and the first hitting time is defined as N(P, u, γ) := inf{k ∈ N | X(k, P, u, γ) ⊆ X*}. Note that for fixed P we can interpret γ as a map from U^N to X^N. In the language of dynamic game theory this map defines a nonanticipating strategy, cf. [3], [4].
Using the running cost c and r(x, u) from Definition 2.1 we now define a cost function for the event based set valued control system (4) by

c_1(P, u) := sup_{x∈P} Σ_{r=0}^{r(x,u)−1} c(f^r(x, u), u).

By this definition we assume the worst case, i.e., the highest cost, over all the uncertain states x ∈ P. Using c_1 we now define the functional

J_1(P, u, γ) := Σ_{k=0}^{N(P,u,γ)−1} c_1(X(k, P, u, γ), u(k))

with values in R_{+,0} ∪ {+∞} and the optimal value function

V_1(P) := inf_{u∈U^N} sup_{γ∈C} J_1(P, u, γ).

By standard arguments one sees that V_1 fulfills the optimality principle

V_1(P) = inf_{u∈U} sup_{γ∈C} { c_1(P, u) + V_1(F(P, u, γ)) } (6)

for all P ⊄ X* and V_1(P) = 0 for all P ⊆ X*. Since P consists of finitely many sets, from this it is easy to see by induction that

S = {P ∈ P | V_1(P) < ∞}. (7)

In particular, the domain of controllability is easily obtained once V_1 is computed.
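On a finite abstraction, the optimality principle (6) can be solved by a min-max value iteration; the dictionary encoding of transitions and costs below is a hypothetical toy instance, not the paper's implementation (which uses the graph theoretic algorithm of Section V).

```python
import math

def value_iteration(regions, targets, transitions, cost, n_iter=100):
    """Min-max value iteration for the optimality principle (6):
    V(P) = min over u of [ c_1(P, u) + max over successor regions of V ],
    with V(P) = 0 on the target.  `transitions[(P, u)]` is the finite set
    {F(P, u, gamma) | gamma}; `cost[(P, u)]` encodes c_1(P, u)."""
    V = {P: (0.0 if P in targets else math.inf) for P in regions}
    controls = {}
    for (P, u) in transitions:
        controls.setdefault(P, set()).add(u)
    for _ in range(n_iter):
        for P in regions:
            if P in targets or P not in controls:
                continue
            V[P] = min(cost[(P, u)] + max(V[Q] for Q in transitions[(P, u)])
                       for u in controls[P])
    return V

# Invented three-region example with unit costs:
transitions = {('A', 0): {'B'}, ('A', 1): {'B', 'T'}, ('B', 0): {'T'}}
cost = {key: 1.0 for key in transitions}
V = value_iteration(['A', 'B', 'T'], {'T'}, transitions, cost)
```

Regions with V(P) = ∞ after convergence lie outside the domain of controllability, in line with (7).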
We will now investigate the behavior of V_1 along an optimal trajectory of the original system (1). To this end, observe that the optimal feedback law µ : P → U is the control value realizing the minimum in (6), i.e.,

µ(P) := argmin_{u∈U} sup_{γ∈C} { c_1(P, u) + V_1(F(P, u, γ)) }.

Using this µ we get the following theorem.

Theorem 2.4: For all x ∈ X with ρ(x) ∈ S the inequality

V_1(ρ(f^r(x, µ(ρ(x))))) ≤ V_1(ρ(x)) − c_1(ρ(x), µ(ρ(x)))

holds for r = r(x, µ(ρ(x))).
Proof: Using the optimality principle (6) and the definition of µ, γ and c_1 we get

V_1(ρ(x)) = sup_{γ∈C} { c_1(ρ(x), µ(ρ(x))) + V_1(F(ρ(x), µ(ρ(x)), γ)) } ≥ c_1(ρ(x), µ(ρ(x))) + V_1(ρ(f^r(x, µ(ρ(x)))))

which shows the assertion.

The result has an immediate consequence for the trajectory x(k, x_0, µ) of (1) with feedback control µ: the map k ↦ V_1(ρ(x(k, x_0, µ))) is non-increasing until the target set is reached and strictly decreasing for each k in which an event is triggered. This implies that x(k, x_0, µ) eventually reaches X*.

Remark 2.5: The advantage of the event based approach compared to the sampled data approach in [3]-[5] can be explained as follows: in these references the set valued map F is constructed directly from (1). Hence, if there exist P_i ∈ P and x ∈ P_i with f(x, u) ∈ P_i for all u ∈ U, then F(P_i, u, γ(P_i, u)) = P_i holds for γ(P_i, u) = x. Hence, the optimality principle (6) immediately implies V_1(P_i) = ∞. Using f^{r(x,u)}(x, u) instead of f(x, u) for constructing F resolves this problem, because, unless f^r(x, u) ∈ P_i for all r ≥ 0, the set valued map F will always satisfy F(P_i, u, γ(P_i, u)) ≠ P_i.

III. INCLUDING PAST INFORMATION
The approach described in the previous section is conservative because by maximizing over γ we implicitly assume the worst case in each step along the trajectory, i.e., that for each k, among all the possible states in X(k), the actual state x(k) is the one which produces the largest cost. Of course, this is not necessarily the case. The approach we propose in order to reduce the conservatism relies on the idea that at time k we consider the last m measurements in order to compute the feedback µ. This way we can collect more information, thus reduce the uncertainty of the system and consequently obtain a less conservative result. In other words, we are now looking at an approximately optimal feedback map of the form µ_{m+1}(X(k − m), . . ., X(k)), where again the regions X(k) can be reconstructed from the knowledge of the initial region containing x_0 and the subsequent events. Note that this construction resembles the dynamic feedback concept well known in observer design.
In order to keep the exposition simple, we restrict ourselves to m = 1. All arguments can, however, be extended to the more general setting m ≥ 1. Our goal in this case is to find a feedback law µ_2(X(k − 1), X(k)) or, using the equivalent event characterization (5), µ_2(e(k)).
For the augmented state Z(k) := (X(k − 1), X(k))^T we define the event-based set valued control system as

Z(k + 1) = F_2(Z(k), u(k), γ_k) := (Z_2(k), F(X(Z(k)), u(k), γ_k))^T (8)

with F from (4) and with r(x, u) from Definition 2.1. Here the symbol δ represents the "undefined" region, which appears when the system is started at time k = 0 with initial region P_0 ∈ P but undefined previous region P_{−1}. Therefore, at time k = 0 a trajectory starts with the vector Z(0) = (δ, P_0)^T.
By including the extra information in the definition of F_2 the uncertainty of the system is reduced. Instead of using F(X(k), u(k), γ_k) as in the previous section, we now use F(X(Z(k)), u(k), γ_k), where X(Z(k)) is a subset of the current region X(k). The set X(Z(k)) contains only those states which can be reached from the past region Z_1(k) = X(k − 1), i.e., we exclude those states from Z_2(k) which the system cannot attain.
Clearly, not all the pairs Z = (P_i, P_j)^T ∈ P^2 are actually attained by the system's dynamics. In fact, only those pairs with X(Z) ≠ ∅, where

X(Z) := {x ∈ Z_2 | x = f^{r(y,u)}(y, u) for some y ∈ Z_1, u ∈ U}, (9)

can appear on the left hand side of (8), which is why we define the active state regions

P_a^2 := {Z ∈ P^2 | X(Z) ≠ ∅} ∪ {(δ, P)^T | P ∈ P}, with X((δ, P)^T) := P.

We denote the trajectories of (8) by Z(k, Z_0, u, γ) and adapt the definitions from the previous section to our new setting. The target set now becomes Z* = {Z ∈ P^2 | Z_2 ⊆ X*} and the definitions of the domain of controllability S and the first hitting time N change accordingly. For the cost function we use c_1 from the previous section, now evaluated on X(Z), define the functional

J_2(Z, u, γ) := Σ_{k=0}^{N(Z,u,γ)−1} c_1(X(Z(k, Z_0, u, γ)), u(k))

and the optimal value function V_2(Z) := inf_u sup_γ J_2(Z, u, γ), which again fulfills the optimality principle

V_2(Z) = inf_{u∈U} sup_{γ∈C} { c_1(X(Z), u) + V_2(F_2(Z, u, γ)) }. (10)

The optimal feedback µ_2(Z) is given by the argmin of this expression. The following theorem is the counterpart of Theorem 2.4.
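As a sketch, the lift from single regions to pair states Z = (P_prev, P_cur) can be written as follows; the dictionary encoding, the name `lift_to_pairs` and the reachability restriction passed in as data are hypothetical simplifications of (8) and (9).

```python
def lift_to_pairs(regions, transitions, reachable):
    """Lift a single-region successor map (encoding F from (4)) to pair
    states Z = (P_prev, P_cur) as in (8).  `reachable[(P_prev, P_cur, u)]`
    restricts the successor set using past information, mimicking X(Z)
    from (9); without such information the single-region successors are
    kept.  The undefined past region at k = 0 is encoded as 'delta'."""
    pair_trans = {}
    for (P, u), succs in transitions.items():
        for Z1 in list(regions) + ['delta']:
            allowed = reachable.get((Z1, P, u), succs)
            pair_trans[((Z1, P), u)] = {(P, Q) for Q in allowed & succs}
    return pair_trans

# Invented example: from region 'B' both 'C' and 'D' are possible, but
# the part of 'B' reachable from 'A' can only move on to 'C'.
pair_trans = lift_to_pairs(
    ['A', 'B', 'C', 'D'],
    {('B', 0): {'C', 'D'}},
    {('A', 'B', 0): {'C'}},
)
```

The pair state ('A', 'B') thus has strictly fewer successors than the uninformed state ('delta', 'B'), which is exactly the source of the reduced conservatism.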
Theorem 3.1: For all x ∈ X and all Z ∈ S with x ∈ X(Z) the inequality

V_2((Z_2, ρ(f^r(x, µ_2(Z))))^T) ≤ V_2(Z) − c_1(X(Z), µ_2(Z))

holds for r = r(x, µ_2(Z)). In particular, the inequality holds for Z = (δ, ρ(x))^T.
Proof: Completely analogous to Theorem 2.4.

IV. COMPARISON OF THE TWO APPROACHES
In the preceding sections we have introduced the optimal value functions V_1 and V_2 and the corresponding feedback laws µ and µ_2. In this section we show that V_1 is an upper bound for V_2. In [5] a similar theorem is proven for the sampled data approach.
Theorem 4.1: The optimal value functions V_1 and V_2 satisfy V_2(Z) ≤ V_1(P) for all Z ∈ P_a^2 and P ∈ P with Z_2 = P.
Proof: We prove the theorem by induction over the elements P_1, P_2, . . ., P_l ∈ P, which we number according to their values in the optimal value function V_1, i.e., V_1(P_i) ≤ V_1(P_j) for all 1 ≤ i < j ≤ l. We will frequently use the obvious inclusion X(Z) ⊆ Z_2 for X(Z) from (9) and all Z = (Z_1, Z_2)^T ∈ P^2.

Induction start n = 1:
Since V_1(P) = 0 holds if and only if P ⊆ X*, we obtain P_1 ⊆ X*. Since Z ∈ Z* for all Z ∈ P^2 with Z_2 = P_1 ⊆ X*, we obtain V_2(Z) = 0 = V_1(P_1) and thus the assertion for P_1.

Induction step n → n + 1:
We use the induction hypothesis V_2(Z) ≤ V_1(P_j) for all j = 1, . . ., n and all Z ∈ P_a^2 with Z_2 = P_j in order to show

V_2(Z) ≤ V_1(P_{n+1}) for all Z ∈ P_a^2 with Z_2 = P_{n+1}.

The optimality principle (6) for V_1 yields

V_1(P_{n+1}) = c_1(P_{n+1}, µ(P_{n+1})) + sup_{γ∈C} V_1(F(P_{n+1}, µ(P_{n+1}), γ)).

By positivity of c_1 this implies V_1(F(P_{n+1}, µ(P_{n+1}), γ)) < V_1(P_{n+1}) for all γ and thus the numbering of the P_j yields

F(P_{n+1}, µ(P_{n+1}), γ) ∈ {P_1, . . ., P_n} for all γ ∈ C. (11)

Now the optimality principle (10) for V_2 yields

V_2(Z) ≤ c_1(X(Z), µ(P_{n+1})) + sup_{γ∈C} V_2(F_2(Z, µ(P_{n+1}), γ)) = c_1(X(Z), µ(P_{n+1})) + V_2(Z_max), (12)

where Z_max = (P_{n+1}, P_i)^T denotes the element from {F_2(Z, µ(P_{n+1}), γ) | γ ∈ C} realizing the supremum, which exists because F_2 can only assume finitely many values. Now X(Z) ⊆ P_{n+1} implies P_i = F(P_{n+1}, µ(P_{n+1}), γ) for some suitable γ and thus from (11) we can conclude i ≤ n. Furthermore, from the optimality principle for V_1 we obtain

V_1(P_{n+1}) ≥ c_1(P_{n+1}, µ(P_{n+1})) + V_1(P_i) ≥ c_1(X(Z), µ(P_{n+1})) + V_1(P_i),

where the second inequality follows from X(Z) ⊆ P_{n+1}. Using the induction hypothesis we can continue to estimate

V_1(P_{n+1}) ≥ c_1(X(Z), µ(P_{n+1})) + V_2(Z_max),

which together with (12) yields the assertion.
In practice, we expect V_2 to be considerably smaller than V_1, as the numerical example in the following section confirms. Theorem 4.1, however, only yields V_2 ≤ V_1, because system (8) may not contain any useful additional information compared to (4); this is theoretically possible but appears to be an exceptional case.
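The mechanism behind this expectation can be seen in a toy computation with invented regions, successor sets and values:

```python
# Invented data: from region 'B' the system can move to 'C' or 'D' in
# general, but every state of 'B' that is reachable from 'A' moves to 'C'.
succ_single = {'B': {'C', 'D'}}            # successors under F, over all gamma
succ_pair = {('A', 'B'): {'C'},            # successors under F_2, past region 'A'
             ('delta', 'B'): {'C', 'D'}}   # no past information at k = 0

V1 = {'C': 1.0, 'D': 5.0}                  # invented value function data

# The worst case over the smaller successor set can only be smaller:
worst_single = max(V1[Q] for Q in succ_single['B'])
worst_pair = max(V1[Q] for Q in succ_pair[('A', 'B')])
assert worst_pair <= worst_single
```

When the past region rules out the expensive successor 'D', the worst case drops from 5.0 to 1.0; when it rules out nothing (the ('delta', 'B') row), both worst cases coincide, which is the V_2 = V_1 case permitted by Theorem 4.1.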

V. EXAMPLE
We illustrate our approach with the example of a temperature and fill level control of a tank model.This model is part of the experimental plant "VERA" at the Ruhr-Universität Bochum.Figure 1 shows a schematic image of the model.
The model describes the fill level x_1 and the temperature x_2 of the water in the tank; its continuous time dynamics depend on the inflow q(u_1), the gravitational constant g and the temperature ϑ_ext = 293.15 K of the inflowing water. The discrete time system (1) is obtained by sampling the continuous time system with sampling period T = 1.0. The goal of the optimization is to reach the target set as fast as possible, which corresponds to the running cost c ≡ 1.
The optimal value functions are computed with a graph theoretic algorithm. To this end, we numerically construct a weighted directed hypergraph in which, for the computation of V_1, each state region P_i and, for the computation of V_2, each event e_{i,j} is represented as a vertex. The transitions of the set valued system generate the hyperedges of the hypergraph. Once the graph is constructed, we can compute the optimal value functions with a min-max version of Dijkstra's shortest path algorithm, see [4], [9] for details. Since this algorithm relies on the optimality principles (6) and (10), for the implementation we do not need an explicit representation of the choice function γ: by definition of F, the supremum over γ in (6) reduces to a maximum over the finitely many successor regions attainable via F, and an analogous expression holds in (10).
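The min-max Dijkstra variant can be sketched as follows; the dictionary inputs and the three-region example are hypothetical, and the actual implementation in [4], [9] differs in its details.

```python
import heapq

def minmax_dijkstra(targets, transitions, cost):
    """Min-max Dijkstra on a finite hypergraph abstraction: regions are
    finalized in order of increasing value; a hyperedge (P, u) becomes
    usable once all its successor regions are finalized and then
    proposes the value cost(P, u) + max over its successors."""
    incoming = {}   # incoming[Q] = hyperedges (P, u) having Q as a successor
    remaining = {}  # number of not yet finalized successors per hyperedge
    for (P, u), succs in transitions.items():
        remaining[(P, u)] = len(succs)
        for Q in succs:
            incoming.setdefault(Q, []).append((P, u))
    V = {}
    heap = [(0.0, T) for T in targets]
    heapq.heapify(heap)
    while heap:
        v, Q = heapq.heappop(heap)
        if Q in V:
            continue  # already finalized with a smaller value
        V[Q] = v
        for (P, u) in incoming.get(Q, []):
            remaining[(P, u)] -= 1
            if remaining[(P, u)] == 0 and P not in V:
                succs = transitions[(P, u)]
                cand = cost[(P, u)] + max(V[S] for S in succs)
                heapq.heappush(heap, (cand, P))
    return V

# Invented three-region example with unit costs:
transitions = {('A', 0): {'B'}, ('A', 1): {'B', 'T'}, ('B', 0): {'T'}}
V = minmax_dijkstra({'T'}, transitions, {key: 1.0 for key in transitions})
```

Regions never popped from the heap keep no entry in V, i.e., they have infinite value and lie outside the domain of controllability, consistent with (7).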
In our first computation we use 16 × 16 quantization regions in order to cover the state space and a target set around the operating point (x_1 = 0.349 m, x_2 = 310.56 K) consisting of the 4 regions indicated in black in Figure 2, which shows the optimal value function V_1. The clearly visible jump in the values at x_1 = 0.36 m can be explained with the help of the worst case trajectories used in the optimization over the set valued dynamics. A worst case trajectory starting in P is generated by applying, in each step of (4), the component function γ_k realizing the supremum on the right hand side of the optimality principle (6).
Figure 3 shows a typical worst case trajectory with starting region to the right of x_1 = 0.36 m. Due to the uncertainties included in the model, instead of approaching the target directly, each optimal worst case trajectory starting right of x_1 = 0.36 m first moves to the top of the state space, then turns left and eventually moves down to the target, which explains the jump in the optimal value function at x_1 = 0.36 m.
In Figure 4 we see the optimal value function V_2 for the same partition. The values are smaller and the jump in the values of V_2 has vanished. On a partition with 8 × 8 regions, the approach via V_1 is no longer feasible because a large part of the state space does not belong to the domain of controllability S. In contrast to this, for V_2 we still get a useful solution, as shown in Figure 5. However, this clearly visible advantage comes at the expense of larger (offline) computational effort: on a PC with an Intel Core2 Duo E6850 CPU running at 3.00 GHz the computation of Fig. 2 needed 2.42 s, while Figures 4 and 5 took 201.51 s and 75.08 s, respectively; in all cases, the construction of the graph is by far the most expensive part of the algorithm. Finally, in order to compare our event based approach to the sampled data approach, Figure 6 shows the optimal value function on a partition with 128 × 128 regions, where F was constructed directly from a sampled data model with sampling period T = 6.0. For this approach, such a fine partition is necessary, because on a coarser partition a large part of the state space is no longer controllable to the target. This illustrates the advantages of our proposed event based approach due to the effect described in Remark 2.5.

VI. CONCLUSION
In this paper we have introduced an event based algorithm for the optimal feedback control of nonlinear systems with coarse quantization.Compared to similar approaches for sampled data systems, the algorithm is able to obtain stabilizing feedback laws on much coarser quantizations.

Fig. 4. Optimal value function V_2.
Fig. 3. Worst case trajectory.