Practical NMPC suboptimality estimates along trajectories

In this paper we develop and illustrate methods for estimating the degree of suboptimality of receding horizon schemes with respect to infinite horizon optimal control. The proposed a posteriori and a priori methods yield estimates which are evaluated online along the computed closed-loop trajectories and only use numerical information which is readily available in the scheme.


Introduction
Receding horizon control (RHC), often also termed model predictive control (MPC), is by now a well established method for the optimal control of linear and nonlinear systems [1,13]. The method approximates the solution of an infinite horizon optimal control problem, which is computationally intractable in general, by a sequence of finite horizon optimal control problems. In each time step only the first element of the resulting control sequence is implemented, which generates a closed-loop static state feedback. The approximation of the infinite horizon problem naturally leads to the question of the suboptimality of the resulting MPC feedback. Hence our main task is to give estimates of the degree of suboptimality - and implicitly for stability - of the MPC feedback with respect to the original infinite horizon cost functional. This matter was treated in a number of papers, see e.g. [3-6,10,18]. Here we deal with discrete-time nonlinear systems on arbitrary metric spaces and use finite horizon optimal control problems without terminal costs or terminal constraints. For these schemes, we present techniques for estimating the degree of suboptimality online along the closed-loop trajectory. The techniques rely on the computation of a "characteristic value" α at each time instant n along the closed-loop trajectory x(n); the actual estimate can then be computed from the collection of all these α-values. As in [4] or [6], our approach is based on relaxed dynamic programming. The motivation for this work is twofold: on the one hand, we expect trajectory based estimates to be less conservative than the global estimates derived, e.g., in [4], [6] or [18], because in these references the worst case over the whole state space is estimated, while here we only use those points of the state space which are actually visited by the closed-loop trajectory.
On the other hand, we expect that our trajectory based estimates can be used as a building block for MPC schemes in which the optimization horizon is tuned adaptively, similar to adaptive step size control in numerical schemes for differential equations. In this context, the computational cost of evaluating our estimates is crucial, and this is where the two techniques we present differ. While the first estimation technique yields a sharper estimate, it can only be evaluated a posteriori, i.e., the value α for time n can only be computed at time n + 1. The second technique leads to a more conservative estimate of α but is computable with smaller effort from values which are known at time n. Moreover, we present results for the case of practical stability and hence extend results from [4], which treat this problem globally, by combining them with results from [7] for the non-practical case in order to relax the necessary conditions. The paper is organized as follows: In Section 2 we describe the problem setup and give the basic relaxed dynamic programming inequality which leads to our first estimation method. In Section 3 we state our main theorem which leads to an alternative estimation method. In Section 4 we extend our main theorem to the case of practical suboptimality, and in Section 5 we illustrate both methods for these two cases by means of a numerical simulation. The final Section 6 concludes the paper.

Problem formulation
Throughout this paper the nonlinear discrete-time system

x(n + 1) = f(x(n), u(n)), x(0) = x_0, (1)

with x(n) ∈ X and u(n) ∈ U for n ∈ N_0 will be the basis of our analysis. Here the state space X is an arbitrary metric space and we denote the space of control sequences u : N_0 → U by U.
Remark 1 Note that this in particular means that the presented results also apply to the discrete-time dynamics induced by a sampled infinite dimensional system, cf. [6] for a numerical example and [9] for a continuous-time analysis of this setting.
For this control system we want to find a static state feedback u = µ(x) ∈ U which minimizes the infinite horizon cost functional

J_∞(x_0, u) = Σ_{n=0}^{∞} l(x_u(n), u(n)) (2)

with stage cost l : X × U → R_0^+. The optimal value function for this problem will be denoted by V_∞(x_0) = inf_{u∈U} J_∞(x_0, u). Moreover, given the optimal value function, one can prove optimality of the feedback law

µ(x) := argmin_{u∈U} { l(x, u) + V_∞(f(x, u)) } (3)

by using Bellman's optimality principle.
Remark 2 In order to simplify the presentation here and in the following we will assume that the minimum with respect to u ∈ U is attained.
Here we will use a receding horizon approach in order to avoid solving the infinite horizon optimal control problem, which necessarily involves the solution of a Hamilton-Jacobi-Bellman equation. To this end we replace the previously stated problem by a sequence of finite horizon optimal control problems, i.e. we minimize the truncated cost functional J_N(x_0, u) = Σ_{n=0}^{N−1} l(x_u(n), u(n)) and denote the associated optimal value function by

V_N(x_0) = inf_{u∈U} J_N(x_0, u). (4)

Moreover we will use the abbreviation u_N(x_0, ·) for the minimizing open-loop control sequence of the truncated cost functional. This control gives us the open-loop solution

x_{u_N}(n + 1, x_0) = f(x_{u_N}(n, x_0), u_N(x_0, n)) (5)

to the initial value x_{u_N}(0, x_0) = x_0, where u_N(x_0, n) represents the n-th control value within the open-loop control sequence. In order to obtain an infinite control sequence from this setting we define a feedback law µ_N by implementing only the first element of the optimal control sequence u_N. By Bellman's principle of optimality this feedback law µ_N is given by

µ_N(x) := u_N(x, 0) (6)

and we denote the corresponding closed-loop system by

x(n + 1) = f(x(n), µ_N(x(n))), x(0) = x_0. (7)

In the literature this setup is usually called nonlinear model predictive control (NMPC) or receding horizon control (RHC). Note that the optimal value function on a finite horizon can also be expressed via these feedback laws via V_N(x) = l(x, µ_N(x)) + V_{N−1}(f(x, µ_N(x))). Our main aim is to make this strategy real-time applicable: we wish to reduce the computational time necessary for solving (4) in every step of this NMPC setup and at the same time to be able to guarantee a certain degree of suboptimality of the closed-loop solution (6), (7) compared to the infinite horizon solution (1), (3) with u(n) = µ(x(n)).
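To make the loop concrete, the following minimal sketch implements the receding horizon scheme for a hypothetical scalar system x⁺ = x + u with stage cost l(x, u) = x² + u²; the finite horizon problem (4) is solved by exhaustive search over a coarse control grid, which merely stands in for a real optimizer such as the SQP solver used in Section 5:

```python
import itertools

def f(x, u):        # hypothetical system dynamics (1), chosen for illustration
    return x + u

def l(x, u):        # stage cost
    return x * x + u * u

U_GRID = [i / 10.0 for i in range(-20, 21)]   # coarse discretization of U

def solve_finite_horizon(x0, N):
    """Minimize the truncated cost J_N(x0, u) over all grid control sequences."""
    best_cost, best_seq = float("inf"), None
    for seq in itertools.product(U_GRID, repeat=N):
        x, cost = x0, 0.0
        for u in seq:
            cost += l(x, u)
            x = f(x, u)
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_cost, best_seq

def mpc_closed_loop(x0, N, steps):
    """Receding horizon: apply only the first element u_N(x, 0) in each step."""
    traj, x = [x0], x0
    for _ in range(steps):
        _, u_N = solve_finite_horizon(x, N)
        x = f(x, u_N[0])          # feedback mu_N(x) = u_N(x, 0), cf. (6), (7)
        traj.append(x)
    return traj

traj = mpc_closed_loop(1.0, N=3, steps=5)   # state is driven toward the origin
```

With horizon N = 3 the grid search evaluates 41³ control sequences per step; a real implementation would of course replace this by a gradient-based solver.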
To this end we define the infinite horizon cost corresponding to µ_N by

V_∞^{µ_N}(x_0) := Σ_{n=0}^{∞} l(x(n), µ_N(x(n))), x(0) = x_0,

and develop upper bounds for this infinite horizon value, either in terms of the finite horizon optimal value function V_N or in terms of the infinite horizon optimal value function V_∞. In particular, the latter will give us estimates of the degree of suboptimality of the controller µ_N in the actual step of the NMPC process.
The main tool we are going to use for this purpose is a rather straightforward and easily proved "relaxed" version of the dynamic programming principle.
Recently, this principle has been studied by Lincoln and Rantzer in [11,17]. Here we want to use the estimate given in [4, Proposition 2.2] but adapt it to hold only along the trajectories (6), (7).

Proposition 3 Consider a feedback law µ_N : X → U and its associated closed-loop trajectory x(·) according to (7). If the inequality

V_N(x(n)) ≥ V_N(x(n + 1)) + α l(x(n), µ_N(x(n))) (9)

holds for some α ∈ [0, 1] and all n ∈ N_0, then

α V_∞(x(n)) ≤ α V_∞^{µ_N}(x(n)) ≤ V_N(x(n)) ≤ V_∞(x(n))

holds for all n ∈ N_0.
Proof. The proof is similar to those of [17, Proposition 3] and [4, Proposition 2.2]. Rearranging (9) and summing over n = 0, . . . , K we obtain the upper bound

α Σ_{n=0}^{K} l(x(n), µ_N(x(n))) ≤ V_N(x(0)) − V_N(x(K + 1)) ≤ V_N(x(0)).

Hence, taking K → ∞ gives α V_∞^{µ_N}(x(0)) ≤ V_N(x(0)); since the same argument applies with any x(n) along the trajectory as initial value, our assertion follows because the final inequality V_N ≤ V_∞ is obvious.
Since all values in (9) are available at runtime, α can easily be computed online along the closed-loop trajectory, and thus (9) yields a computationally feasible and numerically cheap way to estimate the degree of suboptimality of the trajectory. However, using (9), the computation of α for the state x(n) requires knowledge of V_N(x(n + 1)). In other words, (9) yields an a posteriori estimate, which is an obvious disadvantage if α is to be used to determine a suitable horizon length N at time n. Hence, in the next section we present an alternative way to estimate α.
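Evaluating the a posteriori estimate requires only a few arithmetic operations per step. The following sketch computes, for hypothetical recorded values of V_N and l along a closed-loop trajectory, the largest α for which (9) holds at every step:

```python
def a_posteriori_alpha(VN, stage_costs):
    """Largest alpha in [0, 1] such that (9) holds along the whole trajectory.

    VN          : values V_N(x(0)), ..., V_N(x(K)) returned by the optimizer
    stage_costs : values l(x(n), mu_N(x(n))) for n = 0, ..., K-1
    """
    alphas = [(VN[n] - VN[n + 1]) / cost
              for n, cost in enumerate(stage_costs) if cost > 0.0]
    return min(alphas + [1.0])   # cap at 1; empty list gives the trivial bound

# Hypothetical data; alpha close to one indicates near-optimal behavior.
VN = [1.60, 0.62, 0.25, 0.10]
stage = [1.00, 0.40, 0.16]
alpha = a_posteriori_alpha(VN, stage)
```

Note that the estimate at time n uses V_N(x(n + 1)), so the routine can only run one step behind the controller.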

Suboptimality estimation
This section aims at reducing the amount of information necessary to give an estimate of the degree of suboptimality of the trajectory (6), (7) under consideration. Here we are interested in ignoring all future information which is to say we try to avoid the use of V N (x(n + 1)) in our calculations. Of course, this will in general yield a more conservative estimate.
Since we deal with NMPC schemes without terminal costs, the stage cost l is nonnegative and hence we can always use the monotonicity

V_{N−1}(x) ≤ V_N(x) for all x ∈ X. (10)

The following estimates are similar to certain results in [4], where, however, they were defined and used globally for all x ∈ X. In order to make those results computable without using a discretization of the state space X or analytical a priori information, here we formulate and prove alternative versions of these results which can be used along trajectories.
Lemma 4 Consider N ∈ N, a receding horizon feedback law µ_N and its associated closed-loop solution x(·) according to (7) with initial value x(0) = x_0. If

V_N(x(n + 1)) ≤ V_{N−1}(x(n + 1)) + (1 − α) l(x(n), µ_N(x(n)))

holds for some α ∈ [0, 1] and all n ∈ N_0, then V_N satisfies (9) and

α V_∞(x(n)) ≤ α V_∞^{µ_N}(x(n)) ≤ V_N(x(n)) ≤ V_∞(x(n))

holds for all n ∈ N_0.
Proof. Using the principle of optimality and (10) one can easily see that (9) holds, and hence Proposition 3 guarantees the assertion.

The following assumption collects the main ingredients for our subsequent results.
Assumption 5 For a given N ∈ N, N ≥ 2, there exists γ > 0 such that the inequalities (11) and (12) hold for all k ∈ {3, . . . , N} and all n ∈ N_0, where the open-loop solution x_{u_N}(j, x(n)) is given by (5).
We would like to point out that x(·) in Assumption 5 is the closed-loop trajectory which is the outcome of the NMPC algorithm, i.e. of equations (6) and (7). In contrast, x_{u_N}(·, x(·)) represents the open-loop solutions coming from (4) and (5), which are also known in every step of this algorithm. Note that these two trajectories do not coincide in general.
Proposition 6 Consider N ≥ 2 and assume that Assumption 5 holds for this N. Then

V_N(x_{u_N}(1, x(n))) ≤ V_{N−1}(x_{u_N}(1, x(n))) + (γ^N / (γ + 1)^(N−2)) l(x(n), µ_N(x(n)))

holds for all n ∈ N_0.
Proof. In the following we use the abbreviation x_{u_N}(j) := x_{u_N}(j, x(n)), j = 0, . . . , N, since all our calculations using the open-loop trajectory defined by (4), (5) refer to the fixed initial value x(n). Set ñ := N − k. Using the principle of optimality and Assumption 5 we obtain an estimate for V_k(x_{u_N}(ñ)) for all k ∈ {3, . . . , N} and all n ∈ N_0. Now we prove the main assertion by induction over k. For notational reasons we use the abbreviation η_k := (γ + 1)^(k−2) / ((γ + 1)^(k−2) + γ^(k−1)); with n being arbitrary but fixed, the induction anchor is established using (11). For the induction step k → k + 1 the claimed estimate follows from the induction assumption. If we choose k = N, then we get ñ = 0. Inserting this into our induction result we can use x_{u_N}(0) = x_{u_N}(0, x(n)) = x(n), and our assertion holds.

Using this technical construction we can now state our first main result:

Theorem 7 Consider γ > 0 and N ∈ N, N ≥ 2 such that (γ + 1)^(N−2) > γ^N holds. If Assumption 5 is fulfilled for these γ and N, then (9) and hence the suboptimality estimate of Proposition 3 hold with

α = ((γ + 1)^(N−2) − γ^N) / (γ + 1)^(N−2) (14)

for all n ∈ N_0.
Proof. Considering Proposition 6 for j = n − 1, i.e. at the point x(n) = x_{u_N}(1, x(n − 1)), we obtain that the condition of Lemma 4 holds with 1 − α = γ^N / (γ + 1)^(N−2), and the assertion follows.

Theorem 7 immediately leads to our second suboptimality estimate: at each time instant n we can compute γ from the inequalities (11) and (12) (cf. Remark 12) and then compute α according to (14). Furthermore, if l is independent of u (as in our numerical example), then the control value is not needed at all and thus γ can be computed from the data available at time n alone. In either case we obtain an a priori estimate which is available before the current step is actually carried out.
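Once γ is available, the a priori estimate is a closed-form expression in γ and N. The following sketch evaluates it, assuming the formula α = ((γ+1)^(N−N₀) − γ^(N−N₀+2))/(γ+1)^(N−N₀), whose positivity is exactly the stability condition; the default N0 = 2 corresponds to Theorem 7, larger values to the relaxation discussed below:

```python
def alpha_a_priori(gamma, N, N0=2):
    """A priori suboptimality estimate; negative values give no guarantee."""
    if not (N >= N0 >= 2):
        raise ValueError("need N >= N0 >= 2")
    p = (gamma + 1.0) ** (N - N0)
    return (p - gamma ** (N - N0 + 2)) / p

# The stability condition (gamma+1)^(N-N0) > gamma^(N-N0+2) is exactly
# alpha_a_priori(...) > 0, so it can be checked with the same function:
alpha_long = alpha_a_priori(2.0, 10)   # well inside the stability region
alpha_short = alpha_a_priori(2.0, 4)   # shorter horizon: no guarantee (negative)
```

Since the expression is monotone in N, it can also be inverted numerically to find the shortest horizon guaranteeing a prescribed α.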
Remark 8 Intuitively, this result states that if the stage cost contains sufficient information about the optimal value function, then the resulting controller exhibits a guaranteed degree of suboptimality. Here we can say that V_N and l contain "sufficient" information if we obtain α > 0 from Proposition 3 by using (15) and there exists a γ > 0 such that the inequalities (11) and (12) hold.

Now our aim is to weaken the previous relation between α and γ. To this end one can see from the proof of Proposition 6 that we need (11) only to establish the induction anchor. This leads to the following relaxation of Assumption 5:

Assumption 9 For given N, N_0 ∈ N, N ≥ N_0 ≥ 2, there exists γ > 0 such that the inequalities corresponding to (11) and (12) hold for all k ∈ {N_0 + 1, . . . , N} and all n ∈ N_0, where the open-loop solution x_{u_N}(j, x(n)) is given by (5).
Using these relaxed conditions we can reformulate Proposition 6:

Proposition 10 Consider N ≥ N_0 ≥ 2 and assume that Assumption 9 holds for these constants. Then

V_N(x_{u_N}(1, x(n))) ≤ V_{N−1}(x_{u_N}(1, x(n))) + (γ^(N−N_0+2) / (γ + 1)^(N−N_0)) l(x(n), µ_N(x(n)))

holds for all n ∈ N_0.
Proof. According to the changes in our assumptions we only have to show that the induction anchor in the proof of Proposition 6, i.e. the case k = N_0, holds true. We define

η_k = (γ + 1)^(k−N_0) / ((γ + 1)^(k−N_0) + γ^(k−N_0+1))

and obtain the induction anchor via Assumption 9. Using this proposition it is easy to prove

Theorem 11 Consider γ > 0 and N, N_0 ∈ N, N ≥ N_0 ≥ 2 such that (γ + 1)^(N−N_0) > γ^(N−N_0+2) holds. If Assumption 9 is fulfilled for these γ, N and N_0, then (9) and hence the suboptimality estimate of Proposition 3 hold with

α = ((γ + 1)^(N−N_0) − γ^(N−N_0+2)) / (γ + 1)^(N−N_0)

for all n ∈ N_0.
Proof. Since the proof is identical to the proof of Theorem 7 now using Proposition 10 we omit it here.
Remark 12 (i) Assumption 9 generalizes Assumption 5 and [4, Assumption 4.6], in which N_0 = 2 was used. In the numerical example below we will see that a judicious choice of N_0 can considerably improve our estimates.
(ii) The only data which is not immediately available is the control value µ_{j−1}(x_{u_N}(N − j, x(n))) in Assumption 9, which needs to be determined by solving an additional optimal control problem with horizon j − 1 ≤ N_0 − 1. Since N_0 is typically considerably smaller than N, this can be done with much less effort than computing V_N(x(n + 1)). Furthermore, this control value is not needed at all if l does not depend on u, as in our numerical example in Section 5.
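Since the conditions hold separately for each N_0, a simple maximization over N_0 selects the best estimate. The sketch below assumes the closed-form expression α = ((γ+1)^(N−N₀) − γ^(N−N₀+2))/(γ+1)^(N−N₀) from Theorem 11; the γ value per N_0 is a hypothetical measurement from the inequalities of Assumption 9:

```python
def alpha_for_N0(gamma, N, N0):
    p = (gamma + 1.0) ** (N - N0)
    return (p - gamma ** (N - N0 + 2)) / p

def best_alpha(gammas, N):
    """gammas maps each candidate N_0 to the gamma measured for it;
    returns the N_0 with the largest estimate together with that estimate."""
    alpha, N0 = max((alpha_for_N0(g, N, N0), N0) for N0, g in gammas.items())
    return N0, alpha

# Hypothetical measurements: a larger N_0 often admits a smaller gamma,
# which can more than compensate for the shorter effective exponent.
N0_best, alpha_best = best_alpha({2: 3.0, 3: 2.0, 4: 1.2}, N=10)
```

Here the coarse N_0 = 2 condition only certifies a small α, while the relaxed condition with N_0 = 4 and its smaller γ yields a much sharper bound.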
Remark 13 (i) Another way of numerically computing suboptimality estimates was presented in [18] for linear finite dimensional systems. The main difference to our approach is that the condition in [18] has to be verified by computing numerical approximations of the optimal value functions, which is feasible only for low dimensional linear systems but infeasible in our nonlinear setting on arbitrary metric spaces.
(ii) Asymptotic stability can be concluded from our suboptimality results if α > 0 and the running cost l is positive definite; for details see [6]. Furthermore, our results can be extended to practical optimality and stability similar to [4]. Moreover, one observes that in order to guarantee stability, i.e. α ≥ 0, we need

N ≥ h(γ, N_0) := N_0 + 2 ln(γ) / (ln(γ + 1) − ln(γ))

to hold. Note that

h′(γ, N_0) = (2γ ln(γ + 1) − 2γ ln(γ) + 2 ln(γ + 1)) / ((γ + 1) γ (ln(γ + 1) − ln(γ))²)

is independent of N_0. Hence, since for all γ ≥ γ̄ ≈ 0.01 we obtain h′(γ, N_0) > 1 for fixed N_0 ∈ [2, N], we can use the positivity and strict monotonicity of h′(·, N_0) to conclude that h grows unboundedly and faster than linearly in γ. Additionally, we obtain that N grows less than quadratically in γ since for γ̄ ≈ 5.7 we have h′(γ, N_0) < γ for all γ ≥ γ̄.

Practical Optimality
In general, one cannot expect that the conditions presented in Section 3 hold in practice. This is due to the fact that in many cases the discrete-time system (1) is obtained from a discretization of a continuous-time system, e.g. by sampling with zero order hold, see [15,16]. Hence, even if the continuous-time system is controllable to a fixed point x*, it is likely that the corresponding sampled-data system is only practically stabilizable at x*. In addition, numerical errors in the optimization algorithm may become predominant in a small neighborhood of x*. One way to adapt the results of Theorems 7 and 11 would be to modify the stage cost l to be positive definite with respect to a forward invariant stabilizable neighborhood N of x*. However, the computation of N and hence the design of l may be difficult if not impossible. Still, even with the original l and in the presence of sampling and numerical errors one would expect a receding horizon controller to practically stabilize the system, i.e., to drive the system towards a small neighborhood of x*, see also [8]. Hence, an intuitive idea is to "virtually" cut off and shift the stage cost. Note that we still solve the original optimal control problem but interpret the resulting trajectory and control values using a modified cost function.
Proposition 14 Consider a feedback law µ_N : X → U and its associated closed-loop trajectory x(·) according to (7). Assume there exists a function V_N : X → R_0^+ satisfying the inequality

V_N(x(n)) ≥ V_N(x(n + 1)) + α (l(x(n), µ_N(x(n))) − ε) (16)

for some α ∈ [0, 1], some ε > 0 and all n ∈ N_0. Consider a discrete time interval I := {n_1, . . . , n_2} with n_1, n_2 ∈ N, n_1 < n_2, on which the inequality l(x(n), µ_N(x(n))) ≥ ε holds for all n ∈ I, and set σ := V_N(x(n_2 + 1)). Then, for the modified stage cost

l̃(x, u) := l(x, u) − ε (17)

and the corresponding functional Ṽ_I^{µ_N}(x(n)) := Σ_{j=n}^{n_2} l̃(x(j), µ_N(x(j))) using the controller µ_N, the estimate

α Ṽ_I^{µ_N}(x(n)) ≤ V_N(x(n)) − σ ≤ V_∞(x(n)) − σ (18)

holds for all n ∈ I.
Proof. From the definition of l̃ and I we obtain

α l̃(x(n), µ_N(x(n))) ≤ V_N(x(n)) − V_N(x(n + 1))

for n ∈ I. Thus, summing over j = n, . . . , n_2 gives us

α Σ_{j=n}^{n_2} l̃(x(j), µ_N(x(j))) ≤ V_N(x(n)) − V_N(x(n_2 + 1)) = V_N(x(n)) − σ,

which implies the assertion since V_N ≤ V_∞ is obvious.

If (16) is satisfied, then inequality (18) is true for all subintervals I = {n_1, . . . , n_2} of N_0 on which l(x(n), µ_N(x(n))) ≥ ε holds, implying that on I the corresponding trajectory behaves "almost" like an infinite horizon optimal one. In particular, x(n) approaches x* and thus the sequence of l-values will repeatedly (possibly infinitely often) enter and leave the set [0, ε], cf. Figure 2, in which the arrows at the bottom indicate the intervals on which the trajectory is approximately optimal. When leaving [0, ε] the possible growth of V_N is bounded by ε due to (16). Thus, once l(x(n), µ_N(x(n))) has entered [0, ε] for the first time, the system will remain in a neighborhood of x* defined by a sublevel set of V_N whose precise shape, however, cannot be easily determined a priori, see also [4, Remark 5.2 and Example 5.10].
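Along a stored trajectory the practical estimate can be evaluated a posteriori as well. The following sketch, on hypothetical data, determines the intervals I on which l ≥ ε together with the largest α satisfying (16) there:

```python
def practical_alpha(VN, stage_costs, eps):
    """Intervals I on which l >= eps, and the largest alpha such that
    V_N(x(n)) >= V_N(x(n+1)) + alpha*(l(x(n), mu_N(x(n))) - eps) holds there."""
    intervals, alphas, start = [], [], None
    for n, cost in enumerate(stage_costs):
        if cost >= eps:
            if start is None:
                start = n
            if cost > eps:                  # cost == eps imposes no constraint
                alphas.append((VN[n] - VN[n + 1]) / (cost - eps))
        elif start is not None:
            intervals.append((start, n - 1))
            start = None
    if start is not None:
        intervals.append((start, len(stage_costs) - 1))
    return intervals, min(alphas + [1.0])   # cap at 1; inside the eps-region alpha = 1

# Hypothetical trajectory data: the stage cost dips into [0, eps) at n = 2.
VN    = [1.00, 0.45, 0.22, 0.20, 0.185, 0.12]
stage = [0.60, 0.25, 1e-7, 0.02, 0.08]
intervals, alpha = practical_alpha(VN, stage, eps=1e-3)
```

As in the discussion above, the steps inside the ε-region split the time axis into the subintervals on which the trajectory is approximately optimal.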
Similar to Lemma 4 we can use the following observation:

Lemma 15 Consider N ∈ N, a receding horizon feedback law µ_N and its associated closed-loop solution x(·) according to (7) with initial value x(0) = x_0. If the inequality (19) holds for some α ∈ [0, 1], some ε > 0 and all n ∈ N_0, then V_N satisfies (16) and the assertion of Proposition 14 holds.

Proof. Using the principle of optimality and (19) we obtain that (16) holds, and Proposition 14 guarantees the assertion.

Similar to the non-practical case we can now state sufficient conditions which will later be used to guarantee our practical suboptimality estimate.
Assumption 16 For a given N ∈ N, N ≥ 2, there exist γ, ε > 0 such that the practical counterparts of the inequalities (11) and (12) hold for all k ∈ {3, . . . , N} and all n ∈ [0, n_0] ⊂ N_0, where the open-loop solution is given by (5).
Proposition 17 Consider N ∈ N and assume that Assumption 16 holds for this N. Then the assertion of Proposition 6 holds for the modified optimal value functions Ṽ_k.

Proof. In order to prove the assertion we show that, given Assumption 16 for V_k, k = 2, . . . , N, Assumption 5 holds for the modified optimal value functions Ṽ_k, and hence Proposition 6 guarantees our assertion. Considering the stage cost l̃(x, u) := l(x, u) − ε, which gives us Ṽ_k(x) = V_k(x) − kε for arbitrary x, we can use the same proof as for Proposition 6. Note that l̃ and Ṽ_k(x) may become negative. Still the induction holds since we do not have to assume these terms to be nonnegative in this construction. Hence, we have shown that (11), (12) hold for either γ = 0 or γ from Assumption 16 for Ṽ_k and l̃, and we can conclude via Proposition 6. Finally, setting Ṽ_k(x) = V_k(x) − kε and choosing k = N proves the assertion.
Theorem 18 Consider γ > 0 and N ∈ N, N ≥ 2 such that (γ + 1)^(N−2) > γ^N holds. If Assumption 16 is fulfilled for these γ, N and some ε > 0, then the practical suboptimality estimate of Proposition 14 holds with α according to (14) for all n ∈ N_0.

Proof. Using Proposition 17 we obtain

Now we use a construction similar to the proof of Theorem 7 and obtain that (19) holds with α from (14).
Hence Lemma 15 provides the assertion. Since in our numerical examples the results using Assumption 9 instead of Assumption 5 are significantly better, we want to adapt this relaxation to the present case of practical stability.
Assumption 19 For given N, N_0 ∈ N, N ≥ N_0 ≥ 2, there exist γ, ε > 0 such that the inequalities of Assumption 9 hold for the shifted problem for all k ∈ {N_0 + 1, . . . , N} and all n ∈ [0, n_0] ⊂ N_0.

Again one can easily see that Assumption 16 is the special case of Assumption 19 obtained by setting N_0 = 2.
Proposition 20 Consider N ≥ N_0 ≥ 2 and assume that Assumption 19 holds for these constants. Then the assertion of Proposition 17 holds accordingly.

Proof. Similar to the proof of Proposition 10 we can show that the induction anchor in the proof of Proposition 17 holds. Hence we can conclude our assertion analogously.
Theorem 21 Consider γ > 0 and N, N_0 ∈ N, N ≥ N_0 ≥ 2 such that (γ + 1)^(N−N_0) > γ^(N−N_0+2) holds. If Assumption 19 is fulfilled for these γ, N, N_0 and some ε > 0, then the practical suboptimality estimate of Proposition 14 holds with α according to Theorem 11 for all n ∈ N_0.

Proof. Similar to the proof of Theorem 18 we obtain our assertion using Proposition 20 and Lemma 15.
Note that Assumptions 9 and 19 represent sufficient conditions separately for each N_0. Hence one is free to choose the N_0 which yields the best resulting estimate.

Numerical Experiments
In order to illustrate our results we consider a digital redesign problem (cf. [14]) for an arm/rotor/platform (ARP) model. For this system a continuous-time full-state feedback u_0 was designed via backstepping such that the output ζ(t) := x_5(t) − (a_3 / (a_1 − a_2 a_3)) [x_6(t) − a_3 x_7(t)] is close to x_5(t) and tracks a given reference signal ζ_ref(t) = sin(t); see [2, Chapter 7.3.2] for details on the backstepping design and the specification of the model parameters.
To solve the sequence of optimal control problems we use a direct approach, i.e. we discretize the continuous-time system and use an SQP method to solve the resulting optimization problem. In order to make our results comparable we fix the initial value x(t_0) = (0, 0, 0, 0, 10, 0, 0, 0) as well as the absolute and relative tolerances of the solver of the differential equations. For the resulting trajectory the estimates from Proposition 3 are shown in Table 1 (α values from Proposition 3). In our simulation, the exact α-values from Proposition 3 are close to one for the first iteration steps, indicating that the feedback is almost infinite horizon optimal. However, from iteration step 6 onwards the values become smaller and even negative, which shows that optimality is lost there. One possible reason is that the values of V_N are close to the accuracy of the optimization routine and the tolerances of the solver of the differential equations, hence numerical errors become dominant. Nevertheless, the measured values of α in conjunction with the values of V_N show that the closed-loop system behaves "almost optimally" until a very small neighborhood of the reference trajectory is reached, which is exactly what we expected. Now we compare our results from Proposition 3 and Theorems 7 and 11 in Table 2 (results from Theorem 11 for H = 10 · T and various N_0) to see how conservative the a priori estimate actually is. From Table 2 one can clearly see that the enhanced a priori estimate given by Theorem 11 is by far not as conservative as the a priori estimate given by Theorem 7, which does not even guarantee stability, and, compared to the results from Proposition 3, we obtain very good results. Table 2 also shows that one cannot specify a "best" N_0 in advance. However, since all necessary values are computable or can be estimated easily, one can use a simple maximization to obtain the best possible estimate α.
In Table 1 one can see that our estimates deteriorate when the values of V_N(x(n)) or l(x(n), µ_N(x(n))) are very small, implying that optimality of the trajectories is lost in a small neighborhood of the reference solution. Note that this is also true for Table 2, which we truncated for reasons of clarity. One can only speculate whether this is due to sampling or due to numerical optimization errors; most likely the effect is caused by a combination of both. In either case, one would expect to obtain better estimates when considering practical optimality. Using Proposition 17 for our model problem we obtain the values for α shown in Table 3 (α values from Proposition 17) for horizon length H = 10 · T and ε = 10^{−6}. Here we do not compare results for suboptimality and practical suboptimality since it is clear from the inequalities that the practical estimates will always yield better α values. Note that the resulting values for α are the outcome of a simple maximization over ε. We also like to point out that for n = 7 and 8 we have l(x(n), µ_N(x(n))) < ε. Here we have α = 1 by definition since we are inside the chosen ε-region. Note that for n = 9 we leave the ε-region but the optimization routine takes us back in only one step. Hence, using the notation from Proposition 14, here we have the two intervals I_1 = {0, . . . , 6} and I_2 = {9}.
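The simple maximization over ε mentioned above can be sketched as follows; the trajectory data and the ε grid are hypothetical, and the per-step bound is the practical counterpart of (9) with l − ε in place of l:

```python
def alpha_for_eps(VN, stage, eps):
    """Smallest per-step alpha satisfying (16) on steps with l > eps;
    steps inside the eps-region impose no constraint (alpha = 1 there)."""
    ratios = [(VN[n] - VN[n + 1]) / (c - eps)
              for n, c in enumerate(stage) if c > eps]
    return min(ratios + [1.0])

def maximize_over_eps(VN, stage, eps_grid):
    """Pick the eps from the grid giving the best practical estimate."""
    alpha, eps = max((alpha_for_eps(VN, stage, e), e) for e in eps_grid)
    return alpha, eps

# Hypothetical values: the last steps have stage costs near the solver accuracy.
VN    = [1.00, 0.45, 0.22, 0.215, 0.213]
stage = [0.60, 0.25, 0.01, 1e-6]
alpha, eps = maximize_over_eps(VN, stage, [1e-6, 1e-4, 1e-2])
```

A larger ε removes the numerically dominated steps from the estimate, which is why the maximization tends to select an ε above the solver tolerances.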
We now compare the results from Theorem 18 with all previously mentioned results in Table 4 (comparing results from Propositions 3 and 17 as well as Theorems 11 and 21). For Theorem 21 we solved all possible N_0-subproblems and took the maximal value of α. We could have stated the results for each N_0 separately, but this would give no more insight than we already obtained in the non-practical case.

Conclusion
Using known results for rigorous suboptimality estimates of receding horizon controllers we have derived variants and extensions which are computable along trajectories. The advantage of these approaches is their simple evaluation using already calculated or easily computable values. Furthermore, we have given conditions which guarantee a certain degree of suboptimality and are applicable in a real-time setting.
Using numerical experiments we showed that the presented conditions can be verified easily. Future research will address the design of algorithms which adaptively choose suitable optimization horizons N based on our suboptimality estimates, similar to adaptive step size control in numerical schemes for differential equations.