Optimal Opinion Control: The Campaign Problem

Opinion dynamics is by now a well-established field of research. In this article we formulate and study a novel, namely strategic, perspective on such dynamics: There are the usual normal agents that update their opinions, for instance according to the well-known bounded confidence mechanism. But, additionally, there is at least one strategic agent. That agent uses opinions as freely selectable strategies to gain control over the dynamics: The strategic agent of our benchmark problem tries, during a campaign of a certain length, to influence the ongoing dynamics among the normal agents with strategically placed opinions (one per period) in such a way that, by the end of the campaign, as many normal agents as possible end up with opinions in a certain interval of the opinion space. Structurally, such a problem is an optimal control problem, a type of problem that is ubiquitous. Resorting to advanced and partly non-standard methods for computing optimal controls, we solve some instances of the campaign problem. But even for a very small number of normal agents, just one strategic agent, and a ten-period campaign length, the problem turns out to be extremely difficult. Finally, we explicitly discuss the moral and political concerns that immediately arise once someone starts to analyze the possibilities of optimal opinion control.


Introduction
The dynamics of opinions has been studied for about 60 years. Today it is a standard topic at general conferences on agent-based modelling. A whole range of models has been defined and analyzed. 1 In the last 15 years, at least hundreds and probably more than a thousand simulation studies on the dynamics of opinions have been published. 2 The studies and their underlying models differ in many details: The opinions and the underlying time are continuous or discrete, the updating of opinions is governed by different updating regimes, the space of possible opinions may have more than one dimension, the dynamics may run on this or that type of network, and various types of noise may be involved. But despite all their differences, there is a commonality in all these studies and their models: The agents mutually influence their opinions, but they do not do so strategically.
What is studied in this huge body of articles typically focuses on convergence, dynamical patterns, or final structures. Given the specific parameters of the underlying model, the typical questions are: Under what conditions does the dynamics stabilize? Does the dynamics lead to consensus, polarisation, or other interesting types of clustering? What are the time scales involved? What remains unasked in these studies are strategic questions like: Which opinion should an agent pretend to have in order to drive the whole dynamics in his or her preferred direction? Where in the opinion space should an agent 'place' an opinion, given that he or she has a certain preference with regard to the opinions in later periods? Our article deals with such strategic questions. We develop a conceptual framework that allows us to answer strategic questions in certain cases. Additionally, we analyze why it is surprisingly difficult or even impossible to give exact answers to strategic questions in cases that look very, very simple.
It is not by accident that strategic questions are normally not raised in the sort of research that is labeled opinion dynamics. The standard approach is to describe the dynamics as a more or less complicated dynamical system: There is a set I of agents 1, 2, . . . , i, j, . . . , n and a discrete time t = 1, 2, . . .. The opinions of the agents are given by a vector (or profile) x^t = (x^t_1, . . . , x^t_n) that describes the state of the system at time t. Even if stated in an informal or semi-formal way (sufficiently clear to program the process), the dynamics of the system is basically given by a function f^t that computes the state of the system x^{t+1} as x^{t+1} = f^t(x^t).
Thus, for each agent i the function f^t specifies how x^{t+1}_i depends upon x^t. Depending upon the specific opinion dynamics model, the vector-valued functions f^t work in very different ways. For the most part they do some kind of averaging: averaging with a privileged weight for agent i's own opinion, or a weighted averaging with weights w_ij that agent i assigns to agents j and that are meant to express agent i's respect for agent j, or some other sort of averaging subject to constraints, for instance constraints in terms of network distance on an underlying network on which the opinion dynamics is assumed to run.
Whatever the 'story' about f^t, it is always a reactive process in which the agents react to the past and only the past. But an answer to the question of where to place an opinion in order to drive the dynamics in a preferred direction requires something else: anticipation, i.e., finding out what the future effects of placing an opinion here or there in the opinion space probably are, and then placing it where the placement is most effective for reaching a preferred outcome.
In the following, we assume a setting with two sets of agents: First, there is a set of non-strategic agents as they are usually assumed in opinion dynamics. They are driven by the function f^t. The function describes a dynamical system in which the non-strategic agents always reveal their true current opinion, take the opinions of others as their true current opinions, and mutually react to the given opinion profile x^t according to f^t(x^t). The second set of agents is a set of strategic agents. Whatever their true opinions may actually be, they can strategically place any opinion in the opinion space; the non-strategic agents then take these opinions seriously, i.e., consider them revealed true opinions of other agents.
The strategic agents have preferences over possible opinion profiles of non-strategic agents. Therefore, they try to place opinions in such a way that the opinion stream that is generated by f^t is driven towards the preferred type of profile. Thus, the strategic agents somehow 'play a game' with the non-strategic agents. Non-strategic agents do not realize that and take the strategically placed opinions of strategic agents as if they were the revealed true opinions of 'honorable agents'.
Our setting has a structure as it is often perceived or even explicitly 'conceptualized' by political or commercial campaigners: There is an ongoing opinion stream, the result of and driven by a mutual exchange of opinions between communicating citizens, consumers, members of some parliamentary body, etc. That opinion stream has its own dynamics. However, it is possible to intervene: Using channels of all sorts (TV, radio, print media, mails, posters, ads, calls, speeches, personal conversation, etc.), one can place opinions in the opinion space. Done in an intelligent way, these interventions should drive the opinion stream in the direction of an outcome that is preferred by the principal who pays for a campaign. That will often be the self-understanding and the selling point of a campaigning agency.
The number of strategic agents matters: If there are two or more strategic agents, the setting becomes a game theoretical context in which the strategic agents have to take into account that there are others that try to manipulate the opinion dynamics as well. Therefore, the strategic agents do not only 'play a game' with the non-strategic agents. They also play, now in the exact game theoretical sense of the word, a game against each other. It is a complicated game to which, in principle, the usual solution concepts like the (subgame perfect) Nash equilibrium can be applied. But if there is just one strategic agent, then there are no other competing manipulators. That turns the problem into a problem of optimal control of the opinion stream governed by f^t.
In the following we focus on the optimal control problem only. We add just one strategic agent 0 to the set of agents. Agent 0 is equipped with the ability to freely choose, in any time step, what the other agents then perceive as his or her opinion. We call agent 0 the controller and his freely chosen opinion the control. Mathematically, this makes the development of the opinion system dependent on exogenously chosen parameters, namely the control opinions, and we are faced with a control system. If we define what the controller wants to achieve, we can formulate an optimal control problem, in which the controller tries to find the controls that get him there.
Our optimal control problem is of a specific, seemingly simple, type: Agent 0 can strategically place opinions in a finite sequence of periods, one and only one opinion per period. There is the ongoing and underlying opinion dynamics, given by a function f^t. Period by period agent 0 tries to place an opinion in such a way that finally, in a certain future time step N (the horizon), known in advance, something is maximized: the number of normal agents' opinions that lie in a certain part of the opinion space that was specified ex ante by agent 0. To keep it simple, we assume as a special one-dimensional opinion space the real-valued unit interval [0, 1]. As the part of the opinion space preferred by the controller, we assume a target interval [ℓ, r] ⊆ [0, 1] for some prescribed ℓ < r in [0, 1] known to the controller.
Both assumptions are much less restrictive than they seem to be: First, the unit interval can be used to represent opinions about, for example, tax rates, minimal wages, maximum salaries, political positions on a left-right spectrum, product quality, or any property whatsoever that can be expressed by real-valued numbers. If the 'really' possible range of numerical values is different from the unit interval, and often that will be the case, then some transformation, approximation, or range-related guesswork is necessary. But that is an easy and widely accepted step (at least in principle). Second, suppose there are m fixed alternatives a = (a_1, a_2, . . . , a_m) ∈ [0, 1]^m, sorted such that a_1 ≤ a_2 ≤ . . . ≤ a_m. Further suppose that our n normal agents have to choose among the alternatives at the future time step N and will do so by choosing an alternative that is closest to their own opinion in that time step. What, then, is the problem of a controller with the interest to make as many normal agents as possible choose a certain alternative a_j? Obviously the problem is to maximize the number of agents' opinions that in time step N are within a certain interval to the left and to the right of the favored alternative a_j. The exact bounds of that interval depend upon the exact positions of the two nearest alternatives, a_{j−1} to the left and a_{j+1} to the right of the favored a_j. The exact left and right bounds are then (a_{j−1} + a_j)/2 and (a_j + a_{j+1})/2, respectively.
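As a sketch of this reduction, the following snippet computes the target interval for a favored alternative from a sorted vector of alternatives. The function name and the use of exact fractions are our own illustrative choices, not part of the paper's formal setup; the boundary cases (favored alternative at the very left or right) are handled by falling back to the ends of the opinion space.

```python
from fractions import Fraction

def target_interval(alternatives, j):
    """Conviction interval for favored alternative a_j: voters choose the
    alternative nearest to their opinion, so the interval is bounded by the
    midpoints to the neighboring alternatives (or the space boundary)."""
    a = sorted(alternatives)
    left = Fraction(0) if j == 0 else (a[j - 1] + a[j]) / 2
    right = Fraction(1) if j == len(a) - 1 else (a[j] + a[j + 1]) / 2
    return left, right

# Alternatives 1/4, 1/2, 3/4 with 1/2 favored yield the interval
# [3/8, 5/8] = [0.375, 0.625] used in the benchmark below.
l, r = target_interval([Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)], 1)
```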
Therefore, whatever the vector of alternatives may be (e.g., positions of parties or candidates on a left/right scale, a price or a technical specification of a certain type of product), whenever there are voting or buying decisions 3 after a foregoing opinion dynamics (e.g., about the appropriate position in the political spectrum, the acceptable price or a desirable technical specification of some product), our controller agent 0, who tries to 'sell' a certain alternative a_j as effectively as possible, always has the same problem: How to get, by period N, as many opinions as possible into a certain target interval [ℓ, r] determined by the closest left and right competitors of a_j? Obviously, our framework and setup is much more general than it looks at first glance.
Our problem and approach presuppose an underlying opinion dynamics given by a function f^t. But there are many candidates. We will use a linear and a non-linear variant, each of them probably the most prominent variant of its type.
3 It may even be a buying decision in the metaphorical sense of whether or not to 'buy' a certain assumption: Imagine a committee that after some discussion has to decide whether to proceed based on this or that assumption in a vector a of alternatives.
In the linear variant the dynamics is driven by weighted averaging: Weights w_ij may express the respect or competence that an agent i assigns to an agent j; alternatively, a weight w_ij may express the influence, impact or power that agent j has on agent i. The weights that an agent assigns sum up to one. The opinion space is the unit interval [0, 1]. The history of this linear system goes back to French, see (French Jr 1956); it was already present in (Harary 1959), explicitly stated in (DeGroot 1974), and received a lot of attention, especially in philosophy, through the book (Lehrer & Wagner 1981). 4 We will refer to that model as the DeGroot-model (DG-model). 5 The non-linear variant that we will use is the so-called bounded confidence model (BC-model). In this model the agents take seriously those others whose opinions are not too far away from their own: The agents have a certain confidence radius ε and update their opinions (the opinion space is again the unit interval [0, 1]) by averaging over all opinions that are not further away from their own opinion than ε: An agent i updates to x^{t+1}_i by averaging over the elements of the set {j ∈ {1, 2, . . . , n} : |x^t_i − x^t_j| ≤ ε}, i.e., over all the opinions that are within what is called his or her confidence interval. The model was defined in (Krause 1997), coined the bounded confidence model in 1998 (see (Krause 2000)), and then for the first time, to a certain extent, comprehensively analyzed, by both simulations and rigorous analytical means, in (Hegselmann & Krause 2002).
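To fix ideas, one BC update step can be sketched as follows. This is our own minimal illustration of the rule just described, not code from the paper; it uses exact fractions, for reasons explained in Section 3, and the four-voter profile is an arbitrary toy example.

```python
from fractions import Fraction

def bc_update(x, eps):
    """One bounded-confidence step: each agent averages over all opinions
    within distance eps of his or her own opinion (including the own one)."""
    new = []
    for xi in x:
        neighbors = [xj for xj in x if abs(xi - xj) <= eps]
        new.append(sum(neighbors) / len(neighbors))
    return new

# Two opinion pairs far apart: each pair merges, the pairs ignore each other.
x = [Fraction(0), Fraction(1, 10), Fraction(9, 10), Fraction(1)]
print(bc_update(x, Fraction(15, 100)))  # [1/20, 1/20, 19/20, 19/20]
```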
The model looks extremely simple. However, there are several warnings in the literature on the BC-model, among them a very recent one: "The update rule . . . is certainly simple to formulate, though the simplicity is deceptive" (Wedin & Hegarty 2014, p. 2). The authors' warning is well founded: The simple update rule generates a complicated dynamics that is still only partially understood. The main reason is this: The updating rule of the BC-model can be described as assigning weights to other agents. All agents with opinions outside the confidence interval get a weight of 0; agents within get a weight of 1 divided by the number of agents within the interval. Therefore, the BC-dynamics is weighted averaging as well. However, there is a crucial difference to the linear DG-model: The weights of the BC-model are time-dependent and, even worse, depend discontinuously on the current profile. That causes a lot of trouble and, at the same time, generates many interesting effects. As a consequence, the BC-model became a subject for all sorts of analytical and computational analysis and a starting point for extensions of all sorts. The body of literature on the BC-model is correspondingly huge.
Structural results in the BC-model were obtained with respect to convergence and its rate (Dittmer 2001; Krause 2006; Lorenz 2006), thresholds for the confidence radius (Fortunato 2005b), the identification of the really crucial topological and metric structures (Krause 2006), and the influence of the underlying network (Weisbuch 2004). The influence of a 'true' opinion, to which (some of) the individuals are attracted, received special attention (Hegselmann & Krause 2006; Malarz 2006; Douven & Riegler 2010; Douven & Kelp 2011; Kurz & Rambau 2011; Wenmackers et al. 2012, 2014). With a grain of salt, the true opinion can also be interpreted as a control that is constant over time and that is contained in each individual's confidence interval. Many variants of the original BC-model (discrete time, finitely many individuals, continuous opinion space) have been proposed, among them pairwise sequential updating (Deffuant et al. 2000), a discrete opinion space (Fortunato 2004), a multi-dimensional opinion space (Krause 2005), a noisy opinion space (Bhattacharyya et al. 2013; Pineda et al. 2013), a continuous distribution instead of a finite-dimensional vector of opinions (Lorenz 2005, 2007), and continuous time (Blondel et al. 2010; Wedin & Hegarty 2014). Alternative dynamics have been enhanced with BC-type features in (Stauffer 2002, 2003; Fortunato 2005; Stauffer & Sahimi 2006; Rodrigues & Da F. Costa 2005) in order to make the resulting emergent effects more interesting.

4 In (Lehrer & Wagner 1981) the authors do not interpret the iterated weighted averaging as a process in time. As stressed in (Hegselmann & Krause 2006, p. 4): "Their starting point is a 'dialectical equilibrium', i.e., a situation after the group has engaged in extended discussion of the issue so that all empirical data and theoretical ratiocination has been communicated. '. . . the discussion has sufficiently exhausted the scientific information available so that further discussion would not change the opinion of any member of the group' ((Lehrer & Wagner 1981, p. 19)). The central question for Lehrer and Wagner then is: Once the dialectical equilibrium is reached, is there a rational procedure to aggregate the normally still divergent opinions in the group (cf. (Lehrer 1981, p. 229))? Their answer is 'Yes.' The basic idea for the procedure is to make use of the fact that normally we all do not only have opinions but also information on the expertise or reliability of others. That information can be used to assign weights to other individuals. The whole aggregation procedure is then iterated weighted averaging with t → ∞ and based on constant weights. It is shown that for lots of weight matrices the individuals reach a consensus whatever the initial opinions might be - if only they were willing to apply the proposed aggregation procedure."

5 In philosophy the model is often called the Lehrer-Wagner model.
It is interesting to note that simulations play an important role in the investigation of these innocent-looking models for opinion dynamics (Stauffer 2003; Hegselmann & Krause 2006; Fortunato & Stauffer 2006; Stauffer & Sahimi 2006) - a hint that some aspects of opinion dynamics are very hard to deduce purely theoretically. That simulations based on floating-point numbers can produce numerical artefacts (see Section 3) was already discussed in (Polhill et al. 2005). The arguably most general account of theoretical aspects was contributed by Chazelle on the basis of a completely new methodology around function theoretical arguments(!) (Chazelle 2011). Moreover, opinion dynamics in multi-agent systems can be seen as an instance of influence systems; this broader context is described in (Chazelle 2012).
Recently, the effect of time-varying exogenous influences on the consensus process has been studied in (Mirtabatabaei et al. 2014). To the best of our knowledge, the notion of opinion control has so far appeared literally only in the title of one paper (Zuev & Fedyanin 2012), which is, however, based on a different dynamical model. For continuous time, optimal control techniques have been applied to opinion consensus, exemplarily also for the BC-dynamics, in the preprint (Albi et al. 2014). Closest to our research is, up to now, probably the investigation of so-called damage spreading: What happens if some of the opinions in a BC-model change erratically and possibly drastically (Fortunato 2005a)?
In what follows, we analyze our optimal control problem with the linear DG- and the non-linear BC-model as the underlying opinion dynamics that agent 0 tries to control: Given the DG- or BC-dynamics and using the controls, i.e., placing one opinion per period here or there in the opinion space, agent 0 tries to maximize the number of agents whose opinions in a future period N are in a certain target interval [ℓ, r], and who therefore would 'buy' the corresponding alternative a_j that agent 0 is actually campaigning for. We call this problem the campaign problem. In the following, we will switch to a more vivid language fitting this interpretation: agents are called voters, the special agent 0 is called the controller, the target interval [ℓ, r] is called the conviction interval, and voters in the target interval are called convinced voters.
As an example, we will analyze a specific instance of the campaign problem. Our benchmark problem is this: There are 11 voters, governed by a DG- or BC-version of the (reactive) function f^t. At t = 0, the 11 voters start with the opinions 0, 0.1, 0.2, . . . , 0.9, 1. The conviction interval is [0.375, 0.625]. That conviction interval would be the target in a campaign in which the alternatives 1/4, 1/2, 3/4 are competing and 1/2 is the preferred alternative. The goal is to maximize the number of convinced voters, i.e., those with opinions in the conviction interval, in a certain future period N with N = 1, 2, . . . , 10. The benchmark problem looks like a baby problem. But it is a monster: It will turn out that for larger single-digit values of N we could not solve it even with the most sophisticated methods available.
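For orientation, the uncontrolled BC-dynamics of this start profile can be simulated in a few lines. This is our own sketch: the confidence radius ε = 0.2 is an illustrative choice for this snippet (not fixed by the problem statement above), and the exact-fraction arithmetic anticipates the numerical issues discussed in Section 3.

```python
from fractions import Fraction

def bc_step(x, eps):
    """One bounded-confidence update in exact rational arithmetic."""
    new = []
    for xi in x:
        neighbors = [xj for xj in x if abs(xi - xj) <= eps]
        new.append(sum(neighbors) / len(neighbors))
    return new

x = [Fraction(i, 10) for i in range(11)]  # start profile 0, 0.1, ..., 1
eps = Fraction(2, 10)                     # illustrative confidence radius
lo, hi = Fraction(3, 8), Fraction(5, 8)   # conviction interval [0.375, 0.625]

for t in range(1, 11):
    x = bc_step(x, eps)
    print(t, sum(lo <= xi <= hi for xi in x))  # convinced voters per stage
```

After the first stage, for example, the profile becomes 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, with three voters convinced.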
We explicitly turn the BC-dynamics into a control system in the most straightforward way and ask how to control it optimally. This question has, to the best of our knowledge, not been investigated in the literature so far.
In this paper we
• propose the new problem optimal opinion control 6 and an anecdotal benchmark problem instance for illustration and test purposes,
• develop two exact mathematical models based on mixed integer linear programming to characterize optimal controls 7,
• devise three classes of primal heuristics (combinatorial tailor-made, meta-heuristics, model predictive control) to find good controls,
• present computational results for all methods applied to our benchmark problem,
• sketch possible lines of future research.
In Section 2 we formally introduce the optimal opinion control problem. Section 3 shows what can happen if the pitfalls of numerical mathematics are ignored in computer simulations. Section 4 presents some structural knowledge about optimal controls in small special cases, thereby indicating that a general theoretical solution of the optimal control problem is not very likely. Computational results are presented in Section 5. The detailed presentation of the underlying exact mathematical models is postponed to Appendix A, followed by information on the parameter settings for the commercial solver we used in Appendix B. Our heuristics are introduced in Section 6. In Section 7 we interpret the obtained results. Conclusions and further directions can be found in Section 8.

Modeling the Dynamics of Opinions and Their Control
We will now formalize the ideas sketched in the introduction. We are interested in opinions that can be represented by mathematical entities. There may be many ways to transform opinions about every conceivable topic into more or less complicated mathematical structures. In this paper, we will restrict ourselves to the arguably simplest case, where the opinion of a voter i ∈ I can be represented by a real number x_i in the unit interval [0, 1].
The opinions may change over time subject to a certain system dynamics: We assume that time is discretized into stages T := {0, 1, 2, . . . , N}. The opinion of voter i ∈ I in Stage t ∈ T is denoted by x^t_i. We call, as usual, the vector x^t := (x^t_i)_{i∈I} the state of the system in Stage t.
The system dynamics f^t is a vector-valued function that computes the state of the system x^{t+1} as x^{t+1} := f^t(x^t). We assume a given start value x^start_i for the current opinion of each voter i ∈ I. Thus, x^0_i = x^start_i holds for all i ∈ I. Depending on how f^t is defined, we obtain different models of opinion dynamics. In this paper, we will only consider so-called stationary models, where f^t does not depend on the Stage t. Therefore, from now on, we will drop the superscript t from the notation and write f for the system dynamics.
2.1. The DeGroot Model. In this model, each voter is in contact with every other voter in every stage. The strengths of the influences of opinions on other opinions are given by strictly positive weights w_ij with Σ_{j∈I} w_ij = 1 for all i ∈ I, with the meaning that the opinion of voter i is influenced by the opinion of voter j with weight w_ij. The mathematical formulation of this is to define f = (f_i)_{i∈I} as a weighted arithmetic mean in the following way:

f_i(x_1, . . . , x_n) := Σ_{j∈I} w_ij x_j.    (1)

6 During the finalizing work on this paper, it came to our attention that optimal opinion control was independently introduced and investigated, with a different objective and in continuous time, in (Wongkaew et al. to appear).
7 Another application of mixed integer linear programming techniques in modeling social sciences was, e.g., presented in (Kurz 2012).
It can be shown that this dynamics, in the limit, leads to consensus. 8 As we will see below, it still leads to an interesting optimal control problem.
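A minimal sketch of one DeGroot step may be helpful here; this is our own illustration, and the uniform weight matrix is chosen purely for the example.

```python
from fractions import Fraction

def degroot_step(x, w):
    """One DeGroot update: voter i moves to the w_ij-weighted mean of all opinions."""
    n = len(x)
    return [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]

# Uniform weights w_ij = 1/3: every row sums to 1, as required.
w = [[Fraction(1, 3)] * 3 for _ in range(3)]
x = [Fraction(0), Fraction(1, 2), Fraction(1)]
print(degroot_step(x, w))  # all voters jump to the mean 1/2 in a single step
```

With equal weights consensus is reached immediately; for general strictly positive weight matrices the opinions contract towards consensus only in the limit.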
2.2. The Bounded-Confidence Model. The motivation for this model is that our voters ignore opinions of others that are too distant. Formally, we fix once and for all an ε ∈ (0, 1), and each voter is influenced only by opinions that are no more than ε away from his or her own opinion. We call [x^t_i − ε, x^t_i + ε] ∩ [0, 1] the confidence interval of voter i in Stage t. Let the confidence set I_i(x_1, . . . , x_n) of voter i ∈ I in state x = (x_1, . . . , x_n) be defined as

I_i(x_1, . . . , x_n) := {j ∈ I : |x_i − x_j| ≤ ε}.

Observe that I_i(x_1, . . . , x_n) ≠ ∅ due to i ∈ I_i(x_1, . . . , x_n).
Then the system dynamics of the BC-model is given as follows:

f_i(x_1, . . . , x_n) := (1/|I_i(x_1, . . . , x_n)|) Σ_{j∈I_i(x_1,...,x_n)} x_j.    (Bounded-Confidence)

A possible extension might be a stochastic disturbance on ε, but, as we will see, bounded confidence is still far from being completely understood. Therefore, in this paper bounded confidence will be the main focus.
2.3. A New Opinion Control Model. Given a dynamical system as above, we can of course think about the possibility of a control that influences the system dynamics. Formally, this means that the system dynamics f also depends on some additional exogenously provided data u, the control.
Formally, this means in the simplest case (and we will restrict ourselves to this case) that the controller can present an additional opinion u^t in every stage that takes part in the opinion dynamics. The corresponding system dynamics, taking the control as an additional argument, are then given as follows (with x_0 := u and I_0 := I ∪ {0}, as well as, this time, weights w_ij with Σ_{j∈I_0} w_ij = 1 for easier notation):

f_i(x_1, . . . , x_n, u) := Σ_{j∈I_0} w_ij x_j,    (DeGroot-Control)

f_i(x_1, . . . , x_n, u) := (1/|I_i(x_0, x_1, . . . , x_n)|) Σ_{j∈I_i(x_0,x_1,...,x_n)} x_j.    (Bounded-Confidence-Control)

We can interpret this as a usual model of opinion dynamics with an additional opinion x_0 that can be positioned freely in every stage by the controller. The aim of the controller is as follows: Control opinions in such a way that after N stages there are as many opinions as possible in a given interval [ℓ, r] ⊆ [0, 1].
To formalize this, fix an interval [ℓ, r] (the conviction interval), and let the conviction set J(x_1, . . . , x_n) denote the set of all voters j ∈ I with x_j ∈ [ℓ, r]. We want to maximize the number of convinced voters. Thus, the problem we want to investigate is the following deterministic discrete-time optimal control problem:

max |J(x^N_1, . . . , x^N_n)|

subject to

x^{t+1} = f(x^t, u^t) for all t = 0, 1, . . . , N − 1,
x^0_i = x^start_i for all i ∈ I,
u^t ∈ [0, 1] for all t = 0, 1, . . . , N − 1,

where f = (f_i)_{i∈I} is one of the controlled system dynamics in Equations (DeGroot-Control) and (Bounded-Confidence-Control), resp.
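A one-stage sketch of the controlled BC-dynamics and the objective may make this concrete. This is our own minimal illustration: the function names and the tiny two-voter profile are invented for the example, and the control is treated as an extra opinion that enters every voter's confidence test.

```python
from fractions import Fraction

def bc_control_step(x, u, eps):
    """One controlled BC step: the control u acts as an extra opinion x_0
    that a voter averages in whenever it lies in his or her confidence set;
    only the normal voters' opinions are updated."""
    pool = [u] + x
    new = []
    for xi in x:
        neighbors = [xj for xj in pool if abs(xi - xj) <= eps]
        new.append(sum(neighbors) / len(neighbors))
    return new

def convinced(x, l, r):
    # Size of the conviction set J(x): voters with opinion in [l, r].
    return sum(l <= xi <= r for xi in x)

x = [Fraction(3, 10), Fraction(7, 10)]
x = bc_control_step(x, Fraction(1, 2), Fraction(1, 4))  # control placed at 1/2
print(convinced(x, Fraction(3, 8), Fraction(5, 8)))     # both voters convinced: 2
```

Here the control at 1/2 pulls the voters from 0.3 and 0.7 to 0.4 and 0.6, both inside the conviction interval; without the control, neither voter would move at all, since they are outside each other's confidence intervals.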
2.4. Our Benchmark Example. We now design a special instance of our optimal control problem that serves as our benchmark for computational investigations. We are given eleven voters with starting opinions 0, 0.1, 0.2, . . . , 0.9, 1. Our conviction interval is the interval [0.375, 0.625].
The goal is to maximize the number of convinced voters in stage 1, 2, . . . , 10, resp. We will see that even for this innocent-looking example we were not able to find the optimal number of convinced voters for all numbers of stages between 1 and 10. Table 1 shows all we were able to find out. In the rest of the paper we will explain how we obtained that knowledge.

Table 1. All information we could gather about the benchmark example in Sections 6* and 5**.

Simulation and Pitfalls from Numerical Mathematics
Before we can start with our benchmark problem we have to face a very inconvenient fact: If we try to solve our benchmark problem for the BC-model, the computer has, again and again, to decide the question whether or not |x_i − x_j| ≤ ε, where x_i, x_j, and ε are real numbers. For a human being with a tiny bit of math training it is easy to answer the question whether |0.6 − 0.4| ≤ 0.2. If a computer has to answer that simple question and uses what in some programming languages is called the data format "real" or "float" or even "double" (float with double precision), then the computer might get it wrong. In Figure 1, left, one can see that effect: We start the dynamics with 6 voters that are regularly distributed at the positions 0, 0.2, 0.4, . . . , 1. The confidence radius ε is 0.2. Obviously the computer (using a program written in Delphi; in NetLogo the analogous mistake would happen, possibly somewhere else) answers the question whether |0.6 − 0.4| ≤ 0.2 in a wrong way. As a consequence, from the first update onwards the dynamics is corrupted: Given our start distribution and the homogeneous, constant, and symmetric confidence radius, the opinion stream should be mirror symmetric with respect to a horizontal line at y = 0.5. That symmetry is forever destroyed by the very first update.

What happens here is no accident. It is the necessary consequence of the floating point arithmetic that computers use to approximate the real numbers. In floating point arithmetic each number is represented with a finite number of binary digits, so a small error is possibly made. For a hard decision like |x_i − x_j| ≤ ε or |x_i − x_j| > ε a small error is sufficient to draw the wrong conclusion whenever |x_i − x_j| equals or is rather close to ε. The only way for us to cope with this problem is to resort to exact rational arithmetic throughout, although there may be more sophisticated methods to improve efficiency.
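The effect is easy to reproduce in any language with IEEE 754 doubles; which pair of positions triggers the error depends on the language and its rounding, so the following Python lines are our own illustration (with the pair 0.8 and 0.6), not the Delphi code behind Figure 1.

```python
from fractions import Fraction

# With doubles, 0.8 - 0.6 evaluates to 0.20000000000000007, so the
# confidence test |x_i - x_j| <= 0.2 wrongly excludes a neighbor.
print(abs(0.8 - 0.6) <= 0.2)  # False (!)

# With exact rational arithmetic the same test is answered correctly.
print(abs(Fraction(4, 5) - Fraction(3, 5)) <= Fraction(1, 5))  # True
```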
This numerical instability has the more serious consequence that off-the-shelf optimization algorithms using floating point arithmetic can not be used without checking the results for correctness in exact arithmetic.
Using exact arithmetic we obtain the correct opinions of our voters; the corresponding correct trajectory is drawn on the right-hand side of Figure 1. As we have seen, a small error in the positions of the voters, whether computational or observational, can have a drastic effect. We mention that this effect can not happen for numerically stable dynamics like, e.g., the DG-model. The only patch that came to our mind which was capable of dealing with the numerical instability was to use exact arithmetic. This means that we represent all numbers as fractions whose numerators and denominators are integers with unlimited precision. We remark that we have used the Class Library of Numbers (CLN), a C++ package, but similar packages should be available for other programming languages as well.
There are quite a lot of articles dealing with the simulation of the BC-model. To our knowledge, none of them mentions the use of exact arithmetic. So one may assume that the authors have used ordinary floating point numbers with limited precision for their computations. It is an interesting question whether all of the obtained results remain more or less the same when recalculated with exact arithmetic. For no existing publication, however, did we find any evidence that the conclusions are only artefacts of numerical trouble. For results using randomized starting configurations, the probability is zero that the distance between agents exactly equals the confidence radius. In those experiments numerical artifacts are much less likely (though not impossible) than in simulations starting from equidistant configurations.
We have to admit that, in the starting phase of our investigation into the optimal control of opinion dynamics, we also used floating point arithmetic. We heuristically found controls seemingly achieving 10 convinced voters after 10 stages. Using exact arithmetic it turned out that the computed control yields only 4 convinced voters, which is a really bad control, as we will see later on.

Basic Structural Properties of Optimal Controls
In this section we collect some basic facts about structural properties of optimal controls, mainly for the BC-model. While the DG-model generally has nicer theoretical properties, there is an exception concerning the ordering of the voters over time.
Lemma 4.1. In the BC-model, (1) voters whose positions coincide at some stage coincide at all later stages, and (2) the ordering of the voters' opinions is preserved over time.
Proof. If the positions of voter i and voter j coincide at stage t, then they have identical confidence sets and the system dynamics yields the same positions for i and j at stage t + 1. For (2), a similar comparison of the confidence sets of voters i and j with x_i^t ≤ x_j^t shows x_i^{t+1} ≤ x_j^{t+1}.
The analogous statement for the DG-model is wrong in general. Next we observe that one or, usually, a whole set of optimal controls exists. The number of convinced voters at any stage is trivially bounded from above by the total number of voters |I|; hence, to every control there corresponds a bounded, integer-valued number of convinced voters. With some technical effort an explicit bound can be computed.
First, we observe that there are some boundary effects. Consider a single voter with start value x_1^0 = 1/2 and ε = 1/2. Further suppose that the conviction interval is given by [0, δ], where δ is small. The most effective way to move the opinion of voter 1 towards 0 is to place a control at 0 at each stage. With this we obtain x_1^t = 2^{-(t+1)} for all t. Thus the time until voter 1 can be convinced depends on the length δ of the conviction interval. This is due to the fact that we cannot place the control at x_1^t − ε if x_1^t is too close to the boundary. The same reasoning applies to the other boundary at 1. In order to ease the exposition and the technical notation we assume in the following lemma that no opinion is near the boundaries.
Lemma 4.2. Consider an instance of the BC-model such that the start values and the conviction interval [l, r] are contained in [ε, 1 − ε]. It is possible to select suitable controls at each stage such that after at most 2(n+1)/ε + 2 stages all |I| voters are convinced.
Proof. We proceed in two steps. In the first step we ensure that all voters have the same opinion after a certain number of stages. In the second step we move the coinciding opinions inside the conviction interval.
Without loss of generality, we assume the ordering x_1^0 ≤ · · · ≤ x_n^0. At most n − 1 of the n voters can be inside the confidence set of voter 1, and by Lemma 4.1 their opinions are at least x_1^t. Placing suitable controls therefore yields the same opinion, which also lies inside [ε, 1 − ε], for all voters after at most n + 1 stages, i.e., the first step is completed. In Step 2 we proceed as follows. If x_1^t ∈ [l, r], nothing needs to be done. Due to symmetry we assume x_1^t < l in the following. If l − x_1^t ≥ ε/(n+1), we place a control at x_1^t + ε, so that x_1^{t+1} = x_1^t + ε/(n+1), since all voters influence voter 1. After at most (n+1)/ε stages we have l − x_1^t < ε/(n+1). In such a situation we set the control to x_1^t + (n+1)(l − x_1^t), so that x_1^{t+1} = l. Applying Lemma 4.1 again, we conclude that we can achieve x_i^t = l for all i ∈ I after at most 2(n+1)/ε + 2 stages. Choosing the control as x_i^t we can clearly preserve this configuration at all later stages. Thus, given enough time (number of stages), we can always achieve the upper bound of |I| convinced voters. Setting [l, r] = [1 − ε, 1 − ε] and x_i^0 = ε shows that the stated estimate gives the right order of magnitude in the worst case.
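The arithmetic of Step 2 can be checked with a few lines of exact code; `bc_control_step` is our name for the standard synchronous BC update in which voters also see the control when it lies within their confidence radius:

```python
from fractions import Fraction

def bc_control_step(opinions, control, eps):
    """One synchronous BC update where each voter averages all voter
    opinions plus the control, restricted to its eps-confidence set."""
    new = []
    for x in opinions:
        neigh = [y for y in opinions + [control] if abs(y - x) <= eps]
        new.append(sum(neigh) / len(neigh))
    return new

# Step 2 of the proof: n coinciding voters at x, control at x + eps.
n, eps = 5, Fraction(1, 4)
x = Fraction(1, 8)
opinions = bc_control_step([x] * n, x + eps, eps)
# Each voter averages the n coinciding opinions and the control,
# so every stage gains exactly eps/(n+1):
assert all(o == x + eps / (n + 1) for o in opinions)
```

This confirms the per-stage progress of ε/(n+1) claimed in the proof.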
For the DG-model the upper bound on the time needed to convince all voters depends on the influence w_{i0} of the control on each voter. To this end we define ω = min_{i∈I} w_{i0}, i.e., the tightest possible lower bound on the influences. Since it may happen that no stable state is reached after a finite number of stages, we can only navigate the voters into an interval of positive length.
Proof. By induction over t we prove that |x_i^t − p| ≤ (1 − ω)^t holds for all i ∈ I and t ∈ N if we place the control at position p at all stages. Since x_i^{t+1} − p = Σ_{j∈I} w_{ij}(x_j^t − p) and Σ_{j∈I} w_{ij} = 1 − w_{i0} ≤ 1 − ω, the induction step follows. Thus, given enough time (number of stages), we can always achieve the upper bound of |I| convinced voters if the conviction interval has positive length. Setting p = 1 and x_i^0 = 0 shows that the stated estimate gives the right order of magnitude in the worst case. Using the Taylor expansion of log(1 − ω), and keeping in mind an influence of the control that decreases with the number of voters, we remark that the number of stages needed to navigate all voters into an interval of length δ grows roughly like ω^{-1} log(1/δ), i.e., linearly in the number of voters if the influence decreases like 1/n.
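The geometric contraction towards the control position can be verified numerically; `dg_step` is our illustrative name, and the uniform residual weights are an assumption made only for this example:

```python
def dg_step(opinions, weights, w0, p):
    """One DG update with the control fixed at p:
    x_i' = w_i0 * p + sum_j w_ij * x_j, with w_i0 + sum_j w_ij = 1."""
    n = len(opinions)
    return [w0[i] * p + sum(weights[i][j] * opinions[j] for j in range(n))
            for i in range(n)]

n, p = 3, 1.0
w0 = [0.2, 0.3, 0.25]                       # control influences, omega = 0.2
weights = [[(1 - w0[i]) / n] * n for i in range(n)]  # uniform residual weights
omega = min(w0)

ops = [0.0, 0.5, 0.25]
for t in range(1, 11):
    ops = dg_step(ops, weights, w0, p)
    # geometric contraction towards the control position p
    assert max(abs(x - p) for x in ops) <= (1 - omega) ** t + 1e-12
```

The assertion inside the loop is exactly the bound |x_i^t − p| ≤ (1 − ω)^t from the proof.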

Computational Information on Optimal Controls
How can one find reliable information on optimal controls and their resulting optimal achievements? That is, for a specific instance like our benchmark instance, how can we find out how many convinced voters are achievable for a given horizon N? It is certainly not possible to try all possible controls and pick the best: there are uncountably many feasible controls, because every element of [0, 1]^N constitutes a feasible control. On the other hand, some logical constraints are immediate without any enumeration: it is impossible to achieve more convinced voters than there are voters.
A common technique to go beyond such trivial statements without complete enumeration is to devise a mathematical model and find solutions to it by exact methods. Exact methods rest on generic logical reasoning or on provably correct computational information about the solutions of a mathematical model. 9 In this section, we use mixed integer linear programming (MILP) for modeling the DG and the BC optimal control problems.
While the models are generically correct, concrete computational results will only be given for our benchmark problem and related data. One big advantage of MILP models is that commercial off-the-shelf software can provide solutions to such models regardless of what they represent. There is even free academic software that provides solutions stunningly fast.
In this section, we will not spell out the formulae of our models explicitly. 10 Instead, we try to emphasize the features of our approach.
First, an optimal solution to an MILP model is globally optimal; that is, no better solution exists anywhere in the solution space. Second, if an optimal solution to an MILP model is reported by the solver software, we can be sure (within the bounds of numerical accuracy) that it is indeed optimal, i.e., the method is exact. Third, if an optimal solution to an MILP cannot be found in reasonable time, then very often we still obtain bounds on the value of the (otherwise unknown) optimal solution. And fourth, the process of constructing an MILP model is, as usual, non-unique: there are usually many substantially different ways to build an MILP model for the same problem, and one may provide solutions faster or for larger instances than another.
We built an MILP model for the DG optimal control problem and two MILP models for the (much more difficult) BC optimal control problem.
The MILP model for the DG optimal control problem can be classified as a straightforward MILP: the system dynamics is linear and therefore fits the modeling paradigm of MILP very well. The only minor complication is modeling the number of convinced voters, which is a non-linear, non-continuous function of the voters' opinions. Since binary variables are allowed in MILP, such functions can in many cases be constructed using the so-called "Big-M method." Details are described in Appendix A.1.
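The Big-M linearization of "voter i is convinced" can be sketched in a few lines; the function below is an illustration of the constraint logic only, not our actual ZIMPL model, and since all opinions live in [0, 1], M = 1 is large enough:

```python
def convinced_indicator_feasible(x, z, l, r, M=1.0):
    """Big-M linearization of 'z = 1 implies l <= x <= r' for z in {0, 1}.
    With opinions in [0, 1], any violation l - x or x - r is below 1,
    so M = 1 makes both constraints vacuous whenever z = 0."""
    return x >= l - M * (1 - z) and x <= r + M * (1 - z)

l, r = 0.375, 0.625
# z = 1 is feasible exactly when x lies in the conviction interval:
assert convinced_indicator_feasible(0.5, 1, l, r)
assert not convinced_indicator_feasible(0.7, 1, l, r)
# z = 0 is always feasible, so a solver maximizing the sum of the z's
# sets z = 1 whenever the opinion allows it:
assert convinced_indicator_feasible(0.7, 0, l, r)
</```

In the MILP, the objective then simply maximizes the sum of the binary z-variables.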
An MILP model for the BC optimal control problem is not straightforward at all. Since the system dynamics depends on whether or not some voter is in the confidence interval of another, we have to decide at some point whether two voters' distance is ≤ ε or > ε. It is a general feature that strict inequalities cause trouble for MILP modeling, and the distinction would be numerically unstable anyway (most MILP solvers use floating-point arithmetic, see Section 3). Thus, we refrained from trying to build a single exactly correct MILP model for the BC optimal control problem. Instead, we built two complementary MILP models. Without going into the details, we again explain only the features. In the first MILP model, the primal model, any feasible solution defines a feasible control, which achieves, when checked with exact arithmetic, at least as many convinced voters as predicted in the MILP model. The second MILP model, the dual model, is a kind of relaxation: no control can convince more voters than the number of convinced voters predicted for any of its optimal solutions. This is implemented by using a safety margin ε̂ for the confidence interval. In the models, it is required for any feasible control that it leads, at all times, to differences between any pair of voters that are either ≤ ε or ≥ ε + ε̂. If ε̂ > 0, we obtain a primal model, since some originally feasible controls are excluded because they lead to differences between voters that are too close to the confidence radius ε. If ε̂ ≤ 0, we obtain a dual model, where the requirements "are within distance ε" and "are at distance at least ε + ε̂" overlap, so that the solution with the better objective function value can be chosen by the solver software. If we put the information from both models together, we can achieve more: if the optimal numbers of convinced voters coincide in both models, then we have found the provably optimal achievable number of convinced voters, although we had no single model for it. Otherwise, we at least obtain upper and lower bounds. Moreover, any lower bound on the number of convinced voters predicted by the primal model is a lower bound on the number of achievable convinced voters, and any upper bound on the number of convinced voters predicted by the dual model is an upper bound on the number of achievable convinced voters.
9 An example of generic logical reasoning can be seen in the previous section.
10 Mathematically explicit descriptions, suitable for replicating our results, can be found in Appendix A.
The first MILP model for the BC-model is a basic model with a compact number of variables, along the lines of the DG MILP. However, the system dynamics is discontinuous this time, which requires much heavier use of the Big-M method. MILP experience tells us that too much use of the Big-M method leads to difficulties in the solution process. Since the basic model indeed did not scale well to a larger number of rounds, we engineered alternative models.
In this paper, we restrict ourselves to the so far most successful second model. 11 The resulting advanced MILP model for the BC optimal control problem has substantially more variables but not so many Big-M constructions. Moreover, the advanced model uses a perturbed objective function: its integral part is the predicted number of convinced voters, and one minus its fractional part represents the average distance of the unconvinced voters to the conviction interval. This perturbation was introduced to guide the solution process better. The problem with the unperturbed objective function is that many feasible controls unavoidably achieve identical numbers of convinced voters, because there are simply far fewer distinct objective function values than controls; this is a degenerate situation, which is difficult to handle for the MILP solver. The perturbation makes it very likely that two distinct controls have distinct objective function values.
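The perturbed objective can be sketched as follows; the function name is ours, and the convention for the edge case in which every voter is convinced is our own assumption, since the paper's text does not spell it out:

```python
def perturbed_objective(opinions, l, r):
    """Sketch of the perturbed objective: integral part = number of convinced
    voters; one minus the fractional part = average distance of the
    unconvinced voters to the conviction interval [l, r]. Distances on the
    unit interval stay below 1, so the two parts cannot interfere."""
    convinced = [x for x in opinions if l <= x <= r]
    unconvinced = [x for x in opinions if not l <= x <= r]
    if not unconvinced:
        # edge-case convention of our own; the paper does not spell this out
        return float(len(convinced))
    avg_dist = sum(l - x if x < l else x - r for x in unconvinced) / len(unconvinced)
    return len(convinced) + (1 - avg_dist)

# 2 convinced voters, one voter at distance 0.075 from the interval:
assert abs(perturbed_objective([0.4, 0.5, 0.7], 0.375, 0.625) - 2.925) < 1e-9
```

Reading off the integral part recovers the number of convinced voters, while the fractional part breaks ties between controls that convince equally many voters.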
Our hypothesis was that the advanced model would be easier to solve, which, to a certain extent, turned out to be true. We know of no other method to date that yields more provable and global information about optimal controls for the BC optimal control problem.
In the following we report our computational results. The MILP for the DG-model was very effective: the benchmark problem with homogeneous weights could be solved in seconds. Eleven convinced voters are possible for any horizon and, of course, no more. The control is not crucial here, because homogeneous weights lead to an immediate consensus in the conviction interval. But also for other weights, optimal solutions can be found very fast for all horizons. The real conclusion is that solving the DG optimal control problem on the scale of our benchmark problem is easy, and there are no mind-blowing observations about optimal controls.
Using the basic MILP revealed that the BC-model is in an altogether different ball-park. Table 2 shows the computational results for our basic model with ε̂ = 10^{-5} > 0, yielding provably feasible controls. In order to really prove that no solutions with a better objective (above the upper bound) exist, we would have to rerun the computations with an ε̂ ≤ 0. We skipped this for the basic model and did it only for the advanced model below, since the information obtained by the advanced model is superior anyway.
11 Because of its special characteristics it was accepted for the benchmark suite MIPLIB 2010 (Koch et al. 2011).
Table 3 shows the computational results for our advanced model with an ε̂ = 10^{-5} > 0, i.e., all obtained configurations are feasible: the given controls provably produce the stated objective function values (within the numerical accuracy). For N = 6 and larger, cplex (ILOG 2014) could not find a dual bound with fewer than 11 convinced voters within one hour. With a time limit of 24h, however, the optimum was determined for N = 6, and the optimal number of 11 convinced voters for N = 10 could be found. Moreover, a better feasible control with objective 6.707 for N = 7 and one with objective 8.767 for N = 9, respectively, could be computed in less than 1h = 3600s on a faster computer. 12
Table 3. Results of the advanced MILP model for the benchmark problem with an ε̂ = 10^{-5} > 0 (provably feasible configurations); number of variables/constraints/non-zeroes for the original problem before preprocessing; the time limit was 1h = 3600s; MacBook Pro 2013, 2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3 RAM, OS X 10.9.2, zimpl 3.3.1, cplex 12.5 (Academic Initiative License), branching priorities given according to the stage structure. Remark: for a time limit of 24h, 6 stages can be solved to optimality (optimal value 6.697), and for 10 stages we obtain a solution with 11 convinced voters (objective value between 11.800 and 11.840).
The instance for N = 10 is special in the sense that the trivial bound of 11 convinced voters suffices to prove the "voter"-optimality of a solution with 11 convinced voters. It is still an open problem to find a configuration that provably maximizes the perturbed objective function of the advanced model.
Table 4 shows the computational results for our advanced model with an ε̂ = −10^{-5} < 0, i.e., the obtained configurations may be infeasible: the listed objective function values may differ from the results of the true BC-dynamics applied to the computed controls. However, the set-up guarantees that no feasible configurations exist that produce better objective function values. This time, not even for N = 5 could the optimal value of the dual model be found within 1h. This can be explained: since in the dual model a solution has more freedom to classify "inside confidence interval: yes or no," it is harder for the solver to prove that high objective function values are impossible. On a faster computer 13 we obtained for the dual model optimal values of 6.676 for N = 5 in 2649s and 6.700 for N = 6 in 117 597s, respectively. Thus we know that no more than six voters can be convinced in five or in six stages.
Table 4. Results of the advanced MILP model for the benchmark problem with an ε̂ = −10^{-5} < 0 (capturing all feasible and possibly some infeasible configurations); number of variables/constraints/non-zeroes for the original problem before preprocessing; the time limit was 1h = 3600s; MacBook Pro 2013, 2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3 RAM, OS X 10.9.2, zimpl 3.3.1, cplex 12.5 (Academic Initiative License), branching priorities given according to the stage structure.
Table 5 (columns: number of stages; objective of an optimal BC-control after 1h CPU time; all we know from the MILP) summarizes the results that the advanced model can generate in 1h per computation by combining the primal and the dual computations. For up to 3 stages the value of an optimal BC-control was found. Regarding only the number of achievable voters (the integral part of the objective function), we know that in 4 stages six voters are possible, but no more; there is, however, a small gap between the optimal values of the primal and the dual model. For 5 and more stages, the upper bound could not be pushed below the trivial bound of 11 within 1h of computation time.
We see that MILP modeling requires a lot of effort. We know, however, of no other method to date that can prove global optimality of a control for N = 6 or larger. What can we learn from the provably optimal solutions? We see that
• some controls near the end of the horizon just decrease the average distance of nonconvinced voters to the conviction interval, i.e., they are induced by the perturbation of the objective function;
(Figure: trajectories produced by solutions of the advanced MILP for ε̂ = 10^{-5} > 0 (provably feasible configurations) for 6 stages (optimal objective) and 10 stages (optimal number of convinced voters) within a time limit of 24h.)
• the controls of the earlier stages, however, always try to pull a voter as far as possible, i.e., they are placed exactly at confidence distance to some selected voter.
The second observation is exactly what is used in Section 6 as an idea for a clever heuristics.
5.1. Computational Results on Random Instances. Of course, the choice of the particular benchmark problem is completely arbitrary. We want to provide additional evidence for the observation that DG control is easier than BC control and that the advanced model performs better than the basic one. To this end, we tested all models on random instances. We chose a uniform distribution of start opinions in the unit interval. The conviction interval was constructed by choosing a party opinion uniformly at random in the unit interval and a conviction distance uniformly at random between 0.1 and 0.2. The conviction interval then contained all values with distance at most the conviction distance from the party opinion, cut off at zero and one, respectively. The confidence distance was chosen uniformly at random between 0.1 and 0.2 as well. We drew five samples of random data and ran cplex (ILOG 2014) with a time limit of one hour on the resulting instances with one up to ten stages. Table 6 shows the results for an ε̂ = 10^{-5} > 0; we again skip the dual computation with ε̂ = −10^{-5} < 0. One very interesting phenomenon can be identified in Sample 5: the number of convincible voters is not monotonically increasing with the number of stages available for control; there can be some unavoidable 'distraction' caused by other voters. While in two stages we can achieve five convinced voters, in three stages no more than three are possible. This is further evidence for the assessment that bounded-confidence dynamics can lead to the emergence of counter-intuitive structures that make the dynamical system appear random and erratic although it is actually deterministic.
(Figure: for N ≥ 5 we show the incumbent solution when the time limit of 1h was exceeded; note the obviously infeasibly splitting trajectories of voters 1 and 2 in Stage 7 for N = 10 stages.)
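The random instance generation described above can be sketched directly; the function name and interface are ours, chosen only for illustration:

```python
import random

def random_instance(n, seed=None):
    """Sample one random instance as described in Section 5.1: uniform start
    opinions, a party opinion with a conviction distance, and a confidence
    distance (names and interface are illustrative, not from the paper)."""
    rng = random.Random(seed)
    starts = [rng.random() for _ in range(n)]        # start opinions in [0, 1]
    party = rng.random()                             # party opinion
    d = rng.uniform(0.1, 0.2)                        # conviction distance
    l, r = max(0.0, party - d), min(1.0, party + d)  # conviction interval, cut off
    eps = rng.uniform(0.1, 0.2)                      # confidence distance
    return starts, (l, r), eps
```

Passing a seed makes the five drawn samples reproducible.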

Heuristics to Find Good Controls
In the previous section we described an MILP formulation of our problem. So in principle one could solve every problem instance with standard off-the-shelf software like cplex (ILOG 2014). In contrast to the DG-model, where we could solve our benchmark problem for any number of stages between 1 and 10 without any difficulty, the instances from the BC-model are harder. Using the MILP formulation of the previous section we were only able to determine the optimal control for up to 6 stages using cplex.
The MILP approach has the great advantage that we receive dual, i.e., upper, bounds for the optimal control. For the primal direction (finding feasible controls) one can equally well employ heuristics, which can determine good controls more efficiently. Note that the MILP approach also benefits from good feasible solutions, especially if they respect fixed variables in the branch-and-bound search tree. In the next subsections we therefore present three heuristics to find good controls.
6.1. The Strongest-Guy Heuristics. What makes the problem hard, apart from the discontinuous dynamics and the numerical instabilities, is the fact that the control x_0^t is a continuous variable at every stage t. Thus, at first sight, the problem is not a finite one. Let us relax the problem a bit by allowing only a finite number of possibilities for x_0^t at any stage and have a closer look at the situation.
By placing a control x_0^t at stage t, some voters are influenced by the control while others are not. We notice that the magnitude of the influence rises with the distance between the voter's opinion x_i^t and the control x_0^t as long as their distance remains below ε. So the idea is: even though we do not know what we are doing, we do it with full strength.
Table 6. Results of the MILP models on random instances; the time limit was 1h = 3600.00s; MacBook Pro 2013, 2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3 RAM, OS X 10.9.2, zimpl 3.3.1, cplex 12.5 (Academic Initiative License), branching priorities given according to the stage structure.
Instead of giving the exact values of x_0^t for all stages t, we can equivalently give a sequence of indices [i_1, i_2, . . . , i_N] of those voters that are pulled at maximum strength. The number of such sequences is finite; therefore, the controls arising this way can, in principle, be enumerated. For our benchmark problem, (a clever implementation of) this enumeration could be carried out: in Table 7 we give, for our benchmark problem, the maximum number of convinced voters that can be achieved by the strongest-guy heuristics, together with the corresponding index vector. 14 The corresponding trajectories are drawn in Figure 6. We observe that for N = 7 stages the strongest-guy heuristics improves the best solution found by the MILP approach, given 1h of computation time, by two additional convinced voters. For N = 4 stages the heuristics misses the optimum of 6 convinced voters by one. Given the upper bounds from Table 5, we can conclude that the strongest-guy heuristics found an optimal solution for N ∈ {0, 1, 2, 3, 5, 6, 10}.
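The enumeration can be sketched as follows. Everything here is an illustrative assumption of ours: `bc_update` is the standard synchronous BC rule, the pull direction towards the midpoint of the conviction interval and the clamp to the opinion space are design choices for this sketch, not the paper's exact implementation:

```python
from fractions import Fraction
from itertools import product

def bc_update(opinions, control, eps):
    """Synchronous BC update including the control as an extra opinion."""
    new = []
    for x in opinions:
        neigh = [y for y in opinions + [control] if abs(y - x) <= eps]
        new.append(sum(neigh) / len(neigh))
    return new

def strongest_guy(starts, eps, l, r, stages):
    """Enumerate all index sequences [i_1, ..., i_N]; at each stage pull the
    chosen voter at full strength, i.e. place the control at exactly
    confidence distance on the side facing the conviction interval."""
    best, best_seq = -1, None
    for seq in product(range(len(starts)), repeat=stages):
        ops = list(starts)
        for i in seq:
            target = (l + r) / 2
            pull = ops[i] + eps if ops[i] < target else ops[i] - eps
            pull = min(max(pull, Fraction(0)), Fraction(1))  # stay in [0, 1]
            ops = bc_update(ops, pull, eps)
        score = sum(1 for x in ops if l <= x <= r)
        if score > best:
            best, best_seq = score, list(seq)
    return best, best_seq

# tiny illustrative instance (not the paper's benchmark): 3 voters, 2 stages
starts = [Fraction(0), Fraction(1, 2), Fraction(1)]
best, seq = strongest_guy(starts, Fraction(1, 4), Fraction(3, 8), Fraction(5, 8), 2)
assert best >= 1 and len(seq) == 2
```

Exhaustive enumeration costs n^N evaluations, which is why a clever implementation is needed for the benchmark sizes.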
We can improve upon these findings by slightly modifying the strongest-guy heuristics. In order to avoid numerical instabilities when computing with floating-point numbers, we had experimented with the positions x_i^t + ε − δ and x_i^t − ε + δ for δ = 10^{-6}, i.e., almost full strength. Curiously enough, we obtained better solutions in some cases. For N = 4 stages the δ-modified control sequence [3, 3, 8, 6] yields 6 convinced voters; the corresponding trajectories are drawn in Figure 7. We remark that we are not aware of any further improvements.
We remark that the strongest-guy heuristics can easily be adapted to the situation where some of the 0-1 variables from the MILP formulation (see Appendix A) are fixed to either 0 or 1. Thus, it is possible to install a call-back to the strongest-guy heuristics inside an MILP solution process. In our experience, however, the MILP solver spends most of its time tightening the dual bound.
6.2. A Genetic Algorithm. To get a better idea of the solution space, we implemented a genetic algorithm (GA) to search for optimal solutions. The GA uses standard methods to evolve good solutions from a randomly chosen set of starting strategies. To apply a GA to the problem, the problem has to be formulated in GA terminology. A GA instance consists of genes that form a chromosome. Each chromosome can be evaluated using a fitness function, which determines the chromosome's quality with respect to the original problem. The GA uses alternating steps of evolution and selection to modify the chromosomes and to move the entire population towards a larger number of high-quality chromosomes. By survival of the fittest, the selection process sorts out weak chromosomes with low fitness values, while it retains chromosomes with high fitness values. The remaining chromosomes evolve in the next round of the GA. The following subsections explain the different parts of the GA individually.
6.2.1. Set-Up of the GA. The only free variables in our example problem are the ten positions of our freely selectable voter: these ten positions make up one strategy to control the remaining voters. In the GA, one strategy is encoded as one chromosome, with each of the ten positions occupying one gene. Because of the numerical inaccuracies in standard floating-point implementations (see Section 3), the GA had to use fractions to specify the strategies. Throughout the GA exact arithmetic had to be used, which, since the GA was implemented in Java, made use of the java.math.BigInteger class.
The GA itself was set up to run for 250 rounds or until an optimal control was reached, whichever occurred first. The size of the population was set to 500 chromosomes. Both values could be increased for a higher chance of reaching an optimal solution; however, the computational cost of exact arithmetic is quite high, which led to this acceptable compromise.
6.2.2. Fitness Function. Since in the example we used the conviction interval [0.375, 0.625], the most obvious fitness function is to count the convinced voters. This fitness function is denoted MaxVotes (MV). Figure 8 depicts a typical run of the GA with the MV fitness function. While the MV fitness function provides an exact mapping of the original problem to the GA's fitness formulation, it is not a very good fitness function for a GA, because it is discontinuous and has huge gaps in its range: being a count of convinced voters, it can take integer values only, whereas a GA's fitness function is defined on real values. As a consequence, the fitness function does not provide enough information about a particular strategy. In the figure one can see that, approximately after round 120, almost all remaining strategies yield eight voters, but the GA does not progress further. This indicates a lack of information on the strategies: if all chromosomes evaluate to the same fitness value, the GA has no way to select the fittest chromosomes and advance them to the next generation.
Therefore, we devised several different fitness functions that span the entire range and try to provide additional information about strategies that yield the same number of voters. These fitness functions, however, deviate from the original problem and create a slightly different problem for the GA to solve. Thus, all fitness functions that do not map the original problem directly have to be evaluated with respect to their mapping ability.
The fitness functions fall into three categories.
Weighted Sum: This category of fitness functions calculates a weighted sum of all the voters' final positions. The weight is computed by a given partially defined function that maps a position to a weight. The MV fitness function is a special case of the Weighted Sum class, because it assigns weight 1 to all positions in the conviction interval and weight 0 otherwise. The other functions in this class are DistanceToParty2 (D2P2) and BorderDistanceToAll (BD2A). Both differ from MV in that they assign values between 0 and 1 (a) to positions that are not in the conviction interval, with higher values the closer the position is to the interval, and (b) decreasing values within the interval, the closer the position is to the center of the interval. This makes positions of voters on the very edges of the interval the most favorable. The idea behind evaluating positions within the interval differently is that voters sitting on the edges of the interval have the greatest effect on voters that are not in range yet.
Last Remaining: This class of fitness functions does not evaluate every voter but restricts itself to the convinced voters (counted with weight 1) and the nearest voter that has not reached the conviction interval yet (weighted according to the function); all other voters are assigned weight 0. The fitness function in this category is BorderDistanceToMin (BD2M), which has the same form as BD2A from the Weighted Sum category.
Minimum Distance: The last class of fitness functions does not evaluate all voters' positions but takes into account the distance between the two outermost voters (i.e., the voter with the highest opinion and the voter with the lowest opinion).
Because of the order-preserving property of the model, these two voters do not change throughout a run, which means we can simply use the distance between the voter that started with opinion 0 and the one that started with opinion 1. If the distance is in the range [0, 0.25], which is the size of the conviction interval, the fitness evaluates to the maximum fitness value of 10. If the distance is greater than 0.25, the function computes a value that decreases to 0 with increasing distance. The two functions used in this class are MinimumDistanceBetweenFirstAndLast (MDBFL) and MinimumDistanceBetweenFirstAndLastSquare (MDBFLS), which have a linear and a quadratic slope, respectively. A third function in this class is MinimumDistanceBetweenFirstAndLastToCenterSquare (MDBFL2CS), which accounts for the fact that the group of voters with opinions below 0.5 may not behave symmetrically to the group with opinions above 0.5. This asymmetry can result in the position range not being centered around 0.5 but deviating from that midpoint. Such behavior is undesirable, since the original goal is to get as many voters as possible close to 0.5. MDBFL2CS therefore evaluates the positions of the two outermost voters with respect to their distance to the desired midpoint; the two values are added.
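The contrast between the flat MV function and a smooth variant can be sketched as follows; `max_votes` mirrors MV, while `smooth_fitness` is a stand-in of ours in the Weighted Sum spirit, since the exact formulas of D2P2 and BD2A are not spelled out in the text:

```python
def max_votes(opinions, l, r):
    """MV: count convinced voters. Exact, but integer-valued and flat."""
    return sum(1 for x in opinions if l <= x <= r)

def smooth_fitness(opinions, l, r):
    """Illustrative smooth variant (our stand-in, not the paper's D2P2/BD2A):
    weight 1 inside the conviction interval, decaying linearly to 0 with
    increasing distance outside of it."""
    def w(x):
        if l <= x <= r:
            return 1.0
        return max(0.0, 1.0 - (l - x if x < l else x - r))
    return sum(w(x) for x in opinions)

ops = [0.5, 0.63, 0.9]
l, r = 0.375, 0.625
assert max_votes(ops, l, r) == 1
# the smooth variant distinguishes a near miss (0.63) from a far one (0.9):
assert max_votes(ops, l, r) < smooth_fitness(ops, l, r) < 3
```

The point is that two strategies with the same MV score can still be ranked by the smooth variant, which is exactly the selection information the GA was missing.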
6.2.3. Evolution. After the evaluation phase of the GA, the fittest chromosomes are chosen to advance to the next generation. Different selection algorithms are available; we used two of the standard ones.
Weighted Roulette Selector (WRS): Each chromosome is assigned a probability of advancing to the next round proportional to its fitness. The population for the next round is then chosen by randomly picking chromosomes from the so-called 'roulette wheel' as often as desired. This selection method allows some chromosomes with low fitness values to advance to the next round, which lowers the risk of reaching a local optimum too quickly.
Best Chromosomes Selector (BCS): The BCS sorts the population according to the fitness values and discards the fraction with the lowest fitness values. The ratio of chromosomes to retain is configurable. BCS fosters depth search, with the danger of getting stuck in a local optimum; as an advantage, it progresses much more quickly than WRS. After the chromosomes for the next round have been selected, the GA performs the crossover and mutation operations, whose parameters (mutation percentage, point of crossover) are configurable.
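The BCS step can be sketched in a few lines; the function name and the refill-by-cloning convention are our assumptions for illustration (in the actual GA, crossover and mutation produce the new chromosomes):

```python
import random

def best_chromosomes_select(population, fitness, keep_ratio, rng=random):
    """BCS sketch: sort by fitness, keep the top keep_ratio fraction,
    and refill the population by cloning survivors (illustrative stand-in
    for refilling via crossover and mutation)."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[:max(1, int(len(ranked) * keep_ratio))]
    clones = [rng.choice(survivors) for _ in range(len(population) - len(survivors))]
    return survivors + clones

# with keep_ratio = 0.5 only the better half survives:
out = best_chromosomes_select(list(range(1, 11)), lambda c: c, 0.5)
assert len(out) == 10 and min(out) >= 6
```

A higher `keep_ratio` retains more weak chromosomes and so broadens the search, which is exactly the trade-off examined with the ratios below.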
6.2.4. Results. Figure 9 shows the performance of the different fitness functions. One notable observation is the step-like behavior of the MV fitness function: already around round 30 it reaches eight voters, but it does not advance from there. All the other fitness functions show a much smoother behavior. However, with the exception of BD2A and MDBFLS, all functions seem to have reached a plateau around round 150. Judged by the progression of the fitness values, BD2A promises the best results, as it is still progressing at round 250 and has also reached a high fitness value. However, BD2A optimizes a slightly different problem than the original one; therefore, the performance with respect to the fitness function has to be compared with the number of convinced voters actually achieved. This comparison shows that BD2A (and all other fitness functions alike) does not provide a good mapping of a fitness value to a certain number of voters. This, of course, poses a problem, since the fitness function is the GA's only way of evaluating a strategy. The graph suggests that, apart from MV, only BD2M should be used.
Of the two selectors available, BCS proved to be the more useful. While the WRS worked, the evolution of the population happened very slowly, regardless of the fitness function used. Figure 11 shows the results of the two selectors with the same fitness function (BD2A). Similar results hold for all other fitness functions as well.

Figure 11. Comparison of two different selectors
Because of these findings, the BD2M fitness function has been tested with different values for the best-performing selector, BCS. The selector allows configuring the fraction of chromosomes that advances from one generation to the next. With a value of 50%, only the better half of the chromosomes advances. This leads to an extremely narrow search that runs a high risk of lingering at a local optimum. To increase the chances of leaving a local optimum, the percentage should be increased. In the simulation, runs with ratios from the set {0.5, 0.6, 0.7, 0.75, 0.80, 0.85, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99} have been used. Of these runs, only those with ratios above 0.75 resulted in stable evolution patterns; with lower ratios, the fitness values alternated between very low and very high values and did not converge.
Above 75%, all runs converged, with an optimal rate at around 95%. The runs with ratios of 0.95 and 0.96 were the ones that produced strategies yielding nine convinced voters. Figure 12 shows these two runs and the number of voters that each chromosome generated. Altogether, the GA provides a heuristic to find optimal (or near-optimal) solutions. However, because of the problem structure, the GA did not find an optimal solution for the control problem: the best strategies it could find yielded nine convinced voters in the example setting. Strategies yielding nine voters were extremely rare and could only be obtained in settings with highly tuned parameters. This result seems to indicate that good solutions are very sparse in the solution space and possibly occupy a very restricted region of it.
6.3. The Model Predictive Control Heuristic. The observation that our benchmark problem can be solved to optimality if restricted to just a few stages suggests the following receding-horizon control heuristic (RHC), also known as model predictive control (MPC).
(1) Choose the MPC-horizon N̂ ≤ N for which the optimal control problem (the MPC-auxiliary problem) can be solved to optimality.
(2) Set the state of Stage 0 to the given start configuration of opinions.
(3) Solve the MPC-auxiliary problem over the next N̂ stages to optimality.
(4) Apply the first control.
(5) Change the start configuration to the resulting configuration in Stage 1.
(6) Go to 2.
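The loop above can be sketched generically. In this Python sketch, `solve_to_optimality` and `apply_control` are placeholders of our own (standing in for the exact MILP solver and for the opinion dynamics, respectively); they are not part of the paper's implementation.

```python
def receding_horizon_control(start_state, total_stages, mpc_horizon,
                             solve_to_optimality, apply_control):
    """Receding-horizon (MPC) heuristic: repeatedly solve a short,
    exactly solvable auxiliary problem, commit only its first control,
    simulate one stage forward, and re-plan from the new state.

    solve_to_optimality(state, horizon) -> list of controls (length horizon)
    apply_control(state, u)             -> next state
    """
    state = start_state
    controls = []
    for stage in range(total_stages):
        horizon = min(mpc_horizon, total_stages - stage)
        plan = solve_to_optimality(state, horizon)  # optimal on the short horizon
        u = plan[0]                                 # keep only the first control
        controls.append(u)
        state = apply_control(state, u)             # one stage of the dynamics
    return state, controls
```

Only the first control of each short-horizon plan is ever executed; everything after it is discarded and recomputed, which is precisely why the overall control can be suboptimal even though each sub-plan is optimal.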
Although this method uses exact optima of a related optimal control problem, it yields in general a suboptimal control. The hope is that its performance is in many cases not too far from the optimal objective value.
For our computational results we used the advanced MILP to solve the successive MPC-auxiliary problems. To simplify the setup, we implemented MPC in the following form: solve the MPC-auxiliary problem, fix the first control to its solution value, then increase the number of stages of the MPC-auxiliary problem by one and repeat. This way, the opinions in past stages are recomputed from scratch in each iteration based on the fixed controls, which makes no difference to the resulting number of convinced voters. Because rounded intermediate results (the fixed controls for earlier stages) are used, we observed some infeasible problems; perturbing the control values by ±10^-6 cured this in all cases.

Table 8. Results of the MPC heuristic for various shorter horizons based on the advanced MILP model applied to the benchmark problem with 10 stages; MacBook Pro 2013, 2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3 RAM, OS X 10.9.2, zimpl 3.3.1, cplex 12.5 (Academic Initiative License), branching priorities given according to the stage structure.

Table 8 shows the computational results for the MPC heuristic. We see that it is unable to find the optimal control with 11 convinced voters in 10 stages. This indicates that the control leading to 11 convinced voters must use control sub-sequences of length at least five that are suboptimal on the shortened horizon. This is another hint that optimal controls exhibit nontrivial structures.
The conclusion of this section on heuristics is that an intelligent, tailor-made combinatorial heuristic works best on our very special benchmark problem. The reason is most probably that, in the fitness landscape of this benchmark problem, solutions with more than nine convinced voters are rare and well hidden.
For a more general assessment, all methods should compete on a complete benchmark suite with random perturbations of parameters. This, however, goes beyond the purpose of this paper, in which we wanted to advertise the field of optimal opinion control and show some possible directions of future research.

Interpretation of Results
The computational results are twofold: we have collected results about optimal controls, including their performance (how many convinced voters are possible?), and results about models and solution methods, including their effectiveness (how tight is the information they compute?) and efficiency (how quickly do they compute that information?).
The results about models and solution methods confirm that the BC optimal control problem is much more involved than the DG optimal control problem. BC optimal control must be modeled with care: the advanced model performs much better than the basic model in all tests. It is the method that so far provides the tightest information on the performance of optimal controls in all our benchmark problems. Still, our ability to find provably optimal controls is limited. Using the fast strongest-guy heuristic, we could find upper bounds on the number of achievable convinced voters that are tight for our benchmark problem in all known cases.
The results about optimal controls themselves confirm that BC dynamics causes many interesting effects: neither is the number of achievable convinced voters monotone in time, nor does every optimal campaign for N stages constitute an optimal campaign for any smaller number of stages. In other words, enlarging the time horizon in an otherwise unchanged campaign problem requires an entirely new solution. Though this sounds plausible for the real world, we do not claim that this conclusion can safely be transferred to real-world campaign planning; what we rather state is that it takes no more than the mechanism of BC opinion dynamics to let this effect emerge.

Conclusion and Outlook
We have introduced the problem of optimal opinion control by simply allowing one individual to freely choose its advertised opinion in each stage. All efforts to find optimal controls in a small example instance showed that the structure of optimal controls is complicated. Modeling with MILP techniques is possible, but even sophisticated models are hard to solve. For the campaign problem with eleven voters and one through ten stages, an optimal control remains open for seven through nine stages. Ten stages could be solved by the strongest-guy heuristic, which is able to convince all eleven voters in ten stages. Popular meta-heuristics like genetic algorithms and model predictive control could not find this solution. The fact that even this small campaign problem is still largely a mystery makes its investigation interesting for further research. Other directions are the generalization to more than one controller (game theory) and to multi-dimensional opinion spaces.
Appendix A. The MILP Models

In this appendix, we present and briefly explain the detailed Mixed Integer Linear Programming (MILP) models used in this paper. This should allow the interested reader to replicate our results.
The chosen modeling technique is, not surprisingly, much more powerful for the DG-model than for the BC-model: the former profits from the linear system dynamics, whereas the latter suffers from the highly discontinuous system dynamics and from numerical instability. More specifically, our model is not able to represent the original problem exactly. We will, however, provide two models: one is correct in the sense that every upper bound on the objective value of the model is an upper bound on the optimal number of convinced voters in the original problem (but possibly not vice versa); the other is correct in the sense that any feasible solution to it is a feasible solution to the original problem (but possibly not vice versa).
The motivation for using integral variables in a model for our optimal control problem is that the dynamics mainly depends on the structure of the confidence sets and the conviction sets: we can use binary variables to indicate what the conviction sets and the confidence sets, respectively, look like.
Since the BC-model requires some experience in modeling with MILPs, we start with a model for the DG optimal control problem. Once the main principles are explained, we then present an MILP model for the BC dynamics.
A.1. An MILP Model for the DG Optimal Control Problem. In this section we present a Mixed Integer Linear Programming (MILP) model for the solution of the DeGroot optimal opinion control problem. We start with the DeGroot dynamics in order to explain some crucial MILP modeling techniques in this simpler setting. These techniques will be used extensively for the bounded-confidence dynamics, in which the logic is considerably more complicated.
The following MILP model is based on standard MILP modeling techniques. We first list the variables of the model.
• The continuous variables x_0^t ∈ [0, 1], t = 0, 1, . . . , N − 1 denote the positions in opinion space where we place a control in the various stages; these are the variables that we are really after.
• The continuous variables x_i^t ∈ [0, 1], i ∈ I, t = 0, 1, . . . , N denote the positions of the voters in the various stages; these variables measure the system states. The variables in Stage 0 are given as input data (start state / start value).
• For each voter, we want to measure whether its position in Stage N is inside the conviction interval; to this end, we use binary variables z_i ∈ {0, 1}, i ∈ I, with the following meaning: z_i = 1 if and only if i is convinced in Stage N, i.e., x_i^N ∈ [ℓ, r].

With this, we may formulate the goal of the model: we want to maximize the number of convinced voters, i.e., maximize Σ_{i∈I} z_i. Now, the success-measuring variables z_i have to be coupled with our decisions x_0^t via the system states and the system dynamics. A linear side constraint, the DeGroot update equation in each stage, couples the decisions to the system states. So far, we have not restricted the binary variables: a solver would simply set them all to 1 and achieve an objective value of n (all convinced), because the binary variables so far have nothing to do with the underlying dynamical system.
The binary variables can now be coupled to the system state variables in Stage N by a standard MILP modeling trick as follows. The logical implication must be: if z_i = 1, i.e., if we want to count a voter as convinced, then ℓ ≤ x_i^N ≤ r must hold. In other words, the inequalities ℓ ≤ x_i^N ≤ r may be violated when z_i = 0, but they must be satisfied whenever z_i = 1. Thus, whether or not we demand the restriction ℓ ≤ x_i^N ≤ r depends on the value of a variable. We call such a conditional restriction a variable-conditioned constraint and write it as ℓ ≤ x_i^N ≤ r vif z_i = 1. The MILP modeling trick can transform such a variable-conditioned constraint into a set of unconditioned constraints whenever the violation of the variable-conditioned constraint is bounded.
We show the transformation for the inequality ℓ ≤ x_i^N; the other inequality can be handled analogously. The maximal violation of the inequality ℓ − x_i^N ≤ 0 is ℓ, since ℓ − x ≤ ℓ for all x ∈ [0, 1]. That means the inequality ℓ − x_i^N ≤ ℓ holds trivially, no matter where x_i^N lies in [0, 1]. We want to impose the trivial inequality ℓ − x_i^N ≤ ℓ whenever z_i = 0 and the non-trivial inequality ℓ − x_i^N ≤ 0 whenever z_i = 1. This can be achieved in one step by imposing the inequality ℓ − x_i^N ≤ ℓ(1 − z_i). The analogously derived inequality for the right border of the conviction interval reads x_i^N − r ≤ (1 − r)(1 − z_i). The complete MILP then maximizes Σ_{i∈I} z_i subject to the system dynamics, the two linearized inequalities above, and x_i^t ∈ [0, 1] for all t = 0, 1, . . . , N, i ∈ I ∪ {0}.
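The effect of the linearized left-border inequality can be verified numerically. The following Python sketch (with an assumed illustrative value ℓ = 0.4, not from the paper's benchmark data) checks that the single inequality is vacuous for z = 0 and binding for z = 1:

```python
def left_border_ok(x, z, ell):
    """Linearized form of the variable-conditioned constraint
    'ell <= x vif z == 1': the single inequality
        ell - x <= ell * (1 - z)
    is vacuous for z = 0 (since ell - x <= ell for all x in [0, 1])
    and enforces x >= ell for z = 1."""
    return ell - x <= ell * (1 - z)

# z = 0: the constraint never binds anywhere on [0, 1]
assert all(left_border_ok(x / 10, 0, 0.4) for x in range(11))
# z = 1: the constraint binds exactly when x < ell
assert left_border_ok(0.5, 1, 0.4)
assert not left_border_ok(0.3, 1, 0.4)
```

This is the standard big-M construction with the tightest possible constant, namely the maximal violation ℓ.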
In the following we will no longer spell out the results of such transformations. Instead, we will state the variable-conditioned constraints literally in order to make the logic easier to decipher. The above MILP with literally expressed variable-conditioned constraints maximizes Σ_{i∈I} z_i subject to the system dynamics, the constraint ℓ ≤ x_i^N ≤ r vif z_i = 1, and x_i^t ∈ [0, 1] for all t = 0, 1, . . . , N, i ∈ I ∪ {0}.
The reader should bear in mind that MILP models with additional variable-conditioned constraints with bounded violation can be transformed into true MILP models. Thus, such models are accessible to standard MILP solvers like cplex (ILOG 2014). Modeling languages like zimpl (Koch 2004) even support variable-conditioned constraints directly. The MILP for the DG-model can be solved efficiently by off-the-shelf software like cplex. In particular, solving our benchmark problem for any number of stages between 1 and 10 is possible. For example, 11 convinced voters are possible within only one round for uniform weights, and this does not even need the help of a control.
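The uniform-weight remark is easy to check directly: with uniform weights, one DeGroot round moves every voter to the common average, which lies inside any conviction interval containing the mean. A minimal Python sketch, assuming for illustration 11 voters spread evenly over [0, 1] (the exact benchmark start configuration may differ):

```python
def degroot_step_uniform(opinions):
    """One DeGroot update with uniform weights and no control:
    every voter moves to the average of all current opinions."""
    avg = sum(opinions) / len(opinions)
    return [avg for _ in opinions]

# 11 voters spread evenly over [0, 1] (assumed start configuration)
start = [i / 10 for i in range(11)]
after = degroot_step_uniform(start)
# after a single round everybody sits at the common mean 0.5, hence
# inside any conviction interval [ell, r] that contains the mean
assert all(abs(x - 0.5) < 1e-9 for x in after)
```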
A.2. MILP Models for the BC Optimal Control Problem. The optimal control problem in bounded-confidence dynamics is discontinuous, thus non-linear. Nevertheless, one can construct an MILP model for it by using variable-conditioned constraints in a similar way as in the previous section.
We introduce a real parameter ε̂ (meant to be of small absolute value; in our computational experiments we chose ε̂ = ±10^-5) with the following meaning: whenever j is not in the confidence interval of i, then |x_i − x_j| ≥ ε + ε̂ must hold. For ε̂ > 0, this is stronger than the original condition, which is: if j is not in the confidence interval of i, then |x_i − x_j| > ε must hold, and vice versa. This original condition is a strict inequality that cannot be handled directly in MILPs, and a transformation to a different MILP (in modified, so-called homogeneous variables) is usually numerically highly unstable.
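The role of ε̂ at the boundary can be made concrete with a small Python check (the function and its interface are our own illustration, with assumed values ε = 0.2 and a pair at distance exactly ε):

```python
def consistent_classification(xi, xj, influencing, eps, eps_hat):
    """Check a binary influence classification against the modified
    condition: a pair classified as non-influencing must keep a
    distance of at least eps + eps_hat, while an influencing pair
    must be within distance eps."""
    d = abs(xi - xj)
    return d <= eps if influencing else d >= eps + eps_hat

# a boundary pair at distance exactly eps = 0.2
# eps_hat > 0: only the 'influencing' classification is consistent
assert consistent_classification(0.3, 0.5, True, 0.2, 1e-5)
assert not consistent_classification(0.3, 0.5, False, 0.2, 1e-5)
# eps_hat < 0: the solver may pick either classification at the boundary
assert consistent_classification(0.3, 0.5, True, 0.2, -1e-5)
assert consistent_classification(0.3, 0.5, False, 0.2, -1e-5)
```

This mirrors the dichotomy described next: a strictly positive ε̂ tightens the model (feasibility side), a non-positive ε̂ relaxes it (upper-bound side).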
With the modified condition we can choose either to exclude potentially feasible solutions (this happens for ε̂ > 0) or to include potentially infeasible solutions (this happens for ε̂ ≤ 0). In the latter case, we allow the optimization algorithm to choose freely, in particular, whether or not j is in the confidence interval of i whenever |x_i − x_j| = ε.
Thus, the model needs to be applied twice: for the identification of feasible solutions we need to set ε̂ to something strictly positive, and for the determination of upper bounds on the optimal number of convinced voters we need to set ε̂ to at most zero. The larger the absolute value of ε̂, the more robust the conclusions are against rounding errors.
If we run the model only once with, e.g., ε̂ = 10^-5, we allow for a small inaccuracy in the upper bound obtained by the model. Such an inaccuracy cannot be avoided when a standard MILP solver is used: the most powerful solvers like cplex (ILOG 2014), xpress, or gurobi use bounded-precision floating-point arithmetic, and an accuracy of 10^-6 is a common setting. Any solution that we miss this way, however, would be non-robust in the sense that a slight deviation from the system dynamics would lead to a different objective.
There are several modeling options, of which we present two. The first model extends the MILP for the DG-model, using similar techniques to a much larger extent. We could solve it with the standard MILP solver cplex up to N = 4 in less than an hour; for N = 5, the solver could not even get close to a proven optimal solution in weeks. The second model is a carefully engineered, more complicated system that incorporates some experience with MILP techniques. With the second model we were able to solve the benchmark problem up to N = 6 with cplex.
We suspect that the solution for N = 7 and above requires tailor-made MILP models and solution techniques.
For both our models, we assume that all voters are numbered according to their starting opinions, i.e., i < j implies x_i^0 ≤ x_j^0 for i, j ∈ I. This saves some work, since by Lemma 4.1 the order of voters in the opinion space never changes.
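The order-preservation property behind this numbering convention can be observed in a direct simulation of one bounded-confidence step. A minimal Python sketch with assumed illustrative values (ε = 0.2, a control at 0.6, five voters; none of these numbers are the paper's benchmark data):

```python
def bc_step(opinions, control, eps):
    """One bounded-confidence update: each voter moves to the average
    of all opinions (other voters and the control) that lie within
    distance eps of its own opinion."""
    everyone = list(opinions) + [control]
    new = []
    for xi in opinions:
        peers = [xj for xj in everyone if abs(xi - xj) <= eps]
        new.append(sum(peers) / len(peers))
    return new

start = [0.05, 0.2, 0.35, 0.5, 0.8]          # already sorted by index
after = bc_step(start, control=0.6, eps=0.2)
# consistent with Lemma 4.1: the order of voters in opinion space
# is preserved by the update
assert after == sorted(after)
```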
Our first, basic model uses the following variables:
• The control variables x_0^t, t = 0, 1, . . . , N − 1 model the opinions published by the controller, as above. These are the only independent decision variables; the remaining variables are dependent measurements used to compute the objective function.
• The state variables x_i^t, i ∈ I, t = 0, 1, . . . , N measure the opinion of agent i in Stage t, as above.
• The binary indicator variables v_{i,j}^t are one if and only if voters i < j are within distance ε of each other, i.e., they influence each other.
• The binary indicator variables l_i^t, r_i^t, and c_i^t are one if and only if the control in Stage t is strictly to the left by a margin of at least ε̂, strictly to the right by a margin of at least ε̂, or inside the confidence interval of voter i, respectively.
• The binary indicator variables z_i, i ∈ I, are one if and only if voter i is within the conviction interval [ℓ, r] in the final Stage N, as above.
• The measurement variables x̄_{0,i}^t, i ∈ I, t = 0, 1, . . . , N, denote the contribution of the control opinion x_0^t in the system dynamics formula in Stage t; this variable must equal x_0^t if the control is in the confidence interval of voter i, and zero otherwise.
• The measurement variables x̄_{j,i}^t, i, j ∈ I, t = 0, 1, . . . , N, denote the contribution of the voter opinion x_j^t in the system dynamics formula of voter i in Stage t; this variable must equal x_j^t if that opinion is in the confidence interval of voter i, and zero otherwise.
• The count variables k_i^t, i ∈ I, t = 0, 1, . . . , N − 1, denote the number of voters in the confidence interval of voter i in Stage t.

With this set-up and the aforementioned use of linearized variable-conditioned constraints, we can formulate the following basic model. The logical details are explained right after the presentation of the MILP.
[Basic MILP model: objective (21) and constraints (22) through (37).]

The objective function (21) counts the number of voters in the conviction interval in Stage N. Restriction (22) sets the positions of the opinions in Stage 0 to the given start values. Constraint (23) demands (together with the fact that all involved variables are binary) that exactly one of the variables l_i^t, r_i^t, c_i^t must be one: the control is either strictly to the left of, strictly to the right of, or inside the confidence interval of voter i in each Stage t. With restrictions (24) and (25) we request that whenever c_i^t = 1, the distance between the control and voter i is no more than ε, so that the control is really inside i's confidence interval. In contrast to this, inequalities (26) and (27) make sure that whenever r_i^t = 1 resp. l_i^t = 1, the control must be to the right resp. to the left with a distance of at least ε + ε̂ from voter i, so that the control is really outside i's confidence interval. Restrictions (28) and (29) make sure in a similar way that the value v_{i,j}^t correctly indicates whether or not i and j are in each other's confidence intervals. A case distinction between a large distance to the left or to the right is unnecessary here, because the order of all voters' opinions, reflected by the indices, stays fixed throughout the process. Constraint (30) sets k_i^t to the number of opinions in the confidence interval of voter i. Constraints (31) and (32) compute how much the control's opinion contributes to the next opinion of voter i: this is either the control's opinion in case c_i^t = 1, or zero in case c_i^t = 0. Similarly, constraints (33) and (34) compute the contribution of voter j to the next opinion of voter i depending on the value of v_{min(i,j),max(i,j)}^t. Depending on how many opinions are in the confidence interval of voter i, restriction (35) then computes its next opinion.
Constraints (36) and (37) make sure that the classification of voter i as convinced, stored in variable z_i, is consistent with the distance of i's opinion to the conviction interval.
Our second, more sophisticated model uses the following variables:
• The control variables x_0^t, t = 0, 1, . . . , N − 1 are as above.
• Similarly, the state variables x_i^t, i ∈ I, t = 0, 1, . . . , N are as above.
• For j_min, j_max ∈ I and c_l, c_r ∈ {0, 1}, we introduce variables v_{i,(j_min,j_max;c_l,c_r)}^t, where v_{i,(j_min,j_max;c_l,c_r)}^t = 1 if and only if the following holds: j_min is the minimal index of a voter in the confidence interval of i, j_max is the maximal index of a voter in the confidence interval of i, index c_l = 1 if and only if x_0^t ≥ x_i^t − ε (i.e., the control is not to the left of the confidence interval of voter i), and index c_r = 1 if and only if x_0^t ≤ x_i^t + ε (i.e., the control is not to the right of the confidence interval of voter i). In particular, all variables v_{i,(j_min,j_max;0,0)}^t must be zero. The motivation for these variables is that they indicate the unique combinatorial confidence configuration (j_min, j_max; c_l, c_r) of a voter: if v_{i,(j_min,j_max;c_l,c_r)}^t = 1, then we know by Lemma 4.1 that all voters j ∈ I with j_min ≤ j ≤ j_max influence i, and that the current control influences i if and only if c_l = c_r = 1. In MILP language, these are assignment variables that assign to each voter a unique combinatorial confidence configuration.
• For j_min, j_max ∈ I, we introduce variables p_(j_min,j_max), where p_(j_min,j_max) = 1 if and only if the following holds: j_min is the minimal index of a voter in the conviction interval in Stage N, and j_max is the maximal index of a voter in the conviction interval in Stage N. The motivation for these variables is that they indicate the unique combinatorial conviction configuration (j_min, j_max) in the final stage: if p_(j_min,j_max) = 1, then the number of convinced voters in Stage N is simply j_max − j_min + 1.
With the variables above, a logically consistent model can be formulated that can solve the benchmark instance up to N = 5. Some additional engineering effort was required in order to help cplex (ILOG 2014) obtain the optimal value for N = 6 as well. For this, we need the following auxiliary variables.
• For each voter i ∈ I and each Stage t = 1, . . . , N, we introduce measurement variables λ_i^t and ρ_i^t denoting the left and right distances of voter i to the conviction interval [ℓ, r]. The motivation for these variables is that they provide a continuous measurement of how close we are to convincing more voters in Stage t + 1. Thus, with these variables we can perturb the objective function to reduce the dual degeneracy of the model, i.e., solutions with identical original objective value up to Stage t have distinct perturbed objective values, hinting at which solution has better chances to improve in the later stages.
• For i, j ∈ I with i < j and t = 0, 1, . . . , N, we introduce binary indicator variables u_{i,j}^t with the following meaning: u_{i,j}^t = 1 if and only if in Stage t the confidence interval of voter j contains voter i; this is the case if and only if in Stage t the confidence interval of voter i contains voter j. The motivation for these variables is, first, to transfer this symmetry relation to a relation among combinatorial confidence configurations and, second, that branching on these additional variables leads to more balanced subproblems than branching on the variables for the combinatorial confidence configurations.
• In the same spirit, we introduce for i ∈ I and t = 0, 1, . . . , N binary indicator variables s_i^t with the following meaning: s_i^t = 1 if and only if in Stage t the control is in the confidence interval of voter i. The motivation is again that a more balanced branching is possible.
The main term of the objective (41) determines the number of convinced voters with the help of the variables p_(j_min,j_max), where p_(j_min,j_max) is one if and only if j_min is the minimal and j_max the maximal index of a convinced voter. The perturbation (42) adds one to this number and subtracts a penalty term less than one; the penalty is essentially the normalized average distance of the non-convinced voters to the conviction interval. The motivation for this perturbation is that the standard solver, when branching on variables with increasing stage index, gets a chance to identify those partial solutions up to a stage that have (heuristically) greater chances to increase the number of convinced voters in future stages. This influences which branches are inspected first and can lead to faster identification of good primal solutions. Restriction (43) fixes the start values, as in the basic model. Constraint (44) demands that exactly one confidence configuration is selected for each voter in each stage. Constraints (45) through (52) make sure that the selection of confidence configurations is consistent with the opinions and their distances (analogously to the basic model). Constraint (53) models the fact that there can be at most one conviction configuration at the end; if none of the possible conviction configurations is selected, then no voter is convinced in the end. Restrictions (54) and (55) make sure that the selected conviction configuration is consistent with the distances of the voters to the conviction interval. The dynamics is represented by restriction (56); note how much simpler the computation of the dynamics becomes with the help of the confidence configuration variables, compared to the basic model. With this, the logic of bounded-confidence control is complete. The remaining restrictions are heuristic add-ons that accelerate the solution process in a standard solver by means of the additional variables.
Constraints (57) through (64) impose bounds on the distances of voters to the conviction interval; taken together, they force the distance variables to exactly those distances. Constraints (65) and (66) make sure that the additional variables u_{i,j}^t receive values consistent with the selected confidence configurations: voters i and j influence each other if and only if one of the confidence configuration variables v_{i,(j_min,j_max;c_l,c_r)}^t and v_{j,(j_min,j_max;c_l,c_r)}^t, respectively, for configurations in which i and j influence each other, is one. The sum is taken over all such configurations, so it does not matter which confidence configuration variable contributes the one. The effect of constraint (67) for the additional variable s_i^t is totally analogous: it is set to one whenever one of the confidence configuration variables of the form v_{j,(j_min,j_max;1,1)}^t is one. Some additional cutting planes are provided by restriction (68), which places the first control value in the left half of the opinion space; this is possible because the benchmark problem is symmetric. Restrictions (69) and (70) pose bounds on how far an opinion can move in a single stage. Finally, constraint (71) explicitly demands that the order of the opinions is consistent with the indices. The remaining constraints (72) through (74) specify the types of the variables.
If one spells out all variable-conditioned constraints as linear restrictions, one obtains the problem class rocII contained in the MIPLIB 2010 (Koch et al. 2011) benchmark suite. The instance rocII-4-11 (11 voters, 4 stages) is classified as "easy", whereas rocII-7-11 is already classified as "challenge" (open problem). The full benchmark problem rocII-10-11 (status "challenge") is also contained in the suite. The MIPLIB 2010 suite is probably the most important test bed used by virtually all developers of standard solvers for tuning their software products, and it may very well be that general MILP research totally unrelated to opinion dynamics will lead to the solution of some of our benchmark instances.
Appendix B. The Parameter Settings for the MILP Solver

simplex tolerance feasibility: 1e-09
simplex tolerance optimality: 1e-3
mip strategy variableselection: 3 (= strong branching)
mip tolerance absmipgap: 1e-3
emphasis numerical: yes
timelimit: 3600 (in the respective cases)

Table 9. The cplex parameter settings that were used for all computations.

Table 9 shows the cplex parameter settings that we used for our computations. This is meant to allow possible replication of our results. There is no reason to believe that these parameter values are the best possible; they have been set based on our general computational experience with MILP.