Model Predictive Control - Part 1: Introduction to MPC (Lasse Peters)

Video Statistics and Information

Captions
Welcome everyone, my name is Lasse Peters, and in today's lecture I'm going to give a brief introduction to model predictive control (MPC).

Before we get started, I want to motivate this problem with an autonomous driving scenario. Imagine that we are controlling a car at an intersection in Bonn and are tasked with safely navigating it across the intersection. Solving this task involves several steps. Typically, we would first run a perception pipeline that detects other cars as well as the drivable area of the road, here marked in green, and a high-level planner would then give us a rough plan; imagine that we want to take a right turn at this intersection. In today's lecture we look at this scene from an abstract perspective: we assume that we are given a reference plan that we want to follow and a state estimate, which includes the position and velocity of our own car and also of other cars that we may treat as obstacles. We then ask the question: how can we find control inputs, in this example the steering commands and the acceleration and brake pedal commands, that achieve this plan?

In previous lectures on this channel we have already discussed ideas for control, in particular reactive control, so here is a quick reminder of how that worked. In that setting, we typically decompose the problem along different dimensions. For this bicycle model of the vehicle, for example, we decompose it into a longitudinal problem, where we use a controller such as a PID controller to control the velocity, and a lateral problem for the steering commands, where we might use a Stanley controller. The Stanley controller is simply a law into which we plug the crosstrack error, i.e., the lateral deviation from the reference trajectory, multiply it by a gain, and map the result to the steering command delta (a small sketch of this law follows below).

While these kinds of controllers are very appealing due to their simplicity, they have several limitations, which motivate the control techniques that we discuss today. The first important limitation is that it is not trivial to extend these controllers, for example the Stanley controller, to more complex systems. These controllers are rules written down for a very specific system; if we wanted to control a legged robot, we obviously could not use the Stanley controller and would have to rederive a controller from scratch, and it is not immediately clear how to come up with such a rule for an arbitrary system. Furthermore, there are control gains, for example in the Stanley controller, that we have to tune, and this tuning can be a tedious, heuristic, manual process. In addition, we had to decompose the problem into smaller problems for which we can write down these rule-based controllers, here a longitudinal and a lateral controller, and this decoupling ignores the coupling between the two dimensions.
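As a reminder of the kind of rule-based law recalled above, here is a minimal sketch of a Stanley-style lateral controller. The specific gain k and the softening constant eps are illustrative assumptions, not the exact parameters or variant used in the lecture.

```python
import numpy as np

def stanley_steering(heading_error, crosstrack_error, v, k=1.0, eps=1e-3):
    """One common form of the Stanley law: correct the heading error and add a
    term that steers towards the reference path, scaled by a tuned gain k."""
    return heading_error + np.arctan2(k * crosstrack_error, v + eps)
```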
Imagine, for example, that you are controlling a car and want to steer safely around a corner. In that setting you would want to reduce the velocity in order to make steering around the corner easier, but with a decoupled design the PID controller does not know that the Stanley controller is having a hard time making the turn. Another important limitation is that a controller like the Stanley controller is a rule we can only use to track a given reference; if the task is more complicated, for example if we want to respect constraints or avoid obstacles, this is not immediately captured by this kind of controller. Finally, these controllers only ever take in a snapshot of the current time: they evaluate the current crosstrack error, or perhaps a filtered version of it, but they do not consider how their decisions affect future decisions. They always reactively map the current state to a control command.

These limitations bring us to today's topic. Model predictive control makes use of optimal control techniques, and optimal control asks the question: how can we best control the system? To formulate a given problem as an optimal control problem, several ingredients are involved.

The first ingredient is a model of the system dynamics, which we use to predict the evolution of the state for given inputs. The dynamics model f maps the state, for example the position and velocity of our vehicle, and the input, here the steering and brake pedal commands, to the state at the next time step. Given a sequence of inputs and a model, we can therefore predict the evolution of the state trajectory into the future. For our self-driving car example we could use a 2D kinematic bicycle model. Its state is given by the position in x and y, the heading psi, the longitudinal velocity v, and the steering angle delta, and for this kinematic abstraction of the car we can write down the state evolution with a simple equation whose inputs are the acceleration and the steering rate. In any case, the model we use will always be an abstraction of the real dynamics: it neglects many details, such as drag or other effects. We just need a model with which we can predict the state trajectory of the system sufficiently well, and we will go into some of the trade-offs involved later.
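To make the kinematic bicycle model described above concrete, here is a minimal sketch discretized with a simple Euler step. The wheelbase L, the step size dt, and the Euler discretization itself are illustrative assumptions, not necessarily what is used in the lecture.

```python
import numpy as np

def bicycle_step(x, u, dt=0.1, L=2.7):
    """Kinematic bicycle: state x = [px, py, psi, v, delta],
    input u = [a, phi] (acceleration and steering rate)."""
    px, py, psi, v, delta = x
    a, phi = u
    return np.array([
        px + dt * v * np.cos(psi),        # position update
        py + dt * v * np.sin(psi),
        psi + dt * v / L * np.tan(delta),  # heading changes with steering angle
        v + dt * a,                        # velocity driven by acceleration input
        delta + dt * phi,                  # steering angle driven by steering rate
    ])
```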
Once we have the dynamics, we also need an objective. The objective assigns a cost to a given trajectory: it rates how good a trajectory is for the task we are trying to accomplish. We write it down as a cost function J that depends on the sequence of states x and inputs u and maps them to a scalar. Typically we use a time-separable cost, where a stage cost g only takes the state and input at time t, and the total cost J is the sum of these stage costs.

For the problem of reference tracking, for example, we can write this cost as a sum of squared terms. If the dashed line is the reference we wish to track and the red line is the prediction we would get for a given sequence of inputs, then the tracking cost is a weighted sum of the errors e, where each error is the difference between the predicted state and the state we would want to be at at that time step. We weight these errors with a weighting matrix Q, which expresses which components of the error vector are more important than others. We also add a cost on applying inputs, for example on acceleration and steering, in order to prefer smooth trajectories that are not very jerky. The task of the objective is thus to encode which trajectories are preferred over others, in this case those that track the reference well.

The third and final ingredient of this formulation of an optimal control problem are constraints. Constraints encode the domains of allowed states and inputs, and we write them with a calligraphic X for the state constraints and a calligraphic U for the input constraints. To make this more concrete, here is an example of a state constraint: we can use such a constraint to encode areas that our system must not enter. If there is an obstacle of radius r centered at the origin, marked in red, we can express this, using the bicycle model from before, as the requirement that the squared distance from the origin, p_x^2 + p_y^2, must be greater than or equal to r^2. This simply encodes that our trajectory must lie outside of the circle. State constraints only depend on the state variables. We can also have input constraints, shown on the right: for example, we may wish to never exceed a certain acceleration due to the friction limits of our tires, which we can express as a box constraint, i.e., an upper and lower threshold that the acceleration must not exceed. If the profile is bounded below by a_min and above by a_max, we write this as the inequality a_min <= a <= a_max. Using these kinds of inequalities, i.e., descriptions of the sets in which our states and inputs must lie, we can encode the state and input constraints.
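To illustrate the objective and the two example constraints just described, here is a minimal sketch of a quadratic tracking stage cost and the corresponding constraint checks. The particular weights Q and R, the obstacle radius, and the acceleration bounds are illustrative assumptions, and the state layout follows the bicycle sketch above.

```python
import numpy as np

Q = np.diag([1.0, 1.0, 0.1, 0.1, 0.0])  # weights on the state error (assumed)
R = np.diag([0.1, 0.1])                  # weights on the inputs (assumed)

def stage_cost(x, u, x_ref):
    """g(x_t, u_t): penalize deviation from the reference and large inputs."""
    e = x - x_ref
    return e @ Q @ e + u @ R @ u

def obstacle_ok(x, r=1.0):
    """State constraint: stay outside a circle of radius r centered at the origin."""
    px, py = x[0], x[1]
    return px**2 + py**2 >= r**2

def input_ok(u, a_min=-3.0, a_max=2.0):
    """Input constraint: keep the acceleration within a box."""
    return a_min <= u[0] <= a_max
```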
To summarize, using these three ingredients we can write a given problem as an optimal control problem in the form of a constrained optimization problem: we minimize the objective J by choosing a sequence of states x and inputs u into the future. However, we are not allowed to choose arbitrary states and inputs; they must be consistent with the dynamics we have chosen. That is, we add constraints requiring that each next state corresponds to our dynamics model, so that the trajectory is dynamically feasible, i.e., achievable by the system model we chose. We also add the state and input constraints just discussed, so x must always lie in the set of feasible states, calligraphic X, and u in the set of feasible inputs, calligraphic U. The final constraint is that at the first time step the trajectory must agree with the initial condition that we get from our state estimation pipeline: every plan we make must start at our current state.

Once we have this mathematical description of the control problem, we can attempt to solve it. The bad news is that in general there is no closed-form solution to these problems; instead we have to use numerical methods. The details of how that is done are beyond the scope of this lecture, but in practice you can often get away with encoding your problem in one of the existing modeling languages for constrained optimization. There are tools with very high-level interfaces, such as CasADi, which has interfaces in C++, Python, and MATLAB, or JuMP, a constrained optimization library in Julia. The beauty of these libraries is that you can take a mathematical description of a problem, here a very simple problem over two decision variables x and y with a simple objective and only two constraints, and encode it, for example in CasADi and Python, by declaring the decision variables, symbolically describing the objective and the constraints, and then calling a low-level solver. So if you know how to encode a given control problem in this constrained optimization form, you can already use these libraries to accomplish the task.
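The exact toy problem shown on the slide is not reproduced here, but as an illustration of the workflow just described, this is how a small two-variable constrained problem could be encoded with CasADi's Opti interface in Python; the objective and the two constraints below are made up for the example.

```python
import casadi as ca

opti = ca.Opti()                         # container for the optimization problem
x = opti.variable()                      # declare the decision variables
y = opti.variable()
opti.minimize((x - 1)**2 + (y - 2)**2)   # symbolic objective (illustrative)
opti.subject_to(x + y <= 2)              # two illustrative constraints
opti.subject_to(y >= 0)
opti.solver("ipopt")                     # hand the problem to a low-level solver
sol = opti.solve()
print(sol.value(x), sol.value(y))
```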
Now, are we done? Unfortunately not: just using the optimal control technique I have described will not be immediately successful, for several reasons. Imagine that we solve this optimal control problem once, take the resulting sequence of controls, and play it out to achieve our task. The problem is that the prediction model is always wrong to some extent: it is an abstraction of the real system and therefore has errors. Because we take multiple steps one after another, these errors accumulate over time and result in a diverging trajectory. If we apply the same sequence of inputs, i.e., the same sequence of steering, acceleration, and brake pedal commands, to the real system, we may follow one (blue) trajectory, whereas our prediction model follows a different one. So if we simply play out the sequence of controls in open loop, we will not accomplish the task very accurately.

The second reason why simply applying this technique is not a good idea is that even with a perfect model, i.e., even if we knew exactly how our system responds to inputs, the task horizon may be very long. Imagine that the car is tasked with driving around a loop, following a dashed reference line. If we were to write this down as a single optimal control problem over the entire lap, we would have to consider a very large number of time steps in our optimization, which is typically intractable: we get a very large problem that is hard to solve in a limited amount of time.

Model predictive control therefore adds one further idea to this optimization framework: receding horizon control. Rather than optimizing the full trajectory into the future, we start at the current time step, consider only the current state, and take a limited preview into the future, say the next 50 steps. We measure the state, task our optimizer with optimizing the next 50 time steps, and from the resulting optimal solution we apply only the very first input to the system. Then we measure the new state and re-plan from there. So rather than playing out the inputs in open loop, we always apply only the first one and then measure again where that brought us.

Schematically, for the example of driving a lap with an autonomous vehicle, imagine that we are at t0. Rather than planning the full lap, we plan only the next, say, 50 time steps and obtain an optimal trajectory from the optimizer for the current time step. We then apply only the first bit of that input sequence, perhaps just the first input, and end up somewhere nearby. The point is that we may not end up exactly where we planned to be, but with a reasonable model we will hopefully be close, and because we measure again and re-plan from the new state for the next 50 time steps, we are able to catch these errors: we always query the system for the true state, re-plan from there, and get the next trajectory. The advantages are that we account for errors, because re-planning introduces feedback, and that we reduce the problem size, because we only ever need to plan for the next 50 time steps rather than for the whole loop.

From an algorithmic perspective, the summary is exactly what I just described. As inputs we take the objective, the dynamics model, and a planning horizon, and for a local solver we typically also need an initial guess to speed up the solve. We initialize our controls with this initial guess and then loop over the following sequence: we measure the state, i.e., get the current state from the state estimation pipeline; we plug our current initial guess into the solver together with the objective and the dynamics; we obtain a sequence of controls; we take only the first control input and apply it to the system; and then we start again. One important point is that because the controls between consecutive time steps typically vary only by a small margin, it is a good idea to reuse the controls from the previous time step to warm-start the next solve. This makes the solution process faster, which matters because we have to run it online at a high rate.
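Here is a minimal sketch of the receding-horizon loop just described. The functions measure_state, solve_ocp, and apply_to_system are hypothetical placeholders for the state-estimation pipeline, the trajectory optimizer, and the actuator interface; the shift-based warm start is one common choice, assumed here for illustration.

```python
import numpy as np

def mpc_loop(solve_ocp, measure_state, apply_to_system, horizon=50, n_u=2):
    """Receding-horizon control: re-plan at every step, apply only the first input.
    solve_ocp(x0, u_guess) is assumed to return an array of shape (horizon, n_u)."""
    u_guess = np.zeros((horizon, n_u))        # initial guess for the solver
    while True:                               # runs online, once per control step
        x0 = measure_state()                  # 1. get the current state estimate
        u_seq = solve_ocp(x0, u_guess)        # 2. optimize the next `horizon` steps
        apply_to_system(u_seq[0])             # 3. apply only the first input
        # 4. warm start: shift the solution by one step and reuse it next time
        u_guess = np.vstack([u_seq[1:], u_seq[-1:]])
```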
If you do all of this, here is a small toy example: a small blue car steering around a track that is given by a rather jerky blue reference. Because we run model predictive control and only allow dynamically feasible trajectories, the closed-loop trajectory we obtain is the very smooth orange line, which results from the green predictions made at every time step.

Now that we have discussed the high-level idea of model predictive control, I want to briefly discuss the different design parameters involved in successfully designing an MPC controller for a given system. These design parameters range from the prediction model that we use in our optimization, over the cost function that we use to encode the task, to the prediction horizon, which is the number of time steps that we include in the optimization at every optimization step. Finally, I will also discuss terminal constraints, which are additional constraints we can add in order to improve the safety of the system.

Let's start with the prediction model. The general trade-off when choosing a prediction model is between accuracy, i.e., we want to predict very accurately how our inputs affect the future states of the system, and complexity, i.e., we want a simple model with only a small number of states that allows us to solve the optimization problem quickly. These two goals typically compete: a very accurate model often needs many states to capture high-fidelity details, and finding a good trade-off is the main task when choosing a model family. For this car, for example, a kinematic bicycle model may be good at low velocities, but at higher velocities we also have to include dynamic effects such as friction and aerodynamic drag. Once we have committed to a model family, i.e., we know the model class we are in and only have some parameters left to estimate, for example a drag coefficient or friction parameters, we can obtain these in a very principled way through a data-driven approach. We collect a dataset of real system behavior, denoted by a calligraphic D, where each data point is a tuple consisting of a state x_d, an input u_d, and the next state x'_d that is obtained when applying u_d in state x_d. To obtain such a dataset for your car, you can simply drive it around the loop shown earlier, record the sequence of inputs and the resulting states, and chop up the recorded trajectory into these tuples. Given this dataset, we can optimize the unknown parameters, such as drag and friction, which we collect into a parameter vector theta, and we can again cast this as an optimization problem: we minimize a loss function over the dataset by making a good choice of these parameters. The loss function is the sum of squared prediction errors between the next state that our model with parameters theta predicts when applying u_d in x_d, and the true next state x'_d from the dataset.
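Here is a minimal sketch of the data-driven parameter fit described above, assuming a parametric prediction model f(x, u, theta) (for example the bicycle model extended with drag and friction parameters) and a dataset of (x_d, u_d, x_next_d) tuples. The use of scipy.optimize.least_squares is an assumption for illustration, not necessarily the tool used in the lecture.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_model_parameters(f, dataset, theta0):
    """Find theta that minimizes the squared one-step prediction error over the
    dataset, where dataset is a list of (x_d, u_d, x_next_d) tuples recorded on
    the real system and theta0 is an initial guess for the parameters."""
    def residuals(theta):
        return np.concatenate(
            [f(x_d, u_d, theta) - x_next_d for x_d, u_d, x_next_d in dataset]
        )
    return least_squares(residuals, theta0).x
```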
The second design parameter I want to touch on briefly is the cost function. The cost function is used to encode the task, and when you design a cost function you have to be aware that you induce a certain optimal behavior. Consider the task of following this road. One way to design a cost for this problem is to place a reference at the center of the road and use a squared penalty, such as the squared cost function shown earlier, which penalizes the crosstrack error. The behavior we get from this is a rather conservative line that tries to stay near the center line, because that is exactly the objective we encoded. For an ordinary, non-competitive autonomous driving scenario this may be the right way to encode the task, but if we want more competitive behavior, for example if we wish to drive along this track as fast as possible, this is not a suitable formulation: driving fast is not easily achieved if we are encouraged to stay on the center line. In that case a better formulation is to include the track bounds as constraints, i.e., to require that all the states we consider lie within the track bounds, and to use the cost function only to encourage progress along the track, for example by penalizing low velocities.
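As an illustration of this second, more competitive formulation, here is a minimal sketch of a stage cost that rewards progress by penalizing low velocity, with the track bounds treated as a constraint on the lateral deviation. The weights, the velocity-based progress measure, and the track half-width w are illustrative assumptions; the state layout follows the bicycle sketch above.

```python
import numpy as np

def racing_stage_cost(x, u, w_speed=1.0, R=np.diag([0.1, 0.1])):
    """Encourage progress: reward high velocity instead of tracking the center line,
    while still penalizing large inputs for smoothness."""
    v = x[3]
    return -w_speed * v + u @ R @ u

def within_track(e_lat, w=2.0):
    """Track-bound constraint on the lateral deviation e_lat from the center line."""
    return -w <= e_lat <= w
```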
The third design parameter is the prediction horizon, i.e., the number of time steps we consider in the optimization at every round. Here again we face a trade-off. Ideally we want a short prediction horizon so that the optimization problem is simple, with only a few time steps to consider. But if we only look a few time steps into the future, we get very myopic, i.e., short-sighted, behavior. This myopic behavior can be inefficient: imagine the car is placed down here on the left and we only consider the next five time steps; we may decide to accelerate and then have to brake hard in order to make the turn, wasting control authority to correct mistakes caused by our short-sightedness. It can even be unsafe: because we only see the straight segment, we may accelerate so much that we are no longer able to keep the car within the track bounds once we see that the road makes a turn. One way to avoid this unsafe and inefficient behavior is to use a longer prediction horizon, i.e., to take more time steps into account.

Another way is to include terminal constraints, additional constraints that ensure recursive feasibility. By feasibility I mean the following: in the scenario just described, where the car could not make the turn, the problem became infeasible in the sense that there were no control inputs in the allowed set, calligraphic U, that would keep the car within the track bounds, because it was already going too fast. Recursive feasibility means that, provided the true model of the system agrees with our prediction model, the problem remains solvable at every step, even though we only consider a limited number of time steps and do not see the corner coming. One way to achieve this in a static environment like this one is to constrain the velocity at the end of the horizon to be zero. That means we only consider plans with which we could bring the system to a stop within our prediction horizon, which implicitly prevents us from ever going so fast that we could not bring the car to a safe halt if a very sharp corner, or even a wall, were to appear.
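A minimal sketch of the terminal constraint just described, written in the style of the CasADi Opti example from earlier. The horizon length, the state layout (velocity in column 3, as in the bicycle sketch above), and the way the planned states are stored are assumptions about how the problem might be encoded.

```python
import casadi as ca

opti = ca.Opti()
N = 50                        # prediction horizon (assumed length)
X = opti.variable(N + 1, 5)   # planned states, one row per time step (bicycle layout)
# ... dynamics constraints, cost, and track-bound constraints would be added here ...

# Terminal constraint: require zero velocity at the end of the horizon, so only
# plans from which the car could still be brought to a stop remain feasible.
opti.subject_to(X[N, 3] == 0)
```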
To summarize the pros and cons of model predictive control, in particular in contrast to reactive control: we have seen that we can explicitly handle constraints, which we can use to encode the positions of obstacles or the track bounds as in the previous example. Due to the receding horizon control scheme we also explicitly account for future decisions: the optimization explicitly considers how the controls at the current time step affect the controls at later time steps. We do this in a coupled fashion, controlling steering and acceleration simultaneously for this car. And we obtain a very systematic procedure for constructing a controller: all we need to do is write down the dynamics of the system, the cost, and potentially additional constraints, and then hand this description of the problem to an optimizer to obtain the control law; we do not have to derive a rule-based controller by hand for the given system. The downside is that we have to solve this optimization problem online, so these methods have higher computational complexity and memory requirements than a simple rule-based controller, where all you need to do is quickly evaluate, for example, the crosstrack error and plug it into a simple rule.

To finish this section on model predictive control, I want to highlight a few applications of these techniques in the wild. The first example is a learning-based model predictive controller from Ugo Rosolia at UC Berkeley. In this work, the idea of vanilla model predictive control is extended by learning a cost term at the end of the horizon, which is used to account for decisions beyond the prediction horizon. When they apply this idea to a full-size car on a racetrack, the car initially drives very slowly and conservatively, because it does not yet have a good estimate of the cost-to-go, i.e., the cost beyond the prediction horizon. But after driving a couple of laps on this track, it has gathered experience, compressed it into this cost-to-go term, and is even able to reach the friction limits of the tires in this example.

The second example is the whole-body model predictive controller from Marco Hutter's group, the Robotic Systems Lab at ETH Zurich. This is a great example of how model predictive control is applicable beyond the car-like systems that we can typically capture well with bicycle models: here they control a wheeled-legged robot to perform all sorts of tasks, with different gaits and moving over obstacles. So model predictive control is a very expressive technique that can be applied to a wide variety of different systems.

Now that you have seen the expressiveness of MPC, I also want to briefly highlight one of its main limitations, which arises when interacting with other agents. So far we have only considered scenes with a single agent, namely the autonomous car that we want to control around the track. If there are other agents in the scene, we have to account for them. A very traditional way of doing this is to treat other players as dynamic obstacles: we predict where they will be in the future with some motion model, for example a constant velocity model, or with a more elaborate model obtained from an external prediction pipeline. The problem is that we typically cannot statically predict where other players will be, because these players plan as well, and their future positions depend on our own actions: if I drive close to another car, it may swerve to the side. The decision-making processes of multiple players are thus highly coupled, and that is something that is not captured by model predictive control, at least not in its vanilla formulation.

One extension I briefly want to mention is the extension to dynamic games, and here I just want to highlight the main differences to an optimal control problem; at a high level the formulation looks very similar. To cast a given problem as a dynamic game we need the following ingredients. First, we need joint dynamics that describe the evolution of the state x, where x now contains the states of not just a single agent but of N agents, and the dynamics f take not a single input u but an input for every player, u^1 to u^N; as before, the dynamics are used to predict the state at the next time step from the current one. Second, because N players are now making decisions in this system, each player has their own objective, which can again be written as a cost function, now with an index i: player i has objective J^i, which depends on the future states, on the inputs of player i itself, u^i, and on the inputs of everyone else, denoted u^{not i}. Because we now have N different objectives, we cannot simply solve this as a single optimal control problem: there is no single cost function that we want to minimize.
Instead, the solution is an equilibrium, and a very popular solution concept is the Nash equilibrium. Intuitively, every player competitively performs this optimization for their own objective until an equilibrium state is reached in which no player can unilaterally improve their own cost. Mathematically, we require that the cost of player i at the equilibrium input sequence is no larger than any cost that player could achieve by changing only their own inputs while everyone else keeps their equilibrium inputs. You can think of this as multiple players pulling on the joint state trajectory with their inputs, each in favor of their own objective.

Solving these kinds of problems is beyond the scope of this lecture, but I do want to show a few applications of this idea. The first example is a toy example from Forrest Laine, who used this formulation to encode a game of tag between two players: the objective of player one (red, the chaser) is to minimize the distance to the blue player, and the objective of player two (blue, the evader) is to maximize that distance. The dynamics of each player are modeled as a 2D point mass. By solving this game over the two players, we get a very competitive and highly interactive strategy in which red is able to chase blue into a corner and tag them. The second example is work by Alexander Liniger from ETH Zurich, who has developed a game-theoretic controller for racing with model cars. There, the objective of each player is to avoid collisions with other players, stay within the track bounds, stay ahead of the other players, and maximize track progress. By solving this dynamic game over the two players, you get very competitive wheel-to-wheel racing behavior on this model track for these toy cars.

To conclude today's lecture, let me summarize what we have discussed. We started with the limitations of reactive control, which motivate the optimization-based techniques we looked at today. We then discussed how to cast a given control problem as an optimization problem, and how to wrap this optimization in a receding horizon scheme to obtain a model predictive controller. We discussed the different design parameters needed to make a model predictive controller perform well for a given system in practice, and finally we reviewed some of the limitations of MPC in the presence of other agents and the potential extension to dynamic games.

If you are interested in the topic of model predictive control, there are several resources I can recommend. For model predictive control in general, there is the book on predictive control by Francesco Borrelli, and the corresponding lecture material from UC Berkeley is available online. If you are interested in how to solve the optimization problem in the inner loop of model predictive control yourself, rather than relying on a tool such as JuMP or CasADi, you can consult the book on numerical optimization by Nocedal and Wright. And finally, if you are interested in control in general and like to consume this material as videos, there is a very good YouTube channel by Steve Brunton on this topic.
Okay, with that we have reached the end of today's lecture. Thank you for your attention.
Info
Channel: Cyrill Stachniss
Views: 2,596
Keywords: robotics, photogrammetry
Id: XaD8Lngfkzk
Length: 42min 18sec (2538 seconds)
Published: Tue Oct 12 2021