Model Predictive Control - Part 2: Numerical Methods for MPC (Lasse Peters)

Captions
Welcome everyone, my name is Lasse Peters, and in today's lecture we are going to discuss numerical methods for model predictive control. This lecture assumes that you already have a good understanding of the high-level idea of model predictive control, or MPC for short; today we will look at some of the mathematical tools that are used in the background to solve these problems.

Let's begin by briefly reviewing the high-level idea of MPC with the example of a car following a circular race track, given by the dashed reference. The idea of MPC is that at every time step we look at our current state, which here may be the position, heading, and velocity of the car, and make a plan for a finite time into the future, say 20 or 30 time steps, using a model to make predictions and optimizing this receding-horizon plan for a given objective. The high-level loop is as follows. We take the current state and some objective, for example minimizing the deviation from the reference while maximizing progress along the track, and hand both to the MPC module. That module predicts into the future and optimizes the controls, i.e. the inputs to the system such as steering and pedal commands, so as to minimize the cost J. This yields a plan for a couple of time steps into the future, maybe 20 or 50, but we only apply the very first of those inputs. Then we start over: we take the new state, plug it into the MPC module to obtain an updated receding-horizon plan, again apply only the first input, and repeat this in a loop.

In this framework there is one important component that we focus on today: the optimizer, the tool that optimizes the trajectory to fulfill the objective, i.e. to minimize the cost J. The mathematical formulation of this inner-loop optimization is as follows. We cast the control problem as constrained optimization, where the task is to minimize a cost function J by choosing suitable states x and inputs u; these are sequences of states and inputs over the prediction horizon of length T. We cannot choose them arbitrarily, however: they are coupled, because the plan must be dynamically feasible. That is, the states we choose must be achievable by our dynamics model; for example, a car can steer and accelerate forwards and backwards, but it cannot instantly move sideways, because the tires prevent that. Dynamic feasibility therefore appears as an equality constraint. In addition we may have state and input constraints, meaning we can only choose certain controls, say a maximum acceleration, and certain states, say a maximum velocity or a maximum deviation from the track. Finally, there is an initial state constraint: all plans must start from x_init, which we may get from our state estimation pipeline.
So the state estimate tells us where we are now, the objective tells us what we want to do, namely make progress along the track, and the initial constraint makes sure that all our plans start from our current position. As a quick reminder on notation, we write x_{1:T} and u_{1:T} for the tuples of all states and inputs over the horizon.

In the previous lecture, where we discussed the high-level idea of MPC (you can also find it on this channel), we simply assumed access to some software tool that solves this problem. That lecture focused on how to write a given control problem as an optimization problem; once we have that formulation, we can hand it over to one of the available libraries, for example CasADi, a constrained-optimization library with interfaces for C++, Python, and MATLAB, or JuMP, a similar library in Julia. All of these libraries work the same way: you take your problem description, here a very simplified toy constrained-optimization problem, write it as a compact symbolic description, and hand it to the optimizer, which solves it for you.

In today's lecture we want to drop the assumption that some tool magically tells us the solution. Instead we look behind the curtains at what is involved in solving these kinds of problems; that is the focus of today, the numerical background of these tools.
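As a concrete illustration of this "symbolic description, then hand it over" workflow, here is a minimal CasADi sketch in Python. The toy problem from the slide is not reproduced here; the objective and constraint below are hypothetical stand-ins and only serve to show the pattern.

```python
import casadi as ca

# Symbolic decision variables for a tiny toy problem (hypothetical example).
z = ca.SX.sym("z", 2)

# Compact symbolic description: objective f and constraint expression g.
nlp = {
    "x": z,
    "f": (z[0] - 1) ** 2 + z[1] ** 2,   # cost to minimize
    "g": z[0] + z[1],                   # constrained to equal 1 via the bounds below
}

# Hand the description over to the optimizer (here the IPOPT plugin) and solve.
solver = ca.nlpsol("solver", "ipopt", nlp)
sol = solver(x0=[0.0, 0.0], lbg=1.0, ubg=1.0)
print(sol["x"])   # optimal decision variables
```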
Before we dive into these methods, I want to simplify the problem a little for the scope of this lecture. In the slightly reduced formulation we drop the state and input constraints, i.e. the requirement that controls and states lie in specific sets, so the only constraints left are the initial condition and dynamic feasibility. This leaves a smaller problem that is, to some extent, simpler to solve. Another assumption, already introduced in the previous lecture, is that the objective is time-separable: we do not allow an arbitrary cost function, but assume the total cost can be written as a sum of stage costs g(x_t, u_t), one per time step, so the cost over the entire horizon is the stage cost of the first time step plus the second plus the third and so on. This will be an important assumption for some of the tools we develop throughout this lecture.

Before we get to how the problem is solved, I want to introduce one very important tool that we will use throughout: state elimination. The idea is to reduce the problem even further by getting rid of the state decision variables, converting the problem from the previous slide into an unconstrained optimization problem, which you may recognize from other disciplines.

State elimination builds on forward simulation, which exploits the very specific structure of the problem: every state is connected to the previous state through the dynamics constraint. Given the initial condition, which again comes from our state estimation pipeline, we can eliminate all future states by writing them in terms of the controls; in the end we only want the control variables u_{1:T} as decision variables, not the states. Trivially, x_1 = x_init, because that is just the initial constraint. Then x_2 is obtained by simulating the dynamics one step forward in time: we apply the dynamics function f, which propagates a state one step into the future, to the initial state and the first input. We can do this recursively: x_3 = f(f(x_init, u_1), u_2), where the inner f takes the initial state and the first input and the outer f takes the resulting state x_2 and applies u_2, and so on.

More generally, we define a forward simulation function F_t that takes the state at time step t and the sequence of controls from t to the end of the horizon T and returns the induced trajectory, i.e. the trajectory obtained by applying these inputs in open loop from that state. F_1, for example, is the forward simulation starting from x_1 = x_init, exactly the recursion from the previous slide: plugging in the initial state and the entire control sequence gives the entire trajectory.

Why is this useful? We can use it to eliminate the state variables and rewrite the objective in terms of only the controls and the initial condition. Using the forward simulation starting at the first time step, the original objective, which depended on states and inputs, becomes a new objective J-tilde that depends only on the control sequence u_{1:T} and the initial state: instead of plugging in the sequence of states, we plug in the forward-simulated sequence F_1(x_init, u_{1:T}), which is exactly the x_{1:T} needed as the first argument of the original cost. With this rewritten objective we have an unconstrained optimization problem with fewer decision variables: only the inputs u_{1:T} remain, since the initial condition is given and fixed. Moreover, there are no constraints left, because the initial condition and the dynamics constraints are satisfied implicitly; the forward simulation always produces dynamically feasible states. So after this rewriting we are left with an unconstrained optimization problem of the kind you may have already seen in other disciplines.
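A minimal sketch of this idea in Python, assuming a generic dynamics function f(x, u) and stage cost g(x, u); the helper names below are illustrative, not the lecture's notation.

```python
import numpy as np

def rollout(f, x_init, us):
    """Forward simulation F_1: apply the controls in open loop from x_init
    and return the induced state trajectory."""
    xs = [np.asarray(x_init)]
    for u in us:
        xs.append(f(xs[-1], u))   # x_{t+1} = f(x_t, u_t)
    return xs

def reduced_cost(g, f, x_init, us):
    """Eliminated objective J~(u_{1:T}; x_init): evaluate the time-separable
    cost along the forward-simulated (hence dynamically feasible) trajectory."""
    xs = rollout(f, x_init, us)
    return sum(g(x, u) for x, u in zip(xs, us))
```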
You essentially get a cost landscape; here we plotted it for two decision variables, so say you had only u_1 and u_2 and these were scalars, then the cost might look like this surface, and the task is to minimize it by a suitable choice of the control inputs. In general this cannot be done analytically: we cannot simply identify the minimum and step there. Instead we take an iterative approach to approximate it. You may have seen this in other contexts already: gradient descent takes small steps in the direction of steepest descent, i.e. opposite to the gradient, towards a local minimum, and Newton's method additionally uses curvature information. We will get to these kinds of methods later in the talk.

Before that, we take a deep dive into a special case: unconstrained problems with convex quadratic costs, meaning the cost is a polynomial in the decision variables whose highest-order terms are squared terms. In multiple dimensions these are quadratic surfaces; in one dimension you get a parabola. The nice thing about convex quadratic costs is that the minimum can be computed analytically: we can step directly to the global minimum by taking the gradient of the quadratic function, setting it to zero, and solving for the point where the gradient vanishes.

This special case also exists in optimal control, where it is called the linear quadratic regulator (LQR). LQR is the special case of an optimal control problem in which the dynamics are linear and the costs are quadratic; it is exactly the OCP we saw earlier, just with these additional assumptions. The objective takes a very particular form: it is a sum of squared terms, where the deviation of the state x from the origin is penalized with a cost matrix Q and any non-zero input is penalized with a cost matrix R, and the total cost is the sum of these stage costs over time. The dynamics are linear: the state at the next time step is always A times the current state plus B times the current input, a linear equation for how the state evolves over time. Of course we still have the initial condition. The last important assumptions are that Q must be symmetric, i.e. equal to its transpose, and positive semi-definite, meaning all of its eigenvalues are greater than or equal to zero, and that R must be strictly positive definite, meaning its eigenvalues are strictly greater than zero.
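Written out, the finite-horizon LQR problem described above reads as follows (one common way to write it; the slides may place the summation indices slightly differently):

```latex
\begin{aligned}
\min_{x_{1:T},\,u_{1:T}} \quad & \sum_{t=1}^{T} \left( x_t^\top Q\, x_t + u_t^\top R\, u_t \right) \\
\text{s.t.} \quad & x_{t+1} = A\,x_t + B\,u_t, \\
& x_1 = x_{\mathrm{init}}, \\
& Q = Q^\top \succeq 0, \qquad R \succ 0 .
\end{aligned}
```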
With this problem description at hand, we can attempt to solve the LQR problem. The first approach I want to show is the so-called batch solution, which solves for all controls at once, in one batch for the entire horizon. It does so by exploiting exactly the state elimination idea from before, and it allows us to cast the LQR problem as a linear least-squares problem.

Let's look at what forward simulation looks like for linear dynamics. As before, x_1 is just the initial state. For the later states we exploit the special structure of the linear dynamics: x_2 = A x_init + B u_1, and for every subsequent state we take the previous expression, multiply it from the left by A, and add B times the input at that time. Because of this structure we can write a compact expression for an arbitrary time step, all the way up to T.

Using this, the entire forward simulation can be written as a single matrix equation. We stack all states into one long vector, which we call capital X; this is the trajectory we want to find, and we obtain it by multiplying the initial state with a stacked matrix A-bar and collecting all terms that multiply the inputs into a stacked matrix B-bar acting on the stacked input vector U. You can convince yourself that this matrix expression does exactly the same thing as the recursion on the previous slide: x_1 is the identity times x_init with all input blocks zero, and x_2 is A times the initial state plus B times u_1, with all other inputs multiplied by zero blocks. With this matrix equation, the whole forward simulation is one compact expression.

For this stacked form of the problem we also define new cost matrices: a block-diagonal Q-bar, consisting of the Q matrices of every time step stacked along the diagonal, and similarly R-bar, the block diagonal of the control cost matrices R. With these we can write a new cost J-bar compactly in terms of the stacked states and inputs, by multiplying Q-bar from left and right with X and R-bar from left and right with U. This looks very similar to the original stage cost, i.e. the cost at a single time step, but it now expresses the cost for the entire horizon, because X and U are the stacked states and controls for the entire horizon; it encodes exactly the same objective as before.

Combining this rewritten cost with forward simulation, we can again form the reduced cost J-tilde: we take the objective in terms of the stacked states, replace X with the corresponding forward simulation expression, and after rearranging and tidying up the terms we obtain a compact J-tilde, the cost in terms of only the sequence of all inputs, with the states eliminated. It can be written as a quadratic expression in U.
U is multiplied from the left and right onto a matrix H, and there is additionally a linear term and a constant in U. This H matrix, shown down here, has exactly the same shape as the block-diagonalized control cost matrix.

With this reduced cost, which depends only on the sequence of all inputs, we can write the batch formulation of LQR as a linear least-squares problem: an unconstrained optimization over all inputs U in one go of the objective J-tilde. This is a quadratic expression in U, a polynomial in the stacked input vector whose highest-order term is a squared term, and we can check that it is convex: H is positive definite by construction, because R was assumed positive definite and Q positive semi-definite, so the cost landscape curves upwards. That allows us to solve for the global minimum simply by setting the gradient of this expression to zero and solving for the input. Doing so means isolating the optimal controls U* on one side, which requires taking the inverse of H.

That concludes the batch approach to LQR. To summarize, this approach exploits the fact that with linear dynamics we can eliminate the state variables in a very specific way, writing the problem as an unconstrained optimization that we recognize as linear least squares; this formulation lets us solve for all controls in one batch. A few comments: the nice thing is that we can solve it analytically, which follows from it being a convex quadratic problem. This is much more efficient than taking small steps towards a local solution; we find the global solution simply by solving a linear system of equations. However, it requires inverting the rather large matrix H, whose dimensions are mT by mT, where m is the number of controls and T the number of time steps in the planning horizon. This dependence on the horizon is problematic: with many time steps into the future, computing this inverse can become quite expensive.
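A compact NumPy sketch of this batch construction, under the assumptions of time-invariant A, B, Q, R and with the final input dropped (it cannot affect any stacked state). The indexing conventions here are mine and may differ from the slides.

```python
import numpy as np

def batch_lqr(A, B, Q, R, x_init, T):
    """Batch LQR: stack the forward simulation X = Abar x_init + Bbar U and
    solve the resulting linear least-squares problem for all inputs at once.
    States x_1..x_T, inputs u_1..u_{T-1}."""
    n, m = B.shape
    # Abar stacks I, A, A^2, ..., A^{T-1}.
    Abar = np.vstack([np.linalg.matrix_power(A, t) for t in range(T)])
    # Bbar is block lower-triangular: block (i, j) = A^{i-j-1} B for j < i.
    Bbar = np.zeros((n * T, m * (T - 1)))
    for i in range(1, T):
        for j in range(i):
            Bbar[i * n:(i + 1) * n, j * m:(j + 1) * m] = \
                np.linalg.matrix_power(A, i - j - 1) @ B
    Qbar = np.kron(np.eye(T), Q)          # block-diagonal state cost
    Rbar = np.kron(np.eye(T - 1), R)      # block-diagonal control cost
    H = Bbar.T @ Qbar @ Bbar + Rbar       # positive definite by construction
    g = Bbar.T @ Qbar @ Abar @ x_init     # linear term induced by the initial state
    U = -np.linalg.solve(H, g)            # set the gradient to zero and solve
    X = Abar @ x_init + Bbar @ U          # induced state trajectory
    return U.reshape(T - 1, m), X.reshape(T, n)
```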
So the question is: can we do better? The short answer is yes. The idea we follow is dynamic programming, which gives rise to a different solution to LQR. Technically, dynamic programming can also be applied in other settings, but in the LQR setting it works out particularly nicely. The main idea is to exploit the sequential structure induced by the dynamics even further, not only to compute the forward simulation but also to obtain a more efficient way of inverting this H matrix. Before we dive into the math, I want to highlight the underlying principle that makes dynamic programming work: the principle of optimality. We start with an example before giving a formal definition.

Imagine you are tasked with finding a plan to a goal location C, starting in A, and suppose you already know the optimal plan from A to C, which leads through B. The principle of optimality states that any tail portion of this plan, for example the part that starts in B and ends in C, is still the optimal plan for that tail problem: if we are in B and ask for the best way to get to C, we can simply truncate our original plan and use its tail. It is easy to convince yourself that this is true by looking for a contradiction: if there were an alternative plan from B to C with lower cost (the grey route), then our original plan would also have taken that grey route from B to C, so it could not have been optimal in the first place. In summary, any tail portion of an optimal plan is still optimal for the corresponding tail problem.

More formally, suppose you know the optimal solution of some optimal control problem, i.e. the optimal input sequence u*_{1:T}, where T is the planning horizon, and the state trajectory x* induced by following these controls. Then any tail of this control sequence, starting at some intermediate time step t, is also the optimal solution of the tail problem defined by the cost-to-go J_t, which is the stage costs summed from t to T; this cost only cares about the stage costs after the intermediate time step. The tail problem asks to minimize this cost-to-go from t onwards, with only the tail constraints, and with the initial condition that we start on the optimal trajectory at time t. This is exactly the statement from the example on the previous slide.

Dynamic programming exploits this principle in a very specific way: it uses it to break the problem up into smaller tail problems that may be simpler to solve. Imagine again that we want to find a way to C, but now we do not know where we start; previously we knew we start from A, now we want to reach C from an arbitrary state. We start from C, go to some intermediate time step t, enumerate all states we could possibly be in at that time, and solve the optimal control problem from each of these states into the future to reach C; we do this for this state, for the state B from our original plan, for the bottom one, and so on for all states. The controls we compute are therefore not a fixed sequence but a function of the state we happen to be in, and we have to compute this entire function, i.e. find the controls for all possible states.
In addition to the controls, we also record the optimal cost-to-go, the cost we incur if we follow the optimal plan from that state onwards: even if we play optimally from here into the future, what is the cost we still have to pay? For the state at the top, even with the best possible sequence of controls, we still incur a cost of 10 in this example; here the cost is 1, and here 4. We denote this optimal cost-to-go by J*_t, and it is again a function of the state, because the cost we will incur in the future depends on where we are at that time step.

Using this optimal cost-to-go, we can extend the tail problem backwards in time. Here we have divided the horizon into just two steps, an intermediate time step and the initial time, but you could also take many small steps. At the earlier time we again enumerate all possible states and solve an optimal control problem from each of them, but we do not greedily minimize the cost of reaching one of the intermediate states; we minimize the immediate cost up to time t plus the cost-to-go we would incur from there into the future. For example, from this state the optimal trajectory might have a high immediate cost, but it reaches state B, which has a small cost-to-go. The top state, in contrast, has a very high cost-to-go, so none of the earlier states choose to go there: even if it is cheap to reach, whatever we can do afterwards is expensive. Now we have the second portion of the plan, which is again a function of the state, now of the initial state. This function is a feedback strategy, because it maps a state to controls. Once we have it, we can plug in our initial condition: we were at state A, so we evaluate the feedback strategy at A and recover our original plan.

A few comments on this idea of dynamic programming. From this example it may look like a horrible idea: instead of solving for a single initial condition to a specific goal, we now have to solve for all possible states we could be in. In general this is indeed a challenge, because we need to be able to represent the optimal cost-to-go J*_t as a function of the state. This optimal cost-to-go is defined as the original cost-to-go evaluated at the optimal strategy and at the states obtained by forward simulating that strategy. A naive way to represent it would be to discretize the state space into small volumes, solve the optimal control problem for each of these cells, store the results in a lookup table, and evaluate the function by table lookup.
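The backward step described above, minimizing the immediate cost plus the optimal cost-to-go of the state we land in, can be written compactly as the following recursion (a standard way of stating it; the slides may use slightly different notation):

```latex
J_t^*(x_t) \;=\; \min_{u_t}\;\Big[\, g(x_t, u_t) \;+\; J_{t+1}^*\big(f(x_t, u_t)\big) \Big],
\qquad
\pi_t^*(x_t) \;=\; \arg\min_{u_t}\;\Big[\, g(x_t, u_t) \;+\; J_{t+1}^*\big(f(x_t, u_t)\big) \Big].
```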
While such a tabular approach is possible in general, it is often a bad idea, because it requires solving a number of problems that grows exponentially with the state dimension, and you also have to choose a discretization. The good news is that LQR is a special case in which we can compute the cost-to-go and the feedback strategy quite efficiently and represent them in closed form; that is what we look into in this part of the lecture.

So let's look at the recursive setup of the LQR problem, i.e. the dynamic programming perspective on it. As a reminder, we have the original LQR problem with quadratic costs and linear dynamics. The smallest tail problem we can consider is the final stage: we look only at the very last time step and ask what the best control is that we can apply at the final time. There is only a single decision variable, because the control we are looking for is a function of the state: the state is given, so from the perspective of this optimization problem x is a constant. Because R is assumed to be positive definite, any non-zero control input results in a positive cost, so the best we can do is apply no input at all; the solution of this optimization problem is zero input. In fact, it is quite common to drop the inputs at the final time entirely, because they always turn out to be zero anyway. For the optimal cost-to-go at the final time we evaluate the original cost-to-go, which is just the last term of the summed cost (small t equal to capital T), at this solution: with u equal to zero the input term vanishes, and all that remains is the state term. The optimal cost-to-go as a function of the state is therefore x-transpose P x with P equal to Q, the given state cost matrix. We use P to track the cost-to-go on the state; P is something we compute, in contrast to Q, which is given as part of the problem description. Here, finding P is trivial, since P_T = Q, but for earlier time steps we will have to update it.

Let's look at how this update works. We extend the problem one step backwards in time and look for the best input at the second-to-last time step, T minus 1. Now there are two decision variables: what we optimize is the immediate cost incurred by the decision we make now, plus the optimal cost-to-go for whatever state we end up in; that next state is connected to the current one via the dynamics, which is why the dynamics constraint appears. For our particular problem we can plug in what we have already computed: the immediate cost g is exactly the stage cost as we defined it for LQR:
it is just the quadratic expression in states and controls. We can also get rid of the dynamics constraint by applying exactly the same forward-simulation trick: we replace x_T, the next state, by the dynamics applied to the current state and input, so that the only remaining decision variable is the control at T minus 1 and there are no constraints at all. Next we substitute the closed-form expression we already computed for the cost-to-go at the next time step, replacing that function with x-transpose P x, which lets us write the optimization problem explicitly. Rearranging the terms, we get a squared term in x, a squared term in u, and a mixed term. Viewed as an optimization over u, whose solution is a function of x, x is again a constant: the x term is a constant, there is a linear term, and the highest-order term in u is quadratic. Because R was assumed positive definite, Q positive semi-definite, and P_T equals Q, and positive semi-definite plus positive definite remains positive definite, this expression is convex quadratic in u; we again have the nicely upwards-curved cost landscape.

Therefore we can apply the same trick as before, now for a single time step: we find the global minimum of this cost landscape by setting the gradient to zero. We take the gradient of the expression from the previous slide, set it to zero, and solve for u* to get the optimal input at this time step. This turns out to be a linear feedback strategy in the state: a matrix F multiplied with the state gives the best control to apply in that state to fulfill the task. The feedback matrix F is computed by isolating u* on the left. Whenever you see an inverse you should ask whether the matrix can be safely inverted; we just convinced ourselves on the previous slide that it is positive definite, so we can. The resulting feedback controller can then be evaluated at an arbitrary state simply by multiplying that state with F.

That was the backup of the controls one time step backwards in time. We also have to back up the cost, i.e. the optimal cost-to-go. Again we do this by plugging the optimal control function into the cost-to-go, and we end up with an expression that is quadratic in the state, x-transpose P x, where P at the previous time step is updated, because we substituted the feedback expression for u into the cost. So we obtain the updated optimal cost-to-go.
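In standard LQR notation the two backup equations just derived read as follows (the slides may order factors slightly differently, but this is the usual form):

```latex
\begin{aligned}
u_t^* &= F_t\, x_t, \qquad
F_t = -\big(R + B^\top P_{t+1} B\big)^{-1} B^\top P_{t+1} A, \\[4pt]
P_t &= Q + A^\top P_{t+1} A
      - A^\top P_{t+1} B \big(R + B^\top P_{t+1} B\big)^{-1} B^\top P_{t+1} A,
\qquad P_T = Q .
\end{aligned}
```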
The only thing left is to convince ourselves what happens with this updated matrix, i.e. what its properties are. One can show that it is still positive semi-definite, even though we took it one step backwards in time; this again follows from the assumptions that R is positive definite and Q is positive semi-definite, so P remains positive semi-definite.

Putting these pieces together, we already have the dynamic programming algorithm for LQR. We have seen how to back up the cost and the controls by one time step, and we can apply this idea recursively: we start at the final time, where the terminal cost-to-go matrix P_T must be exactly Q, and then we recursively compute the state feedback strategies for each earlier time step, all the way back to the initial time, by applying the recursive rule, i.e. the two equations derived on the previous slides. Everything we need comes from the next time step, which we already know because we are going backwards, so we can compute the feedback gain and the quadratic cost-to-go at each step. With that we have a state feedback controller for every time step, and we know that following this optimal controller achieves this cost, the best cost achievable for the given problem.

To summarize, the main idea of the dynamic programming approach to LQR is to decompose the LQR problem into smaller one-step optimizations: instead of solving one big batch problem over all T time steps, we solve T very small one-step problems, and we can do this in a recursive fashion to obtain a state feedback strategy, one stage at a time. A few comments, in particular in contrast to the batch approach from the previous chapter. Instead of inverting one big H matrix of size mT by mT, which grows with the horizon of the problem, we now only invert T smaller matrices, each of dimension m by m. The matrix we have to invert, perhaps the most expensive part of the whole computation, now has constant size independent of the planning horizon; the complexity is linear in the horizon, because for a longer horizon we just solve more of these small problems, but the matrices themselves do not grow. The other nice property is that we do not just get an open-loop sequence of control values that is only valid for one specific initial condition; we have solved for a feedback strategy that can be evaluated at arbitrary states. That is quite interesting because it also allows us to compensate for errors: if you end up in a slightly different state, you do not have to recompute the whole optimization, you simply evaluate your feedback strategy at the new state estimate.
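A minimal NumPy sketch of this backward recursion, assuming time-invariant A, B, Q, R; the indexing (1-based via padded lists) is my own convention, not necessarily the slides'.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, T):
    """Dynamic-programming solution of finite-horizon LQR.
    Returns feedback gains F_t with u_t = F_t x_t (for t = 1..T-1) and
    cost-to-go matrices P_t with J*_t(x) = x^T P_t x (for t = 1..T)."""
    n, m = B.shape
    P = [None] * (T + 1)
    F = [None] * T
    P[T] = Q                                             # terminal cost-to-go: P_T = Q
    for t in range(T - 1, 0, -1):                        # recurse backwards in time
        S = R + B.T @ P[t + 1] @ B                       # positive definite, safe to invert
        F[t] = -np.linalg.solve(S, B.T @ P[t + 1] @ A)   # feedback gain
        P[t] = Q + A.T @ P[t + 1] @ (A + B @ F[t])       # cost-to-go update (Riccati recursion)
    return F, P
```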
That concludes the section on LQR, the deep dive into the special case where we can compute the optimal solution quite efficiently. Now let's look at how we can use this idea to also solve nonlinear problems, i.e. problems with potentially nonlinear dynamics and non-quadratic costs. The main idea we exploit is that we can approximate the solution of a nonlinear optimal control problem via iterative local approximations of the original problem as LQR, and then take small steps in the direction of the LQR solution.

Remember that for LQR we can analytically compute the global minimum of the cost landscape; the batch approach and the dynamic programming approach were just two different vehicles for doing that. It is perhaps easiest to think about this in 1D: if we have a nonlinear, non-quadratic cost landscape, we cannot solve the problem analytically anymore, but we can exploit the LQR solution to take steps towards the solution, or at least towards a local solution. What iterative LQR does is form a quadratic approximation of this cost landscape, a Taylor series approximation, which gives us an LQR problem that we can solve analytically to global optimality; that takes us in the direction of a local minimum. We approximate the original problem about the red operating point down here, step to the next point, which becomes our new operating point, form a quadratic approximation of the cost landscape there, i.e. locally approximate the problem as LQR again, and again step to the minimum of that LQR problem. If we repeat this sufficiently many times, we end up at a local minimum, which is why we use a different color for it: there is another minimum over here, perhaps the global minimum of the cost landscape, but at least we find a locally optimal solution. If you have seen approaches like Newton's method before, this will look very familiar; in fact it is a special case of Newton's method that exploits the particular structure of this control problem.

Let's go through the iterative LQR algorithm step by step; you can see the flowchart over here. The first thing we always need for these iterative approaches is an initial guess, some idea of the controls we may want to apply; for the car, slowly driving forward can be a simple initialization. The first step is to roll out these controls to get the corresponding trajectory: we take the controls and the initial condition from our state estimation pipeline and apply the forward simulation function defined earlier to obtain the entire sequence of states. These are not the optimal controls, just some controls we might apply, and rolling them out gives us a trajectory; those are steps one and two. Next we want to approximate the problem as an LQR problem, which means we need linear dynamics and quadratic costs, and the first step of this approximation is to linearize the dynamics about the operating point.
We have a guess of the controls and have simulated it forward in time to obtain the corresponding states. If the blue curve is the original, potentially nonlinear dynamics, we can locally approximate it with a first-order Taylor series expansion about the operating point, the simulated state at that time step and at this iteration of the algorithm. The affine approximation passes exactly through this red point, where it perfectly matches the original dynamics; the further away you move from it, the more it may deviate from the nonlinear dynamics. The Taylor series approximation about that point is exactly this expression, where we shift the coordinate system into the operating point: instead of using x as the state we use delta-x, defined as the deviation from the operating point, asking what the dynamics look like around our current operating point, and we center the inputs in the same way, delta-u = u minus u-bar, where u-bar is the input at the operating point. Doing so gives exactly the linear dynamics we already know how to work with, A times delta-x plus B times delta-u, with this slight redefinition of the state. The A and B matrices come straight from the Taylor expansion: A is the Jacobian of the dynamics with respect to x, describing how the next state changes when the current state changes slightly, and B is the Jacobian of the dynamics with respect to u.

We apply the same trick to the cost. If the blue surface is the original, potentially non-quadratic cost landscape, we form a quadratic approximation, this time about the operating point: we fit a quadratic function that matches the value at this point, matches the slope, and also matches the curvature. This is the quadratic approximation of g around the operating point, again written in terms of delta-x and delta-u, the deviations from this center. Part of it is exactly the quadratic cost we have already seen and know how to incorporate into the LQR problem, and then there are some additional terms we have not seen before. For reference, the terms that show up here are computed with exactly the same Taylor-expansion idea as on the previous slide; we will not go over them in detail. The question is how we can handle these additional terms.
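For reference, the two local approximations just described have the following standard form about an operating point (x-bar, u-bar); the slides may fold constants and factors of one half into the matrices differently:

```latex
\begin{aligned}
f(\bar{x}+\delta x,\ \bar{u}+\delta u) \;&\approx\; f(\bar{x},\bar{u})
   \;+\; A\,\delta x \;+\; B\,\delta u,
   \qquad A = \frac{\partial f}{\partial x}\Big|_{(\bar{x},\bar{u})},\quad
           B = \frac{\partial f}{\partial u}\Big|_{(\bar{x},\bar{u})}, \\[4pt]
g(\bar{x}+\delta x,\ \bar{u}+\delta u) \;&\approx\; g(\bar{x},\bar{u})
   \;+\; g_x^\top \delta x \;+\; g_u^\top \delta u
   \;+\; \tfrac{1}{2}\,\delta x^\top g_{xx}\,\delta x
   \;+\; \tfrac{1}{2}\,\delta u^\top g_{uu}\,\delta u
   \;+\; \delta u^\top g_{ux}\,\delta x .
\end{aligned}
```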
There is one further trick: before solving the LQR problem, we augment the state to include an additional 1 at the bottom, defining a new state z as delta-x with a 1 appended as an extra component, and we do the same for the input, using v, which is delta-u with a 1 at the bottom, instead of delta-u. If we do that, we also have to define the corresponding augmented dynamics, and you can convince yourself that they encode exactly the same dynamics as before: we embed the A matrix of the linearization as the top-left entry of a block matrix and pad B with zeros at the bottom. We can then rewrite the quadratic cost so that the additional affine terms from the previous slide are embedded in the quadratic cost matrix, and similarly for the control cost; again you can convince yourself that this cost represents exactly the same objective as before.

Now we have a standard LQR problem, which we already know how to solve, either with the batch approach or, more efficiently, with dynamic programming; so we solve it, which is the next block of the algorithm. As the final part of the iteration we step towards this solution: we have found the global optimum of the local approximation about our operating point and take a step in that direction, i.e. we take the original operating point and move by some step size alpha between 0 and 1 in the direction of delta-u*. A full step of size 1 would land exactly at the blue point, which seems like a good idea in this setting, but in practice we may choose smaller steps; we will see the trade-offs soon. That is the end of one iteration, and then we repeat: we have an updated operating point, i.e. updated controls, we roll them out again to get the updated states, form a new local LQR approximation of the problem, solve it again, and repeat until the operating point has converged, that is, until the trajectory no longer changes between iterations. That concludes the iLQR algorithm, which you can use to approximate nonlinear optimal control problems.
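A schematic Python sketch of this loop, reusing the rollout helper from the earlier state-elimination sketch; linearize, quadratize, and solve_local_lqr are hypothetical placeholders for the Taylor expansions and the LQR backward pass described above, not functions from any particular library.

```python
import numpy as np

def ilqr(f, g, x_init, u_guess, linearize, quadratize, solve_local_lqr,
         alpha=1.0, max_iters=50, tol=1e-6):
    """Iterative LQR outer loop (sketch): roll out, approximate locally as
    LQR, step towards the LQR solution, repeat until convergence."""
    us = list(u_guess)
    xs = rollout(f, x_init, us)                               # steps 1-2: simulate the initial guess
    for _ in range(max_iters):
        lin = [linearize(f, x, u) for x, u in zip(xs, us)]    # step 3: A_t, B_t about the operating point
        quad = [quadratize(g, x, u) for x, u in zip(xs, us)]  # quadratic cost approximation
        du = solve_local_lqr(lin, quad)                       # step 4: solve the local LQR problem
        us = [u + alpha * d for u, d in zip(us, du)]          # step 5: step of size alpha towards delta-u*
        xs_new = rollout(f, x_init, us)                       # re-simulate with the updated controls
        converged = max(np.linalg.norm(xn - xo)
                        for xn, xo in zip(xs_new, xs)) < tol
        xs = xs_new
        if converged:
            break                                             # operating point no longer changes
    return xs, us
```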
There are two practical considerations I want to highlight that are important to make this algorithm work reliably in practice; we will just go over the main ideas of why they matter and what you can do to address them.

The first is that we have to make sure the cost landscape of the LQR subproblem remains positive definite. In the original example we used to visualize the idea, the quadratic approximation was nicely curved upwards, and stepping to its minimum actually made progress towards a local minimum of the original cost. In other settings, however, when we linearize about a different operating point, the cost landscape may have no curvature at all, which corresponds to the matrix in our LQR problem not being invertible, because it has zero eigenvalues. Even worse, if we are close to a maximum, the quadratic approximation of the original cost curves downwards, and setting the gradient to zero finds a maximum instead of a minimum: the matrix is invertible, but the update points in the wrong direction. To fix this we apply what is called regularization. There are different ways to regularize the problem; in general, the task is to make sure that this matrix is always positive definite. Previously we simply assumed this, because R was positive definite and P positive semi-definite, but now we cannot choose these matrices: they are derived from the local approximation of the problem, so we have to make sure that these assumptions still hold after the Taylor series expansion. A simple remedy is to add a scaled identity, rho times I, to the expression, choosing the value rho large enough that the local approximation curves upwards. This essentially bends the approximation upwards at the operating point, ensures all eigenvalues are positive so that we can invert, and makes the step point in the correct direction, towards the minimum.

The second practical consideration is a suitable choice of the step size alpha. Choosing alpha involves a trade-off. On the one hand we want to make a lot of progress towards the goal: at our original operating point, a step size of one corresponds to stepping exactly to the minimum of the approximation, which is a good idea in that example. On the other hand, the local approximation was computed with a Taylor series expansion, which is only valid in a local neighborhood of the operating point, so we must not move too far, where the local model is no longer accurate. Here is an example of when that can go wrong: with a different cost landscape, linearized about this operating point, a stronger nonlinearity in the original cost makes it curve upwards quite quickly, while the local approximation predicts the minimum to be further away. Taking a full step of alpha equal to one to that predicted minimum would not reduce the cost but actually increase it in this iteration; that harms the convergence of the algorithm, and we may not even be able to find the local minimum.

The fix for choosing a suitable step size towards the local minimum is line search, where we adaptively find a good value of alpha by trying out different step sizes. For example, we try step size one and observe that the cost even increases; in a backtracking line search we then reduce the step size, say divide it by two, and take the smaller step. We do not have to recompute the approximation: we keep the same LQR problem and only re-evaluate the cost. With the smaller step we are a bit closer, but we would still not accept this step, because the candidate does not sufficiently decrease the cost; it sits at almost the same height as before, and we only accept steps that significantly reduce the cost. So we divide the step size one more time: now we take only a small step, but we end up very close to the local minimum, and we accept this step, because it does not increase the cost and it ensures a sufficient decrease.
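A small sketch of such a backtracking line search in Python. The sufficient-decrease test used here, requiring the cost to drop by at least a small fraction of the decrease predicted by the local model, is one common choice and an assumption on my part, not necessarily the lecture's exact criterion; cost_of and expected_decrease are hypothetical inputs.

```python
def backtracking_line_search(cost_of, us, du, j_current, expected_decrease,
                             alpha=1.0, shrink=0.5, c=1e-2, min_alpha=1e-4):
    """Backtracking line search (sketch): start with a full step and halve
    alpha until the candidate controls sufficiently decrease the cost."""
    while alpha >= min_alpha:
        candidate = [u + alpha * d for u, d in zip(us, du)]
        j_new = cost_of(candidate)                        # only the cost is re-evaluated,
        if j_new <= j_current - c * alpha * expected_decrease:   # not the LQR approximation
            return candidate, alpha                       # sufficient decrease: accept the step
        alpha *= shrink                                   # too aggressive: halve the step size
    return us, 0.0                                        # no acceptable step; keep the old controls
```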
That concludes the section on iterative LQR, which was the last section of this lecture. To summarize, the approach iLQR takes is the following: given some initial guess for the controls, i.e. some initial strategy we could apply, we iteratively refine that strategy by locally approximating the original nonlinear optimal control problem about the corresponding operating point as an LQR problem, solving that LQR problem, taking a step towards its solution, and repeating until convergence.

A few comments on this algorithm. In general it is a very efficient and well-established algorithm for nonlinear optimal control, in particular in robotics, and it works particularly well if the problem is sufficiently smooth. If there are strong nonlinearities, maybe even contact, it does not work as well, because the Taylor series expansion is not very accurate; but for many problems, such as the self-driving car, it is an excellent algorithm. We have also seen that the additional tricks, regularization and line search, are very important to make it reliable in practice: if a robot has to follow a track, it cannot afford to occasionally, accidentally increase the cost, so getting these practical considerations right matters for reliability. Finally, I want to highlight that the algorithm can be extended to include inequality constraints. For the scope of this lecture we simplified the setting by assuming no constraints on states and inputs beyond the dynamics, but there exist extensions such as penalty methods, where an additional cost term represents these constraints and the weight of this term is adapted over the iterations.

That concludes today's lecture. To summarize what we have looked at: we began with a quick review of model predictive control, which motivated today's deep dive into optimal control, i.e. the numerical methods behind MPC. We started with a review of how to eliminate some of the decision variables and the dynamics constraints via forward simulation to obtain an unconstrained optimization problem, hopefully convincing you that you may have already seen such problems in other contexts. We then did a deep dive into a special class of optimal control problems, those with linear dynamics and quadratic costs: we looked at the batch solution to LQR, which uses forward simulation to solve the problem as one big linear least-squares problem, and at dynamic programming, which exploits the fact that the solution to an LQR problem can be computed recursively quite efficiently. Finally, we pieced all of these ideas together to discuss the iterative LQR algorithm,
an algorithm that solves nonlinear optimal control problems by iteratively approximating the nonlinear problem with small, local LQR problems, which we use to find a step direction and make progress towards a local minimum.

If you are interested in reading more about these topics, there are several resources I can recommend. The first is the book "Predictive Control for Linear and Hybrid Systems" by Francesco Borrelli and co-authors; the lecture material for the corresponding MPC class at Berkeley is also available online. If you are interested in more of the numerical background, in particular the practical considerations, how to obtain convergence guarantees, and how to make these algorithms work reliably in practice, I recommend the book "Numerical Optimization" by Nocedal and Wright. Finally, if you are also interested in optimal control and trajectory optimization more broadly, and maybe some coding exercises, there is the course "Underactuated Robotics" by Russ Tedrake at MIT. Thank you for your attention.
Info
Channel: Cyrill Stachniss
Views: 1,553
Keywords: robotics, photogrammetry
Id: TtGCEzYxM_A
Length: 72min 36sec (4356 seconds)
Published: Sun Oct 24 2021