Numerical Optimization Algorithms: Step Size Via the Armijo Rule

Captions
Hello everyone, and welcome to another video. Today I'd like to continue our discussion of numerical optimization algorithms, and in particular I want to investigate using what's known as the Armijo rule for choosing the step size. Remember, we've talked extensively about step size selection in your numerical optimization routine; in fact we've had multiple videos on that in the past. If you haven't seen those two videos, please watch them before continuing, because I want to directly extend those ideas, and we're going to keep working with one of the cost functions we looked at in the line minimization video.

To set the stage for why we want the Armijo rule and why it's useful, let me draw your attention back to the cost function we examined in the line minimization discussion. The way line minimization chose the step size was this: if you're sitting at some location x in decision space and you've chosen your descent direction, d_k, you look along that ray from 0 out toward infinity, find the location that minimizes the cost function along it, and that's your step size. Theoretically and conceptually that's a great way to go about it, but we saw that practically you implement it by chopping the search direction into hundreds or thousands of points and iteratively evaluating the cost function at every single one of those locations before picking the minimum. That's how you'd actually code it up in MATLAB, C, whatever.

That's all fine and dandy, especially if you have an analytical or easily evaluatable version of your cost function. But think about it from another perspective: what if you didn't have the cost function available? What if it were hidden from you, or, in a more real-world situation, what if your cost function is an unknown black box? It might be an entire flight simulator where you stick in a decision vector and out comes the cost at that location; you don't have an analytical description of it at all. What's more concerning, if the cost function really is a whole flight simulator, then one run means booting up the simulator, flying a mission, and evaluating the cost. Evaluating the cost function even a single time might be computationally expensive, so doing it 5,000 times along a line is no longer tractable or practical.

The Armijo rule is similar to this line search, but it asks: instead of brute-forcing the search direction into thousands of samples, could I identify certain points that matter more than others? Could I test a step out here, and if that doesn't go well test another location, and maybe only have to do this four, five, maybe ten times along the direction, and still find a step size that makes the cost function go down, with some actual guarantees? That's what the Armijo rule buys us: very similar behavior to line minimization, at a much more favorable computational cost.

So let's talk about what the Armijo rule is. What's really nice is that it's fairly simple to implement and there aren't many parameters associated with it; in fact there are only three. The first parameter is a value s, and the only restriction is that s > 0. The second is β, which needs to be in the range zero to one, exclusive: strictly greater than zero and strictly less than one. You can't have β = 0 and you can't have β = 1, but anything in between is fine. Finally there's σ, which likewise needs to be in (0, 1) exclusive.

Once you have these three parameters, the step size is really easy:

    α_k = β^(m_k) · s

where m_k is the first non-negative integer m for which the following is true:

    f(x_k) ≥ f(x_k + β^m s d_k) − σ β^m s ∇f(x_k)^T d_k        (5)

(I've labeled this equation (5) to stay consistent with the notes.) This looks like a mouthful, and it is quite a bit, but we're going to see it's actually pretty slick. You choose the three parameters s, β, and σ up front; those are the inputs to the algorithm, and we'll talk about what they physically mean in a second, but for now imagine they're given to you. Then the step size is just β^(m_k) times s, and the only interesting part is m_k. You start at m = 0 and check whether equation (5) is satisfied. Look at the terms: you can evaluate the cost function where you are, you know β, s, and σ, you know m (it starts at zero, then goes to one, two, three, and so on), you've already picked your descent direction d_k, and you know the gradient of the cost function at the current location. So even though it might look like a mouthful, it's one simple check in an if statement. If m = 0 doesn't satisfy it, you bump up to m = 1, check again, then m = 2, et cetera, and eventually, as we'll see, this has to become true; that's when you break out and calculate your step size right off the bat.

In fact, let's write this up as pseudocode, because it's only going to be a few lines. To use MATLAB speak, let's prototype it as a MATLAB function. The whole deal with the Armijo rule is that it spits out α_k; that's all it does. You need to tell it the descent direction d_k and the three parameters s, β, and σ, and that's pretty much it. I'll assume the Armijo rule also has access to the cost function and the cost function gradient (heck, maybe you'd pass the gradient in as a parameter; we'll look at actual code in a second), but from a high-level perspective, that's what it needs to know. So: note that s, β, σ, and d_k are given/known. Initialize m = 0, then start a while loop. In the loop, literally all you're doing is checking: is equation (5) true? If it is, you're basically done: set m_k to the current value of m and break out of the loop. If not, increment m (this is really simple: m = m + 1), end the if statement, and go around the loop again. By the time you exit the while loop you have m_k defined, so the next line is just α_k = β^(m_k) · s, which is the return argument, and you're done. You can pretty much code this up in just a few lines.
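Here's a minimal MATLAB sketch of that routine. It's not the video's exact code: passing the current point xk and function handles f and gradf in as arguments is my own choice, but the loop is exactly the pseudocode just described.

```matlab
% Armijo rule step size selection: a sketch, not the video's exact code.
% f and gradf are function handles for the cost and its gradient; xk is
% the current point, dk a descent direction, and s, beta, sigma are the
% three Armijo parameters (s > 0, beta and sigma in (0,1) exclusive).
function alphaK = armijo_rule(f, gradf, xk, dk, s, beta, sigma)
    gdotd = gradf(xk)' * dk;   % directional derivative; negative for a descent direction
    fxk   = f(xk);             % cost at the current point (evaluate once)
    m = 0;
    while true
        alpha = beta^m * s;    % candidate step size for this sub-iteration
        % Equation (5): accept if the cost drops by at least the
        % improvement factor -sigma*alpha*gdotd (a positive quantity).
        if fxk >= f(xk + alpha*dk) - sigma*alpha*gdotd
            alphaK = alpha;    % this m is mk; return beta^mk * s
            break;
        end
        m = m + 1;             % shrink the candidate step and retry
    end
end
```

Saved as armijo_rule.m, a call like armijo_rule(f, gradf, xk, dk, 15, 0.98, 0.15) would use the parameter values we'll experiment with later in the video.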
While it's easy to implement, we should ask ourselves what is going on with the Armijo rule and why it works, and a lot of that comes down to what's happening in that little check. So give me a second to erase the board. All right, I've moved the important relationships up to this section of the board to get them out of the way: the three parameters of the Armijo rule, how we choose the step size, and the interesting check we perform at each sub-step inside the Armijo rule function. Let's make a couple of observations, and I think we'll understand what the Armijo rule is doing by digging a little deeper.

A couple of things to note right off the bat. First, s, β, and σ are all positive numbers; there's no way any of them can be zero or negative. In conjunction with that, β^m is also strictly positive, because β is always positive and m starts at zero and only counts upward through the integers.

Next, let's note what we know about d_k. It's a descent direction, and the defining property of a descent direction is that the gradient of the cost function at a location, dotted with the direction you're going, is strictly negative:

    ∇f(x_k)^T d_k < 0

That's the whole definition: the directional derivative of the function in that direction is negative, meaning the cost function immediately goes down in the vicinity of x_k if you move along d_k.

Now look at the entire second term in equation (5), including the minus sign: −σ β^m s ∇f(x_k)^T d_k. Let's tease it apart. The ∇f(x_k)^T d_k factor is negative; β^m is positive; σ is positive; s is positive; and the leading minus sign is, obviously, negative. Negative times positive times positive times negative: the whole thing has to be positive. In fact, let's rename this entire expression, negative sign included, the improvement factor, abbreviated IF:

    IF(m) = −σ β^m s ∇f(x_k)^T d_k

The other thing to note: σ is constant, β is constant, s is constant, the gradient is constant because we're considering a fixed location x_k, and d_k is constant; the only thing that can change is m. That's why I've written the improvement factor as a function of m. You start at m = 0, check whether equation (5) is true; if not, m becomes 1, check again; if not, m becomes 2, et cetera, so the improvement factor changes with every sub-iteration.

So let's make a couple of bullets. First, IF(m) is strictly positive for all m we're considering, that is, for all m in the non-negative integers {0, 1, 2, ...}. Second, look at how β^m behaves as m grows. Make a table: when m = 0, β^m = 1; when m = 1, it's β; when m = 2, it's β²; at 3, β³; yadda yadda. Because β is strictly less than one, β^m gets smaller and smaller and smaller. Numerically, say β = 0.8: the sequence is 1, 0.8, 0.64, 0.512, and so on. Do you see what I'm getting at? As m increases, the improvement factor decreases, because this β^m term is the only thing changing; everything else in the term is fixed. So, using arrows: IF is always greater than zero, and as you keep cranking m up, the improvement factor goes down and down and down.

If we accept all this, we can rewrite that complicated check from equation (5) much more compactly as

    f(x_k) ≥ f(x_k + β^m s d_k) + IF(m)
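As a quick sanity check on that monotone decrease, here are a few lines of MATLAB tabulating β^m and IF(m) for the β = 0.8 example above. The values of s, σ, and ∇f(x_k)^T d_k are made-up placeholders just so IF(m) can be evaluated.

```matlab
% Improvement factor IF(m) = -sigma * beta^m * s * (grad f(xk)' * dk).
% beta = 0.8 matches the example above; the rest are placeholders.
beta = 0.8; s = 1; sigma = 0.5;
gdotd = -4;                          % stand-in for grad f(xk)' * dk (< 0)
m  = 0:3;
IF = -sigma * beta.^m * s * gdotd;   % positive, and shrinking with m
disp([m; beta.^m; IF])               % beta^m: 1, 0.8, 0.64, 0.512
                                     % IF(m):  2, 1.6, 1.28, 1.024
```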
Now recall that the step size being tested at each sub-iteration of the Armijo rule is β^m · s. It's not α_k, the actual step size we're going to take; it's a candidate α = β^m s, and every time through the loop we're asking: does this candidate step size effectively work? Plugging that in, the left-hand side stays the same and the check becomes

    f(x_k) ≥ f(x_k + α d_k) + IF(m)

If α is your candidate step size and you move that distance from where you are along d_k, the quantity in parentheses is your potential candidate x_(k+1); that's where you're going. So you're asking: if I take a step of size α, does the place I land satisfy

    f(x_k) ≥ f(x_(k+1)) + IF(m)

What's great about this is you can immediately see where that crazy, wonky-looking check came from. All the Armijo rule is doing is picking a step size α that guarantees the cost function value at the new location is less than where you are by at least the improvement factor.

Let's back up and parse that again with made-up numbers. Say the cost function value at your current guess is 10.5, and for whatever values of s, β, and σ you've chosen, the improvement factor at m = 0 comes out to be, say, 2.5. The Armijo rule checks: if I take a step of this size along d_k, the cost function value there had better be, let me do the math, 8 or less, in order to make the inequality true. If the cost at the candidate location is 8 or less, thumbs up: the Armijo rule says you're good, you're done, you exit, and that's the step you take, because you've guaranteed the cost function value decreased by at least the improvement factor. Now, what if it's not true? Say the cost at the candidate location is 9. The cost did go down by moving there, from 10.5 to 9, but it didn't achieve the required improvement factor. In that case we go through the Armijo rule again: increment m by one, which, as we discussed, makes the improvement factor smaller, say 1.5, so you're not demanding as much improvement at the next sub-iteration. That also changes the candidate step size, and therefore the candidate cost function value, which might now come out to something like 7, and now you're hunky-dory. Sorry if that's a little confusing in the abstract; we're going to run through an explicit example, but hopefully we're getting a feel for what these parameters are doing.

To get an even better idea, let's rewrite this. Move the f(x_(k+1)) term to the other side:

    f(x_k) − f(x_(k+1)) ≥ IF(m)

Substituting x_(k+1) = x_k + α d_k and the definition of the improvement factor:

    f(x_k) − f(x_k + α d_k) ≥ −σ β^m s ∇f(x_k)^T d_k

Note that the β^m s on the right is nothing more than the candidate step size α again: at each sub-step of the Armijo rule inside that while loop, it's the candidate you'd like to test against this relationship. Pulling α out:

    f(x_k) − f(x_k + α d_k) ≥ −σ [∇f(x_k)^T d_k] α

Finally, to get rid of the negative sign, multiply both sides by −1 and flip the inequality:

    f(x_k + α d_k) − f(x_k) ≤ σ [∇f(x_k)^T d_k] α
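Just to nail down the acceptance check with the made-up numbers from a moment ago (10.5, 2.5, and the trial costs 9 and 7 are all illustrative, not from a real cost function):

```matlab
% Acceptance check with the made-up numbers above: f(xk) = 10.5 and
% IF(0) = 2.5, so a candidate step passes only if the new cost is <= 8.
f_xk = 10.5;
IF0  = 2.5;
f_try = 9.0;                    % trial cost at xk + alpha*dk
accept = f_xk >= f_try + IF0    % 10.5 >= 11.5 is false: bump m
% After incrementing m, the improvement factor shrinks (say to 1.5) and
% the candidate step changes; a trial cost of 7 would now pass:
accept = f_xk >= 7.0 + 1.5      % 10.5 >= 8.5 is true: exit the loop
```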
Now look at the two sides of that last inequality, f(x_k + α d_k) − f(x_k) ≤ σ [∇f(x_k)^T d_k] α, from a graphing perspective. The right-hand side looks like y = m·x: σ [∇f(x_k)^T d_k] is a slope m, α is the independent variable x, so it's the equation of a line through the origin. The left-hand side is how much the cost function changes from its value at x_k if you take a step of size α. So let's make a plot with α on the x-axis and f(x_k + α d_k) − f(x_k) on the y-axis.

First, which direction does the line go? We know ∇f(x_k)^T d_k is negative, because the whole deal was that d_k is a descent direction, and σ, the way it was chosen, is between 0 and 1, so it's positive. In other words the slope has to be negative, so the green line σ [∇f(x_k)^T d_k] α slopes downward through the origin.

What about the left-hand side? If you think about it long enough, this is almost exactly the picture we examined when talking about line minimization: it's asking how the cost function changes as you move along a line starting at x_k, in the direction d_k, by a distance α. It's basically the cost function value as you travel along the ray, offset by its value at the current location so it starts at zero. So the blue curve (and I'm sketching something representative here; I'll assume you've watched the previous video on line minimization) wiggles along doing whatever the cost function does.

Now we can see graphically where the check is satisfied. What the Armijo rule is checking is: where does the blue curve fall below the green line? Looking at the sketch, in this window from here to here the blue is below the green, so any α in that range is an acceptable choice: the Armijo rule gives a thumbs up and exits the while loop we defined earlier. Over here there's another window of acceptable α choices, whereas the regions in between, where blue sits above green, are no good; I'll mark those with x's as bad. Hopefully graphically you see what's going on. So now let's ask: can we tune the green line? Its slope is what it is, so what tuning knobs or control authority do we have?
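Before answering that, here's a little MATLAB sketch of the picture itself. The video's cost function isn't reproduced in the captions, so the blue curve below is a stand-in φ(α) with the right qualitative features (φ(0) = 0 and initial slope equal to the directional derivative); only the geometry matters.

```matlab
% Acceptance picture: blue curve f(xk + alpha*dk) - f(xk) versus the
% green Armijo line sigma*[grad f(xk)'*dk]*alpha. Stand-in numbers only.
gdotd = -2.0;                      % stand-in directional derivative (< 0)
sigma = 0.15;
phi = @(a) gdotd*a + 0.5*a.^2;     % toy cost change along the ray; phi'(0) = gdotd
a = linspace(0, 5, 200);
plot(a, phi(a), 'b', a, sigma*gdotd*a, 'g'); grid on;
xlabel('\alpha');
legend('f(x_k+\alpha d_k) - f(x_k)', '\sigma [\nabla f(x_k)^T d_k] \alpha', ...
       'Location', 'northwest');
% Acceptable step sizes are wherever the blue curve is at or below the
% green line; for this toy curve, that's 0 < alpha <= 3.4.
```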
Well, d_k is fixed: we know which direction we're heading. The gradient is fixed too: we know what it looks like at x_k and we can't change it. But σ, we can see, will directly tune how steep or shallow the green line is. Remember, σ is between zero and one. Almost the worst-case scenario is σ = 1. What happens then? The slope becomes the full ∇f(x_k)^T d_k, and the line goes exactly tangent to the blue curve at the origin; that's the σ = 1 line. Since we can't actually make σ equal to one, σ has to be slightly less, so really the line tilts slightly up from that tangent (I can't infinitesimally draw it that way, but you get the idea). And here's an interesting thought experiment for you as the viewer: what does this look like in the neighborhood of α = 0? Since the σ = 1 line is tangent to the blue curve at the origin, a line with σ slightly less than one is slightly shallower, and if you zoom way in on the picture near the origin, the blue curve should dip under the green line ever so slightly. What this tells us is that if you choose α small enough, the blue curve eventually falls below the green line and the Armijo rule is satisfied. That's really kind of interesting: the rule has a baked-in protection that if α gets small enough, it must exit. So you're never going to get stuck in that while loop. You can also see why α gets small: as m goes up and up, no matter what s value you have, β^m converges to zero, therefore α converges to zero, and you move inexorably toward this unavoidable situation near the origin where the check succeeds. Hopefully that puts your mind at rest; whenever I write a while-true loop I always ask myself whether there's a situation where I'll get stuck in an infinite loop, and hopefully this analysis convinces you that it shouldn't happen with the Armijo rule if it's coded up appropriately.

This is great, because now we can see what all of these parameters do.

Start with σ. We now understand that σ is telling us how much improvement we demand at each step of the Armijo rule, because it tunes the slope of the green line. If the green line isn't so steep, closer to flat, the Armijo rule will exit even when the improvement factor isn't very high: it will let you accept step sizes where the cost function doesn't improve that much. So σ closer to one means more aggressive demands for improvement, because you're getting closer to the tangent situation where you need a lot of improvement, whereas σ closer to zero means a much flatter line and less aggressive demands.

What about β? Let me use another color so things don't run into each other. You can see right where β works its way in: β tells us how much the candidate step size changes with each sub-iteration of the Armijo rule. A β closer to one means the step sizes don't change very much; you can see that clearly, because β^m shrinks slowly with each increment of m. Closer to zero means more change. Coming back to the picture: say you go through the Armijo rule with whatever constants and the first candidate step lands in the unacceptable range, out among the x's; call that the m = 0 sub-iteration. The rule says no, bumps m up to 1, and how much the candidate moves depends on your β. With β near one it makes small steps back, and hopefully you end up inside an acceptable window. Or you could take large steps, and here's what's actually interesting: with a large jump you can leap completely over an acceptable range and land in the next unacceptable range, and then it has to go again. We'll run this in MATLAB so you can get a better feel for it, but that's what β is doing: β tunes the step size schedule, and σ tunes the amount of improvement, i.e., the slope of the green line.

Finally, let's ask: what is s? You can see it right away. The Armijo rule starts at m = 0, so the first candidate α it chooses is β^0 · s, that is, α = s. So right off the bat, s is the maximum step size the algorithm will consider: the value at which it starts. Let's write that down so we're all on the same page: s is the maximum α value to consider in the Armijo rule. Coming back to our picture, if I knew I only wanted to look for step sizes in the range of, say, zero to eight, and didn't need to consider α all the way out to infinity, I'd just cap it and choose s = 8 in my algorithm.

I think we've analyzed this thing to death, so maybe what we should do is jump over to MATLAB, code all this up for this exact problem, let the Armijo rule rip, and see what it comes up with. One thing I probably should have mentioned in the last discussion on line minimization: I forgot to characterize all of the stationary points of this cost function. I think I stated without proof that the point we were gunning for was the global minimum, but there are actually multiple stationary points and several local minima throughout this cost function. Let me clean this picture up as best I can without erasing too much. These red dot locations are the stationary points of the problem, and if you evaluate the cost function at them, this one here has a cost function value of 60.
These two have cost function values of about 42.5, and the remaining two have values of J = 36.4. So indeed, either of those last two is one you can go for: ideally those are the global minima of this cost function, and all the other stationary points are local minima. Like I said, I state that without proof; if you're interested in the actual locations, feel free to check out the notes. This is what we're hoping the algorithm reaches, the bottom location, from one of these starting points. All right, that's enough preamble; let's jump over to MATLAB and see if this works.

So here we are in MATLAB, and let's code up one iteration of the Armijo rule, using the exact same scenario as when we previously examined the line minimization technique for choosing the step size. For this first scenario we choose the same starting location, and an s of 15; remember, that's the initial maximum step size we'll consider. We'll choose reasonable values for σ and β: σ = 0.15, nowhere near one, so we're basically saying we don't demand a lot of improvement with each iteration, and β = 0.98, close to one, so we take small incremental changes in the candidate step size with each sub-iteration of the Armijo rule.

There's a lot of code here, but most of it is plotting, so let me scroll down to the actual guts of the Armijo rule. And here it is, basically what we talked about on the board: set the integer m to zero, drop into the while loop, compute the candidate α = β^m · s, look at where we'd go, x_(k+1), with that candidate step size and the associated direction, and check whether the cost function value at that point decreases from the previous value by the specified improvement factor. If so, we use that candidate as the actual step size for this iteration and break out of the loop; if not, we increment m by one and run through it again.

Let's run it and take a look. Just for giggles, here's the picture that goes along with it: our initial guess location, and we're searching along this direction for a maximum of 15 units. And here's what the algorithm sees; let me get the legend out of the way. The blue line, as we discussed, is the cost function value minus the initial cost function value, so you can think of it as an offset version of the cost function along the green search line. The magenta line is the improvement-factor line you'd get with σ exactly equal to one, a sort of worst-case, steepest line. But since our σ is 0.15, the green line's slope is reduced to 15% of that, so the green line is less steep than the magenta one. Since we started with s = 15, the first candidate step size examined was 15 itself, and all we check is whether the cost function value at that point decreased from this point by the specified improvement factor: in other words, does the blue line fall below the green line? Obviously the answer is no here, so we increment to the next candidate: m = 1 gives a new candidate α of 14.7, and we check again. Is the blue underneath the green? No. We keep going: no, no, no, no, and eventually you see the blue does drop below the green, so we exit the algorithm. That's one call of the Armijo rule. Rather than me squinting at the plot, let's just ask MATLAB: α_k is 9.4 in this case, and the counter m shows it took 23 sub-iterations to find a step size that satisfies the Armijo rule with this set of parameters.

Now let's repeat the process. I'll comment out the `close all` so we keep the old figures from scenario one, and change this to run scenario two. Scenario two, if you look at the parameters, is exactly the same as before except the β value is not as close to one: 0.85. What this should do is make the Armijo rule take larger changes in candidate step size within every single sub-iteration. Run it, pull up the figures, and put scenario one on the left and scenario two on the right so we can compare. The blue, green, and magenta lines are identical, the same sets of parameters, and s = 15 is the same, so the first candidate into the Armijo rule is the same. The only difference is that since the β in scenario two is not as close to one, each time you increment m the candidate step size jumps by a larger delta. In this case we find an acceptable step size almost immediately; I only plotted the failed candidates, not the accepted one, but you can see that a step size over here yields a spot where the blue line is below the green line. Going back to MATLAB and pulling up the value: earlier we got α_k = 9.4, now we get 9.2. It still satisfies the specified improvement factor, except instead of taking 23 sub-iterations it only took three to get there. So this was faster, and it still got us a situation where the cost function was improved by the specified amount.
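You can reproduce those two candidate-step schedules in a couple of lines; the numbers line up with what the runs reported (about 9.4 after 23 sub-iterations versus about 9.2 after 3):

```matlab
% Candidate step sizes alpha = beta^m * s for scenarios 1 and 2.
s = 15;  m = 0:25;
alpha1 = s * 0.98.^m;   % scenario 1: slow shrink; alpha1(24) ~ 9.4 (m = 23)
alpha2 = s * 0.85.^m;   % scenario 2: fast shrink; alpha2(4)  ~ 9.2 (m = 3)
disp(alpha1(1:5)); disp(alpha2(1:5));
```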
Let's keep playing around with these parameters to get a better feel for them. For scenario three, all I'm changing is the location of the starting point: instead of starting at (−4, 3), I want to go to this other location. The reason is that I know some of you may be following along with the nonlinear programming book by Bertsekas, which I mentioned and referenced earlier; it's an excellent resource, and they draw a figure in that book, which I'll show here for convenience, where the cost function along the search line looks more like the behavior we sketched on the board, with alternating acceptable and unacceptable ranges of α. If you look at the pictures we've been drawing so far, it's pretty clear there's just one delineation: everything to the right of the intersection point is unacceptable and everything to the left is acceptable, which is a little boring. So I want to move the starting point so we get more interesting behavior.

Let's run it and look at the differences. Here's the figure for the new starting location, and on the contour plot you can see both the new starting point and the old one used for scenarios one and two. They look very similar; I moved the starting point ever so slightly, just enough that the cost function has more interesting variation along the line. Now you can see sections where the blue line goes under the green line, then over it, then under, then over again. The whole deal with the Armijo rule is that we're monkeying with the slope of the green line and looking for places where the blue line crosses under it, so now we have a section of acceptable step sizes, a region of unacceptable ones, and then, further out, a region of acceptable ones again; all these x's are obviously in an unacceptable region. In this case I chose s = 7, and you can see the initial candidate step size was at 7. The rule walks back: blue above the green, above the green, above the green, and then finally, aha, here it goes below the green, so it catches a step size somewhere in that first acceptable region. Asking the algorithm: α = 4.6.

Here's what's interesting. Put the contour plot side by side with this, and remember what we were trying to do. When we characterized this cost function, we said there were five stationary points, at the centers of all these circles, and we're trying to find the two global minima, down here and up here. And it looks like we fell into the correct hole, so to speak: the Armijo rule is producing step sizes that take us toward the bottom minimum, which is what we want. But notice there's another region of acceptable α values here, between zero and, I don't know, something like 1.8 on this picture. So what could potentially happen if we don't choose the Armijo parameters appropriately? For example, what if we chose an s value of, say, somewhere between 2 and 3.75 on this picture? You'd start there, walk back through unacceptable candidates toward this inner acceptable window, and that might correspond to falling into this other minimum instead of the one we're looking for.

So let's look at exactly that. Scenario four, as you can see, is the same as scenario three except the s value is smaller: instead of starting at 7 and searching backwards, we start at 3.5 and search backwards. Run it, and that is exactly what happens. Pulling up all the figures at once: the top plots are scenario three, where we started at 7, searched backwards, and fell into the right spot. Starting at 3.5 and searching backwards, the rule finds an α in that inner window, and that starts nudging the algorithm toward this other, local minimum. So, something to be aware of: when you're picking these parameters, you may need to choose different values depending on your problem to make sure you guide the optimizer toward the appropriate minimum.

While we're talking about picking the right parameters, I know there's that nagging thing in the back of everyone's head from earlier: are you ever going to get out of that while loop? I said earlier that mathematically it should always exit, so let's look at scenario five, a sort of worst-case scenario. We don't really care where we start; it uses the same parameters as scenario three, but now I'm going to choose σ really darn close to one. Remember, you don't want it exactly equal to one, it has to be in (0, 1) exclusive, so let's choose 0.9999. That makes the improvement-factor line super steep; in fact we'll see the green and magenta lines overlap in this case. And let's also choose a β value really, really close to one, meaning that with each sub-iteration of the Armijo rule the candidate step size barely changes. So this is almost a worst-case scenario. Let me run it, and you'll notice it takes a while; earlier the algorithm would exit within 10 or 20 iterations, but this one runs and runs. OK, it finally finished: the counter shows over a thousand sub-iterations, over a thousand trips through that while loop, but it did exit. And as expected, α_k in this case came out very, very small. In the accompanying picture (which takes a while to redraw because of all the plotted steps), the magenta and green lines are right on top of each other, and it sure looks like the green line is underneath the blue line everywhere, as if blue never drops below green. But remember, because we chose a descent direction, if you zoom in close enough to the origin the blue line must eventually dip under the green one; we know it exited, so we know that if I kept zooming in we'd eventually see it. I don't think it's worth our time to keep zooming and waiting for my sluggish computer to update, but you get the point: eventually it works, and the Armijo rule exits even in these worst-case scenarios. It just took a lot of iterations, so there are situations where this can take a while.

OK, I think that gives us a really good idea and a good feel for what all these parameters are doing. Now let's let this thing rip inside the full numerical optimization algorithm and see where the system converges. Jumping over to the right side of my code: this is the same code we were looking at with the constant step size, the diminishing step size, and the line minimization step size; I've just added some scenarios for Armijo step size selection. All we had to do was add code down where the step size is chosen: instead of constant, diminishing, or line minimization, there's now a case for the Armijo rule, and it's literally the same few simple lines of the Armijo algorithm copy-pasted over, so I don't think we need to go over it again. The only thing I will point out is that I'm storing what m is at every step of the algorithm. Remember, each time you increment m you evaluate the cost function once, so this lets us count how many times we actually had to evaluate the cost function during the whole optimization run. So let's come back up to the top and execute this first vanilla scenario, scenario 10.
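For reference, here's a hedged sketch of what that outer loop looks like with the Armijo rule dropped in, including the m bookkeeping. The quadratic cost and its gradient are stand-ins, since the video's actual cost function isn't reproduced in the captions; the scenario-10 parameters and the (−4, 3) start are from the video.

```matlab
% Gradient descent with Armijo step selection: a sketch, not the video's
% exact script. The cost below is a placeholder, not the video's cost.
f     = @(x) x(1)^2 + 3*x(2)^2;      % stand-in cost function
gradf = @(x) [2*x(1); 6*x(2)];       % its gradient

x = [-4; 3];                         % initial guess used in the video
s = 15; sigma = 0.15; beta = 0.98;   % scenario-10 Armijo parameters
nSteps = 10;
mData = zeros(1, nSteps);            % cost evaluations per outer step

for k = 1:nSteps
    g  = gradf(x);
    dk = -g;                         % steepest-descent direction
    m  = 0;
    while true                       % Armijo sub-iterations
        alpha = beta^m * s;
        if f(x) >= f(x + alpha*dk) - sigma*alpha*(g'*dk)
            break;                   % equation (5) satisfied
        end
        m = m + 1;
    end
    mData(k) = m + 1;                % candidate steps tried this step
    x = x + alpha*dk;                % take the accepted step
end
fprintf('total cost function evaluations: %d\n', sum(mData));
```

Summing mData at the end is exactly the bookkeeping used below to compare scenario 10 against scenario 11.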
that is down here here we go here's our armijo rule that we're using with the uh same steps or same initial guess location we always were looking at and now what we're going to do is let's again choose an s of 15 sigma of 0.15 beta 0.98 in fact i think that's exactly what we chose when we were looking at um it over here for executing for a single iteration yeah i think this is the similar to this is basically using the parameters of scenario one in the single iteration we're going to use that in the overall scenario and i'm just going to keep iterating and see if we converge to one of those stationary points right so let's go ahead and run this guy and we'll run it for 10 steps and just see where it goes and uh oh shucks sorry i uh shoot i probably should have done this down here uh tell you what let me let me put in a break point so we can so i can move the figures down here and we can see what's going on uh let me put a pause here okay run uh okay so here's the picture in fact let's dock this in the over to the side so maybe we can see it a little bit more easily okay so again here's our initial location and uh we're free to go ahead and now just step through this so i'm going to continue and again that is the same uh thing that we got earlier right so we see it jumps down here we found ourselves a step size which satisfied the armijo rule so we're going to jump down here here's our next point now here's the fascinating thing in this case remember that we had an s of what was it uh 15 i think right here right s of 15. so again all the armijo rule is going to do is same thing we're going to look at getting a descent direction here using gradient descent so if i look at this i can tell that it's pretty much along this line where my i'm trying to draw with my my mouse right now right so it's going to look in that direction for start all the way at 15 so it's going to start actually all the way over here and then start backtracking backtracking backtracking until the cost function improvement satisfies the uh or is better than the improvement factor right that we saw earlier so actually watch this at the next step of this algorithm look at this interesting it actually came up here and found a location that it likes so now what's also fascinating is if you keep doing this let's keep running uh come on uh continue continue look at this it's actually starting to converge to this upper minimum which again i guess if we look at the performance of the algorithm right the cost function value it starts up high when it comes to and down low and it sure looks like it is converging um quite nicely to a minima and again it is a this this is a proper global minimum it's maybe just not the one we're expecting the one we were kind of looking for is this one down here the one in green that i had highlighted i think that's what we found in the line minimization technique was we ended up getting uh the algorithm to converge down here but you know this is basically the same thing right the cost function values are the same so it's it's getting us the right answer but this is something that's kind of interesting about these numerical algorithms is sometimes they get you solutions which might not be intuitive but at least mathematically they're correct so it is seeming to do that in this case now if we didn't like this and we wanted to make sure that we sort of ended up down here i think what we can do is as you saw the reason it was able to jump from point number two to point number three and make that large of 
Now, if we didn't like this and wanted to make sure that we ended up down here, notice that the reason it was able to jump from point number two to point number three, and make that large a jump, is that we allowed it to in our selection of s: we said s was 15. So let's talk about scenario 11. Scenario 11 is basically the same thing, but as you can see I'm just changing the beta parameter, so hopefully we won't take as many cost function evaluations. For example, if I look at mData, at step number one we had to evaluate the cost function 23 times in order to get one location; then on step number two it took us 26 cost function evaluations, et cetera. The sum of this vector is the total number of evaluations, so in scenario 10 it took us 1283 cost function evaluations to converge to that upper global minimum.

In scenario 11, I'm just going to see if I can reduce that number a little bit. So let me change that up here to scenario 11, run this again, and bring up the plot. Here's the performance and here's what the plot looks like. We see the same thing: we go from here, jump down here, and then jump up to the top minimum. The behavior is a little different after that, but at the end of the day it still finds the same global minimum. But now if we look at mData, you see that each sub-iteration of the Armijo rule takes fewer and fewer evaluations; the total is now only 232, where the earlier number was 1283. So it only took about 18% of the number of function evaluations, which is much more economical in finding that minimum. Okay, so now let's talk about scenario number 12. Let's scroll down here and take a look at what's going on with scenario number 12.
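Here's a quick sketch of the bookkeeping around that evaluation count, with armijoStep being the sketch from above and the remaining names (J, gradJ, x0, numSteps) again placeholders rather than the video's actual code.

    % Sketch: record each outer iteration's evaluation count in mData so
    % that sum(mData) gives the total cost of the whole run.
    mData = zeros(1, numSteps);
    x = x0;
    for k = 1:numSteps
        g = gradJ(x);                  % gradient at the current point
        d = -g;                        % gradient descent direction
        [alpha, mData(k)] = armijoStep(J, g, x, d, s, sigma, beta);
        x = x + alpha*d;               % take the accepted step
    end
    fprintf('Total cost function evaluations: %d\n', sum(mData));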
Okay, so this is the case where I'm going to take the exact same parameters as scenario 11, but I'm going to handicap the distance s. Instead of allowing it to search 15 units along the descent direction, I'm only going to allow it to search 10 units. That should avoid this problem, or I guess I don't know if you want to call it a problem, this phenomenon of allowing it to jump from down here all the way up to here in a single step. So let's run scenario 12. Actually, I think I changed it already, so let me put a breakpoint in again down here so I can move the figure onto the screen and we can see it easily; hit F5 to run this, and let's dock the figure.

Now let's run. As you can see, on the first step we jump down here. On this next step, if you continue, look at that: it stopped here, and that's because we didn't allow it to search all the way up here; it had to stop within 10 units. And now you can see it starting to converge to this bottom minimum, as we discussed earlier. So we still get it falling into one of these global minima, where I think the cost function value is about 36.4, the actual minimum value. We're nailing the same value; it just happens to find the other minimum. This is great: we're seeing that we can really tune how the optimization algorithm performs by changing these parameters of the Armijo rule.

So let's look at scenario 13. Scenario 13 is another variation on the theme: now we're just going to change the initial condition and show that we can still find that bottom global minimum. Again, this is not terribly complicated, so let's run it and take a look. Here's the performance and here are the actual results. As you can see, it converges nicely, but with this initial condition we are also in that danger zone we looked at earlier, in the sense that in order to get to this minimum, it had to cross over this other local minimum on its way to finding the global minimum. In this case I think we were lucky, so to speak, with these parameters. With s = 7 (this is actually the case we looked at with the single iteration), you give it enough leeway to search that it can find the global minimum. But if we were to handicap that, if we tell it not to search so far, we'll get stuck in this local minimum instead of finding the global minimum. In fact, we can see that with scenario 14.
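Just to summarize, these scenarios amount to the same driver run with different parameter sets. Hypothetically, you could organize them along these lines; the initial guesses x0A and x0B and the reduced backtracking factor betaFast are placeholders, since those exact values aren't shown in the transcript.

    % Hypothetical bookkeeping for the scenarios discussed above: the same
    % driver with different Armijo parameters and initial conditions.
    scn(1) = struct('s', 15, 'sigma', 0.15, 'beta', 0.98,     'x0', x0A); % scenario 10
    scn(2) = struct('s', 15, 'sigma', 0.15, 'beta', betaFast, 'x0', x0A); % scenario 11: faster backtracking
    scn(3) = struct('s', 10, 'sigma', 0.15, 'beta', betaFast, 'x0', x0A); % scenario 12: capped search distance
    scn(4) = struct('s',  7, 'sigma', 0.15, 'beta', betaFast, 'x0', x0B); % scenario 13: new initial condition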
If we go ahead and change this to scenario number 14 and rerun: here's the performance and here are the results. As you can clearly see, we fall into the local minimum with the Armijo rule, and we do not get down to the global minimum function value. So that's a little bit interesting. That being said, let's go back to scenario 13, because scenario 13 got us what we were looking for. It reaches the location we're interested in, so it solved the problem appropriately, and the nice thing is that if you look at the number of function evaluations, it took something like 837 cost function evaluations to run the algorithm.

Now compare that with the line minimization we looked at in the previous discussion. That also finds the appropriate location, but if you think about it long enough, remember how we practically implement line minimization: you slice up the step sizes alpha between zero and s, your maximum value, into some number of samples. If I run line minimization, you see we nailed it in only five steps; it found this bottom global minimum, and on first appearance that looks very efficient. But at each step of that algorithm, you have to evaluate the cost function at all of those alpha values, and in this case that's 1500 cost function evaluations for one step of the algorithm. So if it took five steps to get to the global minimum, and at each one of those I'm doing 1500 evaluations, I'm evaluating the cost function 7500 times in order to generate this result and converge to the solution via line minimization. And we saw that the Armijo rule got it to us in, oh gosh, I already forgot the number, something like 837. So look at that: it's only about 10% of the evaluations, roughly 90% cheaper. Even though the Armijo rule took more steps of the algorithm, it is far better in terms of the economy of evaluating the cost function. It might take more steps, but it will get there faster than something like line minimization.
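To make that comparison concrete, here's a rough sketch of the brute-force line minimization cost, reusing the assumed names from the earlier sketches. The point is simply that every outer step pays the full N evaluations no matter where the minimum lies.

    % Sketch: brute-force line minimization along d, for comparison.
    % Discretizing [0, s] into N samples costs N evaluations per outer
    % step, so 5 steps at N = 1500 is 7500 total (vs ~837 for Armijo here).
    N      = 1500;
    alphas = linspace(0, s, N);          % candidate step sizes
    Jline  = zeros(1, N);
    for i = 1:N
        Jline(i) = J(x + alphas(i)*d);   % one cost evaluation per sample
    end
    [~, iBest] = min(Jline);
    alpha = alphas(iBest);               % step size = argmin along the ray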
Okay, I think that's enough talking; why don't we go back to the board and wrap this up. Alright, so there you have it: one more tool in your tool chest of numerical optimization algorithms, specifically how to choose the step size via the Armijo rule. As you can see, it might appear complicated at first blush, but implementing it is actually not that bad; you can code it up in just a couple of lines, and you end up with a technique that provides some very nice guarantees about convergence and some very nice behavior. It saves us from evaluating the cost function an excessive number of times and really allows us to zero in on step sizes that guarantee a specific amount of cost function improvement with each step of the algorithm.

With that being said, I think this is probably a great spot to leave it. I hope you enjoyed the video, and if so, I also hope you'll consider subscribing to the channel; if you scroll a little ways down and click on that subscribe button, it really does help me continue making these videos. Another way you can support the channel is via Patreon, and the nice thing about that route is that 100% of the proceeds the channel receives via Patreon are directed towards science and engineering adventures for K-12 kids and young adults. So I think this is probably a good spot to sign off. Remember, new videos come out every Monday, so I hope I'll catch you at one of those and we can all learn something new together. Alright, until then, I'll talk to you later. Bye.
Info
Channel: Christopher Lum
Views: 858
Keywords: minimization rule, Armijo step size, Armijo condition, line search, backtracking, Goldstein rule
Id: Uz3B9fVb4LQ
Length: 76min 43sec (4603 seconds)
Published: Mon Nov 01 2021