Applied Optimization - Least Squares Curve Fit

Video Statistics and Information

Captions
[Music] Hello again, it's good to be back with you. In today's lesson I'm going to talk about least-squares curve fits. Now, this is something many of us do a lot. If you've ever used Excel or another spreadsheet program, there's that little Add Trendline feature you can use. That's a least-squares curve fit, and the least-squares curve fit is an example of optimization, although often we don't realize it's optimization. Well, if we're going to press that button a lot when we're working with Excel, wouldn't it be good if we knew what it was doing? Sounds like a good idea, so let's do that.

Here's basically how least-squares curve fits operate. We'll learn this first, and then we'll go back and figure out mathematically what's really going on. Let's say I have three points. I'll just use three to start, to keep it simple; you can use any number of points. And I want to fit a line through those. Well, if I click the Add Trendline button, I might get something that looks like that. That is an optimal fit. Optimal means it's better than all the other fits that the program could have used.

Let's start labeling some things here and see if we can get some sort of mathematical handle on what's going on. That's point (x1, y1), that's (x2, y2), and that's (x3, y3). This is a curve fit called f(x, y). Now, these are finite points; these are data. This is a continuous function. Now, why would you want a continuous function to approximate data? This data could mean anything. It could be measured data, it could be financial data, it could be population, who knows. What we're trying to do is come up with a function that's close to all our data points, and there are a couple of reasons you'd want to do that. One is that we're trying to find trends: add trendline. Well, these three points are trending upward and to the right. If we added more points, we would kind of expect them to go out that way. Another is that if you're running a
mathematical model, sometimes it's hard to put data points into your model. If your model is continuous, you want a continuous approximation to your data. So that's another reason. There are others, but those are the two big ones that you see a lot.

All right, for this to be optimal, how is it optimal? Well, what it's doing is minimizing the error between your data points and that curve. How is it doing that? Let's define an error here. Actually, I'm going to clear some room out here: get rid of that, and that, and then maybe this. Okay, clean that out. Let's call this distance, the vertical distance from the point to the line, e1, just for error one. This one I'll call e2; I'll put that on the other side, sorry gang. And this one I'll call e3. All right, so there you go.

Well, how about adding those up, minimizing the error, and calling it good? Let's try that. E, the total error, is e1 plus e2 plus e3. Okay, so minimize that and we're good to go. Except there's a problem: this isn't going to work as I have it written down right now. This one is below the line, so we'll call that error negative; we'll call this one positive; and that one was negative. If this one was really big, and the sum of those two was also really big but with the opposite sign, we could have very large individual errors that all added up to pretty close to zero. Well, that's not going to help. Or I could have a negative error so big it swamped all the others, and I'd have a very large negative total error. Well, that's less than zero, so when you minimize the error, you'd want to make that one huge. That's not what we're trying to do. We're trying to get these as close to that line as we can.

So how are we going to do it so zero is the minimum, no matter whether the errors are positive or negative? Well, you probably see this coming: square these. It doesn't matter whether you give it a positive number or a negative number; once you square it, the lowest that thing can possibly be is zero. So if every one of these points goes exactly through
that line, the sum total of the errors will be zero. The lowest value this can have is zero, and the only way it can get to that value is if those points all go through that line. So what I'm going to do is write these out mathematically, add them up, and then minimize the function.

One last step here: I need a function to use for the fit. Now, I can pick absolutely anything. I'm going to use a straight line here just for simplicity, but there are some very, very good curve-fitting programs available right now. There used to be one called TableCurve that would go through six or eight thousand fits. It was just like magic; it was great. I don't know if it's still around, but the point is that good curve-fitting software can use many, many different possible curve-fitting functions. The fact that I'm using a straight line here is only for simplicity. Any function you want, you can use.

Now, there's such a thing as overfitting your data. If you use some crazy rational polynomial or something, you'd better have a pretty good reason for doing it. I've been an engineer for 35 years or so now, and I've used rational polynomials once, where it was really called for and I could justify it. So in 35 years, I went to those really crazy functions once. Most of the time I'm doing low-order polynomials: straight lines, parabolas, things like that, because I don't have a real solid reason, a good physical argument, for using anything more than that. So don't overfit your data.

Let's do this. Okay, that's the equation for a straight line [y = ax + b]. Back in junior high school or something, your teacher probably said mx plus b. I'm not sure how m came to be the letter we use for slope, but that gets used a lot. I'm using a's, b's, and c's because I'm going to be using this format later in the class, so I'm going to try to stick with a common description for right now. Now, it's tempting to say x is your variable. It's not. As part of what we're
doing, the curve fit, these are numbers. We know what these are. If we don't know what those numbers are, we can't do a curve fit. So these x, y locations, those are data; those are not variables. The act of curve fitting means finding those [a and b]. Yeah, I know, that was terrible. There we go. So for the act of curve fitting, a and b, what will eventually be your constants, those are your design variables. Those are the things you want to identify. If we can identify a and b, we've done a curve fit.

All right, so what do our errors look like? Well, e1 is going to look like y1 minus (a x1 plus b), and we'll square that. Why is this? y1 is that point right there. The vertical distance is y minus the distance to our curve, our function, which is ax plus b, and we know what x is; that's a data point. Now, I've got y1 minus that. I can reverse it if I want to; it doesn't matter, because I'm squaring this. This is one of the few times in your life where you can play kind of fast and loose with minus signs and get away with it. So that's e1.

Well, e2 is going to be the same. And since this is so easy to write down, that starts looking an awful lot like an algorithm. This would be pretty easy to program. It is; it's not hard at all. And so e3 is just going to be... well, let's just write it out for completeness here, just to make sure everybody comes along for the ride. And there's e3.

So add all those up and you've got your objective function. Let's see, I've got room here. You can see that, okay. So what this looks like, see if I can get this right on the first try; I'll write a little smaller so I can fit it all on my board here. Okay, there's your objective function, and my design variables are a and b. If I can find a and b, I've found this fit.

So rather than try to do this analytically, which gets kind of complicated, let's go to some software. I'll start with Excel and then we'll go to MATLAB. Let's make some data points, though. Let's say that x and y are, let's see, we'll
call that (1, 1), (2, 2), and (3, 2). Okay, close enough; we'll use those points. So there's one, two, two. This one is actually lower than that, so let's make that maybe a half. Okay, I want to keep these numbers round if I can. So let's take those data points, which are drawn right there, and let's find a straight line that fits those optimally. Let's find the least-squares curve fit.

Now let's do the curve fit that we just talked about using Excel. So here we go. Let's start by making ourselves some x data; we used one, two, and three. And in the y we had 0.5, 2, and 2. Making a plot out of this is pretty straightforward. We'll just highlight our data, go up to Insert, and grab the scatter plot. I don't need the chart title there, so we'll get rid of that. There are our three points, just like we had on the board earlier. So click there to highlight that data series, right-click, Add Trendline, and the linear curve fit is the default, so I don't even have to pick anything over here. I want to see the equation and display the R-squared. And we really lucked out on this one: the curve fit came out very, very simple. On the board we said y equals ax plus b, and you can see that a is the slope, and that's 0.75, and b is zero. The line actually goes through zero.

Now R-squared, if you haven't seen it before, is a measure of how accurately your curve goes through all the points you've got. If R-squared is 1.000 whatever, it means that the curve is going through all your points. Ideally you want R-squared to be pretty high. An R-squared of zero is terrible; basically you've got random noise. And an R-squared of 1 means your curve goes through every point. So here we are. By the way, if you remember junior high school algebra, your teacher probably said y equals mx plus b. That's just sort of standard terminology that's left over from I don't know when. And again, m, our slope, is 0.75 and b is zero. So that was pretty
easy. Let's try it now in MATLAB, where we'll cast it as an optimization problem.

Now let's do the same curve fit in MATLAB. Like everything else in MATLAB, there are many ways to do this; I'm going to show you two right now. The first is we're going to use fminsearch, which is probably the simplest of the canned optimization functions that are in MATLAB. It doesn't need derivatives or anything. As long as you know what your objective function is, you're good to go. So rather than have you watch me type it in, I'm going to recall a command I typed in before. Okay, so E, capital E, is my total error. That @(c) thing means this is going to be an anonymous function, and I'm going to use the letter c to stand for the variables in that function. I used c because those are the constants we're trying to find. Often you use x or y in here, and I wanted to make sure we didn't confuse the constants we're trying to find with the x's and y's we already have as part of our little collection of data.

So if you remember from the whiteboard, we have y1 minus this whole quantity in here, which is mx plus b. Well, m, we don't know what that is. Or ax plus b, I guess, is what I called it. There it is right there; that's a, the first constant we don't know. x, well, x1 is 1. Plus b; well, I don't know that either, so there it is. And I've squared it. Same thing here: y2, which is 2, and subtract this constant times x2, which is 2, plus b; square that. And the same thing over here: there's y3 and there's x3 right there. So there, I just hit return. I didn't put a semicolon at the end, so it does echo to the screen. The last thing to do is to type in fminsearch, or recall that command, with capital E for the anonymous function we've just defined. I have to give it a starting point. Well, I don't know what to use, so I'm just going to use 0, 0. If I hit return, there it is.

Because these terms all look about the same, it's pretty easy to assemble this error function using a loop or something
like that. This structure lends itself to efficient programming pretty well, so the fact that we're using three data points rather than some larger number doesn't really matter.

Now, there's another way to do this. Let me clear this; in fact, I'll clear the memory. So our workspace just got cleaned out; we don't have anything in there. There's x again, just recalling a command from before, and there's y. So I've got our list of x's and y's now loaded into the workspace. I could plot this if I want, and there it is right there. Well, that's not really how we want it, not what we had on our screen before, so I'll turn the line off, no line, and then I'll use maybe circles for markers. There they are. I can change the x and y range if I want to, but you can see right there, there's our data. It sits in memory, we can plot it, and if we can plot it and the picture comes out right, that's a pretty strong indication that we haven't made any mistakes so far.

But I'm going to turn on something right now called cftool, the curve fit tool. This is a little interactive tool that's built into MATLAB. Let me get it on the screen here. There we go, so you can see what it looks like. This is a little interactive tool, basically an app, that's built into MATLAB. There are several like this; this certainly isn't the only one. You can tell it what you want your x and y data to be. They have to be loaded into the workspace. So there it is, there's x and y loaded in, and it's got our points. But it also automatically did the fit. I have Auto fit checked, so it did the fit already: polynomial of degree one. The degree-one polynomial is a straight line. There's a bunch of other features here you can mess with if you want, but if you go over here, it shows you that we're using a polynomial of degree one, a straight line, and they're calling p1 and p2 the parameters you want to find. We called them a and b on the whiteboard, and we called them c in the example just now where I typed it in.
And what are the coefficients? 0.75 and 0. So there they are. Goodness of fit: I'm not sure what SSE means, I've got to go look that up. R-squared is 0.75. So those numbers agree with what we found using fminsearch and the Excel example. So we've got it. Turn this off. Okay, I don't want to save my session.

So we've now gone over how to do a least-squares curve fit, and done the curve fit two different ways, with Excel and with MATLAB, both using the same objective function. We've shown that it is an example of optimization. I just want to leave you with the reminder that the fact that we used a linear function was just for this example. You can use any function you want. The only rule is you have to have at least as many data points as you have unknown parameters. If you have exactly as many data points as unknown parameters, there is only one solution. It's not really a least-squares curve fit at that point; it's just fitting a curve through points. If you have more data points than you have unknown parameters, then it really does become an optimization problem, and you really are finding a curve with the least squared error. [Music]
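As a rough companion to the captions, here is a minimal Python sketch of the same three-point fit. It is not the MATLAB session from the video. Instead of fminsearch it uses the closed-form solution for a straight line, which is the analytic route the video skips. The names total_error and fit_line are made up for illustration. The sketch also computes SSE, conventionally the sum of squared errors (the objective function E evaluated at the best a and b), and R-squared.

```python
# Data points used in the video: (1, 0.5), (2, 2), (3, 2)
xs = [1.0, 2.0, 3.0]
ys = [0.5, 2.0, 2.0]


def total_error(a, b, xs, ys):
    """Objective function E: sum of squared vertical distances to y = a*x + b.

    The sum over the data plays the role of the "loop" mentioned in the video,
    so the same function works for any number of points.
    """
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Closed-form least-squares line (assumption: swapped in for fminsearch)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Setting dE/da = 0 and dE/db = 0 gives these two expressions.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


a, b = fit_line(xs, ys)

# SSE is the objective function at the optimum; SST is the spread about the mean.
sse = total_error(a, b, xs, ys)
y_mean = sum(ys) / len(ys)
sst = sum((y - y_mean) ** 2 for y in ys)
r_squared = 1.0 - sse / sst

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"SSE = {sse:.4f}, R-squared = {r_squared:.4f}")
```

For these three points the sketch should reproduce the slope of 0.75, intercept of 0, and R-squared of 0.75 reported in the video. The total_error function has the same form as the anonymous function E described in the MATLAB part, so it could also be handed to a derivative-free minimizer instead of being solved in closed form.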
Info
Channel: purdueMET
Views: 2,553
Rating: 4.8297873 out of 5
Keywords: least, squares, curve, fit, fitting, matlab, excel, error, function, minimum, minimize, optimize, optimum, minimization, data, trend
Id: k-Ppr4bHpDw
Length: 18min 42sec (1122 seconds)
Published: Sun Aug 19 2018