Econometrics // Lecture 3: OLS and Goodness-Of-Fit (R-Squared)

Video Statistics and Information

Captions
Hi everybody, this is Chris from Keynes Academy. In the last lecture we talked about the simple linear regression and introduced some of the methods surrounding it. If you'd like to go back to that video at any point, just click the top-left arrow and it will take you there; likewise, if you'd like to skip ahead, the bottom-right arrow will take you directly to the next video. In this lecture we're going to talk about OLS, the ordinary least squares method, as well as the goodness-of-fit measure R-squared.

Over here I have a diagram that I've already drawn out, and we're just going to review some things to make them concrete. X is our independent variable, Y is our dependent variable, and what we're trying to do when we run a regression is measure how a change in X affects Y: when we change X, what happens to Y? The purple dots are the observed data points, real-life data we see in the real world, and the green line is our estimated regression, our estimated relationship between X and Y. We can define this line as y-hat, where y-hat equals a constant, our intercept beta-0 hat (this vertical distance here), plus our slope parameter beta-1 hat (the slope of the line) times X.

So let's start with what OLS is. OLS is a method for estimating this y-hat regression: it places the line in the most appropriate way and defines it as our best guess for the relationship between X and Y. It does that by minimizing the sum of squared residuals. We talked about all of this in the last video, and I'll quickly define the sum of squared residuals again: it is exactly what it sounds like, the sum of the squared vertical distances from each observed point to the line.

Now let's talk about the properties of OLS. We know that by minimizing the SSR we find the most appropriate regression for our given data, and the essential properties that come with this are as follows. First, the sum of the residuals must equal zero. What does this mean? We've already defined a residual as the true value of Y minus the estimated value of Y at a given X: the Y of the real point minus the Y of the estimated point. Interpreted this way, if a point lies below our estimated line its residual is negative, and if it lies above the line its residual is positive. When we sum up all of the residuals, we want them to equal zero, because that implies the true points are weighted equally above and below the line. The second property, which is very important for OLS, is that the sum of the residuals multiplied by the X values equals zero. X is our independent variable, and property two is essentially saying that the independent variable is not correlated with the residual. This is very important because, if it holds, it tells us there is nothing left in the residual that explains X; if we didn't have this property, we wouldn't be able to properly use this regression as the relationship strictly between X and Y.
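The lecture works entirely on the whiteboard, but a short numerical sketch may help make these two properties concrete. The Python example below is not from the video: it uses made-up data and the standard closed-form OLS formulas for the simple regression, then checks that the residuals sum to zero and are uncorrelated with X.

```python
# A minimal illustrative sketch (not from the video): estimate the simple regression
# y_hat = b0_hat + b1_hat * x on made-up data with the closed-form OLS formulas,
# then check the first two properties of the OLS residuals discussed above.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                # independent variable (hypothetical data)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)  # dependent variable with some noise

# Closed-form OLS estimates for the simple linear regression
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

y_hat = b0_hat + b1_hat * x   # estimated (fitted) values on the regression line
u_hat = y - y_hat             # residuals: true y minus estimated y

# Property 1: the residuals sum to zero
print(np.isclose(np.sum(u_hat), 0.0))      # True
# Property 2: the residuals are uncorrelated with x (sum of x * residual is zero)
print(np.isclose(np.sum(x * u_hat), 0.0))  # True
```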
The third property is that the mean of Y, Y-bar, equals beta-0 hat plus beta-1 hat times the mean of X, X-bar. This one might look a little more intimidating at first, but it's essentially very simple: it means that somewhere on our estimated regression line lies the point made up of the mean of the X values and the mean of the Y values, so X-bar and Y-bar sit on this line.

When we minimize the sum of squared residuals, whether with the OLS method or any other method, we want to know how well we have estimated the true model, and the way we do this is through goodness-of-fit measures. There's one in particular we're going to talk about in this video, the R-squared, and before we define it we have to define some other measures of variation. We already have the SSR, the sum of squared deviations from the estimated line; a low SSR means the points sit very tight to the line. We define something similar called the SSE, the explained sum of squares, which is the sum of the squared differences between the estimated Y values and the mean of Y. The SSE tells us how much of the variation in the Y values is explained by our regression. We need to define one more before we can talk about the R-squared: the SST, or total sum of squares, which is the sum of the squared differences between the true Y values and the mean of Y. Don't get these two confused: the SSE uses the estimated value of Y minus the mean, while the SST uses the true value of Y at each given X minus the mean. So the SST captures all of the variation in Y in the real world, and the SSE captures how much of that variation we have explained with our model.

Now that we have these three defined, we can define our R-squared, our goodness of fit: how well does this estimated regression fit the true relationship between our independent variable X and our dependent variable Y? R-squared is defined as SSE, our explained sum of squares, divided by SST, and essentially this makes sense: we take how much of the variation in Y we have explained using our model and divide it by the total variation in Y. Intuitively we can also derive this as 1 minus SSR divided by SST, so we take 1 and subtract what we haven't explained as a share of the total variation; R-squared will always be between 0 and 1. Both expressions, SSE over SST and 1 minus SSR over SST, are equivalent measures; they're equal to the exact same thing. But we have to be very cautious: we've talked about causal relationships, and the R-squared does not tell us whether our regression is causal or not. It simply tells us how much we have explained using our regression, so be very careful and do not confuse a high R-squared with a causal relationship.
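Continuing the illustrative sketch above (it assumes x, y, y_hat, b0_hat, and b1_hat from that block), here is one way the three sums of squares and the two equivalent R-squared formulas from the lecture could be computed; again, this is an assumed example, not material from the video.

```python
# Continues the sketch above (assumes x, y, y_hat, b0_hat, b1_hat from that block).
SSR = np.sum((y - y_hat) ** 2)         # sum of squared residuals: what the model leaves unexplained
SSE = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares: variation captured by the model
SST = np.sum((y - y.mean()) ** 2)      # total sum of squares: all variation in y

# Property 3: the point (x-bar, y-bar) lies on the estimated regression line
print(np.isclose(y.mean(), b0_hat + b1_hat * x.mean()))  # True

r_squared = SSE / SST
r_squared_alt = 1.0 - SSR / SST
print(np.isclose(r_squared, r_squared_alt))  # True: both definitions give the same number
```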
The last thing we're going to talk about is how to interpret the R-squared, and we're just going to go over this quickly. If R-squared equals 1, that means every single point in the real world lies exactly on our estimated regression line, and if it equals zero, that means there is absolutely no correlation between our X and our Y captured by our model. So the higher your R-squared, the better your goodness of fit, and the lower your R-squared, the poorer your goodness of fit. That's all for our lecture today. I encourage you to comment if anything isn't completely clear, or to comment with feedback; we would love to hear your opinion. This is Chris, and we hope to see you soon.
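To illustrate these two extremes, here is a small self-contained Python sketch, again with made-up data rather than anything from the video: one series is perfectly linear in X, so its R-squared is essentially 1, while the other is unrelated noise, so its R-squared comes out low.

```python
# Self-contained illustration (made-up data, not from the video) of the two extremes.
import numpy as np

def r_squared(x, y):
    """Fit a simple OLS regression of y on x and return R-squared = SSE / SST."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    return np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

x = np.arange(1.0, 21.0)
print(r_squared(x, 3.0 + 2.0 * x))                             # perfectly linear data: essentially 1.0
print(r_squared(x, np.random.default_rng(1).normal(size=20)))  # unrelated noise: a low R-squared
```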
Info
Channel: KeynesAcademy
Views: 260,897
Rating: 4.8552279 out of 5
Keywords: Ordinary Least Squares, Goodness Of Fit, Econometrics (Field Of Study), regression, lecture, R-Squared, Tutorial, intro, introduction, dummies, time series, academy, 421, project, assumptions, properties, basics, fundamentals, concepts, course, class, estimators, economics, beginners, function, keynes, model, OLS, Overview, data, analysis, error, residual, simple, theory, test, quiz, video, tutor, help, learn, lesson, teach, teacher, professor, guide, explain, explanation, easy, causality, subject, slope, intercept, coefficient, tips
Id: 8tAPsX0YuNE
Length: 12min 15sec (735 seconds)
Published: Sat Aug 03 2013