What is Regression? | SSE, SSR, SST | R-squared | Errors (ε vs. e)

Captions
Hi, this is Justin Zeltzer for zedstatistics. I've got a little video here on regression, and I'm hoping there will be a series of about five videos on the topic. This first one covers the nuts and bolts, the foundations of regression, so if you've not dealt with regression before, this is a good video to go through. But even if you have, I reckon I come at things from a different angle than you might be used to from lectures and textbooks, and make it a little more intuitive, so you might find there's still some useful stuff in here for you.

If you listen in very closely, you might hear that it's raining pretty heavily here, which is quite applicable in a way, because the little sample I've concocted off the top of my head is bar takings given the temperature for that particular day. We've got nine Friday nights sampled out of June and July, the amount of money the bar made on each of those nights, and the temperature recorded on each of those days. The theory is that the greater the temperature, the more likely people will be to go to the pub after they finish work on a Friday; if it's a particularly crappy day, chilly and overcast, maybe they'll just go home instead. If you look at the scatter plot, with bar takings on the y-axis and temperature on the x-axis, you can see a positive relationship emerging, so our theory about the relationship between takings and temperature seems to be ringing true. But can we be more specific and turn this visual generalization into some cold, hard equation? And, maybe even more importantly, can we assess the strength of that relationship? Those two questions are what regression is all about: generating a relationship between two variables, and then assessing the strength of that relationship.

So, where to start? Maybe from the top. The easiest way to think of a regression is as a line of best fit; you've dealt with drawing lines of best fit since your school days, I imagine. In statistics that line is called y-hat (ŷ), which is a silly name, but we like to put hats on things when they're predictions of true values. The ŷ line is a prediction of y for a given value of x. For example, on a 25-degree day we can use the ŷ line to assess how much we'd expect the bar to make: about $2,500, estimating by sight. If it's 15 degrees, we'd expect about $1,500. So it's our estimate, or prediction, of y for a given value of x, and it has an equation, the sample regression line: ŷ = -353.11 + 123.54x. There's a constant term, the y-intercept, and there's a gradient on x, the coefficient of x, which together define ŷ.
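To make that concrete, here's a minimal Python sketch of the sample regression line. The two coefficients are the ones quoted in the video; the underlying nine data points are never listed, so the line itself is all we have to work with:

```python
# Sample regression line from the video: y-hat = -353.11 + 123.54*x
b0_hat = -353.11   # estimated intercept (constant term)
b1_hat = 123.54    # estimated slope (coefficient on temperature)

def y_hat(temp):
    """Predicted bar takings ($) for a given temperature (deg C)."""
    return b0_hat + b1_hat * temp

print(y_hat(25))   # 2735.39 -- in the ballpark of the ~$2,500 eyeballed by sight
print(y_hat(15))   # 1499.99 -- matching the ~$1,500 by-sight estimate
```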
But how do we find those numbers? Where do they come from? Intuitively, you might think the ŷ line is drawn so that the error terms, the distances from the points to the line itself, are minimized: sum up all those distances, and the line is drawn so that the sum is as small as possible. You'd be correct, but maybe not 100% correct. Consider what happens when there are positive error terms and negative error terms; I've got the positive ones in blue and the negative ones in red. If you sum them all together, you get zero, so that particular summation isn't going to help us find the line of best fit. In fact, there's an infinite number of lines you can draw whose error terms sum to zero. What we have to do instead is square those distances, to get rid of the negativity, so all the negatives become positive. On the next slide you'll notice that the sum of the squared errors is positive, and it's that metric we minimize to create ŷ. So ŷ is precisely defined as the line that minimizes the sum of the squared errors, and there's only one line that does that. It's for that reason we end up with all these sum-of-squares quantities you might have heard of, SSR, SSE and SST: we only have to square things because the negative raw values would otherwise cancel out the positive ones.

Now might be a good time to look at what SSR, SST and SSE are all about, so let's have a more in-depth look at these three quantities. Let's go back to the beginning, and I'll also draw in the y-bar line. What's y-bar? y̅ is the mean value of y, so it's our average bar takings drawn in there. Regression is all about trying to figure out why a particular variable varies. Why is this particular observation so high? Why, on this particular date, did we have $3,200 in our pocket (well, the pub's pocket), while on this other day we only had, what is that, $800 or so, a very small amount? That question is what regression is all about.

You can assess the deviations from y̅ here: this particular observation at the top is much higher than the average, and the question is why. We can separate its distance to the y̅ line into an explained deviation and an unexplained deviation. Here's what I mean. Bring back the ŷ line, and appreciate that the little X I'm about to draw at that point represents the expected, or predicted, value of y for the given value of x. We're using the maximum temperature to refine our estimate for y: given that it was 23 degrees on this particular day, we would have expected a higher-than-average value. The X is where we would have expected the observation to be, so the distance from the X down to the y̅ line is the expected deviation from the mean. The fact that we know it was a particularly hot day for winter means that on that day we'd expect takings to be higher than the mean, and higher by exactly that much; that's the explained deviation from the mean. But there's also an unexplained deviation from the mean: despite the fact that we thought this observation would be higher than average, it was even higher than our expectation, so an unexpected deviation from the mean remains. Out of the total deviation, then, there's an explained component and an unexplained component, and that's where SSR, SSE and SST come in. If you sum up all of the total deviations squared, you get SST, the total sum of squares. If you sum up just the green (explained) bits squared, you get SSR. And if you sum up all of the residuals, or errors, squared, you get SSE. You'll notice that SST, the total deviation, equals SSR plus SSE, which makes sense: the total deviation from the mean splits into the explained and unexplained components. It's maybe a little more complicated than that, but that'll do; I think that's a good little visualization of what SST, SSR and SSE are all about.
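To see the decomposition in code, here's a small sketch. The nine (temperature, takings) pairs below are invented stand-ins for the video's sample, since the actual numbers are only shown on screen, chosen just so the arithmetic can run end to end:

```python
import numpy as np

# Hypothetical stand-ins for the nine Friday-night observations
# (the video's actual numbers aren't given, so these are invented)
temp = np.array([12.0, 14, 15, 17, 18, 20, 22, 23, 25])                      # deg C
takings = np.array([800.0, 1400, 1500, 1700, 2100, 1900, 2600, 3200, 2700])  # $

b1, b0 = np.polyfit(temp, takings, 1)   # OLS: the unique line minimizing SSE
fitted = b0 + b1 * temp                 # y-hat for each observation
y_bar = takings.mean()                  # y-bar, the mean of y

sst = np.sum((takings - y_bar) ** 2)    # total deviations, squared and summed
ssr = np.sum((fitted - y_bar) ** 2)     # explained (regression) component
sse = np.sum((takings - fitted) ** 2)   # unexplained (residual) component

print(sst, ssr + sse)   # identical up to floating-point error: SST = SSR + SSE
print(ssr / sst)        # this ratio is the R-squared discussed next
```

With an intercept in the model, the cross term in the decomposition vanishes, which is why SST = SSR + SSE holds exactly rather than approximately.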
We then jump into this thing called R-squared, which we'll delve into more in the next video, but let's have a quick discussion of it now. Appreciate that, on the next slide, when the observations are scattered quite randomly you get a low R². R² is the proportion of the total variation that is being explained; going back one slide for a second, you can see it's SSR on SST, the proportion of the total sum of squares taken up by SSR. If SSE is quite small, meaning the error terms are quite small, your R² will be quite high, because SSR then makes up pretty much the entirety of the total sum of squares. So an example like the one on the left, with a particularly high sum of squared errors (you can see huge error terms all over the place), will have a low R², while the one on the right has a particularly high R² with a low SSE. A low sum of squared errors gives you a high R², so R² gives you some indication of the fit of the model.

OK, so with SST, SSR, SSE and R² out of the way, let's now talk about error terms, because I know there's a bit of confusion around what the lowercase e and the curly ε are all about. Again, go back to the beginning and appreciate that the line of best fit we drew from our initial sample is an estimate of the true relationship between bar takings and temperature. If we take nine new observations, say the next nine Fridays at the pub (we've gone from August into September now), we get a completely different regression line: the constant term has changed, as has the slope coefficient. What both of these samples are doing is trying to estimate the true effect of temperature on bar takings. This is the assumption that underpins regression itself: there is a true relationship that we can estimate, and it's given by this little equation called the population regression function, y = β₀ + β₁x + ε, with a curly error term ε tacked onto the end. The idea is that we can never know what β₀ and β₁ actually are, but we can estimate them. In this particular sample we have 586 as our estimate of β₀, often termed β₀-hat, and the slope estimate is likewise termed β₁-hat; alternatively, you can write them as lowercase b₀ and b₁. Either way, it's just an estimate of the true relationship. I'm drawing the true relationship as a sort of godly, glowing line for that very reason: you can never know exactly where it is, but the theory is that it exists and we're trying to estimate it.

So what is this curly error term ε, then? It means that every observation has some kind of distance, a theoretical distance, to that population regression function. We can never calculate ε, but it does exist in theory. That's completely different from the original error terms we were dealing with: the lowercase e is the distance to our sample regression line, and we can calculate it, finding each individual error term, which is how we're able to minimize the sum of those error terms squared. We can never know the curly error terms; they exist only in theory. And for a given sample, the sum of the curly error terms isn't necessarily going to be zero, whereas the residuals e from our fitted line do sum to zero by construction. Hopefully that gives you some indication of the difference between the two errors you see coming up in statistics, the curly error term ε and the sample error, lowercase e; the little simulation below illustrates both points.
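Since the population regression function can never be observed, the only honest way to show the distinction in code is to simulate one. In this sketch the "true" β₀, β₁ and the noise level σ are invented purely for illustration (the video never reveals them, which is exactly its point); we draw two nine-Friday samples from the same population line, fit each, and compare the residuals e with the curly errors ε:

```python
import numpy as np

rng = np.random.default_rng(42)

# An invented "true" population regression function -- unknowable in
# practice; these numbers are for illustration only.
beta0, beta1, sigma = 400.0, 110.0, 300.0

def draw_and_fit(n=9):
    """Draw n Fridays from y = beta0 + beta1*x + eps, then fit by least squares."""
    x = rng.uniform(10, 25, n)           # temperatures (deg C)
    eps = rng.normal(0, sigma, n)        # curly errors: exist only in theory
    y = beta0 + beta1 * x + eps
    b1_hat, b0_hat = np.polyfit(x, y, 1)
    e = y - (b0_hat + b1_hat * x)        # residuals: distances to the sample line
    return b0_hat, b1_hat, e, eps

for sample in ("Jun-Jul", "Aug-Sep"):
    b0_hat, b1_hat, e, eps = draw_and_fit()
    # Different b0_hat, b1_hat each sample, both estimating the same beta0, beta1;
    # residuals e sum to ~0 by construction, curly errors eps generally don't.
    print(sample, round(b0_hat, 2), round(b1_hat, 2),
          round(e.sum(), 6), round(eps.sum(), 2))
```

Note that least squares with an intercept forces the residuals to sum to exactly zero, which is precisely why the raw sum of errors could never pick out a unique line back at the start.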
So that's it; that pretty much concludes this first video. Hopefully you found it relatively informative. The next video is going to be quite a good one, on degrees of freedom, which I know people have lots of problems with, so stay tuned for that; it'll be a goodie. Thanks for watching.
Info
Channel: zedstatistics
Views: 1,088,675
Rating: 4.9082603 out of 5
Keywords: What is regression, R squared, SSE, SSR, SST, Statistics, Error terms., zedstatistics, zstatistics, justin zeltzer, zeltzer
Id: aq8VU5KLmkY
Length: 15min 0sec (900 seconds)
Published: Tue Nov 22 2011