Excel Walkthrough 4 - Reading Regression Output

Video Statistics and Information

Captions
Hi. Today I wanted to go over multiple regression output a little bit. The data I'm going to be using for this I took from Chapter 15, Question 5 in the book. It's MLB data looking at National League teams, with winning proportions and home runs. Now, I've walked through this example in class, and I'm going to record a video at some point that walks through this example bit by bit on how to do the problem. But for now I just wanted to walk through the output, to try to make us more comfortable with reading and interpreting multiple regression output.

OK, so the way that we generate this output, if you recall, is we go over to our Data tab and we click Data Analysis. Once the Data Analysis window pops up, we select Regression and click OK, and then we input our ranges. We have our dependent variable; the range where that data is goes in the top box. The bottom box contains our explanatory variables, which can be one, just home runs or ERA, or we can include the full array of explanatory variables. If we do that, we generate the output that I've produced here already, so I'm going to cancel out of here. In a later example I'll show how I got this data; I think I showed you in class.

In any case, when we do that, what we get is this right here. We have a table titled Summary Output at the top. It's a very useful table. I mean, the reason that this exists as a technology is because it does a really good job of summarizing the things that we care about. What do we care about in particular? Well, right now the most important thing is probably our estimates of the relationship, although the F test matters a lot too.

In any case, I'll just walk through what we have here. This says ANOVA right here. That just means that this section analyzes the variance, but it gives us a bunch of different information. We have three rows, regression, residual, and total, and then we have three sets of
degrees of freedom. This is a simple linear regression, so we have one degree of freedom in the regression. This df right here is always equal to k; that's how many regressors we have. If you go down here you can see that we have one: we have an intercept, and then we have home runs. In this case the dependent variable, I'm sorry, I should have said this, the dependent variable is proportion of games won. So we have one regressor, so k equals 1, with 1 degree of freedom in the regression.

Up here we have our number of observations, which gives us n. We had 16 observations, and if you want to check this you can go over to the data tab and see that indeed we have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 teams. 16 observations, n equals 16. When we run a regression, our total degrees of freedom is that minus 1, so this n - 1 goes here, and then the residual right here is n - k - 1, and that's 14.

The rest of this box contains more specific information. Remember, the purpose of regression is to give us information about the variability in Y. We want to know why all these teams don't have the same winning percentage. Another way to state that question is: why is there variation? Why aren't they all just the same? Well, the argument we're making when we do regression is that the variation in Y exists because there's variation in X. There's also some error, but we're trying to argue that the variation in X is what's driving this result.

So what we're trying to explain is the total variability. The way we measure the total variability is this number right here; that's the SST, the sum of squares total, the total sum of squares. We hope to capture a lot of it with our sum of squares due to regression right here. This is our SSR; this is the amount of the sum of squares that is explained by
our regression. The amount that's left over is the sum of squares due to residual, and that's right here; this is our SSE. So that's the information that we have here.

Now, if we want to know if our regression does a good job, one way to do that is to take this information and use it to tell us how much of that variability we captured. To do that we would want to run an F test, and we have our F statistic right here, which it calculates for us. This is our F. What that is, if you remember, is the MSR, which is right here, over the MSE. This is getting kind of messy; you can see that this little area contains a lot of information. So we have our F test statistic, and the degrees of freedom come from over here: we have 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator. If you look that up on your F table, you'll see that the significance of our total regression is about 0.13, which is just not terribly significant. But that's what our ANOVA table tells us. That's one regression, but our ANOVA table tells us a lot of information about our regression and about the significance of the regression. That's often where I go first. You can see also that it doesn't explain a lot of the variation. Our R squared is up here, and we are explaining about 15% of the variability in Y with our variability in X.

There's another simple linear regression over here on the ERA tab. I was on the HR tab, the home runs tab; now we're looking at how winning percentage is related to ERA. ERA is earned run average; it's the number of earned runs a team allows per nine innings. In any case, if you plug in the data, this is what you get. Again we have a simple linear regression, just one regressor, k equals 1, because we have one regressor other than the intercept. Down here you'll see that the sum of squares total is the same, 0.073469. That should be true because it's the same Y; we're looking
at the same variation in winning percentage, and so that's the variation we're trying to explain. This regression does a better job. Our sum of squares due to regression is higher, meaning that our variability in X does a better job of explaining the variation in Y; they move together. So again we have our SSR, and all the rest of the stuff is the same. Now we have an F statistic right here with a significance that's much, much stronger. This regression does a very significant job of explaining the variation; they move together. And you can see that our R squared right here is 0.50. That's high: about half the variation you can explain with just ERA.

There's also our adjusted R squared; right now we only have one regressor, so it doesn't really matter that much. Other stuff that's useful: these coefficients down here are kind of important. This intercept is the baseline. If a team had a 0.00 ERA, all right, it's an out-of-sample prediction, but that would be the predicted winning percentage if your ERA is equal to zero. And then with every additional point of ERA, that winning percentage drops by about 0.08. So you can use these right here as your intercept and coefficient in the estimated equation; this is what we're really trying to estimate. b0 is 0.864, so we could write it like so, and then b1 is negative 0.084, and we're going to multiply that by x1, which in this case is our ERA. That's the estimated equation. I apologize for the handwriting; I'm getting better, but it's going to take some doing.

Is this a significant relationship? Well, we have information here. We have our b1, and this right here is our s_b, the standard error on our coefficient. And we have a t statistic, which is just b over s_b: negative 0.08367 divided by 0.02223 gives you
negative 3.76. Now, this has n - k - 1 degrees of freedom in the t distribution, and if you look it up on your t table, you'll see that a two-sided, two-tailed test would give us a p-value of 0.002095. For simple linear regression this is going to be the same p-value as the F test, unless you're rounding. If you're not rounding, if you're using a computer to do it, those should be the same. That p-value tells us about the strength of the relationship.

All right, now we can look at multiple linear regression. Let me scroll down just a wee bit; we don't need that top row. This right here is the full table. We now have two regressors, home runs and ERA, and so our degrees of freedom in the regression is 2; we're using two of our degrees of freedom to run the regression. You can see that our R squared before was 0.50; now it's 0.858. That's good; we're explaining 86 percent of the variation. So ERA and home runs do a pretty good job. Not all of it; there are better measures out there in baseball, but that's pretty good.

Is this regression significant overall? Well, we look at our F test, and you see that the significance is 3.05E-06. The way we read that is we write 3.05 and then we move the decimal point back six spaces, because that negative six is there. So 1, 2, 3, 4, 5, 6, that's the decimal point, and we fill in zeros here. That's pretty improbable: 0.00000305. That's our p-value; that's what that's saying. It's the p-value on the F test. So yes, this is very highly significant.

Other than that, now we see we have a positive relationship between home runs and winning percentage and a negative relationship between ERA and wins. And the p-values here, where's my pen, there we go, these p-values are all really small. Everything's significant. So yeah, so
that's how you read a regression table. We'll do more practice, and everybody will get good at this; we'll take our time. But yeah, it's good to take a look at this, and I'll save this spreadsheet so you guys can take a look. Thanks, bye.
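
The arithmetic behind the table can be rechecked outside Excel. The following is a minimal Python sketch, not part of the video: the function names are made up for illustration, and the only inputs are numbers quoted in the walkthrough (n = 16, the ERA coefficients, the standard error, and the Significance F value). The SSR and SSE used in the F example are an assumption, rebuilt from the quoted SST and the rounded R squared of 0.50, so that result is approximate.

```python
# Sketch only: re-derives a few numbers quoted in the walkthrough.
# Function names are hypothetical; nothing here comes from Excel itself.

def degrees_of_freedom(n, k):
    """Return (regression, residual, total) degrees of freedom."""
    return k, n - k - 1, n - 1

def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE, with MSR = SSR / k and MSE = SSE / (n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

def t_statistic(b, s_b):
    """t = coefficient divided by its standard error."""
    return b / s_b

def predict_win_proportion(era, b0=0.864, b1=-0.08367):
    """Estimated equation from the ERA regression: y-hat = b0 + b1 * ERA."""
    return b0 + b1 * era

n, k = 16, 1
print(degrees_of_freedom(n, k))                   # (1, 14, 15)
print(round(t_statistic(-0.08367, 0.02223), 2))   # -3.76

# Assumption: R squared rounded to 0.50, so SSR and SSE each take
# half of the quoted SST. The true table values will differ slightly.
sst = 0.073469
ssr = 0.50 * sst
sse = sst - ssr
print(f_statistic(ssr, sse, n, k))                # roughly 14

# Reading scientific notation: 3.05E-06 written out in full.
print(f"{3.05e-06:.8f}")                          # 0.00000305
```

With one regressor the F statistic is the square of the t statistic, which is why the rough F of about 14 sits close to 3.76 squared and why the two p-values agree.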
Info
Channel: Jason Delaney
Views: 339,088
Rating: 4.0515208 out of 5
Keywords: Business Statistics, Regression, lecture, notes, Scatter plot, math, regression model, statistics, Microsoft Excel (Software), R squared, coefficient of determination
Id: Ut22-WLvEVw
Length: 11min 26sec (686 seconds)
Published: Thu May 24 2012