SPSS for newbies: Interpreting the basic output of a multiple linear regression model

Captions
Hi everyone, Phil from StatisticsMentor.com. In this video we're going to talk about the absolute basic output of running a multiple linear regression model. As an example, we've got a dependent variable, earnings, and for the independent variables we've got experience in work, say measured in years, and schooling, the number of years of schooling. To keep things simple, let's say earnings is measured in thousands of dollars. So the units are: earnings in thousands of dollars, experience in years, schooling in years.

We wish to regress earnings on experience and schooling, which means taking experience and schooling as the IVs and earnings as the DV. So in short, the jargon is to say we will regress earnings on experience and schooling.

Okay, to do it, go to Analyze, Regression, Linear. Transfer earnings into the dependent variable box, and the two IVs into the independent variable box. Notice there are some options, statistics, and plots available to us. We'll ignore those and click OK.

Okay. The first box doesn't interest us at all; it just tells us which IVs are in the model.

Right, Model Summary. This box contains the R square. Now, for multiple regression we wish to report and look at the adjusted R square rather than the R square. Recall that the R square, or adjusted R square, measures the proportion of the total variability in the DV that is explained by the IVs, that is, by the model. So here, since the adjusted R square is about 0.189, we can report, converting into percentages, that about 19% of the total variability in earnings is explained by the model. Or, if they've seen the model, we can say that 19% of the total variability in earnings is explained by experience and schooling.

Okay, now that doesn't mean the R square is not useful here. If there's a big discrepancy between the R square and the adjusted R square, it suggests that some of the IVs we included in the regression model are redundant. But it's not really a good way; there's a more precise way of checking
whether that's the case, whether we've included more IVs than we need to, as we shall see shortly.

Then we'll come on to the two tests in the following two tables. In the ANOVA table, the key thing is the F statistic, the F test. How we get this figure here, 5.537, is by using these three columns. So these three columns are a means to an end: we're not interested in those figures, we're interested in these two final figures, which are obtained from the first three.

Now, for any test, we as applied statisticians just need to know what the null and the alternative are. The null hypothesis for this F test in the ANOVA of the regression is always that the model has no explanatory power, which is the same as saying that all the coefficients on the IVs are zero. That's the same as saying that none of the IVs help to predict the DV; in other words, the model is useless. So do we reject the null or not? Well, look at the p-value, which is labeled Sig., significance: .008. That's way less than 0.05; indeed, it's even less than 0.01. So we conclude there's very strong evidence to reject the null that the model has no explanatory power.

Okay, next, Coefficients. This is the most interesting of all, because it tells us about the relationship between the IVs and the DV. First of all, we'll look at each of these rows; this is the corresponding t statistic. We're interested in the ones for the IVs. For experience, the t statistic is 2.4[unclear], significance 0.018. Now, what is the null for the t stat here? By default, the null for the t stat in the regression is that the coefficient for the IV is zero. That's the same as saying that that particular IV doesn't help predict the DV. So is experience significant? Yes, we reject the null, because the p-value for this t is 0.018, which is less than 0.05. For schooling, t is 2.967,
significance .005, which is less than 0.05. If you're unsure, just get out your calculator: 0.05 minus 0.005 is positive. So they are both significant; in other words, they each have predictive ability for the DV.

Next we move on to the coefficients. Oh, go to hell. I hate that every time you click, it's always got this yellow box flashing up, "double click to activate". Get rid of that. Okay, there it is again. Right, now where was I?

Right, coefficients. The coefficients we're interested in are the unstandardized rather than the standardized coefficients. Before we can interpret those coefficients, we need to check two things. First of all, does the sign make sense? Is it positive or negative, contrary to what theory suggests, in this case education theory? For experience on earnings, you'd expect that the more experience you have, the higher the earnings you should get, so you would expect a positive coefficient. Indeed we do get one, because it's positive, 1.85. For schooling, you'd expect that the more years of schooling you have, the higher the earnings, so that should work out to be positive. There you go: positive, 2.768.

Next comes the interpretation of the coefficients. The interpretation for experience goes like... well, let's step back a bit. In general, the interpretation of the coefficient on an IV in multiple regression goes like this: for a one-unit increase in the IV, the model predicts that the DV will increase or decrease, depending on the sign of the coefficient, by blah blah blah units, holding all other IVs constant. I'll say that again because it's important, said in another way: the model predicts that for a one-unit increase in that IV, the DV will increase or decrease by blah blah blah units, holding all other IVs fixed.

Let's do that now for this example. For experience the coefficient is 1.854. Recall that experience is measured in years and earnings is measured in thousands of dollars. So we would say the model predicts
that for a one-year increase in work experience, earnings will increase by $1,854, holding years of schooling fixed. Why I said 1,854 is because I could have said it's 1.854 thousand dollars; so instead of saying 1.854 thousand dollars, I converted it: just times that by a thousand.

Okay, next one, years of schooling. The coefficient is plus 2.768. We would say that the model predicts that for an additional year of schooling, earnings will increase by $2,768, holding years of experience fixed. And that is the interpretation of the unstandardized coefficients.

For the standardized coefficients, which I hardly ever use but some of you may need to report, everything is the same except the units. When I said a one-unit increase in the IV leads to a blah blah blah increase or decrease in the DV by so many units, the units are now standard deviations. So for experience, we would say that for a one standard deviation increase in experience, the model predicts that earnings will increase by 0.358 standard deviations. It just gives you an idea of the sensitivity of the DV to changes in the IV. But in terms of real-life interpretation, talking about standard deviations doesn't mean much to people. So for that reason, this is what we... okay.

Now, if I can just review quickly. In the Model Summary, we're interested in reporting the adjusted R square. In the ANOVA, we're interested in reporting the F. Now, if it fails the F, there is no point continuing to look at the coefficients, because if it fails the F it means that the model is useless. If it passes the F, we move on to this one, and then we look at the t's. Now, if these t's are significant, we look at these coefficients and check that the signs make sense from a theory viewpoint. If they don't make sense from the viewpoint of your theory, it might mean that you haven't included some key
IVs, so you might be missing some variables. If the signs of those coefficients match what you expect from theory... and by theory I don't mean [unclear], I mean the theory of the application of the data. For example, here we're looking at education data, so we draw on education theory. If you're doing something from psychology, you'd appeal to psychology theory for what you expect the signs to be, and likewise for economics and so on. So once these signs do make sense, then we can use them for interpretation of the coefficients.

Right, now one final thing to say, and it's important as well, is that these test statistics, the F and the t, are only valid if certain conditions hold for the model. This is where you do a residual analysis, or what they call diagnostic checking. So in reading off the F and t, I've assumed that certain things about the model hold, so that these things are valid.

Okay, so that concludes the very, very basic introduction to multiple regression.
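
The reading order described in the video (adjusted R square, then the F test, then each t test, then the sign check, then the interpretation) can be sketched in a few lines of Python. This is only an illustration, not part of SPSS or of the video: the function name `read_output`, the dictionary layout, and the 0.05 cut-off as a parameter are assumptions made for the example. The numbers are the ones quoted in the video, and the expected signs are the ones education theory suggests there.

```python
# Minimal sketch of the reading order from the video.
# All names here (read_output, coefficients, ALPHA) are hypothetical.

ALPHA = 0.05  # significance level used in the video

adjusted_r2 = 0.189   # Model Summary: adjusted R square
f_p_value = 0.008     # ANOVA: Sig. of the F test

# name: (unstandardized B in thousands of dollars per year,
#        Sig. of the t test, sign expected from theory)
coefficients = {
    "experience": (1.854, 0.018, +1),
    "schooling": (2.768, 0.005, +1),
}


def read_output(adjusted_r2, f_p_value, coefficients, alpha=ALPHA):
    """Return plain-English statements in the order the video reads the tables."""
    lines = [
        f"About {adjusted_r2:.0%} of the total variability in earnings "
        "is explained by the model."
    ]

    # Step 1: the F test. If it fails, stop; the coefficients are not read.
    if f_p_value >= alpha:
        lines.append("F test not significant: the model has no explanatory power.")
        return lines
    lines.append("F test significant: the model has explanatory power.")

    # Step 2: each t test, then the sign check, then the interpretation.
    for name, (b, p, expected_sign) in coefficients.items():
        if p >= alpha:
            lines.append(f"{name}: not significant at the {alpha} level.")
            continue
        if (b > 0) != (expected_sign > 0):
            lines.append(f"{name}: sign contradicts theory; a key IV may be missing.")
            continue
        dollars = round(abs(b) * 1000)  # thousands of dollars -> dollars
        direction = "increase" if b > 0 else "decrease"
        lines.append(
            f"{name}: for one additional year, the model predicts earnings "
            f"{direction} by ${dollars:,}, holding the other IVs fixed."
        )
    return lines


report = read_output(adjusted_r2, f_p_value, coefficients)
for line in report:
    print(line)
```

Returning early when the F test fails mirrors the point made in the review: if the model fails the F, there is no point continuing to the coefficients. The sketch does not cover the residual analysis that the F and t tests rely on.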
Info
Channel: Phil Chan
Views: 451,454
Rating: 4.7129278 out of 5
Keywords: spss
Id: AMqB0K5Tt6M
Length: 12min 51sec (771 seconds)
Published: Tue Sep 04 2012