Hypothesis Testing Part 1: Interpreting t-statistics from OLS Regression in Stata

Captions

[Applause] [Music]

Welcome back. Today I wanted to walk through some introductory econometrics issues in hypothesis testing: basically, how to read the information that you get from your standard OLS regression output. I've got Stata open here, so I thought we'd call up some data, get an example regression, and then walk through exactly what we're looking at.

The data set I'm gonna call up here is from the Wooldridge textbooks, which you can load with the bcuse command. We're gonna call up the CEO salary data set, so the command is bcuse and the name of the file is ceosal1. We've got two hundred some-odd observations of salaries of different executives, along with individual and firm-level characteristics. This is purely for example. So let's run a regression with salary as our dependent variable, and then have sales, return on equity, return on sales, and then we've got these various industry indicators, so let's put in the utility dummy variable, equal to one if the executive is running a utility firm. So we've got our regression results: a bunch of coefficients, standard errors, t-statistics. That's what we want to take a look at today.

Just a quick refresher. When we have a regression model, the point of hypothesis testing is to be able to make statistical inference from our single-sample estimates onto the underlying true population coefficients. Ideally we want to be able to say, with some degree of confidence or significance, that the true coefficient is not 0, based on the evidence that we see from our single sample. A good way to think about this is that we're going to be, at least initially, evaluating each coefficient estimate based on a two-sided alternative hypothesis test, where the null hypothesis is that the true value is zero. Again, we want to see if we can reject that null at a reasonable level of significance or confidence, and our method of evaluation is going to be that t-statistic. The purpose
of this discussion is not to go into the underlying statistical theory and the proofs of exactly why this works and, more importantly, when it doesn't work. But just so you know, the assumption that we're working with here is that we are in the Gauss-Markov world of classical assumptions, so the ratio that we see here, this t-statistic, the ratio of the coefficient estimate, the b1 hat, to its standard error, will follow an underlying t distribution. So we're assuming constant error variance, no autocorrelation, proper specification, and at least large-sample normality of the error terms.

When we look at our regression results, that of course is where these t-statistics come from: the ratio of the coefficient estimate to the standard error. In order to evaluate what that t-statistic means, we have to compare it to the appropriate critical value. That's the point at which we can say we are enough standard errors away from zero in our estimate that we no longer believe that the true value is 0, that the true distribution is centered at zero.

So our choice now is to set our required level of confidence, right? That's going to be the alpha, or the significance, of our test. By convention we basically have three choices that we typically look at: 90% confidence, or an alpha of 10%; 95% confidence, an alpha of 5%; or 99% confidence, where the alpha would be 1%. Since this is a two-sided test, we're going to be drawing our conclusion based on the absolute value of the t-statistic. In other words, we could be different from 0 in either direction, below 0 or above 0. So if the absolute value of our t-statistic surpasses the critical value, we reject that null of 0 and say no, we don't believe that the true value is 0; we don't believe that that distribution is centered at 0.

So the picture would look something like this: once we look up the appropriate critical value, we have split up the possible outcomes into the do-not-reject region, close
to 0, and then, if we're far enough away from 0 in either direction, we're in the rejection region. We think of the alpha, or the significance, in terms of the area under that curve. So if, for example, we're looking for 90% confidence, where we have an alpha, or significance, of 10%, that middle area, the 1 minus alpha, is going to be 90%, and we'll have alpha over 2, or 5%, in each tail.

So let's go back and look at our example in Stata here. We've estimated the model and we've seen how we've calculated the t-statistic; now we've got to find the appropriate critical value and draw a conclusion for each of our coefficients. The first step is to figure out what the appropriate critical value is. Typically you are recommended to go to the back of your statistics book or your econometrics book and look at the critical t table. It might look something like this, where we have a column of degrees of freedom, so that's again our n minus k minus 1: sample size, minus number of slope coefficients estimated, minus one for the intercept. The rest of the columns are made up of different levels of significance. In our case here our n is 209 and our k is 4, so we have an n minus k minus 1 of 204. Our table doesn't have an entry for 204; in fact we are down here at the bottom, somewhere in between 120 and the asymptotic normal distribution, the bottom row in the table. So it looks like we're somewhere between those values, 1.645 and 1.658. We could interpolate between the two and mathematically figure out exactly where we lie, but it turns out in Stata we can look up the exact critical value.

So we're gonna generate this variable that's gonna give us the exact appropriate critical value. In Stata we use the generate command, and we'll call this, say, the critical 10 two-sided; let's make up a name so that we'll be able to understand what it is. And the function that's gonna give us this result, given the area under
that t distribution in the tails, at a certain degrees of freedom and a certain level of significance, is invttail. So it's the inverse of the t distribution. As you can imagine, the associated function, ttail, if given a critical value or a t value, will give you the area. We'll use that in part two of the sequence a little bit later. But all we have to do is input the degrees of freedom, the n minus k minus 1, which in our example is the 204, and then how much area is in the tail. So even though this is a two-sided 10% test, we're gonna put in only 5%, so 0.05; that's how much area is in each tail of the distribution. We hit enter and we have a new variable up here, and it's gonna be the same value in each observation, so we can just call up the summary statistics and get the mean of that variable. And here it is, 1.652, and there it is, right in the middle there. So that's the exact correct critical value for our example.

Now that we have our critical value, let's go ahead and see how we can draw our conclusion. Take the utility dummy variable, for example, and that's a good choice in general for a two-sided test: you don't really have a preconceived notion of whether that's going to be a positive or a negative coefficient, we just want to know, is it different from zero? So we have that t-statistic: the ratio of the coefficient, negative 525, to the standard error, the 263, gives us that value of negative 1.99. So at 10% significance, 90% confidence, we can reject the null that that coefficient is zero. Our calculated t surpasses the critical value in absolute terms, so our coefficient is statistically significant in this case. That's the two-sided test, which again, from the point of view of just deciding whether or not a variable should be included in the model, is always going to be a good start. The picture that comes along with
that, in this case: again we've got that t distribution, we've got our two rejection regions on the right-hand side and the left-hand side, and the cutoff points are the negative and positive versions of the critical value, so 5% here and 5% there. We see our calculated t here, at negative 1.99, surpasses that left-hand critical value. So that's the picture of a significant coefficient, a rejection of that null of zero.

The last thing to think about here is: what if we have a preconceived theoretical expectation of what our coefficient should be, and we want to conduct a one-sided test? Say we don't think it's gonna be negative, or we don't think it's gonna be positive, and we want to focus on just one side of that distribution. For the most part this is gonna be the more useful way to approach hypothesis testing for individual coefficients. Another way to think about it is, regardless of expectations, if your model estimates a coefficient that is negative, we want to know, is it significantly negative? If you get a positive coefficient, is it significantly positive? So again, it's just allowing us to focus in on one side of that distribution.

Here the alternative is gonna take on the sign that we expect to see. So if we have, say, a one-sided alternative with a positive coefficient, where we expect it to be positive, that is the alternative hypothesis, and the null will take on the other possibilities: 0 or negative. The process is exactly the same, except now when we look up our critical value, we're only going to be concerned about the area in the tail on one side of that distribution. So the meaning of 10% or 5% is going to be a little bit different: we don't have to cut that significance level in two to get those critical points. So the picture, again for a positive alternative hypothesis and a one-sided test: the alpha is all going to be the area under the tail here on the right-hand side, and the 1 minus alpha is going to be everywhere else, including the
left-hand tail. So here, for an example at 10% significance, all 10% is in that right-hand-side rejection region.

So back to our example. Say we're looking at the coefficient on sales, and again we would expect that to be positive: a firm that has better performance, better sales, we would expect their CEO to be compensated for that. In fact we do get a positive coefficient, at 0.012, but is it significantly positive? We see the calculated t-statistic: 0.0126 over the standard error for the sales coefficient, 0.009, gives us that ratio of 1.4. So now again we've got to find the appropriate critical value. As we saw before, the table isn't going to be super helpful, because we're in between rows in that critical table, so we can go back to Stata and again use that invttail function to call up the exact critical value. Let's generate a new variable, and let's do this again at 10%, so critical 10 one-sided equals invttail, and our degrees of freedom are still 204. Now, because this is the area under the tail, we're gonna put in 10% instead of the 0.05, which was the 10% split up into the two tails that we saw before. So let's summarize that new variable, and it is in fact 1.286 that is our critical value. Clearly our calculated value of 1.4 surpasses the critical value, so we can, at 10% significance, 90% confidence, reject the null that our coefficient is zero or negative.

It's going to be exactly that same process, coefficient by coefficient. It's up to you to decide the significance level, the confidence level, that you need, but once you make that decision, look up the appropriate critical value and you'll be able to make that conclusion. In the next installment we'll look at interpreting p-values, and then we'll also look at interpreting confidence intervals and plotting out coefficients and confidence intervals. I hope that was helpful. See you next
time. [Music]
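
The two critical values and the two reject decisions described in the captions above can be cross-checked outside Stata. The following is only a sketch in plain Python (standard library only), not the method used in the video: the helper names `t_pdf`, `upper_tail_area`, and `critical_value` are hypothetical, and the inverse tail function is approximated with Simpson's rule plus bisection rather than Stata's own `invttail` routine. The coefficient and standard-error figures are the rounded ones quoted in the walkthrough.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail_area(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def critical_value(df, tail_area):
    """Find t with P(T > t) = tail_area by bisection (assumes tail_area < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > tail_area:
            lo = mid  # too much area left in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2


# Degrees of freedom from the walkthrough: n - k - 1 = 209 - 4 - 1
df = 209 - 4 - 1

# Two-sided 10% test puts 5% in each tail; one-sided 10% test puts all 10% in one tail
crit_two_sided = critical_value(df, 0.10 / 2)
crit_one_sided = critical_value(df, 0.10)

# t-statistics rebuilt from the rounded figures quoted in the captions
t_utility = -525 / 263
t_sales = 0.0126 / 0.009

# Two-sided rule compares |t| to the critical value; one-sided (positive) rule compares t itself
reject_utility = abs(t_utility) > crit_two_sided
reject_sales = t_sales > crit_one_sided

print(f"two-sided 10% critical value: {crit_two_sided:.3f}")
print(f"one-sided 10% critical value: {crit_one_sided:.3f}")
print(f"utility: t = {t_utility:.2f}, reject H0 = {reject_utility}")
print(f"sales:   t = {t_sales:.2f}, reject H0 = {reject_sales}")
```

The printed critical values should land close to the 1.652 and 1.286 quoted in the video. Because the utility t-statistic is rebuilt here from rounded inputs, it may differ in the last digit from the negative 1.99 shown in the Stata output.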

Info

Channel: Mike Jonas Econometrics

Views: 2,031

Rating: 4.8823528 out of 5

Keywords: econometrics, data, hypothesis test, t test, stata tutorial, stata learning, @stata, t statistic, t-statistic, t-stat

Id: a7vB8wDd0es

Length: 16min 4sec (964 seconds)

Published: Thu Mar 12 2020