Hypothesis Testing Part 2: Confidence Intervals and Coefficient Plots

Captions
Hi everybody. Today we're going to look at confidence intervals applied to OLS regression coefficients, and we'll also take a quick look at generating coefficient plots, all with examples in Stata. So let's take a look.

This is a follow-up to Hypothesis Testing Part 1 (link in the description), which walks through the basic t statistic applied to OLS regression coefficients. Now we want to expand on that with confidence intervals, which can be a really useful way of thinking about the precision of our point estimates, those beta hats we generate.

In terms of interpretation, we'll stick with a 95% confidence interval; that number can change, and the calculation changes along with that choice. A good way to think about it: the interval shows the range in which 95% of our sample estimates of the parameter, our OLS beta, will lie. This goes back to the theoretical idea of repeated sampling. If we could randomly sample from the population over and over again and re-estimate our model each time, this interval tells us that 95% of the time the coefficient of interest would lie within that range. Obviously, the wider the range, the less precise our estimation procedure is.

When we talk about the OLS regression model, we're making the somewhat naive assumption that we're in the Gauss-Markov world. Imagine a multiple regression model, y = b0 + b1*x1 + b2*x2 + u, a linear combination of our coefficients. Our basic assumptions are that the error term u_i is non-autocorrelated, has homoskedastic variance, and is uncorrelated with the x variables, so all of our regressors are exogenous. Crucially for the process of hypothesis testing, we also make the normality assumption.

Under those conditions, with an assumed (or at least asymptotically) normal distribution for the error term, we can calculate the probability of falling within a given range of outcomes, because we know the distribution. Distance from the center of a normal or t distribution translates directly into the percentage probability of lying within a range; we're talking about the area under the t or normal density curve.

This sounds a lot like what we did with hypothesis testing, where we wanted a given level of significance or confidence that the true value lies beyond a critical value, and the two ideas are definitely related. One key difference: with the t-statistic hypothesis test, the assumption under the null hypothesis was that the distribution is centered at zero, and we looked for evidence to reject that null. When we build confidence intervals, we implicitly center the distribution on the estimated value, on the beta hat itself, and that lets us calculate those percentage areas around the estimate. So it's a slightly different point of view from the t-test.

What can we do with this information? Why is it useful? Partly, it mirrors the hypothesis-testing results: if zero does not lie within the interval, that's consistent with saying that 95% of the time our estimated value will not be zero, which is consistent with a statistically significant coefficient estimate. The other way to think about it: the more precise, the more accurate our estimate is, the smaller the interval will be. So we've got the idea; let's call up some numbers and see exactly how these values are calculated. We'll use the same practice data set as in Hypothesis Testing Part 1.
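The video does everything in Stata, but the interval just described can be sketched in a few lines of Python. This is a minimal stand-in, assuming scipy is available; the function name `ols_conf_int` and the numbers fed into it are illustrative, not from the video's regression.

```python
from scipy import stats

def ols_conf_int(beta_hat, se, df, level=0.95):
    """Two-sided confidence interval for one OLS coefficient.

    Under the classical assumptions (Gauss-Markov plus normal errors),
    (beta_hat - beta) / se follows a t distribution with df = n - k - 1
    degrees of freedom, so the interval is beta_hat +/- t_crit * se.
    """
    alpha = 1.0 - level
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)  # upper-tail critical value
    return beta_hat - t_crit * se, beta_hat + t_crit * se

# Illustrative numbers only (not taken from the regression in the video):
lo, hi = ols_conf_int(beta_hat=0.5, se=0.2, df=204)
```

The interval is symmetric about the point estimate, which is exactly the "centered on beta hat" picture described above.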
We call up the CEO salary data set, ceosal1, with the bcuse command (bcuse ceosal1); it comes from the Wooldridge Introductory Econometrics textbook. Let's start with the same regression we ran last time: regress the salary of a sample of CEOs on firm-level sales, return on equity (roe), return on sales (ros), and a dummy variable for whether or not the firm is a utility.

Look at the top row of the output, which has all the information for the sales variable: the coefficient, its standard error, its t statistic, its p-value (we'll get into that next time), and its confidence interval. Based on our definition, we can say that if we re-ran this sampling process again and again, 95% of the time we'd expect the estimated coefficient to lie between -0.0051 and 0.030. That's the range of expected values.

How are those numbers calculated? Where do they come from? Again, the assumption is that we center the distribution on the coefficient estimate itself, which in our example would be 0.0126. A good way to think about it is that we use that same t distribution, the ratio of the coefficient to its standard error, and figure out in what interval we'll find 95% of the area around that estimated coefficient. If we divide the coefficient by its standard error, we can add and subtract the 2.5% critical value from that center point, which gives two and a half percent area in the left tail, two and a half percent in the right tail, and 95 percent in the middle. There's absolutely nothing wrong with doing it that way; all you need is to look up the appropriate t critical value in the table, and you'll find where that 95 percent lies. So the calculation would be the t statistic for the coefficient, the beta hat over its standard error, plus or minus the 2.5% critical value.

That's fine, except it would be a lot more useful if the bounds were in units of the coefficient itself, the beta hat, rather than the beta hat divided by its standard error. That's simple enough: take the beta hat over the standard error, plus or minus the critical value, and multiply through by the standard error. Now we get the same confidence interval, the same 95% confidence that our values will lie within it, but in units of the coefficient itself, so we don't have to do any mental gymnastics to interpret what the numbers mean. The picture is now centered on the beta hat, and the upper and lower bounds are in units of our coefficient, but the tail areas of concern are exactly the same. That is how the numbers we see here for sales, -0.0051 and 0.030, are calculated.

To walk through the exact process, the only thing missing is the appropriate critical value. We flip to the back of our statistics book, our econometrics book, and look at the t table. The critical value depends not just on the significance level but also on the degrees of freedom, n - k - 1. As we saw last time, we usually won't have an exact row in the critical-value table: here we have 209 observations and 4 explanatory variables, so with k = 4 and one more subtracted for the constant, n - k - 1 = 204, which doesn't have its own row. We're somewhere between the listed rows for a 2.5% one-sided (5% two-sided) test. So we'll use Stata's invttail function to calculate the exact critical value for our degrees of freedom, generating a new variable; call it crit5.
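The two equivalent routes just described, building the interval in t-statistic units and then rescaling by the standard error, versus building it directly in coefficient units, can be checked with plain arithmetic. The numbers below are placeholders, not the video's regression output.

```python
# Route comparison for a 95% confidence interval on an OLS coefficient.
beta_hat = 0.0126   # hypothetical coefficient
se = 0.009          # hypothetical standard error
t_crit = 1.97       # approx. 2.5% upper-tail critical value for ~204 df

# Route 1: work in t units (beta_hat / se), then multiply through by se.
lo1 = (beta_hat / se - t_crit) * se
hi1 = (beta_hat / se + t_crit) * se

# Route 2: work directly in coefficient units.
lo2 = beta_hat - t_crit * se
hi2 = beta_hat + t_crit * se

# The algebra guarantees both routes give the same bounds.
assert abs(lo1 - lo2) < 1e-12 and abs(hi1 - hi2) < 1e-12
```

Route 2 is what regression output reports, since its bounds are immediately interpretable in the units of the coefficient.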
Call it whatever you want, but that name seems to make sense. The command inside the generate statement is invttail: we put in our degrees of freedom, in our case 204, and then the area we want in each tail of the distribution, here 0.025; so, generate crit5 = invttail(204, .025). Hit enter and we've got a new variable, created for every observation, so we'll just take its average. Summarize crit5 and we get a value of 1.97, which should lie right in between the neighboring table rows, and it does. That's the last missing ingredient for the calculation.

If we walk through it for the sales variable, it's the beta hat plus or minus the critical t times the standard error. The coefficient, 0.01261, plus the 2.5% critical t times the standard error gives the upper bound of the confidence interval, 0.03; the coefficient minus the critical value times the standard error gives the lower bound, -0.0051. Again, what does that tell us? With 95% confidence, the outcome will lie within that range. Note that zero is within the confidence interval, which is consistent with an inability to reject the null of zero, an insignificant coefficient: the t statistic will not surpass the 5% critical value. We could do this same calculation, with exactly the same critical t, for all of the coefficients and replicate where these values come from. The picture looks like this: centered on our beta hat coefficient, with a lower bound and an upper bound, the middle 95% is the range, the realm, in which we expect our values to fall.

The last little thing is to get a visual of this process by plotting our coefficients along with their 95% confidence intervals. It's a really nice, useful way to give the reader of your regression results, of your economic research, a little story about how the levels of your coefficients and their precision, their 95% confidence intervals, change across specifications and across variables.
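Stata's invttail(df, p) returns the t value with upper-tail area p. A Python stand-in, assuming scipy is available, is the inverse survival function stats.t.isf. The transcript does not state the standard error for sales, so the value below is backed out from the reported interval and is purely illustrative.

```python
from scipy import stats

# Equivalent of Stata's invttail(204, .025): the t value leaving
# 2.5% area in the upper tail with 204 degrees of freedom.
t_crit = stats.t.isf(0.025, 204)  # approximately 1.97, as in the video

# Reconstructing the sales interval. beta_hat is the video's reported
# coefficient; se is a hypothetical value implied by the reported
# bounds of roughly (-0.0051, 0.030), not a number from the output.
beta_hat = 0.0126
se = 0.0089
lo = beta_hat - t_crit * se
hi = beta_hat + t_crit * se
```

Because zero sits inside (lo, hi), the interval reproduces the video's conclusion that the sales coefficient is not statistically significant at the 5% level.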
Luckily, Stata has a command that creates this for us in a really easy way: coefplot. First we need to install it, so ssc install coefplot. There's full documentation, as with most commands; if you type help coefplot you'll get many more options than we'll talk about here, but you shouldn't have any trouble with it. If we just type coefplot without any options, it plots the coefficients and 95% confidence intervals for the most recently estimated set of results, so it comes back to the regression we ran above. And what we get is... not great. The dots are the values of the coefficients for our various variables, whose labels we actually can't read, and the bands are the ranges of the confidence intervals. You can see right away the problem with the default plot: there's only one axis scale, and our coefficient values differ hugely depending on the units of measurement of our variables, so you can't even see the 95% confidence intervals for the top coefficients relative to what we have lower down.

So let's make some adjustments. Before anything else, the first adjustment I'd recommend is to drop the constant and put in a reference line at zero: coefplot, drop(_cons) xline(0). That helps a little: here's zero, here are our coefficients, and we got rid of the constant, but we still have one variable dominating the scale, and we still can't see exactly which variables are which. One other option I generally always use: back up, drop the constant, put in the xline at zero, and add the nolabels option, which prints the variable names instead of trying to fit the entire text of each variable's label. That looks a little better: we can see sales, roe, ros, and utility, and the magnitude of that utility coefficient is what dominates the picture.

Here is probably the most useful application of this command: showing multiple specifications. Let's go back to the beginning and run a regression of salary on sales alone. Normally you wouldn't want to run a regression with only one variable, but let's do it for the example, and then store those estimates with estimates store (est sto), calling them model1. Then we change the specification and regress salary on sales and roe (you see where this is going), and save that as model2. Just for the sake of the example, one more time: regress salary on sales, roe, and the utility variable, and store those coefficient estimates as model3. Now we can use coefplot again, applied to model1, model2, and model3, the names we gave those stored estimates. This time, instead of dropping the constant, let's keep only the sales variable, keep the reference line at zero, and keep the nolabels option. What this shows is how our coefficient estimate on sales changes across specifications. This is something we'd call a robustness check: no matter what we do to our model, we get similar results, a positive coefficient estimate in each case.
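Stata's coefplot draws this picture directly. As a rough stand-in, here is a minimal coefficient plot in Python with matplotlib (assumed available); the model names, coefficients, and standard errors are hypothetical placeholders, not the video's estimates.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical sales coefficients and standard errors for three
# nested specifications (placeholders, not the video's output).
models = ["model1", "model2", "model3"]
coefs = [0.012, 0.013, 0.0126]
ses = [0.010, 0.0095, 0.0089]
t_crit = 1.97  # approx. 2.5% critical value for ~200 df

half_widths = [t_crit * s for s in ses]  # half-width of each 95% CI

fig, ax = plt.subplots()
ax.errorbar(coefs, range(len(models)), xerr=half_widths, fmt="o", capsize=4)
ax.axvline(0, linestyle="--", linewidth=1)  # reference line, like xline(0)
ax.set_yticks(range(len(models)))
ax.set_yticklabels(models)                  # one row per stored model
ax.set_xlabel("coefficient on sales")
fig.tight_layout()
```

Each row is one specification, with a dot at the point estimate and whiskers spanning the 95% interval, which is the same story coefplot tells for the three stored models.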
But note that in each case the confidence interval includes zero, so in none of these specifications do we have a significant coefficient. Still, we can look at this plot and it tells us exactly the story of our model specification search: as the model changes, how do the results change? Maybe not the greatest example, but you can see how this would be a really useful tool. Hopefully this was helpful. In Hypothesis Testing Part 3 we'll take a look at the p-value, the probabilities. I will see you then. Thanks. [Music]
Info
Channel: Mike Jonas Econometrics
Views: 973
Rating: 5 out of 5
Keywords: econometrics, stata, hypothesis test, confidence interval, t-test, coefplot, econometrics lecture, econometrics tutorial, t test
Id: 19bZcpWp5Ck
Length: 20min 24sec (1224 seconds)
Published: Wed Mar 18 2020