Panel Data (Fixed Effects, Random Effects) - R for Economists Moderate 9

Video Statistics and Information

Captions
[Music] Welcome back. Okay, in this video we're going to be covering some basic panel data methods and regressions. Panel data is when you have multiple observations for each individual (person, firm, state, country, whatever) in your data. All right, that's what panel data is. And so, of course, the data sets that we've been using so far have not been panel data. They've been cross-sectional, which is when you have one observation per person or firm or whatever, like the wage1 data, or they've been time series data, where you have one thing observed in many periods. Okay, with panel data we've got multiple things in multiple periods.

So we're gonna bring in a new data set to use, and again we're gonna use a data set that's also from the Wooldridge data, like the wage1 data. It's crime, some crime data. So let's go ahead and take a look at what this looks like. This is the standard panel data format. Technically it's what's called the long format. There's also what's called wide format panel data, but most of the time in economics you'll be working with long format, although sometimes in finance you'll see wide format. But don't worry about that.

Anyway, so let's look at what we've got. First we have county here, as this is data on some crime rates by county by year. So we have the same county multiple times. Notice that county number one is included multiple times, and we have it in 1981 and 1982, '83, '84, '85, '86, and so on. Then we have other counties also over that same time period, so we've got county number three, also from '81 to '87, county number five, and so on and so forth. Within this we have the crime rate, crmrte. We also have the number of police per capita, and that's what we're going to be looking at today.

So we've loaded in our data. We're gonna do some panel data stuff, and we're gonna need the panel data package for that, and that is the library plm, which stands for panel linear model. So we're gonna load that in. So the panel linear model, what it does is it works a lot like a regular OLS linear model, but it needs to work with some panel data. So we're actually not gonna be working with data frames for this. We've been working with data frames this whole time (some time series objects, but mostly data frames). Here, what we're going to be working with is a panel data frame, a pdata.frame.

Okay, so we're gonna need to take our crime data that we've already loaded in as a data frame, and we're gonna need to declare it as panel data. We need to tell R not just "hey, this is some panel data" but also "here's the individual variable, the variable that tells me what individuals I'm looking at, and here's a variable that tells me the period that I'm looking at." Okay, so we're gonna declare our data to be a panel data set. We're just gonna call it crime.p, just for the panel version of the crime data, and that's just gonna be pdata.frame of the crime data that we already have. And now I need to tell it what the parts of this are that actually give it the panel structure. How does it know that I want county as the individuals I'm looking at and year as the time variable? How does it know that it's not, oh, urban is one place and not urban is another place? So I need to tell it. I'm gonna tell it what the index of this is, and I'm going to use a vector here. The first thing in the vector is going to be the individual variable, and that is of course county. The second thing I'm going to put in is the time variable, which is year. So if I do this, I now have a new data set which looks exactly the same, but now it knows that county is the one telling me what the individuals are, which individual counties I'm looking at, and year is the variable telling me what my different times are, the time variable.

Okay, so I've got my new data. Let's go ahead and run a panel model. Now, there's a lot of different kinds of panel models that you can run. Now,
conveniently, plm can handle pretty much all of them, at least all the basic ones. It can handle fixed effects, it can handle random effects, it can handle first difference models, lots of good stuff. Okay, so let's run all three of those, why not.

So let's go ahead and run fixed effects first. A fixed effects model is also called a within model, because it only focuses on variation within the individual observations. So we had county one, county two, county three. Basically, what a within model, or fixed effects model, is going to do is say: I'm going to take all the variables that I'm interested in, calculate the mean within each county, and then subtract that mean out, which means that the only variation I have left is within county. All the differences between counties, I don't want them, not interested, tossing them out. I'm only interested in the variation within counties. I'm comparing a single county to itself at different periods of time.

Okay, so we're gonna do this with the plm function. We're gonna be, like I mentioned, regressing the crime rate (let's go ahead and actually do the log crime rate, why not) as a function of the number of police per capita. We're gonna do it with the crime.p data set, and now we need to tell it what kind of panel model I want to run. So I'm going to tell it what kind of model I want, and there are a couple of different options. We're gonna use, like I said, the fixed effects model, which here is called a within model, because we're only looking at variation within individuals. Okay, I'm gonna run that, and let's go ahead and bring in stargazer and look at it with stargazer. Here we go. All right, so it looks like police per capita is positively related to the crime rate. Now, it's not saying that more police cause crime. Probably what's happening is that a high crime rate means that lots of
police get assigned there, but that's what we're looking at.

Okay, so we've done a within model. Let's also do a random effects model; that's another common one that you might run. With random effects, basically what it says is that each individual county has a different intercept, but we're not gonna let it be whatever it wants to be, which is technically what we're doing with the fixed effects model. Instead we're going to say it follows a normal distribution. Okay, that's what it's doing, but it's the exact same code. So let's go ahead and do random effects. This, of course, is fixed effects, so all we've got to do is change the model type to random. There we go, random effects.

Okay, let's go ahead and do a first difference model as well. What a first difference model does is it just takes each variable and subtracts the value from the year before. So if we're looking at our crime data here, say at the police presence, and let's say we're looking at 1983 for county one, it's going to be the first difference: instead of being this number right here, it's gonna be this number minus this number. That's what's gonna go in there. So it does a similar thing to fixed effects, in that it takes out some of that individual variation and focuses on the differences within a county in different years, but it's not strict in quite the same way. So we're gonna do the first difference, and we've just got to put fd. Run that.

Okay, so we've run a couple of different ones. There are also some other models in there; you can look at the help file for plm. There's, for example, a between model. The within model only looked at variation within counties across different years; the between model only looks at variation between counties, averaging over years. We don't use that one too often in economics, but it is there.

Okay, one thing you might be familiar with with panel data, and specifically
talking about fixed effects and random effects, is the Hausman test, which allows you to compare the results of the fixed effects and random effects models and see if they're different. Now, the reason we might want to do this is that random effects takes some additional assumptions above and beyond what fixed effects takes, and so if those assumptions are wrong, then the random effects model is going to be biased, and we don't want that. However, if those assumptions are right, then it's more efficient, with smaller standard errors, and we do like that. So we're gonna check if they're different. If they're different, that suggests that fixed effects is probably the way we want to go. If they're the same, then you might be able to get away with random effects. Now, in practice in economics, we will typically run this thing and then, even if it passes, still use fixed effects, so I don't really think there's a whole lot of point to it. Let's run it anyway.

So we run a Hausman test comparing random and fixed effects. This is phtest, and all we've got to do is take phtest and feed it our different models: we've got the fixed effects and the random effects models that we created earlier. We run that, and we do reject that they're the same, which suggests that we're going to favor fixed effects in this scenario.

Okay, the other thing we can do with a panel data model is include a lag of something. So you might want to regress the log crime rate not just on the number of police per capita but maybe also on last year's crime rate. So basically, does the number of police per capita affect how the crime rate changes from one year to the next? So we're going to take our same model here, let's go ahead and do it with fixed effects, fixed effects with a lag, and all we've got to do is put a lag right there of the log crime rate. Run that, and then when we do stargazer on that one, what it's going to show us is that we
have in our model, as an independent variable, the log of the crime rate in the previous period, which is indeed what we have right there. There's the lag of the crime rate. Now, be careful with this lag function. Be sure to only use it inside the plm command, because there's another lag function in R that does something different, and so if you use the lag function outside of plm, it will often use that lag function, which you don't want, instead of plm's lag function, which you do want. All right, there we go.

Now, one common thing that you might want to do when you run a panel data model, especially a fixed effects model, is to cluster your standard errors at the individual or group level. The command for doing this is coeftest. Now, you might remember we've done clustered standard errors before with a function called coeftest. We're going to use coeftest again. When it comes to panel data models, coeftest works a little bit differently, and I don't even have the code memorized, so I'm just gonna paste it in from earlier. So there we have it.

So this is coeftest here, and what we are feeding it is our panel data model. We're going to give it our fixed effects model. Okay, and now it's a little bit confusing, I admit, but we're feeding in our model, just like we've done before with coeftest. In order to get robust or clustered standard errors we're using vcovHC, just like we used before to get robust standard errors, except this time it's going to give us clustered standard errors. Like I said, a little bit confusing. This is the kind of thing I would recommend just copying and pasting the code for, rather than thinking about it too hard. However, this time we can't just give it vcovHC; we also have to give it some information. We're going to give it the model that we're working with again. We're going to tell it the type of clustered errors we want. Here I've done HC0, heteroskedasticity-consistent
type 0. There are some other kinds in there. If you want it to match what you do in Stata, sss is the type you want here. And then we tell it what kind of clustering we want, and I'm going to tell it that I want group, which is the individual level, so the individuals here, as opposed to clustering by time. I do that, and of course I get my clustered standard errors for my regression model.

Okay, that's about it. That's the basics of how we can work with some panel data: how we can get things like fixed effects models, random effects models, and first difference models, run a Hausman test, get some clustered standard errors, and include a lag in our model. All kinds of good things. That's it. I will see you next time. [Music]
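
The captions describe the code without showing it, so here is a minimal R sketch of the steps mentioned: declaring the panel, fitting the within, random, and first difference models, running the Hausman test, adding a lag, and clustering the standard errors. It is not the exact code typed in the video. Two assumptions are worth flagging: the crime data frame is loaded off-camera in the video, so the Crime data set bundled with plm is used here as a stand-in (it is assumed to have the same county, year, crmrte, and polpc columns), and polpc is assumed to be the police-per-capita variable, which the captions never name. The object names fe, re, fd, fe.lag, hausman, and fe.clustered are illustrative. Results are printed with print() rather than stargazer to keep the dependencies short.

```r
library(plm)     # pdata.frame(), plm(), phtest(), and a vcovHC() method for panel models
library(lmtest)  # coeftest()

# ASSUMPTION: plm's bundled Crime data stands in for the video's `crime` data frame.
data("Crime", package = "plm")
crime <- as.data.frame(Crime)[, c("county", "year", "crmrte", "polpc")]

# Declare the panel structure: individual variable first, then the time variable.
crime.p <- pdata.frame(crime, index = c("county", "year"))

# Same formula each time; only the `model` argument changes.
# ASSUMPTION: polpc is police per capita.
fe <- plm(log(crmrte) ~ polpc, data = crime.p, model = "within")  # fixed effects
re <- plm(log(crmrte) ~ polpc, data = crime.p, model = "random")  # random effects
fd <- plm(log(crmrte) ~ polpc, data = crime.p, model = "fd")      # first difference

# Hausman test comparing the fixed and random effects estimates.
hausman <- phtest(fe, re)
print(hausman)

# Fixed effects with last year's log crime rate as an extra regressor.
# Inside plm() on a pdata.frame, lag() follows the county/year structure.
fe.lag <- plm(log(crmrte) ~ polpc + lag(log(crmrte)),
              data = crime.p, model = "within")
print(summary(fe.lag))

# Standard errors clustered by county ("group"); type = "sss" is the
# alternative the captions mention for matching Stata.
fe.clustered <- coeftest(fe, vcov. = vcovHC(fe, type = "HC0", cluster = "group"))
print(fe.clustered)
```

Setting cluster = "time" in the vcovHC call would cluster by year instead of by county, which is the alternative the captions mention at the end.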
Info
Channel: Nick Huntington-Klein
Views: 24,318
Rating: 4.9534111 out of 5
Keywords: programming, economics, coding, Rstats, statistics, educational, panel data, fixed effects, random effects, within, between, hausman
Id: 2igMNODFypk
Length: 13min 6sec (786 seconds)
Published: Sun Oct 28 2018