Panel data econometrics - an introduction

Captions
In this video I want to provide an introduction to panel models, and also explain some of the difficulties which estimating panel data models actually presents. The example I'm going to talk about is this: let's say we were interested in the various factors which influence the house price in a given city i at time t, and we were explicitly interested in the effect of the crime rate in that particular city i at time t on house prices. The magnitude of that effect is given by this coefficient beta_1 here. But we suppose that there is a whole range of other factors: some which are solely time dependent, which I'm going to call v_t; some which are city dependent but don't vary across time, which I'll call alpha_i; and then some idiosyncratic factors, u_it, which also influence house prices in city i at time t.

So let's talk through each of these three terms; essentially, each of these three terms is an error term. This v_t, what might that represent? Well, this is representing terms which are solely time dependent, so things which vary over time but don't vary across cities. It might represent the upwards trend in house prices across time, since there is some sort of covariance in house prices across properties in the United States. This trend might covary across cities because perhaps the average citizen in the US has got slowly richer across time. Those might be some of the factors contained within this v_t term.

How about this alpha_i? By saying it has a subscript i and no subscript t, I'm saying that it is solely what I'm going to call city dependent. Just to be absolutely clear, the variable i takes on the values 1 through N, where capital N represents our last city and 1 represents our first, and small t takes on the values 1 through T, where capital T represents the last time period. So what sort of factors might be contained in this city-dependent error term? Well, these are things which don't vary across time. It might be things like, for example, the geography; the geography almost certainly doesn't vary across time. Some other factors are approximately time independent. That might be things such as the demographics of that particular city, because that basically doesn't change much from one year to the next, one month to the next, et cetera. A range of other factors, such as race and the general education level, will also be approximately constant through time.

Okay, so before we estimate the above model, I'm going to write it in a slightly different way, which is the way we normally start off thinking about panel models. We're going to have that the house price of city i at time t is equal to beta_0 plus beta_1 times crime in city i at time t. But when we come to look at this error term v_t, what I'm actually going to do is include a dummy variable for each of the time periods we're talking about. So we have some constant gamma_1 times our dummy variable for time period 2, which I'm going to call delta_2t, plus another dummy for the third time period, delta_3t, all the way up until I have gamma_(T-1) times delta_Tt. So I've included a dummy variable for T minus 1 of the periods. I don't need to include one for the first period, because then we would fall into the dummy variable trap. But essentially I have included dummy variables for each of the different time periods, which allow for house prices across different cities in the United States to be changing over time.

Now, I could do exactly the same thing for this alpha_i term up here. I could just include dummies for each of my N cities, and that is actually a perfectly valid estimator, and we'll talk about that in due course. But as a first step we don't want to do that, and the reason is that when we have panel data we normally have N which is quite large, so we're talking about a large number of cities, but the number of time periods is relatively small. So it's perfectly fine to include dummies for every single time period, but this whole expression will become completely unwieldy very quickly if we start to include dummy variables for each of the different cities. So typically what we do is keep this alpha_i term within our errors. What do I mean by that? Well, I mean that the actual error which we see for our regression is the sum of alpha_i and this idiosyncratic error u_it, and I'm going to give this a particular name: I'm going to call it eta_it.

Okay, so why can't we just estimate this model via pooled OLS? In other words, why can't we just treat each of our observations, across different cities and across different times, as if they're just randomly sampled observations? What's the problem with doing that? Well, remember that in order for our OLS estimator to be consistent, we require that the covariance of our error, which is now this eta_it, with our independent variable, which in this case is just crime_it since we have only one independent variable, has to be equal to zero. Note that this is just the requirement for OLS to be consistent. In order for it to be unbiased, we would require the covariance of this error with the values of crime in all time periods s, which aren't necessarily equal to t, to be zero. But in principle I'm talking about circumstances where we have quite a large sample, so we really only require the conditions for OLS to be consistent. So this has to hold for all i and for all t, but we don't have to worry about an s here; we just have these two particular requirements.

Okay, so why is this likely not going to be the case if we estimate the above model as I've specified it here? Well, let's break this eta_it out into its various parts. I can replace eta_it by alpha_i plus u_it, because remember I just defined eta_it to be the error which we don't really observe, but which is the inferred error we have from our regression. So I've got the covariance of alpha_i plus u_it with crime in city i at time t. We can assume, perhaps quite safely, that u_it isn't correlated with crime_it, so I'm just going to assume that that term isn't important. We're then left with the covariance of alpha_i with crime_it. But why can't I necessarily assume that this is equal to zero?

Well, the reason is that these time-independent factors, so those factors which are city dependent, such as geography, demographics, race and education, are likely correlated with the crime rate. As an example, I might suppose that as the average age of a city goes up, the crime rate, which I'm representing here on the vertical axis, likely falls. So as the average age of citizens increases, the crime rate likely goes down, and there's almost certainly some covariance between the average age, which I'm including here in terms of demographics, and the crime rate. Similarly, you could suppose that as the level of education in a city goes up, perhaps the crime rate would decline. And you might suppose that the ethnic fractionalization of a city, that's the number of different ethnic fractions present within a city, might also be correlated with the crime rate.

Note that even though we haven't included each of these terms in our regression, or in fact because we haven't included these terms in our regression, there is going to be some covariance of these time-independent factors with our independent variable. That means the covariance between alpha_i and the crime rate doesn't equal zero, which means that the covariance of our error eta_it with the crime rate in city i at time t is not equal to zero. So in these circumstances OLS will be both biased and inconsistent. Just estimating the above model by pooled OLS is not going to be a very good thing to do, because of these various factors which are contained within this alpha_i term.

This alpha_i term is sufficiently important for us to have a particular name for it: it is known in econometrics as unobserved heterogeneity. It's unobserved because we don't necessarily observe these various factors which are constant through time, and it's heterogeneous because it varies across, in this example, cities. It is this unobserved heterogeneity which leads pooled OLS estimates to be biased and inconsistent. So what can we do? Well, the answer is that we need a new type of estimator, and that's what I'm going to talk about in the next video. I'll see you there.
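The argument in the captions can be checked with a small simulation. The sketch below is not from the video: the function names, parameter values and the data-generating process are all assumptions chosen for illustration. It builds a panel in which the city effect alpha_i raises house prices and lowers crime, then runs pooled OLS with T-1 time dummies while leaving alpha_i in the error term. Under these assumed values the pooled estimate of beta_1 should settle near beta_1 + Cov(alpha_i, crime_it) / Var(crime_it) = -2 + (-1 / 2) = -2.5 rather than the true -2.

```python
import numpy as np


def simulate_panel(n_cities=2000, n_periods=5, beta0=10.0, beta1=-2.0,
                   alpha_to_crime=-1.0, seed=0):
    """Simulate house prices and crime rates (all values are illustrative)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_cities)             # city effect alpha_i, variance 1
    v = np.linspace(0.0, 2.0, n_periods)          # time effect v_t: upward trend
    u = rng.normal(size=(n_cities, n_periods))    # idiosyncratic error u_it
    # Assumption: crime depends on the unobserved city effect, so
    # Cov(alpha_i, crime_it) = alpha_to_crime, which is non-zero.
    crime = alpha_to_crime * alpha[:, None] + rng.normal(size=(n_cities, n_periods))
    price = beta0 + beta1 * crime + v[None, :] + alpha[:, None] + u
    return price, crime


def pooled_ols_beta1(price, crime):
    """Pooled OLS of price on crime plus T-1 time dummies; returns beta_1 hat."""
    n, t = price.shape
    y = price.ravel()                             # city 0 periods 0..T-1, then city 1, ...
    period = np.tile(np.arange(t), n)             # period index for each stacked row
    # Dummies for periods 2..T only; the first period is absorbed by the constant.
    dummies = (period[:, None] == np.arange(1, t)[None, :]).astype(float)
    X = np.column_stack([np.ones(n * t), crime.ravel(), dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]                                # coefficient on crime


if __name__ == "__main__":
    price, crime = simulate_panel()
    print("true beta_1:", -2.0)
    print("pooled OLS estimate:", pooled_ols_beta1(price, crime))
```

Setting `alpha_to_crime=0.0` removes the correlation between alpha_i and crime in this sketch, and the pooled estimate should then move back towards the true value, which is the consistency condition described in the captions.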
Info
Channel: Ben Lambert
Views: 165,006
Rating: 4.9111109 out of 5
Keywords: Econometrics (Field Of Study), panel data, panel models
Id: aYx88zmTM0U
Length: 11min 2sec (662 seconds)
Published: Fri Oct 04 2013