Ordinary Least Squares Estimators - derivation in matrix form - part 1

Video Statistics and Information

Captions
Hi. In this video we're going to derive the explicit form of the least squares estimators, or at least begin to, when we're thinking about the matrix form of econometrics.

Let's just remind ourselves what a least squares estimator actually does. The idea is that you have some independent variable X and you're trying to predict some dependent variable Y, so this is just a bivariate model where I have one independent variable X causing a variable Y. You have a whole set of observations, and if we try to fit a straight line to that data, we are choosing the y-intercept together with the slope of that line (the m in our old y = mx + c notation), and we choose those two parameters in order to minimise the sum of squared vertical distances of each point from the line. The reason we consider vertical distances, as opposed to horizontal distances, is that we are trying to minimise the sum of squared prediction errors in our dependent variable Y, which is, after all, the thing we are trying to predict.

We wrote this least squares objective out in our old summation notation: it is the sum from i = 1 to N of our errors squared. If we observed the whole population of errors we would just use those errors, but in fact we use the residuals in place of the population errors, because we don't actually observe them. If we specify some linear model, Y_i = alpha + beta X_i + u_i, as our population process, we can write the sample equivalent as the sum from i = 1 to N of (Y_i minus alpha hat minus beta hat X_i) squared. We use alpha hat and beta hat in place of alpha and beta because we are choosing our estimates of alpha and beta to minimise this sum of squared residuals: alpha hat estimates alpha, and beta hat estimates beta. The idea is that there is some true population value of alpha and some true population value of beta, but because we only have a sample, and because we don't know the population process, the best we can do is estimate them. So we differentiated S with respect to the two parameters alpha hat and beta hat, set those two derivatives equal to zero, and then solved explicitly for the least squares estimators of alpha and beta, which we denote by alpha hat and beta hat.
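For reference, a short LaTeX sketch of the summation-form problem just described. The first-order conditions are the ones the video says are set to zero; the closed-form solutions in the last line are the standard bivariate OLS results, which the video refers back to but does not restate here.

S(\hat{\alpha}, \hat{\beta}) = \sum_{i=1}^{N} \hat{u}_i^{\,2}
                             = \sum_{i=1}^{N} \bigl( Y_i - \hat{\alpha} - \hat{\beta} X_i \bigr)^2

\frac{\partial S}{\partial \hat{\alpha}} = -2 \sum_{i=1}^{N} \bigl( Y_i - \hat{\alpha} - \hat{\beta} X_i \bigr) = 0,
\qquad
\frac{\partial S}{\partial \hat{\beta}} = -2 \sum_{i=1}^{N} X_i \bigl( Y_i - \hat{\alpha} - \hat{\beta} X_i \bigr) = 0

\hat{\beta} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2},
\qquad
\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}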
Okay, so that was how we went about things when we were using the summation form of econometrics, and specifically when we were talking solely about the bivariate case. The problem with the summation form is that it doesn't generalise particularly well to the multivariate case, that is, when I have more than one independent variable. So let's remind ourselves what the matrix form of econometrics looks like. The idea is that you have a vector of your dependent variable, y, being determined by a matrix X of all your independent variable observations times your parameter vector beta, plus your population error u. We don't actually observe the population errors u, so what we minimise is the sum of squared residuals; that is why we replace u by u hat. Similarly, we don't observe the population vector of parameters beta, so we have to estimate it; that is why we use beta hat rather than beta.

So how can we form this sum using our vector notation? It actually turns out to be quite simple to write: we take the vector of residuals, take its transpose, and multiply that by the vector of residuals itself. Written out explicitly, that is (u_1 through u_N) times (u_1 through u_N), a 1 by N row vector multiplied by an N by 1 column vector. The inner indices cancel, and we are left with a 1 by 1 entity at the end, which is exactly what we want: we want to minimise a scalar sum. If we expand the product, the first term is this u_1 multiplied by that u_1, which gives u_1 squared; the second term is u_2 squared; and we continue all the way through to u_N squared. Written out like this, it is clear that the product has exactly the same form as we had in the summation form of econometrics. One thing I should note is that I should really have hats on all of these variables, because we don't observe the population errors, only our estimates of them, the residuals.

So the idea is that we are trying to minimise this sum of squared residuals in order to derive the form of our estimator for the population parameter vector beta. The problem at the moment is that the sum doesn't explicitly contain our estimate of the population parameters, beta hat. We need to remedy that, and again it's not difficult: from the relationship we have above, we can rearrange for the residuals, giving u hat equal to the vector of dependent variables y minus X times beta hat. Substituting this into the sum of squared residuals, we get (y minus X beta hat) transposed times (y minus X beta hat). The first term is the residual vector transposed, and the second term is the residual vector itself. So the idea is that we choose the parameter vector beta hat to minimise this product. In the next video we will continue with this derivation of the least squares estimators in matrix form. I'll see you then.
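As an illustration (not from the video), here is a minimal NumPy sketch of the quantity being minimised. The data set, the hypothetical population beta, and the candidate beta_hat are all made up purely for illustration; the point is only that u_hat' u_hat is a scalar equal to the ordinary sum of squared residuals.

import numpy as np

# Made-up data: 50 observations, an intercept column and one regressor.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix, n x 2
beta_true = np.array([1.0, 2.0])                       # hypothetical population parameters
u = rng.normal(size=n)                                 # population errors (unobserved in practice)
y = X @ beta_true + u                                  # y = X beta + u

# An arbitrary candidate estimate beta_hat (a guess, not the OLS solution).
beta_hat = np.array([0.5, 1.5])

u_hat = y - X @ beta_hat   # residual vector: u_hat = y - X beta_hat
S = u_hat @ u_hat          # u_hat' u_hat: (1 x n)(n x 1) inner product, a single scalar

# The matrix product equals the summation-form sum of squared residuals.
assert np.isclose(S, np.sum(u_hat ** 2))
print(S)

Minimising S over beta_hat, rather than plugging in a fixed guess as above, is exactly what the continuation of the derivation in the next part does.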
Info
Channel: Ben Lambert
Views: 114,825
Rating: 4.933434 out of 5
Keywords: Estimator, econometrics, Ordinary Least Squares
Id: fb1CNQT-3Pg
Length: 7min 30sec (450 seconds)
Published: Tue Jun 25 2013