Welcome to lecture number four. In this lecture we will discuss how to estimate the parameters of a linear regression model. In the earlier lecture we had discussed that there are three parameters: beta0, beta1 and sigma^2. If you try to recall, in the earlier lecture we had taken the model y = beta0 + beta1 x + epsilon, and we had obtained n observations, say (x1, y1), (x2, y2), ..., (xn, yn), and we assumed that all these observations satisfy yi = beta0 + beta1 xi + epsilon_i; this is the model they are going to follow. If you also recall, we had drawn a diagram: this was the x axis, this was the y axis, we had observed points something like this, and so on, and we wanted to fit a line through them, something like this.
We had labelled the points: this is (x1, y1), this is (x2, y2), and so on. In more technical terms, this is the line which we want to fit, and it is essentially the line y = beta0 + beta1 x. In this case we had also assumed that each epsilon_i has mean zero and variance sigma^2, and that the epsilon_i are iid, that is, identically and independently distributed. At this moment I am going to make a further assumption: that the epsilon_i are iid and follow a normal distribution. Briefly, I can write that the epsilon_i are iid N(0, sigma^2). This means that every epsilon_i has been observed from the normal probability density function with mean zero and variance sigma^2, and we also assume that epsilon1, epsilon2, ..., epsilon_n are mutually independent of each other.
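To make these assumptions concrete, here is a minimal simulation sketch (my addition, not part of the lecture); the values of beta0, beta1, sigma and n below are arbitrary choices for illustration only.

```python
import numpy as np

# Minimal sketch: simulate n observations from
# y_i = beta0 + beta1 * x_i + epsilon_i, with epsilon_i iid N(0, sigma^2).
# beta0, beta1, sigma and n are illustrative values, not from the lecture.
rng = np.random.default_rng(0)
beta0, beta1, sigma, n = 2.0, 3.0, 1.5, 50

x = rng.uniform(0, 10, size=n)          # observed regressor values x_1, ..., x_n
epsilon = rng.normal(0, sigma, size=n)  # iid N(0, sigma^2) random errors
y = beta0 + beta1 * x + epsilon         # observed responses y_1, ..., y_n
```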
I would like to make one note here. When we go for least squares estimation, this assumption of a normal distribution will not be used. It is only when we go for tests of hypotheses and confidence interval estimation that the normality assumption will be needed, and later on, when we do maximum likelihood estimation, we will require the normality assumption right from the first step, so you have to keep that in mind. I will explain this again as soon as I come to maximum likelihood estimation and ordinary least squares estimation. So under this setup we now try to estimate the parameters; our objective is the estimation of the parameters, and you have to keep in mind that there are three parameters, beta0, beta1 and sigma^2, that we want to estimate.
Now I am going to use two methods, or two approaches: one is the method of least squares and the other is maximum likelihood estimation. First we try to understand the method of least squares. In this graphic, we had said that this is the random error associated with the first observation, denoted epsilon1, and similarly this is epsilon2, and so on. So in every observation there is some random error, and the principle of least squares says that I would like to find this line, this orange line, in such a way that these random errors are as small as possible and most of the points lie as close as possible to the line. So in the first case I use the method of least squares, and let us first try to understand what least squares estimation is.
The principle of least squares says that we try to find the values of the parameters in such a way that the total error is as small as possible and most of the points lie close to the line. If you look at this picture, the random error in the first observation is epsilon1, in the second observation it is epsilon2, and so on. So, in case we try to minimize the total error, the total error is the summation over i from 1 to n of epsilon_i. But can we really do that? Does minimizing this quantity make any sense? We had assumed that some of the errors are in the positive direction, that is, above the line, and some errors are in the negative direction, lying below the line. So if we sum them up, the sum may come out very close to zero, and that would wrongly indicate that my observations have no random error.
That is wrong, so this idea does not work here; it is not meaningful. So what do we do instead? Let us try to minimize the summation over i from 1 to n of epsilon_i^2. Does this make any sense? The answer is yes. Why? Because the problem we faced earlier was that some of the random errors were negative; once I square them, the negatives become positive, and now I can meaningfully minimize the sum.
Well, at this stage you might ask: if I am converting my negative random errors into positive ones, then another option is to take the absolute value of the errors. Yes, that is also possible; you can minimize the sum of absolute errors, that is, the summation over i from 1 to n of the absolute value of epsilon_i. This is also available in the literature and is called the least absolute deviation estimation technique, but in this course we are not going to discuss it. So we will consider obtaining the values of the parameters by minimizing the sum of squares of the random errors. The next question is: how do we minimize it?
Well, I can use the principle of maxima and minima. Let us try to use it to obtain the values of beta0 and beta1. Let me write the sum of squared errors as a function S of beta0 and beta1: S(beta0, beta1) = summation over i from 1 to n of epsilon_i^2, which can also be written as the summation over i from 1 to n of (yi - beta0 - beta1 xi)^2.
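Just to fix ideas, here is a small sketch (my addition, not from the lecture) writing the three criteria we have discussed as Python functions; the function names are only illustrative.

```python
import numpy as np

def sum_of_errors(beta0, beta1, x, y):
    # Raw sum of errors: positive and negative errors cancel, so a value
    # near zero does NOT mean the line fits well -- this is why it is not used.
    return np.sum(y - beta0 - beta1 * x)

def sum_of_squared_errors(beta0, beta1, x, y):
    # S(beta0, beta1): the least squares criterion minimized in this lecture.
    return np.sum((y - beta0 - beta1 * x) ** 2)

def sum_of_absolute_errors(beta0, beta1, x, y):
    # Criterion of least absolute deviation estimation (mentioned, not covered here).
    return np.sum(np.abs(y - beta0 - beta1 * x))
```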
The principle of maxima and minima says that we need to obtain the partial derivatives of S with respect to beta0 and beta1, set them equal to zero, solve, and then check, using the second-order conditions, whether the solution gives us a maximum or a minimum. That is exactly the rule we are going to follow. So first I obtain the partial derivatives. The partial derivative of S with respect to beta0 comes out to be minus twice the summation over i from 1 to n of (yi - beta0 - beta1 xi). Next, I partially differentiate S with respect to beta1, and this comes out to be minus twice the summation over i from 1 to n of (yi - beta0 - beta1 xi) xi. Now I set both of these equal to zero and solve.
Let us call these equation number one and equation number two. If I try to solve equation number one, it can be handled as follows. Once I open the bracket, it gives me: the summation over i from 1 to n of yi, minus n times beta0, minus beta1 times the summation over i from 1 to n of xi, equals 0. Dividing through by n, I can write y bar - beta0 - beta1 x bar = 0, and solving this gives beta0 = y bar - beta1 x bar. But this beta0 can be known to us only if beta1 is known, and up to now we do not know beta1. So now I try to solve equation number two and see what we obtain.
Let us consider equation number two and solve it. Equation number two is: the summation over i from 1 to n of (yi - beta0 - beta1 xi) xi = 0. If we open the bracket, substitute beta0 = y bar - beta1 x bar from equation one, and solve, we get beta1 = (sum of xi yi - n x bar y bar) / (sum of xi^2 - n x bar^2), where both sums run over i from 1 to n. If I simplify, the numerator is nothing but the summation over i from 1 to n of (xi - x bar)(yi - y bar), and the denominator is the summation over i from 1 to n of (xi - x bar)^2. Keep in mind that x bar and y bar are simply the sample means; based on whatever observations we have obtained, I can compute these sample means, so x bar and y bar are known to us.
So now I can note one thing: when we stated our model y = beta0 + beta1 x + epsilon, the parameters beta0 and beta1 in that model were unknown to us. But now I can see that once I have the observations, I can use them to compute a value for beta1. So I take this as an estimator of beta1; in simple words, an estimator is a rule that gives the value of a parameter on the basis of a given set of data. So I have a parameter, beta1, whose value is completely unknown, and I am saying that using my observations I can compute its value from the expression written here; this is an estimator of beta1. For the sake of simplicity, let me rewrite it as beta1 hat = Sxy / Sxx, where Sxy is nothing but the summation over i from 1 to n of (xi - x bar)(yi - y bar) and Sxx is the summation over i from 1 to n of (xi - x bar)^2.
We are going to use this notation in the later lectures. So what we have seen is that we have obtained the value of beta1 as beta1 hat. Now, the value of beta0 that we obtained on the earlier slide, beta0 = y bar - beta1 x bar, can be known to us only when beta1 is known, so I replace beta1 by beta1 hat and write beta0 hat = y bar - beta1 hat x bar. Using this expression I can now also estimate the intercept term, so this beta0 hat is an estimator of beta0. Both beta0 hat and beta1 hat have been obtained from the principle of least squares; in this case we have minimized the vertical distances between the observed values and the line, as you can see here. They are therefore also known as direct regression estimators: beta0 hat is the direct regression estimator of beta0, and beta1 hat is the direct regression estimator of beta1. They are also called the least squares estimates, or least squares estimators, of beta0 and beta1.
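To summarize the computation, here is a minimal Python sketch (my addition, not from the lecture), assuming the observations are held in NumPy arrays x and y, for example the simulated ones above.

```python
import numpy as np

def least_squares_estimates(x, y):
    """Direct (least squares) estimates of beta0 and beta1 from data arrays x, y."""
    x_bar, y_bar = x.mean(), y.mean()
    sxy = np.sum((x - x_bar) * (y - y_bar))   # Sxy
    sxx = np.sum((x - x_bar) ** 2)            # Sxx
    beta1_hat = sxy / sxx                     # beta1 hat = Sxy / Sxx
    beta0_hat = y_bar - beta1_hat * x_bar     # beta0 hat = y bar - beta1 hat * x bar
    return beta0_hat, beta1_hat

# Usage (x and y are NumPy arrays of observations):
# beta0_hat, beta1_hat = least_squares_estimates(x, y)
```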
Well, we have obtained these estimators, but we do not yet know whether the values beta0 hat and beta1 hat are really minimizing the sum of squared errors or maximizing it. For that we have to check the second-order condition. Here we have two parameters and we are estimating them jointly, so we need to look at the Hessian matrix, the 2 x 2 matrix of second-order partial derivatives of S: the second-order partial derivative with respect to beta0, the mixed partial derivative with respect to beta0 and beta1, and, on the other diagonal entry, the second-order partial derivative with respect to beta1. This matrix has to be evaluated at beta0 = beta0 hat and beta1 = beta1 hat; so I simply differentiate once more and substitute beta0 hat and beta1 hat, obtained from the normal equations, into it. In fact, the estimators provide us a global minimum of S, so we have indeed obtained the least squares values of beta0 and beta1.
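The lecture does not write the Hessian out explicitly; as an addition for completeness, with S as defined above, its entries are:

```latex
H =
\begin{pmatrix}
\frac{\partial^2 S}{\partial \beta_0^2} & \frac{\partial^2 S}{\partial \beta_0 \,\partial \beta_1}\\[2pt]
\frac{\partial^2 S}{\partial \beta_1 \,\partial \beta_0} & \frac{\partial^2 S}{\partial \beta_1^2}
\end{pmatrix}
=
\begin{pmatrix}
2n & 2\sum_{i=1}^{n} x_i\\[2pt]
2\sum_{i=1}^{n} x_i & 2\sum_{i=1}^{n} x_i^2
\end{pmatrix},
\qquad
\det H = 4\, n \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0 .
```

Since the first diagonal entry 2n is positive and det H is positive whenever the xi are not all equal, H is positive definite, which is why the stationary point (beta0 hat, beta1 hat) is a minimum rather than a maximum.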
So, if I try to write it down compactly: we had the model y = beta0 + beta1 x + epsilon, and we have obtained a fitted model, y hat = beta0 hat + beta1 hat x. This is called the fitted regression model. After this, I have to obtain the fitted values. What is a fitted value? You see, when we conducted the experiment and obtained the data, there was some difference between the observed data and the line. If I draw the earlier diagram once again: this is the x axis, this is the y axis, this is the line, and the observations are lying somewhere here, here, and so on. If you look at this point, suppose it is (x1, y1); we had observed the value (x1, y1), but we expect the corresponding value to lie somewhere here, on the fitted line. The value of y which is obtained from the fitted line using the observed value xi is the i-th fitted value.
Well, let me explain with a simple example. Suppose I have some data, with columns xi and yi, and suppose I have four pairs of observations: for xi = 1 I obtain yi = 6, for xi = 3 I obtain yi = 10, for xi = 6 I obtain yi = 22, and for xi = 7 I obtain yi = 21. Now suppose that after fitting the model, that is, after obtaining the values of beta0 hat and beta1 hat on the basis of these four pairs of observations, I get the fitted model y hat = 2 + 3x. Using this model and the observed xi, I can now obtain the fitted values yi hat.
How do I obtain them? For example, y1 hat is obtained from this model as 2 + 3 times x1, where x1 is 1, so y1 hat = 5. Similarly, y2 hat = 2 + 3 times 3 = 11, y3 hat = 2 + 3 times 6 = 20, and y4 hat = 2 + 3 times 7 = 23.
After this I can plot the point (x1, y1 hat), and so on; these are nothing but my fitted values. If you observe what these values are: I have simply fitted the model on the basis of the given set of data, and then, using the given values of xi, I obtain the values yi hat. So the yi hat are the values of y obtained from the model, and they are called fitted values. I can write the fitted value as yi hat = beta0 hat + beta1 hat xi, that is, the fitted model evaluated at the given value x = xi; this is the i-th fitted value.
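As a small sketch of this computation (my addition), using the supposed fitted model y hat = 2 + 3x from the example:

```python
import numpy as np

x = np.array([1, 3, 6, 7])       # observed xi from the example
y = np.array([6, 10, 22, 21])    # observed yi from the example
beta0_hat, beta1_hat = 2.0, 3.0  # the supposed fitted model y hat = 2 + 3x

y_hat = beta0_hat + beta1_hat * x   # fitted values yi hat
print(y_hat)                        # [ 5. 11. 20. 23.]
```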
Let us now look at a different aspect. I can find the difference between yi and yi hat: yi is the observed value and yi hat is the value of y obtained from the model. If I denote this difference by ei and define ei = yi - yi hat, then the first one is 6 - 5 = 1, the second is 10 - 11 = -1, the third is 22 - 20 = 2, and the fourth is 21 - 23 = -2. These values are called residuals.
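These values can be computed directly; again only an illustrative sketch (my addition), reusing the numbers from the example:

```python
import numpy as np

y = np.array([6, 10, 22, 21])        # observed values yi
y_hat = np.array([5, 11, 20, 23])    # fitted values yi hat from the example
e = y - y_hat                        # residuals ei = yi - yi hat
print(e)                             # [ 1 -1  2 -2]
```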
So ei is nothing but the difference between yi and yi hat, and in general I can define the residual e as the difference between the observed and the fitted value. Now, this residual has a very important property. If you observe in this picture, the difference between y1 and y1 hat, which by this definition is e1, is the distance that we had earlier denoted by epsilon1. So you can see that the residuals are going to act as if we had observed the random errors in the data. Remember one thing: residuals are random variables and errors are random variables, and I am not estimating the random errors by the residuals, but the residuals will look as if they were the observed values of the random errors, and they help us a great deal in obtaining information about the random errors. We will discuss this in the forthcoming lectures; till then, goodbye.