Welcome to lecture number three. From this lecture we are going to start with the basic fundamentals of linear regression modeling; we are essentially going to start with the chapter on simple linear regression analysis. In practice, the contents of this chapter, or rather the basic concepts we are going to learn in it, may not be directly useful on their own; the more important chapter will be the next one, on multiple linear regression analysis. But the concepts we learn here build the foundation for that chapter. The difference between simple linear regression analysis and multiple linear regression analysis is that in simple linear regression we consider only one input variable, whereas in multiple linear regression analysis we consider more than one input variable, and in practice we know that any output depends on more than one input variable. So that will be the more realistic chapter, but the concepts we discuss for the multiple linear regression model are based on the concepts we are going to learn for the simple linear regression model. As an instructor, it is also easier for me to explain things when there is one variable, because I can use one and two dimensional graphics and later simply extend everything to the multiple case.
Here we are going to consider a situation with only one input variable. So we consider an output variable y that is linearly related to x through the function beta0 + beta1 x. You can see that this is the same model we discussed in lecture number one and lecture number two. Just for your information, y is what we have denoted earlier as the output variable, and x is what we had called earlier, in elementary language, the input variable. Now we are going to talk about them in pure statistical language. In pure statistical language, y and x have different names: y is called the study variable, and in connection with the study variable, x is called the explanatory variable. They have several other names as well. For example, y is also called the response variable, and when we speak of the response variable we call x the regressor, or sometimes the regressor variable, and so on. y is also called the dependent variable, and in connection with the dependent variable, x is called the independent variable. Similarly, there are other names popular in the literature, but these are the common names for y and x. Just to refresh your memory, beta0 was our intercept term and beta1 was the slope parameter; if you try to recall, in lecture number two we had considered the linear equation y = mx + c, which I translated into y = beta0 + beta1 x. So this is my model.
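To make this concrete, here is a minimal sketch in Python of the model as a function; the parameter values below are purely hypothetical, chosen only to show the form of the line.

```python
def linear_model(x, beta0, beta1):
    """Evaluate the simple linear regression line y = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# hypothetical parameter values, only to illustrate the form of the model
print(linear_model(2.0, beta0=1.0, beta1=0.5))  # 1.0 + 0.5 * 2.0 = 2.0
```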
The question now is how to know this model. We discussed earlier that a model is nothing but a function between the output and input variables, and when I say I want to find the model, that is equivalent to saying that I want to find the parameters of the model. So once I say I want to know the model, our objective is to know or find the parameters. Up to now we have defined two parameters here, beta0 and beta1. Well, there can be other parameters also, which we will see as we move further. We are discussing here the technique of regression analysis, and we discussed that regression means to move in the backward direction. So what we are going to do is conduct an experiment, or observe some data from a real life experiment, and finally we will have a sample of data.
Now I have to find the values of the parameters, or equivalently, find the model, using this data. So let us start with that. Suppose we are going to conduct an experiment and obtain data; we are going to collect some finite number of observations, say n data points. What does this mean? It means we are going to conduct the experiment and record the data n times. For example, if I consider the same example as earlier, where y is the yield of a crop and x is the quantity of fertilizer, then I can take x1 = 1 kilogram of fertilizer, give it to the field, and observe how much yield I get after some time. Suppose I get the yield y1 = 20 kilograms. Next time I take 2 kilograms of fertilizer, which will be the value of x2, and suppose I get 30 kilograms of yield, so that is going to be my y2, and so on. I repeat this n times, and at the end I will have data of the form (x1, y1), (x2, y2), ..., (xn, yn). So this is my observed data set. Now what is our objective? The objective is that using this data set, I have to find the values of the parameters of my model, and that is nothing but finding the model.
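As a sketch, the observed sample can be stored as two arrays. The first two values below come from the fertilizer example; the remaining values are hypothetical, added only so that there are n = 5 pairs.

```python
import numpy as np

# fertilizer applied (kg); x1 = 1 and x2 = 2 from the example, rest hypothetical
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# observed yield (kg); y1 = 20 and y2 = 30 from the example, rest hypothetical
y = np.array([20.0, 30.0, 38.0, 52.0, 59.0])

n = len(x)              # number of observed (x_i, y_i) pairs
print(list(zip(x, y)))  # [(1.0, 20.0), (2.0, 30.0), ...]
```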
Now the next question is how to start. Well, we have assumed that there is a linear model between y and x; we have to keep that in mind because we are considering only linear regression modeling here. So the first step is that I can plot the data on a two dimensional plot: this is my x axis, this is my y axis, and suppose I plot the points (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5), (x6, y6), and so on. One can see from this graph that the points are following a sort of linear trend, and one can sketch a line through them like this one. But this is only a trend, and if I assume for a while that this is my model, then I would like to know: what is the equation of this line? So let me mark these values here: this is my (x1, y1), this is my (x2, y2), and so on. On the other hand, it is not always necessary that the trend between x and y is increasing; there are other possibilities also, for example the observations may follow a decreasing trend, lying like this. So once you get the data, try to plot it and look at the trend in the data. If the trend in the data is linear, that gives us a first assurance that, yes, a linear model can be fitted to the given data set, or in simple words, that the process can be described by a linear equation. Now, if you try to see, I simply assume that there is a model, and the model is going to be something like y = beta0 + beta1 x; this is the model we have considered.
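Continuing the sketch above, plotting the pairs on a two dimensional plot takes only a few lines with matplotlib; the data values are the same hypothetical ones as before.

```python
import numpy as np
import matplotlib.pyplot as plt

# same hypothetical sample as in the earlier sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([20.0, 30.0, 38.0, 52.0, 59.0])

plt.scatter(x, y)                  # each point is one (x_i, y_i) pair
plt.xlabel("x (fertilizer, kg)")
plt.ylabel("y (yield, kg)")
plt.title("Checking the data for a linear trend")
plt.show()
```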
I believe that the observations I have obtained here, (x1, y1), (x2, y2), ..., (xn, yn), are actually going to follow this model, and that these observations are generated from this model; this means that the set of observations will satisfy the equation yi = beta0 + beta1 xi for i = 1 to n. But if you observe this graphic, this two dimensional plot, is that really happening? No. If I write down this orange line as y = beta0 + beta1 x and we assume that all my points are lying on this line, then essentially we are assuming that this point should lie here, this point should lie there, and so on, with perhaps one point actually lying on the line. But now you can see that this model is not appropriate, because it is not really describing the true phenomenon. The observations (x1, y1), ..., (xn, yn) are indicating that, yes, the approximate relationship is yi = beta0 + beta1 xi, but it is not exact. So, in order to make this model more realistic, I can do one thing: I can add a term. For example, I can rewrite it as yi = beta0 + beta1 xi + epsilon_i, where epsilon_i is a random error and i goes from 1 to n.
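One quick way to see what this model says is to simulate from it. The sketch below generates data from yi = beta0 + beta1 xi + epsilon_i with hypothetical parameter values; the normal distribution for the errors is just a convenient choice for illustration, not something the model has required so far.

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1 = 15.0, 9.0                      # hypothetical true parameters
n = 10
x = np.arange(1, n + 1, dtype=float)          # fixed inputs x_1, ..., x_n
eps = rng.normal(loc=0.0, scale=2.0, size=n)  # random errors epsilon_i
y = beta0 + beta1 * x + eps                   # points scatter around the line

print(y)
```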
What is this epsilon_i? epsilon_i is the random error involved in the i-th set of data. So when I observe the i-th observation (xi, yi), it carries some error epsilon_i. For example, in the case of the first observation (x1, y1), I can say that this difference here, between the point and the line, is denoting something like epsilon_1. Similarly, when I go to (x2, y2), that difference is something like epsilon_2, and so on. In some cases you can see that epsilon_1, this distance, is the distance of (x1, y1) from the line that we want to know, the line we want to fit, and this point (x1, y1) is lying above the line, whereas in the case of (x2, y2), epsilon_2 is the difference between (x2, y2) and the line you want to fit, and this observation is lying below the line.
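The sign of each epsilon_i tells you on which side of the line the corresponding point lies. In the simulation sketch this can be checked directly, because there the true line is known (in a real experiment it is not):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 15.0, 9.0        # hypothetical true parameters
x = np.arange(1, 11, dtype=float)
y = beta0 + beta1 * x + rng.normal(0.0, 2.0, size=10)

# deviation of each observation from the (hypothetical, normally unknown) line
eps = y - (beta0 + beta1 * x)
for i, e in enumerate(eps, start=1):
    side = "above" if e > 0 else "below"
    print(f"epsilon_{i} = {e:+.2f} -> point lies {side} the line")
```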
The same thing happens with all the other points, so I can say that some random errors are in the upward direction of the line and some random errors are below the line. So I can assume here that epsilon_i has to be a random quantity which takes some value that can be positive or negative. Now I call this thing a linear regression model, and if you try to see what the difference is between the earlier model and this model: the earlier one was a sort of exact relationship, but this one is going to depict a statistical relationship. This means that I am not saying that all my observations are exactly going to follow the straight line, but that they are very close to the straight line, and there are certain random errors which deviate the observations from the main line. So now I can see that this orange line depicts the line we want to fit on the basis of the given set of data (x1, y1), (x2, y2), ..., (xn, yn), and our objective is: how should I find this line? But before that, we have to make some assumptions about this random error.
So what we do here is assume that the mean of epsilon_i is 0. In statistical language I can write this as: the expected value of epsilon_i is 0, E(epsilon_i) = 0, for all i from 1 to n. Well, if you don't know the meaning of this E, let me explain it before I move further: E is the expectation operator. We define it in the following way. Suppose there is a random variable Z which follows a probability density function f(z; theta), where z denotes the values of the random variable and theta is the parameter. Then we define the expected value of Z as E(Z) = integral of z f(z; theta) dz over the range of Z. In case you are dealing with a discrete random variable, then in place of the probability density function we use the probability mass function, and the integral is replaced by a summation. Similarly, at the same point I can also tell you how we define the variance. We denote it by Var; the variance of Z is defined as Var(Z) = E[(Z - E(Z))^2], which is nothing but the integral of (z - E(Z))^2 f(z; theta) dz over the range of the random variable Z. So this is how we define the expectation and the variance.
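As a small worked example of these definitions, take a discrete random variable: say a fair six-sided die, a hypothetical choice just to make the numbers concrete. For a discrete variable the integral becomes a sum over the probability mass function.

```python
import numpy as np

z = np.arange(1, 7)       # possible values of Z: 1, ..., 6
p = np.full(6, 1 / 6)     # probability mass function of a fair die

e_z = np.sum(z * p)                 # E(Z)   = sum of z * p(z)   -> 3.5
var_z = np.sum((z - e_z) ** 2 * p)  # Var(Z) = E[(Z - E(Z))^2]   -> 35/12
print(e_z, var_z)
```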
So now I can make the two assumptions I wrote here: the expected value of epsilon_i is 0, and the variance of epsilon_i is sigma^2. What does this mean? Once I write E(epsilon_i) = 0, we are saying that the mean of epsilon_i is 0; some observations have a random error which is positive and some have a random error which is negative, and once you find their arithmetic mean, the average value is going to be 0. That is also expected from a practical point of view: in a realistic experiment, sometimes the errors are positive and sometimes the errors are negative, so it is reasonable to assume that the average of those errors is going to be 0. But epsilon_i is a random variable, so we also need to describe its behavior by a variance. For example, you can see in this figure that the epsilon_i sometimes lie in the upward direction and sometimes in the downward direction, and moreover every observation will have a different random error, with a different amount of error; that spread is described by the quantity sigma^2. In simple words, if sigma^2 is low, then I would say that my observations have less variance and are lying closer to the line, and if sigma^2 is high, then I would say the observations are more scattered and are lying quite far from the line.
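The effect of sigma^2 is easy to see by simulation. The sketch below draws two samples from the same hypothetical line, one with a small error standard deviation and one with a large one; the first sample hugs the line while the second is much more scattered.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
beta0, beta1 = 15.0, 9.0        # hypothetical true parameters
x = np.linspace(1, 10, 30)

for sigma in (1.0, 8.0):        # small vs large error standard deviation
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    plt.scatter(x, y, label=f"sigma = {sigma}")

plt.plot(x, beta0 + beta1 * x, color="black", label="true line")
plt.legend()
plt.show()
```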
At this moment, consider the model y = beta0 + beta1 x + epsilon. You can see that I have assumed epsilon to be random. Now we also make some assumptions about y and x; let us see their behavior. We assume here that x is non-stochastic; non-stochastic means, in simple language, that it is non-random. In practice there can be situations where x is random also, but in this course we are going to assume that x remains fixed. Similarly, on the same lines, I also assume that beta0 and beta1, which are my parameters, are fixed but unknown. In some cases beta0 and beta1 can also be random, but we are not going to consider those situations in this course. So now, since epsilon is random while beta0, beta1, and x are not random, y also becomes random. After this, recall that we have also assumed that the expected value of epsilon is 0 and the variance of epsilon is sigma^2. That means if I want to know my complete model, I also need to know the value of sigma^2; so sigma^2 also becomes a parameter of the model. So now you can see that we have three parameters: beta0, beta1, and sigma^2. Once I say that I want to know my model, that is equivalent to saying that I want to find the values of beta0, beta1, and sigma^2, and to find them just on the basis of the sample of data (x1, y1), (x2, y2), ..., (xn, yn).
We also assume that when we observe the observations (x1, y1), (x2, y2), ..., (xn, yn), the corresponding random errors epsilon_1, epsilon_2, ..., epsilon_n are iid; iid means they are identically distributed as well as independently distributed. So epsilon_1, epsilon_2, ..., epsilon_n are mutually independent of each other, and they all come from the same distribution. Now let us try to understand the interpretation of beta0 and beta1.
If you look at the model y = beta0 + beta1 x + epsilon, we had assumed that the expected value of epsilon is 0, so I can write E(y) = beta0 + beta1 x. Now if I set x = 0, that is, the independent variable takes the value 0, then E(y) = beta0. And if I take the first derivative of E(y) with respect to x, it comes out to be beta1. So beta0 and beta1 have interpretations: beta0 is the average value of y, the average response, when the independent variable takes the value 0, and beta1 is the rate of change in the average response when there is a unit change in the value of the independent variable x. This is how we interpret the values of beta0 and beta1 in linear regression modeling: beta0 is simply the average value of y when x takes the value 0, and beta1 is the slope of the line, which is measured by the first derivative of the average value of y with respect to x.
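These two interpretations are easy to verify numerically with the hypothetical line from the earlier sketches: the mean response at x = 0 returns beta0, and the change in the mean response for a unit increase in x returns beta1.

```python
def mean_response(x, beta0=15.0, beta1=9.0):    # hypothetical parameter values
    """E(y) = beta0 + beta1 * x, the average response at input x."""
    return beta0 + beta1 * x

print(mean_response(0.0))                       # 15.0 -> beta0
print(mean_response(5.0) - mean_response(4.0))  # 9.0  -> beta1
```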
Now we stop here. In the next lecture I will explain how to estimate these parameters using two techniques: one is ordinary least squares estimation and the other is maximum likelihood estimation. Till then, goodbye.