Video 7: Logistic Regression - Introduction

Captions
Welcome to our first video on logistic regression. In this video we will introduce the regression model, as well as discuss how we can deal with binary outcomes when they are dependent variables.

Let's start by recapping continuous and categorical variables in the context of a general linear regression model such as the one shown. Think of what could drive customers' preferences to buy a certain product. Amongst the independent variables, we could have continuous ones such as age, the person's income, or a person's height, which is important for clothing. All of these have numerical values. We could also have categorical variables that determine a customer's product preference, for instance gender, city, or ethnicity, and we learned that we could use dummy variables to represent these.

On the other hand, we could also have various forms of dependent variables. For example, we could quantify with numbers how much a customer spends, or how much time customers spend at a store. However, a simpler question could be whether a customer buys a product or not, which is a categorical variable with two potential outcomes. For these kinds of variables we will be using dummies.

Let's look at some examples of questions with binary outcomes. At a bank, for instance, we could wonder whether a person is creditworthy or not, and whether we should give that person a loan or not. If we are looking at individual transactions, we could use a binary outcome to model whether that particular transaction is a legal or a fraudulent transaction. In the context of schools, a yes/no question could be whether a particular student is admitted to the school or not. In the context of politics, we could assess whether a person will vote against or in favor of a particular law. Finally, coming back to the retail context, a binary variable could mean that a customer bought a product or did not buy a product.

So how could we represent binary outcomes? Recall that these are variables that only have two potential values, usually whether an
observation belongs to a certain category or has satisfied some particular attribute. It's going to be a yes/no question. Similar to what we did with independent variables, we can create a dummy variable that is going to be equal to one if the answer to our question is yes, and zero otherwise. Note that if we coded our dependent variable the other way around, the coefficients are going to have the same magnitudes but the opposite signs. How come? Well, whatever helps me, for example, buy a product will have the exact opposite effect on helping me not buy that product, and thus we're going to have the opposite sign for that attribute.

The data we will be using in this example is a set of 1000 random customers from a given city, and we want to know what determines their likelihood or decision to subscribe to a particular magazine. Our dependent variable, naturally, is going to be an indicator variable that tells us whether the customer has subscribed to the magazine or not. It will have a 1 if the subscription took place and a zero otherwise. We also have access to some demographic information that could influence a customer's likelihood of subscribing to the magazine, for instance age, and in fact in this video we will only focus on this particular attribute of a customer. We could also have other attributes, such as gender, but we will not rely on them in this video.

If we think about the problem, we don't see too many reasons why we could not use a linear model, because aside from being binary, there's really nothing else special about this binary dependent variable. In fact, if we want to change this binary variable from a 0 to a 1, we're changing its value to a higher one, and thus anything that increases the value of y should favor the likelihood of a customer subscribing to the magazine. So we could run a simple linear regression model that looks like the one shown, where we have subscribe, the binary variable, as the dependent variable and age as our only regressor. The regression output in a
statistical package, in this case Gretl, shows us the coefficients for the intercept and the slope, so our estimated model is: subscribe equals -1.7 plus 0.064 times age. And what does this result mean? Recall that our dependent variable is binary, it's a 0 or a 1, and wanting to make it grow from 0 to 1 is very closely tied to trying to increase the probability of a customer buying, where the probability of subscribe being equal to 1 is the likelihood of a customer subscribing to the magazine. Let's denote this probability, for simplicity, as P. Then we can rewrite the model as: the probability of subscribe being equal to 1, or P, is -1.7 plus 0.064 times age. Since this is the slope of our model, we can simply take the coefficient of age and make the following assertion: every additional year of age increases the probability of subscription by 6.4 percentage points. And this makes a lot of sense.

Now let's check if we can use this to forecast probabilities for customers with given ages. Recall that probabilities are bounded, and they have to be between 0 and 1. Moreover, note that the range of age in our data set goes from 20 to 55, that is, the youngest customer in the data set is 20 years old and the oldest one is 55. Since it only makes sense to develop forecasts for observations similar to the ones we have in our data, we can assume that computing the estimated probability of a 35-year-old person subscribing is very easy. What we must do is simply plug in the 35, and we find that the estimated probability of a 35-year-old person subscribing is 0.54. So far so good.

But what about people who are 25 or 45 years of age? Given the range of age, this should work just as well. However, if we plug in 25, we find that the probability that this customer buys is estimated to be -0.09, and this cannot be correct, since a probability cannot have a negative value. Similarly, if we plug in the 45, we end up with the number 1.2, which is greater than 1 and hence an invalid value
for a probability. This becomes clearer if we plot what we have. In this plot you observe the bounds of probabilities, which should go from zero to one. Note that when customers are young, say below 26 years of age or so, the estimated probabilities are negative. Meanwhile, if the customer is more than 43 or 44 years of age, the probabilities are greater than one. This model is just not working.

How could we fix this? One option is to artificially cap the linear model and say: whenever the estimated probability is below zero, make it a zero, and whenever the estimated probability is above one, make it a one. This would give us a spline function like the one shown, with those breaks in the function. But this is too engineered, way too custom, to be a standard approach. Could we do something better?

Let's think about what we should do to fix this. Once again, note that probabilities should be between 0 and 1, and we know that the probability will be a function of age. However, the linear function did not work for us. So what conditions should this function satisfy to always produce reasonable forecasts for the probability? I will give you a few seconds to think about this.

Well, there are two main attributes that must be satisfied. One is that the probability must always be positive, and the second is that it must be less than 1. So let's now try to develop a new function that satisfies these two criteria, and we're going to do it step by step.

First, let's ensure that we have a positive number. What functions could give out positive numbers? You can think of the absolute value of a number: it's always positive. The squared version of any number is always positive as well. An alternative to these is an exponential form, whereby the exponential of beta0 plus beta1 times age, which is the same as saying e raised to the power of beta0 plus beta1 times age, is always going to be positive. You can check it with Excel or in some other software: it's always going to be positive. However, it will sometimes be greater than one,
so we need something else to satisfy the second criterion, that the probability is less than or equal to one. If you think about proportions, any given positive number divided by a number that is just slightly greater than it will give us a number smaller than one. So why not do the same? Why not use the expression we had before and divide it by something that is slightly larger? How much larger? Well, just one unit larger. We have the same expression above and below, but in the denominator we have a plus one. Note that we could have added any small value or large value, an epsilon for that matter, and the condition of having a value less than one would still be satisfied. However, we use one for reasons that will become clear shortly.

Now, even though we have this more complex expression, the linear thinking is not completely gone. If we do some algebra, the previous expression can be written as follows: the log of P over 1 minus P, P being the result of the prior expression, is equal to a linear function of age that looks just like the simple linear regression models we had before. So even though the probability of a customer subscribing is not a linear function of age, we can perform a simple transformation on it such that it is now a linear function of age.

The above equation is the one used in logistic regressions, and let me show you the output of a logistic regression in Gretl. You can see that we have the coefficients for the intercept and the slope, the beta0 and the beta1, but how we interpret these coefficients is different. We will discuss the interpretation of the coefficients in another video. For now, let's simply note that the model we estimated, given that our dependent variable is the log of P over 1 minus P, is -26.52 plus 0.78 times age. Or, if we do the algebra again and write this in terms of the probability, we have that P is equal to that expression. Note that all I'm doing is rewriting the expression, but substituting the beta0 and beta1 for the corresponding
coefficients that came out of the regression output.

If we now plot the probability against age using this expression, this is what we find. Note that the probability is no longer below zero or above one. In fact, as customers grow older, the probability asymptotically gets closer to one, and as customers get younger, the function gets asymptotically closer to zero, but it is never below zero or above one. This is the plot of a logistic model.

Thank you very much.
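
Editor's note: the arithmetic described in the captions can be checked with a small sketch. The Python below is not from the video (which uses Gretl); the function and constant names are hypothetical, and the coefficients are the rounded values read aloud in the captions, so the results are approximate. It compares the linear model, the capped version, and the logistic form e^z / (1 + e^z), and confirms that the log of P over 1 minus P recovers the linear expression in age.

```python
import math

# Rounded coefficients as spoken in the captions (assumed, approximate).
LINEAR_INTERCEPT, LINEAR_SLOPE = -1.7, 0.064
LOGIT_INTERCEPT, LOGIT_SLOPE = -26.52, 0.78


def linear_probability(age):
    """Linear model: p = b0 + b1 * age. Can fall outside [0, 1]."""
    return LINEAR_INTERCEPT + LINEAR_SLOPE * age


def capped_probability(age):
    """The 'artificial cap' fix: clip the linear estimate to [0, 1]."""
    return min(1.0, max(0.0, linear_probability(age)))


def logistic_probability(age):
    """Logistic model: p = e^z / (1 + e^z), with z = b0 + b1 * age."""
    z = LOGIT_INTERCEPT + LOGIT_SLOPE * age
    return math.exp(z) / (1.0 + math.exp(z))


def log_odds(p):
    """Log of p over (1 - p); linear in age under the logistic model."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    for age in (25, 35, 45):
        p_log = logistic_probability(age)
        print(
            f"age {age}: linear {linear_probability(age):+.2f}, "
            f"capped {capped_probability(age):.2f}, "
            f"logistic {p_log:.4f}, log-odds {log_odds(p_log):+.2f}"
        )
```

With these rounded coefficients, the linear estimate at age 25 comes out at about -0.10 rather than the -0.09 quoted in the video, presumably because the video's figure uses unrounded coefficients. The logistic estimates stay strictly between 0 and 1 across the 20 to 55 age range of the data.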
Info
Channel: dataminingincae
Views: 267,556
Rating: 4.905551 out of 5
Keywords: Logistic Regression, Statistics (Field Of Study)
Id: gNhogKJ_q7U
Length: 11min 52sec (712 seconds)
Published: Tue Sep 16 2014