Hello students, welcome to the second topic of Statistical Modeling for Business Analytics. Today we will talk about regression with a binary dependent variable. This is how the presentation will go. First, we will talk about the difference between a binary and a continuous dependent variable. Then we will talk about the linear probability model, and we will introduce the ideas of probit and logit regression. We will look at how to estimate probit and logit models and how to draw inferences from their results. We will do all of this by referring to a particular application: whether there is racial discrimination in mortgage lending.

So first, what are binary dependent variables? So far, in all of the examples we have used, the dependent variable has been continuous. The first example, in multiple regression, looked at district-wide average test scores: we asked how class sizes may affect the performance of students in a school district. Later we looked at traffic fatality rates and beer taxes. In both of these cases the dependent variable, the Y variable, was continuous, so it could take any of a large number of values, and we modeled it as such. But what if Y is binary? In some cases you are interested in discrete outcomes. For example, when you apply to a particular college, you may get in or not, and you may want to know what factors affect whether you get in. These factors may be things like your high school grades or your SAT scores (in the Indian context, perhaps your CAT or GATE scores), and they may also include your demographics. Another such question might be whether a particular person smokes
or doesn't, that is, whether they take up smoking or not. This may depend on the price of cigarettes, which in turn depends on the level of taxes on cigarettes; on the income of the individual, that is, whether they can afford to smoke; and on other demographic variables. In this lecture we will discuss a third example: whether a mortgage application is accepted or not.

Before starting with this example, I'll briefly describe what a mortgage is. Suppose you own a house or some other real estate property and you want a loan based on that property. You tell the bank that it can keep the house as security, and you borrow against it for your immediate needs; that loan is called a mortgage. When you apply for such a mortgage, the bank may accept your application and give you the loan, or it may deny the application and not give you the loan. Why might it deny your application? Even though you are offering security, the bank may feel that your ability to repay the loan is not high, and that you may eventually be unable to repay it and become a defaulter. So someone looking at this kind of data may want to know what factors affect a bank's decision to accept or deny a mortgage application. These might be the income of the applicant, the characteristics of the house being put up as security, the applicant's marital status, and also things like their race. So this is another case where the outcome is binary: either the application is accepted or it is denied, just as either a person smokes or does not, and either you get into college or you do not. In each of these cases the dependent variable Y is binary:
there is nothing like half getting into college, half smoking, or a mortgage application being half accepted.

The particular example we will look at is mortgage denial and race. The data set was collected in 1990 in the Greater Boston area and has 2,380 observations. It was collected under the Home Mortgage Disclosure Act (HMDA), which requires banks to disclose the details of the people who applied for mortgages and whether their applications were accepted or denied. The dependent variable is whether the mortgage application was denied or accepted. The independent variables are things like the income, wealth, and employment status of the applicant; other loan and property characteristics, such as what other loans the applicant has taken and the value of the property put up as security; and the race of the applicant, whether they are white, black, Hispanic, or of some other race. All of these factors may affect whether the mortgage application is denied or accepted.

Now, logit and probit are just methods for estimating such models, but discrete dependent variables arise in a very important set of research questions that many people want to answer. Many times you are looking at whether a particular individual does something or not: whether they save for retirement, whether they smoke, whether they get married, whether they commit a crime or are incarcerated. In each of these cases, when you look at differences between individuals, the dependent variable, the outcome variable, is binary, and that is when you use this kind
of methodology.

The first way of modeling a binary dependent variable is the linear probability model. Suppose you just take your familiar regression Yᵢ = β₀ + β₁Xᵢ + uᵢ. In the usual case, when Y is continuous, what does β₁ stand for? It is the increase in Yᵢ for a unit increase in Xᵢ; that is, β₁ = ΔY/ΔX. But what does β₁ mean when Y is binary? Y can only take the value 1 or 0, so a fractional change in Y for a unit change in X does not make sense.

Let me draw this kind of data and you will understand. In an ordinary regression you have a y-axis and an x-axis. Suppose the x-axis is education, which may vary from 0 years up to perhaps 30 or 35 years, and the y-axis is salary, measured in thousands of dollars: 10, 20, 30, 40, 50, 60, 70, and so on, and of course it can vary a lot more. You may have an individual who has ten years of education and is making about $30,000; that is one data point, with x = 10 and y = 30. Similarly you can have many other data points, and Y can take any value. If you fit a regression line through these points, you may get a line that
goes something like this. Even if you do not have data for every value of education, you can still predict the salary for an education level in between, and that makes sense because salary varies on an essentially continuous scale (even though education is typically measured in whole years).

But suppose instead your data are discrete, with Y equal to either 0 or 1. Let X be education and let Y be whether the person gets a job: getting a job is coded as 1 and not getting a job as 0. At very low levels of education many people may not get a job, though some may; and some people with high levels of education may also not have a job. But more people with low education are without jobs, and as years of education go up, more people have jobs. So on the x-axis you have years of education, and on the y-axis you are looking not at the salaries people make but at whether they get a job at all. Your data may look like this. If you fit a regression line through this, you may get a line like this, very similar to before. But what does it mean? If education goes from here to here, then with this slope coefficient the predicted Y goes from this point to this point. But what does
that even mean? Either you have a job or you don't; you cannot have 20% of a job. So what does Y mean, what does β₁ mean, and what does β₀ + β₁X mean when Y is binary? And what does a predicted value Ŷ mean; for example, what does Ŷ = 0.26 mean? You cannot really interpret this kind of result when you run a linear regression on binary data.

So first we try to convert our binary dependent variable into a continuous one, and we do that by reinterpreting it as a probability. Instead of saying directly that Y takes the value 0 or 1, we ask: what is the probability that it takes the value 1 (or 0)? The probability of an event occurring is a continuous variable, bounded by 0 and 1. If the event will certainly not occur, the probability is 0; if it is certain to occur, the probability is 1; and if you feel there is a 20 percent chance of it occurring, the probability is 0.20. So it can take values like 0.1, 0.15, or 0.29: it is continuous and bounded by zero and one.

We start with the simplest model, the linear probability model, in which the predicted value of Y is interpreted as the predicted probability that Y = 1. It is important to note that with discrete dependent variables we always model the probability that Y = 1.
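To see the interpretation problem concretely, here is a small sketch on a made-up dataset of eight people (years of education and a 0/1 job indicator; the numbers are hypothetical and not from the lecture). It fits an ordinary least-squares line to the binary outcome and prints the fitted values at each education level.

```python
# Hypothetical toy data (NOT from the lecture): years of education and a 0/1 job indicator.
education = [8, 10, 12, 12, 14, 16, 16, 18]
has_job = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = has a job, 0 = no job


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = ols_fit(education, has_job)
# Fitted value y_hat = b0 + b1 * x at each distinct education level.
fitted = {x: b0 + b1 * x for x in sorted(set(education))}
for x, y_hat in fitted.items():
    print(f"education = {x:2d} years -> y_hat = {y_hat:.3f}")
```

On these made-up numbers the fitted values run from about −0.094 at 8 years of education to about 1.038 at 18 years. Values in between, such as 0.358 at 12 years, only make sense when read as probabilities, and the two ends already fall outside [0, 1], which is the linear probability model's main weakness.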
However, which outcome you code as 1 is up to you. For instance, if you are looking at getting into college, you can code the fact that you got into college as Y = 1, or you could code the fact that you did not get in as Y = 1; it depends on the question you are interested in. Typically, the event whose occurrence you are studying is coded as Y = 1.

Now, since Y takes only the values 1 and 0, the expected value of Y given X is each value of Y multiplied by the probability of that value: E(Y | X) = 1 × Pr(Y = 1 | X) + 0 × Pr(Y = 0 | X). Zero multiplied by anything disappears, so you just have Pr(Y = 1 | X). So instead of your familiar Yᵢ on the left-hand side of Yᵢ = β₀ + β₁Xᵢ + uᵢ, you now have the probability that Y equals 1 given X. What happens on the right-hand side? By least squares assumption number one, E(uᵢ | Xᵢ) = 0: for every value of Xᵢ the errors average out to zero, which also implies that uᵢ is uncorrelated with Xᵢ. So E(β₀ + β₁Xᵢ + uᵢ | Xᵢ) = β₀ + β₁Xᵢ, and therefore Pr(Y = 1 | X) = β₀ + β₁Xᵢ. This is the linear probability model.
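The two steps of this argument can be written out as a short derivation (a restatement of the reasoning above in standard notation):

```latex
\begin{aligned}
E(Y_i \mid X_i) &= 1 \cdot \Pr(Y_i = 1 \mid X_i) + 0 \cdot \Pr(Y_i = 0 \mid X_i) = \Pr(Y_i = 1 \mid X_i), \\
E(Y_i \mid X_i) &= E(\beta_0 + \beta_1 X_i + u_i \mid X_i) = \beta_0 + \beta_1 X_i \quad \text{because } E(u_i \mid X_i) = 0, \\
\text{so}\quad \Pr(Y_i = 1 \mid X_i) &= \beta_0 + \beta_1 X_i .
\end{aligned}
```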
When Y is binary, the linear regression model Yᵢ = β₀ + β₁Xᵢ + uᵢ is called a linear probability model because Pr(Y = 1 | X) is given as a linear function of X, namely β₀ + β₁Xᵢ. The predicted value is a probability: E(Y | X = x) = Pr(Y = 1 | X = x), so Ŷ is the predicted probability that Y = 1. In our example, getting a job or not becomes the probability of getting a job. For a given level of education, you are no longer looking at Y directly but at the probability of getting a job: the probability of getting a job at low levels of education is low, and at high levels of education it is higher and closer to 1.

And what is β₁? It is the change in the probability for a unit change in X. For this level of education, this was your probability of getting a job; if you increase your education a little, this is your new probability of getting a job. So β₁ = [Pr(Y = 1 | X = x + Δx) − Pr(Y = 1 | X = x)] / Δx: how much your probability increases if you increase your education a little, that is, the incremental probability divided by the change in the independent variable.

Now back to the mortgage denial example. In our case we define the mortgage being denied as the event of interest, so when the mortgage is denied we set Y = 1, and when it is approved we set Y = 0. We fit a linear probability model, and on the x-axis we take
something called the payments-to-income ratio. What is the payments-to-income (P/I) ratio? If you take a loan, you have to make payments on it every month. Suppose your payments are $300 per month and your income is $10,000 per month; then your P/I ratio is 300/10,000 = 0.03. As your P/I ratio increases, your ability to make the payments goes down, because you also need your income for other purposes, such as buying food and paying for your children's education. If a large chunk of your income goes toward repaying your mortgage loan, your ability to repay falls, and the bank takes that into consideration: if the payments are very high relative to your income, it may decide you will not be able to repay the loan and deny your mortgage application. As you can see, the P/I ratio has an effect, though not a very large one.

When you estimate the linear probability model on the full HMDA data set, you get deny-hat = −0.080 + 0.604 × (P/I ratio), with standard errors of 0.032 on the intercept and 0.098 on the slope, estimated on a total of 2,380 observations. Here deny-hat is the predicted probability that the mortgage application will be denied, −0.080 is the intercept, and 0.604 is the coefficient on the P/I ratio. This does not mean that if the P/I ratio goes up by one, the denial outcome itself goes up by 0.604; it means the predicted probability of denial goes up by 0.604. So when the P/I ratio is 0.3, the predicted probability of denial is −0.080 + 0.604 × 0.3 = 0.101.
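The fitted line can be evaluated directly. This is a minimal sketch that only plugs the reported coefficients (−0.080 and 0.604) into the line; it does not re-estimate anything from the HMDA data, and the function name is an illustrative choice.

```python
# Full-sample linear probability model reported in the lecture:
#   deny_hat = -0.080 + 0.604 * (P/I ratio)
B0, B1 = -0.080, 0.604


def lpm_deny_prob(pi_ratio):
    """Predicted probability of denial from the fitted LPM (a straight line in P/I)."""
    return B0 + B1 * pi_ratio


p_03 = lpm_deny_prob(0.3)
p_04 = lpm_deny_prob(0.4)
print(f"P/I = 0.3 -> predicted Pr(deny) = {p_03:.3f}")
print(f"P/I = 0.4 -> predicted Pr(deny) = {p_04:.3f}")
print(f"change    -> {p_04 - p_03:.4f}")  # equals B1 * 0.1
```

It prints about 0.101 at P/I = 0.3 and 0.162 at P/I = 0.4. The change, 0.0604, is β₁ × 0.1 and would be the same for any 0.1 increase in the P/I ratio, which is exactly the linearity that is questioned later.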
Now suppose your P/I ratio increases from 0.3 to 0.4; that is, instead of your payments being equivalent to thirty percent of your monthly income, they become equivalent to forty percent of it. What is the probability that your application will be denied, given this estimated equation? (There is a typo on the slide here: the intercept appears as −0.80 in one place, but it is actually −0.080, as it is elsewhere.) So the predicted probability is −0.080 + 0.604 × 0.4 = 0.162, and the effect on the probability of denial is 0.604 × (0.4 − 0.3) ≈ 0.060. That is, your probability of being denied goes up by about 6 percentage points when your payment-to-income ratio goes up from 0.3 to 0.4.

Next we look at a linear probability model that includes another independent variable, the race of the applicant. We include a dummy variable, black, which takes the value 1 if the applicant is black and 0 otherwise; we are also trying to see whether the P/I-only model might suffer from omitted variable bias. The estimated model is deny-hat = −0.091 + 0.559 × (P/I ratio) + 0.177 × black. The predicted probability of denial for a black applicant with a P/I ratio of 0.3 is −0.091 + 0.559 × 0.3 + 0.177 = 0.254. Now keep the P/I ratio constant at 0.3 and look at the probability of denial for an applicant who is not black (not necessarily white, but anyone who is not black): substituting the values, the probability of denial is −0.091 + 0.559 × 0.3 = 0.077. So if you were black, your probability of denial was 25.4 percent, and
if you were not black, your probability of denial was 7.7 percent. The difference is 0.177, or 17.7 percentage points, and the coefficient on black is significant at the 5% level, as you can see from the coefficient and its standard error. Still, there may be plenty of room for omitted variable bias: there may be other variables that we have not included.

So the linear probability model models Pr(Y = 1 | X) as a linear function of X. What are its advantages? It is simple to estimate and to interpret: a one-unit increase in X leads to a β₁ increase in the probability that Y takes the value 1. Inference is the same as in multiple regression, except that we need heteroskedasticity-robust standard errors, because the variance of a binary Y, Pr(Y = 1 | X) × [1 − Pr(Y = 1 | X)], changes with X. What are the disadvantages? The primary disadvantage is that the LPM may predict probabilities that are less than zero or greater than one. In the LPM you are fitting a straight line, and a line can predict values outside that range: at very low P/I ratios the fitted line predicts a negative probability, and at high enough P/I ratios it predicts a probability greater than one. Neither is possible; a probability cannot be less than zero or greater than one. That is the primary problem. Another problem is that the probability increases linearly. One would expect the probability to rise only slowly near zero, more quickly in the middle, and then to level off as it approaches one, like an asymptotic curve that approaches zero gently at one end and one gently at the other.
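Both the race comparison and the out-of-range problem can be checked with a few lines of arithmetic. This sketch assumes nothing beyond the coefficients reported in the lecture: it evaluates the LPM with the black dummy at P/I = 0.3, and solves the full-sample line −0.080 + 0.604 × P/I for the P/I ratios at which it crosses 0 and 1. The function name is illustrative.

```python
# LPM with the race dummy, coefficients as reported in the lecture:
#   deny_hat = -0.091 + 0.559 * (P/I ratio) + 0.177 * black
def lpm_deny_prob_race(pi_ratio, black):
    """black = 1 for a black applicant, 0 otherwise."""
    return -0.091 + 0.559 * pi_ratio + 0.177 * black


p_black = lpm_deny_prob_race(0.3, 1)
p_other = lpm_deny_prob_race(0.3, 0)
gap = p_black - p_other  # equals the dummy's coefficient, 0.177, at any P/I ratio
print(f"black: {p_black:.3f}, not black: {p_other:.3f}, gap: {gap:.3f}")

# Full-sample LPM, deny_hat = -0.080 + 0.604 * P/I: solve for where the line
# crosses 0 and 1. Outside that interval it predicts impossible "probabilities".
pi_at_zero = 0.080 / 0.604       # below this P/I ratio, prediction < 0
pi_at_one = (1 + 0.080) / 0.604  # above this P/I ratio, prediction > 1
print(f"LPM prediction < 0 for P/I < {pi_at_zero:.3f}, > 1 for P/I > {pi_at_one:.3f}")
```

The gap is 0.177 at every P/I ratio, because in a linear model the dummy shifts the whole line up by a constant. The full-sample line predicts negative probabilities below a P/I ratio of about 0.13 and probabilities above one beyond about 1.79.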
In the linear probability model, the change in the predicted probability for a given change in X is the same for all values of X, and that does not seem to make sense. These disadvantages can be solved by using a nonlinear probability model. These nonlinear models are known as probit regression and logit regression, and we will talk about them now.

With the linear probability model you are modeling Pr(Y = 1) as a linear function of X. Instead, you want the probability to be increasing in X, but with a slope that is not constant, and you want the function to be bounded by 0 and 1 for all X. So we need a nonlinear functional form, one that follows the shape of an S-curve. Let us look at what that functional form might be. One such form is the probit model, which satisfies these conditions: first, Pr(Y = 1 | X) is increasing in X for β₁ > 0, with the curve getting steeper over one range and flatter over another; second, it is bounded by 0 and 1. The probit model models the probability that Y = 1 using the cumulative standard normal distribution function, Φ(z). You will remember the standard normal distribution: a bell-shaped density centered at zero, with mean 0 and standard deviation 1. What is the cumulative standard normal distribution function? Basically, it measures the area
under the curve. For any value, say −2, the area under the curve to the left of −2 is the value of the cumulative standard normal distribution function at −2; at −1 it is the area to the left of −1. As it happens, this function follows an S-curve. So the probit regression model is Pr(Y = 1 | X) = Φ(β₀ + β₁X), where Φ is the cumulative standard normal distribution function: it takes the linear combination β₀ + β₁X and transforms it nonlinearly. The quantity β₀ + β₁X is called the z-value, or z-index, of the probit model. To compute a probability, you calculate β₀ + β₁X and look up Φ of that z-index in the cumulative standard normal table.

For example, suppose β₀ = −2, β₁ = 3, and X = 0.4. Then the z-index is −2 + 3 × 0.4 = −0.8, and the probability is Φ(−0.8), the area under the standard normal density to the left of z = −0.8. The area under the entire normal curve is 1; looking it up in the table, the area to the left of −0.8 is 0.2119, so Pr(Z ≤ −0.8) = 0.2119.
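If a normal table is not at hand, Φ can be computed from the error function using the identity Φ(z) = ½[1 + erf(z/√2)]. This sketch uses Python's standard-library `math.erf`; the helper names `std_normal_cdf` and `probit_prob` are illustrative.

```python
import math


def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(beta0, beta1, x):
    """Probit model: Pr(Y = 1 | X = x) = Phi(beta0 + beta1 * x)."""
    z = beta0 + beta1 * x  # the z-value (z-index)
    return std_normal_cdf(z)


p = probit_prob(-2, 3, 0.4)  # z = -2 + 3 * 0.4 = -0.8
print(f"Phi(-0.8) = {p:.4f}")
```

It reproduces the table value 0.2119 for z = −0.8. Unlike a straight line, Φ is increasing and, mathematically, always strictly between 0 and 1.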
This is exactly the area I showed on the standard normal curve earlier. So why use the cumulative standard normal distribution function? Its S-shape gives us what we want. If this is the normal density, then the associated cumulative distribution function looks something like this; my drawing may not be exactly right, but this is its general shape. It has the properties we need: Pr(Y = 1 | X) is increasing in X for β₁ > 0, and it is bounded between 0 and 1. It is easy to use, since the probabilities are tabulated in cumulative normal tables. And it has a relatively straightforward interpretation: once you estimate β̂₀ and β̂₁, β̂₀ + β̂₁X is the predicted z-value given X, and β₁ is the change in the z-value for a unit change in X.

If you use the HMDA data to estimate this model, you get Pr(deny = 1 | P/I ratio) = Φ(−2.19 + 2.97 × P/I ratio): the estimated β₁ is 2.97 and the estimated β₀ is −2.19. There is a positive coefficient on the P/I ratio. Does this make sense? As your P/I ratio goes up, the payment becomes a larger and larger part of your income, and people whose payments are a large part of their income may not be able to make them, so the probability of their mortgage applications being denied will be higher. So the
positive β₁ coefficient makes sense. Standard errors have the usual interpretation: given the data, they tell you how precisely the β coefficients are estimated, which lets you see whether a coefficient is statistically significant; here it seems to be. For a P/I ratio of 0.3, you take Φ(−2.19 + 2.97 × 0.3) = Φ(−1.30) = 0.097. When the P/I ratio changes to 0.4, the z-value becomes −2.19 + 2.97 × 0.4 ≈ −1.00, and Φ(−1.00) = 0.159. So the predicted probability that the mortgage application is denied changes from 0.097 to 0.159, which is what we would expect.

Now suppose you have multiple regressors. So far we have used only the P/I ratio, but with two regressors X₁ and X₂, Pr(Y = 1 | X₁, X₂) = Φ(β₀ + β₁X₁ + β₂X₂). The z-value, or z-index, is still computed as a linear function, but it is then transformed by the nonlinear function Φ. So β₁ is the effect on the z-score of a unit change in X₁, holding X₂ constant. Again, you can include the variable black; its estimated coefficient is about 0.708, and being black changes the z-value by that much. Finally, you can compute predicted probit probabilities with your software's prediction command: for example, for an applicant with a P/I ratio of 0.3 who is white (more precisely, not black), what is the probability of being denied? If you set black to 1 instead, you get the corresponding probability for a black applicant.
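The same Φ helper applied to the estimated probit, Φ(−2.19 + 2.97 × P/I ratio), gives the predicted denial probabilities. This sketch uses only the two coefficients reported above; it also evaluates a second 0.1 step, from 0.5 to 0.6, to show that in the probit model the effect of the same change in X depends on where you start. The function names are illustrative.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_deny_prob(pi_ratio):
    """Estimated probit from the lecture: Pr(deny = 1 | P/I) = Phi(-2.19 + 2.97 * P/I)."""
    return std_normal_cdf(-2.19 + 2.97 * pi_ratio)


p = {pi: probit_deny_prob(pi) for pi in (0.3, 0.4, 0.5, 0.6)}
for pi, prob in p.items():
    print(f"P/I = {pi:.1f} -> Pr(deny) = {prob:.3f}")

# The same 0.1 increase in P/I has a different effect depending on the starting point.
print(f"0.3 -> 0.4: {p[0.4] - p[0.3]:+.3f}")
print(f"0.5 -> 0.6: {p[0.6] - p[0.5]:+.3f}")
```

It gives about 0.097 at P/I = 0.3 and about 0.158 at P/I = 0.4 (the 0.159 quoted above comes from rounding the z-value to −1.00). The step from 0.5 to 0.6 raises the probability by about 0.10, compared with about 0.06 for the step from 0.3 to 0.4.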
If you estimate the model with this additional variable, you find that the coefficient on black is 0.71. Is it statistically significant? It definitely seems to be. Also, the coefficient on the P/I ratio goes down: before, when we did not include black, it was 2.97, and it decreases somewhat once black is included, so there was some omitted variable bias. If you include other variables that might affect approval or denial of the application, these coefficients may change further.

So what is the estimated effect of race? Suppose you keep the P/I ratio constant at 0.3. You cannot directly say that being black changes the probability by 0.71, because, as you can see, the entire linear expression goes through a nonlinear transformation. Instead you compute Φ at one set of values of the P/I ratio and black, and Φ at another set. Here we keep the P/I ratio constant at 0.3 and change black from 1 to 0. For a black applicant the predicted probability of denial is 0.233, and for a non-black applicant it is 0.075, so the difference in rejection probabilities is 0.158, or 15.8 percentage points. There might still be room for omitted variable bias.

Now we come to another transformation, the logit. In fact, many of you may have heard this term more often than probit, but it is just a different transformation that arrives at an S-shaped curve with the right properties: the probability increases as X increases, and the curve is bounded by 0 and 1, so you will not predict probabilities of less than 0 or more than 1.
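The race calculation can be sketched the same way. One caution: the lecture states only the coefficient on black (about 0.71) and the resulting probabilities. The intercept (−2.26) and P/I coefficient (2.74) in this sketch are assumed values, not estimates read from the lecture; with them the sketch reproduces the 0.233 and 0.075 quoted above. Treat them as illustrative.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# ASSUMPTION: only the coefficient on `black` (about 0.71) is stated in the lecture.
# The intercept and P/I coefficient below are assumed values for illustration.
B0, B_PI, B_BLACK = -2.26, 2.74, 0.71


def probit_deny_prob(pi_ratio, black):
    """Pr(deny = 1 | P/I, black) = Phi(B0 + B_PI * P/I + B_BLACK * black)."""
    return std_normal_cdf(B0 + B_PI * pi_ratio + B_BLACK * black)


def race_gap(pi_ratio):
    """Predicted denial probability, black minus non-black, at a given P/I ratio."""
    return probit_deny_prob(pi_ratio, 1) - probit_deny_prob(pi_ratio, 0)


p_black = probit_deny_prob(0.3, 1)
p_other = probit_deny_prob(0.3, 0)
print(f"P/I = 0.3: black {p_black:.3f}, not black {p_other:.3f}, gap {race_gap(0.3):.3f}")
print(f"P/I = 0.5: gap {race_gap(0.5):.3f}")  # the gap depends on where it is evaluated
```

The gap of about 0.158 is specific to P/I = 0.3: at P/I = 0.5 the same (assumed) coefficients give a gap of about 0.24, because Φ is steeper there. This is why the probit effect of race has to be computed at stated values of the other regressors.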
So what transformation are we talking about here? Logit regression models the probability that Y = 1 given X using the cumulative standard logistic distribution function. In probit regression we used the cumulative standard normal distribution function; here we use the cumulative standard logistic distribution function, written F: Pr(Y = 1 | X) = F(β₀ + β₁X), where F(β₀ + β₁X) = 1 / (1 + e^(−(β₀ + β₁X))). It is just another functional transformation, and it also produces an S-shaped curve. Because logit and probit use different probability functions, their coefficients differ: if you fit a probit and a logit to the same data, you will get different values of β₀ and β₁, simply because the functional forms are different.

We wrote the logit function as F = 1 / (1 + e^(−(β₀ + β₁X))). Since e raised to a negative power is one over e raised to the positive power, this is 1 / (1 + 1/e^(β₀ + β₁X)). Multiplying the numerator and the denominator by e^(β₀ + β₁X) gives e^(β₀ + β₁X) / (1 + e^(β₀ + β₁X)), which is the form shown on the slide. Now, what is 1 − Pr(Y = 1 | X)? Write Pr(Y = 1 | X) = e^(β₀ + β₁X) / (1 + e^(β₀ + β₁X)). Then 1 minus the
$\Pr(Y = 1 \mid X)$ is equal to 1 minus the quantity $e^{\beta_0 + \beta_1 X}/(1 + e^{\beta_0 + \beta_1 X})$, which equals $(1 + e^{\beta_0 + \beta_1 X} - e^{\beta_0 + \beta_1 X})/(1 + e^{\beta_0 + \beta_1 X})$. The two exponential terms in the numerator cancel, so $1 - \Pr(Y = 1 \mid X) = 1/(1 + e^{\beta_0 + \beta_1 X})$. In our example, $\Pr(Y = 1 \mid X)$ is the probability of denial and $1 - \Pr(Y = 1 \mid X)$ is the probability of approval: if you don't get denied, your mortgage application will be approved. The ratio $\Pr(Y = 1 \mid X)/(1 - \Pr(Y = 1 \mid X))$ is known as the odds of your application getting denied. It equals $e^{\beta_0 + \beta_1 X}/(1 + e^{\beta_0 + \beta_1 X})$ divided by $1/(1 + e^{\beta_0 + \beta_1 X})$; flipping the second fraction over and multiplying, the $1 + e^{\beta_0 + \beta_1 X}$ terms cancel, so the odds are $e^{\beta_0 + \beta_1 X}$. In other words, the odds of your application being denied are just e raised to the z value, $e^{\beta_0 + \beta_1 X}$. Now take logs on both sides. The quantity $\log[\Pr(Y = 1 \mid X)/(1 - \Pr(Y = 1 \mid X))]$ is called the log odds, and it is just $\beta_0 + \beta_1 X$, because the log of $e^{\beta_0 + \beta_1 X}$ is $\beta_0 + \beta_1 X$. So the log odds equal $\beta_0 + \beta_1 X$: this is the logit transformation, and this is how you state the model.

Now suppose that when you estimate it you get $\beta_0 = -3$ and $\beta_1 = 2$, and X is 0.4. Then the z-index, $\beta_0 + \beta_1 X$, takes the
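The odds and log-odds steps can be checked numerically with a short Python sketch. The coefficients below are illustrative assumptions (they happen to match the worked example that follows), and the identities hold for any values: $1 - p = 1/(1 + e^{z})$, the odds $p/(1 - p)$ equal $e^{z}$, and the log odds equal $z = \beta_0 + \beta_1 X$. The function names are hypothetical.

```python
import math


def prob_denial(beta_0: float, beta_1: float, x: float) -> float:
    """Logit model: Pr(Y = 1 | X = x) = e^z / (1 + e^z), with z = beta_0 + beta_1 * x."""
    z = beta_0 + beta_1 * x
    return math.exp(z) / (1.0 + math.exp(z))


def log_odds(p: float) -> float:
    """Log odds log(p / (1 - p)) of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# ASSUMED illustrative coefficients; the identities below hold for any values.
beta_0, beta_1 = -3.0, 2.0
xs = [0.0, 0.4, 1.0, 2.5]

for x in xs:
    z = beta_0 + beta_1 * x
    p = prob_denial(beta_0, beta_1, x)
    odds = p / (1.0 - p)
    print(f"x={x}: p={p:.4f}, odds={odds:.4f}, e^z={math.exp(z):.4f}, "
          f"log odds={log_odds(p):.4f}, z={z:.4f}")
```

Each printed row should show the odds matching $e^{z}$ and the log odds matching $z$, which is exactly why the logit model is linear in the log odds.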
value $-3 + 2 \times 0.4 = -2.2$. So $\Pr(Y = 1 \mid X = 0.4)$ is nothing but $1/(1 + e^{-z})$. Notice the minus sign: we take the negative of the z-index, and the z-index itself is negative, namely -2.2, so we get $1/(1 + e^{2.2}) \approx 0.0998$.

So why do we use the logit transformation at all if we already have the probit transformation? Historically, the reason is that logit is computationally faster. In practice, logit and probit are very similar, and empirical results typically do not hinge on the logit/probit choice. If you use the same data again and fit the logit function, these are the results you get: the coefficient on the P/I ratio is 5.37 and the coefficient on black is 1.27. Here are the predicted probabilities from the two models; as you can see, they are very close and almost lie one on top of the other. The blue line stands for the probit model and the black line stands for the logit model. Which model you use is up to you; it doesn't make much of a difference. Next we will talk about estimation and inference in the logit and probit models.
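The worked example can be reproduced in a few lines of Python. The numbers $\beta_0 = -3$, $\beta_1 = 2$, and $X = 0.4$ are the ones used in the lecture; the function name `logit_prob` is a hypothetical illustration.

```python
import math


def logit_prob(z: float) -> float:
    """Pr(Y = 1 | X) under the logit model, 1 / (1 + e^(-z)), for z-index z."""
    return 1.0 / (1.0 + math.exp(-z))


# Worked example from the lecture: beta_0 = -3, beta_1 = 2, X = 0.4.
beta_0, beta_1, x = -3.0, 2.0, 0.4
z = beta_0 + beta_1 * x   # z-index: -3 + 0.8 = -2.2
p = logit_prob(z)         # 1 / (1 + e^(2.2)), roughly 0.0998

print(f"z = {z:.1f}, Pr(Y = 1 | X = 0.4) = {p:.4f}")
```

Note how the double negative works out in code: `math.exp(-z)` with `z = -2.2` is $e^{2.2} \approx 9.03$, so the probability is about one in ten.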