Course Review | Linear Regression, Logistic Regression, Poisson Regression, Survival Analysis

Video Statistics and Information

Captions
So now that we've progressed through all the ideas in the course, what we're going to do is step back from the details we've been in and scale up to the bigger picture. This is a course review, an overview to remind ourselves of what we talked about and what we learned.

In general, this course was about modeling "Y" as a linear function of X's. The reason I put Y in quotes is that we started with linear regression, where we modeled the mean of Y, the mean of a numeric outcome. Then we moved into logistic regression, where Y was something like disease yes/no, and we modeled the log of the odds of Y = 1, the log odds of disease, as a linear function of X's. So, to generalize: we model Y, or some summary of Y, as a linear function of X's.

Through the course we broke these models down into one of two broad goals. The first is fitting an effect-size model. Here the goal is to estimate the effect of some variable x1 on Y, while controlling or adjusting for other variables x2, x3, up to xk — adjusting for things like confounders, dealing with effect modifiers, and so on. We talked about the different types of variables you may want to include in or exclude from an effect-size model. The goal is to get an unbiased, or least-biased, estimate of the effect of x1 on Y, controlling for other variables. We saw that we can think of b1, the estimate we get in our data, as being equal to beta1, the true effect, plus error, plus bias. We like to think that out in the world there is some true value, some true effect that x1 has on Y; our estimate in the sample is going to equal that true value, plus some random error — because we sampled rather than measuring the entire population — plus some bias: bias due to confounding, collinearity, or the other issues we explored.
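That decomposition — estimate = truth + error + bias — can be illustrated with a small simulation. The course's own code was in R; this is a hedged Python sketch, and every variable name and number in it is invented, not from the course:

```python
import random
import statistics

# Hypothetical sketch (all names and numbers invented): x2 confounds the
# x1 -> y relationship. The crude difference in mean y between x1 groups
# estimates beta1 PLUS a bias term contributed by the omitted confounder.
random.seed(1)
beta1 = 2.0   # "true" effect of x1 on y
rows = []
for _ in range(20000):
    x2 = random.gauss(0, 1)                            # confounder
    x1 = 1 if x2 + random.gauss(0, 1) > 0 else 0       # x2 influences x1
    y = beta1 * x1 + 3.0 * x2 + random.gauss(0, 1)     # x2 also affects y
    rows.append((x1, x2, y))

crude_b1 = (statistics.mean(y for x1, _, y in rows if x1 == 1)
            - statistics.mean(y for x1, _, y in rows if x1 == 0))
print(round(crude_b1, 2))   # well above beta1 = 2.0: estimate = truth + bias
```

Adjusting for x2 — for example, comparing within narrow bands of x2 — would pull the estimate back toward beta1; that is what "controlling for a confounder" buys us.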
Our goal is really to try to remove that bias, or make it as small as possible — to eliminate as much bias as we can. Error is always going to be there with sample data; there's always some variability, but we can deal with that. We learned how to estimate standard errors and build confidence intervals — things that capture a measure of the uncertainty. So our estimate is the true value plus some noise, and we can handle the noise; our focus was on how to identify variables that may be introducing bias and how to address them.

The second goal — and this got relatively less time in the course, since we really focused on estimating the effect of some variable on the outcome — was predictive models, where we're trying to get a good prediction of Y. Here we're trying to find which variables are good at predicting the outcome; we're trying to maximize predictive power overall. The focus of a predictive model is on our estimate of the outcome: getting the best prediction we can, or the prediction with the smallest amount of error.

This was the theme throughout the whole course; what changed was the type of outcome variable we were working with: numeric variables, disease yes/no variables, rates, as well as times until death. What I'd like to do now is talk a little about each of the regression models we went through, again at a higher level, recapping the ideas in the order we covered them. At the start of the course we looked at numeric outcome variables, and specifically at linear regression. We began by saying that if we have some numeric outcome, this type of variable can be summarized using a mean — mu, the mean of Y.
Once we include X variables, we write it as the mean of Y given X; the point is that we summarize numeric variables using means. What we learned to do in linear regression is model that mean of Y as a linear function of the X's: the mean of Y given X equals b0 plus b1x1, and so on. We can look at the model this way regardless of whether we're trying to predict the outcome or trying to estimate what effect x1 has on it — the model is expressed the same way for a predictive model or an effect-size model. Focusing on effect-size models, though, we saw that the coefficient b1 can be thought of as how the mean of Y changes for one group versus another. If x1 is just exposed/unexposed, b1 is the difference in the mean of Y for the exposed relative to the unexposed, and we saw how the interpretation changes if x1 is a numeric variable or has more than two groups. But that was the essence of it: this coefficient tells us how the mean changes when x1 changes.

Then, looking at this, we saw some extensions. Throughout the course there was core material, and then there were extras — things that were nice to know, not need to know, and not part of the assessment. One such extra: rather than modeling the mean of Y, we could use something like quantile regression, where we model, say, the median of Y as a linear function of the X's. We really only mentioned this by name, but it's an extension of linear regression — once you understand linear regression, understanding quantile regression is not a big jump. And to extend that a bit further, you don't need to model only the median: you can model any quantile, any percentile, of Y as a linear function of X's. The median is the 50th percentile; we could instead model, say, the 80th percentile of Y as a linear function of X's. This is a topic you could look at on your own if you want — we didn't get much into it, but as I've mentioned, the theme of the course is that we lay a foundation for certain topics and then point out the extra paths or extensions you can read about to deepen your knowledge.

After our discussion of linear regression we moved into logistic regression. Here the outcome variable was a yes/no, binary or dichotomous, variable — something like disease yes or no. We saw that for these categorical variables, one natural way to summarize them is a proportion, the probability of Y given X — which we started labeling simply p, dropping the "Y given X" subscript to trim down the notation. We saw that in logistic regression we can think of the model in two ways. We can think of it as modeling p, the probability of getting the disease, as a logistic function of the X's: p = e^(b0 + b1x1 + … + bKxK) / (1 + e^(b0 + b1x1 + … + bKxK)). That is one way of thinking of the model — modeling the probability of disease as this logistic function, hence the familiar S-shaped curve. Or we can think of the same model as modeling the log of the odds of Y = 1 — the log odds of disease — as a linear function of the X's: log(p / (1 − p)) = b0 + b1x1 + … + bKxK.
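As a small sketch of how those two scales are two views of one model — the logistic function gives the probability, and transforming back recovers the linear predictor on the log-odds scale. The course used R; this is a Python sketch, and the coefficients below are hypothetical, not from any fit in the course:

```python
import math

def logistic_probability(b, x):
    """p = e^eta / (1 + e^eta), where eta = b0 + b1*x1 + ... + bK*xK."""
    eta = b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))
    return math.exp(eta) / (1 + math.exp(eta))

b = [-2.0, 0.8]                        # b0, b1 (made up)
p = logistic_probability(b, [1.0])     # probability scale, at x1 = 1
log_odds = math.log(p / (1 - p))       # log-odds scale of the same model

# The log odds recover the linear predictor b0 + b1*x1 = -1.2 exactly.
print(round(p, 3), round(log_odds, 2))
```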
Expressing the model on the scale of the log odds lets us connect it to a line, a linear function — and we learned a lot about linear functions and how to work with them, using linear regression as our basis. Remember, these are the same model: one expressed on the scale of the probability of disease, one on the scale of the log odds. We saw that the probability scale is a more useful way to think about a predictive model, where the goal is, given a bunch of X's, to estimate the probability of the outcome happening. For building an effect-size model, looking at the model expressed on the log-odds scale is a nicer way to think about it — though, again, these are the same model. On that scale we saw that if we exponentiate the coefficient b1, it gives us the odds ratio: what effect does x1 have on the log odds, or what is the odds ratio associated with x1, adjusted for x2, x3, up to xk. A quick note: when talking about linear regression I said the coefficient gives us a difference in means, but I may have forgotten to say that it, too, is adjusted for the other variables — I wanted to mention that in case I did forget.

Then a few extensions, again mentioned mostly by name and left for you to explore if you want. One is multinomial logistic regression — sometimes called polytomous logistic — for when we have more than two possible categories: rather than disease yes/no, we're trying to estimate, say, which province someone lives in (or, in an American context, which state). So the outcome has more than two categories with no ordering. The other extension was ordinal logistic regression, where the outcome has more than two categories and there is an ordering to them — say no disease, mild, severe — more than two categories with a ranking. These were mentioned mostly by name; I believe one of the R scripts had some code to explore them on your own if you wanted to dig deeper. One thing I hope you're seeing through the course, as you've learned linear, then logistic, then Poisson regression and survival analysis: while all these models are different, when you zoom out they're all pretty similar, almost the same in concept. If you've got a decent understanding of logistic regression, understanding multinomial or ordinal logistic is not a big jump from where you are now — it's just learning a bit more about how the models differ when there are more than two categories, or when there's an ordering or ranking.

Next came Poisson regression, where our outcome was a count variable — counting how often something occurred: how many people got the disease, how many car accidents there were, how many people showed up to the ER, things like that. We saw that one natural way to summarize counts is a rate, Y over T: the number of occurrences per unit time. And the rate is itself a mean — the average number of occurrences per unit time. (We didn't mention this when talking about logistic regression, but the proportion can also be thought of as a mean: a mean of zeros and ones.) In Poisson regression, what we want to do is model the rate, or the count — let me express everything in terms of the rate. We model the rate at which things occur as an exponential function of the X's: rate = e^(b0 + b1x1 + … + bKxK). This is looking at the model on the scale of the rate at which events occur.
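On that rate scale, exponentiating a coefficient gives a multiplicative comparison of rates. A hedged Python sketch (the course used R; the coefficients here are invented):

```python
import math

# Sketch: on the rate scale, rate = e^(b0 + b1*x1), so e^b1 is the
# rate ratio comparing x1 = 1 to x1 = 0 (coefficients invented).
b0, b1 = math.log(2.0), 0.5

rate_unexposed = math.exp(b0)        # x1 = 0: 2 events per unit time
rate_exposed = math.exp(b0 + b1)     # x1 = 1
rate_ratio = rate_exposed / rate_unexposed

print(round(rate_ratio, 3))          # equals e^0.5, about 1.649
```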
Or, we said, we can think of it on the scale of the log rate: we model the log of the rate at which events occur as a linear function of the X's. At this point in the course things were starting to look repetitive and familiar, so a reminder: these are the same model, just expressed on the scale of the rate or of the log rate. Looking at the model on the rate scale is more helpful when building a predictive model, where we want to estimate the rate at which things are occurring given a bunch of X variables. If we're building an effect-size model, where we want to know what effect a particular variable has on the outcome, looking at the scale of the log rate is more helpful: we saw that exponentiating the coefficient b1 gives us the rate ratio, adjusted for x2, x3, and so on. It lets us ask what effect x1 has on the rate, adjusted for the other variables — estimate the adjusted rate ratio. And the way we go about building the model and selecting variables is pretty much the same for all the regression models we looked at.

In a second I'll mention some of the extensions we explored, but before that, a few notes on how Poisson regression differs a little from the linear and logistic regression we'd seen so far. The first is the idea of individual versus aggregated data. We can count how often the event occurs for an individual; or, if events are rare — if they're not going to occur very often, or can only occur once, like death (think of death rates) — we may need to aggregate, meaning we put people into groups, look at how many deaths occurred within each group, and then at how many person-years of exposure there were, or whatever the measure of time or exposure is. So that was one difference: data can be on the individual or the aggregated level. The second: we can be looking at counts or at rates. If everyone had the same amount of follow-up time, we can just compare the counts — say we're looking at the number of visits to a physician in a year, and everyone has been followed for one year, then we can compare the counts directly to one another. If people have different follow-up times — say for me we've recorded how many times I've seen a physician in the last three years, and for someone else how many times they've seen a physician in the last six months — then we have to model the rate. This gave us the idea of having to use an offset; we talked about exactly what an offset is when we covered Poisson regression. I just wanted to bring up these reminders of how Poisson regression differed a little.

With those in place, let's think about the extensions we talked about — again, these were mentioned as things you can explore on your own; we didn't cover them in much depth. The first was negative binomial regression. This is essentially very similar to Poisson regression — it lets us estimate the rate as a function of the X's, or the log rate as a linear function of the X's — but it differs in that it does not require the mean to be equal to the variance. Negative binomial regression allows us to get separate estimates of the mean and the variability; if you remember, a feature of Poisson regression is that the mean is assumed equal to the variance, and negative binomial lets us estimate the two separately. We also talked a bit about the idea of zero-inflated Poisson: if you have excessive zeros — the event not occurring for more people than would normally be expected — we can use a zero-inflated Poisson model; and we briefly mentioned zero-inflated negative binomial, which addresses the excessive zeros and also estimates the mean and variability separately. Again, those are topics you could explore a little more on your own if you want.
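The offset idea above can be sketched numerically: with unequal follow-up we compare rates (count divided by time), which on the log scale means log(T) enters the model as a term whose coefficient is fixed at 1. A Python sketch with invented counts and follow-up times (the course used R):

```python
import math

# Sketch of the offset idea (all counts and follow-up times invented):
# with unequal follow-up T we compare rates, count/T. On the log scale,
#   log(count) = log(T) + b0 + b1*x1,
# so log(T) is the "offset", a term with coefficient fixed at 1.
people = [
    {"visits": 6, "years": 3.0},   # me: 6 visits over 3 years
    {"visits": 1, "years": 0.5},   # someone else: 1 visit over 6 months
]
for person in people:
    rate = person["visits"] / person["years"]   # visits per person-year
    log_rate = math.log(person["visits"]) - math.log(person["years"])
    assert abs(log_rate - math.log(rate)) < 1e-12
    print(rate)   # both 2.0: the raw counts differ, but the rates agree
```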
Our final topic for the course was survival analysis. There we're looking at the time until an event occurs, and in survival analysis the outcome has two components: the time someone is followed for, and an indicator of whether the event occurred or whether they were censored. We can summarize these using a hazard, and the hazard is pretty similar to a rate — the rate at which events occur. Within survival analysis I'm going to talk mainly about the survival regression models, setting aside the Kaplan-Meier model for the moment — I'll come back to it — because I want to stay on the theme of regression models. We saw we can model the hazard as an exponential function of the X's: hazard = e^("b0" + b1x1 + … + bKxK). I put b0 in quotes because we talked a bit about how the intercept changes depending on exactly which model we're using. And a reminder of how we use this: we fit a regression model that estimates the hazard as a function of the X's, and then the survival function — the probability that the survival time is greater than some time t — is S(t) = e^(−hazard × t). So we estimate the hazard from a bunch of X's and substitute it in to get the survival function, to estimate survival. Looking at it on this scale is a good way to think about a predictive model: if we want to estimate the probability of surviving beyond a certain time point, we use the model to estimate the hazard and then plug it in to estimate survival for that individual. Or, again, we can think of it as modeling the log hazard as a linear function of the X's: log hazard = "b0" + b1x1 + … + bKxK — with b0 in quotes because exactly what that intercept term is changes depending on whether we're looking at an exponential model, a Weibull model, a Cox proportional hazards model, and so on. Looking at the model on the scale of the log hazard is a good way to think about an effect-size model: if we exponentiate the coefficient b1, it gives us the hazard ratio, adjusted for x2, x3, and so on — we can estimate what effect x1 has on the hazard, adjusted for other variables, adjusted for confounding.

This differs a little from linear, logistic, and the other models in a few ways. What made it unique is that the outcome has a time component, which linear regression did not. A few notes on this. First, the choice of the "b0" term is what differs between the exponential, Weibull, and Cox proportional hazards models — the intercept term changes a little, and by putting it in quotations I wanted to show that what we're looking at is another generalized linear model, pretty similar to all the ones we'd seen in the course before; we spent some lecture time on how that term changes for the exponential versus Weibull versus Cox proportional hazards models. Another note: we also discussed the Kaplan-Meier model, and it isn't presented here — these are all regression models, and the Kaplan-Meier model is not a regression model, although it is another useful model for survival analysis. That's why I've left it out here: I really wanted to connect the dots among all the different regression models we've talked about, and that one is a bit different. Then some extensions, again very much left for you to explore on your own — we didn't get into the details. The first was time-dependent coefficients or parameters: allowing the coefficients, and hence the hazard ratio, to depend on time — to change over time.
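Stepping back to the hazard-to-survival link described above, here is a hedged Python sketch of S(t) = e^(−hazard × t) under a constant hazard, as in the exponential model (the course used R, and the hazard value below is invented):

```python
import math

# Sketch of the hazard -> survival link, S(t) = e^(-hazard * t), for a
# constant hazard as in the exponential model (h below is invented).
def survival(hazard, t):
    """P(survival time > t) under a constant hazard."""
    return math.exp(-hazard * t)

h = 0.1                  # hypothetical hazard: 0.1 events per unit time
s5 = survival(h, 5)      # probability of surviving beyond t = 5
print(round(s5, 3))      # e^(-0.5), about 0.607

# A larger hazard means lower survival at every time point:
assert survival(2 * h, 5) < s5
```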
We talked a bit about time-dependent coefficients as an extension for dealing with non-proportional hazards. The other extension, mentioned mainly by name, is time-dependent covariates: allowing the X variables themselves to change over time — allowing the dosage of a drug someone is getting to differ over time, or allowing people to switch from one treatment to the other over time.

So those are all the regression models we looked at in the course. Zoomed out at a higher level, the bigger picture is this: the first one we talked about, linear regression, is a linear model; logistic, Poisson, and the survival models are all generalized linear models — generalizations in which the outcome, or some summary of it, is transformed in order to link it to a linear function. The point of saying this is to remind you that the way we go about building our models and selecting variables is all the same in concept. If it's an effect-size model, the ideas of confounders, effect modifiers, collinear variables, independent predictors, mediators — all these sorts of things are the same in concept, and the way we go about checking for them and deciding whether to include these variables in our model is pretty much the same regardless of whether we're working with logistic regression or linear regression. It doesn't really depend on the type of model. The sets of assumptions for all these different models are also pretty similar, with slight changes depending on the exact details of the model. We always had the assumption that the left-hand side of the equation — whether it's the log hazard, the mean of Y, or the log odds — is a linear function of the X's. So we assumed linearity; the way we checked that was similar (a little different for logistic, but in concept always about the same), and if linearity was not met, the way we addressed it was always the same: we could categorize X, we could include polynomial terms, we could try transforming it. The point of all this is that the way we build models, select variables, and go about refining them is the same in concept, and very similar in practice and in the details, regardless of the exact type of model we're working with.

So let's take a few minutes to remind ourselves of the approach to model building and variable selection — again staying at the big-picture level, not getting into the fine details. First, effect-size models. There we talked about different types of variables we may want to look for and include in or exclude from our model. The first idea was confounders, with the classic diagram: some variable x2 that has an effect on x1, our variable of interest. A reminder: x1 is our variable of interest — what effect does x1 have on Y? — and x2, x3 indicate other variables. So x2 has some association with x1; what effect x1 has on the outcome is our question of interest; and x2 also affects the outcome. If there's a variable x2 that's a confounder, we're going to want to include it — we want to adjust for it. Then we saw the idea of mediators. Mediators numerically behave the same as confounders, but they're not the same conceptually. The idea of a mediator is that x1 directly causes x2, or has some direct impact on it, and then x2 affects the outcome — x2 is sitting on the pathway between x1 and Y: x1 leads to x2, which in turn affects the outcome. For these, we generally want to exclude them — though we talked a bit about how it's more complicated than a simple include-or-exclude rule: mainly, we exclude them if we want to estimate the total effect.
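A hedged sketch of the mediator idea in Python (the course used R; all effect sizes and names are invented): when x1 causes x2 and x2 causes Y, the crude comparison captures the total effect — the direct part plus the part flowing through x2:

```python
import random
import statistics

# Sketch of a mediator (all effect sizes invented): x1 -> x2 -> y, plus a
# direct x1 -> y path. The crude group difference estimates the TOTAL
# effect: direct (1.0) plus the part flowing through the mediator (2.0).
random.seed(2)
rows = []
for _ in range(40000):
    x1 = random.choice([0, 1])
    x2 = x1 + random.gauss(0, 0.1)                    # x1 causes x2
    y = 1.0 * x1 + 2.0 * x2 + random.gauss(0, 1)      # x2 causes y
    rows.append((x1, y))

total = (statistics.mean(y for x1, y in rows if x1 == 1)
         - statistics.mean(y for x1, y in rows if x1 == 0))
print(round(total, 1))   # close to 3.0; adjusting for x2 would isolate the 1.0
```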
As we mentioned, the whole idea of mediation analysis is a subtopic you can read about on your own if you want; it's not something we covered in the course. We stayed at the level of saying: if we want to know the total effect that x1 has on Y, we should exclude the mediator; if we include x2, we're mostly asking what effect x1 directly has on Y, not the effect that goes through x2. Then we talked about collinearity. Here the situation is not mediation: x2 has some association with x1, but a really strong association — I'm depicting that by drawing a really thick line — so that x1 and x2 are so highly associated you cannot really separate them. This happens quite often when x2 is an alternate measure of x1: it's capturing almost all of the same information that x1 does. We saw that we're going to want to exclude these — if some variable is almost another measure of x1, we don't want to put it in. We also talked about the idea of what we called independent predictors — sometimes these just get called risk factors. Our goal is to estimate what effect x1 has on Y, and there's some other variable x2 that's not associated with x1 but is able to predict the outcome. For these we said there are arguments for including them and arguments for excluding them: if x2 is a strong predictor of the outcome, that's a good argument to include it, and it often will also decrease the standard error for b1; on the other hand, if x2 is not a confounder, it doesn't need to be adjusted for, and that's the argument for excluding it. We talked about the pros and cons of each; I won't redo that whole discussion here. And finally there's the idea of effect modifiers: we want to know what effect x1 has on Y, and there's some other variable x2, not on the pathway between x1 and Y, such that the effect x1 has on Y changes depending on the value of x2. If we want to say what effect x1 has on the outcome, we need to specify the effect of x1 when x2 takes on certain values. These we want to include: if the effect of x1 on Y depends on some other variable, we want to state what the effect of x1 is, given the values of x2.

Then — we spent less time on it, but we also talked about predictive models. Here the focus is really on including significant and strong predictors of Y. The focus is not on effect estimates; the idea of confounding doesn't really exist in a predictive model, and if we're not going to interpret the effects of individual variables, it doesn't really matter if the effects of two variables are stuck together. There we're really trying to include variables that are good predictors of the outcome, that are reliably measured, and that are available at the time we want to make the prediction — can we actually get these variables measured? We can use things like a likelihood ratio test to check whether added variables significantly improve the model, and we can use things like AIC or BIC to evaluate which model seems best. So there are certain criteria for deciding which variables should or shouldn't go into a predictive model. And then a few things that hold regardless of whether it's an effect-size model or a predictive model. It's important to remember that the order in which we test and include variables does matter. For example, it might be the case that for variables x3 and x4 — whether it's a predictive or an effect-size model — we need to include one of them, but not both: if we include x3 we don't need x4, and if we include x4 we no longer need x3, because the two variables contain almost the same information. If we put x3 in the model first, we'll probably find we don't need to include x4; whereas if we put x4 in the model first, we might find we don't need to include x3. So, a reminder: the order in which we include variables does matter, and that's why we said throughout the course that model building and variable selection are half science, half art.
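A hedged Python sketch of that x3/x4 situation (simulated data with an invented noise level; the course used R): when one variable is nearly an alternate measure of the other, their correlation is close to 1, so whichever enters the model first leaves little for the second to explain:

```python
import math
import random

# Sketch (simulated data, invented noise level): x4 is almost an
# alternate measure of x3, so the two carry nearly the same information
# and whichever enters the model first makes the other nearly redundant.
random.seed(3)
x3 = [random.gauss(0, 1) for _ in range(5000)]
x4 = [v + random.gauss(0, 0.05) for v in x3]   # x4 = x3 + tiny noise

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n
    sd_a = math.sqrt(sum((ai - mean_a) ** 2 for ai in a) / n)
    sd_b = math.sqrt(sum((bi - mean_b) ** 2 for bi in b) / n)
    return cov / (sd_a * sd_b)

r = pearson(x3, x4)
print(round(r, 3))   # very close to 1
```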
I won't go through that whole conversation again — this is meant to be a review and a summary — but remember that the order in which we include things does make a difference. Also, we don't want to get too stuck on "is our p-value less than alpha = 5%?". We talked, in the previous course and in this one, about how p-values are useful as a tool — something like a screening tool — but we don't want to get too hung up on a magic threshold: a p-value of 4% and a p-value of 6% are pretty much the same value. So we use p-values as a guide, a tool to inform our decisions, without getting stuck on the exact value. We also talked about the idea that if there is interaction, or effect modification, there are different ways to approach it: we can stratify on that variable and fit separate models within strata, or we can model it using an interaction term — an effect-modification term — and we really looked at these as two ways of addressing the same thing. Another important point we tried to stress throughout the course is that the initial data exploration is very important: looking at univariate summaries and bivariate summaries, really exploring the data, helps inform the model-building procedure. It helps you know what you need to address, and can help you identify non-linearities or influential points. On that note, the idea of sensitivity analysis is also important: if we have a non-linearity — say the effect of age on the outcome is not linear — we can try including polynomial terms and fitting the model, or we can try categorizing age and fitting the model, and hopefully we see that our result does not change very much depending on how we addressed the non-linearity — seeing that the result is not sensitive to the solution we used. The final thing I want to say here: as we worked through analyses and wrote papers based on them, we talked about how the STROBE guidelines can be quite useful for writing your paper, or for critiquing a paper — they're helpful for knowing what should go into a paper based on observational data.

Now, some closing remarks before we wrap all this up. What I hope you got out of this course is a solid foundation in regression models — in building them, interpreting them, and so on — with the caveat that we talked about a lot of ideas at once; we tried to cover the concepts as well as the actual computations. What I hope is that you have a solid foundation: that you understand these models and can build on and expand your knowledge, now that you have the foundation of what a regression model is. As you do, I hope you understand that all these models have assumptions, and they're never perfectly met in reality — so we want to ask whether they're reasonable to work with, knowing that the linearity assumption is one of the key, most important assumptions for us: if that assumption is not met, things fall apart, whether for predictive models or effect-size models, while the other assumptions can be pushed a little more. I hope you've got a balance between knowing what to do with certain data, how to do it, and how to interpret it when it's done — regardless of where you sit on that spectrum, whether you're more of a producer of statistical analyses or a consumer of them. I hope you're comfortable working with statistical software, R, and able to get some of these analyses done, and that you now have a few more tools in your toolbox for attacking analyses when you need to. To finish off, I hope you remember the quote that "all models are wrong, but some are useful." I think that's really the case, and I hope you see this course as a foundation you can build on, knowing that when you run an analysis you're almost surely going to need to take some of these methods and extend them or build further on top of them — this really is a foundation for regression modeling that you can build on. Thank you — I hope you enjoyed the course, and stick around: there's lots more to come.
Info
Channel: MarinStatsLectures-R Programming & Statistics
Views: 5,466
Keywords: r course for beginners, r programming tutorial, r programming language, statistics with r, Data Science with R, statistics for data science, R programming for beginners, statistics course, statistics 101, statistics crash course, statistics course for Data science
Id: 5rOUGoNWw0Y
Length: 40min 59sec (2459 seconds)
Published: Tue Apr 07 2020