Survival Analysis [1/8] - INTRODUCTION

Captions
Let's do this. Hey team, it's Justin Zeltzer here from zstatistics.com, your international emporium of statistics. I just made that up. Today we're going to look at survival analysis, which is quite exciting. I personally really enjoy this topic, and as you can see, we're going to go through from the very beginning, looking at the intuition behind survival analysis. We're going to trace our way through life tables, Kaplan-Meier curves, and the hazard function, which is going to be quite interesting, and then also look at how survival is modeled in the real world, which is predominantly by something called the Cox proportional hazards model. We'll do that one over two videos because there's quite a lot of content to cover there. This is going to be great, so please join us. If you like, you can check out my website, zedstatistics.com. It's got all the videos up there from this series and my other series on statistics. So let's get stuck straight in, shall we?

In this first video we'll be checking out the intuition behind survival analysis. We'll also be exploring how we measure survival time, we'll look at visualizations of survival rates, and we'll discuss the varying applications that survival analysis has across many fields.

So what is survival analysis? Well, it's also known as time-to-event analysis, so clearly we are concerned with the time that an event takes to occur after something called an exposure. As an example, an exposure might be something like a diagnosis, say the diagnosis of hepatitis, and the event we might be interested in could be death, or it could be something like cirrhosis of the liver, cirrhosis being a common outcome after one has been diagnosed with hepatitis. In that example, the diagnosis of hepatitis is where our clock starts, and the event, the progression into cirrhosis of the liver, is where the clock stops. It's the time in between those two points which is of interest,
which is called the survival time. Now, you'll notice in my example the event wasn't death, but it often can be when we're using survival analysis. A very typical example might be looking at the diagnosis of particular forms of cancer as our exposure and then assessing how long it takes someone to die from cancer or, put more positively, how long they typically survive from diagnosis.

We can also do interesting things like look at divorce rates. Marriage might be the exposure of interest, which is a funny way of looking at marriage, and then the event would be divorce, and we could assess the survival time of the marriage. So it doesn't really matter if we're talking about health, or some kind of social institution, or the survival, say, of a company after a particular stock market shock. In any event, survival time is of particular interest when we're looking at survival analysis.

You might be thinking now, well, can't we apply survival analysis to, say, the coronavirus, because that's been all over the news? Well, not really, is my answer to you, because with the coronavirus we're not really interested in the time it takes someone to die or recover from the initial infection. That time period is not really of interest to us; we just care if they die or not. In that sense it's not really worth doing a survival analysis on coronavirus data. Rather, we use survival analysis on things that have a bit more of a delayed onset of symptoms, where we care about that time frame. So something like cancer, where we can say, well, you were diagnosed with cancer on this particular date, and we're keen on knowing how long you survive after that date. Maybe dementia is another thing that we might look at, or a hip fracture, or something like that, where we're dead set keen on knowing how long it takes for the event to occur, which in most cases will be death.

Okay, so let's have a look at how we might measure survival time. In this
example we're going to be using lung cancer as our exposure and assessing the survival time from a diagnosis of lung cancer. Let's say we're conducting this study over the course of, say, 10 years. What we're going to need to do is collect some data from people that have been diagnosed with lung cancer and see when they have died over the course of those 10 years. Let's just say the first person in our sample was diagnosed in 2021 and then survived for another four years. The second person in the sample was diagnosed in 2022 and survived six years. So you get the gist. Let's say we have a few more people in our sample.

Now, as you can see, they each have a different starting time, from when their diagnosis of lung cancer actually occurred. So while on this axis here we have time in years, in calendar years if you like, what we're going to need to do to conduct a survival analysis is reset each of these survival times so that they start at the same point. If we shift them all backwards like this, the axis down here becomes time since diagnosis, and now we have something we can analyze for survival. In this case, if we draw a vertical line down at, say, three years, we can see that the survival rate after three years is 80 percent, as four out of the five people in our sample are still surviving, given that their lifelines continue through three years. After five years, however, it's down to 40 percent, as only two of our five people sampled are still alive. This is essentially what's happening behind every survival analysis that gets conducted.

Now, you might be asking what actually happens to the people that don't die by the end of the study period, because at some point we have to write up our analysis, right? And at that point many people in the sample might not yet have died. Well, this is an issue called censoring, and we'll look at that in our next video, so hold that thought and you can join me in the next one to learn all about it. But we're keeping
things simple for this first introductory video. So let's keep these numbers in mind: we have an 80 percent survival after three years and a 40 percent survival after five years.

Let's try and visualize these survival rates now. You can see on this y-axis we have the survival rate from zero percent to a hundred percent. Because the numbers are quite nice, with five people in our sample, each time somebody dies the survival rate is going to dip down 20 percent, or one-fifth. If we map out the survival rate across time from the example we just saw, you can see we get this sort of stepped ladder, where as soon as someone dies the survival rate dips down 20 percent. If we look up the time of three years, we can see that the survival rate is indeed 80 percent, and at five years the survival rate is 40 percent.

This stepladder-looking graph is actually called a Kaplan-Meier curve, and again we'll have a look at this in more depth in a subsequent video. If it doesn't look too much like a curve to you, that's because we've only got five people in our sample. But imagine if we had a thousand people in our sample: those steps would be tiny, and so it would look like more of a curve. As you can see, this is a nice way to visualize how quickly people die from lung cancer once they are diagnosed with it.

There's a good word I'm going to use here, which is to say that this is a non-parametric curve. When I say non-parametric, I mean that there are no parameters involved; it's simply derived from our data set, so it looks very custom-made just for the data set that we received. Parameters can be used when we start doing some more advanced analysis and things like modeling, where we might use something called a hazard parameter. But again, I don't want to blow your head off too soon here, so we'll leave it at that for the moment.

To finish off this video, I just want to look at some applications of survival analysis, and as you can see,
I've got several different fields in which we can apply survival analysis. Obviously in health, as we've already seen, we're looking at the time to death, or time to, say, device failure. The exposure of interest might be the insertion of a pacemaker into someone's chest, and you want to know how long that pacemaker survives inside someone's chest. That's a crucial piece of information. We can also have time to readmission after someone is discharged from hospital. There are a lot of scientific papers out there using survival analysis to assess time to readmission.

In manufacturing, you can appreciate that we could use survival analysis to assess component failure in particular machines. We can also be interested in the time it takes for a device to become obsolete, or for patents to be approved. I haven't just selected these off the top of my head; there are actual scientific papers that use survival analysis to assess these particular things.

In finance, we can look at the time it takes for businesses to fail, the time it takes staff to turn over, or the time it takes someone to get a promotion after starting at a business. The word survival is probably misused there, but the analysis that we're conducting is still very much appropriate. I guess that goes to show that the event doesn't always have to be death or something bad. The event could actually be something good, and we could be assessing the time it takes for that good thing to happen.

So there's our time to divorce again, if we're looking at social contexts. You can also find the time it takes a couple to have their second child after their first. And there are actually quite a few papers that use survival analysis looking at sport, and I'll put a link to some of these in the description. There's one that looks at the time it takes soccer players to be substituted from the field. We might use this analysis to figure out when the optimum time is to make your substitution.

So that's it, guys. Thanks for watching the first video in this series on survival analysis. They're all going to be up on zedstatistics.com. Leave a like, subscribe, do all those kinds of things; that'll help me out a bunch. I'll catch you in the next video, where we're going to look at the concept of censoring, a crucial concept for survival analysis. See you there.
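
The lung cancer example in the captions can be reproduced in a few lines of Python. This is only a minimal sketch and not code from the video. The first two records (diagnosed in 2021 and surviving four years, diagnosed in 2022 and surviving six years) come from the captions; the other three records are assumptions, chosen so that the survival rate comes out at 80 percent after three years and 40 percent after five years. The function names are hypothetical, and the sketch assumes everyone dies within the study period, so there is no censoring.

```python
# (diagnosis year, death year) for five people.
# Only the first two records are taken from the captions;
# the last three are assumed values for illustration.
records = [
    (2021, 2025),  # survived 4 years (from the captions)
    (2022, 2028),  # survived 6 years (from the captions)
    (2023, 2025),  # assumed
    (2021, 2026),  # assumed
    (2022, 2029),  # assumed
]

# Reset every lifeline to start at zero: time since diagnosis.
survival_times = [death - diagnosis for diagnosis, death in records]


def survival_rate(times, t):
    """Fraction of the sample whose survival time is strictly greater than t."""
    alive = sum(1 for s in times if s > t)
    return alive / len(times)


def survival_steps(times):
    """Points (time, survival rate) at which the step curve drops."""
    steps = [(0, 1.0)]
    for t in sorted(set(times)):
        steps.append((t, survival_rate(times, t)))
    return steps


print(survival_rate(survival_times, 3))  # 0.8
print(survival_rate(survival_times, 5))  # 0.4
for t, rate in survival_steps(survival_times):
    print(f"{t} years since diagnosis: {rate:.0%} surviving")
```

With five people, each death lowers the rate by one-fifth, which gives the stepped shape described in the captions. When no observations are censored, this simple fraction coincides with the Kaplan-Meier estimate; handling censored observations needs the fuller method covered later in the series.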
Info
Channel: zedstatistics
Views: 27,471
Keywords: survival analysis, zedstatistics, zstatistics, justin zeltzer
Id: v1QqpG0rR1k
Length: 12min 17sec (737 seconds)
Published: Thu Nov 12 2020