ACRM 2022 IC17: Longitudinal Data Analysis Using R: Part I Introductory Topics

Hi everyone, my name is Keith Lohse, and I'm one of the instructors here, along with Dr. Allan Kozlowski, talking about longitudinal data analysis at the American Congress of Rehabilitation Medicine. I hope you're enjoying the workshop so far. In Al's section, he talked about the more conceptual issues around building mixed-effects models and how you can use them in longitudinal data analysis. In this section, we're going to focus on the practicalities of running those analyses using the statistical programming language R and the programming interface RStudio.

You've been provided with handouts, and if you're familiar with R and RStudio, you might find it easiest to just work through the handouts and run the code on your own; they give you enough support to make that run smoothly. If you're newer to R, or you just want a little more information, the goal of these videos is to provide some extra scaffolding and interpretation as you run that code. Along with the handouts, you have the R script files to actually implement this code, so you can toggle back and forth between your R interface and the handout: we'll run a code chunk and talk about it, then run the next chunk and talk about that. If you feel really comfortable with all of this, you don't necessarily have to watch the video, because you have the handout; otherwise, I'll try to provide a little extra information and interpretation.

The other thing I'll say is that if, for whatever reason, you're running these analyses on your own computer right now and you encounter an error message, that's totally fine. We're going to have some Zoom sessions for troubleshooting and discussion of those issues. If you're hitting technical hurdles, stick with the video and follow along, because I'll be talking through the interpretation of all these elements of the code. Even if you hit a technical problem on your end and can't recreate what's being done here, we can figure that out in the troubleshooting session later, and for the purposes of the video you'll at least see the interpretation and everything that goes with it.

To start, let's focus on the parts of this handout. I provide you with text that explains the different sections, and the areas presented in gray are the code chunks in R. Those correspond to what you see if you have the R Markdown file open. To run those code chunks, you can just press the Run button, and the code executes down in the console window. In this first case, we're just loading the libraries, so not much visibly happens. If you have not installed these libraries on your computer before, you'll need to skip down to the second chunk and install those packages. If installing packages and loading libraries sounds unfamiliar, we sent a link to a video before the workshop that walks through how to do that in RStudio, so you can revisit that video now, or in the future, if you feel a little uncomfortable with that.
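For reference, here is a minimal sketch of what those first two chunks typically look like. The exact package list comes from your handout; lme4, lmerTest, and ggplot2 are the ones this session relies on later, but the handout may load others as well.

# Load the libraries used in this session: lme4 for mixed-effects models,
# lmerTest for p-values and Type 3 tables, and ggplot2 for plotting.
library(lme4)
library(lmerTest)
library(ggplot2)

# If you have never installed these packages before, run this once first:
# install.packages(c("lme4", "lmerTest", "ggplot2"))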
If you don't like looking at this in the R Markdown interface and you'd rather run the code on your own, you can always create a script file, copy the R code chunks into it, and run them there. For instance, the next thing we're going to do, under Data Cleaning and Quality Assurance, is run the code to import the data. In R Markdown I can click Run, it runs all the way through, and I've successfully created an object called "data" in my global environment. For the purposes of following along with the workshop, that's all you need to do, but if you want your own script file, one that doesn't have my comments, the other text, or the markup language around it, all you need to do is copy that code into your script and run it there. When I click Run, you can see it imports the data file, and using the head function we're looking at the first six rows of all of the columns in that data file.

These are the data we'll be working with in this module. In this columnar arrangement, we have a column for subject IDs; a column for the month these longitudinal data were collected, coded in integers; and a time variable that tells us specifically when those data were recorded. We have the hypothetical sex of these subjects and hypothetical BMIs. Most importantly, we have Rasch-scaled FIM scores, because what we're going to look at is how the Rasch-scaled Functional Independence Measure for these hypothetical spinal cord injury subjects changes over time. One column is the Rasch FIM, and that will be our primary dependent variable for everything we do until the very end, where we'll provide some extra information about missing data. For those missing-data lectures, we'll have a few variations of this dependent variable: Rasch FIM missing at random (MAR), Rasch FIM missing not at random (MNAR), and Rasch FIM with the last observation carried forward (LOCF), where the missing values have been filled in with the last available observation. Those three columns are dependent variables we'll worry about later; for the bulk of what we're doing, we'll look at the Rasch-scaled FIM scores and see how they change over time.

Let me toggle back to the worksheet, because that's the easiest way to visualize some of this. Again, you can flip back and forth, running the code in R and coming back to the worksheet, and if you have any technical R questions, please bring them to the Zoom component where you can ask them live. Once we have the data imported, we use the ggplot function to build up a plot with month on the x-axis and Rasch-scaled FIM scores on the y-axis. I won't really get into the other elements of this code, but they allow us to add lines and points to the plot we're creating in order to visualize the Rasch-scaled FIM scores over time.
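Here's a minimal sketch of that import-and-plot pattern. The file name is a placeholder for whatever data file came with your materials, and the column names (subID, month, rasch.FIM) are assumptions; check yours with names(data).

# Import the data (placeholder file name) and inspect the first six rows.
data <- read.csv("acrm_longitudinal_data.csv")
head(data)

# Spaghetti plot: one line and one set of points per subject.
ggplot(data, aes(x = month, y = rasch.FIM, group = subID)) +
  geom_line(alpha = 0.4) +
  geom_point(alpha = 0.4) +
  labs(x = "Month", y = "Rasch-scaled FIM score")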
Here you can see the plot that code generates. It shows our Rasch-scaled FIM scores, on a scale from 0 (a complete lack of independence) to 100 (complete independence), as a function of time from 1 to 18 months, for our three groups: individuals with a C1-4 spinal cord injury, individuals with a C5-8 spinal cord injury, and individuals with paraplegia.

What we want to do is understand how we can model these data longitudinally, because if we just fit a regular regression to these data, we run into a problem: our residuals are not independent of each other, because different observations come from within a person over time. The traditional general linear model breaks down here; we can't just fit one effect of time through all of these observations and treat them as independent. However, the overall notion that our data equal some statistical model plus residual error still applies. You can read through the handout in more detail, but what we're going to do is use random effects, in addition to the fixed effect of time, to account for that statistical dependence in the data.

There are a few different ways to do that, as you might recall from the conceptual lectures this morning. One is to put just a random intercept in the model: each person gets a unique deviation away from the group-level intercept, which allows each person's line to start in a different place, but everyone has exactly the same effect of time. Getting a little more complicated, we can have both a random intercept and a random slope. In that model we have group-level effects for the intercept and the slope, represented by the betas, but we also have individual deviations away from those, represented by u0j and u1j. Rearranging the terms makes it a little clearer: each person starts with the group-level intercept plus some deviation up or down that determines their own intercept, and they take the group-level slope plus some deviation up or down that determines their unique slope. By having a random-slopes, random-intercepts model, we can estimate a unique trajectory for every person, and we get residual errors based on how each person's data points deviate from their own line.
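Written out, the two models just described look something like this, where y_ij is the Rasch-scaled FIM score for person j at time i (the notation on your handout may differ slightly):

% Random intercepts only: person j gets a unique deviation u_{0j} from the
% group-level intercept, but everyone shares the same slope.
y_{ij} = (\beta_0 + u_{0j}) + \beta_1 \,\text{time}_{ij} + e_{ij}

% Random intercepts and random slopes: person j also gets a unique
% deviation u_{1j} from the group-level slope.
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{time}_{ij} + e_{ij}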
Conceptually, that's what's being shown by the code that generates this figure. We're looking at Rasch-scaled FIM scores from 0 to 100 as a function of time from 1 to 18 months, and the dots and lines are color-coded to reflect the individual subjects. Take this green person, for instance: they have a much lower intercept than the group-level intercept, so they have a sizable random effect for their own intercept. Their slope, however, is actually pretty parallel to the group-level slope, so they wouldn't show much of a difference in their random slope, because they're close to the average slope within the group. What you can see further, though, is that even with a unique slope and intercept for this person, there are still deviations away from that line: some negative residuals at the beginning, more positive residuals in the middle, and more negative residuals at the end. So we have a unique slope, a unique intercept, and a set of residuals for each person. That's represented by the different colored lines here: the thick black lines are the group-level trajectories, and the thin colored lines are the trajectory for each individual person, which you get by taking the group-level beta and adding the subject-level u, that random effect, to it. Adding the fixed and random effects together gives you the unique coefficients for each person.

So these are the key components of our model. We have fixed effects, which determine the group-level trajectories; random effects, which determine the individual deviations away from those trajectories; and residual errors (or random errors), which are the difference between each individual data point and the unique trajectory estimated for that person. We'll spend a lot more time on each of these as we move forward, but if this is unclear, I think it's really helpful to go back at this point and look at some of the slides Al presented, or the more detailed explanation in the text, to make sure you're comfortable with the conceptual distinction between fixed effects, random effects, and residual errors. And if you have questions about those things, be sure to bring them to the live Zoom Q&A.

Next: our model definitely does not look linear. Looking at those earlier plots, you might have said, "Okay, I get the random slopes and random intercepts, but it doesn't really look like linear change; it looks like there's a curve to that line, where people show greater change early on and less change later." You'd be absolutely correct. Looking at these figures, a straight line isn't going to be the best description of our data, because there seem to be diminishing returns in these FIM scores over time. We'll deal with different ways of handling this non-linearity as we move forward; in the advanced session this afternoon we'll talk about truly non-linear models, but for now we'll focus on curvilinear models that use polynomials to make our lines curve.

A curvilinear model creates a curving line, but it is linear in its parameters. That is, we can have time and time squared, but the parameters beta0, beta1, and beta2 just get added together, so the model is linear in its parameters. We might have a unique coefficient for the effect of time squared, but it just gets added to the value for the linear effect of time, which gets added to the intercept, to ultimately determine the value of y at each point in time. Curvilinear functions can approximate these sorts of curves, but again, they're linear in their parameters, which makes them distinct from truly non-linear functions. For instance, a negative exponential function is something we'll deal with in the advanced class, and you definitely can fit a mixed-effects model that is non-linear. But curvilinear models are a really good place to start, because people are generally more familiar with regression that uses polynomials than they are with negative exponential functions, regression splines, or those kinds of things.
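To make the distinction concrete, here's a quadratic (curvilinear) model next to one possible negative exponential parameterization. The exact exponential form used in the advanced session may differ, so treat the second equation as illustrative only:

% Curvilinear: the line bends, but beta_0, beta_1, beta_2 enter additively,
% so the model is linear in its parameters.
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + e_{ij}

% Truly non-linear: gamma sits inside the exponent, not in an additive term.
y_{ij} = \alpha + \beta\, e^{-\gamma t_{ij}} + e_{ij}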
We'll get into truly non-linear models in the advanced session; for now, let's create a curvilinear model and plot it visually, again using some ggplot code. As always, I'm not going to go through all of it, but the thing I do want to point out is that we're adding smoothed lines using the stat_smooth function, and I'm specifying a particular formula for each smooth. The first stat_smooth is done for each subject (I'm clustering them by color based on their subject ID), and for each person it estimates y as a function of x + x². The second stat_smooth creates a smooth line for each group. So the first operates at the subject level and the second at the group level, but both specify a formula where y is a function of x + x², and that's what ultimately generates a plot like this one, with curving lines.
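Here's a sketch of that two-layer stat_smooth pattern, assuming the same column names as before plus an AIS.grade grouping column (an assumed name) for the three injury groups:

ggplot(data, aes(x = month, y = rasch.FIM)) +
  geom_point(alpha = 0.3) +
  # Subject-level quadratic fits: one colored curve per subject
  stat_smooth(aes(group = subID, color = factor(subID)),
              method = "lm", formula = y ~ x + I(x^2),
              se = FALSE, linewidth = 0.5) +
  # Group-level quadratic fit in black over the top
  stat_smooth(method = "lm", formula = y ~ x + I(x^2),
              se = FALSE, color = "black", linewidth = 1.5) +
  facet_wrap(~ AIS.grade) +
  theme(legend.position = "none")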
In this situation, I would actually be estimating a random-intercepts, random-slopes model, and not only do I have linear random slopes, I have quadratic random slopes, so I'd refer to this as the quadratic random slopes model. Visually, that's what we're presenting here: in each group, we have group-level effects that tell us, on average, where people start (the intercept), how much their line tilts (the linear effect of time), and how much their line curves (the quadratic effect of time). And we'll have random effects for each of those things: a random intercept that allows people to start in different places relative to their group, a random linear effect that allows people to have a different tilt relative to their group, and a random quadratic effect that allows individuals to have a different curvature relative to their group. Just visually, having that random quadratic effect greatly reduces the residuals; we're getting a much better fit to everyone's data than we were in the purely linear model, if you scroll back up to that random slopes figure. The formula is definitely starting to get more complicated, but we're only adding one extra component. This is still a random slopes model, with fixed effects for the intercept, the linear effect of time, and the quadratic effect of time, and, complementing each of those, a random effect for the intercept (represented by a u), a random effect for the slope, and a random effect for the quadratic slope. So we're going to get a pretty customized, personalized line for each person, and a much better model fit, but at the expense of a much more complicated model. These are the things we'll always have to balance against each other: how well does the model fit versus how complicated is the model?

Next, we want to decide which of all the possible models we could build is actually the best explanation. Conceptually, I hope you feel pretty comfortable with what the fixed effects, random effects, and residuals are; now we really need to think about model building and model comparison. How do we build a random intercepts model? How do we build a random slopes model? And how can we use R to compare them statistically?

We'll start with our random intercepts model, because it's computationally the simplest, and it's essentially our benchmark: the simplest model we can fit to these data. Any added complexity then has to be better than that model, because if we add a variable and it doesn't explain things better than the random intercepts model, that variable is not very good and we probably don't want it in there. In the random intercepts model, for each person at each time, we just estimate a group-level intercept and an individual difference away from that intercept, fit it, and see how much error remains in the data. Our goal is to see how small we can get that error, and this model is so simplistic we probably won't reduce the error very much.

One thing I'm going to do before we fit anything is create a variable called year.0. Months started at one, so I subtract 1 to have months start at zero, and then I divide by 12 to convert months into years. The reason for doing that is twofold. First, I want time to start at zero so the intercept is really our first observation: our observations actually started at one month, but I want them to start at time zero. Second, I want to rescale time to make the slope a little easier to interpret: rather than dealing with 0 to 18 months, I want to deal with 0 to 1.5 years. Scaling that time variable also helps me avoid some potential convergence issues, especially as we add interactions to the model.

Then there's the R code for actually implementing the model. We create an object called ran.ef.int (for "random effect: intercepts") using the lmer function. Inside lmer, you specify your dependent variable, then the tilde, which means "as a function of," and then whatever you want to predict that dependent variable from. I've added some comments just to keep it cleaner, so you can see the effects a little more easily; you don't actually need those comments, and if having it all on one line makes more sense to you, the code will work that way too. I just like breaking it up so you can see what the fixed effects are and what the random effects are. In this random intercepts model, the only fixed effect is the intercept (there's a group-level intercept for everyone), and the only random effect is an intercept within each person, that is, an intercept conditioned on subject ID, so everybody gets their own intercept. With a fixed intercept and a random intercept for each subject, that fits a very simple model to the Rasch-scaled FIM scores in this data set. I also have to tell the function where the data come from, and finally I have to say whether I want to use restricted maximum likelihood or maximum likelihood.
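Putting that together (the variable and object names follow the handout's conventions as best I can reconstruct them):

# Rescale time: months ran 1-18, so subtract 1 to put the intercept at the
# first observation, then divide by 12 to convert months to years (0 to 1.5).
data$year.0 <- (data$month - 1) / 12

# Random intercepts model: a group-level (fixed) intercept, plus a random
# intercept for each subject. REML = FALSE requests maximum likelihood,
# which we need for the model comparisons later.
ran.ef.int <- lmer(rasch.FIM ~ 1 +      # fixed effect: intercept only
                     (1 | subID),       # random effect: intercept per subject
                   data = data,
                   REML = FALSE)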
REML stands for restricted maximum likelihood, and we're going to set REML = FALSE because we want maximum likelihood, not restricted maximum likelihood. We'll get into why in a second, but the short answer is that for deciding among models, we need maximum likelihood in order to do model comparisons. If there were one model we knew we wanted from the beginning, and we were just interested in the statistical significance of the individual parameters, we could use restricted maximum likelihood. But right now we're really interested in which random effects need to be in the model, and in order to compare models with different random effects, we need maximum likelihood estimation. We'll talk a little more about that below, and it's a great question to bring to the Zoom Q&A later if you want to dig into the difference between restricted maximum likelihood and maximum likelihood.

For now, let's just run this R code to create our random intercepts model, and then use the summary function to print all of the details of that model to the screen. Inside that summary, you get a restatement of what the model is, and some model fit statistics up at the top, including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the log-likelihood, the deviance, and the degrees of freedom for the residuals. We'll get into those in a minute; for now, just know those are your model fit statistics. It also tells you about the scaled residuals: roughly how far off you are, on average, when you're making these predictions.

It then gives a summary of the random effects, and this is a component of the output we're really interested in. You can see it gives the grouping factor, the name of the particular random effect, and the variance and standard deviation of that random effect. We have a random effect for the intercepts with a standard deviation of about 9.5 points, which means that, on average, people differed from the group-level intercept by about 9.5 points on the FIM. Below that it tells us about the residuals, again with a variance and a standard deviation; the standard deviation is about 13.27 points. The residuals are saying: we're estimating a group-level line, and a line with a different intercept for each person; how far from each individual line do the data points tend to be? People tend to be off their individual line by about 13 points. Those are pretty big errors, suggesting we're not explaining the data very well, and that will probably be very apparent when we actually plot what this model is estimating.

The other nice thing in that output is a summary of the number of subject IDs and the total number of observations. Looking at the total number of observations, and how those observations are grouped, can be really helpful for catching errors in your data: if those numbers don't add up, or they don't look like what you think they should, that suggests something went wrong somewhere, and you'll probably have to dig back in and do a bit more troubleshooting.
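Two calls worth knowing here; VarCorr() is a convenience I'm adding, not something the handout necessarily walks through:

# Full output: fit statistics, scaled residuals, random and fixed effects.
summary(ran.ef.int)

# Just the variance components; for these data, roughly a 9.5-point SD for
# subject intercepts and a 13.27-point residual SD.
VarCorr(ran.ef.int)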
The final part of the output is the interpretation of the fixed effects, and in this case it's pretty boring, because it's just an intercept. It tells us that, on average, people's intercept was 44.904. That's not super informative, although it is statistically different from zero, as you can see from the p-value; all it says is that the average point across all these data, across all people and across time, was about 44.9, and people deviate from that point.

There's some other R code where I show you how to pull out the random effects and the fixed effects individually; if you want, you can run that code in R to see what those look like. The main thing I'm going to do is use some of that information to generate plots: I create a data frame called predictions and fill it with information based on the coefficients from the fixed effects and the random effects. That's a little more detailed, from an R programming perspective, than I want to get into here. If you do want to work through it, go to the appropriate place in the Markdown document where we fit that model and extract those results (around line 200 of the Markdown document), and if you're really interested in how I'm doing it, I'm happy to go through it over email or during the Zoom Q&A. For now, conceptually, just know that once we've built the model, I'm pulling out the unique effects for each person so we can plot them; the conceptual understanding is what's really key here.

So I take the first 10 subjects and plot their data: the gray dots, with the black line connecting each person's individual observations, and, over the top of that as the red bars, the predictions from our random intercepts model. You can see those predictions are not very good, because we're just predicting an intercept. It's a constant, so it doesn't change over time: we predict one value for each person. Each person gets a different value, but it ignores any change over time. So even though people start in different places (you can see the difference between s05 and s04, or s04 and s03), it is not a good fit to people's data, which is why those residuals were so large. Clearly there's room for improvement here.

The next thing we'll do is build a fixed-slope, random-intercept model, where we keep that random intercept but add in an effect of time. This is a fixed slope, though, which means the effect of time will be the same for each person. To do this in R, I create a new model called ran.ef.slope, again using the lmer function, where we model Rasch-scaled FIM scores as a function of an intercept (that's the 1) and time (our year.0 variable); those go in as our fixed effects. At the moment, the only thing we want to vary between people is the intercept, so we include just a random intercept for each subject.
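A sketch of both pieces: extracting the effects from the intercepts-only model, then fitting the fixed-slope, random-intercept model.

fixef(ran.ef.int)   # group-level (fixed) effects
ranef(ran.ef.int)   # per-subject deviations (random effects)
coef(ran.ef.int)    # fixed + random combined: each subject's own coefficients

# Fixed-slope, random-intercept model: year.0 enters as a fixed effect of
# time, but only the intercept varies between subjects.
ran.ef.slope <- lmer(rasch.FIM ~ 1 + year.0 +   # fixed: intercept and time
                       (1 | subID),             # random: intercept only
                     data = data,
                     REML = FALSE)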
Again, we have to tell the lmer function what data to use, and we're going to use maximum likelihood estimation. Once we fit that model, we can use the summary function to print the output to the screen, and it looks like this: it tells us what model was fit, gives us our model fit statistics, gives a summary of the residuals, and then explains the random effects. Our random effects here look the same in structure, in that we have a random intercept for subject and we have residuals, but the values change some. The standard deviation for the intercepts is about the same: people's intercepts are still about nine points away from the group-level intercept. But by including the effect of time, we've cut the residuals down by almost half: the standard deviation of the residuals went from about 13 to about 6.6. So we're getting a much better explanation of the data by allowing our model predictions to change over time.

If you want to know how much they change over time, look at the output for the fixed effects. We now have an intercept of 26.58, which says that at time 0, the very beginning, the average Rasch-scaled FIM score across everyone was 26.58. We also now have a slope for year.0 of about 25.85, which is statistically significant given the p-value here. What that says is that for a one-unit increase in the x variable, which in this case is time in years, we expect Rasch-scaled FIM scores to go up by about 25.8 points. So at the beginning of the study, people started, on average, at about 26 points, and by the first year they had changed, on average, by about 26 points as well; there's definitely a positive slope in these data.

Again, I use some R code to pull out the individual coefficients and generate a plot, but conceptually our goal is to understand what that fixed-slope, random-intercept model is doing. You can see it plotted for the first 10 subjects here: again, the gray dots connected by the black lines are the individual subject data, and the red lines are the predictions from our model. We have a random intercept in this model, so everyone's line starts in a different place, but we have a fixed slope, so everyone has exactly the same slope, because we did not allow the slope to vary between individuals. You can see that's a better approximation of the data, but it's still not great; especially early on, our linear slope seems to be missing a lot of the change in Rasch-scaled FIM scores.

The next level of complexity we can add is to say: rather than giving everyone exactly the same slope, what if we created a random-slope, random-intercept model? Mathematically, we'd have a group-level intercept that varies by person and a group-level slope that varies by person. In R terms, we create a model I'll call ran.ef.lin.rand (for a random effect on the linear slope), again using lmer, modeling Rasch-scaled FIM scores as a function of both an intercept and time (the year.0 variable), but now the random effects include both a random intercept for each person and a random effect of time.
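A sketch of that model:

# Random-slopes, random-intercepts model: both the intercept and the effect
# of time now vary from subject to subject.
ran.ef.lin.rand <- lmer(rasch.FIM ~ 1 + year.0 +     # fixed effects
                          (1 + year.0 | subID),      # random intercept + slope
                        data = data,
                        REML = FALSE)
summary(ran.ef.lin.rand)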
Not only does everyone start in a different place, but their slope tilts differently depending on which person you are. We can then print the model summary to the screen, and we get output much like we've seen before: model fit statistics, scaled residuals, and then an explanation of the random effects. That random effects output is getting more complicated here, because now, in addition to a standard deviation for the intercepts (how much people differed from the group-level intercept), we have a standard deviation for the slopes (how much people differed from the group-level slope). On average, people were about seven points above or seven points below the group-level slope; a standard deviation of about seven points for those slopes suggests that people vary quite a bit in their trajectories.

The other additional piece of information we get here is the correlation between the random effects. Not only do we see the standard deviations for the individual random effects, we see how they are correlated, and in fact these tend to be positively correlated, such that people who have higher deviates for the intercept also tend to have higher deviates for the slope. In other words, people who had a higher starting position also tended to change more linearly over time, whereas people who had a lower starting position tended to change less, with flatter trajectories.

We can also look at our residuals to get a general sense of how well this model fits: the residual standard deviation has gone down from about 6.6 points with the fixed slope to about 5.8 points with the random slope. That's not a huge improvement, but it is a statistical improvement; this model gives a better explanation of our data. Again, we have fixed effects for both the intercept and the slope, both statistically significant, and you'll notice their estimates didn't really change from the fixed-slope model. One thing I do want to point out, though, is that when we include the random effect for the slope, the degrees of freedom go way down: it started at something like 630, and now we have degrees of freedom of about 40.
Obviously those are very different, so I want to talk a little about what it means to include that random effect for the slope. When we had just the random intercepts and a fixed slope, we were estimating one slope and treating all of the data points as contributing to that one line. When we add a random effect for the slope, we're saying there isn't one overall slope that goes through this cloud of data points; what we really have is a separate slope for each person, and because we have 40 people, that's 40 different slopes to compare. It's not one slope fit through several hundred data points; it's one slope fit through the first person's data points, one slope fit through the next person's data points, and so on, until at the end we have 40 slopes. We're now treating this variable as a random sample of slopes, and that reduces the degrees of freedom.

Exactly what the correct denominator degrees of freedom are is an important question, and there are debates about the best way to handle it, but the thing I want to impress upon you is that your choice of random effects can make a big difference in your model. Conceptually, you want to be thoughtful about which random effects you're including and which you are not. You can't just throw a random effect of subject in there and assume that all the statistical dependency has been removed from your data, because what you risk doing is inflating your degrees of freedom, which makes your Type 1 error rate a lot higher. So be careful with that. Conceptually, I think it's good practice to reason: I've got 40 people, they all have a unique slope, so I probably want that random effect of the slope in there. You certainly can test a random intercepts model that has a fixed slope, but theoretically speaking that doesn't make much sense; it's a stepping stone to see how much error we explain as we add complexity to our model. Realistically, we probably want that random effect, because the slopes do vary from person to person, even if they vary only a tiny amount.

Again, I pull the coefficients out of that model to generate a plot of what's happening here. What we get is a random slopes model where the red lines for each individual person differ not only in their intercepts but also in their slopes. For instance, subjects s04 and s09 had quite low intercepts and also much flatter slopes, whereas subjects s02 and s08 had much higher intercepts and much steeper slopes. So across those first 10 subjects, we're estimating different slopes and intercepts for each person, and there's also a correlation between those slopes and intercepts, such that people who started higher tended to have steeper slopes as well, showing a greater rate of improvement.

Okay. Finally, we've also talked about the non-linearity in these data. What we can do is fit a quadratic random slopes model, where we allow the line to bend by adding a quadratic effect of year squared to our model, and we allow it to bend differently for each person by also adding a random quadratic effect.
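A sketch of the quadratic model:

# Quadratic random slopes model: fixed and random effects for the intercept,
# the linear effect of time, and the quadratic effect of time. I() makes the
# ^2 act as arithmetic squaring rather than formula syntax.
ran.ef.quad.rand <- lmer(rasch.FIM ~ 1 + year.0 + I(year.0^2) +
                           (1 + year.0 + I(year.0^2) | subID),
                         data = data,
                         REML = FALSE)
summary(ran.ef.quad.rand)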
Once we run that model and create ran.ef.quad.rand, we can use the summary function to see the detailed output. As before, it gives us our model fit statistics and scaled residuals, and then the random effects table, which is now getting much more complicated: we have random effects for the intercept, the linear effect, and the quadratic effect, and we actually have three correlations now, between the intercept and the linear effect, between the intercept and the quadratic effect, and between the linear effect and the quadratic effect, which tend to be highly correlated.

In this case, the program didn't give us any warnings or errors, but really high correlations between your random effects can often lead to convergence errors. If your random effects are so correlated that the correlation approaches one, essentially perfectly correlated, you'll get a convergence warning, and it will usually say that there's a singularity or that you've hit a boundary in the estimation. Conceptually, one way to interpret that: if these effects were so strongly correlated that, say, this correlation were actually negative one, I wouldn't need the quadratic random effect in there, because if I have the linear random effect and the correlation is one, I already know what the quadratic random effect is. Those things are determined, and you don't need them both in your model if that correlation saturates and goes to one. This correlation is pretty strong, so theoretically that random effect might not be doing much for me, because it's already largely determined by the linear random effect. But let's actually test that question and see empirically which model is better, which we'll get to in a second. The correlation is high, but not so high that the math broke and we got a warning, so let's keep moving forward with interpreting this model.

The other thing to look at in this table is the residuals, which have gone down by quite a bit now. By including the quadratic fixed effect and the quadratic random effect, our predictions are off by only about three points on average; our regression lines are actually a pretty good approximation of each person's data, and our residuals have gotten a lot smaller from when we started. If we want to see what our model is predicting, we can look at the fixed effects to get the intercept, the effect of time, and the quadratic effect of time, and we could use those coefficients in an equation to figure out the predicted value of y at any one point in time. The main thing is that we can look at the statistical significance of these effects, and because we've included random effects for all three parameters, their degrees of freedom are all around 40. We're saying, in essence: we've got about 40 intercepts, 40 slopes, and 40 quadratic slopes. We estimate those within each person, then pool, for example, all of the quadratic effects together to see what the group-level effect is and how much deviation is left over after that.
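As an aside on the convergence point above: lme4 provides isSingular() for checking whether a fit has hit that boundary, which can be a useful diagnostic even when no warning was printed. This check is my addition, not a step from the handout.

# TRUE if a variance component is (near) zero or the random effects are
# (near) perfectly correlated, i.e., the boundary case discussed above.
isSingular(ran.ef.quad.rand)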
To see that model in a little more detail, we can gloss over the actual R code, but I'm pulling the coefficients out of the model to plot the quadratic effect for each person, and you can see this is quite a good approximation of the data for most of these people. Everyone differs in their intercepts, in the tilt of their individual lines (the linear component), and in the curvature of their lines (the quadratic component). So you can see what our model is doing when we estimate those random effects for each person. On average, across people, we can always come back up to the fixed effects: at time zero, people have a Rasch-scaled FIM score of about 18 points, and the initial slope is about 64 points. However, that slope gets less positive over time, because we have a negative quadratic effect, and in fact the line eventually starts to bend downward. This is consistent with improvement that shows diminishing returns: the initial slope is quite steep, but it gets less and less steep as time moves on, until, for some of these individuals, it would hypothetically bend down. That's a problem we can definitely talk about later, because these scores probably don't really bend down; they flatten. The quadratic slope can be a good approximation within the window of time we're looking at, but we don't want to extrapolate beyond the data we have, because outside that window the quadratic function will force your predictions to curve down, even though that's probably not what really happens.

Let's take a break there for just a second. Feel free to go back and re-watch anything or make notes on things that are confusing. Now that we have all of these random effects models, we want to talk about how to compare the different models.

Again, you can read through the handout for more detail and more background on some of the calculations, but I want to focus on a few key conceptual points here, and we can talk through the rest in the Zoom Q&A. Much like traditional regression uses the sum of squared errors to decide whether something is a statistically significant addition (does it explain enough error that the improvement is unlikely to be due to sampling variability?), maximum likelihood estimation gives us a metric of error called the deviance. It is essentially the conceptual equivalent of the sum of squared errors, and our goal is to reduce the deviance and get a model that is a better fit to our data. The deviance is negative two times the log-likelihood, and the likelihood is a little confusing, but if you write it out, this is roughly what that formula ultimately turns into.
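For a Gaussian model with N observations and residual variance sigma-squared, a simplified version of that formula is the following; the exact expression on the handout slide may be arranged differently:

\text{deviance} = -2\log L = N\log\!\left(2\pi\sigma^{2}\right) + \frac{1}{\sigma^{2}}\sum_{i=1}^{N}\left(y_{i}-\hat{y}_{i}\right)^{2}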
That formula can be a little intimidating to look at, depending on your background in mathematics, but there are a couple of things I want to impress upon you. First, notice that the deviance is still largely determined by the sum of squared errors: all else being equal, smaller errors give a smaller deviance, and that's desirable, because smaller errors mean our model is a better explanation of the data. Second, notice that the sample size comes in at two different places: we have N, the number of observations, and we're summing the errors over each observation. Essentially, the more observations we have, the larger that sum becomes, which means the calculation of the deviance is sensitive to the amount of data you have.

The big-picture take-home is that we can only fairly compare deviances from models that are based on exactly the same data. If one data set were bigger, it could have a bigger deviance simply by having more observations. So we can only fairly compare models that come from the same data set; we can fit different models to those data and decide which one is better, but if we added a covariate that had some missing values, and therefore had to drop people, we couldn't fairly compare those models anymore, because one is fundamentally based on less data than the other. Keep that in mind moving forward: we have to compare models based on the same data. Fortunately, our hypothetical example is a toy data set with complete data, so no one gets dropped; in all cases we have 720 observations from 40 individuals.

What we can do now is decide which of our unconditional models, the models where we're just looking at the effect of time, is actually the best approximation of our data. We can compare the random intercepts model to the fixed slope model to the random linear slopes model to the quadratic random slopes model, using the anova function in R. We call anova and enter our models in the order we want them tested: the simplest first, so the random intercepts model, then the random-intercept fixed-slope model, then the random linear slope model, then the quadratic random slopes model.
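The comparison itself is one line, assuming the four model objects are named as above:

# Nested model comparison, simplest model first. All four were fit with
# REML = FALSE on the same data, so deviances and AICs compare fairly.
anova(ran.ef.int, ran.ef.slope, ran.ef.lin.rand, ran.ef.quad.rand)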
We get an ANOVA table that looks like this, and you can scroll through the worksheet for a more detailed description of the columns, of what the chi-square test ultimately is, and of what the Akaike information criterion ultimately is. The thing I really want you to focus on is that we have two main ways of assessing model fit here.

One is the Wald test of the change in deviance: it looks at the difference in deviance between successive models and evaluates it against a chi-square distribution. The first two models are 945 points apart in deviance; that's a big change, and it happens to be statistically significant based on the chi-square test. Similarly, the next two models are 131 points apart, which is a statistically significant improvement based on the chi-square test of the change in deviance. And our final model, the quadratic random slopes model, is a 763-point improvement over the linear model, again statistically significant. So with each successive model, we're getting a better and better explanation of the data. One issue with that chi-squared test, however, is that it looks only at the change in deviance, so it essentially seeks out the model that best explains the specific data in front of you right now. One of the problems we face is that we want our models to generalize to new data sets. The question isn't always "is this the best explanation of the data in front of me right now?"; it's "is this model going to be the best explanation of a new sample of data? If someone else collected a sample from the same population and tested similar models, would they get the same answer?"

So the statistic we'll actually prefer is the Akaike information criterion, or AIC, shown in this column here. The AIC introduces a penalty based on the number of additional parameters in your model, and the goal of this is to reduce something known as overfitting. The idea behind overfitting is that as you add parameters, you might explain the data in front of you very well, but your model won't explain a new set of data very well; we can talk through concrete examples in the Zoom session if you're not familiar with that term. To avoid overfitting, the AIC imposes a penalty on the deviance for every additional parameter. As a starting point, our AIC is a little higher than the deviance, because we have only three parameters, and down at the end the AIC is about 20 points higher than the deviance, because we have 10 parameters. What we want to see is a reduction in model fit above and beyond that penalty, and in this case we still see it, because the AIC keeps going down. That's what you're looking for in the AIC column: the lowest value, which tells you the model that best explains the data while taking into account that penalty against overfitting. You might think of the deviance as identifying the best explanation in the sense of raw errors, and the AIC as identifying the most parsimonious explanation: the best model fit for a given level of complexity. That's why I talk about the AIC as a measure of parsimony. If your model starts to be more complicated than it is good, your AIC will go up. In this case, though, every model is a successive improvement over the previous one, because the AIC keeps going down, so we'd make the same decision whether we used the Wald test of the change in deviance or the AIC: we're going with the quadratic random slopes model as our best-fitting model.
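The penalty itself is simple: for k estimated parameters,

\mathrm{AIC} = -2\log L + 2k = \text{deviance} + 2k

so with 3 parameters the penalty is 6 points, and with 10 parameters it is 20 points, which matches the roughly 20-point gap between the final model's AIC and its deviance described above.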
There is some additional information in those sections of the handout walking through the AIC and the Wald change in deviance if you're really interested, but for now I'm going to move on to the second practical-session handout: Practical Session 2, Conditional Models and Hypothesis Testing, where we build on the model we just created. In the unconditional models, we were trying to decide the best shape of the time curve; in the conditional models, we're trying to decide what actually makes a difference in how people change over time. This is where we add a factor of AIS grade and try to decide whether our groups of participants change differently over time.

If you're following along in R, you can copy and paste the code into your own script, or open the R Markdown documents you've been given and run the individual code chunks that go with ACRM Session 2. Follow along with those chunks or run the code on your own, depending on how comfortable you are; I'm going to focus primarily on the interpretation in the worksheet, and we can save any troubleshooting of problems for the Zoom Q&A a little later on.

Let's scroll down a little, because the first part of this is again just opening the data and getting a reminder of what these data are. After importing the data, I have ggplot code that shows Rasch-scaled FIM scores over time in this hypothetical data set. What we want to see now is whether people change differently as a function of the severity of their injury: several people had a C1-4 injury, several had a C5-8 injury, and several had paraplegia, and we want to look at whether these groups differ in their change over time. In the previous practical session, we established that the best way to capture change over time was the quadratic random effects model. Now we ask: given that model, do the groups differ in their intercepts, their linear slopes, and their quadratic slopes? In the unconditional models, we determined that those three parameters are what really matter, so now we can add a fixed effect of group to help us decide: do the C1-4s start in a different place than the C5-8s? Do the C1-4s have a different rate of change than the C5-8s? By exploring how your group affects your trajectory, we can test some hypotheses and hopefully answer some relevant scientific questions about how the severity of your injury relates to your initial impairment and to your change over time. In this spaghetti plot, the individual trajectories are shown in gray and the group-level trajectories in black; here I'm plotting just the moving average for each group.

To create the conditional model, we again need our year.0 variable, where we've converted months into years and centered time on the first available data point, and we start by refitting that quadratic random effects model. If you still have the same R session open, it's probably already in your environment, but if you stopped after Session 1 and are starting Session 2, we want to recreate this model, because it was the best-fitting unconditional model and it will be our benchmark for deciding whether adding an effect of group makes a difference above and beyond time alone, ignoring what group you were in. So we create that model again, with fixed effects for the intercept, the linear effect of time, and the quadratic effect of time, and random effects for the intercept, the linear effect of time, and the quadratic effect of time within each subject.
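If we want to see what those effects explain, we can also use the anova() function on a single fitted model; a minimal sketch, assuming the refit benchmark kept its Session 1 name:

# Type 3 ANOVA table for one model. lmerTest must be loaded for lmer() fits
# to report these F tests and p-values.
anova(ran.ef.quad.rand)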
In this case, we get the Type 3 analysis of variance table. I talk about this a little in the text; hopefully you're familiar with the notion of Type 3 sums of squares calculations versus Type 2 and Type 1. If you're not, I'd encourage you to revisit some of that content; I can direct you to videos or regression textbooks that explain how those things are calculated differently. The one thing I will always impress upon you is to be careful about whether you're looking at a Type 1 ANOVA table or a Type 3 ANOVA table, because that has a massive effect on how you interpret the variables in your model. In our case, because we're using lme4 and we've loaded the lmerTest package, we get Type 3 tables; if you don't have both of those packages loaded, you might be getting Type 1 tables. So both for walking through this handout and for any future models you build, check whether you're getting a Type 1 or a Type 3 table, because the interpretation is very different.

Knowing that we're getting Type 3 tables, we can interpret these as the sort of traditional main effects and interactions you might get from an ANOVA output, and the anova function gives us omnibus tests. If we want to look at individual coefficients, we can use the summary function like we've done before: it tells us our model fit statistics, gives a description of the residuals, gives the random effects, and then gives the coefficients we estimate in the fixed effects. You can say: on average, across all people, ignoring what group you're in, here is our intercept, here is our linear slope, and here is our quadratic slope, again capturing the idea that people start at a value of about 18 points, their initial slope is about 64 points, but that slope becomes less positive over time and actually starts to flatten out as time goes on. If we wanted to solve for a specific value of y, we could pull out those coefficients, plug in different values of x, and solve for the predicted Rasch-scaled FIM score at any given time.

However, our goal at the start of this was not just to get the overall slopes and intercepts; it was to see how the groups actually differ from each other. To do that, we add a fixed effect of AIS grade to the model. In R, I create an object called conditional.model.1, again using lmer on the Rasch-scaled FIM scores, and we add a fixed effect of AIS grade that interacts with both time and the quadratic effect of time. As I explain in the text, we use the asterisk operator for this: if you write something like year.0*AIS.grade, R is smart enough to know you want both the main effects and the interaction of those terms (see the sketch below).
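A sketch of the conditional model, with the equivalent colon-operator spelling shown in comments (the AIS.grade column name is an assumption):

# Conditional model: AIS grade plus its interactions with the linear and
# quadratic effects of time. The * operator expands into main effects and
# interactions. The random effects are unchanged, because AIS grade does
# not vary within subjects.
conditional.model.1 <- lmer(rasch.FIM ~ AIS.grade * year.0 +
                              AIS.grade * I(year.0^2) +
                              (1 + year.0 + I(year.0^2) | subID),
                            data = data,
                            REML = FALSE)

# Equivalent formula written out by hand with the : operator, which
# requests only the single interaction term:
# rasch.FIM ~ AIS.grade + year.0 + I(year.0^2) +
#   AIS.grade:year.0 + AIS.grade:I(year.0^2) +
#   (1 + year.0 + I(year.0^2) | subID)

anova(conditional.model.1)   # Type 3 table of main effects and interactions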
That asterisk syntax is especially helpful when you have higher-order interactions: you can write variable1 * variable2 * variable3 and it will give you the fully factorial design, with all of the main effects and all of the interactions. Alternatively, if you want to specify things by hand, you can say: I want the main effect of time, I want the main effect of AIS grade, and I want the interaction between time and AIS grade, using the colon operator. So the asterisk operator gives you main effects and interactions; the colon operator gives you a single interaction by itself. Those are equally valid ways of writing the same formula (the sketch at the end of this section shows both spellings side by side), but to save space, and because we want the fully factorial design, I'm just going to use the asterisk when I input my fixed effects.

The random effects stay the same, because time varies within subjects but AIS grade does not: if you had a C1-4 injury at the beginning of the study, you had a C1-4 injury the whole time. It is a subject-level variable, so I don't need to change my random effects when adding it to the model. AIS grade goes into the fixed effects; the random effects stay as they always have been.

Once I have that model, I can run the code in R to create the model object and then analyze it with the anova() function. That again gives us a Type III analysis of variance table, which shows a statistically significant main effect of time, a statistically significant main effect of AIS grade, and a statistically significant main effect of time squared. That suggests people are changing over time, their rate of change also changes over time, and there are some differences between the groups on average. However, we do not see a statistically significant time-by-group interaction, nor a time-squared-by-group interaction. So although there are some initial differences between the groups, with some having higher intercepts and some lower, the trajectories for these groups are not reliably different. Individuals with paraplegia have a higher intercept than individuals with C5-8 injuries, who in turn have a higher intercept than individuals with C1-4 injuries; all of them change over time, but there is no significant linear or quadratic time-by-group interaction. That suggests the tilt and the curvature of their lines are all about the same, and most of the differences between our groups come down to the starting points at the intercept.

If we want to unpack those omnibus main effects and interactions properly, though (for instance, the effect of AIS grade has two degrees of freedom), we need to conduct some post-hoc tests or use the summary() function to look at the individual coefficients. That makes the model output a lot more complicated: now that we've added the main effect of group and its interactions with time, we have a much bigger model with many more coefficients to unpack.
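For reference, here is the formula-syntax equivalence described above, alongside the omnibus and coefficient-level calls for the conditional model (same assumed object names as in the earlier sketch):

```r
# These two fixed-effects specifications fit the same model:
#   year.0 * AIS_grade                     # '*' = main effects + interaction
#   year.0 + AIS_grade + year.0:AIS_grade  # written out by hand with ':'

# Type III omnibus F tests for the conditional model
anova(cond_model1)

# Individual treatment-coded coefficients for follow-up interpretation
summary(cond_model1)
```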
To give a quick overview of what this output actually means, it's important to remember that R uses what we call dummy coding, or treatment coding: one level of a categorical variable serves as the reference group and is coded as zero, and each level it is compared to is coded as one. The way to read the output, then, is that any coefficient without "AIS grade" in its name refers to the C1-4 group, because that is the alphabetically first level and is therefore what R treats as the reference group. The intercept of 12.28 has no AIS grade effect in it, so that is the intercept for the C1-4 group. Similarly, the effect of time (year.0) has no AIS grade in its name, so that is the effect of time in the C1-4 group, and the effect of time squared, -24.86, is the quadratic effect of time in the C1-4 group.

All of the other coefficients show the difference between the named group and the reference group. AIS grade C5-8 is the difference in the intercept between the C5-8 group and the C1-4 group: the C1-4s had an intercept of 12.28, and the intercept for the C5-8s was 6.8 points higher. Similarly, AIS grade paraplegia is the difference between the intercept for the C1-4s and the paraplegia group, whose intercept was 14.6 points higher.

That can be a little hard to unpack, but if you're familiar with treatment coding, hopefully it makes sense. If you're newer to treatment coding, I'd encourage you to go through Section 2.3, "Unpacking the Conditional Model," where I show you how to take each individual coefficient, plug in ones or zeros depending on how each group is coded, and simplify the model. Although there is a fair amount of math there, it is all very simple: you're just plugging in ones and zeros and reducing the equations. What you ultimately get are three equations wrapped up in our larger model: what the equation reduces to for the C1-4 group, how it is modified for the C5-8 group, and how it is modified for the paraplegia group.

We can see some of the differences from our ANOVA table represented in those three equations. There was a main effect of time, so on average people improved over time, and that checks out because all three equations have large positive linear slopes. There was also a statistically significant quadratic effect of time, such that, on average across groups, there was a negative curvature to the trajectories. Those effects were then modified by AIS grade: there was a main effect of AIS grade, which shows up in the intercepts of the three equations. The C5-8 group has a higher intercept than the C1-4s, and the paraplegia group has a higher intercept than the C1-4s, so we see group differences in the intercept, and that checks out because those intercepts look quite different from each other. The plug-in-and-reduce step is sketched below.
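To make the plug-in-and-reduce step concrete, here is a small sketch. The coefficient values come from the output described above, but the coefficient names depend on your factor labels, so treat the names here as assumptions:

```r
# Treatment coding: C1-4 is the reference group (alphabetically first),
# so its dummy codes are all 0; each other group flips one dummy to 1.
fe <- fixef(cond_model1)

# Group intercepts, built by plugging 0s and 1s into the dummy codes:
int_C14  <- fe["(Intercept)"]                              # ~12.28
int_C58  <- fe["(Intercept)"] + fe["AIS_gradeC5-8"]        # ~12.28 + 6.8
int_para <- fe["(Intercept)"] + fe["AIS_gradeparaplegia"]  # ~12.28 + 14.6

# The same logic applies to the slopes: each group's linear (or quadratic)
# slope is the reference slope plus that group's interaction coefficient.
```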
Now, we did not see a statistically significant group-by-linear-time interaction, which suggests that although the sample's linear trajectories all look different, they are not statistically different from each other: we did not find evidence in this model that your linear trajectory changed as a function of group. Similarly, we did not find evidence of a group-by-quadratic-time interaction, suggesting that the quadratic trajectories were not reliably different across groups either.

Again, I hope that feels a little bit familiar if you've worked with dummy coding in regression before. If you haven't, I'd encourage you to re-watch this section of the video and spend some time working through that section of the handout. Also, depending on how familiar you are with omnibus tests and the idea of conducting an omnibus F test followed by post-hoc tests, I'd encourage you to spend a little more time with Section 2.4, which shows how the anova() function gives us the omnibus tests and how the summary() function gives us at least some of the post-hoc comparisons we might be interested in.

I hope this additional explanation is helpful as you work through these handouts and run the R code. If you have any questions about anything in the handouts, anything conceptual, and especially any errors that came up while you were troubleshooting the R code, please bring them to our Zoom Q&A, and Al and I will be available to help you. Thank you very much, and I look forward to seeing you in the next part of the course.
Info
Channel: Keith Lohse
Views: 5,529
Keywords: R programming, mixed-effects regression, longitudinal data
Id: 2ETfg35yk04
Length: 68min 30sec (4110 seconds)
Published: Thu Sep 22 2022