Mixed Model Notation - A Simple Explanation

Captions
[Applause] [Music]

All right, people, buckle up. Today's gonna be the most mathematically heavy my videos have ever been, except for the first time I did HLM videos.

So you remember this formula: depression is equal to b0 plus b1 times stress plus e. Well, I kind of lied to you when I showed you that, because really it ain't that simple. It's actually depression sub i is equal to b0 plus b1 times stress sub i plus e sub i. What does that i represent? It represents the i-th individual. So if you are the 99th person in your data set, your i would be 99, so depression sub 99.

I know that's a little complicated and convoluted. Why do mathematicians have to make things so hard? I don't know, and we statisticians like to follow their lead. There's something about making something complicated that makes some people feel better about themselves. "Oh, there's something nobody else knows. I'm amazing." Loser.

All right, so let's look at a data set and see how this might work. Here we've got person, then their index, or their i, then their depression and their stress. So depression sub 3 represents depression for the third person, which is Andy. Poor Andy, he's got an 18.8 depression score. Next time you see Andy, give him a big hug and tell him you love him. He's having a rough time. If we were to write the equation out using subscript notation, we would say depression sub 3 is equal to b0 plus b1 times stress sub 3 plus e sub 3. And for Andy, that would be 18.8 is equal to b0 plus b1 times 7.7, because that is Andy's stress score, plus e sub 3.
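Written symbolically, the spoken equation and Andy's row (i = 3) look roughly like this. This is only a transcription of what is read aloud; b0 and b1 are left as unknowns because the video never estimates them.

```latex
\begin{align*}
\text{depression}_i &= b_0 + b_1\,\text{stress}_i + e_i \\
\text{depression}_3 &= b_0 + b_1\,\text{stress}_3 + e_3 \\
18.8 &= b_0 + b_1\,(7.7) + e_3
\end{align*}
```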
And I was too lazy to figure out what b0 and b1 are; if we knew those, we would know what the error was. But you get the idea.

So implicit in all the regression that we were doing, or all the linear models we were doing, we had that subscript. I just didn't talk about it because it wasn't important. Well, it's kind of important now, because with HLM we need those subscripts, and they make things a little bit more complicated. But not that much more complicated, I hope. Because now, not only do we have i individuals, we also have j clusters, or j groups. So if we have five therapists, j is going to be equal to 1, 2, 3, 4, and 5. Our new outcome variable is going to be depression sub ij, which is the depression of person i within cluster j. And that allows us to model different slopes and intercepts for each and every group.

So let's look at an example. We have depression sub ij is equal to b0j plus b1j times stress sub ij plus e sub ij. "Are you kidding me? I swear, statistics will be the death of me. That is scary." I get it. I know this is really weird and really funky, but it kind of has to be that way.

So we have the b0j. What does that mean? Well, b0 in regression means intercept, and the j basically means there is a separate intercept for every j, or for every cluster. If we were to break that down even further, we could even write an equation for b0j: b0j is equal to gamma 00 plus u0j. "Gamma? We're using gamma now? Oh boy, headache's coming on. It's all Greek to me." And that's okay. I don't know why statisticians decided to make things complicated, but they did. In this case, gamma 00 means the grand intercept. It is so grand it would wear a top hat to a party. "Oh, please stop." Or, in other words, the average intercept across all groups.

Remember, I lied to you earlier. I said it was a little more complicated than that, and that we don't have fixed effects if we have a random slope. Well, actually, we do have fixed
effects even when we have random effects, and that's what gamma 00 is: it is the fixed effect of the intercept, or the fixed intercept. Let me say that again: gamma 00 is the fixed intercept, meaning that when you estimate every single person's equation, every one of them will have a gamma 00 in their equation. So gamma 00 is the fixed intercept, or the average intercept across all groups.

And what is u0j? u0j is nothing more than the deviation of the j-th group from the fixed intercept. So let's put this in practical terms. Let's say the fixed intercept is 10, but for this particular therapist, within his group, the intercept is 11. So what is his u0j? It's plus one, because again, u0j represents the deviation of a cluster from the fixed effect. So that's how you would handle the intercept.

Now let's look at the slope. Before, we had a b1j, and if we were to break that down, we have b1j is equal to gamma 10 plus u1j. So why is it one-zero when the intercept is zero-zero? The first subscript refers to which parameter it is. You remember when we did basic regression we had b0 plus b1; b0 is always the intercept, and so gamma 00 represents the zeroth parameter in the model, which is the intercept. Then the second subscript is always going to be zero for the fixed parameters, just to indicate that these are the fixed parameters. Just like before, we have a u, but this time instead of 0j it's 1j, because it's referring to the slope.

Let's say that the intercept is 10, and let's say that the intercept for that group is 11. Then gamma 00 is going to be 10, and u0j for this particular group is going to be plus one. And let's say this slope here is going to be 0.5, and this group has a weaker slope, so its deviation is going to be a negative number. Let's say that its slope, instead of being 0.5, is 0.25. So its
value of u1j is going to be negative 0.25.

I know that's complicated, and I know it's tricky to think about the notation, but as Richard Feynman said, nobody understands math, people just get used to it. So just get used to it. Spend some time looking over it, watch this video over and over again, inflate my view counts as you wish, and it'll come to you.

So taken together, if we look at this equation all together, we would say that depression is a function of a few things. One, it's a function of an intercept that differs between therapists, meaning that there is a different intercept for each and every therapist. We also have a slope, and that slope has a fixed effect that is common among all therapists, but there's also a different slope for each and every therapist. And then finally we get to the e sub ij, which basically says that every individual (that's where the i comes from) has their own residual, and they belong to a particular group.

And here's a handy little visual that shows gamma 00 as the grand intercept, and then u0j, which is therapist j's deviation from the grand intercept. Then we have gamma 10, which is the grand slope, or the fixed-effect slope, and then we have u1j, which is therapist j's deviation from that grand slope. Isn't it lovely?

So let's go ahead and look at an example. Here we have an alcohol use data set. This is a real data set, and we are looking at alcohol consumption among youth. We have an id variable indicating which person it is; that is our cluster indicator. We have an age variable, 14, 15, and 16, indicating the age at which we asked them these questions. coa is child of alcoholic. male is whether they are male: 0 means they are female, 1 means they are male. Then age_14 represents the number of years since they were 14, which is redundant with the age variable, but you get the idea. Then we have alcuse, or alcohol use, which I believe represents the average number of drinks a week, and then we have peer, which is the
average number of alcoholic drinks your peers consume. And then we have two other variables; the c stands for centered, and we're not going to get into centering in these videos.

Once again, we have an id. Notice that it is repeated. An id that is repeated is a good indication that you should probably do mixed models. Notice also that the only things that change from time 1 to time 2 to time 3 within an individual are age and the number of drinks they consume. By the way, you can only model something as a random effect if it varies within a cluster. If every score within a cluster is the same, you can't model it as a random effect.

And so we might ask a question: to what degree does alcohol consumption increase as a function of age? You might also ask, of all the predictors we have, which is the best predictor of how much these youth will drink? Could it be peer influence? Could it be their age? Or could it be the fact that they were or weren't a child of an alcoholic? These are questions we can ask using hierarchical linear models.

So I'm going to show you some R code. This isn't my R code video; I'm going to go more into details in the next video, but I'm going to show you some R code just to give you some exposure. The first model we're going to fit is a random effects ANOVA. That's always a good starting point, and you'll learn why in a future video. In this model we don't use any predictors. We just fit a fixed-effect mean and see how much these groups differ on, in this case, their average alcohol consumption.

If we were to look at this in mathematical notation, it would look like this: alcuse sub ij is equal to b0j plus e sub ij. Again, there are no predictors here. If we were to expand b0j, we would have gamma 00, which again is the fixed intercept, plus u0j. And of course we could put all these together and just replace b0j with the full equation, and you would get something like this: alcuse sub ij is equal to gamma 00 plus u0
j plus e sub ij. Wow, that is so much fun to say. Next time someone challenges you to a tongue twister, challenge them to say "alcuse sub ij is equal to gamma 00 plus u0j plus stress ij plus e sub ij." Nailed it.

Fortunately, all the tools within flexplot we've been using so far, we can use here too. So here's some R code that loads the data set alcuse, which is a part of the flexplot package. And here's the R code we would use to fit a random effects model. Notice there are no predictors, but we do have to have a 1 in there; the 1 is the way that we tell R that we're fitting an intercept. To the left of the parentheses are the fixed effects, and inside the parentheses are the random effects. So we are telling R that we want a fixed effect, or gamma 00, and we are also telling it that every id is allowed to have its own mean. This 1 refers to gamma 00, and this 1 right here refers to u0j, and taken together those become our b0j.

The lovely thing about flexplot is we can visualize this. If we type in visualize, mod, and then plot equals model, we get something that looks like this, and we see very clearly that the average alcohol use differs from cluster to cluster.

We can even look at the results of what R gives us. I don't actually spend much time looking at this, because I've got some functions in flexplot to make it easier to conceptualize what's going on. But for your information, this right here, the 0.5731, is the variance of each person's mean around the fixed-effect mean. And this right here, the 0.5617, is the variance of each individual's scores around their own mean. Maybe it's best to understand this visually. The 0.5731 represents the deviation of each group's mean from the fixed effect. The 0.5617 represents the deviation at each time point from their average across the three time points. And right there is the fixed effect, which is saying that on average, across all people, across all time points,
we average 0.922 drinks a week. Lovely. So that is a random effects ANOVA. Again, a random effects ANOVA has no predictors.

Let's go ahead and look at a random intercepts model. With a random intercepts model, we now add a predictor, and if we have a random intercept, we allow the intercepts to vary but the slopes are fixed. Notationally, this is what it looks like: alcuse sub ij is equal to b0j plus b1 times age_14 sub ij plus e sub ij. If you were to expand the b0j (by the way, we call that the level 2 model), we would say b0j is equal to gamma 00 plus u0j. So again, the intercepts are allowed to vary but the slope is fixed. And if we were to put it all together, this is what it would look like. Yay, hallelujah.

And then if we were to look at it in R, this is what the equation would look like. Again, that age_14 tells R that there is a fixed effect, because it is outside the parentheses, and then within the parentheses we have a 1 that tells R that the intercepts are allowed to vary across ids. If we were to use just summary of rand.intercept, we would get something that looks like this. Again we have an estimate of the variability of each group from the fixed mean, 0.5966, and then we have an estimate of how much each individual differs from their average across the time points; that's 0.4915. And now, instead of having one fixed effect, which was the intercept, we have two fixed effects: 0.65 for the intercept and 0.27 for the slope.

So what is this telling us? This is telling us what happens on average at the age of 14, because remember, an intercept is whatever your zero point is. Remember, there was a column that had 0, 1, and 2 represent 14, 15, and 16.
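As a rough illustration of how the two fixed effects just reported combine, the sketch below plugs the 0.65 intercept and 0.27 slope into the random-intercepts equation for the three coded ages (0, 1, 2). The function name and the example deviation u0j = 0.4 are made up for illustration; they do not come from the video's output.

```python
# Fixed effects read off the random-intercepts summary in the video.
GAMMA_00 = 0.65  # fixed intercept: average drinks at age 14 (age_14 == 0)
GAMMA_10 = 0.27  # fixed slope: average change in drinks per year


def predicted_alcuse(age_14, u0j=0.0):
    """Random-intercepts prediction: (gamma_00 + u0j) + gamma_10 * age_14.

    u0j is one person's deviation from the fixed intercept; 0.0 gives the
    fixed-effect (average) line. The residual e_ij is left out on purpose.
    """
    return (GAMMA_00 + u0j) + GAMMA_10 * age_14


for age_14 in (0, 1, 2):
    average = predicted_alcuse(age_14)
    # Hypothetical person whose intercept sits 0.4 drinks above average.
    person = predicted_alcuse(age_14, u0j=0.4)
    print(f"age {14 + age_14}: average {average:.2f}, example person {person:.2f}")
```

Both lines rise by the same 0.27 per year; the example person's line is simply shifted up by their u0j, which is what "random intercept, fixed slope" means.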
That coding is one of the reasons for the column: it makes the intercept easier to interpret. So this 0.65 tells us the average number of drinks across all people, across all clusters, at the age of 14. On average, people who are 14 in this sample average 0.65 drinks. And the 0.27 means that for every year they got older, they increased the number of drinks they had by an average of 0.27. Nice and lovely.

And of course we could visualize this using flexplot, and we get a graphic that looks like this. Again, we have a solid black line, which represents the fixed effect, and then we have the different colors representing the line for each and every individual.

So let's go ahead and do a random slopes model now, which means we're going to fix the intercept. Again, you normally wouldn't do this, but just so you know how to do it, here's how. Here's what our equation looks like now. We don't have b0j, because we're saying that the intercept is fixed, but instead we have b1j. If you were to expand b1j, the level 2 model becomes gamma 10 plus u1j; that is, we have a fixed slope, which is gamma 10, and then a deviation from that. If we were to put that all together, we would just replace b1j with gamma 10 plus u1j.

And if we were to look at this in R, what would it look like? Again, we have age_14 corresponding to the fixed effects, so we are telling R that age_14 has a fixed effect. But within the parentheses we have a negative one, which is the way that we tell R that the intercept should not be random, and we also have age_14, which is random. The age_14 here corresponds to gamma 10, and the age_14 here represents this parameter, u1j, which basically says that every individual is allowed to have their own slope. Lovely, lovely, lovely.

If we were to do a summary of this, we would see that we don't have any variability about the intercept, again because the intercept is fixed, but now we have variability about the slope. And so that 0.225 basically
says that people's slopes differ, and the degree to which they differ is represented by the variance, which is 0.225. If we were to visualize this, we would see something that looks like this. Again, we are fixing the intercept to be the same. It's kind of weird, but we can do it, and we see that the slopes are varying.

And of course we naturally want to do a random slopes and intercepts model. So now we have b0j and b1j, and if we look at the level 2 models of each of these, we would have a gamma 00 for the intercept and a gamma 10 for the slope. If we were to put it all together, this is what it would look like. Hooray. These terms right here represent the parameters for the intercept, and these right here represent the parameters for the slope.

And to do it in R, this is what it would look like. Again, everything outside of the parentheses is a fixed effect, so we have a fixed slope and a fixed intercept, and within the parentheses we have a random slope. Now, we don't have a 1 in there, and the reason is that in R you don't have to specify a 1: if you add a predictor, R assumes that the intercept is also random. So you have to explicitly tell it minus one if you don't want it. If you do want it, it's going to implicitly be there whenever you have any sort of predictor in the model.

So now we can again look at the variability about the intercept. Right there, 0.6355 represents the degree to which individuals vary in the amount of alcohol they consume when they are 14 years old, because again, when they are 14 years old, that is the intercept value, or the value at which age_14 is equal to zero. And then the 0.1552 for age_14 represents the degree to which these young individuals vary in how much their alcohol consumption increases as they age.

By the way, it might be helpful to look at the two outputs side by side. On the left we have a model that has just a random intercept and a fixed slope, and on the right we
have random slopes and random intercepts. You'll notice the difference in the residual error, or sigma. When you do not allow the slopes to vary, you have a larger residual, and that makes sense: if you're forcing everybody to have the exact same slope, your predictions are going to be poor. They're not going to be as close to the individual dots. On the other hand, in this one we have a much smaller residual error, and that again is because each individual is allowed to have their own fit. And here is what it would look like if we were to visualize it. That looks all pretty and stuff. Again, the black line represents the fixed effects, and the colored lines are the random effects for three randomly selected individuals.

So let me summarize for you, because that was complicated. I understand that, and I'm sorry it's complicated, but it kind of has to be complicated. Remember, a fixed effect means that the effect is applied to every individual regardless of cluster. We might call it a common slope and a common intercept. And by the way, I don't know if this was clear, but every time you model a random effect there is also going to be a fixed effect. Each individual cluster might have a different slope, but there's always going to be an average slope across all clusters, and that's what we call the fixed effect.

And then we have random effects. You include a random effect when you expect that the clusters might deviate from the average slope or intercept. Do you expect each cluster to have its own slope? Add a random effect. Do you expect your clusters to have different intercepts? Add a random effect. And like I've said before, typically we don't add a random effect for categorical variables, though you could.

So with that, let's review our learning objectives. Number one, understand subscript notation in regression. Again, we use an i to indicate which row we are at in the data set. Number two, understand subscript notation for mixed models. So now, rather than having just an i,
we also have a j. The i represents the individual; the j represents the cluster. Number three, understand the following: random effects ANOVA, which again means you don't have any predictors and you're just modeling the differences in the means; random intercepts model, where the intercepts are allowed to vary but the slopes are fixed; random slopes model, which means the lines originate at the same point, which is kind of wonky, but their slopes are allowed to vary; and of course a random slopes and intercepts model, where both the slopes and the intercepts are allowed to vary.

Next, understand what fixed and random effects mean. A fixed effect means that every individual in the sample gets that value in their equation, and it is represented again by that solid black line. So when determining everybody's score, that line is added to their score. Number four, understand what gamma 00 and gamma 10 mean. Again, gamma 00 is the fixed intercept and gamma 10 is the fixed slope.

Also, have a basic understanding of how to do a random effects ANOVA in R, how to do a random intercepts model in R, how to do a random slopes model, and how to do a random slopes and intercepts model. And finally, have a basic understanding of visualizing these models in R. I say a basic understanding because we're going to revisit that in the next video. Until then, peace out.
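The four models in the learning objectives can be read as special cases of one equation, depending on which deviations are switched on. The sketch below is not the R code shown in the video; it is a plain-Python illustration with hypothetical names, reusing the therapist numbers from earlier in the transcript (fixed intercept 10, fixed slope 0.5, one cluster with u0j = +1 and u1j = -0.25). The lme4-style formulas in the comments are an assumption about what the on-screen code looked like, based on how it is described aloud.

```python
def predict(x, gamma_00, gamma_10=0.0, u0j=0.0, u1j=0.0):
    """Prediction for cluster j at predictor value x (residual e_ij omitted).

    b0j = gamma_00 + u0j   # intercept for cluster j
    b1j = gamma_10 + u1j   # slope for cluster j
    """
    b0j = gamma_00 + u0j
    b1j = gamma_10 + u1j
    return b0j + b1j * x


# Illustrative values from the transcript's therapist example.
G00, G10, U0J, U1J = 10.0, 0.5, 1.0, -0.25

models = {
    # No predictor, intercept varies. Assumed formula: alcuse ~ 1 + (1 | id)
    "random effects anova": dict(gamma_00=G00, u0j=U0J),
    # Fixed slope, intercept varies. Assumed formula: alcuse ~ age_14 + (1 | id)
    "random intercepts": dict(gamma_00=G00, gamma_10=G10, u0j=U0J),
    # Fixed intercept, slope varies.
    # Assumed formula: alcuse ~ age_14 + (-1 + age_14 | id)
    "random slopes": dict(gamma_00=G00, gamma_10=G10, u1j=U1J),
    # Both vary. Assumed formula: alcuse ~ age_14 + (age_14 | id)
    "random slopes and intercepts": dict(
        gamma_00=G00, gamma_10=G10, u0j=U0J, u1j=U1J
    ),
}

for name, params in models.items():
    values = [predict(x, **params) for x in (0, 1, 2)]
    print(f"{name}: {values}")
```

Setting a deviation to zero is the same as saying that piece is fixed: every cluster then shares the gamma value for that parameter.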
Info
Channel: Quant Psych
Views: 1,877
Rating: 5 out of 5
Id: eVuQlGDZxHs
Length: 20min 41sec (1241 seconds)
Published: Wed Mar 10 2021