Mixed Models, Hierarchical Linear Models, and Multilevel Models: A simple explanation

Captions
Hey friends, welcome back. Today we're talking about hierarchical linear models, or mixed models, or multilevel models; whatever you want to call them, we're going to talk about them.

We use multilevel models, or mixed models, when we have non-independent data. Remember the independence assumption, the one we mentioned but didn't spend much time on? Well, it's kind of important. What does the assumption of independence mean? It basically means your scores are not correlated with one another. And no, I'm not talking about your X and your Y, your independent and your dependent variable; those are supposed to be correlated. We want them to be correlated. What I'm talking about is within a variable: the first person you record an X for should not be related to the second person you record an X for. In other words, whatever person one does shouldn't influence person two. But sometimes that's not an easy assumption to defend. Sometimes we measure the same person multiple times. Maybe we measure family members, and family members tend to be correlated with one another. Maybe we measure students who are taught by the same teachers, or people from the same region. All of these instances mean we might have correlated data.

Now let's talk about why this independence assumption is important. Let's use an example. Let's say I believe I can bend the roll of a die to my will and make it come up one. So I roll it five times and get three ones. Bam, I'm amazing! In other words, 60% of the rolls were ones. That's pretty good. So you might ask yourself: what is the probability of that sort of thing happening by chance? It turns out the probability of getting a one three or more times is about .03. Cool, that's a rare event. But I was really hoping for a really, really low probability; that would be really cool. I know that if you have a larger sample size, your p-values will tend to go lower, so maybe you consider two options. Option number one: you roll the die ten more times, so instead of rolling five times you roll 15 times. Are you going to get exactly 60% ones again? Probably not. Maybe this time you only get one additional one; that's not unlikely. So what is your probability now? At this point it's about 0.23. In other words, you have a 23% chance of getting four ones out of 15 (well, technically four or more out of 15). Still rare, but not super rare. So that's one way to increase your sample size. But you might be thinking: if the only thing stopping me from an amazing p-value is a larger sample size, why don't I just double or triple my sample size? If you got three out of five the first time, why not just pretend you rolled again and got three out of five the second time, and three out of five the third time? Then you would have nine ones out of 15 rolls.
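(Not part of the video: the dice probabilities quoted here are easy to check with a binomial calculation. Here's a quick sketch using `scipy.stats`, assuming a fair die, so the chance of a one is 1/6.)

```python
from scipy.stats import binom

p_one = 1 / 6  # probability of rolling a one on a fair die

# P(3 or more ones in 5 rolls); sf(k) gives P(X > k), so pass k - 1
print(binom.sf(2, n=5, p=p_one))   # about 0.035, i.e. roughly the .03 quoted

# P(4 or more ones in 15 rolls)
print(binom.sf(3, n=15, p=p_one))  # about 0.23

# P(9 or more ones in 15 rolls): the "duplicate your data" scenario
print(binom.sf(8, n=15, p=p_one))  # about 0.0002
```

Note how the duplicated-data scenario produces a dramatically smaller probability even though no new information was actually collected.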
If you compute a probability now, your p-value would be .0001 or something like that. Awesome: you didn't have to go through the effort of actually rolling the die that many times, and you got a lower p-value. Win-win, right? No. That's what we call cheating, folks. Why is it cheating? Think of it this way: if your original sample had 60% ones, what's the probability your second sample will be 60% ones? The probability is one; you have a 100% chance of it being exactly the same, because you duplicated your data set. On the other hand, the probability of getting that exact same result on independent rolls is going to be very different. So anytime you're using probability to make decisions, you can't just duplicate your data set. That's cheating, because the probabilities you're computing aren't the same as what you think you're computing. The probability of getting 60% ones on independent dice rolls is very different from the probability of getting 60% when you have duplicated your data set.

Now, why in the world am I talking about duplicating a data set? We all know that's wrong. Hopefully we do. But here's the thing: when we have a correlated data set, like husband and wife, or brother and sister, or time one and time two, we are effectively doubling our sample size. You find a relationship with the husbands, then you go and collect the wives, and guess what? You're probably going to find the same relationship. Not because the relationship is really there, but because whatever you found with the husbands is probably going to show up with the wives, since they're not independent. And by the way, of all the assumptions you could violate (homoskedasticity, normality, linearity, independence), the worst one to violate is the assumption of independence, because you are artificially inflating your probabilities. Again, we are artificially multiplying our data set; those are not
truly independent observations.

But it happens: sometimes we have correlated data. Sometimes we have husbands and wives, sometimes we have people measured more than once, sometimes we have clusters of clients within therapists, or something like that. So we need a way to deal with it; we need statistical procedures that explicitly account for the correlated nature of the data.

Back in the day, what we used to do is called a repeated measures ANOVA. We used it by temporarily pretending each subject is a categorical variable and modeling that categorical variable, so each subject has a subject effect, and then at the end of the procedure some sort of correction is applied. It's really complicated, really confusing, and not fun to deal with at all. And at the end of the day, that correction assumes something called sphericity. What is sphericity? Don't worry about it; it would be really complicated to explain, and we don't have to worry about it when we're using mixed models. Also back in the day, instead of a repeated measures ANOVA, people would use what's called the multivariate approach, which is likewise really complicated and really confusing, and you don't need to know it. I'm just giving you some appreciation for how amazing mixed models are. Fortunately, in the 1900s, mixed models (or hierarchical models, or multilevel models) were developed, and they've basically made repeated measures ANOVA completely obsolete. Thank goodness.

So what are hierarchical linear models? I'm glad you asked. We can call them hierarchical linear models, or multilevel models, or mixed models, or mixed effects models; they're basically all the same thing. We use mixed models, or HLM, when we have clustering. You might have clustering, for example, if you
have a bunch of clients who see the same therapist, or a bunch of students who have the same teacher, or the same person measured multiple times. Again, anytime you suspect your data are correlated, it's time for HLM, or mixed models, or whatever you want to call them.

Here's a handy visual. In this situation we have Dr. Russell, Dr. Smith, and Dr. Bean, each of whom has three patients (patient one, patient two, and patient three for Dr. Russell, and so on). If you were to see it as a data set, it might look something like this: you have patients, and you have "doctor," the variable that indicates the cluster. When you see that cluster variable repeating, that's a good indication you need mixed models. And like I said, we don't only have clustering when we have therapists or siblings or schools or classrooms; we also have it when we have people measured more than once. Here's an example where we have Dustin (that's me) measured three times (time one, time two, time three), as well as Matt and Lexi. Again, anytime you collect multiple measurements from the same person, you can't treat those measurements as if they're independent.

Remember the problem we talked about earlier, where you're effectively doubling your sample size? Not only can that happen, but you can also be misled in pretty dramatic ways. Let's look at an example. Suppose we have this image, which shows the relationship between symptom severity and the proportion of people who survive when they go to a hospital. From this graph, there actually seems to be a slightly positive relationship: the more severe your symptoms are, the more likely you are to survive. Awesome, let's wait until we're on our death bed before going to the hospital! That would be a bad idea. Why? Well, let's see what happens when we color code the data. Here we have red, green, and blue indicating the different hospitals: red is In and Out, green is Cypress, and blue is University Medical. And it seems to be
that University Medical only accepts patients with super severe symptoms; maybe it's a specialty hospital with top-of-the-line experts. The In and Out hospital, on the other hand, tends to get people with less severe symptoms. And within a cluster, what you actually see is a negative relationship, as expected: the more severe your symptoms, the lower your probability of surviving. We wouldn't know that unless we used hierarchical linear models.

What hierarchical linear models, or mixed models, do is essentially fit a separate regression line for each and every cluster. Looking back at our hospital example, notice there's a line for In and Out, a line for Cypress, and a line for University Medical. Then the model estimates what we call the fixed slope, which you can think of as roughly the average slope between severity and survival across all hospitals; that's the solid black line. The colored lines we call the random effects, and the thick black line we call the fixed effect.

Let me summarize what we've covered so far. If we treat individuals within a cluster as if they're independent, we're probably going to screw things up one way or another: either we're going to inflate our probability estimates, or we're going to miss the nature of the relationship entirely. We don't want that, because the purpose of data analysis is to find out what your data are trying to tell you. So HLM, or mixed models, more or less fit separate regression lines for each and every cluster. In reality it's a little more complicated than that, but not much. The complication, which you don't really have to know, is that the model estimates a line per cluster but tends to pull each line toward the fixed effect, especially when the cluster has a small sample size. Every cluster gets its own regression line. Simple enough, but we also have some
decisions to make. We can decide that we don't want the slopes to vary between groups, that all the slopes should be exactly the same. Or maybe we want to say: the slopes can vary, but the intercepts have to be fixed, always starting from the exact same point. You can do that. The parameters we fix we call fixed effects, and the parameters we let vary we call random effects. A random effect means that every single cluster's parameter is going to be different. (By the way, I'm slightly misleading you at this point just to simplify things, but we'll complicate it later, and that's okay.) So again: fixed effect means everybody gets the same slope, or intercept, or whatever; random effect means every cluster has its own unique slope and intercept.

I'm going to show you a bunch of examples of what it might look like to fix certain parameters and let others freely vary. Here's what we would call a random slopes and random intercepts model. These are probably the most common; very often people want both the slopes and the intercepts to vary across clusters. Not always, but very often. In this situation we have a panel per group (group one, group two, and group three), and notice that each has its own unique regression line. So this model has no fixed effects. (Again, I'm lying to you; there is a fixed effect here, but I'll complicate that later, not right now.) And here's an example of a random slopes model, which means we have fixed the intercept to be the same for everyone. Notice that those lines all basically start at the same point. The only reason they don't exactly start at the same point is that the x-axis doesn't go to zero; if it did, you would see all those lines start at the exact same spot. By the way, it's very rare to fit this sort of model; normally you don't want to fix the intercept like this. But conceptually I want to show you what it looks like when you fix
different parts of the model and let other parts freely vary. So again, here the intercept is exactly the same (all groups start from the same place) but the slopes are allowed to vary. Theoretically, this says each cluster has a different slope but the same intercept. I can't think of a theoretical situation where that would actually make sense; wanting to fix the intercept but not the slope seems kind of weird to me, but you might be in a situation where you need to do it. So once again: here the slopes are random, because they vary across groups, whereas the intercept is fixed, because it's identical across groups.

Now let's look at an example of a random intercepts model. In this model, the intercepts are allowed to vary but the slopes are fixed to be exactly the same. This is pretty common; it's not unusual to fix the slopes. With this model, each cluster has its own unique mean (its random effect mean), and that mean deviates from what we might call the grand mean, or the global mean, or the fixed effect mean. (That's the complication I'll get to in a minute.) Here the slopes are fixed, which theoretically says that the relationship between your predictor and your outcome is identical across groups; there is no variability in the slope. That's not an unreasonable assumption, especially if you have categorical data. Here's an example where we're modeling a fixed effect of male versus female differences. Maybe Y is height (of course people aren't 15 feet tall, but you get the idea), and maybe groups one, two, and three represent different schools. There's no reason to suspect that the average height difference between males and females in one school should be any different from the average difference in another school, so it makes sense to have a fixed slope here. I'm sure you're asking at this point:
okay, great, we can fix slopes and we can fix intercepts, but how do I decide? Generally (but not always) we allow the intercepts to vary, and whether you fix the slopes depends on theory. In either case, always let theory be your guide. Basically, whenever you add a predictor, ask yourself: do I expect this slope to be consistent across clusters? If so, model it as a fixed effect only, which again means everybody gets the same slope regardless of which cluster they belong to. If not, if you think the slope might vary across clusters, add a random effect, which again means every cluster gets its own slope. If you don't have theory to guide you, or you can't quite tell: categorical variables very often have fixed slopes. Going back to our previous example, the height difference between males and females should be consistent regardless of the cluster. Numeric variables usually get random slopes. For example, the relationship between anxiety and depression is probably stronger for some therapists and weaker for others, depending on their expertise in treating anxiety or depression. But you can always check empirically; you can find out whether the data support a fixed versus a random effect, which is exactly what I'm going to show you, but not in this video.

With that, I'm going to quiz you. Put your thinking caps on, folks, because we're going to do a quiz. I'm going to show you a bunch of graphs, and you tell me, based on the graph: are the slopes fixed or random? Are the intercepts fixed or random? Let's look at this one. "Leave a comment with your best guess." I always thought that was really stupid when people say that, because you're going to watch the end of the video and find out the answer, so what's the point of
leaving a comment? Anyway, don't bother leaving a comment with your guess, because I'm going to tell you right now.

All right, graph number one. What do we have here? Well, the slopes are certainly varying, and if we were to extend the x-axis over to zero, I'm betting the intercepts are fixed. So we would call this a random slopes model. How about the next one? This one shows height differences, so the intercepts are certainly varying, but the slopes are identical. (In fact, they're identical when they probably shouldn't be; look at the data, they're telling you to let those slopes vary.) So this would be a fixed slope, random intercept model. Next one: again, it's hard to tell unless we extend the x-axis all the way to zero (in retrospect, I probably should have made the scale go to zero so you could actually tell). It could be a fixed intercept model, but my best guess is that the intercepts are not fixed, because if you look at the far right panel, those two lines are almost touching, which means the lines don't originate at the same point. So that's going to be a random slope and random intercept model. Now the last one. That's again a hard one to tell. I would guess it's a fixed intercept model, because it looks like those lines will all converge on about the same point.

So now that we've concluded our very unclear example of deciding whether the intercept is fixed, let's review our learning objectives. Number one: what is the assumption of independence? It means that people within your sample are uncorrelated with one another. Again, you can have correlations between variables, but within a variable the scores should not be correlated. Number two: two reasons why violating the assumption of independence is problematic.
First, you are artificially doubling (or tripling, or quadrupling) your sample size, which means your probability estimates are going to be inflated, so you're much more likely to make a Type I error. Second, the nature of the relationship may be masked. In our hospital example, the average relationship, ignoring the clustering, was slightly positive, but once we accounted for the clustering it was actually negative within every hospital. Number three: understand the difference between mixed models, hierarchical linear models, multilevel models, and mixed effects models. There is no difference; they're all the same thing, and the terms are used interchangeably. (There are some differences once you get to a super complicated level, but we're not going there, so for your purposes they're all the same.) Number four: understand geometrically what a mixed model is doing. All it's doing is fitting separate regression lines for each cluster, more or less; again, it pulls each line toward the fixed effect, but that's basically it. Number five: understand what fixed and random effects are. A fixed effect is the same for every single cluster, whereas a random effect differs across clusters; either the slope or the intercept can be fixed or random. Number six: know the visual representation of each of the following models: the random slopes and random intercepts model, the random slopes only model, and the random intercepts only model. And remember (maybe this is a learning objective, maybe not): we rarely fit a fixed intercept model in practice; it's mostly there to help you conceptualize the difference between a fixed and a random effect.

If there are any questions, leave them in the comments section, and I'll see you next time. If I had a nickel for every time someone violated a statistical assumption, I could afford not to live in a van down by the river.
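(Not part of the video: the hospital example can be sketched numerically. This is a toy simulation with made-up numbers; the hospital names and all parameters are hypothetical, chosen so that the pooled slope comes out positive while every within-hospital slope is negative.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical hospitals: sicker patients go to the hospital with
# better survival (the specialty hospital), so clusters differ in both
# average severity and average survival. Within each hospital, the true
# slope of survival on severity is negative.
clusters = [
    ("In and Out", 2.0, 0.60),
    ("Cypress", 5.0, 0.75),
    ("University Medical", 8.0, 0.90),
]

xs, ys = [], []
for name, mean_severity, mean_survival in clusters:
    x = mean_severity + rng.normal(0, 0.5, 50)
    y = mean_survival - 0.05 * (x - mean_severity) + rng.normal(0, 0.02, 50)
    xs.append(x)
    ys.append(y)

# Pooled regression, ignoring the clustering: the slope comes out positive.
pooled_slope = np.polyfit(np.concatenate(xs), np.concatenate(ys), 1)[0]

# Separate regressions per cluster (roughly what a mixed model does,
# before pulling each line toward the fixed effect): all negative.
within_slopes = [np.polyfit(x, y, 1)[0] for x, y in zip(xs, ys)]

print(round(pooled_slope, 3))                # positive
print([round(s, 3) for s in within_slopes])  # all negative
```

In practice you would fit a mixed model rather than fully separate regressions; with statsmodels, for instance, the call would look something like `smf.mixedlm("survival ~ severity", df, groups=df["hospital"], re_formula="~severity")` for random intercepts and random slopes (a sketch only; the variable names here are assumed).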
Info
Channel: Quant Psych
Views: 5,671
Rating: 4.886179 out of 5
Keywords: Statistics, Psychology, NHST, Null Hypothesis Significance Testing, Philosophy of Science
Id: c_tYZxQLoDA
Length: 18min 28sec (1108 seconds)
Published: Wed Mar 24 2021