Fixed and random effects with Tom Reader

Captions
Hello, I'm Tom Reader, and in this video I'm going to explore the difference between fixed and random effects in statistical analysis. You need to understand this if you're going to analyze your data correctly and avoid the terrible statistical sin of pseudoreplication. I'm going to assume that you have a working knowledge of the basic statistical tests which are taught in most introductory stats courses at university, and that you understand basic stats jargon like "factor" and "dependent variable".

First up, I should explain that all of the statistical tests you've encountered so far (t-tests, analysis of variance, regression) use a mathematical model to describe how the dependent variable, the response variable, is influenced by the effects in which you're interested. Those effects are caused by the independent variables, which include both factors and covariates. You may not have seen models written down, but you use them every time you run a test, and they represent your hypothesis about how the world works.

Now, it turns out that the way we define the model for any given situation depends on what we think the values of the independent variables mean. Much of the time we're implicitly assuming that the values of an independent variable are fixed and that they represent the entire population of values in which we're interested. So if the independent variable is a factor, for example, we study the factor levels which we consider to be interesting: there are no other levels to study, or we're just not interested in any other levels there might be. For example, we might be studying the health effects of smoking. Let's say we measure the blood pressure of smokers and non-smokers. Since everybody falls into one of these two categories (you either smoke or you don't), there are no other categories to worry about. Alternatively, we might be comparing acceleration among models of supercar, and we choose the two most popular models to assess because we want to know specifically which of these is quickest. Although there are other models of supercar, we might be happy to ignore them because they're not the focus of our study. In these situations we're dealing with what is called a fixed factor, or fixed effect.

Sometimes, however, it's more appropriate to assume that the values of an independent variable are drawn at random from a larger population of possibilities. For example, rather than setting out to study two particular models of supercar, I might be interested in variation in acceleration among models in general. Perhaps I want to test the idea that variation among models is relatively small compared with the variation caused by differences in the smoothness of gear changes or the reaction speeds of drivers. In this case I might need to select a sample of car models to be representative of the wider population of models which exist. If so, as long as I have replicate measurements of acceleration from each one, the models represent different levels of what we call a random factor, or a random effect.

Now, this might sound like we're dancing on the head of a pin. Surely the distinction between fixed and random doesn't really matter? Well, it does matter, for two big reasons. First, the mathematical models for the two different types of effect, fixed and random, are different, and the wrong model can lead to the wrong conclusion. Secondly, and perhaps more importantly, random effects are often overlooked altogether: people simply don't realize that they could be affecting their results. This can lead to the statistical sin of pseudoreplication. Most basic stats courses and textbooks, and most of the simple off-the-shelf tests you know about, assume that all the effects you're studying are fixed. But there are potential random effects all over the place, and you need to know how to spot them and how to deal with them.

So how do you spot a random effect? Well, most of the effects in which you're interested are going to be fixed. If the main purpose of your study is to test the effect of a factor or a covariate on your dependent variable, it's probably a fixed effect that you're looking at. Even when you are sampling factor levels from a larger population, your choice is probably highly non-random: you select the factor levels in which you're particularly interested. For example, in our supercar study we're probably more interested in the acceleration of specific models of car (is the Ferrari or the Lamborghini the fastest?) than in the general pattern of variation among models.

So fixed effects are not too tricky, but random effects are sneaky, and they crop up when you least expect them. Remember that study looking at the health effects of smoking? You might decide to take five repeat blood pressure measurements at hourly intervals from each participant, to help deal with measurement error and the fact that blood pressure fluctuates quite a lot over time. That's fine, but each group of five measurements comes from the same participant, so they're not statistically independent. You've introduced the possibility of a random effect of participant ID. Your participants are probably a random selection from a larger study population, and you aren't really interested in whether one particular participant has higher blood pressure than another. But you need to account for the fact that the repeated measures from each person will be correlated, because some participants will have naturally lower blood pressure than others for reasons which have nothing to do with smoking. If you don't deal with the random effect, you'll appear to have many more independent replicates than you really have. In other words, your study will be pseudoreplicated.

Okay then, so in this case perhaps I abandon the idea of repeated measures altogether, because it complicates things too much. But what if half of my participants have their blood pressure measured by one nurse and half by another? Every nurse will vary slightly in how they take the measurements, and although I'm not interested in comparing the performance of particular nurses, I've still got a potential random effect of the identity of the nurse. If one nurse deals with the participants who are smokers and the other deals with the non-smokers, my study is pseudoreplicated: there is only one true replicate (one nurse) per participant group, and any difference between the groups could arise purely because of that random effect of nurse. This would be a fatal flaw in my study.

From this example, hopefully you can see that random effects are really important, and you have to deal with them or you risk your analysis being meaningless. We've already seen that one way to remove random effects is simply not to repeatedly sample the same level of that effect. If every measurement comes from a different level of the random effect, then effectively each observation is statistically independent and you don't have any pseudoreplication. But defining what a level is can be tricky. The levels are often recognizable as different individuals, as with the individual participants in our blood pressure example, but that's not always the case. And anyway, as we saw in the example, there may be very good reasons why we want to take multiple measurements from the same individual.

The simplest way to deal with a possible random effect is to average across the data within each level. So in our example, if you have five blood pressure measurements from each person, you could take the average, and hey presto, you have a single estimate of that person's blood pressure and the random effect has disappeared. But this is throwing away a lot of data. A better alternative is to model the random effect properly in your analysis. To do this, you're probably going to have to venture beyond the simple statistical tests which are taught in basic stats courses. In particular, you're going to want to find out about mixed-effects models, which allow you to model a mixture of fixed and random effects. But that's a topic for another day.
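The repeated-measures problem described above can be made concrete with a small simulation. This is a hypothetical sketch, not something from the video: the group size, the five readings per person, the 130 mmHg baseline, the 5 mmHg smoking effect and both standard deviations are invented for illustration. It assumes a simple random-intercept model in which each participant has their own baseline blood pressure, and it compares the standard error of the smoker versus non-smoker difference when every reading is treated as an independent replicate (the pseudoreplicated analysis) with the standard error after averaging within each participant.

```python
import math
import random
import statistics

# Hypothetical simulation parameters (assumptions for illustration only).
N_PER_GROUP = 20        # participants per group (smokers / non-smokers)
REPEATS = 5             # repeat blood pressure readings per participant
BASELINE = 130.0        # assumed non-smoker mean blood pressure (mmHg)
TRUE_EFFECT = 5.0       # assumed true smoker minus non-smoker difference (mmHg)
SD_PARTICIPANT = 10.0   # SD of the random effect of participant ID
SD_READING = 5.0        # SD of reading-to-reading fluctuation within a person


def simulate_group(rng, mean):
    """Return a list with one list of REPEATS readings per participant."""
    participants = []
    for _ in range(N_PER_GROUP):
        # Random intercept: this participant's own underlying blood pressure.
        person_level = mean + rng.gauss(0.0, SD_PARTICIPANT)
        readings = [person_level + rng.gauss(0.0, SD_READING) for _ in range(REPEATS)]
        participants.append(readings)
    return participants


def se_of_difference(a, b):
    """Standard error of mean(a) - mean(b), assuming every value is independent."""
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))


rng = random.Random(2019)
smokers = simulate_group(rng, BASELINE + TRUE_EFFECT)
non_smokers = simulate_group(rng, BASELINE)

# Pseudoreplicated analysis: all readings pooled as if independent.
naive_s = [x for person in smokers for x in person]
naive_n = [x for person in non_smokers for x in person]
naive_se = se_of_difference(naive_s, naive_n)

# Averaging within each level of the random effect: one value per participant.
avg_s = [statistics.mean(person) for person in smokers]
avg_n = [statistics.mean(person) for person in non_smokers]
averaged_se = se_of_difference(avg_s, avg_n)

# Variance inflation implied by the assumed model (Kish design effect):
# 1 + (readings per person - 1) * intraclass correlation.
icc = SD_PARTICIPANT**2 / (SD_PARTICIPANT**2 + SD_READING**2)
design_effect = 1 + (REPEATS - 1) * icc

print(f"apparent replicates: {len(naive_s) + len(naive_n)}, "
      f"true replicates: {len(avg_s) + len(avg_n)}")
print(f"naive SE = {naive_se:.2f}, averaged SE = {averaged_se:.2f}")
print(f"design effect under the assumed model = {design_effect:.1f}")
```

With these made-up settings the naive analysis claims 200 replicates when there are really only 40 participants, and its standard error comes out noticeably smaller than the averaged one, which is exactly the overconfidence that pseudoreplication produces. Under this assumed model the squared ratio of the two standard errors should land roughly near the design effect, give or take sampling noise. Averaging is the simple fix mentioned in the video; a mixed-effects model with a random intercept for participant (for example `mixedlm` in Python's `statsmodels`, or `lmer` in R's `lme4` package) would instead keep every reading while accounting for the correlation.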
Info
Channel: University of Nottingham
Views: 61,852
Rating: 4.9486017 out of 5
Keywords: The University of Nottingham, Nottingham Uni, Nottingham University, Notts Uni, Uni of Notts, Higher Education, Russell Group, Learn, Study, Experience, Fixed and random effects, difference, statistical models, Tom Reader, TRANSFORM Statistics Project
Id: FCcVPsq8VcA
Length: 8min 9sec (489 seconds)
Published: Fri Jun 21 2019