Lecture 01: The General Linear Model

Captions
This video contains: a pink hippo puppet, an example about hearing loss, zombies eating brains, mention of storing body parts in a fridge, and the Dalek of Statistics. Some Lego characters may have been harmed during the making of this video.

Welcome to the first lecture on this module. Today is the first of a series of five lectures, maybe six, who knows, that's going to look at the "spine of statistics". What do I mean by the spine of statistics? In psychology and other sciences there's a process that's common to pretty much everyone. You start with some scientific question that you want to answer, one that's been driving you mad and making you lose sleep: "oh my god, I've got to know the answer to this question". If you're a normal person you just go on Google, but if you're a scientist you try to collect some data. So the whole process starts with something known as sampling: you go and collect some data about the question you're interested in. Once you've collected some data, you visualize it, plotting it in some meaningful way. We're not going to talk about that too much on this module, but you're going to rehearse it a lot in the interactive tutorials.

The big thing is that you then fit some kind of statistical model to the data, which tells you something meaningful. Normally there are two types of things we're interested in when we fit a statistical model. The first is estimation. This addresses the question "how big is the effect I've looked at?": how big is the effect of this variable on this other variable? You can do estimation in one of two ways; they're complementary, it's not either/or. The first is what's known as a point estimate: a single value that quantifies the effect one variable has on another, for example "as this variable changes by one, this other variable changes by two". The other way is with something known as an interval estimate. You may have heard of confidence intervals before; they're an example of an interval estimate. There you're saying, roughly (it's complicated what you might actually be saying, but we'll get into that), that the effect this variable has on the other falls somewhere between this value and that value. So it's not a single value; it's an interval within which you think the actual size of the effect is going to fall.

The other thing scientists do an awful lot of, which we'll cover in a couple of lectures' time, is hypothesis testing. You can set up your model in such a way that it tests a hypothesis you're interested in. This looks at an effect in a dichotomous way: does it exist versus does it not exist? We'll get into whether this is a good idea or not in due course, but scientists do it a lot. So when you fit a model there are two complementary things you can do with it: one is estimation, the other is hypothesis testing.
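To make the point-versus-interval distinction concrete, here's a minimal sketch in Python (my own illustration, not something from the lecture, with made-up data): it computes a point estimate (the sample mean) and an interval estimate (a 95% confidence interval) for the same quantity.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: some measured outcome for 20 people
rng = np.random.default_rng(42)
sample = rng.normal(loc=100, scale=15, size=20)

# Point estimate: a single value summarizing the quantity of interest
point_estimate = sample.mean()

# Interval estimate: a 95% confidence interval around that value,
# based on the t distribution and the standard error of the mean
ci_low, ci_high = stats.t.interval(
    0.95, df=len(sample) - 1,
    loc=point_estimate, scale=stats.sem(sample)
)

print(f"point estimate: {point_estimate:.2f}")
print(f"interval estimate: [{ci_low:.2f}, {ci_high:.2f}]")
```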
Whenever you fit a model you're assuming all sorts of things, and what you assume might depend on the model, but basically, for the model to be an accurate representation of reality, certain things need to be true. So whenever you fit a model, you should also be assessing whether there's bias in that model. Again, we've got a lecture in a few weeks' time that's all about bias: how you assess the assumptions of the model, and whether anything in the data might be biasing it. Bias could affect the estimation process, so it could bias the values you estimate in the model, but it could also bias the hypothesis-testing part, the tests that you do on the model. That's an important part of the process. Depending on the outcome of looking at your assumptions, you might want to fit a variant of your original model. There's a whole family of models known as robust models, which are, as the name suggests, robust to violations of assumptions.

This process is fairly constant whenever you're interested in any kind of scientific question. You start with a question; you operationalize that question in some way and collect some data (sampling); once you've collected the data, you visualize it and fit a statistical model; you can use that model either to estimate the effect or effects, or to test hypotheses; and you would always then test the assumptions of the model and look for bias, trying to assess whether the model is a reasonable representation of reality. If the answer to that question is no, you fit a different model, and at least on this module, the different models we're going to look at are a family of robust models. Throughout this module I want you to keep this process in mind, because we'll spend about five or six lectures going through its different parts, and after that all we're really doing is repeating the process for slightly different models. We'll look at that red blob in the middle that says "fit model" and ask what different types of models we can fit, but in every case, for every model that we fit, sampling will be involved, estimation will be involved, we'll look at hypothesis testing, we'll look at assumptions, and we'll look at fitting robust versions of the test. So there's a lot of commonality throughout all the lectures.

Within that framework there are, I think, five key concepts. If you understand these five concepts, you've gone a long way towards understanding what's known as frequentist statistics: the type of statistical models that psychologists (actually, not just psychologists) typically fit. The five concepts are: the standard error, parameters, interval estimates, null hypothesis significance testing, and estimation. Our job in the first few lectures is to go through these key concepts, because if you have them in place, everything else, to some extent (nothing's easy in statistics), will hopefully fall into place for you.

For today in particular, we're going to start by looking at the family of models that psychologists typically fit, which is
something known as the general linear model. Some of this may be familiar to you, but there's no harm in going over it again. We're going to talk about what the general linear model is. It has the name "general" for a reason: it's a model that can adapt to a variety of different situations, and we're going to look at how this particular model is a more useful way to think about psychological statistics than how it's often been taught or presented historically. We're also going to look at the estimation process: how, when we fit a model, we estimate parts of it. That's the basic plan for today.

As for your learning outcomes: hopefully you're going to understand that most of psychological statistics has a lot of commonality to it, so you only really need to learn one model and one process, and that model and process apply to any situation. All that changes is the exact form of the model; everything else around it stays the same. We'll look, as I said, at the linear model: what it is mathematically (not in too much depth), what it looks like visually, and how other models that might be familiar to you, or that you might come across in your scientific reading, are actually just variants of this one model. We're going to look at what model parameters, known as betas, represent. And we're going to have a very brief look, as a precursor to another lecture, at why we use sampling. Hopefully that's what you're going to get out of the next 45 minutes or so. Oh, and I forgot about this one: understand least squares estimation as well. Maybe.

I've been teaching statistics a long time, and I teach it to psychologists, and often it's not their favourite subject, let's put it that way; it's fair to say it's not always mine either. Because I love statistics (I was about to say I love it; I don't love it like I love my children, but I do love statistics), I puzzle over why some students hate it so much. One of the things I think scares people off is that statistics can sound very complicated. There are lots of jargony words that you've got to get your head around: people talk about things like parameters and (we'll come across this later) heteroscedasticity, and you think, "what are these long words, what do they mean?". It's very bewildering. People also thrust equations at you, and they all look different, you're told they all do different things, they've all got Greek symbols in them, and it's all a bit weird and bewildering. But actually there's a set of key concepts, and if you understand those key concepts, a lot of the peripheral stuff you can get away with not knowing, to some extent, or at least get to know once you've got the foundations in place. So the question is: can we make statistics a bit less like that?

I think the answer is yes, by focusing everything on the general linear model, which is a framework that encompasses, as I keep saying, most of the models that are fit in psychological statistics.
So what is the general linear model? Essentially it boils down to one equation. In its simplest form, all we are ever doing is predicting an outcome (some variable that we've measured) from a model, and that model, as we'll see over the course of this module, can expand and contract; just think of it as "a model" and worry about the specifics in specific cases. We predict the outcome from the model, and there'll be some error in prediction.

For example, you might want to predict aggression from use of video games; that's an example of some real psychological research. The outcome you're trying to predict is aggression, which you'd measure in some meaningful way, and the model you're trying to predict it from includes a variable measuring video game use, though it could have lots and lots of other variables in there as well. But there will always be error in prediction. If you want to say, "if you play video games for seven hours a week, this is how aggressive you're going to be", there'll be error in that prediction. We always have to bear in mind that these models never perfectly predict what goes on; there's always uncertainty in prediction, always error attached to it. So essentially we're predicting an outcome, and a little hat on the outcome tells us that it's an estimated or predicted value. We predict it from some variable or variables that we've measured, which we call predictors, and each predictor has what's known as a parameter: a beta value.
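In symbols, this is the equation the whole module revolves around. Writing the outcome for person i as Yᵢ and the predictors as X₁ᵢ, X₂ᵢ and so on, the general linear model takes its usual form:

  Yᵢ = b₀ + b₁X₁ᵢ + b₂X₂ᵢ + … + bₙXₙᵢ + eᵢ

and the predicted ("hatted") value is the same equation without the error term: Ŷᵢ = b₀ + b₁X₁ᵢ + … + bₙXₙᵢ.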
So let's do a quick zombie quiz. If you were to walk around campus, you'd notice that zombies walk among us; most of them in the maths and physics departments, to be fair, but they're around, believe me. Now, I was interested in whether the zombie faculty have different food habits from the human faculty. So I sat in some of the canteens around campus and took notes on whether people coming in chose to have potato chips with their food or brain chips, because we all know zombies like to eat brains. A researcher (me) counted how many humans and zombies chose brain chips or potato chips to accompany their dinner at the university canteen. I've been here a long time, so I'm well versed at accurately predicting whether someone is a zombie or a human; I can categorize faculty members as human or zombie, and we can assume that's correct. I just hung out in the canteen all day with my notepad: if they chose brain chips I ticked brain chips, if they chose potato chips I ticked potato chips.

Here are my data, essentially a contingency table. Within it I've got some humans and some zombies, and a record of what they ate: 28 humans ate brain chips and 42 ate potato chips, so among the humans, more ate potato chips than brain chips. For the zombies, 61 ate (or rather chose) brain chips and 57 chose potato chips, so similar amounts. My question is: how do I analyze these data if I want to test the hypothesis that zombies eat more brain chips than humans? You can do that little quiz in your head now and see what answer you come up with.

Probably, if you look at a lot of textbooks (quite possibly my own ones included), you'll find that they tell you to do a chi-square test. If you've got categorical data, and these are categorical data (we've got categories of zombies and humans, and categories of brain chips and potato chips; everything's categorical, we've just counted how many people fall into the combinations of categories), everything will tell you: count data, chi-square test. That's the law, and if you deviate from that model, literally all hell will break loose; Beelzebub will come up from Hades and destroy us all. Worse than that, the Dalek of Statistics will materialize. Now, obviously some people at the start of statistics modules get very freaked out, and I don't want to freak you out any more, but there is something known as the Dalek of Statistics. It was a renegade Dalek, and they needed a job for it to do, so they set the Dalek of Statistics the job of policing the models that people fit to their data. If you fit the wrong model, the Dalek of Statistics materializes and tries to exterminate you, and quite possibly exterminates everyone else on the planet. High stakes. But it's okay, because we know that if we have categorical data, we do a chi-square test. What I want you to note here, just for now (obviously this is not the be-all and end-all of everything), is the p-value associated with the test applied to these data: it's 0.121, so 0.12 rounded off. For lots of scientists, that would be the thing they interpret.

The thing is, I'm in a bit of a rebellious mood today, so I feel like seeing what happens if we do something else. Let's flout the laws of statistics and try a Spearman correlation. I'm going to try this because a Spearman correlation is something known as a non-parametric test, which everyone seems to think means it makes no assumptions (it does make some), and it looks for associations or relationships between variables, which is what a chi-square test does. So maybe we can use this without getting into too much trouble. Well, it turns out that if we do a Spearman correlation, we get, within rounding error, the same p-value. So even though lots of sources will tell you "categorical data: chi-square test", with a Spearman correlation you get the same result. What about Kendall's tau correlation, which is similar to Spearman's? Same result: a p-value of 0.122, within rounding error of the others. But maybe these are special in some way, because, as I said, they're known as non-parametric tests, so they don't have the stringent assumptions that other tests have. So maybe it's reasonable that you get the same result.
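If you want to reproduce the chi-square result yourself, here's a minimal sketch in Python using scipy (my own choice of tooling, not code from the lecture). Note that scipy applies Yates' continuity correction to 2x2 tables by default, so it's switched off here to get the plain chi-square statistic quoted above.

```python
from scipy.stats import chi2_contingency

# Rows: humans, zombies; columns: brain chips, potato chips
table = [[28, 42],
         [61, 57]]

# correction=False gives the uncorrected chi-square statistic
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")  # p ≈ 0.121
```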
Also, the Dalek of Statistics has not appeared, so there can't be anything too wrong with doing that, I guess. What about a Pearson correlation? This is what's known as a parametric test, and when you do parametric tests you'll have people tell you there are assumptions that have to be met; for example, the data have to be continuous. So, although I'm a bit nervous about doing this, I'm going to have a go with a Pearson correlation and see what happens.

"You have disobeyed the laws of statistics. You will be exterminated."

Oh. You get the same result. Oh... I might go and have a cup of tea. Well, that wasn't so bad: we got the same result, and the Dalek seemed a little cross, but not too bad. What about a t-test? Now, t-tests are supposed to be for comparing two means. We don't have any means; we've just got count data. So this is definitely radical, anarchic even, to even contemplate doing a t-test.

"I'm warning you: do not do the t-test."
But I really want to do the t-test.
"You will be exterminated."
But I really, really want to do the t-test.
"Do not do the t-test!"
Screw it, I'm going to do the t-test.
"ARGHHHHHHHH!"

What do you know: you get basically the same result in terms of the p-value. Okay, I'm going to push this a bit further: a one-way ANOVA. Now, this is a test you're supposed to apply when you have three means. We don't have any means, let alone three of them, so this should definitely, definitely, definitely mess up the whole fabric of the universe.

"You are insane."

Ah. Turns out you get basically the same result. What about a linear model, the thing we're going to be talking about, regression? You get the same p-value. A log-linear model? I don't even really know what that is, but it gives you the same p-value. What about a multilevel model?

"Now you're pushing me too far. I am going to explode the universe if..."

You get the same p-value. Well, this is weird, isn't it? If you look at lots of resources, they tell you there are certain tests for certain situations, but we've just done a whole variety of different models and they all give us basically the same p-value, which is the thing a lot of scientists will focus on.

Why is that? It's just weird, because certain textbooks often have really complicated flow diagrams in the back: what's your outcome variable, what type of variable is it, how many predictor variables do you have, what did your dog have for dinner, is it a Tuesday, is it sunny, is it rainy? You answer all these questions and it tells you the test you need to do. But basically that's all rubbish, because most of the time the only model you need to fit is some version of the general linear model. You don't need all this stuff; you don't need a flowchart. What you need to do is think about the variables in your general linear model. The only equation you ever actually need is the equation of a straight line, the equation of the general linear model, and if you understand that, essentially everything we do is just a contraction or expansion of that model. You just need to understand the idea that we're predicting an outcome variable from one or more predictor variables, and those predictor variables can differ in type (they can be continuous or categorical), but the model itself doesn't really change in essence.
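You can see this equivalence for yourself by unrolling the contingency table into one row per person, coding both variables as 0/1, and then running several of the tests from the demonstration above. This sketch (again my own illustration in Python/scipy, not code from the lecture) runs a Pearson correlation, a pooled-variance t-test and a linear regression on the same data; all three p-values come out around 0.12, matching the chi-square result.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_ind, linregress

# One row per faculty member: species (0 = human, 1 = zombie)
# and food choice (1 = brain chips, 0 = potato chips)
species = np.repeat([0, 0, 1, 1], [28, 42, 61, 57])
brains  = np.repeat([1, 0, 1, 0], [28, 42, 61, 57])

r, p_pearson = pearsonr(species, brains)
t_res = ttest_ind(brains[species == 1], brains[species == 0])
lm = linregress(species, brains)

print(f"Pearson:      p = {p_pearson:.3f}")    # ≈ 0.122
print(f"t-test:       p = {t_res.pvalue:.3f}")  # ≈ 0.122
print(f"linear model: p = {lm.pvalue:.3f}")     # ≈ 0.122
```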
So again, you can do a little mental quiz here. Within this model we've got some betas, or parameters, attached to the predictor variables. In general we could call these beta-n, where the n is just a number telling you whether it's attached to the first, second or third predictor variable. Have you learned before what that represents? If you've heard of the equation of a straight line, it should be familiar to you: it's the gradient of the line. But although that's what it is geometrically, statistically it represents the relationship between a predictor variable and an outcome. It tells us about the direction and strength of the effect between the two variables, between the predictor and the outcome, and, if the predictor is categorical, it can represent differences between means as well.

We also have a parameter, or beta, in there, beta-0, which is not attached to a predictor. This is known as the intercept, and it just tells us the value of the outcome variable when all of the predictors are zero: what's the baseline level of the outcome you get when all your predictors are zero?

So let's have a look at some examples of the linear model. This is an example that's very close to my own heart, because I like music a lot and I go to concerts a lot. Here are some photos of my wife and me at the Download Festival, quite a few years ago now, because this was before we had children; it was very muddy as well, which may explain why we haven't been back for a while. I go to concerts a lot, and periodically I worry about my hearing, because although now you can wear earplugs, I'm sufficiently old that when I was going to concerts as a teenager the idea of earplugs was ridiculous; you just didn't wear them (you probably could get them, I guess, but you didn't). So I worry that my hearing has taken a bit of a bashing.

We could look at what the effect of music volume is on how long your ears ring for. If you go to a concert, is there a relationship between the volume of that concert and how long your ears ring afterwards? Your outcome variable is how long your ears ring for, and your predictor variable is the volume of the concert. If we had lots and lots of people attending lots of different concerts, we could measure the volume of each concert (or maybe the volume in different parts of the room, because I guess that will vary a bit), giving us a measure of the predictor, and we could also ask people to report back on how long their ears were ringing for after the concert, in minutes, and plot these. We'd get a load of data points, each representing a person. Down here, for example, we've got two people: the volume of their concert was about 92 decibels, and their ears were ringing for something like 860 minutes, or thereabouts. We could look at everyone's data and then estimate the line that best represents that cloud of data. The linear model that we're fitting is the red line going through the cloud of data.
We've seen there are parameters for this model. The beta attached to volume is the slope of that line: the rate of change of ear ringing as the volume goes up, or how steep the line is. If beta is a large positive number, relatively speaking, it means the duration of ear ringing goes up very rapidly as the volume of the concert increases. If it were a negative value, it would mean the amount of time your ears ring was actually going down as the volume of the concert went up, which would be a bit strange, but it could happen. And if the line were completely flat, it would mean that as the volume of the concert increases, it doesn't affect how much your ears ring at all.

We've also got this beta-0, though, and to see what it means we've got to zoom out of the graph a bit. We've got our cloud of data and our line here, but if we were to extend this line backwards, right to the point where the volume of the concert is zero (no noise whatsoever), the question is: how long would people's ears ring for? That's what beta-0 represents: the value of the outcome when the predictor or predictors are zero. Beta-1 doesn't change; it's still exactly the same even though we've extended the model. But beta-0 here is nowhere near the data cloud, so it's a predicted value that we maybe need to take with a pinch of salt. It's saying: if you have no volume whatsoever, how long will your ears ring for? Clearly this value should be zero; if you have no noise, your ears shouldn't ring at all. But what we find when we estimate this model is that the value is actually -37.12. That's illustrative of the fact that it's a model with error: you'd expect the intercept to be zero (no ear ringing if you've had no volume), but in actual fact the model predicts that you'd have minus 37 minutes of ear ringing. It also illustrates that the values you get out of these models can be ridiculous. It makes no sense to have your ears ring for minus 37 minutes; what does that even mean to anyone? So always bear in mind, when you're fitting models, that sometimes the estimates you get out don't have to make sense in the real world. It's always good to interpret them and question them.

Okay, we could address the same question in a different way. Rather than looking at the volume of the concert, we could just compare attending concerts with not attending concerts. Here we might have two groups of people, a group that don't attend concerts and a group that do, and again look at how many minutes their ears ring for. Basically you get two clouds of data, one for each group, and the linear model here represents the difference between the means of the two groups. You can see the blue cloud on the right of the screen is those that attended the concert, and they have a higher mean; the mean of that group is higher than the mean of those that did not attend. The difference between those two means is what beta-1 is going to represent, and beta-0, the intercept, here represents the mean in what's known as the reference category. We're going to cover this in a few weeks, so it'll make more sense then, but just bear in mind that in this case the intercept is the mean of one of the groups.
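Here's a small sketch (my own illustration, with made-up numbers) of why a "comparison of two means" is just a linear model with a 0/1 predictor. When you fit a least-squares line to a group variable coded 0 (did not attend) and 1 (attended), the intercept comes out as the mean of the group coded 0 (the reference category) and the slope comes out as the difference between the two group means.

```python
import numpy as np

# Hypothetical minutes of ear ringing for the two groups
no_concert = np.array([5.0, 12.0, 8.0, 10.0])   # group coded 0
concert    = np.array([60.0, 75.0, 68.0, 81.0])  # group coded 1

group = np.concatenate([np.zeros(len(no_concert)), np.ones(len(concert))])
ringing = np.concatenate([no_concert, concert])

# Least-squares fit: ringing = b0 + b1 * group
b1, b0 = np.polyfit(group, ringing, 1)

print(f"b0 (intercept) = {b0:.2f}  vs mean of group 0 = {no_concert.mean():.2f}")
print(f"b1 (slope)     = {b1:.2f}  vs difference in means = {concert.mean() - no_concert.mean():.2f}")
```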
When we fit statistical models, we're trying to make general conclusions. To take our hearing-loss example, we would want to make conclusions about how the volume of concerts affects everyone, not just the people at a particular concert. In an ideal world we'd be able to collect data from everyone ever going to a concert, across the full range of volumes they might experience, measure their hearing loss, and fit that model to all the people we want to draw conclusions about. But life isn't like that: we don't have access to everyone we want to draw conclusions about, so instead we typically work with samples. We take a smaller set of people, hope they're representative of the wider population, and fit our models to that smaller sample. There'll be more on this in due course, but the analogy I always draw is with architects.

If you're an architect and someone says to you, "build me a bridge across a river", what you don't do is order yourself a load of concrete, iron rods, suspension cables, Blu Tack, duct tape (what else would they use to build a bridge? they're bound to use duct tape), and then build the bridge straight off: "hey guys, I bought all the stuff, I'm going to build me a bridge over the river, let's go". What you'd want to do first is some tests. You'd build a scaled-down version of your bridge (probably a lot of this is done on computers these days, I guess), put that model through some tests, and see how it performs. If it performs well, if it's a good fit to reality, then you might actually build the full-scale bridge; you generalize your conclusions to the bigger scale. For example, you might say: on this river the bridge is going to be subjected to some rain, so let's see how the model holds up to rain, and you subject it to a rainstorm. If it survives, happy days: you can assume your bridge will survive, making certain assumptions about the bridge's tolerance to rain. Then you might note it's going to be subjected to wind, so you put the model in a wind tunnel and see how it performs. Basically, you'll end up with a certain set of assumptions about your model: it will survive winds up to a certain tolerance and rain up to a certain tolerance, and as long as those conditions are met, it's fair to assume that your full-scale bridge will also perform well.

There may, though, be things in the world that cause your model to not perform very well. One thing architects always overlook when they're building bridges is the threat of the pink river hippo. The pink river hippo has three defining characteristics: first, it's pink; second, it lives in a river; and third, it has a really deep-seated rage against humanity, bottled up over generations of pink river hippos, and this rage can be unleashed at any time. The problem for the pink river hippo is that it thinks all the other hippos get too much attention in wildlife documentaries, and this makes it very sad and very angry. Sometimes that anger comes out: maybe it's been watching the Discovery Channel, and there's been a hippo documentary, and it hasn't been featured.
That rage comes bubbling up to the surface, and it will strike. The problem for a bridge builder is that you might think, "I'm building over a river where there are no pink river hippos", but if a pink river hippo gets loose and comes down that river, you may find your bridge destroyed. Your model is only as good as the assumptions you make; it's only good to the extent that those assumptions are true. If you assume it won't be attacked by a pink river hippo, and that in fact is the case in the real world, your model will hold up pretty well. If you make that assumption and it turns out that the actual bridge you build falls foul of the pink river hippo, then clearly your model wasn't very good. Anyway, what's this got to do with statistics? I've got no idea, but it at least demonstrates why I didn't go into a career as a filmmaker.

To take our ear-ringing example: imagine that this is the population model up here. What we are doing, effectively, is taking a sample, a small subsection of that population, and fitting the model to it. The model will hopefully be pretty similar, but it's not going to be exactly the same: the parameters, the b's, that we get in our sample are not necessarily going to be exactly the same as the b's in the population. Of course, we don't know how similar or different they are, because we don't have access to the ones in the population, but we always have to try to have some estimate of our uncertainty in those values; that's very important. For example, in this sample the model tells us that beta-0 is 18.06, when in the population it's 5.5, so they're quite different values. If we imagine taking some other samples, we get slightly different values of beta-0 in all of them. Likewise, the beta for the relationship between the volume of the concert and ear ringing is about 10 in the population, which of course we don't know and have no way of knowing; it's like a mythical unicorn value (probably not as pretty as a unicorn) that we have no access to. In our samples we get values close to 10: 9.94, 10.1, 9.76, 9.48. If we took lots and lots of samples, we'd hopefully get values for beta that are close to the population value, but the point is that they're going to differ across samples. This is known as sampling variation: there's variability in the estimates you get from different samples. We're going to come back to this in more detail in another lecture, but it's worth flagging here.
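Sampling variation is easy to see in simulation. This sketch (my own, with the lecture's population values of 5.5 and 10 plugged in as assumptions, and everything else made up) defines a "population" model for ear ringing, draws repeated samples from it, and fits a line to each one; the estimated slopes scatter around 10 without ever being exactly 10.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
b0_pop, b1_pop = 5.5, 10.0  # population intercept and slope (from the lecture)

for sample_num in range(4):
    # Draw a sample of 30 concert-goers: volume in dB, ringing in minutes
    volume = rng.uniform(80, 110, size=30)
    ringing = b0_pop + b1_pop * volume + rng.normal(0, 60, size=30)

    fit = linregress(volume, ringing)
    print(f"sample {sample_num + 1}: b0 = {fit.intercept:8.2f}, b1 = {fit.slope:5.2f}")
```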
So how do we estimate these values? There are other estimation methods that we don't cover on this module, but all the models we do cover use something called ordinary least squares estimation. We'll have a look at an example of this using the most basic model we can think of: the mean. We can use the mean as a predicted value; it's not necessarily a great model to use, but it's a model we can use. The example we're going to have is the number of friends that statistics lecturers have. Imagine we've got a sample of five statistics lecturers and we've just measured how many friends they have. The first one had one friend (it was his hand); the second one had three friends; the third had four friends; the next one had three friends; and the last one had two friends. Those are our data.

Someone comes up to us at a party and says, "how many friends do you think I have?", and we happen to know they're a statistics lecturer. If we knew the mean number of friends that statistics lecturers have, we could use that as a predicted value, a sort of best guess. (It's a great party trick. If you're ever at a party, just try this; you'll have so many friends by the end of the evening, it'll be unbelievable.) So we could use the mean, but how do we find out what the value of the mean is? Well, it turns out the mean is an example of a least squares estimate. But to begin with, let's imagine that there's no maths. You might say, "how can we possibly imagine that?". Before we get into this, I want to introduce you to Professor Hippo.

"Hello."
(I have to try and move off camera, because I can't actually... not move my lips.) Now, Professor Hippo, he's a genius.
"Thank you very much."
I'm just his puppet; he's the brains behind the outfit.
"Yep."
And he's also all-powerful (we've learned before about the power of the pink hippo, right?), and he's also quite cranky.
"Yep."
And one day he decided to remove maths from the universe.
"Yeah. I don't like maths. Maths sucks."

So he decided to remove all maths from the universe, but we still needed to find out a mean, and we couldn't use maths. What we could do is guess.
"Yep, I guess so."
Let's imagine we're trying to guess the best predicted value of the number of friends that statistics lecturers have. Our first guess is going to be two: we think, on average, two seems like a good number of friends for a statistics lecturer to have. Now, we can rearrange this little equation, which says: predict friends from the mean (that's what the parameter is going to be, the mean) plus some error in prediction. We predict the number of friends for a particular lecturer i from the mean, and there'll be some error. If we rearrange that, basically taking beta-0 over to the other side (it gets a minus sign when it goes over), we can see that we can work out the error as the number of friends minus our estimate of the mean. Our estimated mean is two, and we've got this lovely table: we know the actual number of friends in our sample, we've got an estimate of two, and this is represented graphically over here. The flat line is our best guess of two, and the dots are the actual values. So we can work out the errors in prediction: the differences between what the line predicts and the actual value that was observed. For example, the first person had one friend and we predicted they had two, so we actually over-predicted and get minus one as the error, because our model over-predicts how many friends they had.
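In symbols, the rearrangement being described is just:

  friendsᵢ = b₀ + eᵢ   ⟹   eᵢ = friendsᵢ − b₀

and the total squared error for a guessed value of b₀ is SS = Σᵢ (friendsᵢ − b₀)², which is the quantity being tallied in what follows.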
We could do that for everyone. The second person had two friends and we predicted two, so that's an error of zero; the third one had three friends and we predicted two, so they have an error of plus one, one more friend than we predicted; and we can do the same for the rest. We could then add up all these errors to work out the total amount of error. However, notice that some of them are negative and some are positive, and if you add positive and negative numbers they cancel out, so we can't just add them up; we have to do something to get rid of the negative numbers. One of the things we can do (it's not the only thing) is square them: if you square minus 1, it becomes 1. So what we actually do is look at the squared errors. We take the errors and square them: minus 1 times minus 1 gives us 1, 0 squared is 0, 1 squared is 1, and so on and so forth. It's these squared errors that we add up to give us a total amount of error in our model. When we use a parameter value of two, the total error in prediction is 7.

Okay, that doesn't seem too bad, but can we do better?
"Yep. I have a guess of four."
Okay, we'll go with four then. When we guess that the mean is four, our predicted values all change to four; our model is the flat line, which is now at four instead of two. Again we can look at the errors, the distances between what was predicted and what was observed. For our first person we get an error of minus three; then minus two, minus one, minus one, and zero. Again we square these values so that we can add them up, and what we find when we do that is a total error of 15. This is bigger than the error we had before, which tells us that our estimate of four is worse (it results in more error in prediction) than our estimate of two.

Now, if we were particularly sad, we could try out every possible guess of beta. We've tried two and we've tried four, but we could try three, or five, or 2.1, or 2.5; we could try everything. If we did that, we'd see a curve being drawn of every estimate of beta from zero up to five, with the sum of squared errors on the y-axis, and it has a characteristic shape: it curves down, reaches a minimum value, and then the error starts to come up again. So there is a value of the mean that gives the minimum possible error for these data. Here's that curve, and here are the two values we guessed: the value of 2 gave us a total sum of squared errors of 7, and the value of 4 gave us a total sum of squared errors of 15.
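That "try every guess and track the error" procedure is easy to mimic. Here's a minimal Python sketch of it (my own illustration): it scans candidate values of b₀ across a fine grid, computes the sum of squared errors for each, and reports the guess with the least squared error, which lands on the sample mean of 2.6.

```python
import numpy as np

friends = np.array([1, 2, 3, 3, 4])  # the five lecturers' friend counts

def sum_squared_error(guess, data):
    """Total squared prediction error for a single guessed value."""
    return np.sum((data - guess) ** 2)

print(sum_squared_error(2, friends))   # 7  (our first guess)
print(sum_squared_error(4, friends))   # 15 (our second, worse guess)

# Try every guess between 0 and 5 in tiny steps, and keep the best one
guesses = np.arange(0, 5, 0.001)
errors = [sum_squared_error(g, friends) for g in guesses]
best = guesses[np.argmin(errors)]
print(f"least squares estimate ≈ {best:.1f}, SS = {min(errors):.1f}")
# ≈ 2.6 with SS = 5.2, exactly the sample mean: friends.mean() == 2.6
```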
So those are the values we guessed; the question is, what is the value right down at the bottom of the curve? It turns out to be 2.6. This is the value of the mean, or the value of the parameter, that gives us the least sum of squared errors. This is an example of ordinary least squares estimation. If we plug 2.6 in as our estimate and work out those errors, we get minus 1.6, minus 0.6 (I don't know why I've still got a hippo on my hand), 0.4, 0.4 and 1.4, and when we square them and add them up we get 5.2. That's the least squared error there can be, given the data.

Now, that was a very long-winded way of doing it, but I was trying to illustrate what "least squares" actually means. There's actually an equation that represents this least squares estimate, and it's one you might be familiar with: x̄ = (Σxᵢ)/n. You just add up the scores (that's what the top half is doing) and divide by the number of scores. It's the standard equation for the mean, and it gives us the least squares estimate: the mean is a least squares estimate. We add up the scores to get 13, divide by the number of scores, five, and end up with 2.6, which was that lowest point of the curve. As we fit more complicated models, the equations get more complicated, and we don't need to know them or get involved with them (and indeed we won't), but essentially they're doing the same thing as this: they give values of the betas that have the least squared error, in much the same way as the mean.

An important point here: the mean we get, 2.6, is not a value that actually occurs in the data, so that value of the parameter always has error. There is no one in the data set who actually had a score of 2.6, so for every person in the data set the predicted value of 2.6 is incorrect; there's error associated with it. That's an important thing to bear in mind: there's always uncertainty, because no one would have 2.6 friends.

"I've got 2.6 friends."
No you haven't.
"Yeah, I actually have."
No you haven't.
"I have."
How have you got 2.6 friends?
"Well, I have three friends, but I didn't really like the top half of one of them, so I cut him in half and put the top half in the fridge. I just like his legs."
So you had a friend, and you've now got the legs of a friend, and the rest of your friend is in the fridge?
"Yep, that's right."
Okay... why?
"Because I'm a psychopath."
Ah! Get off, get off, get off me, get off! Ah!
Info
Channel: Andy Field
Views: 17,459
Rating: 4.95 out of 5
Id: 7cSArk7tU4w
Length: 53min 38sec (3218 seconds)
Published: Tue Sep 22 2020