Effect size calculation and basic meta-analysis, David B. Wilson

Captions
Welcome. In the next hour and 14 minutes I'm going to try to pull a rabbit out of the hat and teach you meta-analysis. This is the crash course in meta-analysis, and considering that I'm still learning more about meta-analysis myself, and have been doing it for, well, I'm not going to admit how long, but I count it in decades, not years, that should give you some indication that there's a lot here you can learn. So by necessity this is the stone skipping across the pond. Yes, I'm going to throw some formulas at you, but the goal here is to understand what the method is all about: get the big picture, get the gestalt, really get a feel for what people actually mean when they talk about meta-analysis. That's what I'm going to try to accomplish, and hopefully we'll get through most of it. I've got a few slides that I know I'm probably going to skip over, so just to forewarn you, I put in a few more than I think we'll need; they get into some more advanced topics.

I find it useful to start at the end, and then we can all go get coffee or something, because we'll be done. No? Okay. In a sense you can think of a forest plot as the endgame of meta-analysis; it's one of the common outputs. Here's one on the effects of cognitive behavioral programs. That label isn't very descriptive: these are studies running criminal offenders through roughly a 14-to-20-week, group-based cognitive behavioral program and then looking at whether it reduces future recidivism. This plot is very bare-bones. On the left you have each study (Porporino and Robinson 1995, and so on); each row is a study. The diamond represents the effect size for the study, the finding. That study found an odds ratio (we'll talk about odds ratios in a moment) that's really close to one, meaning essentially no difference between the two groups, but a little bit on the positive side: a small benefit. The horizontal line is a 95% confidence interval, and it clearly includes the null value of no effect; not terribly interesting. The line gives an indication of precision. And we see at the bottom a couple of studies with fairly large positive effects. This Ross study at the bottom has an odds ratio of ten, meaning the cognitive behavioral group had odds of recidivating that were ten times lower than the comparison group, but look at the confidence interval: it's huge. It was a small study.

As a meta-analyst, you ask: what's the general pattern of effects? Notice that every single one of these studies is positive. That's interesting. If this treatment had no effect whatsoever, we probably wouldn't expect all of them to be positive like this; we'd expect the results to bounce around, some positive, some negative. And with the exception of the one that's really out of left field (they practically found a cure for crime), there's variability, but it's a fairly consistent-looking pattern. Then, as a meta-analyst, you say: let's actually crunch some numbers here and come up with an average. So at the bottom we have the mean, that bottom diamond, of about 1.5, which is a 50% reduction in the odds of recidivating. Not bad; it's a meaningful effect. And the confidence interval around that, probably going from about 1.25 to 2, doesn't include the null value of 1, so it's statistically significant.
This would suggest a statistically significant benefit of cognitive behavioral programs in reducing recidivism among offenders. That's kind of exciting. Now notice how many of these studies were individually statistically significant. Let's quickly count: those are the ones whose confidence interval does not cross one. It looks like only two. So we have nine studies, two of which were statistically significant and seven of which were not, yet we're concluding that this works. Interesting. We're going to come back to that point in a moment, but this is where we're going with meta-analysis.

Okay, here's the overview of today's talk. I'm going to focus a little on the logic of meta-analysis, since we just started with that and I have a little more to say about it. We're going to talk about effect sizes: what they are and some common types. We're not going to get into the computation of lots of effect sizes; I spent an hour and 15 minutes talking about that this morning, and if you really wanted that, that was the advanced topic, and even the hour and 15 minutes wasn't enough time, but we'll at least touch on it so you get a sense of what they are. Then we'll focus on the basic meta-analysis issues: coming up with the mean effect size and its confidence interval, and looking at issues of variability. We'll at least introduce the idea of random effects versus fixed effects; there was a whole advanced topic on that earlier this morning, but you should at least know what the basic issue is. And I'll try to introduce, and this is where we may have to crunch things a little, at least the concept of moderator analysis. At some level that's an advanced topic, but you should be aware of what it is and have the concept. Then hopefully I'll have time for a few notes about software and how to go about doing this, because I'm not teaching the software; I'm just trying to give you the conceptual orientation. Does that sound like what you were hoping for? All right, good, because that's what I came prepared for; if you want something different I'm going to have to change gears pretty darn fast.

The logic of meta-analysis. Traditional narrative reviews, and other non-meta-analytic approaches to reviewing a literature, tend to focus on statistical significance, which is problematic, as I'll discuss in a moment, and they often lack transparency. We now have this broader term, systematic reviews, that meta-analysis is often embedded in, and that whole process of systematically reviewing a literature is focused very much on transparency and replicability, so that the nature of the review is not just "I'm a really smart guy, I read these 24 studies, and as an expert this is what I think these 24 studies said." Traditionally that's what a lot of scholarly reviews were all about: the sense that "I'm a scholar, so this is my opinion after having carefully thought about this literature, and you just need to take it on faith that I somehow did this in my head the correct way."
Well, one of the problems with that approach is that statistical significance has a tendency to lead you astray. It has its value: it's quite useful at the primary-study level for making sure (and unfortunately this example is going to date me) that you don't end up on the cover of Time magazine saying you've discovered cold fusion. A few people got that reference; there really was somebody who thought they'd discovered that and was on Time magazine back in the eighties. Significance testing protects you against getting excited about something that could be explained by chance. That's really important, and it's valuable. But once you start looking at results across studies it has a tendency to trip you up, because there's an asymmetric relationship between a significant finding and a non-significant finding. If you have a statistically significant finding, that's a strong conclusion; you can take it to the bank. If you have a statistically non-significant finding, that's a weak, wishy-washy finding. Most people forget this after their first semester of statistics, but in that first semester, when you were writing out all of those hypothesis-testing exercises, you either rejected the null hypothesis or you were supposed to write this rather convoluted phrase: "fail to reject the null hypothesis." Wait, doesn't that mean accept the null hypothesis? No, it doesn't. It's wishy-washy: the null might be true, might be false, but we can't say that it's false.

Okay, so now in front of you you have a collection of non-significant findings and significant findings, and you put them on a scale. How do you balance these strong conclusions against these weak conclusions? If you know the answer to that, you're smarter than me; I can't figure out how you would do it. But reviewers somehow manage to do this all the time, sometimes by counting things up. They'd say: in this area there were seven studies with no findings and two with positive findings, so we need more research; the conclusions are equivocal. And sure enough, there's a grand history in clinical psychology of review after review after review calling for more research. And the more research that's done, if you're focusing on statistical significance, the more equivocal the results appear to be, because the pile of non-significance gets larger, the pile of significance gets larger, and the two piles just keep growing. You will pull your hair out. So maybe there's a problem with focusing on statistical significance.

So meta-analysis says: that's not what we're interested in anyhow. Statistical significance is just this abstraction; it's the probability of our results in a world in which the null were true, and of course you don't believe the null is true. It's a very artificial abstraction, useful in a very narrow sense, but it's not what we're interested in. What are we interested in? What we're studying: what's the effect, and what direction is it in? Does this cognitive behavioral program for offenders reduce crime or increase crime? And if it reduces or increases crime, by how much? That's what we really want to know. So why don't we focus on what we really want to know: the magnitude and direction of the effect. Meta-analysis shifts our focus to that.
And it says: maybe we should also approach doing a review as a research project in its own right, with the goals of being transparent, replicable, and so on, so that others can evaluate the nature of our work. What meta-analysis then does is focus on those effects, which in a generic sense we call effect sizes, and it wants to look at several important things. What's the average effect? That's interesting: on average, do these programs reduce crime? Say I answer yes. Then you might come back and say: yeah, but do they consistently reduce crime? Ah, so we might be interested in the consistency of the effects. Maybe some reduce crime more than others, so we might want to look at that variability. And in looking at that variability we might want to look at it in relation to study features: maybe studies done in prisons were more effective than studies done in jails, or the programs work better with certain types of offenders. Maybe with violent offenders these programs are particularly useful, but those darn property offenders, really high-volume criminals, are just a bugger to convince to stop committing crimes. Who knows; there might be all sorts of relationships like that. So the focus is average effects, the consistency of effects, and possibly explaining inconsistency in effects, and the methods I'm going to walk us through all have to do with getting at these three things.

So how do we go about this? The first thing is we need that effect size, the thing that encodes our finding in terms of magnitude and direction. It has to put the finding on a numeric scale; we don't want something that says "small effect, medium effect, big effect," because we're going to do statistics on it. It must be comparable across studies: if on one study I calculate this number and come up with 0.25, and on another study I calculate the number and come up with 0.25, those should be comparably sized effects; and if I come up with another one that's 0.4, that had better be a bigger effect, not a smaller effect, than the two studies at 0.25. Bigger numbers should mean bigger effects, smaller numbers smaller effects. It needs to be independent of sample size: notice we're not studying how many participants the researcher was able to recruit into their study; what we're studying is how big the effect is that they found, the phenomenon, the effectiveness of the program, not how many people were in it. And the other quirk is that we have to be able to figure out its standard error. The standard error is a statistical measure of precision, and if we're going to look at the consistency of effects, we have to know how precise the effects are in the first place. That's what those confidence intervals were all about: precision. We need to know that.

Now, there are many different effect sizes. When I say "effect size" I'm not referring to a specific statistical parameter or statistic; I'm referring to anything that satisfies these criteria. I can probably think up about twelve or fourteen of these off the top of my head, and some of you might come up with research situations where you create your own effect size, as long as it satisfies the necessary criteria. And for each effect size, as you'll see, there's often more than one way to calculate it, depending on the information provided in the primary study.
Four of the most common effect sizes used in meta-analysis are the ones listed here. The correlation coefficient: hopefully you're sufficiently familiar with statistics to know what a correlation coefficient is. It's a nicely standardized effect size, ranging between minus one and plus one, and it's useful just as it is, in its form as a correlation coefficient. We're not really going to talk about that one, because it's not particularly widely used in the work of the Cochrane or Campbell collaborations, though it is widely used in meta-analyses in social psychology, psychometric work, and so on. The standardized mean difference: sometimes you'll see this referred to as d, Cohen's d, g, or Hedges' g; for our purposes they're all pretty much the same thing, and I'll explain in a moment what it is and when you use it. The other two, which are quite similar and used in the same situations, are the odds ratio and the risk ratio. And, let me remind myself here of my slides. Ah, yes, okay; apologies for the flipping around, I know that can be confusing.

One of the challenges of meta-analysis, and something that will take you a lot of time (if you were in the workshop prior to this, Sandra Wilson was talking about coding studies, and one of the big parts of coding studies is calculating effect sizes) is that it can be a lot of work. You must compute the effect size from the information provided in the primary study. This might be from t-tests, p-values, descriptive statistics, and so on. You might also need to manipulate the data they provide; you might need to collapse across subgroups. A common example in my situation: I get a lot of studies in the criminology field that have a comparison group and a treatment group, but they report the data separately for the people who successfully made it through treatment and the people who dropped out, and the results are always wildly successful for the graduates versus the comparison group. Well, of course they're successful: they successfully graduated, and success breeds success. Also, and this was part of this morning's advanced session, some of the ways of calculating these statistics are algebraically equivalent to one another: computing the standardized mean difference from means, standard deviations, and sample sizes is the same as getting it from a t-test. But estimating it from binary data is an approximation, and you need to be aware of these sorts of things. And sometimes, no matter how hard you try, you just can't squeeze an effect size out of a study. The more you do this sort of work the more creative and clever you get, but some studies you just can't do.

The standardized mean difference. Here the fundamental relationship is a group contrast (treatment group versus control group, boys versus girls, drug A versus drug B, drug A versus placebo) and the outcome is conceptually a continuous dependent variable: depression, self-esteem, something that might be measured on a multi-item scale, or fundamentally anything where it would be statistically appropriate to calculate a mean. If it's appropriate to calculate a mean on the construct of interest, then the standardized mean difference might be the appropriate effect size. But different studies use different outcome measures: one study might use the Beck Depression Inventory, another the Hamilton depression scale, and these are on different scales.
How do we make them comparable? Through standardization. This is the same sort of standardization that occurs in the correlation coefficient: we use the raw standard deviations within each group and examine how different the means are from one another in terms of the normal distribution, hence the name standardized mean difference.

Now, this can be computed from a variety of other information. From a t-test: that's the top formula, where you're essentially removing the component of the t-test that's a function of sample size. That first one is algebraically equivalent; if you did the computation with the t or with the raw data, you'd get the same standardized mean difference, or d. You can get d from a correlation coefficient. You can also get d from a dichotomous outcome, which is interesting. You might have some studies that, instead of measuring depression on a scale, just group people into depressed versus not depressed. Darn it all, do I have to throw them out? That would be a waste. Wouldn't it be nice if we could calculate the standardized mean difference from that study and include it with all the other studies in our meta-analysis? Guess what: you can. But it's an approximation; it's not exactly what you would get if they'd given you means and standard deviations. A bunch of Monte Carlo simulation shows it works pretty well in most situations. Here's the computation. I'm not going to explain all of the terms, but the a, b, c, and d in there are the cell frequencies of a two-by-two frequency table, and essentially what you're doing is rescaling an odds ratio from the logistic distribution to a normal distribution. Pretty cool little trick, and it works fairly well.

There's an effect size calculator on the web that I've created that has lots of these. The list here shows all the different ways you can calculate the standardized mean difference: from a chi-square, from the p-value of a chi-square, from unstandardized regression coefficients, from gain scores, and so on. You click on one and it gives you a little table, and you plug in the appropriate data. That calculator is available both on the George Mason website and on the Campbell website. All you really need to take away from this discussion of the standardized mean difference at this point is not remembering, or trying to remember, all these formulas. Even statisticians don't bother memorizing formulas; you always look them up, because you want to make sure you get them right. (Sometimes you use them so often that you just kind of remember them, but that's not the point.) What you need to realize is that there's a whole category of equations that help you take the information in the studies and squeeze it through some manipulations that spit out a standardized mean difference.
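To make that category of conversions concrete, here is a minimal Python sketch of a few of the conversions just described. The function names are mine, and the dichotomous conversion shown is one standard logistic-to-normal rescaling (the Cox/Hasselblad-Hedges logit method); treat it as an illustration of the idea rather than a transcription of the slide formulas.

```python
import math

def d_from_means(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference from raw descriptives, using the pooled SD."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def d_from_t(t, n1, n2):
    """Algebraically equivalent conversion from an independent-samples t."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

def d_from_r(r):
    """Conversion from a correlation coefficient."""
    return 2 * r / math.sqrt(1 - r**2)

def d_from_2x2(a, b, c, d):
    """Approximate d from a 2x2 table (a, b, c, d are cell frequencies):
    rescales the log odds ratio from the logistic to the normal distribution."""
    log_or = math.log((a * d) / (b * c))
    return log_or * math.sqrt(3) / math.pi
```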
The correlation coefficient: we're not going to talk about it much, but it's for situations where you have two inherently continuous constructs, say the relationship between GRE scores and performance in graduate school (which, by the way, isn't a very strong correlation, though in part that's because most people with low GRE scores don't get into grad school). The odds ratio is widely used in medical meta-analyses because lots of their outcomes are dichotomous, and that's the situation where you would use the odds ratio or the risk ratio: a group contrast on a dichotomous dependent variable. Pregnant, not pregnant; alive, dead; cured, not cured. These sorts of success-versus-failure outcomes with treatment and control groups give you the traditional 2x2 frequency table, and it's very easy to calculate the odds ratio with that little formula right there. Now, with the odds ratio, if you spend some time thinking about it, there are only so many ways you can report the results for a dichotomous outcome (the frequency, the percent, the proportion) and you quickly run out of options. So the reality is this one tends to be pretty straightforward: because there's a limited number of ways people might report it, the number of ways of computing your effect size is limited and tends to be fairly straightforward.
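As a companion to the sketch above, here is the 2x2 computation. The table layout assumed here (treatment and control rows; success and failure columns, so that a and b are the treatment cells) is my assumption about how the cells are arranged; the talk doesn't specify it.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
               success  failure
    treatment    a        b
    control      c        d
    """
    return (a / b) / (c / d)   # equivalently (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """Risk ratio: proportion 'successful' in treatment over control."""
    return (a / (a + b)) / (c / (c + d))
```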
Okay, the goals of meta-analysis, then, are to describe this effect size distribution. What you should be imagining is that you've decided what type of effect size to use: you've decided that the fundamental nature of the relationship you're interested in lends itself to the standardized mean difference, so you're doing a standardized-mean-difference type meta-analysis, and you've calculated a whole bunch of standardized mean difference effect sizes on your collection of studies. Now you want to describe that distribution of effect sizes. You might want to calculate the mean; you might want to put a confidence interval around that mean; you might also want to test whether it's significantly different from zero, though we probably don't want to obsess too much over significance levels, having just criticized statistical significance testing when we started down this path, right? We might want to test the consistency of the effects: are these studies telling a consistent story with one voice, or is it a chorus of multiple voices? And if it's a chorus, maybe we should try to explore and identify some of those individual voices. So that's our goal with these methods.

Now here's the problem. In the early days of meta-analysis people said: well, this is data just like any other data, so let's just throw it into our standard programs and analyze it. Throw it into a regular regression analysis, or just calculate a mean and have SPSS tell us what the standard error is. Well, those analyses assume something statisticians call iid: independent and identically distributed data. That's a mouthful. It means the data are independent (we still assume that here), but identically distributed: what the devil does that mean? It means each data point has roughly the same level of precision. If I gave everyone in this room, oh, I wouldn't be crazy enough to give you a satisfaction form for this course, but let's say I was crazy and had you fill one out, the presumption is that the scores I'm getting from each of you are all roughly of equal precision. With meta-analysis that's clearly not the case: effect sizes from some studies are more precise than effect sizes from others. We might want to give the effect sizes from the larger, more precise studies more weight, a louder voice than the really small pilot studies. So people said: let's do sample size weighting. It's not a bad solution, but Larry Hedges came along and figured out it's not optimal. In statistics we actually have a direct measure of precision called the standard error. So why don't we weight by the standard error? Or rather its inverse, because with the standard error, the smaller it is, the more precise, so weighting by the standard error itself would do the opposite of what we want, and in math, if you want to flip something, you take the inverse. But what Larry Hedges determined was that the right weight is actually the inverse of the standard error squared. Well, the standard error is a type of standard deviation, and if we square a standard deviation we have a variance (thank you, a few people are still awake), so we call this inverse variance weighting. The type of meta-analysis I'm teaching you is formally called inverse-variance-weighted meta-analysis. And once we start using the inverse variance weight we get all kinds of statistical benefits: if we've got a measure of how precise each individual effect size is, we can come up with something that tells us how precise our mean is (sure enough, it's really straightforward), and through some little statistical manipulations we can also create confidence intervals and an index of how consistent our effect sizes are.

Unfortunately, we're not quite ready to jump into meta-analysis yet. We've calculated all these effect sizes and we want to analyze them, but a lot of them still need a little fixing up to be ready to go to the prom; some of them we've got to tweak a little. The standardized mean difference has a little bit of bias in it: it's a little too big when the sample sizes are small. There's a formula that corrects for that; that's this first formula, and you don't have to understand it. If the sample size is small, it punches the effect size down a little bit, not much, and as the sample size gets bigger the adjustment moves out into the third, fourth, fifth, sixth decimal place, to where you really wouldn't care about it. But it's easy to do, just a little simple math, so we apply this correction to all effect sizes when doing a standardized mean difference analysis, and then you get to say you're using the unbiased standardized mean difference estimator. You just remove the bias. The correlation coefficient: we can't calculate its standard error directly, but if you convert it using Fisher's z transformation, you have an easily solvable standard error, so that's what we'll use in the analysis. The odds ratio (and the risk ratio as well) is an odd beast, and it also doesn't have a workable standard error. Keep in mind that an odds ratio of 1 is the no-effect value: everything from 1 down to 0 is a negative effect, and everything from 1 up to infinity is a positive effect. That's the big arm over here and the little arm over here; it's asymmetrical. But if we simply take the log of our odds ratio or risk ratio, we get something that has an easily solvable standard error, so that's what we use for analysis. For those of you for whom that statistical stuff just went over your head, the simple thing to realize is: you've calculated your effect sizes, and now sometimes you've got to dress them up a little to get them ready for analysis. So now we've cleaned them up; they're almost ready for the prom.
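A small sketch of those three dress-up steps. I'm assuming the small-sample correction takes the 1 - 3/(4N - 9) form used in Lipsey and Wilson's text; the exact expression on the slide may differ slightly.

```python
import math

def small_sample_correction(d, n_total):
    """Small-sample bias correction for the standardized mean difference:
    shrinks d slightly when total N is small, negligibly when N is large."""
    return d * (1 - 3 / (4 * n_total - 9))

def fisher_z(r):
    """Fisher's z transformation of a correlation, which has a simple SE."""
    return 0.5 * math.log((1 + r) / (1 - r))

def log_odds_ratio(or_value):
    """Analyze odds ratios on the log scale, where 'no effect' is 0 and the
    scale is symmetric; back-transform results with math.exp()."""
    return math.log(or_value)
```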
We still need our standard errors. Now, more formulas, I know; I'm sorry. This is supposed to be the basics course, and you're thinking the formulas were for the advanced course and the non-formulas are the basics. I'll tell you what you need to know from these formulas: in almost every case it's mostly about sample size. Look at the first one, for the standardized mean difference. We've got n1 and n2 in there; those are sample sizes. As a matter of fact, if the effect size (the ES with the subscript sm, for standardized mean difference) goes to zero, that whole second term goes away, so it's about sample size. The correlation coefficient: one over the square root of n minus 3. That's entirely about sample size. The odds ratio: a, b, c, and d are the cell frequencies of the two-by-two table, and if we add up a, b, c, and d we have our total sample size, so it's about sample size again. And the weight for meta-analysis is 1 over each of these standard errors squared. That's our weight; we need a weight for our effect sizes. The weight doesn't equal sample size, but it's a function of sample size: as the sample size gets bigger or smaller, the weight gets bigger or smaller.

We're almost there. Everybody's gotten dressed up for the prom, but guess what: not everybody gets to go. Some have to stay home. Why? We have to maintain statistical independence. If one study had three or four effect sizes, another had one, another had twenty, and we let all of them come to the prom, some studies would have a very loud voice, a little deafening, and it would violate all sorts of statistical assumptions. So we can only let one effect size per study come to the prom. That's a bummer, but the good news is we can have more than one prom: proms happening in different rooms, different analyses. We can have depression as one prom happening in one room and anxiety happening in another room (nobody's anxious about proms, right?). Different effect sizes go into different piles, but for each analysis, only one per study, or, there's a little parenthesis, per independent sample. The idea is that they have to be statistically independent, so if you have a study that's actually reporting on three different sub-studies with completely independent groups, you can keep those separate: one effect size for each sub-study. You just have to keep it in your head: one per study, but you can do lots of different analyses to make use of all those effect sizes by putting them into different piles.

Okay, we're almost there. We've got our studies, we've calculated our effect sizes, we've cleaned them up and calculated their weights, and we've pulled out an independent subset that answers one of our research questions. Now we're ready to go.
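Here are the standard error formulas just described, as a Python sketch. These are the standard textbook forms, so treat the exact expressions as my assumption about what's on the slides; the point to notice is how much each one is driven by sample size.

```python
import math

def se_d(d, n1, n2):
    """SE of the (corrected) standardized mean difference; if d were zero,
    only the sample-size term would remain."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

def se_fisher_z(n):
    """SE of Fisher's z: entirely a function of sample size."""
    return 1 / math.sqrt(n - 3)

def se_log_or(a, b, c, d):
    """SE of the log odds ratio; a + b + c + d is the total sample size."""
    return math.sqrt(1/a + 1/b + 1/c + 1/d)

def inverse_variance_weight(se):
    """The meta-analytic weight: 1 over the squared standard error."""
    return 1 / se**2
```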
The first thing we're going to do is calculate the mean, the average. And literally it's just the average, but a weighted average: rather than adding everything up and dividing by the number of numbers, we multiply each value by its weight and divide by the sum of the weights. It's the standard weighted average, if you remember doing those in statistics. Frankly, this is one of those things you would have found in statistics books from the 30s, 40s, 50s, and 60s that seems to have dropped out since the advent of the personal computer, but it's a fairly straightforward formula, and it gives us our mean effect size. Now here's the cool thing: we want a measure of the precision of that mean so we can calculate a Z-test and confidence intervals. Remember that the weight is based on the precision of the individual effect sizes, so maybe the precision of our mean is a function of those weights? Sure enough: add up all the weights, take the inverse (and the square root), and you've got the standard error of the mean. How cool is that? If you were just doing sample size weighting, this wouldn't work; this is what you get by using the inverse variance weight. And once we have a standard error we can do really cool things. We can create a confidence interval, and this is just the standard method; nothing unique to meta-analysis here. We take our standard error times 1.96. Gosh, where did that 1.96 come from? I've seen that number somewhere before. I hear people muttering; remember where it comes from? The normal distribution: it's just shy of two standard deviations, the point at which there's two and a half percent in each tail, the .05 significance level for the normal distribution. So we can create our upper and lower confidence limits using that standard formula, and we can also do a Z-test on our mean, just by dividing the mean by its standard error, which is also just the standard way of calculating a Z-test. So wow: with some fairly simple statistics that you can do on a calculator, we've calculated our mean effect size, its standard error, a confidence interval, and the Z-test. Nothing more than a few simple computations.

Now, I should have picked an example with the standardized mean difference, which is a little easier, but this is fine; this is that first example with the cognitive behavioral programs. Here are the studies; we see the sample sizes; we see all of our odds ratios. Here's the one that was close to one, 1.08, and here's the one that was really large, 10.29. Remember, we have to analyze the log odds ratios, so those are all here, and there are the weights. We could go through the computations by hand, or we could let a computer do it. Here I'm using macros I've created for Stata; I have versions for SPSS, Stata, and SAS. You can also use RevMan, or the Comprehensive Meta-Analysis program; those are the main ones, and there are also programs that others have created. To quickly walk through a few of the things it's telling us: we've got nine observations, and (the "fixed" I'll explain shortly; right now we're just doing fixed effects) our mean is 0.37, with a confidence interval, a standard error, a Z, and from the Z we can figure out the p-value. We've just figured out how to calculate that whole row. We haven't figured out everything else on this page yet; we'll get there. The next part is the same thing, but converting from log odds ratios back into odds ratios, undoing the log by taking the antilog, the exponent.
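Pulling those steps together, a runnable fixed-effect sketch. The example numbers at the bottom are hypothetical stand-ins, not the values from the slide.

```python
import math

def fixed_effect_meta(es, se):
    """Inverse-variance weighted mean, its SE, 95% CI, and Z-test.
    es, se: lists of effect sizes (e.g., log odds ratios) and their SEs."""
    w = [1 / s**2 for s in se]
    mean = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    se_mean = math.sqrt(1 / sum(w))            # inverse of the summed weights
    ci = (mean - 1.96 * se_mean, mean + 1.96 * se_mean)
    z = mean / se_mean
    return mean, se_mean, ci, z

# Hypothetical log odds ratios and SEs for illustration only.
log_ors = [0.08, 0.41, 0.26, 0.55, 0.18]
ses     = [0.20, 0.35, 0.14, 0.48, 0.25]
mean, se_mean, ci, z = fixed_effect_meta(log_ors, ses)
print(f"mean logOR = {mean:.3f}, OR = {math.exp(mean):.2f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), Z = {z:.2f}")
```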
So we've calculated the mean effect size, put a confidence interval around it, and done a Z-test. But really we should be asking: are these studies telling a consistent story? Is the distribution homogeneous, or is there heterogeneity in these findings? For that there's something called the homogeneity test. It probably should be called the heterogeneity test, because when it's statistically significant the distribution is heterogeneous; I've had some people, when it's significant, say "oh, it's homogeneous, because it's the homogeneity test." No, it works the other way. It's testing the assumption that all of these studies are estimating a common effect, in other words that the only difference across these effect sizes is sampling error, as opposed to true differences across the studies. If homogeneity is rejected, in other words if we have heterogeneity, that single mean effect size is not really a good representation of all of the studies; it's in a sense an oversimplification of the research literature, because it means there are real differences across the studies. Then you've got a couple of options. One, you can try to model that difference: you can fit a random effects model. ("Oh no, I thought this was the basics class!" We at least have to understand the concept of random effects, and you'll see why in a moment.) Or you can do both: assume there are random effects and also try to analyze differences across studies.

Homogeneity analysis is really not that scary. I'm presenting two formulas here. The first one, for those of you who like formulas, says: take each effect size minus the mean and square it (the squaring gets rid of the negatives, makes everything positive). So we're figuring out how far each study is from the mean and adding that up, but weighting it, because everything here is a weighted analysis: a large deviation from a study with a large sample size should matter more than a large deviation from a really small study. The really small studies are going to bounce around more, so those differences shouldn't count as much. Now, that looks a lot like the numerator in the formula for the variance. Yes: brush off that part of your brain where you once calculated a standard deviation or variance. On top it's each score minus the mean, squared, summed (then you eventually divide by n minus 1 and take the square root). That top part, how much each score varies from its mean, is the sum of squares: we're summing the squared deviations around the mean. In this case it's a weighted sum of squares. That formula is a real pain in the neck to do by hand, so there's a computational formula, just as there is for the standard deviation, which is far less painful when working by hand but gives you the same answer as the first one. Q is distributed as a chi-square, so you use the chi-square distribution to figure out whether it's statistically significant, and the degrees of freedom are simply the number of effect sizes minus 1. In meta-analysis we use k to indicate the number of effect sizes, because if we used n it would get too confusing: you've got the n's within the studies and then the number of effect sizes.
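A sketch of the definitional form of Q just described; the computational form mentioned in the talk gives the same answer, so only the definitional version is shown.

```python
from scipy import stats

def q_statistic(es, w):
    """Homogeneity (really, heterogeneity) test: the weighted sum of squared
    deviations of each effect size from the weighted mean."""
    mean = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    q = sum(wi * (ei - mean)**2 for wi, ei in zip(w, es))
    # Computational form, equivalent: sum(w*es**2) - (sum(w*es))**2 / sum(w)
    df = len(es) - 1                 # k - 1, where k = number of effect sizes
    p = stats.chi2.sf(q, df)         # Q is distributed as a chi-square
    return q, df, p
```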
So we've settled on k. I don't know who picked k or why, but it was available (it's different from d, it's not g), and we use it: k equals the number of effect sizes.

Now, the Q statistic: if it's statistically significant, it indicates that yes, you have heterogeneity. But there's a problem with it: it tends to be underpowered when you have a small number of studies. So you might be tempted to say "oh, this distribution is homogeneous" when it really isn't, because, keep in mind, this is just like a null finding in any other context: just because Q is not significant doesn't mean everything is homogeneous; it just means you're not sure it's heterogeneous. So Julian Higgins developed an alternative index called I-squared. Why I? I don't know; another available letter, apparently. Notice that it's mostly a function of Q: any time Q is bigger than its degrees of freedom, you have at least some excess variability beyond what you would expect simply by chance, and I-squared is just an index of how much excess variability you have. (Relative to what? I think the denominator on this slide is supposed to be something else; looking at it now it suddenly looks wrong to me, and I'll have to check, but the concept is the same: you're looking at how far beyond your degrees of freedom you are.) The general rule of thumb: if I-squared is 75% or larger, you've got a lot of heterogeneity; 50% or larger, moderate heterogeneity; around 25%, low heterogeneity; something like that if you're in those ranges. This is useful as an alternative to Q when you're doing a small meta-analysis.
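For what it's worth, the conventional Higgins and Thompson definition puts Q itself in the denominator, which resolves the uncertainty voiced above; a sketch of that standard form:

```python
def i_squared(q, k):
    """I-squared: the percent of variability in Q beyond what chance (the
    degrees of freedom) would produce, floored at zero."""
    df = k - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
```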
Now, unfortunately, I've misled you: the version of meta-analysis I've just shown you is what we call fixed effects meta-analysis, and we actually don't recommend you do fixed effects meta-analysis. We recommend random effects meta-analysis. But it's really hard to start with random effects; it's a lot easier to learn random effects from fixed effects. So what am I even talking about? Fixed effects meta-analysis essentially assumes homogeneity: it assumes that all of these studies are estimating a common treatment effect and that any differences in the effects are simply due to sampling error. One could imagine situations where this assumption would be reasonable: pure replications. Say I'm a social psychologist running a little lab study where I bring college students in, deceive them about something, do this groovy little thing, and measure their performance, and five or six of my colleagues at other universities run the same exact study: identical, same measures, same research design, everything the same. We wouldn't expect the results to be identical at all six universities, but we really would expect the underlying relationship to be the same, and whatever differences there are should just reflect the particular sample of students each drew from this population called college sophomores, who've been widely studied by social psychologists (we know more about college sophomores than any other segment of the population). But once you move away from pure replications this logic starts to fall apart. Take this cognitive behavioral program with offenders: not all of the instructors are going to implement it in quite the same way; some are going to do a better job than others. And sometimes it's implemented with only violent offenders, sometimes with the general offender population, sometimes only with property offenders. In each of these cases the underlying effect, the true effect, the value we would know if we were God, might bounce around. So maybe our effect sizes have variability in them that is sampling error plus some variability from true study differences: two pieces instead of one. That's what the random effects model assumes. Unfortunately, it also assumes that these random effects are normally distributed. We're not able to prove that, and when statisticians debate the random effects model they go off on tangents about how credible that assumption is. But the logic is fundamentally this: we're now assuming that there really are genuine differences across these studies in the underlying effect, and our statistical model should incorporate that variability. This is where some people say: "the studies I'm combining, I don't think you can combine them, because they're too heterogeneous." There is a point at which studies are so heterogeneous that it's conceptually meaningless to combine them, but statistically this model assumes heterogeneity and allows it to be incorporated. So the variability in effect sizes is now assumed to be due to both sampling error and variability in the true effects the studies are estimating.

Why use the random effects model? One, the assumptions are far more plausible for most of the research any of us would be looking at; unless you're analyzing fairly pure replications, the random effects model rests on a more plausible set of assumptions. The other cool thing is that as your effect sizes become homogeneous, the random effects model converges on the fixed effects model. So the bottom line is: use the random effects model. The policy of the Campbell Collaboration (apologies, I misspoke there a moment ago) is to use the random effects model and report it; if you're uncomfortable with that you can report both random and fixed effects, but you may not report only the fixed effects. And the Cochrane Collaboration, I believe, has a similar policy: you can report both, but you can't just go with fixed effects, because the assumptions just aren't plausible.

"Well darn, you taught me how to do fixed effects and now you're telling me to do random effects." Don't worry, it's not that different. We just need a new set of weights. Remember, our first set of weights was a function of sampling error; the fixed effects model assumes the variability is due to sampling error, while random effects assumes it's due to sampling error plus true differences. So maybe if we come up with a weight that is a function of sampling error plus the true variability across effect sizes, we can capture this. Yes, that's exactly what we do. We need weights that are a function of both sampling error and study-level variability. We've already got the sampling error piece, so we just need the study-level variability piece. Where do we get that? Maybe from Q; Q is our measure of heterogeneity.
So maybe if we calculate the study-level variability using Q, we can make this work. Sure enough, there's this thing called tau squared (another available letter, apparently; tau was available and a very charming individual, apparently). This is our estimate of the study-level variability, and notice that it's mostly a function of Q: there's Q right there in the numerator, and the weights are in the denominator. Now, when the distribution becomes homogeneous, Q goes to zero. Mathematically, when you're calculating this, if Q is smaller than its degrees of freedom, tau squared can go negative. Is it possible to have a negative variance? Can you have negative variability? Come on, someone. No, thank you; you can't have negative variability. So the smallest we let it get is zero, and when it's zero, we have the fixed effects model. So what do we do now? We recalculate the weights: take all those standard errors we calculated, square them to get variances, add this constant we computed (it's the same for all studies) to each variance, then take the inverse, and we've got our new random effects weights. And at that point we just go through all the same equations as before: the weighted mean gives us the random effects mean; add up the new weights, take one over that, take the square root, and we have the standard error of the random effects mean. Those of you who are good at math are excitedly noting: that standard error is going to get bigger, and it's going to create a fatter confidence interval. Yes: we're assuming additional uncertainty in our mean, so the confidence interval gets bigger, but it's also more realistic. We just recompute using the new weights, and we've got our random effects meta-analysis. That wasn't so bad.

And that's what you see here. Now, the other thing I've oversimplified: there's really more than one way to calculate that tau squared. I showed you one method; there are a bunch of others that use maximum likelihood and all kinds of really nasty statistics, and they have advantages and disadvantages in different situations. Right here we've got two different methods, and you can see that the mean changes a little bit. But look what happens to the confidence intervals and the standard error: the standard error gets bigger in both cases and the confidence interval gets fatter, but the result is still statistically significant. We've also got our Q statistic over here, which is not quite statistically significant, so in this analysis one would say we can't reject homogeneity: these effect sizes are telling a fairly consistent story. Remember, there was just that one study that was really out of line; everything else was fairly consistent. But even if we assume random effects, we still get a statistically significant overall effect.

Okay, where are we? In a nutshell, that's really the bare-bones meta-analysis: look at the average effect, its confidence interval, and how homogeneous or heterogeneous the distribution is. A lot of meta-analyses stop at that point and create a pretty forest plot, like we saw at the beginning.
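The moment-based tau-squared shown in the talk is the estimator usually attributed to DerSimonian and Laird; here is a sketch of it and of the re-weighted analysis, reusing the fixed-effect pieces from above.

```python
import math

def tau_squared_dl(es, se):
    """DerSimonian-Laird (method of moments) estimate of between-study
    variance, truncated at zero since a variance cannot be negative."""
    w = [1 / s**2 for s in se]
    mean = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    q = sum(wi * (ei - mean)**2 for wi, ei in zip(w, es))
    df = len(es) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)   # the weights denominator
    return max(0.0, (q - df) / c)

def random_effects_meta(es, se):
    """Rerun the inverse-variance analysis with weights that add tau-squared
    to each study's sampling variance; the SE of the mean gets bigger."""
    t2 = tau_squared_dl(es, se)
    w = [1 / (s**2 + t2) for s in se]
    mean = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
    se_mean = math.sqrt(1 / sum(w))
    return mean, se_mean, (mean - 1.96 * se_mean, mean + 1.96 * se_mean)
```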
But particularly if you've got enough studies, you might want to start asking whether we can explain some of this variability, whether we can account for it with study features we've coded. Sandra was teaching those of you who were here earlier all about how to code, so hopefully you've extracted information about how your studies differ from one another, and you can use that to try to explain these differences. There are really two ways to think about doing this, conceptually and statistically. One is through a categorical model, basically where you put effect sizes in different piles and look to see whether the piles differ. With this cognitive behavioral treatment program, the thing I didn't show you is that it's actually two different programs: Moral Reconation Therapy and Reasoning and Rehabilitation, two very similar but still somewhat different cognitive behavioral programs. I might want to know whether one is more or less effective than the other. So put all the Reasoning and Rehabilitation effect sizes in this pile, all the Moral Reconation Therapy effect sizes in that pile, and compare the means. With just two piles, that's a lot like a t-test, but don't use a t-test; we have special statistics for this, called the analog to the ANOVA. It's like a one-way ANOVA (and a one-way ANOVA with only two categories is like a t-test), but you can have three piles: there might be a third or fourth type of treatment program, or maybe it's offender types, with the violent offenders, the property offenders, and studies of the general prison population. Now I've got three piles, three means, and I want to compare across them. That's the categorical approach to moderator analysis, and we have methods for doing it; see the sketch after this section.

Sometimes your study-level characteristics are more continuous in nature. Maybe you have a measure of degree of implementation, how well they implemented the program (crappy, pretty bad, not so terrible, almost good, and good), and you want to see whether that relates to effect size; you want a linear-regression sort of approach. Sure enough, we've got meta-analytic forms of regression. Lots of people incorrectly throw all this into regular regression; that's a no-no. You have to use what some people call meta-regression, or what I just like to call meta-analytic regression. For both of these moderator analyses you can do a fixed effects version or a random effects version; once again, just start with the random effects version, because if you explain all the excess variability with your moderator variables, the whole thing simplifies to fixed effects for you, rather than running it both ways. Now, in the literature you might see some people call these mixed effects models, and you're thinking: wait, I learned about random effects; what's mixed effects? It's just a random effects moderator model. The reason they call it mixed is that the independent variables in your regression model are fixed factors. That's a technical distinction you don't need to understand; some people call these mixed models, some call them random effects models, and for our purposes it's not a meaningful distinction.
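Here is a sketch of the fixed-effects version of the analog-to-the-ANOVA idea: the between-groups Q is total Q minus the Q within each pile, tested against a chi-square. The data structure is my own choice for the illustration.

```python
def q_between(groups):
    """Analog to the ANOVA. groups: a list of (effect_sizes, weights) pairs,
    one pair per category (pile). Returns Q-between and its df; compare to
    a chi-square with that many degrees of freedom."""
    def q_of(es, w):
        mean = sum(wi * ei for wi, ei in zip(w, es)) / sum(w)
        return sum(wi * (ei - mean)**2 for wi, ei in zip(w, es))
    all_es = [e for es, w in groups for e in es]
    all_w = [wi for es, w in groups for wi in w]
    q_total = q_of(all_es, all_w)
    q_within = sum(q_of(es, w) for es, w in groups)
    return q_total - q_within, len(groups) - 1
```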
How am I doing on time? Doing well. This is where I've got some slides I don't have time for all of, so I'm trying to think about what would be most productive. Here's an example of a categorical moderator analysis, in this case from a meta-analysis of domestic violence programs, where we had seven studies that were randomized and seven that were not. Notice that for each category we've got the mean, the standard error, the confidence interval, the Z, and so on. That's pretty straightforward. What a lot of people do incorrectly is simply look to see whether each category's mean is statistically significant or not. In this case neither one is, but sometimes one will be significant and the other won't, and people will say "I've got a moderator effect; the program works for girls but it doesn't work for boys," where one row is girls and one is boys. Guess what: sometimes the effects are essentially the same and one group simply had a larger sample size than the other. What you really want to look at is the Q-between test comparing the two means, which is just like a t-test on the means; that's what's testing the difference between them, and that's your proper categorical moderator analysis. It tells you whether this categorical variable is accounting for some of that extra variability, whether it's explaining some of the reason you have noise in your data.

Regression analysis: here the effect size is the dependent variable and the study features are the independent variables. The cool thing is that you can have more than one study feature, so you can build a regression model with several of them. You have to have enough effect sizes to do this, though you don't need quite as many as you would with a typical regression. Don't use regular regression; you must use software designed for meta-analysis, and I'll talk about software in a moment. How this might look is a result like this. The bottom table of regression coefficients should look pretty familiar if you know regression (if you don't, I don't have enough time to teach it to you): we've got the regression coefficients, a standard error, a confidence interval, a Z-test, a p-value, and the standardized beta. The only other relevant piece is the model homogeneity analysis: the Q test for the model is just like the F test in regression; it tells you whether your regression model is explaining variability in your data. In this case, look at that model: it's significant at .002, so this model is explaining variability in effect sizes. Cool. Then we can go down and look at which particular variables matter, and it looks like these two, TX1 and TX2, which quite frankly I don't remember the definitions of, but they're among the independent variables and have something to do with the type of treatment in this analysis. So that's a moderator analysis.
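For the regression version, dedicated tools (the speaker's SPSS/Stata/SAS macros, RevMan, Comprehensive Meta-Analysis) are the safe route; as a sketch of why ordinary regression output is wrong and how the fixed-effects correction works, here is one standard trick from the meta-analysis literature, expressed with statsmodels. The function name and setup are mine.

```python
import numpy as np
import statsmodels.api as sm

def meta_regression_fixed(es, se, X):
    """Fixed-effects meta-regression sketch: weighted least squares with
    inverse-variance weights, then rescale the coefficient SEs so they
    reflect the meta-analytic weights rather than the WLS residual variance
    (which is what makes 'regular regression' output incorrect here)."""
    w = 1 / np.asarray(se)**2
    X = sm.add_constant(np.asarray(X))          # moderator(s) plus intercept
    fit = sm.WLS(np.asarray(es), X, weights=w).fit()
    correct_se = fit.bse / np.sqrt(fit.mse_resid)
    z = fit.params / correct_se                 # Z-test per coefficient
    q_model = fit.ess                           # analogous to the model Q,
    return fit.params, correct_se, z, q_model   # chi-square with p predictors df
```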
Now, the other thing, in the remaining few minutes, is the forest plot. It would probably take me an entire hour and 15 minutes to teach you how to create forest plots, and that would just be in one program; it would take another hour for another program, because the details depend on which program you use. But it's important to make sure you understand what they are: a visual representation of the results. It's what we started with. The one I presented was a stripped-down version; I wanted to make it as simple as possible. Typically you would have a study label (mine had that: it indicated the author and the year), and often another column or two of other information, like the study sample size, or maybe the percent successful and percent unsuccessful. You can imagine several columns of information you might want to present to your reader. Then there's the forest plot itself, the really pretty graphic part. You'll have something to indicate the effect size: a dot, a diamond, a square. Some forest plot programs make this marker bigger or smaller depending on the size of the weight; some people think that's really cool, because then studies that get a lot of weight get a lot of ink and studies with very little weight get very little ink. My diamonds were all the same size. You then have a horizontal line representing the confidence interval. And at the bottom you'll have a row for the overall result, the diamond; some programs use squares for the individual studies up above and a diamond down at the bottom. There are various ways of doing this, but the basic idea is the same. So that's where we started at the beginning: the rows with the labels, the diamonds for the effect sizes, the confidence intervals, and the main result. A bare-bones, simple meta-analysis.

[In response to a question] Yes, sometimes they draw the diamond so it spans the whole thing, so the diamond is both the confidence interval and the center. But the main thing those programs are doing is sizing the marker by weight: for a precise study they make the diamond really big, because it's getting more weight, and for one with very little precision they make it small, with a big line, so that the amount of ink reflects how much weight the study got. Now, for me, the width of that confidence interval line already tells me that. Their argument is that the plain line works in the opposite direction. A study with a big line, is that a precise or an imprecise effect size? Imprecise. So how much weight is it going to get? Not much. This one, precise or imprecise? Precise, so it gets a lot of weight. Now which one gets more ink? In this example the precise one gets less ink. So what marker sizing is trying to correct is this conceptual, visual problem: in the plain style, the effect sizes that matter the most get the smallest amount of ink. That's a good question; I should have brought an example of one of those.
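As a rough illustration of that anatomy (a label per study, an effect size marker, a horizontal confidence interval line, and an overall row at the bottom), here is a minimal matplotlib sketch. This is not from the talk: the study names and numbers are invented, the log odds ratio scale is assumed, and it pools with fixed-effect weights for simplicity; dedicated meta-analysis software, discussed next, produces far more polished plots.

```python
# A bare-bones forest plot sketch with hypothetical numbers. Markers are
# all the same size; the horizontal line is the 95% confidence interval.
import numpy as np
import matplotlib.pyplot as plt

labels = ["Study A 1995", "Study B 1999", "Study C 2003", "Study D 2007"]
es = np.array([0.10, 0.35, 0.20, 0.60])      # e.g., log odds ratios
se = np.array([0.15, 0.10, 0.25, 0.40])      # their standard errors

w = 1.0 / se**2                              # inverse-variance weights
mean = np.sum(w * es) / np.sum(w)            # fixed-effect pooled mean
mean_se = np.sqrt(1.0 / np.sum(w))

rows = np.arange(len(labels), 0, -1)         # studies listed top to bottom
fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar(es, rows, xerr=1.96 * se, fmt="D", color="k", capsize=3)
ax.errorbar([mean], [0], xerr=[1.96 * mean_se], fmt="D",
            color="b", capsize=3)            # overall result at the bottom
ax.axvline(0, linestyle="--", color="grey")  # null value (0 on log scale)
ax.set_yticks(list(rows) + [0])
ax.set_yticklabels(labels + ["Overall"])
ax.set_xlabel("Effect size (log odds ratio)")
plt.tight_layout()
plt.show()
```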
Okay, some comments on software. What software should you use? Well, you can use whatever software you want; you have several choices. You've got specialized software like RevMan, which was developed specifically for Cochrane. It makes reasonably nice forest plots, and it's easy to create them. It calculates the effect sizes for you, but only for the basic situations where you really have the means and the standard deviations and that sort of thing; it can't handle all the wacky stuff that my effect size calculator can handle. And it does the basic analyses, but it doesn't do moderator analysis, or at least I don't think so; somebody can correct me, because I haven't used it in a long time and they keep updating it. Does it handle moderator analysis? No one? Okay, this is probably not the right class to ask that question.

RevMan is free; Comprehensive Meta-Analysis you have to pay for. Comprehensive Meta-Analysis does a very nice job with calculating effect sizes and doing the analyses. You code your studies right into it and it manages the bibliography; it's sort of the one-stop shop for meta-analysis, and it creates very nice forest plots. But it costs money, and when it comes to regression analysis it can only handle one variable. The author, Michael Borenstein, keeps telling me, "Yes, I'll fix that, we'll fix that at some point," so eventually he'll get around to having a regression routine that can do more.

The other alternative is to use an existing stats program with add-on macros. For Stata there are lots of macros available for this; I've written some, and many other people have written macros you can download from the web, and with those Stata can do meta-analysis. For SPSS, I think the only macros I'm aware of are the ones I've created, on my website. For SAS, I've created macros, and there's a book out that has macros, but I'm not aware of much else on the web for SAS other than my stuff. Has anybody even heard of R? All right, a couple of hands going up. R is a free program, really cool, kind of hard to learn, and lots of people have written meta-analysis routines for R, including a really nice forest plot procedure. For computing effect sizes, Comprehensive Meta-Analysis does a very nice job and RevMan has limited abilities. Free on the web, and not to plug my own stuff, there is my effect size calculator. Will Shadish also created one that's for sale, but it was developed about twelve years ago and I don't know whether it's been kept up to date.

A few final comments. We jammed through a lot there; I'm proud of you guys, you made it. These methods continue to advance. I've presented the bare bones, but in early July I was in Ottawa, spending three days at a conference of the Society for Research Synthesis Methods, talking about these methods and having arguments about their advancement. There's a lot of active work in this area. Something I haven't talked about, because it really should get an entire class of its own, is publication selection bias: how to assess it, how to try to prevent it, and so on. It's a very important topic, and I didn't want to shortchange it with one or two minutes; the focus here was how to calculate those fundamental meta-analysis statistics. The other thing: I told you that only one effect size from each study gets to go to the prom. Well, there are methods in development that would allow much more complicated relationships to go to the prom, ways of handling essentially all the effect sizes at once. Actually, I think that talk is happening in the next room.
It's pretty cutting-edge stuff, just developed within the last year, and it's quite interesting, but it probably won't be ready for everyday use for a little while; we're still kicking the tires to make sure it really behaves the way we think it does.

Some common errors. First, incorrectly computing effect sizes: the rest of the analysis doesn't matter if you get the effect sizes wrong, so you really want to make sure you understand what you're doing when it comes to calculating them. Related to that, a lot of people don't recognize data that could be used to calculate effect sizes, so studies often get kicked out that could have made it into the review if a little more time had been spent thinking about how to compute the effect size. Another error is using fixed-effect models instead of random-effects models; remember, just use a random-effects model. And finally, not using a moderator analysis to compare means. I talked about this earlier: a lot of people essentially just run a meta-analysis on each subset and look at which ones are statistically significant and which ones aren't. That is not the same as testing whether the subsets are statistically different from one another, and it's a very common error.
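Since "just use a random-effects model" is one of the takeaways, a minimal sketch of what that means computationally may help. This is not from the talk: the numbers are invented, and the tau-squared estimator shown (DerSimonian-Laird) is one common choice among several.

```python
# A minimal sketch contrasting the fixed-effect mean with a random-effects
# mean using the DerSimonian-Laird tau-squared estimator. The effect sizes
# and variances below are invented for illustration.
import numpy as np

def pooled_means(es, v):
    es, v = np.asarray(es), np.asarray(v)
    w = 1.0 / v                                  # fixed-effect weights
    fe_mean = np.sum(w * es) / np.sum(w)
    q = np.sum(w * (es - fe_mean) ** 2)          # homogeneity Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(es) - 1)) / c)     # DerSimonian-Laird tau^2
    w_re = 1.0 / (v + tau2)                      # random-effects weights
    re_mean = np.sum(w_re * es) / np.sum(w_re)
    re_se = np.sqrt(1.0 / np.sum(w_re))
    return fe_mean, re_mean, re_se, tau2

es = [0.1, 0.5, 0.3, -0.1, 0.7, 0.4]
v  = [0.04, 0.02, 0.05, 0.03, 0.06, 0.04]
fe, re, se, tau2 = pooled_means(es, v)
print(f"fixed-effect mean = {fe:.3f}, random-effects mean = {re:.3f} "
      f"(SE = {se:.3f}), tau^2 = {tau2:.3f}")
```

When the studies are homogeneous (tau-squared estimates to zero), the two means coincide; when there is extra between-study variability, the random-effects weights pull the pooled estimate toward an unweighted average and widen its confidence interval.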