Statistics 101: Two-way ANOVA with Replication, Marginal Means Graphs

Video Statistics and Information

Captions
- [Brandon] So let's go ahead and generalize this a little bit. So we have two-factor ANOVA. And let's look at all the possible outcomes we could have in terms of significance. So in this case, our Factor A we'll call food, and our Factor B we'll call the feeding frequency, one or two times per day. And then Factor AB is the interaction. So we really have five possibilities here. Factor A, the food, could be not significant. Factor B, the feeding, could be significant. And the interaction could be not significant. And again, technically, we would look right to left. So that's actually what I'll do from here on out. So we go down to the second row. The interaction could be not significant. Factor B, the feeding, could be not significant. And Factor A could be not significant. That's another option. Go, then, to the third row. Again, the interaction would not be significant. Factor B would not be significant. And maybe in this case, Factor A is significant. Go down to the fourth row. The interaction is not significant. But Factor B and A are both significant. And the final possibility is where the interaction is significant. And notice, once the interaction is significant, we don't even look, we don't even consider, the two factors individually. If the interaction is significant, the individual factors cannot be analyzed, because they are too intertwined. They are too confounded together. We cannot look at them separately.
So how would this actually look on an interaction graph? Let's really understand how these are made. So you can see here, along the bottom, we have our column effects. Remember, our column was our plant food. So in this case, I labeled them A, B, and C, same thing. So Food A, Food B, and Food C, those are our column effects. Each line is our row effects. So remember our rows in our table were the frequency with which we fed the plants. So we have column effects along the bottom, and the row effects are our actual lines.
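The reading order described above (interaction first, main effects only when the interaction is not significant) can be sketched as a small Python function. This is an illustration only: the function name `terms_to_interpret`, the default 0.05 significance level, and the p-values in the calls are assumptions made for the sketch, not output from the video.

```python
def terms_to_interpret(p_interaction, p_factor_a, p_factor_b, alpha=0.05):
    """List the terms to interpret, checking the interaction first."""
    if p_interaction < alpha:
        # Significant interaction: stop here and leave the main effects alone.
        return ["interaction"]
    significant = []
    if p_factor_a < alpha:
        significant.append("Factor A (food)")
    if p_factor_b < alpha:
        significant.append("Factor B (feeding)")
    return significant


# One call per row of the five-possibility table; p-values are made up.
print(terms_to_interpret(p_interaction=0.60, p_factor_a=0.40, p_factor_b=0.01))
print(terms_to_interpret(p_interaction=0.60, p_factor_a=0.40, p_factor_b=0.30))
print(terms_to_interpret(p_interaction=0.60, p_factor_a=0.01, p_factor_b=0.30))
print(terms_to_interpret(p_interaction=0.60, p_factor_a=0.01, p_factor_b=0.01))
print(terms_to_interpret(p_interaction=0.01, p_factor_a=0.01, p_factor_b=0.01))
```

In the last call both main effects have small p-values, but the function still returns only the interaction, which mirrors the rule that a significant interaction ends the analysis of the individual factors.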
Now in the axis over here on the left, you can see that that is our growth in centimeters. And we're gonna be plotting the cell means on this graph. Now, when we go to our column effects, what we're gonna look for is, are the lines increasing, going up, or decreasing, coming down? Or, are they just flat, like you see them here? Now for the row effects, what we're gonna look for is, are they far apart, or are they close together? So think about this for a minute. Look at the column effects up-and-down arrow. If the plant grows more as we go from Food A to Food B to Food C, we would expect those lines to go up and to the right. Well, if the plants get taller as we go from Food A to Food B to Food C, we would probably think that the plant food makes a difference. It does have an effect. Now if it goes down and to the right or something like that, then again we would say it probably has an effect, in this case the opposite, that the plant growth actually goes down as we go from plant-food A to B to C. Now of course it can be sort of up and down, too, it doesn't have to be a straight line either way. And what about the row effects? If the row effects are far apart, it would seem that the frequency with which we feed the plants, makes a difference. So in this case, it seems like the two-feeding line is much higher than the one-feeding line. So it would appear, based on this graph, that the frequency with which we feed our plants, does make a difference. And again, these are just graphical models so we can understand how they're sort of made. So let's look at this one. And this is some hypothetical data, just to sort of get across how these graphs are made. So for one feeding, we had a growth of 50 centimeters for A, 50 centimeters for B, and 50 centimeters for C. So you can see, these points right here. Now for two feedings, we had 70 centimeters for Food A, 70 for Food B, and 70 for Food C. That's the top line. 
Now, do these lines go up and to the right, or down and to the right? Are they not flat, basically? Well, the lines are flat. So it seems that there's no change across the different plant foods. So those are our column C. So there's no change across plant foods because the lines are flat. Now what about the distance between the lines? Well these are our row effects. The lines are far apart. They change across the feedings. So our rows. And that should make sense based on our data up there. We can see that the two-feeding row is consistently 20 centimeters higher than the one-feeding row. So what we conclude, based on a graph that looks something like this... Well the lines do not cross, we know that. So, we would say that our columns are not significant, probably. We wouldn't know until we did the actual analysis, but it appears our columns, or our plant foods, are not significant, because the growth does not change at all across the plant foods. What about our row factor? Well in this case, it does appear that the row factor is significant. The lines are far apart. The two-feeding line is pretty far above the one-feeding line. And then, we already said that the lines did not cross, or are not going to cross, so there is no interaction here. There is no change between the terms as we go across the graph. No interaction. So what about one that looks like this? So here's our hypothetical data. So, for one feeding per day, we had 60, 60, 60. And then, for two feedings per day, we had 62, 62, 62. That's the top line. Are the lines increasing? No. Are the lines decreasing? No. So what does that tell us? Well, the lines are flat, so that means that the growth of the plants, again, does not change as we go from plant food to plant food. It's always 60, 60, 60, or 62, 62, 62. There's no change there. Now what about the distance between the lines? Again, this is our feeding lines. Well, the lines are very close together. 
There's really no significant change across the feedings. 62 is not that much different than 60. So, in this case, of course the lines did not cross. And, think about this in practical terms. Does it matter which plant food we use? No. Does it really matter which feeding we use, necessarily? One or two? Not really. We're getting about the same growth from one feeding as from two. One feeding is cheaper, so we'll probably just feed the plants once. Because it's not that big a deal to get two more centimeters for a whole other feeding. And of course, there's no interaction. So, how do we interpret this? Well, the columns, the foods, that's not significant, the lines are flat. The row factor, the frequency at which we feed the plants, that's not significant. Then of course, there is no interaction. So, neither main effect, nor the interaction, is significant. And that's it.
What about one that looked like this? So here's our data. So for one feeding per day, it was 50, then 55, and then 60. And then for two feedings per day, 52, 57, 62. Are our lines increasing up and to the right? Well, yes they are. Are they down to the right? No. But they are increasing up and to the right. So, what does this seem to indicate? Well the lines rise. So in this case, there is some change across the plant foods. So plant-food A seems to be the lowest in growth. Plant-food B seems to be the next-highest in growth. And plant-food C seems to be the highest in growth. There is change across the columns as we go from A, to B, to C. Now what about the distance between the lines? Well, it's not very much, it's still two again. So the lines are really close together, so it does not seem that the number of times we feed the plants per day makes that much of a difference. Again, just two centimeters. And of course, the lines do not cross. So, two feedings per day is consistently better than one feeding per day, even though it's not by that much.
So we could probably conclude here that the columns, C, the plant foods, are significant. So the growth does go up as we go from Food A to Food B to Food C. Now the rows, or the number of feedings per day, is not significant, because the lines are very close together. And of course, there is no interaction, because the two-feeding line is always above the one-feeding line. So, in this case, all else being equal, we would probably select Food C and feed the plants once per day. Because it was consistently the highest. But, feeding it twice doesn't really get us anything.
So, how about this one? So here's our data. So for one feeding, we have 50, 55, and 60. That's the line on the bottom. And then for two feedings per day, we have 70 centimeters, 75, and 80. Hmm, so. Are the lines going up and to the right, or down and to the right? Well yeah, they're going up and to the right again. So what can we conclude from that? Well, the lines rise. So there is a change in growth, again, across the plant foods. As we go from A to B to C, both the one-feeding line and the two-feeding line go up and to the right. What about the distance between the lines? So, our feeding lines? Well, they're pretty far apart. So, it does seem that two feedings per day generates a lot more growth. And, do the lines cross? No. Again, because the two-feedings-per-day line is consistently higher than the one-feeding-per-day line as we go across. So what do we conclude here? Well, the column effects, the foods, are significant. That's because the lines go up and to the right. Now the row effect, or the feeding frequency, is also significant because the lines are far apart. The two-feeding line is always above, and pretty much, you know, 20 centimeters higher than the one-feeding line. Now there is no interaction here. Because the two-feeding line is always above the one-feeding line, they do not cross.
Now, how about this one? This one looks different.
So, for the one feeding per day, we had a growth of 50 for plant-food A, 60 for plant-food B, and 70 for plant-food C. Now, for the two-feeding line, we had 70 for plant-food A, 60 for plant-food B, and 50 centimeters for plant-food C. Well, that's kind of weird, isn't it? So, do the lines cross? Yes. And notice we stop there. Because we really couldn't say that there's an upward or a downward pattern overall, could we? No. We really couldn't gauge that the lines are far apart because, well, they cross. So, we look at the interaction first. The lines cross, and we stop. So, our interaction is significant. Neither line is above the other one, consistently, across the graph. They cross. So therefore, we would anticipate a significant interaction. And if the interaction is significant, we stop there. We do not analyze the columns, or the rows. We can look at a quick example, just another one, so you can see how this works. We won't go through all that again. But here we have a table. And these are shoe stores. And on the top in the columns, we have the number of competitors, so the competitor shoe stores, within a five-mile radius of each location. So zero competitors in a five-mile radius. One competitor, two, or three or more. On the left-hand side are rows. We have the type of store it is. So we have the standalone store, which is just a store by itself that's sitting out somewhere. Then we have a store that's in the mall. So maybe in a suburban mall, or something like that. And then finally, we have stores that are located in a downtown, sort of urban area. So those are our two factors: the number of competitors in a five-mile radius, and the type of store it is. And then here, we have the average shoe sales per week. So we can see the standalone store with zero competitors around it, had an average shoe-pair sale of 38.667 pairs of shoes. And so on and so forth. So we can see that each column has its own mean. So the zero-competitor column has 30.444. 
One-competitor column has 29.556 pairs of shoes on average. Et cetera, and so on. Also, the type of store has its own mean here in the end. And then down on the bottom right, we have the overall mean for all the shoe stores. And there are 10 shoe stores in each cell. So what you're looking at here are cell means. Cell means. So 10 stores are combined to generate these means in this table. So let's go ahead and look at our estimated marginal means graph that comes out of SPSS. Now I've tried to color-code these, so you can see exactly where everything is. So let's go ahead and look at the standalone row. That's the blue line. It's also the blue row in the table over here on the left. So you can see we have 38.667, 36, 52.667, and 42. So if we look at the blue line over here on the right, that follows that pattern: 38, 36, 53, about, and then 42. Then of course, the green row is the mall. So on the table, we have 26, 31, 47, 46. If we look at the row over here on the right, in the graph, the green line follows those numbers exactly. Then we have the downtown line. So 26, 21, 27, 27. So this sort of light-brown line along the bottom, and our marginal means follows those numbers. Now, let's go ahead and look at this in a different way. So here is the same thing we had before. And I want you to follow along. This is really important to understanding what we're trying to learn here. So let's take the overall mean. So the overall mean is 35.278. And I gave that a dashed line. Now let's go ahead and take that dashed line and put it on our marginal-means graph. So we'll put it right there. So that is a value of 35.278, approximately. And we'll put that right across to our marginal-means graph. And again, that's the overall mean for all stores. Now let's go ahead and give the zero column a red dot. So, the zero-competitors column had a mean of 30.444. Now let's go ahead and represent that on our graph over here on the right. So, boop, there it is. 
Now, if you look at our column, if you put your hand over everything to the right of zero, forget it exists. Now we go ahead and put a red dot where our column mean is for all those zero points. So you can see that we have 26 for the green, we have 26.6-ish for the light brown, and then 38.667 for the blue. The red dot there is the mean for that column. Let's go and do the same thing for one competitor. And there it is. So 29.556. So that is the mean of those three dots that make up the one-competitor column. Now for two competitors, we'll put a green dot. And there it is. So that is the mean for those three points in the two column. And then a yellow dot for three or more, so put that right there. So those four dots are the column means for our number of competitors. So 30.444, 29.556, 42.556, and 38.556. So red, blue, green, and yellow.
Now, let's do the same for our rows. So we'll give the standalone stores a light blue star. So, let's go ahead and put that mean there. So what is that? Well that is the mean of the blue line. So the four points that make up the blue line... So it starts at around 40, it goes down, then goes way up, and then comes down again. The mean of those four points is 42.333, and that's about right there, relative to the values on our graph. Now do the same thing for the mall. So we can see that the mean for the mall is 37.667, and we'll put that right there. So, that green star is the mean of the four points that make up the green line. And finally, the downtown. We'll give that sort of a pink color. So we'll do that. And that is the mean of those four points. So the four points that make up the downtown line there along the bottom, that mean is there. So really, take a look at that. That's really important for understanding how this works. Right, so we'll go ahead and put everything back. So you can see how it all fills in. And there we go. Now, on the lower left, we have the output from SPSS. So what's the first thing we look at?
The interaction term, exactly. So we look at competitors by location. There's our interaction term highlighted in yellow. So we go across. Its significance is point-016, with an F of 3.315. So if we're at a point-05 significance level, that interaction term is significant. So again, we would not go on and evaluate the two individual factors independently. So we'll say the interaction is significant, and we stop there. Now there are ways to put the sort of effect size into your research paper or something like that for those other terms, but we're not gonna go into that. But really, in the analysis, we look at the interaction first.
Now let's go back to our food data, our plant food. So here's everything we had from the plant food. Same idea. So, we're gonna put our overall mean there, where the black dashed line is. So our overall mean is 59.09. So we'll put it right across there. Now for the Awesome Advantage column, a red dot again, that column had a mean of 63.5. So we'll put that right there. So again, that's the mean of those two points that make up the AA column. For BB, for Big Buds, we had a column mean of 64.75. So we'll go ahead and put that there. So again, that's the mean of those two cell means. And then for Food CC, same idea. It is there. So again, those are our column means. So those are the means of the two points that make up each column. Now let's do the rows. So, for the one-feeding row, we're gonna put that there. So it had a mean of 60, so it's almost right on top of our overall mean at 59.09. And then our two-feeding row had a mean of 58.17. So it's almost right on top of that one too. Now, let's go ahead and look at our SPSS output. So again, we look at the interaction first. So there it is. So feedings by plant food. And we see that we have an F-value of 9.333, and again our significance is point-000, so that is obviously significant. So we would really stop there in our analysis.
We would not go on to evaluate the individual effects. But, for this problem, we're gonna take a look at that. Because this is really important to understanding how all this fits together. Look over at our graph here on the right, okay? Now look at our overall mean line, the black dashed line. Which of the two are, on average, further away from that? The column means, so the types of plant food, or the row means, the frequency at which we feed? Well according to my vision, it looks like there is more variability in the type of plant food. So you can see that the distance from the dashed line to the dots is much greater than the distance from the dashed line to the stars. Now, go over and look at our SPSS output again. Look at the significance level for feedings. Those are our stars. So we have an F of point-578, and a significance value of point-451. Those are our stars. Now I want you to connect the F and significance for the feedings on the left, to where the stars are, relative to the dashed line over here on the right. Are they far away? No. Is there a lot of variability in them? No. They're almost right on top of the overall mean. That's why it's not significant. Because they're right on top of the overall mean. There is no variability there. There is no difference there.
Now, look at the plant-food factor. It has an F of 16.526, and a significance value of point-000. So if we were looking at it on its own, we would definitely say it's significant. But of course, there's an interaction, so we don't do that. But, look at how the SPSS output corresponds to the means over here on the right. So for the plant foods, there's a lot of variability among them. The red and the blue are above the overall mean. The green is significantly below the overall mean. There's a lot of distance there. There's a lot of variability there. There's a lot of change there as it corresponds to the overall mean. As compared to the stars.
And that's how we can visually interpret the SPSS output over here on the left. If you can understand what's going on in this graph, in this slide here, you've got it.
Alright, so let's sum this up and get done with this video. So, interaction graphs are an easy way to eyeball the relationships between cell means, column means, row means, and the overall mean. Now, crossed-row lines indicate, usually, an interaction. Column means spread far apart from the overall mean, like we saw last time, indicate a significant column factor. Row means spread far apart from the overall mean usually indicate a significant row factor. So if you remember that interaction graph with the dots and the stars, the further away the dots and the stars are from that overall-mean line, the more likely those are to be significant. That of course depends on the data as far as whether or not there is an interaction. Because both main effects can be significant without there being a significant interaction. It just depends on where those dots and stars are, relative to each other. And that's the whole point. So, just because each effect is significant does not mean there's an interaction that's significant. That's important to remember. But if an interaction exists, and it is significant, the row and column effects cannot be evaluated individually. The main effects are too intertwined. They are too confounded together to look at individually, because the values change as you go across. And you really can't untangle it. It's like a knot you cannot untangle.
Okay, so that wraps up our video on The Analysis of Variance, Part Two, where we're looking at graphs of marginal means. And again, I hope that you realize that this sort of visual tool can really key you in as to what's going on in your data. So if you make a simple graph of marginal means, it can tell you a lot about your data before you even run it through a statistical software package.
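The dots, stars, and dashed line from the shoe-store slide can be reproduced with a few lines of Python. This is a rough sketch, not the SPSS procedure: the cell means are the rounded values read aloud in the video (the downtown, zero-competitor cell is only given as "26.6-ish"), the variable names and the `spread` helper are made up for illustration, and averaging cell means without weights assumes every cell holds the same number of stores.

```python
from statistics import mean

# Competitor levels (columns) and store types (rows) from the shoe-store table.
levels = ["0", "1", "2", "3+"]
cell_means = {
    "standalone": [38.667, 36.0, 52.667, 42.0],
    "mall": [26.0, 31.0, 47.0, 46.0],
    "downtown": [26.6, 21.0, 27.0, 27.0],  # first value is approximate
}

# Row means (the stars): average of the points on each line.
row_means = {store: mean(values) for store, values in cell_means.items()}

# Column means (the dots): average of the points stacked above each level.
col_means = {
    level: mean(cell_means[store][i] for store in cell_means)
    for i, level in enumerate(levels)
}

# Overall mean (the dashed line): average of all cell means.
overall = mean(v for values in cell_means.values() for v in values)


def spread(marginal_means, overall_mean):
    """Average absolute distance of a set of marginal means from the overall mean."""
    return mean(abs(m - overall_mean) for m in marginal_means.values())


print("row means:", {k: round(v, 3) for k, v in row_means.items()})
print("column means:", {k: round(v, 3) for k, v in col_means.items()})
print("overall mean:", round(overall, 3))
print("row spread:", round(spread(row_means, overall), 3))
print("column spread:", round(spread(col_means, overall), 3))
```

Because the inputs are rounded, the results land near the figures quoted in the video rather than exactly on them. The `spread` values are only a numeric stand-in for eyeballing how far the dots and stars sit from the dashed line; they are not a significance test.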
Now I will say that the examples that we used here were simple. We didn't have more than a few levels in any variable, and they were pretty obvious. Now you can have multiple levels in a variable. So you could have, you know, six competitor levels, and five different types of stores if you wanted to. And the interaction graph would be, well, may be hard to decipher if you have a lot of levels in your variables. So the ones we were looking at are relatively simple. But once you pick up the pattern, even in more complex ones, you'll be able to see these patterns in the interaction graphs, or the graphs of marginal means, that again will help you interpret what's going on with your data. So, thank you very much for watching. I wish you the best of luck in your work and in your studies, and look forward to seeing you again next time. (light music)
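As a recap of the five hypothetical plant-food graphs, the eyeball checks from the video (do the lines cross, are they flat, are they far apart) can be written as a short Python sketch. The function name `eyeball` and both thresholds (`flat_tol`, `far_apart`) are arbitrary assumptions chosen so that a 2 cm gap counts as "close" and a 20 cm gap counts as "far"; a real analysis still needs the ANOVA itself, and lines that diverge without crossing can also produce a significant interaction, which this sketch does not check.

```python
def eyeball(one_feeding, two_feedings, flat_tol=1.0, far_apart=10.0):
    """Rough visual read of a two-line interaction graph (growth in cm)."""
    diffs = [two - one for one, two in zip(one_feeding, two_feedings)]
    if min(diffs) < 0 < max(diffs):
        # Neither line stays above the other: look no further.
        return "lines cross: expect an interaction and stop"
    # Column effect: how much a line rises or falls across foods A, B, C.
    swing = max(max(line) - min(line) for line in (one_feeding, two_feedings))
    # Row effect: average vertical gap between the two lines.
    gap = abs(sum(diffs) / len(diffs))
    foods = "foods differ" if swing > flat_tol else "foods look the same"
    feedings = "feedings differ" if gap >= far_apart else "feedings look the same"
    return f"{foods}; {feedings}; no crossing"


# The five hypothetical data sets from the video, in order.
graphs = [
    ([50, 50, 50], [70, 70, 70]),
    ([60, 60, 60], [62, 62, 62]),
    ([50, 55, 60], [52, 57, 62]),
    ([50, 55, 60], [70, 75, 80]),
    ([50, 60, 70], [70, 60, 50]),
]
for one, two in graphs:
    print(one, two, "->", eyeball(one, two))
```

With these thresholds the five results line up with the conclusions drawn in the video: only feedings, neither, only foods, both, and a crossing that ends the analysis.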
Info
Channel: Brandon Foltz
Views: 49,888
Rating: 4.9267178 out of 5
Keywords: marginal means, estimated marginal means, estimated marginal means spss, brandon foltz anova, two way anova with replication, statistics 101 anova, ANOVA basics, two-way ANOVA, two-factor ANOVA, completely randomized block design, ANOVA introduction, analysis of variance, one way ANOVA, Two-way Analysis Of Variance, brandon foltz, statistics 101, two way ANOVA, two way ANOVA without replication, two factor ANOVA, simple linear regression, machine learning, main effects
Id: GquPk1_CVcM
Length: 28min 53sec (1733 seconds)
Published: Thu Oct 24 2013