- [Brandon] So let's go
ahead and generalize this a little bit. So we have two-factor ANOVA. And let's look at all
the possible outcomes we could have in terms of significance. So in this case, our
Factor A, we'll call food, and on our Factor B, we'll
call the feeding frequency, one or two times per day. And then Factor AB is the interaction. So we really have five possibilities here. Factor A, the food could
be not significant. Factor B, the feeding
could be significant. And the interaction
could not be significant. And again, technically, we
would look right to left. So that's actually what
I'll do from here on out. So we go down to the second row. The interaction could not be significant. The Factor B, the feeding,
could not be significant. And Factor A could not be significant. That's another option. Go, then, to the third row. Again, the interaction
would not be significant. Factor B would not be significant. And maybe in this case,
Factor A is significant. Go down to the fourth row. The interaction is not significant. But Factor B and A are both significant. And the final possibility is where the interaction is significant. And notice, once the
interaction is significant, we don't even look, we
don't even consider, the two factors individually. If the interaction is significant, the individual factors
cannot be analyzed, because they are too intertwined. They are too confounded together. We cannot look at them separately. So how would this actually
look on an interaction graph? Let's really understand
how these are made. So you can see here, along the bottom, we have our column effects. Remember, our column was our plant food. So in this case, I labeled
them A, B, and C, same thing. So Food A, Food B, and Food C, those are our column effects. Each line is our row effects. So remember our rows in our table were the frequency with
which we fed the plants. So we have column
effects along the bottom, and the row effects are our actual lines. Now in the axis over here on the left, you can see that that is
our growth in centimeters. And we're gonna be plotting the cell means on this graph. Now, when we go to our column effects, what we're gonna look for is, are the lines increasing, going up, or decreasing, coming down? Or, are they just flat, like you see them here? Now for the row effects,
what we're gonna look for is, are they far apart, or are they close together? So think about this for a minute. Look at the column
effects up-and-down arrow. If the plant grows more as we go from Food A to Food B to Food C, we would expect those lines
to go up and to the right. Well, if the plants get taller as we go from Food A to Food B to Food C, we would probably think that the plant food makes a difference. It does have an effect. Now if it goes down and to the
right or something like that, then again we would say
it probably has an effect, in this case the opposite, that the plant growth actually goes down as we go from plant-food A to B to C. Now of course it can be
sort of up and down, too, it doesn't have to be a
straight line either way. And what about the row effects? If the row effects are far apart, it would seem that the frequency with which we feed the
plants, makes a difference. So in this case, it seems
like the two-feeding line is much higher than the one-feeding line. So it would appear, based on this graph, that the frequency with
which we feed our plants, does make a difference. And again, these are just graphical models so we can understand how
they're sort of made. So let's look at this one. And this is some hypothetical data, just to sort of get across
how these graphs are made. So for one feeding, we had a
growth of 50 centimeters for A, 50 centimeters for B,
and 50 centimeters for C. So you can see, these points right here. Now for two feedings, we had
70 centimeters for Food A, 70 for Food B, and 70 for Food C. That's the top line. Now, do these lines go
up and to the right, or down and to the right? Are they not flat, basically? Well, the lines are flat. So it seems that there's no change across the different plant foods. So those are our columns, C. So there's no change across plant foods because the lines are flat. Now what about the
distance between the lines? Well these are our row effects. The lines are far apart. They change across the feedings. So our rows. And that should make sense
based on our data up there. We can see that the two-feeding row is consistently 20 centimeters higher than the one-feeding row. So what we conclude, based on a graph that looks
something like this... Well the lines do not cross, we know that. So, we would say that our columns are not significant, probably. We wouldn't know until we
did the actual analysis, but it appears our columns,
or our plant foods, are not significant, because the growth does not change at all
across the plant foods. What about our row factor? Well in this case, it does appear that the row factor is significant. The lines are far apart. The two-feeding line is pretty far above the one-feeding line. And then, we already said
that the lines did not cross, or are not going to cross, so there is no interaction here. There is no change between the terms as
we go across the graph. No interaction. So what about one that looks like this? So here's our hypothetical data. So, for one feeding per
day, we had 60, 60, 60. And then, for two feedings per day, we had 62, 62, 62. That's the top line. Are the lines increasing? No. Are the lines decreasing? No. So what does that tell us? Well, the lines are flat, so that means that the
growth of the plants, again, does not change as we go from plant food to plant food. It's always 60, 60, 60, or 62, 62, 62. There's no change there. Now what about the
distance between the lines? Again, this is our feeding lines. Well, the lines are very close together. There's really no significant change across the feedings. 62 is not that much different than 60. So, in this case, of course the lines did not cross. And, think about this in practical terms. Does it matter which plant food we use? No. Does it really matter which
feeding we use, necessarily? One or two? Not really. We're getting about the same growth from one feeding, as two. One feeding is cheaper, so we'll probably just
feed the plants once. Because it's not that big a deal to get two more centimeters, and a whole nother feeding. And of course, there's no interaction. So, how do we interpret this? Well, the columns, the foods, that's not significant,
the lines are flat. The row factor, the frequency
at which we feed the plants, that's not significant. Then of course, there is no interaction. So, neither of the main effects, nor the interaction,
is significant. And that's it. What about one that looked like this? So here's our data. So for one feeding per day, it was 50, then 55, and then 60. And then for two feedings
per day, 52, 57, 62. Are our lines increasing
up and to the right? Well, yes they are. Are they going down and to the right? No. But they are increasing
up and to the right. So, what does this seem to indicate? Well the lines rise. So in this case, there is some change across the plant foods. So plant-food A seems to
be the lowest in growth. Plant-food B seems to be
the next-highest in growth. And plant-food C seems to
be the highest in growth. There is change across the columns as we go from A, to B, to C. Now what about the
distance between the lines? Well, it's not very much,
it's still two again. So the lines are really close together, so it does not seem
that the number of times we feed the plants per day, makes that much of a difference. Again, just two centimeters. And of course, the lines do not cross. So, two feedings per day
is consistently better than one feeding per day, even though it's not by that much. So we could probably conclude here that the columns, C, the plant foods, are significant. So the growth does go up as we go from Food A to Food B to Food C. Now the rows, or the
number of feedings per day, is not significant, because the lines are very close together. And of course, there is no interaction, because the two-feeding line is always above the one-feeding line. So, in this case, all else being equal, we would probably select Food C and feed the plants once per day. Because it was consistently the highest. But, feeding it twice doesn't
really get us anything. So, how about this one? So here's our data. So for one feeding, we
have 50, 55, and 60. That's the line on the bottom. And then for two feedings per day, we have 70 centimeters, 75, and 80. Hmm, so. Are the lines going up and to the right, or down and to the right? Well yeah, they're going
up and to the right again. So what can we conclude from that? Well, the lines rise. So there is a change in growth, again, across the plant foods. As we go from A to B to C, both the one-feeding line
and the two-feeding line go up and to the right. What about the distance between the lines? So, our feeding lines? Well, they're pretty far apart. So, it does seem that two feedings per day generates a lot more growth. And, do the lines cross? No. Again, because two feedings per day is consistently higher than
the one feeding per day line as we go across. So what do we conclude here? Well, the column effects, the foods, are significant. That's because the lines
go up and to the right. Now the row effect, or
the feeding frequency, is also significant because
the lines are far apart. The two-feeding line is always above, and pretty much, you know, higher, 20-centimeters higher than the one-feeding line. Now there is no interaction here. Because the two-feeding
line is always above the one-feeding line, they do not cross. Now, how about this one? This one looks different. So, for the one feeding per day, we had a growth of 50 for plant-food A, 60 for plant-food B, and 70 for plant-food C. Now, for the two-feeding line, we had 70 for plant-food A, 60 for plant-food B, and 50 centimeters for plant-food C. Well, that's kind of weird, isn't it? So, do the lines cross? Yes. And notice we stop there. Because we really couldn't say that there's an upward
or a downward pattern overall, could we? No. We really couldn't gauge
that the lines are far apart because, well, they cross. So, we look at the interaction first. The lines cross, and we stop. So, our interaction is significant. Neither line is above the other one, consistently, across the graph. They cross. So therefore, we would anticipate
a significant interaction. And if the interaction is
significant, we stop there. We do not analyze the
columns, or the rows. We can look at a quick
example, just another one, so you can see how this works. We won't go through all that again. But here we have a table. And these are shoe stores. And on the top in the columns, we have the number of competitors, so the competitor shoe stores, within a five-mile
radius of each location. So zero competitors in a five-mile radius. One competitor, two, or three or more. On the left-hand side are rows. We have the type of store it is. So we have the standalone store, which is just a store by itself that's sitting out somewhere. Then we have a store that's in the mall. So maybe in a suburban mall,
or something like that. And then finally, we have
stores that are located in a downtown, sort of urban area. So those are our two factors: the number of competitors
in a five-mile radius, and the type of store it is. And then here, we have the
average shoe sales per week. So we can see the standalone store with zero competitors around it, had an average shoe-pair sale
of 38.667 pairs of shoes. And so on and so forth. So we can see that each
column has its own mean. So the zero-competitor column has 30.444. One-competitor column has 29.556
pairs of shoes on average. Et cetera, and so on. Also, the type of store has its own mean here in the end. And then down on the bottom
right, we have the overall mean for all the shoe stores. And there are 10 shoe stores in each cell. So what you're looking
at here are cell means. Cell means. So 10 stores are combined to generate these means in this table. So let's go ahead and look at our estimated marginal means graph that comes out of SPSS. Now I've tried to color-code these, so you can see exactly
where everything is. So let's go ahead and look
at the standalone row. That's the blue line. It's also the blue row in the
table over here on the left. So you can see we have 38.667, 36, 52.667, and 42. So if we look at the blue
line over here on the right, that follows that pattern: 38, 36, 53, about, and then 42. Then of course, the green row is the mall. So on the table, we have 26, 31, 47, 46. If we look at the row over here
on the right, in the graph, the green line follows
those numbers exactly. Then we have the downtown line. So 26, 21, 27, 27. So this sort of light-brown
line along the bottom, on our marginal-means graph,
follows those numbers. Now, let's go ahead and look
at this in a different way. So here is the same thing we had before. And I want you to follow along. This is really important to understanding what we're trying to learn here. So let's take the overall mean. So the overall mean is 35.278. And I gave that a dashed line. Now let's go ahead and
take that dashed line and put it on our marginal-means graph. So we'll put it right there. So that is a value of
35.278, approximately. And we'll put that right across
to our marginal-means graph. And again, that's the
overall mean for all stores. Now let's go ahead and give the zero column a red dot. So, the zero-competitors column had a mean of 30.444. Now let's go ahead and represent that on our graph over here on the right. So, boop, there it is. Now, if you look at our column, if you put your hand over
everything to the right of zero, forget it exists. Now we go ahead and put a red dot where our column mean is for all those zero points. So you can see that we
have 26 for the green, we have 26.6-ish for the light brown, and then 38.667 for the blue. The red dot there is the
mean, for that column. Let's go and do the same
thing for one competitor. And there it is. So 29.556. So that is the mean of those three dots that make up the one-competitor column. Now for two competitors,
we'll put a green dot. And there it is. So that is the mean for those three points in the two-competitor column. And then a yellow dot for three or more, so put that right there. So those four dots are the column means for our number of competitors. So 30.444, 29.556, 42.556, and 38.556. So red, blue, green, and yellow. Now, let's do the same for our rows. So we'll give the standalone
stores a light blue star. So, let's go ahead and
put that mean there. So what is that? Well that is the mean of the blue line. So the four points that
make up the blue line... So it starts at around 40, it goes down, then goes way up, and
then comes down again. The mean of those four points is 42.333, and that's about right there, relative to the values on our graph. Now do the same thing for the mall. So we can see that the mean
for the mall is 37.667, and we'll put that right there. So, that green star is the mean of the four points that
make up the green line. And finally, the downtown. We'll give that sort of a pink color. So we'll do that. And that is the mean of those four points. So the four points that
make up the downtown line there along the bottom, that mean is there. So really, take a look at that. That's really important for
understanding how this works. Right, so we'll go ahead
and put everything back. So you can see how it all fills in. And there we go. Now, on the lower left, we have the output from SPSS. So what's the first thing we look at? The interaction term, exactly. So we look at competitors by location. There's our interaction
term highlighted in yellow. So we go across. Its significance is point-016, with an F of 3.315. So if we're at a point-05
significance level, that interaction term is significant. So again, we would not... We would not go on and evaluate the two... The two individual factors independently. So we'll say the
interaction is significant, and we stop there. Now there are ways to put
the sort of effect size into your research paper
or something like that for those other terms, but
we're not gonna go into that. But really, in the analysis, we look at the interaction first. Now let's go back to our
food data, our plant food. So here's everything we
had from the plant food. Same idea. So, we're gonna put
our overall mean there, where the black-dotted line is. So our overall mean is 59.09. So we'll put it right across there. Now for the Awesome Advantage column, a red dot again, that column had a mean of 63.5. So we'll put that right there. So again, that's the
mean of those two points that make up the AA column. For BB, for Big Buds, we had a column mean of 64.75. So we'll go ahead and put that there. So again, that's the mean
of those two cell means. And then for Food CC, same idea. It is there. So again, those are our column means. So those are the means of the two points that
make up each column. Now let's do the rows. So, for the one-feeding row, we're gonna put that there. So it had a mean of 60, so it's almost right on top
of our overall mean at 59.09. And then our two-feeding row had a mean of 58.17. So it's almost right
on top of that one too. Now, let's go ahead and
look at our SPSS output. So again, we look at
the interaction first. So there it is. So feedings by plant food. And we see that we have
an F-value of 9.333, and again our significance is point-000, so that is obviously significant. So we would really stop
there in our analysis. We would not go on to evaluate
the individual effects. But, for this problem, we're
gonna take a look at that. Because this is really
important to understanding how all this fits together. Look over at our graph
here on the right, okay? Now look at our overall mean
line, the black-dashed line. Which of the two are, on average, further away from that? The column means, so the types of plant food, or the row means, the frequency at which we feed? Well according to my vision, it looks like there is more variability in the type of plant food. So you can see that the
distance from the dashed line to the dots, is much further than the dashed line to the stars. Now, go over and look at
our SPSS output again. Look at the significance
level for feedings. Those are our stars. So we have an F of point-578, and a significance value of point-451. Those are our stars. Now I want you to connect
the F and significance for the feedings on the left, to where the stars are, relative to the dashed line
over here on the right. Are they far away? No. Is there a lot of variability in them? No. They're almost right on
top of the overall mean. That's why it's not significant. Because they're right on
top of the overall mean. There is no variability there. There is no difference there. Now, look at the plant-food factor. It has an F of 16.526, and a significance value of point-000. So if we were looking at it on its own, we would definitely say it's significant. But of course, there's an
interaction, so we don't do that. But, look at how the SPSS output corresponds to the means
over here on the right. So for the plant foods, there's a lot of variability among them. The red and the blue are
above the overall mean. The green is significantly
below the overall mean. There's a lot of distance there. There's a lot of variability there. There's a lot of change there as it corresponds to the overall mean. As compared to the stars. And that's how we can visually interpret the SPSS output
over here on the left. If you can understand what's
going on in this graph, in this slide here, you've got it. Alright, so let's sum this up
and get done with this video. So, interaction graphs
are an easy way to eyeball the relationships between cell means, column means, row means, and the overall mean. Now, crossed-row lines indicate,
usually, an interaction. Column means spread far
apart from the overall mean, like we saw last time, indicate a significant column factor. Row means spread far apart
from the overall mean usually indicate a significant row factor. So if you remember that interaction graph with the dots and the stars, the further away the
dots and the stars are from that overall-mean line, the more likely those
are to be significant. That of course depends on the data as far as whether or not
there is an interaction. Because, both main
effects can be significant without there being a
significant interaction. It just depends on where
those dots and stars are, relative to each other. And that's the whole point. So, just because each
effect is significant, does not mean there's an
interaction that's significant. That's important to remember. But if an interaction exists,
and it is significant, the row and column effects cannot be evaluated individually. The main effects are too intertwined. They are too confounded together to look at individually, because the values
change as you go across. And you really can't untangle it. It's like a knot you cannot untangle. Okay, so that wraps up our video on The Analysis of Variance, Part Two, where we're looking at
graphs of marginal means. And again, I hope that you realize that this sort of visual tool can really key you in as to
what's going on in your data. So if you make a simple
graph of marginal means, it can tell you a lot about your data before you even run it through a statistical software package. Now I will say that the examples that
we used here were simple. We didn't have more than
four levels in any variable, and they were pretty obvious. Now you can have multiple
levels in a variable. So you could have, you know, six competitor levels, and five different types
of stores if you wanted to. And the interaction graph would be, well, it may be hard to decipher if you have a lot of
levels in your variables. So the ones we were looking
at are relatively simple. But once you pick up the pattern, even in more complex ones, you'll be able to see these patterns
in the interaction graphs, or the graphs of marginal means, that again will help you interpret what's going on with your data. So, thank you very much for watching. I wish you the best of luck in your work and in your studies, and look forward to seeing
you again next time. (light music)
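
For anyone who wants to reproduce the eyeball checks from the plant-food walkthrough, here is a minimal Python sketch. It uses the five hypothetical sets of cell means from the video and computes the row means, column means, overall mean, and the gap between the two feeding lines at each food. The names `EXAMPLES` and `summarize` and the example labels are made up for this illustration, and the overall mean is computed assuming every cell carries equal weight. This is not a significance test: it only describes the pattern in the cell means, and the actual F-tests still depend on the variability within each cell.

```python
# Hypothetical cell means from the plant-food walkthrough.
# Rows = feedings per day (one, two); columns = Food A, B, C; values in cm.
EXAMPLES = {
    "flat, far apart": [[50, 50, 50], [70, 70, 70]],
    "flat, close together": [[60, 60, 60], [62, 62, 62]],
    "rising, close together": [[50, 55, 60], [52, 57, 62]],
    "rising, far apart": [[50, 55, 60], [70, 75, 80]],
    "crossing": [[50, 60, 70], [70, 60, 50]],
}


def summarize(cells):
    """Return marginal means and line gaps for a two-row table of cell means."""
    n_rows, n_cols = len(cells), len(cells[0])
    # Row means: one per feeding line (the "stars" on the graph).
    row_means = [sum(row) / n_cols for row in cells]
    # Column means: one per plant food (the "dots" on the graph).
    col_means = [sum(row[j] for row in cells) / n_rows for j in range(n_cols)]
    # Overall mean; averaging the row means assumes equal weight per cell.
    overall = sum(row_means) / n_rows
    # Gap between the two-feeding line and the one-feeding line at each food.
    gaps = [cells[1][j] - cells[0][j] for j in range(n_cols)]
    return {
        "row_means": row_means,
        "col_means": col_means,
        "overall": overall,
        "gaps": gaps,
        "parallel": max(gaps) == min(gaps),  # same gap at every food
        "cross": min(gaps) < 0 < max(gaps),  # gap changes sign, so lines cross
    }


for name, cells in EXAMPLES.items():
    print(name, summarize(cells))
```

In the first four examples the gap is the same at every food, so the lines are parallel and the cell means show no interaction pattern. In the last example the gap goes from 20 to 0 to -20, which is the crossing pattern that tells us to look at the interaction first.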