ANOVA and MANOVA Analysis in R

Video Statistics and Information

Captions

Hey everyone, welcome back to my channel, where I talk about all things tech and finance. In this video I'm going to be going over the theory and application of ANOVA and MANOVA analysis. ANOVA is used to find group differences in one dependent variable, and MANOVA is used to find group differences across two or more dependent variables. You're going to want to use MANOVA if the dependent variables are actually correlated with one another; if they're not correlated, then it just does not make sense to use MANOVA to identify relationships among your many different dependent variables. You can technically run ANOVA on all of your dependent variables, so you have an ANOVA test for each dependent variable separately, but MANOVA can do this simultaneously: you have all of your dependent variables, which are your y variables, and you have your independent variables that are trying to predict those dependent variables, so the relationships among them are captured inside this one analysis.

So just to use an example, let's try to answer the following: do sophomores and juniors differ in their SAT scores on the math section and the reading and writing section? Notice that in this question there are two or more dependent variables; the dependent variables in this case are the math section score and the reading and writing section score. You could technically run two separate ANOVAs, one with the math section score as the dependent variable and the other with the reading and writing section score, but you wouldn't be taking advantage of the unique effects that MANOVA provides by looking at the results simultaneously. MANOVA will test whether there is a difference between sophomores and juniors when considering their math and reading and writing scores simultaneously. We usually use a significance threshold of 0.05, or 5 percent, for the p-value; however, you can change this threshold to 10 percent, 1 percent, or whichever
values you deem fit; it is entirely subjective. If the MANOVA results are significant, meaning your model p-values are less than the threshold value, you can now interpret the ANOVA results. With n dependent variables, divide your threshold value by the number of dependent variables. So if we were to have a threshold of 0.05 and we have two dependent variables, we will have a new, more exacting threshold of 0.025, or 2.5 percent, which we'll then use to see if our variables matter to the overall model. This is known as a Bonferroni correction.

Okay, so first things first: we want to make sure we set our working directory to wherever our data is located. I'll be working with our data set, cooked turkey, where essentially I will be using these two variables, the rep variable and the treatment variable, to predict the dependent variables that we have over here. These dependent variables are cooking loss, pH level, moisture, fat, hex value (which is the hexanol content), non-heme (I'm not even going to try to say that, so iron content), and cooking time.

I went ahead and already wrote out some of the logic, but essentially what you want to do first is read in your data. I'll be working with the cooked turkey data set, where I have all these variables. Our dependent variables are going to be the cooking loss, the pH, the moisture, the fat level, the hex (the hexanol content), the non-heme, and the cooking time, and we'll be predicting these variables using the rep variable and the treatment variable. Our data set is not that large: we only have about 25 observations with nine total variables. The entire point of this is to identify the relationships among our different features, given all of the numerical data that we are working with here. So the overall question that we want to answer is whether there is a significant difference in our
seven dependent variables in our data set coming from our independent variables. To do some cleaning, I set the cooking loss, pH, moisture, fat, hex, non-heme, and cooking time as the y variables, and I converted my rep variable and my treatment variable to factors so they can act as the independent variables we'll be using. Over here I've written out the logic for how to actually use the manova function. It's really quite simple and very similar to the linear modeling function, where we have our y variables, a tilde, and then our x variables. Since this is an additive model, we're not adding any additional terms; we just have the rep factor and the treatment factor over here. So let's run that (I need to run this first, of course, and then this). As we can tell, this is not very interpretable: we just have the terms and the residuals with their associated values, so we have to run an additional test to get more information out of this MANOVA.

As I said in the theoretical portion, we are going to be using 5 percent as our threshold. We don't necessarily have to use 5 percent, but in this case I will; the main point here is to make sure that our threshold is consistent throughout all of our tests. We're going to be using the Wilks, Roy's, Hotelling-Lawley, and Pillai tests to see if our independent variables have any effect on our dependent variables. So let's run the Wilks test. Just for demonstration, this is the ANOVA table, and the most important part here is to see if our p-values are actually less than 0.05. If they are, then we have sufficient evidence to reject our null hypothesis, and our null hypothesis is that our group means are equal to each other. So it passed our Wilks test. Let's see if it passes Roy's: it does, so that's fine. The Hotelling-Lawley test: that's okay too. Let's run the Pillai test, and
as we can see here, our treatment factor is actually not significant; its p-value is actually greater than 0.05, and we'll take a look into this treatment factor very soon. Also, one thing to note: the Pillai test is considered to be probably the most powerful and most robust test to use, especially if our assumptions about linearity are not met, so this is usually the go-to, state-of-the-art test to use when we are working with real-life data to find the relationships between our independent variables and our dependent variables.

We're actually still unclear as to what effect our independent variable, in this case our rep factor, has on our y variables. Notice we have a lot of y variables going on, and we don't know what effect our rep factor has on each individual one. So one way to do this is to run the summary.aov function, and we will have all of the separate ANOVA tests associated with our given model. We have the given response variable, so in this case we have the cooking loss, and we have the independent variables that have some influence on that cooking loss. As we can see, the rep factor actually has an effect but the treatment factor does not. We can look at our pH values: none of our factors actually has an influence on pH. Moisture: the treatment factor actually does have an effect on the moisture value, and so on and so forth. So we can easily check which of these factors has an influence on each of our dependent variables and work from there.

And so, since our treatment factor, from what we ran earlier, did not really have much of an effect on our overall model, one way to identify what is potentially the best treatment to use is to use box plots. I'll just be running through all of our dependent variables: we have cooking loss, pH,
moisture, etc., through cooking time, and we're going to be looking at each of those dependent variables in relation to the treatment factor over here. So I'll just be running through this. Let's actually zoom in on this plot; actually, let's just run everything over here, run all that, and zoom in so you can actually see the plots over here.

So yes, in order to identify what is potentially the best treatment, we just want to see if the means are relatively equal. If they are relatively equal, as if there's a nice horizontal line, then that treatment is probably not the best; we want to have a wide variety across the specific treatments that we are working with. This is one neat way to see which treatments are ones that we can take a further look at. As you can see here, for instance, for the cook time versus the treatment the means are relatively equal, so this is probably not the best model to identify that with.

We can actually take a further look at the cooking time response. So over here, for treatment versus cooking time, there is not really a relationship, because our p-value is so much greater than 0.05, and as you can see here our means are actually really close. So this is another way to see visually whether our independent variable and our dependent variable have an influence on each other.

From here, once we've identified what potential relationships our factors have with our given dependent variables, we would want to run a linear hypothesis test. So I've loaded in this package called candisc; it's very useful for running generalized hypothesis tests and identifying contrasts. So let's run our model 2. In the second model we have our y variables and just our treatment factor, since we want to find a relationship between a sub
treatment factor and our given y variables, with the same data set, cooked turkey. Our hypothesis here is essentially related to the test that we have going on over here. Each of these entries represents a specific feature, and these features follow the order of the treatments you are looking at. So if we have a zero, we're not going to be considering this particular feature for our given treatment over there. We also want to make sure that the sum of our contrast weights is equal to zero. These are essentially just weights that are being linearly combined with the values that we have over here, and we are essentially trying to find out if there is a contrasting relationship between, in this case, our second treatment and our third treatment. You can easily run this through all five of the features that we have going on over here and see if there's actually a relationship.

So one way to do this: we just run our hypothesis and then run the linear hypothesis test here, and as we can see there is actually somewhat of a relationship between our y variables and the treatment factor, especially when our treatment levels are weighted by the weights we set up here. Since all of our tests are actually less than 5 percent, this is a good sign that there's specific observations in our treatment that are not valuable to our overall model.

So let's take a look at the next one. Let's look at the fourth and fifth treatments to see if there's a contrasting relationship. Same thing: run your hypothesis (I just named this hypothesis one) and the same model that we were running over here, and boom: as we can see, our p-values are actually greater than 0.05, meaning that there is actually no contrasting influence between our fourth and fifth observations in our treatment factor. Let's zoom in over here. So, fourth and fifth: there's
no contrasting relationship between the two in terms of all of the relationships between our treatments and our y variables.

So overall we have two different analyses that we can use. We have the ANOVA test, which we primarily use to identify the relationships between our independent variables and one dependent variable, and we have the MANOVA test, which we use if we have more than one dependent variable and want to find the relationships among our independent and dependent variables. In this case I had seven dependent variables, and I wanted to see if there are any relationships between the two independent variables we have up here, the rep factor and the treatment factor, and our seven dependent variables. We use MANOVA to account for the relationships among our dependent variables, and the ANOVA test does not take this into consideration. So if we're working with more than one dependent variable, I highly recommend that you use a MANOVA test to see if there are any relationships between the independent and dependent variables, since the MANOVA test actually considers the relationships among the dependent variables together.

So I hope that you enjoyed this video. Make sure you hit that like button and subscribe, turn those notifications on, and I hope to see you in the next video. Thank you so much for watching.
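The Bonferroni correction described in the captions (divide the threshold by the number of dependent variables) can be sketched in a few lines of R. The two p-values below are made-up numbers for illustration only; they do not come from the video.

```r
# Bonferroni correction: divide the significance threshold by the
# number of dependent variables being followed up on.
alpha <- 0.05          # overall threshold used in the video
n_dependent <- 2       # e.g. math score and reading/writing score
alpha_adjusted <- alpha / n_dependent
print(alpha_adjusted)  # 0.025

# Hypothetical follow-up ANOVA p-values (assumed values, not real results).
p_values <- c(math = 0.012, reading_writing = 0.040)

# A response counts as significant only if it beats the adjusted threshold.
significant <- p_values < alpha_adjusted
print(significant)     # math TRUE, reading_writing FALSE
```

Note that 0.040 would pass the uncorrected 0.05 threshold but fails the corrected 0.025 one, which is the point of the correction.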
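The modeling steps walked through in the captions (fit an additive MANOVA, check the Wilks, Roy, Hotelling-Lawley, and Pillai statistics, then look at the per-response ANOVA tables) might look roughly like the sketch below. The cooked turkey file is not available here, so the data frame is simulated: the column names `rep`, `trt`, `cook_loss`, `ph`, and `moisture`, and every value in them, are assumptions standing in for the real data set, and only three of the seven responses are included to keep it short.

```r
# Simulated stand-in for the cooked turkey data (hypothetical names and values):
# 5 reps x 5 treatments = 25 observations, as in the video.
set.seed(1)
turkey <- data.frame(
  rep = factor(rep(1:5, times = 5)),
  trt = factor(rep(1:5, each = 5))
)
turkey$cook_loss <- rnorm(25, mean = 30, sd = 2) + as.integer(turkey$rep)
turkey$ph        <- rnorm(25, mean = 6, sd = 0.1)
turkey$moisture  <- rnorm(25, mean = 60, sd = 1.5) + 0.5 * as.integer(turkey$trt)

# Additive MANOVA: several responses on the left, two factors on the right.
fit <- manova(cbind(cook_loss, ph, moisture) ~ rep + trt, data = turkey)

# Collect the p-values for rep and trt under each multivariate test statistic.
tests <- c("Wilks", "Roy", "Hotelling-Lawley", "Pillai")
p_values <- sapply(tests, function(tst) {
  summary(fit, test = tst)$stats[1:2, "Pr(>F)"]  # rows 1:2 are rep and trt
})
print(round(p_values, 4))

# One univariate ANOVA table per response, as with summary.aov in the video.
univariate <- summary.aov(fit)
print(univariate)
```

Because the data are random, the printed p-values say nothing about the real turkey experiment; the sketch only shows the shape of the workflow.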
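The video tests treatment contrasts with a linear hypothesis test from packages loaded through candisc. As a base-R stand-in (a different technique from the one on screen, chosen so the sketch has no package dependencies), the same question, "do treatments 2 and 3 differ across all responses?", can be asked by comparing the full model against a reduced model in which those two levels are merged. The data and column names are again simulated assumptions.

```r
# Simulated stand-in data (hypothetical names and values), 5 treatments x 5 obs.
set.seed(2)
turkey <- data.frame(trt = factor(rep(1:5, each = 5)))
turkey$cook_loss <- rnorm(25, mean = 30, sd = 2) + ifelse(turkey$trt == "3", 4, 0)
turkey$moisture  <- rnorm(25, mean = 60, sd = 1.5)

# Contrast weights in the style described in the video: zero means "ignore",
# and the weights must sum to zero.
weights_2_vs_3 <- c(0, 1, -1, 0, 0)
stopifnot(sum(weights_2_vs_3) == 0)
# With the car package installed, car::linearHypothesis(full, weights_2_vs_3)
# would be the closer match to the video; that call is not run here.

# Full multivariate model: every treatment level has its own mean vector.
full <- lm(cbind(cook_loss, moisture) ~ trt, data = turkey)

# Reduced model: treatments 2 and 3 are forced to share one mean vector.
turkey$trt_merged <- turkey$trt
levels(turkey$trt_merged)[levels(turkey$trt_merged) == "3"] <- "2"
reduced <- lm(cbind(cook_loss, moisture) ~ trt_merged, data = turkey)

# Multivariate comparison of the nested models using Pillai's statistic.
comparison <- anova(reduced, full, test = "Pillai")
print(comparison)
p_2_vs_3 <- comparison[["Pr(>F)"]][2]
```

A small `p_2_vs_3` would suggest the second and third treatments differ on at least one response; swapping the merged levels to 4 and 5 gives the second contrast from the video.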

Info

Channel: Spencer Pao

Views: 1,758

Rating: 4.9245281 out of 5

Keywords: ANOVA, MANOVA, Regression, Hypothesis Testing, Multivariate Relationships, Boxplots

Id: FA85ONaVijY

Length: 14min 56sec (896 seconds)

Published: Sun Mar 14 2021