52 Using the GLM Procedure

Video Statistics and Information

Captions
Let's use this PROC GLM statement to first verify the ANOVA assumptions. Then let's test our ANOVA model to answer the question of whether the type of fertilizer the farmers use affects garlic bulb weight. Let's submit this code. Let's first check the log and verify that the code ran successfully. It looks good. Now let's move on to our results.

SAS provides everything we need to verify our ANOVA assumptions and the ANOVA test results. It's a good idea to check our assumptions first, but that means we have to examine the output in a slightly different order. The Class Level Information table specifies the number of levels, the values of the class variable, and the number of observations SAS read. If any row has missing data for a predictor or response variable, SAS drops that row from the analysis.

You assume that the farmers did a good job of sampling garlic bulbs by randomly selecting them, so you assume that the observations are independent and check that off your list.

To verify that the variances are equal across fertilizers, you can first examine the residuals by predicted plot. This plot helps you see graphically whether the equal variance assumption has been met. You don't want to see any patterns or trends; rather, you want to see a random scatter of residuals above and below zero for the four fertilizer groups. The plot looks good. You can examine Levene's test for homogeneity to more formally test the equal variance assumption. You don't want to reject the null hypothesis of equal variances, because that would be rejecting one of your assumptions, so you want a large p-value for this test. Because the p-value of 0.4173 is greater than 0.05, you fail to reject the null and conclude that the variances are equal. This is good: you verified the equal variance assumption.

To verify the assumption that the errors are normally distributed, you check the normal probability plot and the histogram of the residuals. Because the residuals follow the diagonal reference line fairly closely, you can say that they are approximately normal. The histogram of residuals looks approximately normal as well. It has no unique peak and it has short tails, but it's approximately symmetric. So you verify the assumption that the error terms are normally distributed. Now you can look at the ANOVA table and feel comfortable interpreting your p-value.

This is the ANOVA output from PROC GLM. First you see the overall ANOVA table. The first column is information about the degrees of freedom. You can think of degrees of freedom as the number of independent pieces of information, or the number of values in the final calculation of a statistic that are free to vary. The next column is information about the sum of squares. The model sum of squares is 0.00458, the error sum of squares is 0.0218, and the total sum of squares is 0.0264. The mean square model (MSM) is 0.0015. SAS calculates this by dividing the model sum of squares by the model degrees of freedom, which gives you the average sum of squares for the model. The mean square error (MSE) is 0.00078, which is an estimate of the population variance. SAS calculates this by dividing the error sum of squares by the error degrees of freedom, which gives you the average sum of squares for the error. SAS calculates the F statistic by dividing the MSM by the MSE. The F statistic is 1.96. Because the corresponding p-value of 0.1432 is greater than 0.05, you can conclude that there is not a statistically significant difference between the mean bulb weights for the four fertilizers. Remember, you are testing whether the means for the four fertilizer types are equal, so you fail to reject the null.

At this point, it's important for you to realize that the one-way ANOVA is an omnibus test statistic and cannot tell you which specific groups are significantly different from each other, only that at least two groups are different. To determine which specific groups differ from each other, you need to use a post hoc test.

The next table contains the R-square, which is the proportion of variance in the data accounted for by the model. The R-square is between 0 and 1. It's close to 0 if the independent variables do not explain much variability in the data, and it's close to 1 if the independent variables explain a relatively large proportion of the variability in the data. This R-square is 0.1734, so approximately 17 percent of the variation in bulb weight can be explained by fertilizer. Fertilizer doesn't explain much of our variability. Interesting. The coefficient of variation expresses the root MSE as a percentage of the mean bulb weight. It is a unitless measure that is useful in comparing the variability of two sets of data with different units of measure. The root MSE is the estimate of the standard deviation of bulb weights for all fertilizers. The bulb weight mean is the mean of all the data values in the variable bulb weight, without regard to fertilizer.

Now let's look at information about our class variable in the model, fertilizer. When you have one predictor variable in an ANOVA model, the breakdown of the variable in this table is the same as the model line in the overall ANOVA table. Because you have a balanced design, the information for Type 1 and Type 3 sums of squares is the same. All in all, the PROC GLM output supports your conclusion that there's not a statistically significant difference between the mean bulb weights for the four fertilizers.
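The arithmetic behind the ANOVA table described in the captions can be checked by hand. The following is only a minimal sketch in Python, not the SAS code from the video (which the captions do not show). It uses the rounded figures as they are spoken, and it assumes that four fertilizer groups give 4 - 1 = 3 model degrees of freedom; the variable names are made up for illustration.

```python
# Values as read aloud in the captions (already rounded when spoken).
ss_model = 0.00458   # model sum of squares
ss_total = 0.0264    # total sum of squares
ms_error = 0.00078   # mean square error (MSE)

# Assumption: four fertilizer groups, so model df = 4 - 1 = 3.
n_groups = 4
df_model = n_groups - 1

ms_model = ss_model / df_model    # MSM = model SS / model df
f_stat = ms_model / ms_error      # F = MSM / MSE
r_squared = ss_model / ss_total   # R-square = model SS / total SS

print(f"MSM      = {ms_model:.4f}")
print(f"F        = {f_stat:.2f}")
print(f"R-square = {r_squared:.4f}")
```

Because the inputs are rounded, the results only approximately reproduce the reported values: the MSM and F statistic match 0.0015 and 1.96, while the R-square comes out within about 0.0001 of the reported 0.1734.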
Info
Channel: Saurabh Singh
Views: 4,849
Rating: 5 out of 5
Id: rGui9HjW2u8
Length: 6min 37sec (397 seconds)
Published: Sat Jun 24 2017