How To Analyze Data In RStudio? Six Bachelor Level Analysis Methods Quickly Demonstrated.

Video Statistics and Information

Captions
So in this video I will talk about how to perform six basic bachelor-level data analysis methods: descriptive statistics, correlation, t-test, chi-square, ANOVA, and linear regression. In one of my previous videos I talked about how to perform these analyses in SPSS, and today we're gonna do them in RStudio. SPSS is an expensive piece of software; R and RStudio are completely free, which is why many universities these days are switching from teaching SPSS to teaching R. So essentially we can do our data entry in Excel or, if you think Excel is too expensive, we can use Calc, which is the OpenOffice equivalent to Microsoft Excel. Calc is completely free and it can save data files in Excel format or CSV format, both of which can be imported into R and RStudio. So if we partner up Calc with R, we can perform data entry and statistical analysis completely for free. So let's get started with our analysis.

So first let's import some data. [Music] So this is an Excel-format data file that contains 102 participants, about team-building exercises. Let's first attach the data file, and then we're going to do some basic descriptive analysis. Let's say I'm interested in knowing the average age of the employees included in this sample. We can see the mean age is 22. What about the range? [Music] The youngest is 17, the oldest is 52. One can also calculate the variance and the standard deviation. The standard deviation is of course the square root of the variance, which in this case is 6.4. By using the command summary we can generate a statistical summary of this variable. For categorical variables such as gender, we can use the command table in order to look at the gender distribution, so in this case 66 females and 36 males. We could also get the percentages: all we need to do is to divide this portion of the command by the sample size, which is 102, as we can see over here, so about 65% female and 35% male. So there we go, these are the basic descriptive analyses that a bachelor student needs to be able to apply.
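The descriptive steps described above can be sketched roughly as follows. The captions do not show the actual commands or column names, so the data frame `employees` and its columns `age` and `gender` are hypothetical, and the values are simulated to mimic the sample size and gender split mentioned in the video. The sketch uses `$` instead of `attach()` so that it stays self-contained.

```r
# Simulated stand-in for the hotel data set: 102 employees, 66 female, 36 male.
# Ages are random, so the numbers will NOT match those quoted in the video.
# With a real file one might instead use something like
#   employees <- readxl::read_excel("team_building.xlsx")  # hypothetical file name
set.seed(1)
employees <- data.frame(
  age    = round(runif(102, min = 17, max = 52)),
  gender = factor(rep(c("female", "male"), times = c(66, 36)))
)

print(mean(employees$age))     # average age
print(range(employees$age))    # youngest and oldest
print(var(employees$age))      # variance
print(sd(employees$age))       # standard deviation = sqrt(variance)
print(summary(employees$age))  # min, quartiles, median, mean, max

# Categorical variable: counts per category, then proportions.
gender_counts <- table(employees$gender)
gender_shares <- gender_counts / nrow(employees)  # same as prop.table(gender_counts)
print(gender_counts)
print(round(gender_shares, 2))
```

Dividing the table by `nrow(employees)` mirrors the "divide by the sample size" step in the video; `prop.table()` gives the same proportions without hard-coding 102.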
Okay, now we can move on to correlation. Let's say I'm interested in knowing whether employees' age is related to the amount of experience that they have in the hospitality industry or not, because this is a data set from a hotel. For that we need to run a correlation. We can do a correlation test between age and experience, and we can see in this case the Pearson correlation coefficient is 0.7 t6, and we're looking for a p-value that is of course below 0.05, and 2.2e-16 (2.2 times 10 to the power of negative 16) is obviously much, much smaller than 0.05, meaning that this correlation is significant. We can also visualize the relationship between age and experience. [Music] So as shown in this scatterplot, the older the employee is, the more experience he or she has in the hotel industry. We can also look at the correlations among multiple variables by producing a correlation matrix. Let's say I'm interested in the correlations among variables 10 through 12, and we can ask for the correlations among these variables. We will then see that RStudio produces a correlation matrix among these variables, and that is how we do bivariate correlation analysis in RStudio.

So we can now move on to doing a t-test. Let's say I'm interested in knowing whether there is a statistical difference in terms of professional experience between male and female employees in this sample. For that we need to do a t-test. So first I would like to know what the average amount of experience is for all the employees included in this sample. [Music] Then I would like to take a look at the average amount of experience per gender. [Music] So this would give me the mean experience of male employees; let's also do female employees. I would also like to find out the range for both genders. I'd also like to take a look at the variance. [Music] Let us now do the t-test. We're gonna first set var.equal to TRUE, [Music] and then we're going to do it again with var.equal set to FALSE, and finally we also need to do a Levene's test to
see whether the variances are equal or not. So these are all the commands we need; we can run them. All right, so first we can see that the mean experience for all the people included in this sample is 4.5 or so years, the mean experience for only male employees is 3.9 years, and the mean experience for females is 4.75. We could also see that the most experienced male employee has 13 years of experience in the field and the most experienced female employee has 30 years of experience. Furthermore, we can see that the variance values are numerically different between the two genders: the experience of female employees is associated with a variance value of 27.5, whereas male employees have 8.9 or so. And now let's take a look at the t-test outcomes. We could see that the two t-tests yielded similar results: in both cases we see that there is no statistical difference between the two genders in terms of their experience. The t values in both cases are associated with a p-value much greater than 0.05. However, the outcomes of these two t-tests are slightly different, and that is because in one case we set var.equal to TRUE and in the other case we set it to FALSE. To see whether var.equal should be set to TRUE or FALSE we have performed the Levene's test, and in this case we can see that although the two variance values are numerically different, they are not statistically different. Therefore we can conclude that in this particular sample we should set var.equal to TRUE for the t-test, and that the first t-test which we performed is the one that we need in this case. And that is how you do a t-test in RStudio.

All right, so we are going to now do a chi-square test. We will perform a chi-square test in order to test the relationships among categorical variables. I have already imported and attached the data, and in this particular data file we have two variables, and these two variables are gender and sport. [Music] For gender we've got males and females: 70 males and 155
females. For sport, we have 149 people who prefer badminton and 76 people who prefer soccer. Let's do a crosstab on these two variables. We can see that among the females we have 126 females who indicated that they preferred badminton and 29 females who indicated they prefer soccer, and for males we have 47 males who indicated that they preferred soccer and 23 males who indicated that they preferred badminton. Let's see if these two variables are related. [Music] So we can see that the chi-squared value is 48.43 and it is associated with a significant p-value, 3.425e-12, which is below 0.05, and therefore we can conclude that there is a relationship between gender and sport according to the chi-square test. And that is how we do a chi-square in RStudio.

Let us now do an analysis of variance. I am interested in knowing whether supervisor responsiveness differs across nationalities. I have imported and attached the data file; let's take a look. I am interested in knowing the general level of supervisor responsiveness. I am also interested in the level of supervisor responsiveness across three different nationalities, so in this data file we've got Germans, Chinese, and Dutch. [Music] We also need to perform the actual analysis of variance. Should there be significant differences, we want to see what these differences are. Okay, now we can run it. [Music] We can see that the overall supervisor responsiveness level is 5.84; German supervisors are rated 6.08, Chinese supervisors are rated 5.87, and Dutch supervisors are rated 5.15. The outcome of the ANOVA indicates that there is a significant difference; the p-value is 0.6, and the multiple comparisons show the difference between Dutch and Chinese is significant, the difference between German and Chinese is not, and the difference between German and Dutch is. And this is analysis of variance in RStudio.

Moving on to regression: we're going to do a regression
analysis, trying to predict CD sales on the basis of advertisement spending and airplay time. I am using the same data set as I used in the SPSS video. I have already imported and attached the data. First let's try to visualize the relationship between advertisement spending and CD sales. [Music] Let's also visualize the relationship between airplay time and CD sales. Let's do two regressions: first we're going to predict CD sales on the basis of advertisement, and in the second regression we're going to predict CD sales on the basis of advertisement spending and airplay, so using two predictors to predict the dependent variable. [Music] We will then ask for a summary of the outcomes of both regressions. Okay, now we can run them. The scatterplot shows that there appears to be a pretty noticeable positive relationship between advertisement spending and CD sales, and the same goes for airplay and sales. Now we run the regressions. In this first regression analysis we're trying to predict CD sales using advertisement spending. We can see that the regression model overall is significant; furthermore, we can also see that advertisement spending is a positive and significant predictor of CD sales. In the second regression analysis we can see that the model overall is also significant and that both predictors, advertisement spending and airplay time, are significant predictors of CD sales. Furthermore, we notice that airplay time is associated with a much steeper slope, so 3.59 versus 0.09, but both predictors are significant. So this is a regression analysis in RStudio.

All right, thanks for watching this Ranywayz Random video. Please like and subscribe, and I'll see you next time.
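The five tests walked through in the captions (correlation, t-test with a variance check, chi-square, ANOVA, and regression) might look roughly like the sketch below. The video's actual commands are not visible in the captions, so this is an illustration under assumptions: all data frame and column names are hypothetical, and all data are simulated except the chi-square table, which uses the four counts quoted in the video. Levene's test is not part of base R; `car::leveneTest` is one commonly used implementation and is only called if that package is installed. `TukeyHSD` is one possible choice for the "multiple comparisons" step, not necessarily the one used in the video.

```r
# Hypothetical, simulated data; results will not match the video except
# for the chi-square test, which uses the counts quoted in the captions.
set.seed(42)
n <- 102
staff <- data.frame(
  age    = round(runif(n, min = 17, max = 52)),
  gender = factor(rep(c("female", "male"), times = c(66, 36)))
)
staff$experience <- pmax(0, round(0.4 * (staff$age - 17) + rnorm(n, sd = 2)))

# 1. Correlation: Pearson test plus a scatterplot.
cor_result <- cor.test(staff$age, staff$experience)
print(cor_result)
plot(staff$age, staff$experience, xlab = "Age", ylab = "Experience (years)")

# 2. t-test: pooled-variance version, then Welch's version.
t_pooled <- t.test(experience ~ gender, data = staff, var.equal = TRUE)
t_welch  <- t.test(experience ~ gender, data = staff, var.equal = FALSE)
print(t_pooled)
print(t_welch)
# Levene's test for equal variances (needs the 'car' package).
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(experience ~ gender, data = staff))
}

# 3. Chi-square: cross-tabulated counts as quoted in the captions.
sport_by_gender <- matrix(
  c(126, 29,
     23, 47),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("female", "male"),
                  sport  = c("badminton", "soccer"))
)
# For 2x2 tables chisq.test applies Yates' continuity correction by default.
chi_result <- chisq.test(sport_by_gender)
print(chi_result)  # X-squared is about 48.43, as in the video

# 4. One-way ANOVA with pairwise comparisons (group means taken from the
#    captions, spread of 1 is made up).
ratings <- data.frame(
  nationality    = factor(rep(c("German", "Chinese", "Dutch"), each = 30)),
  responsiveness = c(rnorm(30, 6.08, 1), rnorm(30, 5.87, 1), rnorm(30, 5.15, 1))
)
print(tapply(ratings$responsiveness, ratings$nationality, mean))
anova_fit <- aov(responsiveness ~ nationality, data = ratings)
print(summary(anova_fit))
print(TukeyHSD(anova_fit))

# 5. Linear regression: one predictor, then two predictors.
albums <- data.frame(
  adverts = runif(200, min = 0, max = 2000),
  airplay = round(runif(200, min = 0, max = 60))
)
albums$sales <- 40 + 0.09 * albums$adverts + 3.6 * albums$airplay +
  rnorm(200, sd = 50)
fit_one <- lm(sales ~ adverts, data = albums)
fit_two <- lm(sales ~ adverts + airplay, data = albums)
print(summary(fit_one))
print(summary(fit_two))
```

Running the pooled and Welch t-tests side by side follows the order used in the video: both are computed first, and the variance check then decides which result to report.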
Info
Channel: Ranywayz Random
Views: 7,484
Rating: 4.9398499 out of 5
Keywords: RStudio, R studio, R project, data analysis, descriptive statistics, correlation, T test, Chi square, ANOVA, regression analysis, linear regression, ranywayz, ranywayz random, ranywayzrandom, r programming, r statistics
Id: _I-1EJ58rmk
Length: 14min 49sec (889 seconds)
Published: Sat Jun 30 2018