Principal component analysis

Video Statistics and Information

Captions
This data includes 100 randomly listed individuals from two groups, Group A and Group B. For each individual we have measured nine properties, listed as the variables V1 to V9. We wish to use a principal component analysis to see if we can combine these variables to derive new components which will produce a simpler description of the system.

So we go to Stat > Multivariate > Principal Components. We select all the variables, and under Graphs we select to see the scree plot and the score plot for the first two components. Under Storage we will store the coefficients of the first four principal components, so we go to Storage and we will store these coefficients in four new columns, which will be C12, C13, C14 and C15. OK, and we now run the analysis.

We can look first at the scree plot, which shows the contribution from each of our new principal components, quantified by the eigenvalue. Quite simply, the most significant contribution comes from the first component, there is a significant contribution from the second component, and further components give very little additional information about the system. So we're concerned mainly with the first two principal components.

If we then look at the score plot, which is based on these first two components, we see that each subject in our data has been plotted on the basis of its first and second component values. We can see the value of this plot if we right-click on one of the symbols: we can then edit the symbols, choose Groups, and identify the categorical variable we want to relate to these groups, so we will use the grouping variable for A or B. Click OK, and now we can see that each data point has been identified according to the original group in the data that it came from, either Group A or Group B. We can see now that the majority of Group A are on the right-hand side of this plot and the majority of Group B on the left-hand side. So actually the first component of our data has been fairly effective in separating Group A data from Group B data, and only a few data values have been misclassified: there are only a few of Group A on the left-hand side of the zero line and only a few of Group B on the right-hand side of the line.

We can now look at the coefficients of the principal components that have been produced. We will look at the data in the worksheet, and we can see that columns C12, C13, C14 and C15 now contain the relevant coefficients. I can now give titles to these principal components, so I call the first one PC1, then PC2, PC3 and PC4, and for clarification I will add a column here where I will identify the relevant variables V1, V2, V3, etc. Now we can understand this table of coefficients: principal component 1 is a linear combination of the variables V1 to V9 with these relevant coefficients, so PC1 is equal to 0.046 times V1 plus 0.465 times V2 plus 0.24 times V3, etc.

Clearly some of these coefficients are much more significant, so the first principal component, PC1, is mainly a combination of V2, V5, V6 and V9, which agrees with the grouping that we saw in cluster analysis. Similarly, principal component 2 is a combination of other variables, principally V3, V4 and V7, again agreeing with our cluster analysis calculations.

We can choose to store the calculated value of PC1, for example, for each particular observation in a new column. So, for example, in C17 we will store the value of principal component 1, and we will calculate that value by using Calculator. We store the results in the new column, given the name VPC1. For this calculation we take the coefficient of V1, so we take the coefficient from the PC1 column, but we want the one in row 1 of our data, which is 0.046556, and we multiply that by the value of V1. We then add the coefficient of V2 for PC1, which is in the second row of the PC1 column, multiplied by V2. Similarly we do this for all the other coefficients, so the next coefficient, that of V3, is in row 3, multiplied by V3, and we continue until each of these coefficients multiplies the relevant variable. We can then just click OK, and it calculates the value of the first principal component, VPC1, for each subject within our data set.
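
The calculator step described above is a weighted sum: each subject's PC1 value is the sum of every coefficient multiplied by the matching variable. The following is a minimal Python sketch of that arithmetic, not the software's own procedure. The function name `pc_score` is hypothetical, only the three coefficients quoted in the video (0.046556, 0.465, 0.24) are used, and the observation values are invented placeholders; the real calculation continues in the same way over all nine variables.

```python
def pc_score(coefficients, observation):
    """Weighted sum of one observation's variables (one coefficient per variable)."""
    if len(coefficients) != len(observation):
        raise ValueError("need exactly one coefficient per variable")
    return sum(c * v for c, v in zip(coefficients, observation))


# Coefficients of V1, V2 and V3 for PC1, as read out in the video.
# The remaining six coefficients are not quoted there, so they are omitted here.
pc1_coefficients = [0.046556, 0.465, 0.24]

# Hypothetical V1..V3 values for two subjects (placeholders, not the video's data).
observations = [
    [1.0, 2.0, 3.0],
    [0.5, 1.5, 2.5],
]

# One score per subject, mirroring the new VPC1 column.
vpc1 = [pc_score(pc1_coefficients, row) for row in observations]
print(vpc1)
```

PCA software often calculates its own scores from centred or standardized variables, so a raw weighted sum like this one may not match a stored score column exactly.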
Info
Channel: Oxford Academic (Oxford University Press)
Views: 376,524
Rating: 4.7411346 out of 5
Keywords: yt:quality=high, video, oxford university, Oxford university press, education, publishing, scholarship, oxford, oup, oup academic, Oxford academic, Currell, statistics, data analysis, screencasts, research
Id: eJ08Gdl5LH0
Length: 7min 37sec (457 seconds)
Published: Fri Mar 06 2015