Principal component analysis in R | PCA for genetic diversity assessment using varimax rotation

Captions
So hello everyone, welcome back to yet another episode on my YouTube channel. Today I am going to talk about principal component analysis, which in short is also known as PCA. If you are wondering how to perform principal component analysis in order to study genetic diversity, then this video is for you. In this video I will focus mainly on the syntax or coding part which we need to know in order to get different things such as the scree plot and biplot, and I will also talk about the rotated component matrix, which we get by giving the rotation. So let me clear it first: here I won't get into the technical part or the math involved in calculating principal components. Rather than that, I would like to focus on why we need to do PCA, how we are supposed to do it in R, and where we need to take our decisions.

So if you ask me why, the first and foremost thing which comes to mind is dimension reduction. Here the word dimension refers to a variable. If you don't know how, let me explain that with an example of a pea data set. This is the data of pea. You can see I have intentionally taken only seven genotypes in order to keep the graphs clean. I have also considered some categorical variables, which will be converted into factors in R, and some numerical variables, which can be quantified with the help of numbers. So here I will consider yield per plant, and let me show you how we can interpret this variable with the help of a graph.

When we have only one variable, we can use a histogram to represent the distribution of the data. Here you can see P725 and EC59820 are the two genotypes for which yield was between six and eight grams per plant, and the rest of the bins have only one genotype each. We can narrow the bin sizes, but that is not our interest here. If we have two variables, suppose plant height and yield, we can use a scatter plot with yield on the x-axis and height on the y-axis. Here you
can see B-22 on the scatter plot. From the scatter plot we can guess that the height of B-22 is somewhere around 65 centimeters and the yield is around 4.5 grams per plant. In the next case, when we have three variables, we can interpret our results on a display screen with the help of computing power from the processor. So here in the 3D plot, if your eyes won't lock on the z-axis, then you understand how it feels; and if you later try to find the values of the variables, your brain starts to overclock. So 3D plots are not the best option when documenting; especially on paper, it is not possible.

Just imagine we have more than three variables: what do we do? Practically, we won't have more than three dimensions, and don't think about the theory; it is there only in books. So what we can do is address this problem logically: we can add some additional identifiers to identify the different variables. Additional identifiers work well with categorical variables. Here I will consider flower color, which is identified by color. You can see that the B-22 genotype, with a height of around 65 centimeters and a yield of 4.5 grams per plant, also has purple flowers. We can consider one more variable, so here I will use shape as an additional identifier for the seed surface. You can see that B-22, with 65 centimeters height, 4.5 grams yield per plant and purple flowers, also has a smooth seed surface.

Up to here it's okay. When we try to add a numerical variable, that is the time when the graph gets tough to interpret. Here you can see I have added an additional variable, primary branches, with the help of an additional identifier, the size. So how can you get the value of primary branches just by looking at the size? Here it is okay because we have few genotypes, but this logic won't help when we are
having a larger number of variables or a larger number of individuals. That is where principal component analysis comes in.

So how is it going to reduce the number of variables? Consider a data set with n variables. Imagine it as an Excel sheet, where the different colors represent the different columns in which our variables are stored. Principal component analysis creates a new data set with n principal components, in such a way that the first principal component has more information from all the n variables than the second principal component. Here the first principal component has 29 percent of the information and the second has 20. That is why, when we draw a biplot with the first two principal components, we address 49 percent of the variability in this case. This trend continues across all the principal components, so that the last principal component has the least information. This is what the scree plot represents, and the scree plot is helpful in deciding how many principal components to select for future predictions. If you are interested in the theoretical part, check out the links given in the description.

With that, I will get into the practical part. The first thing we need to know about is the data and its structure; in simple terms, how we need to arrange our data in an Excel sheet. I have stored my data set in a folder known as cotton pca on the desktop. This is how the data set looks. You can see I have considered 139 different genotypes along with 18 different variables. I have not considered replication, so every row is a unique genotype, and the total number of rows equals the total number of genotypes. In case you have replications, use the mean value by averaging over the replications. At last, we need to save the data set before
importing it into R. With that, let's get into RStudio and start coding.

Here I would like to give some tips for beginners. If you are a beginner and don't know anything about R and PCA, don't skip through the video, because the chances of getting into trouble are higher than the chances of getting something. R is case sensitive, so don't mess with the cases: capital O is not equal to small o. If the text is not visible, please consider increasing the resolution of the video. Make sure that you put a comma after each and every argument. After completing a line of code, we need to execute that line by pressing Ctrl+Enter or Command+Enter.

Keeping all those things in mind, let's import the data set. Click on Import Dataset and select From Excel. If you are using RStudio for the first time you may not get this interface, so give all the permissions it asks for and then you will get it. From here we need to click on Browse and select the data set. I have stored my data set on the desktop in a folder known as cotton pca, so within that, select the data file. Then ensure that all the variables are numbers, which means doubles. In case you want to change the name of the data frame, you can change it from here. By default the first sheet is selected; if you have more than one sheet, you can change that from here. And this is the code which runs behind this graphical user interface in order to import the data set. So click on Import.

Once the data gets imported into RStudio, you can view it from here, and if you want you can also cross-verify from the script. If you type the name of the data set, that is cotton, and execute, you will get a brief look at the data set in the console. If you want to see the entire data set, use the View function; here View starts with a capital V. If we mention our data set name, cotton, within that, we will see our data set in a separate tab when we execute that line, here in RStudio
itself. Next, the most important step in our data analysis is scaling the data. We need to scale our data because one of the stepping stones in principal component analysis is the variance-covariance matrix. As we all know, covariance is sensitive to scale, so we need to scale the data in order to avoid a single variable dominating the other variables. Here I will store my scaled cotton data set in an object known as scd, which is just short for scaled cotton data. I am going to use a function known as scale. Within that we need to mention our data set name, that is cotton, and then select the rows and columns we need to scale. We need to select only the numerical variables, because we won't do PCA on categorical variables or other data. In my data set, as you can see, the genotype and name don't have any mathematical importance in the PCA, so we need to take out the first two columns. In order to do that, mention a comma within square brackets. Whatever we write before the comma refers to rows; since we need all rows, I won't write anything there. To take out the first two columns, I will write -1:-2 after the comma; we need to take out columns 1 to 2, which is why I am mentioning minus. Then, whether you mention center = TRUE or not, by default it will consider center as TRUE, so the optional argument center = TRUE doesn't have any practical effect here; I am mentioning it for reference purposes. When we execute this line, the scaled data will be stored in scd, so execute scd to get a look at the scaled cotton data set. This is what our scaled data looks like. From here onwards, whenever I mention the data set, it will be the scaled cotton data only.

In the next step I will adjust the options in my favor. Here we need to increase the maximum number of lines which can be printed in the console. In order to change that, use the options function; within that, mention max.print as an argument. Then we
need to select the number of lines which should be printed; here I will select 10,000 lines. When we execute this line, the option will be changed to 10,000 lines printed in the console. In the next option I will turn off the scientific display. In order to do that, we need to mention scipen = 100. This is the code to turn off the scientific display, so that we can avoid logs and exponential terms being printed in the console. The last option is setting the working directory. From Session, Set Working Directory, click on Choose Directory. From the desktop, select the folder where our data set has been stored, that is cotton pca. If we click on Open, the working directory will be set. Then we can use the get working directory function, that is getwd; don't mention anything within the parentheses. If we execute this line, we can confirm our working directory.

After setting the options right, we need to install some visualization packages. The first and foremost package we need for visualization is factoextra. Use the install.packages command, and within the parentheses mention factoextra in double inverted commas. When we execute this line, the package starts to install, as you can see from the red indicator here in the console, so wait until it stops. Alternatively, we can also install a new package from CRAN: click on Install, type the package name, that is factoextra, select the package, and when we click on Install the package will start to install. But I will cancel here because I am already installing it with the help of the command. After the red indicator stops, you can see in the console that the package has been successfully installed. Then we need to load the package, so use the library function and within the parentheses mention the package name, that is factoextra. When we execute this line, the package will be loaded into the environment.

The next package is useful for visualizing the principal components in three
dimensions, so install this package only if you are interested in that. Within the install.packages command, mention pca3d in double inverted commas, and when we execute this line the package will start to install. Since I already have this package, I will skip this line and load the pca3d package directly into the environment using the library function. This package only worked fine for me on the Windows operating system; I tried using it on a Mac, but it had some compatibility issues.

With that, let's get into PCA using the built-in functions. If you wonder why there are two different functions built into R, in the stats package, it is because the way in which they work is different. Among the two functions, let's first consider the less popular one, based on spectral decomposition. Here I will store my output in an object known as pca1. The function which works on spectral decomposition is princomp from the stats package, so select this function. Within the princomp function we need to mention our scaled cotton data set, that is scd. When we execute this line, the output will be in pca1, and when we execute pca1 we will get the standard deviations. When we square a standard deviation we get the variance; here the variance refers to the eigenvalue. In order to get a biplot, we need to use the biplot function; within that we just need to pass the object where our PCA has been stored, here pca1. If we execute this line, we will get the plot in the Plots pane. From the object where the PCA has been stored we can access its parts. In order to select, we need to use the dollar symbol, so the dollar is the symbol of selection. From pca1 I will select the scores, and when we execute this line we will get the scores here in the console.

I won't talk much about this function. I will get into the second one, which is the most popular and is based on singular value decomposition. Here I would like to store the results of the PCA in an object known as pca2, and the function which
works on singular value decomposition is prcomp from the stats package. Within the parentheses we need to mention our data set, that is the scaled cotton data. When we execute this line, we will get the output in pca2, and when we execute pca2 we will find these results in the console. Here we get the variable scores and also the standard deviations. Here also it is the same: if we square the standard deviations we get our eigenvalues. To get the scores, which means the individual scores and not the variable scores which are already printed in the console by executing pca2, we need to use the dollar symbol on pca2, the object where our PCA has been stored. This drop-down menu appears, and if we select x from it we will get the individual scores. When we execute this line, the scores will appear in the console and you can scroll through them. In order to get the proportion of variance and the cumulative proportion of variance explained by the principal components, we need to use the summary function; within that we just need to mention the object where our results have been stored, here pca2. If you execute this line, the results will be printed in the console. In order to get the biplot, use the same biplot function, and within that just mention the object, here pca2.

If you observe, the biplots from the older function princomp and from this function prcomp are not the same. Why? Because if we compare the scores stored in the two PCA result objects, pca1 and pca2, they will not be the same: even if the values are the same, the vectors may not be, as shown by the plus or minus sign. So don't intermix the results from the two functions. Use either the princomp function or the prcomp function; don't use a biplot from the prcomp function and scores from the princomp function.

With that, let's move to the visualization part of our PCA. Here the fviz_pca functions from the factoextra package come into use. So here let's see
how we can use the factoextra functions for our visualization. The first function here is fviz_pca. This is the basic function to get a biplot. Within this function we need to mention the object where the PCA results have been stored, here pca2. If we execute this line, we will get the biplot directly. Since these functions produce high-quality graphs, they take some time to produce the plot, so wait until the plot appears. From here you can zoom in on the plot, and if you want you can take a screenshot of the plot from here itself.

As you can see, there is a lot of overlapping of labels. To address this issue, we need to use another function, fviz_pca_biplot, which provides us with a lot of customization options. Within that we just need to mention the object where our results have been stored, that is pca2. When we execute this line without any additional argument, we will get the same biplot. If we pass an additional argument, repel = TRUE, and execute this line, we will get a biplot in which the labels are moved aside, each with a line attached to its point. Here we get some warning messages, and those warnings hold when we are seeing the plot within RStudio without expanding it. When we zoom in, some of the labels which were missed appear. You can see and confirm with your own eyes that most of the labels are there. Though there is a warning, some of the labels might still be missing; I will address later how to get the plot with all labels.

The next function which I am going to talk about is fviz_pca_var; here the variables are mentioned in short as var. Within that we need to mention the object where the results have been stored, here pca2. If we execute this line we will get the plot
indicating the direction of the variables in the biplot. Ignore all those warnings. If you click on Zoom, you will get the full plot. As I said earlier, we can also use repulsion here, so pass the additional argument repel = TRUE and we will get a plot like this. To get all the information about the individuals, we can use fviz_pca_ind; here ind is a short form for individuals. Within that, mention the object, that is pca2, and execute this line to get the individual plot, where the individuals are represented on principal component 1 versus principal component 2. Here dimension 1 and dimension 2 refer to principal component 1 and principal component 2 respectively. To get a scree plot, use the fviz_screeplot function; within that, mention the object where the PCA has been stored, here pca2. When we execute this line we will get a scree plot. I will talk about the customization of this plot later.

The main reason, or the only reason, why I use the prcomp function from the stats package is to show you the 3D visualization of principal components. Here the pca3d package comes to our help, and the function which I am going to use is pca3d from the pca3d package. Within that we just need to mention an object of class princomp or prcomp, that is pca2. When we execute this line, the 3D plot will appear in a new tab. Here you can see the 3D plot containing three principal components, that is PC 1, 2 and 3. For me this was only possible on the Windows operating system; I tried it on a Mac but it was not possible. To get help or know more about the pca3d function, type a question mark and after that mention the package name or the function, that is pca3d. There you will get more information on the arguments we can use to customize the 3D plots. So let me show you the most important customization, that is adding labels. Here
everything remains the same within the pca3d function, and the only argument we need to pass after the object pca2 is show.labels = TRUE. When we execute this line, you can see a 3D plot with labels. I won't recommend this if you have a large number of individuals or genotypes, as the plot gets cluttered; use it only when you have few genotypes. Look at all three dimensions, take a screenshot from an angle where you can see the most individuals, and you can use that in your thesis. We can also make it look a little fancier: if you pass another argument, fancy, which is also set to TRUE, then you will get a 3D plot like this. Only if you are interested in 3D plots do you need this procedure; otherwise there is no need to run these lines of code or the pca3d package, and you can jump ahead directly after loading the factoextra package.

Here I am going to use another package which is very much fine-tuned for our data analysis, and that is FactoMineR. Here Facto starts with a capital F, Mine starts with a capital M, and the R is capital, so while installing the package don't get confused with the cases. Use the install.packages command, and within the parentheses mention FactoMineR in double inverted commas, as I just said. There is no need for me to install this package because I have already installed it, so I will directly load it using the library command. Within the library function we need to mention FactoMineR, and when we execute this line the package gets loaded into our environment. I am going to use this package because it also works on singular value decomposition.

Here I am going to store my PCA output in an object known as fpca; for me it stands for full PCA, since I am going to use all 18 components. We need to use a function known as PCA; all the letters should be in upper case. Within that we need to mention our data set, that is the scaled cotton data set, scd. Then by default it only
considers five principal components. In order to change it, we need to pass an additional argument known as ncp, the number of components. Here I will set it to 18 because I have 18 different variables. When we execute, the output will be in fpca, and when we execute fpca we will see all the results stored in fpca. To access a particular part of the result, we need to type these names after our object fpca; if we type and execute them, we will get the results stored there.

Let me show you how to get the eigenvalues stored in fpca. Type fpca, then type the dollar symbol, and from here we can select eig, the eigenvalue table. If we execute, we will get the table containing eigenvalues, percentage of variance and cumulative percentage of variance. This table is very important for us; I will show later how to export this table directly into an Excel sheet. In the same way we can access the other results. To get the scores, select the object, that is fpca, then use the dollar symbol; from here we need to select ind, because we need individual scores, then the dollar symbol again, then select coord, the coordinates. If we execute, we will get the scores, the raw principal component scores, and you can scroll down in the console to see all of them.

After that, let's look into the things which are not so important yet useful in our data analysis. The first thing in this part is the squared cosine. It shows the importance of a principal component for a given observation. To find it, select the object, that is fpca, use the dollar symbol, select ind, then use the dollar symbol again and select cos2. When we execute this line, we will get our squared cosines. The next important part is the contribution of variables. From the object where our PCA has been stored, that is fpca, select var using the dollar symbol. So after selecting the variables,
then select the contribution, which in short is contrib. When we execute this line, we will get the contributions of the different variables.

With that, let's move on to the visualization part. The main reason why I selected FactoMineR is that the fviz functions of factoextra are more compatible with the FactoMineR package. To get a basic biplot we can use the fviz_pca function; within that we just need to mention the object, here fpca. When we execute, we will get a biplot, but this is not customizable. To get a customizable biplot, we need to use another function known as fviz_pca_biplot. Within the parentheses, mention the object where the PCA has been stored, here fpca. The first customization option which I am going to use is repulsion, so mention repel = TRUE. If we execute this line, we will get the biplot with the labels moved aside. Ignore the warnings, they will be there, and wait until the plot appears. When we click on Zoom, the missing labels appear in the zoomed version. Here you can see most of the labels are there, but I think some of the labels might still be missing. We can get these labels when we export the graph with adequate resolution, which I will show you shortly.

Let's talk about further customization. If you have time to interpret the quality of representation of the individuals, you can also use the squared cosine. Here I will use color as an additional identifier, so we need to pass the additional argument col.ind, which refers to the color of individuals, and mention cos2 within double inverted commas. When we execute this line, we will get a biplot in which we can also judge the quality of representation by color. Since the individuals and variables now have the same color, that is blue, I would like to change the color of the variables to red. In this same line, pass the additional argument col.var, which refers to the color of variables. I will set it to red, so mention red in double
inverted commas. When we execute this line, this beautiful biplot appears. This is the biplot which I was looking to export. In order to export, close this window and click on Export, then Save as Image. From this window we need to select the resolution; here the resolution is in pixels, so I will keep the width at 1920 pixels. If you have checked Maintain aspect ratio, the height also changes automatically. If you want to change the name of the plot, you can change it from here. If you click here, we will get a preview. When I click Save, the plot will be saved in the working directory, and we get the preview here itself. You can see that all the labels which were missing in RStudio start to appear in the exported image. As I said earlier, this plot is exported to the working directory, so if we go to the working directory we will find our plot.

With that, let's move back to RStudio and continue working on the code. If you want to know more about customization, you can ask for help in RStudio itself: after a question mark, mention the fviz_pca function, and if you execute this line you will find the help here.

With that, let's move towards the individual scatter plot. The first function which I am going to use is fviz_pca_ind. An underscore is there between the words in each and every function, so I won't keep saying underscore every time. Within that function we need to mention the object where the PCA has been stored, that is fpca. If we execute this line, we will get the basic scatter plot. Here also we can use repulsion, so I will set repel = T. There is no need to write TRUE in full; if we mention an uppercase T, R will read it as TRUE. When we execute this line, we will get the scatter plot with repelled labels. Here also we can represent the quality, so pass the additional argument for color of individuals and set it to cos2, so mention cos2 within
double inverted commas. Then we will get the scatter plot showing the quality of representation of the individuals.

With that, let's see what we can get from the variables. If we use the fviz_pca_var function, we will get the plot indicating the direction of the variables in the first two principal components, or first two dimensions. Here also we can keep repulsion on: within the fviz_pca_var function, mention the first argument, that is fpca, where the PCA has been stored, then pass the additional argument repel = TRUE. When we execute this line, we will get the labels with repulsion. The most common practice with this variable plot is to represent the variables by their contribution instead of their quality of representation. To get that kind of plot, pass the additional argument for color of variables and set it to contribution; here contrib is the short form of contribution, so mention that in double inverted commas. When we execute, we will get this kind of variable plot.

The next kind of plot helps us a lot in determining the number of principal components, that is the scree plot. The function we need to use is fviz_screeplot; within that we need to mention the object where the PCA has been stored, that is fpca. If we execute this line, by default we will get only 10 components. Why? Because most often we will select fewer than 10 components. To get all 18 components, we need to pass the additional argument ncp; here ncp represents the number of components. Since I have 18 variables, I will mention 18.
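The plotting calls described in this part can be sketched roughly as follows. This is only an illustration under stated assumptions: the cotton data set is not available here, so R's built-in mtcars data (11 numeric variables) stands in for it, graph = FALSE is added only to suppress FactoMineR's automatic plots, and the names p_ind, p_var, p_scree_default and p_scree_all are hypothetical.

```r
# Sketch of the factoextra calls described above, on stand-in data.
# Assumption: mtcars replaces the cotton data set, which is not available here.
library(FactoMineR)
library(factoextra)

scd  <- scale(mtcars)                              # scaled stand-in data
fpca <- PCA(scd, ncp = ncol(scd), graph = FALSE)   # keep all components

# Individuals, coloured by quality of representation (squared cosine)
p_ind <- fviz_pca_ind(fpca, repel = TRUE, col.ind = "cos2")

# Variables, coloured by their contribution to the plotted dimensions
p_var <- fviz_pca_var(fpca, repel = TRUE, col.var = "contrib")

# Scree plot: at most 10 components by default; ncp asks for more
p_scree_default <- fviz_screeplot(fpca)
p_scree_all     <- fviz_screeplot(fpca, ncp = ncol(scd))

print(p_ind)
print(p_var)
print(p_scree_all)
```

With the 18-variable cotton data from the video, ncp would be 18 in both the PCA call and the scree plot call.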
So when we execute this line, we will get a scree plot with 18 different principal components. Most often we won't keep the bars in the scree plot. To take out those bars, we need to pass an additional argument, geom, and set it to line, so mention line within double inverted commas. When we execute this line, we will get a line scree plot. If you want a bar scree plot instead of a line, mention bar within geom, and if we execute this line we will get a bar scree plot. If you care a lot about aesthetics, you can also change the color of the bars: pass the additional argument barfill and set it to red, so mention red within double inverted commas. When we execute this line, we will get a red scree plot. Instead of using the proportion of variance on the y-axis, we can also use eigenvalues. To do that, use the additional argument choice and set it to eigenvalue. We can pass the other customization arguments here as well. This is all I wanted to say about the scree plot.

The next important part is selecting how many principal components to use for future prediction. This is a highly subjective thing, and there are different ways of selecting the number of principal components. You can do it either by looking at the scree plot or by looking at the numbers in the eigenvalue table, which also contains the percentage and cumulative percentage of variation explained. I will expand the console to get a better view; this is the table. In case you have already made up your mind that, no matter what percentage of variation the first three principal components cover, you will use only the first three, then that is also fine; here the first three principal components explain only 39 percent of the variation. You can also target the total cumulative percentage of variation explained by the principal components: if you want 50 percent you can select five principal components, and in case
you want 90 percent, you can select 12 principal components. While interpreting our results, we can say that components with an eigenvalue of more than 0.5 contain a good amount of information about the variability, but we can't use 13 different principal components. What we can do is consider those principal components with an eigenvalue of more than one, here seven principal components. This is the default method we find in SPSS, and here I will consider seven principal components for my future prediction.

The table you are seeing in the console is very important for the interpretation of results in the thesis, so let me tell you how to export this table directly into an Excel sheet. Click here to collapse the console. I will store the table separately in an object known as table1. From the object where the PCA has been stored, fpca, select the eigenvalue table using the dollar symbol. When we execute this line, the table will be in table1.
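The selection rules described above (eigenvalue greater than one, or a target cumulative percentage) can be checked with a few lines of base R. This is a minimal sketch, not the code from the video: the built-in mtcars data stands in for the cotton data, the table is rebuilt from prcomp rather than taken from FactoMineR, and eig_table, n_eigen_gt1 and n_for_target are hypothetical names.

```r
# Sketch: rebuild an eigenvalue table and apply two selection rules.
# Assumption: mtcars replaces the cotton data set, which is not available here.
scd <- scale(mtcars)

# For PCA on scaled data, eigenvalues are the squared standard deviations.
pca        <- prcomp(scd)
eigenvalue <- pca$sdev^2

# Same three columns as the eigenvalue table shown in the video.
eig_table <- data.frame(
  eigenvalue     = eigenvalue,
  percent_var    = 100 * eigenvalue / sum(eigenvalue),
  cumulative_var = 100 * cumsum(eigenvalue) / sum(eigenvalue),
  row.names      = paste0("comp ", seq_along(eigenvalue))
)

# Rule 1: keep the components whose eigenvalue is greater than one.
n_eigen_gt1 <- sum(eig_table$eigenvalue > 1)

# Rule 2: keep the fewest components that reach a target cumulative percentage.
n_for_target <- function(target) {
  which(eig_table$cumulative_var >= target)[1]
}

print(round(eig_table, 2))
print(n_eigen_gt1)
print(n_for_target(90))
```

On the video's data the same logic would be applied to the three columns of the table stored in table1; the counts quoted in the video (seven components by the first rule, 12 components for 90 percent) come from the cotton data and will differ for the stand-in data.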
Before exporting, we need to check the class of table1, so within the class function mention table1. If we execute this line, we find the class: as you can see, this is a matrix array. So before exporting it into the Excel sheet, we need to convert the matrix into a data frame. Select table1 and use the as.data.frame function, and within that mention table1. If we execute this line, table1 is converted into a data frame. Check the class of the table once again, and here you can see it is a data frame. Now it is ready to export into the Excel sheet.

To export the data set we need an additional package known as writexl. If you don't have it, use the install.packages command and within that mention writexl in double inverted commas. If we execute this line, the writexl package gets installed in RStudio. Since I already have this package, I will load it directly using the library command, so within the library function mention writexl. If we execute this line, the package is loaded. To export the data set we use the write_xlsx function. Within that we mention the table to be exported, here table1. After that we mention the name of the file within double inverted commas, so here I will mention eigentable.xlsx; the .xlsx refers to the Excel file format. Once we execute this line, the table is stored in our working directory. If you go back to the working directory, you can find the table here, eigentable. If you click on it, we find the table exported to our working directory. This is how we can export a table, and you can export any table you want by following the same method.

In another principal component analysis video I saw a question: what is the most technical way of judging the number of principal components? To address this question, we first need to convert the eigenvalue table into a data frame. Since I have
already converted the table into a data frame, here I will use the plot function to plot the cumulative percentage. Within the plot function, select table1 and use the dollar symbol to select the column, here the cumulative percentage of variance. If we execute this line, we get this plot, and here we need to look at the point from which we get a straight line. Here we get the straight line, and this is what the elbow method denotes. So based on this method I would need to consider 15 principal components. I won't use this method here; I usually follow the SPSS default method, and I consider seven principal components for the rotation. That is all I wanted to say about principal components. If you have any doubt regarding principal components, post it in the comment section below.

Now I am going to the next concept, that is, rotated components. For rotation, the package which comes closest to SPSS is psych, so we need to install the package psych. To install it, use the install.packages command and within that mention psych in double inverted commas. Since I already have this package installed, I will load it directly into the environment using the library command: within the library function mention psych, and execute this line to get the package loaded in RStudio. Don't worry about using a different function from a new package, psych, in order to give rotation. Why? Because if we give rotation, we abandon the principal components. That is the reason why I am using a new package, psych, for rotation. If you are not convinced and want to know more about rotation, check out the article given in the description below.

Here I would like to store my results in an object known as rpca, for rotated principal components. The function I am going to use is principal, with everything in lowercase. Within that we need to specify the data set, that is, the scaled coded data,
scd. After that we need to specify the number of components to be rotated, here 7. After that we need to specify the type of rotation, and the most famous and most often used rotation is varimax, so mention "varimax" within double inverted commas. We also need to give an additional argument, scores, and keep it as TRUE because we need the scores for our interpretation. When we execute this line, the output is stored in rpca, and when we execute rpca we get our results. So this is what our results look like.

Don't get confused by the terminology in the results. You could also call a rotated component a principal component, but that is not appropriate. In older theses we find principal component instead of rotated component, and we have inherited this from earlier theses in which someone used software that reports the result as principal components instead of rotated components. I won't get into that discussion here; if you want to know more, check the article given in the description below, which tells more about rotated components.

Here you can find the results along with the rotated component matrix, that is, RC1 to RC7, and here h2 refers to the communalities. We can also access a particular section of the result. To do that, from the object where our rotated components have been stored, that is rpca, use the dollar symbol, and from here we can select a particular item. I will select the communality here, and when we execute this line we find the communalities. In the same way we can access the loadings of the rotated components: from rpca select loadings. When we execute this line, we get our rotated component matrix with a cutoff applied. This happens to make it easier to find the association of a variable with a rotated component. Here you can find that the first variable is strongly associated with the sixth rotated component, and the
second variable is also strongly associated with the sixth rotated component. We can also stop this kind of output while printing. To do that, within the print function mention the object where the loadings have been stored, that is rpca, select loadings from it, and pass the additional arguments. Here I will first round the figures to three digits after the decimal point, and then pass an additional argument saying that cutoff should be equal to zero. When we execute this line, we find a normal rotated component matrix in the console.

This is a little tough to get into an Excel sheet, so we need to follow an indirect course. First we expand the console. Then I take a screenshot of the figure containing the rotated component matrix and save it in the working directory. After saving it, I go to Google Chrome and search for "image to Excel converter". Then we find this website; from here we select extracttable.com. When we enter this website, this kind of interface appears. Click anywhere here and this window opens; from this window we select the location where our screenshot has been stored. If we click on Open, it asks permission to upload the image, so click on OK. Then the image is converted into an Excel table automatically, as you can see here. Select the format in which you would like to download the table; I will select Excel and click on Download. Once the table has been downloaded, you can enable editing and change the order or sequence of the rotated components. If you want, you can plot the rotated component matrix there, but I will do that in RStudio.

So I will collapse the console. To get a rotated component matrix bar plot, we use the barplot function from the graphics package. If we just mention our loadings stored in rpca within it, we get this kind of bar plot, which is completely basic, and that too
stacked. To get a bar plot without the stacking, we pass an additional argument, beside, and keep it as TRUE. When we execute this line, we get this kind of bar plot. It is very easy to change the color of the bars: to do that, we pass another argument for color, where col (c-o-l) is the short form, and set it to the desired color. Here I will set it to blue, so mention "blue" within double inverted commas. I would also like to add the main heading. To do that, use another argument, main, and for main mention the heading within double inverted commas, here "Rotated component matrix". When we execute this line, we get this kind of bar plot. For reference purposes I will bring the main argument to the next line, so hit Enter after the comma.

To get the fancy plot that we see in a thesis, in which the bars have different colors that refer to the different variables, we need an additional color palette. One package that can handle 18 different colors in a palette is pals. If you don't have this package, install it using the install.packages command, so within the parentheses mention pals in double inverted commas. Since I already have this package, I will load it directly using the library function. You can just copy this line and paste it over here. Here we need to change the color, so take off the blue and use a function from the pals package. I will use brewer.accent as the first function; within that we mention the number of colors, here 18 because I have 18 different variables. When we execute this line, we get this kind of bar plot. You can see there is a little overlap in the colors, but that is acceptable. We can also try different shades of a particular color.
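As a rough sketch of the grouped bar plot built up so far, the example below runs without the pea data. It rests on assumptions: the loadings come from base R's varimax() applied to the built-in mtcars data rather than from the psych result stored in rpca, and it falls back to hcl.colors() when pals is not installed. The names rot_load, bar_cols, and mids are hypothetical.

```r
# Stand-in rotated loadings (assumption): the video takes them from the psych
# result in rpca; here base R's varimax() is applied to the first three
# principal components of the built-in mtcars data.
pc <- prcomp(mtcars, scale. = TRUE)
k <- 3
raw_load <- pc$rotation[, 1:k] %*% diag(pc$sdev[1:k])
rot_load <- unclass(varimax(raw_load)$loadings)
colnames(rot_load) <- paste0("RC", 1:k)

# One color per variable (row). Use pals if it is installed, as in the video;
# otherwise fall back to a base R palette.
n_var <- nrow(rot_load)
bar_cols <- if (requireNamespace("pals", quietly = TRUE)) {
  pals::brewer.accent(n_var)
} else {
  hcl.colors(n_var)
}

# beside = TRUE draws the bars side by side instead of stacking them:
# each group is one rotated component, each bar within it one variable.
mids <- barplot(rot_load,
                beside = TRUE,
                col = bar_cols,
                main = "Rotated component matrix")
```

Because the color vector has one entry per row of the loadings matrix, the same variable keeps the same color in every group of bars.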
So here again I will just copy and paste this line and change the color function to brewer.greens. Within that we need to mention the number of shades required; here I need 18 different shades of green, so I will mention 18. When we execute this line, we get a bar plot like this, with 18 different shades of green. You can also play with the different color palettes available in the pals package.

At last, I would like to show once again how to get this bar plot. We use the barplot function, and within it the first argument is our loadings. Then we need to avoid the stacking of bars, so pass the argument beside = TRUE. After this argument we select the color; here I will select a color palette. To do that we use the alphabet function. There are two functions, alphabet and alphabet2, and you can use either of them; if you want to know about their colors, I will give a link in the description below. Within alphabet we mention the number of colors, so mention 18. Then we pass the main heading, so mention "Rotated component matrix" within double inverted commas. When we execute this line, we get this kind of bar plot, which is very appealing to the eye. I recommend this method for the rotated component matrix when the number of variables is not more than 26, which is the number of letters in the alphabet.

We can also rearrange the rotated components by using code, but it takes a lot of time, so you can do that manually in Photoshop or in a presentation. What I recommend after rearranging the rotated components is that the zero you see on the scale should align with this line in the bar plots; that is the correct way to rearrange.

Finally we are in the last part of the video. Here I will show you how to export the scores of the rotated components into an Excel sheet. If you don't know where the scores have been stored, they are stored in the
object rpca, so use the dollar symbol to select the scores. When we execute this line, we find the scores here in the console, but they are very hard to read from here. So I will store the scores in an object known as scores. When we check the class of scores, we find that it is a matrix array, and we need to change it to a data frame. So pass scores into the as.data.frame function to change the class from array to data frame. If we check the class of scores again, we find that it is a data frame now. Then use the write_xlsx function: within the parentheses mention the data frame name as the first argument, and the second argument is "scores.xlsx", the name under which we export the table, along with its format. If we execute this line, the table will be there in our working directory.

Go to the working directory and you will find the scores table. The first thing we need to do is add serial numbers as identifiers. Then rearrange the rotated components. Then we need to sort the table in both directions: first sort from smallest to largest, then from largest to smallest. Then we need to identify the top 10 genotypes based on the scores, and also the genotypes with scores of more than one. We need to do this ignoring the plus or minus signs, because those only indicate direction, as in vectors, and have no real value in the scores. The easiest way of marking these scores is converting them from the normal font to the bold font, so hit Ctrl+B to mark the scores. We continue the same treatment for all the rotated components. Finally, with the help of the identifiers, the serial numbers, we can restore the sequence of genotypes: when we sort by serial number from smallest to largest, we get the original sequence. We need to mark the top scores by considering the different rotated components in combination. I have attached the thesis for reference in the description below, so read that; it will help you with your interpretation.
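The manual Excel steps above (adding identifiers, sorting, ignoring the signs, marking the top scores) could also be approximated in R. The sketch below is an illustration under assumptions, not the procedure shown in the video: standardized principal component scores of the built-in mtcars data stand in for the rotated component scores in rpca, and flag_top is a hypothetical helper.

```r
# Stand-in scores (assumption): standardized principal component scores of the
# built-in mtcars data replace the rotated component scores stored in rpca.
pc <- prcomp(mtcars, scale. = TRUE)
scores <- as.data.frame(scale(pc$x[, 1:3]))
names(scores) <- paste0("RC", 1:3)

# Serial numbers act as identifiers so the original order can be restored.
scores <- cbind(id = seq_len(nrow(scores)),
                genotype = rownames(mtcars),
                scores)

# For one component: rank by absolute score (the sign is ignored), keep the
# top entries, and mark those whose absolute score is more than one.
flag_top <- function(df, comp, n_top = 10) {
  ord <- order(abs(df[[comp]]), decreasing = TRUE)
  top <- df[head(ord, n_top), c("id", "genotype", comp)]
  top$above_one <- abs(top[[comp]]) > 1
  top
}

top_rc1 <- flag_top(scores, "RC1")
print(top_rc1)

# Optional export as in the video, only if writexl is installed.
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(scores, file.path(tempdir(), "scores.xlsx"))
}
```

Calling flag_top once per rotated component gives the per-component lists that can then be compared in combination, as described above.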
I hope you liked the content. Please subscribe to my channel, because that is the only support I ask from my community. Finally, thank you. For more videos, check out the data analysis playlist on my YouTube channel.
Info
Channel: The Outlier
Views: 496
Rating: 4.8571429 out of 5
Keywords: 3d PCA plot, FactoMineR, PCA for genetic diversity, PCA in 3d, PCA in Agriculture, PCA in plant breeding, PCA in psych, PCA3d, factoextra pca, factor analysis, pca, principal component analysis, principal component analysis example, principal component analysis tutorial, rotated component matrix, varimax rotation, PCA step by step, principal component analysis step by step
Id: SzGpkfqwUu0
Length: 52min 16sec (3136 seconds)
Published: Wed Sep 15 2021