Introduction to Multivariate Analysis

Video Statistics and Information

Captions
In this series of lectures we will be discussing multivariate analysis. Multivariate analysis is a very interesting, widely used and popular branch of statistics which deals with several variables. Most of the results of univariate analysis carry over to multivariate analysis, but there are certain results which are very specific to multivariate studies. This is primarily because multivariate statistics takes account of the interrelationships between the variables, which cannot be done in univariate studies. So instead of looking at several variables separately, in multivariate studies we will be looking at them simultaneously, and hence we will be able to study the interrelationships between the variables.

However, multivariate studies also have some specific features of their own which we do not study in the case of univariate data. This is primarily because some of these may be trivial in univariate studies, and in certain cases the problems do not arise in univariate studies at all. So these are specific problems primarily intended for multivariate analysis. In this series of lectures we will be primarily concerned with these specific problems that arise in multivariate studies, although there will be some lectures where we will have to extend the results of univariate studies to multivariate models so that it is easier to look at the multivariate analysis. Specifically, we will be looking at three different topics, which we will go on to elaborate as we do this lecture.

Let us first look at some visualization of multivariate data. Unlike univariate or even bivariate data, which can be plotted either on a single line for univariate data or on a two-dimensional plane in the case of bivariate data, visualising multivariate data is difficult. As you can see from this figure, even for three variables we can probably get a three-dimensional plot, but we do not get a very clear-cut idea as to how the observations lie, because primarily what we are looking at is a three-dimensional plot on a
two-dimensional background, so it does not give us the depth of the variables. This makes it very difficult when we have three or even more variables. In fact, in the case of four or more variables we cannot even plot the data, so a visual analysis of the data is not possible for multivariate data in general.

Another aspect of multivariate data is the visualization of these faces. If we look at these faces, there are different characteristics of each face. How do you actually identify an individual based on all of these characteristics? We need to synthesize the different aspects of a human face to identify a particular human being. So how do you actually take into account all these features?

There are various application areas of multivariate analysis. These include social science, where we can look at the gender, age and nationality of an individual; climatology, where we look at the minimum temperature, maximum temperature, rainfall, humidity and precipitation on a particular day; econometrics, where we look at the input costs, production, profits, etc. of a firm; socio-demographic studies, where we look at the gross domestic product, the life expectancy and the literacy rate, all of which lead to the Human Development Index; medical sciences, where we look at the systolic blood pressure, diastolic blood pressure and pulse rate of persons; pathological studies, where we look at the blood sugar, uric acid levels or hemoglobin counts of patients; and pharmaceutical studies, where we look at the several drugs which are sold per day in a pharmacy. So these are various areas where multivariate studies can be applied.

Consider the pathological studies. When you go to a physician, the doctor tells you to do a pathological test, maybe of your blood. When you go to a pathologist, he takes different measures and gives you a report which might consist of 14 or 15 different aspects, and what the doctor actually
does is combine these different aspects and identify the ailment. That is done primarily from the doctor's intuition, but the problem in statistics is: can we get some way of combining this data so that we can come to a single conclusion regarding the condition of the patient based on these 15 or 16 different aspects of the blood study?

The primary objectives of applied multivariate studies can be broadly classified into three groups: the first is classification of individuals, the second is dimension reduction and the third is the cause-effect relationship.

Now what is meant by classification of individuals? Very often we have a group of individuals with several characteristics and we want to find out how closely the individuals resemble one another. So we want to find the distance between these individuals with regard to their similarity. That is referred to as a problem of classification. Classification is easy when we look at a univariate case, but it becomes more complex as the number of variables involved increases.

The second problem is that of dimension reduction. We are very often faced with a very large number of variables, and it becomes very difficult to analyze this very large number of variables together. So what we first need to do is reduce the number of variables in some logical manner, so that we can look at a small number of variables and say something about the larger group of variables. This is what is referred to as the dimension reduction problem.

The third is the cause-effect relationship. This is what we do in the univariate case with regression analysis or analysis of variance. In analysis of variance or in the usual regression analysis we are primarily concerned with one single response variable, but very often we may have several response variables. In that case, how does a set of covariates affect this set of response variables? We need to study these
together, because very often the response variables are correlated among themselves, and hence individual regression studies would not lead to results as efficient as those we can have if we study them together.

First let us look at the problem of grouping. Grouping can be subdivided into three aspects. The first of these is referred to as cluster analysis. Cluster analysis answers the question: can a group of individuals be subdivided into smaller subgroups based on some similarity measure? In this case we want to group the individuals according to their closeness and form them into separate groups.

There are two methods of clustering. One is referred to as partitioning. One of the partitioning methods is k-means clustering, where we group the individuals into different clusters such that within a cluster the individuals are similar to each other, but they are as different as possible if they are from different clusters. So each cluster is homogeneous in itself, but the clusters are different from each other as far as practicable. Another type of clustering is hierarchical clustering, where we form a tree with branches, and the branches tell us the position of the individuals. The individuals can be looked upon as the leaves on a branch, and the closer one branch is to another, the closer are the individuals on that branch to the individuals on the other branch. This again tells us how similar or dissimilar the individuals are among themselves.

Once we look at ways of defining these clusters, the first question that arises is: can we explain how the clusters are different or similar among themselves? This question is answered by what is referred to as discriminant analysis. It studies the properties of a given cluster and thereby identifies the differences between the clusters.

Having formed the clusters and identified the characteristics defining each cluster, we next come to the question of whether a newly arrived individual can
be classified into one of the given clusters. This is a problem of assigning new individuals to the clusters and is referred to as a classification problem.

A discriminant analysis and classification graph would look like this. In this two-dimensional figure we have these black lines running through the points. These lines form three clusters: one at the top with primarily the blue points, one on the right with the red ones and one at the bottom left with the green ones. Each of these clusters has its own characteristics, individuals within each of these groups are similar among themselves, and the black lines discriminate between the clusters. So if we have a new individual, we look at its x1 and x2 values, plot the individual, see in which of the three groups the individual lies and thereby assign the individual to that cluster.

The next problem that we come to in multivariate analysis is one of dimension reduction. In this case we ask the question: is it necessary to analyze all the variables, or will a subset of them containing a major part of the information do for us? Reduction of variables can be carried out in several ways. The more popular methods are principal component analysis and factor analysis.

In principal component analysis we take a linear combination of the variables and form a new variable such that the new variable has as much of the information as possible from among the given variables. In this way we form one, two, three new variables, taking linear combinations each time, and instead of a large number of variables we can look at the first two or three principal components, which might carry 90% or 95% of the information of all the variables taken together. Hence, instead of all the variables, we can just look at these three or four variables and study them. So it reduces the number of variables under
study and hence makes the analysis of the data easier.

As opposed to principal component analysis, we look at what is referred to as factor analysis. In factor analysis, instead of taking linear combinations of the given variables, we look at each variable as being composed of a number of factors. The number of factors is generally small, and each variable is supposed to be a combination of these given factors. So it might be that the first, second and third variables have a large factor which is common to each of them, whereas the fourth, fifth and sixth have another factor which is common to them. Once we have identified the factors, we can just look at these factors, which as I said would be smaller in number than the original variables, and hence studying these factors would allow us to get an idea of the information that the original variables carry.

A third aspect is to look at what is referred to as canonical correlation, which is the correlation between two groups of variables. If these two groups of variables are highly correlated, then we can always drop one of the two groups, and hence look at a smaller number of variables and get an idea of the whole set of variables.

The third aspect of multivariate studies is the cause-effect relationship. In this case, as in regression or ANOVA studies, we look at whether a subset of the variables affects another subset of the variables, whether there is a relationship between them and, if so, whether this relationship is of a linear form. Now very often many of the covariates or explanatory variables are non-continuous; they may be discrete, they may be categorical, and hence we might take recourse to something like an analysis of variance or analysis of covariance. But in each of these cases the number of response variables is more than one, so we extend the analysis of variance or covariance
models into a multivariate analysis of variance or multivariate analysis of covariance model, which we refer to as MANOVA or MANCOVA. Similarly, we can extend the regression model to a multivariate response regression model as well.

As you can see from this example, we have two variables, y1 and y2, which are divided into three categories. If we just had y1, it would have been a simple ANOVA with a one-way classification. But now we have two variables, y1 and y2, and again a one-way classified analysis of variance, so we call this a multivariate analysis of variance model, or MANOVA in general.

In this lecture we have given you an introduction to multivariate analysis. We have looked at three specific problems of multivariate analysis which are very popularly used: grouping of individuals, dimension reduction and cause-effect relationships. We will be elaborating on these three topics in the subsequent lectures. We will also be looking at some of the other aspects of multivariate analysis in this lecture series.
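The k-means idea described in the lecture (clusters that are homogeneous within themselves and as different as possible from each other) can be sketched in a few lines of Python. This is only a minimal illustration, not the lecturer's implementation: the function names `squared_distance` and `kmeans`, the toy data and the choice of the first k points as starting centroids are all assumptions made here for the sake of a reproducible example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style k-means on a list of numeric tuples.

    Assumption for this sketch: the first k points serve as the starting
    centroids, which keeps the result reproducible.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two variables measured on six individuals.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)     # cluster index of each individual
print(centroids)  # mean vector of each cluster
```

The assignment step uses the distance between individuals as the similarity measure, which is the notion of closeness the lecture refers to; the update step then makes each cluster as homogeneous as possible around its mean. Starting from the first k points is a simplification, and in practice the result can depend on the starting centroids.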
Info
Channel: Vidya-mitra
Views: 99,011
Id: KrVbVInFSM8
Length: 18min 36sec (1116 seconds)
Published: Mon Nov 30 2015