Lecture 14: Face Recognition

Captions
So I guess we will get started. We'll talk about a new topic, which is face recognition. This is a pretty interesting topic with lots of applications in biometrics, and most of you are familiar with it: these days airports, police stations and so on have some version of a face recognition system. The idea is that you have a database of people whose faces you want to recognize, and then you are given an input image and you want to tell whether this person occurs in the database and, if so, which person it matches, what the identity of that person is. That is called face recognition.

So the simple approach would be to take these mug shots, the gallery of images I showed you, and look at the intensity values or gray levels, which is also called appearance. You take an image and make a vector out of it: you take the different rows of the image, the first row, then the second row, then the third row, and concatenate them into one long vector. Now, a person will not look the same if the picture is taken from a different viewpoint and so on, so you want more examples of the same person, different views of the same person, to generate a model which you will use to recognize these people. Using these different views you generate some model: maybe you take the different views and each one becomes a vector. Then, when you want to recognize a person, you are given an unknown image, you make a vector out of it, and you have these models of different people, say 100 people, 100 vectors. You match the unknown vector to person number one, person number two, and find whichever is the best match, and that's the person. So that's a very intuitive, very simple method: you do this matching, and if two vectors are similar then the difference is small and the match is strong; if they are different then the difference is large and the match is weak. So you choose the closest one.

Now, this simple approach will have problems, because a typical image is pretty large. Even a small size like 256 by 256, if you make a vector out of it; or say it's bigger than 256, it's 512 by 512, then it becomes roughly a 250 thousand dimensional vector. So that's a very large vector, and you are matching it element by element against the many vectors you have for the different people. Also, since you are using raw gray levels, they may be sensitive to noise or lighting conditions.
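Before getting to the dimensionality reduction, here is a minimal NumPy sketch of the simple appearance-matching baseline just described: flatten each image into a vector and pick the closest stored vector. The function names are my own, not from the lecture.

```python
import numpy as np

def to_vector(image):
    """Flatten an M x N gray-level image into one long appearance vector."""
    return np.asarray(image, dtype=np.float64).reshape(-1)

def nearest_person(unknown_image, model_vectors):
    """Match an unknown face against stored appearance vectors.

    model_vectors: list of (person_id, vector) pairs, one per known person.
    Returns the id whose vector differs least from the unknown face.
    """
    u = to_vector(unknown_image)
    best_id, best_dist = None, np.inf
    for person_id, v in model_vectors:
        d = np.linalg.norm(u - v)      # small difference = good match
        if d < best_dist:
            best_id, best_dist = person_id, d
    return best_id
```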
In order to solve this problem, what we will do is reduce the dimensionality of that very large vector, and for that we are going to use the notion of eigenvectors again. We have talked about eigenvectors before; they are used in many places, and here we are going to use them for face recognition. The method which reduces the dimensionality of these vectors is called principal component analysis, and it gives us a basis, so we can represent any vector in terms of those eigenvectors. The main point is that there may be only a few significant eigenvectors which we need, as compared to the huge dimensional space, like a 250,000 dimensional space. And you have seen eigenvectors several times, so you know: an eigenvector is a special vector such that if I take a matrix A and multiply it with the eigenvector, I get the same vector back, scaled by a scalar which is called the eigenvalue. To find the eigenvectors and eigenvalues we take A minus lambda I, set its determinant to zero, and solve the resulting linear system. We have done this example: for that matrix the eigenvalues are 7, 3 and minus 1, and those are the corresponding eigenvectors. We have gone through this process before, so I'm going to skip it.

Now, for face recognition, what we are going to do is take all the gray levels in the face image and make one long vector out of them, which we call u, and which looks like this: first the first row of the image, then the second row, the third row, the fourth row, down to the M-th row, the last one. The image is M by N, so we have M rows and N columns (or vice versa), and the vector is M multiplied by N dimensional, because we concatenate all the rows like that. So we take an image and make a vector out of it.

Now we are going to assume that for each person we have n views; the uppercase M and N were the image size, and this lowercase n is the number of views. There are P persons; P can be a hundred or 30 or whatever, and n can be 10 or whatever. Then we make a matrix out of these vectors, which we call the A matrix, and it looks like this: we put the first person's first view as a column, the first person's second view as another column, then the third column, the fourth column, up to the n-th column; those are the n images of person number one. Then for person number two we again put his images as column vectors, and so on. The length of each column is M multiplied by N, and since we have P persons with n views each, there are P multiplied by n columns in this matrix, and each column, for a 512 by 512 image, is a 250 thousand dimensional vector, so a very large vector.

Once we have this matrix A, we are going to build a matrix L, which is a correlation matrix, and it is MN by MN: L equals A A transpose. A is MN by Pn, so A transpose is Pn by MN, and the product comes out MN by MN. In the case we have been talking about, 512 by 512 images, L is roughly 250,000 by 250,000, and it captures the different people and their different views. [Question.] Yes, in this A matrix we put the first person's first view as a column, then the first person's second view as a column, and so on; these are the different views of the first person, and each column is a 250,000 dimensional vector, because we made a vector out of the image. I'm going to talk in a moment about how we reduce that; right now it is a pretty big matrix, and even though this is a pretty small resolution, these days images are very large, maybe 8 megapixels, but we will talk about that.

So what we are going to do is compute the eigenvectors of this L matrix, and these are also called principal components. These are the eigenvectors φ1, φ2, φ3, up to φn, and they form a basis, so we can represent any unknown vector as a linear combination of these eigenvectors. So suppose we have a face u: we can represent u as a linear combination of these eigenvectors.
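A small sketch of how that data matrix might be assembled, under the same layout the lecture describes (one vectorized view per column, grouped person by person); the helper name is hypothetical.

```python
import numpy as np

def build_data_matrix(training_images):
    """Stack the vectorized training faces as the columns of A.

    training_images: list of 2-D gray-level arrays, ordered person by person
    (the n views of person 1, then the n views of person 2, and so on).
    Returns A with shape (M*N, P*n).
    """
    columns = [np.asarray(img, dtype=np.float64).reshape(-1) for img in training_images]
    return np.stack(columns, axis=1)

# A = build_data_matrix(images)
# L = A @ A.T     # the big MN x MN correlation matrix described above
# C = A.T @ A     # the much smaller Pn x Pn matrix used later to avoid forming L
```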
It is a linear combination because we have the eigenvectors φ1, φ2, φ3, up to φn, and we have the coefficients a1, a2, a3, and so on: we take a1 multiplied by φ1, plus a2 multiplied by φ2, and so on, and add them up.

Now, this matrix L is symmetric, because we took the A matrix and multiplied it by its transpose, so it has to be symmetric. Symmetric means that if I take the columns and make them rows, and take the rows and make them columns, it stays the same; it makes no difference, and the diagonal elements stay the same. And as you remember, a symmetric matrix has orthonormal eigenvectors, which means that if I take any two eigenvectors and find their dot product, then when they are the same eigenvector, that is when i equals j, say φ1 dot φ1, the result is one; and if I take φ1 and dot it with another eigenvector, say φ2, we get zero, because they are perpendicular to each other. So if i is not equal to j the dot product is 0, and if i equals j it is 1. That's a property of the eigenvectors of a symmetric matrix, and we are going to use it.

Now let's say we want to find the coefficients of a particular vector u_x; we want to represent that vector in terms of the eigenvectors φ1, φ2, and so on. u_x is some unknown face; we make a vector out of it and we want to express it in terms of these eigenvectors. So we take that vector and compute its dot product with φ_i, one of the eigenvectors. As we said, we can represent any vector as a linear combination of the eigenvectors with linear coefficients a1, a2, and so on; that is the representation of u, and then we take the dot product with φ_i. We can expand it: it is a1 φ1 plus a2 φ2, all the way up to a_n φ_n, with each term dotted with φ_i, so it looks like that. Now we use the orthonormal property of the eigenvectors. What is φ1 dot φ_i? It is of course a scalar, because we have a vector transpose multiplying a vector, but since 1 is different from i, it is zero. So this term is 0, that term is 0, all the terms vanish except the one where the indices match, where the dot product is 1, so we are left with a_i. Therefore the coefficient for the i-th eigenvector is a_i, the dot product of φ_i with u, and we can do this with φ1, φ2, φ3 and all the rest; that is how we find the coefficients with which this particular vector is represented as a linear combination of the eigenvectors. So that is what we have here.

Now, as was mentioned, this L is a very large matrix: even though the face image we are looking at is reasonable, say 512 by 512, which is not that bad, when you make a vector out of it there are lots and lots of pixels, so it becomes hard. So here is the trick. We make a smaller matrix, which we call C, which is A transpose A; the L matrix was A A transpose. As you remember, the dimension of A was MN by Pn, so the dimension of L was MN by MN; here, for A transpose A, the dimension comes out Pn by Pn. P is how many people we have and n is how many views of each person we have, so this will be much, much smaller than 250 thousand dimensions. That's what we are going to do.
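A minimal sketch of that coefficient computation, assuming the eigenvectors are stored as unit-norm columns; the names are mine.

```python
import numpy as np

def face_coefficients(u, eigenfaces):
    """Coefficients of a face u in the orthonormal eigenvector basis.

    eigenfaces: array of shape (MN, k), one unit-norm eigenvector per column.
    Because the eigenvectors are orthonormal, a_i is simply phi_i^T u.
    """
    return eigenfaces.T @ u

def reconstruct(coeffs, eigenfaces):
    """Linear combination a_1*phi_1 + a_2*phi_2 + ... + a_k*phi_k."""
    return eigenfaces @ coeffs
```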
But now the question is: we want the eigenvectors of the L matrix, which is the larger matrix, but we want to reduce the computation, so we are going to find the eigenvectors of C, and from the eigenvectors of C we are going to get the eigenvectors of L. That is the trick, so we want to show that we can do it. Let's say α_i is an eigenvector of C; then by definition C α_i equals some eigenvalue λ_i times α_i; that is the definition of an eigenvector. Now we know C is defined as A transpose A, so we substitute that in: A transpose A α_i equals λ_i α_i. What we do now is multiply both sides by A on the left: A A transpose A α_i equals λ_i A α_i. As you see, A A transpose is exactly the L matrix, and A α_i is a vector, because it is a matrix times a vector, and λ_i is a constant. So we have L times the vector A α_i giving the same vector back times the constant λ_i, which is the definition of an eigenvector. Therefore, if we know the eigenvectors of C, which are the α's, we can find the eigenvectors of the L matrix by multiplying each α_i with A, which is the interesting part. So that is the way we do it: first we find the eigenvectors of the smaller matrix C, and from there we get the eigenvectors of the larger matrix.

[Question.] No, it depends on how many people you have and how many views of each person you have; of course you will have far fewer people than the number of pixels used to represent an image of a person, because these days you have, say, an 8 megapixel image, and a vector made out of 8 million pixels is huge. You can have very high resolution images, maybe twelve million pixels each, but the main point is that C will always be smaller than L; that's the point.

So then the procedure is: the first step is pretty easy, we create a matrix A from the training images, which means for each person we have some examples taken from different viewpoints; then we compute the C matrix from A; then we compute the eigenvectors of C; and then from those we compute the eigenvectors of L, as I explained to you. And the main point here is that when we represent any vector as a linear combination of these eigenvectors, we don't have to use all of them; we can use just the most significant eigenvectors, a few of them, say 30 or 50, and that is where the reduction is. Instead of using the two hundred fifty thousand dimensional vector, we reduce it to, say, thirty dimensions or ten dimensions, and that is what principal component analysis is doing: it is reducing the dimension. That's called PCA. I'm going to show you MATLAB code that actually does this. We select the most significant eigenvectors by looking at the eigenvalues: the largest eigenvalue corresponds to the most important eigenvector, then the next one, then the next one, so we sort the eigenvalues and pick the top ones.
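Here is a hedged NumPy sketch of that trick: diagonalize the small matrix C and map its eigenvectors back through A, exactly as derived above. The function name and the normalization step are my additions.

```python
import numpy as np

def eigenfaces_from_small_matrix(A, k):
    """Top-k eigenvectors of L = A @ A.T computed without ever forming L.

    We diagonalize the small matrix C = A.T @ A; if C @ alpha = lam * alpha,
    then A @ alpha is an eigenvector of L with the same eigenvalue.
    """
    C = A.T @ A                              # (P*n) x (P*n), small
    eigvals, alphas = np.linalg.eigh(C)      # eigh because C is symmetric
    order = np.argsort(eigvals)[::-1][:k]    # sort by eigenvalue, keep top k
    phis = A @ alphas[:, order]              # map back to eigenvectors of L
    phis /= np.linalg.norm(phis, axis=0)     # normalize each eigenface
    return eigvals[order], phis
```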
So then we have these coefficients for each person and for each view, and we will cluster them and find the average, and that becomes the model of a person. That's the training: we know that person one looks like this, person two looks like that, and so on. Then for recognition we are given an image and we don't know which person it is; we create a vector out of it, find its coefficients using the eigenvectors, and then compare that coefficient vector with all the different people, and whichever is closest, that's the person.

So this is actual MATLAB code which does that, and you can try it out; it's a pretty simple idea. We have one matrix which holds these faces; that is our A matrix, where we have put the vectors as I explained before. From the A matrix we compute the C matrix as I explained, and we find the eigenvalues and eigenvectors of the C matrix; the command, as you have used, is eig(C), and it gives you two matrices: the vectorC matrix will have the eigenvectors and the valueC matrix will have the eigenvalues on its diagonal. From that matrix we generate a vector, because the only useful information in it is the eigenvalues, but the command returns a whole matrix, so we take the diagonal elements and assign them to the vector ss. Then we sort these eigenvalues, because we want the most significant ones; the sort gives us the sorted values and an index for each element, and we will use that index. After sorting the eigenvalues, we use the same index to sort the eigenvectors, because we want the eigenvector corresponding to the largest eigenvalue first; this line sorts vectorC using the index from the eigenvalues. From vectorC we then get vectorL, because L is our original matrix: as you remember, we multiply A with alpha, so we multiply by the A matrix and get the eigenvectors of L. [Question.] Yes, because you have numbers, say five, three, nine, and so on, and you can sort them so the largest is at the top, then the next one, and the next one; so you can pick the several most significant ones, not just one.

So this gives us the eigenvectors of L, and then for each face image we want to find its coefficients. As you remember, to find a coefficient a_i we multiply φ_i with the vector u; all the other terms cancel and only a_i remains. You could find a1, a2, a3, a4 one by one, but in MATLAB you can do it in one shot for all the images and all the coefficients, in one line of code, and that's what this coefficient matrix is.

Now, in this matrix we actually have 30 people, and we are going to run a loop where we use only the five most significant coefficients, meaning we use only five eigenvectors. We take each person's coefficient vectors, add them up, and find the mean: the loop is saying that if i is equal to 1, then we look at the coefficient vectors from 1 to 5; when i is equal to 2, we have 6 to 10, and so on.
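A minimal NumPy version of this training stage, assuming five views per person and using whichever k eigenvectors were kept (the lecture uses five); the array and function names are hypothetical, not the lecture's MATLAB variables.

```python
import numpy as np

def train_models(A, eigenfaces, views_per_person=5):
    """One model per person: the mean of that person's coefficient vectors.

    A:          data matrix, one vectorized training face per column, ordered
                person by person (views_per_person columns for each person).
    eigenfaces: (MN, k) matrix of the k most significant eigenvectors of L.
    Returns an array of shape (num_people, k).
    """
    coeffs = eigenfaces.T @ A                         # k x (P * views_per_person)
    num_people = coeffs.shape[1] // views_per_person
    models = np.empty((num_people, coeffs.shape[0]))
    for i in range(num_people):
        block = coeffs[:, i * views_per_person:(i + 1) * views_per_person]
        models[i] = block.mean(axis=1)                # average that person's views
    return models
```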
For each person we take those 5 coefficient vectors and find their mean, and that mean becomes the model of the first person, then the second person, then the third person; the loop does this for all 30 people, because we have coefficients for all the people here. So this is basically the training, and it's a pretty simple few lines of MATLAB code, because all these things, eigenvectors, eigenvalues, matrix multiplication, are each one line. [A student points out a lowercase i.] Yes, this should actually be the same i; let me change that in the PowerPoint, I'll just make it uppercase.

[Question about building the model.] Yes, the simplest thing is: for person one you have 5 views, you get the first image, the second image, the third image, and you want to come up with a representation of that person, so you take those five and find their mean; that's the easiest way. Another way would be to cluster: if you have, say, a hundred views of each person, you find the center of that cluster, and that may be better. There may be many ways to do it, but this is the simpler one.

Okay, so now we have done the training and we have these models; then we want to do recognition. You have these images to be recognized, and you enter which image you want to recognize. Then you take that image and find its coefficients, again by multiplying with the eigenvectors: we take that particular image and find the coefficients by multiplying with vectorL, just as we computed coefficients before, and we use only the first five, as we did in training. Now we have a five dimensional coefficient vector for the person to be recognized, and we don't know who it is out of the 30 people, so we are going to match it with the model of person one, person two, person three, person four, up to person thirty, and whichever is closest, we will say that's the person.

That's what this loop is doing. We start by taking the first model as 'top', the best match so far, and then we have a loop from 2 to 30 where we compare the image coefficients against the models we have: we take model i and the image coefficients, subtract, and take the absolute value, which is a norm, an L1 norm. If this is less than the current best, then we replace 'top' with this i (again, this should be lowercase i), and we keep going through the loop; when it ends, that is our answer: the image is recognized as that person. So it's a pretty nice application of the core concepts of eigenvectors and eigenvalues for recognizing faces. [Question.] Yes: say it matches with person one; let's assume the best match so far is person one, and then you go on to the next one.
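And the matching loop just described, as a sketch that pairs with the training function above; it uses the same L1 comparison against each person's mean coefficient vector (names are mine).

```python
import numpy as np

def recognize(unknown_image, eigenfaces, models):
    """Identify a face by the nearest per-person model in coefficient space.

    eigenfaces: (MN, k) eigenvectors used during training.
    models:     (num_people, k) mean coefficient vectors from training.
    Returns the index of the closest person (0-based).
    """
    u = np.asarray(unknown_image, dtype=np.float64).reshape(-1)
    coeffs = eigenfaces.T @ u                         # k coefficients for this face
    distances = np.abs(models - coeffs).sum(axis=1)   # L1 distance to each model
    return int(np.argmin(distances))
```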
Say you have 30 people and you assume so far it matches best with person one; now if it matches better with person two, then you replace it with two, and then three, and then four; you have 30 people, so that's how you do it. That's just one way to do it: you keep the best so far, and if the next one is less than the previous one, you replace it with that.

Okay, so that's it; you can try this out, copy this code and run it to do face recognition. This was actually a big breakthrough in face recognition. The problem had been very popular, but people found out it is a very difficult problem to recognize faces; work started in the 60s and 70s and people kind of gave up, because it is very difficult. The early idea was to first detect the eyes, the nose, the different parts of the face, and try to match those parts. But around 1990 these two, Pentland, a professor at the MIT Media Lab, and Matthew Turk, who was his PhD student, came up with this idea that we can build this eigenspace, which is why the method is called eigenfaces: we can reduce the dimensionality and just do simple principal components, dimensionality reduction, and it actually works. They had a real-time demo at the major international conference, CVPR, in Hawaii, and they got the best paper award there. Since then face recognition has become popular; there is lots of work, and there is even a whole separate conference on face and gesture recognition. But the basic idea is very simple, and the reason this became so popular is that it is very easy to understand: you have these faces, you make a vector out of each one, you get lots of examples, you do PCA, you take the most significant eigenvectors, and you express each face with, say, a five dimensional vector instead of a 250,000 dimensional one. As you saw here, if the image is 512 by 512 the face is represented by a 250,000 dimensional vector, and here we showed that we can do it with only a five dimensional vector, because we found the principal components, the basis vectors which are the most significant; that's why you can do this trick. So this is another application of the eigenvector concept: we have seen it in the Harris detector, we have seen it in SIFT, and in many other places, even the fundamental matrix we talked about on Tuesday. The point is that the basic mathematical concepts are very few and each one is very simple; once you really understand them, you can use them in lots and lots of real-world applications.

Okay, so now let me go further and look at what has happened since then, this was about 20 years ago, and what the potential problems are. One thing is that this idea of face recognition with eigenfaces actually came originally from an earlier paper whose authors were interested in image compression, not really in face recognition. What they said is that if you can find the basis vectors u1, u2, and so on, the coefficients, and a mean vector, then you can get an approximation of an image by expressing it using the most significant eigenvectors; and if the eigenvectors are known at the receiver, then you only need to transmit the few coefficients, and that way you can do image compression. So if this is an image we want to compress, one way is to compress it with JPEG, which gives you this result and requires about 530 bytes, which compared to uncompressed is very, very small.
That is the JPEG. Now, if you do the eigenfaces method I described, where you have these basis vectors, you can compress it like this, which looks very realistic, and it requires only 85 bytes. So it's a very good application; the only catch is that you need to generate these basis vectors and send them to the receiver first, and after that you only need to send the few coefficients: if you use five components, and the eigenvectors are already known at the receiver, you just send those five coefficients. So that's one thing to know.

Now, there is more than the simple eigenvectors. What they did is look at the covariance matrix, not just the simple correlation matrix: we subtract the mean from each of the faces we have, that gives the matrix, and then we find its eigen decomposition; as you remember, before we just had X X transpose, but here we subtract the mean first. Using this, you get the coefficients by taking the vector minus the mean and finding its dot product with the eigenvectors; since we subtracted the mean there, we subtract it here also. Then we get an approximation of any vector x as the mean plus the eigen decomposition, the coefficients times the eigenvectors, say M of them, as we did before; the only difference is that you add the mean here, because we subtracted the mean earlier, and that's the difference.

This figure explains it: this is the mean, and we want to find the distance of, say, this face from it; we project the face onto the face space and we get x tilde, shown here, which is the approximation; then we want the distance of that from the mean, which is shown here, and that distance is called the distance in face space. That distance, from x tilde to the mean, is the square root of the sum of the squares of the coefficients a1, a2, up to aM; that's the distance we find.

Now if you have two faces, we can find the distance between the two faces x and y: x tilde is represented by the a coefficients and y tilde by the b coefficients; we subtract the corresponding coefficients one by one, square them, sum, and take the square root, and that tells you how similar the two faces are when both are projected into the face space. So that's the approximation.

One thing they found is that in this scheme we are using the eigenvectors, but we are not actually using the eigenvalues, and the eigenvalue tells you how important a particular eigenvector is compared to the next one. So we can improve this further: what they did is not just use the coefficient differences, but divide them by the corresponding eigenvalues, and that gives better results; again with the same mean subtraction and so on, and that is the distance you get when using this eigenvalue idea. There are several ways to set this up; one is to multiply the eigenvectors with the eigenvalue matrix when we compute them, so that we don't have to do the division later, and it comes to the same thing. The bottom line is that we have two distances. One is the Euclidean distance, the simple familiar distance between two points p and q: take p1 minus q1 squared plus p2 minus q2 squared and the square root; that's Euclidean, and it does not use any eigenvalues or covariance at all. The other distance is called the Mahalanobis distance, and there we use the covariance information, these eigenvalues; so it is basically a Mahalanobis distance instead of a Euclidean distance, and that is what makes it better.
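To keep the pieces straight, here is a small sketch, under the same assumptions as the earlier snippets, of the mean-subtracted projection, the reconstruction, and the two distances just contrasted; the eigenvalue weighting in the last function is the "divide by the eigenvalues" idea, and all names are mine.

```python
import numpy as np

def project_with_mean(x, mean_face, eigenfaces):
    """Coefficients after subtracting the mean face: a = U^T (x - mean)."""
    return eigenfaces.T @ (x - mean_face)

def approximate(coeffs, mean_face, eigenfaces):
    """Approximation of the face: mean + a_1*u_1 + ... + a_M*u_M."""
    return mean_face + eigenfaces @ coeffs

def distance_in_face_space(a, b=None):
    """Euclidean distance between two projected faces; with one argument,
    the distance of the projected face from the mean (the DIFS above)."""
    return float(np.linalg.norm(a if b is None else a - b))

def mahalanobis_like_distance(a, b, eigenvalues):
    """Weight each squared coefficient difference by 1/eigenvalue, so each
    direction is scaled by how much the training faces vary along it."""
    return float(np.sqrt(np.sum((a - b) ** 2 / eigenvalues)))
```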
So that's one point. Now, there are still problems. Face recognition remains difficult because one person can look very different: you can take many images of the same person and they may look very different from each other. So we have to look at the variation between images of the same person; actually, we want a representation in which they are not different, even though the actual images are different due to different lighting conditions and so on. That is called the variation within a class: each person is a class, so with 30 people we have 30 classes, and we care about the variation within each class. At the same time we have to make sure we can distinguish the different classes: there is person A and person B, and we have to come up with a representation that lets us distinguish between them. So we want to distinguish between two different people, and at the same time say that all the images of one person are similar; that's the idea. For example, look at these images: this is the same person, taken under different illumination, different lighting directions and so on, and some of these images look very different; this one looks very different from that one. That is why face recognition is such a hard problem, not as easy as it seems.

So now there is something very interesting. The method so far is called eigenfaces, because it uses eigenvectors; the newer method is called Fisher faces, and it uses a very interesting idea called LDA, or linear discriminant analysis. There is a very nice presentation by a professor at Texas A&M, and I have taken these slides from his presentation; it is a very solid method, and again it is very fundamental and used in lots of places, so you want to listen to this very carefully and make sure you are able to follow it. LDA, linear discriminant analysis, will do better than eigenfaces, and that makes a huge difference.

The way to start explaining it is this: again we have faces, and each one is a point in a high dimensional space; we have been talking about 250,000 dimensions, so say the space has dimension D. We have many samples x1, x2, up to xN, and these are faces: this one is person one, that one is person two, and so on. Let's say we have two classes, two people, person 1 and person 2, and we have N1 samples for person 1 and N2 samples for person 2, with the total being uppercase N. We want a better method to discriminate between these two than what we have been doing so far with coefficients and means. The way we are going to do it is to take each vector which represents a face and project it onto a line: we take a vector x and find its dot product with another vector w; the dot product gives a scalar, and that is y. Which means: let's say we have two people, the blue dots are one person and the red dots are another person, and there are different examples of each; and to simplify, we represent each sample with only two dimensions, x1 and x2, to explain the figure. We want to project these points onto a line. Say this is the line: you project this point, that point, and so on, and these are the dots here. Now, as you see, if you project onto that line it is very difficult to distinguish the two classes,
I mean the blue from the red: they end up very close to each other, so this projection is not good; we want to be able to distinguish them. Compare that to projecting onto this other line: again we have the blue dots and the red dots projected, and now we can distinguish between them, because these all belong to one class and those to the other class. So that is what we want to do: how can we determine this w vector so that when we take the dot product of w with x, we get a point on a line, and we can distinguish the two classes. That is what the linear discriminant method is going to do.

So now we want a measure of the separation of these two classes. One idea is to find the means of the two classes, project them, and hope that we can separate them. We find the mean of the first class by summing up all its vectors and dividing by the number of examples in the class, and call it μ1, and the other one μ2. Now we want the projected means, because we take x and project it using w, and we call the projected mean μ tilde: instead of the mean of x we take the mean of y, the projection, and since y is given by the dot product of w and x, the mean of y is just w dotted with the mean of x, which is μ_i, because w is a constant. So that is the projected mean in y, and this is the actual mean in x. One idea, then, is to find the projection w such that the difference between the projected means, the projected mean of class 1 and the projected mean of class 2, is maximized; and from the expressions above, that difference is w dotted with μ1 minus μ2.

Here is an example: this is class 1, with lots of examples and its covariance, and this is class 2; this is the mean of the first one and this is the mean of the second one; we project one mean here and the other mean there, and they are different, so we can separate them. So you might say, why don't we just use that? But there is a problem: the means we can separate, that's fine, but there are lots of other samples in class 1, and if we project all of them, and all of the class 2 samples, they will again get mixed together, because as you see this point lands here and that point lands there; they are all mixed up, because we don't have only the mean, there are lots of samples spread out. That is why using just the means is not good enough.

The key point Fisher made is that this criterion is not considering the standard deviation: what is the variation of the class 1 examples, and what is the variation of class 2. So instead, what we are going to do is look at the difference between the means, but normalized by a measure of the within-class scatter. The within-class scatter means: we take a projection y, subtract the projected mean, square, and add that up over all the examples; we call that s tilde squared for the first class, and similarly for the second class. This is called the within-class scatter, and we have one for the first class and one for the second class, because we are subtracting the projections from the projected mean of their own class. Now we can do better: we want to maximize the difference between the projected means, divided by the scatter, the variation of each class, and that works better.
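In symbols (using the lecture's notation, with ω_i denoting class i), the quantities and the Fisher criterion just described are:

```latex
y = w^{T}x, \qquad
\tilde{\mu}_i = \frac{1}{N_i}\sum_{y \in \omega_i} y = w^{T}\mu_i, \qquad
\tilde{s}_i^{\,2} = \sum_{y \in \omega_i}\left(y - \tilde{\mu}_i\right)^2,
\qquad
J(w) = \frac{\left|\tilde{\mu}_1 - \tilde{\mu}_2\right|^{2}}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}}
```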
Using this criterion we find a w like this: now if you project the means, one projects here and the other projects there, but also all the class 1 examples project into this range and all the class 2 examples project into that range, and again we can separate everything; this group is separate from that one, as compared to the previous case, where we had a problem because we were not considering the standard deviation, the covariance and so on. So that is the important point: look at the means, and also at the variation, the variance.

So the question is: given lots of examples, how do you find this w? In an example like this we can understand it and maybe do it manually, but we have a very high dimensional space, 250 thousand dimensions, so how do we find this projection vector w, so that we can multiply w transpose with x, get this y for all the samples, and separate the two classes? That is what we want to talk about.

We start by finding the scatter matrices. This is the scatter matrix for class i; it is simple, like a covariance: you take the x vector, subtract the mean, and multiply by the transpose, summed over the class. For two classes we compute this for each and add them up, and we call the sum S_W; this is the within-class scatter matrix, because here we are looking at the mean of each class, taking all the vectors in that class, and finding their scatter. That's one thing. Then we want the scatter of the projections, because we are working with the projected values y. For the i-th class we look at y minus μ_i tilde, as we did there, where each y is defined as the projection w transpose x, and the projected mean is defined as before; you simplify, basically multiply these out, and you get w transpose S_i w, and adding the two classes gives w transpose S_W w. Then we take the difference of the projected means, μ1 tilde minus μ2 tilde, again using the definitions we have been using, square it, and simplify, and this becomes w transpose S_B w, where S_B is called the between-class scatter, because it is built from μ1 and μ2. So that is our S_B and this is our S_W: we have two matrices, and the criterion we want is the ratio of the scatter between classes to the scatter within classes, which we defined here, and we want to find the w that maximizes it.

Now, whenever you want to maximize something, what do you do? You have been doing this for a long time; to find the maximum or minimum of any function, what is the way? The derivative, that's right, and that's what we are going to do. We take J(w), which is this ratio, differentiate it with respect to w, set it equal to zero, and that will give us the w. This is the numerator and this is the denominator, so we have a ratio of two functions; to differentiate it, as you know, we square the denominator, then keep the denominator constant and differentiate the numerator, minus keep the numerator constant and differentiate the denominator. These are quadratic forms, w transpose times a matrix times w, just like x squared, and we have done this before: the derivative of w transpose S_B w is 2 S_B w, and similarly the derivative of the other term is 2 S_W w. So we get the quotient-rule expression, and it has to equal zero.
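Written out, with J(w) = (wᵀS_B w)/(wᵀS_W w), the condition being set to zero is:

```latex
\frac{d}{dw}\,J(w)
= \frac{\left(w^{T}S_W w\right)\,2S_B w \;-\; \left(w^{T}S_B w\right)\,2S_W w}
       {\left(w^{T}S_W w\right)^{2}} \;=\; 0
```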
To simplify, we multiply both sides by w transpose S_W w; since the right-hand side is zero we can do that, and it gets rid of one factor of the denominator. Then we divide this term by that and this term by that, and the factor of 2 also drops out, because there is a zero on the other side. So from here we get S_B w, and from there we get S_W w times the ratio; and of course, as you see, this quantity over that quantity is exactly what we called J, so we can write it as S_B w minus J times S_W w equals zero. Now we multiply both sides by S_W inverse, which we can do because the other side is zero, and we get S_W inverse S_B w equals J(w) times w, with J(w) written on the other side. Now, is this familiar? What is it? We have a matrix, S_W inverse times S_B, multiplied by a vector, and we get the vector back times some constant. What does that remind you of? Take a matrix, multiply it by a vector, you get the vector back times a constant: it is the eigenvector equation, remember A x equals lambda x, where A is a matrix, x is a vector, and you get the vector back times a constant. So this is another application of eigenvectors: at the end we conclude, take the matrix S_W inverse S_B, find its eigenvector, and that gives you the w, which is amazing; we did this systematically and that is where we ended up.

So let's look at an example. We have two classes: X1 has these sample points and X2 has those sample points, each a small set of two dimensional vectors shown on the slide, and we want to apply this method to find the w vector so that we can project onto it and separate the classes; that is linear discriminant analysis, or Fisher faces. We just follow the process: from the samples we find the scatter matrix S1 for class 1 and the scatter matrix S2 for class 2, the mean of class 1 and the mean of class 2, then the between-class scatter and the within-class scatter, then the inverse of the within-class scatter times the between-class scatter, and we find the eigenvectors of that; we find this eigenvector with this eigenvalue, and that is the direction onto which we project. Here are those classes, the first class and the second class; we found this vector, and if you project onto it, the points go here, here, here, and then there, there, and they are very well separated. That is called LDA, Fisher faces, linear discriminant analysis, and the bottom line is that you just have to know how to compute these scatter matrices, the means, S_B and S_W, then take S_W inverse S_B, find its eigenvectors, take the first one, and that is the direction you project onto. It is very nice, and it is very intuitive to follow: these scatter matrices are nothing but covariance-type matrices, subtracting the mean from the vectors and multiplying by the transpose, and we also talked about how the projected scatter is computed from them.
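A compact NumPy sketch of that two-class recipe; the tiny 2-D points in the usage example are made up for illustration and are not the values from the lecture slide.

```python
import numpy as np

def fisher_lda_direction(X1, X2):
    """Two-class Fisher LDA: return the unit projection direction w.

    X1, X2: arrays of shape (n_samples, n_features), one class each.
    Follows the recipe above: scatter matrices, S_W, S_B, then the
    leading eigenvector of inv(S_W) @ S_B.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)           # scatter of class 1
    S2 = (X2 - mu2).T @ (X2 - mu2)           # scatter of class 2
    SW = S1 + S2                             # within-class scatter
    diff = (mu1 - mu2).reshape(-1, 1)
    SB = diff @ diff.T                       # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
    w = eigvecs[:, np.argmax(eigvals.real)].real
    return w / np.linalg.norm(w)

# Made-up 2-D sample points, just to show the call:
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])
w = fisher_lda_direction(X1, X2)
y1, y2 = X1 @ w, X2 @ w    # the two sets of projections separate cleanly
```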
Okay, so that finishes the face recognition part. We talked about the starting point, PCA or eigenfaces: take the examples, do PCA, find the eigenvectors, and select the few most significant eigenvectors; they span the space, and that reduces the dimensionality. Then you take several examples of each person, find the coefficients, find their mean, and that becomes the model of that person; for an unknown face you take the vector, project it, find the coefficients, and compare with all the models, and whichever is closest is the answer. That's one thing, and I gave you code you can try out. Then we asked how we can improve it: one way was, when we compute the distance, to include some notion of the eigenvalues, and that gives you a distance called the Mahalanobis distance. Then the next thing, which is very important, is LDA: you have to look at these scatter, or covariance, matrices. Within class means all the examples in one class, one person, and we want to make them similar; and the examples in different classes, which is the between-class part, should be different, so we can distinguish them. Using this we want to find the w so that when we project the two classes, their means are different; that was the first idea. Then we found that the mean alone is not enough, because there can be lots of variation in each class, and when you project the samples they intersect, which is a problem; so we said, use the distance between the means, but scale it by the scatter, and then it is really good. Then we talked about how to find the w: we formulated this in terms of J, did the standard differentiation, set it equal to zero, and found that the solution is to take the matrix S_W inverse S_B and find its eigenvector. Thank you. Any questions before we end?

[Question about illumination.] No, we have not used illumination as a criterion; we are saying that the images in one class may vary due to illumination, and we have enforced the within-class scatter and the between-class scatter together to get the right w.

[Question: can we use facial features, for example a person having a beard or a mustache, or wearing glasses, as a criterion for within-class variation?] Well, the problem is how you would get the mustache, or whether you mean with mustache versus without. You are kind of mixing two things up: this method will take care of many variations within a class; it can be illumination, it can be glasses versus no glasses, mustache or no mustache, because we are modeling within the class and between the classes. We only gave illumination as an example of a difference, but it can be any other difference. Since we have lots of examples of the same person and we are capturing all this variation with the within-class scatter matrix, we should be able to deal with it; it doesn't matter whether it is the illumination or the mustache or the glasses.

[Another question about the x's.] No, these x's are the examples. We started by saying that we have two classes, two people, and each face is represented by a D dimensional vector, say a 250,000 dimensional vector; for person 1 we have N1 examples and for person 2 we have N2 examples. These are given to us, and we talked about how we can separate them.
It can be fewer dimensions, depending on how you want to represent the face; that's fine, but this is general: it doesn't have to be a face, it can be anything. We are saying we have two classes, each represented by some vector, and we want to distinguish them. Actually, there is a nice example I can show you right now; this author shows that it can be applied not only to faces but to lots of interesting problems. These are coffee beans: there are five classes, five types of coffee beans, and we want to distinguish them automatically. They have some sniffer measurements, 45 sniffs for each type of coffee bean; these are the signals, and together they become a 60 dimensional feature vector. So we have 60 dimensional vectors and five classes, and he shows that with LDA you can come up with a separation: this is one class, this is another class, a third class, and so on, whereas if you use only PCA you have a problem here. So this is general: any vectors, any dimension, two or more classes you want to separate.

[Question.] No, of course it works better than PCA, but the idea is this: you took the variation of the face, for example, and you are using it, captured in the within-class and between-class scatter. Now suppose you have an image which is completely different, which you have not modeled. As you saw here, we were considering this kind of variance: we are saying all the examples of this class lie within that spread, and that is why we are able to separate them so nicely. But if we have some example which is completely outside of it, say way over here away from this red class, then of course there will be a problem. These are training-based methods, so a method will only work as well as your training. Any other questions?

[Question about classifying a new sample.] Yes, so as you know, you have a face and you project it onto this line w: you multiply w transpose x. You can even look at this figure here: we talked about how we find the line, right? Now we get some particular instance x which is unknown; we project that x by multiplying with w, which we know, and we see where it falls: if it falls here, it is this class; if it falls there, it is that class. That is how you distinguish them, because the main thing was to find w, and once you have found it, you say that within this range it is class one and within that range it is class two. You know the projected means, and you also know the covariance, so you can just look at the distance: project x by multiplying with w, and see how far it is from the mean of this class and from the mean of that class, and whichever is closer, you say it is that class; it's very simple. So just as in eigenfaces we compare with the means, here we also compare with the means; the only difference is that here we have a better projection as compared to PCA; that's the only difference. Okay, so let's stop here.
Info
Channel: UCF CRCV
Views: 40,391
Keywords: education
Id: LYgBqJorF44
Length: 71min 0sec (4260 seconds)
Published: Wed Nov 14 2012