Principal Component Analysis (The Math): Data Science Concepts

Captions
Hey everyone. In this video we're going to go over the math behind principal component analysis at kind of a medium level: we're not going super deep into the nitty-gritty, but we're also not staying as high-level as we were in the introductory PCA video. We're aiming for a comfortable medium. There are some videos you'll want to have watched, or at least some concepts you'll want experience with, to fully grasp what's coming up in the meat of this video, and I've outlined those five videos in boxes here. You'll want to watch the video on vector projections: what does it mean to project one vector onto another vector? You'll want to watch the video on eigenvalues and eigenvectors: what are they, why are they important, what do they mean? You'll want to watch the video on Lagrange multipliers; that's a quick video, not on the theory behind Lagrange multipliers, but just on how to use them. In fact I'd highly recommend that one, because we're going to be doing the exact same computation here, and that will make the second half of this video really straightforward for you. Another one to watch is derivatives of a matrix, which, as I mentioned in that video, is kind of a misnomer: it's really about taking derivatives of matrix operations, and we're going to take several derivatives of matrix operations in this video, so it helps to be familiar with that. The last one is the covariance matrix; there are actually two videos on the covariance matrix, one introducing what a covariance matrix is, and the other deriving its closed form.

Okay, now that we've talked about the math videos you'll want as a foundation for this principal component analysis math video, let's get into it. It looks like there's a lot of ugly math behind me, but I promise that if you've watched those previous videos and are comfortable with the results, this will go a lot more smoothly. I'll also be going through it slowly, step by step, so don't worry.

I've put things in these little red boxes to keep the logical progression easier to follow, because there are a lot of moving parts in this video. The first box is the setup: we have N different vectors, x_1, x_2, all the way up to x_N, and each has dimension D. Our goal in principal component analysis is dimensionality reduction: we want to map this space, which has dimensionality D, onto a space which has dimensionality M, where M of course has to be less than D. That's the point of dimensionality reduction.

How are we going to do that? It seems like a daunting task, so let's take a step back and look at a simpler picture. Say we have these points here, the same setup you probably saw in the projections video, and we want to figure out: if we're going to map the data onto a one-dimensional space, some vector u_1 in this space, which u_1 should we choose? Here are two candidates to get us thinking about what makes a good criterion for picking such a u_1. Say one candidate for u_1 is this vector here, and a different candidate is this other vector down here. If we project our data onto the first candidate, one data point lands here, one here, one here; I won't draw all of them exactly, but they end up fairly evenly spread out along that u_1 vector, and a lot of the variation is preserved. The spread of the projected points (not exactly, but roughly) is a proxy for how much of the original data's variation is preserved. Now, if we instead project onto the other candidate u_1 down here, it's a different story: take this point and project it down, and there's not a ton of variation preserved. That's exactly the criterion for selecting the best u_1 to project onto when M = 1: we want to maximize the variance of the projections onto that one-dimensional space. In more human terms, we have this D-dimensional space that contains all this information, and we want to reduce its dimensionality, but in a clever way: onto a space that preserves as much of the original variation as we can. There are dumb ways to do it, like the second u_1, where we'd lose a ton of information because everything gets compressed into a little interval, and there are smarter ways, like the first u_1, where we still end up with a one-dimensional space but preserve much more of the variation.
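A quick numerical way to see this intuition (my own sketch, not from the video; the toy data and the two candidate directions are made up): project a cloud of correlated 2-D points onto two different unit vectors and compare how much variance each projection keeps.

```python
import numpy as np

# Hypothetical 2-D data, spread mostly along the 45-degree direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, t + rng.normal(scale=0.3, size=200)])   # shape (N, 2)

def projected_variance(data, u):
    """Variance of the data after projecting every point onto the unit vector u."""
    u = u / np.linalg.norm(u)        # normalize so u is a unit vector
    scores = data @ u                # u^T x_n for each point
    return scores.var()

u_along = np.array([1.0, 1.0])       # candidate roughly along the data's spread
u_across = np.array([1.0, -1.0])     # candidate roughly perpendicular to it

print(projected_variance(data, u_along))    # large: most of the variation is kept
print(projected_variance(data, u_across))   # small: the data gets squashed together
```

The first direction keeps most of the spread of the original cloud; the second collapses it, which is exactly the "dumb way" described above.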
So the question is: how do we figure out, mathematically, which u_1 is the best one to project onto? Let's first write out the formula for the projection. For any given data point x_i in our set, the projection onto a candidate u_1 (this comes from the projections video) is u_1^T x_i, with u_1 being a unit vector. Likewise, the mean of all the projections of our data is u_1^T x_bar, where x_bar is the mean of all the x's. This makes sense because it's just the analogue of the single-point formula, and since taking a mean is a linear operation it behaves in exactly the same way.

Now, back to our objective: we want to maximize the variance of the projected data. The variance is given by the classic formula from your first stats class: one over the number of data points, summed over all the vectors, of the squared difference between each projected value and the mean of the projected values. In symbols, the variance of the projections is

(1/N) * sum over n of (u_1^T x_n - u_1^T x_bar)^2.

Here u_1^T x_n is the length of the projected vector (since u_1 is a unit vector, this single term gives the magnitude), and u_1^T x_bar is the mean of all the projected vectors. Remember, variance is found by taking each quantity minus the mean of all the quantities and squaring that difference; that's exactly what we're doing here. Next we factor the u_1^T out to the front of each difference, and we still have the same formula, squared: each term becomes (u_1^T (x_n - x_bar))^2.
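As a quick sanity check on that formula (again just a sketch on made-up data, not anything from the video), here the mean of the projections and the projected variance are computed exactly as written above:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))          # hypothetical data: N = 100 points, D = 3
u1 = np.array([1.0, 2.0, -1.0])
u1 = u1 / np.linalg.norm(u1)           # u1 must be a unit vector

x_bar = X.mean(axis=0)                  # x_bar: the mean of all the x's

# Mean of the projections equals u1^T x_bar, by linearity of the mean.
print(np.mean(X @ u1), u1 @ x_bar)      # same number

# Variance of the projections: (1/N) * sum_n (u1^T x_n - u1^T x_bar)^2
proj_var = np.mean((X @ u1 - u1 @ x_bar) ** 2)
print(proj_var)
```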
Next I want to expand this squared term a little bit. The first factor I'll leave exactly as it was, u_1^T (x_n - x_bar). The second factor I'll rewrite the opposite way around: I'll put the transpose on (x_n - x_bar) instead and write u_1 after it, so it becomes (x_n - x_bar)^T u_1. Quick aside: why am I allowed to do that? Because the thing inside is just a number, a scalar, so it doesn't matter whether I put the transpose on the first vector or on the second one; the product is the same either way.

Now, since the sum only involves terms with a little n in them and u_1 has no n in it, I can safely factor u_1^T out of the sum on the left and u_1 out on the right. Let me write the u_1^T on the very outside, including outside the 1/N. What's left in the middle, this beast right here, is

(1/N) * sum from n = 1 to N of (x_n - x_bar)(x_n - x_bar)^T,

and if you go back to the video on the closed form of the covariance matrix, that beast is exactly the closed form of the covariance matrix, S. So the variance of the projections is u_1^T S u_1. Whew, let's take a break. We just did a bunch of math and used a bunch of ideas already: the covariance matrix, the projection. Make sure you're comfortable with all of this; if you need to rewind, go ahead and rewind, and get comfortable with the fact that the variance of the projections is given by this neat closed form, u_1^T S u_1, where S is the covariance matrix.

Okay, I've erased the board, and now we're going to maximize the variance of the projections, which is our goal, subject to the constraint that u_1 is a unit vector. The good news is that we're done with the hard part of this video; all the math we did to set up this maximization problem was the hard part. What comes next is an exact repeat of the problem we did in the Lagrange multipliers video; that's the reason I wanted to do that problem back then, so it's already been done, but since we're here, let's do it again. We want to maximize the variance of the projections, which we just found is u_1^T S u_1, subject to the constraint that u_1^T u_1 = 1, which just says that u_1 is a unit vector. Using the power of Lagrange multipliers, we get a new objective function:

u_1^T S u_1 + lambda (1 - u_1^T u_1).

That's the function we want to maximize, so of course we take its derivative with respect to u_1. Using the results from the derivatives-of-matrix-operations video, d/du_1 of that expression is 2 S u_1 minus lambda (the Lagrange multiplier itself) times 2 u_1, and we set it equal to zero because this is a maximization problem. Cancelling the twos and moving the second term to the other side, we get

S u_1 = lambda u_1.

What does that mean? Just as in the Lagrange multipliers video, it means that u_1, whatever direction we choose to project onto, has to be an eigenvector of the covariance matrix S, because that's the definition of an eigenvector: S times u_1 equals some multiplier lambda times u_1. But which eigenvector, which eigenvalue should we use? S could have many of them.
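To make that algebra concrete (continuing the same made-up-data sketch from above), you can check numerically that u_1^T S u_1 reproduces the projected variance, and that every eigenvector of S satisfies the stationarity condition S u = lambda u:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
x_bar = X.mean(axis=0)

# Closed-form covariance matrix: S = (1/N) * sum_n (x_n - x_bar)(x_n - x_bar)^T
S = (X - x_bar).T @ (X - x_bar) / len(X)

u1 = np.array([1.0, 2.0, -1.0])
u1 = u1 / np.linalg.norm(u1)

print(np.mean((X @ u1 - u1 @ x_bar) ** 2))   # variance of projections, direct formula
print(u1 @ S @ u1)                            # u1^T S u1 -- the same number

# Every eigenvector of S satisfies S u = lambda u (the Lagrange stationarity condition).
eigvals, eigvecs = np.linalg.eigh(S)          # eigh because S is symmetric
for lam, u in zip(eigvals, eigvecs.T):
    print(np.allclose(S @ u, lam * u))        # True for each eigenpair
```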
To figure that out, let's left-multiply both sides by u_1^T. On the left we get u_1^T S u_1; on the right, lambda, being a constant, can come straight out, leaving lambda times u_1^T u_1. And what is u_1^T u_1? It's equal to 1, from the constraint we stated at the start, so that piece just goes away and we get the result

u_1^T S u_1 = lambda.

How does that help us figure out which eigenvector to use? Notice that u_1^T S u_1, going back to the earlier box, is exactly the quantity we were trying to maximize in the first place: the variance of the projected data. So maximizing the left-hand side of this equation is the same thing as maximizing the right-hand side, which means we should pick the biggest eigenvalue of the covariance matrix S, because that gives the biggest possible value for the variance of the projected data, which is exactly what we want.

I know that's a lot of tracing through equations, but the final result is this: if we want to pick the best one-dimensional space to project our D-dimensional data onto, we find all the eigenvalues of the covariance matrix S, pick the biggest one, the biggest lambda, and then find the eigenvector that goes with that eigenvalue. That eigenvector is u_1 (making sure to normalize it so it's a unit vector), and that is the one-dimensional space we want to project our data onto first.

Of course, your next question is: okay, that's projecting onto a one-dimensional space, but I probably want to project onto a higher-dimensional space, maybe four or five dimensions. You can prove the general case by mathematical induction; we won't go into those details, but the result is wonderfully simple. Say you want the best two dimensions to project your ten-dimensional data onto. For the first one, you find the top eigenvalue and take its corresponding unit eigenvector. For the second, you find the second-biggest eigenvalue, the biggest lambda after the first one, and take the eigenvector corresponding to it. For a third, you'd use the third-biggest eigenvalue and its eigenvector, and so on: you just go down the line for however many components you want, for whatever dimensional space you want to end up in.

That's a lot, I'll admit, a lot to process. So if you need to go back and watch this video a couple of times, or watch the prerequisite videos a couple of times, that's totally okay; I'll also link a couple of external resources in case you learn better a different way. My main goal is for you to understand the moving parts behind principal component analysis, because it gives real insight into what's actually happening: how do we actually derive the principal components? Which reminds me, I almost went through this whole video without explaining what a principal component is: these vectors u_1, u_2, and so on, all these eigenvectors we're deriving, are the principal components; that's just what they're called. So again, I hope you were able to follow this video. If you got lost anywhere, please post something in the comments below and I'll do my best to get to it. Until next time.
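Putting the whole recipe together, here is a minimal PCA sketch of my own (not code from the video; the ten-dimensional data and the choice M = 2 are just example values): build the covariance matrix, take its eigen-decomposition, keep the eigenvectors belonging to the M largest eigenvalues, and project onto them.

```python
import numpy as np

def pca_project(X, m):
    """Project the rows of X (shape N x D) onto the top-m principal components."""
    x_bar = X.mean(axis=0)
    S = (X - x_bar).T @ (X - x_bar) / len(X)   # covariance matrix, D x D
    eigvals, eigvecs = np.linalg.eigh(S)        # eigenvalues come back in ascending order
    order = np.argsort(eigvals)[::-1]           # biggest lambda first
    U = eigvecs[:, order[:m]]                   # top-m unit eigenvectors = principal components
    return (X - x_bar) @ U                      # projected (centered) data, shape N x m

# Example: hypothetical 10-dimensional data reduced to 2 dimensions.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
Z = pca_project(X, m=2)
print(Z.shape)   # (500, 2)
```

Centering the data before projecting is a common convention; the derivation above only requires u_1 to be a unit eigenvector of S, so projecting the uncentered x_n works too and just shifts every projected value by the same constant.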
Info
Channel: ritvikmath
Views: 28,146
Rating: 4.9710445 out of 5
Id: dhK8nbtii6I
Length: 13min 58sec (838 seconds)
Published: Mon Sep 23 2019