Basics of PCA (Principal Component Analysis): Data Science Concepts

Video Statistics and Information

Captions
Hey, how are you doing? In this video we're finally going to talk about principal component analysis. This is one of my favorite topics in data science for a couple of reasons: first, because it's so applicable and so useful in the real world, and second, because it brings together so many of the mathematical foundations we've been building up in previous videos. Now we finally get to put them to work in an actual data science and machine learning context.

I thought about how to structure the principal component videos, and here's the plan: this video will be a high-level explainer of principal component analysis, so don't worry about having watched any of the math theory videos, because we won't get into that here. We will get into it in the next video, which will be a medium-level overview of what principal component analysis is mathematically.

Let's get right into it. For this high-level explainer, as always, we're going to provide a context so it's not just math. We have this beautiful cat, and we are cat researchers, so we care a lot about various things about cats. For the past ten years we have only thought about the weight and length of a cat: is it a fat or skinny cat, is it a long or short cat? But all of a sudden we have a breakthrough, and we now also (possibly) care about purr frequency: how loud is the purring of the cat, is it super loud or is it quiet? So let's say we gather up all of our cats, measure their length, weight, and purr frequency, and plot them on a three-dimensional set of axes: weight here, length here, and purr frequency as the z-axis. You'll notice on this chart that purr frequency ends up being usually zero, or maybe a little bit positive, but it's never very large compared to the magnitude of length and weight. What that means geometrically is that all of our cats fall roughly on the x-y (weight-length) plane; there's not a lot going on in the z direction.

This is where one application of principal component analysis comes in. In this case our data has a column for the purr frequency variable, but since it's basically zero or close to zero, we don't really need it; we don't lose a lot by just ignoring that column. So we would want to remove that column; we want to reduce the dimensionality of our space. Dimensionality is a term you're going to hear a lot in regards to principal component analysis: you have a high-dimensional space (in this case it's just three, not super high), and we want to reduce that dimension so that we get better data storage, better computation time on our data, and other benefits like that. In this case we would like to take our three-dimensional space and project it onto a two-dimensional space instead, because that's all we really need; we're not losing a lot by ignoring that dimension.

So I want to get a running list going of applications of PCA. The first is dimensionality reduction, which is the application we just saw. There will be a second and a third one as well.

The second is data visualization, and that's going to be a big one. Imagine that instead of just three things about cats, we're way in the future and we have a hundred different variables about cats that we care about, or potentially care about. Obviously I don't know how to plot in a hundred dimensions; I don't know if you can, but I can't currently draw in a hundred different directions. The most we can draw, as three-dimensional beings, is basically a three-dimensional plot. So it would be great if, given something like five dimensions, we could somehow figure out how to get that down to three key directions that capture most of our data, even though we lose a little bit of information, because, as always, going from a higher-dimensional to a lower-dimensional space you are going to lose a little bit of information; that's just a fact. It's a matter of whether that's a trade-off we're willing to make in order to be able to plot the data in three dimensions and show cool visualizations to all of the people who care about cats. So that's the data visualization application.

The last application of principal component analysis we'll talk about is feature extraction (hopefully you can see that; it's a little messy, but it says feature extraction). This gets to the meat of what PCA actually does. Let's say we have ten different variables about cats: their weight, their length, their purr frequency, their hair color, various attributes. It's possible that a couple of these attributes are just combinations of the other attributes. For example, weight and length might be tied to the body mass index of the cat, in which case body mass index, weight, and length are not all necessary; maybe we just need any two of them and we can derive the third. In that case we don't need to bother storing that third variable, because it's extraneous; it only tells us things we can already derive from the other two variables. This is where principal component analysis comes in: it helps us identify cases where a certain variable, or a collection of variables, is not really independent, but is just a combination (a linear combination) of variables we already have, and therefore we are safe to get rid of it. That's another case where PCA helps us shrink the dimensionality of our data without really losing much information.

Okay, so that's the crux of what principal component analysis does, and that's really it for this video: it takes a high-dimensional space and applies various transformations to it to get it down to a lower-dimensional space, such that this lower-dimensional space still captures as much of the dynamics of the original space as possible. That transformation will become clearer in the next video, when we talk about how we actually go about transforming a high-dimensional space into a lower-dimensional space. So I'll see you in the next video, where we talk about the actual nitty-gritty math behind principal component analysis. Until next time!
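The video itself shows no code, but the cat example translates directly into a few lines of Python. The sketch below is an assumption-laden illustration, not the presenter's material: it uses scikit-learn's PCA on a made-up dataset where weight and length vary a lot and purr frequency is nearly constant, so the third principal component explains almost no variance and can be dropped, exactly the "flat in the z direction" picture described above. The variable names (`cats`, `weight`, `length`, `purr`) are hypothetical.

```python
# A minimal sketch of the cat example, assuming Python with NumPy and
# scikit-learn; the data below is synthetic and chosen to mimic the video's plot.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_cats = 100
weight = rng.normal(4.0, 1.0, n_cats)    # varies a lot (kg)
length = rng.normal(46.0, 5.0, n_cats)   # varies a lot (cm)
purr   = rng.normal(0.0, 0.05, n_cats)   # barely varies -- near zero, like the z-axis
cats = np.column_stack([weight, length, purr])

# Fit PCA and inspect how much variance each principal component explains.
pca = PCA(n_components=3)
pca.fit(cats)
print(pca.explained_variance_ratio_)
# Expect roughly [0.96, 0.04, 0.00]: the third direction (essentially
# purr frequency) carries almost no information.

# Dimensionality reduction: keep only the first two components.
cats_2d = PCA(n_components=2).fit_transform(cats)
print(cats_2d.shape)  # (100, 2) -- same cats, one fewer column
```

Note that PCA returns new directions that are linear combinations of the original columns, so the two retained components are not literally "weight" and "length"; in this synthetic data, though, they lie almost entirely in the weight-length plane, which matches the geometric picture in the video.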
Info
Channel: ritvikmath
Views: 35,735
Rating: 4.9284296 out of 5
Keywords:
Id: pmG4K79DUoI
Length: 6min 1sec (361 seconds)
Published: Mon Sep 09 2019