SVM (The Math) : Data Science Concepts

Captions
Hey everyone, welcome back. In this video we're going to be talking about the math behind SVM. I have a whole separate video on the intuition of SVMs, and I highly, highly recommend watching that first so you get an idea of what I'm talking about throughout this video; that video will be linked below. So let's get right into the math of SVM.

Again, let's start with a real situation. Let's say we are trying to predict whether or not a student gets into their top-choice medical school. We have n students, numbered 1 through n, and for each student we have a vector of predictors: things that will help us determine whether or not they get in. For the first student that vector of predictors is x_1, for the second x_2, and so on down to x_n. For each student we also have a response: y_i is either 1 or -1, 1 if the student does get in and -1 if they do not. Our machine learning problem for today is to build some kind of model that uses the predictors to figure out the response, and we'll be using an SVM.

I've drawn below a graphical representation of our problem. This vector of predictors is generally going to be pretty large, but let's look at just two dimensions for now to keep things simple. Say the two predictors are the GPA of the student on the x-axis and the MCAT score of the student on the y-axis. Students who generally have higher GPAs and higher MCATs, the green triangles, are getting into their medical schools, so they are classified as +1; students with lower values for these metrics generally are not, so they are the red x's classified as -1.

In the spirit of SVM, as we learned in the intuition video, we want to create a maximum margin classifier. That means we want to choose a decision boundary (the blue dashed line) such that the space around it, called the margin, is as big as possible. The reason we want this, again, is that we want some breathing room between the two classes. That's all from the intuition video; now let's actually assign some equations to these lines.

The blue dashed line is w · x - b = 0. It helps in this video if you have some familiarity with vector algebra, but let me expand this equation so you can see better what's going on: w_1 x_1 + ... + w_p x_p - b = 0. This is simply the equation of a hyperplane, and since we're dealing with two dimensions, for us it's just the equation of a line: b is the intercept and w_1 through w_p are the coefficients. This implies that each vector x is p-dimensional, so we have p predictors about each student.

We have two other lines in this picture, and they have very similar-looking equations: the top blue solid line is given by w · x - b = 1, and the bottom blue solid line by w · x - b = -1. It turns out that w and b are the only two quantities we care about going forward. If we can find good values for w and b, we're finished, and by good values we mean values that maximize the margin. All we need to do now is figure out an equation for the size of the margin, the margin being the space between the two solid blue lines.
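(Not part of the video, but to make the decision function concrete, here is a minimal Python sketch. The weight vector and intercept are made-up illustration values, and the helper name decision is my own; training the SVM is what actually finds good values for w and b.)

    import numpy as np

    # Hypothetical weights and intercept for a 2-D hyperplane w . x - b = 0.
    # These numbers are invented for illustration only.
    w = np.array([2.0, 1.5])    # coefficients on (GPA, MCAT/100)
    b = 14.0                    # intercept

    def decision(x):
        # Signed value of w . x - b; its sign is the predicted class.
        return np.dot(w, x) - b

    student = np.array([3.8, 5.2])      # (GPA, MCAT/100)
    print(np.sign(decision(student)))   # +1 -> gets in, -1 -> does not

Points with decision(x) = 0 sit exactly on the dashed boundary; the two solid lines are where decision(x) equals +1 and -1.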
The first thing to know, if you go back to your vector algebra class or look at your notes, is that the normal vector of this hyperplane is w itself, where normal means perpendicular to the hyperplane, in this case perpendicular to the line. Knowing that, we can start forming a story. Say I am a vector x on the decision boundary; then x must satisfy the equation of the decision boundary, which again is w · x - b = 0. From there, I want to know how many units I need to walk in the w direction in order to get to the top blue line. The first thing I'll do is make a unit vector in the direction of w, which is simply w / ||w||. So the question is: if I start at my current vector x, which again is on the dashed decision boundary, and I walk k units in the w direction, what is k such that my new location, x + k w/||w||, lies on the top blue line? It looks a little complicated, but once we break apart the equation w · (x + k w/||w||) - b = 1 it becomes pretty clear: it expands to w · x + k (w · w)/||w|| - b = 1. The terms w · x and -b together are literally known to be zero, because x is on the decision boundary, so they go away; and since w · w = ||w||^2, the middle term simplifies to k ||w||. So k ||w|| = 1, and we get k = 1/||w||.

All I've said so far is that this distance, the one marked with the question mark in the picture, is 1/||w||. This is important because now we know exactly the size of the margin: it is simply twice that amount, so the margin size is 2/||w||, w again being the vector of weights we are choosing. If I want to maximize the margin, I need to maximize this quantity, which is the same thing as minimizing the denominator, so I need to minimize ||w||. The first part of the story is done: we know we need to pick some w and b that minimize ||w||.
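(Again not from the video, but here is a small numeric check of that derivation, using the same made-up w and b as above: starting from a point on the decision boundary and walking 1/||w|| units along the unit normal should land exactly on the line w · x - b = 1.)

    import numpy as np

    w = np.array([2.0, 1.5])    # hypothetical weight vector (illustration only)
    b = 14.0

    # Build a point x0 on the decision boundary w . x - b = 0:
    # pick the first coordinate freely and solve for the second.
    x1 = 3.0
    x0 = np.array([x1, (b - w[0] * x1) / w[1]])
    assert np.isclose(np.dot(w, x0) - b, 0.0)

    # Walk k = 1 / ||w|| units along the unit normal w / ||w||.
    k = 1.0 / np.linalg.norm(w)
    x_new = x0 + k * w / np.linalg.norm(w)

    print(np.dot(w, x_new) - b)     # ~1.0: we land on the upper margin line
    print(2 * k)                    # the margin width, 2 / ||w||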
But there are a couple of constraints we need to take into account. Whatever w and b we choose, everything on one side of the margin must be classified as +1 and everything on the other side as -1. What does that mean mathematically? If y_i = 1, meaning the student truly does get into their top-choice medical school, then we need w · x_i - b >= 1. Which vectors obey that condition? We know this would hold with equality exactly on the top line, so the inequality holds for anything in the space beyond it; we are requiring that all vectors in that region get classified as +1. That enforces the condition that anything on that side of the margin is a +1. The parallel condition is that if y_i = -1, meaning the student truly does not get into their top-choice medical school, then we need w · x_i - b <= -1. Which vectors obey that? Those on the other side of the margin, so we require that everyone on that side is classified as -1, which again is exactly what we want.

These two conditions can be written together in a compact form. Notice that if y_i = 1 we need w · x_i - b >= 1, so the product of the two quantities is >= 1; and if y_i = -1 we need w · x_i - b <= -1, so again the product is >= 1. In all cases we need y_i (w · x_i - b) >= 1, for all i from 1 to n.

That was a lot of math, and I encourage you to rewind, pause, or write something on paper if you need to check these things for yourself. The two big takeaways so far: to maximize the margin, which is what SVM is trying to do, we need to minimize ||w||, but we need to do it while obeying that constraint. All of this is summarized in the green box, which is the hard-margin SVM problem expressed mathematically: pick a weight vector w and an intercept b minimizing ||w|| (the same thing as maximizing the margin), subject to the constraint that y_i (w · x_i - b) >= 1 for all i from 1 to n. If we can find a w and b satisfying this, we have solved the hard-margin SVM problem.

The last thing I'll note is that this is, of course, called a support vector machine, so where are our support vectors? You now have all the foundation to figure out mathematically what they are. The support vectors are the green triangle and the two red x's that lie exactly on the two solid blue lines, which means that w · x - b, x being any of those support vectors, equals either 1 or -1: 1 if it's in the positive class, -1 if it's in the negative class. Multiply that quantity by its class label and you get exactly 1. So the support vectors are exactly the points for which y_i (w · x_i - b) = 1. Points where that quantity is greater than 1 are not support vectors; they lie outside the margin, and if you shifted them around within their region, they would not affect the final solution of the problem.

Okay, so that was the math behind SVM, at least for the hard-margin case. I think I'll save the soft-margin case for a second video, just so we're not putting too much in this one. If you have any questions at all, please leave them in the comments below, and please like and subscribe for more videos just like this. I'll see you next time.
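(To tie the transcript together, here is a sketch of the green-box hard-margin problem solved numerically. This is not from the video: it assumes the cvxpy convex-optimization library is available and uses a made-up, linearly separable toy dataset in the spirit of the GPA/MCAT example; the variable names are my own.)

    import numpy as np
    import cvxpy as cp

    # Toy, linearly separable data: columns are (GPA, MCAT/100), labels +1 / -1.
    # The numbers are invented for illustration only.
    X = np.array([[3.9, 5.20], [3.7, 5.15], [3.8, 5.18],    # got in
                  [3.0, 4.95], [2.8, 5.00], [3.1, 4.90]])   # did not
    y = np.array([1, 1, 1, -1, -1, -1])

    w = cp.Variable(2)
    b = cp.Variable()

    # Hard-margin SVM: minimize ||w|| subject to y_i (w . x_i - b) >= 1 for all i.
    constraints = [cp.multiply(y, X @ w - b) >= 1]
    cp.Problem(cp.Minimize(cp.norm(w, 2)), constraints).solve()

    w_opt, b_opt = w.value, b.value
    print("margin width:", 2 / np.linalg.norm(w_opt))

    # Support vectors are the points where the constraint is tight:
    # y_i (w . x_i - b) equals exactly 1 (up to solver tolerance).
    scores = y * (X @ w_opt - b_opt)
    print("support vectors:\n", X[np.isclose(scores, 1.0, atol=1e-4)])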
Info
Channel: ritvikmath
Views: 9,082
Rating: 4.9508195 out of 5
Keywords: big data, machine learning, ai, data science, svm
Id: bM4_AstaBZo
Length: 10min 19sec (619 seconds)
Published: Wed Nov 25 2020