SVM8- Solving the optimization problem of the SVM (Part 2)

Captions
In the last video I ended up with this expression, and I said that it is called the dual. It is going to be used to find the Lagrangian coefficients (the Lagrange multipliers), in other words the alpha_i, and once we find these coefficients we will use them to find w. To find the alpha_i we need to work out a maximization problem: we need to maximize the dual for alpha_i greater than or equal to zero. Why do we have this constraint on alpha_i? Because my original constraint is an inequality of the form "greater than or equal to zero", so the corresponding alpha_i should also be greater than or equal to zero.

To solve this we use numerical methods, because it is analytically difficult, or I think even impossible, to solve. What we basically do is try different values of alpha_i and see which of these values yields a maximum for this expression. Of course we are not going to do that by hand: if you want to program it, there are many packages and libraries that do it.

The reason we developed this expression, even though we are not going to work out the maximization ourselves, is that we wanted to see what this maximization problem depends on. We discovered that it depends on the number of pairs (x_i, x_j). So when the number of pairs of training samples increases, the number of iterations we need also increases. This is just to know what the complexity of the maximization problem depends on, that's all.

When we get the values of alpha_i that maximize this expression, we substitute them into the expression for w to get w, its magnitude as well as its direction. Here y_i is known, x_i is
known, and alpha_i is known, so this allows us to compute w.

But in our optimization problem we want to find not only w but also b. Before finding b, I just want to talk about the interpretation of the different values of alpha_i. We have three possible cases.

The first case is alpha_i equal to zero. In that case the corresponding training sample x_i is not a support vector. So what is a support vector? A support vector is simply a training sample that lies on the margin. For example, say that I have only three positive training samples and three negative training samples, and somehow I got my margin like that. A positive sample that lies on the margin is a support vector, and a negative sample that lies on the margin is also a support vector.

The second case: if alpha_i is different from zero, then the corresponding training sample x_i is a support vector.

There is a third case: if alpha_i is different from zero and its value is very high compared to the mean of the other values, then what I can infer is that the corresponding x_i is of course a support vector, and also an outlier training sample.

That is what we need to know about the interpretation of alpha_i. Basically, we can identify the support vectors using just the values of alpha_i, and this is useful for finding the value of b.

Now we'll talk about how to find this value of b. Remember the first rule that we stated in the first video. Let me just draw a feature space here to
see what I'm talking about. Say that these are my positive samples and these are my negative samples, and somehow this is the margin that I got. And this is w, the w that I found after optimizing my problem.

I said in a previous video that the dot product of this vector with any unknown point, say this point here, gives you a measure of distance along the direction of w. I also said that when the magnitude of w is equal to one, this dot product represents exactly this distance.

I also said in the first video that the decision rule can be written in this form: w dotted with any sample you want, say z. If the dot product of w with z is greater than or equal to c, then we say that z is a positive sample; otherwise z is a negative sample. This c is a measure of the distance from the origin of the feature space to the decision boundary, which should lie in the middle of the margin. So if d is the width of my margin, this width is split in half by the decision boundary, and c is just a measure of the distance from the origin to that boundary.

Then I said that I can modify this decision rule to obtain w dotted with z, plus b, greater than or equal to zero, where b is equal to minus c.

Sorry for the interruption. So that is what I said: the b that I'm looking for is equal to minus c. To find b, I just need to look for c. And to look for c, I need to get a measure of this distance. Let me call it d+. Then again I
will compute a measure of this distance here, d-. Then my c is equal to d+ plus d-, divided by two, which gives me a measure of the distance from here to here.

Now the question is how to get d+ and d-. To get them, I need to take the dot product of w with a positive point that lies on the margin, so I need to find a positive support vector, and likewise I need to find a negative support vector. If I dot the positive support vector with w I get d+, and if I dot w with the negative support vector I get d-.

So the question becomes how to find these support vectors: one that is positive and one that is negative. Let me denote the positive support vector that I'm looking for as x_p and the negative one as x_n. Any positive support vector will do, and any negative support vector will do.

If you remember, I said that when alpha_i is not equal to zero, x_i is a support vector. So the value of alpha_i tells me whether a sample is a support vector. But I also need to tell whether the sample is positive or negative, and this is easy too: I can just use y_i. If you remember, we set this variable equal to plus one for positive samples and minus one for negative samples. So now I think the idea is clear: x_p
can be any vector whose alpha_p is different from zero and whose y_p is equal to plus one, and x_n can be any vector whose alpha_n is different from zero and whose y_n is equal to minus one. This is how I can find x_n and x_p.

Then, to get the distance d+, I just take the dot product of w with x_p, and d- is obtained as the dot product of w with x_n. Now my c is equal to (w . x_p + w . x_n) / 2, and at the end b is equal to minus c. This is how I get the value of b. I hope it was clear. This is of course just one way to obtain the distances d+ and d-; there are other ways, but I think this method will work fine.

So far I have talked about how to find w, and we said that w depends on the alpha_i. We need to find the alpha_i, and they are found by maximizing the dual Lagrangian. After that, once we get w using these alpha_i, we can easily compute b using the dot products that I just explained.
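The captions repeatedly point at "this expression" on screen, and those formulas are not captured in the transcript. As a reading aid only, and assuming the video uses the standard hard-margin formulation, the quantities discussed would look roughly as follows. The equality constraint on the sum of alpha_i y_i is part of the standard dual and is an assumption here, since the captions only mention alpha_i >= 0.

```latex
\begin{aligned}
\max_{\alpha}\; & L_D(\alpha) = \sum_{i} \alpha_i
  - \frac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
  \quad \text{s.t. } \alpha_i \ge 0,\; \sum_i \alpha_i y_i = 0, \\
w &= \sum_{i} \alpha_i\, y_i\, x_i, \\
d_+ &= w \cdot x_p, \qquad d_- = w \cdot x_n, \\
c &= \tfrac{1}{2}\,(d_+ + d_-), \qquad b = -c .
\end{aligned}
```

The double sum over i and j is where the dependence on the number of pairs (x_i, x_j) comes from.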
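The idea of "trying different values of alpha_i and keeping the one that gives the maximum" can be sketched on a tiny problem. This is a toy illustration, not the method a real library uses: the two samples are hypothetical, the search is a plain grid, and the standard dual constraint sum(alpha_i * y_i) = 0 is assumed (it is not mentioned in the captions), which for one positive and one negative sample forces both alphas to be equal.

```python
# Toy grid search over the dual objective for two hypothetical samples.
# Assumption: sum(alpha_i * y_i) = 0, so alpha_1 == alpha_2 == a here.

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alphas, X, y):
    """sum(alpha_i) - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(X)
    pair_sum = 0.0
    for i in range(n):        # the double loop visits every pair (x_i, x_j),
        for j in range(n):    # so the cost grows with the number of pairs
            pair_sum += alphas[i] * alphas[j] * y[i] * y[j] * dot(X[i], X[j])
    return sum(alphas) - 0.5 * pair_sum

X = [(2.0, 2.0), (0.0, 0.0)]  # hypothetical training samples
y = [+1, -1]                  # +1 for positive, -1 for negative

# Candidate values 0.00, 0.01, ..., 1.00; all satisfy alpha >= 0.
candidates = [k / 100 for k in range(101)]
best_alpha = max(candidates, key=lambda a: dual_objective([a, a], X, y))
best_value = dual_objective([best_alpha, best_alpha], X, y)

print(best_alpha, best_value)
```

For these two samples the objective reduces to 2a - 4a^2, so the grid search should settle on a = 0.25. Real solvers replace the grid with a proper quadratic-programming routine.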
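The steps described in the captions for recovering w and b from the alpha values can be sketched as follows. The data set and the alpha values are hypothetical (they were chosen by hand so that they satisfy the optimality conditions for this toy set); in practice the alphas would come from a numerical solver. The tolerance `TOL` used to decide that an alpha is "different from zero" is also an assumption.

```python
# Sketch: compute w, pick one positive and one negative support vector,
# then obtain b = -c with c = (d_plus + d_minus) / 2.

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]  # hypothetical samples
y = [+1, +1, -1, -1]                                   # class labels
alphas = [0.25, 0.0, 0.25, 0.0]                        # hypothetical solution
TOL = 1e-8                                             # "non-zero" threshold

# w = sum_i alpha_i * y_i * x_i, computed one coordinate at a time
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alphas, y, X)) for k in range(2)]

# A support vector has alpha_i != 0; y_i tells us whether it is positive
# or negative. Any one of each kind will do, so take the first match.
x_p = next(xi for a, yi, xi in zip(alphas, y, X) if a > TOL and yi == +1)
x_n = next(xi for a, yi, xi in zip(alphas, y, X) if a > TOL and yi == -1)

d_plus = dot(w, x_p)          # measure of distance to the positive margin
d_minus = dot(w, x_n)         # measure of distance to the negative margin
c = (d_plus + d_minus) / 2    # midpoint, i.e. the decision boundary
b = -c

def classify(z):
    """Decision rule: positive if w . z + b >= 0, negative otherwise."""
    return +1 if dot(w, z) + b >= 0 else -1

print(w, b, classify((3.0, 1.0)), classify((0.0, 1.0)))
```

With these values, only the samples (2, 2) and (0, 0) have non-zero alpha, so they are the support vectors used to compute b.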
Info
Channel: Zardoua Yassir
Views: 208
Id: TMyOR1NezBY
Length: 15min 17sec (917 seconds)
Published: Thu Oct 15 2020