Maths Intuition Behind Support Vector Machine Part 2 | Machine Learning Data Science

Captions
[Music] Hello, my name is Krish and welcome to my YouTube channel. Today in this video we are going to understand the maths intuition behind the support vector machine. In my previous video I gave you an idea of what an SVM is: the main focus is to make the margin distance as large as possible so that we can separate the points cleanly. In real-world scenarios, though, not every problem statement will be in this tidy form, so we will also see how to fix that. A lot of equations are coming, so please make sure you watch this video till the end. Let us go ahead.

To begin with, understand the basic difference between an SVM and logistic regression. In logistic regression we also create this kind of hyperplane — in two dimensions, a straight line — so that we can divide the points properly. What the SVM adds is the marginal distance: the extra positive and negative planes on either side. To derive this, let us start with a simple example. I have two points, (-4, 0) and (4, 4), and I draw a straight line to divide them; suppose its slope is -1. You know that the equation of a hyperplane is given as w^T x + b = 0 — I hope everybody is familiar with this, because I have already discussed it in my logistic regression videos; if you search for the equation of a hyperplane, this is what you will get. It is just the general form of the familiar y = mx + c.

Now suppose for the point (-4, 0) I want to calculate the y value. I know the slope is -1, because that is what we assumed. Next, what is the c (or b) value? It is zero, because the line passes through the origin. So in short my equation becomes y = w^T x. To do this multiplication, w^T times x, I have two coordinates for the point: x1 and x2. What is w here? w contains the slope, so w = [-1, 0] — and the reason for the transpose is that we have to do a matrix multiplication. Taking my x coordinates (x1, x2) = (-4, 0) and multiplying, I get (-1)(-4) + (0)(0) = 4, a positive value. Now here is the important observation: take any other point below this line and compute its value the same way — it is always going to be positive. (I missed this point when explaining logistic regression, so I am covering it here.) It may come out as 4, 8, or 9, but whenever a point lies below the line, the value is positive.

That is one scenario. Now suppose I go above the line and do the same computation, y = w^T x, for the point (4, 4): with the same w = [-1, 0], this gives (-1)(4) + (0)(4) = -4. Whichever point I take on that side, the value is always going to be negative. So on this line, going in the downward direction the values are positive, and going in the upward direction they are negative — which means I can consider one side as one group and the other side as another group. But why are we discussing these positive and negative values? Because it is important in SVM, as we will see. Whenever the value is positive — it may be +5, +6, +7, +8, or +100 — I can simply treat the point as +1, since this is a classification problem; that simplifies the derivation going forward. Similarly, whenever it is negative, I take it as -1. So all the points on one side are classified as +1 and all the points on the other side as -1 — pretty simple.
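To make the sign argument concrete, here is a minimal plain-Python sketch, reusing the w = [-1, 0] and b = 0 from the example above, that evaluates w^T x + b and reads off which side of the line a point falls on:

```python
# w holds the slope from the example; b = 0 because the line passes
# through the origin. The sign of w^T x + b tells us which side x is on.
w = [-1.0, 0.0]
b = 0.0

def side(x):
    """Compute w^T x + b for a 2-D point x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

print(side([-4.0, 0.0]))  # 4.0  -> positive side, label +1
print(side([4.0, 4.0]))   # -4.0 -> negative side, label -1
```

Any other point below the line gives some other positive number, and any point above gives a negative one — only the sign matters for the grouping.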
The line says: below it, the category is +1; above it, -1. And notice that in this situation the intercept b was actually zero, because the line passed through the origin. Pretty simple so far — you have understood why we label one side +1 and the other -1. Now let us move on to the SVM. In an SVM the central hyperplane has the same equation I wrote before: w^T x + b = 0. That is my first equation. From this hyperplane I stretch out on both sides to find the nearest point of each class — say the nearest point on one side is a blue point. As soon as I find that nearest point, I draw the marginal plane through it. In this scenario that becomes my negative plane, and I can write its equation as w^T x + b = -1. Similarly, on the other side, I can write w^T x + b = +1. So I have two marginal planes on either side of the hyperplane. In SVM we have to compute these marginal planes, and depending on the problem, the hyperplane whose two marginal planes have the maximum distance between them is considered the best line for dividing the points. Understand: we are not just computing the central hyperplane — if we computed only that, it would be just like logistic regression — we are also computing the positive and the negative planes, and we need the maximum distance between them.

Now, I have shown you how these equations work with -1 and +1. Again, I could just as well have -100 and +100; the exact values are not the main point — I only want a positive side and a negative side. Because in a real-world yes/no problem, I cannot just say "below this line it is yes, above it is no" — I have to express it mathematically, and that is exactly what this setup does. So, three things so far: how we arrived at the equation of the hyperplane; how the sign of the computed value tells us which side a point is on; and remember that here the b value is not zero, because in general the hyperplane does not pass through the origin — in our earlier example it passed through the zero point, so b was 0, but here there is some offset.

Perfect. But I also told you that in SVM I will pick whichever hyperplane makes the marginal distance highest — so how do I compute that distance? Very simple. Let me write it clearly with a black marker so you can see it: let x1 be a point on one marginal plane and x2 a point on the other. I want to compute the distance between these two planes — that is, between the point on the negative plane and the point on the positive plane.

Let's do it. For the point on the negative plane I can write w^T x1 + b = -1, and for the point on the positive plane, w^T x2 + b = +1. Pretty simple. The distance between the planes comes from the difference x2 - x1, and remember that this distance is the same wherever we measure it between the two parallel planes. Subtracting the first equation from the second (you have done this in linear algebra), the b terms cancel and I get w^T (x2 - x1) = 2. Now I just have this w^T on the left, and I need to remove it. w is the vector of slope values, and I cannot simply cancel it, because there is a direction involved. So what I do is divide both sides by the norm of w, written ||w||. Dividing by ||w|| turns w into a unit vector, so the left-hand side becomes the projection of x2 - x1 onto the direction normal to the planes — exactly the distance between them — and the right-hand side becomes 2 / ||w||. So the margin is 2 / ||w||. This is my optimization function, and we need to maximize it — very simple.
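The subtraction step can be checked numerically. In this small sketch the weight vector is an arbitrary example value (not one from the video): I place one point on each marginal plane along the normal direction and confirm the Euclidean distance between them comes out as 2 / ||w||:

```python
import math

# Example weight vector (an assumed value for illustration): ||w|| = 5.
w = [3.0, 4.0]
b = 0.0
norm_w = math.sqrt(sum(wj * wj for wj in w))   # 5.0
unit = [wj / norm_w for wj in w]               # unit normal of the plane

# One point on each marginal plane, along the normal:
# w^T x1 + b = -1 (negative plane), w^T x2 + b = +1 (positive plane).
x1 = [(-1.0 / norm_w) * uj for uj in unit]
x2 = [(+1.0 / norm_w) * uj for uj in unit]

dist = math.dist(x1, x2)
print(dist)            # 0.4 (up to floating-point rounding)
print(2.0 / norm_w)    # 0.4 -> matches the derived margin 2 / ||w||
```

The same check works for any w: the two planes are always 2 / ||w|| apart, which is why shrinking ||w|| widens the margin.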
Let me write it down properly. I need to find the w and b — call them w* and b* — that maximize this value, 2 / ||w||. Pretty simple: this is my optimization function. But there is one more thing to consider. I told you that whatever positive or negative value I get will always be treated as +1 or -1. So I add the condition: y_i = +1 wherever w^T x_i + b >= +1, and y_i = -1 wherever w^T x_i + b <= -1. That is my constraint: I have to maximize 2 / ||w||, but always making sure that whenever w^T x + b is greater than or equal to +1 I consider the label +1, and whenever it is less than or equal to -1 I consider it -1. For the support vectors themselves the value comes out exactly +1 or -1; for a point farther from the plane I will definitely get a higher positive value (or a more negative one on the other side), and I still label it +1 (or -1). There is one more refinement to add, so let me clear the board — I hope you are following; if not, please go back and have another look.

Instead of writing the constraint as two separate conditions, I can write it as a single one: y_i (w^T x_i + b) >= 1. Why does this work? When y_i is +1, the value w^T x_i + b is also positive, and positive times positive is always greater than or equal to 1 for a correctly classified point. In the second case, when y_i is -1 — say I am looking at a point on the negative side — the value w^T x_i + b also comes out negative, and negative times negative is again greater than or equal to 1. So my claim is: whenever I multiply the label by the computed value, the product should be >= 1, and if it is not, that means a misclassification. Understand why: suppose the true label is -1 but the point actually sits on the positive side; then the computed value is positive, and negative times positive gives a negative product, which does not satisfy the condition. I have shown you both cases in the example: for every correctly classified point, label times value is at least 1. One more important thing to understand — this is my whole optimization function.
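The combined constraint is easy to check per point. This sketch reuses w = [-1, 0] and b = 0 from the earlier example to flag margin violations:

```python
# y * (w^T x + b) >= 1 holds for every correctly classified point that
# respects its margin; a smaller (or negative) product flags a violation.
w, b = [-1.0, 0.0], 0.0

def margin_ok(x, y):
    """True when y * (w^T x + b) >= 1."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return y * score >= 1.0

print(margin_ok([-4.0, 0.0], +1))  # True:  +1 * 4.0  >= 1
print(margin_ok([4.0, 4.0], -1))   # True:  -1 * -4.0 >= 1
print(margin_ok([4.0, 4.0], +1))   # False: the product is -4, a misclassification
```

Note how the one-sided product captures both cases at once — positive times positive and negative times negative.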
Now, one more scenario — and always remember this: in the real world we will not usually have points laid out this cleanly. There will be a lot of overlap, and if we insist on a perfectly clean separation like this, it basically means we are overfitting the data. So instead of overfitting, we should find a solution — a tuning technique controlled by a hyperparameter — because in a real-world scenario I will not find this proper linear separation where a single straight line divides the points. So here is the final form of the objective (let me clear the board; I hope you followed the previous part). I have to choose w* and b*, and instead of max of 2 / ||w||, I will now write min of ||w|| / 2. You may ask how I reversed it: understand the property that when you take the reciprocal of the quantity, a max becomes a min. And why do I prefer the min? Because, if you know linear regression, there we use gradient descent, and in gradient descent we also minimize a loss through derivatives — that is the reason I take the min form. Apart from this, I am going to include two more terms to optimize the model better: a value C, and a summation over i = 1 to n of a quantity I will represent as ζ_i (zeta). So the objective becomes: min over w, b of ||w|| / 2 + C Σ_{i=1}^{n} ζ_i.

Now understand what these two terms mean. The C value tells us how many errors my model is allowed to tolerate, and the summation is the total value of those errors. What are these errors? Suppose a blue point is present below the line, on the wrong side — you know it is misclassified. I may have two or three such points, and I may decide: even if there are five errors, my model is not going to change this line, this hyperplane that divides the points. It is okay — two or three errors will be there. That is what C indicates: how many errors we can accept. Suppose I allow five errors — a few blue points on the wrong side, each contributing an error if I compute its distance, and similarly maybe two red points on the other side, which are errors too. Even with those five errors I do not have to change my hyperplane. We are doing this to fix the overfitting problem: if I redrew the hyperplane every time a single error appeared, that would be overfitting, and it should not happen — I can tolerate some number of errors in my solution, because I need to create a generalized model. The other term, the value of the error, means that whatever distance I get with respect to each error, I am just
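One common way to write that error value is the slack ζ_i = max(0, 1 − y_i(w^T x_i + b)): zero for points that respect their margin, and the size of the violation otherwise. A sketch with made-up scores (the four score values below are illustrative, not from the video):

```python
def slack(score, y):
    """zeta_i = max(0, 1 - y_i * score), where score = w^T x_i + b."""
    return max(0.0, 1.0 - y * score)

# Illustrative (assumed) scores w^T x + b for four points, with labels.
scores = [4.0, -4.0, 0.5, -0.2]
labels = [+1, -1, +1, +1]

zetas = [slack(s, y) for s, y in zip(scores, labels)]
print(zetas)       # [0.0, 0.0, 0.5, 1.2]
print(sum(zetas))  # 1.7 -> the error total that gets multiplied by C
```

The first two points sit safely outside their margins (zero slack); the third is inside the margin, and the fourth is outright misclassified, so both contribute to the C Σ ζ_i term.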
going to sum them up. Suppose for one error I got 5, for another 6, for another 3, and for the last one the distance value came out as 4 — I add 4 + 3 + 6 + 5, across however many errors there are. And that whole objective is what SVM is all about. This C parameter is a regularization parameter, and we get its value by hyperparameter tuning. I know, guys, there is a lot of maths involved — every time I revise SVM I understand something new about it — and you may well have to watch this video again. I am not demotivating you; rather, try to understand these things properly, because only then will you be able to code much more efficiently. This kind of SVM, where we are just creating a straight line, is called a linear SVM — a hard-margin SVM, so called because we are strictly classifying all the points into two categories. So this is how the maths is derived. But there are still harder problems: situations where we simply cannot split the points like this because there is too much overlap. Let me give you one scenario: suppose the points of one class sit in the middle and the points of the other class surround them — I cannot divide these points with a straight line. Here I have to use the SVM kernel trick, which will be my next video, where we will see how to solve problems when the data is not linearly separable. But this is how the maths is derived for an SVM that classifies points into two categories. I hope you have understood it; if not, please do revise it again — you will understand it on a second or third attempt.

So I hope you have understood this video. Please do subscribe to the channel and share this explanation with all your friends — it will definitely be helpful for them. Many people have been able to get jobs by watching these videos, so please help them out; you need to help everyone and share your knowledge with everyone. So yes, that was all for this particular video. I hope you liked it. Please do subscribe if you have not already, and I will see you all in the next video. Have a great day. Thank you, one and all. Bye-bye.
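Tying the whole derivation from this video together, here is a toy sub-gradient-descent sketch of a linear soft-margin SVM, written with the commonly used squared form of the regularizer, ||w||²/2 + C Σ max(0, 1 − y_i(w^T x_i + b)). This is an illustrative trainer of my own, not the solver the video or a production library uses, and the data points are made up:

```python
def train_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Sub-gradient descent on ||w||^2/2 + C * sum(max(0, 1 - y_i*(w.x_i + b)))."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Hinge active: sub-gradient is w - C*y_i*x_i (and -C*y_i for b).
                w = [wj - lr * (wj - C * yi * xj) for wj, xj in zip(w, xi)]
                b += lr * C * yi
            else:
                # Margin satisfied: only the regularizer ||w||^2/2 contributes.
                w = [wj - lr * wj for wj in w]
    return w, b

# Made-up, well-separated toy data in the spirit of the example points.
X = [[-4.0, 0.0], [-3.0, -1.0], [4.0, 4.0], [3.0, 5.0]]
y = [+1, +1, -1, -1]

w, b = train_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]
print(preds)  # [1, 1, -1, -1] for this well-separated toy data
```

A larger C punishes every margin violation harder (closer to the hard-margin SVM in the video), while a smaller C tolerates more errors in exchange for a wider margin — exactly the generalization trade-off discussed above.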
Info
Channel: Krish Naik
Views: 100,391
Keywords: data science tutorial online free, python data science tutorial pdf, python data science tutorial point pdf, what is data science, data science tutorial tutorials point, data science course, support vector machine introduction, support vector machine pdf, support vector machine in data mining, support vector machine tutorial, support vector machine ppt, support vector machine geeks for geeks, support vector machine regression, support vector machine classifier
Id: Js3GLb1xPhc
Length: 23min 26sec (1406 seconds)
Published: Sat May 02 2020