Radial Basis Function Kernel : Data Science Concepts

Captions
Hey everyone, how's it going? We're going to continue our discussion of kernels by talking about what I think is the most powerful kernel of all for SVMs: the radial basis function kernel, which you might have heard called the RBF kernel. The reason it's so powerful, at a high level, is that it can handle really strange, irregular decision boundaries. The boundary doesn't have to be linear or quadratic or anything nice; it can be as weird as we might naturally find in the real world, and the RBF kernel is still able to solve the SVM problem and separate the two classes fairly well in many of those situations. Of course, the question is why it's able to do this, so let's look at the goal of this kernel.

Say you're dealing with naturally two-dimensional data with coordinates x1 and x2. We know that with kernels we're trying to project our data into a higher-dimensional space, and the RBF kernel takes this idea to its extreme. It says: I don't want to stop at the one-way interactions between my variables, which are just the variables themselves. I don't want to stop at the two-way interactions, where I consider one variable together with another, and I don't want to stop at three either. I want to consider an infinite number of interactions between my variables. Already that seems insane, and that's why this kernel was hard for me to grasp, and why it's hard for a lot of students: what does it even mean to consider the infinite interactions between variables? You consider the three-way interactions, the four-way, the five-way, and you just have to keep going.

Now look at the diagram we had in our intro kernels video. We start from the original data, which is x1 and x2, and we eventually want the transformed inner products: after we do whatever transformation we have in mind, we need all the inner products between the transformed vectors. We said in the previous video that if we go the route of actually transforming the data, holding the transformed data in memory, and then taking the inner products, that can be inefficient because it can require a lot of storage. Here it's much worse than inefficient; it's impossible. If you project your data into an infinite-dimensional space and then want to materialize those transformed vectors, you literally can't, because there isn't enough room on any computer to store infinite data. So we're forced down the other route we talked about, which tends to be more efficient for very high-dimensional projections like this. That route first takes the inner products between the original variables, which is easy because the original variables live in a two-dimensional space, and then, and this is the tricky part, uses some well-constructed kernel function to go from those original inner products to the transformed inner products without ever actually doing the transformation.

So how does the radial basis function formulate its kernel? I'll show you the formulation, and then we'll back-derive it, meaning we'll show that writing the kernel this way really does achieve the goal of an infinite-dimensional projection of your original data. Start from the definition:

K(xi, xj) = exp(-(1/2) ||xi - xj||^2),

where the norm is the L2 norm. Okay, it's not clear how that amounts to projecting anything into an infinite-dimensional space, so let's do a little algebra to get it into a form that makes more sense. First, ||xi - xj||^2 can equivalently be written as the transpose of the difference times the difference itself; that's just a fact about vectors. Distributing the terms of that product gives

||xi - xj||^2 = xiᵀxi - 2 xiᵀxj + xjᵀxj.

Next I'll split the single exponential into two factors, because I want two distinct pieces: the first collects the term with just the i's and the term with just the j's, and the second collects the cross term, where the -2 and the -1/2 cancel nicely to leave an exponent of 1:

K(xi, xj) = exp(-(1/2)(xiᵀxi + xjᵀxj)) * exp(xiᵀxj).

The reason I did this is that this function, which at first didn't seem to have anything to do with inner products of our original data, can now be expressed entirely in terms of those inner products: every x in this form appears only inside an inner product with some other vector from the data. So this really is a kernel function, and that matters because expressing it purely in terms of original inner products is the only way we'd be justified in jumping from the original inner products to the transformed ones. That's the first thing to note. Now I'll call the first factor, exp(-(1/2)(xiᵀxi + xjᵀxj)), a constant c, because we don't need to worry about it, and leave the other factor as it is.

Let's do a little more algebra. This next step isn't absolutely necessary, but I wanted to draw as many parallels to the original kernels video as possible, so I'm going to add and subtract 1 in the exponent, which hopefully you agree changes nothing. That gives c * exp(xiᵀxj + 1) * exp(-1), and absorbing the exp(-1) into the constant, which I'll call c', we get

K(xi, xj) = c' * exp(xiᵀxj + 1).

Here's where the concept of infinity comes in. Think back to your original algebra classes, when you first learned what it actually means to compute e to the x. Believe it or not, it may have been a long time since you saw this, even though you've been using the exponential function the whole time: e^x is defined as an infinite sum. Specifically,

e^x = sum over n from 0 to infinity of x^n / n!,

and in our case the thing in the exponent is xiᵀxj + 1, so each term of the sum is (1 + xiᵀxj)^n / n!.

Now the last thing, which ties this whole thing together: if you think back to our original kernels video, which is linked below and which I highly, highly recommend (actually, require) watching to understand this well, the numerator (1 + xiᵀxj)^n is exactly the definition of our polynomial kernel of degree n. So we can write

K(xi, xj) = c' * sum over n of Kpoly,n(xi, xj) / n!.

What does this mean? Going back to the high level, we just showed that this thing we call the RBF kernel, which has a clean closed form that is efficient to compute (it's just e to the power of something), can equivalently be expressed as an infinite sum of polynomial kernels, each of higher degree than the last. That means this innocent-looking closed form is actually considering all of the infinite interactions, degree one, two, three, four, five, and onward, between our original variables, and from there comes the power of the radial basis function kernel.

That's really it. We showed that the kernel can be rewritten this way, and therefore it genuinely gives you the power of an infinite-dimensional projection without ever performing that projection, which would be impossible. I hope this continues to blow your mind; even now that I understand it well, and I hope you understand it well too, it's still a mind-blowing concept that you can get the benefits of an infinite-dimensional projection without ever visiting that infinite space. That's just one of the cool things about kernels. I hope you liked this video; please like and subscribe for more, and I'll see you next time.
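The derivation in the transcript is easy to check numerically. Below is a small Python sketch (the code, the sample vectors, and the use of NumPy are my own additions, not from the video) that computes the RBF kernel from its definition, verifies the factored form c' * exp(xiᵀxj + 1), and checks that a truncated sum of polynomial kernels converges to the same value:

```python
import numpy as np
from math import factorial

# Two sample 2-D points; any vectors work (these values are arbitrary).
x_i = np.array([0.3, -0.2])
x_j = np.array([0.1, 0.4])

# Definition: K(x_i, x_j) = exp(-1/2 * ||x_i - x_j||^2), with the L2 norm.
rbf = np.exp(-0.5 * np.sum((x_i - x_j) ** 2))

# Factored form: c' * exp(x_i . x_j + 1), where c' collects the
# "i only" and "j only" terms together with the absorbed e^{-1}.
c_prime = np.exp(-0.5 * (x_i @ x_i + x_j @ x_j)) * np.exp(-1.0)
factored = c_prime * np.exp(x_i @ x_j + 1.0)

# Series form: c' * sum_n (1 + x_i . x_j)^n / n!, where each term
# (1 + x_i . x_j)^n is the degree-n polynomial kernel. Truncating the
# infinite sum at 20 terms is already accurate to machine precision here.
poly_sum = c_prime * sum(
    (1.0 + x_i @ x_j) ** n / factorial(n) for n in range(20)
)

print(rbf, factored, poly_sum)  # all three values agree
```

Truncating the series at degree N amounts to using only the polynomial kernels up to degree N; the RBF kernel effectively uses all of them at once, which is the "infinite interactions" claim made above.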
Info
Channel: ritvikmath
Views: 13,886
Id: Q0ExqOphnW0
Length: 7min 57sec (477 seconds)
Published: Wed Mar 03 2021