Derivative of a Matrix : Data Science Basics

Captions
Hello everybody. In this video we'll be talking about the derivative of a matrix, and as I say that, I have to admit that's not exactly the right terminology for what we're going to be doing; it's just that a lot of students think of it as taking the derivative of a matrix. That'll become clearer in just a second.

Before we get into the derivative of a matrix, let's go into some familiar territory. What is the derivative d/dx of the function kx, where k is a constant? A lot of you would probably find this pretty trivial; you would just say, "Oh, it's just k." Right. OK, so what's the derivative d/dx of kx^2? You'd say it's still pretty trivial: it's just 2kx. These problems aren't very hard because we're just taking the derivative of a scalar function of one variable.

Well, let's jump over here and look at this matrix A, which is just the 2 by 2 matrix with rows (1, 2) and (3, 4). Now let's think about the operation A times x, where x is of course a 2 by 1 vector for this all to work out. So what is the derivative of this? That doesn't seem as clear as the easier problems we did on that side of the board, right? But at the same time, there should be some kind of definition for this, because A times x is a function after all. We're taking the vector x and running it through this linear transformation A (remember, a matrix is just a linear transformation), and we're getting some output back. Since it's a function, it should also have a derivative, right? Yes, it should; it's just a matter of defining it properly.

Before we worry about the derivative part, let's look at what A times x actually means. We're going to write it all out in longhand notation because it's going to help us out. A times x is simply the matrix with rows (1, 2) and (3, 4) times x, and x itself has two entries, x1 and x2. Now if we go ahead and do this matrix-vector multiplication, we get x1 + 2x2 on the top, and on the bottom we get 3x1 + 4x2. So that is what A
times x is equal to.

Now, for terminology, let's give names to these two functions we've created. Let's call this one f1 of x1 and x2, and let's call this one f2 of x1 and x2. That makes sense: both are functions of the two variables x1 and x2, and they are two separate functions. So f1 is x1 + 2x2 and f2 is 3x1 + 4x2.

Some of you are thinking, "Why did he just make this more complicated than it had to be?" Well, now we're going to define what this means: d/dx of Ax. How do we actually define that? Here is how (let me get rid of this stuff right here). We're going to define it as a new matrix, built from derivatives that we know how to take, similar to the ones we did on that side of the board just before. I'm going to enumerate them and then describe them. In the top row we have df1/dx1 and df1/dx2; in the bottom row we have df2/dx1 and, lastly, df2/dx2.

OK, that's a lot of derivatives, but we know how to take each of them. For example, look at the first one. It just asks: what's the derivative of the function f1, which is this guy, with respect to x1? That's really easy; it's just 1, because the 2x2 term drops out (it doesn't depend on x1) and the coefficient of x1 is just 1. So this is equal to 1. Next, what's the derivative of that same function with respect to x2? In this case we care about the 2, so this is equal to 2. The next question is: what's the derivative of f2, the second function down here, with respect to x1? That's 3. And this last one will be 4, because it's the derivative of the second function with respect to x2.

So we get the derivative of this linear transformation A times x. Here's where I want to say that we're not really taking the derivative of a matrix; we're taking the derivative of this linear
transformation A times x. Taking the derivative of a matrix by itself doesn't really make sense, because the matrix on its own is just a set of constants; that's kind of like taking the derivative of a constant, which would be zero. So we're taking the derivative of this linear transformation right here, which happens to involve the matrix A. After we've done this derivative, we have found that the answer is the matrix with rows (1, 2) and (3, 4). Where did you see that matrix before? That was the original matrix A, of course. So after all of this work, we have found that d/dx of this linear transformation A times x is equal to A.

The reason I think this is so awesome is that it has a very clear analog to the problem we did in the beginning, where d/dx of kx is equal to k. That k was a scalar, a number like 1 or 2, and we found that the derivative was just that number itself. In the same way, this matrix A is not a scalar, but it's a collection of scalars in a little box, and we find that when we take the derivative of A times x, we get that collection of scalars in the little box back, which is awesome. It's just kind of elegant that way.

Now, before we close this video, let's look at one more matrix-related derivative, which is a little bit tougher, but we're going to look at it because it shows up a lot in the data science videos we'll be looking at in a bit. Here's a new transformation: x transpose A x. So here's our new function that we want to take the derivative of. Before we take the derivative, let's try to understand this function. In this case we're going to leave A in a more general format; we're not going to give it any concrete values, so we can do this in a more mathematical way. x will be x1, x2, so x transpose just takes the tall version of the vector and squishes it into its flat version right here. Let me write this a little bit bigger: we have x1 x2; A will be a11, a12, a21, a22, the four elements of A; and of course the x is going to be
x1 and x2. OK, so I took that transformation and wrote it out in its long format. Now let's work it out. Let's do these two first, A times x, since we just did something like that. We're going to get a11 x1 + a12 x2 on top, and a21 x1 + a22 x2 on the bottom.

The other thing I want to do here (I'm realizing this now) is make sure that this matrix is symmetric. It doesn't have to be symmetric in general; it's just that in the application we'll be looking at next, specifically principal component analysis, it will be symmetric, so that's going to help us. So instead of having a12 and a21, let's just call both of these a. Now the matrix is symmetric, which means this guy and this guy are equal.

OK, so now we just have to apply this to that, which is as simple as multiplying x1 by the top and then adding x2 times the bottom. Let's do that here. x1 times the top gives a11 x1^2 plus a x1 x2. Then the bottom times x2 gives a x1 x2 plus a22 x2^2. By the way, if the pace is too fast for you, please take a minute to pause and convince yourself everything on this board is accurate. Once you've convinced yourself of that, let's move on. We have an a x1 x2 term here and also here, so we can combine them into 2a x1 x2. The reason we were able to do that is the symmetry of the matrix.

OK, cool. So now we have this guy, a11 x1^2 + 2a x1 x2 + a22 x2^2, and we'll call it f, some function of x1 and x2. Let me erase this stuff here. Now, what does it mean for us to take the derivative d/dx of this function right here? In this case we only have one function, so it's going to be as simple as df/dx1 and then df/dx2. We only have one function, so we only care about the derivative of this function with respect to x1 and also the derivative of that function with respect to x2.
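The first result from the video, d/dx of Ax equals A, can be sanity-checked numerically. The sketch below is not from the video: it approximates each pairwise derivative df_i/dx_j with central differences and compares the resulting matrix to A. The helper names `mat_vec` and `numerical_jacobian`, the step size `h`, and the test point `x0` are all illustrative assumptions.

```python
def mat_vec(A, x):
    """Multiply matrix A (a list of rows) by vector x (a list of numbers)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def numerical_jacobian(f, x, h=1e-6):
    """Approximate J[i][j] = d f_i / d x_j using central differences.

    Rows index the output functions, columns index the input variables,
    matching the "functions by variables" layout described in the video.
    """
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h   # nudge only variable j up
        x_minus[j] -= h  # nudge only variable j down
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# The concrete matrix from the video.
A = [[1.0, 2.0], [3.0, 4.0]]
x0 = [0.5, -1.5]  # arbitrary test point (assumption; any point should work)

J = numerical_jacobian(lambda x: mat_vec(A, x), x0)
for row in J:
    print([round(v, 6) for v in row])
```

Because Ax is linear in x, the approximated matrix should agree with A up to floating-point rounding, whichever test point is chosen.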
So let me pause here before we actually work out the answer. In general, you're going to write out the complete form of whatever function you're taking the derivative of. Here's a secret for you: once you've done enough of these, you don't have to write out the complete form; you can kind of just see the patterns, as we'll see in a second. But for now, you'd write out the complete form and see how many different functions you have. In this case we have just one; in the previous case we had an f1 and an f2, both of which were functions of the two variables. Then you would take the number of functions by the number of variables and create a new matrix, which is basically every pairwise derivative. So if you had, let's say, three different functions and four different variables, you would have a 3 by 4 matrix, and one element of that matrix would be df3/dx2, and you would have every other combination. That's basically what taking the derivative of some operation that involves a matrix means.

Back to our scheduled program. What is df/dx1? That is going to be 2 a11 x1 + 2a x2. What is df/dx2? That's going to be 2a x1 + 2 a22 x2. OK, cool. I think that's falling off the page a little bit, so let me just rewrite it. So that's the derivative there. It looks a little bit ugly; can we clean it up? We can definitely pull the two 2s out front, so that's a little bit cleaner: 2 times the vector with entries a11 x1 + a x2 and a x1 + a22 x2. What else can we do? We can notice that this is actually just a multiplication: if I leave this a11 here and this a22 here, leave this a and this a here, and pull out the x1 and the x2 (does that fit on the page? yes, it does), then I can take out these x's and these plus signs, and that's the same transformation right there. Go ahead and redo the multiplication if you want to see that the previous step was equal to
the current step, but it is true. And what is the matrix with entries a11, a, a, a22? Well, that's just the original matrix A. I know we've erased it, but that's the original matrix A. So all in all, we have 2Ax. Let me write that: it's equal to 2Ax. Let me get rid of everything else on here except the result itself.

So we found that the derivative with respect to x of this transformation x transpose A x is equal to 2Ax, for symmetric A. Why is this awesome? Not only because we're going to use this in our principal component analysis and future videos; it's awesome because it's an analog to the first thing we looked at in this video. We saw that the derivative with respect to x of kx^2, this quadratic, is equal to 2kx. In the same way, x transpose A x is sort of like the quadratic of matrix operations, because we have an x here and an x here. I know it doesn't look exactly the same, but it's kind of the analog, and its derivative is very similar: it's 2 times A times x, where before it was 2 times k times x. So again we see some really elegant analogs between the derivatives we know and love and these new matrix derivatives. We're going to be using that pretty heavily in the principal component analysis video and future videos. OK, so until next time.
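The second result, that the derivative of x transpose A x is 2Ax when A is symmetric, can be checked the same way. The sketch below is not from the video: the helper names, the symmetric matrix `A_sym`, and the test point `x0` are illustrative assumptions. It also tries the non-symmetric matrix from the first half of the video, a case the video does not work through, where the standard general result is (A + A transpose) x, which reduces to 2Ax only when A is symmetric.

```python
def mat_vec(A, x):
    """Multiply matrix A (a list of rows) by vector x (a list of numbers)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def quadratic_form(A, x):
    """Compute the scalar x^T A x."""
    return sum(x_i * y_i for x_i, y_i in zip(x, mat_vec(A, x)))


def numerical_gradient(f, x, h=1e-6):
    """Approximate [df/dx_1, ..., df/dx_n] using central differences."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


x0 = [0.5, -1.5]  # arbitrary test point (assumption)

# Symmetric case, as in the video: off-diagonal entries are equal.
A_sym = [[2.0, -1.0], [-1.0, 3.0]]  # illustrative values, not from the video
grad_sym = numerical_gradient(lambda x: quadratic_form(A_sym, x), x0)
two_Ax_sym = [2 * v for v in mat_vec(A_sym, x0)]

# Non-symmetric case (beyond the video): compare with (A + A^T) x instead.
A_gen = [[1.0, 2.0], [3.0, 4.0]]
A_plus_At = [[A_gen[i][j] + A_gen[j][i] for j in range(2)] for i in range(2)]
grad_gen = numerical_gradient(lambda x: quadratic_form(A_gen, x), x0)
general_formula = mat_vec(A_plus_At, x0)
two_Ax_gen = [2 * v for v in mat_vec(A_gen, x0)]

print("symmetric:    ", grad_sym, "vs 2Ax:", two_Ax_sym)
print("non-symmetric:", grad_gen, "vs (A + A^T)x:", general_formula)
```

For the symmetric matrix, the numerical gradient should match 2Ax up to rounding. For the non-symmetric one it should match (A + A transpose) x and differ from 2Ax, which is why the video's symmetry assumption matters for the final formula.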
Info
Channel: ritvikmath
Views: 260,432
Rating: 4.86375 out of 5
Id: e73033jZTCI
Length: 13min 42sec (822 seconds)
Published: Mon Sep 09 2019