COS 302: Eigenvalues and Eigenvectors

Video Statistics and Information

Captions
In this video I want to talk about a subject that is very important but also very intimidating when people think about linear algebra, and that is the idea of eigenvectors and eigenvalues. When we talk about "eigenstuff," what we're generally talking about are the core patterns represented by a matrix. So we'll be talking about some square matrix, let's say A, and let's take it to be n by n. An eigenvalue and eigenvector of A are defined by the following relationship: A multiplied by some vector x is equal to a scalar lambda multiplied by that same vector x, that is, A x = λ x. Here x is the eigenvector and λ is a scalar which we call the eigenvalue. Eigenvectors are the kinds of vectors that, when you hit them with the matrix, all that happens is that they get multiplied by a scalar: even though we hit x with this matrix A, all that happened is that we multiplied it by a λ. That λ could be negative, it could be big, it could be small, but all that's happening is that we're scaling x in some way. The reason this is interesting is that x somehow captures fundamental information about the matrix A. There might be many such eigenvectors for any given matrix A. In fact, you can see right away that the relationship doesn't care about scale: if I were to multiply A by c x instead, that factor of c just winds up on the right-hand side as well, so a scalar multiple of any eigenvector x is also an eigenvector.

It turns out that eigenvectors and eigenvalues come up a lot when we're doing different kinds of data analysis. One of the most important algorithms in all of machine learning and statistics is principal component analysis, and PCA is entirely about identifying the eigenvectors associated with the covariance matrix of the data you're trying to analyze. To get a little bit of intuition for how that works: the idea of PCA is that we have some cloud of data. Let's imagine that it's high dimensional, but I'm just going to draw it in two dimensions for now. Imagine that we wanted to analyze these data and somehow represent them in a lower-dimensional way; since it's two dimensions, that means representing them in one dimension. We could project these data onto any line we wanted, and that would give us a one-dimensional representation of these points. However, when we look at the cloud, we can see that there's one direction that's more elongated than the others, and we might expect that this direction of large variance is the one that's most representative of the interesting statistical structure in the data. That turns out to be a good intuition for a lot of the kinds of analysis we might want to do, especially as we get into higher-dimensional spaces. But how do we find that direction of large variance in a high-dimensional space? The answer is that we look at the eigenvectors and eigenvalues of the covariance matrix. We'll talk about covariance matrices in a couple of weeks when we talk about probability, but for now, think of modeling this elongated set of data with some ellipses; you can imagine that they're a representation of the density from which these data came. These ellipses are short in some directions and long in others, and the ellipses themselves are represented by a positive definite matrix. It turns out that if we look at the eigenvectors and eigenvalues of the positive definite matrix representing an ellipse like this, we discover that they represent the long and short directions of the ellipse. So what we can do is look at the direction given by the eigenvector with the largest eigenvalue, and that will be the direction of highest variance in the data. This is a very important tool, and like I said, we'll come back to it; in fact, you'll study it a little bit in the next homework.
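To make that concrete, here is a minimal numpy sketch of the idea, using an invented two-dimensional data cloud (this example is mine, not from the video): form the covariance matrix of the centered data and take the eigenvector with the largest eigenvalue as the direction of highest variance.

import numpy as np

# Minimal PCA-style sketch: the top eigenvector of the covariance matrix
# points along the elongated direction of the data cloud.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [0.0, 0.5]])   # stretched cloud (assumed data)

Xc = X - X.mean(axis=0)                 # center the data
C = Xc.T @ Xc / (len(Xc) - 1)           # sample covariance matrix

# Covariance matrices are symmetric, so eigh applies; eigenvalues come back ascending.
eigvals, eigvecs = np.linalg.eigh(C)
top_direction = eigvecs[:, -1]          # eigenvector with the largest eigenvalue

print("largest eigenvalue:", eigvals[-1])
print("direction of highest variance:", top_direction)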
One of the other really interesting ways that we use eigenvectors and eigenvalues is to study things like social networks. Let me give you a little bit of an idea of how that works. Imagine that we have some graph. The vertices could represent web pages, or Twitter or Instagram users, but let's give them labels A, B, C, and D, and let's imagine that there are directed edges between these vertices. These edges could represent links from one page to another, or they could represent the idea that A follows B on Instagram, whatever you want. Let's imagine that A follows B, B follows A back, A follows C, D also follows C, C follows B, and B and D each follow each other.

Now we might want to reason about how important these different vertices are. You could imagine trying to identify who the influencers are; maybe the influencers are the people who are followed by a lot of people, particularly if those followers are themselves influencers, and so on. In a web-page context, you could imagine that a web page is very important if it is linked to by other important web pages. So we might want to reason about the structure of the graph in order to decide which vertices are the most important ones, because maybe those are the ones we want to return in search results, say. It turns out that we can answer this kind of question by looking at the big eigenvector, the one with the largest eigenvalue, associated with the adjacency matrix of such a graph.

Imagine that you're taking a random walk on this graph. If we're sitting at vertex D, and this is a web page with two outgoing links, then you choose one of those links at random; maybe you choose C, and then you are on web page C and you choose one of its outgoing links at random (in this case there's only one), so you go to vertex B, and so on. You take a random walk over the vertices of the graph by clicking randomly on links on whatever web page you land on. If you do that, you'll be wandering around the graph, and if we imagine it's densely connected enough that you can reach any vertex by this kind of random walk, then after a very, very long time you will converge to some distribution over the vertices. That is to say, some fraction of the time you'll be sitting at vertex B, some other fraction of the time you'll be sitting at vertex C, and so on. It turns out that that fraction of time is exactly the notion of importance you might want to reason about: if a lot of things are linking to some page, and a lot of things are linking to those pages in turn, then they're propagating importance up to that web page. If you're on Twitter and you're followed by Taylor Swift and LeBron James, then maybe that says something interesting about your importance in the graph.

The specific way we figure this out is a little more complicated, but the gist is that we write down the adjacency matrix of this kind of graph and then normalize it to reflect the idea that you're taking random steps along the edges of the graph. It then turns out that if we think about the resulting Markov chain, that is, the random walk you're making around the graph, and we look at the transition matrix associated with that Markov chain, the big eigenvector associated with it is this measure of the relative importance of the vertices. That's actually something you can compute even at scale by simulating from this kind of random walk, and you can think of that simulation as what we would call a power method for computing the big eigenvector of a matrix like that. Now, this might sound very abstract, the kind of thing that nobody would actually do in practice, but what I've described here is one of the ideas that led to the Google search engine, an idea called PageRank. I've left out some important details, but the gist of it, the idea that importance propagates along links, is one of the reasons the Google search engine in the beginning had better search results than a lot of its competitors, and procedures like the one I described really do inform the way that social media companies like Twitter think about the users on their platform.
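Here is a rough sketch of that random-walk computation on the example graph above (edges A follows B, B follows A, A follows C, D follows C, C follows B, B and D follow each other). The column-stochastic normalization and the power-iteration loop are assumptions on my part, since the video deliberately leaves out the details.

import numpy as np

labels = ["A", "B", "C", "D"]
# adjacency[i, j] = 1 when vertex j links to (follows) vertex i
adjacency = np.array([
    [0, 1, 0, 0],   # A is linked to by B
    [1, 0, 1, 1],   # B is linked to by A, C, D
    [1, 0, 0, 1],   # C is linked to by A, D
    [0, 1, 0, 0],   # D is linked to by B
], dtype=float)

# Normalize each column so it describes a uniformly random step out of that vertex.
T = adjacency / adjacency.sum(axis=0, keepdims=True)

# Power method: repeatedly apply the transition matrix to a distribution over vertices.
p = np.full(4, 0.25)
for _ in range(1000):
    p = T @ p

# Long-run fraction of time the walk spends at each vertex (its "importance").
print(dict(zip(labels, np.round(p, 3))))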
So again, the idea of an eigenvector is that it's a vector x associated with a matrix A such that, when we hit it with A, all that happens is that x gets scaled by some λ. This is a very useful thing from the point of view of pattern discovery, as I was just describing, but it also gives us a way to think about bases in which it's useful to do different kinds of computation, in particular bases in which A is diagonal. Diagonal matrices are really nice because they're really simple: lots of zeros, you can invert them easily, and they're quite straightforward to reason about. So when we can find a diagonal basis, that's usually a good thing.

What's the relationship between a diagonal basis and things like eigenvectors? As we said a second ago, let's imagine that A is a square matrix, n by n. Let's imagine further that we've found n linearly independent eigenvectors, and let's label them p1, p2, p3, and so on. Now take all of those and make them the columns of a matrix, and call that matrix capital P. What happens if we multiply our matrix A by P? A is the matrix we're focusing on, and the columns of P are the linearly independent eigenvectors we've found. When we do this, we get a matrix whose columns are scaled versions of each of the p's, because when we hit each column with A, all that happens is that it gets multiplied by the λ associated with that eigenvector. If we give them the same indices as before, we now have λ1 p1, λ2 p2, and so on. We can think of this as multiplying P by a diagonal matrix whose diagonal is composed of these λ's. What I mean is: imagine we have a matrix, call it capital Λ, with λ1, λ2, and so on along its diagonal and zeros off the diagonal. Then we can write the product as P multiplied by that big Λ, and everything combines into A P = P Λ. It's worth taking a minute, if you haven't thought this through before, to convince yourself that multiplying by a diagonal matrix like this scales each of the columns.

Now, in this situation, we can multiply on the right by P⁻¹, and we know P⁻¹ exists because we have n linearly independent eigenvectors, so we know that the matrix P is invertible. Doing that gives A P P⁻¹ = P Λ P⁻¹, and of course P P⁻¹ becomes the identity, so we wind up with A = P Λ P⁻¹. We say that P diagonalizes A: it gives us a basis in which A is diagonal. So the idea of an eigenvector and the idea of diagonalization are very closely related.
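Here is a quick numerical check of that factorization, sketched with numpy, using the same 2-by-2 matrix that appears later in the video as the characteristic-polynomial example.

import numpy as np

# Sketch: verify A P = P Λ and A = P Λ P⁻¹ for a matrix with
# linearly independent eigenvectors.
A = np.array([[3.0, 1.0],
              [2.0, 2.0]])

eigvals, P = np.linalg.eig(A)    # columns of P are eigenvectors of A
Lam = np.diag(eigvals)           # capital Λ: eigenvalues on the diagonal

print(np.allclose(A @ P, P @ Lam))                    # True
print(np.allclose(A, P @ Lam @ np.linalg.inv(P)))     # True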
Being able to diagonalize a matrix makes life easier in lots of different ways. If we have some A that we care about and we find a P such that P⁻¹ A P = Λ is diagonal, then, harking back to the last video where we talked about determinant and trace, we can already see how life gets easier. For example, consider the trace of A. We can now write it as the trace of P Λ P⁻¹, and we can use the cyclic property of the trace to move one of the P's to the other side: trace(P Λ P⁻¹) = trace(P⁻¹ P Λ). Of course P⁻¹ P becomes the identity, and we just wind up with the trace of Λ. That's particularly easy, because Λ is diagonal, so the only entries that matter are the ones on the diagonal, and in fact they're the eigenvalues we found a second ago. So the trace of A is just the sum of the eigenvalues.

We can do exactly the same kind of thing with the determinant: det A = det(P Λ P⁻¹). The determinant of a product is the product of the determinants, so this is det P times det Λ times det P⁻¹, and det P⁻¹ is 1 over det P. Writing that out explicitly, the det P factors cancel and we're left with the determinant of Λ. The determinant of a diagonal matrix is the product of the entries along the diagonal, so det A is the product of the eigenvalues. You can see that there's a kind of sum-versus-product relationship between trace and determinant in terms of the eigenvalues.

There's one more way that diagonalization really helps us, and that's in thinking about matrix powers. To give you an idea, imagine that we wanted to square A. That's just multiplying A by itself, of course. Writing out the diagonalization, we have A² = P Λ P⁻¹ P Λ P⁻¹. The P⁻¹ P in the middle turns into the identity, so we get P Λ Λ P⁻¹, or P Λ² P⁻¹. This is particularly easy because, recall, Λ is the diagonal matrix composed of the eigenvalues of A, so all we did was square the eigenvalues of A.
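The same example matrix can be used to check the trace, determinant, and matrix-power relationships numerically; this is just an illustrative sketch, not something from the video.

import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 2.0]])
eigvals, P = np.linalg.eig(A)

print(np.isclose(np.trace(A), eigvals.sum()))         # trace = sum of eigenvalues
print(np.isclose(np.linalg.det(A), eigvals.prod()))   # det = product of eigenvalues

# Squaring A only squares the eigenvalues in the diagonalized form.
A_squared = P @ np.diag(eigvals**2) @ np.linalg.inv(P)
print(np.allclose(A_squared, A @ A))                  # True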
One of the key ideas that you want to be aware of when you're thinking about eigenstuff is the characteristic polynomial. Our starting point is the main property associated with an eigenvector and eigenvalue: if I have a matrix A and I hit an eigenvector x with it, all that happens is that I scale x by some amount λ, so A x = λ x. One thing I can do is throw an identity matrix in here, because λ is a scalar but A is a matrix, so let's write this slightly differently as A x = λ I x. Now move everything to one side to get the equation A x − λ I x = 0, and factor out the x to wind up with (A − λ I) x = 0.

The idea of the characteristic polynomial is to look for what we would call non-trivial solutions to this equation, that is, solutions in which x is not equal to zero. When would that happen? It would happen when the matrix A − λ I has a null space, or kernel. Just to remind you, the null space or kernel of a linear transformation is the set of vectors that go to zero after applying that transformation. So for this equation to hold with a nonzero x, there must be an interesting null space, and if there's a null space, that means the matrix is not invertible: if there are nonzero vectors that become zero when you hit them with the linear transformation, there's no way to get back, because you can't recover the information that was destroyed by applying the transformation. If it's not invertible, we say that it's singular, and a singular matrix is one whose determinant is zero. So rather than thinking about the x's, what we instead look for are values of λ for which det(A − λ I) = 0, that is, values of λ that make A − λ I singular. It turns out that when you write this out, it gives you a polynomial in λ, and if you imagine factoring that polynomial, it will have some set of zeros, and those zeros are the eigenvalues of A. You can immediately see that, because this produces essentially an arbitrary polynomial, it's possible that it won't have any real zeros, so it's perfectly possible for a matrix to have no real eigenvalues. Similarly, the eigenvalues may not be distinct, because the polynomial can have repeated zeros.

Just to give a little flavor of how this works, imagine a simple case where A is the 2-by-2 matrix with rows (3, 1) and (2, 2). Then A − λ I is the matrix with rows (3 − λ, 1) and (2, 2 − λ), and we want to find the places where its determinant equals zero. Writing out the determinant, we get (3 − λ)(2 − λ) − 2 = 0. Expanding gives 6 − 5λ + λ² − 2 = 0, or put another way, λ² − 5λ + 4 = 0, which factors as (λ − 4)(λ − 1) = 0. The zeros of that are 4 and 1, so those are the eigenvalues of A.
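As a sanity check on that worked example, the roots of λ² − 5λ + 4 agree with the eigenvalues numpy computes; this little sketch is mine, not part of the video.

import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 2.0]])

print(np.roots([1.0, -5.0, 4.0]))   # roots of λ² − 5λ + 4: 4 and 1
print(np.linalg.eigvals(A))         # eigenvalues of A: 4 and 1 (order may vary)
print(np.poly(A))                   # characteristic polynomial coefficients: [1, -5, 4]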
The characteristic polynomial also helps us understand a little more clearly how we might wind up in a situation where we don't have any real eigenvalues. Here's one situation where that's true. Imagine we were talking about rotation matrices, like you're doing in the homework right now. We might have a rotation matrix R in two dimensions with rows (cos θ, −sin θ) and (sin θ, cos θ). Let's think about what the eigenvalues would be for this matrix. As before, we want det(R − λ I) = 0, so we want the determinant of the matrix with rows (cos θ − λ, −sin θ) and (sin θ, cos θ − λ) to equal zero. Writing out the polynomial, we get (cos θ − λ)² + sin² θ = 0. You can see right away that we're already in trouble for finding interesting eigenvalues for arbitrary θ, because this is one quantity squared plus another quantity squared: both terms are non-negative, so it's going to be difficult for the sum to come out to zero. In fact, unless θ is 0 or π, there's no chance of a real eigenvalue. If θ is 0 or π, then cos θ is 1 or −1, and for the expression to be zero, λ has to be 1 or −1. So there are only two values of θ for which this polynomial has real solutions, and in those cases you get a repeated eigenvalue of 1 or −1.

Another useful thing to know is that symmetric matrices are orthogonally diagonalizable. What that means is that the eigenvectors you get out are orthogonal to each other, and remember that eigenvectors are only unique up to scale, so we can choose their lengths, which means we get an orthonormal set of eigenvectors whenever we diagonalize a symmetric matrix. To see why that's true, imagine we have two eigenvectors, each with a distinct eigenvalue: some x such that A x = λ x, and some y such that A y = γ y, where λ and γ are distinct. Hit the first equation on the left with yᵀ to get yᵀ A x = λ yᵀ x. Both sides are scalars, and we can transpose scalars with no problem, so transposing gives xᵀ Aᵀ y = λ xᵀ y. Do the same thing with the other equation, hitting it with xᵀ on the left: xᵀ A y = γ xᵀ y. Now note that because A = Aᵀ, since A is symmetric, these quantities all have to be equal, so we need λ xᵀ y = γ xᵀ y. But we already said that λ and γ are distinct eigenvalues, so they're not the same value, and the only way this can hold is if xᵀ y = 0. So distinct eigenvalues mean that the associated eigenvectors are orthogonal, and as I said before, since we can scale eigenvectors however we want, we can make them unit length, and then we have an orthonormal collection of eigenvectors.
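Here is a short numpy sketch of both facts: a rotation matrix with a generic angle has no real eigenvalues, and a symmetric matrix has an orthonormal set of eigenvectors. The angle 0.7 and the symmetric matrix S are arbitrary choices of mine, not examples from the video.

import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.linalg.eigvals(R))    # complex pair cos(θ) ± i·sin(θ), no real eigenvalues

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric example
eigvals, Q = np.linalg.eigh(S)           # eigh is designed for symmetric matrices

print(np.allclose(Q.T @ Q, np.eye(3)))             # eigenvectors are orthonormal
print(np.allclose(S, Q @ np.diag(eigvals) @ Q.T))  # S = Q Λ Qᵀ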
Info
Channel: Intelligent Systems Lab
Views: 302
Rating: 5 out of 5
Id: OMe5KuoL_hA
Length: 26min 10sec (1570 seconds)
Published: Tue Feb 23 2021