Singular Value Decomposition (SVD): Mathematical Overview

Captions
Welcome back. We're talking about the singular value decomposition (SVD), which is one of the most useful matrix decompositions in numerical linear algebra. Right now I'm going to walk you through how it's computed and how it's interpreted in terms of a data matrix X.

We're going to start with the data matrix X, which I'm going to define as a collection of column vectors x_1, x_2, and so on up to x_m. Let me give you a couple of examples of where this matrix X could come from. For example, I could take a bunch of pictures of people's faces. I could take person 1 and reshape that face into a tall, skinny column vector x_1; if it's a megapixel image, that's a million-by-one column vector. I could do the same thing for person 2, and so on up to x_m. So if I have m people, I get m column vectors, and each x_k is in R^n, where n is the dimension of my measurements. In the case of a megapixel image, n would be a million, and I might have m of these reshaped x vectors, where m might be, say, a thousand people's faces: a thousand columns, each column with a million pixels, n equal to a million.

Another example I think about a lot is when these vectors x come from some physical system, some real-world system in science or engineering. For example, maybe I have fluid flow past a circle and I get the nice Kármán vortex street that we're used to seeing, and I have this flow field as it evolves in time. Again, I can take the first snapshot and reshape it into a tall, skinny vector x_1; the second snapshot in time becomes x_2, and so on up to x_m. In the case of a physical system, the columns have the additional interpretation that they are the state of the system as it evolves in time. So there are lots of ways you can build a data matrix: from a library of human faces, from a flow field that's evolving in time, or from some other physical process that you're measuring.

What the singular value decomposition is going to allow us to do is take this matrix X and decompose it, or represent it, as the product of three other matrices:

X = U Σ V^T

I'm going to tell you all about these matrices over the next few videos: what they mean, how to compute them, and how they're useful for approximating the matrix X. The things I want to tell you right now are that the U matrix and the V matrix are what are called unitary matrices, or orthogonal matrices, and Σ is a diagonal matrix. Writing this out in matrix form, U has columns u_1, u_2, all the way up to u_n, so it's an n-by-n square matrix. The Σ matrix is diagonal, with entries σ_1, σ_2, and so on; because there are only m columns of X, there will only be m nonzero singular values, and everything below them will be 0 (I'll explain this more in a bit). The last matrix is V, with columns v_1, v_2, up to v_m, transposed. So we're taking our data matrix X and representing it as the product U Σ V^T, where each of these factors has its own intuitive and physical interpretation.
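To make the shapes concrete, here is a minimal NumPy sketch (my own illustration, not code from the video) that stacks m snapshot vectors into a data matrix X and computes the full SVD just described:

```python
import numpy as np

# Build a data matrix X from m "snapshot" column vectors, each in R^n.
# (Small random stand-ins here; in the video these would be reshaped
# megapixel faces or flow-field snapshots.)
n, m = 100, 10
snapshots = [np.random.randn(n) for _ in range(m)]
X = np.column_stack(snapshots)           # X is n-by-m

U, s, Vt = np.linalg.svd(X)              # full SVD by default
print(U.shape, s.shape, Vt.shape)        # (100, 100) (10,) (10, 10)

# NumPy returns only the min(n, m) = m singular values; to recover
# X = U Sigma V^T we rebuild the n-by-m Sigma with zeros below the diagonal.
Sigma = np.zeros((n, m))
np.fill_diagonal(Sigma, s)
print(np.allclose(X, U @ Sigma @ Vt))    # True
```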
So what do these columns of U and V mean, and what do the entries of Σ mean? A few things are important to note. The columns of U have the same shape as a column of X: if x is a million-by-one vector, then the columns of U are million-by-one vectors. The way I think about these, in the face example, is that they are my "eigenfaces." They are hierarchically arranged, so u_1 is somehow more important than u_2, and so on, in terms of their ability to describe the variance in the columns of X. In the case of faces these are eigenfaces; in the case of flow fields they are eigenflows of my original data matrix X.

A couple of things I need to point out. U and V are unitary, and essentially what that means is that U^T U equals the identity, and so does U U^T. The columns of U are orthonormal: they are all orthogonal to each other and have unit length, and they provide a complete basis for R^n, the n-dimensional vector space in which the columns of my data live. So U is orthogonal, and its transpose is its inverse. The same is true for V: V V^T = V^T V = I. U is an n-by-n matrix and V is an m-by-m matrix.

The other important thing to note is that Σ is diagonal, which I've already described, and it's non-negative and hierarchically ordered, in decreasing magnitude: σ_1 ≥ σ_2 ≥ σ_3 ≥ ... ≥ σ_m ≥ 0. They're all non-negative, and some of them could be 0, but σ_1 is always greater than or equal to σ_2, which is greater than or equal to σ_3, and so on. What this means is that the first column of U and the first column of V, corresponding to σ_1, are somehow more important than the second columns, and those are more important than the third columns, and so on, in describing the information in the data matrix X; the relative importance of these columns of U and V is given by the corresponding singular value.

So that's what we call these: we call U the matrix of left singular vectors, V the matrix of right singular vectors, and the diagonal matrix Σ the matrix of singular values. Each σ is a singular value of X corresponding to a singular vector u and a singular vector v. At some point, if some of these σ's are really, really small, this is going to allow us to ignore them, to chop them off, and approximate the matrix X only in terms of the first few dominant columns of U and V and the first dominant singular values. I'll talk all about that later, but this is an important concept: these are ordered by importance, and that ordering carries with it the ordering of the columns of U and V as well. In the case of eigenfaces, where the data matrix X consists of columns that are reshaped human faces, these column vectors of U have the exact same size as a vector x, so they can be reshaped into faces as well: again, into these eigenfaces. So they have a clear interpretation; the entries of Σ are hierarchically organized, which has a clear interpretation; and the columns of V also have an interpretation.
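These properties are easy to check numerically. Here is a short sketch (again my own, assuming the same kind of X as above) verifying that U and V are unitary and that the singular values come back non-negative and in decreasing order:

```python
import numpy as np

n, m = 100, 10
X = np.random.randn(n, m)
U, s, Vt = np.linalg.svd(X)

print(np.allclose(U.T @ U, np.eye(n)))    # U^T U = I
print(np.allclose(U @ U.T, np.eye(n)))    # U U^T = I
print(np.allclose(Vt @ Vt.T, np.eye(m)))  # V^T V = V V^T = I
print(np.all(s >= 0))                     # singular values are non-negative
print(np.all(np.diff(s) <= 0))            # sigma_1 >= sigma_2 >= ... >= sigma_m
```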
The interpretation of V is a little bit trickier to explain, because what appears in the decomposition is V^T, so I'm going to do my best to say this in a way that's clear, and I'm actually going to start with the eigenflow case. If our data came from a flow field evolving in time, then the columns of U would be eigenflows, hierarchically organized, and the amount of energy each of these column vectors captures of the flow would be given by the corresponding decreasing σ's. Then v_1 would be essentially the time series for how the first mode u_1 evolves in the flow: each snapshot has a certain amount of u_1 in it, and how that amount varies in time is given by v_1. So the columns of V are kind of eigen time series, in the case of physics.

In the case of human faces, you would take this big V^T matrix, whose rows are v_1^T, v_2^T, and so on. The first column of this V^T matrix tells me the exact mixture of all of those eigenfaces, the columns of U, that I have to add up to equal x_1; the second column of V^T gives the mixture of those columns that adds up to x_2, and so on, scaled of course by the singular values. So you can think of the columns of V^T as mixtures of U's that make x's: the first column makes x_1, the second column makes x_2, and so on. Very interpretable, very easy to understand. The columns of U have the same size as a column of X, so if the columns are people, these are eigenpeople; if they are flow fields, these are eigenflows; and they are hierarchically arranged based on the singular values σ.

The last things I want to tell you: these are very, very easy to compute. In MATLAB you just say [U, S, V] = svd(X). I call Σ "S" here because I didn't want to type Sigma, but that's U, Σ, V from the SVD of X. It's just as easy in Python, R, and almost every other language. The SVD is also guaranteed to exist; that's really important. And it's unique up to flipping the sign of these vectors: I can make u_1 negative and v_1 negative and nothing changes, but up to that, the SVD is unique and guaranteed to exist.

Okay, good. In the next few videos we're going to talk about how you actually compute these matrices U, Σ, and V; how we interpret U and V as essentially eigenvectors of correlation matrices between the rows and columns of X; and then how we can use the singular value decomposition to approximate the data matrix X, to build models, to do linear regression and principal component analysis. I could, for example, build models on these eigen time series if I have a physical system. We're going to talk about all of that soon. Thank you.
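The "mixtures" picture and the truncation idea both fall out of a few lines of NumPy. This sketch (my own illustration; the video only shows the MATLAB call) reconstructs a snapshot from a column of V^T and forms a rank-r approximation of X:

```python
import numpy as np

n, m = 100, 10
X = np.random.randn(n, m)

# Economy SVD: U is n-by-m, so Sigma is square and the zero rows are dropped.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The first column of V^T, scaled by the singular values, is the mixture
# of columns of U that adds up to the first snapshot x_1.
x1 = U @ (s * Vt[:, 0])
print(np.allclose(x1, X[:, 0]))          # True

# Rank-r truncation: keep only the r dominant singular vectors/values.
r = 3
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
print(np.linalg.norm(X - X_r))           # error governed by the discarded sigma's
```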
Info
Channel: Steve Brunton
Views: 133,521
Rating: 4.9592175 out of 5
Keywords: Singular value decomposition, machine learning, Data science, Data, Linear algebra, Regression, Principal component analysis, SVD, PCA
Id: nbBvuuNVfco
Length: 12min 50sec (770 seconds)
Published: Sun Jan 19 2020