Linear Systems of Equations, Least Squares Regression, Pseudoinverse

Captions
Welcome back. We're talking about the singular value decomposition, and this is one of my favorite lectures, where we talk about least-squares regression and the pseudoinverse. An extremely useful application of the singular value decomposition is solving linear systems of equations Ax = b. We are very used to thinking about the world in terms of this linear system; lots and lots of physical problems can be written this way. Generally, A is a matrix that's known, b is a vector that's known, and we are solving for x, our unknown quantity.

Now, you've been solving equations like this for a long time, but the vast majority of linear algebra classes and textbooks spend almost all of their time on the case where A is a square, invertible matrix. If A is square, you have exactly as many knowns as unknowns, and if A is invertible, meaning its determinant is nonzero, then you can compute the inverse of A and multiply it by b, so that x = A^-1 b. That would be the solution of this linear system of equations.

That's useful, but the singular value decomposition allows us to massively generalize the solution to non-square A matrices, and this is huge: it opens up a much larger class of linear systems, especially the linear systems that arise in data analysis and data modeling, for example in linear least-squares regression. I'm really excited to tell you about this, and I want to start by walking you through the things that can happen for non-square A matrices. There are two canonical cases.

The first is the so-called underdetermined system of equations. In the underdetermined case we generally have n < m, so A is a short, fat matrix: a squat A matrix multiplying a tall x vector to give a short b vector. This system is called underdetermined because there are not enough measurements in b to uniquely determine a single solution x; in general there are infinitely many solutions x consistent with this system of equations. Now, this is not always true: I can certainly cook up a degenerate A matrix and b vector for which it fails. But if I randomly generate a squat A matrix and a b vector, chances are there will be infinitely many solutions x for that vector b, because there are more unknowns than knowns: there are more degrees of freedom in x than there are values in b to pin them all down. A quick numerical illustration of this case is sketched below.
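Here is a minimal NumPy sketch of the underdetermined case (the dimensions and random data are made up for illustration): it computes one particular solution and then shows that adding any multiple of a null-space direction of A gives another exact solution, so there are infinitely many.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined: n = 3 equations, m = 5 unknowns (a short, fat A).
n, m = 3, 5
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

# One particular solution; for an underdetermined system,
# lstsq returns the minimum-norm solution.
x_particular = np.linalg.lstsq(A, b, rcond=None)[0]

# For a full-rank short, fat A, the last m - n right singular vectors
# (rows of Vt) span the null space of A.
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[-1]

# Adding any multiple of a null-space vector leaves A @ x unchanged.
for c in (0.0, 1.0, 10.0):
    x = x_particular + c * null_vec
    print(c, np.allclose(A @ x, b))  # True for every c
```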
The other canonical case, which we'll also think about a lot, is the overdetermined system of equations. Here n is larger than m, so A is a tall, skinny matrix: a tall, skinny A multiplying a short x vector to give a tall b vector. Again there are degenerate cases, some b vectors for which there is a unique solution or even infinitely many solutions, but in general, if I randomly generate this system of equations, with more equations than unknown values in x, there will be zero solutions x that exactly satisfy the equation for a generic b vector. That's because there are not enough degrees of freedom in x to fit all of these equations; there are simply too many equations, which is why it's called overdetermined. Again, I can cook up examples where this is not true.

What the singular value decomposition allows us to do is approximately invert this A matrix, computing what's known as the pseudoinverse, and find a best-fit x that either comes as close to solving the equation as possible or solves it with, say, the minimum 2-norm of the vector x. Let me write out exactly what I mean by that.

Assume I have my matrix A and I take its singular value decomposition, A = U Sigma V^T. I'm going to assume this is an economy SVD, and I'm dropping the hats. Then we can define A†, the pseudoinverse of A, called the Moore-Penrose pseudoinverse. Let me write it out explicitly. We have Ax = b, and substituting the SVD for A gives

U Sigma V^T x = b.

Now we try to invert each of these matrices one by one to solve for x on the left-hand side, and that's one of the nice things about U and V being unitary and Sigma being diagonal: it is really easy to invert each of them one at a time. To invert U, I multiply by U^T; to invert Sigma, I multiply by Sigma^-1; and to invert V^T, I multiply by V. The matrix that applies all three of these, which I'm defining as A†, the pseudoinverse, is

A† = V Sigma^-1 U^T,

and this is what's known as the Moore-Penrose pseudoinverse. Because my matrix A is not square, a proper inverse doesn't exist, and this is as close as I'm going to get to inverting that A matrix and solving for x. Multiplying both sides by A†, U^T U cancels to the identity, Sigma^-1 Sigma cancels to the identity, and V V^T cancels to the identity, so I'm left with x. Let's call it x̃, my best approximation:

x̃ = V Sigma^-1 U^T b = A† b.

That's how we define the pseudoinverse. Technically this is the left pseudoinverse, because A multiplies x on the left and I want to invert it on the left; if the system were xA = b, I would invert on the right and get a right pseudoinverse.
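Here is a minimal NumPy sketch of exactly this construction, assuming a tall, skinny A with made-up random data: take the economy SVD, form A† = V Sigma^-1 U^T, and check it against NumPy's built-in pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined: n = 100 equations, m = 3 unknowns (a tall, skinny A).
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)

# Economy SVD: U is 100 x 3, S holds the 3 singular values, Vt is 3 x 3.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Moore-Penrose pseudoinverse: A_dagger = V @ Sigma^{-1} @ U^T.
A_dagger = Vt.T @ np.diag(1.0 / S) @ U.T

# Best-fit solution x_tilde = A_dagger @ b.
x_tilde = A_dagger @ b

# Sanity check against NumPy's built-in pseudoinverse.
print(np.allclose(A_dagger, np.linalg.pinv(A)))  # True
```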
Okay, so it's a pretty simple idea: we use the SVD of A to compute the pseudoinverse A† and solve for x, and x̃ = A† b is somehow the best x we can get for these two kinds of linear systems. Specifically, these x's actually mean something.

In the underdetermined case, where there are infinitely many solutions, x̃ is the solution with the minimum 2-norm. Let me say that a different way: of all the infinitely many vectors x that do in fact satisfy Ax = b, x̃ = A† b is the one whose 2-norm ||x||_2 is smallest, so we call it the minimum-norm solution. Sometimes that can be useful: maybe the norm corresponds to energy or something, and the minimum-norm x is the cheapest vector you can multiply A by to get the vector b.

In the overdetermined case, where there are no exact solutions, this is still in some sense the best solution we can get: x̃ = A† b is what's called the least-squares solution, the one that minimizes the 2-norm error ||A x̃ - b||_2. So even if there is no x with Ax = b exactly, this is the solution that minimizes the error in the fit between Ax and b, and this is the cornerstone of linear least-squares regression. We're going to talk a lot about that in the next few lectures.

But I really wanted to point out that you now have the power to generalize these linear systems of equations to non-square matrices A: instead of the inverse, we compute the pseudoinverse using the SVD, and the solution x̃ has these very nice optimal properties. It is either the minimum-norm solution that exactly satisfies the system or the least-squares solution that best fits the system. Okay, we'll talk more about this in the next few lectures. Thank you.
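Both optimality properties are easy to check numerically. Here is a short sketch with made-up random data (again assuming NumPy), using np.linalg.pinv for the pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(2)

# Overdetermined case: x_tilde minimizes the residual ||A x - b||_2,
# so perturbing it can only increase the error.
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)
x_tilde = np.linalg.pinv(A) @ b
for _ in range(5):
    x_other = x_tilde + 0.1 * rng.standard_normal(3)
    assert np.linalg.norm(A @ x_tilde - b) <= np.linalg.norm(A @ x_other - b)

# Underdetermined case: among all exact solutions, x_tilde has the
# smallest 2-norm; adding a null-space direction keeps A x = b exact
# but makes the solution longer.
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
x_tilde = np.linalg.pinv(A) @ b
_, _, Vt = np.linalg.svd(A)
x_other = x_tilde + Vt[-1]  # Vt[-1] is a null-space direction of A
print(np.allclose(A @ x_other, b))                        # True
print(np.linalg.norm(x_tilde) < np.linalg.norm(x_other))  # True
```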
Info
Channel: Steve Brunton
Views: 33,892
Rating: 4.9693656 out of 5
Keywords: Singular value decomposition, least squares regression, machine learning, Linear algebra, Regression, Principal component analysis, SVD, PCA, pseudoinverse, systems of equations
Id: PjeOmOz9jSY
Length: 11min 52sec (712 seconds)
Published: Thu Jan 23 2020