Linear Systems of Equations, Least Squares Regression, Pseudoinverse

Captions
Welcome back. We're talking about the singular value decomposition, and this is one of my favorite lectures, where we talk about least-squares regression and the pseudoinverse. An extremely useful application of the singular value decomposition is solving linear systems of equations Ax = b. We are very used to thinking about the world in terms of this linear system; lots and lots of physical problems can be written this way. Generally, A is a matrix that's known, b is a vector that's known, and we are solving for x, our unknown quantity.

Now, you've been solving equations like this for a long time, but the vast majority of linear algebra classes and textbooks spend almost all of their time on the case where A is a square, invertible matrix. If A is square, you have exactly as many knowns as unknowns, and if A is invertible, meaning its determinant is nonzero, then you can compute the inverse of A and multiply it by b, so that x = A^-1 b. That would be the solution of this linear system of equations.

That's useful, but the singular value decomposition allows us to massively generalize the solution to non-square A matrices, and this is huge: it opens up a much larger class of linear systems, especially the linear systems that arise in data analysis and data modeling, for example in linear least-squares regression. I'm really excited to tell you about this, and I want to start by walking you through the things that can happen for non-square A matrices. There are two canonical cases.

The first is the so-called underdetermined system of equations. In the underdetermined case we generally have n < m, so A is a short, fat matrix: a squat A matrix multiplying a tall x vector to give a short b vector. This system is called underdetermined because there are not enough measurements in b to uniquely determine a single solution x; in general there are infinitely many solutions x consistent with this system of equations. Now, this is not always true: I can certainly cook up a degenerate A matrix and b vector for which it fails. But if I randomly generate a squat A matrix and a b vector, chances are there will be infinitely many solutions x for that vector b, because there are more unknowns than knowns: there are more degrees of freedom in x than there are values in b to pin them all down. A quick numerical illustration of this case is sketched below.
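Here is a minimal NumPy sketch of the underdetermined case (the dimensions and random data are made up for illustration): it computes one particular solution and then shows that adding any multiple of a null-space direction of A gives another exact solution, so there are infinitely many.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined: n = 3 equations, m = 5 unknowns (a short, fat A).
n, m = 3, 5
A = rng.standard_normal((n, m))
b = rng.standard_normal(n)

# One particular solution; for an underdetermined system,
# lstsq returns the minimum-norm solution.
x_particular = np.linalg.lstsq(A, b, rcond=None)[0]

# For a full-rank short, fat A, the last m - n right singular vectors
# (rows of Vt) span the null space of A.
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[-1]

# Adding any multiple of a null-space vector leaves A @ x unchanged.
for c in (0.0, 1.0, 10.0):
    x = x_particular + c * null_vec
    print(c, np.allclose(A @ x, b))  # True for every c
```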
The other canonical case, which we'll also think about a lot, is the overdetermined system of equations. Here n is larger than m, so A is a tall, skinny matrix: a tall, skinny A multiplying a short x vector to give a tall b vector. Again there are degenerate cases, some b vectors for which there is a unique solution or even infinitely many solutions, but in general, if I randomly generate this system of equations, with more equations than unknown values in x, there will be zero solutions x that exactly satisfy the equation for a generic b vector. That's because there are not enough degrees of freedom in x to fit all of these equations; there are simply too many equations, which is why it's called overdetermined. Again, I can cook up examples where this is not true.

What the singular value decomposition allows us to do is approximately invert this A matrix, computing what's known as the pseudoinverse, and find a best-fit x that either comes as close to solving the equation as possible or solves it with, say, the minimum 2-norm of the vector x. Let me write out exactly what I mean by that.

Assume I have my matrix A and I take its singular value decomposition, A = U Sigma V^T. I'm going to assume this is an economy SVD, and I'm dropping the hats. Then we can define A†, the pseudoinverse of A, called the Moore-Penrose pseudoinverse. Let me write it out explicitly. We have Ax = b, and substituting the SVD for A gives

U Sigma V^T x = b.

Now we try to invert each of these matrices one by one to solve for x on the left-hand side, and that's one of the nice things about U and V being unitary and Sigma being diagonal: it is really easy to invert each of them one at a time. To invert U, I multiply by U^T; to invert Sigma, I multiply by Sigma^-1; and to invert V^T, I multiply by V. The matrix that applies all three of these, which I'm defining as A†, the pseudoinverse, is

A† = V Sigma^-1 U^T,

and this is what's known as the Moore-Penrose pseudoinverse. Because my matrix A is not square, a proper inverse doesn't exist, and this is as close as I'm going to get to inverting that A matrix and solving for x. Multiplying both sides by A†, U^T U cancels to the identity, Sigma^-1 Sigma cancels to the identity, and V V^T cancels to the identity, so I'm left with x. Let's call it x̃, my best approximation:

x̃ = V Sigma^-1 U^T b = A† b.

That's how we define the pseudoinverse. Technically this is the left pseudoinverse, because A multiplies x on the left and I want to invert it on the left; if the system were xA = b, I would invert on the right and get a right pseudoinverse.
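Here is a minimal NumPy sketch of exactly this construction, assuming a tall, skinny A with made-up random data: take the economy SVD, form A† = V Sigma^-1 U^T, and check it against NumPy's built-in pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined: n = 100 equations, m = 3 unknowns (a tall, skinny A).
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)

# Economy SVD: U is 100 x 3, S holds the 3 singular values, Vt is 3 x 3.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Moore-Penrose pseudoinverse: A_dagger = V @ Sigma^{-1} @ U^T.
A_dagger = Vt.T @ np.diag(1.0 / S) @ U.T

# Best-fit solution x_tilde = A_dagger @ b.
x_tilde = A_dagger @ b

# Sanity check against NumPy's built-in pseudoinverse.
print(np.allclose(A_dagger, np.linalg.pinv(A)))  # True
```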
Okay, so it's a pretty simple idea: we use the SVD of A to compute the pseudoinverse A† and solve for x, and x̃ = A† b is somehow the best x we can get for these two kinds of linear systems. Specifically, these x's actually mean something.

In the underdetermined case, where there are infinitely many solutions, x̃ is the solution with the minimum 2-norm. Let me say that a different way: of all the infinitely many vectors x that do in fact satisfy Ax = b, x̃ = A† b is the one whose 2-norm ||x||_2 is smallest, so we call it the minimum-norm solution. Sometimes that can be useful: maybe the norm corresponds to energy or something, and the minimum-norm x is the cheapest vector you can multiply A by to get the vector b.

In the overdetermined case, where there are no exact solutions, this is still in some sense the best solution we can get: x̃ = A† b is what's called the least-squares solution, the one that minimizes the 2-norm error ||A x̃ - b||_2. So even if there is no x with Ax = b exactly, this is the solution that minimizes the error in the fit between Ax and b, and this is the cornerstone of linear least-squares regression. We're going to talk a lot about that in the next few lectures.

But I really wanted to point out that you now have the power to generalize these linear systems of equations to non-square matrices A: instead of the inverse, we compute the pseudoinverse using the SVD, and the solution x̃ has these very nice optimal properties. It is either the minimum-norm solution that exactly satisfies the system or the least-squares solution that best fits the system. Okay, we'll talk more about this in the next few lectures. Thank you.
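Both optimality properties are easy to check numerically. Here is a short sketch with made-up random data (again assuming NumPy), using np.linalg.pinv for the pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(2)

# Overdetermined case: x_tilde minimizes the residual ||A x - b||_2,
# so perturbing it can only increase the error.
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)
x_tilde = np.linalg.pinv(A) @ b
for _ in range(5):
    x_other = x_tilde + 0.1 * rng.standard_normal(3)
    assert np.linalg.norm(A @ x_tilde - b) <= np.linalg.norm(A @ x_other - b)

# Underdetermined case: among all exact solutions, x_tilde has the
# smallest 2-norm; adding a null-space direction keeps A x = b exact
# but makes the solution longer.
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
x_tilde = np.linalg.pinv(A) @ b
_, _, Vt = np.linalg.svd(A)
x_other = x_tilde + Vt[-1]  # Vt[-1] is a null-space direction of A
print(np.allclose(A @ x_other, b))                        # True
print(np.linalg.norm(x_tilde) < np.linalg.norm(x_other))  # True
```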
Info
Channel: Steve Brunton
Views: 33,892
Rating: 4.9693656 out of 5
Keywords: Singular value decomposition, least squares regression, machine learning, Linear algebra, Regression, Principal component analysis, SVD, PCA, pseudoinverse, systems of equations
Id: PjeOmOz9jSY
Length: 11min 52sec (712 seconds)
Published: Thu Jan 23 2020