Patrick Landreman: A Crash Course in Applied Linear Algebra | PyData New York 2019

Captions
Somebody hands you a data set like this. It could be literally anything: the value of a stock changing over time, the position of a robot, whatever comes up in your own work. You're given this and asked to predict what the value will be at some future time that isn't shown on the graph. You look at this thing and say, okay, in the beginning it's pretty much just chaos, I don't know what I would do there, but maybe as it goes it's getting a little periodic and maybe I can exploit that somehow. You might be able to make some headway, but the punchline is that it's not going to be trivial.

Now what if instead you happened to know that you could take this time series and decompose it into these eight component parts? If you look closely at each of these, it's basically a sinusoid with some decaying exponential envelope, and predicting what any one of these will do at some future time seems like a much simpler task. So the questions that should be coming up in your head at this point are: where on earth did I get these component functions, and how do you figure out the weighting, the recipe, to get back the original data? These are the types of problems that got me really interested in this field. It turns out you can do this with linear algebra, despite the fact that nothing here looks like a straight line. We're going to look at some fun stuff like this, and hopefully you'll leave with some inspiration to keep digging into this material for yourself.

Before I go further: the ideas I'm going to share with you today I learned through these really excellent courses taught by Professor Stephen Boyd and Sanjoy Lall at Stanford University. They have created a tremendous resource in that they put their course lectures and notes online for free, as well as books on the material. Please capture these links; I'll show them again on my final slide, and I can post them online for any follow-up. Everybody here, please check this out if you're interested.

So who am I, and how did I get so excited about this? Why did I put all this energy into making a talk on it? I am one of the first engineers at a company called Spry Health. At Spry we have made the first FDA-cleared wearable wristband that monitors vital signs continuously for people living with chronic illnesses. My job as part of the algorithms team is to take the raw physiological information we get from our sensors and use it to estimate the vital signs of the wearer at that particular moment in time. On top of this, our aim is to build up an enormous and unprecedented data set of vital signs over days and months, and then use that information to predict whether somebody is at risk of returning to the hospital or the emergency room. In the work that I do, it's essentially impossible to pick up a research paper and get through it without needing to understand, at some level, the properties of vectors and matrices. But more interesting to me than that: whatever discipline you work in, if you have a problem and you can cram and bend and stretch it to look like y = Ax, then the only thing you need to know are the properties of this matrix A, and the rest will sort itself out.
You can take ideas from finance and use them to solve whatever you're doing in, say, fluid mechanics. Problems like this come up literally everywhere, and we could spend days just going through examples, but let's start with a few to get warmed up.

If we're in classical mechanics, we could have a rocket or some other object. The vector x could be a list of a few forces acting at different points on the object, the vector y a list of the total resulting forces and torques, and the matrix A then contains all of the information about the geometry that's specific to that object.

We can switch gears entirely and go into business operations. Your vector x now represents the prices of all the component parts you use, y is the prices of all your finished products, or how much you need to charge for them, and the rows of A are the recipes for each of your products, so A becomes a kind of bill of materials.

We can switch fields again, this time to statistics, and this one is a little more subtle. Here, what becomes your x vector is actually the set of coefficients for each time lag of an autoregressive model; you'll notice the a's are now the vector on the right. y is the output of the model itself, so this is now your time series of data, and your matrix A is this slightly strange construction where each row is a stacked time window out of your data. What I really like about this example, in addition to the one I showed at the beginning, is that it emphasizes that we don't have to talk only about straight lines when we're doing linear algebra. If you've worked with AR models before, you know these things can have all kinds of crazy shapes and behavior, so we're not stuck with lines. And we can generalize this further: I have made absolutely no requirements about how you get the numbers that are in your matrix A. They can come from absolutely hideous functions depending on what field you're in; they can be Bessel functions of the 50th order. It doesn't matter what the elements of your matrix are: as long as the model is linear in the weights that tell you how to mix those functions, you can still work with this framework.
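As a rough sketch of that autoregressive framing (the series, lag count, and variable names here are invented for illustration and are not from the speaker's notebooks), you can stack lagged windows of a time series into the matrix and treat the lag coefficients as the unknown vector:

```python
import numpy as np

# Invented time series and lag order, purely for illustration.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))    # some made-up data
p = 3                                       # number of time lags

# Each row of A is a window of p past values; y holds the next value,
# so the model is y ~ A x with x the vector of lag coefficients.
A = np.column_stack([series[i : len(series) - p + i] for i in range(p)])
y = series[p:]

# Even though the series is nothing like a straight line, fitting the
# coefficients is an ordinary linear least-squares problem.
x, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x)                                    # estimated AR coefficients
```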
Before diving into the specific topics, a bit of a disclaimer. This field is so deep and has so much richness that there's no possible way to get through it in a 30-minute talk, so the way I'm going to cover as much as I can is by being completely irresponsible with my math: I'm not going to prove anything, I'm going to wave my arms a lot, just trust me on it. Everything you see here is well documented, and you can Google it and find proofs very quickly. On top of that, I put together a series of Python notebooks, linked on my final slide, which go into more examples than I had time to fit in, and you can play with the source code for how I generated some of the figures I'm going to show. So just know that's coming up.

Okay, with that preamble out of the way, let's start dusting off some neurons from classes we took a long time ago. The first topic I want to start with is matrix rank. As a refresher, the rank of a matrix is a property, an integer, and that number tells you the size of the largest grouping of rows or columns you can take that forms a linearly independent set. It's easiest to explain with an example. With the matrix I've shown here (I'm going to use NumPy arrays for everything), no matter how much I want to and how hard I try, I will not be able to take the first two rows, scale them, and add them together in some way that produces the third row. So these three vectors form a linearly independent set, and the rank of this matrix, NumPy cheerfully tells us, is 3. Now, the rank has to be less than or equal to the number of rows or columns, so as soon as we add another row to this matrix, the set will no longer be linearly independent. In this case you can see that two times the first row, plus the second row, plus the third row, gives the fourth row, so the rank stays at 3, and we would say this is not a full-rank matrix. Flipping rows and columns doesn't matter.

As soon as we introduce the idea that there's redundant information in a matrix, what should immediately come to mind is an opportunity for compression. I'll demonstrate how this works with a pretty contrived example. Say you have a matrix that you want to put on a computer; it has a million rows but only two columns, and the second column is exactly double the first. It would be silly to store all of this in memory when you can automatically save half the space by storing just the first column and remembering that, if you ever need to index into the second column, you multiply by two. I've shown here how we can express that idea in matrix-equation notation: you can factor A into these two pieces, one tall skinny matrix and one very small matrix. This idea generalizes, of course: any matrix, if you know its rank, can be split into two smaller matrices whose shapes are determined by the rank, and our mathematician friends have given us a whole host of tools that produce different variants of these factorizations. Now, if you go through the pretty simple arithmetic to count how many numbers you have to store, you find you need m times n numbers for the original matrix, but if your rank r is much smaller than either of those dimensions, the factored form gives you a very considerable savings in storage. Somewhat amusingly, if you ask how many floating-point operations it takes to multiply some vector x by this matrix, the math is exactly the same: you have m times n multiplications for the full matrix, and if the rank is low, you can save quite a bit of time and efficiency by using the factored form.
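A minimal NumPy sketch of the rank and compression ideas above; the specific matrices are invented for illustration rather than taken from the slides:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
print(np.linalg.matrix_rank(A))         # 3: the rows are linearly independent

# Append a row built from the others (2*row1 + row2 + row3): the rank
# stays at 3, so the 4x3 matrix is not full rank.
A4 = np.vstack([A, 2 * A[0] + A[1] + A[2]])
print(np.linalg.matrix_rank(A4))        # still 3

# The contrived compression example: a million-row matrix whose second
# column is exactly double the first is rank 1, so it factors into a
# single stored column times a tiny 1x2 "recipe" matrix.
c = np.arange(1.0, 1_000_001.0)
B = np.column_stack([c, 2 * c])                          # 2,000,000 numbers
B_factors = (c.reshape(-1, 1), np.array([[1.0, 2.0]]))   # 1,000,002 numbers
print(np.allclose(B, B_factors[0] @ B_factors[1]))       # same matrix, half the storage
```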
Okay, we're starting to feel a little bit warm, the juices are flowing. The next subject is another one that I think everybody would have seen in an introductory linear algebra course: eigenvectors and eigenvalues. Who here remembers going through lots of loose-leaf paper writing out a polynomial and finding its zeros? I hear chuckles and see hands, so I'm not alone in this. I can't count how many trees I must have gone through, and I certainly never had any idea why I was doing it. It turns out to be very useful.

I'll use a picture just to remind us all what these things are. Suppose you start with a vector x in some space and you multiply it by a square matrix. This has the effect of transforming it, and in general, if you start with this x here, you'll end up with a new vector that probably points in some other direction in space. But for any matrix there are a few special directions: if you transform those vectors, you end up with an output that is a scaled version of the vector you started with. Written as an equation, we have Av = λv, where λ is the scaling factor. These things come up so often that we give them a fancy name, using "eigen", which means "same": v is the eigenvector and λ is the eigenvalue.

There are quite a few situations where these become useful, but the one that grabbed my attention the most when I was learning this comes up in systems that evolve over time. Within this context, our vector x represents a list of properties that define a system at some moment in time. These could be parameters in a statistical model; in physics, a good one is the position, velocity, and acceleration of some object. The matrix A contains all the information about how you update the state of your system from one time step to the next. It's possible to write out an expression for the state at any time given the state at the beginning, and the relationship is made entirely of terms with this structure: one of the eigenvectors, with a coefficient that is an eigenvalue raised to a power that grows with time. The ellipses here are hiding some really ugly terms, but those also end up having this structure; there are factorials and binomial coefficients and so on, and I just wanted to highlight the pattern. I find this completely amazing: we introduced eigenvalues and eigenvectors as these funny geometric concepts, and here they show up when we're talking about how systems evolve over time. It didn't have to be that way.

Beyond that, because of this particular structure, there turn out to be only a few simple things that can happen. You can have eigenvalues with a magnitude of 1, in which case those terms in your expression don't do anything as time advances; you can think of this as a kind of steady-state behavior. You can have eigenvalues whose magnitude is less than 1, and as time passes those terms dampen out and go to zero; in a mechanical system this tends to represent stability, where things get calmer over time. The third case is an eigenvalue with a magnitude greater than 1: as more time passes, the weight of those terms explodes to infinity, and you would think of this as an unstable system. One detail: even if all of the entries in your matrix A are real, you can still have complex eigenvalues and eigenvectors. The way that manifests is as a periodic oscillation, but the asymptotic behavior is exactly the same.
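As a small hedged sketch of that last point (the update matrix below is made up; the only claim is that the eigenvalue magnitudes predict the long-run behavior of x_{t+1} = A x_t):

```python
import numpy as np

# A made-up two-state update rule, x_{t+1} = A x_t.
A = np.array([[0.9, 0.2],
              [-0.2, 0.9]])

eigvals, _ = np.linalg.eig(A)
print(eigvals)            # a complex-conjugate pair: 0.9 +/- 0.2j
print(np.abs(eigvals))    # magnitudes ~0.92 < 1, so trajectories decay

# Run the system forward and watch the prediction come true.
x = np.array([1.0, 0.0])
for _ in range(200):
    x = A @ x
print(x)                  # oscillated along the way, but now close to zero
```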
A fun way to play with this idea and see it in practice is Markov chains. Have people here worked with Markov chains before, just to get a sense? I see a couple of hands. For those who haven't seen this before, it's a pretty straightforward idea. Imagine that at any point in time you can be in one of a few discrete states, and on every time step there is some probability, given the state you're in now, that you switch to a different state. That's about it. If you pick a starting state at random, do your coin flips, and write out which states you pass through over time, you get a sequence like this; we would call that a Markov chain, and the whole thing is a Markov process.

So I picked a situation that might come up: you have an employee who can be working, thinking, or not doing anything, and a classic question for a model like this is what percentage of the time is my employee doing what I want them to do? Markov chains are nice because they are a completely linear process. Here our state vector is a list of the probabilities, at a particular time, of being in each of the states, and the transition matrix has a nice interpretation: each entry is the probability that, in one time step, you switch from the j-th state into the i-th state.

When I was in high school I had a particularly boring hamster with a very set routine: it would sleep, it would wake up and run on its wheel, then it would eat, then it would go back to sleep, and it would repeat this ad nauseam. So it's a simple task to write out the Markov process for my hamster; it looks something like this, and what I've written out here is the transition matrix you would have for a process like this. A question to you: what types of behavior would you expect to see in a state vector for this process? You can just shout things out. Exactly: one big one is that every three time steps you would expect to come back to where you started. There's one more that's a little more subtle, if anybody can see it. Can you say that a bit louder? Yes, so, remembering that the interpretation of the vector is that it holds probabilities: if you start with all three states at an equal probability of occupation, that distribution remains constant as you apply the update matrix.

If we go to NumPy and ask for the eigenvalues and eigenvectors of this matrix, we can see two things right away. We have an eigenvalue of 1, which represents that unchanging behavior, and it's associated with an eigenvector that has equal entries in all three states; that's the second behavior I mentioned. The other two eigenvalues are complex, a conjugate pair, and their eigenvectors turn out to describe the periodic rotation through the states in one direction.

I threw together another type of transition process, and here we get back all real eigenvalues: one of them is equal to 1, the other two are less than 1. Just looking at this, can you guess what this system will do over time? Yes: any part of the initial state that's constructed from those two eigenvectors is going to go away, and what's left is a vector with all of its weight in one of the states, so we should see a kind of convergence. If we run the simulation forward, that's exactly what we see: all of the weight that starts in components 2 and 3 goes to 0, and component 1 maxes out.

Markov chains, the example I chose because they deal with probabilities, turn out to have a special property: they always have an eigenvalue exactly equal to 1. This idea is one of the fundamental principles of Google's PageRank algorithm, so whether you like it or not, you're interacting with this every single day, which I think is cool.
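Here is a hedged reconstruction of the hamster example; the exact transition matrix from the slides isn't in the transcript, so this assumes a strictly cyclic sleep -> wheel -> eat routine:

```python
import numpy as np

# States: 0 = sleep, 1 = run on wheel, 2 = eat, cycling deterministically.
# P[i, j] is the probability of moving from state j to state i in one step,
# so each column sums to 1 and the state update is p_next = P @ p.
P = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

eigvals, eigvecs = np.linalg.eig(P)
print(eigvals)
# One eigenvalue is 1 (a distribution the update leaves unchanged); the
# other two are a complex-conjugate pair on the unit circle, which is the
# three-step periodicity showing up as an oscillation that never decays.

# The eigenvector for eigenvalue 1, rescaled to sum to 1:
v = np.real(eigvecs[:, np.isclose(eigvals, 1.0)]).ravel()
print(v / v.sum())        # [1/3, 1/3, 1/3]: equal occupation stays put
```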
For the final topic I want to cover today, I chose something that isn't typically seen in a basic linear algebra class, so it will be new, but I think it is one of the most powerful ideas in the field. It enables you to prove all kinds of things very easily, and it has far more applications than there's time for today. It's called the singular value decomposition, and I'll introduce it with a bit of a story.

It's a fun fact that if you take the vectors that comprise a sphere and transform them with any matrix A (it doesn't matter the shape), the resulting output is an ellipse. As an ellipse, you can describe it by a set of principal axes that are mutually perpendicular, or orthogonal, to each other. What's amazing is that if you take those principal axes and ask which input vectors mapped onto them, those inputs also turn out to form a set of mutually orthogonal vectors, and in fact the two sets create bases for the input and output spaces. As a reminder, you can think of a basis as taking your Cartesian axes and rotating them. We have some flexibility in how we scale things, so to make it clean we define the vectors v and u for the two bases so that they all have length one, and then we introduce a scaling coefficient sigma so the math works out: when you transform one of the inputs, you get the corresponding output, A v_i = sigma_i u_i. These sigmas end up being called singular values, which is where the name comes from.

Now what we can do is start with that equation for one pair of vectors and group all the vectors together, stacking them horizontally into matrices, which gives us a structure that looks like AV = U Sigma. Big Sigma is a diagonal matrix: it's almost all zeros, except for the main diagonal where we put the scaling coefficients. The last step is to invoke the fact that all of those input and output vectors are mutually orthogonal, so V and U are what are called orthogonal matrices, which means that if you multiply them by their transposes they essentially cancel out. That lets us rearrange in one step, and we've just shown that any matrix A can be written in the special form A = U Sigma V^T: an orthogonal matrix, a diagonal matrix, and an orthogonal matrix. This is called the singular value decomposition, and like I said, it shows up a lot. One place in particular: I mentioned there are lots of algorithms that can give you a low-rank factorization for compression, and this turns out to be an easy one; you just group two of these factors together and you've got your two matrices.
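A minimal NumPy sketch of the decomposition and the low-rank grouping just described, on a random matrix chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))

# A = U @ diag(s) @ Vt, with U and Vt orthogonal and s the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))      # True

# Group U with the singular values to get the two-factor low-rank form:
# keeping only the r largest singular values gives a rank-r approximation.
r = 2
left = U[:, :r] * s[:r]                         # tall factor (5 x r)
right = Vt[:r, :]                               # small factor (r x 3)
A_r = left @ right
print(np.linalg.norm(A - A_r))                  # error equals the dropped singular value s[2]
```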
Of the applications for this thing, the one that I found the most motivating and exciting when I saw it is the idea of the pseudoinverse. The other reason I chose it is that I think it's extremely powerful, yet when we interview candidates at Spry, it's something very few people seem to have come across before, so hopefully this is something I can share with all of you that will have a real impact on the problems you solve.

For the setup I'll use an example from mechanical engineering. We have some kind of heat-conductive plate with some heat sources on it; these could be computer processors that dissipate more power into the board as you run them harder. Then we have a couple of sensors placed on the board to measure the temperature. The classic question here is: given the temperature measurements, how much power are we dissipating from our processors? The common rule-of-thumb wisdom is that I need as many equations as I have unknowns. In a problem like this, that translates mathematically to A being a square matrix; then, to solve for the unknowns, we need A to be invertible, and we can move it to the other side and get an estimate of our unknown vector x. This is okay, but it has some problems. Say my budget gets bigger and I can now put a fourth sensor on the board. Intuitively this should only give me more information about my problem, but it breaks our math, because that fourth sensor makes the y vector longer, which means A has an extra row, and now it's not square, and we don't know how to invert a matrix that isn't square. The other possibility is that one of our sensors breaks, or maybe they're expensive and we can't afford enough of them, and it would be much nicer to offer a best guess than to just throw up our hands and say it can't be done.

The singular value decomposition gives us a very quick way to get an answer. We invoke the decomposition, and if you look at it for a moment, you might see that if we multiply by this particular matrix, which is just another matrix, some very simple algebra makes everything cancel out (I think I forgot a transpose on the slide). This new matrix is so significant that it gets a really fancy name: the Moore-Penrose pseudoinverse, and there is a single NumPy command that will give it to you. It has some very beautiful interpretations. If your matrix A is tall, the pseudoinverse gives you the least-squares solution: it minimizes the sum of squared errors. If you haven't worked with least squares before, this is the picture of finding a best-fit line, where you don't necessarily hit all of your data points but you end up close to them. In the other case, if A is wide, you have multiple solutions that are exact, and the pseudoinverse gives you the one with the smallest sum of squares of the entries of x. In a polynomial-regression setting, this would be finding the curve that goes through all your data points but has the least wild swings, or the lowest order.

Okay, so let's apply this idea to a real problem; this is how my class at Stanford opened, and it blew my mind. Imagine we have something like a magnetic hard disk with a head that can move in a two-dimensional plane, and we can steer it by tuning two inputs, which could be voltages or currents. What we'd like to do is move from any starting position to a new target, arrive there smoothly, and arrive in a time that we can control. Systems like this, and mechanical systems very commonly, turn out to be well modeled by what are called linear dynamical systems. This is just a couple of matrices and vectors: x here is again some sort of internal state, not necessarily something we measure directly, though we might be able to work it out from our original equations; u is whatever inputs we put in and can control; and y is what we physically observe.

The naive first approach for a problem like this is to solve for the situation where x isn't changing over time and y is the y that we want, which you can get by doing a bit of arithmetic to rearrange the equations. It turns out that I built a really terrible hard drive, because when I steer directly toward the point I'm trying to reach, my arm completely overshoots, swings back and forth, and vibrates for a while; it takes a long time to dampen out, though eventually it does reach the target I set.

Now let's try the fancy singular-value approach we just learned. If we take the linear dynamical system and unroll it as a recursion, we get a scary-looking expression that gives the final position at any time as a function of all the inputs leading up to that time. This looks gross, but if you take a breath, step back, and relax for a second, you realize this is just y = Mu, a simple matrix equation, and that means we can use the pseudoinverse to find the u with the smallest sum of squares that gets us to y at the time that we want.
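A hedged sketch of that unrolling on a toy system; the actual matrices and the 800-second horizon from the talk aren't in the transcript, so the dynamics below are a generic discrete-time position-and-velocity model invented just to show the mechanics:

```python
import numpy as np

# Toy linear dynamical system: x_{t+1} = A x_t + B u_t,  y_t = C x_t.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])          # state = [position, velocity]
B = np.array([[0.0],
              [dt]])                # one scalar input nudges the velocity
C = np.array([[1.0, 0.0]])          # we observe the position

T = 50                              # chosen arrival step (arbitrary)
y_target = np.array([1.0])          # where we want the head to end up

# Unrolling the recursion from x_0 = 0:
#   y_T = sum_k C A^(T-1-k) B u_k  =  M u,  one column of M per input u_k.
M = np.hstack([C @ np.linalg.matrix_power(A, T - 1 - k) @ B for k in range(T)])

# The pseudoinverse picks the input sequence with the smallest sum of
# squares that still lands exactly on the target at step T.
u = np.linalg.pinv(M) @ y_target

# Simulate forward to check that it really arrives.
x = np.zeros(2)
for k in range(T):
    x = A @ x + B.ravel() * u[k]
print(C @ x)                        # ~[1.0]: on target at the chosen time
```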
If we run this simulation, the inputs that we get out are these completely bizarre things, and I challenge you to come up with them on your own. But if you look at what the position of our mechanical arm does, it wobbles around for a bit, and then at the time I chose, 800 seconds, it locks in exactly where I wanted it to go, and it stays there for all future times, with no overshoot. It turns out I can make the arrival time arbitrarily small and this still works; the trade-off is that you might need bigger input voltages to get there, but you can arrive whenever you want.

Okay, so right on cue, I'd like to review the topics we've covered. We opened with all these examples of where linear algebra comes up and saw that we can deal with situations that look much more complicated than straight lines. We talked about the rank of a matrix and how you can exploit it for compression and efficiency. We went into eigenvalues and eigenvectors, looked at Markov chains, and saw how you can make statements and predictions about how systems evolve over time. Then we introduced the singular value decomposition and showed a fun example with the hard disk.

That's the end of my talk. I'd like to leave this slide up so people have access to these links. Spry is always hiring, and we're looking for people like you, so please reach out to me if you're curious and want to learn more; I'll be around for the full conference. I mentioned I have a notebook with more of these examples, and that link is right here. These are the Stanford resources, and then some extra topics if you already know this material and want to go deeper into something new. With that, thank you very much for your time and attention.
Info
Channel: PyData
Views: 6,678
Rating: 4.9069767 out of 5
Keywords: Python, Tutorial, Education, NumFOCUS, PyData, Opensource, download, learn, syntax, software, python 3
Id: wkxgZirbCr4
Length: 35min 10sec (2110 seconds)
Published: Sat Nov 30 2019