Krylov subspaces

Captions
We're now coming to the big idea for dealing with large matrices, and that's Krylov subspaces. It's a big subject, and it's going to take us some time to get all the way through it, so let's get started and realize that we're going to be pursuing this for a while.

Let's begin with an observation. Remember that in the power iteration we build a sequence of vectors: each new vector is obtained by multiplying the previous one by A and then dividing by some constant. That means the vector x^(k) that we come up with in the power iteration is some multiple of A^k times the starting vector x^(0). One way of thinking about that, just putting different linear-algebra language on it, is that x^(k) lies in the subspace spanned by this one vector. I'll remind you of the span of a set of vectors: it's the set of all linear combinations, that is, all c_1 v_1 + c_2 v_2 + ..., where the coefficients are real or complex depending on the context. We're going to use a special notation for this, writing angle brackets around the vectors. To get back to the specific case here, the span of the single vector x^(k) is just the set of all c times x^(k), which is a one-dimensional subspace. Now, if we're going to be looking for an eigenvector in this subspace, there's not much of a choice, because eigenvectors are only determined up to constant multiples anyway. It doesn't matter how we normalize the vector; any basis of this one-dimensional space is really the only choice we have, so the fact that we're looking in a one-dimensional space really limits the options.

There's a very simple idea to improve on that. We have all these previous iterates that we're not using directly; they just helped us get to where we are. But we had to create all those vectors along the way, so why not use them more thoroughly? Why not take the span of all of them? That span is what we call a Krylov subspace: the span of vectors that are all powers of A applied to some starting vector. We'll set up the notation and the definitions in a moment, but notice that things can only get better: before, we looked in only one dimension of the space, and now we're allowing ourselves to look in all the other dimensions at the same time. We can only do better by doing that, and it turns out we do a lot better.

Let's now shift our notation away from the power iteration and talk a little more abstractly. We define the matrix K_m = [u, Au, A^2 u, ..., A^(m-1) u], which depends on the matrix A and the starting vector u; we'll call it the Krylov matrix. The notation is a little unfortunate, but it is consistent with the way we've been doing things: if A is an n-by-n matrix, then K_m is n by m, with n greater than or equal to m, so we only go up to the (m-1)st power of A. I haven't defined the space yet: the Krylov subspace K_m is defined as the span of those same vectors; in other words, it's the column space of the Krylov matrix. For this notation we'll use a curly K: the curly K means the space, the regular K means the matrix.
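Since the lecture describes the Krylov matrix only in words, here is a minimal NumPy sketch of its construction, building each column as A times the previous one. The function name krylov_matrix and the dense storage are my own choices, not anything from the video.

```python
import numpy as np

def krylov_matrix(A, u, m):
    """Return the n-by-m Krylov matrix [u, Au, A^2 u, ..., A^(m-1) u].

    Each new column is A applied to the previous column, so no power
    of A is ever formed explicitly.
    """
    n = len(u)
    K = np.zeros((n, m))
    K[:, 0] = u
    for j in range(1, m):
        K[:, j] = A @ K[:, j - 1]
    return K
```

Because only matrix-vector products are needed, the same loop works unchanged if A is stored as a sparse matrix.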
Here are some basic facts about these subspaces. First, they are nested: as spaces, K_1 is a subset of K_2, which is a subset of K_3, and so on; they build on each other. If you think in terms of the span, from the standpoint of K_3, every element of K_1 can be written as c_1 u + 0 Au + 0 A^2 u, so certainly every element of K_1 is also an element of K_3, but there are elements of K_3 that aren't in K_1. It's a growing, nested sequence of spaces. Second, we can construct these spaces by just repeatedly applying A to a vector. We don't actually take powers of A; we apply A repeatedly to the previous vector in the sequence, and that way we avoid forming a power of the matrix, which would destroy sparsity. All we ever have to do is multiply A times a vector to build these spaces. Next fact: if x is in the Krylov space K_m (the curly K_m), then x = K_m z for some vector z. Again, this is just a matter of definition: if x is in K_m, that means x = z_1 u + z_2 Au + ..., and a linear combination of vectors is the same as a matrix times a column vector, which is exactly why we picked our definition of the Krylov matrix K_m in the first place. That's the same trick we've been doing with matrix-vector multiplication since pretty much day one. And finally, as long as the Krylov matrix has full column rank, which is the generic situation, the dimension of the Krylov space K_m is exactly m.

So here's the big idea we're going to be working with. We have this big matrix A that operates over a huge-dimensional space, and we're going to approximate its action by something lower-dimensional that we can handle; the lower-dimensional space we choose is the Krylov space. There are a number of questions we'll have to ask about what this big idea means: why is it a good idea, and how does it work in practice? But the first question we really have to ask is how we do this at all. What does it even mean to replace the action of A over a whole space with its action over a smaller space? Really, it's up to us to decide how to do that, and then see whether something useful comes out of the idea.

Here's our very first answer, which turns out not to be the right answer, but it will take us to the right answer eventually. Consider our two problems, linear systems and eigenvalues; it's a little easier to talk about linear systems. Say we're solving a square system Ax = b, the full problem. We can rewrite this as b - Ax = 0, and we'll assume A is nonsingular so that it has a solution. If it has a solution, we can recast this as a minimization problem: the smallest that the norm of b - Ax can ever get is zero, so if we minimize it, we get the solution. The solution of the minimization problem is the same as the solution of the original, x = A^(-1) b. But this formulation allows us to replace the problem with an approximation of the original: instead of minimizing over all of R^n (or C^n if things are complex), we minimize over a much smaller space. We are no longer necessarily going to get the exact answer; we would only get it if the true x happened to lie in the space where we're looking. But hopefully the x that does satisfy this restricted minimization will be close to the true answer.
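To restate that reformulation in symbols (nothing new here, just the two minimization problems side by side; the 2-norm is what the least-squares setup that follows implies):

```latex
% Exact problem, recast as a minimization over the whole space:
% the minimum value is 0 and it is attained at the true solution.
\min_{x \in \mathbb{C}^n} \lVert b - A x \rVert_2 = 0,
    \qquad \text{attained at } x = A^{-1} b .

% Approximate problem: restrict the search to the Krylov subspace.
\min_{x \in \mathcal{K}_m} \lVert b - A x \rVert_2 .
```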
But let's not even worry about that part of it yet; let's just see how this would work. If x is in the Krylov space K_m, that's equivalent to x = K_m z for some vector z, so we plug that into our minimization problem. Now we're minimizing over all possible vectors z, which are m-dimensional, the quantity ||b - Ax|| where x has the form K_m z; in other words, we minimize over z the norm of b minus the matrix A K_m times z. That is just an ordinary linear least-squares problem, which is part of what motivates looking at it: we know how to solve linear least-squares problems, we know how to do all of the things we're doing here.

So let's see how it works in a numerical experiment. A is just a random 50-by-50 matrix; I don't care that it isn't sparse in this case, we're just testing out the idea. b is some arbitrarily picked right-hand side. When you solve Ax = b, it's common to use b as the seed vector u of the Krylov matrix, because we have that vector lying around in our problem. At first, that one column is the whole Krylov matrix. We solve the linear least-squares problem using a backslash; once we have z, we reconstruct x as K times z, which was our definition of x, and that's our approximate solution, an x that lies in the Krylov space. Then we see how good a solution it is: we take b - Ax, which we call the residual, and that's our quality measure. Then we add a new column onto the Krylov matrix, namely A times the last column; if you look at the definition of the Krylov matrix, each column is just A times the previous column, so this is the next Krylov column.

We run this code. Let's not worry about the warnings for a second, although they are important; let's just see the results. If we look at the residuals, you see that for a while they decrease nicely. That's what we want: we're trying to drive the residual to zero ideally, or at least small enough to make us happy, so for a while we're getting happier. But then we see them get worse; the residuals get larger, and that's not good news for us, because it means we're getting further away from the solution rather than closer.

There's a simple explanation, and it is tied to those warnings we were getting. Think back to the power iteration: we know that A^m u converges to a multiple of the dominant eigenvector. That means that as m keeps growing, the columns we're putting into this matrix are getting closer and closer to parallel, which is pretty much the opposite of what we want. Remember, our ideal for numerical work is a matrix with an orthogonal set of columns, or even better an orthonormal set; those are the ones that avoid cancellation errors. If we have nearly parallel vectors, that's exactly the situation where you can add two of them together and get something incredibly small, which creates a loss of significant digits. That's effectively what happens here, and the way it's manifested in linear-algebra terms is the rank, or as we now know, the singular values, of the matrix.
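Here is a rough NumPy sketch of the experiment just described. The lecture works in MATLAB, so the random seed, the number of steps (30), and the use of numpy.linalg.lstsq in place of backslash are my own substitutions; note that lstsq will not print the rank-deficiency warnings mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)        # assumed seed; the lecture just takes a random matrix
n = 50
A = rng.standard_normal((n, n))       # random 50-by-50 matrix, not sparse, just for testing
b = rng.standard_normal(n)            # arbitrary right-hand side

K = b.reshape(-1, 1)                  # seed the Krylov matrix with u = b
residuals = []
for m in range(1, 31):                # number of steps is my choice
    # least-squares solve of min over z of ||b - (A K) z||; stands in for backslash
    z, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    x = K @ z                         # approximate solution x = K_m z in the Krylov space
    residuals.append(np.linalg.norm(b - A @ x))
    K = np.hstack([K, (A @ K[:, -1]).reshape(-1, 1)])   # next column is A times the last one

print(residuals)  # typically decreases at first, then stalls or worsens as the
                  # columns of K become numerically dependent
```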
A low-rank matrix, of course, has columns that are linearly dependent, and one extreme way for two vectors to be linearly dependent is for them to be parallel. As m gets larger and larger, we just keep adding vectors that look more and more parallel. The rank should be growing, increasing by one with each new column, and mathematically it does, but numerically it doesn't: the columns are so close to being linearly dependent that the machine can't tell them apart. So our first order of business is going to have to be to fix this aspect of the problem. We like certain aspects of these Krylov spaces: the fact that they're nested, the fact that we can construct them by repeated application of the matrix, and so on. But the Krylov vectors themselves do not make a suitable numerical set of vectors for us to work with, so we're going to have to replace them with something better.
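As a small follow-up illustration of the rank stagnation just described (my own sketch, not code from the lecture), one can watch the computed numerical rank of the Krylov matrix as columns are appended:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
u = rng.standard_normal(n)

K = u.reshape(-1, 1)
for m in range(2, 21):
    K = np.hstack([K, (A @ K[:, -1]).reshape(-1, 1)])
    # mathematically rank(K) = m, but the reported numerical rank tends to stall
    # once new columns are indistinguishable (to the machine) from earlier ones
    print(m, np.linalg.matrix_rank(K))
```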
Info
Channel: Toby Driscoll
Views: 16,003
Rating: 4.942029 out of 5
Id: ji__O4deIZo
Length: 18min 48sec (1128 seconds)
Published: Tue Oct 23 2012