Applied Linear Algebra GMRES

Captions
So we're going to continue on iteration methods: how to solve Ax = b with iterations. Last time I showed you, through some coding and writing things out, that you can solve Ax = b remarkably just by setting up an iteration scheme; as you iterate, the thing converges to the solution, and we had a guarantee of when that would happen. The guarantee is that the matrix is strictly diagonally dominant: in other words, if for every single row the absolute value of the diagonal element is bigger than the sum of the absolute values of all the other elements in that row, then you can pick any initial guess you want, iterate, and it will converge to the solution. It's a really interesting guarantee and an interesting way to solve Ax = b, because up to this point we've really been talking about methods that solve Ax = b by some kind of direct reduction algorithm, like LU, Gaussian elimination, QR, or the SVD. Jacobi and Gauss-Seidel are very simple iteration schemes, and today we're going to talk about more advanced ones. So this is our topic: iteration methods for Ax = b.

Today we'll talk about two methods in particular. The first is the generalized minimal residual method, usually just called GMRES. The other is the conjugate gradient, or biconjugate gradient, family of methods, often called BiCGSTAB, the stabilized biconjugate gradient method. GMRES is today; the biconjugate gradient methods come in the lecture after this one.

The first thing I want to do is explain why we're going after these iterative techniques at all. You could always ask that question, right? This is just another way to solve Ax = b, so why not just use LU or Gaussian elimination, which we've already learned how to do?
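Before answering that with the operation-count argument below, here is a minimal MATLAB sketch of the Jacobi scheme recapped above, just as a reminder of what these simple iterations look like; the matrix, right-hand side, tolerance, and iteration cap are made up for illustration and are not from the lecture.

```matlab
% Jacobi iteration sketch for Ax = b (illustrative values only).
A = [4 -1 0; -1 4 -1; 0 -1 4];   % strictly diagonally dominant, so convergence is guaranteed
b = [1; 2; 3];
x = zeros(3,1);                  % any initial guess works when A is diagonally dominant
D = diag(diag(A));               % diagonal part of A
R = A - D;                       % everything off the diagonal
for k = 1:200
    xnew = D \ (b - R*x);        % x_{k+1} = D^{-1} (b - R*x_k)
    if norm(xnew - x) < 1e-12
        x = xnew;
        break
    end
    x = xnew;
end
disp(x)                          % matches A\b to the chosen tolerance
```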
The thing I want to highlight is the progress over time. In 1950, what we could do computationally for an m-by-m matrix was about n = 20. Not very big, right? Remember, this is around the invention of the computer, so everything was very slow. Now let's move forward in time and notice how things progress. Jump ahead 15 years, to 1965, and you're about an order of magnitude bigger in the matrices you can solve numerically. Remember, we still don't have PCs, or computers just available to anybody; in 1965 it's still pretty hard to get hold of one. There are people who have them, but it's not widespread. The names I'll attach to this era are Forsythe and Moler. Cleve Moler is the founder of MATLAB; I believe he was working at Stanford at the time he started thinking about numerical linear algebra, and he came up with MATLAB, which stands for "matrix laboratory." That's when you started to see more and more access to computers by everyday researchers, and Moler started developing a software package that gave people ease of use and the ability to do these computations in a nice way. He was really taking modern research and putting it into a computing framework that everybody could just use. So that's 1965. Advance another 15 years, to 1980, and you're up roughly another order of magnitude. This is the era of LINPACK, the linear algebra package, and you could start handling matrices of size about 2000 by 2000. So you're starting to get bigger matrices, but 1980 is a long time ago, about 40 years, and these are still pretty small for what we want to do. Then in 1995 you get yet another order of magnitude: now you're in the 20,000-by-20,000 range, and this is LAPACK, the linear algebra package that is still essentially the state of the art; the best algorithms tend to get migrated into LAPACK so that people can actually use them.

Now here's the thing I want to highlight, something Trefethen points out in his book: over this 45-year span, the problem sizes grew by three orders of magnitude, a factor of 10^3. How did hardware do over the same period? How much faster and bigger did hardware get? On the hardware side, the increase was about 10^9. So problem sizes went up 10^3 while hardware went up 10^9; that's interesting. Why is that? Because most of the methods we've used so far, like LU decomposition, Gaussian elimination, the SVD, or QR, cost on the order of m^3 operations for an m-by-m matrix. Solving Ax = b this way costs you O(m^3), so a 10^3 increase in matrix size costs (10^3)^3 = 10^9 in work. There you go: we had to get a 10^9 increase in computational hardware to get a 10^3 increase in solvable matrix size. So what's the problem? That m^3 right there is the bottleneck for solving large-scale computational problems, and the entire game in scientific computing is to figure out whether there's a way to circumvent it. As long as it's sitting there, every time you want to go up one order of magnitude in problem size you have to go up three orders of magnitude in hardware. That's a really tough thing to keep up with, so people started thinking about other ways to solve Ax = b. That's why we're talking about these iteration methods: they give us, at least in certain cases, the potential to beat the O(m^3) operation count.

So let's go ahead and talk about some of the methods out there. We've already covered Jacobi and Gauss-Seidel; let's talk about the more sophisticated methods you'd actually use in practice. (The board isn't erasing very well today; let me get some water on it.)

A question from the class: "You mentioned it goes up to 20,000 by 20,000. Is it common for real-world problems to be that size, or are they actually larger than that?" Well, that was 1995; nowadays they're much, much bigger. If you work for Boeing and you want to solve numerically for the fluid flow around an airplane you're designing, you might want matrices that are billions by billions. Let me put it this way: there's no matrix you can make that's so big people would say "that's too big." What everybody wants is bigger and faster. If you tell people you can solve trillion-by-trillion matrices tomorrow, people will show up and say, "here's my problem, I need a trillion-by-trillion matrix." People are doing simulations of galaxies now, massive simulations, and the only thing limiting them is the size they can run on the largest supercomputers in the world; that's their hard limit, set by memory and solution techniques. So there's no upper limit, the way I think about it: you build it, people show up with problems, and they want to do them fast. Follow-up: "Are the algorithms we use today basically limited to one computer?" No, people do all kinds of methods that spread the computation across GPUs or CPUs, multithreading, multiprocessing; lots of algorithms take advantage of that, so in that sense there should be no limit at all. But then there are limits of hardware communication. Even when you can break the problem apart, and iterative methods are really nice for parallelizing, the pieces still have to talk to each other at some point; all the local computations have to be integrated into the final solution, and that eventually becomes a bottleneck.

So what we're talking about with iterative methods is exactly this: how do we get around the m^3 issue? For any of the really large-scale problems, nobody is doing Gaussian elimination, nobody; you're always doing something faster. Nobody can afford, even for today's problems, to take a big fluid-flow example and run Gaussian elimination on it; you do something much more sophisticated. And that gets us into the iterative techniques. Let's talk about them in a more formal way.
I'm going to introduce to you the idea of what's called a Krylov subspace. I'm given a matrix A and some vector b, and the Krylov subspace K_n(A, b) is the span of the following set of vectors: b, Ab, A^2 b, and so on up to A^{n-1} b. Notice what I'm doing: A, A^2, A^3; in some sense this is an iteration structure, right? I hit b with A, hit it again, hit it again, n - 1 times, and I look at the subspace those vectors span. That's the Krylov subspace, and I'll show you how to take advantage of it. One more thing to notice: since I have n directions here (the first direction, the second direction, and so on, starting from the zeroth power), this is an n-dimensional subspace of the full m-dimensional space, and I could just as well span it with a set of n orthonormal vectors. How would I do that? With a QR decomposition. Remember, QR takes a set of vectors that aren't necessarily orthogonal and moves me to a coordinate system with n orthogonal directions spanning the same space. So we're going to make use of QR in thinking about how to take advantage of these Krylov subspaces.

That's the space we want to work in, so let's start defining some things. In particular I want to define the Krylov matrix K_n, whose columns are b, Ab, A^2 b, ..., A^{n-1} b. This matrix is going to be really important for us as we look for solutions of Ax = b using iteration techniques, and the method we're going to use, the generalized minimal residual method, GMRES, exploits exactly this kind of structure and this kind of space. By the way, a matrix like this can be factored with a QR decomposition: K_n = Q_n R_n. Notice the sizes: n tells me how many columns I want, A is m by m, so K_n is m by n, and n is the number of iterations I take in this Krylov subspace. Gram-Schmidt orthogonalization moves me over to n orthonormal directions Q_n, times some R_n, which is just the mapping from one basis to the other. What is that actually telling us? Remember what we do in Gram-Schmidt: I have all these vectors, they're not orthogonal, they're not even normalized, and I want to build an orthonormal basis. The first column of Q_n is just the vector b, normalized; that's q_1, exactly as we did it before. How do I get q_2? I take the second vector, Ab, project out the component that's already along b (along q_1 now), keep the part that's orthogonal, and normalize it; this is where all that business with orthogonal projectors and Householder triangularization comes right back. Then I take the third vector, make it orthogonal to the two previous ones, normalize, and so forth. That's how we get Q_n.
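Here is a minimal MATLAB sketch of building that Krylov matrix and orthonormalizing it with QR; the matrix A, the vector b, and the choice n = 5 are just illustrative assumptions, not anything from the lecture.

```matlab
% Build the Krylov matrix K_n = [b, A*b, A^2*b, ..., A^(n-1)*b] and its QR factors.
m = 50;  n = 5;                      % m-by-m problem, n Krylov directions (illustrative sizes)
A = randn(m) + 5*eye(m);             % some example matrix
b = randn(m,1);
K = zeros(m,n);
K(:,1) = b;
for j = 2:n
    K(:,j) = A*K(:,j-1);             % each new column is A times the previous one
end
[Q,R] = qr(K,0);                     % economy QR: Q is m-by-n with orthonormal columns
norm(Q'*Q - eye(n))                  % should be near machine precision
```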
So this Krylov subspace is sitting there, and it turns out it's going to allow us to construct a recursion relationship, which I'll show you in a little bit, and that recursion gives us a very fast search algorithm for the best solution to Ax = b through an iterative procedure. Krylov subspaces show up everywhere: any time you talk about iterative solves of Ax = b, you're almost always going to talk about Krylov subspaces.

All right, with that defined, let's turn our attention to this generalized minimal residual method. What is it, and how do we operate it? Remember what we want to do: solve Ax = b. That's almost always what we want to do, right? All the decompositions this whole quarter come back to solving this. The decompositions also tell us more: the eigenvalue/eigenvector decomposition and the SVD tell us things about the matrix A, like its spectrum, its singular values, important directions, unitary structure. But at the end of the day they usually all come back to solving Ax = b. Here A is m by m and b is m by 1. Now suppose this actually has a solution; call it x*, so x* = A^{-1} b. That's the ground truth. The question is how to compute it, because you wouldn't actually form the inverse; it's super expensive. The entire goal here is not "can I get a solution" but "how fast can I get a solution." Hopefully everybody understands that the whole game in a lot of this linear algebra is not whether you can get a solution, it's always about how fast you can do it: computational speed. Remember the chart I showed you: it took a 10^9 increase in hardware to get a 10^3 increase in matrix size, and that's a really tough growth rate if we have to pay 10^9 in hardware for every 10^3 in problem size. We're trying to beat that. What we'd love is that if you increased your hardware by 10^9, your matrices went up by 10^9 too; that would be fantastic. So we're trying to find methods that just get us the solution and do it faster, and computing that inverse is about the slowest way you could possibly solve this.

Okay, so GMRES, the generalized minimal residual method. What it is is pretty simple: we start an iterative procedure where, at every iteration, we minimize the residual.
What do I mean by that? I want Ax = b, or Ax - b = 0, but when I plug in a guess x_n, that isn't satisfied. The residual, called r_n, is r_n = b - A x_n. So there you go: if you have the right solution, if x_n is x*, then the residual is zero. In the meantime, remember how these iteration procedures work: you give it a guess, you start with x_0, which gets you x_1, which gets you x_2, and what you'd like is an update rule for the x_n so that the iteration converges to the solution x*. The key idea is that x_n is going to live in the Krylov subspace; that's where the Krylov subspace comes in, it's my subspace for representing each guess, and I'm going to pick x_n through a least-squares fit. I start with x_0 and go to x_1, which has to live in the first Krylov subspace K_1; when I go to x_2, it can live in K_2, spanned by b and Ab; and so forth. Each time I iterate, I get to broaden my Krylov subspace: first the guess can only live in K_1, then K_2, then K_3. So this is some n-dimensional space trying to represent the solution of an m-by-m problem; the subspace only spans n of the m dimensions. That's the main idea. The question then comes down to this: given that x_n lives in the Krylov subspace, how do I determine the x_n at each iteration that minimizes the residual at each step? If I start with x_0 I get some number; as I go to x_1, x_2, this should get smaller and smaller, the residuals just start to decrease, and you stop when you get close enough to the real solution. That's the main idea behind GMRES.

All right, let's start setting this up so we can solve it. We're solving Ax = b, and I'm trying to play around with this x, so I want to ask: what happens when I multiply the matrix A against the Krylov matrix? In other words, A acting on K_n, whose columns are b, Ab, A^2 b, and so on, becomes the matrix with columns Ab, A^2 b, ..., A^n b. And what I'd really like to do is minimize over vectors x that live in this Krylov subspace, which are linear combinations of those vectors b, Ab, and so on. One way to think about it: x_n is a linear combination of the Krylov subspace vectors, x_n = K_n c, where c tells me how much of each column I need to represent x_n. The question is how to pick c: I want to pick c to minimize the residual.
Another way to write it: since K_n c is my x_n, what I really want to do is minimize || A K_n c - b || over the vector c. That's a least-squares fit right there. What c is doing is saying: if I'm going to represent x_n, I pick c, the coefficients of the linear combination of Krylov subspace vectors. I have b, I have A, I have K_n; I have everything except c. I want to determine those coefficients, the projections onto the columns of K_n, such that this L2 norm is minimized. That's the high-level idea of the generalized minimal residual method: at each step you just minimize the residual.

Now, we could work directly in this Krylov matrix K_n, but what's the problem with that? K_n is a bunch of columns, b, Ab, up to A^{n-1} b, and none of them are necessarily orthogonal to each other; they're not even normalized. They're just a set of vectors. You can clearly take a linear combination of them, but hopefully we've started to learn in this class that it is much more advantageous to work with orthonormal vectors. That's the power of the SVD, which gives you U and V, unitary transformations, and it's also the power of the QR decomposition, which gives you a unitary transformation times an upper triangular matrix. We used those to great effect because they're very easy to manipulate, and we want to do the same thing here. What I mean is: instead of working directly with the Krylov matrix, we know it has a QR decomposition, so instead of using those columns as basis elements, why not use the columns of Q_n? They're orthonormal, and that's a much better matrix to work with than K_n; but Q_n comes directly from the Krylov subspace, so it's still a representation of the same space. So instead of writing x_n = K_n c, I write x_n = Q_n y, and y is what I try to determine instead of c. Remember, c was the loadings onto the Krylov columns; y is the same idea, but onto an orthonormal basis, which is much easier to work with. So at the end of the day, you trade K_n c for Q_n y as your representation of x_n.

By the way, there are a lot of pieces to this, and I should also say: I'm only doing a very superficial treatment, scratching the surface of iterative methods. If you look at Trefethen's book there's a whole lot more, and there are entire books on iterative techniques; one by Anne Greenbaum is probably the best in the field, all about iterative solvers for linear systems, and that's a whole field in itself. We're just covering some high-level concepts because we have so much else to talk about in the course, so I'm going to sketch out some of the important things here.
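As a rough illustration of that trade between the raw Krylov basis and its orthonormalized version, here is a hedged MATLAB sketch of the least-squares step; this is the naive, form-the-basis-and-solve version, not the actual GMRES recursion developed below, and it assumes the A, b, K, and Q variables from the earlier Krylov/QR sketch.

```matlab
% Naive residual minimization over the Krylov subspace (illustration only).
% Assumes A (m-by-m), b, K (Krylov matrix), and Q (its orthonormalized columns)
% from the earlier sketch.
c  = (A*K) \ b;          % least squares: minimize || A*K*c - b ||
xc = K*c;                % x_n represented in the raw Krylov basis
y  = (A*Q) \ b;          % same fit, but over the orthonormal basis Q
xq = Q*y;                % x_n = Q_n * y
[norm(b - A*xc), norm(b - A*xq)]   % the two residual norms agree (up to conditioning of K)
```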
And if you get very interested in this stuff, there's a lot more out there to follow up on.

A quick question from the class: "Does the fact that R is missing affect which set of columns you're selecting? Instead of saying x is K_n c, you're now using x = Q_n y without considering R at all." Right, I don't care about R. All I care about are these orthonormal vectors; Q_n spans that subspace, and that's all I care about. I don't really care how I got there; once I have them, I use them.

All right, we'll do a lot of name-dropping today; it's just one of those lectures. The next name to drop is Arnoldi: Arnoldi iterations (not Arnold Schwarzenegger, though they're about as powerful). This is the workhorse for doing these iterations, and it turns out it underlies a lot of the fundamental ideas of these iteration techniques. Now we come back to ideas we already learned with eigenvalue/eigenvector algorithms, where we transformed the matrix and then used an iterative procedure to find the eigenvalues and eigenvectors. In particular, we can write a matrix A in self-similar form as A = Q H Q*, where Q is unitary and H is, not Hermitian, sorry, upper Hessenberg. This is a full similarity transformation into a complete orthonormal set of vectors. In fact, putting A into Hessenberg form is one of the first tricks for eigenvalue/eigenvector decompositions: if you remember the algorithms we talked about, the idea was to first put the matrix in Hessenberg form and then finish off the eigenvalues with iterations. Here we just use this form, which tells us such a decomposition exists, and this is where we make the connection to partial decompositions: if the matrix A is massive, I don't want to do the full decomposition. What I'd really like is to start doing the decomposition and stop wherever I want, after n iterations in. So what does that look like?
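Before truncating it, here is a quick numerical sanity check of that full Hessenberg similarity form in MATLAB; the built-in hess is used only for illustration of the full factorization, and is not the partial Arnoldi construction built next.

```matlab
% Full similarity transformation to upper Hessenberg form: A = Q*H*Q'.
A = randn(6);              % small example matrix
[Q,H] = hess(A);           % Q unitary, H upper Hessenberg
norm(A - Q*H*Q')           % near machine precision
norm(tril(H,-2))           % everything below the first subdiagonal is zero
```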
First of all, I can rewrite the full decomposition by multiplying on the right-hand side by Q, giving A Q = Q H. What I really want to look at is how I would get just the first n columns of that Q. It turns out, if you do a little bit of linear algebra (I won't do it here), you find this interesting relationship for the partial decomposition:

A Q_n = Q_{n+1} H_n,

where Q_n contains only the first n columns and H_n is an (n+1)-by-n upper Hessenberg matrix: its first column is h_11, h_21, 0, ..., 0, and in general column j has entries h_{1j} down to h_{j+1,j}, with everything below that being zero. Here's the important thing: because H_n has this form, it creates a recursion relationship between Q_n and Q_{n+1}. Working out the n-th column of that matrix equation,

A q_n = h_{1n} q_1 + h_{2n} q_2 + ... + h_{nn} q_n + h_{n+1,n} q_{n+1},

and this is a really important recursion formula; it's the basis of these Arnoldi iterations. In other words, once I have q_1 through q_n, I can get q_{n+1}; once I have q_{n+1}, I can get q_{n+2}; once I have q_{n+2}, I can get q_{n+3}. I have a connection formula that tells me how step n connects to step n+1. In some sense this is a different version of Gram-Schmidt, right? Gram-Schmidt started with q_1, which let you get q_2; once you have q_1 and q_2, you get q_3; once you have q_1, q_2, q_3, you get q_4. Same kind of idea here: I'm building the basis recursively, and this recursion comes from the similarity transformation to Hessenberg form. (Remember, in the eigenvalue/eigenvector lectures we talked about how to put any matrix into Hessenberg form in a principled way and then finish it off with iterations.) So that's the main idea behind Arnoldi: I now have a recursion relationship that lets me iterate from Q_n to Q_{n+1}.

Now, how do we bring Arnoldi back into GMRES? Remember what we wrote down for GMRES: we're trying to minimize the residual, minimizing || A Q_n y - b || over y. But A Q_n is exactly Q_{n+1} H_n, so this becomes: minimize || Q_{n+1} H_n y - b || over y. Now it turns out both Q_{n+1} H_n y and b live in the column space of Q_{n+1}, so I can multiply both sides by Q_{n+1}*; because those columns are orthonormal, this doesn't change the norm of the thing I'm minimizing. Remember, what I'm trying to solve for is this y. I can compute everything here, H_n is already in upper Hessenberg form, and it sets me up very nicely for a least-squares fit. There are a lot of steps I'm skipping because I just want to get to the punch line, but this is the idea. As I said previously with eigenvalues, iteration techniques first make a transformation into some favorable form, like Hessenberg form, and then iterate; we talked about that for eigenvalues and eigenvectors, and notice I've just done it here for solving Ax = b. The only other thing to point out is that Q_{n+1}* b is just ||b|| times e_1, the vector (1, 0, 0, ..., 0), because q_1 is b normalized. So, skipping a lot of algebra (that's just the way it's going to go today), the problem becomes: minimize || H_n y - ||b|| e_1 || with respect to y.
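Here is a minimal MATLAB sketch of that Arnoldi recursion: one step that takes the current orthonormal columns and produces q_{n+1} plus the n-th column of the Hessenberg matrix. The function and variable names are mine (it would live in its own file, say arnoldi_step.m), and there is no handling of the breakdown case h_{n+1,n} = 0.

```matlab
function [Q, H] = arnoldi_step(A, Q, H, n)
% One Arnoldi step: given Q(:,1:n) with orthonormal columns, compute Q(:,n+1)
% and H(1:n+1,n) so that A*Q(:,1:n) = Q(:,1:n+1)*H(1:n+1,1:n).
    v = A*Q(:,n);                    % hit the latest direction with A
    for j = 1:n
        H(j,n) = Q(:,j)'*v;          % projection onto the existing directions
        v = v - H(j,n)*Q(:,j);       % subtract it off, Gram-Schmidt style
    end
    H(n+1,n) = norm(v);              % length of whatever is left over
    Q(:,n+1) = v / H(n+1,n);         % normalize to get q_{n+1}
end
```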
That is your generalized minimal residual algorithm. So let's talk about how we do it in practice, as an algorithm. Your first direction is b; this is just like Gram-Schmidt. Remember, I'm working with the Krylov subspace spanned by b, Ab, and so forth up to A^{n-1} b, and the way the QR on this starts is: pick the first vector, b, normalize it, and that's q_1. Now what do I do? I have to go get all the other vectors and start the process: for n = 1, 2, 3, and so forth, first do an Arnoldi step, which is the recursion formula I showed you relating A q_n to q_1 through q_{n+1}; that's what gives you your next direction q_{n+1} (and the next column of H_n). Once you have that, do the minimization over y of || H_n y - ||b|| e_1 ||, and then your x_n is Q_n y; having solved for y, your next x_n is just the projection back onto those orthogonal directions. Then repeat: you come back up, the recursion connects you to the next direction q_{n+2}, you solve again, and you just keep recursing. That's the generalized minimal residual method. You see it's kind of fancy, but if you spend some time with it, it's all there; the buildup to this GMRES material is three or four lectures in Trefethen, and I compressed it down to one, sorry about that, but all the details are in the book, and my feeling was I wanted to cover other material instead of getting bogged down here. In the next lecture I'm going to show you the other algorithm, the stabilized biconjugate gradient method; it's another algorithmic structure, built around conjugate gradient descent, and I'll show you that, and then we'll actually run these in MATLAB and see how they work. MATLAB has all of this programmed in, this algorithm is in there, so you can solve a lot of these Ax = b problems with just one line of code, and here's the line of code, ready: x = gmres(A, b). Simple as that.
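Putting the pieces above together, here is a hedged sketch of a bare-bones GMRES loop in MATLAB (no restarts, no Givens-rotation updates of the least-squares problem, no breakdown handling; just the Arnoldi step plus the small least-squares solve described in the lecture), followed by the built-in one-liner the lecture ends with. The example matrix, tolerance, and iteration cap are illustrative assumptions.

```matlab
% Bare-bones GMRES sketch (illustration of the algorithm above, not production code).
m = 200;
A = randn(m)/sqrt(m) + 2*eye(m);     % example system with spectrum safely away from zero
b = randn(m,1);
nmax = 30;  tol = 1e-8;
Q = zeros(m, nmax+1);  H = zeros(nmax+1, nmax);
Q(:,1) = b / norm(b);                % first direction: b, normalized
for n = 1:nmax
    v = A*Q(:,n);                    % Arnoldi step: new Krylov direction
    for j = 1:n
        H(j,n) = Q(:,j)'*v;
        v = v - H(j,n)*Q(:,j);
    end
    H(n+1,n) = norm(v);
    Q(:,n+1) = v / H(n+1,n);
    e1 = zeros(n+1,1);  e1(1) = norm(b);
    y = H(1:n+1,1:n) \ e1;           % least squares: minimize || H_n*y - ||b||*e1 ||
    x = Q(:,1:n)*y;                  % x_n = Q_n * y
    if norm(b - A*x) < tol, break; end
end
norm(b - A*x)                        % final residual

% Or just use MATLAB's built-in (no restart, matching tolerance and iteration cap):
x_builtin = gmres(A, b, [], 1e-8, 30);
```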
Info
Channel: Nathan Kutz
Views: 2,377
Rating: 4.9058824 out of 5
Keywords: numerical methods, linear algebra, GMRES, iterations methods, Arnoldi iterations, Krylov, Krylov subspace, kutz
Id: pPoLsRkBha4
Length: 49min 10sec (2950 seconds)
Published: Mon Nov 23 2020