How (and why) to raise e to the power of a matrix | DE6

Video Statistics and Information

Reddit Comments

You expect me to believe 3b1b actually posted a video? I'm no April Fool OP; I will not click on your link.

👍︎︎ 240 👤︎︎ u/seanziewonzie 📅︎︎ Apr 01 2021 🗫︎ replies

Ahh! Dude, I've encountered this a number of times in my controls coursework, and I just sort of pressed the "I believe" button and went on using it. I asked my instructors a few times, but never got a satisfactory intuitive answer. And I don't think it was covered in DiffEQ or LA for me.

We don't deserve Grant.

Edit: omg he's in the thread.....! This must be what people feel like when they see movie stars.

👍︎︎ 26 👤︎︎ u/Shitty-Coriolis 📅︎︎ Apr 01 2021 🗫︎ replies

The return of the King

👍︎︎ 11 👤︎︎ u/CrosSeaX 📅︎︎ Apr 01 2021 🗫︎ replies

Why can't all educators be like Grant? This is just gorgeous.

👍︎︎ 103 👤︎︎ u/jmcsquared 📅︎︎ Apr 01 2021 🗫︎ replies

What is Grant hinting at when he mentions e^(d/dx)?

👍︎︎ 20 👤︎︎ u/sbcloatitr 📅︎︎ Apr 01 2021 🗫︎ replies

I took an entire class on representations of Lie groups and the exponential map still scares the shit out of me

👍︎︎ 38 👤︎︎ u/SuperDaniel14 📅︎︎ Apr 01 2021 🗫︎ replies

An example I've used to teach matrix exponentiation before:

A man stands facing North at a particular spot in a field.

He walks forward 100 meters, then turns 360 degrees clockwise.

He returns to start (facing North again), then walks 50 meters, turns 180 degrees clockwise, walks 50 meters, turns 180 degrees clockwise.

That is, for some n, he walks forward 100/n meters, then turns 360/n degrees clockwise, and repeats this n times.

What shape does he trace out for n=3? n=4? n=5? What about for very large n?

As the person I was teaching was already familiar with how translation and rotation are handled in linear algebra (homogeneous coordinates, etc.), it sufficed to remind them that e^x can be defined as the limit of (1 + x/n)^n as n grows, or in this case, (I + A/n)^n.
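A minimal NumPy sketch of that limit (an added illustration, not part of the original comment), with A chosen as the generator of a quarter turn:

    import numpy as np

    # A generates a 90-degree rotation; (I + A/n)^n is "walk a little,
    # turn a little," repeated n times.
    theta = np.pi / 2
    A = theta * np.array([[0.0, -1.0],
                          [1.0,  0.0]])
    I = np.eye(2)

    for n in [4, 64, 1024]:
        print(n, np.linalg.matrix_power(I + A / n, n).round(4))

    # For large n, the product approaches the true rotation matrix:
    print(np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]]).round(4))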

👍︎︎ 6 👤︎︎ u/The_Northern_Light 📅︎︎ Apr 01 2021 🗫︎ replies

I had been wondering what happened to the differential equations series! Now that I’m actually taking the class in college I was really missing it, glad he didn’t forget about it

👍︎︎ 5 👤︎︎ u/Zebratonagus 📅︎︎ Apr 01 2021 🗫︎ replies

this channel is awesome!!

👍︎︎ 5 👤︎︎ u/miguel101_ 📅︎︎ Apr 01 2021 🗫︎ replies
Captions
Let me pull out an old differential equations textbook I learned from in college, and let's turn to this funny little exercise in here that asks the reader to compute e to the power of At, where A, we're told, is going to be a matrix, and the insinuation seems to be that the result will also be a matrix. It then offers several examples for what you might plug in for A.

Taken out of context, putting a matrix in an exponent like this probably seems like total nonsense, but what it refers to is an extremely beautiful operation, and the reason it shows up in this book is that it's useful: it's used to solve a very important class of differential equations. In turn, given that the universe is often written in the language of differential equations, you see it pop up in physics all the time too, especially in quantum mechanics, where matrix exponents are littered all over the place and play a particularly prominent role. This has a lot to do with Schrödinger's equation, which we'll touch on a bit later. It may also help in understanding your romantic relationships, but again, all in due time.

A big part of the reason I want to cover this topic is that there's an extremely nice way to visualize what matrix exponents are actually doing using flow, one that not a lot of people seem to talk about. But for the bulk of this chapter, let's start by laying out what exactly the operation is, and then see if we can get a feel for what kinds of problems it helps us to solve.

The first thing you should know is that this is not some bizarre way to multiply the constant e by itself multiple times; you would be right to call that nonsense. The actual definition is related to a certain infinite polynomial describing real-number powers of e, what we call its Taylor series: e^x = 1 + x + x^2/2! + x^3/3! + ... For example, if I took the number 2 and plugged it into this polynomial, then as you add more and more terms, each of which looks like some power of 2 divided by some factorial, the sum approaches a number near 7.389, and this number is precisely e*e. If you increment this input by 1, then somewhat miraculously, no matter where you started from, the effect on the output is always to multiply it by another factor of e.

For reasons that you're going to see in a bit, mathematicians became interested in plugging all kinds of things into this polynomial, things like complex numbers and, for our purposes today, matrices, even when those objects do not immediately make sense as exponents.

What some authors do is give this infinite polynomial the name "exp" when you plug in more exotic inputs. It's a gentle nod to the connection this has to exponential functions in the case of real numbers, even though obviously these inputs don't make sense as exponents. However, an equally common convention is to give a much less gentle nod to the connection and just abbreviate the whole thing as e to the power of whatever object you're plugging in, whether that's a complex number, a matrix, or all sorts of more exotic objects. So while this equation is a theorem for real numbers, it's a definition for more exotic inputs. Cynically, you could call this a blatant abuse of notation. More charitably, you might view it as an example of the beautiful cycle between discovery and invention in math.

In either case, plugging in a matrix, even to a polynomial, might seem a little strange, so let's be clear on what we mean here.
The matrix needs to have the same number of rows and columns. That way, you can multiply it by itself according to the usual rules of matrix multiplication; this is what we mean by squaring it. Similarly, if you were to take that result and then multiply it by the original matrix again, this is what we mean by cubing the matrix. If you carry on like this, you can take any whole-number power of a matrix; it's perfectly sensible. In this context, powers still mean exactly what you'd expect: repeated multiplication.

Each term of this polynomial is scaled by 1 divided by some factorial, and with matrices, all that means is that you multiply each component by that number. Likewise, it always makes sense to add two matrices; this is something you again do term by term. The astute among you might ask how sensible it is to take this out to infinity, which would be a great question, one that I'm largely going to postpone the answer to, but I can show you one pretty fun example here and now.

Take this 2x2 matrix that has -π and π sitting off its diagonal. Let's see what the sum gives. The first term is the identity matrix; this is actually what we mean, by definition, when we raise a matrix to the 0th power. Then we add in the matrix itself, which gives us the pi-off-the-diagonal terms, and then add half of the matrix squared. Continuing on, I'll have the computer keep adding more and more terms, each of which requires taking one more matrix product to get a new power and adding it to a running tally. As it keeps going, it seems to be approaching a stable value, which is around -1 times the identity matrix. In this sense, we say the sum equals that negative identity.

By the end of this video, my hope is that this particular fact comes to make total sense to you. For any of you familiar with Euler's famous identity, this is essentially the matrix version of that. It turns out that in general, no matter what matrix you start with, as you add more and more terms you eventually approach some stable value, though sometimes it can take quite a while before you get there.

Just seeing the definition like this, in isolation, raises all kinds of questions. Most notably, why would mathematicians and physicists be interested in torturing their poor matrices this way? What problems are they trying to solve? And if you're anything like me, a new operation is only satisfying when you have a clear view of what it's trying to do, some sense of how to predict the output based on the input before you actually crunch the numbers. How on earth could you have predicted that the matrix with pi off the diagonals results in the negative identity matrix like this?

Often in math, you should view a definition not as a starting point, but as a target. Contrary to the structure of textbooks, mathematicians do not start by making definitions, then listing a lot of theorems, proving them, and showing some examples. The process of discovering math typically goes the other way around: they start by chewing on specific problems, then generalizing those problems, then coming up with constructs that might be helpful in those general cases, and only then do they write down a new definition, or extend an old one.

As to what sorts of specific examples might motivate matrix exponents, two come to mind: one involving relationships, and the other quantum mechanics. Let's start with relationships.
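That partial-sum experiment is easy to reproduce numerically. Here is a minimal sketch in Python with NumPy (an added illustration, not from the video itself):

    import numpy as np

    # The 2x2 matrix with -pi and pi sitting off its diagonal.
    A = np.array([[0.0,  -np.pi],
                  [np.pi, 0.0]])

    partial_sum = np.zeros((2, 2))
    power = np.eye(2)      # A^0 is the identity matrix, by definition
    factorial = 1.0
    for k in range(30):
        partial_sum += power / factorial   # add the term A^k / k!
        power = power @ A                  # next whole power of A
        factorial *= k + 1

    print(partial_sum.round(6))   # approximately -1 times the identity

Running it prints a matrix within rounding error of [[-1, 0], [0, -1]].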
Suppose we have two lovers; let's call them Romeo and Juliet. Let x represent Juliet's love for Romeo, and y represent his love for her, both of which are going to be values that change with time. This is an example we actually touched on in chapter 1, based on a Steven Strogatz article, but it's okay if you didn't see that.

The way their relationship works is that the rate at which Juliet's love for Romeo changes, the derivative of this value, is equal to -1 times Romeo's love for her: dx/dt = -y. In other words, when Romeo is expressing cool disinterest, that's when Juliet's feelings actually increase, whereas if he becomes too infatuated, her interest will start to fade. Romeo, on the other hand, is the opposite: the rate of change of his love is equal to Juliet's love, dy/dt = x. So while Juliet is mad at him, his affections tend to decrease, whereas if she loves him, that's when his feelings grow.

Of course, neither one of these numbers is holding still; as Romeo's love increases in response to Juliet, her equation continues to apply and drives her love down. Both of these equations always apply, from each infinitesimal point in time to the next, so every slight change to one value immediately influences the rate of change of the other.

This is a system of differential equations. It's a puzzle, where your challenge is to find explicit functions for x(t) and y(t) that make both of these expressions true. As systems of differential equations go, this one is on the simpler side, enough so that many calculus students could probably just guess at an answer. But keep in mind, it's not enough to find some pair of functions that makes this true; if you want to actually predict where Romeo and Juliet end up after some starting point, you have to make sure your functions match the initial set of conditions at time t = 0. More to the point, our actual goal is to systematically solve more general versions of this equation, without guessing and checking, and it's that question that leads us to matrix exponents.

Very often when you have multiple changing values like this, it's helpful to package them together as coordinates of a single point in a higher-dimensional space. So for Romeo and Juliet, think of their relationship as a point in a 2d space, the x-coordinate capturing Juliet's feelings and the y-coordinate capturing Romeo's. Sometimes it's helpful to picture this state as an arrow from the origin, other times just as a point; all that really matters is that it encodes two numbers, and moving forward we'll be writing it as a column vector. Of course, this is all a function of time.

You might picture the rate of change of the state, the thing that packages together the derivative of x and the derivative of y, as a kind of velocity vector in this state space, something that tugs on our vector in some direction, and with some magnitude indicating how quickly it's changing.

Remember, the rule here is that the rate of change of x is -y, and the rate of change of y is x. Set up as vectors like this, we can rewrite the right-hand side of this equation as a product of the matrix [[0, -1], [1, 0]] with the original vector [x, y]. The top row encodes Juliet's rule, and the bottom row encodes Romeo's rule. So what we have here is a differential equation telling us that the rate of change of some vector is equal to a certain matrix times itself.
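Seeing this system run numerically can make the claim concrete. A minimal sketch using crude forward-Euler integration (an added illustration, not from the video):

    import numpy as np

    # d/dt [x, y] = M [x, y]: the top row encodes Juliet's rule
    # (dx/dt = -y), the bottom row Romeo's (dy/dt = x).
    M = np.array([[0.0, -1.0],
                  [1.0,  0.0]])

    state = np.array([1.0, 0.0])   # a hypothetical starting condition
    dt = 0.001
    for _ in range(int(2 * np.pi / dt)):
        state = state + dt * (M @ state)   # step along the rate of change

    print(state.round(2))   # back near [1, 0]: one full loop

The state traces a circle and returns near its starting point after 2π units of time, which is exactly what the geometry coming up predicts.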
In a moment we'll talk about how matrix exponentiation solves this kind of equation, but before that, let me show you a simpler way to solve this particular system, one that uses pure geometry and helps set the stage for visualizing matrix exponents a bit later.

The matrix from our system is a 90-degree rotation matrix. For any of you a bit rusty on how to think of matrices as transformations, there's a video all about it on this channel, a series really. The basic idea is that when you multiply a matrix by the vector [1, 0], it pulls out the first column, and similarly, multiplying it by [0, 1] pulls out the second column. What this means is that when you look at a matrix, you can read its columns as telling you what it does to these two vectors, known as the basis vectors. The way it acts on any other vector is a result of scaling and adding these two basis results by that vector's coordinates.

So, looking back at the matrix from our system, notice how from its columns we can tell it takes the first basis vector to [0, 1] and the second to [-1, 0], hence why I'm calling it the 90-degree rotation matrix. What it means for our equation is that wherever Romeo and Juliet are in this state space, their rate of change has to look like a 90-degree rotation of their position vector. The only way velocity can be permanently perpendicular to position like this is when the state rotates about the origin in circular motion, never growing nor shrinking, because the rate of change has no component in the direction of the position.

More specifically, since the length of this velocity vector equals the length of the position vector, then for each unit of time, the distance covered is equal to one radius's worth of arc length along that circle. In other words, it rotates at one radian per unit time; so in particular, it would take 2π units of time to make a full revolution.

If you want to describe this kind of rotation with a formula, we can use a more general rotation matrix, which looks like this: [[cos(t), -sin(t)], [sin(t), cos(t)]]. Again, we can read it in terms of its columns. Notice how the first column tells us that it takes the first basis vector to [cos(t), sin(t)], and the second column tells us that it takes the second basis vector to [-sin(t), cos(t)], both of which are consistent with rotating by t radians.

So to solve the system, if you want to predict where Romeo and Juliet end up after t units of time, you can multiply this matrix by their initial state. The active viewers among you may also enjoy taking a moment to pause and confirm that the explicit formulas you get for x(t) and y(t) really do satisfy the system of differential equations we started with.

The mathematician in you might wonder if it's possible to solve not just this specific system, but equations like it for any matrix. To ask this question is to set yourself up to rediscover matrix exponents. The main goal for today is for you to understand how this equation lets you intuitively picture the operation we write as e raised to a matrix, and, on the flip side, how being able to compute matrix exponents lets you explicitly solve the equation.

A much less whimsical example is Schrödinger's famous equation, which is the fundamental equation describing how systems in quantum mechanics change over time.
It looks pretty intimidating, and, I mean, it's quantum mechanics, so of course it will, but it's actually not that different from the Romeo-Juliet setup. This symbol here refers to a certain vector, one that packages together all the information you might care about in a system, like the various particles' positions and momenta. It's analogous to our simpler 2d vector that encoded all the information about Romeo and Juliet. The equation says that the rate at which this state vector changes looks like a certain matrix times itself.

There are a number of things making Schrödinger's equation notably more complicated, but in the back of your mind you might think of it as a target point that you and I can build up to, with simpler examples like Romeo and Juliet offering more friendly stepping stones along the way.

Actually, the simplest example, which is tied to ordinary real-number powers of e, is the one-dimensional case. This is when you have a single changing value whose rate of change equals some constant times itself: dx/dt = r·x. So the bigger the value, the faster it grows. Most people are more comfortable visualizing this with a graph, where the higher the value of the graph, the steeper its slope, resulting in an ever-steepening upward curve. Just keep in mind that when we get to higher-dimensional variants, graphs are a lot less helpful.

This is a highly important equation in its own right. It's a very powerful concept when the rate of change of a value is proportional to the value itself. This is the equation governing things like compound interest, the early stages of population growth before the effects of limited resources kick in, or the early stages of an epidemic while most of the population is susceptible.

Calculus students all learn that the derivative of e^(rt) is r times itself. In other words, this self-reinforcing growth is the same thing as exponential growth, and e^(rt) solves this equation. Actually, a better way to think about it is that there are many different solutions to this equation, one for each initial condition, something like an initial investment size or an initial population, which we'll just call x0. Notice, by the way, how the higher the value of x0, the higher the initial slope of the resulting solution, which should make complete sense given the equation.

The function e^(rt) is just the solution when the initial condition is 1. But if you multiply by any other initial condition, you get a new function, x0·e^(rt), which still satisfies the property: it still has a derivative that is r times itself, but this time it starts at x0, since e^0 is 1. This is worth highlighting before we generalize to more dimensions: do not think of the exponential part as a solution in and of itself; think of it as something that acts on an initial condition in order to give a solution.

You see, up in the two-dimensional case, when we have a changing vector whose rate of change is constrained to be some matrix times itself, the solution also looks like an exponential term acting on a given initial condition. But the exponential part, in that case, will produce a matrix that changes with time, and the initial condition is a vector. In fact, you should think of the definition of matrix exponentiation as being heavily motivated by making sure this fact holds true.
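In code, the one-dimensional case is tiny. A minimal sketch (the values of r and x0 are arbitrary, chosen for illustration):

    import numpy as np

    r, x0 = 0.3, 2.0                 # arbitrary growth rate, initial value
    t = np.linspace(0.0, 5.0, 6)
    x = x0 * np.exp(r * t)           # the exponential acting on x0

    # Check the defining property dx/dt = r * x by finite differences:
    h = 1e-6
    dxdt = (x0 * np.exp(r * (t + h)) - x0 * np.exp(r * (t - h))) / (2 * h)
    print(np.allclose(dxdt, r * x))  # True

The exponential acts on the initial condition x0 to give a solution, and that shape is exactly what carries over to matrices.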
For example, if we look back at the system that popped up with Romeo and Juliet, the claim now is that solutions look like e raised to this [[0, -1], [1, 0]] matrix times t, all multiplied by some initial condition. But we've already seen the solution in this case; we know it looks like a rotation matrix times the initial condition. So let's take a moment to roll up our sleeves, compute the exponential term using the definition I mentioned at the start, and see if it lines up.

Remember, writing e to the power of a matrix is a shorthand for plugging that matrix into the long infinite polynomial, the Taylor series of e^x. I know it might seem complicated to do this, but trust me, it's very satisfying how this particular one turns out. If you actually sit down and compute successive powers of this matrix, what you'll notice is that they fall into a cycling pattern every four iterations. This should make sense, given that we know it's a 90-degree rotation matrix. So when you add together all infinitely many matrices term by term, each entry in the result looks like a polynomial in t with some nice cycling pattern in its coefficients, all of them scaled by the relevant factorials. Those of you who are savvy with Taylor series might recognize that each one of these entries is the Taylor series for either sine or cosine, though in the top-right corner's case it's actually -sin(t). So what we get from the computation is exactly the rotation matrix we had before!

To me, this is extremely beautiful. We have two completely different ways of reasoning about the same system, and they give us the same answer. I mean, it's reassuring that they do, but it is wild just how different the modes of thought are when you're chugging through the polynomial versus when you're geometrically reasoning about what a velocity perpendicular to a position must imply. Hopefully the fact that these line up inspires a little confidence in the claim that matrix exponents really do solve systems like this.

This explains the computation we saw at the start, by the way, with the matrix that had -π and π off the diagonal, producing the negative identity. This expression exponentiates a 90-degree rotation matrix times π, which is another way to describe what the Romeo-Juliet setup does after π units of time. As we now know, that has the effect of rotating everything 180 degrees in this state space, which is the same as multiplying everything by -1.

Also, for any of you familiar with imaginary-number exponents, this example is probably ringing a ton of bells; it is 100% analogous. In fact, we could have framed the entire example with Romeo and Juliet's feelings packaged into a single complex number, and the rate of change of that complex number would have been i times itself, since multiplication by i also acts like a 90-degree rotation. The same exact line of reasoning, both analytic and geometric, would have led to the whole idea that e^(it) describes rotations.

These are actually two of many examples throughout math and physics where you find yourself exponentiating some object that acts as a 90-degree rotation, times time. It shows up with quaternions, and with many of the matrices that pop up in quantum mechanics.
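If you'd rather not sum the series by hand, SciPy's expm routine computes the matrix exponential directly, and it agrees with the rotation matrix. A minimal sketch (an added illustration, not from the video):

    import numpy as np
    from scipy.linalg import expm   # computes the matrix exponential

    M = np.array([[0.0, -1.0],
                  [1.0,  0.0]])

    for t in [0.5, 1.0, np.pi]:
        rotation = np.array([[np.cos(t), -np.sin(t)],
                             [np.sin(t),  np.cos(t)]])
        print(np.allclose(expm(M * t), rotation))   # True for each t

At t = π, this reproduces the negative-identity computation from the opening.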
In all of these cases, we have this really neat general idea: if you take some operation that rotates 90 degrees in some plane, often a plane in some high-dimensional space we can't visualize, then what we get by exponentiating that operation times time is something that generates all other rotations in that same plane.

One of the more complicated variations on this same theme is Schrödinger's equation. It's not just that it has the derivative-of-a-state-equals-some-matrix-times-that-state form; the nature of the relevant matrix here is such that the equation also describes a kind of rotation, though in many applications of Schrödinger's equation it will be a rotation inside a kind of function space. It's a little more involved, though, because typically there's a combination of many different rotations. It takes time to really dig into this equation, and I'd love to do that in a later chapter, but right now I cannot help but at least allude to the fact that this imaginary unit i, which sits so impishly in such a fundamental equation for all of the universe, is playing basically the same role as the matrix from our Romeo-Juliet example. What this i communicates is that the rate of change of a certain state is, in a sense, perpendicular to that state, and hence that the way things evolve over time will involve a kind of oscillation.

But matrix exponentiation can do so much more than just rotation. You can always visualize these sorts of differential equations using a vector field. The idea is that this equation tells us the velocity of a state is entirely determined by its position, so what we do is go to every point in this space and draw a little vector indicating what the velocity of a state must be if it passes through that point. For our type of equation, this means we go to each point v in space and attach the vector M times v.

To intuitively understand how any given initial condition will evolve, you let it flow along this field, with a velocity always matching whatever vector it's sitting on at any given point in time. So if the claim is that solutions to this equation look like e^(Mt) times some initial condition, it means you can visualize what the matrix e^(Mt) does by letting every possible initial condition flow along this field for t units of time. The transition from start to finish is described by whatever matrix pops out of the computation for e^(Mt).

In our main example with the 90-degree rotation matrix, the vector field looks like this, and as we saw, e^(Mt) describes rotation in that case, which lines up with flow along this field. As another example, a more Shakespearean Romeo and Juliet might have equations that look a little more like this (say, dx/dt = y and dy/dt = x), where Juliet's rule is symmetric with Romeo's, and both of them are inclined to get carried away in response to one another's feelings.

Again, the way the vector field you're looking at has been defined is to go to each point v in space and attach the vector M times v; this is the pictorial way of saying that the rate of change of a state must always equal M times itself. But for this example, flow along the vector field looks a lot different from how it did before. If Romeo and Juliet start anywhere in this upper-right half of the plane, their feelings will feed off of each other and they both tend towards infinity.
If they're in the other half of the plane, well, let's just say they stay more true to their Capulet and Montague family traditions.

So even before you try calculating the exponential of this particular matrix, you can already have an intuitive sense of what the answer should look like. The resulting matrix should describe the transition from time 0 to time t, which, if you look at the field, seems to indicate that it will squish along one diagonal while stretching along another, getting more extreme as t gets larger.

Of course, all of this presumes that e^(Mt) times an initial condition actually solves these systems. This is one of those facts that's easiest to believe when you work it out yourself, but I'll run through a quick rough sketch. Write out the full polynomial that defines e^(Mt), multiply by some initial condition vector on the right, and then take the derivative of this with respect to t. Because M is a constant, this just means applying the power rule to each one of the terms: the derivative of (Mt)^n/n! is M·(Mt)^(n-1)/(n-1)!, so the power rule very nicely cancels out with the factorial terms. What we're left with is an expression that looks almost identical to what we had before, except that every term has an extra M hanging on to it, which can be factored out to the left. So the derivative of this expression is M times the original expression, and hence it solves the equation. This actually sweeps under the rug some details required for rigor, mostly centered around the question of whether or not this series actually converges, but it does give the main idea.

In the next chapter, I would like to talk more about the properties this operation has, most notably its relationship with eigenvectors and eigenvalues, which leads to more concrete ways of thinking about how you actually carry out the computation, which otherwise seems insane. Also, time permitting, it might be fun to talk about what it means to raise e to the power of the derivative operator.
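That rough sketch is also easy to corroborate numerically: compare a finite-difference derivative of e^(Mt) x0 against M times e^(Mt) x0. A minimal sketch (the matrix and initial condition here are arbitrary, chosen for illustration):

    import numpy as np
    from scipy.linalg import expm

    M = np.array([[1.0,  2.0],
                  [3.0, -1.0]])      # arbitrary constant matrix
    x0 = np.array([1.0, 1.0])        # arbitrary initial condition

    t, h = 0.7, 1e-6
    # d/dt of e^(Mt) x0, approximated by a central difference...
    lhs = (expm(M * (t + h)) @ x0 - expm(M * (t - h)) @ x0) / (2 * h)
    # ...compared against M times e^(Mt) x0:
    rhs = M @ (expm(M * t) @ x0)
    print(np.allclose(lhs, rhs))     # True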
Info
Channel: 3Blue1Brown
Views: 806,636
Rating: 4.977879 out of 5
Keywords: Mathematics, three blue one brown, 3 blue 1 brown, 3b1b, 3brown1blue, 3 brown 1 blue, three brown one blue
Id: O85OWBJ2ayo
Length: 27min 7sec (1627 seconds)
Published: Thu Apr 01 2021