Matrix multiplication as composition | Chapter 4, Essence of linear algebra

Video Statistics and Information

Captions
"It is my experience that proofs involving matrices can be shortened by 50% if one throws matrices out." -- Emil Artin
Hey everyone! Where we last left off, I showed what linear transformations look like and how to represent them using matrices. This is worth a quick recap, because it's just really important. But of course, if this feels like more than just a recap, go back and watch the full video.
Technically speaking, linear transformations are functions, with vectors as inputs and vectors as outputs. But I showed last time how we can think about them visually as smooshing around space in such a way that the grid lines stay parallel and evenly spaced, and so that the origin remains fixed. The key takeaway was that a linear transformation is completely determined by where it takes the basis vectors of the space, which, for two dimensions, means i-hat and j-hat. This is because any other vector can be described as a linear combination of those basis vectors: a vector with coordinates (x, y) is x times i-hat + y times j-hat.
After going through the transformation, this property, that the grid lines remain parallel and evenly spaced, has a wonderful consequence: the place where your vector lands will be x times the transformed version of i-hat + y times the transformed version of j-hat. This means that if you keep a record of the coordinates where i-hat lands and the coordinates where j-hat lands, you can compute that a vector which starts at (x, y) must land on x times the new coordinates of i-hat + y times the new coordinates of j-hat.
The convention is to record the coordinates of where i-hat and j-hat land as the columns of a matrix, and to define this sum of those columns, scaled by x and y, to be matrix-vector multiplication. In this way, a matrix represents a specific linear transformation, and multiplying a matrix by a vector is what it means, computationally, to apply that transformation to that vector. Alright, recap over. On to the new stuff.
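As a quick check of the recap, here is a minimal Python sketch of matrix-vector multiplication as "x times the first column plus y times the second column". The function name `apply` and the choice to store a 2x2 matrix as a pair of columns are assumptions made for illustration; neither comes from the video.

```python
# Assumption for illustration: a 2x2 matrix is stored as a pair of columns,
# (where i-hat lands, where j-hat lands).

def apply(matrix, vector):
    """Return x * (first column) + y * (second column)."""
    (a, c), (b, d) = matrix  # i-hat lands on (a, c), j-hat lands on (b, d)
    x, y = vector
    return (x * a + y * b, x * c + y * d)

# Example matrix: a 90-degree counterclockwise rotation.
rotation = ((0, 1), (-1, 0))  # i-hat -> (0, 1), j-hat -> (-1, 0)

print(apply(rotation, (1, 0)))  # i-hat lands on the first column: (0, 1)
print(apply(rotation, (3, 2)))  # 3 * (0, 1) + 2 * (-1, 0) = (-2, 3)
```

Storing columns rather than rows keeps the code close to the "where do i-hat and j-hat land" picture, at the cost of differing from the usual row-major layout.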
Oftentimes you find yourself wanting to describe the effect of applying one transformation and then another. For example, maybe you want to describe what happens when you first rotate the plane 90° counterclockwise, then apply a shear. The overall effect here, from start to finish, is another linear transformation, distinct from the rotation and the shear. This new linear transformation is commonly called the “composition” of the two separate transformations we applied. And like any linear transformation, it can be described with a matrix all of its own, by following i-hat and j-hat. In this example, the ultimate landing spot for i-hat after both transformations is (1, 1), so let's make that the first column of the matrix. Likewise, j-hat ultimately ends up at the location (-1, 0), so we make that the second column of the matrix. This new matrix captures the overall effect of applying a rotation then a shear, but as one single action, rather than two successive ones.
Here's one way to think about that new matrix: if you were to take some vector and pump it through the rotation, then the shear, the long way to compute where it ends up is to first multiply it on the left by the rotation matrix, then take whatever you get and multiply that on the left by the shear matrix. This is, numerically speaking, what it means to apply a rotation then a shear to a given vector. But whatever you get should be the same as just multiplying that same vector by the new composition matrix that we just found, no matter what vector you chose, since this new matrix is supposed to capture the same overall effect as the rotation-then-shear action. Based on how things are written down here, I think it's reasonable to call this new matrix the “product” of the original two matrices. Don't you? We can think about how to compute that product more generally in just a moment, but it's way too easy to get lost in the forest of numbers.
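The rotation-then-shear example can be sketched in Python by following i-hat and j-hat through both steps. The helper names (`apply`, `compose_by_tracking`) and the column-pair storage are illustrative assumptions. The shear matrix used here, with j-hat landing on (1, 1), is likewise an assumption, chosen because it reproduces the landing spots (1, 1) and (-1, 0) stated in the captions.

```python
# Assumption for illustration: a 2x2 matrix is a pair of columns,
# (where i-hat lands, where j-hat lands).

def apply(matrix, vector):
    """Return x * (first column) + y * (second column)."""
    (a, c), (b, d) = matrix
    x, y = vector
    return (x * a + y * b, x * c + y * d)

def compose_by_tracking(first, second):
    """Matrix for 'first, then second', built by following i-hat and j-hat."""
    i_hat_lands = apply(second, apply(first, (1, 0)))
    j_hat_lands = apply(second, apply(first, (0, 1)))
    return (i_hat_lands, j_hat_lands)

rotation = ((0, 1), (-1, 0))  # 90 degrees counterclockwise
shear = ((1, 0), (1, 1))      # assumed shear: i-hat fixed, j-hat -> (1, 1)

composition = compose_by_tracking(rotation, shear)
print(composition)  # ((1, 1), (-1, 0))

# The long way (rotate, then shear) agrees with the single composition matrix.
v = (2, 3)
print(apply(shear, apply(rotation, v)))  # (-1, 2)
print(apply(composition, v))             # (-1, 2)
```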
Always remember that multiplying two matrices like this has the geometric meaning of applying one transformation then another. One thing that's kinda weird here is that this reads from right to left: you first apply the transformation represented by the matrix on the right, then you apply the transformation represented by the matrix on the left. This stems from function notation, since we write functions on the left of variables, so every time you compose two functions, you always have to read it right to left. Good news for the Hebrew readers, bad news for the rest of us.
Let's look at another example. Take the matrix with columns (1, 1) and (-2, 0), whose transformation looks like this, and let's call it M1. Next, take the matrix with columns (0, 1) and (2, 0), whose transformation looks like this, and let's call that guy M2. The total effect of applying M1 then M2 gives us a new transformation, so let's find its matrix. But this time, let's see if we can do it without watching the animations, and instead just using the numerical entries in each matrix.
First, we need to figure out where i-hat goes. After applying M1, the new coordinates of i-hat, by definition, are given by that first column of M1, namely (1, 1). To see what happens after applying M2, multiply the matrix for M2 by that vector (1, 1). Working it out the way that I described last video, you'll get the vector (2, 1). This will be the first column of the composition matrix. Likewise, to follow j-hat, the second column of M1 tells us that it first lands on (-2, 0). Then, when we apply M2 to that vector, you can work out the matrix-vector product to get (0, -2), which becomes the second column of our composition matrix.
Let me walk through that same process again, but this time I'll show variable entries in each matrix, just to show that the same line of reasoning works for any matrices.
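Before moving to variable entries, the M1-then-M2 computation can be checked with a short Python sketch. M1 and M2 are the matrices from the captions; the `apply` helper and the column-pair storage are assumptions made for illustration.

```python
# Assumption for illustration: a 2x2 matrix is a pair of columns,
# (where i-hat lands, where j-hat lands).

def apply(matrix, vector):
    """Return x * (first column) + y * (second column)."""
    (a, c), (b, d) = matrix
    x, y = vector
    return (x * a + y * b, x * c + y * d)

M1 = ((1, 1), (-2, 0))  # columns (1, 1) and (-2, 0)
M2 = ((0, 1), (2, 0))   # columns (0, 1) and (2, 0)

# i-hat first lands on M1's first column, then M2 is applied to that.
first_column = apply(M2, M1[0])
# j-hat first lands on M1's second column, then M2 is applied to that.
second_column = apply(M2, M1[1])

print(first_column)   # (2, 1)
print(second_column)  # (0, -2)
```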
This is more symbol heavy and will require some more room, but it should be pretty satisfying for anyone who has previously been taught matrix multiplication the more rote way. To follow where i-hat goes, start by looking at the first column of the matrix on the right, since this is where i-hat initially lands. Multiplying that column by the matrix on the left is how you can tell where the intermediate version of i-hat ends up after applying the second transformation. So, the first column of the composition matrix will always equal the left matrix times the first column of the right matrix. Likewise, j-hat will always initially land on the second column of the right matrix. So multiplying the left matrix by this second column will give its final location, and hence that's the second column of the composition matrix.
Notice, there are a lot of symbols here, and it's common to be taught this formula as something to memorize, along with a certain algorithmic process to kind of help remember it. But I really do think that before memorizing that process, you should get in the habit of thinking about what matrix multiplication really represents: applying one transformation after another. Trust me, this will give you a much better conceptual framework that makes the properties of matrix multiplication much easier to understand.
For example, here's a question: does it matter what order we put the two matrices in when we multiply them? Well, let's think through a simple example like the one from earlier: take a shear, which fixes i-hat and smooshes j-hat over to the right, and a 90° rotation. If you first do the shear, then rotate, we can see that i-hat ends up at (0, 1) and j-hat ends up at (-1, 1); both are generally pointing close together. If you first rotate, then do the shear, i-hat ends up over at (1, 1) and j-hat is off in a different direction at (-1, 0), and they're pointing, you know, farther apart.
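The column-by-column rule, and the two orderings just described, can be sketched in Python as follows. The names `apply` and `multiply`, the column-pair storage, and the specific shear matrix (j-hat landing on (1, 1)) are assumptions for illustration; the shear is chosen to be consistent with the landing spots quoted in the captions.

```python
# Assumption for illustration: a 2x2 matrix is a pair of columns,
# (where i-hat lands, where j-hat lands).

def apply(matrix, vector):
    """Return x * (first column) + y * (second column)."""
    (a, c), (b, d) = matrix
    x, y = vector
    return (x * a + y * b, x * c + y * d)

def multiply(left, right):
    """Each column of the product is the left matrix applied to a column of the right."""
    return tuple(apply(left, column) for column in right)

rotation = ((0, 1), (-1, 0))  # 90 degrees counterclockwise
shear = ((1, 0), (1, 1))      # assumed shear: i-hat fixed, j-hat -> (1, 1)

# Read right to left: the right-hand matrix is applied first.
shear_then_rotate = multiply(rotation, shear)
rotate_then_shear = multiply(shear, rotation)

print(shear_then_rotate)  # ((0, 1), (-1, 1))
print(rotate_then_shear)  # ((1, 1), (-1, 0))
print(shear_then_rotate == rotate_then_shear)  # False
```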
The overall effect here is clearly different, so, evidently, order totally does matter. Notice, by thinking in terms of transformations, that's the kind of thing that you can do in your head, by visualizing. No matrix multiplication necessary.
I remember when I first took linear algebra, there was this one homework problem that asked us to prove that matrix multiplication is associative. This means that if you have three matrices A, B and C, and you multiply them all together, it shouldn't matter if you first compute A times B, then multiply the result by C, or if you first multiply B times C, then multiply that result by A on the left. In other words, it doesn't matter where you put the parentheses. Now, if you try to work through this numerically, like I did back then, it's horrible, just horrible, and unenlightening for that matter. But when you think about matrix multiplication as applying one transformation after another, this property is just trivial. Can you see why? What it's saying is that if you first apply C, then B, then A, it's the same as applying C, then B, then A. I mean, there's nothing to prove: you're just applying the same three things one after the other, all in the same order. This might feel like cheating. But it's not! This is an honest-to-goodness proof that matrix multiplication is associative, and even better than that, it's a good explanation for why that property should be true.
I really do encourage you to play around more with this idea: imagining two different transformations, thinking about what happens when you apply one after the other, and then working out the matrix product numerically. Trust me, this is the kind of playtime that really makes the idea sink in. In the next video, I'll start talking about extending these ideas beyond just two dimensions. See you then!
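For anyone who wants to try that kind of play numerically, here is a small Python sketch that spot-checks associativity on three example matrices. It is only a numerical check on one hypothetical choice of A, B and C, not a proof; the helper names and the column-pair storage are assumptions for illustration.

```python
# Assumption for illustration: a 2x2 matrix is a pair of columns,
# (where i-hat lands, where j-hat lands).

def apply(matrix, vector):
    """Return x * (first column) + y * (second column)."""
    (a, c), (b, d) = matrix
    x, y = vector
    return (x * a + y * b, x * c + y * d)

def multiply(left, right):
    """Each column of the product is the left matrix applied to a column of the right."""
    return tuple(apply(left, column) for column in right)

# Hypothetical choice of three matrices, reusing ones from the examples.
A = ((0, 1), (2, 0))   # M2 from the captions
B = ((1, 1), (-2, 0))  # M1 from the captions
C = ((1, 0), (1, 1))   # assumed shear: i-hat fixed, j-hat -> (1, 1)

ab_then_c = multiply(multiply(A, B), C)  # (AB)C
a_then_bc = multiply(A, multiply(B, C))  # A(BC)

print(ab_then_c)               # ((2, 1), (2, -1))
print(a_then_bc)               # ((2, 1), (2, -1))
print(ab_then_c == a_then_bc)  # True
```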
Info
Channel: 3Blue1Brown
Views: 1,856,612
Rating: 4.9768581 out of 5
Keywords: math, three, matrix multiplication, mathematics, three brown one blue, one, 3b1b, matrix, 3 brown 1 blue, linear algebra, blue, linear transformation, 3brown1blue, brown
Id: XkY2DOUCWMU
Length: 10min 3sec (603 seconds)
Published: Mon Aug 08 2016