PROFESSOR: The
previous video was about positive
definite matrices. This video is also linear
algebra, a very interesting way to break up a matrix called the
singular value decomposition. And everybody says SVD for
singular value decomposition. And what is that factoring? What are the three
pieces of the SVD? So this is the fact is
every matrix, rectangular, every matrix factors into--
these are the three pieces. U sigma V transpose. People use those letters
for the three factors. The factor U is an orthogonal
matrix, an orthogonal matrix. The factor sigma in the
middle is a diagonal matrix. The factor V
transpose on the right is also an orthogonal matrix. So I have orthogonal, diagonal,
orthogonal, or physically, rotation, stretching, rotation. Now we have seen
three factors for a matrix, V, lambda, V inverse. What's the difference? What's the difference between
this SVD, this, and the V, lambda, V transpose,
V inverse, V lambda, V inverse for diagonalizing
other matrices? So the lambda is diagonal
and the sigma is diagonal, but they're different. The key point is I now have two
different matrices, not just V and V inverse, but
two different matrices. But the new great
advantage is they are orthogonal
matrices, both of them. So by going to-- and I can do it
for rectangular matrices also. Eigenvalues really worked
for square matrices. Now we really are-- we have two. We have an input matrix
and an output matrix. In those spaces m and n can
have different dimensions. So by allowing two
separate bases, we get rectangular matrices,
and we get orthogonal factors with, again, a diagonal. And this is called--
these numbers sigma instead of eigenvalues,
are called singular values. So these are the
singular values. These are the singular vectors,
the right singular vectors and the left singular vectors. That's the statement
of the factorization. But we have to think a little
bit, what are those factors? What are the-- can we
see why this works? So I want that. And let me do, as
you see this coming, I'll look at A transpose
A. I like A transpose A. So A transpose will
be, I transpose this. V sigma transpose
U transpose, right? That's A transpose. Then I multiply by A
U sigma V transpose. And what do I have? Well, I've got six matrices. But U transpose U in
here is the identity, because U is an
orthogonal matrix. So I really have just the V
on one side, a sigma transpose sigma, that'll be diagonal,
and a V transpose the right. This I recognize. This I recognize. Here is a single V, a diagonal
matrix, a V transpose. What I'm showing
you here, what we reached is the eigenvalue,
the diagonalization, the usual eigenvalues
are in here and the eigenvectors
are in here. But the matrix is A transpose A. Once again, A was rectangular
and completely general and we couldn't see
perfect results. But when we went
to A transpose A, that gave us a positive
semidefinite matrix, symmetric for sure. Its eigenvectors
will be orthogonal. That's how I know this V
matrix, the eigenvectors for this symmetric
matrix, are orthogonal and the eigenvalues
are positive. And they're the squares
of the singular value. So this is telling
me the lambdas for A transpose A are the
sigma squareds for s-- for A. For A itself. Lambda is the same. Lambda for A transpose A is
sigma squared for the matrix A. Well that tells me V,
that tells me sigma, and U disappeared here because
U transpose U was the identity. It just went away. How would I get hold of U? Well, here's one way to see it. I multiply A times A transpose
in that order, in that order. So now I have U
sigma V transpose times the transpose,
which is the V sigma transpose U transpose--
I'm having a lot of fun here with transposes. But V transpose V is now
the identity in the middle. So what do I learn here? I learn that U is
the eigenvector matrix for AA transpose. So these have the
same eigenvalues, A times B has the
same eigenvalues as B times A in this
case, it comes out here. Same eigenvalues. This has eigenvectors V,
this has eigenvectors U, and those are the V and
the U in the singular value decomposition. Well, I have to
show you an example I have to show you an
example and an application, and that's it. So here's an example. Suppose A, I'll make it a
square matrix, 2, 2, minus 1, 1, not symmetric. Certainly not positive definite. I wouldn't use the word because
that matrix is not symmetric. But it's got an
SVD, three factors. And I work them out. This is the orthogonal matrix. I have to divide by square root
of 5 to make it unit vectors. Oops, that's not going to work. How about that? The two columns are orthogonal
and that's a perfectly good U. And then in the sigma, I
got, well that's a-- oh, I did want 1 and 1. I did want 1 and 1, yes. So I have a singular matrix,
determinant 0, singular matrix. So my eigenvalues will be 0 and
it turns out square root of 10 is the other eigenvalue for
that-- other singular value for this guy. And now I'll put in the
V transpose matrix, which is 1, 1, and 1, minus 1 is it? And those have length
square root of 2, which I have to divide by. Well, I didn't do
that so smoothly, but the result is clear. U, sigma, V transpose,
so here's the sigma. And the singular values of this
matrix are square root of 10 and then 0 because
it's a singular matrix. And the eigenvectors, well the
singular vectors of the matrix are the left singular vectors
and the right singular vectors. That looks good to me. And now the
application to finish. A first application is,
well, very important. All the time in
this century, we're getting matrices
with data in them. Maybe in life sciences,
we test a bunch of sample people for genes. So I have a-- my data
comes somehoe-- I have a gene expression matrix. I have samples, people, people
1, 2, 3 in those columns. And I have in the rows,
let me say four rows, I have genes, gene expressions. That would be completely normal. A rectangular matrix,
because the number of people and the number of
genes is not the same. And in reality, those are
both very, very big numbers, so I have a large matrix. And out of it, I want to--
and each number in the matrix is telling me how much the gene
is expressed by that person. We may be searching for
genes causing some disease. So we take several people, some
well, some with the disease, we check on the genes. We get a big matrix and
we look to understand something about of it. What can we understand? What are we looking for? We're looking for the
correlation, the connection, between some combination
maybe of genes and some-- we're looking for a gene
people connection here. But it's not going to
be person number one. We're not looking
for one person. We're going to find a
mixture of those people, so we're going to have sort of
an eigensample, eigenpeople. Oh, that's a terrible--
eigenperson would be better. So I think I'm seeing
an eigenperson. Let me see where I'm
going to put this. So yeah, I think my matrix
would be written-- oh, here is the main point. That just as I see
in this example, it's the first vector
and the first vector and the biggest sigma
that are all important. Well, in that example the
other sigma was 0, nothing. But in this example,
I'll probably have three different sigmas. But the largest sigma, the
first, the U1 and the V1, it's that combination that I want. I want U1 sigma 1 V1 transpose,
the first eigenvector of A transpose A
and of AA transpose. And the first singular,
the biggest singular value, that's the information. That's the best
sort of put together person, eigenperson,
combination of these people and the best
combination of genes. It has the-- in
statistics, I would say the greatest variance. In ordinary English, I would
say the most information. The most information
in this big matrix is in this very special
matrix with only rank one, only a single column repeated. A single row
repeated, and a number sigma 1, the number
that tells me that. Because remember,
U is a unit vector. V is a unit vector. It's that number sigma
1 that's selling me. So it's like that unit vector
times that number, key number, times that unit
vector, that's this. I'm talking here about
principle component analysis. I'm looking for the principle
component, this part. Principle component analysis. A big application in
applied statistics. You know, in large
scale drug tests, statisticians really have
a central place here. And this is on
the research side, to find the-- get
the information out of a big sample. So U1 is sort of a
combination of people. V1 is a combination of genes. Sigma 1 is the biggest
number I can get. So that's PCA, all coming
from the singular value decomposition. Thank you.