COS 302: Applications of Matrix Factorization

Video Statistics and Information

Captions
In the next couple of videos we're going to be talking a lot about matrix factorization, things like the singular value decomposition. These are ways to take a matrix and write it as the product of some, hopefully, simpler matrices. Decomposing a matrix into factors is a powerful way to tackle different kinds of problems in linear algebra, but it also turns out to be a powerful way to reason about data in the real world.

Let's talk a little bit about matrix factorization in general, and in particular low-rank matrix factorization. Our starting point is to imagine that we have some matrix A, not necessarily square; let's imagine that it is m by n. What we're going to say is that A, this matrix of interest, is represented as the product, maybe even approximately the product, of two other matrices, U and V transpose, where U and V are tall and skinny, so that A is low rank. To give you a picture of what I mean by that: A is some rectangular matrix, m by n as we said, and we imagine that A is the product of some other matrix U, which is m by k where k is less than both m and n, and a matrix V transpose, which is k by n. This says that A is low rank; in fact it is literally of rank k, assuming U has linearly independent columns and V transpose has linearly independent rows.

Now, usually in linear algebra we think of low rank as kind of a bad thing for a matrix to be, but it turns out to be a really nice way to think about structure in real-world data. Let's take an example that we sometimes call topic modeling. Imagine we have a matrix A whose rows represent different documents in some corpus; maybe we have a bunch of scientific articles, or newspaper articles, or something like that. So the rows of the matrix are, let's say, documents.
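To make the shapes concrete, here is a small NumPy sketch (the matrix, sizes, and rank are made up for illustration; none of it comes from the lecture) showing a rank-k matrix written as the product of an m-by-k factor and the transpose of an n-by-k factor, recovered here via the truncated SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2   # hypothetical sizes, with k < m and k < n

# Build a matrix that is exactly rank k by construction.
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

# The truncated SVD yields the best rank-k approximation A ~ U @ V.T
# (Eckart-Young); here the approximation is exact because rank(A) = k.
U_full, s, Vt_full = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :k] * s[:k]   # m x k, singular values folded into U
V = Vt_full[:k, :].T        # n x k

print(np.allclose(A, U @ V.T))
```

For a genuinely full-rank A, the same truncation no longer reconstructs A exactly; it instead gives the closest rank-k matrix in the least-squares sense.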
there's some number of documents, represented by the rows of the matrix, and we'll let the columns represent the different vocabulary words that might appear in those documents. Any given entry A_ij is then the number of times the word with index j appeared in the document with index i. Many of these entries are zero; you could imagine that in articles on neuroscience, "basketball" probably has a count of zero, but in sports articles "basketball" has an interesting number of counts. Of course, we would expect the number of times any given word appears to be informative about what that document is about.

One of the really interesting things we can do is try to discover what we would call the topics associated with these counts. Imagine that this matrix of counts from some corpus is approximately low rank, just as we drew above. What does this give us? We might ask how some entry A_ij comes about: it is the inner product between the ith row of U and the jth column of V transpose. If we think of u_i as a row vector and v_j as a column vector and take their inner product, it will be large when those values tend to align. So document i will have a bigger count for word j whenever the inner product between u_i and v_j is large.

The idea of topic modeling, viewed this way, is to try to discover vectors u that describe documents and vectors v that describe vocabulary in such a way that they approximately explain this matrix. We get a big set of counts and we try to fit those parameters, that is, the entries of the two smaller matrices, so that they explain the big matrix of counts.
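One standard way such a factorization can be fit to counts is nonnegative matrix factorization. The sketch below uses Lee-Seung-style multiplicative updates on a tiny made-up document-word matrix; the documents, vocabulary, and counts are all hypothetical, and the lecture does not commit to this particular algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical document-word count matrix: rows are documents, columns
# are made-up vocabulary ["barack", "obama", "vote", "basketball", "court"].
A = np.array([
    [5, 4, 3, 0, 0],   # politics-ish article
    [6, 5, 2, 0, 0],   # politics-ish article
    [0, 0, 0, 4, 5],   # sports-ish article
    [0, 1, 0, 6, 4],   # sports-ish article
], dtype=float)

k = 2                              # number of topics to discover
U = rng.random((A.shape[0], k))    # document-topic weights
V = rng.random((A.shape[1], k))    # word-topic weights

# Lee-Seung multiplicative updates, decreasing ||A - U V^T||_F while
# keeping both factors nonnegative.
for _ in range(500):
    U *= (A @ V) / (U @ V.T @ V + 1e-9)
    V *= (A.T @ U) / (V @ U.T @ U + 1e-9)

# Each word's strongest topic: the politics-ish words and the sports-ish
# words tend to land in different columns, though which column is which
# depends on the random start.
print(np.argmax(V, axis=1))
```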
these smaller matrices are fit so that they explain the big matrix of counts, and it turns out that if you do that, you will discover coherent groups of words, words that tend to co-occur. You can think of these as correlated words: if I see the word "barack" in a document, I'm probably going to see the word "obama" in that same document as well. A topic would capture that, and it might put those words together into a politics topic. "Basketball" and "court" might similarly appear together quite often, and maybe they would land together in a sports topic. What we hope happens whenever we do this kind of low-rank approximation is that the low-dimensional space, those k dimensions, will represent k different kinds of topics; that the U matrix will represent how much any given document is about those topics, with an entry tending to be non-zero whenever a document is about that topic; and similarly that the words will group into the topics they are relevant to. In both cases, documents can be about multiple things and words can appear in multiple topics. What we hope is that we can learn these kinds of vectors from the data, and that this will discover topics without being told about them in advance. This is an example of how factorizing a matrix might help us understand and represent a corpus of documents and its vocabulary.

Another interesting way we can use matrix factorization to analyze data is to think about things like social media. Imagine we represented the follow graph of, say, Instagram with a matrix. Now we have something a little bit different: a square matrix that is users by users, where we put an entry in whenever user i follows user j. That is, A_ij is 1 if user i follows user j, and otherwise it is 0. This is the adjacency matrix of the directed follow graph on, say, Instagram or Twitter, so the rows are followers
and the columns are followees. Again, if we try to approximate this in a low-rank way, the hope is that we'll discover different kinds of communities and different kinds of interests. As before, we imagine that this factorizes with some low inner dimension, again call it k, but now the rows and the columns of the big matrix index the same set of users. What we hope will happen is that when we have a big follow matrix over a whole bunch of users, the factorization will find matrices such that user i tends to follow user j when the inner product between the vector u_i and the vector v_j is large. If most of the entries in those vectors are zero, the inner product will be zero, and so the model will not predict that user i follows user j; if they are big, the product will tend to be closer to one.

If we fit this to data, what we hope to see, and what we often do see in practice, is that the different columns of these factor matrices represent different interests or different communities. For example, on Instagram I follow machinists and woodworkers and Nissan 300ZX twin turbos, but other people might follow things about fashion, or travel, or all kinds of different topics; and of course people also segregate according to language and geography and many other things. The U matrix represents the kind of content I'm interested in consuming, so what appears in my feed will be what my row of U is about; the V matrix is about the kinds of things people produce, capturing information about how someone gets followed. I only produce, say, content about woodworking, not about 300ZX twin turbos, so I would have a different vector determining what I consume versus what I tend to produce.

Again, the idea is that because this is restricted to being a low-rank matrix, in order to explain this large amount of data we can learn these factors, and these kinds of patterns of consumption and production will actually emerge just from looking at the follow graph.
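As a sketch of that idea, here is a made-up six-user follow graph with two planted communities; the lecture does not specify how the factors are found, so this sketch simply takes them from a rank-2 truncated SVD of the adjacency matrix:

```python
import numpy as np

# Hypothetical follow graph: A[i, j] = 1 if user i follows user j.
# Users 0-2 form one community, users 3-5 another, plus one cross edge.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

k = 2
U_full, s, Vt = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :k] * s[:k]   # what each user consumes
V = Vt[:k, :].T             # what each user produces

# scores[i, j] is the inner product of u_i and v_j: a predicted
# propensity to follow, large within a community, small across.
scores = U @ V.T
print(scores[0, 1] > scores[0, 4])
```

Because the rank-2 factors essentially recover the two community indicator directions, within-community scores come out near the block values and cross-community scores near zero.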
The point here is that neither of these examples, topics or social media, feels very much like solving a linear system, and yet factorization gives us a powerful tool to learn structure from the data that helps us make predictions. If some new person comes along and you want to make a recommendation about who they should follow, you look at the inner product between their consumption vector and a content producer's vector; if that inner product is big, maybe that producer is a good recommendation for somebody they should start following. It also lets you group different users: you might discover that some dimensions correspond to different geographic locations, so when you find out that someone is from that area, you can recommend content that is relevant to those people.

The point is that things like the SVD actually do give us ways to solve these kinds of problems. Even though the singular value decomposition can feel like a very abstract and possibly complicated idea for reasoning about matrices, it has unambiguous real-life implications far beyond linear algebra: it lets us analyze all kinds of different data and make predictions, and these kinds of algorithms really do exist and are deployed in the real world. Matrix factorization is one of the most powerful ways to perform recommendation on an online platform, so it is a very, very common thing. One of the ways people got started thinking about this was, for example, the Netflix problem, where you can imagine that instead of users by users or documents by vocabulary, the matrix is users by movies, with entries that are the ratings someone might assign to each movie. When we discover this factorization, the rows of the user matrix will be someone's taste, and the columns
of the movie matrix would ideally represent, say, different genres, or directors, or other properties of movies that would cause someone to be interested in them. The list of examples like this goes on and on: people have employed these methods in lots and lots of different settings, not just recommendation systems but many different problems that can be framed in terms of what's called a matrix completion task.
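Here is a minimal sketch of that matrix completion setup, with a made-up users-by-movies ratings matrix (0 marks an unobserved rating) and plain gradient descent on the squared error over only the observed entries; real systems use more refined objectives and solvers:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical ratings (users x movies); 0 means "not rated yet".
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 1, 1],
    [1, 1, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
observed = R > 0

k, lr, reg = 2, 0.01, 0.01
U = 0.1 * rng.standard_normal((R.shape[0], k))   # user taste vectors
V = 0.1 * rng.standard_normal((R.shape[1], k))   # movie property vectors

# Minimize the squared error on observed entries only, with a little
# regularization: the classic matrix completion formulation.
for _ in range(10000):
    E = observed * (U @ V.T - R)   # residual, zeroed on missing entries
    U, V = U - lr * (E @ V + reg * U), V - lr * (E.T @ U + reg * V)

pred = U @ V.T
# pred[0, 2] is now a guess at a rating user 0 never actually gave.
print(np.round(pred, 1))
```

The low-rank constraint is what makes the unobserved entries predictable at all: the fitted taste and movie vectors must jointly explain the observed ratings, and those same vectors then fill in the blanks.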
Info
Channel: Intelligent Systems Lab
Views: 552
Rating: 5 out of 5
Id: 67a8CIukcPA
Length: 12min 6sec (726 seconds)
Published: Sun Feb 28 2021