Lecture 16.5 — Recommender Systems | Vectorization Low Rank Matrix Factorization — [ Andrew Ng ]

Captions
In the last few videos we talked about the collaborative filtering algorithm. In this video I want to say a little bit about a vectorized implementation of this algorithm, and also talk about other things you can do with it. For example, given one product, can you find other products related to it, so that if a user has recently been looking at one product, there are other related products you could recommend to them? Let's see what we can do about that.

What I'd like to do is work out an alternative way of writing out the predictions of the collaborative filtering algorithm. To start, here's our data set with our five movies. What I'm going to do is take all the ratings by all the users and group them into a matrix. We have five movies and four users, so this matrix Y is going to be a 5 by 4 matrix: I take all of this data, including the question marks, and group it into the matrix. The (i, j) element of this matrix is what we were previously writing as y^(i,j), the rating given to movie i by user j.

Given this matrix Y of all the ratings that we have, there's an alternative way of writing out all the predicted ratings of the algorithm. In particular, the rating that we predict user j gives to movie i is (theta^(j))^T x^(i). So if you form the matrix of predicted ratings, its (i, j) entry is exactly (theta^(j))^T x^(i): the (1, 1) element is the predicted rating of user 1 on movie 1, the (1, 2) element is the predicted rating of user 2 on movie 1, and so on, down to the predicted rating of user 1 on the last movie. For the entries where we actually observed a rating, these are the values the model would have predicted.

Now, given this matrix of predicted ratings, there is a simpler, vectorized way of writing it out. In particular, define the matrix X, just like the matrix we had earlier for linear regression, to be x^(1) transpose, x^(2) transpose, down to x^(n_m) transpose: I take the features of my movies and stack them in rows, so that each movie is one example. And define a matrix capital Theta by taking my per-user parameter vectors and stacking them in columns: the parameters for user 1, for user 2, and so on, down to my final user. Given these definitions of X and Theta, the matrix of predicted ratings can be written much more simply as X times Theta. This is a vectorized implementation that computes all the predicted ratings of all the users on all the movies at once.

To give this approach a name, the particular collaborative filtering algorithm we've been discussing is also called low rank matrix factorization. The name comes from the fact that the matrix X Theta of predicted ratings has a mathematical property which in linear algebra is called being a low rank matrix, and that is what gives the algorithm its name. But if you don't know what a low rank matrix is, don't worry about it; you really don't need to know that in order to use this algorithm.
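As a concrete sketch of the vectorized step, here is a small NumPy example. The toy feature values and parameter values below are my own illustration, not numbers from the lecture; it follows the video's convention of stacking movie features in the rows of X and per-user parameter vectors in the columns of Theta, so the full matrix of predictions is the single product X @ Theta.

```python
import numpy as np

# Toy dimensions: n_m = 5 movies, n_u = 4 users, n = 2 learned features.
# X stacks one feature vector x^(i) per movie, one movie per ROW.
X = np.array([[0.9, 0.0],
              [1.0, 0.01],
              [0.99, 0.0],
              [0.1, 1.0],
              [0.0, 0.9]])

# Theta stacks one parameter vector theta^(j) per user, one user per COLUMN,
# matching the column convention used in this video.
Theta = np.array([[5.0, 5.0, 0.0, 0.0],
                  [0.0, 0.0, 5.0, 5.0]])

# Element-wise prediction for user j on movie i: (theta^(j))^T x^(i)
i, j = 0, 0
single = X[i] @ Theta[:, j]

# Vectorized: the full n_m x n_u matrix of predicted ratings in one product.
predictions = X @ Theta

assert np.isclose(predictions[i, j], single)
print(predictions.shape)  # (5, 4)
```

The loop over every (i, j) pair disappears entirely: one matrix multiply replaces n_m * n_u separate inner products.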
Either way, this gives a nice vectorized way to compute all of the predicted ratings of all the users on all the movies.

Finally, having run the collaborative filtering algorithm, here's one other thing you can do: use the learned features to find related movies. Specifically, for each product, really for each movie i, we've learned a feature vector x^(i). When you learn a set of features this way, you don't really know in advance what the different features are going to be, but if you run the algorithm, the features will tend to capture the important aspects of the different movies (or products, or whatever) — the aspects that cause some users to like certain movies and other users to like different sets of movies. Maybe you end up learning a feature x1 that measures romance and a feature x2 that measures action, similar to an earlier video, and maybe a different feature x3 for the degree to which a movie is a comedy, and so on, with n features altogether. In practice, after you've learned the features, it's often pretty difficult to go in and come up with a human-understandable interpretation of what these features really are. But even though the features can be hard to visualize, the algorithm usually learns features that are very meaningful for capturing the most important or most salient properties of a movie — whatever causes users to like or dislike it.

So now let's address the following problem: say you have some specific movie i and you want to find other movies j that are related to it. Why would you want to do this? Maybe a user is browsing movies and is currently watching movie i — what's a reasonable movie to recommend to them to watch next? Or someone has recently purchased movie i — what's a different movie that would be reasonable to recommend for them to consider purchasing? Now that you have learned these feature vectors, they give us a very convenient way to measure how similar two movies are. In particular, movie i has a feature vector x^(i), and if you can find a different movie j such that the distance ||x^(i) - x^(j)|| is small, that's a pretty strong indication that movies i and j are somehow similar, at least in the sense that someone who likes movie i may be likely to like movie j as well.

So, to recap: if a user is looking at some movie i and you want to find the five most similar movies in order to recommend five new movies to them, what you do is find the five movies j with the smallest distance ||x^(i) - x^(j)||. That gives you a few different movies to recommend to your user.

So with that, hopefully you now know how to use a vectorized implementation to compute all the predicted ratings of all the users on all the movies, and also how to use the learned features to find movies or products that are related to each other.
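The "find the five closest movies" step can also be sketched in a few lines of NumPy. The random feature matrix and the helper name `most_similar` below are my own illustration, not from the lecture; in practice X would be the feature matrix learned by collaborative filtering.

```python
import numpy as np

rng = np.random.default_rng(0)
n_movies, n_features = 50, 10
X = rng.standard_normal((n_movies, n_features))  # stand-in for learned movie features

def most_similar(X, i, k=5):
    """Return indices of the k movies with smallest ||x^(i) - x^(j)||."""
    dists = np.linalg.norm(X - X[i], axis=1)  # distance from movie i to every movie j
    dists[i] = np.inf                         # exclude movie i itself
    return np.argsort(dists)[:k]

recs = most_similar(X, i=3)
print(recs)  # indices of the 5 movies most similar to movie 3
```

Broadcasting computes all n_m distances at once, so no explicit loop over movies is needed; for very large catalogs you would typically swap the exact sort for an approximate nearest-neighbor index.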
Info
Channel: Artificial Intelligence - All in One
Views: 54,621
Rating: 4.9441342 out of 5
Keywords: Machine Learning, Machine Learning Video Lecture, Computer Science, Video Tutorial, Video Course, Stanford Video Course Machine Learning, Stanford University, University of Stanford, Stanford, Online Machine Learning, Best Machine Learning video course, Andrew Ng, Andrew Ng ML, Andrew Ng Machine Learning, Andrew Ng Course, Andrew Ng Machine Learning Course
Id: 5R1xOJOFRzs
Length: 8min 19sec (499 seconds)
Published: Thu Feb 09 2017