Artwork Personalization at Netflix | Netflix

Captions
Let me dive right in and say, first of all, that Netflix isn't a single product. It's really hundreds of millions of products, because each member profile gets its own Netflix experience, tailored and shaped to optimize engagement and retention, so that at the end of the month the member wants to renew. If you look at any one person's Netflix homepage, it won't look like your Netflix homepage, because we're trying to personalize every aspect of it with machine learning to keep folks streaming, watching, and engaged.

Out of the thousands and thousands of movies and TV shows, we have to figure out the top 20-30 titles that you'd really be interested in, so we rank everything. We personalize the page layout: where do the top titles go, and how should they be grouped into rows? We also personalize the promotion of newly launched content: what should we show you as a billboard or trailer at the top of your page? Today I'll be talking a lot about personalized image and artwork selection, but there's also machine learning in our search engine and in how we message people, since we want to tell members about shows that just got released. We could send you a text, a push notification, or an email saying "hey, check out this new show, we think you'd be interested because you watched something else."

But first, let me remind folks of one of the first things Netflix pioneered, back in 2006: the Netflix Prize. That was really about collaborative ranking, so that we can all help each other figure out the top movies to pay attention to out of thousands or tens of thousands of options. Twelve years ago, Netflix released a really interesting data set: a big matrix of many, many movies by many, many users, holding the star ratings those users gave those movies. One user might have watched, say, Stranger Things and Zootopia and given them two stars and three stars. The matrix was of course much bigger than 3x3, and the challenge was to guess the stars for hidden entries in it.

We've since moved away from stars, because it turns out what really matters is what people actually play, not the star rating. Stars are aspirational: people tend to give five stars to movies like Citizen Kane because it's historic and well known, but not many people actually want to watch Citizen Kane, so it doesn't get many plays even though it collects lots of five-star ratings. Just looking at what people play is a lot more informative. You can imagine that every user has a long binary vector, their play history: out of all the titles, put a one on the movies and TV shows they've watched and a zero on the ones they haven't.

The classic approach in 2006, which was very relevant for the Prize, was linear matrix factorization. You take the ratings matrix R and figure out how to rewrite it, approximately, as the product of two skinny matrices U and M: U might be a million by four and M might be four by a hundred thousand, so their product is a giant matrix that's a million by a hundred thousand. If you can rewrite R approximately as U times M, that gives you a way to fill in the missing entries of R.

We moved away from linear factorization because it turns out you want nonlinear machine-learning power to really drive improvements here. Imagine reconstructing the view history through a neural network instead: rather than reconstructing it as a product of two skinny matrices, you take the view history X on the left-hand side, multiply it by a matrix, squash it through sigmoids, multiply by another matrix, squash again, and keep going through these nonlinearities until you get a smaller version of your view history called Z, the code. The code plays the role of a row of one of those skinny matrices: instead of a four-by-a-million factor, you get roughly a four-dimensional code (it's a bit more than four, but you get the idea). Then you grow that code back into an approximate view history. The approximation will have nonzero values in places where your past history had zeros, and those tell you which movies you'd likely be interested in next: the zeros grow into nonzero values, and the bigger they get, the more important that title is for your next recommendation. So you reconstruct X as closely as possible, but you're forced to shrink it down to a lower dimensionality, just as the skinny matrices did, except now you have nonlinearities and can stack several of these gradual dimension reductions.

You can also, instead of reconstructing the user's view history exactly, put a distribution around it, say a Gaussian: "it's roughly around here, with this much uncertainty," a bell curve with error bars. That turns out to be a little better, because the reconstruction is really our best guess of where you're going to go in the future given what you've watched in the past, and there's uncertainty about that, so you should capture it with a full Gaussian distribution. This technique is the variational autoencoder, proposed by Diederik Kingma and Max Welling in 2014, and it does strictly better than regular autoencoders and traditional neural networks.

We further improved on that by moving away from Gaussian uncertainty to a multinomial distribution, because the Gaussian doesn't really make sense here if you think about it: a Gaussian puts probability on values that go negative, and it's impossible to have a negative movie preference. So we replaced the Gaussian at the output with a multinomial. That was published a few months ago with some of my collaborators, and it gives you better predictions that also sum to one across the entire catalog: a proper distribution over what the member will probably play next, since you can't have more than 100 percent probability across all the movies.

A quick highlight of the results: on standardized data sets, such as the MovieLens 20M data set and the Netflix data set, this multinomial variational autoencoder beats the Gaussian one, beats the classical autoencoder, and beats weighted matrix factorization, the classical linear technique I showed you from twelve years ago. So we're moving from linear techniques into a nonlinear, probabilistic world, and that's a better way to do ranking.
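As a concrete reference point, the linear matrix-factorization baseline described above can be sketched in a few lines of numpy. This is a toy illustration with made-up ratings and dimensions (not anyone's production code): factor the observed entries of R into two skinny matrices and use their product to fill in the blanks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings matrix: 6 users x 5 movies, 0 = unobserved entry.
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 0],
    [1, 0, 0, 4, 5],
    [0, 1, 5, 4, 0],
    [2, 1, 3, 0, 4],
], dtype=float)
mask = R > 0                                      # which entries were actually rated

k = 3                                             # latent dimension of the "skinny" factors
U = 0.1 * rng.standard_normal((R.shape[0], k))    # user factors (users x k)
M = 0.1 * rng.standard_normal((R.shape[1], k))    # movie factors (movies x k)

lr, reg = 0.01, 0.05
for _ in range(2000):                             # gradient descent on the observed cells only
    E = mask * (R - U @ M.T)                      # reconstruction error, zeroed on unobserved cells
    U += lr * (E @ M - reg * U)
    M += lr * (E.T @ U - reg * M)

R_hat = U @ M.T                                   # the product fills in the unobserved entries
print(np.abs((R - R_hat)[mask]).mean())           # small training error on observed cells
```

With k much smaller than either matrix dimension, the previously unobserved cells of `R_hat` act as predicted scores, which is exactly the "fill in the missing entries of R" step described above.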
OK, so here's another quick vignette before we jump into artwork personalization. As we were learning from what people have watched in order to make better recommendations, we also discovered some interesting extensions into causal machine learning. Most machine learning out there is what I just showed you: predict what the person is going to do next. But the reality is that when you intervene, things get really strange. For most machine learning we don't learn just to learn, or just to make a prediction; we learn in order to act in the real world, building models that tell us what to do next. The problem is that when you act on those models, you change the source of the data that was originally collected to train them, so you actually have to start thinking about causality. Here's a really interesting toy problem to get us thinking about it.

Consider the price of airline tickets over time for one airline, and the number of flights being reserved over time. You can see spikes in demand around the holidays in 2015, 2016, and 2017, and you can see prices going up during those same periods; everyone knows that traveling for the holidays, when everyone wants to see family, is usually steeper than traveling in a quiet period. If you plotted price against the number of tickets sold as a little correlation plot, you'd see a strong correlation: sales go up as price goes up. But you can't take a machine learning model trained on that past data and conclude "great, if I increase price, demand goes up and people buy more tickets." Even though that's what the data seems to say, we know that's not how the world works: correlation does not imply causation. This is one of those aha moments you quickly hit when you start doing machine learning for the real world.

The reason is something called a hidden confounder. It's not that price alone determines demand; there's another hidden variable C, for example holidays, or a conference in town that everybody wants to travel to, or a couple of co-located conferences. Most of machine learning learns an input-to-output mapping: X is an input, Y is a target output, and we learn great functions f(X) = Y. The problem is that this implicitly assumes X completely controls Y and nothing else affects the world you're studying. Here it's not just price that determines demand; the hidden variable C actually determines both. If it's the holidays, you really have to see family and will travel no matter what, so C changes your price sensitivity, and it also increases demand, because more people need to travel then. C is a hidden confounder that messes up the analysis, and if you're not careful, traditional machine learning will learn silly relationships and correlations in the data that cannot be acted upon. If you see people walking in the street with umbrellas and train a machine learning model on that, it might say: want it to stop raining? Tell everyone to keep their umbrellas at home.

We discovered this the hard way when we looked at personalized messaging: the emails, push notifications, and pop-up notifications we send our members. We basically studied how X, the messaging (how many messages, when, and of what types), affects the watching of a member. If you watch more after messages are sent to you, that means we should send you more messages, right? The messaging seems to be driving your viewing. But the same issue hit us: when we just learned the relationship between messaging and how much people watched, that f(X) = Y relationship was silly, because of hidden confounders. The main one is that people watch more Netflix at the same times they open their emails about Netflix and check their pop-up notifications. People are busy working nine to five; once they're done with work, that's when they check their notifications and emails from us, and that also happens to be when they're available to get a TV show or movie in. So we studied the relationship between X and Y, realized it wasn't capturing what we wanted, and the culprit was a hidden confounder: time of day, along with other things we can't really measure.

The way we fixed this was to add a little randomization Z. Think of it as a coin flip: every now and then, if you were going to be sent a message, don't send it; or if you were going to get one message, send it; or, on another flip, send two messages instead of one. Something that randomizes how the messaging is done, which breaks the confounded correlation. Then we do two-stage learning: we learn a function F that predicts X, the messaging, from the randomization Z, and then, using the reconstructed X, we learn another function G to predict Y. It turns out that if you do these two stages of learning, first F and then G, then G actually captures the true causal relationship despite the hidden confounders. This idea goes way back: it was used to help prove that cigarettes cause cancer, rather than cancer, or a predisposition to cancer, causing more smoking. There, Z was the taxes that different states applied to cigarette sales: some states taxed cigarettes and some didn't, a pseudo-random source of variation, and that randomization gets you the actual causal relationship.

Long story short, here's an example where we studied how much email, push, or in-app notifications help you watch more. On top is what you get if you learn, with ordinary least squares, a model that predicts watching minutes from the number of emails, pushes, and in-app notifications. Without the causal version, that model told us really strange things: sending emails decreases viewing a little, sending push notifications decreases viewing a lot, and sending in-app notifications increases viewing a good amount. That's because it wasn't a causal model; it was just learning f(X) = Y. And of course our business partners looked at it and said this is complete nonsense: how can sending someone a push notification make them watch a whole lot less? If you do the two-stage version, you get the correct coefficients: emails help people watch a little, push notifications give a huge increase in viewing, and in-app notifications give a tiny blip upwards. You don't get the nonsensical decrease in viewing from messaging someone, which is what you'd learn if you don't understand causality, which is what 99% of machine learning out there does: non-causal, purely correlational machine learning. Even if you have a deep neural network, it's really chasing correlations rather than causation.

OK, so let me now talk about the image personalization effort. We've talked about how to personalize and rank, and how to be mindful of causality when you intervene; another piece is that you have to explain to the users why you're intervening. Why are you showing me this? Machine learning is great at making predictions, and you can make those predictions more causal, but the user still needs to understand why this is what the system wants them to do. Here's what the homepage looked like before we started personalizing images: the algorithms didn't care how the titles would be presented. They'd just say "put Stranger Things in the top billboard, put Bright in the second position of the first row," and so on, without knowing how those movies and TV shows were going to be portrayed. So the question is: which artwork should we show for each title? With machine learning, it was interesting to ask whether the answer should change for every single user. Maybe some users really like that top-left picture because of its spookiness, the scary-forest look that tells you there's a scary aspect to the show; maybe other users like the middle picture on the bottom because it shows what's perhaps a relationship between two characters, teenagers, high-school kids, and that's interesting for users who want a story about someone they can relate to. So we thought: let's use machine learning to personalize that decision. At first we took the classical approach again, standard batch, correlational machine learning: collect a bunch of data, and then, out of all the possible models in our machine learning library, find the one that fits the data best.
That grab-bag-of-models approach is the classical batch machine learning point of view: you sit back, collect months and months worth of data, take a giant grab bag of models with different parameters, and pick the one light-bulb model that fits best, trying to do all this efficiently, with computational speed and statistical guarantees. There's a very big cost to that approach when you're dealing with real systems in the real world. You have this long process over a massive user base of hundreds of millions of people: collect data on them, spend time learning the model, tune it, engineer it, then run a big A/B test in which half the users (cell A) get the classical Netflix experience and the other half (cell B) get the new machine-learned experience, and see which does better. If B is better, you roll B out to everyone. The problem is that this incurs a massive amount of regret: for many, many months you're collecting tons of data and polishing these models, then you test the polished model in cell B for several more months against cell A, and only then conclude it's better. For all those users and all those months, you were giving half of them the worse experience until the rollout. That's the classical approach, and we said: let's be a little more mindful about this big waste of resources and look at online learning, a much more efficient way of minimizing that regret. The idea is to interleave the machine learning process with the data collection process, to interleave learning and action-taking with gathering data, rather than waiting for all the data to arrive, then waiting for the best model to be figured out, and only then taking action. That's called online learning.

The classical approach to this is the multi-armed bandit. Think of walking into a casino with many slot machines, each paying off with a different probability, and as a gambler you can play one arm at a time. You want to figure out which arm, which machine, pays best. You want to try the machines and collect data about them, but you don't want to sit at one machine and pull it a thousand times just to be exactly sure it's a bad payer before moving to the next. You want to try the machines, slowly figure out which is the lucky one that really pays out a lot, and then keep going back to it. There are smart algorithms that figure out how to do this optimally, and you can apply them. The basic setup behind a multi-armed bandit is a learner and an environment, and you really want to interleave the learning with the acting and the rewards, not wait through tons of learning before figuring out how to take optimal actions. The learner chooses an action, tries it in the environment, observes a reward, updates its model, and then chooses a better action. It's trying to maximize cumulative reward, or equivalently minimize cumulative regret in hindsight, rather than collecting tons of data with very few rewards along the way and only being richly rewarded once it's super confident; you blend the two.

Then there's a version called the contextual bandit, where in addition the environment gives you a context: information like whether the slot machine is blinking, or whether it's a loud, big machine or a small one in the corner. The algorithms are very similar, but the key difference between standard supervised learning and bandits is, again, this idea of online learning, taking actions and updating your models sequentially, plus the nature of the feedback. In supervised learning you get an input X, predict an output Y, and then you're told whether Y was correct and shown the actual label afterwards. In a contextual bandit you see an input X, choose an action A to take, and the world only tells you whether A was good or bad: it gives you a dollar if you got the right answer and zero dollars if you didn't, but it never says "the best action was action 15; you proposed action 12, you should have said 15." It just says no, action 12 was wrong, and won't tell you the right one. That's another fundamental aspect of contextual bandits that differs from classical X-to-Y machine learning. In other words: with supervised learning, you're shown a picture (that's the context), you label it "cat," and you're told that's wrong and the correct label is dog. With a contextual bandit, you say "cat," you're told that's wrong, and you have to try again; you can't jump straight to "dog" as in the supervised example, because nobody told you the label. You try "fox": not fox either, zero reward; try "seal." It takes a lot more effort when you only get this reward-type signal instead of the true label.
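The casino story lends itself to a tiny simulation. Here's a minimal epsilon-greedy sketch (one of the simple bandit algorithms; the payout probabilities are invented) showing how a learner interleaves trying arms with exploiting the best-looking one, using only win/lose feedback and never a "correct answer":

```python
import numpy as np

rng = np.random.default_rng(2)

p_true = np.array([0.05, 0.10, 0.30])    # hidden payout rate of each slot machine
pulls = np.zeros(3)                      # how often we tried each arm
wins = np.zeros(3)                       # rewards observed per arm

for t in range(5000):
    if rng.random() < 0.1:               # explore: try a random machine
        arm = int(rng.integers(3))
    else:                                # exploit: play the best-looking machine so far
        arm = int(np.argmax(wins / np.maximum(pulls, 1)))
    reward = rng.random() < p_true[arm]  # bandit feedback: win or lose, nothing more
    pulls[arm] += 1
    wins[arm] += reward

print(np.argmax(pulls))                  # the lucky machine ends up played the most
```

Note that the learner never sees `p_true`; it only accumulates win/lose outcomes for the arms it actually pulled, which is exactly the partial-feedback setting just described.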
There's the cat again: you guess dog, and you're told you're wrong. In supervised learning, the next time you see that example you know to say dog immediately, but in the contextual bandit setting you have to keep trying until you hit the correct answer. It may take you a dozen tries on the same input to figure out the right output, because you're only getting binary feedback.

OK, so for the artwork problem, here's how we set it up as a contextual bandit. The context X is the user's view history (what they've liked to watch in the past), what country they're in, things like that. The action is which image to show for a given show: there may be, in the Stranger Things example, nine different images, sometimes twenty or so. Showing a particular image of Stranger Things to that user is the action you took. All you get to see is that the user didn't like it and didn't play. The user doesn't tell you "by the way, if you'd shown me the image with the two teenagers, I would have played it; that's the true label." They just tell you, implicitly, that this one was bad, so you have to try something else until you get the binary reward that says yes, that's the image that worked for them. You may have to make nine different tries to figure it out.

So what's a good outcome for us? The reward is when our subscribers at home watch some content and enjoy it; that's the reward for our algorithms. A bad outcome is that they don't click, don't watch, abandon the session, try to find something and then do something else instead, go read a book or play a game. Reward is people watching, and you can measure roughly how well you're doing with something called the take rate. Say we have three users and we're evaluating one image for Altered Carbon, the top-left one showing the back of someone's neck with a glowing implant. The female user at the top liked that image: she clicked and watched. The two other users were served the same image and didn't click and watch. We got rewarded once out of three attempts, so the take rate for that image of Altered Carbon is one third. Think of it as making one dollar out of a possible maximum of three dollars in that situation.

So how do we optimize take rate, minimize regret, and maximize cumulative reward? We use a contextual bandit, and for the specific algorithm there are several out there: UCB, epsilon-greedy, Thompson sampling. Thompson sampling is a pretty good one, and it handles non-stationarity in the data, so users can change their minds and things can evolve over time. The big picture is that instead of looking at all your users and waiting until you have that single best model, with Thompson sampling you try a bunch of models at random: different users get different models, and you slowly eliminate the models that seem to be working badly, crossing them out, and keep trying until you've killed off all the bad models and you're left with the best light-bulb model. You start with a prior distribution P(theta) over models under which all models are pretty much equally good, sample a model from it, use that model to compute an action for a context X, observe a reward, then update your data and your distribution over models. That eliminates some models and favors others, and slowly you figure out the winner.
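That prior-sample-act-update loop is compact enough to sketch. Below is a minimal Beta-Bernoulli Thompson sampler for a non-contextual version of the image problem, with three candidate images and invented take rates: each round it samples one plausible model of the world from the posterior, shows the image that model favors, and updates the posterior with the observed play/no-play reward.

```python
import numpy as np

rng = np.random.default_rng(3)

take_rate = np.array([0.02, 0.05, 0.11])   # hidden per-image play probability (made up)
alpha = np.ones(3)                         # Beta(1, 1) prior: every image equally good
beta = np.ones(3)

for t in range(20_000):
    theta = rng.beta(alpha, beta)          # sample one model of the world from the posterior
    image = int(np.argmax(theta))          # act greedily under the sampled model
    played = rng.random() < take_rate[image]
    alpha[image] += played                 # posterior update from the observed reward
    beta[image] += 1 - played              # bad images get sampled high less and less often

estimates = alpha / (alpha + beta)
print(np.argmax(estimates))                # the posterior concentrates on the best image
```

Conditioning these estimates on the user's view history, rather than keeping one global posterior per image, is what makes the approach contextual.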
unlike the old approach where you have to do tons of data collection tons of modeling and then try things out here you're doing kind of slow elimination of the bad models so here's an example of how we figure out what the best image without any contacts this is just an unperson alized solution and a non contextual bandit this is just a vanilla bandit where we don't care about the context of what does this user watch and what country they in we quickly figure out by doing kind of this thompson sampling algorithm that the bottom right image is the best one and that's the one that gets the most rewards and we try all six randomly eventually just like being in a casino with six slot machines we figure out that's the best long machine where there was two actors in it but then we can also include the context which is the users view history and say okay what's the best lot machine for you what's the best image for you based off of your view history in the country you're in and a few other things but primarily that no other kind of demographic information for example if you watch a lot of romance like this person here with serendipity eternal sunshine and while you were sleeping the best image for Good Will Hunting is this one that's what the algorithm figures out because you like you know that's the gateway into this movie why should I pay attention to this movie we figure out you like romance and here's a romantic thing that happens in this movie and that's kind of motivation to give it a try here's another user downstairs that's the user who watches a lot of silly comedies turns out if you watch a lot of comedies the algorithm suggests you should watch Good Will Hunting with a picture of Robin Williams because he's a comedian okay so it does kind of sensible things you know a few other things that realizes if you watch a lot of mooma Thurman movies for a pulp fiction show the picture with uma in it if you watch a lot of John Travolta movies for pulp fiction show the 
picture of George volta so this is kind of automatically inferred from the data without you know anyone saying oh this is how we deterministic we figure out this rule it's learned okay so how do we evaluate how well this is doing it turns out there's a great technique called replay and this allows us to figure out if you built an algorithm like this figure out how well it will do once you've roll it out in the real world so before you try it out in the real world you have to prove to the business folks that this is gonna help but not hurt so here's an example where for disenchantment we change the image and we have six users and half of them got the image with kind of this queen princess with a crown and then the other ones have this kind of multi character image and these were kind of random randomly assigned at the very beginning those are the log actions that happen in the real world we want to predict now how a new algorithm which will do different things does so here's a new algorithm and it says for this first user I should have shown them the the queen picture for the second user queen and the other two kind of the multi character and then the last user should have gone queen and queen how good is this algorithm given the data we've logged in the past and we don't have the counterfactual right I don't have for example for this that second female user her response to that Queen image I don't know what she would have done maybe she would have played it when she didn't play it before the way we do it is we say okay look at when you actually matched what happened in real life and on those examples compute your average take rate or your average reward so forget the data when you don't match but when you do match what happen in the real kind of real world log data we realize oh okay I matched half the time with this new algorithm and this algorithm picked correct matches where there were two plays and picked a correct match where there wasn't a place so the take 
The logging policy, by contrast, had a take rate of 1/3: it got six chances and got plays on two of the six. The new policy only agreed with the logged one half the time, but when it did agree, it was on the winners that actually got plays. That's how we compute the replay take fraction. It turns out this is an unbiased way to evaluate how you're doing before you go out and try things in the real world, and it's easy to compute because the observed rewards are real. Unfortunately, it requires lots of data and has high variance. There are techniques to improve on it — for example, doubly robust off-policy evaluation — but replay already gives you a good idea of how well something will do before you try it live.

So here is our estimated take fraction for this contextual bandit algorithm. The take fraction of random is in green: we know nothing and just try images randomly on users, which gives a pretty bad take fraction. In the middle is a bandit that ignores your view history; it just figures out the single best image overall, with no personalization. And the contextual bandit is the one that figures out the best image for your particular view history — it does definitely better than the non-personalized one. Variety also helps: if there isn't much variety, there's no point in personalizing, because one image wins no matter what. In practice there are about a dozen images per title, sometimes 20 or so.
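A minimal way to make the bandit contextual, as in the random/bandit/contextual comparison above, is to keep a separate posterior per (taste bucket, image) pair. This is a toy sketch; the real system feeds view-history features into a model rather than enumerating discrete buckets:

```python
import random
from collections import defaultdict

class ContextualThompson:
    """Toy contextual bandit: one Beta-Bernoulli posterior per
    (context, image) pair, where context is a coarse taste bucket
    such as 'romance fan' or 'comedy fan'."""

    def __init__(self):
        self.wins = defaultdict(lambda: 1)    # Beta alpha per pair
        self.losses = defaultdict(lambda: 1)  # Beta beta per pair

    def choose(self, context, images):
        # Thompson sampling within this context's posteriors.
        def sample(image):
            key = (context, image)
            return random.betavariate(self.wins[key], self.losses[key])
        return max(images, key=sample)

    def update(self, context, image, played):
        key = (context, image)
        if played:
            self.wins[key] += 1
        else:
            self.losses[key] += 1
```

Trained on simulated traffic where romance fans respond to the romantic image and comedy fans to the comedic one, the bandit learns a different winner per context — the Good Will Hunting behavior described above.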
If you check with your friends and their Netflix, you'll see there may be twelve different images for Orange Is the New Black — I'm getting this one, my friends are getting that one. It sometimes wiggles around and changes, because the best image for you isn't fixed: your view history changes over time, and if you start watching more romance, you're going to get the more romantic images.

OK, so how do we do this with online performance evaluation? First, if you want to go online, you need to be smart about your engineering: you need to scale. A lot of the offline work can be done with simpler approaches where you don't worry about latency, but when you actually serve these images, the Netflix UI makes calls — what should I put on the home page, what image goes in search, in the galleries — and all these requests add up to about 20 million requests per second at peak. If you want to test this in the real world, it had better scale to that. Also, the UI code was written assuming image lookup is super fast, so plugging a machine learning model into that path might slow things down or be unreliable, and we really need to test this without rewriting or slowing down the UI code.

There are two strategies. With live compute, you compute the best image for each user at request time. With online precompute, you don't wait for the request: before the user even opens their browser, you precompute the best image for each title for them. Precompute requires more computation and more caching; live compute gives you the freshest possible data, because you compute this person's image only at the moment it's needed — if they just watched a romantic show, you can serve the more romantic image right away.
Live compute is super fresh, but it requires a strict SLA and a very fast turnaround, which forces you to use only simple algorithms. With online precompute you can run more complicated machine learning models; the problem is you have to compute results for every user ahead of time, store them, and cache them, so you do more computation — and sometimes you don't need a given user's image at all, because they don't log in that day.

Real quick, here is the architecture, but the basic summary is that we do this with precompute. We run the bandit for each title and profile pair, choose the best personalized image for that pair, store it in EVCache, and then do a quick lookup from EVCache at request time. Again, the key is that we're not doing live computation of the images at request time.

We also have to do a lot of logging. At precompute time we log which image was selected for each user-and-title pair; we store the probability of that image being selected — remember, these are bandits, so we log the probability of having taken that action — along with the candidate pool, because in some countries certain images aren't available for various reasons, legal and so on; and we snapshot the user's features, the context at that time. There's a lot of logging happening under the hood. Then we join the rewards with the features — this is all done with DeLorean — train the model using Spark, and publish it to production, and at the end we have a scalable system. We also have to monitor it: we track the quality of the model online all the time, make sure none of these jobs are failing, and compare offline metrics to online metrics.
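The precompute-then-lookup split described above can be sketched as two functions. Everything here is a stand-in: a plain dict plays the role of EVCache, `UniformBandit` is a placeholder policy, and the log-record fields simply name the items the talk says get logged (action, propensity, candidate pool, feature snapshot):

```python
import random
import time

class UniformBandit:
    """Stand-in policy: uniform random, so every image's propensity is known."""
    def select(self, features, candidates):
        image = random.choice(candidates)
        return image, 1.0 / len(candidates)

cache = {}            # stands in for EVCache: (profile, title) -> image
impression_log = []   # later joined with rewards to train the model

def precompute(profile, title, bandit, candidates, features):
    """Offline path: pick and store the image before any request arrives."""
    image, propensity = bandit.select(features, candidates)
    cache[(profile, title)] = image
    # Log everything needed to train and replay-evaluate later.
    impression_log.append({
        "profile": profile, "title": title, "image": image,
        "propensity": propensity, "candidates": list(candidates),
        "features": dict(features), "ts": time.time(),
    })

def serve(profile, title, fallback):
    """Request-time path: a cheap cache lookup only — never a model call."""
    return cache.get((profile, title), fallback)
```

The design point is that `serve` does no machine learning at all, which is what lets the request path meet the UI's latency assumptions at 20 million requests per second.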
We also keep a little bit of the traffic on randomized images, just as a sanity check. And finally we have a graceful degradation scheme. It's great to do super sophisticated machine learning and personalized images, but more complicated systems fail more often, so it's good to have a simpler backup for when the complicated thing fails: if you don't have the personalized winner, fall back to the best image chosen in an unpersonalized way, and go to a default image if you really don't know what to show because both systems failed. So we have a hierarchy of machine learning systems, simpler and simpler, in case the complicated ones fail — there's always some backup image. Users don't like it when you show a black icon with a number on it instead of a movie.

OK, so we tested this as an A/B test; it worked, improved member engagement, and was rolled out to 130 million members. It turns out to help the most for lesser-known titles. Everyone knows Pulp Fiction — you don't need to be told John Travolta is in it to watch it — but for something new that's not well known, it's important to figure out the connection with this user so that they'll actually give it a try. It's the "why" behind the machine learning: you're recommending me this title — why? Because it has a cool car chase scene, that's what the picture is telling me, and I like action and car chases. Here's a video.

[Music]

OK, so very quickly: what's next? If it works well on images, you try to personalize every aspect of the UI. We're looking at ways of changing how we present the rows, the rankings, the title of a row, the evidence — like whether a show won an Emmy; should we personalize what we tell you about it? The synopsis — should I write long text, or do you only want short text? — all beyond just changing the images.
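The graceful-degradation hierarchy described above — personalized winner, then unpersonalized best, then a default — can be sketched as a chain of lookups where a failure at any level just drops to the next. The function names and the placeholder filename are illustrative, not Netflix's actual services:

```python
def pick_artwork(sources, profile, title, placeholder="generic-boxart.jpg"):
    """Try each image source from most to least sophisticated.

    sources: lookup functions (profile, title) -> image or None,
    ordered e.g. [personalized, unpersonalized_best, default_image].
    A crashing or empty source must never take down the page.
    """
    for source in sources:
        try:
            image = source(profile, title)
            if image is not None:
                return image
        except Exception:
            continue  # degrade gracefully to the next, simpler system
    return placeholder
```

So even if the personalized model service times out and the simpler fallback also comes up empty, the member still sees some artwork rather than a blank tile.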
The metadata — all of that can also be personalized, and the trailer choice can also be personalized. Finally, we're looking at ways of picking the artwork automatically: rather than having people curate an artwork palette and hand us the 20 images for a show, can we scan the footage of the show, automatically find potentially good artwork, and then run this personalization on top of that?

Great, I'll wrap up here. We're hiring — if you're interested, take a look at research.netflix.com. Thank you very much.

Q: In one of your earlier slides you mentioned a multinomial noise model. Can you talk a bit about how you detect noise, what you do to discard it, and whether that improves your selections?

A: Good question. When we say multinomial, we're not actually denoising in the classical way; we're just saying that's our model of the noise and the uncertainty. It basically changes your neural net's loss function, and it gives you an output that is a sum-to-one probability. All we really did was change the last layer of the neural net: instead of a bunch of numbers between minus infinity and plus infinity, it outputs a distribution that sums to one across the entire catalog of movies and TV shows.
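The last-layer change described in that answer is essentially a softmax: squash unbounded scores into a probability distribution over the catalog. A minimal sketch (the input scores here are made up):

```python
import math

def softmax(scores):
    """Map unconstrained scores in (-inf, +inf) to probabilities
    that sum to one across the whole catalog of titles."""
    peak = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Training then scores that distribution against which title was actually watched with a multinomial (cross-entropy) loss, which is where "our model of the noise" comes in.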
Q: If a user starts watching a movie and then stops because they didn't like it, what does the algorithm do about that? If you count that against the image — the user started and then regretted starting — you'd end up showing the worst possible image so they don't even start.

A: That's a good point. We have better ways of evaluating whether somebody really got value from starting a movie or TV show. If you start a TV show and only watch for a few seconds and cancel, we don't count that as a reward. So the reward is a little more subtle than a click — it's not a click-based reward; it also looks at how much you watched. That's really in the weeds of the details, but yes: enough of the show has to be watched for it to be considered a reward.

Q: How do you address multiple users under the same account?

A: With multiple users on one account you can switch profiles, and that tells you: within one profile you're watching romance, in another you're watching cartoons — so we personalize per profile. But if many people share one profile, we just treat it as a single user with diverse viewing; we don't try to tease apart who is really behind the screen.

Q: When doing replay to evaluate your newly learned policy, why doesn't the old take rate — the success rate of the images you've now changed — matter?

A: The old take rate does matter. If the old logging policy had a really bad take rate, it's harder to get a great take rate yourself, because you only get credit when you agree with it and it was successful. You have this high-variance problem: you want to push your take rate as high as possible, but you also want to coincide often with the old logging policy, so there's a trade-off.

Q: One more question — are you leveraging any of the recommendation systems that are out there? I know you're talking about contextual algorithms, but there are recommendation systems like PredictionIO and ActionML. Is it home-built within Netflix, or are you using something off the shelf?

A: In terms of the code behind all these algorithms, it's all custom-written code.
We're not using standard packages. The equations themselves are fairly standard, but because of the scale, the implementation really has to be done with our own in-house custom frameworks. So yes, the contextual bandit code is all built in-house and proprietary.

[Music]
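One detail from the Q&A worth making concrete: the bandit's reward isn't a raw click — a play only counts if enough of the title was actually watched. A sketch of that gating; the 120-second threshold is purely illustrative, since the talk doesn't give the real rule:

```python
def quality_play_reward(seconds_watched, min_seconds=120):
    """Gate the bandit reward on actual engagement: a start that is
    abandoned after a few seconds counts as 0, not 1. The 120-second
    cutoff is an assumed placeholder, not Netflix's real threshold."""
    return 1 if seconds_watched >= min_seconds else 0
```

This is what prevents the failure mode raised in the question: an image that baits starts which members immediately regret earns no reward, so the bandit doesn't learn to prefer it.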
Info
Channel: Data Council
Views: 3,057
Rating: 5 out of 5
Keywords: recommendation system, Netflix personalized recommendation system, image personalization, image personalization engine, image personalization engine netflix, tony jebara netflix, machine learning
Id: UjQMEjkrUGo
Length: 46min 55sec (2815 seconds)
Published: Wed Jan 02 2019