Wayfair Data Science Explains It All: Multi-Armed Bandits

Captions
Hi, my name is Debbie McCanna. I'm a senior data scientist here at Wayfair, and today I'd like to tell you a little bit about multi-armed bandits. Since this is a very brief video, I'm not going to go deep into the math, but multi-armed bandits are a very cool and pretty intuitive technique that's used all over the place in e-commerce, so if you're considering a job in industry in data science, this is definitely something you should know about.

The basic idea behind the multi-armed bandit, historically, is that you have many of these one-armed bandits, or, if you're not familiar with old-time casino slang, slot machines. You have a whole bank of these slot machines, they have different rates of return, and you need to figure out which one has the best rate of return so that you can then play that one and make some money. That is the historical conception of the multi-armed bandit problem.

So how is this applicable to e-commerce? As I mentioned, it actually comes up all over the place, any time you need to sort things or choose the best one of them. Let me give you a couple of examples. Say you have some ads and you want to know which of them drives the most traffic to your site. Each one of those ads is a bandit with an unknown rate of return: you put up different ads, you get feedback based on who comes to your site, and then you can figure out which ad is the best and run that one. Similarly, if you have a whole bunch of products, which we do, and you need to sort them by popularity, each one of the products is a bandit with an unknown popularity: you sort them in some order, you get feedback based on who buys what, and then you can reorder them based on that feedback.

This feedback component is key. Every time we play one of these one-armed bandits, we get information about its true rate of return. It's incremental information, but we get a lot of it, especially when you have a lot of visitors, as you do on an e-commerce site, and at Wayfair we have a lot of customers as well. This puts us in the realm of reinforcement learning. Reinforcement learning tries to balance exploration, which in this context means figuring out which of these one-armed bandits is the best, with exploitation, which in this context means playing the one that has given the best rate of return so far and making money.

Now, there are many, many techniques out there that try to balance exploration and exploitation specifically for the multi-armed bandit problem; let me just talk about a couple. The most common one, which ironically you may not have heard of under this name, is epsilon-first. Epsilon-first is the idea that we explore first (epsilon here is the explore phase): we balance our traffic evenly across the options for however much time we need to figure out which one is the best, and then for the rest of time we exploit. This is a very, very common technique, and the reason you may not have heard of it is that it is also called A/B testing. If you know anything about e-commerce, you've heard of A/B testing, and it turns out to be a subset of multi-armed bandits.

Another very common one is epsilon-greedy. In this case you choose a small fraction of the time, say 10%, where you are going to explore (that's the epsilon), and the rest of the time you exploit. So 10% of the time you choose a random order or put up a random ad, and the rest of the time you go with the best one you have found so far and try to make some money. (A minimal code sketch of epsilon-greedy follows this paragraph.)
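To make the epsilon-greedy loop concrete, here is a minimal sketch in Python. It is not from the video: the number of arms, the simulated rates of return, and the 10,000-visitor horizon are all assumptions made up for illustration.

import random

# Epsilon-greedy sketch (hypothetical illustration, not Wayfair's code).
# Each arm is an ad or product ordering with an unknown rate of return.
true_rates = [0.40, 0.55, 0.60]  # assumed rates; unknown to the algorithm
epsilon = 0.10                   # explore on 10% of traffic, as in the video

pulls = [0] * len(true_rates)    # times each arm has been played
wins = [0] * len(true_rates)     # successes (clicks, purchases) per arm

def estimated_rate(arm):
    # Observed rate of return so far; 0.0 for arms never played.
    return wins[arm] / pulls[arm] if pulls[arm] else 0.0

for _ in range(10_000):          # assumed number of visitors
    if random.random() < epsilon:
        arm = random.randrange(len(true_rates))                # explore: random arm
    else:
        arm = max(range(len(true_rates)), key=estimated_rate)  # exploit: best so far
    reward = 1 if random.random() < true_rates[arm] else 0     # simulated feedback
    pulls[arm] += 1
    wins[arm] += reward

print("estimated rates:", [round(estimated_rate(a), 3) for a in range(len(true_rates))])

Epsilon-first, i.e. A/B testing, is the same loop with the two phases separated: explore uniformly at random for a fixed window, then set epsilon to zero and exploit the winner for the rest of time.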
Both of those are very common, but let me tell you about my personal favorite, which is very intuitive: Thompson sampling. Say we have two bandits, one with a true rate of return of 40% and the other with a true rate of return of 60%. We fit a beta distribution to each one. When we have very little information, these will be nice, wide distributions, which means that when we sample from each of them, we will pick the worse one a fairly large percentage of the time; that is our exploring. But as we explore more and more, these distributions tighten right up around the true values, and we end up just exploiting the one that is the best. Very intuitive, very clear, very nice. (A minimal code sketch of Thompson sampling follows these captions.) So thanks for listening, and check back soon to learn more about the projects we're tackling in data science here at Wayfair.
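As a companion to the Thompson sampling description above, here is a minimal sketch in Python. It is not from the video: the flat Beta(1, 1) starting priors and the 10,000-visitor horizon are assumptions, while the 40% and 60% true rates match the example in the captions.

import random

# Thompson sampling sketch (hypothetical illustration, not Wayfair's code).
# Two bandits with true rates of return of 40% and 60%, as in the video.
true_rates = [0.40, 0.60]
alpha = [1, 1]  # Beta posterior parameters per arm: successes + 1
beta = [1, 1]   # failures + 1; Beta(1, 1) is a flat starting prior

for _ in range(10_000):  # assumed number of visitors
    # Draw a plausible rate of return from each arm's Beta posterior and play
    # the arm with the highest draw. Wide posteriors let the worse arm win the
    # draw fairly often (exploration); as evidence accumulates, the posteriors
    # tighten around the true values and the best arm dominates (exploitation).
    samples = [random.betavariate(alpha[a], beta[a]) for a in range(len(true_rates))]
    arm = samples.index(max(samples))
    if random.random() < true_rates[arm]:  # simulated customer feedback
        alpha[arm] += 1
    else:
        beta[arm] += 1

print("posterior means:", [round(alpha[a] / (alpha[a] + beta[a]), 3) for a in range(len(true_rates))])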
Info
Channel: Wayfair Data Science
Views: 15,290
Keywords: data science, multi-armed bandits, A/B tests, e-commerce
Id: IxWhvNjqYns
Length: 5min 21sec (321 seconds)
Published: Tue Feb 19 2019