A/B Testing Interview with a Google Data Scientist

Captions
Interviewer: This is an A/B testing question. Let's say you design an experiment to measure the impact that financial rewards have on user response rates. It's a survey, and the result shows that the treatment group, with a $10 reward, has a 30% response rate, while the control group without rewards has a 50% response rate, which is obviously odd. Can you explain what might have happened and how you could improve this experimental design?

Candidate: Yeah, for sure. This definitely seems odd, because general intuition says that if you give someone a financial incentive, they tend to reciprocate or respond more. If the control group itself is at 50%, then the treatment group should have at least 50%. A few things I would think about. One, I could trust the experimentation process and conclude that offering a financial incentive is discouraging people, because they might feel we are buying their response and they don't want to do it. That could be a hypothesis; it seems very unlikely, but it could be that everything ran correctly, that is really what is happening, and the company is better off not giving incentives at all.

Another possibility is a sample ratio mismatch: for whatever reason, the randomization is not really happening. Say we planned a 50/50 split between treatment and control, but that division is not actually happening, so we can go check on that. For example, maybe the link to the survey is systematically breaking for people in the reward group. Maybe we are attaching the link where they can claim the reward, and because of that the load time is increasing and people are getting frustrated. The experience of filling out the survey might be systematically worse for people in the reward group. That is a plausible explanation, and it happens more often than not in companies: when they try to render a new feature and it takes systematically longer than expected, people get frustrated and don't continue with the flow. So I would definitely test whether everything is the same in terms of time to complete. Among the people who converted, what is the time taken to complete the survey form for control versus treatment, and are they significantly different? That is something I would definitely check.

Let's say even that checks out: there is no problem with how the survey loads, and the completion times are pretty much the same, with no significant difference. Then I would go back to my initial hypothesis, that for whatever reason people feel that giving them a reward is like buying their loyalty, they don't like it, and they are less likely to complete the survey. To reinforce this hypothesis, I would create one more group with something in between, like a $5 or $6 reward, and see what the conversion rate is. Under this hypothesis, if I give a $5 reward, the conversion rate should land somewhere between 30% and 50%. If that happens, it reinforces my hypothesis, and I would conclude that it is probably better not to give a reward. So I would create an interim group with something in between, or something above $10, and see if the rate is still around 30% or even lower. That said, at least in my research work I have seen multiple papers that use financial rewards to improve survey response rates, so I don't believe the incentive itself is the problem; I would go back, check how my experiment is set up, and see if I can fix anything there.
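A minimal sketch of the two diagnostic checks the candidate describes: a chi-square goodness-of-fit test for sample ratio mismatch against the planned 50/50 split, and a comparison of completion times among converters. The counts, times, and the strict SRM threshold are illustrative assumptions, not numbers from the interview.

```python
import numpy as np
from scipy import stats

# --- Check 1: sample ratio mismatch (SRM) ---
# Under a planned 50/50 split, a chi-square goodness-of-fit test asks whether
# the observed group sizes are plausible under that ratio.
n_control, n_treatment = 4_912, 5_088            # hypothetical observed counts
observed = np.array([n_control, n_treatment])
expected = observed.sum() * np.array([0.5, 0.5])  # planned split
chi2, p_srm = stats.chisquare(f_obs=observed, f_exp=expected)
if p_srm < 0.001:  # a strict threshold is common for SRM alarms
    print(f"Possible SRM (p={p_srm:.4g}): randomization or delivery may be broken")

# --- Check 2: completion times among converters ---
# If the reward link systematically slows the page, treatment completion times
# should be longer. Welch's t-test avoids assuming equal variances.
rng = np.random.default_rng(0)
control_times = rng.lognormal(mean=5.0, sigma=0.4, size=1_500)    # stand-in data (seconds)
treatment_times = rng.lognormal(mean=5.1, sigma=0.4, size=1_400)  # stand-in data (seconds)
t_stat, p_time = stats.ttest_ind(control_times, treatment_times, equal_var=False)
print(f"Completion-time difference: t={t_stat:.2f}, p={p_time:.4g}")
```

A significant SRM p-value or a large completion-time gap would point to the broken-delivery explanation rather than the "incentives backfire" hypothesis.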
Interviewer: Gotcha. What could be an alternative explanation? Would it be worth increasing the size of the reward, in case people think $10 is too small? Or is it that the control group without any reward gives people some other benefit relative to the $10?

Candidate: If this experiment is really testing the impact of the reward on survey completion rates, then everything else, the call to action and the message, should be exactly identical, except for the line in the email saying that if you complete this survey, you can claim this reward. If for whatever reason the control group had some other incentive, then what I am testing is not the effectiveness of the $10 reward compared to control; I am testing the effectiveness of the $10 reward compared to the incentive in control. So I would definitely go back, look at the exact call to action being sent to my target population, and verify that what I am testing is what I am trying to test.

Interviewer: Gotcha. For example, what if you advertise "$10 reward" in the subject line for the treatment group? For the control group you can't say "$10 reward" at all, so you have to completely change the subject line. Is that testing multiple variants then, or multiple effects, or is that still a valid test?

Candidate: For the same reward you can always present the call to action in multiple ways. There is research showing that for the same $10 reward, store credit versus free cash versus an Amazon gift card have slightly different effects: the reward is the same, but how you operationalize it changes the effect. In the same way, an email subject line that says "$10 reward" can read like spam; people get all these too-good-to-be-true spam emails and are used to archiving, deleting, or reporting them, so maybe we are not operationalizing the reward well. You could test a different call to action where the subject line is similar to the control group's, and it's only when they open the email that the reward prompts them to complete the survey. For the same $10 reward, having it in the subject line versus only in the body will definitely have different effects.

The reason we test these variants is to find what works and then use it for the entire population of users: you test on a sample and then roll out to the population. There is a downstream process that will consume the results of this experiment, so you want to design it in a way that can be used downstream for the full population. So I would test both cases and see what exactly happens.

Interviewer: Yeah, I agree with that. Specifically, I was thinking that if one email is catered toward helping out ("you should help out with our survey") while another is catered toward "here's $10, it's an easy way to make $10," then you're biasing toward people with a financial motivation versus people with a regular helping-out motivation toward your research study. So yeah, that makes sense.
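A hedged sketch of how the variant comparison the candidate describes could be scored: a two-proportion z-test on response rates for two call-to-action treatments (e.g., reward in the subject line vs. reward only in the body). The counts are made up for illustration.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results for two operationalizations of the same $10 reward.
responders = [300, 410]      # completers in variant A (subject line), variant B (body only)
sent       = [1_000, 1_000]  # emails sent per variant
z_stat, p_value = proportions_ztest(count=responders, nobs=sent)
print(f"z={z_stat:.2f}, p={p_value:.4g}")
# A significant difference here would suggest that how the reward is presented,
# not the reward itself, is driving the change in response rate.
```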
Interviewer: So let's say these were corrupted results, with changes in a variety of places, whether the subject line or something else, and you want to run a great experiment next time; you still have some financial budget. You mentioned you would test $5 next to see what happens. Is that the perfect experiment to run next time? What is the perfect experiment you would run instead of this $10 financial reward system? Could you describe it from start to end?

Candidate: Oh yeah, sure. The goal we are trying to optimize here is the conversion rate: we want as many people as possible to fill out the survey. But it's not just the completion rate; we want them to put effort into filling out the survey, so we would also look at the completeness of the survey and, if there are text questions, the length of the text. Those should also be considered important metrics, not just the conversion rate. So first I would decide what matters to me: is completeness important as well, and if so, what weight does it get? I would create a hybrid metric that represents what success looks like for me. Step one is to have a good overall metric. Once that is done, I have decided I want to maximize this new metric: I want it to be higher for the winning group, which I would subsequently roll out to everyone else.

Next, the way I would decide on the sample size for this experiment, or how long it should run, is based on practical significance. Given an infinite sample size you can detect the smallest of effects, so it doesn't make sense to throw tens of thousands of people at it just to detect an effect. I would ask what is practically significant for me. Taking conversion rate as the example: does a 5% increase in conversion rate justify implementing this reward? If yes, then anything less doesn't make sense; 5% or more is the only effect I am interested in. Then I derive my sample size from that effect size, a 5% improvement, given the distribution of the metric. With, say, 80% power and a 0.05 significance level, any sample size calculator will give me the required sample size per group, and I check whether that makes sense for me. If I have more users, that means I can test more groups; that's how I decide my samples. I always decide sample size based on practical significance, rather than saying "these are the people we happened to use for something else, I'll just test on them."
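A minimal sketch of the sample-size step the candidate outlines: fix the smallest effect worth acting on (here a 5-percentage-point lift over a 50% baseline), 80% power, and a 0.05 significance level, then solve for the per-group n. The baseline and lift are illustrative assumptions.

```python
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, mde = 0.50, 0.05  # 50% baseline rate, 5-point minimum effect of interest
effect = proportion_effectsize(baseline + mde, baseline)  # Cohen's h for two proportions
n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"~{math.ceil(n_per_group)} users per group")
```

The per-group n, together with the available user base, also tells you how many variants you can afford to test at once, which is the point the candidate makes next.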
Candidate (cont.): The next step, after deciding on sample size, is to test the different variants, like in this case. We've already seen there is a drop because of the reward. Now I can have my control group with no incentive and a treatment group with the $10 reward, provided everything else checks out: there was no sample ratio mismatch and everything was working well. Since I want to test this hypothesis again, subject line versus the reward itself, I would test something in between, like a $5 reward, and see if that is enough. This is kind of like a pricing question: what is the optimal price or incentive to finally set? So I would test a $5 reward, and I would also test a variant without the reward in the subject line, with just a prompt inside the email. I would not re-test the exact same $10-reward-in-the-subject-line variant with its 30% conversion rate, because I already know that result. If my $5 reward falls somewhere in between, then I know people are being discouraged by seeing the financial incentive in the subject line itself. If I don't see that, then I have new problems. But otherwise, since I am assuming everything else worked as is, I would only test the other variants, and definitely the one without the $10 reward in the subject line.

Interviewer: Gotcha, cool. Last question; this one is tangentially related. Let's say we ran this new experiment again with the $10 reward and got everything correct. Now we see that the financial incentive has increased the response rate, say 60% versus 50% for control. But after looking through a few responses, we have a feeling they are too fast: the sentences are very short, and people aren't actually giving much feedback. What do you think is happening there? Obviously maybe they're just chasing the financial reward, but what would you do going forward after seeing this?

Candidate: Yeah, I think I touched on this in the previous answer, where I said I would create a hybrid metric that covers not just the conversion rate but also the number of complete responses and other signals like the length of the text; all of that matters to me. That's why, if you look at a lot of companies, they don't have conversion rate as their single metric; they have multiple metrics, combined with weights, as the metric they optimize for. Even at tech companies, conversion rate by itself is not the main thing; it's total revenue, daily active users, time spent, number of people, amount of activity. There is a holistic metric that is used for optimization, and that would be the outcome metric of the experiment.

Interviewer: Yeah, that makes sense, and being able to calculate that is better for measuring overall improvement. Cool, awesome.
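A hedged sketch of the "hybrid metric" idea from the answers above: combine completion, answer completeness, and normalized free-text length into one weighted score per user, then compare group means. The weights, column names, and toy data are assumptions for illustration, not values from the video.

```python
import numpy as np
import pandas as pd

# Assumed weights for the composite; in practice these come from what
# "success" means for the survey, as the candidate describes.
WEIGHTS = {"completed": 0.5, "completeness": 0.3, "text_length_norm": 0.2}

def hybrid_score(df: pd.DataFrame) -> pd.Series:
    """Weighted sum of per-user quality signals, each pre-scaled to [0, 1]."""
    return sum(w * df[col] for col, w in WEIGHTS.items())

# Toy data: two groups with signals already scaled to [0, 1].
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["control", "treatment"], size=1_000),
    "completed": rng.integers(0, 2, size=1_000),        # did they finish at all
    "completeness": rng.uniform(size=1_000),            # share of questions answered
    "text_length_norm": rng.uniform(size=1_000),        # capped, scaled text length
})
df["score"] = hybrid_score(df)
print(df.groupby("group")["score"].mean())
```

Optimizing this composite rather than raw response rate would penalize the fast, low-effort responses the interviewer describes, even if the incentive lifts the headline rate.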
Info
Channel: Data Science Jay
Views: 2,507
Keywords: google data scientist, google data science, a/b testing, a/b testing interview questions, interview query, interview query a/b testing, interview query jay, data science jay, data science, airbnb data science, airbnb data scientist interview, square data science interview, square data science, asana data science interview, asana data science, reward experiment
Id: 2sWVLMVQsu0
Length: 13min 5sec (785 seconds)
Published: Tue Nov 09 2021