Sampling and Simulation Problems in Data Science Interviews | Simulate Fair Coin and Biased Coin

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Captions

hey guys welcome back to my channel in today's video i want to dive into a few common ask probability questions in data science interviews i call those type of questions common problem because they involve with fear con and the bias con and use one of them to simulate the other if you are interested in learning what those problems are and how to solve them then keep watching a typical version of the problem is you are giving a bicycle that the probability of getting a head is a p p is unknown but we know that it's larger than zero and less than one sometimes the question made it clear that it's not 0.5 how do you design a strategy to simulate a fear coin using that bias coin a fear coin means the probability of getting a head is the same as that of getting a tail and both of them are 50 percent it's pretty clear that we could not use one toss to simulate a fear coin given that p is unknown so we want to try multiple tosses and find the combination of outcomes that have equal probability we can start with two tosses and see what we can get here we use edge stand for head and t stand for tail so for two tosses we can get four different outcomes hh ht th and tt then we can get the probability of those combinations it's obvious that ht and th have the same probability which is p multiplied by 1 minus p and that is what we want to leverage to simulate a fair coin specifically we toss a buys coin twice in a row if it's hh or tt we discard the result and toss twice again when we see ht we could use it to represent a head of a fear coin and when we see th we use it to represent a tail of that fear con in this way we can guarantee the probability of getting a head is the same as the probability of getting a tail now we have the simplest and the most straightforward solution to the problem let's take a few seconds to think about if there's any potential issues with this approach what happens if p is an extreme value say 0.9 we can get the probabilities of the four combinations because we discard h h and t t we only keep h t and th it means that we will be throwing away the results more than eighty percent of the time and it may take many tosses for us to get a desired outcome in other words the efficiency of this approach is low so how could we improve the efficiency of it can we find a way to use the results of two heads or two tails in fact we can we could keep flipping the bicep cone and combine the outcome together to simulate the head or tail of a fair coin let me elaborate this when we get two consecutive heads or two consecutive tails we can toss two more times and here are all the eight possible outcomes let's try to organize them into pairs and the elements of each pair have the same probability and here are three pairs with this finding we can sum the probabilities of the first element of all pairs and that would be the same as the sum of the second elements of all pairs what does this tell us well it means that we can assign the first group as head of a fair coin and the second group as a tail of that fair because they have the same chance of occurrence that's how we can leverage two consecutive heads and two consecutive tails rather than throwing them away using this method we end up only throwing away four consecutive heads and four consecutive tails all the other outcomes can be utilized intuitively the overall sampling efficiency increases a more scientific way to get the sampling efficiency is to compute the expected number of tosses to get a head or a tail essentially on average how many tosses do we need in order to simulate a head or a tail of a fair coin the less number of tosses the higher the efficiency i will leave this part to you to figure out the exact sampling efficiency and how much we have improved the efficiency by keeping two consecutive heads and tails let's now summarize this strategy when simulating a fear coin with the bisocon we can start with two tosses if the outcome is ht or th we can simply return the result of a head or a tail if the outcome is h tt we need to flip another two times and return head or tail if the outcome is one of these combinations we only discard the outcome if it's a four consecutive heads or four consecutive tails moving on we have a more advanced version of the problem instead of asking you to generate a fear coin from a bias coin the question asks you to generate a range of n numbers with equal probability 1 over n from a biased con for example generating numbers 0 1 to 3 each has a 25 probability as you may have noticed if we use head and tail to represent 0 and 1 respectively we are able to simulate two numbers with equal probability but how do we go beyond two numbers feel free to pause the video and think for a second if you don't have any ideas think about if you can simplify this problem a little bit there are interviews when you ask a hard question especially a technical question you could always try to solve a simpler version first or make some assumptions this can be helpful to break down a seemingly difficult problem into simpler pieces okay so back to the question what if the coin is fair rather than biased could you use it to simulate four numbers for sure you can if it's a fear call you could get four distinct outcomes from two tosses hhtt hd and th and each outcome has exactly the same probability i.e 25 now we can break this problem into two smaller problems one to get a fair coin from a bicycle and the other one is to figure out how many tosses do we need in order to get n different numbers well the good thing is that we've already solved the first problem just two minutes before another interview tip i have for you is try to leverage any problem that you have solved earlier in order to solve the current problem now we only need to know how many tosses are needed to get n different numbers again let's start with a simple example if we toss a fear coin twice we have four outcomes each occurs 25 of the time if we toss a fear coin three times we have eight outcomes which is exactly two to the power of three if we toss a fair coin m times we have two to the power of m results what we want is to simulate n different numbers how many tosses do we need well i guess the answer is clear it should be log base 2 of n now you have the answer to the problem it's not as difficult as it appears right one caveat is that the log base 2 of n may not be a whole number so we need to run it up in cases like this for example if n is 5 we need 3 tosses in a row to get 8 distinct outcomes and we have to discard three outcomes now let's put the solution together first of all we need to use the bias coin to simulate a fear coin we toss two times in a row and we can get hh ht tt and th let's say we choose ht to be the head of the fair coin and th to be the tail of that vericon and both cases have the same probability p multiplied by 1 minus p then we want to simulate a range of n different numbers with this fair coin if n is three we want to toss two times because rounding log base two of three to an integer is two for faircoin we can get four outcomes and we only need three of them we can use hh to represent zero ht to represent one and tt2 represent two we discard the outcome th we can translate it back to the unfair con and hh would be htht of that buys the coin the last problem we'll go through today is to submit a buy coin using a fair coin specifically you are giving a fair coin and you are asked to submit a bias coin with the probability of getting a head 1 over n does it sound like a different problem it's actually very similar to part of the previous problem we just solved namely how to use a fair coin to generate n different numbers and each has probability 1 over n a simple example is when n is 4 meaning the bias coin has a 25 chance of getting a hat we need to toss a fair coin twice and we have four outcomes and each has 25 percent probability we can then choose any one of them to represent the head of the bias the coin and use the other ones to represent the tail this will give us a biased coin with a 25 percent probability of getting a head so how many tosses do we need in general actually we have covered this in the previous question as well we will need log base 2 of n number of tosses and if the number is not an integer we'll have to run it up so if n is 5 we need to toss three times and we'll have to discard three among eight outcomes to get the desired probability you may wonder would abandoning the result impact the probability it will not because we just simply consider it not happening by ignoring the result it's a typical technique used in computational statistics and it's called rejection sampling i hope those questions and answers are helpful if you like this video please give it a thumbs up and subscribe to my channel i make at least one video per week to help you with your interview preparation to land your dream job as always guys i appreciate you for taking the time to watch this video let me know if you have any questions or feedback i will see you soon

Info

Channel: Data Interview Pro

Views: 6,940

Rating: undefined out of 5

Keywords: data science interview, data science interview questions, statistics interview questions, coin problems, sampling fair coin, coin sampling problem, statistics interview problems

Id: -SANBbv0-Hw

Channel Id: undefined

Length: 10min 58sec (658 seconds)

Published: Fri Feb 12 2021