"The Riddler" screencast: Monte Carlo simulation in R

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi I'm Dave Robinson chief data scientist at data camp and in this screencast I'm gonna be showing an example of how to solve a probability puzzle in R so I've done many screencast before and in each of them I've been analyzing a data set that I haven't seen before but I recently got an interesting a quest of how could I solve a math problem using simulation in R so really kind of a pup solving a puzzle rather than analyzing a data set so I'd love to show an example of this and rather than using tiny Tuesday datasets as I've been doing in previous screencasts I'm gonna be using the the Riddler column from 538 so this is a bi-weekly column that actually gives a puzzle each week usually involving probability sometimes math so I'm gonna solve I'm gonna try to solve the Riddler Express and possibly the Riddler classic that's the longer version it's a longer puzzle from one week go the reason I picked a week go is this prompt this most recent problems a little more about math and I'm than probability so I wanted to pick a problem that I hadn't that I could use a simulation so I've read this problem once very briefly to see that it's about probability but I haven't otherwise tried to solve it this is my first attempt let's see how I do all right this first problem let's see World Chess Championship all right I'm going to read this a bit slowly so you can have a chance but I'm saying it this says that I'm magic of two players one of them is better than the opponent so they win 20% of all games lose 15% of games 65% of games are drawn okay draws actually this is gonna be interesting all right so we say wins are worth 1 point draws our half point losses 0 points who's the first player to 6.5 points alright so we're going to do a simulation pause for this so this probably can be solved mathematically I like like understanding the probability distributions in truth I'm really not as not anything special at Mathematica since graduate school I like to solve these problems using simulation so we're going to learn to play simulation do simulation of these chats chess tournament and learn the chances that the better player wins the 12 game match okay I'm gonna copy these two parts of it so we can keep track of our numbers here in this chess probability puzzle okay what we're gonna do tidy simulation I'm loading the tidy verse package that includes the apply on ggplot2 and what I'm gonna do is simulate a large series of trials and each of them will have I'm gonna simulate all 12 games all right yes so this is this is different than how you might have seen our other our simulations done it's the way that I like to perform it it's certainly not the only approach what I do is I start by saying we have how many trials let's say start with a thousand we can do more later I can say wonder with the trial one to a thousand I'm using the crossing function from tidier that gets all combinations all combinations of what you ask well I'll do trials and game is one to two now I say every we have a thousand trials each of which has two games leave it to thousand rows I'm gonna change that to be twelve games so I have twelve thousand rows if I took a look at them it would be like trial one game one trial one game two trial one game three and so on now I've got all the games in one tall data frame I can simulate all the games at once what I say is his result is let's see wins are worth one point Oh it says first to six point five does that mean does that mean the mature and then we think about this very quickly twelve games one point for each it is isn't it possible that no one gets to six let me see isn't it possible that no one gets to six point five cuz I'm seeing as a second one second and winds are worth one point so if one player one if what is if players drew a series of times let's see six five or seven rupees round wins are worth okay first player to six point five it's a it's confusing to me because I'm not a hundred percent sure maybe I can read this somewhere else where i'm alejandra wasn't sure whether if someone if neither player gets to six point five if they still win let's see okay do you have to get to six point five because you can't have more than oh it was a half point for each player oh I guess ah you could oh I I take it back you could if they drew every game it would be six for each player okay so this makes sense there's always going I wasn't thinking correctly it's always going to add up to twelve and you could draw overall the question is could you get more than six point five all right I feel feel better about this I needed to understand that a little bit I'm gonna keep track of the score of this better player all right so let's focus on here it is 20 percent fifty percent sixty five percent what I want to do is sample from three options they this player wins twenty percent losses losses getting zero fifteen percent and draws sixty five percent so I'm sampling with probabilities point two point one five and point six five what I just did is there it is this sample function oops I need to do sample oh right and give it the number and is the number replaced equals true probables is there it is all right so what this does is get one value from this vector n times the number of road times it'll replace each time and the three and the probabilities the weighted probabilities are twenty percent fifty percent sixty five percent uh here we are okay so yeah we're taking a look at this we actually see is that um is that in the first game this what would case for example the player did I'm tied here and here and lost here one here alright now that I've done that I can add a group by trial and summarize score equals sum of result so today's status get this player score in every match whose trial one trial to trial three and and where I can find out is I can actually what I like about it being um tidy is I can immediately histogram the results and say this is the distribution of the scores that the the player ends up with so we see all the way from it's occasionally what is this see I'm gonna make it ten thousand instead I think I've been huh yeah and what I do when I do it bin width I should say been with 0.5 no 0.25 keep every there it is this would be the histogram of the resulting scores so can sometimes be as the score the player can be as low as two point five sometimes it can be high as as high as ten and the player wins whenever let's see color equals red y-intercept equals six point five the player wins whenever they get at least six point five so if it's six overall they would lose x-intercept alright so the story would be how often do they get at least six point five so I'm gonna call this scores and I'm gonna we can histogram it but we can also simply look at find out how often is it above 6.5 but I say summarize mean of score is greater than or equal to six point five and we find out the player wins 52 percent of the time I can actually increase the number of trials a little bit more if they want a little bit if I want more precise answer and we can keep running this if we want a couple times it's about looks like it's about 52% so the better player is winning more often than by chance but barely more often than by chance Wow it's actually even less impressive that then I thought this is 20 percent 15 percent 65 percent okay yeah that's um that's really pretty interesting I think yeah that that is uh yep that's interesting I'm gonna make this a million for a moment 1e6 get as accurate a score as we can you notice to take in a minute it took at 12 million rows so at 15 1.9 percent chance that the better player wins with these win-loss draw probabilities all right okay so the next question so that that's an answer to what's the probability they'd win a 12 game match now they could say how many games the match have to be to give the better player each of these chances alright so what we can do here is take advantage of this crossing I'm actually gonna copy the same values but instead of crossing just trial I'm gonna do just a thousand first I want to say trial and end games and I'm going to give it a couple of values twenty forty sixty eighty a hundred let's say now and then I also crop and then let's see yes well this will say is I want 20 games 40 games 60 games 80 games a hundred games now I've got a neat trick I can actually let's see I can here's the fanciest thing I think I'm going to do today is say game equals MVN games perform the function seek Len seek Len says um says cake if you guys like for it'll get 1 to 4 100 is 1 to 100 in this case I'm gonna apply that to every one of these values and expand it by that much so I just simulated a thousand games a thousand case trials with 20 games a thousand with 40 a thousand with 60 a thousand with 80 a thousand with a hundred and now I can apply the same method where I group by both trial and end games and oh I need to us I simulate the result good and I summarize the score uh uh-huh oh it's score let's see oh I need a pipe here and it's in games oh and games here it is so now I can say when they're 20 games the score was 8.5 when they're 40 games this in this random simulation and all the way up to 100 now you win if it's greater than 1/2 the number of games so I would say when is score greater than I'll actually ungroup after the summarize because I don't want to perform this while it's still grouped ungroup so did the better player win what was the score greater than end games over - and it looks like in this case no in this case yes in this case no in this case yes great it's the last thing I'll do is group let's see I can actually alright I'm going to do this a bit differently I'm gonna group run this huh I'm going to just summarize in one step or because that we leave this summarized by n games I can actually go right ahead and summarize it again well say mean how often did the the player win say mean score greater than half of a games alright I'm gonna throw 12 in there because that was our original value boy so yeah we've got some noise here I'm actually gonna add bring it up to 10,000 trials not the world's fastest code but it'll do alright so we say in 12 games 52 percent or so 20 games the the better player wins 57% of the time as we increase the number of games we increase the probability that the better player wins but notice we're still not really getting to UM to the values we're hoping for here like 90% certainly not a 99% I'm not sure we're gonna be able to I'm going to do 12 but also I'll do it in units of 25 25 250 by 25 and games and I'm going to call this alright so this actually gets us where we can start making a graph of how does end games affect your probability of the better player winning and we get this line plot and we've seen that um that early on you need you need absolutely suddenly at least 12 games but up at 25 50 100 it starts leveling off so it's getting um this makes sense cuz it can't go above 1 I'm gonna increase it make it go all the way to 500 this will not be the world's fastest code alright and we're gonna see what's the effect of increasing the number of games here we go there it is we're seeing 75% you need little under 100 90 percent 95 we're not quite getting to 99 I can tell you that usually when when I want to do simulations like this I actually usually want to do it on a on a logarithmic scale that is notice here I'm going to actually say giome point and I have evenly spaced points here that's just not usually how I would simulate a form like this I would actually have the points get farther apart and then put the x-axis on a log scale yeah okay because that usually ends up being the way to UM to tell what the trend is and where it passes particular points okay yeah what I'm gonna do now is say um is instead of evenly spaced um notice this is like 1225 everything I'm going to go in powers of two from 12 so I'm actually going to say twelve times two to the power of one to six start with that so that's various doubling points along this curve I'm actually gonna say one to seven going all the way up to 1500 games oh but I should be zero to seven that includes 12 24 48 and so on you really can make lots of choices here depends on your patience um look do this a logarithmic scale I think is gonna make it easier for us to UM attend for our particular points on this line so what I'm gonna say is do 10,000 and that's maybe 50,000 trials of each of these number of games so it's not the world's fastest code it's gonna take a minute to run we could have we probably have sped it up a couple of ways but what I like about the Tidy approach is well it's not the absolute fastest some matrix operations are going to be faster one thing that it does is it um it's very easy for me to keep adding complications like add a number of trials to it and to summarize the results because they're all in this tidy form in - here we go probability so here's the number of games as it approaches there's a when will it hit a 99% and we get this curve we can actually say we throw in a scale Y and Lluis labels equals % format this is probability better player with X's number of games why is probability better player wins this is pretty neat we are probably not gonna have a hard time getting some values like getting a solid estimate for values like number of like 90% I am going to change my change one thing uh-huh instead of powers of two I'm going to do logarithmically increasing values from zero to seven five point five what does this mean so you go from 12 16 24 33 48 it's actually adding more points into the middle of these alright and yeah I'm gonna give I'm gonna give a give that a shot it's gonna take it a little bit longer you know round this value it's going to take a little bit longer but um yeah this shows how often does someone win when there's this number of games again we could have solved this mathematically using I think a binomial distribution but I want to show how we would do simulations and how I would bring like Monte Carlo simulations in are how I would approach this problem okay it's gonna be one minute the last step we're going to need to do is we're gonna want to interpolate between these points to figure out the probability like when does it cross 75% we can't get that precisely so instead we're going to be doing an interpolation alright so here we have let's make this graph notice there are more points there now I see this very nice curve notice as the log of games goes up the probability keeps going up and up and up and then it starts leveling off alright so um so 99% is probably somewhere between these two points alright so the last thing I can do is I've got one game sim is there any way I can figure out values in between these well that's where the aprox function in R comes in what aprox does is say use linear interpreter blasian and say I predict Y is a function of X here actually want to predict end games the function of wind at a particular point I want to say what value of X's and games sim what value of win will lead to so the value of win I'm going to use X out is 0.75 where is 0.75 going to be on the endgames scale the last thing I just realize is I usually do this in a log scale so I'm gonna use use what value of win of when will lead to yes I want to approximate log of and games okay because what I'm doing here is I'm saying log of n games if I add that column it goes up oh I'm gonna do quickly log to some more oh yeah that's right I have it yeah it goes up by particular increments and I'm gonna say what value well would the log of games be at 75 75 because that's what I was doing and this plot is using log of number of games but it Y and what I get is I can add dollar why what did see is is I would expect log of games to be a four point four so if i exponentiate that again it looks like we need about 80 81 games which matches here it is we can see it's somewhere in between this value and a hundred so against eighty weird 81 games to get to 75 we need what else we got we need 9 how would we get to ninety percent to about 250 games how would we get to over the other option a 99% chance this one is probably really rough but we see about 763 and we saw in fact that 768 was about 99 so you know type it over so generally these would be three somewhere the 80 is 257 63 if we ran it for a little longer we could get more precise again this is a simulation approach to problem that could be solved mathematically I wanted to demonstrate this example all right it's interesting curve I'm sure it is definitely related to the binomial distribution yeah that's that's uh cool alright so that was Riddler Express that was one simulation of a chess prava tournament problem I'm gonna read through this example from Riddler classic so I'm gonna give it give it um let me see I'm going to get to read through this myself and you have a minute to read through yourself or skip ahead if you want okay is this kind of interesting this is a two-phase game where we place balls randomly into a set of n cups and we have our throwing and our pruning okay thinking about how we'd solve this let me see okay well and the question is does it fill up every single cup all right I'm gonna try this in the time there's definitely a little bit harder than the other problem I'm gonna try it tidy approach first and a hundred percent sure whether I can make it work let's see okay first we have to choose a value of n I'm gonna start with a small one we will say N equals N equals four and what I'll say is um we pick how many balls are we are we throwing takes balls randomly those are the cup ah we need every cup to contain at least one ping pong ball aha so I'm actually not in this case gonna do it tidy because there's a recursive relationship within this this we're simulating until all the cups have a ping-pong ball okay so I'm actually gonna have a slightly different approach it's gonna be less tidy than my other my other approach all right so and I'll speed it up later or if I figure out ways to do that what I'd start with is that um we see we have hmm could represent it as a list I don't want to do that I'm going to be adding a list of vectors that gets appended to I don't love the efficiency of that adding that like appending to vectors not finalists that this is not on what R is most especially the adding two vectors is going to be a slow part of the of the step oh okay what I can actually do is are there n cups there are n cups and balls okay I'm going to represent that let's do one iteration right now I'm going to represent it as a matrix of zeros we say and row actually zero integers now that I'm thinking about it when it's zero L means 0 integer I'm gonna say how many rows and how many columns and there we go now what is this some what are what iteration am i performing here I'm gonna do the iteration where I say I say here's our matrix every cup every column is a cup every row is a ball number so I'm going to do is in every iteration I'm going to take a random ball put it in a random cop that means random ball one of these four rows random Cup one of these four columns aha so I'm gonna say M sample from one to four in a sample one and say I'm actually gonna say Rho is this column is this so grab two values and then say M of row column we gonna optimize this later make it a little bit faster but it's alright right now row column and then say this is this plus one and now let's take a look at another look at em what we just did is we drew ball 1 and we put it in column in a in the cup for all right we did that once now I would need to do it again so I actually I'm gonna use a for loop we don't always use for loops in R but in this case again we're probably gonna find a way to make this a little bit faster but what I'd say is here we go for I in I'm gonna start with a loop it's not going to end up being a loop I'm gonna say I in 1 to 1/2 for loop is gonna be a different kind of loop what it says initialize the matrix do or loop and then let's take a look at the matrix alright we threw a hundred balls and we ended up with all every column got a bunch of balls of every type that's - that's right here this is too much we um we have loads of we don't we just this is loads of data 7 5 5 5 I have five six eight four yeah this is this is um too much okay so the too many balls everything already has loads of uh loads of them I'm gonna actually try throwing in ten I'm curious now we every ball is and every Cup is not every cup contains at least one ping pong ball when I say every cup contains I'm doing a call sums operation on this matrix so I actually don't want a four loop I want RAL call sums of em well let's see well then let's say all call sums of them are greater than zero so oh no well not all kind of or I can say well any call sums is equal to zero I say well any cup is still empty well any cup is still empty they don't say choose one so actually gonna say this um I'm gonna call this ball and then cup so we say Oh Rose bomb this cup well there we go okay I would I'd say is um here we go there we performed it once we got a result one one one three four all right we get another result we get another result we get another result okay every one of these times let's see yes what we're doing is is saying keep looping and then and then at the end we'd have a we'd end up with a matrix um where we say here's here's this one has cup one has balls one and three and four come to is ball three cup three is ball 1 cup 4 is ball one that's the throwing face face step two is the pruning phase what I want to say is remove any ball whose number does not match the containing cup this is this is cool this is actually that wait where will they match they'll match here here here here that's called die at the diagonal diagonal II can say die Evita they look at the dying of em alright yeah the diadem will actually say 1 0 0 1 that cups 1 & 4 after the pruning phase so like pruning phase would drop this drop this and and we could actually say no we did not satisfy the condition the round is not complete because there there are empty cups these two cups are empty so I can actually say in the pruning phase we just returned that the diag oh look that's the throwing phase so the throwing phase is the phase and then the pruning phase comes after where we say diag of em alright so here's where we're playing the game every one of these times heck that time everything was empty all right and um okay well that's interesting um overthrew why that's interesting in a second so a whole bunch of times it ends up empty wow it really is taking a while to finish the game where after a round is completed there are no empty cups alright so we're gonna actually put this in a function called simulate round so I'm gonna say it's just a function where we say and here you go we say okay here we go and now if I were if I run the function simulate round I get the number of balls of match each okay now I can run this a hundred times or any number of times I want I'm just thinking for a moment how I might want to do that okay I'm gonna do the the old-fashioned our way there's a function called a replicate look and say replicate a hundred times this function and get I'm getting a matrix of results this is one yeah this is I'm starting with this where we're probably gonna try it in a couple different ways all right and in every one of these cases see all right Wow and or in any of these cases are there no empty cups okay there anything on the diagonal well I could actually say call sums how often are you empty for every case here how often are empty here none of them won't be here none of them were empty wherever that equals zero so I can actually say how often was this zero this is a this is one approach I just realized something I want I'm gonna actually be a little bit easier I'm going to say we've got two AG of em I can say all I'm gonna make this a little bit easier is it true that all of this is greater than zero then you would win the round so you could simulate false false false usually no we just hit it true okay so there we go okay round yes I know how we're going to do this I'm gonna simulate this is the step where I'm actually going to get Heidi this simulation is not particularly tidy it's a while loop there are ways we can speed it up shorten it I'm not going to do it right now let me see huh yeah but yeah what I'm gonna do is say data frame I'm gonna say round is 1 to 1000 but I'm also going to add a column column going to add is is when is did you win or not and it's math LGL map I'm forming on every row can actually think there's a function called repeat know it it's per that's not reap rep along its oops oh I guess it's replicated oh okay I can say rerun less that's name the function I don't use this very often there are a few approaches to this we went a thousand times and however many times there are how many of them are any rounds with generating you just check this is um use interval okay yeah yeah so to say is rerun n times and say and each time I'm going to run simulate round there are other ways we could approach this oops me run I actually I'm just gonna map logic for 110 or I can do it wrong number this is that stuff what doesn't really matter they're a few ways to alright I didn't uh oops this needs to take something okay this does need to take well it doesn't matter we'll get back to that but uh I just needed to run this a thousand times and get a column of truth of falses for each and then we say in this case they won once right away one here kept going okay why am i doing it this way I'm simulating all the rounds at once and then I'm gonna say we're playing a sequence of games so right now we say this person played how long did it take them to win was that fifty five yeah it looks like like fifty five rounds they didn't win before that and then we're gonna start them over so they start over and they keep playing okay why can't I um why am i doing it this way because I can actually because I'm generating all the rounds and now I say mutate I'm gonna create cumulative scores I'm gonna do one thing first all right yes I'm gonna say game number is the cumulative sum of let's see so I can't so I'm sure I can't quite do I say cumulative sum of win this would say you go up until he venue in game 1 so notice it switches from 0 to 1 cumulative sum on the sum is true false I can't quite do that I want this to stay in game 0 so I'm gonna do cumulative sum of lag of win so that actually Q moves sum of lag of wind which oh right whenever I lag it the first one Oh actually meant lead oops yeah so what I do is say now it's 0 and then I actually did mean lag but I need the default not to be um I need the default the first value to be 0 if you don't know how lag works worth reading to do it's really quite exciting aha game 0 ends with true game 1 keeps going Wow game 1 lasted a while until 162 163 even game to game 3 game for Game five I'm playing a series of games now I can actually just count how many rounds were required in each game it says would you expect to need to play to finish this game count game number there it is so now I can actually say you know we have this first one to 55 then 108 then 35 and 5 so how many rounds does it take to wage game can actually jet probably do even more than this I can say it try 186 it's gonna it's not the world's fastest thing we have a bunch of steps in here we're doing um matrix operations we're doing uh everything million rounds might have been a little bit much we'll see but I can then come after that I'll be able to count the game number and yeah I think a million rounds is a bit too much I need to do a hundred thousand rounds counting game number how many are steps did it take but I can actually do one other thing there we're interested in how we also had a second question which said how many balls do you expect to draw need to draw and throw to finish this game that'll happen more often in some games than in others so that's what's great about the simulation is we solved this problem but we also by the way here it is here's my new simulation of almost 3,000 games and how many rounds did it take each time I cannot I can use the grandad if I like all right this is a clap pretty classic we call a geometric distribution if I set a bin width of let's set it at 2 but let's say I can t remember how to how to say the problem is I need the first one to start at 0 which I've sort of forgotten how to do let me see for one second center or boundary I can see boundary between two bins ok I can say boundary equals 0 ok this is a pretty classic geometric distribution we expected to see that is the lowest numbers are the most common and then it gets steadily less common as we go the geometric distributions you're gonna recognize it anytime that you say how long until something happens all right but it's already recognized this kind of shape that at this point we can take round I'll take this and simply say summarized mean of n because I can actually call this game summary histogram and histogram and summarize mean event 34 rounds to finish this oh come on this is on average 34 rounds all right that's uh that's one simulation but I also want to know how many by involves that I have to throw to finish the drawn throw to finish the game that means they don't just want to return did we win the game I'm gonna try did we win but I'm gonna return in a list with two items I'm also going to return number of the number of balls that were thrown wrong is it's actually some of em we've been adding to em every single time as we go so if we sum up so here's em we sum em we said that time we had to throw six all right so I could say he was oops all right it's not math logical anymore we would I'm gonna start this with just a thousand because I'm you can do a little more work here I'm going to say I think I can do this I'm gonna find out I'm going to say unnecessary with two values every one of these values become one of those and then I'm gonna unless they ha ha round 1 did not win we threw seven balls round six we did win we didn't win on round six and I'm that round we threw 15 now that I've got that instead of a count group number I'm gonna say group by game number summarize rounds the number of rounds but also balls thrown is the sum of the of balls phone who said how many balls do we have to throw all right now I can say in game one we won in six rounds we had to throw 46 balls in Game two well game zero we good and get in the next game this is how it evolves you through and here's every one of these these steps alright that's pretty that's a pretty interesting that was um there's a thousand rounds let's do a hundred thousand rounds every time was simulated with finding out did we win finding out the number of balls thrown yeah there's a few other ways I could have approached this now that I'm thinking about it yeah is if this is a pretty solid way with letting what out says this is the function that that we were allowing not to be tidy but that has a tidy result that has two data frames that's pretty common we'd say well here we need to do some looping we need to do something I'm again we could you could find ways to speed it up but we'd say yeah we'd discover once we've once we've done all these simulations we get to say how many rounds and how many balls are thrown okay let's see how long this takes I'm realizing a few things I'm missing I think this should be sample int is gonna be a little bit faster and sample and I throw I give it up for L think yes I would say a for L mean for an integer this way the matrix will stay being an integer all right that's not gonna be the slowest thing here there's gonna be other other things we can do okay there's still yeah for balls any calls for so what I'd say then is is I'm going to summarize the games and here we go and I'm gonna do two summaries on that I'm going to do game summary I'm gonna say summarize the mean of the rounds which should be hopefully it'll be the similar to last 134 balls thrown equals mean of balls thrown give a 35 close and somewhere around 300 balls phone all right that's pretty interesting this is um so this was running a simulation when again notice we did a Monte Carlo simulation here that wasn't very tidy we said let's do a loop because we have this on recursive relationship between we we stopped at a particular point and as well as we're accumulating this and yeah and then we would say but from our end result did we win how many balls did we do you end up throwing the yeah that's definitely um this is what yep is what approach to simulating this problem of this particular game around throwing and pruning how could we make this more interesting well one is we could change the value of n I don't know why I said ro here should probably pass n as a function as a part of me as an argument and now I could have said instead of a thousand rounds in general I would say cross where we'd say n is either three or four and then I would say round is either is from one two let's do 10,000 this time and now I can say for each value of n we're mapping from n map the simulate round step mm-hmm yep okay this will do let me see take this I'm gonna do a hundred examples just to check what this works uh-huh what am i doing n is is here and here huh on end we do simulate round oh oh here it is easy wonderful sample intimate matters that I'm not giving it an integer maybe not I don't know for from a performance perspective well see okay well it does because o 1l of course it doesn't do I mean all right I'm going back for this is sampling a random row and column okay here rounds and and then I can say within now instead of grouping by game number I need to group by by both an in game number and actually have to group by and before I do anything before I do anything here okay yeah what I'd say is um before I add the whether you want or not all right so here we have just a handful of games in three and then in four I'm gonna increase the number of rounds by a lot I want to say 10,000 at 50,000 so we could have had a lot more we could have said here's n here and three four or five six and changing the number of balls but what's just important is because we have this table set up just high D set up around the UH the less tidy step of simulating an individual round we're able to UM to perform these kinds of transformations to say I want to change and put parameters like say N equals three or four instead of just N equals N equals four notice how usually this is going to take a couple of options depending on some of the parameters of our simulation oh I know what I did wrong I think I should have returned think of the construction of the data frame at every point here is a waste of time I should have returned them I'm learning I'm learning here and what I'm gonna do is say I'm just changing this for now so we can understand it it's a result equals math so now this will be a list of matrices I'm gonna get back to this in a second I'm gonna ruin our result first now it went before we've still got our group summary sometimes they'll be n some times for N equals three sometimes will be N equals 4 and we group by n before we summarize now I can say if there were three we expect it to take about ten rounds and throw about 55 56 balls if the N is 4 they both go up this makes complete sense to me that I'm the higher end is the longer you need to play the game before you're able to UM before that pruning step will leave anything any ball whose number doesn't match the that's good but yeah I just want to know what I would have done here is return em and then I would have said you take when equals map logical on true result I would say I would say for every value is it true that all diag of the diagonal is entirely equal to sorry all that diagonal is greater than zero and then I would have said balls thrown equals a map result sum sum of every matrix okay it would have been similar I'm not using a none nest here put this all into one step we're doing three things we're mapping onto the end then we're mapping the logical let me try just a hundred examples that's not very common to do oops ice map good here it is okay so we get our result this matrix I just noticed something's wrong the mate oh the matrix is oops one by two that's wrong three by a good good that was I was had some strange intermediate version uh-huh it's a three by three or four by four matrix you a thousand rounds and that can do fifty thousand it's not gonna be all that it's probably got a little bit faster I think it got a little bit faster I think that step of data framing everything might have been a little bit of a waste of time but still the majority the time is probably spent in this loop yeah no that actually was way fast I believe way faster I haven't I have it profile that I can't say for sure it's it's right I can um I'm actually not gonna bother histogram it I'm just gonna say how many rounds how many balls thrown that's pretty good I can I can arbitrarily extend this this could've been all the way from three to five instead of from three to four and I could get a more and more interesting um I'd say like like N equals 3 N equals 4 how many rounds how many balls thrown and keep changing that result okay so notice it took this took some time I think that was what I think about about 40 minutes on that second problem I think is right and I'm not sure exactly I haven't been timing it but notice they actually didn't end up being that much code we have a code for simulating around then we have our tidy approach to extract what we're interested to vary in our parameters particularly with m2 u2 at use map from pair to transform each of these these parameters into our into our various results that were interested in in the form of did we win this round or not and then this group by and a proper use of cumulative sum and lag to discover how long did the game take so the yeah and so we see but and now we get a result above and now we end up in that final result so one part tidy one part less tidy one part more tidy we end up with with our final result that we can that we can bring as a conclusion okay I want to find out if we were right let's see I'm actually sure we could I could easily have made a mistake or misinterpreted something that's at some point very easily all right so this okay so first we got our chess problem and we I said yep we win oh wow this is way closer than I expected our simulation nailed it 52 percent of the time we want a 12-game match and then we see the larger thresholds are 82 2 4 8 773 let's see what we got here 80 about 81 to 52 48 close and these are pretty close noticeably yet we were off by a couple games we don't have an exact we don't have an exact answer here oh that's interesting I do not know oh that's really interesting I don't know I wouldn't have have certainly wouldn't have solved this approach hey check this out someone else um used our they didn't use ggplot2 but they use the artist simulated got a similar curve to what what we had that kind of asymptotic curve they didn't do it on a log scale alright so that was one that was the first problem let's go to the second problem we have ping-pong balls and we find out how many throws do we expect oh yeah the first question how he throws we expect to need first question coupon collectors problem I'm not sure what I'm huh the more as we collect coupons I just one moment I think I might have misinterpreted the problem alright so I'm just learning this now let me see ahaha oh I was emptying the cups after every phase I was empty I was going through all the balls and then removing the ball that news number does not match the the containing Cup okay that was wrong I was emptying them I should have been leaving it all right I let just learn something um all right I'm gonna modify before I look at the answer I'm gonna modify our simulation this is the step where oh wow yeah this is gonna be way harder not harder but I different okay I'm gonna spend a few minutes to try and solve the the more the this problem right this is gonna happen in my live screencasts oops I was emptying the ball there the cops every single round all right so what I'm going to add is let's see this is one simulation I'm going to keep going until you see all right how do i how do i empty everything but the diagonal I'm actually I feel like I remember how to do this but um but let me see it's upper tribe M uh-huh and lower try huh okay yeah that actually is a way I could do this is I could actually I could empty mm-hmm okay what him do is empty the empty the cup empty the cops okay so I solve different variation of this problem that was good to know but what I'm gonna do is put it in another loop I'm gonna say well any yeah well not all diag of M is greater than zero as great it was greater than zero so while not all of the cups all the the cups are empty okay yeah throwing phase there's a new recursive relationship here all right I'm gonna say well any cup is empty well I'm simply okay empty the cups and now I'm actually gonna say I'm using a function I haven't used in a while where I say turn the lower triangle to zero turn the upper triangle also to zero that was these um two steps she's gonna make it zero integer they tiny bit faster okay so I'd say well any cup is empty you keep adding it empty the the non diagonal cups and now I actually I can do all but I can actually say well any call sums of M aha I can do this it's actually kind of neat I can say well any I can say well any cup is still empty and then again well any cup is still empty so well while cup is empty do this and then empty them again cup is empty do this and then empty them again okay that's um that's definitely definitely interesting one thing I need to do here though is and I would need to say let's see I need to keep track of two numbers this makes it even though it's hide it more so I'd actually say rounds is um is is gonna start at zero and I would say balls starts at C now says your balls are zero and I add around every time so rounds is rounds plus one and I add a ball every time in this inner balls is one okay so we're adding these and at the end we just return the number of rounds and balls so we say well any cup is empty we do a new round well any cup is empty we're gonna throw another ball by choosing a ball in a cup adding to it emptying them going back this is probably not the most efficient approach I haven't really explored the most efficient approaches yet but definitely want to speed that up a little bit a little bit actually this makes this set of code a little simpler it's a it's bit of goofy what I'm going to do is grab every and now these are no longer rounds these are games and I'm but it's still gonna take end game is one two let's do just a hundred of each yeah I'm gonna start with that and I say simulate game a different approach and but not no longer simulating every round by itself that actually Wow when I think back on it it didn't make sense to consider it rounds all right that's cool so here's simulation rounds place everything yeah add one I had a ball okay and empty them and then I can actually say grab the first item of each result and the second item of each result and me go to this I hope this will be an int oops it won't be an into it unless I go ahead and turn all these into an end why not no no k keeo reason why not here's my games okay completely different result how many how many rounds and balls is the second one okay different results done with a couple of with two loops one where we say play a new round and then empty it and then one where we um the empty okay completely yeah entirely different answers there's our games number rounds involve each but I can still take my reigning code summarize it oh um it's just gonna be games now I don't need that code anymore if you can't bear to see it go I'm gonna put it a little bit lower here I'm gonna say games alright so these were our three first answers from UM from these this is only from 100 games I'm gonna try 2000 let's see how long that takes so simulating a thousand games of three well that's not so long mr. Minnick 10,000 I say a thousand games with 3,000 games before of N equals 4,000 gives N equals five each time we passing it here playing a complete game with a loop alright and we're keeping track of the rounds of the balls each time thinking about um yeah I'm thinking about how to profile this speed it up alright it looks like it's about four rounds six rounds eight rounds or a little above that and then in terms of number of balls 1630 357 to me it looks like it's good like both of them are going up linearly alright if I thought this one wrong I'm not gonna go back and fix it but it's good to know now that I'm that at least I think I got the interpretation right okay thanks for your patience and I'm gonna go to the following column and we'll take a look at them right expected number of balls thrown woo boy we don't have um it's quadratically that's not oh wow this is uh here is Python code Monty Monty Carlo simulation expected number of rounds increases I'd so this actually says the next effective number of rounds increases the number of cups increases roughly linearly which matches what I've been I'm looking at so far wow this is inin wow there's really an adventure here of um mathematical approaches but how many rounds some throws okay so throws is quadratic does that match up 1633 okay yeah I'm gonna I'm going to decrease the number of games but I'm gonna go with more balls I'm gonna actually try this one more time three two twenty twenty-five by threes why not do a thousand games of each okay and let that play for a second and then we're gonna take this and examine let's see it's interesting to me that rounds is it's so interesting that rounds is trickier that's really interesting huh huh okay but and said the second question is how many rounds I thought the first question was how many rounds but yeah this was my Monte Carlo simulation to try and answer the UM the relationship between number of cups and the time to finish this game okay I'm excited that it that I'm that I was able to do this through simulation even if it took me two tries rather than having to remember how to deal with Markov chains and this looks like quite a quite a monster of a math problem okay that's my conclusion I'm gonna let this finish I think it's actually gonna take a while I'm gonna yeah I'm just gonna write a little a little bit less and I say I'm and tried one more time because I haven't been as the number increases it's gonna get slower and slower and slower okay and say yeah thank you very much for joining me this was the first time I tried solving a probability puzzle with the UM go yep linear and quadratic alright we I'm not sure if we got the exact answers as everyone else but it looks about right and this was our approach thank you so much for joining me I had a great time definitely a few um this is a higher pressure for me than analyzing data I have a little bit less experience but I'm glad I got to show how to solve a probability puzzle through simulation so I hope you enjoyed it and I hope you keep watching my screencasts
Info
Channel: David Robinson
Views: 6,892
Rating: 4.9722223 out of 5
Keywords: rstats, data science
Id: pBGMt28xgvk
Channel Id: undefined
Length: 66min 42sec (4002 seconds)
Published: Tue Dec 04 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.