How chance affects our lives way more than you think | The mathematics of randomness

Captions
This video was sponsored by Wix, a platform that makes it possible for anyone to build their own unique website for free.

On one side of the screen are the actual results of 100 coin flips I did before recording this video. On the other side is me making up a bunch of coin flips, trying to make it look random. Can you tell which is which? Pause this video if you want to give it some thought, but just to get to the answer, let's look right here. Here we can see a run of four heads in a row. The chance of that happening in four flips is only 6.25%, but out of 100 flips the chance of a run of four heads showing up at some point is about 97%, and the chance of four in a row of the same face, heads or tails, is about 99.9%. On this right side we can see a few of those runs of four or more heads or tails in a row, which isn't too surprising in 100 flips. On the other side there's only one run of four heads in a row and nothing more, which actually isn't that likely. As you can guess, that is the side where I made up the results, while the other side was the actual coin flips.

Now, as I've said in a previous video, humans tend to view uniformity as random, and they think clusters must be caused by some outside factor, when in reality clusters are expected if everything is random. Think of a basketball player shooting free throws. This of course is not just a matter of luck, as it's highly skill-based; however, there is an aspect of randomness. If you track several shots from anyone, various patterns are going to show up. A good player may tend toward a higher number of successful shots overall, but if they took a million shots, let's say, it would not be surprising to see clusters of several successful attempts and also several failed attempts.

Or what about students taking multiple-choice standardized tests? Of course these are more knowledge-based, but what happens when someone doesn't know the right answer? Well, they have to guess, and thus randomness sneaks in. I've actually worked with a lot of students on ACT and SAT test prep. For a test like the ACT that's out of 36 points, a student averaging a 28, let's say, will see scores like these over many tests. Their parents will tend to really like it when this happens, but can worry a lot when these show up, as if the student is getting worse at taking the tests or something. However, both of those outcomes are expected at some point, even if the student wasn't studying at all.

So with billions of people in the world and all the possible things that can occur over time, it's easy for us to focus only on those strings of outcomes that seem unlikely. This could be someone who goes on a hot streak in the casino, or a stock market expert who beats the market a few years in a row but can't do it again. Sometimes these result from skill or some external factor, but other times it's just randomness that simply comes up because, hey, it was bound to happen for someone.

Now let's look at some coin flips. If I were to flip a coin four times, what would be the expected outcome? Well, the answer is of course two heads and two tails, but how expected is that? To find out, let's write out all possible outcomes, ordered by how many heads appear. First, no coin lands heads, which has only one way of happening. Then one coin could land heads, which can happen four ways. There are six ways that we can get our expected outcome of two heads and two tails, and we can do the same thing for three heads and four heads. So as we can see, the probability of getting our expected outcome is 6 out of 16, or 37.5%, which ironically means we don't expect the expected outcome. It's more likely than any other column, but on its own it isn't something we'd say is likely to happen.

Now, as I'm sure many of you already know, the numbers we see here (1, 4, 6, 4, 1) are actually those in the fifth row of Pascal's triangle, counting the single 1 at the top as the first row, and we can use the triangle to analyze any number of coin flips. So next let's look at ten coin flips, and to help with this we'll use the 11th row of Pascal's triangle. From here we can see there are 252 ways to get five heads and five tails, our expected outcome. All the numbers in this row add up to 1,024, which means the probability of getting the expected outcome is about 24.6%, even lower than before.

The consequence of this easily sneaks its way into reality when dealing with polling. For example, let's say there's a school where half the students are in favor of starting school later, and thus getting out later, while the other half want to keep the schedule as it is. If you sample 10 students, even if your sample is completely random with no bias, there's only a 24.6% chance you get a result that is representative of the school, assuming the school is fairly large. It's no one's fault that the error is likely; it's not bias, it's just the randomness that is our reality, and the consequence is that you will likely misinform the student body about what people really think. Now, in the real world, companies that do surveys of course need to account for this, and it definitely would be their fault if they just surveyed 10 people.

To be a little more honest with our survey, though, we can go back to the 11th row of Pascal's triangle and look at this interval. This number here, 210, is the number of ways we can get four heads out of ten flips; put another way, if we interviewed ten students, this is how many ways they could return that 40% are in favor of the schedule change. This number is the same but with regard to six heads, or 60% of students saying they're in favor of the schedule change. Out of the 1,024 possible outcomes, this interval accounts for 65.6% of them, and since these values are ten percentage points off from our 50-50 expected outcome, we can now say that 65.6% of the time our survey will be correct within ten percentage points. We could also expand the interval and say that 89% of the time our error will be within 20 percentage points. How confident we are about being within a certain range is what we really need to account for here, and the more people we survey, the better results we get. In this case, if we surveyed a hundred people instead of ten, we'd be within ten percentage points more than 95 percent of the time.

Now, this is an oversimplification, of course, since I told you half the school is in favor of the schedule change, but as you can see, randomness in polling can really affect what a poll says. Unless you sample every single person in a nation or city or whatever, which typically isn't possible, there's going to be some possibility of getting very wrong results.

Now let's move to some more complex scenarios. Let's say there's a hypothetical town of 100 people, and in this town, for the year 2000, there were four crimes committed. We'll assume none of the crimes were murder, by the way, so the population stays constant. Then in the year 2001 there were three crimes, 2002 also had three crimes, and I'll just put down some more numbers for every year until 2011. What do you make of these years? All the numbers here are pretty small, with an average of about 3.83, but in 2010 at least, the crimes are more than double that average. So if this were the real world, would you say that increase was due to some external factor, or is it just randomness?

To answer this, we turn to the Poisson distribution. This is a distribution that expresses the probability of some number of events happening during some fixed time interval, assuming the events are independent and random. Let's say on average a car wash gets 10 new cars pulling up every hour, which is why you see 10 up here, and we'll assume these arrivals are independent and random. Now, this plot of the probability mass function tells us that we do expect 10 cars to show up more than most other numbers, but the probability of exactly that happening isn't too high, at just over 12.5%. That's because there's a chance 9 cars show up as well, which has the same probability according to this model. There's also a chance 11 cars show up, same with 12 or 6 and so on. It's unlikely that, let's say, 17 cars show up if we get 10 on average, but over several hours it could happen just due to randomness. So this distribution can be really helpful in showing what kind of numbers we expect, and it can apply to a lot of discrete real-world events that have a large element of randomness, including the number of people entering a restaurant during some given hour, the number of calls to some hotline, daily visitors to a website, bankruptcies filed per month, and even crime statistics.

So let's say on average about 3.83 crimes are committed every year in some hypothetical town. Then these would be the estimated probabilities that some number of crimes happen during any given year, assuming again that everything is independent and random. Now let's take the numbers I put on the screen earlier, which by the way also had an average of 3.83, and make a bar graph of those. If we put these two graphs next to each other, we see that the more extreme values, as well as the most frequent number, pretty much match what we would expect for a truly random process with that same average. It's not perfect, but there are consistencies showing that the increased crime we saw may not be caused by anything more than randomness.

Something I've also said before is that this analysis was used to study the pattern of bombs that were being dropped on London during World War II. They wanted to know if these were targeted attacks or just random ones, and they found clusters where many bombs had been dropped, but upon further analysis using the Poisson distribution, a statistician determined that those clusters were almost exactly what was expected if the bombs were dropped randomly.

And those numbers I showed earlier were actually the real numbers of shark attacks that occurred in South Africa every year from 2000 to 2011, which we could definitely say have a degree of randomness. But remember, humans can have trouble determining whether some sequence of events is truly random; we often just see trends and fixate on them. So someone may conclude that, hey, something must be causing an increase in shark attacks. With such small numbers, hopefully no one would make that assumption too quickly, but you get the point: people can immediately attribute certain real-world changes to something tangible when in reality it's nothing more than randomness.

But now let's acknowledge what I know many of you are thinking: how can we determine when these increases are due to randomness and when they aren't? I mean, what if sharks are showing up closer to the shore more often, or what if poverty rates are causing an increase in crime and we need to do something about it? Well, this brings us to change point detection, the process of determining whether a real change has occurred beyond normal fluctuations in the system. This has tons of applications. Again, it applies to seeing whether something is causing crime rates to go up. It can be used to determine whether an increase in monthly purchases is due to that person just spending more or whether it's something like credit card fraud. It can be used in manufacturing processes to determine whether some faulty readings are due to chance or real issues with machinery. And even in the TV show Numb3rs there was an episode where the mathematician had to determine when a baseball player had started using steroids based on their improved performance. What these all have in common is a big element of randomness.

So if a baseball player has so many home runs per game or season or whatever, and they start using steroids, we may notice a big change in performance, and that change we would call the change point, which isn't randomness but a true change in the system, or in this case the player. Here it's obvious where that point is, but it isn't always like this. Let's back up: if some baseball player out there has these home run stats, and then this year we see a big increase, can we just say, oh, it's steroid use? Because to me that seems drastic. I mean, it could have just been a good year for them, maybe lots of practice and so on. We need more data. If future years look like this, then that may signal we found the change point, but if future years look like this, then it's more likely they just had a good year. The thing is, if we wait too long, we may be letting someone profit off of cheating, which isn't the end of the world. However, what if instead we're monitoring people with some certain disease and want to determine whether a potential change point is due to randomness or something as serious as a bioterrorist attack? Now we really don't want to wait too long, but we also don't want to sound the alarm too early and create a panic. It's a statistical puzzle that involves a great deal of trade-off.

Although most of the sources I found on this were pretty complex, I did find a very simplified algorithm for determining if a change point has occurred, so let's see how that works with crime rates. Let's say in some town on average one crime is committed every month, but if crime rates go up to once per week, then that can be considered a state of emergency, or something that we don't want to get to. So here's how we determine if that change has occurred. First we set some variable s to 1, and we'll say that if s exceeds 50, then a change point has occurred. 50 was chosen arbitrarily, by the way, and I'll talk about that more soon. Next, since there's an average of one crime per month, we'll say the daily probability of a crime happening is 1 in 30. If the crime rate goes up to once a week, the new daily probability would be 1 in 7. We then divide those numbers, (1/7)/(1/30), and get 4.286. We then do the same thing with the daily probabilities of a crime not being committed, (6/7)/(29/30), giving us a value of 0.8867. These are the numbers we'll be working with: on any given day, if a crime is committed, we multiply our s value by 4.286; if however no crime occurs, we multiply by 0.8867. And by the way, if the s value goes below 1, we reset it to 1.

So again, we start with s equal to 1, and on day 1, let's say no crime is committed. We then multiply by 0.8867, which gives a new s value of 0.8867. Since it's less than 1, we reset it to 1, like I said earlier. If no crime happens the next several days, s will stay at 1 for the same reason. But now let's say a crime does occur. We then multiply s by 4.286 to get our new s value. We'll say the next day there's no crime, so we multiply our s value by 0.8867, getting us a slightly lower new value. Then no crime again on the next day lowers s some more. Now I'm just going to add several more days to this calendar, and if we were to multiply by those constants each time, we would be at an s value of 16.49 after these 20 days. So if the next day a crime is committed, that takes our s value above the threshold (16.49 × 4.286 ≈ 70.7), indicating a change point has occurred. Note this does not tell us when it happened; we just know that it did, as too many crimes occurred during this time interval.

Now remember, there's a lot of trade-off within this, and that's largely due to our threshold for s. Making our threshold s equals 50 will create a false alarm once every 3.4 years or so, where s exceeds 50 just due to randomness. If we set the threshold at s equals 40, we would get a false alarm about every 2.5 years; however, the average time it would take to determine that a change point has occurred would be about 30 days after it happens, rather than close to 33 days with the threshold at 50. Again, it's all about the trade-off between the frequency of false alarms and the length of time you wait thinking everything's fine when really a change has occurred. If our threshold were 75, a false alarm would happen every 5.2 years, with an average detection time of 36.9 days, and at a threshold of 150 these would be our values. So depending on what the real-life situation is, we can move these thresholds around to get the kind of results that are optimal.

So determining how much randomness really plays into our lives can be a tough thing to quantify. As you can see, I'm definitely not saying everything is just due to chance, but when looking at events through the lens of randomness, it's surprising to see what patterns and truths can be uncovered. And if you want a bit of last-minute motivation, there's a reason why the most successful people in the world typically come from the set of those who simply didn't give up. No, success is not all luck, but as always, luck and randomness do play a role, and when you keep taking those chances, just like continuously flipping a coin, more and more outcomes are going to happen, making it more likely you'll come across the one that you really want.

And I'm just about done, but before I end this, I want to thank Wix for sponsoring this video. One thing I've talked about in another video was how, during my first job, I actually tried building a business on the side, and how during that time I taught myself HTML, CSS, PHP and JavaScript so I could build a website. Let me tell you, that took a lot of time that could easily have been saved with Wix. Anyone can build their own website for free, and no programming experience is required. You can start a blog, an online store, a personalized portfolio for business purposes, and really anything else you can think of. In fact, without spending any money, let me show you how easy it is to get your website up. Let's say I want to make a blog to go along with this channel, which is of course educational and meant for students. I can pick the type of blog I want, and after putting in some additional information, you get to pick from tons of layouts that offer just the right feel for your site. Once you're set up, everything is very customizable, so it's very easy to edit titles and pictures that give your site the personal touch that you want. They actually put this as the default, but honestly I think I'll keep it. Then if I want to maybe make some money on the side by offering some online tutoring, it's literally two clicks away. I can then design a pricing plan that fits my needs and make the page look exactly how I want; a lot of what you'll need is even built into their default settings. Buying a domain, linking any payment methods, creating customized email addresses and everything else you could need is all available on Wix. If you're trying to go really professional, you can even upgrade to a premium plan, which is used by professional developers to save time so they can focus on more important business matters. To get started right now, you can click the link in the description and join over million people who have used Wix to create their own amazing website.

And with that, I'm going to end the video there. If you guys enjoyed it, be sure to like and subscribe. If you want to follow me elsewhere, you can find social media links below, and I'll see you all in the next video.
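The run probabilities quoted at the start of the video can be checked exactly rather than by simulation. The Python sketch below is an editorial illustration, not something shown in the video: it tracks the length of the current streak of heads with a small dynamic program. The names `prob_run_of_heads` and `prob_run_either_face` are hypothetical helpers introduced here.

```python
def prob_run_of_heads(n: int, k: int, p: float = 0.5) -> float:
    """Exact probability that n independent flips contain a run of at least k heads.

    Dynamic program: state[j] is the probability that no k-run has appeared yet
    and the current streak of heads has length j (0 <= j < k).
    """
    state = [0.0] * k
    state[0] = 1.0
    hit = 0.0  # probability mass that has already produced a run of k heads
    for _ in range(n):
        new = [0.0] * k
        for j, mass in enumerate(state):
            new[0] += mass * (1 - p)      # tails resets the streak
            if j + 1 == k:
                hit += mass * p           # heads completes a run of k
            else:
                new[j + 1] += mass * p    # heads extends the streak
        state = new
    return hit


def prob_run_either_face(n: int, k: int) -> float:
    """Probability that n fair flips contain k or more identical faces in a row."""
    if n < k:
        return 0.0
    if k == 1:
        return 1.0
    # After the first flip, each new flip repeats the previous face with probability 1/2,
    # so k identical faces in a row means k - 1 consecutive "repeats" in the other n - 1 flips.
    return prob_run_of_heads(n - 1, k - 1, 0.5)


print(f"4 heads in 4 flips:            {prob_run_of_heads(4, 4):.2%}")
print(f"run of 4 heads in 100 flips:   {prob_run_of_heads(100, 4):.2%}")
print(f"run of 4 of a kind, 100 flips: {prob_run_either_face(100, 4):.2%}")
```

With these definitions, a run of four heads somewhere in 100 flips comes out a little above 97%, while four in a row of either face is above 99.9%, which is why a made-up sequence with only a single run of four looks suspicious.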
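The Pascal's triangle counting for four and ten flips can be reproduced with binomial coefficients, since entry k of the relevant row is "n choose k". This is an editorial sketch; `pascal_row` and `prob_exact_split` are hypothetical names, and only Python's standard `math.comb` is used.

```python
from math import comb


def pascal_row(n_flips: int) -> list[int]:
    """Row of Pascal's triangle used for n_flips coin flips.

    Entry k is the number of head/tail sequences containing exactly k heads.
    (Counting the single 1 at the top as the first row, this is row n_flips + 1.)
    """
    return [comb(n_flips, k) for k in range(n_flips + 1)]


def prob_exact_split(n_flips: int) -> float:
    """Probability that an even number of fair flips comes out exactly half heads."""
    row = pascal_row(n_flips)
    return row[n_flips // 2] / sum(row)  # sum(row) == 2 ** n_flips


print(pascal_row(4))                    # [1, 4, 6, 4, 1]
print(f"{prob_exact_split(4):.1%}")     # 37.5%
print(f"{prob_exact_split(10):.1%}")    # 24.6%
```

Calling `prob_exact_split(100)` gives roughly 8%, so the exact 50/50 split keeps getting less likely as the number of flips grows, even though it remains the single most likely count.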
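The polling intervals from the video (65.6% within ten points and 89% within twenty points for 10 students, and better than 95% within ten points for 100 students) can be computed by summing binomial probabilities over the interval. This sketch assumes, as the video does, a large school split exactly 50/50, so each sampled student independently says "yes" with probability 0.5. The function name `prob_within` is hypothetical.

```python
from math import comb


def prob_within(n: int, points: float, p: float = 0.5) -> float:
    """Probability that the sampled 'yes' share lands within +/- points percentage
    points of the true share 100 * p.

    Assumes n independent respondents, each saying 'yes' with probability p
    (a large school, so sampling without replacement barely matters).
    """
    total = 0.0
    for k in range(n + 1):
        share = 100 * k / n                        # sample result in percent
        if abs(share - 100 * p) <= points + 1e-9:  # small tolerance for float rounding
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


for n, pts in [(10, 10), (10, 20), (100, 10)]:
    print(f"n={n:>3}, within +/-{pts} points: {prob_within(n, pts):.1%}")
```

For 10 students the interval from 4 to 6 "yes" answers covers 672 of the 1,024 outcomes, and widening it to 3 through 7 covers 912. For 100 students the ten-point interval holds about 96.5% of the time.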
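The car-wash numbers follow from the Poisson probability mass function, P(k) = e^(-λ) λ^k / k!, with λ = 10 cars per hour. A minimal sketch using only the standard library; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in an interval) when independent events average lam per interval."""
    return exp(-lam) * lam**k / factorial(k)


lam = 10  # average number of cars per hour in the car-wash example
for k in (6, 9, 10, 11, 12, 17):
    print(f"P({k:>2} cars) = {poisson_pmf(k, lam):.2%}")
```

When λ is a whole number, P(λ - 1) and P(λ) are exactly equal, which is why 9 cars is just as likely as 10. Seventeen cars in an hour has a probability of roughly 1.3%, unlikely in any single hour but not surprising over many hours.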
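The same model can put a number on how surprising the 2010 spike is. This sketch is an editorial addition, not a calculation from the video. It assumes the yearly counts are independent Poisson draws with λ = 3.83 over the twelve years 2000 to 2011, and asks how likely it is that at least one year has 8 or more events, the first whole number above double the average. The helper names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson(lam) count."""
    return exp(-lam) * lam**k / factorial(k)


def poisson_tail(k_min: int, lam: float) -> float:
    """P(X >= k_min) for a Poisson(lam) count."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min))


lam = 3.83    # yearly average quoted in the video (assumed to be the true rate)
years = 12    # 2000 through 2011
p_big_year = poisson_tail(8, lam)               # one given year is more than double the average
p_at_least_one = 1 - (1 - p_big_year) ** years  # some year out of twelve is

# Expected number of years (out of 12) with each count, for comparison with a bar graph
expected_years = {k: years * poisson_pmf(k, lam) for k in range(11)}

print(f"P(one given year >= 8):        {p_big_year:.1%}")
print(f"P(some year out of 12 >= 8):   {p_at_least_one:.1%}")
```

Under these assumptions, any particular year has only about a 4% chance of reaching 8, but across twelve years the chance that at least one does is around 40%, so a single spike is weak evidence of a real change.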
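The simplified change point rule from the video (multiply s by 4.286 on a crime day and by 0.8867 on a quiet day, reset to 1 if it drops below 1, and alarm above 50) can be written in a few lines. This reset-at-1 multiplicative rule appears to be Page's CUSUM procedure written without logarithms. The function names and the sample crime record below are hypothetical, chosen only to exercise the rule.

```python
from typing import Optional


def multipliers(p_before: float, p_after: float) -> tuple[float, float]:
    """Daily multipliers: (day with a crime, day without a crime)."""
    crime_day = p_after / p_before                # (1/7) / (1/30) ~ 4.286
    quiet_day = (1 - p_after) / (1 - p_before)    # (6/7) / (29/30) ~ 0.8867
    return crime_day, quiet_day


def first_alarm_day(days: list[bool], p_before: float = 1 / 30,
                    p_after: float = 1 / 7, threshold: float = 50.0) -> Optional[int]:
    """Apply the reset-at-1 rule; return the 1-based day the alarm fires, or None.

    days[i] is True if a crime was committed on day i + 1.
    """
    crime_day, quiet_day = multipliers(p_before, p_after)
    s = 1.0
    for day, crime in enumerate(days, start=1):
        s *= crime_day if crime else quiet_day
        s = max(s, 1.0)           # if s drops below 1, reset it to 1
        if s > threshold:
            return day
    return None


# Hypothetical 30-day record: a crime on day 5, then a crime every three days from day 12.
record = [d in {5, 12, 15, 18, 21, 24} for d in range(1, 31)]
print(first_alarm_day(record))
```

In this made-up record, s climbs to about 30 after day 15 and passes 50 with the crime on day 18. As the video notes, the alarm day is only when the rule becomes confident; it is not an estimate of when the change actually began.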
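The threshold trade-off can be explored with a Monte Carlo simulation: run the monitor on simulated days with no real change to see how often it raises a false alarm, then on days where the rate is already 1 in 7 to see how long detection takes. The video does not say how its 3.4-year and 33-day figures were obtained, so this sketch may not reproduce them exactly. In particular, it assumes the change is already in effect when monitoring starts from s = 1. The function names are hypothetical.

```python
import random

# Multipliers for a change in daily crime probability from 1 in 30 to 1 in 7
CRIME_DAY, QUIET_DAY = 30 / 7, 180 / 203


def days_until_alarm(p_daily: float, threshold: float, rng: random.Random) -> int:
    """Simulate days with crime probability p_daily until s exceeds threshold."""
    s, day = 1.0, 0
    while s <= threshold:
        day += 1
        crime = rng.random() < p_daily
        s = max(1.0, s * (CRIME_DAY if crime else QUIET_DAY))  # update, then reset at 1
    return day


def average_days(p_daily: float, threshold: float, runs: int = 100, seed: int = 0) -> float:
    """Monte Carlo mean of days_until_alarm over `runs` independent simulations."""
    rng = random.Random(seed)
    return sum(days_until_alarm(p_daily, threshold, rng) for _ in range(runs)) / runs


for threshold in (40, 50, 75, 150):
    false_alarm = average_days(1 / 30, threshold)  # nothing changed: any alarm is false
    delay = average_days(1 / 7, threshold)         # rate already 1 in 7 from day 1
    print(f"threshold {threshold:>3}: false alarm every ~{false_alarm / 365:.1f} years, "
          f"detection after ~{delay:.0f} days")
```

Raising the threshold stretches the time between false alarms much faster than it lengthens the detection delay, which is the trade-off the video describes. The right setting depends on how costly each kind of mistake is.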
Info
Channel: Zach Star
Views: 150,426
Keywords: majorprep, major prep, randomness, random, the mathematics of randomness, what is random, random events, statistics, probability, applications of math, applied math, mathematics, how randomness affects us, confidence interval, margin of error, changepoint detection, changepoint, change point, poisson, poisson distribution
Id: _lSBRxhgA-A
Length: 17min 20sec (1040 seconds)
Published: Thu Apr 18 2019