When numbers don't tell the full story

Video Statistics and Information

Captions
This video was sponsored by Coursera.

In the 1990s, the Gates Foundation, along with several other nonprofits, began to advocate for breaking up larger schools into smaller ones. The reason is that they noticed several of the smaller schools were outperforming pretty much all of the larger ones. Well, it turns out this actually is the case pretty much anywhere you look. In fact, a lot of good things tend to happen to areas with smaller populations: the safest towns tend to be those with smaller populations, and the states that have the lowest percentage of people with brain cancer also have smaller populations. We could just keep going.

But there's also something I'm not telling you, and that's that several of the lowest-performing schools have small student bodies, the most dangerous towns tend to be those with a smaller population, and the states that have the highest percentage of people with brain cancer tend to have smaller populations. So what's really going on here?

Well, this has to do with the law of large numbers, which says, in terms of percentages, more extreme things happen when we look at smaller populations. For example, if I flipped a coin four times, it would not be surprising if 75% of them landed heads, or even 100%. These are extreme compared to the 50% expected outcome, but it would not take long for anyone flipping a coin to encounter this. However, if you flipped a coin a thousand times, it would take a lot of trials before you got 75% heads. See, when you flip a coin a few times, the percent that comes up heads can be kind of chaotic, bouncing around 50%. I mean, after one flip you can have only 100% or 0% heads, which is far from the mean. But after more trials, the outcome tends towards the expected value, or 50/50.

So when we analyze small schools, it's like looking at this side of the graph. Assuming these students are selected randomly, you're just going to get those schools that are way above the mean
and some that are way below the mean. As the population of the school grows and you grab more and more students from the population, though, the overall scores will approach whatever the mean is.

Now I know many of you are probably saying, "Wait, it's not that simple, because there are other factors to consider," and you're definitely right. Especially with the smaller private schools, they may require you to do well on a certain test in order to get in, and if that's the case, that group of students would probably just be smarter on average. Or when looking at crime rates, we can't just ignore the poverty rates of that area and look simply at population size, since again there are other factors to consider. But it is true that when you're looking at those smaller populations, you're just likely going to get more of those extreme outcomes.

And when the Gates Foundation and all those other nonprofits put this to work, it turned out to be a failure. It wasn't because small schools are inherently worse or no different than larger ones, but rather because education is more nuanced than simply looking at school size, and on a large scale it was just better for those funds to go elsewhere. But misinterpreting why several of the smaller schools were in fact outperforming the larger ones had already cost these organizations about a billion dollars. So now, after that failure, the Gates Foundation, for example, is more focused on putting money towards math and science programs, improving instruction, and more like that.

Now for this next story, instead of ending with a lesson, I'm gonna start with it, and the lesson is: do not talk about percentages of numbers when the values that you're talking about can be negative. I got that lesson from this book here, which I highly recommend if you have not read it, and the author gives some great examples as to why this lesson is so important.

Imagine you're running a clothing store where you maybe sell t-shirts, sweatshirts, hats, shoes, and pants. Now let's say your net profit for a given
month is ten thousand dollars, and ninety percent of that came from t-shirts. So my question to you is, how do you interpret this? It seems like t-shirts are really what's working for us, so maybe we should focus just on that, honestly. But what if I then told you that sweatshirts made up seventy percent of our profits? Well, now you're probably thinking, "Hey, there's that American education system at work, brought to you by those who still use the Fahrenheit scale." But I promise these numbers are technically accurate. In fact, I'll keep going: I'll add that hats accounted for thirty percent of profits, shoes were forty percent, and pants fifty percent.

So how is this possible? Well, it's simply due to the fact that business will include losses, which we haven't considered yet. Let's say rent, advertising, the cost of shipping the product, and all those expenses add up to eighteen thousand dollars for the month. Well, now all the numbers make perfect sense. The t-shirts were 90 percent of the 10,000, which means we made $9,000 from those. The sweatshirts made up 70% of 10,000, or $7,000, and we just continue. That means our net sales were $28,000, which minus our expenses leaves us with 10,000. The numbers match up, and the percentages are technically correct with respect to that net result. If you want to talk about these percentages with respect to just your sales, where there are no negatives, then these would be the values, which makes way more sense.

Now I know plenty of you are saying, "Well, I wouldn't misinterpret the numbers like this," but hopefully you can see that when it comes to the general population, it probably wouldn't be that hard to misinform people, or just give them the numbers and have them misinterpret it for themselves. In fact, here are two examples where this misinterpretation has come up in politics.

In June 2011, an article was released in Wisconsin celebrating the great work of their governor at the time. It turns out that during a recent month there was a net increase of 18,000 jobs throughout the
entire nation, and 9,500 of those came from Wisconsin. So politicians from the state praised this, saying, "Hey, over 50% of nationwide job growth has been in Wisconsin alone." Again, this is technically right, but not in the way we think, because guess what: Massachusetts created 10,400 jobs in the month, accounting for about 58 percent of total job growth. In fact, just to emphasize how pointless these numbers can be, California added 28,800 jobs that month, meaning 160 percent of that net job growth happened in California. See, this clearly becomes nonsense because we can make the numbers say anything, and that's because 18,000 was the net increase. Missouri, in fact, was one state that lost 12,900 jobs, Virginia lost 14,600, and several more were in the red. After accounting for all 50 states, yes, the net growth was 18,000, but as we've seen, it's bad to refer to that value when talking about percentages. And politicians ran with this number. In fact, a representative went in front of an audience celebrating the job growth, saying, "Well, something we are doing here must be working."

But it doesn't stop there, because another very misleading number showed up in the presidential election between Mitt Romney and Barack Obama. In April 2012, Mitt Romney tweeted that 92.3 percent of people who lost their job under Obama's presidency were women. He also emphasized it in a speech around the same time: "These are statistics which show just how severe the war on women has been by virtue of the president's failed policies. The number of job... this is an amazing statistic: the percentage of jobs lost by women in the president's three years, three and a half years. 92.3 percent of all the jobs lost during the Obama years have been lost by women."

Now I know at this point I sound pretty repetitive, but although this number is technically true, it's just not saying what we think
it is. According to the Bureau of Labor Statistics, in January 2009, when Obama took office, total employment was just over 133 million. By March 2012, right before Mitt Romney published his tweet, total employment was about 132.8 million, a net loss of 740,000 jobs. The employment stats for women, on the other hand, went from about 66.1 million to 65.4 million, a loss of 683,000 jobs. We do the division and we get that exact figure of 92.3 percent. But that tells us nothing about what happened during that time.

When we look at the numbers, we find that from the beginning of our interval, January 2009, until February 2010, men lost way more jobs than women, while from February 2010 until the end of our time window, men gained just about the same amount back, whereas women didn't gain quite as much. So simply stating that 92.3 percent figure is pretty misleading. In fact, there was a portion of time where men had a net gain in jobs, but this was after suffering millions of job losses, while women had a net loss in that same window because they didn't suffer as much beforehand. With these figures, you could argue that women accounted for three thousand percent of job losses, which makes no sense even though the math kind of works out. So you see, just pick the window of time you want and you can pretty much make the data say whatever you want.

Okay, now we're gonna switch gears to a scenario regarding baseball players. Let's say there are two players that don't necessarily play for the same team, but they have had the exact same number of at-bats for a given season against pitchers of the same skill level. Now, I'm going to use not very realistic baseball numbers here, by the way, but it'll get the point across. So for the first half of the season, let's say player 1 had a batting average of 85 percent while player 2 had a batting average of 90 percent. In the second
half of the season, player 1 batted 50% while player 2 batted 60%, and again they had the exact same total number of at-bats. Now, I think most people looking at these numbers would of course say player 2 is the better player, which is why it might be surprising if I said player 1 is in fact better and had more hits overall than player 2. But it's absolutely possible. The trick here is that I did not say they had the same number of at-bats for both the first and second half of the season, just the same amount overall.

So let's say for the first half, player 1 had 20 at-bats, 17 of which were hits, giving the 85% I said earlier. Then during the second half they had 10 at-bats, five of which were hits, which of course yields 50%. That means out of 30 total at-bats they had 22 hits. Now for player 2, let's say for the first half of the season they had 10 at-bats, 9 of which were hits, giving us 90%, and for the second half they had 20 at-bats, 12 of which were hits, giving us 60%. Thus player 2 has also had a total of 30 at-bats, but only 21 of them were hits, which means their overall percentage was slightly lower than player one's.

This is Simpson's paradox, something I mentioned in a previous video in regards to the acceptance rates of men and women at UC Berkeley. But as you can see, this paradox does show up in other places, and it comes up when trends seen in different groups of data tend to change or disappear when you analyze everyone or everything as a whole. In fact, a few decades ago, David Justice of the Atlanta Braves had a higher batting average than Derek Jeter in both 1995 and 1996. However, when you look at both years combined, Derek Jeter still had a higher batting average overall.

But the consequences of this can get much more serious when looking at medical treatments. Another famous case of this is with regards to kidney stone treatments, where two types of treatments were tested against one another for large kidney stones and small kidney stones. Overall, treatment
B was better, yet when looking at large and small kidney stones individually, treatment A came out ahead both times. See, here the thing is, it's the overall numbers that are misleading, because this table says that treatment A is more effective for small stones and also for large ones. It's the treatment you're definitely going to prefer. But the misinterpretation comes from the fact that since treatment A is better, they use it for more of those large stone cases that are more serious. Yet for more serious cases, any treatment will just be less successful than when used on something more minor, as you can see by the smaller percentages on the bottom compared to those on top. So you're using a better treatment more often on kidney stones that are harder to treat, and thus you get the seemingly paradoxical result.

See, Simpson's paradox isn't really a paradox. It just shows how easy it can be to misinterpret data when it's presented like you've seen here. So when we look at data the wrong way, we can be not only sort of wrong, we can come to conclusions that are exactly the opposite of the ones the numbers are really telling.

In fact, here is another quick example. During the First World War, they found that incorporating metal helmets on soldiers increased the number of people who were hospitalized with head injuries. Looking at this, someone could definitely make the very weird assumption that helmets cause more head injuries. But remember, never be too quick to assume that correlation equates to causation, because before helmets were incorporated, most people who suffered head injuries, like from a gunshot, died from it, obviously, so they were completely removed from the equation. Whereas with helmets, people had a higher chance to survive an injury to the head, and thus you have more head injuries in the hospital. I'm pretty sure they didn't remove the helmets due to this finding, but this example does highlight how completely counterintuitive the results can seem when you don't think enough about how
the data was obtained. This was an example of survivorship bias, a logical error that occurs when you overlook a certain group of people, like those who died from head injuries. There are several more examples of this that I'm saving for an upcoming video, actually, but the idea just fits so well here that I wanted to include an example.

Now, being able to analyze data and then extract useful information and meaning from it is an extremely valuable skill to have. In fact, there are people who have entire careers based on using powerful computer systems and efficient algorithms to solve problems by analyzing large amounts of data, and this is known as data science. Although this video wasn't about data science, hopefully I highlighted the importance of looking at data and numbers in the right way. But if you'd like to learn more about all the technical information regarding data science, one of those high-paying jobs there is right now, you can do so at Coursera, the sponsor of today's video.

The classes they're sponsoring today are some of their most popular, and this isn't just one course but rather an entire specialization consisting of several courses meant to take you from beginner to someone with a solid foundation in data science topics. The set of courses starts with teaching you how to program in R, one of the most popular languages for statistical computing and graphics. This is then followed with lectures about gathering data properly and how to handle large data sets that come from a wide variety of sources. After a few weeks, they cover more of the mathematical side of things, including probability, Bayes' theorem, p-values, permutation tests, and more, to give you that required statistical background. And then you'll get to put that knowledge to work with practical machine learning, which includes forecasting, model-based prediction, and much more. All these courses come with built-in lectures, quizzes, example problems, and various projects to give you the hands-on experience, just like you're
in a normal class setting, and several people who complete these courses report starting a new career or just getting a tangible career benefit as a direct result. So if you're interested and want to support the channel, you can click the links below and get started immediately. Otherwise, I'm gonna end the video there. If you guys enjoyed it, be sure to like and subscribe, don't forget to follow me on Twitter and join the main Facebook group for updates on everything, and I'll see you all in the next video.
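
The arithmetic behind three of the examples in the captions (percentages of a net figure, the job-loss share, and the two baseball players) can be checked with a short Python sketch. The figures are the ones quoted in the video; the function and variable names are made up here purely for illustration.

```python
from fractions import Fraction


def share_of_total(parts, total):
    """Percentage each part represents of `total` (hypothetical helper).

    When `total` is a net figure that already includes negatives,
    the shares can add up to far more than 100%.
    """
    return {name: 100 * value / total for name, value in parts.items()}


# Clothing store: sales per item and monthly expenses, as quoted in the video.
sales = {"t-shirts": 9000, "sweatshirts": 7000, "hats": 3000,
         "shoes": 4000, "pants": 5000}
expenses = 18000
net_profit = sum(sales.values()) - expenses            # 28,000 - 18,000
pct_of_net = share_of_total(sales, net_profit)         # sums to 280%
pct_of_sales = share_of_total(sales, sum(sales.values()))  # sums to 100%

# June 2011 jobs report: state gains measured against the national NET gain.
state_gains = {"Wisconsin": 9500, "Massachusetts": 10400, "California": 28800}
pct_of_net_jobs = share_of_total(state_gains, 18000)

# Romney's figure: women's net job loss divided by the total net job loss.
women_share = 100 * 683_000 / 740_000


def overall_average(halves):
    """Combine (hits, at_bats) pairs into one exact batting average."""
    hits = sum(h for h, _ in halves)
    at_bats = sum(a for _, a in halves)
    return Fraction(hits, at_bats)


# Simpson's paradox: (hits, at-bats) for each half of the season.
player1 = [(17, 20), (5, 10)]
player2 = [(9, 10), (12, 20)]

if __name__ == "__main__":
    print("share of net profit:", pct_of_net)
    print("share of sales:", {k: round(v, 1) for k, v in pct_of_sales.items()})
    print("share of net job growth:",
          {k: round(v, 1) for k, v in pct_of_net_jobs.items()})
    print("women's share of net job loss:", round(women_share, 1))
    for (h1, a1), (h2, a2) in zip(player1, player2):
        print("half:", float(Fraction(h1, a1)), "vs", float(Fraction(h2, a2)))
    print("overall:", float(overall_average(player1)),
          "vs", float(overall_average(player2)))
```

Using Fraction for the batting averages keeps the comparisons exact, so the reversal between the per-half and overall results is not an artifact of floating-point rounding.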
Info
Channel: Zach Star
Views: 88,466
Keywords: majorprep, major prep, numbers, data, misinterpreting numbers, this is what happens when you misinterpret numbers, lying with data, lying with statistics, lies, damned lies, statistics, stats, how to lie with statistics, wrong math, bad math, law of large numbers, simpson's paradox, survivorship bias, what the numbers are really saying, mathematical thinking, power of mathematical thinking, applied math
Id: FUknTs9AzYA
Length: 14min 45sec (885 seconds)
Published: Tue May 21 2019