Calling Bullshit 2.5: Unfair Comparisons

Captions
[MUSIC PLAYING]
CARL BERGSTROM: All right. So the next thing I want to talk about in trying to spot bullshit is being aware of unfair comparisons. There are so many unfair comparisons that get put out there in various forms of media. Also in scientific papers-- when people are trying to make arguments-- they say, oh, this is better than that. This is more effective than that. Whatever the case may be. And you have to make sure that these comparisons are actually reasonable. So let me just give you an example of the kind of study someone might do. Maybe I want to know whether fresh juice is sweeter than juice in concentrate form. So how do I test that? I'm going to go get myself a bunch of fresh juice. Buy a bunch of jugs of Martinelli's. Going to go get myself a bunch of concentrate. Buy a bunch of tubes of Minute Maid. And now I'm going to give that to a bunch of people-- let's say 25 subjects-- and I'm going to ask them, which one do you think tastes sweeter? So I do that. Twenty-one people think the Martinelli's is sweeter. Four people think that the Minute Maid is sweeter. I do a little bit of statistics. We don't need to worry about that. But I find that looking at these data, I can conclude that fresh juice is sweeter than from concentrate, with a p value of less than 0.01 using an exact binomial test. I hope someone's a little bit skeptical. What am I doing wrong here?
STUDENT: It's literally apples and oranges.
CARL BERGSTROM: I'm comparing--
JEVIN WEST: [LAUGHTER].
CARL BERGSTROM: Thank you. I'm comparing apples and oranges. So that was a very unfair comparison. If I want to see, obviously, what the process of concentrating a juice does to its sweetness, I should use the same fruit and probably from the same supplier and so on, and certainly not be comparing apples and oranges. That was a silly example, just to have some fun. But this is really, really common in the sorts of media reports that we see and in scientific papers as well.
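Editor's sketch: the exact binomial test mentioned for the juice example can be reproduced in a few lines of Python. The lecture gives only the counts (21 of 25) and the result (p less than 0.01); the two-sided convention used here and the function name `exact_binomial_p` are assumptions for illustration, not something stated in the talk.

```python
from math import comb

def exact_binomial_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value under the null probability p_null.

    Sums the probability of every outcome that is no more likely than
    the observed one (one common two-sided convention; an assumption here).
    """
    def pmf(k: int) -> float:
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so ties with the observed outcome are included.
    total = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)

# 21 of 25 tasters picked the first juice; null hypothesis: no preference (p = 0.5).
p_value = exact_binomial_p(21, 25)
print(f"p = {p_value:.5f}")
```

The p-value comes out well below 0.01, matching the claim in the lecture. The arithmetic is fine; the problem is what went into it. A significant result on an apples-to-oranges comparison is still an apples-to-oranges comparison.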
And I want to give you a couple of examples. I'm sure you've seen these lists. These are all over the internet. They're exposed. They're easy to make, and they generate a lot of clicks. And sometimes they make you click through city by city and see every city. And so they get a lot of ad revenue and so on. So here we go. The most dangerous cities in America, and then your most murders, most violent crimes, whatever they want to do. And so in this particular story that came out not so long ago, the most dangerous list starts off with St. Louis, Missouri. I was born there and lived there at the start of my life. Number two, Detroit, Michigan, I spent most of my teens hanging out there. And so this is starting to get a little bit personal, because these weren't such bad places. These were not such bad places. And so what's going on? Why are they up here? But on the other hand, a city is a city is a city. What could be apples and oranges about that? To answer that question, we have to look into the sociological nature of cities and how they work. And for reasons that would take a lot more than one class to go through, it turns out that-- as you know-- inner cities-- urban cores-- typically have higher crime rates than the outer suburbs. So these are crime rates in Seattle. And so we've got high crime rate down here. We've actually got a relatively high crime rate right here. But that's a very typical pattern that we see in cities. So for example, here's Atlanta, where I lived before I came to Seattle. Here in the central part of Atlanta, we've got fairly high rates of crime. And in the outer suburbs, most of them have much lower crime rates. Now if you think about what constitutes Atlanta-- what's the city of Atlanta-- the Atlanta metropolitan area is really this whole huge range. Atlanta is this gigantic suburban sprawl that's got 13 million people or something like that-- you can look that number up and call BS on me-- but something like that. 
What's the city of Atlanta? The city of Atlanta is a local political unit. The city of Atlanta is just that region right there. So the city of Atlanta is just this little piece, even though Atlanta metropolitan area is out here. And of course, this little piece that's the city of Atlanta contains Atlanta's historic center and its urban core, where much of the crime is taking place-- where the crime rates are higher. So that's Atlanta. Let's compare Jacksonville, Florida. Here's Jacksonville, Florida. We see a similar pattern to what we see in Atlanta. We've got this urban core and then this larger suburban area around the outside. But now we see something really different in terms of how the city limits are defined. So in 1967, the original city was right here in the urban core. And you see that overlaps quite cleanly with where we've got the high crime rates. But gradually the city of Jacksonville has pushed its boundaries out all the way to include all of the encompassing surrounding suburban areas, where crime rates are much, much lower. So what we've got going on in Jacksonville is that in Jacksonville the city is including the entire metro area. In Atlanta, the city is only including the urban core. So if we go and we look at the murder rate by city-- So here's these different cities. These are murder rate data. Here's Atlanta. Here's Jacksonville. Now we can say, what fraction of the metro area is included in the city? What fraction of the population of the metro area is included in the city? And Atlanta is one of the smallest in the US. Less than 10% of the metro area is in the city. And so Atlanta-- we've got this what seems like a pretty high crime rate. Over here in Jacksonville, the crime rate's lower. But in Jacksonville, 60% of the metro area is included in the city. This is not an apples to apples comparison, because in Atlanta we're only counting the urban core, where the crime rate is highest. 
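Editor's sketch: the effect of city limits on a reported murder rate can be shown with a toy calculation. Every number below is hypothetical and chosen only for illustration; none of it is the Atlanta or Jacksonville data from the lecture. The sketch also assumes that any suburban area annexed into the city has the same rate as the suburbs overall.

```python
def rate_per_100k(murders: float, population: float) -> float:
    """Murders per 100,000 residents."""
    return murders / population * 100_000

# One hypothetical metro area, used for both scenarios below.
CORE = {"population": 500_000, "murders": 100}       # urban core, higher rate
SUBURBS = {"population": 4_500_000, "murders": 90}   # suburbs, lower rate

def city_rate(suburb_share_in_city: float) -> float:
    """Rate inside the city limits when the city contains the whole core
    plus the given share (0.0 to 1.0) of the suburbs."""
    population = CORE["population"] + suburb_share_in_city * SUBURBS["population"]
    murders = CORE["murders"] + suburb_share_in_city * SUBURBS["murders"]
    return rate_per_100k(murders, population)

narrow = city_rate(0.0)  # city limits drawn tightly around the urban core
wide = city_rate(1.0)    # city limits cover the entire metro area

print(f"core-only city: {narrow:.1f} per 100k")
print(f"whole-metro city: {wide:.1f} per 100k")
```

The same metro area yields two very different "city" rates depending only on where the boundary is drawn, which is why ranking cities by the rate inside their legal limits is not an apples-to-apples comparison.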
In Jacksonville, we're counting the entire surrounding area. And we're getting almost as high of crime rates as we are in Atlanta anyway. So this is very much not an apples to apples comparison, if that makes sense. So I did this. I put these data together over the weekend, and I wanted to just be sure that my story was legit. So I put together a second graph. And I'm not going to explain this graph today. This is a teaser for next class. I'll tell you what it is, but I'm not going to tell you-- and what I would like you to do is figure out why I did it and why I feel like it strengthens my argument. So in the previous graph, I'm graphing the murder rate in the city. Here I'm graphing the murder rate in the entire metropolitan area. So if instead of using just the city, I used the entire metropolitan area. And I again graph it against this same measure, what fraction of the metro area is in the city? So you've got Atlanta over here, Jacksonville out here. Now I don't have any trend at all. We had massive statistical significance when I do it against the murder rate in the city. No significance at all-- p equals 0.5-- when I do it against the murder rate in the metro. So think about why I did that, why that's convincing. And I think if we have time, we'll talk about that some more in the next lecture. I want to do one more example of apples to oranges comparisons. So after the election, this issue of how many people came to the inauguration turned into a huge deal-- which is remarkably stupid and one could call bullshit simply on caring about that because it doesn't really have anything to do with someone's efficacy serving as president. But it seemed very important to people, including our president. And so it ended up being talked about an enormous amount. People say, oh, so many people came to Obama's inauguration, and no one went to Trump's. And it's so terrible-- or so good, depending on what news source you're reading. And so here's a conservative news outlet.
And they say, the media is so unfair, because the mainstream media has ignored the fact that eight times more people watched Trump's inauguration over streaming video than watched Obama's. Now think about that for a second. Just think about it for a second. What's wrong with that? Streaming video was hardly a thing in 2009. They didn't have streaming video, or if they had it, you couldn't afford it with the data charges. So I just grabbed a couple of quick graphs. Here's internet video by terabytes from 2010 to 2015. Here's mobile video. And so these things are exploding. The point is, of course, Obama had fewer viewers on streaming media because people weren't using streaming media yet. More people drove to Trump's inauguration in a Tesla as well, because they weren't released for Obama's. So there you go. See the kind of energy-- Zero-carbon people love Trump. So now you know. And then to finish this off, these guys-- I don't think they really helped their case much with the way they ended this. They say, "The press left out some important differences. Most importantly, millions watched the inauguration on TV and streaming media-- probably millions in Russia alone." I guess I would have left that detail out, if I'd been writing for them. But there you go. So one of the things I really want to stress-- in this whole lecture, we've looked at a bunch of statistics, and we're going to continue to look at statistics. And in the process of course, we're going to look at a bunch of big data algorithms, all of that. But nowhere in that process have we really dug in to what's going on in the algorithm. We haven't criticized the fine details of a particular chi-square test and how many degrees of freedom we're going, because a lot of the time you don't need to. Very few of us are going to be trained as professional statisticians and able to really dig into that-- or as professional data scientists like Jevin.
But we can call bullshit on work of guys like this simply by looking into what's going into this algorithm and what's coming out. And so that's what we've been trying to do today. What are you putting in? Are the data reasonable? Are they fair? Are the comparisons reasonable? How did they get those data? Are they pertinent to the claims that are being made? What's the output? Does the output even make sense? Is the output the right order of magnitude? And if it does make sense, does it support the claim that somebody's making-- we should get rid of food stamps. Or does it actually refute the claim? It seems to me that if you've got a government agency running 10 times as efficiently with respect to fraud as the free market, that might not be an argument against that agency. There may be other arguments against it, and people can find those. But thinking carefully about these outputs is really important. So in the class, we're really going to focus over here and over here. And I think you'll be amazed at how much you can do without having to dig in. Not that that isn't fun as well.
JEVIN WEST: That's something you'll hear in every machine learning class. And I bet most of you haven't taken a machine learning class. In my class or in any machine learning class, there's this adage, garbage in, garbage out. And one of the things that really excited Carl and I about teaching this class was the fact that we think that we can teach you the same skills without a PhD in machine learning or statistics. We think that you guys-- and some of you do have that background and that skill level-- but you don't need that. You can call BS without those really advanced degrees.
[MUSIC PLAYING]
Info
Channel: UW iSchool
Views: 26,072
Rating: 4.9408865 out of 5
Keywords: information science, information school, ischool, university of washington, education
Id: _AXyeKbw3tU
Length: 11min 39sec (699 seconds)
Published: Tue Apr 25 2017