Becker Brown Bag: Learning From Data, Featuring Steve Levitt

We're very excited to have Steve Levitt here today for the Becker Brown Bag. Now, I don't know if this will be part of Steve's speech, but one thing is clear from our organization of this event, given that it sold out in about 11 minutes. There are a lot of great econ students here. Who knows what we should have learned from that? The price is too low. That's right.

So I want to thank all of you for joining us today. My name is Michael Greenstone and I'm the director of the Becker Friedman Institute. BFI is a collaboration of the (and this is actually a true fact; many things you hear are not true, this is true) 300 PhD economists on the campus here. We really have two goals. One is to help foster the kind of research that Chicago has been historically associated with, research that helps people understand the world in a new way, and Steve is about as good an example as exists. The second is to make sure that those ideas enter the marketplace, the broader marketplace, not just academia, in a powerful way, and Steve is also a model for that. So in many respects the BFI is just an effort to try and emulate Steve.

I just wanted to mention a couple of things. I think Steve is widely known, but he was awarded the John Bates Clark Medal, which is given to the most influential economist under the age of 40. He was named one of Time magazine's 100 people who shape our world. Very famously, he is a co-author of Freakonomics and SuperFreakonomics. Freakonomics sold more than 4 million copies. He also has a podcast, the Freakonomics podcast, which receives 10 million downloads per month. Again, maybe he does not charge enough for that, so we can ask him questions about that.

But I just wanted to also say, one could go on and on about all of Steve's accomplishments, but in many respects I think Steve was really at the vanguard of teaching the economics profession what you could do as people began to gain access to data and computers, and how data, computers, and economic theory
could be used to understand the world better and lend often very surprising insights about the way the world operates. So personally I'm incredibly excited to listen to what Steve has to say today. I think he's going to be talking about AI and big data, so he's capturing some good buzzwords, and hopefully he'll help us understand what they mean. With that, please join me in welcoming Steve up here.

Hey, thank you, Michael. It's gratifying that so many of you are here, although I know really you came for the lunch, and you're all sorely disappointed that the caterer decided to boycott my friends and instead you're left with something kind of second-rate. Michael was talking about changing prices, so we did in essence change prices and raise the price of being here. But thank you for coming.

Before I start, I just want to mention something briefly. I'm trying to do something very different in the near future, something non-academic, where I'm trying to take Freakonomics kinds of ideas and, instead of just writing papers, actually go and change the world. I'm looking for a handful of impossibly talented people to help me do that. So if you're the kind of person who thinks that a combination of Freakonomics, a startup culture, and doing good in the world is the kind of thing that might appeal to you, drop me an email or come talk to me afterwards.

OK, so back to the topic of the day. Here's my motivation for what I'm talking about today. I have spent my life trying to understand data, and I've really done it through the lens of how economists think about data. We have a set of tools, and I've learned from the really smart economists who developed them. And then over the last two or three years, more and more, to the point where you literally can't turn around without hearing about it, there's modern data science and AI and these new tools, and how they're going to revolutionize the world. Machine learning. So I didn't know anything
really about that set of tools, and I thought, well, it would be interesting to learn about them. What was shocking to me when I began to study them just a little bit, and I'm definitely no expert on them, is how completely and totally different the modern data science and machine learning approaches to using data are relative to what economists have been doing for the last 20, 30, 50, 100 years. And it creates a puzzle, because I know the economists are not stupid, right? The best economists who thought about how to use data were smart men and women who knew what they were doing and have generated enormous insight. But I also know that the computer scientists are probably smarter than the economists. I'll show you how different the approaches are, but could they be this different because one group is just completely confused and doesn't know what they're doing? And if it's not that, how could it be that two totally different ways of thinking about data have arisen side by side and really have nothing to do with each other?

So that's the puzzle that I came to, honestly, about a year ago, trying to understand, and what I'll talk about today is what I've come to understand since then. In particular, I'll spend probably too much time giving you a brief history of how economists think about data, and then I want to talk also about modern data science, just to put the two into perspective. And then here are the questions that I want to try to answer. Why are the approaches so different? Because they really are extremely different. Will modern data science (I don't know what to call it; it's really machine learning) make the economic approach irrelevant? Because if you just read newspaper reports, I think you would believe that that's true.
My answer is no, I don't think it will. And then one can ask: if you're an economist, what and how should you borrow from this new data science stuff, and what, if anything, can economic approaches contribute to modern data science? So that's my agenda for what I want to accomplish today.

OK, so let's start with economics. It starts with correlation. The most basic thing we have is correlation, and I want to give an example. When I was first starting out as a researcher many, many years ago, I was interested in the question of whether prisons reduce crime. Should we lock up more people or lock up fewer people? So I went into the literature to try to understand what people knew, and I got a little bit nervous, because whenever you're in research and you see the titles of papers, you think, oh my god, people have already done what I've done; my work is going to be completely redundant. I stumbled onto a paper whose title was "A Call for a Moratorium on Prison Building." It was an empirical paper, and it sure seemed like these people had figured out that prisons did not do a good job of reducing crime. Otherwise, how could they have a title like "A Call for a Moratorium on Prison Building"?

So I tracked down this paper. It's a 1976 paper from a journal called The Prison Journal, and what they had done is divide states into those that had built lots of prisons between 1955 and 1975 and those that hadn't built as many prisons between 1955 and 1975. So there are the heavy-construction states and the light-construction states. It turned out that prison capacity in the states where they built a bunch of cells had gone up by 56 percent, and it basically stayed the same in the other states. Then they looked at crime, and it turned out that crime had increased by an amazing 167 percent in the places where they built all the
prisons and only by 145 percent in the other set of states. They looked at the data and said: look, we built a bunch of prisons and crime went up more; that proves that if we want less crime, all we have to do is stop building prisons. And presumably, if we took all the prisons away, what an amazing effect that would have on crime. This was published in a peer-reviewed academic journal. What these guys were doing was using correlation to make judgments about the real world, and in general we know that's not a good idea.

But I still want to say that correlations are the single most important element of data that we have. Why? Because the correlation is the only thing that God, nature, the universe gives us for free. Correlations are everywhere. If you have two data series, an x and a y, and you don't know what they are, you can generate the correlation between those two. Those are right there for the taking, and that's incredibly valuable, because nature doesn't tend to give us things other than correlations.

The other reason correlations are important is that I think it's important to be able to draw a distinction between things everybody can agree upon and things that you argue about because of your preferences or your beliefs or your techniques. The beauty of a correlation is that you can send out two people from completely different parts of the world or the political spectrum and tell them to compute the correlation between two variables, and they're going to go and grab the same data, run the correlation, and get the same answer. So it's nice to have a starting point where everyone can agree, and then let divergence happen after that.

The problem with correlations is that they just aren't useful. They're almost
never useful, because what do they tell you? They tell you what has happened in the world. They tell you that it was cold and it was dark, it rained and the birds weren't flying around. They tell you stuff like that, but they don't tell you anything about why. And for almost every question, at least those that economists care about, the why is critical, because economics is all about having theories about the world and testing those theories. There are virtually no economic arguments that are simply descriptions, without an interest in why two things are or are not linked together. So a real limitation of correlations is that they don't tell us very much about the things that ultimately, at least, economists care about.

So, just as a quick diversion: why are correlations not very useful? It's because there are many ways to get to a correlation. I'll use notation here where this arrow means causality. You can have a correlation between two variables x and y that happens because x causes y. When we think about economic models, we usually write them down with an x variable and a y variable, and we want to show that x causes y, or measure the extent to which it does. That will generate a correlation, because if x causes y, then when x goes up, y goes up.

It can also be the case that y causes x. So in your model of the world you think that x is causing y, but in reality y is causing x. This we call reverse causality. It will still lead to the same positive correlation between x and y that you get if x causes y, but y is causing x, so the inference about what happens if you want to go intervene in the world is very different.

What I didn't say is why we care about the causal arrow. We care about the causal arrow if we care about public policy, because in public policy the idea is you have some
variable you control. That's the x variable. That's how much you spend on education in public schools, or how many transfers you make to the poor, or whether or not you make marijuana legal. That's like an x variable. And then the y variables are outcomes: are there lots of car crashes, do wages go up, do teenage girls get pregnant, things like that. Those tend to be the y variables. We don't directly control those y variables; we control the x variables. So we care because the only lever we can push on is x. If we push on x and it causes y, then y will change. But if we push on x and it's really y causing x, then pushing on x doesn't do anything to y at all. It doesn't do us any good.

You can also get a more complicated situation where there's some other variable, call it z, that both causes x to change and causes y to change. Then again, if you start pressing on x, it won't do anything directly to y. You've got to push on the z, not on the x, but if the x is the thing you control, it doesn't do any good.

And then the most complicated is the case where x causes y and y causes x. I think a good example of this is police and crime. We've got lots of reasons to think that with police, which is like our x variable that we can control, if you have more police you're going to have less crime. But it's also true that when you have a lot of crime, you tend to hire lots of police, so y is also causing x. We would not have people in blue uniforms standing on almost every corner in Hyde Park if we didn't have crime. The crime is causing those people to stand on the street corner, and hopefully the people standing on the street corner are also causing less crime to happen. But again, just the correlation here will not help you understand exactly how much x is causing y and exactly how much y is causing x. The correlation is just a starting point for what comes next. So in terms of public policy, you somehow have to have a way to start to think about inferring
causality from what you see in naturally occurring data, which is just correlations. We have a set of techniques that economists have developed, and this is really what I mean in many ways by the economic approach: a set of techniques economists have come up with to try to deal with the fact that the world only gives us correlations but we care about causality.

The one that's closest to my heart is what we call natural experiments, or what I actually prefer to call accidental experiments; that gets at the idea better. The idea is super simple, and it really goes back to randomized experiments, so let's start by talking about randomized experiments. Why is a randomized experiment so amazing, so powerful? Again, what is a randomized experiment? You take a set of people, like this room, and we randomly draw out some of you to put into a treatment group, and the rest of you go into a control group. Then we give some treatment to one part of the group and no treatment to the other. We come back later and we see what's happened to the outcomes. So maybe there's this new flu medicine that was approved yesterday by the FDA, the first one in 40 years. If you're in the treatment group, when you get sick we give you that, and we see how much better you get. In the other group, if you get the flu, we do nothing for you, and then we see how sick you get.

The beauty of it is that because we've randomized into treatment and control, if we didn't do anything to the treatment group, if we treated you exactly the same, then when we came back later we would expect that on average the people in the treatment group would look exactly like the people in the control group. So all we need to do is compare the people in the treatment group to the
people in the control group afterwards. We've now given the treatment group some treatment, and any difference we see we think we can attribute to the treatment, the intervention. So the real power of randomized experiments is that you expect that, other than the treatment, the treatment and the control group would have looked the same.

That's the exact same logic that economists try to exploit in natural experiments, or accidental experiments. We try to find two groups of people who we think would have been the same except that, kind of by chance, one group got treated differently than the other group, not for any real reason or logic, really by accident. So the only difference between an accidental experiment and a randomized experiment is that the experimenter doesn't get to choose who gets treated or not treated, or even how much they get treated or with what. You just snoop around in the data and the archives and try to find some examples.

What's interesting is that the best accidental experiments arise out of the greatest stupidity, because in general we think that in a sensible world you should treat people who are the same in the same way. That's the enemy of the accidental experiment. The accidental experiment is when you take two people who are truly identical and, because somebody blunders, you end up treating one very differently than the other. That's the logic.

So how does that happen? Examples are things like law changes. The law change happens to come in on a certain day and some people are grandfathered in. Or some people live in a particular state that has a law, and just over the border in the next state it doesn't have that law. Or the law says that if you're over the age of 65 it applies to you, and if you're not over the age of 65 it doesn't apply to you, and you've got people near the edge. And arbitrary rules. So one example I've looked
at: airlines used to run sales, and if the distance of the flight was 199 miles or less, they would charge you $99; if the flight was between 200 and 999 miles, they'd charge you some other amount; if it was 1,000 to 2,000, and so on. They had these really sharp divisions depending on whether the flight happened to be 999 miles or 1,001 miles. It didn't really make any sense business-wise; it was just a rule they used, and that induced a bunch of variation in prices for two flights that were pretty much the same. So that's an example, and you could look anywhere, really, where you see sharp discontinuities.

Now let me give an example in practice where we use this, and that's Uber. Uber turns out to be a great example of a natural experiment. You all know Uber. I don't know if you guys used Uber long enough ago to remember surge pricing much more clearly. It used to be you would open up the app and it would literally tell you that this ride is going to cost you 2.5x what the ride usually costs. And then a bunch of economists, including your own John List, went to Uber and told them that's a totally idiotic way to do it, because when you make it that transparent to people, they're going to react badly. So now that's all hidden in the background. It still happens; you just can't see it anymore.

On the x-axis here is what Uber thinks is the exact true price that they should be charging. It goes from 1 all the way up to 2.4; it goes much higher, but the data gets sparse. On the vertical axis, what we have is the share of people who, after they open the Uber app, actually end up getting in a car and paying for the trip. We've divided the data into really fine slices. They never charge less than 1, but there are lots of times when they think the right price is actually less than their regular price. So the
bars here that are just plain white mean that that is not a bar where a discontinuity happens. Over these bars there's no actual change in price. Wherever you see the white bars, there's no change in price occurring, even though the Uber model suggests that the right price is a little different. So you see there's no change in price as you move along here, and there's no change in the purchase rate. It's super flat, because all of these people kind of do the same thing. A red bar is the one tiny sliver of data we have that's just before the discontinuity, where they suddenly raise the price, and the yellow bar is the data just after they raise the price. So the people in the red bars and the yellow bars are almost identical in Uber's eyes, but they face different prices. What's so cool is that when you go white bar to white bar to white bar, nothing happens; then you go red bar to yellow bar and you see a big drop-off. And you see it again: white bars pretty flat, drop-off; white bars pretty flat, drop-off. So the red bar to the yellow bar is showing you the response of people to these changes in price.

Then when you take that data and build it back into the usual way we picture a demand curve, it turns out to be this now legendary, in my own mind, demand curve, which I think is the first real demand curve that we've ever seen for something you care about. I don't know how much you know, but if you add up the area under the demand curve, that gives you consumer surplus. By our estimates, the consumer surplus coming from Uber in the year of our data, a couple of years ago, was almost seven billion dollars, which turns out to be huge relative to Uber's profits, which were probably negative two billion dollars, or the amount that's paid to the drivers, which is, I don't know, three or four billion dollars or something.
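The "add up the area under the demand curve" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `consumer_surplus`, the linear demand schedule, and every number below are made up for the example, and none of it is the Uber data or the estimation method from the paper. It approximates the area under a demand curve above the price actually paid, using the trapezoid rule.

```python
def consumer_surplus(prices, quantities, price_paid):
    """Approximate consumer surplus as the area under the demand curve
    at prices at or above `price_paid` (trapezoid rule).

    prices:      price points on the demand curve, in ascending order
    quantities:  quantity demanded at each price (same length as prices)
    price_paid:  the price consumers actually pay

    Demand above the highest listed price is ignored, so the result is a
    lower bound when the curve has not yet reached zero.
    """
    if len(prices) != len(quantities):
        raise ValueError("prices and quantities must have the same length")
    # Keep only the part of the curve at or above the price actually paid.
    points = [(p, q) for p, q in zip(prices, quantities) if p >= price_paid]
    surplus = 0.0
    # Sum the trapezoids between neighbouring price points.
    for (p_lo, q_lo), (p_hi, q_hi) in zip(points, points[1:]):
        surplus += 0.5 * (q_lo + q_hi) * (p_hi - p_lo)
    return surplus


# Hypothetical linear demand: quantity = 100 - 10 * price, for prices 0..10.
prices = [float(p) for p in range(0, 11)]
quantities = [100.0 - 10.0 * p for p in prices]

# At a price of 4, buyers take 60 units; the surplus is the triangle
# between price 4 and price 10: 0.5 * 60 * 6.
print(consumer_surplus(prices, quantities, 4.0))  # 180.0
```

With real data the demand points would come from the estimated drop-offs at each price discontinuity rather than from a formula, and the curve would have to be extrapolated beyond the highest observed price.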
So really this turns out to be the single most important component: consumer surplus. And it's kind of not surprising in a way. People love Uber; people use it all the time. That's an indication that they're getting a lot of surplus. But what was surprising, maybe, to us was that the willingness to pay for Uber was something like three times the amount of the ride. So on average, if you paid $10 for a ride, then if push came to shove you would have been willing to pay something like 20 bucks for that same ride. So there seems to be huge consumer surplus, which is also kind of odd when you think about it, because this is all consumer surplus in a world in which Lyft is also out there. So even given the other options, people seem to get tons and tons of consumer surplus from Uber.

All right, let's go back now to talk about structural estimation. I don't want to talk much about it. It's a different approach, one that a lot of resources in economics are currently invested in. What it tries to do, usually in the absence of really good accidental experiments or randomized experiments, is use economic theory, or maybe just functional form assumptions if you have to, to tell you, given the data you have, how you could come up with what we call the deep structural parameters that actually describe the causal mechanism in the world, which you could then extrapolate to other settings. It's a totally different discussion to talk about how successful economists are at doing that, but it's a set of tools that economists use.

I'm going to skip the third point in the interest of time, and the last point is randomized experiments. Economists were late to the game in terms of using randomized experiments. They've been used in agronomy and psychology for a long time, but only really have they been really
really common in economics in the last, say, 20 years, but they've come to be a really powerful tool. There are randomized experiments in the lab, but more and more economists think about randomized field experiments, experiments that are done out in the field, that are creating insights about stuff. The one I want to talk about, just super quickly, is my own rather odd randomized experiment, where I was interested in the question of whether people quit too much or too little: whether they quit jobs or end relationships more or less than they should. Economists have so much to say, with all these models, about how people should behave, but we don't really have a lot of evidence about how people do behave.

So the thought was, how could I figure that out? It's not easy. Let's say I want to think about divorce. How would I figure out whether people stay married too long or not long enough? What you want is to take two sets of people who are right on the margin of getting divorced; one set does, and you see whether they're happier six months later than the people who don't get divorced. That would be the thought experiment. So how are you going to do that in the real world? Number one, I don't get to divorce and not divorce people, and I couldn't really think of a good accidental experiment. But what I learned, actually, is that when Dubner does his Freakonomics podcast, after he does a podcast something like a thousand people write to me and say how his podcast changed their life, because we share a joint Freakonomics email address. And it suddenly occurred to me: Dubner changes people's lives. Perfect. I'm going to take advantage of the fact that people listen to Dubner, and I'm going to try to change their lives as
well. So what I did is I built a web page, and we advertised it. It was called Freakonomics Experiments. We advertised: look, if you're having trouble making a decision in your life, come to our web page and we'll help solve your problems. We pretended to try to solve people's problems by having them think differently and ask different questions. What we really wanted at the end of the day was for them to say, I'm still just as confused as I was before I got to this web page. And then we said, OK, then we'll do you the ultimate favor. We had this beautiful virtual coin that would be tossed up in the air, and if it came up heads, then you would get divorced, quit your job, get a tattoo, whatever the coin tells you to do.

What was so interesting is that something like 25,000 people came to the website and flipped the coin, and even more interesting, they actually followed the coin toss. So here's another picture, which is for me one of the most interesting and gratifying figures I've ever made in a paper. Before they flipped the coin, we asked them how likely they were to get divorced, to quit the job, whatever, and they could give an answer anywhere from 0 to 100. Then we randomly assigned them: flip the coin. The people who got yes, make the change, are the green line, and the vertical axis is how many people actually made the change. If you got no, don't make a change, then you are the orange line here. So take the people who said in the beginning, I am not going to change no matter what. They still flipped the coin, and it turns out that something like twenty percent of those who got heads ended up making the change, and only maybe, I don't know, twelve percent of the people who didn't get heads made the change. What's interesting is that across the entire range of how likely people said they were to make the change, the green line is well above the
orange line. The difference between these two lines, the vertical distance, is how impactful our coin toss was on what people actually did in their lives. This all assumes the reporting is accurate; there are a lot of issues floating around that we deal with in the academic paper, but I really believe this to be mostly true. What's interesting is that people kind of knew themselves: the people who said they were likely to make a change were much more likely to make a change, on average, than the people who said they weren't. But people were far too extreme in thinking they for sure would or for sure wouldn't make a change.

But most interesting is that we followed up with them, and six months later when we asked them, it turned out that for almost every question that mattered, the people who got heads, the people on the green line, came out ahead. Some questions were like, should I go to this movie or that movie, but for anything that was important, where it could affect your life, the people who got heads were happier six months later, were more satisfied with their choice, and would do it again, compared with the people who got tails. And because I can't think of any other reason why the people who randomly got this virtual coin to turn up heads are any different from the people who got tails, the only logical conclusion I can make is this: the people who got heads were both more likely to change and happier, so I think the causal arrow goes from making a change to being happier. That's subject to a bunch of caveats you can read about in the paper, but I still think it's an actual, true result.

What's good is that it's changed the way I think about the world. Whenever anybody asks me any question about anything, I always give the same answer: I tell them to quit. If you're ever on the margin at all, you should change what you're doing, because at least according to this evidence
people are way too reluctant to change. On average, if you're not sure what to do, you'd think it would be kind of the same outcome whether you change or not, but it looks like people don't change or quit stuff nearly as much as they should.

OK, so I've talked too much about economics, but that's what economists do, more or less. They do things like randomized experiments and accidental experiments, and run regressions; that's the way we think about the world. So what is modern data science? It has a whole bunch of different names: things like random forests and cluster analysis and deep learning. And it's not just that they have different names; their techniques are completely and totally different from what economists do. The easiest one for me to explain in a short amount of time is random forests, so let me just explain that as an example of how they do stuff.

The basic idea is that you have some outcome y and you have two possible variables that can explain what happens to y; call them x1 and x2. What you do is you start by building what's called a decision tree. A decision tree does the following. It says: I'm going to make a cut of my data. I'm going to cut my data into two pieces, and I'm going to choose the cut that will maximize the difference between the two pieces of the data that are left. So I basically slice my data to give me two data sets, and the characteristic of these two pieces is that they're really unlike each other. That's the first step in my tree. The next step in my tree is to look at what's the next cut I can make. Now I have two data sets. I can either cut the first data set, so I look at the cut in the first data set that will make the biggest difference between what's left in that data set, or I can cut the second data
set, and I find the one that makes the biggest difference between those two. And I just keep cutting until I reach some stopping rule that I've defined. That's what a decision tree is.

So here's an example. In this case we have x1 and x2, and I think we just generated fake data here: x1 runs from 0 to 1 and x2 runs from 0 to 1, I think uniformly distributed between 0 and 1. And then we have some Y variable, and the Y variable can only be a 1 or a 0. When it's equal to 1, it's a blue dot here; when it's red, it's a 0. So the blue dots are ones for the Y variable and the red dots are zeros for the Y variable. This is just our universe of data that we created. When you start making cuts, the best cut you can make in the data, the one that explains the most, is to divide the data between cases where x2 is less than 0.3 versus where x2 is greater than 0.3. That would be this line right here. So the first cut of the decision tree splits off this part of the data from this part of the data. Then the next step is a cut where x1 is less than 0.88, so now you cut this part from that part. And you just keep on cutting the data until you get to some point where you can't explain very much more. That's a decision tree.

To an economist this is a somewhat bizarre way to think about cutting up data. It's, you know, hierarchical in some sense. But what's most interesting is what you're left with at the end. I don't know if you can read these numbers, but here you have this block here, and in this range the mean value of Y is 0.123, so Y is almost always 0, and almost all these dots are red. But look right above it at all the blue dots right there. So on the line right
between, crossing over from here to here, there's a huge change in your prediction, right? If you're just here, you're going to predict that it's going to be blue for sure; if you're just here, you're going to think that it's likely to be red. It's super nonlinear. So the defining characteristic of this, and really of all the new modern data science techniques, is that they are enormously nonlinear, in the sense that small changes in your variables can lead to really radical changes in predictions about what's going to happen in the world, whereas in general almost all economic models have a kind of feeling of linearity built into them.

Okay, so this is a single decision tree. What a random forest is: you take subsets of your data. Say you have a million observations, and you take, say, a hundred thousand observations at a time. You pull out a hundred thousand observations, you build a decision tree, you put those hundred thousand observations back in, you pull another hundred thousand, you build another. You do a bunch of those until you've built a forest of decision trees, and then each decision tree gets to vote. So now you get a realization, out of sample, of some particular set of x1 and x2, and you let each decision tree have one vote about whether it thinks that Y is going to be a one or a zero, and majority rules. So that's the way these models look.

I didn't really explain to you the nuts and bolts of how we build econometric models, but I've just got to say it's nothing like this. There's zero familiarity, for anyone who's done econometrics, in what happens in this kind of setup. So that's where you just say, well, this is super weird that these exist.

So now let me give my sense of what the core of modern data science is. The core of it is that it is theory-free. It's just a way of cutting up the data, and it doesn't care why you're cutting up the data, as opposed to
econometrics, which at least pretends to care deeply about the whys that are underneath it. You at least pretend that you have a model in mind when you go and run the data, and that that model is what you're testing. Interestingly, it's focused almost entirely on correlation and pattern recognition, which is just another way of saying correlation. Empirically, these tools turn out to be incredibly good at, like, telling you whether it's a dog or a cat, better than econometric models that try to get the features of a picture and then tell you whether it's a dog or a cat. Admittedly, they're obviously black boxes. You plug stuff in, and even when you get the answers out, you often can't exactly tell why the model told you what it did. It tells you it's a dog or a cat, but you don't really understand why the model thinks it's a dog or a cat. But the fact is, when it comes to the kinds of problems they've been applied to, they are really, really effective. When economists try to build a model that does what these models do, using our tools, we're not as good as these tools are at doing it.

All right, so now back to the key questions, quickly, because I do want to leave you a chance to ask questions. Why are the two approaches so different? That's the first question. The first reason is in part because there's a different mindset in how economists think about the world and how computer scientists think about the world. But I don't want to oversell that, because I think that's actually the least important reason why these things are so different. The real reason they're different, and once you see it, it seems obvious, is that they were designed to do two totally different things. What economists try to do with data is we try to explain why something has happened in the past. We take a historical data set, we try to
understand it, and we try to put the reasons and the causal arrows into it. When people use the data science approaches, they are almost invariably only interested in predicting the future. So when Netflix wants to figure out what movie you are likely to like, they do not care at all about why you're going to like that movie, or what your background is, or what would happen if a completely different set of movies were released. All they care about is: can they put a movie in front of you that you're going to like, so you keep on sticking with Netflix and not doing something else?

And so, interestingly, it turns out that predicting the future in a static world is roughly the only interesting question that exists where correlation is good enough. As long as the world is the same tomorrow as it is today, who cares why today turned out the way it did? Tomorrow is also going to turn out more or less the way today did. And that's the premise of modern data science, essentially: that what happened in the past is going to happen in the future, and so I don't need to understand why. It was surprising to me, and it took me a while to see, that modern data science really is the king of correlation, and so it is the king of predicting, as long as you're predicting in a world where stuff's going to be the same. Now, the problem is that... well, let me not get ahead of myself.

The last thing I was going to say is that the other reason they're different is that modern data science only asks questions when there's enough data to ask them using modern data science approaches. It turns out modern data science is very greedy about needing data, relative to old economic approaches, and so there are lots of questions you wouldn't even think to try to answer with modern techniques that economists are quite content to try to tackle, because we don't need so much data, because we impose more in the
way of theory.

Okay, so will modern data science make the economic approach irrelevant? The obvious answer to that question is no, because for the thing that econometrics does, which is explaining the past, modern data science has not proved itself, and I don't think ever will prove itself, to be particularly useful. But look, the fact is, if you go and think about it, mostly people don't care that much about what economists do. It turns out that in business and in practice, predicting the future is a lot more important than explaining the past, and economists just haven't been in the business of predicting the future. It's not our forte; it's not what we think our job is. And so I think we'll have lots of job security as economists, because businesses don't care about understanding why the past is the way it is, so economists can keep on doing this.

Okay, so what should we borrow, though? It seems kind of sensible: there are all these new techniques, so what is in them that economists can use? The first thing, for sure, is that in those rare cases, and they're quite rare, where economists are actually in the business of prediction, I think it's silly for economists not to use the data science tools, because empirically they almost always turn out, in a horse race, to do better than the econometric models at pure prediction when you have really thick data. The other thing, which I think is more holistic but is the bigger value, is that modern data science has a spirit to it, which is that everything is data. Words are data, images are data; you can basically exploit anything and turn it into data. That is a really smart and important idea that economists didn't really latch onto. Economists mostly thought that data was the stuff that the government produced and put into data tapes and books, and always it was numbers, and
always it was rectangular, in the sense that the way data was structured was: there's a state, and it has a bunch of variables about it. But anyway, this is super important: even stuff like voice and face, everything is data. If you are a researcher or someone using data in business, your mantra should be "everything is data"; it's just a matter of whether I'm clever enough and have the right tools to turn that thing into data that I can use to be effective. There's little stuff that a bunch of people write about, how we can do things like binning slightly better; I don't care about that.

But the fourth thing, which I think is potentially harder to see but interesting, is what happens if modern techniques get good enough. Look, of course, if modern AI becomes better than humans at doing what humans do well, then you can see that we don't need economists anymore, because it would be better. But more subtle is that these techniques might be really good at brute-force looking for natural experiments and then churning those up, so that economists could then add the human element to try to determine whether they're really natural experiments. What these techniques do a lot of is look for these highly nonlinear things. So they might say, for some reason, patients who have these seven characteristics all at the same time tend to die a lot, even though if you only have six of them you don't die any more than other patients. And then that might be something that would lead economists, or doctors in this case, to say, "Can I think of a theory why those seven things together might be a sign of some underlying causal thing?" So we might be able to use the techniques in that way.

Okay, so what, if anything, can economic approaches contribute to modern data science? I think this is actually a more important question, because I don't think people have been asking this very much, as opposed to a lot of
people asking about the other direction. The first is that in a number of cases I've been asked to build economic models in settings in which other people have built modern data science models, and what's interesting is that their models always beat my models in terms of prediction, but the correlation between my predictions and their predictions is actually really low. It's often like 0.5 or 0.6. And if you have two signals that are not very correlated, and you take a weighted average of those two, you can actually do better than either signal alone. Now, you put more weight on the data science one than on the econometric one, but having multiple, relatively independent signals is really, really useful in prediction problems. So a couple of times we've worked with companies and were able to convince them that some kind of old-fashioned approaches had real value for them, because they gave such different answers relative to the kinds of answers they were getting with newer approaches.

The other thing is that in a world that's changing, the world could change faster than the data that the modern techniques need in order to keep up are generated, so the model can't keep up with the changes. That's another case where you can obviously see why you'd want to use the old techniques, which are more anchored in causality. If you have a model anchored in causality, even if the world changes, you can often use theory to make a prediction about what will happen in the new world. So at Netflix, if something totally changed, and I don't know what the change would be at Netflix, say suddenly people stopped watching regular movies and only watched virtual reality 3D movies, overnight, their old models aren't going to predict much of anything until they have enough new data on the virtual reality stuff. But economic models might tell you
something about what the transition is. As I said before, it's also true that if there isn't enough data around, if you're faced with no estimate at all because you don't have enough data to use modern techniques, you can fall back on the old techniques.

My last point, which is almost unrelated, and then I'll take some questions, is I think in some sense the most fundamental insight that's come to me over time. Oftentimes when you talk to people, there's this idea that big data is kind of enough by itself: just get big data and it solves all your problems. I have come to believe that that's not true at all. In fact, what I've come to believe is that not only is big data not an answer by itself, but there's actually a complementarity between ideas and big data, not a substitutability. If you have big data and you have great ideas, you can do amazing things that you couldn't have done either without ideas or without big data. But just having one or the other, ideas and no data or data and no ideas, I think actually leads you to a terrible place.

I have not worked with a lot of companies that have a lot of ideas and no data; I've worked with lots of companies that have a lot of data and no ideas. And I can tell you I've never seen a good outcome from the use of data at the companies that simply don't have any ideas but have amazing data that they imagine should have answers to everything. They literally cannot think of the question to ask, and so what they do is stockpile terabytes and terabytes of different kinds of data, which becomes a bigger and bigger burden on them to keep track of, and they never do anything with it except draw incredibly terrible inferences, which end up leading them to make bad decisions, decisions worse than they would have made if they hadn't. Look, I think the future belongs to companies and situations and people who have big data. So big data is
incredibly valuable, and it will make those of you who also have great ideas much more productive as well. Okay, so let me stop there. I only have just a few minutes for questions, but I'm happy to take them, whatever you see fit. We have microphones here if anybody has anything to say. All right, if no one wants to, we could let people go, or I can just keep on talking. Oh, yes, here's a brave soul.

Hi, hello. Yes, a really brief question. I just wanted to hear more about the coin-tossing experiment. I feel like there were counterintuitive steps that you took, from the sampling and the way you wanted to approach gathering data, and then the conclusions that you drew from that data were really non-intuitive. How did you think about the process? Did you have a method, or did the experiment kind of have a life of its own as you gathered data, and you changed some things about it as time went on? That pretty much was the coin-tossing experiment question.

I don't think everyone could hear. The coin-tossing question expressed a kind of surprise at what seemed to be logical jumps and fallacies in what I did in the coin-toss experiment: did it emerge whole cloth, or did it change along the way? Okay, I learned this from Michael, actually. Michael's the first one who ever told me to do it. Whenever a student is presenting results, Michael will say to them, "If you could have exactly the data set you wanted, the perfect data set, what would it look like?" You still ask that question, Michael? Okay. So what is the perfect data if you want to understand about, say, quitting your job, and whether you should quit your job? What I would want is people who are exactly on the
margin for quitting their job. I want the people who wake up every morning and say, "My God, I hate my job, I want to quit my job," but they're just stuck on it: sometimes they quit, sometimes they don't. That's in essence perfect data, because I don't want to fire a bunch of people who love their jobs and see if they're happy or not, because they're not on the margin. It's all about people on the margin, because whenever you're having trouble making a decision, you're at the margin. And so what I wanted to find was a pool of people who were so undecided about whether they wanted to get divorced or quit their job that they would come to my idiotic website and flip my idiotic coin. That is like the definition of somebody who's on the margin: if a virtual coin can sway your decision, you were really, truly on the margin. So for me, really, it was the perfect data set.

Now, what was not perfect about it is the fact that I couldn't really sit on the shoulder of these people or get inside their heads and know how they really felt. So I had to rely on them telling me how they felt. I had to rely on them responding to my survey, and not everybody, when I contacted them six months later, wanted to come back and talk to me. Now, partly I dealt with that by being clever up front: when you said you were going to flip a coin, I also asked you to name a friend whom you would also tell about the decision, whom I could then go and talk to and ask, "Did he really quit his job? Is he really happier, or is he just lying to me?" So it's good that I had these third parties who didn't have the same incentive to lie. Not that anyone at the very start seemed to lie, but I think one thing we've learned from experiments is that experimental subjects try to do what the experimenter wants. And so when people were told to quit their jobs, I'm afraid they might tell me I
did quit my job, lying to me to make me happy, and then when I asked them later whether they were happy, maybe they would tell me they were happy because they thought I wanted them to be happy. But the third parties, who didn't even know who I was or why I was writing them, I thought were much less likely to lie to me and say, "Oh yeah, this guy was really happy, this guy quit his job," and stuff like that. But in essence, this is one of the unusual papers I've written where from the very beginning I knew exactly what I was going to do, exactly how to analyze it, and what I expected to see, as opposed to usually, where I just get a pile of data, I start thinking about it, and I kind of worry about the details. Okay, let me take one more quick question and then I'll let you guys go.

Thank you so much. I kind of wonder, can you elaborate more about your startup? What is your startup trying to do?

Yeah. So I've been really lucky. For an academic economist, I've had an incredible opportunity to talk to people: a book that's highly read, this podcast, and, you know, I can get 25,000 people to get divorced or not get divorced, just like that. But what's interesting is I cannot point to any successful change in public policy, for better or for worse, that I think has happened because of my research. I think most of the outcomes in the world, at least directly, would be exactly the same if I had never been here at all. And it's a telling statement about the inability of research, at least my kind of research, to have an impact. In some sense, should I care about that? I'm not sure whether you care about that or not. But I've gotten to an age where I think it would be kind of fun to try to go and change the world and do stuff. And in particular, my own view of the
world is that when things are easy to do, people go and do them, and when you will be showered with accolades and love because you go out and do something that feels really good, people go and do that too. So I think a lot of the easy things that philanthropists can do have already been done. I'll give you an example. I talked to one very prominent philanthropist, and he was talking about a problem, and I suggested a solution to that problem. He acknowledged that it probably would be a good solution, and he said, "But the thing is, the only reason I do philanthropy is because I want people to like me, and if I did what you said, people wouldn't like me, so I'm not going to do it." So in a world in which that's the case, I think there's room for someone like me, who doesn't really care whether people like me or not, to go out and try to make the world a better place by doing the kinds of things that other people, who maybe care more about their reputation, or maybe want to run for president someday or whatever, don't want to do.

So essentially what my startup wants to do is to find really smart people and to take ideas we have. But also, the thing is, I believe that there are amazing ideas out in the world, and through the Freakonomics platform there are a lot. Sometimes good ideas are attached to people who want to spend their life pursuing those ideas, but more often than not it's people like my dad. My dad is just a happy doctor who does his thing, but I'd say over my lifetime my dad has called me with five amazing ideas that I think could potentially, in very small ways, be world-changing. My dad's not going to quit being a doctor and go start something. So I think we have the possibility to crowdsource really amazing ideas that other people don't have access to, and then glean the best of those and really go and put them into practice. But the problem is that ideas alone don't win. What I've learned is that you
can't go to a funder or implementer and say, "I have a great idea." You have to go with more; you have to go with something like a prototype. You have to go with an example of how it works. It has to be really simple. And so what I want this organization to be able to do is to take ideas and turn them into the equivalent of a business plan or a prototype or a device that can then actually be there for people, because people often don't have imagination, myself included, about what the possibilities are. Look, I thought, why would I want a cell phone? What could I possibly do with a cell phone? My life is perfect without a cell phone; why would someone make this? Ten years later, I can't even imagine life without a cell phone. But I needed someone to put a cell phone in my hand, or I had to see friends use cell phones, to see it. So in essence that's kind of what we want this startup to do. But we're not going to be massive implementers or manufacturers. We want to be a kind of intermediary, and then, using a market test, we want to be able to sell, in an intellectual sense, our ideas to the kinds of people who would go and implement them or would want to take them and do business. So thank you so much for your time.

Okay, that was a terrific speech by Steve. I just wanted to mention my favorite part. Steve has this incredible manner of talking where he makes it sound like it's not a big deal, like that's just what economists would do. And so I just want to underscore: among the many things Steve has accomplished, he got 25,000 people to flip a coin and change their lives. But I think there's something larger to take from that that I just want to underscore for all of you. It speaks to Steve's willingness to think about the world in a way where the world doesn't have to be the way it is, and that, as he was saying, theory and ideas combined with data can often allow you to alter the world or shape
the world in a new way. And so I think it was just a terrific speech, and the lesson I don't want to be missed is that I encourage all of you not to be afraid of thinking about how the world could be different than it is, and to change it in some way. So thanks, Steve. [Applause]
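As a rough illustration of the decision-tree and random-forest procedure described in the talk, here is a minimal Python sketch. It is not the speaker's code. The fake data rule (Y is 1 when x2 > 0.3 and x1 < 0.88, echoing the two cuts mentioned), the noise-free labels, the squared-error split criterion, the depth and leaf-size limits, and all function names are assumptions made for illustration. It also grows each tree by simple recursion instead of picking the single best next cut across all pieces.

```python
import random

def impurity(ones, n):
    # Sum of squared deviations from the mean for n labels, `ones` of which are 1.
    return ones * (n - ones) / n if n else 0.0

def best_cut(rows, min_leaf):
    """Return (feature, threshold) for the cut that leaves the two purest pieces, or None."""
    n = len(rows)
    total_ones = sum(r[2] for r in rows)
    best, best_score = None, impurity(total_ones, n)
    for f in (0, 1):                      # feature 0 is x1, feature 1 is x2
        ordered = sorted(rows, key=lambda r: r[f])
        left_ones = 0
        for i in range(1, n):
            left_ones += ordered[i - 1][2]
            if i < min_leaf or n - i < min_leaf:
                continue                  # keep both pieces reasonably large
            if ordered[i][f] == ordered[i - 1][f]:
                continue                  # cannot cut between equal values
            score = impurity(left_ones, i) + impurity(total_ones - left_ones, n - i)
            if score < best_score - 1e-12:
                best_score = score
                best = (f, (ordered[i - 1][f] + ordered[i][f]) / 2)
    return best

def build_tree(rows, depth=0, max_depth=4, min_leaf=5):
    cut = best_cut(rows, min_leaf) if depth < max_depth else None
    if cut is None:                       # stopping rule: leaf stores the mean of Y
        return sum(r[2] for r in rows) / len(rows)
    f, t = cut
    left = [r for r in rows if r[f] < t]
    right = [r for r in rows if r[f] >= t]
    return (f, t,
            build_tree(left, depth + 1, max_depth, min_leaf),
            build_tree(right, depth + 1, max_depth, min_leaf))

def tree_predict(tree, x):
    while isinstance(tree, tuple):        # walk down the cuts until reaching a leaf
        f, t, left, right = tree
        tree = left if x[f] < t else right
    return 1 if tree >= 0.5 else 0

def build_forest(rows, n_trees=25, sample_size=200, seed=0):
    rng = random.Random(seed)
    # Each tree sees its own random subset; the rows go "back in" between trees.
    return [build_tree(rng.sample(rows, sample_size)) for _ in range(n_trees)]

def forest_predict(forest, x):
    votes = sum(tree_predict(tree, x) for tree in forest)
    return 1 if 2 * votes > len(forest) else 0   # majority rules

# Fake data (assumed rule): x1, x2 uniform on [0, 1]; Y = 1 iff x2 > 0.3 and x1 < 0.88.
_rng = random.Random(42)
data = []
for _ in range(1000):
    x1, x2 = _rng.random(), _rng.random()
    data.append((x1, x2, 1 if (x2 > 0.3 and x1 < 0.88) else 0))

forest = build_forest(data)
print(forest_predict(forest, (0.5, 0.6)), forest_predict(forest, (0.5, 0.1)))
```

Points just on either side of a learned cut get opposite predictions, which is the sharp nonlinearity the talk contrasts with the near-linear feel of most economic models.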
Info
Channel: Becker Friedman Institute at UChicago - BFI
Views: 6,796
Rating: 4.968504 out of 5
Keywords:
Id: 2EH1D3nhOGI
Length: 57min 7sec (3427 seconds)
Published: Mon Nov 19 2018