Steven Levitt, Sackler Big Data Colloquium

Captions
Good evening. It's just past six o'clock, so I may say good evening instead of good afternoon, but you're very welcome. My name is Ralph Cicerone. I'm president of the National Academy of Sciences. It's a wonderful honor and a pleasure to welcome you here tonight and to tell you just a bit about the lecture and the lecturer. This is our 15th annual Arthur M. Sackler Lecture, and the lecture is presented along with the National Academy of Sciences Sackler Colloquium on Drawing Causal Inference from Big Data.

First of all, I want to recognize and thank Dame Jillian Sackler for her sponsorship of this entire program. Would you mind standing and saying hello, Jill? She serves as president and chief executive officer of the Arthur M. Sackler Foundation for the Arts, Sciences and Humanities, and they do wonderful work in all three of those major endeavors. She has funded this program at the National Academy of Sciences in memory of her husband. We are in our 15th year, as I said, of these successful programs, and in the program, the inside front cover has a very brief summary of Dr.
Sackler's career in these three major endeavors. It's quite impressive. At our conferences we try to bring together people from different disciplines that sometimes don't interact, so the programs tend to be interdisciplinary, bringing together a diverse group of researchers and practitioners on topics that are usually of broad current interest, like today. Each year, one of the several colloquia will feature the annual public lecture, as tonight's program exemplifies.

The colloquium has been organized by Professor Richard Shiffrin from Indiana University, Susan Dumais, Mike Hawrylycz, Jennifer Hill, Michael Jordan, Bernhard Schölkopf, and Jasjeet Sekhon, and I want to thank all of them on the organizing committee for how effectively they worked to assemble just a stellar roster of speakers from different disciplines to examine how this so-called big data phenomenon is now being used in social networks, medicine, health, economics, business, internet search engines, genetics, and for all kinds of mischief. Well, that's just one person's opinion.

I now have the honor of introducing tonight's speaker, Professor Steven Levitt. Dr.
Levitt received his PhD, I suppose he earned it also, at MIT, didn't just receive it. That alone is a major accomplishment these days. He serves as the William B. Ogden Distinguished Service Professor of Economics at the University of Chicago, where he also directs the initiative on price theory and Chicago experiments at the Becker Friedman Institute, of course named after Gary Becker and Milton Friedman. In 2003 he won a major award, the John Bates Clark Medal, given by the American Economic Association. At the time, I think that medal was only given every two years, to the person judged to be the outstanding economist in the United States under the age of 40. Some years later, maybe around 2010, the award began to be given annually. So it's a very, very competitive, highly selective award, and it really is very meaningful. He was also named one of Time magazine's 100 people who shaped the world in 2006. He's the co-author of the 2005 book Freakonomics and a 2007 sequel entitled SuperFreakonomics, which served both as economic textbooks and as a series of cautionary tales about the fallacy of conventional wisdom, not only in economics but in our daily lives. In 2014 he published Think Like a Freak, which describes an entirely new way to solve problems. He asks very provocative questions that I think all of us can identify with, and looks at standard as well as alternative ways of answering those questions. So we look forward to hearing his thoughts on the topic today. Professor Levitt, thank you, and welcome.

Thank you all for being here. It's wonderful to be able to talk to you, and an honor to be associated with the names of the presenters at this conference and the past speakers in this series.

So my first taste of big data, I think, would have been about 1998 or 1999. I was in Washington, DC. I opened up the newspaper and there was a little blurb that said that there had been a whistleblower in Japan who claimed that sumo wrestling was fixed.
OK, and I don't know anything about sumo wrestling, but I was obsessed with the ideas of cheating and corruption, and I thought, well, you know, usually when somebody says something's fixed, it probably is fixed, and maybe I could do an academic paper on it. It might have seemed like a pretty stupid idea, but it was the idea I had. So I went to my computer. I mean, this is how long ago it was: it used to be, before Google changed the world, that you'd go to something like Lycos and you'd type in something and nothing you ever wanted would ever come up. Then you'd go to AltaVista, the same thing. So I did this, and just that morning someone had said to me, "I've never heard of this thing Google before, but I typed something and it was amazing." So that was the first time I ever used Google. I had just tried the other two search engines I usually used, and I had typed and found nothing that would help me write this paper. And so, with no hope at all, the first three words I ever typed into Google were "machine-readable sumo data." OK, four words.

And it was truly unbelievable, because the very first thing that came up, this was back when the internet had hardly anything, the very first thing that came up was a link to, I guess the guy's name is Jerry Yang or something, from Yahoo. It turns out he loves sumo wrestling more than anything, and so he had hired someone to type in the results of every sumo match that had occurred in the last 30 years in Japan. And literally I got connected to a link to, you know, an Excel file that gave me two-thirds of all the data I would need to write a paper.

Now, just to put in perspective what it was like to do economics back then: I mean, it was hard. My friend Josh Angrist at MIT was doing a lot of work on the census, and there was a room at this place called the National Bureau of Economic Research in Cambridge, and all it was were these enormous data tapes. They, you know,
[unclear] you carried them to something or other [unclear], and it was a full-time job for a graduate student or an undergraduate to spin the data so you could think about doing an analysis. And if you hired somebody to do it... now, it would be just a few years later that that same process would take about 15 minutes. But at that time, to actually be able to go on the web and get these data was truly, truly amazing.

So I got these data. It turned out that, as I said, it wasn't everything I needed. I did need to go and buy the last 20 years of Sumo World magazine and have an RA type in a bunch of extra stuff about the wrestlers and whatnot. But it turned out that within about a day I could make a graph that made me completely and totally convinced that sumo was utterly rigged. And then I did more analysis, and the patterns were unmistakable.

So one of the themes of this talk is going to be that big data alone aren't enough. You need to understand human behavior and incentives for most of the applications. In this case, reading about sumo wrestling, I could see that in their matches, they do 15 bouts as part of a two-week match, and if you have a winning record, you win at least eight of them, then you get promoted up the ladder, and if you have a losing record, seven or fewer wins, you get demoted. So you could see very clearly there's a very sharp discontinuity in the incentives around the eighth win. So if there was going to be cheating, what you'd expect to happen is, when two wrestlers met on the last day of the tournament and one had exactly seven wins and seven defeats and the other had, say, six wins and eight defeats, well, the one who had seven wins really wanted to get that eighth win, but for the one who had six wins, the seventh win didn't matter as much. So there were gains to trade, and if you could figure out how to cheat, overall you'd be better off.
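The comparison just described, how often a wrestler sitting at exactly seven wins takes a final-day bout against an opponent who is not in the same position, can be sketched in a few lines of Python. This is only an illustration of the idea, not the analysis from the paper: the `Bout` record, the function name `bubble_win_rate`, and the toy bouts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bout:
    """One final-day bout. Win counts are totals BEFORE this bout (14 bouts fought)."""
    wins_a: int   # wrestler A's wins going in
    wins_b: int   # wrestler B's wins going in
    a_won: bool   # True if wrestler A won this bout


def bubble_win_rate(bouts):
    """Share of bouts won by a 7-7 wrestler facing an opponent who is not 7-7."""
    wins = 0
    total = 0
    for bout in bouts:
        a_on_bubble = bout.wins_a == 7
        b_on_bubble = bout.wins_b == 7
        if a_on_bubble == b_on_bubble:
            continue  # skip bouts where neither or both wrestlers are at 7-7
        total += 1
        if (a_on_bubble and bout.a_won) or (b_on_bubble and not bout.a_won):
            wins += 1
    return wins / total if total else None


# Made-up toy data, NOT real tournament results.
toy_bouts = [
    Bout(7, 6, True),    # 7-7 wrestler wins
    Bout(7, 8, True),    # 7-7 wrestler wins
    Bout(6, 7, False),   # 7-7 wrestler (B) wins
    Bout(7, 9, False),   # 7-7 wrestler loses
    Bout(8, 6, True),    # nobody at 7-7, ignored
    Bout(7, 7, True),    # both at 7-7, ignored
]

rate = bubble_win_rate(toy_bouts)
print(rate)
```

With no collusion and evenly matched wrestlers, a rate near one half would be the natural benchmark; a rate far above that is the kind of pattern the talk goes on to describe.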
OK, so it turns out there's a very stark pattern that showed that the people who needed these eighth wins were much more likely to win. They won maybe 75 or 80 percent of the time. And even the day before, if you had six wins, meaning you needed to win two in a row, you almost always won your seventh and you almost always won your eighth. It's a clear pattern. But people would say, well, maybe that's not cheating, maybe that's just effort, or people have a secret move and they keep the secret move in reserve until they really need to win. You laugh, but that was my colleague Casey Mulligan's theory, the secret moves theory.

But then you saw other things. It turns out, for instance, and this is the kind of thing you could only do with big data, that you could look at every time that, say, Ralph and I had to wrestle, and I needed to win that day. Then the next time that I met Ralph, which might be two months later, it might be two years later, it turns out he won about 80% of the matches against me. So it really looked very much like a tit-for-tat: when I needed the win, Ralph let me win, and then I remembered that, and in the future I paid Ralph back.

And so we did this analysis. To me it was quite convincing. It got published in a good economics journal, and we wrote about it in Freakonomics, which ended up being published in Japan, and literally we convinced not a single person in Japan that sumo wrestling was corrupt. What's so interesting, though, is that about seven years later they found two text messages from one wrestler to two other wrestlers saying, OK, I agree, this is the amount of money we'll transact, I'll throw the match. It turned out those two text messages were enough to convince everyone in Japan that sumo wrestling was fixed. Big data had made no impression whatsoever on how they viewed it. But that certainly fueled my optimism about how important big data could
be to me as an academic economist going forward.

But then, of course, a little taste of reality came not long after. I was minding my own business in my office and I got a call from this group called FinCEN. So it's the financial crimes... I don't know what it is, but FinCEN is part of the Department of the Treasury, and they're in charge of money laundering and whatnot. They had read some of my articles and they said, we really need your help. I was super excited, right, because I already told you I was obsessed with the idea of being able to look into money laundering and crime of all sorts. And they said, there's a meeting of the G12, G15, I don't know, it's some G-something, and the thing is, Germany has an estimate of how much money laundering there is in the world, the UK has an estimate of how much money laundering there is, Canada has one. We don't have one. We need one. Could you make one for us?

And I said, OK, just to be sure, so you're saying you want me to estimate, or make up, essentially, how much money laundering there is in the world? Exactly, yeah, because we need to have a number when we show up at the meeting. I said, OK, and then what, we're going to try to reduce that number somehow? No, no, we're not going to try and reduce it. We just can't show up at the meeting without a number. I said, well, that doesn't make a lot of sense to me. Tell you what, I'll give you the number if we can make a deal that we'll work together to try to actually reduce the number. And they said, OK, that would be great.

Then reality set in. First of all, it was government work, so in order to actually get access to the data it was going to be some sort of six-month process where I got all these credentials [unclear]. And then they said, well, how do you want to do it? And I said, well, you know, I know you have these suspicious activity reports, which, whenever someone does a bank deposit over five thousand dollars or
something, get filed, and I really think, you know, in the spirit of what we now call big data but didn't call that then, we could pore over those data and understand some of the patterns and implications. And I said, you probably already do it yourselves, but maybe I could help in some ways. And they said, I mean, you're welcome to do that, but the number of boxes we have of them is really overwhelming. I said, what do you mean? They said, well, we get about a million a year. I said, that's fine. They said, yeah, but they're just in boxes. They're not electronic. And I couldn't believe that, but it actually seemed to be true. And I said, well then, why do you make people file a million of them if all you do is put them in boxes? They said, well, it's good to have them, because when someone else catches someone doing something bad, then we can go back and look through the boxes and try to find evidence, and we use that in the trials, you know, as extra evidence against them.

So I'm glad to say I did go back to FinCEN a few years ago and they're no longer in boxes, but still, as far as I could tell, no one's looking at them to try to catch criminals until after they've already caught the criminals.

So let me just start with some... OK, I pulled some facts about big data off the web, which I don't know if they're true, and they're so outrageous that I don't actually believe them, but it actually makes a point about big data. So according to the internet, every minute in 2014 Facebook users share nearly 2.5 million pieces of content. This is per minute. 300,000 tweets. Over 200 million email messages are sent. Google receives over 4 million search queries, and that doubled from just 2012, which is shocking to me. So as of 2012, the claim is that every two days we create as much data, five exabytes, as we did from the beginning of human history to 2003. I mean, even if it's exaggerated by, you know, a hundred times, it's still an
amazing amount. This one was a funny thing: there are over 550 billion pictures posted and shared on the web each year, and that's seven times the number of pictures that were ever even taken in a year 15 years ago. So this is just an absolute explosion in the amount of data out there.

So we talk about big data. I don't even truly know what big data is. I'm just going to give you my definition of big data. It seems to me that big data has a couple of elements. One is that it's big. There's a lot of it. The second is that it's often different kinds and sources of data than we've used in the past. Typically, as academics, our data sources were government surveys, and they were kind of big, but they were narrow in what they covered. Now, when people think of big data, I think they think of things like scanner data, search data, you know, really massive data sets that tend to be highly individualized, very disaggregated data, and often have an element of real time to them, which I think is what makes them the most interesting from a business perspective in particular. Academics, we don't do anything in real time, but in businesses and in policy you actually want stuff in real time to be able to make decisions.

And the other thing that I think often comes with big data that makes it different and interesting, especially for academics, is the fact that it often now has less structure than what we're used to. I mean, the data sets I knew when I was coming of age in economics were, you know, you had 50 states and you had 20 years and it was a rectangle, a rectangle with a thousand rows and however many columns. And now you have these data that are much more open-ended, like the search pattern someone's had on the web for the last month. Some people will have none, some people will have infinity. It won't be in a [unclear]. So I
think that to me is what is most interesting. Also the networks; networks are what's really interesting academically.

But I've got to confess to you that I'm not an expert in this area at all. I am not. The people presenting in this conference are true producers of academic insight about how to think about these data. But I'm a consumer. I'm a very active consumer, as an academic, of these tools and of big data. All my papers are data-driven, and I use big data, and recently I've been doing projects which have, I think, been exactly in the heart and the spirit of the changing patterns in academics around big data.

So for instance, one of the projects I've just completed started from a question I was just interested in about Hurricane Katrina. Hurricane Katrina was this incredible catastrophe, and what makes it interesting, as well as tragic, is that there was virtually no warning, and people's lives, without any warning, were turned upside down. Something like half of the residents of New Orleans had to flee. Maybe 25 or 30% never came back. Many people died. It was a complete and total economic disruption of the lives of a bunch of people, and especially of a bunch of very poor people in the United States. And the question is, what happened to them? How bad was it?

We know from other settings that if you work in a steel mill and the steel mill closes, you will suffer economically for the rest of your life. But this is very different, because the people in steel had a very specific set of skills which are no longer valuable, because not only their steel mill closed but a lot of other ones did as well. It's a very different kind of shock. And it's especially interesting to me in light of thinking about climate change, because it's quite likely that over the next hundred or so years we're going to see major disruptions due to changing climates,
and in terms of quantifying the costs, understanding what the human toll of this disruption is, is relevant. Now, it's a different kind of disruption, because the climate change one comes with a 50-year warning. Clearly you can do a lot to prepare with 50 years. But the effect on the folks in New Orleans is kind of an upper bound on how much people can be hurt by dislocation.

Now, it turns out it's a really hard question to answer using typical data sources, because New Orleans is a small part of the US, so any survey that's done by the government doesn't have very many people from New Orleans. There's essentially no reliability. And you really want long-term impact, and in very, very few of the government surveys are you going to find any evidence of where someone lived three years ago or four years ago. It's not part of the questions they ask. I started working with a big insurance company, thinking, well, at least the thing about insurance is that the people who had insurance and lost stuff are going to keep in touch with the insurance companies, so they have a way to track people. But it turned out in the end they weren't so interested in what I was doing, and really it was such an incomplete subset of the data that it wasn't that good.

But it turns out now, and it's almost beyond belief, and I almost hate to talk about it publicly for fear that when people understand that it exists they'll get nervous, but it turns out that through the Department of the Treasury, with the right credentials, you can actually get the W-2s and the 1040s of every person in America, depersonalized, but with the ability to know where they live and to track them over time. So what we did: we were able to take a pool of people, basically not a pool but every person living in New Orleans in 2004 before the hurricane hit. And then the hard part is, who's your control group? Because the
problem is New Orleans is a little bit different than anywhere else on the planet. But we took a bunch of cities that look a lot like New Orleans on observables, and we took individuals from those cities and tried to match them up to the people from New Orleans, so that from 2000 to 2004 the economic patterns and the ages looked similar, and the marital patterns and things, for those people. And then we get a shock to New Orleans and not a shock to people who, say, lived in Cleveland or Jackson, Mississippi, or whatever the control city would have been. And we just followed them through their tax returns. You can know a lot about people from the tax returns, because you know not only their income, but you can look at their dividend returns, and you can see what they're getting in terms of government retirement and other government sources and relief and all sorts of things. And in many ways I think it was the first real long-term look that anyone had into what happened to these folks.

So what happened? It turns out that, not surprisingly, the hurricane had a really dramatic economic effect on these folks in the beginning. The average incomes fell by, I don't know, 25 or 30%, and they stayed down for the first two years, very low. We can also follow the mobility. People moved, and then some came back, but most stayed away. Then something really interesting happens. The wages of the people who lived in New Orleans at the time, now they could be spread out anywhere in the world now, or in the U.S.
anyway, started to rise, and by four years after the hurricane their incomes were actually higher than the incomes of the control group of people who looked like them beforehand but had lived in some place other than New Orleans. And that continues. We're extending our data now, we've got the latest data to see, but what it really looks like is that Hurricane Katrina just wasn't a very bad shock, economically, for people who lived in New Orleans. Now, I can't say anything about utility. My guess is it was terrible in terms of welfare and utility and whatnot, basically because people could have left before. But we don't know.

Always in economics you can say what's in the data, and then you have to start making up stories about the why. So that's what's in the data. The why: here's my own view of probably what happened. I think that people in general tend not to move because they get a lot of utility out of their neighborhood, the family, the friends, the people around them. But if you happen to have the bad luck to be in a place like New Orleans, where the economy was not very good and there weren't many jobs, then you're kind of stuck economically. Then the hurricane, you know, zaps you out of that reality and sends you off to some crazy place like Houston with no warning, and there's no New Orleans to go back to, and then you start maybe optimizing in a different way economically, and it turns out, for these folks at least, to turn into more income. It's also true that New Orleans got really expensive. The cost of living in New Orleans went way up, and so in real terms it's not clear they were a whole lot better off than they were before, so part of it could be that as well. But it's a kind of study, I think, that really epitomizes the change in how economists are thinking about the world, because we have access to data that allows us to answer questions we never could have answered before.
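The comparison described here, New Orleans residents against look-alike residents of control cities, before and after the hurricane, has the shape of a difference-in-differences calculation. The Python sketch below only illustrates that arithmetic under assumed inputs: the function name `diff_in_diff` and every income figure are hypothetical, and the talk does not say which estimator the study actually used.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group's mean minus change in the control group's mean."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up annual incomes (thousands of dollars), NOT figures from the study.
new_orleans_2004 = [20.0, 30.0]
new_orleans_2009 = [26.0, 36.0]
control_2004 = [20.0, 30.0]
control_2009 = [23.0, 33.0]

effect = diff_in_diff(new_orleans_2004, new_orleans_2009, control_2004, control_2009)
print(effect)
```

A positive value means the treated group's income grew by more than the matched controls' income over the same window. Matching on pre-period observables, as described in the talk, is what makes the control group's change a plausible stand-in for what would have happened without the shock.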
As I thought about big data and thought about my own career, one thing that became clear to me, or at least let me say it was a hypothesis I had, was that the future of economics was not going to lie in government data sets, but it was going to lie in a place that was actually the most natural place to do research, which was with businesses, in academic partnerships with business. Because the thing about governments is they're not really in the data generation business. They're in the country-running business. And they toss a few scraps to generate data for various purposes, but not really necessarily to make academics happy, more often to have statistics to be able to judge how the economy's working. But the thing about businesses is, every day through their activities they throw off enormous amounts of data about how customers responded to this price of this product. They essentially are data generators, and they're doing it anyway, so there's no marginal cost to tagging on. So I really came to believe that if you wanted to test economic theory, doing it with businesses would be the best way to do it.

So as I told you, I'm no expert in big data, but it turned out that pretending to be one served me very well with businesses. They didn't know any better than to know that I wasn't an expert in big data. I had written the book Freakonomics, and even though it had nothing to do with business, if you read Freakonomics it was not a business book, some genius at the publishing company got it labeled a business book. And so it ended up being the number one business book in the country for a while, and when you have that, you automatically get anointed as a business expert. And so that was wonderful, because firms started calling me. CEOs would call me, because I was this business expert, I'd written this best-selling book, and the conversation would usually start with them saying, you know, hey, you know,
we loved your book, would you come and do some work for us? And I said, well, you know, I'd be happy to, but the thing is, I just don't know anything about business. I know no more about big data than I do about business. And they would say, oh God, we just love that self-deprecating Midwestern charm. And the truth was I really didn't. But it got me the opportunity to do a whole lot of things, and I have to say that, as optimistic as I started, it turned out to be a lot harder to get things done when I actually got down to it.

Now, one of the first things I started was a project with a big airline. I happened to have a chance meeting with a very high-up guy at the airline, and he worked in the frequent flyer program. Having thought about this, it seemed to me the frequent flyer program was just like the lifeblood of the airline, because what an incredible luxury, that you sign up almost every customer you care about and you know every single thing they do, because they work very hard to make sure that you know it, because they want to get credit for the miles.

And so the kind of approach I've really taken through my career is what economists call a natural experiment approach, or an accidental experiment approach. The gold standard, of course, is a randomized experiment. In the airline example, if you could run a truly randomized experiment, I could take a bunch of flights and just make half of them, at random, have a six-hour delay, if the airline would let me do that, and the other half would be on time. And then I'd track those passengers who got, for no good reason, no weather, no nothing, just six hours of punishment. I would track their behavior over time, and I'd be able to see what the effect of a six-hour delay was on travelers. So that's what you'd love to do. But look, I can't do that, because the
airline's never going to let me do that. But in essence they have that, because a lot of times, like me today, I just got whacked with a big delay for no good reason. I could have been on yesterday's flight, I could have been on tomorrow's flight, but I was on today's flight. The people on tomorrow's flight are a lot like me, but they aren't going to get the delay. So I can use that as an accidental experiment, like a quasi-randomization, to then look and see what the behavior is going forward.

So I thought about this for a long time but never thought I'd actually get the chance to do anything with it. And I said to the guy there, God, you know, I'm just so curious about so many questions related to this. Are you able to tell me the answer to any of these questions based on your analysis of the data? And he said, well, I run the frequent flyer program. My job is to make sure that people get their miles and they redeem their miles. My job is not to figure out all these other things, operational things, about whether delays matter. And I was absolutely stunned that he had never thought about it. He just didn't think that was part of his job. And I said, well, could you do it? And he said, you know, it's kind of interesting. And we had a meeting, we sat down with the IT people, and they said, to try to put together our seven legacy data sets, to try to make them talk to each other, is totally, completely infeasible. It would be like a five-person-year task to try to do it, and I don't even think we'd succeed if we did.

So it just so happened that I had an undergraduate I really liked who had graduated and had nothing to do, and I said, hey, you know, this is a little bit crazy, but I just talked to this airline and they said there's this impossible task, but you've got nothing else to do. What if I pay you? I said to them, I'm going to pay this guy, and if you could just find a seat for him in a carrel, if you let him,
I just want him to mess around in your data and see if anything interesting happens, and I won't publish anything without, you know, it'll all be covered by an NDA, whatever. And it turned out they'd just done a lot of downsizing, so there were forty carrels open for the guy to sit in. I sent him down there, and it was a really interesting introduction to business [unclear]. I said, look, I just want you to be nice to people, smile all the time, have lunch with people, ask them about their data, ask them if you can get to the data set, and just see if you can find links between the different data sets. And he did put it together. It took him about three months, one undergraduate about three months, to do the thing that the IT experts at the airline said was impossible to do. It wasn't real time, of course [unclear].

And we looked at the data, and it was really interesting [unclear]. We started with the exact question I just told you about, which is to look at delays and how long delays affect the travelers. And it turned out that for the really high-status travelers, who travel a lot, one really bad delay cost the airline about two thousand dollars in revenue over the next two or four years. I mean, much bigger, I think, than they expected, much bigger than I expected. But you could see just very clearly, in the sort of regression discontinuity approach, that it was right.

And so we presented the data to the head of the frequent flyer program and others who were invested, and they were very pleased, very excited about it. I think they believed it, they understood it. I mean, the beauty of a lot of the simple methods, one of the beauties of quasi-randomization, is people can understand it, right? If you do complicated statistics no one can ever
figure out what we're really doing, but we could just show them, here's our logic, and they could see the patterns. And, you know, I really thought, this being my first experience of business, we were off to the races, right? They're going to understand now that if one undergrad can do this in three months, it was worth an investment. We're going to answer every question now: does it matter who your pilot is, your gate crew, you can think about pricing, I mean everything. Then you go to randomized experiments. Why settle for these accidental things? And they said at the end of the presentation, that's really great. And I said, should we try to really blow this thing out? And the guy said, but my job is to make sure people redeem their miles and get their miles. It's not really... And as hard as I tried to convince the guy that, if he wanted to be CEO, he should just take control of the whole organization, he should be the guy who [unclear], I couldn't do it. So I gave up.

Things got worse from there, much worse from there. So I got smarter over time. I did all that for free, which was obviously a mistake ex post, because in the end I never got to publish anything. So then I realized that the more I charged, the better things went with firms, and so I started charging a whole lot of money to go talk to firms. And that's how I got in with one of the biggest fast-food chains in America, in the world. They allowed me to come in, and again they said, you do whatever you want, go around, you think about things differently, see if you can come up with some good ideas. And I went around to various things, and the one that stands out the most was the issue of human resources. I just really believe that there are so many opportunities for big data exploitation in human resources, and it's just not happening. So in preparation for a meeting with the head of HR, I
actually applied for a job flipping burgers at this place and as you apply you give them some information they ask you about 40 questions okay and the question that I remember best the most vivid question was you're wiping the table when a customer punches you in the face what do you do A I punch him back B I scream I've been punched I've been punched call 911 C calmly walk to your supervisor and say I've been punched and there's a D too I don't know whether it was good ok so we get in the meeting and I say to the head of HR I said I was really wondering what the answer was what was the right answer to question 23 about the punch thing she said no I don't I have no idea you know I'm not sure and I said well what answer you know more broadly how do you know what the right answers are and she said well we hired a third-party firm one of these big HR you know Mercer or one of these big firms and they um they designed the questions and told us what the right answers were and I said well how do they know what the right answers are and the answer was well because they're experts because that's what they do and I said well which questions are the best ones and she said I don't know I said well which ones are most predictive and she again acted like she didn't understand what I meant I said well by my calculation you've had about 4 million people take this test over the last 6 or 7 years and you've hired about 1.5 million of them and all you got to do is just link up the answers they gave on this test to how long they stay at the company and whether they're promoted whether they're good what their performances are and then you know number one you'd really learn what the right answers were to the different questions and then also which questions are informative and you could you know every week you could change out the questions and constantly be experimenting and having a better and better test and she said well we've never done we've never done that she said you know my
problem is you know I'm the HR manager and my job is to keep us from being sued because our questions are discriminatory so as far as I'm concerned I don't ever want to know anyone's answer to any of those questions the third party handles that all I see is red yellow green green means interview for sure yellow means maybe maybe not red means don't interview okay and that's all I want to do I don't want to go any deeper than that and I said okay well let's just go a little further with this and I said um you know we started talking about how much they hire their turnover and I said okay I can imagine that if you did this smartly you can reduce turnover by 10% and I did the calculation I can't remember exactly what it was but I think it was something like it was worth perhaps fifty to a hundred million dollars a year to them okay and I did this back of the envelope calculation so I think this would be worth about 50 million dollars to your bottom line and my firm you know even at our incredibly exorbitant rates was only gonna charge her maybe like one hundred fifty thousand dollars so that seems like a pretty good ROI so it's like 50 million but it's gonna last for a long time so it's really something more like worth a couple hundred million she could not have been less interested so um this story actually has an interesting ending because shortly thereafter I was asked to speak to the 50 top people globally of this big fast food chain they brought them all together and because I was doing this analysis they asked me if I would come and talk to them about their firm and I said sure and I said you want me to just like have fun and talk about you know nice things like sumo wrestling and stuff like that or you want me to really talk about the hard facts about how you run your business they said of course we want the hard facts we want to you know we're all about the truth and so I got up in front of these 60 people and I just basically told them three different and I said
a lot of nice things and then I told them three anecdotes exactly like this HR anecdote and the room got quieter and quieter and the next day the CEO wrote an email saying that we were fired from everything all contracts were canceled they didn't want to talk to us ever again which I thought was great actually because who wants to work for a company that when you tell them the truth they don't want to you know they're not interested anyway but it definitely was an interesting set of lessons to me here because what I realized is this is the firm that had had a lot of success and they had a lot of success in an old model and their old model was that you know middle-aged men sat in their offices and knew better than the customer better than the data what the answers were right to be a good manager at this company you had to guess what the answers were to do a good job you had to guess you know you decreed what should happen and if you were right things went well but I think that is completely a model which is at odds with the new economy and you take whether it's Google or Amazon or you know Facebook whatever and here it's a different world it's a world in which you can experiment and in which the data tell you the answers you don't have to rely on insight ok it's a very hard world for business to transition to I think also policymakers to transform into where the role of the executive goes from being I am the expert who knows everything to my job being can I ask the right questions and kind of marshal the right data to tell me what the answers are to those questions ok so a third business experience I had while ultimately a failure at least was better than the first two and it was with a newer company an Internet company a company that provided software to people on a subscription basis ok and a really smart exciting company to work with and one of the things that they had a problem ok and one of the problems was that when a
year came up and it was time to renew the subscription something like 35% of the people never renewed ok and what was odd about it is that among the people who never renewed were people who were using it every day of the week up until the day they never renewed and then not ok now they didn't know that much about this but they just knew they had a renewal problem and so they called a meeting and it was going to be a typical meeting like many business meetings where people from all different functions and expertise were going to come together and they were going to go around the room and each person was going to offer their suggestion about how they could get these renewal rates up ok and they're gonna say well I think you know so one guy was going to say I think that the problem is really pricing that our pricing is too high if we lower the price people are more likely to renew another guy was going to say you know I really think it's about how we market the renewal to them that we need to send emails you know two weeks ahead of time instead of one week ahead okay and that's the way business works right there's a problem people come together opinions are expressed and then essentially usually either some group opinion comes in or more likely the most senior person decrees what the answer is and then people go out and do it okay so just for fun and inside I said could we try something different how about we have the first meeting but it's only 15 minutes long it's just to define the problem okay and then you let me and my team go off so I should have said so we did a data project there where we had tried to put together an integrated like big data set of everything that was useful at this company and this took I don't know four people at the company six months to do okay so this is just a snapshot that just goes to show how hard it is for these companies to put their data together even though this was a
relatively young relatively good company so we had a data set that now with the click of a few buttons could give you some answers so we said now we'll have a fifteen minute meeting and then give us a week to put together all the data that we can relevant to the question and then let's have the meeting in the light of the data right so not people opining about what's going on but actually looking at the data so in a week we put together a bunch of information and the most interesting thing we found is that when you just look at the error codes something like it was like thirty or forty percent of all the people who didn't renew had tried to renew something went wrong with their credit card and then they never tried again okay and so no one knew this no one had ever bothered to look it was in the data but it wasn't anyone's job really to look at this sort of thing then we showed them a bunch of really simple sets of you know big bar little bar slides you know comparisons of things completely simple you just divided things up and the meeting was totally different because instead of a meeting being conjectures about what I think's going on people said whoa if that's really true then the answer is this and this was a very experimentally oriented firm so out of the meeting three randomized experiments were spawned that went on to show that indeed when someone's credit card gets turned down a personal call to them you know at least it was just to say hey I noticed you had trouble with your credit card you know it seems like we have the wrong address or something led to about half of those people actually renewing so it turned out that the real solutions were not what anyone would have suspected but it came from a completely different way in now the reason I say it was a total failure is because everyone lauded how this worked and the company literally never ever did it again okay even though people loved what happened it just wasn't
the way things happened it wasn't the natural it was no one's job to do what we did which was to have nothing to do but just when given a problem go and work on it okay and so in some sense it was a failure in that regard as well so what do I take away from these examples these are just a you know a quick look into some of my experience with big data and here are my views whatever 15 years down the road from my first contact with big data so as an academic I think what I've learned is that collaboration with businesses is for sure a whole lot harder than I would have thought it would have been when I started and it really is a problem it's really a problem partly of incentives that the things that I'm interested in as an academic are rarely about profit maximization for the firm and it's often about disruption of what they want to do so it's hard to make these collaborations work you know whether it's a randomized experiment or whether it's just looking at data okay but at the same time I still believe that the payoffs are so enormous relative to you know the other marginal contributions economists can make in other realms I think it's really valuable and I also think what's really important about these kind of business academic collaborations is that it takes a completely different set of skills or it's a partially different set of skills than other kinds of economic research okay so there's a set of people like me who can't do like crazy theory who aren't going to do new econometrics but can maybe be useful because I can go out and engineer a way to try to trick a firm into doing something that could teach us about things that we can't learn another way and I think that for a certain kind of I mean the economics profession is very much moving in a very mathematical abstract direction and I think it's a kind of it's an antidote to that for the kind of people who don't fit
in to the very technical space so I really do believe still that the combination of collaborations with firms big data and randomization and what we call field experiments natural field experiments is absolutely going to be at the center of what economics is and what other social sciences are going forward now if I look at it from the perspective of you know a talking head on TV okay if I think about what I would say if somebody wanted me to talk on TV I think my answer when asked about big data is that for most firms I think that in the end big data will be a net negative for them okay now there are a set of firms Google being absolutely the best example who have completely and totally taken big data and made it something which is incredibly valuable right and Google is actually in exactly the right spot right because Google has gotten people to come tell them what they want and it's a relatively easy problem to satisfy what people want as opposed to what most companies are trying to do so when I talk to other companies the companies that do something like they sell life insurance or something they say look we sell life insurance using big data we should be able to sell these people you know cars and sweaters and other things we know so much about them okay and the problem is that no one really wants that my inbox is so crowded with junk and now Gmail for instance is so good at taking all those out so I never have to look at them that pushing things onto me it's just a very difficult way okay so I think for most big firms okay and it's just an opinion it's my impression based on what I've seen that every big firm I talk to many little firms I talk to the conversation starts with big data okay and I'm not sure the CEO is sure why or what they're supposed to do with it but they feel like they're getting left behind so they feel like they should do something okay and I think what's likely
to happen is that they're going to make tremendous investments without having the right kind of talent around to think about the data the right kind of infrastructure to do it and end up just churning through a whole lot of wasted time and effort to not get very much that's my opinion now I'm wrong about most things I might be wrong about that but it is definitely the feeling that I have now when I think about it if I were a CEO okay what would I do if I were CEO if I were CEO I would start with a huge investment in infrastructure from what I've seen even good firms do not have the IT structure to be able to do anything with their data okay so I talked about the airline I talked about the software company it's surprising the companies you would think would be really good at this turn out to have no ability to do it okay and I never would have imagined how absolutely critical this ability to link between data is because big firms basically generate data for a bunch of different reasons like an airline has ticketing data it has frequent flyer data it has operational data about delays and whatnot it has Human Resources data and it turns out that it's very difficult to make these all talk but in order to get the big payoff of big data that's exactly what you need and so for instance I worked with a bank where they tried to do that and a year and a half later a working group of 20 people still had not succeeded in pulling that off I mean it's just a really hard task and so I think as a CEO you have to decide I'm gonna really try to do this right or not do it at all and the second thing I would do if I were a CEO and I had these data is that I would make it available to anyone definitely within the firm I'd even want to think about whether you want to make it available more broadly okay and the reason is that the most interesting uses for data are rarely the uses they were generated for and rarely the guy who you know
records the data often doesn't have the ideas and he doesn't have you know the idea for how to try to link multiple data sets okay so you kinda need people in the organization who have ideas to be able to get access to the data this is very scary very scary for companies both in terms of privacy laws but also just in terms of you know worries about how do you know what people are gonna do with it or you know so right now you know Capital One just got into trouble because a bunch of the people who worked there were looking and seeing how many people were charging using the charge cards at Chipotle and then buying or selling short Chipotle and other stocks using that information okay now it's not Capital One's fault I mean these guys just like started probably having fun seeing that the data were predictive and then realized they could make a lot of money okay but that's the kind of thing a CEO worries about but I think that's in some sense exactly what you want to happen except you don't want the guy to be trading on it illegally you want him to be you know coming up with insights about what you should be doing with this data the other thing I would do as a CEO is I would give people the time to actually try to think about these ideas I think that there's um there's a real lack of focus on thinking and idea generation in firms because people are busy and they have responsibilities and I think that's a luxury that we have in academics an incredible luxury that we have in academics which isn't available to many people outside of academics I would institute a rule in which I tried to make decision-making based on data and experimentation it's really I mean so we're talking about Big Data but what is even more shocking to me in business is how little experimentation there is how easy it is in many settings to do true randomization and how it is a gold standard in academics and in medicine and it is virtually
absent in most businesses even many good businesses and I think it's not just because people aren't trained to do it I think it's something deeper I think the real obstacle is in order to run an experiment you have to start from the perspective that all academics start from which is I don't know I don't know the answer so I'm gonna go write a research paper I'm gonna run an experiment but the problem is in business I think there's no three worse words you can ever say in business than I don't know to admit that you don't know if you're the head of marketing is such a costly thing that you can never run an experiment because by running the experiment you're already admitting you don't know it actually took me a long time to understand this but I've really come to believe that it's true that the biggest obstacle is admitting ignorance it's just verboten in firms ok so that's what I would do if I were a CEO and my guess is even then I think I would probably fail at changing firm culture enough to really make big data work I think it's just a hard hard change to make and the reason I think it's so hard in the end comes down to what I would do if I were a do-gooder right so my perspective on big data if I were a do-gooder is that I would drop everything else I'm doing and I would try to create a new form of education which would train people to thrive in a big data world ok because if you think about it there's no particular degree path that's actually a very good training for what you want in I don't know a data scientist or whatever they're called these days but so you know what you want from someone is the ability to be smart to understand data right to understand databases to understand statistics to know something about economics and you know marginal cost equals marginal benefit to know something about human nature ok and to be able to understand experimentation it's a weird mix ok so I could be wrong but my impression is that
in the computer science programs I've seen you don't do that at all you learn how to program okay and that's a piece of it but a small piece of it and statistics is maybe close but the statisticians I've met tend not to care very much about the human side of problems they tend to be interested in the nature of the data and whatnot but not often to be all that effective so I've met a lot of statisticians at firms and what they do to me is what they do to everyone else it's to use really fancy jargon and you know super high tech stuff that no one else understands but makes a ton of sense it probably gives the right answer but it gives people the heebie-jeebies because they don't understand what they're talking about and so when I work with companies like I said I like to show comparisons of means and that's often more effective than really complicated things for people who don't understand anything about statistics okay so you need some statistics but I think actually an econ degree is not such a bad training for data science in the sense that you learn a bunch of things but the PhD takes you often to a hyperspace of a lot of things that are never going to be useful and I think an undergraduate degree doesn't teach you really enough of anything to make you prepared for it okay and an MBA I mean in some sense maybe a modern version of the MBA would be this thing but the current MBA could not be further from what this is the current MBA has essentially zero content in this yeah I'm not even joking so I tell you I'm not really a do-gooder at heart but at times I get taken over by do-gooding desires and so I did indeed so you know I believe that there is a valuable place in the world for creating some new I don't know call it a degree I have no idea how you actually do it but it's going to be some mix of these things like something like an MBA or like an
MVP which is designed to create people who will thrive in a big data world okay and I think it's a little exaggerated to say do-gooder I mean in the sense that I think it will have tremendous private value for the people who do it who take the degree so I think you can charge them a ton of money for it but the do-gooding part is like getting the whole thing rolling anyone who wants to do it I would encourage them to go and do it I took one halting step in this direction and I did it at the University of Chicago MBA program Booth okay and I thought well like these are some of the best business school students in the world and I'm gonna offer a class I did it with my colleague John List and we focused less on Big Data more on experimentation okay so our class was on experiments in business and it turned out to be an incredibly popular offering okay it was one of the most popular offerings it was oversubscribed by like six fold and the beauty of the Chicago business school is it actually uses prices to allocate right so the only people who got in our class are people who allocated every single bit of their budget to that one class and the other three classes they took just had to be whatever was for free okay and so we had the most you know the most motivated qualified MBAs anywhere and they came and took our class and so John taught them the theory the statistics of experimentation and my only job for you know ten weeks for an hour and a half was to just tell them about examples of experiments that we and others had run in firms okay so I could do about three per day so over the course of the ten weeks I presented 30 different experiments where we talked about what the problem was how the randomization helped how they did the randomization with the application thing okay people seemed to like the class very enthusiastic lots of participation and just to show how far our
do-gooding spirit was going after the class was over we invited all the students out for drinks to get their feedback on the class so we could see about how you know how to make it better and the thing I remember so vividly was uh over beers I waited until the folks had had a couple beers so they might be more honest in their responses and I said so um you know what do you think do you think you'll use this in your jobs okay and I should say do-gooding is maybe a little exaggeration because what we actually hoped we were doing was to send 80 automatons out into the world who thought Levitt and List know everything about experiments for the rest of my career whenever I have an interesting problem I'm gonna call my old professors on the phone and let them you know get you know hundreds and hundreds of great academic papers out of this okay and so I said hey what do you think is it something you'd ever use in your job and I remember one person spoke and everyone else nodded and he said um you know it was a really cool class loved it so entertaining but I don't see how I could ever apply it in my job and I said well why not I mean it seems like you know like what's your job and he said well I do marketing at a beverage company and I said well wait a second two of the experiments I talked about were marketing experiments at a beer company he said yeah but I'm in soft drinks and at that point I realized that this is gonna be a lot harder row to hoe than I had thought anyway that's what I had to talk to you about tonight thank you very much
Info
Channel: NAS Colloquia
Views: 14,137
Rating: 4.9047618 out of 5
Keywords: Steven Levitt, Big Data (Industry), National Academy Of Sciences (Membership Organization), Statistics (Field Of Study), sackler colloquia
Id: r5jATFtKtI8
Length: 59min 55sec (3595 seconds)
Published: Thu Apr 02 2015